Thread
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Add test coverage for indirection transformation
- 64492917280a 19 (unreleased) landed
-
Fix typo in comment
- 81a61fde84ff 19 (unreleased) landed
-
Implementation of subscripting for jsonb
- 676887a3b0b8 14.0 cited
-
SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2024-08-29T16:33:28Z
Hello Hackers, I’ve attached a patch to start adding SQL:2023 JSON simplified accessor support. This allows accessing JSON or JSONB fields using dot notation (e.g., colname.field.field...), similar to composite types. Currently, PostgreSQL uses nonstandard syntax like colname->x->y for JSON and JSONB, and colname['blah'] for JSONB. These existing syntaxes predate the standard. Oracle already supports the standard dot notation syntax [1]. The full specification for the JSON simplified accessor format is as follows: <JSON simplified accessor> ::= <value expression primary> <JSON simplified accessor op chain> <JSON simplified accessor op chain> ::= <JSON simplified accessor op> | <JSON simplified accessor op chain> <JSON simplified accessor op> <JSON simplified accessor op> ::= <JSON member accessor> | <JSON wildcard member accessor> | <JSON array accessor> | <JSON wildcard array accessor> | <JSON item method> I’ve implemented the member and array accessors and attached two alternative patches: 1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch enables dot access to JSON object fields and subscript access to indexed JSON array elements by converting "." and "[]" indirection into a JSON_QUERY JsonFuncExpr node. 2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This alternative patch implements dot access to JSON object fields by transforming the "." indirection into a "->" operator. The upside of the v1 patch is that it strictly aligns with the SQL standard, which specifies that the simplified access is equivalent to: JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) However, the performance of JSON_QUERY might be suboptimal due to function call overhead. Therefore, I implemented the v2 alternative using the "->" operator. There is some uncertainty about the semantics of conditional array wrappers. Currently, there is at least one subtle difference between the "->" operator and JSON_QUERY, as shown: postgres=# select '{"a": 42}'::json->'a'; ?column? ---------- 42 (1 row) postgres=# select json_query('{"a": 42}'::json, 'lax $.a' with conditional array wrapper null on empty null on error); json_query ------------ [42] (1 row) JSON_QUERY encloses the JSON value 42 in brackets, which may be a bug, as Peter noted [2]. If there are no other semantic differences, we could implement simple access without using JSON_QUERY to avoid function call overhead. I aim to first enable standard dot notation access to JSON object fields. Both patches implement this, and I’m also open to alternative approaches. For subscripting access to jsonb array elements, jsonb already supports this via the subscripting handler interface. In the v1 patch, I added json support using JSON_QUERY, but I can easily adapt this for the v2 patch using the -> operator. I did not leverage the subscripting handler interface for json because implementing the fetch/assign functions for json seems challenging for plain text. Let me know if you have a different approach in mind. Finally, I have not implemented wildcard or item method accessors yet and would appreciate input on their necessity. [1] https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/simple-dot-notation-access-to-json-data.html#GUID-7249417B-A337-4854-8040-192D5CEFD576 [2] https://www.postgresql.org/message-id/8022e067-818b-45d3-8fab-6e0d94d03626@eisentraut.org -
Re: SQL:2023 JSON simplified accessor support
Peter Eisentraut <peter@eisentraut.org> — 2024-09-16T19:44:54Z
On 29.08.24 18:33, Alexandra Wang wrote: > I’ve implemented the member and array accessors and attached two > alternative patches: > > 1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch > enables dot access to JSON object fields and subscript access to > indexed JSON array elements by converting "." and "[]" indirection > into a JSON_QUERY JsonFuncExpr node. > > 2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This > alternative patch implements dot access to JSON object fields by > transforming the "." indirection into a "->" operator. > > The upside of the v1 patch is that it strictly aligns with the SQL > standard, which specifies that the simplified access is equivalent to: > > JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON > EMPTY NULL ON ERROR) > > However, the performance of JSON_QUERY might be suboptimal due to > function call overhead. Therefore, I implemented the v2 alternative > using the "->" operator. Using the operator approach would also allow taking advantage of optimizations such as <https://www.postgresql.org/message-id/flat/CAKU4AWoqAVya6PBhn%2BBCbFaBMt3z-2%3Di5fKO3bW%3D6HPhbid2Dw%40mail.gmail.com>. > There is some uncertainty about the semantics of conditional array > wrappers. Currently, there is at least one subtle difference between > the "->" operator and JSON_QUERY, as shown: That JSON_QUERY bug has been fixed. I suggest you rebase both of your patches over this, just to double check everything. But then I think you can drop the v1 patch and just submit a new version of v2. The patch should eventually contain some documentation. It might be good starting to look for a good spot where to put that documentation. It might be either near the json types documentation or near the general qualified identifier syntax, not sure.
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2024-09-23T19:22:20Z
Hi Peter, Thank you so much for helping! On Mon, Sep 16, 2024 at 12:44 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 29.08.24 18:33, Alexandra Wang wrote: > > I’ve implemented the member and array accessors and attached two > > alternative patches: > > > > 1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch > > enables dot access to JSON object fields and subscript access to > > indexed JSON array elements by converting "." and "[]" indirection > > into a JSON_QUERY JsonFuncExpr node. > > > > 2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This > > alternative patch implements dot access to JSON object fields by > > transforming the "." indirection into a "->" operator. > > > > The upside of the v1 patch is that it strictly aligns with the SQL > > standard, which specifies that the simplified access is equivalent to: > > > > JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON > > EMPTY NULL ON ERROR) > > > > However, the performance of JSON_QUERY might be suboptimal due to > > function call overhead. Therefore, I implemented the v2 alternative > > using the "->" operator. > Using the operator approach would also allow taking advantage of > optimizations such as > <https://www.postgresql.org/message-id/flat/CAKU4AWoqAVya6PBhn%2BBCbFaBMt3z-2%3Di5fKO3bW%3D6HPhbid2Dw%40mail.gmail.com>. OK, that makes sense. > > There is some uncertainty about the semantics of conditional array > > wrappers. Currently, there is at least one subtle difference between > > the "->" operator and JSON_QUERY, as shown: > > That JSON_QUERY bug has been fixed. > > I suggest you rebase both of your patches over this, just to double > check everything. But then I think you can drop the v1 patch and just > submit a new version of v2. Done. I rebased both patches and confirmed they have the same test outputs. I attached v3, which also adds JSON subscript support on top of v2. > The patch should eventually contain some documentation. It might be > good starting to look for a good spot where to put that documentation. > It might be either near the json types documentation or near the general > qualified identifier syntax, not sure. Right, I’m not sure either. A third option, I think, would be to include it in the JSON Functions and Operators section [1]. [1] https://www.postgresql.org/docs/devel/functions-json.html Best, Alex
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2024-09-26T15:45:11Z
Hi, I didn’t run pgindent earlier, so here’s the updated version with the correct indentation. Hope this helps! Best, Alex
-
Re: SQL:2023 JSON simplified accessor support
Andrew Dunstan <andrew@dunslane.net> — 2024-09-26T17:16:38Z
On 2024-09-26 Th 11:45 AM, Alexandra Wang wrote: > Hi, > > I didn’t run pgindent earlier, so here’s the updated version with the > correct indentation. Hope this helps! This is a really nice feature, and provides a lot of expressive power for such a small piece of code. I notice this doesn't seem to work for domains over json and jsonb. andrew@~=# create domain json_d as json; CREATE DOMAIN andrew@~=# create table test_json_dot(id int, test_json json_d); CREATE TABLE andrew@~=# insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json; INSERT 0 1 | | andrew@~=# select (test_json_dot.test_json).b, json_query(test_json, 'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from test_json_dot; ERROR: column notation .b applied to type json_d, which is not a composite type LINE 1: select (test_json_dot.test_json).b, json_query(test_json, 'l... I'm not sure that's a terribly important use case, but we should probably make it work. If it's a domain we should get the basetype of the domain. There's some example code in src/backend/utils/adt/jsonfuncs.c cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com -
Re: SQL:2023 JSON simplified accessor support
David E. Wheeler <david@justatheory.com> — 2024-09-27T09:49:31Z
On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > I didn’t run pgindent earlier, so here’s the updated version with the > correct indentation. Hope this helps! Oh, nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to want the text output far more often than a JSON scalar. Best, David
-
Re: SQL:2023 JSON simplified accessor support
Andrew Dunstan <andrew@dunslane.net> — 2024-09-27T11:07:51Z
On 2024-09-27 Fr 5:49 AM, David E. Wheeler wrote: > On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > >> I didn’t run pgindent earlier, so here’s the updated version with the >> correct indentation. Hope this helps! > Oh, nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to want the text output far more often than a JSON scalar. > That would defeat being able to chain these. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
-
Re: SQL:2023 JSON simplified accessor support
David E. Wheeler <david@justatheory.com> — 2024-09-27T21:19:15Z
On Sep 27, 2024, at 12:07, Andrew Dunstan <andrew@dunslane.net> wrote: > That would defeat being able to chain these. Not if it’s a different operator. But I’m fine to just keep using ->> at the end of a chain. D
-
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2024-10-04T12:33:07Z
On Thu, Sep 26, 2024 at 11:45 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > Hi, > > I didn’t run pgindent earlier, so here’s the updated version with the > correct indentation. Hope this helps! > the attached patch solves the domain type issue, Andrew mentioned in the thread. I also added a test case: composite over jsonb domain type, it still works. for example: create domain json_d as jsonb; create type test as (a int, b json_d); create table t1(a test); insert into t1 select $$(1,"{""a"": 3, ""key1"": {""c"": ""42""}, ""key2"": [11, 12]}") $$; insert into t1 select $$ (1,"{""a"": 3, ""key1"": {""c"": ""42""}, ""key2"": [11, 12, {""x"": [31, 42]}]}") $$; select (t1.a).b.key2[2].x[1] from t1; select (t1.a).b.key1.c from t1; -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2024-11-07T21:57:58Z
On Fri, Oct 4, 2024 at 7:33 AM jian he <jian.universality@gmail.com> wrote: > > On Thu, Sep 26, 2024 at 11:45 PM Alexandra Wang > <alexandra.wang.oss@gmail.com> wrote: > > > > Hi, > > > > I didn’t run pgindent earlier, so here’s the updated version with the > > correct indentation. Hope this helps! > > > > the attached patch solves the domain type issue, Andrew mentioned in the thread. > > I also added a test case: composite over jsonb domain type, > > > it still works. for example: > create domain json_d as jsonb; > create type test as (a int, b json_d); > create table t1(a test); > insert into t1 select $$(1,"{""a"": 3, ""key1"": {""c"": ""42""}, > ""key2"": [11, 12]}") $$; > insert into t1 select $$ (1,"{""a"": 3, ""key1"": {""c"": ""42""}, > ""key2"": [11, 12, {""x"": [31, 42]}]}") $$; > > select (t1.a).b.key2[2].x[1] from t1; > select (t1.a).b.key1.c from t1; Thank you so much, Jian, for reviewing the patch and providing a fix! I’ve integrated your fix into the attached v5 patch. Inspired by your test case, I discovered another issue with domains over JSON: top-level JSON array access to a domain over JSON when the domain is a field of a composite type. Here’s an example: create domain json_d as json; create type test as (a int, b json_d); create table t1(a test); insert into t1 select $$ (1,"[{""a"": 3}, {""key1"": {""c"": ""42""}}, {""key2"": [11, 12]}]") $$; select (t1.a).b[0] from t1; The v5 patch includes the following updates: - Fixed the aforementioned issue and added more tests covering composite types with domains, nested domains, and arrays of domains over JSON/JSONB. - Refactored the logic for parsing JSON/JSONB object fields by moving it from ParseFuncOrColumn() to transformIndirection() for improved readability. The ParseFuncOrColumn() function is already handling both single-argument function calls and composite types, and it has other callers besides transformIndirection(). Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Peter Eisentraut <peter@eisentraut.org> — 2024-11-14T12:31:35Z
On 07.11.24 22:57, Alexandra Wang wrote: > The v5 patch includes the following updates: > > - Fixed the aforementioned issue and added more tests covering composite > types with domains, nested domains, and arrays of domains over > JSON/JSONB. > > - Refactored the logic for parsing JSON/JSONB object fields by moving it > from ParseFuncOrColumn() to transformIndirection() for improved > readability. The ParseFuncOrColumn() function is already handling both > single-argument function calls and composite types, and it has other > callers besides transformIndirection(). This patch implements array subscripting support for the json type, but it does it in a non-standard way, using ParseJsonSimplifiedAccessorArrayElement(). This would be better done by providing a typsubscript function for the json type. This is what jsonb already has, which is why your patch doesn't need to provide the array support for jsonb. I suggest you implement the typsubscript support for the json type (make it a separate patch but you can keep it in this thread IMO) and remove the custom code from this patch. A few comments on the tests: The tests look good to me. Good coverage of weirdly nested types. Results look correct. +drop table if exists test_json_dot; This can be omitted, since we know that the table doesn't exist yet. This code could be written in the more conventional insert ... values syntax: +insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json; +insert into test_json_dot select 1, '{"a": 2, "b": {"c": 42}}'::json; +insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::json; +insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, "d":[{"x": [11, 12]}, {"y": [21, 22]}]}'::json; Then the ::json casts can also go away. Also, using a different value for "id" for each row would be more useful, so that the subsequent tests could then be written like select id, (test_jsonb_dot.test_jsonb).b from test_jsonb_dot; so we can see which result corresponds to which input row. Also make id the primary key in this table. Also, let's keep the json and the jsonb variants aligned. There are some small differences, like the test_json_dot table having 4 rows but the test_jsonb_dot having 3 rows. And the array and wildcard tests in the opposite order. Not a big deal, but keeping these the same helps eyeballing the test files. Maybe add a comment somewhere in this file that you are running the json_query equivalents to cross-check the semantics of the dot syntax. Some documentation should be written. This looks like this right place to start: https://www.postgresql.org/docs/devel/sql-expressions.html#FIELD-SELECTION and them maybe some cross-linking between there and the sections on JSON types and operators. -
Re: SQL:2023 JSON simplified accessor support
Nikita Glukhov <glukhov.n.a@gmail.com> — 2024-11-20T00:06:16Z
Hi, hackers. I have implemented dot notation for jsonb using type subscripting back in April 2023, but failed post it because I left Postgres Professional company soon after and have not worked anywhere since, not even had any interest in programming. But yesterday I accidentally decided to look what is going on at commitfests and found this thread. I immediately started to rebase code from PG16, fixed some bugs, and now I'm ready to present my version of the patches which is much more complex. Unfortunately, I probably won't be able to devote that much time to the patches as before. Description of the patches: 1. Allow transformation only of a sublist of subscripts This gives ability to custom subscripting code to consume only the supported prefix of the subscripts list. Remaining subscripts are applied to the result of subscription which can be of different data type. Example of behavior change for hstore which always consumes only the first subscript and returns text: =# select (hstore 'foo=>bar')['foo']['bar']; -ERROR: hstore allows only one subscript +ERROR: cannot subscript type text because it does not support subscripting 2. GUC compat_field_notation disables treating of field selection as a function application (non-standard PG feature) if the column data type supports subscripting. Example for function hstore_hash(hstore) (after patch #4 applied): =# set compat_field_notation = on; =# select (hstore 'foo=>bar,hstore_hash=>123').hstore_hash; -- function hstore_hash ------------ -1718799972 =# select (hstore 'foo=>bar,hstore_hash=>123').foo; foo ----- bar =# set compat_field_notation = off; -- default =# select (hstore 'foo=>bar').hstore_hash; -- missing key hstore_hash ------------- =# select (hstore 'foo=>bar,hstore_hash=>123').hstore_hash; hstore_hash ------------- 123 3. Enable processing of field accessors by generic subscripting code. Field accessors are represented as String nodes in refupperexprs for distinguishing from ordinary text subscripts which can be needed for correct EXPLAIN. Strings node seem to no longer be a valid expression nodes, so I had to add quite ugly handling for them in nodeFuncs etc during rebase to PG18. 4. Implement read-only dot notation for hstore (see example above). 5. Enable processing of .* by generic subscripting (similary to #3) 6. Export jsonPathFromParseResult(): simple refactoring for #7. 7. Implement read-only dot notation for jsonb using jsonpath. A list of subscripts is converted to jsonpath using JsonPathParseItem. Non-constant array accessors are replaced with jsonpath parameters. TODO: - wildcard array accessor [*] - item methods There is some inconsistency in subscripting in jsonpath, which supports only arrays with numeric indexes, and existing generic subscripting for jsonb, which also supports indexing of objects by string keys. -- JSON_QUERY(jb, '$.a["b"]') =# select (jsonb '{"a": {"b": 1}}').a['b']; a --- NULL -- JSON_QUERY(jb, '$.a')['b'] = generic subscripting after JSON_QUERY =# select ((jsonb '{"a": {"b": 1}}').a)['b']; a --- 1 So, I think it would be good to enable subscripting of objects by strings in jsonpath. 8. Extract transformColumnRefInternal(): simple refactoring for #9 9. Enable non-parenthesized column references in dot notation. I think such patch needs a separate thread. This feature, required by the SQL standard, is implemented by calling transformColumnRefInternal() in a loop and passing different prefixes of the field chain. But I'm not sure how correct this is, I simply tried to preserve compatibility with existing name resolution rules. For example, for A.B.C.D.E we will try the following variants of splitting to column reference and indirections: (A.B.C.D).E = db A, schema B, table C, column D (A.B.C).D.E = schema A, table B, column C (A.B).C.D.E = table A, column B (A).B.C.D.E is not tried because by the SQL standard column reference in dot notation should be prefixed with table name. Although it would be very nice to write "jb.key" instead of "tab.jb.key". There is also an inconsistency in execution when the accessor chain is separated by parentheses. Example: (jb).a [0] => JSON_QUERY(jb, '$.a[0]') ((jb).a)[0] => JSON_QUERY(jb, '$.a') [0] =# select (jsonb '[{"a": 1}, {"a": 2}]').a[0]; a -------- [1, 2] - jsonpath '$.a' returns two items: 1, 2 - jsonpath '[0]' returns the same items due to auto array wrapping - items wrapped into array =# select ((jsonb '[{"a": 1}, {"a": 2}]').a)[0]; a --- 1 - JSON_QUERY(jb, '$.a') returns array [1, 2], - generic subscript '[0]' extracts its first element The problem could be solved if we somehow allowed intermediate jsonpaths to return unwrapped items sequences (like SRF). Also I think it would be an interesting task to implement the assignment to JSON using dot notation, but it is unclear what to do here with .* and [*]. -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2024-11-21T20:52:49Z
Hi, On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote: > > Hi, hackers. > > I have implemented dot notation for jsonb using type subscripting back > in April 2023, but failed post it because I left Postgres Professional > company soon after and have not worked anywhere since, not even had > any interest in programming. > > But yesterday I accidentally decided to look what is going on at > commitfests and found this thread. I immediately started to rebase > code from PG16, fixed some bugs, and now I'm ready to present my > version of the patches which is much more complex. > > Unfortunately, I probably won't be able to devote that much time to > the patches as before. Thank you so much, Nikita, for revisiting this topic and sharing your v6 patches! Now that we have two solutions, I’d like to summarize our current options. In Postgres, there are currently three ways to access json/jsonb object fields and array elements: 1. '->' operator (Postgres-specific, predates SQL standard): postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd' -> 0; -- returns 1 2. jsonb subscripting (not available for the plain json type): postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['d'][0]; --returns 1 3. json_query() function: postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2, 3]}', 'lax $.d[0]'); --returns 1 A few weeks ago, I did the following performance benchmarking of the three approaches: -- setup: create table tbl(id int, col1 jsonb); insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2, {"j":"vj"}]}]}' from generate_series(1, 100000)i; -- jsonb_operator.sql SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl; -- jsonb_subscripting.sql SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl; -- jsonb_path_query.sql SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl; # pgbench on my local MacOS machine, using -O3 optimization: pgbench -n -f XXX.sql postgres -T100 Results (Latency | tps): "->" operator: 14ms | 68 jsonb subscripting: 17ms | 58 jsonb_path_query() function: 23ms | 43 So performance from best to worst: "->" operator > jsonb subscripting >> jsonb_path_query() function. I’m excited to see your implementation of dot notation for jsonb using type subscripting! This approach rounds out the three possible ways to implement JSON simplified accessors: ## v1: json_query() implementation Pros: - Fully adheres to the SQL standard. According to the SQL standard, if the JSON simplified accessor <JA> is not a JSON item method, it is equivalent to a <JSON query>: JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) (I’m skipping <JA> that includes a JSON item method, as it is currently outside the scope of both sets of patches.) - Easiest to implement Cons: - Slow due to function call overhead. ## v2-v5: "->" operator implementation We initially chose this approach for its performance benefits. However, while addressing Peter’s feedback on v5, I encountered the following issue: -- setup create table test_json_dot(id serial primary key, test_json json); insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2, "b": {"c": 42}}]'); -- problematic query: test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from test_json_dot; id | b | expected ----+---+----------------- 5 | | [42, {"c": 42}] (1 row) This issue arises from the semantic differences between the "->" operator and json_query’s "lax" mode. One possible workaround is to redefine the "->" operator and modify its implementation. However, since the "->" operator has been in use for a long time, such changes would break backward compatibility. ## v6: jsonb subscription implementation Nikita's patches pass all my functional test cases, including those that failed with the previous approach. Supported formats: - JSON member accessor - JSON wildcard member accessor (Not available in v5, so this is also a plus) - JSON array accessor Questions: 1. Since Nikita’s patches did not address the JSON data type, and JSON currently does not support subscripting, should we limit the initial feature set to JSONB dot-notation for now? In other words, if we aim to fully support JSON simplified accessors for the plain JSON type, should we handle support for plain JSON subscripting as a follow-up effort? 2. I have yet to have a more thorough review of Nikita’s patches. One area I am not familiar with is the hstore-related changes. How relevant is hstore to the JSON simplified accessor? Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Andrew Dunstan <andrew@dunslane.net> — 2024-11-21T22:46:50Z
On 2024-11-21 Th 3:52 PM, Alexandra Wang wrote: > Hi, > > On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote: >> Hi, hackers. >> >> I have implemented dot notation for jsonb using type subscripting back >> in April 2023, but failed post it because I left Postgres Professional >> company soon after and have not worked anywhere since, not even had >> any interest in programming. >> >> But yesterday I accidentally decided to look what is going on at >> commitfests and found this thread. I immediately started to rebase >> code from PG16, fixed some bugs, and now I'm ready to present my >> version of the patches which is much more complex. >> >> Unfortunately, I probably won't be able to devote that much time to >> the patches as before. > Thank you so much, Nikita, for revisiting this topic and sharing your > v6 patches! > > Now that we have two solutions, I’d like to summarize our current > options. > > In Postgres, there are currently three ways to access json/jsonb > object fields and array elements: > > 1. '->' operator (Postgres-specific, predates SQL standard): > > postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd' > -> 0; -- returns 1 > > 2. jsonb subscripting (not available for the plain json type): > > postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, > 3]}'::jsonb)['d'][0]; --returns 1 > > 3. json_query() function: > > postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2, > 3]}', 'lax $.d[0]'); --returns 1 > > A few weeks ago, I did the following performance benchmarking of the > three approaches: > > -- setup: > create table tbl(id int, col1 jsonb); > insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2, > {"j":"vj"}]}]}' from generate_series(1, 100000)i; > > -- jsonb_operator.sql > SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl; > > -- jsonb_subscripting.sql > SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl; > > -- jsonb_path_query.sql > SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl; > > # pgbench on my local MacOS machine, using -O3 optimization: > pgbench -n -f XXX.sql postgres -T100 > > Results (Latency | tps): > > "->" operator: 14ms | 68 > jsonb subscripting: 17ms | 58 > jsonb_path_query() function: 23ms | 43 > > So performance from best to worst: > "->" operator > jsonb subscripting >> jsonb_path_query() function. > > I’m excited to see your implementation of dot notation for jsonb using > type subscripting! This approach rounds out the three possible ways to > implement JSON simplified accessors: > > ## v1: json_query() implementation > > Pros: > - Fully adheres to the SQL standard. > > According to the SQL standard, if the JSON simplified accessor <JA> is > not a JSON item method, it is equivalent to a <JSON query>: > > JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON > EMPTY NULL ON ERROR) > > (I’m skipping <JA> that includes a JSON item method, as it is > currently outside the scope of both sets of patches.) > > - Easiest to implement > > Cons: > - Slow due to function call overhead. > > ## v2-v5: "->" operator implementation > > We initially chose this approach for its performance benefits. > However, while addressing Peter’s feedback on v5, I encountered the > following issue: > > -- setup > create table test_json_dot(id serial primary key, test_json json); > insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2, > "b": {"c": 42}}]'); > > -- problematic query: > test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH > CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from > test_json_dot; > id | b | expected > ----+---+----------------- > 5 | | [42, {"c": 42}] > (1 row) > > This issue arises from the semantic differences between the "->" > operator and json_query’s "lax" mode. One possible workaround is to > redefine the "->" operator and modify its implementation. However, since > the "->" operator has been in use for a long time, such changes would > break backward compatibility. > > ## v6: jsonb subscription implementation > > Nikita's patches pass all my functional test cases, including those > that failed with the previous approach. > > Supported formats: > - JSON member accessor > - JSON wildcard member accessor (Not available in v5, so this is also a plus) > - JSON array accessor > > Questions: > > 1. Since Nikita’s patches did not address the JSON data type, and JSON > currently does not support subscripting, should we limit the initial > feature set to JSONB dot-notation for now? In other words, if we aim > to fully support JSON simplified accessors for the plain JSON type, > should we handle support for plain JSON subscripting as a follow-up > effort? > > 2. I have yet to have a more thorough review of Nikita’s patches. > One area I am not familiar with is the hstore-related changes. How > relevant is hstore to the JSON simplified accessor? > We can't change the way the "->" operator works, as there could well be uses of it in the field that rely on its current behaviour. But maybe we could invent a new operator which is compliant with the standard semantics for dot access, and call that. Then we'd get the best performance, and also we might be able to implement it for the plain JSON type. If that proves not possible we can think about not implementing for plain JSON, but I'd rather not go there until we have to. I don't think we should be including hstore changes here - we should just be aiming at implementing the standard for JSON access. hstore changes if any should be a separate feature. The aren't relevant to JSON access, although they might use some of the same infrastructure, depending on implementation. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com -
Re: SQL:2023 JSON simplified accessor support
Peter Eisentraut <peter@eisentraut.org> — 2024-11-26T09:12:06Z
On 21.11.24 23:46, Andrew Dunstan wrote: >> Questions: >> >> 1. Since Nikita’s patches did not address the JSON data type, and JSON >> currently does not support subscripting, should we limit the initial >> feature set to JSONB dot-notation for now? In other words, if we aim >> to fully support JSON simplified accessors for the plain JSON type, >> should we handle support for plain JSON subscripting as a follow-up >> effort? >> >> 2. I have yet to have a more thorough review of Nikita’s patches. >> One area I am not familiar with is the hstore-related changes. How >> relevant is hstore to the JSON simplified accessor? >> > > We can't change the way the "->" operator works, as there could well be > uses of it in the field that rely on its current behaviour. But maybe we > could invent a new operator which is compliant with the standard > semantics for dot access, and call that. Then we'd get the best > performance, and also we might be able to implement it for the plain > JSON type. If that proves not possible we can think about not > implementing for plain JSON, but I'd rather not go there until we have to. Yes, I think writing a custom operator that is similar to "->" but has the required semantics is the best way forward. (Maybe it can be just a function?) > I don't think we should be including hstore changes here - we should > just be aiming at implementing the standard for JSON access. hstore > changes if any should be a separate feature. The aren't relevant to JSON > access, although they might use some of the same infrastructure, > depending on implementation. In a future version, the operator/function mentioned above could be a catalogued property of a type, similar to typsubscript. Then you could also apply this to other types. But let's leave that for later. If I understand it correctly, Nikita's patch uses the typsubscript support function to handle both bracket subscripting and dot notation. I'm not sure if it's right to mix these two together. Maybe I didn't understand that correctly.
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-02-05T07:20:30Z
Hi hackers, On Tue, Nov 26, 2024 at 3:12 AM Peter Eisentraut <peter@eisentraut.org> wrote: > On 21.11.24 23:46, Andrew Dunstan wrote: > >> Questions: > >> > >> 1. Since Nikita’s patches did not address the JSON data type, and JSON > >> currently does not support subscripting, should we limit the initial > >> feature set to JSONB dot-notation for now? In other words, if we aim > >> to fully support JSON simplified accessors for the plain JSON type, > >> should we handle support for plain JSON subscripting as a follow-up > >> effort? > >> > >> 2. I have yet to have a more thorough review of Nikita’s patches. > >> One area I am not familiar with is the hstore-related changes. How > >> relevant is hstore to the JSON simplified accessor? > >> > > > > We can't change the way the "->" operator works, as there could well be > > uses of it in the field that rely on its current behaviour. But maybe we > > could invent a new operator which is compliant with the standard > > semantics for dot access, and call that. Then we'd get the best > > performance, and also we might be able to implement it for the plain > > JSON type. If that proves not possible we can think about not > > implementing for plain JSON, but I'd rather not go there until we have > to. > > Yes, I think writing a custom operator that is similar to "->" but has > the required semantics is the best way forward. (Maybe it can be just a > function?) > > > I don't think we should be including hstore changes here - we should > > just be aiming at implementing the standard for JSON access. hstore > > changes if any should be a separate feature. The aren't relevant to JSON > > access, although they might use some of the same infrastructure, > > depending on implementation. > > In a future version, the operator/function mentioned above could be a > catalogued property of a type, similar to typsubscript. Then you could > also apply this to other types. But let's leave that for later. > > If I understand it correctly, Nikita's patch uses the typsubscript > support function to handle both bracket subscripting and dot notation. > I'm not sure if it's right to mix these two together. Maybe I didn't > understand that correctly. > I’ve been working on a custom operator-like function to support dot notation in lax mode for JSONB. However, I realized this approach has the following drawbacks: 1. Handling both dot notation and bracket subscripting together becomes complicated, as we still need to consider jsonb’s existing type subscript functions. 2. Chaining N dot-access operators causes multiple unnecessary deserialization/serialization cycles: for each operator call, the source jsonb binary is converted to an in-memory JsonbValue, then the relevant field is extracted, and finally it’s turned back into a binary jsonb object. This hurts performance. A direct use of the jsonpath functions API seems more efficient. 3. Correctly applying lax mode requires different handling for the first, middle, and last operators, which adds further complexity. Because of these issues, I took a closer look at Nikita’s patch. His solution generalizes the existing jsonb typesubscript support function to handle both bracket subscripting and dot notation. It achieves this by translating dot notation into a jsonpath expression during transformation, and then calls JsonPathQuery at execution. Overall, I find this approach more efficient for chained accessors and more flexible for future enhancements. I attached a minimized version of Nikita’s patch (v7): - The first three patches are refactoring steps that could be squashed if preferred. - The last two patches implement dot notation and wildcard access, respectively. Changes in this new version: - Removed code handling hstore, as Andrew pointed out it isn’t directly relevant to JSON access and should be handled separately. - Split tests for dot notation and wildcard access. - Dropped the two patches in v6 that enabled non-parenthesized column references (per Nikita’s suggestion, this will need its own separate discussion). For reference, I’ve also attached the operator-like function approach in 0001-WIP-Operator-approach-JSONB-dot-notation.txt. I’d appreciate any feedback and thoughts! Best, Alex
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-02-05T13:39:15Z
Hi, On Wed, Feb 5, 2025 at 1:20 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > I attached a minimized version of Nikita’s patch (v7): > > - The first three patches are refactoring steps that could be squashed > if preferred. > - The last two patches implement dot notation and wildcard access, > respectively. > > Changes in this new version: > - Removed code handling hstore, as Andrew pointed out it isn’t > directly relevant to JSON access and should be handled separately. > - Split tests for dot notation and wildcard access. > - Dropped the two patches in v6 that enabled non-parenthesized column > references (per Nikita’s suggestion, this will need its own separate > discussion). > It appears that the Commitfest app selected the wrong patch. Sorry about the confusion! I'm reposting the patches to correct this.
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-02-27T15:46:48Z
Hello hackers, I’ve fixed the compilation failure for hstore and updated the patches. In this version, I’ve further cleaned up the code and added more comments. I hope this helps! Summary of changes: v8-0001 through v8-0005: Refactoring and preparatory steps for the actual implementation. v8-0006 (Implement read-only dot notation for JSONB): I removed the vars field (introduced in v7) from JsonbSubWorkspace after realizing that JsonPathVariable is not actually needed for dot-notation. v8-0007 (Allow wildcard member access for JSONB): I'm aware that the #if 0 in check_indirection() is not ideal. I haven’t removed it yet because I’m still reviewing other cases—beyond our JSONB simplified accessor use case—where this check should remain strict. I’ll post an additional patch to address this. Looking forward to comments and feedback! Thanks, Alex
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-02-27T23:23:30Z
Hello hackers, On Thu, Feb 27, 2025 at 9:46 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > Summary of changes: > > v8-0001 through v8-0005: > Refactoring and preparatory steps for the actual implementation. > > v8-0006 (Implement read-only dot notation for JSONB): > I removed the vars field (introduced in v7) from JsonbSubWorkspace > after realizing that JsonPathVariable is not actually needed for > dot-notation. > > v8-0007 (Allow wildcard member access for JSONB): > I'm aware that the #if 0 in check_indirection() is not ideal. I > haven’t removed it yet because I’m still reviewing other cases—beyond > our JSONB simplified accessor use case—where this check should remain > strict. I’ll post an additional patch to address this. > I made the following minor changes in v9: - More detailed commit messages - Additional tests - Use "?column?" as the column name for trailing .*. Other than that, the patches remain the same as the previous version: 0001 - 0005: preparation steps for the actual implementation and do not change or add new behavior. 0006: jsonb dot notation and sliced subscripting 0007: jsonb wildcard member access Thanks, Alex
-
Re: SQL:2023 JSON simplified accessor support
Matheus Alcantara <matheusssilv97@gmail.com> — 2025-03-03T19:16:09Z
Hi Alex, The code comments and the commit messages help a lot when reviewing! Thanks for the new version. The code LGTM and check-world is happy. I've also performed some tests and everything looks good! Just some minor points about this new version: ## v9-0005 Typo on commit message title ## v9-0006 > + * The following functions create various types of JsonPathParseItem nodes, > + * which are used to build JsonPath expressions for jsonb simplified accessor. > Just to avoid misinterpretation I think that we can replace "The following functions" with "The make_jsonpath_item_* functions" since we can have more functions in the future that are not fully related with these. Does that make sense? -- Matheus Alcantara
-
Re: SQL:2023 JSON simplified accessor support
Matheus Alcantara <matheusssilv97@gmail.com> — 2025-03-03T19:43:07Z
On Mon, Mar 3, 2025 at 4:16 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote: > > Hi Alex, > > The code comments and the commit messages help a lot when reviewing! Thanks for > the new version. > > The code LGTM and check-world is happy. I've also performed some tests and > everything looks good! > > Just some minor points about this new version: > > ## v9-0005 > > Typo on commit message title > > ## v9-0006 > > > + * The following functions create various types of JsonPathParseItem nodes, > > + * which are used to build JsonPath expressions for jsonb simplified accessor. > > > Just to avoid misinterpretation I think that we can replace "The following > functions" with "The make_jsonpath_item_* functions" since we can have more > functions in the future that are not fully related with these. Does that make > sense? > Sorry, I've forgotten to include a question. Do you have anything in mind about documentation changes for this patch? -- Matheus Alcantara
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-03-03T20:22:49Z
Hi Matheus, On Mon, Mar 3, 2025 at 1:43 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote: > On Mon, Mar 3, 2025 at 4:16 PM Matheus Alcantara > <matheusssilv97@gmail.com> wrote: > > > > Hi Alex, > > > > The code comments and the commit messages help a lot when reviewing! > Thanks for > > the new version. > > > > The code LGTM and check-world is happy. I've also performed some tests > and > > everything looks good! > Just some minor points about this new version: > > > > ## v9-0005 > > > > Typo on commit message title > ## v9-0006 > > > > > + * The following functions create various types of JsonPathParseItem > nodes, > > > + * which are used to build JsonPath expressions for jsonb simplified > accessor. > > > > > Just to avoid misinterpretation I think that we can replace "The > following > > functions" with "The make_jsonpath_item_* functions" since we can have > more > > functions in the future that are not fully related with these. Does that > make > > sense? > Thank you so much for reviewing! I've attached v10, which addresses your feedback. On Mon, Mar 3, 2025 at 1:43 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote: > Sorry, I've forgotten to include a question. Do you have anything in mind > about > documentation changes for this patch? > For the documentation, I’m thinking of adding it under JSON Types [1]. I’d either add a new “Simple Dot-Notation” section after jsonb subscripting [2] or replace it. Let me know what you think. [1] https://www.postgresql.org/docs/current/datatype-json.html#DATATYPE-JSON [2] https://www.postgresql.org/docs/current/datatype-json.html#JSONB-SUBSCRIPTING Best, Alex
-
Re: SQL:2023 JSON simplified accessor support
Mark Dilger <mark.dilger@enterprisedb.com> — 2025-03-04T14:05:18Z
On Mon, Mar 3, 2025 at 12:23 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > I've attached v10, which addresses your feedback. > > Hi Alex! Thanks for the patches. In src/test/regress/sql/jsonb.sql, the section marked with "-- slices are not supported" should be relabeled. That comment predates these patches, and is now misleading. A bit further down in expected/jsonb.out, there is an expected failure, but no SQL comment to indicate that it is expected: +SELECT (t.jb).* FROM test_jsonb_dot_notation; +ERROR: missing FROM-clause entry for table "t" +LINE 1: SELECT (t.jb).* FROM test_jsonb_dot_notation; Perhaps a "-- fails" comment would clarify? Then, further down, +SELECT (jb).a.**.x FROM test_jsonb_dot_notation; -- not supported +ERROR: syntax error at or near "**" +LINE 1: SELECT (jb).a.**.x FROM test_jsonb_dot_notation; I wonder if it would be better to have the parser handle this case and raise a ERRCODE_FEATURE_NOT_SUPPORTED instead? I got curious about the support for this new dot notation in the plpgsql parser and tried: +DO $$ +DECLARE + a jsonb := '[1,2,3,4,5,6,7]'::jsonb; +BEGIN + WHILE a IS NOT NULL + LOOP + RAISE NOTICE '%', a; + a := a[2:]; + END LOOP; +END +$$ LANGUAGE plpgsql; +NOTICE: [1, 2, 3, 4, 5, 6, 7] +NOTICE: [3, 4, 5, 6, 7] +NOTICE: [5, 6, 7] +NOTICE: 7 which looks good! But then I tried: +DO $$ +DECLARE + a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}'::jsonb; +BEGIN + WHILE a IS NOT NULL + LOOP + RAISE NOTICE '%', a; + a := COALESCE(a."NU", a[2]); + END LOOP; +END +$$ LANGUAGE plpgsql; +NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]} +ERROR: missing FROM-clause entry for table "a" +LINE 1: a := COALESCE(a."NU", a[2]) + ^ +QUERY: a := COALESCE(a."NU", a[2]) +CONTEXT: PL/pgSQL function inline_code_block line 8 at assignment which suggests the plpgsql parser does not recognize a."NU" as we'd expect. Any thoughts on this? I notice there are no changes in src/interfaces/ecpg/test, which concerns me. The sqljson.pgc and sqljson_jsontable.pgc files are already testing json handling in ecpg; perhaps just extend those a bit? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -
Re: SQL:2023 JSON simplified accessor support
Mark Dilger <mark.dilger@enterprisedb.com> — 2025-03-04T15:34:53Z
On Tue, Mar 4, 2025 at 6:05 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > But then I tried: > > +DO $$ > +DECLARE > + a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, > -8]}'::jsonb; > +BEGIN > + WHILE a IS NOT NULL > + LOOP > + RAISE NOTICE '%', a; > + a := COALESCE(a."NU", a[2]); > + END LOOP; > +END > +$$ LANGUAGE plpgsql; > +NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]} > +ERROR: missing FROM-clause entry for table "a" > +LINE 1: a := COALESCE(a."NU", a[2]) > + ^ > +QUERY: a := COALESCE(a."NU", a[2]) > +CONTEXT: PL/pgSQL function inline_code_block line 8 at assignment > > which suggests the plpgsql parser does not recognize a."NU" as we'd > expect. Any thoughts on this? > I should mention that +DO $$ +DECLARE + a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]}'::jsonb; +BEGIN + WHILE a IS NOT NULL + LOOP + RAISE NOTICE '%', a; + a := COALESCE((a)."NU", (a)[2]); + END LOOP; +END +$$ LANGUAGE plpgsql; +NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]} +NOTICE: [{"": [[3]]}, [6], [2], "bCi"] +NOTICE: [2] works fine. I guess that is good enough. Should we add these to the sql/jsonb.sql to document the expected behavior, both with the error when using plain "a" and with the correct output when using "(a)"? The reason I mention this is that the plpgsql parser might get changed at some point, and without a test case, we might not notice if this breaks. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -
Re: SQL:2023 JSON simplified accessor support
Andrew Dunstan <andrew@dunslane.net> — 2025-03-04T18:13:39Z
On 2025-03-04 Tu 10:34 AM, Mark Dilger wrote: > > > On Tue, Mar 4, 2025 at 6:05 AM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: > > But then I tried: > > +DO $$ > +DECLARE > + a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], > "aaf": [-6, -8]}'::jsonb; > +BEGIN > + WHILE a IS NOT NULL > + LOOP > + RAISE NOTICE '%', a; > + a := COALESCE(a."NU", a[2]); > + END LOOP; > +END > +$$ LANGUAGE plpgsql; > +NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": > [-6, -8]} > +ERROR: missing FROM-clause entry for table "a" > +LINE 1: a := COALESCE(a."NU", a[2]) > + ^ > +QUERY: a := COALESCE(a."NU", a[2]) > +CONTEXT: PL/pgSQL function inline_code_block line 8 at assignment > > which suggests the plpgsql parser does not recognize a."NU" as > we'd expect. Any thoughts on this? > > > I should mention that > > +DO $$ > +DECLARE > + a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": > [-6, -8]}'::jsonb; > +BEGIN > + WHILE a IS NOT NULL > + LOOP > + RAISE NOTICE '%', a; > + a := COALESCE((a)."NU", (a)[2]); > + END LOOP; > +END > +$$ LANGUAGE plpgsql; > +NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]} > +NOTICE: [{"": [[3]]}, [6], [2], "bCi"] > +NOTICE: [2] > works fine. I guess that is good enough. Should we add these to the > sql/jsonb.sql to document the expected behavior, both with the error > when using plain "a" and with the correct output when using "(a)"? > The reason I mention this is that the plpgsql parser might get changed > at some point, and without a test case, we might not notice if this > breaks. > Yes, I think so. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-03-13T14:02:06Z
Hi Mark, Thank you so much for reviewing! I have attached the new patches. On Tue, Mar 4, 2025 at 8:05 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > On Mon, Mar 3, 2025 at 12:23 PM Alexandra Wang < > alexandra.wang.oss@gmail.com> wrote: > >> I've attached v10, which addresses your feedback. >> >> > Hi Alex! Thanks for the patches. > > In src/test/regress/sql/jsonb.sql, the section marked with "-- slices are > not supported" should be relabeled. That comment predates these patches, > and is now misleading. > > A bit further down in expected/jsonb.out, there is an expected failure, > but no SQL comment to indicate that it is expected: > > +SELECT (t.jb).* FROM test_jsonb_dot_notation; > +ERROR: missing FROM-clause entry for table "t" > +LINE 1: SELECT (t.jb).* FROM test_jsonb_dot_notation; > > Perhaps a "-- fails" comment would clarify? Then, further down, > Fixed. > > +SELECT (jb).a.**.x FROM test_jsonb_dot_notation; -- not supported > +ERROR: syntax error at or near "**" > +LINE 1: SELECT (jb).a.**.x FROM test_jsonb_dot_notation; > > I wonder if it would be better to have the parser handle this case and > raise a ERRCODE_FEATURE_NOT_SUPPORTED instead? > In 0008 I added a new token named "DOUBLE_ASTERISK" to the lexers to represent "**". Hope this helps! > I got curious about the support for this new dot notation in the plpgsql > parser and tried: > > +DO $$ > +DECLARE > + a jsonb := '[1,2,3,4,5,6,7]'::jsonb; > +BEGIN > + WHILE a IS NOT NULL > + LOOP > + RAISE NOTICE '%', a; > + a := a[2:]; > + END LOOP; > +END > +$$ LANGUAGE plpgsql; > +NOTICE: [1, 2, 3, 4, 5, 6, 7] > +NOTICE: [3, 4, 5, 6, 7] > +NOTICE: [5, 6, 7] > +NOTICE: 7 > > which looks good! But then I tried: > > +DO $$ > +DECLARE > + a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, > -8]}'::jsonb; > +BEGIN > + WHILE a IS NOT NULL > + LOOP > + RAISE NOTICE '%', a; > + a := COALESCE(a."NU", a[2]); > + END LOOP; > +END > +$$ LANGUAGE plpgsql; > +NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]} > +ERROR: missing FROM-clause entry for table "a" > +LINE 1: a := COALESCE(a."NU", a[2]) > + ^ > +QUERY: a := COALESCE(a."NU", a[2]) > +CONTEXT: PL/pgSQL function inline_code_block line 8 at assignment > > which suggests the plpgsql parser does not recognize a."NU" as we'd > expect. Any thoughts on this? > Thanks for the tests! I added them to the "jsonb" regress test. > I notice there are no changes in src/interfaces/ecpg/test, which concerns > me. The sqljson.pgc and sqljson_jsontable.pgc files are already testing > json handling in ecpg; perhaps just extend those a bit? > Thanks for bringing this up! I have added new tests in src/interfaces/ecpg/test/sql/sqljson.pgc. Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Peter Eisentraut <peter@eisentraut.org> — 2025-03-20T14:18:07Z
This patch set has expanded significantly in scope recently, which is probably the right thing, but that means there won't be enough time to review and finish it for PG18. So I'm moving this to the next commitfest now. On 13.03.25 15:02, Alexandra Wang wrote: > Hi Mark, > > Thank you so much for reviewing! I have attached the new patches. > > On Tue, Mar 4, 2025 at 8:05 AM Mark Dilger <mark.dilger@enterprisedb.com > <mailto:mark.dilger@enterprisedb.com>> wrote: > > > On Mon, Mar 3, 2025 at 12:23 PM Alexandra Wang > <alexandra.wang.oss@gmail.com <mailto:alexandra.wang.oss@gmail.com>> > wrote: > > I've attached v10, which addresses your feedback. > > > Hi Alex! Thanks for the patches. > > In src/test/regress/sql/jsonb.sql, the section marked with "-- > slices are not supported" should be relabeled. That comment > predates these patches, and is now misleading. > > A bit further down in expected/jsonb.out, there is an expected > failure, but no SQL comment to indicate that it is expected: > > +SELECT (t.jb).* FROM test_jsonb_dot_notation; > +ERROR: missing FROM-clause entry for table "t" > +LINE 1: SELECT (t.jb).* FROM test_jsonb_dot_notation; > > Perhaps a "-- fails" comment would clarify? Then, further down, > > > Fixed. > > +SELECT (jb).a.**.x FROM test_jsonb_dot_notation; -- not supported > +ERROR: syntax error at or near "**" > +LINE 1: SELECT (jb).a.**.x FROM test_jsonb_dot_notation; > > I wonder if it would be better to have the parser handle this case > and raise a ERRCODE_FEATURE_NOT_SUPPORTED instead? > > > In 0008 I added a new token named "DOUBLE_ASTERISK" to the lexers to > represent "**". Hope this helps! > > I got curious about the support for this new dot notation in the > plpgsql parser and tried: > > +DO $$ > +DECLARE > + a jsonb := '[1,2,3,4,5,6,7]'::jsonb; > +BEGIN > + WHILE a IS NOT NULL > + LOOP > + RAISE NOTICE '%', a; > + a := a[2:]; > + END LOOP; > +END > +$$ LANGUAGE plpgsql; > +NOTICE: [1, 2, 3, 4, 5, 6, 7] > +NOTICE: [3, 4, 5, 6, 7] > +NOTICE: [5, 6, 7] > +NOTICE: 7 > > which looks good! But then I tried: > > +DO $$ > +DECLARE > + a jsonb := '{"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": > [-6, -8]}'::jsonb; > +BEGIN > + WHILE a IS NOT NULL > + LOOP > + RAISE NOTICE '%', a; > + a := COALESCE(a."NU", a[2]); > + END LOOP; > +END > +$$ LANGUAGE plpgsql; > +NOTICE: {"": 6, "NU": [{"": [[3]]}, [6], [2], "bCi"], "aaf": [-6, -8]} > +ERROR: missing FROM-clause entry for table "a" > +LINE 1: a := COALESCE(a."NU", a[2]) > + ^ > +QUERY: a := COALESCE(a."NU", a[2]) > +CONTEXT: PL/pgSQL function inline_code_block line 8 at assignment > > which suggests the plpgsql parser does not recognize a."NU" as we'd > expect. Any thoughts on this? > > > Thanks for the tests! I added them to the "jsonb" regress test. > > I notice there are no changes in src/interfaces/ecpg/test, which > concerns me. The sqljson.pgc and sqljson_jsontable.pgc files are > already testing json handling in ecpg; perhaps just extend those a bit? > > Thanks for bringing this up! I have added new tests in src/interfaces/ > ecpg/test/sql/sqljson.pgc. > > > Best, > Alex -
Re: SQL:2023 JSON simplified accessor support
Vik Fearing <vik@postgresfriends.org> — 2025-03-27T14:27:50Z
On 13/03/2025 15:02, Alexandra Wang wrote: > Hi Mark, > > Thank you so much for reviewing! I have attached the new patches. > Hi Alex, I am reviewing this from a feature perspective and not from a code perspective. On the whole, this looks good to me from a standards point of view. There are two things that stand out to me about this feature: 1) I am not seeing the ** syntax in the standard, so does it need to be signaled as not supported? Perhaps I am just overlooking something. 2) There is no support for <JSON item method>. Since this appears to be constructing a jsonpath query, could that not be added to the syntax and allow jsonpath to throw the error if the function doesn't exist? -- Vik Fearing
-
Re: SQL:2023 JSON simplified accessor support
Nikita Malakhov <hukutoc@gmail.com> — 2025-04-23T16:54:36Z
Hi Alex! Glad you made so much effort to develop this patch set! I think this is an important part of Json functionality. I've looked into you patch and noticed change in behavior in new test results: postgres@postgres=# create table t(x int, y jsonb); insert into t select 1, '{"a": 1, "b": 42}'::jsonb; insert into t select 1, '{"a": 2, "b": {"c": 42}}'::jsonb; insert into t select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::jsonb; CREATE TABLE Time: 6.373 ms INSERT 0 1 Time: 3.299 ms INSERT 0 1 Time: 2.532 ms INSERT 0 1 Time: 2.453 ms Original master: postgres@postgres=# select (t.y).b.c.d.e from t; ERROR: column notation .b applied to type jsonb, which is not a composite type LINE 1: select (t.y).b.c.d.e from t; ^ Time: 0.553 ms Patched (with v11): postgres@postgres=# select (t.y).b.c.d.e from t; e --- (3 rows) Is this correct? -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/ -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-06-23T14:22:07Z
Hi Vik, On Thu, Mar 27, 2025 at 3:27 PM Vik Fearing <vik@postgresfriends.org> wrote: > > I am reviewing this from a feature perspective and not from a code > perspective. On the whole, this looks good to me from a standards point > of view. > Thank you so much for reviewing! On Thu, Mar 27, 2025 at 3:27 PM Vik Fearing <vik@postgresfriends.org> wrote: > There are two things that stand out to me about this feature: > > > 1) I am not seeing the ** syntax in the standard, so does it need to be > signaled as not supported? Perhaps I am just overlooking something. > I think you’re referring to patch v11-0008, which adds the ** token in the lexer. You’re right — the ** syntax is not mentioned at all in the SQL standard. I added lexer support for ** based on earlier feedback from Mark Dilger. See his comment below: On Thu, Mar 13, 2025 at 3:02 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > On Tue, Mar 4, 2025 at 8:05 AM Mark Dilger <mark.dilger@enterprisedb.com> > wrote: > >> >> > +SELECT (jb).a.**.x FROM test_jsonb_dot_notation; -- not supported >> +ERROR: syntax error at or near "**" >> +LINE 1: SELECT (jb).a.**.x FROM test_jsonb_dot_notation; >> >> I wonder if it would be better to have the parser handle this case and >> raise a ERRCODE_FEATURE_NOT_SUPPORTED instead? >> > > In 0008 I added a new token named "DOUBLE_ASTERISK" to the lexers to > represent "**". Hope this helps! > The ** tests are there because ** is valid in jsonpath, and can be used in functions like json_query() and jsonb_path_query(). So I think we should make it explicit to users that ** is not supported by dot access. As long as the tests make that clear, I don’t have a strong preference whether we throw a syntax error in the lexer or a “feature not supported” error in the parser. If we prefer a syntax error, I’m happy to drop patch 0008. On Thu, Mar 27, 2025 at 3:27 PM Vik Fearing <vik@postgresfriends.org> wrote: > 2) There is no support for <JSON item method>. Since this appears to be > constructing a jsonpath query, could that not be added to the syntax and > allow jsonpath to throw the error if the function doesn't exist? > You’re right — there’s currently no support for <JSON item method>. I intentionally didn’t include it in this patch because it would again expand the scope of work. Supporting <JSON item method> seems non-trivial to me, because the current design transforms expression nodes separately for each indirection. I previously tried a patch that simply converted the full chain of indirections into a JSON_QUERY() call in transformIndirection(). The problem with that approach is that in EXPLAIN output or view definitions, we’d see the rewritten JSON_QUERY() instead of the original dot notation — which feels like a leaky abstraction. To support item methods, I think we’d need to add a new kind of indirection to represent function calls. Then, we could either: a) explicitly add each item method token in gram.y, similar to the list of method keywords in jsonpath_gram.y; or b) allow a generic method-call token in gram.y, and transform it into the corresponding JsonPathItemType that jsonpath understands. Either way, I’m happy to work on it, but would prefer to discuss and implement it in a separate follow-up patch. P.S. After reading your second question, your first one makes more sense to me — I’ve already taken the easy route of not parsing item methods and was planning to leave that work for a follow-up patch. So maybe it’s more consistent to just let the syntax error for ** happen, rather than explicitly throwing a “feature not supported” error? The only difference, I think, is that we don’t want ** in simplified accessor syntax, whereas we do want to support item methods in the future. Let me know what you think! Best, Alex
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-06-23T14:34:34Z
Hi Nikita, Thank you so much for reviewing! On Wed, Apr 23, 2025 at 6:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote: > Hi Alex! > > Glad you made so much effort to develop this patch set! > I think this is an important part of Json functionality. > > I've looked into you patch and noticed change in behavior > in new test results: > > postgres@postgres=# create table t(x int, y jsonb); > insert into t select 1, '{"a": 1, "b": 42}'::jsonb; > insert into t select 1, '{"a": 2, "b": {"c": 42}}'::jsonb; > insert into t select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::jsonb; > CREATE TABLE > Time: 6.373 ms > INSERT 0 1 > Time: 3.299 ms > INSERT 0 1 > Time: 2.532 ms > INSERT 0 1 > Time: 2.453 ms > > Original master: > postgres@postgres=# select (t.y).b.c.d.e from t; > ERROR: column notation .b applied to type jsonb, which is not a composite > type > LINE 1: select (t.y).b.c.d.e from t; > ^ > Time: 0.553 ms > > Patched (with v11): > postgres@postgres=# select (t.y).b.c.d.e from t; > e > --- > > > > (3 rows) > > Is this correct? > This is correct. With this patch, the query should return 3 empty rows. We expect dot notation to behave the same as the json_query() below in lax mode with NULL ON EMPTY. postgres=# select json_query(y, 'lax $.b.c.d.e' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from t; json_query ------------ (3 rows) Best, Alex -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-06-24T13:29:51Z
hi. I have applied for 0001 to 0006. static void jsonb_subscript_transform(SubscriptingRef *sbsref, List **indirection, ParseState *pstate, bool isSlice, bool isAssignment) { List *upperIndexpr = NIL; ListCell *idx; sbsref->refrestype = JSONBOID; sbsref->reftypmod = -1; if (jsonb_check_jsonpath_needed(*indirection)) { sbsref->refjsonbpath = jsonb_subscript_make_jsonpath(pstate, indirection, &sbsref->refupperindexpr, &sbsref->reflowerindexpr); return; } foreach(idx, *indirection) { Node *i = lfirst(idx); A_Indices *ai; Node *subExpr; Assert(IsA(i, A_Indices)); ai = castNode(A_Indices, i); if (isSlice) { Node *expr = ai->uidx ? ai->uidx : ai->lidx; ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("jsonb subscript does not support slices"), parser_errposition(pstate, exprLocation(expr)))); } I am confused by the above error handling: errmsg("jsonb subscript does not support slices"). we can do select (jsonb '[1,2,3]')[0:1]; or SELECT (jb).a[2:3].b FROM test_jsonb_dot_notation; this is by definition, "slices"? Anyway, I doubt this error handling will ever be reachable. jsonb_check_jsonpath_needed checks whether the indirection contains is_slice, but jsonb_subscript_transform already takes isSlice as an argument. Maybe we can refactor it somehow. T_String is a primitive node type with no subnodes. typedef struct String { pg_node_attr(special_read_write) NodeTag type; char *sval; } String; then in src/backend/nodes/nodeFuncs.c: if (expr && !IsA(expr, String) && WALK(expr)) return true; we can change it to if (WALK(expr)) return true; but in function expression_tree_walker_impl we have to change it as case T_MergeSupportFunc: case T_String: /* primitive node types with no expression subnodes */ break; -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-06-25T05:56:51Z
hi. in src/backend/catalog/sql_features.txt should we mark any T860, T861, T862, T863, T864 items as YES? typedef struct SubscriptingRef { /* expressions that evaluate to upper container indexes */ List *refupperindexpr; } SubscriptingRef.refupperindexpr meaning changed, So the above comments also need to be changed? v11-0004-Extract-coerce_jsonpath_subscript.patch + /* + * We known from can_coerce_type that coercion will succeed, so + * coerce_type could be used. Note the implicit coercion context, which is + * required to handle subscripts of different types, similar to overloaded + * functions. + */ + subExpr = coerce_type(pstate, + subExpr, subExprType, + targetType, -1, + COERCION_IMPLICIT, + COERCE_IMPLICIT_CAST, + -1); + if (subExpr == NULL) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("jsonb subscript must have text type"), the targetType can be "integer", then the error message errmsg("jsonb subscript must have text type") would be wrong? Also this error handling is not necessary. since we can_coerce_type already tell us that coercion will succeed. Also, do we really put v11-0004 as a separate patch? in gram.y we have: if (!IsA($5, A_Const) || castNode(A_Const, $5)->val.node.type != T_String) ereport(ERROR, errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("only string constants are supported in JSON_TABLE path specification"), so simply, in make_jsonpath_item_expr we can expr = transformExpr(pstate, expr, pstate->p_expr_kind); if (!IsA(expr, Const) || ((Const *) expr)->consttype != INT4OID) ereport(ERROR, errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("only integer constants are supported in jsonb simplified accessor subscripting"), parser_errposition(pstate, exprLocation(expr))); because I think the current error message "got type: unknown" is not good. select ('123'::jsonb).a['1']; ERROR: jsonb simplified accessor supports subscripting in type: INT4, got type: unknown then we don't need two "ereport(ERROR" within make_jsonpath_item_expr we can also Assert (cnst->constisnull) is false. see gram.y:16976 I saw you introduce the word "AST", for example "Converts jsonpath AST into jsonpath value in binary." I am not sure that is fine. in jsonb_subscript_make_jsonpath we have: + foreach(lc, *indirection) + { + + if (IsA(accessor, String)) + .... + else if (IsA(accessor, A_Star)) + .... + else if (IsA(accessor, A_Indices)) + .... + else + break; Is the last else branch unreliable? since indirection only support for String, A_Star, A_Indices, we already have Assert in jsonb_check_jsonpath_needed to ensure that. + *indirection = list_delete_first_n(*indirection, pathlen); also this is not necessary, because pathlen will be the same length as list *indirection in current design. Please check the attached minor refactoring based on v11-0001 to v11-0006 -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-06-25T07:40:50Z
On Wed, Jun 25, 2025 at 1:56 PM jian he <jian.universality@gmail.com> wrote: > > hi. CREATE TABLE test_jsonb_dot_notation AS SELECT '{"a": [1, 2, {"b": "c"}, {"b": "d", "e": "f", "x": {"y": "yyy", "z": "zzz"}}], "b": [3, 4, {"b": "g", "x": {"y": "YYY", "z": "ZZZ"}}]}'::jsonb jb; CREATE VIEW v1 AS SELECT (jb).a[3].x.y FROM test_jsonb_dot_notation; CREATE VIEW v2 AS SELECT (jb).a[3:].x.y[:-1] FROM test_jsonb_dot_notation; CREATE VIEW v3 AS SELECT (jb).a[:3].x.y[-1:] FROM test_jsonb_dot_notation; \sv v2 \sv v3 will trigger segment fault. also do we need ban subscript slicing, when upper bound is less than lower bound, for example: SELECT (jb).a[3:].x.y[0:'-1'::integer] AS y FROM test_jsonb_dot_notation; please check the attached minor refactoring, which addresses the above issue based on patches v11-0001 through v11-0006. -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-06-27T02:09:28Z
hi. in gram.y we have: indirection_el: '.' attr_name { $$ = (Node *) makeString($2); } we can be sure that dot notation, following dot is a plain string. then in jsonb_subscript_transform, we can transform the String Node to a TEXTOID Const. with that, most of the v11-0005-Enable-String-node-as-field-accessors-in-generic.patch would be unnecessary. Also in v11-0006-Implement-read-only-dot-notation-for-jsonb.patch all these with pattern ``if (IsA(expr, String)`` can be removed. in transformContainerSubscripts we have: sbsref = makeNode(SubscriptingRef); sbsref->refcontainertype = containerType; sbsref->refelemtype = elementType; sbsref->reftypmod = containerTypMod; sbsref->refexpr = (Expr *) containerBase; sbsref->refassgnexpr = NULL; /* caller will fill if it's an assignment */ sbsroutines->transform(sbsref, indirection, pstate, isSlice, isAssignment); then jsonb_subscript_transform we have sbsref->refjsonbpath = jsonb_subscript_make_jsonpath(pstate, indirection, &sbsref->refupperindexpr, &sbsref->reflowerindexpr); of course sbsref->refupperindexpr, sbsref->reflowerindexpr is NIL since we first called jsonb_subscript_make_jsonpath. so we can simplify the function signature as static void jsonb_subscript_make_jsonpath(pstate, indirection, sbsref) Within jsonb_subscript_make_jsonpath we are going to populate refupperindexpr, reflowerindexpr, refjsonbpath. The attached patch addresses both of these issues, along with additional related refactoring. It consolidates all the changes I think are appropriate, based on patches v1-0001 to v1-0006. This will include patches previously I sent in the earlier thread. -
Re: SQL:2023 JSON simplified accessor support
Jelte Fennema-Nio <postgres@jeltef.nl> — 2025-06-28T23:52:38Z
On Thu, 13 Mar 2025 at 15:02, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > I have attached the new patches. Okay I finally found the time to look at this. I created a draft PR for pg_duckdb[1] to see if I would run into issues. There was only one real issue I found by doing this. The .* syntax is encoded as NULL in refupperindexpr, but that is currently already a valid representation of a slice operation without specifying upper or lower. The easiest way to reproduce the problem is: CREATE TABLE arr(a int[]); explain (verbose) SELECT a[:] FROM arr; QUERY PLAN ─────────────────────────────────────────────────────────────── Seq Scan on public.arr (cost=0.00..23.60 rows=1360 width=32) Output: a.* That last line should instead be Output: a[:] So I think we need to add a new Star node type, that represents the * literal after parsing. I haven't looked too closely at the code yet. -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-07-09T08:01:45Z
Hi Jian, On Tue, Jun 24, 2025 at 3:30 PM jian he <jian.universality@gmail.com> wrote: > hi. > > I have applied for 0001 to 0006. > Thank you so much for the very detailed and thorough review, and for the patches! I've attached v12 that addresses your feedback: v12-0001 to v12-0004 are refactoring patches that prepare for the implementation. v12-0005 - implements jsonb dot-notation (e.g., (jb).a) Changes from v11-0006 include: a. subscripting with slicing and wildcard-related code has been removed and will be added in the following commit b. introduced a dedicated FieldAccessorExpr node to transform Strings in dot-notation. c. mixed syntax (such as (jb).a['b'].c is now allowed, with warnings. d. miscellaneous bug fixes and code cleanup based on Jian’s feedback. v12-0006 - implements jsonb subscripting with slicing. This was split out from v11-0006 into a separate commit. v12-0007 - implements wildcard member accessor. I added the Star node that Jelte suggested for transforming the wildcard member accessor. Please find more detailed replies below: On Tue, Jun 24, 2025 at 3:30 PM jian he <jian.universality@gmail.com> wrote: > static void > jsonb_subscript_transform(SubscriptingRef *sbsref, > List **indirection, > ParseState *pstate, > bool isSlice, > bool isAssignment) > { > List *upperIndexpr = NIL; > ListCell *idx; > sbsref->refrestype = JSONBOID; > sbsref->reftypmod = -1; > if (jsonb_check_jsonpath_needed(*indirection)) > { > sbsref->refjsonbpath = > jsonb_subscript_make_jsonpath(pstate, indirection, > &sbsref->refupperindexpr, > &sbsref->reflowerindexpr); > return; > } > foreach(idx, *indirection) > { > Node *i = lfirst(idx); > A_Indices *ai; > Node *subExpr; > Assert(IsA(i, A_Indices)); > ai = castNode(A_Indices, i); > if (isSlice) > { > Node *expr = ai->uidx ? ai->uidx : ai->lidx; > ereport(ERROR, > (errcode(ERRCODE_DATATYPE_MISMATCH), > errmsg("jsonb subscript does not support slices"), > parser_errposition(pstate, exprLocation(expr)))); > } > > I am confused by the above error handling: > errmsg("jsonb subscript does not support slices"). > we can do > select (jsonb '[1,2,3]')[0:1]; > or > SELECT (jb).a[2:3].b FROM test_jsonb_dot_notation; > > this is by definition, "slices"? > Anyway, I doubt this error handling will ever be reachable. > jsonb_check_jsonpath_needed checks whether the indirection contains > is_slice, > but jsonb_subscript_transform already takes isSlice as an argument. > Maybe we can refactor it somehow. > Good catch! I believe this error handling was originally copied because the existing JSONB subscripting in Postgres doesn’t support slices. In v12, I’ve separated the slicing implementation into its own commit, distinct from the one that implements dot-notation. Hope that helps! On Tue, Jun 24, 2025 at 3:30 PM jian he <jian.universality@gmail.com> wrote: > T_String is a primitive node type with no subnodes. > typedef struct String > { > pg_node_attr(special_read_write) > NodeTag type; > char *sval; > } String; > then in src/backend/nodes/nodeFuncs.c: > if (expr && !IsA(expr, String) && WALK(expr)) > return true; > we can change it to > if (WALK(expr)) > return true; > but in function expression_tree_walker_impl > we have to change it as > > case T_MergeSupportFunc: > case T_String: > /* primitive node types with no expression subnodes */ > break; > You’re right, the changes to the walker-related code weren’t very clean. In v12, I introduced a new node, FieldAccessorExpr, to serve as the expression node for dot-notation. This helped minimize changes to the walker code and, in my opinion, results in a cleaner design than using a more general Const(text) node. On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote: > in src/backend/catalog/sql_features.txt > should we mark any T860, T861, T862, T863, T864 > items as YES? > I’ve updated T860 and T861. I’m not entirely sure about the rest so left them unchanged. On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote: > typedef struct SubscriptingRef > { > /* expressions that evaluate to upper container indexes */ > List *refupperindexpr; > } > SubscriptingRef.refupperindexpr meaning changed, > So the above comments also need to be changed? > Thanks. Done. On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote: > v11-0004-Extract-coerce_jsonpath_subscript.patch > + /* > + * We known from can_coerce_type that coercion will succeed, so > + * coerce_type could be used. Note the implicit coercion context, which is > + * required to handle subscripts of different types, similar to overloaded > + * functions. > + */ > + subExpr = coerce_type(pstate, > + subExpr, subExprType, > + targetType, -1, > + COERCION_IMPLICIT, > + COERCE_IMPLICIT_CAST, > + -1); > + if (subExpr == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_DATATYPE_MISMATCH), > + errmsg("jsonb subscript must have text type"), > > the targetType can be "integer", then the error message > errmsg("jsonb subscript must have text type") would be wrong? > Also this error handling is not necessary. > since we can_coerce_type already tell us that coercion will succeed. > Also, do we really put v11-0004 as a separate patch? > Thanks for pointing this out! I’ve removed the unnecessary error handling as you suggested. I kept v12-0004 as a separate patch to make the review easier, but I’ll leave it up to the committer to decide whether to squash it with the other commits. On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote: > in gram.y we have: > if (!IsA($5, A_Const) || > castNode(A_Const, $5)->val.node.type != T_String) > ereport(ERROR, > errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("only string constants are > supported in JSON_TABLE path specification"), > so simply, in make_jsonpath_item_expr we can > > expr = transformExpr(pstate, expr, pstate->p_expr_kind); > if (!IsA(expr, Const) || ((Const *) expr)->consttype != INT4OID) > ereport(ERROR, > errcode(ERRCODE_DATATYPE_MISMATCH), > errmsg("only integer constants are supported in jsonb > simplified accessor subscripting"), > parser_errposition(pstate, exprLocation(expr))); > > because I think the current error message "got type: unknown" is not good. > select ('123'::jsonb).a['1']; > ERROR: jsonb simplified accessor supports subscripting in type: INT4, > got type: unknown > then we don't need two "ereport(ERROR" within make_jsonpath_item_expr > we can also Assert (cnst->constisnull) is false. > see gram.y:16976 > Thanks for pointing this out! In v12, I added support for mixed subscripting syntax, and updated the ERROR (or WARNING) messages as follows: postgres=# SELECT (jb)['a'].b FROM test_jsonb_dot_notation; -- returns an array due to lax mode b ------------ ["c", "d"] (1 row) postgres=# SELECT (jb).a['b'] FROM test_jsonb_dot_notation; -- returns NULL due to strict mode with warnings WARNING: 01000: mixed usage of jsonb simplified accessor syntax and jsonb subscripting. LINE 1: SELECT (jb).a['b'] FROM test_jsonb_dot_notation; ^ HINT: use dot-notation for member access, or use non-null integer constants subscripting for array access. LOCATION: jsonb_subscript_make_jsonpath, jsonbsubs.c:366 a --- (1 row) postgres=# select ('{"a": 1}'::jsonb)['a':'b']; -- fails ERROR: 42804: only non-null integer constants are supported for jsonb simplified accessor subscripting LINE 1: select ('{"a": 1}'::jsonb)['a':'b']; ^ HINT: use int data type for subscripting with slicing. LOCATION: make_jsonpath_item_expr, jsonbsubs.c:218 So some of the ERRORs now disappear or downgrade to WARNINGs. On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote: > I saw you introduce the word "AST", for example > "Converts jsonpath AST into jsonpath value in binary." > I am not sure that is fine. > Fixed. On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> wrote: > in jsonb_subscript_make_jsonpath we have: > + foreach(lc, *indirection) > + { > + > + if (IsA(accessor, String)) > + .... > + else if (IsA(accessor, A_Star)) > + .... > + else if (IsA(accessor, A_Indices)) > + .... > + else > + break; > > Is the last else branch unreliable? since indirection only support for > String, A_Star, A_Indices, we already have Assert in > jsonb_check_jsonpath_needed > to ensure that. > + *indirection = list_delete_first_n(*indirection, pathlen); > also this is not necessary, > because pathlen will be the same length as list *indirection in current > design. > I kept this logic in v12, because in order to support the mixed usage of dot-notation and existing jsonb subscripting (e.g., (jb).a['b'].c), we need to switch between making jsonpath and transforming the upper indexes for evaluation. So now, *indirection = list_delete_first_n(*indirection, pathlen); is necessary, and pathlen can differ. The else break; can be removed, but I choose to keep it for now and added comments to clarify the behavior. On Wed, Jun 25, 2025 at 12:41 AM jian he <jian.universality@gmail.com> wrote: > On Wed, Jun 25, 2025 at 1:56 PM jian he <jian.universality@gmail.com> > wrote: > > > > hi. > > CREATE TABLE test_jsonb_dot_notation AS > SELECT '{"a": [1, 2, {"b": "c"}, {"b": "d", "e": "f", "x": {"y": > "yyy", "z": "zzz"}}], "b": [3, 4, {"b": "g", "x": {"y": "YYY", "z": > "ZZZ"}}]}'::jsonb jb; > CREATE VIEW v1 AS SELECT (jb).a[3].x.y FROM test_jsonb_dot_notation; > CREATE VIEW v2 AS SELECT (jb).a[3:].x.y[:-1] FROM test_jsonb_dot_notation; > CREATE VIEW v3 AS SELECT (jb).a[:3].x.y[-1:] FROM test_jsonb_dot_notation; > > \sv v2 > \sv v3 > will trigger segment fault. > Great catch, thanks! Fixed and added your tests. On Wed, Jun 25, 2025 at 12:41 AM jian he <jian.universality@gmail.com> wrote: > also do we need ban subscript slicing, when upper bound is less than > lower bound, > for example: > SELECT (jb).a[3:].x.y[0:'-1'::integer] AS y FROM test_jsonb_dot_notation; > I don't think we need to ban this case, since the SQL standard doesn't ban it, and the (empty) result seems reasonable. On Thu, Jun 26, 2025 at 7:10 PM jian he <jian.universality@gmail.com> wrote: > in gram.y we have: > indirection_el: > '.' attr_name > { > $$ = (Node *) makeString($2); > } > > we can be sure that dot notation, following dot is a plain string. > then in jsonb_subscript_transform, we can transform the String Node to > a TEXTOID Const. > with that, most of the > v11-0005-Enable-String-node-as-field-accessors-in-generic.patch > would be unnecessary. > Also in v11-0006-Implement-read-only-dot-notation-for-jsonb.patch > all these with pattern > ``if (IsA(expr, String)`` > can be removed. > Thanks for giving so much thought into this, I really appreciate it! As I mentioned earlier, instead of using the Const(text) node, I added a dedicated FieldAccessorExpr node for dot-notation. I think this addresses the same concern. On Thu, Jun 26, 2025 at 7:10 PM jian he <jian.universality@gmail.com> wrote: > in transformContainerSubscripts we have: > sbsref = makeNode(SubscriptingRef); > sbsref->refcontainertype = containerType; > sbsref->refelemtype = elementType; > sbsref->reftypmod = containerTypMod; > sbsref->refexpr = (Expr *) containerBase; > sbsref->refassgnexpr = NULL; /* caller will fill if it's an > assignment */ > sbsroutines->transform(sbsref, indirection, pstate, > isSlice, isAssignment); > > then jsonb_subscript_transform we have > sbsref->refjsonbpath = > jsonb_subscript_make_jsonpath(pstate, indirection, > &sbsref->refupperindexpr, > &sbsref->reflowerindexpr); > of course sbsref->refupperindexpr, sbsref->reflowerindexpr is NIL > since we first called jsonb_subscript_make_jsonpath. > > so we can simplify the function signature as > static void jsonb_subscript_make_jsonpath(pstate, indirection, sbsref) > > Within jsonb_subscript_make_jsonpath we are going to populate > refupperindexpr, reflowerindexpr, refjsonbpath. > You are right. I applied your suggestion. On Thu, Jun 26, 2025 at 7:10 PM jian he <jian.universality@gmail.com> wrote: > The attached patch addresses both of these issues, along with additional > related > refactoring. It consolidates all the changes I think are appropriate, > based on > patches v1-0001 to v1-0006. This will include patches previously I sent in > the earlier thread. > Thanks again for the patch! It was really helpful! I didn't directly apply it as I made a few different choices, but I think I have addressed all the points you covered in it. Let me know your thoughts! Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-07-09T08:10:36Z
Hi Jelte, On Sat, Jun 28, 2025 at 4:52 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > On Thu, 13 Mar 2025 at 15:02, Alexandra Wang > <alexandra.wang.oss@gmail.com> wrote: > > I have attached the new patches. > > Okay I finally found the time to look at this. I created a draft PR > for pg_duckdb[1] to see if I would run into issues. There was only one > real issue I found by doing this. The .* syntax is encoded as NULL in > refupperindexpr, but that is currently already a valid representation > of a slice operation without specifying upper or lower. The easiest > way to reproduce the problem is: > > CREATE TABLE arr(a int[]); > explain (verbose) SELECT a[:] FROM arr; > QUERY PLAN > ─────────────────────────────────────────────────────────────── > Seq Scan on public.arr (cost=0.00..23.60 rows=1360 width=32) > Output: a.* > > That last line should instead be > Output: a[:] > > So I think we need to add a new Star node type, that represents the * > literal after parsing. > > I haven't looked too closely at the code yet. > Thank you so much for testing and reviewing, I really appreciate it! I fixed the EXPLAIN in v12 by adding a Star node, as you suggested. Looking forward to your thoughts when you have time for a further review! Best, Alex
-
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-07-10T08:53:28Z
On Wed, Jul 9, 2025 at 4:02 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > Thanks again for the patch! It was really helpful! I didn't directly > apply it as I made a few different choices, but I think I have > addressed all the points you covered in it. > > Let me know your thoughts! > hi. in v12-0001 and v12-0002. in transformIndirection if (!newresult) { /* * generic subscripting failed; falling back to function call or * field selection for a composite type. */ Node *n; /* try to find function for field selection */ newresult = ParseFuncOrColumn(pstate, list_make1(n), list_make1(result), last_srf, NULL, false, location); } the above comments mentioning "function call" is wrong? you passed NULL for (FuncCall *fn) in ParseFuncOrColumn. and ParseFuncOrColumn comments says ```If fn is null, we're dealing with column syntax not function syntax.`` I think coerce_jsonpath_subscript can be further simplified. we already have message like: errhint("jsonb subscript must be coercible to either integer or text."), no need to pass the third argument a constant (INT4OID). also ``Oid targetType = UNKNOWNOID;`` set it as InvalidOid would be better. attached is a minor refactoring of coerce_jsonpath_subscript based on (v12-0001 to v12-0004). after applied v12-0001 to v12-0006 + /* emit warning conditionally to minimize duplicate warnings */ + if (list_length(*indirection) > 0) + ereport(WARNING, + errcode(ERRCODE_WARNING), + errmsg("mixed usage of jsonb simplified accessor syntax and jsonb subscripting."), + errhint("use dot-notation for member access, or use non-null integer constants subscripting for array access."), + parser_errposition(pstate, warning_location)); src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]; WARNING: mixed usage of jsonb simplified accessor syntax and jsonb subscripting. LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... ^ HINT: use dot-notation for member access, or use non-null integer constants subscripting for array access. ERROR: subscript type bigint is not supported LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... ^ HINT: jsonb subscript must be coercible to either integer or text. The above example looks very bad. location printed twice, hint message is different. two messages level (ERROR, WARNING). also "or use non-null integer constants subscripting for array access." seems wrong? as you can see the below hint message saying it could be text or integer. select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8]; ERROR: subscript type bigint is not supported LINE 1: ...ect ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8]... ^ HINT: jsonb subscript must be coercible to either integer or text. also select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)[NULL::int4]; return NULL, so "use non-null integer constants" is wrong. -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-07-10T13:34:12Z
On Thu, Jul 10, 2025 at 4:53 PM jian he <jian.universality@gmail.com> wrote: > > src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]; > WARNING: mixed usage of jsonb simplified accessor syntax and jsonb > subscripting. > LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... > ^ > HINT: use dot-notation for member access, or use non-null integer > constants subscripting for array access. > ERROR: subscript type bigint is not supported > LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... > ^ > HINT: jsonb subscript must be coercible to either integer or text. > > The above example looks very bad. location printed twice, hint message > is different. > two messages level (ERROR, WARNING). > For plainSELECT statement, we have WARNING only in src/test/regress/expected/xml.out, src/test/regress/expected/xml_2.out for example: SELECT xpath('/*', '<relativens xmlns=''relative''/>'); WARNING: line 1: xmlns: URI relative is not absolute <relativens xmlns='relative'/> ^ xpath -------------------------------------- {"<relativens xmlns=\"relative\"/>"} (1 row) so i am not sure a plain SELECT statement issuing WARNING is appropriate. ------------------------------------------ in jsonb_subscript_make_jsonpath we have foreach(lc, *indirection) { if (IsA(accessor, String)) .... else if (IsA(accessor, A_Indices)) else /* * Unsupported node type for creating jsonpath. Instead of * throwing an ERROR, break here so that we create a jsonpath from * as many indirection elements as we can and let * transformIndirection() fallback to alternative logic to handle * the remaining indirection elements. */ break; } the above ELSE branch comments look suspicious to me. transformIndirection->transformContainerSubscripts->jsonb_subscript_transform->jsonb_subscript_make_jsonpath As you can see, transformIndirection have a long distance from jsonb_subscript_make_jsonpath, let transformIndirection handle remaining indirection elements seems not good. if you look at src/backend/parser/gram.y line 16990. transformIndirection(ParseState *pstate, A_Indirection *ind) ind->indirection can be be Node of String, A_Indices, A_Star also the above ELSE branch never reached in regress tests. ------------------------------------------ typedef struct FieldAccessorExpr { Expr xpr; char *fieldname; /* name of the JSONB object field accessed via * dot notation */ Oid faecollid pg_node_attr(query_jumble_ignore); int location; } FieldAccessorExpr; first field as NodeTag should be just fine? I am not sure the field "location" is needed now, if it is needed, it should be type as ParseLoc. we should add it to src/tools/pgindent/typedefs.list -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-07-11T05:53:52Z
On Thu, Jul 10, 2025 at 9:34 PM jian he <jian.universality@gmail.com> wrote: > > ------------------------------------------ > in jsonb_subscript_make_jsonpath we have > foreach(lc, *indirection) > { > if (IsA(accessor, String)) > .... > else if (IsA(accessor, A_Indices)) > else > /* > * Unsupported node type for creating jsonpath. Instead of > * throwing an ERROR, break here so that we create a jsonpath from > * as many indirection elements as we can and let > * transformIndirection() fallback to alternative logic to handle > * the remaining indirection elements. > */ > break; > } > the above ELSE branch comments look suspicious to me. > transformIndirection->transformContainerSubscripts->jsonb_subscript_transform->jsonb_subscript_make_jsonpath > As you can see, transformIndirection have a long distance from > jsonb_subscript_make_jsonpath, > let transformIndirection handle remaining indirection elements seems not good. > context v12-0001 to v12-0006. this ELSE branch comments is wrong, because + if (jsonb_check_jsonpath_needed(*indirection)) + { + jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); + if (sbsref->refjsonbpath) + return; + } in jsonb_check_jsonpath_needed we already use Assert to confirm that list "indirection" is either String or A_Indices Node. in transformContainerSubscripts we have sbsroutines->transform(sbsref, indirection, pstate, isSlice, isAssignment); /* * Error out, if datatype failed to consume any indirection elements. */ if (list_length(*indirection) == indirection_length) { Node *ind = linitial(*indirection); if (noError) return NULL; if (IsA(ind, String)) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("type %s does not support dot notation", format_type_be(containerType)), parser_errposition(pstate, exprLocation(containerBase)))); else if (IsA(ind, A_Indices)) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("type %s does not support array subscripting", format_type_be(containerType)), parser_errposition(pstate, exprLocation(containerBase)))); else elog(ERROR, "invalid indirection operation: %d", nodeTag(ind)); } sbsroutines->transform currently will call array_subscript_transform, hstore_subscript_transform, jsonb_subscript_transform in jsonb_subscript_transform callee we unconditionally do: *indirection = list_delete_first_n(*indirection, pathlen); *indirection = list_delete_first_n(*indirection, list_length(upperIndexpr)); in array_subscript_transform, we do *indirection = list_delete_first_n(*indirection, ndim); That means, if sbsroutines->transform not error out and indirection is not NIL (which is unlikely) then sbsroutines->transform will consume some induction elements. instead of the above verbose ereport(ERROR, error handling, we can use Assert Assert(indirection_length > list_length(*indirection)); for the above comments, i did a refactoring based on v12 (0001 to 0006). -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-08-21T04:53:53Z
Hi Jian, Thanks for reviewing! I’ve attached v13, which addresses your feedback. See my individual replies below, and let me know if you have any questions! On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote: > in v12-0001 and v12-0002. > in transformIndirection > if (!newresult) > { > /* > * generic subscripting failed; falling back to function call > or > * field selection for a composite type. > */ > Node *n; > /* try to find function for field selection */ > newresult = ParseFuncOrColumn(pstate, > list_make1(n), > list_make1(result), > last_srf, > NULL, > false, > location); > } > the above comments mentioning "function call" is wrong? > you passed NULL for (FuncCall *fn) in ParseFuncOrColumn. > and ParseFuncOrColumn comments says > ```If fn is null, we're dealing with column syntax not function syntax.`` > You are right. Fixed. On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote: > I think coerce_jsonpath_subscript can be further simplified. > we already have message like: > errhint("jsonb subscript must be coercible to either integer or text."), > no need to pass the third argument a constant (INT4OID). > also > ``Oid targetType = UNKNOWNOID;`` > set it as InvalidOid would be better. > attached is a minor refactoring of coerce_jsonpath_subscript > based on (v12-0001 to v12-0004). > Thanks for the patch! I squashed it with v13-0004 and renamed the function to coerce_jsonpath_subscript_to_int4_or_text() for clarity. On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote: > after applied v12-0001 to v12-0006 > + /* emit warning conditionally to minimize duplicate warnings */ > + if (list_length(*indirection) > 0) > + ereport(WARNING, > + errcode(ERRCODE_WARNING), > + errmsg("mixed usage of jsonb simplified accessor syntax and jsonb > subscripting."), > + errhint("use dot-notation for member access, or use non-null integer > constants subscripting for array access."), > + parser_errposition(pstate, warning_location)); > > src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]; > WARNING: mixed usage of jsonb simplified accessor syntax and jsonb > subscripting. > LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... > ^ > HINT: use dot-notation for member access, or use non-null integer > constants subscripting for array access. > ERROR: subscript type bigint is not supported > LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... > ^ > HINT: jsonb subscript must be coercible to either integer or text. > > The above example looks very bad. location printed twice, hint message > is different. > two messages level (ERROR, WARNING). > > also "or use non-null integer constants subscripting for array > access." seems wrong? > as you can see the below hint message saying it could be text or integer. > > select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8]; > ERROR: subscript type bigint is not supported > LINE 1: ...ect ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8]... > ^ > HINT: jsonb subscript must be coercible to either integer or text. > > also select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)[NULL::int4]; > return NULL, so "use non-null integer constants" is wrong. > I see your confusion about the WARNING/ERROR messages. My intent was to encourage users to use consistent syntax flavors rather than mixing the SQL standard JSON simplified accessor with the pre-standard PostgreSQL jsonb subscripting. In the meantime, I want to support mixed syntax flavors as much as possible. However, your feedback makes it clear that users don’t need that level of details. So, in v13 I’ve removed the WARNING message and instead added a comment explaining how we support mixed syntaxes. Here's the relevant diff between v12 and v13, hope it helps: --- a/src/backend/utils/adt/jsonbsubs.c +++ b/src/backend/utils/adt/jsonbsubs.c @@ -247,7 +247,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti ListCell *lc; Datum jsp; int pathlen = 0; - int warning_location = -1; sbsref->refupperindexpr = NIL; sbsref->reflowerindexpr = NIL; @@ -311,10 +310,27 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti if (jpi_from == NULL) { /* - * postpone emitting the warning until the end of this - * function + * Break out of the loop if the subscript is not a + * non-null integer constant, so that we can fall back to + * jsonb subscripting logic. + * + * This is needed to handle cases with mixed usage of SQL + * standard json simplified accessor syntax and PostgreSQL + * jsonb subscripting syntax, e.g: + * + * select (jb).a['b'].c from jsonb_table; + * + * where dot-notation (.a and .c) is the SQL standard json + * simplified accessor syntax, and the ['b'] subscript is + * the PostgreSQL jsonb subscripting syntax, because 'b' + * is not a non-null constant integer and cannot be used + * for json array access. + * + * In this case, we cannot create a JsonPath item, so we + * break out of the loop and let + * jsonb_subscript_transform() handle this indirection as + * a PostgreSQL jsonb subscript. */ - warning_location = exprLocation(ai->uidx); pfree(jpi->value.array.elems); pfree(jpi); break; @@ -360,14 +376,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti *indirection = list_delete_first_n(*indirection, pathlen); - /* emit warning conditionally to minimize duplicate warnings */ - if (list_length(*indirection) > 0) - ereport(WARNING, - errcode(ERRCODE_WARNING), - errmsg("mixed usage of jsonb simplified accessor syntax and jsonb subscripting."), - errhint("use dot-notation for member access, or use non-null integer constants subscripting for array access."), - parser_errposition(pstate, warning_location)); - jsp = jsonPathFromParseResult(&jpres, 0, NULL); On Thu, Jul 10, 2025 at 6:34 AM jian he <jian.universality@gmail.com> wrote: > typedef struct FieldAccessorExpr > { > Expr xpr; > char *fieldname; /* name of the JSONB object field > accessed via > * dot notation */ > Oid faecollid pg_node_attr(query_jumble_ignore); > int location; > } FieldAccessorExpr; > > first field as NodeTag should be just fine? > I am not sure the field "location" is needed now, if it is needed, it > should be > type as ParseLoc. > we should add it to src/tools/pgindent/typedefs.list Good catch! I changed the first field from Expr to NodeTag, removed the location field, and added the Node to typedefs.list. On Thu, Jul 10, 2025 at 10:54 PM jian he <jian.universality@gmail.com> wrote: > On Thu, Jul 10, 2025 at 9:34 PM jian he <jian.universality@gmail.com> > wrote: > > > > ------------------------------------------ > > in jsonb_subscript_make_jsonpath we have > > foreach(lc, *indirection) > > { > > if (IsA(accessor, String)) > > .... > > else if (IsA(accessor, A_Indices)) > > else > > /* > > * Unsupported node type for creating jsonpath. Instead of > > * throwing an ERROR, break here so that we create a > jsonpath from > > * as many indirection elements as we can and let > > * transformIndirection() fallback to alternative logic to > handle > > * the remaining indirection elements. > > */ > > break; > > } > > the above ELSE branch comments look suspicious to me. > I kept the "else" branch, but added an Assert with updated comments. Here's the relevant diff between v12 and v13: In jsonb_subscript_make_jsonpath(): else { - * Unsupported node type for creating jsonpath. Instead of - * throwing an ERROR, break here so that we create a jsonpath from - * as many indirection elements as we can and let - * transformIndirection() fallback to alternative logic to handle - * the remaining indirection elements. + * Unexpected node type in indirection list. This should not + * happen with current grammar, but we handle it defensively by + * breaking out of the loop rather than crashing. In case of + * future grammar changes that might introduce new node types, + * this allows us to create a jsonpath from as many indirection + * elements as we can and let transformIndirection() fallback to + * alternative logic to handle the remaining indirection elements. + Assert(false); /* not reachable */ break; } Hope the updated comment clarifies why I kept it. On Thu, Jul 10, 2025 at 10:54 PM jian he <jian.universality@gmail.com> wrote: > > > transformIndirection->transformContainerSubscripts->jsonb_subscript_transform->jsonb_subscript_make_jsonpath > > As you can see, transformIndirection have a long distance from > > jsonb_subscript_make_jsonpath, > > let transformIndirection handle remaining indirection elements seems not > good. > In 0001's commit message, we are given the following promise: Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Date: Tue Jul 8 22:18:07 2025 -0700 Allow transformation of only a sublist of subscripts This is a preparation step for allowing subscripting containers to transform only a prefix of an indirection list and modify the list in-place by removing the processed elements. Currently, all elements are consumed, and the list is set to NIL after transformation. In the following commit, subscripting containers will gain the flexibility to stop transformation when encountering an unsupported indirection and return the remaining indirections to the caller. With this contract provided by the subscripting interface to the container's transform function, I don't feel terribly inappropriate to let jsonb's transform function: jsonb_subscript_transform(), to transform indirections as much as it can, and leave the rest to transformIndirection(). 0002 is a good example that shows why the else branch is useful: it adds support for String as a subscript node of jsonb, while String is still unsupported in other in-core container subscripting. For example, array_subscript_transform() has: if (!IsA(lfirst(idx), A_Indices)) break; The else branch in jsonb_subscript_make_jsonpath() just follows that same convention. > context v12-0001 to v12-0006. > this ELSE branch comments is wrong, because > + if (jsonb_check_jsonpath_needed(*indirection)) > + { > + jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); > + if (sbsref->refjsonbpath) > + return; > + } > in jsonb_check_jsonpath_needed we already use Assert to confirm that > list "indirection" > is either String or A_Indices Node. > The Assert I added in v13, as well as the one you mentioned in jsonb_check_jsonpath_needed(), are both only applicable in development build and won’t help in production. So I’d prefer to break here and let the callee handle any unexpected Node with an ERROR. Let me know if this explanation makes sense. If not, and if you think the current logic is overly complicated, we can discuss further. On Thu, Jul 10, 2025 at 10:54 PM jian he <jian.universality@gmail.com> wrote: > in transformContainerSubscripts we have > sbsroutines->transform(sbsref, indirection, pstate, > isSlice, isAssignment); > /* > * Error out, if datatype failed to consume any indirection elements. > */ > if (list_length(*indirection) == indirection_length) > { > Node *ind = linitial(*indirection); > if (noError) > return NULL; > if (IsA(ind, String)) > ereport(ERROR, > (errcode(ERRCODE_DATATYPE_MISMATCH), > errmsg("type %s does not support dot notation", > format_type_be(containerType)), > parser_errposition(pstate, > exprLocation(containerBase)))); > else if (IsA(ind, A_Indices)) > ereport(ERROR, > (errcode(ERRCODE_DATATYPE_MISMATCH), > errmsg("type %s does not support array subscripting", > format_type_be(containerType)), > parser_errposition(pstate, > exprLocation(containerBase)))); > else > elog(ERROR, "invalid indirection operation: %d", nodeTag(ind)); > } > sbsroutines->transform currently will call > array_subscript_transform, hstore_subscript_transform, > jsonb_subscript_transform > > in jsonb_subscript_transform callee we unconditionally do: > *indirection = list_delete_first_n(*indirection, pathlen); > *indirection = list_delete_first_n(*indirection, > list_length(upperIndexpr)); > in array_subscript_transform, we do > *indirection = list_delete_first_n(*indirection, ndim); > > That means, if sbsroutines->transform not error out and indirection is > not NIL (which is unlikely) > then sbsroutines->transform will consume some induction elements. > instead of the above verbose ereport(ERROR, error handling, we can use > Assert > Assert(indirection_length > list_length(*indirection)); > > for the above comments, i did a refactoring based on v12 (0001 to 0006). Thanks for the patch! I didn’t apply it for the same reason I mentioned in the previous quote reply. In the case of an unexpected Node type, I prefer raising an explicit ERROR rather than using an Assert() that crashes in a dev build but silently continues in production. One possible way to hit these ERRORs is by creating a custom data type that supports subscripting, but only for a subset of Node types that transformIndirection() allows. Best, Alex -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-08-22T08:19:35Z
On Thu, Aug 21, 2025 at 12:54 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > Hi Jian, > > Thanks for reviewing! I’ve attached v13, which addresses your > feedback. > hi. in the context of v13-0001 and v13-0002. In transformIndirection: while (subscripts) { Node *newresult = (Node *) = transformContainerSubscripts(....&subscripts...) } In transformContainerSubscripts: sbsroutines->transform(sbsref, indirection, pstate, isSlice, isAssignment); /* * Error out, if datatype failed to consume any indirection elements. */ if (list_length(*indirection) == indirection_length) { } As you can see from the above code pattern, "sbsroutines->transform" is responsible for consuming the "indirection". If it doesn’t consume it, the function should either return NULL or raise an error. But what happens if the elements consumed by "sbsroutines->transform" are not contiguous? jsonb_subscript_transform below changes in v13-0002 + if (!IsA(lfirst(idx), A_Indices)) + break; may cause elements consumed not contiguous. diff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.c index 8ad6aa1ad4f..a0d38a0fd80 100644 --- a/src/backend/utils/adt/jsonbsubs.c +++ b/src/backend/utils/adt/jsonbsubs.c @@ -55,9 +55,14 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, */ foreach(idx, *indirection) { - A_Indices *ai = lfirst_node(A_Indices, idx); + A_Indices *ai; Node *subExpr; + if (!IsA(lfirst(idx), A_Indices)) + break; + + ai = lfirst_node(A_Indices, idx); + if (isSlice) { Node *expr = ai->uidx ? ai->uidx : ai->lidx; @@ -160,7 +165,9 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, sbsref->refrestype = JSONBOID; sbsref->reftypmod = -1; - *indirection = NIL; + /* Remove processed elements */ + if (upperIndexpr) + *indirection = list_delete_first_n(*indirection, list_length(upperIndexpr)); } If the branch ``if (!IsA(lfirst(idx), A_Indices))`` is ever reached, then ``*indirection = list_delete_first_n(*indirection, list_length(upperIndexpr));`` might produce an incorrect result? However, it appears that this branch is currently unreachable, at least regress tests not coverage for + if (!IsA(lfirst(idx), A_Indices)) + break; Later patch we may have resolved this issue, but in the context of 0001 and 0002, this seem inconsistent? -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-08-22T19:33:22Z
Hi Jian, On Fri, Aug 22, 2025 at 1:20 AM jian he <jian.universality@gmail.com> wrote: > On Thu, Aug 21, 2025 at 12:54 PM Alexandra Wang > <alexandra.wang.oss@gmail.com> wrote: > > > > Hi Jian, > > > > Thanks for reviewing! I’ve attached v13, which addresses your > > feedback. > > > > hi. > in the context of v13-0001 and v13-0002. > > In transformIndirection: > while (subscripts) > { > Node *newresult = (Node *) = > transformContainerSubscripts(....&subscripts...) > } > > In transformContainerSubscripts: > sbsroutines->transform(sbsref, indirection, pstate, > isSlice, isAssignment); > /* > * Error out, if datatype failed to consume any indirection elements. > */ > if (list_length(*indirection) == indirection_length) > { > } > As you can see from the above code pattern, "sbsroutines->transform" is > responsible for consuming the "indirection". > If it doesn’t consume it, the function should either return NULL or raise > an > error. > But what happens if the elements consumed by "sbsroutines->transform" > are not contiguous? > > jsonb_subscript_transform below changes in v13-0002 > + if (!IsA(lfirst(idx), A_Indices)) > + break; > may cause elements consumed not contiguous. > > > diff --git a/src/backend/utils/adt/jsonbsubs.c > b/src/backend/utils/adt/jsonbsubs.c > index 8ad6aa1ad4f..a0d38a0fd80 100644 > --- a/src/backend/utils/adt/jsonbsubs.c > +++ b/src/backend/utils/adt/jsonbsubs.c > @@ -55,9 +55,14 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, > */ > foreach(idx, *indirection) > { > - A_Indices *ai = lfirst_node(A_Indices, idx); > + A_Indices *ai; > Node *subExpr; > > + if (!IsA(lfirst(idx), A_Indices)) > + break; > + > + ai = lfirst_node(A_Indices, idx); > + > if (isSlice) > { > Node *expr = ai->uidx ? ai->uidx : ai->lidx; > @@ -160,7 +165,9 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, > sbsref->refrestype = JSONBOID; > sbsref->reftypmod = -1; > > - *indirection = NIL; > + /* Remove processed elements */ > + if (upperIndexpr) > + *indirection = list_delete_first_n(*indirection, > list_length(upperIndexpr)); > } > > If the branch ``if (!IsA(lfirst(idx), A_Indices))`` is ever reached, then > ``*indirection = list_delete_first_n(*indirection, > list_length(upperIndexpr));`` > might produce an incorrect result? > > However, it appears that this branch is currently unreachable, > at least regress tests not coverage for > + if (!IsA(lfirst(idx), A_Indices)) > + break; > > Later patch we may have resolved this issue, but in the context of > 0001 and 0002, > this seem inconsistent? > I don’t understand the question. In the case of an unsupported Node type (not an Indices in patch 0001 or 0002), we break out of the loop to stop transforming the remaining subscripts. So there won’t be any ‘not contiguous’ indirection elements. Best, Alex -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-08-25T08:09:47Z
On Sat, Aug 23, 2025 at 3:34 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > I don’t understand the question. In the case of an unsupported Node > type (not an Indices in patch 0001 or 0002), we break out of the loop > to stop transforming the remaining subscripts. So there won’t be any > ‘not contiguous’ indirection elements. > Sorry, my last message is wrong. this time, I applied v13-0001 to v13-0005. and I found some minor issues..... v13-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch + /* + * Error out, if datatype failed to consume any indirection elements. + */ + if (list_length(*indirection) == indirection_length) + { + Node *ind = linitial(*indirection); + + if (noError) + return NULL; + + if (IsA(ind, String)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("type %s does not support dot notation", + format_type_be(containerType)), + parser_errposition(pstate, exprLocation(containerBase)))); + else if (IsA(ind, A_Indices)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("type %s does not support array subscripting", + format_type_be(containerType)), + parser_errposition(pstate, exprLocation(containerBase)))); + else + elog(ERROR, "invalid indirection operation: %d", nodeTag(ind)); + } these error message still not reached after I apply v13-0005-Implement-read-only-dot-notation-for-jsonb.patch maybe we can simply do + if (noError) + return NULL; + + ereport(ERROR, + errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("type %s does not support subscripting", + format_type_be(containerType)), + parser_errposition(pstate, exprLocation(containerBase))); ? for V13-0004. in master we have if (subExpr == NULL) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("jsonb subscript must have text type"), parser_errposition(pstate, exprLocation(subExpr)))); but this part is removed from v13-0004-Extract-coerce_jsonpath_subscript ? coerce_jsonpath_subscript_to_int4_or_text maybe we can just rename to coerce_jsonpath_subscript, then add some comments explaining why currently only support result node coerce to text or int4 data type. for v13-0005-Implement-read-only-dot-notation-for-jsonb.patch + i = 0; + foreach(lc, sbsref->refupperindexpr) { + Expr *e = (Expr *) lfirst(lc); + + /* When slicing, individual subscript bounds can be omitted */ + if (!e) { + sbsrefstate->upperprovided[i] = false; + sbsrefstate->upperindexnull[i] = true; + } else { + sbsrefstate->upperprovided[i] = true; + /* Each subscript is evaluated into appropriate array entry */ + ExecInitExprRec(e, state, + &sbsrefstate->upperindex[i], + &sbsrefstate->upperindexnull[i]); + } + i++; } curly braces should be put into the next new line. there are two + if (!sbsref->refjsonbpath) +{ +} maybe we can put these two together. +SELECT (jb)['b']['x'].z FROM test_jsonb_dot_notation; -- returns NULL with warnings + z +--- + +(1 row) there is no warning, "returns NULL with warnings" should be removed. +-- clean up +DROP TABLE test_jsonb_dot_notation; \ No newline at end of file to remove "\ No newline at end of file" you need to add a newline to jsonb.sql, you may see other regress sql test files. + foreach(lc, *indirection) + if() + { + } + else + { + /* + * Unexpected node type in indirection list. This should not + * happen with current grammar, but we handle it defensively by + * breaking out of the loop rather than crashing. In case of + * future grammar changes that might introduce new node types, + * this allows us to create a jsonpath from as many indirection + * elements as we can and let transformIndirection() fallback to + * alternative logic to handle the remaining indirection elements. + */ + Assert(false); /* not reachable */ + break; + } + + /* append path item */ + path->next = jpi; + path = jpi; + pathlen++; If the above else branch is reached, it will crash due to `Assert(false);`, which contradicts the preceding comments. to make the comments accurate, we need remove " Assert(false); /* not reachable */" ? /* - * Transform and convert the subscript expressions. Jsonb subscripting - * does not support slices, look only at the upper index. + * We would only reach here if json simplified accessor is not needed, or + * if jsonb_subscript_make_jsonpath() didn't consume any indirection + * element — either way, the first indirection element could not be + * converted into a JsonPath component. This happens when it's a non-slice + * A_Indices with a non-integer upper index. The code below falls back to + * traditional jsonb subscripting for such cases. */ the above comments, "non-integer" part is wrong? For example, the below two SELECTs both use "traditional" jsonb subscripting logic. CREATE TABLE ts1 AS SELECT '[1, [2]]' ::jsonb jb; SELECT (jb)[1] FROM ts1; SELECT (jb)['a'] FROM ts1; -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-08-26T03:52:28Z
Hi Jian, I’ve attached v14, which includes only indentation and comment changes from v13. On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > On Sat, Aug 23, 2025 at 3:34 AM Alexandra Wang > <alexandra.wang.oss@gmail.com> wrote: > > > > I don’t understand the question. In the case of an unsupported Node > > type (not an Indices in patch 0001 or 0002), we break out of the loop > > to stop transforming the remaining subscripts. So there won’t be any > > ‘not contiguous’ indirection elements. > > > > Sorry, my last message is wrong. > this time, I applied v13-0001 to v13-0005. > and I found some minor issues..... > No problem. Thanks for reviewing. On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > v13-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch > + /* > + * Error out, if datatype failed to consume any indirection elements. > + */ > + if (list_length(*indirection) == indirection_length) > + { > + Node *ind = linitial(*indirection); > + > + if (noError) > + return NULL; > + > + if (IsA(ind, String)) > + ereport(ERROR, > + (errcode(ERRCODE_DATATYPE_MISMATCH), > + errmsg("type %s does not support dot notation", > + format_type_be(containerType)), > + parser_errposition(pstate, exprLocation(containerBase)))); > + else if (IsA(ind, A_Indices)) > + ereport(ERROR, > + (errcode(ERRCODE_DATATYPE_MISMATCH), > + errmsg("type %s does not support array subscripting", > + format_type_be(containerType)), > + parser_errposition(pstate, exprLocation(containerBase)))); > + else > + elog(ERROR, "invalid indirection operation: %d", nodeTag(ind)); > + } > > these error message still not reached after I apply > v13-0005-Implement-read-only-dot-notation-for-jsonb.patch > maybe we can simply do > > + if (noError) > + return NULL; > + > + ereport(ERROR, > + errcode(ERRCODE_DATATYPE_MISMATCH), > + errmsg("type %s does not support subscripting", > + format_type_be(containerType)), > + parser_errposition(pstate, exprLocation(containerBase))); > ? > The regression tests do not currently trigger these ERROR cases because Assignment for dot-notation has not yet been implemented. At present, the noError argument of transformContainerSubscripts() is set to true for SELECT and false for UPDATE. As a result, the ereport(ERROR) calls are never reached in SELECT queries. This patch series also does not aim to implement dot-notation for UPDATE statements that assign to a jsonb field. We have therefore made only minimal adjustments in transformAssignmentIndirection(). Since the assignment code path is separate, it likewise does not reach the ereport(ERROR) cases with in-core data types. Once dot-notation support for Assignment is added, those ERROR messages will become relevant. That said, the ERROR messages are not "dead code." They can still be triggered when working with custom data types. For example, you can define a custom array-like type by copying the in-core array type and slightly modifying its *_subscript_transform() function. For now, you can simply update array_subscript_transform() in-place with the following diff: --- a/src/backend/utils/adt/arraysubs.c +++ b/src/backend/utils/adt/arraysubs.c @@ -82,6 +82,8 @@ array_subscript_transform(SubscriptingRef *sbsref, ai = lfirst_node(A_Indices, idx); + if (isSlice) + break; if (isSlice) { if (ai->lidx) This way, we have a *_subscript_transform function that you can register for a custom array-like data type that does not handle slicing. Then you can hit the error message like this: test=# update t set arr[1:] = 1; ERROR: 42804: type integer[] does not support array subscripting LINE 1: update t set arr[1:] = 1; ^ LOCATION: transformContainerSubscripts, parse_node.c:345 test=# \d t Table "public.t" Column | Type | Collation | Nullable | Default --------+-----------+-----------+----------+--------- arr | integer[] | | | I know this error message is not very precise for the demo case in particular (since one might expect something about slicing), but it still demonstrates how this code path is relevant. In the meantime, I don’t think it’s worth adding a new data type to exercise this error message, given that we are likely to make changes to the assignment code path for in-core types. On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > for V13-0004. > in master we have > if (subExpr == NULL) > ereport(ERROR, > (errcode(ERRCODE_DATATYPE_MISMATCH), > errmsg("jsonb subscript must have text type"), > parser_errposition(pstate, > exprLocation(subExpr)))); > but this part is removed from v13-0004-Extract-coerce_jsonpath_subscript > ? I removed this one because you asked me to (see our past conversation for v11). On Wed, Jul 9, 2025 at 1:01 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > On Tue, Jun 24, 2025 at 10:57 PM jian he <jian.universality@gmail.com> > wrote: > >> v11-0004-Extract-coerce_jsonpath_subscript.patch >> + /* >> + * We known from can_coerce_type that coercion will succeed, so >> + * coerce_type could be used. Note the implicit coercion context, which >> is >> + * required to handle subscripts of different types, similar to >> overloaded >> + * functions. >> + */ >> + subExpr = coerce_type(pstate, >> + subExpr, subExprType, >> + targetType, -1, >> + COERCION_IMPLICIT, >> + COERCE_IMPLICIT_CAST, >> + -1); >> + if (subExpr == NULL) >> + ereport(ERROR, >> + (errcode(ERRCODE_DATATYPE_MISMATCH), >> + errmsg("jsonb subscript must have text type"), >> >> the targetType can be "integer", then the error message >> errmsg("jsonb subscript must have text type") would be wrong? >> Also this error handling is not necessary. >> since we can_coerce_type already tell us that coercion will succeed. >> Also, do we really put v11-0004 as a separate patch? >> > > Thanks for pointing this out! I’ve removed the unnecessary error > handling as you suggested. I kept v12-0004 as a separate patch to make > the review easier, but I’ll leave it up to the committer to decide > whether to squash it with the other commits. > I think either way works, but I’d appreciate it if we could settle on one option so we don’t keep going back and forth. On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > coerce_jsonpath_subscript_to_int4_or_text > maybe we can just rename to > coerce_jsonpath_subscript, > then add some comments explaining why currently only support result node > coerce > to text or int4 data type. > What is the reason for renaming this function to a less explicit name? It was previously called coerce_jsonpath_subscript() because we passed INT4OID as an argument to specify the target type. I followed your earlier review comment (for v12) to remove that argument and hard-code INT4OID within the function. Given that change, I think it’s clearer to keep a more explicit name that describes what the function achieves. On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > for v13-0005-Implement-read-only-dot-notation-for-jsonb.patch > > + i = 0; > + foreach(lc, sbsref->refupperindexpr) { > + Expr *e = (Expr *) lfirst(lc); > + > + /* When slicing, individual subscript bounds can be omitted */ > + if (!e) { > + sbsrefstate->upperprovided[i] = false; > + sbsrefstate->upperindexnull[i] = true; > + } else { > + sbsrefstate->upperprovided[i] = true; > + /* Each subscript is evaluated into appropriate array entry */ > + ExecInitExprRec(e, state, > + &sbsrefstate->upperindex[i], > + &sbsrefstate->upperindexnull[i]); > + } > + i++; > } > curly braces should be put into the next new line. > Fixed. Thank you! On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > there are two > + if (!sbsref->refjsonbpath) > +{ > +} > maybe we can put these two together. > The code blocks are different enough, and merging them would only reduce readability in my opinion. On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > +SELECT (jb)['b']['x'].z FROM test_jsonb_dot_notation; -- returns NULL > with warnings > + z > +--- > + > +(1 row) > there is no warning, "returns NULL with warnings" should be removed. > Fixed. Thank you! On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > +-- clean up > +DROP TABLE test_jsonb_dot_notation; > \ No newline at end of file > > to remove "\ No newline at end of file" > you need to add a newline to jsonb.sql, you may see other regress sql > test files. > Fixed. Thank you! On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > + foreach(lc, *indirection) > + if() > + { > + } > + else > + { > + /* > + * Unexpected node type in indirection list. This should not > + * happen with current grammar, but we handle it defensively by > + * breaking out of the loop rather than crashing. In case of > + * future grammar changes that might introduce new node types, > + * this allows us to create a jsonpath from as many indirection > + * elements as we can and let transformIndirection() fallback to > + * alternative logic to handle the remaining indirection elements. > + */ > + Assert(false); /* not reachable */ > + break; > + } > + > + /* append path item */ > + path->next = jpi; > + path = jpi; > + pathlen++; > > If the above else branch is reached, it will crash due to `Assert(false);`, > which contradicts the preceding comments. > to make the comments accurate, we need remove > " Assert(false); /* not reachable */" > ? > The Assert is intentional. We want to fail fast for the “not reachable” case in the development build, while breaking out of the loop in the production build. On Mon, Aug 25, 2025 at 1:10 AM jian he <jian.universality@gmail.com> wrote: > /* > - * Transform and convert the subscript expressions. Jsonb subscripting > - * does not support slices, look only at the upper index. > + * We would only reach here if json simplified accessor is not needed, or > + * if jsonb_subscript_make_jsonpath() didn't consume any indirection > + * element — either way, the first indirection element could not be > + * converted into a JsonPath component. This happens when it's a non-slice > + * A_Indices with a non-integer upper index. The code below falls back to > + * traditional jsonb subscripting for such cases. > */ > the above comments, "non-integer" part is wrong? > For example, the below two SELECTs both use "traditional" jsonb > subscripting logic. > CREATE TABLE ts1 AS SELECT '[1, [2]]' ::jsonb jb; > SELECT (jb)[1] FROM ts1; > SELECT (jb)['a'] FROM ts1; > I updated the comment. In the updated comment, I also chose to use “pre-standard” json subscripting instead of “traditional” json subscripting. Hope that helps! Best, Alex -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-08-27T04:15:06Z
On Tue, Aug 26, 2025 at 11:53 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > Hi Jian, > > I’ve attached v14, which includes only indentation and comment changes > from v13. > hi. still reviewing v14-0001 to v14-0005. I am confused by the comments in jsonb_subscript_transform "" * (b) jsonb_subscript_make_jsonpath() examined the first indirection * element but could not turn it into a JsonPath component (for example, * ['a']). """ CREATE TABLE ts2 AS SELECT '{"a": "b"}' ::jsonb jb; SELECT (jb)['a'] FROM ts2; SELECT (jb).a['a'] FROM ts2; in these two cases, ['a'], it won't reach jsonb_subscript_make_jsonpath, because jsonb_check_jsonpath_needed will return false. maybe I am missing something, for the above point b, can you use some SQL example to explain it? make_jsonpath_item_expr(ParseState *pstate, Node *expr, List **exprs) if expr satisfies the condition, then append the transformed expr to List exprs if not returns NULL. The list append logic is within make_jsonpath_item_int. seems not that intuitive (another level of indirection) maybe we don't need make_jsonpath_item_int. Another reason would be in make_jsonpath_item_int it's not easy found out that exprs actually refers to SubscriptingRef->refupperindexpr also v14-0006-Implement-Jsonb-subscripting-with-slicing.patch also doesn't use make_jsonpath_item_int that much frequently. -------------------------------- in v14-0005-Implement-read-only-dot-notation-for-jsonb.patch + * In addition to building the JsonPath expression, this function populates + * the following fields of the given SubscriptingRef: + * - refjsonbpath: the generated JsonPath + * - refupperindexpr: upper index expressions (object keys or array indexes) + * - reflowerindexpr: lower index expressions, remains NIL as slices are not yet supported. "reflowerindexpr" changed in v14-0006, so "reflowerindexpr" comments in v14-0006 need to change. but "reflowerindexpr" we didn't touch in v14-0005, so the "reflowerindexpr" comments look weird. +SELECT (jb).a['b'] FROM test_jsonb_dot_notation; -- returns NULL due to strict mode we can not specify strict just like, ``select 'strict $'::jsonpath;`` all the patches in here are default to lax mode. so I am confused by this comment. ("strict mode"). -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-08-27T07:26:08Z
> On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > Best, > Alex > <v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch> I just started with 0001, I just got a comment: diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h index f7d07c84542..58a4b9df157 100644 --- a/src/include/parser/parse_node.h +++ b/src/include/parser/parse_node.h @@ -361,7 +361,7 @@ extern SubscriptingRef *transformContainerSubscripts(ParseState *pstate, Node *containerBase, Oid containerType, int32 containerTypMod, - List *indirection, + List **indirection, bool isAssignment); Here we change “indirection” from * to **, I guess the reason is to indicate if the indirection list is fully processed. In case of fully processed, set it to NULL, then caller gets NULL via **. As “indirection” is of type “List *”, can we just set “indirection->length = 0”? I checked pg_list.h, there is not a function or macro to cleanup a list (free elements and reset length to 0). There is a “list_free(List *list)”, but it will free “list” itself as well. Maybe we can add a new function, say “list_reset(List *list)” or “list_cleanup(List *list)”. This way will keep the function interface unchanged, and less changes would be involved. diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index d66276801c6..e1565e11d09 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -466,14 +466,13 @@ transformIndirection(ParseState *pstate, A_Indirection *ind) Assert(IsA(n, String)); /* process subscripts before this field selection */ - if (subscripts) + while (subscripts) result = (Node *) transformContainerSubscripts(pstate, Changing “if” to “while” in 0001 is a little confusing. Because the impression is that, transform will stop when it cannot handle next indirection, so that transform again will also fail. Maybe my impression is wrong, can you add some comments here to explain why change “if” to “while”. Regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-08-29T03:29:15Z
>> On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: >> >> Best, >> Alex >> <v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch> > > I found a bug. ``` INSERT INTO test_jsonb_types (data) VALUES ('[1, 2, "three"]'), ('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}’); ``` If I use a index following a slice, it doesn’t work: ``` evantest=# select data[0] from test_jsonb_types; data ------ 1 (2 rows) evantest=# select data[0:2][1] from test_jsonb_types; # This should return “2" data ------ (2 rows) evantest=# select (t.data)['con']['a'][0:1] from test_jsonb_types t; # returned the slice properly data ----------------------------------------------------- [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}] (2 rows) evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # also returned the slice, which is wrong data ----------------------------------------------------- [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}] (2 rows) ``` We should consider a slice as a container, so the fix is simple. My quick unpolished fix is: ``` chaol@ChaodeMacBook-Air postgresql % git diff diff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.c index cb72d12ca3f..8845dcf239a 100644 --- a/src/backend/utils/adt/jsonbsubs.c +++ b/src/backend/utils/adt/jsonbsubs.c @@ -247,6 +247,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti ListCell *lc; Datum jsp; int pathlen = 0; + bool isSlice = false; sbsref->refupperindexpr = NIL; sbsref->reflowerindexpr = NIL; @@ -285,6 +286,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti if (ai->is_slice) { + isSlice = true; while (list_length(sbsref->reflowerindexpr) < list_length(sbsref->refupperindexpr)) sbsref->reflowerindexpr = lappend(sbsref->reflowerindexpr, NULL); @@ -369,6 +371,9 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti path->next = jpi; path = jpi; pathlen++; + + if (isSlice) + break; } if (pathlen == 0) ``` After the fix, let’s test again: ``` evantest=# select data[0:2][1] from test_jsonb_types; # good result data ------ 2 (2 rows) evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # good result data ------------------------- {"b": {"c": {"d": 99}}} (2 rows) ``` Regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-08-29T03:42:31Z
> > Best, > Alex > <v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch> I am trying to split different topics to different email to keep every issue to be focused. I also have a suggestion. If I do: ``` — s1 select (t.data)['con']['a'][1]['b']['c']['d'] from test_jsonb_types t; —s2 select (t.data).con.a[1].b['c'].d from test_jsonb_types t; ``` The two statements are actually identical. But they generate quite different rewritten query trees. S1’s rewritten tree is much simpler than s2’s. However, their plan trees are the same. I think we can convert string indirections to indices indirection first before transform indirections, so that the two SQLs will generate the same rewritten query tree, which would simplify the code logic of rewriting query tree and the planner. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
-
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-08-29T07:22:04Z
> On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > <v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch> A few other comments: -static Node * -transformIndirection(ParseState *pstate, A_Indirection *ind) +Node * +transformIndirection(ParseState *pstate, A_Indirection *ind, + bool *trailing_star_expansion) { Node *last_srf = pstate->p_last_srf; Node *result = transformExprRecurse(pstate, ind->arg); @@ -454,12 +454,7 @@ transformIndirection(ParseState *pstate, A_Indirection *ind) if (IsA(n, A_Indices)) subscripts = lappend(subscripts, n); else if (IsA(n, A_Star)) - { - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("row expansion via \"*\" is not supported here"), - parser_errposition(pstate, location))); - } + subscripts = lappend(subscripts, n); else { Assert(IsA(n, String)); Now, the 3 clauses are quite similar, the if-elseif-else can be simplified as: If (!IsA(n, A_Indices) && !Is_A(n, A_Start)) Assert(IsA(n, String)); Subscripts = lappend(subscripts, n) +static bool +jsonb_check_jsonpath_needed(List *indirection) +{ + ListCell *lc; + + foreach(lc, indirection) + { + Node *accessor = lfirst(lc); + + if (IsA(accessor, String)) + return true; Like I mentioned in my previous comment, if we convert String (dot notation) to Indices in rewritten phase, then jsonpath might be no longer needed, and the code logic is much simplified. --- a/src/backend/parser/parse_node.c +++ b/src/backend/parser/parse_node.c @@ -244,7 +244,7 @@ transformContainerSubscripts(ParseState *pstate, Node *containerBase, Oid containerType, int32 containerTypMod, - List *indirection, + List **indirection, bool isAssignment) { SubscriptingRef *sbsref; @@ -280,7 +280,7 @@ transformContainerSubscripts(ParseState *pstate, * element. If any of the items are slice specifiers (lower:upper), then * the subscript expression means a container slice operation. */ - foreach(idx, indirection) + foreach(idx, *indirection) { A_Indices *ai = lfirst_node(A_Indices, idx); Should this foreach be pull up to out of transformContainerSubscripts()? Originally transformContainerSubscripts() runs only once, but now it’s placed in a while loop, and every time this foreach needs to go through the entire indirection list, which are duplicated computation. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-08-30T00:53:06Z
Hi Chao, Thanks for reviewing! On Thu, Aug 28, 2025 at 8:29 PM Chao Li <li.evan.chao@gmail.com> wrote: > > On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra.wang.oss@gmail.com> > wrote: > > Best, > Alex > <v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch> > <v14-0003-Export-jsonPathFromParseResult.patch> > <v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch> > <v14-0005-Implement-read-only-dot-notation-for-jsonb.patch> > <v14-0007-Implement-jsonb-wildcard-member-accessor.patch> > <v14-0006-Implement-Jsonb-subscripting-with-slicing.patch> > <v14-0004-Extract-coerce_jsonpath_subscript.patch> > > > > I found a bug. > ``` > INSERT INTO test_jsonb_types (data) VALUES > ('[1, 2, "three"]'), > ('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}’); > ``` > > If I use a index following a slice, it doesn’t work: > > ``` > evantest=# select data[0] from test_jsonb_types; > data > ------ > > 1 > > (2 rows) > > evantest=# select data[0:2][1] from test_jsonb_types; # This should return > “2" > data > ------ > > > (2 rows) > > evantest=# select (t.data)['con']['a'][0:1] from test_jsonb_types t; # > returned the slice properly > data > ----------------------------------------------------- > > [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}] > (2 rows) > > evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # > also returned the slice, which is wrong > data > ----------------------------------------------------- > > [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}] > (2 rows) > ``` > > We should consider a slice as a container, so the fix is simple. My quick > unpolished fix is: > > ``` > chaol@ChaodeMacBook-Air postgresql % git diff > diff --git a/src/backend/utils/adt/jsonbsubs.c > b/src/backend/utils/adt/jsonbsubs.c > index cb72d12ca3f..8845dcf239a 100644 > --- a/src/backend/utils/adt/jsonbsubs.c > +++ b/src/backend/utils/adt/jsonbsubs.c > @@ -247,6 +247,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List > **indirection, Subscripti > ListCell *lc; > Datum jsp; > int pathlen = 0; > + bool isSlice = false; > > sbsref->refupperindexpr = NIL; > sbsref->reflowerindexpr = NIL; > @@ -285,6 +286,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List > **indirection, Subscripti > > if (ai->is_slice) > { > + isSlice = true; > while > (list_length(sbsref->reflowerindexpr) < > list_length(sbsref->refupperindexpr)) > sbsref->reflowerindexpr = > lappend(sbsref->reflowerindexpr, NULL); > > @@ -369,6 +371,9 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List > **indirection, Subscripti > path->next = jpi; > path = jpi; > pathlen++; > + > + if (isSlice) > + break; > } > > if (pathlen == 0) > ``` > > After the fix, let’s test again: > > ``` > evantest=# select data[0:2][1] from test_jsonb_types; # good result > data > ------ > 2 > > (2 rows) > > evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # > good result > data > ------------------------- > > {"b": {"c": {"d": 99}}} > (2 rows) > ``` > TL;DR: It is a feature, not a bug. See longer explanation below: This behavior aligns with the SQL:2023 standard. While the result you expected is more intuitive in my opinion, it is incorrect according to the spec. As I mentioned in the commit message of patch v14-0005: * The SQL standard states that simplified access is equivalent to: JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) where: VEP = <value expression primary> JC = <JSON simplified accessor op chain>* The most relevant detail that I want to highlight is "WITH CONDITIONAL ARRAY WRAPPER". The documentation[1] says: *> If the path expression may return multiple values, it might be> necessary to wrap those values using the WITH WRAPPER clause to make> it a valid JSON string, because the default behavior is to not wrap> them, as if WITHOUT WRAPPER were specified. The WITH WRAPPER clause is> by default taken to mean WITH UNCONDITIONAL WRAPPER, which means that> even a single result value will be wrapped. To apply the wrapper only> when multiple values are present, specify WITH CONDITIONAL WRAPPER.> Getting multiple values in result will be treated as an error if> WITHOUT WRAPPER is specified.* So, for your test queries: select data[0:2] from test_jsonb_types; select data[0:2][1] from test_jsonb_types; select (t.data)['con']['a'][0:1] from test_jsonb_types t; select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; We have these equivalents using json_query(): select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; select json_query(data, 'lax $[0 to 2][1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; select json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; -- **[NOTE]** select json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; -- **[NOTE]** **[NOTE]**: .a and ['a'] (as well as .con and ['con']) are not syntactically equivalent, as the dot-notation .a is in "lax" mode, whereas the pre-standard subscript ['a'] is in "strict" mode. I will discuss this more in a separate reply to your other comment. However, for the specific data we inserted in your example table, they happen to return the same results. Since our focus here is not dot-notation, we won’t go further into it here. You can verify correctness with: test=# select data[0:2] = json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; ?column? ---------- t t (2 rows) test=# select (data[0:2][1] is NULL) AND (json_query(data, 'lax $[0 to 2][1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) is NULL) from test_jsonb_types; ?column? ---------- t t (2 rows) test=# select (((t.data)['con']['a'][0:1] IS NULL) AND (json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) IS NULL)) OR ((t.data)['con']['a'][0:1] = json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)) from test_jsonb_types t; ?column? ---------- t t (2 rows) test=# select (((t.data)['con']['a'][0:1][0] IS NULL) AND (json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) IS NULL)) OR ((t.data)['con']['a'][0:1][0] = json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)) from test_jsonb_types t; ?column? ---------- t t (2 rows) All tests show "t", confirming the current results are all correct. I think the root of your confusion is the meaning of CONDITIONAL ARRAY WRAPPER. So let’s try more examples: -- keep your previous setup drop table test_jsonb_types; create table test_jsonb_types (data jsonb); INSERT INTO test_jsonb_types (data) VALUES ('[1, 2, "three"]'), ('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}'); Let's start with: select data[0:2] from test_jsonb_types; It is equivalent to: select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; They all have the following output: test=# select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; json_query --------------------------------------------------------------------- [1, 2, "three"] {"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}} (2 rows) To find out what WITH CONDITIONAL ARRAY WRAPPER does, let's toggle it to WITHOUT ARRAY WRAPPER: test=# select json_query(data, 'lax $[0 to 2]' WITHOUT ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; json_query --------------------------------------------------------------------- {"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}} (2 rows) The first row return NULL, because we've specified NULL ON ERROR, so let's toggle that as well to ERROR ON ERROR: test=# select json_query(data, 'lax $[0 to 2]' WITHOUT ARRAY WRAPPER NULL ON EMPTY ERROR ON ERROR) from test_jsonb_types; ERROR: 22034: JSON path expression in JSON_QUERY must return single item when no wrapper is requested HINT: Use the WITH WRAPPER clause to wrap SQL/JSON items into an array. LOCATION: JsonPathQuery, jsonpath_exec.c:3987 As shown, without ARRAY WRAPPER, the query produces a sequence of JSON value items, not a JSON array. Note that array wrapping is only applied to the final result of a jsonpath, not to each intermediate result in the chain. See the following example: -- new setup truncate test_jsonb_types; INSERT INTO test_jsonb_types (data) VALUES ('[1, 2, "three"]'), ('[1, [2, 22], "three"]'); test=# select json_query(data, 'lax $[0 to 2][1]' WITHOUT ARRAY WRAPPER NULL ON EMPTY ERROR ON ERROR) from test_jsonb_types; json_query ------------ 22 (2 rows) In this case, you can see more clearly that "[0 to 2]" fetches three individual jsonb array elements, and "[1]" treats each of the three jsonb values as independent jsonb arrays, reading the first element of each. This is different from wrapping the intermediate result into an array and accessing the first element of that wrapped array. Another thing I want to point out is that there is a trivial case for "lax" mode when accessing a jsonb object (not a jsonb array) using a JSON array accessor "[0]". For example: -- another setup, still use your data truncate test_jsonb_types; insert into test_jsonb_types VALUES ('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}'); test=# select (data).con[0] from test_jsonb_types; con ------------------------------------------------------------ {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]} (1 row) This is equivalent to: test=# select json_query(data, 'lax $.con[0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; json_query ------------------------------------------------------------ {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]} (1 row) which is different from the result if we use "strict" mode: test=# select json_query(data, 'strict $.con[0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; json_query ------------ (1 row) According to SQL:2023: *In lax mode:— If an operation requires an SQL/JSON array but the operand is not an SQL/JSON array, then the operand is first “wrapped” in an SQL/JSON array prior to performing the operation.— If an operation requires something other than an SQL/JSON array, but the operand is an SQL/JSON array, then the operand is “unwrapped” by converting its elements into an SQL/JSON sequence prior to performing the operation.— After applying the preceding resolutions to structural errors, if there is still a structural error , the result is an empty SQL/JSON sequence.* Please refer to the first point to understand the example queries. Because "lax" mode wraps a jsonb object into an array of a single element, accessing it with [0] will always return the same jsonb object. In fact, you can access it with a chain of [0]s and still get the same jsonb object: test=# select json_query(data, 'lax $.con[0][0][0][0][0][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; json_query ------------------------------------------------------------ {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]} (1 row) test=# select (data).con[0][0][0][0][0][0] from test_jsonb_types; con ------------------------------------------------------------ {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]} (1 row) I hope this long explanation helps! [1] https://www.postgresql.org/docs/current/functions-json.html#SQLJSON-QUERY-FUNCTIONS Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-01T04:11:08Z
Hi Chao, On Thu, Aug 28, 2025 at 8:42 PM Chao Li <li.evan.chao@gmail.com> wrote: > I am trying to split different topics to different email to keep every > issue to be focused. > Sure! On Thu, Aug 28, 2025 at 8:42 PM Chao Li <li.evan.chao@gmail.com> wrote: > I also have a suggestion. > > If I do: > > ``` > — s1 > select (t.data)['con']['a'][1]['b']['c']['d'] from test_jsonb_types t; > > —s2 > select (t.data).con.a[1].b['c'].d from test_jsonb_types t; > ``` > > The two statements are actually identical. But they generate quite > different rewritten query trees. S1’s rewritten tree is much simpler than > s2’s. However, their plan trees are the same. > The above two statements are NOT identical. Specifically, dot-notation (e.g., .con) and pre-standard jsonb subscripting (e.g., ['con']) are NOT semantically the same. Here's an example: -- setup create table t (jb jsonb); insert into t SELECT '{"con": 1}'::jsonb; insert into t SELECT '[{"con": 1}, {"con": {"a": 2}}]'::jsonb; -- queries test=# select (t.jb).con from t; con --------------- 1 [1, {"a": 2}] (2 rows) test=# select (t.jb)['con'] from t; jb ---- 1 (2 rows) As you can see, dot-notation returns different results from jsonb subscripting. As I mentioned in the previous reply: The SQL standard states that simplified access is equivalent to: > JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON > EMPTY NULL ON ERROR > ) > where: > VEP = <value expression primary> > JC = <JSON simplified accessor op chain> And > *In lax mode:* > *— If an operation requires an SQL/JSON array but the operand is not an > SQL/JSON array, then the operand is first “wrapped” in an SQL/JSON array > prior to performing the operation.* > *— If an operation requires something other than an SQL/JSON array, but > the operand is an SQL/JSON array, then the operand is “unwrapped” by > converting its elements into an SQL/JSON sequence prior to performing the > operation.**— After applying the preceding resolutions to structural > errors, if there is still a structural error , the result is an empty > SQL/JSON sequence.* The example query demonstrates the second point above. The dot-notation attempts to access a member field (."con") of a JSON object, while the operand is a JSON array ([{"con": 1}, {"con": {"a": 2}}]). In "lax" mode, the operand is "unwrapped" into a JSON sequence (two elements: {"con": 1} and {"con": {"a": 2}}), and the member field access is performed on each element. The multiple results are then wrapped into a JSON array ([1, {"a": 2}]) due to WITH CONDITIONAL ARRAY WRAPPER. I’ve already explained what "ARRAY WRAPPER" does in my previous reply, so I won't repeat it here. Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-02T02:43:58Z
Hi Alex, Thank you so much for such a detailed explanation. > On Aug 30, 2025, at 08:53, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > > > I hope this long explanation helps! > > [1] https://www.postgresql.org/docs/current/functions-json.html#SQLJSON-QUERY-FUNCTIONS > > I used to only use the “->” and “->>” accessors for jsonb statements. After some research, I think my misunderstanding was about the slice operator [x:y]. I thought it could return a new array, however it actually returns a sequence. I revisited the patch diffs again, and got a few other comments. Let me raise them in a separate email. Regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
-
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-02T02:46:35Z
> On Aug 27, 2025, at 15:26, Chao Li <li.evan.chao@gmail.com> wrote: > > diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h > index f7d07c84542..58a4b9df157 100644 > --- a/src/include/parser/parse_node.h > +++ b/src/include/parser/parse_node.h > @@ -361,7 +361,7 @@ extern SubscriptingRef *transformContainerSubscripts(ParseState *pstate, > Node *containerBase, > Oid containerType, > int32 containerTypMod, > - List *indirection, > + List **indirection, > bool isAssignment); > > Here we change “indirection” from * to **, I guess the reason is to indicate if the indirection list is fully processed. In case of fully processed, set it to NULL, then caller gets NULL via **. > > As “indirection” is of type “List *”, can we just set “indirection->length = 0”? I checked pg_list.h, there is not a function or macro to cleanup a list (free elements and reset length to 0). There is a “list_free(List *list)”, but it will free “list” itself as well. Maybe we can add a new function, say “list_reset(List *list)” or “list_cleanup(List *list)”. This way will keep the function interface unchanged, and less changes would be involved. After revisiting the changes, I take back this comment. Given the nature that List are always used in pointer form, my comment would just make things much more complicated. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
-
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-02T02:59:39Z
> On Aug 29, 2025, at 15:22, Chao Li <li.evan.chao@gmail.com> wrote: > > +static bool > +jsonb_check_jsonpath_needed(List *indirection) > +{ > + ListCell *lc; > + > + foreach(lc, indirection) > + { > + Node *accessor = lfirst(lc); > + > + if (IsA(accessor, String)) > + return true; > > Like I mentioned in my previous comment, if we convert String (dot notation) to Indices in rewritten phase, then jsonpath might be no longer needed, and the code logic is much simplified. > Taking back this comment. As Alex explained in the other email, dot notation and indices accessor are different. > > --- a/src/backend/parser/parse_node.c > +++ b/src/backend/parser/parse_node.c > @@ -244,7 +244,7 @@ transformContainerSubscripts(ParseState *pstate, > Node *containerBase, > Oid containerType, > int32 containerTypMod, > - List *indirection, > + List **indirection, > bool isAssignment) > { > SubscriptingRef *sbsref; > @@ -280,7 +280,7 @@ transformContainerSubscripts(ParseState *pstate, > * element. If any of the items are slice specifiers (lower:upper), then > * the subscript expression means a container slice operation. > */ > - foreach(idx, indirection) > + foreach(idx, *indirection) > { > A_Indices *ai = lfirst_node(A_Indices, idx); > > Should this foreach be pull up to out of transformContainerSubscripts()? Originally transformContainerSubscripts() runs only once, but now it’s placed in a while loop, and every time this foreach needs to go through the entire indirection list, which are duplicated computation. After revisiting the code, I think this loop of checking is_slice should be removed here. It seems only arrsubs cares about this is_slice flag. hstore_subs can only take a single indirection, so it can do a simple check like: ``` ai = initial_node(A_Indices, *indirection); If (list_length(*indirection) != 1 || ai->is_slice) { ereport(…) } ``` jsonbsubs doesn’t need to check is_slice flag as well, I will explain that in my next email tougher with my new comments. So that, “bool isSlace” can be removed from SubscriptTransform, and let every individual subs check is_slice. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-02T03:32:33Z
> On Sep 2, 2025, at 10:59, Chao Li <li.evan.chao@gmail.com> wrote: > > jsonbsubs doesn’t need to check is_slice flag as well, I will explain that in my next email tougher with my new comments. Now jsonb_check_json_needed() loops over all indirections: static bool jsonb_check_jsonpath_needed(List *indirection) { ListCell *lc; foreach(lc, indirection) { Node *accessor = lfirst(lc); if (IsA(accessor, String)) return true; else { Assert(IsA(accessor, A_Indices)); if (castNode(A_Indices, accessor)->is_slice) return true; } } return false; } Let’s consider this statement: select data[‘a’][‘b’].c from jsonb_table. For this indirection list, first two elements are non-slice indices, and the third is a string accessor. So json_check_jsonpath_needed() will return true. Then: static void jsonb_subscript_transform(SubscriptingRef *sbsref, List **indirection, ParseState *pstate, bool isSlice, bool isAssignment) { List *upperIndexpr = NIL; ListCell *idx; /* Determine the result type of the subscripting operation; always jsonb */ sbsref->refrestype = JSONBOID; sbsref->reftypmod = -1; if (isSlice || jsonb_check_jsonpath_needed(*indirection)) { jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); if (sbsref->refjsonbpath) return; } It will fall into the call to json_subscript_make_jsonpath(), and it cannot process [‘a’], so sbsref->refjsonbpath will be NULL, then it will fall through down to the logic of processing [‘a’]. That’s actually a waste of looping over the entire indirection list and making an unnecessary call to jsonb_subscript_make_jsonpath(). In json_check_jsonpath_needed(), we can only check the first item. Also, as jsonb_check_jsonpath_needed() already has the logic of checking is_slice, here in jsonb_subscript_transform(), we don’t need to check “isSlice” in the “if” condition at all. I made the change locally, and “make check” passed. My dirty change is like: ``` @@ -118,11 +118,11 @@ coerce_jsonpath_subscript_to_int4_or_text(ParseState *pstate, Node *subExpr) static bool jsonb_check_jsonpath_needed(List *indirection) { - ListCell *lc; + //ListCell *lc; - foreach(lc, indirection) - { - Node *accessor = lfirst(lc); + //foreach(lc, indirection) + //{ + Node *accessor = lfirst(list_head(indirection)); if (IsA(accessor, String) || IsA(accessor, A_Star)) return true; @@ -133,7 +133,8 @@ jsonb_check_jsonpath_needed(List *indirection) if (castNode(A_Indices, accessor)->is_slice) return true; } - } + //break; + //} return false; } @@ -401,7 +435,7 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, sbsref->refrestype = JSONBOID; sbsref->reftypmod = -1; - if (isSlice || jsonb_check_jsonpath_needed(*indirection)) + if (jsonb_check_jsonpath_needed(*indirection)) { jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); if (sbsref->refjsonbpath) ``` The other comment is that jsonb_subscript_make_jsonpath() can be optimized a little bit. When indices and dot notations are mixed, it will always hit the clause “if (api_from == NULL)”. In this case, memory of jpi has been allocated, and with the “if” we need to free the memory. So, we can pull up calculating jpi_from, if it is NULL, then we don’t need to allocate and free memory. My dirty change is like: ``` else if (IsA(accessor, A_Indices)) { + JsonPathParseItem *jpi_from = NULL; A_Indices *ai = castNode(A_Indices, accessor); + if (!ai->is_slice) + { + Assert(ai->uidx && !ai->lidx); + jpi_from = make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true); + if (jpi_from == NULL) + { + /* + * Break out of the loop if the subscript is not a + * non-null integer constant, so that we can fall back to + * jsonb subscripting logic. + * + * This is needed to handle cases with mixed usage of SQL + * standard json simplified accessor syntax and PostgreSQL + * jsonb subscripting syntax, e.g: + * + * select (jb).a['b'].c from jsonb_table; + * + * where dot-notation (.a and .c) is the SQL standard json + * simplified accessor syntax, and the ['b'] subscript is + * the PostgreSQL jsonb subscripting syntax, because 'b' + * is not a non-null constant integer and cannot be used + * for json array access. + * + * In this case, we cannot create a JsonPath item, so we + * break out of the loop and let + * jsonb_subscript_transform() handle this indirection as + * a PostgreSQL jsonb subscript. + */ + break; + } + } + jpi = make_jsonpath_item(jpiIndexArray); jpi->value.array.nelems = 1; jpi->value.array.elems = palloc(sizeof(jpi->value.array.elems[0])); @@ -303,12 +337,12 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti } else { - JsonPathParseItem *jpi_from = NULL; + //JsonPathParseItem *jpi_from = NULL; - Assert(ai->uidx && !ai->lidx); - jpi_from = make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true); - if (jpi_from == NULL) - { + //Assert(ai->uidx && !ai->lidx); + //jpi_from = make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true); + //if (jpi_from == NULL) + //{ /* * Break out of the loop if the subscript is not a * non-null integer constant, so that we can fall back to @@ -331,10 +365,10 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti * jsonb_subscript_transform() handle this indirection as * a PostgreSQL jsonb subscript. */ - pfree(jpi->value.array.elems); - pfree(jpi); - break; - } + // pfree(jpi->value.array.elems); + // pfree(jpi); + // break; + //} jpi->value.array.elems[0].from = jpi_from; jpi->value.array.elems[0].to = NULL; ``` With this change, “make check” also passed. The third comment is about test cases. I only see tests for indices following dot notation, like data.a[‘b’], but not the reverse. Let’s add a few test cases for dot notation following indices, like data[‘a’].b. The last comment is about error message: ``` evantest=# select data.a from test_jsonb_types; ERROR: missing FROM-clause entry for table "data" LINE 1: select data.a from test_jsonb_types; ``` “Missing FROM-clause entry” is quite confusing and not providing much useful hint to resolve the problem. When I first saw this error, I was stuck. Only after read through the commit comments, I figured out the problem. For end users, we should provide some more meaningful error messages. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
David G. Johnston <david.g.johnston@gmail.com> — 2025-09-02T03:53:34Z
On Monday, September 1, 2025, Chao Li <li.evan.chao@gmail.com> wrote: > > The last comment is about error message: > > ``` > evantest=# select data.a from test_jsonb_types; > ERROR: missing FROM-clause entry for table "data" > LINE 1: select data.a from test_jsonb_types; > ``` > > “Missing FROM-clause entry” is quite confusing and not providing much > useful hint to resolve the problem. When I first saw this error, I was > stuck. Only after read through the commit comments, I figured out the > problem. For end users, we should provide some more meaningful error > messages. > > I don’t think it’s fair to blame this patch set for this. If you want to reference a component of a composite column you need to write ([tbl.]col).field otherwise the parser gets confused because it wants the thing prior to “field” to be a table name if you don’t use the parentheses. This comes up all the time if one uses composite typed columns. I’ll agree this isn’t the greatest error message but I’d suggest starting a new thread if you want to see about improving it. David J.
-
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-02T05:27:29Z
> On Sep 2, 2025, at 11:53, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Monday, September 1, 2025, Chao Li <li.evan.chao@gmail.com <mailto:li.evan.chao@gmail.com>> wrote: >> >> The last comment is about error message: >> >> ``` >> evantest=# select data.a from test_jsonb_types; >> ERROR: missing FROM-clause entry for table "data" >> LINE 1: select data.a from test_jsonb_types; >> ``` >> >> “Missing FROM-clause entry” is quite confusing and not providing much useful hint to resolve the problem. When I first saw this error, I was stuck. Only after read through the commit comments, I figured out the problem. For end users, we should provide some more meaningful error messages. >> > > I don’t think it’s fair to blame this patch set for this. If you want to reference a component of a composite column you need to write ([tbl.]col).field otherwise the parser gets confused because it wants the thing prior to “field” to be a table name if you don’t use the parentheses. This comes up all the time if one uses composite typed columns. I’ll agree this isn’t the greatest error message but I’d suggest starting a new thread if you want to see about improving it. > > David J. > Sure, we can discuss the topic separately. -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-03T02:16:37Z
Hi Chao, Thanks for reviewing! I've attached v15, which addresses your feedback. I didn't make all the changes you suggested, please see my individual comments below and in the next email. On Mon, Sep 1, 2025 at 7:59 PM Chao Li <li.evan.chao@gmail.com> wrote: > --- a/src/backend/parser/parse_node.c > +++ b/src/backend/parser/parse_node.c > @@ -244,7 +244,7 @@ transformContainerSubscripts(ParseState *pstate, > Node > *containerBase, > Oid containerType, > int32 > containerTypMod, > - List *indirection, > + List > **indirection, > bool isAssignment) > { > SubscriptingRef *sbsref; > @@ -280,7 +280,7 @@ transformContainerSubscripts(ParseState *pstate, > * element. If any of the items are slice specifiers > (lower:upper), then > * the subscript expression means a container slice operation. > */ > - foreach(idx, indirection) > + foreach(idx, *indirection) > { > A_Indices *ai = lfirst_node(A_Indices, idx); > > Should this foreach be pull up to out of transformContainerSubscripts()? > Originally transformContainerSubscripts() runs only once, but now it’s > placed in a while loop, and every time this foreach needs to go through the > entire indirection list, which are duplicated computation. > > > After revisiting the code, I think this loop of checking is_slice should > be removed here. > > It seems only arrsubs cares about this is_slice flag. > > hstore_subs can only take a single indirection, so it can do a simple > check like: > > ``` > ai = initial_node(A_Indices, *indirection); > If (list_length(*indirection) != 1 || ai->is_slice) > { > ereport(…) > } > ``` > > jsonbsubs doesn’t need to check is_slice flag as well, I will explain that > in my next email tougher with my new comments. > > So that, “bool isSlace” can be removed from SubscriptTransform, and let > every individual subs check is_slice. > I don't like the idea of removing the "bool isSlice" argument from the *SubscriptTransform() function pointer. This patch series should make only minimal changes to the subscripting API. While it's true that each implementation can easily determine the boolean value of isSlice itself, I don't see a clear benefit to changing the interface. As you noted, arrsubs cares about isSlice; and users can also define custom data types that support subscripting, so the interface is not limited to built-in types. As the comment for *SubscriptTransform() explains: > (Of course, if the transform method > * does not care to support slicing, it can just throw an error if isSlice.) It's possible that in the end the "isSlice" argument isn't needed at all, but I don’t think this patch set is the right place for that refactor. Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-03T02:20:53Z
Hi Chao, Continue from my previous email. On Mon, Sep 1, 2025 at 8:32 PM Chao Li <li.evan.chao@gmail.com> wrote: > > > On Sep 2, 2025, at 10:59, Chao Li <li.evan.chao@gmail.com> wrote: > > jsonbsubs doesn’t need to check is_slice flag as well, I will explain that > in my next email tougher with my new comments. > > > > Now jsonb_check_json_needed() loops over all indirections: > > static bool > jsonb_check_jsonpath_needed(List *indirection) > { > ListCell *lc; > > foreach(lc, indirection) > { > Node *accessor = lfirst(lc); > > if (IsA(accessor, String)) > return true; > else > { > Assert(IsA(accessor, A_Indices)); > > if (castNode(A_Indices, accessor)->is_slice) > return true; > } > } > > return false; > } > > Let’s consider this statement: select data[‘a’][‘b’].c from jsonb_table. > > For this indirection list, first two elements are non-slice indices, and > the third is a string accessor. So json_check_jsonpath_needed() will return > true. > > Then: > > static void > jsonb_subscript_transform(SubscriptingRef *sbsref, > List **indirection, > ParseState *pstate, > bool isSlice, > bool isAssignment) > { > List *upperIndexpr = NIL; > ListCell *idx; > > /* Determine the result type of the subscripting operation; always jsonb */ > sbsref->refrestype = JSONBOID; > sbsref->reftypmod = -1; > > if (isSlice || jsonb_check_jsonpath_needed(*indirection)) > { > jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); > if (sbsref->refjsonbpath) > return; > } > > It will fall into the call to json_subscript_make_jsonpath(), and it > cannot process [‘a’], so sbsref->refjsonbpath will be NULL, then it will > fall through down to the logic of processing [‘a’]. > > That’s actually a waste of looping over the entire indirection list and > making an unnecessary call to jsonb_subscript_make_jsonpath(). > > In json_check_jsonpath_needed(), we can only check the first item. Also, > as jsonb_check_jsonpath_needed() already has the logic of checking > is_slice, here in jsonb_subscript_transform(), we don’t need to check > “isSlice” in the “if” condition at all. > > I made the change locally, and “make check” passed. > > My dirty change is like: > ``` > @@ -118,11 +118,11 @@ coerce_jsonpath_subscript_to_int4_or_text(ParseState > *pstate, Node *subExpr) > static bool > jsonb_check_jsonpath_needed(List *indirection) > { > - ListCell *lc; > + //ListCell *lc; > > - foreach(lc, indirection) > - { > - Node *accessor = lfirst(lc); > + //foreach(lc, indirection) > + //{ > + Node *accessor = lfirst(list_head(indirection)); > > if (IsA(accessor, String) || IsA(accessor, A_Star)) > return true; > @@ -133,7 +133,8 @@ jsonb_check_jsonpath_needed(List *indirection) > if (castNode(A_Indices, accessor)->is_slice) > return true; > } > - } > + //break; > + //} > > return false; > } > > @@ -401,7 +435,7 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, > sbsref->refrestype = JSONBOID; > sbsref->reftypmod = -1; > > - if (isSlice || jsonb_check_jsonpath_needed(*indirection)) > + if (jsonb_check_jsonpath_needed(*indirection)) > { > jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); > if (sbsref->refjsonbpath) > ``` > This change would give an incorrect result for an accessor like "[0].a" when the jsonb column data is a jsonb object instead of a jsonb array. I've added two new test cases to cover this scenario: In jsonb.sql, the newly added tests are: select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as (jb).a select (jb)[1].a from test_jsonb_dot_notation; -- returns NULL In v5, I included isSlice as an argument of jsonb_check_jsonpath_needed(), so we no longer need to check the is_slice field of each indirection element. That should make the check more efficient. In jsonb_subscript_make_jsonpath(), I also moved the check for cases like ['a'] earlier, as you suggested. These changes should eliminate some unnecessary comparisons and allocate then immediately free. I also want to point out that we should not recommend users mix the standard simplified accessor syntax with the pre-standard jsonb subscripting, because: 1. It is confusing. I would assume that a user who uses dot-notation understands its definition. As we discussed earlier, (jb).a and (jb)['a'] can return different results: the former is in "lax" mode and with a conditional array wrapper, while the latter has neither. If a user queries something like (jb)['a'].b['c'], I think most likely she wants either (jb)['a']['b']['c'] or (jb).a.b.c. 2. As you said, it hurts performance. When a pre-standard non-integer subscript appears before a dot-notation, we bail out of creating a jsonpath. Alternatively, if a user really wants (jb)['a'].b['c'], it could be explicitly written as (((jb)['a']).b)['c']. This avoids unnecessary looping and bailout. In my earlier revisions (up to v12), I actually emitted warnings when users mixed syntaxes. Starting from v13, I removed the warning based on Jian’s feedback. Here's the context: On Wed, Aug 20, 2025 at 9:53 PM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> > wrote: > >> after applied v12-0001 to v12-0006 >> + /* emit warning conditionally to minimize duplicate warnings */ >> + if (list_length(*indirection) > 0) >> + ereport(WARNING, >> + errcode(ERRCODE_WARNING), >> + errmsg("mixed usage of jsonb simplified accessor syntax and jsonb >> subscripting."), >> + errhint("use dot-notation for member access, or use non-null integer >> constants subscripting for array access."), >> + parser_errposition(pstate, warning_location)); >> >> src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]; >> WARNING: mixed usage of jsonb simplified accessor syntax and jsonb >> subscripting. >> LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... >> ^ >> HINT: use dot-notation for member access, or use non-null integer >> constants subscripting for array access. >> ERROR: subscript type bigint is not supported >> LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]... >> ^ >> HINT: jsonb subscript must be coercible to either integer or text. >> >> The above example looks very bad. location printed twice, hint message >> is different. >> two messages level (ERROR, WARNING). >> ... omitting some irelavent comments... > > I see your confusion about the WARNING/ERROR messages. My intent was > to encourage users to use consistent syntax flavors rather than mixing > the SQL standard JSON simplified accessor with the pre-standard > PostgreSQL jsonb subscripting. In the meantime, I want to support > mixed syntax flavors as much as possible. However, your feedback makes > it clear that users don’t need that level of details. So, in v13 I’ve > removed the WARNING message and instead added a comment explaining how > we support mixed syntaxes. > > Here's the relevant diff between v12 and v13, hope it helps: > > --- a/src/backend/utils/adt/jsonbsubs.c > +++ b/src/backend/utils/adt/jsonbsubs.c > @@ -247,7 +247,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List > **indirection, Subscripti > ListCell *lc; > Datum jsp; > int pathlen = 0; > - int warning_location = -1; > > sbsref->refupperindexpr = NIL; > sbsref->reflowerindexpr = NIL; > @@ -311,10 +310,27 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, > List **indirection, Subscripti > if (jpi_from == NULL) > { > /* > - * postpone emitting the warning > until the end of this > - * function > + * Break out of the loop if the > subscript is not a > + * non-null integer constant, so > that we can fall back to > + * jsonb subscripting logic. > + * > + * This is needed to handle cases > with mixed usage of SQL > + * standard json simplified > accessor syntax and PostgreSQL > + * jsonb subscripting syntax, e.g: > + * > + * select (jb).a['b'].c from > jsonb_table; > + * > + * where dot-notation (.a and .c) > is the SQL standard json > + * simplified accessor syntax, and > the ['b'] subscript is > + * the PostgreSQL jsonb > subscripting syntax, because 'b' > + * is not a non-null constant > integer and cannot be used > + * for json array access. > + * > + * In this case, we cannot create > a JsonPath item, so we > + * break out of the loop and let > + * jsonb_subscript_transform() > handle this indirection as > + * a PostgreSQL jsonb subscript. > */ > - warning_location = > exprLocation(ai->uidx); > pfree(jpi->value.array.elems); > pfree(jpi); > break; > @@ -360,14 +376,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, > List **indirection, Subscripti > > *indirection = list_delete_first_n(*indirection, pathlen); > > - /* emit warning conditionally to minimize duplicate warnings */ > - if (list_length(*indirection) > 0) > - ereport(WARNING, > - errcode(ERRCODE_WARNING), > - errmsg("mixed usage of jsonb simplified > accessor syntax and jsonb subscripting."), > - errhint("use dot-notation for member > access, or use non-null integer constants subscripting for array access."), > - parser_errposition(pstate, > warning_location)); > - > jsp = jsonPathFromParseResult(&jpres, 0, NULL); > I think we should definitely add detailed documentation about mixed syntax. However, I'm not convinced it's worth putting more effort into optimizing performance for those cases. Alternatively, we could fail outright in those cases, but I don't think that would be the best user experience. Thoughts? On Mon, Sep 1, 2025 at 8:32 PM Chao Li <li.evan.chao@gmail.com> wrote: > The other comment is that jsonb_subscript_make_jsonpath() can be optimized > a little bit. When indices and dot notations are mixed, it will always hit > the clause “if (api_from == NULL)”. In this case, memory of jpi has been > allocated, and with the “if” we need to free the memory. So, we can pull up > calculating jpi_from, if it is NULL, then we don’t need to allocate and > free memory. My dirty change is like: > > ``` > else if (IsA(accessor, A_Indices)) > { > + JsonPathParseItem *jpi_from = NULL; > A_Indices *ai = castNode(A_Indices, accessor); > > + if (!ai->is_slice) > + { > + Assert(ai->uidx && !ai->lidx); > + jpi_from = make_jsonpath_item_expr(pstate, > ai->uidx, &sbsref->refupperindexpr, true); > + if (jpi_from == NULL) > + { > + /* > + * Break out of the loop if the > subscript is not a > + * non-null integer constant, so > that we can fall back to > + * jsonb subscripting logic. > + * > + * This is needed to handle cases > with mixed usage of SQL > + * standard json simplified > accessor syntax and PostgreSQL > + * jsonb subscripting syntax, e.g: > + * > + * select (jb).a['b'].c from > jsonb_table; > + * > + * where dot-notation (.a and .c) > is the SQL standard json > + * simplified accessor syntax, and > the ['b'] subscript is > + * the PostgreSQL jsonb > subscripting syntax, because 'b' > + * is not a non-null constant > integer and cannot be used > + * for json array access. > + * > + * In this case, we cannot create > a JsonPath item, so we > + * break out of the loop and let > + * jsonb_subscript_transform() > handle this indirection as > + * a PostgreSQL jsonb subscript. > + */ > + break; > + } > + } > + > jpi = make_jsonpath_item(jpiIndexArray); > jpi->value.array.nelems = 1; > jpi->value.array.elems = > palloc(sizeof(jpi->value.array.elems[0])); > @@ -303,12 +337,12 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, > List **indirection, Subscripti > } > else > { > - JsonPathParseItem *jpi_from = NULL; > + //JsonPathParseItem *jpi_from = NULL; > > - Assert(ai->uidx && !ai->lidx); > - jpi_from = make_jsonpath_item_expr(pstate, > ai->uidx, &sbsref->refupperindexpr, true); > - if (jpi_from == NULL) > - { > + //Assert(ai->uidx && !ai->lidx); > + //jpi_from = > make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true); > + //if (jpi_from == NULL) > + //{ > /* > * Break out of the loop if the > subscript is not a > * non-null integer constant, so > that we can fall back to > @@ -331,10 +365,10 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, > List **indirection, Subscripti > * jsonb_subscript_transform() > handle this indirection as > * a PostgreSQL jsonb subscript. > */ > - pfree(jpi->value.array.elems); > - pfree(jpi); > - break; > - } > + // pfree(jpi->value.array.elems); > + // pfree(jpi); > + // break; > + //} > > jpi->value.array.elems[0].from = jpi_from; > jpi->value.array.elems[0].to = NULL; > ``` > > With this change, “make check” also passed. > Fixed. Thank you! On Mon, Sep 1, 2025 at 8:32 PM Chao Li <li.evan.chao@gmail.com> wrote: > The third comment is about test cases. I only see tests for indices > following dot notation, like data.a[‘b’], but not the reverse. Let’s add a > few test cases for dot notation following indices, like data[‘a’].b. > I believe both cases are already covered. Please take a look at src/test/regress/jsonb.sql; there's a "-- mixed syntax" section. Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-03T03:56:10Z
Hi Alex, Thanks for addressing my comments. I have a few follow up comments: > On Sep 3, 2025, at 10:16, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > > > I don't like the idea of removing the "bool isSlice" argument from the > *SubscriptTransform() function pointer. This patch series should make > only minimal changes to the subscripting API. While it's true that > each implementation can easily determine the boolean value of isSlice > itself, I don't see a clear benefit to changing the interface. As you > noted, arrsubs cares about isSlice; and users can also define custom > data types that support subscripting, so the interface is not limited > to built-in types. > > As the comment for *SubscriptTransform() explains: >> (Of course, if the transform method >> * does not care to support slicing, it can just throw an error if isSlice.) > > It's possible that in the end the "isSlice" argument isn't needed at > all, but I don’t think this patch set is the right place for that > refactor. > I agree we should keep the change minimum and tightly related to the core feature. My original suggestion was to move /* * Detect whether any of the indirection items are slice specifiers. * * A list containing only simple subscripts refers to a single container * element. If any of the items are slice specifiers (lower:upper), then * the subscript expression means a container slice operation. */ foreach(idx, *indirection) { Node *ai = lfirst(idx); if (IsA(ai, A_Indices) && castNode(A_Indices, ai)->is_slice) { isSlice = true; break; } } To out of transformContainerSubscripts(). Because the function was called only once under “if”, now this patch change it to be called under “while”, which brings two issues: * Original it was O(n) as it was under “if”, now it is O(n2) as it is under “while”. * Logically it feels wrong now. Because this loops over the entire indirection list to check is_slice, while the subsequent sbsroutines->transform() may only process a portion (prefix) of indirection list. Say, the 5th element is slice, but the first sbsroutines-transform() call will only process the first 3 elements of indirection list, then pass true to the first transform() call sounds wrong. if we pull the loop out of transformContainerSubscripts(), then we have to add a new parameter “bool is_slice” to it. But after some researching, I found that making that change is more complicated than removing “is_slice” parameter from SubscriptTransform(). That’s why I ended up suggesting removing “is_slice” from SubscriptTransform(). Does that sound reasonable? -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-03T06:56:44Z
> On Sep 3, 2025, at 10:20, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > This change would give an incorrect result for an accessor like > "[0].a" when the jsonb column data is a jsonb object instead of a > jsonb array. I've added two new test cases to cover this scenario: > > In jsonb.sql, the newly added tests are: > select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as (jb).a > select (jb)[1].a from test_jsonb_dot_notation; -- returns NULL I think the latest patch is still wrong. Ultimately, dot notation “.a", index “[0] and slice "[1:4]” rely on jsonpath, and subscript [“a”] relies on the rest logic in jsonb_subscript_transform() in “foreach”. Now “jsonb_check_jsonpath_needed()” checks only dot nation and slice, but not checking index case. So that reason why your case “select (jb)[0].a” works is because the second indirection is a dot nation. However, “select (jb)[0][‘a’]” will not work. See my test: ``` evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name; name --------- "Alice" (1 row) evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name']; jsonb ------- (1 row) ``` In my test, (jsonb)[0][’name’] should also return “Alice”. So, end up, “jsonb_check_jsonpath_needed()” only does incomplete and inaccurate checks, only jsonb_subscript_make_jsonpath() can make an accurate decision and return a null jsonpath upon subscript “[‘a’]”. As a result, “json_check_jsonpath_needed()” feels not needed at all. In jsonb_subscript_transform(), just go ahead to run jsonb_subscript_make_jsonpath() first, if returned jsonpath is NULL, then run rest of logic. With my dirty change of removing json_check_jsonpath_needed: ``` chaol@ChaodeMacBook-Air postgresql % git diff diff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.c index 374040b3b4e..d9faab5c799 100644 --- a/src/backend/utils/adt/jsonbsubs.c +++ b/src/backend/utils/adt/jsonbsubs.c @@ -416,12 +416,12 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, sbsref->refrestype = JSONBOID; sbsref->reftypmod = -1; - if (jsonb_check_jsonpath_needed(*indirection, isSlice)) - { + //if (jsonb_check_jsonpath_needed(*indirection, isSlice)) + //{ jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); if (sbsref->refjsonbpath) return; - } + //} /* * We reach here only in two cases: (a) the JSON simplified accessor is ``` You can see: ``` evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name']; jsonb --------- "Alice" (1 row) evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name; name --------- "Alice" (1 row) ``` Then I found “make check” failed. For example the first broken test case: ``` @@ -4998,7 +4998,7 @@ select ('123'::jsonb)[0]; jsonb ------- - + 123 (1 row) ``` The test expected an empty result, which implies “strict” mode. But the problem is, which mode should be the default? JSON_QUERY() uses “lax” as default mode. And from Peter Eisentraut’s blog: https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-whats-new ``` The semantics of this are defined in terms of JSON_QUERY and JSON_VALUE constructions (which have been available since SQL:2016), so this really just syntactic sugar. ``` Also feels like “lax” should be the default mode. If that is true, then my dirty change of removing “json_check_jsonpath_need()” works properly. The current logic with this patch sounds strange. Because “json_check_jsonpath_need()” iterate through unprocessed indirections to decide if jsonpath is needed (lax mode). With this logic: 1) if index [0] directly following dot notation, like (data).a[0], it’s lax mode 2) if index [0] directly following subscript [‘a’], like (data)[‘a’][0], it’s strict mode 3) if index [0] directly following the data column, then if there is a dot nation in indirection list, use lax mode, otherwise strict mode. For the failed test case, as there is no more indirection following [0], so it expected strict mode. I wonder where this behavior is defined? With my change, 1) and 2) are the same, for 3), if index [0] directly following the data column, regardless what indirections are followed, it’s by default lax mode. So, I think this is a design decision. Maybe I missed something from your previous design, but I don’t find anything about that from the commit comments. I feel this would be better aligned with 1) and 2). Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-03T07:11:03Z
> On Sep 3, 2025, at 10:16, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > > <v15-0007-Implement-jsonb-wildcard-member-accessor.patch><v15-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v15-0004-Extract-coerce_jsonpath_subscript.patch><v15-0006-Implement-Jsonb-subscripting-with-slicing.patch><v15-0005-Implement-read-only-dot-notation-for-jsonb.patch><v15-0003-Export-jsonPathFromParseResult.patch><v15-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch> I have a few more other small comments: 1 - 0002 ``` diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 6e8fd42c612..ff104c95311 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -441,8 +441,9 @@ transformIndirection(ParseState *pstate, A_Indirection *ind) ListCell *i; /* - * We have to split any field-selection operations apart from - * subscripting. Adjacent A_Indices nodes have to be treated as a single + * Combine field names and subscripts into a single indirection list, as + * some subscripting containers, such as jsonb, support field access using + * dot notation. Adjacent A_Indices nodes have to be treated as a single * multidimensional subscript operation. */ foreach(i, ind->indirection) @@ -460,19 +461,43 @@ transformIndirection(ParseState *pstate, A_Indirection *ind) } else { - Node *newresult; - Assert(IsA(n, String)); + subscripts = lappend(subscripts, n); + } + } ``` I raised this comment in my previous email, I guess you missed this one. With this change, the 3 clauses are quite similar, the if-elseif-else can be simplified as: If (!IsA(n, A_Indices) && !Is_A(n, A_Start)) Assert(IsA(n, String)); subscripts = lappend(subscripts, n) 2 - 0001 ``` diff --git a/src/include/nodes/subscripting.h b/src/include/nodes/subscripting.h index 234e8ad8012..5d576af346f 100644 --- a/src/include/nodes/subscripting.h +++ b/src/include/nodes/subscripting.h @@ -71,6 +71,11 @@ struct SubscriptExecSteps; * does not care to support slicing, it can just throw an error if isSlice.) * See array_subscript_transform() for sample code. * + * The transform method receives a pointer to a list of raw indirections. + * This allows the method to parse a sublist of the indirections (typically + * the prefix) and modify the original list in place, enabling the caller to + * either process the remaining indirections differently or raise an error. + * * The transform method is also responsible for identifying the result type ``` This is nit comment about the wording. “This allows the method to parse a sublist..." sounds like more from the patch perspective. It’s more suitable for git commit comments, but code comment, I feel it’s better to be more general, for example: * The transform method receives a pointer to a list of raw indirections. * It can parse a sublist (typically the prefix) of these indirections and * modify the original list in place, allowing the caller to either handle * the remaining indirections differently or raise an error. 3 - 0005 ``` diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index f1569879b52..eac31b097e4 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -3320,50 +3320,54 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, state->steps_len - 1); } + /* When slicing, individual subscript bounds can be omitted */ + if (!e) + { + sbsrefstate->upperprovided[i] = false; + sbsrefstate->upperindexnull[i] = true; + } + else { + sbsrefstate->upperprovided[i] = true; ``` This is also a nit comment about code format. “{" should be put to the next line of “else” according to other existing code. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-09T15:14:04Z
Hi Chao, Thank you so much for your review! I’ve attached v16. My current goal is to get consensus on 0001 - 0005 and move them toward commitment. I’d also appreciate feedback on 0006 - 0007 as follow-up. In v16, I addressed your feedback regarding the isSlice argument in SubscriptTransform() -- I removed it. I’ve also addressed your three other comments, sorry for missing them earlier! I did not change anything about the jb[0] array accessor for jsonb objects, will discuss that in a follow-up email. On Tue, Sep 2, 2025 at 8:56 PM Chao Li <li.evan.chao@gmail.com> wrote: > On Sep 3, 2025, at 10:16, Alexandra Wang <alexandra.wang.oss@gmail.com> > wrote: > I don't like the idea of removing the "bool isSlice" argument from the > *SubscriptTransform() function pointer. This patch series should make > only minimal changes to the subscripting API. While it's true that > each implementation can easily determine the boolean value of isSlice > itself, I don't see a clear benefit to changing the interface. As you > noted, arrsubs cares about isSlice; and users can also define custom > data types that support subscripting, so the interface is not limited > to built-in types. > > As the comment for *SubscriptTransform() explains: > >> (Of course, if the transform method >> * does not care to support slicing, it can just throw an error if >> isSlice.) > > > It's possible that in the end the "isSlice" argument isn't needed at > all, but I don’t think this patch set is the right place for that > refactor. > > I agree we should keep the change minimum and tightly related to the core > feature. > > My original suggestion was to move > > /* > * Detect whether any of the indirection items are slice specifiers. > * > * A list containing only simple subscripts refers to a single container > * element. If any of the items are slice specifiers (lower:upper), then > * the subscript expression means a container slice operation. > */ > foreach(idx, *indirection) > { > Node *ai = lfirst(idx); > > if (IsA(ai, A_Indices) && castNode(A_Indices, ai)->is_slice) > { > isSlice = true; > break; > } > } > > To out of transformContainerSubscripts(). Because the function was called > only once under “if”, now this patch change it to be called under “while”, > which brings two issues: > > * Original it was O(n) as it was under “if”, now it is O(n2) as it is > under “while”. > * Logically it feels wrong now. Because this loops over the entire > indirection list to check is_slice, while the subsequent > sbsroutines->transform() may only process a portion (prefix) of indirection > list. Say, the 5th element is slice, but the first sbsroutines-transform() > call will only process the first 3 elements of indirection list, then pass > true to the first transform() call sounds wrong. > > if we pull the loop out of transformContainerSubscripts(), then we have to > add a new parameter “bool is_slice” to it. But after some researching, I > found that making that change is more complicated than removing “is_slice” > parameter from SubscriptTransform(). That’s why I ended up suggesting > removing “is_slice” from SubscriptTransform(). > > Does that sound reasonable? > Yes, that makes total sense! The logical defect you found can indeed cause bugs in arraysubs. In 0002, I’ve added a test in arrays.sql to demonstrate the potential bug and updated the code accordingly: This change also updates the SubscriptTransform() API to remove the "bool isSlice" argument. Each container’s transform function now determines slice-ness itself, since it may only process a prefix of the indirection list. For example, array_subscript_transform() stops at dot notation, so "isSlice" should not depend on slice specifiers that appear afterward. Thanks, Alex -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-09T23:53:04Z
Hi Chao, On Tue, Sep 2, 2025 at 11:57 PM Chao Li <li.evan.chao@gmail.com> wrote: > On Sep 3, 2025, at 10:20, Alexandra Wang <alexandra.wang.oss@gmail.com> > wrote: > > This change would give an incorrect result for an accessor like > "[0].a" when the jsonb column data is a jsonb object instead of a > jsonb array. I've added two new test cases to cover this scenario: > > In jsonb.sql, the newly added tests are: > select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as > (jb).a > select (jb)[1].a from test_jsonb_dot_notation; -- returns NULL > > > I think the latest patch is still wrong. Ultimately, dot notation “.a", > index “[0] and slice "[1:4]” rely on jsonpath, and subscript [“a”] relies > on the rest logic in jsonb_subscript_transform() in “foreach”. > > Now “jsonb_check_jsonpath_needed()” checks only dot nation and slice, but > not checking index case. So that reason why your case “select (jb)[0].a” > works is because the second indirection is a dot nation. However, “select > (jb)[0][‘a’]” will not work. See my test: > > ``` > evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name; > name > --------- > "Alice" > (1 row) > > evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name']; > jsonb > ------- > > (1 row) > ``` > > In my test, (jsonb)[0][’name’] should also return “Alice”. > > So, end up, “jsonb_check_jsonpath_needed()” only does incomplete and > inaccurate checks, only jsonb_subscript_make_jsonpath() can make an > accurate decision and return a null jsonpath upon subscript “[‘a’]”. > > As a result, “json_check_jsonpath_needed()” feels not needed at all. In > jsonb_subscript_transform(), just go ahead to run > jsonb_subscript_make_jsonpath() first, if returned jsonpath is NULL, then > run rest of logic. > > With my dirty change of removing json_check_jsonpath_needed: > ``` > chaol@ChaodeMacBook-Air postgresql % git diff > diff --git a/src/backend/utils/adt/jsonbsubs.c > b/src/backend/utils/adt/jsonbsubs.c > index 374040b3b4e..d9faab5c799 100644 > --- a/src/backend/utils/adt/jsonbsubs.c > +++ b/src/backend/utils/adt/jsonbsubs.c > @@ -416,12 +416,12 @@ jsonb_subscript_transform(SubscriptingRef *sbsref, > sbsref->refrestype = JSONBOID; > sbsref->reftypmod = -1; > > - if (jsonb_check_jsonpath_needed(*indirection, isSlice)) > - { > + //if (jsonb_check_jsonpath_needed(*indirection, isSlice)) > + //{ > jsonb_subscript_make_jsonpath(pstate, indirection, sbsref); > if (sbsref->refjsonbpath) > return; > - } > + //} > > /* > * We reach here only in two cases: (a) the JSON simplified > accessor is > ``` > > You can see: > ``` > evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name']; > jsonb > --------- > "Alice" > (1 row) > > evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name; > name > --------- > "Alice" > (1 row) > ``` > > Then I found “make check” failed. For example the first broken test case: > > ``` > @@ -4998,7 +4998,7 @@ > select ('123'::jsonb)[0]; > jsonb > ------- > - > + 123 > (1 row) > ``` > > The test expected an empty result, which implies “strict” mode. > > But the problem is, which mode should be the default? JSON_QUERY() uses > “lax” as default mode. And from Peter Eisentraut’s blog: > https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-whats-new > > ``` > The semantics of this are defined in terms of JSON_QUERY and JSON_VALUE constructions > (which have been available since SQL:2016), so this really just syntactic > sugar. > ``` > > Also feels like “lax” should be the default mode. If that is true, then my > dirty change of removing “json_check_jsonpath_need()” works properly. > > The current logic with this patch sounds strange. Because > “json_check_jsonpath_need()” iterate through unprocessed indirections to > decide if jsonpath is needed (lax mode). With this logic: > > 1) if index [0] directly following dot notation, like (data).a[0], it’s > lax mode > 2) if index [0] directly following subscript [‘a’], like (data)[‘a’][0], > it’s strict mode > 3) if index [0] directly following the data column, then if there is a dot > nation in indirection list, use lax mode, otherwise strict mode. For the > failed test case, as there is no more indirection following [0], so it > expected strict mode. > > I wonder where this behavior is defined? > > With my change, 1) and 2) are the same, for 3), if index [0] directly > following the data column, regardless what indirections are followed, it’s > by default lax mode. > > So, I think this is a design decision. Maybe I missed something from your > previous design, but I don’t find anything about that from the commit > comments. I feel this would be better aligned with 1) and 2). > Thanks for investigating this, and for the great questions! Here’s a bit of background. The pre-standard jsonb subscripting feature [1] was introduced in Postgres 14. Specifically, support for queries like (jb)[0] and (jb)[0]['a'] was added by this commit: commit 676887a3b0b8e3c0348ac3f82ab0d16e9a24bd43 (HEAD) Author: Alexander Korotkov <akorotkov@postgresql.org> Date: Sun Jan 31 23:50:40 2021 +0300 Implementation of subscripting for jsonb Without my patch, from version 14 through the current master branch, all supported versions of Postgres return the following results: test=# select ('123'::jsonb)[0]; jsonb ------- (1 row) test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]; jsonb ------- (1 row) test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['age']; jsonb ------- (1 row) The simplified accessor syntax defined in the SQL standard includes dot-notation access as well as integer-based subscripts. It does not include text-based subscripts such as ['a']. Therefore, the only overlap between the pre-standard jsonb subscripting and the standard simplified accessor is non-slicing integer-based subscripts. (Since the pre-standard jsonb subscripting never supported slicing, we don’t need to consider that case either when comparing the two syntaxes.) When we compare the two syntaxes for non-slicing integer-based subscripts, the only discrepancy is that for a jsonb object jb, (jb)[0] in the pre-standard syntax returns NULL, while (jb)[0] in the standard syntax returns (jb) itself. As long as the integer subscript is non-zero, both syntaxes return the same results. I think this (jb)[0] case is rather trivial. However, since it's been behaving the pre-standard way since PG14, I hesitate to change the existing behavior as part of my patches, and I feel it could be a bikeshedding kind of discussion in a separate thread. The main goal of my patch is to add dot-notation. For now, my intention is for cases like (jb)[0].a to work in the standard way, because if a user chooses dot-notation instead of text-based subscripting, they are likely expecting the standard behavior. Thoughts? [1] https://www.postgresql.org/docs/current/datatype-json.html#JSONB-SUBSCRIPTING -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-10T03:52:05Z
> <v16-0007-Implement-jsonb-wildcard-member-accessor.patch><v16-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v16-0004-Extract-coerce_jsonpath_subscript.patch><v16-0006-Implement-Jsonb-subscripting-with-slicing.patch><v16-0005-Implement-read-only-dot-notation-for-jsonb.patch><v16-0003-Export-jsonPathFromParseResult.patch><v16-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch> A few more comment for v16. 1 - 0002 ``` --- a/contrib/hstore/hstore_subs.c +++ b/contrib/hstore/hstore_subs.c - if (isSlice || list_length(*indirection) != 1) + if (ai->is_slice || list_length(*indirection) != 1) ``` We should put list_length() check before ai->is_slace. Because when indirection length is greater than 1, it take a single check; other it may need to check the both conditions. 2 - 0002 - also in hstore_subs.c ``` *indirection = NIL; ``` We should do “list_delete_first_n(indirection, 1)”. 3 - 0003 ``` +Datum +jsonPathFromParseResult(JsonPathParseResult *jsonpath, int estimated_len, + struct Node *escontext) +{ ``` `jsonpath` is not changed in this function, so it should be defined as a const: “const JsonPathParseResult *jsonpath”. 4 - 0004 ``` + Oid targets[2] = {INT4OID, TEXTOID}; + + /* + * Jsonb can handle multiple subscript types, but cases when a + * subscript could be coerced to multiple target types must be + * avoided, similar to overloaded functions. It could be possibly + * extend with jsonpath in the future. + */ + for (int i = 0; i < 2; i++) ``` This is a nit optimization. We can do: Oid targets[] = {INT4OID, TEXTOID}; For (int I = 0; I < sizeof(targets) / sizeof(targets[0]); I ++) This way lets the compiler to decide length of “targets”. So that avoids hard code “2” in “for”. If we need to add more elements to “targets”, we can just add it without updating “for”. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-10T04:03:36Z
> On Sep 10, 2025, at 07:53, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > I think this (jb)[0] case is rather trivial. However, since it's been > behaving the pre-standard way since PG14, I hesitate to change the > existing behavior as part of my patches, and I feel it could be a > bikeshedding kind of discussion in a separate thread. I am fine to retain the PG14 behavior. But the confusion part is: ``` evantest=# select ('{"name": "Alice", "birthday": {"year": 2000, "month": 8}}'::jsonb)[0]['birthday']['year']; jsonb ------- (1 row) evantest=# select ('{"name": "Alice", "birthday": {"year": 2000, "month": 8}}'::jsonb)[0]['birthday'].year; year ------ 2000 (1 row) ``` As long as the expression containing a dot notation, “[0]” behaves differently from PG14. I agree we should not recommend the mixed usages of `[`a`]` and `.a`, but we haven’t prevented from that. Given they are different meanings, `[‘a’]` implies strict mode and `.a` implies lax mode, for a complex JSON object, users may intentionally mix use both, which sounds reasonable. I don’t see you have updated any documentation yet. I don’t want to block you because of this issue. As long as you state the behavior clearly in the document, I am happy to let it go. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-10T16:40:33Z
Hi Chao, On Tue, Sep 9, 2025 at 8:52 PM Chao Li <li.evan.chao@gmail.com> wrote: > > <v16-0007-Implement-jsonb-wildcard-member-accessor.patch> > <v16-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch> > <v16-0004-Extract-coerce_jsonpath_subscript.patch> > <v16-0006-Implement-Jsonb-subscripting-with-slicing.patch> > <v16-0005-Implement-read-only-dot-notation-for-jsonb.patch> > <v16-0003-Export-jsonPathFromParseResult.patch> > <v16-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch> > > > A few more comment for v16. > > 1 - 0002 > ``` > --- a/contrib/hstore/hstore_subs.c > +++ b/contrib/hstore/hstore_subs.c > > - if (isSlice || list_length(*indirection) != 1) > + if (ai->is_slice || list_length(*indirection) != 1) > ``` > > We should put list_length() check before ai->is_slace. Because when > indirection length is greater than 1, it take a single check; other it may > need to check the both conditions. > > 2 - 0002 - also in hstore_subs.c > ``` > *indirection = NIL; > ``` > > We should do “list_delete_first_n(indirection, 1)”. > > 3 - 0003 > ``` > +Datum > +jsonPathFromParseResult(JsonPathParseResult *jsonpath, int estimated_len, > + struct Node *escontext) > +{ > ``` > > `jsonpath` is not changed in this function, so it should be defined as a > const: “const JsonPathParseResult *jsonpath”. > > 4 - 0004 > ``` > + Oid targets[2] = {INT4OID, TEXTOID}; > + > + /* > + * Jsonb can handle multiple subscript types, but cases when a > + * subscript could be coerced to multiple target types must be > + * avoided, similar to overloaded functions. It could be possibly > + * extend with jsonpath in the future. > + */ > + for (int i = 0; i < 2; i++) > ``` > > This is a nit optimization. We can do: > > Oid targets[] = {INT4OID, TEXTOID}; > For (int I = 0; I < sizeof(targets) / sizeof(targets[0]); I ++) > > This way lets the compiler to decide length of “targets”. So that avoids > hard code “2” in “for”. If we need to add more elements to “targets”, we > can just add it without updating “for”. > Thanks for the feedback! Here’s v17. I’ve addressed 1 - 3 exactly as you suggested. For 4, I made the targets array const and used the lengthof macro to determine the array length. Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-10T23:56:40Z
Hi Chao, On Tue, Sep 9, 2025 at 9:03 PM Chao Li <li.evan.chao@gmail.com> wrote: > I don’t see you have updated any documentation yet. I don’t want to block > you because of this issue. As long as you state the behavior clearly in the > document, I am happy to let it go. > Thanks for the reminder for documentation! I've attached v18. v18 adds documentation in 0005. I've also updated the commit messages for 0003, 0004, and 0007 to include reviewers' names. For everyone: my immediate goal is to move 0001 - 0005 forward, as I feel more confident about them now. In the meantime, I'd also appreciate code review for 0006 - 0007. Best, Alex
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-15T18:32:52Z
Hi, I’ve rebased the patch set, no changes since v18. Best, Alex
-
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-19T20:40:32Z
Hi there, I've attached v20. It has the following changes: 1. New 0001: It adds test coverage for single-argument functions and casting for jsonb expressions. This ensures that the relevant behavior changes become visible in 0005 when field access via dot-notation is introduced. Specifically, once member access through dot-notation is enabled for jsonb, we can no longer write single-argument functions (including casts) in dot form for jsonb. For example: Before 0005: select ('{"a":1}'::jsonb).jsonb_typeof; jsonb_typeof -------------- object (1 row) select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb).name; name -------------------------------------- [{"name": "alice"}, {"name": "bob"}] (1 row) After 0005: select ('{"a":1}'::jsonb).jsonb_typeof; jsonb_typeof -------------- (1 row) select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb).name; name ------------------ ["alice", "bob"] (1 row) In the meanwhile, these functions still return correct results through standard syntax: test=# select jsonb_typeof(('{"a":1}'::jsonb)); jsonb_typeof -------------- object (1 row) test=# select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb)::name; name -------------------------------------- [{"name": "alice"}, {"name": "bob"}] (1 row) I don't consider this behavior change a major issue, because the dot-form for single-argument functions is not standard SQL and seems to be PostgreSQL-specific. Still, it's worth highlighting here so users aren't surprised. 2. Refactored 0002: It combines and refactors v19-0001 and v19-0002. Instead of changing the existing transform() callback in SubscriptRoutines, it now introduces an additional callback, transform_partial(). This alternative transform method, used by jsonb, is more flexible: it accepts a wider range of indirection node types and can transform only a prefix of the indirection list. This avoids breaking compatibility for arrays, hstore, and external data types that supports subscripting. 3. 0003 and 0004 stay unchanged. They are both small and can be squashed into 0005. I leave them as-is for now for easier review. 4. Added two additional tests in 0005 for assignments using jsonb dot-notation, showing explicitly that assignment is not yet supported. 5. Removed 0006 (array slicing) and 0007 (wildcard) from the previous versions, as they need additional work. My immediate goal is to first reach consensus on the dot-notation implementation. Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-22T03:32:05Z
Looks like patch files need rebase. > On Sep 20, 2025, at 04:40, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > Hi there, > > I've attached v20. It has the following changes: > > 1. New 0001: It adds test coverage for single-argument functions and > casting for jsonb expressions. This ensures that the relevant behavior > changes become visible in 0005 when field access via dot-notation is > introduced. > > Specifically, once member access through dot-notation is enabled for > jsonb, we can no longer write single-argument functions (including > casts) in dot form for jsonb. For example: > > Before 0005: > > select ('{"a":1}'::jsonb).jsonb_typeof; > jsonb_typeof > -------------- > object > (1 row) > > select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb).name; > name > -------------------------------------- > [{"name": "alice"}, {"name": "bob"}] > (1 row) > > After 0005: > > select ('{"a":1}'::jsonb).jsonb_typeof; > jsonb_typeof > -------------- > > (1 row) > > select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb).name; > name > ------------------ > ["alice", "bob"] > (1 row) > > In the meanwhile, these functions still return correct results through > standard syntax: > > test=# select jsonb_typeof(('{"a":1}'::jsonb)); > jsonb_typeof > -------------- > object > (1 row) > test=# select ('[{"name": "alice"}, {"name": "bob"}]'::jsonb)::name; > name > -------------------------------------- > [{"name": "alice"}, {"name": "bob"}] > (1 row) > > I don't consider this behavior change a major issue, because the > dot-form for single-argument functions is not standard SQL and seems > to be PostgreSQL-specific. Still, it's worth highlighting here so > users aren't surprised. > > 2. Refactored 0002: It combines and refactors v19-0001 and v19-0002. > Instead of changing the existing transform() callback in > SubscriptRoutines, it now introduces an additional callback, > transform_partial(). This alternative transform method, used by jsonb, > is more flexible: it accepts a wider range of indirection node types > and can transform only a prefix of the indirection list. This avoids > breaking compatibility for arrays, hstore, and external data types > that supports subscripting. > > 3. 0003 and 0004 stay unchanged. They are both small and can be squashed > into 0005. I leave them as-is for now for easier review. > > 4. Added two additional tests in 0005 for assignments using jsonb > dot-notation, showing explicitly that assignment is not yet supported. > > 5. Removed 0006 (array slicing) and 0007 (wildcard) from the previous > versions, as they need additional work. My immediate goal is to first > reach consensus on the dot-notation implementation. > > Best, > Alex > > <v20-0005-Implement-read-only-dot-notation-for-jsonb.patch><v20-0004-Extract-coerce_jsonpath_subscript.patch><v20-0001-Add-test-coverage-for-indirection-transformation.patch><v20-0002-Add-an-alternative-transform-function-in-Subscri.patch><v20-0003-Export-jsonPathFromParseResult.patch> -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-22T17:31:47Z
On Sun, Sep 21, 2025 at 8:32 PM Chao Li <li.evan.chao@gmail.com> wrote: > Looks like patch files need rebase. > There were trailing whitespaces in the documentation I added, I’ve fixed them now.
-
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-23T05:48:24Z
The new approach of introducing “transform_partial” looks like a better solution, which leads to less code change to hstore_subs and arraysubs. However, when I tested the v21, I encountered errors when combine composite type, array and jsonb together. Prepare test data: ``` drop table if exists people; drop type if exists person; CREATE TYPE person AS ( name text, size int[], meta jsonb[] ); CREATE TABLE people ( p person ); INSERT INTO people VALUES (ROW('Alice', array[10, 20], array['{"a": 30}'::jsonb, '{"a": 40}'::jsonb])); ``` Then run the test: ``` # old jsonb accessor works to extract a jsonb field from an array item of a composite field evantest=# select (p).meta[1]->'a' from people; ?column? ---------- 30 (1 row) # dot notation also works evantest=# select (p).meta[1].a from people; a ---- 30 (1 row) # but index accessor doesn’t work evantest=# select (p).meta[1]['a'] from people; ERROR: invalid input syntax for type integer: "a" LINE 1: select (p).meta[1]['a'] from people; ^ ``` Other than that, I got a few new comments: > On Sep 23, 2025, at 01:31, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > > There were trailing whitespaces in the documentation I added, I’ve fixed them now. > > <v21-0004-Extract-coerce_jsonpath_subscript.patch><v21-0005-Implement-read-only-dot-notation-for-jsonb.patch><v21-0001-Add-test-coverage-for-indirection-transformation.patch><v21-0002-Add-an-alternative-transform-function-in-Subscri.patch><v21-0003-Export-jsonPathFromParseResult.patch> 1 - 0001 - overall looks good 2 - 0002 ``` + /* Collect leading A_Indices subscripts */ + foreach(lc, indirection) + { + Node *n = lfirst(lc); + + if (IsA(n, A_Indices)) + { + A_Indices *ai = (A_Indices *) n; + + subscriptlist = lappend(subscriptlist, n); + if (ai->is_slice) + isSlice = true; + } + else + break; ``` We can break after “isSlice=true”. 3 - 0002 ``` + * list, and handle the + * remaining indirections differently or to raise an error as needed. ``` Not well formatted, “remaining” can go to the previous line. 4 - 0002 ``` + if (sbsroutines->transform_partial != NULL) + { ``` Do we want to assert that one of transform and transform_partial should not be NULL before “if"? 5 - 0002 ``` + /* + * If there is no partial transform function, use the full transform + * function, which only accepts bracket subscripts (A_Indices nodes). + * We pre-collect the leading A_Indices nodes from the indirection ``` “If there is no partial transform function” sounds redundant, I think we can just go with “Full transform function only accepts …”. 6 - 0002 ``` + /* This should not happen with well-behaved transform functions */ + elog(ERROR, "subscripting transform function failed to consume any indirection elements”); ``` I don’t see an existing error message uses “indirection” and “transform”. This error message looks more like a log message rather than a message to show to end users. 7 - 0002 ``` --- a/src/backend/utils/adt/jsonbsubs.c +++ b/src/backend/utils/adt/jsonbsubs.c @@ -39,11 +39,10 @@ typedef struct JsonbSubWorkspace * Transform the subscript expressions, coerce them to text, * and determine the result type of the SubscriptingRef node. */ -static void +static int jsonb_subscript_transform(SubscriptingRef *sbsref, List *indirection, ParseState *pstate, - bool isSlice, bool isAssignment) ``` As return type is changed, function comment should be updated accordingly. 8 - 0005 - jsonb.sql As we discussed earlier, now select ('{"a": 1}'::jsonb)[0]['a']; and select ('{"a": 1}'::jsonb)[0].a; Will return different results. Maybe that part needs more discussion, but we at least don’t want random behavior. So I would suggest add the two cases into the test script, so that other reviewers may easily notice that, thus gets more inputs from more people. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-09-24T01:05:26Z
Hi Chao, Thanks for reviewing. I'm glad you like the new approach of introducing "transform_partial". I've attached v22, which addresses some of your feedback, and I ran pgindent again. See detailed replies below. On Mon, Sep 22, 2025 at 10:48 PM Chao Li <li.evan.chao@gmail.com> wrote: > The new approach of introducing “transform_partial” looks like a better > solution, which leads to less code change to hstore_subs and arraysubs. > However, when I tested the v21, I encountered errors when combine composite > type, array and jsonb together. > > Prepare test data: > ``` > drop table if exists people; > drop type if exists person; > CREATE TYPE person AS ( > name text, > size int[], > meta jsonb[] > ); > > CREATE TABLE people ( > p person > ); > > INSERT INTO people VALUES (ROW('Alice', array[10, 20], array['{"a": > 30}'::jsonb, '{"a": 40}'::jsonb])); > ``` > > Then run the test: > ``` > # old jsonb accessor works to extract a jsonb field from an array item of > a composite field > evantest=# select (p).meta[1]->'a' from people; > ?column? > ---------- > 30 > (1 row) > > # dot notation also works > evantest=# select (p).meta[1].a from people; > a > ---- > 30 > (1 row) > > # but index accessor doesn’t work > evantest=# select (p).meta[1]['a'] from people; > ERROR: invalid input syntax for type integer: "a" > LINE 1: select (p).meta[1]['a'] from people; > ^ > This is the expected behavior for array subscripting, and my patch doesn't change that. I don't think this is a problem. With or without my patch, you can avoid the ERROR by adding parentheses: test=# select ((p).meta[1])['a'] from people; meta ------ 30 (1 row) On Mon, Sep 22, 2025 at 10:48 PM Chao Li <li.evan.chao@gmail.com> wrote: > 2 - 0002 > ``` > + /* Collect leading A_Indices subscripts */ > + foreach(lc, indirection) > + { > + Node *n = lfirst(lc); > + > + if (IsA(n, A_Indices)) > + { > + A_Indices *ai = (A_Indices *) n; > + > + subscriptlist = lappend(subscriptlist, n); > + if (ai->is_slice) > + isSlice = true; > + } > + else > + break; > ``` > > We can break after “isSlice=true”. > Why? We still want to get the whole prefix list of A_Indices. On Mon, Sep 22, 2025 at 10:48 PM Chao Li <li.evan.chao@gmail.com> wrote: > 6 - 0002 > ``` > + /* This should not happen with well-behaved transform functions */ > + elog(ERROR, "subscripting transform function failed to consume any > indirection elements”); > ``` > > I don’t see an existing error message uses “indirection” and “transform”. > This error message looks more like a log message rather than a message to > show to end users. > This is a defensive elog message that should not happen. So it is a log message for developers. That said, I'm open to suggestions for better wording. The rest of your feedback I've made changes accordingly as you suggested. Best, Alex -
Re: SQL:2023 JSON simplified accessor support
Chao Li <li.evan.chao@gmail.com> — 2025-09-24T06:37:43Z
> On Sep 24, 2025, at 09:05, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > This is the expected behavior for array subscripting, and my patch > doesn't change that. I don't think this is a problem. With or without > my patch, you can avoid the ERROR by adding parentheses: > > test=# select ((p).meta[1])['a'] from people; meta ------ 30 (1 row) > Okay, make sense. >> 2 - 0002 >> ``` >> + /* Collect leading A_Indices subscripts */ >> + foreach(lc, indirection) >> + { >> + Node *n = lfirst(lc); >> + >> + if (IsA(n, A_Indices)) >> + { >> + A_Indices *ai = (A_Indices *) n; >> + >> + subscriptlist = lappend(subscriptlist, n); >> + if (ai->is_slice) >> + isSlice = true; >> + } >> + else >> + break; >> ``` >> >> We can break after “isSlice=true”. > > Why? We still want to get the whole prefix list of A_Indices. Ah, I just realized the main goal of the foreach loop is to build a new “subscrptlist”. In that case, it should not break. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ -
Re: SQL:2023 JSON simplified accessor support
jian he <jian.universality@gmail.com> — 2025-12-10T15:11:33Z
On Wed, Sep 24, 2025 at 9:06 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > > The rest of your feedback I've made changes accordingly as you suggested. > > Best, > Alex > hi. + <para> + PostgreSQL implements the JSON simplified accessor as specified in SQL:2023. not sure we need to decorated SQL:2023 as <acronym>SQL:2023</acronym>, but PostgreSQL should be decorated as <productname>PostgreSQL</productname>. I believe SELECT * FROM users WHERE profile.preferences.theme = '"dark"'; should be SELECT * FROM users WHERE (profile).preferences.theme = '"dark"'; +INSERT INTO test_table VALUES + ('{"brightness": 80}'), -- Object case + ('[{"brightness": 45}, {"brightness": 90}]'); -- Array case comments no need, i think. + <sect3 id="jsonb-access-method-comparison"> + <title>Comparison of JSON Access Methods</title> + <para> I am worried that the wording "Access Methods" would be confused with "Table Access Methods". +-- Comparison with other access methods (NOT equivalent - different semantics): +SELECT json_col['address']['city']; -- Subscripting +SELECT json_col->'address'->'city'; -- Operator +SELECT json_col.address.city; -- Simplified accessor (different behavior) +</programlisting> "access methods" would be confusing as mentioned above. also these SQL query with SELECT is wrong? since no FROM clause. again, I think the last one should be SELECT (json_col).address.city; we generally expect </programlisting> content can be passed to psql. +-- Different behaviors: +SELECT data.brightness FROM test_table; -- Simplified accessor +-- Results: 80, [45, 90] (array elements unwrapped, results wrapped) Again, here I believe, it should be SELECT (data).brightness FROM test_table; also the Results should be two rows, so this needs to change. +<programlisting> +-- Setup data +INSERT INTO test_table VALUES ('{"weather": "sunny", "temperature": "72F"}'); + +-- Different behaviors when accessing [0] on a non-array value: +SELECT data[0] FROM test_table; -- Simplified accessor (lax mode, if dots present elsewhere) +-- Result: {"weather": "sunny", "temperature": "72F"} (object wrapped as array, [0] returns entire object) + there is no GUC about json lex mode or not, we can only specify it via jsonpath. but in the HEAD select (jsonb '{"weather": "sunny", "temperature": "72F"}')[0]; return NULL. so I am confused with the above comment. +<programlisting> +-- All parts use simplified accessor (standard behavior) +SELECT data.location.coordinates.latitude FROM table; -- Good +SELECT data.repertoire[0].title FROM table; -- Good +SELECT data.users[1].profile.email FROM table; -- Good +</programlisting> + </para> TABLE is a reserved word, ``SELECT FROM table;`` will result in syntax error. That means most of the examples in <sect3 id="jsonb-accessor-best-practices"> needs more polishing. + <sect3 id="jsonb-accessor-best-practices"> + <title>Best Practices: Avoid Mixing Access Methods</title> + <para> + <emphasis>Important:</emphasis> Do not mix SQL:2023 simplified accessor syntax + with pre-standard subscripting syntax in the same accessor chain. These + methods have subtly different semantics and are not interchangeable aliases. + Mixing them can lead to confusion and code that is difficult to understand. + </para> If we want users not to confuse SQL:2023 simplified accessor with pre-standard subscripting syntax, we can wrap this important information in a <note> tag. some changes are reflected on the attached file, but some I don't know how to change, so I didn't do it. some sgml lines are way too line, I have split them into separate lines. -- jian https://www.enterprisedb.com