Thread

Re: SQL Property Graph Queries (SQL/PGQ)

Henson Choi <assam258@gmail.com> — 2025-12-28T04:43:54Z
Hi,

I apologize for the confusion caused by my partial patch submission.
The patch was intended to be applied on top of the PGQ base patch,
but I was not aware that CFBot would try to apply it directly to master.

Next time, I will use a .txt extension so that the original authors
can review it before CFBot processes it.

Sorry for the inconvenience.

Regards,
Henson Choi

2025년 12월 25일 (목) AM 1:23, Henson Choi <assam258@gmail.com>님이 작성:

> Subject: Re: SQL Property Graph Queries (SQL/PGQ)
>
> Hi hackers,
>
> Apologies for jumping into the SQL/PGQ discussion mid-stream. I've been
> working on implementing the LABELS() graph element function and ran into
> an issue I'd like to get some guidance on.
>
> == What LABELS() does ==
>
> LABELS(element_variable) returns a text[] array containing all labels
> associated with a graph element. For example:
>
>   SELECT * FROM GRAPH_TABLE (myshop
>       MATCH (n IS customers)
>       COLUMNS (n.name, LABELS(n) AS lbls)
>   );
>
>      name     |    lbls
>   ------------+-------------
>    customer1  | {customers}
>    customer2  | {customers}
>
> == Implementation approach ==
>
> A naive approach would return LABELS() as a constant (e.g.,
> ARRAY['customers']
> directly). However, this prevents the optimizer from pushing down
> predicates
> involving LABELS() through UNION ALL branches.
>
> To enable optimizer pushdown, I wrap each element table in a subquery that
> adds a virtual __labels__ column containing a constant array. This way,
> LABELS(n) returns a Var referencing that column, allowing the optimizer to
> evaluate it per-branch and prune accordingly:
>
>   -- Conceptually, the rewrite produces:
>   SELECT name, __labels__ AS lbls FROM (
>       SELECT *, ARRAY['customers']::text[] AS __labels__ FROM customers
>   ) ...
>
> This allows the optimizer to perform constant folding and prune UNION ALL
> branches when filtering by label:
>
>   -- Filter in outer WHERE clause
>   EXPLAIN SELECT * FROM GRAPH_TABLE (myshop
>       MATCH (n IS lists)  -- lists label shared by orders and wishlists
>       COLUMNS (LABELS(n) AS lbls, n.node_id)
>   ) WHERE 'orders' = ANY(lbls);
>
>                        QUERY PLAN
>   ------------------------------------------------------
>    Seq Scan on orders
>      Output: '{orders,lists}'::text[], orders.order_id
>
>   -- Filter in MATCH WHERE clause without IS (both tables scanned)
>   EXPLAIN SELECT * FROM GRAPH_TABLE (myshop
>       MATCH (n) WHERE 'lists' = ANY(LABELS(n))
>       COLUMNS (LABELS(n) AS lbls, n.node_id)
>   );
>
>                        QUERY PLAN
>   ------------------------------------------------------
>    Append
>      ->  Seq Scan on orders
>            Output: '{orders,lists}'::text[], orders.order_id
>      ->  Seq Scan on wishlists
>            Output: '{lists,wishlists}'::text[], wishlists.wishlist_id
>
> The first example prunes the wishlists branch since 'orders' is not in its
> label set. The second example uses LABELS() without an IS clause to filter
> across all element tables that have the 'lists' label.
>
> == The problem ==
>
> The subquery wrapping breaks column-level privilege checking.
>
> Test case from privileges.sql:
>
>   -- regress_priv_user4 has SELECT on atest5(one, four) only
>   -- lttc label maps atest5.three -> lttck (no privilege)
>
>   SET ROLE regress_priv_user4;
>   SELECT * FROM graph_table (ptg1 MATCH (v IS lttc) COLUMNS (v.lttck));
>   -- Expected: ERROR: permission denied for table atest5
>   -- Actual: query succeeds (returns rows without error)
>
> I haven't yet identified the exact root cause of this issue. Has anyone
> encountered a similar issue? Any pointers would be appreciated.
>
> Thanks,
> Henson
>
> 2025년 12월 18일 (목) PM 6:16, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>님이
> 작성:
>
>> On Wed, Dec 17, 2025 at 2:28 PM Peter Eisentraut <peter@eisentraut.org>
>> wrote:
>> >
>> > On 17.12.25 06:32, Ashutosh Bapat wrote:
>> > > On Mon, Dec 15, 2025 at 6:43 PM Ashutosh Bapat
>> > > <ashutosh.bapat.oss@gmail.com> wrote:
>> > >>
>> > >> Rebased patches on the latest HEAD which required me to move
>> > >> graph_table.sql to another parallel group.
>> > >
>> > > Huh, the movement resulted in losing that test from parallel_schedule.
>> > > Fixed in the attached patches.
>> >
>> > A couple of items:
>> >
>> > 1) I was running some tests that involved properties with mismatching
>> > typmods, and I got an error message like
>> >
>> > ERROR:  property "p1" data type modifier mismatch: 14 vs. 19
>> >
>> > but the actual types were varchar(10) and varchar(15).  So to improve
>> > that, we need to run these through the typmod formatting routines, not
>> > just print the raw typmod numbers.  I actually just combined that with
>> > the check for the type itself.  Also, there was no test covering this,
>> > so I added one.  See attached patches.
>>
>> +1. The error message is better.
>>
>> >
>> > I did another investigation about whether this level of checking is
>> > necessary.  I think according to the letter of the SQL standard, the
>> > typmods must indeed match.  It seems Oracle does not check (the example
>> > mentioned above came from an Oracle source).  But I think it's okay to
>> > keep the check.  In PostgreSQL, it is much less common to write like
>> > varchar(1000).  And we can always consider relaxing it later.
>>
>> +1.
>>
>> Attached patch adds a couple more test statements.
>>
>> >
>> > 2) I had it in my notes to consider whether we should support the colon
>> > syntax for label expressions.  I think we might have talked about that
>> > before.
>> >
>> > I'm now leaning toward not supporting it in the first iteration.  I
>> > don't know that we have fully explored possible conflicts with host
>> > variable syntax in ecpg and psql and the like.  Maybe avoid that for
>> now.
>> >
>>
>> I was aware of ecpg and I vaguely remember we fixed something in ECPG
>> to allow : in MATCH statement. Probably following changes in
>> psqlscan.l and pgc.l
>> -self                   [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
>> +self                   [,()\[\].;\:\|\+\-\*\/\%\^\<\>\=]
>>
>> Those changes add | after : (and not the : itself) so maybe they are
>> not about supporting : . Do you remember what those are?
>>
>> However, I see that : in psql can be a problem
>> #create table t1 (a int, b int);
>> #create property graph g1 vertex tables (t1 key (a));
>> #select * from GRAPH_TABLE (g1 MATCH (a :t1) COLUMNS (a.a));
>>  a
>> ---
>> (0 rows)
>>
>> #\set t1 blank
>> #select * from GRAPH_TABLE (g1 MATCH (a :t1) COLUMNS (a.a));
>> ERROR:  syntax error at or near "blank"
>> LINE 1: select * from GRAPH_TABLE (g1 MATCH (a blank) COLUMNS (a.a))...
>>
>> > There was also a bit of an inconsistency in the presentation: The
>> > documentation introduced the colon as seemingly the preferred syntax,
>> > but ruleutils.c dumped out the IS syntax.
>> >
>> > (It was also a bit curious that some test code put spaces around the
>> > colon, which is not idiomatic.)
>> >
>>
>> That might have been just me trying to type one letter less and yet
>> keeping the query readable. A column between variable name and the
>> label is not always noticeable.
>>
>> > Attached is a patch that shows how to revert the colon support.  It's
>> > pretty simple, and it would be easy to add it back in later, I think.
>>
>> I agree that it's better not to support it now. It requires more work
>> and it's optional. When we come to support it, we will need thorough
>> testing.
>>
>> I spotted some examples that use : in ddl.sgml.
>> <programlisting>
>> SELECT customer_name FROM GRAPH_TABLE (myshop MATCH
>> (c:customer)-[:has]->(o:"order" WHERE o.ordered_when = current_date)
>> COLUMNS (c.name AS customer_name));
>> </programlisting>
>>
>> The query demonstrates that one can use label names in a way that will
>> make the pattern look like an English sentence. Replacing : with IS
>> defeats that purpose.
>>
>> As written in that paragraph, the labels serve the purpose of exposing
>> the table with a different logical view (using different label and
>> property names). So we need that paragraph, but I think we should
>> change the example to use IS instead of :. Attached is suggested
>> minimal change, but I am not happy with it. Another possibility is we
>> completely remove that paragraph; I don't think we need to discuss all
>> possible usages the users will come up with.
>>
>> The patch changes one more instance of : by IS. But that's straight
>> forward.
>>
>> In ddl.sgml I noticed a seemingly incomplete sentence
>>    A property graph is a way to represent database contents, instead of
>> using
>>    relational structures such as tables.
>>
>> Represent the contents as what? I feel the complete sentence should be
>> one of the following
>> property graph is a way to represent database contents as a graph,
>> instead of representing those as relational structures OR
>> property graph is another way to represent database contents instead
>> of using relational structures such as tables
>>
>> But I can't figure out what was originally intended.
>>
>> I have squashed 0002 from my earlier patchset and your 3 patches into
>> 0001.
>>
>> 0002 has extra tests mentioned above. It also removes "TODO: dubious
>> error message" from a comment. I don't see anything dubious in the
>> error message. I think this patch is safe to be merged into 0001.
>>
>> 0003 is changed to ddl.sgml. Those need a bit more work as described
>> above.
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>>
>