Thread

  1. Re: SQL Property Graph Queries (SQL/PGQ)

    Henson Choi <assam258@gmail.com> — 2025-12-28T04:43:54Z

    Hi,
    
    I apologize for the confusion caused by my partial patch submission.
    The patch was intended to be applied on top of the PGQ base patch,
    but I was not aware that CFBot would try to apply it directly to master.
    
    Next time, I will use a .txt extension so that the original authors
    can review it before CFBot processes it.
    
    Sorry for the inconvenience.
    
    Regards,
    Henson Choi
    
    2025년 12월 25일 (목) AM 1:23, Henson Choi <assam258@gmail.com>님이 작성:
    
    > Subject: Re: SQL Property Graph Queries (SQL/PGQ)
    >
    > Hi hackers,
    >
    > Apologies for jumping into the SQL/PGQ discussion mid-stream. I've been
    > working on implementing the LABELS() graph element function and ran into
    > an issue I'd like to get some guidance on.
    >
    > == What LABELS() does ==
    >
    > LABELS(element_variable) returns a text[] array containing all labels
    > associated with a graph element. For example:
    >
    >   SELECT * FROM GRAPH_TABLE (myshop
    >       MATCH (n IS customers)
    >       COLUMNS (n.name, LABELS(n) AS lbls)
    >   );
    >
    >      name     |    lbls
    >   ------------+-------------
    >    customer1  | {customers}
    >    customer2  | {customers}
    >
    > == Implementation approach ==
    >
    > A naive approach would return LABELS() as a constant (e.g.,
    > ARRAY['customers']
    > directly). However, this prevents the optimizer from pushing down
    > predicates
    > involving LABELS() through UNION ALL branches.
    >
    > To enable optimizer pushdown, I wrap each element table in a subquery that
    > adds a virtual __labels__ column containing a constant array. This way,
    > LABELS(n) returns a Var referencing that column, allowing the optimizer to
    > evaluate it per-branch and prune accordingly:
    >
    >   -- Conceptually, the rewrite produces:
    >   SELECT name, __labels__ AS lbls FROM (
    >       SELECT *, ARRAY['customers']::text[] AS __labels__ FROM customers
    >   ) ...
    >
    > This allows the optimizer to perform constant folding and prune UNION ALL
    > branches when filtering by label:
    >
    >   -- Filter in outer WHERE clause
    >   EXPLAIN SELECT * FROM GRAPH_TABLE (myshop
    >       MATCH (n IS lists)  -- lists label shared by orders and wishlists
    >       COLUMNS (LABELS(n) AS lbls, n.node_id)
    >   ) WHERE 'orders' = ANY(lbls);
    >
    >                        QUERY PLAN
    >   ------------------------------------------------------
    >    Seq Scan on orders
    >      Output: '{orders,lists}'::text[], orders.order_id
    >
    >   -- Filter in MATCH WHERE clause without IS (both tables scanned)
    >   EXPLAIN SELECT * FROM GRAPH_TABLE (myshop
    >       MATCH (n) WHERE 'lists' = ANY(LABELS(n))
    >       COLUMNS (LABELS(n) AS lbls, n.node_id)
    >   );
    >
    >                        QUERY PLAN
    >   ------------------------------------------------------
    >    Append
    >      ->  Seq Scan on orders
    >            Output: '{orders,lists}'::text[], orders.order_id
    >      ->  Seq Scan on wishlists
    >            Output: '{lists,wishlists}'::text[], wishlists.wishlist_id
    >
    > The first example prunes the wishlists branch since 'orders' is not in its
    > label set. The second example uses LABELS() without an IS clause to filter
    > across all element tables that have the 'lists' label.
    >
    > == The problem ==
    >
    > The subquery wrapping breaks column-level privilege checking.
    >
    > Test case from privileges.sql:
    >
    >   -- regress_priv_user4 has SELECT on atest5(one, four) only
    >   -- lttc label maps atest5.three -> lttck (no privilege)
    >
    >   SET ROLE regress_priv_user4;
    >   SELECT * FROM graph_table (ptg1 MATCH (v IS lttc) COLUMNS (v.lttck));
    >   -- Expected: ERROR: permission denied for table atest5
    >   -- Actual: query succeeds (returns rows without error)
    >
    > I haven't yet identified the exact root cause of this issue. Has anyone
    > encountered a similar issue? Any pointers would be appreciated.
    >
    > Thanks,
    > Henson
    >
    > 2025년 12월 18일 (목) PM 6:16, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>님이
    > 작성:
    >
    >> On Wed, Dec 17, 2025 at 2:28 PM Peter Eisentraut <peter@eisentraut.org>
    >> wrote:
    >> >
    >> > On 17.12.25 06:32, Ashutosh Bapat wrote:
    >> > > On Mon, Dec 15, 2025 at 6:43 PM Ashutosh Bapat
    >> > > <ashutosh.bapat.oss@gmail.com> wrote:
    >> > >>
    >> > >> Rebased patches on the latest HEAD which required me to move
    >> > >> graph_table.sql to another parallel group.
    >> > >
    >> > > Huh, the movement resulted in losing that test from parallel_schedule.
    >> > > Fixed in the attached patches.
    >> >
    >> > A couple of items:
    >> >
    >> > 1) I was running some tests that involved properties with mismatching
    >> > typmods, and I got an error message like
    >> >
    >> > ERROR:  property "p1" data type modifier mismatch: 14 vs. 19
    >> >
    >> > but the actual types were varchar(10) and varchar(15).  So to improve
    >> > that, we need to run these through the typmod formatting routines, not
    >> > just print the raw typmod numbers.  I actually just combined that with
    >> > the check for the type itself.  Also, there was no test covering this,
    >> > so I added one.  See attached patches.
    >>
    >> +1. The error message is better.
    >>
    >> >
    >> > I did another investigation about whether this level of checking is
    >> > necessary.  I think according to the letter of the SQL standard, the
    >> > typmods must indeed match.  It seems Oracle does not check (the example
    >> > mentioned above came from an Oracle source).  But I think it's okay to
    >> > keep the check.  In PostgreSQL, it is much less common to write like
    >> > varchar(1000).  And we can always consider relaxing it later.
    >>
    >> +1.
    >>
    >> Attached patch adds a couple more test statements.
    >>
    >> >
    >> > 2) I had it in my notes to consider whether we should support the colon
    >> > syntax for label expressions.  I think we might have talked about that
    >> > before.
    >> >
    >> > I'm now leaning toward not supporting it in the first iteration.  I
    >> > don't know that we have fully explored possible conflicts with host
    >> > variable syntax in ecpg and psql and the like.  Maybe avoid that for
    >> now.
    >> >
    >>
    >> I was aware of ecpg and I vaguely remember we fixed something in ECPG
    >> to allow : in MATCH statement. Probably following changes in
    >> psqlscan.l and pgc.l
    >> -self                   [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
    >> +self                   [,()\[\].;\:\|\+\-\*\/\%\^\<\>\=]
    >>
    >> Those changes add | after : (and not the : itself) so maybe they are
    >> not about supporting : . Do you remember what those are?
    >>
    >> However, I see that : in psql can be a problem
    >> #create table t1 (a int, b int);
    >> #create property graph g1 vertex tables (t1 key (a));
    >> #select * from GRAPH_TABLE (g1 MATCH (a :t1) COLUMNS (a.a));
    >>  a
    >> ---
    >> (0 rows)
    >>
    >> #\set t1 blank
    >> #select * from GRAPH_TABLE (g1 MATCH (a :t1) COLUMNS (a.a));
    >> ERROR:  syntax error at or near "blank"
    >> LINE 1: select * from GRAPH_TABLE (g1 MATCH (a blank) COLUMNS (a.a))...
    >>
    >> > There was also a bit of an inconsistency in the presentation: The
    >> > documentation introduced the colon as seemingly the preferred syntax,
    >> > but ruleutils.c dumped out the IS syntax.
    >> >
    >> > (It was also a bit curious that some test code put spaces around the
    >> > colon, which is not idiomatic.)
    >> >
    >>
    >> That might have been just me trying to type one letter less and yet
    >> keeping the query readable. A column between variable name and the
    >> label is not always noticeable.
    >>
    >> > Attached is a patch that shows how to revert the colon support.  It's
    >> > pretty simple, and it would be easy to add it back in later, I think.
    >>
    >> I agree that it's better not to support it now. It requires more work
    >> and it's optional. When we come to support it, we will need thorough
    >> testing.
    >>
    >> I spotted some examples that use : in ddl.sgml.
    >> <programlisting>
    >> SELECT customer_name FROM GRAPH_TABLE (myshop MATCH
    >> (c:customer)-[:has]->(o:"order" WHERE o.ordered_when = current_date)
    >> COLUMNS (c.name AS customer_name));
    >> </programlisting>
    >>
    >> The query demonstrates that one can use label names in a way that will
    >> make the pattern look like an English sentence. Replacing : with IS
    >> defeats that purpose.
    >>
    >> As written in that paragraph, the labels serve the purpose of exposing
    >> the table with a different logical view (using different label and
    >> property names). So we need that paragraph, but I think we should
    >> change the example to use IS instead of :. Attached is suggested
    >> minimal change, but I am not happy with it. Another possibility is we
    >> completely remove that paragraph; I don't think we need to discuss all
    >> possible usages the users will come up with.
    >>
    >> The patch changes one more instance of : by IS. But that's straight
    >> forward.
    >>
    >> In ddl.sgml I noticed a seemingly incomplete sentence
    >>    A property graph is a way to represent database contents, instead of
    >> using
    >>    relational structures such as tables.
    >>
    >> Represent the contents as what? I feel the complete sentence should be
    >> one of the following
    >> property graph is a way to represent database contents as a graph,
    >> instead of representing those as relational structures OR
    >> property graph is another way to represent database contents instead
    >> of using relational structures such as tables
    >>
    >> But I can't figure out what was originally intended.
    >>
    >> I have squashed 0002 from my earlier patchset and your 3 patches into
    >> 0001.
    >>
    >> 0002 has extra tests mentioned above. It also removes "TODO: dubious
    >> error message" from a comment. I don't see anything dubious in the
    >> error message. I think this patch is safe to be merged into 0001.
    >>
    >> 0003 is changed to ddl.sgml. Those need a bit more work as described
    >> above.
    >>
    >> --
    >> Best Wishes,
    >> Ashutosh Bapat
    >>
    >