Thread

  1. Re: SQL Property Graph Queries (SQL/PGQ)

    Hannu Krosing <hannuk@google.com> — 2025-07-06T16:10:47Z

     What is the current thinking about getting PGQ committed ?
    
    The attached fixes *ONLY* the duplicate OIDs error to get it build again in CI
    
    On Mon, Apr 7, 2025 at 10:13 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
    >
    > Hi Ashutosh,
    >
    > On Mon, Apr 7, 2025 at 1:19 PM Ashutosh Bapat
    > <ashutosh.bapat.oss@gmail.com> wrote:
    > >
    > > On Sat, Apr 5, 2025 at 6:20 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
    > > >
    > > > Hi Ashutosh and Peter,
    > > >
    > > > Since this PGQ feature won't be in PG 18, I'd like to raise a discussion of
    > > > the possibility of implementing the quantifier feature, which I think is a
    > > > quite useful feature in the graph database area.
    > >
    > > I agree that quantifiers feature is very useful; it's being used in
    > > many usecases. However, it's a bit of a complex feature. IMO, we
    > > should keep that discussion as well as the patch in a separate thread,
    > > so that this patchset doesn't grow too large to review and also
    > > discussion in this thread can remain focused. Once we get the current
    > > patch set reviewed and committed we can tackle the quantifier problem
    > > in a separate discussion. Of course that doesn't mean that we can not
    > > start discussion, try POC and even a working patch for quantifier
    > > support.
    > >
    > > Peter may think otherwise.
    > >
    > > >
    > > > I'll start with a graph definition first.
    > > >
    > > > `Person(id, name, age, sex)` with id as PK
    > > > `Knows(id, start_id, end_id, since)` with id as PK, start_id and
    > > > end_id FK referencing Person's id
    > > >
    > > > insert into Person values(1, 'A', 31, 'M'), (2, 'B', 30, 'F'), (3,
    > > > 'C', 33, 'M'), (4, 'D', 31, 'F'), (5, 'E', 32, 'M'), (6, 'F', 33,
    > > > 'M');
    > > > insert into Knows values (1, 1, 2, '2020');  -- A knows B since 2020
    > > > insert into Knows values (2, 1, 3, '2021');  -- A knows C since 2021
    > > > insert into Knows values (3, 1, 4, '2020');  -- A knows D since 2020
    > > > insert into Knows values (4, 2, 4, '2023');  -- B knows D since 2023
    > > > insert into Knows values (5, 3, 5, '2022');  -- C knows E since 2022
    > > > insert into Knows values (6, 2, 6, '2021');  -- B knows F since 2021
    > > > insert into Knows values (7, 4, 6, '2020');  -- D knows F since 2020
    > > >
    > > > Then we create a property graph:
    > > >
    > > > CREATE property graph new_graph
    > > > VERTEX TABLES (Person)
    > > > EDGE TABLES (Knows);
    > > >
    > > > If we want to find A's non-directly known friends within 3 hops, we can query:
    > > >
    > > > select name from graph_table (new_graph match (a:Person WHERE a.name =
    > > > 'A') --> (b:Person) --> (c:Person) COLUMNS (c.name))
    > > > union
    > > > select name from graph_table (new_graph match (a:Person WHERE a.name =
    > > > 'A') --> (b:Person) -->(c:Person)-->(d:Person) COLUMNS (d.name));
    > > >
    > > > Or if we support quantifier, we can simply the query as:
    > > >
    > > > select name from graph_table (new_graph match (a:Person WHERE a.name =
    > > > 'A') -->{2,3} (b:Person) COLUMNS (b.name));
    > > >
    > > > In the current design of PostgreSQL, we can rewrite this pattern with
    > > > quantifiers to
    > > > the union form with some effort.
    > > >
    > > > But what if the pattern is more complicated, for example:
    > > >
    > > > 1. select name, since from graph_table (new_graph match (a:Person
    > > > WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
    > > > r.since));
    > > > Can we support the r.since column? I guess not, in this case r is a
    > > > variable length edge.
    > > >
    > > > 2. select name, count from graph_table (new_graph match (a:Person
    > > > WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
    > > > count(r)));
    > > > Can we support this count aggregation(this is called horizontal
    > > > aggregation in Oracle's pgql)? How can the executor know the length of
    > > > the variable length edge?
    > > >
    > > > 3. What if the query doesn't specify the Label of edge, and there can
    > > > be different edge labels of r, can we easily do the rewrite?
    > > >
    > > > I did some study of the apache age, they have fixed columns for node
    > > > labels(id, agtype)
    > > > and edge labels(id, source_id, end_id, agtype), agtype is kind of
    > > > json. So they can
    > > > resolve the above question easily.
    > > >
    > > > Above are just my random thoughts of the quantifier feature, I don't have a copy
    > > > of the PGQ standard, so I'd like to hear your opinion about this.
    > > >
    > >
    > > I think the questions you have raised are valid. If we decide to
    > > discuss this in a separate thread, I will start that thread just by
    > > responding to these questions and design I have in mind.
    >
    > I'm ok with starting a new thread for quantifier discussion, and I'd
    > really happy to know your design on this.
    >
    > >
    > > --
    > > Best Wishes,
    > > Ashutosh Bapat
    >
    >
    >
    > --
    > Regards
    > Junwang Zhao
    >
    >