Thread

Re: SQL Property Graph Queries (SQL/PGQ)

Hannu Krosing <hannuk@google.com> — 2025-07-06T16:10:47Z
 What is the current thinking about getting PGQ committed ?

The attached fixes *ONLY* the duplicate OIDs error to get it build again in CI

On Mon, Apr 7, 2025 at 10:13 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> Hi Ashutosh,
>
> On Mon, Apr 7, 2025 at 1:19 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Sat, Apr 5, 2025 at 6:20 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > >
> > > Hi Ashutosh and Peter,
> > >
> > > Since this PGQ feature won't be in PG 18, I'd like to raise a discussion of
> > > the possibility of implementing the quantifier feature, which I think is a
> > > quite useful feature in the graph database area.
> >
> > I agree that quantifiers feature is very useful; it's being used in
> > many usecases. However, it's a bit of a complex feature. IMO, we
> > should keep that discussion as well as the patch in a separate thread,
> > so that this patchset doesn't grow too large to review and also
> > discussion in this thread can remain focused. Once we get the current
> > patch set reviewed and committed we can tackle the quantifier problem
> > in a separate discussion. Of course that doesn't mean that we can not
> > start discussion, try POC and even a working patch for quantifier
> > support.
> >
> > Peter may think otherwise.
> >
> > >
> > > I'll start with a graph definition first.
> > >
> > > `Person(id, name, age, sex)` with id as PK
> > > `Knows(id, start_id, end_id, since)` with id as PK, start_id and
> > > end_id FK referencing Person's id
> > >
> > > insert into Person values(1, 'A', 31, 'M'), (2, 'B', 30, 'F'), (3,
> > > 'C', 33, 'M'), (4, 'D', 31, 'F'), (5, 'E', 32, 'M'), (6, 'F', 33,
> > > 'M');
> > > insert into Knows values (1, 1, 2, '2020');  -- A knows B since 2020
> > > insert into Knows values (2, 1, 3, '2021');  -- A knows C since 2021
> > > insert into Knows values (3, 1, 4, '2020');  -- A knows D since 2020
> > > insert into Knows values (4, 2, 4, '2023');  -- B knows D since 2023
> > > insert into Knows values (5, 3, 5, '2022');  -- C knows E since 2022
> > > insert into Knows values (6, 2, 6, '2021');  -- B knows F since 2021
> > > insert into Knows values (7, 4, 6, '2020');  -- D knows F since 2020
> > >
> > > Then we create a property graph:
> > >
> > > CREATE property graph new_graph
> > > VERTEX TABLES (Person)
> > > EDGE TABLES (Knows);
> > >
> > > If we want to find A's non-directly known friends within 3 hops, we can query:
> > >
> > > select name from graph_table (new_graph match (a:Person WHERE a.name =
> > > 'A') --> (b:Person) --> (c:Person) COLUMNS (c.name))
> > > union
> > > select name from graph_table (new_graph match (a:Person WHERE a.name =
> > > 'A') --> (b:Person) -->(c:Person)-->(d:Person) COLUMNS (d.name));
> > >
> > > Or if we support quantifier, we can simply the query as:
> > >
> > > select name from graph_table (new_graph match (a:Person WHERE a.name =
> > > 'A') -->{2,3} (b:Person) COLUMNS (b.name));
> > >
> > > In the current design of PostgreSQL, we can rewrite this pattern with
> > > quantifiers to
> > > the union form with some effort.
> > >
> > > But what if the pattern is more complicated, for example:
> > >
> > > 1. select name, since from graph_table (new_graph match (a:Person
> > > WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
> > > r.since));
> > > Can we support the r.since column? I guess not, in this case r is a
> > > variable length edge.
> > >
> > > 2. select name, count from graph_table (new_graph match (a:Person
> > > WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
> > > count(r)));
> > > Can we support this count aggregation(this is called horizontal
> > > aggregation in Oracle's pgql)? How can the executor know the length of
> > > the variable length edge?
> > >
> > > 3. What if the query doesn't specify the Label of edge, and there can
> > > be different edge labels of r, can we easily do the rewrite?
> > >
> > > I did some study of the apache age, they have fixed columns for node
> > > labels(id, agtype)
> > > and edge labels(id, source_id, end_id, agtype), agtype is kind of
> > > json. So they can
> > > resolve the above question easily.
> > >
> > > Above are just my random thoughts of the quantifier feature, I don't have a copy
> > > of the PGQ standard, so I'd like to hear your opinion about this.
> > >
> >
> > I think the questions you have raised are valid. If we decide to
> > discuss this in a separate thread, I will start that thread just by
> > responding to these questions and design I have in mind.
>
> I'm ok with starting a new thread for quantifier discussion, and I'd
> really happy to know your design on this.
>
> >
> > --
> > Best Wishes,
> > Ashutosh Bapat
>
>
>
> --
> Regards
> Junwang Zhao
>
>