Re: SQL Property Graph Queries (SQL/PGQ)

Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

To: Junwang Zhao <zhjwpku@gmail.com>

Cc: Amit Langote <amitlangote09@gmail.com>, Vik Fearing <vik@postgresfriends.org>, Ajay Pal <ajay.pal.k@gmail.com>, Imran Zaheer <imran.zhir@gmail.com>, Peter Eisentraut <peter@eisentraut.org>, pgsql-hackers <pgsql-hackers@postgresql.org>

Date: 2025-04-07T05:19:40Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Fix some typos and make small stylistic improvements
- 5282bf535e47 19 (unreleased) landed
Cleanup users and roles in graph_table_rls test
- 040a56be4bcc 19 (unreleased) landed
Dump labels in reproducible order
- c9babbc8816a 19 (unreleased) landed
SQL Property Graph Queries (SQL/PGQ)
- 2f094e7ac691 19 (unreleased) landed
Factor out constructSetOpTargetlist() from transformSetOperationTree()
- 8c2b30487cc7 19 (unreleased) landed
Sort out table_open vs. relation_open in rewriter
- d537f59fbbfc 19 (unreleased) landed
Rename grammar nonterminal to simplify reuse
- 8080f44f96a9 19 (unreleased) landed
Make ecpg parse.pl more robust with braces
- 7f88553ceaca 19 (unreleased) landed
Don't lock partitions pruned by initial pruning
- 525392d5727f 18.0 cited
Remove pg_regex_collation
- 792b2c7e6d92 18.0 cited
Use auxv to check for CRC32 instructions on ARM.
- aac831cafa6f 18.0 cited
Fix inappropriate uses of atol()
- f5a1311fccd2 18.0 cited
Remove unnecessary array object_classes[] in dependency.c
- ef5e2e90859a 17.0 cited

On Sat, Apr 5, 2025 at 6:20 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> Hi Ashutosh and Peter,
>
> Since this PGQ feature won't be in PG 18, I'd like to raise a discussion of
> the possibility of implementing the quantifier feature, which I think is a
> quite useful feature in the graph database area.

I agree that quantifiers feature is very useful; it's being used in
many usecases. However, it's a bit of a complex feature. IMO, we
should keep that discussion as well as the patch in a separate thread,
so that this patchset doesn't grow too large to review and also
discussion in this thread can remain focused. Once we get the current
patch set reviewed and committed we can tackle the quantifier problem
in a separate discussion. Of course that doesn't mean that we can not
start discussion, try POC and even a working patch for quantifier
support.

Peter may think otherwise.

>
> I'll start with a graph definition first.
>
> `Person(id, name, age, sex)` with id as PK
> `Knows(id, start_id, end_id, since)` with id as PK, start_id and
> end_id FK referencing Person's id
>
> insert into Person values(1, 'A', 31, 'M'), (2, 'B', 30, 'F'), (3,
> 'C', 33, 'M'), (4, 'D', 31, 'F'), (5, 'E', 32, 'M'), (6, 'F', 33,
> 'M');
> insert into Knows values (1, 1, 2, '2020');  -- A knows B since 2020
> insert into Knows values (2, 1, 3, '2021');  -- A knows C since 2021
> insert into Knows values (3, 1, 4, '2020');  -- A knows D since 2020
> insert into Knows values (4, 2, 4, '2023');  -- B knows D since 2023
> insert into Knows values (5, 3, 5, '2022');  -- C knows E since 2022
> insert into Knows values (6, 2, 6, '2021');  -- B knows F since 2021
> insert into Knows values (7, 4, 6, '2020');  -- D knows F since 2020
>
> Then we create a property graph:
>
> CREATE property graph new_graph
> VERTEX TABLES (Person)
> EDGE TABLES (Knows);
>
> If we want to find A's non-directly known friends within 3 hops, we can query:
>
> select name from graph_table (new_graph match (a:Person WHERE a.name =
> 'A') --> (b:Person) --> (c:Person) COLUMNS (c.name))
> union
> select name from graph_table (new_graph match (a:Person WHERE a.name =
> 'A') --> (b:Person) -->(c:Person)-->(d:Person) COLUMNS (d.name));
>
> Or if we support quantifier, we can simply the query as:
>
> select name from graph_table (new_graph match (a:Person WHERE a.name =
> 'A') -->{2,3} (b:Person) COLUMNS (b.name));
>
> In the current design of PostgreSQL, we can rewrite this pattern with
> quantifiers to
> the union form with some effort.
>
> But what if the pattern is more complicated, for example:
>
> 1. select name, since from graph_table (new_graph match (a:Person
> WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
> r.since));
> Can we support the r.since column? I guess not, in this case r is a
> variable length edge.
>
> 2. select name, count from graph_table (new_graph match (a:Person
> WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
> count(r)));
> Can we support this count aggregation(this is called horizontal
> aggregation in Oracle's pgql)? How can the executor know the length of
> the variable length edge?
>
> 3. What if the query doesn't specify the Label of edge, and there can
> be different edge labels of r, can we easily do the rewrite?
>
> I did some study of the apache age, they have fixed columns for node
> labels(id, agtype)
> and edge labels(id, source_id, end_id, agtype), agtype is kind of
> json. So they can
> resolve the above question easily.
>
> Above are just my random thoughts of the quantifier feature, I don't have a copy
> of the PGQ standard, so I'd like to hear your opinion about this.
>

I think the questions you have raised are valid. If we decide to
discuss this in a separate thread, I will start that thread just by
responding to these questions and design I have in mind.

-- 
Best Wishes,
Ashutosh Bapat