Re: Key joins
Joel Jacobson <joel@compiler.org>
From: "Joel Jacobson" <joel@compiler.org>
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Cc: "Arne Roland" <arne.roland@malkut.net>,
"Anders Granlund" <anders.granlund.0@gmail.com>,
"Andreas Karlsson" <andreas@proxel.se>, "Vik Fearing" <vik@chouppes.com>
Date: 2026-05-28T22:13:52Z
Lists: pgsql-hackers
Attachments
- v2-0003-Add-information_schema.view_constraint_usage.patch.gz (application/x-gzip)
- v2-0002-Implement-FOR-KEY-join-support.patch.gz (application/x-gzip)
- v2-0001-Lock-procedures-before-recording-dependencies.patch.gz (application/x-gzip)
On Thu, May 28, 2026, at 20:47, Joel Jacobson wrote: > Hi hackers, > > This patch implements a new SQL language feature, that we intent to > submit as a Change Proposal to the WG 3 SQL committee for the next > meeting in Stockholm in June. We would greatly appreciate any feedback > from the community. > > Key Joins > --------- > > Mathematically, CROSS JOINs are a cartesian product, INNER JOINs a > subset of it, and OUTER JOINs potentially null-extend that subset. In > practice, however, most joins look up additional information along a > foreign key rather than resembling a cartesian product. > > Today a written query does not convey whether a particular JOIN simply > enriches the referencing side, filters it, or fans it out: > > -- both sums silently inflated by the fan-out > SELECT o.id, SUM(oi.amount), SUM(p.amount) > FROM orders o > LEFT JOIN order_items oi ON oi.order_id = o.id > LEFT JOIN payments p ON p.order_id = o.id > GROUP BY o.id; > > The above example is based on a pgsql-generals thread [1]. > > We propose a new JOIN syntax that makes it easy to determine locally > that the immediate join result, before any further steps, just enriches > the referencing side with information from the referenced side, with > null-extension for OUTER JOINs. It conveys the author's intent, makes > the referencing side visually clear, and is enforced at compile time > against the schema. If we can't prove it, the user gets a compile-time > error. > > Under FOR KEY the same query will not compile: > > SELECT o.id, SUM(oi.amount), SUM(p.amount) > FROM orders o > LEFT JOIN order_items oi FOR KEY (order_id) -> o (id) > LEFT JOIN payments p FOR KEY (order_id) -> o (id) > GROUP BY o.id; > > ERROR: key join from referencing relation p to referenced relation o > cannot be proven > LINE 7: LEFT JOIN payments AS p FOR KEY (order_id) -> o (id) > ^ > DETAIL: Referenced columns o (id) are not proven unique. A preceding > join may duplicate rows from referenced relation o. > > Web demo > -------- > > The attached Discussion paper has also been published at https://keyjoin.org > with all examples in the paper runnable in the browser using a patched PGLite. > > Patches > ------- > > 0001 > The first patch is an attempt to fix a problem discussed in thread [2], where > DROP FUNCTION can cause issues with concurrent dependency lookups. For key > joins, this issue also extends to ALTER FUNCTION. > > The patch prevents stored expressions from depending on stale function > OIDs by locking referenced procedures before recording dependencies, and > by making CREATE OR REPLACE FUNCTION and ALTER FUNCTION take conflicting > locks before changing pg_proc. > > 0002 > The second patch implements the FOR KEY join feature. As this is a > first prototype, there are definitively things that needs to be > improved. For example, we would love feedback on our the revalidation > logic and our dependency tracking approach, that adds a new deptype for > purpose of tracking the proof facts. Another problem we didn't find a > perfect solution to, was our need to expand views during parse for proof > checking and finding constraints. > > The logics to compute the facts needed by the proof checker are kind of > complex, which is partially due to the ambition to not introduce > overhead to queries not using the new feature. > > The patch comes with a massive test suite, that we understand will need > to be trimmed down to have a chance to be committable. > > 0003 > The third patch adds information_schema.view_constraint_usage, that > shows what constraints a view depend on, due to usage of key joins in > the view's query. This is also part of the proposal. > > Joel Jacobson > Vik Fearing > Andreas Karlsson > Arne Roland > Anders Granlund > > [1] [Avoiding double-counting in aggregates with more than one join?] > (https://www.postgresql.org/message-id/flat/86b9ec78-925c-1935-bc9d-6bad4ceb1f40@illuminatedcomputing.com) > [2] [RE: Parallel INSERT SELECT take 2] > (https://www.postgresql.org/message-id/TY4PR01MB17718A4DE63020A9EA5E9CB6594382%40TY4PR01MB17718.jpnprd01.prod.outlook.com) I noticed cfbot was red; I see I forgot to include these files in patch 0002: src/test/modules/injection_points/expected/key_join_proof_race_record_proc_dep.out src/test/modules/injection_points/specs/key_join_proof_race_record_proc_dep.spec Fixed in new version attached. /Joel