Thread
-
Key joins
Joel Jacobson <joel@compiler.org> — 2026-05-28T18:47:50Z
Hi hackers, This patch implements a new SQL language feature, that we intent to submit as a Change Proposal to the WG 3 SQL committee for the next meeting in Stockholm in June. We would greatly appreciate any feedback from the community. Key Joins --------- Mathematically, CROSS JOINs are a cartesian product, INNER JOINs a subset of it, and OUTER JOINs potentially null-extend that subset. In practice, however, most joins look up additional information along a foreign key rather than resembling a cartesian product. Today a written query does not convey whether a particular JOIN simply enriches the referencing side, filters it, or fans it out: -- both sums silently inflated by the fan-out SELECT o.id, SUM(oi.amount), SUM(p.amount) FROM orders o LEFT JOIN order_items oi ON oi.order_id = o.id LEFT JOIN payments p ON p.order_id = o.id GROUP BY o.id; The above example is based on a pgsql-generals thread [1]. We propose a new JOIN syntax that makes it easy to determine locally that the immediate join result, before any further steps, just enriches the referencing side with information from the referenced side, with null-extension for OUTER JOINs. It conveys the author's intent, makes the referencing side visually clear, and is enforced at compile time against the schema. If we can't prove it, the user gets a compile-time error. Under FOR KEY the same query will not compile: SELECT o.id, SUM(oi.amount), SUM(p.amount) FROM orders o LEFT JOIN order_items oi FOR KEY (order_id) -> o (id) LEFT JOIN payments p FOR KEY (order_id) -> o (id) GROUP BY o.id; ERROR: key join from referencing relation p to referenced relation o cannot be proven LINE 7: LEFT JOIN payments AS p FOR KEY (order_id) -> o (id) ^ DETAIL: Referenced columns o (id) are not proven unique. A preceding join may duplicate rows from referenced relation o. Web demo -------- The attached Discussion paper has also been published at https://keyjoin.org with all examples in the paper runnable in the browser using a patched PGLite. Patches ------- 0001 The first patch is an attempt to fix a problem discussed in thread [2], where DROP FUNCTION can cause issues with concurrent dependency lookups. For key joins, this issue also extends to ALTER FUNCTION. The patch prevents stored expressions from depending on stale function OIDs by locking referenced procedures before recording dependencies, and by making CREATE OR REPLACE FUNCTION and ALTER FUNCTION take conflicting locks before changing pg_proc. 0002 The second patch implements the FOR KEY join feature. As this is a first prototype, there are definitively things that needs to be improved. For example, we would love feedback on our the revalidation logic and our dependency tracking approach, that adds a new deptype for purpose of tracking the proof facts. Another problem we didn't find a perfect solution to, was our need to expand views during parse for proof checking and finding constraints. The logics to compute the facts needed by the proof checker are kind of complex, which is partially due to the ambition to not introduce overhead to queries not using the new feature. The patch comes with a massive test suite, that we understand will need to be trimmed down to have a chance to be committable. 0003 The third patch adds information_schema.view_constraint_usage, that shows what constraints a view depend on, due to usage of key joins in the view's query. This is also part of the proposal. Joel Jacobson Vik Fearing Andreas Karlsson Arne Roland Anders Granlund [1] [Avoiding double-counting in aggregates with more than one join?] (https://www.postgresql.org/message-id/flat/86b9ec78-925c-1935-bc9d-6bad4ceb1f40@illuminatedcomputing.com) [2] [RE: Parallel INSERT SELECT take 2] (https://www.postgresql.org/message-id/TY4PR01MB17718A4DE63020A9EA5E9CB6594382%40TY4PR01MB17718.jpnprd01.prod.outlook.com)