Thread

Key joins

Joel Jacobson <joel@compiler.org> — 2026-05-28T18:47:50Z
Hi hackers,

This patch implements a new SQL language feature, that we intent to
submit as a Change Proposal to the WG 3 SQL committee for the next
meeting in Stockholm in June.  We would greatly appreciate any feedback
from the community.

Key Joins
---------

Mathematically, CROSS JOINs are a cartesian product, INNER JOINs a
subset of it, and OUTER JOINs potentially null-extend that subset. In
practice, however, most joins look up additional information along a
foreign key rather than resembling a cartesian product.

Today a written query does not convey whether a particular JOIN simply
enriches the referencing side, filters it, or fans it out:

-- both sums silently inflated by the fan-out
SELECT o.id, SUM(oi.amount), SUM(p.amount)
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
LEFT JOIN payments    p  ON p.order_id  = o.id
GROUP BY o.id;

The above example is based on a pgsql-generals thread [1].

We propose a new JOIN syntax that makes it easy to determine locally
that the immediate join result, before any further steps, just enriches
the referencing side with information from the referenced side, with
null-extension for OUTER JOINs. It conveys the author's intent, makes
the referencing side visually clear, and is enforced at compile time
against the schema. If we can't prove it, the user gets a compile-time
error.

Under FOR KEY the same query will not compile:

SELECT o.id, SUM(oi.amount), SUM(p.amount)
FROM orders o
LEFT JOIN order_items oi FOR KEY (order_id) -> o (id)
LEFT JOIN payments    p  FOR KEY (order_id) -> o (id)
GROUP BY o.id;

ERROR:  key join from referencing relation p to referenced relation o cannot be proven
LINE 7: LEFT JOIN payments    AS p  FOR KEY (order_id) -> o (id)
                                    ^
DETAIL:  Referenced columns o (id) are not proven unique. A preceding join may duplicate rows from referenced relation o.

Web demo
--------

The attached Discussion paper has also been published at https://keyjoin.org
with all examples in the paper runnable in the browser using a patched PGLite.

Patches
-------

0001
The first patch is an attempt to fix a problem discussed in thread [2], where
DROP FUNCTION can cause issues with concurrent dependency lookups.  For key
joins, this issue also extends to ALTER FUNCTION.

The patch prevents stored expressions from depending on stale function
OIDs by locking referenced procedures before recording dependencies, and
by making CREATE OR REPLACE FUNCTION and ALTER FUNCTION take conflicting
locks before changing pg_proc.

0002
The second patch implements the FOR KEY join feature.  As this is a
first prototype, there are definitively things that needs to be
improved. For example, we would love feedback on our the revalidation
logic and our dependency tracking approach, that adds a new deptype for
purpose of tracking the proof facts.  Another problem we didn't find a
perfect solution to, was our need to expand views during parse for proof
checking and finding constraints.

The logics to compute the facts needed by the proof checker are kind of
complex, which is partially due to the ambition to not introduce
overhead to queries not using the new feature.

The patch comes with a massive test suite, that we understand will need
to be trimmed down to have a chance to be committable.

0003
The third patch adds information_schema.view_constraint_usage, that
shows what constraints a view depend on, due to usage of key joins in
the view's query.  This is also part of the proposal.

Joel Jacobson
Vik Fearing
Andreas Karlsson
Arne Roland
Anders Granlund

[1] [Avoiding double-counting in aggregates with more than one join?]
(https://www.postgresql.org/message-id/flat/86b9ec78-925c-1935-bc9d-6bad4ceb1f40@illuminatedcomputing.com)
[2] [RE: Parallel INSERT SELECT take 2]
(https://www.postgresql.org/message-id/TY4PR01MB17718A4DE63020A9EA5E9CB6594382%40TY4PR01MB17718.jpnprd01.prod.outlook.com)