Re: Key joins

Joel Jacobson <joel@compiler.org>

From: "Joel Jacobson" <joel@compiler.org>

To: pgsql-hackers <pgsql-hackers@postgresql.org>

Cc: "Arne Roland" <arne.roland@malkut.net>, "Anders Granlund" <anders.granlund.0@gmail.com>, "Andreas Karlsson" <andreas@proxel.se>, "Vik Fearing" <vik@chouppes.com>

Date: 2026-05-28T22:13:52Z

Lists: pgsql-hackers

Attachments

v2-0003-Add-information_schema.view_constraint_usage.patch.gz (application/x-gzip)
v2-0002-Implement-FOR-KEY-join-support.patch.gz (application/x-gzip)
v2-0001-Lock-procedures-before-recording-dependencies.patch.gz (application/x-gzip)

On Thu, May 28, 2026, at 20:47, Joel Jacobson wrote:
> Hi hackers,
>
> This patch implements a new SQL language feature, that we intent to
> submit as a Change Proposal to the WG 3 SQL committee for the next
> meeting in Stockholm in June.  We would greatly appreciate any feedback
> from the community.
>
> Key Joins
> ---------
>
> Mathematically, CROSS JOINs are a cartesian product, INNER JOINs a
> subset of it, and OUTER JOINs potentially null-extend that subset. In
> practice, however, most joins look up additional information along a
> foreign key rather than resembling a cartesian product.
>
> Today a written query does not convey whether a particular JOIN simply
> enriches the referencing side, filters it, or fans it out:
>
> -- both sums silently inflated by the fan-out
> SELECT o.id, SUM(oi.amount), SUM(p.amount)
> FROM orders o
> LEFT JOIN order_items oi ON oi.order_id = o.id
> LEFT JOIN payments    p  ON p.order_id  = o.id
> GROUP BY o.id;
>
> The above example is based on a pgsql-generals thread [1].
>
> We propose a new JOIN syntax that makes it easy to determine locally
> that the immediate join result, before any further steps, just enriches
> the referencing side with information from the referenced side, with
> null-extension for OUTER JOINs. It conveys the author's intent, makes
> the referencing side visually clear, and is enforced at compile time
> against the schema. If we can't prove it, the user gets a compile-time
> error.
>
> Under FOR KEY the same query will not compile:
>
> SELECT o.id, SUM(oi.amount), SUM(p.amount)
> FROM orders o
> LEFT JOIN order_items oi FOR KEY (order_id) -> o (id)
> LEFT JOIN payments    p  FOR KEY (order_id) -> o (id)
> GROUP BY o.id;
>
> ERROR:  key join from referencing relation p to referenced relation o 
> cannot be proven
> LINE 7: LEFT JOIN payments    AS p  FOR KEY (order_id) -> o (id)
>                                     ^
> DETAIL:  Referenced columns o (id) are not proven unique. A preceding 
> join may duplicate rows from referenced relation o.
>
> Web demo
> --------
>
> The attached Discussion paper has also been published at https://keyjoin.org
> with all examples in the paper runnable in the browser using a patched PGLite.
>
> Patches
> -------
>
> 0001
> The first patch is an attempt to fix a problem discussed in thread [2], where
> DROP FUNCTION can cause issues with concurrent dependency lookups.  For key
> joins, this issue also extends to ALTER FUNCTION.
>
> The patch prevents stored expressions from depending on stale function
> OIDs by locking referenced procedures before recording dependencies, and
> by making CREATE OR REPLACE FUNCTION and ALTER FUNCTION take conflicting
> locks before changing pg_proc.
>
> 0002
> The second patch implements the FOR KEY join feature.  As this is a
> first prototype, there are definitively things that needs to be
> improved. For example, we would love feedback on our the revalidation
> logic and our dependency tracking approach, that adds a new deptype for
> purpose of tracking the proof facts.  Another problem we didn't find a
> perfect solution to, was our need to expand views during parse for proof
> checking and finding constraints.
>
> The logics to compute the facts needed by the proof checker are kind of
> complex, which is partially due to the ambition to not introduce
> overhead to queries not using the new feature.
>
> The patch comes with a massive test suite, that we understand will need
> to be trimmed down to have a chance to be committable.
>
> 0003
> The third patch adds information_schema.view_constraint_usage, that
> shows what constraints a view depend on, due to usage of key joins in
> the view's query.  This is also part of the proposal.
>
> Joel Jacobson
> Vik Fearing
> Andreas Karlsson
> Arne Roland
> Anders Granlund
>
> [1] [Avoiding double-counting in aggregates with more than one join?]
> (https://www.postgresql.org/message-id/flat/86b9ec78-925c-1935-bc9d-6bad4ceb1f40@illuminatedcomputing.com)
> [2] [RE: Parallel INSERT SELECT take 2]
> (https://www.postgresql.org/message-id/TY4PR01MB17718A4DE63020A9EA5E9CB6594382%40TY4PR01MB17718.jpnprd01.prod.outlook.com)

I noticed cfbot was red; I see I forgot to include these files in patch 0002:
src/test/modules/injection_points/expected/key_join_proof_race_record_proc_dep.out
src/test/modules/injection_points/specs/key_join_proof_race_record_proc_dep.spec

Fixed in new version attached.

/Joel