Thread

  1. Re: Key joins

    Joel Jacobson <joel@compiler.org> — 2026-05-28T22:13:52Z

    On Thu, May 28, 2026, at 20:47, Joel Jacobson wrote:
    > Hi hackers,
    >
    > This patch implements a new SQL language feature, that we intent to
    > submit as a Change Proposal to the WG 3 SQL committee for the next
    > meeting in Stockholm in June.  We would greatly appreciate any feedback
    > from the community.
    >
    > Key Joins
    > ---------
    >
    > Mathematically, CROSS JOINs are a cartesian product, INNER JOINs a
    > subset of it, and OUTER JOINs potentially null-extend that subset. In
    > practice, however, most joins look up additional information along a
    > foreign key rather than resembling a cartesian product.
    >
    > Today a written query does not convey whether a particular JOIN simply
    > enriches the referencing side, filters it, or fans it out:
    >
    > -- both sums silently inflated by the fan-out
    > SELECT o.id, SUM(oi.amount), SUM(p.amount)
    > FROM orders o
    > LEFT JOIN order_items oi ON oi.order_id = o.id
    > LEFT JOIN payments    p  ON p.order_id  = o.id
    > GROUP BY o.id;
    >
    > The above example is based on a pgsql-generals thread [1].
    >
    > We propose a new JOIN syntax that makes it easy to determine locally
    > that the immediate join result, before any further steps, just enriches
    > the referencing side with information from the referenced side, with
    > null-extension for OUTER JOINs. It conveys the author's intent, makes
    > the referencing side visually clear, and is enforced at compile time
    > against the schema. If we can't prove it, the user gets a compile-time
    > error.
    >
    > Under FOR KEY the same query will not compile:
    >
    > SELECT o.id, SUM(oi.amount), SUM(p.amount)
    > FROM orders o
    > LEFT JOIN order_items oi FOR KEY (order_id) -> o (id)
    > LEFT JOIN payments    p  FOR KEY (order_id) -> o (id)
    > GROUP BY o.id;
    >
    > ERROR:  key join from referencing relation p to referenced relation o 
    > cannot be proven
    > LINE 7: LEFT JOIN payments    AS p  FOR KEY (order_id) -> o (id)
    >                                     ^
    > DETAIL:  Referenced columns o (id) are not proven unique. A preceding 
    > join may duplicate rows from referenced relation o.
    >
    > Web demo
    > --------
    >
    > The attached Discussion paper has also been published at https://keyjoin.org
    > with all examples in the paper runnable in the browser using a patched PGLite.
    >
    > Patches
    > -------
    >
    > 0001
    > The first patch is an attempt to fix a problem discussed in thread [2], where
    > DROP FUNCTION can cause issues with concurrent dependency lookups.  For key
    > joins, this issue also extends to ALTER FUNCTION.
    >
    > The patch prevents stored expressions from depending on stale function
    > OIDs by locking referenced procedures before recording dependencies, and
    > by making CREATE OR REPLACE FUNCTION and ALTER FUNCTION take conflicting
    > locks before changing pg_proc.
    >
    > 0002
    > The second patch implements the FOR KEY join feature.  As this is a
    > first prototype, there are definitively things that needs to be
    > improved. For example, we would love feedback on our the revalidation
    > logic and our dependency tracking approach, that adds a new deptype for
    > purpose of tracking the proof facts.  Another problem we didn't find a
    > perfect solution to, was our need to expand views during parse for proof
    > checking and finding constraints.
    >
    > The logics to compute the facts needed by the proof checker are kind of
    > complex, which is partially due to the ambition to not introduce
    > overhead to queries not using the new feature.
    >
    > The patch comes with a massive test suite, that we understand will need
    > to be trimmed down to have a chance to be committable.
    >
    > 0003
    > The third patch adds information_schema.view_constraint_usage, that
    > shows what constraints a view depend on, due to usage of key joins in
    > the view's query.  This is also part of the proposal.
    >
    > Joel Jacobson
    > Vik Fearing
    > Andreas Karlsson
    > Arne Roland
    > Anders Granlund
    >
    > [1] [Avoiding double-counting in aggregates with more than one join?]
    > (https://www.postgresql.org/message-id/flat/86b9ec78-925c-1935-bc9d-6bad4ceb1f40@illuminatedcomputing.com)
    > [2] [RE: Parallel INSERT SELECT take 2]
    > (https://www.postgresql.org/message-id/TY4PR01MB17718A4DE63020A9EA5E9CB6594382%40TY4PR01MB17718.jpnprd01.prod.outlook.com)
    
    I noticed cfbot was red; I see I forgot to include these files in patch 0002:
    src/test/modules/injection_points/expected/key_join_proof_race_record_proc_dep.out
    src/test/modules/injection_points/specs/key_join_proof_race_record_proc_dep.spec
    
    Fixed in new version attached.
    
    /Joel