Thread

  1. Key joins

    Joel Jacobson <joel@compiler.org> — 2026-05-28T18:47:50Z

    Hi hackers,
    
    This patch implements a new SQL language feature, that we intent to
    submit as a Change Proposal to the WG 3 SQL committee for the next
    meeting in Stockholm in June.  We would greatly appreciate any feedback
    from the community.
    
    Key Joins
    ---------
    
    Mathematically, CROSS JOINs are a cartesian product, INNER JOINs a
    subset of it, and OUTER JOINs potentially null-extend that subset. In
    practice, however, most joins look up additional information along a
    foreign key rather than resembling a cartesian product.
    
    Today a written query does not convey whether a particular JOIN simply
    enriches the referencing side, filters it, or fans it out:
    
    -- both sums silently inflated by the fan-out
    SELECT o.id, SUM(oi.amount), SUM(p.amount)
    FROM orders o
    LEFT JOIN order_items oi ON oi.order_id = o.id
    LEFT JOIN payments    p  ON p.order_id  = o.id
    GROUP BY o.id;
    
    The above example is based on a pgsql-generals thread [1].
    
    We propose a new JOIN syntax that makes it easy to determine locally
    that the immediate join result, before any further steps, just enriches
    the referencing side with information from the referenced side, with
    null-extension for OUTER JOINs. It conveys the author's intent, makes
    the referencing side visually clear, and is enforced at compile time
    against the schema. If we can't prove it, the user gets a compile-time
    error.
    
    Under FOR KEY the same query will not compile:
    
    SELECT o.id, SUM(oi.amount), SUM(p.amount)
    FROM orders o
    LEFT JOIN order_items oi FOR KEY (order_id) -> o (id)
    LEFT JOIN payments    p  FOR KEY (order_id) -> o (id)
    GROUP BY o.id;
    
    ERROR:  key join from referencing relation p to referenced relation o cannot be proven
    LINE 7: LEFT JOIN payments    AS p  FOR KEY (order_id) -> o (id)
                                        ^
    DETAIL:  Referenced columns o (id) are not proven unique. A preceding join may duplicate rows from referenced relation o.
    
    Web demo
    --------
    
    The attached Discussion paper has also been published at https://keyjoin.org
    with all examples in the paper runnable in the browser using a patched PGLite.
    
    Patches
    -------
    
    0001
    The first patch is an attempt to fix a problem discussed in thread [2], where
    DROP FUNCTION can cause issues with concurrent dependency lookups.  For key
    joins, this issue also extends to ALTER FUNCTION.
    
    The patch prevents stored expressions from depending on stale function
    OIDs by locking referenced procedures before recording dependencies, and
    by making CREATE OR REPLACE FUNCTION and ALTER FUNCTION take conflicting
    locks before changing pg_proc.
    
    0002
    The second patch implements the FOR KEY join feature.  As this is a
    first prototype, there are definitively things that needs to be
    improved. For example, we would love feedback on our the revalidation
    logic and our dependency tracking approach, that adds a new deptype for
    purpose of tracking the proof facts.  Another problem we didn't find a
    perfect solution to, was our need to expand views during parse for proof
    checking and finding constraints.
    
    The logics to compute the facts needed by the proof checker are kind of
    complex, which is partially due to the ambition to not introduce
    overhead to queries not using the new feature.
    
    The patch comes with a massive test suite, that we understand will need
    to be trimmed down to have a chance to be committable.
    
    0003
    The third patch adds information_schema.view_constraint_usage, that
    shows what constraints a view depend on, due to usage of key joins in
    the view's query.  This is also part of the proposal.
    
    Joel Jacobson
    Vik Fearing
    Andreas Karlsson
    Arne Roland
    Anders Granlund
    
    [1] [Avoiding double-counting in aggregates with more than one join?]
    (https://www.postgresql.org/message-id/flat/86b9ec78-925c-1935-bc9d-6bad4ceb1f40@illuminatedcomputing.com)
    [2] [RE: Parallel INSERT SELECT take 2]
    (https://www.postgresql.org/message-id/TY4PR01MB17718A4DE63020A9EA5E9CB6594382%40TY4PR01MB17718.jpnprd01.prod.outlook.com)