Thread

  1. Re: Remove inner joins based on foreign keys

    Richard Guo <guofenglinux@gmail.com> — 2026-05-25T21:01:08Z

    (CC'ing folks who discussed this with me at PGConf.dev last week;
    thanks again for the conversations.)
    
    After several discussions at PGConf.dev last week, there are concerns
    with the lock-based predicate this thread has been converging on.  I'd
    like to lay out the underlying issue and the available approaches in
    detail on-list, so the wider community can weigh in before we go
    further.  I'm not advocating any particular option here.  My goal in
    this email is to make sure everyone evaluating the patch is starting
    from the same picture.
    
    
    1. The trigger gap, demonstrated
    --------------------------------
    
    On master, no patch applied:
    
    CREATE TABLE users  (id int PRIMARY KEY, name text);
    CREATE TABLE orders (id int PRIMARY KEY,
                         user_id int REFERENCES users(id) ON DELETE CASCADE);
    INSERT INTO users  VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1), (20, 2);
    
    CREATE FUNCTION show_gap() RETURNS void LANGUAGE plpgsql AS $$
    DECLARE u_rows text; o_rows text;
    BEGIN
      SELECT string_agg(format('%s (id=%s)', name, id),
                        ', ' ORDER BY id)
        INTO u_rows FROM users;
      SELECT string_agg(format('(id=%s, user_id=%s)', id, user_id),
                        ', ' ORDER BY id)
        INTO o_rows FROM orders;
      RAISE NOTICE 'users:  %', u_rows;
      RAISE NOTICE 'orders: %', o_rows;
    END $$;
    
    DELETE FROM users WHERE id = 1 RETURNING show_gap();
    
    NOTICE:  users:  Bob (id=2)
    NOTICE:  orders: (id=10, user_id=1), (id=11, user_id=1), (id=20, user_id=2)
    
    show_gap() runs inside the DELETE's RETURNING clause, after Alice has
    been removed from users but before the cascade trigger has deleted her
    orders.  In show_gap()'s snapshot, Alice is gone, but her two orders
    (10 and 11) are still there, pointing at a user_id that no longer
    exists.  The FK invariant is locally false.
    
    Why this happens: FK constraints are enforced by AFTER ROW triggers
    that fire at end of statement, after RETURNING evaluation.  Between
    when the DELETE modifies the heap and when the cascade trigger fires,
    the heap is transiently inconsistent.  Any user code that runs in that
    window with a fresh snapshot observes the inconsistency.  In the demo
    above, plpgsql's SPI call takes such a snapshot.
    
    For the FK-based inner-join-removal optimization, this matters because
    the rewrite from 'child JOIN parent ON c.fk = p.id' to 'child WHERE
    c.fk IS NOT NULL' assumes the FK invariant holds for the executing
    snapshot.  In show_gap()'s snapshot it doesn't, and the rewrite would
    return Alice's orders (10 and 11) as if Alice were still present, when
    the joined form correctly excludes them.
    
    
    2. Approaches
    -------------
    
    What PG promises about FK constraints today is what the SQL standard
    requires: INITIALLY IMMEDIATE constraints are satisfied at end-of-
    statement, INITIALLY DEFERRED at commit.  Neither the standard nor the
    PG docs explicitly promise that the FK invariant holds for every
    snapshot a query inside the database can use.  The optimization
    proposed in this thread needs the stronger property: it rewrites a
    join on the assumption that every visible child row with non-null FK
    has a corresponding visible parent in the same snapshot.  The trigger
    gap is where that stronger property fails.
    
    The approaches I have considered are listed below, with (B) being
    the predicate currently in the v4 patch.
    
    (A) Drop the optimization.
    
    Don't pursue FK-based inner-join removal, or any other optimization
    whose correctness depends on the FK invariant.
    
    (B) Lock-based predicate.
    
    inner_join_is_removable() and choose_custom_plan() consult
    CheckRelationOidLockedByMe(RowExclusiveLock) for the FK rels; skip the
    rewrite if held.  RowExclusiveLock is xact-scoped and exceeds the
    lifetime of any in-xact snapshot, so the lock being present is
    sufficient to block the optimization in any context where a
    gap-bearing snapshot might exist.
    
    Pros: No new infrastructure.  Predicate is local to two functions.
    CheckRelationOidLockedByMe() is a cheap local hash lookup.  Correct
    against every gap-bearing snapshot path.
    
    Cons: Xact-scoped, therefore pessimistic.  Once a DML on either FK rel
    runs in a transaction, the rewrite is suppressed for the rest of that
    transaction, even when subsequent statements take fresh snapshots that
    the FK invariant *does* hold for.  For cached plans that previously
    used FK removal, this also produces a planner-replan storm:
    choose_custom_plan trips on every consultation in the writing xact and
    replans each time.
    
    (C) Snapshot-anchored predicate.
    
    Add a "captured during a gap window" bit to SnapshotData, populated in
    CopySnapshot when AfterTriggerPendingOnAnyRel() returns true at
    capture time.  inner_join_is_removable() and choose_custom_plan()
    consult the bit on the active snapshot.
    
    Pros: Precise.  Only snapshots actually captured inside a gap suppress
    the optimization.  Sequential DML-then-SELECT in the same xact keeps
    the rewrite.  Eliminates the replan storm and most of the plan-shape
    cliff that (B) introduces.
    
    Cons: SnapshotData, one of PG's most foundational abstractions, gains
    a field that encodes trigger-subsystem state.  Every future reader of
    snapshot.h has to learn what the gap is.
    
    (D) Close the gap.
    
    Make the FK invariant hold for every in-xact snapshot.  Either by
    enforcing RI synchronously inside the DML rather than via AFTER ROW
    triggers, or by arranging that the deleted parent tuple and the
    trigger's modifications to children become visible atomically.
    
    Pros: Strengthens what PG promises about constraints.  Benefits any
    future optimization that needs the FK invariant.  No gating predicate
    is needed in the planner or plan cache.
    
    Cons: Invasive and difficult.
    
    (E) Apply the optimization without gap handling, and document the
    corner cases.
    
    Skip the predicate entirely.  The optimization fires whenever the FK
    constraint structurally permits it.  Queries that execute against a
    gap-bearing snapshot can return wrong results, documented as a known
    limitation.
    
    Pros: Smallest patch.  Writing a join query that runs during a trigger
    gap is unusual in practice for most user code.
    
    Cons: Can have wrong results.
    
    (F) Something else.
    
    
    3. On splitting the patch
    -------------------------
    
    Independently of which option above wins, I'd like to separate the
    patch into two parts so the optimization mechanics and the
    gap-handling can be reviewed independently (see v5 patches):
    
    Part 1. The structural inner-join-removal logic, assuming the FK
    invariant holds.
    
    Part 2. Whatever gap-handling we converge on from section 2 above.  If
    (A) wins, Part 2 is replaced by dropping Part 1 entirely.
    
    Part 1 alone would be unsafe to commit by itself; the value of the
    split is reviewing-order, not commit-order.
    
    (Attached v5 is the split version of v4, plus addressing Alex's two
    comments.  0002 is still the lock-based predicate from v4, posted as
    the concrete reference for option (B); it can be swapped for whichever
    approach the gap-handling discussion settles on.)
    
    
    I'd like input on which of the approaches in section 2 people would be
    willing to live with.  Pointers to prior discussion I haven't found,
    or design considerations I've missed, are also welcome.
    
    - Richard