Thread
-
Re: Remove inner joins based on foreign keys
Richard Guo <guofenglinux@gmail.com> — 2026-05-25T21:01:08Z
(CC'ing folks who discussed this with me at PGConf.dev last week; thanks again for the conversations.) After several discussions at PGConf.dev last week, there are concerns with the lock-based predicate this thread has been converging on. I'd like to lay out the underlying issue and the available approaches in detail on-list, so the wider community can weigh in before we go further. I'm not advocating any particular option here. My goal in this email is to make sure everyone evaluating the patch is starting from the same picture. 1. The trigger gap, demonstrated -------------------------------- On master, no patch applied: CREATE TABLE users (id int PRIMARY KEY, name text); CREATE TABLE orders (id int PRIMARY KEY, user_id int REFERENCES users(id) ON DELETE CASCADE); INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'); INSERT INTO orders VALUES (10, 1), (11, 1), (20, 2); CREATE FUNCTION show_gap() RETURNS void LANGUAGE plpgsql AS $$ DECLARE u_rows text; o_rows text; BEGIN SELECT string_agg(format('%s (id=%s)', name, id), ', ' ORDER BY id) INTO u_rows FROM users; SELECT string_agg(format('(id=%s, user_id=%s)', id, user_id), ', ' ORDER BY id) INTO o_rows FROM orders; RAISE NOTICE 'users: %', u_rows; RAISE NOTICE 'orders: %', o_rows; END $$; DELETE FROM users WHERE id = 1 RETURNING show_gap(); NOTICE: users: Bob (id=2) NOTICE: orders: (id=10, user_id=1), (id=11, user_id=1), (id=20, user_id=2) show_gap() runs inside the DELETE's RETURNING clause, after Alice has been removed from users but before the cascade trigger has deleted her orders. In show_gap()'s snapshot, Alice is gone, but her two orders (10 and 11) are still there, pointing at a user_id that no longer exists. The FK invariant is locally false. Why this happens: FK constraints are enforced by AFTER ROW triggers that fire at end of statement, after RETURNING evaluation. Between when the DELETE modifies the heap and when the cascade trigger fires, the heap is transiently inconsistent. Any user code that runs in that window with a fresh snapshot observes the inconsistency. In the demo above, plpgsql's SPI call takes such a snapshot. For the FK-based inner-join-removal optimization, this matters because the rewrite from 'child JOIN parent ON c.fk = p.id' to 'child WHERE c.fk IS NOT NULL' assumes the FK invariant holds for the executing snapshot. In show_gap()'s snapshot it doesn't, and the rewrite would return Alice's orders (10 and 11) as if Alice were still present, when the joined form correctly excludes them. 2. Approaches ------------- What PG promises about FK constraints today is what the SQL standard requires: INITIALLY IMMEDIATE constraints are satisfied at end-of- statement, INITIALLY DEFERRED at commit. Neither the standard nor the PG docs explicitly promise that the FK invariant holds for every snapshot a query inside the database can use. The optimization proposed in this thread needs the stronger property: it rewrites a join on the assumption that every visible child row with non-null FK has a corresponding visible parent in the same snapshot. The trigger gap is where that stronger property fails. The approaches I have considered are listed below, with (B) being the predicate currently in the v4 patch. (A) Drop the optimization. Don't pursue FK-based inner-join removal, or any other optimization whose correctness depends on the FK invariant. (B) Lock-based predicate. inner_join_is_removable() and choose_custom_plan() consult CheckRelationOidLockedByMe(RowExclusiveLock) for the FK rels; skip the rewrite if held. RowExclusiveLock is xact-scoped and exceeds the lifetime of any in-xact snapshot, so the lock being present is sufficient to block the optimization in any context where a gap-bearing snapshot might exist. Pros: No new infrastructure. Predicate is local to two functions. CheckRelationOidLockedByMe() is a cheap local hash lookup. Correct against every gap-bearing snapshot path. Cons: Xact-scoped, therefore pessimistic. Once a DML on either FK rel runs in a transaction, the rewrite is suppressed for the rest of that transaction, even when subsequent statements take fresh snapshots that the FK invariant *does* hold for. For cached plans that previously used FK removal, this also produces a planner-replan storm: choose_custom_plan trips on every consultation in the writing xact and replans each time. (C) Snapshot-anchored predicate. Add a "captured during a gap window" bit to SnapshotData, populated in CopySnapshot when AfterTriggerPendingOnAnyRel() returns true at capture time. inner_join_is_removable() and choose_custom_plan() consult the bit on the active snapshot. Pros: Precise. Only snapshots actually captured inside a gap suppress the optimization. Sequential DML-then-SELECT in the same xact keeps the rewrite. Eliminates the replan storm and most of the plan-shape cliff that (B) introduces. Cons: SnapshotData, one of PG's most foundational abstractions, gains a field that encodes trigger-subsystem state. Every future reader of snapshot.h has to learn what the gap is. (D) Close the gap. Make the FK invariant hold for every in-xact snapshot. Either by enforcing RI synchronously inside the DML rather than via AFTER ROW triggers, or by arranging that the deleted parent tuple and the trigger's modifications to children become visible atomically. Pros: Strengthens what PG promises about constraints. Benefits any future optimization that needs the FK invariant. No gating predicate is needed in the planner or plan cache. Cons: Invasive and difficult. (E) Apply the optimization without gap handling, and document the corner cases. Skip the predicate entirely. The optimization fires whenever the FK constraint structurally permits it. Queries that execute against a gap-bearing snapshot can return wrong results, documented as a known limitation. Pros: Smallest patch. Writing a join query that runs during a trigger gap is unusual in practice for most user code. Cons: Can have wrong results. (F) Something else. 3. On splitting the patch ------------------------- Independently of which option above wins, I'd like to separate the patch into two parts so the optimization mechanics and the gap-handling can be reviewed independently (see v5 patches): Part 1. The structural inner-join-removal logic, assuming the FK invariant holds. Part 2. Whatever gap-handling we converge on from section 2 above. If (A) wins, Part 2 is replaced by dropping Part 1 entirely. Part 1 alone would be unsafe to commit by itself; the value of the split is reviewing-order, not commit-order. (Attached v5 is the split version of v4, plus addressing Alex's two comments. 0002 is still the lock-based predicate from v4, posted as the concrete reference for option (B); it can be swapped for whichever approach the gap-handling discussion settles on.) I'd like input on which of the approaches in section 2 people would be willing to live with. Pointers to prior discussion I haven't found, or design considerations I've missed, are also welcome. - Richard