Thread
-
Re: Remove inner joins based on foreign keys
Richard Guo <guofenglinux@gmail.com> — 2026-05-01T08:25:13Z
On Thu, Apr 30, 2026 at 4:50 PM Tender Wang <tndrwang@gmail.com> wrote: > The inner join between orders and users is not removed. It appears > this can be removed in this query. This is my first question. Right. This is a mis-optimization case I was aware of. The inner join can be removed in this case. inner_join_is_removable() is currently overly conservative about ref_rel->joininfo: it bails as long as the list is non-empty. We can relax this to bail only when a clause actually references ref_rel via clause_relids. A clause can appear in ref_rel->joininfo without its expression referencing ref_rel, which connects to your second question. > The second question may not be related to this $SUBJECT. > My question is that the (n.id = o.id) clause seems not related to > users. Why does the planner add it to the users' RelOptInfo's > joininfo? > > I only know that the thing happens in the > reconsider_outer_join_clauses(), the restrictinfo->required_relids > contains {1,2,3}, where the rtindex = 3 is the users relation. Because the clause "n.id = o.id" is an outer join ON clause that is non-degenerate (one that references the non-nullable side of the join) and we need to force it to be evaluated exactly at the level of the outer join. To handle that, we add the join's minimum input relid set to required_relids. That's why it appears in users' joininfo without referencing users. Attached patch relaxes the check against ref_rel->joininfo. Nothing else has changed. - Richard -
Re: Remove inner joins based on foreign keys
Tender Wang <tndrwang@gmail.com> — 2026-05-06T12:57:57Z
Richard Guo <guofenglinux@gmail.com> 于2026年5月1日周五 16:25写道: > Because the clause "n.id = o.id" is an outer join ON clause that is > non-degenerate (one that references the non-nullable side of the join) > and we need to force it to be evaluated exactly at the level of the > outer join. To handle that, we add the join's minimum input relid set > to required_relids. That's why it appears in users' joininfo without > referencing users. Thanks for the explanation. > > Attached patch relaxes the check against ref_rel->joininfo. Nothing > else has changed. CREATE TABLE users (id int primary key, name text); CREATE TABLE orders (id int primary key, user_id int not null references users(id), amount int); CREATE TABLE nation (id int primary key, name text); postgres=# explain select n.* from nation n left join (orders o join users u on o.user_id = u.id) on n.id = o.id; QUERY PLAN ------------------------------------------------------------------------- Hash Right Join (cost=38.58..74.34 rows=1270 width=36) Hash Cond: (o.id = n.id) -> Seq Scan on orders o (cost=0.00..30.40 rows=2040 width=8) -> Hash (cost=22.70..22.70 rows=1270 width=36) -> Seq Scan on nation n (cost=0.00..22.70 rows=1270 width=36) (5 rows) postgres=# explain select n.* from nation n left join orders o on n.id = o.id; QUERY PLAN ------------------------------------------------------------- Seq Scan on nation n (cost=0.00..22.70 rows=1270 width=36) (1 row) Recently, I encountered the two plans above. The first plan can continue to transform into the second plan after inner-join removing. But in current logic, we cannot do this. Because we do left-join removable first. Then we do other join(semi-join, self-join,inner-join) removable. We can only remove outer-join, whose min-righthand is single. Maybe we can call remove_useless_joins() again to remove the outer join that the function couldn't remove in the first call. I'm not sure it is worth doing this. Any thoughts? -
Re: Remove inner joins based on foreign keys
Alexandra Wang <alexandra.wang.oss@gmail.com> — 2026-05-07T19:12:06Z
Hi Richard, On Fri, May 1, 2026 at 1:25 AM Richard Guo <guofenglinux@gmail.com> wrote: > Attached patch relaxes the check against ref_rel->joininfo. Nothing > else has changed. Thanks for the patch! I've tested and reviewed v4. Overall, this looks good to me. Please find individual feedback below. On Wed, Apr 1, 2026 at 2:46 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Mon, Mar 23, 2026 at 3:12 PM Richard Guo <guofenglinux@gmail.com> wrote: > > Just for the sake of archives, the timeline of the trap during a > > cascading delete looks like this: > > > > T0: DELETE FROM users WHERE id = 1; > > > > T1: The executor finds users row 1 and sets its xmax, physically > > marking it as dead. > > > > T2: [The Gap] The executor pushes the RI trigger into a queue to deal > > with orders later. Right now, the orders row still exists, but its > > referenced row in users is dead. > > > > T3: The statement finishes, the trigger fires, and the orders row is > > finally deleted. > > > > The "T2 Gap" is small, but there are several ways to execute user code > > inside that window, such as RETURNING clauses, volatile functions, or > > user-defined AFTER ROW triggers. > > I've been exploring ways to detect whether a query is executing within > the RI trigger gap. It seems one promising approach is to leverage > the lock manager. > > Every DML statement acquires RowExclusiveLock on its target table > before modifying any rows, so if the trigger gap is active, the lock > is guaranteed to be held. We can verify this during planning by > calling CheckRelationOidLockedByMe(), which is a quite efficient > check. (As a point of precedent, the planner already relies on the > local lock manager's state in several existing code paths.) > > For prepared statements and cached plans, a generic plan that was > built with FK join removal could be reused in a context where the > trigger gap is active. To handle this, we can modify > choose_custom_plan() to check whether any relation involved in an > FK-based join removal currently holds RowExclusiveLock, and choose > custom plan if so. > > The trade-off is false positives: because RowExclusiveLock persists > for the entire transaction while the trigger gap is intra-statement, > the optimization is also skipped after the DML completes but within > the same transaction, after ROLLBACK TO a savepoint, or when > RowExclusiveLock is held for other reasons (e.g., LOCK TABLE). These > seem like uncommon cases, and the query simply falls back to executing > the join normally as it would without the optimization at all, so I > think this is an acceptable trade-off. On Tue, Apr 28, 2026 at 3:09 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Wed, Apr 1, 2026 at 6:45 PM Richard Guo <guofenglinux@gmail.com> wrote: > > Please see the v2 patch for the implementation details. > > > > I didn't find any mention of this approach in the 2014 thread. I'd > > appreciate any thoughts or feedback on this direction. > > Here is a new rebase over 20efbdffe. It also tightens up some checks > in inner_join_is_removable(). > > I've reconsidered the trigger-gap issue and I think I now have a > cleaner understanding. A snapshot captured inside the trigger-gap > window can outlive the window's closure -- for example via a STABLE > function that inherits an older snapshot from its caller, as Tom > pointed out back in 2015 [1]. A query executed against such a stale > snapshot would still observe the inconsistency even after the trigger > queue has drained. So the predicate guarding the optimization has to > remain positive at least as long as any in-transaction snapshot could > still be referenced. > > I went through several alternatives (a per-statement counter > incremented in ExecInitModifyTable, AfterTriggerPendingOnRel, etc.) > and convinced myself that all of them go silent before the relevant > snapshots are gone. The lock-based predicate is still the best > correct approach I can think of: RowExclusiveLock is released only at > end of transaction, which trivially exceeds the lifetime of any > in-transaction snapshot. By the time the lock is released, every > in-transaction snapshot has been released, so no stale gap-window > snapshot can still be referenced. Maybe the false positives are just > the price we need to pay for that lifetime guarantee. Checking RowExclusiveLock makes sense to me. One nitpick on wording: the RowExclusiveLock acquired at the beginning of the DML command can be released early by ROLLBACK TO a savepoint established before that command, so it doesn't strictly "exceed the lifetime of any in-transaction snapshot." In practice this doesn't matter, though, because the lock is guaranteed to outlive any snapshot taken during the trigger gap of the DML command, which is the only window where the FK invariant can be violated. IIUC, as long as we ensure that the snapshot of the join statement -- whether it is an after-row trigger, RETURNING clause, cursor, etc. -- is taken outside of the trigger gap, which IS guarded by RowExclusiveLock, removing the join from its plan should be safe. So, setting aside the false positives you mentioned, which I think are acceptable, I believe this approach is correct. I then tried to find edge cases similar to the 2014 threads and found one bug with TABLESAMPLE: -- If the patch removes the join when the referenced table uses TABLESAMPLE, -- it will return MORE rows than correct (all child rows instead of only -- those whose parent appeared in the sample). CREATE TABLE exp_parent (id int PRIMARY KEY, data text); CREATE TABLE exp_child ( id int PRIMARY KEY, parent_id int NOT NULL REFERENCES exp_parent(id) ); INSERT INTO exp_parent SELECT g, 'row_' || g FROM generate_series(1, 100) g; INSERT INTO exp_child SELECT g, g FROM generate_series(1, 100) g; ANALYZE exp_parent; ANALYZE exp_child; -- This join should NOT be removed because TABLESAMPLE limits the parent rows EXPLAIN (COSTS OFF) SELECT c.id FROM exp_child c INNER JOIN exp_parent p TABLESAMPLE BERNOULLI(1) REPEATABLE(42) ON c.parent_id = p.id; -- Verify: if join removal fires, this would return ~100 rows. -- With the join, it should return ~1 row (1% sample). SELECT count(*) FROM exp_child c INNER JOIN exp_parent p TABLESAMPLE BERNOULLI(1) REPEATABLE(42) ON c.parent_id = p.id; We should add a condition for TABLESAMPLE in inner_join_is_removable() to disallow join removal. Minor things I think might be worth mentioning, but I don't have strong opinions: 1. Self-join removal has a GUC: enable_self_join_elimination. I didn't investigate why that was needed, and I don't know in what cases people would want to turn it off, maybe for debugging purposes? Do you think we need a similar GUC for inner joins? Disclaimer: I'm not a fan of unnecessary GUCs, just putting the question out there. 2. Nitpick: should we consider renaming remove_useless_joins() to remove_useless_left_joins() to reflect what it actually does? Best, Alex -- Alexandra Wang EDB: https://www.enterprisedb.com -
Re: Remove inner joins based on foreign keys
Richard Guo <guofenglinux@gmail.com> — 2026-05-25T21:01:08Z
(CC'ing folks who discussed this with me at PGConf.dev last week; thanks again for the conversations.) After several discussions at PGConf.dev last week, there are concerns with the lock-based predicate this thread has been converging on. I'd like to lay out the underlying issue and the available approaches in detail on-list, so the wider community can weigh in before we go further. I'm not advocating any particular option here. My goal in this email is to make sure everyone evaluating the patch is starting from the same picture. 1. The trigger gap, demonstrated -------------------------------- On master, no patch applied: CREATE TABLE users (id int PRIMARY KEY, name text); CREATE TABLE orders (id int PRIMARY KEY, user_id int REFERENCES users(id) ON DELETE CASCADE); INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'); INSERT INTO orders VALUES (10, 1), (11, 1), (20, 2); CREATE FUNCTION show_gap() RETURNS void LANGUAGE plpgsql AS $$ DECLARE u_rows text; o_rows text; BEGIN SELECT string_agg(format('%s (id=%s)', name, id), ', ' ORDER BY id) INTO u_rows FROM users; SELECT string_agg(format('(id=%s, user_id=%s)', id, user_id), ', ' ORDER BY id) INTO o_rows FROM orders; RAISE NOTICE 'users: %', u_rows; RAISE NOTICE 'orders: %', o_rows; END $$; DELETE FROM users WHERE id = 1 RETURNING show_gap(); NOTICE: users: Bob (id=2) NOTICE: orders: (id=10, user_id=1), (id=11, user_id=1), (id=20, user_id=2) show_gap() runs inside the DELETE's RETURNING clause, after Alice has been removed from users but before the cascade trigger has deleted her orders. In show_gap()'s snapshot, Alice is gone, but her two orders (10 and 11) are still there, pointing at a user_id that no longer exists. The FK invariant is locally false. Why this happens: FK constraints are enforced by AFTER ROW triggers that fire at end of statement, after RETURNING evaluation. Between when the DELETE modifies the heap and when the cascade trigger fires, the heap is transiently inconsistent. Any user code that runs in that window with a fresh snapshot observes the inconsistency. In the demo above, plpgsql's SPI call takes such a snapshot. For the FK-based inner-join-removal optimization, this matters because the rewrite from 'child JOIN parent ON c.fk = p.id' to 'child WHERE c.fk IS NOT NULL' assumes the FK invariant holds for the executing snapshot. In show_gap()'s snapshot it doesn't, and the rewrite would return Alice's orders (10 and 11) as if Alice were still present, when the joined form correctly excludes them. 2. Approaches ------------- What PG promises about FK constraints today is what the SQL standard requires: INITIALLY IMMEDIATE constraints are satisfied at end-of- statement, INITIALLY DEFERRED at commit. Neither the standard nor the PG docs explicitly promise that the FK invariant holds for every snapshot a query inside the database can use. The optimization proposed in this thread needs the stronger property: it rewrites a join on the assumption that every visible child row with non-null FK has a corresponding visible parent in the same snapshot. The trigger gap is where that stronger property fails. The approaches I have considered are listed below, with (B) being the predicate currently in the v4 patch. (A) Drop the optimization. Don't pursue FK-based inner-join removal, or any other optimization whose correctness depends on the FK invariant. (B) Lock-based predicate. inner_join_is_removable() and choose_custom_plan() consult CheckRelationOidLockedByMe(RowExclusiveLock) for the FK rels; skip the rewrite if held. RowExclusiveLock is xact-scoped and exceeds the lifetime of any in-xact snapshot, so the lock being present is sufficient to block the optimization in any context where a gap-bearing snapshot might exist. Pros: No new infrastructure. Predicate is local to two functions. CheckRelationOidLockedByMe() is a cheap local hash lookup. Correct against every gap-bearing snapshot path. Cons: Xact-scoped, therefore pessimistic. Once a DML on either FK rel runs in a transaction, the rewrite is suppressed for the rest of that transaction, even when subsequent statements take fresh snapshots that the FK invariant *does* hold for. For cached plans that previously used FK removal, this also produces a planner-replan storm: choose_custom_plan trips on every consultation in the writing xact and replans each time. (C) Snapshot-anchored predicate. Add a "captured during a gap window" bit to SnapshotData, populated in CopySnapshot when AfterTriggerPendingOnAnyRel() returns true at capture time. inner_join_is_removable() and choose_custom_plan() consult the bit on the active snapshot. Pros: Precise. Only snapshots actually captured inside a gap suppress the optimization. Sequential DML-then-SELECT in the same xact keeps the rewrite. Eliminates the replan storm and most of the plan-shape cliff that (B) introduces. Cons: SnapshotData, one of PG's most foundational abstractions, gains a field that encodes trigger-subsystem state. Every future reader of snapshot.h has to learn what the gap is. (D) Close the gap. Make the FK invariant hold for every in-xact snapshot. Either by enforcing RI synchronously inside the DML rather than via AFTER ROW triggers, or by arranging that the deleted parent tuple and the trigger's modifications to children become visible atomically. Pros: Strengthens what PG promises about constraints. Benefits any future optimization that needs the FK invariant. No gating predicate is needed in the planner or plan cache. Cons: Invasive and difficult. (E) Apply the optimization without gap handling, and document the corner cases. Skip the predicate entirely. The optimization fires whenever the FK constraint structurally permits it. Queries that execute against a gap-bearing snapshot can return wrong results, documented as a known limitation. Pros: Smallest patch. Writing a join query that runs during a trigger gap is unusual in practice for most user code. Cons: Can have wrong results. (F) Something else. 3. On splitting the patch ------------------------- Independently of which option above wins, I'd like to separate the patch into two parts so the optimization mechanics and the gap-handling can be reviewed independently (see v5 patches): Part 1. The structural inner-join-removal logic, assuming the FK invariant holds. Part 2. Whatever gap-handling we converge on from section 2 above. If (A) wins, Part 2 is replaced by dropping Part 1 entirely. Part 1 alone would be unsafe to commit by itself; the value of the split is reviewing-order, not commit-order. (Attached v5 is the split version of v4, plus addressing Alex's two comments. 0002 is still the lock-based predicate from v4, posted as the concrete reference for option (B); it can be swapped for whichever approach the gap-handling discussion settles on.) I'd like input on which of the approaches in section 2 people would be willing to live with. Pointers to prior discussion I haven't found, or design considerations I've missed, are also welcome. - Richard