Thread
-
Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW
Adam Brusselback <adambrusselback@gmail.com> — 2026-05-26T18:53:28Z
Hi Vellaipandiyan, thanks for the review. > I wonder whether overlapping refreshes could still encounter deadlock > scenarios around UPSERT conflicts. That was definitely a gap. The FOR UPDATE step previously issued: SELECT 1 FROM mv WHERE (...) FOR UPDATE with no ORDER BY, so two overlapping refreshes could lock the existing rows in different physical orders and deadlock before either one reached the next statement. To fix this, the next patch gives the locking SELECT a deterministic ORDER BY on the unique key columns so every refresh acquires row locks in the same sequence. The upsert now feeds from an ordered source as well. > The CONCURRENTLY behavior also feels somewhat unintuitive here. I'll include this change in the next patch. After more thinking about it (and some feedback), I decided that going this way seems better. So any prior discussion about CONCURRENTLY vs without is now talking about the inverse of the current version of the patch. Additionally, I've fixed a "scope drift" issue where the two different implementations had inconsistent behavior. A row's non-key column changes in the base table, shifting it into the WHERE predicate's scope. Previously, this caused a unique constraint violation for the match_merge path because the old row was invisible to the filter and never deleted. This was previously handled by the direct_mod path properly. Now, both the direct-modification and match/merge paths resolve this by using an INSERT ... ON CONFLICT DO UPDATE step (or DO NOTHING if there are no non-key columns) against the arbiter index. This safely updates the out-of-scope stale row in place without duplicating the key. > It may also help to document the intended guarantees around overlapping > partial refreshes and concurrent DML on base tables. Here is my attempt at saying what the implementation actually guarantees; I will fit this into the docs in some way. If anyone notices something wrong with the below, please speak up. Concurrent Partial Refresh (direct_mod path): The FOR UPDATE / upsert-CTE path takes RowExclusiveLock. Readers are never blocked. Neither RowExclusiveLock nor the per-row FOR UPDATE locks conflict with a plain SELECT. The refresh issues two statements in one transaction: the locking SELECT, then the select/upsert/ anti-join-delete CTE. Under the default READ COMMITTED, these take separate snapshots, but the FOR UPDATE row locks are held to transaction end, so they bridge both statements. An overlapping refresh blocks on those locks for the full duration of the refreshing transaction, not just for the locking SELECT, so the gap between the two statements does not expose existing MV rows. The CTE's upsert and delete are a single statement, so a reader sees either the pre-refresh or post-refresh state of an affected row, never a half-applied one. Concurrent partial refreshes whose predicates touch overlapping existing MV rows are serialized. The second waits on the first's FOR UPDATE locks. Refreshes over disjoint row sets run in parallel. Within the predicate scope, the MV is made consistent with the query snapshot: rows present in the snapshot are upserted, rows that no longer appear are deleted via the anti-join. Rows whose key falls outside the predicate are not touched. Note this is keyed on the unique index, not on the predicate. As mentioned with the drift fix, if the predicate matches a fresh row whose key collides with an existing MV row that does not currently match the predicate, ON CONFLICT will update that existing row, and the step-one FOR UPDATE will not have locked it. This only arises when the predicate references non-key columns. When the predicate ranges over the unique-key columns, the colliding row necessarily matches too. It does not lock the base tables. Base-table DML committed after the refresh's snapshot is not reflected, identical to a normal full REFRESH. SELECT FOR UPDATE only serializes overlapping refreshes covering rows that already exist in the MV. Two refreshes that both insert the same new logical key are serialized by ON CONFLICT and the unique index, not by FOR UPDATE. The outcome is still correct. The last writer wins on that key. The predicate must be non-volatile (enforced) and a usable unique index is required (enforced). The above assumes READ COMMITTED. I haven't thought through how things will work with other isolation levels. Non-Concurrent Partial Refresh (match_merge path): The WHERE diff/merge path takes an ExclusiveLock and operates similarly to the standard full-refresh diff/merge, but with the diff scope restricted to rows matching the predicate. Readers are allowed, writers are blocked, and overlapping refreshes are serialized at the table level. The main difference from a full concurrent refresh is that its final insert uses ON CONFLICT DO UPDATE (just like the direct_mod path) specifically to resolve unique key violations caused by rows drifting into the predicate's scope. I will provide a new patch shortly. Thanks, Adam Brusselback