Improve conflict detection when replication origins are reused

Nisha Moond <nisha.moond412@gmail.com>

From: Nisha Moond <nisha.moond412@gmail.com>
To: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2026-05-14T03:05:06Z
Lists: pgsql-hackers

Attachments

Hi hackers,

While reviewing the issue reported at [1] and the proposed solutions
at [2], I noticed a related problem: false negative conflict detection
when a 'ReplOriginId' gets reused.

In logical replication, conflict detection relies on the tuple’s
replication origin ('roident'). The problem is that if a subscription
is dropped and a new subscription later reuses the same origin ID, the
apply worker may incorrectly treat incoming changes as “its own”
changes and skip conflict detection.

A simple example:
  1. Create subscription sub1 with 'roident = 1'
  2. Replicate some rows into table 't1'
  3. Drop 'sub1'
  4. Create another subscription 'sub2'
  5. `sub2` reuses 'roident = 1'
  6. New updates arrive for rows previously written by 'sub1'
  At this point, conflict detection sees:
      tuple_origin == current_origin

and incorrectly assumes the row was written by the current
subscription instance, so no 'update_origin_differ' conflict is
raised.

This may look harmless in this simple setup, but it becomes
problematic if the new subscription is connected to a different
publisher, because real conflicts can then be silently missed.

I explored two possible approaches to solve this:

Approach 1. Zero out old origin IDs in commit_ts data when dropping a
subscription
----------------------
 - When a subscription is dropped and its replication origin becomes
free, scan all 'commit_ts' SLRU entries and replace that old origin ID
with 'InvalidRepOriginId (0)'.
 - So rows previously written by the old subscription would no longer
appear to belong to any active replication origin.
 - A new subscription reusing the same 'roident' will always conflict
with origin '0'.

Pros:
 - Fixes the stale-origin problem completely and may also help solve
the tablesync-origin issue discussed in [1]
 - No additional checks needed during conflict detection

Cons:
 - Requires scanning the entire 'commit_ts' SLRU during DROP
SUBSCRIPTION, so it can become very expensive on large systems
 - Not crash-safe currently(patch):
    - if the server crashes midway, some entries may still contain the
old origin ID
    - after restart, reused origins can again lead to missed conflicts
 - Making this fully crash-safe would likely require WAL logging or
recovery-time reprocessing.

Approach 2. Store replication origin creation time
----------------------
 - Add a creation timestamp for each replication origin
 - During conflict check:
    if tuple_origin != current_origin
        -> existing behavior
    if tuple_origin == current_origin
        -> compare tuple commit timestamp with origin creation time
        if tuple_commit_ts <= origin_creation_time
            -> treat as an origin reuse case and raise conflict

Pros:
-------
 - No additional processing during DROP SUBSCRIPTION
 - Lightweight runtime check (just one timestamp comparison)
 - Naturally crash-safe since origin creation is WAL-logged already

Cons:
 - Requires a catalog schema change
 - The <= comparison can produce false-positive conflicts for rows
committed at the exact same microsecond as origin creation
 -  May require additional handling for upgraded origins

IMO, the second approach currently looks more practical because it
avoids the heavy SLRU scan and crash-recovery complexity.

Attached:
 - Patch for approach 1
 - Patch for approach 2
 - A TAP test reproducing the issue

Note: The patches are manually tested for the reported issue, but not
yet tested for performance or additional edge cases.

Feedback and suggestions are welcome.

[1] https://www.postgresql.org/message-id/CALDaNm3Y6Y4Mub6QC8fZKnNy5jZspELQYCoQF_FL2Zwzweu%3Dog%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1LxGXR7jOAKh0B8N362S-Q3b6GhBxxcV_HxUaicEPq5Cg%40mail.gmail.com

--
Thanks,
Nisha