Thread

  1. Improve conflict detection when replication origins are reused

    Nisha Moond <nisha.moond412@gmail.com> — 2026-05-14T03:05:06Z

    Hi hackers,
    
    While reviewing the issue reported at [1] and the proposed solutions
    at [2], I noticed a related problem: false negative conflict detection
    when a 'ReplOriginId' gets reused.
    
    In logical replication, conflict detection relies on the tuple’s
    replication origin ('roident'). The problem is that if a subscription
    is dropped and a new subscription later reuses the same origin ID, the
    apply worker may incorrectly treat incoming changes as “its own”
    changes and skip conflict detection.
    
    A simple example:
      1. Create subscription sub1 with 'roident = 1'
      2. Replicate some rows into table 't1'
      3. Drop 'sub1'
      4. Create another subscription 'sub2'
      5. `sub2` reuses 'roident = 1'
      6. New updates arrive for rows previously written by 'sub1'
      At this point, conflict detection sees:
          tuple_origin == current_origin
    
    and incorrectly assumes the row was written by the current
    subscription instance, so no 'update_origin_differ' conflict is
    raised.
    
    This may look harmless in this simple setup, but it becomes
    problematic if the new subscription is connected to a different
    publisher, because real conflicts can then be silently missed.
    
    I explored two possible approaches to solve this:
    
    Approach 1. Zero out old origin IDs in commit_ts data when dropping a
    subscription
    ----------------------
     - When a subscription is dropped and its replication origin becomes
    free, scan all 'commit_ts' SLRU entries and replace that old origin ID
    with 'InvalidRepOriginId (0)'.
     - So rows previously written by the old subscription would no longer
    appear to belong to any active replication origin.
     - A new subscription reusing the same 'roident' will always conflict
    with origin '0'.
    
    Pros:
     - Fixes the stale-origin problem completely and may also help solve
    the tablesync-origin issue discussed in [1]
     - No additional checks needed during conflict detection
    
    Cons:
     - Requires scanning the entire 'commit_ts' SLRU during DROP
    SUBSCRIPTION, so it can become very expensive on large systems
     - Not crash-safe currently(patch):
        - if the server crashes midway, some entries may still contain the
    old origin ID
        - after restart, reused origins can again lead to missed conflicts
     - Making this fully crash-safe would likely require WAL logging or
    recovery-time reprocessing.
    
    Approach 2. Store replication origin creation time
    ----------------------
     - Add a creation timestamp for each replication origin
     - During conflict check:
        if tuple_origin != current_origin
            -> existing behavior
        if tuple_origin == current_origin
            -> compare tuple commit timestamp with origin creation time
            if tuple_commit_ts <= origin_creation_time
                -> treat as an origin reuse case and raise conflict
    
    Pros:
    -------
     - No additional processing during DROP SUBSCRIPTION
     - Lightweight runtime check (just one timestamp comparison)
     - Naturally crash-safe since origin creation is WAL-logged already
    
    Cons:
     - Requires a catalog schema change
     - The <= comparison can produce false-positive conflicts for rows
    committed at the exact same microsecond as origin creation
     -  May require additional handling for upgraded origins
    
    IMO, the second approach currently looks more practical because it
    avoids the heavy SLRU scan and crash-recovery complexity.
    
    Attached:
     - Patch for approach 1
     - Patch for approach 2
     - A TAP test reproducing the issue
    
    Note: The patches are manually tested for the reported issue, but not
    yet tested for performance or additional edge cases.
    
    Feedback and suggestions are welcome.
    
    [1] https://www.postgresql.org/message-id/CALDaNm3Y6Y4Mub6QC8fZKnNy5jZspELQYCoQF_FL2Zwzweu%3Dog%40mail.gmail.com
    [2] https://www.postgresql.org/message-id/CAA4eK1LxGXR7jOAKh0B8N362S-Q3b6GhBxxcV_HxUaicEPq5Cg%40mail.gmail.com
    
    --
    Thanks,
    Nisha