Thread

  1. RE: Orphaned records in pg_replication_origin_status after subscription drop

    Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2025-12-22T10:15:30Z

    On Monday, December 22, 2025 2:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > On Mon, Dec 22, 2025 at 10:16 AM Zhijie Hou (Fujitsu)
    > <houzj.fnst@fujitsu.com> wrote:
    > >
    > > On Monday, December 22, 2025 7:01 AM Michael Paquier
    > <michael@paquier.xyz> wrote:
    > > > On Sat, Dec 20, 2025 at 02:55:15PM +0530, Amit Kapila wrote:
    > > > > As of today, I can't think of a case where next time (restart
    > > > > after
    > > > > error) we won't generate the same origin_name but I think this
    > > > > will add the dependency that each time the origin name should be
    > > > > generated the same.
    > > >
    > > > ReplicationOriginNameForLogicalRep() would generate the origin name
    > > > as pg_suboid_relid, so we would always reuse the same origin name on
    > > > restart as long as the subscription is not recreated, wouldn't we?
    > > >
    > > > > Also, moving just repl_origin_create part (leaving other things
    > > > > like origin_advance at its location) would need some explanation
    > > > > in comments which is that it has some dependency on
    > > > > DropSubscription and cleanup. Anyway, if we want to go with
    > > > > creating origin in a separate transaction than COPY, then we
    > > > > should change few comments like: "It is possible that the origin
    > > > > is not yet created for tablesync worker, this can happen for the
    > > > > states before SUBREL_STATE_FINISHEDCOPY." in the code.
    > > >
    > > > Hmm.  So...  Do you think that it should be OK to just create a new
    > > > transaction before we open the transaction that locks
    > > > MyLogicalRepWorker->relid (one that opens a BEGIN READ ONLY on the
    > > > remote side)?  And I guess that we would just move the
    > > > replorigin_by_name(),
    > > > replorigin_create() and ERROR into this new transaction?  Setting up
    > > > replorigin_session_setup() and replorigin_session_origin could then
    > > > be delayed until we have done the
    > > > replorigin_advance() in BEGIN READ ONLY transaction?  By that I mean
    > > > to leave the replorigin_advance() position untouched.  I have
    > > > studied this code quite a bit.  I "think" that something among these
    > > > lines would work, but I am not 100% sure if things are OK based the
    > > > relstate we store in each of these phases.  If you have an argument that
    > invalidates these lines, please feel free!
    > >
    > > I think the solution you outlined works. One nitpick is that, instead
    > > of starting a new transaction, we could create the origin within the
    > > same transaction that updates the DATASYNC states, thereby ensuring
    > > the origin information is available as soon as the catalog is updated.
    > > I think the bug won't happen as long as the origin is created in a
    > > transaction separate from the one setting up the shared memory state.
    > >
    > 
    > Agreed. But please update the comment (/* Update the state and make it
    > visible to others. */) just before that transaction to reflect that origin will also
    > be created in this transaction.
    
    Updated.
    
    > 
    > > Additionally, if we relocate the origin creation code, we need to
    > > remove the ERROR report. This is because the origin may already exist
    > > if a table sync restarts due to an ERROR during the initial COPY. This
    > > should be safe since the origin is created with the reserved name
    > > "pg_xx," preventing users from creating another origin with the same name.
    > >
    > 
    > Right.
    > 
    > + /*
    > + * Advance the origin to the LSN got from walrcv_create_slot. This is
    > + WAL
    > + * logged for the purpose of recovery. Locks are to prevent the
    > + * replication origin from vanishing while advancing.
    > + */
    > + LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
    > + replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
    > +    true /* go backward */ , true /* WAL log */ );
    > + UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
    > +
    >   /*
    >   * Setup replication origin tracking. The purpose of doing this before the
    >   * copy is to avoid doing the copy again due to any error in setting up
    >   * origin tracking.
    >   */
    > 
    > Shouldn't the second comment starting from "Setup replication origin .." be
    > merged with the previous one because it is also intended for the previous
    > code change?
    
    Agreed and merged.
    
    Here is the V2 patch which addressed above comments.
    
    Best Regards,
    Hou zj