Thread

  1. Re: Adding REPACK [concurrently]

    Amit Kapila <amit.kapila16@gmail.com> — 2026-05-10T11:31:04Z

    On Tue, May 5, 2026 at 6:17 PM Antonin Houska <ah@cybertec.at> wrote:
    >
    > Antonin Houska <ah@cybertec.at> wrote:
    >
    > I think the problem is that, with a database-specific snapshot,
    > SnapBuildProcessRunningXacts() returns early without adjusting
    > builder->xmin:
    >
    >         /*
    >          * Database specific transaction info may exist to reach CONSISTENT state
    >          * faster, however the code below makes no use of it. Moreover, such
    >          * record might cause problems because the following normal (cluster-wide)
    >          * record can have lower value of oldestRunningXid. In that case, let's
    >          * wait with the cleanup for the next regular cluster-wide record.
    >          */
    >         if (OidIsValid(running->dbid))
    >                 return;
    >
    > and thus some transactions whose XID is below running->oldestRunningXid may
    > continue to be incorrectly considered running.
    >
    > I originally thought that this should not happen because such transactions
    > will be added to the builder's array of committed transactions by
    > SnapBuildCommitTxn() anyway. However, I failed to notice that the COMMIT
    > record of a transaction listed in the xl_running_xacts WAL record is not
    > guaranteed to follow the xl_running_xacts record in WAL. In other words,
    > even if xl_running_xacts is created before the COMMIT record of a
    > transaction it contains, it may end up at a higher LSN in WAL. So the
    > cleanup I relied on might not take place.
    >
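
The ordering problem described in the quoted text can be illustrated with a
toy model. This is only a sketch under stated assumptions, not PostgreSQL
code: ToyBuilder, ToyRunningXacts and all toy_* functions are hypothetical
names, XID wraparound is ignored, and the visibility rule is reduced to
"below xmin means finished, otherwise finished only if the commit was
decoded". The scenario assumes XID 100 was listed as running in the record
the builder started from, while its COMMIT sits at a lower LSN and is
therefore never decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

#define TOY_MAX_COMMITTED 8

/* Toy stand-in for the snapshot builder; NOT PostgreSQL's SnapBuild. */
typedef struct ToyBuilder
{
    Xid xmin;                           /* XIDs below this count as finished */
    Xid xmax;                           /* XIDs at or above this count as running */
    Xid committed[TOY_MAX_COMMITTED];   /* commits seen while decoding */
    int ncommitted;
} ToyBuilder;

/* Toy stand-in for an xl_running_xacts record. */
typedef struct ToyRunningXacts
{
    bool db_specific;                   /* models OidIsValid(running->dbid) */
    Xid  oldest_running_xid;
} ToyRunningXacts;

/* Models a decoded COMMIT record being remembered by the builder. */
static void
toy_commit(ToyBuilder *b, Xid xid)
{
    assert(b->ncommitted < TOY_MAX_COMMITTED);
    b->committed[b->ncommitted++] = xid;
}

/* Models the early return: database-specific records do not advance xmin. */
static void
toy_process_running_xacts(ToyBuilder *b, const ToyRunningXacts *r)
{
    if (r->db_specific)
        return;
    if (r->oldest_running_xid > b->xmin)
        b->xmin = r->oldest_running_xid;
}

/* Simplified rule for whether the builder still treats xid as in progress. */
static bool
toy_considered_running(const ToyBuilder *b, Xid xid)
{
    if (xid < b->xmin)
        return false;
    if (xid >= b->xmax)
        return true;
    for (int i = 0; i < b->ncommitted; i++)
        if (b->committed[i] == xid)
            return false;
    return true;
}

/*
 * XID 100 committed before decoding reached its COMMIT record, so
 * toy_commit() is never called for it. Returns whether XID 100 is still
 * considered running after one more running-xacts record is processed.
 */
static bool
toy_scenario(bool next_record_db_specific)
{
    ToyBuilder      b = {.xmin = 100, .xmax = 103, .ncommitted = 0};
    ToyRunningXacts r = {next_record_db_specific, 102};

    toy_commit(&b, 101);                /* a commit that was decoded normally */
    toy_process_running_xacts(&b, &r);
    return toy_considered_running(&b, 100);
}
```

In this model, toy_scenario(true) leaves XID 100 classified as running
because neither the commit path nor the xmin adjustment ever covers it,
whereas toy_scenario(false) clears it once the cluster-wide record raises
xmin to 102.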
    
    BTW, is it possible to write a test using injection_points or manual
    steps (with a debugger, etc.) so that we can more clearly understand
    this problem and the proposed fix?
    
    -- 
    With Regards,
    Amit Kapila.