Re: Adding REPACK [concurrently]

Amit Kapila <amit.kapila16@gmail.com>

From: Amit Kapila <amit.kapila16@gmail.com>
To: Antonin Houska <ah@cybertec.at>
Cc: Mihail Nikalayeu <mihailnikalayeu@gmail.com>, Andres Freund <andres@anarazel.de>, Alvaro Herrera <alvherre@alvh.no-ip.org>, Srinath Reddy Sadipiralla <srinath2133@gmail.com>, Matthias van de Meent <boekewurm+postgres@gmail.com>, Pg Hackers <pgsql-hackers@lists.postgresql.org>, Robert Treat <rob@xzilla.net>
Date: 2026-05-10T11:31:04Z
Lists: pgsql-hackers

On Tue, May 5, 2026 at 6:17 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Antonin Houska <ah@cybertec.at> wrote:
>
> I think the problem is that with database-specific snapshot,
> SnapBuildProcessRunningXacts() returns early, w/o adjusting builder->xmin
>
>         /*
>          * Database specific transaction info may exist to reach CONSISTENT state
>          * faster, however the code below makes no use of it. Moreover, such
>          * record might cause problems because the following normal (cluster-wide)
>          * record can have lower value of oldestRunningXid. In that case, let's
>          * wait with the cleanup for the next regular cluster-wide record.
>          */
>         if (OidIsValid(running->dbid))
>                 return;
>
> and thus some transactions whose XID is below running->oldestRunningXid may
> continue to be incorrectly considered running.
>
> I originally thought that this should not happen because such transactions
> will be added to the builder's array of committed transactions by
> SnapBuildCommitTxn() anyway. However, I failed to notice that COMMIT record of
> a transaction listed in the xl_running_xacts WAL record is not guaranteed to
> follow the xl_running_xacts record in WAL. In other words, even if
> xl_running_xacts is created before a COMMIT record of the contained
> transaction, it may end up at higher LSN in WAL. So the cleanup I relied on
> might not take place.
>
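
The ordering problem described above can be sketched as a toy model. This
is not PostgreSQL code: the Toy* names, the XID values 100 and 101, and
the simplified xmin/committed bookkeeping are assumptions made up for
illustration, and XIDs are compared as plain integers (no wraparound
handling). The only piece taken from the quoted text is the early return
for a database-specific running-xacts record.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

typedef enum { TOY_COMMIT, TOY_RUNNING_XACTS } ToyRecKind;

typedef struct
{
    ToyRecKind kind;
    ToyXid     xid;            /* TOY_COMMIT: the committing XID */
    ToyXid     oldest_running; /* TOY_RUNNING_XACTS: oldest running XID */
    bool       db_specific;    /* stands in for OidIsValid(running->dbid) */
} ToyRecord;

typedef struct
{
    ToyXid xmin;               /* XIDs below this are known to be finished */
    ToyXid committed[8];       /* XIDs whose COMMIT record was decoded */
    size_t ncommitted;
} ToyBuilder;

/* Apply one decoded record to the toy builder. */
static void
toy_process(ToyBuilder *b, const ToyRecord *rec)
{
    if (rec->kind == TOY_COMMIT)
    {
        if (b->ncommitted < 8)
            b->committed[b->ncommitted++] = rec->xid;
        return;
    }

    /* Mirrors the quoted early return: no xmin adjustment in this case. */
    if (rec->db_specific)
        return;

    if (rec->oldest_running > b->xmin)
        b->xmin = rec->oldest_running;
}

/* An XID counts as running if it is >= xmin and no COMMIT was decoded. */
static bool
toy_considered_running(const ToyBuilder *b, ToyXid xid)
{
    if (xid < b->xmin)
        return false;
    for (size_t i = 0; i < b->ncommitted; i++)
        if (b->committed[i] == xid)
            return false;
    return true;
}

/*
 * Decoding starts at a running-xacts record that still lists XID 100.
 * If the COMMIT of 100 sits at a lower LSN than that record, the toy
 * builder never decodes it. Returns whether 100 is still considered
 * running after a later running-xacts record has been processed.
 */
static bool
toy_xid_100_stuck(bool commit_precedes_start, bool later_record_db_specific)
{
    const ToyRecord commit = { TOY_COMMIT, 100, 0, false };
    const ToyRecord start = { TOY_RUNNING_XACTS, 0, 100, false };
    const ToyRecord later = { TOY_RUNNING_XACTS, 0, 101,
                              later_record_db_specific };
    ToyBuilder  b = {0};

    toy_process(&b, &start);
    if (!commit_precedes_start)
        toy_process(&b, &commit);   /* COMMIT at a higher LSN: decoded */
    toy_process(&b, &later);

    return toy_considered_running(&b, 100);
}
```

In this model, toy_xid_100_stuck(true, true) is the only combination that
leaves XID 100 considered running: the COMMIT is never decoded and the
database-specific record does not advance xmin. With a cluster-wide later
record, or with the COMMIT at a higher LSN than the first running-xacts
record, the XID is cleaned up.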

BTW, is it possible to write a test, either with injection_points or via
manual steps (using a debugger, etc.), so that we can understand this
problem and the proposed fix more clearly?

-- 
With Regards,
Amit Kapila.