Thread

  1. Re: Adding REPACK [concurrently]

    Amit Kapila <amit.kapila16@gmail.com> — 2026-05-13T12:34:17Z

    On Tue, May 12, 2026 at 4:38 PM Antonin Houska <ah@cybertec.at> wrote:
    >
    > Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > >
    > > I see your point. Due to this, once the xmin regresses based on
    > > cluster-wide running_xact, some transaction could appear to be running
    > > when it should have appeared as committed.
    >
    > The problem is that xmin does not advance when it should. Attached is a test
    > that reproduces the problem (it includes [1], to handle injection points in
    > background worker), I hope the comments in the specification file are helpful.
    >
    
    Yes, the comments were helpful. IIUC, the test skipped insert into
    repack_test because the transaction doing that insert happened before
    the snapbuild reached FULL_SNAPSHOT/CONSISTENT state, so its commit is
    not decoded. Then we also didn't update builder->xmin after reaching
    CONSISTENT state in the last running_xact record for MyDatabase. So,
    the insert is neither covered in initial copy, nor decoded, so after
    repack (concurrently) is finished the table is empty.
    
    I think the patch proposed will fix this specific issue but apart from
    other points raised for this patch few more are: (a) Post CONSISTENT
    state, for cleanup of db_specific snapshots, we need to separately
    again LOG db-specific running xacts whenever we encounter another
    running_xacts, (b) Other point is that because
    repack-concurrently always use full-snapshot, the serialization of
    snapshot is useless because it can't be used by others.
    
    > It's actually not exactly the problem reported in the stress test, but IMO the
    > core issue is the same: effects of some transactions are lost. In the stress
    > test, tuple deletion was lost, so the error was "could not create unique
    > index". Here I only demonstrate lost INSERT.
    >
    > > Assuming, the problematic case is something
    > > like what I described, even than the fix of skipping cluster-wide
    > > running xacts and instead LOG db-specific running_xacts to help
    > > updating builder's xmin sounds inelegant and probably inefficient. For
    > > example, I think such a dependency means we can never enable
    > > db-specific snapshots on standby.
    >
    > ok
    >
    
    So now the question is where do we go from here. I am not confident
    that the current code to achieve db-specific snapshots in logical
    decoding is the best possible solution both because of the drawbacks
    (like we won't be able to enable this on standby) and inefficiencies
    pointed out by me in this and previous emails in this work.
    
    -- 
    With Regards,
    Amit Kapila.