Thread

  1. Re: Adding REPACK [concurrently]

    Amit Kapila <amit.kapila16@gmail.com> — 2026-05-14T07:02:25Z

    On Wed, May 13, 2026 at 10:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    >
    > Hello Amit,
    >
    > On 2026-May-13, Amit Kapila wrote:
    >
    > > So now the question is where do we go from here. I am not confident
    > > that the current code to achieve db-specific snapshots in logical
    > > decoding is the best possible solution both because of the drawbacks
    > > (like we won't be able to enable this on standby) and inefficiencies
    > > pointed out by me in this and previous emails in this work.
    >
    > This is a fair question.  I don't think we have time to go much further
    > on this aspect before beta 1, so we either accept this patch, fix the
    > inefficiencies you pointed out and keep db-specific snapshots,
    >
    
    I don't think it would be easy to address these inefficiencies before
    beta 1. The root of those inefficiencies is that the patch reuses the
    cluster-wide running_xact WAL infrastructure to log db-specific
    running transactions, and then tries to feed that into the existing
    snapbuild machinery to reach a consistent state.
    
    As another example of this mismatch that occurred to me today: in
    SnapBuildCommitTxn, we are tracking the committed_xip array for all
    cluster-wide XIDs, even when using a db-specific snapshot. A
    db-specific snapshot shouldn't need to care about XIDs from other
    databases. We only try to take care of it in one part of the system
    where process running_xacts record. I admit that I don't know at this
    stage what exactly we should do about it but all such things deserve a
    discussion and careful thought.
    
    The broader issue is that the entire logical decoding mechanism is
    designed to process cluster-wide transactions. This patch tries to
    bypass that foundational assumption, but only during the initial
    snapshot construction while processing running_xacts record.
    
    To be clear, I am not against the idea of db-specific snapshots to
    enable concurrent repacks. My concern is simply the time required to
    get the architecture right. In its current state, we need more time to
    carefully consider how this db-specific concept interacts with the
    rest of the logical decoding machinery, which is built for
    cluster-wide records.
    
    > or we
    > revert db-specific snapshots and go back to the standard snapshot-taking
    > technique for REPACK in 19 and see what we can improve for 20.
    >
    > Now, the worst consequence of reverting db-specific snapshots is that
    > you will only be able to run REPACK in a single database at a time
    > (because any subsequent REPACK will have to wait until the first one
    > finishes before being able to get its snapshot).  In most normal cases
    > this is probably not a big deal.  But if you have a multitenant system,
    > and you want your users to be able to run REPACK on their tables, you
    > may be a bit screwed.  So I hesitate to just go and revert it without
    > offering those people any alternative.
    >
    
    I understand your point but I feel we can extend the current feature
    in future versions to address such cases (allow REPACK CONCURRENTLY on
    tables in multiple-databases simultaneously). For now, they may need
    to rely on REPACK without CONCURRENTLY option, if they want to use it
    for multiple databases simultaneously.
    
    > (It's also possible that being unable to run more than one REPACK at a
    > time is not so big a deal.  After all, it's supposed to be an infrequent
    > operation.  And users probably don't or shouldn't have multi-terabyte
    > tables in multitenant databases anyway.)
    >
    > I'm not sure I understand the point of the standby.  I mean, you can't
    > run REPACK on the standby anyway, so I don't see this as a very
    > problematic restriction.  Do you have other reasons for wanting a
    > db-specific snapshot in a standby?
    >
    
    We are exposing need_shared_catalogs as a generic plugin option,
    defined as: 'it can be set to false if one is certain the plugin
    functions do not access shared system catalogs.' This implies it can
    be used for purposes other than REPACK.
    
    For example, one can imagine a single-database audit plugin that only
    cares about data modifications within a specific database. By setting
    need_shared_catalogs = false on a standby, it could reach a CONSISTENT
    state much faster, perfectly serving its needs.
    
    While such a plugin might not exist right now, my broader point is
    this: when we expose a generic facility, it can and will be used in
    ways beyond our initial core use cases. We should try to ensure the
    design doesn't permanently preclude such extensions. With the current
    design choice, we are painting ourselves into a corner where this
    feature cannot easily be extended to standbys even in the future.
    
    -- 
    With Regards,
    Amit Kapila.