Thread

Re: Adding REPACK [concurrently]

Amit Kapila <amit.kapila16@gmail.com> — 2026-05-14T07:02:25Z
On Wed, May 13, 2026 at 10:28 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Hello Amit,
>
> On 2026-May-13, Amit Kapila wrote:
>
> > So now the question is where do we go from here. I am not confident
> > that the current code to achieve db-specific snapshots in logical
> > decoding is the best possible solution both because of the drawbacks
> > (like we won't be able to enable this on standby) and inefficiencies
> > pointed out by me in this and previous emails in this work.
>
> This is a fair question.  I don't think we have time to go much further
> on this aspect before beta 1, so we either accept this patch, fix the
> inefficiencies you pointed out and keep db-specific snapshots,
>

I don't think it would be easy to address these inefficiencies before
beta 1. The root of those inefficiencies is that the patch reuses the
cluster-wide running_xact WAL infrastructure to log db-specific
running transactions, and then tries to feed that into the existing
snapbuild machinery to reach a consistent state.

As another example of this mismatch that occurred to me today: in
SnapBuildCommitTxn, we are tracking the committed_xip array for all
cluster-wide XIDs, even when using a db-specific snapshot. A
db-specific snapshot shouldn't need to care about XIDs from other
databases. We only try to take care of this in one part of the system,
where we process the running_xacts record. I admit that I don't know at
this stage what exactly we should do about it, but all such things
deserve discussion and careful thought.
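
To make the mismatch concrete, here is a toy model, not PostgreSQL
code. Every name in it (ToyBuilder, toy_commit_txn, TOY_ALL_DATABASES)
is hypothetical, and it ignores details such as which transactions
modify catalogs. It only shows how a commit-tracking array grows with
and without a per-database filter; whether such a filter is the right
fix is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; these are NOT the PostgreSQL definitions. */
typedef uint32_t ToyXid;
typedef uint32_t ToyDbId;

#define TOY_MAX_COMMITTED 64
#define TOY_ALL_DATABASES 0		/* hypothetical "cluster-wide" marker */

typedef struct ToyBuilder
{
	ToyDbId		db_filter;		/* TOY_ALL_DATABASES = track every commit */
	size_t		ncommitted;
	ToyXid		committed[TOY_MAX_COMMITTED];	/* plays the role of committed_xip */
} ToyBuilder;

static void
toy_builder_init(ToyBuilder *b, ToyDbId db_filter)
{
	b->db_filter = db_filter;
	b->ncommitted = 0;
}

/*
 * Record a commit. Returns 1 if the XID was tracked, 0 if it was skipped
 * because it belongs to a database the builder does not care about.
 */
static int
toy_commit_txn(ToyBuilder *b, ToyXid xid, ToyDbId dbid)
{
	if (b->db_filter != TOY_ALL_DATABASES && dbid != b->db_filter)
		return 0;

	assert(b->ncommitted < TOY_MAX_COMMITTED);
	b->committed[b->ncommitted++] = xid;
	return 1;
}
```

In this model, a builder initialized with TOY_ALL_DATABASES tracks
every commit it is fed, while one initialized with a specific database
id tracks only the commits from that database.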

The broader issue is that the entire logical decoding mechanism is
designed to process cluster-wide transactions. This patch tries to
bypass that foundational assumption, but only during the initial
snapshot construction, while processing the running_xacts record.

To be clear, I am not against the idea of db-specific snapshots to
enable concurrent repacks. My concern is simply the time required to
get the architecture right. In its current state, we need more time to
carefully consider how this db-specific concept interacts with the
rest of the logical decoding machinery, which is built for
cluster-wide records.

> or we
> revert db-specific snapshots and go back to the standard snapshot-taking
> technique for REPACK in 19 and see what we can improve for 20.
>
> Now, the worst consequence of reverting db-specific snapshots is that
> you will only be able to run REPACK in a single database at a time
> (because any subsequent REPACK will have to wait until the first one
> finishes before being able to get its snapshot).  In most normal cases
> this is probably not a big deal.  But if you have a multitenant system,
> and you want your users to be able to run REPACK on their tables, you
> may be a bit screwed.  So I hesitate to just go and revert it without
> offering those people any alternative.
>

I understand your point, but I feel we can extend the current feature
in future versions to address such cases (allowing REPACK CONCURRENTLY
on tables in multiple databases simultaneously). For now, users who
want to repack tables in multiple databases simultaneously may need to
rely on REPACK without the CONCURRENTLY option.

> (It's also possible that being unable to run more than one REPACK at a
> time is not so big a deal.  After all, it's supposed to be an infrequent
> operation.  And users probably don't or shouldn't have multi-terabyte
> tables in multitenant databases anyway.)
>
> I'm not sure I understand the point of the standby.  I mean, you can't
> run REPACK on the standby anyway, so I don't see this as a very
> problematic restriction.  Do you have other reasons for wanting a
> db-specific snapshot in a standby?
>

We are exposing need_shared_catalogs as a generic plugin option,
defined as: 'it can be set to false if one is certain the plugin
functions do not access shared system catalogs.' This implies it can
be used for purposes other than REPACK.

For example, one can imagine a single-database audit plugin that only
cares about data modifications within a specific database. By setting
need_shared_catalogs = false on a standby, it could reach a CONSISTENT
state much faster, perfectly serving its needs.

While such a plugin might not exist right now, my broader point is
this: when we expose a generic facility, it can and will be used in
ways beyond our initial core use cases. We should try to ensure the
design doesn't permanently preclude such extensions. With the current
design choice, we are painting ourselves into a corner where this
feature cannot easily be extended to standbys even in the future.

-- 
With Regards,
Amit Kapila.