Thread

  1. Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

    Hannu Krosing <hannuk@google.com> — 2025-11-28T21:01:04Z

    On Fri, Nov 28, 2025 at 9:42 PM Mihail Nikalayeu
    <mihailnikalayeu@gmail.com> wrote:
    >
    > Hi, Hannu!
    >
    > I think you pressed "Reply" instead of "Reply All" - so, I put it to
    > the list (looks like nothing is secret here).
    > Mostly it is because of my opinion at the end of the mail which I want
    > to share with the list.
    
    Thanks, and yes, it was meant for the list.
    
    > On Fri, Nov 28, 2025 at 8:33 PM Hannu Krosing <hannuk@google.com> wrote:
    > > If it is an *index AM* then this may not solve HOT chains issue (see
    > > below), if we put it on top of *table AM* as some kind of pass-through
    > > collector then likely yes, though you may still want to do final sort
    > > in commit order to know which one is the latest version of updated
    > > tuples which needs to go in the index. The latter is not strictly
    > > needed, but would be a nice optimisation for oft-updated rows.
    >
    > It is AM which is added as an index (with the same
    > columns/expressions/predicates) to the table before phase 1 starts.
    > So, all new tuples are inserted into it.
    >
    > > And I would not collect just TID, but also the indexes value, as else
    > > we end up accessing the table in some random order for getting the
    > > value (and possibly do visibility checks)
    > Just TIDs - it is ordered at validation phase (while merging with an
    > main index) and read using AIO - pretty fast.
    
    It is a space vs work compromise - you either collect it at once or
    have to read it again later. Even pretty fast is still slower than
    doing nothing :)
    
    > > I am not sure where we decide that tuple is HOT-updatable, but I
    > > suspect that it is before we call any index AMs, so STIR ios not
    > > guaranteed to solve the issues with HOT chains.
    >
    > I am not sure what the HOT-chains issue is, but it actually works
    > correctly already, including stress tests.
    > It is even merged into one commercial fork of PG (I am not affiliated
    > with it in any way).
    
    It was about a simplistic approach for VACUUM to just ignore the CIC
    backends and then missing some inserts.
    
    > > (And yes, I have a patch in works to include old and new tids>  as part
    > > of logical decoding - they are "almost there", just not passed through
    > > - which would help here too to easily keep just the last value)
    >
    > Yes, at least it is required for the REPACK case.
    >
    > But....
    >
    > Antonin already has a prototype of patch to enable logical decoding
    > for all kinds of tables in [0] (done in scope of REPACK).
    >
    > So, if we have such mechanics in place, it looks nice (and almost the
    > same) for both CIC and REPACK:
    > * in both cases we create temporary slot to collect incoming tuples
    > * in both cases scan the table resetting snapshot every few pages to
    > keep xmin horizon propagate
    > * in both cases the process already collected part every few megabytes
    > * just the logic of using collected tuples is different...
    >
    > So, yes, from terms of effectiveness STIR seems to be better, but such
    > a common approach like LD looks tempting to have for both REPACK/CIC.
    
    My reasoning was mainly that using something that already exists, and
    must work correctly in any case, is a better long-term strategy than
    adding complexity in multiple places.
    
    After looking up when CIC appeared (v 8.2) and when logical decoding
    came along (v9.4) I start to think that CIC probably would have used
    LD if it had been available when CIC was added.
    
    > On Fri, Nov 28, 2025 at 5:58 PM Matthias van de Meent
    > <boekewurm+postgres@gmail.com> wrote:
    > > -1: Requiring the logical decoding system just to reindex an index
    > >  without O(tablesize) lock time adds too much overhead,
    >
    > How big is the additional cost of maintaining logical decoding for a
    > table? Could you please evolve a little bit?
    >
    > Best regards,
    > Mikhail.
    >
    >
    > [0]: https://www.postgresql.org/message-id/152010.1751307725%40localhost
    > (v15-0007-Enable-logical-decoding-transiently-only-for-REPACK-.patch)