Thread

  1. Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

    Matthias van de Meent <boekewurm+postgres@gmail.com> — 2025-11-28T16:57:55Z

    On Fri, 28 Nov 2025 at 15:50, Mihail Nikalayeu
    <mihailnikalayeu@gmail.com> wrote:
    >
    > Hello!
    >
    > On Thu, Nov 27, 2025 at 9:07 PM Matthias van de Meent
    > <boekewurm+postgres@gmail.com> wrote:
    > > While it might not break, and might not hold back other tables'
    > > visibility horizons, it'll still hold back pruning on the table we're
    > > acting on, and that's likely one which already had bloat issues if
    > > you're running RIC (or REPACK).
    >
    > Yes, a good point about REPACK, agreed.
    >
    > BTW, what about using the same reset snapshot technique for REPACK too?
    >
    > I thought it was impossible, but what if we:
    >
    > * while reading the heap, we "remember" our current page position in
    > shared memory
    > * preserve all xmin/xmax/cid in the newly created repacked table (we need
    > that for an MVCC-safe approach anyway)
    > * in the logical decoding layer, we check the TID of our tuple and, by
    > looking at the "current page", we can correctly decide what to do with
    > it at the apply phase:
    >
    > - if it is in the "not-yet-read pages" - ignore it (we will read it
    > later), but signal the scan to ensure it resets its snapshot before
    > that page (reset_before = min(reset_before, tid))
    > - if it is in the "already read pages" - remember the apply operation
    > (with exact target xmin/xmax and resulting xmin/xmax)
    
    Yes, exactly - keep track of which snapshot was used for which part of
    the table, and treat all updates that add/remove tuples from the scanned
    range after that snapshot as inserts/deletes, similar to how it'd work
    if logical replication (LR) had a filter on `ctid BETWEEN '(0, 0)' AND
    '(end-of-snapshot-scan)'` which then gets updated every so often.
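
    To make the bookkeeping concrete, here is a rough sketch of the
    classification rule in Python. It is only a model, not PostgreSQL
    code: `ScanProgress`, `read_block` and `classify` are hypothetical
    names, block numbers stand in for full TIDs, and snapshot visibility
    and commit ordering are ignored entirely.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Tid = Tuple[int, int]  # (block number, offset), shaped like a ctid


@dataclass
class ScanProgress:
    """Hypothetical state shared between the heap scan and the decoder."""
    next_block: int = 0                 # first block the scan has not read yet
    reset_before: Optional[int] = None  # reset the snapshot before this block

    def scanned(self, tid: Tid) -> bool:
        # True if the tuple lives on a page the scan has already read.
        return tid[0] < self.next_block

    def request_reset(self, block: int) -> None:
        # reset_before = min(reset_before, block), as in the quoted proposal.
        if self.reset_before is None or block < self.reset_before:
            self.reset_before = block

    def read_block(self) -> bool:
        # Read one more block; return True if the snapshot had to be
        # reset first because a decoded change touched this block.
        must_reset = (self.reset_before is not None
                      and self.reset_before <= self.next_block)
        if must_reset:
            self.reset_before = None
        self.next_block += 1
        return must_reset


def classify(progress: ScanProgress,
             old_tid: Optional[Tid],
             new_tid: Optional[Tid]) -> str:
    """Decide what a decoded change means for the already-scanned range."""
    old_in = old_tid is not None and progress.scanned(old_tid)
    new_in = new_tid is not None and progress.scanned(new_tid)
    # Anything on a not-yet-read page is left to the scan itself, which
    # must take a fresh snapshot before it reaches that page.
    for tid in (old_tid, new_tid):
        if tid is not None and not progress.scanned(tid):
            progress.request_reset(tid[0])
    if old_in and new_in:
        return "update"
    if new_in:
        return "insert"  # tuple entered the scanned range
    if old_in:
        return "delete"  # tuple left the scanned range
    return "ignore"      # the scan will see the result later
```

    With the scan at block 5, an update that moves a tuple from (2,1)
    to (7,3) is applied as a delete and sets reset_before to 7, so the
    scan picks up the new version itself once it gets there with a
    fresh snapshot.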
    
    I'm a bit worried, though, that LR may lose updates due to differences
    between the commit order in WAL and the order in which transactions
    become visible via PGPROC. I don't know how that's handled in logical
    decoding, and can't find much documentation about it in the repo either.
    
    
    Kind regards,
    
    Matthias van de Meent
    Databricks (https://www.databricks.com)