Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Revert changes to CONCURRENTLY that "sped up" Xmin advance

  2. VACUUM: ignore indexing operations with CONCURRENTLY

  1. Patch: VACUUM should ignore (CREATE |RE)INDEX CONCURRENTLY for xmin horizon calculations

    Hannu Krosing <hannuk@google.com> — 2025-11-24T21:18:26Z

    When VACUUM decides which rows are safe to freeze or permanently
    remove it currently ignores backends which have PROC_IN_VACUUM or
    PROC_IN_LOGICAL_DECODING bits set.
    
    This patch adds PROC_IN_SAFE_IC to this set, so backends running
    CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY and where the index
    is "simple" - i.e. not expression indexes or conditional indexes are
    involved - these would be ignored too.
    
    The reasoning behind this is simple:
    
    1) Why this is safe:
    
    a) The vacuum operation can not run on the same table that vacuum is
    working on because of locks.
    b) The CIC operation only runs on a single table in one transaction,
    so it can not touch other tables
    
    2) Why this is useful:
    
    CIC can take significant amount of time, and in case of high-traffic
    database with vacuum cleanups blocked a significant amount of dead
    rows can accumulate which can have significant impact on certain
    workloads. The worst affected are the ones that are considered
    anti-patterns anyway, like updatein a single counter row from all DML,
    but this can work "well enough" if all the DML transactions are tiny
    and and the performance can be maintained between vacuum runs by just
    setting the deleted flags in indexes and heap which currentlyis also
    blocke.
    
    Future improvements
    
    It would be good to do some more introspection to determine if the CIC
    skipping is also safe for specifioc cases of expression and
    conditional indexes which are currently excluded from setting the
    PROC_IN_SAFE_IC flag.
    
    ---
    Hannu
    
  2. Re: Patch: VACUUM should ignore (CREATE |RE)INDEX CONCURRENTLY for xmin horizon calculations

    Peter Geoghegan <pg@bowt.ie> — 2025-11-24T22:09:05Z

    On Mon, Nov 24, 2025 at 4:18 PM Hannu Krosing <hannuk@google.com> wrote:
    > When VACUUM decides which rows are safe to freeze or permanently
    > remove it currently ignores backends which have PROC_IN_VACUUM or
    > PROC_IN_LOGICAL_DECODING bits set.
    >
    > This patch adds PROC_IN_SAFE_IC to this set, so backends running
    > CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY and where the index
    > is "simple" - i.e. not expression indexes or conditional indexes are
    > involved - these would be ignored too.
    
    Are you aware of commit d9d076222f5b? It was subsequently reverted by
    commit e28bb885 because it led to subtle data corruption. Indexes had
    wrong contents due to an unforeseen interaction with pruning.
    
    -- 
    Peter Geoghegan
    
    
    
    
  3. Re: Patch: VACUUM should ignore (CREATE |RE)INDEX CONCURRENTLY for xmin horizon calculations

    Dilip Kumar <dilipbalaut@gmail.com> — 2025-11-25T14:05:51Z

    On Tue, Nov 25, 2025 at 3:39 AM Peter Geoghegan <pg@bowt.ie> wrote:
    >
    > On Mon, Nov 24, 2025 at 4:18 PM Hannu Krosing <hannuk@google.com> wrote:
    > > When VACUUM decides which rows are safe to freeze or permanently
    > > remove it currently ignores backends which have PROC_IN_VACUUM or
    > > PROC_IN_LOGICAL_DECODING bits set.
    > >
    > > This patch adds PROC_IN_SAFE_IC to this set, so backends running
    > > CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY and where the index
    > > is "simple" - i.e. not expression indexes or conditional indexes are
    > > involved - these would be ignored too.
    >
    > Are you aware of commit d9d076222f5b? It was subsequently reverted by
    > commit e28bb885 because it led to subtle data corruption. Indexes had
    > wrong contents due to an unforeseen interaction with pruning.
    
    Interesting, so although it's not a problem for the vacuum itself,
    vacuum would move the xmin forward and that will allow other backends
    to hot prune the tuples from the table needed by the index building
    snapshot.
    
    -- 
    Regards,
    Dilip Kumar
    Google
    
    
    
    
  4. Re: Patch: VACUUM should ignore (CREATE |RE)INDEX CONCURRENTLY for xmin horizon calculations

    Matthias van de Meent <boekewurm+postgres@gmail.com> — 2025-11-25T16:09:17Z

    On Mon, 24 Nov 2025 at 23:09, Peter Geoghegan <pg@bowt.ie> wrote:
    >
    > On Mon, Nov 24, 2025 at 4:18 PM Hannu Krosing <hannuk@google.com> wrote:
    > > When VACUUM decides which rows are safe to freeze or permanently
    > > remove it currently ignores backends which have PROC_IN_VACUUM or
    > > PROC_IN_LOGICAL_DECODING bits set.
    > >
    > > This patch adds PROC_IN_SAFE_IC to this set, so backends running
    > > CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY and where the index
    > > is "simple" - i.e. not expression indexes or conditional indexes are
    > > involved - these would be ignored too.
    >
    > Are you aware of commit d9d076222f5b? It was subsequently reverted by
    > commit e28bb885 because it led to subtle data corruption. Indexes had
    > wrong contents due to an unforeseen interaction with pruning.
    
    Indeed, I don't think this is a correct change, given that these
    visibility horizons are calculated in every backend, and are also used
    for pruning.
    On-access pruning is one case where this is used and which would break
    - exactly the issue that caused d9d076222f5b to be reverted in
    e28bb885.
    
    
    Kind regards,
    
    Matthias van de Meent
    Databricks (https://www.databricks.com)
    
    
    
    
  5. Re: Patch: VACUUM should ignore (CREATE |RE)INDEX CONCURRENTLY for xmin horizon calculations

    Hannu Krosing <hannuk@google.com> — 2025-11-26T11:07:04Z

    So what are the options for a clean fix ?
    (the "Discussion:
    https://postgr.es/m/17485-396609c6925b982d%40postgresql.org" link
    gives 503 back so can't immediately check myself)
    
    Do we also need to make sure that CIC will not miss hot-pruned tuples
    ? What is the exact mechanism for missing them ?
    
    Or should we just try to have a separate frozen per-table Xmin to be
    used by CIC ?
    
    
    
    On Mon, Nov 24, 2025 at 11:09 PM Peter Geoghegan <pg@bowt.ie> wrote:
    >
    > On Mon, Nov 24, 2025 at 4:18 PM Hannu Krosing <hannuk@google.com> wrote:
    > > When VACUUM decides which rows are safe to freeze or permanently
    > > remove it currently ignores backends which have PROC_IN_VACUUM or
    > > PROC_IN_LOGICAL_DECODING bits set.
    > >
    > > This patch adds PROC_IN_SAFE_IC to this set, so backends running
    > > CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY and where the index
    > > is "simple" - i.e. not expression indexes or conditional indexes are
    > > involved - these would be ignored too.
    >
    > Are you aware of commit d9d076222f5b? It was subsequently reverted by
    > commit e28bb885 because it led to subtle data corruption. Indexes had
    > wrong contents due to an unforeseen interaction with pruning.
    >
    > --
    > Peter Geoghegan
    
    
    
    
  6. Re: Patch: VACUUM should ignore (CREATE |RE)INDEX CONCURRENTLY for xmin horizon calculations

    Matthias van de Meent <boekewurm+postgres@gmail.com> — 2025-11-26T12:18:39Z

    On Wed, 26 Nov 2025 at 12:07, Hannu Krosing <hannuk@google.com> wrote:
    >
    > So what are the options for a clean fix ?
    > (the "Discussion:
    > https://postgr.es/m/17485-396609c6925b982d%40postgresql.org" link
    > gives 503 back so can't immediately check myself)
    >
    > Do we also need to make sure that CIC will not miss hot-pruned tuples
    > ? What is the exact mechanism for missing them ?
    
    The issue is CIC has a normal snapshot which would then be ignored by vacuum:
    
    1. CIC, in second heap scan phase, gets a snapshot.
    2. Another backend HOT-updates a tuple visible to CIC's snapshot, and commits.
       ^ HOT, so no insertion into the index
    3. Another backend prunes the old tuple of step 2
       ^ now the tuple that CIC's snapshot should have seen is being
    removed before CIC completes.
        If CIC's scan didn't scan this tuple yet, then the second scan's
    guarantee of inserting all missing tuples is no longer guaranteed: the
    missing tuple may have been inserted during the first heap scan phase.
    
    > Or should we just try to have a separate frozen per-table Xmin to be
    > used by CIC ?
    
    In that case VACUUM and prune operations would have to build
    visibility horizons for each table, and I don't think that's viable;
    especially so in a heavily partitioned workload. It'd also be
    prohibitively expensive to add a per-table Xmin -- we don't always
    know which tables will be accessed by any backend until they lock that
    table, but at that time they may already have a snapshot that they
    need tuples from. If we lazily included that backend, there may be
    visibility horizons and prune operations that were built and executed
    ahead of them notifying the other backends of their use, and pruned
    away still-visible tuples like in the aforementioned CIC hot pruning
    issue.
    
    If you're interested in improving CIC and reducing its impact on
    visibility horizons, you may be interested in reviewing Mihail's work
    in [0].
    
    
    Kind regards,
    
    Matthias van de Meent
    Databricks (https://www.databricks.com)
    
    [0] https://postgr.es/m/flat/CADzfLwWrGYGE8%3Dcg%2B6C57Nypv1Y-1mBv8BVzzPWVJy5EfR6YfQ%40mail.gmail.com#4cb06452a314aa851a7dde936e817bb3
    
    
    
    
  7. Re: Patch: VACUUM should ignore (CREATE |RE)INDEX CONCURRENTLY for xmin horizon calculations

    Mikhail Nikalayeu <mihailnikalayeu@gmail.com> — 2025-11-27T02:30:00Z

    Hello!
    
    > If you're interested in improving CIC and reducing its impact on
    > visibility horizons, you may be interested in reviewing Mihail's work
    > in [0].
    
    Actually I reduced the scope of the patch to only
    single-heap-scan-STIR-based-CIC. But patch files are still available.
    I did it because it looks like we need first a solution which also
    will work for REPACK concurrently, see [1].
    Reset-snapshot technique may still be applied later though (to reduce
    bloat in the target table).
    
    > In that case VACUUM and prune operations would have to build
    > visibility horizons for each table, and I don't think that's viable;
    > especially so in a heavily partitioned workload. It'd also be
    > prohibitively expensive to add a per-table Xmin -- we don't always
    > know which tables will be accessed by any backend until they lock that
    > table, but at that time they may already have a snapshot that they
    > need tuples from. If we lazily included that backend, there may be
    > visibility horizons and prune operations that were built and executed
    > ahead of them notifying the other backends of their use, and pruned
    > away still-visible tuples like in the aforementioned CIC hot pruning
    > issue.
    
    I think I know the way to implement it with almost zero cost for a
    "normal" situation.
    It is based on Antonin's idea to change
    GetOldestNonRemovableTransactionId(rel) because we have Relation as an
    argument.
    We may put some "pinned" xmin into Relation and fill it only while
    filling RelCache by accessing some shared memory.
    
    Of course there is a lot of things we need to worry about:
    * make sure every backend will respect it (invalidate cache and wait
    for each backend to "know" about it)
    * make sure horizon moved only forward
    * handle all kind of error of backend who "pinned" horizon
    * etc
    
    In a few words, protocol of "pinning" is next:
    * get a "normal snapshot"
    * put its xmin into shared memory as newest horizon value possible for
    a particular relation
    * invalidate RelCache for that relation
    * wait for all transaction having any locks on that relation to commit
    (and, as result, drop old cached Relation)
    * we a free to release snapshot and set some PROC flag to ignore our
    xmin for data (not catalog) horizon
    * GetOldestNonRemovableTransactionId(rel) will return
    TransactionIdOlder(actual_horizon, pinned) taking "pinned" from `rel`.
    
    If someone is interested - some dirty version is available at [1], but
    it is better to wait for a more-less polished patchset.
    
    [0]: https://www.postgresql.org/message-id/flat/202510301734.pj4uds3mqxx4%40alvherre.pgsql#fd20662912580a89b860790f9729aaff
    [1]: https://github.com/michail-nikolaev/postgres/commits/make-PROC_IN_SAFE_IC-great-again/