Thread

  1. Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

    Matthias van de Meent <boekewurm+postgres@gmail.com> — 2025-12-22T22:23:05Z

    On Thu, 24 Apr 2025 at 22:46, Matthias van de Meent
    <boekewurm+postgres@gmail.com> wrote:
    >
    > On Fri, 21 Mar 2025 at 17:14, Matthias van de Meent
    > <boekewurm+postgres@gmail.com> wrote:
    > > Attached is v10, which polishes the previous patches, and adds a patch
    > > for nbtree to use the new visibility checking strategy so that it too
    > > can release its index pages much earlier, and adds a similar
    > > visibility check test to nbtree.
    >
    > And here's v12. v11 (skipped) would've been a rebase, but after
    > finishing the rebase I noticed a severe regression in btree's IOS with
    > the new code, so v12 here applies some optimizations which reduce the
    > overhead of the new code.
    
    Here's v13, which moves the changes around a bit:
    
    v12's 0001 is split into 3 patches (1, 3, and 4), whilst v12's 2-5
    were correspondingly renumbered 5-8. Patch 0002 is an otherwise
    unrelated change in pg_visibility that updates it to use the new
    vectorized API, reducing overhead. So, summary of the patches:
    
    0001: Replaces visibilitymap_get_status with a vectorized variant that
    touches each VM page at most once per call; reducing buffer churn and
    enabling later patches
    0002: update pg_visibility to use this newly vectorized API (instead
    of the current model that checks each page at a time)
    0003: Adds the table_index_vischeck_tuples API, requiring a table AM
    to expose VM checks through an API.
    0004: Adjusts Index-Only scan infrastructure to make use of 0004,
    instead of using VM_ALL_VISIBLE(). It also adds the relevant
    infrastructure for enabling index AMs to provide the pre-checked
    visibility status (from table_index_vischeck_tuples) to an index-only
    scan.
    0005: Implement VM-checks in GIST's IOS code
    0006: Same, but for SP-GIST
    0007: Same, but for NBTREE
    0008: Add tests which validate that we still get correct results from
    our queries, even when we use cursors to block results from getting
    returned, and cleaning up tuples from those index pages.
    
    A big benefit with this patch is that indexes now have no direct
    reason anymore to hold back a VACUUM scan -- the visibility of tuples
    can be checked at page scan time, and any shared resources can be
    released before returning tuples to higher nodes.
    
    ------------------------
    
    Summary of the problem that I'm solving here:
    
    An index that holds dead tuples could return those dead tuples in an
    Index-Only Scan (IOS) to the scan node, as the index AM itself doesn't
    have any information about the visibility of the tuples that it
    contains. The IOS infrastructure prevents these tuples from being
    exposed by doing visibility checks against the Visibility Map (VM) and
    -if necessary- the underlying heap.
    
    This, however, depends on an invariant: VACUUM MUST NOT remove a TID
    that's being returned by an index scan, at least not before before
    that tuple has been checked for visibility in the VM; otherwise VACUUM
    may get to clean up the dead tuple's page and mark it all-visible
    before the visibility check occurs, incorrectly returning an
    'all-visible' result for that dead tuple.
    
    Btree indexes interlock with vacuum using a buffer cleanup lock and a
    pin on pages it's yet to return results for; and that solution works
    quite fine [^1].
    This same solution sadly doesn't work for GiST and SP-GiST, as in a
    worst case scenario they may have to sort the whole index in memory
    before they can return the first index tuple, and holding pins on
    those pages would be extremely punishing and might even cause system
    crashes due to a lack of available un-pinned shared buffers.
    To solve this, we implement a mechanism to allow indexes to do rough
    visibility checks on TIDs; the results of which can then be passed
    through the IndexScanDesc to indicate what the VM state was when the
    tuple was still in the index. This enables them to make the VM check
    happen just after they've scanned a page, but before they release
    their pin on the page, adding the interlock with VACUUM without
    requiring unreasonably large amounts of page pins.
    
    This new mechanism is safe in MVCC snapshots, where we know that
    tuples which are all-visible can't be removed while the scan is
    ongoing, and where possibly-dead possibly-replaced TIDs are known to
    be visibility-checked using transaction IDs, and where any new TID
    would have to be inserted in a different transaction and therefore is
    definitely invisible to our current transaction.
    
    ------------------------
    
    Open items: review that this doesn't have any further issues, and
    commits once this has been considered good enough.
    
    Note:
      This patch changes TableAMRoutine and renames/changes exposed
    functions, and as a result can't be backpatched as-is. I have a
    separate thread over at [0] where I'm keeping track of a patchset that
    is derived from this one and is focused on backpatching. That patchset
    will contain patches 0004/0005/0006 and a reduced version of 0001+0003
    to make it work in older branches without breaking external ABI
    compatibility. I intend for the exposed table_index_vischeck_tuples()
    API to remain consistent across the two patchsets.
    
    
    Kind regards,
    
    Matthias van de Meent
    Databricks (https://www.databricks.com)
    
    [0]: https://www.postgresql.org/message-id/flat/CAEze2WgH13m=MDST58KLo-NkZpbwBEt4xNWcgtghWBwRj3J0+A@mail.gmail.com
    
    [^1]: Mostly fine, because it still holds VACUUM back when an
    index-only scan holds a page pin and VACUUM needs to process that
    page. If the index scan doesn't progress, then VACUUM can't progress
    either, and that can cause vacuum to get stuck. That issue is solved
    (for normal index scans) with patch 0007.