Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Make _bt_killitems drop pins it acquired itself.

  2. Avoid BufferGetLSNAtomic() calls during nbtree scans.

  3. Reduce pinning and buffer content locking for btree scans.

  1. Removing BTScanPosUnpinIfPinned idiom from nbtree, simplifying mark/restore support

    Peter Geoghegan <pg@bowt.ie> — 2025-06-11T17:29:56Z

    As already discussed on another nearby thread [1], I think that we
    should get rid of nbtree's BTScanPosUnpinIfPinned idiom (which
    originated in 2015 commit 2ed5b87f), since it is rather brittle.
    Recent experience from today's bugfix commit 7c319f54 made that pretty
    clear.
    
    Attached patch gets rid of nbtree's use of BTScanPosUnpinIfPinned.
    Rather than deciding whether or not to unpin a saved position's buffer
    on the basis of it having a buffer set at all (at any point where we
    invalidate the buffer), the patch has us condition everything on the
    scan-level so->dropPin field. This so->dropPin field was added by my
    recent commit e6eed40e; you can think of this patch as finishing off
    the work started in that commit.
    
    I'll add this patch to the Postgres 19 cycle's first commitfest.
    
    More on the underlying motivation:
    
    Removing BTScanPosUnpinIfPinned allows me to significantly simplify
    the management of buffer pins with mark/restore. The patch also gets
    rid of all of the calls to IncrBufferRefCount made from
    nbtree, since it's no longer necessary to support a so->markPos
    representation of a mark that needs its own pin, independent of the
    pin held by/for so->currPos (if so->markPos needs its own pin it'll be
    because it has its own page).
    
    In other words, with the patch, there is exactly one way of
    representing that there's a mark held on the current page. We still
    need multiple pins (one on currPos, another on markPos) during
    !so->dropPin scans, but now that only happens when they are on
    different pages. In other other words, the only valid representation
    of a mark held on the page currently in so->currPos is the
    "so->markItemIndex" field (we're now consistent about that).
    
    I also believe that this simplified approach to holding buffer pins in
    mark/restore will facilitate better nbtree designs. Having "exactly
    one way of representing that there's a mark held on the current page"
    helps with designs that do more heap prefetching by reading multiple
    leaf pages, before earlier pages have performed all of their required
    heap page accesses.
    
    I'm thinking of new designs that maintain multiple leaf page
    details/returned items within one big currPos-like position (the
    current so->currPos just maintains a single leaf page). The main thing
    that makes that difficult right now is the complex rules around buffer
    pins, particularly with mark/restore, which is generally a very hard
    code path to test.
    
    [1] https://postgr.es/m/CAH2-Wzm3a8i31aBXi6oQt9uG7m1-xpX-MXjMMYoDJH=sBj1jnA@mail.gmail.com
    -- 
    Peter Geoghegan
    
  2. Re: Removing BTScanPosUnpinIfPinned idiom from nbtree, simplifying mark/restore support

    Peter Geoghegan <pg@bowt.ie> — 2025-12-03T21:10:44Z

    On Wed, Jun 11, 2025 at 1:29 PM Peter Geoghegan <pg@bowt.ie> wrote:
    > Removing BTScanPosUnpinIfPinned allows me to significantly simplify
    > the management of buffer pins with mark/restore. The patch also gets
    > rid of all of the calls to IncrBufferRefCount made from
    > nbtree, since it's no longer necessary to support a so->markPos
    > representation of a mark that needs its own pin, independent of the
    > pin held by/for so->currPos (if so->markPos needs its own pin it'll be
    > because it has its own page).
    
    I'm going to withdraw this patch.
    
    The rationale for the patch still makes perfect sense. However, the
    patch has now been superseded by the ongoing work on adding a new
    amgetbatch interface. That other work (which is tied up in work on I/O
    prefetching for index scans) does everything that this patch does as a
    stepping stone to allowing nbtree index scans to return batches
    (so->currPos style positions that contain at least one matching item
    to return to the scan) that are controlled by the caller/the table AM
    that manages the index scan.
    
    Independently implementing these changes without any of the much more
    extensive changes required for amgetbatch was a useful exercise. But
    there's no reason to keep the patch in the next CF anymore.
    
    -- 
    Peter Geoghegan