Re: Buffer locking is special (hints, checksums, AIO writes)

Melanie Plageman <melanieplageman@gmail.com>

From: Melanie Plageman <melanieplageman@gmail.com>
To: Andres Freund <andres@anarazel.de>
Cc: Matthias van de Meent <boekewurm+postgres@gmail.com>, pgsql-hackers@postgresql.org, Thomas Munro <thomas.munro@gmail.com>, Heikki Linnakangas <hlinnaka@iki.fi>, Noah Misch <noah@leadboat.com>, Robert Haas <robertmhaas@gmail.com>, Michael Paquier <michael.paquier@gmail.com>
Date: 2025-11-21T17:52:38Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Require share-exclusive lock to set hint bits and to flush

  2. lwlock: Remove ForEachLWLockHeldByMe

  3. bufmgr: Implement buffer content locks independently of lwlocks

  4. bufmgr: Change BufferDesc.state to be a 64-bit atomic

  5. heapam: Add batch mode mvcc check and use it in page mode

  6. freespace: Don't modify page without any lock

  7. heapam: Move logic to handle HEAP_MOVED into a helper function

  8. bufmgr: Optimize & harmonize LockBufHdr(), LWLockWaitListLock()

  9. bufmgr: Add one-entry cache for private refcount

  10. bufmgr: Separate keys for private refcount infrastructure

  11. Add pg_atomic_unlocked_write_u64

  12. Rename BUFFERPIN wait event class to BUFFER

  13. bufmgr: Turn BUFFER_LOCK_* into an enum

  14. lwlock: Fix, currently harmless, bug in LWLockWakeup()

  15. bufmgr: Use atomic sub for unpinning buffers

  16. bufmgr: Allow some buffer state modifications while holding header lock

  17. bufmgr: Fix valgrind checking for buffers pinned in StrategyGetBuffer()

  18. bufmgr: Don't lock buffer header in StrategyGetBuffer()

  19. bufmgr: fewer calls to BufferDescriptorGetContentLock

  20. bufmgr: Fix signedness of mask variable in BufferSync()

  21. bufmgr: Introduce FlushUnlockedBuffer

  22. Improve ReadRecentBuffer() scalability

This email is just a review for some (specified below) of the patches 0001-0007

On Wed, Nov 19, 2025 at 9:47 PM Andres Freund <andres@anarazel.de> wrote:
>
> 0002: Not really required, but seems like an improvement to me

The commit message says the point is to get compiler warnings for
switch cases that should be exhaustive, but as soon as I looked for a
switch case like that, I see BufferIsLockedByMeInMode()

        switch (mode)
        {
            case BUFFER_LOCK_EXCLUSIVE:
                lw_mode = LW_EXCLUSIVE;
                break;
            case BUFFER_LOCK_SHARE:
                lw_mode = LW_SHARED;
                break;
            default:
                pg_unreachable();
        }

Which makes it impossible to get such a warning. When I add a lock
mode, it specifically doesn't warn when compiling.
However, I'm a big fan of using enums instead of macros when
appropriate, so I have no issue with this change. I just think the
commit message is a bit confusing.

> 0003: A prerequisite to 0004, pretty boring itself

LGTM.

> 0004: Use 64bit atomics for BufferDesc.state - at this point nothing uses the
> additional bits yet, though.  Some annoying reformatting required to avoid
> long lines.

I noticed that the BUF_STATE_GET_REFCOUNT and BUF_STATE_GET_USAGECOUNT
macros cast the return value to a uint32. We won't use the extra bits
but we did bother to keep the macro result sized to the field width
before so keeping it uint32 is probably more confusing now that state
is 64 bit.

Not related to this patch, but I noticed GetBufferDescriptor() calls
for a uint32 and all the callers pretty much pass a signed int —
wonder why it calls for uint32.

> 0005: There already was a wait event class for BUFFERPIN. It seems better to
> make that more general than to implement them separately.

I reviewed and see no issues with the code, but I don't have an
opinion on this wait event naming so maybe you better _wait_ for some
other review ;)

> 0006+0007: This is preparatory work for 0008, but also worthwhile on its
> own. The private refcount stuff does show up in profiles. The reason it's
> related is that without these changes the added information in 0008 makes that
> worse.

I found it slightly confusing that this commit appears to
unnecessarily add the PrivateRefCountData struct (given that it
doesn't need it to do the new parallel array thing). You could wait
until you need it in 0008, but 0008 is big as it is, so it probably is
fine where it is.

in InitBufferManagerAccess(), why do you have

memset(&PrivateRefCountArrayKeys, 0, sizeof(Buffer));
seems like it should be
memset(PrivateRefCountArrayKeys, 0, sizeof(PrivateRefCountArrayKeys));

I wonder how easy it will be to keep the Buffer in sync between
PrivateRefCountArrayKeys and the PrivateRefCountEntry — would a helper
function help?

ForgetPrivateRefCountEntry doesn’t clear the data member — but maybe
it doesn’t matter...

in ReservePrivateRefCountEntry() there is a superfluous clear

memset(&victim_entry->data, 0, sizeof(victim_entry->data));
victim_entry->data.refcount = 0;

0007
needs a commit message. overall seems fine though.
You should probably capitalize the "c" of "count" in
PrivateRefcountEntryLast to be consistent with the other names.

- Melanie