Re: Buffer locking is special (hints, checksums, AIO writes)

Andres Freund <andres@anarazel.de>

From: Andres Freund <andres@anarazel.de>

To: Heikki Linnakangas <hlinnaka@iki.fi>

Cc: Melanie Plageman <melanieplageman@gmail.com>, Matthias van de Meent <boekewurm+postgres@gmail.com>, pgsql-hackers@postgresql.org, Thomas Munro <thomas.munro@gmail.com>, Noah Misch <noah@leadboat.com>, Robert Haas <robertmhaas@gmail.com>, Michael Paquier <michael.paquier@gmail.com>

Date: 2025-12-18T22:06:43Z

Lists: pgsql-hackers

Hi,

On 2025-12-18 19:20:38 +0200, Heikki Linnakangas wrote:
> On 18/12/2025 19:03, Andres Freund wrote:
> > On 2025-12-17 09:54:32 -0500, Andres Freund wrote:
> > > On 2025-12-17 11:25:50 +0200, Heikki Linnakangas wrote:
> > > > - LWLockWaitListLock() uses pg_atomic_read_u32() after spinning,
> > > > LockBufHdr() retries directly with pg_atomic_fetch_or_u32().
> > >
> > > I think here LWLockWaitListLock() is likely right - but it seems like a change
> > > to LockBufHdr() that I would probably make in a separate commit?
> >
> > FWIW, I couldn't come up with a scenario where it makes a performance
> > difference - exclusive content locks just aren't *that* frequent. And because
> > of that the wait list lock doesn't have similar contention as some non-content
> > lwlocks (like XidGenLock). The most extreme workload I could think of was
> > pgbench hammering a single sequence across many sessions. While the exclusive
> > locks show up in wait events, the buffer header spinlock itself doesn't..
> >
> > So I'm inclined to not change anything about this for now.
>
> Ok. My thinking was just that LockBufHdr() and LWLockWaitListLock() should
> be consistent with each other. Otherwise anyone reading the code will ask
> the question "why are they different?". They're the only two things using
> the spin delay mechanism in our codebase, in addition to actual spinlocks.

I guess for me it didn't really seem like this patch's job to fix
that. Regardless of that, here's a version that tries to make them more
similar.

I did check, adding a likely() to LWLockWaitListLock()'s break does improve
code generation (verified by looking at the generated code) and seems to
improve performance in some very extreme workloads (e.g. [1]) a bit.

I'll try to come up with a combined patch that applies the optimizations in
LWLockWaitListLock() and LockBufHdr() to each other.

> BTW, I wonder if it would be worthwhile to have an inlineable fast-path of
> LockBufHdr() for the common case that the lock is free? I see that
> UnlockBufHdr() is already a static inline function.

I tried that a while ago and couldn't see any improvement, I think because all
the performance relevant callers are in bufmgr.c and thus can already inline
[parts of] the implementation.  I guess you could make the generated code a
bit smaller if you use pg_noinline on the slowpath, but that seems like a
separate project / effort to me.

Greetings,

Andres Freund

[1] Many connections doing
DO $do$
    BEGIN
        FOR i IN 1 .. 1000 LOOP
            PERFORM txid_current();
	    COMMIT;
	END LOOP;
     END;
$do$;