Re: Changing the state of data checksums in a running cluster

Daniel Gustafsson <daniel@yesql.se>

From: Daniel Gustafsson <daniel@yesql.se>
To: Tomas Vondra <tomas@vondra.me>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>, Ayush Tiwari <ayushtiwari.slg01@gmail.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Heikki Linnakangas <hlinnaka@iki.fi>, Andres Freund <andres@anarazel.de>, Bernd Helmle <mailings@oopsware.de>, Michael Paquier <michael@paquier.xyz>, Michael Banck <mbanck@gmx.net>
Date: 2026-05-28T11:28:49Z
Lists: pgsql-hackers


> On 26 May 2026, at 20:12, Tomas Vondra <tomas@vondra.me> wrote:

> I suppose this means we should not be updating the checksum state
> without emitting the barrier? I think all other places do that.

Good catch, it's indeed a bug: any state change must emit a procsignalbarrier
to maintain cluster consistency.  I ended up writing a test for this very case
as well.

> I'm still not sure if it really is an issue or just an annoyance,
> because I've not been able to find a case where it'd lead to checksum
> failures (or obviously incorrect final state after recovery).

I've tried to get it to reach an incorrect end state and failed, but I do agree
that we may need an improved locking protocol around state updates.  I need to
spend some more time thinking about this.

> I still don't understand why this needs DELAY_CHKPT_START ...

Having stared at this for some time and gone over old threads, I think this
is a mistake.  AFAICT it cannot cause any errors though, so I'd lean towards
erring on the safe side by leaving it as is and looking at removing it in v20.
What do you think?

> I also noticed a couple minor comment issues, per attached patch (this
> may need pgindent).

I ended up splitting this into two patches: one for the comment fixes and one
for the data type change.

I propose applying the three patches below to v19 to fix the promotion issue
before we wrap beta1.

--
Daniel Gustafsson