Re: Changing the state of data checksums in a running cluster

From: Tomas Vondra <tomas@vondra.me>
To: Daniel Gustafsson <daniel@yesql.se>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>, Ayush Tiwari <ayushtiwari.slg01@gmail.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Heikki Linnakangas <hlinnaka@iki.fi>, Andres Freund <andres@anarazel.de>, Bernd Helmle <mailings@oopsware.de>, Michael Paquier <michael@paquier.xyz>, Michael Banck <mbanck@gmx.net>
Date: 2026-05-29T20:27:20Z
Lists: pgsql-hackers

On 5/29/26 22:08, Daniel Gustafsson wrote:
>> On 28 May 2026, at 13:51, Tomas Vondra <tomas@vondra.me> wrote:
>>
>> On 5/28/26 13:28, Daniel Gustafsson wrote:
>>>> On 26 May 2026, at 20:12, Tomas Vondra <tomas@vondra.me> wrote:
>>>
>>>> I suppose this means we should not be updating the checksum state
>>>> without emitting the barrier? I think all other places do that.
>>>
>>> Good catch, it's indeed a bug, any state change must emit a procsignalbarrier
>>> to maintain cluster consistency.  I ended up writing a test for this very case
>>> as well.
>>
>> Good.
> 
> I've pushed this now, along with your other findings, ahead of the beta1
> deadline, buildfarm seems happy so far.
> 

Thanks!

>>>> I still don't understand why this needs DELAY_CHKPT_START ...
>>>
>>> Having stared at this for some time, and going over old threads, I think this
>>> is a mistake.  AFAICT though it cannot cause any error, so I'd lean towards
>>> erring on the safe side by leaving as is and looking at removing in 20.  What
>>> do you think?
>>>
>>
>> I'd probably try to fix this for 19, otherwise it may be confusing
>> people looking at the code in the future. We're still months from 19
>> getting released. Ofc, maybe I'm underestimating the risk.
> 
> You're probably right.  Once beta1 is out I'll work on getting this fixed.
> 

+1


regards

-- 
Tomas Vondra