Re: Changing the state of data checksums in a running cluster
From: Daniel Gustafsson <daniel@yesql.se>
To: Tomas Vondra <tomas@vondra.me>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>,
Ayush Tiwari <ayushtiwari.slg01@gmail.com>,
PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
Heikki Linnakangas <hlinnaka@iki.fi>,
Andres Freund <andres@anarazel.de>,
Bernd Helmle <mailings@oopsware.de>,
Michael Paquier <michael@paquier.xyz>,
Michael Banck <mbanck@gmx.net>
Date: 2026-05-28T11:28:49Z
Lists: pgsql-hackers
Attachments
- 0003-Use-correct-datatype-for-PID.patch (application/octet-stream)
- 0002-Improve-comments-in-online-checksums-code.patch (application/octet-stream)
- 0001-Fix-checksum-state-transition-during-promotion.patch (application/octet-stream)
> On 26 May 2026, at 20:12, Tomas Vondra <tomas@vondra.me> wrote:

> I suppose this means we should not be updating the checksum state
> without emitting the barrier? I think all other places do that.

Good catch, it's indeed a bug; any state change must emit a
procsignalbarrier to maintain cluster consistency.  I ended up writing a
test for this very case as well.

> I'm still not sure if it really is an issue or just an annoyance,
> because I've not been able to find a case where it'd lead to checksum
> failures (or obviously incorrect final state after recovery).

I've tried and failed to get it to reach an incorrect end state, but I do
agree that maybe we need an improved locking protocol around state
updates.  I need to spend some more time thinking about this.

> I still don't understand why this needs DELAY_CHKPT_START ...

Having stared at this for some time, and having gone over old threads, I
think this is a mistake.  AFAICT, though, it cannot cause any error, so I'd
lean towards erring on the safe side by leaving it as is and looking at
removing it in v20.  What do you think?

> I also noticed a couple minor comment issues, per attached patch (this
> may need pgindent).

I ended up splitting this into two, one for the comment fixes and one for
the data type change.

I propose applying the three patches below to v19 to fix the promotion
issue before we wrap beta1.

--
Daniel Gustafsson