Re: Changing the state of data checksums in a running cluster

Daniel Gustafsson <daniel@yesql.se>

From: Daniel Gustafsson <daniel@yesql.se>
To: Tomas Vondra <tomas@vondra.me>
Cc: Michael Paquier <michael@paquier.xyz>, Michael Banck <mbanck@gmx.net>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-03-14T14:06:52Z
Lists: pgsql-hackers

Commits

  1. Use correct datatype for PID

  2. Improve comments in online checksums code

  3. Fix checksum state transition during promotion

  4. Fix regex searching for page verification failures in tests

  5. Apply data-checksum worker throttling parameters

  6. Skip WAL for unlogged main fork during online checksum enable

  7. Revert "Get rid of WALBufMappingLock"

  8. Get rid of WALBufMappingLock

  9. Improve grammar of options for command arrays in TAP tests


> On 14 Mar 2025, at 14:38, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 14 Mar 2025, at 13:20, Tomas Vondra <tomas@vondra.me> wrote:

>> This is "ephemeral" in the sense that setting the value to "on" again
>> would be harmless, and indeed a non-assert build will run just fine.
> 
> As mentioned off-list, being able to loosen the restriction for the first
> barrier seen seems like a good way to keep this assertion.  Removing it is of
> course the alternative solution, as it's not causing any issues, but given how
> handy it's been for finding actual issues, it would be good to be able to keep it.
> 
>> i.e. to first register into procsignal, and then read the new value.
>> AFAICS this guarantees we won't lose any checksum version updates. It
>> does mean we can still get a barrier for a value we've already seen, but
>> I think we should simply ignore this for the very first update.
> 
> Calling functions with side effects when setting state before ProcSignalInit
> has run seems like a bad idea; that's a thinko on my part in this patch.  Your
> solution of reordering seems like the right way to handle this.

0006 in the attached version is what I have used when testing the above, along
with an update to the copyright year which I had missed doing earlier.  It also
contains the fix in LocalProcessControlFile which I had in my local tree; I
think we need at least something like that.

--
Daniel Gustafsson