Re: Changing the state of data checksums in a running cluster

Daniel Gustafsson <daniel@yesql.se>

From: Daniel Gustafsson <daniel@yesql.se>
To: Tomas Vondra <tomas@vondra.me>
Cc: Michael Paquier <michael@paquier.xyz>, Michael Banck <mbanck@gmx.net>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-03-13T12:32:20Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Use correct datatype for PID

  2. Improve comments in online checksums code

  3. Fix checksum state transition during promotion

  4. Fix regex searching for page verification failures in tests

  5. Apply data-checksum worker throttling parameters

  6. Skip WAL for unlogged main fork during online checksum enable

  7. Revert "Get rid of WALBufMappingLock"

  8. Get rid of WALBufMappingLock

  9. Improve grammar of options for command arrays in TAP tests

> On 13 Mar 2025, at 12:03, Tomas Vondra <tomas@vondra.me> wrote:
> On 3/13/25 10:54, Daniel Gustafsson wrote:
>>> On 12 Mar 2025, at 14:16, Tomas Vondra <tomas@vondra.me> wrote:

>>> I believe the approach is correct, but the number of possible states
>>> (e.g. after a crash/restart) seems a bit complex. I wonder if there's a
>>> better way to handle this, but I can't think of any. Ideas?
>> 
>> Not sure if this moves the needle too far in terms of complexity wrt to the
>> previous version of the patch, there were already multiple copies.
> 
> It does add one more place (XLogCtl->data_checksum_version) to store the
> current state, so it's not that much more complex, ofc. But I was not
> really comparing this to the previous patch version, I meant the state
> space in general - all possible combinations of all the flags (control
> file, local + xlogct).

Fair point.

> I wonder if it might be possible to have a more thorough validation of
> the transitions. We already have that for the LocalDataChecksumVersion,
> thanks to the asserts - and it was damn useful, otherwise we would not
> have noticed this issue for a long time, I think.
> 
> I wonder if we can have similar checks for the other flags. I'm pretty
> sure we can have the same checks for XLogCtl, right?

I don't see why not, they should abide by the same rules.

> I'm not quite sure
> about ControlFile - can't that "skip" some of the changes, e.g. if we do
> (enable->disable->enable) within a single checkpoint? Need to check.

For enable->disable->enable within a single checkpoint then ControlFile should
never see the disable state.

> This also reminds me I had a question about the barrier - can't it
> happen a process gets to process multiple barriers at the same time? I
> mean, let's say it gets stuck for a while, and the cluster happens to go
> through disable+enable. Won't it then see both barriers? That'd be a
> problem, because the core processes the barriers in the order determined
> by the enum value, not in the order the barriers happened. Which means
> it might break the expected state transitions again (and end with the
> wrong local value). I haven't tried, though.

Interesting, that seems like a general deficiency in the barriers, surely
processing them in-order would be more intuitive?  That would probably require
some form of Lamport clock though.

>>> One issue I ran into is the postmaster does not seem to be processing
>>> the barriers, and thus not getting info about the data_checksum_version
>>> changes.
>> 
>> Makes sense, that seems like a pretty reasonable constraint for the barrier.
> 
> Not sure I follow. What's a reasonable constraint?

That the postmaster deosn't process them.

>>> That's fine until it needs to launch a child process (e.g. a
>>> walreceiver), which will then see the LocalDataChecksumVersion as of the
>>> start of the instance, not the "current" one. I fixed this by explicitly
>>> refreshing the value in postmaster_child_launch(), but maybe I'm missing
>>> something. (Also, EXEC_BACKEND may need to handle this too.)
>> 
>> The pg_checksums test is failing for me on this version due to the GUC not
>> being initialized, don't we need something like the below as well?  (With a
>> comment explaining why ReadControlFile wasn't enough.)
>> 
>> @@ -5319,6 +5319,7 @@ LocalProcessControlFile(bool reset)
>>        Assert(reset || ControlFile == NULL);
>>        ControlFile = palloc(sizeof(ControlFileData));
>>        ReadControlFile();
>> +       SetLocalDataChecksumVersion(ControlFile->data_checksum_version);
>> 
> 
> Yeah, I think this (or something like it) is missing.

Thanks for confirming.

>> +uint32
>> +GetLocalDataChecksumVersion(void)
>> +{
>> + return LocalDataChecksumVersion;
>> +}
>> +
>> +/*
>> + * Get the *current* data_checksum_version (might not be written to control
>> + * file yet).
>> + */
>> +uint32
>> +GetCurrentDataChecksumVersion(void)
>> +{
>> + return XLogCtl->data_checksum_version;
>> +}
>> I wonder if CachedDataChecksumVersion would be more appropriate to distinguish
>> it from the Current value, and also to make appear less like actual copies of
>> controlfile values like LocalMinRecoveryPoint.  Another thought is if we should
>> have the GetLocalDataChecksumVersion() API?  GetCurrentDataChecksumVersion()
>> should be a better API no?
>> 
> 
> FWIW those functions are for debug logging only, I needed to print the
> values in a couple places outside xlog.c. I don't intend to make that
> part of the patch.

Ah, gotcha, I never applied the debug patch from the patchset so I figured this
was a planned API.  The main question still stands though, if LocalDataCheckXX
can be confusing and CachedDataCheckXX would be better in order to
distinguish it from actual controlfile copies?

--
Daniel Gustafsson