Re: Changing the state of data checksums in a running cluster

Daniel Gustafsson <daniel@yesql.se>

From: Daniel Gustafsson <daniel@yesql.se>
To: Tomas Vondra <tomas@vondra.me>
Cc: Michael Paquier <michael@paquier.xyz>, Michael Banck <mbanck@gmx.net>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-03-10T13:27:06Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Use correct datatype for PID

  2. Improve comments in online checksums code

  3. Fix checksum state transition during promotion

  4. Fix regex searching for page verification failures in tests

  5. Apply data-checksum worker throttling parameters

  6. Skip WAL for unlogged main fork during online checksum enable

  7. Revert "Get rid of WALBufMappingLock"

  8. Get rid of WALBufMappingLock

  9. Improve grammar of options for command arrays in TAP tests

> On 10 Mar 2025, at 12:17, Tomas Vondra <tomas@vondra.me> wrote:
> 
> On 3/10/25 10:46, Tomas Vondra wrote:
>> On 3/10/25 01:18, Tomas Vondra wrote:

Thank you so much for picking up and fixing the blockers, it's highly appreciated!

>> For me, this passes all CI tests, hopefully cfbot will be happy too.

Confirmed, it compiles clean, builds docs and passes all tests for me as well.

A few comments from reading over your changes:

+   launcher worker has this value set, the other worker processes
+   have this <literal>NULL</literal>.
There seems to be a word or two missing (same in a few places), should this be
"have this set to NULL"?


+   The command is currently waiting for a checkpoint to update the checksum
+   state at the end.
s/at the end/before finishing/?


+ * XXX aren't PG_DATA_ and DATA_ constants the same? why do we need both?
They aren't mapping 1:1 as PG_DATA_ has the version numbers, and if checksums
aren't enabled there is no version and thus there is no PG_DATA_CHECKSUMS_OFF.
This could of course be remedied.  IIRC one reason for adding the enum was to
get compiler warnings on missing cases when switch()ing over the value, but I
don't think the current code has any switch.


+   /* XXX isn't it weird there's no wait between the phase updates? */
It is, I think we should skip PROGRESS_DATACHECKSUMS_PHASE_WAITING_BACKENDS in
favor of PROGRESS_DATACHECKSUMS_PHASE_ENABLING.


+   * When enabling checksums, we have to wait for a checkpoint for the
+   * checksums to e.
Seems to be missing the punchline, "for the checksum state to be moved from
in-progress to on" perhaps?


It also needs a pgindent and pgperltidy but there were only small trivial
changes there.

Thanks again for updating the patch!

--
Daniel Gustafsson