Re: Changing the state of data checksums in a running cluster
Tomas Vondra <tomas@vondra.me>
From: Tomas Vondra <tomas@vondra.me>
To: Michael Paquier <michael@paquier.xyz>
Cc: Daniel Gustafsson <daniel@yesql.se>, Michael Banck <mbanck@gmx.net>,
PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-03-09T17:57:52Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Use correct datatype for PID
- 0ca1b3010597 19 (unreleased) landed
-
Improve comments in online checksums code
- cd857dec0e0a 19 (unreleased) landed
-
Fix checksum state transition during promotion
- 5fee7cab1b87 19 (unreleased) landed
-
Fix regex searching for page verification failures in tests
- 486b9a9b9eb4 19 (unreleased) landed
-
Apply data-checksum worker throttling parameters
- 9a39056c418c 19 (unreleased) landed
-
Skip WAL for unlogged main fork during online checksum enable
- 2018bd616790 19 (unreleased) landed
-
Revert "Get rid of WALBufMappingLock"
- c13070a27b63 19 (unreleased) cited
-
Get rid of WALBufMappingLock
- bc22dc0e0ddc 18.0 cited
-
Improve grammar of options for command arrays in TAP tests
- ce1b0f9da03e 18.0 cited
Attachments
- v20250309-0001-Online-enabling-and-disabling-of-data-chec.patch (text/x-patch) patch v20250309-0001
- v20250309-0002-simple-post-rebase-fixes.patch (text/x-patch) patch v20250309-0002
- v20250309-0003-sync-the-data_checksums-GUC-with-the-local.patch (text/x-patch) patch v20250309-0003
- v20250309-0004-make-progress-reporting-work.patch (text/x-patch) patch v20250309-0004
- v20250309-0005-update-docs.patch (text/x-patch) patch v20250309-0005
Here's a rebased version of the patch series, addressing the issues I've pointed out in the last round of reviews. I've kept the changes in separate patches for clarity, but it should be squashed into a single patch in the end. 1) v20250309-0001-Online-enabling-and-disabling-of-data-chec.patch ------------------------------------------------------------------ Original patch, rebased, resolving merge conflicts. 2) v20250309-0002-simple-post-rebase-fixes.patch ------------------------------------------------ A couple minor fixes, addressing test failures due to stuff committed since the previous patch version. Mostly mechanical, the main change is I don't think the pgstat_bestart() call is necessary. Or is it? 3) v20250309-0003-sync-the-data_checksums-GUC-with-the-local.patch ------------------------------------------------------------------ This is the main change, fixing failures in 002_actions.pl - the short version is that test does "-C data_checksums", but AFAICS that does not quite work because it does not call show_data_checksums() that early, and instead just grabs the variable backing the GUC. Which may be out of sync, so this patch fixes that by updating them both. That fixes the issue, but it's it a bit strange we now have three places tracking the state of data checksums? We have data_checksum_version in the control file, and then data_checksums and LocalDataChecksumVersion in the backends. Would it be possible to "unify" the latter two? That would also mean we don't have the duplicate constants like PG_DATA_CHECKSUM_VERSION and DATA_CHECKSUM_VERSION. Or why do we need that? 4) v20250309-0004-make-progress-reporting-work.patch ---------------------------------------------------- The progress reporting got "mostly disabled" in an earlier version, due to issues with the bgworkers. AFAICS the issue is the "status" row can be updated only by a single process, which does not quite work with the launcher + per-db workers architecture. I've considered a couple different approaches: a) updating the status only from the launcher This is mostly what CREATE INDEX does with parallel builds, and there it's mostly sufficient. But for checksums it'd mean we only have the number of databases to process/done, and that seems unsatisfactory, considering large clusters often have only a single large database. So not good enough, IMHO. b) allowing workers to update the status row, created by the launcher I guess this would be better, we'd know the relations/blocks counts. And I haven't tried coding this, but there would need to be some locking so that the workers don't overwrite increments from other workers, etc. But I don't think it'd work nicely with parallel per-db workers (which we don't do now, but we might). c) having one status entry per worker I ended up doing this, it didn't require any changes to the progress infrastructure, and it will work naturally even with multiple workers. There will always be one row for launcher status (which only has the number of databases total/done), and then one row per worker, with database-level info (datid, datname, #relations, #blocks). I removed the "DONE" phase, because that's right before the launcher exists, and I don't think we have that for similar cases. And I added "waiting on checkpoint" state, because that's often a long wait when the launcher seems to do nothing, so it seems useful to communicate the reason for that wait. 5) v20250309-0005-update-docs.patch ----------------------------------- Minor tweaks to docs, to reflect the changes to the progress reporting changes, and also some corrections (no resume after restart, ...). So far this passed all my tests - both chekc-world and stress testing (no failures / assert crashes, ...). One thing that puzzles me is I wasn't able to reproduce the failures reported in [1] - not even with just the rebase + minimal fixes (0001 + 0002). My best theory is this is somehow machine-specific, and my laptop is too slow or something. I'll try with the machine I used before once it gets available. regards [1] https://www.postgresql.org/message-id/e4dbcb2c-e04a-4ba2-bff0-8d979f55960e%40vondra.me -- Tomas Vondra