Re: Changing the state of data checksums in a running cluster

From: Tomas Vondra <tomas@vondra.me>

To: Daniel Gustafsson <daniel@yesql.se>

Cc: Bernd Helmle <mailings@oopsware.de>, Michael Paquier <michael@paquier.xyz>, Michael Banck <mbanck@gmx.net>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>

Date: 2025-11-18T19:06:40Z

Lists: pgsql-hackers

Commits

Use correct datatype for PID
- 0ca1b3010597 19 (unreleased) landed
Improve comments in online checksums code
- cd857dec0e0a 19 (unreleased) landed
Fix checksum state transition during promotion
- 5fee7cab1b87 19 (unreleased) landed
Fix regex searching for page verification failures in tests
- 486b9a9b9eb4 19 (unreleased) landed
Apply data-checksum worker throttling parameters
- 9a39056c418c 19 (unreleased) landed
Skip WAL for unlogged main fork during online checksum enable
- 2018bd616790 19 (unreleased) landed
Revert "Get rid of WALBufMappingLock"
- c13070a27b63 19 (unreleased) cited
Get rid of WALBufMappingLock
- bc22dc0e0ddc 18.0 cited
Improve grammar of options for command arrays in TAP tests
- ce1b0f9da03e 18.0 cited

On 11/10/25 02:26, Tomas Vondra wrote:
> What could we do about the root cause? We discussed this with Daniel and
> we've been stuck for quite a while. But then it occurred to us maybe we
> can simply "pause" the checksum state change while there's backup in
> progress. We already enable/disable FPW based on this, so why couldn't
> we check XLogCtl->Insert.runningBackups, and only advance to the next
> checksum state if (runningBackups==0)?
> 
> That would mean a single backup does not need to worry about seeing a
> mix of blocks written with different checksum states, and it also means
> the final pg_control file has the correct checksum state, because it is
> not allowed to change during the basebackup.
> 
> Of course, this would mean checksum changes may take longer. A corner
> case is that database with a basebackup running 100% of the time won't
> be able to change checksums on-line. But to me that seems acceptable, if
> communicated / documented clearly.

After thinking about this approach a bit, I realized the basebackup may
also run on the standby. Which means the checksum process won't see it
by checking XLogCtl->Insert.runningBackups. It will merrily proceed,
breaking the standby backup just as described earlier ...
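
To make the gap concrete, here is a minimal sketch of the proposed gate. It assumes each node only sees its own backup counter; NodeState and both function names are hypothetical stand-ins for illustration, not code from the patch or from PostgreSQL.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for per-node shared state. The single field models
 * XLogCtl->Insert.runningBackups as seen on that node only.
 */
typedef struct NodeState
{
    int runningBackups;
} NodeState;

/*
 * Proposed gate: the checksum process on the primary advances to the next
 * checksum state only while no backup is running on the primary.
 */
static bool
can_advance_checksum_state(const NodeState *primary)
{
    return primary->runningBackups == 0;
}

/*
 * True when the gate lets the state change through although a backup is in
 * progress on the standby, which is the case described above.
 */
static bool
gate_misses_backup(const NodeState *primary, const NodeState *standby)
{
    return can_advance_checksum_state(primary) && standby->runningBackups > 0;
}
```

With the primary's counter at 0 and the standby's at 1, gate_misses_backup() returns true: the primary has nothing to consult that reflects the standby's backup.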

Not sure what would be a good fix. One option is to "pause" the redo,
which is what the patch already does (by forcing an immediate checkpoint
whenever checksum state changes). We could pause redo until the backup
completes. But of course, that'd be terrible - especially for syncrep. I
hoped we'd find a better approach, and pausing redo for longer goes in
the opposite direction.

On the other hand, we already have a similar issue with full_page_writes.
A backup on a standby is not allowed to start if fpw=off, and if the
setting changes while the backup is running, the backup fails:

  pg_basebackup: error: backup failed: ERROR:  WAL generated with "full_page_writes=off" was replayed during online backup
  HINT:  This means that the backup being taken on the standby is corrupt and should not be used. Enable "full_page_writes" and run CHECKPOINT on the primary, and then try an online backup again.

Maybe this would be acceptable for checksums too ...
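
A rough sketch of what the same "fail the standby backup" rule could look like for checksums. It assumes the standby remembers the LSN of the last replayed checksum state change and compares it with the backup start LSN when the backup finishes. All names here (Lsn, StandbyState, lastChecksumChangeLsn, both functions) are hypothetical and only loosely modeled on the full_page_writes behaviour shown above; none of this is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for a WAL position */

/* Hypothetical standby-side state; 0 means no state change replayed yet. */
typedef struct StandbyState
{
    Lsn lastChecksumChangeLsn;
} StandbyState;

/* Called by redo when a checksum state change record is replayed. */
static void
replay_checksum_state_change(StandbyState *s, Lsn record_lsn)
{
    s->lastChecksumChangeLsn = record_lsn;
}

/*
 * Called when the standby backup finishes. The backup is usable only if no
 * checksum state change was replayed at or after its start position.
 */
static bool
standby_backup_is_valid(const StandbyState *s, Lsn backup_start_lsn)
{
    return s->lastChecksumChangeLsn < backup_start_lsn;
}
```

Under this rule a backup started at LSN 100 would be rejected once a state change at LSN 150 is replayed, while a later backup started at LSN 200 would pass. Redo is never paused; the cost is a failed backup that has to be retried.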

It's not exactly the same, of course. We don't really expect people to
change fpw in a running cluster.

regards

-- 
Tomas Vondra