Re: Changing the state of data checksums in a running cluster
From: Alexander Korotkov <aekorotkov@gmail.com>
To: Tomas Vondra <tomas@vondra.me>
Cc: Andres Freund <andres@anarazel.de>, Daniel Gustafsson <daniel@yesql.se>, Michael Paquier <michael@paquier.xyz>, Michael Banck <mbanck@gmx.net>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-04-04T07:55:32Z
Lists: pgsql-hackers
Commits
- Use correct datatype for PID: 0ca1b3010597, 19 (unreleased), landed
- Improve comments in online checksums code: cd857dec0e0a, 19 (unreleased), landed
- Fix checksum state transition during promotion: 5fee7cab1b87, 19 (unreleased), landed
- Fix regex searching for page verification failures in tests: 486b9a9b9eb4, 19 (unreleased), landed
- Apply data-checksum worker throttling parameters: 9a39056c418c, 19 (unreleased), landed
- Skip WAL for unlogged main fork during online checksum enable: 2018bd616790, 19 (unreleased), landed
- Revert "Get rid of WALBufMappingLock": c13070a27b63, 19 (unreleased), cited
- Get rid of WALBufMappingLock: bc22dc0e0ddc, 18.0, cited
- Improve grammar of options for command arrays in TAP tests: ce1b0f9da03e, 18.0, cited
Hi!
On Sat, Mar 15, 2025 at 7:33 PM Tomas Vondra <tomas@vondra.me> wrote:
> On 3/15/25 17:26, Andres Freund wrote:
> > Jo.
> >
> > On 2025-03-15 16:50:02 +0100, Tomas Vondra wrote:
> >> Thanks, here's an updated patch version
> >
> > FWIW, this fails in CI:
> >
> > https://cirrus-ci.com/build/4678473324691456
> > On all OSs:
> > [16:08:36.331] # Failed test 'options --locale-provider=icu --locale=und --lc-*=C: no stderr'
> > [16:08:36.331] # at /tmp/cirrus-ci-build/src/bin/initdb/t/001_initdb.pl line 132.
> > [16:08:36.331] # got: '2025-03-15 16:08:26.216 UTC [63153] LOG: XLogCtl->data_checksum_version 0 ControlFile->data_checksum_version 0
> > [16:08:36.331] # 2025-03-15 16:08:26.216 UTC [63153] LOG: XLogCtl->data_checksum_version 0 ControlFile->data_checksum_version 0 (UPDATED)
> >
> > Windows & Compiler warnings:
> > [16:05:08.723] ../src/backend/storage/page/bufpage.c(25): fatal error C1083: Cannot open include file: 'execinfo.h': No such file or directory
> >
> > [16:18:52.385] bufpage.c:25:10: fatal error: execinfo.h: No such file or directory
> > [16:18:52.385] 25 | #include <execinfo.h>
> > [16:18:52.385] | ^~~~~~~~~~~~
> >
> > Greetings,
>
> Yeah, that's just the "debug stuff" - I don't expect any of that to be
> included in the commit, I only posted it for convenience. It adds a lot
> of debug logging, which I hope might help others to understand what the
> problem with checksums on standby is.
>
I took a look at this patch. I have the following notes.
1) I think the reporting of these errors could be better and more detailed.
The second one in particular could be made similar to some of the other
errors in the checksum processing code.
    ereport(ERROR,
            (errmsg("failed to start background worker to process data checksums")));

    ereport(ERROR,
            (errmsg("unable to enable data checksums in cluster")));
2) ProcessAllDatabases() contains a loop that keeps rescanning for new
databases that need checksums, and it continues as long as each iteration
finds new databases. Could we just limit the number of iterations to 2?
Given that each iteration calls WaitForAllTransactionsToFinish(), every
database created after the first WaitForAllTransactionsToFinish() call
should have checksums enabled from the start.
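To illustrate the argument in point 2, here is a small standalone model. Everything in it (SimDatabase, create_database(), simulate_launcher(), the hard-coded OIDs) is hypothetical and not taken from the patch. WaitForAllTransactionsToFinish() is reduced to a flag, and the model simply encodes the assumption above: a database committed by an older transaction before the first wait lacks checksums, while anything created after that wait is born with them.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DBS 16

/* Hypothetical model of one database as seen by the launcher. */
typedef struct
{
    int  oid;
    bool has_checksums;
} SimDatabase;

static SimDatabase dbs[MAX_DBS];
static int  ndbs;
static bool waited_once;    /* true after the first "WaitForAllTransactionsToFinish()" */

/* A CREATE DATABASE commit. Assumption under discussion: databases created
 * after the first wait are written with checksums from the start. */
static void create_database(int oid)
{
    assert(ndbs < MAX_DBS);
    dbs[ndbs].oid = oid;
    dbs[ndbs].has_checksums = waited_once;
    ndbs++;
}

/* Runs the launcher loop for at most max_passes iterations and returns
 * whether every database ended up with checksums. *processed receives the
 * number of databases that needed a worker run. */
static bool simulate_launcher(int max_passes, int *processed)
{
    bool straggler_pending = true;  /* one old transaction is mid CREATE DATABASE */

    ndbs = 0;
    waited_once = false;
    *processed = 0;
    create_database(1);
    create_database(5);

    for (int pass = 0; pass < max_passes; pass++)
    {
        int n = ndbs;               /* database list built at the start of the pass */

        /* The old transaction commits after the list was built, so this
         * pass does not see its database. */
        if (straggler_pending)
        {
            create_database(100);
            straggler_pending = false;
        }

        for (int i = 0; i < n; i++)
        {
            if (!dbs[i].has_checksums)
            {
                dbs[i].has_checksums = true;    /* stands in for one worker run */
                (*processed)++;
            }
        }

        /* Stand-in for WaitForAllTransactionsToFinish(). */
        waited_once = true;

        /* New databases keep appearing, but they are born with checksums. */
        create_database(200 + pass);
    }

    for (int i = 0; i < ndbs; i++)
        if (!dbs[i].has_checksums)
            return false;
    return true;
}
```

In this model a single pass misses the database committed by the older transaction, two passes cover everything, and a third pass finds nothing left to process. That is the case for capping the loop at two iterations, provided the assumption about post-wait databases actually holds for the patch.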
------
Regards,
Alexander Korotkov
Supabase