Re: Changing the state of data checksums in a running cluster

SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>

From: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
To: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Cc: Daniel Gustafsson <daniel@yesql.se>, Tomas Vondra <tomas@vondra.me>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Heikki Linnakangas <hlinnaka@iki.fi>, Andres Freund <andres@anarazel.de>, Bernd Helmle <mailings@oopsware.de>, Michael Paquier <michael@paquier.xyz>, Michael Banck <mbanck@gmx.net>
Date: 2026-05-05T19:45:09Z
Lists: pgsql-hackers

Attachments

Hi,

On Tue, May 5, 2026 at 12:19 PM Ayush Tiwari <ayushtiwari.slg01@gmail.com>
wrote:

> Hi,
>
> On Wed, 6 May 2026 at 00:38, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> > On 5 May 2026, at 17:21, Ayush Tiwari <ayushtiwari.slg01@gmail.com>
>> wrote:
>>
>> > I've a small concern in 0001.  The new guard uses only
>> RelationNeedsWAL(reln),
>> > but ProcessSingleRelationByOid() iterates all forks.  For unlogged
>> relations,
>> > the init fork is special, there are several existing call sites that
>> preserve
>> > WAL for INIT_FORKNUM, for example using
>> >
>> >   RelationNeedsWAL(rel) || forknum == INIT_FORKNUM
>> >
>> > and catalog/storage.c notes that unlogged init forks need WAL and sync.
>> >
>> > So I think the condition in ProcessSingleRelationFork() should preserve
>> the
>> > init-fork case, e.g.
>> >
>> >   if (RelationNeedsWAL(reln) || forkNum == INIT_FORKNUM)
>> >       log_newpage_buffer(buf, false);
>>
>> Which failure scenario are you thinking about here?  When dealing with the
>> catalog relation I can see the need but here we are reading, and writing,
>> data
>> pages.  In which case would we need to issue an FPI for an unlogged
>> relation
>> init fork? I might be missing something obvious here.
>>
>
> The case I was thinking about is not the unlogged relation contents
> themselves, but the init fork used as the reset template.  Some unlogged
> indexes can have initialized pages in the init fork, and recovery later
> copies
> that fork to the main fork when resetting unlogged relations.
>
> So my concern was that, during online checksum enable, we might update the
> checksum state of an init-fork page on the primary but not WAL-log an FPI
> for
> it because RelationNeedsWAL(reln) is false.  Then a standby, or recovery
> after
> a crash, could still have the old version of that init fork.  If that fork
> is
> later copied to the main fork after checksums are enabled, it might lead to
> checksum verification failures?
>
> Maybe there is another guarantee that makes this impossible, but I did not
> see
> it from the patch/test.  That is why I wondered whether the condition
> should
> preserve the existing special treatment for INIT_FORKNUM.
>

It is a bug in the code, I added a test in the v2 patch to test this
scenario and the test
failed earlier.

Thanks,
Satya

>