Thread

  1. Re: Changing the state of data checksums in a running cluster

    SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> — 2026-05-05T19:45:09Z

    Hi,
    
    On Tue, May 5, 2026 at 12:19 PM Ayush Tiwari <ayushtiwari.slg01@gmail.com>
    wrote:
    
    > Hi,
    >
    > On Wed, 6 May 2026 at 00:38, Daniel Gustafsson <daniel@yesql.se> wrote:
    >
    >> > On 5 May 2026, at 17:21, Ayush Tiwari <ayushtiwari.slg01@gmail.com>
    >> wrote:
    >>
    >> > I've a small concern in 0001.  The new guard uses only
    >> RelationNeedsWAL(reln),
    >> > but ProcessSingleRelationByOid() iterates all forks.  For unlogged
    >> relations,
    >> > the init fork is special, there are several existing call sites that
    >> preserve
    >> > WAL for INIT_FORKNUM, for example using
    >> >
    >> >   RelationNeedsWAL(rel) || forknum == INIT_FORKNUM
    >> >
    >> > and catalog/storage.c notes that unlogged init forks need WAL and sync.
    >> >
    >> > So I think the condition in ProcessSingleRelationFork() should preserve
    >> the
    >> > init-fork case, e.g.
    >> >
    >> >   if (RelationNeedsWAL(reln) || forkNum == INIT_FORKNUM)
    >> >       log_newpage_buffer(buf, false);
    >>
    >> Which failure scenario are you thinking about here?  When dealing with the
    >> catalog relation I can see the need but here we are reading, and writing,
    >> data
    >> pages.  In which case would we need to issue an FPI for an unlogged
    >> relation
    >> init fork? I might be missing something obvious here.
    >>
    >
    > The case I was thinking about is not the unlogged relation contents
    > themselves, but the init fork used as the reset template.  Some unlogged
    > indexes can have initialized pages in the init fork, and recovery later
    > copies
    > that fork to the main fork when resetting unlogged relations.
    >
    > So my concern was that, during online checksum enable, we might update the
    > checksum state of an init-fork page on the primary but not WAL-log an FPI
    > for
    > it because RelationNeedsWAL(reln) is false.  Then a standby, or recovery
    > after
    > a crash, could still have the old version of that init fork.  If that fork
    > is
    > later copied to the main fork after checksums are enabled, it might lead to
    > checksum verification failures?
    >
    > Maybe there is another guarantee that makes this impossible, but I did not
    > see
    > it from the patch/test.  That is why I wondered whether the condition
    > should
    > preserve the existing special treatment for INIT_FORKNUM.
    >
    
    It is a bug in the code, I added a test in the v2 patch to test this
    scenario and the test
    failed earlier.
    
    Thanks,
    Satya
    
    >