Thread

  1. Re: Changing the state of data checksums in a running cluster

    SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> — 2026-05-05T20:05:51Z

    Hi,
    
    On Tue, May 5, 2026 at 12:45 PM SATYANARAYANA NARLAPURAM <
    satyanarlapuram@gmail.com> wrote:
    
    > Hi,
    >
    > On Tue, May 5, 2026 at 12:19 PM Ayush Tiwari <ayushtiwari.slg01@gmail.com>
    > wrote:
    >
    >> Hi,
    >>
    >> On Wed, 6 May 2026 at 00:38, Daniel Gustafsson <daniel@yesql.se> wrote:
    >>
    >>> > On 5 May 2026, at 17:21, Ayush Tiwari <ayushtiwari.slg01@gmail.com>
    >>> wrote:
    >>>
    >>> > I've a small concern in 0001.  The new guard uses only
    >>> RelationNeedsWAL(reln),
    >>> > but ProcessSingleRelationByOid() iterates all forks.  For unlogged
    >>> relations,
    >>> > the init fork is special, there are several existing call sites that
    >>> preserve
    >>> > WAL for INIT_FORKNUM, for example using
    >>> >
    >>> >   RelationNeedsWAL(rel) || forknum == INIT_FORKNUM
    >>> >
    >>> > and catalog/storage.c notes that unlogged init forks need WAL and sync.
    >>> >
    >>> > So I think the condition in ProcessSingleRelationFork() should
    >>> preserve the
    >>> > init-fork case, e.g.
    >>> >
    >>> >   if (RelationNeedsWAL(reln) || forkNum == INIT_FORKNUM)
    >>> >       log_newpage_buffer(buf, false);
    >>>
    >>> Which failure scenario are you thinking about here?  When dealing with
    >>> the
    >>> catalog relation I can see the need but here we are reading, and
    >>> writing, data
    >>> pages.  In which case would we need to issue an FPI for an unlogged
    >>> relation
    >>> init fork? I might be missing something obvious here.
    >>>
    >>
    >> The case I was thinking about is not the unlogged relation contents
    >> themselves, but the init fork used as the reset template.  Some unlogged
    >> indexes can have initialized pages in the init fork, and recovery later
    >> copies
    >> that fork to the main fork when resetting unlogged relations.
    >>
    >> So my concern was that, during online checksum enable, we might update the
    >> checksum state of an init-fork page on the primary but not WAL-log an FPI
    >> for
    >> it because RelationNeedsWAL(reln) is false.  Then a standby, or recovery
    >> after
    >> a crash, could still have the old version of that init fork.  If that
    >> fork is
    >> later copied to the main fork after checksums are enabled, it might lead
    >> to
    >> checksum verification failures?
    >>
    >> Maybe there is another guarantee that makes this impossible, but I did
    >> not see
    >> it from the patch/test.  That is why I wondered whether the condition
    >> should
    >> preserve the existing special treatment for INIT_FORKNUM.
    >>
    >
    > It is a bug in the code, I added a test in the v2 patch to test this
    > scenario and the test
    > failed earlier.
    >
    
    Please find the latest v3. Earlier I took dependency on amcheck and removed
    it in v3
    and instead added queries that forces necessary checks.
    
     Thanks,
    Satya