Thread

  1. Re: Changing the state of data checksums in a running cluster

    Daniel Gustafsson <daniel@yesql.se> — 2026-05-28T11:28:49Z

    > On 26 May 2026, at 20:12, Tomas Vondra <tomas@vondra.me> wrote:
    
    > I suppose this means we should not be updating the checksum state
    > without emitting the barrier? I think all other places do that.
    
    Good catch, it's indeed a bug: any state change must emit a procsignalbarrier
    to maintain cluster consistency.  I ended up writing a test for this very case
    as well.
    
    > I'm still not sure if it really is an issue or just an annoyance,
    > because I've not been able to find a case where it'd lead to checksum
    > failures (or obviously incorrect final state after recovery).
    
    I've tried and failed to get it to reach an incorrect end state, but I do agree
    that we may need an improved locking protocol around state updates.  I need to
    spend some more time thinking about this.
    
    > I still don't understand why this needs DELAY_CHKPT_START ...
    
    Having stared at this for some time and gone over old threads, I think this
    is a mistake.  AFAICT it cannot cause any error though, so I'd lean towards
    erring on the safe side by leaving it as is and looking at removing it in v20.
    What do you think?
    
    > I also noticed a couple minor comment issues, per attached patch (this
    > may need pgindent).
    
    I ended up splitting this into two patches, one for the comment fixes and one
    for the data type change.
    
    I propose applying the three patches below to v19 to fix the promotion issue
    before we wrap beta1.
    
    --
    Daniel Gustafsson