Re: Changing the state of data checksums in a running cluster

From: Tomas Vondra <tomas@vondra.me>

To: Daniel Gustafsson <daniel@yesql.se>

Cc: Michael Paquier <michael@paquier.xyz>, Michael Banck <mbanck@gmx.net>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>

Date: 2025-03-13T23:11:39Z

Lists: pgsql-hackers

Commits

Use correct datatype for PID
- 0ca1b3010597 19 (unreleased) landed
Improve comments in online checksums code
- cd857dec0e0a 19 (unreleased) landed
Fix checksum state transition during promotion
- 5fee7cab1b87 19 (unreleased) landed
Fix regex searching for page verification failures in tests
- 486b9a9b9eb4 19 (unreleased) landed
Apply data-checksum worker throttling parameters
- 9a39056c418c 19 (unreleased) landed
Skip WAL for unlogged main fork during online checksum enable
- 2018bd616790 19 (unreleased) landed
Revert "Get rid of WALBufMappingLock"
- c13070a27b63 19 (unreleased) cited
Get rid of WALBufMappingLock
- bc22dc0e0ddc 18.0 cited
Improve grammar of options for command arrays in TAP tests
- ce1b0f9da03e 18.0 cited

On 3/13/25 17:26, Tomas Vondra wrote:
> On 3/13/25 13:32, Daniel Gustafsson wrote:
>>> On 13 Mar 2025, at 12:03, Tomas Vondra <tomas@vondra.me> wrote:
>>>
>>> ...
>>>
>>> This also reminds me I had a question about the barrier - can't it
>>> happen that a process has to process multiple barriers at once? I
>>> mean, let's say it gets stuck for a while, and the cluster happens to go
>>> through disable+enable. Won't it then see both barriers? That'd be a
>>> problem, because the core processes the barriers in the order determined
>>> by the enum value, not in the order the barriers happened. Which means
>>> it might break the expected state transitions again (and end with the
>>> wrong local value). I haven't tried, though.
>>
>> Interesting, that seems like a general deficiency in the barriers, surely
>> processing them in-order would be more intuitive?  That would probably require
>> some form of Lamport clock though.
>>
> 
> Yeah, that seems non-trivial. What if we instead ensured there can't be
> two barriers set at the same time? Say, if we (somehow) ensured all
> processes saw the previous barrier before allowing a new one, we would
> not have this issue, right?
> 
> But I don't know what would be a good way to ensure this. Is there a way
> to check if all processes saw the barrier? Any ideas?
> 

Actually, scratch this. There already is a way to do this, by using
WaitForProcSignalBarrier. And the XLOG_CHECKSUMS processing already
calls this. So we should not see two barriers at the same time ...
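
To make the ordering hazard from the quoted question concrete, and to show why
waiting for each barrier to be absorbed before emitting the next one avoids it,
here is a rough toy model. It is only a sketch and not PostgreSQL code: the
barrier names, their bit positions, and the helper functions absorb() and run()
are all made up for illustration. The only property taken from the discussion
is that pending barriers are processed in enum order, not in emit order.

```python
# Toy model only: barrier names and bit positions are hypothetical.
from enum import IntEnum


class Barrier(IntEnum):
    CHECKSUM_ON = 0   # hypothetical bit position
    CHECKSUM_OFF = 1  # hypothetical bit position


def absorb(pending_bits, local_state):
    """Process pending barriers lowest bit first, regardless of emit order."""
    for b in sorted(Barrier):
        if pending_bits & (1 << b):
            local_state = "on" if b is Barrier.CHECKSUM_ON else "off"
    return local_state


def run(emitted, wait_between):
    """Emit barriers in order. The backend absorbs after every barrier
    (wait_between=True) or only once at the end, as if it had been stuck."""
    state, pending = "on", 0
    for b in emitted:
        pending |= 1 << b
        if wait_between:
            state, pending = absorb(pending, state), 0
    return absorb(pending, state)


history = [Barrier.CHECKSUM_OFF, Barrier.CHECKSUM_ON]  # disable, then enable
print(run(history, wait_between=False))  # off (wrong, the cluster ended "on")
print(run(history, wait_between=True))   # on
```

In the first call both barriers are pending at once, so the enum order wins and
the local value ends up wrong. In the second call at most one barrier is ever
pending, which is the situation the wait is meant to guarantee.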

>>>>> One issue I ran into is the postmaster does not seem to be processing
>>>>> the barriers, and thus not getting info about the data_checksum_version
>>>>> changes.
>>>>
>>>> Makes sense, that seems like a pretty reasonable constraint for the barrier.
>>>
>>> Not sure I follow. What's a reasonable constraint?
>>
>> That the postmaster doesn't process them.
>>
> 
> OK, that means we need a way to "refresh" the value for new child
> processes, similar to what my patch does. But I suspect there might be
> a race condition - if the child process starts while processing the
> XLOG_CHECKSUMS record, it might happen to get the new value and then also
> the barrier (if it does the "refresh" in between the XLogCtl update and
> the barrier). Doesn't this need some sort of interlock, preventing this?
> 
> The child startup would need to do this:
> 
> 1) acquire lock
> 2) reset barriers
> 3) refresh the LocalDataChecksumValue (from XLogCtl)
> 4) release lock
> 
> while the walreceiver would do this
> 
> 1) acquire lock
> 2) update XLogCtl value
> 3) emit barrier
> 4) release lock
> 
> Or is there a reason why this would be unnecessary?
> 

I still think this might be a problem. I wonder if we could maybe
leverage the barrier generation, to detect that we don't need to process
this barrier, because we already got the value directly ...
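
As a rough illustration of how the lock protocol quoted above and the
generation idea could fit together, here is a toy model. It is a sketch under
loud assumptions, not a proposal for the actual patch: the Shared and Child
classes, every field name, and the barrier queue are hypothetical stand-ins
(real barriers are per-process flags, not a queue). The writer updates the
value and emits the barrier under one lock, the child snapshots the value and
the generation under the same lock, and barriers at or below the snapshotted
generation are skipped.

```python
import threading


class Shared:
    """Stand-in for shared memory; every field name here is hypothetical."""

    def __init__(self):
        self.lock = threading.Lock()
        self.checksum_version = 0
        self.generation = 0  # bumped once per emitted barrier
        self.queue = []      # (generation, new_version) pairs

    def update_and_emit(self, new_version):
        # Writer side: update the value and emit the barrier under one lock.
        with self.lock:
            self.checksum_version = new_version
            self.generation += 1
            self.queue.append((self.generation, new_version))


class Child:
    def __init__(self, shared):
        # Startup side: refresh the value and remember the generation
        # atomically, so no update can slip in between the two reads.
        with shared.lock:
            self.local_version = shared.checksum_version
            self.seen_generation = shared.generation
        self.applied = []  # generations of barriers actually processed

    def absorb(self, shared):
        with shared.lock:
            barriers = list(shared.queue)
        for gen, version in barriers:
            if gen <= self.seen_generation:
                continue  # value was already picked up directly at startup
            self.local_version = version
            self.seen_generation = gen
            self.applied.append(gen)


shared = Shared()
shared.update_and_emit(1)   # happens before the child starts
child = Child(shared)       # child reads the value 1 directly
shared.update_and_emit(0)   # happens after the child started
child.absorb(shared)
print(child.local_version, child.applied)  # 0 [2]
```

The child never processes the first barrier, because it already holds the value
that barrier would have delivered, and it processes the second one exactly once.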

FWIW we'd have this problem even if postmaster was processing barriers,
because there'd always be a "gap" between the fork and ProcSignalInit()
registering the new process into the procsignal array.


regards

-- 
Tomas Vondra