Re: VM corruption on standby

Andrey Borodin <x4mmm@yandex-team.ru>

From: Andrey Borodin <x4mmm@yandex-team.ru>
To: Alexander Korotkov <aekorotkov@gmail.com>
Cc: Kirill Reshke <reshkekirill@gmail.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Melanie Plageman <melanieplageman@gmail.com>
Date: 2025-09-10T11:59:56Z
Lists: pgsql-hackers

> On 10 Sep 2025, at 15:25, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> 
> I think the approach #2 is more appropriate for bc22dc0e0d, because in
> the critical section we only wait for other processes also in the
> critical section (so, there is no risk they will exit immediately
> after postmaster death making us stuck).  I've implemented a patch,
> where waiting on conditional variable is replaced with LWLock-style
> waiting on semaphore.  However, custom waiting code just for
> AdvanceXLInsertBuffer() doesn't look good.

Well, at least I'd like to see a corruption-free solution for the injection point wait too.

>  I believe we need some
> general solution.  We might have a special kind of condition variable,
> a critical section condition variable, where both waiting and
> signaling must be invoked only in a critical section.  However, I dig
> into our Latch and WaitEventSet, it seems there are too many
> assumptions about postmaster death.  So, a critical section condition
> variable probably should be implemented on top of semaphore.  Any
> thoughts?

We want Latch/WaitEventSet, but for critical sections. Is it easier to implement that from scratch (on top of semaphores), or to fix and maintain the existing Latch/WaitEventSet?
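
To make the "from scratch" option concrete, here is a rough sketch of what an LWLock-style wait queue on top of per-waiter semaphores might look like. This is not taken from any patch in this thread. Every name in it (CritCondVar, CritWaiter, CritCV*) is hypothetical, a pthread mutex stands in for a spinlock, and POSIX unnamed semaphores stand in for the per-backend semaphore. It also leaves out cancelling a prepared sleep and all error handling.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* All names below are hypothetical; none of this is existing PostgreSQL API. */
typedef struct CritWaiter
{
    sem_t       sem;            /* per-waiter semaphore (assumed stand-in) */
    struct CritWaiter *next;    /* link in the wait queue */
} CritWaiter;

typedef struct CritCondVar
{
    pthread_mutex_t mutex;      /* stands in for a spinlock */
    CritWaiter *head;
    CritWaiter *tail;
} CritCondVar;

static void
CritCVInit(CritCondVar *cv)
{
    pthread_mutex_init(&cv->mutex, NULL);
    cv->head = cv->tail = NULL;
}

static void
CritWaiterInit(CritWaiter *w)
{
    sem_init(&w->sem, 0, 0);    /* count starts at 0, so a sleep blocks */
    w->next = NULL;
}

/* Enqueue first, then re-check the condition, so a wakeup cannot be lost. */
static void
CritCVPrepareToSleep(CritCondVar *cv, CritWaiter *w)
{
    pthread_mutex_lock(&cv->mutex);
    w->next = NULL;
    if (cv->tail)
        cv->tail->next = w;
    else
        cv->head = w;
    cv->tail = w;
    pthread_mutex_unlock(&cv->mutex);
}

/* Block on our own semaphore; no latch and no postmaster-death check. */
static void
CritCVSleep(CritWaiter *w)
{
    while (sem_wait(&w->sem) != 0 && errno == EINTR)
        ;                       /* retry only if interrupted by a signal */
}

/* Detach the whole queue, then wake everyone. Returns the number woken. */
static int
CritCVBroadcast(CritCondVar *cv)
{
    CritWaiter *w;
    int         nwoken = 0;

    pthread_mutex_lock(&cv->mutex);
    w = cv->head;
    cv->head = cv->tail = NULL;
    pthread_mutex_unlock(&cv->mutex);

    while (w)
    {
        /* Read the link before posting: the waiter may reuse w afterwards. */
        CritWaiter *next = w->next;

        sem_post(&w->sem);
        w = next;
        nwoken++;
    }
    return nwoken;
}
```

The point of the sketch is only that the waiter never goes through a latch, so nothing in the wait path reacts to postmaster death. Whether that is less code to maintain than teaching Latch/WaitEventSet about critical sections is exactly the open question.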


Best regards, Andrey Borodin.