Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Andrey Borodin <x4mmm@yandex-team.ru>
To: Heikki Linnakangas <hlinnaka@iki.fi>
Cc: Michael Paquier <michael@paquier.xyz>,
Ayush Tiwari <ayushtiwari.slg01@gmail.com>,
Radim Marek <radim@boringsql.com>,
Marko Tiikkaja <marko@joh.to>,
PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Date: 2026-05-26T18:29:58Z
Lists: pgsql-hackers
> On 26 May 2026, at 17:28, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> looks correct

I tested that change as follows. I set up REL_16_0 as primary and REL_16_STABLE as standby.

Generate multixacts in a single session using savepoints:

    BEGIN;
    SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;
    -- repeat 2500 times:
    SAVEPOINT a;
    SELECT * FROM t WHERE i = 1 FOR UPDATE;
    ROLLBACK TO a;
    COMMIT;

Each iteration creates a new MultiXactId. 2500 iterations cross the SLRU page boundary at multixact 2048 with some spare multis (we'll pickle the excess ones in jars when all is fixed, toying with 2048 wasted dev cycles for no reason).

Test:
0. Run the workload on the REL_16_0 primary (2500 multixacts, crossing page 0->1)
1. Take pg_basebackup
2. Run the workload again (2500 more, crossing page 1->2)
3. Start the standby

I observe:
Without the change, startup deadlocks.
With the change, the standby catches up, and the DEBUG1 message "next offsets page is not initialized, initializing it now" confirms that the compat block fires correctly.

I packaged this test into a buildfarm module (TestReplayXversion) [0] that builds REL_x_0 and runs this check against a REL_x_STABLE build. It reproduces the deadlock on 14, 15, and 16; 17 and 18 pass.
Currently I'm struggling to inject the regress WAL trace into it; it is not working so far.

On the bright side, I managed to get PR number 42 in the buildfarm client repo.

Best regards, Andrey Borodin.

[0] https://github.com/PGBuildFarm/client-code/pull/42
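
For anyone who wants to replay the workload described above, here is a minimal sketch that writes it out as a SQL script. It is not the script used in the test or in the buildfarm module. The function names are hypothetical, and it assumes table t already contains a row with i = 1. The page arithmetic assumes the default 8 kB block size and 4-byte offsets, which gives the 2048 multixacts per offsets page mentioned in the message.

```python
import sys

# Assumption: default BLCKSZ (8192) and a 4-byte MultiXactOffset,
# i.e. 2048 multixact offsets per SLRU page.
OFFSETS_PER_PAGE = 8192 // 4


def workload_sql(iterations: int = 2500) -> str:
    """Return the savepoint workload from the message as one SQL script."""
    lines = ["BEGIN;", "SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;"]
    for _ in range(iterations):
        # Per the message, each pass through this block creates a new MultiXactId.
        lines += [
            "SAVEPOINT a;",
            "SELECT * FROM t WHERE i = 1 FOR UPDATE;",
            "ROLLBACK TO a;",
        ]
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"


def offsets_page(multi: int) -> int:
    """Offsets SLRU page that holds the given multixact ID."""
    return multi // OFFSETS_PER_PAGE


if __name__ == "__main__":
    # Print the script so it can be piped into psql on the primary.
    sys.stdout.write(workload_sql())
```

Saved under a name of your choice, the output can be piped into psql once before pg_basebackup and once after it. With about 2500 multixacts per run, the first run ends on page 1 and the second on page 2, matching steps 0 and 2 of the test.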