Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Nazneen Jafri <jafrinazneen@gmail.com>
To: Andrey Borodin <x4mmm@yandex-team.ru>
Cc: Heikki Linnakangas <hlinnaka@iki.fi>,
Michael Paquier <michael@paquier.xyz>, Ayush Tiwari <ayushtiwari.slg01@gmail.com>,
Radim Marek <radim@boringsql.com>, Marko Tiikkaja <marko@joh.to>,
PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Date: 2026-05-27T02:55:14Z
Lists: pgsql-hackers
Commits
- Fix self-deadlock when replaying WAL generated by older minor version
  - 2bb60eb4feab  14 (unreleased)  landed
  - 2dfe75f9844f  15 (unreleased)  landed
  - 42a3194e5483  16 (unreleased)  landed
- Fix multixact backwards-compatibility with CHECKPOINT race condition
  - 77dff5d937b1  16.14  cited
- Don't reset 'latest_page_number' when replaying multixid truncation
  - 23064542f8bd  16.13  cited
- Set next multixid's offset when creating a new multixid
  - 635166913078  16.12  cited
Tested Andrey's demo.diff on a fresh environment:
- Primary: REL_16_8, Standby: REL_16_14 (--enable-cassert)
- ~2300 MultiXacts crossing the offsets page boundary
- Without patch: startup deadlocks at RecordNewMultiXact(multi=2047)
- With patch: standby replays all WAL and catches up

Thanks,
Nazneen

On Tue, May 26, 2026 at 2:55 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> > On 26 May 2026, at 17:28, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > looks correct
>
> I tested that change as follows.
>
> Set up REL_16_0 as primary and REL_16_STABLE as standby.
>
> Generate multixacts in a single session using savepoints:
>
> BEGIN;
> SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;
> -- repeat 2500 times:
> SAVEPOINT a; SELECT * FROM t WHERE i = 1 FOR UPDATE; ROLLBACK TO a;
> COMMIT;
>
> Each iteration creates a new MultiXactId. 2500 iterations cross the SLRU page
> boundary at multixact 2048 with some spare multis (we'll pickle the excess ones
> in jars when all is fixed, toying with 2048 wasted dev cycles for no reason).
>
> Test:
> 0. Run the workload on the REL_16_0 primary (2500 multixacts, crossing page 0->1)
> 1. Take pg_basebackup
> 2. Run the workload again (2500 more, crossing page 1->2)
> 3. Start the standby
>
> I observe:
> Without the change, startup deadlocks.
> With the change, the standby catches up, and the DEBUG1 message "next offsets page is not initialized, initializing it now" confirms the compat block fires correctly.
>
> I packaged this test into a buildfarm module (TestReplayXversion) [0] that
> builds REL_x_0 and runs this check against the REL_x_STABLE build. It reproduces
> the deadlock on 14, 15, and 16; 17 and 18 pass. Currently I'm struggling to
> inject a regress WAL trace into it; it isn't working so far. On the bright side,
> I managed to get PR number 42 into the buildfarm client repo.
>
>
> Best regards, Andrey Borodin.
>
> [0] https://github.com/PGBuildFarm/client-code/pull/42