Thread

  1. Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8

    Andrey Borodin <x4mmm@yandex-team.ru> — 2026-05-26T18:29:58Z

    
    > On 26 May 2026, at 17:28, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    > 
    > looks correct
    
    I tested that change as follows.
    
    Set up REL_16_0 as the primary and REL_16_STABLE as the standby.
    
    Generate multixacts in a single session using savepoints:
    
    BEGIN;
    SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;
    -- repeat 2500 times:
    SAVEPOINT a; SELECT * FROM t WHERE i = 1 FOR UPDATE; ROLLBACK TO a;
    COMMIT;
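
    The "repeat 2500 times" step can be scripted rather than pasted by hand. The
    following is only a sketch of one way to generate the session text: the function
    name build_workload and its parameters are made up for illustration, and it
    assumes the table t with an integer column i from the statements above.

```python
def build_workload(iterations: int = 2500, table: str = "t", key: int = 1) -> str:
    """Return the SQL text of the single-session multixact workload.

    Hypothetical helper: it only reproduces the statements quoted in the
    message, repeated `iterations` times inside one transaction.
    """
    lines = [
        "BEGIN;",
        # The outer lock is taken once by the top-level transaction.
        f"SELECT * FROM {table} WHERE i = {key} FOR NO KEY UPDATE;",
    ]
    # Each savepoint locks the same row from a subtransaction and rolls back.
    step = (
        f"SAVEPOINT a; SELECT * FROM {table} WHERE i = {key} FOR UPDATE; "
        "ROLLBACK TO a;"
    )
    lines.extend([step] * iterations)
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import sys

    # Write the script to stdout so it can be piped into psql.
    sys.stdout.write(build_workload())
```

    Saved under a name of your choice (gen_workload.py here is just a placeholder),
    the output can be piped into psql connected to the primary, for example
    "python gen_workload.py | psql".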
    
    Each iteration creates a new MultiXactId. 2500 iterations cross the SLRU page
    boundary at multixact 2048 with some spare multis (we'll pickle the excess ones in
    jars when all is fixed, toying with 2048 wasted dev cycles for no reason).
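
    For reference, the 2048 figure is the number of 4-byte offsets that fit on one
    8 kB SLRU page. A small sketch of the arithmetic, assuming the default BLCKSZ of
    8192, a 4-byte MultiXactOffset, and multixact ids that roughly track the number
    of iterations on a fresh cluster (the helper name offsets_page is illustrative
    only):

```python
BLCKSZ = 8192        # assumed: default PostgreSQL block size
OFFSET_SIZE = 4      # assumed: sizeof(MultiXactOffset), a 32-bit value here
OFFSETS_PER_PAGE = BLCKSZ // OFFSET_SIZE  # 2048 entries per offsets page


def offsets_page(multi: int) -> int:
    """Return the offsets SLRU page that holds the entry for `multi`."""
    return multi // OFFSETS_PER_PAGE


# Page boundaries fall at multiples of 2048.
assert offsets_page(2047) == 0
assert offsets_page(2048) == 1

# Roughly 2500 multixacts end on page 1, roughly 5000 end on page 2,
# which matches the 0->1 and 1->2 crossings in the test steps.
assert offsets_page(2500) == 1
assert offsets_page(5000) == 2
```

    So the first run of the workload crosses from page 0 to page 1, and the second
    run crosses the next boundary at 4096.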
    
    Test:
    0. Run the workload on REL_16_0 primary (2500 multixacts, crossing page 0->1)
    1. Take a base backup with pg_basebackup
    2. Run the workload again (2500 more, crossing page 1->2)
    3. Start the standby
    
    I observe:
    Without the change, the startup process deadlocks.
    With the change, the standby catches up, and the DEBUG1 message "next offsets page is not
    initialized, initializing it now" confirms that the compat block fires correctly.
    
    I packaged this test into a buildfarm module (TestReplayXversion) [0] that
    builds REL_x_0 and runs this check on a REL_x_STABLE build. It reproduces the deadlock
    on 14, 15, and 16; 17 and 18 pass. I am currently trying to inject the regress WAL trace
    into it, but that is not working so far. On the bright side, I managed to get PR number 42
    in the buildfarm client repo.
    
    
    Best regards, Andrey Borodin.
    
    [0] https://github.com/PGBuildFarm/client-code/pull/42