Thread

  1. BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8

    PG Bug reporting form <noreply@postgresql.org> — 2026-05-20T21:16:59Z

    The following bug has been logged on the website:
    
    Bug reference:      19490
    Logged by:          Radim Marek
    Email address:      radim@boringsql.com
    PostgreSQL version: 16.14
    Operating system:   Linux - Ubuntu 22.04
    Description:        
    
    Hello, 
    due to a mistake, we ran a standby on a higher 16.x minor version against
    the non-upgraded primary. This led to repeated issues with WAL processing.
    
    Description:
    
    A streaming replication standby running 16.14 stops advancing replay while
    WAL keeps arriving from a 16.8 primary. The startup process is parked in
    futex_wait_queue with wait_event = LWLock:MultiXactOffsetSLRU and no longer
    makes progress.
    
    pg_stat_slru shows zero MultiXact activity over the same window, so it
    appears to stop on the lock itself rather than inside any SLRU read/write
    path. Downgrading the standby binary to 16.12 (same data directory) resolved
    the symptom under the same workload.
    
    Configuration:
    
    Primary: 16.8-1.pgdg22.04+1. The issue was observed both under load and
    when "relatively" idle (below 1000 QPS).
    Replica: 16.14-1.pgdg22.04+1, physical streaming, async, no cascading. It is
    the only replica on 16.14 (due to misconfiguration). Other replicas, running
    16.8, are not affected.
    
    hot_standby_feedback is enabled, with logical replication from the primary.
    Default WAL segment size. Default SLRU buffer sizes.
    
    Observed symptoms on the standby
    
    1. pg_stat_replication on primary, just the affected node
    
    client_addr  state      sent_lag  write_lag  flush_lag  replay_lag_bytes  replay_lag
    10.x.x.x     streaming  0         0          0          8766784344        02:42:50
    
    2. Receive/write/flush all at the primary's current LSN; only replay is far
    behind and growing.
    
    3. Startup process wait event on standby (sampled repeatedly, always
    identical)
    pid    wait_event_type    wait_event             state
    19095  LWLock             MultiXactOffsetSLRU    (null)
    
    4. Kernel stack of the startup process
    cat /proc/19095/stack
    [<0>] futex_wait_queue+0x67/0xa0
    [<0>] __futex_wait+0x155/0x1d0
    [<0>] futex_wait+0x74/0x120
    [<0>] do_futex+0x16d/0x230
    [<0>] __x64_sys_futex+0x95/0x200
    [<0>] x64_sys_call+0x117b/0x2480
    [<0>] do_syscall_64+0x81/0x170
    [<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80
    cat /proc/19095/wchan
    futex_wait_queue
    
    5. pg_stat_slru on the standby, after pg_stat_reset_slru(NULL) and a
    60-second wait under live WAL streaming
    name             blks_zeroed  blks_hit  blks_read  blks_written
    MultiXactMember  0            0         0          0
    MultiXactOffset  0            0         0          0
    
    6. There is no MultiXact SLRU activity while the startup process is
    reported as waiting on the MultiXact offset SLRU lock.
    
    7. Replay LSN frozen, receive LSN advancing. Sampled 60 sec apart.
    recv             replay          lag_bytes
    1476A/D1DA158    14767/EE01DB78  9111848416
    1476A/EB565D0    14767/EE01DB78  9138571864
    
    8. No replay progress; ~9 GB of WAL buffered locally that is never applied.
    
    9. Other backends on the standby: only a diagnostic psql client. No
    hot-standby readers.
    
    10. MultiXact age on the primary is small (~360k on most DBs, ~239k on the
    main DB). No MultiXact storm.
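
For anyone cross-checking the numbers, the lag_bytes values in the recv/replay samples above can be reproduced from the LSNs alone. The following is a minimal Python sketch, not part of the original diagnostics: the helper names lsn_to_int and lag_bytes are made up for illustration, and it assumes the usual interpretation of an LSN as a 64-bit byte position written as two hex halves (HI/LO). On the server, pg_wal_lsn_diff() gives the same difference.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN written as 'HI/LO' (hex) into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(recv_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL received but not yet replayed."""
    return lsn_to_int(recv_lsn) - lsn_to_int(replay_lsn)


# The two samples from the report, taken 60 seconds apart.
samples = [
    ("1476A/D1DA158", "14767/EE01DB78"),
    ("1476A/EB565D0", "14767/EE01DB78"),
]

lags = [lag_bytes(recv, replay) for recv, replay in samples]
print(lags)  # [9111848416, 9138571864], matching the lag_bytes column

# Replay LSN is identical in both samples, so all growth comes from receive.
growth_per_sec = (lags[1] - lags[0]) / 60
print(f"lag grows by about {growth_per_sec:.0f} bytes/s")
```

The replay LSN is the same in both samples, so the lag grows only because the receive position keeps moving.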
    
    Workarounds
    
    - Restarting the standby cleared the block, but once it caught up the stall
    recurred.
    - Downgrading the standby binary to 16.12 (16.12-1.pgdg22.04+1) against
    the same data directory restored normal replay. After 60s under the same
    workload pg_stat_slru shows only 2 hits / 0 reads on MultiXact.
    
    I understand that running 6 minor versions behind is not a particularly good
    setup, but given that this is a supported direction, it might be worth at
    least a mention in the 16.13/16.14 release notes.
    
    ---
    
    Hope this helps,
    Radim