Thread

  1. Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8

    Radim Marek <radim@boringsql.com> — 2026-05-21T09:06:18Z

    Although the culprit is known, here is more data, as requested.
    
    #0  0x00007f20e9bdb687 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007f20e9bdbc8c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #2  0x00007f20e9be6920 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #3  0x000055a71796e3ca in PGSemaphoreLock (sema=0x7f20de6d0e38) at ./build/src/backend/port/pg_sema.c:327
    #4  0x000055a7179f57ed in LWLockAcquire (lock=0x7f20de6d1800, mode=mode@entry=LW_EXCLUSIVE) at ./build/../src/backend/storage/lmgr/lwlock.c:1314
    #5  0x000055a71772dfb2 in SimpleLruWriteAll (ctl=ctl@entry=0x55a717e83040 <MultiXactOffsetCtlData>, allow_redirtied=allow_redirtied@entry=false) at ./build/../src/backend/access/transam/slru.c:1174
    #6  0x000055a717727b6f in RecordNewMultiXact (multi=79871, offset=218449, nmembers=2, members=members@entry=0x7f20de6831ec) at ./build/../src/backend/access/transam/multixact.c:944
    #7  0x000055a71772a983 in multixact_redo (record=0x55a73a8d0fc8) at ./build/../src/backend/access/transam/multixact.c:3464
    #8  0x000055a71774d9b8 in ApplyWalRecord (xlogreader=<optimized out>, record=0x7f20de6831b0, replayTLI=<synthetic pointer>) at ./build/../src/backend/access/transam/xlogrecovery.c:1951
    #9  PerformWalRecovery () at ./build/../src/backend/access/transam/xlogrecovery.c:1782
    #10 0x000055a717740def in StartupXLOG () at ./build/../src/backend/access/transam/xlog.c:5452
    #11 0x000055a71797c7e4 in StartupProcessMain () at ./build/../src/backend/postmaster/startup.c:282
    #12 0x000055a717972b20 in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess) at ./build/../src/backend/postmaster/auxprocess.c:141
    #13 0x000055a717977db3 in StartChildProcess (type=StartupProcess) at ./build/../src/backend/postmaster/postmaster.c:5381
    #14 0x000055a71797bfb8 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x55a73a8d0590) at ./build/../src/backend/postmaster/postmaster.c:1463
    #15 0x000055a7176a05bc in main (argc=1, argv=0x55a73a8d0590) at ./build/../src/backend/main/main.c:200
    
    And the WAL dump around that record:
    
    rmgr: Btree       len (rec/tot):     64/    64, tx:     336098, lsn: 1/32DE75F0, prev 1/32DE7580, desc: INSERT_LEAF off: 244, blkref #0: rel 1663/16384/16432 blk 536
    rmgr: MultiXact   len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7630, prev 1/32DE75F0, desc: CREATE_ID 79871 offset 218449 nmembers 2: 336089 (keysh) 336098 (keysh)
    rmgr: Heap        len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7668, prev 1/32DE7630, desc: LOCK xmax: 79871, off: 1, infobits: [IS_MULTI, LOCK_ONLY, KEYSHR_LOCK], flags: 0x00, blkref #0: rel 1663/16384/16418 blk 0
    rmgr: Heap        len (rec/tot):     72/    72, tx:     336096, lsn: 1/32DE76A0, prev 1/32DE7668, desc: HOT_UPDATE old_xmax: 336096, old_off: 52, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 149, blkref #0: rel 1663/16384/16401 blk 22
    rmgr: Heap        len (rec/tot):     71/    71, tx:     336096, lsn: 1/32DE76E8, prev 1/32DE76A0, desc: HOT_UPDATE old_xmax: 336096, old_off: 149, old_infobits: [], flags: 0x60, new_xmax: 0, new_off: 209, blkref #0: rel 1663/16384/16399 blk 6
    rmgr: Heap        len (rec/tot):     79/    79, tx:     336096, lsn: 1/32DE7730, prev 1/32DE76E8, desc: INSERT off: 150, flags: 0x00, blkref #0: rel 1663/16384/16417 blk 741
    rmgr: Heap        len (rec/tot):     72/    72, tx:     336097, lsn: 1/32DE7780, prev 1/32DE7730, desc: HOT_UPDATE old_xmax: 336097, old_off: 243, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 228, blkref #0: rel 1663/16384/16401 blk 26
    rmgr: Transaction len (rec/tot):     34/    34, tx:     336096, lsn: 1/32DE77C8, prev 1/32DE7780, desc: COMMIT 2026-05-21 08:43:07.003572 UTC
    
    Radim
    
    On Thu, 21 May 2026 at 10:34, Radim Marek <radim@boringsql.com> wrote:
    
    > Thank you for the follow-up. In the meantime, I can confirm that
    > commit 77dff5d937b1 may be the source of the originally reported issue.
    >
    > Unfortunately, pinning the version down to 16.12 avoids the
    > MultiXactOffsetSLRU self-deadlock, but the standby then fails recovery
    > after 12+ hours:
    >
    > FATAL: could not access status of transaction 24958976
    > DETAIL: Could not read from file "pg_multixact/offsets/017C" at offset 221184: read too few bytes.
    > CONTEXT: WAL redo at 14770/873268E8 for MultiXact/CREATE_ID: 24958975 offset 61500431 nmembers 2: 3058927188 (fornokeyupd) 3058927189 (keysh)
    >
    > We are going to try pinning 16.13 next, until we can safely upgrade the
    > primary or are confident that we have working PITR recovery available
    > should we need it.
    >
    > Radim
    >
    > PS: Once I have some time, I will try to set up a Docker-based harness to
    > reproduce the original problem, for later testing of the fix.
    >
    > On Thu, 21 May 2026 at 09:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
    >
    >>
    >>
    >> > On 21 May 2026, at 00:12, Marko Tiikkaja <marko@joh.to> wrote:
    >> >
    >> > #8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
    >>
    >> Thanks!
    >>
    >> This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
    >> If by chance you have a backtrace of another deadlocking process,
    >> please post it.
    >>
    >> But it's not strictly necessary for the analysis; I think we can figure
    >> out what happened from the backtrace you already posted.
    >>
    >>
    >> Best regards, Andrey Borodin.
    >>
    >
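
A side note on the FATAL message quoted above: the file name and byte offset it reports can be recomputed from the multixact ID alone. The sketch below is only illustrative and assumes PostgreSQL 16 defaults that are not stated in this thread (8 kB blocks, 4-byte MultiXactOffset entries, 32 pages per SLRU segment, 4-hex-digit segment names); locate_multixact_offset is a hypothetical helper, not a PostgreSQL function.

```python
# Assumed PostgreSQL 16 defaults (not taken from the thread):
BLCKSZ = 8192                  # block size in bytes
MULTIXACT_OFFSET_SIZE = 4      # sizeof(MultiXactOffset), a 32-bit value
SLRU_PAGES_PER_SEGMENT = 32    # pages per SLRU segment file

OFFSETS_PER_PAGE = BLCKSZ // MULTIXACT_OFFSET_SIZE  # 2048 entries per page


def locate_multixact_offset(multi: int) -> dict:
    """Hypothetical helper: map a multixact ID to its offsets-SLRU location."""
    page = multi // OFFSETS_PER_PAGE          # SLRU page number
    entry = multi % OFFSETS_PER_PAGE          # slot within that page
    segment = page // SLRU_PAGES_PER_SEGMENT  # segment file number
    page_in_segment = page % SLRU_PAGES_PER_SEGMENT
    return {
        "file": f"pg_multixact/offsets/{segment:04X}",
        "page": page,
        "entry": entry,
        "file_offset": page_in_segment * BLCKSZ,  # byte offset of the page
    }


if __name__ == "__main__":
    # IDs taken from the messages in this thread, plus the next ID after 79871.
    for multi in (24958975, 24958976, 79871, 79872):
        print(multi, locate_multixact_offset(multi))
```

Under these assumptions, 24958976 maps to "pg_multixact/offsets/017C" at byte offset 221184, which matches the error message, and it is the first entry of its page. The record being replayed creates 24958975, the last entry of the preceding page. The multixact in the WAL dump from the deadlock case, 79871, is likewise the last entry of a page. Whether that page-boundary pattern matters for the analysis is for the people who know the code to judge.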