Thread

  1. Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8

    Radim Marek <radim@boringsql.com> — 2026-05-21T08:34:49Z

    Thank you for the follow-up. In the meantime, I can confirm that
    commit 77dff5d937b1 might be the source of the originally reported issue.
    
    Unfortunately, pinning the version down to 16.12 only avoids the
    MultiXactOffsetSLRU self-deadlock; the standby then fails recovery
    after 12+ hours:
    
    FATAL: could not access status of transaction 24958976
    DETAIL: Could not read from file "pg_multixact/offsets/017C" at offset 221184: read too few bytes.
    CONTEXT: WAL redo at 14770/873268E8 for MultiXact/CREATE_ID: 24958975 offset 61500431 nmembers 2: 3058927188 (fornokeyupd) 3058927189 (keysh)
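    
    As a cross-check of the numbers in that message, here is a minimal
    sketch of how a multixact ID maps to a file and byte offset under
    pg_multixact/offsets. It assumes a stock build (8192-byte pages, 4-byte
    offset entries, 32 pages per SLRU segment); none of these values appear
    in the log itself, and the function name is purely illustrative.

```python
# Assumed defaults for a stock PostgreSQL 16 build (not stated in the log).
BLCKSZ = 8192            # page size in bytes
OFFSET_SIZE = 4          # one 32-bit offset entry per multixact ID
PAGES_PER_SEGMENT = 32   # SLRU pages per segment file

ENTRIES_PER_PAGE = BLCKSZ // OFFSET_SIZE  # 2048 entries per page


def locate_offset_entry(multixact_id: int) -> tuple[str, int, int]:
    """Hypothetical helper: map a multixact ID to its offsets-SLRU location.

    Returns (segment file name, byte offset of the page within that file,
    index of the entry on that page).
    """
    page = multixact_id // ENTRIES_PER_PAGE
    entry = multixact_id % ENTRIES_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    page_in_segment = page % PAGES_PER_SEGMENT
    return f"{segment:04X}", page_in_segment * BLCKSZ, entry


print(locate_offset_entry(24958976))  # -> ('017C', 221184, 0)
print(locate_offset_entry(24958975))  # -> ('017C', 212992, 2047)
```

    Under those assumptions the result matches the file name and offset in
    the error: 24958976 is the first entry on its page, immediately after
    24958975, which is the last entry on the preceding page.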
    
    We are going to pin 16.13 and try that until we can safely upgrade the
    primary, or until we are confident we have working PITR available should
    we need it.
    
    Radim
    
    PS: Once I have some time, I will try to set up a Docker-based harness to
    reproduce the original problem, for later testing of the fix.
    
    On Thu, 21 May 2026 at 09:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
    
    >
    >
    > > On 21 May 2026, at 00:12, Marko Tiikkaja <marko@joh.to> wrote:
    > >
    > > #8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
    >
    > Thanks!
    >
    > This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
    > If by chance you have a backtrace of another deadlocking process,
    > please post it.
    >
    > But it's not strictly necessary for analysis, I think we can figure out
    > what happened from the backtrace you already posted.
    >
    >
    > Best regards, Andrey Borodin.
    >