Thread

  1. Re: IPC/MultixactCreation on the Standby server

    Andrey Borodin <x4mmm@yandex-team.ru> — 2025-11-14T12:11:50Z

    Hi Ivan!
    
    Thanks for the review.
    
    > On 24 Oct 2025, at 19:33, Ivan Bykov <I.Bykov@modernsys.ru> wrote:
    > 
    > I believe that we should provide more comprehensible feedback to developers 
    > by interrupting the deadlock state with a psql timeout. 15 seconds seems safe 
    > enough to distinguish between slow node operation and deadlock issue.
    
    Makes sense, but I used $PostgreSQL::Test::Utils::timeout_default seconds.
    
    
    > On 24 Oct 2025, at 22:16, Bykov Ivan <i.bykov@ftdata.ru> wrote:
    > 
    > In GetNewMultiXactId() this code may lead to error
    > ---
    > ExtendMultiXactOffset(MultiXactState->nextMXact + 1);
    > ---
    > If MultiXactState->nextMXact = MaxMultiXactId (0xFFFFFFFF)
    > we will not extend MultiXactOffset as we should
    
    It will extend SLRU, this calculations are handled in ExtendMultiXactOffset(). Moreover, we should eliminate "multi != FirstMultiXactId" bailout condition too. I've extended comments and added Assert(). I've added test for this, but it's whacky-hacky with dd, resetwal and such. But it uncovered wrond assertion in this code:
    /* This is an overflow-only branch */
    Assert(next_entryno == 0 || next == FirstMultiXactId);
    This "next == FirstMultiXactId" was missing.
    
    I also included Kirill's suggestions.
    
    GPT is also worried by initialization of page 0, but I don't fully understand it's concerns:
    "current MXID’s page is never extended: GetNewMultiXactId() now calls ExtendMultiXactOffset(MultiXactState->nextMXact + 1) instead of extending the page that will hold result. This skips zeroing the page for the MXID we are about to assign, so the first allocation on a fresh cluster (or after wrap) tries to write into an uninitialized SLRU page and will fail/crash once the buffer manager attempts I/O."
    
    I think we have now good coverage of what happens on fresh cluster and after a wraparound.
    
    "wraparound can’t recreate page 0: ExtendMultiXactOffset() now asserts multi != FirstMultiXactId, yet after wrap the first ID on page 0 is exactly FirstMultiXactId. Because callers no longer pass that value, page 0 is never re-created and we keep reusing stale offsets."
    
    Page 0 is actually created via different path on cluster initialization. Though everything works fine on wraparound.
    
    Thanks!
    
    
    Best regards, Andrey Borodin.