Thread

  1. Re: Memory context can be its own parent and child in replication command

    Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> — 2025-03-11T09:26:32Z

    On Tue, Mar 11, 2025 at 1:09 AM Michael Paquier <michael@paquier.xyz> wrote:
    > It seems to me that you mean 1afe31f03cd2, no?
    
    Yes, that was a bad copy/paste from me.
    
    On Tue, Mar 11, 2025 at 2:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    > I dunno about this patch: it seems to me it's doing things exactly
    > backwards, namely trying to restore the CurrentMemoryContext after
    > a transaction abort to what it was just beforehand.  With only
    > a slightly different set of suppositions, this is introducing
    > use of a deleted context rather than removing it.
    
    Yeah, I'm not super happy about the fix either. This leaves the issue
    that CurrentMemoryContext is set to a deleted context, even if
    temporarily.
    
    > I'm inclined to suspect that the real problem is somebody installed
    > the wrong context as current just before the transaction was started.
    > I don't have the energy to look closer right now, though.
    
    The context installed is the replication command context created at
    the beginning of exec_replication_command. This context is deleted at
    the end of the function while the transaction is still running. Maybe
    a better fix would be to not delete the Replication command context
    when a transaction was started to handle the export, and switch back
    to this context at the start of the next exec_replication_command?
    This way, the memory context referenced by priorContext would still be
    valid when the transaction is aborted.
    
    This would also require the Replication command context to not be
    attached under the MessageContext to avoid being deleted at the end of
    the query cycle.
    
    > Independently of where the actual bug is there, it seems not
    > nice that it's so easy to bollix the memory context data
    > structures.  I wonder if we can make that more detectable
    > at reasonable cost.
    
    One thing that made it hard to trace the issue was that there's no way
    to know if a MemoryContext was deleted or not. Setting the
    MemoryContext's node type to T_Invalid during reset could be a way to
    tag the context as deleted. And this will be able to leverage the
    existing MemoryContextIsValid check and additional
    MemoryContextIsValid checks could be added before restoring the
    priorContext.
    
    --- a/src/backend/utils/mmgr/mcxt.c
    +++ b/src/backend/utils/mmgr/mcxt.c
    @@ -527,6 +527,7 @@ MemoryContextDeleteOnly(MemoryContext context)
    
            context->methods->delete_context(context);
    
    +       context->type = T_Invalid;
            VALGRIND_DESTROY_MEMPOOL(context);
     }
    
    However, when testing this on my mac, it seems to trigger a heap
    corruption during initdb.
    
    postgres(9057,0x200690840) malloc: Heap corruption detected, free list
    is damaged at 0x60000322bb10
    *** Incorrect guard value: 276707563012096
    postgres(9057,0x200690840) malloc: *** set a breakpoint in
    malloc_error_break to debug
    
    While it ran fine on linux. I didn't have time to check why yet.