Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Switch memory contexts in ReinitializeParallelDSM.

  1. Parallel query: Use TopTransactionContext for ReinitializeParallelDSM()

    Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-12-08T09:07:25Z

    Hi -hackers, we've got report of rare crash in EDB's PostgreSQL fork
    which revealed that the PostgreSQL's ReinitializeParallelDSM()
    allocates memory from wrong memory context (ExecutorState) instead of
    TopTransactionContext.
    
    Currently we do not have public reproducer for this in PG community
    version, however when using Parallel Query, initially both
    ParallelContext pcxt->worker and pcxt->worker[i].error_mqh are
    allocated from TopTransactionContext (and that's appears to be
    correct), but that happens only initially, because later on - where
    reuse of DSM/workers is in play and more data is involved /
    technically where
    ExecParlallelReinitalize()->ReinitializeParallelDSM() is involved -
    after some time, the pcxt->worker[i].error_mqh will end up being
    reinitialized from ExecutorState memory context and that's the bug. In
    layman terms it means that this ReinitializeParallelDSM() is usually
    used for Nested Loop with Gather (useful side note: but this can be
    easier triggered with enable_material = off as this reaches the
    ReinitializeParallelDSM() way faster ).
    
    Normally this is not causing problems, however this might be
    problematic when query is cancelled, as the ExecutorState might be
    pfreed depending on what is inside PG_CATCH(): in case of having
    SPI_finish() there, the ExecutorState will be pfreed, then the
    pcxt->workers[i].error_mqh-> might end up being accessed by the SIGINT
    handler itself later like this: AbortTransaction →
    DestroyParallelContext() which leads to use after free.
    
    The stack would be similiar to : longjmp -> AbortCurrentTransaction()
    -> AbortTransaction() -> AtEOXact_Parallel() ->
    DestroyParallelContext() -> shm_mq_detach(). SPI_finish() is just an
    example here where we have caught it (it's releasing ExecutorState).
    
    To sum up it occurs in the following conditions:
    1. parallel query involved with nest loop/gather
    2. ReinitializeParallelDSM() being used
    3. query cancellation
    4. PG_CATCH() pfreeing ExecutorState
    
    We have attached the patch. The issue is within shm_mq_attach(), but
    we want to protect the whole function just in case just like in
    InitializeParallelDSM. Thanks to Jeevan Chalke and Robert Haas for
    help while debugging this.
    
    -J.
    
  2. Re: Parallel query: Use TopTransactionContext for ReinitializeParallelDSM()

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2025-12-10T08:02:55Z

    Hi Jakub,
    
    Thanks for the patch.
    
    Although I was unable to reproduce the reported issue, a review of the code
    flow confirms that to maintain consistency with all other Parallel Query
    memory allocations, the context should be switched to TopTransactionContext.
    This change appears correct and serves as a necessary safeguard to prevent
    potential future race conditions.
    
    Thanks
    
    
    On Mon, Dec 8, 2025 at 2:37 PM Jakub Wartak <jakub.wartak@enterprisedb.com>
    wrote:
    
    > Hi -hackers, we've got report of rare crash in EDB's PostgreSQL fork
    > which revealed that the PostgreSQL's ReinitializeParallelDSM()
    > allocates memory from wrong memory context (ExecutorState) instead of
    > TopTransactionContext.
    >
    > Currently we do not have public reproducer for this in PG community
    > version, however when using Parallel Query, initially both
    > ParallelContext pcxt->worker and pcxt->worker[i].error_mqh are
    > allocated from TopTransactionContext (and that's appears to be
    > correct), but that happens only initially, because later on - where
    > reuse of DSM/workers is in play and more data is involved /
    > technically where
    > ExecParlallelReinitalize()->ReinitializeParallelDSM() is involved -
    > after some time, the pcxt->worker[i].error_mqh will end up being
    > reinitialized from ExecutorState memory context and that's the bug. In
    > layman terms it means that this ReinitializeParallelDSM() is usually
    > used for Nested Loop with Gather (useful side note: but this can be
    > easier triggered with enable_material = off as this reaches the
    > ReinitializeParallelDSM() way faster ).
    >
    > Normally this is not causing problems, however this might be
    > problematic when query is cancelled, as the ExecutorState might be
    > pfreed depending on what is inside PG_CATCH(): in case of having
    > SPI_finish() there, the ExecutorState will be pfreed, then the
    > pcxt->workers[i].error_mqh-> might end up being accessed by the SIGINT
    > handler itself later like this: AbortTransaction →
    > DestroyParallelContext() which leads to use after free.
    >
    > The stack would be similiar to : longjmp -> AbortCurrentTransaction()
    > -> AbortTransaction() -> AtEOXact_Parallel() ->
    > DestroyParallelContext() -> shm_mq_detach(). SPI_finish() is just an
    > example here where we have caught it (it's releasing ExecutorState).
    >
    > To sum up it occurs in the following conditions:
    > 1. parallel query involved with nest loop/gather
    > 2. ReinitializeParallelDSM() being used
    > 3. query cancellation
    > 4. PG_CATCH() pfreeing ExecutorState
    >
    > We have attached the patch. The issue is within shm_mq_attach(), but
    > we want to protect the whole function just in case just like in
    > InitializeParallelDSM. Thanks to Jeevan Chalke and Robert Haas for
    > help while debugging this.
    >
    > -J.
    >
    
    
    -- 
    *Jeevan Chalke*
    *Principal Engineer, Engineering Manager*
    *Product Development*
    
    enterprisedb.com <https://www.enterprisedb.com>
    
  3. Re: Parallel query: Use TopTransactionContext for ReinitializeParallelDSM()

    Robert Haas <robertmhaas@gmail.com> — 2025-12-16T17:32:22Z

    On Wed, Dec 10, 2025 at 3:03 AM Jeevan Chalke
    <jeevan.chalke@enterprisedb.com> wrote:
    > Although I was unable to reproduce the reported issue, a review of the code flow confirms that to maintain consistency with all other Parallel Query memory allocations, the context should be switched to TopTransactionContext. This change appears correct and serves as a necessary safeguard to prevent potential future race conditions.
    
    Committed and back-patched.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com