Thread

  1. Re: Bound memory usage during manual slot sync retries

    Xuneng Zhou <xunengzhou@gmail.com> — 2026-05-15T12:53:50Z

    Hi Shveta, Zhijie,
    
    On Fri, May 15, 2026 at 4:41 PM Zhijie Hou (Fujitsu)
    <houzj.fnst@fujitsu.com> wrote:
    >
    > On Friday, May 15, 2026 2:36 PM shveta malik <shveta.malik@gmail.com> wrote:
    > > On Fri, May 15, 2026 at 11:03 AM Xuneng Zhou <xunengzhou@gmail.com>
    > > wrote:
    > > >
    > > > Hi Hackers,
    > > >
    > > > pg_sync_replication_slots() now retries inside a single SQL function
    > > > call.  Unlike the slotsync worker, it does not get a transaction
    > > > boundary between retry cycles, so allocations made while fetching and
    > > > synchronizing remote slots can accumulate until the function returns.
    > > >
    > > > The existing list_free_deep(remote_slots) is not enough to bound this.
    > > > It frees the List cells and the RemoteSlot structs stored as list
    > > > elements, but it does not recursively free allocations owned by those
    > > > structs, such as the copied slot name, plugin name, and database name.
    > > > It also does not release unrelated per-cycle allocations made while
    > > > fetching and processing the remote slots.
    > > >
    > > > This is probably modest in normal cases, as the retained memory is
    > > > roughly proportional to the number of retries times the number of remote
    > > slots.
    > > > Still, the function can wait for a long time on a lagging or
    > > > misconfigured standby, so the growth is unbounded for that call.
    > > >
    > > > The attached patch runs each manual retry cycle in a short-lived
    > > > memory context and resets it before the next attempt.  The slot name
    > > > list needed across retries is copied outside that context.
    > > >
    > > > It also adds two local cleanups.  Tuple slots created with
    > > > MakeSingleTupleTableSlot() are explicitly released with
    > > > ExecDropSingleTupleTableSlot() before clearing the walreceiver result.
    > > > The memory context would reclaim the storage eventually, but the slot
    > > > also holds a reference to the result tuple descriptor, so releasing it
    > > > at the matching ownership boundary seems clearer and follows nearby
    > > code.
    > > >
    > > > drop_local_obsolete_slots() now frees the temporary list container it
    > > > builds.  The retry-cycle context would reclaim this list too, but
    > > > freeing it locally would make the helper self-contained.
    > > >
    > > > Politely CCed the original authors.
    > > >
    > > >
    > >
    > > I like the core idea of this patch. It makes the flow cleaner and protects the
    > > flow from memory-leaks when dealing with a high number of slots. I think
    > > during implementation, we considered having a separate memory context,
    > > but since we were freeing the remote_slots then, we thought allocating a new
    > > memory context might not be required. But on re-thinking and reading the
    > > details above, IMO, it is a good improvement.
    
    Yeah, we need to stop the memory accumulation, either by using a
    dedicated memory context or by freeing the memory manually.
    
    > +1 to the general idea of avoiding memory accumulation.
    >
    > >
    > > A couple of minor comments:
    > >
    > > 1. I think we need to delete the new memory context in
    > > slotsync_failure_callback() as well.
    >
    > I think we can avoid doing that, because the new memory context should be
    > destroyed along with its parent context on transaction abort (if any error
    > occurs).
    
    Yeah, normal error/transaction memory context cleanup seems enough. It
    avoids adding callback state only to delete cycle_ctx.
    
    > Just one comment:
    >
    > @@ -579,6 +579,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
    >                                                    local_slot->data.database));
    >                 }
    >         }
    > +
    > +       list_free(local_slots);
    >  }
    >
    > I think we prefer to avoid freeing memory explicitly, since this is a
    > static function and we already have a memory context in place to prevent leaks.
    > (We've seen this discussion about explicit freeing multiple times before, and
    > the previous conclusion was to rely on the memory context management rather than
    > adding more free code.)
    
    Thanks for pointing out. I wasn't aware of the prior discussion. I
    mainly wanted to avoid potential issues in case future callers invoke
    it without handling the cleanup properly.
    
    
    --
    Regards,
    Xuneng Zhou
    HighGo Software Co., Ltd.