Thread

  1. Re: Bound memory usage during manual slot sync retries

    Xuneng Zhou <xunengzhou@gmail.com> — 2026-05-29T02:30:57Z

    Hi Khoa,
    
    On Fri, May 29, 2026 at 12:43 AM Khoa Nguyen <kdnguyen9.oss@gmail.com> wrote:
    >
    > On Tue, May 26, 2026 at 4:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > >
    > > On Tue, May 26, 2026 at 1:01 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
    > > >
    > > >
    > > >> On Mon, May 25, 2026 at 7:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > > >> >
    > > >> > Okay, then let's go with a per-retry memory context approach.
    > > >> >
    > > >> > @@ -579,6 +579,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
    > > >> >      local_slot->data.database));
    > > >> >   }
    > > >> >   }
    > > >> > +
    > > >> > + list_free(local_slots);
    > > >> >  }
    > > >> >
    > > >> > Why do we need this retail pfree if the caller is using memory context?
    > > >> >
    > > >>
    > > >> I see that the latest patch in email [1] has already addressed this
    > > >> point. So, I'll push the v2 version.
    > > >>
    > > >> [1] - https://www.postgresql.org/message-id/CABPTF7WB4Z62sPoZkhSygOCAo3OiTDLpMELxZDuwCb3HYgM_pQ%40mail.gmail.com
    > > >
    > > >
    > > > Thanks. My original reasoning for adding the pfree here was to act as a safety guard in case other future callers might not manage the memory properly. But Zhijie pointed out that this double-free pattern is not favored in previous community discussions. I'll post the  worst-case test and its results later.
    > > >
    > >
    > > Pushed.
    >
    >
    > This patch is already pushed by Amit but I wanted to add my review
    > from a validation standpoint.  I was able to use the
    > measure_slotsync_memory.sh to verify the leak that Xuneng reported.
    > pre-patch
    > timestamp,backend_pid,total_bytes,delta_bytes
    > 2026-05-27T16:52:20Z,78800,1339920,
    > 2026-05-27T16:52:22Z,78800,1405456,65536
    > 2026-05-27T16:52:24Z,78800,1405456,0
    > 2026-05-27T16:52:26Z,78800,1405456,0
    > 2026-05-27T16:52:28Z,78800,1536528,131072
    > 2026-05-27T16:52:30Z,78800,1536528,0
    > 2026-05-27T16:52:32Z,78800,1536528,0
    >
    > patched
    > timestamp,backend_pid,total_bytes,delta_bytes
    > 2026-05-27T01:17:50Z,66917,1315344,
    > 2026-05-27T01:17:52Z,66917,1315344,0
    > 2026-05-27T01:17:54Z,66917,1315344,0
    > 2026-05-27T01:17:56Z,66917,1315344,0
    > 2026-05-27T01:17:58Z,66917,1315344,0
    > 2026-05-27T01:18:00Z,66917,1315344,0
    > 2026-05-27T01:18:02Z,66917,1315344,0
    >
    > I also reviewed the script and it is well done.  The script setup and
    > capture data points showing memory leak within the session over time.
    > The patch looks fine though I think that oldctx should be captured
    > once outside the retry loop since the current logic is more about
    > parent and child memory context rather than previous and current
    > context.  Nevertheless, the current code works as well.
    
    Thanks for your review and verification. I agree with your suggestion
    that capturing the outer memory context once outside the retry loop
    looks cleaner and reflect the lifetime model better. I am also
    wondering whether renaming the oldctx to outerctx could express the
    parent/child relationship more clearly. That said, I am ok with the
    status quo, if Amit thinks no further modification is needed.
    
    -- 
    Regards,
    Xuneng Zhou
    HighGo Software Co., Ltd.