Thread

  1. Re: Bound memory usage during manual slot sync retries

    Khoa Nguyen <kdnguyen9.oss@gmail.com> — 2026-05-28T16:42:54Z

    On Tue, May 26, 2026 at 4:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > On Tue, May 26, 2026 at 1:01 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
    > >
    > >
    > >> On Mon, May 25, 2026 at 7:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > >> >
    > >> > Okay, then let's go with a per-retry memory context approach.
    > >> >
    > >> > @@ -579,6 +579,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
    > >> >      local_slot->data.database));
    > >> >   }
    > >> >   }
    > >> > +
    > >> > + list_free(local_slots);
    > >> >  }
    > >> >
    > >> > Why do we need this retail pfree if the caller is using memory context?
    > >> >
    > >>
    > >> I see that the latest patch in email [1] has already addressed this
    > >> point. So, I'll push the v2 version.
    > >>
    > >> [1] - https://www.postgresql.org/message-id/CABPTF7WB4Z62sPoZkhSygOCAo3OiTDLpMELxZDuwCb3HYgM_pQ%40mail.gmail.com
    > >
    > >
    > > Thanks. My original reasoning for adding the pfree here was to act as a safety guard in case other future callers might not manage the memory properly. But Zhijie pointed out that this double-free pattern is not favored in previous community discussions. I'll post the  worst-case test and its results later.
    > >
    >
    > Pushed.
    
    
    This patch is already pushed by Amit but I wanted to add my review
    from a validation standpoint.  I was able to use the
    measure_slotsync_memory.sh to verify the leak that Xuneng reported.
    pre-patch
    timestamp,backend_pid,total_bytes,delta_bytes
    2026-05-27T16:52:20Z,78800,1339920,
    2026-05-27T16:52:22Z,78800,1405456,65536
    2026-05-27T16:52:24Z,78800,1405456,0
    2026-05-27T16:52:26Z,78800,1405456,0
    2026-05-27T16:52:28Z,78800,1536528,131072
    2026-05-27T16:52:30Z,78800,1536528,0
    2026-05-27T16:52:32Z,78800,1536528,0
    
    patched
    timestamp,backend_pid,total_bytes,delta_bytes
    2026-05-27T01:17:50Z,66917,1315344,
    2026-05-27T01:17:52Z,66917,1315344,0
    2026-05-27T01:17:54Z,66917,1315344,0
    2026-05-27T01:17:56Z,66917,1315344,0
    2026-05-27T01:17:58Z,66917,1315344,0
    2026-05-27T01:18:00Z,66917,1315344,0
    2026-05-27T01:18:02Z,66917,1315344,0
    
    I also reviewed the script and it is well done.  The script setup and
    capture data points showing memory leak within the session over time.
    The patch looks fine though I think that oldctx should be captured
    once outside the retry loop since the current logic is more about
    parent and child memory context rather than previous and current
    context.  Nevertheless, the current code works as well.