Re: Changing shared_buffers without restart

From: Andres Freund <andres@anarazel.de>
To: Dmitry Dolgov <9erthalion6@gmail.com>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Thomas Munro <thomas.munro@gmail.com>, pgsql-hackers@postgresql.org, Robert Haas <robertmhaas@gmail.com>
Date: 2025-09-26T18:36:43Z
Lists: pgsql-hackers

Commits

  1. Remove PG_MMAP_FLAGS from mem.h

  2. Improve runtime and output of tests for replication slots checkpointing.

  3. Revert support for improved tracking of nested queries

  4. Use exported symbols list on macOS for loadable modules as well

  5. Add support for basic NUMA awareness

  6. Avoid unnecessary copying of a string in pg_restore.c

  7. aio: Infrastructure for io_method=worker

  8. Improve InitShmemAccess() prototype

Hi,

On 2025-09-26 20:04:21 +0200, Dmitry Dolgov wrote:
> > On Thu, Sep 18, 2025 at 09:52:03AM -0400, Andres Freund wrote:
> > > From 0a13e56dceea8cc7a2685df7ee8cea434588681b Mon Sep 17 00:00:00 2001
> > > From: Dmitrii Dolgov <9erthalion6@gmail.com>
> > > Date: Sun, 6 Apr 2025 16:40:32 +0200
> > > Subject: [PATCH 03/16] Introduce pending flag for GUC assign hooks
> > > 
> > > Currently an assign hook can perform some preprocessing of a new value,
> > > but it cannot change the behavior, which dictates that the new value
> > > will be applied immediately after the hook. Certain GUC options (like
> > > shared_buffers, coming in subsequent patches) may need coordinating work
> > > between backends to change, meaning we cannot apply it right away.
> > > 
> > > Add a new flag "pending" for an assign hook to allow the hook to indicate
> > > exactly that. If the pending flag is set after the hook, the new value
> > > will not be applied and its handling becomes the hook's implementation
> > > responsibility.
> > 
> > I doubt it makes sense to add this to the GUC system. I think it'd be better
> > to just use the GUC value as the desired "target" configuration and have a
> > function or a show-only GUC for reporting the current size.
> > 
> > I don't think you can just block application of the GUC until the resize is
> > complete. E.g. what if the value was too big and the new configuration needs
> > to be fixed to be lower?
> 
> I think it was a bit hasty to post another version of the patch without
> the design changes we've agreed upon last time. I'm still working on
> that (sorry, it takes time, I haven't written so much Perl for testing
> since forever), the current implementation doesn't include anything with
> GUC to simplify the discussion. I'm still convinced that multi-step GUC
> changing makes sense, but it has proven to be more complicated than I
> anticipated, so I'll spin up another thread to discuss when I come to
> it.

FWIW, I'm fairly convinced it's a completely dead end.




> > > From e2f48da8a8206711b24e34040d699431910fbf9c Mon Sep 17 00:00:00 2001
> > > From: Dmitrii Dolgov <9erthalion6@gmail.com>
> > > Date: Tue, 17 Jun 2025 11:47:04 +0200
> > > Subject: [PATCH 06/16] Address space reservation for shared memory
> > > 
> > > Currently the shared memory layout is designed to pack everything tight
> > > together, leaving no space between mappings for resizing. Here is what it
> > > looks like for one mapping in /proc/$PID/maps, /dev/zero represents the
> > > anonymous shared memory we talk about:
> > >
> > >     00400000-00490000         /path/bin/postgres
> > >     ...
> > >     012d9000-0133e000         [heap]
> > >     7f443a800000-7f470a800000 /dev/zero (deleted)
> > >     7f470a800000-7f471831d000 /usr/lib/locale/locale-archive
> > >     7f4718400000-7f4718401000 /usr/lib64/libstdc++.so.6.0.34
> > >     ...
> > > 
> > > Make the layout more dynamic via splitting every shared memory segment
> > > into two parts:
> > > 
> > > * An anonymous file, which actually contains shared memory content. Such
> > >   an anonymous file is created via memfd_create, it lives in memory,
> > >   behaves like a regular file and is semantically equivalent to
> > >   anonymous memory allocated via mmap with MAP_ANONYMOUS.
> > > 
> > > * A reservation mapping, whose size is much larger than the required shared
> > >   segment size. This mapping is created with flags PROT_NONE (which
> > >   makes sure the reserved space is not used), and MAP_NORESERVE (to not
> > >   count the reserved space against memory limits). The anonymous file is
> > >   mapped into this reservation mapping.
> > 
> > The commit message fails to explain why, if we're already relying on
> > MAP_NORESERVE, we need to do anything else. Why can't we just have one maximally
> > sized allocation that's marked MAP_NORESERVE for all the parts that we don't
> > yet need?
> 
> How do we return memory to the OS in that case? Currently it's done
> explicitly via truncating the anonymous file.

madvise with MADV_DONTNEED or MADV_REMOVE.
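
A minimal sketch of what that could look like, assuming Linux and a single
MAP_SHARED | MAP_ANONYMOUS mapping. The helper names reserve_max() and
release_tail() are made up for illustration and are not taken from the patch
set. One caveat worth stating: on a shared mapping, MADV_DONTNEED only drops
the calling process's page table entries, while MADV_REMOVE punches a hole in
the backing store, so the sketch uses the latter to actually give memory back.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: create one maximally sized shared anonymous mapping.
 * MAP_NORESERVE asks the kernel not to reserve backing for the whole range
 * up front; pages only get backing once they are touched.
 * Returns NULL on failure.
 */
static void *
reserve_max(size_t max_size)
{
	void	   *p = mmap(NULL, max_size, PROT_READ | PROT_WRITE,
						 MAP_SHARED | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	return (p == MAP_FAILED) ? NULL : p;
}

/*
 * Hypothetical helper: shrink the used part of the mapping from old_used to
 * new_used bytes and return the pages in between to the OS. Both sizes are
 * assumed to be multiples of the page size. The address range stays mapped,
 * and touching it again yields zero-filled pages.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
release_tail(char *base, size_t old_used, size_t new_used)
{
	assert(new_used <= old_used);

	if (new_used == old_used)
		return 0;

	/* MADV_REMOVE requires a shared, writable mapping. */
	return madvise(base + new_used, old_used - new_used, MADV_REMOVE);
}
```

With this shape, growing needs no system call at all (the already mapped
range is simply used further), and shrinking is a single release_tail() call
in one process, since the hole is punched in the shared backing store rather
than in one process's view of it.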

Greetings,

Andres Freund