Re: Changing shared_buffers without restart

Dmitry Dolgov <9erthalion6@gmail.com>

From: Dmitry Dolgov <9erthalion6@gmail.com>
To: Peter Eisentraut <peter@eisentraut.org>
Cc: pgsql-hackers@postgresql.org
Date: 2024-11-19T13:29:12Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Remove PG_MMAP_FLAGS from mem.h

  2. Improve runtime and output of tests for replication slots checkpointing.

  3. Revert support for improved tracking of nested queries

  4. Use exported symbols list on macOS for loadable modules as well

  5. Add support for basic NUMA awareness

  6. Avoid unnecessary copying of a string in pg_restore.c

  7. aio: Infrastructure for io_method=worker

  8. Improve InitShmemAccess() prototype

> On Tue, Nov 19, 2024 at 01:57:00PM GMT, Peter Eisentraut wrote:
> On 18.10.24 21:21, Dmitry Dolgov wrote:
> > v1-0001-Allow-to-use-multiple-shared-memory-mappings.patch
> >
> > Preparation, introduces the possibility to work with many shmem mappings. To
> > make it less invasive, I've duplicated the shmem API to extend it with the
> > shmem_slot argument, while redirecting the original API to it. There are
> > probably better ways of doing that, I'm open for suggestions.
>
> After studying this a bit, I tend to think you should just change the
> existing APIs in place.  So for example,
>
> void *ShmemAlloc(Size size);
>
> becomes
>
> void *ShmemAlloc(int shmem_slot, Size size);
>
> There aren't that many callers, and all these duplicated interfaces almost
> add more new code than they save.
>
> It might be worth making exceptions for interfaces that are likely to be
> used by extensions.  For example, I see pg_stat_statements using
> ShmemInitStruct() and ShmemInitHash().  But that seems to be it.  Are there
> any other examples out there?  Maybe there are many more that I don't see
> right now.  But at least for the initialization functions, it doesn't seem
> worth it to preserve the existing interfaces exactly.
>
> In any case, I think the slot number should be the first argument.  This
> matches how MemoryContextAlloc() or also talloc() work.

Yeah, agree. I'll reshape this part, thanks.

> (Now here is an idea:  Could these just be memory contexts?  Instead of
> making six shared memory slots, could you make six memory contexts with a
> special shared memory type.  And ShmemAlloc becomes the allocation function,
> etc.?)

Sound interesting. I don't know how good the memory context interface
would fit here, but I'll do some investigation.

> I noticed the existing code made inconsistent use of PGShmemHeader * vs.
> void *, which also bled into your patch.  I made the attached little patch
> to clean that up a bit.

Right, it was bothering me the whole time, but not strong enough to make
me fix this in the PoC just yet.

> I suggest splitting the struct ShmemSegment into one struct for the three
> memory addresses and a separate array just for the slock_t's.  The former
> struct can then stay private in storage/ipc/shmem.c, only the locks need to
> be exported.
>
> Maybe rename ANON_MAPPINGS to something like NUM_ANON_MAPPINGS.
>
> Also, maybe some of this should be declared in storage/shmem.h rather than
> in storage/pg_shmem.h.  We have the existing ShmemLock in there, so it would
> be a bit confusing to have the per-segment locks elsewhere.
>
> [...]
>
> I'm wondering about this change:
>
> -#define PG_MMAP_FLAGS (MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE)
> +#define PG_MMAP_FLAGS                  (MAP_SHARED|MAP_HASSEMAPHORE)
>
> It looks like this would affect all mmap() calls, not only the one you're
> changing.  But that's the only one that uses this macro!  I don't understand
> why we need this; I don't see anything in the commit log about this ever
> being used for any portability.  I think we should just get rid of it and
> have mmap() use the right flags directly.
>
> I see that FreeBSD has a memfd_create() function.  Might be worth a try.
> Obviously, this whole thing needs a configure test for memfd_create()
> anyway.

Yep, those points make sense to me.

> > v1-0005-Use-anonymous-files-to-back-shared-memory-segment.patch
> >
> > Allows an anonyous file to back a shared mapping. This makes certain things
> > easier, e.g. mappings visual representation, and gives an fd for possible
> > future customizations.
>
> I think this could be a useful patch just by itself, without the rest of the
> series, because of
>
> > * By default, Linux will not add file-backed shared mappings into a
> > core dump, making it more convenient to work with them in PostgreSQL:
> > no more huge dumps to process.
>
> This could be significant operational benefit.
>
> When you say "by default", is this adjustable?  Does someone actually want
> the whole shared memory in their core file?  (If it's adjustable, is it also
> adjustable for anonymous mappings?)

Yes, there is /proc/<pid>/coredump_filter [1], that allows to specify
what to include. One can ask to exclude anon, file-backed and hugetlb
shared memory, with the only caveat that it's per process. I guess
normally no one wants to have a full shared memory in the coredump, but
there could be exceptions.

> I see that memfd_create() has a MFD_HUGETLB flag.  It's not very clear how
> that interacts with the MAP_HUGETLB flag for mmap().  Do you need to specify
> both of them if you want huge pages?

Correct, both (one flag in memfd_create and one for mmap) are needed to
use huge pages.

[1]: https://www.kernel.org/doc/html/latest/filesystems/proc.html#proc-pid-coredump-filter-core-dump-filtering-settings