Re: Changing shared_buffers without restart
From: Dmitry Dolgov <9erthalion6@gmail.com>
To: Matthias van de Meent <boekewurm+postgres@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, Robert Haas <robertmhaas@gmail.com>, pgsql-hackers@postgresql.org
Date: 2024-11-29T16:47:27Z
Lists: pgsql-hackers
Commits
- Remove PG_MMAP_FLAGS from mem.h
  c100340729b6, 19 (unreleased), landed
- Improve runtime and output of tests for replication slots checkpointing.
  4464fddf7b50, 18.0, cited
- Revert support for improved tracking of nested queries
  f85f6ab051b7, 18.0, cited
- Use exported symbols list on macOS for loadable modules as well
  3feff3916ee1, 18.0, cited
- Add support for basic NUMA awareness
  65c298f61fc7, 18.0, cited
- Avoid unnecessary copying of a string in pg_restore.c
  5e1915439085, 18.0, cited
- aio: Infrastructure for io_method=worker
  55b454d0e140, 18.0, cited
- Improve InitShmemAccess() prototype
  2a7b2d97171d, 18.0, landed

> On Fri, Nov 29, 2024 at 01:56:30AM GMT, Matthias van de Meent wrote:
>
> I mean, we can do the following to get a nice contiguous empty address
> space no other mmap(NULL)s will get put into:
>
>     /* reserve size bytes of memory */
>     base = mmap(NULL, size, PROT_NONE, ...flags, ...);
>     /* use the first small_size bytes of that reservation */
>     allocated_in_reserved = mmap(base, small_size, PROT_READ |
>                                  PROT_WRITE, MAP_FIXED, ...);
>
> With the PROT_NONE protection option the OS doesn't actually allocate
> any backing memory, but guarantees no other mmap(NULL, ...) will get
> placed in that area such that it overlaps with that allocation until
> the area is munmap-ed, thus allowing us to reserve a chunk of address
> space without actually using (much) memory.

From what I understand it's not much different from the scenario when we
just map as much as we want in advance. The actual memory will not be
allocated in either case due to CoW, and oom_score seems to be the same.

I agree it sounds attractive, but after some experimenting it looks like
it won't work with huge pages inside a cgroup v2 (=container). The reason
is that Linux has recently learned to apply memory reservation limits on
hugetlb inside a cgroup, and those limits are applied at mmap time.
Nowadays this feature is often configured out of the box in various
container orchestrators, meaning that a scenario "set hugetlb=1GB on a
container, reserve 32GB with PROT_NONE" will fail.

I've also tried to mix and match: reserve some address space via a
non-hugetlb mapping and allocate a hugetlb mapping out of it, but that
doesn't work either (the smaller mmap complains about MAP_HUGETLB with
EINVAL).
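
For reference, the reserve-then-commit pattern from the quoted snippet can be written out roughly as below. This is only a minimal sketch with regular (non-hugetlb) pages, so it does not reproduce the cgroup failure described above. The helper names, the flag choices (MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE) and the sizes are my assumptions for illustration; they are not taken from any patch in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Reserve `size` bytes of address space with no access rights.
 * Hypothetical helper; the flags are an assumption, the quoted snippet
 * leaves them unspecified.
 */
static void *reserve_address_space(size_t size)
{
    void *base = mmap(NULL, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    return base == MAP_FAILED ? NULL : base;
}

/*
 * Replace the first `used` bytes of the reservation with a readable and
 * writable mapping. MAP_FIXED overwrites whatever is mapped at `base`,
 * so it must only be used on a range we reserved ourselves.
 */
static void *commit_prefix(void *base, size_t used)
{
    void *p = mmap(base, used, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Reserve 64MB, use the first 1MB, then release everything. */
int reserve_then_commit_demo(void)
{
    const size_t reserved = (size_t) 64 << 20;
    const size_t used = (size_t) 1 << 20;
    unsigned char *buf;
    void *base;
    int ok;

    base = reserve_address_space(reserved);
    if (base == NULL)
        return -1;

    buf = commit_prefix(base, used);
    if (buf == NULL)
    {
        munmap(base, reserved);
        return -1;
    }

    /* MAP_FIXED places the mapping exactly at the requested address. */
    assert((void *) buf == base);

    /* Touching the committed prefix is what actually faults pages in. */
    memset(buf, 0xAB, used);
    ok = (buf[0] == 0xAB && buf[used - 1] == 0xAB);

    /* One munmap covers both the committed prefix and the PROT_NONE tail. */
    if (munmap(base, reserved) != 0)
        return -1;

    return ok ? 0 : -1;
}
```

Calling reserve_then_commit_demo() returns 0 when both mappings succeed. Growing the usable area later would be another commit_prefix() call with a larger length on the same base, as long as it stays within the reserved range.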