Re: Changing shared_buffers without restart
From: Dmitry Dolgov <9erthalion6@gmail.com>
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, pgsql-hackers@postgresql.org, Robert Haas <robertmhaas@gmail.com>
Date: 2025-04-21T09:29:59Z
Lists: pgsql-hackers
Commits
- Remove PG_MMAP_FLAGS from mem.h: c100340729b6, 19 (unreleased), landed
- Improve runtime and output of tests for replication slots checkpointing: 4464fddf7b50, 18.0, cited
- Revert support for improved tracking of nested queries: f85f6ab051b7, 18.0, cited
- Use exported symbols list on macOS for loadable modules as well: 3feff3916ee1, 18.0, cited
- Add support for basic NUMA awareness: 65c298f61fc7, 18.0, cited
- Avoid unnecessary copying of a string in pg_restore.c: 5e1915439085, 18.0, cited
- aio: Infrastructure for io_method=worker: 55b454d0e140, 18.0, cited
- Improve InitShmemAccess() prototype: 2a7b2d97171d, 18.0, landed

> On Fri, Apr 18, 2025 at 09:17:21PM GMT, Thomas Munro wrote:
> I was imagining that you might map some maximum possible size at the
> beginning to reserve the address space permanently, and then adjust
> the virtual memory object's size with ftruncate as required to provide
> backing. Doesn't that achieve the goal with fewer steps, using only
> portable* POSIX stuff, and keeping all pointers stable?

Ah, I see what you folks mean. So in the latest patch there is a single
large shared memory area reserved with PROT_NONE + MAP_NORESERVE. This
area is logically divided between shmem segments, and each segment is
mmap'd out of it and can be resized within these logical boundaries.
Now the suggestion is to have one reserved area for each segment, and
instead of really mmap'ing something out of it, manage memory via
ftruncate.

Yeah, that would work and would allow us to avoid MAP_FIXED and mremap,
which are questionable from a portability point of view. This leaves
memfd_create, and I'm still not completely clear on its portability --
it seems to be specific to Linux, but others provide compatible
implementations as well.

Let me experiment with this idea a bit, I would like to make sure there
are no other limitations we might face.

> I understand that pointer stability may not be required

Just to clarify, the current patch maintains this property (stable
pointers), which I also see as mandatory for any possible
implementation.

> *You might also want to use fallocate after ftruncate on Linux to
> avoid SIGBUS on allocation failure on first touch page fault, which
> raises portability questions since it's unspecified whether you can do
> that with shm fds and fails on some systems, but let's call that an
> independent topic as it's not affected by this choice.

I'm afraid it would be strictly necessary to do fallocate, otherwise
we're back where we were before reservation accounting for huge pages
in Linux (lots of people were facing unexpected SIGBUS when dealing
with cgroups).

> TIL that mmap(size, fd) will actually extend a hugetlb memfd as a side
> effect on Linux, as if you had called ftruncate on it (fully allocated
> huge pages I expected up to the object's size, just not magical size
> changes beyond that when I merely asked to map it). That doesn't
> happen for regular page size, or for any page size on my local OS's
> shm objects and doesn't seem to fit mmap's job description given an
> fd*, but maybe I'm just confused. Anyway, a workaround seems to be
> to start out with PROT_NONE and MAP_NORESERVE, then mprotect(PROT_READ
> | PROT_WRITE) new regions after extending with ftruncate(), at least
> in simple tests...

Right, it's similar to the currently implemented space reservation,
which also goes with PROT_NONE and MAP_NORESERVE. I assume it boils
down to the way memory reservation accounting works in Linux.
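
For illustration only, here is a minimal sketch of the scheme discussed above: reserve the maximum address range once with PROT_NONE and MAP_NORESERVE, then grow or shrink the backing with ftruncate, fallocate and mprotect so that the base pointer never moves. It assumes Linux with regular page size and a glibc that exposes memfd_create. The names ResizableSegment, segment_reserve and segment_resize are hypothetical and do not come from the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one resizable segment (not a PostgreSQL type). */
typedef struct ResizableSegment
{
	int			fd;			/* memfd providing the backing */
	char	   *base;		/* start of the reserved range, never changes */
	size_t		reserved;	/* maximum size, reserved up front */
	size_t		used;		/* currently backed and accessible size */
} ResizableSegment;

/* Reserve address space for max_size bytes without committing any memory. */
int
segment_reserve(ResizableSegment *seg, size_t max_size)
{
	void	   *p;

	seg->fd = memfd_create("sketch_segment", MFD_CLOEXEC);
	if (seg->fd < 0)
		return -1;

	/* The object is still empty; the mapping is inaccessible for now. */
	p = mmap(NULL, max_size, PROT_NONE, MAP_SHARED | MAP_NORESERVE,
			 seg->fd, 0);
	if (p == MAP_FAILED)
	{
		close(seg->fd);
		return -1;
	}
	seg->base = p;
	seg->reserved = max_size;
	seg->used = 0;
	return 0;
}

/* Grow or shrink the accessible part; sizes must be page multiples. */
int
segment_resize(ResizableSegment *seg, size_t new_size)
{
	size_t		page = (size_t) sysconf(_SC_PAGESIZE);

	assert(seg->used <= seg->reserved);
	if (new_size % page != 0)
	{
		errno = EINVAL;
		return -1;
	}
	if (new_size > seg->reserved)
	{
		errno = ENOMEM;
		return -1;
	}

	if (new_size > seg->used)
	{
		size_t		delta = new_size - seg->used;

		if (ftruncate(seg->fd, (off_t) new_size) != 0)
			return -1;
		/* Allocate now, so a shortage fails here and not as SIGBUS later. */
		if (fallocate(seg->fd, 0, (off_t) seg->used, (off_t) delta) != 0)
		{
			int			save_errno = errno;

			(void) ftruncate(seg->fd, (off_t) seg->used);
			errno = save_errno;
			return -1;
		}
		if (mprotect(seg->base + seg->used, delta,
					 PROT_READ | PROT_WRITE) != 0)
			return -1;
	}
	else if (new_size < seg->used)
	{
		/* Revoke access first, then release the backing pages. */
		if (mprotect(seg->base + new_size, seg->used - new_size,
					 PROT_NONE) != 0)
			return -1;
		if (ftruncate(seg->fd, (off_t) new_size) != 0)
			return -1;
	}
	seg->used = new_size;
	return 0;
}
```

A caller would invoke segment_reserve once with the maximum size and then segment_resize whenever the size changes; seg.base stays valid throughout. A hugetlb memfd would presumably need huge-page-aligned sizes and, per the observation quoted above, may behave differently at mmap time, so this sketch does not cover that case.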