Re: Changing shared_buffers without restart
From: Konstantin Knizhnik <knizhnik@garret.ru>
To: Dmitry Dolgov <9erthalion6@gmail.com>, pgsql-hackers@postgresql.org
Cc: Robert Haas <robertmhaas@gmail.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: 2025-04-17T11:21:07Z
Lists: pgsql-hackers
On 25/02/2025 11:52 am, Dmitry Dolgov wrote:
>> On Fri, Oct 18, 2024 at 09:21:19PM GMT, Dmitry Dolgov wrote:
>> TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via
>> changing shared memory mapping layout. Any feedback is appreciated.

Hi Dmitry,

I am sorry that I have not participated in the discussion in this thread from the very beginning, although I am also very interested in dynamic shared buffer resizing and even proposed my own implementation of it: https://github.com/knizhnik/postgres/pull/2, based on memory ballooning and using `madvise`. And it really works (it returns unused memory to the system).

This PoC allowed me to understand the main drawbacks of this approach:

1. Performance of the Postgres CLOCK page eviction algorithm depends on the number of shared buffers. My first naive attempt, just marking unused buffers as invalid, caused significant performance degradation with

pgbench -c 32 -j 4 -T 100 -P1 -M prepared -S

(here `shared_buffers` is the maximal shared buffers size and `available_buffers` is the used part; -1 means all buffers are available, i.e. no ballooning):

| shared_buffers | available_buffers | TPS  |
| -------------- | ----------------- | ---- |
| 128MB          | -1                | 280k |
| 1GB            | -1                | 324k |
| 2GB            | -1                | 358k |
| 32GB           | -1                | 350k |
| 2GB            | 128MB             | 130k |
| 2GB            | 1GB               | 311k |
| 32GB           | 128MB             | 13k  |
| 32GB           | 1GB               | 140k |
| 32GB           | 2GB               | 348k |

My first thought was to replace CLOCK with an LRU based on a doubly-linked list. As there is no lockless doubly-linked list implementation, it needs some global lock, and this lock can become a bottleneck. The standard solution is partitioning: use N LRU lists instead of one, just like the partitioned hash table used by the buffer manager to look up buffers. Actually, we can use the same partition locks to protect the LRU lists. But it is not clear what to do with ring buffers (strategies). So I decided not to perform such a revolution in bufmgr, but to optimize CLOCK to skip reserved buffers more efficiently: just add a `skip_count` field to the buffer descriptor. And it helps!
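The email does not spell out how `skip_count` is used, so here is a minimal sketch of one possible interpretation, assuming each descriptor's `skip_count` holds the number of unavailable (ballooned) buffers directly following it, so the clock hand can jump over a reserved run instead of visiting every slot. `SketchBufDesc`, `recompute_skip_counts()`, `clock_sweep()` and `steps_for_one_eviction()` are hypothetical names for illustration only, not PostgreSQL's bufmgr API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBUF 64

/* Hypothetical, simplified buffer descriptor; not PostgreSQL's BufferDesc. */
typedef struct
{
    bool     available;   /* false once the buffer's memory is ballooned away */
    uint32_t usage_count; /* CLOCK reference counter */
    uint32_t skip_count;  /* unavailable buffers directly after this one (wrapping) */
} SketchBufDesc;

/* Recompute skip_count after a resize; O(n^2) worst case, fine for a sketch. */
static void
recompute_skip_counts(SketchBufDesc *bufs, int n)
{
    for (int i = 0; i < n; i++)
    {
        uint32_t skip = 0;
        int      j = (i + 1) % n;

        while (!bufs[j].available && skip < (uint32_t) (n - 1))
        {
            skip++;
            j = (j + 1) % n;
        }
        bufs[i].skip_count = skip;
    }
}

/*
 * One CLOCK sweep: advance the hand until an available buffer with
 * usage_count == 0 is found, decrementing usage counts along the way.
 * With use_skip, the hand jumps over runs of unavailable buffers.
 * Requires at least one available buffer, otherwise it never terminates.
 */
static int
clock_sweep(SketchBufDesc *bufs, int n, int *hand, bool use_skip, int *steps)
{
    for (;;)
    {
        int cur = *hand;

        *hand = (cur + 1 + (use_skip ? (int) bufs[cur].skip_count : 0)) % n;
        (*steps)++;
        if (!bufs[cur].available)
            continue;           /* the naive sweep still visits reserved slots */
        if (bufs[cur].usage_count == 0)
            return cur;
        bufs[cur].usage_count--;
    }
}

/*
 * Make the first n_available of NBUF buffers usable, each with usage_count 1,
 * and return how many hand moves a single eviction needs.
 */
static int
steps_for_one_eviction(int n_available, bool use_skip)
{
    SketchBufDesc bufs[NBUF];
    int           hand = 0;
    int           steps = 0;

    assert(n_available > 0 && n_available <= NBUF);
    for (int i = 0; i < NBUF; i++)
    {
        bufs[i].available = i < n_available;
        bufs[i].usage_count = bufs[i].available ? 1 : 0;
    }
    recompute_skip_counts(bufs, NBUF);
    (void) clock_sweep(bufs, NBUF, &hand, use_skip, &steps);
    return steps;
}
```

In this toy setup (4 of 64 buffers available, each with usage_count 1), one eviction takes 65 hand moves with the naive sweep and 5 with skipping. These numbers describe only the sketch, not the pgbench measurements above; the point is that the naive sweep's cost grows with the maximal size rather than the available size.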
Now the worst case, shared_buffers/available_buffers = 32GB/128MB, shows the same performance (280k TPS) as shared_buffers=128MB without ballooning.

2. There are several data structures in Postgres whose size depends on the number of buffers. In my patch I used the dynamic shared buffer size in some cases, but if a structure has to be allocated in shared memory, the maximal size still has to be used. We have the buffers themselves (8 kB per buffer), then the main BufferDescriptors array (64 B), the BufferIOCVArray (16 B), checkpoint's CkptBufferIds (20 B), and the hashmap on the buffer cache (24 B + 8 B/entry). 128 bytes per 8 kB buffer (~1.5%) does not seem like a large overhead, but it becomes quite noticeable when the size range spans more than two orders of magnitude. E.g. to support scaling from 0.5GB to 128GB, with 128 bytes/buffer we'd have ~2GiB of static overhead on only 0.5GiB of actual buffers.

3. `madvise` is not portable.

Certainly you have moved much further in your proposal compared with my PoC (including huge pages support). But it is still not quite clear to me how you are going to solve the problem of large memory overhead in the case of ~100x variation of the shared buffers size. I