Re: Changing shared_buffers without restart

Konstantin Knizhnik <knizhnik@garret.ru>

From: Konstantin Knizhnik <knizhnik@garret.ru>
To: Dmitry Dolgov <9erthalion6@gmail.com>
Cc: pgsql-hackers@postgresql.org, Robert Haas <robertmhaas@gmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: 2025-04-18T07:06:23Z
Lists: pgsql-hackers

On 18/04/2025 12:26 am, Dmitry Dolgov wrote:
>> On Thu, Apr 17, 2025 at 02:21:07PM GMT, Konstantin Knizhnik wrote:
>>
>> 1. Performance of Postgres CLOCK page eviction algorithm depends on number
>> of shared buffers. My first naive attempt, just marking unused buffers as
>> invalid, caused a significant degradation of performance.
> Thanks for sharing!
>
> Right, but it concerns the case when the number of shared buffers is
> high, independently from whether it was changed online or with a
> restart, correct? In that case it's out of scope for this patch.
>
>> 2. There are several data structures in Postgres whose size depends on the
>> number of buffers.
>> In my patch I used in some cases dynamic shared buffer size, but if this
>> structure has to be allocated in shared memory then still maximal size has
>> to be used. We have the buffers themselves (8 kB per buffer), then the main
>> BufferDescriptors array (64 B), the BufferIOCVArray (16 B), checkpoint's
>> CkptBufferIds (20 B), and the hashmap on the buffer cache (24B+8B/entry).
>> 128 bytes per 8 kB buffer does not seem like a large overhead (~1%), but it
>> may be quite noticeable with size differences larger than 2 orders of
>> magnitude: e.g. to support scaling from 0.5 GB to 128 GB, with 128
>> bytes/buffer we'd have ~2GiB of static overhead on only 0.5GiB of actual
>> buffers.
> Not sure what you mean by using a maximal size, can you elaborate?
>
> In the current patch those structures are allocated as before, except
> each goes into a separate segment -- without any extra memory overhead
> as far as I see.

Thank you for the explanation. I am sorry that I did not study your
patch carefully before writing: it seemed to me that you were placing
only the content of shared buffers in a separate segment.
Now I see that I was wrong, and that this is actually the main difference
from the memory ballooning approach I have used. Since you are allocating
buffer descriptors and the hash table in the same segment,
there is no extra memory overhead.
The only drawback is that we lose the content of shared buffers in
case of a resize. That is unfortunate, but it looks like there is no
better alternative.
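
For reference, the overhead arithmetic from the quoted text can be
checked with a small sketch. The 8 kB block size and the rounded 128
bytes of metadata per buffer are the figures quoted above, not values
read from the headers, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the thread; assumptions for illustration only. */
#define BUFFER_BLOCK_SIZE    UINT64_C(8192)  /* 8 kB of page data per buffer */
#define PER_BUFFER_METADATA  UINT64_C(128)   /* rounded metadata per buffer */

/* Number of buffers needed for a pool of the given size in bytes. */
static uint64_t
buffers_for_pool(uint64_t pool_bytes)
{
    return pool_bytes / BUFFER_BLOCK_SIZE;
}

/*
 * Metadata bytes that have to be reserved up front if the per-buffer
 * structures are sized for the maximal pool size.
 */
static uint64_t
static_metadata_bytes(uint64_t max_pool_bytes)
{
    assert(max_pool_bytes >= BUFFER_BLOCK_SIZE);
    return buffers_for_pool(max_pool_bytes) * PER_BUFFER_METADATA;
}
```

With a 128 GiB maximum this gives 2 GiB of metadata, whatever the
current pool size is. That is the cost a "maximal size" allocation would
pay.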

But there are still some dependencies on the shared buffers size which
are not addressed in this PR.
I am not sure how critical they are, or whether anything can be done
about them, but I want to at least enumerate them:

1. Checkpointer: the maximal number of checkpointer requests depends on
NBuffers. So if we start with small shared buffers and then upscale, it
may cause too frequent checkpoints:

Size
CheckpointerShmemSize(void)
...
         size = add_size(size, mul_size(NBuffers, sizeof(CheckpointerRequest)));

CheckpointerShmemInit(void)
         CheckpointerShmem->max_requests = NBuffers;
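
A minimal sketch of the consequence, assuming nothing beyond the two
quoted lines. The struct and function names are hypothetical stand-ins,
not the real checkpointer code. The capacity is taken from NBuffers
once and does not follow a later resize:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the checkpointer's shared request queue. */
typedef struct RequestQueueSketch
{
    int num_requests;   /* requests currently queued */
    int max_requests;   /* capacity, fixed when shared memory is set up */
} RequestQueueSketch;

/* Like the quoted CheckpointerShmemInit(): capacity = NBuffers at startup. */
static void
queue_init(RequestQueueSketch *q, int nbuffers_at_startup)
{
    assert(nbuffers_at_startup > 0);
    q->num_requests = 0;
    q->max_requests = nbuffers_at_startup;
}

/*
 * Returns false once the queue is full. Growing shared buffers later
 * does not change max_requests in this sketch.
 */
static bool
queue_push(RequestQueueSketch *q)
{
    if (q->num_requests >= q->max_requests)
        return false;
    q->num_requests++;
    return true;
}
```

Starting with 128 buffers (1MB), the queue accepts 128 requests and
rejects the next one, even if the buffer pool has meanwhile been
resized to many gigabytes.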

2. XLOG: the number of xlog buffers is calculated from the number of
shared buffers:

XLOGChooseNumBuffers(void)
{
...
      xbuffers = NBuffers / 32;

This should not cause errors, but may be inefficient if, once again,
we start with tiny shared buffers.
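
To put numbers on this, here is a standalone sketch of the calculation.
Only the NBuffers / 32 line comes from the excerpt above. The clamps
follow the documented wal_buffers default (not less than 64kB, not more
than one WAL segment), and the 8 kB block and 16MB segment sizes are
assumed defaults. The function and macro names are hypothetical.

```c
#include <assert.h>

/* Assumed defaults: 8 kB WAL blocks and 16MB WAL segments. */
#define XLOG_BLCKSZ_SKETCH       8192
#define WAL_SEGMENT_SIZE_SKETCH  (16 * 1024 * 1024)

/* Number of WAL buffers chosen for a given number of shared buffers. */
static int
choose_num_xlog_buffers(int nbuffers)
{
    int xbuffers;
    int max_xbuffers = WAL_SEGMENT_SIZE_SKETCH / XLOG_BLCKSZ_SKETCH;

    assert(nbuffers > 0);

    xbuffers = nbuffers / 32;       /* the line quoted in the excerpt */
    if (xbuffers > max_xbuffers)    /* assumed cap: one WAL segment */
        xbuffers = max_xbuffers;
    if (xbuffers < 8)               /* assumed floor: 64kB */
        xbuffers = 8;
    return xbuffers;
}
```

Under these assumptions, a server started with 1MB of shared buffers
(128 buffers) gets the minimum of 8 WAL buffers, while one started with
128GB gets the maximum of 2048. If the value is computed only at
startup, the first server stays at 8 after upscaling.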

3. AIO: AIO max concurrency is also calculated from the number of
shared buffers:

AioChooseMaxConcurrency(void)
{
...

     max_proportional_pins = NBuffers / max_backends;

For small shared buffers (e.g. 1MB), there will be no concurrency at all.
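
The same kind of sketch for this case. Only the division comes from the
excerpt above. The floor of 1 is an assumption, the cap of 64 follows
the documented io_max_concurrency default, and the function name is
hypothetical.

```c
#include <assert.h>

/* Per-backend I/O concurrency chosen from the shared buffers size. */
static int
choose_max_concurrency(int nbuffers, int max_backends)
{
    int max_proportional_pins;

    assert(nbuffers > 0 && max_backends > 0);

    /* the line quoted in the excerpt */
    max_proportional_pins = nbuffers / max_backends;

    if (max_proportional_pins < 1)      /* assumed floor */
        max_proportional_pins = 1;
    if (max_proportional_pins > 64)     /* cap from the documented default */
        max_proportional_pins = 64;
    return max_proportional_pins;
}
```

With 128 buffers (1MB) and 100 backends the result is 1, i.e. one I/O
at a time per backend. With 16384 buffers (128MB) the same
configuration already reaches the cap of 64.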

So none of these issues can cause an error, just inefficient behavior.
But if we want to start with very small shared buffers and then increase
them on demand,
then it can be a problem.

In all three cases NBuffers is used not just to calculate some
threshold value, but also to determine the size of a structure in
shared memory.
The straightforward solution is to place them in the same segment as
shared buffers. But I am not sure how difficult that will be to implement.