Re: Changing shared_buffers without restart

Tomas Vondra <tomas@vondra.me>

From: Tomas Vondra <tomas@vondra.me>

To: Dmitry Dolgov <9erthalion6@gmail.com>

Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Thomas Munro <thomas.munro@gmail.com>, pgsql-hackers@postgresql.org, Jack Ng <Jack.Ng@huawei.com>, Ni Ku <jakkuniku@gmail.com>

Date: 2025-07-07T11:57:42Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Remove PG_MMAP_FLAGS from mem.h
- c100340729b6 19 (unreleased) landed
Improve runtime and output of tests for replication slots checkpointing.
- 4464fddf7b50 18.0 cited
Revert support for improved tracking of nested queries
- f85f6ab051b7 18.0 cited
Use exported symbols list on macOS for loadable modules as well
- 3feff3916ee1 18.0 cited
Add support for basic NUMA awareness
- 65c298f61fc7 18.0 cited
Avoid unnecessary copying of a string in pg_restore.c
- 5e1915439085 18.0 cited
aio: Infrastructure for io_method=worker
- 55b454d0e140 18.0 cited
Improve InitShmemAccess() prototype
- 2a7b2d97171d 18.0 landed


On 7/5/25 12:35, Dmitry Dolgov wrote:
>> On Fri, Jul 04, 2025 at 05:23:29PM +0200, Tomas Vondra wrote:
>>>> 2) pending GUC changes
>>>>
>>>> Perhaps this should be a separate utility command, or maybe even just
>>>> a new ALTER SYSTEM variant? Or even just a function, similar to what
>>>> the "online checksums" patch did, possibly combined with a bgworder
>>>> (but probably not needed, there are no db-specific tasks to do).
>>>
>>> This is one topic we still actively discuss, but haven't had much
>>> feedback otherwise. The pros and cons seem to be clear:
>>>
>>> * Utilizing the existing GUC mechanism would allow treating
>>>   shared_buffers as any other configuration, meaning that potential
>>>   users of this feature don't have to do anything new to use it -- they
>>>   still can use whatever method they prefer to apply new configuration
>>>   (pg_reload_conf, pg_ctr reload, maybe even sending SIGHUP directly).
>>>
>>>   I'm also wondering if it's only shared_buffers, or some other options
>>>   could use similar approach.
>>>
>>
>> I don't know. What are the "potential users" of this feature? I don't
>> recall any, but there may be some. How do we know the new pending flag
>> will work for them too?
> 
> It could be potentialy useful for any GUC that controls a resource
> shared between backend, and requires restart. To make this GUC
> changeable online, every backend has to perform some action, and they
> have to coordinate to make sure things are consistent -- exactly the use
> case we're trying to address, shared_buffers is just happened to be one
> of such resources. While I agree that the currently implemented
> interface is wrong (e.g. it doesn't prevent pending GUCs from being
> stored in PG_AUTOCONF_FILENAME, this has to happen only when the new
> value is actually applied), it still makes sense to me to allow more
> flexible lifecycle for certain GUC.
> 
> An example I could think of is shared_preload_libraries. If we ever want
> to do a hot reload of libraries, this will follow the procedure above:
> every backend has to do something like dlclose / dlopen and make sure
> that other backends have the same version of the library. Another maybe
> less far fetched example is max_worker_processes, which AFAICT is mostly
> used to control number of slots in shared memory (altough it's also
> stored in the control file, which makes things more complicated).
> 

Not sure. My concern is the config reload / GUC assign hook was not
designed with this use case in mind, and we'll run into issues. I also
dislike the "async" nature of this, which makes it harder to e.g. abort
the change, etc.

>>> * Having a separate utility command is a mighty simplification, which
>>>   helps avoiding problems you've described above.
>>>
>>> So far we've got two against one in favour of simple utility command, so
>>> we can as well go with that.
>>>
>>
>> Not sure voting is a good way to make design decisions ...
> 
> I'm somewhat torn between those two options myself. The more I think
> about this topic, the more I convinced that pending GUC makes sense, but
> the more work I see needed to implement that. Maybe a good middle ground
> is to go with a simple utility command, as Ashutosh was suggesting, and
> keep pending GUC infrastructure on top of that as an optional patch.
> 

What about a simple function? Probably not as clean as a proper utility
command, and it implies a transaction - not sure if that could be a
problem for some part of this.

>>>> 3) max_available_memory
>>>>
>>>> I think the GUC should specify the maximum shared_buffers we want to
>>>> allow, and then we'd work out the total to pre-allocate? Considering
>>>> we're only allowing to resize shared_buffers, that should be pretty
>>>> trivial. Yes, it might happen that the "total limit" happens to exceed
>>>> the available memory or something, but we already have the problem
>>>> with shared_buffers. Seems fine if we explain this in the docs, and
>>>> perhaps print the calculated memory limit on start.
>>>
>>> Somehow I'm not following what you suggest here. You mean having the
>>> maximum shared_buffers specified, but not as a separate GUC?
>>
>> My suggestion was to have a guc max_shared_buffers. Based on that you
>> can easily calculate the size of all other segments dependent on
>> NBuffers, and reserve memory for that.
> 
> Got it, ok.
> 
>>>> Moreover, all of the above is for mappings sized based on NBuffers. But
>>>> if we allocate 10% for MAIN_SHMEM_SEGMENT, won't that be a problem the
>>>> moment someone increases of max_connection, max_locks_per_transaction
>>>> and possibly some other stuff?
>>>
>>> Can you elaborate, what do you mean by that? Increasing max_connection,
>>> etc. leading to increased memory consumption in the MAIN_SHMEM_SEGMENT,
>>> but the ratio is for memory reservation only.
>>>
>>
>> Stuff like PGPROC, fast-path locks etc. are allocated as part of
>> MAIN_SHMEM_SEGMENT, right? Yet the ratio assigns 10% of the maximum
>> space for that. If I significantly increase GUCs like max_connections or
>> max_locks_per_transaction, how do you know it didn't exceed the 10%?
> 
> Still don't see the problem. The 10% we're talking about is the reserved
> space, thus it affects only shared memory resizing operation and nothing
> else. The real memory allocated is less than or equal to the reserved
> size, but is allocated and managed completely in the same way as without
> the patch, including size calculations. If some GUCs are increased and
> drive real memory usage high, it will be handled as before. Are we on
> the same page about this?
> 

How do you know reserving 10% is sufficient? Imagine I set

  max_available_memory = '256MB'
  max_connections = 1000000
  max_locks_per_transaction = 10000

How do you know it's not more than 10% of the available memory?

FWIW if I add a simple assert to CreateAnonymousSegment

  Assert(mapping->shmem_reserved >= allocsize);

it crashes even with just the max_available_memory=256MB

  #4  0x0000000000b74fbd in ExceptionalCondition (conditionName=0xe25920
"mapping->shmem_reserved >= allocsize", fileName=0xe251e7 "pg_shmem.c",
lineNumber=878) at assert.c:66

because we happen to execute it with this:

  mapping->shmem_reserved 26845184  allocsize 125042688

I think I mentioned a similar crash earlier, not sure if that's the same
issue or a different one.

>>>> 11) Actually, what's the difference between the contents of Mappings
>>>>     and Segments? Isn't that the same thing, indexed in the same way?
>>>>     Or could it be unified? Or are they conceptually different thing?
>>>
>>> Unless I'm mixing something badly, the content is the same. The relation
>>> is a segment as a structure "contains" a mapping.
>>>
>> Then, why do we need to track it in two places? Doesn't it just increase
>> the likelihood that someone misses updating one of them?
> 
> To clarify, under "contents" I mean the shared memory content (the
> actual data) behind both "segment" and the "mapping", maybe you had
> something else in mind.
> 
> On the surface of it those are two different data structures that have
> mostly different, but related, fields: a shared memory segment contains
> stuff needed for working with memory (header, base, end, lock), mapping
> has more lower level details, e.g.  reserved space, fd, IPC key. The
> only common fields are size and address, maybe I can factor them out to
> not repeat.
> 

OK, I think I'm just confused by the ambiguous definitions of
segment/mapping. It'd be good to document/explain this in a comment
somewhere.

>>>> 2) In fact, what happens if the user tries to resize to a value that is
>>>>    too large for one of the segments? How would the system know before
>>>>    starting the resize (and failing)?
>>>
>>> This type of situation is handled (doing hard stop) in the latest
>>> version, because all the necessary information is present in the mapping
>>> structure.
>>>
>> I don't know, but crashing the instance (I assume that's what you mean
>> by hard stop) does not seem like something we want to do. AFAIK the GUC
>> hook should be able to determine if the value is too large, and reject
>> it at that point. Not proceed and crash everything.
> 
> I see, you're pointing out that it would be good to have more validation
> at the GUC level, right?

Well, that'd be a starting point. We definitely should not allow setting
a value that end up crashing an instance (it does not matter if it's
because of FATAL or hitting a segfault/sigbut somewhere).

cheers

-- 
Tomas Vondra