Re: Changing shared_buffers without restart

From: Dmitry Dolgov <9erthalion6@gmail.com>

To: Robert Haas <robertmhaas@gmail.com>

Cc: pgsql-hackers@postgresql.org

Date: 2024-11-26T19:17:58Z

Lists: pgsql-hackers

Commits

Remove PG_MMAP_FLAGS from mem.h
- c100340729b6 19 (unreleased) landed
Improve runtime and output of tests for replication slots checkpointing.
- 4464fddf7b50 18.0 cited
Revert support for improved tracking of nested queries
- f85f6ab051b7 18.0 cited
Use exported symbols list on macOS for loadable modules as well
- 3feff3916ee1 18.0 cited
Add support for basic NUMA awareness
- 65c298f61fc7 18.0 cited
Avoid unnecessary copying of a string in pg_restore.c
- 5e1915439085 18.0 cited
aio: Infrastructure for io_method=worker
- 55b454d0e140 18.0 cited
Improve InitShmemAccess() prototype
- 2a7b2d97171d 18.0 landed

> On Mon, Nov 25, 2024 at 02:33:48PM GMT, Robert Haas wrote:
>
> I think the idea of having multiple shared memory segments is
> interesting and makes sense, but I would prefer to see them called
> "segments" rather than "slots" just as we do for DSMs. The name
> "slot" is somewhat overused, and invites confusion with replication
> slots, inter alia. I think it's possible that having multiple fixed
> shared memory segments will spell trouble on Windows, where we already
> need to use a retry loop to try to get the main shared memory segment
> mapped at the correct address. If there are multiple segments and we
> need whatever ASLR stuff happens on Windows to not place anything else
> overlapping with any of them, that means there's more chances for
> stuff to fail than if we just need one address range to be free.
> Granted, the individual ranges are smaller, so maybe it's fine? But I
> don't know.

I haven't had a chance to experiment with that on Windows, but I'm
hoping that, in the worst case, falling back to a single mapping via the
proposed infrastructure (with the consequent limitations) would be
acceptable.

> The big thing that worries me is synchronization, and while I've only
> looked at the patch set briefly, it doesn't look to me as though
> there's enough machinery here to make that work correctly. Suppose
> that shared_buffers=8GB (a million buffers) and I change it to
> shared_buffers=16GB (2 million buffers). As soon as any one backend
> has seen that changed and expanded shared_buffers, there's a
> possibility that some other backend which has not yet seen the change
> might see a buffer number greater than a million. If it tries to use
> that buffer number before it absorbs the change, something bad will
> happen. The most obvious way for it to see such a buffer number - and
> possibly the only one - is to do a lookup in the buffer mapping table
> and find a buffer ID there that was inserted by some other backend
> that has already seen the change.

Right, I haven't put much effort into synchronization yet. It's on my
list for the next iteration of the patch.
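
To make the hazard described in the quoted paragraph concrete, here is a
minimal toy sketch in C. It is not code from the patch: MappingTable,
Backend, table_insert and table_lookup are hypothetical names, and the
buffer counts assume the default 8 kB block size (8GB = 1048576 buffers,
16GB = 2097152 buffers).

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_SIZE 8            /* toy size, hypothetical */

/* Toy stand-in for the shared buffer mapping table. */
typedef struct MappingTable
{
    int buffer_id[TABLE_SIZE];  /* page -> buffer id, -1 means no entry */
} MappingTable;

/* Hypothetical per-backend state. */
typedef struct Backend
{
    int nbuffers_seen;          /* buffers this backend has absorbed */
} Backend;

typedef enum LookupResult
{
    LOOKUP_MISS,                /* no entry for this page */
    LOOKUP_OK,                  /* entry is inside the backend's view */
    LOOKUP_BEYOND_VIEW          /* entry points past the backend's view */
} LookupResult;

static void
table_init(MappingTable *t)
{
    int i;

    for (i = 0; i < TABLE_SIZE; i++)
        t->buffer_id[i] = -1;
}

/* A backend may only publish buffer ids that lie within its own view. */
static bool
table_insert(MappingTable *t, const Backend *b, int page, int buffer_id)
{
    if (page < 0 || page >= TABLE_SIZE)
        return false;
    if (buffer_id < 0 || buffer_id >= b->nbuffers_seen)
        return false;
    t->buffer_id[page] = buffer_id;
    return true;
}

/* Look up a page and report whether the id is usable by this backend. */
static LookupResult
table_lookup(const MappingTable *t, const Backend *b, int page, int *buffer_id)
{
    if (page < 0 || page >= TABLE_SIZE || t->buffer_id[page] < 0)
        return LOOKUP_MISS;
    *buffer_id = t->buffer_id[page];
    return (*buffer_id < b->nbuffers_seen) ? LOOKUP_OK : LOOKUP_BEYOND_VIEW;
}
```

If a backend that has already seen the expansion inserts buffer id
1500000, a backend still at 1048576 buffers gets LOOKUP_BEYOND_VIEW for
the same page until it absorbs the change. The toy lookup detects this
case explicitly; the concern in the quoted text is about code paths that
would use such an id without any check of this kind.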

> code, but I'm not sure exactly which points are safe. If we have no
> code anywhere that assumes the address of an unpinned buffer can't
> change before we pin it, then I guess the check for pins is the only
> thing we need, but I don't know that to be the case.

I'm probably missing something here. What scenario do you have in mind
in which the address of a buffer changes?

> I guess I would have imagined that a change like this would have to be
> done in phases. In phase 1, we'd tell all of the backends that
> shared_buffers had expanded to some new, larger value; but the new
> buffers wouldn't be usable for anything yet. Then, once we confirmed
> that everyone had the memo, we'd tell all the backends that those
> buffers are now available for use. If shared_buffers were contracted,
> phase 1 would tell all of the backends that shared_buffers had
> contracted to some new, smaller value. Once a particular backend
> learns about that, they will refuse to put any new pages into those
> high-numbered buffers, but the existing contents would still be valid.
> Once everyone has been told about this, we can go through and evict
> all of those buffers, and then let everyone know that's done. Then
> they shrink their mappings.

Yep, sounds good. I was pondering a cruder approach, but doing this in
phases seems to be the way to go.
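
The phased scheme from the quoted text could be sketched roughly as
below. This is only an illustration under stated assumptions, not the
patch's design: ResizeState, the generation counter used to confirm that
"everyone had the memo", and all function names are hypothetical, and
the eviction step is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

#define N_BACKENDS 3            /* toy number of backends, hypothetical */

typedef struct ResizeState
{
    int generation;             /* bumped on every announcement */
    int mapped_buffers;         /* buffers every backend must keep mapped */
    int usable_buffers;         /* buffers new pages may be placed into */
    int acked_gen[N_BACKENDS];  /* last generation each backend absorbed */
} ResizeState;

/* A backend absorbs the latest announcement. */
static void
backend_absorb(ResizeState *s, int backend)
{
    s->acked_gen[backend] = s->generation;
}

static bool
everyone_absorbed(const ResizeState *s)
{
    int i;

    for (i = 0; i < N_BACKENDS; i++)
        if (s->acked_gen[i] != s->generation)
            return false;
    return true;
}

/* Expand, phase 1: announce the larger size; new buffers not usable yet. */
static void
expand_phase1(ResizeState *s, int new_size)
{
    s->mapped_buffers = new_size;
    s->generation++;
}

/* Expand, phase 2: once all backends have absorbed, open the new buffers. */
static bool
expand_phase2(ResizeState *s)
{
    if (!everyone_absorbed(s))
        return false;
    s->usable_buffers = s->mapped_buffers;
    s->generation++;
    return true;
}

/* Shrink, phase 1: stop placing new pages into high-numbered buffers. */
static void
shrink_phase1(ResizeState *s, int new_size)
{
    s->usable_buffers = new_size;
    s->generation++;
}

/* Shrink, phase 2: once all have absorbed, evict and shrink the mapping. */
static bool
shrink_phase2(ResizeState *s)
{
    if (!everyone_absorbed(s))
        return false;
    /* eviction of buffers >= usable_buffers would happen here */
    s->mapped_buffers = s->usable_buffers;
    s->generation++;
    return true;
}
```

The point of keeping mapped_buffers and usable_buffers separate is that
between the two phases every backend can already address the larger
range (or still address the old one, when shrinking), while no backend
places new pages outside the smaller range.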

> It looks to me like the patch doesn't expand the buffer mapping table,
> which seems essential. But maybe I missed that.

Do you mean the "Shared Buffer Lookup Table"? The patch does expand it,
but under the somewhat unfitting name STRATEGY_SHMEM_SLOT. Now that I
look at the code, though, I see a few issues around that, so I would
have to improve it anyway. Thanks for pointing that out.