Re: Changing shared_buffers without restart

Dmitry Dolgov <9erthalion6@gmail.com>

From: Dmitry Dolgov <9erthalion6@gmail.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: pgsql-hackers@postgresql.org
Date: 2024-11-26T19:17:58Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Remove PG_MMAP_FLAGS from mem.h

  2. Improve runtime and output of tests for replication slots checkpointing.

  3. Revert support for improved tracking of nested queries

  4. Use exported symbols list on macOS for loadable modules as well

  5. Add support for basic NUMA awareness

  6. Avoid unnecessary copying of a string in pg_restore.c

  7. aio: Infrastructure for io_method=worker

  8. Improve InitShmemAccess() prototype

> On Mon, Nov 25, 2024 at 02:33:48PM GMT, Robert Haas wrote:
>
> I think the idea of having multiple shared memory segments is
> interesting and makes sense, but I would prefer to see them called
> "segments" rather than "slots" just as do we do for DSMs. The name
> "slot" is somewhat overused, and invites confusion with replication
> slots, inter alia. I think it's possible that having multiple fixed
> shared memory segments will spell trouble on Windows, where we already
> need to use a retry loop to try to get the main shared memory segment
> mapped at the correct address. If there are multiple segments and we
> need whatever ASLR stuff happens on Windows to not place anything else
> overlapping with any of them, that means there's more chances for
> stuff to fail than if we just need one address range to be free.
> Granted, the individual ranges are smaller, so maybe it's fine? But I
> don't know.

I haven't had a chance to experiment with that on Windows, but I'm
hoping that in the worst case fallback to a single mapping via proposed
infrastructure (and the consequent limitations) would be acceptable.

> The big thing that worries me is synchronization, and while I've only
> looked at the patch set briefly, it doesn't look to me as though
> there's enough machinery here to make that work correctly. Suppose
> that shared_buffers=8GB (a million buffers) and I change it to
> shared_buffers=16GB (2 million buffers). As soon as any one backend
> has seen that changed and expanded shared_buffers, there's a
> possibility that some other backend which has not yet seen the change
> might see a buffer number greater than a million. If it tries to use
> that buffer number before it absorbs the change, something bad will
> happen. The most obvious way for it to see such a buffer number - and
> possibly the only one - is to do a lookup in the buffer mapping table
> and find a buffer ID there that was inserted by some other backend
> that has already seen the change.

Right, I haven't put much efforts into synchronization yet. It's in my
bucket list for the next iteration of the patch.

> code, but I'm not sure exactly which points are safe. If we have no
> code anywhere that assumes the address of an unpinned buffer can't
> change before we pin it, then I guess the check for pins is the only
> thing we need, but I don't know that to be the case.

Probably I'm missing something here. What scenario do you have in mind,
when the address of a buffer is changing?

> I guess I would have imagined that a change like this would have to be
> done in phases. In phase 1, we'd tell all of the backends that
> shared_buffers had expanded to some new, larger value; but the new
> buffers wouldn't be usable for anything yet. Then, once we confirmed
> that everyone had the memo, we'd tell all the backends that those
> buffers are now available for use. If shared_buffers were contracted,
> phase 1 would tell all of the backends that shared_buffers had
> contracted to some new, smaller value. Once a particular backend
> learns about that, they will refuse to put any new pages into those
> high-numbered buffers, but the existing contents would still be valid.
> Once everyone has been told about this, we can go through and evict
> all of those buffers, and then let everyone know that's done. Then
> they shrink their mappings.

Yep, sounds good. I was pondering about more crude approach, but doing
this in phases seems to be a way to go.

> It looks to me like the patch doesn't expand the buffer mapping table,
> which seems essential. But maybe I missed that.

Do you mean the "Shared Buffer Lookup Table"? It does expand it, but
under somewhat unfitting name STRATEGY_SHMEM_SLOT. But now that I look
at the code, I see a few issues around that -- so I would have to
improve it anyway, thanks for pointing that out.