Re: Changing shared_buffers without restart

Robert Haas <robertmhaas@gmail.com>

From: Robert Haas <robertmhaas@gmail.com>
To: Dmitry Dolgov <9erthalion6@gmail.com>
Cc: Matthias van de Meent <boekewurm+postgres@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org
Date: 2024-12-03T14:31:19Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Remove PG_MMAP_FLAGS from mem.h

  2. Improve runtime and output of tests for replication slots checkpointing.

  3. Revert support for improved tracking of nested queries

  4. Use exported symbols list on macOS for loadable modules as well

  5. Add support for basic NUMA awareness

  6. Avoid unnecessary copying of a string in pg_restore.c

  7. aio: Infrastructure for io_method=worker

  8. Improve InitShmemAccess() prototype

On Mon, Dec 2, 2024 at 2:18 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> I've asked about that in linux-mm [1]. To my surprise, the
> recommendations were to stick to creating a large mapping in advance,
> and slice smaller mappings out of that, which could be resized later.
> The OOM score should not be affected, and hugetlb could be avoided using
> MAP_NORESERVE flag for the initial mapping (I've experimented with that,
> seems to be working just fine, even if the slices are not using
> MAP_NORESERVE).
>
> I guess that would mean I'll try to experiment with this approach as
> well. But what others think? How much research do we need to do, to gain
> some confidence about large shared mappings and make it realistically
> acceptable?

Personally, I like this approach. It seems to me that this opens up
the possibility of a system where the virtual addresses of data
structures in shared memory never change, which I think will avoid an
absolutely massive amount of implementation complexity. It's obviously
not ideal that we have to specify in advance an upper limit on the
potential size of shared_buffers, but we can live with it. It's better
than what we have today; and certainly cloud providers will have no
issue with pre-setting that to a reasonable value. I don't know if we
can port it to other operating systems, but it seems at least possible
that they offer similar primitives, or will in the future; if not, we
can disable the feature on those platforms.

I still think the synchronization is going to be tricky. For example
when you go to shrink a mapping, you need to make sure that it's free
of buffers that anyone might touch; and when you grow a mapping, you
need to make sure that nobody tries to touch that address space before
they grow the mapping, which goes back to my earlier point about
someone doing a lookup into the buffer mapping table and finding a
buffer number that is beyond the end of what they've already mapped.
But I think it may be doable with sufficient cleverness.

-- 
Robert Haas
EDB: http://www.enterprisedb.com