Re: Changing shared_buffers without restart

From: Robert Haas <robertmhaas@gmail.com>

To: Dmitry Dolgov <9erthalion6@gmail.com>

Cc: Matthias van de Meent <boekewurm+postgres@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgresql.org

Date: 2024-12-03T14:31:19Z

Lists: pgsql-hackers

Commits

Remove PG_MMAP_FLAGS from mem.h
- c100340729b6 19 (unreleased) landed
Improve runtime and output of tests for replication slots checkpointing.
- 4464fddf7b50 18.0 cited
Revert support for improved tracking of nested queries
- f85f6ab051b7 18.0 cited
Use exported symbols list on macOS for loadable modules as well
- 3feff3916ee1 18.0 cited
Add support for basic NUMA awareness
- 65c298f61fc7 18.0 cited
Avoid unnecessary copying of a string in pg_restore.c
- 5e1915439085 18.0 cited
aio: Infrastructure for io_method=worker
- 55b454d0e140 18.0 cited
Improve InitShmemAccess() prototype
- 2a7b2d97171d 18.0 landed

On Mon, Dec 2, 2024 at 2:18 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> I've asked about that in linux-mm [1]. To my surprise, the
> recommendations were to stick to creating a large mapping in advance,
> and slice smaller mappings out of that, which could be resized later.
> The OOM score should not be affected, and hugetlb could be avoided using
> MAP_NORESERVE flag for the initial mapping (I've experimented with that,
> seems to be working just fine, even if the slices are not using
> MAP_NORESERVE).
>
> I guess that would mean I'll try to experiment with this approach as
> well. But what others think? How much research do we need to do, to gain
> some confidence about large shared mappings and make it realistically
> acceptable?

Personally, I like this approach. It seems to me that this opens up
the possibility of a system where the virtual addresses of data
structures in shared memory never change, which I think will avoid an
absolutely massive amount of implementation complexity. It's obviously
not ideal that we have to specify in advance an upper limit on the
potential size of shared_buffers, but we can live with it. It's better
than what we have today; and certainly cloud providers will have no
issue with pre-setting that to a reasonable value. I don't know if we
can port it to other operating systems, but it seems at least possible
that they offer similar primitives, or will in the future; if not, we
can disable the feature on those platforms.
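
To make the idea concrete, here is a minimal single-process sketch of
"reserve a large range up front, then slice usable mappings out of it"
on Linux. The names ResizableRegion, region_reserve and region_resize
are hypothetical and are not taken from any patch in this thread. The
sketch assumes all sizes are multiples of the page size, and it leaves
out the shared backing object that a real multi-process implementation
would need. The point it illustrates is only that the base address never
moves when the usable part grows or shrinks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping for one reserved address range. */
typedef struct ResizableRegion
{
    char   *base;       /* fixed start address of the reservation */
    size_t  reserved;   /* upper limit chosen in advance */
    size_t  used;       /* currently usable bytes, starting at base */
} ResizableRegion;

/* Reserve address space only: no access rights, no swap reservation. */
static int
region_reserve(ResizableRegion *r, size_t max_size)
{
    void *p = mmap(NULL, max_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    r->reserved = max_size;
    r->used = 0;
    return 0;
}

/* Grow or shrink the usable slice; new_size must be page-aligned. */
static int
region_resize(ResizableRegion *r, size_t new_size)
{
    void *p = r->base;

    assert(new_size <= r->reserved);

    if (new_size > r->used)
    {
        /* Grow: map usable memory over the next part of the reservation. */
        p = mmap(r->base + r->used, new_size - r->used,
                 PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    }
    else if (new_size < r->used)
    {
        /* Shrink: turn the tail back into an inaccessible reservation. */
        p = mmap(r->base + new_size, r->used - new_size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                 -1, 0);
    }
    if (p == MAP_FAILED)
        return -1;
    r->used = new_size;
    return 0;
}
```

Because every resize uses MAP_FIXED inside the original reservation,
pointers into the part that stays mapped remain valid across a resize,
which is the property that avoids having to relocate data structures.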

I still think the synchronization is going to be tricky. For example,
when you go to shrink a mapping, you need to make sure that it's free
of buffers that anyone might touch; and when you grow a mapping, you
need to make sure that no backend tries to touch that address space
before it has grown its own mapping. That goes back to my earlier point
about someone doing a lookup into the buffer mapping table and finding a
buffer number that is beyond the end of what they've already mapped.
But I think it may be doable with sufficient cleverness.
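
As a rough illustration of the grow-side hazard only, the check could
look something like the sketch below. Everything here is hypothetical:
SharedControl, BackendView, backend_catch_up and buffer_is_usable are
made-up names, the "remap" is reduced to updating a counter, and the
shrink side (making sure no one can still touch the buffers being
removed) is not addressed at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state: the buffer count published after a resize. */
typedef struct SharedControl
{
    _Atomic size_t nbuffers;
} SharedControl;

/* Hypothetical per-backend state: how many buffers this process has mapped. */
typedef struct BackendView
{
    size_t mapped_nbuffers;
    int    remaps;          /* counts catch-ups, for illustration only */
} BackendView;

/* Stand-in for extending the backend's own mapping up to 'target' buffers. */
static void
backend_catch_up(BackendView *v, size_t target)
{
    v->mapped_nbuffers = target;
    v->remaps++;
}

/*
 * Called with a buffer number obtained from a lookup, before touching it.
 * Returns 1 if the buffer lies inside this backend's mapping (after
 * catching up if necessary), 0 if it is beyond the published size and the
 * caller has to retry the lookup.
 */
static int
buffer_is_usable(SharedControl *c, BackendView *v, size_t buf_id)
{
    if (buf_id >= v->mapped_nbuffers)
    {
        size_t published = atomic_load(&c->nbuffers);

        if (buf_id >= published)
            return 0;
        backend_catch_up(v, published);
    }
    return 1;
}
```

The common path costs one comparison against a process-local value; the
shared counter is read only when a lookup returns a buffer number past
what the backend has already mapped.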

-- 
Robert Haas
EDB: http://www.enterprisedb.com