Thread

Re: pg_stat_io_histogram

Jakub Wartak <jakub.wartak@enterprisedb.com> — 2026-05-20T08:37:28Z
On Fri, May 8, 2026 at 11:57 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
>
> On Thu, Mar 19, 2026 at 11:16 AM Jakub Wartak
> <jakub.wartak@enterprisedb.com> wrote:
> >
> > On Wed, Mar 18, 2026 at 2:29 PM Jakub Wartak
> > <jakub.wartak@enterprisedb.com> wrote:
> > >
> > > On Tue, Mar 17, 2026 at 3:17 PM Andres Freund <andres@anarazel.de> wrote:
> > > > On 2026-03-17 13:13:59 +0100, Jakub Wartak wrote:
> > > > > 1. Concerns about memory use. With v7 I had couple of ideas, and with those
> > > > > the memory use is really minimized as long as the code is still simple
> > > > > (so nothing fancy, just some ideas to trim stuff and dynamically allocate
> > > > > memory). I hope those reduce memory footprint to acceptable levels, see my
> > > > > earlier description for v7.
> > > >
> > > > Personally I unfortunately continue to think that storing lots of values that
> > > > are never anything but zero isn't a good idea once you have more than a
> > > > handful of kB. Storing pointless data is something different than increasing
> > > > memory usage with actual information.
> > > >
> > > > I still think you should just count the number of histograms needed, have an
> > > > array [object][context][op] with the associated histogram "offset" and then
> > > > increment the associated offset.  It'll add an indirection at count time, but
> > > > no additional branches.
> > >
> > > Great idea, thanks, I haven't thought about that! Attached v9 attempts to do
> > > that for pending backend I/O struct, which minimizes the (backend) memory
> > > footprint for client backends to just about ~5kB.
> > >
> > > I have been pulling my hair trying to achieve the same for shared-memory, but I
> > > have failed to do that w/o sinking into complexity [..]
> >
> > OK, I've made  it done too with indirect offset on shared memory, it wasn't easy
> > at least for me, but now we have two approaches/patchsets:
> >
> [..]
> > v9b: with more code and build complexity but that should address concern of not
> >      used memory
> >
> > 'Shared Memory Stats' allocated size:
> > master - uses ~308kB for shm
> > v9a-000[12]: 578kB shm
> > v9a-000[123]: 507kB shm
> > v9a-000[1234]: 471kB shm (+~163kB more)
> >
> > v9b-000[123]: 361kB shm
> >
> > v9a-000[12] are identical to v9b-00[12], but included just for
> > patchset completeness.
> >
> > In v9b meson/autoconf (for adding pgstat_io_genstats) build most of
> > the time what
> > they need, but probably that needs some cleanups and better dependency
> > tracking. I'm
> > not sure about correctnes of those changes as especially
> > autoconf/Makefile is a lot
> > like brainf**k to me and that area would need some help...
> >
> > I think now we could even increase max resolution of buckets to cover
> > max those maximum
> > of 32s+ (at the cost of one extra 64-byte cacheline for pending IO
> > stats, so go with
> > PGSTAT_IO_HIST_BUCKETS from 16 to 24)
>
> Good morning all,
>
> Ok here comes v10, which is bit like earlier v9b (so has reduced shared memory
> footprint using Yours idea about indirect offsets idea), but now with shm memory
> sized and allocated on startup by postmaster. There are 3 patches:
> - 0001, one to introduce view and bucketting, no changes since quite some time
> - 0002, saves some private (backend) memory
> - 0003, main meat, saving shared memory (main problem raised earlier),
> now switched
>   to simply dynamically size shared memory based on those pgstat_track_io*()
>   logic
>
> The problem with the 0003 earlier was that I wanted to absolutley avoid further
> complexiy/alterations in struct PgStat_IO related to dynamic shared memory
> allocation for hist_time_buckets_slots[PGSTAT_IO_HIST_BUCKET_SLOTS]
> [PGSTAT_IO_HIST_BUCKETS] (I was afraid to touch that shm code, it
> looks complex),
> so I had to come out with something that would tell us how many slots
> (PGSTAT_IO_HIST_BUCKET_SLOTS) we need, I wish we had C++'s `constexpr` that
> would do all of that. I've tried three aproaches (like in v9b but that hit
> some serious cross-compiling obstacles, also had perl doing that, but that
> had lots of code duplication), so in the end I had to alter the pgstat_io
> shm allocation which is now in 0003.
>
> Summary of changes in 0003 since v9b / earlier post:
> - Fixed potential race condition (touch via memset/memcpy() only histogram
>   slots under LWLock)
> - Fixed/removed the PGSTAT_IO_HIST_BUCKET_SLOTS macro
> - Removed pgstat_io_genslots.c (first idea, above) and abandonded attempt to
>   fixup some cross compilation woes on MSVC/mingw
> - Bumped PGSTAT_FILE_FORMAT_ID
> - Move/optimize pending_off in pgstat_io_flush_cb out of hot loop
> - Document that hist_time_buckets_offsets should be the last member of
> PgStat_BktypeIO
> - Be defensive - added some asserts()
> - Adjust _bucket_offsets from uint64 to just int to save memory (offsets are low
>   numbers)
> - and finally moved to dynamic shm allocation of PgStat_IO stuff during
>   startup
>
> At the end of the day, I'll squeze 000[123] into just one, but wanted
> to ease the
> review first a bit. Of course this is material for PG20.

Just noticed it needed a rebase (due to c7cb8e5b73c6; renumber_oids.pl), so v11
attached before I forget.

-J.