Thread

  1. Re: pg_stat_io_histogram

    Jakub Wartak <jakub.wartak@enterprisedb.com> — 2026-05-20T08:37:28Z

    On Fri, May 8, 2026 at 11:57 AM Jakub Wartak
    <jakub.wartak@enterprisedb.com> wrote:
    >
    > On Thu, Mar 19, 2026 at 11:16 AM Jakub Wartak
    > <jakub.wartak@enterprisedb.com> wrote:
    > >
    > > On Wed, Mar 18, 2026 at 2:29 PM Jakub Wartak
    > > <jakub.wartak@enterprisedb.com> wrote:
    > > >
    > > > On Tue, Mar 17, 2026 at 3:17 PM Andres Freund <andres@anarazel.de> wrote:
    > > > > On 2026-03-17 13:13:59 +0100, Jakub Wartak wrote:
    > > > > > 1. Concerns about memory use. With v7 I had couple of ideas, and with those
    > > > > > the memory use is really minimized as long as the code is still simple
    > > > > > (so nothing fancy, just some ideas to trim stuff and dynamically allocate
    > > > > > memory). I hope those reduce memory footprint to acceptable levels, see my
    > > > > > earlier description for v7.
    > > > >
    > > > > Personally I unfortunately continue to think that storing lots of values that
    > > > > are never anything but zero isn't a good idea once you have more than a
    > > > > handful of kB. Storing pointless data is something different than increasing
    > > > > memory usage with actual information.
    > > > >
    > > > > I still think you should just count the number of histograms needed, have an
    > > > > array [object][context][op] with the associated histogram "offset" and then
    > > > > increment the associated offset.  It'll add an indirection at count time, but
    > > > > no additional branches.
    > > >
    > > > Great idea, thanks, I haven't thought about that! Attached v9 attempts to do
    > > > that for pending backend I/O struct, which minimizes the (backend) memory
    > > > footprint for client backends to just about ~5kB.
    > > >
    > > > I have been pulling my hair trying to achieve the same for shared-memory, but I
    > > > have failed to do that w/o sinking into complexity [..]
    > >
    > > OK, I've made  it done too with indirect offset on shared memory, it wasn't easy
    > > at least for me, but now we have two approaches/patchsets:
    > >
    > [..]
    > > v9b: with more code and build complexity but that should address concern of not
    > >      used memory
    > >
    > > 'Shared Memory Stats' allocated size:
    > > master - uses ~308kB for shm
    > > v9a-000[12]: 578kB shm
    > > v9a-000[123]: 507kB shm
    > > v9a-000[1234]: 471kB shm (+~163kB more)
    > >
    > > v9b-000[123]: 361kB shm
    > >
    > > v9a-000[12] are identical to v9b-00[12], but included just for
    > > patchset completeness.
    > >
    > > In v9b meson/autoconf (for adding pgstat_io_genstats) build most of
    > > the time what
    > > they need, but probably that needs some cleanups and better dependency
    > > tracking. I'm
    > > not sure about correctnes of those changes as especially
    > > autoconf/Makefile is a lot
    > > like brainf**k to me and that area would need some help...
    > >
    > > I think now we could even increase max resolution of buckets to cover
    > > max those maximum
    > > of 32s+ (at the cost of one extra 64-byte cacheline for pending IO
    > > stats, so go with
    > > PGSTAT_IO_HIST_BUCKETS from 16 to 24)
    >
    > Good morning all,
    >
    > Ok here comes v10, which is bit like earlier v9b (so has reduced shared memory
    > footprint using Yours idea about indirect offsets idea), but now with shm memory
    > sized and allocated on startup by postmaster. There are 3 patches:
    > - 0001, one to introduce view and bucketting, no changes since quite some time
    > - 0002, saves some private (backend) memory
    > - 0003, main meat, saving shared memory (main problem raised earlier),
    > now switched
    >   to simply dynamically size shared memory based on those pgstat_track_io*()
    >   logic
    >
    > The problem with the 0003 earlier was that I wanted to absolutley avoid further
    > complexiy/alterations in struct PgStat_IO related to dynamic shared memory
    > allocation for hist_time_buckets_slots[PGSTAT_IO_HIST_BUCKET_SLOTS]
    > [PGSTAT_IO_HIST_BUCKETS] (I was afraid to touch that shm code, it
    > looks complex),
    > so I had to come out with something that would tell us how many slots
    > (PGSTAT_IO_HIST_BUCKET_SLOTS) we need, I wish we had C++'s `constexpr` that
    > would do all of that. I've tried three aproaches (like in v9b but that hit
    > some serious cross-compiling obstacles, also had perl doing that, but that
    > had lots of code duplication), so in the end I had to alter the pgstat_io
    > shm allocation which is now in 0003.
    >
    > Summary of changes in 0003 since v9b / earlier post:
    > - Fixed potential race condition (touch via memset/memcpy() only histogram
    >   slots under LWLock)
    > - Fixed/removed the PGSTAT_IO_HIST_BUCKET_SLOTS macro
    > - Removed pgstat_io_genslots.c (first idea, above) and abandonded attempt to
    >   fixup some cross compilation woes on MSVC/mingw
    > - Bumped PGSTAT_FILE_FORMAT_ID
    > - Move/optimize pending_off in pgstat_io_flush_cb out of hot loop
    > - Document that hist_time_buckets_offsets should be the last member of
    > PgStat_BktypeIO
    > - Be defensive - added some asserts()
    > - Adjust _bucket_offsets from uint64 to just int to save memory (offsets are low
    >   numbers)
    > - and finally moved to dynamic shm allocation of PgStat_IO stuff during
    >   startup
    >
    > At the end of the day, I'll squeze 000[123] into just one, but wanted
    > to ease the
    > review first a bit. Of course this is material for PG20.
    
    Just noticed it needed a rebase (due to c7cb8e5b73c6; renumber_oids.pl), so v11
    attached before I forget.
    
    -J.