Thread
-
Re: pg_stat_io_histogram
Jakub Wartak <jakub.wartak@enterprisedb.com> — 2026-05-20T08:37:28Z
On Fri, May 8, 2026 at 11:57 AM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote: > > On Thu, Mar 19, 2026 at 11:16 AM Jakub Wartak > <jakub.wartak@enterprisedb.com> wrote: > > > > On Wed, Mar 18, 2026 at 2:29 PM Jakub Wartak > > <jakub.wartak@enterprisedb.com> wrote: > > > > > > On Tue, Mar 17, 2026 at 3:17 PM Andres Freund <andres@anarazel.de> wrote: > > > > On 2026-03-17 13:13:59 +0100, Jakub Wartak wrote: > > > > > 1. Concerns about memory use. With v7 I had couple of ideas, and with those > > > > > the memory use is really minimized as long as the code is still simple > > > > > (so nothing fancy, just some ideas to trim stuff and dynamically allocate > > > > > memory). I hope those reduce memory footprint to acceptable levels, see my > > > > > earlier description for v7. > > > > > > > > Personally I unfortunately continue to think that storing lots of values that > > > > are never anything but zero isn't a good idea once you have more than a > > > > handful of kB. Storing pointless data is something different than increasing > > > > memory usage with actual information. > > > > > > > > I still think you should just count the number of histograms needed, have an > > > > array [object][context][op] with the associated histogram "offset" and then > > > > increment the associated offset. It'll add an indirection at count time, but > > > > no additional branches. > > > > > > Great idea, thanks, I haven't thought about that! Attached v9 attempts to do > > > that for pending backend I/O struct, which minimizes the (backend) memory > > > footprint for client backends to just about ~5kB. > > > > > > I have been pulling my hair trying to achieve the same for shared-memory, but I > > > have failed to do that w/o sinking into complexity [..] > > > > OK, I've made it done too with indirect offset on shared memory, it wasn't easy > > at least for me, but now we have two approaches/patchsets: > > > [..] > > v9b: with more code and build complexity but that should address concern of not > > used memory > > > > 'Shared Memory Stats' allocated size: > > master - uses ~308kB for shm > > v9a-000[12]: 578kB shm > > v9a-000[123]: 507kB shm > > v9a-000[1234]: 471kB shm (+~163kB more) > > > > v9b-000[123]: 361kB shm > > > > v9a-000[12] are identical to v9b-00[12], but included just for > > patchset completeness. > > > > In v9b meson/autoconf (for adding pgstat_io_genstats) build most of > > the time what > > they need, but probably that needs some cleanups and better dependency > > tracking. I'm > > not sure about correctnes of those changes as especially > > autoconf/Makefile is a lot > > like brainf**k to me and that area would need some help... > > > > I think now we could even increase max resolution of buckets to cover > > max those maximum > > of 32s+ (at the cost of one extra 64-byte cacheline for pending IO > > stats, so go with > > PGSTAT_IO_HIST_BUCKETS from 16 to 24) > > Good morning all, > > Ok here comes v10, which is bit like earlier v9b (so has reduced shared memory > footprint using Yours idea about indirect offsets idea), but now with shm memory > sized and allocated on startup by postmaster. There are 3 patches: > - 0001, one to introduce view and bucketting, no changes since quite some time > - 0002, saves some private (backend) memory > - 0003, main meat, saving shared memory (main problem raised earlier), > now switched > to simply dynamically size shared memory based on those pgstat_track_io*() > logic > > The problem with the 0003 earlier was that I wanted to absolutley avoid further > complexiy/alterations in struct PgStat_IO related to dynamic shared memory > allocation for hist_time_buckets_slots[PGSTAT_IO_HIST_BUCKET_SLOTS] > [PGSTAT_IO_HIST_BUCKETS] (I was afraid to touch that shm code, it > looks complex), > so I had to come out with something that would tell us how many slots > (PGSTAT_IO_HIST_BUCKET_SLOTS) we need, I wish we had C++'s `constexpr` that > would do all of that. I've tried three aproaches (like in v9b but that hit > some serious cross-compiling obstacles, also had perl doing that, but that > had lots of code duplication), so in the end I had to alter the pgstat_io > shm allocation which is now in 0003. > > Summary of changes in 0003 since v9b / earlier post: > - Fixed potential race condition (touch via memset/memcpy() only histogram > slots under LWLock) > - Fixed/removed the PGSTAT_IO_HIST_BUCKET_SLOTS macro > - Removed pgstat_io_genslots.c (first idea, above) and abandonded attempt to > fixup some cross compilation woes on MSVC/mingw > - Bumped PGSTAT_FILE_FORMAT_ID > - Move/optimize pending_off in pgstat_io_flush_cb out of hot loop > - Document that hist_time_buckets_offsets should be the last member of > PgStat_BktypeIO > - Be defensive - added some asserts() > - Adjust _bucket_offsets from uint64 to just int to save memory (offsets are low > numbers) > - and finally moved to dynamic shm allocation of PgStat_IO stuff during > startup > > At the end of the day, I'll squeze 000[123] into just one, but wanted > to ease the > review first a bit. Of course this is material for PG20. Just noticed it needed a rebase (due to c7cb8e5b73c6; renumber_oids.pl), so v11 attached before I forget. -J.