Re: Extended Statistics set/restore/clear functions.

From: Michael Paquier <michael@paquier.xyz>

To: Corey Huinker <corey.huinker@gmail.com>

Cc: Tomas Vondra <tomas@vondra.me>, jian he <jian.universality@gmail.com>, pgsql-hackers@lists.postgresql.org, tgl@sss.pgh.pa.us

Date: 2025-11-07T22:56:48Z

Lists: pgsql-hackers

Commits

Add test doing some cloning of extended statistics data
- fc365e4fccc4 19 (unreleased) landed
Add test for pg_restore_extended_stats() with multiranges
- 0b7beec42ae2 19 (unreleased) landed
Add support for "mcv" in pg_restore_extended_stats()
- efbebb4e8587 19 (unreleased) landed
Include extended statistics data in pg_dump
- c32fb29e979d 19 (unreleased) landed
Add support for "dependencies" in pg_restore_extended_stats()
- 302879bd68d1 19 (unreleased) landed
Add test for MAINTAIN permission with pg_restore_extended_stats()
- d9abd9e1050d 19 (unreleased) landed
Add pg_restore_extended_stats()
- 0e80f3f88dea 19 (unreleased) landed
Add routine to free MCVList
- 7ebb64c55757 19 (unreleased) landed
Improve pg_clear_extended_stats() with incorrect relation/stats combination
- 395b73c045e0 19 (unreleased) landed
Add pg_clear_extended_stats()
- d756fa1019ff 19 (unreleased) landed
Introduce routines to validate and free MVNDistinct and MVDependencies
- 32e27bd32082 19 (unreleased) landed
Fix typo in stat_utils.c
- eee19a30d60d 19 (unreleased) landed
Move attribute statistics functions to stat_utils.c
- 213a1b895270 19 (unreleased) landed
Improve error messages of input functions for pg_dependencies and pg_ndistinct
- f68597ee777d 19 (unreleased) landed
Improve test output of extended statistics for ndistinct and dependencies
- 2f04110225ab 19 (unreleased) landed
Fix some compiler warnings
- 7bc88c3d6f3a 19 (unreleased) landed
Add input function for data type pg_dependencies
- e1405aa5e3ac 19 (unreleased) landed
Add input function for data type pg_ndistinct
- 44eba8f06e55 19 (unreleased) landed
Rework output format of pg_dependencies
- e76defbcf09e 19 (unreleased) landed
Rework output format of pg_ndistinct
- 1f927cce4498 19 (unreleased) landed
Fix comments of output routines for pg_ndistinct and pg_dependencies
- 040a39ed25bf 19 (unreleased) landed
Move code specific to pg_dependencies to new file
- 2ddc8d9e9baa 19 (unreleased) landed
Move code specific to pg_ndistinct to new file
- a5523123430f 19 (unreleased) landed
Document some structures in attribute_stats.c
- d6c132d83bff 19 (unreleased) landed
Fix FATAL message for invalid recovery timeline at beginning of recovery
- 71f17823ba01 18.0 cited

On Fri, Nov 07, 2025 at 05:28:50PM -0500, Corey Huinker wrote:
> I'm open to other formats, but aside from renaming the json keys (maybe
> "attnums" or "keys" instead of "attributes"?), I'm not sure what really
> could be done and still be JSON. I suppose we could go with a tuple format
> like this:
> 
> '{({3,4},11),...}' for pg_ndistinct and
> '{({3},4,1.00000),...}'  for pg_dependencies.
> 
> Those would certainly be more compact, but makes for a hard read by humans,
> and while the JSON code is big, it's also proven in other parts of the
> codebase, hence less risky.

I've liked the human-readability factor of the format in the current
patches with names in the keys, and values assigned to each property.

Another thing that may be worth doing is pushing the names of the keys
and some of the JSON meta-data shaping the object into a new header
that can be loaded by both the backend and the frontend.  It would be
nice to not hardcode this knowledge in a bunch of places if we end up
renaming these attributes.
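
As a rough illustration of that idea, here is a minimal sketch in plain
C.  Everything in it is hypothetical and not taken from the patch set:
the macro names, the helper ndistinct_item_to_json(), and the
"ndistinct" key name are assumptions made for the example.  Only the
"attributes" key comes from the discussion above.  The point is that
the key names live in one place, so a rename touches a single
definition.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical contents of a header shared by backend and frontend code.
 * "attributes" is the key name discussed in this thread; "ndistinct" is
 * an assumed name used only for this sketch.
 */
#define NDIST_KEY_ATTRIBUTES	"attributes"
#define NDIST_KEY_NDISTINCT		"ndistinct"

/*
 * Render one ndistinct item as a JSON object into buf, taking the key
 * names from the macros above.  Returns the number of characters
 * written (excluding the terminating NUL), or -1 if buf is too small.
 */
static int
ndistinct_item_to_json(char *buf, size_t buflen,
					   const int *attnums, int nattnums, long ndistinct)
{
	size_t		off = 0;
	int			n;

	n = snprintf(buf + off, buflen - off, "{\"%s\": [", NDIST_KEY_ATTRIBUTES);
	if (n < 0 || (size_t) n >= buflen - off)
		return -1;
	off += (size_t) n;

	for (int i = 0; i < nattnums; i++)
	{
		/* separate attribute numbers with ", " after the first one */
		n = snprintf(buf + off, buflen - off, "%s%d",
					 i > 0 ? ", " : "", attnums[i]);
		if (n < 0 || (size_t) n >= buflen - off)
			return -1;
		off += (size_t) n;
	}

	n = snprintf(buf + off, buflen - off, "], \"%s\": %ld}",
				 NDIST_KEY_NDISTINCT, ndistinct);
	if (n < 0 || (size_t) n >= buflen - off)
		return -1;
	off += (size_t) n;

	return (int) off;
}
```

Calling the helper with attribute numbers {3, 4} and an ndistinct value
of 11 yields {"attributes": [3, 4], "ndistinct": 11}.  Changing
NDIST_KEY_ATTRIBUTES to "attnums" or "keys" would then change every
producer and consumer that includes the header.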

> A part of me thinks that everything that remains after removing
> in/out/send/recv is just taking a table sample data structure and crunching
> numbers to come up with the deserialized data structure...that's in/out
> with a different starting/ending points.
> 
> There's no denying that JSON parsing is a very different code style than
> statistical number crunching, and mixing the two is incongruous, so it's
> worth a shot, and I'll try that for v9.

Yeah, right.  Thanks.  The parsing pieces seem worth moving into their
own file.

> The functions in question are needed because the exprs value is itself an
> array of partly-filled-out pg_attribute tuples, so it's common to those two
> needs, but specific to stats about attributes. Maybe we need an
> attr_stats_utils.h?

Hmm, maybe.  I'd be OK to revisit these structures once we're happy
with the in/out structures.  That would be a good starting point
before working on the SQL functions and the dump/restore bits in more
detail.
--
Michael