Re: Extended Statistics set/restore/clear functions.
Michael Paquier <michael@paquier.xyz>
Commits
- Add test doing some cloning of extended statistics data: fc365e4fccc4, 19 (unreleased), landed
- Add test for pg_restore_extended_stats() with multiranges: 0b7beec42ae2, 19 (unreleased), landed
- Add support for "mcv" in pg_restore_extended_stats(): efbebb4e8587, 19 (unreleased), landed
- Include extended statistics data in pg_dump: c32fb29e979d, 19 (unreleased), landed
- Add support for "dependencies" in pg_restore_extended_stats(): 302879bd68d1, 19 (unreleased), landed
- Add test for MAINTAIN permission with pg_restore_extended_stats(): d9abd9e1050d, 19 (unreleased), landed
- Add pg_restore_extended_stats(): 0e80f3f88dea, 19 (unreleased), landed
- Add routine to free MCVList: 7ebb64c55757, 19 (unreleased), landed
- Improve pg_clear_extended_stats() with incorrect relation/stats combination: 395b73c045e0, 19 (unreleased), landed
- Add pg_clear_extended_stats(): d756fa1019ff, 19 (unreleased), landed
- Introduce routines to validate and free MVNDistinct and MVDependencies: 32e27bd32082, 19 (unreleased), landed
- Fix typo in stat_utils.c: eee19a30d60d, 19 (unreleased), landed
- Move attribute statistics functions to stat_utils.c: 213a1b895270, 19 (unreleased), landed
- Improve error messages of input functions for pg_dependencies and pg_ndistinct: f68597ee777d, 19 (unreleased), landed
- Improve test output of extended statistics for ndistinct and dependencies: 2f04110225ab, 19 (unreleased), landed
- Fix some compiler warnings: 7bc88c3d6f3a, 19 (unreleased), landed
- Add input function for data type pg_dependencies: e1405aa5e3ac, 19 (unreleased), landed
- Add input function for data type pg_ndistinct: 44eba8f06e55, 19 (unreleased), landed
- Rework output format of pg_dependencies: e76defbcf09e, 19 (unreleased), landed
- Rework output format of pg_ndistinct: 1f927cce4498, 19 (unreleased), landed
- Fix comments of output routines for pg_ndistinct and pg_dependencies: 040a39ed25bf, 19 (unreleased), landed
- Move code specific to pg_dependencies to new file: 2ddc8d9e9baa, 19 (unreleased), landed
- Move code specific to pg_ndistinct to new file: a5523123430f, 19 (unreleased), landed
- Document some structures in attribute_stats.c: d6c132d83bff, 19 (unreleased), landed
- Fix FATAL message for invalid recovery timeline at beginning of recovery: 71f17823ba01, 18.0, cited
Attachments
- v8-0001-Refactor-output-format-of-pg_ndistinct.patch (text/x-diff) patch v8-0001
- v8-0002-Add-working-input-function-for-pg_ndistinct.patch (text/x-diff) patch v8-0002
- v8-0003-Refactor-output-format-of-pg_dependencies.patch (text/x-diff) patch v8-0003
- v8-0004-Add-working-input-function-for-pg_dependencies.patch (text/x-diff) patch v8-0004
- v8-0005-Expose-attribute-statistics-functions-for-use-in-.patch (text/x-diff) patch v8-0005
- v8-0006-Add-extended-statistics-support-functions.patch (text/x-diff) patch v8-0006
- v8-0007-Include-Extended-Statistics-in-pg_dump.patch (text/x-diff) patch v8-0007
On Thu, Nov 06, 2025 at 01:35:34PM -0500, Corey Huinker wrote:
> So the mailing list archive will still pick it up? That's nice.
It did. My email client does not care much either.
> Rebased to reflect that commit.
I have spent a bit more time on this set.
Patch 0001 for ndistinct was missing a documentation update: we have
one query in perform.sgml that looks at stxdndistinct. Patch 0003 is
looking OK here as well.
For dependencies, the format switches from a single JSON object
with key/value pairs like this:
    "3 => 4": 1.000000
To a JSON array made of elements like this:
    {"degree": 1.000000, "attributes": [3], "dependency": 4}
For ndistinct, we move from a JSON object with key/value pairs like this:
    "3, 4": 11
To a JSON array made of elements like this:
    {"ndistinct": 11, "attributes": [3,4]}
Using a keyword within each element would force stronger validation
when these get imported back, which is a good thing. I like that.
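To illustrate why explicit keys help on import, here is a rough Python
sketch of the kind of check that becomes possible. This is not the C
input function from the patch set: the name validate_ndistinct and the
exact rules (exact key set, integer attribute numbers without
duplicates, integer ndistinct) are assumptions made for the example.

```python
import json


def validate_ndistinct(text):
    """Parse new-format ndistinct text, raising ValueError if malformed.

    Hypothetical helper: the rules below are illustrative assumptions,
    not the validation implemented by the patches.
    """
    items = json.loads(text)  # invalid JSON raises a ValueError subclass
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    for item in items:
        # Every element must carry exactly the two expected keys.
        if not isinstance(item, dict) or set(item) != {"ndistinct", "attributes"}:
            raise ValueError(f"unexpected element: {item!r}")
        attrs = item["attributes"]
        if not isinstance(attrs, list) or not attrs:
            raise ValueError("attributes must be a non-empty array")
        if not all(type(a) is int for a in attrs):
            raise ValueError("attribute numbers must be integers")
        if len(set(attrs)) != len(attrs):
            raise ValueError("duplicate attribute number")
        if type(item["ndistinct"]) is not int:
            raise ValueError("ndistinct must be an integer")
    return items
```

With the old key/value layout, the attribute list would first have to
be parsed out of the key string before any such check could run.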
Before going in-depth into the input functions to cross-check the
amount of validation we should do, do folks have any comments about the
proposed format? That's the key point this patch set depends on, and
I'd rather not spend more time on the whole thing if somebody would like
a different format. This is the format that Tomas has mentioned at
the top of the thread. Note: as noted upthread, pg_dump would be in
charge of transferring the data of the old format to the new format in
the end.
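As a sketch of what that old-to-new transfer amounts to, the mapping
can be written in a few lines of Python. This is only an illustration
of the two formats quoted above, not how pg_dump would do it; the
function names convert_ndistinct and convert_dependencies are
hypothetical.

```python
import json


def convert_ndistinct(old_text):
    """Map old-format ndistinct text ('{"3, 4": 11}') to the new array form."""
    old = json.loads(old_text)
    return [
        {"ndistinct": n, "attributes": [int(a) for a in key.split(",")]}
        for key, n in old.items()
    ]


def convert_dependencies(old_text):
    """Map old-format dependencies text ('{"3 => 4": 1.0}') to the new array form."""
    old = json.loads(old_text)
    result = []
    for key, degree in old.items():
        # The key holds the determining attributes, "=>", then the dependent one.
        lhs, rhs = key.split("=>")
        result.append({
            "degree": degree,
            "attributes": [int(a) for a in lhs.split(",")],
            "dependency": int(rhs),
        })
    return result


print(json.dumps(convert_ndistinct('{"3, 4": 11}')))
print(json.dumps(convert_dependencies('{"3 => 4": 1.000000}')))
```

The attribute numbers move out of the key string and into a proper
JSON array, which is the part the new input functions can validate.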
While looking at 0002 and 0004 (which have a couple of issues
actually), I have been wondering about moving the four data-type
functions (in, out, send and receive), together with the new input
functions that rely on the new JSON lexer and parser logic, into new
files for both ndistinct and dependencies. The new set of headers
added at the top of mvdistinct.c and dependencies.c for the new code
suggests that a separation may be better in the long term, because the
new code relies on parts of the backend that the existing code does not
care about, and these files become larger than the relation and
attribute stats files. I would be tempted to name these new files
pg_dependencies.c and pg_ndistinct.c, matching their catalog types.
With this separation, it looks like the "core" parts in charge of the
calculations with ndistinct and dependencies can be kept on their own.
What do you think?
A second comment is for 0005. The routines of attribute_stats.c are
used by the new clear and restore functions. Shouldn't these be in
stat_utils.c in the end? That's where the "common" functions used by
the stats manipulation logic are.
--
Michael