Re: Extended Statistics set/restore/clear functions.
Michael Paquier <michael@paquier.xyz>
Commits
- Add test doing some cloning of extended statistics data: fc365e4fccc4, 19 (unreleased), landed
- Add test for pg_restore_extended_stats() with multiranges: 0b7beec42ae2, 19 (unreleased), landed
- Add support for "mcv" in pg_restore_extended_stats(): efbebb4e8587, 19 (unreleased), landed
- Include extended statistics data in pg_dump: c32fb29e979d, 19 (unreleased), landed
- Add support for "dependencies" in pg_restore_extended_stats(): 302879bd68d1, 19 (unreleased), landed
- Add test for MAINTAIN permission with pg_restore_extended_stats(): d9abd9e1050d, 19 (unreleased), landed
- Add pg_restore_extended_stats(): 0e80f3f88dea, 19 (unreleased), landed
- Add routine to free MCVList: 7ebb64c55757, 19 (unreleased), landed
- Improve pg_clear_extended_stats() with incorrect relation/stats combination: 395b73c045e0, 19 (unreleased), landed
- Add pg_clear_extended_stats(): d756fa1019ff, 19 (unreleased), landed
- Introduce routines to validate and free MVNDistinct and MVDependencies: 32e27bd32082, 19 (unreleased), landed
- Fix typo in stat_utils.c: eee19a30d60d, 19 (unreleased), landed
- Move attribute statistics functions to stat_utils.c: 213a1b895270, 19 (unreleased), landed
- Improve error messages of input functions for pg_dependencies and pg_ndistinct: f68597ee777d, 19 (unreleased), landed
- Improve test output of extended statistics for ndistinct and dependencies: 2f04110225ab, 19 (unreleased), landed
- Fix some compiler warnings: 7bc88c3d6f3a, 19 (unreleased), landed
- Add input function for data type pg_dependencies: e1405aa5e3ac, 19 (unreleased), landed
- Add input function for data type pg_ndistinct: 44eba8f06e55, 19 (unreleased), landed
- Rework output format of pg_dependencies: e76defbcf09e, 19 (unreleased), landed
- Rework output format of pg_ndistinct: 1f927cce4498, 19 (unreleased), landed
- Fix comments of output routines for pg_ndistinct and pg_dependencies: 040a39ed25bf, 19 (unreleased), landed
- Move code specific to pg_dependencies to new file: 2ddc8d9e9baa, 19 (unreleased), landed
- Move code specific to pg_ndistinct to new file: a5523123430f, 19 (unreleased), landed
- Document some structures in attribute_stats.c: d6c132d83bff, 19 (unreleased), landed
- Fix FATAL message for invalid recovery timeline at beginning of recovery: 71f17823ba01, 18.0, cited
Attachments
- v10-0001-Make-pg_ndinstinct-a-proper-adt.patch (text/x-diff) patch v10-0001
- v10-0002-Make-pg_dependencies-a-proper-adt.patch (text/x-diff) patch v10-0002
- v10-0003-Refactor-output-format-of-pg_ndistinct.patch (text/x-diff) patch v10-0003
- v10-0004-Refactor-output-format-of-pg_dependencies.patch (text/x-diff) patch v10-0004
- v10-0005-Add-working-input-function-for-pg_ndistinct.patch (text/x-diff) patch v10-0005
- v10-0006-Add-working-input-function-for-pg_dependencies.patch (text/x-diff) patch v10-0006
- v10-0007-Expose-attribute-statistics-functions-for-use-in.patch (text/x-diff) patch v10-0007
- v10-0008-Add-extended-statistics-support-functions.patch (text/x-diff) patch v10-0008
- v10-0009-Include-Extended-Statistics-in-pg_dump.patch (text/x-diff) patch v10-0009
On Mon, Nov 10, 2025 at 12:33:40AM -0500, Corey Huinker wrote:
> It may not be quite what you wanted, but the attribute names are now static
> constants in the new adt c files. It's possible/probable that you wanted
> them in some header file, but so far I haven't had to create any new header
> files, but that can be done if desired.
No, that's not the best thing we can do with the dump/restore pieces
in mind. Let's put that in a separate header.
> That's done in the 0008-0009 patches. If I was starting from scratch, I
> would have moved the pre-existing in/out/send/recv functions to their own
> files in their own patches before changing the output format, but tacked on
> at the end like they are it's easier to see what the changes were, and the
> patches will probably get squashed together anyway.
Thanks for the new patch. And FWIW I disagree with this approach:
cleanup and refactoring pieces make more sense if done first, as these
lead to less code churn in the final result. So... I've begun to put
my hands on the patch set. The whole has been restructured a bit, as
per the attached. Patches 0001 to 0004 feel OK here; these include two
code moves and the two output functions:
- Two new files for adt/, that I'm planning to apply soon as a
separate cleanup.
- New output functions, with keys added to a new header named
statistics_format.h, for frontend and backend consumption.
Next come the input functions. First, I am unhappy with the amount
of testing that has been put into ndistinct, the first and only input
facility I've looked at in detail for the moment. I have quickly
spotted a couple of issues while testing buggy input, like this one
that crashes on a pointer dereference, not good obviously:
SELECT '[]'::pg_ndistinct;
There was a second one with the error message generated when using an
incorrect key value.
Second, the inputs are too permissive and could be more strictly
checked IMHO. For example, patterns like these are incorrect, yet still
accepted with only the patches up to 0005 applied:
- Duplicated list of attributes:
SELECT '[{"attributes" : [2,3], "ndistinct" : 4},
{"attributes" : [2,3], "ndistinct" : 4}]'::pg_ndistinct;
- Partial (K,N) sets: for example, if we take stats on attrs (1,2,3),
a partial input like this one is accepted:
SELECT '[{"attributes" : [1,3], "ndistinct" : 4},
{"attributes" : [1,2,3], "ndistinct" : 4}]'::pg_ndistinct;
These are checked in the patches that introduce the functions, for
example with pg_ndistinct_validate_items(), based on the list of
stxkeys we have. However, I think that this is not enough by itself. Shouldn't
we check that the list of items in the array is what we expect based
on the longest "attributes" array at least, even after the JSON has
been parsed? That would be cheap to check in the input function itself,
at least as a first layer of checks before trying something with the
import function and cross-checking the list of attributes for the
extended statistics object. This means checking that for N attributes
we have all the elements we'd expect in each element of the array,
without gaps or duplications, with an extra step done once the JSON
parsing is finished. Except for this sanity issue this part of the
patch set should be mostly OK, plus more cleanup and more typo/grammar
fixes.
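
A minimal sketch of the extra structural check described above, written
in Python purely for illustration. It is not the code from the patch
set, which would live in the C input function. The helper name
validate_ndistinct() is hypothetical, and the sketch assumes that an
ndistinct object over N attributes carries exactly one item for every
combination of two or more of those attributes.

```python
import json
from itertools import combinations


def validate_ndistinct(text):
    """Parse a pg_ndistinct-like JSON string and check its item list.

    Hypothetical helper for illustration only. Raises ValueError when the
    array is empty, malformed, has duplicates, or has gaps.
    """
    items = json.loads(text)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(items, list) or not items:
        raise ValueError("expected a non-empty array of items")

    seen = set()
    for item in items:
        if not isinstance(item, dict) or set(item) != {"attributes", "ndistinct"}:
            raise ValueError("item must have exactly 'attributes' and 'ndistinct'")
        attrs = item["attributes"]
        if not isinstance(attrs, list) or len(attrs) < 2:
            raise ValueError("'attributes' must list at least two attributes")
        if not all(type(a) is int for a in attrs):
            raise ValueError("attribute numbers must be integers")
        key = frozenset(attrs)
        if len(key) != len(attrs):
            raise ValueError("attribute repeated within one item")
        if key in seen:
            raise ValueError("duplicated list of attributes")
        seen.add(key)

    # Every combination of two or more attributes drawn from the longest
    # list must appear exactly once: no gaps, nothing extra.
    longest = sorted(max(seen, key=len))
    expected = {
        frozenset(combo)
        for size in range(2, len(longest) + 1)
        for combo in combinations(longest, size)
    }
    if seen != expected:
        raise ValueError("item list does not match the expected combinations")
    return items
```

Under these assumptions, the empty array, the duplicated list and the
partial set quoted above are all rejected, while the full set of four
items for attributes (1,2,3) passes. Cross-checking against stxkeys
would still be a separate, later step.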
I suspect a similar family of issues with pg_dependencies, and it
would be nice to move the tests with the input function into a new
regression file, like the other one.
I've rebased the full set using the new structure. 0001~0004 are
clean. 0005~ need more work and analysis, but that's a start.
--
Michael