Re: Extended Statistics set/restore/clear functions.
Michael Paquier <michael@paquier.xyz>
Commits
- Add test doing some cloning of extended statistics data (fc365e4fccc4, 19 unreleased, landed)
- Add test for pg_restore_extended_stats() with multiranges (0b7beec42ae2, 19 unreleased, landed)
- Add support for "mcv" in pg_restore_extended_stats() (efbebb4e8587, 19 unreleased, landed)
- Include extended statistics data in pg_dump (c32fb29e979d, 19 unreleased, landed)
- Add support for "dependencies" in pg_restore_extended_stats() (302879bd68d1, 19 unreleased, landed)
- Add test for MAINTAIN permission with pg_restore_extended_stats() (d9abd9e1050d, 19 unreleased, landed)
- Add pg_restore_extended_stats() (0e80f3f88dea, 19 unreleased, landed)
- Add routine to free MCVList (7ebb64c55757, 19 unreleased, landed)
- Improve pg_clear_extended_stats() with incorrect relation/stats combination (395b73c045e0, 19 unreleased, landed)
- Add pg_clear_extended_stats() (d756fa1019ff, 19 unreleased, landed)
- Introduce routines to validate and free MVNDistinct and MVDependencies (32e27bd32082, 19 unreleased, landed)
- Fix typo in stat_utils.c (eee19a30d60d, 19 unreleased, landed)
- Move attribute statistics functions to stat_utils.c (213a1b895270, 19 unreleased, landed)
- Improve error messages of input functions for pg_dependencies and pg_ndistinct (f68597ee777d, 19 unreleased, landed)
- Improve test output of extended statistics for ndistinct and dependencies (2f04110225ab, 19 unreleased, landed)
- Fix some compiler warnings (7bc88c3d6f3a, 19 unreleased, landed)
- Add input function for data type pg_dependencies (e1405aa5e3ac, 19 unreleased, landed)
- Add input function for data type pg_ndistinct (44eba8f06e55, 19 unreleased, landed)
- Rework output format of pg_dependencies (e76defbcf09e, 19 unreleased, landed)
- Rework output format of pg_ndistinct (1f927cce4498, 19 unreleased, landed)
- Fix comments of output routines for pg_ndistinct and pg_dependencies (040a39ed25bf, 19 unreleased, landed)
- Move code specific to pg_dependencies to new file (2ddc8d9e9baa, 19 unreleased, landed)
- Move code specific to pg_ndistinct to new file (a5523123430f, 19 unreleased, landed)
- Document some structures in attribute_stats.c (d6c132d83bff, 19 unreleased, landed)
- Fix FATAL message for invalid recovery timeline at beginning of recovery (71f17823ba01, 18.0, cited)
Attachments
- v18-0001-Add-working-input-function-for-pg_ndistinct.patch (text/x-diff) patch v18-0001
- v18-0002-Add-working-input-function-for-pg_dependencies.patch (text/x-diff) patch v18-0002
On Sat, Nov 22, 2025 at 03:26:19AM -0500, Corey Huinker wrote:
> I added a comment debating the feasibility of testing for subsets of
> attribute sets in pg_dependencies. Basically, I think we can't have the
> test at all, but I haven't removed it just yet pending consensus.
+ * Verify that all attnum sets are a proper subset of the first longest
+ * attnum set.
+ *
+ * TODO:
+ *
+ * I'm fairly certain that because statisticsally insignificant dependency
+ * combinations are not stored, there is a chance that the longest dependency
+ * does not exist, and therefore this test cannot be done. I have left the
+ * test in place for the time being until the issue can be definitively
+ * settled.
As you have already quoted upthread, statext_dependencies_build()
settles the issue on this one, I think. It is entirely possible that
any group returned by DependencyGenerator generates a degree value
that would prevent a given group from being stored, and this could as
well be the largest possible group there could be in the set. So we
cannot do any of that for dependencies, unfortunately. We can always
rely on the list of attributes when assigning the json blob to the
stats object, cross-checking that each attribute list matches with
the attribute numbers of the stats object. We can also check for
duplicates, which is better than nothing at all.
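To make the two remaining checks concrete, here is a minimal standalone sketch of what they could look like. It is not the code from the patch: attnum_t, attnum_in_list() and attnum_list_is_valid() are hypothetical names, and the stats object is reduced to a plain array of attribute numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's AttrNumber. */
typedef short attnum_t;

/* Return true if "attnum" appears in the first "n" entries of "list". */
static bool
attnum_in_list(const attnum_t *list, size_t n, attnum_t attnum)
{
    for (size_t i = 0; i < n; i++)
    {
        if (list[i] == attnum)
            return true;
    }
    return false;
}

/*
 * Validate one attribute list taken from the input: every attribute must
 * be known to the stats object, and no attribute may be listed twice.
 */
static bool
attnum_list_is_valid(const attnum_t *item, size_t nitem,
                     const attnum_t *stxkeys, size_t nkeys)
{
    assert(item != NULL && stxkeys != NULL);

    for (size_t i = 0; i < nitem; i++)
    {
        /* Cross-check with the attributes of the stats object. */
        if (!attnum_in_list(stxkeys, nkeys, item[i]))
            return false;

        /* Duplicate check against the entries already visited. */
        if (attnum_in_list(item, i, item[i]))
            return false;
    }
    return true;
}
```

With a stats object defined on attributes {1, 2, 3}, the list {2, 3} passes, while {2, 2} fails the duplicate check and {2, 5} fails the cross-check. Neither check says anything about which groups are present, which is the part that cannot be enforced for dependencies.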
Regarding the suggested check where we'd want to enforce all the
groups of attributes to be listed depending on the longest set we have
found: in the end, estimate_multivariate_ndistinct() checks the items
listed one-by-one, giving up if we cannot find something in the list
of items. I think that I am going to be content with the patch as it
is, without this piece. Let's add an extra SQL test to treat that as
valid input, though. So I am feeling OK with the input for ndistinct
at this stage. I have noticed a couple of issues in passing, and
adjusted them. We are reaching more than 90% of coverage with the
tests, and I am not sure that we can actually reach the rest unless
one of the previous steps failed.
So that's one. Now on to the second patch, for the input of the
dependencies.
+SELECT '[{"attributes" : [2], "dependency" : 4, "degree": "NaN"}]'::pg_dependencies;
+SELECT '[{"attributes" : [2], "dependency" : 4, "degree": "-inf"}]'::pg_dependencies;
+SELECT '[{"attributes" : [2], "dependency" : 4, "degree": "inf"}]'::pg_dependencies;
+SELECT '[{"attributes" : [2], "dependency" : 4, "degree": "-inf"}]'::pg_dependencies::text::pg_dependencies;
Okay, I have to admit that these ones are fun. I doubt that anybody
would actually do that, and these do not produce valid json objects,
which is what the last case shows. Hmm, it makes sense to keep these,
and I still think that we should not care too much about applying
checks on the values and complicating the input function more than
that, so fine by me.
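As an illustration of why these values fail the round trip, here is a small standalone sketch. It assumes the degree is read with a permissive float parser such as C's strtod(), which is an assumption about the input path and not a description of the patch; parse_degree() and degree_is_json_number() are hypothetical helpers.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse "str" as a double, the way a permissive input routine might.
 * Returns false only if the string is not a float at all.
 */
static bool
parse_degree(const char *str, double *result)
{
    char *end;

    assert(str != NULL && result != NULL);

    *result = strtod(str, &end);

    /* Require at least one consumed character and no trailing garbage. */
    return end != str && *end == '\0';
}

/* JSON numbers can only represent finite values. */
static bool
degree_is_json_number(double degree)
{
    return isfinite(degree);
}
```

Under that assumption, "NaN", "inf" and "-inf" are all accepted as floats, but none of them can be printed back as a JSON number, so casting the text output back to the type cannot succeed. Rejecting them would mean one more value check in the input function, which is the complication argued against above.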
There were a couple of things in the tests, which were missing quite a
few soft errors. There were many typos and grammar mistakes overall.
Also, please do not split the error strings across multiple lines, so
that they remain greppable. There is also no need for a break after a
return. In some cases, a return was used where a break made more sense,
as the default path returned a failure.
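The three style points can be shown in one short sketch. Everything in it is made up for illustration (the parse_state_t states, handle_scalar() and both messages); it is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parse states, for illustration only. */
typedef enum
{
    STATE_EXPECT_ATTRIBUTES,
    STATE_EXPECT_DEGREE,
    STATE_DONE
} parse_state_t;

static bool
handle_scalar(parse_state_t state, const char **errmsg)
{
    assert(errmsg != NULL);

    switch (state)
    {
        case STATE_EXPECT_DEGREE:
            /* Success path: a return needs no break after it. */
            return true;

        case STATE_EXPECT_ATTRIBUTES:
            /* Keep the message on a single line so it stays greppable. */
            *errmsg = "malformed input: expected an array of attribute numbers";
            break;

        case STATE_DONE:
            *errmsg = "malformed input: unexpected value after the end of an item";
            break;
    }

    /* Shared failure path reached by every break above. */
    return false;
}
```

The error cases use break because the code after the switch already returns the failure, so each case only has to set its own message.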
The TODO in build_mvdependencies() could be an elog(), but I have left
it untouched for the errdetail().
We're reaching 91% of coverage here, not bad. The rest does not seem
reachable, as far as I can see.
With that said, attached is a v18 for the first two patches with the
input functions. Comments and/or opinions?
--
Michael