Re: Extended Statistics set/restore/clear functions.

From: Michael Paquier <michael@paquier.xyz>
To: Corey Huinker <corey.huinker@gmail.com>
Cc: jian he <jian.universality@gmail.com>, Tomas Vondra <tomas@vondra.me>, pgsql-hackers@lists.postgresql.org, tgl@sss.pgh.pa.us
Date: 2025-11-17T06:56:06Z
Lists: pgsql-hackers

Commits

  1. Add test doing some cloning of extended statistics data

  2. Add test for pg_restore_extended_stats() with multiranges

  3. Add support for "mcv" in pg_restore_extended_stats()

  4. Include extended statistics data in pg_dump

  5. Add support for "dependencies" in pg_restore_extended_stats()

  6. Add test for MAINTAIN permission with pg_restore_extended_stats()

  7. Add pg_restore_extended_stats()

  8. Add routine to free MCVList

  9. Improve pg_clear_extended_stats() with incorrect relation/stats combination

  10. Add pg_clear_extended_stats()

  11. Introduce routines to validate and free MVNDistinct and MVDependencies

  12. Fix typo in stat_utils.c

  13. Move attribute statistics functions to stat_utils.c

  14. Improve error messages of input functions for pg_dependencies and pg_ndistinct

  15. Improve test output of extended statistics for ndistinct and dependencies

  16. Fix some compiler warnings

  17. Add input function for data type pg_dependencies

  18. Add input function for data type pg_ndistinct

  19. Rework output format of pg_dependencies

  20. Rework output format of pg_ndistinct

  21. Fix comments of output routines for pg_ndistinct and pg_dependencies

  22. Move code specific to pg_dependencies to new file

  23. Move code specific to pg_ndistinct to new file

  24. Document some structures in attribute_stats.c

  25. Fix FATAL message for invalid recovery timeline at beginning of recovery

On Fri, Nov 14, 2025 at 03:25:27PM +0900, Michael Paquier wrote:
> Thanks for the new versions, I'll also look at all these across the
> next couple of days.  Probably not at 0005~ for now.

0001 and 0002 from series v13 have been applied to change the output
functions.

And I have looked at 0003 in detail for now.  Attached is a revised
version of it, with many adjustments.  Some notes:
- Many portions of the coverage were missing.  I have measured the
coverage at 91% with the updated version attached.  This includes
coverage for some error reporting, something that we rely a lot on for
this code.
- The error reports are made simpler, with the token values getting
hidden.  While testing with some fancy values, I have actually noticed
that the error handling for the parsing of the int16 and int32 values
was incorrect: the error reports used what the safe functions
generated, not the reports from the data type.
- Passing down arbitrary byte sequences was leading to these bytes
being reported in the error outputs because we cared about the token
values.  I have added a few tests based on that for the code paths
involved.

There is an extra thing that bugs me as incorrect for the pg_ndistinct
input, something I have not tackled myself yet.  Your patch checks
that subsets of attributes are included in the longest set found, but
it does not match the guarantees we have in mvndistinct.c: we have to
check that *all* the combinations generated by generator_init() are
satisfied based on the longest set of attributes detected.  For example,
this is accepted as correct by the input function:
SELECT '[{"attributes" : [-1,2], "ndistinct" : 1},
         {"attributes" : [-1,2,3], "ndistinct" : 3}]'::pg_ndistinct;

However it is obviously not correct as we are missing an element for
the attributes [-1, 3].  The simplest solution would be to export the
routines that generate the groups now in mvndistinct.c.  Also we
should make sure that the number of elements in the arrays matches
the number of groups we expect, not only the elements.  I don't think
that we need to care much about the values, but we ought to provide
stronger guarantees for the attributes listed in these elements.
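
To make the missing-combination check concrete, here is a minimal
sketch in Python rather than in the backend's C.  It assumes that every
combination of two or more attributes taken from the longest set is
expected, which is my reading of what the generator in mvndistinct.c
produces.  The function names are hypothetical and are not the routines
of the patch.  Under that assumption, the input above lacks [-1, 3] and
also [2, 3].

```python
from itertools import combinations


def expected_groups(attrs):
    """All combinations of two or more attributes, as sorted tuples.

    Assumption: this mirrors the groups built in mvndistinct.c.
    """
    attrs = sorted(attrs)
    return {
        combo
        for k in range(2, len(attrs) + 1)
        for combo in combinations(attrs, k)
    }


def check_ndistinct_items(items):
    """Compare the attribute groups found with the expected ones.

    Returns (missing, unexpected, count_ok).  Expects a non-empty list.
    """
    found = [tuple(sorted(item["attributes"])) for item in items]
    longest = max(found, key=len)
    expected = expected_groups(longest)
    missing = sorted(expected - set(found))
    unexpected = sorted(set(found) - expected)
    # Checking the count as well catches duplicated groups.
    count_ok = len(found) == len(expected)
    return missing, unexpected, count_ok


# Same input as the SQL example above, already parsed from JSON.
items = [
    {"attributes": [-1, 2], "ndistinct": 1},
    {"attributes": [-1, 2, 3], "ndistinct": 3},
]
missing, unexpected, count_ok = check_ndistinct_items(items)
print(missing, unexpected, count_ok)
```

The values of "ndistinct" are deliberately ignored here; only the
attribute groups and their count are checked.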

Except for this argument, the input of pg_ndistinct feels OK in terms
of the guarantees that we'd want to enforce on an import.  The same
argument applies in terms of attribute number guarantees for
pg_dependencies, based on DependencyGenerator_init() & friends in
dependencies.c.  Could you look at that?

For pg_dependencies, we also require some checks on the value for
"dependency", of course, making sure that this matches what's
expected for the "largest" sets of attributes.  In this case, we need
to track the union of "dependency" and "attributes", with "attributes"
having at least one element.
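
A rough Python sketch of these per-item checks follows.  It only
illustrates the idea: the function name is hypothetical, the rule that
"dependency" must not also appear in "attributes" is an assumption of
mine, and the real checks would live in the C input function.  No
completeness check across items is attempted here.

```python
def check_dependency_items(items):
    """Return a list of problems found in a parsed pg_dependencies-like list."""
    problems = []
    unions = []
    for pos, item in enumerate(items):
        attrs = item["attributes"]
        dep = item["dependency"]
        if len(attrs) < 1:
            problems.append(f"item {pos}: empty attributes")
        # Assumption: an attribute cannot depend on itself.
        if dep in attrs:
            problems.append(f"item {pos}: dependency listed in attributes")
        # Track the union of "attributes" and "dependency" for each item.
        unions.append(frozenset(attrs) | {dep})
    # The largest union stands in for the full set of attributes.
    largest = max(unions, key=len, default=frozenset())
    for pos, union in enumerate(unions):
        if not union <= largest:
            problems.append(f"item {pos}: attributes outside the largest set")
    return problems


items = [
    {"attributes": [2], "dependency": 3},
    {"attributes": [2, 3], "dependency": -1},
    {"attributes": [], "dependency": 2},
    {"attributes": [4], "dependency": 2},
]
problems = check_dependency_items(items)
print(problems)
```

Any other keys in the items are ignored by this sketch.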

The tests of pg_dependencies also need to be extended more (I have
begun that a bit, but it is far from complete and I'm lacking time this
week due to a conference).  One thing that I would add are nested JSON
objects in the paths where we expect values, for example.  Please note
that I have done a cleanup pass on 0004 while at it, fixing typos and
inconsistencies, and making the error codes consistent with the
ndistinct case where possible.  This is not ready, but at least it's a
start to rely on.

In terms of committable bits, it would be better to apply the input
functions once both parts are ready to go.  For now I have attached a
v14 with the work I've put into them.  0005~ are not reviewed yet, as
mentioned previously.  The changes in pg_dependencies are actually
straightforward to figure out (well, mostly) once the pg_ndistinct
changes are OK in shape.
--
Michael