Re: Extended Statistics set/restore/clear functions.

Corey Huinker <corey.huinker@gmail.com>

From: Corey Huinker <corey.huinker@gmail.com>
To: Michael Paquier <michael@paquier.xyz>
Cc: Tomas Vondra <tomas@vondra.me>, jian he <jian.universality@gmail.com>, pgsql-hackers@lists.postgresql.org, tgl@sss.pgh.pa.us
Date: 2025-11-12T06:47:33Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Add test doing some cloning of extended statistics data

  2. Add test for pg_restore_extended_stats() with multiranges

  3. Add support for "mcv" in pg_restore_extended_stats()

  4. Include extended statistics data in pg_dump

  5. Add support for "dependencies" in pg_restore_extended_stats()

  6. Add test for MAINTAIN permission with pg_restore_extended_stats()

  7. Add pg_restore_extended_stats()

  8. Add routine to free MCVList

  9. Improve pg_clear_extended_stats() with incorrect relation/stats combination

  10. Add pg_clear_extended_stats()

  11. Introduce routines to validate and free MVNDistinct and MVDependencies

  12. Fix typo in stat_utils.c

  13. Move attribute statistics functions to stat_utils.c

  14. Improve error messages of input functions for pg_dependencies and pg_ndistinct

  15. Improve test output of extended statistics for ndistinct and dependencies

  16. Fix some compiler warnings

  17. Add input function for data type pg_dependencies

  18. Add input function for data type pg_ndistinct

  19. Rework output format of pg_dependencies

  20. Rework output format of pg_ndistinct

  21. Fix comments of output routines for pg_ndistinct and pg_dependencies

  22. Move code specific to pg_dependencies to new file

  23. Move code specific to pg_ndistinct to new file

  24. Document some structures in attribute_stats.c

  25. Fix FATAL message for invalid recovery timeline at beginning of recovery

Attachments

>
> Thanks for the new patch.  And FWIW I disagree with this approach:
> cleanup and refactoring pieces make more sense if done first, as these
> lead to less code churn in the final result.  So...  I've begun to put
> my hands on the patch set.  The whole has been restructured a bit, as
> per the attached.  Patch 0001 to 0004 feel OK here, these include two
> code moves and the two output functions:
> - Two new files for adt/, that I'm planning to apply soon as a
> separate cleanup.
> - New output functions, with keys added to a new header named
> statistics_format.h, for frontend and backend consumption.
>

Agreed, 0001-0004 all look good.


> Next comes the input functions.  First, I am unhappy with the amount
> of testing that has been put into ndistinct, first and only input
> facility I've looked at in details for the moment.  I have quickly
> spotted a couple a few issues while testing buggy input, like this one
> that crashes on pointer dereference, not good obviously:
> SELECT '[]'::pg_ndistinct;
>

- I put some work into more specific error messages for invalid values for
both pg_ndistinct and pg_dependencies.
- The check for empty attribute lists and item lists now occur in the
array-end event handler.
- Also tried to standardize conventions between the two data types (switch
statements, similar utility functions, etc).


>
> These are checked in the patches that introduce the functions like
> with pg_ndistinct_validate_items(), based on the list of stxkeys we
> have.  However, I think that this is not enough by itself.  Shouldn't
> we check that the list of items in the array is what we expect based
> on the longest "attributes" array at least, even after a JSON that was
> parsed?  That would be cheap to check in the output function itself,
> at least as a first layer of checks before trying something with the
> import function and cross-checking the list of attributes for the
> extended statistics object.


I added tests for both duplicate attribute sequences as well as making the
first-longest attribute sequence the template by which all later and
shorter sequences are checked. I had been reluctant to add checks like
this, because so many similar validations were removed from the earlier
statistics code like histograms and the like.


>
> I suspect a similar family of issues with pg_dependencies, and it
> would be nice to move the tests with the input function into a new
> regression file, like the other one.
>

Did so.

0001-0004,0007,0009 unchanged.

Significant modification of the stats_import.sql regression tests in 0008
to conform to stricter datatype rules enacted in 0005, 0006.