Re: Extended Statistics set/restore/clear functions.

From: Corey Huinker <corey.huinker@gmail.com>
To: jian he <jian.universality@gmail.com>
Cc: Tomas Vondra <tomas@vondra.me>, pgsql-hackers@lists.postgresql.org
Date: 2025-03-04T20:30:11Z
Lists: pgsql-hackers

Commits

  1. Add test doing some cloning of extended statistics data

  2. Add test for pg_restore_extended_stats() with multiranges

  3. Add support for "mcv" in pg_restore_extended_stats()

  4. Include extended statistics data in pg_dump

  5. Add support for "dependencies" in pg_restore_extended_stats()

  6. Add test for MAINTAIN permission with pg_restore_extended_stats()

  7. Add pg_restore_extended_stats()

  8. Add routine to free MCVList

  9. Improve pg_clear_extended_stats() with incorrect relation/stats combination

  10. Add pg_clear_extended_stats()

  11. Introduce routines to validate and free MVNDistinct and MVDependencies

  12. Fix typo in stat_utils.c

  13. Move attribute statistics functions to stat_utils.c

  14. Improve error messages of input functions for pg_dependencies and pg_ndistinct

  15. Improve test output of extended statistics for ndistinct and dependencies

  16. Fix some compiler warnings

  17. Add input function for data type pg_dependencies

  18. Add input function for data type pg_ndistinct

  19. Rework output format of pg_dependencies

  20. Rework output format of pg_ndistinct

  21. Fix comments of output routines for pg_ndistinct and pg_dependencies

  22. Move code specific to pg_dependencies to new file

  23. Move code specific to pg_ndistinct to new file

  24. Document some structures in attribute_stats.c

  25. Fix FATAL message for invalid recovery timeline at beginning of recovery

>
>
>> I think my initial reaction is to just refuse those special values, but
>> I'll look into the parsing code to see what can be done.
>>
>
> I noticed that the output function for pg_ndistinct casts that value to an
> integer before formatting it with %d, so it's being treated as an integer even
> if it is not stored as one. After some consultation with Tomas, it made the
> most sense to just replicate this on the input side as well, and that is
> addressed in the patches below.
>
> I've updated and rebased the patches.
>
> The existing pg_ndistinct and pg_dependencies formats were kept as-is. The
> formats are clumsy, and more processing-friendly formats would be easier to
> work with, but the need for such processing is minimal, bordering on
> theoretical, so there is little impact in keeping the historical format.
>
> There are now checks to ensure that the pg_ndistinct or pg_dependencies
> value assigned to an extended statistics object actually makes sense for
> that object. What this amounts to is checking that, for every attnum cited,
> the positive attnums are also found in the stxkeys of the
> pg_statistic_ext tuple, and the negative attnums do not exceed, in absolute
> value, the number of expressions in the stats object. In other words, if the
> stats object has no expressions in it, then no negative numbers will be
> accepted; if it has 2 expressions, then any value of -3 or lower will be
> rejected, etc.
>
> All patches rebased to 71f17823ba010296da9946bd906bb8bcad6325bc.
>
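The attnum check described in the quoted text can be sketched roughly as follows. This is only an illustration in Python, not the patch's C code: the function name and arguments are hypothetical, and rejecting an attnum of zero is an assumption that the message does not state.

```python
def attnums_valid_for_stats_object(attnums, stxkeys, num_exprs):
    """Return True if every cited attnum makes sense for the stats object.

    attnums   -- attnums cited in a pg_ndistinct / pg_dependencies value
    stxkeys   -- column attnums of the stats object (as in pg_statistic_ext)
    num_exprs -- number of expressions in the stats object
    All names here are hypothetical and for illustration only.
    """
    keys = set(stxkeys)
    for attnum in attnums:
        if attnum > 0:
            # Positive attnums must be one of the object's stxkeys.
            if attnum not in keys:
                return False
        elif attnum < 0:
            # Negative attnums refer to expressions: -1 .. -num_exprs.
            if -attnum > num_exprs:
                return False
        else:
            # Assumption: zero is never a valid attnum here.
            return False
    return True
```

With two expressions, -1 and -2 would be accepted and -3 rejected; with no expressions, every negative attnum would be rejected.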

Rebased again, with a few changes:
* regnamespace and name parameters changed to statistics_schemaname as text
and statistics_name as text, so that there's one less thing that can
potentially fail in an upgrade
* schema lookup and stat name lookup failures now issue a warning and
return false, rather than ERROR
* elevel replaced with hardcoded WARNING almost everywhere, as has been done
with relation/attribute stats
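
To illustrate the warn-and-return-false behavior for failed lookups, here is a minimal Python sketch. It is not the patch's code: the function name, the dict-based catalog, and the warning texts are all hypothetical stand-ins for the real schema and statistics-name lookups.

```python
import warnings


def lookup_stats_object(catalog, statistics_schemaname, statistics_name):
    """Return True if the stats object is found; warn and return False if not.

    catalog is a hypothetical dict mapping schema name -> set of stats names.
    A failed lookup is reported as a warning instead of raising an error.
    """
    schema = catalog.get(statistics_schemaname)
    if schema is None:
        warnings.warn(f'schema "{statistics_schemaname}" not found')
        return False
    if statistics_name not in schema:
        warnings.warn(
            f'statistics object "{statistics_schemaname}.{statistics_name}" '
            "not found"
        )
        return False
    return True
```

The idea is that a caller, for instance a restore that issues many such calls, can carry on past one missing object rather than abort.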