Re: Extended Statistics set/restore/clear functions.

Corey Huinker <corey.huinker@gmail.com>

From: Corey Huinker <corey.huinker@gmail.com>
To: Tomas Vondra <tomas@vondra.me>
Cc: pgsql-hackers@lists.postgresql.org
Date: 2025-01-27T19:52:15Z
Lists: pgsql-hackers

Commits

  1. Add test doing some cloning of extended statistics data

  2. Add test for pg_restore_extended_stats() with multiranges

  3. Add support for "mcv" in pg_restore_extended_stats()

  4. Include extended statistics data in pg_dump

  5. Add support for "dependencies" in pg_restore_extended_stats()

  6. Add test for MAINTAIN permission with pg_restore_extended_stats()

  7. Add pg_restore_extended_stats()

  8. Add routine to free MCVList

  9. Improve pg_clear_extended_stats() with incorrect relation/stats combination

  10. Add pg_clear_extended_stats()

  11. Introduce routines to validate and free MVNDistinct and MVDependencies

  12. Fix typo in stat_utils.c

  13. Move attribute statistics functions to stat_utils.c

  14. Improve error messages of input functions for pg_dependencies and pg_ndistinct

  15. Improve test output of extended statistics for ndistinct and dependencies

  16. Fix some compiler warnings

  17. Add input function for data type pg_dependencies

  18. Add input function for data type pg_ndistinct

  19. Rework output format of pg_dependencies

  20. Rework output format of pg_ndistinct

  21. Fix comments of output routines for pg_ndistinct and pg_dependencies

  22. Move code specific to pg_dependencies to new file

  23. Move code specific to pg_ndistinct to new file

  24. Document some structures in attribute_stats.c

  25. Fix FATAL message for invalid recovery timeline at beginning of recovery

Attachments

>
> I'd like to merge these down to 3 patches again, but I'm keeping them
> separate for this patchset to isolate the attnum-checking code for this
> go-round.
>

These are mock-ups of the to/from JSON functions, but they build from/to text
rather than the not-yet-committed pg_ndistinct and pg_dependencies data
types. Currently they use JSON rather than JSONB because I assume that the
ordering within the data type matters. We can probably preserve order by
adding an "order" field populated by WITH ORDINALITY.
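A minimal sketch of that round trip, written in Python rather than SQL purely for illustration. It relies on the legacy pg_ndistinct text output (e.g. `{"1, 2": 11}`) already being a valid JSON object. The target layout (an array of objects with `order`, `attributes` and `ndistinct` fields) and the function names are hypothetical, not taken from the patches; `order` stands in for the WITH ORDINALITY column.

```python
import json


def ndistinct_text_to_json(legacy: str) -> str:
    """Convert legacy pg_ndistinct text such as '{"1, 2": 11}' into a
    hypothetical JSON array of objects with an explicit "order" field."""
    # object_pairs_hook=list returns the (key, value) pairs in input order,
    # playing the role of WITH ORDINALITY in the SQL mock-ups.
    pairs = json.loads(legacy, object_pairs_hook=list)
    items = []
    for position, (key, ndistinct) in enumerate(pairs, start=1):
        # Keys are comma-separated attnums, e.g. "1, 2, 3".
        attnums = [int(a) for a in key.split(",")]
        items.append({"order": position, "attributes": attnums, "ndistinct": ndistinct})
    return json.dumps(items)


def ndistinct_json_to_text(doc: str) -> str:
    """Rebuild the legacy text, sorting on "order" so the result does not
    depend on the position of items in the intermediate document."""
    items = sorted(json.loads(doc), key=lambda item: item["order"])
    parts = [
        '"%s": %d' % (", ".join(str(a) for a in item["attributes"]), item["ndistinct"])
        for item in items
    ]
    return "{" + ", ".join(parts) + "}"
```

JSONB does not preserve object key order (array elements do keep theirs), so carrying an explicit position makes the reconstruction independent of which JSON type, or which intermediate shape, is used.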

To get all Jurassic about this: having spent some time thinking about
whether we CAN make these functions, it's time to consider whether we
SHOULD. And that leads me to a couple of points:

p1. We could switch to the new formats without any change to the internal
representation, but pg_dump would always need to know about the old formats,
since it can dump from older servers that only emit them.

p2. The JSON format is both more understandable and easier to manipulate.

p3. If we thought the number of people using extended stats was small, the
number of people tweaking extended stats is going to be smaller.
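To make p1 concrete: whichever format wins, something has to keep parsing the old text. A sketch for the legacy pg_dependencies output, whose keys look like `"3, 4 => 6"` and map to a degree, again in Python and again assuming a hypothetical array-of-objects target with `order`, `attributes`, `dependency` and `degree` fields (names are illustrative only).

```python
import json


def dependencies_text_to_json(legacy: str) -> str:
    """Convert legacy pg_dependencies text such as '{"3 => 4": 1.000000}'
    into a hypothetical JSON array of objects with an explicit "order"."""
    # The legacy output is a valid JSON object; keep its pairs in order.
    pairs = json.loads(legacy, object_pairs_hook=list)
    items = []
    for position, (key, degree) in enumerate(pairs, start=1):
        # "a, b => c": determinant attnums on the left, dependent attnum on the right.
        lhs, rhs = key.split("=>")
        items.append({
            "order": position,
            "attributes": [int(a) for a in lhs.split(",")],
            "dependency": int(rhs),
            "degree": degree,
        })
    return json.dumps(items)
```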

So that gives us a few paths forward:

o1. Switch to the new input/output format, and the queries inside these
functions get incorporated into some future pg_dump queries.

o2. Keep the old formats, create these functions inside the system and
whoever wants to use them can use them.

o3. Keep old formats, and make these functions work as the CASTs to and
from JSON/JSONB.

o4. Keep old formats, and create these functions in an extension.

o5. Keep old formats, leave these function definitions here for a future
intrepid hacker.