Re: Extended Statistics set/restore/clear functions.
Tomas Vondra <tomas@vondra.me>
Commits
- Add test doing some cloning of extended statistics data
  fc365e4fccc4, 19 (unreleased), landed
- Add test for pg_restore_extended_stats() with multiranges
  0b7beec42ae2, 19 (unreleased), landed
- Add support for "mcv" in pg_restore_extended_stats()
  efbebb4e8587, 19 (unreleased), landed
- Include extended statistics data in pg_dump
  c32fb29e979d, 19 (unreleased), landed
- Add support for "dependencies" in pg_restore_extended_stats()
  302879bd68d1, 19 (unreleased), landed
- Add test for MAINTAIN permission with pg_restore_extended_stats()
  d9abd9e1050d, 19 (unreleased), landed
- Add pg_restore_extended_stats()
  0e80f3f88dea, 19 (unreleased), landed
- Add routine to free MCVList
  7ebb64c55757, 19 (unreleased), landed
- Improve pg_clear_extended_stats() with incorrect relation/stats combination
  395b73c045e0, 19 (unreleased), landed
- Add pg_clear_extended_stats()
  d756fa1019ff, 19 (unreleased), landed
- Introduce routines to validate and free MVNDistinct and MVDependencies
  32e27bd32082, 19 (unreleased), landed
- Fix typo in stat_utils.c
  eee19a30d60d, 19 (unreleased), landed
- Move attribute statistics functions to stat_utils.c
  213a1b895270, 19 (unreleased), landed
- Improve error messages of input functions for pg_dependencies and pg_ndistinct
  f68597ee777d, 19 (unreleased), landed
- Improve test output of extended statistics for ndistinct and dependencies
  2f04110225ab, 19 (unreleased), landed
- Fix some compiler warnings
  7bc88c3d6f3a, 19 (unreleased), landed
- Add input function for data type pg_dependencies
  e1405aa5e3ac, 19 (unreleased), landed
- Add input function for data type pg_ndistinct
  44eba8f06e55, 19 (unreleased), landed
- Rework output format of pg_dependencies
  e76defbcf09e, 19 (unreleased), landed
- Rework output format of pg_ndistinct
  1f927cce4498, 19 (unreleased), landed
- Fix comments of output routines for pg_ndistinct and pg_dependencies
  040a39ed25bf, 19 (unreleased), landed
- Move code specific to pg_dependencies to new file
  2ddc8d9e9baa, 19 (unreleased), landed
- Move code specific to pg_ndistinct to new file
  a5523123430f, 19 (unreleased), landed
- Document some structures in attribute_stats.c
  d6c132d83bff, 19 (unreleased), landed
- Fix FATAL message for invalid recovery timeline at beginning of recovery
  71f17823ba01, 18.0, cited

On 10/23/25 01:46, Michael Paquier wrote:
> On Wed, Oct 22, 2025 at 02:55:31PM +0300, Corey Huinker wrote:
>>> Do you have some numbers regarding the increase in size this generates
>>> for the catalogs?
>>
>> Sorry, I don't understand. There shouldn't be any increase inside the
>> catalogs as the internal storage of the datatypes hasn't changed, so I can
>> only conclude that you're referring to something else.
>
> The new format meant more characters, perhaps I've just missed
> something while quickly testing the patch.. Anyway, that's OK at this
> stage.
>
>> The equivalent structures in attribute_stats.c will need documenting too.
>
> Right. This sounds like a separate patch to me, impacting HEAD.
>
>> Right now we have a situation where the vast majority of databases can
>> carry forward all of their stats via pg_upgrade, except for those databases
>> that have extended stats. The trouble is, most customers don't know if
>> their database uses extended statistics or not, and those that do are in
>> for some bad query plans if they haven't run vacuumdb --missing-stats-only.
>> Explaining that to customers is complicated, especially when most of them
>> do not know what extended stats are, let alone whether they have them. It
>> would be a lot simpler to just say "all stats are carried over on upgrade",
>> and vacuumdb becomes unnecessary, making upgrades one step simpler as well.
>
> Okay.
>
>> Given that, I think that the admittedly ugly transformation is worth it,
>> and sequestering it inside pg_dump is the smallest footprint it can have.
>> Earlier in this thread I posted some functions that did the translation
>> from the existing formats to the proposed new formats. We could include
>> those as new system functions, and that would make the dump code very
>> simple. Having said that, I don't know that there would be use for those
>> functions except inside pg_dump, hence the decision to do the transforms
>> right in the dump query.
>
> I'd prefer the new format. One killer pushing in favor of the new
> format that you are making upthread in favor of is that it makes much
> easier the viewing, editing and injecting of these stats. It's the
> part of the patch where we would need Tomas' input on the matter
> before deciding anything, I guess, as primary author of the original
> facilities. My view of the problem is just one opinion.
>

Sorry for not paying much attention to this thread ...

My opinion is that we should both use the new format and keep the
pg_dump code to allow upgrading from older pre-19 versions.

There really is nothing special about the current format - I should have
used JSON (or any other established format) from the beginning. But I
only saw that as a human-readable version of ephemeral data, it didn't
occur to me we'll use this to export/import stats across versions. So if
we need to adjust that to make new use cases more convenient, let's bite
the bullet now.

If doing both is too complex / ugly, I think the pg_upgrade capability
is more valuable. I'd rather keep the old, less convenient format to
have pg_upgrade support for all versions. Otherwise users may not
benefit from this pg_upgrade feature for a couple more years. Plenty of
users delay upgrading until the EOL gets close, and so might be unable
to dump/restore extended stats for the next ~5 years.

regards

-- 
Tomas Vondra