Re: Extended Statistics set/restore/clear functions.

From: Michael Paquier <michael@paquier.xyz>
To: Corey Huinker <corey.huinker@gmail.com>
Cc: jian he <jian.universality@gmail.com>, Tomas Vondra <tomas@vondra.me>, pgsql-hackers@lists.postgresql.org, tgl@sss.pgh.pa.us
Date: 2025-11-18T03:34:29Z
Lists: pgsql-hackers

Commits

  1. Add test doing some cloning of extended statistics data

  2. Add test for pg_restore_extended_stats() with multiranges

  3. Add support for "mcv" in pg_restore_extended_stats()

  4. Include extended statistics data in pg_dump

  5. Add support for "dependencies" in pg_restore_extended_stats()

  6. Add test for MAINTAIN permission with pg_restore_extended_stats()

  7. Add pg_restore_extended_stats()

  8. Add routine to free MCVList

  9. Improve pg_clear_extended_stats() with incorrect relation/stats combination

  10. Add pg_clear_extended_stats()

  11. Introduce routines to validate and free MVNDistinct and MVDependencies

  12. Fix typo in stat_utils.c

  13. Move attribute statistics functions to stat_utils.c

  14. Improve error messages of input functions for pg_dependencies and pg_ndistinct

  15. Improve test output of extended statistics for ndistinct and dependencies

  16. Fix some compiler warnings

  17. Add input function for data type pg_dependencies

  18. Add input function for data type pg_ndistinct

  19. Rework output format of pg_dependencies

  20. Rework output format of pg_ndistinct

  21. Fix comments of output routines for pg_ndistinct and pg_dependencies

  22. Move code specific to pg_dependencies to new file

  23. Move code specific to pg_ndistinct to new file

  24. Document some structures in attribute_stats.c

  25. Fix FATAL message for invalid recovery timeline at beginning of recovery

On Mon, Nov 17, 2025 at 09:32:37PM -0500, Corey Huinker wrote:
> So I looked at the generator functions, hoping they'd have enough in common
> that they could be made generic. And they're just different enough that I
> think it's not worth it to try.
> 
> But, if we don't care about the order of the combinations, I also don't
> think we need to expose the functions at all. We know exactly how many
> combinations there should be for any N attributes as each attribute must be
> unique. So if we have the right number of unique combinations, and they're
> all subsets of the first-longest, then we must have a complete set.
> Thoughts on that?
> 
> Getting _too_ tight with the ordering and contents makes me concerned for
> the day when the format might change. We don't want to _fail_ an upgrade
> because some of the combinations were in the wrong order.

That's fair.  The planner costing code pulling the stats numbers based
on the attributes was smart enough to not care much about the ordering
as far as I recall, but I'd rather make sure of that first.  This
needs a careful look at the code.
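
For illustration only, the counting argument quoted above can be
sketched as follows.  This is not PostgreSQL code: it is a minimal
Python sketch that assumes ndistinct-style items, where every subset
of two or more attributes of the longest item is expected exactly
once.  The names expected_combinations() and is_complete_set() are
made up for the example.

```python
def expected_combinations(n: int) -> int:
    """Number of subsets of n attributes that have at least two members."""
    # 2^n subsets in total, minus the empty set and the n singletons.
    return 2 ** n - n - 1


def is_complete_set(items) -> bool:
    """Order-independent completeness check (hypothetical helper).

    items is an iterable of attribute-number sequences, one per item.
    """
    items = [tuple(item) for item in items]
    if not items:
        return False

    # An attribute must not be repeated inside a single item.
    if any(len(set(item)) != len(item) for item in items):
        return False

    # The same combination must not be listed twice, in any order.
    combos = {frozenset(item) for item in items}
    if len(combos) != len(items):
        return False

    # Every item must have 2+ attributes and be a subset of the longest one.
    longest = max(combos, key=len)
    if any(len(c) < 2 or not c <= longest for c in combos):
        return False

    # Right number of unique subsets of the longest item => none is missing.
    return len(combos) == expected_combinations(len(longest))
```

With three attributes, is_complete_set([(3, 2, 1), (3, 1), (2, 1),
(3, 2)]) holds regardless of the order of the items or of the
attributes within them, while dropping any one pair makes it fail.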

>> These don't make sense anyway because they have a predictable and
>> perfectly matching correlation relationship.
>>
> 
> They do, for now, but are we willing to lock ourselves into that forever?

Perhaps not.  I cannot say for sure what the future is going to be
made of.

> Looking over those functions, they both could have used the same generator,
> but the dependencies-side decided that dependency order doesn't matter,
> which puts doubt in my head that the order is perfectly the same for both,
> so we'd better follow each individually IF we want to enforce order.

I'd try to look at the bits related to pg_dependencies and
pg_ndistinct as two separate concepts, at the end.  They're sort of
alike, but have too many differences already.
--
Michael