Re: Extended Statistics set/restore/clear functions.

From: Corey Huinker <corey.huinker@gmail.com>
To: jian he <jian.universality@gmail.com>
Cc: Michael Paquier <michael@paquier.xyz>, Tomas Vondra <tomas@vondra.me>, pgsql-hackers@lists.postgresql.org, tgl@sss.pgh.pa.us
Date: 2025-11-14T05:49:23Z
Lists: pgsql-hackers

>
> extended statistics surely won't work on system columns,
> how should we deal with cases like:
> ```
> {"attributes": [6, -1], "ndistinct": 14}
> {"attributes": [6, -7], "ndistinct": 14},
> ```
> issue a warning or error out saying that your attribute number is invalid?
> Should we discourage using system columns as examples in comments here?
>

Those aren't system columns. A negative number -N refers to the Nth
expression defined in the extended statistics object. So if you have
extended statistics on a, b, length(a), length(b) then you can legally have
-1 and -2 in the attributes, but nothing lower than that.
See the functions pg_ndistinct_validate_items() and
pg_dependencies_validate_deps(), which check the attributes in the value
against the definition of the extended stats object.
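
To illustrate the rule (this is only a rough sketch, not the code in the
patch): the helper name attnum_is_valid() and its arguments are made up for
this example, and I'm assuming a and b are attribute numbers 1 and 2.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not from the patch: report whether attnum may appear
 * in a value belonging to an extended statistics object defined on the plain
 * columns in column_attnums[] plus n_exprs expressions.
 */
static bool
attnum_is_valid(int attnum, const int *column_attnums, int n_columns,
                int n_exprs)
{
    assert(n_columns >= 0 && n_exprs >= 0);

    /* Negative numbers name expressions: only -1 .. -n_exprs are legal. */
    if (attnum < 0)
        return attnum >= -n_exprs;

    /* Positive numbers must be one of the columns in the definition. */
    for (int i = 0; i < n_columns; i++)
    {
        if (column_attnums[i] == attnum)
            return true;
    }

    /* Zero, or a column that is not part of the statistics object. */
    return false;
}
```

With statistics on a, b, length(a), length(b), that is column_attnums = {1, 2}
and n_exprs = 2, this accepts -1 and -2 and rejects -3, -7 and 6.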

Though this does bring up a small naming problem: the elements of a
pg_dependencies are Dependency, abbreviated to dep, but they are also called
items, like the elements of a pg_ndistinct. We should pick one standard name
for such things (probably item) and use it everywhere.


>
> I have added more test code in src/test/regress/sql/pg_ndistinct.sql,
> to improve the code coverage.
>

I'm trying to implement those test cases, but I may have missed some.


>
>
> this (and many other places) looks wrong, because
> ereturn would really return ``(Datum) 0``, and this function returns
> JsonParseErrorType.
> so we have to errsave here.
>

Good point. Implemented.
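
For anyone following along, here is a standalone sketch of the problem.
Everything prefixed Mock/mock_ is a stand-in I made up so that it compiles
outside the tree; it only mimics the shape of ereturn() (errsave plus
"return dummy_value") and assumes, as I understand the real enum, that
JSON_SUCCESS is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for Datum, JsonParseErrorType and the soft-error context. */
typedef uintptr_t MockDatum;

typedef enum
{
    MOCK_JSON_SUCCESS = 0,
    MOCK_JSON_SEM_ACTION_FAILED
} MockJsonParseErrorType;

typedef struct
{
    bool        error_occurred;
    const char *message;
} MockErrorContext;

/* Record the error in the context instead of throwing it. */
static void
mock_errsave(MockErrorContext *ctx, const char *message)
{
    assert(ctx != NULL);
    ctx->error_occurred = true;
    ctx->message = message;
}

/* Same shape as ereturn(): save the error, then return the dummy value. */
#define mock_ereturn(ctx, dummy_value, message) \
    do { mock_errsave(ctx, message); return dummy_value; } while (0)

/* Problem: (MockDatum) 0 converts to MOCK_JSON_SUCCESS, hiding the failure. */
static MockJsonParseErrorType
callback_with_ereturn(MockErrorContext *ctx, int attnum)
{
    if (attnum == 0)
        mock_ereturn(ctx, (MockDatum) 0, "invalid attribute number");
    return MOCK_JSON_SUCCESS;
}

/* Fix: save the error, then return an explicit failure code. */
static MockJsonParseErrorType
callback_with_errsave(MockErrorContext *ctx, int attnum)
{
    if (attnum == 0)
    {
        mock_errsave(ctx, "invalid attribute number");
        return MOCK_JSON_SEM_ACTION_FAILED;
    }
    return MOCK_JSON_SUCCESS;
}
```

In the first callback the error is saved but the parser is told everything
went fine; in the second the return value matches the saved error.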



>
> NDistinctParseState.ndistinct should be integer,
> otherwise pg_ndistinct_out will not be consistent with pg_ndistinct_in?
>

+1

Implemented many, but not all, of these suggestions.