Re: Extended Statistics set/restore/clear functions.

jian he <jian.universality@gmail.com>

From: jian he <jian.universality@gmail.com>
To: Corey Huinker <corey.huinker@gmail.com>
Cc: Michael Paquier <michael@paquier.xyz>, Tomas Vondra <tomas@vondra.me>, pgsql-hackers@lists.postgresql.org, tgl@sss.pgh.pa.us
Date: 2025-11-13T11:51:00Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Add test doing some cloning of extended statistics data

  2. Add test for pg_restore_extended_stats() with multiranges

  3. Add support for "mcv" in pg_restore_extended_stats()

  4. Include extended statistics data in pg_dump

  5. Add support for "dependencies" in pg_restore_extended_stats()

  6. Add test for MAINTAIN permission with pg_restore_extended_stats()

  7. Add pg_restore_extended_stats()

  8. Add routine to free MCVList

  9. Improve pg_clear_extended_stats() with incorrect relation/stats combination

  10. Add pg_clear_extended_stats()

  11. Introduce routines to validate and free MVNDistinct and MVDependencies

  12. Fix typo in stat_utils.c

  13. Move attribute statistics functions to stat_utils.c

  14. Improve error messages of input functions for pg_dependencies and pg_ndistinct

  15. Improve test output of extended statistics for ndistinct and dependencies

  16. Fix some compiler warnings

  17. Add input function for data type pg_dependencies

  18. Add input function for data type pg_ndistinct

  19. Rework output format of pg_dependencies

  20. Rework output format of pg_ndistinct

  21. Fix comments of output routines for pg_ndistinct and pg_dependencies

  22. Move code specific to pg_dependencies to new file

  23. Move code specific to pg_ndistinct to new file

  24. Document some structures in attribute_stats.c

  25. Fix FATAL message for invalid recovery timeline at beginning of recovery

Attachments

hi.
now looking at v12-0003-Add-working-input-function-for-pg_ndistinct.patch
again.

+ * example input:
+ *     [{"attributes": [6, -1], "ndistinct": 14},
+ *      {"attributes": [6, -2], "ndistinct": 9143},
+ *      {"attributes": [-1,-2], "ndistinct": 13454},
+ *      {"attributes": [6, -1, -2], "ndistinct": 14549}]
  */
 Datum
 pg_ndistinct_in(PG_FUNCTION_ARGS)

extenssted statistics surely won't work on system columns,
how should we deal with case like:
```
{"attributes": [6, -1], "ndistinct": 14}
{"attributes": [6, -7], "ndistinct": 14},
```
issue a warning or error out saying that your attribute number is invalid?
Should we discourage using system columns as examples in comments here?

I have added more test code in src/test/regress/sql/pg_ndistinct.sql,
to improve the code coverage.

as mentioned before in
https://postgr.es/m/CACJufxEZYqocFdgn-x-bJMRBSk_zkS=ziGGkaSumteiPDksnsg@mail.gmail.com
I think it's a good thing to change
``(errcode....``
to
``errcode``.
So I did the change.


+static JsonParseErrorType
+ndistinct_array_element_start(void *state, bool isnull)
+{
+ NDistinctParseState *parse = state;
+
+ switch(parse->state)
+ {
+ case NDIST_EXPECT_ATTNUM:
+ if (!isnull)
+ return JSON_SUCCESS;
+
+ ereturn(parse->escontext, (Datum) 0,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
+ errdetail("Attnum list elements cannot be null.")));

this (and many other places) looks wrong, because
ereturn would really return ``(Datum) 0``, and this function returns
JsonParseErrorType.
so we have to errsave here.

+typedef struct
+{
+ const char *str;
+ NDistinctSemanticState state;
+
+ List   *distinct_items; /* Accumulated complete MVNDistinctItems */
+ Node   *escontext;
+
+ bool found_attributes; /* Item has an attributes key */
+ bool found_ndistinct; /* Item has ndistinct key */
+ List   *attnum_list; /* Accumulated attributes attnums */
+ int64 ndistinct;
+} NDistinctParseState;

+ case NDIST_EXPECT_NDISTINCT:
+ /*
+ * While the structure dictates that ndistinct in a double precision
+ * floating point, in practice it has always been an integer, and it
+ * is output as such. Therefore, we follow usage precendent over the
+ * actual storage structure, and read it in as an integer.
+ */
+ parse->ndistinct = pg_strtoint64_safe(token, parse->escontext);
+
+ if (SOFT_ERROR_OCCURRED(parse->escontext))
+ return JSON_SEM_ACTION_FAILED;

NDistinctParseState.ndistinct should be integer,
otherwise pg_ndistinct_out will not be consistent with  pg_ndistinct_in?

SELECT '[{"attributes" : [1, 2], "ndistinct" :
2147483648}]'::pg_ndistinct; --error
                    pg_ndistinct
----------------------------------------------------
 [{"attributes": [1, 2], "ndistinct": -2147483648}]
(1 row)

The result seems not what we expected.