Re: Extended Statistics set/restore/clear functions.

From: jian he <jian.universality@gmail.com>
To: Michael Paquier <michael@paquier.xyz>
Cc: Corey Huinker <corey.huinker@gmail.com>, Tomas Vondra <tomas@vondra.me>, pgsql-hackers@lists.postgresql.org, tgl@sss.pgh.pa.us
Date: 2025-11-18T05:07:23Z
Lists: pgsql-hackers

Commits

  1. Add test doing some cloning of extended statistics data

  2. Add test for pg_restore_extended_stats() with multiranges

  3. Add support for "mcv" in pg_restore_extended_stats()

  4. Include extended statistics data in pg_dump

  5. Add support for "dependencies" in pg_restore_extended_stats()

  6. Add test for MAINTAIN permission with pg_restore_extended_stats()

  7. Add pg_restore_extended_stats()

  8. Add routine to free MCVList

  9. Improve pg_clear_extended_stats() with incorrect relation/stats combination

  10. Add pg_clear_extended_stats()

  11. Introduce routines to validate and free MVNDistinct and MVDependencies

  12. Fix typo in stat_utils.c

  13. Move attribute statistics functions to stat_utils.c

  14. Improve error messages of input functions for pg_dependencies and pg_ndistinct

  15. Improve test output of extended statistics for ndistinct and dependencies

  16. Fix some compiler warnings

  17. Add input function for data type pg_dependencies

  18. Add input function for data type pg_ndistinct

  19. Rework output format of pg_dependencies

  20. Rework output format of pg_ndistinct

  21. Fix comments of output routines for pg_ndistinct and pg_dependencies

  22. Move code specific to pg_dependencies to new file

  23. Move code specific to pg_ndistinct to new file

  24. Document some structures in attribute_stats.c

  25. Fix FATAL message for invalid recovery timeline at beginning of recovery

On Mon, Nov 17, 2025 at 2:56 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Nov 14, 2025 at 03:25:27PM +0900, Michael Paquier wrote:
> > Thanks for the new versions, I'll also look at all these across the
> > next couple of days.  Probably not at 0005~ for now.
>
> 0001 and 0002 from series v13 have been applied to change the output
> functions.
>

> And I have looked at 0003 in detail for now.  Attached is a revised
> version for it, with many adjustments.  Some notes:
> - Many portions of the coverage were missing.  I have measured the
> coverage at 91% with the updated version attached.  This includes
> coverage for some error reporting, something that we rely a lot on for
> this code.
> - The error reports are made simpler, with the token values getting
> hidden.  While testing with some fancy values, I have actually noticed
> that the error handling for the parsing of the int16 and int32 values
> was incorrect: the error reports used what the safe functions
> generated, not the reports from the data type.
> - Passing down arbitrary byte sequences was leading to these bytes
> being reported in the error outputs because we cared about the token values.
> I have added a few tests based on that for the code paths involved.
>
hi.

In src/backend/statistics/mvdistinct.c, we have:
Assert(AttributeNumberIsValid(item->attributes[j]));

Should we disallow 0 (InvalidAttrNumber) in the "attributes" key, as in:
SELECT '[{"attributes" : [0,1], "ndistinct" : 4}]'::pg_ndistinct;
I haven't found a way to trigger this Assert yet.


+ errsave(parse->escontext,
+ errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
+ errdetail("Invalid \"%s\" value.", PG_NDISTINCT_KEY_ATTRIBUTES));

+ errsave(parse->escontext,
+ errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
+ errdetail("Invalid \"%s\" value.",
+  PG_NDISTINCT_KEY_NDISTINCT));

Isn't this errdetail too generic?  Similar to what ``select 'a'::int;``
reports, we could emit something like:
DETAIL:  Invalid input syntax for type integer: "a".
HINT:  "ndistinct" value is expected to be of type integer.

What do you think?
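A rough sketch of producing that type-specific DETAIL and HINT, using strtol() as a stand-in for the patch's integer parsing (the real code would go through the backend's safe integer parsing and errsave()). parse_ndistinct_value() is a hypothetical name. It also separates out-of-range values from bad syntax, as the integer input function does.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the scalar given for the "ndistinct" key.
 * On failure, fill "detail" and "hint" with type-specific text instead of
 * the generic "Invalid \"ndistinct\" value.".
 */
static bool
parse_ndistinct_value(const char *token, int *result,
					  char *detail, size_t detail_len,
					  char *hint, size_t hint_len)
{
	char	   *end;
	long		val;

	assert(token != NULL && result != NULL);

	errno = 0;
	val = strtol(token, &end, 10);

	if (end == token || *end != '\0')
		snprintf(detail, detail_len,
				 "Invalid input syntax for type integer: \"%s\".", token);
	else if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		snprintf(detail, detail_len,
				 "Value \"%s\" is out of range for type integer.", token);
	else
	{
		*result = (int) val;
		return true;
	}

	snprintf(hint, hint_len,
			 "\"ndistinct\" value is expected to be of type integer.");
	return false;
}
```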


We already have "fname" in ndistinct_object_field_start(), so we could
also include it in the error report, like:
    errsave(parse->escontext,
            errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
            errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
            errdetail("Unexpected key \"%s\".", fname),
            errhint("The only allowed keys are \"%s\" and \"%s\".",
                    PG_NDISTINCT_KEY_ATTRIBUTES,
                    PG_NDISTINCT_KEY_NDISTINCT));
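The same key check as a small standalone function, to show the DETAIL and HINT strings it would produce. check_ndistinct_key() is hypothetical, and the values of PG_NDISTINCT_KEY_ATTRIBUTES and PG_NDISTINCT_KEY_NDISTINCT are assumed from the output format shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed values, matching the key names in the pg_ndistinct output. */
#define PG_NDISTINCT_KEY_ATTRIBUTES "attributes"
#define PG_NDISTINCT_KEY_NDISTINCT "ndistinct"

/*
 * Hypothetical stand-in for the check in ndistinct_object_field_start():
 * returns true for a known key; otherwise writes a detail and a hint that
 * both mention what went wrong, and returns false.
 */
static bool
check_ndistinct_key(const char *fname,
					char *detail, size_t detail_len,
					char *hint, size_t hint_len)
{
	assert(fname != NULL);

	if (strcmp(fname, PG_NDISTINCT_KEY_ATTRIBUTES) == 0 ||
		strcmp(fname, PG_NDISTINCT_KEY_NDISTINCT) == 0)
		return true;

	snprintf(detail, detail_len, "Unexpected key \"%s\".", fname);
	snprintf(hint, hint_len, "The only allowed keys are \"%s\" and \"%s\".",
			 PG_NDISTINCT_KEY_ATTRIBUTES, PG_NDISTINCT_KEY_NDISTINCT);
	return false;
}
```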


SELECT '[{"attributes" : [2,3], "ndistinct" : 4, "ndistinct" : 14}]'::pg_ndistinct;
               pg_ndistinct
-------------------------------------------
 [{"attributes": [2, 3], "ndistinct": 14}]

SELECT '[{"attributes" : [2,3], "ndistinct" : 4, "attributes" : []}]'::pg_ndistinct;
               pg_ndistinct
------------------------------------------
 [{"attributes": [2, 3], "ndistinct": 4}]

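Is the above output what we expect?  A duplicated "ndistinct" key silently
keeps the last value, while a duplicated "attributes" key keeps the first one.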
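If duplicated keys should simply be an error, one way is to remember which keys the current object has already seen. A minimal sketch, assuming a per-object state that is reset at each object start; the struct and function names are hypothetical, not the patch's.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-object parse state; the patch's real struct may differ. */
typedef struct NdistinctObjectState
{
	bool		seen_attributes;
	bool		seen_ndistinct;
} NdistinctObjectState;

/*
 * Called for each key of one {"attributes": ..., "ndistinct": ...} object.
 * Returns false for an unknown key or for a key that already appeared in
 * the same object, so both duplicate-key inputs above would be rejected
 * instead of being resolved in two different ways.
 */
static bool
ndistinct_note_key(NdistinctObjectState *state, const char *fname)
{
	bool	   *seen;

	assert(state != NULL && fname != NULL);

	if (strcmp(fname, "attributes") == 0)
		seen = &state->seen_attributes;
	else if (strcmp(fname, "ndistinct") == 0)
		seen = &state->seen_ndistinct;
	else
		return false;			/* unknown key */

	if (*seen)
		return false;			/* duplicate key in this object */
	*seen = true;
	return true;
}
```

For comparison, jsonb input keeps only the last value of a duplicated key, which matches the "ndistinct" case above but not the "attributes" one.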


+ /*
+ * We need at least two attribute numbers for a ndistinct item, anything
+ * less is malformed.
+ */
+ natts = parse->attnum_list->length;
Here we could use list_length() instead.

+ if (parse->attnum_list != NIL)
+ if (parse->distinct_items != NIL)
Here we could also use list_length().
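For reference, a standalone sketch of why list_length() is safer at the natts assignment: it mirrors the pg_list.h definition, which returns 0 for NIL instead of dereferencing a NULL pointer, as parse->attnum_list->length would if attnum_list can still be NIL there (for instance with an empty "attributes" array; not verified here). The List struct below is reduced to its length field for illustration. Since an empty List is always NIL in PostgreSQL, "!= NIL" and "list_length() > 0" behave the same for the second pair of checks, so that change is about consistency rather than behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for PostgreSQL's List; only the length field matters. */
typedef struct List
{
	int			length;
} List;

#define NIL ((List *) NULL)

/* Same contract as list_length() in pg_list.h: NIL has length 0. */
static inline int
list_length(const List *l)
{
	return l ? l->length : 0;
}
```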


--
jian
https://www.enterprisedb.com/