Re: Extended Statistics set/restore/clear functions.

Chao Li <li.evan.chao@gmail.com>

From: Chao Li <li.evan.chao@gmail.com>
To: Corey Huinker <corey.huinker@gmail.com>, Michael Paquier <michael@paquier.xyz>
Cc: jian he <jian.universality@gmail.com>, Tomas Vondra <tomas@vondra.me>, pgsql-hackers@lists.postgresql.org, tgl@sss.pgh.pa.us
Date: 2025-11-21T03:36:23Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Add test doing some cloning of extended statistics data

  2. Add test for pg_restore_extended_stats() with multiranges

  3. Add support for "mcv" in pg_restore_extended_stats()

  4. Include extended statistics data in pg_dump

  5. Add support for "dependencies" in pg_restore_extended_stats()

  6. Add test for MAINTAIN permission with pg_restore_extended_stats()

  7. Add pg_restore_extended_stats()

  8. Add routine to free MCVList

  9. Improve pg_clear_extended_stats() with incorrect relation/stats combination

  10. Add pg_clear_extended_stats()

  11. Introduce routines to validate and free MVNDistinct and MVDependencies

  12. Fix typo in stat_utils.c

  13. Move attribute statistics functions to stat_utils.c

  14. Improve error messages of input functions for pg_dependencies and pg_ndistinct

  15. Improve test output of extended statistics for ndistinct and dependencies

  16. Fix some compiler warnings

  17. Add input function for data type pg_dependencies

  18. Add input function for data type pg_ndistinct

  19. Rework output format of pg_dependencies

  20. Rework output format of pg_ndistinct

  21. Fix comments of output routines for pg_ndistinct and pg_dependencies

  22. Move code specific to pg_dependencies to new file

  23. Move code specific to pg_ndistinct to new file

  24. Document some structures in attribute_stats.c

  25. Fix FATAL message for invalid recovery timeline at beginning of recovery

After 16 rounds of revisions, v16 looks solid and robust. Only got a few small comments:

> On Nov 21, 2025, at 07:08, Corey Huinker <corey.huinker@gmail.com> wrote:
> 
> <v16-0001-Add-working-input-function-for-pg_ndistinct.patch><v16-0002-Add-working-input-function-for-pg_dependencies.patch><v16-0003-Expose-attribute-statistics-functions-for-use-in.patch><v16-0004-Add-extended-statistics-support-functions.patch><v16-0005-Include-Extended-Statistics-in-pg_dump.patch>

1 - 0001
```
+	/*
+	 * We need at least two attribute numbers for a ndistinct item, anything
+	 * less is malformed.
+	 */
+	natts = list_length(parse->attnum_list);
+	if ((natts < 2) || (natts > STATS_MAX_DIMENSIONS))
+	{
+		errsave(parse->escontext,
+				errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
+				errdetail("The \"%s\" key must contain an array of at least %d "
+						  "and no more than %d attributes.",
+						  PG_NDISTINCT_KEY_ATTRIBUTES, 2, STATS_MAX_DIMENSIONS));
+		return JSON_SEM_ACTION_FAILED;
+	}
```

Do we also need to reset item state vars similar to the success case?

2 - 0001
```
*
+ * Arrays can never be empty.
+ */
+static JsonParseErrorType
+ndistinct_array_end(void *state)
+{
```

This function comment is too simple, and doesn’t match to the function name. Reading the function body, the logic is clear, so I would suggest either just remove the comment or enhance it.

3  - 0001
```
static JsonParseErrorType
+ndistinct_array_element_start(void *state, bool isnull)
+{
+	NDistinctParseState *parse = state;
```

Nit: NDistinctParseState * can be const.

4 - 0001
```
+static bool
+valid_subsequent_attnum(const AttrNumber prev, const AttrNumber cur)
```

AttrNumber is int16, thus “const” here is unnecessary.

5 - 0001
```
+			return JSON_SUCCESS;
+			break;
```

“Break” after “return” is unnecessary. I tried to delete the “break” and didn’t see a compile warning, it should be safe to delete the “break”.

6 - 0001
```
+/*
+ * Compare the attribute arrays of two MVNDistinctItem values,
+ * looking for duplicate sets.

+ */
+static bool
+has_duplicate_attributes(const MVNDistinctItem *a,
+						 const MVNDistinctItem *b)
```

The function comment says “looking for duplicate sets”, seems to imply the function would return the set, but the function returns a bool, which is a little confusing.

7 - 0002
```
+	for (int i = 0; i < natts; i++)
+	{
+		dep->attributes[i] = (AttrNumber) list_nth_int(parse->attnum_list, i);
+		if (dep->attributes[i] == parse->dependency)
+		{
+			errsave(parse->escontext,
+					errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+					errmsg("malformed pg_dependencies: \"%s\"", parse->str),
+					errdetail("Item \"%s\" value %d found in the \"%s\" list.",
+							  PG_DEPENDENCIES_KEY_DEPENDENCY, parse->dependency,
+			   				  PG_DEPENDENCIES_KEY_ATTRIBUTES));
+			return JSON_SEM_ACTION_FAILED;
+		}
```

Similar to comment 1, do we need to reset state vars upon failure?

8 - 0002
```
+				errdetail("The \"%s\" key must contain an array of at least %d "
+						  " and no than %d elements.",
```

I believe “more” is missed between “no than”.

9 - 0002
```
+static JsonParseErrorType
+dependencies_array_start(void *state)
+{
+	DependenciesParseState *parse = state;
```

Similar to commit 3, DependenciesParseState * can be const.

10 - 0002
```
static bool
valid_subsequent_attnum(const AttrNumber prev, const AttrNumber cur)
```

This function is a dup to the one in 0001, can we put it to a common place?

11 - 0003
```
+extern void statatt_get_type(Oid reloid, AttrNumber attnum,
+							 Oid *atttypid, int32 *atttypmod,
+							 char *atttyptype, Oid *atttypcoll,
+							 Oid *eq_opr, Oid *lt_opr);
+extern void statatt_init_empty_tuple(Oid reloid, int16 attnum, bool inherited,
+									 Datum *values, bool *nulls, bool *replaces);
+
+extern void statatt_set_slot(Datum *values, bool *nulls, bool *replaces,
+							 int16 stakind, Oid staop, Oid stacoll,
+							 Datum stanumbers, bool stanumbers_isnull,
+							 Datum stavalues, bool stavalues_isnull);
+
+extern Datum text_to_stavalues(const char *staname, FmgrInfo *array_in, Datum d,
+							   Oid typid, int32 typmod, bool *ok);
+extern bool statatt_get_elem_type(Oid atttypid, char atttyptype,
+								  Oid *elemtypid, Oid *elem_eq_opr);
```

Several functions are made external visible, they are all renamed with adding a prefix “statatt_”, why text_to_stavalues is an exception?

11 - 0004
```
+bool
+pg_dependencies_validate_deps(MVDependencies *dependencies, int2vector *stxkeys, int numexprs, int elevel)
+{
+	int		attnum_expr_lowbound = 0 - numexprs;
+
+	for (int i = 0; i < dependencies->ndeps; i++)
+	{
+		MVDependency *dep = dependencies->deps[i];
```

This MVDependency * can be const.

12 - 0004
```
static void
upsert_pg_statistic_ext_data(Datum *values, bool *nulls, bool *replaces)
{
```

This function pass values, nulls and replaces to heap_modify_tuple() and heap_form_tuple(), the both functions take all const pointers as parameters. So, here values, nulls and replaces can all be const.

13 - 0004
```
+		if (!exprs_nulls[offset + AVG_WIDTH_ELEM])
+		{
+			ok = text_to_int4(exprs_elems[offset + AVG_WIDTH_ELEM],
+							  &values[Anum_pg_statistic_stawidth - 1]);
+
+			if (!ok)
+			{
+				char	   *s = TextDatumGetCString(exprs_elems[offset + NULL_FRAC_ELEM]);
+
+				ereport(WARNING,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("Expression %s element \"%s\" does not match expected input type.",
+								extexprarginfo[AVG_WIDTH_ELEM].argname, s)));
+				pfree(s);
+				return (Datum) 0;
+			}
+		}
```

Here: char    *s = TextDatumGetCString(exprs_elems[offset + NULL_FRAC_ELEM]);
Should NULL_FRAC_ELEM be AVG_WIDTH_ELEM?

14 - 0004
```
+		if (!exprs_nulls[offset + N_DISTINCT_ELEM])
+		{
+			ok = text_to_float4(exprs_elems[offset + N_DISTINCT_ELEM],
+								&values[Anum_pg_statistic_stadistinct - 1]);
+
+			if (!ok)
+			{
+				char	   *s = TextDatumGetCString(exprs_elems[offset + NULL_FRAC_ELEM]);
+
+				ereport(WARNING,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("Expression %s element \"%s\" does not match expected input type.",
+								extexprarginfo[N_DISTINCT_ELEM].argname, s)));
+				pfree(s);
+				return (Datum) 0;
+			}
+		}
```

Similar to comment 13, should NULL_FRAC_ELEM be N_DISTINCT_ELEM?

15 - 0004
```
+extern Datum import_mcvlist(HeapTuple tup, int elevel, int numattrs,
+							Oid *atttypids, int32 *atttypmods, Oid *atttypcolls,
+							int nitems, Datum *mcv_elems, bool *mcv_nulls,
+							bool *mcv_elem_nulls, float8 *freqs, float8 *base_freqs);
+
+extern Datum import_mcvlist(HeapTuple tup, int elevel, int numattrs,
+							Oid *atttypids, int32 *atttypmods, Oid *atttypcolls,
+							int nitems, Datum *mcv_elems, bool *mcv_nulls,
+							bool *mcv_elem_nulls, float8 *freqs, float8 *base_freqs);
```

import_mcvlist is declared twice, looks like a copy-paste mistake.

16 - 0005
```
+	PREPQUERY_DUMPEXTSTATSSTATS,
```

“statsstats” looks weird, I’d suggest PREPQUERY_DUMPEXTSTATSDATA.

Best regards,
—
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/