Thread
-
Re: Import Statistics in postgres_fdw before resorting to sampling.
Etsuro Fujita <etsuro.fujita@gmail.com> — 2025-12-12T14:04:26Z
On Fri, Dec 12, 2025 at 6:59 AM Corey Huinker <corey.huinker@gmail.com> wrote: > Rebase, no changes. Thanks for rebasing! I reviewed the changes made to the core: @@ -196,13 +196,56 @@ analyze_rel(Oid relid, RangeVar *relation, { /* * For a foreign table, call the FDW's hook function to see whether it - * supports analysis. + * supports statistics import and/or analysis. */ FdwRoutine *fdwroutine; bool ok = false; fdwroutine = GetFdwRoutineForRelation(onerel, false); + if (fdwroutine->ImportStatistics != NULL) + { + FdwImportStatsResult res; + + /* + * Fetching pre-existing remote stats is not guaranteed to be a quick + * operation. + * + * XXX: Should this be it's own fetch type? If not, then there might be + * confusion when a long stats-fetch fails, followed by a regular analyze, + * which would make it look like the table was analyzed twice. + */ + pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE, + RelationGetRelid(onerel)); + + res = fdwroutine->ImportStatistics(onerel); + + pgstat_progress_end_command(); + + /* + * If we were able to import statistics, then there is no need to collect + * samples for local analysis. + */ + switch(res) + { + case FDW_IMPORT_STATS_OK: + relation_close(onerel, ShareUpdateExclusiveLock); + return; + case FDW_IMPORT_STATS_DISABLED: + break; + case FDW_IMPORT_STATS_NOTFOUND: + ereport(INFO, + errmsg("Found no remote statistics for \"%s\"", + RelationGetRelationName(onerel))); + break; + case FDW_IMPORT_STATS_FAILED: + default: + ereport(INFO, + errmsg("Fetching remote statistics from \"%s\" failed", + RelationGetRelationName(onerel))); + } + } Returning in the FDW_IMPORT_STATS_OK case isn't 100% correct; if the foreign table is an inheritance parent, we would fail to do inherited stats. IIUC, the FDW needs to do two things within the ImportStatistics callback function: check the importability, and if ok, do the work. I think that would make the API complicated. To avoid that, how about 1) splitting the callback function into the two functions shown below and then 2) rewriting the above to something like the attached? The attached addresses the returning issue mentioned above as well. bool StatisticsAreImportable(Relation relation) Checks the importability, and if ok, returns true, in which case the following callback function is called. In the postgres_fdw case, we could implement this to check if the fetch_stats flag for the foreign table is set to true, and if so, return true. void ImportStatistics(Relation relation, List *va_cols, int elevel) Imports the stats for the foreign table from the foreign server. As mentioned in previous emails, I don't think it's a good idea to fall back to the normal processing when the attempt to import the stats fails, in which case I think we should just throw an error (or warning) so that the user can re-try to import the stats after fixing the foreign side in some way. So I re-defined this as a void function. Note that this re-definition removes the concern mentioned in the comment starting with "XXX:". In the postgres_fdw case, as mentioned in a previous email, I think it would be good to implement this so that it checks whether the remote table is a view or not when importing the relation stats from the remote server, and if so, just throws an error (or warning), making the user reset the fetch_stats flag. I added two arguments to the callback function: va_cols, for supporting the column list in the ANALYZE command, and elevel, for proper logging. I'm not sure if VaccumParams should be added as well. That's it for now. I'll continue to review the patch. Best regards, Etsuro Fujita