Re: Import Statistics in postgres_fdw before resorting to sampling.
Corey Huinker <corey.huinker@gmail.com>
From: Corey Huinker <corey.huinker@gmail.com>
To: Etsuro Fujita <etsuro.fujita@gmail.com>
Cc: Michael Paquier <michael@paquier.xyz>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, pgsql-hackers@postgresql.org, jkatz@postgresql.org, nathandbossart@gmail.com
Date: 2025-12-13T19:12:16Z
Lists: pgsql-hackers
Attachments
- v6-0001-Add-remote-statistics-fetching-to-postgres_fdw.patch (text/x-patch)
On Fri, Dec 12, 2025 at 2:45 PM Corey Huinker <corey.huinker@gmail.com> wrote: > CCing Jonathan Katz and Nathan Bossart as they had been sounding boards > for me when I started designing this feature. > > >> Returning in the FDW_IMPORT_STATS_OK case isn't 100% correct; if the >> foreign table is an inheritance parent, we would fail to do inherited >> stats. >> > > Perhaps I'm not understanding completely, but I believe what we're doing > now should be ok. > > The local table of type 'f' can be a member of a partition, but can't be a > partition itself, so whatever stats we get for it, we're storing them as > inh = false. > > On the remote side, the table could be an inheritance parent, in which > case we ONLY want the inh=true stats, but we will still store them locally > as inh = false. > > The DISTINCT ON(a.attname)....ORDER BY a.attname, s.inherited DESC part of > the query ensures that we get inh=true stats if they're there in preference > to the inh=false steps. > > I will grant you that in an old-style inheritance (i.e. not formal > partitions) situation we'd probably want some weighted mix of the inherited > and non-inherited stats, but 1) very few people use old-style inheritance > anymore, 2) few of those export tables via a FDW, and 3) there's no way to > do that weighting so we should fall back to sampling anyway. > > None of this takes away from your suggestions down below. > > >> IIUC, the FDW needs to do two things within the ImportStatistics >> callback function: check the importability, and if ok, do the work. I >> think that would make the API complicated. To avoid that, how about >> 1) splitting the callback function into the two functions shown below >> and then 2) rewriting the above to something like the attached? The >> attached addresses the returning issue mentioned above as well. >> >> bool >> StatisticsAreImportable(Relation relation) >> >> Checks the importability, and if ok, returns true, in which case the >> following callback function is called. In the postgres_fdw case, we >> could implement this to check if the fetch_stats flag for the foreign >> table is set to true, and if so, return true. >> > > +1 > > >> void >> ImportStatistics(Relation relation, List *va_cols, int elevel) >> >> Imports the stats for the foreign table from the foreign server. As >> mentioned in previous emails, I don't think it's a good idea to fall >> back to the normal processing when the attempt to import the stats >> fails, in which case I think we should just throw an error (or >> warning) so that the user can re-try to import the stats after fixing >> the foreign side in some way. So I re-defined this as a void >> function. Note that this re-definition removes the concern mentioned >> in the comment starting with "XXX:". In the postgres_fdw case, as >> mentioned in a previous email, I think it would be good to implement >> this so that it checks whether the remote table is a view or not when >> importing the relation stats from the remote server, and if so, just >> throws an error (or warning), making the user reset the fetch_stats >> flag. >> > > I think I'm ok with this design as the decision, as it still leaves open > the fdw-specific options of how to handle initially finding no remote stats. > > I can still see a situation where a local table expects the remote table > to eventually have proper statistics on it, but until that happens it will > fall back to table samples. This design decision means that either the user > lives without any statistics for a while, or alters the foreign table > options and hopefully remembers to set them back. While I understand the > desire to first implement something very simple, I think that adding the > durability that fallback allows for will be harder to implement if we don't > build it in from the start. I'm interested to hear with Nathan and/or > Jonathan have to say about that. > > >> I added two arguments to the callback function: va_cols, for >> supporting the column list in the ANALYZE command, and elevel, for >> proper logging. I'm not sure if VaccumParams should be added as well. >> > > Good catch, I forgot about that one. > > Going to think some more on this before I work incorporating your > Heh, the word "changes" got cut off. Anyway, here's v6 incorporating both threads of feedback.