Thread

  1. Re: Import Statistics in postgres_fdw before resorting to sampling.

    Etsuro Fujita <etsuro.fujita@gmail.com> — 2025-11-27T12:46:09Z

    On Sat, Nov 22, 2025 at 6:31 AM Corey Huinker <corey.huinker@gmail.com> wrote:
    >> Other initial comments:
    >>
    >> The commit message says:
    >>
    >>     This is managed via two new options, fetch_stats and remote_analyze,
    >>     both are available at the server level and table level. If fetch_stats
    >>     is true, then the ANALYZE command will first attempt to fetch statistics
    >>     from the remote table and import those statistics locally.
    >>
    >>     If remote_analyze is true, and if the first attempt to fetch remote
    >>     statistics found no attribute statistics, then an attempt will be made
    >>     to ANALYZE the remote table before a second and final attempt to fetch
    >>     remote statistics.
    >>
    >>     If no statistics are found, then ANALYZE will fall back to the normal
    >>     behavior of sampling and local analysis.
    >>
    >> I think the first step assumes that the remote stats are up-to-date;
    >> if they aren't, it would cause a regression.  (If the remote relation
    >> is a plain table, they are likely to be up-to-date, but for example,
    >> if it is a foreign table, it's possible that they are stale.)  So how
    >> about making it the user's responsibility to make them up-to-date?  If
    >> doing so, we wouldn't need to do the second and third steps anymore,
    >> making the patch simple.
    
    > Obviously there is no way to know the quality/freshness of remote stats if they are found.
    >
    > The analyze option was borne of feedback from other postgres hackers while brainstorming on what this option might look like. I don't think we *need* this extra option for the feature to be a success, but it's relative simplicity did make me want to put it out there to see who else liked it.
    
    Actually, I have some concerns about the ANALYZE and fall-back
    options.  As for the former, if the remote user didn't have the
    MAINTAIN privilege on the remote table, remote ANALYZE would be just a
    waste effort.  As for the latter, imagine the situation where a user
    ANALYZEs a foreign table whose remote table is significantly large.
    When the previous attempts fail, the user might want to re-try to
    import remote stats after ANALYZEing the remote table in the remote
    side in some way, rather than postgres_fdw automatically falling back
    to the normal lengthy processing.  I think just throwing an error if
    the first attempt fails would make the system not only simple but
    reliable while providing some flexibility to users.
    
    >> On the other hand:
    >>
    >>     This operation will only work on remote relations that can have stored
    >>     statistics: tables, partitioned tables, and materialized views. If the
    >>     remote relation is a view then remote fetching/analyzing is just wasted
    >>     effort and the user is better of setting fetch_stats to false for that
    >>     table.
    >>
    >> I'm not sure the waste effort is acceptable; IMO, if the remote table
    >> is a view, I think that the system should detect that in some way, and
    >> then just do the normal ANALYZE processing.
    >
    >
    > The stats fetch query is pretty light, but I can see fetching the relkind along with the relstats, and making decisions on whether to continue from there, only applying the relstats after attrstats have been successfully applied.
    
    Good idea!  I would vote for throwing an error if the relkind is view,
    making the user set fetch_stats to false for the foreign table.
    
    Best regards,
    Etsuro Fujita