Re: analyze-in-stages post upgrade questions

Mircea Cadariu <cadariu.mircea@gmail.com>

From: Mircea Cadariu <cadariu.mircea@gmail.com>
To: Fujii Masao <masao.fujii@gmail.com>
Cc: Laurenz Albe <laurenz.albe@cybertec.at>, "Zechman, Derek S" <Derek.S.Zechman@snapon.com>, Adrian Klaver <adrian.klaver@aklaver.com>, pgsql-hackers@lists.postgresql.org
Date: 2025-08-06T04:01:18Z
Lists: pgsql-hackers
Hi,

On 30/07/2025 12:49, Fujii Masao wrote:
> I've started reviewing the patch since it's marked as ready for committer.
Thanks!
> Overall, I like the change. But I have one question: should this be treated as
> a bug fix that we back-patch to supported branches, or is it more of
> an improvement that should only go into master?
I reckon it might make sense to back-patch it to previous versions, as 
users might not upgrade always to the latest version.
>           Only calculate statistics for use by the optimizer (no vacuum).
> +        If that option is specified, <command>vacuumdb</command> will also
> +        process partitioned tables.  Without that option, only the partitions
> +        will be considered, unless a partitioned table is explicitly specified
> +        with the <option>--table</option> option.
>
> This wording seems a bit out of place in the --analyze-only section,
> since it also describes the default behavior of vacuumdb without that option.
> Wouldn't it make more sense to move that explanation in the --table section?
>
> For example, we could add something like:
>
> ------------------
> If no tables are specified with the --table option, vacuumdb will
> clean all regular tables and materialized views in the connected
> database. If --analyze-only or --analyze-in-stages is also specified,
> it will analyze all regular tables, partitioned tables, and
> materialized views (but not foreign tables).
> ------------------
Yes, agreed.
> + /*
> + * VACUUMing partitioned tables would be unreasonably expensive, since
> + * that entails processing the partitions twice (once as part of the
> + * partitioned table, once as tables in their own right) for no
> + * benefit. But if we only ANALYZE, collecting statistics for
> + * partitioned tables is worth the effort.
> + */
>
> This is probably true. But isn't the main reason more about aligning with
> the behavior of the underlying VACUUM and ANALYZE commands? As the vacuumdb
> docs says, "There is no effective difference between vacuuming and analyzing
> databases via this utility and via other methods for accessing the server.",
> so its default target objects should match: VACUUM skips partitioned tables
> by default, while ANALYZE includes them. If that's the case, maybe the comment
> should reflect that instead.
I see what you mean. From that perspective, I wonder if we even need a 
comment there at all.
> + qr/statement:\s+ANALYZE\s+public\.parent_table/s,
> + '--analyze_only updates statistics for partitioned tables');
>
> A plain space might be sufficient instead of \s+.
> Also, I don't think the backslash before ".parent_table" is necessary.

Good catch! Indeed let's simplify that to contain strictly only what's 
necessary.

Kind regards,

Mircea Cadariu