Thread

Re: Adding REPACK [concurrently]

Robert Treat <rob@xzilla.net> — 2025-11-05T05:10:47Z
On Tue, Nov 4, 2025 at 9:48 PM jian he <jian.universality@gmail.com> wrote:
> On Fri, Oct 31, 2025 at 7:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > Hello,
> >
> > Here's a new installment of this series, v25, including the CONCURRENTLY
> > part, which required some conflict fixes on top of the much-changed
> > v24-0001 patch.
> >
>
>  <refnamediv>
>   <refname>pg_repackdb</refname>
>   <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
>   database</refpurpose>
>  </refnamediv>
>
> but with --all option specified, it's doing repack whole cluster.
> (more than one database).
> I am not fully sure this description is OK.
>

This wording came from vacuumdb, which operates the same way, and I
don't think it's lead to confusion. And while I don't think we need to
take away the option, I see no reason to encourage the idea that
people should be doing cluster wide full database repacks. On that
note, I'd take the "and analyze" from the refpurpose as well; the more
I look at it, I see pg_repackdb as a replacement for clusterdb, with
selected bells and whistles from vacuum full or external repack-type
tooling, but at the end of the day that's a simpler model for
operators, and helps draw a distinction for which features we DONT
need to include, like -Z (ie. analyze only; if you want that, use
vacuumdb, not pg_repackdb)

>
> I think pg_repackdb Synopsis section:
> pg_repackdb [connection-option...] [option...] [ -t | --table table [(
> column [,...] )] ] ... [ dbname | -a | --all ]
> pg_repackdb [connection-option...] [option...] [ -n | --schema schema
> ] ... [ dbname | -a | --all ]
> pg_repackdb [connection-option...] [option...] [ -N | --exclude-schema
> schema ] ... [ dbname | -a | --all ]
>
> can be simplified the same way as as pg_dump:
>
> pg_repackdb [connection-option...] [option...]  [ dbname | -a | --all ]
>

I think it's laid out that way in vacuumdb to indicate that those
options are exclusive of one another. I'm not sure how convincing that
is, but the above would need to do more to make the switch imo.

> ------------------------
> [-d] dbname
> [--dbname=]dbname
>
> what do you think to expand it as below:
> dbname
> -d dbname
> --dbname=dbname

not sure i am following this one, but the brackets are the standard
way we should items to be optional, which in either case they are.

> --------------------
>
> + printf(_("      --index[=INDEX]             repack following an index\n"));
> should it be
> + printf(_("--index[=INDEX]                   repack following an index\n"));
> ?
>

I believe this is included for alignment, since this option has no
shorthand version.

>
> similar to pg_dump:
>     printf(_("\nIf no database name is supplied, then the PGDATABASE
> environment\n"
>              "variable value is used.\n\n"));
>
> in pg_repackdb help section, we can mention:
>     printf(_("\nIf no database name is supplied and --all option not
> specified then the PGDATABASE environment\n"
>              "variable value is used.\n\n"));
> Do you think it's necessary?
>

no. (again, looking first at clusterdb, and also vacuumdb, neither of
which have it).

>
> what the expectation of
> pg_repackdb --index=index_name, the doc is not very helpful.
>
> pg_repackdb --analyze --index=zz --verbose
> pg_repackdb: repacking database "src3"
> pg_repackdb: error: processing of database "src3" failed: ERROR:  "zz"
> is not an index for table "tenk1"
>
> select pg_get_indexdef ('zz'::regclass);
>                   pg_get_indexdef
> ---------------------------------------------------
>  CREATE INDEX zz ON public.tenk2 USING btree (two)
>

Hmm... yes, this is a bit confusing. I didn't verify it in the code,
but from memory I think the --index option is meant to be used only in
conjunction with --table, in which case it would repack the table
using the specified index. I could be overlooking something though.

Robert Treat
https://xzilla.net