Thread

  1. Re: Adding REPACK [concurrently]

    Robert Treat <rob@xzilla.net> — 2025-11-05T05:10:47Z

    On Tue, Nov 4, 2025 at 9:48 PM jian he <jian.universality@gmail.com> wrote:
    > On Fri, Oct 31, 2025 at 7:17 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    > >
    > > Hello,
    > >
    > > Here's a new installment of this series, v25, including the CONCURRENTLY
    > > part, which required some conflict fixes on top of the much-changed
    > > v24-0001 patch.
    > >
    >
    >  <refnamediv>
    >   <refname>pg_repackdb</refname>
    >   <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
    >   database</refpurpose>
    >  </refnamediv>
    >
    > but with --all option specified, it's doing repack whole cluster.
    > (more than one database).
    > I am not fully sure this description is OK.
    >
    
    This wording came from vacuumdb, which operates the same way, and I
    don't think it's lead to confusion. And while I don't think we need to
    take away the option, I see no reason to encourage the idea that
    people should be doing cluster wide full database repacks. On that
    note, I'd take the "and analyze" from the refpurpose as well; the more
    I look at it, I see pg_repackdb as a replacement for clusterdb, with
    selected bells and whistles from vacuum full or external repack-type
    tooling, but at the end of the day that's a simpler model for
    operators, and helps draw a distinction for which features we DONT
    need to include, like -Z (ie. analyze only; if you want that, use
    vacuumdb, not pg_repackdb)
    
    >
    > I think pg_repackdb Synopsis section:
    > pg_repackdb [connection-option...] [option...] [ -t | --table table [(
    > column [,...] )] ] ... [ dbname | -a | --all ]
    > pg_repackdb [connection-option...] [option...] [ -n | --schema schema
    > ] ... [ dbname | -a | --all ]
    > pg_repackdb [connection-option...] [option...] [ -N | --exclude-schema
    > schema ] ... [ dbname | -a | --all ]
    >
    > can be simplified the same way as as pg_dump:
    >
    > pg_repackdb [connection-option...] [option...]  [ dbname | -a | --all ]
    >
    
    I think it's laid out that way in vacuumdb to indicate that those
    options are exclusive of one another. I'm not sure how convincing that
    is, but the above would need to do more to make the switch imo.
    
    > ------------------------
    > [-d] dbname
    > [--dbname=]dbname
    >
    > what do you think to expand it as below:
    > dbname
    > -d dbname
    > --dbname=dbname
    
    not sure i am following this one, but the brackets are the standard
    way we should items to be optional, which in either case they are.
    
    > --------------------
    >
    > + printf(_("      --index[=INDEX]             repack following an index\n"));
    > should it be
    > + printf(_("--index[=INDEX]                   repack following an index\n"));
    > ?
    >
    
    I believe this is included for alignment, since this option has no
    shorthand version.
    
    >
    > similar to pg_dump:
    >     printf(_("\nIf no database name is supplied, then the PGDATABASE
    > environment\n"
    >              "variable value is used.\n\n"));
    >
    > in pg_repackdb help section, we can mention:
    >     printf(_("\nIf no database name is supplied and --all option not
    > specified then the PGDATABASE environment\n"
    >              "variable value is used.\n\n"));
    > Do you think it's necessary?
    >
    
    no. (again, looking first at clusterdb, and also vacuumdb, neither of
    which have it).
    
    >
    > what the expectation of
    > pg_repackdb --index=index_name, the doc is not very helpful.
    >
    > pg_repackdb --analyze --index=zz --verbose
    > pg_repackdb: repacking database "src3"
    > pg_repackdb: error: processing of database "src3" failed: ERROR:  "zz"
    > is not an index for table "tenk1"
    >
    > select pg_get_indexdef ('zz'::regclass);
    >                   pg_get_indexdef
    > ---------------------------------------------------
    >  CREATE INDEX zz ON public.tenk2 USING btree (two)
    >
    
    Hmm... yes, this is a bit confusing. I didn't verify it in the code,
    but from memory I think the --index option is meant to be used only in
    conjunction with --table, in which case it would repack the table
    using the specified index. I could be overlooking something though.
    
    Robert Treat
    https://xzilla.net