Thread

Re: Adding REPACK [concurrently]

Robert Treat <rob@xzilla.net> — 2025-11-09T22:13:43Z
On Wed, Nov 5, 2025 at 2:12 AM Antonin Houska <ah@cybertec.at> wrote:
> Robert Treat <rob@xzilla.net> wrote:
> > On Tue, Nov 4, 2025 at 9:48 PM jian he <jian.universality@gmail.com> wrote:
>
> > > what the expectation of
> > > pg_repackdb --index=index_name, the doc is not very helpful.
> > >
> > > pg_repackdb --analyze --index=zz --verbose
> > > pg_repackdb: repacking database "src3"
> > > pg_repackdb: error: processing of database "src3" failed: ERROR:  "zz"
> > > is not an index for table "tenk1"
> > >
> > > select pg_get_indexdef ('zz'::regclass);
> > >                   pg_get_indexdef
> > > ---------------------------------------------------
> > >  CREATE INDEX zz ON public.tenk2 USING btree (two)
> > >
> >
> > Hmm... yes, this is a bit confusing. I didn't verify it in the code,
> > but from memory I think the --index option is meant to be used only in
> > conjunction with --table, in which case it would repack the table
> > using the specified index. I could be overlooking something though.
>
> The corresponding code is:
>
> +       /*
> +        * In REPACK mode, if the 'using_index' option was given but no index
> +        * name, filter only tables that have an index with indisclustered set.
> +        * (If an index name is given, we trust the user to pass a reasonable list
> +        * of tables.)
> +        *
> +        * XXX it may be worth printing an error if an index name is given with no
> +        * list of tables.
> +        */
> +       if (vacopts->mode == MODE_REPACK &&
> +               vacopts->using_index && !vacopts->indexname)
> +       {
> +               appendPQExpBufferStr(&catalog_query,
> +                                                        " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
> +                                                        "    WHERE indrelid = c.oid AND indisclustered)\n");
> +       }
>
> I'm not sure if it's worth allowing the --index option to have an
> argument. Since the user can specify multiple tables, he should also be able
> to specify multiple indexes. And then the question would be: what should
> happen if the user forgot to specify (or just mistyped) the index name for a
> table which does not yet have the clustering index set? Skip that table (and
> print out a warning)? Or consider it an error?
>

Ah, yes, this is something completely different. So, we do need a way
to differentiate between "vacuum full" vs "cluster" all tables... as
well as "vacuum full" vs "cluster" of a specific table (including the
idea of "vacuum full" of a previously clustered table, and the
existing code handles all that (though I might quibble with the option
name).

As for having an --index= option, I'd love to hear the use case;
something like partitions or maybe some client per schema situation
comes to mind, but ISTM in all those cases the user would also know
(or be expected to know) the table name, so I agree with Antonin that
the extra complexity doesn't seem worth supporting to me. (It's even
worse the more you think about it, what if some table has the index
named above, but is clustered on a different index, then what should
we do?)

As for the use case I was thinking of, specifying a table and index in
order to repack using that index (and setting indisclustered if not
already); while I feel like that would be a useful option, if it isn't
currently supported I don't see a strong argument for adding it now.


Robert Treat
https://xzilla.net