Thread

  1. Re: Adding REPACK [concurrently]

    Robert Treat <rob@xzilla.net> — 2025-11-09T22:13:43Z

    On Wed, Nov 5, 2025 at 2:12 AM Antonin Houska <ah@cybertec.at> wrote:
    > Robert Treat <rob@xzilla.net> wrote:
    > > On Tue, Nov 4, 2025 at 9:48 PM jian he <jian.universality@gmail.com> wrote:
    >
    > > > what the expectation of
    > > > pg_repackdb --index=index_name, the doc is not very helpful.
    > > >
    > > > pg_repackdb --analyze --index=zz --verbose
    > > > pg_repackdb: repacking database "src3"
    > > > pg_repackdb: error: processing of database "src3" failed: ERROR:  "zz"
    > > > is not an index for table "tenk1"
    > > >
    > > > select pg_get_indexdef ('zz'::regclass);
    > > >                   pg_get_indexdef
    > > > ---------------------------------------------------
    > > >  CREATE INDEX zz ON public.tenk2 USING btree (two)
    > > >
    > >
    > > Hmm... yes, this is a bit confusing. I didn't verify it in the code,
    > > but from memory I think the --index option is meant to be used only in
    > > conjunction with --table, in which case it would repack the table
    > > using the specified index. I could be overlooking something though.
    >
    > The corresponding code is:
    >
    > +       /*
    > +        * In REPACK mode, if the 'using_index' option was given but no index
    > +        * name, filter only tables that have an index with indisclustered set.
    > +        * (If an index name is given, we trust the user to pass a reasonable list
    > +        * of tables.)
    > +        *
    > +        * XXX it may be worth printing an error if an index name is given with no
    > +        * list of tables.
    > +        */
    > +       if (vacopts->mode == MODE_REPACK &&
    > +               vacopts->using_index && !vacopts->indexname)
    > +       {
    > +               appendPQExpBufferStr(&catalog_query,
    > +                                                        " AND EXISTS (SELECT 1 FROM pg_catalog.pg_index\n"
    > +                                                        "    WHERE indrelid = c.oid AND indisclustered)\n");
    > +       }
    >
    > I'm not sure if it's worth allowing the --index option to have an
    > argument. Since the user can specify multiple tables, he should also be able
    > to specify multiple indexes. And then the question would be: what should
    > happen if the user forgot to specify (or just mistyped) the index name for a
    > table which does not yet have the clustering index set? Skip that table (and
    > print out a warning)? Or consider it an error?
    >
    
    Ah, yes, this is something completely different. So, we do need a way
    to differentiate between "vacuum full" vs "cluster" all tables... as
    well as "vacuum full" vs "cluster" of a specific table (including the
    idea of "vacuum full" of a previously clustered table, and the
    existing code handles all that (though I might quibble with the option
    name).
    
    As for having an --index= option, I'd love to hear the use case;
    something like partitions or maybe some client per schema situation
    comes to mind, but ISTM in all those cases the user would also know
    (or be expected to know) the table name, so I agree with Antonin that
    the extra complexity doesn't seem worth supporting to me. (It's even
    worse the more you think about it, what if some table has the index
    named above, but is clustered on a different index, then what should
    we do?)
    
    As for the use case I was thinking of, specifying a table and index in
    order to repack using that index (and setting indisclustered if not
    already); while I feel like that would be a useful option, if it isn't
    currently supported I don't see a strong argument for adding it now.
    
    
    Robert Treat
    https://xzilla.net