Thread

  1. Re: Adding REPACK [concurrently]

    Robert Treat <rob@xzilla.net> — 2025-10-13T00:03:18Z

    On Tue, Oct 7, 2025 at 10:05 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
    > On 2025-Sep-26, Robert Treat wrote:
    <snip>
    > That said, on this topic, I've always been bothered by our usage of
    > command names as verbs, because they are (IMO) horrible for translation.
    > For instance, in this version of the patch I am making this change:
    >
    >     if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
    >         ereport(ERROR,
    > -               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    > -                errmsg("cannot cluster a shared catalog")));
    > +               errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
    > +               errmsg("cannot run %s on a shared catalog",
    > +                      RepackCommandAsString(cmd)));
    >
    > In the old version, the message is not very translatable because you
    > have to find a native word to say "to cluster" or "to vacuum", and that
    > doesn't always work very well in a direct translation.  For instance, in
    > the Spanish message catalog you find this sort of thing:
    >
    > msgid "vacuuming \"%s.%s.%s\""
    > msgstr "haciendo vacuum a «%s.%s.%s»"
    >
    > which is pretty clear ... but the reason it works, is that I have turned
    > the phrase around before translating it.  I would struggle if I had to
    > find a Spanish verb that means "to repack" without contorting the
    > message or saying something absurd and/or against Spanish language
    > rules, such as "ejecutando repack en table XYZ" or "repaqueando tabl
    > XYZ" (that's not a word!) or "reempaquetando tabla XYZ" (this is
    > correct, but far enough from "repack" that it's annoying and potentially
    > confusing).  So I would rather the original used "running REPACK on
    > table using method XYZ", which is very very easy to translate, and then
    > the translator doesn't have to editorialize.
    >
    
    I see you didn't do this in the current patch, but +1 for this idea
    from me. And if you think it'd help, I'm also +1 on the idea for the
    main docs as well, for example doing something like
    
    +  <para>
    -   <application>pg_repackdb</application> is a utility for repacking a
    +   <application>pg_repackdb</application> is a utility for running REPACK on a
    +   <productname>PostgreSQL</productname> database.
    
    I'd be inclined to leave the internal comments alone though, since
    they aren't translated.
    
    > > #5
    > > [xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t film --index
    > > pg_repackdb: repacking database "pagila"
    > >
    > > In the above scenario, I am repacking without having previously
    > > specified an index. At the SQL level this would throw an error, at the
    > > command line it gives me a heart attack. :-)
    > > It's actually not that bad, because we don't actually do anything, but
    > > maybe we should throw an error?
    >
    > Yeah, I think this is confusing.  I think we should make pg_repackdb
    > explicitly indicate what has been done, in all cases, without requiring
    > -v.  Otherwise it's too confusing, particularly for the using-index mode
    > which determines which tables to process based on the existance of an
    > index marked indiscluster.
    >
    
    At the moment, clusterdb runs silently, but vacuumdb emits output, so
    there is an argument for either way as default behavior. That said, I
    think the current behavior of vacuum, which is what we are currently
    following in pg_repackdb, is the worst of the two:
    
     [xzilla@zebes]  pgsql/bin/vacuumdb -t actor  pagila
    vacuumdb: vacuuming database "pagila"
    
    Without any additional information, the information we do give is
    misleading; I would rather not say anything. We could of course try to
    make this more verbose, but I think clusterdb actually gets this
    right...
    - say nothing by default (follow the "rule of silence.")
    - if we want to see commands, pass -e
    - if we want to see the details, pass -v
    - if we do something that causes an error, return the error
    - if we don't want errors, pass -q
    
    This is also how reindexdb works, and I think most of the other
    utilities, and I'd argue this is how vacuumdb should work... to the
    extent I almost consider it a bug that it doesn't (I leave a little
    room since I am not sure why it doesn't operate like the other
    utilities). vacuum is a bit outside the purview of what we are doing
    here, but I do think following clusterdb/reindexdb is the behavior we
    should follow for pg_repackdb.
    
    > I admit I haven't paid too much attention to these tests.  I think I
    > would rather create a separate src/test/regress/sql/repack.sql file with
    > the tests for this command.  Let's consider this part a WIP for now --
    > clearly more tests are needed both for the SQL command CLUSTER and for
    > pg_repackdb.
    
    Yeah, istm as long as we have all 3 commands (repack, cluster, vacuum
    full) we need regression tests for all 3.
    
    > - pg_stat_progress_cluster is no longer a view on top of the low-level
    >   pg_stat_get_progress_info() function.  Instead, it's a view on top of
    >   pg_stat_progress_repack.  The only change it applies on top of that
    >   one is change the command from REPACK to one of VACUUM FULL or
    >   CLUSTER, depending on whether an index is being used or not.  This
    >   should keep the behavior identical to previous versions.
    >   Alternatively we could just hide rows where the command is REPACK, but
    >   I don't think that would be any better.  This way, we maintain
    >   compatibility with tools reading pg_stat_progress_cluster.  Maybe this
    >   is useless and we should just drop the view, not sure, we can discuss
    >   separately.
    >
    
    I think this mostly depends on how aggressive you want to be in moving
    people away from cluster and toward repack. If we remove
    _progress_cluster, it will force people to update monitoring which
    probably encourages people to switch to pg_repackdb. We probably need
    to have at least one "bridge" release though, and I think you've got
    the right balance for that.
    
    > - I noticed that you can do "CLUSTER pg_class ON some_index" and it will
    >   happily modify pg_index.indisclustered, which is a bit weird
    >   considering that allow_system_table_mods is off -- if you later try
    >   ALTER TABLE .. SET WITHOUT CLUSTER, it won't let you.  I think this is
    >   bogus and we should change it so that CLUSTER refuses to change the
    >   clustered index on a system catalog, unless allow_system_table_mods is
    >   on.  However, that would be a change from longstanding behavior which
    >   is specifically tested for in regression tests, so I didn't do it.
    >   We can discuss such a change separately.  But I did make REPACK refuse
    >   to do that, because we don't need to propagate bogus historical
    >   behavior.  So REPACK will fail if you try to change the indisclustered
    >   index, but it will work fine if you repack based on the same index as
    >   before, or repack with no index.
    >
    
    Since cluster will presumably be deprecated with this release, I'd
    leave the existing behavior and move forward with repack as you've
    laid out.
    
    > - pg_repackdb: if you try with a non-superuser without specifying a
    >   table name, it will fail as soon as it hits the first catalog table or
    >   whatever with "ERROR: cannot lock this table".  This is sorta fine for
    >   vacuumdb, but only because VACUUM itself will instead say "WARNING:
    >   cannot lock table XYZ, skipping", so it's not an error and vacuumdb
    >   keeps running.  IMO this is bogus: vacuumdb should not try to process
    >   tables that it doesn't have privileges to.  However, not wanting to
    >   change longstanding behavior, I left that alone.  For pg_repackdb, I
    >   added a condition in the WHERE clause there to only fetch tables that
    >   the current user has MAINTAIN privilege over.  Then you can do a
    >   "pg_repackdb -U foobar" and it will nicely process the tables that
    >   that user is allowed to process.  We can discuss changing the vacuumdb
    >   behavior separately.
    
    Again, vacuumdb seems to be a good example of what not to do, but I'll
    leave that for another thread. In general I like this idea, but it
    does make for a weird corner case where if I specify a table with -t
    that I don't have permission to repack, repack returns silently whilst
    doing nothing. I suppose one way to handle that would be to check if
    the table passed in -t is found in the list of tables with MAINTAIN
    privileges, and if not to issue a WARNING like "%s not found. Make
    sure that the table exists and that you have MAINTAIN privileges".
    
    Robert Treat
    https://xzilla.net