Thread

  1. REPACK and naming

    Bruce Momjian <bruce@momjian.us> — 2025-09-16T16:40:46Z

    I am starting to get worried about the confusing of adding a REPACK
    command.  We already have a lot of confusion around vacuum and analyze:
    
    *  autoanalyze does vacuum and analyze
    *  VACUUM FULL is much different from VACUUM
    
    It seems if we add REPACK as a command, it is somewhere between VACUUM
    FULL and VACUUM in severity/impact.  Should we be rethinking the naming
    in this area?
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EDB                                      https://enterprisedb.com
    
      Do not let urgent matters crowd out time for investment in the future.
    
    
    
    
  2. Re: REPACK and naming

    Alvaro Herrera <alvherre@alvh.no-ip.org> — 2025-09-16T23:42:29Z

    Hi,
    
    On 2025-Sep-16, Bruce Momjian wrote:
    
    > I am starting to get worried about the confusing of adding a REPACK
    > command.  We already have a lot of confusion around vacuum and analyze:
    > 
    > *  autoanalyze does vacuum and analyze
    > *  VACUUM FULL is much different from VACUUM
    > 
    > It seems if we add REPACK as a command, it is somewhere between VACUUM
    > FULL and VACUUM in severity/impact.
    
    No, REPACK is exactly where VACUUM FULL is in terms of impact and
    severity; it's not between anything.  The confusion stems precisely from
    VACUUM being a thing that's a completely different one from VACUUM FULL,
    yet they have pretty much the same name.  What I'm doing is give one of
    those things a different name, to reduce confusion.  Note that there's
    no intention to add autorepack, because that would (IMO) make no sense.
    
    Another thing I'm doing with that patch is rename CLUSTER so that it is
    also REPACK.  This also makes sense, because VACUUM FULL _is_ the same
    as CLUSTER, except that it follows current physical order instead of
    following a specific index's order.
    
    
    Peter E suggested that since we have REINDEX to rewrite indexes, then
    the command to rewrite tables should be RETABLE.  I haven't been able to
    get myself to like that idea, and also I think that was a bit
    tongue-in-cheek, but if you like RETABLE better than REPACK, then maybe
    we can have a vote to decide which one of those names to use.  However,
    I don't think that change would make a tremendous difference, and also I
    don't think RETABLE is enough of an English name to become a command
    name.
    
    > Should we be rethinking the naming in this area?
    
    I haven't seen anything that needs renaming TBH, but if you have
    specific proposals, feel free to air them.
    
    -- 
    Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
    "Thou shalt not follow the NULL pointer, for chaos and madness await
    thee at its end." (2nd Commandment for C programmers)
    
    
    
    
  3. Re: REPACK and naming

    Robert Haas <robertmhaas@gmail.com> — 2025-09-17T02:01:03Z

    On Tue, Sep 16, 2025 at 7:42 PM Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    > Peter E suggested that since we have REINDEX to rewrite indexes, then
    > the command to rewrite tables should be RETABLE.  I haven't been able to
    > get myself to like that idea, and also I think that was a bit
    > tongue-in-cheek, but if you like RETABLE better than REPACK, then maybe
    > we can have a vote to decide which one of those names to use.  However,
    > I don't think that change would make a tremendous difference, and also I
    > don't think RETABLE is enough of an English name to become a command
    > name.
    
    I think RETABLE is not a proposal to be taken seriously. That's
    extremely confusing.
    
    I don't love the name REPACK, but I think it's good enough. If we come
    up with something better, great.
    
    I agree that having a single command that does both VACUUM FULL and
    CLUSTER makes a lot more sense than the status quo, which is a
    confusing historical accident.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  4. Re: REPACK and naming

    Marcos Pegoraro <marcos@f10.com.br> — 2025-09-17T12:04:19Z

    Em ter., 16 de set. de 2025 às 23:01, Robert Haas <robertmhaas@gmail.com>
    escreveu:
    
    > I think RETABLE is not a proposal to be taken seriously. That's
    > extremely confusing.
    >
    
    This feature could be used in a future version to rearrange fields in a
    table, for better padding.
    I don't think we have another one available for this purpose.
    
    CREATE TABLE T(A text, B integer, C bigint, D integer);
    
    We could have something like
    RETABLE T USING(B, D, C, A)
    
    So REPACK isn't the best for this, if this feature would exist some day.
    
    regards
    Marcos
    
  5. Re: REPACK and naming

    Thom Brown <thom@linux.com> — 2025-09-17T12:54:47Z

    On Wed, 17 Sept 2025, 03:01 Robert Haas, <robertmhaas@gmail.com> wrote:
    
    > On Tue, Sep 16, 2025 at 7:42 PM Álvaro Herrera <alvherre@alvh.no-ip.org>
    > wrote:
    > > Peter E suggested that since we have REINDEX to rewrite indexes, then
    > > the command to rewrite tables should be RETABLE.  I haven't been able to
    > > get myself to like that idea, and also I think that was a bit
    > > tongue-in-cheek, but if you like RETABLE better than REPACK, then maybe
    > > we can have a vote to decide which one of those names to use.  However,
    > > I don't think that change would make a tremendous difference, and also I
    > > don't think RETABLE is enough of an English name to become a command
    > > name.
    >
    > I think RETABLE is not a proposal to be taken seriously. That's
    > extremely confusing.
    >
    > I don't love the name REPACK, but I think it's good enough. If we come
    > up with something better, great.
    >
    
    COMPACT?
    
    Thom
    
    >
    
  6. Re: REPACK and naming

    Bruce Momjian <bruce@momjian.us> — 2025-09-17T12:54:59Z

    On Wed, Sep 17, 2025 at 01:42:29AM +0200, Álvaro Herrera wrote:
    > Hi,
    > 
    > On 2025-Sep-16, Bruce Momjian wrote:
    > 
    > > I am starting to get worried about the confusing of adding a REPACK
    > > command.  We already have a lot of confusion around vacuum and analyze:
    > > 
    > > *  autoanalyze does vacuum and analyze
    > > *  VACUUM FULL is much different from VACUUM
    > > 
    > > It seems if we add REPACK as a command, it is somewhere between VACUUM
    > > FULL and VACUUM in severity/impact.
    > 
    > No, REPACK is exactly where VACUUM FULL is in terms of impact and
    > severity; it's not between anything.  The confusion stems precisely from
    > VACUUM being a thing that's a completely different one from VACUUM FULL,
    > yet they have pretty much the same name.  What I'm doing is give one of
    > those things a different name, to reduce confusion.  Note that there's
    > no intention to add autorepack, because that would (IMO) make no sense.
    > 
    > Another thing I'm doing with that patch is rename CLUSTER so that it is
    > also REPACK.  This also makes sense, because VACUUM FULL _is_ the same
    > as CLUSTER, except that it follows current physical order instead of
    > following a specific index's order.
    
    So the CLUSTER command is removed and people should use REPACK instead? 
    And VACUUM FULL stays unchanged?
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EDB                                      https://enterprisedb.com
    
      Do not let urgent matters crowd out time for investment in the future.
    
    
    
    
  7. Re: REPACK and naming

    Ranier Vilela <ranier.vf@gmail.com> — 2025-09-17T13:01:05Z

    Em ter., 16 de set. de 2025 às 13:40, Bruce Momjian <bruce@momjian.us>
    escreveu:
    
    > I am starting to get worried about the confusing of adding a REPACK
    > command.  We already have a lot of confusion around vacuum and analyze:
    >
    > *  autoanalyze does vacuum and analyze
    > *  VACUUM FULL is much different from VACUUM
    >
    > It seems if we add REPACK as a command, it is somewhere between VACUUM
    > FULL and VACUUM in severity/impact.  Should we be rethinking the naming
    > in this area?
    >
    SqlServer has similar feature.
    SHRINK
    
    best regards,
    Ranier Vilela
    
  8. Re: REPACK and naming

    Robert Haas <robertmhaas@gmail.com> — 2025-09-17T13:09:28Z

    On Wed, Sep 17, 2025 at 8:04 AM Marcos Pegoraro <marcos@f10.com.br> wrote:
    > Em ter., 16 de set. de 2025 às 23:01, Robert Haas <robertmhaas@gmail.com> escreveu:
    >> I think RETABLE is not a proposal to be taken seriously. That's
    >> extremely confusing.
    >
    > This feature could be used in a future version to rearrange fields in a table, for better padding.
    > I don't think we have another one available for this purpose.
    >
    > CREATE TABLE T(A text, B integer, C bigint, D integer);
    >
    > We could have something like
    > RETABLE T USING(B, D, C, A)
    >
    > So REPACK isn't the best for this, if this feature would exist some day.
    
    RETABLE just isn't a word. The code sometimes calls this a REWRITE of
    a table, which would be reasonable. I suspect, though, that changing
    the column order would end up being a form of ALTER TABLE.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  9. REPACK and naming

    David G. Johnston <david.g.johnston@gmail.com> — 2025-09-17T13:17:56Z

    On Wednesday, September 17, 2025, Marcos Pegoraro <marcos@f10.com.br> wrote:
    
    > Em ter., 16 de set. de 2025 às 23:01, Robert Haas <robertmhaas@gmail.com>
    > escreveu:
    >
    >> I think RETABLE is not a proposal to be taken seriously. That's
    >> extremely confusing.
    >>
    >
    > This feature could be used in a future version to rearrange fields in a
    > table, for better padding.
    > I don't think we have another one available for this purpose.
    >
    > CREATE TABLE T(A text, B integer, C bigint, D integer);
    >
    > We could have something like
    > RETABLE T USING(B, D, C, A)
    >
    
    That changes logical aspects of a table and so would be done as part of
    alter table, IMO. “AT tbl Rearrange columns (names list) “
    
    I’m not a fan of “retable” as a command keyword.
    
    But this digresses from the topic at hand.
    
    I’m fine with repack itself.  Deprecating vacuum full would be nice - but
    actually renaming existing things is bound to just make matters worse, IMO.
    
    Concretely, maybe we should remove vacuum full from the vacuum command
    page, and just call it out as compatibility spelling of repack on its
    page.  Maybe do the same for cluster (I haven’t dived into the new feature
    enough to confidently describe all this yet though).
    
    David J.
    
  10. Re: REPACK and naming

    Junwang Zhao <zhjwpku@gmail.com> — 2025-09-17T13:18:23Z

    On Wed, Sep 17, 2025 at 9:01 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
    >
    >
    >
    > Em ter., 16 de set. de 2025 às 13:40, Bruce Momjian <bruce@momjian.us> escreveu:
    >>
    >> I am starting to get worried about the confusing of adding a REPACK
    >> command.  We already have a lot of confusion around vacuum and analyze:
    >>
    >> *  autoanalyze does vacuum and analyze
    >> *  VACUUM FULL is much different from VACUUM
    >>
    >> It seems if we add REPACK as a command, it is somewhere between VACUUM
    >> FULL and VACUUM in severity/impact.  Should we be rethinking the naming
    >> in this area?
    >
    > SqlServer has similar feature.
    > SHRINK
    
    C++ vector has a shrink_to_fit method which seems to serve a similar purpose.
    
    REPACK or REBUILD looks good to me, COMPACT, on the other hand, feels
    more specific to the idea of consolidating free space within a page or block.
    
    >
    > best regards,
    > Ranier Vilela
    
    
    
    -- 
    Regards
    Junwang Zhao
    
    
    
    
  11. Re: REPACK and naming

    Marcos Pegoraro <marcos@f10.com.br> — 2025-09-17T13:36:34Z

    Em qua., 17 de set. de 2025 às 10:17, David G. Johnston <
    david.g.johnston@gmail.com> escreveu:
    
    > That changes logical aspects of a table and so would be done as part of
    > alter table, IMO. “AT tbl Rearrange columns (names list) “
    >
    
    If this command recreates entirely that table, it is not only a logical
    aspect of that table.
    REPACK/RETABLE recreates that table as a replacement for VACUUM FULL/CLUSTER
    and ALTER TABLE REARRANGE COLUMNS would recreate that table too ?
    and both would have USING INDEX to do what CLUSTER does today ? both would
    have CONCURRENTLY ?
    
    Being named REPACK or RETABLE or RECREATE TABLE or COMPACT or anything else
    could do it all.
    
    regards
    Marcos
    
  12. Re: REPACK and naming

    Alvaro Herrera <alvherre@alvh.no-ip.org> — 2025-09-17T13:37:45Z

    On 2025-Sep-17, David G. Johnston wrote:
    
    > That changes logical aspects of a table and so would be done as part of
    > alter table, IMO. “AT tbl Rearrange columns (names list) “
    
    Yes.
    
    > Concretely, maybe we should remove vacuum full from the vacuum command
    > page, and just call it out as compatibility spelling of repack on its
    > page.  Maybe do the same for cluster (I haven’t dived into the new feature
    > enough to confidently describe all this yet though).
    
    I think we should list VACUUM FULL as deprecated, document that feature
    in the REPACK documentation page, and leave VACUUM FULL in working state
    so as not to break existing scripts.  Same for CLUSTER.
    
    -- 
    Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
    "Those who use electric razors are infidels destined to burn in hell while
    we drink from rivers of beer, download free vids and mingle with naked
    well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)
    
    
    
    
  13. Re: REPACK and naming

    Alvaro Herrera <alvherre@alvh.no-ip.org> — 2025-09-17T13:38:53Z

    On 2025-Sep-17, Bruce Momjian wrote:
    
    > So the CLUSTER command is removed and people should use REPACK instead? 
    > And VACUUM FULL stays unchanged?
    
    No, not removed.  It's going to stay, to avoid breaking scripts.  People
    should use REPACK on new code going forward, but existing code is not
    going to break.  Same with VACUUM FULL.
    
    -- 
    Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
    "Uno puede defenderse de los ataques; contra los elogios se esta indefenso"
    
    
    
    
  14. Re: REPACK and naming

    David G. Johnston <david.g.johnston@gmail.com> — 2025-09-17T13:48:11Z

    On Wednesday, September 17, 2025, Álvaro Herrera <alvherre@alvh.no-ip.org>
    wrote:
    
    > On 2025-Sep-17, David G. Johnston wrote:
    >
    > > Concretely, maybe we should remove vacuum full from the vacuum command
    > > page, and just call it out as compatibility spelling of repack on its
    > > page.  Maybe do the same for cluster (I haven’t dived into the new
    > feature
    > > enough to confidently describe all this yet though).
    >
    > I think we should list VACUUM FULL as deprecated, document that feature
    > in the REPACK documentation page, and leave VACUUM FULL in working state
    > so as not to break existing scripts.  Same for CLUSTER.
    >
    
    I was unclear - this is indeed what I suggesting as well.  Reframe the
    documentation but leave the commands functioning.
    
    David J.
    
  15. Re: REPACK and naming

    Mikhail Nikalayeu <mihailnikalayeu@gmail.com> — 2025-09-17T14:05:33Z

    Ranier Vilela <ranier.vf@gmail.com>:
    > SqlServer has similar feature.
    > SHRINK
    
    MySQL/MariaDB
    OPTIMIZE TABLE table_name
    
    SQL Server
    ALTER TABLE table_name REBUILD
    DBCC SHRINKFILE
    DBCC SHRINKDATABASE
    
    Oracle
    ALTER TABLE table_name SHRINK SPACE
    
    SQLite
    VACUUM
    
    IBM DB2
    REORG TABLE table_name
    
    Sybase ASE
    REORG REBUILD table_name
    
    Best regards,
    Mikhail.
    
    
    
    
  16. Re: REPACK and naming

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-09-17T14:22:22Z

    =?utf-8?Q?=C3=81lvaro?= Herrera <alvherre@alvh.no-ip.org> writes:
    > On 2025-Sep-17, David G. Johnston wrote:
    >> Concretely, maybe we should remove vacuum full from the vacuum command
    >> page, and just call it out as compatibility spelling of repack on its
    >> page.  Maybe do the same for cluster (I haven’t dived into the new feature
    >> enough to confidently describe all this yet though).
    
    > I think we should list VACUUM FULL as deprecated, document that feature
    > in the REPACK documentation page, and leave VACUUM FULL in working state
    > so as not to break existing scripts.  Same for CLUSTER.
    
    I'm not at all in love with documenting VACUUM FULL and CLUSTER as
    being fundamentally the same thing.  I think that is an implementation
    happenstance that could go away as easily as it appeared.  Even if you
    think we'll never again rewrite it for heap, what of other table AMs?
    The underlying reality could be totally different for them.
    
    By and large, I don't think I like this renaming proposal.
    Maybe eventually it would reduce confusion, but there will be
    a long interval where it adds more.
    
    			regards, tom lane
    
    
    
    
  17. Re: REPACK and naming

    Robert Haas <robertmhaas@gmail.com> — 2025-09-17T14:59:55Z

    On Wed, Sep 17, 2025 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > By and large, I don't think I like this renaming proposal.
    > Maybe eventually it would reduce confusion, but there will be
    > a long interval where it adds more.
    
    I mean, it's PRETTY confusing that VACUUM FULL does something much
    more similar to CLUSTER than it is to VACUUM. We can't ever get out
    from under that confusion if we don't change something. I think it's
    more than fair to bikeshed what the verb should be that describes the
    action we currently describe by writing either VACUUM FULL or CLUSTER,
    but I agree with Álvaro that having one verb for both of those things
    makes a lot more sense than the status quo.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  18. Re: REPACK and naming

    Alvaro Herrera <alvherre@alvh.no-ip.org> — 2025-09-17T15:03:42Z

    On 2025-Sep-17, Tom Lane wrote:
    
    > I'm not at all in love with documenting VACUUM FULL and CLUSTER as
    > being fundamentally the same thing.  I think that is an implementation
    > happenstance that could go away as easily as it appeared.  Even if you
    > think we'll never again rewrite it for heap, what of other table AMs?
    > The underlying reality could be totally different for them.
    
    So there two operations here.  One is 
    REPACK tab USING INDEX idx
    which we currently call CLUSTER, and there is also
    REPACK TAB
    (no index specified) which we currently call VACUUM FULL.  These have
    the very specific charter of rewriting the table while removing bloat,
    the distinction being that they keep the rows ordered according to the
    index or not.  Both these operations currently use the same
    implementation, yes; but if we were to reimplement one of them to use
    some completely different piece of code, then the new command name
    continues to work, it just calls the new different implementation, while
    the other command continues to call the other one.  (Or maybe we decide
    reimplement both using different techniques, and we throw away
    cluster.c, but still the command names continue to be sensible and would
    continue to work.)
    
    Thinking about the other half of your argument: if we add new table AMs
    for which the cluster.c implementation doesn't work, then we'll have to
    wire the table AM support routines to call some different implementation
    into REPACK or REPACK USING INDEX.  This is no different than if we keep
    these commands being VACUUM FULL or CLUSTER; we would still need a
    different implementation underneath, and we would still need to wire the
    table AM support routines to call that different implementation.
    
    So all things considered, I'm not seeing what aspect of the renaming
    exactly are you uncomfortable with.  We're not making the situation any
    worse.
    
    -- 
    Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
    <Schwern> It does it in a really, really complicated way
    <crab> why does it need to be complicated?
    <Schwern> Because it's MakeMaker.
    
    
    
    
  19. Re: REPACK and naming

    David Rowley <dgrowleyml@gmail.com> — 2025-09-17T20:35:06Z

    On Thu, 18 Sept 2025 at 01:09, Robert Haas <robertmhaas@gmail.com> wrote:
    > RETABLE just isn't a word. The code sometimes calls this a REWRITE of
    > a table, which would be reasonable.
    
    +1. I was reading this yesterday wondering why "REWRITE" didn't get a
    mention.  The problem I have with REPACK is that "re" indicates that
    something is being re-done that's been done before. If you're calling
    REPACK for the first time on a table, that's not true.
    
    David J's "REBUILD" also seems ok.  In a green field, you could then
    have "REBUILD TABLE ..." and "REBUILD INDEX ..."
    
    David
    
    
    
    
  20. REPACK and naming

    David G. Johnston <david.g.johnston@gmail.com> — 2025-09-17T21:03:01Z

    On Wednesday, September 17, 2025, David Rowley <dgrowleyml@gmail.com> wrote:
    
    > On Thu, 18 Sept 2025 at 01:09, Robert Haas <robertmhaas@gmail.com> wrote:
    > > RETABLE just isn't a word. The code sometimes calls this a REWRITE of
    > > a table, which would be reasonable.
    >
    > +1. I was reading this yesterday wondering why "REWRITE" didn't get a
    > mention.
    
    
    Agreed.
    
    
    >
    > The problem I have with REPACK is that "re" indicates that
    > something is being re-done that's been done before. If you're calling
    > REPACK for the first time on a table, that's not true.
    
    
    As soon as you’ve written the first tuple you’ve begun “packing” the table
    - repack then is simply unpacking it and putting back the stuff you want to
    keep in possibly a structured way.
    
    David J's "REBUILD" also seems ok.  In a green field, you could then
    > have "REBUILD TABLE ..." and "REBUILD INDEX ..."
    >
    
    Rebuild has some prior art apparently, which makes it appealing.  But I’m
    not a fan of the “shrink” usage the other products seem drawn to.
    
    David J.
    
  21. Re: REPACK and naming

    Bruce Momjian <bruce@momjian.us> — 2025-09-17T21:05:38Z

    On Wed, Sep 17, 2025 at 10:59:55AM -0400, Robert Haas wrote:
    > On Wed, Sep 17, 2025 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > By and large, I don't think I like this renaming proposal.
    > > Maybe eventually it would reduce confusion, but there will be
    > > a long interval where it adds more.
    > 
    > I mean, it's PRETTY confusing that VACUUM FULL does something much
    > more similar to CLUSTER than it is to VACUUM. We can't ever get out
    > from under that confusion if we don't change something. I think it's
    > more than fair to bikeshed what the verb should be that describes the
    > action we currently describe by writing either VACUUM FULL or CLUSTER,
    > but I agree with Álvaro that having one verb for both of those things
    > makes a lot more sense than the status quo.
    
    Yeah, I think we are in a bad naming place here with VACUUM FULL and
    CLUSTER, and avoiding clarifying it because of a risk of future change
    just seems unwise.  We know it is confusing, and there are no plans to
    expand VACUUM FULL or CLUSTER, so let's improve the user interface and
    take the hit if things ever change in the future.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EDB                                      https://enterprisedb.com
    
      Do not let urgent matters crowd out time for investment in the future.
    
    
    
    
  22. Re: REPACK and naming

    Bruce Momjian <bruce@momjian.us> — 2025-09-18T14:56:35Z

    On Wed, Sep 17, 2025 at 05:05:38PM -0400, Bruce Momjian wrote:
    > On Wed, Sep 17, 2025 at 10:59:55AM -0400, Robert Haas wrote:
    > > I mean, it's PRETTY confusing that VACUUM FULL does something much
    > > more similar to CLUSTER than it is to VACUUM. We can't ever get out
    > > from under that confusion if we don't change something. I think it's
    > > more than fair to bikeshed what the verb should be that describes the
    > > action we currently describe by writing either VACUUM FULL or CLUSTER,
    > > but I agree with Álvaro that having one verb for both of those things
    > > makes a lot more sense than the status quo.
    > 
    > Yeah, I think we are in a bad naming place here with VACUUM FULL and
    > CLUSTER, and avoiding clarifying it because of a risk of future change
    > just seems unwise.  We know it is confusing, and there are no plans to
    > expand VACUUM FULL or CLUSTER, so let's improve the user interface and
    > take the hit if things ever change in the future.
    
    Uh, if VACUUM FULL and CLUSTER functionality are going to eventually
    point to REPACK, REPACK is the wrong name.  While I can see REPACK
    having a similar function to VACUUM FULL, the ordering idea of CLUSTER
    just doesn't fit into "repack".  I am repacking in a specific order? 
    What if I only want the order changed without repacking?  When did I
    pack that I have to re-pack now?
    
    Of the names I have seen, I think REBUILD makes the most sense.  I built
    the table --- I want it rebuilt now, and perhaps with a specific
    ordering.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EDB                                      https://enterprisedb.com
    
      Do not let urgent matters crowd out time for investment in the future.
    
    
    
    
  23. Re: REPACK and naming

    Vik Fearing <vik@postgresfriends.org> — 2025-09-18T15:27:16Z

    On 18/09/2025 16:56, Bruce Momjian wrote:
    > On Wed, Sep 17, 2025 at 05:05:38PM -0400, Bruce Momjian wrote:
    >> On Wed, Sep 17, 2025 at 10:59:55AM -0400, Robert Haas wrote:
    >>> I mean, it's PRETTY confusing that VACUUM FULL does something much
    >>> more similar to CLUSTER than it is to VACUUM. We can't ever get out
    >>> from under that confusion if we don't change something. I think it's
    >>> more than fair to bikeshed what the verb should be that describes the
    >>> action we currently describe by writing either VACUUM FULL or CLUSTER,
    >>> but I agree with Álvaro that having one verb for both of those things
    >>> makes a lot more sense than the status quo.
    >> Yeah, I think we are in a bad naming place here with VACUUM FULL and
    >> CLUSTER, and avoiding clarifying it because of a risk of future change
    >> just seems unwise.  We know it is confusing, and there are no plans to
    >> expand VACUUM FULL or CLUSTER, so let's improve the user interface and
    >> take the hit if things ever change in the future.
    > Uh, if VACUUM FULL and CLUSTER functionality are going to eventually
    > point to REPACK, REPACK is the wrong name.  While I can see REPACK
    > having a similar function to VACUUM FULL, the ordering idea of CLUSTER
    > just doesn't fit into "repack".  I am repacking in a specific order?
    > What if I only want the order changed without repacking?  When did I
    > pack that I have to re-pack now?
    >
    > Of the names I have seen, I think REBUILD makes the most sense.  I built
    > the table --- I want it rebuilt now, and perhaps with a specific
    > ordering.
    
    
    +1
    
    -- 
    
    Vik Fearing
    
    
    
    
    
  24. Re: REPACK and naming

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-09-18T22:53:15Z

    On 2025-Sep-18, Bruce Momjian wrote:
    
    > Uh, if VACUUM FULL and CLUSTER functionality are going to eventually
    > point to REPACK, REPACK is the wrong name.
    
    I obviously don't agree.
    
    > While I can see REPACK having a similar function to VACUUM FULL, the
    > ordering idea of CLUSTER just doesn't fit into "repack".
    
    In my opinion, it fits perfectly well.
    
    > I am repacking in a specific order?  What if I only want the order
    > changed without repacking?
    
    Yes, you can ask the repacking process to write the tuples in a specific
    order -- that's a perfectly reasonable thing to ask for, and nobody
    would be confused about what that would do.  You want the order changed
    but not repacking?  I don't see why would anybody think that that's
    something to ask for.
    
    > When did I pack that I have to re-pack now?
    
    David Johnston said it here best:
    https://postgr.es/m/CAKFQuwYbZ2c7viape0B+TAoa_t8WteNfu+RF8+3i=D1ZQZQFAg@mail.gmail.com
    you packed when you started adding tuples to the table.  Repack takes
    what you have "packed" and puts it in order again.
    
    > Of the names I have seen, I think REBUILD makes the most sense.  I
    > built the table --- I want it rebuilt now, and perhaps with a specific
    > ordering.
    
    I frankly don't like REBUILD one bit.  Nobody BUILDs tables either.
    One thing we do build is indexes, but a table is not an index.
    
    (I was not very impressed with David Rowley's idea that we could have
    REBUILD INDEX as a substitute for REINDEX, if we were inventing these in
    a green field.  Two reasons for that: one is that REINDEX is not only a
    Postgres-established term, but also an industry-wide established term.
    The other is that REINDEX also supports REINDEX TABLE, which doesn't
    literally mean to reindex the table but instead of reindex all the
    _indexes_ of the table; if we wanted to use REBUILD for that we would
    have to do something like "REBUILD INDEXES OF TABLE xyz" which, as you
    see, gets unwieldy.  And for what?)
    
    I highly doubt that anybody is going to be confused in the slightest
    about what the REPACK command does, or that they would start to think
    about when did they PACK anything before.  That sort of argument seems
    entirely artificial, and that is not how people think.
    
    -- 
    Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
    "The ability of users to misuse tools is, of course, legendary" (David Steele)
    https://postgr.es/m/11b38a96-6ded-4668-b772-40f992132797@pgmasters.net
    
    
    
    
  25. Re: REPACK and naming

    Alvaro Herrera <alvherre@alvh.no-ip.org> — 2025-09-18T22:55:51Z

    On 2025-Sep-17, David G. Johnston wrote:
    
    > On Wednesday, September 17, 2025, David Rowley <dgrowleyml@gmail.com> wrote:
    > 
    > > On Thu, 18 Sept 2025 at 01:09, Robert Haas <robertmhaas@gmail.com> wrote:
    > > > RETABLE just isn't a word. The code sometimes calls this a REWRITE of
    > > > a table, which would be reasonable.
    > >
    > > +1. I was reading this yesterday wondering why "REWRITE" didn't get a
    > > mention.
    > 
    > Agreed.
    
    That's true, but ALTER TABLE also rewrites the table in some cases, so
    the words used inside the source code shouldn't really be taken as
    guidance for what to call the user-visible features.  Consider for
    instance that the code that implements this whole thing lives in
    cluster.c, and we're not going to change that file name, at least not
    yet.
    
    I'm not particularly excited about REWRITE, as it sounds a little bit
    too generic, plus we already have the "query rewriter" which rewrites,
    and I think it would be good that we stop overloading terms for
    completely different things.
    
    > > The problem I have with REPACK is that "re" indicates that something
    > > is being re-done that's been done before. If you're calling REPACK
    > > for the first time on a table, that's not true.
    >
    > As soon as you’ve written the first tuple you’ve begun “packing” the
    > table - repack then is simply unpacking it and putting back the stuff
    > you want to keep in possibly a structured way.
    
    Yeah, that's how I see it.
    
    > > David J's "REBUILD" also seems ok.  In a green field, you could then
    > > have "REBUILD TABLE ..." and "REBUILD INDEX ..."
    
    Yeah, but then we have REINDEX TABLE or REINDEX SYSTEM, so if you wanted
    to do all of REINDEX with REPACK or REBUILD, you would need more syntax
    than just that.
    
    It had never occurred to me that we could use this new command (whatever
    its name) to rebuild indexes without rebuilding the table.  I'm not
    proposing that.  Do we actually want to entertain such a proposal?
    
    > Rebuild has some prior art apparently, which makes it appealing.
    
    Can you cite that?  I've seen "ALTER TABLE/INDEX REBUILD", but not
    REBUILD as a standalone command.
    
    -- 
    Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
    
    
    
    
  26. Re: REPACK and naming

    David G. Johnston <david.g.johnston@gmail.com> — 2025-09-18T23:03:59Z

    On Thu, Sep 18, 2025 at 3:55 PM Álvaro Herrera <alvherre@alvh.no-ip.org>
    wrote:
    
    > On 2025-Sep-17, David G. Johnston wrote:
    >
    > > On Wednesday, September 17, 2025, David Rowley <dgrowleyml@gmail.com>
    > wrote:
    > >
    > > > On Thu, 18 Sept 2025 at 01:09, Robert Haas <robertmhaas@gmail.com>
    > wrote:
    > > > > RETABLE just isn't a word. The code sometimes calls this a REWRITE of
    > > > > a table, which would be reasonable.
    > > >
    > > > +1. I was reading this yesterday wondering why "REWRITE" didn't get a
    > > > mention.
    > >
    >
    > I'm not particularly excited about REWRITE, as it sounds a little bit
    > too generic, plus we already have the "query rewriter" which rewrites,
    > and I think it would be good that we stop overloading terms for
    > completely different things.
    >
    
    Agreed
    
    
    > > Rebuild has some prior art apparently, which makes it appealing.
    >
    > Can you cite that?  I've seen "ALTER TABLE/INDEX REBUILD", but not
    > REBUILD as a standalone command.
    >
    >
    I was just skimming the list Mikhail provided.
    
    https://www.postgresql.org/message-id/CADzfLwXjBvqQCuPOaMGe20a+ySTcqSpfvoGyZZmrMkJ=HbtcOw@mail.gmail.com
    
    It isn't standalone but the keyword REBUILD gets used versus our unlisted
    use of REPACK.
    
    But there isn't enough consistency/similarity for me to want to avoid
    repack altogether.
    
    David J.
    
  27. Re: REPACK and naming

    David Rowley <dgrowleyml@gmail.com> — 2025-09-19T00:58:57Z

    On Thu, 18 Sept 2025 at 03:03, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    > So there two operations here.  One is
    > REPACK tab USING INDEX idx
    > which we currently call CLUSTER, and there is also
    > REPACK TAB
    > (no index specified) which we currently call VACUUM FULL.
    
    I was just thinking about how much of a heap-ism cluster using an
    index is. If we were to ever have an index organised table AM, what
    would it mean to REPACK tab USING INDEX idx? Would that "secondary"
    index then go away and the table would become that index? or would
    both continue to exist and the secondary index would be surplus?
    
    I do understand that heap is well ingrained in our code (still), but
    at least things like system catalogue tables/columns can evolve over
    time. e.g pg_index.indisclustered I could imagine evolving (or
    disappearing) if we had an IOT-AM. I do think locking in syntax is
    going to be quite a bit more permanent and needs to be considered very
    carefully. Something like REPACK tab ORDER BY col1; seems a bit more
    future proof. table_relation_copy_for_cluster() does support both use
    of an Index to get presorted results and sorting by the index's key
    columns, so it doesn't seem impossible that the ability to cluster a
    table *specifically* by an index couldn't easily go away at some
    point. Locking us deeper into a syntax for that, I do have concerns
    for. But maybe you've thought about all this already and I'm just not
    aware...
    
    I'm also trying to keep something like a column store in mind here
    where you might not have any indexes, and efficient filtering is done
    via the pruning of "chunks", which works by each chunk recording the
    min/max (or maybe a dictionary of) values it contains for the columns.
    I imagine something like that very much would want the ability to have
    something like REPACK tbl ORDER BY col; if you think how efficient
    run-length encoding would be for some orders and now inefficient it
    could be for other orders.
    
    Anyway, I'm not intentionally trying to make your job here any more
    complex. I'm just trying to help make sure we don't end up with some
    new syntax that also won't stand up to the test of time.
    
    David
    
    
    
    
  28. Re: REPACK and naming

    Alvaro Herrera <alvherre@alvh.no-ip.org> — 2025-09-19T09:36:08Z

    On 2025-Sep-19, David Rowley wrote:
    
    > I was just thinking about how much of a heap-ism cluster using an
    > index is. If we were to ever have an index organised table AM, what
    > would it mean to REPACK tab USING INDEX idx? Would that "secondary"
    > index then go away and the table would become that index? or would
    > both continue to exist and the secondary index would be surplus?
    
    So, there's already an implementation of an index-organized table in
    OrioleDB, as I understand, so maybe we can ask Alexander K. about this.
    I suspect it's fine to say that if you have a table for which it makes
    no sense to use REPACK USING INDEX, then we just throw an error in that
    case (but I suppose plain REPACK continues to work, and it just
    recreates/compacts the primary index and rebuilds all secondary indexes,
    just like VACUUM FULL would presumably do.)
    
    > I do understand that heap is well ingrained in our code (still), but
    > at least things like system catalogue tables/columns can evolve over
    > time. e.g pg_index.indisclustered I could imagine evolving (or
    > disappearing) if we had an IOT-AM. I do think locking in syntax is
    > going to be quite a bit more permanent and needs to be considered very
    > carefully. Something like REPACK tab ORDER BY col1; seems a bit more
    > future proof.
    
    Oh, I think we can implement REPACK tab ORDER BY all right -- do note
    that the current syntax has mandatory USING INDEX keywords (unlike
    CLUSTER), so we can add that feature and others with no grammar
    problems.  In fact even for current heaps it might make sense to allow
    an ORDER BY clause for which there's no index.  I don't see us
    gratuituously removing the option of specifying just an index name (or
    indisclustered), though, because there are likely users that have been
    running that for years.
    
    > table_relation_copy_for_cluster() does support both use
    > of an Index to get presorted results and sorting by the index's key
    > columns, so it doesn't seem impossible that the ability to cluster a
    > table *specifically* by an index couldn't easily go away at some
    > point.
    
    Well, I hope you mean that clustering by an index would stop being the
    _only_ way, not that it would completely disappear as an option.
    
    > Locking us deeper into a syntax for that, I do have concerns for. But
    > maybe you've thought about all this already and I'm just not aware...
    
    At this point we're not *implementing* any of that, but it is possible
    to do so afterwards and we're not blocking that road.
    
    > I'm also trying to keep something like a column store in mind here
    > where you might not have any indexes, and efficient filtering is done
    > via the pruning of "chunks", which works by each chunk recording the
    > min/max (or maybe a dictionary of) values it contains for the columns.
    > I imagine something like that very much would want the ability to have
    > something like REPACK tbl ORDER BY col; if you think how efficient
    > run-length encoding would be for some orders and now inefficient it
    > could be for other orders.
    
    That makes sense, yes, and again, AFAICT it can easily be implemented on
    top of the current work.
    
    > Anyway, I'm not intentionally trying to make your job here any more
    > complex. I'm just trying to help make sure we don't end up with some
    > new syntax that also won't stand up to the test of time.
    
    The time you and others spend on this thread is much appreciated.
    
    -- 
    Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
    "If you want to have good ideas, you must have many ideas.  Most of them
    will be wrong, and what you have to learn is which ones to throw away."
                                                             (Linus Pauling)
    
    
    
    
  29. Re: REPACK and naming

    David Rowley <dgrowleyml@gmail.com> — 2025-09-19T11:57:29Z

    On Fri, 19 Sept 2025 at 21:36, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    >
    > On 2025-Sep-19, David Rowley wrote:
    >
    > > I was just thinking about how much of a heap-ism cluster using an
    > > index is. If we were to ever have an index organised table AM, what
    > > would it mean to REPACK tab USING INDEX idx? Would that "secondary"
    > > index then go away and the table would become that index? or would
    > > both continue to exist and the secondary index would be surplus?
    >
    > So, there's already an implementation of an index-organized table in
    > OrioleDB, as I understand, so maybe we can ask Alexander K. about this.
    > I suspect it's fine to say that if you have a table for which it makes
    > no sense to use REPACK USING INDEX, then we just throw an error in that
    > case (but I suppose plain REPACK continues to work, and it just
    > recreates/compacts the primary index and rebuilds all secondary indexes,
    > just like VACUUM FULL would presumably do.)
    
    What I was wondering was if REPACK tab ORDER BY col; would be the way
    to change the IOT's order. Otherwise, I'm not sure how you'd make that
    work aside from providing something like: SELECT
    pg_recluster_iot_by(....)
    
    > > table_relation_copy_for_cluster() does support both use
    > > of an Index to get presorted results and sorting by the index's key
    > > columns, so it doesn't seem impossible that the ability to cluster a
    > > table *specifically* by an index couldn't easily go away at some
    > > point.
    >
    > Well, I hope you mean that clustering by an index would stop being the
    > _only_ way, not that it would completely disappear as an option.
    
    I mean support clustered orders that don't exist in any defined index.
    
    David
    
    
    
    
  30. Re: REPACK and naming

    Antonin Houska <ah@cybertec.at> — 2025-09-19T11:58:30Z

    Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    
    > On 2025-Sep-19, David Rowley wrote:
    > 
    > > I was just thinking about how much of a heap-ism cluster using an
    > > index is. If we were to ever have an index organised table AM, what
    > > would it mean to REPACK tab USING INDEX idx? Would that "secondary"
    > > index then go away and the table would become that index? or would
    > > both continue to exist and the secondary index would be surplus?
    > 
    > So, there's already an implementation of an index-organized table in
    > OrioleDB, as I understand, so maybe we can ask Alexander K. about this.
    > I suspect it's fine to say that if you have a table for which it makes
    > no sense to use REPACK USING INDEX, then we just throw an error in that
    > case (but I suppose plain REPACK continues to work, and it just
    > recreates/compacts the primary index and rebuilds all secondary indexes,
    > just like VACUUM FULL would presumably do.)
    > 
    > > I do understand that heap is well ingrained in our code (still), but
    > > at least things like system catalogue tables/columns can evolve over
    > > time. e.g pg_index.indisclustered I could imagine evolving (or
    > > disappearing) if we had an IOT-AM. I do think locking in syntax is
    > > going to be quite a bit more permanent and needs to be considered very
    > > carefully. Something like REPACK tab ORDER BY col1; seems a bit more
    > > future proof.
    > 
    > Oh, I think we can implement REPACK tab ORDER BY all right -- do note
    > that the current syntax has mandatory USING INDEX keywords (unlike
    > CLUSTER), so we can add that feature and others with no grammar
    > problems.  In fact even for current heaps it might make sense to allow
    > an ORDER BY clause for which there's no index.  I don't see us
    > gratuituously removing the option of specifying just an index name (or
    > indisclustered), though, because there are likely users that have been
    > running that for years.
    > 
    > > table_relation_copy_for_cluster() does support both use
    > > of an Index to get presorted results and sorting by the index's key
    > > columns, so it doesn't seem impossible that the ability to cluster a
    > > table *specifically* by an index couldn't easily go away at some
    > > point.
    > 
    > Well, I hope you mean that clustering by an index would stop being the
    > _only_ way, not that it would completely disappear as an option.
    > 
    > > Locking us deeper into a syntax for that, I do have concerns for. But
    > > maybe you've thought about all this already and I'm just not aware...
    > 
    > At this point we're not *implementing* any of that, but it is possible
    > to do so afterwards and we're not blocking that road.
    > 
    > > I'm also trying to keep something like a column store in mind here
    > > where you might not have any indexes, and efficient filtering is done
    > > via the pruning of "chunks", which works by each chunk recording the
    > > min/max (or maybe a dictionary of) values it contains for the columns.
    > > I imagine something like that very much would want the ability to have
    > > something like REPACK tbl ORDER BY col; if you think how efficient
    > > run-length encoding would be for some orders and now inefficient it
    > > could be for other orders.
    > 
    > That makes sense, yes, and again, AFAICT it can easily be implemented on
    > top of the current work.
    
    Admittedly I haven't thought about clause like ORDER BY yet, but I wonder if
    it'd really be useful. My understanding is that the purpose of clustering is
    to make index scan more efficient: with a clustered table, the heap tuples
    pertaining to given index tuple should be located on the same page, so the
    heap access is not that random.
    
    If IOT-AM table does not have anything like index, I imagine it has some kind
    of ordering information in the system catalog. Without that the query planner
    can hardly utilize the ordering. In such case REPACK should use the catalog
    information on ordering rather than accept arbitrary ORDER BY clause.
    
    -- 
    Antonin Houska
    Web: https://www.cybertec-postgresql.com
    
    
    
    
  31. Re: REPACK and naming

    David Rowley <dgrowleyml@gmail.com> — 2025-09-19T12:19:47Z

    On Fri, 19 Sept 2025 at 23:58, Antonin Houska <ah@cybertec.at> wrote:
    > Admittedly I haven't thought about clause like ORDER BY yet, but I wonder if
    > it'd really be useful. My understanding is that the purpose of clustering is
    > to make index scan more efficient: with a clustered table, the heap tuples
    > pertaining to given index tuple should be located on the same page, so the
    > heap access is not that random.
    
    I imagine that's true most of the time, but it could also be so that
    fewer pages are dirtied when an UPDATE updates a set or rows with the
    same or similar clustered column values.
    
    > If IOT-AM table does not have anything like index, I imagine it has some kind
    > of ordering information in the system catalog. Without that the query planner
    > can hardly utilize the ordering. In such case REPACK should use the catalog
    > information on ordering rather than accept arbitrary ORDER BY clause.
    
    Well, it would be impossible to insert records without some metadata
    to indicate the IOT keys...
    
    You might assume that someone might change their mind one day about
    the chosen order and wish to change it. My point was about leaving the
    door open to support that by having some native syntax that could be
    used to trigger that change.
    
    David
    
    
    
    
  32. Re: REPACK and naming

    Antonin Houska <ah@cybertec.at> — 2025-09-19T12:29:58Z

    David Rowley <dgrowleyml@gmail.com> wrote:
    
    > On Fri, 19 Sept 2025 at 23:58, Antonin Houska <ah@cybertec.at> wrote:
    > > Admittedly I haven't thought about clause like ORDER BY yet, but I wonder if
    > > it'd really be useful. My understanding is that the purpose of clustering is
    > > to make index scan more efficient: with a clustered table, the heap tuples
    > > pertaining to given index tuple should be located on the same page, so the
    > > heap access is not that random.
    > 
    > I imagine that's true most of the time, but it could also be so that
    > fewer pages are dirtied when an UPDATE updates a set or rows with the
    > same or similar clustered column values.
    
    Good point.
    
    > > If IOT-AM table does not have anything like index, I imagine it has some kind
    > > of ordering information in the system catalog. Without that the query planner
    > > can hardly utilize the ordering. In such case REPACK should use the catalog
    > > information on ordering rather than accept arbitrary ORDER BY clause.
    > 
    > Well, it would be impossible to insert records without some metadata
    > to indicate the IOT keys...
    > 
    > You might assume that someone might change their mind one day about
    > the chosen order and wish to change it. My point was about leaving the
    > door open to support that by having some native syntax that could be
    > used to trigger that change.
    
    I doubted whether the current AM API is designed to do catalog changes, but
    then recalled that CLUSTER does set pg_index.indisclustered, and that it does
    so outside table_relation_copy_for_cluster(). So I can now imagine that REPACK
    ... ORDER BY can do something like that.
    
    -- 
    Antonin Houska
    Web: https://www.cybertec-postgresql.com
    
    
    
    
  33. Re: REPACK and naming

    Alvaro Herrera <alvherre@alvh.no-ip.org> — 2025-09-19T12:42:58Z

    On 2025-Sep-19, David Rowley wrote:
    
    > On Fri, 19 Sept 2025 at 21:36, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    > >
    > > So, there's already an implementation of an index-organized table in
    > > OrioleDB, as I understand, so maybe we can ask Alexander K. about this.
    > > I suspect it's fine to say that if you have a table for which it makes
    > > no sense to use REPACK USING INDEX, then we just throw an error in that
    > > case (but I suppose plain REPACK continues to work, and it just
    > > recreates/compacts the primary index and rebuilds all secondary indexes,
    > > just like VACUUM FULL would presumably do.)
    > 
    > What I was wondering was if REPACK tab ORDER BY col; would be the way
    > to change the IOT's order.
    
    Sure -- the IOT is the "primary index", so what you are saying is, I
    think, the same as what I was saying.
    
    > > Well, I hope you mean that clustering by an index would stop being the
    > > _only_ way, not that it would completely disappear as an option.
    > 
    > I mean support clustered orders that don't exist in any defined index.
    
    Yes: I mean, I imagine that an implementation of
      REPACK tab ORDER BY col2
    (on normal heap tables) when there's no index on col2 would just use a
    seqscan-and-sort to put the tably in that physical order, and
    potentially store somewhere that "col2" is the order that was used.
    
    -- 
    Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
    "Selbst das größte Genie würde nicht weit kommen, wenn es
    alles seinem eigenen Innern verdanken wollte." (Johann Wolfgang von Goethe)
                   Ni aún el genio más grande llegaría muy lejos si
                        quisiera sacarlo todo de su propio interior.
    
    
    
    
  34. Re: REPACK and naming

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-09-19T12:49:02Z

    On 2025-Sep-19, Antonin Houska wrote:
    
    > Admittedly I haven't thought about clause like ORDER BY yet, but I wonder if
    > it'd really be useful. My understanding is that the purpose of clustering is
    > to make index scan more efficient:
    
    Not necessarily.  For some queries in some workloads, having tuples in a
    certain order for a seqscan might give considerable performance benefit
    also.  Moreso with, say, BRIN indexes, where having one tuple in one
    page range or another could mean having to scan that page range or
    eliding it completely.
    
    > with a clustered table, the heap tuples
    > pertaining to given index tuple should be located on the same page, so the
    > heap access is not that random.
    
    Yes, I suppose this is the first-order reason, and probably why we
    currently only support basing clustering on an index.  But I doubt it's
    the only one.  (It's also worth pointing out that quite possibly having
    REPACK CONCURRENTLY is going to make clustering a lot more popular;
    without concurrency, clustering is practically useless.)
    
    > If IOT-AM table does not have anything like index, I imagine it has some kind
    > of ordering information in the system catalog. Without that the query planner
    > can hardly utilize the ordering.
    
    Sure.
    
    > In such case REPACK should use the catalog information on ordering
    > rather than accept arbitrary ORDER BY clause.
    
    ... but, as David said, it might be valuable to change that ordering for
    whatever reason.
    
    -- 
    Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
    
    
    
    
  35. Re: REPACK and naming

    Andrew Dunstan <andrew@dunslane.net> — 2025-09-24T11:12:44Z

    On 2025-09-18 Th 6:53 PM, Álvaro Herrera wrote:
    > I highly doubt that anybody is going to be confused in the slightest
    > about what the REPACK command does, or that they would start to think
    > about when did they PACK anything before.  That sort of argument seems
    > entirely artificial, and that is not how people think.
    
    
    
    I agree.
    
    
    cheers
    
    
    andrew
    
    
    --
    Andrew Dunstan
    EDB:https://www.enterprisedb.com