Thread

  1. pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-21T02:05:46Z

    Hi all,
    
    During the last PGCon, I heard that some community members would be
    interested in having pg_reorg directly in core.
    Just to recall, pg_reorg is a tool developed by NTT that allows a table
    to be reorganized without taking locks on it.
    The technique it uses to reorganize the table is to create a temporary copy
    of the table with a CREATE TABLE AS whose definition depends on whether the
    table is being reorganized in the VACUUM FULL or the CLUSTER fashion.
    Then it follows this mechanism:
    - Create triggers to redirect all the DML that occurs on the table to
    an intermediate log table
    - Create indexes on the temporary table based on what the user wishes
    - Apply the logs registered during the index creation
    - Swap the names of the freshly created table and the old table
    - Drop the useless objects
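    
    To make the steps above concrete, here is a toy model in Python. It is
    only a sketch under assumptions: tables are plain dictionaries, the
    "triggers" are a flag on a wrapper object, and the names Table,
    LoggedTable, replay and reorganize are made up for illustration. None of
    this is pg_reorg's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy stand-in for a heap table: primary key -> row value."""
    rows: dict = field(default_factory=dict)


@dataclass
class LoggedTable:
    """Wraps a Table and, once logging is on, records every DML in a log,
    the way the triggers feed the intermediate log table."""
    table: Table
    log: list = field(default_factory=list)
    logging: bool = False

    def upsert(self, key, value):
        self.table.rows[key] = value
        if self.logging:
            self.log.append(("upsert", key, value))

    def delete(self, key):
        self.table.rows.pop(key, None)
        if self.logging:
            self.log.append(("delete", key, None))


def replay(log, target):
    """Apply the logged changes, in order, to the copy, then empty the log."""
    for op, key, value in log:
        if op == "upsert":
            target.rows[key] = value
        else:
            target.rows.pop(key, None)
    log.clear()


def reorganize(live, concurrent_dml=lambda: None):
    """Build a copy in key order, catch up on concurrent changes, then swap."""
    live.logging = True                                   # install the "triggers"
    copy = Table(dict(sorted(live.table.rows.items())))   # CREATE TABLE AS stand-in
    concurrent_dml()                                      # writes arriving during the build
    replay(live.log, copy)                                # apply the logs
    live.logging = False
    live.table = copy                                     # swap old and new
    return copy
```

    Rows written while the copy is being built reach it only through the
    log, which is why the log has to be applied before the swap.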
    
    The code is hosted on pgfoundry here: http://pgfoundry.org/projects/reorg/.
    I am also maintaining a fork on GitHub, in sync with pgfoundry, here:
    https://github.com/michaelpq/pg_reorg.
    
    So, do you think it is worth adding functionality like pg_reorg to
    core or not?
    
    If yes, well I think the code of pg_reorg is going to need some
    modifications to make it more compatible with contrib modules using only
    EXTENSION.
    For the time being pg_reorg is divided into 2 parts, binary and library.
    The library part is the SQL portion of pg_reorg, containing a set of C
    functions that are called by the binary part. This has been extended to
    support CREATE EXTENSION recently.
    The binary part creates a command pg_reorg in charge of calling the set of
    functions created by the lib part, being just a wrapper of the library part
    to control the creation and deletion of the objects.
    It is also in charge of deleting the temporary objects by callback if an
    error occurs.
    
    With the binary command, it is possible to reorganize a single table or
    a whole database; reorganizing a database simply loops over
    each table of that database.
    
    My idea is to remove the binary part and to rely only on the library part
    to make pg_reorg a single extension with only system functions like other
    contrib modules.
    In order to do that what is missing is a function that could be used as an
    entry point for table reorganization, a function of the type
    pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
    All the functionality of pg_reorg could be reproduced:
    - pg_reorg_table(tableoid) for a VACUUM FULL reorganization
    - pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has
    a CLUSTER key
    - pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based
    on a wanted column.
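    
    As a rough illustration of how the three call forms above could be told
    apart, here is a small Python sketch. Everything in it is hypothetical:
    plan_reorg, the _NOT_GIVEN sentinel and the cluster_key parameter are
    invented for the example, and raising an error when the table has no
    CLUSTER key is my assumption, since the proposal does not say what
    should happen in that case.

```python
_NOT_GIVEN = object()  # tells "no second argument" apart from an explicit None (SQL NULL)


def plan_reorg(table_oid, order_by=_NOT_GIVEN, cluster_key=None):
    """Map the proposed call forms onto a (mode, ordering column) pair.

    cluster_key is the column the table is already clustered on, if any;
    a real backend would look it up in the catalogs instead.
    """
    if order_by is _NOT_GIVEN:
        # pg_reorg_table(tableoid): VACUUM FULL style, no ordering
        return ("VACUUM FULL", None)
    if order_by is None:
        # pg_reorg_table(tableoid, NULL): reuse the existing CLUSTER key
        if cluster_key is None:
            raise ValueError(f"table {table_oid} has no CLUSTER key")
        return ("CLUSTER", cluster_key)
    # pg_reorg_table(tableoid, columnname): order by the requested column
    return ("CLUSTER", order_by)
```

    The sentinel is needed because the one-argument form and the explicit
    NULL form are meant to behave differently.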
    
    Is it worth a shot?
    
    Regards,
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  2. Re: pg_reorg in core?

    Josh Kupershmidt <schmiddy@gmail.com> — 2012-09-21T03:07:26Z

    On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
    <michael.paquier@gmail.com> wrote:
    > Hi all,
    >
    > During the last PGCon, I heard that some community members would be
    > interested in having pg_reorg directly in core.
    
    I'm actually not crazy about this idea, at least not given the current
    state of pg_reorg. Right now, there are quite a few fixes and
    features which remain to be merged into CVS head, but at least we can
    develop pg_reorg on a schedule independent of Postgres itself, i.e. we
    can release new features more often than once a year. Perhaps when
    pg_reorg is more stable, and the known bugs and missing features have
    been ironed out, we could think about integrating into core.
    
    Granted, a nice thing about integrating with core is we'd probably
    have more of an early warning when reshuffling of PG breaks pg_reorg
    (e.g. the recent splitting of the htup headers), but such changes have
    been quick and easy to fix so far.
    
    > Just to recall, pg_reorg is a functionality developped by NTT that allows to
    > redistribute a table without taking locks on it.
    > The technique it uses to reorganize the table is to create a temporary copy
    > of the table to be redistributed with a CREATE TABLE AS
    > whose definition changes if table is redistributed with a VACUUM FULL or
    > CLUSTER.
    > Then it follows this mechanism:
    > - triggers are created to redirect all the DMLs that occur on the table to
    > an intermediate log table.
    
    N.B. CREATE TRIGGER takes an AccessExclusiveLock on the table, see below.
    
    > - creation of indexes on the temporary table based on what the user wishes
    > - Apply the logs registered during the index creation
    > - Swap the names of freshly created table and old table
    > - Drop the useless objects
    >
    > The code is hosted by pg_foundry here: http://pgfoundry.org/projects/reorg/.
    > I am also maintaining a fork in github in sync with pgfoundry here:
    > https://github.com/michaelpq/pg_reorg.
    >
    > Just, do you guys think it is worth adding a functionality like pg_reorg in
    > core or not?
    >
    > If yes, well I think the code of pg_reorg is going to need some
    > modifications to make it more compatible with contrib modules using only
    > EXTENSION.
    > For the time being pg_reorg is divided into 2 parts, binary and library.
    > The library part is the SQL portion of pg_reorg, containing a set of C
    > functions that are called by the binary part. This has been extended to
    > support CREATE EXTENSION recently.
    > The binary part creates a command pg_reorg in charge of calling the set of
    > functions created by the lib part, being just a wrapper of the library part
    > to control the creation and deletion of the objects.
    > It is also in charge of deleting the temporary objects by callback if an
    > error occurs.
    >
    > By using the binary command, it is possible to reorganize a single table or
    > a database, in this case reorganizing a database launches only a loop on
    > each table of this database.
    >
    > My idea is to remove the binary part and to rely only on the library part to
    > make pg_reorg a single extension with only system functions like other
    > contrib modules.
    
    > In order to do that what is missing is a function that could be used as an
    > entry point for table reorganization, a function of the type
    > pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
    > All the functionalities of pg_reorg could be reproducible:
    > - pg_reorg_table(tableoid) for a VACUUM FULL reorganization
    > - pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has a
    > CLUSTER key
    > - pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based on
    > a wanted column.
    >
    > Is it worth the shot?
    
    I haven't seen this documented as such, but AFAICT the reason that
    pg_reorg is split into a binary and set of backend functions which are
    called by the binary is that pg_reorg needs to be able to control its
    steps in several transactions so as to avoid holding locks
    excessively. The reorg_one_table() function uses four or five
    transactions per table, in fact. If all the logic currently in the
    pg_reorg binary were moved into backend functions, calling
    pg_reorg_table() would have to be a single transaction, and there
    would be no advantage to using such a function vs. CLUSTER or VACUUM
    FULL.
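    
    A minimal Python sketch of the point about transactions, assuming a
    client that drives the work step by step. RecordingConnection is a
    stand-in that only records what it is asked to do, and the step labels
    are placeholders rather than the statements pg_reorg really issues; the
    split into steps is likewise illustrative.

```python
class RecordingConnection:
    """Stand-in for a client connection; records statements and commits."""

    def __init__(self):
        self.events = []

    def execute(self, sql):
        self.events.append(sql)

    def commit(self):
        self.events.append("COMMIT")


def run_in_separate_transactions(conn, steps):
    """Run each step in its own transaction, committing after it so that
    whatever locks it took are released before the next step begins."""
    for step in steps:
        for sql in step:
            conn.execute(sql)
        conn.commit()


# Placeholder labels only; each inner list is one transaction.
example_steps = [
    ["create log table", "create trigger"],  # short, needs a strong lock
    ["copy table", "build indexes"],         # long, should not hold that lock
    ["apply log"],
    ["swap", "drop temporary objects"],      # short, needs a strong lock again
]
```

    A single function call from SQL cannot commit in the middle like this,
    so the strong lock taken for the first step would be held until the
    very end.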
    
    Also, having a separate binary we should be able to perform some neat
    tricks such as parallel index builds using multiple connections (I'm
    messing around with this idea now). AFAIK this would also not be
    possible if pg_reorg were contained solely in the library functions.
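    
    The rough shape of what I have in mind, as a Python sketch rather than
    anything taken from pg_reorg: build_indexes_in_parallel and its connect
    parameter are hypothetical names, and connect is assumed to return an
    object with execute(sql) and close() methods.

```python
from concurrent.futures import ThreadPoolExecutor


def build_indexes_in_parallel(connect, index_statements, max_connections=4):
    """Run each CREATE INDEX statement on its own connection, a few at a time.

    connect is a caller-supplied factory returning an object with
    execute(sql) and close(). The statements that completed are returned
    in input order.
    """
    def build(sql):
        conn = connect()          # one connection per index build
        try:
            conn.execute(sql)
            return sql
        finally:
            conn.close()

    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(build, index_statements))
```

    Each build needs its own connection because a single backend runs one
    statement at a time.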
    
    Josh
    
    
    
  3. Re: pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-21T03:33:03Z

    On Fri, Sep 21, 2012 at 12:07 PM, Josh Kupershmidt <schmiddy@gmail.com> wrote:
    
    > On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
    > <michael.paquier@gmail.com> wrote:
    > > Hi all,
    > >
    > > During the last PGCon, I heard that some community members would be
    > > interested in having pg_reorg directly in core.
    >
    > I'm actually not crazy about this idea, at least not given the current
    > state of pg_reorg. Right now, there are a quite a few fixes and
    > features which remain to be merged in to cvs head, but at least we can
    > develop pg_reorg on a schedule independent of Postgres itself, i.e. we
    > can release new features more often than once a year. Perhaps when
    > pg_reorg is more stable, and the known bugs and missing features have
    > been ironed out, we could think about integrating into core.
    >
    
    What would also be great is to move the project directly to GitHub to
    facilitate its maintenance and development.
    My own copy is based on and synced with what is in pgfoundry, as I don't have
    any admin access on pgfoundry (honestly, I don't think I can get it either),
    even if I am from NTT. Hey, are there some people with admin rights here?
    
    
    > Granted, a nice thing about integrating with core is we'd probably
    > have more of an early warning when reshuffling of PG breaks pg_reorg
    > (e.g. the recent splitting of the htup headers), but such changes have
    > been quick and easy to fix so far.
    
    Yes, that is also why I am proposing to integrate it into core. Its
    maintenance pace would be faster and easier than it is now in pgfoundry.
    However, if hackers do not think that it is worth adding it to core... Well
    separate development as done now would be fine but slower...
    Also, just by watching the extension modules in contrib, I haven't seen one
    using both the library and binary at the same time like pg_reorg does.
    
    > > - creation of indexes on the temporary table based on what the user wishes
    > > - Apply the logs registered during the index creation
    > > - Swap the names of freshly created table and old table
    > > - Drop the useless objects
    > >
    > > The code is hosted by pg_foundry here: http://pgfoundry.org/projects/reorg/.
    > > I am also maintaining a fork in github in sync with pgfoundry here:
    > > https://github.com/michaelpq/pg_reorg.
    > >
    > > Just, do you guys think it is worth adding a functionality like pg_reorg in
    > > core or not?
    > >
    > > If yes, well I think the code of pg_reorg is going to need some
    > > modifications to make it more compatible with contrib modules using only
    > > EXTENSION.
    > > For the time being pg_reorg is divided into 2 parts, binary and library.
    > > The library part is the SQL portion of pg_reorg, containing a set of C
    > > functions that are called by the binary part. This has been extended to
    > > support CREATE EXTENSION recently.
    > > The binary part creates a command pg_reorg in charge of calling the set of
    > > functions created by the lib part, being just a wrapper of the library part
    > > to control the creation and deletion of the objects.
    > > It is also in charge of deleting the temporary objects by callback if an
    > > error occurs.
    > >
    > > By using the binary command, it is possible to reorganize a single table or
    > > a database, in this case reorganizing a database launches only a loop on
    > > each table of this database.
    > >
    > > My idea is to remove the binary part and to rely only on the library part to
    > > make pg_reorg a single extension with only system functions like other
    > > contrib modules.
    >
    > > In order to do that what is missing is a function that could be used as an
    > > entry point for table reorganization, a function of the type
    > > pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
    > > All the functionalities of pg_reorg could be reproducible:
    > > - pg_reorg_table(tableoid) for a VACUUM FULL reorganization
    > > - pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has a
    > > CLUSTER key
    > > - pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based on
    > > a wanted column.
    > >
    > > Is it worth the shot?
    >
    > I haven't seen this documented as such, but AFAICT the reason that
    > pg_reorg is split into a binary and set of backend functions which are
    > called by the binary is that pg_reorg needs to be able to control its
    > steps in several transactions so as to avoid holding locks
    > excessively. The reorg_one_table() function uses four or five
    > transactions per table, in fact. If all the logic currently in the
    > pg_reorg binary were moved into backend functions,  calling
    > pg_reorg_table() would have to be a single transaction, and there
    > would be no advantage to using such a function vs. CLUSTER or VACUUM
    > FULL.
    >
    Of course, but features like CREATE INDEX CONCURRENTLY use multiple
    transactions. Wouldn't it be possible to use something similar to make the
    modifications visible to other backends?
    
    
    >
    > Also, having a separate binary we should be able to perform some neat
    > tricks such as parallel index builds using multiple connections (I'm
    > messing around with this idea now). AFAIK this would also not be
    > possible if pg_reorg were contained solely in the library functions.
    >
    Interesting idea, this could accelerate the whole process. I am just
    wondering about possible consistency issues like the logs being replayed
    before swap.
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  4. Re: pg_reorg in core?

    Hitoshi Harada <umi.tanuki@gmail.com> — 2012-09-21T04:00:47Z

    On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
    <michael.paquier@gmail.com> wrote:
    > Hi all,
    >
    > During the last PGCon, I heard that some community members would be
    > interested in having pg_reorg directly in core.
    > Just to recall, pg_reorg is a functionality developped by NTT that allows to
    > redistribute a table without taking locks on it.
    > The technique it uses to reorganize the table is to create a temporary copy
    > of the table to be redistributed with a CREATE TABLE AS
    > whose definition changes if table is redistributed with a VACUUM FULL or
    > CLUSTER.
    > Then it follows this mechanism:
    > - triggers are created to redirect all the DMLs that occur on the table to
    > an intermediate log table.
    > - creation of indexes on the temporary table based on what the user wishes
    > - Apply the logs registered during the index creation
    > - Swap the names of freshly created table and old table
    > - Drop the useless objects
    >
    
    I'm not familiar with pg_reorg, but I wonder why we need a separate
    program for this task.  I know pg_reorg is ok as an external program
    per se, but if we could optimize CLUSTER (or VACUUM which I'm a little
    pessimistic about) in the same way, it's much nicer than having
    additional binary + extension.  Isn't it possible to do the same thing
    above within the CLUSTER command?  Maybe CLUSTER .. CONCURRENTLY?
    
    Thanks,
    -- 
    Hitoshi Harada
    
    
    
  5. Re: pg_reorg in core?

    Josh Kupershmidt <schmiddy@gmail.com> — 2012-09-21T04:17:02Z

    On Thu, Sep 20, 2012 at 8:33 PM, Michael Paquier
    <michael.paquier@gmail.com> wrote:
    
    > On Fri, Sep 21, 2012 at 12:07 PM, Josh Kupershmidt <schmiddy@gmail.com>
    > wrote:
    >>
    >> On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
    >> <michael.paquier@gmail.com> wrote:
    
    > What could be also great is to move the project directly into github to
    > facilitate its maintenance and development.
    
    No argument from me there, especially as I have my own fork in github,
    but that's up to the current maintainers.
    
    >> Granted, a nice thing about integrating with core is we'd probably
    >> have more of an early warning when reshuffling of PG breaks pg_reorg
    >> (e.g. the recent splitting of the htup headers), but such changes have
    >> been quick and easy to fix so far.
    >
    > Yes, that is also why I am proposing to integrate it into core. Its
    > maintenance pace would be faster and easier than it is now in pgfoundry.
    
    If the argument for moving pg_reorg into core is "faster and easier"
    development, well I don't really buy that. Yes, there would presumably
    be more eyeballs on the project, but you could make the same argument
    about any auxiliary Postgres project which wants more attention, and
    we can't have everything in core. And I fail to see how being in-core
    makes development "easier"; I think everyone here would agree that the
    bar to commit things to core is pretty darn high. If you're concerned
    about the [lack of] development on pg_reorg, there are plenty of
    things to fix without moving the project. I recently posted an "issues
    roundup" to the reorg list, if you are interested in pitching in.
    
    Josh
    
    
    
  6. Re: pg_reorg in core?

    Daniele Varrazzo <daniele.varrazzo@gmail.com> — 2012-09-21T12:33:37Z

    On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt <schmiddy@gmail.com> wrote:
    
    > If the argument for moving pg_reorg into core is "faster and easier"
    > development, well I don't really buy that.
    
    I don't see any problem in having pg_reorg in PGXN instead.
    
    I've tried adding a META.json to the project and it seems to work fine
    with the pgxn client. It is available, together with other patches, in my
    own GitHub fork.
    
    https://github.com/dvarrazzo/pg_reorg/
    
    I haven't submitted it to PGXN as I prefer the original author to keep
    the ownership.
    
    -- Daniele
    
    
    
  7. Re: pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-21T13:32:32Z

    On Fri, Sep 21, 2012 at 9:33 PM, Daniele Varrazzo <
    daniele.varrazzo@gmail.com> wrote:
    
    > On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt <schmiddy@gmail.com>
    > wrote:
    >
    > > If the argument for moving pg_reorg into core is "faster and easier"
    > > development, well I don't really buy that.
    >
    > I don't see any problem in having pg_reorg in PGXN instead.
    >
    > I've tried adding a META.json to the project and it seems working fine
    > with the pgxn client. It is together with other patches in my own
    > github fork.
    >
    > https://github.com/dvarrazzo/pg_reorg/
    >
    > I haven't submitted it to PGXN as I prefer the original author to keep
    > the ownership.
    >
    Thanks, I merged your patches with the dev branch for the time being.
    It would be great to have some input from the maintainers of pg_reorg in
    pgfoundry to see if they agree about putting it in pgxn.
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  8. Re: pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-21T13:42:45Z

    On Fri, Sep 21, 2012 at 1:00 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
    
    > I'm not familiar with pg_reorg, but I wonder why we need a separate
    > program for this task.  I know pg_reorg is ok as an external program
    > per se, but if we could optimize CLUSTER (or VACUUM which I'm a little
    > pessimistic about) in the same way, it's much nicer than having
    > additional binary + extension.  Isn't it possible to do the same thing
    > above within the CLUSTER command?  Maybe CLUSTER .. CONCURRENTLY?
    >
    CLUSTER might be better suited in this case, as the purpose is to reorder the
    table.
    The same technique used by pg_reorg (a table coupled with triggers) could
    reduce the locking on the table.
    Also, it could be possible to control each sub-operation in the same
    fashion as CREATE INDEX CONCURRENTLY.
    By the way, whichever operation is used, VACUUM or CLUSTER, I have a couple
    of doubts:
    1) Wouldn't it be too costly for a core operation, as pg_reorg really needs
    many temporary objects? It could be possible to reduce the number of objects
    created if it were added to core, though...
    2) Do you think the current CLUSTER is enough, and is there any wish to
    implement such an optimization directly in core?
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  9. Re: pg_reorg in core?

    sakamoto <dsakamoto@lolloo.net> — 2012-09-22T00:08:35Z

    (2012/09/21 22:32), Michael Paquier wrote:
    > On Fri, Sep 21, 2012 at 9:33 PM, Daniele Varrazzo 
    > <daniele.varrazzo@gmail.com <mailto:daniele.varrazzo@gmail.com>> wrote:
    >
    >     On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt
    >     <schmiddy@gmail.com <mailto:schmiddy@gmail.com>> wrote:
    >
    >     I haven't submitted it to PGXN as I prefer the original author to keep
    >     the ownership.
    >
    > Thanks, I merged your patches with the dev branch for the time being.
    > It would be great to have some input from the maintainers of pg_reorg 
    > in pgfoundry to see if they agree about putting it in pgxn.
    >
    Hi, I'm Sakamoto, the reorg maintainer.
    I'm very happy that Josh, Michael and Daniele are interested in reorg.
    
    I'm working on the next version of reorg, 1.1.8, which will be released
    in a couple of days.
    I have come to think that this is a good point to reconsider the way we
    develop and maintain it.
    To be honest, we have few development resources available, so no new
    features have been added recently, although there are features and fixes
    still to be done (as Josh summed up, thanks).
    
    I think it is a good idea to develop on GitHub. Would Michael's repo be the root?
    After the release of 1.1.8, I will freeze the CVS repository and create a
    mirror on GitHub.
    # Or Michael's repo will do :)
    
    I have received some patches from Josh and Daniele. They should be developed
    in the next major version, 1.2, so some of them may not be included in 1.1.8
    (because it is a minor version update), but I really appreciate them.
    
    I think we can discuss further on the reorg list.
    
    Sakamoto
    
    
    
  10. Re: pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-22T00:17:35Z

    On Sat, Sep 22, 2012 at 9:08 AM, sakamoto <dsakamoto@lolloo.net> wrote:
    
    > (2012/09/21 22:32), Michael Paquier wrote:
    >
    >> On Fri, Sep 21, 2012 at 9:33 PM, Daniele Varrazzo
    >> <daniele.varrazzo@gmail.com <mailto:daniele.varrazzo@gmail.com>> wrote:
    >>
    >>     On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt
    >>     <schmiddy@gmail.com <mailto:schmiddy@gmail.com>> wrote:
    >>
    >>     I haven't submitted it to PGXN as I prefer the original author to keep
    >>     the ownership.
    >>
    >> Thanks, I merged your patches with the dev branch for the time being.
    >> It would be great to have some input from the maintainers of pg_reorg in
    >> pgfoundry to see if they agree about putting it in pgxn.
    >>
    > Hi, I'm Sakamoto, reorg maintainer.
    > I'm very happy Josh, Michael  and Daniele are interested in reorg.
    >
    > I'm working on the next version of reorg 1.1.8, which will be released in
    > a couple of days.
    > And I come to think that it is a point to reconsider the way to
    > develop/maintain.
    > To be honest,   we have little available development resources, so no
    > additional
    > features are added recently.  But features and fixes to be done (as Josh
    > sums up. thanks).
    >
    > I think it is a good idea to develop on github. Michael's repo is the root?
    > After the release of 1.1.8, I will freeze CVS repository and create a
    > mirror on github.
    > # Or Michael's repo will do :)
    >
    As you wish. You could create the root repository under a new organization
    or under your own account, or use my repo.
    The result will be the same; I leave it to your discretion.
    
    > I have received some patches from Josh, Daniele. It should be developed in
    > the next major version 1.2. So some of them may not be included in 1.1.8
    > (caz it's minor versionup), but I feel so appreciated.
    >
    Great!
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  11. Re: pg_reorg in core?

    Christopher Browne <cbbrowne@gmail.com> — 2012-09-22T01:02:26Z

    If the present project is having a tough time doing enhancements, I should
    think it mighty questionable to try to draw it into core, since that pushes
    it onto a group of already very busy developers.
    
    On the other hand, if the present development efforts can be made more
    public, by having them take place in a more public repository, that at
    least has potential to let others in the community see and participate.
    There are no guarantees, but privacy is liable to hurt.
    
    I wouldn't expect any sudden huge influx of developers, but a steady
    visible stream of development effort would be mighty useful to a "merge
    into core" argument.
    
    A *lot* of projects are a lot like this.  On the Slony project, we have
    tried hard to maintain this sort of visibility.  Steve Singer, Jan Wieck
    and I do our individual efforts on git repos visible at GitHub to ensure
    ongoing efforts aren't invisible inside a corporate repo.  It hasn't led to
    any massive influx of extra developers, but I am always grateful to see Peter
    Eisentraut's bug reports.
    
  12. Re: pg_reorg in core?

    sakamoto <dsakamoto@lolloo.net> — 2012-09-22T02:01:01Z

    (2012/09/22 10:02), Christopher Browne wrote:
    >
    > If the present project is having a tough time doing enhancements, I 
    > should think it mighty questionable to try to draw it into core, that 
    > presses it towards a group of already very busy developers.
    >
    > On the other hand, if the present development efforts can be made more 
    > public, by having them take place in a more public repository, that at 
    > least has potential to let others in the community see and 
    > participate.  There are no guarantees, but privacy is liable to hurt.
    >
    > I wouldn't expect any sudden huge influx of developers, but a steady 
    > visible stream of development effort would be mighty useful to a 
    > "merge into core" argument.
    >
    > A *lot* of projects are a lot like this.  On the Slony project, we 
    > have tried hard to maintain this sort of visibility.  Steve Singer, 
    > Jan Wieck and I do our individual efforts on git repos visible at 
    > GitHub to ensure ongoing efforts aren't invisible inside a corporate 
    > repo.  It hasn't led to any massive of extra developers, but I am 
    > always grateful to see Peter Eisentraut's bug reports.
    >
    
    Agreed.  What the reorg project needs first is transparency, including
    issue tracking, bug lists, a list of todo items, clarified release schedules,
    quality assurance and so forth.
    Only after all that is done can the discussion about putting it into core start.
    
    Until now, reorg has been developed and maintained behind a corporate repository.
    But now that its activity has slowed, what I should do as a maintainer is
    make the development process more public and find someone to cooperate with :)
    
    Sakamoto
    
    
    
  13. Re: pg_reorg in core?

    Satoshi Nagayasu <snaga@uptime.jp> — 2012-09-22T07:25:21Z

    (2012/09/22 11:01), sakamoto wrote:
    > (2012/09/22 10:02), Christopher Browne wrote:
    >>
    >> If the present project is having a tough time doing enhancements, I 
    >> should think it mighty questionable to try to draw it into core, that 
    >> presses it towards a group of already very busy developers.
    >>
    >> On the other hand, if the present development efforts can be made more 
    >> public, by having them take place in a more public repository, that at 
    >> least has potential to let others in the community see and 
    >> participate.  There are no guarantees, but privacy is liable to hurt.
    >>
    >> I wouldn't expect any sudden huge influx of developers, but a steady 
    >> visible stream of development effort would be mighty useful to a 
    >> "merge into core" argument.
    >>
    >> A *lot* of projects are a lot like this.  On the Slony project, we 
    >> have tried hard to maintain this sort of visibility.  Steve Singer, 
    >> Jan Wieck and I do our individual efforts on git repos visible at 
    >> GitHub to ensure ongoing efforts aren't invisible inside a corporate 
    >> repo.  It hasn't led to any massive of extra developers, but I am 
    >> always grateful to see Peter Eisentraut's bug reports.
    >>
    > 
    > Agreed.  What reorg project needs first is transparency, including
    > issue traking, bugs,  listup todo items, clearfied release schedules,
    > quarity assurance and so force.
    > Only after all that done, the discussion to put them to core can be 
    > started.
    > 
    > Until now, reorg is developed and maintained behind corporate repository.
    > But now that its activity goes slow, what I should do as a maintainer is to
    > try development process more public and finds someone to corporate with:)
    
    I think it's time to consider some *umbrella project* for maintaining
    several small projects outside the core.
    
    As you pointed out, the problem here is that it's difficult to keep
    enough eyeballs and development resource on tiny projects outside
    the core.
    
    For example, NTT OSSC has created lots of tools, but they're facing
    some difficulties keeping them maintained because of their limited
    development resources. There are different code repositories, different
    web sites, different issue tracking systems and different dev mailing
    lists for the different small projects. My xlogdump as well.
    
    Actually, that's the reason why it's difficult to keep enough eyeballs
    on small third-party projects. And also the reason why some developers
    want to push their tools into the core, isn't it? :)
    
    To solve this problem, I would like to have some umbrella project.
    It would be called "pg dba utils", or something like this.
    This umbrella project may contain several third-party tools (pg_reorg,
    pg_rman, pg_filedump, xlogdump, etc, etc...) as its sub-modules.
    
    It could also have a single web site, code repository, issue tracking
    system and developer mailing list in order to share its development
    resources for testing, maintaining and releasing. I think it would help
    third-party projects keep enough eyeballs even outside the core.
    
    Of course, if a third-party project has a faster development pace
    and enough eyeballs to maintain it, it's ok for it to be an independent project.
    However, when a tool has already matured and has fewer eyeballs,
    it needs to be merged into this umbrella project.
    
    Any comments?
    
    > 
    > Sakamoto
    > 
    > 
    
    
    -- 
    Satoshi Nagayasu <snaga@uptime.jp>
    Uptime Technologies, LLC. http://www.uptime.jp
    
    
    
  14. Re: pg_reorg in core?

    Pavel Stehule <pavel.stehule@gmail.com> — 2012-09-22T08:01:44Z

    2012/9/22 Satoshi Nagayasu <snaga@uptime.jp>:
    > (2012/09/22 11:01), sakamoto wrote:
    >> (2012/09/22 10:02), Christopher Browne wrote:
    >>>
    >>> If the present project is having a tough time doing enhancements, I
    >>> should think it mighty questionable to try to draw it into core, that
    >>> presses it towards a group of already very busy developers.
    >>>
    >>> On the other hand, if the present development efforts can be made more
    >>> public, by having them take place in a more public repository, that at
    >>> least has potential to let others in the community see and
    >>> participate.  There are no guarantees, but privacy is liable to hurt.
    >>>
    >>> I wouldn't expect any sudden huge influx of developers, but a steady
    >>> visible stream of development effort would be mighty useful to a
    >>> "merge into core" argument.
    >>>
    >>> A *lot* of projects are a lot like this.  On the Slony project, we
    >>> have tried hard to maintain this sort of visibility.  Steve Singer,
    >>> Jan Wieck and I do our individual efforts on git repos visible at
    >>> GitHub to ensure ongoing efforts aren't invisible inside a corporate
    >>> repo.  It hasn't led to any massive influx of extra developers, but I am
    >>> always grateful to see Peter Eisentraut's bug reports.
    >>>
    >>
    >> Agreed.  What the reorg project needs first is transparency, including
    >> issue tracking, bugs, a list of todo items, clarified release schedules,
    >> quality assurance and so forth.
    >> Only after all that is done can the discussion about putting it into
    >> core be started.
    >>
    >> Until now, reorg has been developed and maintained behind a corporate
    >> repository. But now that its activity has slowed, what I should do as a
    >> maintainer is to try to make the development process more public and
    >> find someone to cooperate with :)
    >
    > I think it's time to consider some *umbrella project* for maintaining
    > several small projects outside the core.
    >
    > As you pointed out, the problem here is that it's difficult to keep
    > enough eyeballs and development resource on tiny projects outside
    > the core.
    >
    > For examples, NTT OSSC has created lots of tools, but they're facing
    > some difficulties to keep them being maintained because of their
    > development resources. There are different code repositories, different
    > web sites, different issue tracking systems and different dev mailing
    > lists, for different small projects. My xlogdump as well.
    >
    > Actually, that's the reason why it's difficult to keep enough eyeballs
    > on small third-party projects. And also the reason why some developers
    > want to push their tools into the core, isn't it? :)
    >
    > To solve this problem, I would like to have some umbrella project.
    > It would be called "pg dba utils", or something like this.
    > This umbrella project may contain several third-party tools (pg_reorg,
    > pg_rman, pg_filedump, xlogdump, etc, etc...) as its sub-modules.
    >
    > And also it may have single web site, code repository, issue tracking
    > system and developer mailing list in order to share its development
    > resource for testing, maintaining and releasing. I think it would help
    > third-party projects keep enough eyeballs even outside the core.
    >
    > Of course, if a third-party project has faster pace on its development
    > and enough eyeballs to maintain, it's ok to be an independent project.
    > However, when a tool has already matured and has fewer eyeballs,
    > it needs to be merged into this umbrella project.
    >
    > Any comments?
    >
    
    good idea
    
    Pavel
    
    >>
    >> Sakamoto
    >>
    >>
    >
    >
    > --
    > Satoshi Nagayasu <snaga@uptime.jp>
    > Uptime Technologies, LLC. http://www.uptime.jp
    >
    >
    > --
    > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    > To make changes to your subscription:
    > http://www.postgresql.org/mailpref/pgsql-hackers
    
    
    
  15. Re: pg_reorg in core?

    Peter Eisentraut <peter_e@gmx.net> — 2012-09-22T23:45:24Z

    On Sat, 2012-09-22 at 16:25 +0900, Satoshi Nagayasu wrote:
    > I think it's time to consider some *umbrella project* for maintaining
    > several small projects outside the core.
    
    Well, that was pgfoundry, and it didn't work out.
    
    
    
    
  16. Re: pg_reorg in core?

    Christopher Browne <cbbrowne@gmail.com> — 2012-09-23T01:21:34Z

    On Sat, Sep 22, 2012 at 7:45 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
    > On Sat, 2012-09-22 at 16:25 +0900, Satoshi Nagayasu wrote:
    >> I think it's time to consider some *umbrella project* for maintaining
    >> several small projects outside the core.
    >
    > Well, that was pgfoundry, and it didn't work out.
    
    There seem to be some efforts to update it, but yeah, the software
    behind it didn't age gracefully, and it seems doubtful to me that
    people will be flocking back to pgfoundry.
    
    The other ongoing attempt at an "umbrella" is PGXN, and it's different
    enough in approach that, while it's not obvious that it'll succeed, if
    it fails, the failure wouldn't involve the same set of issues that
    made pgfoundry problematic.
    
    PGXN notably captures metadata about the project; resources (e.g.
    SCM) don't have to be kept there.
    -- 
    When confronted by a difficult problem, solve it by reducing it to the
    question, "How would the Lone Ranger handle this?"
    
    
    
  17. Re: pg_reorg in core?

    Greg Sabino Mullane <greg@turnstep.com> — 2012-09-23T03:37:42Z

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: RIPEMD160
    
    
    >> I think it's time to consider some *umbrella project* for maintaining
    >> several small projects outside the core.
    >
    > Well, that was pgfoundry, and it didn't work out.
    
    I'm not sure that is quite analogous to what was being proposed. 
    I read it as more of "let's package a bunch of these small utilities 
    together into a single project", such that installing one installs them 
    all (e.g. aptitude install pg_tools), and they all have a single bug 
    tracker, etc. That tracker could be github, of course.
    
    I'm not convinced of the merit of that plan, but that's an alternative 
    interpretation that doesn't involve our beloved pgfoundry. :)
    
    Oh, and -1 for putting it in core. Way too early, and not 
    important enough.
    
    - -- 
    Greg Sabino Mullane greg@turnstep.com
    PGP Key: 0x14964AC8 201209222334
    http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
    -----BEGIN PGP SIGNATURE-----
    
    iEYEAREDAAYFAlBeg/AACgkQvJuQZxSWSsjL5ACgimT71B4lSb1ELhgMw5EBzAKs
    xHIAn08vxGzmM6eSmDfZfxlJDTousq7h
    =KgXW
    -----END PGP SIGNATURE-----
    
    
    
    
    
  18. Re: pg_reorg in core?

    Satoshi Nagayasu <snaga@uptime.jp> — 2012-09-23T16:14:38Z

    2012/09/23 12:37, Greg Sabino Mullane wrote:
    > -----BEGIN PGP SIGNED MESSAGE-----
    > Hash: RIPEMD160
    > 
    > 
    >>> I think it's time to consider some *umbrella project* for maintaining
    >>> several small projects outside the core.
    >>
    >> Well, that was pgfoundry, and it didn't work out.
    > 
    > I'm not sure that is quite analogous to what was being proposed.
    > I read it as more of "let's package a bunch of these small utilities
    > together into a single project", such that installing one installs them
    > all (e.g. aptitude install pg_tools), and they all have a single bug
    > tracker, etc. That tracker could be github, of course.
    
    Exactly --- I do not care about the SCM system though. :)
    
    > I'm not convinced of the merit of that plan, but that's an alternative
    > interpretation that doesn't involve our beloved pgfoundry. :)
    
    For example, xlogdump had not been maintained for 5 years when
    I picked it up last year. And the latest pg_filedump that supports 9.2
    has not been released yet. pg_reorg as well.
    
    If those tools are in a single project, it would be easier to keep
    attention on it. Then, developers can easily build *all of them*
    at once, fix them, and post any patch on the single mailing list.
    Actually, it would save developers from wasting their time.
    
    From my viewpoint, it's not just an SCM or distribution issue.
    It's about how such small projects around the core can survive
    even if they cannot get into the core.
    
    Regards,
    
    > 
    > Oh, and -1 for putting it in core. Way too early, and not
    > important enough.
    > 
    
    
    -- 
    Satoshi Nagayasu <snaga@uptime.jp>
    Uptime Technologies, LLC. http://www.uptime.jp
    
    
    
  19. Re: pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-23T23:23:58Z

    On Mon, Sep 24, 2012 at 1:14 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
    
    > 2012/09/23 12:37, Greg Sabino Mullane wrote:
    > > -----BEGIN PGP SIGNED MESSAGE-----
    > > Hash: RIPEMD160
    > >
    > >
    > >>> I think it's time to consider some *umbrella project* for maintaining
    > >>> several small projects outside the core.
    > >>
    > >> Well, that was pgfoundry, and it didn't work out.
    > >
    > > I'm not sure that is quite analogous to what was being proposed.
    > > I read it as more of "let's package a bunch of these small utilities
    > > together into a single project", such that installing one installs them
    > > all (e.g. aptitude install pg_tools), and they all have a single bug
    > > tracker, etc. That tracker could be github, of course.
    >
    > Exactly --- I do not care about the SCM system though. :)
    
    The bug tracker is going to be a mess if it has to manage 100 subprojects,
    knowing that each of them is strictly independent.
    Maintainers are also different people for each tool.
    
    
    >
    > > I'm not convinced of the merit of that plan, but that's an alternative
    > > interpretation that doesn't involve our beloved pgfoundry. :)
    >
    > For example, xlogdump had not been maintained for 5 years when
    > I picked it up last year. And the latest pg_filedump that supports 9.2
    > has not been released yet. pg_reorg as well.
    >
    > If those tools are in a single project, it would be easier to keep
    > attention on it. Then, developers can easily build *all of them*
    > at once, fix them, and post any patch on the single mailing list.
    > Actually, it would save developers from wasting their time.
    >
    > From my viewpoint, it's not just an SCM or distribution issue.
    > It's about how such small projects around the core can survive
    > even if they cannot get into the core.
    >
    The package manager could easily be PGXN. It is already designed
    for that.
    For development what you are looking for here is something that github
    could perfectly manage.
    As proposed by Masahiko, a single organization grouping all the tools (one
    repository per tool) would be enough. Please note that github can also host
    documentation. Bug tracker would be tool-dedicated in this case.
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  20. Re: pg_reorg in core?

    Daniele Varrazzo <daniele.varrazzo@gmail.com> — 2012-09-24T01:02:51Z

    On Mon, Sep 24, 2012 at 12:23 AM, Michael Paquier
    <michael.paquier@gmail.com> wrote:
    
    > As proposed by Masahiko, a single organization grouping all the tools (one
    > repository per tool) would be enough. Please note that github can also host
    > documentation. Bug tracker would be tool-dedicated in this case.
    
    From this PoV, pgFoundry allows your tool to be under
    http://yourtool.projects.postgresql.org instead of under a more
    generic namespace: I find it a nice and cozy place in the url space
    where to put your project. If pgFoundry will be dismissed I hope at
    least a hosting service for static pages will remain.
    
    -- Daniele
    
    
    
  21. Re: pg_reorg in core?

    Alvaro Herrera <alvherre@2ndquadrant.com> — 2012-09-24T14:17:53Z

    Excerpts from Daniele Varrazzo's message of dom sep 23 22:02:51 -0300 2012:
    > On Mon, Sep 24, 2012 at 12:23 AM, Michael Paquier
    > <michael.paquier@gmail.com> wrote:
    > 
    > > As proposed by Masahiko, a single organization grouping all the tools (one
    > > repository per tool) would be enough. Please note that github can also host
    > > documentation. Bug tracker would be tool-dedicated in this case.
    > 
    > From this PoV, pgFoundry allows your tool to be under
    > http://yourtool.projects.postgresql.org instead of under a more
    > generic namespace: I find it a nice and cozy place in the url space
    > where to put your project. If pgFoundry will be dismissed I hope at
    > least a hosting service for static pages will remain.
    
    I don't think that has been offered.
    
    -- 
    Álvaro Herrera                http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
    
    
    
  22. Re: pg_reorg in core?

    Christopher Browne <cbbrowne@gmail.com> — 2012-09-24T15:00:08Z

    On Mon, Sep 24, 2012 at 10:17 AM, Alvaro Herrera
    <alvherre@2ndquadrant.com> wrote:
    > Excerpts from Daniele Varrazzo's message of dom sep 23 22:02:51 -0300 2012:
    >> On Mon, Sep 24, 2012 at 12:23 AM, Michael Paquier
    >> <michael.paquier@gmail.com> wrote:
    >>
    >> > As proposed by Masahiko, a single organization grouping all the tools (one
    >> > repository per tool) would be enough. Please note that github can also host
    >> > documentation. Bug tracker would be tool-dedicated in this case.
    >>
    >> From this PoV, pgFoundry allows your tool to be under
    >> http://yourtool.projects.postgresql.org instead of under a more
    >> generic namespace: I find it a nice and cozy place in the url space
    >> where to put your project. If pgFoundry will be dismissed I hope at
    >> least a hosting service for static pages will remain.
    >
    > I don't think that has been offered.
    
    But I don't think it's necessarily the case that pgFoundry is getting
    "dismissed", either.
    
    I got a note from Marc Fournier not too long ago (sent to some
    probably-not-small set of people with pgFoundry accounts) indicating
    that they were planning to upgrade gForge as far as they could, and
    then switch to FusionForge <http://fusionforge.org/>, which is
    evidently the successor.  It shouldn't be assumed that the upgrade
    process will be easy or quick.
    -- 
    When confronted by a difficult problem, solve it by reducing it to the
    question, "How would the Lone Ranger handle this?"
    
    
    
  23. Re: pg_reorg in core?

    Simon Riggs <simon@2ndquadrant.com> — 2012-09-24T15:15:32Z

    On 21 September 2012 08:42, Michael Paquier <michael.paquier@gmail.com> wrote:
    >
    >
    > On Fri, Sep 21, 2012 at 1:00 PM, Hitoshi Harada <umi.tanuki@gmail.com>
    > wrote:
    >>
    >> I'm not familiar with pg_reorg, but I wonder why we need a separate
    >> program for this task.  I know pg_reorg is ok as an external program
    >> per se, but if we could optimize CLUSTER (or VACUUM which I'm a little
    >> pessimistic about) in the same way, it's much nicer than having
    >> additional binary + extension.  Isn't it possible to do the same thing
    >> above within the CLUSTER command?  Maybe CLUSTER .. CONCURRENTLY?
    >
    > CLUSTER might be more adapted in this case as the purpose is to reorder the
    > table.
    > The same technique used by pg_reorg (aka table coupled with triggers) could
    > lower the lock access of the table.
    > Also, it could be possible to control each sub-operation in the same
    > way as CREATE INDEX CONCURRENTLY.
    > By the way, whatever the operation, VACUUM or CLUSTER used, I got a couple
    > of doubts:
    > 1) wouldn't it be too costly for a core operation as pg_reorg really needs many
    > temporary objects? Could be possible to reduce the number of objects created
    > if added to core though...
    > 2) Do you think the current CLUSTER is enough and are there wishes to
    > implement such an optimization directly in core?
    
    
    For me, the Postgres user interface should include
    * REINDEX CONCURRENTLY
    * CLUSTER CONCURRENTLY
    * ALTER TABLE CONCURRENTLY
    and also that autovacuum would be expanded to include REINDEX and
    CLUSTER, renaming it to automaint.
    
    The actual implementation mechanism for those probably looks something
    like pg_reorg, but I don't see it as preferable to include the utility
    directly into core, though potentially some of the underlying code
    might be.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
    
  24. Re: pg_reorg in core?

    Roberto Mello <roberto.mello@gmail.com> — 2012-09-24T16:22:25Z

    On Sat, Sep 22, 2012 at 3:25 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
    >
    > To solve this problem, I would like to have some umbrella project.
    > It would be called "pg dba utils", or something like this.
    > This umbrella project may contain several third-party tools (pg_reorg,
    > pg_rman, pg_filedump, xlogdump, etc, etc...) as its sub-modules.
    
    Great idea!
    
    +1
    
    Roberto Mello
    
    
    
  25. Re: pg_reorg in core?

    Satoshi Nagayasu <snaga@uptime.jp> — 2012-09-24T18:38:13Z

    2012/09/25 0:15, Simon Riggs wrote:
    > On 21 September 2012 08:42, Michael Paquier <michael.paquier@gmail.com> wrote:
    >>
    >>
    >> On Fri, Sep 21, 2012 at 1:00 PM, Hitoshi Harada <umi.tanuki@gmail.com>
    >> wrote:
    >>>
    >>> I'm not familiar with pg_reorg, but I wonder why we need a separate
    >>> program for this task.  I know pg_reorg is ok as an external program
    >>> per se, but if we could optimize CLUSTER (or VACUUM which I'm a little
    >>> pessimistic about) in the same way, it's much nicer than having
    >>> additional binary + extension.  Isn't it possible to do the same thing
    >>> above within the CLUSTER command?  Maybe CLUSTER .. CONCURRENTLY?
    >>
    >> CLUSTER might be more adapted in this case as the purpose is to reorder the
    >> table.
    >> The same technique used by pg_reorg (aka table coupled with triggers) could
    >> lower the lock access of the table.
    >> Also, it could be possible to control each sub-operation in the same
    >> way as CREATE INDEX CONCURRENTLY.
    >> By the way, whatever the operation, VACUUM or CLUSTER used, I got a couple
    >> of doubts:
    >> 1) wouldn't it be too costly for a core operation as pg_reorg really needs many
    >> temporary objects? Could be possible to reduce the number of objects created
    >> if added to core though...
    >> 2) Do you think the current CLUSTER is enough and are there wishes to
    >> implement such an optimization directly in core?
    >
    >
    > For me, the Postgres user interface should include
    > * REINDEX CONCURRENTLY
    > * CLUSTER CONCURRENTLY
    > * ALTER TABLE CONCURRENTLY
    > and also that autovacuum would be expanded to include REINDEX and
    > CLUSTER, renaming it to automaint.
    >
    > The actual implementation mechanism for those probably looks something
    > like pg_reorg, but I don't see it as preferable to include the utility
    > directly into core, though potentially some of the underlying code
    > might be.
    
    I think it depends on what trade-off we can see.
    
    AFAIK, basically, rebuilding tables and/or indexes has
    a trade-off between "lock-free" and "disk-space".
    
    So, if we have enough disk space to build a "temporary"
    table/index when rebuilding a table/index, "concurrently"
    would be a great option, and I would love to have it
    in core.
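
    The disk-space side of that trade-off can be put into rough numbers. The
    sketch below is only back-of-the-envelope arithmetic, assuming that the
    old and the rebuilt copies coexist until the swap; the function name and
    the bloat percentage are hypothetical, and the log table, WAL and sort
    files are ignored.

```python
def peak_disk_usage(table_bytes, index_bytes, bloat_percent):
    """Rough peak footprint of a concurrent rebuild.

    Assumes the old table and indexes stay on disk until the rebuilt
    copies are complete, and that the rebuilt copies shrink by
    bloat_percent (a hypothetical input, not something measured here).
    """
    current = table_bytes + index_bytes
    rebuilt = current * (100 - bloat_percent) // 100
    return current + rebuilt


# 100 GB table, 20 GB of indexes, half of it bloat (sizes in GB)
print(peak_disk_usage(100, 20, 50))
```

    With these inputs the rebuild needs about 180 GB at its peak, against
    120 GB before it starts, which is the price paid for not holding a lock.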
    
    Regards,
    -- 
    Satoshi Nagayasu <snaga@uptime.jp>
    Uptime Technologies, LLC. http://www.uptime.jp
    
    
    
  26. Re: pg_reorg in core?

    Josh Berkus <josh@agliodbs.com> — 2012-09-24T22:36:16Z

    >> For me, the Postgres user interface should include
    >> * REINDEX CONCURRENTLY
    
    I don't see why we don't have REINDEX CONCURRENTLY now.  When I was
    writing out the instructions for today's update, I was thinking "we
    already have all the commands for this".
    
    -- 
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
    
    
    
  27. Re: pg_reorg in core?

    Simon Riggs <simon@2ndquadrant.com> — 2012-09-24T22:43:40Z

    On 24 September 2012 17:36, Josh Berkus <josh@agliodbs.com> wrote:
    >
    >>> For me, the Postgres user interface should include
    >>> * REINDEX CONCURRENTLY
    >
    > I don't see why we don't have REINDEX CONCURRENTLY now.
    
    Same reason for everything on (anyone's) TODO list.
    
    Lack of vision is not holding us back, we just need the vision to realise it.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
    
  28. Re: pg_reorg in core?

    Josh Berkus <josh@agliodbs.com> — 2012-09-24T22:55:35Z

    On 9/24/12 3:43 PM, Simon Riggs wrote:
    > On 24 September 2012 17:36, Josh Berkus <josh@agliodbs.com> wrote:
    >>
    >>>> For me, the Postgres user interface should include
    >>>> * REINDEX CONCURRENTLY
    >>
    >> I don't see why we don't have REINDEX CONCURRENTLY now.
    > 
    > Same reason for everything on (anyone's) TODO list.
    
    Yes, I'm just pointing out that it would be a very small patch for
    someone, and that AFAIK it didn't make it on the TODO list yet.
    
    -- 
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
    
    
    
  29. Re: pg_reorg in core?

    Andres Freund <andres@2ndquadrant.com> — 2012-09-24T23:13:15Z

    On Tuesday, September 25, 2012 12:55:35 AM Josh Berkus wrote:
    > On 9/24/12 3:43 PM, Simon Riggs wrote:
    > > On 24 September 2012 17:36, Josh Berkus <josh@agliodbs.com> wrote:
    > >>>> For me, the Postgres user interface should include
    > >>>> * REINDEX CONCURRENTLY
    > >> 
    > >> I don't see why we don't have REINDEX CONCURRENTLY now.
    > > 
    > > Same reason for everything on (anyone's) TODO list.
    > 
    > Yes, I'm just pointing out that it would be a very small patch for
    > someone, and that AFAIK it didn't make it on the TODO list yet.
    It's not *that* small.
    
    1. You need more than you can do with CREATE INDEX CONCURRENTLY and DROP INDEX 
    CONCURRENTLY because the index can e.g. be referenced by a foreign key 
    constraint. So you need to replace the existing index oid with a new one by 
    swapping the relfilenodes of both after verifying several side conditions 
    (indcheckxmin, indisvalid, indisready).
    
    It would probably have to look like:
    
    - build new index with indisready = false
    - newindex.indisready = true
    - wait
    - newindex.indisvalid = true
    - wait
    - swap(oldindex.relfilenode, newindex.relfilenode)
    - oldindex.indisvalid = false
    - wait
    - oldindex.indisready = false
    - wait
    - drop new index with old relfilenode
    
    Every wait indicates an externally visible state which you might
    encounter/need to clean up...
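
    The step list above can be walked through as a small state machine. This
    is a toy model in Python, not PostgreSQL code: `IndexEntry` and
    `reindex_concurrently_sketch` are made-up names, and the sketch reads
    "oldindex" in the steps after the swap as the catalog entry that now
    carries the old relfilenode, which is an interpretation of the list rather
    than something it states.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Toy stand-in for one index: its oid, storage and pg_index flags."""
    oid: int
    relfilenode: int
    indisready: bool
    indisvalid: bool


def reindex_concurrently_sketch(old, new_oid, new_relfilenode):
    """Follow the listed steps and record what is visible at each wait."""
    # build new index with indisready = false
    new = IndexEntry(new_oid, new_relfilenode, indisready=False, indisvalid=False)
    visible_states = []

    def wait(step):
        # A real implementation would commit and wait for other
        # transactions here; the sketch only records the visible state.
        visible_states.append({
            "after": step,
            "old": (old.relfilenode, old.indisready, old.indisvalid),
            "new": (new.relfilenode, new.indisready, new.indisvalid),
        })

    new.indisready = True
    wait("new index ready")
    new.indisvalid = True
    wait("new index valid")

    # swap(oldindex.relfilenode, newindex.relfilenode): the original oid,
    # which constraints may reference, now points at the fresh storage.
    old.relfilenode, new.relfilenode = new.relfilenode, old.relfilenode

    # From here on `new` is the entry holding the old storage (assumption).
    new.indisvalid = False
    wait("old storage no longer valid")
    new.indisready = False
    wait("old storage no longer ready")

    # drop new index with old relfilenode: only `old` survives.
    return old, visible_states


kept, states = reindex_concurrently_sketch(
    IndexEntry(1000, 5000, True, True), new_oid=1001, new_relfilenode=5001)
```

    Each entry in `states` is one of the intermediate states an interrupted
    run could leave behind, so there are four of them to account for here.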
    
    To make it viable to use that system-wide it might be necessary to batch the
    individual steps together for multiple indexes because all that waiting is
    going to suck if you do it for every single table in the database while you
    also have long-running queries...
    
    2. no support for concurrent on system tables (not easy for shared catalogs)
    
    3. no support for the indexes of exclusion constraints (not hard I think)
    
    Greetings,
    
    Andres
    -- 
     Andres Freund	                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
    
  30. Re: pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-25T02:37:05Z

    On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund <andres@2ndquadrant.com> wrote:
    
    > On Tuesday, September 25, 2012 12:55:35 AM Josh Berkus wrote:
    > > On 9/24/12 3:43 PM, Simon Riggs wrote:
    > > > On 24 September 2012 17:36, Josh Berkus <josh@agliodbs.com> wrote:
    > > >>>> For me, the Postgres user interface should include
    > > >>>> * REINDEX CONCURRENTLY
    > > >>
    > > >> I don't see why we don't have REINDEX CONCURRENTLY now.
    > > >
    > > > Same reason for everything on (anyone's) TODO list.
    > >
    > > Yes, I'm just pointing out that it would be a very small patch for
    > > someone, and that AFAIK it didn't make it on the TODO list yet.
    > It's not *that* small.
    >
    > 1. You need more than you can do with CREATE INDEX CONCURRENTLY and DROP
    > INDEX CONCURRENTLY because the index can e.g. be referenced by a foreign
    > key constraint. So you need to replace the existing index oid with a new
    > one by swapping the relfilenodes of both after verifying several side
    > conditions (indcheckxmin, indisvalid, indisready).
    >
    > It would probably have to look like:
    >
    > - build new index with indisready = false
    > - newindex.indisready = true
    > - wait
    > - newindex.indisvalid = true
    > - wait
    > - swap(oldindex.relfilenode, newindex.relfilenode)
    > - oldindex.indisvalid = false
    > - wait
    > - oldindex.indisready = false
    > - wait
    > - drop new index with old relfilenode
    >
    > Every wait indicates an externally visible state which you might
    > encounter/need to cleanup...
    >
    Could you clarify what you mean here by cleanup?
    I am afraid I do not get your point here.
    
    
    > 2. no support for concurrent on system tables (not easy for shared
    > catalogs)
    >
    Doesn't this exclude all the tables that are in the schema catalog?
    
    
    >
    > 3. no support for the indexes of exclusion constraints (not hard I think)
    >
    This just consists in a check of indisready in pg_index.
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  31. Re: pg_reorg in core?

    Andres Freund <andres@2ndquadrant.com> — 2012-09-25T08:55:29Z

    On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
    > On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund <andres@2ndquadrant.com>wrote:
    > > On Tuesday, September 25, 2012 12:55:35 AM Josh Berkus wrote:
    > > > On 9/24/12 3:43 PM, Simon Riggs wrote:
    > > > > On 24 September 2012 17:36, Josh Berkus <josh@agliodbs.com> wrote:
    > > > >>>> For me, the Postgres user interface should include
    > > > >>>> * REINDEX CONCURRENTLY
    > > > >> 
    > > > >> I don't see why we don't have REINDEX CONCURRENTLY now.
    > > > > 
    > > > > Same reason for everything on (anyone's) TODO list.
    > > > 
    > > > Yes, I'm just pointing out that it would be a very small patch for
    > > > someone, and that AFAIK it didn't make it on the TODO list yet.
    > > 
    > > It's not *that* small.
    > > 
    > > 1. You need more than you can do with CREATE INDEX CONCURRENTLY and DROP
    > > INDEX
    > > CONCURRENTLY because the index can e.g. be referenced by a foreign key
    > > constraint. So you need to replace the existing index oid with a new one
    > > by swapping the relfilenodes of both after verifying several side
    > > conditions (indcheckxmin, indisvalid, indisready).
    > > 
    > > It would probably have to look like:
    > > 
    > > - build new index with indisready = false
    > > - newindex.indisready = true
    > > - wait
    > > - newindex.indisvalid = true
    > > - wait
    > > - swap(oldindex.relfilenode, newindex.relfilenode)
    > > - oldindex.indisvalid = false
    > > - wait
    > > - oldindex.indisready = false
    > > - wait
    > > - drop new index with old relfilenode
    > > 
    > > Every wait indicates an externally visible state which you might
    > > encounter/need
    > > to cleanup...
    > 
    > Could you clarify what do you mean here by cleanup?
    > I am afraid I do not get your point here.
    Sorry, was a bit tired when writing the above.
    
    The point is that, to work concurrently, the CONCURRENTLY operations
    commit/start multiple transactions internally. The operation can be
    interrupted (user, shutdown, error, crash) and leave transient state behind
    every time it does so. What I wanted to say is that we need to take care
    that each of those states can easily be cleaned up afterwards.
    
    > > 2. no support for concurrent on system tables (not easy for shared
    > > catalogs)
    > Doesn't this exclude all the tables that are in the schema catalog?
    No. Only
    
    SELECT array_to_string(array_agg(relname), ', ') FROM pg_class WHERE 
    relisshared AND relkind = 'r';
    
    their toast tables and their indexes are shared. The problem is that for those
    you cannot create a separate index and let it update concurrently because you
    cannot write into each database's pg_class/pg_index.
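
    To show what the query's predicate selects, here is a Python sketch that
    applies the same filter to a few hand-written rows. The rows are a small
    illustrative sample typed in by hand, not the output of a real server,
    and `shared_tables` is a made-up helper name.

```python
# Hand-written sample of (relname, relisshared, relkind) rows; a small
# illustrative subset, not the full contents of pg_class.
SAMPLE_PG_CLASS = [
    ("pg_database", True, "r"),
    ("pg_authid", True, "r"),
    ("pg_tablespace", True, "r"),
    ("pg_database_datname_index", True, "i"),  # shared, but an index
    ("pg_class", False, "r"),                  # one copy per database
    ("pg_index", False, "r"),                  # one copy per database
]


def shared_tables(rows):
    """Same predicate as the query: relisshared AND relkind = 'r'."""
    return ", ".join(name for name, shared, kind in rows
                     if shared and kind == "r")


print(shared_tables(SAMPLE_PG_CLASS))
```

    The per-database catalogs pg_class and pg_index drop out of the result,
    and those are exactly the ones a concurrent rebuild of a shared index
    would have to update in every database.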
    
    > > 3. no support for the indexes of exclusion constraints (not hard I think)
    > This just consists in a check of indisready in pg_index.
    It will probably be several places, but yeah, I don't think it's hard.
    
    Andres
    -- 
     Andres Freund	                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
    
  32. Re: pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-25T11:48:34Z

    On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund <andres@2ndquadrant.com> wrote:
    
    > On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
    > > On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund <andres@2ndquadrant.com> wrote:
    > > Could you clarify what do you mean here by cleanup?
    > > I am afraid I do not get your point here.
    >
    > Sorry, was a bit tired when writing the above.
    >
    > The point is that to work concurrent the CONCURRENT operations commit/start
    > multiple transactions internally. It can be interrupted (user, shutdown,
    > error, crash) and leave transient state behind every time it does so. What
    > I wanted to say is that we need to take care that each of those can easily
    > be cleaned up afterwards.
    >
    Sure, many errors may happen.
    But, in the case of CREATE INDEX CONCURRENTLY, there is no cleanup method
    implemented as far as I know (I might be missing something, though). Isn't
    an index simply left marked as invalid when a concurrent creation fails?
    In the case of REINDEX it would be essential to create such a cleanup
    mechanism, as I cannot imagine a production database with an index that has
    been marked as invalid due to a concurrent reindex failure, assuming here,
    of course, that REINDEX CONCURRENTLY would handle errors the same way as
    CREATE INDEX CONCURRENTLY.

    One of the possible cleanup mechanisms off the top of my head is a callback
    at transaction abort; the callback would need to be different for each
    subtransaction used during the concurrent operation.
    In case the callback itself fails, well, the old and/or new indexes become
    invalid.
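
    A minimal sketch of that callback idea, written as a toy model in Python
    rather than against any PostgreSQL API. `ConcurrentOperation`, `run_phase`
    and `demo` are hypothetical names; each phase stands in for one internal
    transaction and registers its own cleanup callback.

```python
class ConcurrentOperation:
    """Toy model: one cleanup callback per internal transaction."""

    def __init__(self):
        self._cleanups = []   # (phase, callback), in start order
        self.leftovers = []   # phases whose cleanup callback itself failed

    def run_phase(self, phase, action, cleanup):
        # Register the callback before the phase starts, so it also
        # covers a failure in the middle of this phase.
        self._cleanups.append((phase, cleanup))
        try:
            action()
        except Exception:
            self.abort()
            raise

    def abort(self):
        # Undo in reverse order; a failing callback leaves its phase behind.
        while self._cleanups:
            phase, cleanup = self._cleanups.pop()
            try:
                cleanup()
            except Exception:
                self.leftovers.append(phase)


def demo():
    catalog = {}
    op = ConcurrentOperation()

    def build():
        catalog["idx_new"] = "not valid"

    def validate():
        raise RuntimeError("interrupted")

    def drop_new():
        catalog.pop("idx_new", None)

    op.run_phase("build", build, drop_new)
    try:
        op.run_phase("validate", validate, lambda: None)
    except RuntimeError:
        pass
    return catalog, op.leftovers


catalog, leftovers = demo()
```

    Here the interrupted second phase triggers the first phase's callback and
    the half-built index disappears. A crash would never reach the callbacks
    at all, so this approach alone would not cover every interruption listed
    in the quoted message.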
    
    
    >
    > > > 2. no support for concurrent on system tables (not easy for shared
    > > > catalogs)
    > > Doesn't this exclude all the tables that are in the schema catalog?
    > No. Only
    >
    > SELECT array_to_string(array_agg(relname), ', ') FROM pg_class WHERE
    > relisshared AND relkind = 'r';
    >
    > their toast tables and their indexes are shared. The problem is that for
    > those you cannot create a separate index and let it update concurrently
    > because you cannot write into each databases pg_class/pg_index.
    >
    Yes indeed, I didn't think about things that are shared among databases.
    Blocking that is pretty simple; it is only a matter of adding checks in the
    right places.
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  33. Re: pg_reorg in core?

    Dimitri Fontaine <dimitri@2ndquadrant.fr> — 2012-09-25T19:42:16Z

    Simon Riggs <simon@2ndQuadrant.com> writes:
    > For me, the Postgres user interface should include
    > * REINDEX CONCURRENTLY
    > * CLUSTER CONCURRENTLY
    > * ALTER TABLE CONCURRENTLY
    > and also that autovacuum would be expanded to include REINDEX and
    > CLUSTER, renaming it to automaint.
    
    FWIW, +1 to all those user requirements, and for not having pg_reorg
    simply moved as-is nearer to core. I would paint the shed "autoheal",
    maybe.
    
    Regards,
    -- 
    Dimitri Fontaine
    http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
    
    
    
  34. Re: pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-26T00:38:31Z

    On Wed, Sep 26, 2012 at 4:42 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
    
    > Simon Riggs <simon@2ndQuadrant.com> writes:
    > > For me, the Postgres user interface should include
    > > * REINDEX CONCURRENTLY
    > > * CLUSTER CONCURRENTLY
    > > * ALTER TABLE CONCURRENTLY
    > > and also that autovacuum would be expanded to include REINDEX and
    > > CLUSTER, renaming it to automaint.
    >
    > FWIW, +1 to all those user requirements, and for not having pg_reorg
    > simply moved as-is nearer to core. I would paint the shed "autoheal",
    > maybe.
    >
    Yes, completely agreed.
    Based on what Simon is suggesting, the REINDEX and CLUSTER extensions
    are prerequisites for the autovacuum extension. It would need to use a
    mechanism that is slightly different from pg_reorg. ALTER TABLE could use
    something close to pg_reorg by creating a new table and then swapping the
    two tables. The cases of column drop and addition particularly need some
    thought.
    
    I would like to work on such features and provide patches for the first
    two. This will of course strongly depend on the time I can spend on it in
    the next couple of months.
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  35. Re: pg_reorg in core?

    Andres Freund <andres@2ndquadrant.com> — 2012-09-26T11:13:03Z

    On Tuesday, September 25, 2012 01:48:34 PM Michael Paquier wrote:
    > On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund <andres@2ndquadrant.com> wrote:
    > > On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
    > > > On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund <andres@2ndquadrant.com> wrote:
    > > > Could you clarify what do you mean here by cleanup?
    > > > I am afraid I do not get your point here.
    > > 
    > > Sorry, was a bit tired when writing the above.
    > > 
    > > The point is that to work concurrent the CONCURRENT operations
    > > commit/start multiple transactions internally. It can be interrupted
    > > (user, shutdown, error, crash) and leave transient state behind every
    > > time it does so. What I wanted to say is that we need to take care that
    > > each of those can easily be cleaned up afterwards.
    > 
    > Sure, many errors may happen.
    > But, in the case of CREATE INDEX CONCURRENTLY, there is no clean up method
    > implemented as far as I know (might be missing something though). Isn't an
    > index only considered as invalid in case of failure for concurrent creation?
    Well, you can DROP or REINDEX the invalid index.
    
    There are several scenarios where you can get invalid indexes. Unique 
    violations, postgres restarts, aborted index creation...
    
    > In the case of REINDEX it would be essential to create such a cleanup
    > mechanism as I cannot imagine a production database with an index that has
    > been marked as invalid due to a concurrent reindex failure, by assuming here,
    > of course, that REINDEX CONCURRENTLY would use the same level of process
    > error as CREATE INDEX CONCURRENTLY.
    Not sure what you're getting at?
    
    > One of the possible cleanup mechanisms I got on top of my head is a
    > callback at transaction abort, each callback would need to be different for
    > each subtransaction used at during the concurrent operation.
    > In case the callback itself fails, well the old and/or new indexes become
    > invalid.
    That's not going to work. E.g. the session might have been aborted or such.
    Also, there is not much you can do from a callback at transaction end, as you
    cannot do catalog modifications.
    
    I was thinking of REINDEX CONCURRENTLY CONTINUE or something vaguely similar.
     
    > > > > 2. no support for concurrent on system tables (not easy for shared
    > > > > catalogs)
    > > > 
    > > > Doesn't this exclude all the tables that are in the schema catalog?
    > > 
    > > No. Only SELECT array_to_string(array_agg(relname), ', ') FROM pg_class
    > > WHERE relisshared AND relkind = 'r';
    > > their toast tables and their indexes are shared. The problem is that for
    > > those you cannot create a separate index and let it update concurrently
    > > because you cannot write into each databases pg_class/pg_index.
    
    > Yes indeed, I didn't think about things that are shared among databases.
    > Blocking that is pretty simple, only a matter of places checked.
    
    It's just a bit sad to make the thing not really appear lockless ;)
    
    
    Greetings,
    
    Andres
    -- 
     Andres Freund	                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
    
  36. Re: pg_reorg in core?

    Michael Paquier <michael.paquier@gmail.com> — 2012-09-26T12:39:36Z

    On Wed, Sep 26, 2012 at 8:13 PM, Andres Freund <andres@2ndquadrant.com> wrote:
    
    > On Tuesday, September 25, 2012 01:48:34 PM Michael Paquier wrote:
    > > On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund <andres@2ndquadrant.com> wrote:
    > > > On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
    > > > > On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund <andres@2ndquadrant.com> wrote:
    > > > > Could you clarify what do you mean here by cleanup?
    > > > > I am afraid I do not get your point here.
    > > >
    > > > Sorry, was a bit tired when writing the above.
    > > >
    > > > The point is that to work concurrent the CONCURRENT operations
    > > > commit/start multiple transactions internally. It can be interrupted
    > > > (user, shutdown, error, crash) and leave transient state behind every
    > > > time it does so. What I wanted to say is that we need to take care
    > > > that each of those can easily be cleaned up afterwards.
    > >
    > > Sure, many errors may happen.
    > > But, in the case of CREATE INDEX CONCURRENTLY, there is no clean up
    > > method implemented as far as I know (might be missing something though).
    > > Isn't an index only considered as invalid in case of failure for
    > > concurrent creation?
    > Well, you can DROP or REINDEX the invalid index.
    >
    > There are several scenarios where you can get invalid indexes. Unique
    > violations, postgres restarts, aborted index creation...
    >
    > > In the case of REINDEX it would be essential to create such a cleanup
    > > mechanism as I cannot imagine a production database with an index that
    > > has been marked as invalid due to a concurrent reindex failure, by
    > > assuming here, of course, that REINDEX CONCURRENTLY would use the same
    > > level of process error as CREATE INDEX CONCURRENTLY.
    > Not sure what youre getting at?
    >
    I just meant that when CREATE INDEX CONCURRENTLY fails, the index created is
    marked as invalid, so it cannot be used by the planner.
    
    Based on what you described before:
    1) build new index with indisready = false
       newindex.indisready = true
       wait
    2) newindex.indisvalid = true
       wait
    3) swap(oldindex.relfilenode, newindex.relfilenode)
       oldindex.indisvalid = false
       wait
    4) oldindex.indisready = false
       wait
       drop new index with old relfilenode
    
    If the reindex fails at step 1 or 2, the new index is not usable, so the
    relation ends up with an index that is not valid. If it fails at step 4,
    the old index is invalid. If it fails at step 3, both indexes are valid and
    both are usable for the given relation.
    Do you think it is acceptable to require the user to clean up the old or
    new index himself if there is a failure?
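    
    The four steps above can be sketched as a small state model. This is only
    an illustration in Python, not PostgreSQL code: the IndexEntry class, the
    run_until_failure and usable helpers, and the relfilenode values are made
    up for the sketch, and it assumes that a failing step leaves behind only
    the flag changes committed by the steps before it.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Toy stand-in for the pg_index flags discussed in the thread."""
    relfilenode: int
    indisready: bool
    indisvalid: bool


def run_until_failure(fail_at=None):
    """Apply the four steps in order, stopping when step `fail_at` fails.

    Assumption of this sketch: a failing step leaves no flag change of its
    own behind, while changes committed by earlier steps persist.
    """
    old = IndexEntry(relfilenode=100, indisready=True, indisvalid=True)
    # Assumption: the new catalog entry exists before the build completes.
    new = IndexEntry(relfilenode=200, indisready=False, indisvalid=False)

    def step1():
        new.indisready = True

    def step2():
        new.indisvalid = True

    def step3():
        old.relfilenode, new.relfilenode = new.relfilenode, old.relfilenode
        old.indisvalid = False

    def step4():
        old.indisready = False

    for number, step in enumerate((step1, step2, step3, step4), start=1):
        if number == fail_at:
            break  # simulated failure: this step and later ones never run
        step()
    return old, new


def usable(entry):
    """In this sketch an index is usable only when both flags are set."""
    return entry.indisready and entry.indisvalid
```

    Calling run_until_failure(fail_at=3) returns two entries that are both
    ready and valid, which matches the "both indexes are valid" case; with
    fail_at=4 the old entry comes back with indisvalid = false.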
    
    
    > > One of the possible cleanup mechanisms I got on top of my head is a
    > > callback at transaction abort, each callback would need to be different
    > > for each subtransaction used at during the concurrent operation.
    > > In case the callback itself fails, well the old and/or new indexes become
    > > invalid.
    > Thats not going to work. E.g. the session might have been aborted or such.
    > Also, there is not much you can do from an callback at transaction end as
    > you cannot do catalog modifications.
    >
    > I was thinking of REINDEX CONCURRENTLY CONTINUE or something vaguely
    > similar.
    >
    You could also reissue the reindex command and avoid an additional command.
    When launching a concurrent reindex, it would be possible to check whether
    an index has already been created to replace the old one by a previous,
    failed attempt. To track that, why not add an additional field in pg_index?
    When creating a new index concurrently, we register in its pg_index entry
    the oid of the index that it has to replace. When reissuing the command
    after a failure, it is then possible to check whether such an index was
    left by a previous REINDEX CONCURRENTLY command and, based on the flag
    values of the old and new indexes, to replay the command from the step
    where it previously failed.
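    
    A rough sketch of that replay logic, assuming each of the four steps
    commits as a unit. The indreplaces field is a hypothetical name for the
    proposed pg_index column, and IndexRow, find_replacement and resume_step
    are illustrative helpers, not existing PostgreSQL functions.

```python
from dataclasses import dataclass
from typing import List, Optional

FINAL_DROP = 5  # all four flag changes are done; only the drop is left


@dataclass
class IndexRow:
    oid: int
    indisready: bool
    indisvalid: bool
    # Hypothetical column: oid of the index this one is meant to replace.
    indreplaces: Optional[int] = None


def find_replacement(catalog: List[IndexRow], old_oid: int) -> Optional[IndexRow]:
    """Look for an index left behind by an earlier attempt on old_oid."""
    for row in catalog:
        if row.indreplaces == old_oid:
            return row
    return None


def resume_step(old: IndexRow, new: Optional[IndexRow]) -> int:
    """Infer from the flags which step to run next (1-4, or FINAL_DROP)."""
    if new is None or not new.indisready:
        return 1  # nothing usable from a previous run: start over
    if not new.indisvalid:
        return 2  # step 1 completed, validation still pending
    if old.indisvalid:
        return 3  # both valid: the swap has not been committed yet
    if old.indisready:
        return 4  # swap done, old index still marked ready
    return FINAL_DROP
```

    For example, if the previous run left a new index that is ready but not
    valid, resume_step returns 2 and the command would continue from the
    validation step instead of rebuilding from scratch.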
    -- 
    Michael Paquier
    http://michael.otacoo.com
    
  37. Re: pg_reorg in core?

    Andres Freund <andres@2ndquadrant.com> — 2012-09-26T13:21:42Z

    On Wednesday, September 26, 2012 02:39:36 PM Michael Paquier wrote:
    > Do you think it is acceptable to consider that the user has to do the
    > cleanup of the old or new index himself if there is a failure?
    The problem I see is that if you want the thing to be efficient you might end up 
    doing step 1) for all/a bunch of indexes, then 2), then .... In that case you 
    can have loads of invalid indexes around. 
    
    > You could also reissue the reindex command and avoid an additional command.
    > When launching a concurrent reindex, it could be possible to check if there
    > is already an index that has been created to replace the old one that failed
    > previously. In order to control that, why not adding an additional field in
    > pg_index?
    > When creating a new index concurrently, we register in its pg_index entry
    > the oid of the index that it has to replace. When reissuing the command
    > after a failure, it is then possible to check if there is already an index
    > that has been issued by a previous REINDEX CONCURRENT command and based on
    > the flag values of the old and new indexes it is then possible to replay the
    > command from the step where it previously failed.
    I don't really like this idea, but we might end up there anyway because we
    probably need to keep track of whether an index is actually only a
    "replacement" index that shouldn't exist on its own. Otherwise it's hard to
    know which indexes to drop if it failed halfway through.
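    
    To illustrate why such a marker helps: with it, leftovers of a failed run
    can be told apart from ordinary invalid indexes. A minimal sketch, where
    indreplaces is a hypothetical field name and IndexRow and
    leftover_replacements are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexRow:
    oid: int
    indisvalid: bool
    # Hypothetical marker: oid of the index this one was built to replace,
    # or None for an ordinary index.
    indreplaces: Optional[int] = None


def leftover_replacements(catalog: List[IndexRow]) -> List[int]:
    """Return oids of replacement-only indexes that still exist."""
    return [row.oid for row in catalog if row.indreplaces is not None]


catalog = [
    IndexRow(oid=1, indisvalid=True),                  # index being rebuilt
    IndexRow(oid=2, indisvalid=False, indreplaces=1),  # leftover of a failed run
    IndexRow(oid=3, indisvalid=False),                 # ordinary invalid index
]
candidates = leftover_replacements(catalog)
```

    Here only the index with oid 2 is reported as a drop candidate; the
    invalid index with oid 3 is left alone because nothing marks it as a
    replacement.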
    
    Greetings,
    
    Andres
    -- 
    Andres Freund		http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
    
    
    
  38. Re: pg_reorg in core?

    Bruce Momjian <bruce@momjian.us> — 2012-09-26T19:29:38Z

    On Mon, Sep 24, 2012 at 03:55:35PM -0700, Josh Berkus wrote:
    > On 9/24/12 3:43 PM, Simon Riggs wrote:
    > > On 24 September 2012 17:36, Josh Berkus <josh@agliodbs.com> wrote:
    > >>
    > >>>> For me, the Postgres user interface should include
    > >>>> * REINDEX CONCURRENTLY
    > >>
    > >> I don't see why we don't have REINDEX CONCURRENTLY now.
    > > 
    > > Same reason for everything on (anyone's) TODO list.
    > 
    > Yes, I'm just pointing out that it would be a very small patch for
    > someone, and that AFAIK it didn't make it on the TODO list yet.
    
    I see it on the TODO list, and it has been there for years:
    
    	https://wiki.postgresql.org/wiki/Todo#Indexes
    	Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY 
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +