Thread

Commits

  1. Fix failures in incremental_sort due to number of workers

  2. In jsonb_plpython.c, suppress warning message from gcc 10.

  3. Fix minor problems with non-exclusive backup cleanup.

  1. WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-08-21T13:47:04Z

    Hi Hackers,
    
    I have been looking into adding parallel backup feature in pg_basebackup.
    Currently pg_basebackup sends BASE_BACKUP command for taking full backup,
    server scans the PGDATA and sends the files to pg_basebackup. In general,
    server takes the following steps on BASE_BACKUP command:
    
    - do pg_start_backup
    - scans PGDATA, creates and send header containing information of
    tablespaces.
    - sends each tablespace to pg_basebackup.
    - and then do pg_stop_backup
    
    All these steps are executed sequentially by a single process. The idea I
    am working on is to separate these steps into multiple commands in the
    replication grammar, and to add worker processes to pg_basebackup so that
    they can copy the contents of PGDATA in parallel.
    
    The command line interface syntax would be like:
    pg_basebackup --jobs=WORKERS
    
    
    Replication commands:
    
    - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
    If the parallel option is there, then it will only do pg_start_backup,
    scans PGDATA and sends a list of file names.
    
    - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
    pg_basebackup will then send back a list of filenames in this command. This
    command will be sent by each worker, and that worker will receive the
    said files.
    
    - STOP_BACKUP
    When all workers have finished, pg_basebackup will send the STOP_BACKUP command.
    
    The pg_basebackup can start by sending "BASE_BACKUP PARALLEL" command and
    getting a list of filenames from the server in response. It should then
    divide this list as per --jobs parameter. (This division can be based on
    file sizes). Each of the worker process will issue a SEND_FILES_CONTENTS
    (file1, file2,...) command. In response, the server will send the files
    mentioned in the list back to the requesting worker process.
    
    Once all the files are copied, pg_basebackup will send the STOP_BACKUP
    command. A similar idea has been discussed by Robert on the incremental
    backup thread a while ago. This is similar to that, but instead of
    START_BACKUP and SEND_FILE_LIST, I have combined them into BASE_BACKUP
    PARALLEL.
    
    I have done a basic proof of concept (POC), which is also attached. I
    would appreciate some input on this. So far, I am simply dividing the list
    equally and assigning them to worker processes. I intend to fine tune this
    by taking into consideration file sizes. Further to add tar format support,
    I am considering that each worker process, processes all files belonging to
    a tablespace in its list (i.e. creates and copies tar file), before it
    processes the next tablespace. As a result, this will create tar files that
    are disjoint with respect to tablespace data. For example:
    
    Say, tablespace t1 has 20 files and we have 5 worker processes and
    tablespace t2 has 10. Ignoring all other factors for the sake of this
    example, each worker process will get a group of 4 files of t1 and 2 files
    of t2. Each process will create 2 tar files, one for t1 containing 4 files
    and another for t2 containing 2 files.
    
    
    Regards,
    Asif
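
The equal division described above can be sketched as follows. This is only an illustration of the arithmetic in the t1/t2 example, not the code in the attached POC; the file names and the helper names `split_evenly` and `assign_per_tablespace` are hypothetical.

```python
from collections import defaultdict


def split_evenly(files, jobs):
    """Split a list into `jobs` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(files), jobs)
    chunks, start = [], 0
    for i in range(jobs):
        end = start + base + (1 if i < extra else 0)
        chunks.append(files[start:end])
        start = end
    return chunks


def assign_per_tablespace(files_by_tablespace, jobs):
    """Return, for each worker, a dict mapping tablespace -> files it copies."""
    plan = [defaultdict(list) for _ in range(jobs)]
    for tablespace, files in files_by_tablespace.items():
        for worker, chunk in enumerate(split_evenly(files, jobs)):
            if chunk:
                plan[worker][tablespace].extend(chunk)
    return plan


# Hypothetical file names, matching the t1/t2 example in the message.
files_by_tablespace = {
    "t1": [f"t1/file{i}" for i in range(20)],
    "t2": [f"t2/file{i}" for i in range(10)],
}
plan = assign_per_tablespace(files_by_tablespace, jobs=5)
for worker, per_ts in enumerate(plan):
    print(worker, {ts: len(names) for ts, names in per_ts.items()})
```

With 5 workers, every worker ends up with 4 files of t1 and 2 files of t2, so in tar format each worker would produce one tar file per tablespace.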
    
  2. Re: WIP/PoC for parallel backup

    P <apraveen@pivotal.io> — 2019-08-23T10:17:51Z

    Hi Asif
    
    Interesting proposal.  Bulk of the work in a backup is transferring files
    from source data directory to destination.  Your patch is breaking this
    task down in multiple sets of files and transferring each set in parallel.
    This seems correct, however, your patch is also creating a new process to
    handle each set.  Is that necessary?  I think we should try to achieve this
    using multiple asynchronous libpq connections from a single basebackup
    process.  That is to use PQconnectStartParams() interface instead of
    PQconnectdbParams(), which is currently used by basebackup.  On the server
    side, it may still result in multiple backend processes per connection, and
    an attempt should be made to avoid that as well, but it seems complicated.
    
    What do you think?
    
    Asim
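
The single-process alternative suggested above can be illustrated with a small event-loop sketch. It does not use libpq: `fetch_file_set` is a hypothetical stand-in for one non-blocking connection, and `asyncio.sleep(0)` takes the place of waiting on a socket. It only shows one process interleaving several transfers without spawning a process per file set.

```python
import asyncio


async def fetch_file_set(conn_id, files, received):
    """Stand-in for one non-blocking connection receiving its set of files."""
    for name in files:
        await asyncio.sleep(0)  # yield to the event loop, as a socket wait would
        received.append((conn_id, name))


async def run_backup(file_sets):
    """Drive all connections concurrently from a single process."""
    received = []
    await asyncio.gather(
        *(fetch_file_set(i, files, received) for i, files in enumerate(file_sets))
    )
    return received


# Hypothetical file sets, one per connection.
file_sets = [["a", "b"], ["c", "d"], ["e"]]
received = asyncio.run(run_backup(file_sets))
print(sorted(name for _, name in received))
```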
    
  3. Re: WIP/PoC for parallel backup

    Ibrar Ahmed <ibrar.ahmad@gmail.com> — 2019-08-23T13:03:10Z

    On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
    
    > Hi Asif
    >
    > Interesting proposal.  Bulk of the work in a backup is transferring files
    > from source data directory to destination.  Your patch is breaking this
    > task down in multiple sets of files and transferring each set in parallel.
    > This seems correct, however, your patch is also creating a new process to
    > handle each set.  Is that necessary?  I think we should try to achieve this
    > using multiple asynchronous libpq connections from a single basebackup
    > process.  That is to use PQconnectStartParams() interface instead of
    > PQconnectdbParams(), wich is currently used by basebackup.  On the server
    > side, it may still result in multiple backend processes per connection, and
    > an attempt should be made to avoid that as well, but it seems complicated.
    >
    > What do you think?
    >
    
    The main question is what we really want to solve here. What is the
    bottleneck, and which hardware do we want to saturate? I ask because
    several hardware resources are involved in taking a backup (network, CPU,
    disk). If we have already saturated the disk, there is no need to add
    parallelism, because we will be blocked on disk I/O anyway. I implemented
    parallel backup in a separate application and got wonderful results. I just
    skimmed through the code and have some reservations: creating a separate
    process only for copying data is overkill. There are two options:
    non-blocking calls, or some worker threads. But before doing that, we need
    to find the pg_basebackup bottleneck; after that, we can see what the best
    way to solve it is. Some numbers may help in understanding the actual
    benefit.
    
    
    -- 
    Ibrar Ahmed
    
  4. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-08-23T16:04:07Z

    On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
    
    > Hi Asif
    >
    > Interesting proposal.  Bulk of the work in a backup is transferring files
    > from source data directory to destination.  Your patch is breaking this
    > task down in multiple sets of files and transferring each set in parallel.
    > This seems correct, however, your patch is also creating a new process to
    > handle each set.  Is that necessary?  I think we should try to achieve this
    > using multiple asynchronous libpq connections from a single basebackup
    > process.  That is to use PQconnectStartParams() interface instead of
    > PQconnectdbParams(), wich is currently used by basebackup.  On the server
    > side, it may still result in multiple backend processes per connection, and
    > an attempt should be made to avoid that as well, but it seems complicated.
    >
    > What do you think?
    >
    > Asim
    >
    
    Thanks Asim for the feedback. This is a good suggestion. The main idea I
    wanted to discuss is the design where we can open multiple backend
    connections to get the data instead of a single connection.
    On the client side we can take multiple approaches. One is to use the
    asynchronous APIs (as you suggested); the other is to choose between
    multi-process and multi-thread. The main point is that we can extract a
    lot of performance benefit by using multiple connections, and I built
    this POC to float the idea of how parallel backup can work, since the
    core logic of getting the files using multiple connections will remain the
    same whether we use asynchronous, multi-process or multi-threaded clients.
    
    I am going to address the division of files so that they are distributed
    evenly among the workers based on file sizes. That would give us some
    concrete numbers and also allow us to gauge the benefits of the async
    versus multi-process/multi-thread approaches on the client side.
    
    Regards,
    Asif
    
  5. Re: WIP/PoC for parallel backup

    Stephen Frost <sfrost@snowman.net> — 2019-08-23T17:26:38Z

    Greetings,
    
    * Asif Rehman (asifr.rehman@gmail.com) wrote:
    > On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
    > > Interesting proposal.  Bulk of the work in a backup is transferring files
    > > from source data directory to destination.  Your patch is breaking this
    > > task down in multiple sets of files and transferring each set in parallel.
    > > This seems correct, however, your patch is also creating a new process to
    > > handle each set.  Is that necessary?  I think we should try to achieve this
    > > using multiple asynchronous libpq connections from a single basebackup
    > > process.  That is to use PQconnectStartParams() interface instead of
    > > PQconnectdbParams(), wich is currently used by basebackup.  On the server
    > > side, it may still result in multiple backend processes per connection, and
    > > an attempt should be made to avoid that as well, but it seems complicated.
    > 
    > Thanks Asim for the feedback. This is a good suggestion. The main idea I
    > wanted to discuss is the design where we can open multiple backend
    > connections to get the data instead of a single connection.
    > On the client side we can have multiple approaches, One is to use
    > asynchronous APIs ( as suggested by you) and other could be to decide
    > between multi-process and multi-thread. The main point was we can extract
    > lot of performance benefit by using the multiple connections and I built
    > this POC to float the idea of how the parallel backup can work, since the
    > core logic of getting the files using multiple connections will remain the
    > same, wether we use asynchronous, multi-process or multi-threaded.
    > 
    > I am going to address the division of files to be distributed evenly among
    > multiple workers based on file sizes, that would allow to get some concrete
    > numbers as well as it will also us to gauge some benefits between async and
    > multiprocess/thread approach on client side.
    
    I would expect you to quickly want to support compression on the server
    side, before the data is sent across the network, and possibly
    encryption, and so it'd likely make sense to just have independent
    processes and connections through which to do that.
    
    Thanks,
    
    Stephen
    
  6. Re: WIP/PoC for parallel backup

    Ibrar Ahmed <ibrar.ahmad@gmail.com> — 2019-08-23T17:50:09Z

    On Fri, Aug 23, 2019 at 10:26 PM Stephen Frost <sfrost@snowman.net> wrote:
    
    > Greetings,
    >
    > * Asif Rehman (asifr.rehman@gmail.com) wrote:
    > > On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
    > > > Interesting proposal.  Bulk of the work in a backup is transferring
    > files
    > > > from source data directory to destination.  Your patch is breaking this
    > > > task down in multiple sets of files and transferring each set in
    > parallel.
    > > > This seems correct, however, your patch is also creating a new process
    > to
    > > > handle each set.  Is that necessary?  I think we should try to achieve
    > this
    > > > using multiple asynchronous libpq connections from a single basebackup
    > > > process.  That is to use PQconnectStartParams() interface instead of
    > > > PQconnectdbParams(), wich is currently used by basebackup.  On the
    > server
    > > > side, it may still result in multiple backend processes per
    > connection, and
    > > > an attempt should be made to avoid that as well, but it seems
    > complicated.
    > >
    > > Thanks Asim for the feedback. This is a good suggestion. The main idea I
    > > wanted to discuss is the design where we can open multiple backend
    > > connections to get the data instead of a single connection.
    > > On the client side we can have multiple approaches, One is to use
    > > asynchronous APIs ( as suggested by you) and other could be to decide
    > > between multi-process and multi-thread. The main point was we can extract
    > > lot of performance benefit by using the multiple connections and I built
    > > this POC to float the idea of how the parallel backup can work, since the
    > > core logic of getting the files using multiple connections will remain
    > the
    > > same, wether we use asynchronous, multi-process or multi-threaded.
    > >
    > > I am going to address the division of files to be distributed evenly
    > among
    > > multiple workers based on file sizes, that would allow to get some
    > concrete
    > > numbers as well as it will also us to gauge some benefits between async
    > and
    > > multiprocess/thread approach on client side.
    >
    > I would expect you to quickly want to support compression on the server
    > side, before the data is sent across the network, and possibly
    > encryption, and so it'd likely make sense to just have independent
    > processes and connections through which to do that.
    >
    
    +1 for compression and encryption, but I think parallelism will give us
    a benefit with or without compression.
    
    > Thanks,
    >
    > Stephen
    >
    
    
    -- 
    Ibrar Ahmed
    
  7. Re: WIP/PoC for parallel backup

    Ahsan Hadi <ahsan.hadi@gmail.com> — 2019-08-23T19:15:32Z

    On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
    
    > Greetings,
    >
    > * Asif Rehman (asifr.rehman@gmail.com) wrote:
    > > On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
    > > > Interesting proposal.  Bulk of the work in a backup is transferring
    > files
    > > > from source data directory to destination.  Your patch is breaking this
    > > > task down in multiple sets of files and transferring each set in
    > parallel.
    > > > This seems correct, however, your patch is also creating a new process
    > to
    > > > handle each set.  Is that necessary?  I think we should try to achieve
    > this
    > > > using multiple asynchronous libpq connections from a single basebackup
    > > > process.  That is to use PQconnectStartParams() interface instead of
    > > > PQconnectdbParams(), wich is currently used by basebackup.  On the
    > server
    > > > side, it may still result in multiple backend processes per
    > connection, and
    > > > an attempt should be made to avoid that as well, but it seems
    > complicated.
    > >
    > > Thanks Asim for the feedback. This is a good suggestion. The main idea I
    > > wanted to discuss is the design where we can open multiple backend
    > > connections to get the data instead of a single connection.
    > > On the client side we can have multiple approaches, One is to use
    > > asynchronous APIs ( as suggested by you) and other could be to decide
    > > between multi-process and multi-thread. The main point was we can extract
    > > lot of performance benefit by using the multiple connections and I built
    > > this POC to float the idea of how the parallel backup can work, since the
    > > core logic of getting the files using multiple connections will remain
    > the
    > > same, wether we use asynchronous, multi-process or multi-threaded.
    > >
    > > I am going to address the division of files to be distributed evenly
    > among
    > > multiple workers based on file sizes, that would allow to get some
    > concrete
    > > numbers as well as it will also us to gauge some benefits between async
    > and
    > > multiprocess/thread approach on client side.
    >
    > I would expect you to quickly want to support compression on the server
    > side, before the data is sent across the network, and possibly
    > encryption, and so it'd likely make sense to just have independent
    > processes and connections through which to do that.
    
    
    It would be interesting to see the benefits of compression (before the data
    is transferred over the network) on top of parallelism, since there is also
    some overhead associated with performing the compression. I agree with your
    suggestion of trying to add parallelism first and then try compression
    before the data is sent across the network.
    
    
    >
    > Thanks,
    >
    > Stephen
    >
    
  8. Re: WIP/PoC for parallel backup

    Stephen Frost <sfrost@snowman.net> — 2019-08-23T19:42:54Z

    Greetings,
    
    * Ahsan Hadi (ahsan.hadi@gmail.com) wrote:
    > On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
    > > I would expect you to quickly want to support compression on the server
    > > side, before the data is sent across the network, and possibly
    > > encryption, and so it'd likely make sense to just have independent
    > > processes and connections through which to do that.
    > 
    > It would be interesting to see the benefits of compression (before the data
    > is transferred over the network) on top of parallelism. Since there is also
    > some overhead associated with performing the compression. I agree with your
    > suggestion of trying to add parallelism first and then try compression
    > before the data is sent across the network.
    
    You're welcome to take a look at pgbackrest for insight and to play with
    regarding compression-before-transfer, how best to split up the files
    and order them, encryption, et al.  We've put quite a bit of effort into
    figuring all of that out.
    
    Thanks!
    
    Stephen
    
  9. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2019-09-24T17:53:03Z

    On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
    > If the parallel option is there, then it will only do pg_start_backup, scans PGDATA and sends a list of file names.
    
    So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
    tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
    list of file names. I don't think that's a good approach. It's too
    confusing to have one replication command that returns totally
    different things depending on whether some option is given.
    
    > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
    > pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
    
    Seems reasonable, but I think you should just pass one file name and
    use the command multiple times, once per file.
    
    > - STOP_BACKUP
    > when all workers finish then, pg_basebackup will send STOP_BACKUP command.
    
    This also seems reasonable, but surely the matching command should
    then be called START_BACKUP, not BASE_BACKUP PARALLEL.
    
    > I have done a basic proof of concenpt (POC), which is also attached. I would appreciate some input on this. So far, I am simply dividing the list equally and assigning them to worker processes. I intend to fine tune this by taking into consideration file sizes. Further to add tar format support, I am considering that each worker process, processes all files belonging to a tablespace in its list (i.e. creates and copies tar file), before it processes the next tablespace. As a result, this will create tar files that are disjointed with respect tablespace data. For example:
    
    Instead of doing this, I suggest that you should just maintain a list
    of all the files that need to be fetched and have each worker pull a
    file from the head of the list and fetch it when it finishes receiving
    the previous file.  That way, if some connections go faster or slower
    than others, the distribution of work ends up fairly even.  If you
    instead pre-distribute the work, you're guessing what's going to
    happen in the future instead of just waiting to see what actually does
    happen. Guessing isn't intrinsically bad, but guessing when you could
    be sure of doing the right thing *is* bad.
    
    If you want to be really fancy, you could start by sorting the files
    in descending order of size, so that big files are fetched before
    small ones.  Since the largest possible file is 1GB and any database
    where this feature is important is probably hundreds or thousands of
    GB, this may not be very important. I suggest not worrying about it
    for v1.
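
The shared-list scheduling suggested above can be sketched like this. It is a minimal illustration under stated assumptions: threads stand in for the workers, `fetch` is a hypothetical callback standing in for the real transfer, and the file names and sizes are made up.

```python
import queue
import threading


def run_workers(files_with_sizes, jobs, fetch):
    """Workers pull the next file from a shared queue as soon as they are free."""
    pending = queue.Queue()
    # Optional refinement from the message: largest files first.
    for name, size in sorted(files_with_sizes, key=lambda f: f[1], reverse=True):
        pending.put((name, size))

    fetched = [[] for _ in range(jobs)]

    def worker(worker_id):
        while True:
            try:
                name, size = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to fetch
            fetch(name, size)
            fetched[worker_id].append(name)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fetched


# Hypothetical file list; fetch is a no-op standing in for the real transfer.
files = [("base/1", 10), ("base/2", 1024), ("base/3", 512), ("base/4", 1)]
fetched = run_workers(files, jobs=2, fetch=lambda name, size: None)
all_fetched = sorted(name for per_worker in fetched for name in per_worker)
print(all_fetched)
```

Which worker fetches which file depends on timing, which is the point: a slow connection simply takes fewer files, and nothing has to be predicted up front.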
    
    > Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace t2 has 10. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.
    
    This is one of several possible approaches. If we're doing a
    plain-format backup in parallel, we can just write each file where it
    needs to go and call it good. But, with a tar-format backup, what
    should we do? I can see three options:
    
    1. Error! Tar format parallel backups are not supported.
    
    2. Write multiple tar files. The user might reasonably expect that
    they're going to end up with the same files at the end of the backup
    regardless of whether they do it in parallel. A user with this
    expectation will be disappointed.
    
    3. Write one tar file. In this design, the workers have to take turns
    writing to the tar file, so you need some synchronization around that.
    Perhaps you'd have N threads that read and buffer a file, and N+1
    buffers.  Then you have one additional thread that reads the complete
    files from the buffers and writes them to the tar file. There's
    obviously some possibility that the writer won't be able to keep up
    and writing the backup will therefore be slower than it would be with
    approach (2).
    
    There's probably also a possibility that approach (2) would thrash the
    disk head back and forth between multiple files that are all being
    written at the same time, and approach (3) will therefore win by not
    thrashing the disk head. But, since spinning media are becoming less
    and less popular and are likely to have multiple disk heads under the
    hood when they are used, this is probably not too likely.
    
    I think your choice to go with approach (2) is probably reasonable,
    but I'm not sure whether everyone will agree.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
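
Approach (3) above, several readers feeding one tar writer, might look roughly like the following. This is a sketch under assumptions, not a proposal for pg_basebackup itself: the "files" are in-memory byte strings, the bounded queue of N+1 completed buffers is one reading of the "N+1 buffers" remark, and `write_single_tar` is a hypothetical name.

```python
import io
import queue
import tarfile
import threading


def write_single_tar(file_contents, jobs):
    """N readers buffer whole files; one writer appends them to a single tar."""
    todo = queue.Queue()
    for item in file_contents.items():
        todo.put(item)
    # Bounded queue of completed buffers: readers block when the writer lags.
    done = queue.Queue(maxsize=jobs + 1)

    def reader():
        while True:
            try:
                name, data = todo.get_nowait()
            except queue.Empty:
                break
            done.put((name, bytes(data)))  # stand-in for receiving the file
        done.put(None)  # tell the writer this reader is finished

    out = io.BytesIO()

    def writer():
        finished = 0
        with tarfile.open(fileobj=out, mode="w") as tar:
            while finished < jobs:
                item = done.get()
                if item is None:
                    finished += 1
                    continue
                name, data = item
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

    threads = [threading.Thread(target=reader) for _ in range(jobs)]
    threads.append(threading.Thread(target=writer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out.getvalue()


# Hypothetical in-memory "files" standing in for data received from the server.
contents = {"base/1": b"aaaa", "base/2": b"bb", "base/3": b"c"}
archive = write_single_tar(contents, jobs=2)
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    names = sorted(tar.getnames())
print(names)
```

Only the writer thread touches the tar file, so no locking is needed around it; the cost is that a slow writer stalls every reader once the buffer queue fills.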
    
    
    
    
  10. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-09-27T16:00:01Z

    Hi Robert,
    
    Thanks for the feedback. Please see the comments below:
    
    On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    > > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
    > > If the parallel option is there, then it will only do pg_start_backup,
    > scans PGDATA and sends a list of file names.
    >
    > So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
    > tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
    > list of file names. I don't think that's a good approach. It's too
    > confusing to have one replication command that returns totally
    > different things depending on whether some option is given.
    >
    
    Sure. I will add a separate command (START_BACKUP)  for parallel.
    
    
    > > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
    > list.
    > > pg_basebackup will then send back a list of filenames in this command.
    > This commands will be send by each worker and that worker will be getting
    > the said files.
    >
    > Seems reasonable, but I think you should just pass one file name and
    > use the command multiple times, once per file.
    >
    
    I considered this approach initially,  however, I adopted the current
    strategy to avoid multiple round trips between the server and clients and
    save on query processing time by issuing a single command rather than
    multiple ones. Further, fetching multiple files at once will also aid in
    supporting the tar format by utilising the existing ReceiveTarFile()
    function, which will be able to create a tarball per tablespace per worker.
    
    
    >
    > > - STOP_BACKUP
    > > when all workers finish then, pg_basebackup will send STOP_BACKUP
    > command.
    >
    > This also seems reasonable, but surely the matching command should
    > then be called START_BACKUP, not BASEBACKUP PARALLEL.
    >
    > > I have done a basic proof of concenpt (POC), which is also attached. I
    > would appreciate some input on this. So far, I am simply dividing the list
    > equally and assigning them to worker processes. I intend to fine tune this
    > by taking into consideration file sizes. Further to add tar format support,
    > I am considering that each worker process, processes all files belonging to
    > a tablespace in its list (i.e. creates and copies tar file), before it
    > processes the next tablespace. As a result, this will create tar files that
    > are disjointed with respect tablespace data. For example:
    >
    > Instead of doing this, I suggest that you should just maintain a list
    > of all the files that need to be fetched and have each worker pull a
    > file from the head of the list and fetch it when it finishes receiving
    > the previous file.  That way, if some connections go faster or slower
    > than others, the distribution of work ends up fairly even.  If you
    > instead pre-distribute the work, you're guessing what's going to
    > happen in the future instead of just waiting to see what actually does
    > happen. Guessing isn't intrinsically bad, but guessing when you could
    > be sure of doing the right thing *is* bad.
    >
    > If you want to be really fancy, you could start by sorting the files
    > in descending order of size, so that big files are fetched before
    > small ones.  Since the largest possible file is 1GB and any database
    > where this feature is important is probably hundreds or thousands of
    > GB, this may not be very important. I suggest not worrying about it
    > for v1.
    >
    
    Ideally, I would like to support the tar format as well, which would be
    much easier to implement when fetching multiple files at once, since that
    would allow the existing functionality to be used without much change.
    
    Your idea of sorting the files in descending order of size seems very
    appealing. I think we can do this and have the files divided among the
    workers one by one, i.e. the first file in the list goes to worker 1, the
    second to worker 2, and so on.
    
    
    >
    > > Say, tablespace t1 has 20 files and we have 5 worker processes and
    > tablespace t2 has 10. Ignoring all other factors for the sake of this
    > example, each worker process will get a group of 4 files of t1 and 2 files
    > of t2. Each process will create 2 tar files, one for t1 containing 4 files
    > and another for t2 containing 2 files.
    >
    > This is one of several possible approaches. If we're doing a
    > plain-format backup in parallel, we can just write each file where it
    > needs to go and call it good. But, with a tar-format backup, what
    > should we do? I can see three options:
    >
    > 1. Error! Tar format parallel backups are not supported.
    >
    > 2. Write multiple tar files. The user might reasonably expect that
    > they're going to end up with the same files at the end of the backup
    > regardless of whether they do it in parallel. A user with this
    > expectation will be disappointed.
    >
    > 3. Write one tar file. In this design, the workers have to take turns
    > writing to the tar file, so you need some synchronization around that.
    > Perhaps you'd have N threads that read and buffer a file, and N+1
    > buffers.  Then you have one additional thread that reads the complete
    > files from the buffers and writes them to the tar file. There's
    > obviously some possibility that the writer won't be able to keep up
    > and writing the backup will therefore be slower than it would be with
    > approach (2).
    >
    > There's probably also a possibility that approach (2) would thrash the
    > disk head back and forth between multiple files that are all being
    > written at the same time, and approach (3) will therefore win by not
    > thrashing the disk head. But, since spinning media are becoming less
    > and less popular and are likely to have multiple disk heads under the
    > hood when they are used, this is probably not too likely.
    >
    > I think your choice to go with approach (2) is probably reasonable,
    > but I'm not sure whether everyone will agree.
    >
    
    Yes, for tar format support, approach (2) is what I had in
    mind. Currently I'm working on the implementation and will share the patch
    in a couple of days.
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
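
The size-sorted, one-by-one distribution described in this reply can be sketched as below. The file names and sizes are hypothetical and `assign_round_robin` is an illustrative helper, not code from the patch.

```python
def assign_round_robin(files_with_sizes, jobs):
    """Sort by size (largest first) and deal files out to workers in turn."""
    ordered = sorted(files_with_sizes, key=lambda f: f[1], reverse=True)
    assignment = [[] for _ in range(jobs)]
    for index, (name, size) in enumerate(ordered):
        assignment[index % jobs].append((name, size))
    return assignment


# Hypothetical file names and sizes (in MB).
files = [("a", 1024), ("b", 900), ("c", 512), ("d", 300), ("e", 8), ("f", 1)]
assignment = assign_round_robin(files, jobs=2)
totals = [sum(size for _, size in worker) for worker in assignment]
print(assignment)
print(totals)
```

Here worker 1 receives a, c and e (1544 MB) and worker 2 receives b, d and f (1201 MB). This is still a static pre-distribution, so the totals can be uneven and do not adapt to connection speed, which is the trade-off against the shared list Robert suggested.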
    
  11. Re: WIP/PoC for parallel backup

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-10-03T11:47:31Z

    Hi  Asif,
    
    I was looking at the patch and tried compiling it. However, I got a few
    errors and warnings.
    
    Fixed those in the attached patch.
    
    On Fri, Sep 27, 2019 at 9:30 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    > Hi Robert,
    >
    > Thanks for the feedback. Please see the comments below:
    >
    > On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com>
    > wrote:
    >
    >> On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >> > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
    >> > If the parallel option is there, then it will only do pg_start_backup,
    >> scans PGDATA and sends a list of file names.
    >>
    >> So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
    >> tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
    >> list of file names. I don't think that's a good approach. It's too
    >> confusing to have one replication command that returns totally
    >> different things depending on whether some option is given.
    >>
    >
    > Sure. I will add a separate command (START_BACKUP)  for parallel.
    >
    >
    >> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
    >> list.
    >> > pg_basebackup will then send back a list of filenames in this command.
    >> This commands will be send by each worker and that worker will be getting
    >> the said files.
    >>
    >> Seems reasonable, but I think you should just pass one file name and
    >> use the command multiple times, once per file.
    >>
    >
    > I considered this approach initially,  however, I adopted the current
    > strategy to avoid multiple round trips between the server and clients and
    > save on query processing time by issuing a single command rather than
    > multiple ones. Further fetching multiple files at once will also aid in
    > supporting the tar format by utilising the existing ReceiveTarFile()
    > function and will be able to create a tarball for per tablespace per worker.
    >
    >
    >>
    >> > - STOP_BACKUP
    >> > when all workers finish then, pg_basebackup will send STOP_BACKUP
    >> command.
    >>
    >> This also seems reasonable, but surely the matching command should
    >> then be called START_BACKUP, not BASEBACKUP PARALLEL.
    >>
    >> > I have done a basic proof of concenpt (POC), which is also attached. I
    >> would appreciate some input on this. So far, I am simply dividing the list
    >> equally and assigning them to worker processes. I intend to fine tune this
    >> by taking into consideration file sizes. Further to add tar format support,
    >> I am considering that each worker process, processes all files belonging to
    >> a tablespace in its list (i.e. creates and copies tar file), before it
    >> processes the next tablespace. As a result, this will create tar files that
    >> are disjointed with respect tablespace data. For example:
    >>
    >> Instead of doing this, I suggest that you should just maintain a list
    >> of all the files that need to be fetched and have each worker pull a
    >> file from the head of the list and fetch it when it finishes receiving
    >> the previous file.  That way, if some connections go faster or slower
    >> than others, the distribution of work ends up fairly even.  If you
    >> instead pre-distribute the work, you're guessing what's going to
    >> happen in the future instead of just waiting to see what actually does
    >> happen. Guessing isn't intrinsically bad, but guessing when you could
    >> be sure of doing the right thing *is* bad.
    >>
    >> If you want to be really fancy, you could start by sorting the files
    >> in descending order of size, so that big files are fetched before
    >> small ones.  Since the largest possible file is 1GB and any database
    >> where this feature is important is probably hundreds or thousands of
    >> GB, this may not be very important. I suggest not worrying about it
    >> for v1.
    >>
    >
    > Ideally, I would like to support the tar format as well, which would be
    > much easier to implement when fetching multiple files at once since that
    > would enable using the existent functionality to be used without much
    > change.
    >
    > Your idea of sorting the files in descending order of size seems very
    > appealing. I think we can do this and have the file divided among the
    > workers one by one i.e. the first file in the list goes to worker 1, the
    > second to process 2, and so on and so forth.
    >
    >
    >>
    >> > Say, tablespace t1 has 20 files and we have 5 worker processes and
    >> tablespace t2 has 10. Ignoring all other factors for the sake of this
    >> example, each worker process will get a group of 4 files of t1 and 2 files
    >> of t2. Each process will create 2 tar files, one for t1 containing 4 files
    >> and another for t2 containing 2 files.
    >>
    >> This is one of several possible approaches. If we're doing a
    >> plain-format backup in parallel, we can just write each file where it
    >> needs to go and call it good. But, with a tar-format backup, what
    >> should we do? I can see three options:
    >>
    >> 1. Error! Tar format parallel backups are not supported.
    >>
    >> 2. Write multiple tar files. The user might reasonably expect that
    >> they're going to end up with the same files at the end of the backup
    >> regardless of whether they do it in parallel. A user with this
    >> expectation will be disappointed.
    >>
    >> 3. Write one tar file. In this design, the workers have to take turns
    >> writing to the tar file, so you need some synchronization around that.
    >> Perhaps you'd have N threads that read and buffer a file, and N+1
    >> buffers.  Then you have one additional thread that reads the complete
    >> files from the buffers and writes them to the tar file. There's
    >> obviously some possibility that the writer won't be able to keep up
    >> and writing the backup will therefore be slower than it would be with
    >> approach (2).
    >>
    >> There's probably also a possibility that approach (2) would thrash the
    >> disk head back and forth between multiple files that are all being
    >> written at the same time, and approach (3) will therefore win by not
    >> thrashing the disk head. But, since spinning media are becoming less
    >> and less popular and are likely to have multiple disk heads under the
    >> hood when they are used, this is probably not too likely.
    >>
    >> I think your choice to go with approach (2) is probably reasonable,
    >> but I'm not sure whether everyone will agree.
    >>
    >
    > Yes for the tar format support, approach (2) is what I had in
    > mind. Currently I'm working on the implementation and will share the patch
    > in a couple of days.
    >
    >
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    
    
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
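
Option 3 in the quoted discussion (several workers buffering whole files, one extra thread writing them into a single tar file) can be sketched as follows. This is only an illustration, not the patch's code: `fetch_file` is a hypothetical stand-in for receiving one file from the server, and the bounded queue of N+1 slots plays the role of the "N+1 buffers".

```python
import io
import queue
import tarfile
import threading


def fetch_file(name: str) -> bytes:
    """Hypothetical stand-in for receiving one file's contents from the server."""
    return ("contents of " + name).encode()


def write_single_tar(names, n_workers, out):
    """Fetch files with n_workers threads; one writer thread fills a single tar."""
    todo = queue.Queue()
    for name in names:
        todo.put(name)
    # At most n_workers + 1 fully buffered files may wait for the writer.
    done = queue.Queue(maxsize=n_workers + 1)

    def worker():
        while True:
            try:
                name = todo.get_nowait()
            except queue.Empty:
                break
            done.put((name, fetch_file(name)))
        done.put(None)  # tell the writer that this worker has finished

    def writer():
        finished = 0
        with tarfile.open(fileobj=out, mode="w") as tar:
            while finished < n_workers:
                item = done.get()
                if item is None:
                    finished += 1
                    continue
                name, data = item
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

    writer_thread = threading.Thread(target=writer)
    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    writer_thread.start()
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    writer_thread.join()
```

Only the writer thread touches the tar file, so no locking around the archive is needed; the cost, as noted in the quoted text, is that the single writer may become the bottleneck.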
    
  12. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2019-10-03T13:39:56Z

    On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    >> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
    >> > pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
    >>
    >> Seems reasonable, but I think you should just pass one file name and
    >> use the command multiple times, once per file.
    >
    > I considered this approach initially,  however, I adopted the current strategy to avoid multiple round trips between the server and clients and save on query processing time by issuing a single command rather than multiple ones. Further fetching multiple files at once will also aid in supporting the tar format by utilising the existing ReceiveTarFile() function and will be able to create a tarball for per tablespace per worker.
    
    I think that sending multiple filenames on a line could save some time
    when there are lots of very small files, because then the round-trip
    overhead could be significant.
    
    However, if you've got mostly big files, I think this is going to be a
    loser. It'll be fine if you're able to divide the work exactly evenly,
    but that's pretty hard to do, because some workers may succeed in
    copying the data faster than others for a variety of reasons: some
    data is in memory, some data has to be read from disk, different data
    may need to be read from different disks that run at different speeds,
    not all the network connections may run at the same speed. Remember
    that the backup's not done until the last worker finishes, and so
    there may well be a significant advantage in terms of overall speed in
    putting some energy into making sure that they finish as close to each
    other in time as possible.
    
    To put that another way, the first time all the workers except one get
    done while the last one still has 10GB of data to copy, somebody's
    going to be unhappy.
    
    > Ideally, I would like to support the tar format as well, which would be much easier to implement when fetching multiple files at once since that would enable using the existent functionality to be used without much change.
    
    I think we should just have the client generate the tarfile. It'll
    require duplicating some code, but it's not actually that much code or
    that complicated from what I can see.
    
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
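
The argument above, that pre-distributed work finishes only when the slowest worker does, can be illustrated with a small simulation. This is a sketch under stated assumptions: the function names, the file sizes and the per-worker speeds are all made up for the example, and real transfers are not this regular.

```python
import heapq


def static_makespan(sizes, speeds):
    """Pre-distribute files round-robin; return when the slowest worker ends."""
    totals = [0.0] * len(speeds)
    for i, size in enumerate(sizes):
        totals[i % len(speeds)] += size
    return max(total / speed for total, speed in zip(totals, speeds))


def dynamic_makespan(sizes, speeds):
    """Each worker takes the next file from a shared list as soon as it is idle."""
    # Heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for size in sizes:
        now, w = heapq.heappop(free_at)
        end = now + size / speeds[w]
        finish = max(finish, end)
        heapq.heappush(free_at, (end, w))
    return finish


# Twelve 1024 MB files; three workers at 100 MB/s and one at 25 MB/s.
sizes = [1024] * 12
speeds = [100, 100, 100, 25]
print(static_makespan(sizes, speeds), dynamic_makespan(sizes, speeds))
```

With these assumed numbers the pre-distributed plan takes roughly 123 seconds, because the slow worker still has to copy its three files, while the shared list finishes in roughly 41 seconds, because the fast workers pick up the files the slow one never reaches.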
    
    
    
    
  13. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-10-04T11:01:55Z

    On Thu, Oct 3, 2019 at 6:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    > >> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
    > list.
    > >> > pg_basebackup will then send back a list of filenames in this
    > command. This commands will be send by each worker and that worker will be
    > getting the said files.
    > >>
    > >> Seems reasonable, but I think you should just pass one file name and
    > >> use the command multiple times, once per file.
    > >
    > > I considered this approach initially,  however, I adopted the current
    > strategy to avoid multiple round trips between the server and clients and
    > save on query processing time by issuing a single command rather than
    > multiple ones. Further fetching multiple files at once will also aid in
    > supporting the tar format by utilising the existing ReceiveTarFile()
    > function and will be able to create a tarball for per tablespace per worker.
    >
    > I think that sending multiple filenames on a line could save some time
    > when there are lots of very small files, because then the round-trip
    > overhead could be significant.
    >
    > However, if you've got mostly big files, I think this is going to be a
    > loser. It'll be fine if you're able to divide the work exactly evenly,
    > but that's pretty hard to do, because some workers may succeed in
    > copying the data faster than others for a variety of reasons: some
    > data is in memory, some data has to be read from disk, different data
    > may need to be read from different disks that run at different speeds,
    > not all the network connections may run at the same speed. Remember
    > that the backup's not done until the last worker finishes, and so
    > there may well be a significant advantage in terms of overall speed in
    > putting some energy into making sure that they finish as close to each
    > other in time as possible.
    >
    > To put that another way, the first time all the workers except one get
    > done while the last one still has 10GB of data to copy, somebody's
    > going to be unhappy.
    >
    
    I have updated the patch (see the attached patch) to include tablespace
    support, tar format support and all other base backup options, so that
    they work in parallel mode as well. As previously suggested, I have removed
    BASE_BACKUP [PARALLEL] and have added START_BACKUP instead to start the
    backup. The tar format will write multiple tar files depending upon the
    number of workers specified. I have also made all commands
    (START_BACKUP/SEND_FILES_CONTENT/STOP_BACKUP) accept the
    base_backup_opt_list. This way the command-line options can also be
    provided to these commands. Since the command-line options don't change
    once the backup initiates, I went this way instead of storing them in
    shared state.
    
    The START_BACKUP command will now return a list of files sorted in
    descending order of file size. This way, the larger files will be at the
    top of the list. These files will then be assigned to workers one by one,
    so that the larger files are copied before the others.
    
    Based on my understanding, your main concern is that the files won't be
    distributed fairly, i.e. one worker might get a big file and take more time
    while others get done early with smaller files? In this approach I have
    created a list of files in descending order of size, so all the big files
    come at the top. The maximum file size in PG is 1GB, so if we have four
    workers picking up files from the list one by one, the worst-case scenario
    is that one worker gets a 1GB file to process while the others get smaller
    files. However, with this approach of sorting files in descending order of
    size and handing them out to workers one by one, there is a very high
    likelihood of the workers getting an even share of the work. Does this
    address your concerns?
    
    Furthermore, the patch also includes a regression test. As the
    t/010_pg_basebackup.pl test case tests base backup comprehensively, I have
    duplicated it to "t/040_pg_basebackup_parallel.pl" and added the parallel
    option to all of its tests, to make sure parallel mode works as expected.
    The one thing that differs from base backup is the file checksum reporting.
    In parallel mode, the total number of checksum failures is not reported
    correctly; however, the backup is aborted whenever a checksum failure
    occurs. This is because the processes do not maintain any shared state. I
    assume that reporting the total number of failures is less important than
    noticing a failure and aborting.
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
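
The distribution scheme described in this message (sort by size, largest first, then hand the files to the workers in turn) can be sketched as below. The function names and the sample sizes are assumptions made for illustration; they are not taken from the patch.

```python
def deal_round_robin(files, n_workers):
    """Sort (name, size) pairs largest first, then deal them out in turn."""
    ordered = sorted(files, key=lambda f: f[1], reverse=True)
    plan = [[] for _ in range(n_workers)]
    for i, entry in enumerate(ordered):
        plan[i % n_workers].append(entry)
    return plan


def bytes_per_worker(plan):
    """Total size assigned to each worker, in the same units as the input."""
    return [sum(size for _, size in files) for files in plan]


# Hypothetical file list: (name, size in MB).
files = [("f%d" % i, size)
         for i, size in enumerate([16, 1024, 64, 512, 32, 1024, 256, 128])]
plan = deal_round_robin(files, 4)
print(bytes_per_worker(plan))
```

How even the resulting totals are depends on the size distribution: with many files of similar size the shares come out close, while a few large files among many small ones leave the workers holding the large files with noticeably more to copy.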
    
  14. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2019-10-04T12:07:42Z

    On Fri, Oct 4, 2019 at 7:02 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > Based on my understanding your main concern is that the files won't be distributed fairly i.e one worker might get a big file and take more time while others get done early with smaller files? In this approach I have created a list of files in descending order based on there sizes so all the big size files will come at the top. The maximum file size in PG is 1GB so if we have four workers who are picking up file from the list one by one, the worst case scenario is that one worker gets a file of 1GB to process while others get files of smaller size. However with this approach of descending files based on size and handing it out to workers one by one, there is a very high likelihood of workers getting work evenly. does this address your concerns?
    
    Somewhat, but I'm not sure it's good enough. There are lots of reasons
    why two processes that are started at the same time with the same
    amount of work might not finish at the same time.
    
    I'm also not particularly excited about having the server do the
    sorting based on file size.  Seems like that ought to be the client's
    job, if the client needs the sorting.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  15. Re: WIP/PoC for parallel backup

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-10-07T08:52:36Z

    Thanks Asif for the patch.  I am picking this up for review.  The patch is
    a bit big, so here are some very initial comments to make the review
    process easier.
    
    1) The patch seems to do a lot of code shuffling; I think it would be
    easier to review if you split the cleanup into a separate patch.
    
    Example:
    a: setup_throttle
    b: include_wal_files
    
    2) As far as I can see, this patch basically has three major phases.
    
    a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
    STOP_BACKUP.
    b) Implementation of actual parallel backup.
    c) Testcase
    
    I would suggest splitting these into three separate patches.  That would
    make the patch easier to review.
    
    3) In your patch you are preparing the backup manifest (a file that
    gives information about the data files). Robert Haas submitted
    the backup manifests patch on another thread [1], and I think we
    should use that patch to get the backup manifests for parallel backup.
    
    I will continue to review the patch, but meanwhile it would make the
    review process easier if you could split up the patches.
    
    [1]
    https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
    
    Thanks,
    
    On Fri, Oct 4, 2019 at 4:32 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Thu, Oct 3, 2019 at 6:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    >> On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >> >> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in
    >> given list.
    >> >> > pg_basebackup will then send back a list of filenames in this
    >> command. This commands will be send by each worker and that worker will be
    >> getting the said files.
    >> >>
    >> >> Seems reasonable, but I think you should just pass one file name and
    >> >> use the command multiple times, once per file.
    >> >
    >> > I considered this approach initially,  however, I adopted the current
    >> strategy to avoid multiple round trips between the server and clients and
    >> save on query processing time by issuing a single command rather than
    >> multiple ones. Further fetching multiple files at once will also aid in
    >> supporting the tar format by utilising the existing ReceiveTarFile()
    >> function and will be able to create a tarball for per tablespace per worker.
    >>
    >> I think that sending multiple filenames on a line could save some time
    >> when there are lots of very small files, because then the round-trip
    >> overhead could be significant.
    >>
    >> However, if you've got mostly big files, I think this is going to be a
    >> loser. It'll be fine if you're able to divide the work exactly evenly,
    >> but that's pretty hard to do, because some workers may succeed in
    >> copying the data faster than others for a variety of reasons: some
    >> data is in memory, some data has to be read from disk, different data
    >> may need to be read from different disks that run at different speeds,
    >> not all the network connections may run at the same speed. Remember
    >> that the backup's not done until the last worker finishes, and so
    >> there may well be a significant advantage in terms of overall speed in
    >> putting some energy into making sure that they finish as close to each
    >> other in time as possible.
    >>
    >> To put that another way, the first time all the workers except one get
    >> done while the last one still has 10GB of data to copy, somebody's
    >> going to be unhappy.
    >>
    >
    > I have updated the patch (see the attached patch) to include tablespace
    > support, tar format support and all other backup base backup options to
    > work in parallel mode as well. As previously suggested, I have removed
    > BASE_BACKUP [PARALLEL] and have added START_BACKUP instead to start the
    > backup. The tar format will write multiple tar files depending upon the
    > number of workers specified. Also made all commands
    > (START_BACKUP/SEND_FILES_CONTENT/STOP_BACKUP) to accept the
    > base_backup_opt_list. This way the command-line options can also be
    > provided to these commands. Since the command-line options don't change
    > once the backup initiates, I went this way instead of storing them in
    > shared state.
    >
    > The START_BACKUP command will now return a sorted list of files in
    > descending order based on file sizes. This way, the larger files will be on
    > top of the list. hence these files will be assigned to workers one by one,
    > making it so that the larger files will be copied before other files.
    >
    > Based on my understanding your main concern is that the files won't be
    > distributed fairly i.e one worker might get a big file and take more time
    > while others get done early with smaller files? In this approach I have
    > created a list of files in descending order based on there sizes so all the
    > big size files will come at the top. The maximum file size in PG is 1GB so
    > if we have four workers who are picking up file from the list one by one,
    > the worst case scenario is that one worker gets a file of 1GB to process
    > while others get files of smaller size. However with this approach of
    > descending files based on size and handing it out to workers one by one,
    > there is a very high likelihood of workers getting work evenly. does this
    > address your concerns?
    >
    > Furthermore the patch also includes the regression test. As t/
    > 010_pg_basebackup.pl test-case is testing base backup comprehensively, so
    > I have duplicated it to "t/040_pg_basebackup_parallel.pl" and added
    > parallel option in all of its tests, to make sure parallel mode works
    > expectantly. The one thing that differs from base backup is the file
    > checksum reporting. In parallel mode, the total number of checksum failures
    > are not reported correctly however it will abort the backup whenever a
    > checksum failure occurs. This is because processes are not maintaining any
    > shared state. I assume that it's not much important to report total number
    > of failures vs noticing the failure and aborting.
    >
    >
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
    -- 
    Rushabh Lathia
    
  16. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-10-07T12:48:12Z

    On Mon, Oct 7, 2019 at 1:52 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    wrote:
    
    > Thanks Asif for the patch.  I am opting this for a review.  Patch is
    > bit big, so here are very initial comments to make the review process
    > easier.
    >
    
    Thanks Rushabh for reviewing the patch.
    
    
    > 1) Patch seems doing lot of code shuffling, I think it would be easy
    > to review if you can break the clean up patch separately.
    >
    > Example:
    > a: setup_throttle
    > b: include_wal_files
    >
    > 2) As I can see this patch basically have three major phase.
    >
    > a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
    > STOP_BACKUP.
    > b) Implementation of actual parallel backup.
    > c) Testcase
    >
    > I would suggest, if you can break out in three as a separate patch that
    > would be nice.  It will benefit in reviewing the patch.
    >
    
    Sure, why not. I will break them into multiple patches.
    
    
    >
    > 3) In your patch you are preparing the backup manifest (file which
    > giving the information about the data files). Robert Haas, submitted
    > the backup manifests patch on another thread [1], and I think we
    > should use that patch to get the backup manifests for parallel backup.
    >
    
    Sure. However, the backup manifest patch calculates and includes the
    checksums of the backup files, and this is done while each file is being
    transferred to the frontend. The manifest file itself is copied at the
    very end of the backup. In parallel backup, I need the list of filenames
    before the file contents are transferred, in order to divide them among
    multiple workers. For that, the manifest file has to be available when
    START_BACKUP is called.
    
    That means the backup manifest should support being created without the
    checksums during START_BACKUP(). I also need the directory information,
    for two reasons:
    
    - In plain format, the base path has to exist before we can write the
    file. We can extract the base path from the file name, but doing that for
    all files does not seem like a good idea.
    - Base backup does not include the contents of some directories, but those
    directories, although empty, are still expected in PGDATA.
    
    I can make these changes part of parallel backup (which would be on top of
    the backup manifest patch), or these changes can be done as part of the
    manifest patch and then parallel backup can use them.
    
    Robert, what do you suggest?
    
    
    -- 
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
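
The two reasons given above for needing directory entries in the file list can be sketched on the client side as follows. This is a minimal illustration, assuming the client receives the directory names separately from the file names; all function names here are hypothetical.

```python
import os
import tempfile


def prepare_directories(target, dir_names):
    """Create every listed directory, including ones that will stay empty."""
    for name in dir_names:
        os.makedirs(os.path.join(target, name), exist_ok=True)


def write_file(target, name, data):
    """Write one received file; its parent directory must already exist."""
    with open(os.path.join(target, name), "wb") as f:
        f.write(data)


def restore_layout(dir_names, files):
    """Demo helper: build the layout in a temporary directory and list it."""
    with tempfile.TemporaryDirectory() as target:
        prepare_directories(target, dir_names)
        for name, data in files.items():
            write_file(target, name, data)
        found = []
        for root, dirs, names in os.walk(target):
            for entry in dirs + names:
                rel = os.path.relpath(os.path.join(root, entry), target)
                found.append(rel.replace(os.sep, "/"))
        return sorted(found)


print(restore_layout(["base", "base/1", "pg_replslot"], {"base/1/1234": b"x"}))
```

Creating the directories once, up front, avoids deriving and checking the parent path for every file, and it also produces directories such as `pg_replslot` whose contents are not part of the backup.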
    
  17. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2019-10-07T13:05:34Z

    On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
    > while the file is being transferred to the frontend-end. The manifest file itself is copied at the
    > very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
    > order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
    >  is called.
    >
    > That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
    > I also need the directory information as well for two reasons:
    >
    > - In plain format, base path has to exist before we can write the file. we can extract the base path from the file
    > but doing that for all files does not seem a good idea.
    > - base backup does not include the content of some directories but those directories although empty, are still
    > expected in PGDATA.
    >
    > I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
    > these changes can be done as part of manifest patch and then parallel can use them.
    >
    > Robert what do you suggest?
    
    I think we should probably not use backup manifests here, actually. I
    initially thought that would be a good idea, but after further thought
    it seems like it just complicates the code to no real benefit.  I
    suggest that the START_BACKUP command just return a result set, like a
    query, with perhaps four columns: file name, file type ('d' for
    directory or 'f' for file), file size, file mtime. pg_basebackup will
    ignore the mtime, but some other tools might find that useful
    information.
    
    I wonder if we should also split START_BACKUP (which should enter
    non-exclusive backup mode) from GET_FILE_LIST, in case some other
    client program wants to use one of those but not the other.  I think
    that's probably a good idea, but not sure.
    
    I still think that the files should be requested one at a time, not a
    huge long list in a single command.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
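
The four-column result set suggested above, together with the earlier point that any sorting by size should be the client's job, might look like this on the client side. It is a sketch only: the `FileEntry` type, the row layout as Python tuples and the mtime representation are assumptions, since the thread does not fix a wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str
    kind: str   # 'd' for directory, 'f' for file
    size: int   # bytes
    mtime: int  # assumed to be seconds since the epoch; not used for ordering


def split_and_sort(rows):
    """Turn (name, type, size, mtime) rows into directories and sorted files."""
    entries = [FileEntry(*row) for row in rows]
    dirs = [e for e in entries if e.kind == "d"]
    # Largest files first, so that they are requested before the small ones.
    files = sorted((e for e in entries if e.kind == "f"),
                   key=lambda e: e.size, reverse=True)
    return dirs, files


# Hypothetical rows as a client might receive them.
rows = [
    ("base", "d", 0, 1570000000),
    ("base/1/1259", "f", 8192, 1570000100),
    ("base/1/16384", "f", 1073741824, 1570000200),
    ("PG_VERSION", "f", 3, 1570000000),
]
dirs, files = split_and_sort(rows)
print([d.name for d in dirs], [f.name for f in files])
```

Keeping the sort on the client leaves the server's answer in whatever order it scanned the files, and other tools remain free to order or filter the list differently.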
    
    
    
    
  18. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-10-07T13:35:19Z

    On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > > Sure. Though the backup manifest patch calculates and includes the
    > checksum of backup files and is done
    > > while the file is being transferred to the frontend-end. The manifest
    > file itself is copied at the
    > > very end of the backup. In parallel backup, I need the list of filenames
    > before file contents are transferred, in
    > > order to divide them into multiple workers. For that, the manifest file
    > has to be available when START_BACKUP
    > >  is called.
    > >
    > > That means, backup manifest should support its creation while excluding
    > the checksum during START_BACKUP().
    > > I also need the directory information as well for two reasons:
    > >
    > > - In plain format, base path has to exist before we can write the file.
    > we can extract the base path from the file
    > > but doing that for all files does not seem a good idea.
    > > - base backup does not include the content of some directories but those
    > directories although empty, are still
    > > expected in PGDATA.
    > >
    > > I can make these changes part of parallel backup (which would be on top
    > of backup manifest patch) or
    > > these changes can be done as part of manifest patch and then parallel
    > can use them.
    > >
    > > Robert what do you suggest?
    >
    > I think we should probably not use backup manifests here, actually. I
    > initially thought that would be a good idea, but after further thought
    > it seems like it just complicates the code to no real benefit.
    
    
    Okay.
    
    
    >   I
    > suggest that the START_BACKUP command just return a result set, like a
    > query, with perhaps four columns: file name, file type ('d' for
    > directory or 'f' for file), file size, file mtime. pg_basebackup will
    > ignore the mtime, but some other tools might find that useful
    > information.
    >
    Yes, the current patch already returns a result set. I will add the
    additional information.
    
    
    > I wonder if we should also split START_BACKUP (which should enter
    > non-exclusive backup mode) from GET_FILE_LIST, in case some other
    > client program wants to use one of those but not the other.  I think
    > that's probably a good idea, but not sure.
    >
    
    Currently pg_basebackup does not enter exclusive backup mode, and other
    tools have to use the pg_start_backup() and pg_stop_backup() functions to
    achieve that. Since we are breaking the backup into multiple commands, I
    believe it would be a good idea to have this option. I will include it in
    the next revision of this patch.
    
    
    >
    > I still think that the files should be requested one at a time, not a
    > huge long list in a single command.
    >
    Sure, I will make the change.
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  19. Re: WIP/PoC for parallel backup

    Ibrar Ahmed <ibrar.ahmad@gmail.com> — 2019-10-07T13:43:22Z

    On Mon, Oct 7, 2019 at 6:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > > Sure. Though the backup manifest patch calculates and includes the
    > checksum of backup files and is done
    > > while the file is being transferred to the frontend-end. The manifest
    > file itself is copied at the
    > > very end of the backup. In parallel backup, I need the list of filenames
    > before file contents are transferred, in
    > > order to divide them into multiple workers. For that, the manifest file
    > has to be available when START_BACKUP
    > >  is called.
    > >
    > > That means, backup manifest should support its creation while excluding
    > the checksum during START_BACKUP().
    > > I also need the directory information as well for two reasons:
    > >
    > > - In plain format, base path has to exist before we can write the file.
    > we can extract the base path from the file
    > > but doing that for all files does not seem a good idea.
    > > - base backup does not include the content of some directories but those
    > directories although empty, are still
    > > expected in PGDATA.
    > >
    > > I can make these changes part of parallel backup (which would be on top
    > of backup manifest patch) or
    > > these changes can be done as part of manifest patch and then parallel
    > can use them.
    > >
    > > Robert what do you suggest?
    >
    > I think we should probably not use backup manifests here, actually. I
    > initially thought that would be a good idea, but after further thought
    > it seems like it just complicates the code to no real benefit.  I
    > suggest that the START_BACKUP command just return a result set, like a
    > query, with perhaps four columns: file name, file type ('d' for
    > directory or 'f' for file), file size, file mtime. pg_basebackup will
    > ignore the mtime, but some other tools might find that useful
    > information.
    >
    > I wonder if we should also split START_BACKUP (which should enter
    > non-exclusive backup mode) from GET_FILE_LIST, in case some other
    > client program wants to use one of those but not the other.  I think
    > that's probably a good idea, but not sure.
    >
    > I still think that the files should be requested one at a time, not a
    > huge long list in a single command.
    >
    
    What about having an API that can fetch either a single file or a list
    of files? We would use the single-file form in our application, and other
    tools could benefit from the list form.
    
    -- 
    Ibrar Ahmed
    
  20. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2019-10-07T13:47:29Z

    On Mon, Oct 7, 2019 at 9:43 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
    > What about have an API to get the single file or list of files? We will use a single file in
    > our application and other tools can get the benefit of list of files.
    
    That sounds a bit speculative to me. Who is to say that anyone will
    find that useful? I mean, I think it's fine and good to build the
    functionality that we need in a way that maximizes the likelihood that
    other tools can reuse that functionality, and I think we should do
    that. But I don't think it's smart to build functionality that we
    don't really need in the hope that somebody else will find it useful
    unless we're pretty sure that they actually will. I don't see that as
    being the case here; YMMV.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  21. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-10-16T13:19:10Z

    On Mon, Oct 7, 2019 at 6:35 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    >> On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >> > Sure. Though the backup manifest patch calculates and includes the
    >> checksum of backup files and is done
    >> > while the file is being transferred to the frontend-end. The manifest
    >> file itself is copied at the
    >> > very end of the backup. In parallel backup, I need the list of
    >> filenames before file contents are transferred, in
    >> > order to divide them into multiple workers. For that, the manifest file
    >> has to be available when START_BACKUP
    >> >  is called.
    >> >
    >> > That means, backup manifest should support its creation while excluding
    >> the checksum during START_BACKUP().
    >> > I also need the directory information as well for two reasons:
    >> >
    >> > - In plain format, base path has to exist before we can write the file.
    >> we can extract the base path from the file
    >> > but doing that for all files does not seem a good idea.
    >> > - base backup does not include the content of some directories but
    >> those directories although empty, are still
    >> > expected in PGDATA.
    >> >
    >> > I can make these changes part of parallel backup (which would be on top
    >> of backup manifest patch) or
    >> > these changes can be done as part of manifest patch and then parallel
    >> can use them.
    >> >
    >> > Robert what do you suggest?
    >>
    >> I think we should probably not use backup manifests here, actually. I
    >> initially thought that would be a good idea, but after further thought
    >> it seems like it just complicates the code to no real benefit.
    >
    >
    > Okay.
    >
    >
    >>   I
    >> suggest that the START_BACKUP command just return a result set, like a
    >> query, with perhaps four columns: file name, file type ('d' for
    >> directory or 'f' for file), file size, file mtime. pg_basebackup will
    >> ignore the mtime, but some other tools might find that useful
    >> information.
    >>
    > yes current patch already returns the result set. will add the additional
    > information.
    >
    >
    >> I wonder if we should also split START_BACKUP (which should enter
    >> non-exclusive backup mode) from GET_FILE_LIST, in case some other
    >> client program wants to use one of those but not the other.  I think
    >> that's probably a good idea, but not sure.
    >>
    >
    > Currently pg_basebackup does not enter in exclusive backup mode and other
    > tools have to
    > use pg_start_backup() and pg_stop_backup() functions to achieve that.
    > Since we are breaking
    > backup into multiple command, I believe it would be a good idea to have
    > this option. I will include
    > it in next revision of this patch.
    >
    >
    >>
    >> I still think that the files should be requested one at a time, not a
    >> huge long list in a single command.
    >>
    > sure, will make the change.
    >
    >
    >
    
    I have refactored the functionality into multiple smaller patches in order
    to make the review process easier. I have divided the code into backend
    changes and pg_basebackup changes. The
    backend replication system now supports the following commands:
    
    - START_BACKUP
    - SEND_FILE_LIST
    - SEND_FILES_CONTENT
    - STOP_BACKUP
    
    START_BACKUP no longer returns the list of files; SEND_FILE_LIST is used
    for that instead. START_BACKUP now calls pg_start_backup and returns the
    starting WAL position, the tablespace header information and the content
    of the backup label file. Initially I was using temporary files to store
    the backup_label content, but that turned out to be a bad idea because
    multiple non-exclusive backups can be running at once. The backup label
    information is needed by stop_backup, so pg_basebackup will send it as
    part of STOP_BACKUP.
    
    SEND_FILE_LIST returns the list of files as a result set with four
    columns (filename, type, size, mtime). SEND_FILES_CONTENT can now return
    either a single file or multiple files, as required. Supporting both does
    not need much extra code, so I believe this makes it more usable if other
    tools want to utilise it.
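
To make the four-column result set concrete, here is a minimal sketch in C of how a client might hold one row and divide regular files among workers. The names `FileListEntry` and `assign_files_to_workers`, the 64-bit size field, and the least-loaded-worker policy are all assumptions made for illustration; none of them is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_MAXPATH 1024		/* stand-in for MAXPGPATH */

/* One row of the file list: (filename, type, size, mtime). Hypothetical. */
typedef struct
{
	char		name[SKETCH_MAXPATH];
	char		type;			/* 'd' for directory, 'f' for file */
	int64_t		size;
	time_t		mtime;
} FileListEntry;

/*
 * Assign each regular file to the worker with the smallest byte total so
 * far. Directories get owner -1: the leader would create them up front.
 * load[] must have room for nworkers entries and receives the totals.
 */
static void
assign_files_to_workers(const FileListEntry *entries, size_t nentries,
						int nworkers, int *owner, int64_t *load)
{
	assert(nworkers > 0);

	for (int w = 0; w < nworkers; w++)
		load[w] = 0;

	for (size_t i = 0; i < nentries; i++)
	{
		int			best = 0;

		if (entries[i].type != 'f')
		{
			owner[i] = -1;
			continue;
		}
		for (int w = 1; w < nworkers; w++)
		{
			if (load[w] < load[best])
				best = w;
		}
		owner[i] = best;
		load[best] += entries[i].size;
	}
}
```

This is a static, up-front division. If files are instead requested one at a time, workers could simply pull the next entry from a shared list as they become idle.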
    
    As per Robert's suggestion, I am currently working on changing
    pg_basebackup to fetch files one by one. That work is not complete yet,
    and the attached patch still uses the old multi-file fetching method to
    test the backend commands. I will send an updated patch that fetches
    files one by one.
    
    I wanted to share the backend patch to get some feedback in the meantime.
    
    Thanks,
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  22. Re: WIP/PoC for parallel backup

    Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> — 2019-10-16T20:32:56Z

    I quickly tried to have a look at your 0001-refactor patch.
    Here are some comments:
    
    1. The patch fails to compile.
    
    Sorry if I am missing something, but I do not understand why the new
    function collectTablespaces() passes an extra NULL argument when calling
    sendTablespace(); it breaks compilation:
    
    + ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
    
    
    gcc -Wall -Wmissing-prototypes -Wpointer-arith
    -Wdeclaration-after-statement -Werror=vla -Wendif-labels
    -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
    -Wno-unused-command-line-argument -g -g -O0 -Wall -Werror
    -I../../../../src/include    -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po
    xlog.c:12253:59: error: too many arguments to function call, expected 2, have 3
                    ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
                                             ~~~~~~~~~~~~~~                 ^~~~
    
    2. I think the patch needs to be run through pgindent. It does not
    follow the 80-column width, e.g.:
    
    +void
    +collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool infotbssize, bool needtblspcmapfile)
    +{
    
    3. Some comments in the refactored code appear to be redundant. For
    example, the comment
     /* Setup and activate network throttling, if client requested it */
    appears three times: before the call to setup_throttle(), in the prologue
    of setup_throttle(), and above the if() inside that function. The same
    applies to the comment
    /* Collect information about all tablespaces */
    in collectTablespaces().
    
    4. In include_wal_files(), why is the TimeLineID parameter (endtli)
    needed? I don't see it being used in the function at all, so I think you
    can safely get rid of it.
    
    +include_wal_files(XLogRecPtr endptr, TimeLineID endtli)
    
    Regards,
    Jeevan Ladhe
    
  23. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-10-17T05:21:15Z

    On Thu, Oct 17, 2019 at 1:33 AM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
    wrote:
    
    > I quickly tried to have a look at your 0001-refactor patch.
    > Here are some comments:
    >
    > 1. The patch fails to compile.
    >
    > Sorry if I am missing something, but am not able to understand why in new
    > function collectTablespaces() you have added an extra parameter NULL while
    > calling sendTablespace(), it fails the compilation :
    >
    > + ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
    >
    >
    > gcc -Wall -Wmissing-prototypes -Wpointer-arith
    > -Wdeclaration-after-statement -Werror=vla -Wendif-labels
    > -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
    > -Wno-unused-command-line-argument -g -g -O0 -Wall -Werror
    > -I../../../../src/include    -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po
    > xlog.c:12253:59: error: too many arguments to function call, expected 2,
    > have 3
    >                 ti->size = infotbssize ? sendTablespace(fullpath, true,
    > NULL) : -1;
    >                                          ~~~~~~~~~~~~~~
    > ^~~~
    >
    > 2. I think the patch needs to run via pg_indent. It does not follow 80
    > column
    > width.
    > e.g.
    >
    > +void
    > +collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool
    > infotbssize, bool needtblspcmapfile)
    > +{
    >
    > 3.
    > The comments in re-factored code appear to be redundant. example:
    > Following comment:
    >  /* Setup and activate network throttling, if client requested it */
    > appears thrice in the code, before calling setup_throttle(), in the
    > prologue of
    > the function setup_throttle(), and above the if() in that function.
    > Similarly - the comment:
    > /* Collect information about all tablespaces */
    > in collectTablespaces().
    >
    > 4.
    > In function include_wal_files() why is the parameter TimeLineID i.e. endtli
    > needed. I don't see it being used in the function at all. I think you can
    > safely
    > get rid of it.
    >
    > +include_wal_files(XLogRecPtr endptr, TimeLineID endtli)
    >
    >
    Thanks Jeevan. Some changes that belong in the 2nd patch were left in
    the 1st. I have fixed that, as well as the issues mentioned above.
    Attached are the updated patches.
    
    Thanks,
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  24. Re: WIP/PoC for parallel backup

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-10-18T11:11:53Z

    On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    > Attached are the updated patches.
    >
    
    I had a quick look over these changes and they look good overall.
    However, here are a few review comments from glancing through patches
    0002 and 0003.
    
    
    --- 0002 patch
    
    1.
    Can the lsn option be renamed to start-wal-location? That would be
    clearer.
    
    2.
    +typedef struct
    +{
    +    char        name[MAXPGPATH];
    +    char        type;
    +    int32        size;
    +    time_t        mtime;
    +} BackupFile;
    
    I think it will be good if we keep this structure in a common place so that
    the client can also use it.
    
    3.
    +    SEND_FILE_LIST,
    +    SEND_FILES_CONTENT,
    Can the above two commands be renamed to SEND_BACKUP_MANIFEST and
    SEND_BACKUP_FILE respectively?
    The reason for the first rename is that we get more than just a list of
    file names here; a few more details come with it. The second rename
    keeps the name in line with
    START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.
    
    4.
    Typos:
    non-exlusive => non-exclusive
    retured => returned
    optionaly => optionally
    nessery => necessary
    totoal => total
    
    
    --- 0003 patch
    
    1.
    +static int
    +simple_list_length(SimpleStringList *list)
    +{
    +    int            len = 0;
    +    SimpleStringListCell *cell;
    +
    +    for (cell = list->head; cell; cell = cell->next, len++)
    +        ;
    +
    +    return len;
    +}
    
    I think it would be good to move this to simple_list.c, so that other
    callers can use it as well.
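
As a self-contained illustration of the helper under discussion, the sketch below counts the cells of a singly linked string list. The `Sketch*` types are simplified stand-ins invented for this example, not the real definitions from simple_list.h, and `sketch_list_append` exists only so the example can build a list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the list types; hypothetical, for illustration. */
typedef struct SketchStringListCell
{
	struct SketchStringListCell *next;
	char	   *val;
} SketchStringListCell;

typedef struct
{
	SketchStringListCell *head;
	SketchStringListCell *tail;
} SketchStringList;

/* Append a copy of val to the list; returns false on allocation failure. */
static bool
sketch_list_append(SketchStringList *list, const char *val)
{
	SketchStringListCell *cell = malloc(sizeof(*cell));

	if (cell == NULL)
		return false;
	cell->val = malloc(strlen(val) + 1);
	if (cell->val == NULL)
	{
		free(cell);
		return false;
	}
	strcpy(cell->val, val);
	cell->next = NULL;

	if (list->tail)
		list->tail->next = cell;
	else
		list->head = cell;
	list->tail = cell;
	return true;
}

/* Count the cells by walking the list once, as the quoted helper does. */
static int
sketch_list_length(const SketchStringList *list)
{
	int			len = 0;

	assert(list != NULL);
	for (const SketchStringListCell *cell = list->head; cell; cell = cell->next)
		len++;
	return len;
}
```

The walk is O(n) per call. Keeping a counter in the list header would make it O(1), at the cost of changing the shared structure.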
    
    2.
    Please revert these unnecessary changes:
    
    @@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
                  */
                 snprintf(filename, sizeof(filename), "%s/%s", current_path,
                          copybuf);
    +
                 if (filename[strlen(filename) - 1] == '/')
                 {
                     /*
    
    @@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
                          * can map them too.)
                          */
                         filename[strlen(filename) - 1] = '\0';    /* Remove trailing slash */
    -
                         mapped_tblspc_path = get_tablespace_mapping(&copybuf[157]);
    +
                         if (symlink(mapped_tblspc_path, filename) != 0)
                         {
                             pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
    3.
    Typos:
    retrive => retrieve
    takecare => take care
    tablespae => tablespace
    
    4.
    ParallelBackupEnd() does not do anything specific to parallelism. Would
    it be better to just rename it to EndBackup()?
    
    5.
    To pass a tablespace path to the server in SEND_FILES_CONTENT, you are
    reusing the LABEL option, which seems odd. How about adding a new option
    for that?
    
    6.
    It would be good to have prologue comments explaining what each function
    actually does, for functions like:
    GetBackupFilesList()
    ReceiveFiles()
    create_workers_and_fetch()
    
    
    Thanks
    
    
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  25. Re: WIP/PoC for parallel backup

    Ibrar Ahmed <ibrar.ahmad@gmail.com> — 2019-10-24T10:19:08Z

    I had a detailed discussion with Robert Haas at PostgreConf Europe about
    parallel backup. We discussed the current state of the patch and what
    needs to be done to get it committed.
    
    - The current patch uses processes to implement parallelism. There are
    several reasons to use threads instead. To start with, this is a client
    utility, so threads make more sense. Data also needs to be shared between
    the workers and the main process, and handling that with threads is
    simpler than with interprocess communication.
    
    - Fetching a single file versus multiple files was also discussed. We
    concluded that we need to benchmark whether disk I/O is a bottleneck and
    whether parallel writing gives us any benefit. The benchmark needs to be
    run on different hardware and different networks to identify the real
    bottlenecks. In general, we agreed that we could start with fetching one
    file at a time, and revisit that after the benchmarks are done.
    
    - There is also an ongoing debate in this thread about whether we should
    have one single tar file for all files or one tar file per thread. I
    really want a single tar file, because the main purpose of the tar format
    is to reduce the number of files to manage, and with one file per thread
    we end up with many tar files. That means we need one master thread
    responsible for writing the tar file, while all the other threads receive
    data from the network and stream it to the master thread. This also
    supports a thread-based model over a process-based one, because the
    latter requires too much data sharing between processes. If we cannot
    achieve this, we can disable the tar option for parallel backup in the
    first version.
    
    - For data sharing, we need to avoid unnecessary locking, and a more
    suitable algorithm for the reader-writer problem is required.
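
One common way to hand data from several receiving threads to a single writer thread is a bounded producer-consumer queue. The sketch below is only an illustration of that general technique using POSIX threads; `Chunk`, `ChunkQueue` and the `chunk_queue_*` functions are hypothetical names, and nothing here is taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_SLOTS 8

/* A received piece of file data waiting to be written. Hypothetical. */
typedef struct
{
	const char *data;
	size_t		len;
} Chunk;

typedef struct
{
	Chunk		slots[QUEUE_SLOTS];
	int			head;			/* next slot to pop */
	int			count;			/* slots currently in use */
	pthread_mutex_t lock;
	pthread_cond_t not_empty;
	pthread_cond_t not_full;
} ChunkQueue;

static void
chunk_queue_init(ChunkQueue *q)
{
	assert(q != NULL);
	q->head = 0;
	q->count = 0;
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->not_empty, NULL);
	pthread_cond_init(&q->not_full, NULL);
}

/* Called by receiver threads; blocks while the queue is full. */
static void
chunk_queue_push(ChunkQueue *q, Chunk c)
{
	pthread_mutex_lock(&q->lock);
	while (q->count == QUEUE_SLOTS)
		pthread_cond_wait(&q->not_full, &q->lock);
	q->slots[(q->head + q->count) % QUEUE_SLOTS] = c;
	q->count++;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Called by the single writer thread; blocks while the queue is empty. */
static Chunk
chunk_queue_pop(ChunkQueue *q)
{
	Chunk		c;

	pthread_mutex_lock(&q->lock);
	while (q->count == 0)
		pthread_cond_wait(&q->not_empty, &q->lock);
	c = q->slots[q->head];
	q->head = (q->head + 1) % QUEUE_SLOTS;
	q->count--;
	pthread_cond_signal(&q->not_full);
	pthread_mutex_unlock(&q->lock);
	return c;
}
```

The fixed slot count bounds memory use and applies back-pressure to the receivers when the writer falls behind. A real tar writer would also have to keep each file's data contiguous in the archive, which this sketch does not address.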
    
    -- 
    Ibrar Ahmed
    
  26. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-10-24T11:24:41Z

    On Thu, Oct 24, 2019 at 3:21 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
    
    > I had a detailed discussion with Robert Haas at PostgreConf Europe about
    > parallel backup.
    > We discussed the current state of the patch and what needs to be done to
    > get the patch committed.
    >
    > - The current patch uses a process to implement parallelism. There are many
    > reasons we need to use threads instead of processes. To start with, as
    > this is a client utility it makes
    > more sense to use threads. The data needs to be shared amongst different
    > threads and the main process,
    > handling that is simpler as compared to interprocess communication.
    >
    
    Yes, I agree. I have already converted the code to use threads instead
    of processes, which avoids the overhead of interprocess communication.
    
    The single-file fetching strategy requires coordination between the
    competing workers. A multiprocess application would need IPC for that;
    the current approach of using threads instead of processes avoids this
    overhead.
    
    
    > - Fetching a single file or multiple files was also discussed. We
    > concluded in our discussion that we
    > need to benchmark to see if disk I/O is a bottleneck or not and if
    > parallel writing gives us
    > any benefit. This benchmark needs to be done on different hardware and
    > different
    > network to identify which are the real bottlenecks. In general, we agreed
    > that we could start with fetching
    > one file at a time but that will be revisited after the benchmarks are
    > done.
    >
    
    I'll share the updated patch in the next couple of days. After that, I'll
    work on benchmarking it in the different environments that I have.
    
    
    >
    > - There is also an ongoing debate in this thread that we should have one
    > single tar file for all files or one
    > TAR file per thread. I really want to have a single tar file because the
    > main purpose of the TAR file is to
    > reduce the management of multiple files, but in case of one file per
    > thread, we end up with many tar
    > files. Therefore we need to have one master thread which is responsible
    > for writing on tar file and all
    > the other threads will receive the data from the network and stream to the
    > master thread. This also
    > supports the idea of using a thread-based model rather than a
    > process-based approach because it
    > requires too much data sharing between processes. If we cannot achieve
    > this, then we can disable the
    > TAR option for parallel backup in the first version.
    >
    
    I am in favour of disabling the tar format for the first version of
    parallel backup.
    
    
    > - In the case of data sharing, we need to try to avoid unnecessary locking
    > and more suitable algorithm to
    > solve the reader-writer problem is required.
    >
    > --
    > Ibrar Ahmed
    >
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  27. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-10-28T14:03:33Z

    I have updated the patch to include the changes suggested by Jeevan. This
    patch also implements thread workers instead of processes and fetches a
    single file at a time. The tar format has been disabled for the first
    version of parallel backup.
    
    Converting the previous process-based application to the current
    thread-based one required a slight modification of the data structures,
    the addition of a few new functions, and progress reporting functionality.
    
    The core data structure remains intact: the tablespace-based file listing
    is still maintained. In addition, we now maintain a list of all files
    (holding pointers to the FileInfo structures, so the data is not
    duplicated), so that workers can access the files sequentially without
    adding much processing to the critical section. The critical section for
    thread workers is currently limited to incrementing the file index within
    this list of files.
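    
    The shared file list with a minimal critical section could look roughly
    like the sketch below. It is only an illustration, not the code from the
    patch: WorkQueue, WorkerArg, work_queue_init(), claim_next_file() and
    worker_main() are hypothetical names, and the FileInfo shown here is a
    reduced stand-in for the real structure. The mutex protects nothing but
    the index increment; the actual fetching happens outside the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Reduced stand-in for the per-file record (hypothetical layout). */
typedef struct FileInfo
{
    const char *name;
} FileInfo;

/* Flat list of pointers to all files, shared by every worker thread. */
typedef struct WorkQueue
{
    FileInfo      **files;
    size_t          nfiles;
    size_t          next;       /* index of the next unclaimed file */
    pthread_mutex_t lock;
} WorkQueue;

/* Per-worker state; fetched counts the files this worker claimed. */
typedef struct WorkerArg
{
    WorkQueue  *queue;
    size_t      fetched;
} WorkerArg;

void
work_queue_init(WorkQueue *q, FileInfo **files, size_t nfiles)
{
    assert(files != NULL || nfiles == 0);
    q->files = files;
    q->nfiles = nfiles;
    q->next = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/*
 * Claim the next file, or return NULL once the list is exhausted.
 * The critical section only reads and increments the index.
 */
FileInfo *
claim_next_file(WorkQueue *q)
{
    FileInfo   *fi = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->next < q->nfiles)
        fi = q->files[q->next++];
    pthread_mutex_unlock(&q->lock);

    return fi;
}

/* Thread entry point: keep claiming files until none are left. */
void *
worker_main(void *arg)
{
    WorkerArg  *wa = arg;

    while (claim_next_file(wa->queue) != NULL)
        wa->fetched++;          /* a real worker would fetch the file here */

    return NULL;
}
```

    Each worker would be started with pthread_create(&tid, NULL, worker_main,
    &arg) and joined with pthread_join() once the list is exhausted.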
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  28. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2019-10-28T15:28:49Z

    On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > I have updated the patch to include the changes suggested by Jeevan. This patch also implements the thread workers instead of
    > processes and fetches a single file at a time. The tar format has been disabled for first version of parallel backup.
    
    Looking at 0001-0003:
    
    It's not clear to me what the purpose of the start WAL location is
    supposed to be. As far as I can see, SendBackupFiles() stores it in a
    variable which is then used for exactly nothing, and nothing else uses
    it.  It seems like that would be part of a potential incremental
    backup feature, but I don't see what it's got to do with parallel full
    backup.
    
    The tablespace_path option appears entirely unused, and I don't know
    why that should be necessary here, either.
    
    STORE_BACKUPFILE() seems like maybe it should be a function rather
    than a macro, and also probably be renamed, because it doesn't store
    files and the argument's not necessarily a file.
    
    SendBackupManifest() does not send a backup manifest in the sense
    contemplated by the email thread on that subject.  It sends a file
    list.  That seems like the right idea - IMHO, anyway - but you need to
    do a thorough renaming.
    
    I think it would be fine to decide that this facility won't support
    exclusive-mode backup.
    
    I don't think much of having both sendDir() and sendDir_(). The latter
    name is inconsistent with any naming convention we have, and there
    seems to be no reason not to just add an argument to sendDir() and
    change the callers.
    
    I think we should rename - perhaps as a preparatory patch - the
    sizeonly flag to dryrun, or something like that.
    
    The resource cleanup does not look right.  You've included calls to
    PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
    and StopBackup(), but what happens if there is an error or even a
    clean shutdown of the connection in between? I think that there needs
    to be some change here to ensure that a walsender will always call
    base_backup_cleanup() when it exits; I think that'd probably remove
    the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
    we have already.  This might also be something that could be done as a
    separate, preparatory refactoring patch.
    
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  29. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-10-30T14:16:11Z

    On Mon, Oct 28, 2019 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    > > I have updated the patch to include the changes suggested by Jeevan.
    > This patch also implements the thread workers instead of
    > > processes and fetches a single file at a time. The tar format has been
    > disabled for first version of parallel backup.
    >
    > Looking at 0001-0003:
    >
    > It's not clear to me what the purpose of the start WAL location is
    > supposed to be. As far as I can see, SendBackupFiles() stores it in a
    > variable which is then used for exactly nothing, and nothing else uses
    > it.  It seems like that would be part of a potential incremental
    > backup feature, but I don't see what it's got to do with parallel full
    > backup.
    >
    
    'startptr' is used by sendFile() during checksum verification. Since
    SendBackupFiles() uses sendFile(), we have to set a valid WAL location.
    
    
    > The tablespace_path option appears entirely unused, and I don't know
    > why that should be necessary here, either.
    >
    
    This is to calculate the basepathlen. We need to exclude the tablespace
    location (or base path) from the filename before it is sent to the client
    with the sendFile() call. I added this option primarily to avoid performing
    string manipulation on the filename to extract the tablespace location and
    then calculate the basepathlen.
    
    Alternatively, we can do it by extracting the base path from the received
    filename. What do you suggest?
    
    
    >
    > STORE_BACKUPFILE() seems like maybe it should be a function rather
    > than a macro, and also probably be renamed, because it doesn't store
    > files and the argument's not necessarily a file.
    >
    Sure.
    
    
    >
    > SendBackupManifest() does not send a backup manifest in the sense
    > contemplated by the email thread on that subject.  It sends a file
    > list.  That seems like the right idea - IMHO, anyway - but you need to
    > do a thorough renaming.
    >
    
    I'm considering the following command names:
    START_BACKUP
    - Starts the backup process
    
    SEND_BACKUP_FILELIST (instead of SEND_BACKUP_MANIFEST)
    - Sends the list of all files to be backed up, along with information
    about each file: filename, file type (directory/file/link), file size
    and file mtime.
    
    SEND_BACKUP_FILES
    - Sends one or more files to the client.
    
    STOP_BACKUP
    - Stops the backup process.
    
    I'll update the function names accordingly after your confirmation. Of
    course, suggestions for better names are welcome.
    
    
    >
    > I think it would be fine to decide that this facility won't support
    > exclusive-mode backup.
    >
    
    Sure. Will drop this patch.
    
    
    >
    > I don't think much of having both sendDir() and sendDir_(). The latter
    > name is inconsistent with any naming convention we have, and there
    > seems to be no reason not to just add an argument to sendDir() and
    > change the callers.
    
    
    > I think we should rename - perhaps as a preparatory patch - the
    > sizeonly flag to dryrun, or something like that.
    >
    
    Sure, will take care of it.
    
    
    > The resource cleanup does not look right.  You've included calls to
    > PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
    > and StopBackup(), but what happens if there is an error or even a
    > clean shutdown of the connection in between? I think that there needs
    > to be some change here to ensure that a walsender will always call
    > base_backup_cleanup() when it exits; I think that'd probably remove
    > the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
    > we have already.  This might also be something that could be done as a
    > separate, preparatory refactoring patch.
    >
    
    You're right. I didn't handle this case properly. I will remove the
    PG_ENSURE_ERROR_CLEANUP calls and replace them with a before_shmem_exit
    handler. This way, whenever the backend process exits,
    base_backup_cleanup will be called:
    - If it exits before calling do_pg_stop_backup, base_backup_cleanup
    will take care of the cleanup.
    - Otherwise, in case of a clean shutdown (after calling
    do_pg_stop_backup), base_backup_cleanup will simply return without
    doing anything.
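    
    The two cases above can be sketched in standalone C as follows. This is
    an assumption-laden illustration, not server code: standard atexit()
    stands in for registering a before_shmem_exit handler, and every name
    ending in _sketch, as well as the two global variables, is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* True between start and stop of a backup (hypothetical state flag). */
bool backup_in_progress = false;

/* Counts how often the handler actually had something to clean up. */
int cleanup_runs = 0;

/*
 * Exit handler: cleans up only if the backup was started but never
 * stopped; after a clean stop it returns without doing anything.
 */
void
base_backup_cleanup_sketch(void)
{
    if (!backup_in_progress)
        return;

    backup_in_progress = false;
    cleanup_runs++;             /* real code would abort the backup here */
}

/* Register the handler once, then mark the backup as running. */
void
start_backup_sketch(void)
{
    static bool registered = false;

    if (!registered)
    {
        /* stand-in for registering a before_shmem_exit handler */
        atexit(base_backup_cleanup_sketch);
        registered = true;
    }
    backup_in_progress = true;
}

/* Clean stop: clearing the flag turns the exit handler into a no-op. */
void
stop_backup_sketch(void)
{
    assert(backup_in_progress);
    backup_in_progress = false;
}
```

    Because the handler is registered once and stays registered, it does not
    matter whether the process exits during the start, between the commands,
    or after the stop; the flag alone decides whether there is work to do.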
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  30. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-11-01T15:26:02Z

    The updated patches are attached.
    
    Thanks,
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  31. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2019-11-01T15:52:51Z

    On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > 'startptr' is used by sendFile() during checksum verification. Since
    > SendBackupFiles() is using sendFIle we have to set a valid WAL location.
    
    Ugh, global variables.
    
    Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
    STOP_BACKUP all using the same base_backup_opt_list production as
    BASE_BACKUP? Presumably most of those options are not applicable to
    most of those commands, and the productions should therefore be
    separated.
    
    You should add docs, too.  I wouldn't have to guess what some of this
    stuff was for if you wrote documentation explaining what this stuff
    was for. :-)
    
    >> The tablespace_path option appears entirely unused, and I don't know
    >> why that should be necessary here, either.
    >
    > This is to calculate the basepathlen. We need to exclude the tablespace location (or
    > base path) from the filename before it is sent to the client with sendFile call. I added
    > this option primarily to avoid performing string manipulation on filename to extract the
    > tablespace location and then calculate the basepathlen.
    >
    > Alternatively we can do it by extracting the base path from the received filename. What
    > do you suggest?
    
    I don't think the server needs any information from the client in
    order to be able to exclude the tablespace location from the pathname.
    Whatever it needs to know, it should be able to figure out, just as it
    would in a non-parallel backup.
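    
    One way the server could derive the relative name on its own is sketched
    below, under the assumption that it already has the list of tablespace
    locations at hand. relative_backup_path() and its parameters are
    hypothetical and do not appear in the patch; the sketch only shows that
    the longest matching location is enough to find the prefix to strip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the part of filename that follows the longest matching root
 * directory, or filename itself when no root matches. A root matches
 * only on a directory boundary, so "/mnt/ts1" does not match
 * "/mnt/ts10/...".
 */
const char *
relative_backup_path(const char *filename,
                     const char *const *roots, size_t nroots)
{
    size_t      best = 0;

    assert(filename != NULL);

    for (size_t i = 0; i < nroots; i++)
    {
        size_t      len = strlen(roots[i]);

        if (len > best &&
            strncmp(filename, roots[i], len) == 0 &&
            filename[len] == '/')
            best = len;
    }

    /* skip the root and the slash that follows it */
    return best > 0 ? filename + best + 1 : filename;
}
```

    Names that are already relative to the data directory, such as
    base/1/1234, match no tablespace location and are returned unchanged.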
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  32. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-11-04T13:08:55Z

    On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    > > 'startptr' is used by sendFile() during checksum verification. Since
    > > SendBackupFiles() is using sendFIle we have to set a valid WAL location.
    >
    > Ugh, global variables.
    >
    > Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
    > STOP_BACKUP all using the same base_backup_opt_list production as
    > BASE_BACKUP? Presumably most of those options are not applicable to
    > most of those commands, and the productions should therefore be
    > separated.
    >
    
    Are you expecting something like the attached patch? Basically, I have
    reorganised the grammar rules so that each command can have the options
    it requires.
    
    I was a bit reluctant to make this change because it may add some
    unwanted grammar rules to the replication grammar. Since these commands
    use the same options as base backup, maybe we could instead throw an
    error inside the relevant functions on unwanted options?
    
    
    
    > You should add docs, too.  I wouldn't have to guess what some of this
    > stuff was for if you wrote documentation explaining what this stuff
    > was for. :-)
    >
    
    Yes I will add it in the next patch.
    
    
    >
    > >> The tablespace_path option appears entirely unused, and I don't know
    > >> why that should be necessary here, either.
    > >
    > > This is to calculate the basepathlen. We need to exclude the tablespace
    > location (or
    > > base path) from the filename before it is sent to the client with
    > sendFile call. I added
    > > this option primarily to avoid performing string manipulation on
    > filename to extract the
    > > tablespace location and then calculate the basepathlen.
    > >
    > > Alternatively we can do it by extracting the base path from the received
    > filename. What
    > > do you suggest?
    >
    > I don't think the server needs any information from the client in
    > order to be able to exclude the tablespace location from the pathname.
    > Whatever it needs to know, it should be able to figure out, just as it
    > would in a non-parallel backup.
    >
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  33. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-11-12T12:07:14Z

    I have updated the replication grammar with some new rules to
    differentiate the options production for base backup from those of the
    newly added commands.
    
    I have also created a separate patch to include the documentation changes.
    The current syntax is as below:
    
    - START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
    - STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
    - SEND_BACKUP_FILELIST
    - SEND_BACKUP_FILES ( 'FILE' [, ...] )  [ MAX_RATE rate ] [ NOVERIFY_CHECKSUMS ] [ START_WAL_LOCATION ]
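    
    As an illustration of the SEND_BACKUP_FILES syntax above, a client-side
    helper that assembles the command text for a batch of files might look
    like the sketch below. build_send_backup_files() is a hypothetical name,
    the optional trailing options are left out, and file names containing a
    single quote are simply rejected, since the quoting rules are not spelled
    out here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "SEND_BACKUP_FILES ( 'f1', 'f2' )" into buf.
 * Returns the length of the command, or -1 if the list is empty, a
 * name contains a single quote, or the buffer is too small.
 */
int
build_send_backup_files(char *buf, size_t buflen,
                        const char *const *files, size_t nfiles)
{
    size_t      used;
    int         n;

    assert(buf != NULL);

    if (nfiles == 0)
        return -1;

    n = snprintf(buf, buflen, "SEND_BACKUP_FILES (");
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    for (size_t i = 0; i < nfiles; i++)
    {
        /* no escaping is attempted in this sketch */
        if (strchr(files[i], '\'') != NULL)
            return -1;

        n = snprintf(buf + used, buflen - used, "%s'%s'",
                     i > 0 ? ", " : " ", files[i]);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, buflen - used, " )");
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;
    used += (size_t) n;

    return (int) used;
}
```

    The resulting string could then be sent over a replication connection,
    for example with PQexec() or PQsendQuery(), with each worker sending its
    own batch of file names.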
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  34. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-11-13T13:34:04Z

    On Tue, Nov 12, 2019 at 5:07 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Mon, Nov 4, 2019 at 6:08 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    >
    >>
    >>
    >> On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >>
    >>> On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com>
    >>> wrote:
    >>> > 'startptr' is used by sendFile() during checksum verification. Since
    >>> > SendBackupFiles() is using sendFIle we have to set a valid WAL
    >>> location.
    >>>
    >>> Ugh, global variables.
    >>>
    >>> Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
    >>> STOP_BACKUP all using the same base_backup_opt_list production as
    >>> BASE_BACKUP? Presumably most of those options are not applicable to
    >>> most of those commands, and the productions should therefore be
    >>> separated.
    >>>
    >>
    >> Are you expecting something like the attached patch? Basically I have
    >> reorganised the grammar
    >> rules so each command can have the options required by it.
    >>
    >> I was feeling a bit reluctant for this change because it may add some
    >> unwanted grammar rules in
    >> the replication grammar. Since these commands are using the same options
    >> as base backup, maybe
    >> we could throw an error inside the relevant functions on unwanted options?
    >>
    >>
    >>
    >>> You should add docs, too.  I wouldn't have to guess what some of this
    >>> stuff was for if you wrote documentation explaining what this stuff
    >>> was for. :-)
    >>>
    >>
    >> Yes I will add it in the next patch.
    >>
    >>
    >>>
    >>> >> The tablespace_path option appears entirely unused, and I don't know
    >>> >> why that should be necessary here, either.
    >>> >
    >>> > This is to calculate the basepathlen. We need to exclude the
    >>> tablespace location (or
    >>> > base path) from the filename before it is sent to the client with
    >>> sendFile call. I added
    >>> > this option primarily to avoid performing string manipulation on
    >>> filename to extract the
    >>> > tablespace location and then calculate the basepathlen.
    >>> >
    >>> > Alternatively we can do it by extracting the base path from the
    >>> received filename. What
    >>> > do you suggest?
    >>>
    >>> I don't think the server needs any information from the client in
    >>> order to be able to exclude the tablespace location from the pathname.
    >>> Whatever it needs to know, it should be able to figure out, just as it
    >>> would in a non-parallel backup.
    >>>
    >>> --
    >>> Robert Haas
    >>> EnterpriseDB: http://www.enterprisedb.com
    >>> The Enterprise PostgreSQL Company
    >>>
    >>
    >>
    > I have updated the replication grammar with some new rules to
    > differentiate the options production
    > for base backup and newly added commands.
    >
    > I have also created a separate patch to include the documentation changes.
    > The current syntax is as below:
    >
    > - START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
    > - STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
    > - SEND_BACKUP_FILELIST
    > - SEND_BACKUP_FILES ( 'FILE' [, ...] )  [ MAX_RATE rate ] [
    > NOVERIFY_CHECKSUMS ] [ START_WAL_LOCATION ]
    >
    >
    Sorry, I sent the wrong patches. Please see the correct version of the
    patches (_v6).
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  35. Re: WIP/PoC for parallel backup

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-11-27T08:38:31Z

    On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    > Sorry, I sent the wrong patches. Please see the correct version of the
    > patches (_v6).
    >
    
    Review comments on these patches:
    
    1.
    +    XLogRecPtr    wal_location;
    
    Looking at the other field names in basebackup_options structure, let's use
    wallocation instead. Or better startwallocation to be precise.
    
    2.
    +    int32        size;
    
    Should we use size_t here?
    
    3.
    I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
    Can't we return the file list with START_BACKUP itself?
    
    4.
    +        else if (
    +#ifndef WIN32
    +                 S_ISLNK(statbuf.st_mode)
    +#else
    +                 pgwin32_is_junction(pathbuf)
    +#endif
    +            )
    +        {
    +            /*
    +             * If symlink, write it as a directory. file symlinks only
    allowed
    +             * in pg_tblspc
    +             */
    +            statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
    +            _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
    false);
    +        }
    
    In normal backup mode, we skip any special file that is not a regular file,
    a directory, or a symlink inside pg_tblspc. But in your patch, the code
    above treats it as a directory. Should parallel backup skip such special
    files too?
    
    5.
    Please keep header file inclusions in alphabetical order in basebackup.c and
    pg_basebackup.c
    
    6.
    +        /*
    +         * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
    +         * 'base/1/1245/32683', ...) [options]
    +         */
    
    Please update these comments as we fetch one file at a time.
    
    7.
    +backup_file:
    +            SCONST                            { $$ = (Node *)
    makeString($1); }
    +            ;
    +
    
    Instead of having this rule with only one constant terminal, we can use
    SCONST directly in backup_files_list. However, I don't see any issue with
    this approach either, just trying to reduce the rules.
    
    8.
    Please indent code within 80 char limit at all applicable places.
    
    9.
    Please fix following typos:
    
    identifing => identifying
    optionaly => optionally
    structre => structure
    progrsss => progress
    Retrive => Retrieve
    direcotries => directories
    
    
    =====
    
    The other mail thread, on backup manifests [1], creates a backup_manifest
    file and sends it to the client. The manifest has an optional checksum and
    other details including filename, file size, mtime, etc. A patch on the
    same thread then validates the backup too.
    
    Since this patch also gets a file list from the server with similar
    details (except the checksum), can parallel backup somehow use the
    backup-manifest infrastructure from that patch?
    
    When the parallel backup is in use, will there be a backup_manifest file
    created too? I am just visualizing what will be the scenario when both these
    features are checked-in.
    
    [1]
    https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
    
    
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    Thanks
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  36. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2019-11-27T19:57:27Z

    On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
    <jeevan.chalke@enterprisedb.com> wrote:
    > I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
    > Can't we return the file list with START_BACKUP itself?
    
    I had the same thought, but I think it's better to keep them separate.
    Somebody might want to use the SEND_BACKUP_FILELIST command for
    something other than a backup (I actually think it should be called
    just SEND_FILE_LIST). Somebody might want to start a backup without
    getting a file list because they're going to copy the files at the FS
    level. Somebody might want to get a list of files to process after
    somebody else has started the backup on another connection. Or maybe
    nobody wants to do any of those things, but it doesn't seem to cost us
    much of anything to split the commands, so I think we should.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  37. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-12-10T14:33:48Z

    On Wed, Nov 27, 2019 at 1:38 PM Jeevan Chalke <
    jeevan.chalke@enterprisedb.com> wrote:
    
    >
    >
    > On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    >
    >>
    >> Sorry, I sent the wrong patches. Please see the correct version of the
    >> patches (_v6).
    >>
    >
    > Review comments on these patches:
    >
    > 1.
    > +    XLogRecPtr    wal_location;
    >
    > Looking at the other field names in basebackup_options structure, let's use
    > wallocation instead. Or better startwallocation to be precise.
    >
    > 2.
    > +    int32        size;
    >
    > Should we use size_t here?
    >
    > 3.
    > I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
    > Can't we return the file list with START_BACKUP itself?
    >
    > 4.
    > +        else if (
    > +#ifndef WIN32
    > +                 S_ISLNK(statbuf.st_mode)
    > +#else
    > +                 pgwin32_is_junction(pathbuf)
    > +#endif
    > +            )
    > +        {
    > +            /*
    > +             * If symlink, write it as a directory. file symlinks only
    > allowed
    > +             * in pg_tblspc
    > +             */
    > +            statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
    > +            _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
    > false);
    > +        }
    >
    > In normal backup mode, we skip the special file which is not a regular
    > file or
    > a directory or a symlink inside pg_tblspc. But in your patch, above code,
    > treats it as a directory. Should parallel backup too skip such special
    > files?
    >
    
    Yeah, going through the code again, I found it a little inconsistent. The
    SendBackupFiles function is supposed to send the files that were requested
    of it. However, it currently also performs these tasks:
    
    1) If the requested file is a directory, it returns a TAR directory entry.
    2) If the requested file is a symlink inside pg_tblspc, it returns the
    link path.
    3) As you pointed out above, if the requested file is a symlink outside
    pg_tblspc but inside PGDATA, it returns a TAR directory entry.
    
    I think this function should not take care of any of the above. Instead,
    the client (i.e. pg_basebackup) should manage it. SendBackupFiles should
    only send regular files and ignore requests for anything else, be it a
    directory or a symlink.
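    
    As a minimal sketch of the "regular files only" rule proposed above, the
    server-side check could look roughly like this. It is POSIX-only (the
    Windows junction case mentioned in the review is not covered), and the
    function name is_sendable_file is hypothetical, not something from the
    patch.
    
    ```c
    #define _POSIX_C_SOURCE 200809L     /* for lstat() */
    
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/stat.h>
    
    /*
     * Return true only if path names a regular file.  Directories, symlinks
     * and other special files are reported as not sendable, so the caller
     * can skip the request.
     */
    static bool
    is_sendable_file(const char *path)
    {
        struct stat st;
    
        assert(path != NULL);
    
        /* lstat() does not follow symlinks, so a link is seen as a link. */
        if (lstat(path, &st) != 0)
            return false;           /* missing or unreadable: nothing to send */
    
        return S_ISREG(st.st_mode) != 0;
    }
    ```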
    
    Any thoughts?
    
    
    > 5.
    > Please keep header file inclusions in alphabetical order in basebackup.c
    > and
    > pg_basebackup.c
    >
    > 6.
    > +        /*
    > +         * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
    > +         * 'base/1/1245/32683', ...) [options]
    > +         */
    >
    > Please update these comments as we fetch one file at a time.
    >
    > 7.
    > +backup_file:
    > +            SCONST                            { $$ = (Node *)
    > makeString($1); }
    > +            ;
    > +
    >
    > Instead of having this rule with only one constant terminal, we can use
    > SCONST directly in backup_files_list. However, I don't see any issue with
    > this approach either, just trying to reduce the rules.
    >
    > 8.
    > Please indent code within 80 char limit at all applicable places.
    >
    > 9.
    > Please fix following typos:
    >
    > identifing => identifying
    > optionaly => optionally
    > structre => structure
    > progrsss => progress
    > Retrive => Retrieve
    > direcotries => directories
    >
    >
    > =====
    >
    > The other mail thread related to backup manifest [1], is creating a
    > backup_manifest file and sends that to the client which has optional
    > checksum and other details including filename, file size, mtime, etc.
    > There is a patch on the same thread which is then validating the backup
    > too.
    >
    > Since this patch too gets a file list from the server and has similar
    > details (except checksum), can somehow parallel backup use the
    > backup-manifest
    > infrastructure from that patch?
    >
    
    This was discussed earlier in the thread, and as Robert suggested, it would
    complicate the
    code to no real benefit.
    
    
    > When the parallel backup is in use, will there be a backup_manifest file
    > created too? I am just visualizing what will be the scenario when both
    > these
    > features are checked-in.
    >
    
    Yes, I think it should. Since the full backup will have a manifest file,
    there is no
    reason for parallel backup to not support it.
    
    I'll share the updated patch in the next couple of days.
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  38. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-12-10T14:34:14Z

    On Thu, Nov 28, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
    > <jeevan.chalke@enterprisedb.com> wrote:
    > > I am still not sure why we need SEND_BACKUP_FILELIST as a separate
    > command.
    > > Can't we return the file list with START_BACKUP itself?
    >
    > I had the same thought, but I think it's better to keep them separate.
    > Somebody might want to use the SEND_BACKUP_FILELIST command for
    > something other than a backup (I actually think it should be called
    > just SEND_FILE_LIST).
    
    
    Sure. Thanks for the recommendation. To keep the function names in sync, I
    intend to do the following renamings:
    - SEND_BACKUP_FILES --> SEND_FILES
    - SEND_BACKUP_FILELIST --> SEND_FILE_LIST
    
    > Somebody might want to start a backup without
    > getting a file list because they're going to copy the files at the FS
    > level. Somebody might want to get a list of files to process after
    > somebody else has started the backup on another connection. Or maybe
    > nobody wants to do any of those things, but it doesn't seem to cost us
    > much of anything to split the commands, so I think we should.
    >
    
    +1
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  39. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2019-12-12T15:19:57Z

    On Tue, Dec 10, 2019 at 7:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Thu, Nov 28, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com>
    > wrote:
    >
    >> On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
    >> <jeevan.chalke@enterprisedb.com> wrote:
    >> > I am still not sure why we need SEND_BACKUP_FILELIST as a separate
    >> command.
    >> > Can't we return the file list with START_BACKUP itself?
    >>
    >> I had the same thought, but I think it's better to keep them separate.
    >> Somebody might want to use the SEND_BACKUP_FILELIST command for
    >> something other than a backup (I actually think it should be called
    >> just SEND_FILE_LIST).
    >
    >
    > Sure. Thanks for the recommendation. To keep the function names in sync, I
    > intend to do the following renamings:
    > - SEND_BACKUP_FILES --> SEND_FILES
    > - SEND_BACKUP_FILELIST --> SEND_FILE_LIST
    >
    >> Somebody might want to start a backup without
    >> getting a file list because they're going to copy the files at the FS
    >> level. Somebody might want to get a list of files to process after
    >> somebody else has started the backup on another connection. Or maybe
    >> nobody wants to do any of those things, but it doesn't seem to cost us
    >> much of anything to split the commands, so I think we should.
    >>
    >
    > +1
    >
    >
    I have updated the patches (v7 attached) and have taken care of all the
    issues pointed out by Jeevan; additionally, I ran pgindent on each patch.
    Furthermore, the command names have been renamed as suggested, and I have
    simplified the SendFiles function. The client can only request regular
    files; any other kind, such as directories or symlinks, will be skipped,
    and the client is responsible for taking care of those.
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  40. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2019-12-19T17:47:22Z

    On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > I have updated the patches (v7 attached) and have taken care of all issues pointed by Jeevan, additionally
    > ran the pgindent on each patch. Furthermore, Command names have been renamed as suggested and I
    > have simplified the SendFiles function. Client can only request the regular files, any other kind such as
    > directories or symlinks will be skipped, the client will be responsible for taking care of such.
    
    Hi,
    
    Patch 0001 of this series conflicts with my recent commit
    303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
    inspired by some previous study of 0001. That being said, I think 0001
    has the wrong idea. There's no reason that I can see why it should be
    correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
    perform_base_backup(). It's true that if we register a long-lived
    before_shmem_exit hook, then the backup will get cleaned up even
    without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
    question of the warning message. I think that our goal should be to
    emit the warning message about a backup being stopped too early if the
    user uses either pg_start_backup() or the new START_BACKUP command and
    does not end the backup with either pg_stop_backup() or the new
    STOP_BACKUP command -- but not if a single command that both starts
    and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
    that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
    need to temporarily register do_pg_abort_backup() as a
    before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
    commands like BASE_BACKUP() -- and for things like pg_start_backup()
    or the new START_BACKUP command, we just need to add a single call to
    register_persistent_abort_backup_handler().
    
    So I think you can drop 0001, and then in the patch that actually
    introduces START_BACKUP, add the call to
    register_persistent_abort_backup_handler() before calling
    do_pg_start_backup(). Also in that patch, also adjust the warning text
    that do_pg_abort_backup() emits to be more generic e.g. "aborting
    backup due to backend exiting while a non-exclusive backup is in
    progress".
    
    0003 creates three new functions, moving code from
    do_pg_start_backup() to a new function collectTablespaces() and from
    perform_base_backup() to new functions setup_throttle() and
    include_wal_files(). I'm skeptical about all of these changes. One
    general nitpick is that the way these function names are capitalized
    and punctuated does not seem to have been chosen very consistently;
    how about name_like_this() throughout? A bit more substantively:
    
    - collectTablespaces() is factored out of do_pg_start_backup() so that
    it can also be used by SendFileList(), but that means that a client is
    going to invoke START_BACKUP, indirectly calling collectTablespaces(),
    and then immediately afterward the client is probably going to call
    SEND_FILE_LIST, which will again call collectTablespaces(). That does
    not appear to be super-great. For one thing, it's duplicate work,
    although because SendFileList() is going to pass infotbssize as false,
    it's not a lot of duplicated work. Also, what happens if the two calls
    to collectTablespaces() return different answers due to concurrent
    CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
    it seems like there is at least the possibility of bugs if different
    parts of the backup have different notions of what tablespaces exist.
    
    - setup_throttle() is factored out of perform_base_backup() so that it
    can be called in StartBackup() and StopBackup() and SendFiles(). This
    seems extremely odd. Why does it make any sense to give the user an
    option to activate throttling when *ending* a backup? Why does it make
    sense to give the user a chance to enable throttling *both* at the
    startup of a backup *and also* for each individual file. If we're
    going to support throttling here, it seems like it should be either a
    backup-level property or a file-level property, not both.
    
    - include_wal_files() is factored out of perform_base_backup() so that
    it can be called by StopBackup(). This seems like a poor design
    decision. The idea behind the BASE_BACKUP command is that you run that
    one command, and the server sends you everything. The idea in this new
    way of doing business is that the client requests the individual files
    it wants -- except for the WAL files, which are for some reason not
    requested individually but sent all together as part of the
    STOP_BACKUP response. It seems like it would be more consistent if the
    client were to decide which WAL files it needs and request them one by
    one, just as we do with other files.
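    
    To make the "client decides which WAL files it needs" idea concrete, here
    is a small sketch of how a client could derive the segment file names
    covering a start/stop LSN pair. It assumes the default 16MB segment size
    and a single timeline, and it mirrors the server's 24-hex-digit segment
    naming from memory; the function names are hypothetical and nothing like
    this appears in the patch.
    
    ```c
    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    
    #define WAL_SEG_SIZE    (UINT64_C(16) * 1024 * 1024)    /* assumed default */
    #define WAL_FNAME_LEN   25                      /* 24 hex digits + NUL */
    
    /* Format the segment file name for a timeline and segment number. */
    static void
    wal_file_name(char *out, uint32_t tli, uint64_t segno)
    {
        uint64_t    segs_per_id = UINT64_C(0x100000000) / WAL_SEG_SIZE;
    
        snprintf(out, WAL_FNAME_LEN, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
                 tli, (uint32_t) (segno / segs_per_id),
                 (uint32_t) (segno % segs_per_id));
    }
    
    /*
     * Store the first and last segment numbers needed for a backup that
     * started at start_lsn and stopped at end_lsn, and return how many
     * segments that is.  end_lsn is treated as exclusive, so a stop
     * location exactly on a segment boundary does not pull in the next
     * segment.
     */
    static uint64_t
    wal_segment_range(uint64_t start_lsn, uint64_t end_lsn,
                      uint64_t *first, uint64_t *last)
    {
        assert(end_lsn > 0 && start_lsn <= end_lsn);
    
        *first = start_lsn / WAL_SEG_SIZE;
        *last = (end_lsn - 1) / WAL_SEG_SIZE;
        return (*last >= *first) ? (*last - *first + 1) : 0;
    }
    ```
    
    A client could then loop from the first to the last segment number and
    request each name individually, the same way it requests data files.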
    
    I think there's a common theme to all of these complaints, which is
    that you haven't done enough to move things that are the
    responsibility of the backend in the BASE_BACKUP model to the frontend
    in this model. I started wondering, for example, whether it might not
    be better to have the client rather than the server construct the
    tablespace_map file. After all, the client needs to get the list of
    files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
    almost enough to construct the tablespace map. The only additional
    thing it needs is the full pathname to which the link points. But, it
    seems that we could fairly easily extend SEND_FILE_LIST to send, for
    files that are symbolic links, the target of the link, using a new
    column. Or alternatively, using a separate command, so that instead of
    just sending a single SEND_FILE_LIST command, the client might first
    ask for a tablespace list and then might ask for a list of files
    within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
    for each tablespace, with 0 for the main tablespace, perhaps). I'm not
    sure which way is better.
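    
    A rough sketch of the client-side tablespace_map idea follows. It assumes
    the file list has been extended to report, for each symlink in pg_tblspc,
    the link target (the proposal above, not something the patch does today).
    The tablespace_link struct and build_tablespace_map function are
    hypothetical, the one-line-per-tablespace "OID path" layout is my
    understanding of what the server writes, and escaping of newlines in
    paths is left out.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>
    
    /* Hypothetical row from an extended file list: one pg_tblspc symlink. */
    typedef struct
    {
        unsigned int oid;           /* tablespace OID (the link name) */
        const char *target;         /* absolute path the link points to */
    } tablespace_link;
    
    /*
     * Write one "OID path" line per tablespace into buf.
     * Returns 0 on success, -1 if buf is too small.
     */
    static int
    build_tablespace_map(char *buf, size_t cap,
                         const tablespace_link *links, int nlinks)
    {
        size_t      len = 0;
        int         i;
    
        assert(buf != NULL && cap > 0);
        buf[0] = '\0';
    
        for (i = 0; i < nlinks; i++)
        {
            int         n = snprintf(buf + len, cap - len, "%u %s\n",
                                     links[i].oid, links[i].target);
    
            if (n < 0 || (size_t) n >= cap - len)
                return -1;
            len += (size_t) n;
        }
        return 0;
    }
    ```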
    
    Similarly, for throttling, I have a hard time understanding how what
    you've got here is going to work reasonably. It looks like each client
    is just going to request whatever MAX_RATE the user specifies, but the
    result of that will be that the actual transfer rate is probably a
    multiple of the specified rate, approximately equal to the specified
    rate times the number of clients. That's probably not what the user
    wants. You could take the specified rate and divide it by the number
    of workers, but limiting each of 4 workers to a quarter of the rate
    will probably lead to a combined rate of less than the specified
    rate, because if one worker doesn't use all of the bandwidth to which
    it's entitled, or even exits earlier than the others, the other
    workers don't get to go any faster as a result. Another problem is
    that, in the current approach, throttling applies overall to the
    entire backup, but in this approach, it is applied separately to each
    SEND_FILES command. In the current approach, if one file finishes a
    little faster or slower than anticipated, the next file in the tarball
    will be sent a little slower or faster to compensate. But in this
    approach, each SEND_FILES command is throttled separately, so this
    property is lost. Furthermore, while BASE_BACKUP sends data
    continuously, this approach naturally involves pauses between
    commands. If files are large, that won't matter much, but if they're
    small and numerous, it will tend to cause the actual transfer rate to
    be less than the throttling rate.
    
    One potential way to solve this problem is... move it to the client
    side. Instead of making it the server's job not to send data too fast,
    make it the client's job not to receive data too fast. Let the server
    backends write as fast as they want, and on the pg_basebackup side,
    have the threads coordinate with each other so that they don't read
    data faster than the configured rate. That's not quite the same thing,
    though, because the server can get ahead by the size of the client's
    receive buffers plus whatever data is on the wire. I don't know
    whether that's a big enough problem to be worth caring about. If it
    is, then I think we need some server infrastructure to "group
    throttle" a group of cooperating backends.
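    
    A minimal sketch of the client-side variant, assuming all workers account
    their received bytes against one shared counter so that a slow or
    finished worker leaves its unused budget to the others. The
    shared_throttle struct and throttle_account function are hypothetical.
    Real worker threads would have to protect the counter with a mutex or
    atomics, and the caveat above still applies: the server can run ahead by
    whatever fits in the receive buffers and on the wire.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    
    /* Hypothetical state shared by all pg_basebackup workers. */
    typedef struct
    {
        uint64_t    max_rate;       /* combined cap, in bytes per second */
        uint64_t    total_bytes;    /* bytes received by all workers so far */
    } shared_throttle;
    
    /*
     * Record nbytes just received by any worker and return how many
     * microseconds the caller should sleep before reading again, given
     * elapsed_us microseconds since the backup started.  Not thread-safe
     * as written; callers must serialize access.
     */
    static uint64_t
    throttle_account(shared_throttle *t, uint64_t nbytes, uint64_t elapsed_us)
    {
        uint64_t    due_us;
    
        assert(t != NULL && t->max_rate > 0);
        t->total_bytes += nbytes;
    
        /*
         * Earliest time at which total_bytes may have arrived at the cap.
         * Split into quotient and remainder to avoid overflowing 64 bits.
         */
        due_us = (t->total_bytes / t->max_rate) * UINT64_C(1000000) +
            (t->total_bytes % t->max_rate) * UINT64_C(1000000) / t->max_rate;
    
        return (due_us > elapsed_us) ? due_us - elapsed_us : 0;
    }
    ```
    
    Because the budget is computed from the combined byte count rather than
    per command, a file that finishes early is compensated for by the next
    read, which is the property the per-command MAX_RATE loses.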
    
    A general comment about 0004 is that it seems like you've proceeded by
    taking the code from perform_base_backup() and spreading it across
    several different functions without, necessarily, as much thought as
    is needed there. For instance, StartBackup() looks like just the
    beginning of perform_base_backup(). But, why shouldn't it instead look
    like pg_start_backup() -- in fact, a simplified version that only
    handles the non-exclusive backup case? Is the extra stuff it's doing
    really appropriate? I've already complained about the
    tablespace-related stuff here and the throttling, but there's more.
    Setting statrelpath here will probably break if somebody tries to use
    SEND_FILES without first calling START_BACKUP. Sending the
    backup_label file here is oddly asymmetric, because that's done by
    pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
    looks like it's just the end of perform_base_backup(), and that's
    pretty strange-looking too. Again, I've already complained about
    include_wal_files() being part of this, but there's also:
    
    +       /* ... and pg_control after everything else. */
    
    ...which (1) is an odd thing to say when this is the first thing this
    particular function is to send and (2) is another example of a sloppy
    division of labor between client and server; apparently, the client is
    supposed to know not to request pg_control, because the server is
    going to send it unsolicited. There's no particular reason to have
    this a special case. The client could just request it last. And then
    the server code wouldn't need a special case, and you wouldn't have
    this odd logic split between the client and the server.
    
    Overall, I think this needs a lot more work. The overall idea's not
    wrong, but there seem to be a very large number of details which, at
    least to me, do not seem to be correct.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  41. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-01-04T06:53:52Z

    On Thu, Dec 19, 2019 at 10:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    > > I have updated the patches (v7 attached) and have taken care of all
    > issues pointed by Jeevan, additionally
    > > ran the pgindent on each patch. Furthermore, Command names have been
    > renamed as suggested and I
    > > have simplified the SendFiles function. Client can only request the
    > regular files, any other kind such as
    > > directories or symlinks will be skipped, the client will be responsible
    > for taking care of such.
    >
    > Hi,
    >
    > Patch 0001 of this series conflicts with my recent commit
    > 303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
    > inspired by some previous study of 0001. That being said, I think 0001
    > has the wrong idea. There's no reason that I can see why it should be
    > correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
    > perform_base_backup(). It's true that if we register a long-lived
    > before_shmem_exit hook, then the backup will get cleaned up even
    > without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
    > question of the warning message. I think that our goal should be to
    > emit the warning message about a backup being stopped too early if the
    > user uses either pg_start_backup() or the new START_BACKUP command and
    > does not end the backup with either pg_stop_backup() or the new
    > STOP_BACKUP command -- but not if a single command that both starts
    > and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
    > that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
    > need to temporarily register do_pg_abort_backup() as a
    > before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
    > commands like BASE_BACKUP() -- and for things like pg_start_backup()
    > or the new START_BACKUP command, we just need to add a single call to
    > register_persistent_abort_backup_handler().
    >
    > So I think you can drop 0001, and then in the patch that actually
    > introduces START_BACKUP, add the call to
    > register_persistent_abort_backup_handler() before calling
    > do_pg_start_backup(). Also in that patch, also adjust the warning text
    > that do_pg_abort_backup() emits to be more generic e.g. "aborting
    > backup due to backend exiting while a non-exclusive backup is in
    > progress".
    >
    
    Sure, will do.
    
    
    > 0003 creates three new functions, moving code from
    > do_pg_start_backup() to a new function collectTablespaces() and from
    > perform_base_backup() to new functions setup_throttle() and
    > include_wal_files(). I'm skeptical about all of these changes. One
    > general nitpick is that the way these function names are capitalized
    > and punctuated does not seem to have been chosen very consistently;
    > how about name_like_this() throughout? A bit more substantively:
    >
    > - collectTablespaces() is factored out of do_pg_start_backup() so that
    > it can also be used by SendFileList(), but that means that a client is
    > going to invoke START_BACKUP, indirectly calling collectTablespaces(),
    > and then immediately afterward the client is probably going to call
    > SEND_FILE_LIST, which will again call collectTablespaces(). That does
    > not appear to be super-great. For one thing, it's duplicate work,
    > although because SendFileList() is going to pass infotbssize as false,
    > it's not a lot of duplicated work.
    
    
    I'll remove this duplication by eliminating this call from START_BACKUP and
    SEND_FILE_LIST functions. More about this is explained later in this email.
    
    
    > Also, what happens if the two calls
    > to collectTablespaces() return different answers due to concurrent
    > CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
    > it seems like there is at least the possibility of bugs if different
    > parts of the backup have different notions of what tablespaces exist.
    >
    
    Concurrent CREATE/DROP TABLESPACE commands can happen, and the result will
    be resolved by the WAL files collected for the backup. I don't think we
    can do anything when objects are created or dropped in between start and
    stop backup. BASE_BACKUP also relies on the WAL files to handle such a
    scenario and does not error out when some relation files go away.
    
    
    >
    > - setup_throttle() is factored out of perform_base_backup() so that it
    > can be called in StartBackup() and StopBackup() and SendFiles(). This
    > seems extremely odd. Why does it make any sense to give the user an
    > option to activate throttling when *ending* a backup? Why does it make
    > sense to give the user a chance to enable throttling *both* at the
    > startup of a backup *and also* for each individual file. If we're
    > going to support throttling here, it seems like it should be either a
    > backup-level property or a file-level property, not both.
    >
    
    It's a file-level property only. The throttle functionality relies on
    global variables. StartBackup() and StopBackup() call setup_throttle()
    only to disable throttling.
    
    I should have been more explicit here by passing -1 to setup_throttle(),
    illustrating that throttling is disabled, instead of passing
    'opt->maxrate' (although it defaults to -1 for these functions).
    
    I'll remove the setup_throttle() call from both functions.
    
    
    >
    > - include_wal_files() is factored out of perform_base_backup() so that
    > it can be called by StopBackup(). This seems like a poor design
    > decision. The idea behind the BASE_BACKUP command is that you run that
    > one command, and the server sends you everything. The idea in this new
    > way of doing business is that the client requests the individual files
    > it wants -- except for the WAL files, which are for some reason not
    > requested individually but sent all together as part of the
    > STOP_BACKUP response. It seems like it would be more consistent if the
    > client were to decide which WAL files it needs and request them one by
    > one, just as we do with other files.
    >
    
    As I understand it, you are suggesting adding another command to fetch the
    list of WAL files, which the client would call after executing stop
    backup. Once the client gets that list, it starts requesting the WAL files
    one by one.
    
    So I will add a LIST_WAL_FILES command that takes start_lsn and end_lsn
    as arguments and returns the list of WAL files between these LSNs.
    
    Something like this:
    LIST_WAL_FILES 'start_lsn' 'end_lsn';
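    
    As a rough illustration of what such a command would have to compute,
    here is a minimal sketch that maps an LSN range to WAL segment file
    names. It is not taken from the patch. The function names, the default
    timeline of 1, and the 16MB segment size are assumptions made for the
    example. The end LSN is treated as belonging to the previous segment
    when it sits exactly on a segment boundary.

```python
# Sketch only: map an LSN range to WAL segment file names.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size (16MB)


def parse_lsn(text):
    """Convert an 'X/X' LSN string into a 64-bit integer position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline, segno, seg_size=WAL_SEGMENT_SIZE):
    """Build the 24-hex-digit name: timeline, log id, segment within log."""
    segs_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


def list_wal_files(start_lsn, end_lsn, timeline=1, seg_size=WAL_SEGMENT_SIZE):
    """Return the segment names covering start_lsn up to end_lsn."""
    start = parse_lsn(start_lsn)
    end = parse_lsn(end_lsn)
    if end < start:
        raise ValueError("end_lsn is before start_lsn")
    first = start // seg_size
    # An end LSN exactly on a boundary belongs to the previous segment.
    last = max(first, (end - 1) // seg_size)
    return [wal_file_name(timeline, s, seg_size) for s in range(first, last + 1)]
```

    For example, list_wal_files('0/2000028', '0/4000000') yields the two
    segments 000000010000000000000002 and 000000010000000000000003.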
    
    
    >
    > I think there's a common theme to all of these complaints, which is
    > that you haven't done enough to move things that are the
    > responsibility of the backend in the BASE_BACKUP model to the frontend
    > in this model. I started wondering, for example, whether it might not
    > be better to have the client rather than the server construct the
    > tablespace_map file. After all, the client needs to get the list of
    > files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
    > almost enough to construct the tablespace map. The only additional
    > thing it needs is the full pathname to which the link points. But, it
    > seems that we could fairly easily extend SEND_FILE_LIST to send, for
    > files that are symbolic links, the target of the link, using a new
    > column. Or alternatively, using a separate command, so that instead of
    > just sending a single SEND_FILE_LIST command, the client might first
    > ask for a tablespace list and then might ask for a list of files
    > within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
    > for each tablespace, with 0 for the main tablespace, perhaps). I'm not
    > sure which way is better.
    >
    
    do_pg_start_backup() collects the tablespace information anyway to
    build the tablespace_map for BASE_BACKUP, so returning the same seemed
    better than adding a new command for the same information. Hence the
    multiple calls to collectTablespaces() [to be renamed to
    collect_tablespaces()].
    
    tablespace_map can be constructed by the client, but BASE_BACKUP
    returns it as part of the full backup. If clients in parallel mode
    construct it themselves, we will end up with two different
    approaches. Perhaps this should be done for BASE_BACKUP as well?
    
    I'll refactor do_pg_start_backup() to remove the code related to
    tablespace information collection (moving it to collect_tablespaces())
    and tablespace_map file creation, so that this function does not collect
    this information unnecessarily. perform_base_backup() can collect and
    send the tablespace information to the client, and the client can then
    construct the tablespace_map file.
    
    I'll add a new command, LIST_TABLESPACES, which will return the
    tablespace information to the client for parallel mode. I will also
    refactor the START_BACKUP and STOP_BACKUP commands so that they only do
    the specific job of putting the system into or out of backup mode,
    nothing else. These commands should return only the start and end LSN
    to the client.
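    
    To make the client-side construction concrete, here is a minimal
    sketch. It assumes the tablespace list arrives as (oid, path) pairs,
    that the main data directory is reported with no path, and that
    tablespace_map holds one "<oid> <path>" line per tablespace. The
    function name and the input shape are hypothetical, and paths
    containing newlines would need escaping, which this sketch omits.

```python
def build_tablespace_map(tablespaces):
    """Render tablespace_map contents from (oid, link_target) pairs.

    Entries without a path (assumed here to be the main data directory)
    are skipped, since only symlinked tablespaces need a map entry.
    """
    lines = []
    for oid, path in tablespaces:
        if path is None:
            continue
        lines.append("%s %s\n" % (oid, path))
    return "".join(lines)
```

    With the input [(16384, '/mnt/ts1'), (0, None)] this produces the
    single line "16384 /mnt/ts1".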
    
    
    
    >
    > Similarly, for throttling, I have a hard time understanding how what
    > you've got here is going to work reasonably. It looks like each client
    > is just going to request whatever MAX_RATE the user specifies, but the
    > result of that will be that the actual transfer rate is probably a
    > multiple of the specified rate, approximately equal to the specified
    > rate times the number of clients. That's probably not what the user
    > wants. You could take the specified rate and divide it by the number
    > of workers, but limiting each of 4 workers to a quarter of the rate
    > will probably lead to a combined rate of less than the specified
    > rate, because if one worker doesn't use all of the bandwidth to which
    > it's entitled, or even exits earlier than the others, the other
    > workers don't get to go any faster as a result. Another problem is
    > that, in the current approach, throttling applies overall to the
    > entire backup, but in this approach, it is applied separately to each
    > SEND_FILE command. In the current approach, if one file finishes a
    > little faster or slower than anticipated, the next file in the tarball
    > will be sent a little slower or faster to compensate. But in this
    > approach, each SEND_FILES command is throttled separately, so this
    > property is lost. Furthermore, while BASE_BACKUP sends data
    > continuously, this approach naturally involves pauses between
    > commands. If files are large, that won't matter much, but if they're
    > small and numerous, it will tend to cause the actual transfer rate to
    > be less than the throttling rate.
    >
    > One potential way to solve this problem is... move it to the client
    > side. Instead of making it the server's job not to send data too fast,
    > make it the client's job not to receive data too fast. Let the server
    > backends write as fast as they want, and on the pg_basebackup side,
    > have the threads coordinate with each other so that they don't read
    > data faster than the configured rate. That's not quite the same thing,
    > though, because the server can get ahead by the size of the client's
    > receive buffers plus whatever data is on the wire. I don't know
    > whether that's a big enough problem to be worth caring about. If it
    > is, then I think we need some server infrastructure to "group
    > throttle" a group of cooperating backends.
    >
    
    That was a mistake in my code. maxrate should've been equally divided
    amongst all threads. I agree that we should move this to the client-side.
    When a thread exits, its share should also be equally divided amongst
    the remaining threads (i.e. recalculate maxrate for each remaining thread).
    
    Say we have 4 running threads, each allocated 25% of the bandwidth.
    Thread 1 exits. We recalculate bandwidth and assign the remaining 3 threads
    33.33% each. This solves one problem that you had identified. However,
    it doesn't solve the case where one (or more) threads are not fully
    consuming their allocated share. I'm not really sure how we can solve it
    properly. Suggestions are welcome.
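    
    The recalculation described above amounts to the following arithmetic.
    This is only an illustrative sketch. The function name and the use of
    thread ids as dictionary keys are assumptions, not part of the patch.

```python
def per_thread_rates(maxrate, active_threads):
    """Split maxrate evenly across the thread ids that are still running."""
    if not active_threads:
        return {}
    share = maxrate / len(active_threads)
    return {tid: share for tid in active_threads}


# Four threads start with a quarter each; after thread 1 exits, the
# remaining three are given a third each.
active = [1, 2, 3, 4]
rates_before = per_thread_rates(1000.0, active)
active.remove(1)
rates_after = per_thread_rates(1000.0, active)
```

    Note that this only redistributes the share of a thread that has
    exited. A thread that is alive but idle still holds on to its share.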
    
    
    >
    > A general comment about 0004 is that it seems like you've proceeded by
    > taking the code from perform_base_backup() and spreading it across
    > several different functions without, necessarily, as much thought as
    > is needed there. For instance, StartBackup() looks like just the
    > beginning of perform_base_backup(). But, why shouldn't it instead look
    > like pg_start_backup() -- in fact, a simplified version that only
    > handles the non-exclusive backup case? Is the extra stuff it's doing
    > really appropriate? I've already complained about the
    > tablespace-related stuff here and the throttling, but there's more.
    > Setting statrelpath here will probably break if somebody tries to use
    > SEND_FILES without first calling START_BACKUP. Sending the
    > backup_label file here is oddly asymmetric, because that's done by
    > pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
    > looks like it's just the end of perform_base_backup(), but that's
    > pretty strange-looking too. Again, I've already complained about
    > include_wal_files() being part of this, but there's also:
    >
    > +       /* ... and pg_control after everything else. */
    >
    > ...which (1) is an odd thing to say when this is the first thing this
    > particular function is to send and (2) is another example of a sloppy
    > division of labor between client and server; apparently, the client is
    > supposed to know not to request pg_control, because the server is
    > going to send it unsolicited. There's no particular reason to have
    > this a special case. The client could just request it last. And then
    > the server code wouldn't need a special case, and you wouldn't have
    > this odd logic split between the client and the server.
    >
    > Overall, I think this needs a lot more work. The overall idea's not
    > wrong, but there seem to be a very large number of details which, at
    > least to me, do not seem to be correct.
    >
    >
    
    Thank you Robert for the detailed review. I really appreciate your insights
    and very precise feedback.
    
    After the changes suggested above, the design at a high level will look
    something like this:
    
    === SEQUENTIAL EXECUTION ===
    START_BACKUP [LABEL | FAST]
    - Starts backup on the server
    - Returns the start LSN to client
    
    LIST_TABLESPACES
    - Sends a list of all tablespaces to client
    
    Loops over LIST_TABLESPACES
    - LIST_FILES [tablespace]
    - Sends file list for the given tablespace
    - Create a list of all files
    
    === PARALLEL EXECUTION ===
    Thread loop until the list of files is exhausted
    SEND_FILE <file(s)> [CHECKSUM | WAL_START_LOCATION]
    - If the checksum is enabled then WAL_START_LOCATION is required.
    - Can request server to send one or more files but we are requesting one at
    a time
    - Pick next file from list of files
    
    - Threads sleep after the list is exhausted
    - All threads are sleeping
    
    === SEQUENTIAL EXECUTION ===
    STOP_BACKUP [NOWAIT]
    - Stops backup mode
    - Return end LSN
    
    If --wal-method=fetch then
    LIST_WAL_FILES 'start_lsn' 'end_lsn'
    - Sends a list of WAL files between start LSN and end LSN
    
    === PARALLEL EXECUTION ===
    Thread loop until the list of WAL files is exhausted
    SEND_FILE <WAL file>
    - Can request server to send one or more files but we are requesting one
    WAL file at a time
    - Pick next file from list of WAL files
    
    - Threads terminate and set their status as completed/terminated
    
    === SEQUENTIAL EXECUTION ===
    Cleanup
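    
    The two parallel phases above share the same shape: a fixed set of
    threads drains a shared list, one file per request. A minimal sketch
    of that loop follows. The fetch callback is a hypothetical stand-in
    for issuing SEND_FILE over a worker's connection. For simplicity the
    sketch starts fresh threads on each call rather than putting threads
    to sleep between the data-file phase and the WAL-file phase.

```python
import queue
import threading


def run_workers(files, fetch, jobs):
    """Drain `files` with `jobs` threads; `fetch(name)` stands in for SEND_FILE."""
    pending = queue.Queue()
    for name in files:
        pending.put(name)

    done = []
    done_lock = threading.Lock()

    def worker():
        while True:
            try:
                name = pending.get_nowait()  # pick the next file from the list
            except queue.Empty:
                return  # list exhausted: this thread is finished
            fetch(name)
            with done_lock:
                done.append(name)

    threads = [threading.Thread(target=worker) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

    The caller would invoke run_workers() once with the data file list
    and, for --wal-method=fetch, once more with the WAL file list obtained
    after STOP_BACKUP.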
    
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  42. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-01-30T13:39:51Z

    On Sat, Jan 4, 2020 at 11:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Thu, Dec 19, 2019 at 10:47 PM Robert Haas <robertmhaas@gmail.com>
    > wrote:
    >
    >> On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >> > I have updated the patches (v7 attached) and have taken care of all
    >> issues pointed by Jeevan, additionally
    >> > ran the pgindent on each patch. Furthermore, Command names have been
    >> renamed as suggested and I
    >> > have simplified the SendFiles function. Client can only request the
    >> regular files, any other kind such as
    >> > directories or symlinks will be skipped, the client will be responsible
    >> for taking care of such.
    >>
    >> Hi,
    >>
    >> Patch 0001 of this series conflicts with my recent commit
    >> 303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
    >> inspired by some previous study of 0001. That being said, I think 0001
    >> has the wrong idea. There's no reason that I can see why it should be
    >> correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
    >> perform_base_backup(). It's true that if we register a long-lived
    >> before_shmem_exit hook, then the backup will get cleaned up even
    >> without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
    >> question of the warning message. I think that our goal should be to
    >> emit the warning message about a backup being stopped too early if the
    >> user uses either pg_start_backup() or the new START_BACKUP command and
    >> does not end the backup with either pg_stop_backup() or the new
    >> STOP_BACKUP command -- but not if a single command that both starts
    >> and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
    >> that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
    >> need to temporarily register do_pg_abort_backup() as a
    >> before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
    >> commands like BASE_BACKUP() -- and for things like pg_start_backup()
    >> or the new START_BACKUP command, we just need to add a single call to
    >> register_persistent_abort_backup_handler().
    >>
    >> So I think you can drop 0001, and then in the patch that actually
    >> introduces START_BACKUP, add the call to
    >> register_persistent_abort_backup_handler() before calling
    >> do_pg_start_backup(). Also in that patch, also adjust the warning text
    >> that do_pg_abort_backup() emits to be more generic e.g. "aborting
    >> backup due to backend exiting while a non-exclusive backup is in
    >> progress".
    >>
    >> Sure. will do.
    >
    >
    >> 0003 creates three new functions, moving code from
    >> do_pg_start_backup() to a new function collectTablespaces() and from
    >> perform_base_backup() to new functions setup_throttle() and
    >> include_wal_files(). I'm skeptical about all of these changes. One
    >> general nitpick is that the way these function names are capitalized
    >> and punctuated does not seem to have been chosen very consistently;
    >> how about name_like_this() throughout? A bit more substantively:
    >>
    >> - collectTablespaces() is factored out of do_pg_start_backup() so that
    >> it can also be used by SendFileList(), but that means that a client is
    >> going to invoke START_BACKUP, indirectly calling collectTablespaces(),
    >> and then immediately afterward the client is probably going to call
    >> SEND_FILE_LIST, which will again call collectTablespaces(). That does
    >> not appear to be super-great. For one thing, it's duplicate work,
    >> although because SendFileList() is going to pass infotbssize as false,
    >> it's not a lot of duplicated work.
    >
    >
    > I'll remove this duplication by eliminating this call from START_BACKUP and
    > SEND_FILE_LIST functions. More about this is explained later in this email.
    >
    >
    >> Also, what happens if the two calls
    >> to collectTablespaces() return different answers due to concurrent
    >> CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
    >> it seems like there is at least the possibility of bugs if different
    >> parts of the backup have different notions of what tablespaces exist.
    >>
    >
    > The concurrent CREATE/DROP TABLESPACE commands, it can happen and will
    > be resolved by the WAL files collected for the backup. I don't think we
    > can do anything when objects are created or dropped in-between start and
    > stop backup. BASE_BACKUP also relies on the WAL files to handle such a
    > scenario and does not error out when some relation files go away.
    >
    >
    >>
    >> - setup_throttle() is factored out of perform_base_backup() so that it
    >> can be called in StartBackup() and StopBackup() and SendFiles(). This
    >> seems extremely odd. Why does it make any sense to give the user an
    >> option to activate throttling when *ending* a backup? Why does it make
    >> sense to give the user a chance to enable throttling *both* at the
    >> startup of a backup *and also* for each individual file. If we're
    >> going to support throttling here, it seems like it should be either a
    >> backup-level property or a file-level property, not both.
    >>
    >
    > It's a file-level property only. Throttle functionality relies on global
    > variables. StartBackup() and StopBackup() are calling setup_throttle
    > function to disable the throttling.
    >
    > I should have been more explicit here by using -1 to setup_throttle,
    > Illustrating that throttling is disabled, instead of using 'opt->maxrate'.
    > (Although it defaults to -1 for these functions).
    >
    > I'll remove the setup_throttle() call for both functions.
    >
    >
    >>
    >> - include_wal_files() is factored out of perform_base_backup() so that
    >> it can be called by StopBackup(). This seems like a poor design
    >> decision. The idea behind the BASE_BACKUP command is that you run that
    >> one command, and the server sends you everything. The idea in this new
    >> way of doing business is that the client requests the individual files
    >> it wants -- except for the WAL files, which are for some reason not
    >> requested individually but sent all together as part of the
    >> STOP_BACKUP response. It seems like it would be more consistent if the
    >> client were to decide which WAL files it needs and request them one by
    >> one, just as we do with other files.
    >>
    >
    > As I understand you are suggesting to add another command to fetch the
    > list of WAL files which would be called by the client after executing stop
    > backup. Once the client gets that list, it starts requesting the WAL files
    > one
    > by one.
    >
    > So I will add LIST_WAL_FILES command that will take start_lsn and end_lsn
    > as arguments and return the list of WAL files between these LSNs.
    >
    > Something like this :
    > LIST_WAL_FILES 'start_lsn'  'end_lsn';
    >
    >
    >>
    >> I think there's a common theme to all of these complaints, which is
    >> that you haven't done enough to move things that are the
    >> responsibility of the backend in the BASE_BACKUP model to the frontend
    >> in this model. I started wondering, for example, whether it might not
    >> be better to have the client rather than the server construct the
    >> tablespace_map file. After all, the client needs to get the list of
    >> files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
    >> almost enough to construct the tablespace map. The only additional
    >> thing it needs is the full pathname to which the link points. But, it
    >> seems that we could fairly easily extend SEND_FILE_LIST to send, for
    >> files that are symbolic links, the target of the link, using a new
    >> column. Or alternatively, using a separate command, so that instead of
    >> just sending a single SEND_FILE_LIST command, the client might first
    >> ask for a tablespace list and then might ask for a list of files
    >> within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
    >> for each tablespace, with 0 for the main tablespace, perhaps). I'm not
    >> sure which way is better.
    >>
    >
    > do_pg_start_backup is collecting the tablespace information anyway to
    > build the tablespace_map for BASE_BACKUP. So returning the same seemed
    > better than adding a new command for the same information. hence multiple
    > calls to the collectTablespaces() [to be renamed to collect_tablespaces].
    >
    > tablespace_map can be constructed by the client, but then BASE_BACKUP
    > is returning it as part of the full backup. If clients in parallel mode
    > are to construct this themselves, these will seem like two different
    > approaches. Perhaps this should be done for BASE_BACKUP as
    > well?
    >
    > I'll refactor the do_pg_start_backup function to remove the code related
    > to tablespace information collection (to collect_tablespaces) and
    > tablespace_map file creation, so that this function does not collect this
    > information unnecessarily. perform_base_backup function can collect and
    > send the tablespace information to the client and then the client can
    > construct the tablespace_map file.
    >
    > I'll add a new command to fetch the list of tablespaces i.e.
    > LIST_TABLESPACES
    > which will return the tablespace information to the client for parallel
    > mode. And will refactor START_BACKUP and STOP_BACKUP commands,
    > so that they only do the specific job of putting the system in backup mode
    > or
    > out of it, nothing else.These commands should only return the start and end
    > LSN to the client.
    >
    >
    >
    >>
    >> Similarly, for throttling, I have a hard time understanding how what
    >> you've got here is going to work reasonably. It looks like each client
    >> is just going to request whatever MAX_RATE the user specifies, but the
    >> result of that will be that the actual transfer rate is probably a
    >> multiple of the specified rate, approximately equal to the specified
    >> rate times the number of clients. That's probably not what the user
    >> wants. You could take the specified rate and divide it by the number
    >> of workers, but limiting each of 4 workers to a quarter of the rate
    >> will probably lead to a combined rate of less than the specified
    >> rate, because if one worker doesn't use all of the bandwidth to which
    >> it's entitled, or even exits earlier than the others, the other
    >> workers don't get to go any faster as a result. Another problem is
    >> that, in the current approach, throttling applies overall to the
    >> entire backup, but in this approach, it is applied separately to each
    >> SEND_FILE command. In the current approach, if one file finishes a
    >> little faster or slower than anticipated, the next file in the tarball
    >> will be sent a little slower or faster to compensate. But in this
    >> approach, each SEND_FILES command is throttled separately, so this
    >> property is lost. Furthermore, while BASE_BACKUP sends data
    >> continuously, this approach naturally involves pauses between
    >> commands. If files are large, that won't matter much, but if they're
    >> small and numerous, it will tend to cause the actual transfer rate to
    >> be less than the throttling rate.
    >>
    >> One potential way to solve this problem is... move it to the client
    >> side. Instead of making it the server's job not to send data too fast,
    >> make it the client's job not to receive data too fast. Let the server
    >> backends write as fast as they want, and on the pg_basebackup side,
    >> have the threads coordinate with each other so that they don't read
    >> data faster than the configured rate. That's not quite the same thing,
    >> though, because the server can get ahead by the size of the client's
    >> receive buffers plus whatever data is on the wire. I don't know
    >> whether that's a big enough problem to be worth caring about. If it
    >> is, then I think we need some server infrastructure to "group
    >> throttle" a group of cooperating backends.
    >>
    >
    > That was a mistake in my code. maxrate should've been equally divided
    > amongst all threads. I agree that we should move this to the client-side.
    > When a thread exits, its share should also be equally divided amongst
    > the remaining threads (i.e. recalculate maxrate for each remaining
    > thread).
    >
    > Say we have 4 running threads with each allocation 25% of the bandwidth.
    > Thread 1 exits. We recalculate bandwidth and assign the remaining 3 threads
    > 33.33% each. This solves one problem that you had identified. However,
    > it doesn't solve where one (or more) thread is not fully consuming their
    > allocated share. I'm not really sure how we can solve it properly.
    > Suggestions
    > are welcome.
    >
    >
    >>
    >> A general comment about 0004 is that it seems like you've proceeded by
    >> taking the code from perform_base_backup() and spreading it across
    >> several different functions without, necessarily, as much thought as
    >> is needed there. For instance, StartBackup() looks like just the
    >> beginning of perform_base_backup(). But, why shouldn't it instead look
    >> like pg_start_backup() -- in fact, a simplified version that only
    >> handles the non-exclusive backup case? Is the extra stuff it's doing
    >> really appropriate? I've already complained about the
    >> tablespace-related stuff here and the throttling, but there's more.
    >> Setting statrelpath here will probably break if somebody tries to use
    >> SEND_FILES without first calling START_BACKUP. Sending the
    >> backup_label file here is oddly asymmetric, because that's done by
    >> pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
    >> looks like it's just the end of perform_base_backup(), but that's
    >> pretty strange-looking too. Again, I've already complained about
    >> include_wal_files() being part of this, but there's also:
    >>
    >> +       /* ... and pg_control after everything else. */
    >>
    >> ...which (1) is an odd thing to say when this is the first thing this
    >> particular function is to send and (2) is another example of a sloppy
    >> division of labor between client and server; apparently, the client is
    >> supposed to know not to request pg_control, because the server is
    >> going to send it unsolicited. There's no particular reason to have
    >> this a special case. The client could just request it last. And then
    >> the server code wouldn't need a special case, and you wouldn't have
    >> this odd logic split between the client and the server.
    >>
    >> Overall, I think this needs a lot more work. The overall idea's not
    >> wrong, but there seem to be a very large number of details which, at
    >> least to me, do not seem to be correct.
    >>
    >>
    >
    > Thank you Robert for the detailed review. I really appreciate your insights
    > and very precise feedback.
    >
    > After the changes suggested above, the design on a high level will look
    > something
    > like this:
    >
    > === SEQUENTIAL EXECUTION ===
    > START_BACKUP [LABEL | FAST]
    > - Starts backup on the server
    > - Returns the start LSN to client
    >
    > LIST_TABLESPACES
    > - Sends a list of all tablespaces to client
    >
    > Loops over LIST_TABLESPACES
    > - LIST_FILES [tablespace]
    > - Sends file list for the given tablespace
    > - Create a list of all files
    >
    > === PARALLEL EXECUTION ===
    > Thread loop until the list of files is exhausted
    > SEND_FILE <file(s)> [CHECKSUM | WAL_START_LOCATION]
    > - If the checksum is enabled then WAL_START_LOCATION is required.
    > - Can request server to send one or more files but we are requesting one
    > at a time
    > - Pick next file from list of files
    >
    > - Threads sleep after the list is exhausted
    > - All threads are sleeping
    >
    > === SEQUENTIAL EXECUTION ===
    > STOP_BACKUP [NOWAIT]
    > - Stops backup mode
    > - Return end LSN
    >
    > If --wal-method=fetch then
    > LIST_WAL_FILES 'start_lsn' 'end_lsn'
    > - Sends a list of WAL files between start LSN and end LSN
    >
    > === PARALLEL EXECUTION ===
    > Thread loop until the list of WAL files is exhausted
    > SEND_FILE <WAL file>
    > - Can request server to send one or more files but we are requesting one
    > WAL file at a time
    > - Pick next file from list of WAL files
    >
    > - Threads terminate and set their status as completed/terminated
    >
    > === SEQUENTIAL EXECUTION ===
    > Cleanup
    >
    >
    >
    >
    Here are the updated patches, taking care of the issues pointed out
    earlier. This patch set adds the following commands (with the specified
    options):
    
    START_BACKUP [LABEL '<label>'] [FAST]
    STOP_BACKUP [NOWAIT]
    LIST_TABLESPACES [PROGRESS]
    LIST_FILES [TABLESPACE]
    LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
                                [NOVERIFY_CHECKSUMS]
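    
    To show how a client might assemble two of these commands, here is a
    small sketch. It assumes that labels, file names and WAL locations are
    passed as single-quoted strings with embedded quotes doubled. The
    helper names are hypothetical and the exact quoting rules are those of
    the patch's grammar, which is not reproduced here.

```python
def quote(value):
    """Single-quote a string, doubling embedded quotes (assumed quoting rule)."""
    return "'" + value.replace("'", "''") + "'"


def start_backup(label=None, fast=False):
    """Build START_BACKUP [LABEL '<label>'] [FAST]."""
    parts = ["START_BACKUP"]
    if label is not None:
        parts += ["LABEL", quote(label)]
    if fast:
        parts.append("FAST")
    return " ".join(parts)


def send_files(files, start_wal_location=None, verify_checksums=True):
    """Build SEND_FILES (...) [START_WAL_LOCATION 'X/X'] [NOVERIFY_CHECKSUMS]."""
    parts = ["SEND_FILES", "(" + ", ".join(quote(f) for f in files) + ")"]
    if start_wal_location is not None:
        parts += ["START_WAL_LOCATION", quote(start_wal_location)]
    if not verify_checksums:
        parts.append("NOVERIFY_CHECKSUMS")
    return " ".join(parts)
```

    For instance, start_backup('nightly', fast=True) gives
    START_BACKUP LABEL 'nightly' FAST.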
    
    
    Parallel backup does not make any use of the tablespace map, so I have
    removed that option from the above commands. There is a patch pending
    to remove the exclusive backup; we can further refactor the
    do_pg_start_backup() function at that time to remove the tablespace
    information and move the creation of the tablespace_map file to the
    client.
    
    
    I have disabled the maxrate option for parallel backup. I intend to send
    out a separate patch for it. Robert previously suggested implementing
    throttling on the client-side. I found the original email thread [1]
    where throttling was proposed and added to the server. In that thread,
    it was originally implemented on the client-side, but per many suggestions,
    it was moved to the server-side.
    
    So, I have a few suggestions on how we can implement this:
    
    1- Have another option for pg_basebackup (i.e. per-worker-maxrate) where
    the user could choose the bandwidth allocation for each worker. This
    approach can be implemented on the client-side as well as on the
    server-side.
    
    2- Have the maxrate divided equally among workers at first, and then
    let the main thread keep adjusting it whenever one of the workers finishes.
    I believe this would only be possible if we handle throttling on the client.
    Also, as I understand it, implementing this will introduce an additional
    mutex to protect the bandwidth consumption data, so that the rate may be
    adjusted according to the data received by the threads.
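    
    One possible shape for the mutex-protected accounting in option 2 is a
    single budget shared by all threads instead of per-thread shares. The
    sketch below is an assumption about how that could look, not code from
    the patch. The class name, the reserve() method and the injectable
    clock are all hypothetical. Each thread reports the bytes it received
    and is told how long to sleep so that the combined rate stays at or
    below the limit.

```python
import threading
import time


class SharedThrottle:
    """Client-side rate limit shared by every receiving thread."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic):
        self._rate = float(max_bytes_per_sec)
        self._clock = clock
        self._lock = threading.Lock()  # protects the consumption counter
        self._start = clock()
        self._total = 0

    def reserve(self, nbytes):
        """Record nbytes as received; return seconds the caller should sleep."""
        with self._lock:
            self._total += nbytes
            # Earliest time at which this much data is allowed to have arrived.
            due = self._start + self._total / self._rate
            return max(0.0, due - self._clock())
```

    A worker would call time.sleep(throttle.reserve(len(chunk))) after
    each chunk it reads. Because the counter is global, bandwidth left
    unused by a slow or finished thread is available to the others. The
    trade-off is that a long idle period builds up credit and allows a
    burst afterwards.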
    
    [1]
    https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  43. Re: WIP/PoC for parallel backup

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2020-02-10T13:48:37Z

    Hi Asif,
    
    On Thu, Jan 30, 2020 at 7:10 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    > Here are the updated patches, taking care of the issues pointed
    > earlier. This patch adds the following commands (with specified option):
    >
    > START_BACKUP [LABEL '<label>'] [FAST]
    > STOP_BACKUP [NOWAIT]
    > LIST_TABLESPACES [PROGRESS]
    > LIST_FILES [TABLESPACE]
    > LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    > SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
    >                             [NOVERIFY_CHECKSUMS]
    >
    >
    > Parallel backup is not making any use of tablespace map, so I have
    > removed that option from the above commands. There is a patch pending
    > to remove the exclusive backup; we can further refactor the
    > do_pg_start_backup
    > function at that time, to remove the tablespace information and move the
    > creation of tablespace_map file to the client.
    >
    >
    > I have disabled the maxrate option for parallel backup. I intend to send
    > out a separate patch for it. Robert previously suggested to implement
    > throttling on the client-side. I found the original email thread [1]
    > where throttling was proposed and added to the server. In that thread,
    > it was originally implemented on the client-side, but per many suggestions,
    > it was moved to server-side.
    >
    > So, I have a few suggestions on how we can implement this:
    >
    > 1- Have another option for pg_basebackup (i.e. per-worker-maxrate) where
    > the user could choose the bandwidth allocation for each worker. This
    > approach can be implemented on the client side as well as on the server
    > side.
    >
    > 2- Have the maxrate divided equally among workers at first, and then
    > let the main thread keep adjusting it whenever one of the workers finishes.
    > I believe this would only be possible if we handle throttling on the
    > client.
    > Also, as I understand it, implementing this will introduce an additional
    > mutex to protect the bandwidth consumption data, so that the rate may be
    > adjusted according to the data received by the threads.
    >
    > [1]
    > https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
    >
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
    The latest changes look good to me. However, the patch set is missing the
    documentation. Please add it.
    
    Thanks
    
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  44. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-02-17T08:39:08Z

    Thanks Jeevan. Here is the documentation patch.
    
    
    -- 
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  45. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-02-25T14:18:42Z

    Hi,
    
    I have created a commitfest entry.
    https://commitfest.postgresql.org/27/2472/
    
    
    
    -- 
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  46. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-11T09:38:20Z

    Hi Asif
    
    I have started testing this feature. I applied the v6 patch on commit
    a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
    I have a few observations; please take a look.
    
    *--If the backup fails, the backup directory is not removed.*
    [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
    pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty
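The behaviour one would expect here can be sketched as follows. This is an illustrative Python sketch, not the patch's code: `run_backup` and `failing_backup` are hypothetical names, and the failure is simulated rather than caused by a refused connection. The point is only that a directory created by this run is removed again when the run fails.

```python
import os
import shutil
import tempfile


def run_backup(target_dir, do_backup):
    """Create target_dir, run do_backup(target_dir), and remove the
    directory again if the backup fails and this call created it."""
    created = not os.path.exists(target_dir)
    if created:
        os.makedirs(target_dir)
    try:
        do_backup(target_dir)
    except Exception:
        if created:
            shutil.rmtree(target_dir, ignore_errors=True)
        raise


def failing_backup(path):
    # Stand-in for a worker failing after some data was already written.
    with open(os.path.join(path, "partial"), "w") as f:
        f.write("x")
    raise ConnectionError("could not connect to server")


base = tempfile.mkdtemp()
target = os.path.join(base, "bkp6")
try:
    run_backup(target, failing_backup)
except ConnectionError:
    pass
leftover = os.path.exists(target)  # False: a retry into the same path can proceed
shutil.rmtree(base, ignore_errors=True)
```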
    
    
    *--Giving a large number of jobs leads to a segmentation fault.*
    ./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    .
    .
    .
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: error: could not connect to server: could not fork new
    process for connection: Resource temporarily unavailable
    
    could not fork new process for connection: Resource temporarily unavailable
    pg_basebackup: error: failed to create thread: Resource temporarily
    unavailable
    Segmentation fault (core dumped)
    
    --stack-trace
    gdb -q -c core.11824 pg_basebackup
    Loaded symbols for /lib64/libnss_files.so.2
    Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D
    /tmp/test_bkp/bkp10'.
    Program terminated with signal 11, Segmentation fault.
    #0  pthread_join (threadid=140503120623360, thread_return=0x0) at
    pthread_join.c:46
    46  if (INVALID_NOT_TERMINATED_TD_P (pd))
    Missing separate debuginfos, use: debuginfo-install
    keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    (gdb) bt
    #0  pthread_join (threadid=140503120623360, thread_return=0x0) at
    pthread_join.c:46
    #1  0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
    #2  0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
    #3  0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
    #4  exit (status=1) at exit.c:100
    #5  0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at
    pg_basebackup.c:2713
    #6  0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
    #7  0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at
    pg_basebackup.c:2668
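The trace shows the exit handler joining worker threads after thread creation had already failed. One plausible reading, stated here as an assumption since the patch source is not shown, is that the cleanup joins every requested worker rather than only those that were actually started. A minimal Python sketch of the safer bookkeeping follows; `WorkerPool` and the `max_threads` parameter are hypothetical, with `max_threads` standing in for the operating system refusing to create more threads.

```python
import threading


class WorkerPool:
    def __init__(self):
        self.started = []  # only threads whose start() succeeded

    def start_all(self, count, target, max_threads):
        for i in range(count):
            if i >= max_threads:
                # Simulated "failed to create thread".
                raise RuntimeError("failed to create thread")
            t = threading.Thread(target=target)
            t.start()
            self.started.append(t)

    def cleanup(self):
        """Safe to call from an exit handler: joins only started threads."""
        joined = 0
        for t in self.started:
            t.join()
            joined += 1
        self.started.clear()
        return joined


pool = WorkerPool()
try:
    pool.start_all(count=1000, target=lambda: None, max_threads=3)
except RuntimeError:
    pass
joined = pool.cleanup()  # joins the three threads that exist, nothing else
```

Calling `cleanup()` a second time is a no-op, which matters when it can be reached both from the normal path and from an exit handler.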
    
    
    *--With a tablespace in the same directory as data, parallel backup
    crashed.*
    [edb@localhost bin]$ ./initdb -D /tmp/data
    [edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
    [edb@localhost bin]$ mkdir /tmp/ts
    [edb@localhost bin]$ ./psql postgres
    psql (13devel)
    Type "help" for help.
    
    postgres=# create tablespace ts location '/tmp/ts';
    CREATE TABLESPACE
    postgres=# create table tx (a int) tablespace ts;
    CREATE TABLE
    postgres=# \q
    [edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
    Segmentation fault (core dumped)
    
    --stack-trace
    [edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
    Loaded symbols for /lib64/libnss_files.so.2
    Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T
    /tmp/ts=/tmp/ts1'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
    backupInfo=0x14210a0) at pg_basebackup.c:3000
    3000 backupInfo->curr->next = file;
    Missing separate debuginfos, use: debuginfo-install
    keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    (gdb) bt
    #0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
    backupInfo=0x14210a0) at pg_basebackup.c:3000
    #1  0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at
    pg_basebackup.c:2739
    #2  0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
    #3  0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at
    pg_basebackup.c:2668
    (gdb)
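The crashing statement is `backupInfo->curr->next = file`. The trace does not say which pointer was invalid; assuming `curr` was still unset when a file was appended, the usual remedy is to treat the first append specially. A Python sketch of that list handling, with `curr` taken from the trace and every other name hypothetical:

```python
class FileNode:
    def __init__(self, path):
        self.path = path
        self.next = None


class BackupInfo:
    def __init__(self):
        self.files = None  # head of the list (hypothetical field name)
        self.curr = None   # last node appended so far

    def append(self, path):
        node = FileNode(path)
        if self.curr is None:
            # First entry: there is no previous node to link from.
            self.files = node
        else:
            self.curr.next = node
        self.curr = node

    def paths(self):
        out = []
        node = self.files
        while node is not None:
            out.append(node.path)
            node = node.next
        return out


info = BackupInfo()
info.append("base/1/1234")
info.append("pg_tblspc/16384/5678")
```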
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
    
  47. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-03-13T16:21:09Z

    On Wed, Mar 11, 2020 at 2:38 PM Rajkumar Raghuwanshi <
    rajkumar.raghuwanshi@enterprisedb.com> wrote:
    
    > Hi Asif
    >
    > I have started testing this feature. I have applied v6 patch on commit
    > a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
    > I have a few observations; please take a look.
    >
    > *--If the backup fails, the backup directory is not removed.*
    > [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D
    > /tmp/test_bkp/bkp6
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D
    > /tmp/test_bkp/bkp6
    > pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not
    > empty
    >
    >
    > *--Giving a large number of jobs leads to a segmentation fault.*
    > ./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > .
    > .
    > .
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: error: could not connect to server: could not fork new
    > process for connection: Resource temporarily unavailable
    >
    > could not fork new process for connection: Resource temporarily unavailable
    > pg_basebackup: error: failed to create thread: Resource temporarily
    > unavailable
    > Segmentation fault (core dumped)
    >
    > --stack-trace
    > gdb -q -c core.11824 pg_basebackup
    > Loaded symbols for /lib64/libnss_files.so.2
    > Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D
    > /tmp/test_bkp/bkp10'.
    > Program terminated with signal 11, Segmentation fault.
    > #0  pthread_join (threadid=140503120623360, thread_return=0x0) at
    > pthread_join.c:46
    > 46  if (INVALID_NOT_TERMINATED_TD_P (pd))
    > Missing separate debuginfos, use: debuginfo-install
    > keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    > libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    > openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    > (gdb) bt
    > #0  pthread_join (threadid=140503120623360, thread_return=0x0) at
    > pthread_join.c:46
    > #1  0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
    > #2  0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
    > #3  0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
    > #4  exit (status=1) at exit.c:100
    > #5  0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0)
    > at pg_basebackup.c:2713
    > #6  0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
    > #7  0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at
    > pg_basebackup.c:2668
    >
    >
    > *--With a tablespace in the same directory as data, parallel backup
    > crashed.*
    > [edb@localhost bin]$ ./initdb -D /tmp/data
    > [edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
    > [edb@localhost bin]$ mkdir /tmp/ts
    > [edb@localhost bin]$ ./psql postgres
    > psql (13devel)
    > Type "help" for help.
    >
    > postgres=# create tablespace ts location '/tmp/ts';
    > CREATE TABLESPACE
    > postgres=# create table tx (a int) tablespace ts;
    > CREATE TABLE
    > postgres=# \q
    > [edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
    > Segmentation fault (core dumped)
    >
    > --stack-trace
    > [edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
    > Loaded symbols for /lib64/libnss_files.so.2
    > Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T
    > /tmp/ts=/tmp/ts1'.
    > Program terminated with signal 11, Segmentation fault.
    > #0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
    > backupInfo=0x14210a0) at pg_basebackup.c:3000
    > 3000 backupInfo->curr->next = file;
    > Missing separate debuginfos, use: debuginfo-install
    > keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    > libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    > openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    > (gdb) bt
    > #0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
    > backupInfo=0x14210a0) at pg_basebackup.c:3000
    > #1  0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at
    > pg_basebackup.c:2739
    > #2  0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
    > #3  0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at
    > pg_basebackup.c:2668
    > (gdb)
    >
    
    
    Thanks Rajkumar. I have fixed the above issues and rebased the patch
    onto the latest master (b7f64c64).
    (V9 of the patches is attached.)
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  48. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-16T06:08:31Z

    Thanks for the patches.
    
    I have verified the reported issues with the new patches; they are fixed now.
    
    I have another observation: if a new slot name is given without the -C
    option, it leads to a server crash error.
    
    [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
    test_bkp_slot
    pg_basebackup: error: could not send replication command
    "START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
    pg_basebackup: error: could not list backup files: server closed the
    connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    pg_basebackup: removing data directory "/tmp/bkp"
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
    
  49. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-03-16T06:21:49Z

    On Mon, Mar 16, 2020 at 11:08 AM Rajkumar Raghuwanshi <
    rajkumar.raghuwanshi@enterprisedb.com> wrote:
    
    > Thanks for the patches.
    >
    > I have verified the reported issues with the new patches; they are fixed now.
    >
    > I have another observation: if a new slot name is given without the -C
    > option, it leads to a server crash error.
    >
    > [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
    > test_bkp_slot
    > pg_basebackup: error: could not send replication command
    > "START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
    > pg_basebackup: error: could not list backup files: server closed the
    > connection unexpectedly
    > This probably means the server terminated abnormally
    > before or while processing the request.
    > pg_basebackup: removing data directory "/tmp/bkp"
    >
    
    This seems to be expected behavior. The START_BACKUP command has been
    executed, and pg_basebackup tries to start a WAL streaming process with a
    non-existent slot, which results in an error. So the backup is aborted and
    all other processes are terminated.
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  50. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-16T06:26:16Z

    On Mon, Mar 16, 2020 at 11:52 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Mon, Mar 16, 2020 at 11:08 AM Rajkumar Raghuwanshi <
    > rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >
    >> Thanks for the patches.
    >>
    >> I have verified reported issues with new patches, issues are fixed now.
    >>
    >> I got another observation where If a new slot name given without -C
    >> option, it leads to server crash error.
    >>
    >> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
    >> test_bkp_slot
    >> pg_basebackup: error: could not send replication command
    >> "START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
    >> pg_basebackup: error: could not list backup files: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> pg_basebackup: removing data directory "/tmp/bkp"
    >>
    >
    > It seems to be an expected behavior. The START_BACKUP command has been
    > executed, and
    > pg_basebackup tries to start a WAL streaming process with a non-existent
    > slot, which results in
    > an error. So the backup is aborted while terminating all other processes.
    >
    I think the error message can be improved. The current message makes it
    look as if the database server has crashed.
    
    On PG, the same case exits with exit code 1:
    [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp --slot test_bkp_slot
    pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
    pg_basebackup: error: child process exited with exit code 1
    pg_basebackup: removing data directory "/tmp/bkp"
    
    
    >
    >
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
  51. Re: WIP/PoC for parallel backup

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2020-03-16T08:43:48Z

    Hi Asif,
    
    
    > Thanks Rajkumar. I have fixed the above issues and have rebased the patch
    > to the latest master (b7f64c64).
    > (V9 of the patches are attached).
    >
    
    I reviewed the patches further; here are a few observations:
    
    1.
    +/*
    + * stop_backup() - ends an online backup
    + *
    + * The function is called at the end of an online backup. It sends out pg_control
    + * file, optionally WAL segments and ending WAL location.
    + */
    
    This comment seems outdated.
    
    2. With parallel jobs, maxrate is no longer supported. Since we are now
    requesting data from multiple threads, throttling seems important here.
    Can you please explain why you have disabled it?
    
    3. As we are always fetching a single file, and as Robert suggested, let's
    rename SEND_FILES to SEND_FILE.
    
    4. Does this work on Windows? I mean, does pthread_create() work on Windows?
    I ask because pgbench has its own implementation of pthread_create() for
    WIN32, but this patch doesn't.
    
    5. Typos:
    tablspace => tablespace
    safly => safely
    
    6. parallel_backup_run() needs some comments explaining the PB_* states it
    goes through.
    
    7.
    +            case PB_FETCH_REL_FILES:    /* fetch files from server */
    +                if (backupinfo->activeworkers == 0)
    +                {
    +                    backupinfo->backupstate = PB_STOP_BACKUP;
    +                    free_filelist(backupinfo);
    +                }
    +                break;
    +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
    +                if (backupinfo->activeworkers == 0)
    +                {
    +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
    +                }
    +                break;
    
    Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?
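
One way to read point 7 is that releasing the file list in both fetch states would keep cleanup symmetric. The sketch below is a minimal, hypothetical model of such a loop, not the patch's code. The PB_* names and free_filelist() come from the quoted hunk; the BackupInfo layout, the FileNode type, and backup_step() are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* State names taken from the quoted hunk; their order here is an assumption. */
typedef enum
{
    PB_FETCH_REL_FILES,
    PB_STOP_BACKUP,
    PB_FETCH_WAL_FILES,
    PB_BACKUP_COMPLETE
} BackupState;

/* Hypothetical list node; the real patch stores more per file. */
typedef struct FileNode
{
    struct FileNode *next;
} FileNode;

/* Hypothetical subset of the backup bookkeeping. */
typedef struct
{
    BackupState backupstate;
    int         activeworkers;
    FileNode   *files;
} BackupInfo;

/* Free the whole list and reset the head so a second call is harmless. */
static void
free_filelist(BackupInfo *bi)
{
    FileNode   *node = bi->files;

    while (node != NULL)
    {
        FileNode   *next = node->next;

        free(node);
        node = next;
    }
    bi->files = NULL;
}

/* Advance the state machine by one step; returns true once complete. */
static bool
backup_step(BackupInfo *bi)
{
    switch (bi->backupstate)
    {
        case PB_FETCH_REL_FILES:    /* wait for workers, then clean up */
            if (bi->activeworkers == 0)
            {
                free_filelist(bi);
                bi->backupstate = PB_STOP_BACKUP;
            }
            break;
        case PB_STOP_BACKUP:        /* real code would build the WAL list here */
            bi->backupstate = PB_FETCH_WAL_FILES;
            break;
        case PB_FETCH_WAL_FILES:    /* same cleanup as the relation-file case */
            if (bi->activeworkers == 0)
            {
                free_filelist(bi);
                bi->backupstate = PB_BACKUP_COMPLETE;
            }
            break;
        case PB_BACKUP_COMPLETE:
            break;
    }
    return bi->backupstate == PB_BACKUP_COMPLETE;
}
```

Because free_filelist() resets the head pointer in this sketch, calling it in both states is safe even if the list is already empty.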
    
    Thanks
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
    Phone: +91 20 66449694
    
    Website: www.enterprisedb.com
    EnterpriseDB Blog: http://blogs.enterprisedb.com/
    Follow us on Twitter: http://www.twitter.com/enterprisedb
    
    This e-mail message (and any attachment) is intended for the use of the
    individual or entity to whom it is addressed. This message contains
    information from EnterpriseDB Corporation that may be privileged,
    confidential, or exempt from disclosure under applicable law. If you are
    not the intended recipient or authorized to receive this for the intended
    recipient, any use, dissemination, distribution, retention, archiving, or
    copying of this communication is strictly prohibited. If you have received
    this e-mail in error, please notify the sender immediately by reply e-mail
    and delete this message.
    
  52. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-16T12:49:44Z

    Hi Asif,
    
    On testing further, I found that pg_basebackup crashed when taking a
    backup with -R. The crash is not consistently reproducible.
    
    [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"
    CREATE TABLE
    [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
    INSERT 0 1
    [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
    Segmentation fault (core dumped)
    
    The stack trace looks the same as in the earlier reported crash with a
    tablespace.
    --stack trace
    [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
    Loaded symbols for /lib64/libnss_files.so.2
    Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
    3175 backupinfo->curr = fetchfile->next;
    Missing separate debuginfos, use: debuginfo-install
    keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    (gdb) bt
    #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
    #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
    #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
    #3  0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    (gdb)
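
The thread does not state the root cause of this crash. The faulting line advances a cursor that every worker thread shares, so one possibility is that it is advanced without synchronization, or that fetchfile is NULL once the list is exhausted. The following is a minimal sketch of a guarded hand-out, assuming a singly linked list of files. SharedFileList, BackupFile, filelist_init() and filelist_take_next() are hypothetical names, not taken from the patch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical file entry; only the fields needed for the sketch. */
typedef struct BackupFile
{
    const char        *path;
    struct BackupFile *next;
} BackupFile;

/* Cursor shared by all workers, protected by a mutex. */
typedef struct
{
    pthread_mutex_t lock;
    BackupFile     *curr;       /* next file to hand out, NULL when exhausted */
} SharedFileList;

static void
filelist_init(SharedFileList *list, BackupFile *head)
{
    pthread_mutex_init(&list->lock, NULL);
    list->curr = head;
}

/*
 * Hand out the next file to the calling worker, or NULL when none remain.
 * The NULL check and the cursor update happen under the same lock, so two
 * workers can neither receive the same file nor dereference a NULL cursor.
 */
static BackupFile *
filelist_take_next(SharedFileList *list)
{
    BackupFile *file;

    pthread_mutex_lock(&list->lock);
    file = list->curr;
    if (file != NULL)
        list->curr = file->next;
    pthread_mutex_unlock(&list->lock);

    return file;
}
```

A worker loop would call filelist_take_next() until it returns NULL and then exit, which also gives the main thread a clear condition for decrementing the active worker count.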
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
    On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
    jeevan.chalke@enterprisedb.com> wrote:
    
    > Hi Asif,
    >
    >
    >> Thanks Rajkumar. I have fixed the above issues and have rebased the patch
    >> to the latest master (b7f64c64).
    >> (V9 of the patches are attached).
    >>
    >
    > I had a further review of the patches and here are my few observations:
    >
    > 1.
    > +/*
    > + * stop_backup() - ends an online backup
    > + *
    > + * The function is called at the end of an online backup. It sends out
    > pg_control
    > + * file, optionally WAL segments and ending WAL location.
    > + */
    >
    > Comments seem out-dated.
    >
    > 2. With parallel jobs, maxrate is now not supported. Since we are now
    > asking
    > data in multiple threads throttling seems important here. Can you please
    > explain why have you disabled that?
    >
    > 3. As we are always fetching a single file and as Robert suggested, let
    > rename
    > SEND_FILES to SEND_FILE instead.
    >
    > 4. Does this work on Windows? I mean does pthread_create() work on Windows?
    > I asked this as I see that pgbench has its own implementation for
    > pthread_create() for WIN32 but this patch doesn't.
    >
    > 5. Typos:
    > tablspace => tablespace
    > safly => safely
    >
    > 6. parallel_backup_run() needs some comments explaining the states it goes
    > through PB_* states.
    >
    > 7.
    > +            case PB_FETCH_REL_FILES:    /* fetch files from server */
    > +                if (backupinfo->activeworkers == 0)
    > +                {
    > +                    backupinfo->backupstate = PB_STOP_BACKUP;
    > +                    free_filelist(backupinfo);
    > +                }
    > +                break;
    > +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
    > +                if (backupinfo->activeworkers == 0)
    > +                {
    > +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
    > +                }
    > +                break;
    >
    > Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
    >
    > Thanks
    > --
    > Jeevan Chalke
    >
    
  53. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-19T10:41:24Z

    Hi Asif,
    
    In another scenario, the backup data for a tablespace is corrupted. Again,
    this is not reproducible every time, but running the same set of commands
    gives me the same error.
    
    [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
    waiting for server to start.... done
    server started
    [edb@localhost bin]$
    [edb@localhost bin]$ mkdir /tmp/tblsp
    [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
    CREATE TABLESPACE
    [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
    CREATE DATABASE
    [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
    CREATE TABLE
    [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
    INSERT 0 1
    [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
    [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
    waiting for server to start.... done
    server started
    [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
      oid  |  spcname   | spcowner | spcacl | spcoptions
    -------+------------+----------+--------+------------
      1663 | pg_default |       10 |        |
     16384 | tblsp      |       10 |        |
    (2 rows)
    
    [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
    psql: error: could not connect to server: FATAL:  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
    DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
    [edb@localhost bin]$
    [edb@localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    [edb@localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directory
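
The manual ls comparison above could be scripted when checking a backup against the source cluster. The following is a minimal sketch using only POSIX stat(). backup_has_file() is a hypothetical helper, not part of pg_basebackup.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return true if relpath exists as a regular file below backup_dir.
 * stat() follows symlinks, so tablespace links under pg_tblspc resolve
 * to their targets before the check.
 */
static bool
backup_has_file(const char *backup_dir, const char *relpath)
{
    char        path[4096];
    struct stat st;
    int         len;

    len = snprintf(path, sizeof(path), "%s/%s", backup_dir, relpath);
    if (len < 0 || (size_t) len >= sizeof(path))
        return false;           /* path too long to check */

    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

For the case reported here, calling it with "/tmp/bkp" and "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" would be expected to return false, matching the ls output.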
    
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
    On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
    rajkumar.raghuwanshi@enterprisedb.com> wrote:
    
    > Hi Asif,
    >
    > On testing further, I found when taking backup with -R, pg_basebackup
    > crashed
    > this crash is not consistently reproducible.
    >
    > [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
    > text);"
    > CREATE TABLE
    > [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values
    > ('parallel_backup with -R recovery-conf');"
    > INSERT 0 1
    > [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
    > Segmentation fault (core dumped)
    >
    > stack trace looks the same as it was on earlier reported crash with
    > tablespace.
    > --stack trace
    > [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
    > Loaded symbols for /lib64/libnss_files.so.2
    > Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
    > -R'.
    > Program terminated with signal 11, Segmentation fault.
    > #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    > pg_basebackup.c:3175
    > 3175 backupinfo->curr = fetchfile->next;
    > Missing separate debuginfos, use: debuginfo-install
    > keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    > libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    > openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    > (gdb) bt
    > #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    > pg_basebackup.c:3175
    > #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
    > #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
    > pthread_create.c:301
    > #3  0x00000039212e8c4d in clone () at
    > ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    > (gdb)
    >
    > Thanks & Regards,
    > Rajkumar Raghuwanshi
    >
    >
    > On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
    > jeevan.chalke@enterprisedb.com> wrote:
    >
    >> Hi Asif,
    >>
    >>
    >>> Thanks Rajkumar. I have fixed the above issues and have rebased the
    >>> patch to the latest master (b7f64c64).
    >>> (V9 of the patches are attached).
    >>>
    >>
    >> I had a further review of the patches and here are my few observations:
    >>
    >> 1.
    >> +/*
    >> + * stop_backup() - ends an online backup
    >> + *
    >> + * The function is called at the end of an online backup. It sends out
    >> pg_control
    >> + * file, optionally WAL segments and ending WAL location.
    >> + */
    >>
    >> Comments seem out-dated.
    >>
    >> 2. With parallel jobs, maxrate is now not supported. Since we are now
    >> asking
    >> data in multiple threads throttling seems important here. Can you please
    >> explain why have you disabled that?
    >>
    >> 3. As we are always fetching a single file and as Robert suggested, let
    >> rename
    >> SEND_FILES to SEND_FILE instead.
    >>
    >> 4. Does this work on Windows? I mean does pthread_create() work on
    >> Windows?
    >> I asked this as I see that pgbench has its own implementation for
    >> pthread_create() for WIN32 but this patch doesn't.
    >>
    >> 5. Typos:
    >> tablspace => tablespace
    >> safly => safely
    >>
    >> 6. parallel_backup_run() needs some comments explaining the states it goes
    >> through PB_* states.
    >>
    >> 7.
    >> +            case PB_FETCH_REL_FILES:    /* fetch files from server */
    >> +                if (backupinfo->activeworkers == 0)
    >> +                {
    >> +                    backupinfo->backupstate = PB_STOP_BACKUP;
    >> +                    free_filelist(backupinfo);
    >> +                }
    >> +                break;
    >> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
    >> +                if (backupinfo->activeworkers == 0)
    >> +                {
    >> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
    >> +                }
    >> +                break;
    >>
    >> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
    >>
    >> Thanks
    >> --
    >> Jeevan Chalke
    >>
    >
    
  54. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-25T07:22:11Z

    Hi Asif,
    
    While testing further, I observed that parallel backup is not able to take
    a backup of a standby server.
    
    mkdir /tmp/archive_dir
    echo "archive_mode='on'">> data/postgresql.conf
    echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
    
    ./pg_ctl -D data -l logs start
    ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
    
    echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">> /tmp/slave/postgresql.conf
    echo "restore_command='cp /tmp/archive_dir/%f %p'">> /tmp/slave/postgresql.conf
    echo "promote_trigger_file='/tmp/failover.log'">> /tmp/slave/postgresql.conf
    
    ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
    
    [edb@localhost bin]$ ./psql postgres -p 5432 -c "select pg_is_in_recovery();"
     pg_is_in_recovery
    -------------------
     f
    (1 row)
    
    [edb@localhost bin]$ ./psql postgres -p 5433 -c "select pg_is_in_recovery();"
     pg_is_in_recovery
    -------------------
     t
    (1 row)
    
    
    
    
    [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
    pg_basebackup: error: could not list backup files: ERROR:  the standby was promoted during online backup
    HINT:  This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
    pg_basebackup: removing data directory "/tmp/bkp_s"
    
    # The same works fine without parallel backup
    [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
    [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
    /tmp/bkp_s/PG_VERSION
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
    On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
    rajkumar.raghuwanshi@enterprisedb.com> wrote:
    
    > Hi Asif,
    >
    > In another scenarios, bkp data is corrupted for tablespace. again this is
    > not reproducible everytime,
    > but If I am running the same set of commands I am getting the same error.
    >
    > [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
    > waiting for server to start.... done
    > server started
    > [edb@localhost bin]$
    > [edb@localhost bin]$ mkdir /tmp/tblsp
    > [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp
    > location '/tmp/tblsp';"
    > CREATE TABLESPACE
    > [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb
    > tablespace tblsp;"
    > CREATE DATABASE
    > [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
    > text);"
    > CREATE TABLE
    > [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values
    > ('parallel_backup with tablespace');"
    > INSERT 0 1
    > [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
    > /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
    > [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555"
    > start
    > waiting for server to start.... done
    > server started
    > [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
    > pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
    >   oid  |  spcname   | spcowner | spcacl | spcoptions
    > -------+------------+----------+--------+------------
    >   1663 | pg_default |       10 |        |
    >  16384 | tblsp      |       10 |        |
    > (2 rows)
    >
    > [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
    > psql: error: could not connect to server: FATAL:
    >  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
    > DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
    > missing.
    > [edb@localhost bin]$
    > [edb@localhost bin]$ ls
    > data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    > data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    > [edb@localhost bin]$ ls
    > /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    > ls: cannot access
    > /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
    > directory
    >
    >
    > Thanks & Regards,
    > Rajkumar Raghuwanshi
    >
    >
    > On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
    > rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >
    >> Hi Asif,
    >>
    >> On testing further, I found when taking backup with -R, pg_basebackup
    >> crashed
    >> this crash is not consistently reproducible.
    >>
    >> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
    >> text);"
    >> CREATE TABLE
    >> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values
    >> ('parallel_backup with -R recovery-conf');"
    >> INSERT 0 1
    >> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
    >> Segmentation fault (core dumped)
    >>
    >> stack trace looks the same as it was on earlier reported crash with
    >> tablespace.
    >> --stack trace
    >> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
    >> Loaded symbols for /lib64/libnss_files.so.2
    >> Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
    >> -R'.
    >> Program terminated with signal 11, Segmentation fault.
    >> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >> pg_basebackup.c:3175
    >> 3175 backupinfo->curr = fetchfile->next;
    >> Missing separate debuginfos, use: debuginfo-install
    >> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    >> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    >> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    >> (gdb) bt
    >> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >> pg_basebackup.c:3175
    >> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
    >> pg_basebackup.c:2715
    >> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
    >> pthread_create.c:301
    >> #3  0x00000039212e8c4d in clone () at
    >> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    >> (gdb)
    >>
    >> Thanks & Regards,
    >> Rajkumar Raghuwanshi
    >>
    >>
    >> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
    >> jeevan.chalke@enterprisedb.com> wrote:
    >>
    >>> Hi Asif,
    >>>
    >>>
    >>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
    >>>> patch to the latest master (b7f64c64).
    >>>> (V9 of the patches are attached).
    >>>>
    >>>
    >>> I had a further review of the patches and here are my few observations:
    >>>
    >>> 1.
    >>> +/*
    >>> + * stop_backup() - ends an online backup
    >>> + *
    >>> + * The function is called at the end of an online backup. It sends out
    >>> pg_control
    >>> + * file, optionally WAL segments and ending WAL location.
    >>> + */
    >>>
    >>> Comments seem out-dated.
    >>>
    >>> 2. With parallel jobs, maxrate is now not supported. Since we are now
    >>> asking
    >>> data in multiple threads throttling seems important here. Can you please
    >>> explain why have you disabled that?
    >>>
    >>> 3. As we are always fetching a single file and as Robert suggested, let
    >>> rename
    >>> SEND_FILES to SEND_FILE instead.
    >>>
    >>> 4. Does this work on Windows? I mean does pthread_create() work on
    >>> Windows?
    >>> I asked this as I see that pgbench has its own implementation for
    >>> pthread_create() for WIN32 but this patch doesn't.
    >>>
    >>> 5. Typos:
    >>> tablspace => tablespace
    >>> safly => safely
    >>>
    >>> 6. parallel_backup_run() needs some comments explaining the states it
    >>> goes
    >>> through PB_* states.
    >>>
    >>> 7.
    >>> +            case PB_FETCH_REL_FILES:    /* fetch files from server */
    >>> +                if (backupinfo->activeworkers == 0)
    >>> +                {
    >>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
    >>> +                    free_filelist(backupinfo);
    >>> +                }
    >>> +                break;
    >>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from server
    >>> */
    >>> +                if (backupinfo->activeworkers == 0)
    >>> +                {
    >>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
    >>> +                }
    >>> +                break;
    >>>
    >>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
    >>>
    >>> Thanks
    >>> --
    >>> Jeevan Chalke
    >>>
    >>
    
  55. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-03-27T17:33:28Z

    On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
    rajkumar.raghuwanshi@enterprisedb.com> wrote:
    
    > Hi Asif,
    >
    > While testing further I observed parallel backup is not able to take
    > backup of standby server.
    >
    > mkdir /tmp/archive_dir
    > echo "archive_mode='on'">> data/postgresql.conf
    > echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
    >
    > ./pg_ctl -D data -l logs start
    > ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
    >
    > echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
    > /tmp/slave/postgresql.conf
    > echo "restore_command='cp /tmp/archive_dir/%f %p'">>
    > /tmp/slave/postgresql.conf
    > echo "promote_trigger_file='/tmp/failover.log'">>
    > /tmp/slave/postgresql.conf
    >
    > ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
    >
    > [edb@localhost bin]$ ./psql postgres -p 5432 -c "select
    > pg_is_in_recovery();"
    >  pg_is_in_recovery
    > -------------------
    >  f
    > (1 row)
    >
    > [edb@localhost bin]$ ./psql postgres -p 5433 -c "select
    > pg_is_in_recovery();"
    >  pg_is_in_recovery
    > -------------------
    >  t
    > (1 row)
    >
    >
    >
    >
    > [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
    > pg_basebackup: error: could not list backup files: ERROR:  the standby was promoted during online backup
    > HINT:  This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
    > pg_basebackup: removing data directory "/tmp/bkp_s"
    >
    > #same is working fine without parallel backup
    > [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
    > [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
    > /tmp/bkp_s/PG_VERSION
    >
    > Thanks & Regards,
    > Rajkumar Raghuwanshi
    >
    >
    > On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
    > rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >
    >> Hi Asif,
    >>
    >> In another scenarios, bkp data is corrupted for tablespace. again this is
    >> not reproducible everytime,
    >> but If I am running the same set of commands I am getting the same error.
    >>
    >> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
    >> waiting for server to start.... done
    >> server started
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ mkdir /tmp/tblsp
    >> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp
    >> location '/tmp/tblsp';"
    >> CREATE TABLESPACE
    >> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb
    >> tablespace tblsp;"
    >> CREATE DATABASE
    >> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
    >> text);"
    >> CREATE TABLE
    >> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
    >> values ('parallel_backup with tablespace');"
    >> INSERT 0 1
    >> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
    >> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
    >> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555"
    >> start
    >> waiting for server to start.... done
    >> server started
    >> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
    >> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
    >>   oid  |  spcname   | spcowner | spcacl | spcoptions
    >> -------+------------+----------+--------+------------
    >>   1663 | pg_default |       10 |        |
    >>  16384 | tblsp      |       10 |        |
    >> (2 rows)
    >>
    >> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
    >> psql: error: could not connect to server: FATAL:
    >>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
    >> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
    >> missing.
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ ls
    >> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >> [edb@localhost bin]$ ls
    >> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >> ls: cannot access
    >> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
    >> directory
    >>
    >>
    >> Thanks & Regards,
    >> Rajkumar Raghuwanshi
    >>
    >>
    >> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
    >> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>
    >>> Hi Asif,
    >>>
    >>> On testing further, I found when taking backup with -R, pg_basebackup
    >>> crashed
    >>> this crash is not consistently reproducible.
    >>>
    >>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
    >>> text);"
    >>> CREATE TABLE
    >>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
    >>> values ('parallel_backup with -R recovery-conf');"
    >>> INSERT 0 1
    >>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
    >>> -R
    >>> Segmentation fault (core dumped)
    >>>
    >>> stack trace looks the same as it was on earlier reported crash with
    >>> tablespace.
    >>> --stack trace
    >>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
    >>> Loaded symbols for /lib64/libnss_files.so.2
    >>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
    >>> -R'.
    >>> Program terminated with signal 11, Segmentation fault.
    >>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>> pg_basebackup.c:3175
    >>> 3175 backupinfo->curr = fetchfile->next;
    >>> Missing separate debuginfos, use: debuginfo-install
    >>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    >>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    >>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    >>> (gdb) bt
    >>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>> pg_basebackup.c:3175
    >>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
    >>> pg_basebackup.c:2715
    >>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
    >>> pthread_create.c:301
    >>> #3  0x00000039212e8c4d in clone () at
    >>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    >>> (gdb)
    >>>
    >>> Thanks & Regards,
    >>> Rajkumar Raghuwanshi
    >>>
    >>>
    >>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
    >>> jeevan.chalke@enterprisedb.com> wrote:
    >>>
    >>>> Hi Asif,
    >>>>
    >>>>
    >>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
    >>>>> patch to the latest master (b7f64c64).
    >>>>> (V9 of the patches are attached).
    >>>>>
    >>>>
    >>>> I had a further review of the patches and here are my few observations:
    >>>>
    >>>> 1.
    >>>> +/*
    >>>> + * stop_backup() - ends an online backup
    >>>> + *
    >>>> + * The function is called at the end of an online backup. It sends out
    >>>> pg_control
    >>>> + * file, optionally WAL segments and ending WAL location.
    >>>> + */
    >>>>
    >>>> Comments seem out-dated.
    >>>>
    >>>
    Fixed.
    
    
    >
    >>>> 2. With parallel jobs, maxrate is now not supported. Since we are now
    >>>> asking
    >>>> data in multiple threads throttling seems important here. Can you please
    >>>> explain why have you disabled that?
    >>>>
    >>>> 3. As we are always fetching a single file and as Robert suggested, let
    >>>> rename
    >>>> SEND_FILES to SEND_FILE instead.
    >>>>
    >>>
    Yes, we are fetching a single file. However, SEND_FILES is still capable of
    fetching multiple files in one
    go, that's why the name.
    
    
    >>>> 4. Does this work on Windows? I mean does pthread_create() work on
    >>>> Windows?
    >>>> I asked this as I see that pgbench has its own implementation for
    >>>> pthread_create() for WIN32 but this patch doesn't.
    >>>>
    >>>
    The patch has been updated to add support for the Windows platform.
    
    
    >>>> 5. Typos:
    >>>> tablspace => tablespace
    >>>> safly => safely
    >>>>
    >>>> Done.
    
    
    > 6. parallel_backup_run() needs some comments explaining the states it goes
    >>>> through PB_* states.
    >>>>
    >>>> 7.
    >>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server */
    >>>> +                if (backupinfo->activeworkers == 0)
    >>>> +                {
    >>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
    >>>> +                    free_filelist(backupinfo);
    >>>> +                }
    >>>> +                break;
    >>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from server
    >>>> */
    >>>> +                if (backupinfo->activeworkers == 0)
    >>>> +                {
    >>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
    >>>> +                }
    >>>> +                break;
    >>>>
    >>> Done.
    
    
    >
    >>>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
    >>>>
    >>> Done.
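
To illustrate the review point about freeing the file list in both fetch states, here is a minimal sketch of the two transitions. The state names, `activeworkers`, `backupstate` and `free_filelist()` come from the quoted hunk; the `BackupState`, `FileNode` and `BackupInfo` layouts, the `files` field and the `parallel_backup_step()` name are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* State names follow the quoted hunk; other states are elided. */
typedef enum
{
    PB_FETCH_REL_FILES,
    PB_STOP_BACKUP,
    PB_FETCH_WAL_FILES,
    PB_BACKUP_COMPLETE
} BackupState;

/* Hypothetical file-list node; the patch's real layout is not shown. */
typedef struct FileNode
{
    struct FileNode *next;
} FileNode;

typedef struct
{
    BackupState backupstate;
    int         activeworkers;
    FileNode   *files;          /* hypothetical head of the current list */
} BackupInfo;

/* Release every node and reset the head, so a repeated call is harmless. */
void
free_filelist(BackupInfo *backupinfo)
{
    FileNode   *node;

    assert(backupinfo != NULL);
    node = backupinfo->files;
    while (node != NULL)
    {
        FileNode   *next = node->next;

        free(node);
        node = next;
    }
    backupinfo->files = NULL;
}

/* Advance one step; states not shown in the quoted hunk are left alone. */
void
parallel_backup_step(BackupInfo *backupinfo)
{
    switch (backupinfo->backupstate)
    {
        case PB_FETCH_REL_FILES:    /* fetch files from server */
            if (backupinfo->activeworkers == 0)
            {
                backupinfo->backupstate = PB_STOP_BACKUP;
                free_filelist(backupinfo);
            }
            break;
        case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
            if (backupinfo->activeworkers == 0)
            {
                backupinfo->backupstate = PB_BACKUP_COMPLETE;
                free_filelist(backupinfo);  /* same cleanup as above */
            }
            break;
        default:
            break;
    }
}
```

Resetting the head pointer inside `free_filelist()` is a design choice of this sketch: it keeps the call safe even if a later state runs the cleanup again.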
    
    The corrupted tablespace and the crash reported by Rajkumar have been fixed.
    A pointer variable was left uninitialized, which in turn caused the
    misbehavior.
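
The reported stack trace stops at `backupinfo->curr = fetchfile->next;` in `worker_get_files()`. As a rough sketch of how a worker can take the next entry from a shared list without dereferencing an unset pointer, the fragment below initializes the cursor explicitly and checks it under a mutex. The `BackupFile` and `SharedFileList` types and both function names are hypothetical; this is not the fix that was applied in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node: one file to fetch. */
typedef struct BackupFile
{
    const char *path;
    struct BackupFile *next;
} BackupFile;

/* Hypothetical shared cursor over the list, protected by a mutex. */
typedef struct
{
    pthread_mutex_t lock;
    BackupFile *curr;
} SharedFileList;

/* Set every field before any worker starts; nothing is left uninitialized. */
void
filelist_init(SharedFileList *list, BackupFile *head)
{
    assert(list != NULL);
    pthread_mutex_init(&list->lock, NULL);
    list->curr = head;
}

/* Hand out the next file, or NULL once the list is exhausted. */
BackupFile *
filelist_next(SharedFileList *list)
{
    BackupFile *fetchfile;

    pthread_mutex_lock(&list->lock);
    fetchfile = list->curr;
    if (fetchfile != NULL)          /* never dereference an empty cursor */
        list->curr = fetchfile->next;
    pthread_mutex_unlock(&list->lock);

    return fetchfile;
}
```

Each worker would loop on `filelist_next()` until it returns NULL, so an empty or fully consumed list ends the loop instead of crashing.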
    
    Attached is the updated set of patches. AFAIK, to complete parallel backup
    feature
    set, there remain three sub-features:
    
    1- Parallel backup does not work with a standby server. In parallel backup,
    the server spawns multiple processes and no shared state is maintained
    between them, so there is currently no way to tell those processes whether
    the standby was promoted during the backup, after START_BACKUP was called.
    
    2- throttling. Robert previously suggested that we implement throttling on
    the client-side.
    However, I found a previous discussion where it was advocated to be added
    to the
    backend instead[1].
    
    So I thought it better to reach a consensus before moving the throttle
    function to the client. That is why I have disabled it for the time being
    and have asked for suggestions on how to move forward.
    
    It seems to me that we have to maintain a shared state in order to support
    taking backup
    from standby. Also, there is a new feature recently committed for backup
    progress
    reporting in the backend (pg_stat_progress_basebackup). This functionality
    was recently
    added via this commit ID: e65497df. For parallel backup to update these
    stats, a shared
    state will be required.
    
    Since multiple pg_basebackup can be running at the same time, maintaining a
    shared state
    can become a little complex, unless we disallow taking multiple parallel
    backups.
    
    So proceeding on with this patch, I will be working on:
    - throttling to be implemented on the client-side.
    - adding a shared state to handle backup from the standby.
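
As one possible shape for client-side throttling, the sketch below keeps a single byte counter shared by all worker threads and tells each caller how long to pause so that the aggregate rate stays at or below a limit. Everything here is an assumption made for illustration: the `ClientThrottle` type, the function names, and the choice of bytes per second with the current time passed in as seconds. It is not code from the patch or from the backend.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical throttle state shared by every worker thread. */
typedef struct
{
    pthread_mutex_t lock;
    double      maxrate;        /* allowed bytes per second, all workers */
    double      start;          /* time the transfer began, in seconds */
    uint64_t    sent;           /* bytes received so far by all workers */
} ClientThrottle;

void
throttle_init(ClientThrottle *t, double maxrate, double now)
{
    assert(maxrate > 0);
    pthread_mutex_init(&t->lock, NULL);
    t->maxrate = maxrate;
    t->start = now;
    t->sent = 0;
}

/*
 * Account for nbytes just received and return the number of seconds the
 * calling worker should sleep.  The total so far may not arrive earlier
 * than start + sent / maxrate; if we are ahead of that, wait out the
 * difference.
 */
double
throttle_account(ClientThrottle *t, uint64_t nbytes, double now)
{
    double      due;
    double      delay;

    pthread_mutex_lock(&t->lock);
    t->sent += nbytes;
    due = t->start + (double) t->sent / t->maxrate;
    pthread_mutex_unlock(&t->lock);

    delay = due - now;
    return delay > 0 ? delay : 0;
}
```

A worker would call `throttle_account()` after each chunk it receives and sleep for the returned time. Because the budget is measured from the start of the transfer, an idle period is followed by a burst; a stricter limiter would cap that accumulated credit.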
    
    
    
    [1]
    https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  56. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-30T10:43:47Z

    Thanks Asif,
    
    I have re-verified the reported issues. Except for standby backup, the others are fixed.
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
    On Fri, Mar 27, 2020 at 11:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
    > rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >
    >> Hi Asif,
    >>
    >> While testing further I observed parallel backup is not able to take
    >> backup of standby server.
    >>
    >> mkdir /tmp/archive_dir
    >> echo "archive_mode='on'">> data/postgresql.conf
    >> echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
    >>
    >> ./pg_ctl -D data -l logs start
    >> ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
    >>
    >> echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
    >> /tmp/slave/postgresql.conf
    >> echo "restore_command='cp /tmp/archive_dir/%f %p'">>
    >> /tmp/slave/postgresql.conf
    >> echo "promote_trigger_file='/tmp/failover.log'">>
    >> /tmp/slave/postgresql.conf
    >>
    >> ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
    >>
    >> [edb@localhost bin]$ ./psql postgres -p 5432 -c "select
    >> pg_is_in_recovery();"
    >>  pg_is_in_recovery
    >> -------------------
    >>  f
    >> (1 row)
    >>
    >> [edb@localhost bin]$ ./psql postgres -p 5433 -c "select
    >> pg_is_in_recovery();"
    >>  pg_is_in_recovery
    >> -------------------
    >>  t
    >> (1 row)
    >>
    >>
    >>
    >>
    >> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
    >> pg_basebackup: error: could not list backup files: ERROR:  the standby was promoted during online backup
    >> HINT:  This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
    >> pg_basebackup: removing data directory "/tmp/bkp_s"
    >>
    >> #same is working fine without parallel backup
    >> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
    >> [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
    >> /tmp/bkp_s/PG_VERSION
    >>
    >> Thanks & Regards,
    >> Rajkumar Raghuwanshi
    >>
    >>
    >> On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
    >> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>
    >>> Hi Asif,
    >>>
    >>> In another scenarios, bkp data is corrupted for tablespace. again this
    >>> is not reproducible everytime,
    >>> but If I am running the same set of commands I am getting the same error.
    >>>
    >>> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
    >>> waiting for server to start.... done
    >>> server started
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$ mkdir /tmp/tblsp
    >>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace
    >>> tblsp location '/tmp/tblsp';"
    >>> CREATE TABLESPACE
    >>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb
    >>> tablespace tblsp;"
    >>> CREATE DATABASE
    >>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
    >>> text);"
    >>> CREATE TABLE
    >>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
    >>> values ('parallel_backup with tablespace');"
    >>> INSERT 0 1
    >>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
    >>> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
    >>> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555"
    >>> start
    >>> waiting for server to start.... done
    >>> server started
    >>> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
    >>> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
    >>>   oid  |  spcname   | spcowner | spcacl | spcoptions
    >>> -------+------------+----------+--------+------------
    >>>   1663 | pg_default |       10 |        |
    >>>  16384 | tblsp      |       10 |        |
    >>> (2 rows)
    >>>
    >>> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
    >>> psql: error: could not connect to server: FATAL:
    >>>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
    >>> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
    >>> missing.
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$ ls
    >>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>> [edb@localhost bin]$ ls
    >>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>> ls: cannot access
    >>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
    >>> directory
    >>>
    >>>
    >>> Thanks & Regards,
    >>> Rajkumar Raghuwanshi
    >>>
    >>>
    >>> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
    >>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>>
    >>>> Hi Asif,
    >>>>
    >>>> On testing further, I found when taking backup with -R, pg_basebackup
    >>>> crashed
    >>>> this crash is not consistently reproducible.
    >>>>
    >>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
    >>>> text);"
    >>>> CREATE TABLE
    >>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
    >>>> values ('parallel_backup with -R recovery-conf');"
    >>>> INSERT 0 1
    >>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
    >>>> -R
    >>>> Segmentation fault (core dumped)
    >>>>
    >>>> stack trace looks the same as it was on earlier reported crash with
    >>>> tablespace.
    >>>> --stack trace
    >>>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
    >>>> Loaded symbols for /lib64/libnss_files.so.2
    >>>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D
    >>>> /tmp/test_bkp/bkp -R'.
    >>>> Program terminated with signal 11, Segmentation fault.
    >>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>>> pg_basebackup.c:3175
    >>>> 3175 backupinfo->curr = fetchfile->next;
    >>>> Missing separate debuginfos, use: debuginfo-install
    >>>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    >>>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    >>>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    >>>> (gdb) bt
    >>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>>> pg_basebackup.c:3175
    >>>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
    >>>> pg_basebackup.c:2715
    >>>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
    >>>> pthread_create.c:301
    >>>> #3  0x00000039212e8c4d in clone () at
    >>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    >>>> (gdb)
    >>>>
    >>>> Thanks & Regards,
    >>>> Rajkumar Raghuwanshi
    >>>>
    >>>>
    >>>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
    >>>> jeevan.chalke@enterprisedb.com> wrote:
    >>>>
    >>>>> Hi Asif,
    >>>>>
    >>>>>
    >>>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
    >>>>>> patch to the latest master (b7f64c64).
    >>>>>> (V9 of the patches are attached).
    >>>>>>
    >>>>>
    >>>>> I had a further review of the patches and here are my few observations:
    >>>>>
    >>>>> 1.
    >>>>> +/*
    >>>>> + * stop_backup() - ends an online backup
    >>>>> + *
    >>>>> + * The function is called at the end of an online backup. It sends
    >>>>> out pg_control
    >>>>> + * file, optionally WAL segments and ending WAL location.
    >>>>> + */
    >>>>>
    >>>>> Comments seem out-dated.
    >>>>>
    >>>>
    > Fixed.
    >
    >
    >>
    >>>>> 2. With parallel jobs, maxrate is now not supported. Since we are now
    >>>>> asking
    >>>>> data in multiple threads throttling seems important here. Can you
    >>>>> please
    >>>>> explain why have you disabled that?
    >>>>>
    >>>>> 3. As we are always fetching a single file and as Robert suggested,
    >>>>> let rename
    >>>>> SEND_FILES to SEND_FILE instead.
    >>>>>
    >>>>
    > Yes, we are fetching a single file. However, SEND_FILES is still capable
    > of fetching multiple files in one
    > go, that's why the name.
    >
    >
    >>>>> 4. Does this work on Windows? I mean does pthread_create() work on
    >>>>> Windows?
    >>>>> I asked this as I see that pgbench has its own implementation for
    >>>>> pthread_create() for WIN32 but this patch doesn't.
    >>>>>
    >>>>
    > patch is updated to add support for the Windows platform.
    >
    >
    >>>>> 5. Typos:
    >>>>> tablspace => tablespace
    >>>>> safly => safely
    >>>>>
    >>>>> Done.
    >
    >
    >> 6. parallel_backup_run() needs some comments explaining the states it goes
    >>>>> through PB_* states.
    >>>>>
    >>>>> 7.
    >>>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server */
    >>>>> +                if (backupinfo->activeworkers == 0)
    >>>>> +                {
    >>>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
    >>>>> +                    free_filelist(backupinfo);
    >>>>> +                }
    >>>>> +                break;
    >>>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from
    >>>>> server */
    >>>>> +                if (backupinfo->activeworkers == 0)
    >>>>> +                {
    >>>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
    >>>>> +                }
    >>>>> +                break;
    >>>>>
    >>>> Done.
    >
    >
    >>
    >>>>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
    >>>>>
    >>>> Done.
    >
    > The corrupted tablespace and crash, reported by Rajkumar, have been fixed.
    > A pointer
    > variable remained uninitialized which in turn caused the system to
    > misbehave.
    >
    > Attached is the updated set of patches. AFAIK, to complete parallel backup
    > feature
    > set, there remain three sub-features:
    >
    > 1- parallel backup does not work with a standby server. In parallel
    > backup, the server
    > spawns multiple processes and there is no shared state being maintained.
    > So currently,
    > no way to tell multiple processes if the standby was promoted during the
    > backup since
    > the START_BACKUP was called.
    >
    > 2- throttling. Robert previously suggested that we implement throttling on
    > the client-side.
    > However, I found a previous discussion where it was advocated to be added
    > to the
    > backend instead[1].
    >
    > So, it was better to have a consensus before moving the throttle function
    > to the client.
    > That’s why for the time being I have disabled it and have asked for
    > suggestions on it
    > to move forward.
    >
    > It seems to me that we have to maintain a shared state in order to support
    > taking backup
    > from standby. Also, there is a new feature recently committed for backup
    > progress
    > reporting in the backend (pg_stat_progress_basebackup). This functionality
    > was recently
    > added via this commit ID: e65497df. For parallel backup to update these
    > stats, a shared
    > state will be required.
    >
    > Since multiple pg_basebackup can be running at the same time, maintaining
    > a shared state
    > can become a little complex, unless we disallow taking multiple parallel
    > backups.
    >
    > So proceeding on with this patch, I will be working on:
    > - throttling to be implemented on the client-side.
    > - adding a shared state to handle backup from the standby.
    >
    >
    >
    > [1]
    > https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
    >
    >
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
  57. Re: WIP/PoC for parallel backup

    Ahsan Hadi <ahsan.hadi@gmail.com> — 2020-03-30T12:58:18Z

    On Mon, Mar 30, 2020 at 3:44 PM Rajkumar Raghuwanshi <
    rajkumar.raghuwanshi@enterprisedb.com> wrote:
    
    > Thanks Asif,
    >
    > I have re-verified reported issue. expect standby backup, others are fixed.
    >
    
    Yes. As Asif mentioned, he is working on the standby issue and on adding
    bandwidth throttling to parallel backup.
    
    It would be good to get feedback from Robert on Asif's previous email
    regarding the design considerations for standby server support and
    throttling. I believe Asif has addressed all the other points Robert raised
    in this thread, so it would be good to hear about any concerns that remain
    unaddressed.
    
    Thanks,
    
    -- Ahsan
    
    
    > Thanks & Regards,
    > Rajkumar Raghuwanshi
    >
    >
    > On Fri, Mar 27, 2020 at 11:04 PM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    >
    >>
    >>
    >> On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
    >> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>
    >>> Hi Asif,
    >>>
    >>> While testing further I observed parallel backup is not able to take
    >>> backup of standby server.
    >>>
    >>> mkdir /tmp/archive_dir
    >>> echo "archive_mode='on'">> data/postgresql.conf
    >>> echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
    >>>
    >>> ./pg_ctl -D data -l logs start
    >>> ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
    >>>
    >>> echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
    >>> /tmp/slave/postgresql.conf
    >>> echo "restore_command='cp /tmp/archive_dir/%f %p'">>
    >>> /tmp/slave/postgresql.conf
    >>> echo "promote_trigger_file='/tmp/failover.log'">>
    >>> /tmp/slave/postgresql.conf
    >>>
    >>> ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
    >>>
    >>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "select
    >>> pg_is_in_recovery();"
    >>>  pg_is_in_recovery
    >>> -------------------
    >>>  f
    >>> (1 row)
    >>>
    >>> [edb@localhost bin]$ ./psql postgres -p 5433 -c "select
    >>> pg_is_in_recovery();"
    >>>  pg_is_in_recovery
    >>> -------------------
    >>>  t
    >>> (1 row)
    >>>
    >>>
    >>>
    >>>
    >>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
    >>> pg_basebackup: error: could not list backup files: ERROR:  the standby was promoted during online backup
    >>> HINT:  This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
    >>> pg_basebackup: removing data directory "/tmp/bkp_s"
    >>>
    >>> #same is working fine without parallel backup
    >>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
    >>> [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
    >>> /tmp/bkp_s/PG_VERSION
    >>>
    >>> Thanks & Regards,
    >>> Rajkumar Raghuwanshi
    >>>
    >>>
    >>> On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
    >>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>>
    >>>> Hi Asif,
    >>>>
    >>>> In another scenarios, bkp data is corrupted for tablespace. again this
    >>>> is not reproducible everytime,
    >>>> but If I am running the same set of commands I am getting the same
    >>>> error.
    >>>>
    >>>> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
    >>>> waiting for server to start.... done
    >>>> server started
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$ mkdir /tmp/tblsp
    >>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace
    >>>> tblsp location '/tmp/tblsp';"
    >>>> CREATE TABLESPACE
    >>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database
    >>>> testdb tablespace tblsp;"
    >>>> CREATE DATABASE
    >>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
    >>>> text);"
    >>>> CREATE TABLE
    >>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
    >>>> values ('parallel_backup with tablespace');"
    >>>> INSERT 0 1
    >>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
    >>>> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
    >>>> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p
    >>>> 5555" start
    >>>> waiting for server to start.... done
    >>>> server started
    >>>> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
    >>>> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
    >>>>   oid  |  spcname   | spcowner | spcacl | spcoptions
    >>>> -------+------------+----------+--------+------------
    >>>>   1663 | pg_default |       10 |        |
    >>>>  16384 | tblsp      |       10 |        |
    >>>> (2 rows)
    >>>>
    >>>> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
    >>>> psql: error: could not connect to server: FATAL:
    >>>>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
    >>>> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
    >>>> missing.
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$ ls
    >>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>>> [edb@localhost bin]$ ls
    >>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>>> ls: cannot access
    >>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
    >>>> directory
    >>>>
    >>>>
    >>>> Thanks & Regards,
    >>>> Rajkumar Raghuwanshi
    >>>>
    >>>>
    >>>> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
    >>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>>>
    >>>>> Hi Asif,
    >>>>>
    >>>>> On testing further, I found when taking backup with -R, pg_basebackup
    >>>>> crashed
    >>>>> this crash is not consistently reproducible.
    >>>>>
    >>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
    >>>>> text);"
    >>>>> CREATE TABLE
    >>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
    >>>>> values ('parallel_backup with -R recovery-conf');"
    >>>>> INSERT 0 1
    >>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D
    >>>>> /tmp/test_bkp/bkp -R
    >>>>> Segmentation fault (core dumped)
    >>>>>
    >>>>> stack trace looks the same as it was on earlier reported crash with
    >>>>> tablespace.
    >>>>> --stack trace
    >>>>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
    >>>>> Loaded symbols for /lib64/libnss_files.so.2
    >>>>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D
    >>>>> /tmp/test_bkp/bkp -R'.
    >>>>> Program terminated with signal 11, Segmentation fault.
    >>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>>>> pg_basebackup.c:3175
    >>>>> 3175 backupinfo->curr = fetchfile->next;
    >>>>> Missing separate debuginfos, use: debuginfo-install
    >>>>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    >>>>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    >>>>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    >>>>> (gdb) bt
    >>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>>>> pg_basebackup.c:3175
    >>>>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
    >>>>> pg_basebackup.c:2715
    >>>>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
    >>>>> pthread_create.c:301
    >>>>> #3  0x00000039212e8c4d in clone () at
    >>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    >>>>> (gdb)
    >>>>>
    >>>>> Thanks & Regards,
    >>>>> Rajkumar Raghuwanshi
    >>>>>
    >>>>>
    >>>>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
    >>>>> jeevan.chalke@enterprisedb.com> wrote:
    >>>>>
    >>>>>> Hi Asif,
    >>>>>>
    >>>>>>
    >>>>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
    >>>>>>> patch to the latest master (b7f64c64).
    >>>>>>> (V9 of the patches are attached).
    >>>>>>>
    >>>>>>
    >>>>>> I had a further review of the patches and here are my few
    >>>>>> observations:
    >>>>>>
    >>>>>> 1.
    >>>>>> +/*
    >>>>>> + * stop_backup() - ends an online backup
    >>>>>> + *
    >>>>>> + * The function is called at the end of an online backup. It sends
    >>>>>> out pg_control
    >>>>>> + * file, optionally WAL segments and ending WAL location.
    >>>>>> + */
    >>>>>>
    >>>>>> Comments seem out-dated.
    >>>>>>
    >>>>>
    >> Fixed.
    >>
    >>
    >>>
    >>>>>> 2. With parallel jobs, maxrate is now not supported. Since we are now
    >>>>>> asking
    >>>>>> data in multiple threads throttling seems important here. Can you
    >>>>>> please
    >>>>>> explain why have you disabled that?
    >>>>>>
    >>>>>> 3. As we are always fetching a single file and as Robert suggested,
    >>>>>> let rename
    >>>>>> SEND_FILES to SEND_FILE instead.
    >>>>>>
    >>>>>
    >> Yes, we are fetching a single file. However, SEND_FILES is still capable
    >> of fetching multiple files in one
    >> go, that's why the name.
    >>
    >>
    >>>>>> 4. Does this work on Windows? I mean does pthread_create() work on
    >>>>>> Windows?
    >>>>>> I asked this as I see that pgbench has its own implementation for
    >>>>>> pthread_create() for WIN32 but this patch doesn't.
    >>>>>>
    >>>>>
    >> patch is updated to add support for the Windows platform.
    >>
    >>
    >>>>>> 5. Typos:
    >>>>>> tablspace => tablespace
    >>>>>> safly => safely
    >>>>>>
    >>>>>> Done.
    >>
    >>
    >>> 6. parallel_backup_run() needs some comments explaining the states it
    >>>>>> goes
    >>>>>> through PB_* states.
    >>>>>>
    >>>>>> 7.
    >>>>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server */
    >>>>>> +                if (backupinfo->activeworkers == 0)
    >>>>>> +                {
    >>>>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
    >>>>>> +                    free_filelist(backupinfo);
    >>>>>> +                }
    >>>>>> +                break;
    >>>>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from
    >>>>>> server */
    >>>>>> +                if (backupinfo->activeworkers == 0)
    >>>>>> +                {
    >>>>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
    >>>>>> +                }
    >>>>>> +                break;
    >>>>>>
    >>>>> Done.
    >>
    >>
    >>>
    >>>>>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
    >>>>>>
    >>>>> Done.
    >>
    >> The corrupted tablespace and crash, reported by Rajkumar, have been
    >> fixed. A pointer
    >> variable remained uninitialized which in turn caused the system to
    >> misbehave.
    >>
    >> Attached is the updated set of patches. AFAIK, to complete parallel
    >> backup feature
    >> set, there remain three sub-features:
    >>
    >> 1- parallel backup does not work with a standby server. In parallel
    >> backup, the server
    >> spawns multiple processes and there is no shared state being maintained.
    >> So currently,
    >> no way to tell multiple processes if the standby was promoted during the
    >> backup since
    >> the START_BACKUP was called.
    >>
    >> 2- throttling. Robert previously suggested that we implement
    >> throttling on the client-side.
    >> However, I found a previous discussion where it was advocated to be added
    >> to the
    >> backend instead[1].
    >>
    >> So, it was better to have a consensus before moving the throttle function
    >> to the client.
    >> That’s why for the time being I have disabled it and have asked for
    >> suggestions on it
    >> to move forward.
    >>
    >> It seems to me that we have to maintain a shared state in order to
    >> support taking backup
    >> from standby. Also, there is a new feature recently committed for backup
    >> progress
    >> reporting in the backend (pg_stat_progress_basebackup). This
    >> functionality was recently
    >> added via this commit ID: e65497df. For parallel backup to update these
    >> stats, a shared
    >> state will be required.
    >>
    >> Since multiple pg_basebackup can be running at the same time, maintaining
    >> a shared state
    >> can become a little complex, unless we disallow taking multiple parallel
    >> backups.
    >>
    >> So proceeding on with this patch, I will be working on:
    >> - throttling to be implemented on the client-side.
    >> - adding a shared state to handle backup from the standby.
    >>
    >>
    >>
    >> [1]
    >> https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
    >>
    >>
    >> --
    >> Asif Rehman
    >> Highgo Software (Canada/China/Pakistan)
    >> URL : www.highgo.ca
    >>
    >>
    
    -- 
    Highgo Software (Canada/China/Pakistan)
    URL : http://www.highgo.ca
    ADDR: 10318 WHALLEY BLVD, Surrey, BC
    EMAIL: mailto: ahsan.hadi@highgo.ca
    
  58. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-02T09:57:52Z

    Hi Asif,
    
    My colleague Kashif Zeeshan reported an issue off-list; I am posting it
    here, please take a look.
    
    When two backups are executed at the same time, some workers get a FATAL
    error due to max_wal_senders, but instead of exiting, the backup is reported
    as completed. When we then tried to start the server from the backup
    cluster, it failed with an error.
    
    [edb@localhost bin]$ ./pgbench -i -s 200 -h localhost -p 5432 postgres
    [edb@localhost bin]$ ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup/
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/C2000270 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_57849"
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: backup worker (2) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (3) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (4) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (5) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (6) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (7) created
    pg_basebackup: write-ahead log end point: 0/C3000050
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: syncing data to disk ...
    pg_basebackup: base backup completed
    [edb@localhost bin]$ ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup1/
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/C20001C0 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_57848"
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: backup worker (2) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (3) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (4) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (5) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (6) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    pg_basebackup: backup worker (7) created
    pg_basebackup: write-ahead log end point: 0/C2000348
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: syncing data to disk ...
    pg_basebackup: base backup completed
    
    [edb@localhost bin]$ ./pg_ctl -D /home/edb/Desktop/backup1/  -o "-p 5438"
    start
    pg_ctl: directory "/home/edb/Desktop/backup1" is not a database cluster
    directory
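    
    The behaviour reported above (workers fail to connect, yet the backup is
    reported as completed) suggests that the client could verify that every
    worker obtained a connection before it starts fetching files, and abort
    otherwise. A minimal sketch of such a check follows. WorkerSlot and
    all_workers_connected() are hypothetical names used only for illustration;
    they are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker record; the patch's real structure may differ. */
typedef struct WorkerSlot
{
    bool        connected;      /* true once the worker's connection succeeded */
} WorkerSlot;

/*
 * Return true only if every requested worker obtained a connection.
 * On failure, *first_failed (if not NULL) receives the index of the first
 * worker that could not connect; on success it is left untouched.
 */
bool
all_workers_connected(const WorkerSlot *workers, size_t nworkers,
                      size_t *first_failed)
{
    assert(workers != NULL || nworkers == 0);

    for (size_t i = 0; i < nworkers; i++)
    {
        if (!workers[i].connected)
        {
            if (first_failed != NULL)
                *first_failed = i;
            return false;
        }
    }
    return true;
}
```

    A caller would report the failing worker index and exit with a non-zero
    status rather than continue, so that an incomplete backup directory is
    never announced as a completed base backup.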
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
    On Mon, Mar 30, 2020 at 6:28 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
    
    >
    >
    > On Mon, Mar 30, 2020 at 3:44 PM Rajkumar Raghuwanshi <
    > rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >
    >> Thanks Asif,
    >>
    >> I have re-verified the reported issues. Except for the standby backup, the
    >> others are fixed.
    >>
    >
    > Yes. As Asif mentioned, he is working on the standby issue and on adding
    > bandwidth throttling functionality to parallel backup.
    >
    > It would be good to get some feedback from Robert on Asif's previous email
    > regarding the design considerations for standby server support and
    > throttling. I believe all the other points mentioned by Robert in this
    > thread have been addressed by Asif, so it would be good to hear about any
    > other concerns that are not addressed.
    >
    > Thanks,
    >
    > -- Ahsan
    >
    >
    >> Thanks & Regards,
    >> Rajkumar Raghuwanshi
    >>
    >>
    >> On Fri, Mar 27, 2020 at 11:04 PM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >>
    >>>
    >>>
    >>> On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
    >>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>>
    >>>> Hi Asif,
    >>>>
    >>>> While testing further I observed parallel backup is not able to take
    >>>> backup of standby server.
    >>>>
    >>>> mkdir /tmp/archive_dir
    >>>> echo "archive_mode='on'">> data/postgresql.conf
    >>>> echo "archive_command='cp %p /tmp/archive_dir/%f'">>
    >>>> data/postgresql.conf
    >>>>
    >>>> ./pg_ctl -D data -l logs start
    >>>> ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
    >>>>
    >>>> echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
    >>>> /tmp/slave/postgresql.conf
    >>>> echo "restore_command='cp /tmp/archive_dir/%f %p'">>
    >>>> /tmp/slave/postgresql.conf
    >>>> echo "promote_trigger_file='/tmp/failover.log'">>
    >>>> /tmp/slave/postgresql.conf
    >>>>
    >>>> ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
    >>>>
    >>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "select
    >>>> pg_is_in_recovery();"
    >>>>  pg_is_in_recovery
    >>>> -------------------
    >>>>  f
    >>>> (1 row)
    >>>>
    >>>> [edb@localhost bin]$ ./psql postgres -p 5433 -c "select
    >>>> pg_is_in_recovery();"
    >>>>  pg_is_in_recovery
    >>>> -------------------
    >>>>  t
    >>>> (1 row)
    >>>>
    >>>>
    >>>>
    >>>>
    >>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
    >>>> pg_basebackup: error: could not list backup files: ERROR:  the standby was
    >>>> promoted during online backup
    >>>> HINT:  This means that the backup being taken is corrupt and should not be
    >>>> used. Try taking another online backup.
    >>>> pg_basebackup: removing data directory "/tmp/bkp_s"
    >>>>
    >>>> #same is working fine without parallel backup
    >>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
    >>>> [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
    >>>> /tmp/bkp_s/PG_VERSION
    >>>>
    >>>> Thanks & Regards,
    >>>> Rajkumar Raghuwanshi
    >>>>
    >>>>
    >>>> On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
    >>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>>>
    >>>>> Hi Asif,
    >>>>>
    >>>>> In another scenario, the backup data is corrupted for a tablespace. Again,
    >>>>> this is not reproducible every time,
    >>>>> but if I run the same set of commands I get the same
    >>>>> error.
    >>>>>
    >>>>> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
    >>>>> waiting for server to start.... done
    >>>>> server started
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$ mkdir /tmp/tblsp
    >>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace
    >>>>> tblsp location '/tmp/tblsp';"
    >>>>> CREATE TABLESPACE
    >>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database
    >>>>> testdb tablespace tblsp;"
    >>>>> CREATE DATABASE
    >>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl
    >>>>> (a text);"
    >>>>> CREATE TABLE
    >>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
    >>>>> values ('parallel_backup with tablespace');"
    >>>>> INSERT 0 1
    >>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
    >>>>> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
    >>>>> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p
    >>>>> 5555" start
    >>>>> waiting for server to start.... done
    >>>>> server started
    >>>>> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
    >>>>> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
    >>>>>   oid  |  spcname   | spcowner | spcacl | spcoptions
    >>>>> -------+------------+----------+--------+------------
    >>>>>   1663 | pg_default |       10 |        |
    >>>>>  16384 | tblsp      |       10 |        |
    >>>>> (2 rows)
    >>>>>
    >>>>> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
    >>>>> psql: error: could not connect to server: FATAL:
    >>>>>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
    >>>>> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
    >>>>> missing.
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$ ls
    >>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>>>> [edb@localhost bin]$ ls
    >>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>>>> ls: cannot access
    >>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
    >>>>> directory
    >>>>>
    >>>>>
    >>>>> Thanks & Regards,
    >>>>> Rajkumar Raghuwanshi
    >>>>>
    >>>>>
    >>>>> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
    >>>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>>>>
    >>>>>> Hi Asif,
    >>>>>>
    >>>>>> On testing further, I found that pg_basebackup crashed when taking a
    >>>>>> backup with -R.
    >>>>>> This crash is not consistently reproducible.
    >>>>>>
    >>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test
    >>>>>> (a text);"
    >>>>>> CREATE TABLE
    >>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
    >>>>>> values ('parallel_backup with -R recovery-conf');"
    >>>>>> INSERT 0 1
    >>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D
    >>>>>> /tmp/test_bkp/bkp -R
    >>>>>> Segmentation fault (core dumped)
    >>>>>>
    >>>>>> stack trace looks the same as it was on earlier reported crash with
    >>>>>> tablespace.
    >>>>>> --stack trace
    >>>>>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
    >>>>>> Loaded symbols for /lib64/libnss_files.so.2
    >>>>>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D
    >>>>>> /tmp/test_bkp/bkp -R'.
    >>>>>> Program terminated with signal 11, Segmentation fault.
    >>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>>>>> pg_basebackup.c:3175
    >>>>>> 3175 backupinfo->curr = fetchfile->next;
    >>>>>> Missing separate debuginfos, use: debuginfo-install
    >>>>>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    >>>>>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    >>>>>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    >>>>>> (gdb) bt
    >>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>>>>> pg_basebackup.c:3175
    >>>>>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
    >>>>>> pg_basebackup.c:2715
    >>>>>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
    >>>>>> pthread_create.c:301
    >>>>>> #3  0x00000039212e8c4d in clone () at
    >>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    >>>>>> (gdb)
    >>>>>>
    >>>>>> Thanks & Regards,
    >>>>>> Rajkumar Raghuwanshi
    >>>>>>
    >>>>>>
    >>>>>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
    >>>>>> jeevan.chalke@enterprisedb.com> wrote:
    >>>>>>
    >>>>>>> Hi Asif,
    >>>>>>>
    >>>>>>>
    >>>>>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
    >>>>>>>> patch to the latest master (b7f64c64).
    >>>>>>>> (V9 of the patches are attached).
    >>>>>>>>
    >>>>>>>
    >>>>>>> I had a further review of the patches and here are my few
    >>>>>>> observations:
    >>>>>>>
    >>>>>>> 1.
    >>>>>>> +/*
    >>>>>>> + * stop_backup() - ends an online backup
    >>>>>>> + *
    >>>>>>> + * The function is called at the end of an online backup. It sends
    >>>>>>> out pg_control
    >>>>>>> + * file, optionally WAL segments and ending WAL location.
    >>>>>>> + */
    >>>>>>>
    >>>>>>> Comments seem out-dated.
    >>>>>>>
    >>>>>>
    >>> Fixed.
    >>>
    >>>
    >>>>
    >>>>>>> 2. With parallel jobs, maxrate is no longer supported. Since we are
    >>>>>>> now requesting
    >>>>>>> data in multiple threads, throttling seems important here. Can you
    >>>>>>> please
    >>>>>>> explain why you have disabled it?
    >>>>>>>
    >>>>>>> 3. As we are always fetching a single file, and as Robert suggested,
    >>>>>>> let's rename
    >>>>>>> SEND_FILES to SEND_FILE instead.
    >>>>>>>
    >>>>>>
    >>> Yes, we are fetching a single file. However, SEND_FILES is still capable
    >>> of fetching multiple files in one
    >>> go, that's why the name.
    >>>
    >>>
    >>>>>>> 4. Does this work on Windows? I mean does pthread_create() work on
    >>>>>>> Windows?
    >>>>>>> I asked this as I see that pgbench has its own implementation for
    >>>>>>> pthread_create() for WIN32 but this patch doesn't.
    >>>>>>>
    >>>>>>
    >>> The patch has been updated to add support for the Windows platform.
    >>>
    >>>
    >>>>>>> 5. Typos:
    >>>>>>> tablspace => tablespace
    >>>>>>> safly => safely
    >>>>>>>
    >>> Done.
    >>>
    >>>
    >>>>>>> 6. parallel_backup_run() needs some comments explaining the PB_*
    >>>>>>> states it
    >>>>>>> goes through.
    >>>>>>>
    >>>>>>> 7.
    >>>>>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server
    >>>>>>> */
    >>>>>>> +                if (backupinfo->activeworkers == 0)
    >>>>>>> +                {
    >>>>>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
    >>>>>>> +                    free_filelist(backupinfo);
    >>>>>>> +                }
    >>>>>>> +                break;
    >>>>>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from
    >>>>>>> server */
    >>>>>>> +                if (backupinfo->activeworkers == 0)
    >>>>>>> +                {
    >>>>>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
    >>>>>>> +                }
    >>>>>>> +                break;
    >>>>>>>
    >>> Done.
    >>>
    >>>
    >>>>
    >>>>>>> Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?
    >>>>>>>
    >>> Done.
    >>>
    >>> The corrupted tablespace and crash, reported by Rajkumar, have been
    >>> fixed. A pointer
    >>> variable remained uninitialized which in turn caused the system to
    >>> misbehave.
    >>>
    >>> Attached is the updated set of patches. AFAIK, to complete parallel
    >>> backup feature
    >>> set, there remain three sub-features:
    >>>
    >>> 1- parallel backup does not work with a standby server. In parallel
    >>> backup, the server
    >>> spawns multiple processes and there is no shared state being maintained.
    >>> So currently,
    >>> no way to tell multiple processes if the standby was promoted during the
    >>> backup since
    >>> the START_BACKUP was called.
    >>>
    >>> 2- throttling. Robert previously suggested that we implement
    >>> throttling on the client-side.
    >>> However, I found a previous discussion where it was advocated to be
    >>> added to the
    >>> backend instead[1].
    >>>
    >>> So, it was better to have a consensus before moving the throttle
    >>> function to the client.
    >>> That’s why for the time being I have disabled it and have asked for
    >>> suggestions on it
    >>> to move forward.
    >>>
    >>> It seems to me that we have to maintain a shared state in order to
    >>> support taking backup
    >>> from standby. Also, there is a new feature recently committed for backup
    >>> progress
    >>> reporting in the backend (pg_stat_progress_basebackup). This
    >>> functionality was recently
    >>> added via this commit ID: e65497df. For parallel backup to update these
    >>> stats, a shared
    >>> state will be required.
    >>>
    >>> Since multiple pg_basebackup can be running at the same time,
    >>> maintaining a shared state
    >>> can become a little complex, unless we disallow taking multiple parallel
    >>> backups.
    >>>
    >>> So proceeding on with this patch, I will be working on:
    >>> - throttling to be implemented on the client-side.
    >>> - adding a shared state to handle backup from the standby.
    >>>
    >>>
    >>>
    >>> [1]
    >>> https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
    >>>
    >>>
    >>> --
    >>> Asif Rehman
    >>> Highgo Software (Canada/China/Pakistan)
    >>> URL : www.highgo.ca
    >>>
    >>>
    >
    > --
    > Highgo Software (Canada/China/Pakistan)
    > URL : http://www.highgo.ca
    > ADDR: 10318 WHALLEY BLVD, Surrey, BC
    > EMAIL: mailto: ahsan.hadi@highgo.ca
    >
    
  59. Re: WIP/PoC for parallel backup

    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-02T11:29:47Z

    Hi Asif
    
    The backup failed with the error "error: could not connect to server: could
    not look up local user ID 1000: Too many open files" when
    max_wal_senders was set to 2000.
    The errors were generated for the workers starting from backup worker 1017.
    Please note that the backup directory was also not cleaned up after the
    backup failed.
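    
    The failure described above starts near worker 1017, which would be
    consistent with a per-process file descriptor limit of about 1024, although
    the limit in effect was not recorded in this report. One possible client-side
    guard is to clamp the requested job count against RLIMIT_NOFILE before
    creating any workers. The sketch below is only an illustration: the function
    names, FDS_PER_WORKER and RESERVED_FDS are assumptions, not values or code
    from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/*
 * Assumed budget: each worker needs one socket plus one open output file,
 * and a few descriptors stay reserved for stdio, the WAL receiver, etc.
 * Both numbers are illustrative guesses.
 */
#define FDS_PER_WORKER 2
#define RESERVED_FDS   16

/* Clamp requested_jobs so that the workers fit into fd_limit descriptors. */
int
clamp_jobs_to_fd_limit(int requested_jobs, long fd_limit)
{
    long        max_jobs;

    assert(requested_jobs >= 1);

    max_jobs = (fd_limit - RESERVED_FDS) / FDS_PER_WORKER;
    if (max_jobs < 1)
        max_jobs = 1;           /* always allow a non-parallel backup */

    return (requested_jobs < max_jobs) ? requested_jobs : (int) max_jobs;
}

/* Same as above, using the soft RLIMIT_NOFILE of the current process. */
int
clamp_jobs_to_process_limit(int requested_jobs)
{
    struct rlimit rl;

    /* If the limit is unknown or effectively unlimited, do not clamp. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 ||
        rl.rlim_cur == RLIM_INFINITY ||
        rl.rlim_cur > (rlim_t) LONG_MAX)
        return requested_jobs;

    return clamp_jobs_to_fd_limit(requested_jobs, (long) rl.rlim_cur);
}
```

    Whether to clamp silently or to reject an oversized -j value with an error
    is a design choice; rejecting it would make the problem visible to the user
    before any backup files are written.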
    
    
    Steps
    =======
    1) Generate data in DB
     ./pgbench -i -s 600 -h localhost  -p 5432 postgres
    2) Set max_wal_senders = 2000 in postgresql.conf.
    3) Generate the backup
    
    
    [edb@localhost bin]$
    [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
     /home/edb/Desktop/backup/
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 1/F1000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_58692"
    pg_basebackup: backup worker (0) created
    ….
    …..
    …..
    pg_basebackup: backup worker (1017) created
    pg_basebackup: error: could not connect to server: could not look up local
    user ID 1000: Too many open files
    pg_basebackup: backup worker (1018) created
    pg_basebackup: error: could not connect to server: could not look up local
    user ID 1000: Too many open files
    …
    …
    …
    pg_basebackup: error: could not connect to server: could not look up local
    user ID 1000: Too many open files
    pg_basebackup: backup worker (1989) created
    pg_basebackup: error: could not create file
    "/home/edb/Desktop/backup//global/4183": Too many open files
    pg_basebackup: error: could not create file
    "/home/edb/Desktop/backup//global/3592": Too many open files
    pg_basebackup: error: could not create file
    "/home/edb/Desktop/backup//global/4177": Too many open files
    [edb@localhost bin]$
    
    
    4) The backup directory is not cleaned up
    
    
    [edb@localhost bin]$
    [edb@localhost bin]$ ls  /home/edb/Desktop/backup
    base    pg_commit_ts  pg_logical    pg_notify    pg_serial     pg_stat
     pg_subtrans  pg_twophase  pg_xact
    global  pg_dynshmem   pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp
     pg_tblspc    pg_wal
    [edb@localhost bin]$
    
    
    Kashif Zeeshan
    EnterpriseDB
    
    
    On Thu, Apr 2, 2020 at 2:58 PM Rajkumar Raghuwanshi <
    rajkumar.raghuwanshi@enterprisedb.com> wrote:
    
    > Hi Asif,
    >
    > My colleague Kashif Zeeshan reported an issue off-list; I am posting it
    > here, please take a look.
    >
    > When two backups are executed at the same time, some workers get a FATAL
    > error due to max_wal_senders, but instead of exiting, the backup is
    > reported as completed. When we then try to start the server from the
    > backup cluster, it fails with an error.
    >
    > [edb@localhost bin]$ ./pgbench -i -s 200 -h localhost -p 5432 postgres
    > [edb@localhost bin]$ ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup/
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 0/C2000270 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: created temporary replication slot "pg_basebackup_57849"
    > pg_basebackup: backup worker (0) created
    > pg_basebackup: backup worker (1) created
    > pg_basebackup: backup worker (2) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (3) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (4) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (5) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (6) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (7) created
    > pg_basebackup: write-ahead log end point: 0/C3000050
    > pg_basebackup: waiting for background process to finish streaming ...
    > pg_basebackup: syncing data to disk ...
    > pg_basebackup: base backup completed
    > [edb@localhost bin]$ ./pg_basebackup -v -j 8 -D
    >  /home/edb/Desktop/backup1/
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 0/C20001C0 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: created temporary replication slot "pg_basebackup_57848"
    > pg_basebackup: backup worker (0) created
    > pg_basebackup: backup worker (1) created
    > pg_basebackup: backup worker (2) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (3) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (4) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (5) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (6) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > pg_basebackup: backup worker (7) created
    > pg_basebackup: write-ahead log end point: 0/C2000348
    > pg_basebackup: waiting for background process to finish streaming ...
    > pg_basebackup: syncing data to disk ...
    > pg_basebackup: base backup completed
    >
    > [edb@localhost bin]$ ./pg_ctl -D /home/edb/Desktop/backup1/  -o "-p 5438"
    > start
    > pg_ctl: directory "/home/edb/Desktop/backup1" is not a database cluster
    > directory
    >
    > Thanks & Regards,
    > Rajkumar Raghuwanshi
    >
    >
    > On Mon, Mar 30, 2020 at 6:28 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
    >
    >>
    >>
    >> On Mon, Mar 30, 2020 at 3:44 PM Rajkumar Raghuwanshi <
    >> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>
    >>> Thanks Asif,
    >>>
    >>> I have re-verified the reported issues. Except for the standby backup, the
    >>> others are fixed.
    >>>
    >>
    >> Yes. As Asif mentioned, he is working on the standby issue and on adding
    >> bandwidth throttling functionality to parallel backup.
    >>
    >> It would be good to get some feedback from Robert on Asif's previous email
    >> regarding the design considerations for standby server support and
    >> throttling. I believe all the other points mentioned by Robert in this
    >> thread have been addressed by Asif, so it would be good to hear about any
    >> other concerns that are not addressed.
    >>
    >> Thanks,
    >>
    >> -- Ahsan
    >>
    >>
    >>> Thanks & Regards,
    >>> Rajkumar Raghuwanshi
    >>>
    >>>
    >>> On Fri, Mar 27, 2020 at 11:04 PM Asif Rehman <asifr.rehman@gmail.com>
    >>> wrote:
    >>>
    >>>>
    >>>>
    >>>> On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
    >>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>>>
    >>>>> Hi Asif,
    >>>>>
    >>>>> While testing further I observed parallel backup is not able to take
    >>>>> backup of standby server.
    >>>>>
    >>>>> mkdir /tmp/archive_dir
    >>>>> echo "archive_mode='on'">> data/postgresql.conf
    >>>>> echo "archive_command='cp %p /tmp/archive_dir/%f'">>
    >>>>> data/postgresql.conf
    >>>>>
    >>>>> ./pg_ctl -D data -l logs start
    >>>>> ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
    >>>>>
    >>>>> echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
    >>>>> /tmp/slave/postgresql.conf
    >>>>> echo "restore_command='cp /tmp/archive_dir/%f %p'">>
    >>>>> /tmp/slave/postgresql.conf
    >>>>> echo "promote_trigger_file='/tmp/failover.log'">>
    >>>>> /tmp/slave/postgresql.conf
    >>>>>
    >>>>> ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
    >>>>>
    >>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "select
    >>>>> pg_is_in_recovery();"
    >>>>>  pg_is_in_recovery
    >>>>> -------------------
    >>>>>  f
    >>>>> (1 row)
    >>>>>
    >>>>> [edb@localhost bin]$ ./psql postgres -p 5433 -c "select
    >>>>> pg_is_in_recovery();"
    >>>>>  pg_is_in_recovery
    >>>>> -------------------
    >>>>>  t
    >>>>> (1 row)
    >>>>>
    >>>>>
    >>>>>
    >>>>>
    >>>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
    >>>>> pg_basebackup: error: could not list backup files: ERROR:  the standby was
    >>>>> promoted during online backup
    >>>>> HINT:  This means that the backup being taken is corrupt and should not be
    >>>>> used. Try taking another online backup.
    >>>>> pg_basebackup: removing data directory "/tmp/bkp_s"
    >>>>>
    >>>>> #same is working fine without parallel backup
    >>>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
    >>>>> [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
    >>>>> /tmp/bkp_s/PG_VERSION
    >>>>>
    >>>>> Thanks & Regards,
    >>>>> Rajkumar Raghuwanshi
    >>>>>
    >>>>>
    >>>>> On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
    >>>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>>>>
    >>>>>> Hi Asif,
    >>>>>>
    >>>>>> In another scenario, the backup data is corrupted for a tablespace. Again,
    >>>>>> this is not reproducible every time,
    >>>>>> but if I run the same set of commands I get the same
    >>>>>> error.
    >>>>>>
    >>>>>> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
    >>>>>> waiting for server to start.... done
    >>>>>> server started
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$ mkdir /tmp/tblsp
    >>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace
    >>>>>> tblsp location '/tmp/tblsp';"
    >>>>>> CREATE TABLESPACE
    >>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database
    >>>>>> testdb tablespace tblsp;"
    >>>>>> CREATE DATABASE
    >>>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl
    >>>>>> (a text);"
    >>>>>> CREATE TABLE
    >>>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
    >>>>>> values ('parallel_backup with tablespace');"
    >>>>>> INSERT 0 1
    >>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
    >>>>>> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
    >>>>>> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p
    >>>>>> 5555" start
    >>>>>> waiting for server to start.... done
    >>>>>> server started
    >>>>>> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
    >>>>>> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
    >>>>>>   oid  |  spcname   | spcowner | spcacl | spcoptions
    >>>>>> -------+------------+----------+--------+------------
    >>>>>>   1663 | pg_default |       10 |        |
    >>>>>>  16384 | tblsp      |       10 |        |
    >>>>>> (2 rows)
    >>>>>>
    >>>>>> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from
    >>>>>> testtbl";
    >>>>>> psql: error: could not connect to server: FATAL:
    >>>>>>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
    >>>>>> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
    >>>>>> missing.
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$ ls
    >>>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>>>>> [edb@localhost bin]$ ls
    >>>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
    >>>>>> ls: cannot access
    >>>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
    >>>>>> directory
    >>>>>>
    >>>>>>
    >>>>>> Thanks & Regards,
    >>>>>> Rajkumar Raghuwanshi
    >>>>>>
    >>>>>>
    >>>>>> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
    >>>>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >>>>>>
    >>>>>>> Hi Asif,
    >>>>>>>
    >>>>>>> On testing further, I found that pg_basebackup crashed when taking a
    >>>>>>> backup with -R.
    >>>>>>> This crash is not consistently reproducible.
    >>>>>>>
    >>>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test
    >>>>>>> (a text);"
    >>>>>>> CREATE TABLE
    >>>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
    >>>>>>> values ('parallel_backup with -R recovery-conf');"
    >>>>>>> INSERT 0 1
    >>>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D
    >>>>>>> /tmp/test_bkp/bkp -R
    >>>>>>> Segmentation fault (core dumped)
    >>>>>>>
    >>>>>>> stack trace looks the same as it was on earlier reported crash with
    >>>>>>> tablespace.
    >>>>>>> --stack trace
    >>>>>>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
    >>>>>>> Loaded symbols for /lib64/libnss_files.so.2
    >>>>>>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D
    >>>>>>> /tmp/test_bkp/bkp -R'.
    >>>>>>> Program terminated with signal 11, Segmentation fault.
    >>>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>>>>>> pg_basebackup.c:3175
    >>>>>>> 3175 backupinfo->curr = fetchfile->next;
    >>>>>>> Missing separate debuginfos, use: debuginfo-install
    >>>>>>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    >>>>>>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    >>>>>>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    >>>>>>> (gdb) bt
    >>>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
    >>>>>>> pg_basebackup.c:3175
    >>>>>>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
    >>>>>>> pg_basebackup.c:2715
    >>>>>>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
    >>>>>>> pthread_create.c:301
    >>>>>>> #3  0x00000039212e8c4d in clone () at
    >>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
    >>>>>>> (gdb)
    >>>>>>>
    >>>>>>> Thanks & Regards,
    >>>>>>> Rajkumar Raghuwanshi
    >>>>>>>
    >>>>>>>
    >>>>>>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
    >>>>>>> jeevan.chalke@enterprisedb.com> wrote:
    >>>>>>>
    >>>>>>>> Hi Asif,
    >>>>>>>>
    >>>>>>>>
    >>>>>>>>> Thanks Rajkumar. I have fixed the above issues and have rebased
    >>>>>>>>> the patch to the latest master (b7f64c64).
    >>>>>>>>> (V9 of the patches are attached).
    >>>>>>>>>
    >>>>>>>>
    >>>>>>>> I had a further review of the patches and here are my few
    >>>>>>>> observations:
    >>>>>>>>
    >>>>>>>> 1.
    >>>>>>>> +/*
    >>>>>>>> + * stop_backup() - ends an online backup
    >>>>>>>> + *
    >>>>>>>> + * The function is called at the end of an online backup. It sends
    >>>>>>>> out pg_control
    >>>>>>>> + * file, optionally WAL segments and ending WAL location.
    >>>>>>>> + */
    >>>>>>>>
    >>>>>>>> Comments seem out-dated.
    >>>>>>>>
    >>>>>>>
    >>>> Fixed.
    >>>>
    >>>>
    >>>>>
    >>>>>>>> 2. With parallel jobs, maxrate is now not supported. Since we are
    >>>>>>>> now asking
    >>>>>>>> data in multiple threads throttling seems important here. Can you
    >>>>>>>> please
    >>>>>>>> explain why have you disabled that?
    >>>>>>>>
    >>>>>>>> 3. As we are always fetching a single file and as Robert suggested,
    >>>>>>>> let rename
    >>>>>>>> SEND_FILES to SEND_FILE instead.
    >>>>>>>>
    >>>>>>>
    >>>> Yes, we are fetching a single file. However, SEND_FILES is still
    >>>> capable of fetching multiple files in one
    >>>> go, that's why the name.
    >>>>
    >>>>
    >>>>>>>> 4. Does this work on Windows? I mean does pthread_create() work on
    >>>>>>>> Windows?
    >>>>>>>> I asked this as I see that pgbench has its own implementation for
    >>>>>>>> pthread_create() for WIN32 but this patch doesn't.
    >>>>>>>>
    >>>>>>>
    >>>> patch is updated to add support for the Windows platform.
    >>>>
    >>>>
    >>>>>>>> 5. Typos:
    >>>>>>>> tablspace => tablespace
    >>>>>>>> safly => safely
    >>>>>>>>
    >>>>>>>> Done.
    >>>>
    >>>>
    >>>>> 6. parallel_backup_run() needs some comments explaining the states it
    >>>>>>>> goes
    >>>>>>>> through PB_* states.
    >>>>>>>>
    >>>>>>>> 7.
    >>>>>>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server
    >>>>>>>> */
    >>>>>>>> +                if (backupinfo->activeworkers == 0)
    >>>>>>>> +                {
    >>>>>>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
    >>>>>>>> +                    free_filelist(backupinfo);
    >>>>>>>> +                }
    >>>>>>>> +                break;
    >>>>>>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from
    >>>>>>>> server */
    >>>>>>>> +                if (backupinfo->activeworkers == 0)
    >>>>>>>> +                {
    >>>>>>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
    >>>>>>>> +                }
    >>>>>>>> +                break;
    >>>>>>>>
    >>>>>>> Done.
    >>>>
    >>>>
    >>>>>
    >>>>>>>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
    >>>>>>>>
    >>>>>>> Done.
    >>>>
    >>>> The corrupted tablespace and crash, reported by Rajkumar, have been
    >>>> fixed. A pointer
    >>>> variable remained uninitialized which in turn caused the system to
    >>>> misbehave.
    >>>>
    >>>> Attached is the updated set of patches. AFAIK, to complete parallel
    >>>> backup feature
    >>>> set, there remain three sub-features:
    >>>>
    >>>> 1- parallel backup does not work with a standby server. In parallel
    >>>> backup, the server
    >>>> spawns multiple processes and there is no shared state being
    >>>> maintained. So currently,
    >>>> no way to tell multiple processes if the standby was promoted during
    >>>> the backup since
    >>>> the START_BACKUP was called.
    >>>>
    >>>> 2- throttling. Robert previously suggested that we implement
    >>>> throttling on the client-side.
    >>>> However, I found a previous discussion where it was advocated to be
    >>>> added to the
    >>>> backend instead[1].
    >>>>
    >>>> So, it was better to have a consensus before moving the throttle
    >>>> function to the client.
    >>>> That’s why for the time being I have disabled it and have asked for
    >>>> suggestions on it
    >>>> to move forward.
    >>>>
    >>>> It seems to me that we have to maintain a shared state in order to
    >>>> support taking backup
    >>>> from standby. Also, there is a new feature recently committed for
    >>>> backup progress
    >>>> reporting in the backend (pg_stat_progress_basebackup). This
    >>>> functionality was recently
    >>>> added via this commit ID: e65497df. For parallel backup to update these
    >>>> stats, a shared
    >>>> state will be required.
    >>>>
    >>>> Since multiple pg_basebackup can be running at the same time,
    >>>> maintaining a shared state
    >>>> can become a little complex, unless we disallow taking multiple
    >>>> parallel backups.
    >>>>
    >>>> So proceeding on with this patch, I will be working on:
    >>>> - throttling to be implemented on the client-side.
    >>>> - adding a shared state to handle backup from the standby.
    >>>>
    >>>>
    >>>>
    >>>> [1]
    >>>> https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
    >>>>
    >>>>
    >>>> --
    >>>> Asif Rehman
    >>>> Highgo Software (Canada/China/Pakistan)
    >>>> URL : www.highgo.ca
    >>>>
    >>>>
    >>
    >> --
    >> Highgo Software (Canada/China/Pakistan)
    >> URL : http://www.highgo.ca
    >> ADDR: 10318 WHALLEY BLVD, Surrey, BC
    >> EMAIL: mailto: ahsan.hadi@highgo.ca
    >>
    >
    
    -- 
    Regards
    ====================================
    Kashif Zeeshan
    Lead Quality Assurance Engineer / Manager
    
    EnterpriseDB Corporation
    The Enterprise Postgres Company
    
  60. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T11:46:51Z

    On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one
    > go, that's why the name.
    
    I don't see why it should work that way. If we're fetching individual
    files, why have an unused capability to fetch multiple files?
    
    > 1- parallel backup does not work with a standby server. In parallel backup, the server
    > spawns multiple processes and there is no shared state being maintained. So currently,
    > no way to tell multiple processes if the standby was promoted during the backup since
    > the START_BACKUP was called.
    
    Why would you need to do that? As long as the process where
    STOP_BACKUP runs can do the check, that seems good enough.
    
    > 2- throttling. Robert previously suggested that we implement throttling on the client-side.
    > However, I found a previous discussion where it was advocated to be added to the
    > backend instead[1].
    >
    > So, it was better to have a consensus before moving the throttle function to the client.
    > That’s why for the time being I have disabled it and have asked for suggestions on it
    > to move forward.
    >
    > It seems to me that we have to maintain a shared state in order to support taking backup
    > from standby. Also, there is a new feature recently committed for backup progress
    > reporting in the backend (pg_stat_progress_basebackup). This functionality was recently
    > added via this commit ID: e65497df. For parallel backup to update these stats, a shared
    > state will be required.
    
    I've come around to the view that a shared state is a good idea and
    that throttling on the server-side makes more sense. I'm not clear on
    whether we need shared state only for throttling or whether we need it
    for more than that. Another possible reason might be for the
    progress-reporting stuff that just got added.
    
    > Since multiple pg_basebackup can be running at the same time, maintaining a shared state
    > can become a little complex, unless we disallow taking multiple parallel backups.
    
    I do not see why it would be necessary to disallow taking multiple
    parallel backups. You just need to have multiple copies of the shared
    state and a way to decide which one to use for any particular backup.
    I guess that is a little complex, but only a little.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  61. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T11:48:33Z

    On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <
    kashif.zeeshan@enterprisedb.com> wrote:
    
    > The backup failed with errors "error: could not connect to server: could
    > not look up local user ID 1000: Too many open files" when the
    > max_wal_senders was set to 2000.
    > The errors generated for the workers starting from backup worke=1017.
    >
    
    It wasn't the fact that you set max_wal_senders to 2000. It was the fact
    that you specified 1990 parallel workers. By so doing, you overloaded the
    machine, which is why everything failed. That's to be expected.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  62. Re: WIP/PoC for parallel backup

    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-02T11:54:49Z

    On Thu, Apr 2, 2020 at 4:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <
    > kashif.zeeshan@enterprisedb.com> wrote:
    >
    >> The backup failed with errors "error: could not connect to server: could
    >> not look up local user ID 1000: Too many open files" when the
    >> max_wal_senders was set to 2000.
    >> The errors generated for the workers starting from backup worke=1017.
    >>
    >
    > It wasn't the fact that you set max_wal_senders to 2000. It was the fact
    > that you specified 1990 parallel workers. By so doing, you overloaded the
    > machine, which is why everything failed. That's to be expected.
    >
    Thanks a lot, Robert.
    In this case the backup folder was not emptied after the backup failed;
    the cleanup should be done in this case too.
    
    
    
    
    -- 
    Regards
    ====================================
    Kashif Zeeshan
    Lead Quality Assurance Engineer / Manager
    
    EnterpriseDB Corporation
    The Enterprise Postgres Company
    
  63. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T13:23:21Z

    On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
    <kashif.zeeshan@enterprisedb.com> wrote:
    > Thanks alot Robert,
    > In this case the backup folder was not being emptied as the backup was failed, the cleanup should be done in this case too.
    
    Does it fail to clean up the backup folder in all cases where the
    backup failed, or just in this case?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  64. Re: WIP/PoC for parallel backup

    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-02T13:46:15Z

    On Thu, Apr 2, 2020 at 6:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
    > <kashif.zeeshan@enterprisedb.com> wrote:
    > > Thanks alot Robert,
    > > In this case the backup folder was not being emptied as the backup was failed, the cleanup should be done in this case too.
    >
    > Does it fail to clean up the backup folder in all cases where the
    > backup failed, or just in this case?
    >
    The cleanup is done in the cases I have seen so far with base pg_basebackup
    functionality (not including the parallel backup feature) with the message
    "pg_basebackup: removing contents of data directory".
    A similar case was also fixed for parallel backup reported by Rajkumar
    where the contents of the backup folder were not cleaned up after the error.
    
    
    
    -- 
    Regards
    ====================================
    Kashif Zeeshan
    Lead Quality Assurance Engineer / Manager
    
    EnterpriseDB Corporation
    The Enterprise Postgres Company
    
  65. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T14:20:15Z

    On Thu, Apr 2, 2020 at 9:46 AM Kashif Zeeshan <
    kashif.zeeshan@enterprisedb.com> wrote:
    
    >> Does it fail to clean up the backup folder in all cases where the
    >> backup failed, or just in this case?
    >>
    > The cleanup is done in the cases I have seen so far with base
    > pg_basebackup functionality (not including the parallel backup feature)
    > with the message "pg_basebackup: removing contents of data directory"
    > A similar case was also fixed for parallel backup reported by Rajkumar
    > where the contents of the backup folder were not cleaned up after the error.
    >
    
    What I'm saying is that it's unclear whether there's a bug here or whether
    it just failed because of the very extreme test scenario you created.
    Spawning >1000 processes on a small machine can easily make a lot of things
    fail.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  66. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-02T15:16:57Z

    On Thu, Apr 2, 2020 at 4:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > > Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one
    > > go, that's why the name.
    >
    > I don't see why it should work that way. If we're fetching individual
    > files, why have an unused capability to fetch multiple files?
    >
    
    Okay, I will rename it and modify the function to send a single file as well.
    
    
    > > 1- parallel backup does not work with a standby server. In parallel backup, the server
    > > spawns multiple processes and there is no shared state being maintained. So currently,
    > > no way to tell multiple processes if the standby was promoted during the backup since
    > > the START_BACKUP was called.
    >
    > Why would you need to do that? As long as the process where
    > STOP_BACKUP can do the check, that seems good enough.
    >
    
    
    Yes, but the user will get the error only after the STOP_BACKUP, not while
    the backup is in progress. So if the backup is a large one, early error
    detection would be very beneficial.
    This is the current behavior of non-parallel backup as well.
    
    
    >
    > > 2- throttling. Robert previously suggested that we implement throttling on the client-side.
    > > However, I found a previous discussion where it was advocated to be added to the
    > > backend instead[1].
    > >
    > > So, it was better to have a consensus before moving the throttle function to the client.
    > > That’s why for the time being I have disabled it and have asked for suggestions on it
    > > to move forward.
    > >
    > > It seems to me that we have to maintain a shared state in order to support taking backup
    > > from standby. Also, there is a new feature recently committed for backup progress
    > > reporting in the backend (pg_stat_progress_basebackup). This functionality was recently
    > > added via this commit ID: e65497df. For parallel backup to update these stats, a shared
    > > state will be required.
    >
    > I've come around to the view that a shared state is a good idea and
    > that throttling on the server-side makes more sense. I'm not clear on
    > whether we need shared state only for throttling or whether we need it
    > for more than that. Another possible reason might be for the
    > progress-reporting stuff that just got added.
    >
    
    Okay, then I will add the shared state. And since we are adding the shared
    state, we can use that for throttling, progress-reporting and standby early
    error checking.
    
    
    > > Since multiple pg_basebackup can be running at the same time, maintaining a shared state
    > > can become a little complex, unless we disallow taking multiple parallel backups.
    >
    > I do not see why it would be necessary to disallow taking multiple
    > parallel backups. You just need to have multiple copies of the shared
    > state and a way to decide which one to use for any particular backup.
    > I guess that is a little complex, but only a little.
    >
    
    There are two possible options:
    
    (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
    (2) (Preferred Option) Use the WAL start location as the BackupID.
    
    
    This BackupID should be given back as a response to the start backup command.
    All client workers must append this ID to all parallel backup replication
    commands, so that we can use this identifier to search for that particular
    backup. Does that sound good?
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  67. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T15:44:59Z

    On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    >> Why would you need to do that? As long as the process where
    >> STOP_BACKUP can do the check, that seems good enough.
    >
    > Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
    > in progress. So if the backup is a large one, early error detection would be much beneficial.
    > This is the current behavior of non-parallel backup as well.
    
    Because non-parallel backup does not feature early detection of this
    error, it is not necessary to make parallel backup do so. Indeed, it
    is undesirable. If you want to fix that problem, do it on a separate
    thread in a separate patch. A patch proposing to make parallel backup
    inconsistent in behavior with non-parallel backup will be rejected, at
    least if I have anything to say about it.
    
    TBH, fixing this doesn't seem like an urgent problem to me. The
    current situation is not great, but promotions ought to be relatively
    infrequent, so I'm not sure it's a huge problem in practice. It is
    also worth considering whether the right fix is to figure out how to
    make that case actually work, rather than just making it fail quicker.
    I don't currently understand the reason for the prohibition so I can't
    express an intelligent opinion on what the right answer is here, but
    it seems like it ought to be investigated before somebody goes and
    builds a bunch of infrastructure to make the error more timely.
    
    > Okay, then I will add the shared state. And since we are adding the shared state, we can use
    > that for throttling, progress-reporting and standby early error checking.
    
    Please propose a grammar here for all the new replication commands you
    plan to add before going and implementing everything. That will make it
    easier to hash out the design without forcing you to keep changing the
    code. Your design should include a sketch of how several sets of
    coordinating backends taking several concurrent parallel backups will
    end up with one shared state per parallel backup.
    
    > There are two possible options:
    >
    > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
    > (2) (Preferred Option) Use the WAL start location as the BackupID.
    >
    > This BackupID should be given back as a response to start backup command. All client workers
    > must append this ID to all parallel backup replication commands. So that we can use this identifier
    > to search for that particular backup. Does that sound good?
    
    Using the WAL start location as the backup ID seems like it might be
    problematic -- could a single checkpoint not end up as the start
    location for multiple backups started at the same time? Whether that's
    possible now or not, it seems unwise to hard-wire that assumption into
    the wire protocol.
    
    I was thinking that perhaps the client should generate a unique backup
    ID, e.g. leader does:
    
    START_BACKUP unique_backup_id [options]...
    
    And then others do:
    
    JOIN_BACKUP unique_backup_id
    
    My thought is that you will have a number of shared memory structures
    equal to max_wal_senders, each one large enough to hold the shared
    state for one backup. The shared state will include
    char[NAMEDATALEN-or-something] which will be used to hold the backup
    ID. START_BACKUP would allocate one and copy the name into it;
    JOIN_BACKUP would search for one by name.
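
A rough sketch of the slot scheme described above, in plain C rather than PostgreSQL shared memory. The names BackupSlot, backup_slot_start, backup_slot_join, MAX_BACKUP_SLOTS and BACKUP_SLOT_ID_LEN are made up for illustration and do not come from the patch; a real implementation would also need to hold a lock while scanning or claiming slots:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BACKUP_SLOT_ID_LEN 64   /* stand-in for NAMEDATALEN-or-something */
#define MAX_BACKUP_SLOTS   8    /* stand-in for max_wal_senders */

/* Hypothetical per-backup shared state; only the backup ID is modelled. */
typedef struct BackupSlot
{
    bool in_use;
    char backupid[BACKUP_SLOT_ID_LEN];
} BackupSlot;

/* Fixed-size array, one entry per possible concurrent backup. */
BackupSlot backup_slots[MAX_BACKUP_SLOTS];

/* JOIN_BACKUP: search the in-use slots for a matching backup ID. */
BackupSlot *
backup_slot_join(const char *backupid)
{
    int i;

    assert(backupid != NULL);
    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (backup_slots[i].in_use &&
            strcmp(backup_slots[i].backupid, backupid) == 0)
            return &backup_slots[i];
    }
    return NULL;                /* no backup with that ID */
}

/* START_BACKUP: claim a free slot and copy the backup ID into it. */
BackupSlot *
backup_slot_start(const char *backupid)
{
    int i;

    assert(backupid != NULL);
    if (strlen(backupid) >= BACKUP_SLOT_ID_LEN)
        return NULL;            /* ID does not fit in the slot */
    if (backup_slot_join(backupid) != NULL)
        return NULL;            /* ID already in use by another backup */

    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (!backup_slots[i].in_use)
        {
            backup_slots[i].in_use = true;
            strcpy(backup_slots[i].backupid, backupid);
            return &backup_slots[i];
        }
    }
    return NULL;                /* all slots are taken */
}
```

Two backups started with different IDs end up in different slots, and every worker that joins with the same ID gets a pointer to the same slot.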
    
    If you want to generate the name on the server side, then I suppose
    START_BACKUP would return a result set that includes the backup ID,
    and clients would have to specify that same backup ID when invoking
    JOIN_BACKUP. The rest would stay the same. I am not sure which way is
    better. Either way, the backup ID should be something long and hard to
    guess, not e.g. the leader process's PID. I think we should generate
    it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
    result to get a string. That way there's almost no risk of two backup
    IDs colliding accidentally, and even if we somehow had a malicious
    user trying to screw up somebody else's parallel backup by choosing a
    colliding backup ID, it would be pretty hard to have any success. A
    user with enough access to do that sort of thing can probably cause a
    lot worse problems anyway, but it seems pretty easy to guard against
    intentional collisions robustly here, so I think we should.
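
To illustrate the ID format suggested above (16 random bytes, hex-encoded), here is a minimal standalone sketch. It reads /dev/urandom as a stand-in for pg_strong_random, and the helper names are hypothetical, not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define BACKUP_ID_RAW_BYTES 16  /* "say 8 or 16 bytes" */

/* Hex-encode len bytes from src into dst; dst must hold 2 * len + 1 bytes. */
void
hex_encode_bytes(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0f];
    }
    dst[2 * len] = '\0';
}

/* Assumption: /dev/urandom stands in for pg_strong_random() here. */
bool
read_random_bytes(unsigned char *buf, size_t len)
{
    FILE   *f = fopen("/dev/urandom", "rb");
    size_t  got;

    if (f == NULL)
        return false;
    got = fread(buf, 1, len, f);
    fclose(f);
    return got == len;
}

/* Fill dst with a hex-encoded random backup ID; false if it cannot. */
bool
generate_backup_id(char *dst, size_t dstlen)
{
    unsigned char raw[BACKUP_ID_RAW_BYTES];

    assert(dst != NULL);
    if (dstlen < 2 * BACKUP_ID_RAW_BYTES + 1)
        return false;           /* caller's buffer is too small */
    if (!read_random_bytes(raw, sizeof(raw)))
        return false;           /* no randomness available */
    hex_encode_bytes(raw, sizeof(raw), dst);
    return true;
}
```

With 16 raw bytes the resulting string is 32 hex characters long, which fits comfortably in a NAMEDATALEN-sized buffer.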
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  68. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-03T08:45:23Z

    On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > >> Why would you need to do that? As long as the process where
    > >> STOP_BACKUP can do the check, that seems good enough.
    > >
    > > Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
    > > in progress. So if the backup is a large one, early error detection would be much beneficial.
    > > This is the current behavior of non-parallel backup as well.
    >
    > Because non-parallel backup does not feature early detection of this
    > error, it is not necessary to make parallel backup do so. Indeed, it
    > is undesirable. If you want to fix that problem, do it on a separate
    > thread in a separate patch. A patch proposing to make parallel backup
    > inconsistent in behavior with non-parallel backup will be rejected, at
    > least if I have anything to say about it.
    >
    > TBH, fixing this doesn't seem like an urgent problem to me. The
    > current situation is not great, but promotions ought to be relatively
    > infrequent, so I'm not sure it's a huge problem in practice. It is
    > also worth considering whether the right fix is to figure out how to
    > make that case actually work, rather than just making it fail quicker.
    > I don't currently understand the reason for the prohibition so I can't
    > express an intelligent opinion on what the right answer is here, but
    > it seems like it ought to be investigated before somebody goes and
    > builds a bunch of infrastructure to make the error more timely.
    >
    
    Non-parallel backup already does the early error checking. I only intended
    to make parallel behave the same as non-parallel here. So, I agree with
    you that the behavior of parallel backup should be consistent with the
    non-parallel one. Please see the code snippet below from
    basebackup.c:sendDir():

    /*
     * Check if the postmaster has signaled us to exit, and abort with an
     * error in that case. The error handler further up will call
     * do_pg_abort_backup() for us. Also check that if the backup was
     * started while still in recovery, the server wasn't promoted.
     * do_pg_stop_backup() will check that too, but it's better to stop
     * the backup early than continue to the end and fail there.
     */
    CHECK_FOR_INTERRUPTS();
    if (RecoveryInProgress() != backup_started_in_recovery)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("the standby was promoted during online backup"),
                 errhint("This means that the backup being taken is corrupt "
                         "and should not be used. "
                         "Try taking another online backup.")));
    >
    > > Okay, then I will add the shared state. And since we are adding the shared state, we can use
    > > that for throttling, progress-reporting and standby early error checking.
    >
    > Please propose a grammar here for all the new replication commands you
    > plan to add before going and implement everything. That will make it
    > easier to hash out the design without forcing you to keep changing the
    > code. Your design should include a sketch of how several sets of
    > coordinating backends taking several concurrent parallel backups will
    > end up with one shared state per parallel backup.
    >
    > > There are two possible options:
    > >
    > > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
    > > (2) (Preferred Option) Use the WAL start location as the BackupID.
    > >
    > > This BackupID should be given back as a response to start backup command. All client workers
    > > must append this ID to all parallel backup replication commands. So that we can use this identifier
    > > to search for that particular backup. Does that sound good?
    >
    > Using the WAL start location as the backup ID seems like it might be
    > problematic -- could a single checkpoint not end up as the start
    > location for multiple backups started at the same time? Whether that's
    > possible now or not, it seems unwise to hard-wire that assumption into
    > the wire protocol.
    >
    > I was thinking that perhaps the client should generate a unique backup
    > ID, e.g. leader does:
    >
    > START_BACKUP unique_backup_id [options]...
    >
    > And then others do:
    >
    > JOIN_BACKUP unique_backup_id
    >
    > My thought is that you will have a number of shared memory structure
    > equal to max_wal_senders, each one large enough to hold the shared
    > state for one backup. The shared state will include
    > char[NAMEDATALEN-or-something] which will be used to hold the backup
    > ID. START_BACKUP would allocate one and copy the name into it;
    > JOIN_BACKUP would search for one by name.
    >
    > If you want to generate the name on the server side, then I suppose
    > START_BACKUP would return a result set that includes the backup ID,
    > and clients would have to specify that same backup ID when invoking
    > JOIN_BACKUP. The rest would stay the same. I am not sure which way is
    > better. Either way, the backup ID should be something long and hard to
    > guess, not e.g. the leader processes' PID. I think we should generate
    > it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
    > result to get a string. That way there's almost no risk of two backup
    > IDs colliding accidentally, and even if we somehow had a malicious
    > user trying to screw up somebody else's parallel backup by choosing a
    > colliding backup ID, it would be pretty hard to have any success. A
    > user with enough access to do that sort of thing can probably cause a
    > lot worse problems anyway, but it seems pretty easy to guard against
    > intentional collisions robustly here, so I think we should.
    >
    >
    Okay, so if we add another replication command ‘JOIN_BACKUP unique_backup_id’
    to make workers find the relevant shared state, there won't be any need to
    change the grammar for any other command. The START_BACKUP can return the
    unique_backup_id in the result set.
    
    I am thinking of the following struct for shared state:

    typedef struct
    {
        char        backupid[NAMEDATALEN];
        XLogRecPtr  startptr;

        slock_t     lock;
        int64       throttling_counter;
        bool        backup_started_in_recovery;
    } BackupSharedState;

    The shared state structure entries would be maintained by a shared hash table.
    There will be one structure per parallel backup. Since a single parallel backup
    can engage more than one WAL sender, I think max_wal_senders might be a little
    too much; perhaps max_wal_senders/2, since there will be at least 2 connections
    per parallel backup? Alternatively, we can add a new GUC that defines the maximum
    number of concurrent parallel backups, e.g. ‘max_concurrent_backups_allowed = 10’
    perhaps, or we can make it user-configurable.

    The key would be “backupid=hex_encode(pg_strong_random(16))”
    
    Checking for Standby Promotion:
    At the START_BACKUP command, we initialize
    BackupSharedState.backup_started_in_recovery and keep checking it whenever
    send_file() is called to send a new file.
    
    Throttling:
    BackupSharedState.throttling_counter - The throttling logic remains the same
    as for non-parallel backup with the exception that multiple threads will now
    be updating it. So in parallel backup, this will represent the overall bytes
    that have been transferred. So the workers would sleep if they have exceeded
    the limit. Hence, the shared state carries a lock to safely update the
    throttling value atomically.
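
The shared throttling counter could look roughly like the following. This is only a sketch under stated assumptions: a pthread mutex stands in for slock_t, the sample_bytes limit and the function names are invented for illustration, and the sleep itself (which in basebackup.c also depends on elapsed time) is left to the caller:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared throttling state, one per parallel backup. */
typedef struct ThrottleState
{
    pthread_mutex_t lock;               /* stand-in for slock_t */
    int64_t         throttling_counter; /* bytes sent by all workers */
    int64_t         sample_bytes;       /* assumed limit per interval */
} ThrottleState;

void
throttle_init(ThrottleState *st, int64_t sample_bytes)
{
    assert(sample_bytes > 0);
    pthread_mutex_init(&st->lock, NULL);
    st->throttling_counter = 0;
    st->sample_bytes = sample_bytes;
}

/*
 * Account for nbytes sent by one worker. Returns true when the combined
 * total of all workers has reached the limit, i.e. the caller should
 * sleep. The counter keeps only the remainder past the limit.
 */
bool
throttle_account(ThrottleState *st, int64_t nbytes)
{
    bool must_sleep = false;

    assert(nbytes >= 0);
    pthread_mutex_lock(&st->lock);
    st->throttling_counter += nbytes;
    if (st->throttling_counter >= st->sample_bytes)
    {
        st->throttling_counter %= st->sample_bytes;
        must_sleep = true;
    }
    pthread_mutex_unlock(&st->lock);
    return must_sleep;
}
```

Because every worker adds to the same counter under the lock, the limit applies to the backup as a whole rather than to each connection separately.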
    
    Progress Reporting:
    I think we should add progress reporting for parallel backup as a separate
    patch. The relevant entries for progress reporting, such as ‘backup_total’
    and ‘backup_streamed’, would then be added to this structure as well.
    
    
    Grammar:
    There is a change in the resultset being returned for START_BACKUP command;
    unique_backup_id is added. Additionally, JOIN_BACKUP replication command is
    added. SEND_FILES has been renamed to SEND_FILE. There are no other changes
    to the grammar.
    
    START_BACKUP [LABEL '<label>'] [FAST]
      - returns startptr, tli, backup_label, unique_backup_id
    STOP_BACKUP [NOWAIT]
      - returns startptr, tli, backup_label
    JOIN_BACKUP 'unique_backup_id'
      - attaches a shared state identified by 'unique_backup_id' to a backend
        process.
    
    LIST_TABLESPACES [PROGRESS]
    LIST_FILES [TABLESPACE]
    LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  69. Re: WIP/PoC for parallel backup

    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-03T10:01:17Z

    Hi Asif
    
    When a non-existent slot is used together with a tablespace, the correct
    error is displayed, but the backup folder is not cleaned up, which leaves
    a corrupt backup behind.
    
    Steps
    =======
    
    [edb@localhost bin]$
    [edb@localhost bin]$ mkdir /home/edb/tbl1
    [edb@localhost bin]$ mkdir /home/edb/tbl_res
    [edb@localhost bin]$
    postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
    CREATE TABLESPACE
    postgres=#
    postgres=# create table t1 (a int) tablespace tbl1;
    CREATE TABLE
    postgres=# insert into t1 values(100);
    INSERT 0 1
    postgres=# insert into t1 values(200);
    INSERT 0 1
    postgres=# insert into t1 values(300);
    INSERT 0 1
    postgres=#
    
    
    [edb@localhost bin]$
    [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test" does not exist
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: write-ahead log end point: 0/2E000100
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: error: child thread exited with error 1
    [edb@localhost bin]$
    
    The backup folder is not cleaned:
    
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$ ls /home/edb/Desktop/backup
    backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact  postgresql.conf
    base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal      postgresql.auto.conf
    [edb@localhost bin]$
    
    
    
    
    If the same case is executed without the parallel backup patch then the
    backup folder is cleaned after the error is displayed.
    
    [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test999" does not exist
    pg_basebackup: write-ahead log end point: 0/2B000100
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: error: child process exited with exit code 1
    *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
    pg_basebackup: changes to tablespace directories will not be undone
    
    
    
    -- 
    Regards
    ====================================
    Kashif Zeeshan
    Lead Quality Assurance Engineer / Manager
    
    EnterpriseDB Corporation
    The Enterprise Postgres Company
    
  70. Re: WIP/PoC for parallel backup

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2020-04-07T04:15:00Z

    Asif,
    
    After the recent backup manifest addition, the patches need to be rebased,
    and a few things need reconsideration, such as making sure that parallel
    backup creates a manifest file correctly.
    
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  71. Re: WIP/PoC for parallel backup

    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-07T11:03:46Z

    On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
    kashif.zeeshan@enterprisedb.com> wrote:
    
    > Hi Asif
    >
    > When a non-existent slot is used with tablespace then correct error is
    > displayed but then the backup folder is not cleaned and leaves a corrupt
    > backup.
    >
    > Steps
    > =======
    >
    > edb@localhost bin]$
    > [edb@localhost bin]$ mkdir /home/edb/tbl1
    > [edb@localhost bin]$ mkdir /home/edb/tbl_res
    > [edb@localhost bin]$
    > postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
    > CREATE TABLESPACE
    > postgres=#
    > postgres=# create table t1 (a int) tablespace tbl1;
    > CREATE TABLE
    > postgres=# insert into t1 values(100);
    > INSERT 0 1
    > postgres=# insert into t1 values(200);
    > INSERT 0 1
    > postgres=# insert into t1 values(300);
    > INSERT 0 1
    > postgres=#
    >
    >
    > [edb@localhost bin]$
    > [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
    >  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: error: could not send replication command
    > "START_REPLICATION": ERROR:  replication slot "test" does not exist
    > pg_basebackup: backup worker (0) created
    > pg_basebackup: backup worker (1) created
    > pg_basebackup: write-ahead log end point: 0/2E000100
    > pg_basebackup: waiting for background process to finish streaming ...
    > pg_basebackup: error: child thread exited with error 1
    > [edb@localhost bin]$
    >
    > backup folder not cleaned
    >
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    > [edb@localhost bin]$ ls /home/edb/Desktop/backup
    > backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
    >  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
    >             postgresql.conf
    > base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
    > pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
    >  postgresql.auto.conf
    > [edb@localhost bin]$
    >
    >
    >
    >
    > If the same case is executed without the parallel backup patch then the
    > backup folder is cleaned after the error is displayed.
    >
    > [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T
    > /home/edb/tbl1=/home/edb/tbl_res -S test999
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: error: could not send replication command
    > "START_REPLICATION": ERROR:  replication slot "test999" does not exist
    > pg_basebackup: write-ahead log end point: 0/2B000100
    > pg_basebackup: waiting for background process to finish streaming ...
    > pg_basebackup: error: child process exited with exit code 1
    > *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
    > pg_basebackup: changes to tablespace directories will not be undone
    >
    
    
    Hi Asif
    
    A similar case is when the DB server is shut down while a parallel backup
    is in progress: the correct error is displayed, but the backup folder is
    not cleaned up, which leaves a corrupt backup behind. I think one bug fix
    will solve all these cases where cleanup is not done when a parallel
    backup fails.
    
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_57337"
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: backup worker (2) created
    pg_basebackup: backup worker (3) created
    pg_basebackup: backup worker (4) created
    pg_basebackup: backup worker (5) created
    pg_basebackup: backup worker (6) created
    pg_basebackup: backup worker (7) created
    pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    [edb@localhost bin]$
    [edb@localhost bin]$
    
    When the same case is executed on pg_basebackup without the parallel
    backup patch, proper cleanup is done.
    
    [edb@localhost bin]$
    [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_5590"
    pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
    [edb@localhost bin]$
    
    Thanks
    
    
    
    -- 
    Regards
    ====================================
    Kashif Zeeshan
    Lead Quality Assurance Engineer / Manager
    
    EnterpriseDB Corporation
    The Enterprise Postgres Company
    
  72. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-07T16:43:51Z

    Hi,
    
    Thanks, Kashif and Rajkumar. I have fixed the reported issues.
    
    I have added the shared state as previously described. The new grammar
    changes are as follows:
    
    START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
        - This generates a unique backupid using pg_strong_random(16) and
          hex-encodes it; the ID is then returned in the result set.
        - It also creates a shared state and adds it to the hash table. The
          hash table size is set to BACKUP_HASH_SIZE=10, but since the hash
          table can expand dynamically, I think that is a sufficient initial
          size. max_wal_senders is not used, because it can be set to quite
          a large value.
    
    JOIN_BACKUP 'backup_id'
        - finds 'backup_id' in hashtable and attaches it to server process.
    
    
    SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
        - renamed SEND_FILES to SEND_FILE
        - removed START_WAL_LOCATION from this because 'startptr' is now
          accessible through shared state.
    
    There is no change in other commands:
    STOP_BACKUP [NOWAIT]
    LIST_TABLESPACES [PROGRESS]
    LIST_FILES [TABLESPACE]
    LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
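
    To make the START_BACKUP / JOIN_BACKUP relationship above concrete, here
    is a small standalone sketch of registering a backup ID and later
    attaching to it by name. It is only an illustration under stated
    assumptions: a fixed array stands in for the dynamically expanding
    shared hash table, no locking is shown, and BackupSlot,
    backup_register() and backup_join() are hypothetical names that do not
    come from the patch.

```c
#include <stddef.h>
#include <string.h>

#define BACKUP_ID_MAX   64      /* assumption: stand-in for NAMEDATALEN */
#define MAX_BACKUPS     10      /* mirrors BACKUP_HASH_SIZE, but fixed here */

typedef struct
{
    char        backupid[BACKUP_ID_MAX];
    int         in_use;         /* slot holds a running backup */
    int         attached;       /* number of sessions attached to it */
} BackupSlot;

static BackupSlot backup_slots[MAX_BACKUPS];

/*
 * START_BACKUP side: claim a free slot for 'id'. Returns NULL if the ID is
 * too long, is already registered, or no slot is free.
 */
BackupSlot *
backup_register(const char *id)
{
    BackupSlot *free_slot = NULL;
    size_t      i;

    if (strlen(id) >= BACKUP_ID_MAX)
        return NULL;

    for (i = 0; i < MAX_BACKUPS; i++)
    {
        if (backup_slots[i].in_use)
        {
            if (strcmp(backup_slots[i].backupid, id) == 0)
                return NULL;    /* duplicate ID */
        }
        else if (free_slot == NULL)
            free_slot = &backup_slots[i];
    }

    if (free_slot == NULL)
        return NULL;            /* table full */

    strcpy(free_slot->backupid, id);
    free_slot->in_use = 1;
    free_slot->attached = 1;    /* the session that started the backup */
    return free_slot;
}

/*
 * JOIN_BACKUP side: find the slot registered under 'id' and attach to it.
 * Returns NULL if no such backup is running.
 */
BackupSlot *
backup_join(const char *id)
{
    size_t      i;

    for (i = 0; i < MAX_BACKUPS; i++)
    {
        if (backup_slots[i].in_use &&
            strcmp(backup_slots[i].backupid, id) == 0)
        {
            backup_slots[i].attached++;
            return &backup_slots[i];
        }
    }
    return NULL;
}
```

    In a real server these lookups would have to be protected by a lock,
    since several WAL sender processes may register or join concurrently.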
    
    The current patches (v11) have been rebased onto the latest master. The
    backup manifest is enabled by default, so I have disabled it for parallel
    backup mode and emit a warning so that the user is aware of this and does
    not expect a manifest in the backup.
    
    
    On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
    kashif.zeeshan@enterprisedb.com> wrote:
    
    >
    >
    > On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
    > kashif.zeeshan@enterprisedb.com> wrote:
    >
    >> Hi Asif
    >>
    >> When a non-existent slot is used with tablespace then correct error is
    >> displayed but then the backup folder is not cleaned and leaves a corrupt
    >> backup.
    >>
    >> Steps
    >> =======
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ mkdir /home/edb/tbl1
    >> [edb@localhost bin]$ mkdir /home/edb/tbl_res
    >> [edb@localhost bin]$
    >> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
    >> CREATE TABLESPACE
    >> postgres=#
    >> postgres=# create table t1 (a int) tablespace tbl1;
    >> CREATE TABLE
    >> postgres=# insert into t1 values(100);
    >> INSERT 0 1
    >> postgres=# insert into t1 values(200);
    >> INSERT 0 1
    >> postgres=# insert into t1 values(300);
    >> INSERT 0 1
    >> postgres=#
    >>
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
    >>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: error: could not send replication command
    >> "START_REPLICATION": ERROR:  replication slot "test" does not exist
    >> pg_basebackup: backup worker (0) created
    >> pg_basebackup: backup worker (1) created
    >> pg_basebackup: write-ahead log end point: 0/2E000100
    >> pg_basebackup: waiting for background process to finish streaming ...
    >> pg_basebackup: error: child thread exited with error 1
    >> [edb@localhost bin]$
    >>
    >> backup folder not cleaned
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
    >>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
    >>             postgresql.conf
    >> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
    >> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
    >>  postgresql.auto.conf
    >> [edb@localhost bin]$
    >>
    >>
    >>
    >>
    >> If the same case is executed without the parallel backup patch then the
    >> backup folder is cleaned after the error is displayed.
    >>
    >> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T
    >> /home/edb/tbl1=/home/edb/tbl_res -S test999
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: error: could not send replication command
    >> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
    >> pg_basebackup: write-ahead log end point: 0/2B000100
    >> pg_basebackup: waiting for background process to finish streaming ...
    >> pg_basebackup: error: child process exited with exit code 1
    >> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
    >> pg_basebackup: changes to tablespace directories will not be undone
    >>
    >
    >
    > Hi Asif
    >
    > A similar case is when DB Server is shut down while the Parallel Backup is
    > in progress then the correct error is displayed but then the backup folder
    > is not cleaned and leaves a corrupt backup. I think one bug fix will solve
    > all these cases where clean up is not done when parallel backup is failed.
    >
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    > [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j
    > 8
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: created temporary replication slot "pg_basebackup_57337"
    > pg_basebackup: backup worker (0) created
    > pg_basebackup: backup worker (1) created
    > pg_basebackup: backup worker (2) created
    > pg_basebackup: backup worker (3) created
    > pg_basebackup: backup worker (4) created
    > pg_basebackup: backup worker (5) created
    > pg_basebackup: backup worker (6) created
    > pg_basebackup: backup worker (7) created
    > pg_basebackup: error: could not read COPY data: server closed the
    > connection unexpectedly
    > This probably means the server terminated abnormally
    > before or while processing the request.
    > pg_basebackup: error: could not read COPY data: server closed the
    > connection unexpectedly
    > This probably means the server terminated abnormally
    > before or while processing the request.
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    >
    > Same case when executed on pg_basebackup without the Parallel backup patch
    > then proper clean up is done.
    >
    > [edb@localhost bin]$
    > [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: created temporary replication slot "pg_basebackup_5590"
    > pg_basebackup: error: could not read COPY data: server closed the
    > connection unexpectedly
    > This probably means the server terminated abnormally
    > before or while processing the request.
    > pg_basebackup: removing contents of data directory
    > "/home/edb/Desktop/backup/"
    > [edb@localhost bin]$
    >
    > Thanks
    >
    >
    >>
    >> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >>
    >>>
    >>>
    >>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
    >>> wrote:
    >>>
    >>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
    >>>> wrote:
    >>>> >> Why would you need to do that? As long as the process where
    >>>> >> STOP_BACKUP can do the check, that seems good enough.
    >>>> >
    >>>> > Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
    >>>> > in progress. So if the backup is a large one, early error detection would be much beneficial.
    >>>> > This is the current behavior of non-parallel backup as well.
    >>>>
    >>>> Because non-parallel backup does not feature early detection of this
    >>>> error, it is not necessary to make parallel backup do so. Indeed, it
    >>>> is undesirable. If you want to fix that problem, do it on a separate
    >>>> thread in a separate patch. A patch proposing to make parallel backup
    >>>> inconsistent in behavior with non-parallel backup will be rejected, at
    >>>> least if I have anything to say about it.
    >>>>
    >>>> TBH, fixing this doesn't seem like an urgent problem to me. The
    >>>> current situation is not great, but promotions ought to be relatively
    >>>> infrequent, so I'm not sure it's a huge problem in practice. It is
    >>>> also worth considering whether the right fix is to figure out how to
    >>>> make that case actually work, rather than just making it fail quicker.
    >>>> I don't currently understand the reason for the prohibition so I can't
    >>>> express an intelligent opinion on what the right answer is here, but
    >>>> it seems like it ought to be investigated before somebody goes and
    >>>> builds a bunch of infrastructure to make the error more timely.
    >>>>
    >>>
    >>> Non-parallel backup already does the early error checking. I only intended
    >>> to make parallel behave the same as non-parallel here. So, I agree with
    >>> you that the behavior of parallel backup should be consistent with the
    >>> non-parallel one.  Please see the code snippet below from
    >>> basebackup.c:sendDir()
    >>>
    >>>
    >>> /*
    >>>  * Check if the postmaster has signaled us to exit, and abort with an
    >>>  * error in that case. The error handler further up will call
    >>>  * do_pg_abort_backup() for us. Also check that if the backup was
    >>>  * started while still in recovery, the server wasn't promoted.
    >>>  * do_pg_stop_backup() will check that too, but it's better to stop
    >>>  * the backup early than continue to the end and fail there.
    >>>  */
    >>> CHECK_FOR_INTERRUPTS();
    >>> if (RecoveryInProgress() != backup_started_in_recovery)
    >>>     ereport(ERROR,
    >>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    >>>              errmsg("the standby was promoted during online backup"),
    >>>              errhint("This means that the backup being taken is corrupt "
    >>>                      "and should not be used. "
    >>>                      "Try taking another online backup.")));
    >>>>
    >>>>
    >>>> > Okay, then I will add the shared state. And since we are adding the shared state, we can use
    >>>> > that for throttling, progress-reporting and standby early error checking.
    >>>>
    >>>> Please propose a grammar here for all the new replication commands you
    >>>> plan to add before going and implement everything. That will make it
    >>>> easier to hash out the design without forcing you to keep changing the
    >>>> code. Your design should include a sketch of how several sets of
    >>>> coordinating backends taking several concurrent parallel backups will
    >>>> end up with one shared state per parallel backup.
    >>>>
    >>>> > There are two possible options:
    >>>> >
    >>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
    >>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
    >>>> >
    >>>> > This BackupID should be given back as a response to start backup command. All client workers
    >>>> > must append this ID to all parallel backup replication commands. So that we can use this identifier
    >>>> > to search for that particular backup. Does that sound good?
    >>>>
    >>>> Using the WAL start location as the backup ID seems like it might be
    >>>> problematic -- could a single checkpoint not end up as the start
    >>>> location for multiple backups started at the same time? Whether that's
    >>>> possible now or not, it seems unwise to hard-wire that assumption into
    >>>> the wire protocol.
    >>>>
    >>>> I was thinking that perhaps the client should generate a unique backup
    >>>> ID, e.g. leader does:
    >>>>
    >>>> START_BACKUP unique_backup_id [options]...
    >>>>
    >>>> And then others do:
    >>>>
    >>>> JOIN_BACKUP unique_backup_id
    >>>>
    >>>> My thought is that you will have a number of shared memory structure
    >>>> equal to max_wal_senders, each one large enough to hold the shared
    >>>> state for one backup. The shared state will include
    >>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
    >>>> ID. START_BACKUP would allocate one and copy the name into it;
    >>>> JOIN_BACKUP would search for one by name.
    >>>>
    >>>> If you want to generate the name on the server side, then I suppose
    >>>> START_BACKUP would return a result set that includes the backup ID,
    >>>> and clients would have to specify that same backup ID when invoking
    >>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
    >>>> better. Either way, the backup ID should be something long and hard to
    >>>> guess, not e.g. the leader processes' PID. I think we should generate
    >>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
    >>>> result to get a string. That way there's almost no risk of two backup
    >>>> IDs colliding accidentally, and even if we somehow had a malicious
    >>>> user trying to screw up somebody else's parallel backup by choosing a
    >>>> colliding backup ID, it would be pretty hard to have any success. A
    >>>> user with enough access to do that sort of thing can probably cause a
    >>>> lot worse problems anyway, but it seems pretty easy to guard against
    >>>> intentional collisions robustly here, so I think we should.
    >>>>
    >>>>
    >>> Okay, so if we are to add another replication command ‘JOIN_BACKUP
    >>> unique_backup_id’ to make workers find the relevant shared state, there
    >>> won't be any need for changing the grammar for any other command. The
    >>> START_BACKUP can return the unique_backup_id in the result set.
    >>>
    >>> I am thinking of the following struct for shared state:
    >>>
    >>> typedef struct
    >>> {
    >>>     char backupid[NAMEDATALEN];
    >>>     XLogRecPtr startptr;
    >>>     slock_t lock;
    >>>     int64 throttling_counter;
    >>>     bool backup_started_in_recovery;
    >>> } BackupSharedState;
    >>>>
    >>>>
    >>> The shared state structure entries would be maintained by a shared hash
    >>> table. There will be one structure per parallel backup. Since a single
    >>> parallel backup can engage more than one WAL sender, I think
    >>> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
    >>> since there will be at least 2 connections per parallel backup?
    >>> Alternatively, we can set a new GUC that defines the maximum number of
    >>> concurrent parallel backups, i.e. ‘max_concurent_backups_allowed = 10’
    >>> perhaps, or we can make it user-configurable.
    >>>
    >>> The key would be “backupid=hex_encode(pg_strong_random(16))”
    >>>
    >>> Checking for Standby Promotion:
    >>> At the START_BACKUP command, we initialize
    >>> BackupSharedState.backup_started_in_recovery
    >>> and keep checking it whenever send_file () is called to send a new file.
    >>>
    >>> Throttling:
    >>> BackupSharedState.throttling_counter - The throttling logic remains the
    >>> same as for non-parallel backup, with the exception that multiple threads
    >>> will now be updating it. So in parallel backup, this will represent the
    >>> overall bytes that have been transferred, and the workers would sleep if
    >>> they have exceeded the limit. Hence, the shared state carries a lock to
    >>> safely update the throttling value atomically.
    >>>
    >>> Progress Reporting:
    >>> I think we should add progress-reporting for parallel backup as a
    >>> separate patch. The relevant entries for progress-reporting, such as
    >>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
    >>> structure as well.
    >>>
    >>>
    >>> Grammar:
    >>> There is a change in the result set being returned for the START_BACKUP
    >>> command; unique_backup_id is added. Additionally, the JOIN_BACKUP
    >>> replication command is added. SEND_FILES has been renamed to SEND_FILE.
    >>> There are no other changes to the grammar.
    >>>
    >>> START_BACKUP [LABEL '<label>'] [FAST]
    >>>   - returns startptr, tli, backup_label, unique_backup_id
    >>> STOP_BACKUP [NOWAIT]
    >>>   - returns startptr, tli, backup_label
    >>> JOIN_BACKUP ‘unique_backup_id’
    >>>   - attaches a shared state identified by ‘unique_backup_id’ to a
    >>> backend process.
    >>>
    >>> LIST_TABLESPACES [PROGRESS]
    >>> LIST_FILES [TABLESPACE]
    >>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
    >>>
    >>>
    >>> --
    >>> Asif Rehman
    >>> Highgo Software (Canada/China/Pakistan)
    >>> URL : www.highgo.ca
    >>>
    >>>
    >>
    >> --
    >> Regards
    >> ====================================
    >> Kashif Zeeshan
    >> Lead Quality Assurance Engineer / Manager
    >>
    >> EnterpriseDB Corporation
    >> The Enterprise Postgres Company
    >>
    >>
    >
    > --
    > Regards
    > ====================================
    > Kashif Zeeshan
    > Lead Quality Assurance Engineer / Manager
    >
    > EnterpriseDB Corporation
    > The Enterprise Postgres Company
    >
    >
    
    -- 
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  73. Re: WIP/PoC for parallel backup

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2020-04-07T17:02:55Z

    On Tue, Apr 7, 2020 at 10:14 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    > Hi,
    >
    > Thanks, Kashif and Rajkumar. I have fixed the reported issues.
    >
    > I have added the shared state as previously described. The new grammar
    > changes are as follows:
    >
    > START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    >     - This generates a unique backup ID using pg_strong_random(16),
    >       hex-encodes it, and returns it in the result set.
    >     - It also creates a shared state and adds it to the hash table. The
    >       hash table size is set to BACKUP_HASH_SIZE=10, but since the hash
    >       table can expand dynamically, I think that is a sufficient initial
    >       size. max_wal_senders is not used, because it can be set to quite
    >       a large value.
    >
    > JOIN_BACKUP 'backup_id'
    >     - finds 'backup_id' in hashtable and attaches it to server process.
    >
    >
    > SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    >     - renamed SEND_FILES to SEND_FILE
    >     - removed START_WAL_LOCATION from this because 'startptr' is now
    >       accessible through shared state.
    >
    > There is no change in other commands:
    > STOP_BACKUP [NOWAIT]
    > LIST_TABLESPACES [PROGRESS]
    > LIST_FILES [TABLESPACE]
    > LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >
    > The current patches (v11) have been rebased to the latest master. The
    > backup manifest is enabled by default, so I have disabled it for parallel
    > backup mode and added a warning so that the user is aware of it and does
    > not expect a manifest in the backup.
    >
    
    So, are you working on making it work? I don't think a parallel backup
    feature should be creating a backup with no manifest.
    
    
    >
    >
    >
    > --
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  74. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-07T17:25:05Z

    On Tue, Apr 7, 2020 at 10:03 PM Jeevan Chalke <
    jeevan.chalke@enterprisedb.com> wrote:
    
    >
    >
    > So, are you working on making it work? I don't think a parallel backup
    > feature should be creating a backup with no manifest.
    >
    
    I will; however, parallel backup is already quite a large patch, so I think
    we should first agree on the current work before adding backup manifest and
    progress-reporting support.
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  75. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-07T17:36:21Z

    On Fri, Apr 3, 2020 at 4:46 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > Non-parallel backup already does the early error checking. I only intended
    > to make parallel behave the same as non-parallel here. So, I agree with
    > you that the behavior of parallel backup should be consistent with the
    > non-parallel one.  Please see the code snippet below from
    > basebackup.c:sendDir()
    
    Oh, OK. So then we need to preserve that behavior, I think. Sorry, I
    didn't realize the check was happening there.
    
    > I am thinking of the following struct for shared state:
    >> typedef struct
    >> {
    >> char backupid[NAMEDATALEN];
    >> XLogRecPtr startptr;
    >> slock_t lock;
    >> int64 throttling_counter;
    >> bool backup_started_in_recovery;
    >> } BackupSharedState;
    
    Looks broadly reasonable. Can anything other than lock and
    throttling_counter change while it's running? If not, how about using
    pg_atomic_uint64 for the throttling counter, and dropping lock? If
    that gets too complicated it's OK to keep it as you have it.
    
    > The shared state structure entries would be maintained by a shared hash table.
    > There will be one structure per parallel backup. Since a single parallel backup
    > can engage more than one wal sender, so I think max_wal_senders might be a little
    > too much; perhaps max_wal_senders/2 since there will be at least 2 connections
    > per parallel backup? Alternatively, we can set a new GUC that defines the maximum
    > number of for concurrent parallel backups i.e. ‘max_concurent_backups_allowed = 10’
    > perhaps, or we can make it user-configurable.
    
    I don't think you need a hash table. Linear search should be fine. And
    I see no point in dividing max_wal_senders by 2 either. The default is
    *10*. You'd need to increase that by more than an order of magnitude
    for a hash table to be needed, and more than that for the shared
    memory consumption to matter.
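
A linear search over a small fixed array could look roughly like the sketch below. This is only an illustration: BackupSlot, MAX_BACKUP_SLOTS, BACKUP_ID_LEN and both function names are hypothetical, the slot count of 10 simply mirrors the default max_wal_senders mentioned above, and the locking that a real shared-memory array would need is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BACKUP_SLOTS 10     /* assumption: mirrors default max_wal_senders */
#define BACKUP_ID_LEN 64        /* assumption: same as the default NAMEDATALEN */

typedef struct
{
    int         in_use;
    char        backupid[BACKUP_ID_LEN];
} BackupSlot;

/* START_BACKUP side: claim the first free slot and copy the ID into it. */
BackupSlot *
backup_slot_allocate(BackupSlot *slots, const char *backupid)
{
    size_t      i;

    assert(slots != NULL && backupid != NULL);
    if (strlen(backupid) >= BACKUP_ID_LEN)
        return NULL;            /* ID does not fit in a slot */

    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = 1;
            strcpy(slots[i].backupid, backupid);
            return &slots[i];
        }
    }
    return NULL;                /* every slot is busy */
}

/* JOIN_BACKUP side: linear search for an in-use slot with a matching ID. */
BackupSlot *
backup_slot_find(BackupSlot *slots, const char *backupid)
{
    size_t      i;

    assert(slots != NULL && backupid != NULL);
    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (slots[i].in_use && strcmp(slots[i].backupid, backupid) == 0)
            return &slots[i];
    }
    return NULL;                /* unknown backup ID */
}
```

With only a handful of slots, the scan touches at most MAX_BACKUP_SLOTS short strings per lookup, which is the point being made about a hash table being unnecessary at this scale.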
    
    > The key would be “backupid=hex_encode(pg_strong_random(16))”
    
    wfm
    
    > Progress Reporting:
    > Although I think we should add progress-reporting for parallel backup as a
    > separate patch. The relevant entries for progress-reporting such as
    > ‘backup_total’ and ‘backup_streamed’ would be then added to this structure
    > as well.
    
    I mean, you can separate it for review if you wish, but it would need
    to be committed together.
    
    > START_BACKUP [LABEL '<label>'] [FAST]
    >   - returns startptr, tli, backup_label, unique_backup_id
    
    OK. But what if I want to use this interface for a non-parallel backup?
    
    > STOP_BACKUP [NOWAIT]
    >   - returns startptr, tli, backup_label
    
    I don't think it makes sense for STOP_BACKUP to return the same values
    that START_BACKUP already returned. Presumably STOP_BACKUP should
    return the end LSN. It could also return the backup label and
    tablespace map files, as the corresponding SQL function does, unless
    there's some better way of returning those in this case.
    
    > JOIN_BACKUP ‘unique_backup_id’
    >   - attaches a shared state identified by ‘unique_backup_id’ to a backend process.
    
    OK.
    
    > LIST_TABLESPACES [PROGRESS]
    
    OK.
    
    > LIST_FILES [TABLESPACE]
    
    OK.
    
    > LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    
    Why not just LIST_WAL_FILES 'startptr' 'endptr'?
    
    > SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
    
    Why parens? That seems useless.
    
    Maybe it would make sense to have SEND_DATA_FILE 'datafilename' and
    SEND_WAL_FILE 'walfilename' as separate commands. But not sure.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  76. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-07T17:37:30Z

    On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > I will, however parallel backup is already quite a large patch. So I think we should first
    > agree on the current work before adding a backup manifest and progress-reporting support.
    
    It's going to be needed for commit, but it may make sense for us to do
    more review of what you've got here before we worry about it.
    
    I'm gonna try to find some time for that as soon as I can.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  77. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-08T05:48:31Z

    Hi Asif,
    
    Thanks for the new patches.
    
    The patches need to be rebased on head. I am getting a failure while
    applying the 0003 patch.
    [edb@localhost postgresql]$ git apply v11/0003-Parallel-Backup-Backend-Replication-commands.patch
    error: patch failed: src/backend/storage/ipc/ipci.c:147
    error: src/backend/storage/ipc/ipci.c: patch does not apply
    
    I have applied v11 patches on commit -
    23ba3b5ee278847e4fad913b80950edb2838fd35 to test further.
    
    pg_basebackup has a new option, "--no-estimate-size", and pg_basebackup
    crashes when this option is used.
    
    [edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2
    Segmentation fault (core dumped)
    
    --stacktrace
    [edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
    Loaded symbols for /lib64/libselinux.so.1
    Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size
    --jobs=2'.
    Program terminated with signal 11, Segmentation fault.
    #0  ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value
    optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
    298  while (ISSPACE (*s))
    Missing separate debuginfos, use: debuginfo-install
    keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
    libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
    openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
    (gdb) bt
    #0  ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value
    optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
    #1  0x0000003921233b30 in atoi (nptr=<value optimized out>) at atoi.c:28
    #2  0x000000000040841e in main (argc=5, argv=0x7ffeaa6fb968) at
    pg_basebackup.c:2526
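
The backtrace shows atoi() being called with a NULL pointer from main(). One plausible cause, stated here as an assumption because the patch source is not shown, is that the option handling reads an argument string for an option that takes no argument. A defensive parse helper along the following lines would reject NULL and malformed input instead of crashing; the name parse_int_option is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal integer option value into *result.
 * Returns 1 on success, 0 if 'arg' is NULL, empty, not a number,
 * followed by trailing garbage, or out of range for int.
 */
int
parse_int_option(const char *arg, int *result)
{
    char       *end;
    long        val;

    assert(result != NULL);

    if (arg == NULL || *arg == '\0')
        return 0;               /* atoi(NULL) would crash here */

    errno = 0;
    val = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || val < INT_MIN || val > INT_MAX)
        return 0;

    *result = (int) val;
    return 1;
}
```

A caller would check the return value and report a usage error when it is 0, rather than passing the raw argument pointer to atoi().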
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
    
  78. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-08T07:09:20Z

    Rebased and updated to current master (d025cf88ba). v12 is attached.
    
    Also, changed the grammar for LIST_WAL_FILES and SEND_FILE to:
    
    - LIST_WAL_FILES 'startptr' 'endptr'
    - SEND_FILE 'FILE'  [NOVERIFY_CHECKSUMS]
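
For illustration, a client could assemble the revised LIST_WAL_FILES command as sketched below. This assumes the two pointers are sent in the usual 'X/X' LSN text form (high and low 32 bits in hexadecimal), as in the earlier START_WAL_LOCATION 'X/X' syntax; the helper name build_list_wal_files is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "LIST_WAL_FILES 'startptr' 'endptr'" into 'buf'.
 * Each LSN is printed as high/low 32-bit halves in hex, e.g. 0/2E000028.
 * Returns 1 on success, 0 if the buffer is too small.
 */
int
build_list_wal_files(char *buf, size_t buflen,
                     uint64_t startptr, uint64_t endptr)
{
    int         n;

    assert(buf != NULL);

    n = snprintf(buf, buflen,
                 "LIST_WAL_FILES '%" PRIX32 "/%" PRIX32 "' '%" PRIX32 "/%" PRIX32 "'",
                 (uint32_t) (startptr >> 32), (uint32_t) startptr,
                 (uint32_t) (endptr >> 32), (uint32_t) endptr);

    /* snprintf reports the length it needed; treat truncation as failure */
    return n >= 0 && (size_t) n < buflen;
}
```

The start and end points would come from the values reported when the backup is started and stopped, such as the "write-ahead log start point" and "end point" lines that pg_basebackup prints in verbose mode.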
    
    
    
    -- 
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  79. Re: WIP/PoC for parallel backup

    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-08T13:53:14Z

    On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    > Hi,
    >
    > Thanks, Kashif and Rajkumar. I have fixed the reported issues.
    >
    > I have added the shared state as previously described. The new grammar
    > changes
    > are as follows:
    >
    > START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    >     - This will generate a unique backupid using pg_strong_random(16),
    >       hex-encode it, and return it in the result set.
    >     - It will also create a shared state and add it to the hashtable. The
    >       hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable
    >       can expand dynamically, I think it's a sufficient initial size.
    >       max_wal_senders is not used, because it can be set to quite a large
    >       value.
    >
    > JOIN_BACKUP 'backup_id'
    >     - finds 'backup_id' in hashtable and attaches it to server process.
    >
    >
    > SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    >     - renamed SEND_FILES to SEND_FILE
    >     - removed START_WAL_LOCATION from this because 'startptr' is now
    > accessible through
    >       shared state.
    >
    > There is no change in other commands:
    > STOP_BACKUP [NOWAIT]
    > LIST_TABLESPACES [PROGRESS]
    > LIST_FILES [TABLESPACE]
    > LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >
    > The current patches (v11) have been rebased to the latest master. The
    > backup manifest is enabled by default, so I have disabled it for parallel
    > backup mode and have generated a warning so that the user is aware of it
    > and does not expect it in the backup.
    >
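    For illustration, the backup ID scheme quoted above (16 random bytes,
    hex-encoded) can be sketched in standalone C. The helper name
    hex_encode_id and the constant BACKUP_ID_BYTES are hypothetical and are
    not taken from the patch. In the server the raw bytes would come from
    pg_strong_random(); here the caller supplies them so that the sketch
    compiles outside PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

#define BACKUP_ID_BYTES 16      /* raw random bytes, as proposed in the thread */

/*
 * Hex-encode 'len' raw bytes into 'out', which must have room for
 * 2 * len + 1 characters (including the terminating NUL).
 * Hypothetical helper for illustration; not the function used by the patch.
 */
static void
hex_encode_id(const unsigned char *raw, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t      i;

    assert(raw != NULL && out != NULL);

    for (i = 0; i < len; i++)
    {
        out[2 * i] = digits[raw[i] >> 4];       /* high nibble */
        out[2 * i + 1] = digits[raw[i] & 0x0F]; /* low nibble */
    }
    out[2 * len] = '\0';
}
```

    A 16-byte input yields a 32-character ID, so a buffer of
    2 * BACKUP_ID_BYTES + 1 bytes is enough to hold it.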
    Hi Asif
    
    I have verified the bug fixes. One bug is fixed and now works as expected.
    
    While verifying the other bug fixes I ran into the following issues; please
    have a look.
    
    
    1) The bug fixes listed below now produce a segmentation fault.
    
    Please note that, for reference, I have added only a description of each
    bug whose fix I tried to verify, since the steps were given in previous
    emails. A backtrace is included with each case; both cases point to the
    same bug.
    
    a) The backup failed with the error "error: could not connect to server: could
    not look up local user ID 1000: Too many open files" when max_wal_senders
    was set to 2000.
    
    
    [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
    pg_basebackup: warning: backup manifest is disabled in parallel backup mode
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_9925"
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: backup worker (2) created
    pg_basebackup: backup worker (3) created
    ….
    ….
    pg_basebackup: backup worker (1014) created
    pg_basebackup: backup worker (1015) created
    pg_basebackup: backup worker (1016) created
    pg_basebackup: backup worker (1017) created
    pg_basebackup: error: could not connect to server: could not look up local
    user ID 1000: Too many open files
    Segmentation fault
    [edb@localhost bin]$
    
    
    [edb@localhost bin]$
    [edb@localhost bin]$ gdb pg_basebackup
    /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from
    /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    [New LWP 13219]
    [New LWP 13222]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `./pg_basebackup -v -j 1990 -D
    /home/edb/Desktop/backup/'.
    Program terminated with signal 11, Segmentation fault.
    #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    (gdb) bt
    #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
    listp=0x7f22272f86c8 <__exit_funcs>,
    run_list_atexit=run_list_atexit@entry=true)
    at exit.c:77
    #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
    #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at
    pg_basebackup.c:2811
    #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
    pg_basebackup.c:2765
    (gdb)
    
    
    
    
    b) When two backups were executed at the same time, a FATAL error was
    raised due to max_wal_senders, but instead of exiting, the backup completed.
    
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup1/
    pg_basebackup: warning: backup manifest is disabled in parallel backup mode
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_17066"
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: backup worker (2) created
    pg_basebackup: backup worker (3) created
    pg_basebackup: backup worker (4) created
    pg_basebackup: backup worker (5) created
    pg_basebackup: backup worker (6) created
    pg_basebackup: error: could not connect to server: FATAL:  number of
    requested standby connections exceeds max_wal_senders (currently 10)
    Segmentation fault (core dumped)
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$ gdb pg_basebackup
    /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from
    /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    [New LWP 17041]
    [New LWP 17067]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `./pg_basebackup -v -j 8 -D
    /home/edb/Desktop/backup1/'.
    Program terminated with signal 11, Segmentation fault.
    #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    (gdb) bt
    #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
    listp=0x7f051f1436c8 <__exit_funcs>,
    run_list_atexit=run_list_atexit@entry=true)
    at exit.c:77
    #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
    #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at
    pg_basebackup.c:2811
    #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
    pg_basebackup.c:2765
    (gdb)
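    
    Both backtraces show pthread_join() being called with threadid=0 from
    cleanup_workers() in the atexit path, after create_parallel_workers()
    called exit(). One plausible reading is that the cleanup loop also joins
    worker slots whose threads were never created. Below is a minimal sketch
    of guarding against that. All names in it (WorkerSlot, start_workers,
    cleanup_workers_safe, fail_at, MAX_WORKERS) are hypothetical and are not
    taken from the patch; fail_at only simulates a connection failure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 8

/* Hypothetical per-worker bookkeeping; not the patch's actual struct. */
typedef struct
{
    pthread_t   thread;
    bool        started;        /* set only after pthread_create() succeeds */
} WorkerSlot;

static WorkerSlot workers[MAX_WORKERS];

/* Trivial worker body; a real worker would fetch files here. */
static void *
worker_main(void *arg)
{
    (void) arg;
    return NULL;
}

/*
 * Start up to 'n' workers and stop at the first failure. 'fail_at'
 * simulates a failed connection for that worker index (use -1 for none).
 * Returns the number of workers actually started.
 */
static int
start_workers(int n, int fail_at)
{
    int         i;
    int         started = 0;

    assert(n >= 0);

    for (i = 0; i < n && i < MAX_WORKERS; i++)
    {
        if (i == fail_at)
            break;              /* simulated "could not connect to server" */
        if (pthread_create(&workers[i].thread, NULL, worker_main, NULL) != 0)
            break;
        workers[i].started = true;
        started++;
    }
    return started;
}

/*
 * Join only the workers that were really created, so an exit handler
 * never passes an unset thread id to pthread_join(). Returns the number
 * of workers joined.
 */
static int
cleanup_workers_safe(void)
{
    int         i;
    int         joined = 0;

    for (i = 0; i < MAX_WORKERS; i++)
    {
        if (!workers[i].started)
            continue;
        if (pthread_join(workers[i].thread, NULL) == 0)
            joined++;
        workers[i].started = false;     /* makes a second call a no-op */
    }
    return joined;
}
```

    With this shape, calling the cleanup routine from an atexit handler
    after only some workers were created joins just those workers, and
    calling it twice is harmless.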
    
    
    
    
    2) The following bug is not fixed yet.
    
    When the DB server is shut down while a parallel backup is in progress, the
    correct error is displayed, but the backup folder is not cleaned up and a
    corrupt backup is left behind.
    
    [edb@localhost bin]$
    [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
    pg_basebackup: warning: backup manifest is disabled in parallel backup mode
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_16235"
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: backup worker (2) created
    pg_basebackup: backup worker (3) created
    pg_basebackup: backup worker (4) created
    pg_basebackup: backup worker (5) created
    pg_basebackup: backup worker (6) created
    pg_basebackup: backup worker (7) created
    pg_basebackup: error: could not read COPY data: server closed the
    connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    pg_basebackup: error: could not read COPY data: server closed the
    connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    pg_basebackup: removing contents of data directory
    "/home/edb/Desktop/backup/"
    pg_basebackup: error: could not read COPY data: server closed the
    connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$
    
    
    
    [edb@localhost bin]$
    [edb@localhost bin]$ ls /home/edb/Desktop/backup
    base         pg_hba.conf    pg_logical    pg_notify    pg_serial
    pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
    pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
     pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
    [edb@localhost bin]$
    [edb@localhost bin]$
    
    
    
    
    Thanks
    Kashif Zeeshan
    
    >
    >
    > On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
    > kashif.zeeshan@enterprisedb.com> wrote:
    >
    >>
    >>
    >> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
    >> kashif.zeeshan@enterprisedb.com> wrote:
    >>
    >>> Hi Asif
    >>>
    >>> When a non-existent slot is used with a tablespace, the correct error is
    >>> displayed, but the backup folder is not cleaned up and a corrupt backup
    >>> is left behind.
    >>>
    >>> Steps
    >>> =======
    >>>
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$ mkdir /home/edb/tbl1
    >>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
    >>> [edb@localhost bin]$
    >>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
    >>> CREATE TABLESPACE
    >>> postgres=#
    >>> postgres=# create table t1 (a int) tablespace tbl1;
    >>> CREATE TABLE
    >>> postgres=# insert into t1 values(100);
    >>> INSERT 0 1
    >>> postgres=# insert into t1 values(200);
    >>> INSERT 0 1
    >>> postgres=# insert into t1 values(300);
    >>> INSERT 0 1
    >>> postgres=#
    >>>
    >>>
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
    >>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
    >>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >>> pg_basebackup: checkpoint completed
    >>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
    >>> pg_basebackup: starting background WAL receiver
    >>> pg_basebackup: error: could not send replication command
    >>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
    >>> pg_basebackup: backup worker (0) created
    >>> pg_basebackup: backup worker (1) created
    >>> pg_basebackup: write-ahead log end point: 0/2E000100
    >>> pg_basebackup: waiting for background process to finish streaming ...
    >>> pg_basebackup: error: child thread exited with error 1
    >>> [edb@localhost bin]$
    >>>
    >>> backup folder not cleaned
    >>>
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
    >>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
    >>>             postgresql.conf
    >>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
    >>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
    >>>  postgresql.auto.conf
    >>> [edb@localhost bin]$
    >>>
    >>>
    >>>
    >>>
    >>> If the same case is executed without the parallel backup patch then the
    >>> backup folder is cleaned after the error is displayed.
    >>>
    >>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>> -T /home/edb/tbl1=/home/edb/tbl_res -S test999
    >>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >>> pg_basebackup: checkpoint completed
    >>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
    >>> pg_basebackup: starting background WAL receiver
    >>> pg_basebackup: error: could not send replication command
    >>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
    >>> pg_basebackup: write-ahead log end point: 0/2B000100
    >>> pg_basebackup: waiting for background process to finish streaming ...
    >>> pg_basebackup: error: child process exited with exit code 1
    >>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
    >>> pg_basebackup: changes to tablespace directories will not be undone
    >>>
    >>
    >>
    >> Hi Asif
    >>
    >> A similar case: when the DB server is shut down while a parallel backup
    >> is in progress, the correct error is displayed, but the backup folder is
    >> not cleaned up and a corrupt backup is left behind. I think one bug fix
    >> will solve all these cases where cleanup is not done when a parallel
    >> backup fails.
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >> -j 8
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
    >> pg_basebackup: backup worker (0) created
    >> pg_basebackup: backup worker (1) created
    >> pg_basebackup: backup worker (2) created
    >> pg_basebackup: backup worker (3) created
    >> pg_basebackup: backup worker (4) created
    >> pg_basebackup: backup worker (5) created
    >> pg_basebackup: backup worker (6) created
    >> pg_basebackup: backup worker (7) created
    >> pg_basebackup: error: could not read COPY data: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> pg_basebackup: error: could not read COPY data: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >>
    >> When the same case is executed with pg_basebackup without the parallel
    >> backup patch, proper cleanup is done.
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
    >> pg_basebackup: error: could not read COPY data: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> pg_basebackup: removing contents of data directory
    >> "/home/edb/Desktop/backup/"
    >> [edb@localhost bin]$
    >>
    >> Thanks
    >>
    >>
    >>>
    >>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
    >>> wrote:
    >>>
    >>>>
    >>>>
    >>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
    >>>> wrote:
    >>>>
    >>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
    >>>>> wrote:
    >>>>> >> Why would you need to do that? As long as the process where
    >>>>> >> STOP_BACKUP can do the check, that seems good enough.
    >>>>> >
    >>>>> > Yes, but the user will get the error only after the STOP_BACKUP, not
    >>>>> while the backup is
    >>>>> > in progress. So if the backup is a large one, early error detection
    >>>>> would be much beneficial.
    >>>>> > This is the current behavior of non-parallel backup as well.
    >>>>>
    >>>>> Because non-parallel backup does not feature early detection of this
    >>>>> error, it is not necessary to make parallel backup do so. Indeed, it
    >>>>> is undesirable. If you want to fix that problem, do it on a separate
    >>>>> thread in a separate patch. A patch proposing to make parallel backup
    >>>>> inconsistent in behavior with non-parallel backup will be rejected, at
    >>>>> least if I have anything to say about it.
    >>>>>
    >>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
    >>>>> current situation is not great, but promotions ought to be relatively
    >>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
    >>>>> also worth considering whether the right fix is to figure out how to
    >>>>> make that case actually work, rather than just making it fail quicker.
    >>>>> I don't currently understand the reason for the prohibition so I can't
    >>>>> express an intelligent opinion on what the right answer is here, but
    >>>>> it seems like it ought to be investigated before somebody goes and
    >>>>> builds a bunch of infrastructure to make the error more timely.
    >>>>>
    >>>>
    >>>> Non-parallel backup already does the early error checking. I only
    >>>> intended to make parallel behave the same as non-parallel here. So, I
    >>>> agree with you that the behavior of parallel backup should be consistent
    >>>> with the non-parallel one. Please see the code snippet below from
    >>>> basebackup.c:sendDir()
    >>>>
    >>>>
    >>>> /*
    >>>>  * Check if the postmaster has signaled us to exit, and abort with an
    >>>>  * error in that case. The error handler further up will call
    >>>>  * do_pg_abort_backup() for us. Also check that if the backup was
    >>>>  * started while still in recovery, the server wasn't promoted.
    >>>>  * do_pg_stop_backup() will check that too, but it's better to stop
    >>>>  * the backup early than continue to the end and fail there.
    >>>>  */
    >>>> CHECK_FOR_INTERRUPTS();
    >>>> if (RecoveryInProgress() != backup_started_in_recovery)
    >>>>     ereport(ERROR,
    >>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    >>>>              errmsg("the standby was promoted during online backup"),
    >>>>              errhint("This means that the backup being taken is corrupt "
    >>>>                      "and should not be used. "
    >>>>                      "Try taking another online backup.")));
    >>>>>
    >>>>> > Okay, then I will add the shared state. And since we are adding the
    >>>>> shared state, we can use
    >>>>> > that for throttling, progress-reporting and standby early error
    >>>>> checking.
    >>>>>
    >>>>> Please propose a grammar here for all the new replication commands you
    >>>>> plan to add before going and implement everything. That will make it
    >>>>> easier to hash out the design without forcing you to keep changing the
    >>>>> code. Your design should include a sketch of how several sets of
    >>>>> coordinating backends taking several concurrent parallel backups will
    >>>>> end up with one shared state per parallel backup.
    >>>>>
    >>>>> > There are two possible options:
    >>>>> >
    >>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
    >>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
    >>>>> >
    >>>>> > This BackupID should be given back as a response to start backup
    >>>>> command. All client workers
    >>>>> > must append this ID to all parallel backup replication commands. So
    >>>>> that we can use this identifier
    >>>>> > to search for that particular backup. Does that sound good?
    >>>>>
    >>>>> Using the WAL start location as the backup ID seems like it might be
    >>>>> problematic -- could a single checkpoint not end up as the start
    >>>>> location for multiple backups started at the same time? Whether that's
    >>>>> possible now or not, it seems unwise to hard-wire that assumption into
    >>>>> the wire protocol.
    >>>>>
    >>>>> I was thinking that perhaps the client should generate a unique backup
    >>>>> ID, e.g. leader does:
    >>>>>
    >>>>> START_BACKUP unique_backup_id [options]...
    >>>>>
    >>>>> And then others do:
    >>>>>
    >>>>> JOIN_BACKUP unique_backup_id
    >>>>>
    >>>>> My thought is that you will have a number of shared memory structure
    >>>>> equal to max_wal_senders, each one large enough to hold the shared
    >>>>> state for one backup. The shared state will include
    >>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
    >>>>> ID. START_BACKUP would allocate one and copy the name into it;
    >>>>> JOIN_BACKUP would search for one by name.
    >>>>>
    >>>>> If you want to generate the name on the server side, then I suppose
    >>>>> START_BACKUP would return a result set that includes the backup ID,
    >>>>> and clients would have to specify that same backup ID when invoking
    >>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
    >>>>> better. Either way, the backup ID should be something long and hard to
    >>>>> guess, not e.g. the leader processes' PID. I think we should generate
    >>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
    >>>>> result to get a string. That way there's almost no risk of two backup
    >>>>> IDs colliding accidentally, and even if we somehow had a malicious
    >>>>> user trying to screw up somebody else's parallel backup by choosing a
    >>>>> colliding backup ID, it would be pretty hard to have any success. A
    >>>>> user with enough access to do that sort of thing can probably cause a
    >>>>> lot worse problems anyway, but it seems pretty easy to guard against
    >>>>> intentional collisions robustly here, so I think we should.
    >>>>>
    >>>>>
    >>>> Okay, so if we add another replication command ‘JOIN_BACKUP
    >>>> unique_backup_id’ to make workers find the relevant shared state, there
    >>>> won't be any need to change the grammar of any other command. The
    >>>> START_BACKUP can return the unique_backup_id in the result set.
    >>>>
    >>>> I am thinking of the following struct for shared state:
    >>>>
    >>>> typedef struct
    >>>> {
    >>>>     char        backupid[NAMEDATALEN];
    >>>>     XLogRecPtr  startptr;
    >>>>
    >>>>     slock_t     lock;
    >>>>     int64       throttling_counter;
    >>>>     bool        backup_started_in_recovery;
    >>>> } BackupSharedState;
    >>>>>
    >>>> The shared state structure entries would be maintained by a shared hash
    >>>> table. There will be one structure per parallel backup. Since a single
    >>>> parallel backup can engage more than one wal sender, I think
    >>>> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
    >>>> since there will be at least 2 connections per parallel backup?
    >>>> Alternatively, we can set a new GUC that defines the maximum number of
    >>>> concurrent parallel backups, i.e. ‘max_concurent_backups_allowed = 10’
    >>>> perhaps, or we can make it user-configurable.
    >>>>
    >>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
    >>>>
    >>>> Checking for Standby Promotion:
    >>>> At the START_BACKUP command, we initialize
    >>>> BackupSharedState.backup_started_in_recovery
    >>>> and keep checking it whenever send_file () is called to send a new file.
    >>>>
    >>>> Throttling:
    >>>> BackupSharedState.throttling_counter - The throttling logic remains the
    >>>> same
    >>>> as for non-parallel backup with the exception that multiple threads
    >>>> will now be
    >>>> updating it. So in parallel backup, this will represent the overall
    >>>> bytes that
    >>>> have been transferred. So the workers would sleep if they have exceeded
    >>>> the
    >>>> limit. Hence, the shared state carries a lock to safely update the
    >>>> throttling
    >>>> value atomically.
    >>>>
    >>>> Progress Reporting:
    >>>> I think we should add progress-reporting for parallel backup as a
    >>>> separate patch. The relevant entries for progress-reporting, such as
    >>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
    >>>> structure as well.
    >>>>
    >>>>
    >>>> Grammar:
    >>>> There is a change in the resultset being returned for START_BACKUP
    >>>> command;
    >>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
    >>>> command is
    >>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
    >>>> changes
    >>>> to the grammar.
    >>>>
    >>>> START_BACKUP [LABEL '<label>'] [FAST]
    >>>>   - returns startptr, tli, backup_label, unique_backup_id
    >>>> STOP_BACKUP [NOWAIT]
    >>>>   - returns startptr, tli, backup_label
    >>>> JOIN_BACKUP ‘unique_backup_id’
    >>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
    >>>> backend process.
    >>>>
    >>>> LIST_TABLESPACES [PROGRESS]
    >>>> LIST_FILES [TABLESPACE]
    >>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
    >>>>
    >>>>
    >>>> --
    >>>> Asif Rehman
    >>>> Highgo Software (Canada/China/Pakistan)
    >>>> URL : www.highgo.ca
    >>>>
    >>>>
    >>>
    >>> --
    >>> Regards
    >>> ====================================
    >>> Kashif Zeeshan
    >>> Lead Quality Assurance Engineer / Manager
    >>>
    >>> EnterpriseDB Corporation
    >>> The Enterprise Postgres Company
    >>>
    >>>
    >>
    >> --
    >> Regards
    >> ====================================
    >> Kashif Zeeshan
    >> Lead Quality Assurance Engineer / Manager
    >>
    >> EnterpriseDB Corporation
    >> The Enterprise Postgres Company
    >>
    >>
    >
    > --
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
    -- 
    Regards
    ====================================
    Kashif Zeeshan
    Lead Quality Assurance Engineer / Manager
    
    EnterpriseDB Corporation
    The Enterprise Postgres Company
    
  80. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-14T12:33:16Z

    On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
    kashif.zeeshan@enterprisedb.com> wrote:
    
    >
    >
    > On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    >
    >> Hi,
    >>
    >> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
    >>
    >> I have added the shared state as previously described. The new grammar
    >> changes
    >> are as follows:
    >>
    >> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    >>     - This will generate a unique backupid using pg_strong_random(16),
    >>       hex-encode it, and return it in the result set.
    >>     - It will also create a shared state and add it to the hashtable. The
    >>       hash table size is set to BACKUP_HASH_SIZE=10, but since the
    >>       hashtable can expand dynamically, I think it's a sufficient initial
    >>       size. max_wal_senders is not used, because it can be set to quite a
    >>       large value.
    >>
    >> JOIN_BACKUP 'backup_id'
    >>     - finds 'backup_id' in hashtable and attaches it to server process.
    >>
    >>
    >> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    >>     - renamed SEND_FILES to SEND_FILE
    >>     - removed START_WAL_LOCATION from this because 'startptr' is now
    >> accessible through
    >>       shared state.
    >>
    >> There is no change in other commands:
    >> STOP_BACKUP [NOWAIT]
    >> LIST_TABLESPACES [PROGRESS]
    >> LIST_FILES [TABLESPACE]
    >> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>
    >> The current patches (v11) have been rebased to the latest master. The
    >> backup manifest is enabled by default, so I have disabled it for parallel
    >> backup mode and have generated a warning so that the user is aware of it
    >> and does not expect it in the backup.
    >>
    >> Hi Asif
    >
    > I have verified the bug fixes, one bug is fixed and working now as
    > expected
    >
    > For the verification of the other bug fixes faced following issues, please
    > have a look.
    >
    >
    > 1) Following bug fixes mentioned below are generating segmentation fault.
    >
    > Please note for reference I have added a description only as steps were
    > given in previous emails of each bug I tried to verify the fix. Backtrace
    > is also added with each case which points to one bug for both the cases.
    >
    > a) The backup failed with errors "error: could not connect to server:
    > could not look up local user ID 1000: Too many open files" when the
    > max_wal_senders was set to 2000.
    >
    >
    > [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
    >  /home/edb/Desktop/backup/
    > pg_basebackup: warning: backup manifest is disabled in parallel backup mode
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: created temporary replication slot "pg_basebackup_9925"
    > pg_basebackup: backup worker (0) created
    > pg_basebackup: backup worker (1) created
    > pg_basebackup: backup worker (2) created
    > pg_basebackup: backup worker (3) created
    > ….
    > ….
    > pg_basebackup: backup worker (1014) created
    > pg_basebackup: backup worker (1015) created
    > pg_basebackup: backup worker (1016) created
    > pg_basebackup: backup worker (1017) created
    > pg_basebackup: error: could not connect to server: could not look up local
    > user ID 1000: Too many open files
    > Segmentation fault
    > [edb@localhost bin]$
    >
    >
    > [edb@localhost bin]$
    > [edb@localhost bin]$ gdb pg_basebackup
    > /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
    > GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    > Copyright (C) 2013 Free Software Foundation, Inc.
    > License GPLv3+: GNU GPL version 3 or later <
    > http://gnu.org/licenses/gpl.html>
    > This is free software: you are free to change and redistribute it.
    > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    > and "show warranty" for details.
    > This GDB was configured as "x86_64-redhat-linux-gnu".
    > For bug reporting instructions, please see:
    > <http://www.gnu.org/software/gdb/bugs/>...
    > Reading symbols from
    > /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    > [New LWP 13219]
    > [New LWP 13222]
    > [Thread debugging using libthread_db enabled]
    > Using host libthread_db library "/lib64/libthread_db.so.1".
    > Core was generated by `./pg_basebackup -v -j 1990 -D
    > /home/edb/Desktop/backup/'.
    > Program terminated with signal 11, Segmentation fault.
    > #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    > 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    > (gdb) bt
    > #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    > #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    > #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    > #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
    > listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    > at exit.c:77
    > #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
    > #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at
    > pg_basebackup.c:2811
    > #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    > #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
    > pg_basebackup.c:2765
    > (gdb)
    >
    >
    >
    >
    > b) When two backups are executed at the same time, a FATAL error is
    > raised due to max_wal_senders, and instead of exiting, the backup completed.
    >
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    > [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D
    >  /home/edb/Desktop/backup1/
    > pg_basebackup: warning: backup manifest is disabled in parallel backup mode
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: created temporary replication slot "pg_basebackup_17066"
    > pg_basebackup: backup worker (0) created
    > pg_basebackup: backup worker (1) created
    > pg_basebackup: backup worker (2) created
    > pg_basebackup: backup worker (3) created
    > pg_basebackup: backup worker (4) created
    > pg_basebackup: backup worker (5) created
    > pg_basebackup: backup worker (6) created
    > pg_basebackup: error: could not connect to server: FATAL:  number of
    > requested standby connections exceeds max_wal_senders (currently 10)
    > Segmentation fault (core dumped)
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    > [edb@localhost bin]$ gdb pg_basebackup
    > /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
    > GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    > Copyright (C) 2013 Free Software Foundation, Inc.
    > License GPLv3+: GNU GPL version 3 or later <
    > http://gnu.org/licenses/gpl.html>
    > This is free software: you are free to change and redistribute it.
    > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    > and "show warranty" for details.
    > This GDB was configured as "x86_64-redhat-linux-gnu".
    > For bug reporting instructions, please see:
    > <http://www.gnu.org/software/gdb/bugs/>...
    > Reading symbols from
    > /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    > [New LWP 17041]
    > [New LWP 17067]
    > [Thread debugging using libthread_db enabled]
    > Using host libthread_db library "/lib64/libthread_db.so.1".
    > Core was generated by `./pg_basebackup -v -j 8 -D
    > /home/edb/Desktop/backup1/'.
    > Program terminated with signal 11, Segmentation fault.
    > #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    > 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    > (gdb) bt
    > #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    > #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    > #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    > #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
    > listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    > at exit.c:77
    > #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
    > #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
    > at pg_basebackup.c:2811
    > #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    > #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
    > pg_basebackup.c:2765
    > (gdb)
    >
    >
    >
    >
    > 2) The following bug is not fixed yet
    >
    > A similar case is when DB Server is shut down while the Parallel Backup is
    > in progress then the correct error is displayed but then the backup folder
    > is not cleaned and leaves a corrupt backup.
    >
    > [edb@localhost bin]$
    > [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
    > pg_basebackup: warning: backup manifest is disabled in parallel backup mode
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: created temporary replication slot "pg_basebackup_16235"
    > pg_basebackup: backup worker (0) created
    > pg_basebackup: backup worker (1) created
    > pg_basebackup: backup worker (2) created
    > pg_basebackup: backup worker (3) created
    > pg_basebackup: backup worker (4) created
    > pg_basebackup: backup worker (5) created
    > pg_basebackup: backup worker (6) created
    > pg_basebackup: backup worker (7) created
    > pg_basebackup: error: could not read COPY data: server closed the
    > connection unexpectedly
    > This probably means the server terminated abnormally
    > before or while processing the request.
    > pg_basebackup: error: could not read COPY data: server closed the
    > connection unexpectedly
    > This probably means the server terminated abnormally
    > before or while processing the request.
    > pg_basebackup: removing contents of data directory
    > "/home/edb/Desktop/backup/"
    > pg_basebackup: error: could not read COPY data: server closed the
    > connection unexpectedly
    > This probably means the server terminated abnormally
    > before or while processing the request.
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    >
    >
    >
    > [edb@localhost bin]$
    > [edb@localhost bin]$ ls /home/edb/Desktop/backup
    > base         pg_hba.conf    pg_logical    pg_notify    pg_serial
    > pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
    > pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
    >  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    >
    >
    >
    >
    > Thanks
    > Kashif Zeeshan
    >
    >>
    >>
    >> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
    >> kashif.zeeshan@enterprisedb.com> wrote:
    >>
    >>>
    >>>
    >>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
    >>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>
    >>>> Hi Asif
    >>>>
    >>>> When a non-existent slot is used with tablespace then correct error is
    >>>> displayed but then the backup folder is not cleaned and leaves a corrupt
    >>>> backup.
    >>>>
    >>>> Steps
    >>>> =======
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
    >>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
    >>>> [edb@localhost bin]$
    >>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
    >>>> CREATE TABLESPACE
    >>>> postgres=#
    >>>> postgres=# create table t1 (a int) tablespace tbl1;
    >>>> CREATE TABLE
    >>>> postgres=# insert into t1 values(100);
    >>>> INSERT 0 1
    >>>> postgres=# insert into t1 values(200);
    >>>> INSERT 0 1
    >>>> postgres=# insert into t1 values(300);
    >>>> INSERT 0 1
    >>>> postgres=#
    >>>>
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
    >>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
    >>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>> complete
    >>>> pg_basebackup: checkpoint completed
    >>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
    >>>> pg_basebackup: starting background WAL receiver
    >>>> pg_basebackup: error: could not send replication command
    >>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
    >>>> pg_basebackup: backup worker (0) created
    >>>> pg_basebackup: backup worker (1) created
    >>>> pg_basebackup: write-ahead log end point: 0/2E000100
    >>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>> pg_basebackup: error: child thread exited with error 1
    >>>> [edb@localhost bin]$
    >>>>
    >>>> backup folder not cleaned
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
    >>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
    >>>>             postgresql.conf
    >>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
    >>>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
    >>>>  postgresql.auto.conf
    >>>> [edb@localhost bin]$
    >>>>
    >>>>
    >>>>
    >>>>
    >>>> If the same case is executed without the parallel backup patch then the
    >>>> backup folder is cleaned after the error is displayed.
    >>>>
    >>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>>> -T /home/edb/tbl1=/home/edb/tbl_res -S test999
    >>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>> complete
    >>>> pg_basebackup: checkpoint completed
    >>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
    >>>> pg_basebackup: starting background WAL receiver
    >>>> pg_basebackup: error: could not send replication command
    >>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
    >>>> pg_basebackup: write-ahead log end point: 0/2B000100
    >>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>> pg_basebackup: error: child process exited with exit code 1
    >>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
    >>>> pg_basebackup: changes to tablespace directories will not be undone
    >>>>
    >>>
    >>>
    >>> Hi Asif
    >>>
    >>> A similar case is when DB Server is shut down while the Parallel Backup
    >>> is in progress then the correct error is displayed but then the backup
    >>> folder is not cleaned and leaves a corrupt backup. I think one bug fix will
    >>> solve all these cases where clean up is not done when parallel backup is
    >>> failed.
    >>>
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>> -j 8
    >>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >>> pg_basebackup: checkpoint completed
    >>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
    >>> pg_basebackup: starting background WAL receiver
    >>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
    >>> pg_basebackup: backup worker (0) created
    >>> pg_basebackup: backup worker (1) created
    >>> pg_basebackup: backup worker (2) created
    >>> pg_basebackup: backup worker (3) created
    >>> pg_basebackup: backup worker (4) created
    >>> pg_basebackup: backup worker (5) created
    >>> pg_basebackup: backup worker (6) created
    >>> pg_basebackup: backup worker (7) created
    >>> pg_basebackup: error: could not read COPY data: server closed the
    >>> connection unexpectedly
    >>> This probably means the server terminated abnormally
    >>> before or while processing the request.
    >>> pg_basebackup: error: could not read COPY data: server closed the
    >>> connection unexpectedly
    >>> This probably means the server terminated abnormally
    >>> before or while processing the request.
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$
    >>>
    >>> Same case when executed on pg_basebackup without the Parallel backup
    >>> patch then proper clean up is done.
    >>>
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >>> pg_basebackup: checkpoint completed
    >>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
    >>> pg_basebackup: starting background WAL receiver
    >>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
    >>> pg_basebackup: error: could not read COPY data: server closed the
    >>> connection unexpectedly
    >>> This probably means the server terminated abnormally
    >>> before or while processing the request.
    >>> pg_basebackup: removing contents of data directory
    >>> "/home/edb/Desktop/backup/"
    >>> [edb@localhost bin]$
    >>>
    >>> Thanks
    >>>
    >>>
    >>>>
    >>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
    >>>> wrote:
    >>>>
    >>>>>
    >>>>>
    >>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
    >>>>> wrote:
    >>>>>
    >>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
    >>>>>> wrote:
    >>>>>> >> Why would you need to do that? As long as the process where
    >>>>>> >> STOP_BACKUP can do the check, that seems good enough.
    >>>>>> >
    >>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
    >>>>>> not while the backup is
    >>>>>> > in progress. So if the backup is a large one, early error detection
    >>>>>> would be much beneficial.
    >>>>>> > This is the current behavior of non-parallel backup as well.
    >>>>>>
    >>>>>> Because non-parallel backup does not feature early detection of this
    >>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
    >>>>>> is undesirable. If you want to fix that problem, do it on a separate
    >>>>>> thread in a separate patch. A patch proposing to make parallel backup
    >>>>>> inconsistent in behavior with non-parallel backup will be rejected, at
    >>>>>> least if I have anything to say about it.
    >>>>>>
    >>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
    >>>>>> current situation is not great, but promotions ought to be relatively
    >>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
    >>>>>> also worth considering whether the right fix is to figure out how to
    >>>>>> make that case actually work, rather than just making it fail quicker.
    >>>>>> I don't currently understand the reason for the prohibition so I can't
    >>>>>> express an intelligent opinion on what the right answer is here, but
    >>>>>> it seems like it ought to be investigated before somebody goes and
    >>>>>> builds a bunch of infrastructure to make the error more timely.
    >>>>>>
    >>>>>
    >>>>> Non-parallel backup already does the early error checking. I only
    >>>>> intended to make parallel behave the same as non-parallel here. So, I
    >>>>> agree with you that the behavior of parallel backup should be consistent
    >>>>> with the non-parallel one. Please see the code snippet below from
    >>>>> basebackup.c:sendDir()
    >>>>>
    >>>>>     /*
    >>>>>      * Check if the postmaster has signaled us to exit, and abort with an
    >>>>>      * error in that case. The error handler further up will call
    >>>>>      * do_pg_abort_backup() for us. Also check that if the backup was
    >>>>>      * started while still in recovery, the server wasn't promoted.
    >>>>>      * do_pg_stop_backup() will check that too, but it's better to stop
    >>>>>      * the backup early than continue to the end and fail there.
    >>>>>      */
    >>>>>     CHECK_FOR_INTERRUPTS();
    >>>>>     if (RecoveryInProgress() != backup_started_in_recovery)
    >>>>>         ereport(ERROR,
    >>>>>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    >>>>>                  errmsg("the standby was promoted during online backup"),
    >>>>>                  errhint("This means that the backup being taken is corrupt "
    >>>>>                          "and should not be used. "
    >>>>>                          "Try taking another online backup.")));
    >>>>>>
    >>>>>>
    >>>>>> > Okay, then I will add the shared state. And since we are adding the
    >>>>>> shared state, we can use
    >>>>>> > that for throttling, progress-reporting and standby early error
    >>>>>> checking.
    >>>>>>
    >>>>>> Please propose a grammar here for all the new replication commands you
    >>>>>> plan to add before going and implement everything. That will make it
    >>>>>> easier to hash out the design without forcing you to keep changing the
    >>>>>> code. Your design should include a sketch of how several sets of
    >>>>>> coordinating backends taking several concurrent parallel backups will
    >>>>>> end up with one shared state per parallel backup.
    >>>>>>
    >>>>>> > There are two possible options:
    >>>>>> >
    >>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
    >>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
    >>>>>> >
    >>>>>> > This BackupID should be given back as a response to start backup
    >>>>>> command. All client workers
    >>>>>> > must append this ID to all parallel backup replication commands. So
    >>>>>> that we can use this identifier
    >>>>>> > to search for that particular backup. Does that sound good?
    >>>>>>
    >>>>>> Using the WAL start location as the backup ID seems like it might be
    >>>>>> problematic -- could a single checkpoint not end up as the start
    >>>>>> location for multiple backups started at the same time? Whether that's
    >>>>>> possible now or not, it seems unwise to hard-wire that assumption into
    >>>>>> the wire protocol.
    >>>>>>
    >>>>>> I was thinking that perhaps the client should generate a unique backup
    >>>>>> ID, e.g. leader does:
    >>>>>>
    >>>>>> START_BACKUP unique_backup_id [options]...
    >>>>>>
    >>>>>> And then others do:
    >>>>>>
    >>>>>> JOIN_BACKUP unique_backup_id
    >>>>>>
    >>>>>> My thought is that you will have a number of shared memory structure
    >>>>>> equal to max_wal_senders, each one large enough to hold the shared
    >>>>>> state for one backup. The shared state will include
    >>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
    >>>>>> ID. START_BACKUP would allocate one and copy the name into it;
    >>>>>> JOIN_BACKUP would search for one by name.
    >>>>>>
    >>>>>> If you want to generate the name on the server side, then I suppose
    >>>>>> START_BACKUP would return a result set that includes the backup ID,
    >>>>>> and clients would have to specify that same backup ID when invoking
    >>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
    >>>>>> better. Either way, the backup ID should be something long and hard to
    >>>>>> guess, not e.g. the leader processes' PID. I think we should generate
    >>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
    >>>>>> result to get a string. That way there's almost no risk of two backup
    >>>>>> IDs colliding accidentally, and even if we somehow had a malicious
    >>>>>> user trying to screw up somebody else's parallel backup by choosing a
    >>>>>> colliding backup ID, it would be pretty hard to have any success. A
    >>>>>> user with enough access to do that sort of thing can probably cause a
    >>>>>> lot worse problems anyway, but it seems pretty easy to guard against
    >>>>>> intentional collisions robustly here, so I think we should.
    >>>>>>
    >>>>>>
    >>>>> Okay, so if we add another replication command ‘JOIN_BACKUP
    >>>>> unique_backup_id’ to let workers find the relevant shared state, there
    >>>>> won't be any need to change the grammar for any other command. The
    >>>>> START_BACKUP command can return the unique_backup_id in the result set.
    >>>>>
    >>>>> I am thinking of the following struct for shared state:
    >>>>>
    >>>>>     typedef struct
    >>>>>     {
    >>>>>         char        backupid[NAMEDATALEN];
    >>>>>         XLogRecPtr  startptr;
    >>>>>
    >>>>>         slock_t     lock;
    >>>>>         int64       throttling_counter;
    >>>>>         bool        backup_started_in_recovery;
    >>>>>     } BackupSharedState;
    >>>>>>
    >>>>>>
    >>>>> The shared state structure entries would be maintained by a shared
    >>>>> hash table. There will be one structure per parallel backup. Since a
    >>>>> single parallel backup can engage more than one WAL sender, I think
    >>>>> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
    >>>>> since there will be at least 2 connections per parallel backup?
    >>>>> Alternatively, we can add a new GUC that defines the maximum number of
    >>>>> concurrent parallel backups, i.e. ‘max_concurrent_backups_allowed = 10’
    >>>>> perhaps, or we can make it user-configurable.
    >>>>>
    >>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
    >>>>>
    >>>>> Checking for Standby Promotion:
    >>>>> At the START_BACKUP command, we initialize
    >>>>> BackupSharedState.backup_started_in_recovery and keep checking it
    >>>>> whenever send_file() is called to send a new file.
    >>>>>
    >>>>> Throttling:
    >>>>> BackupSharedState.throttling_counter - The throttling logic remains the
    >>>>> same as for non-parallel backup, with the exception that multiple
    >>>>> workers will now be updating it. So in parallel backup, this will
    >>>>> represent the overall bytes that have been transferred, and the workers
    >>>>> will sleep if they have exceeded the limit. Hence, the shared state
    >>>>> carries a lock so that the throttling value can be updated safely.
    >>>>>
    >>>>> Progress Reporting:
    >>>>> I think we should add progress-reporting for parallel backup as a
    >>>>> separate patch. The relevant entries for progress-reporting, such as
    >>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
    >>>>> structure as well.
    >>>>>
    >>>>>
    >>>>> Grammar:
    >>>>> There is a change in the result set being returned for the START_BACKUP
    >>>>> command; unique_backup_id is added. Additionally, the JOIN_BACKUP
    >>>>> replication command is added. SEND_FILES has been renamed to SEND_FILE.
    >>>>> There are no other changes to the grammar.
    >>>>>
    >>>>> START_BACKUP [LABEL '<label>'] [FAST]
    >>>>>   - returns startptr, tli, backup_label, unique_backup_id
    >>>>> STOP_BACKUP [NOWAIT]
    >>>>>   - returns startptr, tli, backup_label
    >>>>> JOIN_BACKUP ‘unique_backup_id’
    >>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
    >>>>> backend process.
    >>>>>
    >>>>> LIST_TABLESPACES [PROGRESS]
    >>>>> LIST_FILES [TABLESPACE]
    >>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
    >>>>>
    >>>>>
    >
    
    Hi,
    
    Rebased and updated to the current master (8128b0c1). v13 is attached.
    
    - Fixes the above reported issues.
    
    - Added progress-reporting support for parallel backup:
    For this, 'backup_streamed' is moved to a shared structure (BackupState) as
    a pg_atomic_uint64 variable. The worker processes keep incrementing this
    variable.
    
    While files are being transferred from server to client, the main process
    remains idle. So after each increment, the worker process signals the main
    process to update the stats in the pg_stat_progress_basebackup view.
    
    The 'tablespaces_streamed' column is not updated and will remain empty. This
    is because multiple workers may be copying files from different tablespaces.
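    
    A minimal sketch of the shared-counter idea described above, not the patch
    code. It assumes C11 atomics and threads as stand-ins for pg_atomic_uint64
    and the worker processes; DemoBackupState, demo_worker and demo_run are
    hypothetical names, and the signalling of the main process is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_MAX_WORKERS 16

/* Hypothetical stand-in for the shared BackupState mentioned in the mail. */
typedef struct
{
    _Atomic uint64_t backup_streamed;   /* total bytes sent by all workers */
} DemoBackupState;

typedef struct
{
    DemoBackupState *state;
    uint64_t    chunk_bytes;    /* bytes "sent" per step */
    int         nchunks;        /* number of steps per worker */
} DemoWorkerArgs;

/* Each worker adds the size of every chunk it sends to the shared counter. */
static void *
demo_worker(void *arg)
{
    DemoWorkerArgs *w = arg;

    for (int i = 0; i < w->nchunks; i++)
        atomic_fetch_add(&w->state->backup_streamed, w->chunk_bytes);
    return NULL;
}

/* Run nworkers workers to completion and return the final byte count. */
static uint64_t
demo_run(int nworkers, int nchunks, uint64_t chunk_bytes)
{
    DemoBackupState state;
    DemoWorkerArgs args = {&state, chunk_bytes, nchunks};
    pthread_t   tids[DEMO_MAX_WORKERS];
    int         started = 0;

    assert(nworkers <= DEMO_MAX_WORKERS);
    atomic_init(&state.backup_streamed, 0);

    for (int i = 0; i < nworkers; i++)
        if (pthread_create(&tids[started], NULL, demo_worker, &args) == 0)
            started++;

    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return atomic_load(&state.backup_streamed);
}
```

    Because every update is a single atomic add, no lock is needed for the
    counter itself; a reader such as the main process only needs an atomic load.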
    
    
    - Added backup manifest:
    Each backend worker maintains its own manifest file, which contains the list
    of files transferred by that worker. Once all backup files are transferred,
    the workers will create a temp file
    ('pg_tempdir/temp_file_prefix_backupid.workerid') and write the content of
    the manifest to it from the BufFile. The workers add neither the header nor
    the WAL information to their manifests. These two will be added by the main
    process while merging all worker manifest files.
    
    The main process will read these individual files and concatenate them into
    a single file, which is then sent back to the client.
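    
    The merge step can be pictured roughly as follows. This is only a sketch
    using plain stdio rather than the server's BufFile API; the function names,
    the header/trailer strings and the file paths are all hypothetical, and the
    real manifest format is not reproduced here.

```c
#include <stdio.h>

/* Append the whole content of in_path to the already-open stream out. */
static int
demo_append_file(FILE *out, const char *in_path)
{
    char        buf[8192];
    size_t      n;
    int         rc = 0;
    FILE       *in = fopen(in_path, "rb");

    if (in == NULL)
        return -1;              /* a missing worker file fails the merge */

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    return rc;
}

/*
 * Write header, then each worker's partial manifest in order, then trailer.
 * Returns 0 on success and -1 on any I/O error.
 */
static int
demo_merge_manifests(const char *out_path,
                     const char *header, const char *trailer,
                     const char *const *worker_paths, int nworkers)
{
    int         rc = 0;
    FILE       *out = fopen(out_path, "wb");

    if (out == NULL)
        return -1;

    if (fputs(header, out) == EOF)
        rc = -1;
    for (int i = 0; rc == 0 && i < nworkers; i++)
        rc = demo_append_file(out, worker_paths[i]);
    if (rc == 0 && fputs(trailer, out) == EOF)
        rc = -1;
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

    Only the merging process writes the header and trailer here, which mirrors
    the split described above where workers emit just their own file entries.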
    
    The manifest file is created when the following command is received:
    
    >     BUILD_MANIFEST 'backupid'
    
    
    This is a new replication command. It is sent when pg_basebackup has copied
    all the $PGDATA files, including WAL files.
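    
    On the segmentation fault quoted above: both backtraces show
    pthread_join(threadid=0) being reached from cleanup_workers() through the
    atexit handler after worker creation had failed part-way. One way to guard
    against that class of crash is to remember which workers were really
    started and join only those. The sketch below illustrates the idea; it is
    not the change made in v13, and every name in it is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define DEMO_MAX_WORKERS 8

typedef struct
{
    pthread_t   tid;
    bool        started;        /* true only if pthread_create() succeeded */
} DemoWorker;

static DemoWorker demo_workers[DEMO_MAX_WORKERS];

static void *
demo_worker_main(void *arg)
{
    (void) arg;
    return NULL;
}

/*
 * Try to start nworkers threads. fail_at simulates a connection failure at
 * that index (pass -1 for none). Returns the number of threads started.
 */
static int
demo_create_workers(int nworkers, int fail_at)
{
    int         created = 0;

    assert(nworkers <= DEMO_MAX_WORKERS);
    for (int i = 0; i < nworkers; i++)
    {
        if (i == fail_at)
            break;              /* simulated "could not connect to server" */
        if (pthread_create(&demo_workers[i].tid, NULL,
                           demo_worker_main, NULL) != 0)
            break;
        demo_workers[i].started = true;
        created++;
    }
    return created;
}

/*
 * Join only the workers that were started. Safe to call from an exit
 * handler and safe to call more than once. Returns the number joined.
 */
static int
demo_cleanup_workers(void)
{
    int         joined = 0;

    for (int i = 0; i < DEMO_MAX_WORKERS; i++)
    {
        if (!demo_workers[i].started)
            continue;           /* never created: nothing to join */
        pthread_join(demo_workers[i].tid, NULL);
        demo_workers[i].started = false;
        joined++;
    }
    return joined;
}
```

    With this shape, a failure after three of eight workers leaves exactly
    three threads to join, and a second cleanup call finds nothing to do.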
    
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  81. Re: WIP/PoC for parallel backup

    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-14T13:32:40Z

    Hi Asif
    
    I am getting the following error with parallel backup when the
    --no-manifest option is used.
    
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$  ./pg_basebackup -v -j 5  -D
     /home/edb/Desktop/backup/ --no-manifest
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_10223"
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: backup worker (2) created
    pg_basebackup: backup worker (3) created
    pg_basebackup: backup worker (4) created
    pg_basebackup: write-ahead log end point: 0/2000100
    pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR:
     could not open file
    "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file
    or directory
    pg_basebackup: removing contents of data directory
    "/home/edb/Desktop/backup/"
    [edb@localhost bin]$
    
    Thanks
    
    On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
    > kashif.zeeshan@enterprisedb.com> wrote:
    >
    >>
    >>
    >> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >>
    >>> Hi,
    >>>
    >>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
    >>>
    >>> I have added the shared state as previously described. The new grammar
    >>> changes
    >>> are as follows:
    >>>
    >>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    >>>     - This will generate a unique backupid using pg_strong_random(16)
    >>>       and hex-encode it; the backupid is then returned in the result set.
    >>>     - It will also create a shared state and add it to the hashtable.
    >>>       The hash table size is set to BACKUP_HASH_SIZE=10, but since the
    >>>       hashtable can expand dynamically, I think that is a sufficient
    >>>       initial size. max_wal_senders is not used, because it can be set
    >>>       to quite a large value.
    >>>
    >>> JOIN_BACKUP 'backup_id'
    >>>     - finds 'backup_id' in hashtable and attaches it to server process.
    >>>
    >>>
    >>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    >>>     - renamed SEND_FILES to SEND_FILE
    >>>     - removed START_WAL_LOCATION from this because 'startptr' is now
    >>>       accessible through shared state.
    >>>
    >>> There is no change in other commands:
    >>> STOP_BACKUP [NOWAIT]
    >>> LIST_TABLESPACES [PROGRESS]
    >>> LIST_FILES [TABLESPACE]
    >>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>
    >>> The current patches (v11) have been rebased to the latest master. The
    >>> backup manifest is enabled by default, so I have disabled it for parallel
    >>> backup mode and added a warning so that the user is aware of this and
    >>> does not expect a manifest in the backup.
    >>>
    >>> Hi Asif
    >>
    >> I have verified the bug fixes. One bug is fixed and now works as
    >> expected.
    >>
    >> While verifying the other bug fixes I hit the following issues; please
    >> have a look.
    >>
    >>
    >> 1) The bug fixes mentioned below now produce a segmentation fault.
    >>
    >> Please note that I have added only a description of each bug here, since
    >> the steps were given in the previous emails. A backtrace is included for
    >> each case; both backtraces point to the same bug.
    >>
    >> a) The backup failed with errors "error: could not connect to server:
    >> could not look up local user ID 1000: Too many open files" when the
    >> max_wal_senders was set to 2000.
    >>
    >>
    >> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
    >>  /home/edb/Desktop/backup/
    >> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >> mode
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
    >> pg_basebackup: backup worker (0) created
    >> pg_basebackup: backup worker (1) created
    >> pg_basebackup: backup worker (2) created
    >> pg_basebackup: backup worker (3) created
    >> ….
    >> ….
    >> pg_basebackup: backup worker (1014) created
    >> pg_basebackup: backup worker (1015) created
    >> pg_basebackup: backup worker (1016) created
    >> pg_basebackup: backup worker (1017) created
    >> pg_basebackup: error: could not connect to server: could not look up
    >> local user ID 1000: Too many open files
    >> Segmentation fault
    >> [edb@localhost bin]$
    >>
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ gdb pg_basebackup
    >> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
    >> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    >> Copyright (C) 2013 Free Software Foundation, Inc.
    >> License GPLv3+: GNU GPL version 3 or later <
    >> http://gnu.org/licenses/gpl.html>
    >> This is free software: you are free to change and redistribute it.
    >> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    >> and "show warranty" for details.
    >> This GDB was configured as "x86_64-redhat-linux-gnu".
    >> For bug reporting instructions, please see:
    >> <http://www.gnu.org/software/gdb/bugs/>...
    >> Reading symbols from
    >> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    >> [New LWP 13219]
    >> [New LWP 13222]
    >> [Thread debugging using libthread_db enabled]
    >> Using host libthread_db library "/lib64/libthread_db.so.1".
    >> Core was generated by `./pg_basebackup -v -j 1990 -D
    >> /home/edb/Desktop/backup/'.
    >> Program terminated with signal 11, Segmentation fault.
    >> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    >> (gdb) bt
    >> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    >> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    >> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
    >> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    >> at exit.c:77
    >> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
    >> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0)
    >> at pg_basebackup.c:2811
    >> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    >> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
    >> pg_basebackup.c:2765
    >> (gdb)
    >>
    >>
    >>
    >>
    >> b) When executing two backups at the same time, a FATAL error is raised
    >> due to max_wal_senders, but instead of exiting, the backup completed.
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D
    >>  /home/edb/Desktop/backup1/
    >> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >> mode
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
    >> pg_basebackup: backup worker (0) created
    >> pg_basebackup: backup worker (1) created
    >> pg_basebackup: backup worker (2) created
    >> pg_basebackup: backup worker (3) created
    >> pg_basebackup: backup worker (4) created
    >> pg_basebackup: backup worker (5) created
    >> pg_basebackup: backup worker (6) created
    >> pg_basebackup: error: could not connect to server: FATAL:  number of
    >> requested standby connections exceeds max_wal_senders (currently 10)
    >> Segmentation fault (core dumped)
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ gdb pg_basebackup
    >> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
    >> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    >> Copyright (C) 2013 Free Software Foundation, Inc.
    >> License GPLv3+: GNU GPL version 3 or later <
    >> http://gnu.org/licenses/gpl.html>
    >> This is free software: you are free to change and redistribute it.
    >> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    >> and "show warranty" for details.
    >> This GDB was configured as "x86_64-redhat-linux-gnu".
    >> For bug reporting instructions, please see:
    >> <http://www.gnu.org/software/gdb/bugs/>...
    >> Reading symbols from
    >> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    >> [New LWP 17041]
    >> [New LWP 17067]
    >> [Thread debugging using libthread_db enabled]
    >> Using host libthread_db library "/lib64/libthread_db.so.1".
    >> Core was generated by `./pg_basebackup -v -j 8 -D
    >> /home/edb/Desktop/backup1/'.
    >> Program terminated with signal 11, Segmentation fault.
    >> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    >> (gdb) bt
    >> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    >> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    >> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
    >> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    >> at exit.c:77
    >> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
    >> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
    >> at pg_basebackup.c:2811
    >> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    >> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
    >> pg_basebackup.c:2765
    >> (gdb)
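
Both backtraces above end in pthread_join() being called with threadid=0 from cleanup_workers(), which suggests the exit handler is joining worker slots whose threads were never created. One way to guard against that is sketched below. This is only an illustration: WorkerSlot, its started flag, start_workers(), and the fail_at parameter are hypothetical names and are not taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker bookkeeping; the patch's real structure may differ. */
typedef struct WorkerSlot
{
    pthread_t   thread;
    bool        started;    /* true only after pthread_create() succeeded */
} WorkerSlot;

static void *
worker_main(void *arg)
{
    (void) arg;             /* a real worker would fetch files here */
    return NULL;
}

/*
 * Start up to nworkers threads and return how many were started.
 * fail_at simulates a connection failure at that worker index
 * (pass -1 for no simulated failure).
 */
int
start_workers(WorkerSlot *slots, int nworkers, int fail_at)
{
    int         nstarted = 0;

    assert(slots != NULL && nworkers >= 0);

    /* Mark every slot as not started before creating any thread. */
    for (int i = 0; i < nworkers; i++)
        slots[i].started = false;

    for (int i = 0; i < nworkers; i++)
    {
        if (i == fail_at)
            break;
        if (pthread_create(&slots[i].thread, NULL, worker_main, NULL) != 0)
            break;
        slots[i].started = true;
        nstarted++;
    }
    return nstarted;
}

/* Join only the threads that were actually created; safe to call twice. */
int
cleanup_workers(WorkerSlot *slots, int nworkers)
{
    int         njoined = 0;

    assert(slots != NULL && nworkers >= 0);

    for (int i = 0; i < nworkers; i++)
    {
        if (!slots[i].started)
            continue;
        if (pthread_join(slots[i].thread, NULL) == 0)
            njoined++;
        slots[i].started = false;
    }
    return njoined;
}
```

With a flag like this, an exit handler that runs after a failed connection attempt skips the slots that never got a thread instead of passing an unset pthread_t to pthread_join().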
    >>
    >>
    >>
    >>
    >> 2) The following bug is not fixed yet
    >>
    >> A similar case: when the DB server is shut down while a parallel backup
    >> is in progress, the correct error is displayed, but the backup folder is
    >> not cleaned up, leaving a corrupt backup.
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j
    >> 8
    >> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >> mode
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
    >> pg_basebackup: backup worker (0) created
    >> pg_basebackup: backup worker (1) created
    >> pg_basebackup: backup worker (2) created
    >> pg_basebackup: backup worker (3) created
    >> pg_basebackup: backup worker (4) created
    >> pg_basebackup: backup worker (5) created
    >> pg_basebackup: backup worker (6) created
    >> pg_basebackup: backup worker (7) created
    >> pg_basebackup: error: could not read COPY data: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> pg_basebackup: error: could not read COPY data: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> pg_basebackup: removing contents of data directory
    >> "/home/edb/Desktop/backup/"
    >> pg_basebackup: error: could not read COPY data: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >>
    >>
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >> base         pg_hba.conf    pg_logical    pg_notify    pg_serial
    >> pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
    >> pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
    >>  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >>
    >>
    >>
    >>
    >> Thanks
    >> Kashif Zeeshan
    >>
    >>>
    >>>
    >>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
    >>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>
    >>>>
    >>>>
    >>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
    >>>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>>
    >>>>> Hi Asif
    >>>>>
    >>>>> When a non-existent slot is used with a tablespace, the correct error
    >>>>> is displayed, but the backup folder is not cleaned up, leaving a
    >>>>> corrupt backup.
    >>>>>
    >>>>> Steps
    >>>>> =======
    >>>>>
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
    >>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
    >>>>> [edb@localhost bin]$
    >>>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
    >>>>> CREATE TABLESPACE
    >>>>> postgres=#
    >>>>> postgres=# create table t1 (a int) tablespace tbl1;
    >>>>> CREATE TABLE
    >>>>> postgres=# insert into t1 values(100);
    >>>>> INSERT 0 1
    >>>>> postgres=# insert into t1 values(200);
    >>>>> INSERT 0 1
    >>>>> postgres=# insert into t1 values(300);
    >>>>> INSERT 0 1
    >>>>> postgres=#
    >>>>>
    >>>>>
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
    >>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
    >>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>> complete
    >>>>> pg_basebackup: checkpoint completed
    >>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
    >>>>> pg_basebackup: starting background WAL receiver
    >>>>> pg_basebackup: error: could not send replication command
    >>>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
    >>>>> pg_basebackup: backup worker (0) created
    >>>>> pg_basebackup: backup worker (1) created
    >>>>> pg_basebackup: write-ahead log end point: 0/2E000100
    >>>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>>> pg_basebackup: error: child thread exited with error 1
    >>>>> [edb@localhost bin]$
    >>>>>
    >>>>> backup folder not cleaned
    >>>>>
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
    >>>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
    >>>>>             postgresql.conf
    >>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
    >>>>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
    >>>>>  postgresql.auto.conf
    >>>>> [edb@localhost bin]$
    >>>>>
    >>>>>
    >>>>>
    >>>>>
    >>>>> If the same case is executed without the parallel backup patch then
    >>>>> the backup folder is cleaned after the error is displayed.
    >>>>>
    >>>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>>>> -T /home/edb/tbl1=/home/edb/tbl_res -S test999
    >>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>> complete
    >>>>> pg_basebackup: checkpoint completed
    >>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
    >>>>> pg_basebackup: starting background WAL receiver
    >>>>> pg_basebackup: error: could not send replication command
    >>>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
    >>>>> pg_basebackup: write-ahead log end point: 0/2B000100
    >>>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>>> pg_basebackup: error: child process exited with exit code 1
    >>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
    >>>>> pg_basebackup: changes to tablespace directories will not be undone
    >>>>>
    >>>>
    >>>>
    >>>> Hi Asif
    >>>>
    >>>> A similar case: when the DB server is shut down while a parallel backup
    >>>> is in progress, the correct error is displayed, but the backup folder is
    >>>> not cleaned up, leaving a corrupt backup. I think one bug fix will solve
    >>>> all these cases where cleanup is not done when a parallel backup fails.
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>>> -j 8
    >>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>> complete
    >>>> pg_basebackup: checkpoint completed
    >>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
    >>>> pg_basebackup: starting background WAL receiver
    >>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
    >>>> pg_basebackup: backup worker (0) created
    >>>> pg_basebackup: backup worker (1) created
    >>>> pg_basebackup: backup worker (2) created
    >>>> pg_basebackup: backup worker (3) created
    >>>> pg_basebackup: backup worker (4) created
    >>>> pg_basebackup: backup worker (5) created
    >>>> pg_basebackup: backup worker (6) created
    >>>> pg_basebackup: backup worker (7) created
    >>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>> connection unexpectedly
    >>>> This probably means the server terminated abnormally
    >>>> before or while processing the request.
    >>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>> connection unexpectedly
    >>>> This probably means the server terminated abnormally
    >>>> before or while processing the request.
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>>
    >>>> When the same case is executed with pg_basebackup without the parallel
    >>>> backup patch, proper cleanup is done.
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>> complete
    >>>> pg_basebackup: checkpoint completed
    >>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
    >>>> pg_basebackup: starting background WAL receiver
    >>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
    >>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>> connection unexpectedly
    >>>> This probably means the server terminated abnormally
    >>>> before or while processing the request.
    >>>> pg_basebackup: removing contents of data directory
    >>>> "/home/edb/Desktop/backup/"
    >>>> [edb@localhost bin]$
    >>>>
    >>>> Thanks
    >>>>
    >>>>
    >>>>>
    >>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
    >>>>> wrote:
    >>>>>
    >>>>>>
    >>>>>>
    >>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
    >>>>>> wrote:
    >>>>>>
    >>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
    >>>>>>> wrote:
    >>>>>>> >> Why would you need to do that? As long as the process where
    >>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
    >>>>>>> >
    >>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
    >>>>>>> not while the backup is
    >>>>>>> > in progress. So if the backup is a large one, early error
    >>>>>>> detection would be much beneficial.
    >>>>>>> > This is the current behavior of non-parallel backup as well.
    >>>>>>>
    >>>>>>> Because non-parallel backup does not feature early detection of this
    >>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
    >>>>>>> is undesirable. If you want to fix that problem, do it on a separate
    >>>>>>> thread in a separate patch. A patch proposing to make parallel backup
    >>>>>>> inconsistent in behavior with non-parallel backup will be rejected,
    >>>>>>> at
    >>>>>>> least if I have anything to say about it.
    >>>>>>>
    >>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
    >>>>>>> current situation is not great, but promotions ought to be relatively
    >>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
    >>>>>>> also worth considering whether the right fix is to figure out how to
    >>>>>>> make that case actually work, rather than just making it fail
    >>>>>>> quicker.
    >>>>>>> I don't currently understand the reason for the prohibition so I
    >>>>>>> can't
    >>>>>>> express an intelligent opinion on what the right answer is here, but
    >>>>>>> it seems like it ought to be investigated before somebody goes and
    >>>>>>> builds a bunch of infrastructure to make the error more timely.
    >>>>>>>
    >>>>>>
    >>>>>> Non-parallel backup already does the early error checking. I only
    >>>>>> intended to make parallel behave the same as non-parallel here. So, I
    >>>>>> agree with you that the behavior of parallel backup should be
    >>>>>> consistent with the non-parallel one. Please see the code snippet
    >>>>>> below from basebackup.c:sendDir()
    >>>>>>
    >>>>>>
    >>>>>> /*
    >>>>>>  * Check if the postmaster has signaled us to exit, and abort with an
    >>>>>>  * error in that case. The error handler further up will call
    >>>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
    >>>>>>  * started while still in recovery, the server wasn't promoted.
    >>>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
    >>>>>>  * the backup early than continue to the end and fail there.
    >>>>>>  */
    >>>>>> CHECK_FOR_INTERRUPTS();
    >>>>>> if (RecoveryInProgress() != backup_started_in_recovery)
    >>>>>>     ereport(ERROR,
    >>>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    >>>>>>              errmsg("the standby was promoted during online backup"),
    >>>>>>              errhint("This means that the backup being taken is corrupt "
    >>>>>>                      "and should not be used. "
    >>>>>>                      "Try taking another online backup.")));
    >>>>>>>
    >>>>>>>
    >>>>>>> > Okay, then I will add the shared state. And since we are adding
    >>>>>>> the shared state, we can use
    >>>>>>> > that for throttling, progress-reporting and standby early error
    >>>>>>> checking.
    >>>>>>>
    >>>>>>> Please propose a grammar here for all the new replication commands
    >>>>>>> you
    >>>>>>> plan to add before going and implement everything. That will make it
    >>>>>>> easier to hash out the design without forcing you to keep changing
    >>>>>>> the
    >>>>>>> code. Your design should include a sketch of how several sets of
    >>>>>>> coordinating backends taking several concurrent parallel backups will
    >>>>>>> end up with one shared state per parallel backup.
    >>>>>>>
    >>>>>>> > There are two possible options:
    >>>>>>> >
    >>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string>
    >>>>>>> OR
    >>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
    >>>>>>> >
    >>>>>>> > This BackupID should be given back as a response to start backup
    >>>>>>> command. All client workers
    >>>>>>> > must append this ID to all parallel backup replication commands.
    >>>>>>> So that we can use this identifier
    >>>>>>> > to search for that particular backup. Does that sound good?
    >>>>>>>
    >>>>>>> Using the WAL start location as the backup ID seems like it might be
    >>>>>>> problematic -- could a single checkpoint not end up as the start
    >>>>>>> location for multiple backups started at the same time? Whether
    >>>>>>> that's
    >>>>>>> possible now or not, it seems unwise to hard-wire that assumption
    >>>>>>> into
    >>>>>>> the wire protocol.
    >>>>>>>
    >>>>>>> I was thinking that perhaps the client should generate a unique
    >>>>>>> backup
    >>>>>>> ID, e.g. leader does:
    >>>>>>>
    >>>>>>> START_BACKUP unique_backup_id [options]...
    >>>>>>>
    >>>>>>> And then others do:
    >>>>>>>
    >>>>>>> JOIN_BACKUP unique_backup_id
    >>>>>>>
    >>>>>>> My thought is that you will have a number of shared memory structures
    >>>>>>> equal to max_wal_senders, each one large enough to hold the shared
    >>>>>>> state for one backup. The shared state will include
    >>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
    >>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
    >>>>>>> JOIN_BACKUP would search for one by name.
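
The slot scheme described above (a fixed number of shared-state entries, START_BACKUP claiming one and copying the ID into it, JOIN_BACKUP searching by name) could look roughly like the following. This is a single-process sketch under stated assumptions: BackupSlot, MAX_BACKUP_SLOTS, BACKUP_ID_LEN, and the three functions are hypothetical names, the array is ordinary static memory rather than PostgreSQL shared memory, and the locking a real implementation would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BACKUP_ID_LEN     64    /* stand-in for NAMEDATALEN-or-something */
#define MAX_BACKUP_SLOTS  4     /* stand-in for max_wal_senders */

typedef struct BackupSlot
{
    bool        in_use;
    char        backupid[BACKUP_ID_LEN];
} BackupSlot;

/* In a real server this array would live in shared memory under a lock. */
static BackupSlot backup_slots[MAX_BACKUP_SLOTS];

/* JOIN_BACKUP side: find the slot holding this backup ID, or NULL. */
BackupSlot *
join_backup_slot(const char *backupid)
{
    if (backupid == NULL)
        return NULL;
    for (int i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (backup_slots[i].in_use &&
            strcmp(backup_slots[i].backupid, backupid) == 0)
            return &backup_slots[i];
    }
    return NULL;
}

/*
 * START_BACKUP side: claim a free slot and copy the ID into it.
 * Returns NULL if the ID is too long, already registered, or no slot is free.
 */
BackupSlot *
start_backup_slot(const char *backupid)
{
    if (backupid == NULL || strlen(backupid) >= BACKUP_ID_LEN)
        return NULL;
    if (join_backup_slot(backupid) != NULL)
        return NULL;            /* refuse a colliding backup ID */

    for (int i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (!backup_slots[i].in_use)
        {
            strcpy(backup_slots[i].backupid, backupid);
            backup_slots[i].in_use = true;
            return &backup_slots[i];
        }
    }
    return NULL;
}

/* STOP_BACKUP (or abort) side: give the slot back. */
void
release_backup_slot(BackupSlot *slot)
{
    assert(slot != NULL);
    slot->in_use = false;
    slot->backupid[0] = '\0';
}
```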
    >>>>>>>
    >>>>>>> If you want to generate the name on the server side, then I suppose
    >>>>>>> START_BACKUP would return a result set that includes the backup ID,
    >>>>>>> and clients would have to specify that same backup ID when invoking
    >>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
    >>>>>>> better. Either way, the backup ID should be something long and hard
    >>>>>>> to
    >>>>>>> guess, not e.g. the leader processes' PID. I think we should generate
    >>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
    >>>>>>> result to get a string. That way there's almost no risk of two backup
    >>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
    >>>>>>> user trying to screw up somebody else's parallel backup by choosing a
    >>>>>>> colliding backup ID, it would be pretty hard to have any success. A
    >>>>>>> user with enough access to do that sort of thing can probably cause a
    >>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
    >>>>>>> intentional collisions robustly here, so I think we should.
    >>>>>>>
    >>>>>>>
    >>>>>> Okay, so if we add another replication command, ‘JOIN_BACKUP
    >>>>>> unique_backup_id’, to let workers find the relevant shared state,
    >>>>>> there won't be any need to change the grammar of any other command.
    >>>>>> START_BACKUP can return the unique_backup_id in the result set.
    >>>>>>
    >>>>>> I am thinking of the following struct for shared state:
    >>>>>>
    >>>>>> typedef struct
    >>>>>> {
    >>>>>>     char        backupid[NAMEDATALEN];
    >>>>>>     XLogRecPtr  startptr;
    >>>>>>
    >>>>>>     slock_t     lock;
    >>>>>>     int64       throttling_counter;
    >>>>>>     bool        backup_started_in_recovery;
    >>>>>> } BackupSharedState;
    >>>>>>>
    >>>>>>>
    >>>>>> The shared state structure entries would be maintained by a shared
    >>>>>> hash table. There will be one structure per parallel backup. Since a
    >>>>>> single parallel backup can engage more than one WAL sender, I think
    >>>>>> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
    >>>>>> since there will be at least 2 connections per parallel backup?
    >>>>>> Alternatively, we can add a new GUC that defines the maximum number of
    >>>>>> concurrent parallel backups, i.e. ‘max_concurent_backups_allowed = 10’
    >>>>>> perhaps, or we can make it user-configurable.
    >>>>>>
    >>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
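
As an illustration of the key format above, the following sketch hex-encodes 16 random bytes into a 32-character ID. It is written as a standalone program fragment, so it reads /dev/urandom as a stand-in for pg_strong_random(); hex_encode_bytes() and generate_backup_id() are hypothetical names, not functions from the patch or from PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define BACKUP_ID_BYTES 16      /* 16 random bytes -> 32 hex characters */

/* Hex-encode len bytes into dst; dst needs room for 2 * len + 1 chars. */
void
hex_encode_bytes(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";

    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0F];
    }
    dst[2 * len] = '\0';
}

/*
 * Fill dst with a random hex backup ID.  Returns 0 on success and -1 if
 * the buffer is too small or random bytes could not be read.
 * Assumption: /dev/urandom stands in for pg_strong_random() here.
 */
int
generate_backup_id(char *dst, size_t dstlen)
{
    unsigned char raw[BACKUP_ID_BYTES];
    FILE       *fp;
    size_t      nread;

    if (dst == NULL || dstlen < 2 * BACKUP_ID_BYTES + 1)
        return -1;

    fp = fopen("/dev/urandom", "rb");
    if (fp == NULL)
        return -1;
    nread = fread(raw, 1, sizeof(raw), fp);
    fclose(fp);
    if (nread != sizeof(raw))
        return -1;

    hex_encode_bytes(raw, sizeof(raw), dst);
    return 0;
}
```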
    >>>>>>
    >>>>>> Checking for Standby Promotion:
    >>>>>> At the START_BACKUP command, we initialize
    >>>>>> BackupSharedState.backup_started_in_recovery
    >>>>>> and keep checking it whenever send_file () is called to send a new
    >>>>>> file.
    >>>>>>
    >>>>>> Throttling:
    >>>>>> BackupSharedState.throttling_counter - The throttling logic remains
    >>>>>> the same as for non-parallel backup, except that multiple workers will
    >>>>>> now be updating the counter. In parallel backup it therefore represents
    >>>>>> the overall number of bytes transferred, and workers sleep once the
    >>>>>> limit has been exceeded. The shared state carries a lock so that the
    >>>>>> throttling value can be updated safely.
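
A rough sketch of a shared throttling counter updated under a lock follows. The names are assumptions for illustration: ThrottleState, throttle_init(), and throttle_add() are hypothetical, a pthread mutex stands in for slock_t, and the caller is expected to do the actual sleeping when throttle_add() returns true.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ThrottleState
{
    pthread_mutex_t lock;               /* stands in for slock_t */
    int64_t     throttling_counter;     /* bytes counted since the last sleep */
    int64_t     throttling_sample;      /* bytes allowed between sleeps */
} ThrottleState;

void
throttle_init(ThrottleState *st, int64_t sample)
{
    assert(st != NULL && sample > 0);
    pthread_mutex_init(&st->lock, NULL);
    st->throttling_counter = 0;
    st->throttling_sample = sample;
}

/*
 * Account for nbytes sent by any worker.  Returns true when the shared
 * counter has reached the sample size, meaning the caller should sleep;
 * the counter keeps only the remainder beyond the sample.
 */
bool
throttle_add(ThrottleState *st, int64_t nbytes)
{
    bool        should_sleep = false;

    assert(st != NULL && nbytes >= 0);

    pthread_mutex_lock(&st->lock);
    st->throttling_counter += nbytes;
    if (st->throttling_counter >= st->throttling_sample)
    {
        st->throttling_counter %= st->throttling_sample;
        should_sleep = true;
    }
    pthread_mutex_unlock(&st->lock);

    return should_sleep;
}
```

Because every worker adds to the same counter, the limit applies to the backup as a whole rather than to each connection separately.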
    >>>>>>
    >>>>>> Progress Reporting:
    >>>>>> I think we should add progress reporting for parallel backup as a
    >>>>>> separate patch, though. The relevant entries for progress reporting,
    >>>>>> such as ‘backup_total’ and ‘backup_streamed’, would then be added to
    >>>>>> this structure as well.
    >>>>>>
    >>>>>>
    >>>>>> Grammar:
    >>>>>> There is a change in the resultset being returned for START_BACKUP
    >>>>>> command;
    >>>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
    >>>>>> command is
    >>>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
    >>>>>> changes
    >>>>>> to the grammar.
    >>>>>>
    >>>>>> START_BACKUP [LABEL '<label>'] [FAST]
    >>>>>>   - returns startptr, tli, backup_label, unique_backup_id
    >>>>>> STOP_BACKUP [NOWAIT]
    >>>>>>   - returns startptr, tli, backup_label
    >>>>>> JOIN_BACKUP ‘unique_backup_id’
    >>>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
    >>>>>> backend process.
    >>>>>>
    >>>>>> LIST_TABLESPACES [PROGRESS]
    >>>>>> LIST_FILES [TABLESPACE]
    >>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
    >>>>>>
    >>>>>>
    >>
    >
    > Hi,
    >
    > rebased and updated to the current master (8128b0c1). v13 is attached.
    >
    > - Fixes the above reported issues.
    >
    > - Added progress-reporting support for parallel:
    > For this, 'backup_streamed' is moved to a shared structure (BackupState) as
    > pg_atomic_uint64 variable. The worker processes will keep incrementing this
    > variable.
    >
    > While files are being transferred from server to client, the main process
    > remains idle. So after each increment, the worker process will signal the
    > main process to update the stats in the pg_stat_progress_basebackup view.
    >
    > The 'tablespaces_streamed' column is not updated and will remain empty.
    > This is because multiple workers may be copying files from different
    > tablespaces.
    >
    >
    > - Added backup manifest:
    > The backend workers maintain their own manifest file, which contains a
    > list of the files transferred by that worker. Once all backup files are
    > transferred, the workers will create a temp file named
    > ('pg_tempdir/temp_file_prefix_backupid.workerid')
    > and write the content of the manifest file from the BufFile into it. The
    > workers won’t add the header or the WAL information to their manifests;
    > these two will be added by the main process while merging all worker
    > manifest files.
    >
    > The main process will read these individual files and concatenate them
    > into a single file, which is then sent back to the client.
    >
    > The manifest file is created when the following command is received:
    >
    >     BUILD_MANIFEST 'backupid'
    >
    >
    > This is a new replication command. It is sent when pg_basebackup has
    > copied all the
    > $PGDATA files including WAL files.
    >
    >
    >
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
    -- 
    Regards
    ====================================
    Kashif Zeeshan
    Lead Quality Assurance Engineer / Manager
    
    EnterpriseDB Corporation
    The Enterprise Postgres Company
    
  82. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-14T14:36:58Z

    On Tue, Apr 14, 2020 at 6:32 PM Kashif Zeeshan <
    kashif.zeeshan@enterprisedb.com> wrote:
    
    > Hi Asif
    >
    > Getting the following error on Parallel backup when --no-manifest option
    > is used.
    >
    > [edb@localhost bin]$
    > [edb@localhost bin]$
    > [edb@localhost bin]$  ./pg_basebackup -v -j 5  -D
    >  /home/edb/Desktop/backup/ --no-manifest
    > pg_basebackup: initiating base backup, waiting for checkpoint to complete
    > pg_basebackup: checkpoint completed
    > pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
    > pg_basebackup: starting background WAL receiver
    > pg_basebackup: created temporary replication slot "pg_basebackup_10223"
    > pg_basebackup: backup worker (0) created
    > pg_basebackup: backup worker (1) created
    > pg_basebackup: backup worker (2) created
    > pg_basebackup: backup worker (3) created
    > pg_basebackup: backup worker (4) created
    > pg_basebackup: write-ahead log end point: 0/2000100
    > pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR:
    >  could not open file
    > "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file
    > or directory
    > pg_basebackup: removing contents of data directory
    > "/home/edb/Desktop/backup/"
    > [edb@localhost bin]$
    >
    
    I forgot to add a check for --no-manifest. Fixed; the updated patch is
    attached.
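
For illustration only, a client-side check of this kind might look like the sketch below, where the BUILD_MANIFEST command is simply not formatted (and so not sent) when the manifest is disabled. format_build_manifest() and its manifest_enabled parameter are hypothetical names; the actual fix in the patch may be placed elsewhere, for example on the server side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the BUILD_MANIFEST replication command into buf.
 * Returns the command length, 0 if the manifest is disabled (nothing
 * should be sent), or -1 if the ID is empty or the buffer is too small.
 */
int
format_build_manifest(char *buf, size_t buflen, const char *backupid,
                      bool manifest_enabled)
{
    int         n;

    assert(buf != NULL && backupid != NULL);

    if (!manifest_enabled)
    {
        /* --no-manifest: leave the command empty so the caller skips it. */
        if (buflen > 0)
            buf[0] = '\0';
        return 0;
    }

    if (strlen(backupid) == 0)
        return -1;

    n = snprintf(buf, buflen, "BUILD_MANIFEST '%s'", backupid);
    if (n < 0 || (size_t) n >= buflen)
        return -1;              /* output was truncated */
    return n;
}
```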
    
    
    > Thanks
    >
    > On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    >
    >>
    >>
    >> On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
    >> kashif.zeeshan@enterprisedb.com> wrote:
    >>
    >>>
    >>>
    >>> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com>
    >>> wrote:
    >>>
    >>>> Hi,
    >>>>
    >>>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
    >>>>
    >>>> I have added the shared state as previously described. The new grammar
    >>>> changes
    >>>> are as follows:
    >>>>
    >>>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    >>>>     - This will generate a unique backupid using pg_strong_random(16)
    >>>>       and hex-encode it; the result is then returned in the result set.
    >>>>     - It will also create a shared state and add it to the hashtable.
    >>>>       The hash table size is set to BACKUP_HASH_SIZE=10, but since the
    >>>>       hashtable can expand dynamically, I think that is a sufficient
    >>>>       initial size. max_wal_senders is not used, because it can be set
    >>>>       to quite a large value.
    >>>>
    >>>> JOIN_BACKUP 'backup_id'
    >>>>     - finds 'backup_id' in hashtable and attaches it to server process.
    >>>>
    >>>>
    >>>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    >>>>     - renamed SEND_FILES to SEND_FILE
    >>>>     - removed START_WAL_LOCATION from this because 'startptr' is now
    >>>>       accessible through shared state.
    >>>>
    >>>> There is no change in other commands:
    >>>> STOP_BACKUP [NOWAIT]
    >>>> LIST_TABLESPACES [PROGRESS]
    >>>> LIST_FILES [TABLESPACE]
    >>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>>
    >>>> The current patches (v11) have been rebased to the latest master. The
    >>>> backup manifest is enabled by default, so I have disabled it for
    >>>> parallel backup mode and added a warning so that the user is aware of
    >>>> this and does not expect a manifest in the backup.
    >>>>
    >>>> Hi Asif
    >>>
    >>> I have verified the bug fixes; one bug is fixed and now works as
    >>> expected.
    >>>
    >>> While verifying the other bug fixes I hit the following issues;
    >>> please have a look.
    >>>
    >>>
    >>> 1) The bug fixes mentioned below generate a segmentation fault.
    >>>
    >>> Please note that I have added only a description for reference, since
    >>> the steps were given in the previous emails for each bug whose fix I
    >>> tried to verify. A backtrace is also included for each case; both
    >>> point to the same bug.
    >>>
    >>> a) The backup failed with errors "error: could not connect to server:
    >>> could not look up local user ID 1000: Too many open files" when the
    >>> max_wal_senders was set to 2000.
    >>>
    >>>
    >>> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
    >>>  /home/edb/Desktop/backup/
    >>> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >>> mode
    >>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >>> pg_basebackup: checkpoint completed
    >>> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
    >>> pg_basebackup: starting background WAL receiver
    >>> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
    >>> pg_basebackup: backup worker (0) created
    >>> pg_basebackup: backup worker (1) created
    >>> pg_basebackup: backup worker (2) created
    >>> pg_basebackup: backup worker (3) created
    >>> ….
    >>> ….
    >>> pg_basebackup: backup worker (1014) created
    >>> pg_basebackup: backup worker (1015) created
    >>> pg_basebackup: backup worker (1016) created
    >>> pg_basebackup: backup worker (1017) created
    >>> pg_basebackup: error: could not connect to server: could not look up
    >>> local user ID 1000: Too many open files
    >>> Segmentation fault
    >>> [edb@localhost bin]$
    >>>
    >>>
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$ gdb pg_basebackup
    >>> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
    >>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    >>> Copyright (C) 2013 Free Software Foundation, Inc.
    >>> License GPLv3+: GNU GPL version 3 or later <
    >>> http://gnu.org/licenses/gpl.html>
    >>> This is free software: you are free to change and redistribute it.
    >>> There is NO WARRANTY, to the extent permitted by law.  Type "show
    >>> copying"
    >>> and "show warranty" for details.
    >>> This GDB was configured as "x86_64-redhat-linux-gnu".
    >>> For bug reporting instructions, please see:
    >>> <http://www.gnu.org/software/gdb/bugs/>...
    >>> Reading symbols from
    >>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    >>> [New LWP 13219]
    >>> [New LWP 13222]
    >>> [Thread debugging using libthread_db enabled]
    >>> Using host libthread_db library "/lib64/libthread_db.so.1".
    >>> Core was generated by `./pg_basebackup -v -j 1990 -D
    >>> /home/edb/Desktop/backup/'.
    >>> Program terminated with signal 11, Segmentation fault.
    >>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    >>> (gdb) bt
    >>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    >>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    >>> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
    >>> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    >>> at exit.c:77
    >>> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
    >>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0)
    >>> at pg_basebackup.c:2811
    >>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    >>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
    >>> pg_basebackup.c:2765
    >>> (gdb)
    >>>
    >>>
    >>>
    >>>
    >>> b) When executing two backups at the same time, a FATAL error was
    >>> raised due to max_wal_senders, and instead of exiting, the backup completed.
    >>>
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D
    >>>  /home/edb/Desktop/backup1/
    >>> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >>> mode
    >>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >>> pg_basebackup: checkpoint completed
    >>> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
    >>> pg_basebackup: starting background WAL receiver
    >>> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
    >>> pg_basebackup: backup worker (0) created
    >>> pg_basebackup: backup worker (1) created
    >>> pg_basebackup: backup worker (2) created
    >>> pg_basebackup: backup worker (3) created
    >>> pg_basebackup: backup worker (4) created
    >>> pg_basebackup: backup worker (5) created
    >>> pg_basebackup: backup worker (6) created
    >>> pg_basebackup: error: could not connect to server: FATAL:  number of
    >>> requested standby connections exceeds max_wal_senders (currently 10)
    >>> Segmentation fault (core dumped)
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$ gdb pg_basebackup
    >>> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
    >>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    >>> Copyright (C) 2013 Free Software Foundation, Inc.
    >>> License GPLv3+: GNU GPL version 3 or later <
    >>> http://gnu.org/licenses/gpl.html>
    >>> This is free software: you are free to change and redistribute it.
    >>> There is NO WARRANTY, to the extent permitted by law.  Type "show
    >>> copying"
    >>> and "show warranty" for details.
    >>> This GDB was configured as "x86_64-redhat-linux-gnu".
    >>> For bug reporting instructions, please see:
    >>> <http://www.gnu.org/software/gdb/bugs/>...
    >>> Reading symbols from
    >>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    >>> [New LWP 17041]
    >>> [New LWP 17067]
    >>> [Thread debugging using libthread_db enabled]
    >>> Using host libthread_db library "/lib64/libthread_db.so.1".
    >>> Core was generated by `./pg_basebackup -v -j 8 -D
    >>> /home/edb/Desktop/backup1/'.
    >>> Program terminated with signal 11, Segmentation fault.
    >>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    >>> (gdb) bt
    >>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    >>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    >>> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
    >>> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    >>> at exit.c:77
    >>> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
    >>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
    >>> at pg_basebackup.c:2811
    >>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    >>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
    >>> pg_basebackup.c:2765
    >>> (gdb)
    >>>
    >>>
    >>>
    >>>
    >>> 2) The following bug is not fixed yet
    >>>
    >>> A similar case: when the DB server is shut down while a parallel backup
    >>> is in progress, the correct error is displayed, but the backup folder
    >>> is not cleaned up, leaving a corrupt backup.
    >>>
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>> -j 8
    >>> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >>> mode
    >>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >>> pg_basebackup: checkpoint completed
    >>> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
    >>> pg_basebackup: starting background WAL receiver
    >>> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
    >>> pg_basebackup: backup worker (0) created
    >>> pg_basebackup: backup worker (1) created
    >>> pg_basebackup: backup worker (2) created
    >>> pg_basebackup: backup worker (3) created
    >>> pg_basebackup: backup worker (4) created
    >>> pg_basebackup: backup worker (5) created
    >>> pg_basebackup: backup worker (6) created
    >>> pg_basebackup: backup worker (7) created
    >>> pg_basebackup: error: could not read COPY data: server closed the
    >>> connection unexpectedly
    >>> This probably means the server terminated abnormally
    >>> before or while processing the request.
    >>> pg_basebackup: error: could not read COPY data: server closed the
    >>> connection unexpectedly
    >>> This probably means the server terminated abnormally
    >>> before or while processing the request.
    >>> pg_basebackup: removing contents of data directory
    >>> "/home/edb/Desktop/backup/"
    >>> pg_basebackup: error: could not read COPY data: server closed the
    >>> connection unexpectedly
    >>> This probably means the server terminated abnormally
    >>> before or while processing the request.
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$
    >>>
    >>>
    >>>
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >>> base         pg_hba.conf    pg_logical    pg_notify    pg_serial
    >>> pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
    >>> pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
    >>>  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
    >>> [edb@localhost bin]$
    >>> [edb@localhost bin]$
    >>>
    >>>
    >>>
    >>>
    >>> Thanks
    >>> Kashif Zeeshan
    >>>
    >>>>
    >>>>
    >>>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
    >>>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>>
    >>>>>
    >>>>>
    >>>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
    >>>>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>>>
    >>>>>> Hi Asif
    >>>>>>
    >>>>>> When a non-existent slot is used with a tablespace, the correct error
    >>>>>> is displayed, but the backup folder is not cleaned up, leaving a
    >>>>>> corrupt backup.
    >>>>>>
    >>>>>> Steps
    >>>>>> =======
    >>>>>>
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
    >>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
    >>>>>> [edb@localhost bin]$
    >>>>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
    >>>>>> CREATE TABLESPACE
    >>>>>> postgres=#
    >>>>>> postgres=# create table t1 (a int) tablespace tbl1;
    >>>>>> CREATE TABLE
    >>>>>> postgres=# insert into t1 values(100);
    >>>>>> INSERT 0 1
    >>>>>> postgres=# insert into t1 values(200);
    >>>>>> INSERT 0 1
    >>>>>> postgres=# insert into t1 values(300);
    >>>>>> INSERT 0 1
    >>>>>> postgres=#
    >>>>>>
    >>>>>>
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
    >>>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
    >>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>>> complete
    >>>>>> pg_basebackup: checkpoint completed
    >>>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
    >>>>>> pg_basebackup: starting background WAL receiver
    >>>>>> pg_basebackup: error: could not send replication command
    >>>>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
    >>>>>> pg_basebackup: backup worker (0) created
    >>>>>> pg_basebackup: backup worker (1) created
    >>>>>> pg_basebackup: write-ahead log end point: 0/2E000100
    >>>>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>>>> pg_basebackup: error: child thread exited with error 1
    >>>>>> [edb@localhost bin]$
    >>>>>>
    >>>>>> backup folder not cleaned
    >>>>>>
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >>>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
    >>>>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
    >>>>>>             postgresql.conf
    >>>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
    >>>>>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
    >>>>>>  postgresql.auto.conf
    >>>>>> [edb@localhost bin]$
    >>>>>>
    >>>>>>
    >>>>>>
    >>>>>>
    >>>>>> If the same case is executed without the parallel backup patch then
    >>>>>> the backup folder is cleaned after the error is displayed.
    >>>>>>
    >>>>>> [edb@localhost bin]$ ./pg_basebackup -v -D
    >>>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
    >>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>>> complete
    >>>>>> pg_basebackup: checkpoint completed
    >>>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
    >>>>>> pg_basebackup: starting background WAL receiver
    >>>>>> pg_basebackup: error: could not send replication command
    >>>>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
    >>>>>> pg_basebackup: write-ahead log end point: 0/2B000100
    >>>>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>>>> pg_basebackup: error: child process exited with exit code 1
    >>>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
    >>>>>> pg_basebackup: changes to tablespace directories will not be undone
    >>>>>>
    >>>>>
    >>>>>
    >>>>> Hi Asif
    >>>>>
    >>>>> A similar case: when the DB server is shut down while a parallel
    >>>>> backup is in progress, the correct error is displayed, but the backup
    >>>>> folder is not cleaned up, leaving a corrupt backup. I think one bug
    >>>>> fix will solve all these cases where cleanup is not done when a
    >>>>> parallel backup fails.
    >>>>>
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$  ./pg_basebackup -v -D
    >>>>>  /home/edb/Desktop/backup/ -j 8
    >>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>> complete
    >>>>> pg_basebackup: checkpoint completed
    >>>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
    >>>>> pg_basebackup: starting background WAL receiver
    >>>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
    >>>>> pg_basebackup: backup worker (0) created
    >>>>> pg_basebackup: backup worker (1) created
    >>>>> pg_basebackup: backup worker (2) created
    >>>>> pg_basebackup: backup worker (3) created
    >>>>> pg_basebackup: backup worker (4) created
    >>>>> pg_basebackup: backup worker (5) created
    >>>>> pg_basebackup: backup worker (6) created
    >>>>> pg_basebackup: backup worker (7) created
    >>>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>>> connection unexpectedly
    >>>>> This probably means the server terminated abnormally
    >>>>> before or while processing the request.
    >>>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>>> connection unexpectedly
    >>>>> This probably means the server terminated abnormally
    >>>>> before or while processing the request.
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$
    >>>>>
    >>>>> When the same case is executed with pg_basebackup without the parallel
    >>>>> backup patch, proper cleanup is done.
    >>>>>
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$  ./pg_basebackup -v -D
    >>>>>  /home/edb/Desktop/backup/
    >>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>> complete
    >>>>> pg_basebackup: checkpoint completed
    >>>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
    >>>>> pg_basebackup: starting background WAL receiver
    >>>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
    >>>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>>> connection unexpectedly
    >>>>> This probably means the server terminated abnormally
    >>>>> before or while processing the request.
    >>>>> pg_basebackup: removing contents of data directory
    >>>>> "/home/edb/Desktop/backup/"
    >>>>> [edb@localhost bin]$
    >>>>>
    >>>>> Thanks
    >>>>>
    >>>>>
    >>>>>>
    >>>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
    >>>>>> wrote:
    >>>>>>
    >>>>>>>
    >>>>>>>
    >>>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
    >>>>>>> wrote:
    >>>>>>>
    >>>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
    >>>>>>>> wrote:
    >>>>>>>> >> Why would you need to do that? As long as the process where
    >>>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
    >>>>>>>> >
    >>>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
    >>>>>>>> not while the backup is
    >>>>>>>> > in progress. So if the backup is a large one, early error
    >>>>>>>> detection would be much beneficial.
    >>>>>>>> > This is the current behavior of non-parallel backup as well.
    >>>>>>>>
    >>>>>>>> Because non-parallel backup does not feature early detection of this
    >>>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
    >>>>>>>> is undesirable. If you want to fix that problem, do it on a separate
    >>>>>>>> thread in a separate patch. A patch proposing to make parallel
    >>>>>>>> backup
    >>>>>>>> inconsistent in behavior with non-parallel backup will be rejected,
    >>>>>>>> at
    >>>>>>>> least if I have anything to say about it.
    >>>>>>>>
    >>>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
    >>>>>>>> current situation is not great, but promotions ought to be
    >>>>>>>> relatively
    >>>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
    >>>>>>>> also worth considering whether the right fix is to figure out how to
    >>>>>>>> make that case actually work, rather than just making it fail
    >>>>>>>> quicker.
    >>>>>>>> I don't currently understand the reason for the prohibition so I
    >>>>>>>> can't
    >>>>>>>> express an intelligent opinion on what the right answer is here, but
    >>>>>>>> it seems like it ought to be investigated before somebody goes and
    >>>>>>>> builds a bunch of infrastructure to make the error more timely.
    >>>>>>>>
    >>>>>>>
    >>>>>>> Non-parallel backup already does the early error checking. I only
    >>>>>>> intended to make parallel behave the same as non-parallel here. So, I
    >>>>>>> agree with you that the behavior of parallel backup should be
    >>>>>>> consistent with the non-parallel one. Please see the code snippet
    >>>>>>> below from basebackup.c:sendDir()
    >>>>>>>
    >>>>>>>
    >>>>>>> /*
    >>>>>>>  * Check if the postmaster has signaled us to exit, and abort with an
    >>>>>>>  * error in that case. The error handler further up will call
    >>>>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
    >>>>>>>  * started while still in recovery, the server wasn't promoted.
    >>>>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
    >>>>>>>  * the backup early than continue to the end and fail there.
    >>>>>>>  */
    >>>>>>> CHECK_FOR_INTERRUPTS();
    >>>>>>> if (RecoveryInProgress() != backup_started_in_recovery)
    >>>>>>>     ereport(ERROR,
    >>>>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    >>>>>>>              errmsg("the standby was promoted during online backup"),
    >>>>>>>              errhint("This means that the backup being taken is corrupt "
    >>>>>>>                      "and should not be used. "
    >>>>>>>                      "Try taking another online backup.")));
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> > Okay, then I will add the shared state. And since we are adding
    >>>>>>>> the shared state, we can use
    >>>>>>>> > that for throttling, progress-reporting and standby early error
    >>>>>>>> checking.
    >>>>>>>>
    >>>>>>>> Please propose a grammar here for all the new replication commands
    >>>>>>>> you
    >>>>>>>> plan to add before going and implement everything. That will make it
    >>>>>>>> easier to hash out the design without forcing you to keep changing
    >>>>>>>> the
    >>>>>>>> code. Your design should include a sketch of how several sets of
    >>>>>>>> coordinating backends taking several concurrent parallel backups
    >>>>>>>> will
    >>>>>>>> end up with one shared state per parallel backup.
    >>>>>>>>
    >>>>>>>> > There are two possible options:
    >>>>>>>> >
    >>>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string>
    >>>>>>>> OR
    >>>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
    >>>>>>>> >
    >>>>>>>> > This BackupID should be given back as a response to start backup
    >>>>>>>> command. All client workers
    >>>>>>>> > must append this ID to all parallel backup replication commands.
    >>>>>>>> So that we can use this identifier
    >>>>>>>> > to search for that particular backup. Does that sound good?
    >>>>>>>>
    >>>>>>>> Using the WAL start location as the backup ID seems like it might be
    >>>>>>>> problematic -- could a single checkpoint not end up as the start
    >>>>>>>> location for multiple backups started at the same time? Whether
    >>>>>>>> that's
    >>>>>>>> possible now or not, it seems unwise to hard-wire that assumption
    >>>>>>>> into
    >>>>>>>> the wire protocol.
    >>>>>>>>
    >>>>>>>> I was thinking that perhaps the client should generate a unique
    >>>>>>>> backup
    >>>>>>>> ID, e.g. leader does:
    >>>>>>>>
    >>>>>>>> START_BACKUP unique_backup_id [options]...
    >>>>>>>>
    >>>>>>>> And then others do:
    >>>>>>>>
    >>>>>>>> JOIN_BACKUP unique_backup_id
    >>>>>>>>
    >>>>>>>> My thought is that you will have a number of shared memory structures
    >>>>>>>> equal to max_wal_senders, each one large enough to hold the shared
    >>>>>>>> state for one backup. The shared state will include
    >>>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
    >>>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
    >>>>>>>> JOIN_BACKUP would search for one by name.
    >>>>>>>>
    >>>>>>>> If you want to generate the name on the server side, then I suppose
    >>>>>>>> START_BACKUP would return a result set that includes the backup ID,
    >>>>>>>> and clients would have to specify that same backup ID when invoking
    >>>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way
    >>>>>>>> is
    >>>>>>>> better. Either way, the backup ID should be something long and hard
    >>>>>>>> to
    >>>>>>>> guess, not e.g. the leader processes' PID. I think we should
    >>>>>>>> generate
    >>>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode
    >>>>>>>> the
    >>>>>>>> result to get a string. That way there's almost no risk of two
    >>>>>>>> backup
    >>>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
    >>>>>>>> user trying to screw up somebody else's parallel backup by choosing
    >>>>>>>> a
    >>>>>>>> colliding backup ID, it would be pretty hard to have any success. A
    >>>>>>>> user with enough access to do that sort of thing can probably cause
    >>>>>>>> a
    >>>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
    >>>>>>>> intentional collisions robustly here, so I think we should.
    >>>>>>>>
    >>>>>>>>
    >>>>>>> Okay, so if we add another replication command ‘JOIN_BACKUP
    >>>>>>> unique_backup_id’ to let workers find the relevant shared state,
    >>>>>>> there won't be any need to change the grammar for any other command.
    >>>>>>> START_BACKUP can return the unique_backup_id in the result set.
    >>>>>>>
    >>>>>>> I am thinking of the following struct for shared state:
    >>>>>>>
    >>>>>>> typedef struct
    >>>>>>> {
    >>>>>>>     char        backupid[NAMEDATALEN];
    >>>>>>>     XLogRecPtr  startptr;
    >>>>>>>
    >>>>>>>     slock_t     lock;
    >>>>>>>     int64       throttling_counter;
    >>>>>>>     bool        backup_started_in_recovery;
    >>>>>>> } BackupSharedState;
    >>>>>>>>
    >>>>>>>>
    >>>>>>> The shared state structure entries would be maintained by a shared
    >>>>>>> hash table. There will be one structure per parallel backup. Since a
    >>>>>>> single parallel backup can engage more than one WAL sender, I think
    >>>>>>> max_wal_senders might be a little too much; perhaps
    >>>>>>> max_wal_senders/2, since there will be at least 2 connections per
    >>>>>>> parallel backup? Alternatively, we can add a new GUC that defines
    >>>>>>> the maximum number of concurrent parallel backups, i.e.
    >>>>>>> ‘max_concurent_backups_allowed = 10’ perhaps, or we can make it
    >>>>>>> user-configurable.
    >>>>>>>
    >>>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
    >>>>>>>
    >>>>>>> Checking for Standby Promotion:
    >>>>>>> At the START_BACKUP command, we initialize
    >>>>>>> BackupSharedState.backup_started_in_recovery
    >>>>>>> and keep checking it whenever send_file () is called to send a new
    >>>>>>> file.
    >>>>>>>
    >>>>>>> Throttling:
    >>>>>>> BackupSharedState.throttling_counter - The throttling logic remains
    >>>>>>> the same
    >>>>>>> as for non-parallel backup with the exception that multiple threads
    >>>>>>> will now be
    >>>>>>> updating it. So in parallel backup, this will represent the overall
    >>>>>>> bytes that
    >>>>>>> have been transferred. So the workers would sleep if they have
    >>>>>>> exceeded the
    >>>>>>> limit. Hence, the shared state carries a lock to safely update the
    >>>>>>> throttling
    >>>>>>> value atomically.
    >>>>>>>
    >>>>>>> Progress Reporting:
    >>>>>>> I think we should add progress-reporting for parallel backup as a
    >>>>>>> separate patch. The relevant entries for progress-reporting, such as
    >>>>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
    >>>>>>> structure as well.
    >>>>>>>
    >>>>>>>
    >>>>>>> Grammar:
    >>>>>>> There is a change in the resultset being returned for START_BACKUP
    >>>>>>> command;
    >>>>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
    >>>>>>> command is
    >>>>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
    >>>>>>> changes
    >>>>>>> to the grammar.
    >>>>>>>
    >>>>>>> START_BACKUP [LABEL '<label>'] [FAST]
    >>>>>>>   - returns startptr, tli, backup_label, unique_backup_id
    >>>>>>> STOP_BACKUP [NOWAIT]
    >>>>>>>   - returns startptr, tli, backup_label
    >>>>>>> JOIN_BACKUP ‘unique_backup_id’
    >>>>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
    >>>>>>> backend process.
    >>>>>>>
    >>>>>>> LIST_TABLESPACES [PROGRESS]
    >>>>>>> LIST_FILES [TABLESPACE]
    >>>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
    >>>>>>>
    >>>>>>>
    >>>
    >>
    >> Hi,
    >>
    >> rebased and updated to the current master (8128b0c1). v13 is attached.
    >>
    >> - Fixes the above reported issues.
    >>
    >> - Added progress-reporting support for parallel:
    >> For this, 'backup_streamed' is moved to a shared structure (BackupState)
    >> as a pg_atomic_uint64 variable. The worker processes will keep
    >> incrementing this variable.
    >>
    >> While files are being transferred from server to client, the main
    >> process remains idle. So after each increment, the worker process will
    >> signal the main process to update the stats in the
    >> pg_stat_progress_basebackup view.
    >>
    >> The 'tablespace_streamed' column is not updated and will remain empty,
    >> because multiple workers may be copying files from different
    >> tablespaces.
    >>
    >>
    >> - Added backup manifest:
    >> The backend workers maintain their own manifest file, which contains a
    >> list of the files transferred by that worker. Once all backup files are
    >> transferred, the workers will create a temp file as
    >> ('pg_tempdir/temp_file_prefix_backupid.workerid')
    >> to write the content of the manifest file from BufFile. The workers won’t
    >> add the header or the WAL information to their manifests. These two will
    >> be added by the main process while merging all worker manifest files.
    >>
    >> The main process will read these individual files and concatenate them
    >> into a single file, which is then sent back to the client.
    >>
    >> The manifest file is created when the following command is received:
    >>
    >>>     BUILD_MANIFEST 'backupid'
    >>
    >>
    >> This is a new replication command. It is sent when pg_basebackup has
    >> copied all the
    >> $PGDATA files including WAL files.
    >>
    >>
    >>
    >> --
    >> Asif Rehman
    >> Highgo Software (Canada/China/Pakistan)
    >> URL : www.highgo.ca
    >>
    >>
    >
    > --
    > Regards
    > ====================================
    > Kashif Zeeshan
    > Lead Quality Assurance Engineer / Manager
    >
    > EnterpriseDB Corporation
    > The Enterprise Postgres Company
    >
    >
    
    -- 
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  83. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-14T20:49:04Z

    On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
    
    +typedef struct
    +{
    ...
    +} BackupFile;
    +
    +typedef struct
    +{
    ...
    +} BackupState;
    
    These structures need comments.
    
    +list_wal_files_opt_list:
    +                       SCONST SCONST
                                    {
    -                                 $$ = makeDefElem("manifest_checksums",
    -                                         (Node *)makeString($2), -1);
    +                                       $$ = list_make2(
    +                                       makeDefElem("start_wal_location",
    +                                               (Node *)makeString($2), -1),
    +                                       makeDefElem("end_wal_location",
    +                                               (Node *)makeString($2), -1));
    +
                                    }
    
    This seems like an unnecessarily complicated parse representation. The
    DefElems seem to be completely unnecessary here.
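
For illustration only, here is one possible simpler shape for the two WAL locations. It is not taken from the patch or from this review. It assumes the 'X/X' location syntax shown for LIST_WAL_FILES earlier in the thread; the struct and function names (SketchWalRange, sketch_parse_lsn, sketch_make_wal_range) are hypothetical, and plain C library calls stand in for the backend's own parsing helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical parse result: the two WAL locations, stored directly. */
typedef struct
{
    uint64_t    start_lsn;
    uint64_t    end_lsn;
} SketchWalRange;

/*
 * Parse an 'X/X' location (two hexadecimal halves) into a 64-bit value.
 * Returns 0 on success, -1 if the string is not of that form.
 */
static int
sketch_parse_lsn(const char *str, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;
    int         consumed = 0;

    assert(str != NULL && lsn != NULL);

    /* %n records how many characters were consumed, to reject trailing junk */
    if (sscanf(str, "%X/%X%n", &hi, &lo, &consumed) != 2 ||
        str[consumed] != '\0')
        return -1;

    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

/* Fill a range from the two string constants; -1 if either is malformed. */
static int
sketch_make_wal_range(const char *start, const char *end,
                      SketchWalRange *range)
{
    if (sketch_parse_lsn(start, &range->start_lsn) != 0)
        return -1;
    if (sketch_parse_lsn(end, &range->end_lsn) != 0)
        return -1;
    return 0;
}
```

With this shape, a caller would pass the two string constants straight through and read start_lsn and end_lsn from the struct, with no option names to look up.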
    
    @@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
                    set_ps_display(activitymsg);
            }
    
    -       perform_base_backup(&opt);
    +       switch (cmd->cmdtag)
    
    So the design here is that SendBaseBackup() is now going to do a bunch
    of things that are NOT sending a base backup? With no updates to the
    comments of that function and no change to the process title it sets?
    
    -       return (manifest->buffile != NULL);
    +       return (manifest && manifest->buffile != NULL);
    
    Heck no. It appears that you didn't even bother reading the function
    header comment.
    
    + * Send a single resultset containing XLogRecPtr record (in text format)
    + * TimelineID and backup label.
      */
     static void
    -SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
    +SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
    +                                        StringInfo label, char *backupid)
    
    This just casually breaks wire protocol compatibility, which seems
    completely unacceptable.
    
    +       if (strlen(opt->tablespace) > 0)
    +               sendTablespace(opt->tablespace, NULL, true, NULL, &files);
    +       else
    +               sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
    +
    +       SendFilesHeader(files);
    
    So I guess the idea here is that we buffer the entire list of files in
    memory, regardless of size, and then we send it out afterwards. That
    doesn't seem like a good idea. The list of files might be very large.
    We probably need some code refactoring here rather than just piling
    more and more different responsibilities onto sendTablespace() and
    sendDir().
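
    One way to avoid holding the whole list is to hand each file to a
    callback as the directory walk finds it, so memory use depends on the
    directory depth rather than on the number of files. The sketch below is
    only an illustration under that assumption: demo_walk_dir() and
    demo_count_cb() are hypothetical names, it uses plain POSIX calls rather
    than the backend's sendDir(), and it is not taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback: called once per regular file; return 0 to continue. */
typedef int (*demo_file_cb) (const char *path, const struct stat *st, void *arg);

/*
 * Walk 'dir' recursively and report each regular file to 'cb' as soon as it
 * is found, instead of collecting all names first. Returns 0 on success, or
 * a nonzero value on error or when the callback asks to stop.
 */
static int
demo_walk_dir(const char *dir, demo_file_cb cb, void *arg)
{
    DIR        *d;
    struct dirent *de;
    int         rc = 0;

    assert(dir != NULL && cb != NULL);

    d = opendir(dir);
    if (d == NULL)
        return -1;

    while (rc == 0 && (de = readdir(d)) != NULL)
    {
        char        path[4096];
        struct stat st;
        int         n;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        n = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (n < 0 || n >= (int) sizeof(path))
        {
            rc = -1;            /* path does not fit in the buffer */
            break;
        }

        if (lstat(path, &st) != 0)
        {
            if (errno == ENOENT)
                continue;       /* file vanished during the scan; skip it */
            rc = -1;
            break;
        }

        if (S_ISDIR(st.st_mode))
            rc = demo_walk_dir(path, cb, arg);
        else if (S_ISREG(st.st_mode))
            rc = cb(path, &st, arg);
        /* other file types are ignored in this sketch */
    }

    closedir(d);
    return rc;
}

/* Example callback: count files rather than storing their names. */
static int
demo_count_cb(const char *path, const struct stat *st, void *arg)
{
    (void) path;
    (void) st;
    ++*(int *) arg;
    return 0;
}
```

    A real callback would write one result row per file to the client
    instead of incrementing a counter; the point is only that nothing
    accumulates between files.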
    
    +       if (state->parallel_mode)
    +               SpinLockAcquire(&state->lock);
    +
    +       state->throttling_counter += increment;
    +
    +       if (state->parallel_mode)
    +               SpinLockRelease(&state->lock);
    
    I don't like this much. It seems to me that we would do better to use
    atomics here all the time, instead of conditional spinlocks.
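
    For illustration, an unconditional atomic update could look roughly like
    the sketch below. It is written against C11 <stdatomic.h> so that it
    stands alone; backend code would presumably use PostgreSQL's own atomics
    API (port/atomics.h) instead. DemoBackupState, demo_state_init() and
    demo_throttle_add() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's shared backup state. */
typedef struct
{
    _Atomic int64_t throttling_counter; /* bytes counted since the last check */
} DemoBackupState;

static void
demo_state_init(DemoBackupState *state)
{
    atomic_init(&state->throttling_counter, 0);
}

/*
 * Add 'increment' to the shared counter and return the updated value.
 * The same code path runs whether or not other workers exist, so there is
 * no parallel_mode branch and no lock to acquire or release.
 */
static int64_t
demo_throttle_add(DemoBackupState *state, int64_t increment)
{
    int64_t     old;

    assert(increment >= 0);
    old = atomic_fetch_add(&state->throttling_counter, increment);
    return old + increment;
}
```

    In the single-process case the atomic add costs little more than the
    plain addition, which is why taking that path unconditionally is
    attractive.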
    
    +static void
    +send_file(basebackup_options *opt, char *file, bool missing_ok)
    ...
    +       if (file == NULL)
    +               return;
    
    That seems totally inappropriate.
    
    +                       sendFile(file, file + basepathlen, &statbuf,
    +                                true, InvalidOid, NULL, NULL);
    
    Maybe I'm misunderstanding, but this looks like it's going to write a
    tar header, even though we're not writing a tarfile.
    
    +               else
    +                       ereport(WARNING,
    +                                       (errmsg("skipping special file or directory \"%s\"", file)));
    
    So, if the user asks for a directory or symlink, what's going to
    happen is that they're going to receive an empty file, and get a
    warning. That sounds like terrible behavior.
    
    +       /*
    +        * Check for checksum failures. If there are failures across multiple
    +        * processes it may not report total checksum count, but it will error
    +        * out,terminating the backup.
    +        */
    
    In other words, the patch breaks the feature. Not that the feature in
    question works particularly well as things stand, but this makes it
    worse.
    
    I think this patch (0003) is in really bad shape. I'm having second
    thoughts about the design, but it's kind of hard to even have a
    discussion about the design when the patch is riddled with minor
    problems like inadequate comments, failure to update existing
    comments, and breaking a bunch of things. I understand that sometimes
    things get missed, but this is version 14 of a patch that's been
    kicking around since last August.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  84. Re: WIP/PoC for parallel backup

    Ahsan Hadi <ahsan.hadi@gmail.com> — 2020-04-15T08:49:39Z

    On Wed, 15 Apr 2020 at 1:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    > > I forgot to make a check for no-manifest. Fixed. Attached is the updated
    > patch.
    >
    > I think this patch (0003) is in really bad shape. I'm having second
    > thoughts about the design, but it's kind of hard to even have a
    > discussion about the design when the patch is riddled with minor
    > problems like inadequate comments, failure to update existing
    > comments, and breaking a bunch of things. I understand that sometimes
    > things get missed, but this is version 14 of a patch that's been
    > kicking around since last August.
    
    
    Fair enough. Some of this is also due to backup-related features, i.e.,
    backup manifest and progress reporting, that got committed to master
    towards the tail end of PG-13. Rushing to get the parallel backup feature
    compatible with these features also caused some of the oversights.
    
    
    >
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    >
    > --
    Highgo Software (Canada/China/Pakistan)
    URL : http://www.highgo.ca
    ADDR: 10318 WHALLEY BLVD, Surrey, BC
    EMAIL: mailto: ahsan.hadi@highgo.ca
    
  85. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-15T09:28:37Z

    Hi Asif,
    
    In the scenario below, backup verification fails for a tablespace when
    the backup is taken with the parallel option. Without the parallel
    option, pg_verifybackup passes for the same scenario without any error.
    
    [edb@localhost bin]$ mkdir /tmp/test_bkp/tblsp1
    [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp1 location '/tmp/test_bkp/tblsp1';"
    CREATE TABLESPACE
    [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text) tablespace tblsp1;"
    CREATE TABLE
    [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -T tablespace option');"
    INSERT 0 1
    [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/test_bkp/bkp -T /tmp/test_bkp/tblsp1=/tmp/test_bkp/tblsp2 -j 4
    [edb@localhost bin]$ ./pg_verifybackup /tmp/test_bkp/bkp
    pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16390" is present on disk but not in the manifest
    pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16388" is present on disk but not in the manifest
    pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16385" is present on disk but not in the manifest
    pg_verifybackup: error: "/PG_13_202004074/13530/16388" is present in the manifest but not on disk
    pg_verifybackup: error: "/PG_13_202004074/13530/16390" is present in the manifest but not on disk
    pg_verifybackup: error: "/PG_13_202004074/13530/16385" is present in the manifest but not on disk
    
    --without parallel backup
    [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/test_bkp/bkp1 -T /tmp/test_bkp/tblsp1=/tmp/test_bkp/tblsp3 -j 1
    [edb@localhost bin]$ ./pg_verifybackup /tmp/test_bkp/bkp1
    backup successfully verified
    
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
  86. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-15T13:31:22Z

    On Wed, Apr 15, 2020 at 4:49 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
    > Fair enough. Some of this is also due to backup-related features, i.e., backup manifest and progress reporting, that got committed to master towards the tail end of PG-13. Rushing to get the parallel backup feature compatible with these features also caused some of the oversights.
    
    Sure, but there's also no point in rushing out a feature that's in a
    state where it's got no chance of being acceptable, and quite a number
    of these problems are not new, either.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  87. Re: WIP/PoC for parallel backup

    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-17T06:00:55Z

    On Tue, Apr 14, 2020 at 7:37 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Tue, Apr 14, 2020 at 6:32 PM Kashif Zeeshan <
    > kashif.zeeshan@enterprisedb.com> wrote:
    >
    >> Hi Asif
    >>
    >> Getting the following error on Parallel backup when --no-manifest option
    >> is used.
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$  ./pg_basebackup -v -j 5  -D
    >>  /home/edb/Desktop/backup/ --no-manifest
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: created temporary replication slot "pg_basebackup_10223"
    >> pg_basebackup: backup worker (0) created
    >> pg_basebackup: backup worker (1) created
    >> pg_basebackup: backup worker (2) created
    >> pg_basebackup: backup worker (3) created
    >> pg_basebackup: backup worker (4) created
    >> pg_basebackup: write-ahead log end point: 0/2000100
    >> pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR:
    >>  could not open file
    >> "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file
    >> or directory
    >> pg_basebackup: removing contents of data directory
    >> "/home/edb/Desktop/backup/"
    >> [edb@localhost bin]$
    >>
    >
    > I forgot to make a check for no-manifest. Fixed. Attached is the updated
    > patch.
    >
    Hi Asif
    
    Verified the fix, thanks.
    
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$ ./pg_basebackup -v -j 5 -D /home/edb/Desktop/backup --no-manifest
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_27407"
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: backup worker (2) created
    pg_basebackup: backup worker (3) created
    pg_basebackup: backup worker (4) created
    pg_basebackup: write-ahead log end point: 0/4000100
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: syncing data to disk ...
    pg_basebackup: base backup completed
    [edb@localhost bin]$
    [edb@localhost bin]$ ls /home/edb/Desktop/backup
    backup_label  pg_commit_ts  pg_ident.conf  pg_notify    pg_snapshots  pg_subtrans  PG_VERSION  postgresql.auto.conf
    base          pg_dynshmem   pg_logical     pg_replslot  pg_stat       pg_tblspc    pg_wal      postgresql.conf
    global        pg_hba.conf   pg_multixact   pg_serial    pg_stat_tmp   pg_twophase  pg_xact
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$
    
    Regards
    Kashif Zeeshan
    
    >
    >
    >> Thanks
    >>
    >> On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >>
    >>>
    >>>
    >>> On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
    >>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>
    >>>>
    >>>>
    >>>> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com>
    >>>> wrote:
    >>>>
    >>>>> Hi,
    >>>>>
    >>>>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
    >>>>>
    >>>>> I have added the shared state as previously described. The new
    >>>>> grammar changes are as follows:
    >>>>>
    >>>>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    >>>>>     - This will generate a unique backupid using pg_strong_random(16)
    >>>>>       and hex-encode it; the result is then returned as the result set.
    >>>>>     - It will also create a shared state and add it to the hashtable.
    >>>>>       The hash table size is set to BACKUP_HASH_SIZE=10, but since the
    >>>>>       hashtable can expand dynamically, I think it's a sufficient
    >>>>>       initial size. max_wal_senders is not used, because it can be set
    >>>>>       to quite a large value.
    >>>>>
    >>>>> JOIN_BACKUP 'backup_id'
    >>>>>     - finds 'backup_id' in hashtable and attaches it to server process.
    >>>>>
    >>>>>
    >>>>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    >>>>>     - renamed SEND_FILES to SEND_FILE
    >>>>>     - removed START_WAL_LOCATION from this because 'startptr' is now
    >>>>>       accessible through shared state.
    >>>>>
    >>>>> There is no change in other commands:
    >>>>> STOP_BACKUP [NOWAIT]
    >>>>> LIST_TABLESPACES [PROGRESS]
    >>>>> LIST_FILES [TABLESPACE]
    >>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>>>
    >>>>> The current patches (v11) have been rebased to the latest master. The
    >>>>> backup manifest is enabled by default, so I have disabled it for
    >>>>> parallel backup mode and added a warning so that the user is aware of
    >>>>> it and does not expect it in the backup.
    >>>>>
    >>>> Hi Asif
    >>>>
    >>>> I have verified the bug fixes; one bug is fixed and now works as
    >>>> expected.
    >>>>
    >>>> While verifying the other bug fixes I faced the following issues;
    >>>> please have a look.
    >>>>
    >>>>
    >>>> 1) The bug fixes mentioned below now produce a segmentation fault.
    >>>>
    >>>> Please note that, for reference, I have added only a description, as
    >>>> the steps were given in the previous emails for each bug whose fix I
    >>>> tried to verify. A backtrace is also added for each case; both cases
    >>>> point to the same bug.
    >>>>
    >>>> a) The backup failed with errors "error: could not connect to server:
    >>>> could not look up local user ID 1000: Too many open files" when the
    >>>> max_wal_senders was set to 2000.
    >>>>
    >>>>
    >>>> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
    >>>>  /home/edb/Desktop/backup/
    >>>> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >>>> mode
    >>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>> complete
    >>>> pg_basebackup: checkpoint completed
    >>>> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
    >>>> pg_basebackup: starting background WAL receiver
    >>>> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
    >>>> pg_basebackup: backup worker (0) created
    >>>> pg_basebackup: backup worker (1) created
    >>>> pg_basebackup: backup worker (2) created
    >>>> pg_basebackup: backup worker (3) created
    >>>> ….
    >>>> ….
    >>>> pg_basebackup: backup worker (1014) created
    >>>> pg_basebackup: backup worker (1015) created
    >>>> pg_basebackup: backup worker (1016) created
    >>>> pg_basebackup: backup worker (1017) created
    >>>> pg_basebackup: error: could not connect to server: could not look up
    >>>> local user ID 1000: Too many open files
    >>>> Segmentation fault
    >>>> [edb@localhost bin]$
    >>>>
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$ gdb pg_basebackup
    >>>> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
    >>>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    >>>> Copyright (C) 2013 Free Software Foundation, Inc.
    >>>> License GPLv3+: GNU GPL version 3 or later <
    >>>> http://gnu.org/licenses/gpl.html>
    >>>> This is free software: you are free to change and redistribute it.
    >>>> There is NO WARRANTY, to the extent permitted by law.  Type "show
    >>>> copying"
    >>>> and "show warranty" for details.
    >>>> This GDB was configured as "x86_64-redhat-linux-gnu".
    >>>> For bug reporting instructions, please see:
    >>>> <http://www.gnu.org/software/gdb/bugs/>...
    >>>> Reading symbols from
    >>>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    >>>> [New LWP 13219]
    >>>> [New LWP 13222]
    >>>> [Thread debugging using libthread_db enabled]
    >>>> Using host libthread_db library "/lib64/libthread_db.so.1".
    >>>> Core was generated by `./pg_basebackup -v -j 1990 -D
    >>>> /home/edb/Desktop/backup/'.
    >>>> Program terminated with signal 11, Segmentation fault.
    >>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >>>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    >>>> (gdb) bt
    >>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >>>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    >>>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    >>>> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
    >>>> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    >>>> at exit.c:77
    >>>> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at
    >>>> exit.c:99
    >>>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0)
    >>>> at pg_basebackup.c:2811
    >>>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    >>>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
    >>>> pg_basebackup.c:2765
    >>>> (gdb)
    >>>>
    >>>>
    >>>>
    >>>>
    >>>> b) When executing two backups at the same time, a FATAL error due to
    >>>> max_wal_senders was raised and, instead of exiting, the backup
    >>>> completed.
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D
    >>>>  /home/edb/Desktop/backup1/
    >>>> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >>>> mode
    >>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>> complete
    >>>> pg_basebackup: checkpoint completed
    >>>> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
    >>>> pg_basebackup: starting background WAL receiver
    >>>> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
    >>>> pg_basebackup: backup worker (0) created
    >>>> pg_basebackup: backup worker (1) created
    >>>> pg_basebackup: backup worker (2) created
    >>>> pg_basebackup: backup worker (3) created
    >>>> pg_basebackup: backup worker (4) created
    >>>> pg_basebackup: backup worker (5) created
    >>>> pg_basebackup: backup worker (6) created
    >>>> pg_basebackup: error: could not connect to server: FATAL:  number of
    >>>> requested standby connections exceeds max_wal_senders (currently 10)
    >>>> Segmentation fault (core dumped)
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$ gdb pg_basebackup
    >>>> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
    >>>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    >>>> Copyright (C) 2013 Free Software Foundation, Inc.
    >>>> License GPLv3+: GNU GPL version 3 or later <
    >>>> http://gnu.org/licenses/gpl.html>
    >>>> This is free software: you are free to change and redistribute it.
    >>>> There is NO WARRANTY, to the extent permitted by law.  Type "show
    >>>> copying"
    >>>> and "show warranty" for details.
    >>>> This GDB was configured as "x86_64-redhat-linux-gnu".
    >>>> For bug reporting instructions, please see:
    >>>> <http://www.gnu.org/software/gdb/bugs/>...
    >>>> Reading symbols from
    >>>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    >>>> [New LWP 17041]
    >>>> [New LWP 17067]
    >>>> [Thread debugging using libthread_db enabled]
    >>>> Using host libthread_db library "/lib64/libthread_db.so.1".
    >>>> Core was generated by `./pg_basebackup -v -j 8 -D
    >>>> /home/edb/Desktop/backup1/'.
    >>>> Program terminated with signal 11, Segmentation fault.
    >>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >>>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    >>>> (gdb) bt
    >>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >>>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    >>>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    >>>> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
    >>>> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    >>>> at exit.c:77
    >>>> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at
    >>>> exit.c:99
    >>>> #5  0x0000000000408c54 in create_parallel_workers
    >>>> (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
    >>>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    >>>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
    >>>> pg_basebackup.c:2765
    >>>> (gdb)
    >>>>
    >>>>
    >>>>
    >>>>
    >>>> 2) The following bug is not fixed yet
    >>>>
    >>>> A similar case: when the DB server is shut down while the parallel
    >>>> backup is in progress, the correct error is displayed, but the backup
    >>>> folder is not cleaned up, leaving a corrupt backup.
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>>> -j 8
    >>>> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >>>> mode
    >>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>> complete
    >>>> pg_basebackup: checkpoint completed
    >>>> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
    >>>> pg_basebackup: starting background WAL receiver
    >>>> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
    >>>> pg_basebackup: backup worker (0) created
    >>>> pg_basebackup: backup worker (1) created
    >>>> pg_basebackup: backup worker (2) created
    >>>> pg_basebackup: backup worker (3) created
    >>>> pg_basebackup: backup worker (4) created
    >>>> pg_basebackup: backup worker (5) created
    >>>> pg_basebackup: backup worker (6) created
    >>>> pg_basebackup: backup worker (7) created
    >>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>> connection unexpectedly
    >>>> This probably means the server terminated abnormally
    >>>> before or while processing the request.
    >>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>> connection unexpectedly
    >>>> This probably means the server terminated abnormally
    >>>> before or while processing the request.
    >>>> pg_basebackup: removing contents of data directory
    >>>> "/home/edb/Desktop/backup/"
    >>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>> connection unexpectedly
    >>>> This probably means the server terminated abnormally
    >>>> before or while processing the request.
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>>
    >>>>
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >>>> base         pg_hba.conf    pg_logical    pg_notify    pg_serial
    >>>> pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
    >>>> pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
    >>>>  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>>
    >>>>
    >>>>
    >>>>
    >>>> Thanks
    >>>> Kashif Zeeshan
    >>>>
    >>>>>
    >>>>>
    >>>>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
    >>>>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>>>
    >>>>>>
    >>>>>>
    >>>>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
    >>>>>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>>>>
    >>>>>>> Hi Asif
    >>>>>>>
    >>>>>>> When a non-existent slot is used with a tablespace, the correct error
    >>>>>>> is displayed, but the backup folder is not cleaned up, leaving a
    >>>>>>> corrupt backup.
    >>>>>>>
    >>>>>>> Steps
    >>>>>>> =======
    >>>>>>>
    >>>>>>> [edb@localhost bin]$
    >>>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
    >>>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
    >>>>>>> [edb@localhost bin]$
    >>>>>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
    >>>>>>> CREATE TABLESPACE
    >>>>>>> postgres=#
    >>>>>>> postgres=# create table t1 (a int) tablespace tbl1;
    >>>>>>> CREATE TABLE
    >>>>>>> postgres=# insert into t1 values(100);
    >>>>>>> INSERT 0 1
    >>>>>>> postgres=# insert into t1 values(200);
    >>>>>>> INSERT 0 1
    >>>>>>> postgres=# insert into t1 values(300);
    >>>>>>> INSERT 0 1
    >>>>>>> postgres=#
    >>>>>>>
    >>>>>>>
    >>>>>>> [edb@localhost bin]$
    >>>>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
    >>>>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
    >>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>>>> complete
    >>>>>>> pg_basebackup: checkpoint completed
    >>>>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
    >>>>>>> pg_basebackup: starting background WAL receiver
    >>>>>>> pg_basebackup: error: could not send replication command
    >>>>>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
    >>>>>>> pg_basebackup: backup worker (0) created
    >>>>>>> pg_basebackup: backup worker (1) created
    >>>>>>> pg_basebackup: write-ahead log end point: 0/2E000100
    >>>>>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>>>>> pg_basebackup: error: child thread exited with error 1
    >>>>>>> [edb@localhost bin]$
    >>>>>>>
    >>>>>>> backup folder not cleaned
    >>>>>>>
    >>>>>>> [edb@localhost bin]$
    >>>>>>> [edb@localhost bin]$
    >>>>>>> [edb@localhost bin]$
    >>>>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >>>>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
    >>>>>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
    >>>>>>>             postgresql.conf
    >>>>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
    >>>>>>>   pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
    >>>>>>>  postgresql.auto.conf
    >>>>>>> [edb@localhost bin]$
    >>>>>>>
    >>>>>>>
    >>>>>>>
    >>>>>>>
    >>>>>>> If the same case is executed without the parallel backup patch then
    >>>>>>> the backup folder is cleaned after the error is displayed.
    >>>>>>>
    >>>>>>> [edb@localhost bin]$ ./pg_basebackup -v -D
    >>>>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
    >>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>>>> complete
    >>>>>>> pg_basebackup: checkpoint completed
    >>>>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
    >>>>>>> pg_basebackup: starting background WAL receiver
    >>>>>>> pg_basebackup: error: could not send replication command
    >>>>>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
    >>>>>>> pg_basebackup: write-ahead log end point: 0/2B000100
    >>>>>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>>>>> pg_basebackup: error: child process exited with exit code 1
    >>>>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
    >>>>>>> pg_basebackup: changes to tablespace directories will not be undone
    >>>>>>>
    >>>>>>
    >>>>>>
    >>>>>> Hi Asif
    >>>>>>
    >>>>>> A similar case: when the DB server is shut down while the parallel
    >>>>>> backup is in progress, the correct error is displayed, but the backup
    >>>>>> folder is not cleaned up, leaving a corrupt backup. I think one bug
    >>>>>> fix will solve all these cases where cleanup is not done when a
    >>>>>> parallel backup fails.
    >>>>>>
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$  ./pg_basebackup -v -D
    >>>>>>  /home/edb/Desktop/backup/ -j 8
    >>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>>> complete
    >>>>>> pg_basebackup: checkpoint completed
    >>>>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
    >>>>>> pg_basebackup: starting background WAL receiver
    >>>>>> pg_basebackup: created temporary replication slot
    >>>>>> "pg_basebackup_57337"
    >>>>>> pg_basebackup: backup worker (0) created
    >>>>>> pg_basebackup: backup worker (1) created
    >>>>>> pg_basebackup: backup worker (2) created
    >>>>>> pg_basebackup: backup worker (3) created
    >>>>>> pg_basebackup: backup worker (4) created
    >>>>>> pg_basebackup: backup worker (5) created
    >>>>>> pg_basebackup: backup worker (6) created
    >>>>>> pg_basebackup: backup worker (7) created
    >>>>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>>>> connection unexpectedly
    >>>>>> This probably means the server terminated abnormally
    >>>>>> before or while processing the request.
    >>>>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>>>> connection unexpectedly
    >>>>>> This probably means the server terminated abnormally
    >>>>>> before or while processing the request.
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$
    >>>>>>
    >>>>>> When the same case is executed with pg_basebackup without the parallel
    >>>>>> backup patch, proper cleanup is done.
    >>>>>>
    >>>>>> [edb@localhost bin]$
    >>>>>> [edb@localhost bin]$  ./pg_basebackup -v -D
    >>>>>>  /home/edb/Desktop/backup/
    >>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>>> complete
    >>>>>> pg_basebackup: checkpoint completed
    >>>>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
    >>>>>> pg_basebackup: starting background WAL receiver
    >>>>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
    >>>>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>>>> connection unexpectedly
    >>>>>> This probably means the server terminated abnormally
    >>>>>> before or while processing the request.
    >>>>>> pg_basebackup: removing contents of data directory
    >>>>>> "/home/edb/Desktop/backup/"
    >>>>>> [edb@localhost bin]$
    >>>>>>
    >>>>>> Thanks
    >>>>>>
    >>>>>>
    >>>>>>>
    >>>>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
    >>>>>>> wrote:
    >>>>>>>
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
    >>>>>>>> wrote:
    >>>>>>>>
    >>>>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <
    >>>>>>>>> asifr.rehman@gmail.com> wrote:
    >>>>>>>>> >> Why would you need to do that? As long as the process where
    >>>>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
    >>>>>>>>> >
    >>>>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
    >>>>>>>>> not while the backup is
    >>>>>>>>> > in progress. So if the backup is a large one, early error
    >>>>>>>>> detection would be much beneficial.
    >>>>>>>>> > This is the current behavior of non-parallel backup as well.
    >>>>>>>>>
    >>>>>>>>> Because non-parallel backup does not feature early detection of
    >>>>>>>>> this
    >>>>>>>>> error, it is not necessary to make parallel backup do so. Indeed,
    >>>>>>>>> it
    >>>>>>>>> is undesirable. If you want to fix that problem, do it on a
    >>>>>>>>> separate
    >>>>>>>>> thread in a separate patch. A patch proposing to make parallel
    >>>>>>>>> backup
    >>>>>>>>> inconsistent in behavior with non-parallel backup will be
    >>>>>>>>> rejected, at
    >>>>>>>>> least if I have anything to say about it.
    >>>>>>>>>
    >>>>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
    >>>>>>>>> current situation is not great, but promotions ought to be
    >>>>>>>>> relatively
    >>>>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
    >>>>>>>>> also worth considering whether the right fix is to figure out how
    >>>>>>>>> to
    >>>>>>>>> make that case actually work, rather than just making it fail
    >>>>>>>>> quicker.
    >>>>>>>>> I don't currently understand the reason for the prohibition so I
    >>>>>>>>> can't
    >>>>>>>>> express an intelligent opinion on what the right answer is here,
    >>>>>>>>> but
    >>>>>>>>> it seems like it ought to be investigated before somebody goes and
    >>>>>>>>> builds a bunch of infrastructure to make the error more timely.
    >>>>>>>>>
    >>>>>>>>
    >>>>>>>> Non-parallel backup already does the early error checking. I only
    >>>>>>>> intended to make parallel behave the same as non-parallel here. So, I
    >>>>>>>> agree with you that the behavior of parallel backup should be
    >>>>>>>> consistent with the non-parallel one. Please see the code snippet
    >>>>>>>> below from basebackup.c:sendDir():
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> /*
    >>>>>>>>  * Check if the postmaster has signaled us to exit, and abort with an
    >>>>>>>>  * error in that case. The error handler further up will call
    >>>>>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
    >>>>>>>>  * started while still in recovery, the server wasn't promoted.
    >>>>>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
    >>>>>>>>  * the backup early than continue to the end and fail there.
    >>>>>>>>  */
    >>>>>>>> CHECK_FOR_INTERRUPTS();
    >>>>>>>> if (RecoveryInProgress() != backup_started_in_recovery)
    >>>>>>>>     ereport(ERROR,
    >>>>>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    >>>>>>>>              errmsg("the standby was promoted during online backup"),
    >>>>>>>>              errhint("This means that the backup being taken is corrupt "
    >>>>>>>>                      "and should not be used. "
    >>>>>>>>                      "Try taking another online backup.")));
    >>>>>>>>>
    >>>>>>>>>
    >>>>>>>>> > Okay, then I will add the shared state. And since we are adding
    >>>>>>>>> the shared state, we can use
    >>>>>>>>> > that for throttling, progress-reporting and standby early error
    >>>>>>>>> checking.
    >>>>>>>>>
    >>>>>>>>> Please propose a grammar here for all the new replication commands
    >>>>>>>>> you
    >>>>>>>>> plan to add before going and implement everything. That will make
    >>>>>>>>> it
    >>>>>>>>> easier to hash out the design without forcing you to keep changing
    >>>>>>>>> the
    >>>>>>>>> code. Your design should include a sketch of how several sets of
    >>>>>>>>> coordinating backends taking several concurrent parallel backups
    >>>>>>>>> will
    >>>>>>>>> end up with one shared state per parallel backup.
    >>>>>>>>>
    >>>>>>>>> > There are two possible options:
    >>>>>>>>> >
    >>>>>>>>> > (1) Server may generate a unique ID i.e.
    >>>>>>>>> BackupID=<unique_string> OR
    >>>>>>>>> > (2) (Preferred Option) Use the WAL start location as the
    >>>>>>>>> BackupID.
    >>>>>>>>> >
    >>>>>>>>> > This BackupID should be given back as a response to start backup
    >>>>>>>>> command. All client workers
    >>>>>>>>> > must append this ID to all parallel backup replication commands.
    >>>>>>>>> So that we can use this identifier
    >>>>>>>>> > to search for that particular backup. Does that sound good?
    >>>>>>>>>
    >>>>>>>>> Using the WAL start location as the backup ID seems like it might
    >>>>>>>>> be
    >>>>>>>>> problematic -- could a single checkpoint not end up as the start
    >>>>>>>>> location for multiple backups started at the same time? Whether
    >>>>>>>>> that's
    >>>>>>>>> possible now or not, it seems unwise to hard-wire that assumption
    >>>>>>>>> into
    >>>>>>>>> the wire protocol.
    >>>>>>>>>
    >>>>>>>>> I was thinking that perhaps the client should generate a unique
    >>>>>>>>> backup
    >>>>>>>>> ID, e.g. leader does:
    >>>>>>>>>
    >>>>>>>>> START_BACKUP unique_backup_id [options]...
    >>>>>>>>>
    >>>>>>>>> And then others do:
    >>>>>>>>>
    >>>>>>>>> JOIN_BACKUP unique_backup_id
    >>>>>>>>>
    >>>>>>>>> My thought is that you will have a number of shared memory
    >>>>>>>>> structures
    >>>>>>>>> equal to max_wal_senders, each one large enough to hold the shared
    >>>>>>>>> state for one backup. The shared state will include
    >>>>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the
    >>>>>>>>> backup
    >>>>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
    >>>>>>>>> JOIN_BACKUP would search for one by name.
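The allocate-by-name and search-by-name scheme described above can be sketched as a fixed array of slots. This is only an illustration under stated assumptions: BackupSlot, MAX_BACKUP_SLOTS, BACKUP_ID_LEN and the three functions are hypothetical names (standing in for a max_wal_senders-sized array and NAMEDATALEN), the array lives in ordinary process memory rather than shared memory, and the locking a real server would need is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BACKUP_SLOTS 4   /* hypothetical stand-in for max_wal_senders */
#define BACKUP_ID_LEN    64  /* hypothetical stand-in for NAMEDATALEN */

/* One slot per concurrent parallel backup (hypothetical layout). */
typedef struct
{
    int  in_use;
    char backupid[BACKUP_ID_LEN];
} BackupSlot;

static BackupSlot backup_slots[MAX_BACKUP_SLOTS];

/* JOIN_BACKUP side: find the slot holding this backup ID, or NULL. */
static BackupSlot *
backup_slot_join(const char *backupid)
{
    assert(backupid != NULL);
    for (int i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (backup_slots[i].in_use &&
            strcmp(backup_slots[i].backupid, backupid) == 0)
            return &backup_slots[i];
    }
    return NULL;
}

/*
 * START_BACKUP side: claim a free slot and copy the ID into it.
 * Returns NULL if the ID is too long, already present, or no slot is free.
 */
static BackupSlot *
backup_slot_start(const char *backupid)
{
    assert(backupid != NULL);
    if (strlen(backupid) >= BACKUP_ID_LEN || backup_slot_join(backupid) != NULL)
        return NULL;
    for (int i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (!backup_slots[i].in_use)
        {
            backup_slots[i].in_use = 1;
            strcpy(backup_slots[i].backupid, backupid);
            return &backup_slots[i];
        }
    }
    return NULL;
}

/* STOP_BACKUP or abort side: give the slot back. */
static void
backup_slot_release(BackupSlot *slot)
{
    assert(slot != NULL);
    slot->in_use = 0;
    slot->backupid[0] = '\0';
}
```

Rejecting a duplicate ID in backup_slot_start() is a choice made for this sketch; the thread does not say how a collision should be handled.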
    >>>>>>>>>
    >>>>>>>>> If you want to generate the name on the server side, then I suppose
    >>>>>>>>> START_BACKUP would return a result set that includes the backup ID,
    >>>>>>>>> and clients would have to specify that same backup ID when invoking
    >>>>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way
    >>>>>>>>> is
    >>>>>>>>> better. Either way, the backup ID should be something long and
    >>>>>>>>> hard to
    >>>>>>>>> guess, not e.g. the leader processes' PID. I think we should
    >>>>>>>>> generate
    >>>>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode
    >>>>>>>>> the
    >>>>>>>>> result to get a string. That way there's almost no risk of two
    >>>>>>>>> backup
    >>>>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
    >>>>>>>>> user trying to screw up somebody else's parallel backup by
    >>>>>>>>> choosing a
    >>>>>>>>> colliding backup ID, it would be pretty hard to have any success. A
    >>>>>>>>> user with enough access to do that sort of thing can probably
    >>>>>>>>> cause a
    >>>>>>>>> lot worse problems anyway, but it seems pretty easy to guard
    >>>>>>>>> against
    >>>>>>>>> intentional collisions robustly here, so I think we should.
    >>>>>>>>>
    >>>>>>>>>
    >>>>>>>> Okay, so if we add another replication command, ‘JOIN_BACKUP
    >>>>>>>> unique_backup_id’, to let workers find the relevant shared state,
    >>>>>>>> there won't be any need to change the grammar for any other command.
    >>>>>>>> START_BACKUP can return the unique_backup_id in the result set.
    >>>>>>>>
    >>>>>>>> I am thinking of the following struct for shared state:
    >>>>>>>>
    >>>>>>>> typedef struct
    >>>>>>>> {
    >>>>>>>>     char        backupid[NAMEDATALEN];
    >>>>>>>>     XLogRecPtr  startptr;
    >>>>>>>>
    >>>>>>>>     slock_t     lock;
    >>>>>>>>     int64       throttling_counter;
    >>>>>>>>     bool        backup_started_in_recovery;
    >>>>>>>> } BackupSharedState;
    >>>>>>>>>
    >>>>>>>>>
    >>>>>>>> The shared state structure entries would be maintained by a shared
    >>>>>>>> hash table. There will be one structure per parallel backup. Since a
    >>>>>>>> single parallel backup can engage more than one wal sender, I think
    >>>>>>>> max_wal_senders might be a little too much; perhaps
    >>>>>>>> max_wal_senders/2, since there will be at least 2 connections per
    >>>>>>>> parallel backup? Alternatively, we can set a new GUC that defines the
    >>>>>>>> maximum number of concurrent parallel backups, e.g.
    >>>>>>>> ‘max_concurrent_backups_allowed = 10’ perhaps, or we can make it
    >>>>>>>> user-configurable.
    >>>>>>>>
    >>>>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
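A minimal sketch of building such a key: 16 random bytes, hex-encoded into a 32-character string. It is a standalone illustration, not server code. strong_random_standin() is a hypothetical replacement for pg_strong_random() that reads /dev/urandom, and hex_encode_bytes() and generate_backup_id() are names invented here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BACKUP_ID_BYTES 16

/* Hypothetical stand-in for pg_strong_random(): fill buf from /dev/urandom. */
static int
strong_random_standin(void *buf, size_t len)
{
    FILE   *f = fopen("/dev/urandom", "rb");
    size_t  n;

    if (f == NULL)
        return 0;
    n = fread(buf, 1, len, f);
    fclose(f);
    return n == len;
}

/* Hex-encode len bytes into dst; dst must have room for 2 * len + 1 chars. */
static void
hex_encode_bytes(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0F];
    }
    dst[2 * len] = '\0';
}

/* Write a fresh backup ID into dst (2 * BACKUP_ID_BYTES + 1 chars); 1 on success. */
static int
generate_backup_id(char *dst)
{
    unsigned char raw[BACKUP_ID_BYTES];

    if (!strong_random_standin(raw, sizeof(raw)))
        return 0;
    hex_encode_bytes(raw, sizeof(raw), dst);
    return 1;
}
```

With 16 random bytes the resulting string is 32 hex digits, which fits comfortably in a NAMEDATALEN-sized buffer.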
    >>>>>>>>
    >>>>>>>> Checking for Standby Promotion:
    >>>>>>>> At the START_BACKUP command, we initialize
    >>>>>>>> BackupSharedState.backup_started_in_recovery
    >>>>>>>> and keep checking it whenever send_file () is called to send a new
    >>>>>>>> file.
    >>>>>>>>
    >>>>>>>> Throttling:
    >>>>>>>> BackupSharedState.throttling_counter - The throttling logic remains
    >>>>>>>> the same
    >>>>>>>> as for non-parallel backup with the exception that multiple threads
    >>>>>>>> will now be
    >>>>>>>> updating it. So in parallel backup, this will represent the overall
    >>>>>>>> bytes that
    >>>>>>>> have been transferred. So the workers would sleep if they have
    >>>>>>>> exceeded the
    >>>>>>>> limit. Hence, the shared state carries a lock to safely update the
    >>>>>>>> throttling
    >>>>>>>> value atomically.
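The shared throttling counter described above could look roughly like the following. This is a sketch under assumptions: a pthread mutex stands in for slock_t, ThrottleState and both functions are hypothetical names, and the actual sleep is left to the caller, so only the locked accounting step is shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared throttling state; the mutex stands in for slock_t. */
typedef struct
{
    pthread_mutex_t lock;
    int64_t         throttling_counter; /* bytes counted since the last sample */
    int64_t         throttling_sample;  /* bytes allowed per sample interval */
} ThrottleState;

static void
throttle_init(ThrottleState *st, int64_t sample)
{
    assert(st != NULL && sample > 0);
    pthread_mutex_init(&st->lock, NULL);
    st->throttling_counter = 0;
    st->throttling_sample = sample;
}

/*
 * Account nbytes sent by one worker. Returns 1 if this update pushed the
 * shared counter to or past the sample size, meaning the caller should
 * sleep; the counter then keeps only the remainder. Returns 0 otherwise.
 */
static int
throttle_account(ThrottleState *st, int64_t nbytes)
{
    int must_sleep = 0;

    pthread_mutex_lock(&st->lock);
    st->throttling_counter += nbytes;
    if (st->throttling_counter >= st->throttling_sample)
    {
        st->throttling_counter %= st->throttling_sample;
        must_sleep = 1;
    }
    pthread_mutex_unlock(&st->lock);
    return must_sleep;
}
```

Because the read-modify-write happens entirely under the lock, two workers cannot both observe the same pre-update value and lose an increment.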
    >>>>>>>>
    >>>>>>>> Progress Reporting:
    >>>>>>>> I think we should add progress-reporting for parallel backup as a
    >>>>>>>> separate patch. The relevant entries for progress-reporting, such as
    >>>>>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
    >>>>>>>> structure as well.
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> Grammar:
    >>>>>>>> There is a change in the resultset being returned for START_BACKUP
    >>>>>>>> command;
    >>>>>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
    >>>>>>>> command is
    >>>>>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
    >>>>>>>> changes
    >>>>>>>> to the grammar.
    >>>>>>>>
    >>>>>>>> START_BACKUP [LABEL '<label>'] [FAST]
    >>>>>>>>   - returns startptr, tli, backup_label, unique_backup_id
    >>>>>>>> STOP_BACKUP [NOWAIT]
    >>>>>>>>   - returns startptr, tli, backup_label
    >>>>>>>> JOIN_BACKUP ‘unique_backup_id’
    >>>>>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
    >>>>>>>> backend process.
    >>>>>>>>
    >>>>>>>> LIST_TABLESPACES [PROGRESS]
    >>>>>>>> LIST_FILES [TABLESPACE]
    >>>>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
    >>>>>>>>
    >>>>>>>>
    >>>>
    >>>
    >>> Hi,
    >>>
    >>> rebased and updated to the current master (8128b0c1). v13 is attached.
    >>>
    >>> - Fixes the above reported issues.
    >>>
    >>> - Added progress-reporting support for parallel:
    >>> For this, 'backup_streamed' is moved to a shared structure (BackupState)
    >>> as
    >>> pg_atomic_uint64 variable. The worker processes will keep incrementing
    >>> this
    >>> variable.
    >>>
    >>> While files are being transferred from server to client, the main
    >>> process remains in an idle state. So after each increment, the worker
    >>> process will signal the main process to update the stats in the
    >>> pg_stat_progress_basebackup view.
    >>>
    >>> The 'tablespace_streamed' column is not updated and will remain empty.
    >>> This is
    >>> because multiple workers may be copying files from different tablespaces.
    >>>
    >>>
    >>> - Added backup manifest:
    >>> Each backend worker maintains its own manifest file, which contains a
    >>> list of the files transferred by that worker. Once all backup files are
    >>> transferred, the workers will create a temp file named
    >>> ('pg_tempdir/temp_file_prefix_backupid.workerid')
    >>> and write the content of the manifest file from BufFile into it. The
    >>> workers won’t add the header or the WAL information to their manifests.
    >>> These two will be added by the main process while merging all worker
    >>> manifest files.
    >>>
    >>> The main process will read these individual files and concatenate them
    >>> into a single file
    >>> which is then sent back to the client.
    >>>
    >>> The manifest file is created when the following command is received:
    >>>
    >>>>     BUILD_MANIFEST 'backupid'
    >>>
    >>>
    >>> This is a new replication command. It is sent when pg_basebackup has
    >>> copied all the
    >>> $PGDATA files including WAL files.
    >>>
    >>>
    >>>
    >>> --
    >>> Asif Rehman
    >>> Highgo Software (Canada/China/Pakistan)
    >>> URL : www.highgo.ca
    >>>
    >>>
    >>
    >> --
    >> Regards
    >> ====================================
    >> Kashif Zeeshan
    >> Lead Quality Assurance Engineer / Manager
    >>
    >> EnterpriseDB Corporation
    >> The Enterprise Postgres Company
    >>
    >>
    >
    > --
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
    -- 
    Regards
    ====================================
    Kashif Zeeshan
    Lead Quality Assurance Engineer / Manager
    
    EnterpriseDB Corporation
    The Enterprise Postgres Company
    
  88. Re: WIP/PoC for parallel backup

    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-17T07:08:14Z

    On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
    > kashif.zeeshan@enterprisedb.com> wrote:
    >
    >>
    >>
    >> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >>
    >>> Hi,
    >>>
    >>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
    >>>
    >>> I have added the shared state as previously described. The new grammar
    >>> changes
    >>> are as follows:
    >>>
    >>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    >>>     - This will generate a unique backupid using pg_strong_random(16),
    >>>       hex-encode it, and return it in the result set.
    >>>     - It will also create a shared state and add it to the hashtable.
    >>>       The hash table size is set to BACKUP_HASH_SIZE=10, but since the
    >>>       hashtable can expand dynamically, I think that is a sufficient
    >>>       initial size. max_wal_senders is not used, because it can be set
    >>>       to quite a large value.
    >>>
    >>> JOIN_BACKUP 'backup_id'
    >>>     - finds 'backup_id' in hashtable and attaches it to server process.
    >>>
    >>>
    >>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    >>>     - renamed SEND_FILES to SEND_FILE
    >>>     - removed START_WAL_LOCATION from this because 'startptr' is now
    >>> accessible through
    >>>       shared state.
    >>>
    >>> There is no change in other commands:
    >>> STOP_BACKUP [NOWAIT]
    >>> LIST_TABLESPACES [PROGRESS]
    >>> LIST_FILES [TABLESPACE]
    >>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>
    >>> The current patches (v11) have been rebased to the latest master. The
    >>> backup manifest is enabled by default, so I have disabled it for
    >>> parallel backup mode and generate a warning so that the user is aware
    >>> of it and does not expect it in the backup.
    >>>
    >>> Hi Asif
    >>
    >> I have verified the bug fixes. One bug is fixed and now works as
    >> expected.
    >>
    >> While verifying the other bug fixes I ran into the following issues;
    >> please have a look.
    >>
    >>
    >> 1) The bug fixes mentioned below now produce a segmentation fault.
    >>
    >> For reference, I have added only a description of each bug I tried to
    >> verify, since the steps were given in previous emails. A backtrace is
    >> also added for each case; both backtraces point to the same bug.
    >>
    >> a) The backup failed with the error "error: could not connect to server:
    >> could not look up local user ID 1000: Too many open files" when
    >> max_wal_senders was set to 2000.
    >>
    >>
    >> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
    >>  /home/edb/Desktop/backup/
    >> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >> mode
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
    >> pg_basebackup: backup worker (0) created
    >> pg_basebackup: backup worker (1) created
    >> pg_basebackup: backup worker (2) created
    >> pg_basebackup: backup worker (3) created
    >> ….
    >> ….
    >> pg_basebackup: backup worker (1014) created
    >> pg_basebackup: backup worker (1015) created
    >> pg_basebackup: backup worker (1016) created
    >> pg_basebackup: backup worker (1017) created
    >> pg_basebackup: error: could not connect to server: could not look up
    >> local user ID 1000: Too many open files
    >> Segmentation fault
    >> [edb@localhost bin]$
    >>
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ gdb pg_basebackup
    >> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
    >> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    >> Copyright (C) 2013 Free Software Foundation, Inc.
    >> License GPLv3+: GNU GPL version 3 or later <
    >> http://gnu.org/licenses/gpl.html>
    >> This is free software: you are free to change and redistribute it.
    >> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    >> and "show warranty" for details.
    >> This GDB was configured as "x86_64-redhat-linux-gnu".
    >> For bug reporting instructions, please see:
    >> <http://www.gnu.org/software/gdb/bugs/>...
    >> Reading symbols from
    >> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    >> [New LWP 13219]
    >> [New LWP 13222]
    >> [Thread debugging using libthread_db enabled]
    >> Using host libthread_db library "/lib64/libthread_db.so.1".
    >> Core was generated by `./pg_basebackup -v -j 1990 -D
    >> /home/edb/Desktop/backup/'.
    >> Program terminated with signal 11, Segmentation fault.
    >> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    >> (gdb) bt
    >> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    >> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    >> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
    >> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    >> at exit.c:77
    >> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
    >> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0)
    >> at pg_basebackup.c:2811
    >> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    >> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
    >> pg_basebackup.c:2765
    >> (gdb)
    >>
    >>
    >>
    >>
    >> b) When executing two backups at the same time, a FATAL error is raised
    >> due to max_wal_senders, but instead of exiting, the backup completed.
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D
    >>  /home/edb/Desktop/backup1/
    >> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >> mode
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
    >> pg_basebackup: backup worker (0) created
    >> pg_basebackup: backup worker (1) created
    >> pg_basebackup: backup worker (2) created
    >> pg_basebackup: backup worker (3) created
    >> pg_basebackup: backup worker (4) created
    >> pg_basebackup: backup worker (5) created
    >> pg_basebackup: backup worker (6) created
    >> pg_basebackup: error: could not connect to server: FATAL:  number of
    >> requested standby connections exceeds max_wal_senders (currently 10)
    >> Segmentation fault (core dumped)
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ gdb pg_basebackup
    >> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
    >> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
    >> Copyright (C) 2013 Free Software Foundation, Inc.
    >> License GPLv3+: GNU GPL version 3 or later <
    >> http://gnu.org/licenses/gpl.html>
    >> This is free software: you are free to change and redistribute it.
    >> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    >> and "show warranty" for details.
    >> This GDB was configured as "x86_64-redhat-linux-gnu".
    >> For bug reporting instructions, please see:
    >> <http://www.gnu.org/software/gdb/bugs/>...
    >> Reading symbols from
    >> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
    >> [New LWP 17041]
    >> [New LWP 17067]
    >> [Thread debugging using libthread_db enabled]
    >> Using host libthread_db library "/lib64/libthread_db.so.1".
    >> Core was generated by `./pg_basebackup -v -j 8 -D
    >> /home/edb/Desktop/backup1/'.
    >> Program terminated with signal 11, Segmentation fault.
    >> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
    >> (gdb) bt
    >> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
    >> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
    >> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
    >> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
    >> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
    >> at exit.c:77
    >> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
    >> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
    >> at pg_basebackup.c:2811
    >> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
    >> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
    >> pg_basebackup.c:2765
    >> (gdb)
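One plausible reading of this backtrace is that exit() was called while only 7 of the 8 workers existed, so the atexit cleanup passed a never-initialized thread id (threadid=0) to pthread_join(). A minimal sketch of a guard follows, assuming a hypothetical per-worker "started" flag. The Worker struct and the function names start_workers and join_started_workers are illustrative and are not taken from the patch.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one backup worker (not from the patch). */
typedef struct
{
    pthread_t   thread;
    bool        started;    /* true only after pthread_create() succeeded */
} Worker;

/* Stand-in for the real worker body. */
void *
worker_main(void *arg)
{
    (void) arg;
    return NULL;
}

/*
 * Try to start nworkers threads, stopping at the first failure.
 * max_ok simulates a server that refuses connections after max_ok workers.
 * Returns the number of workers actually started.
 */
int
start_workers(Worker *workers, int nworkers, int max_ok)
{
    int         started = 0;

    for (int i = 0; i < nworkers; i++)
        workers[i].started = false;

    for (int i = 0; i < nworkers && i < max_ok; i++)
    {
        if (pthread_create(&workers[i].thread, NULL, worker_main, NULL) != 0)
            break;
        workers[i].started = true;
        started++;
    }
    return started;
}

/* Join only the workers that were really created; returns how many. */
int
join_started_workers(Worker *workers, int nworkers)
{
    int         joined = 0;

    for (int i = 0; i < nworkers; i++)
    {
        if (!workers[i].started)
            continue;           /* never created: nothing to join */
        if (pthread_join(workers[i].thread, NULL) == 0)
            joined++;
        workers[i].started = false; /* a second cleanup call is a no-op */
    }
    return joined;
}
```

Clearing the flag after the join keeps a repeated cleanup harmless, which matters when cleanup can run both explicitly and from an atexit handler.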
    >>
    >>
    >>
    >>
    >> 2) The following bug is not fixed yet
    >>
    >> A similar case is when DB Server is shut down while the Parallel Backup
    >> is in progress then the correct error is displayed but then the backup
    >> folder is not cleaned and leaves a corrupt backup.
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j
    >> 8
    >> pg_basebackup: warning: backup manifest is disabled in parallel backup
    >> mode
    >> pg_basebackup: initiating base backup, waiting for checkpoint to complete
    >> pg_basebackup: checkpoint completed
    >> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
    >> pg_basebackup: starting background WAL receiver
    >> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
    >> pg_basebackup: backup worker (0) created
    >> pg_basebackup: backup worker (1) created
    >> pg_basebackup: backup worker (2) created
    >> pg_basebackup: backup worker (3) created
    >> pg_basebackup: backup worker (4) created
    >> pg_basebackup: backup worker (5) created
    >> pg_basebackup: backup worker (6) created
    >> pg_basebackup: backup worker (7) created
    >> pg_basebackup: error: could not read COPY data: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> pg_basebackup: error: could not read COPY data: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> pg_basebackup: removing contents of data directory
    >> "/home/edb/Desktop/backup/"
    >> pg_basebackup: error: could not read COPY data: server closed the
    >> connection unexpectedly
    >> This probably means the server terminated abnormally
    >> before or while processing the request.
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >>
    >>
    >>
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >> base         pg_hba.conf    pg_logical    pg_notify    pg_serial
    >> pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
    >> pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
    >>  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
    >> [edb@localhost bin]$
    >> [edb@localhost bin]$
    >>
    >>
    >>
    >>
    >> Thanks
    >> Kashif Zeeshan
    >>
    >>>
    >>>
    >>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
    >>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>
    >>>>
    >>>>
    >>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
    >>>> kashif.zeeshan@enterprisedb.com> wrote:
    >>>>
    >>>>> Hi Asif
    >>>>>
    >>>>> When a non-existent slot is used with tablespace then correct error is
    >>>>> displayed but then the backup folder is not cleaned and leaves a corrupt
    >>>>> backup.
    >>>>>
    >>>>> Steps
    >>>>> =======
    >>>>>
    >>>>> edb@localhost bin]$
    >>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
    >>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
    >>>>> [edb@localhost bin]$
    >>>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
    >>>>> CREATE TABLESPACE
    >>>>> postgres=#
    >>>>> postgres=# create table t1 (a int) tablespace tbl1;
    >>>>> CREATE TABLE
    >>>>> postgres=# insert into t1 values(100);
    >>>>> INSERT 0 1
    >>>>> postgres=# insert into t1 values(200);
    >>>>> INSERT 0 1
    >>>>> postgres=# insert into t1 values(300);
    >>>>> INSERT 0 1
    >>>>> postgres=#
    >>>>>
    >>>>>
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
    >>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
    >>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>> complete
    >>>>> pg_basebackup: checkpoint completed
    >>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
    >>>>> pg_basebackup: starting background WAL receiver
    >>>>> pg_basebackup: error: could not send replication command
    >>>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
    >>>>> pg_basebackup: backup worker (0) created
    >>>>> pg_basebackup: backup worker (1) created
    >>>>> pg_basebackup: write-ahead log end point: 0/2E000100
    >>>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>>> pg_basebackup: error: child thread exited with error 1
    >>>>> [edb@localhost bin]$
    >>>>>
    >>>>> backup folder not cleaned
    >>>>>
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$
    >>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
    >>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
    >>>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
    >>>>>             postgresql.conf
    >>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
    >>>>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
    >>>>>  postgresql.auto.conf
    >>>>> [edb@localhost bin]$
    >>>>>
    >>>>>
    >>>>>
    >>>>>
    >>>>> If the same case is executed without the parallel backup patch then
    >>>>> the backup folder is cleaned after the error is displayed.
    >>>>>
    >>>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>>>> -T /home/edb/tbl1=/home/edb/tbl_res -S test999
    >>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>>> complete
    >>>>> pg_basebackup: checkpoint completed
    >>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
    >>>>> pg_basebackup: starting background WAL receiver
    >>>>> pg_basebackup: error: could not send replication command
    >>>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
    >>>>> pg_basebackup: write-ahead log end point: 0/2B000100
    >>>>> pg_basebackup: waiting for background process to finish streaming ...
    >>>>> pg_basebackup: error: child process exited with exit code 1
    >>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
    >>>>> pg_basebackup: changes to tablespace directories will not be undone
    >>>>>
    >>>>
    >>>>
    >>>> Hi Asif
    >>>>
    >>>> A similar case is when DB Server is shut down while the Parallel Backup
    >>>> is in progress then the correct error is displayed but then the backup
    >>>> folder is not cleaned and leaves a corrupt backup. I think one bug fix will
    >>>> solve all these cases where clean up is not done when parallel backup is
    >>>> failed.
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>>> -j 8
    >>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>> complete
    >>>> pg_basebackup: checkpoint completed
    >>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
    >>>> pg_basebackup: starting background WAL receiver
    >>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
    >>>> pg_basebackup: backup worker (0) created
    >>>> pg_basebackup: backup worker (1) created
    >>>> pg_basebackup: backup worker (2) created
    >>>> pg_basebackup: backup worker (3) created
    >>>> pg_basebackup: backup worker (4) created
    >>>> pg_basebackup: backup worker (5) created
    >>>> pg_basebackup: backup worker (6) created
    >>>> pg_basebackup: backup worker (7) created
    >>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>> connection unexpectedly
    >>>> This probably means the server terminated abnormally
    >>>> before or while processing the request.
    >>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>> connection unexpectedly
    >>>> This probably means the server terminated abnormally
    >>>> before or while processing the request.
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$
    >>>>
    >>>> Same case when executed on pg_basebackup without the Parallel backup
    >>>> patch then proper clean up is done.
    >>>>
    >>>> [edb@localhost bin]$
    >>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
    >>>> pg_basebackup: initiating base backup, waiting for checkpoint to
    >>>> complete
    >>>> pg_basebackup: checkpoint completed
    >>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
    >>>> pg_basebackup: starting background WAL receiver
    >>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
    >>>> pg_basebackup: error: could not read COPY data: server closed the
    >>>> connection unexpectedly
    >>>> This probably means the server terminated abnormally
    >>>> before or while processing the request.
    >>>> pg_basebackup: removing contents of data directory
    >>>> "/home/edb/Desktop/backup/"
    >>>> [edb@localhost bin]$
    >>>>
    >>>> Thanks
    >>>>
    >>>>
    >>>>>
    >>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
    >>>>> wrote:
    >>>>>
    >>>>>>
    >>>>>>
    >>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
    >>>>>> wrote:
    >>>>>>
    >>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
    >>>>>>> wrote:
    >>>>>>> >> Why would you need to do that? As long as the process where
    >>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
    >>>>>>> >
    >>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
    >>>>>>> not while the backup is
    >>>>>>> > in progress. So if the backup is a large one, early error
    >>>>>>> detection would be much beneficial.
    >>>>>>> > This is the current behavior of non-parallel backup as well.
    >>>>>>>
    >>>>>>> Because non-parallel backup does not feature early detection of this
    >>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
    >>>>>>> is undesirable. If you want to fix that problem, do it on a separate
    >>>>>>> thread in a separate patch. A patch proposing to make parallel backup
    >>>>>>> inconsistent in behavior with non-parallel backup will be rejected,
    >>>>>>> at
    >>>>>>> least if I have anything to say about it.
    >>>>>>>
    >>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
    >>>>>>> current situation is not great, but promotions ought to be relatively
    >>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
    >>>>>>> also worth considering whether the right fix is to figure out how to
    >>>>>>> make that case actually work, rather than just making it fail
    >>>>>>> quicker.
    >>>>>>> I don't currently understand the reason for the prohibition so I
    >>>>>>> can't
    >>>>>>> express an intelligent opinion on what the right answer is here, but
    >>>>>>> it seems like it ought to be investigated before somebody goes and
    >>>>>>> builds a bunch of infrastructure to make the error more timely.
    >>>>>>>
    >>>>>>
    >>>>>> Non-parallel backup already does the early error checking. I only
    >>>>>> intended to make parallel behave the same as non-parallel here. So, I
    >>>>>> agree with you that the behavior of parallel backup should be
    >>>>>> consistent with the non-parallel one. Please see the code snippet
    >>>>>> below from basebackup.c:sendDir()
    >>>>>>
    >>>>>>
    >>>>>>     /*
    >>>>>>      * Check if the postmaster has signaled us to exit, and abort with an
    >>>>>>      * error in that case. The error handler further up will call
    >>>>>>      * do_pg_abort_backup() for us. Also check that if the backup was
    >>>>>>      * started while still in recovery, the server wasn't promoted.
    >>>>>>      * do_pg_stop_backup() will check that too, but it's better to stop
    >>>>>>      * the backup early than continue to the end and fail there.
    >>>>>>      */
    >>>>>>     CHECK_FOR_INTERRUPTS();
    >>>>>>     if (RecoveryInProgress() != backup_started_in_recovery)
    >>>>>>         ereport(ERROR,
    >>>>>>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    >>>>>>                  errmsg("the standby was promoted during online backup"),
    >>>>>>                  errhint("This means that the backup being taken is corrupt "
    >>>>>>                          "and should not be used. "
    >>>>>>                          "Try taking another online backup.")));
    >>>>>>
    >>>>>>>
    >>>>>>> > Okay, then I will add the shared state. And since we are adding
    >>>>>>> the shared state, we can use
    >>>>>>> > that for throttling, progress-reporting and standby early error
    >>>>>>> checking.
    >>>>>>>
    >>>>>>> Please propose a grammar here for all the new replication commands
    >>>>>>> you
    >>>>>>> plan to add before going and implement everything. That will make it
    >>>>>>> easier to hash out the design without forcing you to keep changing
    >>>>>>> the
    >>>>>>> code. Your design should include a sketch of how several sets of
    >>>>>>> coordinating backends taking several concurrent parallel backups will
    >>>>>>> end up with one shared state per parallel backup.
    >>>>>>>
    >>>>>>> > There are two possible options:
    >>>>>>> >
    >>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string>
    >>>>>>> OR
    >>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
    >>>>>>> >
    >>>>>>> > This BackupID should be given back as a response to start backup
    >>>>>>> command. All client workers
    >>>>>>> > must append this ID to all parallel backup replication commands.
    >>>>>>> So that we can use this identifier
    >>>>>>> > to search for that particular backup. Does that sound good?
    >>>>>>>
    >>>>>>> Using the WAL start location as the backup ID seems like it might be
    >>>>>>> problematic -- could a single checkpoint not end up as the start
    >>>>>>> location for multiple backups started at the same time? Whether
    >>>>>>> that's
    >>>>>>> possible now or not, it seems unwise to hard-wire that assumption
    >>>>>>> into
    >>>>>>> the wire protocol.
    >>>>>>>
    >>>>>>> I was thinking that perhaps the client should generate a unique
    >>>>>>> backup
    >>>>>>> ID, e.g. leader does:
    >>>>>>>
    >>>>>>> START_BACKUP unique_backup_id [options]...
    >>>>>>>
    >>>>>>> And then others do:
    >>>>>>>
    >>>>>>> JOIN_BACKUP unique_backup_id
    >>>>>>>
    >>>>>>> My thought is that you will have a number of shared memory structure
    >>>>>>> equal to max_wal_senders, each one large enough to hold the shared
    >>>>>>> state for one backup. The shared state will include
    >>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
    >>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
    >>>>>>> JOIN_BACKUP would search for one by name.
    >>>>>>>
    >>>>>>> If you want to generate the name on the server side, then I suppose
    >>>>>>> START_BACKUP would return a result set that includes the backup ID,
    >>>>>>> and clients would have to specify that same backup ID when invoking
    >>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
    >>>>>>> better. Either way, the backup ID should be something long and hard
    >>>>>>> to
    >>>>>>> guess, not e.g. the leader processes' PID. I think we should generate
    >>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
    >>>>>>> result to get a string. That way there's almost no risk of two backup
    >>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
    >>>>>>> user trying to screw up somebody else's parallel backup by choosing a
    >>>>>>> colliding backup ID, it would be pretty hard to have any success. A
    >>>>>>> user with enough access to do that sort of thing can probably cause a
    >>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
    >>>>>>> intentional collisions robustly here, so I think we should.
    >>>>>>>
    >>>>>>>
    >>>>>> Okay, so if we add another replication command, ‘JOIN_BACKUP
    >>>>>> unique_backup_id’, to let workers find the relevant shared state,
    >>>>>> there won't be any need to change the grammar of any other command.
    >>>>>> START_BACKUP can return the unique_backup_id in the result set.
    >>>>>>
    >>>>>> I am thinking of the following struct for shared state:
    >>>>>>
    >>>>>>     typedef struct
    >>>>>>     {
    >>>>>>         char        backupid[NAMEDATALEN];
    >>>>>>         XLogRecPtr  startptr;
    >>>>>>
    >>>>>>         slock_t     lock;
    >>>>>>         int64       throttling_counter;
    >>>>>>         bool        backup_started_in_recovery;
    >>>>>>     } BackupSharedState;
    >>>>>>
    >>>>>> The shared state entries would be maintained in a shared hash table,
    >>>>>> with one structure per parallel backup. Since a single parallel
    >>>>>> backup can engage more than one WAL sender, I think max_wal_senders
    >>>>>> entries might be a little too many; perhaps max_wal_senders/2, since
    >>>>>> there will be at least 2 connections per parallel backup?
    >>>>>> Alternatively, we can add a new GUC that defines the maximum number
    >>>>>> of concurrent parallel backups, e.g.
    >>>>>> ‘max_concurent_backups_allowed = 10’, or we can make it
    >>>>>> user-configurable.
    >>>>>>
    >>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
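A minimal sketch of the proposed key: 16 random bytes hex-encoded into a 32-character string. pg_strong_random() is only available inside PostgreSQL, so this standalone version reads /dev/urandom as a stand-in, which is an assumption of the sketch. The names hex_encode_bytes and make_backup_id are illustrative.

```c
#include <stdio.h>
#include <stddef.h>

#define BACKUP_ID_BYTES 16

/* Hex-encode len bytes into out; out must hold 2 * len + 1 chars. */
void
hex_encode_bytes(const unsigned char *src, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++)
    {
        out[2 * i] = digits[src[i] >> 4];
        out[2 * i + 1] = digits[src[i] & 0x0F];
    }
    out[2 * len] = '\0';
}

/*
 * Fill id (at least 2 * BACKUP_ID_BYTES + 1 chars) with a random hex string.
 * Assumption: /dev/urandom stands in for pg_strong_random().
 * Returns 0 on success, -1 if no random bytes could be read.
 */
int
make_backup_id(char *id)
{
    unsigned char raw[BACKUP_ID_BYTES];
    FILE       *f = fopen("/dev/urandom", "rb");
    size_t      n;

    if (f == NULL)
        return -1;
    n = fread(raw, 1, sizeof(raw), f);
    fclose(f);
    if (n != sizeof(raw))
        return -1;

    hex_encode_bytes(raw, sizeof(raw), id);
    return 0;
}
```

The resulting 32 characters plus terminator fit in the proposed char backupid[NAMEDATALEN] field, given that NAMEDATALEN defaults to 64.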
    >>>>>>
    >>>>>> Checking for Standby Promotion:
    >>>>>> At the START_BACKUP command, we initialize
    >>>>>> BackupSharedState.backup_started_in_recovery
    >>>>>> and keep checking it whenever send_file () is called to send a new
    >>>>>> file.
    >>>>>>
    >>>>>> Throttling:
    >>>>>> BackupSharedState.throttling_counter - The throttling logic remains
    >>>>>> the same
    >>>>>> as for non-parallel backup with the exception that multiple threads
    >>>>>> will now be
    >>>>>> updating it. So in parallel backup, this will represent the overall
    >>>>>> bytes that
    >>>>>> have been transferred. So the workers would sleep if they have
    >>>>>> exceeded the
    >>>>>> limit. Hence, the shared state carries a lock to safely update the
    >>>>>> throttling
    >>>>>> value atomically.
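The aggregate throttling described above comes down to one calculation once the shared counter has been read under the lock: how long a worker should sleep so that the combined rate of all workers stays within the limit. The sketch below shows only that calculation. The function name and parameters are hypothetical, and the locking around the shared counter is left out.

```c
#include <stdint.h>

/*
 * Microseconds a worker should sleep so that the aggregate rate of all
 * workers stays at or below max_rate (bytes per second).
 *
 * total_bytes:  bytes sent by all workers since the last counter reset
 * elapsed_usec: time passed since that reset
 */
int64_t
throttle_sleep_usec(int64_t total_bytes, int64_t elapsed_usec, int64_t max_rate)
{
    int64_t     expected_usec;

    if (max_rate <= 0)
        return 0;               /* throttling disabled */

    /*
     * Time the transfer should have taken at exactly max_rate. Whole
     * seconds and the remainder are handled separately to avoid overflow.
     */
    expected_usec = (total_bytes / max_rate) * INT64_C(1000000)
        + (total_bytes % max_rate) * INT64_C(1000000) / max_rate;

    return (expected_usec > elapsed_usec) ? expected_usec - elapsed_usec : 0;
}
```

Because the counter is shared, whichever worker crosses the limit first ends up sleeping, so the limit applies to the backup as a whole and not to each connection.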
    >>>>>>
    >>>>>> Progress Reporting:
    >>>>>> I think we should add progress reporting for parallel backup as a
    >>>>>> separate patch. The relevant entries for progress reporting, such as
    >>>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
    >>>>>> structure as well.
    >>>>>>
    >>>>>>
    >>>>>> Grammar:
    >>>>>> There is a change in the resultset being returned for START_BACKUP
    >>>>>> command;
    >>>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
    >>>>>> command is
    >>>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
    >>>>>> changes
    >>>>>> to the grammar.
    >>>>>>
    >>>>>> START_BACKUP [LABEL '<label>'] [FAST]
    >>>>>>   - returns startptr, tli, backup_label, unique_backup_id
    >>>>>> STOP_BACKUP [NOWAIT]
    >>>>>>   - returns startptr, tli, backup_label
    >>>>>> JOIN_BACKUP ‘unique_backup_id’
    >>>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
    >>>>>> backend process.
    >>>>>>
    >>>>>> LIST_TABLESPACES [PROGRESS]
    >>>>>> LIST_FILES [TABLESPACE]
    >>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
    >>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
    >>>>>>
    >>>>>>
    >>
    >
    > Hi,
    >
    > rebased and updated to the current master (8128b0c1). v13 is attached.
    >
    > - Fixes the above reported issues.
    >
    
    Hi Asif
    
    I have verified the bug fixes. Of the 3 bugs, 2 are now fixed, but the
    following issue is still not fixed.
    
    *When the DB server is shut down while a parallel backup is in progress,
    the correct error is displayed, but the backup folder is not cleaned up
    and a corrupt backup is left behind.*
    
    [edb@localhost bin]$
    [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
    pg_basebackup: warning: backup manifest is disabled in parallel backup mode
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created temporary replication slot "pg_basebackup_16235"
    pg_basebackup: backup worker (0) created
    pg_basebackup: backup worker (1) created
    pg_basebackup: backup worker (2) created
    pg_basebackup: backup worker (3) created
    pg_basebackup: backup worker (4) created
    pg_basebackup: backup worker (5) created
    pg_basebackup: backup worker (6) created
    pg_basebackup: backup worker (7) created
    pg_basebackup: error: could not read COPY data: server closed the
    connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    pg_basebackup: error: could not read COPY data: server closed the
    connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    pg_basebackup: removing contents of data directory
    "/home/edb/Desktop/backup/"
    pg_basebackup: error: could not read COPY data: server closed the
    connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    [edb@localhost bin]$
    [edb@localhost bin]$
    [edb@localhost bin]$
    
    
    
    [edb@localhost bin]$
    [edb@localhost bin]$ ls /home/edb/Desktop/backup
    base         pg_hba.conf    pg_logical    pg_notify    pg_serial
    pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
    pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
     pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
    [edb@localhost bin]$
    [edb@localhost bin]$
    
    
    
    Thanks
    Kashif Zeeshan
    
    
    >
    > - Added progress-reporting support for parallel backup:
    > For this, 'backup_streamed' is moved to a shared structure (BackupState)
    > as a pg_atomic_uint64 variable. The worker processes keep incrementing
    > this variable.
    >
    > While files are being transferred from server to client, the main
    > process remains idle. So after each increment, the worker process
    > signals the main process to update the stats in the
    > pg_stat_progress_basebackup view.
    >
    > The 'tablespaces_streamed' column is not updated and will remain empty,
    > because multiple workers may be copying files from different tablespaces.
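A shared byte counter of the kind described above can be sketched with C11 atomics. Inside the server, PostgreSQL's own atomics API (pg_atomic_uint64 from port/atomics.h) would be used. The SharedProgress struct and the function names here are hypothetical stand-ins.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared BackupState described above. */
typedef struct
{
    atomic_uint_fast64_t backup_streamed;   /* bytes sent by all workers */
} SharedProgress;

/* Reset the counter before any worker starts. */
void
progress_init(SharedProgress *p)
{
    atomic_init(&p->backup_streamed, 0);
}

/* Called by a worker after sending nbytes; returns the new running total. */
uint64_t
progress_add(SharedProgress *p, uint64_t nbytes)
{
    return atomic_fetch_add(&p->backup_streamed, nbytes) + nbytes;
}

/* Called by the main process when it refreshes the progress view. */
uint64_t
progress_read(SharedProgress *p)
{
    return atomic_load(&p->backup_streamed);
}
```

An atomic add avoids taking a lock on every chunk sent, and the main process only needs a consistent snapshot of the total.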
    >
    >
    > - Added backup manifest:
    > Each backend worker maintains its own manifest file, which contains a
    > list of the files transferred by that worker. Once all backup files are
    > transferred, the workers create a temp file
    > ('pg_tempdir/temp_file_prefix_backupid.workerid') and write the content
    > of the manifest from the BufFile to it. The workers add neither the
    > header nor the WAL information to their manifests. These two are added
    > by the main process while merging all worker manifest files.
    >
    > The main process reads these individual files and concatenates them
    > into a single file, which is then sent back to the client.
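The merge step described above can be pictured as plain concatenation: a header, then each worker's fragment, then the WAL information. The sketch below works on in-memory strings instead of temp files to stay self-contained. The function names and the fragment contents are assumptions for illustration and do not reflect the real manifest format.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out at offset *used; returns -1 if it does not fit. */
static int
append_str(char *out, size_t outsize, size_t *used, const char *s)
{
    size_t      len = strlen(s);

    if (*used + len + 1 > outsize)
        return -1;
    memcpy(out + *used, s, len + 1);    /* copies the terminator too */
    *used += len;
    return 0;
}

/*
 * Concatenate header, nparts worker fragments and trailer into out.
 * Returns the merged length, or -1 if out (of size outsize) is too small.
 */
long
merge_manifest(const char *header, const char *const *parts, int nparts,
               const char *trailer, char *out, size_t outsize)
{
    size_t      used = 0;

    if (append_str(out, outsize, &used, header) != 0)
        return -1;
    for (int i = 0; i < nparts; i++)
    {
        if (append_str(out, outsize, &used, parts[i]) != 0)
            return -1;
    }
    if (append_str(out, outsize, &used, trailer) != 0)
        return -1;
    return (long) used;
}
```

Keeping the header and WAL information out of the worker fragments is what makes a simple concatenation sufficient here.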
    >
    > The manifest file is created when the following command is received:
    >
    >>     BUILD_MANIFEST 'backupid'
    >
    >
    > This is a new replication command. It is sent when pg_basebackup has
    > copied all the
    > $PGDATA files including WAL files.
    >
    >
    >
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
    -- 
    Regards
    ====================================
    Kashif Zeeshan
    Lead Quality Assurance Engineer / Manager
    
    EnterpriseDB Corporation
    The Enterprise Postgres Company
    
  89. Re: WIP/PoC for parallel backup

    Amit Kapila <amit.kapila16@gmail.com> — 2020-04-21T04:27:31Z

    On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    > I forgot to make a check for no-manifest. Fixed. Attached is the updated
    > patch.
    >
    >
    Have we done any performance testing with this patch to see the benefits?
    If so, can you point me to the results? If not, then can we perform some
    tests on large backups to see the benefits of this patch/idea?
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
  90. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-21T07:29:47Z

    Hi,
    
    I did some tests a while back, and here are the results. The tests were
    done to simulate a live database environment using pgbench.
    
    Machine configuration used for this test:
    Instance Type      :    t2.xlarge
    Volume Type        :    io1
    Memory (MiB)       :    16384
    vCPU #             :    4
    Architecture       :    x86_64
    IOPS               :    16000
    Database Size (GB) :    102
    
    The setup consists of 3 machines.
    - one for database instances
    - one for pg_basebackup client and
    - one for pgbench with some parallel workers, simulating SELECT loads.
    
    Backup duration (min), with pgbench simulating a SELECT load:
    
    pgbench clients | basebackup | 4 workers | 8 workers | 16 workers
    ----------------+------------+-----------+-----------+-----------
                 50 |      69.25 |     20.44 |     19.86 |      20.15
                100 |     154.75 |     49.28 |     45.27 |      20.35
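A rough way to read the durations above is as speedup over the single-process basebackup run and as efficiency per worker. The two helpers below are only an illustrative calculation with hypothetical names. They say nothing about why the numbers plateau.

```c
/* Speedup of a parallel run relative to the single-process baseline. */
double
backup_speedup(double baseline_min, double parallel_min)
{
    return baseline_min / parallel_min;
}

/* Fraction of ideal linear scaling achieved with the given worker count. */
double
parallel_efficiency(double baseline_min, double parallel_min, int workers)
{
    return backup_speedup(baseline_min, parallel_min) / workers;
}
```

With 50 pgbench clients this gives roughly 3.4x, 3.5x and 3.4x for 4, 8 and 16 workers. With 100 clients it gives roughly 3.1x, 3.4x and 7.6x.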
    
    
    
    On Tue, Apr 21, 2020 at 9:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
    
    > On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    >
    >>
    >> I forgot to make a check for no-manifest. Fixed. Attached is the updated
    >> patch.
    >>
    >>
    > Have we done any performance testing with this patch to see the benefits?
    > If so, can you point me to the results? If not, then can we perform some
    > tests on large backups to see the benefits of this patch/idea?
    >
    > --
    > With Regards,
    > Amit Kapila.
    > EnterpriseDB: http://www.enterprisedb.com
    >
    
    
    -- 
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  91. Re: WIP/PoC for parallel backup

    Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> — 2020-04-21T09:35:38Z

    Hi Asif,
    
    On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    > Hi,
    >
    > I did some tests a while back, and here are the results. The tests were
    > done to simulate
    > a live database environment using pgbench.
    >
    > machine configuration used for this test:
    > Instance Type:    t2.xlarge
    > Volume Type  :    io1
    > Memory (MiB) :    16384
    > vCPU #           :    4
    > Architecture    :    X86_64
    > IOP                 :    16000
    > Database Size (GB) :    102
    >
    > The setup consist of 3 machines.
    > - one for database instances
    > - one for pg_basebackup client and
    > - one for pgbench with some parallel workers, simulating SELECT loads.
    >
    >                                    basebackup | 4 workers | 8 Workers  |
    > 16 workers
    > Backup Duration(Min):       69.25    |  20.44      | 19.86          |
    > 20.15
    > (pgbench running with 50 parallel client simulating SELECT load)
    >
    
    
    Well, that looks a bit strange. The 4, 8 and 16 worker configurations all
    seem to have taken about the same time. Is it because the machine has only
    4 CPUs? In that case, did you try running with 2 workers and comparing
    that with the 4-worker time?
    
    Also, just to clarify and be sure: was there anything else running on any
    of these 3 machines while the backup was in progress?
    
    Regards,
    Jeevan Ladhe
    
    
    > Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
    > (pgbench running with 100 parallel client simulating SELECT load)
    >
    >
    >
    > On Tue, Apr 21, 2020 at 9:27 AM Amit Kapila <amit.kapila16@gmail.com>
    > wrote:
    >
    >> On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >>
    >>>
    >>> I forgot to make a check for no-manifest. Fixed. Attached is the updated
    >>> patch.
    >>>
    >>>
    >> Have we done any performance testing with this patch to see the benefits?
    >> If so, can you point me to the results? If not, then can we perform some
    >> tests on large backups to see the benefits of this patch/idea?
    >>
    >> --
    >> With Regards,
    >> Amit Kapila.
    >> EnterpriseDB: http://www.enterprisedb.com
    >>
    >
    >
    > --
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
  92. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-21T10:16:33Z

    On Tue, 21 Apr 2020 at 2:36 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
    wrote:
    
    > Hi Asif,
    >
    > On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    >
    >> Hi,
    >>
    >> I did some tests a while back, and here are the results. The tests were
    >> done to simulate
    >> a live database environment using pgbench.
    >>
    >> machine configuration used for this test:
    >> Instance Type:    t2.xlarge
    >> Volume Type  :    io1
    >> Memory (MiB) :    16384
    >> vCPU #           :    4
    >> Architecture    :    X86_64
    >> IOP                 :    16000
    >> Database Size (GB) :    102
    >>
    >> The setup consist of 3 machines.
    >> - one for database instances
    >> - one for pg_basebackup client and
    >> - one for pgbench with some parallel workers, simulating SELECT loads.
    >>
    >>                                    basebackup | 4 workers | 8 Workers  |
    >> 16 workers
    >> Backup Duration(Min):       69.25    |  20.44      | 19.86          |
    >> 20.15
    >> (pgbench running with 50 parallel client simulating SELECT load)
    >>
    >
    >
    > Well that looks a bit strange. All 4, 8 and 16 workers backup
    > configurations
    > seem to have taken the same time. Is it because the machine CPUs are
    > only 4? In that case did you try to run with 2-workers and compare that
    > with 4-workers time?
    >
    > Also, just to clarify and be sure - was there anything else running on any
    > of
    > these 3 machines while the backup was in progress.
    >
    
    The tests were performed only with 4, 8 and 16 workers at the time, and
    nothing else was running on any of the machines.
    
    
    > Regards,
    > Jeevan Ladhe
    >
    >
    >> Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
    >> (pgbench running with 100 parallel client simulating SELECT load)
    >>
    >>
    >>
    >> On Tue, Apr 21, 2020 at 9:27 AM Amit Kapila <amit.kapila16@gmail.com>
    >> wrote:
    >>
    >>> On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com>
    >>> wrote:
    >>>
    >>>>
    >>>> I forgot to make a check for no-manifest. Fixed. Attached is the
    >>>> updated patch.
    >>>>
    >>>>
    >>> Have we done any performance testing with this patch to see the
    >>> benefits? If so, can you point me to the results? If not, then can we
    >>> perform some tests on large backups to see the benefits of this patch/idea?
    >>>
    >>> --
    >>> With Regards,
    >>> Amit Kapila.
    >>> EnterpriseDB: http://www.enterprisedb.com
    >>>
    >>
    >>
    >> --
    >> --
    >> Asif Rehman
    >> Highgo Software (Canada/China/Pakistan)
    >> URL : www.highgo.ca
    >>
    >> --
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  93. Re: WIP/PoC for parallel backup

    Amit Kapila <amit.kapila16@gmail.com> — 2020-04-21T11:48:17Z

    On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    >
    > I did some tests a while back, and here are the results. The tests were done to simulate
    > a live database environment using pgbench.
    >
    > machine configuration used for this test:
    > Instance Type:    t2.xlarge
    > Volume Type  :    io1
    > Memory (MiB) :    16384
    > vCPU #           :    4
    > Architecture    :    X86_64
    > IOP                 :    16000
    > Database Size (GB) :    102
    >
    > The setup consist of 3 machines.
    > - one for database instances
    > - one for pg_basebackup client and
    > - one for pgbench with some parallel workers, simulating SELECT loads.
    >
    >                                    basebackup | 4 workers | 8 Workers  | 16 workers
    > Backup Duration(Min):       69.25    |  20.44      | 19.86          | 20.15
    > (pgbench running with 50 parallel client simulating SELECT load)
    >
    > Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
    > (pgbench running with 100 parallel client simulating SELECT load)
    >
    
    Thanks for sharing the results; these show a nice speedup!  However, I
    think we should try to find out what exactly causes this speedup.  In
    the recent discussion on another thread related to this topic, Andres
    pointed out that he doesn't think we can gain much by having multiple
    connections [1].  The speedups we are seeing might be due to some
    internal limitations (like small buffers) [2].  It might help if you
    can share the perf reports of the server side and the pg_basebackup
    side.  We don't need a pgbench-type workload to see what caused the
    speedup.
    
    [1] - https://www.postgresql.org/message-id/20200420201922.55ab7ovg6535suyz%40alap3.anarazel.de
    [2] - https://www.postgresql.org/message-id/20200421064420.z7eattzqbunbutz3%40alap3.anarazel.de
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  94. Re: WIP/PoC for parallel backup

    Amit Kapila <amit.kapila16@gmail.com> — 2020-04-21T11:49:45Z

    On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > >
    > > I did some tests a while back, and here are the results. The tests were done to simulate
    > > a live database environment using pgbench.
    > >
    > > machine configuration used for this test:
    > > Instance Type:    t2.xlarge
    > > Volume Type  :    io1
    > > Memory (MiB) :    16384
    > > vCPU #           :    4
    > > Architecture    :    X86_64
    > > IOP                 :    16000
    > > Database Size (GB) :    102
    > >
    > > The setup consist of 3 machines.
    > > - one for database instances
    > > - one for pg_basebackup client and
    > > - one for pgbench with some parallel workers, simulating SELECT loads.
    > >
    > >                                    basebackup | 4 workers | 8 Workers  | 16 workers
    > > Backup Duration(Min):       69.25    |  20.44      | 19.86          | 20.15
    > > (pgbench running with 50 parallel client simulating SELECT load)
    > >
    > > Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
    > > (pgbench running with 100 parallel client simulating SELECT load)
    > >
    >
    > Thanks for sharing the results, these show nice speedup!  However, I
    > think we should try to find what exactly causes this speed up.  If you
    > see the recent discussion on another thread related to this topic,
    > Andres, pointed out that he doesn't think that we can gain much by
    > having multiple connections[1].  It might be due to some internal
    > limitations (like small buffers) [2] due to which we are seeing these
    > speedups.  It might help if you can share the perf reports of the
    > server-side and pg_basebackup side.
    >
    
    Just to be clear, we need perf reports both with and without patch-set.
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  95. Re: WIP/PoC for parallel backup

    Ahsan Hadi <ahsan.hadi@gmail.com> — 2020-04-21T11:56:16Z

    On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    
    > On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com>
    > wrote:
    > >
    > > On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    > > >
    > > > I did some tests a while back, and here are the results. The tests
    > were done to simulate
    > > > a live database environment using pgbench.
    > > >
    > > > machine configuration used for this test:
    > > > Instance Type:    t2.xlarge
    > > > Volume Type  :    io1
    > > > Memory (MiB) :    16384
    > > > vCPU #           :    4
    > > > Architecture    :    X86_64
    > > > IOP                 :    16000
    > > > Database Size (GB) :    102
    > > >
    > > > The setup consist of 3 machines.
    > > > - one for database instances
    > > > - one for pg_basebackup client and
    > > > - one for pgbench with some parallel workers, simulating SELECT loads.
    > > >
    > > >                                    basebackup | 4 workers | 8 Workers
    > | 16 workers
    > > > Backup Duration(Min):       69.25    |  20.44      | 19.86          |
    > 20.15
    > > > (pgbench running with 50 parallel client simulating SELECT load)
    > > >
    > > > Backup Duration(Min):       154.75   |  49.28     | 45.27         |
    > 20.35
    > > > (pgbench running with 100 parallel client simulating SELECT load)
    > > >
    > >
    > > Thanks for sharing the results, these show nice speedup!  However, I
    > > think we should try to find what exactly causes this speed up.  If you
    > > see the recent discussion on another thread related to this topic,
    > > Andres, pointed out that he doesn't think that we can gain much by
    > > having multiple connections[1].  It might be due to some internal
    > > limitations (like small buffers) [2] due to which we are seeing these
    > > speedups.  It might help if you can share the perf reports of the
    > > server-side and pg_basebackup side.
    > >
    >
    > Just to be clear, we need perf reports both with and without patch-set.
    >
    
    These tests were done a while back. I think it would be good to run the
    benchmark again with the latest parallel backup patches and share the
    results and perf reports.
    
    >
    > --
    > With Regards,
    > Amit Kapila.
    > EnterpriseDB: http://www.enterprisedb.com
    >
    >
    >
    
    -- 
    Highgo Software (Canada/China/Pakistan)
    URL : http://www.highgo.ca
    ADDR: 10318 WHALLEY BLVD, Surrey, BC
    EMAIL: mailto: ahsan.hadi@highgo.ca
    
  96. Re: WIP/PoC for parallel backup

    Amit Kapila <amit.kapila16@gmail.com> — 2020-04-21T14:12:37Z

    On Tue, Apr 21, 2020 at 5:26 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
    >
    > On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >>
    >> On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >> >
    >> > On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    >> > >
    >> > > I did some tests a while back, and here are the results. The tests were done to simulate
    >> > > a live database environment using pgbench.
    >> > >
    >> > > machine configuration used for this test:
    >> > > Instance Type:    t2.xlarge
    >> > > Volume Type  :    io1
    >> > > Memory (MiB) :    16384
    >> > > vCPU #           :    4
    >> > > Architecture    :    X86_64
    >> > > IOP                 :    16000
    >> > > Database Size (GB) :    102
    >> > >
    >> > > The setup consist of 3 machines.
    >> > > - one for database instances
    >> > > - one for pg_basebackup client and
    >> > > - one for pgbench with some parallel workers, simulating SELECT loads.
    >> > >
    >> > >                                    basebackup | 4 workers | 8 Workers  | 16 workers
    >> > > Backup Duration(Min):       69.25    |  20.44      | 19.86          | 20.15
    >> > > (pgbench running with 50 parallel client simulating SELECT load)
    >> > >
    >> > > Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
    >> > > (pgbench running with 100 parallel client simulating SELECT load)
    >> > >
    >> >
    >> > Thanks for sharing the results, these show nice speedup!  However, I
    >> > think we should try to find what exactly causes this speed up.  If you
    >> > see the recent discussion on another thread related to this topic,
    >> > Andres, pointed out that he doesn't think that we can gain much by
    >> > having multiple connections[1].  It might be due to some internal
    >> > limitations (like small buffers) [2] due to which we are seeing these
    >> > speedups.  It might help if you can share the perf reports of the
    >> > server-side and pg_basebackup side.
    >> >
    >>
    >> Just to be clear, we need perf reports both with and without patch-set.
    >
    >
    > These tests were done a while back, I think it would be good to run the benchmark again with the latest patches of parallel backup and share the results and perf reports.
    >
    
    Sounds good. I think we should also run the test with 1 worker.  The
    reason it will be good to see the results with 1 worker is that we can
    learn whether the technique of sending file by file, as is done in this
    patch, is better or worse than the current HEAD code.  So, it will be
    good to see the results for unpatched code, 1 worker, 2 workers,
    4 workers, etc.
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  97. Re: WIP/PoC for parallel backup

    Dipesh Pandit <dipesh.pandit@gmail.com> — 2020-04-22T11:07:25Z

    Hi Asif,
    
    I am reviewing your recent patch and found that it does not apply on the latest master.
    
    Could you please resolve the conflicts and post an updated patch?
    
    Thanks,
    Dipesh
    EnterpriseDB: http://www.enterprisedb.com
  98. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-22T14:18:20Z

    Hi Dipesh,
    
    The rebased and updated patch is attached. It is rebased onto (9f2c4ede).
    
    
    > +typedef struct
    > +{
    > ...
    > +} BackupFile;
    > +
    > +typedef struct
    > +{
    > ...
    > +} BackupState;
    >
    > These structures need comments.
    >
    Done.
    
    
    >
    > +list_wal_files_opt_list:
    > +                       SCONST SCONST
    >                                 {
    > -                                 $$ = makeDefElem("manifest_checksums",
    > -
    > (Node *)makeString($2), -1);
    > +                                       $$ = list_make2(
    > +                                       makeDefElem("start_wal_location",
    > +                                               (Node *)makeString($2),
    > -1),
    > +                                       makeDefElem("end_wal_location",
    > +                                               (Node *)makeString($2),
    > -1));
    > +
    >                                 }
    >
    > This seems like an unnecessarily complicated parse representation. The
    > DefElems seem to be completely unnecessary here.
    >
    
    The startptr and endptr are now in shared state, so this command no longer
    needs these two options. I have removed this rule entirely.
    
    
    > @@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
    >                 set_ps_display(activitymsg);
    >         }
    >
    > -       perform_base_backup(&opt);
    > +       switch (cmd->cmdtag)
    >
    > So the design here is that SendBaseBackup() is now going to do a bunch
    > of things that are NOT sending a base backup? With no updates to the
    > comments of that function and no change to the process title it sets?
    >
    
    Okay. I have renamed the function and have updated the comments.
    
    
    >
    > -       return (manifest->buffile != NULL);
    > +       return (manifest && manifest->buffile != NULL);
    >
    > Heck no. It appears that you didn't even bother reading the function
    > header comment.
    >
    
    Okay, I forgot to remove this check. In the backup manifest patch, the
    manifest_info object is always available. Anyway, I have removed this
    check from the 003 patch as well.
    
    
    >
    > + * Send a single resultset containing XLogRecPtr record (in text format)
    > + * TimelineID and backup label.
    >   */
    >  static void
    > -SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
    > +SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
    > +                                        StringInfo label, char *backupid)
    >
    > This just casually breaks wire protocol compatibility, which seems
    > completely unacceptable.
    >
    
    Non-parallel backup returns startptr and tli in the result set, whereas
    START_BACKUP returns startptr, tli, backup label and backupid. That is
    why I had extended this result set.
    
    I have removed the changes from SendXlogRecPtrResult and have added another
    function just for returning the result set for parallel backup.
    
    
    >
    > +       if (strlen(opt->tablespace) > 0)
    > +               sendTablespace(opt->tablespace, NULL, true, NULL, &files);
    > +       else
    > +               sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
    > +
    > +       SendFilesHeader(files);
    >
    > So I guess the idea here is that we buffer the entire list of files in
    > memory, regardless of size, and then we send it out afterwards. That
    > doesn't seem like a good idea. The list of files might be very large.
    > We probably need some code refactoring here rather than just piling
    > more and more different responsibilities onto sendTablespace() and
    > sendDir().
    >
    
    I don't foresee memory being a challenge here. Assuming a database
    containing 10240 relation files (which can reach up to 10 TB in size),
    the list will occupy approximately 102MB of space in memory. This can
    obviously be reduced, but it doesn’t seem too bad either. One way of
    doing that is to fetch a smaller set of files, with clients requesting
    the next set once the current one is processed; perhaps fetch per
    tablespace initially and request the next one once the current one is
    done.
    
    Currently, basebackup only does compression on the client side, so I
    suggest we stick with the existing behavior. On the other thread, you
    mentioned that the backend should send the tarballs and that the server
    should decide which files go into each tarball. I believe the current
    design can accommodate that easily if it's the client deciding the files
    per tarball. The current design can also accommodate server-side
    compression and encryption with minimal changes. Is there a point I’m
    overlooking here?
    
    
    
    >
    > +       if (state->parallel_mode)
    > +               SpinLockAcquire(&state->lock);
    > +
    > +       state->throttling_counter += increment;
    > +
    > +       if (state->parallel_mode)
    > +               SpinLockRelease(&state->lock);
    >
    > I don't like this much. It seems to me that we would do better to use
    > atomics here all the time, instead of conditional spinlocks.
    >
    
    Okay, I have made throttling_counter atomic. However, a lock is still
    required for throttling_counter %= throttling_sample.
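    
    For reference, the add-and-wrap step can in principle be done without a
    lock by using a compare-and-swap loop. Below is a minimal standalone
    sketch using C11 atomics rather than PostgreSQL's own atomics API.
    ThrottleState and throttle_add are hypothetical names, not taken from
    the patch, and the sketch covers only the counter update, not the sleep
    that throttling performs afterwards.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared backup state; not the patch's struct. */
typedef struct
{
    _Atomic int64_t throttling_counter;
} ThrottleState;

/*
 * Add "increment" to the counter. If the new total reaches "sample", store
 * only the remainder (the "%=" step) and return true so the caller knows it
 * crossed a sample boundary. The add and the wrap happen as one atomic
 * update, so no separate lock is needed for this step.
 */
static bool
throttle_add(ThrottleState *state, int64_t increment, int64_t sample)
{
    int64_t old_value = atomic_load(&state->throttling_counter);
    int64_t new_value;
    bool    crossed;

    assert(sample > 0 && increment >= 0);

    do
    {
        new_value = old_value + increment;
        crossed = (new_value >= sample);
        if (crossed)
            new_value %= sample;
        /* On failure, old_value is refreshed and the loop recomputes. */
    } while (!atomic_compare_exchange_weak(&state->throttling_counter,
                                           &old_value, new_value));

    return crossed;
}
```

    Whether this is preferable to a plain lock depends on what else has to
    happen under the same critical section; the sketch assumes nothing else
    does.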
    
    
    
    
    >
    > +static void
    > +send_file(basebackup_options *opt, char *file, bool missing_ok)
    > ...
    > +       if (file == NULL)
    > +               return;
    >
    > That seems totally inappropriate.
    >
    
    Removed.
    
    
    > +                       sendFile(file, file + basepathlen, &statbuf,
    > true, InvalidOid, NULL, NULL);
    >
    > Maybe I'm misunderstanding, but this looks like it's going to write a
    > tar header, even though we're not writing a tarfile.
    >
    
    sendFile() always sends files with a tar header included, even if the
    backup mode is plain. pg_basebackup also expects the same. That's the
    current behavior of the system.
    
    Otherwise, we would have to duplicate this function, and the copy would
    do pretty much the same thing, minus the tar header.
    
    
    
    >
    > +               else
    > +                       ereport(WARNING,
    > +                                       (errmsg("skipping special file
    > or directory \"%s\"", file)));
    >
    > So, if the user asks for a directory or symlink, what's going to
    > happen is that they're going to receive an empty file, and get a
    > warning. That sounds like terrible behavior.
    >
    
    Removed the warning; an error is now raised if anything other than a
    regular file is requested.
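    
    As an illustration of that kind of check, here is a minimal standalone
    sketch using POSIX lstat(). It is not the patch's code: in the backend
    the failure would be reported through ereport(), and
    require_regular_file is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return 0 if "path" names a regular file, -1 otherwise.
 * lstat() is used so that a symlink is seen as a symlink and not followed.
 */
static int
require_regular_file(const char *path)
{
    struct stat st;

    assert(path != NULL);

    if (lstat(path, &st) != 0)
    {
        fprintf(stderr, "could not stat \"%s\": %s\n", path, strerror(errno));
        return -1;
    }

    if (!S_ISREG(st.st_mode))
    {
        /* Directories, symlinks, devices etc. are rejected outright. */
        fprintf(stderr, "\"%s\" is not a regular file\n", path);
        return -1;
    }

    return 0;
}
```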
    
    
    >
    >
    > +       /*
    > +        * Check for checksum failures. If there are failures across
    > multiple
    > +        * processes it may not report total checksum count, but it will
    > error
    > +        * out,terminating the backup.
    > +        */
    >
    > In other words, the patch breaks the feature. Not that the feature in
    > question works particularly well as things stand, but this makes it
    > worse.
    >
    
    Added an atomic uint64 total_checksum_failures to the shared state to keep
    the total count across workers, so it will have the same behavior as the
    current code.
    
    
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  99. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-04-22T16:27:35Z

    On Wed, Apr 22, 2020 at 10:18 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
    > I don't foresee memory to be a challenge here. Assuming a database containing 10240
    > relation files (that max reach to 10 TB of size), the list will occupy approximately 102MB
    > of space in memory. This obviously can be reduced, but it doesn’t seem too bad either.
    > One way of doing it is by fetching a smaller set of files and clients can result in the next
    > set if the current one is processed; perhaps fetch initially per table space and request for
    > next one once the current one is done with.
    
    The more concerning case is when someone has a lot of small files.
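    
    To put a rough number on that: the figures quoted above (about 102MB
    for 10240 files) imply roughly 10 kB per list entry, taking MB as 10^6
    bytes. Assuming that per-entry cost carries over unchanged, which is an
    assumption and not something measured in this thread, a cluster with
    one million small files would need a list on the order of 10 GB. The
    helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-entry cost implied by a known list size and file count. */
static uint64_t
bytes_per_entry(uint64_t list_bytes, uint64_t nfiles)
{
    assert(nfiles > 0);
    return list_bytes / nfiles;
}

/* Estimated in-memory size of a file list with "nfiles" entries. */
static uint64_t
file_list_bytes(uint64_t nfiles, uint64_t entry_bytes)
{
    return nfiles * entry_bytes;
}
```

    For example, bytes_per_entry(102000000, 10240) gives the per-entry
    figure, and file_list_bytes(1000000, ...) with that figure gives the
    estimate for one million files.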
    
    > Okay have added throttling_counter as atomic. however a lock is still required
    > for  throttling_counter%=throttling_sample.
    
    Well, if you can't get rid of the lock, using atomics is pointless.
    
    >> +                       sendFile(file, file + basepathlen, &statbuf,
    >> true, InvalidOid, NULL, NULL);
    >>
    >> Maybe I'm misunderstanding, but this looks like it's going to write a
    >> tar header, even though we're not writing a tarfile.
    >
    > sendFile() always sends files with tar header included, even if the backup mode
    > is plain. pg_basebackup also expects the same. That's the current behavior of
    > the system.
    >
    > Otherwise, we will have to duplicate this function which would be doing the pretty
    > much same thing, except the tar header.
    
    Well, as I said before, the solution to that problem is refactoring,
    not crummy interfaces. You're never going to persuade any committer
    who understands what that code actually does to commit it.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  100. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-23T06:43:33Z

    On Wed, Apr 22, 2020 at 7:48 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    > Hi Dipesh,
    >
    > The rebased and updated patch is attached. Its rebased to (9f2c4ede).
    >
    
    Make is failing for v15 patch.
    
    gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
    -Wdeclaration-after-statement -Werror=vla -Wendif-labels
    -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
    -g -g -O0 -I. -I. -I../../../src/include  -D_GNU_SOURCE   -c -o
    basebackup.o basebackup.c -MMD -MP -MF .deps/basebackup.Po
    In file included from basebackup.c:33:
    ../../../src/include/replication/backup_manifest.h:37: error: redefinition
    of typedef ‘manifest_info’
    ../../../src/include/replication/basebackup.h:35: note: previous
    declaration of ‘manifest_info’ was here
    make[3]: *** [basebackup.o] Error 1
    make[3]: Leaving directory
    `/home/edb/WORKDB/PG2/postgresql/src/backend/replication'
    make[2]: *** [replication-recursive] Error 2
    
    
    >
    >
    
  101. Re: WIP/PoC for parallel backup

    Asif Rehman <asifr.rehman@gmail.com> — 2020-04-23T08:17:03Z

    On Thu, Apr 23, 2020 at 11:43 AM Rajkumar Raghuwanshi <
    rajkumar.raghuwanshi@enterprisedb.com> wrote:
    
    >
    >
    > On Wed, Apr 22, 2020 at 7:48 PM Asif Rehman <asifr.rehman@gmail.com>
    > wrote:
    >
    >>
    >> Hi Dipesh,
    >>
    >> The rebased and updated patch is attached. Its rebased to (9f2c4ede).
    >>
    >
    > Make is failing for v15 patch.
    >
    > gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
    > -Wdeclaration-after-statement -Werror=vla -Wendif-labels
    > -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
    > -g -g -O0 -I. -I. -I../../../src/include  -D_GNU_SOURCE   -c -o
    > basebackup.o basebackup.c -MMD -MP -MF .deps/basebackup.Po
    > In file included from basebackup.c:33:
    > ../../../src/include/replication/backup_manifest.h:37: error: redefinition
    > of typedef ‘manifest_info’
    > ../../../src/include/replication/basebackup.h:35: note: previous
    > declaration of ‘manifest_info’ was here
    > make[3]: *** [basebackup.o] Error 1
    > make[3]: Leaving directory
    > `/home/edb/WORKDB/PG2/postgresql/src/backend/replication'
    > make[2]: *** [replication-recursive] Error 2
    >
    >
    Just compiled on a clean source tree and it's compiling fine. Can you check
    whether you have a clean source tree?
    
    
    -- 
    --
    Asif Rehman
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    
  102. Re: WIP/PoC for parallel backup

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-23T09:53:05Z

    On Thu, Apr 23, 2020 at 1:47 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    
    >
    >
    > On Thu, Apr 23, 2020 at 11:43 AM Rajkumar Raghuwanshi <
    > rajkumar.raghuwanshi@enterprisedb.com> wrote:
    >
    >>
    >>
    >> On Wed, Apr 22, 2020 at 7:48 PM Asif Rehman <asifr.rehman@gmail.com>
    >> wrote:
    >>
    >>>
    >>> Hi Dipesh,
    >>>
    >>> The rebased and updated patch is attached. Its rebased to (9f2c4ede).
    >>>
    >>
    >> Make is failing for v15 patch.
    >>
    >> gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
    >> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
    >> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
    >> -g -g -O0 -I. -I. -I../../../src/include  -D_GNU_SOURCE   -c -o
    >> basebackup.o basebackup.c -MMD -MP -MF .deps/basebackup.Po
    >> In file included from basebackup.c:33:
    >> ../../../src/include/replication/backup_manifest.h:37: error:
    >> redefinition of typedef ‘manifest_info’
    >> ../../../src/include/replication/basebackup.h:35: note: previous
    >> declaration of ‘manifest_info’ was here
    >> make[3]: *** [basebackup.o] Error 1
    >> make[3]: Leaving directory
    >> `/home/edb/WORKDB/PG2/postgresql/src/backend/replication'
    >> make[2]: *** [replication-recursive] Error 2
    >>
    >>
    > Just compiled on clean source and its compiling fine. Can you see if you
    > have a clean source tree?
    >
    Yeah, my source tree was not clean. My colleague Suraj is also able to compile.
    Thanks, sorry for the noise.
    
    
    >
    >
    > --
    > --
    > Asif Rehman
    > Highgo Software (Canada/China/Pakistan)
    > URL : www.highgo.ca
    >
    >
    
  103. Re: WIP/PoC for parallel backup

    David Zhang <david.zhang@highgo.ca> — 2020-04-27T16:53:16Z

    Hi,
    
    Here are the parallel backup performance test results with and without
    the patch "parallel_backup_v15" in an AWS cloud environment. Two
    "t2.xlarge" machines were used: one for the Postgres server and the other
    for pg_basebackup, both with the machine configuration shown below.
    
    Machine configuration:
         Instance Type        :t2.xlarge
         Volume type          :io1
         Memory (GiB)         :16
         vCPU #               :4
         Architecture         :x86_64
         IOP                  :6000
         Database Size (GB)   :108
    
    Performance test results:
    without patch:
         real 18m49.346s
         user 1m24.178s
         sys 7m2.966s
    
    1 worker with patch:
         real 18m43.201s
         user 1m55.787s
         sys 7m24.724s
    
    2 worker with patch:
         real 18m47.373s
         user 2m22.970s
         sys 11m23.891s
    
    4 worker with patch:
         real 18m46.878s
         user 2m26.791s
         sys 13m14.716s
    
    As requested, I did not have pgbench running in parallel as we did
    in the previous benchmark.
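    
    For comparison, the timings above can be converted to seconds. Wall
    clock is 1129.346 s without the patch versus 1126.878 s with 4 workers,
    a ratio of about 1.002, so essentially unchanged. The user+sys CPU time
    as reported goes from 507.144 s without the patch to 941.507 s with 4
    workers, roughly 1.86 times as much. A small sketch of that arithmetic
    follows; the function names are hypothetical.

```c
#include <assert.h>

/* Convert a "XmY.YYYs" reading into seconds. */
static double
to_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0);
    return minutes * 60.0 + seconds;
}

/* Plain ratio a / b, used both for wall-clock and for CPU-time comparison. */
static double
ratio(double a, double b)
{
    assert(b > 0.0);
    return a / b;
}
```

    For instance, ratio(to_seconds(18, 49.346), to_seconds(18, 46.878))
    compares the unpatched wall-clock time with the 4-worker run.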
    
    The perf report files for both Postgres server and pg_basebackup sides 
    are attached.
    
    The files are listed below, i.e. without the patch (1 worker) and with
    the patch (1, 2 and 4 workers).
    
    perf report on Postgres server side:
         perf.data-postgres-without-parallel_backup_v15.txt
         perf.data-postgres-with-parallel_backup_v15-j1.txt
         perf.data-postgres-with-parallel_backup_v15-j2.txt
         perf.data-postgres-with-parallel_backup_v15-j4.txt
    
    perf report on pg_basebackup side:
         perf.data-pg_basebackup-without-parallel_backup_v15.txt
         perf.data-pg_basebackup-with-parallel_backup_v15-j1.txt
         perf.data-pg_basebackup-with-parallel_backup_v15-j2.txt
         perf.data-pg_basebackup-with-parallel_backup_v15-j4.txt
    
    
    If any more information is required, please let me know.
    
    
    On 2020-04-21 7:12 a.m., Amit Kapila wrote:
    > On Tue, Apr 21, 2020 at 5:26 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
    >> On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >>> On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >>>> On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
    >>>>> I did some tests a while back, and here are the results. The tests were done to simulate
    >>>>> a live database environment using pgbench.
    >>>>>
    >>>>> machine configuration used for this test:
    >>>>> Instance Type:    t2.xlarge
    >>>>> Volume Type  :    io1
    >>>>> Memory (MiB) :    16384
    >>>>> vCPU #           :    4
    >>>>> Architecture    :    X86_64
    >>>>> IOP                 :    16000
    >>>>> Database Size (GB) :    102
    >>>>>
    >>>>> The setup consist of 3 machines.
    >>>>> - one for database instances
    >>>>> - one for pg_basebackup client and
    >>>>> - one for pgbench with some parallel workers, simulating SELECT loads.
    >>>>>
    >>>>>                                     basebackup | 4 workers | 8 Workers  | 16 workers
    >>>>> Backup Duration(Min):       69.25    |  20.44      | 19.86          | 20.15
    >>>>> (pgbench running with 50 parallel client simulating SELECT load)
    >>>>>
    >>>>> Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
    >>>>> (pgbench running with 100 parallel client simulating SELECT load)
    >>>>>
    >>>> Thanks for sharing the results, these show nice speedup!  However, I
    >>>> think we should try to find what exactly causes this speed up.  If you
    >>>> see the recent discussion on another thread related to this topic,
    >>>> Andres, pointed out that he doesn't think that we can gain much by
    >>>> having multiple connections[1].  It might be due to some internal
    >>>> limitations (like small buffers) [2] due to which we are seeing these
    >>>> speedups.  It might help if you can share the perf reports of the
    >>>> server-side and pg_basebackup side.
    >>>>
    >>> Just to be clear, we need perf reports both with and without patch-set.
    >>
    >> These tests were done a while back, I think it would be good to run the benchmark again with the latest patches of parallel backup and share the results and perf reports.
    >>
    > Sounds good. I think we should also try to run the test with 1 worker
    > as well.  The reason it will be good to see the results with 1 worker
    > is that we can know if the technique to send file by file as is done
    > in this patch is better or worse than the current HEAD code.  So, it
    > will be good to see the results of an unpatched code, 1 worker, 2
    > workers, 4 workers, etc.
    >
    -- 
    David
    
    Software Engineer
    Highgo Software Inc. (Canada)
    www.highgo.ca
    
  104. Re: WIP/PoC for parallel backup

    Amit Kapila <amit.kapila16@gmail.com> — 2020-04-28T03:15:55Z

    On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote:
    >
    > Hi,
    >
    > Here is the parallel backup performance test results with and without
    > the patch "parallel_backup_v15" on AWS cloud environment. Two
    > "t2.xlarge" machines were used: one for Postgres server and the other
    > one for pg_basebackup with the same machine configuration showing below.
    >
    > Machine configuration:
    >      Instance Type        :t2.xlarge
    >      Volume type          :io1
    >      Memory (MiB)         :16GB
    >      vCPU #               :4
    >      Architecture         :x86_64
    >      IOP                  :6000
    >      Database Size (GB)   :108
    >
    > Performance test results:
    > without patch:
    >      real 18m49.346s
    >      user 1m24.178s
    >      sys 7m2.966s
    >
    > 1 worker with patch:
    >      real 18m43.201s
    >      user 1m55.787s
    >      sys 7m24.724s
    >
    > 2 worker with patch:
    >      real 18m47.373s
    >      user 2m22.970s
    >      sys 11m23.891s
    >
    > 4 worker with patch:
    >      real 18m46.878s
    >      user 2m26.791s
    >      sys 13m14.716s
    >
    > As required, I didn't have the pgbench running in parallel like we did
    > in the previous benchmark.
    >
    
    So, there doesn't seem to be any significant improvement in this
    scenario.  Now, it is not clear why there was a significant
    improvement in the previous run where pgbench was also running
    simultaneously.  I am not sure but maybe it is because when a lot of
    other backends were running (performing read-only workload) the
    backend that was responsible for doing backup was getting frequently
    scheduled out and it slowed down the overall backup process.  And when
    we start using multiple backends for backup one or other backup
    process is always running making the overall backup faster.  One idea
    to find this out is to check how much time backup takes when we run it
    with and without pgbench workload on HEAD (aka unpatched code).  Even
    if what I am saying is true or there is some other reason due to which
    we are seeing speedup in some cases (where there is a concurrent
    workload), it might not make the case for using multiple backends for
    backup but still, it is good to find that information as it might help
    in designing this feature better.
    
    > The perf report files for both Postgres server and pg_basebackup sides
    > are attached.
    >
    
    It is not clear which functions are taking more time or for which
    functions time is reduced, as function symbols are not present in the
    reports.  I think you can refer to
    "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
    take profiles, and additionally use -fno-omit-frame-pointer during
    configure (you can use CFLAGS="-fno-omit-frame-pointer" during
    configure).
    
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  105. Re: WIP/PoC for parallel backup

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-04-29T12:41:07Z

    Hi,
    
    We at EnterpriseDB did some performance testing of this parallel
    backup patch to check how beneficial it is, and below are the results.
    In this testing, we ran the backup -
    1) Without Asif’s patch
    2) With Asif’s patch and 1, 2, 4 and 8 workers.
    
    We ran those tests on two setups
    
    1) Client and Server both on the same machine (Local backups)
    
    2) Client and server on a different machine (remote backups)
    
    
    *Machine details: *
    
    1: Server (on which local backups were performed and which was used as
    the server for remote backups)
    
    2: Client (Used as a client for remote backups)
    
    
    *Server:*
    
    RAM: 500 GB
    CPU details:
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 128
    On-line CPU(s) list: 0-127
    Thread(s) per core: 2
    Core(s) per socket: 8
    Socket(s): 8
    NUMA node(s): 8
    Filesystem: ext4
    
    
    *Client:*
    RAM: 490 GB
    CPU details:
    Architecture: ppc64le
    Byte Order: Little Endian
    CPU(s): 192
    On-line CPU(s) list: 0-191
    Thread(s) per core: 8
    Core(s) per socket: 1
    Socket(s): 24
    Filesystem: ext4
    
    Below are the results for the local test:
    
    Data size / run                    real         user        sys          % performance increased/decreased vs. normal backup (without patch)

    10 GB (10 tables - each table around 1.05 GB)
      without parallel backup patch    0m27.016s    0m3.378s    0m23.059s    -
      parallel backup with 1 worker    0m30.314s    0m3.575s    0m22.946s    12% decreased
      parallel backup with 2 workers   0m20.400s    0m3.622s    0m29.670s    27% increased
      parallel backup with 4 workers   0m15.331s    0m3.706s    0m39.189s    43% increased
      parallel backup with 8 workers   0m15.094s    0m3.915s    1m23.350s    44% increased

    50 GB (50 tables - each table around 1.05 GB)
      without parallel backup patch    2m11.049s    0m16.464s   2m1.757s     -
      parallel backup with 1 worker    2m26.621s    0m18.497s   2m4.792s     21% decreased
      parallel backup with 2 workers   1m9.581s     0m18.298s   2m12.030s    46% increased
      parallel backup with 4 workers   0m53.894s    0m18.588s   2m47.390s    58% increased
      parallel backup with 8 workers   0m55.373s    0m18.423s   5m57.470s    57% increased

    100 GB (100 tables - each table around 1.05 GB)
      without parallel backup patch    4m4.776s     0m33.699s   3m27.777s    -
      parallel backup with 1 worker    4m20.862s    0m35.753s   3m28.262s    6% decreased
      parallel backup with 2 workers   2m37.411s    0m36.440s   4m16.424s    35% increased
      parallel backup with 4 workers   1m49.503s    0m37.200s   5m58.077s    55% increased
      parallel backup with 8 workers   1m36.762s    0m36.987s   9m36.906s    60% increased

    200 GB (200 tables - each table around 1.05 GB)
      without parallel backup patch    10m34.998s   1m8.471s    7m21.520s    -
      parallel backup with 1 worker    11m30.899s   1m12.933s   8m14.496s    8% decreased
      parallel backup with 2 workers   6m8.481s     1m13.771s   9m31.216s    41% increased
      parallel backup with 4 workers   4m2.403s     1m18.331s   12m29.661s   61% increased
      parallel backup with 8 workers   4m3.768s     1m24.547s   15m21.421s   61% increased
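
    As a minimal sketch, a percentage like the ones in the local results
    can be derived from the "real" times. This assumes the figure is
    computed as (baseline - run) / baseline; the thread does not state the
    exact formula, and the helper names parse_time and percent_change are
    hypothetical, not part of any patch or tool.

```python
import re


def parse_time(text):
    """Convert a shell `time` value such as '4m4.776s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_change(baseline, run):
    """Return the change relative to baseline, in percent.

    A positive value means the run was faster than the baseline
    (performance increased); a negative value means it was slower.
    Assumed formula: (baseline - run) / baseline * 100.
    """
    base_s = parse_time(baseline)
    run_s = parse_time(run)
    return (base_s - run_s) / base_s * 100.0


# 'real' times for the 100GB local test, copied from the results.
print(round(percent_change("4m4.776s", "1m49.503s"), 1))  # without patch vs. 4 workers
print(round(percent_change("4m4.776s", "4m20.862s"), 1))  # without patch vs. 1 worker
```

    Only wall-clock ("real") time is compared here; the "sys" times grow
    with the number of workers and would need a separate comparison.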
    
    Results for the remote test:
    
    Data size / run                    real         user        sys          % performance increased/decreased vs. normal backup (without patch)

    10 GB (10 tables - each table around 1.05 GB)
      without parallel backup patch    1m36.829s    0m2.124s    0m14.004s    -
      parallel backup with 1 worker    1m37.598s    0m3.272s    0m11.110s    0.8% decreased
      parallel backup with 2 workers   1m36.753s    0m2.627s    0m15.312s    0.08% increased
      parallel backup with 4 workers   1m37.212s    0m3.835s    0m13.221s    0.3% decreased
      parallel backup with 8 workers   1m36.977s    0m4.475s    0m17.937s    0.1% decreased

    50 GB (50 tables - each table around 1.05 GB)
      without parallel backup patch    7m54.211s    0m10.826s   1m10.435s    -
      parallel backup with 1 worker    7m55.603s    0m16.535s   1m8.147s     0.2% decreased
      parallel backup with 2 workers   7m53.499s    0m18.131s   1m8.822s     0.1% increased
      parallel backup with 4 workers   7m54.687s    0m15.818s   1m30.991s    0.1% decreased
      parallel backup with 8 workers   7m54.658s    0m20.783s   1m34.460s    0.1% decreased

    100 GB (100 tables - each table around 1.05 GB)
      without parallel backup patch    15m45.776s   0m21.802s   2m59.006s    -
      parallel backup with 1 worker    15m46.315s   0m32.499s   2m47.245s    0.05% decreased
      parallel backup with 2 workers   15m46.065s   0m28.877s   2m21.181s    0.03% decreased
      parallel backup with 4 workers   15m47.793s   0m30.932s   2m36.708s    0.2% decreased
      parallel backup with 8 workers   15m47.129s   0m35.151s   3m23.572s    0.14% decreased

    200 GB (200 tables - each table around 1.05 GB)
      without parallel backup patch    32m55.720s   0m50.602s   5m38.875s    -
      parallel backup with 1 worker    31m30.602s   0m45.377s   4m57.405s    4% increased
      parallel backup with 2 workers   31m30.214s   0m55.023s   5m8.689s     4% increased
      parallel backup with 4 workers   31m31.187s   1m13.390s   5m40.861s    4% increased
      parallel backup with 8 workers   31m31.729s   1m4.955s    6m35.774s    4% decreased
    
    
    With the client and server on the same machine, the results show around
    50% improvement in the parallel runs with 4 and 8 workers.  We don’t see
    any further large improvement as more workers are added.
    
    
    Whereas, when the client and server are on different machines, we don’t
    see any major benefit in performance.  This result matches the testing
    results posted by David Zhang upthread.
    
    
    
    We ran the test for the 100GB backup with 4 parallel workers to see the
    CPU usage and other information. What we noticed is that the server is
    consuming almost 100% CPU the whole time, and pg_stat_activity shows
    that the server is busy with ClientWrite most of the time.
    
    
    Attaching captured output for
    
    1) Top command output on the server, captured every 5 seconds

    2) pg_stat_activity output, captured every 5 seconds

    3) Top command output on the client, captured every 5 seconds
    
    
    Do let me know if anyone has further questions/inputs for the benchmarking.
    
    Thanks to Rushabh Lathia for helping me with this testing.
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  106. Re: WIP/PoC for parallel backup

    David Zhang <david.zhang@highgo.ca> — 2020-04-30T06:26:16Z

    Hi,
    
    Thanks a lot for sharing the test results. Here are our test results
    using perf on three AWS t2.xlarge instances with the configuration below.
    
    Machine configuration:
           Instance Type        :t2.xlarge
           Volume type          :io1
           Memory (MiB)         :16GB
           vCPU #               :4
           Architecture         :x86_64
           IOP                  :6000
           Database Size (GB)   :45 (Server)
    
    case 1: postgres server: without patch and without load
    
    * Disk I/O:
    
    # Samples: 342K of event 'block:block_rq_insert'
    # Event count (approx.): 342834
    #
    # Overhead  Command          Shared Object      Symbol
    # ........  ...............  .................  .....................
    #
         97.65%  postgres         [kernel.kallsyms]  [k] __elv_add_request
          2.27%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
    
    
    * CPU:
    
    # Samples: 6M of event 'cpu-clock'
    # Event count (approx.): 1559444750000
    #
    # Overhead  Command          Shared Object         Symbol
    # ........  ...............  ....................  .............................................
    #
         64.73%  swapper          [kernel.kallsyms]     [k] native_safe_halt
         10.89%  postgres         [vdso]                [.] __vdso_gettimeofday
          5.64%  postgres         [kernel.kallsyms]     [k] do_syscall_64
          5.43%  postgres         libpthread-2.26.so    [.] __libc_recv
          1.72%  postgres         [kernel.kallsyms]     [k] pvclock_clocksource_read
    
    * Network:
    
    # Samples: 2M of event 'skb:consume_skb'
    # Event count (approx.): 2739785
    #
    # Overhead  Command          Shared Object      Symbol
    # ........  ...............  .................  ...........................
    #
         91.58%  swapper          [kernel.kallsyms]  [k] consume_skb
          7.09%  postgres         [kernel.kallsyms]  [k] consume_skb
          0.61%  kswapd0          [kernel.kallsyms]  [k] consume_skb
          0.44%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
    
    
    case 1: pg_basebackup client: without patch and without load
    
    * Disk I/O:
    
    # Samples: 371K of event 'block:block_rq_insert'
    # Event count (approx.): 371362
    #
    # Overhead  Command          Shared Object      Symbol
    # ........  ...............  .................  .....................
    #
         96.78%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
          2.82%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
          0.29%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
          0.09%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
    
    
    * CPU:
    
    # Samples: 3M of event 'cpu-clock'
    # Event count (approx.): 903527000000
    #
    # Overhead  Command          Shared Object       Symbol
    # ........  ...............  ..................  .............................................
    #
         87.99%  swapper          [kernel.kallsyms]   [k] native_safe_halt
          3.14%  swapper          [kernel.kallsyms]   [k] __lock_text_start
          0.48%  swapper          [kernel.kallsyms]   [k] __softirqentry_text_start
          0.37%  pg_basebackup    [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
          0.35%  swapper          [kernel.kallsyms]   [k] do_csum
    
    * Network:
    
    # Samples: 12M of event 'skb:consume_skb'
    # Event count (approx.): 12260713
    #
    # Overhead  Command          Shared Object      Symbol
    # ........  ...............  .................  ...........................
    #
         95.12%  swapper          [kernel.kallsyms]  [k] consume_skb
          3.23%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
          0.83%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
          0.45%  kswapd0          [kernel.kallsyms]  [k] consume_skb
    
    
    case 2: postgres server: with patch and with load, 4 backup workers on 
    client side
    
    * Disk I/O:
    
    # Samples: 3M of event 'block:block_rq_insert'
    # Event count (approx.): 3634542
    #
    # Overhead  Command          Shared Object      Symbol
    # ........  ...............  .................  .....................
    #
         98.88%  postgres         [kernel.kallsyms]  [k] __elv_add_request
          0.66%  perf             [kernel.kallsyms]  [k] __elv_add_request
          0.42%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
          0.01%  sshd             [kernel.kallsyms]  [k] __elv_add_request
    
    * CPU:
    
    # Samples: 9M of event 'cpu-clock'
    # Event count (approx.): 2299129250000
    #
    # Overhead  Command          Shared Object          Symbol
    # ........  ...............  .....................  .............................................
    #
         52.73%  swapper          [kernel.kallsyms]      [k] native_safe_halt
          8.31%  postgres         [vdso]                 [.] __vdso_gettimeofday
          4.46%  postgres         [kernel.kallsyms]      [k] do_syscall_64
          4.16%  postgres         libpthread-2.26.so     [.] __libc_recv
          1.58%  postgres         [kernel.kallsyms]      [k] __lock_text_start
          1.52%  postgres         [kernel.kallsyms]      [k] pvclock_clocksource_read
          0.81%  postgres         [kernel.kallsyms]      [k] copy_user_enhanced_fast_string
    
    
    * Network:
    
    # Samples: 6M of event 'skb:consume_skb'
    # Event count (approx.): 6048795
    #
    # Overhead  Command          Shared Object      Symbol
    # ........  ...............  .................  ...........................
    #
         85.81%  postgres         [kernel.kallsyms]  [k] consume_skb
         12.03%  swapper          [kernel.kallsyms]  [k] consume_skb
          0.97%  postgres         [kernel.kallsyms]  [k] __consume_stateless_skb
          0.85%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
          0.24%  perf             [kernel.kallsyms]  [k] consume_skb
    
    
    case 2: pg_basebackup 4 workers: with patch and with load
    
    * Disk I/O:
    
    # Samples: 372K of event 'block:block_rq_insert'
    # Event count (approx.): 372360
    #
    # Overhead  Command          Shared Object      Symbol
    # ........  ...............  .................  .....................
    #
         97.26%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
          1.45%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
          0.95%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
          0.14%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
    
    
    * CPU:
    
    # Samples: 4M of event 'cpu-clock'
    # Event count (approx.): 1234071000000
    #
    # Overhead  Command          Shared Object             Symbol
    # ........  ...............  ........................  .................................................
    #
         89.25%  swapper          [kernel.kallsyms]         [k] native_safe_halt
          0.93%  pg_basebackup    [kernel.kallsyms]         [k] __lock_text_start
          0.91%  swapper          [kernel.kallsyms]         [k] __lock_text_start
          0.69%  pg_basebackup    [kernel.kallsyms]         [k] copy_user_enhanced_fast_string
          0.45%  swapper          [kernel.kallsyms]         [k] do_csum
    
    
    * Network:
    
    # Samples: 6M of event 'skb:consume_skb'
    # Event count (approx.): 6449013
    #
    # Overhead  Command          Shared Object      Symbol
    # ........  ...............  .................  ...........................
    #
         90.28%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
          9.09%  swapper          [kernel.kallsyms]  [k] consume_skb
          0.29%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
          0.21%  sshd             [kernel.kallsyms]  [k] consume_skb
    
    
    The detailed perf reports are attached and cover different scenarios:
    without patch (with and without load, for server and client) and with
    patch (with and without load, for 1, 2, 4 and 8 workers, for both server
    and client). The file names should be self-explanatory.
    
    Let me know if more information is required.
    
    Best regards,
    
    David
    
    On 2020-04-29 5:41 a.m., Suraj Kharage wrote:
    > Hi,
    >
    > We at EnterpriseDB did some performance testing around this 
    > parallel backup to check how this is beneficial and below are the 
    > results. In this testing, we run the backup -
    > 1) Without Asif’s patch
    > 2) With Asif’s patch and combination of workers 1,2,4,8.
    >
    > We run those test on two setup
    >
    > 1) Client and Server both on the same machine (Local backups)
    >
    > 2) Client and server on a different machine (remote backups)
    >
    >
    > *Machine details: *
    >
    > 1: Server (on which local backups performed and used as server for 
    > remote backups)
    >
    > 2: Client (Used as a client for remote backups)
    >
    >
    > *Server:*
    >
    > RAM:500 GB
    > CPU details:
    > Architecture: x86_64
    > CPU op-mode(s): 32-bit, 64-bit
    > Byte Order: Little Endian
    > CPU(s): 128
    > On-line CPU(s) list: 0-127
    > Thread(s) per core: 2
    > Core(s) per socket: 8
    > Socket(s): 8
    > NUMA node(s): 8
    > Filesystem:ext4
    >
    >
    > *Client:*
    > RAM:490 GB
    > CPU details:
    > Architecture: ppc64le
    > Byte Order: Little Endian
    > CPU(s): 192
    > On-line CPU(s) list: 0-191
    > Thread(s) per core: 8
    > Core(s) per socket: 1
    > Socket(s): 24
    > Filesystem:ext4
    >
    > Below are the results for the local test:
    >
    > Data size 	without paralle backup
    > patch 	parallel backup with
    > 1 worker 	% performance
    > increased/decreased
    > compare to normal
    > backup
    > (without patch) 	parallel backup with
    > 2 worker 	% performance
    > increased/decreased
    > compare to normal
    > backup
    > (without patch) 	parallel backup with
    > 4 worker 	% performance
    > increased/decreased
    > compare to normal
    > backup
    > (without patch) 	parallel backup with
    > 8 worker 	% performance
    > increased/decreased
    > compare to normal
    > backup
    > (without patch)
    > 10 GB
    > (10 tables - each table around 1.05 GB) 	real 0m27.016s
    > user 0m3.378s
    > sys 0m23.059s 	real 0m30.314s
    > user 0m3.575s
    > sys 0m22.946s 	12% performance
    > decreased 	real 0m20.400s
    > user 0m3.622s
    > sys 0m29.670s 	27% performace
    > increased 	real 0m15.331s
    > user 0m3.706s
    > sys 0m39.189s 	43% performance
    > increased 	real 0m15.094s
    > user 0m3.915s
    > sys 1m23.350s 	44% performace
    > increased.
    > 50GB
    > (50 tables - each table around 1.05 GB) 	real 2m11.049s
    > user 0m16.464s
    > sys 2m1.757s 	real 2m26.621s
    > user 0m18.497s
    > sys 2m4.792s 	21% performance
    > decreased 	real 1m9.581s
    > user 0m18.298s
    > sys 2m12.030s 	46% performance
    > increased 	real 0m53.894s
    > user 0m18.588s
    > sys 2m47.390s 	58% performance
    > increased. 	real 0m55.373s
    > user 0m18.423s
    > sys 5m57.470s 	57% performance
    > increased.
    > 100GB
    > (100 tables - each table around 1.05 GB) 	real 4m4.776s
    > user 0m33.699s
    > sys 3m27.777s 	real 4m20.862s
    > user 0m35.753s
    > sys 3m28.262s 	6% performance
    > decreased 	real 2m37.411s
    > user 0m36.440s
    > sys 4m16.424s" 	35% performance
    > increased 	real 1m49.503s
    > user 0m37.200s
    > sys 5m58.077s 	55% performace
    > increased 	real 1m36.762s
    > user 0m36.987s
    > sys 9m36.906s 	60% performace
    > increased.
    > 200GB
    > (200 tables - each table around 1.05 GB) 	real 10m34.998s
    > user 1m8.471s
    > sys 7m21.520s 	real 11m30.899s
    > user 1m12.933s
    > sys 8m14.496s 	8% performance
    > decreased 	real 6m8.481s
    > user 1m13.771s
    > sys 9m31.216s 	41% performance
    > increased 	real 4m2.403s
    > user 1m18.331s
    > sys 12m29.661s 	61% performance
    > increased 	real 4m3.768s
    > user 1m24.547s
    > sys 15m21.421s 	61% performance
    > increased
    >
    >
    > Results for the remote test:
    >
    > Data size 	without paralle backup
    > patch 	parallel backup with
    > 1 worker 	% performance
    > increased/decreased
    > compare to normal
    > backup
    > (without patch) 	parallel backup with
    > 2 worker 	% performance
    > increased/decreased
    > compare to normal
    > backup
    > (without patch) 	parallel backup with
    > 4 worker 	% performance
    > increased/decreased
    > compare to normal
    > backup
    > (without patch) 	parallel backup with
    > 8 worker 	% performance
    > increased/decreased
    > compare to normal
    > backup
    > (without patch)
    > 10 GB
    > (10 tables - each table around 1.05 GB) 	real 1m36.829s
    > user 0m2.124s
    > sys 0m14.004s 	real 1m37.598s
    > user 0m3.272s
    > sys 0m11.110s 	0.8% performance
    > decreased 	real 1m36.753s
    > user 0m2.627s
    > sys 0m15.312s 	0.08% performance
    > increased. 	real 1m37.212s
    > user 0m3.835s
    > sys 0m13.221s 	0.3% performance
    > decreased. 	real 1m36.977s
    > user 0m4.475s
    > sys 0m17.937s 	0.1% perfomance
    > decreased.
    > 50GB
    > (50 tables - each table around 1.05 GB) 	real 7m54.211s
    > user 0m10.826s
    > sys 1m10.435s 	real 7m55.603s
    > user 0m16.535s
    > sys 1m8.147s 	0.2% performance
    > decreased 	real 7m53.499s
    > user 0m18.131s
    > sys 1m8.822s 	0.1% performance
    > increased. 	real 7m54.687s
    > user 0m15.818s
    > sys 1m30.991s 	0.1% performance
    > decreased 	real 7m54.658s
    > user 0m20.783s
    > sys 1m34.460s 	0.1% performance
    > decreased
    > 100GB
    > (100 tables - each table around 1.05 GB) 	real 15m45.776s
    > user 0m21.802s
    > sys 2m59.006s 	real 15m46.315s
    > user 0m32.499s
    > sys 2m47.245s 	0.05% performance
    > decreased 	real 15m46.065s
    > user 0m28.877s
    > sys 2m21.181s 	0.03% performacne
    > drcreased 	real 15m47.793s
    > user 0m30.932s
    > sys 2m36.708s 	0.2% performance
    > decresed 	real 15m47.129s
    > user 0m35.151s
    > sys 3m23.572s 	0.14% performance
    > decreased.
    > 200GB
    > (200 tables - each table around 1.05 GB) 	real 32m55.720s
    > user 0m50.602s
    > sys 5m38.875s 	real 31m30.602s
    > user 0m45.377s
    > sys 4m57.405s 	4% performance
    > increased 	real 31m30.214s
    > user 0m55.023s
    > sys 5m8.689s 	4% performance
    > increased 	real 31m31.187s
    > user 1m13.390s
    > sys 5m40.861s 	4% performance
    > increased 	real 31m31.729s
    > user 1m4.955s
    > sys 6m35.774s 	4% performance
    > decreased
    >
    >
    >
    > Client & Server on the same machine, the result shows around 50% 
    > improvement in parallel run with worker 4 and 8.  We don’t see the 
    > huge performance improvement with more workers been added.
    >
    >
    > Whereas, when the client and server on a different machine, we don’t 
    > see any major benefit in performance.  This testing result matches the 
    > testing results posted by David Zhang up thread.
    >
    >
    >
    > We ran the test for 100GB backup with parallel worker 4 to see the CPU 
    > usage and other information. What we noticed is that server is 
    > consuming the CPU almost 100% whole the time and pg_stat_activity 
    > shows that server is busy with ClientWrite most of the time.
    >
    >
    > Attaching captured output for
    >
    > 1) Top command output on the server after every 5 second
    >
    > 2) pg_stat_activity output after every 5 second
    >
    > 3) Top command output on the client after every 5 second
    >
    >
    > Do let me know if anyone has further questions/inputs for the 
    > benchmarking.
    >
    >
    > Thanks to Rushabh Lathia for helping me with this testing.
    >
    > On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit.kapila16@gmail.com 
    > <mailto:amit.kapila16@gmail.com>> wrote:
    >
    >     On Mon, Apr 27, 2020 at 10:23 PM David Zhang
    >     <david.zhang@highgo.ca <mailto:david.zhang@highgo.ca>> wrote:
    >     >
    >     > Hi,
    >     >
    >     > Here is the parallel backup performance test results with and
    >     without
    >     > the patch "parallel_backup_v15" on AWS cloud environment. Two
    >     > "t2.xlarge" machines were used: one for Postgres server and the
    >     other
    >     > one for pg_basebackup with the same machine configuration
    >     showing below.
    >     >
    >     > Machine configuration:
    >     >      Instance Type        :t2.xlarge
    >     >      Volume type          :io1
    >     >      Memory (MiB)         :16GB
    >     >      vCPU #               :4
    >     >      Architecture         :x86_64
    >     >      IOP                  :6000
    >     >      Database Size (GB)   :108
    >     >
    >     > Performance test results:
    >     > without patch:
    >     >      real 18m49.346s
    >     >      user 1m24.178s
    >     >      sys 7m2.966s
    >     >
    >     > 1 worker with patch:
    >     >      real 18m43.201s
    >     >      user 1m55.787s
    >     >      sys 7m24.724s
    >     >
    >     > 2 worker with patch:
    >     >      real 18m47.373s
    >     >      user 2m22.970s
    >     >      sys 11m23.891s
    >     >
    >     > 4 worker with patch:
    >     >      real 18m46.878s
    >     >      user 2m26.791s
    >     >      sys 13m14.716s
    >     >
    >     > As required, I didn't have the pgbench running in parallel like
    >     we did
    >     > in the previous benchmark.
    >     >
    >
    >     So, there doesn't seem to be any significant improvement in this
    >     scenario.  Now, it is not clear why there was a significant
    >     improvement in the previous run where pgbench was also running
    >     simultaneously.  I am not sure but maybe it is because when a lot of
    >     other backends were running (performing read-only workload) the
    >     backend that was responsible for doing backup was getting frequently
    >     scheduled out and it slowed down the overall backup process. And when
    >     we start using multiple backends for backup one or other backup
    >     process is always running making the overall backup faster. One idea
    >     to find this out is to check how much time backup takes when we run it
    >     with and without pgbench workload on HEAD (aka unpatched code).  Even
    >     if what I am saying is true or there is some other reason due to which
    >     we are seeing speedup in some cases (where there is a concurrent
    >     workload), it might not make the case for using multiple backends for
    >     backup but still, it is good to find that information as it might help
    >     in designing this feature better.
    >
    >     > The perf report files for both Postgres server and pg_basebackup
    >     sides
    >     > are attached.
    >     >
    >
    >     It is not clear which functions are taking more time or for which
    >     functions time is reduced as function symbols are not present in the
    >     reports.  I think you can refer to
    >     "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
    >     take profiles and additionally use -fno-omit-frame-pointer during
    >     configure (you can use CFLAGS="-fno-omit-frame-pointer" during
    >     configure).
    >
    >
    >     -- 
    >     With Regards,
    >     Amit Kapila.
    >     EnterpriseDB: http://www.enterprisedb.com
    >
    >
    >
    >
    > -- 
    > -- 
    >
    > Thanks & Regards,
    > Suraj kharage,
    > EnterpriseDB Corporation,
    > The Postgres Database Company.
    -- 
    David
    
    Software Engineer
    Highgo Software Inc. (Canada)
    www.highgo.ca
    
  107. Re: WIP/PoC for parallel backup

    Sumanta Mukherjee <sumanta.mukherjee@enterprisedb.com> — 2020-04-30T09:18:23Z

    Hi,
    
    Would it be possible to include the absolute numbers from the perf
    reports, so that it is easier to understand the amount of improvement
    with and without the patch, and across different loads and worker counts?
    
    I am also unsure why the swapper is taking such a huge percentage of the
    absolute time in the base run of just the postgres server and
    pg_basebackup client.
    
    With Regards,
    Sumanta Mukherjee.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    On Thu, Apr 30, 2020 at 1:18 PM David Zhang <david.zhang@highgo.ca> wrote:
    
    > Hi,
    >
    > Thanks a lot for sharing the test results. Here are our test results
    > using perf on three AWS t2.xlarge instances with the configuration below.
    >
    > Machine configuration:
    >       Instance Type        :t2.xlarge
    >       Volume type          :io1
    >       Memory               :16GB
    >       vCPU #               :4
    >       Architecture         :x86_64
    >       IOPS                 :6000
    >       Database Size (GB)   :45 (Server)
    >
    > case 1: postgres server: without patch and without load
    >
    > * Disk I/O:
    >
    > # Samples: 342K of event 'block:block_rq_insert'
    > # Event count (approx.): 342834
    > #
    > # Overhead  Command          Shared Object      Symbol
    > # ........  ...............  .................  .....................
    > #
    >     97.65%  postgres         [kernel.kallsyms]  [k] __elv_add_request
    >      2.27%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
    >
    >
    > * CPU:
    >
    > # Samples: 6M of event 'cpu-clock'
    > # Event count (approx.): 1559444750000
    > #
    > # Overhead  Command          Shared Object         Symbol
    > # ........  ...............  ....................  .............................................
    > #
    >     64.73%  swapper          [kernel.kallsyms]     [k] native_safe_halt
    >     10.89%  postgres         [vdso]                [.] __vdso_gettimeofday
    >      5.64%  postgres         [kernel.kallsyms]     [k] do_syscall_64
    >      5.43%  postgres         libpthread-2.26.so    [.] __libc_recv
    >      1.72%  postgres         [kernel.kallsyms]     [k] pvclock_clocksource_read
    >
    > * Network:
    >
    > # Samples: 2M of event 'skb:consume_skb'
    > # Event count (approx.): 2739785
    > #
    > # Overhead  Command          Shared Object      Symbol
    > # ........  ...............  .................  ...........................
    > #
    >     91.58%  swapper          [kernel.kallsyms]  [k] consume_skb
    >      7.09%  postgres         [kernel.kallsyms]  [k] consume_skb
    >      0.61%  kswapd0          [kernel.kallsyms]  [k] consume_skb
    >      0.44%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
    >
    >
    > case 1: pg_basebackup client: without patch and without load
    >
    > * Disk I/O:
    >
    > # Samples: 371K of event 'block:block_rq_insert'
    > # Event count (approx.): 371362
    > #
    > # Overhead  Command          Shared Object      Symbol
    > # ........  ...............  .................  .....................
    > #
    >     96.78%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
    >      2.82%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
    >      0.29%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
    >      0.09%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
    >
    >
    > * CPU:
    >
    > # Samples: 3M of event 'cpu-clock'
    > # Event count (approx.): 903527000000
    > #
    > # Overhead  Command          Shared Object       Symbol
    > # ........  ...............  ..................  .............................................
    > #
    >     87.99%  swapper          [kernel.kallsyms]   [k] native_safe_halt
    >      3.14%  swapper          [kernel.kallsyms]   [k] __lock_text_start
    >      0.48%  swapper          [kernel.kallsyms]   [k] __softirqentry_text_start
    >      0.37%  pg_basebackup    [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
    >      0.35%  swapper          [kernel.kallsyms]   [k] do_csum
    >
    > * Network:
    >
    > # Samples: 12M of event 'skb:consume_skb'
    > # Event count (approx.): 12260713
    > #
    > # Overhead  Command          Shared Object      Symbol
    > # ........  ...............  .................  ...........................
    > #
    >     95.12%  swapper          [kernel.kallsyms]  [k] consume_skb
    >      3.23%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
    >      0.83%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
    >      0.45%  kswapd0          [kernel.kallsyms]  [k] consume_skb
    >
    >
    > case 2: postgres server: with patch and with load, 4 backup workers on
    > client side
    >
    > * Disk I/O:
    >
    > # Samples: 3M of event 'block:block_rq_insert'
    > # Event count (approx.): 3634542
    > #
    > # Overhead  Command          Shared Object      Symbol
    > # ........  ...............  .................  .....................
    > #
    >     98.88%  postgres         [kernel.kallsyms]  [k] __elv_add_request
    >      0.66%  perf             [kernel.kallsyms]  [k] __elv_add_request
    >      0.42%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
    >      0.01%  sshd             [kernel.kallsyms]  [k] __elv_add_request
    >
    > * CPU:
    >
    > # Samples: 9M of event 'cpu-clock'
    > # Event count (approx.): 2299129250000
    > #
    > # Overhead  Command          Shared Object          Symbol
    > # ........  ...............  .....................  .............................................
    > #
    >     52.73%  swapper          [kernel.kallsyms]      [k] native_safe_halt
    >      8.31%  postgres         [vdso]                 [.] __vdso_gettimeofday
    >      4.46%  postgres         [kernel.kallsyms]      [k] do_syscall_64
    >      4.16%  postgres         libpthread-2.26.so     [.] __libc_recv
    >      1.58%  postgres         [kernel.kallsyms]      [k] __lock_text_start
    >      1.52%  postgres         [kernel.kallsyms]      [k] pvclock_clocksource_read
    >      0.81%  postgres         [kernel.kallsyms]      [k] copy_user_enhanced_fast_string
    >
    >
    > * Network:
    >
    > # Samples: 6M of event 'skb:consume_skb'
    > # Event count (approx.): 6048795
    > #
    > # Overhead  Command          Shared Object      Symbol
    > # ........  ...............  .................  ...........................
    > #
    >     85.81%  postgres         [kernel.kallsyms]  [k] consume_skb
    >     12.03%  swapper          [kernel.kallsyms]  [k] consume_skb
    >      0.97%  postgres         [kernel.kallsyms]  [k] __consume_stateless_skb
    >      0.85%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
    >      0.24%  perf             [kernel.kallsyms]  [k] consume_skb
    >
    >
    > case 2: pg_basebackup 4 workers: with patch and with load
    >
    > * Disk I/O:
    >
    > # Samples: 372K of event 'block:block_rq_insert'
    > # Event count (approx.): 372360
    > #
    > # Overhead  Command          Shared Object      Symbol
    > # ........  ...............  .................  .....................
    > #
    >     97.26%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
    >      1.45%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
    >      0.95%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
    >      0.14%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
    >
    >
    > * CPU:
    >
    > # Samples: 4M of event 'cpu-clock'
    > # Event count (approx.): 1234071000000
    > #
    > # Overhead  Command          Shared Object             Symbol
    > # ........  ...............  ........................  .................................................
    > #
    >     89.25%  swapper          [kernel.kallsyms]         [k] native_safe_halt
    >      0.93%  pg_basebackup    [kernel.kallsyms]         [k] __lock_text_start
    >      0.91%  swapper          [kernel.kallsyms]         [k] __lock_text_start
    >      0.69%  pg_basebackup    [kernel.kallsyms]         [k] copy_user_enhanced_fast_string
    >      0.45%  swapper          [kernel.kallsyms]         [k] do_csum
    >
    >
    > * Network:
    >
    > # Samples: 6M of event 'skb:consume_skb'
    > # Event count (approx.): 6449013
    > #
    > # Overhead  Command          Shared Object      Symbol
    > # ........  ...............  .................  ...........................
    > #
    >     90.28%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
    >      9.09%  swapper          [kernel.kallsyms]  [k] consume_skb
    >      0.29%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
    >      0.21%  sshd             [kernel.kallsyms]  [k] consume_skb
    >
    >
    > The detailed perf reports are attached, covering different scenarios, i.e.
    > without patch (with and without load, for server and client) and with patch
    > (with and without load, for 1, 2, 4, 8 workers, for both server and client).
    > The file names should be self-explanatory.
    >
    > Let me know if more information is required.
    >
    > Best regards,
    >
    > David
    > On 2020-04-29 5:41 a.m., Suraj Kharage wrote:
    >
    > Hi,
    >
    > We at EnterpriseDB did some performance testing of this parallel
    > backup patch to check how beneficial it is; the results are below.
    > In this testing, we ran the backup -
    > 1) Without Asif’s patch
    > 2) With Asif’s patch and 1, 2, 4, and 8 workers.
    >
    > We ran those tests on two setups:
    >
    > 1) Client and Server both on the same machine (Local backups)
    >
    > 2) Client and server on a different machine (remote backups)
    >
    >
    > *Machine details: *
    >
    > 1: Server (on which local backups were performed, and which was used as
    > the server for remote backups)
    >
    > 2: Client (Used as a client for remote backups)
    >
    >
    > *Server:*
    > RAM: 500 GB
    > CPU details:
    > Architecture: x86_64
    > CPU op-mode(s): 32-bit, 64-bit
    > Byte Order: Little Endian
    > CPU(s): 128
    > On-line CPU(s) list: 0-127
    > Thread(s) per core: 2
    > Core(s) per socket: 8
    > Socket(s): 8
    > NUMA node(s): 8
    > Filesystem: ext4
    >
    >
    > *Client:*
    > RAM: 490 GB
    > CPU details:
    > Architecture: ppc64le
    > Byte Order: Little Endian
    > CPU(s): 192
    > On-line CPU(s) list: 0-191
    > Thread(s) per core: 8
    > Core(s) per socket: 1
    > Socket(s): 24
    > Filesystem: ext4
    >
    > Below are the results for the local test:
    >
    > Rows are the normal backup (without the parallel backup patch) and the
    > parallel backup with 1, 2, 4, and 8 workers.  Change = % performance
    > increased/decreased compared to the normal backup (without patch).
    >
    > Data size: 10 GB (10 tables - each table around 1.05 GB)
    >
    >   Run            real        user       sys         Change
    >   without patch  0m27.016s   0m3.378s   0m23.059s   -
    >   1 worker       0m30.314s   0m3.575s   0m22.946s   12% decreased
    >   2 workers      0m20.400s   0m3.622s   0m29.670s   27% increased
    >   4 workers      0m15.331s   0m3.706s   0m39.189s   43% increased
    >   8 workers      0m15.094s   0m3.915s   1m23.350s   44% increased
    >
    > Data size: 50GB (50 tables - each table around 1.05 GB)
    >
    >   Run            real        user       sys         Change
    >   without patch  2m11.049s   0m16.464s  2m1.757s    -
    >   1 worker       2m26.621s   0m18.497s  2m4.792s    21% decreased
    >   2 workers      1m9.581s    0m18.298s  2m12.030s   46% increased
    >   4 workers      0m53.894s   0m18.588s  2m47.390s   58% increased
    >   8 workers      0m55.373s   0m18.423s  5m57.470s   57% increased
    >
    > Data size: 100GB (100 tables - each table around 1.05 GB)
    >
    >   Run            real        user       sys         Change
    >   without patch  4m4.776s    0m33.699s  3m27.777s   -
    >   1 worker       4m20.862s   0m35.753s  3m28.262s   6% decreased
    >   2 workers      2m37.411s   0m36.440s  4m16.424s   35% increased
    >   4 workers      1m49.503s   0m37.200s  5m58.077s   55% increased
    >   8 workers      1m36.762s   0m36.987s  9m36.906s   60% increased
    >
    > Data size: 200GB (200 tables - each table around 1.05 GB)
    >
    >   Run            real        user       sys         Change
    >   without patch  10m34.998s  1m8.471s   7m21.520s   -
    >   1 worker       11m30.899s  1m12.933s  8m14.496s   8% decreased
    >   2 workers      6m8.481s    1m13.771s  9m31.216s   41% increased
    >   4 workers      4m2.403s    1m18.331s  12m29.661s  61% increased
    >   8 workers      4m3.768s    1m24.547s  15m21.421s  61% increased
    >
    > Results for the remote test:
    >
    > Rows are the normal backup (without the parallel backup patch) and the
    > parallel backup with 1, 2, 4, and 8 workers.  Change = % performance
    > increased/decreased compared to the normal backup (without patch).
    >
    > Data size: 10 GB (10 tables - each table around 1.05 GB)
    >
    >   Run            real        user       sys         Change
    >   without patch  1m36.829s   0m2.124s   0m14.004s   -
    >   1 worker       1m37.598s   0m3.272s   0m11.110s   0.8% decreased
    >   2 workers      1m36.753s   0m2.627s   0m15.312s   0.08% increased
    >   4 workers      1m37.212s   0m3.835s   0m13.221s   0.3% decreased
    >   8 workers      1m36.977s   0m4.475s   0m17.937s   0.1% decreased
    >
    > Data size: 50GB (50 tables - each table around 1.05 GB)
    >
    >   Run            real        user       sys         Change
    >   without patch  7m54.211s   0m10.826s  1m10.435s   -
    >   1 worker       7m55.603s   0m16.535s  1m8.147s    0.2% decreased
    >   2 workers      7m53.499s   0m18.131s  1m8.822s    0.1% increased
    >   4 workers      7m54.687s   0m15.818s  1m30.991s   0.1% decreased
    >   8 workers      7m54.658s   0m20.783s  1m34.460s   0.1% decreased
    >
    > Data size: 100GB (100 tables - each table around 1.05 GB)
    >
    >   Run            real        user       sys         Change
    >   without patch  15m45.776s  0m21.802s  2m59.006s   -
    >   1 worker       15m46.315s  0m32.499s  2m47.245s   0.05% decreased
    >   2 workers      15m46.065s  0m28.877s  2m21.181s   0.03% decreased
    >   4 workers      15m47.793s  0m30.932s  2m36.708s   0.2% decreased
    >   8 workers      15m47.129s  0m35.151s  3m23.572s   0.14% decreased
    >
    > Data size: 200GB (200 tables - each table around 1.05 GB)
    >
    >   Run            real        user       sys         Change
    >   without patch  32m55.720s  0m50.602s  5m38.875s   -
    >   1 worker       31m30.602s  0m45.377s  4m57.405s   4% increased
    >   2 workers      31m30.214s  0m55.023s  5m8.689s    4% increased
    >   4 workers      31m31.187s  1m13.390s  5m40.861s   4% increased
    >   8 workers      31m31.729s  1m4.955s   6m35.774s   4% decreased
    >
    >
    > With the client and server on the same machine, the results show around
    > 50% improvement in the parallel runs with 4 and 8 workers.  We don’t see
    > a large additional improvement as more workers are added.
    >
    >
    > Whereas, when the client and server are on different machines, we don’t see
    > any major benefit in performance.  This matches the test results posted
    > by David Zhang upthread.
    >
    >
    >
    > We ran the test for a 100GB backup with 4 parallel workers to see the CPU
    > usage and other information. What we noticed is that the server is consuming
    > almost 100% CPU the whole time, and pg_stat_activity shows that the server
    > is busy with ClientWrite most of the time.
    >
    >
    > Attaching captured output for
    >
    > 1) Top command output on the server, captured every 5 seconds
    >
    > 2) pg_stat_activity output, captured every 5 seconds
    >
    > 3) Top command output on the client, captured every 5 seconds
    >
    >
    > Do let me know if anyone has further questions/inputs for the
    > benchmarking.
    >
    > Thanks to Rushabh Lathia for helping me with this testing.
    >
    > On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit.kapila16@gmail.com>
    > wrote:
    >
    >> On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca>
    >> wrote:
    >> >
    >> > Hi,
    >> >
    >> > Here are the parallel backup performance test results with and without
    >> > the patch "parallel_backup_v15" on an AWS cloud environment. Two
    >> > "t2.xlarge" machines were used: one for the Postgres server and the other
    >> > for pg_basebackup, with the same machine configuration shown below.
    >> >
    >> > Machine configuration:
    >> >      Instance Type        :t2.xlarge
    >> >      Volume type          :io1
    >> >      Memory               :16GB
    >> >      vCPU #               :4
    >> >      Architecture         :x86_64
    >> >      IOPS                 :6000
    >> >      Database Size (GB)   :108
    >> >
    >> > Performance test results:
    >> > without patch:
    >> >      real 18m49.346s
    >> >      user 1m24.178s
    >> >      sys 7m2.966s
    >> >
    >> > 1 worker with patch:
    >> >      real 18m43.201s
    >> >      user 1m55.787s
    >> >      sys 7m24.724s
    >> >
    >> > 2 worker with patch:
    >> >      real 18m47.373s
    >> >      user 2m22.970s
    >> >      sys 11m23.891s
    >> >
    >> > 4 worker with patch:
    >> >      real 18m46.878s
    >> >      user 2m26.791s
    >> >      sys 13m14.716s
    >> >
    >> > As required, I didn't have the pgbench running in parallel like we did
    >> > in the previous benchmark.
    >> >
    >>
    >> So, there doesn't seem to be any significant improvement in this
    >> scenario.  Now, it is not clear why there was a significant
    >> improvement in the previous run where pgbench was also running
    >> simultaneously.  I am not sure but maybe it is because when a lot of
    >> other backends were running (performing read-only workload) the
    >> backend that was responsible for doing backup was getting frequently
    >> scheduled out and it slowed down the overall backup process.  And when
    >> we start using multiple backends for backup one or other backup
    >> process is always running making the overall backup faster.  One idea
    >> to find this out is to check how much time backup takes when we run it
    >> with and without pgbench workload on HEAD (aka unpatched code).  Even
    >> if what I am saying is true or there is some other reason due to which
    >> we are seeing speedup in some cases (where there is a concurrent
    >> workload), it might not make the case for using multiple backends for
    >> backup but still, it is good to find that information as it might help
    >> in designing this feature better.
    >>
    >> > The perf report files for both Postgres server and pg_basebackup sides
    >> > are attached.
    >> >
    >>
    >> It is not clear which functions are taking more time or for which
    >> functions time is reduced as function symbols are not present in the
    >> reports.  I think you can refer to
    >> "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
    >> take profiles and additionally use -fno-omit-frame-pointer during
    >> configure (you can use CFLAGS="-fno-omit-frame-pointer" during
    >> configure).
    >>
    >>
    >> --
    >> With Regards,
    >> Amit Kapila.
    >> EnterpriseDB: http://www.enterprisedb.com
    >>
    >>
    >>
    >
    > --
    > --
    >
    > Thanks & Regards,
    > Suraj kharage,
    > EnterpriseDB Corporation,
    > The Postgres Database Company.
    >
    > --
    > David
    >
    > Software Engineer
    > Highgo Software Inc. (Canada)
    > www.highgo.ca
    >
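    
    The "% performance" figures in the quoted results can be recomputed from
    the "real" timings. A minimal sketch, assuming the percentage is the
    reduction in elapsed time relative to the unpatched run (the quoted
    message does not state the formula, so this is an assumption); the helper
    names parse_time and percent_faster are illustrative only:

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` value such as '4m4.776s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_faster(baseline: str, candidate: str) -> float:
    """Reduction in elapsed time relative to the baseline, in percent.

    Positive means the candidate run was faster than the baseline.
    """
    base = parse_time(baseline)
    cand = parse_time(candidate)
    return (base - cand) / base * 100.0


# 'real' timings of the 100GB local test, copied from the quoted results.
baseline_real = "4m4.776s"
workers_real = {1: "4m20.862s", 2: "2m37.411s", 4: "1m49.503s", 8: "1m36.762s"}

for workers, real in workers_real.items():
    change = percent_faster(baseline_real, real)
    print(f"{workers} worker(s): {change:+.1f}% elapsed-time change vs. unpatched")
```

    With this definition the 100GB local rows come out close to the quoted
    6% / 35% / 55% / 60% figures, which suggests the quoted percentages were
    derived in a similar way.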
    
  108. Re: WIP/PoC for parallel backup

    Amit Kapila <amit.kapila16@gmail.com> — 2020-04-30T10:45:13Z

    On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
    <suraj.kharage@enterprisedb.com> wrote:
    >
    > Hi,
    >
    > We at EnterpriseDB did some performance testing of this parallel backup patch to check how beneficial it is; the results are below. In this testing, we ran the backup -
    > 1) Without Asif’s patch
    > 2) With Asif’s patch and 1, 2, 4, and 8 workers.
    >
    > We ran those tests on two setups:
    >
    > 1) Client and Server both on the same machine (Local backups)
    >
    > 2) Client and server on a different machine (remote backups)
    >
    >
    > Machine details:
    >
    > 1: Server (on which local backups were performed, and which was used as the server for remote backups)
    >
    > 2: Client (Used as a client for remote backups)
    >
    >
    ...
    >
    >
    > With the client and server on the same machine, the results show around 50% improvement in the parallel runs with 4 and 8 workers.  We don’t see a large additional improvement as more workers are added.
    >
    >
    > Whereas, when the client and server are on different machines, we don’t see any major benefit in performance.  This matches the test results posted by David Zhang upthread.
    >
    >
    >
    > We ran the test for a 100GB backup with 4 parallel workers to see the CPU usage and other information. What we noticed is that the server is consuming almost 100% CPU the whole time, and pg_stat_activity shows that the server is busy with ClientWrite most of the time.
    >
    >
    
    Was this for a setup where the client and server were on the same
    machine or where the client was on a different machine?  If it was for
    the case where both are on the same machine, then ideally, we should
    see ClientRead events in a similar proportion?
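    
    One way to compare the ClientWrite and ClientRead proportions is to
    tally periodic pg_stat_activity samples. The sketch below assumes the
    samples were collected with a query like the one shown (restricting to
    backend_type = 'walsender' is an assumption about how the backup
    connections appear); the sample rows are made up purely to show the
    input shape and are not measurements from this thread:

```python
from collections import Counter

# Query that could be run periodically (for example every 5 seconds)
# while the backup is in progress. Shown for context only; it is not
# executed by this sketch.
SAMPLE_QUERY = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'walsender'
"""


def wait_event_share(samples):
    """Return {(wait_event_type, wait_event): percent of samples}.

    A NULL wait event (None here) means the backend was not waiting
    when it was sampled.
    """
    counts = Counter(
        (event_type or "-", event or "not waiting")
        for event_type, event in samples
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: 100.0 * count / total for key, count in counts.items()}


# Hypothetical samples, NOT real measurements.
made_up_samples = (
    [("Client", "ClientWrite")] * 7
    + [("Client", "ClientRead")] * 2
    + [(None, None)]
)

shares = wait_event_share(made_up_samples)
for (event_type, event), share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{event_type:<8} {event:<12} {share:5.1f}%")
```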
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  109. Re: WIP/PoC for parallel backup

    Amit Kapila <amit.kapila16@gmail.com> — 2020-04-30T13:09:36Z

    On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
    > <suraj.kharage@enterprisedb.com> wrote:
    > >
    > > Hi,
    > >
    > > We at EnterpriseDB did some performance testing of this parallel backup patch to check how beneficial it is; the results are below. In this testing, we ran the backup -
    > > 1) Without Asif’s patch
    > > 2) With Asif’s patch and 1, 2, 4, and 8 workers.
    > >
    > > We ran those tests on two setups:
    > >
    > > 1) Client and Server both on the same machine (Local backups)
    > >
    > > 2) Client and server on a different machine (remote backups)
    > >
    > >
    > > Machine details:
    > >
    > > 1: Server (on which local backups were performed, and which was used as the server for remote backups)
    > >
    > > 2: Client (Used as a client for remote backups)
    > >
    > >
    > ...
    > >
    > >
    > > With the client and server on the same machine, the results show around 50% improvement in the parallel runs with 4 and 8 workers.  We don’t see a large additional improvement as more workers are added.
    > >
    > >
    > > Whereas, when the client and server are on different machines, we don’t see any major benefit in performance.  This matches the test results posted by David Zhang upthread.
    > >
    > >
    > >
    > > We ran the test for a 100GB backup with 4 parallel workers to see the CPU usage and other information. What we noticed is that the server is consuming almost 100% CPU the whole time, and pg_stat_activity shows that the server is busy with ClientWrite most of the time.
    > >
    > >
    >
    > Was this for a setup where the client and server were on the same
    > machine or where the client was on a different machine?  If it was for
    > the case where both are on the same machine, then ideally, we should
    > see ClientRead events in a similar proportion?
    >
    
    During an offlist discussion, Robert pointed out that the current
    basebackup code doesn't report a wait event while reading files, which
    can change what pg_stat_activity shows.  Can you please apply his
    latest patch to improve basebackup.c's code [1], which takes care of
    that wait event, before collecting the data again?
    
    [1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  110. Re: WIP/PoC for parallel backup

    David Zhang <david.zhang@highgo.ca> — 2020-04-30T18:18:18Z

    On 2020-04-30 2:18 a.m., Sumanta Mukherjee wrote:
    
    > Hi,
    >
    > Would it be possible to put in the absolute numbers of the perf
    > so that it is easier to understand the amount of improvement with
    > and without the patch and different loads and workers.
    
    Here are the parameters used to record the perf data on both the server
    and client sides. For example, after applying patch v15, using 4 workers
    with load:
    
    perf record -o postgres_patch_j4_load -e block:block_rq_insert -e 
    cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- 
    /home/ec2-user/after/bin/postgres -D /mnt/test/data
    
    perf record -o backup_patch_j4_load -e block:block_rq_insert -e 
    cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- 
    /home/ec2-user/after/bin/pg_basebackup -h ${PG_SERVER} -p 5432 -D 
    /mnt/backup/data -v
    
    And this is how the report is generated.
    perf report  -i postgres_patch_j4_load --stdio > postgres_patch_j4_load.txt
    
    The original perf data files are still available. Can you please clarify
    which parameters you would like added when regenerating the report, or
    whether any other parameters need to be added to re-record the perf.data
    and then generate the report?
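    
    Approximate absolute numbers can also be derived from the existing
    --stdio text reports without re-recording, by multiplying each Overhead
    percentage by the "Event count (approx.)" value in the section header.
    A minimal sketch, assuming the report layout shown in this thread; the
    function name absolute_counts and the regular expressions are
    illustrative and only cover the plain layout shown here, not call-graph
    output:

```python
import re

EVENT_RE = re.compile(r"#\s*Samples:\s*\S+\s+of event '([^']+)'")
COUNT_RE = re.compile(r"#\s*Event count \(approx\.\):\s*(\d+)")
# Overhead%  Command  Shared Object  [k] or [.]  Symbol
ROW_RE = re.compile(r"\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[.\]\s+(\S+)")


def absolute_counts(report_text):
    """Return one dict per report row with an approximate absolute count."""
    rows = []
    event, total = None, None
    for line in report_text.splitlines():
        found = EVENT_RE.search(line)
        if found:
            # A new event section starts; its total follows on a later line.
            event, total = found.group(1), None
            continue
        found = COUNT_RE.search(line)
        if found:
            total = int(found.group(1))
            continue
        found = ROW_RE.match(line)
        if found and event is not None and total is not None:
            percent = float(found.group(1))
            rows.append({
                "event": event,
                "command": found.group(2),
                "symbol": found.group(4),
                "percent": percent,
                "approx_count": round(total * percent / 100.0),
            })
    return rows


# Excerpt of the "case 1: postgres server" CPU section quoted in this thread.
SAMPLE_REPORT = """\
# Samples: 6M of event 'cpu-clock'
# Event count (approx.): 1559444750000
#
# Overhead  Command          Shared Object         Symbol
# ........  ...............  ....................  ....................
#
    64.73%  swapper          [kernel.kallsyms]     [k] native_safe_halt
    10.89%  postgres         [vdso]                [.] __vdso_gettimeofday
     5.64%  postgres         [kernel.kallsyms]     [k] do_syscall_64
"""

for row in absolute_counts(SAMPLE_REPORT):
    print(f"{row['command']:<10} {row['symbol']:<22} "
          f"{row['percent']:6.2f}%  ~{row['approx_count']}")
```

    Comparing the approximate counts between runs, rather than the
    percentages, avoids the problem that each report's percentages are
    relative to a different total.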
    
    >
    > I am also unsure why the swapper is taking such a huge percentage of 
    > the absolute time
    > in the base run of just the postgres server and pg_basebackup client.
    >
    > With Regards,
    > Sumanta Mukherjee.
    > EnterpriseDB: http://www.enterprisedb.com
    >
    >
    > On Thu, Apr 30, 2020 at 1:18 PM David Zhang <david.zhang@highgo.ca 
    > <mailto:david.zhang@highgo.ca>> wrote:
    >
    >     Hi,
    >
    >     Thanks a lot for sharing the test results. Here are our test
    >     results using perf on three AWS t2.xlarge instances with the
    >     configuration below.
    >
    >     Machine configuration:
    >           Instance Type        :t2.xlarge
    >           Volume type          :io1
    >           Memory               :16GB
    >           vCPU #               :4
    >           Architecture         :x86_64
    >           IOPS                 :6000
    >           Database Size (GB)   :45 (Server)
    >
    >     case 1: postgres server: without patch and without load
    >
    >     * Disk I/O:
    >
    >     # Samples: 342K of event 'block:block_rq_insert'
    >     # Event count (approx.): 342834
    >     #
    >     # Overhead  Command          Shared Object      Symbol
    >     # ........  ...............  .................  .....................
    >     #
    >         97.65%  postgres         [kernel.kallsyms]  [k] __elv_add_request
    >          2.27%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
    >
    >
    >     * CPU:
    >
    >     # Samples: 6M of event 'cpu-clock'
    >     # Event count (approx.): 1559444750000
    >     #
    >     # Overhead  Command          Shared Object         Symbol
    >     # ........  ...............  ....................  .............................................
    >     #
    >         64.73%  swapper          [kernel.kallsyms]     [k] native_safe_halt
    >         10.89%  postgres         [vdso]                [.] __vdso_gettimeofday
    >          5.64%  postgres         [kernel.kallsyms]     [k] do_syscall_64
    >          5.43%  postgres         libpthread-2.26.so    [.] __libc_recv
    >          1.72%  postgres         [kernel.kallsyms]     [k] pvclock_clocksource_read
    >
    >     * Network:
    >
    >     # Samples: 2M of event 'skb:consume_skb'
    >     # Event count (approx.): 2739785
    >     #
    >     # Overhead  Command          Shared Object      Symbol
    >     # ........  ...............  .................  ...........................
    >     #
    >         91.58%  swapper          [kernel.kallsyms]  [k] consume_skb
    >          7.09%  postgres         [kernel.kallsyms]  [k] consume_skb
    >          0.61%  kswapd0          [kernel.kallsyms]  [k] consume_skb
    >          0.44%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
    >
    >
    >     case 1: pg_basebackup client: without patch and without load
    >
    >     * Disk I/O:
    >
    >     # Samples: 371K of event 'block:block_rq_insert'
    >     # Event count (approx.): 371362
    >     #
    >     # Overhead  Command          Shared Object      Symbol
    >     # ........  ...............  .................  .....................
    >     #
    >         96.78%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
    >          2.82%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
    >          0.29%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
    >          0.09%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
    >
    >
    >     * CPU:
    >
    >     # Samples: 3M of event 'cpu-clock'
    >     # Event count (approx.): 903527000000
    >     #
    >     # Overhead  Command          Shared Object       Symbol
    >     # ........  ...............  ..................  .............................................
    >     #
    >         87.99%  swapper          [kernel.kallsyms]   [k] native_safe_halt
    >          3.14%  swapper          [kernel.kallsyms]   [k] __lock_text_start
    >          0.48%  swapper          [kernel.kallsyms]   [k] __softirqentry_text_start
    >          0.37%  pg_basebackup    [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
    >          0.35%  swapper          [kernel.kallsyms]   [k] do_csum
    >
    >     * Network:
    >
    >     # Samples: 12M of event 'skb:consume_skb'
    >     # Event count (approx.): 12260713
    >     #
    >     # Overhead  Command          Shared Object Symbol
    >     # ........  ...............  .................  ...........................
    >     #
    >         95.12%  swapper          [kernel.kallsyms]  [k] consume_skb
    >          3.23%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
    >          0.83%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
    >          0.45%  kswapd0          [kernel.kallsyms]  [k] consume_skb
    >
    >
    >     case 2: postgres server: with patch and with load, 4 backup
    >     workers on client side
    >
    >     * Disk I/O:
    >
    >     # Samples: 3M of event 'block:block_rq_insert'
    >     # Event count (approx.): 3634542
    >     #
    >     # Overhead  Command          Shared Object Symbol
    >     # ........  ...............  ................. .....................
    >     #
    >         98.88%  postgres         [kernel.kallsyms]  [k] __elv_add_request
    >          0.66%  perf             [kernel.kallsyms]  [k] __elv_add_request
    >          0.42%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
    >          0.01%  sshd             [kernel.kallsyms]  [k] __elv_add_request
    >
    >     * CPU:
    >
    >     # Samples: 9M of event 'cpu-clock'
    >     # Event count (approx.): 2299129250000
    >     #
    >     # Overhead  Command          Shared Object Symbol
    >     # ........  ...............  .....................  .............................................
    >     #
    >         52.73%  swapper          [kernel.kallsyms]      [k] native_safe_halt
    >          8.31%  postgres         [vdso]                 [.] __vdso_gettimeofday
    >          4.46%  postgres         [kernel.kallsyms]      [k] do_syscall_64
    >          4.16%  postgres         libpthread-2.26.so     [.] __libc_recv
    >          1.58%  postgres         [kernel.kallsyms]      [k] __lock_text_start
    >          1.52%  postgres         [kernel.kallsyms]      [k] pvclock_clocksource_read
    >          0.81%  postgres         [kernel.kallsyms]      [k] copy_user_enhanced_fast_string
    >
    >
    >     * Network:
    >
    >     # Samples: 6M of event 'skb:consume_skb'
    >     # Event count (approx.): 6048795
    >     #
    >     # Overhead  Command          Shared Object Symbol
    >     # ........  ...............  .................  ...........................
    >     #
    >         85.81%  postgres         [kernel.kallsyms]  [k] consume_skb
    >         12.03%  swapper          [kernel.kallsyms]  [k] consume_skb
    >          0.97%  postgres         [kernel.kallsyms]  [k] __consume_stateless_skb
    >          0.85%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
    >          0.24%  perf             [kernel.kallsyms]  [k] consume_skb
    >
    >
    >     case 2: pg_basebackup 4 workers: with patch and with load
    >
    >     * Disk I/O:
    >
    >     # Samples: 372K of event 'block:block_rq_insert'
    >     # Event count (approx.): 372360
    >     #
    >     # Overhead  Command          Shared Object Symbol
    >     # ........  ...............  ................. .....................
    >     #
    >         97.26%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
    >          1.45%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
    >          0.95%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
    >          0.14%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
    >
    >
    >     * CPU:
    >
    >     # Samples: 4M of event 'cpu-clock'
    >     # Event count (approx.): 1234071000000
    >     #
    >     # Overhead  Command          Shared Object Symbol
    >     # ........  ...............  ........................  .................................................
    >     #
    >         89.25%  swapper          [kernel.kallsyms] [k] native_safe_halt
    >          0.93%  pg_basebackup    [kernel.kallsyms] [k] __lock_text_start
    >          0.91%  swapper          [kernel.kallsyms] [k] __lock_text_start
    >          0.69%  pg_basebackup    [kernel.kallsyms] [k] copy_user_enhanced_fast_string
    >          0.45%  swapper          [kernel.kallsyms] [k] do_csum
    >
    >
    >     * Network:
    >
    >     # Samples: 6M of event 'skb:consume_skb'
    >     # Event count (approx.): 6449013
    >     #
    >     # Overhead  Command          Shared Object Symbol
    >     # ........  ...............  .................  ...........................
    >     #
    >         90.28%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
    >          9.09%  swapper          [kernel.kallsyms]  [k] consume_skb
    >          0.29%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
    >          0.21%  sshd             [kernel.kallsyms]  [k] consume_skb
    >
    >
    >     The detailed perf reports are attached for the different scenarios,
    >     i.e. without patch (with and without load, for server and client) and
    >     with patch (with and without load, for 1, 2, 4 and 8 workers, for both
    >     server and client). The file names should be self-explanatory.
    >
    >     Let me know if more information is required.
    >
    >     Best regards,
    >
    >     David
    >
    >     On 2020-04-29 5:41 a.m., Suraj Kharage wrote:
    >>     Hi,
    >>
    >>     We at EnterpriseDB did some performance testing around this
    >>     parallel backup to check how beneficial it is; the results are
    >>     below. In this testing, we ran the backup -
    >>     1) Without Asif’s patch
    >>     2) With Asif’s patch and combination of workers 1,2,4,8.
    >>
    >>     We ran those tests on two setups
    >>
    >>     1) Client and Server both on the same machine (Local backups)
    >>
    >>     2) Client and server on a different machine (remote backups)
    >>
    >>
    >>     *Machine details: *
    >>
    >>     1: Server (on which local backups performed and used as server
    >>     for remote backups)
    >>
    >>     2: Client (Used as a client for remote backups)
    >>
    >>
    >>     *Server:*
    >>
    >>     RAM:500 GB
    >>     CPU details:
    >>     Architecture: x86_64
    >>     CPU op-mode(s): 32-bit, 64-bit
    >>     Byte Order: Little Endian
    >>     CPU(s): 128
    >>     On-line CPU(s) list: 0-127
    >>     Thread(s) per core: 2
    >>     Core(s) per socket: 8
    >>     Socket(s): 8
    >>     NUMA node(s): 8
    >>     Filesystem:ext4
    >>
    >>
    >>     *Client:*
    >>     RAM:490 GB
    >>     CPU details:
    >>     Architecture: ppc64le
    >>     Byte Order: Little Endian
    >>     CPU(s): 192
    >>     On-line CPU(s) list: 0-191
    >>     Thread(s) per core: 8
    >>     Core(s) per socket: 1
    >>     Socket(s): 24
    >>     Filesystem:ext4
    >>
    >>     Below are the results for the local test:
    >>
    >>     (Change = % performance increased/decreased compared to normal
    >>     backup without the patch)
    >>
    >>     10 GB (10 tables - each table around 1.05 GB)
    >>     Configuration                  real        user       sys         Change
    >>     without parallel backup patch  0m27.016s   0m3.378s   0m23.059s   -
    >>     1 worker                       0m30.314s   0m3.575s   0m22.946s   12% decreased
    >>     2 workers                      0m20.400s   0m3.622s   0m29.670s   27% increased
    >>     4 workers                      0m15.331s   0m3.706s   0m39.189s   43% increased
    >>     8 workers                      0m15.094s   0m3.915s   1m23.350s   44% increased
    >>
    >>     50 GB (50 tables - each table around 1.05 GB)
    >>     Configuration                  real        user       sys         Change
    >>     without parallel backup patch  2m11.049s   0m16.464s  2m1.757s    -
    >>     1 worker                       2m26.621s   0m18.497s  2m4.792s    21% decreased
    >>     2 workers                      1m9.581s    0m18.298s  2m12.030s   46% increased
    >>     4 workers                      0m53.894s   0m18.588s  2m47.390s   58% increased
    >>     8 workers                      0m55.373s   0m18.423s  5m57.470s   57% increased
    >>
    >>     100 GB (100 tables - each table around 1.05 GB)
    >>     Configuration                  real        user       sys         Change
    >>     without parallel backup patch  4m4.776s    0m33.699s  3m27.777s   -
    >>     1 worker                       4m20.862s   0m35.753s  3m28.262s   6% decreased
    >>     2 workers                      2m37.411s   0m36.440s  4m16.424s   35% increased
    >>     4 workers                      1m49.503s   0m37.200s  5m58.077s   55% increased
    >>     8 workers                      1m36.762s   0m36.987s  9m36.906s   60% increased
    >>
    >>     200 GB (200 tables - each table around 1.05 GB)
    >>     Configuration                  real        user       sys         Change
    >>     without parallel backup patch  10m34.998s  1m8.471s   7m21.520s   -
    >>     1 worker                       11m30.899s  1m12.933s  8m14.496s   8% decreased
    >>     2 workers                      6m8.481s    1m13.771s  9m31.216s   41% increased
    >>     4 workers                      4m2.403s    1m18.331s  12m29.661s  61% increased
    >>     8 workers                      4m3.768s    1m24.547s  15m21.421s  61% increased
    >>
    >>
    >>     Results for the remote test:
    >>
    >>     (Change = % performance increased/decreased compared to normal
    >>     backup without the patch)
    >>
    >>     10 GB (10 tables - each table around 1.05 GB)
    >>     Configuration                  real        user       sys         Change
    >>     without parallel backup patch  1m36.829s   0m2.124s   0m14.004s   -
    >>     1 worker                       1m37.598s   0m3.272s   0m11.110s   0.8% decreased
    >>     2 workers                      1m36.753s   0m2.627s   0m15.312s   0.08% increased
    >>     4 workers                      1m37.212s   0m3.835s   0m13.221s   0.3% decreased
    >>     8 workers                      1m36.977s   0m4.475s   0m17.937s   0.1% decreased
    >>
    >>     50 GB (50 tables - each table around 1.05 GB)
    >>     Configuration                  real        user       sys         Change
    >>     without parallel backup patch  7m54.211s   0m10.826s  1m10.435s   -
    >>     1 worker                       7m55.603s   0m16.535s  1m8.147s    0.2% decreased
    >>     2 workers                      7m53.499s   0m18.131s  1m8.822s    0.1% increased
    >>     4 workers                      7m54.687s   0m15.818s  1m30.991s   0.1% decreased
    >>     8 workers                      7m54.658s   0m20.783s  1m34.460s   0.1% decreased
    >>
    >>     100 GB (100 tables - each table around 1.05 GB)
    >>     Configuration                  real        user       sys         Change
    >>     without parallel backup patch  15m45.776s  0m21.802s  2m59.006s   -
    >>     1 worker                       15m46.315s  0m32.499s  2m47.245s   0.05% decreased
    >>     2 workers                      15m46.065s  0m28.877s  2m21.181s   0.03% decreased
    >>     4 workers                      15m47.793s  0m30.932s  2m36.708s   0.2% decreased
    >>     8 workers                      15m47.129s  0m35.151s  3m23.572s   0.14% decreased
    >>
    >>     200 GB (200 tables - each table around 1.05 GB)
    >>     Configuration                  real        user       sys         Change
    >>     without parallel backup patch  32m55.720s  0m50.602s  5m38.875s   -
    >>     1 worker                       31m30.602s  0m45.377s  4m57.405s   4% increased
    >>     2 workers                      31m30.214s  0m55.023s  5m8.689s    4% increased
    >>     4 workers                      31m31.187s  1m13.390s  5m40.861s   4% increased
    >>     8 workers                      31m31.729s  1m4.955s   6m35.774s   4% decreased
    >>
    >>
    >>
    >>     With client and server on the same machine, the results show around
    >>     50% improvement in the parallel runs with 4 and 8 workers.  We don’t
    >>     see a large further improvement as more workers are added.
    >>
    >>
    >>     Whereas, when the client and server are on different machines, we
    >>     don’t see any major benefit in performance.  This testing result
    >>     matches the testing results posted by David Zhang upthread.
    >>
    >>
    >>
    >>     We ran the test for a 100GB backup with 4 parallel workers to see
    >>     the CPU usage and other information. What we noticed is that the
    >>     server is consuming almost 100% CPU the whole time and
    >>     pg_stat_activity shows that the server is busy with ClientWrite most
    >>     of the time.
    >>
    >>
    >>     Attaching captured output for
    >>
    >>     1) Top command output on the server every 5 seconds
    >>
    >>     2) pg_stat_activity output every 5 seconds
    >>
    >>     3) Top command output on the client every 5 seconds
    >>
    >>
    >>     Do let me know if anyone has further questions/inputs for the
    >>     benchmarking.
    >>
    >>
    >>     Thanks to Rushabh Lathia for helping me with this testing.
    >>
    >>     On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila
    >>     <amit.kapila16@gmail.com> wrote:
    >>
    >>         On Mon, Apr 27, 2020 at 10:23 PM David Zhang
    >>         <david.zhang@highgo.ca> wrote:
    >>         >
    >>         > Hi,
    >>         >
    >>         > Here are the parallel backup performance test results with and
    >>         > without the patch "parallel_backup_v15" on AWS cloud environment. Two
    >>         > "t2.xlarge" machines were used: one for Postgres server and the other
    >>         > one for pg_basebackup, with the same machine configuration shown below.
    >>         >
    >>         > Machine configuration:
    >>         >      Instance Type        :t2.xlarge
    >>         >      Volume type          :io1
    >>         >      Memory (MiB)         :16GB
    >>         >      vCPU #               :4
    >>         >      Architecture         :x86_64
    >>         >      IOP                  :6000
    >>         >      Database Size (GB)   :108
    >>         >
    >>         > Performance test results:
    >>         > without patch:
    >>         >      real 18m49.346s
    >>         >      user 1m24.178s
    >>         >      sys 7m2.966s
    >>         >
    >>         > 1 worker with patch:
    >>         >      real 18m43.201s
    >>         >      user 1m55.787s
    >>         >      sys 7m24.724s
    >>         >
    >>         > 2 worker with patch:
    >>         >      real 18m47.373s
    >>         >      user 2m22.970s
    >>         >      sys 11m23.891s
    >>         >
    >>         > 4 worker with patch:
    >>         >      real 18m46.878s
    >>         >      user 2m26.791s
    >>         >      sys 13m14.716s
    >>         >
    >>         > As required, I didn't have the pgbench running in parallel like we did
    >>         > in the previous benchmark.
    >>         >
    >>
    >>         So, there doesn't seem to be any significant improvement in this
    >>         scenario.  Now, it is not clear why there was a significant
    >>         improvement in the previous run where pgbench was also running
    >>         simultaneously.  I am not sure but maybe it is because when a lot of
    >>         other backends were running (performing read-only workload) the
    >>         backend that was responsible for doing backup was getting frequently
    >>         scheduled out and it slowed down the overall backup process.  And when
    >>         we start using multiple backends for backup one or other backup
    >>         process is always running making the overall backup faster.  One idea
    >>         to find this out is to check how much time backup takes when we run it
    >>         with and without pgbench workload on HEAD (aka unpatched code).  Even
    >>         if what I am saying is true or there is some other reason due to which
    >>         we are seeing speedup in some cases (where there is a concurrent
    >>         workload), it might not make the case for using multiple backends for
    >>         backup but still, it is good to find that information as it might help
    >>         in designing this feature better.
    >>
    >>         > The perf report files for both Postgres server and pg_basebackup sides
    >>         > are attached.
    >>         >
    >>
    >>         It is not clear which functions are taking more time or for which
    >>         functions time is reduced as function symbols are not present in the
    >>         reports.  I think you can refer to
    >>         "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
    >>         take profiles and additionally use -fno-omit-frame-pointer during
    >>         configure (you can use CFLAGS="-fno-omit-frame-pointer" during
    >>         configure).
    >>
    >>
    >>         -- 
    >>         With Regards,
    >>         Amit Kapila.
    >>         EnterpriseDB: http://www.enterprisedb.com
    >>
    >>
    >>
    >>
    >>     -- 
    >>     -- 
    >>
    >>     Thanks & Regards,
    >>     Suraj kharage,
    >>     EnterpriseDB Corporation,
    >>     The Postgres Database Company.
    >     -- 
    >     David
    >
    >     Software Engineer
    >     Highgo Software Inc. (Canada)
    >     www.highgo.ca
    >
    -- 
    David
    
    Software Engineer
    Highgo Software Inc. (Canada)
    www.highgo.ca
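
    The "% performance increased/decreased" figures in the local results
    quoted above appear to be derived from the "real" times. As a minimal
    sketch, assuming the change is computed as (baseline - candidate) /
    baseline, the 100GB local row can be recomputed as follows. The helper
    names are illustrative and not part of the patch or the test scripts.

```python
import re


def to_seconds(text):
    """Convert a value such as '4m4.776s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_change(baseline, candidate):
    """Positive result: candidate is faster than baseline, by that percentage."""
    base = to_seconds(baseline)
    return (base - to_seconds(candidate)) / base * 100


# "real" times from the 100GB local test quoted above.
baseline = "4m4.776s"  # without the parallel backup patch
local_100gb = {1: "4m20.862s", 2: "2m37.411s", 4: "1m49.503s", 8: "1m36.762s"}

changes = {
    workers: percent_change(baseline, real)
    for workers, real in local_100gb.items()
}

for workers, change in changes.items():
    direction = "increased" if change >= 0 else "decreased"
    print(f"{workers} worker(s): {abs(change):.2f}% performance {direction}")
```

    Under that assumption the recomputed values truncate to the reported
    6%, 35%, 55% and 60% for this row.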
    
  111. Re: WIP/PoC for parallel backup

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2020-05-04T13:22:37Z

    On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    
    > On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
    > <suraj.kharage@enterprisedb.com> wrote:
    > >
    > > Hi,
    > >
    > > We at EnterpriseDB did some performance testing around this parallel
    > > backup to check how beneficial it is; the results are below. In this
    > > testing, we ran the backup -
    > > 1) Without Asif’s patch
    > > 2) With Asif’s patch and combination of workers 1,2,4,8.
    > >
    > > We ran those tests on two setups
    > >
    > > 1) Client and Server both on the same machine (Local backups)
    > >
    > > 2) Client and server on a different machine (remote backups)
    > >
    > >
    > > Machine details:
    > >
    > > 1: Server (on which local backups performed and used as server for
    > > remote backups)
    > >
    > > 2: Client (Used as a client for remote backups)
    > >
    > >
    > ...
    > >
    > >
    > > With client and server on the same machine, the results show around 50%
    > > improvement in the parallel runs with 4 and 8 workers.  We don’t see a
    > > large further improvement as more workers are added.
    > >
    > >
    > > Whereas, when the client and server are on different machines, we don’t
    > > see any major benefit in performance.  This testing result matches the
    > > testing results posted by David Zhang upthread.
    > >
    > >
    > >
    > > We ran the test for a 100GB backup with 4 parallel workers to see the
    > > CPU usage and other information. What we noticed is that the server is
    > > consuming almost 100% CPU the whole time and pg_stat_activity shows that
    > > the server is busy with ClientWrite most of the time.
    > >
    > >
    >
    > Was this for a setup where the client and server were on the same
    > machine or where the client was on a different machine?  If it was for
    > the case where both are on the same machine, then ideally, we should
    > see ClientRead events in a similar proportion?
    >
    
    In the particular setup, the client and server were on different machines.
    
    
    > During an offlist discussion with Robert, he pointed out that current
    > basebackup's code doesn't account for the wait event for the reading
    > of files which can change what pg_stat_activity shows?  Can you please
    > apply his latest patch to improve basebackup.c's code [1] which will
    > take care of that waitevent before getting the data again?
    >
    > [1] -
    > https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
    >
    
    
    Sure, we can try out this and do a similar run to collect the
    pg_stat_activity output.
    
    
    > --
    > With Regards,
    > Amit Kapila.
    > EnterpriseDB: http://www.enterprisedb.com
    >
    >
    >
    
    -- 
    Rushabh Lathia
    
  112. Re: WIP/PoC for parallel backup

    Ahsan Hadi <ahsan.hadi@gmail.com> — 2020-05-21T05:17:29Z

    On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    wrote:
    
    >
    >
    > On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com>
    > wrote:
    >
    >> On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
    >> <suraj.kharage@enterprisedb.com> wrote:
    >> >
    >> > Hi,
    >> >
    >> > We at EnterpriseDB did some performance testing around this parallel
    >> > backup to check how beneficial it is; the results are below. In this
    >> > testing, we ran the backup -
    >> > 1) Without Asif’s patch
    >> > 2) With Asif’s patch and combination of workers 1,2,4,8.
    >> >
    >> > We ran those tests on two setups
    >> >
    >> > 1) Client and Server both on the same machine (Local backups)
    >> >
    >> > 2) Client and server on a different machine (remote backups)
    >> >
    >> >
    >> > Machine details:
    >> >
    >> > 1: Server (on which local backups performed and used as server for
    >> > remote backups)
    >> >
    >> > 2: Client (Used as a client for remote backups)
    >> >
    >> >
    >> ...
    >> >
    >> >
    >> > With client and server on the same machine, the results show around 50%
    >> > improvement in the parallel runs with 4 and 8 workers.  We don’t see a
    >> > large further improvement as more workers are added.
    >> >
    >> >
    >> > Whereas, when the client and server are on different machines, we don’t
    >> > see any major benefit in performance.  This testing result matches the
    >> > testing results posted by David Zhang upthread.
    >> >
    >> >
    >> >
    >> > We ran the test for a 100GB backup with 4 parallel workers to see the
    >> > CPU usage and other information. What we noticed is that the server is
    >> > consuming almost 100% CPU the whole time and pg_stat_activity shows that
    >> > the server is busy with ClientWrite most of the time.
    >> >
    >> >
    >>
    >> Was this for a setup where the client and server were on the same
    >> machine or where the client was on a different machine?  If it was for
    >> the case where both are on the same machine, then ideally, we should
    >> see ClientRead events in a similar proportion?
    >>
    >
    > In the particular setup, the client and server were on different machines.
    >
    >
    >> During an offlist discussion with Robert, he pointed out that current
    >> basebackup's code doesn't account for the wait event for the reading
    >> of files which can change what pg_stat_activity shows?  Can you please
    >> apply his latest patch to improve basebackup.c's code [1] which will
    >> take care of that waitevent before getting the data again?
    >>
    >> [1] -
    >> https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
    >>
    >
    >
    > Sure, we can try out this and do a similar run to collect the
    > pg_stat_activity output.
    >
    
    Have you had the chance to try this out?
    
    
    >
    >
    >> --
    >> With Regards,
    >> Amit Kapila.
    >> EnterpriseDB: http://www.enterprisedb.com
    >>
    >>
    >>
    >
    > --
    > Rushabh Lathia
    >
    
    
    -- 
    Highgo Software (Canada/China/Pakistan)
    URL : http://www.highgo.ca
    ADDR: 10318 WHALLEY BLVD, Surrey, BC
    EMAIL: ahsan.hadi@highgo.ca
    
  113. Re: WIP/PoC for parallel backup

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2020-05-21T06:06:23Z

    On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
    
    >
    >
    > On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    > wrote:
    >
    >>
    >>
    >> On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com>
    >> wrote:
    >>
    >>> On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
    >>> <suraj.kharage@enterprisedb.com> wrote:
    >>> >
    >>> > Hi,
    >>> >
    >>> > We at EnterpriseDB did some performance testing around this parallel
    >>> > backup to check how beneficial it is; the results are below. In this
    >>> > testing, we ran the backup -
    >>> > 1) Without Asif’s patch
    >>> > 2) With Asif’s patch and combination of workers 1,2,4,8.
    >>> >
    >>> > We ran those tests on two setups
    >>> >
    >>> > 1) Client and Server both on the same machine (Local backups)
    >>> >
    >>> > 2) Client and server on a different machine (remote backups)
    >>> >
    >>> >
    >>> > Machine details:
    >>> >
    >>> > 1: Server (on which local backups performed and used as server for
    >>> > remote backups)
    >>> >
    >>> > 2: Client (Used as a client for remote backups)
    >>> >
    >>> >
    >>> ...
    >>> >
    >>> >
    >>> > With client and server on the same machine, the results show around 50%
    >>> > improvement in the parallel runs with 4 and 8 workers.  We don’t see a
    >>> > large further improvement as more workers are added.
    >>> >
    >>> >
    >>> > Whereas, when the client and server are on different machines, we don’t
    >>> > see any major benefit in performance.  This testing result matches the
    >>> > testing results posted by David Zhang upthread.
    >>> >
    >>> >
    >>> >
    >>> > We ran the test for a 100GB backup with 4 parallel workers to see the
    >>> > CPU usage and other information. What we noticed is that the server is
    >>> > consuming almost 100% CPU the whole time and pg_stat_activity shows that
    >>> > the server is busy with ClientWrite most of the time.
    >>> >
    >>> >
    >>>
    >>> Was this for a setup where the client and server were on the same
    >>> machine or where the client was on a different machine?  If it was for
    >>> the case where both are on the same machine, then ideally, we should
    >>> see ClientRead events in a similar proportion?
    >>>
    >>
    >> In the particular setup, the client and server were on different
    >> machines.
    >>
    >>
    >>> During an offlist discussion with Robert, he pointed out that current
    >>> basebackup's code doesn't account for the wait event for the reading
    >>> of files which can change what pg_stat_activity shows?  Can you please
    >>> apply his latest patch to improve basebackup.c's code [1] which will
    >>> take care of that waitevent before getting the data again?
    >>>
    >>> [1] -
    >>> https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
    >>>
    >>
    >>
    >> Sure, we can try out this and do a similar run to collect the
    >> pg_stat_activity output.
    >>
    >
    > Have you had the chance to try this out?
    >
    
    Yes. My colleague Suraj tried this and here are the pg_stat_activity output
    files.
    
    Wait events were captured every 3 seconds during the backup for -
    1: parallel backup for 100GB data with 4 workers
    (pg_stat_activity_normal_backup_100GB.txt)
    2: Normal backup (without parallel backup patch) for 100GB data
    (pg_stat_activity_j4_100GB.txt)
    
    Here is the observation:
    
    The total number of events (pg_stat_activity) captured during above runs:
    - 314 events for normal backups
    - 316 events for parallel backups (-j 4)
    
    BaseBackupRead wait event numbers: (newly added)
    37 - in normal backups
    25 - in the parallel backup (-j 4)
    
    ClientWrite wait event numbers:
    175 - in normal backup
    1098 - in parallel backups
    
    ClientRead wait event numbers:
    0 - ClientRead in normal backup
    326 - ClientRead in parallel backups for diff processes. (all in idle state)
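
    The tallies above can be put on a per-snapshot footing. This is a
    minimal sketch, assuming the "total number of events" (314 and 316) is
    the number of snapshots taken: 314 snapshots at 3-second intervals is
    about 942 seconds, which is close to the roughly 15m45s reported
    earlier for the remote 100GB backup. The snapshot layout and the
    function names are hypothetical; the attached files may be structured
    differently.

```python
from collections import Counter


def tally_wait_events(snapshots):
    """Count wait events over a series of pg_stat_activity snapshots.

    Each snapshot is a list of wait_event values, one per backup-related
    backend; None means the backend was not waiting when sampled.
    """
    counts = Counter()
    for rows in snapshots:
        counts.update(event for event in rows if event is not None)
    return counts


def average_per_snapshot(count, snapshots):
    """Average number of backends seen in a given wait event per snapshot."""
    return count / snapshots


# Hypothetical three-snapshot capture, only to show the tallying step.
sample = [
    ["ClientWrite", "ClientWrite", "ClientRead", None],
    ["ClientWrite", "BaseBackupRead", "ClientWrite", "ClientWrite"],
    ["ClientWrite", "ClientWrite", "ClientWrite", "ClientRead"],
]
sample_counts = tally_wait_events(sample)

# ClientWrite figures reported above, assuming "events" means snapshots.
normal_clientwrite = average_per_snapshot(175, 314)
parallel_clientwrite = average_per_snapshot(1098, 316)

print(sample_counts.most_common())
print(f"normal backup:   {normal_clientwrite:.2f} backends in ClientWrite per snapshot")
print(f"parallel (-j 4): {parallel_clientwrite:.2f} backends in ClientWrite per snapshot")
```

    Read this way, the normal backup is in ClientWrite in a little over
    half of the snapshots, while with -j 4 about three and a half backends
    are in ClientWrite in a typical snapshot.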
    
    
    
    
    Thanks,
    Rushabh Lathia
    www.EnterpriseDB.com
    
  114. Re: WIP/PoC for parallel backup

    Amit Kapila <amit.kapila16@gmail.com> — 2020-05-21T06:53:56Z

    On Thu, May 21, 2020 at 11:36 AM Rushabh Lathia
    <rushabh.lathia@gmail.com> wrote:
    >
    > On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
    >>
    >>>>
    >>>> During an offlist discussion with Robert, he pointed out that current
    >>>> basebackup's code doesn't account for the wait event for the reading
    >>>> of files which can change what pg_stat_activity shows?  Can you please
    >>>> apply his latest patch to improve basebackup.c's code [1] which will
    >>>> take care of that waitevent before getting the data again?
    >>>>
    >>>> [1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
    >>>
    >>>
    >>>
    >>> Sure, we can try out this and do a similar run to collect the pg_stat_activity output.
    >>
    >>
    >> Have you had the chance to try this out?
    >
    >
    > Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
    >
    > Captured wait events after every 3 seconds during the backup for -
    > 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
    > 2: Normal backup (without parallel backup patch) for 100GB data  (pg_stat_activity_j4_100GB.txt)
    >
    > Here is the observation:
    >
    > The total number of events (pg_stat_activity) captured during above runs:
    > - 314 events for normal backups
    > - 316 events for parallel backups (-j 4)
    >
    > BaseBackupRead wait event numbers: (newly added)
    > 37 - in normal backups
    > 25 - in the parallel backup (-j 4)
    >
    > ClientWrite wait event numbers:
    > 175 - in normal backup
    > 1098 - in parallel backups
    >
    > ClientRead wait event numbers:
    > 0 - ClientRead in normal backup
    > 326 - ClientRead in parallel backups for diff processes. (all in idle state)
    >
    
    It might be interesting to see why the ClientRead/ClientWrite counts have
    increased so much, and whether we can reduce them.
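
    As a rough illustration of how such per-event counts could be produced, here
    is a minimal Python sketch. The query text, the tally_wait_events helper and
    the sample rows are all hypothetical; the thread does not show the query or
    script that was actually used, only that pg_stat_activity was captured every
    3 seconds.

```python
from collections import Counter

# Hypothetical sampling query; the thread does not show the one actually used.
# Base backups are served by walsender backends, hence the backend_type filter.
SAMPLE_QUERY = """
SELECT pid, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'walsender'
"""


def tally_wait_events(snapshots):
    """Count wait events across a list of pg_stat_activity snapshots.

    Each snapshot is a list of (pid, state, wait_event_type, wait_event)
    rows in the column order of SAMPLE_QUERY. Rows whose wait_event is None
    are counted under 'not waiting'.
    """
    counts = Counter()
    for rows in snapshots:
        for _pid, _state, _wait_type, wait_event in rows:
            counts[wait_event or "not waiting"] += 1
    return counts


# Two made-up snapshots (not data from the thread) for a 2-worker backup.
snapshots = [
    [(101, "active", "Client", "ClientWrite"),
     (102, "active", "IO", "BaseBackupRead")],
    [(101, "active", "Client", "ClientWrite"),
     (102, "idle", "Client", "ClientRead")],
]
counts = tally_wait_events(snapshots)
print(counts)
```

    Counting per process per snapshot, as above, means a run with several
    workers can report more wait events than there are snapshots.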
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  115. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-05-21T13:41:54Z

    On Thu, May 21, 2020 at 2:06 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
    > Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
    >
    > Captured wait events after every 3 seconds during the backup for -
    > 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
    > 2: Normal backup (without parallel backup patch) for 100GB data  (pg_stat_activity_j4_100GB.txt)
    >
    > Here is the observation:
    >
    > The total number of events (pg_stat_activity) captured during above runs:
    > - 314 events for normal backups
    > - 316 events for parallel backups (-j 4)
    >
    > BaseBackupRead wait event numbers: (newly added)
    > 37 - in normal backups
    > 25 - in the parallel backup (-j 4)
    >
    > ClientWrite wait event numbers:
    > 175 - in normal backup
    > 1098 - in parallel backups
    >
    > ClientRead wait event numbers:
    > 0 - ClientRead in normal backup
    > 326 - ClientRead in parallel backups for diff processes. (all in idle state)
    
    So, basically, when we go from 1 process to 4, the additional
    processes spend all of their time waiting rather than doing any useful
    work, and that's why there is no performance benefit. Presumably, the
    reason they spend all their time waiting for ClientRead/ClientWrite is
    because the network between the two machines is saturated, so adding
    more processes that are trying to use it at maximum speed just leads
    to spending more time waiting for it to be available.
    
    Do we have the same results for the local backup case, where the patch helped?
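
    For reference, the relative change in each wait event between the two remote
    runs can be worked out directly from the counts quoted above. The sketch
    below is only arithmetic on those numbers; the dictionary layout and the
    growth helper are illustrative names, not anything from the patch.

```python
# Counts copied from the remote 100GB runs quoted above.
remote = {
    "BaseBackupRead": {"normal": 37, "parallel": 25},
    "ClientWrite": {"normal": 175, "parallel": 1098},
    "ClientRead": {"normal": 0, "parallel": 326},
}


def growth(normal, parallel):
    """Return parallel/normal, or None when the baseline count is zero."""
    return None if normal == 0 else parallel / normal


factors = {name: growth(c["normal"], c["parallel"]) for name, c in remote.items()}
for name, factor in factors.items():
    shown = "n/a (baseline is 0)" if factor is None else f"{factor:.2f}x"
    print(f"{name}: {shown}")
# BaseBackupRead: 0.68x
# ClientWrite: 6.27x
# ClientRead: n/a (baseline is 0)
```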
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  116. Re: WIP/PoC for parallel backup

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-05-22T06:03:29Z

    On Thu, May 21, 2020 at 7:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    >
    > So, basically, when we go from 1 process to 4, the additional
    > processes spend all of their time waiting rather than doing any useful
    > work, and that's why there is no performance benefit. Presumably, the
    > reason they spend all their time waiting for ClientRead/ClientWrite is
    > because the network between the two machines is saturated, so adding
    > more processes that are trying to use it at maximum speed just leads
    > to spending more time waiting for it to be available.
    >
    > Do we have the same results for the local backup case, where the patch
    > helped?
    >
    
    Here is the result for local backup case (100GB data). Attaching the
    captured logs.
    
    The total number of events (pg_stat_activity) captured during local runs:
    - 82 events for normal backups
    - 31 events for parallel backups (-j 4)
    
    BaseBackupRead wait event numbers: (newly added)
    24 - in normal backups
    14 - in parallel backup (-j 4)
    
    ClientWrite wait event numbers:
    8 - in normal backup
    43 - in parallel backups
    
    ClientRead wait event numbers:
    0 - ClientRead in normal backup
    32 - ClientRead in parallel backups for diff processes.
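
    If each captured "event" in the totals corresponds to one snapshot taken at
    the same 3-second interval used for the remote runs (an assumption; this
    message does not state it), the totals give a rough elapsed-time estimate
    for the two local runs. A minimal sketch of that arithmetic:

```python
# Assumed sampling interval, taken from the description of the remote runs.
SAMPLE_INTERVAL_S = 3

# Total snapshot counts reported above for the local 100GB runs.
local_totals = {"normal": 82, "parallel_j4": 31}

# Estimated elapsed seconds per run, under the one-snapshot-per-interval assumption.
elapsed = {run: n * SAMPLE_INTERVAL_S for run, n in local_totals.items()}
speedup = elapsed["normal"] / elapsed["parallel_j4"]

print(elapsed)                                # {'normal': 246, 'parallel_j4': 93}
print(f"estimated speedup: {speedup:.2f}x")   # estimated speedup: 2.65x
```

    This is only an estimate derived from the sample counts, not a measured
    timing.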
    
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  117. Re: WIP/PoC for parallel backup

    Hamid Akhtar <hamid.akhtar@gmail.com> — 2020-06-11T17:40:38Z

    As far as I understand, parallel backup is not a mandatory performance
    feature; rather, it is one used at the user's discretion. This IMHO
    indicates that it will benefit some users and may not benefit others.
    
    Taking a backup is an I/O intensive workload, so parallelizing it
    through multiple worker threads/processes creates an overhead of its own.
    So what precisely are we optimizing here? Looking at a running database
    system in any environment, I see the following potential scenarios playing
    out. These are probably clear to everyone here, but I'm listing these for
    completeness and clarity.
    
    Locally Running Backup:
    (1) Server has no clients connected other than base backup.
    (2) Server has other clients connected which are actively performing
    operations causing disk I/O.
    
    Remotely Running Backup:
    (3) Server has no clients connected other than remote base backup.
    (4) Server has other clients connected which are actively performing
    operations causing disk I/O.
    
    Others:
    (5) Server or the system running base backup has other processes competing
    for disk or network bandwidth.
    
    Generally speaking, I see that parallelization could potentially help in
    scenarios (2), (4) and (5), the reason being that having more than one
    thread increases the likelihood that the backup will get a bigger time
    slice for disk I/O and network bandwidth. With (1) and (3), since there are
    no competing processes, adding multiple threads or processes will only
    increase CPU overhead whilst still getting the same network and disk time
    slice. In these cases, the performance will degrade.
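
    The time-slice argument can be sketched with a toy model. It assumes that
    every process touching the disk or network gets an equal share, which real
    schedulers only approximate; backup_share is a hypothetical helper written
    for this illustration, and the model ignores the CPU overhead mentioned
    above.

```python
def backup_share(workers, competitors):
    """Fraction of a shared resource the backup gets under equal sharing.

    Assumes each process (backup worker or competing client) receives an
    equal slice. This is a simplification, not how any particular I/O or
    network scheduler actually behaves.
    """
    if workers < 1 or competitors < 0:
        raise ValueError("need at least one worker and a non-negative competitor count")
    return workers / (workers + competitors)


for competitors in (0, 4):
    for workers in (1, 4):
        share = backup_share(workers, competitors)
        print(f"competitors={competitors} workers={workers} share={share:.2f}")
# competitors=0 workers=1 share=1.00
# competitors=0 workers=4 share=1.00
# competitors=4 workers=1 share=0.20
# competitors=4 workers=4 share=0.50
```

    With no competitors the share is already 1.00 with a single worker, so extra
    workers cannot raise it; with competitors present, more workers do.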
    
    IMHO, that’s why adding other load on the server, perhaps by running
    pgbench simultaneously, may show improved performance for parallel backup.
    Also, running parallel backup on a local laptop more often than not yields
    improved performance.
    
    There are obviously other factors that may impact the performance, such as
    the type of I/O scheduler being used, whether CFQ or some other.
    
    IMHO, parallel backup has obvious performance benefits, but we need to
    ensure that users understand that there is potential for slower backup if
    there is no competition for resources.
    
    
    
    On Fri, May 22, 2020 at 11:03 AM Suraj Kharage <
    suraj.kharage@enterprisedb.com> wrote:
    
    >
    > On Thu, May 21, 2020 at 7:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    >>
    >> So, basically, when we go from 1 process to 4, the additional
    >> processes spend all of their time waiting rather than doing any useful
    >> work, and that's why there is no performance benefit. Presumably, the
    >> reason they spend all their time waiting for ClientRead/ClientWrite is
    >> because the network between the two machines is saturated, so adding
    >> more processes that are trying to use it at maximum speed just leads
    >> to spending more time waiting for it to be available.
    >>
    >> Do we have the same results for the local backup case, where the patch
    >> helped?
    >>
    >
    > Here is the result for local backup case (100GB data). Attaching the
    > captured logs.
    >
    > The total number of events (pg_stat_activity) captured during local runs:
    > - 82 events for normal backups
    > - 31 events for parallel backups (-j 4)
    >
    > BaseBackupRead wait event numbers: (newly added)
    > 24 - in normal backups
    > 14 - in parallel backup (-j 4)
    >
    > ClientWrite wait event numbers:
    > 8 - in normal backup
    > 43 - in parallel backups
    >
    > ClientRead wait event numbers:
    > 0 - ClientRead in normal backup
    > 32 - ClientRead in parallel backups for diff processes.
    >
    >
    > --
    > --
    >
    > Thanks & Regards,
    > Suraj kharage,
    > EnterpriseDB Corporation,
    > The Postgres Database Company.
    >
    
    
    -- 
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    ADDR: 10318 WHALLEY BLVD, Surrey, BC
    CELL:+923335449950  EMAIL: mailto:hamid.akhtar@highgo.ca
    SKYPE: engineeredvirus
    
  118. Re: WIP/PoC for parallel backup

    Robert Haas <robertmhaas@gmail.com> — 2020-06-12T17:28:37Z

    On Thu, Jun 11, 2020 at 1:41 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:
    > As far as I understand, parallel backup is not a mandatory performance feature, rather, one at user's discretion. This IMHO indicates that it will benefit some users and it may not others.
    >
    > IMHO, parallel backup has obvious performance benefits, but we need to ensure that users understand that there is potential for slower backup if there is no competition for resources.
    
    I am sure that nobody is arguing that the patch has to be beneficial
    in all cases in order to justify applying it. However, there are
    several good arguments against proceeding with this patch:
    
    * Every version of the patch that has been reviewed by anybody has
    been riddled with errors. Over and over again, testers have found
    serious bugs, and code reviewers have noticed lots of problems, too.
    
    * This approach requires rewriting a lot of current functionality,
    either by moving it to the client side or by restructuring it to work
    with parallelism. That's a lot of work, and it seems likely to
    generate more work in the future as people continue to add features.
    It's one thing to add a feature that doesn't benefit everybody; it's
    another thing to add a feature that doesn't benefit everybody and also
    hinders future development. See
    http://postgr.es/m/CA+TgmoZubLXYR+Pd_gi3MVgyv5hQdLm-GBrVXkun-Lewaw12Kg@mail.gmail.com
    for more discussion of these issues.
    
    * The scenarios in which the patch delivers a performance benefit are
    narrow and somewhat contrived. In remote backup scenarios, AIUI, the
    patch hasn't been shown to help. In local backups, it does, but how
    likely is it that you are going to do your local backups over the wire
    protocol instead of by direct file copy, which is probably much
    faster? I agree that if your server is overloaded, having multiple
    processes competing for the server resources will allow backup to get
    a larger slice relative to other things, but that seems like a pretty
    hackish and inefficient solution to that problem. You could also argue
    that we could provide a feature to prioritize some queries over other
    queries by running them with tons of parallel workers just to convince
    the OS to give them more resources, and I guess that would work, but
    it would also waste tons of resources and possibly cripple or even
    crash your system if you used it enough. The same argument applies
    here.
    
    * Even when the patch does provide a benefit, it seems to max out at
    about 2.5X. Clearly it's nice to have something go 2.5X faster, but
    the point is that it doesn't scale beyond that no matter how many
    workers you add. That doesn't automatically mean that something is a
    bad idea, but it is a concern. At the very least, we should be able to
    say why it doesn't scale any better than that.
    
    * Actually, we have some hints about that. Over at
    http://postgr.es/m/20200503174922.mfzzdafa5g4rlhez@alap3.anarazel.de
    Andres has shown that too much concurrency when copying files results
    in a dramatic performance reduction, and that a lot of the reason why
    concurrency helps in the first place has to do with the fact that
    pg_basebackup does not have any cache control (no fallocate,
    sync_file_range(WRITE), posix_fadvise(DONTNEED)). When those things
    are added the performance gets better and the benefits of concurrency
    are reduced. I suspect that would also be true for this patch. It
    would be unreasonable to commit a large patch, especially one that
    would hinder future development, if we could get the same benefits
    from a small patch that would not do so.
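
    To make the cache-control point in the last bullet concrete, here is a
    minimal Python sketch of a file copy that preallocates the target and then
    tells the kernel the cached pages are no longer needed. It is not
    pg_basebackup code (which is C), and copy_with_cache_hints is a hypothetical
    name. Python's os module has no sync_file_range, so os.fsync is used as a
    coarser stand-in, and the hints are treated as best effort on platforms that
    lack them.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # hypothetical 1 MiB copy buffer


def _drop_cache_hint(fd):
    """Best-effort posix_fadvise(DONTNEED) over the whole file."""
    if hasattr(os, "posix_fadvise"):
        try:
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        except OSError:
            pass  # a hint only; ignore filesystems that reject it


def copy_with_cache_hints(src_path, dst_path):
    """Copy src to dst with preallocation and cache-drop hints; return size."""
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if size > 0 and hasattr(os, "posix_fallocate"):
            try:
                os.posix_fallocate(dst.fileno(), 0, size)  # reserve space up front
            except OSError:
                pass  # preallocation is an optimization, not a requirement
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # stand-in for sync_file_range(WRITE)
        _drop_cache_hint(src.fileno())
        _drop_cache_hint(dst.fileno())
    return size


# Small self-contained demo on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "src.bin")
    dst_file = os.path.join(tmp, "dst.bin")
    payload = os.urandom(2 * CHUNK + 123)
    with open(src_file, "wb") as f:
        f.write(payload)
    copied = copy_with_cache_hints(src_file, dst_file)
    with open(dst_file, "rb") as f:
        demo_ok = f.read() == payload
    print(copied, demo_ok)
```

    The pages are flushed before the DONTNEED hint because dirty pages cannot be
    dropped from the cache until they have been written out.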
    
    I am not in a position to tell you how to spend your time, so you can
    certainly pursue this patch if you wish. However, I think it's
    probably not the best use of time. Even if you fixed all the bugs and
    reimplemented all of the functionality that needs reimplementing in
    order to make this approach work, it still doesn't make sense to
    commit the patch if either (a) we can obtain the same benefit, or most
    of it, from a much simpler patch or (b) the patch is going to make it
    significantly harder to develop other features that we want to have,
    especially if those features seem likely to be more beneficial than
    what this patch offers. I think both of those are likely true here.
    
    For an example of (b), consider compression of tar files on the server
    side before transmission to the client. If you take the approach this
    patch does and move tarfile construction to the client, that is
    impossible. Now you can argue (and perhaps you will) that this would
    just mean someone has to choose between using this feature and using
    that feature, and why should users not have such a choice? That is a
    fair argument, but my counter-argument is that users shouldn't be
    forced into making that choice. If the parallel feature is beneficial
    enough to justify having it, then it ought to be designed in such a
    way that it works with the other features we also want to have rather
    than forcing users to choose between them. Since I have already
    proposed (on the other thread linked above) a design that would make
    that possible, and this design does not, I have a hard time
    understanding why we would pick this one, especially given all of the
    other disadvantages which it seems to have.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  119. Re: WIP/PoC for parallel backup

    Daniel Gustafsson <daniel@yesql.se> — 2020-07-06T12:24:03Z

    > On 12 Jun 2020, at 19:28, Robert Haas <robertmhaas@gmail.com> wrote:
    
    > I am sure that nobody is arguing that the patch has to be beneficial
    > in all cases in order to justify applying it. However, there are
    > several good arguments against proceeding with this patch:
    
    This thread has stalled with no resolution to the raised issues, and the latest
    version of the patch (v15) posted no longer applies (I only tried 0001 which
    failed; the green tick in the CFBot is due to it mistakenly thinking an attached
    report is a patch).  I'm marking this patch Returned with Feedback.  Please
    open a new CF entry when there is a new version of the patch.
    
    cheers ./daniel
    
    
    
  120. Re: WIP/PoC for parallel backup

    Hamid Akhtar <hamid.akhtar@gmail.com> — 2020-07-06T12:39:08Z

    On Mon, Jul 6, 2020 at 5:24 PM Daniel Gustafsson <daniel@yesql.se> wrote:
    
    > > On 12 Jun 2020, at 19:28, Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > > I am sure that nobody is arguing that the patch has to be beneficial
    > > in all cases in order to justify applying it. However, there are
    > > several good arguments against proceeding with this patch:
    >
    > This thread has stalled with no resolution to the raised issues, and the
    > latest version of the patch (v15) posted no longer applies (I only tried
    > 0001 which failed; the green tick in the CFBot is due to it mistakenly
    > thinking an attached report is a patch).  I'm marking this patch Returned
    > with Feedback.  Please open a new CF entry when there is a new version of
    > the patch.
    >
    > cheers ./daniel
    
    
    I think this is fair. There are quite a few valid points raised by Robert.
    
    
    -- 
    Highgo Software (Canada/China/Pakistan)
    URL : www.highgo.ca
    ADDR: 10318 WHALLEY BLVD, Surrey, BC
    CELL:+923335449950  EMAIL: mailto:hamid.akhtar@highgo.ca
    SKYPE: engineeredvirus