Thread

Commits


Fix failures in incremental_sort due to number of workers
- 23ba3b5ee278 13.0 cited
In jsonb_plpython.c, suppress warning message from gcc 10.
- a06921816370 13.0 cited
Fix minor problems with non-exclusive backup cleanup.
- 303640199d04 13.0 cited

WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-08-21T13:47:04Z

Hi Hackers,

I have been looking into adding a parallel backup feature to pg_basebackup.
Currently, pg_basebackup sends the BASE_BACKUP command to take a full backup;
the server scans PGDATA and sends the files to pg_basebackup. In general,
the server takes the following steps on a BASE_BACKUP command:

- does pg_start_backup
- scans PGDATA, then creates and sends a header containing information about
the tablespaces
- sends each tablespace to pg_basebackup
- and then does pg_stop_backup

All these steps are executed sequentially by a single process. The idea I
am working on is to separate these steps into multiple commands in the
replication grammar, and to add worker processes to pg_basebackup that
can copy the contents of PGDATA in parallel.

The command line interface syntax would be like:
pg_basebackup --jobs=WORKERS


Replication commands:

- BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
If the PARALLEL option is given, the server will only do pg_start_backup,
scan PGDATA and send a list of file names.

- SEND_FILES_CONTENTS (file1, file2,...) - returns the files in the given list.
pg_basebackup will send back a list of filenames in this command. The
command will be sent by each worker, and that worker will receive the
listed files.

- STOP_BACKUP
When all workers have finished, pg_basebackup will send the STOP_BACKUP command.

pg_basebackup can start by sending the "BASE_BACKUP PARALLEL" command and
getting a list of filenames from the server in response. It should then
divide this list as per the --jobs parameter. (This division can be based on
file sizes.) Each worker process will issue a SEND_FILES_CONTENTS
(file1, file2,...) command. In response, the server will send the files
named in the list back to the requesting worker process.
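
A minimal sketch of the division step described above, assuming the equal split used in the PoC so far. The file names are made up, and the rendering of the command text is only an illustration of the proposed SEND_FILES_CONTENTS syntax, not the patch's actual wire format:

```python
def split_evenly(files, jobs):
    """Split `files` into `jobs` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(files), jobs)
    chunks, start = [], 0
    for worker in range(jobs):
        size = base + (1 if worker < extra else 0)
        chunks.append(files[start:start + size])
        start += size
    return chunks


def send_files_command(chunk):
    """Render the proposed command text for one worker (quoting rules are not specified)."""
    return "SEND_FILES_CONTENTS (" + ", ".join(chunk) + ")"


# Hypothetical file names, standing in for the list returned by BASE_BACKUP PARALLEL.
file_list = ["base/1/%d" % n for n in range(10)]
for chunk in split_evenly(file_list, jobs=3):
    print(send_files_command(chunk))
```

Each worker would issue one such command on its own connection. A size-aware split would replace split_evenly without changing the rest.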

Once all the files are copied, pg_basebackup will send the STOP_BACKUP
command. A similar idea was discussed by Robert on the incremental
backup thread a while ago. This is similar to that, but instead of
START_BACKUP and SEND_FILE_LIST, I have combined them into BASE_BACKUP
PARALLEL.

I have done a basic proof of concept (POC), which is also attached. I
would appreciate some input on this. So far, I am simply dividing the list
equally and assigning the parts to worker processes. I intend to fine-tune
this by taking file sizes into consideration. Further, to add tar format
support, I am considering having each worker process handle all files
belonging to a tablespace in its list (i.e. create and copy the tar file)
before it processes the next tablespace. As a result, this will create tar
files that are disjoint with respect to tablespace data. For example:

Say tablespace t1 has 20 files, tablespace t2 has 10, and we have 5 worker
processes. Ignoring all other factors for the sake of this example, each
worker process will get a group of 4 files of t1 and 2 files of t2. Each
process will create 2 tar files, one for t1 containing 4 files and another
for t2 containing 2 files.
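
The arithmetic of this example can be checked with a short sketch. The function name, the file names and the strided assignment are assumptions made for illustration; the PoC may group files differently:

```python
def tar_plan(files_by_tablespace, workers):
    """Map each worker to {tablespace: [files]}; each entry becomes one tar file."""
    plan = [{} for _ in range(workers)]
    for tablespace, files in files_by_tablespace.items():
        for w in range(workers):
            share = files[w::workers]  # every workers-th file, starting at offset w
            if share:
                plan[w][tablespace] = share
    return plan


# The example from the text: t1 has 20 files, t2 has 10, and there are 5 workers.
layout = {
    "t1": ["t1/file%d" % n for n in range(20)],
    "t2": ["t2/file%d" % n for n in range(10)],
}
for w, tars in enumerate(tar_plan(layout, workers=5)):
    print(w, {ts: len(files) for ts, files in tars.items()})
```

With these inputs every worker ends up with two tar files, so the backup as a whole consists of ten tar files rather than one per tablespace.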


Regards,
Asif

Re: WIP/PoC for parallel backup

Asim R P <apraveen@pivotal.io> — 2019-08-23T10:17:51Z

Hi Asif

Interesting proposal.  The bulk of the work in a backup is transferring files
from the source data directory to the destination.  Your patch breaks this
task down into multiple sets of files and transfers each set in parallel.
This seems correct; however, your patch also creates a new process to
handle each set.  Is that necessary?  I think we should try to achieve this
using multiple asynchronous libpq connections from a single basebackup
process.  That is, use the PQconnectStartParams() interface instead of
PQconnectdbParams(), which is currently used by basebackup.  On the server
side, it may still result in multiple backend processes per connection, and
an attempt should be made to avoid that as well, but it seems complicated.

What do you think?

Asim

Re: WIP/PoC for parallel backup

Ibrar Ahmed <ibrar.ahmad@gmail.com> — 2019-08-23T13:03:10Z

On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:

> Hi Asif
>
> Interesting proposal.  Bulk of the work in a backup is transferring files
> from source data directory to destination.  Your patch is breaking this
> task down in multiple sets of files and transferring each set in parallel.
> This seems correct, however, your patch is also creating a new process to
> handle each set.  Is that necessary?  I think we should try to achieve this
> using multiple asynchronous libpq connections from a single basebackup
> process.  That is to use PQconnectStartParams() interface instead of
> PQconnectdbParams(), which is currently used by basebackup.  On the server
> side, it may still result in multiple backend processes per connection, and
> an attempt should be made to avoid that as well, but it seems complicated.
>
> What do you think?
>
The main question is what we really want to solve here. What is the
bottleneck, and which hardware do we want to saturate? I ask because
several hardware resources are involved in taking a backup (network, CPU,
disk). If we have already saturated the disk, then there is no need to add
parallelism, because we will be blocked on disk I/O anyway. I implemented
parallel backup in a separate application and got wonderful results. I just
skimmed through the code and have some reservations: creating a separate
process only for copying data is overkill. There are two options: one is
non-blocking calls, the other is some worker threads. But before doing that,
we need to find the pg_basebackup bottleneck; after that, we can see what
the best way to solve it is. Some numbers may help to understand the actual
benefit.


-- 
Ibrar Ahmed

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-08-23T16:04:07Z

On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:

> Hi Asif
>
> Interesting proposal.  Bulk of the work in a backup is transferring files
> from source data directory to destination.  Your patch is breaking this
> task down in multiple sets of files and transferring each set in parallel.
> This seems correct, however, your patch is also creating a new process to
> handle each set.  Is that necessary?  I think we should try to achieve this
> using multiple asynchronous libpq connections from a single basebackup
> process.  That is to use PQconnectStartParams() interface instead of
> PQconnectdbParams(), which is currently used by basebackup.  On the server
> side, it may still result in multiple backend processes per connection, and
> an attempt should be made to avoid that as well, but it seems complicated.
>
> What do you think?
>
> Asim
>

Thanks Asim for the feedback. This is a good suggestion. The main idea I
wanted to discuss is the design where we can open multiple backend
connections to get the data instead of a single connection.
On the client side we can have multiple approaches. One is to use
asynchronous APIs (as you suggested), and the other is to choose
between multi-process and multi-thread. The main point is that we can gain
a lot of performance by using multiple connections, and I built
this POC to float the idea of how parallel backup can work, since the
core logic of getting the files over multiple connections will remain the
same whether we use an asynchronous, multi-process or multi-threaded approach.

I am going to address dividing the files evenly among multiple workers
based on file sizes. That would allow us to get some concrete numbers, and
it will also allow us to gauge the benefits of the async versus
multi-process/thread approaches on the client side.

Regards,
Asif

Re: WIP/PoC for parallel backup

Stephen Frost <sfrost@snowman.net> — 2019-08-23T17:26:38Z

Greetings,

* Asif Rehman (asifr.rehman@gmail.com) wrote:
> On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
> > Interesting proposal.  Bulk of the work in a backup is transferring files
> > from source data directory to destination.  Your patch is breaking this
> > task down in multiple sets of files and transferring each set in parallel.
> > This seems correct, however, your patch is also creating a new process to
> > handle each set.  Is that necessary?  I think we should try to achieve this
> > using multiple asynchronous libpq connections from a single basebackup
> > process.  That is to use PQconnectStartParams() interface instead of
> > PQconnectdbParams(), which is currently used by basebackup.  On the server
> > side, it may still result in multiple backend processes per connection, and
> > an attempt should be made to avoid that as well, but it seems complicated.
> 
> Thanks Asim for the feedback. This is a good suggestion. The main idea I
> wanted to discuss is the design where we can open multiple backend
> connections to get the data instead of a single connection.
> On the client side we can have multiple approaches, One is to use
> asynchronous APIs ( as suggested by you) and other could be to decide
> between multi-process and multi-thread. The main point was we can extract
> lot of performance benefit by using the multiple connections and I built
> this POC to float the idea of how the parallel backup can work, since the
> core logic of getting the files using multiple connections will remain the
> same, whether we use asynchronous, multi-process or multi-threaded.
> 
> I am going to address the division of files to be distributed evenly among
> multiple workers based on file sizes, that would allow to get some concrete
> numbers as well as it will also us to gauge some benefits between async and
> multiprocess/thread approach on client side.

I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.

Thanks,

Stephen

Re: WIP/PoC for parallel backup

Ibrar Ahmed <ibrar.ahmad@gmail.com> — 2019-08-23T17:50:09Z

On Fri, Aug 23, 2019 at 10:26 PM Stephen Frost <sfrost@snowman.net> wrote:

> Greetings,
>
> * Asif Rehman (asifr.rehman@gmail.com) wrote:
> > On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
> > > Interesting proposal.  Bulk of the work in a backup is transferring
> files
> > > from source data directory to destination.  Your patch is breaking this
> > > task down in multiple sets of files and transferring each set in
> parallel.
> > > This seems correct, however, your patch is also creating a new process
> to
> > > handle each set.  Is that necessary?  I think we should try to achieve
> this
> > > using multiple asynchronous libpq connections from a single basebackup
> > > process.  That is to use PQconnectStartParams() interface instead of
> > > PQconnectdbParams(), which is currently used by basebackup.  On the
> server
> > > side, it may still result in multiple backend processes per
> connection, and
> > > an attempt should be made to avoid that as well, but it seems
> complicated.
> >
> > Thanks Asim for the feedback. This is a good suggestion. The main idea I
> > wanted to discuss is the design where we can open multiple backend
> > connections to get the data instead of a single connection.
> > On the client side we can have multiple approaches, One is to use
> > asynchronous APIs ( as suggested by you) and other could be to decide
> > between multi-process and multi-thread. The main point was we can extract
> > lot of performance benefit by using the multiple connections and I built
> > this POC to float the idea of how the parallel backup can work, since the
> > core logic of getting the files using multiple connections will remain
> the
> > same, whether we use asynchronous, multi-process or multi-threaded.
> >
> > I am going to address the division of files to be distributed evenly
> among
> > multiple workers based on file sizes, that would allow to get some
> concrete
> > numbers as well as it will also us to gauge some benefits between async
> and
> > multiprocess/thread approach on client side.
>
> I would expect you to quickly want to support compression on the server
> side, before the data is sent across the network, and possibly
> encryption, and so it'd likely make sense to just have independent
> processes and connections through which to do that.
>
+1 for compression and encryption, but I think parallelism will give us
a benefit with or without compression.

> Thanks,
>
> Stephen
>


-- 
Ibrar Ahmed

Re: WIP/PoC for parallel backup

Ahsan Hadi <ahsan.hadi@gmail.com> — 2019-08-23T19:15:32Z

On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:

> Greetings,
>
> * Asif Rehman (asifr.rehman@gmail.com) wrote:
> > On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
> > > Interesting proposal.  Bulk of the work in a backup is transferring
> files
> > > from source data directory to destination.  Your patch is breaking this
> > > task down in multiple sets of files and transferring each set in
> parallel.
> > > This seems correct, however, your patch is also creating a new process
> to
> > > handle each set.  Is that necessary?  I think we should try to achieve
> this
> > > using multiple asynchronous libpq connections from a single basebackup
> > > process.  That is to use PQconnectStartParams() interface instead of
> > > PQconnectdbParams(), which is currently used by basebackup.  On the
> server
> > > side, it may still result in multiple backend processes per
> connection, and
> > > an attempt should be made to avoid that as well, but it seems
> complicated.
> >
> > Thanks Asim for the feedback. This is a good suggestion. The main idea I
> > wanted to discuss is the design where we can open multiple backend
> > connections to get the data instead of a single connection.
> > On the client side we can have multiple approaches, One is to use
> > asynchronous APIs ( as suggested by you) and other could be to decide
> > between multi-process and multi-thread. The main point was we can extract
> > lot of performance benefit by using the multiple connections and I built
> > this POC to float the idea of how the parallel backup can work, since the
> > core logic of getting the files using multiple connections will remain
> the
> > same, whether we use asynchronous, multi-process or multi-threaded.
> >
> > I am going to address the division of files to be distributed evenly
> among
> > multiple workers based on file sizes, that would allow to get some
> concrete
> > numbers as well as it will also us to gauge some benefits between async
> and
> > multiprocess/thread approach on client side.
>
> I would expect you to quickly want to support compression on the server
> side, before the data is sent across the network, and possibly
> encryption, and so it'd likely make sense to just have independent
> processes and connections through which to do that.


It would be interesting to see the benefits of compression (before the data
is transferred over the network) on top of parallelism, since there is also
some overhead associated with performing the compression. I agree with your
suggestion of trying to add parallelism first and then trying compression
before the data is sent across the network.


>
> Thanks,
>
> Stephen
>

Re: WIP/PoC for parallel backup

Stephen Frost <sfrost@snowman.net> — 2019-08-23T19:42:54Z

Greetings,

* Ahsan Hadi (ahsan.hadi@gmail.com) wrote:
> On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > I would expect you to quickly want to support compression on the server
> > side, before the data is sent across the network, and possibly
> > encryption, and so it'd likely make sense to just have independent
> > processes and connections through which to do that.
> 
> It would be interesting to see the benefits of compression (before the data
> is transferred over the network) on top of parallelism. Since there is also
> some overhead associated with performing the compression. I agree with your
> suggestion of trying to add parallelism first and then try compression
> before the data is sent across the network.

You're welcome to take a look at pgbackrest for insight and to play with
regarding compression-before-transfer, how best to split up the files
and order them, encryption, et al.  We've put quite a bit of effort into
figuring all of that out.

Thanks!

Stephen

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2019-09-24T17:53:03Z

On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
> If the parallel option is there, then it will only do pg_start_backup, scans PGDATA and sends a list of file names.

So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.

> - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
> pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.

Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

> - STOP_BACKUP
> when all workers finish then, pg_basebackup will send STOP_BACKUP command.

This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASE_BACKUP PARALLEL.

> I have done a basic proof of concept (POC), which is also attached. I would appreciate some input on this. So far, I am simply dividing the list equally and assigning them to worker processes. I intend to fine tune this by taking into consideration file sizes. Further to add tar format support, I am considering that each worker process, processes all files belonging to a tablespace in its list (i.e. creates and copies tar file), before it processes the next tablespace. As a result, this will create tar files that are disjointed with respect tablespace data. For example:

Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file.  That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even.  If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.
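
A minimal sketch of the shared-list approach described above, assuming threads as workers and a caller-supplied fetch function standing in for the actual transfer of one file over a worker's connection. Both the names and the use of threads are illustrative only:

```python
import queue
import threading


def run_workers(files, jobs, fetch):
    """Let `jobs` workers pull file names from one shared queue until it is empty."""
    pending = queue.Queue()
    for name in files:
        pending.put(name)
    done = {w: [] for w in range(jobs)}  # which worker ended up fetching what

    def worker(w):
        while True:
            try:
                name = pending.get_nowait()  # take the next file from the head
            except queue.Empty:
                return  # nothing left to fetch
            fetch(name)
            done[w].append(name)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Nothing is assigned in advance here: a worker on a slow connection simply takes fewer files, which is the point of pulling from a shared list.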

If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones.  Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.

> Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace t2 has 10. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.

This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:

1. Error! Tar format parallel backups are not supported.

2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.

3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers.  Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).
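
Option 3 can be sketched as follows, assuming Python threads, an in-memory dict standing in for the data fetched from the server, and a bounded queue of N+1 slots playing the role of the buffers. This illustrates the synchronization pattern only, not how pg_basebackup would implement it:

```python
import io
import queue
import tarfile
import threading


def write_single_tar(contents, jobs, out):
    """N readers buffer whole files; one writer appends them to a single tar."""
    names = queue.Queue()
    for name in contents:
        names.put(name)
    buffers = queue.Queue(maxsize=jobs + 1)  # at most N+1 buffered files

    def reader():
        while True:
            try:
                name = names.get_nowait()
            except queue.Empty:
                return
            # Stand-in for receiving the file; blocks while all buffers are full.
            buffers.put((name, contents[name]))

    def writer():
        with tarfile.open(fileobj=out, mode="w") as tar:
            while True:
                item = buffers.get()
                if item is None:  # sentinel: all readers have finished
                    return
                name, data = item
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    readers = [threading.Thread(target=reader) for _ in range(jobs)]
    for r in readers:
        r.start()
    for r in readers:
        r.join()
    buffers.put(None)
    writer_thread.join()
```

Only the writer thread touches the tar file, so the readers never need a lock around it. If the writer cannot keep up, the readers block on the full queue, which is the slowdown mentioned above.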

There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.

I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-09-27T16:00:01Z

Hi Robert,

Thanks for the feedback. Please see the comments below:

On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
> > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
> > If the parallel option is there, then it will only do pg_start_backup,
> scans PGDATA and sends a list of file names.
>
> So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
> tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
> list of file names. I don't think that's a good approach. It's too
> confusing to have one replication command that returns totally
> different things depending on whether some option is given.
>

Sure. I will add a separate command (START_BACKUP) for parallel backup.


> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
> list.
> > pg_basebackup will then send back a list of filenames in this command.
> This commands will be send by each worker and that worker will be getting
> the said files.
>
> Seems reasonable, but I think you should just pass one file name and
> use the command multiple times, once per file.
>

I considered this approach initially; however, I adopted the current
strategy to avoid multiple round trips between the server and clients and
to save on query processing time by issuing a single command rather than
multiple ones. Further, fetching multiple files at once will also aid in
supporting the tar format by utilising the existing ReceiveTarFile()
function, and will make it possible to create one tarball per tablespace
per worker.


>
> > - STOP_BACKUP
> > when all workers finish then, pg_basebackup will send STOP_BACKUP
> command.
>
> This also seems reasonable, but surely the matching command should
> then be called START_BACKUP, not BASEBACKUP PARALLEL.
>
> > I have done a basic proof of concept (POC), which is also attached. I
> would appreciate some input on this. So far, I am simply dividing the list
> equally and assigning them to worker processes. I intend to fine tune this
> by taking into consideration file sizes. Further to add tar format support,
> I am considering that each worker process, processes all files belonging to
> a tablespace in its list (i.e. creates and copies tar file), before it
> processes the next tablespace. As a result, this will create tar files that
> are disjointed with respect tablespace data. For example:
>
> Instead of doing this, I suggest that you should just maintain a list
> of all the files that need to be fetched and have each worker pull a
> file from the head of the list and fetch it when it finishes receiving
> the previous file.  That way, if some connections go faster or slower
> than others, the distribution of work ends up fairly even.  If you
> instead pre-distribute the work, you're guessing what's going to
> happen in the future instead of just waiting to see what actually does
> happen. Guessing isn't intrinsically bad, but guessing when you could
> be sure of doing the right thing *is* bad.
>
> If you want to be really fancy, you could start by sorting the files
> in descending order of size, so that big files are fetched before
> small ones.  Since the largest possible file is 1GB and any database
> where this feature is important is probably hundreds or thousands of
> GB, this may not be very important. I suggest not worrying about it
> for v1.
>

Ideally, I would like to support the tar format as well, which would be
much easier to implement when fetching multiple files at once, since that
would allow the existing functionality to be reused without much
change.

Your idea of sorting the files in descending order of size seems very
appealing. I think we can do this and have the files divided among the
workers one by one, i.e. the first file in the list goes to worker 1, the
second to worker 2, and so on.
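
A minimal sketch of this assignment, assuming the file sizes are known up front. The function name and the sample sizes are made up for illustration:

```python
def deal_by_size(sizes, workers):
    """Sort files by descending size, then deal them out to workers in turn."""
    # Ties are broken by name so the result is deterministic.
    ordered = sorted(sizes, key=lambda name: (-sizes[name], name))
    return [ordered[w::workers] for w in range(workers)]


# Hypothetical file sizes in bytes.
sizes = {"a": 5, "b": 1, "c": 9, "d": 3}
print(deal_by_size(sizes, workers=2))
```

The first worker gets the largest file, the second worker the next largest, and so on around the workers until the list is exhausted.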


>
> > Say, tablespace t1 has 20 files and we have 5 worker processes and
> tablespace t2 has 10. Ignoring all other factors for the sake of this
> example, each worker process will get a group of 4 files of t1 and 2 files
> of t2. Each process will create 2 tar files, one for t1 containing 4 files
> and another for t2 containing 2 files.
>
> This is one of several possible approaches. If we're doing a
> plain-format backup in parallel, we can just write each file where it
> needs to go and call it good. But, with a tar-format backup, what
> should we do? I can see three options:
>
> 1. Error! Tar format parallel backups are not supported.
>
> 2. Write multiple tar files. The user might reasonably expect that
> they're going to end up with the same files at the end of the backup
> regardless of whether they do it in parallel. A user with this
> expectation will be disappointed.
>
> 3. Write one tar file. In this design, the workers have to take turns
> writing to the tar file, so you need some synchronization around that.
> Perhaps you'd have N threads that read and buffer a file, and N+1
> buffers.  Then you have one additional thread that reads the complete
> files from the buffers and writes them to the tar file. There's
> obviously some possibility that the writer won't be able to keep up
> and writing the backup will therefore be slower than it would be with
> approach (2).
>
> There's probably also a possibility that approach (2) would thrash the
> disk head back and forth between multiple files that are all being
> written at the same time, and approach (3) will therefore win by not
> thrashing the disk head. But, since spinning media are becoming less
> and less popular and are likely to have multiple disk heads under the
> hood when they are used, this is probably not too likely.
>
> I think your choice to go with approach (2) is probably reasonable,
> but I'm not sure whether everyone will agree.
>

Yes, for tar format support, approach (2) is what I had in
mind. I am currently working on the implementation and will share the patch
in a couple of days.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-10-03T11:47:31Z

Hi Asif,

I was looking at the patch and tried compiling it. However, I got a few
errors and warnings.

Fixed those in the attached patch.

On Fri, Sep 27, 2019 at 9:30 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

> Hi Robert,
>
> Thanks for the feedback. Please see the comments below:
>
> On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com>
> wrote:
>
>> On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>> > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
>> > If the parallel option is there, then it will only do pg_start_backup,
>> scans PGDATA and sends a list of file names.
>>
>> So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
>> tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
>> list of file names. I don't think that's a good approach. It's too
>> confusing to have one replication command that returns totally
>> different things depending on whether some option is given.
>>
>
> Sure. I will add a separate command (START_BACKUP)  for parallel.
>
>
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
>> list.
>> > pg_basebackup will then send back a list of filenames in this command.
>> This commands will be send by each worker and that worker will be getting
>> the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>>
>
> I considered this approach initially,  however, I adopted the current
> strategy to avoid multiple round trips between the server and clients and
> save on query processing time by issuing a single command rather than
> multiple ones. Further fetching multiple files at once will also aid in
> supporting the tar format by utilising the existing ReceiveTarFile()
> function and will be able to create a tarball for per tablespace per worker.
>
>
>>
>> > - STOP_BACKUP
>> > when all workers finish then, pg_basebackup will send STOP_BACKUP
>> command.
>>
>> This also seems reasonable, but surely the matching command should
>> then be called START_BACKUP, not BASEBACKUP PARALLEL.
>>
>> > I have done a basic proof of concept (POC), which is also attached. I
>> would appreciate some input on this. So far, I am simply dividing the list
>> equally and assigning them to worker processes. I intend to fine tune this
>> by taking into consideration file sizes. Further to add tar format support,
>> I am considering that each worker process, processes all files belonging to
>> a tablespace in its list (i.e. creates and copies tar file), before it
>> processes the next tablespace. As a result, this will create tar files that
>> are disjointed with respect tablespace data. For example:
>>
>> Instead of doing this, I suggest that you should just maintain a list
>> of all the files that need to be fetched and have each worker pull a
>> file from the head of the list and fetch it when it finishes receiving
>> the previous file.  That way, if some connections go faster or slower
>> than others, the distribution of work ends up fairly even.  If you
>> instead pre-distribute the work, you're guessing what's going to
>> happen in the future instead of just waiting to see what actually does
>> happen. Guessing isn't intrinsically bad, but guessing when you could
>> be sure of doing the right thing *is* bad.
>>
>> If you want to be really fancy, you could start by sorting the files
>> in descending order of size, so that big files are fetched before
>> small ones.  Since the largest possible file is 1GB and any database
>> where this feature is important is probably hundreds or thousands of
>> GB, this may not be very important. I suggest not worrying about it
>> for v1.
>>
>
> Ideally, I would like to support the tar format as well, which would be
> much easier to implement when fetching multiple files at once, since that
> would allow the existing functionality to be reused without much
> change.
>
> Your idea of sorting the files in descending order of size seems very
> appealing. I think we can do this and have the files divided among the
> workers one by one, i.e. the first file in the list goes to worker 1, the
> second to worker 2, and so on.
>
>
>>
>> > Say, tablespace t1 has 20 files and we have 5 worker processes and
>> tablespace t2 has 10. Ignoring all other factors for the sake of this
>> example, each worker process will get a group of 4 files of t1 and 2 files
>> of t2. Each process will create 2 tar files, one for t1 containing 4 files
>> and another for t2 containing 2 files.
>>
>> This is one of several possible approaches. If we're doing a
>> plain-format backup in parallel, we can just write each file where it
>> needs to go and call it good. But, with a tar-format backup, what
>> should we do? I can see three options:
>>
>> 1. Error! Tar format parallel backups are not supported.
>>
>> 2. Write multiple tar files. The user might reasonably expect that
>> they're going to end up with the same files at the end of the backup
>> regardless of whether they do it in parallel. A user with this
>> expectation will be disappointed.
>>
>> 3. Write one tar file. In this design, the workers have to take turns
>> writing to the tar file, so you need some synchronization around that.
>> Perhaps you'd have N threads that read and buffer a file, and N+1
>> buffers.  Then you have one additional thread that reads the complete
>> files from the buffers and writes them to the tar file. There's
>> obviously some possibility that the writer won't be able to keep up
>> and writing the backup will therefore be slower than it would be with
>> approach (2).
>>
>> There's probably also a possibility that approach (2) would thrash the
>> disk head back and forth between multiple files that are all being
>> written at the same time, and approach (3) will therefore win by not
>> thrashing the disk head. But, since spinning media are becoming less
>> and less popular and are likely to have multiple disk heads under the
>> hood when they are used, this is probably not too likely.
>>
>> I think your choice to go with approach (2) is probably reasonable,
>> but I'm not sure whether everyone will agree.
>>
>
> Yes, for tar format support, approach (2) is what I had in
> mind. I'm currently working on the implementation and will share the patch
> in a couple of days.
>
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>


-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2019-10-03T13:39:56Z

On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
>> > pg_basebackup will then send back a list of filenames in this command. This command will be sent by each worker, and that worker will receive the listed files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>
> I considered this approach initially; however, I adopted the current strategy to avoid multiple round trips between the server and clients and to save on query processing time by issuing a single command rather than multiple ones. Furthermore, fetching multiple files at once will also help in supporting the tar format by reusing the existing ReceiveTarFile() function, and will make it possible to create one tarball per tablespace per worker.

I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.

However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.

To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.
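
The effect can be illustrated with a small simulation. This is only a sketch under invented assumptions: forty 1GB files, four workers, one of which copies at a quarter of the speed of the others. The function names are hypothetical, not part of any patch. It compares pre-assigning files round-robin with letting each idle worker pull the next file from a shared list.

```python
import heapq

def static_round_robin(sizes, speeds):
    """Pre-assign file i to worker i % N; return when the last worker finishes."""
    finish = [0.0] * len(speeds)
    for i, size in enumerate(sizes):
        w = i % len(speeds)
        finish[w] += size / speeds[w]
    return max(finish)

def shared_queue(sizes, speeds):
    """Each worker takes the next file from the list as soon as it is idle."""
    # Heap of (time at which the worker becomes idle, worker index).
    idle = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle)
    last = 0.0
    for size in sizes:
        t, w = heapq.heappop(idle)      # the worker that is free earliest
        t += size / speeds[w]
        last = max(last, t)
        heapq.heappush(idle, (t, w))
    return last

sizes = [1024] * 40             # forty 1GB files, sizes in MB (made-up workload)
speeds = [100, 100, 100, 25]    # MB/s per worker; the fourth one is slow

print("pre-assigned :", static_round_robin(sizes, speeds))
print("shared queue :", shared_queue(sizes, speeds))
```

With pre-assignment, the slow worker is still handed ten files, so the whole backup waits for it. With the shared list, the fast workers simply end up copying more files.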

> Ideally, I would like to support the tar format as well, which would be much easier to implement when fetching multiple files at once, since that would allow the existing functionality to be reused without much change.

I think we should just have the client generate the tarfile. It'll
require duplicating some code, but it's not actually that much code or
that complicated from what I can see.
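
As a rough illustration of client-side tar generation (not the pg_basebackup code, which is C), the sketch below uses Python's standard tarfile module to append files to an archive as their contents arrive. The helper name and the file names and contents are made up for the example.

```python
import io
import tarfile

def add_received_file(tar, name, data, mtime=0):
    """Append one file, already received in memory, to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    info.mtime = mtime
    tar.addfile(info, io.BytesIO(data))

# Write the archive into a memory buffer to keep the example self-contained.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_received_file(tar, "base/1/1259", b"\x00" * 8192)
    add_received_file(tar, "PG_VERSION", b"13\n")

# Read the archive back to confirm the member list.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()
print(names)
```

The point is only that the client needs each file's name, size, and contents to write a tar member; nothing here requires the server to produce the tar stream.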

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-10-04T11:01:55Z

On Thu, Oct 3, 2019 at 6:40 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
> >> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
> list.
> >> > pg_basebackup will then send back a list of filenames in this
> command. This commands will be send by each worker and that worker will be
> getting the said files.
> >>
> >> Seems reasonable, but I think you should just pass one file name and
> >> use the command multiple times, once per file.
> >
> > I considered this approach initially,  however, I adopted the current
> strategy to avoid multiple round trips between the server and clients and
> save on query processing time by issuing a single command rather than
> multiple ones. Further fetching multiple files at once will also aid in
> supporting the tar format by utilising the existing ReceiveTarFile()
> function and will be able to create a tarball for per tablespace per worker.
>
> I think that sending multiple filenames on a line could save some time
> when there are lots of very small files, because then the round-trip
> overhead could be significant.
>
> However, if you've got mostly big files, I think this is going to be a
> loser. It'll be fine if you're able to divide the work exactly evenly,
> but that's pretty hard to do, because some workers may succeed in
> copying the data faster than others for a variety of reasons: some
> data is in memory, some data has to be read from disk, different data
> may need to be read from different disks that run at different speeds,
> not all the network connections may run at the same speed. Remember
> that the backup's not done until the last worker finishes, and so
> there may well be a significant advantage in terms of overall speed in
> putting some energy into making sure that they finish as close to each
> other in time as possible.
>
> To put that another way, the first time all the workers except one get
> done while the last one still has 10GB of data to copy, somebody's
> going to be unhappy.
>

I have updated the patch (attached) to include tablespace support, tar
format support, and all the other base backup options, so that they work
in parallel mode as well. As previously suggested, I have removed
BASE_BACKUP [PARALLEL] and added START_BACKUP to start the backup
instead. The tar format will write multiple tar files, depending on the
number of workers specified. I have also made all commands
(START_BACKUP/SEND_FILES_CONTENT/STOP_BACKUP) accept the
base_backup_opt_list, so the command-line options can be passed to
these commands as well. Since the command-line options don't change
once the backup starts, I went this way instead of storing them in
shared state.

The START_BACKUP command will now return a list of files sorted in
descending order of file size, so the larger files are at the top of
the list. The files are then assigned to workers one by one, so that
the larger files are copied before the others.

Based on my understanding, your main concern is that the files won't be
distributed fairly, i.e. one worker might get a big file and take more time
while the others finish early with smaller files. In this approach I have
created a list of files in descending order of size, so all the big files
come at the top. The maximum file size in PG is 1GB, so if we have four
workers picking up files from the list one by one, the worst-case scenario
is that one worker gets a 1GB file to process while the others get smaller
files. However, with the files sorted in descending order of size and
handed out to workers one by one, there is a very high likelihood that the
work is spread evenly. Does this address your concerns?
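
The scheme described above can be sketched as follows. This is a minimal illustration with made-up file names and sizes, not the patch's code: sort by size in descending order, then deal the files out to the workers in turn.

```python
def deal_out(files, n_workers):
    """files: list of (name, size). Sort by size descending, then deal round-robin."""
    ordered = sorted(files, key=lambda f: f[1], reverse=True)
    buckets = [[] for _ in range(n_workers)]
    for i, entry in enumerate(ordered):
        buckets[i % n_workers].append(entry)
    return buckets

# Hypothetical workload: nine 1GB segments plus a few smaller files (sizes in MB).
sizes = [1024] * 9 + [300, 200, 100]
files = [(f"file{i}", size) for i, size in enumerate(sizes)]

buckets = deal_out(files, 4)
totals = [sum(size for _, size in bucket) for bucket in buckets]
print(totals)
print("spread:", max(totals) - min(totals))
```

Dealt this way, the per-worker byte totals differ by no more than the size of the largest file. Note that this balances bytes, not elapsed time; it says nothing about workers that copy at different speeds.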

Furthermore, the patch also includes a regression test. Since the
t/010_pg_basebackup.pl test case tests base backup comprehensively, I
have duplicated it as "t/040_pg_basebackup_parallel.pl" and added the
parallel option to all of its tests, to make sure parallel mode works as
expected. The one thing that differs from a regular base backup is the
checksum failure reporting. In parallel mode, the total number of
checksum failures is not reported correctly; however, the backup is
aborted whenever a checksum failure occurs. This is because the processes
do not maintain any shared state. I assume that reporting the total number
of failures matters less than noticing a failure and aborting.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2019-10-04T12:07:42Z

On Fri, Oct 4, 2019 at 7:02 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Based on my understanding, your main concern is that the files won't be distributed fairly, i.e. one worker might get a big file and take more time while the others finish early with smaller files. In this approach I have created a list of files in descending order of size, so all the big files come at the top. The maximum file size in PG is 1GB, so if we have four workers picking up files from the list one by one, the worst-case scenario is that one worker gets a 1GB file to process while the others get smaller files. However, with the files sorted in descending order of size and handed out to workers one by one, there is a very high likelihood that the work is spread evenly. Does this address your concerns?

Somewhat, but I'm not sure it's good enough. There are lots of reasons
why two processes that are started at the same time with the same
amount of work might not finish at the same time.

I'm also not particularly excited about having the server do the
sorting based on file size.  Seems like that ought to be the client's
job, if the client needs the sorting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-10-07T08:52:36Z

Thanks, Asif, for the patch.  I am picking this up for review.  The patch
is a bit big, so here are some very initial comments to make the review
process easier.

1) The patch seems to do a lot of code shuffling. I think it would be
easier to review if you split the cleanup out into a separate patch.

Example:
a: setup_throttle
b: include_wal_files

2) As far as I can see, this patch basically has three major parts.

a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP.
b) Implementation of actual parallel backup.
c) Testcase

I would suggest breaking these out into three separate patches; that
would make the patch easier to review.

3) In your patch you are preparing the backup manifest (a file giving
information about the data files). Robert Haas submitted the backup
manifests patch on another thread [1], and I think we should use that
patch to get the backup manifests for parallel backup.

Further, I will continue to review the patch, but meanwhile it would make
the review process easier if you could split the patches.

[1]
https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com

Thanks,

-- 
Rushabh Lathia

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-10-07T12:48:12Z

On Mon, Oct 7, 2019 at 1:52 PM Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

> Thanks Asif for the patch.  I am opting this for a review.  Patch is
> bit big, so here are very initial comments to make the review process
> easier.
>

Thanks Rushabh for reviewing the patch.

> 1) Patch seems doing lot of code shuffling, I think it would be easy
> to review if you can break the clean up patch separately.
>
> Example:
> a: setup_throttle
> b: include_wal_files
>
> 2) As I can see this patch basically have three major phase.
>
> a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
> STOP_BACKUP.
> b) Implementation of actual parallel backup.
> c) Testcase
>
> I would suggest, if you can break out in three as a separate patch that
> would be nice.  It will benefit in reviewing the patch.
>

Sure, why not. I will break them into multiple patches.

>
> 3) In your patch you are preparing the backup manifest (file which
> giving the information about the data files). Robert Haas, submitted
> the backup manifests patch on another thread [1], and I think we
> should use that patch to get the backup manifests for parallel backup.
>

Sure. However, the backup manifest patch calculates the checksum of each
backup file and adds it to the manifest while the file is being
transferred to the frontend. The manifest file itself is copied at the
very end of the backup. In parallel backup, I need the list of filenames
before the file contents are transferred, in order to divide them among
multiple workers. For that, the manifest file has to be available when
START_BACKUP is called.

That means the backup manifest should support being created without the
checksums during START_BACKUP.
I also need the directory information, for two reasons:

- In plain format, the base path has to exist before we can write the
file. We can extract the base path from the file name, but doing that
for all files does not seem like a good idea.
- Base backup does not include the contents of some directories, but
those directories, although empty, are still expected in PGDATA.
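
To make the two points concrete, here is a small sketch of what a client could do if the listing carried directory entries explicitly. The (path, is_dir) tuple layout and the sample paths are assumptions made for the example, not the patch's format.

```python
import os
import tempfile

def create_directories(target, entries):
    """Create every directory named in the listing, including empty ones.

    entries: iterable of (path, is_dir) pairs; this layout is hypothetical.
    """
    for path, is_dir in entries:
        if is_dir:
            os.makedirs(os.path.join(target, path), exist_ok=True)

entries = [
    ("base", True),
    ("base/1", True),
    ("pg_wal", True),          # listed even though its contents may be skipped
    ("pg_replslot", True),
    ("base/1/1259", False),    # a regular file; its parent already exists
]

with tempfile.TemporaryDirectory() as target:
    create_directories(target, entries)
    made = sorted(os.listdir(target))
print(made)
```

With explicit directory entries, the client creates each directory once up front, instead of deriving a parent path from every file name, and empty directories are not lost.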

I can make these changes part of parallel backup (which would then sit on
top of the backup manifest patch), or they can be done as part of the
manifest patch and parallel backup can then use them.

Robert, what do you suggest?

-- 
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2019-10-07T13:05:34Z

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
> while the file is being transferred to the frontend-end. The manifest file itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
> order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
>  is called.
>
> That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
> I also need the directory information as well for two reasons:
>
> - In plain format, base path has to exist before we can write the file. we can extract the base path from the file
> but doing that for all files does not seem a good idea.
> - base backup does not include the content of some directories but those directories although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use them.
>
> Robert what do you suggest?

I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.  I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.
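
A client consuming such a result set might look roughly like this. It is a sketch only: the FileEntry type, the plan() helper, and the sample rows are invented for illustration, and the column order simply follows the four columns listed above.

```python
from typing import NamedTuple

class FileEntry(NamedTuple):
    name: str
    kind: str    # 'd' for directory, 'f' for file
    size: int    # bytes
    mtime: int   # seconds since the epoch; ignored by this client

def plan(rows):
    """Split the listing into directories to create and files to fetch."""
    entries = [FileEntry(*row) for row in rows]
    dirs = [e.name for e in entries if e.kind == "d"]
    # Sorting by size is done here, on the client, not by the server.
    files = sorted((e for e in entries if e.kind == "f"),
                   key=lambda e: e.size, reverse=True)
    return dirs, [e.name for e in files]

rows = [
    ("base", "d", 0, 1570000000),
    ("base/1", "d", 0, 1570000000),
    ("base/1/1259", "f", 106496, 1570000100),
    ("base/1/16384", "f", 1073741824, 1570000200),
    ("PG_VERSION", "f", 3, 1570000000),
]

dirs, files = plan(rows)
print(dirs)
print(files)
```

Because the size is part of each row, a client that wants largest-first ordering can sort locally, and a client that does not care can skip it.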

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other.  I think
that's probably a good idea, but not sure.

I still think that the files should be requested one at a time, not a
huge long list in a single command.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-10-07T13:35:19Z

On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> > Sure. Though the backup manifest patch calculates and includes the
> checksum of backup files and is done
> > while the file is being transferred to the frontend-end. The manifest
> file itself is copied at the
> > very end of the backup. In parallel backup, I need the list of filenames
> before file contents are transferred, in
> > order to divide them into multiple workers. For that, the manifest file
> has to be available when START_BACKUP
> >  is called.
> >
> > That means, backup manifest should support its creation while excluding
> the checksum during START_BACKUP().
> > I also need the directory information as well for two reasons:
> >
> > - In plain format, base path has to exist before we can write the file.
> we can extract the base path from the file
> > but doing that for all files does not seem a good idea.
> > - base backup does not include the content of some directories but those
> directories although empty, are still
> > expected in PGDATA.
> >
> > I can make these changes part of parallel backup (which would be on top
> of backup manifest patch) or
> > these changes can be done as part of manifest patch and then parallel
> can use them.
> >
> > Robert what do you suggest?
>
> I think we should probably not use backup manifests here, actually. I
> initially thought that would be a good idea, but after further thought
> it seems like it just complicates the code to no real benefit.


Okay.


>   I
> suggest that the START_BACKUP command just return a result set, like a
> query, with perhaps four columns: file name, file type ('d' for
> directory or 'f' for file), file size, file mtime. pg_basebackup will
> ignore the mtime, but some other tools might find that useful
> information.
>
Yes, the current patch already returns a result set. I will add the
additional information.


> I wonder if we should also split START_BACKUP (which should enter
> non-exclusive backup mode) from GET_FILE_LIST, in case some other
> client program wants to use one of those but not the other.  I think
> that's probably a good idea, but not sure.
>

Currently pg_basebackup does not enter exclusive backup mode, and other
tools have to use the pg_start_backup() and pg_stop_backup() functions to
achieve that. Since we are breaking the backup into multiple commands, I
believe it would be a good idea to have this option. I will include it in
the next revision of this patch.


>
> I still think that the files should be requested one at a time, not a
> huge long list in a single command.
>
sure, will make the change.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Ibrar Ahmed <ibrar.ahmad@gmail.com> — 2019-10-07T13:43:22Z

On Mon, Oct 7, 2019 at 6:06 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> > Sure. Though the backup manifest patch calculates and includes the
> checksum of backup files and is done
> > while the file is being transferred to the frontend-end. The manifest
> file itself is copied at the
> > very end of the backup. In parallel backup, I need the list of filenames
> before file contents are transferred, in
> > order to divide them into multiple workers. For that, the manifest file
> has to be available when START_BACKUP
> >  is called.
> >
> > That means, backup manifest should support its creation while excluding
> the checksum during START_BACKUP().
> > I also need the directory information as well for two reasons:
> >
> > - In plain format, base path has to exist before we can write the file.
> we can extract the base path from the file
> > but doing that for all files does not seem a good idea.
> > - base backup does not include the content of some directories but those
> directories although empty, are still
> > expected in PGDATA.
> >
> > I can make these changes part of parallel backup (which would be on top
> of backup manifest patch) or
> > these changes can be done as part of manifest patch and then parallel
> can use them.
> >
> > Robert what do you suggest?
>
> I think we should probably not use backup manifests here, actually. I
> initially thought that would be a good idea, but after further thought
> it seems like it just complicates the code to no real benefit.  I
> suggest that the START_BACKUP command just return a result set, like a
> query, with perhaps four columns: file name, file type ('d' for
> directory or 'f' for file), file size, file mtime. pg_basebackup will
> ignore the mtime, but some other tools might find that useful
> information.
>
> I wonder if we should also split START_BACKUP (which should enter
> non-exclusive backup mode) from GET_FILE_LIST, in case some other
> client program wants to use one of those but not the other.  I think
> that's probably a good idea, but not sure.
>
> I still think that the files should be requested one at a time, not a
> huge long list in a single command.
>

What about having an API to get either a single file or a list of files?
We would use the single-file form in our application, and other tools
could benefit from the list-of-files form.

-- 
Ibrar Ahmed

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2019-10-07T13:47:29Z

On Mon, Oct 7, 2019 at 9:43 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> What about having an API to get either a single file or a list of files? We would use the single-file form in
> our application, and other tools could benefit from the list-of-files form.

That sounds a bit speculative to me. Who is to say that anyone will
find that useful? I mean, I think it's fine and good to build the
functionality that we need in a way that maximizes the likelihood that
other tools can reuse that functionality, and I think we should do
that. But I don't think it's smart to build functionality that we
don't really need in the hope that somebody else will find it useful
unless we're pretty sure that they actually will. I don't see that as
being the case here; YMMV.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-10-16T13:19:10Z

I have refactored the functionality into multiple smaller patches in order
to make the review process easier. I have divided the code into backend
changes and pg_basebackup changes. The backend replication system now
supports the following commands:

- START_BACKUP
- SEND_FILE_LIST
- SEND_FILES_CONTENT
- STOP_BACKUP

START_BACKUP no longer returns the list of files; SEND_FILE_LIST is used
for that instead. START_BACKUP now calls pg_start_backup and returns the
starting WAL position, the tablespace header information, and the content
of the backup label file. Initially I was using temporary files to store
the backup_label content, but that turned out to be a bad idea, because
there can be multiple non-exclusive backups running at the same time. The
backup label information is needed by stop_backup, so pg_basebackup will
send it as part of STOP_BACKUP.

SEND_FILE_LIST returns the list of files as a result set with four
columns (filename, type, size, mtime).
SEND_FILES_CONTENT can now return either a single file or multiple files,
as required. Supporting both does not require much change, so I believe
it will be more usable this way if other tools want to utilise it.
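
As a rough illustration of how a client tool might consume the four-column
result set described above, the sketch below converts one row, given as
text values, into a struct. The names FileListRow, parse_file_list_row and
SKETCH_MAXPATH are hypothetical, and the assumption that size and mtime
arrive as non-negative decimal text is mine; the patch may represent them
differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAXPATH 1024     /* stand-in for MAXPGPATH (assumption) */

/* Hypothetical client-side view of one SEND_FILE_LIST row. */
typedef struct FileListRow
{
    char        name[SKETCH_MAXPATH];
    char        type;           /* 'd' = directory, 'f' = file */
    int64_t     size;
    int64_t     mtime;          /* seconds since the epoch (assumed) */
} FileListRow;

/* Parse a non-negative decimal integer; returns false on any garbage. */
static bool
parse_int64(const char *text, int64_t *out)
{
    char       *end;
    long long   val;

    if (text == NULL || *text == '\0')
        return false;
    errno = 0;
    val = strtoll(text, &end, 10);
    if (errno != 0 || *end != '\0' || val < 0)
        return false;
    *out = (int64_t) val;
    return true;
}

/* Convert the four text columns of one row; returns false if malformed. */
static bool
parse_file_list_row(const char *name, const char *type,
                    const char *size, const char *mtime,
                    FileListRow *row)
{
    assert(row != NULL);

    if (name == NULL || strlen(name) >= sizeof(row->name))
        return false;
    if (type == NULL || (strcmp(type, "d") != 0 && strcmp(type, "f") != 0))
        return false;
    if (!parse_int64(size, &row->size) || !parse_int64(mtime, &row->mtime))
        return false;

    strcpy(row->name, name);
    row->type = type[0];
    return true;
}
```

If the rows come back as an ordinary libpq result, the four arguments
would presumably be PQgetvalue(res, i, 0) through PQgetvalue(res, i, 3).
In plain format, a client could create the 'd' entries first so that the
base path exists before any file is written.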

As per Robert's suggestion, I am currently working on changing
pg_basebackup to fetch files one by one. However, that is not complete
yet, and the attached patch still uses the old method of multi-file
fetching to test the backend commands. I will send an updated patch that
fetches files one by one.

I wanted to share the backend patch to get some feedback in the meantime.

Thanks,

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> — 2019-10-16T20:32:56Z

I quickly tried to have a look at your 0001-refactor patch.
Here are some comments:

1. The patch fails to compile.

Sorry if I am missing something, but I am not able to understand why, in
the new function collectTablespaces(), you have added an extra NULL
parameter when calling sendTablespace(). It fails the compilation:

+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;


gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
-Wno-unused-command-line-argument -g -g -O0 -Wall -Werror
-I../../../../src/include    -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po
xlog.c:12253:59: error: too many arguments to function call, expected 2, have 3
                ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
                                         ~~~~~~~~~~~~~~                 ^~~~

2. I think the patch needs to be run through pgindent. It does not follow
the 80-column width.
e.g.

+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool infotbssize, bool needtblspcmapfile)
+{

3.
The comments in the refactored code appear to be redundant. For example,
the following comment:
 /* Setup and activate network throttling, if client requested it */
appears three times in the code: before the call to setup_throttle(), in
the prologue of setup_throttle(), and above the if() in that function.
Similarly, the comment:
/* Collect information about all tablespaces */
in collectTablespaces().

4.
In function include_wal_files(), why is the TimeLineID parameter, i.e.
endtli, needed? I don't see it being used in the function at all. I think
you can safely get rid of it.

+include_wal_files(XLogRecPtr endptr, TimeLineID endtli)

Regards,
Jeevan Ladhe


Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-10-17T05:21:15Z

On Thu, Oct 17, 2019 at 1:33 AM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
wrote:

> I quickly tried to have a look at your 0001-refactor patch.
> Here are some comments:
>
> 1. The patch fails to compile.
>
> Sorry if I am missing something, but am not able to understand why in new
> function collectTablespaces() you have added an extra parameter NULL while
> calling sendTablespace(), it fails the compilation :
>
> + ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
>
>
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
> -Wno-unused-command-line-argument -g -g -O0 -Wall -Werror
> -I../../../../src/include    -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po
> xlog.c:12253:59: error: too many arguments to function call, expected 2,
> have 3
>                 ti->size = infotbssize ? sendTablespace(fullpath, true,
> NULL) : -1;
>                                          ~~~~~~~~~~~~~~
> ^~~~
>
> 2. I think the patch needs to run via pg_indent. It does not follow 80
> column
> width.
> e.g.
>
> +void
> +collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool
> infotbssize, bool needtblspcmapfile)
> +{
>
> 3.
> The comments in re-factored code appear to be redundant. example:
> Following comment:
>  /* Setup and activate network throttling, if client requested it */
> appears thrice in the code, before calling setup_throttle(), in the
> prologue of
> the function setup_throttle(), and above the if() in that function.
> Similarly - the comment:
> /* Collect information about all tablespaces */
> in collectTablespaces().
>
> 4.
> In function include_wal_files() why is the parameter TimeLineID i.e. endtli
> needed. I don't see it being used in the function at all. I think you can
> safely
> get rid of it.
>
> +include_wal_files(XLogRecPtr endptr, TimeLineID endtli)
>
>
Thanks Jeevan. Some changes that should have been part of the 2nd patch
were left in the 1st. I have fixed that, as well as the issues mentioned
above. Attached are the updated patches.

Thanks,

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-10-18T11:11:53Z

On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
> Attached are the updated patches.
>

I had a quick look over these changes and they look good overall.
However, here are a few review comments I noted while glancing through
patches 0002 and 0003.

--- 0002 patch

1.
Can the lsn option be renamed to start-wal-location? That would be
clearer.

2.
+typedef struct
+{
+    char        name[MAXPGPATH];
+    char        type;
+    int32        size;
+    time_t        mtime;
+} BackupFile;

I think it will be good if we keep this structure in a common place so that
the client can also use it.

3.
+    SEND_FILE_LIST,
+    SEND_FILES_CONTENT,
Can the above two commands be renamed to SEND_BACKUP_MANIFEST and
SEND_BACKUP_FILE respectively?
The reason for the first rename is that we are not getting only the file
list here; we are getting a few more details along with it. The second
rename would bring it in line with
START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.

4.
Typos:
non-exlusive => non-exclusive
retured => returned
optionaly => optionally
nessery => necessary
totoal => total

--- 0003 patch

1.
+static int
+simple_list_length(SimpleStringList *list)
+{
+    int            len = 0;
+    SimpleStringListCell *cell;
+
+    for (cell = list->head; cell; cell = cell->next, len++)
+        ;
+
+    return len;
+}

I think it will be good if it goes to simple_list.c. That will help in other
usages as well.

2.
Please revert these unnecessary changes:

@@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
              */
             snprintf(filename, sizeof(filename), "%s/%s", current_path,
                      copybuf);
+
             if (filename[strlen(filename) - 1] == '/')
             {
                 /*

@@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
                      * can map them too.)
                      */
                     filename[strlen(filename) - 1] = '\0';    /* Remove trailing slash */
-
                     mapped_tblspc_path = get_tablespace_mapping(&copybuf[157]);
+
                     if (symlink(mapped_tblspc_path, filename) != 0)
                     {
                         pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",

3.
Typos:
retrive => retrieve
takecare => take care
tablespae => tablespace

4.
The ParallelBackupEnd() function does not do anything related to
parallelism. Would it be better to just rename it to EndBackup()?

5.
To pass a tablespace path to the server in SEND_FILES_CONTENT, you are
reusing the LABEL option, which seems odd. How about adding a new option
for that?

6.
It would be good to have comments in the function prologue explaining
what each function actually does, for functions like:
GetBackupFilesList()
ReceiveFiles()
create_workers_and_fetch()

Thanks


-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Ibrar Ahmed <ibrar.ahmad@gmail.com> — 2019-10-24T10:19:08Z

I had a detailed discussion with Robert Haas at PostgreConf Europe about
parallel backup.
We discussed the current state of the patch and what needs to be done to
get the patch committed.

- The current patch uses processes to implement parallelism. There are
several reasons to use threads instead. To start with, since this is a
client utility it makes more sense to use threads. Data needs to be
shared among the worker threads and the main process, and handling that
is simpler with threads than with interprocess communication.

- Fetching a single file versus multiple files was also discussed. We
concluded that we need to benchmark to see whether disk I/O is a
bottleneck and whether parallel writing gives us any benefit. This
benchmark needs to be run on different hardware and different networks to
identify the real bottlenecks. In general, we agreed that we could start
with fetching one file at a time, and revisit that after the benchmarks
are done.

- There is also an ongoing debate in this thread about whether we should
have one single tar file for all files or one tar file per thread. I
really want a single tar file, because the main purpose of the tar file
is to reduce the management of multiple files; with one file per thread
we end up with many tar files. Therefore we need one master thread that
is responsible for writing to the tar file, while all the other threads
receive data from the network and stream it to the master thread. This
also supports using a thread-based model rather than a process-based one,
because the latter would require too much data sharing between processes.
If we cannot achieve this, then we can disable the tar option for
parallel backup in the first version.

- For data sharing, we need to avoid unnecessary locking, and a more
suitable algorithm for solving the reader-writer problem is required.
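
To make the single-writer idea above a little more concrete, here is a
minimal sketch of one possible arrangement: receiver threads push chunks
into a bounded queue and one writer thread drains it. This is only an
illustration under my own assumptions, not what the patch does. All names
(ChunkQueue, queue_put, queue_get, run_pipeline) are hypothetical, the
queue carries only chunk lengths rather than data buffers, and a plain
mutex with two condition variables stands in for whatever "more suitable
algorithm" is eventually chosen.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 4
#define CHUNKS_PER_RECEIVER 10
#define CHUNK_BYTES 512

/* Bounded queue of chunk lengths; a real one would carry data buffers. */
typedef struct ChunkQueue
{
    size_t      slots[QUEUE_SLOTS];
    int         head;
    int         count;
    bool        closed;
    size_t      bytes_written;  /* touched only by the writer thread */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} ChunkQueue;

static void
queue_put(ChunkQueue *q, size_t len)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SLOTS)     /* block while the queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[(q->head + q->count) % QUEUE_SLOTS] = len;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Take one chunk; returns false once the queue is closed and drained. */
static bool
queue_get(ChunkQueue *q, size_t *len)
{
    bool        got = false;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count > 0)
    {
        *len = q->slots[q->head];
        q->head = (q->head + 1) % QUEUE_SLOTS;
        q->count--;
        got = true;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return got;
}

/* Receiver: stands in for a thread reading file data from the network. */
static void *
receiver_main(void *arg)
{
    for (int i = 0; i < CHUNKS_PER_RECEIVER; i++)
        queue_put(arg, CHUNK_BYTES);
    return NULL;
}

/* Writer: the only thread that would append to the tar file. */
static void *
writer_main(void *arg)
{
    ChunkQueue *q = arg;
    size_t      len;

    while (queue_get(q, &len))
        q->bytes_written += len;
    return NULL;
}

/* Run nreceivers receivers against one writer; returns bytes "written". */
static size_t
run_pipeline(int nreceivers)
{
    ChunkQueue  q = {.head = 0};
    pthread_t   writer;
    pthread_t   receivers[16];

    assert(nreceivers > 0 && nreceivers <= 16);
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);
    pthread_create(&writer, NULL, writer_main, &q);
    for (int i = 0; i < nreceivers; i++)
        pthread_create(&receivers[i], NULL, receiver_main, &q);
    for (int i = 0; i < nreceivers; i++)
        pthread_join(receivers[i], NULL);
    /* All data is queued: mark the end so the writer can drain and exit. */
    pthread_mutex_lock(&q.lock);
    q.closed = true;
    pthread_cond_broadcast(&q.not_empty);
    pthread_mutex_unlock(&q.lock);
    pthread_join(writer, NULL);
    return q.bytes_written;
}
```

Note that this sketch lets chunks from different receivers interleave
freely. A real tar writer would additionally have to keep each file's
header and data contiguous in the output, which is part of why disabling
the tar format in the first version may be the simpler route.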

-- 
Ibrar Ahmed

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-10-24T11:24:41Z

On Thu, Oct 24, 2019 at 3:21 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

> I had a detailed discussion with Robert Haas at PostgreConf Europe about
> parallel backup.
> We discussed the current state of the patch and what needs to be done to
> get the patch committed.
>
> - The current patch uses a process to implement parallelism. There are many
> reasons we need to use threads instead of processes. To start with, as
> this is a client utility it makes
> more sense to use threads. The data needs to be shared amongst different
> threads and the main process,
> handling that is simpler as compared to interprocess communication.
>

Yes, I agree. I have already converted the code to use threads instead of
processes. This avoids the overhead of interprocess communication.

The single-file fetching strategy requires communication between the
competing workers. In a multiprocess application that would mean IPC; the
current approach of using multiple threads instead of processes avoids
this overhead.


> - Fetching a single file or multiple files was also discussed. We
> concluded in our discussion that we
> need to benchmark to see if disk I/O is a bottleneck or not and if
> parallel writing gives us
> any benefit. This benchmark needs to be done on different hardware and
> different
> network to identify which are the real bottlenecks. In general, we agreed
> that we could start with fetching
> one file at a time but that will be revisited after the benchmarks are
> done.
>

I'll share the updated patch in the next couple of days. After that, I'll
work on benchmarking it in the different environments that I have.


>
> - There is also an ongoing debate in this thread that we should have one
> single tar file for all files or one
> TAR file per thread. I really want to have a single tar file because the
> main purpose of the TAR file is to
> reduce the management of multiple files, but in case of one file per
> thread, we end up with many tar
> files. Therefore we need to have one master thread which is responsible
> for writing on tar file and all
> the other threads will receive the data from the network and stream to the
> master thread. This also
> supports the idea of using a thread-based model rather than a
> process-based approach because it
> requires too much data sharing between processes. If we cannot achieve
> this, then we can disable the
> TAR option for parallel backup in the first version.
>

I am in favour of disabling the tar format for the first version of
parallel backup.


> - In the case of data sharing, we need to try to avoid unnecessary locking
> and more suitable algorithm to
> solve the reader-writer problem is required.
>
> --
> Ibrar Ahmed
>


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-10-28T14:03:33Z

I have updated the patch to include the changes suggested by Jeevan. This
patch also implements thread workers instead of processes and fetches a
single file at a time. The tar format has been disabled for the first
version of parallel backup.

Converting the previous process-based application to the current
thread-based one required a slight modification to the data structures
and the addition of a few new functions and progress reporting
functionality.

The core data structure remains intact: a tablespace-based file listing
is still maintained. However, we now also maintain a flat list of all
files (pointers to the FileInfo structures, so no data is duplicated) so
that workers can access them sequentially without adding much
processing to the critical section. The critical section for thread
workers is currently limited to incrementing the file index within the
list of files.
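
As a rough illustration of that design (not the code from the patch), the
sketch below shows a worker claiming files from a shared flat list, where
the only work done under the lock is bumping the index. The FileInfo fields,
WorkQueue, WorkerArg, claim_next_file and worker_main are hypothetical names
made up for this example.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's FileInfo structure. */
typedef struct FileInfo
{
    const char *path;
    int         fetched_by;     /* id of the worker that claimed it, -1 if none */
} FileInfo;

/* Shared state: a flat array of pointers into the per-tablespace lists. */
typedef struct WorkQueue
{
    FileInfo      **files;
    size_t          nfiles;
    size_t          next;       /* index of the next unclaimed file */
    pthread_mutex_t lock;
} WorkQueue;

typedef struct WorkerArg
{
    WorkQueue  *queue;
    int         id;
    size_t      nclaimed;
} WorkerArg;

/* Claim the next file; the critical section only increments the index. */
static FileInfo *
claim_next_file(WorkQueue *q)
{
    FileInfo   *fi = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->next < q->nfiles)
        fi = q->files[q->next++];
    pthread_mutex_unlock(&q->lock);
    return fi;
}

/* Thread entry point; the signature matches what pthread_create() expects. */
static void *
worker_main(void *p)
{
    WorkerArg  *arg = p;
    FileInfo   *fi;

    while ((fi = claim_next_file(arg->queue)) != NULL)
    {
        /* A real worker would request and write the file here, unlocked. */
        fi->fetched_by = arg->id;
        arg->nclaimed++;
    }
    return NULL;
}
```

Each worker would be started with pthread_create(&tid, NULL, worker_main,
&arg) and joined once the list is exhausted. Because the transfer itself
happens outside the lock, contention should stay low even with many workers.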


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2019-10-28T15:28:49Z

On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patch to include the changes suggested by Jeevan. This patch also implements the thread workers instead of
> processes and fetches a single file at a time. The tar format has been disabled for first version of parallel backup.

Looking at 0001-0003:

It's not clear to me what the purpose of the start WAL location is
supposed to be. As far as I can see, SendBackupFiles() stores it in a
variable which is then used for exactly nothing, and nothing else uses
it. It seems like that would be part of a potential incremental
backup feature, but I don't see what it's got to do with parallel full
backup.

The tablespace_path option appears entirely unused, and I don't know
why that should be necessary here, either.

STORE_BACKUPFILE() seems like maybe it should be a function rather
than a macro, and also probably be renamed, because it doesn't store
files and the argument's not necessarily a file.

SendBackupManifest() does not send a backup manifest in the sense
contemplated by the email thread on that subject. It sends a file
list. That seems like the right idea - IMHO, anyway - but you need to
do a thorough renaming.

I think it would be fine to decide that this facility won't support
exclusive-mode backup.

I don't think much of having both sendDir() and sendDir_(). The latter
name is inconsistent with any naming convention we have, and there
seems to be no reason not to just add an argument to sendDir() and
change the callers.

I think we should rename - perhaps as a preparatory patch - the
sizeonly flag to dryrun, or something like that.

The resource cleanup does not look right. You've included calls to
PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
and StopBackup(), but what happens if there is an error or even a
clean shutdown of the connection in between? I think that there needs
to be some change here to ensure that a walsender will always call
base_backup_cleanup() when it exits; I think that'd probably remove
the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
we have already. This might also be something that could be done as a
separate, prepatory refactoring patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-10-30T14:16:11Z

On Mon, Oct 28, 2019 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
> > I have updated the patch to include the changes suggested by Jeevan.
> This patch also implements the thread workers instead of
> > processes and fetches a single file at a time. The tar format has been
> disabled for first version of parallel backup.
>
> Looking at 0001-0003:
>
> It's not clear to me what the purpose of the start WAL location is
> supposed to be. As far as I can see, SendBackupFiles() stores it in a
> variable which is then used for exactly nothing, and nothing else uses
> it.  It seems like that would be part of a potential incremental
> backup feature, but I don't see what it's got to do with parallel full
> backup.
>

'startptr' is used by sendFile() during checksum verification. Since
SendBackupFiles() calls sendFile(), we have to set a valid WAL location.


> The tablespace_path option appears entirely unused, and I don't know
> why that should be necessary here, either.
>

This is to calculate the basepathlen. We need to exclude the tablespace
location (or base path) from the filename before it is sent to the client
with the sendFile call. I added this option primarily to avoid performing
string manipulation on the filename to extract the tablespace location and
then calculate the basepathlen.

Alternatively, we can do it by extracting the base path from the received
filename. What do you suggest?
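
To make the basepathlen idea concrete, here is a minimal sketch, assuming
the base path is already known as a string. strip_base_path is a
hypothetical helper written for this mail, not a function from the patch;
it mirrors the "pathbuf + basepathlen + 1" arithmetic used elsewhere in
basebackup.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the part of 'filename' that follows 'basepath' and its separator,
 * or NULL if 'filename' does not live under 'basepath'.
 */
static const char *
strip_base_path(const char *filename, const char *basepath)
{
    size_t      basepathlen = strlen(basepath);

    /* Ignore trailing slashes on the base path. */
    while (basepathlen > 0 && basepath[basepathlen - 1] == '/')
        basepathlen--;

    if (strncmp(filename, basepath, basepathlen) != 0)
        return NULL;

    /* Require a separator so "/mnt/ts1" does not match "/mnt/ts10/...". */
    if (filename[basepathlen] != '/')
        return NULL;

    return filename + basepathlen + 1;
}
```

For example, stripping "/mnt/ts1" from "/mnt/ts1/PG_13/16384/1234" leaves
"PG_13/16384/1234", which is the name the client would see.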


>
> STORE_BACKUPFILE() seems like maybe it should be a function rather
> than a macro, and also probably be renamed, because it doesn't store
> files and the argument's not necessarily a file.
>
Sure.


>
> SendBackupManifest() does not send a backup manifest in the sense
> contemplated by the email thread on that subject.  It sends a file
> list.  That seems like the right idea - IMHO, anyway - but you need to
> do a thorough renaming.
>

I'm considering the following command names:
START_BACKUP
- Starts the backup process

SEND_BACKUP_FILELIST (Instead of SEND_BACKUP_MANIFEST)
- Sends the list of all files (along with file information such as
filename, file type (directory/file/link),
file size and file mtime for each file) to be backed up.

SEND_BACKUP_FILES
- Sends one or more files to the client.

STOP_BACKUP
- Stops the backup process.

I'll update the function names accordingly after your confirmation. Of
course, suggestions for
better names are welcome.


>
> I think it would be fine to decide that this facility won't support
> exclusive-mode backup.
>

Sure. Will drop this patch.


>
> I don't think much of having both sendDir() and sendDir_(). The latter
> name is inconsistent with any naming convention we have, and there
> seems to be no reason not to just add an argument to sendDir() and
> change the callers.


> I think we should rename - perhaps as a preparatory patch - the
> sizeonly flag to dryrun, or something like that.
>

Sure, will take care of it.


> The resource cleanup does not look right.  You've included calls to
> PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
> and StopBackup(), but what happens if there is an error or even a
> clean shutdown of the connection in between? I think that there needs
> to be some change here to ensure that a walsender will always call
> base_backup_cleanup() when it exits; I think that'd probably remove
> the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
> we have already.  This might also be something that could be done as a
> separate, prepatory refactoring patch.
>

You're right. I didn't handle this case properly. I will remove the
PG_ENSURE_ERROR_CLEANUP calls and replace them with a before_shmem_exit
handler. This way, whenever the backend process exits,
base_backup_cleanup will be called:
- If it exits before calling do_pg_stop_backup, base_backup_cleanup
will take care of the cleanup.
- Otherwise, in case of a clean shutdown (after calling do_pg_stop_backup),
base_backup_cleanup will simply return without doing anything.
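
The intended behaviour can be modelled in plain C as below. This is only a
simplified model of the idea, not PostgreSQL code: BackupSession,
register_exit_callback, model_base_backup_cleanup and model_process_exit
are hypothetical names, and the real before_shmem_exit callback takes
different arguments.

```c
#include <stdbool.h>

/* Hypothetical model of the state the cleanup callback would inspect. */
typedef struct BackupSession
{
    bool        backup_started;
    bool        backup_stopped;
    int         cleanups_run;
} BackupSession;

typedef void (*exit_callback) (BackupSession *);

#define MAX_CALLBACKS 8
static exit_callback callbacks[MAX_CALLBACKS];
static int  ncallbacks = 0;

/* Register a callback to run when the (modelled) process exits. */
static bool
register_exit_callback(exit_callback cb)
{
    if (ncallbacks >= MAX_CALLBACKS)
        return false;
    callbacks[ncallbacks++] = cb;
    return true;
}

/* Clean up only if a backup was started and never stopped. */
static void
model_base_backup_cleanup(BackupSession *s)
{
    if (!s->backup_started || s->backup_stopped)
        return;                 /* clean shutdown: nothing to do */
    s->backup_started = false;
    s->cleanups_run++;
}

/* Run callbacks in reverse registration order, on every kind of exit. */
static void
model_process_exit(BackupSession *s)
{
    for (int i = ncallbacks - 1; i >= 0; i--)
        callbacks[i] (s);
}
```

The point of the model is that the callback is registered once and runs on
every exit path, so an error or disconnect between the start and stop
commands is covered without wrapping each command in its own cleanup block.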


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-11-01T15:26:02Z

On Wed, Oct 30, 2019 at 7:16 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Mon, Oct 28, 2019 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
>> On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>> > I have updated the patch to include the changes suggested by Jeevan.
>> This patch also implements the thread workers instead of
>> > processes and fetches a single file at a time. The tar format has been
>> disabled for first version of parallel backup.
>>
>> Looking at 0001-0003:
>>
>> It's not clear to me what the purpose of the start WAL location is
>> supposed to be. As far as I can see, SendBackupFiles() stores it in a
>> variable which is then used for exactly nothing, and nothing else uses
>> it.  It seems like that would be part of a potential incremental
>> backup feature, but I don't see what it's got to do with parallel full
>> backup.
>>
>
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.
>
>
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>>
>
> This is to calculate the basepathlen. We need to exclude the tablespace
> location (or
> base path) from the filename before it is sent to the client with sendFile
> call. I added
> this option primarily to avoid performing string manipulation on filename
> to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received
> filename. What
> do you suggest?
>
>
>>
>> STORE_BACKUPFILE() seems like maybe it should be a function rather
>> than a macro, and also probably be renamed, because it doesn't store
>> files and the argument's not necessarily a file.
>>
> Sure.
>
>
>>
>> SendBackupManifest() does not send a backup manifest in the sense
>> contemplated by the email thread on that subject.  It sends a file
>> list.  That seems like the right idea - IMHO, anyway - but you need to
>> do a thorough renaming.
>>
>
> I'm considering the following command names:
> START_BACKUP
> - Starts the backup process
>
> SEND_BACKUP_FILELIST (Instead of SEND_BACKUP_MANIFEST)
> - Sends the list of all files (along with file information such as
> filename, file type (directory/file/link),
> file size and file mtime for each file) to be backed up.
>
> SEND_BACKUP_FILES
> - Sends one or more files to the client.
>
> STOP_BACKUP
> - Stops the backup process.
>
> I'll update the function names accordingly after your confirmation. Of
> course, suggestions for
> better names are welcome.
>
>
>>
>> I think it would be fine to decide that this facility won't support
>> exclusive-mode backup.
>>
>
> Sure. Will drop this patch.
>
>
>>
>> I don't think much of having both sendDir() and sendDir_(). The latter
>> name is inconsistent with any naming convention we have, and there
>> seems to be no reason not to just add an argument to sendDir() and
>> change the callers.
>
>
>> I think we should rename - perhaps as a preparatory patch - the
>> sizeonly flag to dryrun, or something like that.
>>
>
> Sure, will take care of it.
>
>
>> The resource cleanup does not look right.  You've included calls to
>> PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
>> and StopBackup(), but what happens if there is an error or even a
>> clean shutdown of the connection in between? I think that there needs
>> to be some change here to ensure that a walsender will always call
>> base_backup_cleanup() when it exits; I think that'd probably remove
>> the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
>> we have already.  This might also be something that could be done as a
>> separate, prepatory refactoring patch.
>>
>
> You're right. I didn't handle this case properly. I will removed
> PG_ENSURE_ERROR_CLEANUP
> calls and replace it with before_shmem_exit handler. This way
> whenever backend process exits,
> base_backup_cleanup will be called:
> - If it exists before calling the do_pg_stop_backup, base_backup_cleanup
> will take care of cleanup.
> - otherwise in case of a clean shutdown (after calling do_pg_stop_backup)
> then base_backup_cleanup
> will simply return without doing anything.
>
>
>
The updated patches are attached.

Thanks,

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2019-11-01T15:52:51Z

On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.

Ugh, global variables.

Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.

You should add docs, too.  I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)

>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>
> This is to calculate the basepathlen. We need to exclude the tablespace location (or
> base path) from the filename before it is sent to the client with sendFile call. I added
> this option primarily to avoid performing string manipulation on filename to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received filename. What
> do you suggest?

I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-11-04T13:08:55Z

On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
> > 'startptr' is used by sendFile() during checksum verification. Since
> > SendBackupFiles() is using sendFIle we have to set a valid WAL location.
>
> Ugh, global variables.
>
> Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
> STOP_BACKUP all using the same base_backup_opt_list production as
> BASE_BACKUP? Presumably most of those options are not applicable to
> most of those commands, and the productions should therefore be
> separated.
>

Are you expecting something like the attached patch? Basically, I have
reorganised the grammar rules so that each command has the options it
requires.

I was a bit reluctant to make this change because it may add some unwanted
rules to the replication grammar. Since these commands use the same
options as base backup, maybe we could throw an error inside the relevant
functions on unwanted options instead?



> You should add docs, too.  I wouldn't have to guess what some of this
> stuff was for if you wrote documentation explaining what this stuff
> was for. :-)
>

Yes, I will add them in the next patch.


>
> >> The tablespace_path option appears entirely unused, and I don't know
> >> why that should be necessary here, either.
> >
> > This is to calculate the basepathlen. We need to exclude the tablespace
> location (or
> > base path) from the filename before it is sent to the client with
> sendFile call. I added
> > this option primarily to avoid performing string manipulation on
> filename to extract the
> > tablespace location and then calculate the basepathlen.
> >
> > Alternatively we can do it by extracting the base path from the received
> filename. What
> > do you suggest?
>
> I don't think the server needs any information from the client in
> order to be able to exclude the tablespace location from the pathname.
> Whatever it needs to know, it should be able to figure out, just as it
> would in a non-parallel backup.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-11-12T12:07:14Z

On Mon, Nov 4, 2019 at 6:08 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
>> On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>> > 'startptr' is used by sendFile() during checksum verification. Since
>> > SendBackupFiles() is using sendFIle we have to set a valid WAL location.
>>
>> Ugh, global variables.
>>
>> Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
>> STOP_BACKUP all using the same base_backup_opt_list production as
>> BASE_BACKUP? Presumably most of those options are not applicable to
>> most of those commands, and the productions should therefore be
>> separated.
>>
>
> Are you expecting something like the attached patch? Basically I have
> reorganised the grammar
> rules so each command can have the options required by it.
>
> I was feeling a bit reluctant for this change because it may add some
> unwanted grammar rules in
> the replication grammar. Since these commands are using the same options
> as base backup, may
> be we could throw error inside the relevant functions on unwanted options?
>
>
>
>> You should add docs, too.  I wouldn't have to guess what some of this
>> stuff was for if you wrote documentation explaining what this stuff
>> was for. :-)
>>
>
> Yes I will add it in the next patch.
>
>
>>
>> >> The tablespace_path option appears entirely unused, and I don't know
>> >> why that should be necessary here, either.
>> >
>> > This is to calculate the basepathlen. We need to exclude the tablespace
>> location (or
>> > base path) from the filename before it is sent to the client with
>> sendFile call. I added
>> > this option primarily to avoid performing string manipulation on
>> filename to extract the
>> > tablespace location and then calculate the basepathlen.
>> >
>> > Alternatively we can do it by extracting the base path from the
>> received filename. What
>> > do you suggest?
>>
>> I don't think the server needs any information from the client in
>> order to be able to exclude the tablespace location from the pathname.
>> Whatever it needs to know, it should be able to figure out, just as it
>> would in a non-parallel backup.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
I have updated the replication grammar with some new rules to differentiate
the options productions for base backup and the newly added commands.

I have also created a separate patch to include the documentation changes.
The current syntax is as below:

- START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
- STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
- SEND_BACKUP_FILELIST
- SEND_BACKUP_FILES ( 'FILE' [, ...] )  [ MAX_RATE rate ]
  [ NOVERIFY_CHECKSUMS ] [ START_WAL_LOCATION ]
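
For illustration, a client could assemble a SEND_BACKUP_FILES command from
the syntax above roughly as follows. This is a sketch, not the
pg_basebackup code from the patch: CmdBuf, append_str, append_quoted and
build_send_backup_files are hypothetical names, escaping a single quote by
doubling it is an assumption about how the replication scanner reads
quoted strings, and START_WAL_LOCATION is left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size output buffer that records overflow instead of truncating. */
typedef struct CmdBuf
{
    char       *data;
    size_t      cap;
    size_t      len;
    bool        overflow;
} CmdBuf;

static void
append_str(CmdBuf *b, const char *s)
{
    size_t      n = strlen(s);

    if (b->overflow || b->len + n + 1 > b->cap)
    {
        b->overflow = true;
        return;
    }
    memcpy(b->data + b->len, s, n + 1);
    b->len += n;
}

/* Append 's' as a quoted string, doubling embedded single quotes (assumed). */
static void
append_quoted(CmdBuf *b, const char *s)
{
    append_str(b, "'");
    for (; *s; s++)
    {
        char        tmp[3] = {*s, '\0', '\0'};

        if (*s == '\'')
            tmp[1] = '\'';
        append_str(b, tmp);
    }
    append_str(b, "'");
}

/* Build the command text; returns false if 'out' is too small. */
static bool
build_send_backup_files(char *out, size_t cap,
                        const char *const *files, size_t nfiles,
                        int max_rate, bool noverify)
{
    CmdBuf      b = {out, cap, 0, false};
    char        rate[32];

    if (cap == 0)
        return false;
    out[0] = '\0';

    append_str(&b, "SEND_BACKUP_FILES (");
    for (size_t i = 0; i < nfiles; i++)
    {
        if (i > 0)
            append_str(&b, ", ");
        append_quoted(&b, files[i]);
    }
    append_str(&b, ")");

    if (max_rate > 0)
    {
        snprintf(rate, sizeof(rate), " MAX_RATE %d", max_rate);
        append_str(&b, rate);
    }
    if (noverify)
        append_str(&b, " NOVERIFY_CHECKSUMS");

    return !b.overflow;
}
```

Since the current patch fetches one file at a time, a worker would
normally call this with a single file name.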


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-11-13T13:34:04Z

On Tue, Nov 12, 2019 at 5:07 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Mon, Nov 4, 2019 at 6:08 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>
>>
>>
>> On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>> On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com>
>>> wrote:
>>> > 'startptr' is used by sendFile() during checksum verification. Since
>>> > SendBackupFiles() is using sendFIle we have to set a valid WAL
>>> location.
>>>
>>> Ugh, global variables.
>>>
>>> Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
>>> STOP_BACKUP all using the same base_backup_opt_list production as
>>> BASE_BACKUP? Presumably most of those options are not applicable to
>>> most of those commands, and the productions should therefore be
>>> separated.
>>>
>>
>> Are you expecting something like the attached patch? Basically I have
>> reorganised the grammar
>> rules so each command can have the options required by it.
>>
>> I was feeling a bit reluctant for this change because it may add some
>> unwanted grammar rules in
>> the replication grammar. Since these commands are using the same options
>> as base backup, may
>> be we could throw error inside the relevant functions on unwanted options?
>>
>>
>>
>>> You should add docs, too.  I wouldn't have to guess what some of this
>>> stuff was for if you wrote documentation explaining what this stuff
>>> was for. :-)
>>>
>>
>> Yes I will add it in the next patch.
>>
>>
>>>
>>> >> The tablespace_path option appears entirely unused, and I don't know
>>> >> why that should be necessary here, either.
>>> >
>>> > This is to calculate the basepathlen. We need to exclude the
>>> tablespace location (or
>>> > base path) from the filename before it is sent to the client with
>>> sendFile call. I added
>>> > this option primarily to avoid performing string manipulation on
>>> filename to extract the
>>> > tablespace location and then calculate the basepathlen.
>>> >
>>> > Alternatively we can do it by extracting the base path from the
>>> received filename. What
>>> > do you suggest?
>>>
>>> I don't think the server needs any information from the client in
>>> order to be able to exclude the tablespace location from the pathname.
>>> Whatever it needs to know, it should be able to figure out, just as it
>>> would in a non-parallel backup.
>>>
>>> --
>>> Robert Haas
>>> EnterpriseDB: http://www.enterprisedb.com
>>> The Enterprise PostgreSQL Company
>>>
>>
>>
> I have updated the replication grammar with some new rules to
> differentiate the options production
> for base backup and newly added commands.
>
> I have also created a separate patch to include the documentation changes.
> The current syntax is as below:
>
> - START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
> - STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
> - SEND_BACKUP_FILELIST
> - SEND_BACKUP_FILES ( 'FILE' [, ...] )  [ MAX_RATE rate ] [
> NOVERIFY_CHECKSUMS ] [ START_WAL_LOCATION ]
>
>
Sorry, I sent the wrong patches. Please see the correct version of the
patches (_v6).

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-11-27T08:38:31Z

On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
> Sorry, I sent the wrong patches. Please see the correct version of the
> patches (_v6).
>

Review comments on these patches:

1.
+    XLogRecPtr    wal_location;

Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.

2.
+    int32        size;

Should we use size_t here?

3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?

4.
+        else if (
+#ifndef WIN32
+                 S_ISLNK(statbuf.st_mode)
+#else
+                 pgwin32_is_junction(pathbuf)
+#endif
+            )
+        {
+            /*
+             * If symlink, write it as a directory. file symlinks only allowed
+             * in pg_tblspc
+             */
+            statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+            _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+        }

In normal backup mode, we skip any special file that is not a regular file,
a directory, or a symlink inside pg_tblspc. But in your patch, the code above
treats it as a directory. Should parallel backup skip such special files
too?

5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c

6.
+        /*
+         * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+         * 'base/1/1245/32683', ...) [options]
+         */

Please update these comments as we fetch one file at a time.

7.
+backup_file:
+            SCONST                            { $$ = (Node *) makeString($1); }
+            ;
+

Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.

8.
Please indent code within 80 char limit at all applicable places.

9.
Please fix following typos:

identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories

=====

The other mail thread, about the backup manifest [1], creates a
backup_manifest file and sends it to the client. That file has optional
checksums and other details including filename, file size, mtime, etc.
There is also a patch on that thread that validates the backup.

Since this patch also gets a file list from the server with similar
details (except the checksum), can parallel backup somehow use the
backup-manifest infrastructure from that patch?

When parallel backup is in use, will a backup_manifest file be created
too? I am just visualizing what the scenario will be when both of these
features are checked in.

[1]
https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com

> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>
Thanks
-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2019-11-27T19:57:27Z

On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
> Can't we return the file list with START_BACKUP itself?

I had the same thought, but I think it's better to keep them separate.
Somebody might want to use the SEND_BACKUP_FILELIST command for
something other than a backup (I actually think it should be called
just SEND_FILE_LIST). Somebody might want to start a backup without
getting a file list because they're going to copy the files at the FS
level. Somebody might want to get a list of files to process after
somebody else has started the backup on another connection. Or maybe
nobody wants to do any of those things, but it doesn't seem to cost us
much of anything to split the commands, so I think we should.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-12-10T14:33:48Z

On Wed, Nov 27, 2019 at 1:38 PM Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

>
>
> On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
>
>>
>> Sorry, I sent the wrong patches. Please see the correct version of the
>> patches (_v6).
>>
>
> Review comments on these patches:
>
> 1.
> +    XLogRecPtr    wal_location;
>
> Looking at the other field names in basebackup_options structure, let's use
> wallocation instead. Or better startwallocation to be precise.
>
> 2.
> +    int32        size;
>
> Should we use size_t here?
>
> 3.
> I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
> Can't we return the file list with START_BACKUP itself?
>
> 4.
> +        else if (
> +#ifndef WIN32
> +                 S_ISLNK(statbuf.st_mode)
> +#else
> +                 pgwin32_is_junction(pathbuf)
> +#endif
> +            )
> +        {
> +            /*
> +             * If symlink, write it as a directory. file symlinks only
> allowed
> +             * in pg_tblspc
> +             */
> +            statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
> +            _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
> false);
> +        }
>
> In normal backup mode, we skip the special file which is not a regular
> file or
> a directory or a symlink inside pg_tblspc. But in your patch, above code,
> treats it as a directory. Should parallel backup too skip such special
> files?
>

Yeah, going through the code again, I found it a little bit inconsistent.
In fact, the SendBackupFiles function is supposed to send the files that
were requested of it. However, it currently performs these tasks:

1) If the requested file is a directory, it returns a TAR directory entry.
2) If the requested file is a symlink inside pg_tblspc, it returns the
link path.
3) And, as you pointed out above, if the requested file is a symlink
outside pg_tblspc and inside PGDATA, it returns a TAR directory entry.

I think that this function should not take care of any of the above.
Instead, it should be the client (i.e. pg_basebackup) managing it.
SendBackupFiles should only send regular files and ignore requests for
any other kind, be it a directory or a symlink.

Any thoughts?


> 5.
> Please keep header file inclusions in alphabetical order in basebackup.c
> and
> pg_basebackup.c
>
> 6.
> +        /*
> +         * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
> +         * 'base/1/1245/32683', ...) [options]
> +         */
>
> Please update these comments as we fetch one file at a time.
>
> 7.
> +backup_file:
> +            SCONST                            { $$ = (Node *)
> makeString($1); }
> +            ;
> +
>
> Instead of having this rule with only one constant terminal, we can use
> SCONST directly in backup_files_list. However, I don't see any issue with
> this approach either, just trying to reduce the rules.
>
> 8.
> Please indent code within 80 char limit at all applicable places.
>
> 9.
> Please fix following typos:
>
> identifing => identifying
> optionaly => optionally
> structre => structure
> progrsss => progress
> Retrive => Retrieve
> direcotries => directories
>
>
> =====
>
> The other mail thread related to backup manifest [1], is creating a
> backup_manifest file and sends that to the client which has optional
> checksum and other details including filename, file size, mtime, etc.
> There is a patch on the same thread which is then validating the backup
> too.
>
> Since this patch too gets a file list from the server and has similar
> details (except checksum), can somehow parallel backup use the
> backup-manifest
> infrastructure from that patch?
>

This was discussed earlier in the thread and, as Robert suggested, it would
complicate the code to no real benefit.


> When the parallel backup is in use, will there be a backup_manifest file
> created too? I am just visualizing what will be the scenario when both
> these
> features are checked-in.
>

Yes, I think it should. Since the full backup will have a manifest file,
there is no reason for parallel backup not to support it.

I'll share the updated patch in the next couple of days.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-12-10T14:34:14Z

On Thu, Nov 28, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
> <jeevan.chalke@enterprisedb.com> wrote:
> > I am still not sure why we need SEND_BACKUP_FILELIST as a separate
> command.
> > Can't we return the file list with START_BACKUP itself?
>
> I had the same thought, but I think it's better to keep them separate.
> Somebody might want to use the SEND_BACKUP_FILELIST command for
> something other than a backup (I actually think it should be called
> just SEND_FILE_LIST).


Sure. Thanks for the recommendation. To keep the function names in sync, I
intend to do the following renamings:
- SEND_BACKUP_FILES --> SEND_FILES
- SEND_BACKUP_FILELIST --> SEND_FILE_LIST

> Somebody might want to start a backup without
> getting a file list because they're going to copy the files at the FS
> level. Somebody might want to get a list of files to process after
> somebody else has started the backup on another connection. Or maybe
> nobody wants to do any of those things, but it doesn't seem to cost us
> much of anything to split the commands, so I think we should.
>

+1

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2019-12-12T15:19:57Z

On Tue, Dec 10, 2019 at 7:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Thu, Nov 28, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com>
> wrote:
>
>> On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
>> <jeevan.chalke@enterprisedb.com> wrote:
>> > I am still not sure why we need SEND_BACKUP_FILELIST as a separate
>> command.
>> > Can't we return the file list with START_BACKUP itself?
>>
>> I had the same thought, but I think it's better to keep them separate.
>> Somebody might want to use the SEND_BACKUP_FILELIST command for
>> something other than a backup (I actually think it should be called
>> just SEND_FILE_LIST)
>
>
> Sure. Thanks for the recommendation. To keep the function names in sync, I
> intend to do following the
> following renamings:
> - SEND_BACKUP_FILES --> SEND_FILES
> - SEND_BACKUP_FILELIST -->  SEND_FILE_LIST
>
> . Somebody might want to start a backup without
>> getting a file list because they're going to copy the files at the FS
>> level. Somebody might want to get a list of files to process after
>> somebody else has started the backup on another connection. Or maybe
>> nobody wants to do any of those things, but it doesn't seem to cost us
>> much of anything to split the commands, so I think we should.
>>
>
> +1
>
>
I have updated the patches (v7 attached) and have addressed all the issues
pointed out by Jeevan. Additionally, I ran pgindent on each patch.
Furthermore, the command names have been renamed as suggested, and I
have simplified the SendFiles function. The client can only request regular
files; any other kind, such as directories or symlinks, will be skipped, and
the client is responsible for handling those.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2019-12-19T17:47:22Z

On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patches (v7 attached) and have taken care of all issues pointed by Jeevan, additionally
> ran the pgindent on each patch. Furthermore, Command names have been renamed as suggested and I
> have simplified the SendFiles function. Client can only request the regular files, any other kind such as
> directories or symlinks will be skipped, the client will be responsible for taking care of such.

Hi,

Patch 0001 of this series conflicts with my recent commit
303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
inspired by some previous study of 0001. That being said, I think 0001
has the wrong idea. There's no reason that I can see why it should be
correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
perform_base_backup(). It's true that if we register a long-lived
before_shmem_exit hook, then the backup will get cleaned up even
without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
question of the warning message. I think that our goal should be to
emit the warning message about a backup being stopped too early if the
user uses either pg_start_backup() or the new START_BACKUP command and
does not end the backup with either pg_stop_backup() or the new
STOP_BACKUP command -- but not if a single command that both starts
and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
need to temporarily register do_pg_abort_backup() as a
before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
commands like BASE_BACKUP() -- and for things like pg_start_backup()
or the new START_BACKUP command, we just need to add a single call to
register_persistent_abort_backup_handler().

So I think you can drop 0001, and then in the patch that actually
introduces START_BACKUP, add the call to
register_persistent_abort_backup_handler() before calling
do_pg_start_backup(). Also in that patch, also adjust the warning text
that do_pg_abort_backup() emits to be more generic e.g. "aborting
backup due to backend exiting while a non-exclusive backup is in
progress".

0003 creates three new functions, moving code from
do_pg_start_backup() to a new function collectTablespaces() and from
perform_base_backup() to new functions setup_throttle() and
include_wal_files(). I'm skeptical about all of these changes. One
general nitpick is that the way these function names are capitalized
and punctuated does not seem to have been chosen very consistently;
how about name_like_this() throughout? A bit more substantively:

- collectTablespaces() is factored out of do_pg_start_backup() so that
it can also be used by SendFileList(), but that means that a client is
going to invoke START_BACKUP, indirectly calling collectTablespaces(),
and then immediately afterward the client is probably going to call
SEND_FILE_LIST, which will again call collectTablespaces(). That does
not appear to be super-great. For one thing, it's duplicate work,
although because SendFileList() is going to pass infotbssize as false,
it's not a lot of duplicated work. Also, what happens if the two calls
to collectTablespaces() return different answers due to concurrent
CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
it seems like there is at least the possibility of bugs if different
parts of the backup have different notions of what tablespaces exist.

- setup_throttle() is factored out of perform_base_backup() so that it
can be called in StartBackup() and StopBackup() and SendFiles(). This
seems extremely odd. Why does it make any sense to give the user an
option to activate throttling when *ending* a backup? Why does it make
sense to give the user a chance to enable throttling *both* at the
startup of a backup *and also* for each individual file. If we're
going to support throttling here, it seems like it should be either a
backup-level property or a file-level property, not both.

- include_wal_files() is factored out of perform_base_backup() so that
it can be called by StopBackup(). This seems like a poor design
decision. The idea behind the BASE_BACKUP command is that you run that
one command, and the server sends you everything. The idea in this new
way of doing business is that the client requests the individual files
it wants -- except for the WAL files, which are for some reason not
requested individually but sent all together as part of the
STOP_BACKUP response. It seems like it would be more consistent if the
client were to decide which WAL files it needs and request them one by
one, just as we do with other files.

I think there's a common theme to all of these complaints, which is
that you haven't done enough to move things that are the
responsibility of the backend in the BASE_BACKUP model to the frontend
in this model. I started wondering, for example, whether it might not
be better to have the client rather than the server construct the
tablespace_map file. After all, the client needs to get the list of
files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
almost enough to construct the tablespace map. The only additional
thing it needs is the full pathname to which the link points. But, it
seems that we could fairly easily extend SEND_FILE_LIST to send, for
files that are symbolic links, the target of the link, using a new
column. Or alternatively, using a separate command, so that instead of
just sending a single SEND_FILE_LIST command, the client might first
ask for a tablespace list and then might ask for a list of files
within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
for each tablespace, with 0 for the main tablespace, perhaps). I'm not
sure which way is better.

Similarly, for throttling, I have a hard time understanding how what
you've got here is going to work reasonably. It looks like each client
is just going to request whatever MAX_RATE the user specifies, but the
result of that will be that the actual transfer rate is probably a
multiple of the specified rate, approximately equal to the specified
rate times the number of clients. That's probably not what the user
wants. You could take the specified rate and divide it by the number
of workers, but limiting each of 4 workers to a quarter of the rate
will probably lead to a combined rate of less than the specified
rate, because if one worker doesn't use all of the bandwidth to which
it's entitled, or even exits earlier than the others, the other
workers don't get to go any faster as a result. Another problem is
that, in the current approach, throttling applies overall to the
entire backup, but in this approach, it is applied separately to each
SEND_FILES command. In the current approach, if one file finishes a
little faster or slower than anticipated, the next file in the tarball
will be sent a little slower or faster to compensate. But in this
approach, each SEND_FILES command is throttled separately, so this
property is lost. Furthermore, while BASE_BACKUP sends data
continuously, this approach naturally involves pauses between
commands. If files are large, that won't matter much, but if they're
small and numerous, it will tend to cause the actual transfer rate to
be less than the throttling rate.

One potential way to solve this problem is... move it to the client
side. Instead of making it the server's job not to send data too fast,
make it the client's job not to receive data too fast. Let the server
backends write as fast as they want, and on the pg_basebackup side,
have the threads coordinate with each other so that they don't read
data faster than the configured rate. That's not quite the same thing,
though, because the server can get ahead by the size of the client's
receive buffers plus whatever data is on the wire. I don't know
whether that's a big enough problem to be worth caring about. If it
is, then I think we need some server infrastructure to "group
throttle" a group of cooperating backends.
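
A minimal sketch of the client-side idea, assuming the pg_basebackup workers
are threads that share one limiter. The class name SharedThrottle and its
consume() method are hypothetical, not existing pg_basebackup code, and Python
is used only for brevity. Every worker reports the bytes it has just read, and
the limiter makes the caller sleep whenever the group as a whole is ahead of
the configured rate:

```python
import threading
import time


class SharedThrottle:
    """Rate limiter shared by all worker threads (hypothetical helper)."""

    def __init__(self, max_rate_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(max_rate_bytes_per_sec)
        self.clock = clock
        self.sleep = sleep
        self.lock = threading.Lock()
        # Earliest time at which the group is allowed to have consumed
        # everything accounted for so far.
        self.next_free = clock()

    def consume(self, nbytes):
        """Account for nbytes just read; sleep if the group is ahead of the rate."""
        with self.lock:
            now = self.clock()
            # Idle time is not banked: an idle period never allows a later burst.
            start = max(self.next_free, now)
            self.next_free = start + nbytes / self.rate
            delay = self.next_free - now
        # Sleep outside the lock so other workers can still account their reads.
        if delay > 0:
            self.sleep(delay)
```

Because the budget is shared rather than split per worker, a worker that is
idle or has exited leaves its bandwidth to the others. As noted above, this
limits only how fast the client reads; the server can still get ahead by the
size of the receive buffers plus whatever is on the wire.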

A general comment about 0004 is that it seems like you've proceeded by
taking the code from perform_base_backup() and spreading it across
several different functions without, necessarily, as much thought as
is needed there. For instance, StartBackup() looks like just the
beginning of perform_base_backup(). But, why shouldn't it instead look
like pg_start_backup() -- in fact, a simplified version that only
handles the non-exclusive backup case? Is the extra stuff it's doing
really appropriate? I've already complained about the
tablespace-related stuff here and the throttling, but there's more.
Setting statrelpath here will probably break if somebody tries to use
SEND_FILES without first calling START_BACKUP. Sending the
backup_label file here is oddly asymmetric, because that's done by
pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
looks like it's just the end of perform_base_backup(), but that's
pretty strange-looking too. Again, I've already complained about
include_wal_files() being part of this, but there's also:

+ /* ... and pg_control after everything else. */

...which (1) is an odd thing to say when this is the first thing this
particular function is to send and (2) is another example of a sloppy
division of labor between client and server; apparently, the client is
supposed to know not to request pg_control, because the server is
going to send it unsolicited. There's no particular reason to have
this a special case. The client could just request it last. And then
the server code wouldn't need a special case, and you wouldn't have
this odd logic split between the client and the server.

Overall, I think this needs a lot more work. The overall idea's not
wrong, but there seem to be a very large number of details which, at
least to me, do not seem to be correct.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-01-04T06:53:52Z

On Thu, Dec 19, 2019 at 10:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
> > I have updated the patches (v7 attached) and have taken care of all
> issues pointed by Jeevan, additionally
> > ran the pgindent on each patch. Furthermore, Command names have been
> renamed as suggested and I
> > have simplified the SendFiles function. Client can only request the
> regular files, any other kind such as
> > directories or symlinks will be skipped, the client will be responsible
> for taking care of such.
>
> Hi,
>
> Patch 0001 of this series conflicts with my recent commit
> 303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
> inspired by some previous study of 0001. That being said, I think 0001
> has the wrong idea. There's no reason that I can see why it should be
> correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
> perform_base_backup(). It's true that if we register a long-lived
> before_shmem_exit hook, then the backup will get cleaned up even
> without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
> question of the warning message. I think that our goal should be to
> emit the warning message about a backup being stopped too early if the
> user uses either pg_start_backup() or the new START_BACKUP command and
> does not end the backup with either pg_stop_backup() or the new
> STOP_BACKUP command -- but not if a single command that both starts
> and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
> that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
> need to temporarily register do_pg_abort_backup() as a
> before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
> commands like BASE_BACKUP() -- and for things like pg_start_backup()
> or the new START_BACKUP command, we just need to add a single call to
> register_persistent_abort_backup_handler().
>
> So I think you can drop 0001, and then in the patch that actually
> introduces START_BACKUP, add the call to
> register_persistent_abort_backup_handler() before calling
> do_pg_start_backup(). Also in that patch, also adjust the warning text
> that do_pg_abort_backup() emits to be more generic e.g. "aborting
> backup due to backend exiting while a non-exclusive backup is in
> progress".
>
Sure, will do.


> 0003 creates three new functions, moving code from
> do_pg_start_backup() to a new function collectTablespaces() and from
> perform_base_backup() to new functions setup_throttle() and
> include_wal_files(). I'm skeptical about all of these changes. One
> general nitpick is that the way these function names are capitalized
> and punctuated does not seem to have been chosen very consistently;
> how about name_like_this() throughout? A bit more substantively:
>
> - collectTablespaces() is factored out of do_pg_start_backup() so that
> it can also be used by SendFileList(), but that means that a client is
> going to invoke START_BACKUP, indirectly calling collectTablespaces(),
> and then immediately afterward the client is probably going to call
> SEND_FILE_LIST, which will again call collectTablespaces(). That does
> not appear to be super-great. For one thing, it's duplicate work,
> although because SendFileList() is going to pass infotbssize as false,
> it's not a lot of duplicated work.


I'll remove this duplication by eliminating this call from START_BACKUP and
SEND_FILE_LIST functions. More about this is explained later in this email.


> Also, what happens if the two calls
> to collectTablespaces() return different answers due to concurrent
> CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
> it seems like there is at least the possibility of bugs if different
> parts of the backup have different notions of what tablespaces exist.
>

Concurrent CREATE/DROP TABLESPACE commands can happen, and any resulting
inconsistency will be resolved by the WAL files collected for the backup. I
don't think we can do anything when objects are created or dropped between
start and stop backup. BASE_BACKUP also relies on the WAL files to handle
such a scenario and does not error out when some relation files go away.


>
> - setup_throttle() is factored out of perform_base_backup() so that it
> can be called in StartBackup() and StopBackup() and SendFiles(). This
> seems extremely odd. Why does it make any sense to give the user an
> option to activate throttling when *ending* a backup? Why does it make
> sense to give the user a chance to enable throttling *both* at the
> startup of a backup *and also* for each individual file. If we're
> going to support throttling here, it seems like it should be either a
> backup-level property or a file-level property, not both.
>

It's a file-level property only. The throttle functionality relies on global
variables, and StartBackup() and StopBackup() call setup_throttle() only to
disable throttling.

I should have been more explicit here by passing -1 to setup_throttle(),
making it clear that throttling is disabled, instead of passing
'opt->maxrate' (although it defaults to -1 for these functions).

I'll remove the setup_throttle() call from both functions.


>
> - include_wal_files() is factored out of perform_base_backup() so that
> it can be called by StopBackup(). This seems like a poor design
> decision. The idea behind the BASE_BACKUP command is that you run that
> one command, and the server sends you everything. The idea in this new
> way of doing business is that the client requests the individual files
> it wants -- except for the WAL files, which are for some reason not
> requested individually but sent all together as part of the
> STOP_BACKUP response. It seems like it would be more consistent if the
> client were to decide which WAL files it needs and request them one by
> one, just as we do with other files.
>

As I understand it, you are suggesting adding another command to fetch the
list of WAL files, which the client would call after executing stop
backup. Once the client gets that list, it starts requesting the WAL files
one by one.

So I will add LIST_WAL_FILES command that will take start_lsn and end_lsn
as arguments and return the list of WAL files between these LSNs.

Something like this:
LIST_WAL_FILES 'start_lsn'  'end_lsn';
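
To illustrate what such a command would have to compute, here is a rough
sketch that maps an LSN range to WAL segment file names. It assumes a single
timeline, the default 16MB segment size, and that the end LSN points just past
the last byte needed (so the segment is chosen from end_lsn - 1). The function
names are hypothetical and Python is used only for illustration; a real
implementation would also have to deal with timeline switches.

```python
def parse_lsn(text):
    """Convert an LSN in 'X/Y' hexadecimal text form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_names(start_lsn, end_lsn, timeline=1, wal_segment_size=16 * 1024 * 1024):
    """Return the WAL segment file names covering [start_lsn, end_lsn)."""
    segs_per_xlogid = 0x100000000 // wal_segment_size
    first = parse_lsn(start_lsn) // wal_segment_size
    # Assumption: end_lsn is exclusive, so step back one byte before
    # picking the last segment.
    last = (parse_lsn(end_lsn) - 1) // wal_segment_size
    return [
        "%08X%08X%08X" % (timeline, seg // segs_per_xlogid, seg % segs_per_xlogid)
        for seg in range(first, last + 1)
    ]
```

For example, wal_file_names("0/2000028", "0/4000100") yields the three
segments 000000010000000000000002 through 000000010000000000000004.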


>
> I think there's a common theme to all of these complaints, which is
> that you haven't done enough to move things that are the
> responsibility of the backend in the BASE_BACKUP model to the frontend
> in this model. I started wondering, for example, whether it might not
> be better to have the client rather than the server construct the
> tablespace_map file. After all, the client needs to get the list of
> files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
> almost enough to construct the tablespace map. The only additional
> thing it needs is the full pathname to which the link points. But, it
> seems that we could fairly easily extend SEND_FILE_LIST to send, for
> files that are symbolic links, the target of the link, using a new
> column. Or alternatively, using a separate command, so that instead of
> just sending a single SEND_FILE_LIST command, the client might first
> ask for a tablespace list and then might ask for a list of files
> within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
> for each tablespace, with 0 for the main tablespace, perhaps). I'm not
> sure which way is better.
>

do_pg_start_backup is collecting the tablespace information anyway to
build the tablespace_map for BASE_BACKUP, so returning the same information
seemed better than adding a new command for it. Hence the multiple
calls to collectTablespaces() [to be renamed to collect_tablespaces].

tablespace_map can be constructed by the client, but BASE_BACKUP
returns it as part of the full backup. If clients in parallel mode
are to construct it themselves, these will look like two different
approaches. Perhaps this should be done for BASE_BACKUP as well?

I'll refactor the do_pg_start_backup function to remove the code related
to tablespace information collection (to collect_tablespaces) and
tablespace_map file creation, so that this function does not collect this
information unnecessarily. perform_base_backup function can collect and
send the tablespace information to the client and then the client can
construct the tablespace_map file.
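
As a rough illustration of the client-side construction, the sketch below
builds the file contents from (oid, link target) pairs. It assumes the file
holds one "<oid> <path>" line per tablespace, that OID 0 stands for the main
data directory (as floated earlier in the thread), and that paths need no
escaping. The function name is hypothetical and Python is used only for
brevity.

```python
def build_tablespace_map(tablespaces):
    """Build tablespace_map contents from (oid, link_target) pairs.

    The pairs are assumed to come from something like LIST_TABLESPACES.
    """
    lines = []
    for oid, target in tablespaces:
        if oid == 0:
            # Assumption: 0 denotes the main data directory, which has no
            # symlink and therefore no entry in the map.
            continue
        # Paths containing newlines would need escaping; omitted here.
        lines.append("%d %s\n" % (oid, target))
    return "".join(lines)
```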

I'll add a new command, LIST_TABLESPACES, to fetch the list of tablespaces;
it will return the tablespace information to the client for parallel
mode. I will also refactor the START_BACKUP and STOP_BACKUP commands
so that they only do the specific job of putting the system into or out of
backup mode, nothing else. These commands should only return the start and
end LSN to the client.



>
> Similarly, for throttling, I have a hard time understanding how what
> you've got here is going to work reasonably. It looks like each client
> is just going to request whatever MAX_RATE the user specifies, but the
> result of that will be that the actual transfer rate is probably a
> multiple of the specified rate, approximately equal to the specified
> rate times the number of clients. That's probably not what the user
> wants. You could take the specified rate and divide it by the number
> of workers, but limiting each of 4 workers to a quarter of the rate
> will probably lead to a combined rate of less than the specified
> rate, because if one worker doesn't use all of the bandwidth to which
> it's entitled, or even exits earlier than the others, the other
> workers don't get to go any faster as a result. Another problem is
> that, in the current approach, throttling applies overall to the
> entire backup, but in this approach, it is applied separately to each
> SEND_FILE command. In the current approach, if one file finishes a
> little faster or slower than anticipated, the next file in the tarball
> will be sent a little slower or faster to compensate. But in this
> approach, each SEND_FILES command is throttled separately, so this
> property is lost. Furthermore, while BASEBACKUP sends data
> continuously, this approach naturally involves pauses between
> commands. If files are large, that won't matter much, but if they're
> small and numerous, it will tend to cause the actual transfer rate to
> be less than the throttling rate.
>
> One potential way to solve this problem is... move it to the client
> side. Instead of making it the server's job not to send data too fast,
> make it the client's job not to receive data too fast. Let the server
> backends write as fast as they want, and on the pg_basebackup side,
> have the threads coordinate with each other so that they don't read
> data faster than the configured rate. That's not quite the same thing,
> though, because the server can get ahead by the size of the client's
> receive buffers plus whatever data is on the wire. I don't know
> whether that's a big enough problem to be worth caring about. If it
> is, then I think we need some server infrastructure to "group
> throttle" a group of cooperating backends.
>

That was a mistake in my code. maxrate should've been divided equally
among all threads. I agree that we should move this to the client side.
When a thread exits, its share should also be divided equally among
the remaining threads (i.e. recalculate maxrate for each remaining thread).

Say we have 4 running threads, each allocated 25% of the bandwidth.
Thread 1 exits. We recalculate the bandwidth and assign the remaining 3
threads 33.33% each. This solves one problem that you had identified. However,
it doesn't solve the case where one (or more) threads are not fully consuming
their allocated share. I'm not really sure how we can solve that properly.
Suggestions are welcome.
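
The arithmetic above, and the remaining shortfall, can be shown with a small
sketch. The function names and the "demands" input (the rate each worker could
actually sustain) are illustrative assumptions, not part of the patch.

```python
def equal_share(max_rate, active_workers):
    """Per-worker limit when max_rate is split equally among active workers."""
    return max_rate / active_workers


def aggregate_rate(max_rate, demands):
    """Combined rate when each worker is capped at an equal share.

    demands: the rate each active worker could sustain on its own.
    """
    share = equal_share(max_rate, len(demands))
    return sum(min(demand, share) for demand in demands)
```

With max_rate = 100, four workers get 25 each, and three workers get about
33.33 each after one exits. But if one of four workers can only sustain 10,
aggregate_rate(100, [10, 100, 100, 100]) is 85: the unused 15 is not
redistributed to the others.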


>
> A general comment about 0004 is that it seems like you've proceeded by
> taking the code from perform_base_backup() and spreading it across
> several different functions without, necessarily, as much thought as
> is needed there. For instance, StartBackup() looks like just the
> beginning of perform_base_backup(). But, why shouldn't it instead look
> like pg_start_backup() -- in fact, a simplified version that only
> handles the non-exclusive backup case? Is the extra stuff it's doing
> really appropriate? I've already complained about the
> tablespace-related stuff here and the throttling, but there's more.
> Setting statrelpath here will probably break if somebody tries to use
> SEND_FILES without first calling START_BACKUP. Sending the
> backup_label file here is oddly asymmetric, because that's done by
> pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
> looks like it's just the end of perform_base_backup(), but that's
> pretty strange-looking too. Again, I've already complained about
> include_wal_files() being part of this, but there's also:
>
> +       /* ... and pg_control after everything else. */
>
> ...which (1) is an odd thing to say when this is the first thing this
> particular function is to send and (2) is another example of a sloppy
> division of labor between client and server; apparently, the client is
> supposed to know not to request pg_control, because the server is
> going to send it unsolicited. There's no particular reason to have
> this a special case. The client could just request it last. And then
> the server code wouldn't need a special case, and you wouldn't have
> this odd logic split between the client and the server.
>
> Overall, I think this needs a lot more work. The overall idea's not
> wrong, but there seem to be a very large number of details which, at
> least to me, do not seem to be correct.
>
>

Thank you Robert for the detailed review. I really appreciate your insights
and very precise feedback.

After the changes suggested above, the design at a high level will look
something like this:

=== SEQUENTIAL EXECUTION ===
START_BACKUP [LABEL | FAST]
- Starts backup on the server
- Returns the start LSN to client

LIST_TABLESPACES
- Sends a list of all tablespaces to client

Loops over LIST_TABLESPACES
- LIST_FILES [tablespace]
- Sends file list for the given tablespace
- Create a list of all files

=== PARALLEL EXECUTION ===
Thread loop until the list of files is exhausted
SEND_FILE <file(s)> [CHECKSUM | WAL_START_LOCATION]
- If CHECKSUM is enabled, then WAL_START_LOCATION is required.
- The server can be asked to send one or more files, but we request one at
a time
- Pick next file from list of files

- Threads sleep after the list is exhausted
- All threads are sleeping

=== SEQUENTIAL EXECUTION ===
STOP_BACKUP [NOWAIT]
- Stops backup mode
- Return end LSN

If --wal-method=fetch then
LIST_WAL_FILES 'start_lsn' 'end_lsn'
- Sends a list of WAL files between start LSN and end LSN

=== PARALLEL EXECUTION ===
Thread loop until the list of WAL files is exhausted
SEND_FILE <WAL file>
- The server can be asked to send one or more files, but we request one
WAL file at a time
- Pick next file from list of WAL files

- Threads terminate and set their status as completed/terminated

=== SEQUENTIAL EXECUTION ===
Cleanup
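
The parallel phases in this outline could be sketched roughly as follows. This
is a simplified illustration in Python, not the pg_basebackup code: the
function run_parallel_phase and the fetch_file callback (standing in for
issuing SEND_FILE on a worker's own connection) are hypothetical, and the
workers here exit when the list is exhausted instead of sleeping until the WAL
phase.

```python
import queue
import threading


def run_parallel_phase(file_names, fetch_file, jobs):
    """Drain file_names using `jobs` worker threads.

    fetch_file(name) is a stand-in for requesting one file via SEND_FILE.
    Returns the names that were fetched, in completion order.
    """
    pending = queue.Queue()
    for name in file_names:
        pending.put(name)

    done = []
    done_lock = threading.Lock()

    def worker():
        while True:
            try:
                # Pick the next file from the shared list.
                name = pending.get_nowait()
            except queue.Empty:
                return  # list exhausted, this worker is finished
            fetch_file(name)
            with done_lock:
                done.append(name)

    threads = [threading.Thread(target=worker) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

The same helper could be called twice: once with the list built from
LIST_FILES, and again after STOP_BACKUP with the output of LIST_WAL_FILES when
--wal-method=fetch is in use.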



--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-01-30T13:39:51Z

On Sat, Jan 4, 2020 at 11:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Thu, Dec 19, 2019 at 10:47 PM Robert Haas <robertmhaas@gmail.com>
> wrote:
>
>> On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>> > I have updated the patches (v7 attached) and have taken care of all
>> issues pointed by Jeevan, additionally
>> > ran the pgindent on each patch. Furthermore, Command names have been
>> renamed as suggested and I
>> > have simplified the SendFiles function. Client can only request the
>> regular files, any other kind such as
>> > directories or symlinks will be skipped, the client will be responsible
>> for taking care of such.
>>
>> Hi,
>>
>> Patch 0001 of this series conflicts with my recent commit
>> 303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
>> inspired by some previous study of 0001. That being said, I think 0001
>> has the wrong idea. There's no reason that I can see why it should be
>> correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
>> perform_base_backup(). It's true that if we register a long-lived
>> before_shmem_exit hook, then the backup will get cleaned up even
>> without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
>> question of the warning message. I think that our goal should be to
>> emit the warning message about a backup being stopped too early if the
>> user uses either pg_start_backup() or the new START_BACKUP command and
>> does not end the backup with either pg_stop_backup() or the new
>> STOP_BACKUP command -- but not if a single command that both starts
>> and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
>> that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
>> need to temporarily register do_pg_abort_backup() as a
>> before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
>> commands like BASE_BACKUP() -- and for things like pg_start_backup()
>> or the new START_BACKUP command, we just need to add a single call to
>> register_persistent_abort_backup_handler().
>>
>> So I think you can drop 0001, and then in the patch that actually
>> introduces START_BACKUP, add the call to
>> register_persistent_abort_backup_handler() before calling
>> do_pg_start_backup(). Also in that patch, also adjust the warning text
>> that do_pg_abort_backup() emits to be more generic e.g. "aborting
>> backup due to backend exiting while a non-exclusive backup is in
>> progress".
>>
>> Sure. will do.
>
>
>> 0003 creates three new functions, moving code from
>> do_pg_start_backup() to a new function collectTablespaces() and from
>> perform_base_backup() to new functions setup_throttle() and
>> include_wal_files(). I'm skeptical about all of these changes. One
>> general nitpick is that the way these function names are capitalized
>> and punctuated does not seem to have been chosen very consistently;
>> how about name_like_this() throughout? A bit more substantively:
>>
>> - collectTablespaces() is factored out of do_pg_start_backup() so that
>> it can also be used by SendFileList(), but that means that a client is
>> going to invoke START_BACKUP, indirectly calling collectTablespaces(),
>> and then immediately afterward the client is probably going to call
>> SEND_FILE_LIST, which will again call collectTablespaces(). That does
>> not appear to be super-great. For one thing, it's duplicate work,
>> although because SendFileList() is going to pass infotbssize as false,
>> it's not a lot of duplicated work.
>
>
> I'll remove this duplication by eliminating this call from START_BACKUP and
> SEND_FILE_LIST functions. More about this is explained later in this email.
>
>
>> Also, what happens if the two calls
>> to collectTablespaces() return different answers due to concurrent
>> CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
>> it seems like there is at least the possibility of bugs if different
>> parts of the backup have different notions of what tablespaces exist.
>>
>
> The concurrent CREATE/DROP TABLESPACE commands, it can happen and will
> be resolved by the WAL files collected for the backup. I don't think we
> can do anything when objects are created or dropped in-between start and
> stop backup. BASE_BACKUP also relies on the WAL files to handle such a
> scenario and does not error out when some relation files go away.
>
>
>>
>> - setup_throttle() is factored out of perform_base_backup() so that it
>> can be called in StartBackup() and StopBackup() and SendFiles(). This
>> seems extremely odd. Why does it make any sense to give the user an
>> option to activate throttling when *ending* a backup? Why does it make
>> sense to give the user a chance to enable throttling *both* at the
>> startup of a backup *and also* for each individual file. If we're
>> going to support throttling here, it seems like it should be either a
>> backup-level property or a file-level property, not both.
>>
>
> It's a file-level property only. Throttle functionality relies on global
> variables. StartBackup() and StopBackup() are calling the setup_throttle()
> function to disable the throttling.
>
> I should have been more explicit here by passing -1 to setup_throttle(),
> illustrating that throttling is disabled, instead of using 'opt->maxrate'
> (although it defaults to -1 for these functions).
>
> I'll remove the setup_throttle() call for both functions.
>
>
>>
>> - include_wal_files() is factored out of perform_base_backup() so that
>> it can be called by StopBackup(). This seems like a poor design
>> decision. The idea behind the BASE_BACKUP command is that you run that
>> one command, and the server sends you everything. The idea in this new
>> way of doing business is that the client requests the individual files
>> it wants -- except for the WAL files, which are for some reason not
>> requested individually but sent all together as part of the
>> STOP_BACKUP response. It seems like it would be more consistent if the
>> client were to decide which WAL files it needs and request them one by
>> one, just as we do with other files.
>>
>
> As I understand it, you are suggesting adding another command to fetch the
> list of WAL files, which would be called by the client after executing stop
> backup. Once the client gets that list, it starts requesting the WAL files
> one by one.
>
> So I will add a LIST_WAL_FILES command that will take start_lsn and end_lsn
> as arguments and return the list of WAL files between these LSNs.
>
> Something like this :
> LIST_WAL_FILES 'start_lsn'  'end_lsn';
>
>
>>
>> I think there's a common theme to all of these complaints, which is
>> that you haven't done enough to move things that are the
>> responsibility of the backend in the BASE_BACKUP model to the frontend
>> in this model. I started wondering, for example, whether it might not
>> be better to have the client rather than the server construct the
>> tablespace_map file. After all, the client needs to get the list of
>> files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
>> almost enough to construct the tablespace map. The only additional
>> thing it needs is the full pathname to which the link points. But, it
>> seems that we could fairly easily extend SEND_FILE_LIST to send, for
>> files that are symbolic links, the target of the link, using a new
>> column. Or alternatively, using a separate command, so that instead of
>> just sending a single SEND_FILE_LIST command, the client might first
>> ask for a tablespace list and then might ask for a list of files
>> within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
>> for each tablespace, with 0 for the main tablespace, perhaps). I'm not
>> sure which way is better.
>>
>
> do_pg_start_backup is collecting the tablespace information anyway to
> build the tablespace_map for BASE_BACKUP. So returning the same seemed
> better than adding a new command for the same information, hence the
> multiple calls to collectTablespaces() [to be renamed to
> collect_tablespaces].
>
> tablespace_map can be constructed by the client, but then BASE_BACKUP
> is returning it as part of the full backup. If clients in parallel mode
> are to construct this themselves, these will seem like two different
> approaches. Perhaps this should be done for BASE_BACKUP as
> well?
>
> I'll refactor the do_pg_start_backup function to remove the code related
> to tablespace information collection (to collect_tablespaces) and
> tablespace_map file creation, so that this function does not collect this
> information unnecessarily. perform_base_backup function can collect and
> send the tablespace information to the client and then the client can
> construct the tablespace_map file.
>
> I'll add a new command to fetch the list of tablespaces, i.e.
> LIST_TABLESPACES, which will return the tablespace information to the
> client for parallel mode. I will also refactor the START_BACKUP and
> STOP_BACKUP commands so that they only do the specific job of putting the
> system in backup mode or out of it, nothing else. These commands should
> only return the start and end LSN to the client.
>
>
>
>>
>> Similarly, for throttling, I have a hard time understanding how what
>> you've got here is going to work reasonably. It looks like each client
>> is just going to request whatever MAX_RATE the user specifies, but the
>> result of that will be that the actual transfer rate is probably a
>> multiple of the specified rate, approximately equal to the specified
>> rate times the number of clients. That's probably not what the user
>> wants. You could take the specified rate and divide it by the number
>> of workers, but limiting each of 4 workers to a quarter of the rate
>> will probably lead to a combined rate of less than the specified
>> rate, because if one worker doesn't use all of the bandwidth to which
>> it's entitled, or even exits earlier than the others, the other
>> workers don't get to go any faster as a result. Another problem is
>> that, in the current approach, throttling applies overall to the
>> entire backup, but in this approach, it is applied separately to each
>> SEND_FILE command. In the current approach, if one file finishes a
>> little faster or slower than anticipated, the next file in the tarball
>> will be sent a little slower or faster to compensate. But in this
>> approach, each SEND_FILES command is throttled separately, so this
>> property is lost. Furthermore, while BASEBACKUP sends data
>> continuously, this approach naturally involves pauses between
>> commands. If files are large, that won't matter much, but if they're
>> small and numerous, it will tend to cause the actual transfer rate to
>> be less than the throttling rate.
>>
>> One potential way to solve this problem is... move it to the client
>> side. Instead of making it the server's job not to send data too fast,
>> make it the client's job not to receive data too fast. Let the server
>> backends write as fast as they want, and on the pg_basebackup side,
>> have the threads coordinate with each other so that they don't read
>> data faster than the configured rate. That's not quite the same thing,
>> though, because the server can get ahead by the size of the client's
>> receive buffers plus whatever data is on the wire. I don't know
>> whether that's a big enough problem to be worth caring about. If it
>> is, then I think we need some server infrastructure to "group
>> throttle" a group of cooperating backends.
>>
>
> That was a mistake in my code. maxrate should've been equally divided
> amongst all threads. I agree that we should move this to the client-side.
> When a thread exits, its share should also be equally divided amongst
> the remaining threads (i.e. recalculate maxrate for each remaining
> thread).
>
> Say we have 4 running threads, each allocated 25% of the bandwidth.
> Thread 1 exits. We recalculate bandwidth and assign the remaining 3 threads
> 33.33% each. This solves one problem that you had identified. However,
> it doesn't solve the case where one (or more) threads are not fully
> consuming their allocated share. I'm not really sure how we can solve it
> properly. Suggestions are welcome.
>
>
>>
>> A general comment about 0004 is that it seems like you've proceeded by
>> taking the code from perform_base_backup() and spreading it across
>> several different functions without, necessarily, as much thought as
>> is needed there. For instance, StartBackup() looks like just the
>> beginning of perform_base_backup(). But, why shouldn't it instead look
>> like pg_start_backup() -- in fact, a simplified version that only
>> handles the non-exclusive backup case? Is the extra stuff it's doing
>> really appropriate? I've already complained about the
>> tablespace-related stuff here and the throttling, but there's more.
>> Setting statrelpath here will probably break if somebody tries to use
>> SEND_FILES without first calling START_BACKUP. Sending the
>> backup_label file here is oddly asymmetric, because that's done by
>> pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
>> looks like it's just the end of perform_base_backup(), but that's
>> pretty strange-looking too. Again, I've already complained about
>> include_wal_files() being part of this, but there's also:
>>
>> +       /* ... and pg_control after everything else. */
>>
>> ...which (1) is an odd thing to say when this is the first thing this
>> particular function is to send and (2) is another example of a sloppy
>> division of labor between client and server; apparently, the client is
>> supposed to know not to request pg_control, because the server is
>> going to send it unsolicited. There's no particular reason to have
>> this a special case. The client could just request it last. And then
>> the server code wouldn't need a special case, and you wouldn't have
>> this odd logic split between the client and the server.
>>
>> Overall, I think this needs a lot more work. The overall idea's not
>> wrong, but there seem to be a very large number of details which, at
>> least to me, do not seem to be correct.
>>
>>
>
> Thank you Robert for the detailed review. I really appreciate your insights
> and very precise feedback.
>
> After the changes suggested above, the design on a high level will look
> something like this:
>
> === SEQUENTIAL EXECUTION ===
> START_BACKUP [LABEL | FAST]
> - Starts backup on the server
> - Returns the start LSN to client
>
> LIST_TABLESPACES
> - Sends a list of all tablespaces to client
>
> Loops over LIST_TABLESPACES
> - LIST_FILES [tablespace]
> - Sends file list for the given tablespace
> - Create a list of all files
>
> === PARALLEL EXECUTION ===
> Thread loop until the list of files is exhausted
> SEND_FILE <file(s)> [CHECKSUM | WAL_START_LOCATION]
> - If the checksum is enabled then WAL_START_LOCATION is required.
> - Can request server to send one or more files but we are requesting one
> at a time
> - Pick next file from list of files
>
> - Threads sleep after the list is exhausted
> - All threads are sleeping
>
> === SEQUENTIAL EXECUTION ===
> STOP_BACKUP [NOWAIT]
> - Stops backup mode
> - Return end LSN
>
> If --wal-method=fetch then
> LIST_WAL_FILES 'start_lsn' 'end_lsn'
> - Sends a list of WAL files between start LSN and end LSN
>
> === PARALLEL EXECUTION ===
> Thread loop until the list of WAL files is exhausted
> SEND_FILE <WAL file>
> - Can request server to send one or more files but we are requesting one
> WAL file at a time
> - Pick next file from list of WAL files
>
> - Threads terminate and set their status as completed/terminated
>
> === SEQUENTIAL EXECUTION ===
> Cleanup
>
>
>
>
Here are the updated patches, taking care of the issues pointed out
earlier. This patch adds the following commands (with the specified options):

START_BACKUP [LABEL '<label>'] [FAST]
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
                            [NOVERIFY_CHECKSUMS]
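
As background for LIST_WAL_FILES above: the WAL files lying between two LSNs
can be derived from the LSNs themselves. The following is a minimal sketch and
is not taken from the patch. It assumes the default 16 MB WAL segment size and
a caller-supplied timeline ID, and the function names (parse_lsn,
wal_segment_range, wal_file_name) are hypothetical. It relies on the usual
PostgreSQL naming of a segment as three 8-digit hexadecimal fields, and treats
the end LSN as exclusive, so the last segment is the one holding the byte just
before it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: default segment size; a real server may be initialized differently. */
#define WAL_SEGMENT_SIZE (UINT64_C(16) * 1024 * 1024)

/* Parse an LSN written as 'X/X' (two hex halves). Returns 0 on success. */
static int
parse_lsn(const char *str, uint64_t *lsn)
{
    unsigned int hi, lo;

    assert(str != NULL && lsn != NULL);
    if (sscanf(str, "%X/%X", &hi, &lo) != 2)
        return -1;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

/*
 * Compute the first and last segment numbers needed to cover [start, end).
 * Returns 0 on success, -1 if either LSN is malformed or end <= start.
 */
static int
wal_segment_range(const char *start, const char *end,
                  uint64_t *first, uint64_t *last)
{
    uint64_t s, e;

    if (parse_lsn(start, &s) != 0 || parse_lsn(end, &e) != 0 || e <= s)
        return -1;
    *first = s / WAL_SEGMENT_SIZE;
    *last = (e - 1) / WAL_SEGMENT_SIZE;   /* segment holding the last byte */
    return 0;
}

/* Format the 24-character WAL file name; returns snprintf's result. */
static int
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    uint64_t segs_per_id = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;

    return snprintf(buf, len, "%08X%08X%08X",
                    (unsigned int) tli,
                    (unsigned int) (segno / segs_per_id),
                    (unsigned int) (segno % segs_per_id));
}
```

A client would loop from the first to the last segment number and request each
resulting file name; whether the server or the client should do this
arithmetic is exactly the division-of-labor question raised in the review.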


Parallel backup is not making any use of the tablespace map, so I have
removed that option from the above commands. There is a patch pending
to remove the exclusive backup; we can further refactor the
do_pg_start_backup function at that time, to remove the tablespace
information and move the creation of the tablespace_map file to the client.


I have disabled the maxrate option for parallel backup. I intend to send
out a separate patch for it. Robert previously suggested implementing
throttling on the client-side. I found the original email thread [1]
where throttling was proposed and added to the server. In that thread,
it was originally implemented on the client-side, but per many suggestions,
it was moved to the server-side.

So, I have a few suggestions on how we can implement this:

1- Have another option for pg_basebackup (i.e. per-worker-maxrate) where
the user could choose the bandwidth allocation for each worker. This
approach can be implemented on the client-side as well as on the server-side.

2- Have the maxrate be divided among workers equally at first, and then
let the main thread keep adjusting it whenever one of the workers finishes.
I believe this would only be possible if we handle throttling on the client.
Also, as I understand it, implementing this will introduce an additional mutex
for handling the bandwidth consumption data, so that the rate may be adjusted
according to the data received by threads.
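
To make the second suggestion concrete, here is a minimal sketch of a shared
rate budget guarded by a mutex. It is an illustration only, not code from the
patch: the RateBudget type and the rate_budget_* functions are hypothetical
names, and the unit of maxrate is left to the caller. Each worker asks for its
current share before reading, and the share grows as workers finish.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared state; one instance for the whole backup. */
typedef struct RateBudget
{
    pthread_mutex_t lock;
    uint64_t        maxrate;         /* total allowed rate for all workers */
    int             active_workers;  /* workers still transferring files */
} RateBudget;

static void
rate_budget_init(RateBudget *b, uint64_t maxrate, int nworkers)
{
    assert(b != NULL && nworkers >= 0);
    pthread_mutex_init(&b->lock, NULL);
    b->maxrate = maxrate;
    b->active_workers = nworkers;
}

/* Current per-worker share: the total divided equally among active workers. */
static uint64_t
rate_budget_share(RateBudget *b)
{
    uint64_t share;

    pthread_mutex_lock(&b->lock);
    share = (b->active_workers > 0)
        ? b->maxrate / (uint64_t) b->active_workers
        : 0;
    pthread_mutex_unlock(&b->lock);
    return share;
}

/* Called when a worker finishes, so the remaining workers get larger shares. */
static void
rate_budget_worker_done(RateBudget *b)
{
    pthread_mutex_lock(&b->lock);
    if (b->active_workers > 0)
        b->active_workers--;
    pthread_mutex_unlock(&b->lock);
}
```

With a total of 1200 and 4 workers, each share starts at 300 and becomes 400
once one worker is done. Note that this only redistributes on exit; a worker
that is running but not consuming its share still leaves bandwidth unused,
which is the open problem mentioned earlier in the thread.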

[1]
https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2020-02-10T13:48:37Z

Hi Asif,


The latest changes look good to me. However, the patch set is missing the
documentation. Please add it.

Thanks

-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-02-17T08:39:08Z

Thanks Jeevan. Here is the documentation patch.


-- 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-02-25T14:18:42Z

Hi,

I have created a commitfest entry.
https://commitfest.postgresql.org/27/2472/



-- 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-11T09:38:20Z

Hi Asif

I have started testing this feature. I have applied the v6 patch on commit
a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
I have a few observations; please take a look.

*--if the backup fails, the backup directory is not removed.*
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty


*--giving a large number of jobs leads to a segmentation fault.*
./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
.
.
.
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: could not fork new
process for connection: Resource temporarily unavailable

could not fork new process for connection: Resource temporarily unavailable
pg_basebackup: error: failed to create thread: Resource temporarily
unavailable
Segmentation fault (core dumped)

--stack-trace
gdb -q -c core.11824 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D
/tmp/test_bkp/bkp10'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=140503120623360, thread_return=0x0) at
pthread_join.c:46
46  if (INVALID_NOT_TERMINATED_TD_P (pd))
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  pthread_join (threadid=140503120623360, thread_return=0x0) at
pthread_join.c:46
#1  0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
#2  0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
#3  0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
#4  exit (status=1) at exit.c:100
#5  0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at
pg_basebackup.c:2713
#6  0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
#7  0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at
pg_basebackup.c:2668
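
The backtrace above shows exit() being called from create_parallel_workers(),
after which the atexit cleanup ends up in pthread_join(). One plausible cause,
assumed here and not confirmed by the trace, is that the cleanup joins worker
slots whose threads were never created. The sketch below shows the usual
guard: record per slot whether pthread_create() succeeded and join only those
slots. The Worker type and the function names are hypothetical, not the
patch's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker slot. */
typedef struct Worker
{
    pthread_t thread;
    bool      started;   /* true only if pthread_create() succeeded */
} Worker;

/* Placeholder worker body; a real worker would fetch files here. */
static void *
worker_main(void *arg)
{
    (void) arg;
    return NULL;
}

/* Mark every slot as not started before any thread is created. */
static void
init_workers(Worker *w, int n)
{
    assert(w != NULL);
    for (int i = 0; i < n; i++)
        w[i].started = false;
}

/* Start up to n workers; stops at the first failure. Returns the count started. */
static int
start_workers(Worker *w, int n)
{
    int started = 0;

    init_workers(w, n);
    for (int i = 0; i < n; i++)
    {
        if (pthread_create(&w[i].thread, NULL, worker_main, NULL) != 0)
            break;
        w[i].started = true;
        started++;
    }
    return started;
}

/* Join only the slots that were started; safe to call more than once. */
static int
join_workers(Worker *w, int n)
{
    int joined = 0;

    for (int i = 0; i < n; i++)
    {
        if (!w[i].started)
            continue;
        pthread_join(w[i].thread, NULL);
        w[i].started = false;
        joined++;
    }
    return joined;
}
```

Because join_workers() clears the flag after joining, calling it from both a
normal shutdown path and an atexit handler does not join the same thread
twice.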


*--when the tablespace is in the same directory as data, parallel_backup
crashes*
[edb@localhost bin]$ ./initdb -D /tmp/data
[edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
[edb@localhost bin]$ mkdir /tmp/ts
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.

postgres=# create tablespace ts location '/tmp/ts';
CREATE TABLESPACE
postgres=# create table tx (a int) tablespace ts;
CREATE TABLE
postgres=# \q
[edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
Segmentation fault (core dumped)

--stack-trace
[edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T
/tmp/ts=/tmp/ts1'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
backupInfo=0x14210a0) at pg_basebackup.c:3000
3000 backupInfo->curr->next = file;
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
backupInfo=0x14210a0) at pg_basebackup.c:3000
#1  0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at
pg_basebackup.c:2739
#2  0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
#3  0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at
pg_basebackup.c:2668
(gdb)
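
The crash line in this trace is "backupInfo->curr->next = file;". The trace
does not say why the dereference failed; one possibility, assumed here, is
that the current/tail pointer was NULL (for example, an empty list) when an
entry was appended. A minimal sketch of an append that handles the empty list
follows. The FileList and BackupFile types, the PATH_BUF_LEN constant, and the
function names are hypothetical and are not the patch's data structures.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define PATH_BUF_LEN 1024   /* hypothetical path buffer size */

typedef struct BackupFile
{
    char               path[PATH_BUF_LEN];
    struct BackupFile *next;
} BackupFile;

typedef struct FileList
{
    BackupFile *head;
    BackupFile *tail;    /* NULL while the list is empty */
    size_t      count;
} FileList;

/* Append a copy of path to the list. Returns 0 on success, -1 on OOM. */
static int
file_list_append(FileList *list, const char *path)
{
    BackupFile *file;

    assert(list != NULL && path != NULL);
    file = calloc(1, sizeof(BackupFile));
    if (file == NULL)
        return -1;
    snprintf(file->path, sizeof(file->path), "%s", path);

    if (list->tail == NULL)
        list->head = file;          /* first entry: no tail to link from */
    else
        list->tail->next = file;
    list->tail = file;
    list->count++;
    return 0;
}

/* Release every node and reset the list to empty. */
static void
file_list_free(FileList *list)
{
    BackupFile *file = list->head;

    while (file != NULL)
    {
        BackupFile *next = file->next;

        free(file);
        file = next;
    }
    list->head = list->tail = NULL;
    list->count = 0;
}
```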

Thanks & Regards,
Rajkumar Raghuwanshi



Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-03-13T16:21:09Z

On Wed, Mar 11, 2020 at 2:38 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

> Hi Asif
>
> I have started testing this feature. I have applied the v6 patch on commit
> a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
> I have a few observations; please take a look.
>
> *--if the backup fails, the backup directory is not removed.*
> [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D
> /tmp/test_bkp/bkp6
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D
> /tmp/test_bkp/bkp6
> pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not
> empty
>
>
> *--giving a large number of jobs leads to a segmentation fault.*
> ./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> .
> .
> .
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: could not fork new
> process for connection: Resource temporarily unavailable
>
> could not fork new process for connection: Resource temporarily unavailable
> pg_basebackup: error: failed to create thread: Resource temporarily
> unavailable
> Segmentation fault (core dumped)
>
> --stack-trace
> gdb -q -c core.11824 pg_basebackup
> Loaded symbols for /lib64/libnss_files.so.2
> Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D
> /tmp/test_bkp/bkp10'.
> Program terminated with signal 11, Segmentation fault.
> #0  pthread_join (threadid=140503120623360, thread_return=0x0) at
> pthread_join.c:46
> 46  if (INVALID_NOT_TERMINATED_TD_P (pd))
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  pthread_join (threadid=140503120623360, thread_return=0x0) at
> pthread_join.c:46
> #1  0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
> #2  0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
> #3  0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
> #4  exit (status=1) at exit.c:100
> #5  0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0)
> at pg_basebackup.c:2713
> #6  0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
> #7  0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at
> pg_basebackup.c:2668
>
>
> *--with tablespace is in the same directory as data, parallel_backup
> crashed*
> [edb@localhost bin]$ ./initdb -D /tmp/data
> [edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
> [edb@localhost bin]$ mkdir /tmp/ts
> [edb@localhost bin]$ ./psql postgres
> psql (13devel)
> Type "help" for help.
>
> postgres=# create tablespace ts location '/tmp/ts';
> CREATE TABLESPACE
> postgres=# create table tx (a int) tablespace ts;
> CREATE TABLE
> postgres=# \q
> [edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
> Segmentation fault (core dumped)
>
> --stack-trace
> [edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
> Loaded symbols for /lib64/libnss_files.so.2
> Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T
> /tmp/ts=/tmp/ts1'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
> backupInfo=0x14210a0) at pg_basebackup.c:3000
> 3000 backupInfo->curr->next = file;
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20,
> backupInfo=0x14210a0) at pg_basebackup.c:3000
> #1  0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at
> pg_basebackup.c:2739
> #2  0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
> #3  0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at
> pg_basebackup.c:2668
> (gdb)
>


Thanks, Rajkumar. I have fixed the above issues and rebased the patch
onto the latest master (b7f64c64).
(V9 of the patches is attached.)
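
The second backtrace quoted above shows cleanup_workers() calling
pthread_join() from an atexit handler after create_parallel_workers() had
already failed. One way to make that kind of cleanup safe, sketched here under
assumptions and not taken from the patch, is to count the threads that were
actually created and join only those. WorkerPool, pool_start() and
pool_cleanup() are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical worker pool; not the structure used by the patch. */
typedef struct WorkerPool
{
    pthread_t  *threads;
    int         nstarted;       /* number of threads actually created */
} WorkerPool;

/* Placeholder worker body: a real worker would fetch files here. */
static void *
worker_main(void *arg)
{
    (void) arg;
    return NULL;
}

/* Start up to nworkers threads; returns how many were really created. */
static int
pool_start(WorkerPool *pool, int nworkers)
{
    assert(nworkers > 0);
    pool->nstarted = 0;
    pool->threads = calloc((size_t) nworkers, sizeof(pthread_t));
    if (pool->threads == NULL)
        return 0;

    for (int i = 0; i < nworkers; i++)
    {
        /* Stop at the first failure; later slots hold no valid thread. */
        if (pthread_create(&pool->threads[i], NULL, worker_main, NULL) != 0)
            break;
        pool->nstarted++;
    }
    return pool->nstarted;
}

/* Join only threads that exist; safe to call again, e.g. from atexit. */
static int
pool_cleanup(WorkerPool *pool)
{
    int         joined = 0;

    for (int i = 0; i < pool->nstarted; i++)
    {
        if (pthread_join(pool->threads[i], NULL) == 0)
            joined++;
    }
    pool->nstarted = 0;
    free(pool->threads);
    pool->threads = NULL;
    return joined;
}
```

Because pool_cleanup() resets nstarted and the array pointer, a second call
from an exit handler does nothing instead of joining stale thread ids.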


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-16T06:08:31Z

Thanks for the patches.

I have verified the reported issues with the new patches; they are fixed now.

I have another observation: if a new slot name is given without the -C option,
the resulting error looks like a server crash.

[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
test_bkp_slot
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
pg_basebackup: error: could not list backup files: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing data directory "/tmp/bkp"

Thanks & Regards,
Rajkumar Raghuwanshi


Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-03-16T06:21:49Z

On Mon, Mar 16, 2020 at 11:08 AM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

> Thanks for the patches.
>
> I have verified reported issues with new patches, issues are fixed now.
>
> I got another observation where If a new slot name given without -C
> option, it leads to server crash error.
>
> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
> test_bkp_slot
> pg_basebackup: error: could not send replication command
> "START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
> pg_basebackup: error: could not list backup files: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: removing data directory "/tmp/bkp"
>

This seems to be expected behavior. The START_BACKUP command has already been
executed, and pg_basebackup then tries to start a WAL streaming process with a
non-existent slot, which results in an error. The backup is therefore aborted
and all other processes are terminated.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-16T06:26:16Z

On Mon, Mar 16, 2020 at 11:52 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Mon, Mar 16, 2020 at 11:08 AM Rajkumar Raghuwanshi <
> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
>> Thanks for the patches.
>>
>> I have verified reported issues with new patches, issues are fixed now.
>>
>> I got another observation where If a new slot name given without -C
>> option, it leads to server crash error.
>>
>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
>> test_bkp_slot
>> pg_basebackup: error: could not send replication command
>> "START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
>> pg_basebackup: error: could not list backup files: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> pg_basebackup: removing data directory "/tmp/bkp"
>>
>
> It seems to be an expected behavior. The START_BACKUP command has been
> executed, and
> pg_basebackup tries to start a WAL streaming process with a non-existent
> slot, which results in
> an error. So the backup is aborted while terminating all other processes.
>
I think the error message can be improved. The current message makes it look
as if the database server has crashed.

On PG, the same case already exists, but it ends with exit code 1:
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp --slot
test_bkp_slot
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/tmp/bkp"



Re: WIP/PoC for parallel backup

Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2020-03-16T08:43:48Z

Hi Asif,


> Thanks Rajkumar. I have fixed the above issues and have rebased the patch
> to the latest master (b7f64c64).
> (V9 of the patches are attached).
>

I reviewed the patches further; here are a few observations:

1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out
pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */

The comment seems outdated.

2. With parallel jobs, maxrate is no longer supported. Since we are now
requesting data from multiple threads, throttling seems important here. Can
you please explain why you have disabled it?

3. As we are always fetching a single file, and as Robert suggested, let's
rename SEND_FILES to SEND_FILE.

4. Does this work on Windows? I mean, does pthread_create() work on Windows?
I ask because pgbench has its own implementation of pthread_create() for
WIN32, but this patch doesn't.

5. Typos:
tablspace => tablespace
safly => safely

6. parallel_backup_run() needs some comments explaining the PB_* states it
goes through.

7.
+            case PB_FETCH_REL_FILES:    /* fetch files from server */
+                if (backupinfo->activeworkers == 0)
+                {
+                    backupinfo->backupstate = PB_STOP_BACKUP;
+                    free_filelist(backupinfo);
+                }
+                break;
+            case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
+                if (backupinfo->activeworkers == 0)
+                {
+                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
+                }
+                break;

Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?
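
To make points 6 and 7 concrete, here is a minimal sketch of a commented state
machine in which both fetch states release the file list once all workers are
idle. The four PB_* names come from the snippet above. The transition order
through PB_STOP_BACKUP, the BackupInfo layout and the backup_step() helper are
assumptions for illustration, and whether the WAL file list really needs to be
freed depends on how the patch allocates it.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum
{
    PB_FETCH_REL_FILES,         /* workers are copying data files */
    PB_STOP_BACKUP,             /* coordinator ends the backup */
    PB_FETCH_WAL_FILES,         /* workers are copying WAL files */
    PB_BACKUP_COMPLETE          /* nothing left to do */
} BackupState;

/* Hypothetical layout; the patch's real structure may differ. */
typedef struct
{
    BackupState backupstate;
    int         activeworkers;
    char      **files;          /* file list for the current phase */
    int         nfiles;
} BackupInfo;

static void
free_filelist(BackupInfo *bi)
{
    for (int i = 0; i < bi->nfiles; i++)
        free(bi->files[i]);
    free(bi->files);
    bi->files = NULL;
    bi->nfiles = 0;
}

/* Advance the state machine by at most one transition. */
static void
backup_step(BackupInfo *bi)
{
    assert(bi != NULL);
    switch (bi->backupstate)
    {
        case PB_FETCH_REL_FILES:
            /* Leave this state only when every worker has gone idle. */
            if (bi->activeworkers == 0)
            {
                free_filelist(bi);
                bi->backupstate = PB_STOP_BACKUP;
            }
            break;
        case PB_STOP_BACKUP:
            /* Assumed: stop the backup here and build the WAL file list. */
            bi->backupstate = PB_FETCH_WAL_FILES;
            break;
        case PB_FETCH_WAL_FILES:
            if (bi->activeworkers == 0)
            {
                free_filelist(bi);  /* same cleanup as the data phase */
                bi->backupstate = PB_BACKUP_COMPLETE;
            }
            break;
        case PB_BACKUP_COMPLETE:
            break;
    }
}
```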

Thanks
-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 66449694

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-16T12:49:44Z

Hi Asif,

On further testing, I found that pg_basebackup crashed when taking a backup
with -R. The crash is not consistently reproducible.

[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
text);"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values
('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)

The stack trace looks the same as in the earlier reported crash with
tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
-R'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
pg_basebackup.c:3175
#1  0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
pthread_create.c:301
#3  0x00000039212e8c4d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)
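
The trace above stops at `backupinfo->curr = fetchfile->next;` inside a worker
thread. An intermittent crash at that line would be consistent with several
workers advancing a shared list cursor without synchronization, although the
trace alone does not prove it. A minimal sketch of a mutex-protected cursor,
assuming that diagnosis, follows. FileNode, FileQueue, queue_init() and
queue_next() are hypothetical names, not identifiers from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node; the patch's real structure may differ. */
typedef struct FileNode
{
    const char      *path;
    struct FileNode *next;
} FileNode;

typedef struct FileQueue
{
    pthread_mutex_t lock;
    FileNode       *curr;       /* next file to hand out, NULL when drained */
} FileQueue;

static void
queue_init(FileQueue *q, FileNode *head)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->curr = head;
}

/*
 * Hand out the next file, or NULL when none are left. The NULL check and the
 * cursor advance happen under the same lock, so two workers can never take
 * the same node or dereference an exhausted cursor.
 */
static FileNode *
queue_next(FileQueue *q)
{
    FileNode   *file;

    pthread_mutex_lock(&q->lock);
    file = q->curr;
    if (file != NULL)
        q->curr = file->next;
    pthread_mutex_unlock(&q->lock);
    return file;
}
```

Each worker would loop on queue_next() until it returns NULL, which also gives
a clean termination condition when the list is empty from the start.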

Thanks & Regards,
Rajkumar Raghuwanshi


Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-19T10:41:24Z

Hi Asif,

In another scenario, the backup data for a tablespace is corrupted. Again, this
is not reproducible every time, but when I run the same set of commands I get
the same error.

[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp
location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb
tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values
('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
/tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555"
start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
  oid  |  spcname   | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
  1663 | pg_default |       10 |        |
 16384 | tblsp      |       10 |        |
(2 rows)

[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL:
 "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
directory
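
The manual `ls` comparison above can be turned into a small check that a test
could run against each database directory of a finished backup. This is only a
sketch: has_pg_version() is a hypothetical helper, and it checks nothing except
the presence of a regular file named PG_VERSION, which is the file the server
reported as missing.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if dir/PG_VERSION exists as a regular file, 0 otherwise. */
static int
has_pg_version(const char *dir)
{
    char        path[4096];
    struct stat st;
    int         n;

    assert(dir != NULL);
    n = snprintf(path, sizeof(path), "%s/PG_VERSION", dir);
    if (n < 0 || (size_t) n >= sizeof(path))
        return 0;               /* path too long: treat as missing */
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

Calling it on both the source and the backup copy of a database directory
would flag the mismatch shown in the transcript without starting a server on
the backup.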


Thanks & Regards,
Rajkumar Raghuwanshi


Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-25T07:22:11Z

Hi Asif,

While testing further, I observed that parallel backup is not able to take a
backup of a standby server.

mkdir /tmp/archive_dir
echo "archive_mode='on'">> data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf

./pg_ctl -D data -l logs start
./pg_basebackup -p 5432 -Fp -R -D /tmp/slave

echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
/tmp/slave/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'">>
/tmp/slave/postgresql.conf
echo "promote_trigger_file='/tmp/failover.log'">> /tmp/slave/postgresql.conf

./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c

[edb@localhost bin]$ ./psql postgres -p 5432 -c "select
pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 f
(1 row)

[edb@localhost bin]$ ./psql postgres -p 5433 -c "select
pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)




[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
pg_basebackup: error: could not list backup files: ERROR:  the standby was promoted during online backup
HINT:  This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
pg_basebackup: removing data directory "/tmp/bkp_s"

# The same command works fine without parallel backup (--jobs 1)
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
[edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
/tmp/bkp_s/PG_VERSION

Thanks & Regards,
Rajkumar Raghuwanshi


On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

> Hi Asif,
>
> In another scenarios, bkp data is corrupted for tablespace. again this is
> not reproducible everytime,
> but If I am running the same set of commands I am getting the same error.
>
> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
> waiting for server to start.... done
> server started
> [edb@localhost bin]$
> [edb@localhost bin]$ mkdir /tmp/tblsp
> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp
> location '/tmp/tblsp';"
> CREATE TABLESPACE
> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb
> tablespace tblsp;"
> CREATE DATABASE
> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
> text);"
> CREATE TABLE
> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values
> ('parallel_backup with tablespace');"
> INSERT 0 1
> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555"
> start
> waiting for server to start.... done
> server started
> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
>   oid  |  spcname   | spcowner | spcacl | spcoptions
> -------+------------+----------+--------+------------
>   1663 | pg_default |       10 |        |
>  16384 | tblsp      |       10 |        |
> (2 rows)
>
> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
> psql: error: could not connect to server: FATAL:
>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
> missing.
> [edb@localhost bin]$
> [edb@localhost bin]$ ls
> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
> [edb@localhost bin]$ ls
> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
> ls: cannot access
> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
> directory
>
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
>> Hi Asif,
>>
>> On testing further, I found when taking backup with -R, pg_basebackup
>> crashed
>> this crash is not consistently reproducible.
>>
>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
>> text);"
>> CREATE TABLE
>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values
>> ('parallel_backup with -R recovery-conf');"
>> INSERT 0 1
>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
>> Segmentation fault (core dumped)
>>
>> stack trace looks the same as it was on earlier reported crash with
>> tablespace.
>> --stack trace
>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
>> Loaded symbols for /lib64/libnss_files.so.2
>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
>> -R'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>> pg_basebackup.c:3175
>> 3175 backupinfo->curr = fetchfile->next;
>> Missing separate debuginfos, use: debuginfo-install
>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
>> (gdb) bt
>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>> pg_basebackup.c:3175
>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
>> pg_basebackup.c:2715
>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
>> pthread_create.c:301
>> #3  0x00000039212e8c4d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>> (gdb)
>>
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>>
>>
>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
>> jeevan.chalke@enterprisedb.com> wrote:
>>
>>> Hi Asif,
>>>
>>>
>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
>>>> patch to the latest master (b7f64c64).
>>>> (V9 of the patches are attached).
>>>>
>>>
>>> I had a further review of the patches and here are my few observations:
>>>
>>> 1.
>>> +/*
>>> + * stop_backup() - ends an online backup
>>> + *
>>> + * The function is called at the end of an online backup. It sends out
>>> pg_control
>>> + * file, optionally WAL segments and ending WAL location.
>>> + */
>>>
>>> Comments seem out-dated.
>>>
>>> 2. With parallel jobs, maxrate is now not supported. Since we are now
>>> asking
>>> data in multiple threads throttling seems important here. Can you please
>>> explain why have you disabled that?
>>>
>>> 3. As we are always fetching a single file and as Robert suggested, let
>>> rename
>>> SEND_FILES to SEND_FILE instead.
>>>
>>> 4. Does this work on Windows? I mean does pthread_create() work on
>>> Windows?
>>> I asked this as I see that pgbench has its own implementation for
>>> pthread_create() for WIN32 but this patch doesn't.
>>>
>>> 5. Typos:
>>> tablspace => tablespace
>>> safly => safely
>>>
>>> 6. parallel_backup_run() needs some comments explaining the states it
>>> goes
>>> through PB_* states.
>>>
>>> 7.
>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server */
>>> +                if (backupinfo->activeworkers == 0)
>>> +                {
>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
>>> +                    free_filelist(backupinfo);
>>> +                }
>>> +                break;
>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from server
>>> */
>>> +                if (backupinfo->activeworkers == 0)
>>> +                {
>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
>>> +                }
>>> +                break;
>>>
>>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
>>>
>>> Thanks
>>> --
>>> Jeevan Chalke
>>> Associate Database Architect & Team Lead, Product Development
>>> EnterpriseDB Corporation
>>> The Enterprise PostgreSQL Company
>>>
>>> Phone: +91 20 66449694
>>>
>>> Website: www.enterprisedb.com
>>> EnterpriseDB Blog: http://blogs.enterprisedb.com/
>>> Follow us on Twitter: http://www.twitter.com/enterprisedb
>>>
>>>
>>

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-03-27T17:33:28Z

On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

> Hi Asif,
>
> While testing further I observed parallel backup is not able to take
> backup of standby server.
>
> mkdir /tmp/archive_dir
> echo "archive_mode='on'">> data/postgresql.conf
> echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
>
> ./pg_ctl -D data -l logs start
> ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
>
> echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
> /tmp/slave/postgresql.conf
> echo "restore_command='cp /tmp/archive_dir/%f %p'">>
> /tmp/slave/postgresql.conf
> echo "promote_trigger_file='/tmp/failover.log'">>
> /tmp/slave/postgresql.conf
>
> ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
>
> [edb@localhost bin]$ ./psql postgres -p 5432 -c "select
> pg_is_in_recovery();"
>  pg_is_in_recovery
> -------------------
>  f
> (1 row)
>
> [edb@localhost bin]$ ./psql postgres -p 5433 -c "select
> pg_is_in_recovery();"
>  pg_is_in_recovery
> -------------------
>  t
> (1 row)
>
>
>
>
> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
> pg_basebackup: error: could not list backup files: ERROR:  the standby was
> promoted during online backup
> HINT:  This means that the backup being taken is corrupt and should not be
> used. Try taking another online backup.
> pg_basebackup: removing data directory "/tmp/bkp_s"
>
> #same is working fine without parallel backup
> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
> [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
> /tmp/bkp_s/PG_VERSION
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
>> Hi Asif,
>>
>> In another scenarios, bkp data is corrupted for tablespace. again this is
>> not reproducible everytime,
>> but If I am running the same set of commands I am getting the same error.
>>
>> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
>> waiting for server to start.... done
>> server started
>> [edb@localhost bin]$
>> [edb@localhost bin]$ mkdir /tmp/tblsp
>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp
>> location '/tmp/tblsp';"
>> CREATE TABLESPACE
>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb
>> tablespace tblsp;"
>> CREATE DATABASE
>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
>> text);"
>> CREATE TABLE
>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
>> values ('parallel_backup with tablespace');"
>> INSERT 0 1
>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
>> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
>> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555"
>> start
>> waiting for server to start.... done
>> server started
>> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
>> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
>>   oid  |  spcname   | spcowner | spcacl | spcoptions
>> -------+------------+----------+--------+------------
>>   1663 | pg_default |       10 |        |
>>  16384 | tblsp      |       10 |        |
>> (2 rows)
>>
>> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
>> psql: error: could not connect to server: FATAL:
>>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
>> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
>> missing.
>> [edb@localhost bin]$
>> [edb@localhost bin]$ ls
>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>> [edb@localhost bin]$ ls
>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>> ls: cannot access
>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
>> directory
>>
>>
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>>
>>
>> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>
>>> Hi Asif,
>>>
>>> On testing further, I found when taking backup with -R, pg_basebackup
>>> crashed
>>> this crash is not consistently reproducible.
>>>
>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
>>> text);"
>>> CREATE TABLE
>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
>>> values ('parallel_backup with -R recovery-conf');"
>>> INSERT 0 1
>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
>>> -R
>>> Segmentation fault (core dumped)
>>>
>>> stack trace looks the same as it was on earlier reported crash with
>>> tablespace.
>>> --stack trace
>>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
>>> Loaded symbols for /lib64/libnss_files.so.2
>>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
>>> -R'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>>> pg_basebackup.c:3175
>>> 3175 backupinfo->curr = fetchfile->next;
>>> Missing separate debuginfos, use: debuginfo-install
>>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
>>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
>>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
>>> (gdb) bt
>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>>> pg_basebackup.c:3175
>>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
>>> pg_basebackup.c:2715
>>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
>>> pthread_create.c:301
>>> #3  0x00000039212e8c4d in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>> (gdb)
>>>
>>> Thanks & Regards,
>>> Rajkumar Raghuwanshi
>>>
>>>
>>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
>>> jeevan.chalke@enterprisedb.com> wrote:
>>>
>>>> Hi Asif,
>>>>
>>>>
>>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
>>>>> patch to the latest master (b7f64c64).
>>>>> (V9 of the patches are attached).
>>>>>
>>>>
>>>> I had a further review of the patches and here are my few observations:
>>>>
>>>> 1.
>>>> +/*
>>>> + * stop_backup() - ends an online backup
>>>> + *
>>>> + * The function is called at the end of an online backup. It sends out
>>>> pg_control
>>>> + * file, optionally WAL segments and ending WAL location.
>>>> + */
>>>>
>>>> Comments seem out-dated.
>>>>
>>>
Fixed.


>
>>>> 2. With parallel jobs, maxrate is now not supported. Since we are now
>>>> asking
>>>> data in multiple threads throttling seems important here. Can you please
>>>> explain why have you disabled that?
>>>>
>>>> 3. As we are always fetching a single file and as Robert suggested, let
>>>> rename
>>>> SEND_FILES to SEND_FILE instead.
>>>>
>>>
Yes, we are fetching a single file. However, SEND_FILES is still capable of
fetching multiple files in one go, which is why the name was kept.


>>>> 4. Does this work on Windows? I mean does pthread_create() work on
>>>> Windows?
>>>> I asked this as I see that pgbench has its own implementation for
>>>> pthread_create() for WIN32 but this patch doesn't.
>>>>
>>>
The patch is updated to add support for the Windows platform.


>>>> 5. Typos:
>>>> tablspace => tablespace
>>>> safly => safely
>>>>
Done.


>>>> 6. parallel_backup_run() needs some comments explaining the states it goes
>>>> through PB_* states.
>>>>
>>>> 7.
>>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server */
>>>> +                if (backupinfo->activeworkers == 0)
>>>> +                {
>>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
>>>> +                    free_filelist(backupinfo);
>>>> +                }
>>>> +                break;
>>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from server
>>>> */
>>>> +                if (backupinfo->activeworkers == 0)
>>>> +                {
>>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
>>>> +                }
>>>> +                break;
>>>>
Done.


>
>>>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
>>>>
Done.
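
The review point about free_filelist() can be pictured with a minimal sketch of the two fetch states. Only the PB_* state names, free_filelist(), activeworkers and backupstate come from the quoted hunk; the struct layouts and the backup_step() helper are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* State names are from the quoted hunk; everything else is illustrative. */
typedef enum
{
    PB_FETCH_REL_FILES,
    PB_STOP_BACKUP,
    PB_FETCH_WAL_FILES,
    PB_BACKUP_COMPLETE
} BackupState;

typedef struct FileNode
{
    struct FileNode *next;
} FileNode;

typedef struct
{
    BackupState backupstate;
    int         activeworkers;
    FileNode   *files;          /* hypothetical list of files to fetch */
} BackupInfo;

/* Release every node and leave the list empty. */
static void
free_filelist(BackupInfo *bi)
{
    assert(bi != NULL);
    while (bi->files != NULL)
    {
        FileNode   *next = bi->files->next;

        free(bi->files);
        bi->files = next;
    }
}

/* Advance one step; both fetch states free the list once workers are idle. */
static void
backup_step(BackupInfo *bi)
{
    assert(bi != NULL);
    switch (bi->backupstate)
    {
        case PB_FETCH_REL_FILES:
            if (bi->activeworkers == 0)
            {
                free_filelist(bi);
                bi->backupstate = PB_STOP_BACKUP;
            }
            break;
        case PB_FETCH_WAL_FILES:
            if (bi->activeworkers == 0)
            {
                free_filelist(bi);
                bi->backupstate = PB_BACKUP_COMPLETE;
            }
            break;
        default:
            break;
    }
}
```

In this sketch the state only advances when no worker is active, so the list is never freed while a worker may still be reading it.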

The corrupted tablespace and the crash reported by Rajkumar have been fixed.
A pointer variable remained uninitialized, which in turn caused the system
to misbehave.
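
For readers following the stack trace (the crash was at "backupinfo->curr = fetchfile->next;"), here is a minimal sketch of a shared file cursor that is explicitly initialized and NULL-checked under a mutex. It is not the code from the patch: FileQueue, queue_init() and queue_next() are hypothetical names, and only the curr/next/fetchfile naming mirrors the trace.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct BackupFile
{
    const char        *path;
    struct BackupFile *next;
} BackupFile;

typedef struct
{
    pthread_mutex_t lock;
    BackupFile     *head;
    BackupFile     *curr;   /* next file to hand out; must never be left unset */
} FileQueue;

/* Initialize the queue so that curr always starts at a defined position. */
static void
queue_init(FileQueue *q, BackupFile *head)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->head = head;
    q->curr = head;
}

/* Hand the next file to a worker, or NULL when the list is exhausted. */
static BackupFile *
queue_next(FileQueue *q)
{
    BackupFile *fetchfile;

    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    fetchfile = q->curr;
    if (fetchfile != NULL)
        q->curr = fetchfile->next;  /* only dereference a valid node */
    pthread_mutex_unlock(&q->lock);
    return fetchfile;
}
```

Each worker thread would call queue_next() in a loop and stop when it returns NULL.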

Attached is the updated set of patches. AFAIK, to complete the parallel
backup feature set, there remain three sub-features:

1- Parallel backup does not work with a standby server. In parallel backup,
the server spawns multiple processes and no shared state is maintained
between them. So currently there is no way to tell those processes whether
the standby was promoted at some point after START_BACKUP was called.

2- Throttling. Robert previously suggested that we implement throttling on
the client side. However, I found a previous discussion where it was
advocated that it be added to the backend instead[1].

So it seemed better to reach a consensus before moving the throttle function
to the client. That’s why, for the time being, I have disabled it and have
asked for suggestions on how to move forward.
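
As a rough illustration of what client-side throttling could look like, the sketch below computes how long a worker should sleep so that the total bytes received across all workers stays within a maximum rate. This is an assumption-laden sketch, not the patch's design: throttle_sleep_us() and its parameters are hypothetical, and the caller is assumed to keep a shared byte counter and to do the actual sleeping.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the number of microseconds to sleep so that bytes_sent, measured
 * over elapsed_us, does not exceed maxrate (bytes per second).
 * A maxrate of zero or less means "no throttling".
 */
static int64_t
throttle_sleep_us(int64_t bytes_sent, int64_t elapsed_us, int64_t maxrate)
{
    int64_t     target_us;

    assert(bytes_sent >= 0 && elapsed_us >= 0);
    if (maxrate <= 0)
        return 0;

    /* Time the transfer should have taken; split to limit overflow risk. */
    target_us = (bytes_sent / maxrate) * 1000000
        + (bytes_sent % maxrate) * 1000000 / maxrate;

    return (target_us > elapsed_us) ? target_us - elapsed_us : 0;
}
```

With several workers, bytes_sent would be the sum over all of them, so the limit applies to the backup as a whole rather than per connection.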

It seems to me that we have to maintain a shared state in order to support
taking a backup from a standby. Also, a new feature for backup progress
reporting in the backend (pg_stat_progress_basebackup) was recently
committed, via commit ID e65497df. For parallel backup to update these
stats, a shared state will be required.

Since multiple pg_basebackup instances can be running at the same time,
maintaining a shared state can become a little complex, unless we disallow
taking multiple parallel backups.

So, proceeding with this patch, I will be working on:
- implementing throttling on the client side.
- adding a shared state to handle backup from the standby.



[1]
https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-30T10:43:47Z

Thanks Asif,

I have re-verified the reported issues. Except for the standby backup, the
others are fixed.

Thanks & Regards,
Rajkumar Raghuwanshi



Re: WIP/PoC for parallel backup

Ahsan Hadi <ahsan.hadi@gmail.com> — 2020-03-30T12:58:18Z

On Mon, Mar 30, 2020 at 3:44 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

> Thanks Asif,
>
> I have re-verified reported issue. expect standby backup, others are fixed.
>

Yes, as Asif mentioned, he is working on the standby issue and on adding
bandwidth throttling functionality to parallel backup.

It would be good to get some feedback from Robert on Asif's previous email
regarding the design considerations for standby server support and
throttling. I believe all the other points mentioned by Robert in this
thread have been addressed by Asif, so it would be good to hear about any
other concerns that are not yet addressed.

Thanks,

-- Ahsan


> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Fri, Mar 27, 2020 at 11:04 PM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
>
>>
>>
>> On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>
>>> Hi Asif,
>>>
>>> While testing further I observed parallel backup is not able to take
>>> backup of standby server.
>>>
>>> mkdir /tmp/archive_dir
>>> echo "archive_mode='on'">> data/postgresql.conf
>>> echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
>>>
>>> ./pg_ctl -D data -l logs start
>>> ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
>>>
>>> echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
>>> /tmp/slave/postgresql.conf
>>> echo "restore_command='cp /tmp/archive_dir/%f %p'">>
>>> /tmp/slave/postgresql.conf
>>> echo "promote_trigger_file='/tmp/failover.log'">>
>>> /tmp/slave/postgresql.conf
>>>
>>> ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
>>>
>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "select
>>> pg_is_in_recovery();"
>>>  pg_is_in_recovery
>>> -------------------
>>>  f
>>> (1 row)
>>>
>>> [edb@localhost bin]$ ./psql postgres -p 5433 -c "select
>>> pg_is_in_recovery();"
>>>  pg_is_in_recovery
>>> -------------------
>>>  t
>>> (1 row)
>>>
>>>
>>>
>>>
>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
>>> pg_basebackup: error: could not list backup files: ERROR:  the standby was
>>> promoted during online backup
>>> HINT:  This means that the backup being taken is corrupt and should not be
>>> used. Try taking another online backup.
>>> pg_basebackup: removing data directory "/tmp/bkp_s"
>>>
>>> #same is working fine without parallel backup
>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
>>> [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
>>> /tmp/bkp_s/PG_VERSION
>>>
>>> Thanks & Regards,
>>> Rajkumar Raghuwanshi
>>>
>>>
>>> On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>>
>>>> Hi Asif,
>>>>
>>>> In another scenarios, bkp data is corrupted for tablespace. again this
>>>> is not reproducible everytime,
>>>> but If I am running the same set of commands I am getting the same
>>>> error.
>>>>
>>>> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
>>>> waiting for server to start.... done
>>>> server started
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$ mkdir /tmp/tblsp
>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace
>>>> tblsp location '/tmp/tblsp';"
>>>> CREATE TABLESPACE
>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database
>>>> testdb tablespace tblsp;"
>>>> CREATE DATABASE
>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
>>>> text);"
>>>> CREATE TABLE
>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
>>>> values ('parallel_backup with tablespace');"
>>>> INSERT 0 1
>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
>>>> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
>>>> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p
>>>> 5555" start
>>>> waiting for server to start.... done
>>>> server started
>>>> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
>>>> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
>>>>   oid  |  spcname   | spcowner | spcacl | spcoptions
>>>> -------+------------+----------+--------+------------
>>>>   1663 | pg_default |       10 |        |
>>>>  16384 | tblsp      |       10 |        |
>>>> (2 rows)
>>>>
>>>> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
>>>> psql: error: could not connect to server: FATAL:
>>>>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
>>>> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
>>>> missing.
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$ ls
>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>>>> [edb@localhost bin]$ ls
>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>>>> ls: cannot access
>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
>>>> directory
>>>>
>>>>
>>>> Thanks & Regards,
>>>> Rajkumar Raghuwanshi
>>>>
>>>>
>>>> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
>>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>>>
>>>>> Hi Asif,
>>>>>
>>>>> On testing further, I found when taking backup with -R, pg_basebackup
>>>>> crashed
>>>>> this crash is not consistently reproducible.
>>>>>
>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
>>>>> text);"
>>>>> CREATE TABLE
>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
>>>>> values ('parallel_backup with -R recovery-conf');"
>>>>> INSERT 0 1
>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D
>>>>> /tmp/test_bkp/bkp -R
>>>>> Segmentation fault (core dumped)
>>>>>
>>>>> stack trace looks the same as it was on earlier reported crash with
>>>>> tablespace.
>>>>> --stack trace
>>>>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
>>>>> Loaded symbols for /lib64/libnss_files.so.2
>>>>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D
>>>>> /tmp/test_bkp/bkp -R'.
>>>>> Program terminated with signal 11, Segmentation fault.
>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>>>>> pg_basebackup.c:3175
>>>>> 3175 backupinfo->curr = fetchfile->next;
>>>>> Missing separate debuginfos, use: debuginfo-install
>>>>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
>>>>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
>>>>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
>>>>> (gdb) bt
>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>>>>> pg_basebackup.c:3175
>>>>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
>>>>> pg_basebackup.c:2715
>>>>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
>>>>> pthread_create.c:301
>>>>> #3  0x00000039212e8c4d in clone () at
>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>>>> (gdb)
>>>>>
>>>>> Thanks & Regards,
>>>>> Rajkumar Raghuwanshi
>>>>>
>>>>>
>>>>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
>>>>> jeevan.chalke@enterprisedb.com> wrote:
>>>>>
>>>>>> Hi Asif,
>>>>>>
>>>>>>
>>>>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
>>>>>>> patch to the latest master (b7f64c64).
>>>>>>> (V9 of the patches are attached).
>>>>>>>
>>>>>>
>>>>>> I had a further review of the patches and here are my few
>>>>>> observations:
>>>>>>
>>>>>> 1.
>>>>>> +/*
>>>>>> + * stop_backup() - ends an online backup
>>>>>> + *
>>>>>> + * The function is called at the end of an online backup. It sends
>>>>>> out pg_control
>>>>>> + * file, optionally WAL segments and ending WAL location.
>>>>>> + */
>>>>>>
>>>>>> Comments seem out-dated.
>>>>>>
>>>>>
>> Fixed.
>>
>>
>>>
>>>>>> 2. With parallel jobs, maxrate is no longer supported. Since we are now
>>>>>> requesting data from multiple threads, throttling seems important here.
>>>>>> Can you please explain why you have disabled it?
>>>>>>
>>>>>> 3. As we are always fetching a single file, and as Robert suggested,
>>>>>> let's rename SEND_FILES to SEND_FILE.
>>>>>>
>>>>>
>> Yes, we are fetching a single file. However, SEND_FILES is still capable
>> of fetching multiple files in one
>> go, that's why the name.
>>
>>
>>>>>> 4. Does this work on Windows? I mean does pthread_create() work on
>>>>>> Windows?
>>>>>> I asked this as I see that pgbench has its own implementation for
>>>>>> pthread_create() for WIN32 but this patch doesn't.
>>>>>>
>>>>>
>> patch is updated to add support for the Windows platform.
>>
>>
>>>>>> 5. Typos:
>>>>>> tablspace => tablespace
>>>>>> safly => safely
>>>>>>
>> Done.
>>
>>
>>>>>> 6. parallel_backup_run() needs some comments explaining the PB_*
>>>>>> states it goes through.
>>>>>>
>>>>>> 7.
>>>>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server */
>>>>>> +                if (backupinfo->activeworkers == 0)
>>>>>> +                {
>>>>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
>>>>>> +                    free_filelist(backupinfo);
>>>>>> +                }
>>>>>> +                break;
>>>>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from
>>>>>> server */
>>>>>> +                if (backupinfo->activeworkers == 0)
>>>>>> +                {
>>>>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
>>>>>> +                }
>>>>>> +                break;
>>>>>>
>>>>> Done.
>>
>>
>>>
>>>>>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
>>>>>>
>>>>> Done.
>>
>> The corrupted tablespace and crash, reported by Rajkumar, have been
>> fixed. A pointer
>> variable remained uninitialized which in turn caused the system to
>> misbehave.
>>
>> Attached is the updated set of patches. AFAIK, to complete parallel
>> backup feature
>> set, there remain three sub-features:
>>
>> 1- Parallel backup does not work with a standby server. In parallel
>> backup, the server spawns multiple processes and no shared state is
>> maintained between them, so there is currently no way to tell those
>> processes whether the standby was promoted after START_BACKUP was called.
>>
>> 2- throttling. Robert previously suggested that we implement
>> throttling on the client-side.
>> However, I found a previous discussion where it was advocated to be added
>> to the
>> backend instead[1].
>>
>> So, it was better to have a consensus before moving the throttle function
>> to the client.
>> That’s why for the time being I have disabled it and have asked for
>> suggestions on it
>> to move forward.
>>
>> It seems to me that we have to maintain a shared state in order to
>> support taking backup
>> from standby. Also, there is a new feature recently committed for backup
>> progress
>> reporting in the backend (pg_stat_progress_basebackup). This
>> functionality was recently
>> added via this commit ID: e65497df. For parallel backup to update these
>> stats, a shared
>> state will be required.
>>
>> Since multiple pg_basebackup can be running at the same time, maintaining
>> a shared state
>> can become a little complex, unless we disallow taking multiple parallel
>> backups.
>>
>> So proceeding on with this patch, I will be working on:
>> - throttling to be implemented on the client-side.
>> - adding a shared state to handle backup from the standby.
>>
>>
>>
>> [1]
>> https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
>>
>>
>> --
>> Asif Rehman
>> Highgo Software (Canada/China/Pakistan)
>> URL : www.highgo.ca
>>
>>
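
For the client-side throttling item in the plan quoted above, one possible shape is a single byte budget shared by all worker threads, so that the aggregate rate rather than the per-worker rate is capped. This is only a sketch under assumptions: `Throttle`, `throttle_init` and `throttle_wait_time` are hypothetical names, not code from the patch, and the caller is expected to sleep for the returned number of microseconds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared throttle state; one instance for all workers. */
typedef struct Throttle
{
    pthread_mutex_t lock;
    uint64_t    max_bytes_per_sec;  /* aggregate cap across all workers */
    uint64_t    start_usec;         /* time the transfer started */
    uint64_t    bytes_done;         /* bytes received by all workers so far */
} Throttle;

static void
throttle_init(Throttle *t, uint64_t max_bytes_per_sec, uint64_t now_usec)
{
    pthread_mutex_init(&t->lock, NULL);
    t->max_bytes_per_sec = max_bytes_per_sec;
    t->start_usec = now_usec;
    t->bytes_done = 0;
}

/*
 * Account for nbytes just received and return how many microseconds the
 * calling worker should sleep so the aggregate rate stays under the cap.
 * The current time is passed in so the arithmetic is easy to check.
 */
static uint64_t
throttle_wait_time(Throttle *t, uint64_t nbytes, uint64_t now_usec)
{
    uint64_t    earliest;
    uint64_t    wait = 0;

    assert(t->max_bytes_per_sec > 0);

    pthread_mutex_lock(&t->lock);
    t->bytes_done += nbytes;

    /*
     * Earliest time at which bytes_done is "on schedule". Whole seconds and
     * the remainder are computed separately to avoid overflowing 64 bits.
     */
    earliest = t->start_usec
        + (t->bytes_done / t->max_bytes_per_sec) * UINT64_C(1000000)
        + ((t->bytes_done % t->max_bytes_per_sec) * UINT64_C(1000000))
        / t->max_bytes_per_sec;

    if (earliest > now_usec)
        wait = earliest - now_usec;
    pthread_mutex_unlock(&t->lock);

    return wait;
}
```

Because the counter is shared, a worker that receives a large file slows the others down as well, which is the point of capping the total rather than each connection.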

-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-02T09:57:52Z

Hi Asif,

My colleague Kashif Zeeshan reported an issue off-list; I am posting it here,
please take a look.

When two backups are run at the same time, the workers hit a FATAL error
because max_wal_senders is exceeded, but instead of exiting, pg_basebackup
reports the backup as completed. Starting the server from the resulting
backup directory then fails with an error.

[edb@localhost bin]$ ./pgbench -i -s 200 -h localhost -p 5432 postgres
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C2000270 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57849"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C3000050
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup1/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C20001C0 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57848"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C2000348
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed

[edb@localhost bin]$ ./pg_ctl -D /home/edb/Desktop/backup1/  -o "-p 5438"
start
pg_ctl: directory "/home/edb/Desktop/backup1" is not a database cluster
directory
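
The behaviour one would expect here can be sketched as follows: if any worker fails to connect or to finish its files, the backup as a whole is reported as failed rather than completed. This is an illustration only; `WorkerResult` and `backup_succeeded` are hypothetical names, not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker outcome recorded by the launching process. */
typedef struct WorkerResult
{
    bool        connected;      /* did the worker get a connection? */
    bool        files_ok;       /* did it fetch everything it picked up? */
} WorkerResult;

/*
 * Treat the backup as successful only if every worker connected and
 * finished cleanly. A failed worker may mean files were never copied,
 * so the caller should exit with an error instead of printing
 * "base backup completed".
 */
static bool
backup_succeeded(const WorkerResult *results, size_t nworkers)
{
    for (size_t i = 0; i < nworkers; i++)
    {
        if (!results[i].connected || !results[i].files_ok)
            return false;
    }
    return nworkers > 0;
}
```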

Thanks & Regards,
Rajkumar Raghuwanshi


On Mon, Mar 30, 2020 at 6:28 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:

>
>
> On Mon, Mar 30, 2020 at 3:44 PM Rajkumar Raghuwanshi <
> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
>> Thanks Asif,
>>
>> I have re-verified the reported issues. Except for the standby backup, the
>> others are fixed.
>>
>
> Yes, as Asif mentioned, he is working on the standby issue and on adding
> bandwidth throttling to parallel backup.
>
> It would be good to get some feedback from Robert on Asif's previous email
> regarding the design considerations for standby server support and
> throttling. I believe all the other points Robert raised in this thread have
> been addressed by Asif, so it would be good to hear about any concerns that
> remain.
>
> Thanks,
>
> -- Ahsan
>
>
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>>
>>
>> On Fri, Mar 27, 2020 at 11:04 PM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>>
>>>> Hi Asif,
>>>>
>>>> While testing further I observed parallel backup is not able to take
>>>> backup of standby server.
>>>>
>>>> mkdir /tmp/archive_dir
>>>> echo "archive_mode='on'">> data/postgresql.conf
>>>> echo "archive_command='cp %p /tmp/archive_dir/%f'">>
>>>> data/postgresql.conf
>>>>
>>>> ./pg_ctl -D data -l logs start
>>>> ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
>>>>
>>>> echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
>>>> /tmp/slave/postgresql.conf
>>>> echo "restore_command='cp /tmp/archive_dir/%f %p'">>
>>>> /tmp/slave/postgresql.conf
>>>> echo "promote_trigger_file='/tmp/failover.log'">>
>>>> /tmp/slave/postgresql.conf
>>>>
>>>> ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
>>>>
>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "select
>>>> pg_is_in_recovery();"
>>>>  pg_is_in_recovery
>>>> -------------------
>>>>  f
>>>> (1 row)
>>>>
>>>> [edb@localhost bin]$ ./psql postgres -p 5433 -c "select
>>>> pg_is_in_recovery();"
>>>>  pg_is_in_recovery
>>>> -------------------
>>>>  t
>>>> (1 row)
>>>>
>>>>
>>>>
>>>>
>>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
>>>> pg_basebackup: error: could not list backup files: ERROR:  the standby was
>>>> promoted during online backup
>>>> HINT:  This means that the backup being taken is corrupt and should not be
>>>> used. Try taking another online backup.
>>>> pg_basebackup: removing data directory "/tmp/bkp_s"
>>>>
>>>> #same is working fine without parallel backup
>>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
>>>> [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
>>>> /tmp/bkp_s/PG_VERSION
>>>>
>>>> Thanks & Regards,
>>>> Rajkumar Raghuwanshi
>>>>
>>>>
>>>> On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
>>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>>>
>>>>> Hi Asif,
>>>>>
>>>>> In another scenario, the backup data is corrupted for a tablespace.
>>>>> Again, this is not reproducible every time, but running the same set of
>>>>> commands gives me the same error.
>>>>>
>>>>> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
>>>>> waiting for server to start.... done
>>>>> server started
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$ mkdir /tmp/tblsp
>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace
>>>>> tblsp location '/tmp/tblsp';"
>>>>> CREATE TABLESPACE
>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database
>>>>> testdb tablespace tblsp;"
>>>>> CREATE DATABASE
>>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl
>>>>> (a text);"
>>>>> CREATE TABLE
>>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
>>>>> values ('parallel_backup with tablespace');"
>>>>> INSERT 0 1
>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
>>>>> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
>>>>> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p
>>>>> 5555" start
>>>>> waiting for server to start.... done
>>>>> server started
>>>>> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
>>>>> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
>>>>>   oid  |  spcname   | spcowner | spcacl | spcoptions
>>>>> -------+------------+----------+--------+------------
>>>>>   1663 | pg_default |       10 |        |
>>>>>  16384 | tblsp      |       10 |        |
>>>>> (2 rows)
>>>>>
>>>>> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
>>>>> psql: error: could not connect to server: FATAL:
>>>>>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
>>>>> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
>>>>> missing.
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$ ls
>>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>>>>> [edb@localhost bin]$ ls
>>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>>>>> ls: cannot access
>>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
>>>>> directory
>>>>>
>>>>>
>>>>> Thanks & Regards,
>>>>> Rajkumar Raghuwanshi
>>>>>
>>>>>
>>>>> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
>>>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>>>>
>>>>>> Hi Asif,
>>>>>>
>>>>>> On testing further, I found that pg_basebackup crashed when taking a
>>>>>> backup with -R. This crash is not consistently reproducible.
>>>>>>
>>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test
>>>>>> (a text);"
>>>>>> CREATE TABLE
>>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
>>>>>> values ('parallel_backup with -R recovery-conf');"
>>>>>> INSERT 0 1
>>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D
>>>>>> /tmp/test_bkp/bkp -R
>>>>>> Segmentation fault (core dumped)
>>>>>>
>>>>>> stack trace looks the same as it was on earlier reported crash with
>>>>>> tablespace.
>>>>>> --stack trace
>>>>>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
>>>>>> Loaded symbols for /lib64/libnss_files.so.2
>>>>>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D
>>>>>> /tmp/test_bkp/bkp -R'.
>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>>>>>> pg_basebackup.c:3175
>>>>>> 3175 backupinfo->curr = fetchfile->next;
>>>>>> Missing separate debuginfos, use: debuginfo-install
>>>>>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
>>>>>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
>>>>>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
>>>>>> (gdb) bt
>>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>>>>>> pg_basebackup.c:3175
>>>>>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
>>>>>> pg_basebackup.c:2715
>>>>>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
>>>>>> pthread_create.c:301
>>>>>> #3  0x00000039212e8c4d in clone () at
>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>>>>> (gdb)
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Rajkumar Raghuwanshi
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
>>>>>> jeevan.chalke@enterprisedb.com> wrote:
>>>>>>
>>>>>>> Hi Asif,
>>>>>>>
>>>>>>>
>>>>>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
>>>>>>>> patch to the latest master (b7f64c64).
>>>>>>>> (V9 of the patches are attached).
>>>>>>>>
>>>>>>>
>>>>>>> I had a further review of the patches and here are my few
>>>>>>> observations:
>>>>>>>
>>>>>>> 1.
>>>>>>> +/*
>>>>>>> + * stop_backup() - ends an online backup
>>>>>>> + *
>>>>>>> + * The function is called at the end of an online backup. It sends
>>>>>>> out pg_control
>>>>>>> + * file, optionally WAL segments and ending WAL location.
>>>>>>> + */
>>>>>>>
>>>>>>> Comments seem out-dated.
>>>>>>>
>>>>>>
>>> Fixed.
>>>
>>>
>>>>
>>>>>>> 2. With parallel jobs, maxrate is no longer supported. Since we are
>>>>>>> now requesting data from multiple threads, throttling seems important
>>>>>>> here. Can you please explain why you have disabled it?
>>>>>>>
>>>>>>> 3. As we are always fetching a single file, and as Robert suggested,
>>>>>>> let's rename SEND_FILES to SEND_FILE.
>>>>>>>
>>>>>>
>>> Yes, we are fetching a single file. However, SEND_FILES is still capable
>>> of fetching multiple files in one
>>> go, that's why the name.
>>>
>>>
>>>>>>> 4. Does this work on Windows? I mean does pthread_create() work on
>>>>>>> Windows?
>>>>>>> I asked this as I see that pgbench has its own implementation for
>>>>>>> pthread_create() for WIN32 but this patch doesn't.
>>>>>>>
>>>>>>
>>> patch is updated to add support for the Windows platform.
>>>
>>>
>>>>>>> 5. Typos:
>>>>>>> tablspace => tablespace
>>>>>>> safly => safely
>>>>>>>
>>> Done.
>>>
>>>
>>>>>>> 6. parallel_backup_run() needs some comments explaining the PB_*
>>>>>>> states it goes through.
>>>>>>>
>>>>>>> 7.
>>>>>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server
>>>>>>> */
>>>>>>> +                if (backupinfo->activeworkers == 0)
>>>>>>> +                {
>>>>>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
>>>>>>> +                    free_filelist(backupinfo);
>>>>>>> +                }
>>>>>>> +                break;
>>>>>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from
>>>>>>> server */
>>>>>>> +                if (backupinfo->activeworkers == 0)
>>>>>>> +                {
>>>>>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
>>>>>>> +                }
>>>>>>> +                break;
>>>>>>>
>>>>>> Done.
>>>
>>>
>>>>
>>>>>>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
>>>>>>>
>>>>>> Done.
>>>
>>> The corrupted tablespace and crash, reported by Rajkumar, have been
>>> fixed. A pointer
>>> variable remained uninitialized which in turn caused the system to
>>> misbehave.
>>>
>>> Attached is the updated set of patches. AFAIK, to complete parallel
>>> backup feature
>>> set, there remain three sub-features:
>>>
>>> 1- Parallel backup does not work with a standby server. In parallel
>>> backup, the server spawns multiple processes and no shared state is
>>> maintained between them, so there is currently no way to tell those
>>> processes whether the standby was promoted after START_BACKUP was called.
>>>
>>> 2- throttling. Robert previously suggested that we implement
>>> throttling on the client-side.
>>> However, I found a previous discussion where it was advocated to be
>>> added to the
>>> backend instead[1].
>>>
>>> So, it was better to have a consensus before moving the throttle
>>> function to the client.
>>> That’s why for the time being I have disabled it and have asked for
>>> suggestions on it
>>> to move forward.
>>>
>>> It seems to me that we have to maintain a shared state in order to
>>> support taking backup
>>> from standby. Also, there is a new feature recently committed for backup
>>> progress
>>> reporting in the backend (pg_stat_progress_basebackup). This
>>> functionality was recently
>>> added via this commit ID: e65497df. For parallel backup to update these
>>> stats, a shared
>>> state will be required.
>>>
>>> Since multiple pg_basebackup can be running at the same time,
>>> maintaining a shared state
>>> can become a little complex, unless we disallow taking multiple parallel
>>> backups.
>>>
>>> So proceeding on with this patch, I will be working on:
>>> - throttling to be implemented on the client-side.
>>> - adding a shared state to handle backup from the standby.
>>>
>>>
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
>>>
>>>
>>> --
>>> Asif Rehman
>>> Highgo Software (Canada/China/Pakistan)
>>> URL : www.highgo.ca
>>>
>>>
>
> --
> Highgo Software (Canada/China/Pakistan)
> URL : http://www.highgo.ca
> ADDR: 10318 WHALLEY BLVD, Surrey, BC
> EMAIL: mailto: ahsan.hadi@highgo.ca
>

Re: WIP/PoC for parallel backup

Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-02T11:29:47Z

Hi Asif

The backup failed with the error "could not connect to server: could not
look up local user ID 1000: Too many open files" when max_wal_senders was
set to 2000.
The errors start with backup worker 1017.
Please note that the backup directory was also not cleaned up after the
backup failed.


Steps
=======
1) Generate data in DB
 ./pgbench -i -s 600 -h localhost  -p 5432 postgres
2) Set max_wal_senders = 2000 in postgresql.
3) Generate the backup


[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
 /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/F1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_58692"
pg_basebackup: backup worker (0) created
….
…..
…..
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local
user ID 1000: Too many open files
pg_basebackup: backup worker (1018) created
pg_basebackup: error: could not connect to server: could not look up local
user ID 1000: Too many open files
…
…
…
pg_basebackup: error: could not connect to server: could not look up local
user ID 1000: Too many open files
pg_basebackup: backup worker (1989) created
pg_basebackup: error: could not create file
"/home/edb/Desktop/backup//global/4183": Too many open files
pg_basebackup: error: could not create file
"/home/edb/Desktop/backup//global/3592": Too many open files
pg_basebackup: error: could not create file
"/home/edb/Desktop/backup//global/4177": Too many open files
[edb@localhost bin]$
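
One way to avoid running into the descriptor limit is to compare the requested worker count against RLIMIT_NOFILE before any worker is started. The first failure at worker 1017 would be consistent with a soft limit of about 1024, although the report does not state the limit. A minimal sketch, assuming each worker needs a fixed number of descriptors: `FDS_PER_WORKER` and `RESERVED_FDS` are made-up illustrative values, and `max_workers_for_fd_limit` and `clamp_workers` are hypothetical helpers, not part of pg_basebackup.

```c
#include <sys/resource.h>

#define FDS_PER_WORKER 2    /* assumed: one socket plus one output file */
#define RESERVED_FDS   16   /* assumed: stdio, WAL receiver, misc. */

/* Pure helper: how many workers fit under a given descriptor limit. */
static long
max_workers_for_fd_limit(long fd_limit)
{
    if (fd_limit <= RESERVED_FDS)
        return 0;
    return (fd_limit - RESERVED_FDS) / FDS_PER_WORKER;
}

/* Clamp the requested --jobs value to what the process limit allows. */
static long
clamp_workers(long requested)
{
    struct rlimit rl;
    long        allowed;

    /* If the limit is unknown or unlimited, leave the request alone. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return requested;

    allowed = max_workers_for_fd_limit((long) rl.rlim_cur);
    return requested < allowed ? requested : allowed;
}
```

Whether to clamp silently or to reject the command line with an error is a design choice; rejecting is probably less surprising for a backup tool.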


4) The backup directory is not cleaned


[edb@localhost bin]$
[edb@localhost bin]$ ls  /home/edb/Desktop/backup
base    pg_commit_ts  pg_logical    pg_notify    pg_serial     pg_stat
 pg_subtrans  pg_twophase  pg_xact
global  pg_dynshmem   pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp
 pg_tblspc    pg_wal
[edb@localhost bin]$


Kashif Zeeshan
EnterpriseDB


On Thu, Apr 2, 2020 at 2:58 PM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

> Hi Asif,
>
> My colleague Kashif Zeeshan reported an issue off-list, posting here,
> please take a look.
>
> When two backups are run at the same time, the workers hit a FATAL error
> because max_wal_senders is exceeded, but instead of exiting, pg_basebackup
> reports the backup as completed. Starting the server from the resulting
> backup directory then fails with an error.
>
> [edb@localhost bin]$ ./pgbench -i -s 200 -h localhost -p 5432 postgres
> [edb@localhost bin]$ ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup/
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/C2000270 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_57849"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (3) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (4) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (5) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (6) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (7) created
> pg_basebackup: write-ahead log end point: 0/C3000050
> pg_basebackup: waiting for background process to finish streaming ...
> pg_basebackup: syncing data to disk ...
> pg_basebackup: base backup completed
> [edb@localhost bin]$ ./pg_basebackup -v -j 8 -D
>  /home/edb/Desktop/backup1/
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/C20001C0 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_57848"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (3) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (4) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (5) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (6) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: backup worker (7) created
> pg_basebackup: write-ahead log end point: 0/C2000348
> pg_basebackup: waiting for background process to finish streaming ...
> pg_basebackup: syncing data to disk ...
> pg_basebackup: base backup completed
>
> [edb@localhost bin]$ ./pg_ctl -D /home/edb/Desktop/backup1/  -o "-p 5438"
> start
> pg_ctl: directory "/home/edb/Desktop/backup1" is not a database cluster
> directory
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Mon, Mar 30, 2020 at 6:28 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
>
>>
>>
>> On Mon, Mar 30, 2020 at 3:44 PM Rajkumar Raghuwanshi <
>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>
>>> Thanks Asif,
>>>
>>> I have re-verified the reported issues. Except for the standby backup, the
>>> others are fixed.
>>>
>>
>> Yes, as Asif mentioned, he is working on the standby issue and on adding
>> bandwidth throttling to parallel backup.
>>
>> It would be good to get some feedback from Robert on Asif's previous email
>> regarding the design considerations for standby server support and
>> throttling. I believe all the other points Robert raised in this thread have
>> been addressed by Asif, so it would be good to hear about any concerns that
>> remain.
>>
>> Thanks,
>>
>> -- Ahsan
>>
>>
>>> Thanks & Regards,
>>> Rajkumar Raghuwanshi
>>>
>>>
>>> On Fri, Mar 27, 2020 at 11:04 PM Asif Rehman <asifr.rehman@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Mar 25, 2020 at 12:22 PM Rajkumar Raghuwanshi <
>>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>>>
>>>>> Hi Asif,
>>>>>
>>>>> While testing further I observed parallel backup is not able to take
>>>>> backup of standby server.
>>>>>
>>>>> mkdir /tmp/archive_dir
>>>>> echo "archive_mode='on'">> data/postgresql.conf
>>>>> echo "archive_command='cp %p /tmp/archive_dir/%f'">>
>>>>> data/postgresql.conf
>>>>>
>>>>> ./pg_ctl -D data -l logs start
>>>>> ./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
>>>>>
>>>>> echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">>
>>>>> /tmp/slave/postgresql.conf
>>>>> echo "restore_command='cp /tmp/archive_dir/%f %p'">>
>>>>> /tmp/slave/postgresql.conf
>>>>> echo "promote_trigger_file='/tmp/failover.log'">>
>>>>> /tmp/slave/postgresql.conf
>>>>>
>>>>> ./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
>>>>>
>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "select
>>>>> pg_is_in_recovery();"
>>>>>  pg_is_in_recovery
>>>>> -------------------
>>>>>  f
>>>>> (1 row)
>>>>>
>>>>> [edb@localhost bin]$ ./psql postgres -p 5433 -c "select
>>>>> pg_is_in_recovery();"
>>>>>  pg_is_in_recovery
>>>>> -------------------
>>>>>  t
>>>>> (1 row)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
>>>>> pg_basebackup: error: could not list backup files: ERROR:  the standby was
>>>>> promoted during online backup
>>>>> HINT:  This means that the backup being taken is corrupt and should not be
>>>>> used. Try taking another online backup.
>>>>> pg_basebackup: removing data directory "/tmp/bkp_s"
>>>>>
>>>>> #same is working fine without parallel backup
>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
>>>>> [edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
>>>>> /tmp/bkp_s/PG_VERSION
>>>>>
>>>>> Thanks & Regards,
>>>>> Rajkumar Raghuwanshi
>>>>>
>>>>>
>>>>> On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
>>>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>>>>
>>>>>> Hi Asif,
>>>>>>
>>>>>> In another scenario, the backup data is corrupted for a tablespace.
>>>>>> Again, this is not reproducible every time, but running the same set of
>>>>>> commands gives me the same error.
>>>>>>
>>>>>> [edb@localhost bin]$ ./pg_ctl -D data -l logfile start
>>>>>> waiting for server to start.... done
>>>>>> server started
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$ mkdir /tmp/tblsp
>>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace
>>>>>> tblsp location '/tmp/tblsp';"
>>>>>> CREATE TABLESPACE
>>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create database
>>>>>> testdb tablespace tblsp;"
>>>>>> CREATE DATABASE
>>>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl
>>>>>> (a text);"
>>>>>> CREATE TABLE
>>>>>> [edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl
>>>>>> values ('parallel_backup with tablespace');"
>>>>>> INSERT 0 1
>>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
>>>>>> /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
>>>>>> [edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p
>>>>>> 5555" start
>>>>>> waiting for server to start.... done
>>>>>> server started
>>>>>> [edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from
>>>>>> pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
>>>>>>   oid  |  spcname   | spcowner | spcacl | spcoptions
>>>>>> -------+------------+----------+--------+------------
>>>>>>   1663 | pg_default |       10 |        |
>>>>>>  16384 | tblsp      |       10 |        |
>>>>>> (2 rows)
>>>>>>
>>>>>> [edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from
>>>>>> testtbl";
>>>>>> psql: error: could not connect to server: FATAL:
>>>>>>  "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
>>>>>> DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is
>>>>>> missing.
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$ ls
>>>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>>>>>> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>>>>>> [edb@localhost bin]$ ls
>>>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
>>>>>> ls: cannot access
>>>>>> /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
>>>>>> directory
>>>>>>
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Rajkumar Raghuwanshi
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
>>>>>> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>>>>>>
>>>>>>> Hi Asif,
>>>>>>>
>>>>>>> On testing further, I found that pg_basebackup crashed when taking a
>>>>>>> backup with -R. This crash is not consistently reproducible.
>>>>>>>
>>>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test
>>>>>>> (a text);"
>>>>>>> CREATE TABLE
>>>>>>> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test
>>>>>>> values ('parallel_backup with -R recovery-conf');"
>>>>>>> INSERT 0 1
>>>>>>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D
>>>>>>> /tmp/test_bkp/bkp -R
>>>>>>> Segmentation fault (core dumped)
>>>>>>>
>>>>>>> stack trace looks the same as it was on earlier reported crash with
>>>>>>> tablespace.
>>>>>>> --stack trace
>>>>>>> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
>>>>>>> Loaded symbols for /lib64/libnss_files.so.2
>>>>>>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D
>>>>>>> /tmp/test_bkp/bkp -R'.
>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>>>>>>> pg_basebackup.c:3175
>>>>>>> 3175 backupinfo->curr = fetchfile->next;
>>>>>>> Missing separate debuginfos, use: debuginfo-install
>>>>>>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
>>>>>>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
>>>>>>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
>>>>>>> (gdb) bt
>>>>>>> #0  0x00000000004099ee in worker_get_files (wstate=0xc1e458) at
>>>>>>> pg_basebackup.c:3175
>>>>>>> #1  0x0000000000408a9e in worker_run (arg=0xc1e458) at
>>>>>>> pg_basebackup.c:2715
>>>>>>> #2  0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at
>>>>>>> pthread_create.c:301
>>>>>>> #3  0x00000039212e8c4d in clone () at
>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>>>>>> (gdb)
>>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Rajkumar Raghuwanshi
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
>>>>>>> jeevan.chalke@enterprisedb.com> wrote:
>>>>>>>
>>>>>>>> Hi Asif,
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks Rajkumar. I have fixed the above issues and have rebased
>>>>>>>>> the patch to the latest master (b7f64c64).
>>>>>>>>> (V9 of the patches are attached).
>>>>>>>>>
>>>>>>>>
>>>>>>>> I had a further review of the patches and here are my few
>>>>>>>> observations:
>>>>>>>>
>>>>>>>> 1.
>>>>>>>> +/*
>>>>>>>> + * stop_backup() - ends an online backup
>>>>>>>> + *
>>>>>>>> + * The function is called at the end of an online backup. It sends
>>>>>>>> out pg_control
>>>>>>>> + * file, optionally WAL segments and ending WAL location.
>>>>>>>> + */
>>>>>>>>
>>>>>>>> Comments seem out-dated.
>>>>>>>>
>>>>>>>
>>>> Fixed.
>>>>
>>>>
>>>>>
>>>>>>>> 2. With parallel jobs, maxrate is now not supported. Since we are
>>>>>>>> now asking
>>>>>>>> data in multiple threads throttling seems important here. Can you
>>>>>>>> please
>>>>>>>> explain why have you disabled that?
>>>>>>>>
>>>>>>>> 3. As we are always fetching a single file and as Robert suggested,
>>>>>>>> let rename
>>>>>>>> SEND_FILES to SEND_FILE instead.
>>>>>>>>
>>>>>>>
>>>> Yes, we are fetching a single file. However, SEND_FILES is still
>>>> capable of fetching multiple files in one
>>>> go, that's why the name.
>>>>
>>>>
>>>>>>>> 4. Does this work on Windows? I mean does pthread_create() work on
>>>>>>>> Windows?
>>>>>>>> I asked this as I see that pgbench has its own implementation for
>>>>>>>> pthread_create() for WIN32 but this patch doesn't.
>>>>>>>>
>>>>>>>
>>>> patch is updated to add support for the Windows platform.
>>>>
>>>>
>>>>>>>> 5. Typos:
>>>>>>>> tablspace => tablespace
>>>>>>>> safly => safely
>>>>>>>>
>>>>>>>> Done.
>>>>
>>>>
>>>>> 6. parallel_backup_run() needs some comments explaining the states it
>>>>>>>> goes
>>>>>>>> through PB_* states.
>>>>>>>>
>>>>>>>> 7.
>>>>>>>> +            case PB_FETCH_REL_FILES:    /* fetch files from server
>>>>>>>> */
>>>>>>>> +                if (backupinfo->activeworkers == 0)
>>>>>>>> +                {
>>>>>>>> +                    backupinfo->backupstate = PB_STOP_BACKUP;
>>>>>>>> +                    free_filelist(backupinfo);
>>>>>>>> +                }
>>>>>>>> +                break;
>>>>>>>> +            case PB_FETCH_WAL_FILES:    /* fetch WAL files from
>>>>>>>> server */
>>>>>>>> +                if (backupinfo->activeworkers == 0)
>>>>>>>> +                {
>>>>>>>> +                    backupinfo->backupstate = PB_BACKUP_COMPLETE;
>>>>>>>> +                }
>>>>>>>> +                break;
>>>>>>>>
>>>>>>> Done.
>>>>
>>>>
>>>>>
>>>>>>>> Why free_filelist() is not called in PB_FETCH_WAL_FILES case?
>>>>>>>>
>>>>>>> Done.
>>>>
>>>> The corrupted tablespace and crash, reported by Rajkumar, have been
>>>> fixed. A pointer
>>>> variable remained uninitialized which in turn caused the system to
>>>> misbehave.
>>>>
>>>> Attached is the updated set of patches. AFAIK, to complete parallel
>>>> backup feature
>>>> set, there remain three sub-features:
>>>>
>>>> 1- parallel backup does not work with a standby server. In parallel
>>>> backup, the server
>>>> spawns multiple processes and there is no shared state being
>>>> maintained. So currently,
>>>> no way to tell multiple processes if the standby was promoted during
>>>> the backup since
>>>> the START_BACKUP was called.
>>>>
>>>> 2- throttling. Robert previously suggested that we implement
>>>> throttling on the client-side.
>>>> However, I found a previous discussion where it was advocated to be
>>>> added to the
>>>> backend instead[1].
>>>>
>>>> So, it was better to have a consensus before moving the throttle
>>>> function to the client.
>>>> That’s why for the time being I have disabled it and have asked for
>>>> suggestions on it
>>>> to move forward.
>>>>
>>>> It seems to me that we have to maintain a shared state in order to
>>>> support taking backup
>>>> from standby. Also, there is a new feature recently committed for
>>>> backup progress
>>>> reporting in the backend (pg_stat_progress_basebackup). This
>>>> functionality was recently
>>>> added via this commit ID: e65497df. For parallel backup to update these
>>>> stats, a shared
>>>> state will be required.
>>>>
>>>> Since multiple pg_basebackup can be running at the same time,
>>>> maintaining a shared state
>>>> can become a little complex, unless we disallow taking multiple
>>>> parallel backups.
>>>>
>>>> So proceeding on with this patch, I will be working on:
>>>> - throttling to be implemented on the client-side.
>>>> - adding a shared state to handle backup from the standby.
>>>>
>>>>
>>>>
>>>> [1]
>>>> https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
>>>>
>>>>
>>>> --
>>>> Asif Rehman
>>>> Highgo Software (Canada/China/Pakistan)
>>>> URL : www.highgo.ca
>>>>
>>>>
>>
>> --
>> Highgo Software (Canada/China/Pakistan)
>> URL : http://www.highgo.ca
>> ADDR: 10318 WHALLEY BLVD, Surrey, BC
>> EMAIL: mailto: ahsan.hadi@highgo.ca
>>
>

-- 
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-02T11:46:51Z

On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one
> go, that's why the name.

I don't see why it should work that way. If we're fetching individual
files, why have an unused capability to fetch multiple files?

> 1- parallel backup does not work with a standby server. In parallel backup, the server
> spawns multiple processes and there is no shared state being maintained. So currently,
> no way to tell multiple processes if the standby was promoted during the backup since
> the START_BACKUP was called.

Why would you need to do that? As long as the process where
STOP_BACKUP runs can do the check, that seems good enough.

> 2- throttling. Robert previously suggested that we implement throttling on the client-side.
> However, I found a previous discussion where it was advocated to be added to the
> backend instead[1].
>
> So, it was better to have a consensus before moving the throttle function to the client.
> That’s why for the time being I have disabled it and have asked for suggestions on it
> to move forward.
>
> It seems to me that we have to maintain a shared state in order to support taking backup
> from standby. Also, there is a new feature recently committed for backup progress
> reporting in the backend (pg_stat_progress_basebackup). This functionality was recently
> added via this commit ID: e65497df. For parallel backup to update these stats, a shared
> state will be required.

I've come around to the view that a shared state is a good idea and
that throttling on the server-side makes more sense. I'm not clear on
whether we need shared state only for throttling or whether we need it
for more than that. Another possible reason might be for the
progress-reporting stuff that just got added.

> Since multiple pg_basebackup can be running at the same time, maintaining a shared state
> can become a little complex, unless we disallow taking multiple parallel backups.

I do not see why it would be necessary to disallow taking multiple
parallel backups. You just need to have multiple copies of the shared
state and a way to decide which one to use for any particular backup.
I guess that is a little complex, but only a little.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-02T11:48:33Z

On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:

> The backup failed with errors "error: could not connect to server: could
> not look up local user ID 1000: Too many open files" when the
> max_wal_senders was set to 2000.
> The errors generated for the workers starting from backup worke=1017.
>

It wasn't the fact that you set max_wal_senders to 2000. It was the fact
that you specified 1990 parallel workers. By so doing, you overloaded the
machine, which is why everything failed. That's to be expected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-02T11:54:49Z

On Thu, Apr 2, 2020 at 4:48 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <
> kashif.zeeshan@enterprisedb.com> wrote:
>
>> The backup failed with errors "error: could not connect to server: could
>> not look up local user ID 1000: Too many open files" when the
>> max_wal_senders was set to 2000.
>> The errors generated for the workers starting from backup worke=1017.
>>
>
> It wasn't the fact that you set max_wal_senders to 2000. It was the fact
> that you specified 1990 parallel workers. By so doing, you overloaded the
> machine, which is why everything failed. That's to be expected.
>
Thanks a lot, Robert.
In this case the backup folder was not emptied after the backup failed;
the cleanup should be done in this case too.


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


-- 
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-02T13:23:21Z

On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
<kashif.zeeshan@enterprisedb.com> wrote:
> Thanks alot Robert,
> In this case the backup folder was not being emptied as the backup was failed, the cleanup should be done in this case too.

Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-02T13:46:15Z

On Thu, Apr 2, 2020 at 6:23 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
> <kashif.zeeshan@enterprisedb.com> wrote:
> > Thanks alot Robert,
> > In this case the backup folder was not being emptied as the backup was
> failed, the cleanup should be done in this case too.
>
> Does it fail to clean up the backup folder in all cases where the
> backup failed, or just in this case?
>
The cleanup is done in the cases I have seen so far with base pg_basebackup
functionality (not including the parallel backup feature) with the message
"pg_basebackup: removing contents of data directory"
A similar case was also fixed for parallel backup reported by Rajkumar
where the contents of the backup folder were not cleaned up after the error.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


-- 
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-02T14:20:15Z

On Thu, Apr 2, 2020 at 9:46 AM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:

> Does it fail to clean up the backup folder in all cases where the
>> backup failed, or just in this case?
>>
> The cleanup is done in the cases I have seen so far with base
> pg_basebackup functionality (not including the parallel backup feature)
> with the message "pg_basebackup: removing contents of data directory"
> A similar case was also fixed for parallel backup reported by Rajkumar
> where the contents of the backup folder were not cleaned up after the error.
>

What I'm saying is that it's unclear whether there's a bug here or whether
it just failed because of the very extreme test scenario you created.
Spawning >1000 processes on a small machine can easily make a lot of things
fail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-02T15:16:57Z

On Thu, Apr 2, 2020 at 4:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
> > Yes, we are fetching a single file. However, SEND_FILES is still capable
> of fetching multiple files in one
> > go, that's why the name.
>
> I don't see why it should work that way. If we're fetching individual
> files, why have an unused capability to fetch multiple files?
>

Okay, I will rename it and modify the function to send a single file as well.


> > 1- parallel backup does not work with a standby server. In parallel
> backup, the server
> > spawns multiple processes and there is no shared state being maintained.
> So currently,
> > no way to tell multiple processes if the standby was promoted during the
> backup since
> > the START_BACKUP was called.
>
> Why would you need to do that? As long as the process where
> STOP_BACKUP can do the check, that seems good enough.
>


Yes, but the user will get the error only after STOP_BACKUP, not while the
backup is in progress. So if the backup is a large one, early error detection
would be very beneficial. This is the current behavior of non-parallel backup
as well.


>
> > 2- throttling. Robert previously suggested that we implement throttling
> on the client-side.
> > However, I found a previous discussion where it was advocated to be
> added to the
> > backend instead[1].
> >
> > So, it was better to have a consensus before moving the throttle
> function to the client.
> > That’s why for the time being I have disabled it and have asked for
> suggestions on it
> > to move forward.
> >
> > It seems to me that we have to maintain a shared state in order to
> support taking backup
> > from standby. Also, there is a new feature recently committed for backup
> progress
> > reporting in the backend (pg_stat_progress_basebackup). This
> functionality was recently
> > added via this commit ID: e65497df. For parallel backup to update these
> stats, a shared
> > state will be required.
>
> I've come around to the view that a shared state is a good idea and
> that throttling on the server-side makes more sense. I'm not clear on
> whether we need shared state only for throttling or whether we need it
> for more than that. Another possible reason might be for the
> progress-reporting stuff that just got added.
>

Okay, then I will add the shared state. And since we are adding the shared
state, we can use
that for throttling, progress-reporting and standby early error checking.


> > Since multiple pg_basebackup can be running at the same time,
> maintaining a shared state
> > can become a little complex, unless we disallow taking multiple parallel
> backups.
>
> I do not see why it would be necessary to disallow taking multiple
> parallel backups. You just need to have multiple copies of the shared
> state and a way to decide which one to use for any particular backup.
> I guess that is a little complex, but only a little.
>

There are two possible options:

(1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
(2) (Preferred Option) Use the WAL start location as the BackupID.


This BackupID should be given back as a response to the start backup command.
All client workers must append this ID to all parallel backup replication
commands, so that we can use this identifier to search for that particular
backup. Does that sound good?


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-02T15:44:59Z

On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.

Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.

TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.

Please propose a grammar here for all the new replication commands you
plan to add before going and implementing everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.

> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?

Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.

I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:

START_BACKUP unique_backup_id [options]...

And then others do:

JOIN_BACKUP unique_backup_id

My thought is that you will have a number of shared memory structures
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
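
A minimal sketch of that slot scheme, under stated assumptions: the names
BackupSlot, backup_slot_start, backup_slot_join and backup_slot_release are
hypothetical, the array is an ordinary static array rather than real shared
memory, and no locking is shown (a server implementation would need a lock
around both the search and the allocation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BACKUP_ID_LEN    64   /* stand-in for NAMEDATALEN-or-something */
#define MAX_BACKUP_SLOTS 8    /* stand-in for max_wal_senders */

/* One slot per concurrent parallel backup (hypothetical layout). */
typedef struct BackupSlot
{
    bool in_use;
    char backupid[BACKUP_ID_LEN];
} BackupSlot;

static BackupSlot backup_slots[MAX_BACKUP_SLOTS];

/*
 * START_BACKUP side: claim a free slot and copy the ID into it.
 * Returns NULL if the ID is too long, already in use, or no slot is free.
 */
BackupSlot *
backup_slot_start(const char *backupid)
{
    BackupSlot *free_slot = NULL;
    size_t      i;

    assert(backupid != NULL);
    if (strlen(backupid) >= BACKUP_ID_LEN)
        return NULL;

    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (backup_slots[i].in_use)
        {
            /* Refuse to start a second backup under the same ID. */
            if (strcmp(backup_slots[i].backupid, backupid) == 0)
                return NULL;
        }
        else if (free_slot == NULL)
            free_slot = &backup_slots[i];
    }

    if (free_slot == NULL)
        return NULL;

    strcpy(free_slot->backupid, backupid);
    free_slot->in_use = true;
    return free_slot;
}

/* JOIN_BACKUP side: find the slot by name, or NULL if there is none. */
BackupSlot *
backup_slot_join(const char *backupid)
{
    size_t i;

    assert(backupid != NULL);
    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (backup_slots[i].in_use &&
            strcmp(backup_slots[i].backupid, backupid) == 0)
            return &backup_slots[i];
    }
    return NULL;
}

/* STOP_BACKUP or abort: give the slot back. */
void
backup_slot_release(BackupSlot *slot)
{
    assert(slot != NULL);
    slot->in_use = false;
    slot->backupid[0] = '\0';
}
```

A linear search is enough here because the number of slots is bounded by the
number of connections that could possibly be taking a backup.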

If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader process's PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.
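
As a rough illustration of the ID format only: the sketch below reads 16
bytes from /dev/urandom and hex-encodes them. Reading /dev/urandom is an
assumption made so the example stands alone outside the server; inside
PostgreSQL the random bytes would come from pg_strong_random instead. The
function names generate_backup_id and hex_encode_bytes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define BACKUP_ID_BYTES 16    /* 16 random bytes -> 32 hex characters */

/* Hex-encode len bytes from src into dst; dst needs 2 * len + 1 bytes. */
static void
hex_encode_bytes(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";
    size_t      i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0F];
    }
    dst[2 * len] = '\0';
}

/*
 * Fill dst with a hex-encoded random backup ID.
 * Returns false if dst is too small or random bytes cannot be read.
 */
bool
generate_backup_id(char *dst, size_t dstlen)
{
    unsigned char raw[BACKUP_ID_BYTES];
    FILE       *fp;
    size_t      nread;

    if (dst == NULL || dstlen < 2 * BACKUP_ID_BYTES + 1)
        return false;

    /* Assumption: a Unix-like system; the server would use pg_strong_random. */
    fp = fopen("/dev/urandom", "rb");
    if (fp == NULL)
        return false;
    nread = fread(raw, 1, sizeof(raw), fp);
    fclose(fp);
    if (nread != sizeof(raw))
        return false;

    hex_encode_bytes(raw, sizeof(raw), dst);
    return true;
}
```

With 16 bytes the resulting string is 32 characters long, which fits
comfortably in a NAMEDATALEN-sized buffer.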

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-03T08:45:23Z

On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
> >> Why would you need to do that? As long as the process where
> >> STOP_BACKUP can do the check, that seems good enough.
> >
> > Yes, but the user will get the error only after the STOP_BACKUP, not
> while the backup is
> > in progress. So if the backup is a large one, early error detection
> would be much beneficial.
> > This is the current behavior of non-parallel backup as well.
>
> Because non-parallel backup does not feature early detection of this
> error, it is not necessary to make parallel backup do so. Indeed, it
> is undesirable. If you want to fix that problem, do it on a separate
> thread in a separate patch. A patch proposing to make parallel backup
> inconsistent in behavior with non-parallel backup will be rejected, at
> least if I have anything to say about it.
>
> TBH, fixing this doesn't seem like an urgent problem to me. The
> current situation is not great, but promotions ought to be relatively
> infrequent, so I'm not sure it's a huge problem in practice. It is
> also worth considering whether the right fix is to figure out how to
> make that case actually work, rather than just making it fail quicker.
> I don't currently understand the reason for the prohibition so I can't
> express an intelligent opinion on what the right answer is here, but
> it seems like it ought to be investigated before somebody goes and
> builds a bunch of infrastructure to make the error more timely.
>

Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir():


    /*
     * Check if the postmaster has signaled us to exit, and abort with an
     * error in that case. The error handler further up will call
     * do_pg_abort_backup() for us. Also check that if the backup was
     * started while still in recovery, the server wasn't promoted.
     * do_pg_stop_backup() will check that too, but it's better to stop
     * the backup early than continue to the end and fail there.
     */
    CHECK_FOR_INTERRUPTS();
    if (RecoveryInProgress() != backup_started_in_recovery)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("the standby was promoted during online backup"),
                 errhint("This means that the backup being taken is corrupt "
                         "and should not be used. "
                         "Try taking another online backup.")));

>
> > Okay, then I will add the shared state. And since we are adding the
> shared state, we can use
> > that for throttling, progress-reporting and standby early error checking.
>
> Please propose a grammar here for all the new replication commands you
> plan to add before going and implement everything. That will make it
> easier to hash out the design without forcing you to keep changing the
> code. Your design should include a sketch of how several sets of
> coordinating backends taking several concurrent parallel backups will
> end up with one shared state per parallel backup.
>
> > There are two possible options:
> >
> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> > (2) (Preferred Option) Use the WAL start location as the BackupID.
> >
> > This BackupID should be given back as a response to start backup
> command. All client workers
> > must append this ID to all parallel backup replication commands. So that
> we can use this identifier
> > to search for that particular backup. Does that sound good?
>
> Using the WAL start location as the backup ID seems like it might be
> problematic -- could a single checkpoint not end up as the start
> location for multiple backups started at the same time? Whether that's
> possible now or not, it seems unwise to hard-wire that assumption into
> the wire protocol.
>
> I was thinking that perhaps the client should generate a unique backup
> ID, e.g. leader does:
>
> START_BACKUP unique_backup_id [options]...
>
> And then others do:
>
> JOIN_BACKUP unique_backup_id
>
> My thought is that you will have a number of shared memory structure
> equal to max_wal_senders, each one large enough to hold the shared
> state for one backup. The shared state will include
> char[NAMEDATALEN-or-something] which will be used to hold the backup
> ID. START_BACKUP would allocate one and copy the name into it;
> JOIN_BACKUP would search for one by name.
>
> If you want to generate the name on the server side, then I suppose
> START_BACKUP would return a result set that includes the backup ID,
> and clients would have to specify that same backup ID when invoking
> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
> better. Either way, the backup ID should be something long and hard to
> guess, not e.g. the leader processes' PID. I think we should generate
> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
> result to get a string. That way there's almost no risk of two backup
> IDs colliding accidentally, and even if we somehow had a malicious
> user trying to screw up somebody else's parallel backup by choosing a
> colliding backup ID, it would be pretty hard to have any success. A
> user with enough access to do that sort of thing can probably cause a
> lot worse problems anyway, but it seems pretty easy to guard against
> intentional collisions robustly here, so I think we should.
>
>
Okay, so if we add another replication command ‘JOIN_BACKUP unique_backup_id’
to let workers find the relevant shared state, there won't be any need to
change the grammar for any other command. START_BACKUP can return the
unique_backup_id in its result set.

I am thinking of the following struct for shared state:

    typedef struct
    {
        char        backupid[NAMEDATALEN];
        XLogRecPtr  startptr;

        slock_t     lock;
        int64       throttling_counter;
        bool        backup_started_in_recovery;
    } BackupSharedState;

The shared state structure entries would be maintained in a shared hash
table. There will be one structure per parallel backup. Since a single
parallel backup can engage more than one WAL sender, I think max_wal_senders
entries might be a little too many; perhaps max_wal_senders/2, since there
will be at least 2 connections per parallel backup? Alternatively, we can add
a new GUC that defines the maximum number of concurrent parallel backups,
i.e. ‘max_concurrent_backups_allowed = 10’ perhaps, or we can make it
user-configurable.

The key would be “backupid=hex_encode(pg_strong_random(16))”

Checking for Standby Promotion:
At the START_BACKUP command, we initialize
BackupSharedState.backup_started_in_recovery
and keep checking it whenever send_file () is called to send a new file.

Throttling:
BackupSharedState.throttling_counter - The throttling logic remains the same
as for non-parallel backup, with the exception that multiple backends will
now be updating the counter. So in parallel backup, it will represent the
overall number of bytes that have been transferred, and workers will sleep
once the limit has been exceeded. Hence, the shared state carries a lock so
that the throttling value can be updated safely.

Progress Reporting:
I think we should add progress-reporting for parallel backup as a separate
patch. The relevant entries for progress-reporting, such as ‘backup_total’
and ‘backup_streamed’, would then be added to this structure as well.


Grammar:
There is a change in the resultset being returned for START_BACKUP command;
unique_backup_id is added. Additionally, JOIN_BACKUP replication command is
added. SEND_FILES has been renamed to SEND_FILE. There are no other changes
to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
  - returns startptr, tli, backup_label, unique_backup_id
STOP_BACKUP [NOWAIT]
  - returns startptr, tli, backup_label
JOIN_BACKUP ‘unique_backup_id’
  - attaches a shared state identified by ‘unique_backup_id’ to a backend
process.

LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-03T10:01:17Z

Hi Asif

When a non-existent slot is used together with a tablespace mapping, the
correct error is displayed, but the backup folder is not cleaned up and a
corrupt backup is left behind.

Steps
=======

[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#


[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 2 -D  /home/edb/Desktop/backup/
-T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR:  replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$

backup folder not cleaned

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact               postgresql.conf
base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal      postgresql.auto.conf
[edb@localhost bin]$




If the same case is executed without the parallel backup patch then the
backup folder is cleaned after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR:  replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
*pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
pg_basebackup: changes to tablespace directories will not be undone



-- 
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Re: WIP/PoC for parallel backup

Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2020-04-07T04:15:00Z

Asif,

After the recent addition of backup manifests, the patches need a rebase,
and a few things need to be reconsidered, such as making sure that parallel
backup creates the manifest file correctly.

-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-07T11:03:46Z

On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:

> Hi Asif
>
> When a non-existent slot is used with tablespace then correct error is
> displayed but then the backup folder is not cleaned and leaves a corrupt
> backup.
>
> Steps
> =======
>
> edb@localhost bin]$
> [edb@localhost bin]$ mkdir /home/edb/tbl1
> [edb@localhost bin]$ mkdir /home/edb/tbl_res
> [edb@localhost bin]$
> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
> CREATE TABLESPACE
> postgres=#
> postgres=# create table t1 (a int) tablespace tbl1;
> CREATE TABLE
> postgres=# insert into t1 values(100);
> INSERT 0 1
> postgres=# insert into t1 values(200);
> INSERT 0 1
> postgres=# insert into t1 values(300);
> INSERT 0 1
> postgres=#
>
>
> [edb@localhost bin]$
> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: error: could not send replication command
> "START_REPLICATION": ERROR:  replication slot "test" does not exist
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: write-ahead log end point: 0/2E000100
> pg_basebackup: waiting for background process to finish streaming ...
> pg_basebackup: error: child thread exited with error 1
> [edb@localhost bin]$
>
> backup folder not cleaned
>
> [edb@localhost bin]$
> [edb@localhost bin]$
> [edb@localhost bin]$
> [edb@localhost bin]$ ls /home/edb/Desktop/backup
> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
>             postgresql.conf
> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
>  postgresql.auto.conf
> [edb@localhost bin]$
>
>
>
>
> If the same case is executed without the parallel backup patch then the
> backup folder is cleaned after the error is displayed.
>
> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T
> /home/edb/tbl1=/home/edb/tbl_res -S test999
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: error: could not send replication command
> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
> pg_basebackup: write-ahead log end point: 0/2B000100
> pg_basebackup: waiting for background process to finish streaming ...
> pg_basebackup: error: child process exited with exit code 1
> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
> pg_basebackup: changes to tablespace directories will not be undone
>


Hi Asif

A similar case: when the DB server is shut down while a parallel backup is
in progress, the correct error is displayed, but the backup folder is not
cleaned up and a corrupt backup is left behind. I think one bug fix will
solve all these cases where cleanup is not done when a parallel backup fails.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

When the same case is executed with pg_basebackup without the parallel
backup patch, proper cleanup is done.

[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory
"/home/edb/Desktop/backup/"
[edb@localhost bin]$
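
Both unpatched runs end with a cleanup message ("removing data directory"
in the first report, "removing contents of data directory" here), while the
patched runs print neither. Below is a minimal sketch of the decision an
exit-time cleanup step has to make. It assumes the failure path of the
worker threads actually reaches that step, which is only a guess at where
the gap lies. The struct, enum, and function names are hypothetical and are
not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for the client side; names are illustrative. */
typedef struct BackupCleanupState
{
    bool success;            /* set only once the whole backup completed */
    bool made_new_dir;       /* target directory was created by this run */
    bool found_existing_dir; /* target directory already existed (empty) */
} BackupCleanupState;

typedef enum CleanupAction
{
    CLEANUP_NONE,            /* leave the target directory alone */
    CLEANUP_REMOVE_DIR,      /* remove the directory we created */
    CLEANUP_REMOVE_CONTENTS  /* keep the directory, remove what we wrote */
} CleanupAction;

/*
 * Decide what to do with the target directory at exit. A failure reported
 * by any worker counts as a failed backup, even if the leader itself did
 * not hit an error.
 */
static CleanupAction
decide_cleanup(const BackupCleanupState *st, bool any_worker_failed)
{
    if (st->success && !any_worker_failed)
        return CLEANUP_NONE;
    if (st->made_new_dir)
        return CLEANUP_REMOVE_DIR;
    if (st->found_existing_dir)
        return CLEANUP_REMOVE_CONTENTS;
    return CLEANUP_NONE;
}
```

The point of the sketch is that a worker failure has to feed into the same
decision as a leader failure. If a worker terminates on its own without
passing its status back, the check above never sees the error.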

Thanks



-- 
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-07T16:43:51Z

Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar
changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
    - This generates a unique backupid using pg_strong_random(16),
      hex-encodes it, and returns it in the result set.
    - It also creates a shared state and adds it to the hash table. The
      hash table size is set to BACKUP_HASH_SIZE=10, but since the hash
      table can expand dynamically, I think that is a sufficient initial
      size. max_wal_senders is not used, because it can be set to quite a
      large value.
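
The ID generation step can be sketched as follows. This is a standalone
illustration, not the patch code: fill_random() is a hypothetical stand-in
that reads /dev/urandom where the server would call pg_strong_random(), and
hex_encode_id() and generate_backup_id() are likewise made-up names.

```c
#include <stddef.h>
#include <stdio.h>

#define BACKUP_ID_BYTES 16

/* Hex-encode len bytes into dst; dst must hold 2 * len + 1 chars. */
static void
hex_encode_id(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";
    size_t      i;

    for (i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0f];
    }
    dst[2 * len] = '\0';
}

/* Stand-in for pg_strong_random(); returns 1 on success, 0 on failure. */
static int
fill_random(unsigned char *buf, size_t len)
{
    FILE       *f = fopen("/dev/urandom", "rb");
    size_t      got;

    if (f == NULL)
        return 0;
    got = fread(buf, 1, len, f);
    fclose(f);
    return got == len;
}

/* Write a 32-character hex ID into dst (2 * BACKUP_ID_BYTES + 1 chars). */
static int
generate_backup_id(char *dst)
{
    unsigned char raw[BACKUP_ID_BYTES];

    if (!fill_random(raw, sizeof(raw)))
        return 0;
    hex_encode_id(raw, sizeof(raw), dst);
    return 1;
}
```

Sixteen random bytes give a 32-character ID, so the buffer that receives it
needs room for 33 chars including the terminator.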

JOIN_BACKUP 'backup_id'
    - finds 'backup_id' in the hash table and attaches it to the server
      process.
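
As a rough illustration of the START_BACKUP / JOIN_BACKUP pairing, here is
a sketch that looks the ID up in a small fixed array. That is a deliberate
simplification: the patch is described as using a hash table in shared
memory, and real code would need locking around these operations. The
BackupSlot type and both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define BACKUP_ID_LEN 33        /* 32 hex chars plus terminator */
#define MAX_BACKUPS   10

typedef struct BackupSlot
{
    int         in_use;
    char        backupid[BACKUP_ID_LEN];
    int         attached;       /* number of backends attached so far */
} BackupSlot;

/*
 * START_BACKUP side: claim a free slot for a new ID. Returns NULL if the
 * ID is too long, already registered, or no slot is free.
 */
static BackupSlot *
register_backup(BackupSlot *slots, size_t nslots, const char *id)
{
    BackupSlot *free_slot = NULL;
    size_t      i;

    if (strlen(id) >= BACKUP_ID_LEN)
        return NULL;
    for (i = 0; i < nslots; i++)
    {
        if (slots[i].in_use)
        {
            if (strcmp(slots[i].backupid, id) == 0)
                return NULL;    /* ID collision */
        }
        else if (free_slot == NULL)
            free_slot = &slots[i];
    }
    if (free_slot == NULL)
        return NULL;
    free_slot->in_use = 1;
    strcpy(free_slot->backupid, id);
    free_slot->attached = 1;    /* the leader counts as attached */
    return free_slot;
}

/* JOIN_BACKUP side: find the slot by ID and attach; NULL if unknown. */
static BackupSlot *
join_backup(BackupSlot *slots, size_t nslots, const char *id)
{
    size_t      i;

    for (i = 0; i < nslots; i++)
    {
        if (slots[i].in_use && strcmp(slots[i].backupid, id) == 0)
        {
            slots[i].attached++;
            return &slots[i];
        }
    }
    return NULL;
}
```

A worker that sends an unknown ID gets NULL back, which the server side
would turn into an error for that connection.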


SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
    - renamed SEND_FILES to SEND_FILE
    - removed START_WAL_LOCATION from this because 'startptr' is now
      accessible through the shared state.

There is no change in other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The
backup manifest is enabled by default, so I have disabled it for parallel
backup mode and added a warning so that the user is aware of this and does
not expect a manifest in the backup.


On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:

>
>
> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
> kashif.zeeshan@enterprisedb.com> wrote:
>
>> Hi Asif
>>
>> When a non-existent slot is used with tablespace then correct error is
>> displayed but then the backup folder is not cleaned and leaves a corrupt
>> backup.
>>
>> Steps
>> =======
>>
>> edb@localhost bin]$
>> [edb@localhost bin]$ mkdir /home/edb/tbl1
>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
>> [edb@localhost bin]$
>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
>> CREATE TABLESPACE
>> postgres=#
>> postgres=# create table t1 (a int) tablespace tbl1;
>> CREATE TABLE
>> postgres=# insert into t1 values(100);
>> INSERT 0 1
>> postgres=# insert into t1 values(200);
>> INSERT 0 1
>> postgres=# insert into t1 values(300);
>> INSERT 0 1
>> postgres=#
>>
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: error: could not send replication command
>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: write-ahead log end point: 0/2E000100
>> pg_basebackup: waiting for background process to finish streaming ...
>> pg_basebackup: error: child thread exited with error 1
>> [edb@localhost bin]$
>>
>> backup folder not cleaned
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
>>             postgresql.conf
>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
>>  postgresql.auto.conf
>> [edb@localhost bin]$
>>
>>
>>
>>
>> If the same case is executed without the parallel backup patch then the
>> backup folder is cleaned after the error is displayed.
>>
>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T
>> /home/edb/tbl1=/home/edb/tbl_res -S test999
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: error: could not send replication command
>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
>> pg_basebackup: write-ahead log end point: 0/2B000100
>> pg_basebackup: waiting for background process to finish streaming ...
>> pg_basebackup: error: child process exited with exit code 1
>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>> pg_basebackup: changes to tablespace directories will not be undone
>>
>
>
> Hi Asif
>
> A similar case is when DB Server is shut down while the Parallel Backup is
> in progress then the correct error is displayed but then the backup folder
> is not cleaned and leaves a corrupt backup. I think one bug fix will solve
> all these cases where clean up is not done when parallel backup is failed.
>
> [edb@localhost bin]$
> [edb@localhost bin]$
> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j
> 8
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> pg_basebackup: backup worker (4) created
> pg_basebackup: backup worker (5) created
> pg_basebackup: backup worker (6) created
> pg_basebackup: backup worker (7) created
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> [edb@localhost bin]$
> [edb@localhost bin]$
>
> Same case when executed on pg_basebackup without the Parallel backup patch
> then proper clean up is done.
>
> [edb@localhost bin]$
> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: removing contents of data directory
> "/home/edb/Desktop/backup/"
> [edb@localhost bin]$
>
> Thanks
>
>
>>
>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
>>> wrote:
>>>
>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
>>>> wrote:
>>>> >> Why would you need to do that? As long as the process where
>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>> >
>>>> > Yes, but the user will get the error only after the STOP_BACKUP, not
>>>> while the backup is
>>>> > in progress. So if the backup is a large one, early error detection
>>>> would be much beneficial.
>>>> > This is the current behavior of non-parallel backup as well.
>>>>
>>>> Because non-parallel backup does not feature early detection of this
>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>> thread in a separate patch. A patch proposing to make parallel backup
>>>> inconsistent in behavior with non-parallel backup will be rejected, at
>>>> least if I have anything to say about it.
>>>>
>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>> current situation is not great, but promotions ought to be relatively
>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>> also worth considering whether the right fix is to figure out how to
>>>> make that case actually work, rather than just making it fail quicker.
>>>> I don't currently understand the reason for the prohibition so I can't
>>>> express an intelligent opinion on what the right answer is here, but
>>>> it seems like it ought to be investigated before somebody goes and
>>>> builds a bunch of infrastructure to make the error more timely.
>>>>
>>>
>>> Non-parallel backup already does the early error checking. I only
>>> intended to make parallel behave the same as non-parallel here. So, I
>>> agree with you that the behavior of parallel backup should be consistent
>>> with the non-parallel one.  Please see the code snippet below from
>>> basebackup.c:sendDir()
>>>
>>>
>>> /*
>>>  * Check if the postmaster has signaled us to exit, and abort with an
>>>  * error in that case. The error handler further up will call
>>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>>  * started while still in recovery, the server wasn't promoted.
>>>  * do_pg_stop_backup() will check that too, but it's better to stop
>>>  * the backup early than continue to the end and fail there.
>>>  */
>>> CHECK_FOR_INTERRUPTS();
>>> if (RecoveryInProgress() != backup_started_in_recovery)
>>>     ereport(ERROR,
>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>              errmsg("the standby was promoted during online backup"),
>>>              errhint("This means that the backup being taken is corrupt "
>>>                      "and should not be used. "
>>>                      "Try taking another online backup.")));
>>>>
>>>>
>>>> > Okay, then I will add the shared state. And since we are adding the
>>>> shared state, we can use
>>>> > that for throttling, progress-reporting and standby early error
>>>> checking.
>>>>
>>>> Please propose a grammar here for all the new replication commands you
>>>> plan to add before going and implement everything. That will make it
>>>> easier to hash out the design without forcing you to keep changing the
>>>> code. Your design should include a sketch of how several sets of
>>>> coordinating backends taking several concurrent parallel backups will
>>>> end up with one shared state per parallel backup.
>>>>
>>>> > There are two possible options:
>>>> >
>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>> >
>>>> > This BackupID should be given back as a response to start backup
>>>> command. All client workers
>>>> > must append this ID to all parallel backup replication commands. So
>>>> that we can use this identifier
>>>> > to search for that particular backup. Does that sound good?
>>>>
>>>> Using the WAL start location as the backup ID seems like it might be
>>>> problematic -- could a single checkpoint not end up as the start
>>>> location for multiple backups started at the same time? Whether that's
>>>> possible now or not, it seems unwise to hard-wire that assumption into
>>>> the wire protocol.
>>>>
>>>> I was thinking that perhaps the client should generate a unique backup
>>>> ID, e.g. leader does:
>>>>
>>>> START_BACKUP unique_backup_id [options]...
>>>>
>>>> And then others do:
>>>>
>>>> JOIN_BACKUP unique_backup_id
>>>>
>>>> My thought is that you will have a number of shared memory structure
>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>> state for one backup. The shared state will include
>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>> JOIN_BACKUP would search for one by name.
>>>>
>>>> If you want to generate the name on the server side, then I suppose
>>>> START_BACKUP would return a result set that includes the backup ID,
>>>> and clients would have to specify that same backup ID when invoking
>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>>>> better. Either way, the backup ID should be something long and hard to
>>>> guess, not e.g. the leader processes' PID. I think we should generate
>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>>>> result to get a string. That way there's almost no risk of two backup
>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>> user trying to screw up somebody else's parallel backup by choosing a
>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>> user with enough access to do that sort of thing can probably cause a
>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>> intentional collisions robustly here, so I think we should.
>>>>
>>>>
>>> Okay, so if we add another replication command ‘JOIN_BACKUP
>>> unique_backup_id’ to let workers find the relevant shared state, there
>>> won't be any need to change the grammar of any other command.
>>> START_BACKUP can return the unique_backup_id in the result set.
>>>
>>> I am thinking of the following struct for shared state:
>>>
>>> typedef struct
>>> {
>>>     char        backupid[NAMEDATALEN];
>>>     XLogRecPtr  startptr;
>>>     slock_t     lock;
>>>     int64       throttling_counter;
>>>     bool        backup_started_in_recovery;
>>> } BackupSharedState;
>>>
>>> The shared state structure entries would be maintained by a shared hash
>>> table. There will be one structure per parallel backup. Since a single
>>> parallel backup can engage more than one wal sender, I think
>>> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
>>> since there will be at least 2 connections per parallel backup?
>>> Alternatively, we can set a new GUC that defines the maximum number of
>>> concurrent parallel backups, i.e. ‘max_concurent_backups_allowed = 10’
>>> perhaps, or we can make it user-configurable.
>>>
>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
>>>
>>> Checking for Standby Promotion:
>>> At the START_BACKUP command, we initialize
>>> BackupSharedState.backup_started_in_recovery and keep checking it
>>> whenever send_file() is called to send a new file.
>>>
>>> Throttling:
>>> BackupSharedState.throttling_counter - The throttling logic remains the
>>> same as for non-parallel backup, with the exception that multiple threads
>>> will now be updating it. So in parallel backup, this will represent the
>>> overall bytes that have been transferred, and the workers would sleep if
>>> they have exceeded the limit. Hence, the shared state carries a lock to
>>> update the throttling value safely.
>>>
>>> Progress Reporting:
>>> I think we should add progress-reporting for parallel backup as a
>>> separate patch. The relevant entries for progress-reporting, such as
>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
>>> structure as well.
>>>
>>>
>>> Grammar:
>>> There is a change in the result set returned by the START_BACKUP command:
>>> unique_backup_id is added. Additionally, the JOIN_BACKUP replication
>>> command is added. SEND_FILES has been renamed to SEND_FILE. There are no
>>> other changes to the grammar.
>>>
>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>   - returns startptr, tli, backup_label, unique_backup_id
>>> STOP_BACKUP [NOWAIT]
>>>   - returns startptr, tli, backup_label
>>> JOIN_BACKUP ‘unique_backup_id’
>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
>>> backend process.
>>>
>>> LIST_TABLESPACES [PROGRESS]
>>> LIST_FILES [TABLESPACE]
>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>
>>>
>>> --
>>> Asif Rehman
>>> Highgo Software (Canada/China/Pakistan)
>>> URL : www.highgo.ca
>>>
>>>
>>
>> --
>> Regards
>> ====================================
>> Kashif Zeeshan
>> Lead Quality Assurance Engineer / Manager
>>
>> EnterpriseDB Corporation
>> The Enterprise Postgres Company
>>
>>
>
> --
> Regards
> ====================================
> Kashif Zeeshan
> Lead Quality Assurance Engineer / Manager
>
> EnterpriseDB Corporation
> The Enterprise Postgres Company
>
>

-- 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2020-04-07T17:02:55Z

On Tue, Apr 7, 2020 at 10:14 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

> Hi,
>
> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>
> I have added the shared state as previously described. The new grammar
> changes are as follows:
>
> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>     - This will generate a unique backupid using pg_strong_random(16) and
>       hex-encode it, which is then returned as the result set.
>     - It will also create a shared state and add it to the hashtable. The
>       hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable
>       can expand dynamically, I think it's a sufficient initial size.
>       max_wal_senders is not used, because it can be set to quite a large
>       value.
>
> JOIN_BACKUP 'backup_id'
>     - finds 'backup_id' in hashtable and attaches it to server process.
>
>
> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>     - renamed SEND_FILES to SEND_FILE
>     - removed START_WAL_LOCATION from this because 'startptr' is now
>       accessible through shared state.
>
> There is no change in other commands:
> STOP_BACKUP [NOWAIT]
> LIST_TABLESPACES [PROGRESS]
> LIST_FILES [TABLESPACE]
> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>
> The current patches (v11) have been rebased to the latest master. The
> backup manifest is enabled by default, so I have disabled it for parallel
> backup mode and have generated a warning so that the user is aware of it
> and does not expect it in the backup.
>

So, are you working on making it work? I don't think a parallel backup
feature should be creating a backup with no manifest.


>
>
>
> --
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>

-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-07T17:25:05Z

On Tue, Apr 7, 2020 at 10:03 PM Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

>
>
> On Tue, Apr 7, 2020 at 10:14 PM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
>
>> Hi,
>>
>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>>
>> I have added the shared state as previously described. The new grammar
>> changes are as follows:
>>
>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>>     - This will generate a unique backupid using pg_strong_random(16) and
>>       hex-encode it, which is then returned as the result set.
>>     - It will also create a shared state and add it to the hashtable. The
>>       hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable
>>       can expand dynamically, I think it's a sufficient initial size.
>>       max_wal_senders is not used, because it can be set to quite a large
>>       value.
>>
>> JOIN_BACKUP 'backup_id'
>>     - finds 'backup_id' in hashtable and attaches it to server process.
>>
>>
>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>>     - renamed SEND_FILES to SEND_FILE
>>     - removed START_WAL_LOCATION from this because 'startptr' is now
>>       accessible through shared state.
>>
>> There is no change in other commands:
>> STOP_BACKUP [NOWAIT]
>> LIST_TABLESPACES [PROGRESS]
>> LIST_FILES [TABLESPACE]
>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>
>> The current patches (v11) have been rebased to the latest master. The
>> backup manifest is enabled by default, so I have disabled it for parallel
>> backup mode and have generated a warning so that the user is aware of it
>> and does not expect it in the backup.
>>
>
> So, are you working on making it work? I don't think a parallel backup
> feature should be creating a backup with no manifest.
>

I will; however, parallel backup is already quite a large patch, so I think
we should first agree on the current work before adding backup manifest and
progress-reporting support.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-07T17:36:21Z

On Fri, Apr 3, 2020 at 4:46 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Non-parallel backup already does the early error checking. I only intended
> to make parallel behave the same as non-parallel here. So, I agree with
> you that the behavior of parallel backup should be consistent with the
> non-parallel one.  Please see the code snippet below from
> basebackup.c:sendDir()

Oh, OK. So then we need to preserve that behavior, I think. Sorry, I
didn't realize the check was happening there.

> I am thinking of the following struct for shared state:
>> typedef struct
>> {
>> char backupid[NAMEDATALEN];
>> XLogRecPtr startptr;
>> slock_t lock;
>> int64 throttling_counter;
>> bool backup_started_in_recovery;
>> } BackupSharedState;

Looks broadly reasonable. Can anything other than lock and
throttling_counter change while it's running? If not, how about using
pg_atomic_uint64 for the throttling counter, and dropping lock? If
that gets too complicated it's OK to keep it as you have it.
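
The lock-free variant suggested above can be sketched as follows. This is
only an illustration: it uses C11 <stdatomic.h> as a stand-in for
PostgreSQL's pg_atomic_uint64, and the struct name, the uint64_t stand-in
for XLogRecPtr, and the helper throttle_add() are assumptions, not code
from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed shared state, without the lock. */
typedef struct
{
    char             backupid[64];
    uint64_t         startptr;            /* stand-in for XLogRecPtr */
    _Atomic uint64_t throttling_counter;  /* bytes sent by all workers */
    bool             backup_started_in_recovery;
} BackupSharedStateSketch;

/*
 * Add nbytes to the shared counter with a single atomic operation and
 * report whether the running total has reached the limit, i.e. whether
 * the calling worker should sleep. Resetting the counter is left out.
 */
static bool
throttle_add(BackupSharedStateSketch *state, uint64_t nbytes, uint64_t limit)
{
    uint64_t    before;

    assert(state != NULL);
    before = atomic_fetch_add(&state->throttling_counter, nbytes);
    return before + nbytes >= limit;
}
```

Because every other field is written once at START_BACKUP and only read
afterwards, the counter is the only member that needs synchronization in
this sketch.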

> The shared state structure entries would be maintained by a shared hash table.
> There will be one structure per parallel backup. Since a single parallel backup
> can engage more than one wal sender, so I think max_wal_senders might be a little
> too much; perhaps max_wal_senders/2 since there will be at least 2 connections
> per parallel backup? Alternatively, we can set a new GUC that defines the maximum
> number of concurrent parallel backups, i.e. ‘max_concurent_backups_allowed = 10’
> perhaps, or we can make it user-configurable.

I don't think you need a hash table. Linear search should be fine. And
I see no point in dividing max_wal_senders by 2 either. The default is
*10*. You'd need to increase that by more than an order of magnitude
for a hash table to be needed, and more than that for the shared
memory consumption to matter.
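
A minimal sketch of the linear-search approach, assuming a fixed array
with one slot per possible backup. The names BackupSlot,
backup_slot_allocate() and backup_slot_find(), the slot count, and the ID
buffer size are illustrative assumptions; locking, which real shared
memory would need, is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_BACKUP_SLOTS 10     /* assumed; the thread suggests max_wal_senders */
#define BACKUP_ID_LEN    64     /* stand-in for NAMEDATALEN */

typedef struct
{
    bool        in_use;
    char        backupid[BACKUP_ID_LEN];
} BackupSlot;

static BackupSlot slots[MAX_BACKUP_SLOTS];

/* START_BACKUP side: claim the first free slot and copy the ID into it. */
static BackupSlot *
backup_slot_allocate(const char *backupid)
{
    assert(backupid != NULL);
    if (strlen(backupid) >= BACKUP_ID_LEN)
        return NULL;            /* ID does not fit */
    for (size_t i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            strcpy(slots[i].backupid, backupid);
            return &slots[i];
        }
    }
    return NULL;                /* all slots taken */
}

/* JOIN_BACKUP side: scan every slot for a matching ID. */
static BackupSlot *
backup_slot_find(const char *backupid)
{
    assert(backupid != NULL);
    for (size_t i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (slots[i].in_use && strcmp(slots[i].backupid, backupid) == 0)
            return &slots[i];
    }
    return NULL;
}
```

With about ten slots, each lookup is at most ten string comparisons.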

> The key would be “backupid=hex_encode(pg_strong_random(16))”

wfm
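
For illustration, the hex-encoding step could look like the sketch below.
hex_encode_bytes() is a hypothetical helper, not the server's own encoding
routine, and the random bytes are assumed to come from elsewhere (in the
server, pg_strong_random). Sixteen bytes encode to 32 characters plus a
terminator, which fits in a NAMEDATALEN-sized buffer at its default of 64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the lowercase hex form of src[0..len-1] into dst, followed by a
 * terminating NUL. dst must have room for 2 * len + 1 characters.
 */
static void
hex_encode_bytes(const uint8_t *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];        /* high nibble */
        dst[2 * i + 1] = digits[src[i] & 0x0f];  /* low nibble */
    }
    dst[2 * len] = '\0';
}
```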

> Progress Reporting:
> Although I think we should add progress-reporting for parallel backup as a
> separate patch. The relevant entries for progress-reporting such as
> ‘backup_total’ and ‘backup_streamed’ would be then added to this structure
> as well.

I mean, you can separate it for review if you wish, but it would need
to be committed together.

> START_BACKUP [LABEL '<label>'] [FAST]
>   - returns startptr, tli, backup_label, unique_backup_id

OK. But what if I want to use this interface for a non-parallel backup?

> STOP_BACKUP [NOWAIT]
>   - returns startptr, tli, backup_label

I don't think it makes sense for STOP_BACKUP to return the same values
that START_BACKUP already returned. Presumably STOP_BACKUP should
return the end LSN. It could also return the backup label and
tablespace map files, as the corresponding SQL function does, unless
there's some better way of returning those in this case.

> JOIN_BACKUP ‘unique_backup_id’
>   - attaches a shared state identified by ‘unique_backup_id’ to a backend process.

OK.

> LIST_TABLESPACES [PROGRESS]

OK.

> LIST_FILES [TABLESPACE]

OK.

> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

Why not just LIST_WAL_FILES 'startptr' 'endptr'?

> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]

Why parens? That seems useless.

Maybe it would make sense to have SEND_DATA_FILE 'datafilename' and
SEND_WAL_FILE 'walfilename' as separate commands. But not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-07T17:37:30Z

On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I will, however parallel backup is already quite a large patch. So I think we should first
> agree on the current work before adding a backup manifest and progress-reporting support.

It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it.

I'm gonna try to find some time for that as soon as I can.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-08T05:48:31Z

Hi Asif,

Thanks for new patches.

The patches need to be rebased on head; I am getting a failure while applying
the 0003 patch.
[edb@localhost postgresql]$ git apply v11/0003-Parallel-Backup-Backend-Replication-commands.patch
error: patch failed: src/backend/storage/ipc/ipci.c:147
error: src/backend/storage/ipc/ipci.c: patch does not apply

I have applied v11 patches on commit -
23ba3b5ee278847e4fad913b80950edb2838fd35 to test further.

pg_basebackup has a new option, "--no-estimate-size", and pg_basebackup
crashes when this option is used.

[edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2
Segmentation fault (core dumped)

--stacktrace
[edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
Loaded symbols for /lib64/libselinux.so.1
Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size
--jobs=2'.
Program terminated with signal 11, Segmentation fault.
#0  ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value
optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
298  while (ISSPACE (*s))
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value
optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
#1  0x0000003921233b30 in atoi (nptr=<value optimized out>) at atoi.c:28
#2  0x000000000040841e in main (argc=5, argv=0x7ffeaa6fb968) at
pg_basebackup.c:2526
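
The backtrace shows atoi() being called with a NULL pointer from main().
One plausible cause, which is an assumption since the option-parsing code
is not shown here, is that an option taking no argument falls into a case
that reads optarg. A defensive parse can be sketched as below;
parse_positive_int() is a hypothetical helper, not part of pg_basebackup.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse arg as a positive decimal integer. Returns false, leaving *result
 * untouched, when arg is NULL, empty, not fully numeric, or out of range,
 * instead of dereferencing a NULL pointer the way atoi(NULL) does.
 */
static bool
parse_positive_int(const char *arg, int *result)
{
    char       *end;
    long        value;

    assert(result != NULL);
    if (arg == NULL || *arg == '\0')
        return false;

    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > INT_MAX)
        return false;

    *result = (int) value;
    return true;
}
```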

Thanks & Regards,
Rajkumar Raghuwanshi



Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-08T07:09:20Z

Rebased and updated to current master (d025cf88ba). v12 is attached.

Also, changed the grammar for LIST_WAL_FILES and SEND_FILE to:

- LIST_WAL_FILES 'startptr' 'endptr'
- SEND_FILE 'FILE'  [NOVERIFY_CHECKSUMS]



-- 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-08T13:53:14Z

On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

> Hi,
>
> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>
> I have added the shared state as previously described. The new grammar
> changes are as follows:
>
> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>     - This will generate a unique backupid using pg_strong_random(16) and
>       hex-encode it, which is then returned as the result set.
>     - It will also create a shared state and add it to the hashtable. The
>       hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable
>       can expand dynamically, I think it's a sufficient initial size.
>       max_wal_senders is not used, because it can be set to quite a large
>       value.
>
> JOIN_BACKUP 'backup_id'
>     - finds 'backup_id' in hashtable and attaches it to server process.
>
>
> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>     - renamed SEND_FILES to SEND_FILE
>     - removed START_WAL_LOCATION from this because 'startptr' is now
>       accessible through shared state.
>
> There is no change in other commands:
> STOP_BACKUP [NOWAIT]
> LIST_TABLESPACES [PROGRESS]
> LIST_FILES [TABLESPACE]
> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>
> The current patches (v11) have been rebased to the latest master. The
> backup manifest is enabled by default, so I have disabled it for parallel
> backup mode and have generated a warning so that the user is aware of it
> and does not expect it in the backup.
>
Hi Asif,

I have verified the bug fixes. One bug is fixed and now works as expected.

While verifying the other bug fixes I ran into the following issues; please
have a look.


1) The bug fixes mentioned below now result in a segmentation fault.

For reference, I have included only a description of each bug, since the
steps to reproduce were given in the previous emails. A backtrace is also
included for each case; both point to the same bug.

a) The backup failed with errors "error: could not connect to server: could
not look up local user ID 1000: Too many open files" when the
max_wal_senders was set to 2000.


[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
 /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local
user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$


[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup
/tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from
/home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D
/home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47  if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
listp=0x7f22272f86c8 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true)
at exit.c:77
#4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at
pg_basebackup.c:2811
#6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
pg_basebackup.c:2765
(gdb)




b) When two backups were executed at the same time, a FATAL error was raised
due to max_wal_senders, but instead of exiting, the backup ran to completion.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 8 -D  /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL:  number of
requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup
/tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from
/home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D
/home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47  if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
listp=0x7f051f1436c8 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true)
at exit.c:77
#4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at
pg_basebackup.c:2811
#6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
pg_basebackup.c:2765
(gdb)
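
Both backtraces end in pthread_join() with threadid=0, called from
cleanup_workers() in the exit handler after worker creation failed. That
is consistent with joining workers that were never started. One way to
guard against this, sketched with hypothetical names (WorkerSlot,
start_workers(), cleanup_workers_sketch()) rather than the patch's actual
code, is to record which threads were created and join only those.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 8

typedef struct
{
    pthread_t   thread;
    bool        started;    /* set only after pthread_create() succeeds */
} WorkerSlot;

static WorkerSlot workers[MAX_WORKERS];

static void *
worker_main(void *arg)
{
    (void) arg;             /* a real worker would copy files here */
    return NULL;
}

/* Start up to count workers; stop at the first failure. Returns how many started. */
static int
start_workers(int count)
{
    int         started = 0;

    assert(count >= 0 && count <= MAX_WORKERS);
    for (int i = 0; i < count; i++)
    {
        if (pthread_create(&workers[i].thread, NULL, worker_main, NULL) != 0)
            break;
        workers[i].started = true;
        started++;
    }
    return started;
}

/* Join only the workers that were started; safe to call more than once. */
static int
cleanup_workers_sketch(void)
{
    int         joined = 0;

    for (int i = 0; i < MAX_WORKERS; i++)
    {
        if (!workers[i].started)
            continue;
        pthread_join(workers[i].thread, NULL);
        workers[i].started = false;
        joined++;
    }
    return joined;
}
```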




2) The following bug is not fixed yet

When the DB server is shut down while a parallel backup is in progress, the
correct error is displayed, but the backup folder is not cleaned up, which
leaves a corrupt backup behind.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory
"/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$



[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base         pg_hba.conf    pg_logical    pg_notify    pg_serial
pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
 pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$




Thanks
Kashif Zeeshan

>
>
> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
> kashif.zeeshan@enterprisedb.com> wrote:
>
>>
>>
>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
>> kashif.zeeshan@enterprisedb.com> wrote:
>>
>>> Hi Asif
>>>
>>> When a non-existent slot is used together with a tablespace mapping, the
>>> correct error is displayed, but the backup folder is not cleaned up, which
>>> leaves a corrupt backup behind.
>>>
>>> Steps
>>> =======
>>>
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
>>> [edb@localhost bin]$
>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
>>> CREATE TABLESPACE
>>> postgres=#
>>> postgres=# create table t1 (a int) tablespace tbl1;
>>> CREATE TABLE
>>> postgres=# insert into t1 values(100);
>>> INSERT 0 1
>>> postgres=# insert into t1 values(200);
>>> INSERT 0 1
>>> postgres=# insert into t1 values(300);
>>> INSERT 0 1
>>> postgres=#
>>>
>>>
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: error: could not send replication command
>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
>>> pg_basebackup: backup worker (0) created
>>> pg_basebackup: backup worker (1) created
>>> pg_basebackup: write-ahead log end point: 0/2E000100
>>> pg_basebackup: waiting for background process to finish streaming ...
>>> pg_basebackup: error: child thread exited with error 1
>>> [edb@localhost bin]$
>>>
>>> backup folder not cleaned
>>>
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
>>>             postgresql.conf
>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
>>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
>>>  postgresql.auto.conf
>>> [edb@localhost bin]$
>>>
>>>
>>>
>>>
>>> If the same case is executed without the parallel backup patch then the
>>> backup folder is cleaned after the error is displayed.
>>>
>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>> -T /home/edb/tbl1=/home/edb/tbl_res -S test999
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: error: could not send replication command
>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
>>> pg_basebackup: write-ahead log end point: 0/2B000100
>>> pg_basebackup: waiting for background process to finish streaming ...
>>> pg_basebackup: error: child process exited with exit code 1
>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>>> pg_basebackup: changes to tablespace directories will not be undone
>>>
>>
>>
>> Hi Asif
>>
>> A similar case: when the DB server is shut down while a parallel backup
>> is in progress, the correct error is displayed, but the backup folder is
>> not cleaned up, which leaves a corrupt backup behind. I think one bug fix
>> will solve all these cases where cleanup is not done when a parallel
>> backup fails.
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>> -j 8
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> pg_basebackup: backup worker (4) created
>> pg_basebackup: backup worker (5) created
>> pg_basebackup: backup worker (6) created
>> pg_basebackup: backup worker (7) created
>> pg_basebackup: error: could not read COPY data: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> pg_basebackup: error: could not read COPY data: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>>
>> When the same case is executed with pg_basebackup without the parallel
>> backup patch, proper cleanup is done.
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
>> pg_basebackup: error: could not read COPY data: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> pg_basebackup: removing contents of data directory
>> "/home/edb/Desktop/backup/"
>> [edb@localhost bin]$
>>
>> Thanks
>>
>>
>>>
>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
>>>>> wrote:
>>>>> >> Why would you need to do that? As long as the process where
>>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>>> >
>>>>> > Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
>>>>> > in progress. So if the backup is a large one, early error detection would be very beneficial.
>>>>> > This is the current behavior of non-parallel backup as well.
>>>>>
>>>>> Because non-parallel backup does not feature early detection of this
>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>>> thread in a separate patch. A patch proposing to make parallel backup
>>>>> inconsistent in behavior with non-parallel backup will be rejected, at
>>>>> least if I have anything to say about it.
>>>>>
>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>>> current situation is not great, but promotions ought to be relatively
>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>>> also worth considering whether the right fix is to figure out how to
>>>>> make that case actually work, rather than just making it fail quicker.
>>>>> I don't currently understand the reason for the prohibition so I can't
>>>>> express an intelligent opinion on what the right answer is here, but
>>>>> it seems like it ought to be investigated before somebody goes and
>>>>> builds a bunch of infrastructure to make the error more timely.
>>>>>
>>>>
>>>> Non-parallel backup already does the early error checking. I only
>>>> intended to make parallel behave the same as non-parallel here. So, I
>>>> agree with you that the behavior of parallel backup should be consistent
>>>> with the non-parallel one. Please see the code snippet below from
>>>> basebackup.c:sendDir()
>>>>
>>>>
>>>>> /*
>>>>>  * Check if the postmaster has signaled us to exit, and abort with an
>>>>>  * error in that case. The error handler further up will call
>>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>>>>  * started while still in recovery, the server wasn't promoted.
>>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
>>>>>  * the backup early than continue to the end and fail there.
>>>>>  */
>>>>> CHECK_FOR_INTERRUPTS();
>>>>> if (RecoveryInProgress() != backup_started_in_recovery)
>>>>>     ereport(ERROR,
>>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>              errmsg("the standby was promoted during online backup"),
>>>>>              errhint("This means that the backup being taken is corrupt "
>>>>>                      "and should not be used. "
>>>>>                      "Try taking another online backup.")));
>>>>>
>>>>> > Okay, then I will add the shared state. And since we are adding the shared state, we can use
>>>>> > that for throttling, progress-reporting and standby early error checking.
>>>>>
>>>>> Please propose a grammar here for all the new replication commands you
>>>>> plan to add before going and implement everything. That will make it
>>>>> easier to hash out the design without forcing you to keep changing the
>>>>> code. Your design should include a sketch of how several sets of
>>>>> coordinating backends taking several concurrent parallel backups will
>>>>> end up with one shared state per parallel backup.
>>>>>
>>>>> > There are two possible options:
>>>>> >
>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>>> >
>>>>> > This BackupID should be given back as a response to start backup command. All client workers
>>>>> > must append this ID to all parallel backup replication commands. So that we can use this identifier
>>>>> > to search for that particular backup. Does that sound good?
>>>>>
>>>>> Using the WAL start location as the backup ID seems like it might be
>>>>> problematic -- could a single checkpoint not end up as the start
>>>>> location for multiple backups started at the same time? Whether that's
>>>>> possible now or not, it seems unwise to hard-wire that assumption into
>>>>> the wire protocol.
>>>>>
>>>>> I was thinking that perhaps the client should generate a unique backup
>>>>> ID, e.g. leader does:
>>>>>
>>>>> START_BACKUP unique_backup_id [options]...
>>>>>
>>>>> And then others do:
>>>>>
>>>>> JOIN_BACKUP unique_backup_id
>>>>>
>>>>> My thought is that you will have a number of shared memory structures
>>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>>> state for one backup. The shared state will include
>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>>> JOIN_BACKUP would search for one by name.
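
The allocate-then-look-up-by-name scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: BackupSlot, backup_slots, MAX_BACKUP_SLOTS (standing in for max_wal_senders), BACKUP_ID_LEN (standing in for NAMEDATALEN) and the three helper functions are hypothetical names. The array is a plain static rather than real shared memory, and the locking a real shared structure would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_BACKUP_SLOTS 4		/* stand-in for max_wal_senders */
#define BACKUP_ID_LEN    64		/* stand-in for NAMEDATALEN */

/* Hypothetical per-backup shared state, reduced to the backup ID. */
typedef struct
{
	bool		in_use;
	char		backupid[BACKUP_ID_LEN];
} BackupSlot;

/* Plain static array; the real thing would live in shared memory. */
static BackupSlot backup_slots[MAX_BACKUP_SLOTS];

/* JOIN_BACKUP: search the slots for a backup with this ID. */
static BackupSlot *
join_backup(const char *backupid)
{
	assert(backupid != NULL);

	for (int i = 0; i < MAX_BACKUP_SLOTS; i++)
	{
		if (backup_slots[i].in_use &&
			strcmp(backup_slots[i].backupid, backupid) == 0)
			return &backup_slots[i];
	}
	return NULL;
}

/*
 * START_BACKUP: allocate a free slot and copy the ID into it. Returns NULL
 * if the ID is too long, is already registered, or no slot is free.
 */
static BackupSlot *
start_backup(const char *backupid)
{
	assert(backupid != NULL);

	if (strlen(backupid) >= BACKUP_ID_LEN || join_backup(backupid) != NULL)
		return NULL;

	for (int i = 0; i < MAX_BACKUP_SLOTS; i++)
	{
		if (!backup_slots[i].in_use)
		{
			strcpy(backup_slots[i].backupid, backupid);
			backup_slots[i].in_use = true;
			return &backup_slots[i];
		}
	}
	return NULL;
}

/* Release a slot once the backup has finished or been aborted. */
static void
stop_backup(BackupSlot *slot)
{
	assert(slot != NULL);
	slot->in_use = false;
	slot->backupid[0] = '\0';
}
```

Because every concurrent parallel backup owns exactly one slot, several leaders can run at once and each group of workers finds its own state by ID.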
>>>>>
>>>>> If you want to generate the name on the server side, then I suppose
>>>>> START_BACKUP would return a result set that includes the backup ID,
>>>>> and clients would have to specify that same backup ID when invoking
>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>>>>> better. Either way, the backup ID should be something long and hard to
>>>>> guess, not e.g. the leader process's PID. I think we should generate
>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>>>>> result to get a string. That way there's almost no risk of two backup
>>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>>> user trying to screw up somebody else's parallel backup by choosing a
>>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>>> user with enough access to do that sort of thing can probably cause a
>>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>>> intentional collisions robustly here, so I think we should.
>>>>>
>>>>>
>>>> Okay, so if we add another replication command ‘JOIN_BACKUP
>>>> unique_backup_id’ to let workers find the relevant shared state, there
>>>> won't be any need to change the grammar for any other command.
>>>> START_BACKUP can return the unique_backup_id in the result set.
>>>>
>>>> I am thinking of the following struct for shared state:
>>>>
>>>>> typedef struct
>>>>> {
>>>>>     char backupid[NAMEDATALEN];
>>>>>     XLogRecPtr startptr;
>>>>>
>>>>>     slock_t lock;
>>>>>     int64 throttling_counter;
>>>>>     bool backup_started_in_recovery;
>>>>> } BackupSharedState;
>>>>>
>>>>>
>>>> The shared state structure entries would be maintained by a shared hash
>>>> table. There will be one structure per parallel backup. Since a single
>>>> parallel backup can engage more than one wal sender, I think
>>>> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
>>>> since there will be at least 2 connections per parallel backup?
>>>> Alternatively, we can set a new GUC that defines the maximum number of
>>>> concurrent parallel backups, e.g. ‘max_concurent_backups_allowed = 10’
>>>> perhaps, or we can make it user-configurable.
>>>>
>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
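
As a rough illustration of that key, the sketch below hex-encodes 16 random bytes into a 32-character ID. It is an assumption-laden stand-in rather than server code: read_random_bytes() reads /dev/urandom in place of pg_strong_random(), and hex_encode_bytes(), generate_backup_id() and the two macros are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BACKUP_ID_BYTES  16
#define BACKUP_ID_HEXLEN (BACKUP_ID_BYTES * 2)

/* Hex-encode len bytes from src into dst; dst must hold 2 * len + 1 bytes. */
static void
hex_encode_bytes(const unsigned char *src, size_t len, char *dst)
{
	static const char digits[] = "0123456789abcdef";

	assert(src != NULL && dst != NULL);

	for (size_t i = 0; i < len; i++)
	{
		dst[2 * i] = digits[src[i] >> 4];
		dst[2 * i + 1] = digits[src[i] & 0x0F];
	}
	dst[2 * len] = '\0';
}

/*
 * Stand-in for pg_strong_random(): fill buf from /dev/urandom.
 * Returns 1 on success, 0 on failure.
 */
static int
read_random_bytes(unsigned char *buf, size_t len)
{
	FILE	   *f = fopen("/dev/urandom", "rb");
	size_t		nread;

	if (f == NULL)
		return 0;
	nread = fread(buf, 1, len, f);
	fclose(f);
	return nread == len;
}

/*
 * Build a backup ID: 16 random bytes, hex-encoded into 32 characters.
 * dst must hold BACKUP_ID_HEXLEN + 1 bytes. Returns 1 on success.
 */
static int
generate_backup_id(char *dst)
{
	unsigned char raw[BACKUP_ID_BYTES];

	if (!read_random_bytes(raw, sizeof(raw)))
		return 0;
	hex_encode_bytes(raw, sizeof(raw), dst);
	assert(strlen(dst) == BACKUP_ID_HEXLEN);
	return 1;
}
```

A 32-character hex string fits comfortably in a NAMEDATALEN-sized buffer, which is what the proposed backupid field uses.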
>>>>
>>>> Checking for Standby Promotion:
>>>> At the START_BACKUP command, we initialize
>>>> BackupSharedState.backup_started_in_recovery
>>>> and keep checking it whenever send_file () is called to send a new file.
>>>>
>>>> Throttling:
>>>> BackupSharedState.throttling_counter - The throttling logic remains the
>>>> same
>>>> as for non-parallel backup with the exception that multiple threads
>>>> will now be
>>>> updating it. So in parallel backup, this will represent the overall
>>>> bytes that
>>>> have been transferred. So the workers would sleep if they have exceeded
>>>> the
>>>> limit. Hence, the shared state carries a lock to safely update the
>>>> throttling
>>>> value atomically.
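
A minimal sketch of the shared counter idea, with several caveats: it uses a pthread mutex as a stand-in for slock_t, it runs in one process rather than in shared memory, and it leaves out the elapsed-time bookkeeping entirely. SharedThrottle, throttling_sample, throttle_init() and throttle_account() are hypothetical names; only throttling_counter and the lock come from the struct proposed above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared throttling state; the mutex stands in for slock_t. */
typedef struct
{
	pthread_mutex_t lock;
	int64_t		throttling_counter;	/* bytes sent by all workers so far */
	int64_t		throttling_sample;	/* bytes allowed before a worker sleeps */
} SharedThrottle;

static void
throttle_init(SharedThrottle *t, int64_t sample_bytes)
{
	assert(t != NULL && sample_bytes > 0);

	pthread_mutex_init(&t->lock, NULL);
	t->throttling_counter = 0;
	t->throttling_sample = sample_bytes;
}

/*
 * Account for nbytes sent by one worker. Returns true when the combined
 * total has reached the sample size, in which case one sample is deducted
 * and the caller is expected to sleep before sending more.
 */
static bool
throttle_account(SharedThrottle *t, int64_t nbytes)
{
	bool		should_sleep = false;

	assert(t != NULL && nbytes >= 0);

	pthread_mutex_lock(&t->lock);
	t->throttling_counter += nbytes;
	if (t->throttling_counter >= t->throttling_sample)
	{
		t->throttling_counter -= t->throttling_sample;
		should_sleep = true;
	}
	pthread_mutex_unlock(&t->lock);

	return should_sleep;
}
```

The point of the lock is that the read-modify-write on the counter happens as one step, so two workers reporting bytes at the same time cannot lose an update.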
>>>>
>>>> Progress Reporting:
>>>> I think we should add progress reporting for parallel backup as a
>>>> separate patch. The relevant entries for progress reporting, such as
>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
>>>> structure as well.
>>>>
>>>>
>>>> Grammar:
>>>> There is a change in the resultset being returned for START_BACKUP
>>>> command;
>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
>>>> command is
>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
>>>> changes
>>>> to the grammar.
>>>>
>>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>>   - returns startptr, tli, backup_label, unique_backup_id
>>>> STOP_BACKUP [NOWAIT]
>>>>   - returns startptr, tli, backup_label
>>>> JOIN_BACKUP ‘unique_backup_id’
>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
>>>> backend process.
>>>>
>>>> LIST_TABLESPACES [PROGRESS]
>>>> LIST_FILES [TABLESPACE]
>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>>
>>>>
>>>> --
>>>> Asif Rehman
>>>> Highgo Software (Canada/China/Pakistan)
>>>> URL : www.highgo.ca
>>>>
>>>>
>>>
>>> --
>>> Regards
>>> ====================================
>>> Kashif Zeeshan
>>> Lead Quality Assurance Engineer / Manager
>>>
>>> EnterpriseDB Corporation
>>> The Enterprise Postgres Company
>>>
>>>
>>
>> --
>> Regards
>> ====================================
>> Kashif Zeeshan
>> Lead Quality Assurance Engineer / Manager
>>
>> EnterpriseDB Corporation
>> The Enterprise Postgres Company
>>
>>
>
> --
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>

-- 
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-14T12:33:16Z

On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:

>
>
> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>>
>> I have added the shared state as previously described. The new grammar
>> changes
>> are as follows:
>>
>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>>     - This will generate a unique backupid using pg_strong_random(16) and
>>       hex-encode it; the result is then returned in the result set.
>>     - It will also create a shared state and add it to the hash table. The
>>       hash table size is set to BACKUP_HASH_SIZE=10, but since the hash
>>       table can expand dynamically, I think that is a sufficient initial
>>       size. max_wal_senders is not used, because it can be set to quite a
>>       large value.
>>
>> JOIN_BACKUP 'backup_id'
>>     - finds 'backup_id' in hashtable and attaches it to server process.
>>
>>
>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>>     - renamed SEND_FILES to SEND_FILE
>>     - removed START_WAL_LOCATION from this because 'startptr' is now
>> accessible through
>>       shared state.
>>
>> There is no change in other commands:
>> STOP_BACKUP [NOWAIT]
>> LIST_TABLESPACES [PROGRESS]
>> LIST_FILES [TABLESPACE]
>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>
>> The current patches (v11) have been rebased to the latest master. The
>> backup manifest is enabled by default, so I have disabled it for parallel
>> backup mode and added a warning so that the user is aware of this and
>> does not expect a manifest in the backup.
>>
>> Hi Asif
>
> I have verified the bug fixes. One bug is fixed and now works as
> expected.
>
> While verifying the other bug fixes I ran into the following issues;
> please have a look.
>
>
> 1) The bug fixes mentioned below now produce a segmentation fault.
>
> Please note that for reference I have added only a description of each bug
> whose fix I tried to verify, since the steps were given in previous emails.
> A backtrace is also added for each case; both cases point to the same bug.
>
> a) The backup failed with the error "error: could not connect to server:
> could not look up local user ID 1000: Too many open files" when
> max_wal_senders was set to 2000.
>
>
> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
>  /home/edb/Desktop/backup/
> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> ….
> ….
> pg_basebackup: backup worker (1014) created
> pg_basebackup: backup worker (1015) created
> pg_basebackup: backup worker (1016) created
> pg_basebackup: backup worker (1017) created
> pg_basebackup: error: could not connect to server: could not look up local
> user ID 1000: Too many open files
> Segmentation fault
> [edb@localhost bin]$
>
>
> [edb@localhost bin]$
> [edb@localhost bin]$ gdb pg_basebackup
> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from
> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
> [New LWP 13219]
> [New LWP 13222]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `./pg_basebackup -v -j 1990 -D
> /home/edb/Desktop/backup/'.
> Program terminated with signal 11, Segmentation fault.
> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
> (gdb) bt
> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
> at exit.c:77
> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at
> pg_basebackup.c:2811
> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
> pg_basebackup.c:2765
> (gdb)
>
>
>
>
> b) When two backups were executed at the same time, a FATAL error was
> raised due to max_wal_senders, but instead of exiting, the backup completed.
>
> [edb@localhost bin]$
> [edb@localhost bin]$
> [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D
>  /home/edb/Desktop/backup1/
> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> pg_basebackup: backup worker (4) created
> pg_basebackup: backup worker (5) created
> pg_basebackup: backup worker (6) created
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> Segmentation fault (core dumped)
> [edb@localhost bin]$
> [edb@localhost bin]$
> [edb@localhost bin]$ gdb pg_basebackup
> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from
> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
> [New LWP 17041]
> [New LWP 17067]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `./pg_basebackup -v -j 8 -D
> /home/edb/Desktop/backup1/'.
> Program terminated with signal 11, Segmentation fault.
> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
> (gdb) bt
> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
> at exit.c:77
> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
> at pg_basebackup.c:2811
> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
> pg_basebackup.c:2765
> (gdb)
>
>
>
>
> 2) The following bug is not fixed yet.
>
> When the DB server is shut down while a parallel backup is in progress, the
> correct error is displayed, but the backup folder is not cleaned up, which
> leaves a corrupt backup behind.
>
> [edb@localhost bin]$
> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> pg_basebackup: backup worker (4) created
> pg_basebackup: backup worker (5) created
> pg_basebackup: backup worker (6) created
> pg_basebackup: backup worker (7) created
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: removing contents of data directory
> "/home/edb/Desktop/backup/"
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> [edb@localhost bin]$
> [edb@localhost bin]$
> [edb@localhost bin]$
>
>
>
> [edb@localhost bin]$
> [edb@localhost bin]$ ls /home/edb/Desktop/backup
> base         pg_hba.conf    pg_logical    pg_notify    pg_serial
> pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
> pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
>  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
> [edb@localhost bin]$
> [edb@localhost bin]$
>
>
>
>
> Thanks
> Kashif Zeeshan
>
>>
>>
>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
>> kashif.zeeshan@enterprisedb.com> wrote:
>>
>>>
>>>
>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>
>>>> Hi Asif
>>>>
>>>> When a non-existent slot is used together with a tablespace mapping, the
>>>> correct error is displayed, but the backup folder is not cleaned up,
>>>> which leaves a corrupt backup behind.
>>>>
>>>> Steps
>>>> =======
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
>>>> [edb@localhost bin]$
>>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
>>>> CREATE TABLESPACE
>>>> postgres=#
>>>> postgres=# create table t1 (a int) tablespace tbl1;
>>>> CREATE TABLE
>>>> postgres=# insert into t1 values(100);
>>>> INSERT 0 1
>>>> postgres=# insert into t1 values(200);
>>>> INSERT 0 1
>>>> postgres=# insert into t1 values(300);
>>>> INSERT 0 1
>>>> postgres=#
>>>>
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>> complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: error: could not send replication command
>>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
>>>> pg_basebackup: backup worker (0) created
>>>> pg_basebackup: backup worker (1) created
>>>> pg_basebackup: write-ahead log end point: 0/2E000100
>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>> pg_basebackup: error: child thread exited with error 1
>>>> [edb@localhost bin]$
>>>>
>>>> backup folder not cleaned
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
>>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
>>>>             postgresql.conf
>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
>>>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
>>>>  postgresql.auto.conf
>>>> [edb@localhost bin]$
>>>>
>>>>
>>>>
>>>>
>>>> If the same case is executed without the parallel backup patch then the
>>>> backup folder is cleaned after the error is displayed.
>>>>
>>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>>> -T /home/edb/tbl1=/home/edb/tbl_res -S test999
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>> complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: error: could not send replication command
>>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
>>>> pg_basebackup: write-ahead log end point: 0/2B000100
>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>> pg_basebackup: error: child process exited with exit code 1
>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>>>> pg_basebackup: changes to tablespace directories will not be undone
>>>>
>>>
>>>
>>> Hi Asif
>>>
>>> A similar case: when the DB server is shut down while a parallel backup
>>> is in progress, the correct error is displayed, but the backup folder is
>>> not cleaned up, which leaves a corrupt backup behind. I think one bug fix
>>> will solve all these cases where cleanup is not done when a parallel
>>> backup fails.
>>>
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>> -j 8
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
>>> pg_basebackup: backup worker (0) created
>>> pg_basebackup: backup worker (1) created
>>> pg_basebackup: backup worker (2) created
>>> pg_basebackup: backup worker (3) created
>>> pg_basebackup: backup worker (4) created
>>> pg_basebackup: backup worker (5) created
>>> pg_basebackup: backup worker (6) created
>>> pg_basebackup: backup worker (7) created
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$
>>>
>>> Same case when executed on pg_basebackup without the Parallel backup
>>> patch then proper clean up is done.
>>>
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> pg_basebackup: removing contents of data directory
>>> "/home/edb/Desktop/backup/"
>>> [edb@localhost bin]$
>>>
>>> Thanks
>>>
>>>
>>>>
>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
>>>>>> wrote:
>>>>>> >> Why would you need to do that? As long as the process where
>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>>>> >
>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
>>>>>> not while the backup is
>>>>>> > in progress. So if the backup is a large one, early error detection
>>>>>> would be much beneficial.
>>>>>> > This is the current behavior of non-parallel backup as well.
>>>>>>
>>>>>> Because non-parallel backup does not feature early detection of this
>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>>>> thread in a separate patch. A patch proposing to make parallel backup
>>>>>> inconsistent in behavior with non-parallel backup will be rejected, at
>>>>>> least if I have anything to say about it.
>>>>>>
>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>>>> current situation is not great, but promotions ought to be relatively
>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>>>> also worth considering whether the right fix is to figure out how to
>>>>>> make that case actually work, rather than just making it fail quicker.
>>>>>> I don't currently understand the reason for the prohibition so I can't
>>>>>> express an intelligent opinion on what the right answer is here, but
>>>>>> it seems like it ought to be investigated before somebody goes and
>>>>>> builds a bunch of infrastructure to make the error more timely.
>>>>>>
>>>>>
>>>>> Non-parallel backup already does the early error checking. I only
>>>>> intended
>>>>>
>>>>> to make parallel behave the same as non-parallel here. So, I agree with
>>>>>
>>>>> you that the behavior of parallel backup should be consistent with the
>>>>>
>>>>> non-parallel one.  Please see the code snippet below from
>>>>>
>>>>> basebackup.c:sendDir()
>>>>>
>>>>>
>>>>>> /*
>>>>>>  * Check if the postmaster has signaled us to exit, and abort with an
>>>>>>  * error in that case. The error handler further up will call
>>>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>>>>>  * started while still in recovery, the server wasn't promoted.
>>>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
>>>>>>  * the backup early than continue to the end and fail there.
>>>>>>  */
>>>>>> CHECK_FOR_INTERRUPTS();
>>>>>> if (RecoveryInProgress() != backup_started_in_recovery)
>>>>>>     ereport(ERROR,
>>>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>>              errmsg("the standby was promoted during online backup"),
>>>>>>              errhint("This means that the backup being taken is corrupt "
>>>>>>                      "and should not be used. "
>>>>>>                      "Try taking another online backup.")));
>>>>>>
>>>>>>
>>>>>> > Okay, then I will add the shared state. And since we are adding the
>>>>>> shared state, we can use
>>>>>> > that for throttling, progress-reporting and standby early error
>>>>>> checking.
>>>>>>
>>>>>> Please propose a grammar here for all the new replication commands you
>>>>>> plan to add before going and implement everything. That will make it
>>>>>> easier to hash out the design without forcing you to keep changing the
>>>>>> code. Your design should include a sketch of how several sets of
>>>>>> coordinating backends taking several concurrent parallel backups will
>>>>>> end up with one shared state per parallel backup.
>>>>>>
>>>>>> > There are two possible options:
>>>>>> >
>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>>>> >
>>>>>> > This BackupID should be given back as a response to start backup
>>>>>> command. All client workers
>>>>>> > must append this ID to all parallel backup replication commands. So
>>>>>> that we can use this identifier
>>>>>> > to search for that particular backup. Does that sound good?
>>>>>>
>>>>>> Using the WAL start location as the backup ID seems like it might be
>>>>>> problematic -- could a single checkpoint not end up as the start
>>>>>> location for multiple backups started at the same time? Whether that's
>>>>>> possible now or not, it seems unwise to hard-wire that assumption into
>>>>>> the wire protocol.
>>>>>>
>>>>>> I was thinking that perhaps the client should generate a unique backup
>>>>>> ID, e.g. leader does:
>>>>>>
>>>>>> START_BACKUP unique_backup_id [options]...
>>>>>>
>>>>>> And then others do:
>>>>>>
>>>>>> JOIN_BACKUP unique_backup_id
>>>>>>
>>>>>> My thought is that you will have a number of shared memory structure
>>>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>>>> state for one backup. The shared state will include
>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>>>> JOIN_BACKUP would search for one by name.
>>>>>>
>>>>>> If you want to generate the name on the server side, then I suppose
>>>>>> START_BACKUP would return a result set that includes the backup ID,
>>>>>> and clients would have to specify that same backup ID when invoking
>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>>>>>> better. Either way, the backup ID should be something long and hard to
>>>>>> guess, not e.g. the leader processes' PID. I think we should generate
>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>>>>>> result to get a string. That way there's almost no risk of two backup
>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>>>> user trying to screw up somebody else's parallel backup by choosing a
>>>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>>>> user with enough access to do that sort of thing can probably cause a
>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>>>> intentional collisions robustly here, so I think we should.
>>>>>>
>>>>>>
>>>>> Okay so If we are to add another replication command ‘JOIN_BACKUP
>>>>> unique_backup_id’
>>>>> to make workers find the relevant shared state. There won't be any
>>>>> need for changing
>>>>> the grammar for any other command. The START_BACKUP can return the
>>>>> unique_backup_id
>>>>> in the result set.
>>>>>
>>>>> I am thinking of the following struct for shared state:
>>>>>
>>>>>> typedef struct
>>>>>> {
>>>>>>     char        backupid[NAMEDATALEN];
>>>>>>     XLogRecPtr  startptr;
>>>>>>
>>>>>>     slock_t     lock;
>>>>>>     int64       throttling_counter;
>>>>>>     bool        backup_started_in_recovery;
>>>>>> } BackupSharedState;
>>>>>>
>>>>>>
>>>>> The shared state structure entries would be maintained by a shared
>>>>> hash table.
>>>>> There will be one structure per parallel backup. Since a single
>>>>> parallel backup
>>>>> can engage more than one wal sender, so I think max_wal_senders might
>>>>> be a little
>>>>> too much; perhaps max_wal_senders/2 since there will be at least 2
>>>>> connections
>>>>> per parallel backup? Alternatively, we can set a new GUC that defines
>>>>> the maximum
>>>>> number of concurrent parallel backups i.e.
>>>>> ‘max_concurent_backups_allowed = 10’
>>>>> perhaps, or we can make it user-configurable.
>>>>>
>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
>>>>>
>>>>> Checking for Standby Promotion:
>>>>> At the START_BACKUP command, we initialize
>>>>> BackupSharedState.backup_started_in_recovery
>>>>> and keep checking it whenever send_file () is called to send a new
>>>>> file.
>>>>>
>>>>> Throttling:
>>>>> BackupSharedState.throttling_counter - The throttling logic remains
>>>>> the same
>>>>> as for non-parallel backup with the exception that multiple threads
>>>>> will now be
>>>>> updating it. So in parallel backup, this will represent the overall
>>>>> bytes that
>>>>> have been transferred. So the workers would sleep if they have
>>>>> exceeded the
>>>>> limit. Hence, the shared state carries a lock to safely update the
>>>>> throttling
>>>>> value atomically.
>>>>>
>>>>> Progress Reporting:
>>>>> Although I think we should add progress-reporting for parallel backup
>>>>> as a
>>>>> separate patch. The relevant entries for progress-reporting such as
>>>>> ‘backup_total’ and ‘backup_streamed’ would be then added to this
>>>>> structure
>>>>> as well.
>>>>>
>>>>>
>>>>> Grammar:
>>>>> There is a change in the resultset being returned for START_BACKUP
>>>>> command;
>>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
>>>>> command is
>>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
>>>>> changes
>>>>> to the grammar.
>>>>>
>>>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>>>   - returns startptr, tli, backup_label, unique_backup_id
>>>>> STOP_BACKUP [NOWAIT]
>>>>>   - returns startptr, tli, backup_label
>>>>> JOIN_BACKUP ‘unique_backup_id’
>>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
>>>>> backend process.
>>>>>
>>>>> LIST_TABLESPACES [PROGRESS]
>>>>> LIST_FILES [TABLESPACE]
>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>>>
>>>>>
>

Hi,

Rebased and updated to the current master (8128b0c1). v13 is attached.

- Fixes the above reported issues.

- Added progress-reporting support for parallel backup:
For this, 'backup_streamed' is moved to a shared structure (BackupState) as a
pg_atomic_uint64 variable. The worker processes keep incrementing this
variable.

While files are being transferred from server to client, the main process
remains idle. So after each increment, the worker process signals the main
process to update the stats in the pg_stat_progress_basebackup view.

The 'tablespaces_streamed' column is not updated and will remain empty. This is
because multiple workers may be copying files from different tablespaces.
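
As a rough illustration of the shared counter described above, here is a
minimal standalone sketch. It uses C11 atomics as a stand-in for PostgreSQL's
pg_atomic_uint64, and the names SketchBackupState, sketch_state_init,
sketch_report_streamed and sketch_read_streamed are hypothetical; they are not
taken from the patch. The signalling of the main process is not modeled.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the shared BackupState; only the byte counter
 * is modeled here.
 */
typedef struct
{
    _Atomic uint64_t backup_streamed;   /* total bytes sent by all workers */
} SketchBackupState;

/* Reset the counter before any worker starts sending. */
static void
sketch_state_init(SketchBackupState *state)
{
    atomic_init(&state->backup_streamed, 0);
}

/*
 * Called by a worker after it has sent nbytes. The atomic add needs no
 * lock, so concurrent workers cannot lose updates. Returns the new total.
 */
static uint64_t
sketch_report_streamed(SketchBackupState *state, uint64_t nbytes)
{
    return atomic_fetch_add(&state->backup_streamed, nbytes) + nbytes;
}

/* Called by the main process when it wants to publish progress. */
static uint64_t
sketch_read_streamed(SketchBackupState *state)
{
    return atomic_load(&state->backup_streamed);
}
```

A worker would call sketch_report_streamed() after each chunk it sends, and the
main process would call sketch_read_streamed() whenever it is asked to refresh
the progress view.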


- Added backup manifest:
Each backend worker maintains its own manifest file, which lists the files
transferred by that worker. Once all backup files are transferred, the workers
create a temp file named
('pg_tempdir/temp_file_prefix_backupid.workerid')
and write the content of the manifest file to it from the BufFile. The workers
add neither the header nor the WAL information to their manifests. These two
are added by the main process while merging all worker manifest files.

The main process reads these individual files and concatenates them into a
single file, which is then sent back to the client.
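
The merge step could look roughly like the following standalone sketch. It
only shows the concatenation of per-worker files named
"<prefix><backupid>.<workerid>"; the function name sketch_merge_manifests, the
exact path layout, and the use of plain stdio instead of the server's file
APIs are assumptions made for illustration. Adding the header and the WAL
information is left out.

```c
#include <stdio.h>

/*
 * Append the contents of each per-worker file "<prefix><backupid>.<workerid>"
 * (workerid = 0 .. nworkers - 1) to 'out', in worker order.
 *
 * Returns 0 on success, or -1 if a path is too long, a worker file cannot
 * be opened, or a read or write fails.
 */
static int
sketch_merge_manifests(const char *prefix, const char *backupid,
                       int nworkers, FILE *out)
{
    char        path[1024];
    char        buf[8192];

    for (int workerid = 0; workerid < nworkers; workerid++)
    {
        FILE       *in;
        size_t      nread;
        int         len;

        len = snprintf(path, sizeof(path), "%s%s.%d",
                       prefix, backupid, workerid);
        if (len < 0 || (size_t) len >= sizeof(path))
            return -1;

        in = fopen(path, "rb");
        if (in == NULL)
            return -1;          /* e.g. the worker never wrote a manifest */

        /* Copy the worker's fragment verbatim. */
        while ((nread = fread(buf, 1, sizeof(buf), in)) > 0)
        {
            if (fwrite(buf, 1, nread, out) != nread)
            {
                fclose(in);
                return -1;
            }
        }
        if (ferror(in))
        {
            fclose(in);
            return -1;
        }
        fclose(in);
    }
    return 0;
}
```

Returning an error when a worker file is missing keeps the failure visible to
the caller, which can then decide whether a missing fragment is expected.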

The manifest file is created when the following command is received:

    BUILD_MANIFEST 'backupid'


This is a new replication command. It is sent when pg_basebackup has copied all
the $PGDATA files, including WAL files.
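
For reference, the 'backupid' used here was described earlier in this thread
as 16 random bytes, hex-encoded. A minimal sketch of just the encoding step
follows; sketch_hex_encode is a hypothetical helper name, and in the server the
input bytes would come from pg_strong_random() rather than from the caller.

```c
#include <stddef.h>

/*
 * Hex-encode 'len' bytes from 'src' into 'dst' as lowercase hex digits.
 * 'dst' must have room for 2 * len + 1 bytes, including the trailing NUL.
 */
static void
sketch_hex_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];       /* high nibble */
        dst[2 * i + 1] = digits[src[i] & 0x0F]; /* low nibble */
    }
    dst[2 * len] = '\0';
}
```

With 16 input bytes this yields a 32-character string, so the destination
buffer needs at least 33 bytes.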



--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-14T13:32:40Z

Hi Asif

I am getting the following error on parallel backup when the --no-manifest
option is used.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 5  -D
 /home/edb/Desktop/backup/ --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_10223"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR:
 could not open file
"base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file
or directory
pg_basebackup: removing contents of data directory
"/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks

On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
> kashif.zeeshan@enterprisedb.com> wrote:
>
>>
>>
>> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>>>
>>> I have added the shared state as previously described. The new grammar
>>> changes
>>> are as follows:
>>>
>>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>>>     - This will generate a unique backupid using pg_strong_random(16)
>>> and hex-encoded
>>>       it. which is then returned as the result set.
>>>     - It will also create a shared state and add it to the hashtable.
>>> The hash table size is set
>>>       to BACKUP_HASH_SIZE=10, but since hashtable can expand
>>> dynamically, I think it's
>>>       sufficient initial size. max_wal_senders is not used, because it
>>> can be set to quite a
>>>       large values.
>>>
>>> JOIN_BACKUP 'backup_id'
>>>     - finds 'backup_id' in hashtable and attaches it to server process.
>>>
>>>
>>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>>>     - renamed SEND_FILES to SEND_FILE
>>>     - removed START_WAL_LOCATION from this because 'startptr' is now
>>> accessible through
>>>       shared state.
>>>
>>> There is no change in other commands:
>>> STOP_BACKUP [NOWAIT]
>>> LIST_TABLESPACES [PROGRESS]
>>> LIST_FILES [TABLESPACE]
>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>
>>> The current patches (v11) have been rebased to the latest master. The
>>> backup manifest is enabled
>>> by default, so I have disabled it for parallel backup mode and have
>>> generated a warning so that
>>> user is aware of it and not expect it in the backup.
>>>
>>> Hi Asif
>>
>> I have verified the bug fixes, one bug is fixed and working now as
>> expected
>>
>> For the verification of the other bug fixes faced following issues,
>> please have a look.
>>
>>
>> 1) Following bug fixes mentioned below are generating segmentation fault.
>>
>> Please note for reference I have added a description only as steps were
>> given in previous emails of each bug I tried to verify the fix. Backtrace
>> is also added with each case which points to one bug for both the cases.
>>
>> a) The backup failed with errors "error: could not connect to server:
>> could not look up local user ID 1000: Too many open files" when the
>> max_wal_senders was set to 2000.
>>
>>
>> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
>>  /home/edb/Desktop/backup/
>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>> mode
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> ….
>> ….
>> pg_basebackup: backup worker (1014) created
>> pg_basebackup: backup worker (1015) created
>> pg_basebackup: backup worker (1016) created
>> pg_basebackup: backup worker (1017) created
>> pg_basebackup: error: could not connect to server: could not look up
>> local user ID 1000: Too many open files
>> Segmentation fault
>> [edb@localhost bin]$
>>
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$ gdb pg_basebackup
>> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <
>> http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from
>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>> [New LWP 13219]
>> [New LWP 13222]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `./pg_basebackup -v -j 1990 -D
>> /home/edb/Desktop/backup/'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
>> (gdb) bt
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
>> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
>> at exit.c:77
>> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0)
>> at pg_basebackup.c:2811
>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
>> pg_basebackup.c:2765
>> (gdb)
>>
>>
>>
>>
>> b) When executing two backups at the same time, getting FATAL error due
>> to max_wal_senders and instead of exit  Backup got completed.
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D
>>  /home/edb/Desktop/backup1/
>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>> mode
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> pg_basebackup: backup worker (4) created
>> pg_basebackup: backup worker (5) created
>> pg_basebackup: backup worker (6) created
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> Segmentation fault (core dumped)
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$ gdb pg_basebackup
>> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <
>> http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from
>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>> [New LWP 17041]
>> [New LWP 17067]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `./pg_basebackup -v -j 8 -D
>> /home/edb/Desktop/backup1/'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
>> (gdb) bt
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
>> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
>> at exit.c:77
>> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
>> at pg_basebackup.c:2811
>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
>> pg_basebackup.c:2765
>> (gdb)
>>
>>
>>
>>
>> 2) The following bug is not fixed yet
>>
>> A similar case is when DB Server is shut down while the Parallel Backup
>> is in progress then the correct error is displayed but then the backup
>> folder is not cleaned and leaves a corrupt backup.
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j
>> 8
>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>> mode
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> pg_basebackup: backup worker (4) created
>> pg_basebackup: backup worker (5) created
>> pg_basebackup: backup worker (6) created
>> pg_basebackup: backup worker (7) created
>> pg_basebackup: error: could not read COPY data: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> pg_basebackup: error: could not read COPY data: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> pg_basebackup: removing contents of data directory
>> "/home/edb/Desktop/backup/"
>> pg_basebackup: error: could not read COPY data: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>>
>>
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>> base         pg_hba.conf    pg_logical    pg_notify    pg_serial
>> pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
>> pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
>>  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>>
>>
>>
>>
>> Thanks
>> Kashif Zeeshan
>>
>>>
>>>
>>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
>>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>>
>>>>> Hi Asif
>>>>>
>>>>> When a non-existent slot is used with tablespace then correct error is
>>>>> displayed but then the backup folder is not cleaned and leaves a corrupt
>>>>> backup.
>>>>>
>>>>> Steps
>>>>> =======
>>>>>
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
>>>>> [edb@localhost bin]$
>>>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
>>>>> CREATE TABLESPACE
>>>>> postgres=#
>>>>> postgres=# create table t1 (a int) tablespace tbl1;
>>>>> CREATE TABLE
>>>>> postgres=# insert into t1 values(100);
>>>>> INSERT 0 1
>>>>> postgres=# insert into t1 values(200);
>>>>> INSERT 0 1
>>>>> postgres=# insert into t1 values(300);
>>>>> INSERT 0 1
>>>>> postgres=#
>>>>>
>>>>>
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
>>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>> complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: error: could not send replication command
>>>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
>>>>> pg_basebackup: backup worker (0) created
>>>>> pg_basebackup: backup worker (1) created
>>>>> pg_basebackup: write-ahead log end point: 0/2E000100
>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>> pg_basebackup: error: child thread exited with error 1
>>>>> [edb@localhost bin]$
>>>>>
>>>>> backup folder not cleaned
>>>>>
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
>>>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
>>>>>             postgresql.conf
>>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
>>>>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
>>>>>  postgresql.auto.conf
>>>>> [edb@localhost bin]$
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> If the same case is executed without the parallel backup patch then
>>>>> the backup folder is cleaned after the error is displayed.
>>>>>
>>>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>>>> -T /home/edb/tbl1=/home/edb/tbl_res -S test999
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>> complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: error: could not send replication command
>>>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
>>>>> pg_basebackup: write-ahead log end point: 0/2B000100
>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>> pg_basebackup: error: child process exited with exit code 1
>>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>>>>> pg_basebackup: changes to tablespace directories will not be undone
>>>>>
>>>>
>>>>
>>>> Hi Asif
>>>>
>>>> A similar case is when DB Server is shut down while the Parallel Backup
>>>> is in progress then the correct error is displayed but then the backup
>>>> folder is not cleaned and leaves a corrupt backup. I think one bug fix will
>>>> solve all these cases where clean up is not done when parallel backup is
>>>> failed.
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>>> -j 8
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>> complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
>>>> pg_basebackup: backup worker (0) created
>>>> pg_basebackup: backup worker (1) created
>>>> pg_basebackup: backup worker (2) created
>>>> pg_basebackup: backup worker (3) created
>>>> pg_basebackup: backup worker (4) created
>>>> pg_basebackup: backup worker (5) created
>>>> pg_basebackup: backup worker (6) created
>>>> pg_basebackup: backup worker (7) created
>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>> connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>> connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>>
>>>> Same case when executed on pg_basebackup without the Parallel backup
>>>> patch then proper clean up is done.
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>> complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>> connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>> pg_basebackup: removing contents of data directory
>>>> "/home/edb/Desktop/backup/"
>>>> [edb@localhost bin]$
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>
>>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
>>>>>>> wrote:
>>>>>>> >> Why would you need to do that? As long as the process where
>>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>>>>> >
>>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
>>>>>>> not while the backup is
>>>>>>> > in progress. So if the backup is a large one, early error
>>>>>>> detection would be much beneficial.
>>>>>>> > This is the current behavior of non-parallel backup as well.
>>>>>>>
>>>>>>> Because non-parallel backup does not feature early detection of this
>>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>>>>> thread in a separate patch. A patch proposing to make parallel backup
>>>>>>> inconsistent in behavior with non-parallel backup will be rejected,
>>>>>>> at
>>>>>>> least if I have anything to say about it.
>>>>>>>
>>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>>>>> current situation is not great, but promotions ought to be relatively
>>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>>>>> also worth considering whether the right fix is to figure out how to
>>>>>>> make that case actually work, rather than just making it fail
>>>>>>> quicker.
>>>>>>> I don't currently understand the reason for the prohibition so I
>>>>>>> can't
>>>>>>> express an intelligent opinion on what the right answer is here, but
>>>>>>> it seems like it ought to be investigated before somebody goes and
>>>>>>> builds a bunch of infrastructure to make the error more timely.
>>>>>>>
>>>>>>
>>>>>> Non-parallel backup already does the early error checking. I only
>>>>>> intended to make parallel behave the same as non-parallel here. So, I
>>>>>> agree with you that the behavior of parallel backup should be
>>>>>> consistent with the non-parallel one. Please see the code snippet
>>>>>> below from basebackup.c:sendDir()
>>>>>>
>>>>>>     /*
>>>>>>      * Check if the postmaster has signaled us to exit, and abort with an
>>>>>>      * error in that case. The error handler further up will call
>>>>>>      * do_pg_abort_backup() for us. Also check that if the backup was
>>>>>>      * started while still in recovery, the server wasn't promoted.
>>>>>>      * do_pg_stop_backup() will check that too, but it's better to stop
>>>>>>      * the backup early than continue to the end and fail there.
>>>>>>      */
>>>>>>     CHECK_FOR_INTERRUPTS();
>>>>>>     if (RecoveryInProgress() != backup_started_in_recovery)
>>>>>>         ereport(ERROR,
>>>>>>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>>                  errmsg("the standby was promoted during online backup"),
>>>>>>                  errhint("This means that the backup being taken is corrupt "
>>>>>>                          "and should not be used. "
>>>>>>                          "Try taking another online backup.")));
>>>>>>>
>>>>>>> > Okay, then I will add the shared state. And since we are adding
>>>>>>> the shared state, we can use
>>>>>>> > that for throttling, progress-reporting and standby early error
>>>>>>> checking.
>>>>>>>
>>>>>>> Please propose a grammar here for all the new replication commands
>>>>>>> you
>>>>>>> plan to add before going and implement everything. That will make it
>>>>>>> easier to hash out the design without forcing you to keep changing
>>>>>>> the
>>>>>>> code. Your design should include a sketch of how several sets of
>>>>>>> coordinating backends taking several concurrent parallel backups will
>>>>>>> end up with one shared state per parallel backup.
>>>>>>>
>>>>>>> > There are two possible options:
>>>>>>> >
>>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string>
>>>>>>> OR
>>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>>>>> >
>>>>>>> > This BackupID should be given back as a response to start backup
>>>>>>> command. All client workers
>>>>>>> > must append this ID to all parallel backup replication commands.
>>>>>>> So that we can use this identifier
>>>>>>> > to search for that particular backup. Does that sound good?
>>>>>>>
>>>>>>> Using the WAL start location as the backup ID seems like it might be
>>>>>>> problematic -- could a single checkpoint not end up as the start
>>>>>>> location for multiple backups started at the same time? Whether
>>>>>>> that's
>>>>>>> possible now or not, it seems unwise to hard-wire that assumption
>>>>>>> into
>>>>>>> the wire protocol.
>>>>>>>
>>>>>>> I was thinking that perhaps the client should generate a unique
>>>>>>> backup
>>>>>>> ID, e.g. leader does:
>>>>>>>
>>>>>>> START_BACKUP unique_backup_id [options]...
>>>>>>>
>>>>>>> And then others do:
>>>>>>>
>>>>>>> JOIN_BACKUP unique_backup_id
>>>>>>>
>>>>>>> My thought is that you will have a number of shared memory structures
>>>>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>>>>> state for one backup. The shared state will include
>>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>>>>> JOIN_BACKUP would search for one by name.
>>>>>>>
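
The slot scheme quoted above (a fixed number of structures, START_BACKUP
copies the ID into a free one, JOIN_BACKUP searches by name) can be
illustrated with a minimal standalone sketch. This is not code from the
patch: BackupSlot, backup_slot_start(), backup_slot_join(), BACKUP_ID_LEN
and MAX_BACKUP_SLOTS are hypothetical names, the array is ordinary process
memory rather than PostgreSQL shared memory, and the locking a real
implementation would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BACKUP_ID_LEN    64   /* hypothetical stand-in for NAMEDATALEN */
#define MAX_BACKUP_SLOTS 4    /* hypothetical stand-in for max_wal_senders */

typedef struct BackupSlot
{
    bool in_use;
    char backupid[BACKUP_ID_LEN];
} BackupSlot;

/* Plain static storage; a real server would place this in shared memory. */
static BackupSlot backup_slots[MAX_BACKUP_SLOTS];

/*
 * START_BACKUP side: claim a free slot and copy the ID into it.
 * Returns NULL if the ID is empty, too long, already present,
 * or if every slot is taken.
 */
BackupSlot *
backup_slot_start(const char *id)
{
    BackupSlot *free_slot = NULL;
    size_t      len;
    int         i;

    assert(id != NULL);
    len = strlen(id);
    if (len == 0 || len >= BACKUP_ID_LEN)
        return NULL;

    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (backup_slots[i].in_use)
        {
            if (strcmp(backup_slots[i].backupid, id) == 0)
                return NULL;    /* refuse a colliding ID */
        }
        else if (free_slot == NULL)
            free_slot = &backup_slots[i];
    }

    if (free_slot == NULL)
        return NULL;            /* no free slot left */

    free_slot->in_use = true;
    memcpy(free_slot->backupid, id, len + 1);
    return free_slot;
}

/* JOIN_BACKUP side: find the slot whose ID matches, or NULL if none does. */
BackupSlot *
backup_slot_join(const char *id)
{
    int i;

    assert(id != NULL);
    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (backup_slots[i].in_use &&
            strcmp(backup_slots[i].backupid, id) == 0)
            return &backup_slots[i];
    }
    return NULL;
}
```

A linear search is enough here because the number of slots is small and
bounded; the later messages in the thread use a hash table instead.
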
>>>>>>> If you want to generate the name on the server side, then I suppose
>>>>>>> START_BACKUP would return a result set that includes the backup ID,
>>>>>>> and clients would have to specify that same backup ID when invoking
>>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>>>>>>> better. Either way, the backup ID should be something long and hard
>>>>>>> to
>>>>>>> guess, not e.g. the leader processes' PID. I think we should generate
>>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>>>>>>> result to get a string. That way there's almost no risk of two backup
>>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>>>>> user trying to screw up somebody else's parallel backup by choosing a
>>>>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>>>>> user with enough access to do that sort of thing can probably cause a
>>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>>>>> intentional collisions robustly here, so I think we should.
>>>>>>>
>>>>>>>
>>>>>> Okay, so if we add another replication command ‘JOIN_BACKUP
>>>>>> unique_backup_id’ to let workers find the relevant shared state,
>>>>>> there won't be any need to change the grammar of any other command.
>>>>>> START_BACKUP can return the unique_backup_id in the result set.
>>>>>>
>>>>>> I am thinking of the following struct for shared state:
>>>>>>
>>>>>>     typedef struct
>>>>>>     {
>>>>>>         char backupid[NAMEDATALEN];
>>>>>>         XLogRecPtr startptr;
>>>>>>
>>>>>>         slock_t lock;
>>>>>>         int64 throttling_counter;
>>>>>>         bool backup_started_in_recovery;
>>>>>>     } BackupSharedState;
>>>>>>>
>>>>>>>
>>>>>> The shared state structure entries would be maintained by a shared
>>>>>> hash table. There will be one structure per parallel backup. Since a
>>>>>> single parallel backup can engage more than one WAL sender, I think
>>>>>> max_wal_senders might be a little too much; perhaps
>>>>>> max_wal_senders/2, since there will be at least 2 connections per
>>>>>> parallel backup? Alternatively, we can set a new GUC that defines the
>>>>>> maximum number of concurrent parallel backups, i.e.
>>>>>> ‘max_concurent_backups_allowed = 10’ perhaps, or we can make it
>>>>>> user-configurable.
>>>>>>
>>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
>>>>>>
>>>>>> Checking for Standby Promotion:
>>>>>> At the START_BACKUP command, we initialize
>>>>>> BackupSharedState.backup_started_in_recovery
>>>>>> and keep checking it whenever send_file () is called to send a new
>>>>>> file.
>>>>>>
>>>>>> Throttling:
>>>>>> BackupSharedState.throttling_counter - The throttling logic remains
>>>>>> the same
>>>>>> as for non-parallel backup with the exception that multiple threads
>>>>>> will now be
>>>>>> updating it. So in parallel backup, this will represent the overall
>>>>>> bytes that
>>>>>> have been transferred. So the workers would sleep if they have
>>>>>> exceeded the
>>>>>> limit. Hence, the shared state carries a lock to safely update the
>>>>>> throttling
>>>>>> value atomically.
>>>>>>
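
The shared throttling counter described in the quoted paragraph above can
be sketched as follows. This is only an illustration under stated
assumptions: ThrottleState, throttle_init() and throttle_account() are
hypothetical names, a pthread mutex stands in for the slock_t spinlock, and
the function only reports whether the caller should sleep instead of
computing the sleep time the way the server's throttling code does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ThrottleState
{
    pthread_mutex_t lock;     /* stand-in for the slock_t in the proposal */
    int64_t         counter;  /* bytes sent by all workers in this interval */
    int64_t         sample;   /* bytes allowed before a worker must pause */
} ThrottleState;

void
throttle_init(ThrottleState *st, int64_t sample)
{
    assert(st != NULL && sample > 0);
    pthread_mutex_init(&st->lock, NULL);
    st->counter = 0;
    st->sample = sample;
}

/*
 * Add nbytes to the shared counter. Returns true when the combined total
 * has reached the sample size, in which case the caller is expected to
 * sleep; the counter keeps only the overshoot for the next interval.
 */
bool
throttle_account(ThrottleState *st, int64_t nbytes)
{
    bool must_sleep = false;

    assert(st != NULL && nbytes >= 0);
    pthread_mutex_lock(&st->lock);
    st->counter += nbytes;
    if (st->counter >= st->sample)
    {
        st->counter %= st->sample;
        must_sleep = true;
    }
    pthread_mutex_unlock(&st->lock);
    return must_sleep;
}
```

Because the read-modify-write of the counter happens entirely under the
lock, two workers adding bytes at the same time cannot lose an update,
which is the property the paragraph above relies on.
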
>>>>>> Progress Reporting:
>>>>>> I think we should add progress reporting for parallel backup as a
>>>>>> separate patch. The relevant entries for progress reporting, such as
>>>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
>>>>>> structure as well.
>>>>>>
>>>>>>
>>>>>> Grammar:
>>>>>> There is a change in the resultset being returned for START_BACKUP
>>>>>> command;
>>>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
>>>>>> command is
>>>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
>>>>>> changes
>>>>>> to the grammar.
>>>>>>
>>>>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>>>>   - returns startptr, tli, backup_label, unique_backup_id
>>>>>> STOP_BACKUP [NOWAIT]
>>>>>>   - returns startptr, tli, backup_label
>>>>>> JOIN_BACKUP ‘unique_backup_id’
>>>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
>>>>>> backend process.
>>>>>>
>>>>>> LIST_TABLESPACES [PROGRESS]
>>>>>> LIST_FILES [TABLESPACE]
>>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>>>>
>>>>>>
>>
>
> Hi,
>
> rebased and updated to the current master (8128b0c1). v13 is attached.
>
> - Fixes the above reported issues.
>
> - Added progress-reporting support for parallel:
> For this, 'backup_streamed' is moved to a shared structure (BackupState) as
> pg_atomic_uint64 variable. The worker processes will keep incrementing this
> variable.
>
> While files are being transferred from server to client, the main process
> remains idle. So after each increment, the worker process signals the
> main process to update the stats in the pg_stat_progress_basebackup view.
>
> The 'tablespace_streamed' column is not updated and will remain empty.
> This is
> because multiple workers may be copying files from different tablespaces.
>
>
> - Added backup manifest:
> The backend workers each maintain their own manifest file, which contains
> a list of the files transferred by that worker. Once all backup files are
> transferred, the workers will create a temp file as
> ('pg_tempdir/temp_file_prefix_backupid.workerid')
> to write the content of the manifest file from BufFile. The workers won’t
> add the header or the WAL information to their manifest. These two will be
> added by the main process while merging all worker manifest files.
>
> The main process will read these individual files and concatenate them
> into a single file
> which is then sent back to the client.
>
> The manifest file is created when the following command is received:
>
>     BUILD_MANIFEST 'backupid'
>
>
> This is a new replication command. It is sent when pg_basebackup has
> copied all the
> $PGDATA files including WAL files.
>
>
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>
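
The shared 'backup_streamed' counter described in the quoted message above
can be illustrated with C11 atomics. This is a standalone sketch, not the
patch: the patch uses PostgreSQL's pg_atomic_uint64, while BackupProgress,
progress_add() and progress_read() below are hypothetical names built on
<stdatomic.h>. Signalling the main process after each increment is left
out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct BackupProgress
{
    _Atomic uint64_t backup_streamed;   /* total bytes sent by all workers */
} BackupProgress;

/*
 * Called by a worker after sending nbytes. Returns the new total, so the
 * caller could decide whether it is worth notifying the main process.
 */
uint64_t
progress_add(BackupProgress *p, uint64_t nbytes)
{
    assert(p != NULL);
    return atomic_fetch_add(&p->backup_streamed, nbytes) + nbytes;
}

/* Called by the main process when it refreshes the progress view. */
uint64_t
progress_read(BackupProgress *p)
{
    assert(p != NULL);
    return atomic_load(&p->backup_streamed);
}
```

An atomic add needs no lock, so workers copying files from different
tablespaces can update the total without blocking each other.
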

-- 
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-14T14:36:58Z

On Tue, Apr 14, 2020 at 6:32 PM Kashif Zeeshan <
kashif.zeeshan@enterprisedb.com> wrote:

> Hi Asif
>
> Getting the following error on Parallel backup when --no-manifest option
> is used.
>
> [edb@localhost bin]$
> [edb@localhost bin]$
> [edb@localhost bin]$  ./pg_basebackup -v -j 5  -D
>  /home/edb/Desktop/backup/ --no-manifest
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_10223"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> pg_basebackup: backup worker (4) created
> pg_basebackup: write-ahead log end point: 0/2000100
> pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR:
>  could not open file
> "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file
> or directory
> pg_basebackup: removing contents of data directory
> "/home/edb/Desktop/backup/"
> [edb@localhost bin]$
>

I forgot to add a check for --no-manifest. Fixed. The updated patch is
attached.


> Thanks
>
> On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
>
>>
>>
>> On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
>> kashif.zeeshan@enterprisedb.com> wrote:
>>
>>>
>>>
>>> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>>>>
>>>> I have added the shared state as previously described. The new grammar
>>>> changes
>>>> are as follows:
>>>>
>>>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>>>>     - This will generate a unique backupid using pg_strong_random(16)
>>>>       and hex-encode it; the result is then returned in the result set.
>>>>     - It will also create a shared state and add it to the hashtable.
>>>>       The hash table size is set to BACKUP_HASH_SIZE=10, but since the
>>>>       hashtable can expand dynamically, I think it's a sufficient
>>>>       initial size. max_wal_senders is not used, because it can be set
>>>>       to quite a large value.
>>>>
>>>> JOIN_BACKUP 'backup_id'
>>>>     - finds 'backup_id' in hashtable and attaches it to server process.
>>>>
>>>>
>>>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>>>>     - renamed SEND_FILES to SEND_FILE
>>>>     - removed START_WAL_LOCATION from this because 'startptr' is now
>>>> accessible through
>>>>       shared state.
>>>>
>>>> There is no change in other commands:
>>>> STOP_BACKUP [NOWAIT]
>>>> LIST_TABLESPACES [PROGRESS]
>>>> LIST_FILES [TABLESPACE]
>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>
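
The backup ID format described for START_BACKUP above (16 random bytes,
hex-encoded into a 32-character string) can be sketched outside the server
as follows. The assumptions are significant: pg_strong_random() is not
available in a standalone program, so this sketch reads /dev/urandom
instead, and hex_encode_bytes() and make_backup_id() are hypothetical
helper names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BACKUP_ID_BYTES  16
#define BACKUP_ID_HEXLEN (BACKUP_ID_BYTES * 2)

/* Write 2*len lowercase hex digits plus a terminating NUL into dst. */
void
hex_encode_bytes(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0F];
    }
    dst[2 * len] = '\0';
}

/*
 * Fill id with a 32-character hex string built from 16 random bytes.
 * /dev/urandom is a stand-in for pg_strong_random(); returns false if
 * the random source cannot be read.
 */
bool
make_backup_id(char id[BACKUP_ID_HEXLEN + 1])
{
    unsigned char raw[BACKUP_ID_BYTES];
    FILE         *f;
    bool          ok;

    assert(id != NULL);
    f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return false;
    ok = (fread(raw, 1, sizeof(raw), f) == sizeof(raw));
    fclose(f);
    if (ok)
        hex_encode_bytes(raw, sizeof(raw), id);
    return ok;
}
```

The 32-character result has the same shape as the ID visible in the
temporary file name in the error output earlier in this message.
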
>>>> The current patches (v11) have been rebased to the latest master. The
>>>> backup manifest is enabled by default, so I have disabled it for
>>>> parallel backup mode and added a warning so that the user is aware of
>>>> this and does not expect a manifest in the backup.
>>>>
>>> Hi Asif
>>>
>>> I have verified the bug fixes. One bug is fixed and now works as
>>> expected.
>>>
>>> While verifying the other bug fixes I faced the following issues;
>>> please have a look.
>>>
>>>
>>> 1) The bug fixes mentioned below are generating a segmentation fault.
>>>
>>> Please note that, for reference, I have added only a description of each
>>> bug whose fix I tried to verify, as the steps were given in previous
>>> emails. A backtrace is also added for each case; both point to the same
>>> bug.
>>>
>>> a) The backup failed with errors "error: could not connect to server:
>>> could not look up local user ID 1000: Too many open files" when the
>>> max_wal_senders was set to 2000.
>>>
>>>
>>> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
>>>  /home/edb/Desktop/backup/
>>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>>> mode
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
>>> pg_basebackup: backup worker (0) created
>>> pg_basebackup: backup worker (1) created
>>> pg_basebackup: backup worker (2) created
>>> pg_basebackup: backup worker (3) created
>>> ….
>>> ….
>>> pg_basebackup: backup worker (1014) created
>>> pg_basebackup: backup worker (1015) created
>>> pg_basebackup: backup worker (1016) created
>>> pg_basebackup: backup worker (1017) created
>>> pg_basebackup: error: could not connect to server: could not look up
>>> local user ID 1000: Too many open files
>>> Segmentation fault
>>> [edb@localhost bin]$
>>>
>>>
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$ gdb pg_basebackup
>>> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
>>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>>> Copyright (C) 2013 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <
>>> http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show
>>> copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> Reading symbols from
>>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>>> [New LWP 13219]
>>> [New LWP 13222]
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `./pg_basebackup -v -j 1990 -D
>>> /home/edb/Desktop/backup/'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
>>> (gdb) bt
>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>>> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
>>> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
>>> at exit.c:77
>>> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
>>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0)
>>> at pg_basebackup.c:2811
>>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
>>> pg_basebackup.c:2765
>>> (gdb)
>>>
>>>
>>>
>>>
>>> b) When executing two backups at the same time, a FATAL error was raised
>>> due to max_wal_senders, but instead of exiting, the backup completed.
>>>
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D
>>>  /home/edb/Desktop/backup1/
>>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>>> mode
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
>>> pg_basebackup: backup worker (0) created
>>> pg_basebackup: backup worker (1) created
>>> pg_basebackup: backup worker (2) created
>>> pg_basebackup: backup worker (3) created
>>> pg_basebackup: backup worker (4) created
>>> pg_basebackup: backup worker (5) created
>>> pg_basebackup: backup worker (6) created
>>> pg_basebackup: error: could not connect to server: FATAL:  number of
>>> requested standby connections exceeds max_wal_senders (currently 10)
>>> Segmentation fault (core dumped)
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$ gdb pg_basebackup
>>> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
>>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>>> Copyright (C) 2013 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <
>>> http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show
>>> copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> Reading symbols from
>>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>>> [New LWP 17041]
>>> [New LWP 17067]
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `./pg_basebackup -v -j 8 -D
>>> /home/edb/Desktop/backup1/'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
>>> (gdb) bt
>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>>> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
>>> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
>>> at exit.c:77
>>> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
>>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
>>> at pg_basebackup.c:2811
>>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
>>> pg_basebackup.c:2765
>>> (gdb)
>>>
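
Both backtraces above end in pthread_join(threadid=0), reached from an
exit handler after worker creation failed partway through. One possible
guard, shown only as a sketch, is to remember which workers were really
started and join only those. Worker, start_workers() and
cleanup_started_workers() are hypothetical names and are not taken from
pg_basebackup.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 8

typedef struct Worker
{
    pthread_t thread;
    bool      started;   /* set only after pthread_create() succeeded */
} Worker;

static Worker workers[MAX_WORKERS];

/* Trivial worker body; a real worker would fetch files here. */
static void *
worker_main(void *arg)
{
    (void) arg;
    return NULL;
}

/* Start up to n workers; stops at the first failure. Returns the count started. */
int
start_workers(int n)
{
    int i;

    assert(n >= 0 && n <= MAX_WORKERS);
    for (i = 0; i < n; i++)
    {
        if (pthread_create(&workers[i].thread, NULL, worker_main, NULL) != 0)
            break;
        workers[i].started = true;
    }
    return i;
}

/*
 * Join only the workers that were actually created. Safe to call from an
 * exit handler even when setup failed early, and safe to call twice.
 * Returns the number of threads joined.
 */
int
cleanup_started_workers(void)
{
    int joined = 0;
    int i;

    for (i = 0; i < MAX_WORKERS; i++)
    {
        if (!workers[i].started)
            continue;            /* never created: nothing to join */
        pthread_join(workers[i].thread, NULL);
        workers[i].started = false;
        joined++;
    }
    return joined;
}
```

With a flag per worker, the cleanup path never passes an uninitialized
pthread_t to pthread_join(), whichever worker failed to connect.
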
>>>
>>>
>>>
>>> 2) The following bug is not fixed yet
>>>
>>> A similar case: when the DB server is shut down while the parallel
>>> backup is in progress, the correct error is displayed, but the backup
>>> folder is not cleaned and a corrupt backup is left behind.
>>>
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>> -j 8
>>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>>> mode
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
>>> pg_basebackup: backup worker (0) created
>>> pg_basebackup: backup worker (1) created
>>> pg_basebackup: backup worker (2) created
>>> pg_basebackup: backup worker (3) created
>>> pg_basebackup: backup worker (4) created
>>> pg_basebackup: backup worker (5) created
>>> pg_basebackup: backup worker (6) created
>>> pg_basebackup: backup worker (7) created
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> pg_basebackup: removing contents of data directory
>>> "/home/edb/Desktop/backup/"
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$
>>>
>>>
>>>
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>>> base         pg_hba.conf    pg_logical    pg_notify    pg_serial
>>> pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
>>> pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
>>>  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
>>> [edb@localhost bin]$
>>> [edb@localhost bin]$
>>>
>>>
>>>
>>>
>>> Thanks
>>> Kashif Zeeshan
>>>
>>>>
>>>>
>>>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
>>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
>>>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>>>
>>>>>> Hi Asif
>>>>>>
>>>>>> When a non-existent slot is used with tablespace then correct error
>>>>>> is displayed but then the backup folder is not cleaned and leaves a corrupt
>>>>>> backup.
>>>>>>
>>>>>> Steps
>>>>>> =======
>>>>>>
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
>>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
>>>>>> [edb@localhost bin]$
>>>>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
>>>>>> CREATE TABLESPACE
>>>>>> postgres=#
>>>>>> postgres=# create table t1 (a int) tablespace tbl1;
>>>>>> CREATE TABLE
>>>>>> postgres=# insert into t1 values(100);
>>>>>> INSERT 0 1
>>>>>> postgres=# insert into t1 values(200);
>>>>>> INSERT 0 1
>>>>>> postgres=# insert into t1 values(300);
>>>>>> INSERT 0 1
>>>>>> postgres=#
>>>>>>
>>>>>>
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
>>>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>>> complete
>>>>>> pg_basebackup: checkpoint completed
>>>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>>>>>> pg_basebackup: starting background WAL receiver
>>>>>> pg_basebackup: error: could not send replication command
>>>>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
>>>>>> pg_basebackup: backup worker (0) created
>>>>>> pg_basebackup: backup worker (1) created
>>>>>> pg_basebackup: write-ahead log end point: 0/2E000100
>>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>>> pg_basebackup: error: child thread exited with error 1
>>>>>> [edb@localhost bin]$
>>>>>>
>>>>>> backup folder not cleaned
>>>>>>
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>>>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
>>>>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
>>>>>>             postgresql.conf
>>>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
>>>>>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
>>>>>>  postgresql.auto.conf
>>>>>> [edb@localhost bin]$
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> If the same case is executed without the parallel backup patch then
>>>>>> the backup folder is cleaned after the error is displayed.
>>>>>>
>>>>>> [edb@localhost bin]$ ./pg_basebackup -v -D
>>>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>>> complete
>>>>>> pg_basebackup: checkpoint completed
>>>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>>>>>> pg_basebackup: starting background WAL receiver
>>>>>> pg_basebackup: error: could not send replication command
>>>>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
>>>>>> pg_basebackup: write-ahead log end point: 0/2B000100
>>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>>> pg_basebackup: error: child process exited with exit code 1
>>>>>> pg_basebackup: removing data directory " /home/edb/Desktop/backup"
>>>>>> pg_basebackup: changes to tablespace directories will not be undone
>>>>>>
>>>>>
>>>>>
>>>>> Hi Asif
>>>>>
>>>>> A similar case: when the DB server is shut down while the parallel
>>>>> backup is in progress, the correct error is displayed, but the backup
>>>>> folder is not cleaned and a corrupt backup is left behind. I think one
>>>>> bug fix will solve all these cases where cleanup is not done when a
>>>>> parallel backup fails.
>>>>>
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$  ./pg_basebackup -v -D
>>>>>  /home/edb/Desktop/backup/ -j 8
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>> complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
>>>>> pg_basebackup: backup worker (0) created
>>>>> pg_basebackup: backup worker (1) created
>>>>> pg_basebackup: backup worker (2) created
>>>>> pg_basebackup: backup worker (3) created
>>>>> pg_basebackup: backup worker (4) created
>>>>> pg_basebackup: backup worker (5) created
>>>>> pg_basebackup: backup worker (6) created
>>>>> pg_basebackup: backup worker (7) created
>>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>>> connection unexpectedly
>>>>> This probably means the server terminated abnormally
>>>>> before or while processing the request.
>>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>>> connection unexpectedly
>>>>> This probably means the server terminated abnormally
>>>>> before or while processing the request.
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$
>>>>>
>>>>> When the same case is executed with pg_basebackup without the parallel
>>>>> backup patch, proper cleanup is done.
>>>>>
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$  ./pg_basebackup -v -D
>>>>>  /home/edb/Desktop/backup/
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>> complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
>>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>>> connection unexpectedly
>>>>> This probably means the server terminated abnormally
>>>>> before or while processing the request.
>>>>> pg_basebackup: removing contents of data directory
>>>>> "/home/edb/Desktop/backup/"
>>>>> [edb@localhost bin]$
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>>
>>>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >> Why would you need to do that? As long as the process where
>>>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>>>>>> >
>>>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
>>>>>>>> not while the backup is
>>>>>>>> > in progress. So if the backup is a large one, early error
>>>>>>>> detection would be much beneficial.
>>>>>>>> > This is the current behavior of non-parallel backup as well.
>>>>>>>>
>>>>>>>> Because non-parallel backup does not feature early detection of this
>>>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>>>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>>>>>> thread in a separate patch. A patch proposing to make parallel
>>>>>>>> backup
>>>>>>>> inconsistent in behavior with non-parallel backup will be rejected,
>>>>>>>> at
>>>>>>>> least if I have anything to say about it.
>>>>>>>>
>>>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>>>>>> current situation is not great, but promotions ought to be
>>>>>>>> relatively
>>>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>>>>>> also worth considering whether the right fix is to figure out how to
>>>>>>>> make that case actually work, rather than just making it fail
>>>>>>>> quicker.
>>>>>>>> I don't currently understand the reason for the prohibition so I
>>>>>>>> can't
>>>>>>>> express an intelligent opinion on what the right answer is here, but
>>>>>>>> it seems like it ought to be investigated before somebody goes and
>>>>>>>> builds a bunch of infrastructure to make the error more timely.
>>>>>>>>
>>>>>>>
>>>>>>> Non-parallel backup already does the early error checking. I only
>>>>>>> intended to make parallel behave the same as non-parallel here. So, I
>>>>>>> agree with you that the behavior of parallel backup should be
>>>>>>> consistent with the non-parallel one. Please see the code snippet
>>>>>>> below from basebackup.c:sendDir()
>>>>>>>
>>>>>>>     /*
>>>>>>>      * Check if the postmaster has signaled us to exit, and abort with an
>>>>>>>      * error in that case. The error handler further up will call
>>>>>>>      * do_pg_abort_backup() for us. Also check that if the backup was
>>>>>>>      * started while still in recovery, the server wasn't promoted.
>>>>>>>      * do_pg_stop_backup() will check that too, but it's better to stop
>>>>>>>      * the backup early than continue to the end and fail there.
>>>>>>>      */
>>>>>>>     CHECK_FOR_INTERRUPTS();
>>>>>>>     if (RecoveryInProgress() != backup_started_in_recovery)
>>>>>>>         ereport(ERROR,
>>>>>>>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>>>                  errmsg("the standby was promoted during online backup"),
>>>>>>>                  errhint("This means that the backup being taken is corrupt "
>>>>>>>                          "and should not be used. "
>>>>>>>                          "Try taking another online backup.")));
>>>>>>>>
>>>>>>>>
>>>>>>>> > Okay, then I will add the shared state. And since we are adding
>>>>>>>> the shared state, we can use
>>>>>>>> > that for throttling, progress-reporting and standby early error
>>>>>>>> checking.
>>>>>>>>
>>>>>>>> Please propose a grammar here for all the new replication commands
>>>>>>>> you
>>>>>>>> plan to add before going and implement everything. That will make it
>>>>>>>> easier to hash out the design without forcing you to keep changing
>>>>>>>> the
>>>>>>>> code. Your design should include a sketch of how several sets of
>>>>>>>> coordinating backends taking several concurrent parallel backups
>>>>>>>> will
>>>>>>>> end up with one shared state per parallel backup.
>>>>>>>>
>>>>>>>> > There are two possible options:
>>>>>>>> >
>>>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string>
>>>>>>>> OR
>>>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>>>>>> >
>>>>>>>> > This BackupID should be given back as a response to start backup
>>>>>>>> command. All client workers
>>>>>>>> > must append this ID to all parallel backup replication commands.
>>>>>>>> So that we can use this identifier
>>>>>>>> > to search for that particular backup. Does that sound good?
>>>>>>>>
>>>>>>>> Using the WAL start location as the backup ID seems like it might be
>>>>>>>> problematic -- could a single checkpoint not end up as the start
>>>>>>>> location for multiple backups started at the same time? Whether
>>>>>>>> that's
>>>>>>>> possible now or not, it seems unwise to hard-wire that assumption
>>>>>>>> into
>>>>>>>> the wire protocol.
>>>>>>>>
>>>>>>>> I was thinking that perhaps the client should generate a unique
>>>>>>>> backup
>>>>>>>> ID, e.g. leader does:
>>>>>>>>
>>>>>>>> START_BACKUP unique_backup_id [options]...
>>>>>>>>
>>>>>>>> And then others do:
>>>>>>>>
>>>>>>>> JOIN_BACKUP unique_backup_id
>>>>>>>>
>>>>>>>> My thought is that you will have a number of shared memory structure
>>>>>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>>>>>> state for one backup. The shared state will include
>>>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>>>>>> JOIN_BACKUP would search for one by name.
>>>>>>>>
>>>>>>>> If you want to generate the name on the server side, then I suppose
>>>>>>>> START_BACKUP would return a result set that includes the backup ID,
>>>>>>>> and clients would have to specify that same backup ID when invoking
>>>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way
>>>>>>>> is
>>>>>>>> better. Either way, the backup ID should be something long and hard
>>>>>>>> to
>>>>>>>> guess, not e.g. the leader processes' PID. I think we should
>>>>>>>> generate
>>>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode
>>>>>>>> the
>>>>>>>> result to get a string. That way there's almost no risk of two
>>>>>>>> backup
>>>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>>>>>> user trying to screw up somebody else's parallel backup by choosing
>>>>>>>> a
>>>>>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>>>>>> user with enough access to do that sort of thing can probably cause
>>>>>>>> a
>>>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>>>>>> intentional collisions robustly here, so I think we should.
>>>>>>>>
>>>>>>>>
>>>>>>> Okay, so if we add another replication command, ‘JOIN_BACKUP
>>>>>>> unique_backup_id’, to let workers find the relevant shared state,
>>>>>>> there won't be any need to change the grammar for any other command.
>>>>>>> START_BACKUP can return the unique_backup_id in the result set.
>>>>>>>
>>>>>>> I am thinking of the following struct for shared state:
>>>>>>>
>>>>>>> typedef struct
>>>>>>> {
>>>>>>>     char        backupid[NAMEDATALEN];
>>>>>>>     XLogRecPtr  startptr;
>>>>>>>
>>>>>>>     slock_t     lock;
>>>>>>>     int64       throttling_counter;
>>>>>>>     bool        backup_started_in_recovery;
>>>>>>> } BackupSharedState;
>>>>>>>>
>>>>>>>>
>>>>>>> The shared state structure entries would be maintained by a shared
>>>>>>> hash table. There will be one structure per parallel backup. Since a
>>>>>>> single parallel backup can engage more than one WAL sender, I think
>>>>>>> max_wal_senders might be a little too much; perhaps
>>>>>>> max_wal_senders/2, since there will be at least 2 connections per
>>>>>>> parallel backup? Alternatively, we can set a new GUC that defines the
>>>>>>> maximum number of concurrent parallel backups, i.e.
>>>>>>> ‘max_concurent_backups_allowed = 10’ perhaps, or we can make it
>>>>>>> user-configurable.
>>>>>>>
>>>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
>>>>>>>
>>>>>>> Checking for Standby Promotion:
>>>>>>> At the START_BACKUP command, we initialize
>>>>>>> BackupSharedState.backup_started_in_recovery
>>>>>>> and keep checking it whenever send_file () is called to send a new
>>>>>>> file.
>>>>>>>
>>>>>>> Throttling:
>>>>>>> BackupSharedState.throttling_counter - The throttling logic remains
>>>>>>> the same
>>>>>>> as for non-parallel backup with the exception that multiple threads
>>>>>>> will now be
>>>>>>> updating it. So in parallel backup, this will represent the overall
>>>>>>> bytes that
>>>>>>> have been transferred. So the workers would sleep if they have
>>>>>>> exceeded the
>>>>>>> limit. Hence, the shared state carries a lock to safely update the
>>>>>>> throttling
>>>>>>> value atomically.
>>>>>>>
>>>>>>> Progress Reporting:
>>>>>>> I think we should add progress reporting for parallel backup as a
>>>>>>> separate patch. The relevant entries for progress reporting, such as
>>>>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
>>>>>>> structure as well.
>>>>>>>
>>>>>>>
>>>>>>> Grammar:
>>>>>>> There is a change in the resultset being returned for START_BACKUP
>>>>>>> command;
>>>>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
>>>>>>> command is
>>>>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
>>>>>>> changes
>>>>>>> to the grammar.
>>>>>>>
>>>>>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>>>>>   - returns startptr, tli, backup_label, unique_backup_id
>>>>>>> STOP_BACKUP [NOWAIT]
>>>>>>>   - returns startptr, tli, backup_label
>>>>>>> JOIN_BACKUP ‘unique_backup_id’
>>>>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
>>>>>>> backend process.
>>>>>>>
>>>>>>> LIST_TABLESPACES [PROGRESS]
>>>>>>> LIST_FILES [TABLESPACE]
>>>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>>>>>
>>>>>>>
>>>
>>
>> Hi,
>>
>> rebased and updated to the current master (8128b0c1). v13 is attached.
>>
>> - Fixes the above reported issues.
>>
>> - Added progress-reporting support for parallel:
>> For this, 'backup_streamed' is moved to a shared structure (BackupState)
>> as
>> pg_atomic_uint64 variable. The worker processes will keep incrementing
>> this
>> variable.
>>
>> While files are being transferred from server to client, the main process
>> remains in an idle state. So after each increment, the worker process will
>> signal the master to update the stats in the pg_stat_progress_basebackup
>> view.
>>
>> The 'tablespace_streamed' column is not updated and will remain empty.
>> This is
>> because multiple workers may be copying files from different tablespaces.
>>
>>
>> - Added backup manifest:
>> The backend workers each maintain their own manifest file, which contains
>> a list of the files transferred by that worker. Once all backup files are
>> transferred, the workers will create a temp file named
>> ('pg_tempdir/temp_file_prefix_backupid.workerid')
>> to write the content of the manifest file from BufFile. The workers won’t
>> add the header or the WAL information to their manifests. These two will
>> be added by the main process while merging all worker manifest files.
>>
>> The main process will read these individual files and concatenate them
>> into a single file
>> which is then sent back to the client.
>>
>> The manifest file is created when the following command is received:
>>
>>>     BUILD_MANIFEST 'backupid'
>>
>>
>> This is a new replication command. It is sent when pg_basebackup has
>> copied all the
>> $PGDATA files including WAL files.
>>
>>
>>
>> --
>> Asif Rehman
>> Highgo Software (Canada/China/Pakistan)
>> URL : www.highgo.ca
>>
>>
>
> --
> Regards
> ====================================
> Kashif Zeeshan
> Lead Quality Assurance Engineer / Manager
>
> EnterpriseDB Corporation
> The Enterprise Postgres Company
>
>

-- 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-14T20:49:04Z

On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.

+typedef struct
+{
...
+} BackupFile;
+
+typedef struct
+{
...
+} BackupState;

These structures need comments.

+list_wal_files_opt_list:
+                       SCONST SCONST
                                {
-                                 $$ = makeDefElem("manifest_checksums",
-                                                  (Node *)makeString($2), -1);
+                                       $$ = list_make2(
+                                       makeDefElem("start_wal_location",
+                                               (Node *)makeString($2), -1),
+                                       makeDefElem("end_wal_location",
+                                               (Node *)makeString($2), -1));
+
                                }

This seems like an unnecessarily complicated parse representation. The
DefElems seem to be completely unnecessary here.

@@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
                set_ps_display(activitymsg);
        }

-       perform_base_backup(&opt);
+       switch (cmd->cmdtag)

So the design here is that SendBaseBackup() is now going to do a bunch
of things that are NOT sending a base backup? With no updates to the
comments of that function and no change to the process title it sets?

-       return (manifest->buffile != NULL);
+       return (manifest && manifest->buffile != NULL);

Heck no. It appears that you didn't even bother reading the function
header comment.

+ * Send a single resultset containing XLogRecPtr record (in text format)
+ * TimelineID and backup label.
  */
 static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
+                                        StringInfo label, char *backupid)

This just casually breaks wire protocol compatibility, which seems
completely unacceptable.

+       if (strlen(opt->tablespace) > 0)
+               sendTablespace(opt->tablespace, NULL, true, NULL, &files);
+       else
+               sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
+
+       SendFilesHeader(files);

So I guess the idea here is that we buffer the entire list of files in
memory, regardless of size, and then we send it out afterwards. That
doesn't seem like a good idea. The list of files might be very large.
We probably need some code refactoring here rather than just piling
more and more different responsibilities onto sendTablespace() and
sendDir().
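
To make the alternative concrete, here is a very rough sketch of one way to
avoid holding the whole list: accumulate names in a small fixed-size batch and
hand each full batch to a sender callback. This is only an illustration under
assumptions, not code from the patch. DemoFileBatch, demo_batch_add() and the
counting sink are hypothetical names, and a real implementation would emit
protocol rows where the sink here merely counts.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BATCH_SIZE 4       /* deliberately tiny, for illustration */

/* Called with each full (or final partial) batch of file names. */
typedef void (*demo_flush_fn) (const char *const *names, size_t count,
                               void *arg);

typedef struct
{
    const char *names[DEMO_BATCH_SIZE];
    size_t      used;
    demo_flush_fn flush;
    void       *flush_arg;
} DemoFileBatch;

static void
demo_batch_init(DemoFileBatch *batch, demo_flush_fn flush, void *flush_arg)
{
    batch->used = 0;
    batch->flush = flush;
    batch->flush_arg = flush_arg;
}

/* Send whatever is buffered; a no-op when the batch is empty. */
static void
demo_batch_flush(DemoFileBatch *batch)
{
    if (batch->used > 0)
    {
        batch->flush(batch->names, batch->used, batch->flush_arg);
        batch->used = 0;
    }
}

/* Buffer one name; memory use never exceeds DEMO_BATCH_SIZE entries. */
static void
demo_batch_add(DemoFileBatch *batch, const char *name)
{
    assert(batch->used < DEMO_BATCH_SIZE);
    batch->names[batch->used++] = name;
    if (batch->used == DEMO_BATCH_SIZE)
        demo_batch_flush(batch);
}

/* Hypothetical sink: counts rows and flushes instead of sending them. */
typedef struct
{
    size_t      rows;
    size_t      flushes;
} DemoCountingSink;

static void
demo_count_flush(const char *const *names, size_t count, void *arg)
{
    DemoCountingSink *sink = arg;

    (void) names;               /* a real sink would send these names */
    sink->rows += count;
    sink->flushes++;
}
```

The directory walk would call demo_batch_add() per entry and
demo_batch_flush() once at the end, so the memory needed is bounded by the
batch size rather than by the number of files in the cluster.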

+       if (state->parallel_mode)
+               SpinLockAcquire(&state->lock);
+
+       state->throttling_counter += increment;
+
+       if (state->parallel_mode)
+               SpinLockRelease(&state->lock);

I don't like this much. It seems to me that we would do better to use
atomics here all the time, instead of conditional spinlocks.
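
As a minimal sketch of the atomics-only approach, assuming a shared counter
that every sender bumps on the same code path: the example below uses
standard C11 atomics so that it compiles on its own, whereas backend code
would presumably use PostgreSQL's pg_atomic_uint64 and
pg_atomic_fetch_add_u64() instead. DemoBackupState and demo_throttle_add()
are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's shared backup state. */
typedef struct
{
    _Atomic int64_t throttling_counter; /* bytes sent since last sleep */
} DemoBackupState;

static void
demo_state_init(DemoBackupState *state)
{
    atomic_init(&state->throttling_counter, 0);
}

/*
 * Add 'increment' bytes to the shared counter and report whether this
 * caller's update reached the limit. The same code runs in parallel and
 * non-parallel mode; there is no conditional locking.
 */
static bool
demo_throttle_add(DemoBackupState *state, int64_t increment, int64_t limit)
{
    int64_t     before;

    assert(increment >= 0);
    before = atomic_fetch_add(&state->throttling_counter, increment);
    return before + increment >= limit;
}
```

Because the add is a single atomic read-modify-write, a lone process pays
almost nothing for it, which is what makes it reasonable to use
unconditionally.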

+static void
+send_file(basebackup_options *opt, char *file, bool missing_ok)
...
+       if (file == NULL)
+               return;

That seems totally inappropriate.

+                       sendFile(file, file + basepathlen, &statbuf, true, InvalidOid, NULL, NULL);

Maybe I'm misunderstanding, but this looks like it's going to write a
tar header, even though we're not writing a tarfile.

+               else
+                       ereport(WARNING,
+                                       (errmsg("skipping special file or directory \"%s\"", file)));

So, if the user asks for a directory or symlink, what's going to
happen is that they're going to receive an empty file, and get a
warning. That sounds like terrible behavior.
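
A sketch of stricter behavior, assuming the desired outcome is a hard failure
rather than a warning plus an empty file: classify the path up front and
refuse anything that is not a regular file. DemoSendCheck and
demo_check_sendable() are hypothetical names; in the backend the error case
would be raised through ereport(ERROR, ...) rather than returned. Treating a
NULL file name as a caller bug, instead of silently returning, is likewise an
assumption about what the right fix is.

```c
#define _POSIX_C_SOURCE 200809L /* for lstat() */

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

typedef enum
{
    DEMO_SEND_OK,               /* regular file: safe to stream contents */
    DEMO_SEND_MISSING,          /* file is gone and the caller allows that */
    DEMO_SEND_ERROR             /* anything else: refuse, send nothing */
} DemoSendCheck;

static DemoSendCheck
demo_check_sendable(const char *path, bool missing_ok)
{
    struct stat st;

    /* Assumption: a NULL name is a programming error, not a no-op. */
    assert(path != NULL);

    /* lstat() so that a symlink is reported as a symlink, not its target. */
    if (lstat(path, &st) != 0)
    {
        if (errno == ENOENT && missing_ok)
            return DEMO_SEND_MISSING;
        return DEMO_SEND_ERROR;
    }

    /* Directories, symlinks and other special files are rejected. */
    return S_ISREG(st.st_mode) ? DEMO_SEND_OK : DEMO_SEND_ERROR;
}
```

With a check like this, a request for a directory or symlink fails outright,
so the client never receives an empty file that looks like a valid result.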

+       /*
+        * Check for checksum failures. If there are failures across multiple
+        * processes it may not report total checksum count, but it will error
+        * out,terminating the backup.
+        */

In other words, the patch breaks the feature. Not that the feature in
question works particularly well as things stand, but this makes it
worse.

I think this patch (0003) is in really bad shape. I'm having second
thoughts about the design, but it's kind of hard to even have a
discussion about the design when the patch is riddled with minor
problems like inadequate comments, failure to update existing
comments, and breaking a bunch of things. I understand that sometimes
things get missed, but this is version 14 of a patch that's been
kicking around since last August.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Ahsan Hadi <ahsan.hadi@gmail.com> — 2020-04-15T08:49:39Z

On Wed, 15 Apr 2020 at 1:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
> > I forgot to make a check for no-manifest. Fixed. Attached is the updated
> patch.
>
> +typedef struct
> +{
> ...
> +} BackupFile;
> +
> +typedef struct
> +{
> ...
> +} BackupState;
>
> These structures need comments.
>
> +list_wal_files_opt_list:
> +                       SCONST SCONST
>                                 {
> -                                 $$ = makeDefElem("manifest_checksums",
> -
> (Node *)makeString($2), -1);
> +                                       $$ = list_make2(
> +                                       makeDefElem("start_wal_location",
> +                                               (Node *)makeString($2),
> -1),
> +                                       makeDefElem("end_wal_location",
> +                                               (Node *)makeString($2),
> -1));
> +
>                                 }
>
> This seems like an unnecessarily complicated parse representation. The
> DefElems seem to be completely unnecessary here.
>
> @@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
>                 set_ps_display(activitymsg);
>         }
>
> -       perform_base_backup(&opt);
> +       switch (cmd->cmdtag)
>
> So the design here is that SendBaseBackup() is now going to do a bunch
> of things that are NOT sending a base backup? With no updates to the
> comments of that function and no change to the process title it sets?
>
> -       return (manifest->buffile != NULL);
> +       return (manifest && manifest->buffile != NULL);
>
> Heck no. It appears that you didn't even bother reading the function
> header comment.
>
> + * Send a single resultset containing XLogRecPtr record (in text format)
> + * TimelineID and backup label.
>   */
>  static void
> -SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
> +SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
> +                                        StringInfo label, char *backupid)
>
> This just casually breaks wire protocol compatibility, which seems
> completely unacceptable.
>
> +       if (strlen(opt->tablespace) > 0)
> +               sendTablespace(opt->tablespace, NULL, true, NULL, &files);
> +       else
> +               sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
> +
> +       SendFilesHeader(files);
>
> So I guess the idea here is that we buffer the entire list of files in
> memory, regardless of size, and then we send it out afterwards. That
> doesn't seem like a good idea. The list of files might be very large.
> We probably need some code refactoring here rather than just piling
> more and more different responsibilities onto sendTablespace() and
> sendDir().
>
> +       if (state->parallel_mode)
> +               SpinLockAcquire(&state->lock);
> +
> +       state->throttling_counter += increment;
> +
> +       if (state->parallel_mode)
> +               SpinLockRelease(&state->lock);
>
> I don't like this much. It seems to me that we would do better to use
> atomics here all the time, instead of conditional spinlocks.
>
> +static void
> +send_file(basebackup_options *opt, char *file, bool missing_ok)
> ...
> +       if (file == NULL)
> +               return;
>
> That seems totally inappropriate.
>
> +                       sendFile(file, file + basepathlen, &statbuf,
> true, InvalidOid, NULL, NULL);
>
> Maybe I'm misunderstanding, but this looks like it's going to write a
> tar header, even though we're not writing a tarfile.
>
> +               else
> +                       ereport(WARNING,
> +                                       (errmsg("skipping special file
> or directory \"%s\"", file)));
>
> So, if the user asks for a directory or symlink, what's going to
> happen is that they're going to receive an empty file, and get a
> warning. That sounds like terrible behavior.
>
> +       /*
> +        * Check for checksum failures. If there are failures across
> multiple
> +        * processes it may not report total checksum count, but it will
> error
> +        * out,terminating the backup.
> +        */
>
> In other words, the patch breaks the feature. Not that the feature in
> question works particularly well as things stand, but this makes it
> worse.
>
> I think this patch (0003) is in really bad shape. I'm having second
> thoughts about the design, but it's kind of hard to even have a
> discussion about the design when the patch is riddled with minor
> problems like inadequate comments, failure to update existing
> comments, and breaking a bunch of things. I understand that sometimes
> things get missed, but this is version 14 of a patch that's been
> kicking around since last August.


Fair enough. Some of this is also due to backup-related features, i.e. backup
manifest and progress reporting, that got committed to master towards the tail
end of PG-13. Rushing to make the parallel backup feature compatible with these
features also caused some of the oversights.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-15T09:28:37Z

Hi Asif,

In the scenario below, backup verification fails for a tablespace when the
backup is taken with the parallel option.
Without parallel, pg_verifybackup passes for the same scenario without
any error.

[edb@localhost bin]$ mkdir /tmp/test_bkp/tblsp1
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp1
location '/tmp/test_bkp/tblsp1';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text)
tablespace tblsp1;"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values
('parallel_backup with -T tablespace option');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/test_bkp/bkp -T
/tmp/test_bkp/tblsp1=/tmp/test_bkp/tblsp2 -j 4
[edb@localhost bin]$ ./pg_verifybackup /tmp/test_bkp/bkp
pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16390" is
present on disk but not in the manifest
pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16388" is
present on disk but not in the manifest
pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16385" is
present on disk but not in the manifest
pg_verifybackup: error: "/PG_13_202004074/13530/16388" is present in the
manifest but not on disk
pg_verifybackup: error: "/PG_13_202004074/13530/16390" is present in the
manifest but not on disk
pg_verifybackup: error: "/PG_13_202004074/13530/16385" is present in the
manifest but not on disk

--without parallel backup
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/test_bkp/bkp1 -T
/tmp/test_bkp/tblsp1=/tmp/test_bkp/tblsp3 -j 1
[edb@localhost bin]$ ./pg_verifybackup /tmp/test_bkp/bkp1
backup successfully verified


Thanks & Regards,
Rajkumar Raghuwanshi


On Wed, Apr 15, 2020 at 2:19 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:

>
>
> On Wed, 15 Apr 2020 at 1:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>> > I forgot to make a check for no-manifest. Fixed. Attached is the
>> updated patch.
>>
>> +typedef struct
>> +{
>> ...
>> +} BackupFile;
>> +
>> +typedef struct
>> +{
>> ...
>> +} BackupState;
>>
>> These structures need comments.
>>
>> +list_wal_files_opt_list:
>> +                       SCONST SCONST
>>                                 {
>> -                                 $$ = makeDefElem("manifest_checksums",
>> -
>> (Node *)makeString($2), -1);
>> +                                       $$ = list_make2(
>> +                                       makeDefElem("start_wal_location",
>> +                                               (Node *)makeString($2),
>> -1),
>> +                                       makeDefElem("end_wal_location",
>> +                                               (Node *)makeString($2),
>> -1));
>> +
>>                                 }
>>
>> This seems like an unnecessarily complicated parse representation. The
>> DefElems seem to be completely unnecessary here.
>>
>> @@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
>>                 set_ps_display(activitymsg);
>>         }
>>
>> -       perform_base_backup(&opt);
>> +       switch (cmd->cmdtag)
>>
>> So the design here is that SendBaseBackup() is now going to do a bunch
>> of things that are NOT sending a base backup? With no updates to the
>> comments of that function and no change to the process title it sets?
>>
>> -       return (manifest->buffile != NULL);
>> +       return (manifest && manifest->buffile != NULL);
>>
>> Heck no. It appears that you didn't even bother reading the function
>> header comment.
>>
>> + * Send a single resultset containing XLogRecPtr record (in text format)
>> + * TimelineID and backup label.
>>   */
>>  static void
>> -SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
>> +SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
>> +                                        StringInfo label, char *backupid)
>>
>> This just casually breaks wire protocol compatibility, which seems
>> completely unacceptable.
>>
>> +       if (strlen(opt->tablespace) > 0)
>> +               sendTablespace(opt->tablespace, NULL, true, NULL, &files);
>> +       else
>> +               sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
>> +
>> +       SendFilesHeader(files);
>>
>> So I guess the idea here is that we buffer the entire list of files in
>> memory, regardless of size, and then we send it out afterwards. That
>> doesn't seem like a good idea. The list of files might be very large.
>> We probably need some code refactoring here rather than just piling
>> more and more different responsibilities onto sendTablespace() and
>> sendDir().
>>
>> +       if (state->parallel_mode)
>> +               SpinLockAcquire(&state->lock);
>> +
>> +       state->throttling_counter += increment;
>> +
>> +       if (state->parallel_mode)
>> +               SpinLockRelease(&state->lock);
>>
>> I don't like this much. It seems to me that we would do better to use
>> atomics here all the time, instead of conditional spinlocks.
>>
>> +static void
>> +send_file(basebackup_options *opt, char *file, bool missing_ok)
>> ...
>> +       if (file == NULL)
>> +               return;
>>
>> That seems totally inappropriate.
>>
>> +                       sendFile(file, file + basepathlen, &statbuf,
>> true, InvalidOid, NULL, NULL);
>>
>> Maybe I'm misunderstanding, but this looks like it's going to write a
>> tar header, even though we're not writing a tarfile.
>>
>> +               else
>> +                       ereport(WARNING,
>> +                                       (errmsg("skipping special file
>> or directory \"%s\"", file)));
>>
>> So, if the user asks for a directory or symlink, what's going to
>> happen is that they're going to receive an empty file, and get a
>> warning. That sounds like terrible behavior.
>>
>> +       /*
>> +        * Check for checksum failures. If there are failures across
>> multiple
>> +        * processes it may not report total checksum count, but it will
>> error
>> +        * out,terminating the backup.
>> +        */
>>
>> In other words, the patch breaks the feature. Not that the feature in
>> question works particularly well as things stand, but this makes it
>> worse.
>>
>> I think this patch (0003) is in really bad shape. I'm having second
>> thoughts about the design, but it's kind of hard to even have a
>> discussion about the design when the patch is riddled with minor
>> problems like inadequate comments, failure to update existing
>> comments, and breaking a bunch of things. I understand that sometimes
>> things get missed, but this is version 14 of a patch that's been
>> kicking around since last August.
>
>
> Fair enough. Some of this is also due to backup related features i.e
> backup manifest, progress reporting that got committed to master towards
> the tail end of PG-13. Rushing to get parallel backup feature compatible
> with these features also caused some of the oversights.
>
>
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>
>> --
> Highgo Software (Canada/China/Pakistan)
> URL : http://www.highgo.ca
> ADDR: 10318 WHALLEY BLVD, Surrey, BC
> EMAIL: mailto: ahsan.hadi@highgo.ca
>

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-15T13:31:22Z

On Wed, Apr 15, 2020 at 4:49 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
> Fair enough. Some of this is also due to backup related features i.e backup manifest, progress reporting that got committed to master towards the tail end of PG-13. Rushing to get parallel backup feature compatible with these features also caused some of the oversights.

Sure, but there's also no point in rushing out a feature that's in a
state where it's got no chance of being acceptable, and quite a number
of these problems are not new, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-17T06:00:55Z

On Tue, Apr 14, 2020 at 7:37 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Tue, Apr 14, 2020 at 6:32 PM Kashif Zeeshan <
> kashif.zeeshan@enterprisedb.com> wrote:
>
>> Hi Asif
>>
>> Getting the following error on Parallel backup when --no-manifest option
>> is used.
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$  ./pg_basebackup -v -j 5  -D
>>  /home/edb/Desktop/backup/ --no-manifest
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_10223"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> pg_basebackup: backup worker (4) created
>> pg_basebackup: write-ahead log end point: 0/2000100
>> pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR:
>>  could not open file
>> "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file
>> or directory
>> pg_basebackup: removing contents of data directory
>> "/home/edb/Desktop/backup/"
>> [edb@localhost bin]$
>>
>
> I forgot to make a check for no-manifest. Fixed. Attached is the updated
> patch.
>
Hi Asif

Verified the fix, thanks.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D
/home/edb/Desktop/backup --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_27407"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/4000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label  pg_commit_ts  pg_ident.conf  pg_notify    pg_snapshots
pg_subtrans  PG_VERSION  postgresql.auto.conf
base          pg_dynshmem   pg_logical     pg_replslot  pg_stat
pg_tblspc    pg_wal      postgresql.conf
global        pg_hba.conf   pg_multixact   pg_serial    pg_stat_tmp
pg_twophase  pg_xact
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$

Regards
Kashif Zeeshan

>
>
>> Thanks
>>
>> On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>>>>>
>>>>> I have added the shared state as previously described. The new grammar
>>>>> changes
>>>>> are as follows:
>>>>>
>>>>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>>>>>     - This will generate a unique backupid using pg_strong_random(16)
>>>>>       and hex-encode it; the result is then returned in the result set.
>>>>>     - It will also create a shared state and add it to the hashtable.
>>>>>       The hash table size is set to BACKUP_HASH_SIZE=10, but since the
>>>>>       hashtable can expand dynamically, I think that's a sufficient
>>>>>       initial size. max_wal_senders is not used, because it can be set
>>>>>       to quite a large value.
>>>>>
>>>>> JOIN_BACKUP 'backup_id'
>>>>>     - finds 'backup_id' in hashtable and attaches it to server process.
>>>>>
>>>>>
>>>>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>>>>>     - renamed SEND_FILES to SEND_FILE
>>>>>     - removed START_WAL_LOCATION from this because 'startptr' is now
>>>>> accessible through
>>>>>       shared state.
>>>>>
>>>>> There is no change in other commands:
>>>>> STOP_BACKUP [NOWAIT]
>>>>> LIST_TABLESPACES [PROGRESS]
>>>>> LIST_FILES [TABLESPACE]
>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>>
>>>>> The current patches (v11) have been rebased to the latest master. The
>>>>> backup manifest is enabled by default, so I have disabled it for
>>>>> parallel backup mode and emit a warning so that the user is aware of
>>>>> this and does not expect a manifest in the backup.
>>>>>
>>>> Hi Asif
>>>>
>>>> I have verified the bug fixes; one bug is fixed and now works as
>>>> expected.
>>>>
>>>> While verifying the other bug fixes I ran into the following issues;
>>>> please have a look.
>>>>
>>>>
>>>> 1) The fixes for the following two bugs now produce a segmentation
>>>> fault.
>>>>
>>>> For reference I have included only a description of each bug, since the
>>>> reproduction steps were given in earlier emails. A backtrace is included
>>>> for each case; both point to the same bug.
>>>>
>>>> a) The backup failed with the error "error: could not connect to server:
>>>> could not look up local user ID 1000: Too many open files" when
>>>> max_wal_senders was set to 2000.
>>>>
>>>>
>>>> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
>>>> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
>>>> pg_basebackup: backup worker (0) created
>>>> pg_basebackup: backup worker (1) created
>>>> pg_basebackup: backup worker (2) created
>>>> pg_basebackup: backup worker (3) created
>>>> ….
>>>> ….
>>>> pg_basebackup: backup worker (1014) created
>>>> pg_basebackup: backup worker (1015) created
>>>> pg_basebackup: backup worker (1016) created
>>>> pg_basebackup: backup worker (1017) created
>>>> pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
>>>> Segmentation fault
>>>> [edb@localhost bin]$
>>>>
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
>>>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>>>> Copyright (C) 2013 Free Software Foundation, Inc.
>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>>> This is free software: you are free to change and redistribute it.
>>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>>> and "show warranty" for details.
>>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>>> For bug reporting instructions, please see:
>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>> Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>>>> [New LWP 13219]
>>>> [New LWP 13222]
>>>> [Thread debugging using libthread_db enabled]
>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>> Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
>>>> (gdb) bt
>>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>>>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>>>> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
>>>> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
>>>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
>>>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>>>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
>>>> (gdb)
>>>>
>>>>
>>>>
>>>>
>>>> b) When two backups were executed at the same time, a FATAL error was
>>>> raised due to max_wal_senders, but instead of exiting, the backup ran
>>>> to completion.
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
>>>> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
>>>> pg_basebackup: backup worker (0) created
>>>> pg_basebackup: backup worker (1) created
>>>> pg_basebackup: backup worker (2) created
>>>> pg_basebackup: backup worker (3) created
>>>> pg_basebackup: backup worker (4) created
>>>> pg_basebackup: backup worker (5) created
>>>> pg_basebackup: backup worker (6) created
>>>> pg_basebackup: error: could not connect to server: FATAL:  number of requested standby connections exceeds max_wal_senders (currently 10)
>>>> Segmentation fault (core dumped)
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
>>>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>>>> Copyright (C) 2013 Free Software Foundation, Inc.
>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>>> This is free software: you are free to change and redistribute it.
>>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>>> and "show warranty" for details.
>>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>>> For bug reporting instructions, please see:
>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>> Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>>>> [New LWP 17041]
>>>> [New LWP 17067]
>>>> [Thread debugging using libthread_db enabled]
>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>> Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
>>>> (gdb) bt
>>>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>>>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>>>> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
>>>> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
>>>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
>>>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>>>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
>>>> (gdb)
>>>>
>>>>
>>>>
>>>>
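Both backtraces above end in pthread_join() being called with threadid=0
from cleanup_workers(), which runs as an exit handler after
create_parallel_workers() calls exit(). One plausible reading is that the
cleanup path tries to join worker slots that were never started. The sketch
below shows one way to guard against that. It is an illustration only, not
the patch's code: WorkerSlot, start_workers() and cleanup_started_workers()
are hypothetical names, and the fail_at parameter merely simulates a
connection failure part-way through worker creation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 8

/* Hypothetical per-worker bookkeeping; not taken from the patch. */
typedef struct WorkerSlot
{
    pthread_t   thread;
    bool        started;    /* set only after pthread_create() succeeds */
} WorkerSlot;

static WorkerSlot workers[MAX_WORKERS];

/* Trivial worker body standing in for the real file-copy loop. */
static void *
worker_main(void *arg)
{
    (void) arg;
    return NULL;
}

/*
 * Start up to 'requested' workers and return how many were started.
 * 'fail_at' simulates a failure (e.g. a refused connection) before the
 * worker with that index is created; pass -1 for no simulated failure.
 */
static int
start_workers(int requested, int fail_at)
{
    int         started = 0;

    assert(requested <= MAX_WORKERS);
    for (int i = 0; i < requested; i++)
    {
        if (i == fail_at)
            break;
        if (pthread_create(&workers[i].thread, NULL, worker_main, NULL) != 0)
            break;
        workers[i].started = true;
        started++;
    }
    return started;
}

/*
 * Join only the workers that were actually started, so the function is
 * safe to call from an exit handler even after a partial startup.
 * Returns the number of workers joined.
 */
static int
cleanup_started_workers(void)
{
    int         joined = 0;

    for (int i = 0; i < MAX_WORKERS; i++)
    {
        if (!workers[i].started)
            continue;
        pthread_join(workers[i].thread, NULL);
        workers[i].started = false;
        joined++;
    }
    return joined;
}
```

Because each slot is marked as started only after its thread exists, and
cleared once joined, calling the cleanup function a second time is a no-op.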
>>>> 2) The following bug is not fixed yet
>>>>
>>>> When the DB server is shut down while a parallel backup is in progress,
>>>> the correct error is displayed, but the backup folder is not cleaned
>>>> up, which leaves a corrupt backup behind.
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
>>>> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
>>>> pg_basebackup: backup worker (0) created
>>>> pg_basebackup: backup worker (1) created
>>>> pg_basebackup: backup worker (2) created
>>>> pg_basebackup: backup worker (3) created
>>>> pg_basebackup: backup worker (4) created
>>>> pg_basebackup: backup worker (5) created
>>>> pg_basebackup: backup worker (6) created
>>>> pg_basebackup: backup worker (7) created
>>>> pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>> pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>> pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
>>>> pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>>
>>>>
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>>>> base         pg_hba.conf    pg_logical    pg_notify    pg_serial     pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
>>>> pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>>
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Kashif Zeeshan
>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
>>>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
>>>>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>>>>
>>>>>>> Hi Asif
>>>>>>>
>>>>>>> When a non-existent slot is used together with a tablespace mapping,
>>>>>>> the correct error is displayed, but the backup folder is not cleaned
>>>>>>> up, which leaves a corrupt backup behind.
>>>>>>>
>>>>>>> Steps
>>>>>>> =======
>>>>>>>
>>>>>>> [edb@localhost bin]$
>>>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
>>>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
>>>>>>> [edb@localhost bin]$
>>>>>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
>>>>>>> CREATE TABLESPACE
>>>>>>> postgres=#
>>>>>>> postgres=# create table t1 (a int) tablespace tbl1;
>>>>>>> CREATE TABLE
>>>>>>> postgres=# insert into t1 values(100);
>>>>>>> INSERT 0 1
>>>>>>> postgres=# insert into t1 values(200);
>>>>>>> INSERT 0 1
>>>>>>> postgres=# insert into t1 values(300);
>>>>>>> INSERT 0 1
>>>>>>> postgres=#
>>>>>>>
>>>>>>>
>>>>>>> [edb@localhost bin]$
>>>>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>>>>> pg_basebackup: checkpoint completed
>>>>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>>>>>>> pg_basebackup: starting background WAL receiver
>>>>>>> pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test" does not exist
>>>>>>> pg_basebackup: backup worker (0) created
>>>>>>> pg_basebackup: backup worker (1) created
>>>>>>> pg_basebackup: write-ahead log end point: 0/2E000100
>>>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>>>> pg_basebackup: error: child thread exited with error 1
>>>>>>> [edb@localhost bin]$
>>>>>>>
>>>>>>> backup folder not cleaned
>>>>>>>
>>>>>>> [edb@localhost bin]$
>>>>>>> [edb@localhost bin]$
>>>>>>> [edb@localhost bin]$
>>>>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>>>>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact               postgresql.conf
>>>>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal      postgresql.auto.conf
>>>>>>> [edb@localhost bin]$
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> If the same case is executed without the parallel backup patch then
>>>>>>> the backup folder is cleaned after the error is displayed.
>>>>>>>
>>>>>>> [edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
>>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>>>>> pg_basebackup: checkpoint completed
>>>>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>>>>>>> pg_basebackup: starting background WAL receiver
>>>>>>> pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "test999" does not exist
>>>>>>> pg_basebackup: write-ahead log end point: 0/2B000100
>>>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>>>> pg_basebackup: error: child process exited with exit code 1
>>>>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>>>>>>> pg_basebackup: changes to tablespace directories will not be undone
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Asif
>>>>>>
>>>>>> A similar case: when the DB server is shut down while a parallel
>>>>>> backup is in progress, the correct error is displayed, but the backup
>>>>>> folder is not cleaned up, which leaves a corrupt backup behind. I
>>>>>> think a single fix will cover all of these cases where cleanup is not
>>>>>> done after a parallel backup fails.
>>>>>>
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$  ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>>>> pg_basebackup: checkpoint completed
>>>>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
>>>>>> pg_basebackup: starting background WAL receiver
>>>>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
>>>>>> pg_basebackup: backup worker (0) created
>>>>>> pg_basebackup: backup worker (1) created
>>>>>> pg_basebackup: backup worker (2) created
>>>>>> pg_basebackup: backup worker (3) created
>>>>>> pg_basebackup: backup worker (4) created
>>>>>> pg_basebackup: backup worker (5) created
>>>>>> pg_basebackup: backup worker (6) created
>>>>>> pg_basebackup: backup worker (7) created
>>>>>> pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
>>>>>> This probably means the server terminated abnormally
>>>>>> before or while processing the request.
>>>>>> pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
>>>>>> This probably means the server terminated abnormally
>>>>>> before or while processing the request.
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$
>>>>>>
>>>>>> When the same case is executed with pg_basebackup without the parallel
>>>>>> backup patch, proper cleanup is done.
>>>>>>
>>>>>> [edb@localhost bin]$
>>>>>> [edb@localhost bin]$  ./pg_basebackup -v -D /home/edb/Desktop/backup/
>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>>>> pg_basebackup: checkpoint completed
>>>>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
>>>>>> pg_basebackup: starting background WAL receiver
>>>>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
>>>>>> pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
>>>>>> This probably means the server terminated abnormally
>>>>>> before or while processing the request.
>>>>>> pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
>>>>>> [edb@localhost bin]$
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <
>>>>>>>>> asifr.rehman@gmail.com> wrote:
>>>>>>>>> >> Why would you need to do that? As long as the process where
>>>>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>>>>>>> >
>>>>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP, not
>>>>>>>>> > while the backup is in progress. So if the backup is a large one,
>>>>>>>>> > early error detection would be much beneficial. This is the current
>>>>>>>>> > behavior of non-parallel backup as well.
>>>>>>>>>
>>>>>>>>> Because non-parallel backup does not feature early detection of this
>>>>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>>>>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>>>>>>> thread in a separate patch. A patch proposing to make parallel backup
>>>>>>>>> inconsistent in behavior with non-parallel backup will be rejected, at
>>>>>>>>> least if I have anything to say about it.
>>>>>>>>>
>>>>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>>>>>>> current situation is not great, but promotions ought to be relatively
>>>>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>>>>>>> also worth considering whether the right fix is to figure out how to
>>>>>>>>> make that case actually work, rather than just making it fail quicker.
>>>>>>>>> I don't currently understand the reason for the prohibition so I can't
>>>>>>>>> express an intelligent opinion on what the right answer is here, but
>>>>>>>>> it seems like it ought to be investigated before somebody goes and
>>>>>>>>> builds a bunch of infrastructure to make the error more timely.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Non-parallel backup already does the early error checking. I only
>>>>>>>> intended to make parallel behave the same as non-parallel here. So, I
>>>>>>>> agree with you that the behavior of parallel backup should be
>>>>>>>> consistent with the non-parallel one.  Please see the code snippet
>>>>>>>> below from basebackup.c:sendDir()
>>>>>>>>
>>>>>>>>
>>>>>>>> /*
>>>>>>>>  * Check if the postmaster has signaled us to exit, and abort with an
>>>>>>>>  * error in that case. The error handler further up will call
>>>>>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>>>>>>>  * started while still in recovery, the server wasn't promoted.
>>>>>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
>>>>>>>>  * the backup early than continue to the end and fail there.
>>>>>>>>  */
>>>>>>>> CHECK_FOR_INTERRUPTS();
>>>>>>>> if (RecoveryInProgress() != backup_started_in_recovery)
>>>>>>>>     ereport(ERROR,
>>>>>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>>>>              errmsg("the standby was promoted during online backup"),
>>>>>>>>              errhint("This means that the backup being taken is corrupt "
>>>>>>>>                      "and should not be used. "
>>>>>>>>                      "Try taking another online backup.")));
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> > Okay, then I will add the shared state. And since we are adding the
>>>>>>>>> > shared state, we can use that for throttling, progress-reporting and
>>>>>>>>> > standby early error checking.
>>>>>>>>>
>>>>>>>>> Please propose a grammar here for all the new replication commands you
>>>>>>>>> plan to add before going and implement everything. That will make it
>>>>>>>>> easier to hash out the design without forcing you to keep changing the
>>>>>>>>> code. Your design should include a sketch of how several sets of
>>>>>>>>> coordinating backends taking several concurrent parallel backups will
>>>>>>>>> end up with one shared state per parallel backup.
>>>>>>>>>
>>>>>>>>> > There are two possible options:
>>>>>>>>> >
>>>>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
>>>>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>>>>>>> >
>>>>>>>>> > This BackupID should be given back as a response to start backup
>>>>>>>>> > command. All client workers must append this ID to all parallel
>>>>>>>>> > backup replication commands. So that we can use this identifier to
>>>>>>>>> > search for that particular backup. Does that sound good?
>>>>>>>>>
>>>>>>>>> Using the WAL start location as the backup ID seems like it might be
>>>>>>>>> problematic -- could a single checkpoint not end up as the start
>>>>>>>>> location for multiple backups started at the same time? Whether that's
>>>>>>>>> possible now or not, it seems unwise to hard-wire that assumption into
>>>>>>>>> the wire protocol.
>>>>>>>>>
>>>>>>>>> I was thinking that perhaps the client should generate a unique backup
>>>>>>>>> ID, e.g. leader does:
>>>>>>>>>
>>>>>>>>> START_BACKUP unique_backup_id [options]...
>>>>>>>>>
>>>>>>>>> And then others do:
>>>>>>>>>
>>>>>>>>> JOIN_BACKUP unique_backup_id
>>>>>>>>>
>>>>>>>>> My thought is that you will have a number of shared memory structures
>>>>>>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>>>>>>> state for one backup. The shared state will include
>>>>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>>>>>>> JOIN_BACKUP would search for one by name.
>>>>>>>>>
>>>>>>>>> If you want to generate the name on the server side, then I suppose
>>>>>>>>> START_BACKUP would return a result set that includes the backup ID,
>>>>>>>>> and clients would have to specify that same backup ID when invoking
>>>>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>>>>>>>>> better. Either way, the backup ID should be something long and hard to
>>>>>>>>> guess, not e.g. the leader process's PID. I think we should generate
>>>>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>>>>>>>>> result to get a string. That way there's almost no risk of two backup
>>>>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>>>>>>> user trying to screw up somebody else's parallel backup by choosing a
>>>>>>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>>>>>>> user with enough access to do that sort of thing can probably cause a
>>>>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>>>>>>> intentional collisions robustly here, so I think we should.
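The ID scheme suggested above (8 or 16 random bytes, hex-encoded into a
string) can be sketched in standalone C. This is only an illustration under
stated assumptions: inside the server the bytes would come from
pg_strong_random(), whereas this sketch reads /dev/urandom as a stand-in,
and hex_encode_bytes() and generate_backup_id() are hypothetical names that
do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BACKUP_ID_BYTES 16      /* 16 random bytes -> 32 hex characters */

/*
 * Hex-encode 'len' bytes from 'src' into 'dst'.
 * 'dst' must have room for 2 * len + 1 characters (including the NUL).
 */
static void
hex_encode_bytes(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0F];
    }
    dst[2 * len] = '\0';
}

/*
 * Fill 'dst' (at least 2 * BACKUP_ID_BYTES + 1 chars) with a random,
 * hex-encoded backup ID. Returns false if no random bytes were available.
 * Assumption: /dev/urandom stands in for the server's pg_strong_random().
 */
static bool
generate_backup_id(char *dst)
{
    unsigned char raw[BACKUP_ID_BYTES];
    FILE       *f = fopen("/dev/urandom", "rb");
    bool        ok;

    if (f == NULL)
        return false;
    ok = (fread(raw, 1, sizeof(raw), f) == sizeof(raw));
    fclose(f);
    if (!ok)
        return false;

    hex_encode_bytes(raw, sizeof(raw), dst);
    return true;
}
```

With 16 random bytes the resulting string is 32 characters long, which fits
comfortably in a NAMEDATALEN-sized buffer.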
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Okay, so if we add another replication command, ‘JOIN_BACKUP
>>>>>>>> unique_backup_id’, to let workers find the relevant shared state,
>>>>>>>> there won't be any need to change the grammar of any other command.
>>>>>>>> START_BACKUP can return the unique_backup_id in the result set.
>>>>>>>>
>>>>>>>> I am thinking of the following struct for shared state:
>>>>>>>>
>>>>>>>> typedef struct
>>>>>>>> {
>>>>>>>>     char        backupid[NAMEDATALEN];
>>>>>>>>     XLogRecPtr  startptr;
>>>>>>>>
>>>>>>>>     slock_t     lock;
>>>>>>>>     int64       throttling_counter;
>>>>>>>>     bool        backup_started_in_recovery;
>>>>>>>> } BackupSharedState;
>>>>>>>>>
>>>>>>>>>
>>>>>>>> The shared state structure entries would be maintained by a shared
>>>>>>>> hash table. There will be one structure per parallel backup. Since a
>>>>>>>> single parallel backup can engage more than one WAL sender, I think
>>>>>>>> max_wal_senders might be a little too much; perhaps
>>>>>>>> max_wal_senders/2, since there will be at least 2 connections per
>>>>>>>> parallel backup? Alternatively, we can add a new GUC that defines the
>>>>>>>> maximum number of concurrent parallel backups, i.e.
>>>>>>>> ‘max_concurent_backups_allowed = 10’ perhaps, or we can make it
>>>>>>>> user-configurable.
>>>>>>>>
>>>>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
>>>>>>>>
>>>>>>>> Checking for Standby Promotion:
>>>>>>>> At the START_BACKUP command, we initialize
>>>>>>>> BackupSharedState.backup_started_in_recovery and keep checking it
>>>>>>>> whenever send_file() is called to send a new file.
>>>>>>>>
>>>>>>>> Throttling:
>>>>>>>> BackupSharedState.throttling_counter - The throttling logic remains
>>>>>>>> the same as for non-parallel backup, except that multiple workers
>>>>>>>> will now be updating the counter. In parallel backup it therefore
>>>>>>>> represents the overall number of bytes transferred, and workers sleep
>>>>>>>> once the limit has been exceeded. Hence, the shared state carries a
>>>>>>>> lock so that the throttling value can be updated safely.
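A simplified sketch of the shared throttling counter described above. The
assumptions are loud ones: a pthread mutex stands in for the slock_t
spinlock, ThrottleState, throttle_init() and throttle_account() are
hypothetical names, and the time-based part of throttling (deciding how long
to sleep) is left out entirely. The function only reports when the shared
byte count has crossed the per-interval limit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared throttling state; a mutex replaces the spinlock. */
typedef struct ThrottleState
{
    pthread_mutex_t lock;
    int64_t     throttling_counter; /* bytes sent since the last limit hit */
    int64_t     throttling_sample;  /* bytes allowed per interval */
} ThrottleState;

static void
throttle_init(ThrottleState *st, int64_t sample)
{
    assert(sample > 0);
    pthread_mutex_init(&st->lock, NULL);
    st->throttling_counter = 0;
    st->throttling_sample = sample;
}

/*
 * Account for 'increment' bytes sent by any worker. Returns true when the
 * combined count of all workers has reached the limit, meaning the caller
 * should sleep; the overshoot is carried over into the next interval.
 */
static bool
throttle_account(ThrottleState *st, int64_t increment)
{
    bool        limit_reached;

    pthread_mutex_lock(&st->lock);
    st->throttling_counter += increment;
    limit_reached = (st->throttling_counter >= st->throttling_sample);
    if (limit_reached)
        st->throttling_counter %= st->throttling_sample;
    pthread_mutex_unlock(&st->lock);

    return limit_reached;
}
```

Since every worker adds to the same counter under the lock, the limit
applies to the backup as a whole rather than to each connection separately.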
>>>>>>>>
>>>>>>>> Progress Reporting:
>>>>>>>> I think we should add progress reporting for parallel backup as a
>>>>>>>> separate patch. The relevant entries for progress reporting, such as
>>>>>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
>>>>>>>> structure as well.
>>>>>>>>
>>>>>>>>
>>>>>>>> Grammar:
>>>>>>>> There is a change in the result set returned by the START_BACKUP
>>>>>>>> command: unique_backup_id is added. Additionally, the JOIN_BACKUP
>>>>>>>> replication command is added, and SEND_FILES has been renamed to
>>>>>>>> SEND_FILE. There are no other changes to the grammar.
>>>>>>>>
>>>>>>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>>>>>>   - returns startptr, tli, backup_label, unique_backup_id
>>>>>>>> STOP_BACKUP [NOWAIT]
>>>>>>>>   - returns startptr, tli, backup_label
>>>>>>>> JOIN_BACKUP ‘unique_backup_id’
>>>>>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
>>>>>>>>     backend process.
>>>>>>>>
>>>>>>>> LIST_TABLESPACES [PROGRESS]
>>>>>>>> LIST_FILES [TABLESPACE]
>>>>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>>>>>>
>>>>>>>>
>>>>
>>>
>>> Hi,
>>>
>>> Rebased and updated to the current master (8128b0c1). v13 is attached.
>>>
>>> - Fixes the issues reported above.
>>>
>>> - Added progress-reporting support for parallel backup:
>>> For this, 'backup_streamed' is moved to a shared structure (BackupState)
>>> as a pg_atomic_uint64 variable. The worker processes keep incrementing
>>> this variable.
>>>
>>> While files are being transferred from server to client, the main
>>> process remains idle. So after each increment, the worker process
>>> signals the main process to update the stats in the
>>> pg_stat_progress_basebackup view.
>>>
>>> The 'tablespace_streamed' column is not updated and will remain empty,
>>> because multiple workers may be copying files from different tablespaces.
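The shared 'backup_streamed' counter described above can be illustrated with
C11 atomics. This is a sketch under assumptions, not the patch's code: the
patch uses pg_atomic_uint64 in shared memory updated by server processes,
while this example uses _Atomic uint64_t updated by threads, and
BackupProgress, progress_worker() and run_progress_workers() are hypothetical
names. Signalling the main process after each increment is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_PROGRESS_WORKERS 16

/* Hypothetical shared progress state; stands in for pg_atomic_uint64. */
typedef struct BackupProgress
{
    _Atomic uint64_t backup_streamed;   /* total bytes sent by all workers */
} BackupProgress;

typedef struct ProgressWorkerArg
{
    BackupProgress *progress;
    int         nfiles;         /* files this worker pretends to send */
    uint64_t    file_size;      /* size of each pretend file, in bytes */
} ProgressWorkerArg;

/* Each worker adds the size of every file it "sends" to the shared total. */
static void *
progress_worker(void *p)
{
    ProgressWorkerArg *arg = p;

    for (int i = 0; i < arg->nfiles; i++)
        atomic_fetch_add(&arg->progress->backup_streamed, arg->file_size);
    return NULL;
}

/* Run 'nworkers' workers concurrently and return the final byte count. */
static uint64_t
run_progress_workers(int nworkers, int nfiles, uint64_t file_size)
{
    BackupProgress progress;
    ProgressWorkerArg arg;
    pthread_t   threads[MAX_PROGRESS_WORKERS];
    int         started = 0;

    assert(nworkers > 0 && nworkers <= MAX_PROGRESS_WORKERS);
    atomic_init(&progress.backup_streamed, 0);
    arg.progress = &progress;
    arg.nfiles = nfiles;
    arg.file_size = file_size;

    for (int i = 0; i < nworkers; i++)
    {
        if (pthread_create(&threads[i], NULL, progress_worker, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return atomic_load(&progress.backup_streamed);
}
```

Because each addition is atomic, no increments are lost even when several
workers update the counter at the same moment, so no separate lock is
needed for this field.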
>>>
>>>
>>> - Added backup manifest:
>>> Each backend worker maintains its own manifest file, which lists the
>>> files transferred by that worker. Once all backup files are transferred,
>>> the workers create a temp file named
>>> ('pg_tempdir/temp_file_prefix_backupid.workerid') and write the content
>>> of the manifest file from the BufFile to it. The workers add neither the
>>> header nor the WAL information to their manifests. These two are added
>>> by the main process while merging all worker manifest files.
>>>
>>> The main process reads these individual files and concatenates them
>>> into a single file, which is then sent back to the client.
>>>
>>> The manifest file is created when the following command is received:
>>>
>>>     BUILD_MANIFEST 'backupid'
>>>
>>> This is a new replication command. It is sent when pg_basebackup has
>>> copied all the $PGDATA files, including WAL files.
>>>
>>>
>>>
>>> --
>>> Asif Rehman
>>> Highgo Software (Canada/China/Pakistan)
>>> URL : www.highgo.ca
>>>
>>>
>>
>> --
>> Regards
>> ====================================
>> Kashif Zeeshan
>> Lead Quality Assurance Engineer / Manager
>>
>> EnterpriseDB Corporation
>> The Enterprise Postgres Company
>>
>>
>
> --
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>

-- 
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Re: WIP/PoC for parallel backup

Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> — 2020-04-17T07:08:14Z

On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
> kashif.zeeshan@enterprisedb.com> wrote:
>
>>
>>
>> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>>>
>>> I have added the shared state as previously described. The new grammar
>>> changes
>>> are as follows:
>>>
>>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>>>     - This generates a unique backupid using pg_strong_random(16) and
>>>       hex-encodes it; the ID is then returned in the result set.
>>>     - It also creates a shared state and adds it to the hash table. The
>>>       hash table's initial size is BACKUP_HASH_SIZE=10; since the hash
>>>       table can expand dynamically, I think that is a sufficient initial
>>>       size. max_wal_senders is not used because it can be set to quite a
>>>       large value.
>>>
>>> JOIN_BACKUP 'backup_id'
>>>     - finds 'backup_id' in hashtable and attaches it to server process.
>>>
>>>
>>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>>>     - renamed SEND_FILES to SEND_FILE
>>>     - removed START_WAL_LOCATION from this because 'startptr' is now
>>> accessible through
>>>       shared state.
>>>
>>> There is no change in other commands:
>>> STOP_BACKUP [NOWAIT]
>>> LIST_TABLESPACES [PROGRESS]
>>> LIST_FILES [TABLESPACE]
>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>
>>> The current patches (v11) have been rebased to the latest master. The
>>> backup manifest is enabled by default, so I have disabled it for
>>> parallel backup mode and emit a warning so that the user is aware of
>>> this and does not expect a manifest in the backup.
>>>
>> Hi Asif
>>
>> I have verified the bug fixes; one bug is fixed and now works as
>> expected.
>>
>> While verifying the other bug fixes I ran into the following issues;
>> please have a look.
>>
>>
>> 1) The fixes for the following two bugs now produce a segmentation fault.
>>
>> For reference I have included only a description of each bug, since the
>> reproduction steps were given in earlier emails. A backtrace is included
>> for each case; both point to the same bug.
>>
>> a) The backup failed with the error "error: could not connect to server:
>> could not look up local user ID 1000: Too many open files" when
>> max_wal_senders was set to 2000.
>>
>>
>> [edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
>>  /home/edb/Desktop/backup/
>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>> mode
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> ….
>> ….
>> pg_basebackup: backup worker (1014) created
>> pg_basebackup: backup worker (1015) created
>> pg_basebackup: backup worker (1016) created
>> pg_basebackup: backup worker (1017) created
>> pg_basebackup: error: could not connect to server: could not look up
>> local user ID 1000: Too many open files
>> Segmentation fault
>> [edb@localhost bin]$
>>
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$ gdb pg_basebackup
>> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <
>> http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from
>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>> [New LWP 13219]
>> [New LWP 13222]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `./pg_basebackup -v -j 1990 -D
>> /home/edb/Desktop/backup/'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
>> (gdb) bt
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>> #3  0x00007f2226f76a49 in __run_exit_handlers (status=1,
>> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
>> at exit.c:77
>> #4  0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0)
>> at pg_basebackup.c:2811
>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
>> pg_basebackup.c:2765
>> (gdb)
>>
>>
>>
>>
>> b) When executing two backups at the same time, a FATAL error was raised
>> due to max_wal_senders, but instead of exiting, the backup completed.
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$  ./pg_basebackup -v -j 8 -D
>>  /home/edb/Desktop/backup1/
>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>> mode
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> pg_basebackup: backup worker (4) created
>> pg_basebackup: backup worker (5) created
>> pg_basebackup: backup worker (6) created
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> Segmentation fault (core dumped)
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$ gdb pg_basebackup
>> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <
>> http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from
>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>> [New LWP 17041]
>> [New LWP 17067]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `./pg_basebackup -v -j 8 -D
>> /home/edb/Desktop/backup1/'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> 47  if (INVALID_NOT_TERMINATED_TD_P (pd))
>> (gdb) bt
>> #0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>> #1  0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>> #2  0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>> #3  0x00007f051edc1a49 in __run_exit_handlers (status=1,
>> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
>> at exit.c:77
>> #4  0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
>> #5  0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
>> at pg_basebackup.c:2811
>> #6  0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>> #7  0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
>> pg_basebackup.c:2765
>> (gdb)
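
Both backtraces end in pthread_join(threadid=0) called from cleanup_workers() during exit(), which suggests that cleanup joins worker slots whose threads were never created. A minimal sketch of one way to guard against that, assuming a per-worker "created" flag; WorkerSlot, start_worker() and cleanup_worker_slots() are hypothetical names, not the patch's code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker bookkeeping; not the patch's actual struct. */
typedef struct
{
    pthread_t   thread;
    bool        created;    /* set only after pthread_create() succeeds */
} WorkerSlot;

/* Trivial worker body used for illustration. */
void *
worker_main(void *arg)
{
    return arg;
}

/* Start one worker; the slot is marked joinable only on success. */
bool
start_worker(WorkerSlot *slot)
{
    assert(slot != NULL);
    slot->created =
        (pthread_create(&slot->thread, NULL, worker_main, NULL) == 0);
    return slot->created;
}

/*
 * Join only the workers that were actually started, so that an early
 * exit() with partially created workers never joins an unset thread ID.
 * Returns the number of threads joined.
 */
int
cleanup_worker_slots(WorkerSlot *slots, int nslots)
{
    int         joined = 0;
    int         i;

    for (i = 0; i < nslots; i++)
    {
        if (!slots[i].created)
            continue;               /* never started: nothing to join */
        if (pthread_join(slots[i].thread, NULL) == 0)
            joined++;
        slots[i].created = false;   /* makes a second cleanup a no-op */
    }
    return joined;
}
```

With a zero-initialized slot array, a failure while creating worker N leaves slots N and above unmarked, so an atexit-style cleanup only touches the threads that exist.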
>>
>>
>>
>>
>> 2) The following bug is not fixed yet
>>
>> A similar case: when the DB server is shut down while a parallel backup
>> is in progress, the correct error is displayed, but the backup folder is
>> not cleaned up and a corrupt backup is left behind.
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j
>> 8
>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>> mode
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: backup worker (2) created
>> pg_basebackup: backup worker (3) created
>> pg_basebackup: backup worker (4) created
>> pg_basebackup: backup worker (5) created
>> pg_basebackup: backup worker (6) created
>> pg_basebackup: backup worker (7) created
>> pg_basebackup: error: could not read COPY data: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> pg_basebackup: error: could not read COPY data: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> pg_basebackup: removing contents of data directory
>> "/home/edb/Desktop/backup/"
>> pg_basebackup: error: could not read COPY data: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>>
>>
>>
>> [edb@localhost bin]$
>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>> base         pg_hba.conf    pg_logical    pg_notify    pg_serial
>> pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
>> pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
>>  pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
>> [edb@localhost bin]$
>> [edb@localhost bin]$
>>
>>
>>
>>
>> Thanks
>> Kashif Zeeshan
>>
>>>
>>>
>>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
>>>> kashif.zeeshan@enterprisedb.com> wrote:
>>>>
>>>>> Hi Asif
>>>>>
>>>>> When a non-existent slot is used together with a tablespace, the
>>>>> correct error is displayed, but the backup folder is not cleaned up and
>>>>> a corrupt backup is left behind.
>>>>>
>>>>> Steps
>>>>> =======
>>>>>
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl1
>>>>> [edb@localhost bin]$ mkdir /home/edb/tbl_res
>>>>> [edb@localhost bin]$
>>>>> postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
>>>>> CREATE TABLESPACE
>>>>> postgres=#
>>>>> postgres=# create table t1 (a int) tablespace tbl1;
>>>>> CREATE TABLE
>>>>> postgres=# insert into t1 values(100);
>>>>> INSERT 0 1
>>>>> postgres=# insert into t1 values(200);
>>>>> INSERT 0 1
>>>>> postgres=# insert into t1 values(300);
>>>>> INSERT 0 1
>>>>> postgres=#
>>>>>
>>>>>
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$  ./pg_basebackup -v -j 2 -D
>>>>>  /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>> complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: error: could not send replication command
>>>>> "START_REPLICATION": ERROR:  replication slot "test" does not exist
>>>>> pg_basebackup: backup worker (0) created
>>>>> pg_basebackup: backup worker (1) created
>>>>> pg_basebackup: write-ahead log end point: 0/2E000100
>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>> pg_basebackup: error: child thread exited with error 1
>>>>> [edb@localhost bin]$
>>>>>
>>>>> backup folder not cleaned
>>>>>
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$
>>>>> [edb@localhost bin]$ ls /home/edb/Desktop/backup
>>>>> backup_label  global        pg_dynshmem  pg_ident.conf  pg_multixact
>>>>>  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION  pg_xact
>>>>>             postgresql.conf
>>>>> base          pg_commit_ts  pg_hba.conf  pg_logical     pg_notify
>>>>> pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_wal
>>>>>  postgresql.auto.conf
>>>>> [edb@localhost bin]$
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> If the same case is executed without the parallel backup patch then
>>>>> the backup folder is cleaned after the error is displayed.
>>>>>
>>>>> [edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>>>> -T /home/edb/tbl1=/home/edb/tbl_res -S test999
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>> complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: error: could not send replication command
>>>>> "START_REPLICATION": ERROR:  replication slot "test999" does not exist
>>>>> pg_basebackup: write-ahead log end point: 0/2B000100
>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>> pg_basebackup: error: child process exited with exit code 1
>>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>>>>> pg_basebackup: changes to tablespace directories will not be undone
>>>>>
>>>>
>>>>
>>>> Hi Asif
>>>>
>>>> A similar case: when the DB server is shut down while a parallel backup
>>>> is in progress, the correct error is displayed, but the backup folder is
>>>> not cleaned up and a corrupt backup is left behind. I think one bug fix
>>>> will solve all these cases where cleanup is not done when a parallel
>>>> backup fails.
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>>> -j 8
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>> complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
>>>> pg_basebackup: backup worker (0) created
>>>> pg_basebackup: backup worker (1) created
>>>> pg_basebackup: backup worker (2) created
>>>> pg_basebackup: backup worker (3) created
>>>> pg_basebackup: backup worker (4) created
>>>> pg_basebackup: backup worker (5) created
>>>> pg_basebackup: backup worker (6) created
>>>> pg_basebackup: backup worker (7) created
>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>> connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>> connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$
>>>>
>>>> When the same case is executed with pg_basebackup without the parallel
>>>> backup patch, proper cleanup is done.
>>>>
>>>> [edb@localhost bin]$
>>>> [edb@localhost bin]$  ./pg_basebackup -v -D  /home/edb/Desktop/backup/
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>> complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>> connection unexpectedly
>>>> This probably means the server terminated abnormally
>>>> before or while processing the request.
>>>> pg_basebackup: removing contents of data directory
>>>> "/home/edb/Desktop/backup/"
>>>> [edb@localhost bin]$
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>
>>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com>
>>>>>>> wrote:
>>>>>>> >> Why would you need to do that? As long as the process where
>>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>>>>> >
>>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
>>>>>>> not while the backup is
>>>>>>> > in progress. So if the backup is a large one, early error
>>>>>>> detection would be much beneficial.
>>>>>>> > This is the current behavior of non-parallel backup as well.
>>>>>>>
>>>>>>> Because non-parallel backup does not feature early detection of this
>>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>>>>> thread in a separate patch. A patch proposing to make parallel backup
>>>>>>> inconsistent in behavior with non-parallel backup will be rejected,
>>>>>>> at
>>>>>>> least if I have anything to say about it.
>>>>>>>
>>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>>>>> current situation is not great, but promotions ought to be relatively
>>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>>>>> also worth considering whether the right fix is to figure out how to
>>>>>>> make that case actually work, rather than just making it fail
>>>>>>> quicker.
>>>>>>> I don't currently understand the reason for the prohibition so I
>>>>>>> can't
>>>>>>> express an intelligent opinion on what the right answer is here, but
>>>>>>> it seems like it ought to be investigated before somebody goes and
>>>>>>> builds a bunch of infrastructure to make the error more timely.
>>>>>>>
>>>>>>
>>>>>> Non-parallel backup already does the early error checking. I only
>>>>>> intended to make parallel behave the same as non-parallel here. So, I
>>>>>> agree with you that the behavior of parallel backup should be
>>>>>> consistent with the non-parallel one. Please see the code snippet
>>>>>> below from basebackup.c:sendDir():
>>>>>>
>>>>>>
>>>>>>> /*
>>>>>>>  * Check if the postmaster has signaled us to exit, and abort with an
>>>>>>>  * error in that case. The error handler further up will call
>>>>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>>>>>>  * started while still in recovery, the server wasn't promoted.
>>>>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
>>>>>>>  * the backup early than continue to the end and fail there.
>>>>>>>  */
>>>>>>> CHECK_FOR_INTERRUPTS();
>>>>>>> if (RecoveryInProgress() != backup_started_in_recovery)
>>>>>>>     ereport(ERROR,
>>>>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>>>              errmsg("the standby was promoted during online backup"),
>>>>>>>              errhint("This means that the backup being taken is corrupt "
>>>>>>>                      "and should not be used. "
>>>>>>>                      "Try taking another online backup.")));
>>>>>>>
>>>>>>>
>>>>>>> > Okay, then I will add the shared state. And since we are adding
>>>>>>> the shared state, we can use
>>>>>>> > that for throttling, progress-reporting and standby early error
>>>>>>> checking.
>>>>>>>
>>>>>>> Please propose a grammar here for all the new replication commands
>>>>>>> you
>>>>>>> plan to add before going and implement everything. That will make it
>>>>>>> easier to hash out the design without forcing you to keep changing
>>>>>>> the
>>>>>>> code. Your design should include a sketch of how several sets of
>>>>>>> coordinating backends taking several concurrent parallel backups will
>>>>>>> end up with one shared state per parallel backup.
>>>>>>>
>>>>>>> > There are two possible options:
>>>>>>> >
>>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string>
>>>>>>> OR
>>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>>>>> >
>>>>>>> > This BackupID should be given back as a response to start backup
>>>>>>> command. All client workers
>>>>>>> > must append this ID to all parallel backup replication commands.
>>>>>>> So that we can use this identifier
>>>>>>> > to search for that particular backup. Does that sound good?
>>>>>>>
>>>>>>> Using the WAL start location as the backup ID seems like it might be
>>>>>>> problematic -- could a single checkpoint not end up as the start
>>>>>>> location for multiple backups started at the same time? Whether
>>>>>>> that's
>>>>>>> possible now or not, it seems unwise to hard-wire that assumption
>>>>>>> into
>>>>>>> the wire protocol.
>>>>>>>
>>>>>>> I was thinking that perhaps the client should generate a unique
>>>>>>> backup
>>>>>>> ID, e.g. leader does:
>>>>>>>
>>>>>>> START_BACKUP unique_backup_id [options]...
>>>>>>>
>>>>>>> And then others do:
>>>>>>>
>>>>>>> JOIN_BACKUP unique_backup_id
>>>>>>>
>>>>>>> My thought is that you will have a number of shared memory structure
>>>>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>>>>> state for one backup. The shared state will include
>>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>>>>> JOIN_BACKUP would search for one by name.
>>>>>>>
>>>>>>> If you want to generate the name on the server side, then I suppose
>>>>>>> START_BACKUP would return a result set that includes the backup ID,
>>>>>>> and clients would have to specify that same backup ID when invoking
>>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>>>>>>> better. Either way, the backup ID should be something long and hard
>>>>>>> to
>>>>>>> guess, not e.g. the leader processes' PID. I think we should generate
>>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>>>>>>> result to get a string. That way there's almost no risk of two backup
>>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>>>>> user trying to screw up somebody else's parallel backup by choosing a
>>>>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>>>>> user with enough access to do that sort of thing can probably cause a
>>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>>>>> intentional collisions robustly here, so I think we should.
>>>>>>>
>>>>>>>
>>>>>> Okay, so if we add another replication command ‘JOIN_BACKUP
>>>>>> unique_backup_id’ to let workers find the relevant shared state, there
>>>>>> won't be any need to change the grammar for any other command.
>>>>>> START_BACKUP can return the unique_backup_id in the result set.
>>>>>>
>>>>>> I am thinking of the following struct for shared state:
>>>>>>
>>>>>>> typedef struct
>>>>>>> {
>>>>>>>     char        backupid[NAMEDATALEN];
>>>>>>>     XLogRecPtr  startptr;
>>>>>>>
>>>>>>>     slock_t     lock;
>>>>>>>     int64       throttling_counter;
>>>>>>>     bool        backup_started_in_recovery;
>>>>>>> } BackupSharedState;
>>>>>>>
>>>>>>>
>>>>>> The shared state structure entries would be maintained by a shared
>>>>>> hash table. There will be one structure per parallel backup. Since a
>>>>>> single parallel backup can engage more than one WAL sender, I think
>>>>>> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
>>>>>> since there will be at least 2 connections per parallel backup?
>>>>>> Alternatively, we can set a new GUC that defines the maximum number of
>>>>>> concurrent parallel backups, e.g. ‘max_concurent_backups_allowed = 10’
>>>>>> perhaps, or we can make it user-configurable.
>>>>>>
>>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
>>>>>>
>>>>>> Checking for Standby Promotion:
>>>>>> At the START_BACKUP command, we initialize
>>>>>> BackupSharedState.backup_started_in_recovery
>>>>>> and keep checking it whenever send_file () is called to send a new
>>>>>> file.
>>>>>>
>>>>>> Throttling:
>>>>>> BackupSharedState.throttling_counter - The throttling logic remains
>>>>>> the same as for non-parallel backup, except that multiple workers will
>>>>>> now be updating the counter. In parallel backup it therefore represents
>>>>>> the overall bytes transferred, and workers sleep once they have
>>>>>> exceeded the limit. The shared state carries a lock so that the
>>>>>> throttling value can be updated safely.
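
The shared throttling counter described above can be sketched like this. It is a simplified illustration under stated assumptions, not the server's throttle() logic: pthread_mutex_t stands in for slock_t, and ThrottleState, throttle_init() and throttle_add() are hypothetical names. The caller is assumed to do the actual sleeping when throttle_add() returns true.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the throttling part of the shared state. */
typedef struct
{
    pthread_mutex_t lock;               /* stands in for slock_t */
    int64_t     throttling_counter;     /* bytes sent in the current interval */
    int64_t     throttling_sample;      /* byte budget per interval */
} ThrottleState;

void
throttle_init(ThrottleState *ts, int64_t bytes_per_interval)
{
    assert(ts != NULL && bytes_per_interval > 0);
    pthread_mutex_init(&ts->lock, NULL);
    ts->throttling_counter = 0;
    ts->throttling_sample = bytes_per_interval;
}

/*
 * Account for 'nbytes' just sent by any worker. Returns true when the
 * combined total has reached the budget; in that case one budget is
 * subtracted from the counter and the caller is expected to sleep.
 */
bool
throttle_add(ThrottleState *ts, int64_t nbytes)
{
    bool        must_sleep = false;

    pthread_mutex_lock(&ts->lock);
    ts->throttling_counter += nbytes;
    if (ts->throttling_counter >= ts->throttling_sample)
    {
        ts->throttling_counter -= ts->throttling_sample;
        must_sleep = true;
    }
    pthread_mutex_unlock(&ts->lock);
    return must_sleep;
}
```

Because every worker adds to the same counter under the lock, the limit applies to the backup as a whole and not to each connection separately.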
>>>>>>
>>>>>> Progress Reporting:
>>>>>> I think we should add progress reporting for parallel backup as a
>>>>>> separate patch. The relevant entries for progress reporting, such as
>>>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
>>>>>> structure as well.
>>>>>>
>>>>>>
>>>>>> Grammar:
>>>>>> There is a change in the resultset being returned for START_BACKUP
>>>>>> command;
>>>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
>>>>>> command is
>>>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
>>>>>> changes
>>>>>> to the grammar.
>>>>>>
>>>>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>>>>   - returns startptr, tli, backup_label, unique_backup_id
>>>>>> STOP_BACKUP [NOWAIT]
>>>>>>   - returns startptr, tli, backup_label
>>>>>> JOIN_BACKUP ‘unique_backup_id’
>>>>>>   - attaches a shared state identified by ‘unique_backup_id’ to a
>>>>>> backend process.
>>>>>>
>>>>>> LIST_TABLESPACES [PROGRESS]
>>>>>> LIST_FILES [TABLESPACE]
>>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>>>>
>>>>>>
>>
>
> Hi,
>
> rebased and updated to the current master (8128b0c1). v13 is attached.
>
> - Fixes the above reported issues.
>

Hi Asif

I have verified the bug fixes. Two of the three bugs are now fixed, but the
following issue is still not fixed:

*When the DB server is shut down while a parallel backup is in progress, the
correct error is displayed, but the backup folder is not cleaned up and a
corrupt backup is left behind.*

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$



[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base         pg_hba.conf    pg_logical    pg_notify    pg_serial
pg_stat      pg_subtrans  pg_twophase  pg_xact               postgresql.conf
pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots
 pg_stat_tmp  pg_tblspc    PG_VERSION   postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$



Thanks
Kashif Zeeshan


>
> - Added progress-reporting support for parallel:
> For this, 'backup_streamed' is moved to a shared structure (BackupState) as
> pg_atomic_uint64 variable. The worker processes will keep incrementing this
> variable.
>
> While files are being transferred from server to client, the main process
> remains idle. So after each increment, the worker process will signal the
> main process to update the stats in the pg_stat_progress_basebackup view.
>
> The 'tablespace_streamed' column is not updated and will remain empty.
> This is
> because multiple workers may be copying files from different tablespaces.
>
>
> - Added backup manifest:
> The backend workers each maintain their own manifest file, which contains
> a list of the files transferred by that worker. Once all backup files are
> transferred, each worker will create a temp file named
> ('pg_tempdir/temp_file_prefix_backupid.workerid') and write the content of
> its manifest from the BufFile to it. The workers won't add the header or
> the WAL information to their manifests. These two will be added by the
> main process while merging all worker manifest files.
>
> The main process will read these individual files and concatenate them
> into a single file
> which is then sent back to the client.
>
> The manifest file is created when the following command is received:
>
>>     BUILD_MANIFEST 'backupid'
>
>
> This is a new replication command. It is sent when pg_basebackup has
> copied all the
> $PGDATA files including WAL files.
>
>
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>

-- 
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company

Re: WIP/PoC for parallel backup

Amit Kapila <amit.kapila16@gmail.com> — 2020-04-21T04:27:31Z

On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
> I forgot to make a check for no-manifest. Fixed. Attached is the updated
> patch.
>
>
Have we done any performance testing with this patch to see the benefits?
If so, can you point me to the results? If not, then can we perform some
tests on large backups to see the benefits of this patch/idea?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-21T07:29:47Z

Hi,

I did some tests a while back, and here are the results. The tests were
done to simulate a live database environment using pgbench.

Machine configuration used for this test:
Instance Type      :    t2.xlarge
Volume Type        :    io1
Memory (MiB)       :    16384
vCPU #             :    4
Architecture       :    x86_64
IOPS               :    16000
Database Size (GB) :    102

The setup consists of 3 machines:
- one for database instances
- one for pg_basebackup client and
- one for pgbench with some parallel workers, simulating SELECT loads.

                        basebackup | 4 workers | 8 workers | 16 workers
Backup Duration (min):       69.25 |     20.44 |     19.86 |      20.15
(pgbench running with 50 parallel clients simulating SELECT load)

Backup Duration (min):      154.75 |     49.28 |     45.27 |      20.35
(pgbench running with 100 parallel clients simulating SELECT load)
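
For comparison, the durations above can be turned into speedup factors relative to plain basebackup. The helper below is only an illustrative sketch; backup_speedup() is a hypothetical name.

```c
#include <assert.h>

/* Speedup of a parallel run relative to the plain basebackup duration. */
double
backup_speedup(double baseline_min, double parallel_min)
{
    assert(parallel_min > 0.0);
    return baseline_min / parallel_min;
}
```

For example, backup_speedup(69.25, 20.44) is about 3.39 for 4 workers under the 50-client load, and backup_speedup(154.75, 20.35) is about 7.60 for 16 workers under the 100-client load.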



On Tue, Apr 21, 2020 at 9:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
>
>>
>> I forgot to make a check for no-manifest. Fixed. Attached is the updated
>> patch.
>>
>>
> Have we done any performance testing with this patch to see the benefits?
> If so, can you point me to the results? If not, then can we perform some
> tests on large backups to see the benefits of this patch/idea?
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


-- 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> — 2020-04-21T09:35:38Z

Hi Asif,

On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

> Hi,
>
> I did some tests a while back, and here are the results. The tests were
> done to simulate
> a live database environment using pgbench.
>
> machine configuration used for this test:
> Instance Type:    t2.xlarge
> Volume Type  :    io1
> Memory (MiB) :    16384
> vCPU #           :    4
> Architecture    :    X86_64
> IOP                 :    16000
> Database Size (GB) :    102
>
> The setup consists of 3 machines:
> - one for database instances
> - one for pg_basebackup client and
> - one for pgbench with some parallel workers, simulating SELECT loads.
>
>                        basebackup | 4 workers | 8 workers | 16 workers
> Backup Duration (min):      69.25 |     20.44 |     19.86 |      20.15
> (pgbench running with 50 parallel client simulating SELECT load)
>


Well, that looks a bit strange. The 4-, 8- and 16-worker configurations
all seem to have taken about the same time. Is it because the machine has
only 4 vCPUs? In that case, did you try running with 2 workers and comparing
that with the 4-worker time?

Also, just to clarify and be sure: was there anything else running on any
of these 3 machines while the backup was in progress?

Regards,
Jeevan Ladhe


> Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
> (pgbench running with 100 parallel client simulating SELECT load)
>
>
>
> On Tue, Apr 21, 2020 at 9:27 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>
>> On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>>
>>>
>>> I forgot to make a check for no-manifest. Fixed. Attached is the updated
>>> patch.
>>>
>>>
>> Have we done any performance testing with this patch to see the benefits?
>> If so, can you point me to the results? If not, then can we perform some
>> tests on large backups to see the benefits of this patch/idea?
>>
>> --
>> With Regards,
>> Amit Kapila.
>> EnterpriseDB: http://www.enterprisedb.com
>>
>
>
> --
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-21T10:16:33Z

On Tue, 21 Apr 2020 at 2:36 PM, Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
wrote:

> Hi Asif,
>
> On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
>
>> Hi,
>>
>> I did some tests a while back, and here are the results. The tests were
>> done to simulate
>> a live database environment using pgbench.
>>
>> machine configuration used for this test:
>> Instance Type:    t2.xlarge
>> Volume Type  :    io1
>> Memory (MiB) :    16384
>> vCPU #           :    4
>> Architecture    :    X86_64
>> IOP                 :    16000
>> Database Size (GB) :    102
>>
>> The setup consist of 3 machines.
>> - one for database instances
>> - one for pg_basebackup client and
>> - one for pgbench with some parallel workers, simulating SELECT loads.
>>
>>                        basebackup | 4 workers | 8 workers | 16 workers
>> Backup Duration (min):      69.25 |     20.44 |     19.86 |      20.15
>> (pgbench running with 50 parallel client simulating SELECT load)
>>
>
>
> Well that looks a bit strange. All 4, 8 and 16 workers backup
> configurations
> seem to have taken the same time. Is it because the machine CPUs are
> only 4? In that case did you try to run with 2-workers and compare that
> with 4-workers time?
>
> Also, just to clarify and be sure - was there anything else running on any
> of
> these 3 machines while the backup was in progress.
>

The tests were performed only for 4, 8 and 16 workers at the time, and there
was nothing else running on any of the machines.


> Regards,
> Jeevan Ladhe
>
>
>> Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
>> (pgbench running with 100 parallel client simulating SELECT load)
>>
>>
>>
>> On Tue, Apr 21, 2020 at 9:27 AM Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>>
>>> On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com>
>>> wrote:
>>>
>>>>
>>>> I forgot to make a check for no-manifest. Fixed. Attached is the
>>>> updated patch.
>>>>
>>>>
>>> Have we done any performance testing with this patch to see the
>>> benefits? If so, can you point me to the results? If not, then can we
>>> perform some tests on large backups to see the benefits of this patch/idea?
>>>
>>> --
>>> With Regards,
>>> Amit Kapila.
>>> EnterpriseDB: http://www.enterprisedb.com
>>>
>>
>>
>> --
>> --
>> Asif Rehman
>> Highgo Software (Canada/China/Pakistan)
>> URL : www.highgo.ca
>>
>> --
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Amit Kapila <amit.kapila16@gmail.com> — 2020-04-21T11:48:17Z

On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>
> I did some tests a while back, and here are the results. The tests were done to simulate
> a live database environment using pgbench.
>
> machine configuration used for this test:
> Instance Type:    t2.xlarge
> Volume Type  :    io1
> Memory (MiB) :    16384
> vCPU #           :    4
> Architecture    :    X86_64
> IOP                 :    16000
> Database Size (GB) :    102
>
> The setup consist of 3 machines.
> - one for database instances
> - one for pg_basebackup client and
> - one for pgbench with some parallel workers, simulating SELECT loads.
>
>                                    basebackup | 4 workers | 8 Workers  | 16 workers
> Backup Duration(Min):       69.25    |  20.44      | 19.86          | 20.15
> (pgbench running with 50 parallel client simulating SELECT load)
>
> Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
> (pgbench running with 100 parallel client simulating SELECT load)
>

Thanks for sharing the results; these show a nice speedup!  However, I
think we should try to find out what exactly causes this speedup.  If you
see the recent discussion on another thread related to this topic,
Andres pointed out that he doesn't think we can gain much by
having multiple connections [1].  The speedups we are seeing might be due
to some internal limitations (like small buffers) [2].  It might help if
you can share the perf reports of the server side and the pg_basebackup
side.  We don't need a pgbench-type workload to see what caused the speedup.

[1] - https://www.postgresql.org/message-id/20200420201922.55ab7ovg6535suyz%40alap3.anarazel.de
[2] - https://www.postgresql.org/message-id/20200421064420.z7eattzqbunbutz3%40alap3.anarazel.de
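
For reference, the speedups implied by the durations quoted above can be computed as baseline time divided by parallel time. The sketch below is only illustrative arithmetic on those figures; the function names are hypothetical and nothing here comes from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup of a parallel run relative to the single-process baseline. */
static double
speedup(double baseline_min, double parallel_min)
{
    assert(baseline_min > 0.0 && parallel_min > 0.0);
    return baseline_min / parallel_min;
}

/* Print the speedups for the 4-, 8- and 16-worker runs of one test series. */
static void
print_speedups(const char *label, double baseline_min,
               const double runs_min[3])
{
    static const int workers[3] = {4, 8, 16};

    for (int i = 0; i < 3; i++)
        printf("%s, %2d workers: %.2fx\n",
               label, workers[i], speedup(baseline_min, runs_min[i]));
}
```

Calling print_speedups() with a baseline of 69.25 and runs of 20.44, 19.86 and 20.15 gives roughly 3.4x for all three worker counts. With a baseline of 154.75 and runs of 49.28, 45.27 and 20.35, it gives roughly 3.1x, 3.4x and 7.6x.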

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: WIP/PoC for parallel backup

Amit Kapila <amit.kapila16@gmail.com> — 2020-04-21T11:49:45Z

On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> >
> > I did some tests a while back, and here are the results. The tests were done to simulate
> > a live database environment using pgbench.
> >
> > machine configuration used for this test:
> > Instance Type:    t2.xlarge
> > Volume Type  :    io1
> > Memory (MiB) :    16384
> > vCPU #           :    4
> > Architecture    :    X86_64
> > IOP                 :    16000
> > Database Size (GB) :    102
> >
> > The setup consist of 3 machines.
> > - one for database instances
> > - one for pg_basebackup client and
> > - one for pgbench with some parallel workers, simulating SELECT loads.
> >
> >                                    basebackup | 4 workers | 8 Workers  | 16 workers
> > Backup Duration(Min):       69.25    |  20.44      | 19.86          | 20.15
> > (pgbench running with 50 parallel client simulating SELECT load)
> >
> > Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
> > (pgbench running with 100 parallel client simulating SELECT load)
> >
>
> Thanks for sharing the results, these show nice speedup!  However, I
> think we should try to find what exactly causes this speed up.  If you
> see the recent discussion on another thread related to this topic,
> Andres, pointed out that he doesn't think that we can gain much by
> having multiple connections[1].  It might be due to some internal
> limitations (like small buffers) [2] due to which we are seeing these
> speedups.  It might help if you can share the perf reports of the
> server-side and pg_basebackup side.
>

Just to be clear, we need perf reports both with and without patch-set.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: WIP/PoC for parallel backup

Ahsan Hadi <ahsan.hadi@gmail.com> — 2020-04-21T11:56:16Z

On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> > >
> > > I did some tests a while back, and here are the results. The tests were done to simulate
> > > a live database environment using pgbench.
> > >
> > > machine configuration used for this test:
> > > Instance Type:    t2.xlarge
> > > Volume Type  :    io1
> > > Memory (MiB) :    16384
> > > vCPU #           :    4
> > > Architecture    :    X86_64
> > > IOP                 :    16000
> > > Database Size (GB) :    102
> > >
> > > The setup consist of 3 machines.
> > > - one for database instances
> > > - one for pg_basebackup client and
> > > - one for pgbench with some parallel workers, simulating SELECT loads.
> > >
> > >                        basebackup | 4 workers | 8 workers | 16 workers
> > > Backup Duration (min):      69.25 |     20.44 |     19.86 |      20.15
> > > (pgbench running with 50 parallel client simulating SELECT load)
> > >
> > > Backup Duration (min):     154.75 |     49.28 |     45.27 |      20.35
> > > (pgbench running with 100 parallel client simulating SELECT load)
> > >
> >
> > Thanks for sharing the results, these show nice speedup!  However, I
> > think we should try to find what exactly causes this speed up.  If you
> > see the recent discussion on another thread related to this topic,
> > Andres, pointed out that he doesn't think that we can gain much by
> > having multiple connections[1].  It might be due to some internal
> > limitations (like small buffers) [2] due to which we are seeing these
> > speedups.  It might help if you can share the perf reports of the
> > server-side and pg_basebackup side.
> >
>
> Just to be clear, we need perf reports both with and without patch-set.
>

These tests were done a while back. I think it would be good to run the
benchmark again with the latest parallel backup patches and share the
results and perf reports.

>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>

-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca

Re: WIP/PoC for parallel backup

Amit Kapila <amit.kapila16@gmail.com> — 2020-04-21T14:12:37Z

On Tue, Apr 21, 2020 at 5:26 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
>
> On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> > >
>> > > I did some tests a while back, and here are the results. The tests were done to simulate
>> > > a live database environment using pgbench.
>> > >
>> > > machine configuration used for this test:
>> > > Instance Type:    t2.xlarge
>> > > Volume Type  :    io1
>> > > Memory (MiB) :    16384
>> > > vCPU #           :    4
>> > > Architecture    :    X86_64
>> > > IOP                 :    16000
>> > > Database Size (GB) :    102
>> > >
>> > > The setup consist of 3 machines.
>> > > - one for database instances
>> > > - one for pg_basebackup client and
>> > > - one for pgbench with some parallel workers, simulating SELECT loads.
>> > >
>> > >                                    basebackup | 4 workers | 8 Workers  | 16 workers
>> > > Backup Duration(Min):       69.25    |  20.44      | 19.86          | 20.15
>> > > (pgbench running with 50 parallel client simulating SELECT load)
>> > >
>> > > Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
>> > > (pgbench running with 100 parallel client simulating SELECT load)
>> > >
>> >
>> > Thanks for sharing the results, these show nice speedup!  However, I
>> > think we should try to find what exactly causes this speed up.  If you
>> > see the recent discussion on another thread related to this topic,
>> > Andres, pointed out that he doesn't think that we can gain much by
>> > having multiple connections[1].  It might be due to some internal
>> > limitations (like small buffers) [2] due to which we are seeing these
>> > speedups.  It might help if you can share the perf reports of the
>> > server-side and pg_basebackup side.
>> >
>>
>> Just to be clear, we need perf reports both with and without patch-set.
>
>
> These tests were done a while back, I think it would be good to run the benchmark again with the latest patches of parallel backup and share the results and perf reports.
>

Sounds good. I think we should also run the test with 1 worker.
The reason it will be good to see the results with 1 worker
is that we can then tell whether the technique of sending file by file,
as done in this patch, is better or worse than the current HEAD code.  So,
it will be good to see the results for unpatched code, 1 worker, 2
workers, 4 workers, etc.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: WIP/PoC for parallel backup

Dipesh Pandit <dipesh.pandit@gmail.com> — 2020-04-22T11:07:25Z

Hi Asif,

I am reviewing your recent patch and found that it does not apply to the latest master.

Could you please resolve the conflicts and post an updated patch?

Thanks,
Dipesh
EnterpriseDB: http://www.enterprisedb.com

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-22T14:18:20Z

Hi Dipesh,

The rebased and updated patch is attached. It's rebased onto 9f2c4ede.


> +typedef struct
> +{
> ...
> +} BackupFile;
> +
> +typedef struct
> +{
> ...
> +} BackupState;
>
> These structures need comments.
>
Done.


>
> +list_wal_files_opt_list:
> +                       SCONST SCONST
>                                 {
> -                                 $$ = makeDefElem("manifest_checksums",
> -
> (Node *)makeString($2), -1);
> +                                       $$ = list_make2(
> +                                       makeDefElem("start_wal_location",
> +                                               (Node *)makeString($2),
> -1),
> +                                       makeDefElem("end_wal_location",
> +                                               (Node *)makeString($2),
> -1));
> +
>                                 }
>
> This seems like an unnecessarily complicated parse representation. The
> DefElems seem to be completely unnecessary here.
>

The startptr and endptr are now in shared state, so this command no longer
needs these two options. I have removed this rule entirely.


> @@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
>                 set_ps_display(activitymsg);
>         }
>
> -       perform_base_backup(&opt);
> +       switch (cmd->cmdtag)
>
> So the design here is that SendBaseBackup() is now going to do a bunch
> of things that are NOT sending a base backup? With no updates to the
> comments of that function and no change to the process title it sets?
>

Okay. I have renamed the function and have updated the comments.


>
> -       return (manifest->buffile != NULL);
> +       return (manifest && manifest->buffile != NULL);
>
> Heck no. It appears that you didn't even bother reading the function
> header comment.
>

Okay, I forgot to remove this check. In the backup manifest patch, the
manifest_info object is always available. Anyway, I have removed this check
from the 003 patch as well.


>
> + * Send a single resultset containing XLogRecPtr record (in text format)
> + * TimelineID and backup label.
>   */
>  static void
> -SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
> +SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
> +                                        StringInfo label, char *backupid)
>
> This just casually breaks wire protocol compatibility, which seems
> completely unacceptable.
>

A non-parallel backup returns startptr and tli in the result set.
START_BACKUP returns startptr, tli, backup label and backupid, so I had
extended this result set.

I have removed the changes from SendXlogRecPtrResult and have added another
function just for returning the result set for parallel backup.


>
> +       if (strlen(opt->tablespace) > 0)
> +               sendTablespace(opt->tablespace, NULL, true, NULL, &files);
> +       else
> +               sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
> +
> +       SendFilesHeader(files);
>
> So I guess the idea here is that we buffer the entire list of files in
> memory, regardless of size, and then we send it out afterwards. That
> doesn't seem like a good idea. The list of files might be very large.
> We probably need some code refactoring here rather than just piling
> more and more different responsibilities onto sendTablespace() and
> sendDir().
>

I don't foresee memory being a challenge here. Assuming a database
containing 10240 relation files (which can reach up to 10 TB in size), the
list will occupy approximately 102 MB of memory. This can obviously be
reduced, but it doesn't seem too bad either. One way of doing it is to
fetch a smaller set of files and have the client request the next set once
the current one is processed; perhaps fetch per tablespace initially and
request the next one once the current one is done.

Currently, basebackup only does compression on the client side, so I
suggest we stick with the existing behavior. On the other thread, you
mentioned that the backend should send the tarballs and that the server
should decide which files go into each tarball. I believe the current
design can accommodate that easily if it's the client deciding the files
per tarball. The current design can also accommodate server-side
compression and encryption with minimal changes. Is there a point I'm
overlooking here?



>
> +       if (state->parallel_mode)
> +               SpinLockAcquire(&state->lock);
> +
> +       state->throttling_counter += increment;
> +
> +       if (state->parallel_mode)
> +               SpinLockRelease(&state->lock);
>
> I don't like this much. It seems to me that we would do better to use
> atomics here all the time, instead of conditional spinlocks.
>

Okay, I have made throttling_counter atomic. However, a lock is still
required for throttling_counter %= throttling_sample.




>
> +static void
> +send_file(basebackup_options *opt, char *file, bool missing_ok)
> ...
> +       if (file == NULL)
> +               return;
>
> That seems totally inappropriate.
>

Removed.


> +                       sendFile(file, file + basepathlen, &statbuf,
> true, InvalidOid, NULL, NULL);
>
> Maybe I'm misunderstanding, but this looks like it's going to write a
> tar header, even though we're not writing a tarfile.
>

sendFile() always sends files with a tar header included, even if the backup
mode is plain. pg_basebackup also expects the same. That's the current
behavior of the system.

Otherwise, we will have to duplicate this function, and the copy would be
doing pretty much the same thing, except for the tar header.



>
> +               else
> +                       ereport(WARNING,
> +                                       (errmsg("skipping special file
> or directory \"%s\"", file)));
>
> So, if the user asks for a directory or symlink, what's going to
> happen is that they're going to receive an empty file, and get a
> warning. That sounds like terrible behavior.
>

Removed the warning; an error is now raised if anything other than a
regular file is requested.


>
>
> +       /*
> +        * Check for checksum failures. If there are failures across
> multiple
> +        * processes it may not report total checksum count, but it will
> error
> +        * out,terminating the backup.
> +        */
>
> In other words, the patch breaks the feature. Not that the feature in
> question works particularly well as things stand, but this makes it
> worse.
>

Added an atomic uint64 total_checksum_failures to the shared state to keep
the total count across workers, so it will have the same behavior as the
current code.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-04-22T16:27:35Z

On Wed, Apr 22, 2020 at 10:18 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I don't foresee memory to be a challenge here. Assuming a database containing 10240
> relation files (that max reach to 10 TB of size), the list will occupy approximately 102MB
> of space in memory. This obviously can be reduced, but it doesn’t seem too bad either.
> One way of doing it is by fetching a smaller set of files and clients can result in the next
> set if the current one is processed; perhaps fetch initially per table space and request for
> next one once the current one is done with.

The more concerning case is when someone has a lot of small files.
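
To put rough numbers on this: the estimate quoted above works out to about 10 kB per list entry (102 MB for 10240 files). Taking that per-entry figure as an assumption, the list size depends only on the number of files, not on their size. The helper below is a hypothetical illustration of that arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed to hold the whole file list in memory at once.
 * 'bytes_per_entry' is an assumed average size of one list entry.
 */
static uint64_t
file_list_bytes(uint64_t nfiles, uint64_t bytes_per_entry)
{
    /* guard against overflow of the multiplication */
    assert(bytes_per_entry == 0 || nfiles <= UINT64_MAX / bytes_per_entry);
    return nfiles * bytes_per_entry;
}
```

With 10000 bytes per entry, file_list_bytes(10240, 10000) is 102,400,000 bytes, which matches the quoted estimate. A hypothetical cluster with one million small files would need 10,000,000,000 bytes (about 10 GB) at the same per-entry size.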

> Okay have added throttling_counter as atomic. however a lock is still required
> for  throttling_counter%=throttling_sample.

Well, if you can't get rid of the lock, using atomics is pointless.
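
For illustration only, one lock-free way to fold the modulo into the update is a compare-and-swap loop. The sketch below uses C11 <stdatomic.h> as a stand-in for PostgreSQL's atomics API; the struct and function names are hypothetical, and this is not code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state; only the two fields needed here are shown. */
typedef struct
{
    _Atomic int64_t throttling_counter; /* bytes sent since last sample */
    int64_t         throttling_sample;  /* sample size in bytes, must be > 0 */
} ThrottleState;

/*
 * Add 'increment' to the shared counter without taking a lock.
 *
 * The new value is computed from a snapshot and installed with
 * compare-and-swap; if another worker changed the counter in between,
 * the loop retries with the fresh value. Returns true when this call
 * crossed the sample boundary, in which case the counter has already
 * been reduced modulo throttling_sample.
 */
static bool
throttle_add(ThrottleState *state, int64_t increment)
{
    int64_t old = atomic_load(&state->throttling_counter);
    int64_t next;
    bool    crossed;

    assert(state->throttling_sample > 0 && increment >= 0);
    do
    {
        next = old + increment;
        crossed = (next >= state->throttling_sample);
        if (crossed)
            next %= state->throttling_sample;
    } while (!atomic_compare_exchange_weak(&state->throttling_counter,
                                           &old, next));
    return crossed;
}
```

Only the worker whose call returns true would then go on to do the actual throttling sleep; that part is outside the scope of this sketch.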

>> +                       sendFile(file, file + basepathlen, &statbuf,
>> true, InvalidOid, NULL, NULL);
>>
>> Maybe I'm misunderstanding, but this looks like it's going to write a
>> tar header, even though we're not writing a tarfile.
>
> sendFile() always sends files with tar header included, even if the backup mode
> is plain. pg_basebackup also expects the same. That's the current behavior of
> the system.
>
> Otherwise, we will have to duplicate this function which would be doing the pretty
> much same thing, except the tar header.

Well, as I said before, the solution to that problem is refactoring,
not crummy interfaces. You're never going to persuade any committer
who understands what that code actually does to commit it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-23T06:43:33Z

On Wed, Apr 22, 2020 at 7:48 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
> Hi Dipesh,
>
> The rebased and updated patch is attached. Its rebased to (9f2c4ede).
>

Make is failing for v15 patch.

gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
-g -g -O0 -I. -I. -I../../../src/include  -D_GNU_SOURCE   -c -o
basebackup.o basebackup.c -MMD -MP -MF .deps/basebackup.Po
In file included from basebackup.c:33:
../../../src/include/replication/backup_manifest.h:37: error: redefinition
of typedef ‘manifest_info’
../../../src/include/replication/basebackup.h:35: note: previous
declaration of ‘manifest_info’ was here
make[3]: *** [basebackup.o] Error 1
make[3]: Leaving directory
`/home/edb/WORKDB/PG2/postgresql/src/backend/replication'
make[2]: *** [replication-recursive] Error 2


>
>

Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com> — 2020-04-23T08:17:03Z

On Thu, Apr 23, 2020 at 11:43 AM Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

>
>
> On Wed, Apr 22, 2020 at 7:48 PM Asif Rehman <asifr.rehman@gmail.com>
> wrote:
>
>>
>> Hi Dipesh,
>>
>> The rebased and updated patch is attached. Its rebased to (9f2c4ede).
>>
>
> Make is failing for v15 patch.
>
> gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
> -g -g -O0 -I. -I. -I../../../src/include  -D_GNU_SOURCE   -c -o
> basebackup.o basebackup.c -MMD -MP -MF .deps/basebackup.Po
> In file included from basebackup.c:33:
> ../../../src/include/replication/backup_manifest.h:37: error: redefinition
> of typedef ‘manifest_info’
> ../../../src/include/replication/basebackup.h:35: note: previous
> declaration of ‘manifest_info’ was here
> make[3]: *** [basebackup.o] Error 1
> make[3]: Leaving directory
> `/home/edb/WORKDB/PG2/postgresql/src/backend/replication'
> make[2]: *** [replication-recursive] Error 2
>
>
I just compiled it on a clean source tree and it compiles fine. Can you
check whether you have a clean source tree?


-- 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Re: WIP/PoC for parallel backup

Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-04-23T09:53:05Z

On Thu, Apr 23, 2020 at 1:47 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Thu, Apr 23, 2020 at 11:43 AM Rajkumar Raghuwanshi <
> rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
>>
>>
>> On Wed, Apr 22, 2020 at 7:48 PM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>>
>>>
>>> Hi Dipesh,
>>>
>>> The rebased and updated patch is attached. Its rebased to (9f2c4ede).
>>>
>>
>> Make is failing for v15 patch.
>>
>> gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
>> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
>> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
>> -g -g -O0 -I. -I. -I../../../src/include  -D_GNU_SOURCE   -c -o
>> basebackup.o basebackup.c -MMD -MP -MF .deps/basebackup.Po
>> In file included from basebackup.c:33:
>> ../../../src/include/replication/backup_manifest.h:37: error:
>> redefinition of typedef ‘manifest_info’
>> ../../../src/include/replication/basebackup.h:35: note: previous
>> declaration of ‘manifest_info’ was here
>> make[3]: *** [basebackup.o] Error 1
>> make[3]: Leaving directory
>> `/home/edb/WORKDB/PG2/postgresql/src/backend/replication'
>> make[2]: *** [replication-recursive] Error 2
>>
>>
> Just compiled on clean source and its compiling fine. Can you see if you
> have a clean source tree?
>
Yeah, my source tree was not clean. My colleague Suraj is also able to compile.
Thanks, sorry for the noise.


>
>
> --
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>

Re: WIP/PoC for parallel backup

David Zhang <david.zhang@highgo.ca> — 2020-04-27T16:53:16Z

Hi,

Here are the parallel backup performance test results with and without
the patch "parallel_backup_v15" in an AWS cloud environment. Two
"t2.xlarge" machines were used: one for the Postgres server and the other
for pg_basebackup, both with the machine configuration shown below.

Machine configuration:
     Instance Type        :t2.xlarge
     Volume type          :io1
     Memory               :16GB
     vCPU #               :4
     Architecture         :x86_64
     IOPS                 :6000
     Database Size (GB)   :108

Performance test results:
without patch:
     real 18m49.346s
     user 1m24.178s
     sys 7m2.966s

1 worker with patch:
     real 18m43.201s
     user 1m55.787s
     sys 7m24.724s

2 workers with patch:
     real 18m47.373s
     user 2m22.970s
     sys 11m23.891s

4 workers with patch:
     real 18m46.878s
     user 2m26.791s
     sys 13m14.716s

As requested, I didn't have pgbench running in parallel as we did
in the previous benchmark.
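
One way to read these figures is to compare CPU time (user + sys) against wall-clock time (real). The sketch below assumes the values are in the "18m49.346s" format printed by the shell's time command; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a value such as "18m49.346s" to seconds; returns -1 on parse error. */
static double
parse_time_seconds(const char *text)
{
    int    minutes;
    double seconds;

    assert(text != NULL);
    if (sscanf(text, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Average number of CPUs kept busy during the run: (user + sys) / real. */
static double
cpu_utilization(const char *real, const char *user, const char *sys)
{
    return (parse_time_seconds(user) + parse_time_seconds(sys))
        / parse_time_seconds(real);
}
```

For the unpatched run this ratio is about 0.45, and for the 4-worker run it is about 0.84, so CPU time nearly doubles while the wall-clock time stays at roughly 18m47s.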

The perf report files for both Postgres server and pg_basebackup sides 
are attached.

The files are listed below, i.e. without the patch, and with the patch
using 1, 2 and 4 workers.

perf report on Postgres server side:
     perf.data-postgres-without-parallel_backup_v15.txt
     perf.data-postgres-with-parallel_backup_v15-j1.txt
     perf.data-postgres-with-parallel_backup_v15-j2.txt
     perf.data-postgres-with-parallel_backup_v15-j4.txt

perf report on pg_basebackup side:
     perf.data-pg_basebackup-without-parallel_backup_v15.txt
     perf.data-pg_basebackup-with-parallel_backup_v15-j1.txt
     perf.data-pg_basebackup-with-parallel_backup_v15-j2.txt
     perf.data-pg_basebackup-with-parallel_backup_v15-j4.txt


If any more information is required, please let me know.


On 2020-04-21 7:12 a.m., Amit Kapila wrote:
> On Tue, Apr 21, 2020 at 5:26 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
>> On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>>>>> I did some tests a while back, and here are the results. The tests were done to simulate
>>>>> a live database environment using pgbench.
>>>>>
>>>>> machine configuration used for this test:
>>>>> Instance Type:    t2.xlarge
>>>>> Volume Type  :    io1
>>>>> Memory (MiB) :    16384
>>>>> vCPU #           :    4
>>>>> Architecture    :    X86_64
>>>>> IOP                 :    16000
>>>>> Database Size (GB) :    102
>>>>>
>>>>> The setup consist of 3 machines.
>>>>> - one for database instances
>>>>> - one for pg_basebackup client and
>>>>> - one for pgbench with some parallel workers, simulating SELECT loads.
>>>>>
>>>>>                                     basebackup | 4 workers | 8 Workers  | 16 workers
>>>>> Backup Duration(Min):       69.25    |  20.44      | 19.86          | 20.15
>>>>> (pgbench running with 50 parallel client simulating SELECT load)
>>>>>
>>>>> Backup Duration(Min):       154.75   |  49.28     | 45.27         | 20.35
>>>>> (pgbench running with 100 parallel client simulating SELECT load)
>>>>>
>>>> Thanks for sharing the results, these show nice speedup!  However, I
>>>> think we should try to find what exactly causes this speed up.  If you
>>>> see the recent discussion on another thread related to this topic,
>>>> Andres, pointed out that he doesn't think that we can gain much by
>>>> having multiple connections[1].  It might be due to some internal
>>>> limitations (like small buffers) [2] due to which we are seeing these
>>>> speedups.  It might help if you can share the perf reports of the
>>>> server-side and pg_basebackup side.
>>>>
>>> Just to be clear, we need perf reports both with and without patch-set.
>>
>> These tests were done a while back, I think it would be good to run the benchmark again with the latest patches of parallel backup and share the results and perf reports.
>>
> Sounds good. I think we should also try to run the test with 1 worker
> as well.  The reason it will be good to see the results with 1 worker
> is that we can know if the technique to send file by file as is done
> in this patch is better or worse than the current HEAD code.  So, it
> will be good to see the results of an unpatched code, 1 worker, 2
> workers, 4 workers, etc.
>
-- 
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca

Re: WIP/PoC for parallel backup

Amit Kapila <amit.kapila16@gmail.com> — 2020-04-28T03:15:55Z

On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote:
>
> Hi,
>
> Here is the parallel backup performance test results with and without
> the patch "parallel_backup_v15" on AWS cloud environment. Two
> "t2.xlarge" machines were used: one for Postgres server and the other
> one for pg_basebackup with the same machine configuration showing below.
>
> Machine configuration:
>      Instance Type        :t2.xlarge
>      Volume type          :io1
>      Memory (MiB)         :16GB
>      vCPU #               :4
>      Architecture         :x86_64
>      IOP                  :6000
>      Database Size (GB)   :108
>
> Performance test results:
> without patch:
>      real 18m49.346s
>      user 1m24.178s
>      sys 7m2.966s
>
> 1 worker with patch:
>      real 18m43.201s
>      user 1m55.787s
>      sys 7m24.724s
>
> 2 worker with patch:
>      real 18m47.373s
>      user 2m22.970s
>      sys 11m23.891s
>
> 4 worker with patch:
>      real 18m46.878s
>      user 2m26.791s
>      sys 13m14.716s
>
> As required, I didn't have the pgbench running in parallel like we did
> in the previous benchmark.
>

So, there doesn't seem to be any significant improvement in this
scenario.  Now, it is not clear why there was a significant
improvement in the previous run where pgbench was also running
simultaneously.  I am not sure but maybe it is because when a lot of
other backends were running (performing read-only workload) the
backend that was responsible for doing backup was getting frequently
scheduled out and it slowed down the overall backup process.  And when
we start using multiple backends for backup one or other backup
process is always running making the overall backup faster.  One idea
to find this out is to check how much time backup takes when we run it
with and without pgbench workload on HEAD (aka unpatched code).  Even
if what I am saying is true or there is some other reason due to which
we are seeing speedup in some cases (where there is a concurrent
workload), it might not make the case for using multiple backends for
backup but still, it is good to find that information as it might help
in designing this feature better.
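
As a rough cross-check of the figures quoted above, the `time` output can be converted to seconds and compared against the unpatched run. This is only an illustrative sketch: the helper name `to_seconds` and the dictionary layout are assumptions made for this example, and the user/sys values are treated as CPU time of the timed client process, which is also an assumption.

```python
import re


def to_seconds(duration):
    """Convert a `time`-style duration such as '18m49.346s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (real, user, sys) copied from the AWS t2.xlarge results quoted above.
runs = {
    "without patch": ("18m49.346s", "1m24.178s", "7m2.966s"),
    "1 worker": ("18m43.201s", "1m55.787s", "7m24.724s"),
    "2 workers": ("18m47.373s", "2m22.970s", "11m23.891s"),
    "4 workers": ("18m46.878s", "2m26.791s", "13m14.716s"),
}

base_real = to_seconds(runs["without patch"][0])
summary = {}
for name, (real, user, sys_time) in runs.items():
    real_s = to_seconds(real)
    summary[name] = {
        # Positive means the run finished faster than the unpatched one.
        "real_change_pct": 100.0 * (base_real - real_s) / base_real,
        # CPU seconds charged to the timed process (user + sys).
        "cpu_seconds": to_seconds(user) + to_seconds(sys_time),
    }

for name, row in summary.items():
    print(f"{name:14s} real change {row['real_change_pct']:+.2f}%  "
          f"cpu {row['cpu_seconds']:.1f}s")
```

On these numbers the wall-clock time moves by well under one percent in every configuration, while the CPU time charged to the timed process grows with the worker count.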

> The perf report files for both Postgres server and pg_basebackup sides
> are attached.
>

It is not clear which functions are taking more time or for which
functions time is reduced, as function symbols are not present in the
reports.  I think you can refer to
"https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
take profiles, and additionally build with -fno-omit-frame-pointer
(you can pass CFLAGS="-fno-omit-frame-pointer" to configure).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: WIP/PoC for parallel backup

Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-04-29T12:41:07Z

Hi,

We at EnterpriseDB did some performance testing of this parallel backup
patch to check how beneficial it is; the results are below.
In this testing, we ran the backup:
1) Without Asif’s patch
2) With Asif’s patch, using 1, 2, 4, and 8 workers.

We ran those tests on two setups:

1) Client and server both on the same machine (local backups)

2) Client and server on different machines (remote backups)


*Machine details: *

1: Server (on which local backups performed and used as server for remote
backups)

2: Client (Used as a client for remote backups)


*Server:*

RAM: 500 GB
CPU details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Filesystem: ext4


*Client:*
RAM: 490 GB
CPU details:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
Filesystem: ext4

Below are the results for the local test:

Each data set consists of tables of around 1.05 GB each (10 GB = 10 tables,
50 GB = 50 tables, 100 GB = 100 tables, 200 GB = 200 tables).  "Change" is
the % performance increased/decreased compared to a normal backup (without
patch).

Data size  Configuration   real        user       sys         Change
---------  --------------  ----------  ---------  ----------  -------------
10 GB      without patch   0m27.016s   0m3.378s   0m23.059s   -
10 GB      1 worker        0m30.314s   0m3.575s   0m22.946s   12% decreased
10 GB      2 workers       0m20.400s   0m3.622s   0m29.670s   27% increased
10 GB      4 workers       0m15.331s   0m3.706s   0m39.189s   43% increased
10 GB      8 workers       0m15.094s   0m3.915s   1m23.350s   44% increased

50 GB      without patch   2m11.049s   0m16.464s  2m1.757s    -
50 GB      1 worker        2m26.621s   0m18.497s  2m4.792s    21% decreased
50 GB      2 workers       1m9.581s    0m18.298s  2m12.030s   46% increased
50 GB      4 workers       0m53.894s   0m18.588s  2m47.390s   58% increased
50 GB      8 workers       0m55.373s   0m18.423s  5m57.470s   57% increased

100 GB     without patch   4m4.776s    0m33.699s  3m27.777s   -
100 GB     1 worker        4m20.862s   0m35.753s  3m28.262s   6% decreased
100 GB     2 workers       2m37.411s   0m36.440s  4m16.424s   35% increased
100 GB     4 workers       1m49.503s   0m37.200s  5m58.077s   55% increased
100 GB     8 workers       1m36.762s   0m36.987s  9m36.906s   60% increased

200 GB     without patch   10m34.998s  1m8.471s   7m21.520s   -
200 GB     1 worker        11m30.899s  1m12.933s  8m14.496s   8% decreased
200 GB     2 workers       6m8.481s    1m13.771s  9m31.216s   41% increased
200 GB     4 workers       4m2.403s    1m18.331s  12m29.661s  61% increased
200 GB     8 workers       4m3.768s    1m24.547s  15m21.421s  61% increased

Results for the remote test:

Each data set consists of tables of around 1.05 GB each (10 GB = 10 tables,
50 GB = 50 tables, 100 GB = 100 tables, 200 GB = 200 tables).  "Change" is
the % performance increased/decreased compared to a normal backup (without
patch).

Data size  Configuration   real        user       sys         Change
---------  --------------  ----------  ---------  ----------  ---------------
10 GB      without patch   1m36.829s   0m2.124s   0m14.004s   -
10 GB      1 worker        1m37.598s   0m3.272s   0m11.110s   0.8% decreased
10 GB      2 workers       1m36.753s   0m2.627s   0m15.312s   0.08% increased
10 GB      4 workers       1m37.212s   0m3.835s   0m13.221s   0.3% decreased
10 GB      8 workers       1m36.977s   0m4.475s   0m17.937s   0.1% decreased

50 GB      without patch   7m54.211s   0m10.826s  1m10.435s   -
50 GB      1 worker        7m55.603s   0m16.535s  1m8.147s    0.2% decreased
50 GB      2 workers       7m53.499s   0m18.131s  1m8.822s    0.1% increased
50 GB      4 workers       7m54.687s   0m15.818s  1m30.991s   0.1% decreased
50 GB      8 workers       7m54.658s   0m20.783s  1m34.460s   0.1% decreased

100 GB     without patch   15m45.776s  0m21.802s  2m59.006s   -
100 GB     1 worker        15m46.315s  0m32.499s  2m47.245s   0.05% decreased
100 GB     2 workers       15m46.065s  0m28.877s  2m21.181s   0.03% decreased
100 GB     4 workers       15m47.793s  0m30.932s  2m36.708s   0.2% decreased
100 GB     8 workers       15m47.129s  0m35.151s  3m23.572s   0.14% decreased

200 GB     without patch   32m55.720s  0m50.602s  5m38.875s   -
200 GB     1 worker        31m30.602s  0m45.377s  4m57.405s   4% increased
200 GB     2 workers       31m30.214s  0m55.023s  5m8.689s    4% increased
200 GB     4 workers       31m31.187s  1m13.390s  5m40.861s   4% increased
200 GB     8 workers       31m31.729s  1m4.955s   6m35.774s   4% increased


With the client and server on the same machine, the results show around
50% improvement for the parallel runs with 4 and 8 workers.  We don’t see
a big additional improvement as more workers are added.


Whereas, when the client and server are on different machines, we don’t see
any major benefit in performance.  This matches the test results posted by
David Zhang upthread.
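
To make the arithmetic behind the percentages explicit, here is a small sketch using only the "real" times reported for the 200GB local test. The variable names and the "efficiency" measure (speedup divided by worker count) are introduced purely for illustration and are not part of the benchmark itself.

```python
# Wall-clock ("real") times for the 200GB local test, as (minutes, seconds).
# Key 0 stands for the unpatched run; the other keys are worker counts.
real_times = {
    0: (10, 34.998),
    1: (11, 30.899),
    2: (6, 8.481),
    4: (4, 2.403),
    8: (4, 3.768),
}

seconds = {k: m * 60 + s for k, (m, s) in real_times.items()}
baseline = seconds[0]

improvement_pct = {}  # positive = faster than the unpatched backup
speedup = {}          # baseline time divided by parallel time
efficiency = {}       # speedup per worker (illustrative measure)
for workers, elapsed in seconds.items():
    if workers == 0:
        continue
    improvement_pct[workers] = 100.0 * (baseline - elapsed) / baseline
    speedup[workers] = baseline / elapsed
    efficiency[workers] = speedup[workers] / workers

for workers in sorted(improvement_pct):
    print(f"{workers} worker(s): {improvement_pct[workers]:+.1f}%  "
          f"speedup {speedup[workers]:.2f}x  "
          f"efficiency {efficiency[workers]:.2f}")
```

Going from 4 to 8 workers leaves the speedup essentially unchanged on these numbers, so the per-worker efficiency roughly halves.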



We ran the test for the 100GB backup with 4 parallel workers to see the CPU
usage and other information. What we noticed is that the server is consuming
almost 100% CPU the whole time, and pg_stat_activity shows that the server
is busy with ClientWrite most of the time.


Attaching captured output for:

1) top command output on the server, taken every 5 seconds

2) pg_stat_activity output, taken every 5 seconds

3) top command output on the client, taken every 5 seconds


Do let me know if anyone has further questions/inputs for the benchmarking.

Thanks to Rushabh Lathia for helping me with this testing.


-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.

Re: WIP/PoC for parallel backup

David Zhang <david.zhang@highgo.ca> — 2020-04-30T06:26:16Z

Hi,

Thanks a lot for sharing the test results. Here are our test results
using perf on three AWS t2.xlarge instances with the configuration below.

Machine configuration:
       Instance Type        :t2.xlarge
       Volume type          :io1
       Memory               :16GB
       vCPU #               :4
       Architecture         :x86_64
       IOPS                 :6000
       Database Size (GB)   :45 (Server)

case 1: postgres server: without patch and without load

* Disk I/O:

# Samples: 342K of event 'block:block_rq_insert'
# Event count (approx.): 342834
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  .....................
#
    97.65%  postgres         [kernel.kallsyms]  [k] __elv_add_request
     2.27%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request


* CPU:

# Samples: 6M of event 'cpu-clock'
# Event count (approx.): 1559444750000
#
# Overhead  Command          Shared Object         Symbol
# ........  ...............  ....................  .............................................
#
    64.73%  swapper          [kernel.kallsyms]     [k] native_safe_halt
    10.89%  postgres         [vdso]                [.] __vdso_gettimeofday
     5.64%  postgres         [kernel.kallsyms]     [k] do_syscall_64
     5.43%  postgres         libpthread-2.26.so    [.] __libc_recv
     1.72%  postgres         [kernel.kallsyms]     [k] pvclock_clocksource_read

* Network:

# Samples: 2M of event 'skb:consume_skb'
# Event count (approx.): 2739785
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  ...........................
#
    91.58%  swapper          [kernel.kallsyms]  [k] consume_skb
     7.09%  postgres         [kernel.kallsyms]  [k] consume_skb
     0.61%  kswapd0          [kernel.kallsyms]  [k] consume_skb
     0.44%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb


case 1: pg_basebackup client: without patch and without load

* Disk I/O:

# Samples: 371K of event 'block:block_rq_insert'
# Event count (approx.): 371362
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  .....................
#
    96.78%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
     2.82%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
     0.29%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
     0.09%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request


* CPU:

# Samples: 3M of event 'cpu-clock'
# Event count (approx.): 903527000000
#
# Overhead  Command          Shared Object       Symbol
# ........  ...............  ..................  .............................................
#
    87.99%  swapper          [kernel.kallsyms]   [k] native_safe_halt
     3.14%  swapper          [kernel.kallsyms]   [k] __lock_text_start
     0.48%  swapper          [kernel.kallsyms]   [k] __softirqentry_text_start
     0.37%  pg_basebackup    [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
     0.35%  swapper          [kernel.kallsyms]   [k] do_csum

* Network:

# Samples: 12M of event 'skb:consume_skb'
# Event count (approx.): 12260713
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  ...........................
#
    95.12%  swapper          [kernel.kallsyms]  [k] consume_skb
     3.23%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
     0.83%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
     0.45%  kswapd0          [kernel.kallsyms]  [k] consume_skb


case 2: postgres server: with patch and with load, 4 backup workers on 
client side

* Disk I/O:

# Samples: 3M of event 'block:block_rq_insert'
# Event count (approx.): 3634542
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  .....................
#
    98.88%  postgres         [kernel.kallsyms]  [k] __elv_add_request
     0.66%  perf             [kernel.kallsyms]  [k] __elv_add_request
     0.42%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
     0.01%  sshd             [kernel.kallsyms]  [k] __elv_add_request

* CPU:

# Samples: 9M of event 'cpu-clock'
# Event count (approx.): 2299129250000
#
# Overhead  Command          Shared Object          Symbol
# ........  ...............  .....................  .............................................
#
    52.73%  swapper          [kernel.kallsyms]      [k] native_safe_halt
     8.31%  postgres         [vdso]                 [.] __vdso_gettimeofday
     4.46%  postgres         [kernel.kallsyms]      [k] do_syscall_64
     4.16%  postgres         libpthread-2.26.so     [.] __libc_recv
     1.58%  postgres         [kernel.kallsyms]      [k] __lock_text_start
     1.52%  postgres         [kernel.kallsyms]      [k] pvclock_clocksource_read
     0.81%  postgres         [kernel.kallsyms]      [k] copy_user_enhanced_fast_string


* Network:

# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6048795
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  ...........................
#
    85.81%  postgres         [kernel.kallsyms]  [k] consume_skb
    12.03%  swapper          [kernel.kallsyms]  [k] consume_skb
     0.97%  postgres         [kernel.kallsyms]  [k] __consume_stateless_skb
     0.85%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
     0.24%  perf             [kernel.kallsyms]  [k] consume_skb


case 2: pg_basebackup 4 workers: with patch and with load

* Disk I/O:

# Samples: 372K of event 'block:block_rq_insert'
# Event count (approx.): 372360
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  .....................
#
    97.26%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
     1.45%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
     0.95%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
     0.14%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request


* CPU:

# Samples: 4M of event 'cpu-clock'
# Event count (approx.): 1234071000000
#
# Overhead  Command          Shared Object             Symbol
# ........  ...............  ........................  .................................................
#
    89.25%  swapper          [kernel.kallsyms]         [k] native_safe_halt
     0.93%  pg_basebackup    [kernel.kallsyms]         [k] __lock_text_start
     0.91%  swapper          [kernel.kallsyms]         [k] __lock_text_start
     0.69%  pg_basebackup    [kernel.kallsyms]         [k] copy_user_enhanced_fast_string
     0.45%  swapper          [kernel.kallsyms]         [k] do_csum


* Network:

# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6449013
#
# Overhead  Command          Shared Object      Symbol
# ........  ...............  .................  ...........................
#
    90.28%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
     9.09%  swapper          [kernel.kallsyms]  [k] consume_skb
     0.29%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
     0.21%  sshd             [kernel.kallsyms]  [k] consume_skb


The detailed perf reports are attached, covering different scenarios, i.e.
without the patch (with and without load, for server and client) and with
the patch (with and without load, for 1, 2, 4, and 8 workers, for both
server and client). The file names should be self-explanatory.

Let me know if more information is required.

Best regards,

David

-- 
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca

Re: WIP/PoC for parallel backup

Sumanta Mukherjee <sumanta.mukherjee@enterprisedb.com> — 2020-04-30T09:18:23Z

Hi,

Would it be possible to post the absolute numbers from perf, so that it is
easier to understand the amount of improvement with and without the
patch, and with different loads and worker counts?

I am also unsure why the swapper is taking such a huge percentage of the
absolute time in the base run of just the postgres server and
pg_basebackup client.
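
One way to get a first approximation of absolute numbers from the reports already posted is to multiply each overhead percentage by the "Event count (approx.)" of the same report. The sketch below does that for the two server-side cpu-clock reports quoted below. It assumes that cpu-clock event counts are in nanoseconds summed over all CPUs; the function and dictionary names are made up for this example. Note that swapper is the kernel's idle task, so time attributed to it under native_safe_halt is most likely idle CPU rather than work done for the backup.

```python
NS_PER_SECOND = 1_000_000_000  # assumption: cpu-clock counts nanoseconds


def overhead_to_seconds(event_count, overhead_pct):
    """Turn a perf overhead percentage into approximate CPU seconds."""
    return event_count * (overhead_pct / 100.0) / NS_PER_SECOND


# Top entries copied from the server-side cpu-clock reports in this thread.
cpu_profiles = {
    "case 1 server (no patch, no load)": {
        "event_count": 1559444750000,
        "overhead": {
            ("swapper", "native_safe_halt"): 64.73,
            ("postgres", "__vdso_gettimeofday"): 10.89,
            ("postgres", "do_syscall_64"): 5.64,
            ("postgres", "__libc_recv"): 5.43,
        },
    },
    "case 2 server (patch, load, 4 workers)": {
        "event_count": 2299129250000,
        "overhead": {
            ("swapper", "native_safe_halt"): 52.73,
            ("postgres", "__vdso_gettimeofday"): 8.31,
            ("postgres", "do_syscall_64"): 4.46,
            ("postgres", "__libc_recv"): 4.16,
        },
    },
}

absolute = {
    case: {
        key: overhead_to_seconds(profile["event_count"], pct)
        for key, pct in profile["overhead"].items()
    }
    for case, profile in cpu_profiles.items()
}

for case, rows in absolute.items():
    print(case)
    for (command, symbol), secs in rows.items():
        print(f"  {command:10s} {symbol:22s} {secs:8.1f}s")
```

The two cases differ in both patch and load, so the resulting seconds are not a like-for-like comparison; they only show how the percentages translate into absolute CPU time.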

With Regards,
Sumanta Mukherjee.
EnterpriseDB: http://www.enterprisedb.com


On Thu, Apr 30, 2020 at 1:18 PM David Zhang <david.zhang@highgo.ca> wrote:

> Hi,
>
> Thanks a lot for sharing the test results. Here is the our test results
> using perf on three ASW t2.xlarge with below configuration.
>
> Machine configuration:
>       Instance Type        :t2.xlarge
>       Volume type          :io1
>       Memory (MiB)         :16GB
>       vCPU #                   :4
>       Architecture           :x86_64
>       IOP                         :6000
>       Database Size (GB)  :45 (Server)
>
> case 1: postgres server: without patch and without load
>
> * Disk I/O:
>
> # Samples: 342K of event 'block:block_rq_insert'
> # Event count (approx.): 342834
> #
> # Overhead  Command          Shared Object      Symbol
> # ........  ...............  .................  .....................
> #
>     97.65%  postgres         [kernel.kallsyms]  [k] __elv_add_request
>      2.27%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
>
>
> * CPU:
>
> # Samples: 6M of event 'cpu-clock'
> # Event count (approx.): 1559444750000
> #
> # Overhead  Command          Shared Object
> Symbol
> # ........  ...............  ....................
> .............................................
> #
>     64.73%  swapper          [kernel.kallsyms]     [k] native_safe_halt
>     10.89%  postgres         [vdso]                [.] __vdso_gettimeofday
>      5.64%  postgres         [kernel.kallsyms]     [k] do_syscall_64
>      5.43%  postgres         libpthread-2.26.so    [.] __libc_recv
>      1.72%  postgres         [kernel.kallsyms]     [k]
> pvclock_clocksource_read
>
> * Network:
>
> # Samples: 2M of event 'skb:consume_skb'
> # Event count (approx.): 2739785
> #
> # Overhead  Command          Shared Object      Symbol
> # ........  ...............  .................  ...........................
> #
>     91.58%  swapper          [kernel.kallsyms]  [k] consume_skb
>      7.09%  postgres         [kernel.kallsyms]  [k] consume_skb
>      0.61%  kswapd0          [kernel.kallsyms]  [k] consume_skb
>      0.44%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
>
>
> case 1: pg_basebackup client: without patch and without load
>
> * Disk I/O:
>
> # Samples: 371K of event 'block:block_rq_insert'
> # Event count (approx.): 371362
> #
> # Overhead  Command          Shared Object      Symbol
> # ........  ...............  .................  .....................
> #
>     96.78%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
>      2.82%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
>      0.29%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
>      0.09%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
>
>
> * CPU:
>
> # Samples: 3M of event 'cpu-clock'
> # Event count (approx.): 903527000000
> #
> # Overhead  Command          Shared Object
> Symbol
> # ........  ...............  ..................
> .............................................
> #
>     87.99%  swapper          [kernel.kallsyms]   [k] native_safe_halt
>      3.14%  swapper          [kernel.kallsyms]   [k] __lock_text_start
>      0.48%  swapper          [kernel.kallsyms]   [k]
> __softirqentry_text_start
>      0.37%  pg_basebackup    [kernel.kallsyms]   [k]
> copy_user_enhanced_fast_string
>      0.35%  swapper          [kernel.kallsyms]   [k] do_csum
>
> * Network:
>
> # Samples: 12M of event 'skb:consume_skb'
> # Event count (approx.): 12260713
> #
> # Overhead  Command          Shared Object      Symbol
> # ........  ...............  .................  ...........................
> #
>     95.12%  swapper          [kernel.kallsyms]  [k] consume_skb
>      3.23%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
>      0.83%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
>      0.45%  kswapd0          [kernel.kallsyms]  [k] consume_skb
>
>
> case 2: postgres server: with patch and with load, 4 backup workers on
> client side
>
> * Disk I/O:
>
> # Samples: 3M of event 'block:block_rq_insert'
> # Event count (approx.): 3634542
> #
> # Overhead  Command          Shared Object      Symbol
> # ........  ...............  .................  .....................
> #
>     98.88%  postgres         [kernel.kallsyms]  [k] __elv_add_request
>      0.66%  perf             [kernel.kallsyms]  [k] __elv_add_request
>      0.42%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
>      0.01%  sshd             [kernel.kallsyms]  [k] __elv_add_request
>
> * CPU:
>
> # Samples: 9M of event 'cpu-clock'
> # Event count (approx.): 2299129250000
> #
> # Overhead  Command          Shared Object          Symbol
> # ........  ...............  .....................  .............................................
> #
>     52.73%  swapper          [kernel.kallsyms]      [k] native_safe_halt
>      8.31%  postgres         [vdso]                 [.] __vdso_gettimeofday
>      4.46%  postgres         [kernel.kallsyms]      [k] do_syscall_64
>      4.16%  postgres         libpthread-2.26.so     [.] __libc_recv
>      1.58%  postgres         [kernel.kallsyms]      [k] __lock_text_start
>      1.52%  postgres         [kernel.kallsyms]      [k] pvclock_clocksource_read
>      0.81%  postgres         [kernel.kallsyms]      [k] copy_user_enhanced_fast_string
>
>
> * Network:
>
> # Samples: 6M of event 'skb:consume_skb'
> # Event count (approx.): 6048795
> #
> # Overhead  Command          Shared Object      Symbol
> # ........  ...............  .................  ...........................
> #
>     85.81%  postgres         [kernel.kallsyms]  [k] consume_skb
>     12.03%  swapper          [kernel.kallsyms]  [k] consume_skb
>      0.97%  postgres         [kernel.kallsyms]  [k] __consume_stateless_skb
>      0.85%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
>      0.24%  perf             [kernel.kallsyms]  [k] consume_skb
>
>
> case 2: pg_basebackup 4 workers: with patch and with load
>
> * Disk I/O:
>
> # Samples: 372K of event 'block:block_rq_insert'
> # Event count (approx.): 372360
> #
> # Overhead  Command          Shared Object      Symbol
> # ........  ...............  .................  .....................
> #
>     97.26%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
>      1.45%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
>      0.95%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
>      0.14%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
>
>
> * CPU:
>
> # Samples: 4M of event 'cpu-clock'
> # Event count (approx.): 1234071000000
> #
> # Overhead  Command          Shared Object             Symbol
> # ........  ...............  ........................  .................................................
> #
>     89.25%  swapper          [kernel.kallsyms]         [k] native_safe_halt
>      0.93%  pg_basebackup    [kernel.kallsyms]         [k] __lock_text_start
>      0.91%  swapper          [kernel.kallsyms]         [k] __lock_text_start
>      0.69%  pg_basebackup    [kernel.kallsyms]         [k] copy_user_enhanced_fast_string
>      0.45%  swapper          [kernel.kallsyms]         [k] do_csum
>
>
> * Network:
>
> # Samples: 6M of event 'skb:consume_skb'
> # Event count (approx.): 6449013
> #
> # Overhead  Command          Shared Object      Symbol
> # ........  ...............  .................  ...........................
> #
>     90.28%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
>      9.09%  swapper          [kernel.kallsyms]  [k] consume_skb
>      0.29%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
>      0.21%  sshd             [kernel.kallsyms]  [k] consume_skb
>
>
> The detailed perf reports are attached for the different scenarios, i.e.
> without patch (with and without load, for server and client) and with patch
> (with and without load, for 1, 2, 4, 8 workers, for both server and client).
> The file names should be self-explanatory.
>
> Let me know if more information is required.
>
> Best regards,
>
> David
> On 2020-04-29 5:41 a.m., Suraj Kharage wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this
> parallel backup to check how this is beneficial and below are the results.
> In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> *Machine details: *
>
> 1: Server (on which local backups performed and used as server for remote
> backups)
>
> 2: Client (Used as a client for remote backups)
>
>
> *Server:*
> RAM: 500 GB
> CPU details:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 128
> On-line CPU(s) list: 0-127
> Thread(s) per core: 2
> Core(s) per socket: 8
> Socket(s): 8
> NUMA node(s): 8
> Filesystem: ext4
>
>
> *Client:*
> RAM: 490 GB
> CPU details:
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 192
> On-line CPU(s) list: 0-191
> Thread(s) per core: 8
> Core(s) per socket: 1
> Socket(s): 24
> Filesystem: ext4
>
> Below are the results for the local test:
>
> Each row shows the output of "time" for one backup; the last column is
> the % performance increase/decrease compared to the normal backup
> (without patch).
>
> 10 GB (10 tables - each table around 1.05 GB)
>
> Backup         real        user       sys         % performance vs. without patch
> without patch  0m27.016s   0m3.378s   0m23.059s   -
> 1 worker       0m30.314s   0m3.575s   0m22.946s   12% decreased
> 2 workers      0m20.400s   0m3.622s   0m29.670s   27% increased
> 4 workers      0m15.331s   0m3.706s   0m39.189s   43% increased
> 8 workers      0m15.094s   0m3.915s   1m23.350s   44% increased
>
> 50 GB (50 tables - each table around 1.05 GB)
>
> Backup         real        user       sys         % performance vs. without patch
> without patch  2m11.049s   0m16.464s  2m1.757s    -
> 1 worker       2m26.621s   0m18.497s  2m4.792s    21% decreased
> 2 workers      1m9.581s    0m18.298s  2m12.030s   46% increased
> 4 workers      0m53.894s   0m18.588s  2m47.390s   58% increased
> 8 workers      0m55.373s   0m18.423s  5m57.470s   57% increased
>
> 100 GB (100 tables - each table around 1.05 GB)
>
> Backup         real        user       sys         % performance vs. without patch
> without patch  4m4.776s    0m33.699s  3m27.777s   -
> 1 worker       4m20.862s   0m35.753s  3m28.262s   6% decreased
> 2 workers      2m37.411s   0m36.440s  4m16.424s   35% increased
> 4 workers      1m49.503s   0m37.200s  5m58.077s   55% increased
> 8 workers      1m36.762s   0m36.987s  9m36.906s   60% increased
>
> 200 GB (200 tables - each table around 1.05 GB)
>
> Backup         real        user       sys         % performance vs. without patch
> without patch  10m34.998s  1m8.471s   7m21.520s   -
> 1 worker       11m30.899s  1m12.933s  8m14.496s   8% decreased
> 2 workers      6m8.481s    1m13.771s  9m31.216s   41% increased
> 4 workers      4m2.403s    1m18.331s  12m29.661s  61% increased
> 8 workers      4m3.768s    1m24.547s  15m21.421s  61% increased
>
> Results for the remote test:
>
> Each row shows the output of "time" for one backup; the last column is
> the % performance increase/decrease compared to the normal backup
> (without patch).
>
> 10 GB (10 tables - each table around 1.05 GB)
>
> Backup         real        user       sys         % performance vs. without patch
> without patch  1m36.829s   0m2.124s   0m14.004s   -
> 1 worker       1m37.598s   0m3.272s   0m11.110s   0.8% decreased
> 2 workers      1m36.753s   0m2.627s   0m15.312s   0.08% increased
> 4 workers      1m37.212s   0m3.835s   0m13.221s   0.3% decreased
> 8 workers      1m36.977s   0m4.475s   0m17.937s   0.1% decreased
>
> 50 GB (50 tables - each table around 1.05 GB)
>
> Backup         real        user       sys         % performance vs. without patch
> without patch  7m54.211s   0m10.826s  1m10.435s   -
> 1 worker       7m55.603s   0m16.535s  1m8.147s    0.2% decreased
> 2 workers      7m53.499s   0m18.131s  1m8.822s    0.1% increased
> 4 workers      7m54.687s   0m15.818s  1m30.991s   0.1% decreased
> 8 workers      7m54.658s   0m20.783s  1m34.460s   0.1% decreased
>
> 100 GB (100 tables - each table around 1.05 GB)
>
> Backup         real        user       sys         % performance vs. without patch
> without patch  15m45.776s  0m21.802s  2m59.006s   -
> 1 worker       15m46.315s  0m32.499s  2m47.245s   0.05% decreased
> 2 workers      15m46.065s  0m28.877s  2m21.181s   0.03% decreased
> 4 workers      15m47.793s  0m30.932s  2m36.708s   0.2% decreased
> 8 workers      15m47.129s  0m35.151s  3m23.572s   0.14% decreased
>
> 200 GB (200 tables - each table around 1.05 GB)
>
> Backup         real        user       sys         % performance vs. without patch
> without patch  32m55.720s  0m50.602s  5m38.875s   -
> 1 worker       31m30.602s  0m45.377s  4m57.405s   4% increased
> 2 workers      31m30.214s  0m55.023s  5m8.689s    4% increased
> 4 workers      31m31.187s  1m13.390s  5m40.861s   4% increased
> 8 workers      31m31.729s  1m4.955s   6m35.774s   4% decreased
>
>
> Client & Server on the same machine, the result shows around 50%
> improvement in parallel run with worker 4 and 8.  We don’t see the huge
> performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see
> any major benefit in performance.  This testing result matches the testing
> results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU
> usage and other information. What we noticed is that server is consuming
> the CPU almost 100% whole the time and pg_stat_activity shows that server
> is busy with ClientWrite most of the time.
>
>
> Attaching captured output for
>
> 1) Top command output on the server after every 5 second
>
> 2) pg_stat_activity output after every 5 second
>
> 3) Top command output on the client after every 5 second
>
>
> Do let me know if anyone has further questions/inputs for the
> benchmarking.
>
> Thanks to Rushabh Lathia for helping me with this testing.
>
> On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>
>> On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca>
>> wrote:
>> >
>> > Hi,
>> >
>> > Here is the parallel backup performance test results with and without
>> > the patch "parallel_backup_v15" on AWS cloud environment. Two
>> > "t2.xlarge" machines were used: one for Postgres server and the other
>> > one for pg_basebackup with the same machine configuration showing below.
>> >
>> > Machine configuration:
>> >      Instance Type        :t2.xlarge
>> >      Volume type          :io1
>> >      Memory (MiB)         :16GB
>> >      vCPU #               :4
>> >      Architecture         :x86_64
>> >      IOP                  :6000
>> >      Database Size (GB)   :108
>> >
>> > Performance test results:
>> > without patch:
>> >      real 18m49.346s
>> >      user 1m24.178s
>> >      sys 7m2.966s
>> >
>> > 1 worker with patch:
>> >      real 18m43.201s
>> >      user 1m55.787s
>> >      sys 7m24.724s
>> >
>> > 2 worker with patch:
>> >      real 18m47.373s
>> >      user 2m22.970s
>> >      sys 11m23.891s
>> >
>> > 4 worker with patch:
>> >      real 18m46.878s
>> >      user 2m26.791s
>> >      sys 13m14.716s
>> >
>> > As required, I didn't have the pgbench running in parallel like we did
>> > in the previous benchmark.
>> >
>>
>> So, there doesn't seem to be any significant improvement in this
>> scenario.  Now, it is not clear why there was a significant
>> improvement in the previous run where pgbench was also running
>> simultaneously.  I am not sure but maybe it is because when a lot of
>> other backends were running (performing read-only workload) the
>> backend that was responsible for doing backup was getting frequently
>> scheduled out and it slowed down the overall backup process.  And when
>> we start using multiple backends for backup one or other backup
>> process is always running making the overall backup faster.  One idea
>> to find this out is to check how much time backup takes when we run it
>> with and without pgbench workload on HEAD (aka unpatched code).  Even
>> if what I am saying is true or there is some other reason due to which
>> we are seeing speedup in some cases (where there is a concurrent
>> workload), it might not make the case for using multiple backends for
>> backup but still, it is good to find that information as it might help
>> in designing this feature better.
>>
>> > The perf report files for both Postgres server and pg_basebackup sides
>> > are attached.
>> >
>>
>> It is not clear which functions are taking more time or for which
>> functions time is reduced as function symbols are not present in the
>> reports.  I think you can refer to
>> "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
>> take profiles, and additionally build with frame pointers (you can
>> pass CFLAGS="-fno-omit-frame-pointer" to configure).
>>
>>
>> --
>> With Regards,
>> Amit Kapila.
>> EnterpriseDB: http://www.enterprisedb.com
>>
>>
>>
>
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>
> --
> David
>
> Software Engineer
> Highgo Software Inc. (Canada)
> www.highgo.ca
>
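
The percentage columns in the timing tables quoted above come without a stated formula. A minimal sketch, assuming they express the reduction in wall-clock ("real") time relative to the unpatched run; the helper names parse_time and percent_faster are hypothetical, and the input values are the 10 GB local-test figures from the table:

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` value such as '2m11.049s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_faster(baseline: str, candidate: str) -> float:
    """Reduction in elapsed time, in percent; negative means slower.

    Assumption: this is only one plausible definition of the
    "% performance increased/decreased" column.
    """
    base = parse_time(baseline)
    return (base - parse_time(candidate)) / base * 100.0


# "real" times for the 10 GB local test, copied from the table
baseline = "0m27.016s"
for workers, real in [(1, "0m30.314s"), (4, "0m15.331s"), (8, "0m15.094s")]:
    print(f"{workers} worker(s): {percent_faster(baseline, real):+.1f}%")
```

Under this definition the 1, 4 and 8 worker rows land close to the reported 12%, 43% and 44%; rows that do not match may have been computed differently.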

Re: WIP/PoC for parallel backup

Amit Kapila <amit.kapila16@gmail.com> — 2020-04-30T10:45:13Z

On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8.  We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance.  This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>
>

Was this for a setup where the client and server were on the same
machine or where the client was on a different machine?  If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?
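
To compare the proportions, the periodic pg_stat_activity captures can be reduced to a share per wait event. A minimal sketch, assuming each capture has already been parsed into (wait_event_type, wait_event) pairs; the function name and the sample rows are hypothetical and are not data from this thread:

```python
from collections import Counter


def wait_event_shares(samples):
    """Return {wait_event: share of samples}, most frequent first.

    `samples` holds one (wait_event_type, wait_event) pair per backend per
    sampling interval; a wait_event of None means the backend was not
    waiting at that moment.
    """
    counts = Counter(
        event if event is not None else "(not waiting)" for _, event in samples
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {event: n / total for event, n in counts.most_common()}


# Hypothetical samples for illustration only
samples = [
    ("Client", "ClientWrite"),
    ("Client", "ClientWrite"),
    ("Client", "ClientRead"),
    (None, None),
]
for event, share in wait_event_shares(samples).items():
    print(f"{event:<15} {share:.0%}")
```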

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: WIP/PoC for parallel backup

Amit Kapila <amit.kapila16@gmail.com> — 2020-04-30T13:09:36Z

On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
> <suraj.kharage@enterprisedb.com> wrote:
> >
> > Hi,
> >
> > We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> > 1) Without Asif’s patch
> > 2) With Asif’s patch and combination of workers 1,2,4,8.
> >
> > We run those test on two setup
> >
> > 1) Client and Server both on the same machine (Local backups)
> >
> > 2) Client and server on a different machine (remote backups)
> >
> >
> > Machine details:
> >
> > 1: Server (on which local backups performed and used as server for remote backups)
> >
> > 2: Client (Used as a client for remote backups)
> >
> >
> ...
> >
> >
> > Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8.  We don’t see the huge performance improvement with more workers been added.
> >
> >
> > Whereas, when the client and server on a different machine, we don’t see any major benefit in performance.  This testing result matches the testing results posted by David Zhang up thread.
> >
> >
> >
> > We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
> >
> >
>
> Was this for a setup where the client and server were on the same
> machine or where the client was on a different machine?  If it was for
> the case where both are on the same machine, then ideally, we should
> see ClientRead events in a similar proportion?
>

During an offlist discussion with Robert, he pointed out that the
current basebackup code doesn't report a wait event while it reads
files, which can change what pg_stat_activity shows.  Can you please
apply his latest patch to improve basebackup.c's code [1], which takes
care of that wait event, before collecting the data again?

[1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: WIP/PoC for parallel backup

David Zhang <david.zhang@highgo.ca> — 2020-04-30T18:18:18Z

On 2020-04-30 2:18 a.m., Sumanta Mukherjee wrote:

> Hi,
>
> Would it be possible to put in the absolute numbers of the perf
> so that it is easier to understand the amount of improvement with
> and without the patch and different loads and workers.

Here are the parameters used to record the perf data on both the server
and client side. For example, after applying the v15 patch, with 4
workers and with load:

perf record -o postgres_patch_j4_load -e block:block_rq_insert \
    -e cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- \
    /home/ec2-user/after/bin/postgres -D /mnt/test/data

perf record -o backup_patch_j4_load -e block:block_rq_insert \
    -e cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- \
    /home/ec2-user/after/bin/pg_basebackup -h ${PG_SERVER} -p 5432 \
    -D /mnt/backup/data -v

And this is how the report is generated.
perf report  -i postgres_patch_j4_load --stdio > postgres_patch_j4_load.txt

The original perf data files are still available. Can you please clarify
which parameters you would like added when regenerating the report, or
whether other parameters need to be added when recording perf.data
before the report is generated?
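
One way to turn the percentages of a "perf report --stdio" section into absolute numbers is to multiply each row's overhead by the section's "Event count (approx.)". A minimal sketch, assuming the overhead column is a share of that event count; the function name is hypothetical, and the figures are copied from the server-side skb:consume_skb sections quoted in this thread:

```python
def absolute_counts(event_count, rows):
    """Scale perf overhead percentages to approximate event counts.

    `rows` is a list of (command, symbol, overhead_percent) tuples taken
    from one section of the report; `event_count` is that section's
    "Event count (approx.)" value.
    """
    return [
        (command, symbol, round(event_count * percent / 100.0))
        for command, symbol, percent in rows
    ]


# Server side, network events: without patch and without load
without_patch = absolute_counts(
    2739785,
    [("swapper", "consume_skb", 91.58), ("postgres", "consume_skb", 7.09)],
)

# Server side, network events: with patch, with load, 4 workers
with_patch = absolute_counts(
    6048795,
    [("postgres", "consume_skb", 85.81), ("swapper", "consume_skb", 12.03)],
)

for label, section in [("without patch", without_patch), ("with patch", with_patch)]:
    for command, symbol, count in section:
        print(f"{label:<14} {command:<10} {symbol:<12} ~{count}")
```

Comparing the scaled counts rather than the percentages makes it visible when a process's share shifts mainly because the total number of events changed.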

>
> I am also unsure why the swapper is taking such a huge percentage of 
> the absolute time
> in the base run of just the postgres server and pg_basebackup client.
>
> With Regards,
> Sumanta Mukherjee.
> EnterpriseDB: http://www.enterprisedb.com
>
>
> On Thu, Apr 30, 2020 at 1:18 PM David Zhang <david.zhang@highgo.ca> wrote:
>
>     Hi,
>
>     Thanks a lot for sharing the test results. Here are our test
>     results using perf on three AWS t2.xlarge machines with the
>     configuration below.
>
>     Machine configuration:
>           Instance Type        :t2.xlarge
>           Volume type          :io1
>           Memory (MiB)         :16GB
>           vCPU #                   :4
>           Architecture           :x86_64
>           IOP                         :6000
>           Database Size (GB)  :45 (Server)
>
>     case 1: postgres server: without patch and without load
>
>     * Disk I/O:
>
>     # Samples: 342K of event 'block:block_rq_insert'
>     # Event count (approx.): 342834
>     #
>     # Overhead  Command          Shared Object Symbol
>     # ........  ...............  ................. .....................
>     #
>         97.65%  postgres         [kernel.kallsyms]  [k] __elv_add_request
>          2.27%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
>
>
>     * CPU:
>
>     # Samples: 6M of event 'cpu-clock'
>     # Event count (approx.): 1559444750000
>     #
>     # Overhead  Command          Shared Object         Symbol
>     # ........  ...............  ....................  .............................................
>     #
>         64.73%  swapper          [kernel.kallsyms]     [k] native_safe_halt
>         10.89%  postgres         [vdso]                [.] __vdso_gettimeofday
>          5.64%  postgres         [kernel.kallsyms]     [k] do_syscall_64
>          5.43%  postgres         libpthread-2.26.so    [.] __libc_recv
>          1.72%  postgres         [kernel.kallsyms]     [k] pvclock_clocksource_read
>
>     * Network:
>
>     # Samples: 2M of event 'skb:consume_skb'
>     # Event count (approx.): 2739785
>     #
>     # Overhead  Command          Shared Object Symbol
>     # ........  ...............  .................
>     ...........................
>     #
>         91.58%  swapper          [kernel.kallsyms]  [k] consume_skb
>          7.09%  postgres         [kernel.kallsyms]  [k] consume_skb
>          0.61%  kswapd0          [kernel.kallsyms]  [k] consume_skb
>          0.44%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
>
>
>     case 1: pg_basebackup client: without patch and without load
>
>     * Disk I/O:
>
>     # Samples: 371K of event 'block:block_rq_insert'
>     # Event count (approx.): 371362
>     #
>     # Overhead  Command          Shared Object Symbol
>     # ........  ...............  ................. .....................
>     #
>         96.78%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
>          2.82%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
>          0.29%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
>          0.09%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
>
>
>     * CPU:
>
>     # Samples: 3M of event 'cpu-clock'
>     # Event count (approx.): 903527000000
>     #
>     # Overhead  Command          Shared Object       Symbol
>     # ........  ...............  ..................  .............................................
>     #
>         87.99%  swapper          [kernel.kallsyms]   [k] native_safe_halt
>          3.14%  swapper          [kernel.kallsyms]   [k] __lock_text_start
>          0.48%  swapper          [kernel.kallsyms]   [k] __softirqentry_text_start
>          0.37%  pg_basebackup    [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
>          0.35%  swapper          [kernel.kallsyms]   [k] do_csum
>
>     * Network:
>
>     # Samples: 12M of event 'skb:consume_skb'
>     # Event count (approx.): 12260713
>     #
>     # Overhead  Command          Shared Object Symbol
>     # ........  ...............  .................
>     ...........................
>     #
>         95.12%  swapper          [kernel.kallsyms]  [k] consume_skb
>          3.23%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
>          0.83%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
>          0.45%  kswapd0          [kernel.kallsyms]  [k] consume_skb
>
>
>     case 2: postgres server: with patch and with load, 4 backup
>     workers on client side
>
>     * Disk I/O:
>
>     # Samples: 3M of event 'block:block_rq_insert'
>     # Event count (approx.): 3634542
>     #
>     # Overhead  Command          Shared Object Symbol
>     # ........  ...............  ................. .....................
>     #
>         98.88%  postgres         [kernel.kallsyms]  [k] __elv_add_request
>          0.66%  perf             [kernel.kallsyms]  [k] __elv_add_request
>          0.42%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
>          0.01%  sshd             [kernel.kallsyms]  [k] __elv_add_request
>
>     * CPU:
>
>     # Samples: 9M of event 'cpu-clock'
>     # Event count (approx.): 2299129250000
>     #
>     # Overhead  Command          Shared Object          Symbol
>     # ........  ...............  .....................  .............................................
>     #
>         52.73%  swapper          [kernel.kallsyms]      [k] native_safe_halt
>          8.31%  postgres         [vdso]                 [.] __vdso_gettimeofday
>          4.46%  postgres         [kernel.kallsyms]      [k] do_syscall_64
>          4.16%  postgres         libpthread-2.26.so     [.] __libc_recv
>          1.58%  postgres         [kernel.kallsyms]      [k] __lock_text_start
>          1.52%  postgres         [kernel.kallsyms]      [k] pvclock_clocksource_read
>          0.81%  postgres         [kernel.kallsyms]      [k] copy_user_enhanced_fast_string
>
>
>     * Network:
>
>     # Samples: 6M of event 'skb:consume_skb'
>     # Event count (approx.): 6048795
>     #
>     # Overhead  Command          Shared Object      Symbol
>     # ........  ...............  .................  ...........................
>     #
>         85.81%  postgres         [kernel.kallsyms]  [k] consume_skb
>         12.03%  swapper          [kernel.kallsyms]  [k] consume_skb
>          0.97%  postgres         [kernel.kallsyms]  [k] __consume_stateless_skb
>          0.85%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
>          0.24%  perf             [kernel.kallsyms]  [k] consume_skb
>
>
>     case 2: pg_basebackup 4 workers: with patch and with load
>
>     * Disk I/O:
>
>     # Samples: 372K of event 'block:block_rq_insert'
>     # Event count (approx.): 372360
>     #
>     # Overhead  Command          Shared Object Symbol
>     # ........  ...............  ................. .....................
>     #
>         97.26%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
>          1.45%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
>          0.95%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
>          0.14%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request
>
>
>     * CPU:
>
>     # Samples: 4M of event 'cpu-clock'
>     # Event count (approx.): 1234071000000
>     #
>     # Overhead  Command          Shared Object             Symbol
>     # ........  ...............  ........................  .................................................
>     #
>         89.25%  swapper          [kernel.kallsyms]         [k] native_safe_halt
>          0.93%  pg_basebackup    [kernel.kallsyms]         [k] __lock_text_start
>          0.91%  swapper          [kernel.kallsyms]         [k] __lock_text_start
>          0.69%  pg_basebackup    [kernel.kallsyms]         [k] copy_user_enhanced_fast_string
>          0.45%  swapper          [kernel.kallsyms]         [k] do_csum
>
>
>     * Network:
>
>     # Samples: 6M of event 'skb:consume_skb'
>     # Event count (approx.): 6449013
>     #
>     # Overhead  Command          Shared Object Symbol
>     # ........  ...............  .................
>     ...........................
>     #
>         90.28%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
>          9.09%  swapper          [kernel.kallsyms]  [k] consume_skb
>          0.29%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
>          0.21%  sshd             [kernel.kallsyms]  [k] consume_skb
>
>
>     The detailed perf reports are attached for the different scenarios,
>     i.e. without patch (with and without load, for server and client)
>     and with patch (with and without load, for 1, 2, 4, 8 workers, for
>     both server and client). The file names should be self-explanatory.
>
>     Let me know if more information is required.
>
>     Best regards,
>
>     David
>
>     On 2020-04-29 5:41 a.m., Suraj Kharage wrote:
>>     Hi,
>>
>>     We at EnterpriseDB did some performance testing around this
>>     parallel backup to check how this is beneficial and below are the
>>     results. In this testing, we run the backup -
>>     1) Without Asif’s patch
>>     2) With Asif’s patch and combination of workers 1,2,4,8.
>>
>>     We run those test on two setup
>>
>>     1) Client and Server both on the same machine (Local backups)
>>
>>     2) Client and server on a different machine (remote backups)
>>
>>
>>     *Machine details: *
>>
>>     1: Server (on which local backups performed and used as server
>>     for remote backups)
>>
>>     2: Client (Used as a client for remote backups)
>>
>>
>>     *Server:*
>>
>>     RAM:500 GB
>>     CPU details:
>>     Architecture: x86_64
>>     CPU op-mode(s): 32-bit, 64-bit
>>     Byte Order: Little Endian
>>     CPU(s): 128
>>     On-line CPU(s) list: 0-127
>>     Thread(s) per core: 2
>>     Core(s) per socket: 8
>>     Socket(s): 8
>>     NUMA node(s): 8
>>     Filesystem:ext4
>>
>>
>>     *Client:*
>>     RAM:490 GB
>>     CPU details:
>>     Architecture: ppc64le
>>     Byte Order: Little Endian
>>     CPU(s): 192
>>     On-line CPU(s) list: 0-191
>>     Thread(s) per core: 8
>>     Core(s) per socket: 1
>>     Socket(s): 24
>>     Filesystem:ext4
>>
>>     Below are the results for the local test:
>>
>>     Each row shows the output of "time" for one backup; the last
>>     column is the % performance increase/decrease compared to the
>>     normal backup (without patch).
>>
>>     10 GB (10 tables - each table around 1.05 GB)
>>
>>     Backup         real        user       sys         % performance vs. without patch
>>     without patch  0m27.016s   0m3.378s   0m23.059s   -
>>     1 worker       0m30.314s   0m3.575s   0m22.946s   12% decreased
>>     2 workers      0m20.400s   0m3.622s   0m29.670s   27% increased
>>     4 workers      0m15.331s   0m3.706s   0m39.189s   43% increased
>>     8 workers      0m15.094s   0m3.915s   1m23.350s   44% increased
>>
>>     50 GB (50 tables - each table around 1.05 GB)
>>
>>     Backup         real        user       sys         % performance vs. without patch
>>     without patch  2m11.049s   0m16.464s  2m1.757s    -
>>     1 worker       2m26.621s   0m18.497s  2m4.792s    21% decreased
>>     2 workers      1m9.581s    0m18.298s  2m12.030s   46% increased
>>     4 workers      0m53.894s   0m18.588s  2m47.390s   58% increased
>>     8 workers      0m55.373s   0m18.423s  5m57.470s   57% increased
>>
>>     100 GB (100 tables - each table around 1.05 GB)
>>
>>     Backup         real        user       sys         % performance vs. without patch
>>     without patch  4m4.776s    0m33.699s  3m27.777s   -
>>     1 worker       4m20.862s   0m35.753s  3m28.262s   6% decreased
>>     2 workers      2m37.411s   0m36.440s  4m16.424s   35% increased
>>     4 workers      1m49.503s   0m37.200s  5m58.077s   55% increased
>>     8 workers      1m36.762s   0m36.987s  9m36.906s   60% increased
>>
>>     200 GB (200 tables - each table around 1.05 GB)
>>
>>     Backup         real        user       sys         % performance vs. without patch
>>     without patch  10m34.998s  1m8.471s   7m21.520s   -
>>     1 worker       11m30.899s  1m12.933s  8m14.496s   8% decreased
>>     2 workers      6m8.481s    1m13.771s  9m31.216s   41% increased
>>     4 workers      4m2.403s    1m18.331s  12m29.661s  61% increased
>>     8 workers      4m3.768s    1m24.547s  15m21.421s  61% increased
>>
>>
>>     Results for the remote test:
>>
>>     (Percentages are relative to the normal backup without the
>>     parallel backup patch.)
>>
>>     10 GB (10 tables - each table around 1.05 GB)
>>       without patch: real 1m36.829s   user 0m2.124s   sys 0m14.004s
>>       1 worker:      real 1m37.598s   user 0m3.272s   sys 0m11.110s   (0.8% performance decreased)
>>       2 workers:     real 1m36.753s   user 0m2.627s   sys 0m15.312s   (0.08% performance increased)
>>       4 workers:     real 1m37.212s   user 0m3.835s   sys 0m13.221s   (0.3% performance decreased)
>>       8 workers:     real 1m36.977s   user 0m4.475s   sys 0m17.937s   (0.1% performance decreased)
>>
>>     50GB (50 tables - each table around 1.05 GB)
>>       without patch: real 7m54.211s   user 0m10.826s  sys 1m10.435s
>>       1 worker:      real 7m55.603s   user 0m16.535s  sys 1m8.147s    (0.2% performance decreased)
>>       2 workers:     real 7m53.499s   user 0m18.131s  sys 1m8.822s    (0.1% performance increased)
>>       4 workers:     real 7m54.687s   user 0m15.818s  sys 1m30.991s   (0.1% performance decreased)
>>       8 workers:     real 7m54.658s   user 0m20.783s  sys 1m34.460s   (0.1% performance decreased)
>>
>>     100GB (100 tables - each table around 1.05 GB)
>>       without patch: real 15m45.776s  user 0m21.802s  sys 2m59.006s
>>       1 worker:      real 15m46.315s  user 0m32.499s  sys 2m47.245s   (0.05% performance decreased)
>>       2 workers:     real 15m46.065s  user 0m28.877s  sys 2m21.181s   (0.03% performance decreased)
>>       4 workers:     real 15m47.793s  user 0m30.932s  sys 2m36.708s   (0.2% performance decreased)
>>       8 workers:     real 15m47.129s  user 0m35.151s  sys 3m23.572s   (0.14% performance decreased)
>>
>>     200GB (200 tables - each table around 1.05 GB)
>>       without patch: real 32m55.720s  user 0m50.602s  sys 5m38.875s
>>       1 worker:      real 31m30.602s  user 0m45.377s  sys 4m57.405s   (4% performance increased)
>>       2 workers:     real 31m30.214s  user 0m55.023s  sys 5m8.689s    (4% performance increased)
>>       4 workers:     real 31m31.187s  user 1m13.390s  sys 5m40.861s   (4% performance increased)
>>       8 workers:     real 31m31.729s  user 1m4.955s   sys 6m35.774s   (4% performance increased)
>>
>>
>>
>>     With client and server on the same machine, the results show around
>>     50% improvement in parallel runs with 4 and 8 workers.  We don’t see
>>     a large further improvement as more workers are added.
>>
>>
>>     Whereas, when the client and server are on different machines, we
>>     don’t see any major benefit in performance.  This matches the test
>>     results posted by David Zhang upthread.
>>
>>
>>
>>     We ran the test for a 100GB backup with 4 parallel workers to see
>>     the CPU usage and other information. What we noticed is that the
>>     server is consuming almost 100% CPU the whole time and
>>     pg_stat_activity shows that the server is busy with ClientWrite most
>>     of the time.
>>
>>
>>     Attaching captured output for
>>
>>     1) Top command output on the server, captured every 5 seconds
>>
>>     2) pg_stat_activity output, captured every 5 seconds
>>
>>     3) Top command output on the client, captured every 5 seconds
>>
>>
>>     Do let me know if anyone has further questions/inputs for the
>>     benchmarking.
>>
>>
>>     Thanks to Rushabh Lathia for helping me with this testing.
>>
>>     On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila
>>     <amit.kapila16@gmail.com <mailto:amit.kapila16@gmail.com>> wrote:
>>
>>         On Mon, Apr 27, 2020 at 10:23 PM David Zhang
>>         <david.zhang@highgo.ca <mailto:david.zhang@highgo.ca>> wrote:
>>         >
>>         > Hi,
>>         >
>>         > Here is the parallel backup performance test results with
>>         and without
>>         > the patch "parallel_backup_v15" on AWS cloud environment. Two
>>         > "t2.xlarge" machines were used: one for Postgres server and
>>         the other
>>         > one for pg_basebackup with the same machine configuration
>>         showing below.
>>         >
>>         > Machine configuration:
>>         >      Instance Type        :t2.xlarge
>>         >      Volume type          :io1
>>         >      Memory (MiB)         :16GB
>>         >      vCPU #               :4
>>         >      Architecture         :x86_64
>>         >      IOP                  :6000
>>         >      Database Size (GB)   :108
>>         >
>>         > Performance test results:
>>         > without patch:
>>         >      real 18m49.346s
>>         >      user 1m24.178s
>>         >      sys 7m2.966s
>>         >
>>         > 1 worker with patch:
>>         >      real 18m43.201s
>>         >      user 1m55.787s
>>         >      sys 7m24.724s
>>         >
>>         > 2 worker with patch:
>>         >      real 18m47.373s
>>         >      user 2m22.970s
>>         >      sys 11m23.891s
>>         >
>>         > 4 worker with patch:
>>         >      real 18m46.878s
>>         >      user 2m26.791s
>>         >      sys 13m14.716s
>>         >
>>         > As required, I didn't have the pgbench running in parallel
>>         like we did
>>         > in the previous benchmark.
>>         >
>>
>>         So, there doesn't seem to be any significant improvement in this
>>         scenario.  Now, it is not clear why there was a significant
>>         improvement in the previous run where pgbench was also running
>>         simultaneously.  I am not sure but maybe it is because when a lot
>>         of other backends were running (performing read-only workload) the
>>         backend that was responsible for doing backup was getting
>>         frequently scheduled out and it slowed down the overall backup
>>         process.  And when we start using multiple backends for backup one
>>         or other backup process is always running making the overall
>>         backup faster.  One idea to find this out is to check how much
>>         time backup takes when we run it with and without pgbench workload
>>         on HEAD (aka unpatched code).  Even if what I am saying is true or
>>         there is some other reason due to which we are seeing speedup in
>>         some cases (where there is a concurrent workload), it might not
>>         make the case for using multiple backends for backup but still, it
>>         is good to find that information as it might help in designing
>>         this feature better.
>>
>>         > The perf report files for both Postgres server and
>>         pg_basebackup sides
>>         > are attached.
>>         >
>>
>>         It is not clear which functions are taking more time or for which
>>         functions time is reduced as function symbols are not present in
>>         the reports.  I think you can refer to
>>         "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how
>>         to take profiles and additionally use -fno-omit-frame-pointer
>>         during configure (you can use CFLAGS="-fno-omit-frame-pointer"
>>         during configure).
>>
>>
>>         -- 
>>         With Regards,
>>         Amit Kapila.
>>         EnterpriseDB: http://www.enterprisedb.com
>>
>>
>>
>>
>>     -- 
>>     -- 
>>
>>     Thanks & Regards,
>>     Suraj kharage,
>>     EnterpriseDB Corporation,
>>     The Postgres Database Company.
>     -- 
>     David
>
>     Software Engineer
>     Highgo Software Inc. (Canada)
>     www.highgo.ca <http://www.highgo.ca>
>
-- 
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca

Re: WIP/PoC for parallel backup

Rushabh Lathia <rushabh.lathia@gmail.com> — 2020-05-04T13:22:37Z

On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
> <suraj.kharage@enterprisedb.com> wrote:
> >
> > Hi,
> >
> > We at EnterpriseDB did some performance testing around this parallel
> backup to check how this is beneficial and below are the results. In this
> testing, we run the backup -
> > 1) Without Asif’s patch
> > 2) With Asif’s patch and combination of workers 1,2,4,8.
> >
> > We ran those tests on two setups:
> >
> > 1) Client and Server both on the same machine (Local backups)
> >
> > 2) Client and server on a different machine (remote backups)
> >
> >
> > Machine details:
> >
> > 1: Server (on which local backups performed and used as server for
> remote backups)
> >
> > 2: Client (Used as a client for remote backups)
> >
> >
> ...
> >
> >
> > With client and server on the same machine, the results show around 50%
> > improvement in parallel runs with 4 and 8 workers.  We don’t see a large
> > further improvement as more workers are added.
> >
> >
> > Whereas, when the client and server are on different machines, we don’t
> > see any major benefit in performance.  This matches the test results
> > posted by David Zhang upthread.
> >
> >
> >
> > We ran the test for a 100GB backup with 4 parallel workers to see the CPU
> > usage and other information. What we noticed is that the server is
> > consuming almost 100% CPU the whole time and pg_stat_activity shows that
> > the server is busy with ClientWrite most of the time.
> >
> >
>
> Was this for a setup where the client and server were on the same
> machine or where the client was on a different machine?  If it was for
> the case where both are on the same machine, then ideally, we should
> see ClientRead events in a similar proportion?
>

In the particular setup, the client and server were on different machines.


> During an offlist discussion with Robert, he pointed out that current
> basebackup's code doesn't account for the wait event for the reading
> of files which can change what pg_stat_activity shows?  Can you please
> apply his latest patch to improve basebackup.c's code [1] which will
> take care of that waitevent before getting the data again?
>
> [1] -
> https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>


Sure, we can try out this and do a similar run to collect the
pg_stat_activity output.


> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>

-- 
Rushabh Lathia

Re: WIP/PoC for parallel backup

Ahsan Hadi <ahsan.hadi@gmail.com> — 2020-05-21T05:17:29Z

On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

>
>
> On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>
>> On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
>> <suraj.kharage@enterprisedb.com> wrote:
>> >
>> > Hi,
>> >
>> > We at EnterpriseDB did some performance testing around this parallel
>> backup to check how this is beneficial and below are the results. In this
>> testing, we run the backup -
>> > 1) Without Asif’s patch
>> > 2) With Asif’s patch and combination of workers 1,2,4,8.
>> >
>> > We ran those tests on two setups:
>> >
>> > 1) Client and Server both on the same machine (Local backups)
>> >
>> > 2) Client and server on a different machine (remote backups)
>> >
>> >
>> > Machine details:
>> >
>> > 1: Server (on which local backups performed and used as server for
>> remote backups)
>> >
>> > 2: Client (Used as a client for remote backups)
>> >
>> >
>> ...
>> >
>> >
>> > With client and server on the same machine, the results show around 50%
>> > improvement in parallel runs with 4 and 8 workers.  We don’t see a large
>> > further improvement as more workers are added.
>> >
>> >
>> > Whereas, when the client and server are on different machines, we don’t
>> > see any major benefit in performance.  This matches the test results
>> > posted by David Zhang upthread.
>> >
>> >
>> >
>> > We ran the test for a 100GB backup with 4 parallel workers to see the CPU
>> > usage and other information. What we noticed is that the server is
>> > consuming almost 100% CPU the whole time and pg_stat_activity shows that
>> > the server is busy with ClientWrite most of the time.
>> >
>> >
>>
>> Was this for a setup where the client and server were on the same
>> machine or where the client was on a different machine?  If it was for
>> the case where both are on the same machine, then ideally, we should
>> see ClientRead events in a similar proportion?
>>
>
> In the particular setup, the client and server were on different machines.
>
>
>> During an offlist discussion with Robert, he pointed out that current
>> basebackup's code doesn't account for the wait event for the reading
>> of files which can change what pg_stat_activity shows?  Can you please
>> apply his latest patch to improve basebackup.c's code [1] which will
>> take care of that waitevent before getting the data again?
>>
>> [1] -
>> https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>>
>
>
> Sure, we can try out this and do a similar run to collect the
> pg_stat_activity output.
>

Have you had the chance to try this out?


>
>
>> --
>> With Regards,
>> Amit Kapila.
>> EnterpriseDB: http://www.enterprisedb.com
>>
>>
>>
>
> --
> Rushabh Lathia
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca

Re: WIP/PoC for parallel backup

Rushabh Lathia <rushabh.lathia@gmail.com> — 2020-05-21T06:06:23Z

On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:

>
>
> On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com>
> wrote:
>
>>
>>
>> On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>>
>>> On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
>>> <suraj.kharage@enterprisedb.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > We at EnterpriseDB did some performance testing around this parallel
>>> backup to check how this is beneficial and below are the results. In this
>>> testing, we run the backup -
>>> > 1) Without Asif’s patch
>>> > 2) With Asif’s patch and combination of workers 1,2,4,8.
>>> >
>>> > We ran those tests on two setups:
>>> >
>>> > 1) Client and Server both on the same machine (Local backups)
>>> >
>>> > 2) Client and server on a different machine (remote backups)
>>> >
>>> >
>>> > Machine details:
>>> >
>>> > 1: Server (on which local backups performed and used as server for
>>> remote backups)
>>> >
>>> > 2: Client (Used as a client for remote backups)
>>> >
>>> >
>>> ...
>>> >
>>> >
>>> > With client and server on the same machine, the results show around 50%
>>> > improvement in parallel runs with 4 and 8 workers.  We don’t see a large
>>> > further improvement as more workers are added.
>>> >
>>> >
>>> > Whereas, when the client and server are on different machines, we don’t
>>> > see any major benefit in performance.  This matches the test results
>>> > posted by David Zhang upthread.
>>> >
>>> >
>>> >
>>> > We ran the test for a 100GB backup with 4 parallel workers to see the CPU
>>> > usage and other information. What we noticed is that the server is
>>> > consuming almost 100% CPU the whole time and pg_stat_activity shows that
>>> > the server is busy with ClientWrite most of the time.
>>> >
>>> >
>>>
>>> Was this for a setup where the client and server were on the same
>>> machine or where the client was on a different machine?  If it was for
>>> the case where both are on the same machine, then ideally, we should
>>> see ClientRead events in a similar proportion?
>>>
>>
>> In the particular setup, the client and server were on different
>> machines.
>>
>>
>>> During an offlist discussion with Robert, he pointed out that current
>>> basebackup's code doesn't account for the wait event for the reading
>>> of files which can change what pg_stat_activity shows?  Can you please
>>> apply his latest patch to improve basebackup.c's code [1] which will
>>> take care of that waitevent before getting the data again?
>>>
>>> [1] -
>>> https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>>>
>>
>>
>> Sure, we can try out this and do a similar run to collect the
>> pg_stat_activity output.
>>
>
> Have you had the chance to try this out?
>

Yes. My colleague Suraj tried this and here are the pg_stat_activity output
files.

Captured wait events every 3 seconds during the backup for -
1: parallel backup for 100GB data with 4 workers
(pg_stat_activity_j4_100GB.txt)
2: Normal backup (without parallel backup patch) for 100GB data
(pg_stat_activity_normal_backup_100GB.txt)

Here is the observation:

The total number of events (pg_stat_activity) captured during above runs:
- 314 events for normal backups
- 316 events for parallel backups (-j 4)

BaseBackupRead wait event numbers: (newly added)
37 - in normal backups
25 - in the parallel backup (-j 4)

ClientWrite wait event numbers:
175 - in normal backup
1098 - in parallel backups

ClientRead wait event numbers:
0 - ClientRead in normal backup
326 - ClientRead in parallel backups for diff processes. (all in idle state)
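
To compare the two runs on an equal footing, the counts above can be divided by the number of captures. What follows is only a rough sketch: it assumes that the 314 and 316 totals are the number of pg_stat_activity snapshots taken (one every 3 seconds), so a ratio above 1 would mean that several backends were in that wait state per snapshot. The names `runs` and `per_snapshot` are made up for illustration.

```python
# Wait-event counts copied from the list above (remote 100GB runs).
# "snapshots" is assumed to be the number of pg_stat_activity captures.
runs = {
    "normal": {
        "snapshots": 314,
        "BaseBackupRead": 37,
        "ClientWrite": 175,
        "ClientRead": 0,
    },
    "parallel -j 4": {
        "snapshots": 316,
        "BaseBackupRead": 25,
        "ClientWrite": 1098,
        "ClientRead": 326,
    },
}


def per_snapshot(run: dict) -> dict:
    """Return the average number of backends seen in each wait event per snapshot."""
    snapshots = run["snapshots"]
    return {
        event: count / snapshots
        for event, count in run.items()
        if event != "snapshots"
    }


rates = {name: per_snapshot(run) for name, run in runs.items()}

for name, events in rates.items():
    for event, rate in events.items():
        print(f"{name:14s} {event:15s} {rate:.2f} per snapshot")
```

If that reading of the totals is right, the parallel run averages well over one backend waiting on ClientWrite per snapshot, while the normal run averages less than one.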




Thanks,
Rushabh Lathia
www.EnterpriseDB.com

Re: WIP/PoC for parallel backup

Amit Kapila <amit.kapila16@gmail.com> — 2020-05-21T06:53:56Z

On Thu, May 21, 2020 at 11:36 AM Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
>
> On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
>>
>>>>
>>>> During an offlist discussion with Robert, he pointed out that current
>>>> basebackup's code doesn't account for the wait event for the reading
>>>> of files which can change what pg_stat_activity shows?  Can you please
>>>> apply his latest patch to improve basebackup.c's code [1] which will
>>>> take care of that waitevent before getting the data again?
>>>>
>>>> [1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>>>
>>>
>>>
>>> Sure, we can try out this and do a similar run to collect the pg_stat_activity output.
>>
>>
>> Have you had the chance to try this out?
>
>
> Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
>
> Captured wait events every 3 seconds during the backup for -
> 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_j4_100GB.txt)
> 2: Normal backup (without parallel backup patch) for 100GB data  (pg_stat_activity_normal_backup_100GB.txt)
>
> Here is the observation:
>
> The total number of events (pg_stat_activity) captured during above runs:
> - 314 events for normal backups
> - 316 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 37 - in normal backups
> 25 - in the parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 175 - in normal backup
> 1098 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 326 - ClientRead in parallel backups for diff processes. (all in idle state)
>

It might be interesting to see why ClientRead/ClientWrite has
increased so much and can we reduce it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-05-21T13:41:54Z

On Thu, May 21, 2020 at 2:06 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
>
> Captured wait events every 3 seconds during the backup for -
> 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_j4_100GB.txt)
> 2: Normal backup (without parallel backup patch) for 100GB data  (pg_stat_activity_normal_backup_100GB.txt)
>
> Here is the observation:
>
> The total number of events (pg_stat_activity) captured during above runs:
> - 314 events for normal backups
> - 316 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 37 - in normal backups
> 25 - in the parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 175 - in normal backup
> 1098 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 326 - ClientRead in parallel backups for diff processes. (all in idle state)

So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting for ClientRead/ClientWrite is
because the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.

Do we have the same results for the local backup case, where the patch helped?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-05-22T06:03:29Z

On Thu, May 21, 2020 at 7:12 PM Robert Haas <robertmhaas@gmail.com> wrote:

>
> So, basically, when we go from 1 process to 4, the additional
> processes spend all of their time waiting rather than doing any useful
> work, and that's why there is no performance benefit. Presumably, the
> reason they spend all their time waiting for ClientRead/ClientWrite is
> because the network between the two machines is saturated, so adding
> more processes that are trying to use it at maximum speed just leads
> to spending more time waiting for it to be available.
>
> Do we have the same results for the local backup case, where the patch
> helped?
>

Here is the result for local backup case (100GB data). Attaching the
captured logs.

The total number of events (pg_stat_activity) captured during local runs:
- 82 events for normal backups
- 31 events for parallel backups (-j 4)

BaseBackupRead wait event numbers: (newly added)
24 - in normal backups
14 - in parallel backup (-j 4)

ClientWrite wait event numbers:
8 - in normal backup
43 - in parallel backups

ClientRead wait event numbers:
0 - ClientRead in normal backup
32 - ClientRead in parallel backups for diff processes.


-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.

Re: WIP/PoC for parallel backup

Hamid Akhtar <hamid.akhtar@gmail.com> — 2020-06-11T17:40:38Z

As far as I understand, parallel backup is not a mandatory performance
feature, rather, one at the user's discretion. This IMHO indicates that it
will benefit some users and may not benefit others.

Taking a backup is an I/O intensive workload, so parallelizing it
through multiple worker threads/processes creates an overhead of its own.
So what precisely are we optimizing here? Looking at a running database
system in any environment, I see the following potential scenarios playing
out. These are probably clear to everyone here, but I'm listing these for
completeness and clarity.

Locally Running Backup:
(1) Server has no clients connected other than base backup.
(2) Server has other clients connected which are actively performing
operations causing disk I/O.

Remotely Running Backup:
(3) Server has no clients connected other than remote base backup.
(4) Server has other clients connected which are actively performing
operations causing disk I/O.

Others:
(5) Server or the system running base backup has other processes competing
for disk or network bandwidth.

Generally speaking, I see that parallelization could potentially benefit in
scenarios (2), (4) and (5) with the reason being that having more than one
thread increases the likelihood that backup will now get a bigger time
slice for disk I/O and network bandwidth. With (1) and (3), since there are
no competing processes, addition of multiple threads or processes will only
increase CPU overhead whilst still getting the same network and disk time
slice. In this particular case, the performance will degrade.
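
The time-slice argument above can be put into a toy model. This is a deliberately simplistic sketch, not a description of any real I/O or network scheduler: it assumes the shared resource is divided evenly among all active processes, and it ignores the CPU overhead of extra workers. The function name `backup_share` is made up for illustration.

```python
def backup_share(backup_workers: int, competing_processes: int) -> float:
    """Fraction of a shared resource the backup receives, assuming the
    resource is split evenly among all active processes (an assumption)."""
    if backup_workers < 1:
        raise ValueError("at least one backup worker is required")
    if competing_processes < 0:
        raise ValueError("competing_processes cannot be negative")
    return backup_workers / (backup_workers + competing_processes)


# Scenarios (1) and (3): nothing else is competing, so extra workers
# cannot increase the backup's share of the resource.
for workers in (1, 2, 4, 8):
    print(f"idle server, {workers} workers: share = {backup_share(workers, 0):.2f}")

# Scenarios (2), (4) and (5): eight other processes compete (hypothetical
# number), so each extra worker increases the backup's share.
for workers in (1, 2, 4, 8):
    print(f"busy server, {workers} workers: share = {backup_share(workers, 8):.2f}")
```

Under this model the share is already 1.0 with a single worker when nothing else is running, so any overhead from extra workers can only slow the backup down.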

IMHO, that’s why adding other load on the server, perhaps by running
pgbench simultaneously, may show improved performance for parallel backup.
Also, running parallel backup on a local laptop more often than not yields
improved performance.

There are obviously other factors that may impact the performance, such as
the type of I/O scheduler being used, whether CFQ or some other.

IMHO, parallel backup has obvious performance benefits, but we need to
ensure that users understand that there is potential for slower backup if
there is no competition for resources.

On Fri, May 22, 2020 at 11:03 AM Suraj Kharage <
suraj.kharage@enterprisedb.com> wrote:

>
> On Thu, May 21, 2020 at 7:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
>>
>> So, basically, when we go from 1 process to 4, the additional
>> processes spend all of their time waiting rather than doing any useful
>> work, and that's why there is no performance benefit. Presumably, the
>> reason they spend all their time waiting for ClientRead/ClientWrite is
>> because the network between the two machines is saturated, so adding
>> more processes that are trying to use it at maximum speed just leads
>> to spending more time waiting for it to be available.
>>
>> Do we have the same results for the local backup case, where the patch
>> helped?
>>
>
> Here is the result for local backup case (100GB data). Attaching the
> captured logs.
>
> The total number of events (pg_stat_activity) captured during local runs:
> - 82 events for normal backups
> - 31 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 24 - in normal backups
> 14 - in parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 8 - in normal backup
> 43 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 32 - ClientRead in parallel backups for diff processes.
>
>
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL：+923335449950  EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com> — 2020-06-12T17:28:37Z

On Thu, Jun 11, 2020 at 1:41 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:
> As far as I understand, parallel backup is not a mandatory performance feature, rather, one at the user's discretion. This IMHO indicates that it will benefit some users and may not benefit others.
>
> IMHO, parallel backup has obvious performance benefits, but we need to ensure that users understand that there is potential for slower backup if there is no competition for resources.

I am sure that nobody is arguing that the patch has to be beneficial
in all cases in order to justify applying it. However, there are
several good arguments against proceeding with this patch:

* Every version of the patch that has been reviewed by anybody has
been riddled with errors. Over and over again, testers have found
serious bugs, and code reviewers have noticed lots of problems, too.

* This approach requires rewriting a lot of current functionality,
either by moving it to the client side or by restructuring it to work
with parallelism. That's a lot of work, and it seems likely to
generate more work in the future as people continue to add features.
It's one thing to add a feature that doesn't benefit everybody; it's
another thing to add a feature that doesn't benefit everybody and also
hinders future development. See
http://postgr.es/m/CA+TgmoZubLXYR+Pd_gi3MVgyv5hQdLm-GBrVXkun-Lewaw12Kg@mail.gmail.com
for more discussion of these issues.

* The scenarios in which the patch delivers a performance benefit are
narrow and somewhat contrived. In remote backup scenarios, AIUI, the
patch hasn't been shown to help. In local backups, it does, but how
likely is it that you are going to do your local backups over the wire
protocol instead of by direct file copy, which is probably much
faster? I agree that if your server is overloaded, having multiple
processes competing for the server resources will allow backup to get
a larger slice relative to other things, but that seems like a pretty
hackish and inefficient solution to that problem. You could also argue
that we could provide a feature to prioritize some queries over other
queries by running them with tons of parallel workers just to convince
the OS to give them more resources, and I guess that would work, but
it would also waste tons of resources and possibly cripple or even
crash your system if you used it enough. The same argument applies
here.

* Even when the patch does provide a benefit, it seems to max out at
about 2.5X. Clearly it's nice to have something go 2.5X faster, but
the point is that it doesn't scale beyond that no matter how many
workers you add. That doesn't automatically mean that something is a
bad idea, but it is a concern. At the very least, we should be able to
say why it doesn't scale any better than that.

* Actually, we have some hints about that. Over at
http://postgr.es/m/20200503174922.mfzzdafa5g4rlhez@alap3.anarazel.de
Andres has shown that too much concurrency when copying files results
in a dramatic performance reduction, and that a lot of the reason why
concurrency helps in the first place has to do with the fact that
pg_basebackup does not have any cache control (no fallocate,
sync_file_range(WRITE), posix_fadvise(DONTNEED)). When those things
are added the performance gets better and the benefits of concurrency
are reduced. I suspect that would also be true for this patch. It
would be unreasonable to commit a large patch, especially one that
would hinder future development, if we could get the same benefits
from a small patch that would not do so.
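
For reference, the "about 2.5X" ceiling can be checked against the local-backup timings posted earlier in the thread. The sketch below is illustrative only: the "real" values are copied from that benchmark (unpatched run versus the fastest parallel run for each data size), and the helper `to_seconds` and the table `LOCAL_REAL` are names invented here.

```python
import re


def to_seconds(timing: str) -> float:
    """Convert a 'real' value such as '4m4.776s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", timing)
    if match is None:
        raise ValueError(f"unrecognised timing: {timing!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (without patch, fastest parallel run) "real" times for local backups,
# copied from the benchmark results posted upthread.
LOCAL_REAL = {
    "10GB": ("0m27.016s", "0m15.094s"),    # fastest: 8 workers
    "50GB": ("2m11.049s", "0m53.894s"),    # fastest: 4 workers
    "100GB": ("4m4.776s", "1m36.762s"),    # fastest: 8 workers
    "200GB": ("10m34.998s", "4m2.403s"),   # fastest: 4 workers
}

speedups = {
    size: to_seconds(base) / to_seconds(best)
    for size, (base, best) in LOCAL_REAL.items()
}

for size, factor in speedups.items():
    print(f"{size}: {factor:.2f}x faster than the unpatched backup")
```

Going from 4 to 8 workers barely changes the result at any data size, which is the scaling concern raised above.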

I am not in a position to tell you how to spend your time, so you can
certainly pursue this patch if you wish. However, I think it's
probably not the best use of time. Even if you fixed all the bugs and
reimplemented all of the functionality that needs reimplementing in
order to make this approach work, it still doesn't make sense to
commit the patch if either (a) we can obtain the same benefit, or most
of it, from a much simpler patch or (b) the patch is going to make it
significantly harder to develop other features that we want to have,
especially if those features seem likely to be more beneficial than
what this patch offers. I think both of those are likely true here.

For an example of (b), consider compression of tar files on the server
side before transmission to the client. If you take the approach this
patch does and move tarfile construction to the client, that is
impossible. Now you can argue (and perhaps you will) that this would
just mean someone has to choose between using this feature and using
that feature, and why should users not have such a choice? That is a
fair argument, but my counter-argument is that users shouldn't be
forced into making that choice. If the parallel feature is beneficial
enough to justify having it, then it ought to be designed in such a
way that it works with the other features we also want to have rather
than forcing users to choose between them. Since I have already
proposed (on the other thread linked above) a design that would make
that possible, and this design does not, I have a hard time
understanding why we would pick this one, especially given all of the
other disadvantages which it seems to have.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Daniel Gustafsson <daniel@yesql.se> — 2020-07-06T12:24:03Z

> On 12 Jun 2020, at 19:28, Robert Haas <robertmhaas@gmail.com> wrote:

> I am sure that nobody is arguing that the patch has to be beneficial
> in all cases in order to justify applying it. However, there are
> several good arguments against proceeding with this patch:

This thread has stalled with no resolution to the raised issues, and the latest
version of the patch (v15) posted no longer applies (I only tried 0001 which
failed, the green tick in the CFBot is due to it mistakenly thinking an attached
report is a patch).  I'm marking this patch Returned with Feedback.  Please
open a new CF entry when there is a new version of the patch.

cheers ./daniel

Re: WIP/PoC for parallel backup

Hamid Akhtar <hamid.akhtar@gmail.com> — 2020-07-06T12:39:08Z

On Mon, Jul 6, 2020 at 5:24 PM Daniel Gustafsson <daniel@yesql.se> wrote:

> > On 12 Jun 2020, at 19:28, Robert Haas <robertmhaas@gmail.com> wrote:
>
> > I am sure that nobody is arguing that the patch has to be beneficial
> > in all cases in order to justify applying it. However, there are
> > several good arguments against proceeding with this patch:
>
> This thread has stalled with no resolution to the raised issues, and the
> latest version of the patch (v15) posted no longer applies (I only tried
> 0001 which failed, the green tick in the CFBot is due to it mistakenly
> thinking an attached report is a patch).  I'm marking this patch Returned
> with Feedback.  Please open a new CF entry when there is a new version of
> the patch.
>
> cheers ./daniel


I think this is fair. There are quite a few valid points raised by Robert.


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL：+923335449950  EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus