Re: WIP/PoC for parallel backup

Jeevan Chalke <jeevan.chalke@enterprisedb.com>
From: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
To: asifr.rehman@gmail.com
Cc: Robert Haas <robertmhaas@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2019-10-03T11:47:31Z
Lists: pgsql-hackers
Attachments

0001-Initial-POC-on-parallel-backup_fix_errors_warnings_delta.patch (text/x-patch)
Hi Asif,

I was looking at the patch and tried compiling it, but I got a few errors
and warnings.

I have fixed those in the attached patch.

On Fri, Sep 27, 2019 at 9:30 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

> Hi Robert,
>
> Thanks for the feedback. Please see the comments below:
>
> On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com>
> wrote:
>
>> On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>> > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
>> > If the parallel option is there, then it will only do pg_start_backup,
>> scans PGDATA and sends a list of file names.
>>
>> So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
>> tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
>> list of file names. I don't think that's a good approach. It's too
>> confusing to have one replication command that returns totally
>> different things depending on whether some option is given.
>>
>
> Sure. I will add a separate command (START_BACKUP) for parallel backup.
>
>
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in the given
>> list.
>> > pg_basebackup will then send back a list of filenames in this command.
>> This command will be sent by each worker, and that worker will receive
>> the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>>
>
> I considered this approach initially; however, I adopted the current
> strategy to avoid multiple round trips between the server and clients and
> to save on query processing time by issuing a single command rather than
> multiple ones. Furthermore, fetching multiple files at once will also aid in
> supporting the tar format by utilising the existing ReceiveTarFile()
> function, and will make it possible to create one tarball per tablespace
> per worker.
>
>
>>
>> > - STOP_BACKUP
>> > When all workers finish, pg_basebackup will send the STOP_BACKUP
>> command.
>>
>> This also seems reasonable, but surely the matching command should
>> then be called START_BACKUP, not BASEBACKUP PARALLEL.
>>
>> > I have done a basic proof of concept (POC), which is also attached. I
>> would appreciate some input on this. So far, I am simply dividing the list
>> equally and assigning them to worker processes. I intend to fine tune this
>> by taking into consideration file sizes. Further, to add tar format support,
>> I am considering that each worker process processes all files belonging to
>> a tablespace in its list (i.e. creates and copies tar file), before it
>> processes the next tablespace. As a result, this will create tar files that
>> are disjointed with respect to tablespace data. For example:
>>
>> Instead of doing this, I suggest that you should just maintain a list
>> of all the files that need to be fetched and have each worker pull a
>> file from the head of the list and fetch it when it finishes receiving
>> the previous file.  That way, if some connections go faster or slower
>> than others, the distribution of work ends up fairly even.  If you
>> instead pre-distribute the work, you're guessing what's going to
>> happen in the future instead of just waiting to see what actually does
>> happen. Guessing isn't intrinsically bad, but guessing when you could
>> be sure of doing the right thing *is* bad.
>>
>> If you want to be really fancy, you could start by sorting the files
>> in descending order of size, so that big files are fetched before
>> small ones.  Since the largest possible file is 1GB and any database
>> where this feature is important is probably hundreds or thousands of
>> GB, this may not be very important. I suggest not worrying about it
>> for v1.
>>
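
To illustrate the pull-based scheme described above, here is a minimal
Python sketch. It is not code from the patch: fetch_file() is a hypothetical
stand-in for whatever command retrieves one file, and the use of one thread
per worker is an assumption made only to keep the example short.

```python
import queue
import threading


def fetch_file(name):
    # Hypothetical placeholder: a real client would request the file
    # from the server here. The sketch just returns the name.
    return name


def run_workers(files, num_workers):
    """files is a list of (name, size_in_bytes) pairs.

    Returns {worker_id: [names fetched by that worker]}.
    """
    work = queue.Queue()
    # Optional refinement: queue the largest files first.
    for name, _size in sorted(files, key=lambda f: f[1], reverse=True):
        work.put(name)

    fetched = {w: [] for w in range(num_workers)}

    def worker(worker_id):
        # Each worker takes the next file only after finishing the
        # previous one, so faster connections naturally do more work.
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return
            fetched[worker_id].append(fetch_file(name))

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fetched
```

Which worker ends up with which file is decided at run time, so nothing
has to be guessed up front about connection speed.
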
>
> Ideally, I would like to support the tar format as well, which would be
> much easier to implement when fetching multiple files at once, since that
> would allow the existing functionality to be reused without much
> change.
>
> Your idea of sorting the files in descending order of size seems very
> appealing. I think we can do this and have the files divided among the
> workers one by one, i.e. the first file in the list goes to worker 1, the
> second to worker 2, and so on.
>
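
The round-robin assignment described in the paragraph above can be sketched
as follows. This is only an illustration under assumed names
(deal_round_robin and the (name, size) tuple layout are hypothetical), not
the patch's implementation.

```python
def deal_round_robin(files, num_workers):
    """files is a list of (name, size_in_bytes) pairs.

    Sorts by size, largest first, then deals the files to workers in turn.
    Returns one list of file names per worker.
    """
    ordered = sorted(files, key=lambda f: f[1], reverse=True)
    buckets = [[] for _ in range(num_workers)]
    for i, (name, _size) in enumerate(ordered):
        buckets[i % num_workers].append(name)
    return buckets
```

Unlike a shared queue, this fixes each worker's list before any transfer
starts, so a slow connection keeps its full share of the files.
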
>
>>
>> > Say, tablespace t1 has 20 files and we have 5 worker processes and
>> tablespace t2 has 10. Ignoring all other factors for the sake of this
>> example, each worker process will get a group of 4 files of t1 and 2 files
>> of t2. Each process will create 2 tar files, one for t1 containing 4 files
>> and another for t2 containing 2 files.
>>
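
The arithmetic in the example above can be checked with a short Python
sketch. The function name and the way a remainder is handed to the
lowest-numbered workers are assumptions for illustration; the original
example only covers the evenly divisible case.

```python
def tar_layout(files_per_tablespace, num_workers):
    """Return {worker_id: {tablespace: files_in_that_worker's_tar}}.

    Each tablespace's files are split as evenly as possible; any remainder
    goes to the lowest-numbered workers (an assumption of this sketch).
    """
    layout = {}
    for worker in range(num_workers):
        layout[worker] = {}
        for tablespace, count in files_per_tablespace.items():
            base, extra = divmod(count, num_workers)
            share = base + (1 if worker < extra else 0)
            if share:
                # One tar file per tablespace per worker.
                layout[worker][tablespace] = share
    return layout


layout = tar_layout({"t1": 20, "t2": 10}, 5)
total_tar_files = sum(len(per_worker) for per_worker in layout.values())
```

With 20 files in t1, 10 in t2 and 5 workers, every worker gets 4 files of
t1 and 2 files of t2, giving 10 tar files in total.
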
>> This is one of several possible approaches. If we're doing a
>> plain-format backup in parallel, we can just write each file where it
>> needs to go and call it good. But, with a tar-format backup, what
>> should we do? I can see three options:
>>
>> 1. Error! Tar format parallel backups are not supported.
>>
>> 2. Write multiple tar files. The user might reasonably expect that
>> they're going to end up with the same files at the end of the backup
>> regardless of whether they do it in parallel. A user with this
>> expectation will be disappointed.
>>
>> 3. Write one tar file. In this design, the workers have to take turns
>> writing to the tar file, so you need some synchronization around that.
>> Perhaps you'd have N threads that read and buffer a file, and N+1
>> buffers.  Then you have one additional thread that reads the complete
>> files from the buffers and writes them to the tar file. There's
>> obviously some possibility that the writer won't be able to keep up
>> and writing the backup will therefore be slower than it would be with
>> approach (2).
>>
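
A rough Python sketch of approach (3), under stated assumptions: file
contents are already in memory as a dict, the N+1 buffers are modelled as a
bounded queue, and write_single_tar is a hypothetical name. It only shows
the synchronization pattern, with N readers feeding one writer.

```python
import io
import queue
import tarfile
import threading


def write_single_tar(contents, num_readers, out):
    """contents maps file name to bytes; out is a writable binary stream."""
    names = queue.Queue()
    for name in contents:
        names.put(name)

    # At most N+1 complete files are buffered at any time.
    buffers = queue.Queue(maxsize=num_readers + 1)
    done = object()  # sentinel: one reader has finished

    def reader():
        while True:
            try:
                name = names.get_nowait()
            except queue.Empty:
                break
            # Blocks when the writer cannot keep up.
            buffers.put((name, contents[name]))
        buffers.put(done)

    def writer():
        finished = 0
        with tarfile.open(fileobj=out, mode="w") as tar:
            while finished < num_readers:
                item = buffers.get()
                if item is done:
                    finished += 1
                    continue
                name, data = item
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

    threads = [threading.Thread(target=reader) for _ in range(num_readers)]
    threads.append(threading.Thread(target=writer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Only the writer touches the tar stream, so the readers never need a lock
around it; the bounded queue is what makes them wait for a slow writer.
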
>> There's probably also a possibility that approach (2) would thrash the
>> disk head back and forth between multiple files that are all being
>> written at the same time, and approach (3) will therefore win by not
>> thrashing the disk head. But, since spinning media are becoming less
>> and less popular and are likely to have multiple disk heads under the
>> hood when they are used, this is probably not too likely.
>>
>> I think your choice to go with approach (2) is probably reasonable,
>> but I'm not sure whether everyone will agree.
>>
>
> Yes, for tar format support, approach (2) is what I had in
> mind. I'm currently working on the implementation and will share the patch
> in a couple of days.
>
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>


-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company