Re: WIP/PoC for parallel backup
From: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
To: asifr.rehman@gmail.com
Cc: Robert Haas <robertmhaas@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2019-10-03T11:47:31Z
Lists: pgsql-hackers
Hi Asif,

I was looking at the patch and tried compiling it. However, I got a few
errors and warnings, which I have fixed in the attached patch.

On Fri, Sep 27, 2019 at 9:30 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

> Hi Robert,
>
> Thanks for the feedback. Please see the comments below:
>
> On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com>
> wrote:
>
>> On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>> > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
>> > If the parallel option is there, then it will only do pg_start_backup,
>> > scans PGDATA and sends a list of file names.
>>
>> So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
>> tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
>> list of file names. I don't think that's a good approach. It's too
>> confusing to have one replication command that returns totally
>> different things depending on whether some option is given.
>>
>
> Sure. I will add a separate command (START_BACKUP) for parallel.
>
>
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
>> > list.
>> > pg_basebackup will then send back a list of filenames in this command.
>> > This command will be sent by each worker and that worker will be getting
>> > the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>>
>
> I considered this approach initially, however, I adopted the current
> strategy to avoid multiple round trips between the server and clients and
> save on query processing time by issuing a single command rather than
> multiple ones. Further fetching multiple files at once will also aid in
> supporting the tar format by utilising the existing ReceiveTarFile()
> function and will be able to create a tarball for per tablespace per worker.
>
>
>> > - STOP_BACKUP
>> > when all workers finish then, pg_basebackup will send STOP_BACKUP
>> > command.
>>
>> This also seems reasonable, but surely the matching command should
>> then be called START_BACKUP, not BASEBACKUP PARALLEL.
>>
>> > I have done a basic proof of concept (POC), which is also attached. I
>> > would appreciate some input on this. So far, I am simply dividing the list
>> > equally and assigning them to worker processes. I intend to fine tune this
>> > by taking into consideration file sizes. Further to add tar format support,
>> > I am considering that each worker process, processes all files belonging to
>> > a tablespace in its list (i.e. creates and copies tar file), before it
>> > processes the next tablespace. As a result, this will create tar files that
>> > are disjointed with respect to tablespace data. For example:
>>
>> Instead of doing this, I suggest that you should just maintain a list
>> of all the files that need to be fetched and have each worker pull a
>> file from the head of the list and fetch it when it finishes receiving
>> the previous file. That way, if some connections go faster or slower
>> than others, the distribution of work ends up fairly even. If you
>> instead pre-distribute the work, you're guessing what's going to
>> happen in the future instead of just waiting to see what actually does
>> happen. Guessing isn't intrinsically bad, but guessing when you could
>> be sure of doing the right thing *is* bad.
>>
>> If you want to be really fancy, you could start by sorting the files
>> in descending order of size, so that big files are fetched before
>> small ones. Since the largest possible file is 1GB and any database
>> where this feature is important is probably hundreds or thousands of
>> GB, this may not be very important. I suggest not worrying about it
>> for v1.
>>
>
> Ideally, I would like to support the tar format as well, which would be
> much easier to implement when fetching multiple files at once since that
> would enable using the existent functionality to be used without much
> change.
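
The suggestion quoted above, that each worker pull the next file from a shared list instead of receiving a pre-assigned share, could be sketched roughly as follows. This is only an illustration in Python, not code from the patch: `fetch_all`, the `fetch` callback, and the `(name, size)` tuples are hypothetical names, and the descending-size sort is the optional refinement mentioned in the same quote.

```python
import queue
import threading


def fetch_all(files, num_workers, fetch):
    """Fetch every file once, letting idle workers pull the next pending one.

    files: list of (name, size_in_bytes) tuples (hypothetical shape).
    fetch: caller-supplied callable taking a file name (hypothetical).
    Returns {worker_id: [names fetched by that worker, in order]}.
    """
    pending = queue.Queue()
    # Optional refinement: queue the largest files first.
    for name, _size in sorted(files, key=lambda f: f[1], reverse=True):
        pending.put(name)

    done = {w: [] for w in range(num_workers)}

    def worker(worker_id):
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # list exhausted, this worker is finished
            fetch(name)
            done[worker_id].append(name)  # each worker touches only its own list

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

With this shape, a slow connection simply ends up fetching fewer files; nothing has to be predicted up front about file sizes or connection speed.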
>
> Your idea of sorting the files in descending order of size seems very
> appealing. I think we can do this and have the file divided among the
> workers one by one i.e. the first file in the list goes to worker 1, the
> second to process 2, and so on and so forth.
>
>
>> > Say, tablespace t1 has 20 files and we have 5 worker processes and
>> > tablespace t2 has 10. Ignoring all other factors for the sake of this
>> > example, each worker process will get a group of 4 files of t1 and 2 files
>> > of t2. Each process will create 2 tar files, one for t1 containing 4 files
>> > and another for t2 containing 2 files.
>>
>> This is one of several possible approaches. If we're doing a
>> plain-format backup in parallel, we can just write each file where it
>> needs to go and call it good. But, with a tar-format backup, what
>> should we do? I can see three options:
>>
>> 1. Error! Tar format parallel backups are not supported.
>>
>> 2. Write multiple tar files. The user might reasonably expect that
>> they're going to end up with the same files at the end of the backup
>> regardless of whether they do it in parallel. A user with this
>> expectation will be disappointed.
>>
>> 3. Write one tar file. In this design, the workers have to take turns
>> writing to the tar file, so you need some synchronization around that.
>> Perhaps you'd have N threads that read and buffer a file, and N+1
>> buffers. Then you have one additional thread that reads the complete
>> files from the buffers and writes them to the tar file. There's
>> obviously some possibility that the writer won't be able to keep up
>> and writing the backup will therefore be slower than it would be with
>> approach (2).
>>
>> There's probably also a possibility that approach (2) would thrash the
>> disk head back and forth between multiple files that are all being
>> written at the same time, and approach (3) will therefore win by not
>> thrashing the disk head.
>> But, since spinning media are becoming less
>> and less popular and are likely to have multiple disk heads under the
>> hood when they are used, this is probably not too likely.
>>
>> I think your choice to go with approach (2) is probably reasonable,
>> but I'm not sure whether everyone will agree.
>>
>
> Yes for the tar format support, approach (2) is what I had in
> mind. Currently I'm working on the implementation and will share the patch
> in a couple of days.
>
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
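
For reference, the t1/t2 example quoted above under approach (2) can be worked through as below. This is a minimal sketch only, assuming files are handed out round-robin within each tablespace; `plan_tar_files` and the file names are hypothetical and do not come from the patch.

```python
from collections import defaultdict


def plan_tar_files(tablespaces, num_workers):
    """Assign files to workers round-robin within each tablespace.

    tablespaces: dict mapping tablespace name -> list of file names.
    Returns {worker_id: {tablespace: [files]}}; each inner entry stands
    for one tar file that the worker would write under approach (2).
    """
    plan = defaultdict(dict)
    for ts_name, files in tablespaces.items():
        for i, name in enumerate(files):
            worker_id = i % num_workers
            plan[worker_id].setdefault(ts_name, []).append(name)
    return dict(plan)


# The numbers from the quoted example: 20 files in t1, 10 in t2, 5 workers.
tablespaces = {
    "t1": [f"t1/file{i}" for i in range(20)],
    "t2": [f"t2/file{i}" for i in range(10)],
}
plan = plan_tar_files(tablespaces, num_workers=5)
for worker_id in sorted(plan):
    counts = {ts: len(files) for ts, files in plan[worker_id].items()}
    print(worker_id, counts)  # every worker: {'t1': 4, 't2': 2}
```

Each of the 5 workers ends up with 2 tar files, so the backup contains 10 tar files in total instead of one per tablespace, which is the user-visible difference raised under option 2.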