Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com>

From: Robert Haas <robertmhaas@gmail.com>

To: Asif Rehman <asifr.rehman@gmail.com>

Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>

Date: 2019-10-04T12:07:42Z

Lists: pgsql-hackers

Commits


Fix failures in incremental_sort due to number of workers
- 23ba3b5ee278 13.0 cited
In jsonb_plpython.c, suppress warning message from gcc 10.
- a06921816370 13.0 cited
Fix minor problems with non-exclusive backup cleanup.
- 303640199d04 13.0 cited

On Fri, Oct 4, 2019 at 7:02 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Based on my understanding, your main concern is that the files won't be distributed fairly, i.e., one worker might get a big file and take more time while the others finish early with smaller files? In this approach I have created a list of files in descending order based on their sizes, so all the large files come at the top. The maximum file size in PG is 1GB, so if we have four workers picking up files from the list one by one, the worst-case scenario is that one worker gets a 1GB file to process while the others get smaller files. However, with this approach of sorting files by descending size and handing them out to workers one by one, there is a very high likelihood of workers getting work evenly. Does this address your concerns?

Somewhat, but I'm not sure it's good enough. There are lots of reasons
why two processes that are started at the same time with the same
amount of work might not finish at the same time.
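
As a rough illustration of that point, here is a toy model (not anything from the patch). It compares two ways of handing out files: dividing them among workers up front, and letting each worker take the next file from a shared list when it becomes idle. The function names, the `speeds` parameter and the numbers are all hypothetical. A slower worker here stands in for whatever might hold one process back, such as I/O contention or a slow network link.

```python
import heapq


def static_makespan(sizes, speeds):
    """Time to finish when files are dealt out round-robin in advance."""
    totals = [0.0] * len(speeds)
    for i, size in enumerate(sizes):
        worker = i % len(speeds)
        totals[worker] += size / speeds[worker]
    # The backup is done only when the slowest worker is done.
    return max(totals)


def dynamic_makespan(sizes, speeds):
    """Time to finish when an idle worker takes the next file from a shared list."""
    # Heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for size in sizes:
        start, worker = heapq.heappop(free_at)
        done = start + size / speeds[worker]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, worker))
    return finish


# Eight equally sized files, four workers, one of them running at half speed.
sizes = [100] * 8
speeds = [1.0, 1.0, 1.0, 0.5]
print(static_makespan(sizes, speeds), dynamic_makespan(sizes, speeds))
```

In this made-up case every worker is given the same amount of data under the static split, yet the slow one finishes well after the rest. Taking files from a shared list narrows the gap but does not close it.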

I'm also not particularly excited about having the server do the
sorting based on file size.  Seems like that ought to be the client's
job, if the client needs the sorting.
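
A minimal sketch of what client-side ordering could look like, assuming the server returns an unsorted list of (path, size) pairs. The function name, the tuple layout and the sample paths are hypothetical, not part of any existing protocol or of the patch under discussion.

```python
def order_largest_first(files):
    """Return (path, size_in_bytes) pairs sorted by descending size.

    Ties are broken by path so the order is deterministic.
    """
    return sorted(files, key=lambda entry: (-entry[1], entry[0]))


# Hypothetical file list as the server might send it, in no particular order.
file_list = [
    ("base/1/1259", 106496),
    ("base/1/2619", 1073741824),
    ("global/pg_control", 8192),
    ("base/1/2608", 106496),
]

for path, size in order_largest_first(file_list):
    print(size, path)
```

With the ordering done on the client, the server only has to report file names and sizes, and a client that does not care about the order pays nothing for it.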

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company