Re: WIP/PoC for parallel backup

From: Asim R P <apraveen@pivotal.io>

To: asifr.rehman@gmail.com

Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>

Date: 2019-08-23T10:17:51Z

Lists: pgsql-hackers

Hi Asif

Interesting proposal.  The bulk of the work in a backup is transferring files
from the source data directory to the destination.  Your patch breaks this
task down into multiple sets of files and transfers each set in parallel.
This seems correct; however, your patch also creates a new process to
handle each set.  Is that necessary?  I think we should try to achieve this
using multiple asynchronous libpq connections from a single basebackup
process.  That is, use the PQconnectStartParams() interface instead of
PQconnectdbParams(), which is currently used by basebackup.  On the server
side, it may still result in multiple backend processes, one per connection,
and an attempt should be made to avoid that as well, but it seems complicated.

What do you think?

Asim