Re: WIP/PoC for parallel backup

From: Asim R P <apraveen@pivotal.io>
To: asifr.rehman@gmail.com
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2019-08-23T10:17:51Z
Lists: pgsql-hackers

Commits

  1. Fix failures in incremental_sort due to number of workers

  2. In jsonb_plpython.c, suppress warning message from gcc 10.

  3. Fix minor problems with non-exclusive backup cleanup.

Hi Asif

Interesting proposal.  The bulk of the work in a backup is transferring files
from the source data directory to the destination.  Your patch breaks this
task down into multiple sets of files and transfers each set in parallel.
This seems correct; however, your patch also creates a new process to
handle each set.  Is that necessary?  I think we should try to achieve this
using multiple asynchronous libpq connections from a single basebackup
process.  That is, use the PQconnectStartParams() interface instead of
PQconnectdbParams(), which is currently used by basebackup.  On the server
side, it may still result in multiple backend processes, one per connection,
and an attempt should be made to avoid that as well, but it seems complicated.

What do you think?

Asim