Re: WIP/PoC for parallel backup

Robert Haas <robertmhaas@gmail.com>

From: Robert Haas <robertmhaas@gmail.com>

To: Rushabh Lathia <rushabh.lathia@gmail.com>

Cc: Ahsan Hadi <ahsan.hadi@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, David Zhang <david.zhang@highgo.ca>, Asif Rehman <asifr.rehman@gmail.com>, Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>

Date: 2020-05-21T13:41:54Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Fix failures in incremental_sort due to number of workers
- 23ba3b5ee278 13.0 cited
In jsonb_plpython.c, suppress warning message from gcc 10.
- a06921816370 13.0 cited
Fix minor problems with non-exclusive backup cleanup.
- 303640199d04 13.0 cited

On Thu, May 21, 2020 at 2:06 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
>
> Captured wait events after every 3 seconds during the backup for -
> 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
> 2: Normal backup (without parallel backup patch) for 100GB data  (pg_stat_activity_j4_100GB.txt)
>
> Here is the observation:
>
> The total number of events (pg_stat_activity) captured during above runs:
> - 314 events for normal backups
> - 316 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 37 - in normal backups
> 25 - in the parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 175 - in normal backup
> 1098 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 326 - ClientRead in parallel backups for diff processes. (all in idle state)

So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting for ClientRead/ClientWrite is
because the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.

Do we have the same results for the local backup case, where the patch helped?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company