Re: WIP/PoC for parallel backup
From: Rushabh Lathia <rushabh.lathia@gmail.com>
To: Ahsan Hadi <ahsan.hadi@gmail.com>
Cc: Amit Kapila <amit.kapila16@gmail.com>,
    Suraj Kharage <suraj.kharage@enterprisedb.com>,
    David Zhang <david.zhang@highgo.ca>,
    Asif Rehman <asifr.rehman@gmail.com>,
    Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>,
    Robert Haas <robertmhaas@gmail.com>,
    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>,
    Jeevan Chalke <jeevan.chalke@enterprisedb.com>,
    PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2020-05-21T06:06:23Z
Lists: pgsql-hackers
Commits
- Fix failures in incremental_sort due to number of workers (23ba3b5ee278, 13.0, cited)
- In jsonb_plpython.c, suppress warning message from gcc 10. (a06921816370, 13.0, cited)
- Fix minor problems with non-exclusive backup cleanup. (303640199d04, 13.0, cited)
Attachments
- pg_stat_activity_j4_100GB.txt (text/plain)
- pg_stat_activity_normal_backup_100GB.txt (text/plain)
On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:

> On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com>
> wrote:
>
>> On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>>
>>> On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
>>> <suraj.kharage@enterprisedb.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > We at EnterpriseDB did some performance testing around this parallel
>>> > backup to check how this is beneficial and below are the results. In
>>> > this testing, we run the backup -
>>> > 1) Without Asif’s patch
>>> > 2) With Asif’s patch and combination of workers 1,2,4,8.
>>> >
>>> > We run those test on two setup
>>> >
>>> > 1) Client and Server both on the same machine (Local backups)
>>> >
>>> > 2) Client and server on a different machine (remote backups)
>>> >
>>> > Machine details:
>>> >
>>> > 1: Server (on which local backups performed and used as server for
>>> > remote backups)
>>> >
>>> > 2: Client (Used as a client for remote backups)
>>> >
>>> ...
>>> >
>>> > Client & Server on the same machine, the result shows around 50%
>>> > improvement in parallel run with worker 4 and 8. We don’t see the huge
>>> > performance improvement with more workers been added.
>>> >
>>> > Whereas, when the client and server on a different machine, we don’t
>>> > see any major benefit in performance. This testing result matches the
>>> > testing results posted by David Zhang up thread.
>>> >
>>> > We ran the test for 100GB backup with parallel worker 4 to see the CPU
>>> > usage and other information. What we noticed is that server is
>>> > consuming the CPU almost 100% whole the time and pg_stat_activity
>>> > shows that server is busy with ClientWrite most of the time.
>>> >
>>>
>>> Was this for a setup where the client and server were on the same
>>> machine or where the client was on a different machine? If it was for
>>> the case where both are on the same machine, then ideally, we should
>>> see ClientRead events in a similar proportion?
>>>
>>
>> In the particular setup, the client and server were on different
>> machines.
>>
>>> During an offlist discussion with Robert, he pointed out that current
>>> basebackup's code doesn't account for the wait event for the reading
>>> of files which can change what pg_stat_activity shows? Can you please
>>> apply his latest patch to improve basebackup.c's code [1] which will
>>> take care of that waitevent before getting the data again?
>>>
>>> [1] -
>>> https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>>>
>>
>> Sure, we can try out this and do a similar run to collect the
>> pg_stat_activity output.
>>
>
> Have you had the chance to try this out?
>

Yes. My colleague Suraj tried this, and the pg_stat_activity output
files are attached.

Wait events were captured every 3 seconds during the backup for:

1: Parallel backup of 100GB data with 4 workers
   (pg_stat_activity_j4_100GB.txt)
2: Normal backup (without the parallel backup patch) of 100GB data
   (pg_stat_activity_normal_backup_100GB.txt)

Here are the observations:

The total number of events (pg_stat_activity) captured during the above runs:
- 314 events for the normal backup
- 316 events for the parallel backup (-j 4)

BaseBackupRead wait event numbers: (newly added)
37 - in the normal backup
25 - in the parallel backup (-j 4)

ClientWrite wait event numbers:
175 - in the normal backup
1098 - in the parallel backup

ClientRead wait event numbers:
0 - in the normal backup
326 - in the parallel backup, across different processes (all in idle state)

Thanks,
Rushabh Lathia
www.EnterpriseDB.com
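
For anyone who wants to redo this kind of tally on their own captures, the
counting step could be sketched roughly as below. This is only an
illustration, not the script that was used for the numbers above: the
function name tally_wait_events and the in-memory input format (one
(wait_event, state) tuple per pg_stat_activity row per sample) are
assumptions, and the example rows are made up. With -j 4 a single sample
presumably contains rows for several backends, which would explain
per-event counts that exceed the number of samples.

```python
from collections import Counter


def tally_wait_events(samples):
    """Count how often each wait event appears across all sampled rows.

    `samples` is a hypothetical in-memory form of the captured output:
    one (wait_event, state) tuple per backend row per sample.
    Rows without a wait event (None or empty string) are skipped.
    """
    counts = Counter()
    for wait_event, _state in samples:
        if wait_event:
            counts[wait_event] += 1
    return counts


# Made-up rows for illustration only; these are not the measured data.
example = [
    ("ClientWrite", "active"),
    ("ClientWrite", "active"),
    ("BaseBackupRead", "active"),
    ("ClientRead", "idle"),
    (None, "active"),
]
counts = tally_wait_events(example)
for event, n in counts.most_common():
    print(f"{n} - {event}")
```

Parsing the attached text files into such tuples is left out on purpose,
since the sketch makes no assumption about their exact layout.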