Re: WIP/PoC for parallel backup

From: Ahsan Hadi <ahsan.hadi@gmail.com>

To: Rushabh Lathia <rushabh.lathia@gmail.com>

Cc: Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, David Zhang <david.zhang@highgo.ca>, Asif Rehman <asifr.rehman@gmail.com>, Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, Robert Haas <robertmhaas@gmail.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>

Date: 2020-05-21T05:17:29Z

Lists: pgsql-hackers

Commits


Fix failures in incremental_sort due to number of workers
- 23ba3b5ee278 13.0 cited
In jsonb_plpython.c, suppress warning message from gcc 10.
- a06921816370 13.0 cited
Fix minor problems with non-exclusive backup cleanup.
- 303640199d04 13.0 cited

On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

>
>
> On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>
>> On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
>> <suraj.kharage@enterprisedb.com> wrote:
>> >
>> > Hi,
>> >
>> > We at EnterpriseDB did some performance testing around this parallel
>> backup to check how beneficial it is, and below are the results. In this
>> testing, we ran the backup:
>> > 1) Without Asif’s patch
>> > 2) With Asif’s patch and combination of workers 1,2,4,8.
>> >
>> > We ran those tests on two setups:
>> >
>> > 1) Client and Server both on the same machine (Local backups)
>> >
>> > 2) Client and server on a different machine (remote backups)
>> >
>> >
>> > Machine details:
>> >
>> > 1: Server (on which local backups were performed, and used as the server
>> for remote backups)
>> >
>> > 2: Client (Used as a client for remote backups)
>> >
>> >
>> ...
>> >
>> >
>> > With the client and server on the same machine, the results show around
>> a 50% improvement in parallel runs with 4 and 8 workers.  We don’t see a
>> large further improvement as more workers are added.
>> >
>> >
>> > Whereas, when the client and server are on different machines, we don’t
>> see any major benefit in performance.  This result matches the
>> testing results posted by David Zhang upthread.
>> >
>> >
>> >
>> > We ran the test for a 100GB backup with 4 parallel workers to see the CPU
>> usage and other information. What we noticed is that the server is at
>> almost 100% CPU the whole time, and pg_stat_activity shows that the server
>> is busy with ClientWrite most of the time.
>> >
>> >
>>
>> Was this for a setup where the client and server were on the same
>> machine or where the client was on a different machine?  If it was for
>> the case where both are on the same machine, then ideally, we should
>> see ClientRead events in a similar proportion?
>>
>
> In the particular setup, the client and server were on different machines.
>
>
>> During an offlist discussion with Robert, he pointed out that current
>> basebackup's code doesn't account for the wait event for the reading
>> of files which can change what pg_stat_activity shows?  Can you please
>> apply his latest patch to improve basebackup.c's code [1] which will
>> take care of that waitevent before getting the data again?
>>
>> [1] -
>> https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>>
>
>
> Sure, we can try this out and do a similar run to collect the
> pg_stat_activity output.
>

Have you had the chance to try this out?
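
When the run is repeated, one way to compare the pg_stat_activity output
before and after the patch is to poll the wait_event column for the backup
backend at a fixed interval and report each event's share of the samples.
Below is a minimal sketch of that tally only. The function name, the
"(not waiting)" label, and the sample values are assumptions made for
illustration; they are not measurements from this thread and not part of
either patch.

```python
from collections import Counter


def wait_event_shares(samples):
    """Return {wait_event: percent of samples}, most frequent first.

    `samples` holds one wait_event value per poll of pg_stat_activity
    for the backend being observed; None means it was not waiting.
    """
    counts = Counter("(not waiting)" if s is None else s for s in samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        event: round(100.0 * n / total, 1)
        for event, n in counts.most_common()
    }


if __name__ == "__main__":
    # Made-up samples for illustration only. "HypotheticalFileRead" is a
    # placeholder name, not a real wait event.
    example = ["ClientWrite"] * 7 + [None] * 2 + ["HypotheticalFileRead"]
    for event, pct in wait_event_shares(example).items():
        print(f"{event}: {pct}%")
```

Comparing the shares from a local run and a remote run would show
whether the time attributed to ClientWrite shifts once file reads are
reported as their own wait event.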


>
>
>> --
>> With Regards,
>> Amit Kapila.
>> EnterpriseDB: http://www.enterprisedb.com
>>
>>
>>
>
> --
> Rushabh Lathia
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: ahsan.hadi@highgo.ca