Re: WIP/PoC for parallel backup

Amit Kapila <amit.kapila16@gmail.com>

From: Amit Kapila <amit.kapila16@gmail.com>
To: Rushabh Lathia <rushabh.lathia@gmail.com>
Cc: Ahsan Hadi <ahsan.hadi@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, David Zhang <david.zhang@highgo.ca>, Asif Rehman <asifr.rehman@gmail.com>, Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, Robert Haas <robertmhaas@gmail.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2020-05-21T06:53:56Z
Lists: pgsql-hackers

Commits

  1. Fix failures in incremental_sort due to number of workers

  2. In jsonb_plpython.c, suppress warning message from gcc 10.

  3. Fix minor problems with non-exclusive backup cleanup.

On Thu, May 21, 2020 at 11:36 AM Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
>
> On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
>>
>>>>
>>>> During an offlist discussion with Robert, he pointed out that the
>>>> current basebackup code doesn't report a wait event while reading
>>>> files, which can change what pg_stat_activity shows.  Can you please
>>>> apply his latest patch to improve basebackup.c [1], which takes care
>>>> of that wait event, before collecting the data again?
>>>>
>>>> [1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>>>
>>>
>>>
>>> Sure, we can try this out and do a similar run to collect the pg_stat_activity output.
>>
>>
>> Have you had the chance to try this out?
>
>
> Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
>
> Wait events were captured every 3 seconds during the backup for:
> 1: Parallel backup for 100GB data with 4 workers (pg_stat_activity_j4_100GB.txt)
> 2: Normal backup (without parallel backup patch) for 100GB data (pg_stat_activity_normal_backup_100GB.txt)
>
> Here are the observations:
>
> The total number of events (pg_stat_activity) captured during the above runs:
> - 314 events for normal backups
> - 316 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 37 - in normal backups
> 25 - in the parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 175 - in normal backup
> 1098 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - in normal backup
> 326 - in parallel backups, across different processes (all in idle state)
>

It might be interesting to see why the ClientRead/ClientWrite counts
have increased so much, and whether we can reduce them.
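
To put the numbers quoted above on a common scale, here is a minimal sketch that divides each wait event count by the number of captures in that run. It assumes the 314 and 316 "events" are pg_stat_activity snapshots (one every 3 seconds), which the report does not state explicitly; the names SNAPSHOTS, WAIT_EVENTS, per_snapshot and summarize are purely illustrative.

```python
# Counts copied from the quoted report; nothing here is newly measured.
# Assumption: the 314/316 "events" are the number of pg_stat_activity
# snapshots taken during each run.
SNAPSHOTS = {"normal": 314, "parallel_j4": 316}

WAIT_EVENTS = {
    "BaseBackupRead": {"normal": 37, "parallel_j4": 25},
    "ClientWrite": {"normal": 175, "parallel_j4": 1098},
    "ClientRead": {"normal": 0, "parallel_j4": 326},
}


def per_snapshot(event, run):
    """Average number of backends seen in `event` per captured snapshot."""
    return WAIT_EVENTS[event][run] / SNAPSHOTS[run]


def summarize():
    """Return (event, normal, parallel) tuples, rounded to two decimals."""
    return [
        (
            event,
            round(per_snapshot(event, "normal"), 2),
            round(per_snapshot(event, "parallel_j4"), 2),
        )
        for event in WAIT_EVENTS
    ]


if __name__ == "__main__":
    for event, normal, parallel in summarize():
        print(f"{event:15s} normal={normal:.2f}  parallel(-j 4)={parallel:.2f}")
```

If that assumption holds, the parallel run shows about 3.47 backends per snapshot in ClientWrite against about 0.56 for the normal run, so the per-snapshot figure is probably the more useful one to compare across runs with different connection counts.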

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com