Re: WIP/PoC for parallel backup

David Zhang <david.zhang@highgo.ca>

From: David Zhang <david.zhang@highgo.ca>
To: Suraj Kharage <suraj.kharage@enterprisedb.com>, Amit Kapila <amit.kapila16@gmail.com>
Cc: Ahsan Hadi <ahsan.hadi@gmail.com>, Asif Rehman <asifr.rehman@gmail.com>, Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, Robert Haas <robertmhaas@gmail.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2020-04-30T06:26:16Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Fix failures in incremental_sort due to number of workers

  2. In jsonb_plpython.c, suppress warning message from gcc 10.

  3. Fix minor problems with non-exclusive backup cleanup.

Attachments

Hi,

Thanks a lot for sharing the test results. Here is the our test results 
using perf on three ASW t2.xlarge with below configuration.

Machine configuration:
       Instance Type        :t2.xlarge
       Volume type          :io1
       Memory (MiB)         :16GB
       vCPU #                   :4
       Architecture           :x86_64
       IOP                         :6000
       Database Size (GB)  :45 (Server)

case 1: postgres server: without patch and without load

* Disk I/O:

# Samples: 342K of event 'block:block_rq_insert'
# Event count (approx.): 342834
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ................. .....................
#
     97.65%  postgres         [kernel.kallsyms]  [k] __elv_add_request
      2.27%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request


* CPU:

# Samples: 6M of event 'cpu-clock'
# Event count (approx.): 1559444750000
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  .................... 
.............................................
#
     64.73%  swapper          [kernel.kallsyms]     [k] native_safe_halt
     10.89%  postgres         [vdso]                [.] __vdso_gettimeofday
      5.64%  postgres         [kernel.kallsyms]     [k] do_syscall_64
      5.43%  postgres         libpthread-2.26.so    [.] __libc_recv
      1.72%  postgres         [kernel.kallsyms]     [k] 
pvclock_clocksource_read

* Network:

# Samples: 2M of event 'skb:consume_skb'
# Event count (approx.): 2739785
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ................. ...........................
#
     91.58%  swapper          [kernel.kallsyms]  [k] consume_skb
      7.09%  postgres         [kernel.kallsyms]  [k] consume_skb
      0.61%  kswapd0          [kernel.kallsyms]  [k] consume_skb
      0.44%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb


case 1: pg_basebackup client: without patch and without load

* Disk I/O:

# Samples: 371K of event 'block:block_rq_insert'
# Event count (approx.): 371362
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ................. .....................
#
     96.78%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
      2.82%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
      0.29%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
      0.09%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request


* CPU:

# Samples: 3M of event 'cpu-clock'
# Event count (approx.): 903527000000
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  .................. 
.............................................
#
     87.99%  swapper          [kernel.kallsyms]   [k] native_safe_halt
      3.14%  swapper          [kernel.kallsyms]   [k] __lock_text_start
      0.48%  swapper          [kernel.kallsyms]   [k] 
__softirqentry_text_start
      0.37%  pg_basebackup    [kernel.kallsyms]   [k] 
copy_user_enhanced_fast_string
      0.35%  swapper          [kernel.kallsyms]   [k] do_csum

* Network:

# Samples: 12M of event 'skb:consume_skb'
# Event count (approx.): 12260713
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ................. ...........................
#
     95.12%  swapper          [kernel.kallsyms]  [k] consume_skb
      3.23%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
      0.83%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
      0.45%  kswapd0          [kernel.kallsyms]  [k] consume_skb


case 2: postgres server: with patch and with load, 4 backup workers on 
client side

* Disk I/O:

# Samples: 3M of event 'block:block_rq_insert'
# Event count (approx.): 3634542
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ................. .....................
#
     98.88%  postgres         [kernel.kallsyms]  [k] __elv_add_request
      0.66%  perf             [kernel.kallsyms]  [k] __elv_add_request
      0.42%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
      0.01%  sshd             [kernel.kallsyms]  [k] __elv_add_request

* CPU:

# Samples: 9M of event 'cpu-clock'
# Event count (approx.): 2299129250000
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ..................... 
.............................................
#
     52.73%  swapper          [kernel.kallsyms]      [k] native_safe_halt
      8.31%  postgres         [vdso]                 [.] __vdso_gettimeofday
      4.46%  postgres         [kernel.kallsyms]      [k] do_syscall_64
      4.16%  postgres         libpthread-2.26.so     [.] __libc_recv
      1.58%  postgres         [kernel.kallsyms]      [k] __lock_text_start
      1.52%  postgres         [kernel.kallsyms]      [k] 
pvclock_clocksource_read
      0.81%  postgres         [kernel.kallsyms]      [k] 
copy_user_enhanced_fast_string


* Network:

# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6048795
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ................. ...........................
#
     85.81%  postgres         [kernel.kallsyms]  [k] consume_skb
     12.03%  swapper          [kernel.kallsyms]  [k] consume_skb
      0.97%  postgres         [kernel.kallsyms]  [k] __consume_stateless_skb
      0.85%  ksoftirqd/3      [kernel.kallsyms]  [k] consume_skb
      0.24%  perf             [kernel.kallsyms]  [k] consume_skb


case 2: pg_basebackup 4 workers: with patch and with load

* Disk I/O:

# Samples: 372K of event 'block:block_rq_insert'
# Event count (approx.): 372360
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ................. .....................
#
     97.26%  kworker/u30:0    [kernel.kallsyms]  [k] __elv_add_request
      1.45%  pg_basebackup    [kernel.kallsyms]  [k] __elv_add_request
      0.95%  kworker/u30:1    [kernel.kallsyms]  [k] __elv_add_request
      0.14%  xfsaild/xvda1    [kernel.kallsyms]  [k] __elv_add_request


* CPU:

# Samples: 4M of event 'cpu-clock'
# Event count (approx.): 1234071000000
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ........................ 
.................................................
#
     89.25%  swapper          [kernel.kallsyms]         [k] native_safe_halt
      0.93%  pg_basebackup    [kernel.kallsyms]         [k] 
__lock_text_start
      0.91%  swapper          [kernel.kallsyms]         [k] 
__lock_text_start
      0.69%  pg_basebackup    [kernel.kallsyms]         [k] 
copy_user_enhanced_fast_string
      0.45%  swapper          [kernel.kallsyms]         [k] do_csum


* Network:

# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6449013
#
# Overhead  Command          Shared Object Symbol
# ........  ...............  ................. ...........................
#
     90.28%  pg_basebackup    [kernel.kallsyms]  [k] consume_skb
      9.09%  swapper          [kernel.kallsyms]  [k] consume_skb
      0.29%  ksoftirqd/1      [kernel.kallsyms]  [k] consume_skb
      0.21%  sshd             [kernel.kallsyms]  [k] consume_skb


The detailed perf report is attached, with different scenarios, i.e. 
without patch (with and without load for server and client) , with patch 
(with and without load for 1, 2, 4, 8 workers for both server and 
client). The file name should self explain the cases.

Let me know if more information required.

Best regards,

David

On 2020-04-29 5:41 a.m., Suraj Kharage wrote:
> Hi,
>
> We at EnterpriseDB did some performance testing around this 
> parallel backup to check how this is beneficial and below are the 
> results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> *Machine details: *
>
> 1: Server (on which local backups performed and used as server for 
> remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
> *Server:*
>
> RAM:500 GB
> CPU details:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 128
> On-line CPU(s) list: 0-127
> Thread(s) per core: 2
> Core(s) per socket: 8
> Socket(s): 8
> NUMA node(s): 8
> Filesystem:ext4
>
>
> *Client:*
> RAM:490 GB
> CPU details:
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 192
> On-line CPU(s) list: 0-191
> Thread(s) per core: 8
> Core(s) per socket: 1
> Socket(s): 24
> Filesystem:ext4
>
> Below are the results for the local test:
>
> Data size 	without paralle backup
> patch 	parallel backup with
> 1 worker 	% performance
> increased/decreased
> compare to normal
> backup
> (without patch) 	parallel backup with
> 2 worker 	% performance
> increased/decreased
> compare to normal
> backup
> (without patch) 	parallel backup with
> 4 worker 	% performance
> increased/decreased
> compare to normal
> backup
> (without patch) 	parallel backup with
> 8 worker 	% performance
> increased/decreased
> compare to normal
> backup
> (without patch)
> 10 GB
> (10 tables - each table around 1.05 GB) 	real 0m27.016s
> user 0m3.378s
> sys 0m23.059s 	real 0m30.314s
> user 0m3.575s
> sys 0m22.946s 	12% performance
> decreased 	real 0m20.400s
> user 0m3.622s
> sys 0m29.670s 	27% performace
> increased 	real 0m15.331s
> user 0m3.706s
> sys 0m39.189s 	43% performance
> increased 	real 0m15.094s
> user 0m3.915s
> sys 1m23.350s 	44% performace
> increased.
> 50GB
> (50 tables - each table around 1.05 GB) 	real 2m11.049s
> user 0m16.464s
> sys 2m1.757s 	real 2m26.621s
> user 0m18.497s
> sys 2m4.792s 	21% performance
> decreased 	real 1m9.581s
> user 0m18.298s
> sys 2m12.030s 	46% performance
> increased 	real 0m53.894s
> user 0m18.588s
> sys 2m47.390s 	58% performance
> increased. 	real 0m55.373s
> user 0m18.423s
> sys 5m57.470s 	57% performance
> increased.
> 100GB
> (100 tables - each table around 1.05 GB) 	real 4m4.776s
> user 0m33.699s
> sys 3m27.777s 	real 4m20.862s
> user 0m35.753s
> sys 3m28.262s 	6% performance
> decreased 	real 2m37.411s
> user 0m36.440s
> sys 4m16.424s" 	35% performance
> increased 	real 1m49.503s
> user 0m37.200s
> sys 5m58.077s 	55% performace
> increased 	real 1m36.762s
> user 0m36.987s
> sys 9m36.906s 	60% performace
> increased.
> 200GB
> (200 tables - each table around 1.05 GB) 	real 10m34.998s
> user 1m8.471s
> sys 7m21.520s 	real 11m30.899s
> user 1m12.933s
> sys 8m14.496s 	8% performance
> decreased 	real 6m8.481s
> user 1m13.771s
> sys 9m31.216s 	41% performance
> increased 	real 4m2.403s
> user 1m18.331s
> sys 12m29.661s 	61% performance
> increased 	real 4m3.768s
> user 1m24.547s
> sys 15m21.421s 	61% performance
> increased
>
>
> Results for the remote test:
>
> Data size 	without paralle backup
> patch 	parallel backup with
> 1 worker 	% performance
> increased/decreased
> compare to normal
> backup
> (without patch) 	parallel backup with
> 2 worker 	% performance
> increased/decreased
> compare to normal
> backup
> (without patch) 	parallel backup with
> 4 worker 	% performance
> increased/decreased
> compare to normal
> backup
> (without patch) 	parallel backup with
> 8 worker 	% performance
> increased/decreased
> compare to normal
> backup
> (without patch)
> 10 GB
> (10 tables - each table around 1.05 GB) 	real 1m36.829s
> user 0m2.124s
> sys 0m14.004s 	real 1m37.598s
> user 0m3.272s
> sys 0m11.110s 	0.8% performance
> decreased 	real 1m36.753s
> user 0m2.627s
> sys 0m15.312s 	0.08% performance
> increased. 	real 1m37.212s
> user 0m3.835s
> sys 0m13.221s 	0.3% performance
> decreased. 	real 1m36.977s
> user 0m4.475s
> sys 0m17.937s 	0.1% perfomance
> decreased.
> 50GB
> (50 tables - each table around 1.05 GB) 	real 7m54.211s
> user 0m10.826s
> sys 1m10.435s 	real 7m55.603s
> user 0m16.535s
> sys 1m8.147s 	0.2% performance
> decreased 	real 7m53.499s
> user 0m18.131s
> sys 1m8.822s 	0.1% performance
> increased. 	real 7m54.687s
> user 0m15.818s
> sys 1m30.991s 	0.1% performance
> decreased 	real 7m54.658s
> user 0m20.783s
> sys 1m34.460s 	0.1% performance
> decreased
> 100GB
> (100 tables - each table around 1.05 GB) 	real 15m45.776s
> user 0m21.802s
> sys 2m59.006s 	real 15m46.315s
> user 0m32.499s
> sys 2m47.245s 	0.05% performance
> decreased 	real 15m46.065s
> user 0m28.877s
> sys 2m21.181s 	0.03% performacne
> drcreased 	real 15m47.793s
> user 0m30.932s
> sys 2m36.708s 	0.2% performance
> decresed 	real 15m47.129s
> user 0m35.151s
> sys 3m23.572s 	0.14% performance
> decreased.
> 200GB
> (200 tables - each table around 1.05 GB) 	real 32m55.720s
> user 0m50.602s
> sys 5m38.875s 	real 31m30.602s
> user 0m45.377s
> sys 4m57.405s 	4% performance
> increased 	real 31m30.214s
> user 0m55.023s
> sys 5m8.689s 	4% performance
> increased 	real 31m31.187s
> user 1m13.390s
> sys 5m40.861s 	4% performance
> increased 	real 31m31.729s
> user 1m4.955s
> sys 6m35.774s 	4% performance
> decreased
>
>
>
> Client & Server on the same machine, the result shows around 50% 
> improvement in parallel run with worker 4 and 8.  We don’t see the 
> huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t 
> see any major benefit in performance.  This testing result matches the 
> testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU 
> usage and other information. What we noticed is that server is 
> consuming the CPU almost 100% whole the time and pg_stat_activity 
> shows that server is busy with ClientWrite most of the time.
>
>
> Attaching captured output for
>
> 1) Top command output on the server after every 5 second
>
> 2) pg_stat_activity output after every 5 second
>
> 3) Top command output on the client after every 5 second
>
>
> Do let me know if anyone has further questions/inputs for the 
> benchmarking.
>
>
> Thanks to Rushabh Lathia for helping me with this testing.
>
> On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit.kapila16@gmail.com 
> <mailto:amit.kapila16@gmail.com>> wrote:
>
>     On Mon, Apr 27, 2020 at 10:23 PM David Zhang
>     <david.zhang@highgo.ca <mailto:david.zhang@highgo.ca>> wrote:
>     >
>     > Hi,
>     >
>     > Here is the parallel backup performance test results with and
>     without
>     > the patch "parallel_backup_v15" on AWS cloud environment. Two
>     > "t2.xlarge" machines were used: one for Postgres server and the
>     other
>     > one for pg_basebackup with the same machine configuration
>     showing below.
>     >
>     > Machine configuration:
>     >      Instance Type        :t2.xlarge
>     >      Volume type          :io1
>     >      Memory (MiB)         :16GB
>     >      vCPU #               :4
>     >      Architecture         :x86_64
>     >      IOP                  :6000
>     >      Database Size (GB)   :108
>     >
>     > Performance test results:
>     > without patch:
>     >      real 18m49.346s
>     >      user 1m24.178s
>     >      sys 7m2.966s
>     >
>     > 1 worker with patch:
>     >      real 18m43.201s
>     >      user 1m55.787s
>     >      sys 7m24.724s
>     >
>     > 2 worker with patch:
>     >      real 18m47.373s
>     >      user 2m22.970s
>     >      sys 11m23.891s
>     >
>     > 4 worker with patch:
>     >      real 18m46.878s
>     >      user 2m26.791s
>     >      sys 13m14.716s
>     >
>     > As required, I didn't have the pgbench running in parallel like
>     we did
>     > in the previous benchmark.
>     >
>
>     So, there doesn't seem to be any significant improvement in this
>     scenario.  Now, it is not clear why there was a significant
>     improvement in the previous run where pgbench was also running
>     simultaneously.  I am not sure but maybe it is because when a lot of
>     other backends were running (performing read-only workload) the
>     backend that was responsible for doing backup was getting frequently
>     scheduled out and it slowed down the overall backup process. And when
>     we start using multiple backends for backup one or other backup
>     process is always running making the overall backup faster. One idea
>     to find this out is to check how much time backup takes when we run it
>     with and without pgbench workload on HEAD (aka unpatched code).  Even
>     if what I am saying is true or there is some other reason due to which
>     we are seeing speedup in some cases (where there is a concurrent
>     workload), it might not make the case for using multiple backends for
>     backup but still, it is good to find that information as it might help
>     in designing this feature better.
>
>     > The perf report files for both Postgres server and pg_basebackup
>     sides
>     > are attached.
>     >
>
>     It is not clear which functions are taking more time or for which
>     functions time is reduced as function symbols are not present in the
>     reports.  I think you can refer
>     "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
>     take profiles and additionally use -fno-omit-frame-pointer during
>     configure (you can use CFLAGS="-fno-omit-frame-pointer during
>     configure).
>
>
>     -- 
>     With Regards,
>     Amit Kapila.
>     EnterpriseDB: http://www.enterprisedb.com
>
>
>
>
> -- 
> -- 
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
-- 
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca