Thread

Re: Parallel Apply

Nisha Moond <nisha.moond412@gmail.com> — 2025-08-30T05:12:56Z
Hi,

I ran tests to compare the performance of logical synchronous
replication with parallel-apply against physical synchronous
replication.

Highlights
===============
On pgHead (current behavior):
 - With synchronous physical replication set to remote_apply, the
Primary’s TPS drops by ~60% (≈2.5x slower than asynchronous).
 - With synchronous logical replication set to remote_apply, the
Publisher’s TPS drops drastically by ~94% (≈16x slower than
asynchronous).

With the proposed parallel-apply patch (v1):
 - Parallel apply improves logical synchronous replication
throughput by up to ~5.5x over pgHead.
 - With 40 parallel workers on the subscriber, the Publisher achieves
30045.82 TPS, which is 5.5× faster than the no-patch case (5435.46
TPS).
 - With the patch, the Publisher’s performance is only ~3x slower than
asynchronous, bringing it much closer to the physical replication
case.

Machine details
===============
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz, CPU(s): 88 cores, 503 GiB RAM

Source code:
===============
 - pgHead(e9a31c0cc60) and v1 patch

Test-01: Physical replication:
======================
 - To measure the physical synchronous replication performance on pgHead.

Setup & Workload:
-----------------
Primary --> Standby
 - Two nodes created in physical (primary-standby) replication setup.
 - Default pgbench (read-write) was run on the Primary with scale=300,
#clients=40, run duration=20 minutes.
 - The TPS is measured with the synchronous_commit set as "off" vs
"remote_apply" on pgHead.

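For anyone who wants to reproduce the workload without the attached
scripts, the pgbench invocations implied by the parameters above can be
sketched roughly as below. This is only an illustration and may differ
from the attached scripts: the database name "postgres" and the helper
names are assumptions, and the thread count (-j) is not stated in this
mail, so it is left out.

```python
# Sketch only: builds the pgbench command lines implied by the test
# parameters (scale=300, 40 clients, 20 minutes). DBNAME and the helper
# names are assumptions, not taken from the attached scripts.

DBNAME = "postgres"  # assumed database name


def pgbench_init_cmd(scale=300, dbname=DBNAME):
    # -i initializes the pgbench tables, -s sets the scale factor
    return ["pgbench", "-i", "-s", str(scale), dbname]


def pgbench_run_cmd(clients=40, minutes=20, dbname=DBNAME):
    # -c is the number of clients, -T the run duration in seconds;
    # with no -b/-f option pgbench runs its default read-write script
    return ["pgbench", "-c", str(clients), "-T", str(minutes * 60), dbname]


if __name__ == "__main__":
    print(" ".join(pgbench_init_cmd()))
    print(" ".join(pgbench_run_cmd()))
```

The functions only build the argument lists; they can be passed to
subprocess.run() on the Primary (or Publisher) host.
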
Results:
---------
synchronous_commit     Primary_TPS     regression
OFF                    90466.57743     -
remote_apply(run1)     35848.6558      -60%
remote_apply(run2)     35306.25479     -61%

 - On pgHead, when synchronous_commit is set to "remote_apply" during
physical replication, the Primary experiences a 60–61% reduction in
TPS, which is ~2.5 times slower.
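
The regression and slowdown figures follow directly from the TPS
values. A minimal sketch of the arithmetic, with the numbers copied from
the table above (the function names are mine, not from the test
scripts):

```python
def regression_pct(baseline_tps, measured_tps):
    """Percentage change in TPS relative to the baseline (negative = slower)."""
    return (measured_tps - baseline_tps) / baseline_tps * 100.0


def slowdown(baseline_tps, measured_tps):
    """How many times slower the measured run is than the baseline."""
    return baseline_tps / measured_tps


ASYNC_TPS = 90466.57743  # synchronous_commit = off
SYNC_TPS = {"run1": 35848.6558, "run2": 35306.25479}  # remote_apply

for run, tps in SYNC_TPS.items():
    print(f"{run}: {regression_pct(ASYNC_TPS, tps):.0f}%, "
          f"{slowdown(ASYNC_TPS, tps):.2f}x slower")
```
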
~~~

Test-02: Logical replication:
=====================
 - To measure the logical synchronous replication performance on
pgHead and with parallel-apply patch.

Setup & Workload:
-----------------
Publisher --> Subscriber
 - Two nodes created in logical (publisher-subscriber) replication setup.
 - Default pgbench (read-write) was run on the Publisher with scale=300,
#clients=40, run duration=20 minutes.
 - The TPS is measured on pgHead and with the parallel-apply v1 patch.
 - The number of parallel workers was varied as 2, 4, 8, 16, 32, 40.

case-01: pgHead
-------------------
Results:
synchronous_commit      Publisher_TPS    regression
pgHead(OFF)             89138.14626      --
pgHead(remote_apply)    5435.464525      -94%

 - On pgHead (the default behavior), synchronous logical replication
sees a 94% drop in TPS, which is:
 a) 16.4 times slower than the logical async case, and
 b) 6.6 times slower than the physical sync replication case.

case-02: patched
---------------------
 - synchronous_commit = 'remote_apply'
 - measured the performance by varying #parallel workers as 2, 4, 8, 16, 32, 40

Results:
#workers    Publisher_TPS    Improvement_with_patch    faster_than_no-patch
   2        9679.077736      78%                       1.78x
   4        14329.64073      164%                      2.64x
   8        21832.04285      302%                      4.02x
  16        27676.47085      409%                      5.09x
  32        29718.40090      447%                      5.47x
  40        30045.82365      453%                      5.53x

- The TPS on the publisher improves significantly as the number of
parallel workers increases.
- At 40 workers, the TPS reaches 30045.82, which is about 5.5x higher
than the no-patch case.
- With 40 parallel workers, logical sync replication is only about
1.2x slower than physical sync replication.
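
The improvement columns and the comparison with physical sync
replication can be recomputed from the raw TPS values. A small sketch,
with the numbers copied from the tables in this mail and variable names
of my own choosing (run1 is used as the physical sync reference, which
is an assumption on my side):

```python
NO_PATCH_TPS = 5435.464525      # pgHead, logical, remote_apply
PHYSICAL_SYNC_TPS = 35848.6558  # Test-01, remote_apply (run1, assumed reference)

# #workers -> publisher TPS with the v1 patch
PATCHED_TPS = {
    2: 9679.077736,
    4: 14329.64073,
    8: 21832.04285,
    16: 27676.47085,
    32: 29718.40090,
    40: 30045.82365,
}


def speedup(tps, baseline=NO_PATCH_TPS):
    """How many times faster a patched run is than the no-patch case."""
    return tps / baseline


for workers, tps in PATCHED_TPS.items():
    factor = speedup(tps)
    print(f"{workers:>3} workers: {factor:.2f}x faster, "
          f"+{(factor - 1) * 100:.0f}%, "
          f"{PHYSICAL_SYNC_TPS / tps:.2f}x slower than physical sync")
```
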
~~~

The scripts used for the tests are attached. We'll do tests with
larger data sets later and share results.

--
Thanks,
Nisha