Thread

  1. Re: Conflict detection for update_deleted in logical replication

    Nisha Moond <nisha.moond412@gmail.com> — 2025-07-25T11:38:01Z

    Hi All,
    
    We conducted performance testing of a bi-directional logical
    replication setup, focusing on the primary use case of the
    update_deleted feature.
    To simulate a realistic scenario, we used a high workload with limited
    concurrent updates, and well-distributed writes among servers.
    
    Source used
    ===========
    pgHead commit 62a17a92833 + v47 patch set
    
    Machine details
    ===============
    Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz, CPU(s): 88 cores, 503 GiB RAM
    
    Test-1: Distributed Write Load
    ==============================
    Highlight:
    -----------
     - In a bi-directional logical replication setup, with
    well-distributed write workloads and a thoughtfully tuned
    configuration to minimize lag (e.g., through row filters), TPS
    regression is minimal or even negligible.
     - Performance can be sustained with significantly fewer apply workers
    compared to the number of client connections on the publisher.
    
    Setup:
    --------
     - Two nodes (node1 and node2) are created on the same machine with
    the same configuration:
        autovacuum = false
        shared_buffers = '30GB'
        -- Worker and logical replication related parameters were also
    increased as required (see attached scripts for details).
     - Both nodes have two sets of pgbench tables initialized with *scale=300*:
       -- set1: pgbench_pub_accounts, pgbench_pub_tellers,
    pgbench_pub_branches, and pgbench_pub_history
       -- set2: pgbench_accounts, pgbench_tellers, pgbench_branches, and
    pgbench_history
     - Node1 publishes all changes for set1 tables and Node2 subscribes
    to them.
     - Node2 publishes all changes for set2 tables and Node1 subscribes
    to them.
    Note: In all the tests, subscriptions are created with (origin=NONE)
    because this is a bi-directional replication setup.
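
    For illustration, here is a minimal sketch of how N pub-sub pairs with
    row filters could be generated for one table. The filter expression
    (key % N = i), the object names, and the connection string are
    assumptions made for this example; the definitions actually used are in
    the attached scripts. Other subscription options (for example
    copy_data) are left out and would need to match the real setup.

```python
# Sketch only: generate SQL for N pub-sub pairs that split one table by row filter.
# The modulo filter, object names, and conninfo are hypothetical, not taken
# from the attached scripts.

def pub_sub_sql(table: str, key: str, n_pairs: int, conninfo: str):
    """Return (publisher statements, subscriber statements) for n_pairs pairs."""
    pubs, subs = [], []
    for i in range(n_pairs):
        pub_name = f"pub_{table}_{i}"
        sub_name = f"sub_{table}_{i}"
        # Each publication carries only the rows whose key falls in slice i,
        # so each subscription's apply worker handles roughly 1/N of the changes.
        pubs.append(
            f"CREATE PUBLICATION {pub_name} FOR TABLE {table} "
            f"WHERE ({key} % {n_pairs} = {i});"
        )
        # origin = none is the subscription option mentioned in the note above.
        subs.append(
            f"CREATE SUBSCRIPTION {sub_name} CONNECTION '{conninfo}' "
            f"PUBLICATION {pub_name} WITH (origin = none);"
        )
    return pubs, subs


pubs, subs = pub_sub_sql(
    "pgbench_pub_accounts", "aid", 3, "host=localhost port=5432 dbname=postgres"
)
for stmt in pubs + subs:
    print(stmt)
```

    The publisher statements would be run on the publishing node and the
    subscriber statements on the other node; for set2 the roles are swapped.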
    
    Workload Run:
    ---------------
     - On node1, pgbench (read-write) with option "-b simple-update" is run
    on set1 tables.
     - On node2, pgbench (read-write) with option "-b simple-update" is run
    on set2 tables.
     - #clients = 40
     - pgbench run duration = 10 minutes.
     - Results were measured over 3 runs of each case.
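
    As a rough sketch, the workload parameters above map onto a pgbench
    command line as follows. The port and database name are hypothetical,
    and other options used in the real runs (such as the thread count, or
    how the run is pointed at the pgbench_pub_* tables) are not stated in
    this message and are not guessed here.

```python
# Sketch only: assemble the pgbench invocation described in the list above.
# Port and database name are placeholders, not values from the attached scripts.

def pgbench_args(port: int, dbname: str, clients: int = 40,
                 minutes: int = 10, builtin: str = "simple-update"):
    """Build the argument list for one pgbench run."""
    return [
        "pgbench",
        "-p", str(port),           # server port of the node under test
        "-c", str(clients),        # number of concurrent clients
        "-T", str(minutes * 60),   # run duration in seconds
        "-b", builtin,             # built-in script to run
        dbname,
    ]


print(" ".join(pgbench_args(5432, "postgres")))
```

    The list form can be passed directly to subprocess.run() when driving
    several runs from a script.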
    
    Test Runs:
    ----------
     - Six tests were run with varying numbers of pub-sub pairs. The TPS
    reduction on both nodes for each case is shown below:
    
    | Case | # Pub-Sub Pairs | TPS Reduction  |
    | ---- | --------------- | -------------- |
    | 01   | 30              | 0–1%           |
    | 02   | 15              | 6–7%           |
    | 03   | 5               | 7–8%           |
    | 04   | 3               | 0–1%           |
    | 05   | 2               | 14–15%         |
    | 06   | 1 (no filters)  | 37–40%         |
    
     - With appropriate row filters and distribution of load across apply
    workers, the performance impact of the update_deleted patch can be
    minimized.
     - Just 3 pub-sub pairs are enough to keep TPS close to the baseline
    for the given workload.
     - Poor distribution of replication workload (e.g., only 1–2 pub-sub
    pairs) leads to higher overhead due to increased apply worker
    contention.
    ~~~~
    
    Detailed results for all the above cases:
    
    case-01:
    ---------
     - Created 30 pub-sub pairs to distribute the replication load between
    30 apply workers on each node.
    
    Results:
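    | #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
    | ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
    | 1          | 5633.377165      | 5579.244492       | 6385.839585      | 6482.775975       |
    | 2          | 5926.328644      | 5947.035275       | 6216.045707      | 6416.113723       |
    | 3          | 5522.804663      | 5542.380108       | 6541.031535      | 6190.123097       |
    | median     | 5633.377165      | 5579.244492       | 6385.839585      | 6416.113723       |
    | regression |                  | -1%               |                  | 0%                |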
    
     - No regression
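
    The message does not spell out how the regression row is derived. A
    minimal sketch, assuming it is the percent change of the patched median
    relative to the pgHead median, rounded to the nearest integer, reproduces
    the case-01 figures:

```python
from statistics import median

# Assumption: regression = (patched median - pgHead median) / pgHead median * 100.
def regression_pct(head_tps, patched_tps):
    """Percent change of the patched median TPS relative to the pgHead median."""
    head_med = median(head_tps)
    patched_med = median(patched_tps)
    return (patched_med - head_med) / head_med * 100.0


# case-01 per-run TPS values, copied from the results above.
node1_head = [5633.377165, 5926.328644, 5522.804663]
node1_patched = [5579.244492, 5947.035275, 5542.380108]
node2_head = [6385.839585, 6216.045707, 6541.031535]
node2_patched = [6482.775975, 6416.113723, 6190.123097]

print(round(regression_pct(node1_head, node1_patched)))  # -1
print(round(regression_pct(node2_head, node2_patched)))  # 0
```

    The same function applies to the other cases by swapping in their
    per-run values.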
    ~~~~
    
    case-02:
    ---------
     - #pub-sub pairs = 15
    
    Results:
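    | #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
    | ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
    | 1          | 8207.708475      | 7584.288026       | 8854.017934      | 8204.301497       |
    | 2          | 8120.979334      | 7404.735801       | 8719.451895      | 8169.697482       |
    | 3          | 7877.859139      | 7536.762733       | 8542.896669      | 8177.853563       |
    | median     | 8120.979334      | 7536.762733       | 8719.451895      | 8177.853563       |
    | regression |                  | -7%               |                  | -6%               |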
    
     - There was a 6-7% TPS reduction on both nodes, which seems to be within
    an acceptable range.
    ~~~
    
    case-03:
    ---------
     - #pub-sub pairs = 5
    
    Results:
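    | #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
    | ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
    | 1          | 12325.90315      | 11664.7445        | 12997.47104      | 12324.025         |
    | 2          | 12060.38753      | 11370.52775       | 12728.41287      | 12127.61208       |
    | 3          | 12390.3677       | 11367.10255       | 13135.02558      | 12036.71502       |
    | median     | 12325.90315      | 11370.52775       | 12997.47104      | 12127.61208       |
    | regression |                  | -8%               |                  | -7%               |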
    
     - There was a 7-8% TPS reduction on both nodes, which seems to be within
    an acceptable range.
    ~~~
    
    case-04:
    ---------
     - #pub-sub pairs = 3
    
    Results:
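    | #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
    | ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
    | 1          | 13186.22898      | 12464.42604       | 13973.8394       | 13370.45596       |
    | 2          | 13038.15817      | 13014.03906       | 13866.51966      | 13866.47395       |
    | 3          | 13881.10513      | 13868.71971       | 14687.67444      | 14516.33854       |
    | median     | 13186.22898      | 13014.03906       | 13973.8394       | 13866.47395       |
    | regression |                  | -1%               |                  | -1%               |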
    
     - No regression observed
    
    
    case-05:
    ---------
     - #pub-sub pairs = 2
    
    Results:
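    | #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
    | ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
    | 1          | 15936.98792      | 13563.98476       | 16734.35292      | 14527.22942       |
    | 2          | 16031.23003      | 13648.24979       | 16958.49609      | 14657.80008       |
    | 3          | 16113.79935      | 13550.68329       | 17029.5035       | 14509.84068       |
    | median     | 16031.23003      | 13563.98476       | 16958.49609      | 14527.22942       |
    | regression |                  | -15%              |                  | -14%              |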
    
     - TPS dropped by 14-15% on both nodes.
    ~~~
    
    case-06:
    ---------
     - #pub-sub pairs = 1; no row filters are used on either node
    
    Results:
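    | #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
    | ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
    | 1          | 22900.06507      | 13609.60639       | 23254.25113      | 14592.25271       |
    | 2          | 22110.98426      | 13907.62583       | 22755.89945      | 14805.73717       |
    | 3          | 22719.88901      | 13246.41484       | 23055.70406      | 14256.54223       |
    | median     | 22719.88901      | 13609.60639       | 23055.70406      | 14592.25271       |
    | regression |                  | -40%              |                  | -37%              |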
    
     - The regression observed is 37-40% on both nodes.
    ~~~~
    
    
    Test-2: High concurrency
    ===========================
    Highlight:
    ------------
     Despite poor write distribution across servers and high concurrent
    updates, distributing replication load across multiple apply workers
    limited the TPS drop to just 15–18%.
    
    Setup:
    ---------------
     - Two nodes (node1 and node2) are created with the same configuration
    as in Test-1.
     - Both nodes have the same set of pgbench tables initialized with
    scale=60 (small tables to increase concurrent updates).
     - Both nodes are subscribed to each other for all the changes.
      -- 15 pub-sub pairs are created using row filters to distribute the
    load, and all the subscriptions are created with (origin = NONE).
    
    Workload Run:
    ---------------
     - On both nodes, the default pgbench (read-write) script is run on the
    tables.
     - #clients = 15
     - pgbench run duration = 5 minutes.
     - Results were measured over 2 runs of each case.
    
    Results:
    
    Node1 TPS:
    | #run       | pgHead_Node1_TPS | patched_Node1_TPS |
    | ---------- | ---------------- | ----------------- |
    | 1          | 9585.470749      | 7660.645249       |
    | 2          | 9442.364918      | 8035.531482       |
    | median     | 9513.917834      | 7848.088366       |
    | regression |                  | -18%              |
    
    Node2 TPS:
    
    | #run       | pgHead_Node2_TPS | patched_Node2_TPS |
    | ---------- | ---------------- | ----------------- |
    | 1          | 9485.232611      | 8248.783417       |
    | 2          | 9468.894086      | 7938.991136       |
    | median     | 9477.063349      | 8093.887277       |
    | regression |                  | -15%              |
    
    - Under high concurrent writes to the same small tables, contention
    increases and the TPS drop is 15-18% on both nodes.
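
    One point worth keeping in mind when reading the Test-2 numbers: with
    only two runs per case, the median is simply the average of the two
    values, so it does not filter out an outlier run the way a three-run
    median does. A small sketch using the Node1 figures above (the
    regression formula, percent change between medians, is an assumption):

```python
from statistics import median

# Test-2 Node1 per-run TPS values, copied from the results above.
head = [9585.470749, 9442.364918]
patched = [7660.645249, 8035.531482]

head_med = median(head)          # for two values this is their midpoint
patched_med = median(patched)
head_avg = sum(head) / len(head)

# Assumed formula: percent change of the patched median vs. the pgHead median.
regression = (patched_med - head_med) / head_med * 100.0

print(abs(head_med - head_avg) < 1e-9)  # True: median equals mean for two runs
print(round(regression))                # -18
```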
    ~~~~
    
    The scripts used for the above tests are attached.
    
    --
    Thanks,
    Nisha