Thread

  1. parallel data loading for pgbench -i

    Mircea Cadariu <cadariu.mircea@gmail.com> — 2025-11-17T12:46:12Z

    Hi,
    
    I propose a patch for speeding up pgbench -i through multithreading.
    
    To enable it, pass -j followed by the number of workers to use.
    
    Here are some results I got on my laptop:
    
    
    master
    
    ---
    
    -i -s 100
    done in 20.95 s (drop tables 0.00 s, create tables 0.01 s, client-side 
    generate 14.51 s, vacuum 0.27 s, primary keys 6.16 s).
    
    -i -s 100 --partitions=10
    done in 29.73 s (drop tables 0.00 s, create tables 0.02 s, client-side 
    generate 16.33 s, vacuum 8.72 s, primary keys 4.67 s).
    
    
    patch (-j 10)
    
    ---
    
    -i -s 100 -j 10
    done in 18.64 s (drop tables 0.00 s, create tables 0.01 s, client-side 
    generate 5.82 s, vacuum 6.89 s, primary keys 5.93 s).
    
    -i -s 100 -j 10 --partitions=10
    done in 14.66 s (drop tables 0.00 s, create tables 0.01 s, client-side 
    generate 8.42 s, vacuum 1.55 s, primary keys 4.68 s).
    
    The speedup is larger in the partitioned case. Each worker creates its 
    own partition, so every worker can use COPY FREEZE, which lowers the 
    subsequent vacuum cost.
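
    To illustrate the point about COPY FREEZE: PostgreSQL only accepts
    FREEZE when the target table was created or truncated in the current
    transaction, so a worker that creates its own partition can load it
    frozen. Below is a minimal Python sketch of the statement sequence one
    worker might issue. The function name, the partition naming, and the
    range bounds are assumptions for illustration and are not taken from
    the patch. Lock interactions on the parent table are not modeled.

```python
# Hypothetical sketch: SQL one loader worker could issue for its own
# partition. Names and range bounds are illustrative assumptions, not the
# patch's actual code.
from typing import List

ACCOUNTS_PER_SCALE = 100_000  # pgbench creates 100,000 accounts per scale unit


def worker_statements(part: int, partitions: int, scale: int) -> List[str]:
    """Return the statements for the 1-based partition number `part`."""
    if not 1 <= part <= partitions:
        raise ValueError("part must be between 1 and partitions")
    total = ACCOUNTS_PER_SCALE * scale
    size = (total + partitions - 1) // partitions  # ceiling division
    lo = (part - 1) * size + 1                     # inclusive lower bound
    hi = min(part * size, total) + 1               # exclusive upper bound
    name = f"pgbench_accounts_{part}"
    return [
        "BEGIN",
        # Creating the table inside this transaction is what permits FREEZE.
        f"CREATE TABLE {name} PARTITION OF pgbench_accounts "
        f"FOR VALUES FROM ({lo}) TO ({hi})",
        f"COPY {name} FROM STDIN WITH (FREEZE on)",
        "COMMIT",
    ]


for statement in worker_statements(part=1, partitions=10, scale=100):
    print(statement + ";")
```

    In the non-partitioned case all workers load into one shared table, so
    this create-then-copy ordering within a single transaction is not
    available to each of them.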
    
    For the non-partitioned case the speedup is lower, but I observe it 
    improves somewhat with larger scale factors. When parallel vacuum 
    support is merged, this should further reduce the time.
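
    For reference, the speedups implied by the timings reported above can
    be worked out as follows. This is a small Python sketch; the numbers
    are copied from the four runs, and the dictionary layout and function
    name are only illustrative.

```python
# Timings in seconds, copied from the four runs reported above.
runs = {
    "non-partitioned": {
        "master": {"total": 20.95, "generate": 14.51, "vacuum": 0.27},
        "patch": {"total": 18.64, "generate": 5.82, "vacuum": 6.89},
    },
    "partitions=10": {
        "master": {"total": 29.73, "generate": 16.33, "vacuum": 8.72},
        "patch": {"total": 14.66, "generate": 8.42, "vacuum": 1.55},
    },
}


def speedup(case: str, phase: str) -> float:
    """Ratio of master time to patched time for one phase of one case."""
    return runs[case]["master"][phase] / runs[case]["patch"][phase]


for case in runs:
    print(
        f"{case}: total {speedup(case, 'total'):.2f}x, "
        f"generate {speedup(case, 'generate'):.2f}x"
    )
```

    The totals work out to roughly 1.12x without partitions and 2.03x with
    10 partitions. In the non-partitioned run the generate phase is about
    2.49x faster, but vacuum grows from 0.27 s to 6.89 s, which consumes
    most of that gain.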
    
    I still need to update the docs and tests, integrate the code better 
    with its surroundings, and address other aspects. I would appreciate 
    any feedback on what I have so far, though. Thanks!
    
    Kind regards,
    
    Mircea Cadariu
    
    
  2. Re: parallel data loading for pgbench -i

    Mircea Cadariu <cadariu.mircea@gmail.com> — 2026-05-08T18:11:46Z

    Hi Lakshmi and Hayato,
    
    
    Thanks a lot for your feedback.
    
    Attached for your consideration is v4, in which I address your remarks.
    
    -- 
    Thanks,
    Mircea Cadariu
    
  3. Re: parallel data loading for pgbench -i

    lakshmi <lakshmigcdac@gmail.com> — 2026-05-09T08:02:44Z

    On Fri, May 8, 2026 at 11:41 PM Mircea Cadariu <cadariu.mircea@gmail.com>
    wrote:
    
    > Hi Lakshmi and Hayato,
    >
    >
    > Thanks a lot for your feedback.
    >
    > Attached for your consideration is v4, in which I address your remarks.
    >
    Hi Mircea, Hayato,
    
    I tested the v4 patch on 19devel with a few different thread/partition
    combinations.
    
    The updated API looks much better now. I verified that:
    
       - parallel loading works correctly with -j
       - uneven partition distribution (for example 5 partitions with 2
       threads) also works fine
       - serial mode with -j 1 works again as expected
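
    As an illustration of the uneven case in the list above, here is one
    way partitions could be divided among threads when the counts do not
    divide evenly. This is a minimal Python sketch of an assumed
    contiguous-chunk scheme; the function name is hypothetical and the v4
    patch may distribute partitions differently.

```python
# Hypothetical sketch: split partitions 1..num_partitions into contiguous
# chunks, one per worker thread. An assumed scheme, not the patch's code.
from typing import List


def assign_partitions(num_partitions: int, num_threads: int) -> List[List[int]]:
    """Give each thread a contiguous run of 1-based partition numbers."""
    if num_partitions < 0 or num_threads < 1:
        raise ValueError("need num_partitions >= 0 and num_threads >= 1")
    base, extra = divmod(num_partitions, num_threads)
    chunks = []
    start = 1
    for thread in range(num_threads):
        # The first `extra` threads take one additional partition each.
        count = base + (1 if thread < extra else 0)
        chunks.append(list(range(start, start + count)))
        start += count
    return chunks


print(assign_partitions(5, 2))  # [[1, 2, 3], [4, 5]]
```

    With more threads than partitions, this scheme leaves the surplus
    threads with empty lists.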
    
    The workers appear to run concurrently, and VACUUM time remains relatively
    small in my tests.
    
    Overall, the new approach looks much cleaner and more flexible compared to
    the earlier versions.
    
    Thanks again for the update.
    
    Best regards,
    Lakshmi