Thread

  1. Re: Patch: dumping tables data in multiple chunks in pg_dump

    Hannu Krosing <hannuk@google.com> — 2025-11-13T18:02:43Z

    I just ran a test by generating a 408GB table and then dumping it both ways:
    
    $ time pg_dump --format=directory -h 10.58.80.2 -U postgres -f /tmp/plain.dump largedb
    
    real    39m54.968s
    user    37m21.557s
    sys     2m32.422s
    
    $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres --huge-table-chunk-pages=131072 -j 8 -f /tmp/parallel8.dump largedb
    
    real    5m52.965s
    user    40m27.284s
    sys     3m53.339s
    
    So a parallel dump with 8 workers using 1GB chunks (128k pages) runs
    almost 7 times faster than the sequential dump (about 5m53s vs. 39m55s
    of wall-clock time).
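
For reference, the arithmetic behind the "almost 7 times" and "1GB" figures can be checked with a short script. This is only a sketch: the timings are copied from the two runs above, and the 8192-byte block size is an assumption (the PostgreSQL default), not something stated in this message.

```python
from datetime import timedelta


def to_seconds(minutes, seconds):
    """Convert a `time`-style 'XmY.YYYs' reading to seconds."""
    return timedelta(minutes=minutes, seconds=seconds).total_seconds()


# Wall-clock ("real") times reported above.
sequential_real = to_seconds(39, 54.968)
parallel_real = to_seconds(5, 52.965)

workers = 8
speedup = sequential_real / parallel_real
efficiency = speedup / workers  # fraction of ideal linear scaling

# Assumption: default 8 kB block size.
BLOCK_SIZE = 8192
chunk_pages = 131072  # value passed to --huge-table-chunk-pages
chunk_bytes = chunk_pages * BLOCK_SIZE

print(f"speedup: {speedup:.2f}x")
print(f"parallel efficiency with {workers} workers: {efficiency:.0%}")
print(f"chunk size: {chunk_bytes / 2**30:.0f} GiB")
```

The user CPU time is similar in both runs (37m21s vs. 40m27s), so the gain comes from spreading roughly the same work across workers rather than from doing less work.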
    
    This was a table that had no TOAST part. I will run some more tests
    with TOASTed tables next and expect similar or better improvements.
    
    
    
    On Wed, Nov 12, 2025 at 1:59 PM Ashutosh Bapat
    <ashutosh.bapat.oss@gmail.com> wrote:
    >
    > Hi Hannu,
    >
    > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk@google.com> wrote:
    > >
    > > Attached is a patch that adds the ability to dump table data in multiple chunks.
    > >
    > > Looking for feedback at this point:
    > >  1) what have I missed?
    > >  2) should I implement something to avoid single-page chunks?
    > >
    > > The flag --huge-table-chunk-pages tells the directory-format dump
    > > to split any table whose main fork has more pages than the given
    > > value into multiple chunks of that many pages each.
    > >
    > > The main use case is speeding up parallel dumps in case of one or a
    > > small number of HUGE tables so parts of these can be dumped in
    > > parallel.
    >
    > Have you measured speed up? Can you please share the numbers?
    >
    > --
    > Best Wishes,
    > Ashutosh Bapat
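
As an illustration of the chunking described in the quoted patch announcement, the sketch below splits a table's main fork into fixed-size page ranges and renders each range as a ctid range predicate. The function names, the use of ctid predicates, and the page counts are all hypothetical and chosen for illustration; the thread does not show how the patch itself selects rows for a chunk.

```python
def chunk_page_ranges(total_pages, chunk_pages):
    """Return half-open (first_page, end_page) ranges covering the table.

    A table with no more than chunk_pages pages stays in a single chunk,
    matching the "more pages than this" wording of the flag description.
    """
    if chunk_pages <= 0:
        raise ValueError("chunk_pages must be positive")
    if total_pages <= chunk_pages:
        return [(0, total_pages)]
    return [
        (start, min(start + chunk_pages, total_pages))
        for start in range(0, total_pages, chunk_pages)
    ]


def ctid_predicate(first_page, end_page):
    """Hypothetical WHERE clause selecting the rows stored in a page range.

    Tuple offsets start at 1, so '(N,0)' sorts before every tuple on page N.
    A real implementation would probably leave the last range open-ended,
    because the page count may be stale by the time the data is read.
    """
    return f"ctid >= '({first_page},0)' AND ctid < '({end_page},0)'"


# Assumption: exactly 408 GiB of 8 kB pages, i.e. 408 * 131072 pages.
ranges = chunk_page_ranges(408 * 131072, 131072)
print(len(ranges), "chunks; first:", ctid_predicate(*ranges[0]))

# One page over the chunk size yields a trailing single-page chunk,
# which is the case raised in question 2 of the announcement.
print(chunk_page_ranges(131073, 131072))
```

With naive fixed-size splitting, any table slightly larger than a multiple of the chunk size ends with a very small final chunk; folding a short tail into the preceding chunk would be one way to avoid that.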