Thread

  1. Re: Adding pg_dump flag for parallel export to pipes

    solai v <solai.cdac@gmail.com> — 2026-05-22T10:34:23Z

    Hi all,
    
    Thank you for the updated patch.
    
    On Fri, May 22, 2026 at 1:03 PM Nitin Motiani <nitinmotiani@google.com> wrote:
    >
    > Changed how pipe commands are quoted in the Windows test. The latest
    > versions are attached.
    
    I reproduced the current limitation around parallel dumps and then
    tested the latest v16 patch, which adds --pipe support to pg_dump.
    To begin with, I verified the existing behavior.
    For example:
    pg_dump postgres | gzip > dump.sql.gz
    works, but does not support parallelism,
    whereas:
    pg_dump -Fd -j 4 -f dumpdir postgres
    du -sh dumpdir
    21M dumpdir
    requires intermediate disk storage. This demonstrates the current
    limitation where users must choose between parallelism and streaming
    pipelines.
    I then tested the patch introducing --pipe support. The feature is
    quite useful for modern workflows where users want to stream dump
    output directly to compression or upload pipelines without relying on
    intermediate storage. Basic functionality worked as expected.
    For example:
    pg_dump -p 55432 -Fd -j 4 --pipe="cat > dump.out" postgres
    produced a ~38MB output file,
    and:
    pg_dump -p 55432 -Fd -j 4 --pipe="gzip > dump.gz" postgres
    produced a compressed file (~11MB).
    The initial contents appeared valid:
    gunzip -c dump.gz | head
    1
    2
    3
    ...
    Also, no intermediate directory was created, confirming that the patch
    enables streaming without filesystem-backed staging. Error handling
    also behaved correctly.
    For example:
    --pipe="invalid_cmd"
    resulted in:
    pg_dump: error: pipe command failed: command not found
    and:
    --pipe="gzip | false"
    resulted in:
    pg_dump: error: pipe command failed: child process exited with exit code 1
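    The exit code 1 reported for "gzip | false" matches how a POSIX shell
    reports pipeline status: without pipefail, the status is that of the
    last command. A minimal sketch, assuming the --pipe string is handed
    to a shell such as sh -c (an assumption here, not confirmed from the
    patch); it does not invoke pg_dump:

    ```shell
    # Assumption: the pipe command string is run through `sh -c`.
    # Without `pipefail`, a pipeline's exit status is that of its last command.
    status_fail=0
    printf 'data\n' | sh -c 'gzip | false' > /dev/null || status_fail=$?

    status_ok=0
    printf 'data\n' | sh -c 'gzip | cat' > /dev/null || status_ok=$?

    echo "gzip | false -> $status_fail, gzip | cat -> $status_ok"
    ```

    Under that assumption, a failure in an earlier stage of the pipe
    command would go unnoticed unless the command itself enables pipefail
    or otherwise propagates the status.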
    However, I observed an important issue when using the feature with
    multiple parallel workers. Since the pipe command is executed per
    output file, using --pipe="gzip > dump.gz" results in multiple
    workers invoking independent gzip processes that all write to the same
    output file. This leads to corrupted or truncated output.
    In my testing:
    gunzip -c dump.gz > dump.sql
    failed with:
    gzip: dump.gz: unexpected end of file
    This suggests that concurrent writes to a shared output target are not
    coordinated and can result in invalid dumps. It would be helpful to
    clarify expected usage patterns here, for example whether users are
    expected to generate distinct outputs per worker, or whether
    safeguards should be implemented to prevent multiple workers from
    writing to the same destination. Additionally, during failure
    scenarios I observed backend logs such as:
    FATAL: connection to client lost
    Broken pipe
    While this is expected when the pipe terminates prematurely, it may be
    worth considering whether error messaging or cleanup behavior can be
    made clearer from the user's perspective.
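    The "distinct outputs per worker" pattern mentioned above can be
    sketched without pg_dump at all. In the following simulation,
    emit_worker_data and the part.N.gz naming are hypothetical stand-ins
    for per-worker output streams; nothing here reflects how the patch
    names or routes its outputs:

    ```shell
    # Hypothetical stand-in for one worker's output stream;
    # it does not call pg_dump or use the patch's --pipe option.
    emit_worker_data() {
        seq 1 1000 | sed "s/^/worker$1 row /"
    }

    outdir=$(mktemp -d)

    # One gzip process and one destination file per simulated worker.
    for i in 1 2 3 4; do
        emit_worker_data "$i" | gzip > "$outdir/part.$i.gz" &
    done
    wait

    # Every archive should pass an integrity check, and no rows should be lost.
    valid=0
    for f in "$outdir"/part.*.gz; do
        gzip -t "$f" && valid=$((valid + 1))
    done
    total_rows=$(gunzip -c "$outdir"/part.*.gz | wc -l | tr -d ' ')
    echo "valid archives: $valid, total rows: $total_rows"
    ```

    Because each gzip process owns its destination, no two writers
    truncate or overwrite the same file. Pointing all four redirections
    at a single path would reintroduce the race described above.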
    Overall, the feature is valuable and aligns well with modern backup
    workflows. However, behavior in multi-worker scenarios with shared
    pipe targets may need further clarification or safeguards to avoid
    data corruption. Looking forward to more feedback.
    
    
    Regards,
    Solai