Thread

  1. Non-blocking archiver process

    Ronan Dunklau <ronan.dunklau@aiven.io> — 2025-07-04T06:46:08Z

    Hello,
    
    We've noticed a behavior that seems surprising to us. 
    Since DROP DATABASE now waits for a ProcSignalBarrier, it can hang up 
    indefinitely if the archive_command hangs. 
    
    The reason for this is that the builtin archive module doesn't process any 
    interrupts while the archiving command is running, as it's run with a system() 
    call, blocking undefintely. 
    
    Before rushing on to implement a non-blocking archive library (perhaps using 
    popen or posix_spawn, while keeping other systems in mind), what unintended 
    consequences would it have to actually run the archive_command in a non-
    blocking way, and checking interrupts while it runs ?
    
    Thanks !
    
    --
    Ronan Dunklau
    
    
    
    
    
    
  2. Re: Non-blocking archiver process

    Noah Misch <noah@leadboat.com> — 2025-07-05T03:01:07Z

    On Fri, Jul 04, 2025 at 08:46:08AM +0200, Ronan Dunklau wrote:
    > We've noticed a behavior that seems surprising to us. 
    > Since DROP DATABASE now waits for a ProcSignalBarrier, it can hang up 
    > indefinitely if the archive_command hangs. 
    > 
    > The reason for this is that the builtin archive module doesn't process any 
    > interrupts while the archiving command is running, as it's run with a system() 
    > call, blocking undefintely. 
    > 
    > Before rushing on to implement a non-blocking archive library (perhaps using 
    > popen or posix_spawn, while keeping other systems in mind), what unintended 
    > consequences would it have to actually run the archive_command in a non-
    > blocking way, and checking interrupts while it runs ?
    
    I can't think of any unintended consequences.  I think we just missed this
    when adding the first use of ProcSignalBarrier (v15).  Making this easier to
    miss, archiver spent most of its history not connecting to shared memory.  Its
    shared memory connection appeared in v14.
    
    
    
    
  3. Re: Non-blocking archiver process

    Patrick Stählin <me@packi.ch> — 2025-07-27T15:45:35Z

    On 05.07.25 05:01, Noah Misch wrote:
    > On Fri, Jul 04, 2025 at 08:46:08AM +0200, Ronan Dunklau wrote:
    >> We've noticed a behavior that seems surprising to us.
    >> Since DROP DATABASE now waits for a ProcSignalBarrier, it can hang up
    >> indefinitely if the archive_command hangs.
    >>
    >> The reason for this is that the builtin archive module doesn't process any
    >> interrupts while the archiving command is running, as it's run with a system()
    >> call, blocking undefintely.
    >>
    >> Before rushing on to implement a non-blocking archive library (perhaps using
    >> popen or posix_spawn, while keeping other systems in mind), what unintended
    >> consequences would it have to actually run the archive_command in a non-
    >> blocking way, and checking interrupts while it runs ?
    > 
    > I can't think of any unintended consequences.  I think we just missed this
    > when adding the first use of ProcSignalBarrier (v15).  Making this easier to
    > miss, archiver spent most of its history not connecting to shared memory.  Its
    > shared memory connection appeared in v14.
    
    I've taken some time, mostly for WIN32, to implement an interruptible 
    version of archive_command. My WIN32 days are long behind me, so it's 
    quite possible that this has some faults I'm not seeing. Then again, it 
    passes CI.
    I failed to make it work in WIN32 with popen since the handles it 
    returns can't be made non-blocking so this change is a bit bigger.
    
    @Ronan: Let me now if you'd like to be attributed more, I took some 
    inspiration from a private repos with your prototype.
    
    I don't know if I should add that to the running commitfest for PG19 or 
    if this is something that would need to be backported. Just let me know.
    
    Thanks,
    Patrick
  4. Re: Non-blocking archiver process

    Artem Gavrilov <artem.gavrilov@percona.com> — 2025-09-25T11:13:24Z

    Hello Patrick
    
    I did a review of your patch.
    
    Initial Run
    ===========
    The patch applies cleanly to HEAD (196063d6761). All tests successfully
    pass on MacOS 15.7.
    
    Comments
    ===========
    1) Instead of `malloc` and `free` it should be `palloc` and `pfree`.
    
    2) In fact `archiveFileno` is a file descriptor, and the first variable one
    is not. I think it's worth to rename them to `archiveFile` and `archiveFd`.
    Or maybe even just `file` and `fd` as the scope there is not so big.
    >
    > FILE   *archiveFd = NULL;
    > int archiveFileno;
    
    
    3)  Variable name `bytesRead` is rare in PG code base. It is used only two
    times, while `readBytes` is used four times. Other variants, like `nbytes`
    are more common. Let's pick some popular name.
    
    4) Variable name `dwRc` is unique for the PG codebase and not really clear
    as for me. How about name it just `exitcode`?
    
    5) `return` is redundant here as it will never be reached.
    
    ereport(FATAL,
    >         (errmsg_internal("Failed to malloc win32cmd %m")));
    > return false;
    
    
    6) Same here, `free` and `return` are unreachable due ereport with fatal
    level.
    
    > ereport(FATAL,
    >         (errmsg("CreateProcess() call failed: %m (error code %lu)",
    >                 GetLastError())));
    > free(win32cmd);
    > return false;
    
    
    7) This loop can be stuck forever as `WaitForSingleObject` may
    return `WAIT_FAILED*` *and it's not always retriable.
    
    > while (true)
    > {
    >     CHECK_FOR_INTERRUPTS();
    >     if (WaitForSingleObject(pi.hProcess, POLL_TIMEOUT_MSEC) ==
    > WAIT_OBJECT_0)
    >         break;
    > }
    
    -- 
    
    <https://www.percona.com/>
    
    Artem Gavrilov
    Senior Software Engineer, Percona
    
    artem.gavrilov@percona.com
    
  5. Re: Non-blocking archiver process

    Robert Haas <robertmhaas@gmail.com> — 2025-10-14T18:49:57Z

    Here are a few comments from me:
    
    I think that by calling system(), anything that is printed out by the child
    process ends up in the PostgreSQL log. With the patch, the output of the
    command seems like it will be read into a buffer and discarded. That's a
    significant behavior difference.
    
    This patch has a 10ms polling loop after which it checks for interrupts,
    and then repeats. I think our normal convention in these kinds of cases is
    to use WaitLatchOrSocket(). That allows for a longer sleep time (like 1s)
    which results in fewer processor wakeups, while actually still being more
    responsive because the arrival of an interrupt should set the process latch
    and terminate the wait.
    
    In general, I agree with Artem's comments about writing code that fits the
    PostgreSQL style. We don't want to invent new ways of solving problems for
    which we already have infrastructure, nor do we want to solve problems in
    this case in a way that is different from what we do in other cases. Using
    ereport appropriately is part of that, but there's a lot more to it. Artem
    mentioned malloc and free, but notice, for example, that we also have our
    own wrapper around popen() in OpenPipeStream(). Maybe we shouldn't use that
    here, but we shouldn't *accidentally* not use that here.
    
    I wonder whether we should really be using popen() in any form, actually.
    If what we want is for the output of the command to go to our standard
    output, as it does with system(), that's exactly what popen() is designed
    not to do, so it doesn't seem like a natural fit. Perhaps we should be
    using fork() + exec()? Or maybe we already have some good infrastructure
    for this we can reuse somewhere?
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
  6. Re: Non-blocking archiver process

    BharatDB <bharatdbpg@gmail.com> — 2025-11-10T10:41:21Z

    Dear PostgreSQL Community,
    
    I found 8 Bugs in `shell_archive.c`
    
    
    While studying how `archive_command` works, I discovered **8 real issues**
    that affect reliability, performance, and safety:
    
    | # | Bug | Impact | |---|-----|--------| | 1 | Infinite loop on command
    failure | Archiver hangs forever | | 2 | Unreachable code after `return` |
    Dead logic | | 3 | Discarded `stdout` from archive command | `DROP
    DATABASE` hangs for full command duration | | 4 | Aggressive polling with
    `Sleep(100)` | CPU waste (already fixed in core) | | 5 | `malloc`/`free` in
    backend | **Memory corruption risk** | | 6 | Poor variable names (`dwRc`,
    `bytesRead`) | Hard to read | | 7 | Manual `popen` / `CreateProcess` |
    Missed PG infrastructure | | 8 | Redundant `return;` in `void` function |
    Style issue |
    
    I refactored `src/backend/archive/shell_archive.c` with **PostgreSQL-style
    fixes**:
    
    - Replaced `popen()` and `CreateProcess()` → `OpenPipeStream()` - Used
    `fileno(archiveFile)` → `archiveFd` correctly - Switched to `palloc()` /
    `pfree()` for memory safety - Renamed variables: `dwRc` → `exit_code`,
    `bytesRead` → `nread`, etc. - Read `stdout` to prevent `DROP DATABASE`
    hangs - Used `WaitLatchOrSocket()` for interruptible waiting - Removed
    redundant `return;` and dead code This is my contribution to improve the
    PostgreSQL archiver process. I have attached a patch implementing the
    discussed fixes — please review and share any suggestions or feedback. Thanks
    !
    Lakshmi
    
    
    On Wed, Oct 15, 2025 at 12:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > Here are a few comments from me:
    >
    > I think that by calling system(), anything that is printed out by the
    > child process ends up in the PostgreSQL log. With the patch, the output of
    > the command seems like it will be read into a buffer and discarded. That's
    > a significant behavior difference.
    >
    > This patch has a 10ms polling loop after which it checks for interrupts,
    > and then repeats. I think our normal convention in these kinds of cases is
    > to use WaitLatchOrSocket(). That allows for a longer sleep time (like 1s)
    > which results in fewer processor wakeups, while actually still being more
    > responsive because the arrival of an interrupt should set the process latch
    > and terminate the wait.
    >
    > In general, I agree with Artem's comments about writing code that fits the
    > PostgreSQL style. We don't want to invent new ways of solving problems for
    > which we already have infrastructure, nor do we want to solve problems in
    > this case in a way that is different from what we do in other cases. Using
    > ereport appropriately is part of that, but there's a lot more to it. Artem
    > mentioned malloc and free, but notice, for example, that we also have our
    > own wrapper around popen() in OpenPipeStream(). Maybe we shouldn't use that
    > here, but we shouldn't *accidentally* not use that here.
    >
    > I wonder whether we should really be using popen() in any form, actually.
    > If what we want is for the output of the command to go to our standard
    > output, as it does with system(), that's exactly what popen() is designed
    > not to do, so it doesn't seem like a natural fit. Perhaps we should be
    > using fork() + exec()? Or maybe we already have some good infrastructure
    > for this we can reuse somewhere?
    >
    > --
    > Robert Haas
    > EDB: http://www.enterprisedb.com
    >
    
  7. Re: Non-blocking archiver process

    Chao Li <li.evan.chao@gmail.com> — 2025-11-12T01:42:32Z

    
    > On Nov 10, 2025, at 18:41, BharatDB <bharatdbpg@gmail.com> wrote:
    > 
    > Dear PostgreSQL Community, 
    > I found 8 Bugs in `shell_archive.c` 
    > 
    > While studying how `archive_command` works, I discovered **8 real issues** that affect reliability, performance, and safety: 
    > | # | Bug | Impact | |---|-----|--------| | 1 | Infinite loop on command failure | Archiver hangs forever | | 2 | Unreachable code after `return` | Dead logic | | 3 | Discarded `stdout` from archive command | `DROP DATABASE` hangs for full command duration | | 4 | Aggressive polling with `Sleep(100)` | CPU waste (already fixed in core) | | 5 | `malloc`/`free` in backend | **Memory corruption risk** | | 6 | Poor variable names (`dwRc`, `bytesRead`) | Hard to read | | 7 | Manual `popen` / `CreateProcess` | Missed PG infrastructure | | 8 | Redundant `return;` in `void` function | Style issue | 
    > I refactored `src/backend/archive/shell_archive.c` with **PostgreSQL-style fixes**: 
    > - Replaced `popen()` and `CreateProcess()` → `OpenPipeStream()` - Used `fileno(archiveFile)` → `archiveFd` correctly - Switched to `palloc()` / `pfree()` for memory safety - Renamed variables: `dwRc` → `exit_code`, `bytesRead` → `nread`, etc. - Read `stdout` to prevent `DROP DATABASE` hangs - Used `WaitLatchOrSocket()` for interruptible waiting - Removed redundant `return;` and dead code This is my contribution to improve the PostgreSQL archiver process. I have attached a patch implementing the discussed fixes — please review and share any suggestions or feedback. Thanks !
    > Lakshmi
    > 
    
    Thanks for the patch, but I just cannot pass build in my end:
    ```
    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Werror=unguarded-availability-new -Wendif-labels -Wmissing-format-attribute -Wcast-function-type -Wformat-security -Wmissing-variable-declarations -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-unused-command-line-argument -Wno-compound-token-split-by-macro -Wno-format-truncation -Wno-cast-function-type-strict -g -O2 -I../../../src/include -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX15.5.sdk   -I/opt/homebrew/Cellar/icu4c@77/77.1/include    -c -o shell_archive.o shell_archive.c -MMD -MP -MF .deps/shell_archive.Po
    shell_archive.c:19:10: fatal error: 'latch.h' file not found
       19 | #include "latch.h"  /* For WaitLatchOrSocket */
          |          ^~~~~~~~~
    1 error generated.
    make[3]: *** [shell_archive.o] Error 1
    make[2]: *** [archive-recursive] Error 2
    make[1]: *** [all-backend-recurse] Error 2
    make: *** [all-src-recurse] Error 2
    ```
    
    Also, as a general comment, you need do pgident on the file you changed.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  8. Re: Non-blocking archiver process

    BharatDB <bharatdbpg@gmail.com> — 2025-11-12T11:58:37Z

    Dear Evan and PostgreSQL community,
    
    Thank you for the feedback and for reporting the build failure.
    
    I have prepared  v2 of the patch, which fixes the issues you mentioned:
    
    - Replaced the incorrect include `#include "latch.h"` with `#include
    "storage/latch.h"`, fixing the macOS build failure.
    - Wrapped Windows-specific code (`win32cmd`, `pfree(win32cmd)`, etc.)
    inside `#ifdef WIN32`.
    - Used `OpenPipeStream()` and `fileno(archiveFile)` for safe non-blocking
    archive command execution.
    - Switched to `palloc()`/`pfree()` for memory safety in backend context.
    - Read and logged command stdout to avoid hangs.
    - Cleaned up redundant `return;` and unreachable code.
    - Ran `pgindent` for proper PostgreSQL code style formatting.
    
    Verification
    - Built successfully on Linux with:
        `make distclean && ./configure --with-zlib && make -j$(nproc)`
    - Confirmed clean compile of `src/backend/archive/shell_archive.c` with no
    warnings.
    - Verified zlib linked (`-lz`) and archiver module builds successfully.
    
    This v2 should now build cleanly on both macOS and Linux and follows
    PostgreSQL’s coding conventions.
    
    Attached is the updated patch:
    `0001-v2-shell_archive-refactor-fix-build-and-format.patch`
    
    Best regards,
    Lakshmi
    <bharatdbpg@gmail.com>
    
    On Wed, Nov 12, 2025 at 7:13 AM Chao Li <li.evan.chao@gmail.com> wrote:
    
    >
    >
    > > On Nov 10, 2025, at 18:41, BharatDB <bharatdbpg@gmail.com> wrote:
    > >
    > > Dear PostgreSQL Community,
    > > I found 8 Bugs in `shell_archive.c`
    > >
    > > While studying how `archive_command` works, I discovered **8 real
    > issues** that affect reliability, performance, and safety:
    > > | # | Bug | Impact | |---|-----|--------| | 1 | Infinite loop on command
    > failure | Archiver hangs forever | | 2 | Unreachable code after `return` |
    > Dead logic | | 3 | Discarded `stdout` from archive command | `DROP
    > DATABASE` hangs for full command duration | | 4 | Aggressive polling with
    > `Sleep(100)` | CPU waste (already fixed in core) | | 5 | `malloc`/`free` in
    > backend | **Memory corruption risk** | | 6 | Poor variable names (`dwRc`,
    > `bytesRead`) | Hard to read | | 7 | Manual `popen` / `CreateProcess` |
    > Missed PG infrastructure | | 8 | Redundant `return;` in `void` function |
    > Style issue |
    > > I refactored `src/backend/archive/shell_archive.c` with
    > **PostgreSQL-style fixes**:
    > > - Replaced `popen()` and `CreateProcess()` → `OpenPipeStream()` - Used
    > `fileno(archiveFile)` → `archiveFd` correctly - Switched to `palloc()` /
    > `pfree()` for memory safety - Renamed variables: `dwRc` → `exit_code`,
    > `bytesRead` → `nread`, etc. - Read `stdout` to prevent `DROP DATABASE`
    > hangs - Used `WaitLatchOrSocket()` for interruptible waiting - Removed
    > redundant `return;` and dead code This is my contribution to improve the
    > PostgreSQL archiver process. I have attached a patch implementing the
    > discussed fixes — please review and share any suggestions or feedback.
    > Thanks !
    > > Lakshmi
    > >
    >
    > Thanks for the patch, but I just cannot pass build in my end:
    > ```
    > gcc -Wall -Wmissing-prototypes -Wpointer-arith
    > -Wdeclaration-after-statement -Werror=vla
    > -Werror=unguarded-availability-new -Wendif-labels
    > -Wmissing-format-attribute -Wcast-function-type -Wformat-security
    > -Wmissing-variable-declarations -fno-strict-aliasing -fwrapv
    > -fexcess-precision=standard -Wno-unused-command-line-argument
    > -Wno-compound-token-split-by-macro -Wno-format-truncation
    > -Wno-cast-function-type-strict -g -O2 -I../../../src/include -isysroot
    > /Library/Developer/CommandLineTools/SDKs/MacOSX15.5.sdk
    >  -I/opt/homebrew/Cellar/icu4c@77/77.1/include    -c -o shell_archive.o
    > shell_archive.c -MMD -MP -MF .deps/shell_archive.Po
    > shell_archive.c:19:10: fatal error: 'latch.h' file not found
    >    19 | #include "latch.h"  /* For WaitLatchOrSocket */
    >       |          ^~~~~~~~~
    > 1 error generated.
    > make[3]: *** [shell_archive.o] Error 1
    > make[2]: *** [archive-recursive] Error 2
    > make[1]: *** [all-backend-recurse] Error 2
    > make: *** [all-src-recurse] Error 2
    > ```
    >
    > Also, as a general comment, you need do pgident on the file you changed.
    >
    > Best regards,
    > --
    > Chao Li (Evan)
    > HighGo Software Co., Ltd.
    > https://www.highgo.com/
    >
    >
    >
    >
    >
    
  9. Re: Non-blocking archiver process

    Robert Haas <robertmhaas@gmail.com> — 2025-11-19T19:47:36Z

    On Wed, Nov 12, 2025 at 6:57 AM BharatDB <bharatdbpg@gmail.com> wrote:
    > Dear Evan and PostgreSQL community,
    
    1. Please stop sending AI-generated emails and patches to this list.
    If you want to use an AI tool to help you generate emails and patches,
    that's fine with me. (I can't speak for anyone else.) But you're
    clearly just cutting and pasting, and the tools are not yet good
    enough for that to be a viable way to move PostgreSQL forward.
    Moreover, if I wanted to try to move PostgreSQL forward by cutting and
    pasting from an AI, I could do that myself.
    
    2. Please stop sharing a single email address between multiple people.
    Submit the work you did yourself under your own name, not the "work" a
    chatbot did under a company name. Contributions to PostgreSQL are
    credited to individuals, not companies, as you can see from the commit
    log, the release notes, and how other people are using this list.
    
    Thanks,
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com