Re: Non-blocking archiver process

BharatDB <bharatdbpg@gmail.com>

From: BharatDB <bharatdbpg@gmail.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Artem Gavrilov <artem.gavrilov@percona.com>, Patrick Stählin <me@packi.ch>, Noah Misch <noah@leadboat.com>, Ronan Dunklau <ronan.dunklau@aiven.io>, pgsql-hackers@lists.postgresql.org
Date: 2025-11-10T10:41:21Z
Lists: pgsql-hackers

Attachments

Dear PostgreSQL Community,

I found 8 Bugs in `shell_archive.c`


While studying how `archive_command` works, I discovered **8 real issues**
that affect reliability, performance, and safety:

| # | Bug | Impact | |---|-----|--------| | 1 | Infinite loop on command
failure | Archiver hangs forever | | 2 | Unreachable code after `return` |
Dead logic | | 3 | Discarded `stdout` from archive command | `DROP
DATABASE` hangs for full command duration | | 4 | Aggressive polling with
`Sleep(100)` | CPU waste (already fixed in core) | | 5 | `malloc`/`free` in
backend | **Memory corruption risk** | | 6 | Poor variable names (`dwRc`,
`bytesRead`) | Hard to read | | 7 | Manual `popen` / `CreateProcess` |
Missed PG infrastructure | | 8 | Redundant `return;` in `void` function |
Style issue |

I refactored `src/backend/archive/shell_archive.c` with **PostgreSQL-style
fixes**:

- Replaced `popen()` and `CreateProcess()` → `OpenPipeStream()` - Used
`fileno(archiveFile)` → `archiveFd` correctly - Switched to `palloc()` /
`pfree()` for memory safety - Renamed variables: `dwRc` → `exit_code`,
`bytesRead` → `nread`, etc. - Read `stdout` to prevent `DROP DATABASE`
hangs - Used `WaitLatchOrSocket()` for interruptible waiting - Removed
redundant `return;` and dead code This is my contribution to improve the
PostgreSQL archiver process. I have attached a patch implementing the
discussed fixes — please review and share any suggestions or feedback. Thanks
!
Lakshmi


On Wed, Oct 15, 2025 at 12:20 AM Robert Haas <robertmhaas@gmail.com> wrote:

> Here are a few comments from me:
>
> I think that by calling system(), anything that is printed out by the
> child process ends up in the PostgreSQL log. With the patch, the output of
> the command seems like it will be read into a buffer and discarded. That's
> a significant behavior difference.
>
> This patch has a 10ms polling loop after which it checks for interrupts,
> and then repeats. I think our normal convention in these kinds of cases is
> to use WaitLatchOrSocket(). That allows for a longer sleep time (like 1s)
> which results in fewer processor wakeups, while actually still being more
> responsive because the arrival of an interrupt should set the process latch
> and terminate the wait.
>
> In general, I agree with Artem's comments about writing code that fits the
> PostgreSQL style. We don't want to invent new ways of solving problems for
> which we already have infrastructure, nor do we want to solve problems in
> this case in a way that is different from what we do in other cases. Using
> ereport appropriately is part of that, but there's a lot more to it. Artem
> mentioned malloc and free, but notice, for example, that we also have our
> own wrapper around popen() in OpenPipeStream(). Maybe we shouldn't use that
> here, but we shouldn't *accidentally* not use that here.
>
> I wonder whether we should really be using popen() in any form, actually.
> If what we want is for the output of the command to go to our standard
> output, as it does with system(), that's exactly what popen() is designed
> not to do, so it doesn't seem like a natural fit. Perhaps we should be
> using fork() + exec()? Or maybe we already have some good infrastructure
> for this we can reuse somewhere?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>