Re: Non-blocking archiver process
BharatDB <bharatdbpg@gmail.com>
From: BharatDB <bharatdbpg@gmail.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Artem Gavrilov <artem.gavrilov@percona.com>, Patrick Stählin <me@packi.ch>, Noah Misch <noah@leadboat.com>, Ronan Dunklau <ronan.dunklau@aiven.io>, pgsql-hackers@lists.postgresql.org
Date: 2025-11-10T10:41:21Z
Lists: pgsql-hackers
Attachments
- 0001-refactor-shell_archive.c-use-OpenPipeStream-improve-.patch (text/x-patch) patch 0001
Dear PostgreSQL Community, I found 8 Bugs in `shell_archive.c` While studying how `archive_command` works, I discovered **8 real issues** that affect reliability, performance, and safety: | # | Bug | Impact | |---|-----|--------| | 1 | Infinite loop on command failure | Archiver hangs forever | | 2 | Unreachable code after `return` | Dead logic | | 3 | Discarded `stdout` from archive command | `DROP DATABASE` hangs for full command duration | | 4 | Aggressive polling with `Sleep(100)` | CPU waste (already fixed in core) | | 5 | `malloc`/`free` in backend | **Memory corruption risk** | | 6 | Poor variable names (`dwRc`, `bytesRead`) | Hard to read | | 7 | Manual `popen` / `CreateProcess` | Missed PG infrastructure | | 8 | Redundant `return;` in `void` function | Style issue | I refactored `src/backend/archive/shell_archive.c` with **PostgreSQL-style fixes**: - Replaced `popen()` and `CreateProcess()` → `OpenPipeStream()` - Used `fileno(archiveFile)` → `archiveFd` correctly - Switched to `palloc()` / `pfree()` for memory safety - Renamed variables: `dwRc` → `exit_code`, `bytesRead` → `nread`, etc. - Read `stdout` to prevent `DROP DATABASE` hangs - Used `WaitLatchOrSocket()` for interruptible waiting - Removed redundant `return;` and dead code This is my contribution to improve the PostgreSQL archiver process. I have attached a patch implementing the discussed fixes — please review and share any suggestions or feedback. Thanks ! Lakshmi On Wed, Oct 15, 2025 at 12:20 AM Robert Haas <robertmhaas@gmail.com> wrote: > Here are a few comments from me: > > I think that by calling system(), anything that is printed out by the > child process ends up in the PostgreSQL log. With the patch, the output of > the command seems like it will be read into a buffer and discarded. That's > a significant behavior difference. > > This patch has a 10ms polling loop after which it checks for interrupts, > and then repeats. I think our normal convention in these kinds of cases is > to use WaitLatchOrSocket(). That allows for a longer sleep time (like 1s) > which results in fewer processor wakeups, while actually still being more > responsive because the arrival of an interrupt should set the process latch > and terminate the wait. > > In general, I agree with Artem's comments about writing code that fits the > PostgreSQL style. We don't want to invent new ways of solving problems for > which we already have infrastructure, nor do we want to solve problems in > this case in a way that is different from what we do in other cases. Using > ereport appropriately is part of that, but there's a lot more to it. Artem > mentioned malloc and free, but notice, for example, that we also have our > own wrapper around popen() in OpenPipeStream(). Maybe we shouldn't use that > here, but we shouldn't *accidentally* not use that here. > > I wonder whether we should really be using popen() in any form, actually. > If what we want is for the output of the command to go to our standard > output, as it does with system(), that's exactly what popen() is designed > not to do, so it doesn't seem like a natural fit. Perhaps we should be > using fork() + exec()? Or maybe we already have some good infrastructure > for this we can reuse somewhere? > > -- > Robert Haas > EDB: http://www.enterprisedb.com >