Re: vacuumdb: add --dry-run

Nathan Bossart <nathandbossart@gmail.com>

From: Nathan Bossart <nathandbossart@gmail.com>
To: Corey Huinker <corey.huinker@gmail.com>
Cc: pgsql-hackers@postgresql.org
Date: 2025-11-11T19:46:59Z
Lists: pgsql-hackers

Commits

  1. Add ParallelSlotSetIdle().

  2. vacuumdb: Add --dry-run.

  3. vacuumdb: Move some variables to the vacuumingOptions struct.

  4. Log a note at program start when running in dry-run mode

On Mon, Nov 10, 2025 at 05:33:34PM -0500, Corey Huinker wrote:
>> My attempts to test this all got stuck in wait_on_slots().  I haven't
>> looked too closely, but I suspect the issue is that the socket never
>> becomes readable because we don't send a query.  If I set free_slot->inUse
>> to false before printing the command, it no longer hangs.  We probably want
>> to create a function in parallel_slot.c to mark slots that we don't intend
>> to give a query as idle.
> 
> Would that be preferable to skipping the creation of extra connections for
> parallel workers? I can see it both ways. On the one hand we want to give
> as true a reflection of "what would happen with these options", and on the
> other hand one could view the creation of extra workers as "real" vs a dry
> run.

I think what I'm proposing actually does skip creating extra connections.
If we're immediately marking the first connection as idle, each loop
iteration should reuse the same connection.
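
A rough sketch of why that holds, using a toy slot array rather than the real parallel_slot.c code. ToySlot, toy_get_idle_slot(), toy_set_idle(), and toy_dry_run() are made-up names for illustration, and the sketch assumes a connection is opened only when a slot is first handed out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a parallel slot; NOT the real ParallelSlot struct. */
typedef struct ToySlot
{
	bool		connected;	/* has a connection been opened for this slot? */
	bool		inUse;		/* is a query outstanding on this slot? */
} ToySlot;

/*
 * Hand out the first idle slot, "connecting" it on first use.  Returns NULL
 * if every slot is busy; real code would wait on the sockets at that point.
 */
static ToySlot *
toy_get_idle_slot(ToySlot *slots, int nslots)
{
	for (int i = 0; i < nslots; i++)
	{
		if (!slots[i].inUse)
		{
			slots[i].connected = true;
			slots[i].inUse = true;
			return &slots[i];
		}
	}
	return NULL;
}

/* Mark a slot we never sent a query on as idle again (hypothetical helper). */
static void
toy_set_idle(ToySlot *slot)
{
	assert(slot != NULL);
	slot->inUse = false;
}

/*
 * Simulate a dry run over ncommands commands with up to max_slots slots and
 * return how many connections ended up being opened.
 */
static int
toy_dry_run(int max_slots, int ncommands)
{
	ToySlot		slots[8] = {0};
	int			opened = 0;

	assert(max_slots >= 1 && max_slots <= 8);

	for (int c = 0; c < ncommands; c++)
	{
		ToySlot    *slot = toy_get_idle_slot(slots, max_slots);

		assert(slot != NULL);
		/* dry run: the command would be printed here, never sent */
		toy_set_idle(slot);
	}

	for (int i = 0; i < max_slots; i++)
	{
		if (slots[i].connected)
			opened++;
	}
	return opened;
}
```

Because the slot is released before the next iteration asks for one, the search always stops at the first slot, so only one connection is ever opened no matter how many jobs were requested.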

BTW it might be better to modify run_vacuum_command() to skip running the
command in dry-run mode.  That would also take care of the
ONLY_DATABASE_STATS stuff.  We should probably do something about the
executeCommand() for --analyze-in-stages, too.
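
Something along these lines is what I have in mind, shown as a standalone toy and not as a patch against vacuumdb. toy_run_command(), toy_send(), and the callback type are hypothetical; the real run_vacuum_command() has a different signature, and the sketch assumes dry-run mode still prints the command:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical callback that would send the SQL to the server. */
typedef bool (*toy_send_fn) (const char *sql);

/* Counts how many commands were actually sent, for demonstration only. */
static int	toy_sent_count = 0;

static bool
toy_send(const char *sql)
{
	(void) sql;					/* a real sender would dispatch the query */
	toy_sent_count++;
	return true;
}

/*
 * Print the command when echoing or in dry-run mode, and skip sending it in
 * dry-run mode.  Returns true only if the command was sent successfully.
 */
static bool
toy_run_command(const char *sql, bool echo, bool dry_run, toy_send_fn send)
{
	if (echo || dry_run)
		printf("%s\n", sql);

	if (dry_run)
		return false;			/* nothing sent, so nothing to wait for */

	return send(sql);
}
```

Putting the check in one place means every caller that goes through this function gets the dry-run behavior without its own special case.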

-- 
nathan