Automatically sizing the IO worker pool
Thomas Munro <thomas.munro@gmail.com>
From: Thomas Munro <thomas.munro@gmail.com>
To: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-04-12T16:59:54Z
Lists: pgsql-hackers
Commits
- aio: Adjust I/O worker pool automatically.
  - d1c01b79d4ae 19 (unreleased) landed
- aio: Simplify pgaio_worker_submit().
  - fc44f106657a 19 (unreleased) landed
- aio: Remove obsolete IO worker ID references.
  - b4c19da93a08 18.0 landed
  - 177c1f059338 19 (unreleased) landed
- aio: Regularize IO worker internal naming.
  - b2afb0676337 18.0 landed
  - 01d618bcd782 19 (unreleased) landed
Attachments
- 0001-aio-Regularize-io_method-worker-naming-conventions.patch (text/x-patch) patch 0001
- 0002-aio-Remove-IO-worker-ID-references-from-postmaster.c.patch (text/x-patch) patch 0002
- 0003-aio-Try-repeatedly-to-give-batched-IOs-to-workers.patch (text/x-patch) patch 0003
- 0004-aio-Adjust-IO-worker-pool-size-automatically.patch (text/x-patch) patch 0004
- 0005-XXX-read_buffer_loop.patch (text/x-patch) patch 0005
It's hard to know how to set io_workers (default 3). If it's too low,
io_method=worker's small submission queue overflows and it silently
falls back to synchronous IO. If it's too high, it generates a lot of
pointless wakeups and scheduling overhead. That might or might not be
considered an independent problem, but a correctly sized pool
certainly mitigates it. Here's a patch to replace that GUC with:
io_min_workers=1
io_max_workers=8
io_worker_idle_timeout=60s
io_worker_launch_interval=500ms
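To illustrate the overflow case that motivates this, here is a minimal
sketch of a bounded submission queue whose caller falls back to doing
the IO itself when the queue is full. It is not the code in
method_worker.c: SubmitQueue, SUBMIT_QUEUE_SIZE, queue_push() and
submit_io() are hypothetical names, and the queue size is made up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, deliberately tiny queue size for illustration only. */
#define SUBMIT_QUEUE_SIZE 4

typedef struct SubmitQueue
{
	int			ios[SUBMIT_QUEUE_SIZE]; /* queued IO identifiers */
	size_t		head;			/* index of the oldest queued IO */
	size_t		count;			/* number of queued IOs */
} SubmitQueue;

typedef enum SubmitResult
{
	IO_QUEUED,					/* handed to the worker pool */
	IO_SYNCHRONOUS				/* queue full, caller performs the IO */
} SubmitResult;

/* Append one IO to the ring buffer; returns false if it is full. */
static bool
queue_push(SubmitQueue *q, int io)
{
	if (q->count == SUBMIT_QUEUE_SIZE)
		return false;
	q->ios[(q->head + q->count) % SUBMIT_QUEUE_SIZE] = io;
	q->count++;
	return true;
}

/* Try to queue the IO; on overflow, report that it must run synchronously. */
static SubmitResult
submit_io(SubmitQueue *q, int io)
{
	if (queue_push(q, io))
		return IO_QUEUED;
	return IO_SYNCHRONOUS;
}
```

With too few workers draining the queue, submit_io() starts returning
IO_SYNCHRONOUS and the caller loses the concurrency it asked for,
without any error being reported.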
The patch grows the pool when a backlog is detected (better ideas for
this logic welcome) and lets idle workers time out. IO jobs were
already concentrated into the lowest-numbered workers, partly because
that seemed to have marginally better latency than anything else tried
so far, due to latch collapsing with lucky timing, and partly in
anticipation of this.
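As a rough sketch of how the four settings could interact, the
following shows one possible grow/shrink decision. The backlog test
(more queued IOs than running workers) is an assumption made for
illustration, not the patch's actual heuristic, and PoolConfig,
PoolState, pool_should_grow() and worker_should_exit() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the proposed GUCs; times are in milliseconds. */
typedef struct PoolConfig
{
	int			min_workers;		/* io_min_workers */
	int			max_workers;		/* io_max_workers */
	int64_t		idle_timeout_ms;	/* io_worker_idle_timeout */
	int64_t		launch_interval_ms; /* io_worker_launch_interval */
} PoolConfig;

typedef struct PoolState
{
	int			nworkers;		/* workers currently running */
	int64_t		last_launch_ms; /* time of the most recent launch */
} PoolState;

/* Decide whether one more worker should be launched now. */
static bool
pool_should_grow(const PoolConfig *cfg, const PoolState *st,
				 int queue_depth, int64_t now_ms)
{
	assert(cfg->min_workers <= cfg->max_workers);

	if (st->nworkers < cfg->min_workers)
		return true;			/* always keep the minimum running */
	if (st->nworkers >= cfg->max_workers)
		return false;			/* never exceed the maximum */
	if (now_ms - st->last_launch_ms < cfg->launch_interval_ms)
		return false;			/* rate-limit launches */

	/* Assumed backlog signal: more queued IOs than workers. */
	return queue_depth > st->nworkers;
}

/* Decide whether a worker that has been idle for idle_ms should exit. */
static bool
worker_should_exit(const PoolConfig *cfg, const PoolState *st,
				   int64_t idle_ms)
{
	if (st->nworkers <= cfg->min_workers)
		return false;			/* do not shrink below the minimum */
	return idle_ms >= cfg->idle_timeout_ms;
}
```

Because jobs are concentrated into the lowest-numbered workers, the
highest-numbered ones are the ones that would tend to go idle and time
out first under a scheme like this.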
The patch also reduces bogus wakeups somewhat by being more cautious
about fanout. That could probably be improved a lot more and needs
more research. It's quite tricky to figure out how to suppress
wakeups without throwing potential concurrency away.
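To make the trade-off concrete, here is one hypothetical way to bound
fanout: a worker that has just dequeued an IO wakes only as many idle
peers as there is remaining queued work, capped by a fanout limit.
This is an illustrative assumption, not what the patch does, and
peers_to_wake() and its parameters are made-up names.

```c
/*
 * Number of idle peers to wake after dequeuing one IO.  Waking more
 * peers than there are queued IOs produces bogus wakeups; waking fewer
 * leaves potential concurrency unused.
 */
static int
peers_to_wake(int queued_after_dequeue, int idle_workers, int fanout_limit)
{
	int			n = queued_after_dequeue;

	if (n > idle_workers)
		n = idle_workers;		/* cannot wake workers that are busy */
	if (n > fanout_limit)
		n = fanout_limit;		/* bound the wakeups issued per dequeue */
	return n < 0 ? 0 : n;
}
```

A lower fanout limit means fewer spurious wakeups but a slower ramp-up
when a burst of IOs arrives, which is the tension described above.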
The first couple of patches are independent of this topic, and might
be potential cleanups/fixes for master/v18. The last is a simple
latency test.
Ideas, testing, flames etc. welcome.