Checkpointer write combining
From: Melanie Plageman <melanieplageman@gmail.com>
To: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Cc: Andres Freund <andres@anarazel.de>
Date: 2025-09-02T21:10:43Z
Lists: pgsql-hackers
Attachments
- v1-0005-Fix-XLogNeedsFlush-for-checkpointer.patch (text/x-patch)
- v1-0001-Refactor-goto-into-for-loop-in-GetVictimBuffer.patch (text/x-patch)
- v1-0004-Write-combining-for-BAS_BULKWRITE.patch (text/x-patch)
- v1-0003-Eagerly-flush-bulkwrite-strategy-ring.patch (text/x-patch)
- v1-0002-Split-FlushBuffer-into-two-parts.patch (text/x-patch)
- v1-0006-Add-database-Oid-to-CkptSortItem.patch (text/x-patch)
- v1-0007-Implement-checkpointer-data-write-combining.patch (text/x-patch)
Hi,

The attached patch set implements checkpointer write combining, which makes immediate checkpoints at least 20% faster in my tests. With the patch set applied, the checkpointer achieves higher write throughput and higher write IOPS.

Besides the immediate performance gain, we will eventually need all writers to do write combining if we want to use direct IO. Additionally, I think the general shape I refactored BufferSync() into will be useful for AIO-ifying the checkpointer.

The patch set has preliminary patches (0001-0004) that implement eager flushing and write combining for bulkwrites (like COPY FROM). The functions used to flush a batch of writes for bulkwrites (see 0004) are reused for the checkpointer. The eager flushing component of this patch set has been discussed elsewhere [1].

0005 implements a fix for XLogNeedsFlush() when called by the checkpointer during an end-of-crash-recovery checkpoint. I've already started another thread about this [2], but the fix is required for the patch set to pass tests.

One outstanding action item is to test whether spread checkpoints benefit as well.

More on how I measured the performance benefit to immediate checkpoints: I tuned checkpoint_completion_target, checkpoint_timeout, min_wal_size, and max_wal_size to ensure no other checkpoints were initiated. With shared_buffers set to 16 GB and io_combine_limit set to 128, I created a 15 GB table. To get consistent results, I used pg_prewarm to read the table into shared buffers, issued a checkpoint, then used Bilal's patch [3] to mark all the buffers as dirty again and issued another checkpoint. The second checkpoint is the one I timed. On a fast local SSD, this was a consistent 20%+ speedup (~6.5 seconds down to ~5 seconds).
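
For anyone not familiar with the term: write combining here means merging dirty buffers that sit next to each other on disk into one larger write instead of issuing one write per block. Below is a toy sketch of just that idea. It is not code from the patch set. DemoItem, DemoBatch, and demo_combine() are hypothetical names invented for illustration, the sort key is simplified (no tablespace or fork), and the limit argument only loosely plays the role that io_combine_limit plays in the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one dirty buffer to write; not CkptSortItem. */
typedef struct DemoItem
{
	uint32_t	db;		/* database OID */
	uint32_t	rel;	/* relation file number */
	uint32_t	block;	/* block number within the relation */
} DemoItem;

/* One combined write: nblocks consecutive blocks beginning at start. */
typedef struct DemoBatch
{
	uint32_t	db;
	uint32_t	rel;
	uint32_t	start;
	uint32_t	nblocks;
} DemoBatch;

/* Order by database, then relation, then block number. */
static int
demo_item_cmp(const void *a, const void *b)
{
	const DemoItem *x = a;
	const DemoItem *y = b;

	if (x->db != y->db)
		return x->db < y->db ? -1 : 1;
	if (x->rel != y->rel)
		return x->rel < y->rel ? -1 : 1;
	if (x->block != y->block)
		return x->block < y->block ? -1 : 1;
	return 0;
}

/*
 * Sort items in place, then merge runs of consecutive blocks belonging to
 * the same relation into batches of at most limit blocks.  out must have
 * room for n batches.  Returns the number of batches produced.  Assumes
 * no block appears twice in items.
 */
static size_t
demo_combine(DemoItem *items, size_t n, uint32_t limit, DemoBatch *out)
{
	size_t		nbatches = 0;

	assert(limit > 0);
	qsort(items, n, sizeof(DemoItem), demo_item_cmp);

	for (size_t i = 0; i < n; i++)
	{
		DemoBatch  *cur = nbatches > 0 ? &out[nbatches - 1] : NULL;

		/* Extend the open batch if this block directly follows it. */
		if (cur != NULL &&
			cur->db == items[i].db &&
			cur->rel == items[i].rel &&
			cur->nblocks < limit &&
			cur->start + cur->nblocks == items[i].block)
		{
			cur->nblocks++;
			continue;
		}

		/* Otherwise start a new batch containing only this block. */
		out[nbatches++] = (DemoBatch) {items[i].db, items[i].rel,
									   items[i].block, 1};
	}
	return nbatches;
}
```

Each DemoBatch would then correspond to a single (vectored) write rather than nblocks separate ones. A batch ends when the next block is not contiguous, belongs to a different relation or database, or the batch has reached the limit.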
- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_Yjn4mvN9NBxtmsCQSGwup45CoA4e05nhR7ADP-v0WCig@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAAKRu_a1vZRZRWO3_jv_X13RYoqLRVipGO0237g5PKzPa2YX6g%40mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com