Re: Making pg_rewind faster
John H <johnhyvr@gmail.com>
From: John H <johnhyvr@gmail.com>
To: wenhui qiu <qiuwenhuifx@gmail.com>
Cc: Japin Li <japinli@hotmail.com>, Michael Paquier <michael@paquier.xyz>, Andres Freund <andres@anarazel.de>,
Alexander Korotkov <aekorotkov@gmail.com>, Justin Kwan <justinpkwan@outlook.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers <pgsql-hackers@postgresql.org>,
vignesh <vignesh@cloudflare.com>, vignesh ravichandran <admin@viggy28.dev>,
"hlinnaka@iki.fi" <hlinnaka@iki.fi>, "jkwan@cloudflare.com" <jkwan@cloudflare.com>
Date: 2025-07-02T18:21:54Z
Lists: pgsql-hackers
Commits
- pg_rewind: Skip copy of WAL segments generated before point of divergence
  - 5173bfd0443e 19 (unreleased) landed
- pg_rewind: Extend code detecting relation files to work with WAL files
  - 6ae08d9583e9 19 (unreleased) landed
- Split TESTDIR into TESTLOGDIR and TESTDATADIR
  - c47885bd8b69 16.0 cited
Attachments
- 0005-Avoid-copying-WAL-segments-before-divergence-to-spee.patch (application/octet-stream) patch 0005
Hi,

Thanks for the quick review.

On Tue, Jul 1, 2025 at 8:16 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
> > Perhaps decide_wal_file_action() could be defined in filemap.c.
>

That's a good point. I updated the patch to reflect that.

> > While this is unrelated to WAL logging, it could also contribute to faster
> > pg_rewind operations. Should we consider ignoring log files under PGDATA
> > (e.g., those in the default log/ directory)?
> Agree ,Usually the log file directory also takes up a lot of space, and the number of log files is quite large
> Should we handle this use case?

I do agree that for the more common use case of pg_rewind, which is
synchronizing an old writer to the new leader after failover, avoiding
syncing the logging directory is useful. At the same time, pg_rewind is
intended to make the same copy of the source cluster as efficiently as
possible, which would include "all" directories if they exist in $PGDATA.
By default logging_collector is off as well. I'd also think you would want
to avoid putting the logs in $PGDATA to have smaller backups if you are
using tools like pg_basebackup.

> On Wed, Jul 2, 2025 at 10:21 AM Japin Li <japinli@hotmail.com> wrote:
>>
>> Hi, John
>>
>> Thanks for updating the patch.
>>
>> 1.
>> +/* Determine the type of file content (relation, WAL, or other) */
>> +static file_content_type_t
>> +getFileType(const char *path)
>>
>> Considering the existence of file_type_t, would getFileContentType() be a
>> suitable function for handling file content types?

Do you mean renaming getFileType to getFileContentType?

Thanks,
--
John Hsu - Amazon Web Services
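
For anyone following the naming question, here is a minimal standalone sketch of what a path-based content classifier of this kind might look like. It is not the code from the attached patch: the enum, the function names (classify_path_sketch and its helpers), and the matching rules are all assumptions made for illustration. It recognizes only 24-hex-digit segment names directly under pg_wal/ and simple relation file names under base/ or global/, and it deliberately ignores tablespaces and temporary relations.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in; the patch's file_content_type_t may differ. */
typedef enum
{
    CONTENT_OTHER,
    CONTENT_RELATION,
    CONTENT_WAL
} content_type_sketch;

/* WAL segment names are 24 uppercase hex digits. */
#define WAL_FNAME_LEN 24

static bool
looks_like_wal_segment(const char *fname)
{
    return strlen(fname) == WAL_FNAME_LEN &&
        strspn(fname, "0123456789ABCDEF") == WAL_FNAME_LEN;
}

/* Simplified check: digits, optional _fork, optional .segno, nothing else. */
static bool
looks_like_relation_file(const char *fname)
{
    size_t      n = strspn(fname, "0123456789");

    if (n == 0)
        return false;
    fname += n;

    if (*fname == '_')          /* fork suffix such as _fsm, _vm, _init */
    {
        n = strspn(fname + 1, "abcdefghijklmnopqrstuvwxyz");
        if (n == 0)
            return false;
        fname += 1 + n;
    }

    if (*fname == '.')          /* segment number such as .1 */
    {
        n = strspn(fname + 1, "0123456789");
        if (n == 0)
            return false;
        fname += 1 + n;
    }

    return *fname == '\0';
}

/* Classify a path relative to the data directory (assumed layout). */
static content_type_sketch
classify_path_sketch(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *fname = slash ? slash + 1 : path;

    /* Only files directly inside pg_wal/ count as WAL segments here. */
    if (strncmp(path, "pg_wal/", 7) == 0 && fname == path + 7 &&
        looks_like_wal_segment(fname))
        return CONTENT_WAL;

    if ((strncmp(path, "base/", 5) == 0 ||
         strncmp(path, "global/", 7) == 0) &&
        looks_like_relation_file(fname))
        return CONTENT_RELATION;

    return CONTENT_OTHER;
}
```

Under these assumptions, a path such as log/postgresql.log falls into the "other" bucket, so skipping the log directory would need a separate rule rather than a new content type.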