Re: Making pg_rewind faster
Michael Paquier <michael@paquier.xyz>
From: Michael Paquier <michael@paquier.xyz>
To: John H <johnhyvr@gmail.com>
Cc: Srinath Reddy Sadipiralla <srinath2133@gmail.com>, Robert Haas <robertmhaas@gmail.com>, wenhui qiu <qiuwenhuifx@gmail.com>, Japin Li <japinli@hotmail.com>, Andres Freund <andres@anarazel.de>, Alexander Korotkov <aekorotkov@gmail.com>, Justin Kwan <justinpkwan@outlook.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers <pgsql-hackers@postgresql.org>, vignesh ravichandran <admin@viggy28.dev>, "hlinnaka@iki.fi" <hlinnaka@iki.fi>
Date: 2025-10-23T08:22:33Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
pg_rewind: Skip copy of WAL segments generated before point of divergence
- 5173bfd0443e 19 (unreleased) landed
-
pg_rewind: Extend code detecting relation files to work with WAL files
- 6ae08d9583e9 19 (unreleased) landed
-
Split TESTDIR into TESTLOGDIR and TESTDATADIR
- c47885bd8b69 16.0 cited
Attachments
- v12-0001-Avoid-copying-WAL-segments-before-divergence-to-.patch (text/x-diff) patch v12-0001
On Tue, Oct 21, 2025 at 04:14:55PM -0700, John H wrote: > Made the changes and included the documentation update. I have been looking at this patch, and the first part that stood out is that the refactoring related to isRelDataFile() can be done as an independent piece, cutting the total size of the main patch by 30% or so. So I have extracted this part, simplified it to make it more consistent and les noisy on HEAD, and applied it as 6ae08d9583e9. I am still looking at the rest, and attached is a rebased version of what I have for the moment. Note first that the new test was missing in meson.build, so the coverage provided by the CI was limited. + * However we check last_common_segno and file_size again for sanity. + */ + if (file_segno < last_common_segno && source_size == target_size) What if a segment has a size different than the one a cluster has been initialized with, still both have the same size on the target and the source? Unlikely, still incorrect if not cross-checked with what a control file has in store. :) +# Sleep to allow mtime to be different +usleep(1000000); [...] + [qr/WAL segment \"$wal_seg_skipped\" not copied to target/, + qr/pg_wal\/$corrupt_wal_seg \(COPY\)/ + ], FWIW, I am definitely not a fan of the test relying on the timestamp of the files based on their retrieved meta-data stats, with the mtime. I suspect complications on Windows. Worse thing, there is a forced sleep of 1s added to the test, to make sure that the timestamp of the files we compare have a different number. But do we really need that anyway if we have the debug information with the file map printed in it? Hence, I think that we should simplify and rework the test a bit, tweaking a bit how the filemap is printed: now that we can detect if an operation is happening on a WAL file thanks to file_content_type_t, let's always print the type of operation done for each WAL file and remove this "not copied to target" debug log which provides the same information, then let's compare the debug output with what we want. It seems to me that it would be good for the test to run some sanity checks on the rewound standby, as well. That would provide more validation for the case of the "corrupted" segment that gets copied anyway. -- Michael