Re: Making pg_rewind faster

John H <johnhyvr@gmail.com>

From: John H <johnhyvr@gmail.com>
To: wenhui qiu <qiuwenhuifx@gmail.com>
Cc: Japin Li <japinli@hotmail.com>, Michael Paquier <michael@paquier.xyz>, Andres Freund <andres@anarazel.de>, Alexander Korotkov <aekorotkov@gmail.com>, Justin Kwan <justinpkwan@outlook.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers <pgsql-hackers@postgresql.org>, vignesh <vignesh@cloudflare.com>, vignesh ravichandran <admin@viggy28.dev>, "hlinnaka@iki.fi" <hlinnaka@iki.fi>, "jkwan@cloudflare.com" <jkwan@cloudflare.com>
Date: 2025-07-02T18:21:54Z
Lists: pgsql-hackers

Commits

  1. pg_rewind: Skip copy of WAL segments generated before point of divergence

  2. pg_rewind: Extend code detecting relation files to work with WAL files

  3. Split TESTDIR into TESTLOGDIR and TESTDATADIR

Hi,

Thanks for the quick review.

On Tue, Jul 1, 2025 at 8:16 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
> > Perhaps decide_wal_file_action() could be defined in filemap.c.
>

That's a good point. I updated the patch to reflect that.

> >  While this is unrelated to WAL logging, it could also contribute to faster
> > pg_rewind operations.  Should we consider ignoring log files under PGDATA
> > (e.g., those in the default log/ directory)?
> Agreed. The log directory usually takes up a lot of space as well, and the number of log files is quite large.
>

Should we handle this use case? I agree that for the most common use
case of pg_rewind, which is resynchronizing an old writer with the new
leader after failover, skipping the logging directory would be useful.
At the same time, pg_rewind is intended to produce an identical copy of
the source cluster as efficiently as possible, which includes "all"
directories that exist in $PGDATA. logging_collector is also off by
default. I'd also expect people to keep logs outside $PGDATA anyway, so
that backups taken with tools like pg_basebackup stay smaller.
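
To make the trade-off concrete, here is a rough sketch of the kind of
check that skipping the log directory would need. It is not part of the
patch: path_is_under_dir() is a hypothetical helper, and it assumes
log_directory is set to a path relative to $PGDATA (such as the default
"log"); an absolute log_directory is already outside what pg_rewind
copies.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of pg_rewind: returns true if "path"
 * (relative to $PGDATA) is the directory "dir" itself or lies beneath it.
 */
bool
path_is_under_dir(const char *path, const char *dir)
{
    size_t      dirlen;

    assert(path != NULL && dir != NULL);

    dirlen = strlen(dir);
    if (dirlen == 0 || strncmp(path, dir, dirlen) != 0)
        return false;

    /* Require a component boundary so "log" does not match "logical". */
    return path[dirlen] == '\0' || path[dirlen] == '/';
}
```

A caller would pass each file map entry's path together with the
configured log directory and skip the entry when this returns true.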

> On Wed, Jul 2, 2025 at 10:21 AM Japin Li <japinli@hotmail.com> wrote:
>>
>> Hi, John
>>
>> Thanks for updating the patch.
>>
>> 1.
>> +/* Determine the type of file content (relation, WAL, or other) */
>> +static file_content_type_t
>> +getFileType(const char *path)
>>
>> Considering the existence of file_type_t, would getFileContentType() be a
>> suitable function for handling file content types?

Do you mean renaming getFileType to getFileContentType?
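
For reference, a simplified sketch of what a function with that name
could look like. This is illustration only and not the code in the
patch: the enumerator names are assumptions, the relation check is much
looser than the real one (it only looks at base/ and global/ and ignores
pg_tblspc/), and the WAL check relies only on segment file names being
24 upper-case hex digits under pg_wal/.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enumerators; the ones in the patch may differ. */
typedef enum
{
    FILE_CONTENT_TYPE_OTHER,
    FILE_CONTENT_TYPE_RELATION,
    FILE_CONTENT_TYPE_WAL
} file_content_type_t;

/* Determine the type of file content (relation, WAL, or other) */
file_content_type_t
getFileContentType(const char *path)
{
    const char *fname;

    assert(path != NULL);

    /* WAL segment: pg_wal/ followed by exactly 24 upper-case hex digits */
    if (strncmp(path, "pg_wal/", 7) == 0)
    {
        fname = path + 7;
        if (strlen(fname) == 24 &&
            strspn(fname, "0123456789ABCDEF") == 24)
            return FILE_CONTENT_TYPE_WAL;
        return FILE_CONTENT_TYPE_OTHER;
    }

    /*
     * Simplified relation check: a file under base/ or global/ whose
     * name starts with a digit.  Tablespaces are not handled here.
     */
    if (strncmp(path, "base/", 5) == 0 || strncmp(path, "global/", 7) == 0)
    {
        fname = strrchr(path, '/') + 1;
        if (*fname >= '0' && *fname <= '9')
            return FILE_CONTENT_TYPE_RELATION;
    }

    return FILE_CONTENT_TYPE_OTHER;
}
```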

Thanks,

-- 
John Hsu - Amazon Web Services