Re: Making pg_rewind faster


From: wenhui qiu <qiuwenhuifx@gmail.com>
To: Japin Li <japinli@hotmail.com>
Cc: John H <johnhyvr@gmail.com>, Michael Paquier <michael@paquier.xyz>, Andres Freund <andres@anarazel.de>, Alexander Korotkov <aekorotkov@gmail.com>, Justin Kwan <justinpkwan@outlook.com>, Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers <pgsql-hackers@postgresql.org>, vignesh <vignesh@cloudflare.com>, vignesh ravichandran <admin@viggy28.dev>, "hlinnaka@iki.fi" <hlinnaka@iki.fi>, "jkwan@cloudflare.com" <jkwan@cloudflare.com>
Date: 2025-07-02T03:16:25Z
Lists: pgsql-hackers

Commits

  1. pg_rewind: Skip copy of WAL segments generated before point of divergence

  2. pg_rewind: Extend code detecting relation files to work with WAL files

  3. Split TESTDIR into TESTLOGDIR and TESTDATADIR

Hi,

>  2.
> Perhaps decide_wal_file_action() could be defined in filemap.c.


> While this is unrelated to WAL logging, it could also contribute to faster
> pg_rewind operations.  Should we consider ignoring log files under PGDATA
> (e.g., those in the default log/ directory)?
Agreed. The log directory usually takes up a lot of space as well, and the
number of log files is often quite large.
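
For illustration only, a minimal sketch of the kind of path check this would
need, assuming the log directory is a relative name under PGDATA such as the
default "log". The helper name path_is_under_dir() is hypothetical and is not
part of the patch or of pg_rewind:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: return true if "path"
 * (relative to PGDATA) is the directory "dir" itself or lies inside it.
 * "dir" is assumed to be a relative name without a trailing slash,
 * for example "log".
 */
bool
path_is_under_dir(const char *path, const char *dir)
{
	size_t		len = strlen(dir);

	/* An empty directory name never matches anything. */
	if (len == 0 || strncmp(path, dir, len) != 0)
		return false;

	/* Require a whole path component, so "logfile" does not match "log". */
	return path[len] == '\0' || path[len] == '/';
}
```

A check like this only helps when the log directory actually lives inside
PGDATA; if log_directory points to an absolute path elsewhere, there is
nothing under PGDATA to skip.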

On Wed, Jul 2, 2025 at 10:21 AM Japin Li <japinli@hotmail.com> wrote:

> On Tue, 01 Jul 2025 at 11:13, John H <johnhyvr@gmail.com> wrote:
> > Hi,
> >
> > I've attached an updated version of the patch against master with the
> > changes suggested.
> >
> > On Tue, Nov 29, 2022 at 10:03 PM Michael Paquier <michael@paquier.xyz> wrote:
> >>
> >> On Thu, Oct 06, 2022 at 04:08:45PM +0900, Michael Paquier wrote:
> >>>
> >>>  There may be something I am missing here, but there is no need to care
> >>> about segments with a TLI older than lastcommontliIndex, no?
> >
> > Hard to say. pg_rewind is intended to make the same "copy" of the cluster
> > which implies pg_wal/ should look the same. There might be use cases around
> > logical replication where you would want these WAL files to still exist
> > even across promotions?
> >
> >>> decide_wal_file_action() assumes that the WAL segment exists on the
> >>> target and the source.  This looks bug-prone to me without at least an
> >>> assertion.
> >
> > From previous refactors there is now an Assertion in filemap.c
> > decide_file_action that handles this.
> >
> >> Assert(entry->target_exists && entry->source_exists);
> >
> > decide_wal_file_action is called after the assertion.
> >
>
> Hi, John
>
> Thanks for updating the patch.
>
> 1.
> +/* Determine the type of file content (relation, WAL, or other) */
> +static file_content_type_t
> +getFileType(const char *path)
>
> Considering the existence of file_type_t, would getFileContentType() be a
> suitable function for handling file content types?
>
> 2.
> Perhaps decide_wal_file_action() could be defined in filemap.c.
>
>
> While this is unrelated to WAL logging, it could also contribute to faster
> pg_rewind operations.  Should we consider ignoring log files under PGDATA
> (e.g., those in the default log/ directory)?
>
> --
> Regards,
> Japin Li
>
>
>