Thread

  1. Re: Making pg_rewind faster

    Srinath Reddy Sadipiralla <srinath2133@gmail.com> — 2025-10-18T14:17:06Z

    On Thu, Oct 16, 2025 at 11:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Thu, Oct 9, 2025 at 3:09 PM Srinath Reddy Sadipiralla
    > <srinath2133@gmail.com> wrote:
    > > Just a second late :( I was about to post a patch addressing the
    > refactors Robert mentioned. Anyway, I will have a look at your latest
    > patch, John, thanks :). Curious about the TAP test.
    > >
    > > While I was writing the patch, something suddenly struck me: why are
    > we even depending on last_common_segno? Once we reach
    > decide_wal_file_action, the file exists in both target and source.
    > AFAIK this can only happen with WAL segments older than or equal to
    > last_common_segno, because once the promotion completes, the WAL file
    > names change to use the new timeline ID (2). For example, if
    > last_common_segno corresponds to 000000010000000000000003, then based on
    > the rules in XLogInitNewTimeline:
    > > 1) If the timeline switch happens in the middle of a segment, the data
    > is copied from the last WAL segment into a new WAL file with the same
    > segno but a different timeline ID; in this case the first WAL file on
    > the new timeline will be 000000020000000000000003.
    > > 2) If the timeline switch happens at a segment boundary, the next
    > segment is simply created; in this case the first WAL file on the new
    > timeline will be 000000020000000000000004.
    > >
    > > So basically, files that exist in the source but not in the target,
    > such as the new timeline's WAL segments, will be copied to the target in
    > full before we reach decide_wal_file_action. So I think we don't need to
    > worry about copying WAL files after the divergence point by calculating
    > and checking against last_common_segno, as our current approach does; I
    > think we can just do
    >
    > What makes me nervous about this is that it isn't necessarily the case
    > that the servers were perfectly in sync at the time of the failure.
    > Suppose that the primary was in the middle of writing
    > 000000010000000000000003. The standby might also have this file, but
    > it might contain less valid data than the one on the primary;
    > therefore, if we don't copy the file, the two servers might not have
    > an identical file. Maybe that wouldn't really matter, in the sense
    > that the extra valid data that exists on the original primary
    > shouldn't prevent it from replaying WAL on the new primary's timeline,
    > which is probably all we really care about. But it feels dangerous to
    > me.
    >
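    To make the naming rules quoted above concrete, here is a small Python
    sketch (not pg_rewind code). wal_file_name() follows the "%08X%08X%08X"
    layout of PostgreSQL's XLogFileName(), assuming the default 16 MB segment
    size. The helpers last_segno_on_old_timeline() and
    first_segno_on_new_timeline() are hypothetical names reflecting one
    reading of XLogInitNewTimeline: the new timeline starts in the segment
    that contains the switch LSN, so a mid-segment switch reuses the segno
    and a switch exactly on a boundary moves to the next one.

```python
# Assumption: default WAL segment size of 16 MB (initdb --wal-segsize can change it).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(tli: int, segno: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Timeline ID, then segno split into 'log' and 'seg' parts,
    each printed as 8 uppercase hex digits."""
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)


def last_segno_on_old_timeline(switch_lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    # Hypothetical helper: segment holding the last byte before the switch point.
    return (switch_lsn - 1) // seg_size


def first_segno_on_new_timeline(switch_lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    # Hypothetical helper: segment containing the switch point itself, i.e.
    # the same segno for a mid-segment switch, the next one on a boundary.
    return switch_lsn // seg_size


for label, lsn in [("mid-segment", 0x3000100), ("boundary", 0x4000000)]:
    old = wal_file_name(1, last_segno_on_old_timeline(lsn))
    new = wal_file_name(2, first_segno_on_new_timeline(lsn))
    print(f"{label}: old timeline ends in {old}, new timeline starts at {new}")
```

    With the switch at LSN 0/3000100 the new timeline starts at
    000000020000000000000003; at 0/4000000 it starts at
    000000020000000000000004. In both cases the old timeline's last segment
    is 000000010000000000000003, matching cases 1) and 2) above.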
    
    Thanks, Robert. I want to understand this point better and will get back to you.
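    For reference while working through this, a minimal Python sketch of the
    conservative rule, assuming that "checking against last_common_segno"
    means a WAL segment present on both servers is skipped only if it lies
    strictly before last_common_segno. The function name
    should_copy_wal_segment() is hypothetical and not taken from the patch.
    Under this rule the segment containing the divergence point (segno 3 in
    the example above) is always copied, which covers Robert's scenario where
    the old primary's copy of that file holds more valid WAL than the new
    primary's.

```python
def should_copy_wal_segment(segno: int, last_common_segno: int) -> bool:
    """Hypothetical decision for a WAL segment present on both source and
    target (an assumption about the approach, not the actual patch).

    Segments strictly before the divergence segment are assumed to be
    identical on both servers and are skipped. The divergence segment itself
    may have been only partly replicated when the old primary failed, so it
    is copied from the source even though a file with the same name already
    exists on the target.
    """
    return segno >= last_common_segno


last_common_segno = 3
for segno in (1, 2, 3):
    action = "copy" if should_copy_wal_segment(segno, last_common_segno) else "skip"
    print(f"segno {segno}: {action}")
```

    If files present on both sides can only have segno <= last_common_segno,
    then this rule differs from skipping all of them only for
    segno == last_common_segno, which is exactly the file in question.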
    
    -- 
    Thanks,
    Srinath Reddy Sadipiralla
    EDB: https://www.enterprisedb.com/