Thread

  1. Requested WAL segment xxx has already been removed

    Japin Li <japinli@hotmail.com> — 2025-07-14T08:08:06Z

    Hi all,
    
    I recently hit an error with our streaming replication setup:
    
      2025-07-14 11:52:59.361 CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14 11:52:59 CST,3/0,0,ERROR,58P01,"requested WAL segment 00000001000000000000000C has already been removed",,,,,,"START_REPLICATION 0/C000000 TIMELINE 1",,,"standby","walsender",,0
    
    It appears the requested WAL segment 00000001000000000000000C had already been
    archived, and I confirmed its presence in the archive directory. However, when
    the standby tried to request this file, the primary only searched for it in
    pg_wal and didn't check the archive directory. I had to manually copy the
    segment into pg_wal to get streaming replication working again.
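
    For reference, the segment name in the error follows directly from the
    START_REPLICATION position. The following is a minimal sketch of that
    mapping, assuming the default 16 MB wal_segment_size; the helper names
    (parse_lsn, wal_file_name) are illustrative and not PostgreSQL functions.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB wal_segment_size


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' (hexadecimal) into a byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, lsn: int, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    segments_per_xlog_id = 0x100000000 // segment_size  # 256 for 16 MB segments
    segno = lsn // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlog_id,
        segno % segments_per_xlog_id,
    )


# Position and timeline taken from the log line above.
print(wal_file_name(1, parse_lsn("0/C000000")))  # prints 00000001000000000000000C
```

    With a non-default segment size the same LSN maps to a different file
    name, so the constant has to match the cluster.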
    
    My question is: Can we make the primary automatically search the archive if
    restore_command is set?
    
    I found that Fujii Masao also requested this feature [1], but it seems there
    wasn't a consensus.
    
    I've attached a script to reproduce this issue.
    
    [1] https://www.postgresql.org/message-id/AANLkTinN%3DxsPOoaXzVFSp1OkfMDAB1f_d-F91xjEZDV8%40mail.gmail.com
    
    -- 
    Regards,
    Japin Li
    
    
  2. Re: Requested WAL segment xxx has already been removed

    Alexander Kukushkin <cyberdemn@gmail.com> — 2025-07-14T08:21:02Z

    On Mon, 14 Jul 2025 at 10:08, Japin Li <japinli@hotmail.com> wrote:
    
    >
    > Hi all,
    >
    > I recently hit an error with our streaming replication setup:
    >
    >   2025-07-14 11:52:59.361 CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14
    > 11:52:59 CST,3/0,0,ERROR,58P01,"requested WAL segment
    > 00000001000000000000000C has already been removed",,,,,,"START_REPLICATION
    > 0/C000000 TIMELINE 1",,,"standby","walsender",,0
    >
    > My question is: Can we make the primary automatically search the archive if
    > restore_command is set?
    
    
    If we are talking about physical replication, restore_command could just
    as well be (and, more importantly, should be) used on the standby. So the
    main question here is: why wasn't the standby properly configured?
    
    However, logical replication is a different story, and it would be really
    great if restore_command were used to fetch WAL files when they are missing.
    
    Regards,
    --
    Alexander Kukushkin
    
  3. Re: Requested WAL segment xxx has already been removed

    Japin Li <japinli@hotmail.com> — 2025-07-14T09:23:54Z

    On Mon, 14 Jul 2025 at 10:21, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
    > On Mon, 14 Jul 2025 at 10:08, Japin Li <japinli@hotmail.com> wrote:
    >
    >  Hi all,
    >
    >  I recently hit an error with our streaming replication setup:
    >
    >    2025-07-14 11:52:59.361
    >  CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14 11:52:59
    >  CST,3/0,0,ERROR,58P01,"requested WAL segment 00000001000000000000000C has already been
    >  removed",,,,,,"START_REPLICATION 0/C000000 TIMELINE 1",,,"standby","walsender",,0
    >
    >  My question is: Can we make the primary automatically search the archive if
    >  restore_command is set?
    >
    > If we talk about physical replication, then with the same success restore_command could be (and more important, it should
    > be) used on a standby.
    >
    
    Yes, I'm referring to physical replication.
    
    > And the main question here is why standby wasn't properly configured?
    >
    
    The configuration is as expected. My test script simulates two distinct hosts
    by utilizing local archive storage.
    
    For physical replication across distinct hosts without shared WAL archive
    storage, WALs are archived locally (in my test).
    
    When the primary's walsender needs a WAL file that is in the archive but no
    longer in its pg_wal directory, the file has to be copied manually into the
    primary's pg_wal or the standby's pg_wal (or into the standby's archive
    directory, to be fetched with restore_command).
    
    What prevents us from using the primary's restore_command to retrieve the
    necessary WALs?
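
    The behaviour being asked for can be sketched roughly as follows. This is
    only an illustration under stated assumptions, not the attached patch and
    not walsender code: locate_segment and expand_restore_command are
    hypothetical names, only the %f, %p and %% placeholders are handled (%r is
    omitted), and the file is restored straight into pg_wal for simplicity.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Optional


def expand_restore_command(template: str, file_name: str, target_path: str) -> str:
    """Substitute %f (file name), %p (destination path) and %% (literal %)."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "f":
                out.append(file_name)
            elif nxt == "p":
                out.append(target_path)
            elif nxt == "%":
                out.append("%")
            else:
                out.append(ch + nxt)  # unknown placeholder: keep it unchanged
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def locate_segment(pg_wal: Path, file_name: str, restore_command: str = "") -> Optional[Path]:
    """Return the segment's path, restoring it from the archive if needed."""
    local = pg_wal / file_name
    if local.exists():       # cheap path: the segment is still in pg_wal
        return local
    if not restore_command:  # no archive configured: caller reports "removed"
        return None
    command = expand_restore_command(
        restore_command, shlex.quote(file_name), shlex.quote(str(local))
    )
    result = subprocess.run(command, shell=True)
    return local if result.returncode == 0 and local.exists() else None
```

    With restore_command set to something like 'cp /path/to/archive/%f %p'
    (a placeholder path), a segment missing from pg_wal would be copied back
    before the "has already been removed" error is raised.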
    
    > However, with logical replication it is a different story, and it would be really great if restore_command is used when
    > WAL's are missing to fetch it.
    >  
    
    -- 
    Regards,
    Japin Li
    
    
    
    
  4. Re: Requested WAL segment xxx has already been removed

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2025-07-14T11:33:36Z

    
    On 2025/07/14 17:08, Japin Li wrote:
    > 
    > Hi all,
    > 
    > I recently hit an error with our streaming replication setup:
    > 
    >    2025-07-14 11:52:59.361 CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14 11:52:59 CST,3/0,0,ERROR,58P01,"requested WAL segment 00000001000000000000000C has already been removed",,,,,,"START_REPLICATION 0/C000000 TIMELINE 1",,,"standby","walsender",,0
    > 
    > It appears the requested WAL segment 00000001000000000000000C had already been
    > archived, and I confirmed its presence in the archive directory. However, when
    > the standby tried to request this file, the primary only searched for it in
    > pg_wal and didn't check the archive directory. I had to manually copy the
    > segment into pg_wal to get streaming replication working again.
    > 
    > My question is: Can we make the primary automatically search the archive if
    > restore_command is set?
    > 
    > I found that Fujii Masao also requested this feature [1], but it seems there
    > wasn't a consensus.
    
    Yeah, I still like this idea. It's useful, for example, when we want to
    temporarily retain WAL files, such as during planned standby maintenance,
    to avoid the "requested WAL segment ... has already been removed" error.
    
    Using a replication slot is one way to retain WAL files in pg_wal,
    but it requires the pg_wal directory to be large enough to hold all
    WAL generated during that time, which isn't always practical.
    
    Regards,
    
    -- 
    Fujii Masao
    NTT DATA Japan Corporation
    
    
    
    
    
  5. Re: Requested WAL segment xxx has already been removed

    Japin Li <japinli@hotmail.com> — 2025-07-15T06:10:44Z

    On Mon, 14 Jul 2025 at 20:33, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
    > On 2025/07/14 17:08, Japin Li wrote:
    >> Hi all,
    >> I recently hit an error with our streaming replication setup:
    >>    2025-07-14 11:52:59.361
    >> CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14
    >> 11:52:59 CST,3/0,0,ERROR,58P01,"requested WAL segment
    >> 00000001000000000000000C has already been
    >> removed",,,,,,"START_REPLICATION 0/C000000 TIMELINE
    >> 1",,,"standby","walsender",,0
    >> It appears the requested WAL segment 00000001000000000000000C had
    >> already been
    >> archived, and I confirmed its presence in the archive directory. However, when
    >> the standby tried to request this file, the primary only searched for it in
    >> pg_wal and didn't check the archive directory. I had to manually copy the
    >> segment into pg_wal to get streaming replication working again.
    >> My question is: Can we make the primary automatically search the
    >> archive if
    >> restore_command is set?
    >> I found that Fujii Masao also requested this feature [1], but it
    >> seems there
    >> wasn't a consensus.
    >
    > Yeah, I still like this idea. It's useful, for example, when we want to
    > temporarily retain WAL files, such as during planned standby maintenance,
    > to avoid "requested WAL segment ... removed." error.
    >
    > Using a replication slot is one way to retain WAL files in pg_wal,
    > but it requires the pg_wal directory to be large enough to hold all
    > WAL generated during that time, which isn't always practical.
    >
    
    Agreed.  Here is a patch that fixes this.
    
    -- 
    Regards,
    Japin Li
    
  6. Re: Requested WAL segment xxx has already been removed

    wenhui qiu <qiuwenhuifx@gmail.com> — 2025-07-15T08:41:56Z

    Hi Japin,
    
    Thank you for working on this. It is useful: by the time a standby node
    with a hardware issue has been repaired, the WAL it needs has usually
    already been archived. The wal_keep_size parameter is difficult to estimate
    accurately, as hardware repair or replacement times are often
    unpredictable. If the machine can be fixed in a few days, the archived WAL
    files are likely still available in the archive directory. One small regret
    is that PostgreSQL currently lacks a speed limit for sending WAL.
    
    Thanks
    
    
  7. Re: Requested WAL segment xxx has already been removed

    Alexander Kukushkin <cyberdemn@gmail.com> — 2025-07-15T09:24:35Z

    Hi,
    
    On Mon, 14 Jul 2025 at 11:24, Japin Li <japinli@hotmail.com> wrote:
    
    > The configuration is as expected. My test script simulates two distinct
    > hosts
    > by utilizing local archive storage.
    >
    > For physical replication across distinct hosts without shared WAL archive
    > storage, WALs are archived locally (in my test).
    >
    > When the primary's walsender needs a WAL file from the archive that's not
    > in
    > its pg_wal directory, manual copying is required to the primary's pg_wal
    > or the
    > standby's pg_wal (or its archive directory, and use restore_command to
    > fetch it).
    >
    > What prevents us from using the primary's restore_command to retrieve the
    > necessary WALs?
    >
    
    I am just talking about the practical side of local archive storage.
    Such archives will be gone along with the server in case of disaster and
    therefore they bring only a little value.
    Equally well, a physical standby can use restore_command to copy files
    from the archive on the primary via ssh/rsync or similar. This approach
    has been used for ages and works just fine.
    
    What is really painful right now is that logical walsenders can only look
    into pg_wal, and unfortunately replication slots don't give a 100%
    guarantee of WAL retention because of max_slot_wal_keep_size.
    That is, using restore_command for logical walsenders would be really
    helpful and would solve some problems and pain points with logical
    replication.
    
    However, if we start calling restore_command for physical walsenders as
    well, it might result in increased resource usage on the primary without
    providing much additional value. For example, restore_command may keep
    failing while the standby continues making replication connection attempts
    indefinitely.
    
    I don't mind if it will also work for physical replication, but IMO there
    should be a possibility to opt out from it.
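
    One way to picture such an opt-out, together with a bound on repeated
    failing attempts, is the sketch below. Everything in it is hypothetical:
    ArchiveFetchGuard, its enabled flag and min_retry_interval are invented
    for illustration and do not correspond to any existing or proposed
    setting. It also keeps state in a single process, whereas real walsenders
    are separate backends and would need shared state.

```python
import time
from typing import Callable, Dict


class ArchiveFetchGuard:
    """Decide whether restore_command may be invoked for a given segment."""

    def __init__(self, enabled: bool, min_retry_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.enabled = enabled                        # hypothetical opt-out switch
        self.min_retry_interval = min_retry_interval  # seconds between retries
        self._clock = clock
        self._last_failure: Dict[str, float] = {}

    def may_fetch(self, file_name: str) -> bool:
        """True if the archive may be consulted for this segment right now."""
        if not self.enabled:
            return False
        failed_at = self._last_failure.get(file_name)
        if failed_at is None:
            return True
        return self._clock() - failed_at >= self.min_retry_interval

    def record_failure(self, file_name: str) -> None:
        """Remember when restore_command last failed for this segment."""
        self._last_failure[file_name] = self._clock()

    def record_success(self, file_name: str) -> None:
        """Forget earlier failures once the segment has been restored."""
        self._last_failure.pop(file_name, None)
```

    A reconnecting standby that keeps asking for the same missing segment
    would then cost one restore_command call per interval instead of one per
    connection attempt.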
    
    Regards,
    --
    Alexander Kukushkin
    
  8. Re: Requested WAL segment xxx has already been removed

    wenhui qiu <qiuwenhuifx@gmail.com> — 2025-07-15T10:01:46Z

    Hi,
    
    > What is really painful right now, logical walsenders can only look into
    > pg_wal, and unfortunately replication slots don't give 100% guarantee for
    > WAL retention because of max_slot_wal_keep_size.
    > That is, using restore_command for logical walsenders would be really
    > helpful and solve some problems and pain points with logical replication.
    
    On a standby, restore_command has to rely on ssh or NFS shared storage.
    At most companies, security audit requirements make it impossible to set
    up passwordless ssh trust between hosts. It would be very convenient if
    this feature were implemented.
    
    
    Thanks
    
    
  9. Re: Requested WAL segment xxx has already been removed

    Japin Li <japinli@hotmail.com> — 2025-07-15T10:07:56Z

    On Tue, 15 Jul 2025 at 11:24, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
    > Hi,
    >
    > On Mon, 14 Jul 2025 at 11:24, Japin Li <japinli@hotmail.com> wrote:
    >
    >  The configuration is as expected. My test script simulates two distinct hosts
    >  by utilizing local archive storage.
    >
    >  For physical replication across distinct hosts without shared WAL archive
    >  storage, WALs are archived locally (in my test).
    >
    >  When the primary's walsender needs a WAL file from the archive that's not in
    >  its pg_wal directory, manual copying is required to the primary's pg_wal or the
    >  standby's pg_wal (or its archive directory, and use restore_command to fetch it).
    >
    >  What prevents us from using the primary's restore_command to retrieve the
    >  necessary WALs?
    >
    > I am just talking about the practical side of local archive storage.
    >
    
    Yes, it's quite niche in its usage.
    
    > Such archives will be gone along with the server in case of disaster and therefore they bring only a little value.
    > With the same success, physical standby can use restore_command to copy files from the archive on the primary via
    > ssh/rsync or similar. This approach is used for ages and works just fine.
    >
    
    However, some environments might prohibit password-free scp or the use of
    shared directories.
    
    > What is really painful right now, logical walsenders can only look into pg_wal, and unfortunately replication slots don't
    > give 100% guarantee for WAL retention because of max_slot_wal_keep_size.
    > That is, using restore_command for logical walsenders would be really helpful and solve some problems and pain points
    > with logical replication.
    >
    
    I agree; this offers greater value for logical walsenders than for
    physical ones.
    
    > However, if we start calling restore_command also for physical walsenders it might result in increased resource usage on
    > primary without providing much additional value. For example, restore_command is failing, but standby indefinitely
    > continues making replication connection attempts.
    >
    
    IIRC, the standby will indefinitely attempt to connect for replication, even
    without restore_command configured.
    
    > I don't mind if it will also work for physical replication, but IMO there should be a possibility to opt out from it.
    >
    
    -- 
    Regards,
    Japin Li
    
    
    
    
  10. Re: Requested WAL segment xxx has already been removed

    Alexander Kukushkin <cyberdemn@gmail.com> — 2025-07-15T10:12:01Z

    On Tue, 15 Jul 2025 at 12:08, Japin Li <japinli@hotmail.com> wrote:
    
    >
    > IIRC, the standby will indefinitely attempt to connect for replication,
    > even
    > without restore_command configured.
    >
    
    That's correct. However, right now it just results in a failed attempt to
    open the WAL segment in pg_wal, which is cheap.
    Calling restore_command is more expensive, so the impact on resource usage
    will be bigger.
    
    Regards,
    --
    Alexander Kukushkin