Thread
-
Requested WAL segment xxx has already been removed
Japin Li <japinli@hotmail.com> — 2025-07-14T08:08:06Z
Hi all,

I recently hit an error with our streaming replication setup:

2025-07-14 11:52:59.361 CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14 11:52:59 CST,3/0,0,ERROR,58P01,"requested WAL segment 00000001000000000000000C has already been removed",,,,,,"START_REPLICATION 0/C000000 TIMELINE 1",,,"standby","walsender",,0

It appears the requested WAL segment 00000001000000000000000C had already been archived, and I confirmed its presence in the archive directory. However, when the standby tried to request this file, the primary only searched for it in pg_wal and didn't check the archive directory. I had to manually copy the segment into pg_wal to get streaming replication working again.

My question is: Can we make the primary automatically search the archive if restore_command is set?

I found that Fujii Masao also requested this feature [1], but it seems there wasn't a consensus.

I've attached a script to reproduce this issue.

[1] https://www.postgresql.org/message-id/AANLkTinN%3DxsPOoaXzVFSp1OkfMDAB1f_d-F91xjEZDV8%40mail.gmail.com

--
Regards,
Japin Li
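For readers matching the log line to a file in the archive: the segment name in the error follows from the timeline and the LSN in the START_REPLICATION command. The sketch below shows that mapping. It assumes the default 16 MB WAL segment size (a cluster initialized with a different size needs a different `segment_size`), and the function name `wal_file_name` is made up for this illustration.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default segment size


def wal_file_name(timeline: int, lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Map a timeline ID and an LSN such as '0/C000000' to a WAL segment file name."""
    high, low = lsn.split("/")
    # An LSN is a 64-bit byte position, printed as two 32-bit hex halves.
    position = (int(high, 16) << 32) | int(low, 16)
    segment_number = position // segment_size
    # The name splits the segment number into a "log" part and a "seg" part.
    segments_per_log = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_log,
        segment_number % segments_per_log,
    )


print(wal_file_name(1, "0/C000000"))  # 00000001000000000000000C
```

With timeline 1 and LSN 0/C000000 this gives the segment named in the error message above.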
-
Re: Requested WAL segment xxx has already been removed
Alexander Kukushkin <cyberdemn@gmail.com> — 2025-07-14T08:21:02Z
On Mon, 14 Jul 2025 at 10:08, Japin Li <japinli@hotmail.com> wrote:
> Hi all,
>
> I recently hit an error with our streaming replication setup:
>
> 2025-07-14 11:52:59.361 CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14 11:52:59 CST,3/0,0,ERROR,58P01,"requested WAL segment 00000001000000000000000C has already been removed",,,,,,"START_REPLICATION 0/C000000 TIMELINE 1",,,"standby","walsender",,0
>
> My question is: Can we make the primary automatically search the archive if
> restore_command is set?

If we are talking about physical replication, then restore_command could just as well be used on the standby (and, more importantly, it should be). The main question here is: why wasn't the standby properly configured?

With logical replication, however, it is a different story, and it would be really great if restore_command were used to fetch WAL when it is missing.

Regards,
--
Alexander Kukushkin
-
Re: Requested WAL segment xxx has already been removed
Japin Li <japinli@hotmail.com> — 2025-07-14T09:23:54Z
On Mon, 14 Jul 2025 at 10:21, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> On Mon, 14 Jul 2025 at 10:08, Japin Li <japinli@hotmail.com> wrote:
>
> > Hi all,
> >
> > I recently hit an error with our streaming replication setup:
> >
> > 2025-07-14 11:52:59.361 CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14 11:52:59 CST,3/0,0,ERROR,58P01,"requested WAL segment 00000001000000000000000C has already been removed",,,,,,"START_REPLICATION 0/C000000 TIMELINE 1",,,"standby","walsender",,0
> >
> > My question is: Can we make the primary automatically search the archive if
> > restore_command is set?
>
> If we talk about physical replication, then with the same success restore_command could be (and more important, it should
> be) used on a standby.

Yes, I'm referring to physical replication.

> And the main question here is why standby wasn't properly configured?

The configuration is as expected. My test script simulates two distinct hosts by utilizing local archive storage.

For physical replication across distinct hosts without shared WAL archive storage, WALs are archived locally (in my test).

When the primary's walsender needs a WAL file that is in the archive but no longer in its pg_wal directory, the file has to be copied manually into the primary's pg_wal or the standby's pg_wal (or into its archive directory, to be fetched with restore_command).

What prevents us from using the primary's restore_command to retrieve the necessary WALs?

> However, with logical replication it is a different story, and it would be really great if restore_command is used when
> WAL's are missing to fetch it.

--
Regards,
Japin Li
-
Re: Requested WAL segment xxx has already been removed
Fujii Masao <masao.fujii@oss.nttdata.com> — 2025-07-14T11:33:36Z
On 2025/07/14 17:08, Japin Li wrote:
> Hi all,
>
> I recently hit an error with our streaming replication setup:
>
> 2025-07-14 11:52:59.361 CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14 11:52:59 CST,3/0,0,ERROR,58P01,"requested WAL segment 00000001000000000000000C has already been removed",,,,,,"START_REPLICATION 0/C000000 TIMELINE 1",,,"standby","walsender",,0
>
> It appears the requested WAL segment 00000001000000000000000C had already been
> archived, and I confirmed its presence in the archive directory. However, when
> the standby tried to request this file, the primary only searched for it in
> pg_wal and didn't check the archive directory. I had to manually copy the
> segment into pg_wal to get streaming replication working again.
>
> My question is: Can we make the primary automatically search the archive if
> restore_command is set?
>
> I found that Fujii Masao also requested this feature [1], but it seems there
> wasn't a consensus.

Yeah, I still like this idea. It's useful, for example, when we want to temporarily retain WAL files, such as during planned standby maintenance, to avoid the "requested WAL segment ... removed" error.

Using a replication slot is one way to retain WAL files in pg_wal, but it requires the pg_wal directory to be large enough to hold all WAL generated during that time, which isn't always practical.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation
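To put a rough number on the pg_wal sizing concern, the estimate below multiplies an average WAL generation rate by the length of the standby outage. It is only back-of-the-envelope arithmetic: the rate and outage length are hypothetical inputs that would have to be measured for a real workload, the 16 MB segment size is the assumed default, and `wal_retention_estimate` is a name invented for this sketch.

```python
def wal_retention_estimate(wal_bytes_per_second: int,
                           outage_seconds: int,
                           segment_size: int = 16 * 1024 * 1024):
    """Return (total_bytes, segment_count) of WAL produced during an outage."""
    total_bytes = wal_bytes_per_second * outage_seconds
    # Round up: a partially filled segment still occupies a whole file.
    segment_count = -(-total_bytes // segment_size)
    return total_bytes, segment_count


# Hypothetical workload: 1 MiB/s of WAL while a standby is down for three days.
total, segments = wal_retention_estimate(1024 * 1024, 3 * 24 * 3600)
print(f"{total / 1024**3:.1f} GiB in {segments} segments")
```

Even a modest sustained rate adds up to hundreds of gigabytes over a multi-day outage, which is the space a replication slot would pin in pg_wal.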
-
Re: Requested WAL segment xxx has already been removed
Japin Li <japinli@hotmail.com> — 2025-07-15T06:10:44Z
On Mon, 14 Jul 2025 at 20:33, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2025/07/14 17:08, Japin Li wrote:
>> Hi all,
>>
>> I recently hit an error with our streaming replication setup:
>>
>> 2025-07-14 11:52:59.361 CST,"replicator","",728458,"10.9.9.74:35724",68747f1b.b1d8a,1,"START_REPLICATION",2025-07-14 11:52:59 CST,3/0,0,ERROR,58P01,"requested WAL segment 00000001000000000000000C has already been removed",,,,,,"START_REPLICATION 0/C000000 TIMELINE 1",,,"standby","walsender",,0
>>
>> It appears the requested WAL segment 00000001000000000000000C had already been
>> archived, and I confirmed its presence in the archive directory. However, when
>> the standby tried to request this file, the primary only searched for it in
>> pg_wal and didn't check the archive directory. I had to manually copy the
>> segment into pg_wal to get streaming replication working again.
>>
>> My question is: Can we make the primary automatically search the archive if
>> restore_command is set?
>>
>> I found that Fujii Masao also requested this feature [1], but it seems there
>> wasn't a consensus.
>
> Yeah, I still like this idea. It's useful, for example, when we want to
> temporarily retain WAL files, such as during planned standby maintenance,
> to avoid "requested WAL segment ... removed." error.
>
> Using a replication slot is one way to retain WAL files in pg_wal,
> but it requires the pg_wal directory to be large enough to hold all
> WAL generated during that time, which isn't always practical.

Agreed. Here is a patch that fixes this.

--
Regards,
Japin Li
-
Re: Requested WAL segment xxx has already been removed
wenhui qiu <qiuwenhuifx@gmail.com> — 2025-07-15T08:41:56Z
Hi Japin,

Thank you for working on this. It is useful: by the time a standby node with a hardware issue has been repaired, the WAL it needs has usually already been archived. The wal_keep_size parameter is difficult to estimate accurately, as hardware repair or replacement times are often unpredictable. If the machine can be fixed in a few days, the archived WAL files are likely still available in the archive directory. One small regret is that PostgreSQL currently lacks a speed limit for sending WAL.

Thanks
-
Re: Requested WAL segment xxx has already been removed
Alexander Kukushkin <cyberdemn@gmail.com> — 2025-07-15T09:24:35Z
Hi,

On Mon, 14 Jul 2025 at 11:24, Japin Li <japinli@hotmail.com> wrote:
> The configuration is as expected. My test script simulates two distinct hosts
> by utilizing local archive storage.
>
> For physical replication across distinct hosts without shared WAL archive
> storage, WALs are archived locally (in my test).
>
> When the primary's walsender needs a WAL file from the archive that's not in
> its pg_wal directory, manual copying is required to the primary's pg_wal or the
> standby's pg_wal (or its archive directory, and use restore_command to fetch it).
>
> What prevents us from using the primary's restore_command to retrieve the
> necessary WALs?

I am just talking about the practical side of local archive storage. Such archives will be gone along with the server in case of disaster, and therefore they bring only a little value. A physical standby can just as well use restore_command to copy files from the archive on the primary via ssh/rsync or similar. This approach has been used for ages and works just fine.

What is really painful right now is that logical walsenders can only look into pg_wal, and unfortunately replication slots don't give a 100% guarantee of WAL retention because of max_slot_wal_keep_size. That is, using restore_command for logical walsenders would be really helpful and would solve some problems and pain points with logical replication.

However, if we also start calling restore_command for physical walsenders, it might result in increased resource usage on the primary without providing much additional value. For example, restore_command could keep failing while the standby indefinitely continues making replication connection attempts.

I don't mind if it also works for physical replication, but IMO there should be a possibility to opt out of it.

Regards,
--
Alexander Kukushkin
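The lookup order under discussion (pg_wal first, then the archive via restore_command, with a way to opt out) can be sketched as follows. This is a standalone illustration, not the patch posted in this thread and not PostgreSQL code: `locate_segment`, `expand_restore_command`, the `fetch_from_archive` switch, and the scratch directory are all hypothetical names and choices made for the example. The only PostgreSQL behaviour it relies on is that restore_command is a shell command in which `%f` stands for the WAL file name, `%p` for the destination path, and `%%` for a literal percent sign; other placeholders are not handled here.

```python
import os
import subprocess
from typing import Optional


def expand_restore_command(template: str, wal_name: str, dest_path: str) -> str:
    """Substitute %f (file name), %p (destination path) and %% (literal %)."""
    substitutions = {"f": wal_name, "p": dest_path, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template) and template[i + 1] in substitutions:
            out.append(substitutions[template[i + 1]])
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


def locate_segment(wal_name: str,
                   pg_wal_dir: str,
                   scratch_dir: str,
                   restore_command: str = "",
                   fetch_from_archive: bool = False) -> Optional[str]:
    """Return a path to the segment, or None if it cannot be found."""
    local_path = os.path.join(pg_wal_dir, wal_name)
    if os.path.exists(local_path):
        return local_path  # cheap case: the segment is still in pg_wal
    if not fetch_from_archive or not restore_command:
        return None  # opted out (hypothetical switch) or nothing configured
    # Expensive case: spawn a shell command to pull the file from the archive.
    dest_path = os.path.join(scratch_dir, wal_name)
    command = expand_restore_command(restore_command, wal_name, dest_path)
    result = subprocess.run(command, shell=True)
    if result.returncode == 0 and os.path.exists(dest_path):
        return dest_path
    return None
```

With `fetch_from_archive=False` the function behaves like the current pg_wal-only lookup, which is one way to express the opt-out asked for above. Every miss with the switch enabled costs a process spawn, which is the resource concern raised for standbys that keep reconnecting.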
-
Re: Requested WAL segment xxx has already been removed
wenhui qiu <qiuwenhuifx@gmail.com> — 2025-07-15T10:01:46Z
Hi,

> What is really painful right now, logical walsenders can only look into pg_wal, and unfortunately replication slots don't give 100% guarantee for WAL
> retention because of max_slot_wal_keep_size.
> That is, using restore_command for logical walsenders would be really helpful and solve some problems and pain points with logical replication.

restore_command has to be implemented with the help of ssh or NFS shared storage, and in most companies security audit requirements make it impossible to establish ssh mutual trust. It would be very convenient if this feature were implemented.

Thanks
-
Re: Requested WAL segment xxx has already been removed
Japin Li <japinli@hotmail.com> — 2025-07-15T10:07:56Z
On Tue, 15 Jul 2025 at 11:24, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> Hi,
>
> On Mon, 14 Jul 2025 at 11:24, Japin Li <japinli@hotmail.com> wrote:
>
> > The configuration is as expected. My test script simulates two distinct hosts
> > by utilizing local archive storage.
> >
> > For physical replication across distinct hosts without shared WAL archive
> > storage, WALs are archived locally (in my test).
> >
> > When the primary's walsender needs a WAL file from the archive that's not in
> > its pg_wal directory, manual copying is required to the primary's pg_wal or the
> > standby's pg_wal (or its archive directory, and use restore_command to fetch it).
> >
> > What prevents us from using the primary's restore_command to retrieve the
> > necessary WALs?
>
> I am just talking about the practical side of local archive storage.

Yes, it's quite niche in its usage.

> Such archives will be gone along with the server in case of disaster and therefore they bring only a little value.
> With the same success, physical standby can use restore_command to copy files from the archive on the primary via
> ssh/rsync or similar. This approach is used for ages and works just fine.

However, some environments might prohibit password-free scp or the use of shared directories.

> What is really painful right now, logical walsenders can only look into pg_wal, and unfortunately replication slots don't
> give 100% guarantee for WAL retention because of max_slot_wal_keep_size.
> That is, using restore_command for logical walsenders would be really helpful and solve some problems and pain points
> with logical replication.

I agree; this feature offers greater value for logical walsenders than for physical ones.

> However, if we start calling restore_command also for physical walsenders it might result in increased resource usage on
> primary without providing much additional value. For example, restore_command is failing, but standby indefinitely
> continues making replication connection attempts.

IIRC, the standby will indefinitely attempt to connect for replication, even without restore_command configured.

> I don't mind if it will also work for physical replication, but IMO there should be a possibility to opt out from it.

--
Regards,
Japin Li
-
Re: Requested WAL segment xxx has already been removed
Alexander Kukushkin <cyberdemn@gmail.com> — 2025-07-15T10:12:01Z
On Tue, 15 Jul 2025 at 12:08, Japin Li <japinli@hotmail.com> wrote:
> IIRC, the standby will indefinitely attempt to connect for replication, even
> without restore_command configured.

That's correct. However, right now each attempt just results in trying to open the WAL segment in pg_wal and failing, which is cheap. Calling restore_command is more expensive, and therefore the impact on resource usage will be bigger.

Regards,
--
Alexander Kukushkin