Re: WAL segments removed from primary despite the fact that logical replication slot needs it.
Amit Kapila <amit.kapila16@gmail.com>
From: Amit Kapila <amit.kapila16@gmail.com>
To: depesz@depesz.com
Cc: pgsql-bugs mailing list <pgsql-bugs@postgresql.org>
Date: 2022-11-14T13:11:59Z
Lists: pgsql-bugs
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Fix a possibility of logical replication slot's restart_lsn going backwards.
- e5ed873b1b4a 18.0 landed
- 568e78a653ee 17.2 landed
- f353911337cf 16.6 landed
- 91771b3fbbc3 15.10 landed
- 26c4e8968690 14.15 landed
- 15dc1abb17dd 13.18 landed
On Sun, Nov 13, 2022 at 1:36 PM hubert depesz lubaczewski <depesz@depesz.com> wrote: > > On Fri, Nov 11, 2022 at 03:50:40PM +0100, hubert depesz lubaczewski wrote: > > #v+ > > 2022-11-11 12:45:26.432 UTC,,,994963,,636e43e6.f2e93,2,,2022-11-11 12:45:26 UTC,6/0,0,ERROR,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 000000010001039D00000083 has already been removed",,,,,,,,,"","logical replication worker",,0 > > #v- > > Sooo... plot thickens. > > Without any changes, manual rebuild or anything, yesterday, the problem > seems to have solved itself?! > > In logs on focal/pg14 I see: > > #v+ > 2022-11-12 20:55:39.190 UTC,,,1897563,,6370084b.1cf45b,2,,2022-11-12 20:55:39 UTC,6/0,0,ERROR,08P01,"could not receive data from WAL stream: ERROR: requested WAL segment 000000010001039D00000083 has already been removed",,,,,,,,,"","logical replication worker",,0 > #v- > > And this is *the last* such message. > > On bionic/pg12 we have in logs from pg_replication_slots: > > #v+ > timestamp pg_current_wal_lsn slot_name plugin slot_type datoid database temporary active active_pid xmin catalog_xmin restart_lsn confirmed_flush_lsn > 2022-11-12 20:51:00 UTC 1041E/D3A0E540 focal14 pgoutput logical 16607 canvas f f \N \N 3241443528 1039D/83825958 1039D/96453F38 > 2022-11-12 20:51:59 UTC 1041E/D89B6000 focal14 pgoutput logical 16607 canvas f f \N \N 3241443528 1039D/83825958 1039D/96453F38 > 2022-11-12 20:52:58 UTC 1041E/E0547450 focal14 pgoutput logical 16607 canvas f f \N \N 3241443528 1039D/83825958 1039D/96453F38 > 2022-11-12 20:53:58 UTC 1041E/E59634F0 focal14 pgoutput logical 16607 canvas f f \N \N 3241443528 1039D/83825958 1039D/96453F38 > 2022-11-12 20:54:57 UTC 1041E/EBB50DE8 focal14 pgoutput logical 16607 canvas f f \N \N 3241443528 1039D/83825958 1039D/96453F38 > 2022-11-12 20:55:55 UTC 1041E/FBBC3160 focal14 pgoutput logical 16607 canvas f t 18626 \N 3241450490 1039D/9170B010 1039D/9B86EAF0 > ... ... From the last two lines above, it is clear why it started working. The restart_lsn has advanced from 1039D/83825958 to 1039D/9170B010 which means the system no longer needs data from 000000010001039D00000083. -- With Regards, Amit Kapila.