Re: WAL segments removed from primary despite the fact that logical replication slot needs it.

From: Andres Freund <andres@anarazel.de>

To: Masahiko Sawada <sawada.mshk@gmail.com>

Cc: depesz@depesz.com, Amit Kapila <amit.kapila16@gmail.com>, pgsql-bugs mailing list <pgsql-bugs@postgresql.org>

Date: 2022-11-17T08:02:58Z

Lists: pgsql-bugs

Commits

Fix a possibility of logical replication slot's restart_lsn going backwards.
- e5ed873b1b4a 18.0 landed
- 568e78a653ee 17.2 landed
- f353911337cf 16.6 landed
- 91771b3fbbc3 15.10 landed
- 26c4e8968690 14.15 landed
- 15dc1abb17dd 13.18 landed

Hi,

On 2022-11-15 23:59:37 +0900, Masahiko Sawada wrote:
> > Is something like the following scenario possible to happen?
> >
> > 1. wal sender updates slot's restart_lsn and releases the spin lock
> > (not saved in the disk yet)
> > 2. someone updates slots' minimum restart_lsn (note that slot's
> > restart_lsn in memory is already updated).

Do you mean ReplicationSlotsComputeRequiredLSN(), or an update of that
specific slot's restart_lsn? The latter shouldn't happen.

> > 3. checkpointer removes WAL files older than the minimum restart_lsn
> > calculated at step 2.

For xmin we have protection against that via the split between
catalog_xmin/effective_catalog_xmin. We should probably mirror that for
restart_lsn as well.

We should also call ReplicationSlotsComputeRequiredLSN() when only
update_restart is true...

> > 4. wal sender restarts for some reason (or server crashed).

I don't think walsender alone restarting should change anything, but
crash-restart obviously would.
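
To make the scenario concrete, here is a rough, self-contained sketch of the
four steps and of what a catalog_xmin/effective_catalog_xmin style split
could look like for restart_lsn. It is a toy model, not PostgreSQL code:
SketchSlot, effective_restart_lsn, on_disk_restart_lsn and all sketch_*
functions are hypothetical names made up for illustration, and a plain
integer stands in for XLogRecPtr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr */

/* Hypothetical slot model, not PostgreSQL's ReplicationSlot. */
typedef struct SketchSlot
{
    Lsn restart_lsn;            /* in-memory value, may be ahead of disk */
    Lsn effective_restart_lsn;  /* assumed field: advanced only after a save */
    Lsn on_disk_restart_lsn;    /* what a crash-restart would reload */
} SketchSlot;

/* Step 1: the walsender advances the in-memory value only. */
static void
sketch_advance(SketchSlot *slot, Lsn lsn)
{
    assert(lsn >= slot->restart_lsn);   /* restart_lsn must not go backwards */
    slot->restart_lsn = lsn;
}

/* Persist the slot, then publish the saved value as the effective one. */
static void
sketch_save(SketchSlot *slot)
{
    slot->on_disk_restart_lsn = slot->restart_lsn;
    slot->effective_restart_lsn = slot->restart_lsn;
}

/* Step 2: the oldest LSN that WAL removal has to respect for this slot. */
static Lsn
sketch_required_lsn(const SketchSlot *slot, bool use_effective)
{
    return use_effective ? slot->effective_restart_lsn : slot->restart_lsn;
}

/*
 * Steps 3 and 4: assume the checkpointer removes everything below the
 * required LSN and the server then crashes. After restart the slot resumes
 * from the on-disk value, so that value must not be below the removal point.
 */
static bool
sketch_survives_crash(const SketchSlot *slot, bool use_effective)
{
    Lsn removed_below = sketch_required_lsn(slot, use_effective);

    return slot->on_disk_restart_lsn >= removed_below;
}
```

With a slot starting at {100, 100, 100}, calling sketch_advance(&slot, 200)
without a save makes sketch_survives_crash(&slot, false) return false (the
minimum was computed from the unsaved in-memory value), while
sketch_survives_crash(&slot, true) still returns true. After sketch_save()
both variants agree again.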

Greetings,

Andres Freund