Re: WAL segments removed from primary despite the fact that logical replication slot needs it.

From: Andres Freund <andres@anarazel.de>
To: Masahiko Sawada <sawada.mshk@gmail.com>
Cc: depesz@depesz.com, Amit Kapila <amit.kapila16@gmail.com>, pgsql-bugs mailing list <pgsql-bugs@postgresql.org>
Date: 2022-11-17T08:02:58Z
Lists: pgsql-bugs

Commits

  1. Fix a possibility of logical replication slot's restart_lsn going backwards.

Hi,

On 2022-11-15 23:59:37 +0900, Masahiko Sawada wrote:
> > Is something like the following scenario possible to happen?
> >
> > 1. wal sender updates slot's restart_lsn and releases the spin lock
> > (not saved in the disk yet)
> > 2. someone updates slots' minimum restart_lsn (note that slot's
> > restart_lsn in memory is already updated).

Do you mean ReplicationSlotsComputeRequiredLSN(), or an update of that specific
slot's restart_lsn? The latter shouldn't happen.


> > 3. checkpointer removes WAL files older than the minimum restart_lsn
> > calculated at step 2.

For xmin we have protection against that via the split between
catalog_xmin/effective_catalog_xmin. We should probably mirror that for
restart_lsn as well.
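
To make the idea concrete, here is a rough sketch of what mirroring that split
could look like for the scenario quoted in this mail. Everything in it is
hypothetical: SketchSlot, effective_restart_lsn, on_disk_restart_lsn and the
sketch_* functions are illustration-only names, not existing PostgreSQL
structures or functions, and locking is left out entirely. The only point is
that WAL removal looks at a value that advances after the slot has been saved.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* an LSN is a 64-bit WAL position */

/* Hypothetical, simplified slot; not PostgreSQL's ReplicationSlot struct. */
typedef struct SketchSlot
{
    XLogRecPtr restart_lsn;            /* newest value, set in memory first */
    XLogRecPtr effective_restart_lsn;  /* value WAL removal may rely on */
    XLogRecPtr on_disk_restart_lsn;    /* what a crash-restart reads back */
} SketchSlot;

/* Step 1: walsender advances the in-memory value only. */
void
sketch_advance(SketchSlot *slot, XLogRecPtr lsn)
{
    assert(lsn >= slot->restart_lsn);
    slot->restart_lsn = lsn;
}

/* Persist the slot, and only then let the effective value move forward. */
void
sketch_save(SketchSlot *slot)
{
    slot->on_disk_restart_lsn = slot->restart_lsn;
    slot->effective_restart_lsn = slot->restart_lsn;
}

/* Steps 2-3: the removal horizon is the minimum of the effective values. */
XLogRecPtr
sketch_required_lsn(const SketchSlot *slots, int nslots)
{
    XLogRecPtr  min = UINT64_MAX;

    for (int i = 0; i < nslots; i++)
    {
        if (slots[i].effective_restart_lsn < min)
            min = slots[i].effective_restart_lsn;
    }
    return min;
}

/* Step 4: after a crash only the on-disk value survives. */
void
sketch_crash_restart(SketchSlot *slot)
{
    slot->restart_lsn = slot->on_disk_restart_lsn;
    slot->effective_restart_lsn = slot->on_disk_restart_lsn;
}
```

With a slot at LSN 100, calling sketch_advance() to 200 without sketch_save()
leaves sketch_required_lsn() at 100, so a crash-restart that falls back to 100
still finds the WAL it needs.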

We should also call ReplicationSlotsComputeRequiredLSN() when only
update_restart is true...
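
Roughly, the control flow I have in mind looks like the sketch below. It is
not the actual code: the sketch_* functions and the counters are hypothetical
stand-ins, and I'm assuming two flags that record whether the slot's xmin or
its restart_lsn was advanced. The point is only that the required LSN gets
recomputed when either flag is set, not just the xmin one.

```c
#include <stdbool.h>

/* Hypothetical counters standing in for the real recomputation work. */
int xmin_recomputes = 0;
int lsn_recomputes = 0;

void
sketch_compute_required_xmin(void)
{
    xmin_recomputes++;
}

void
sketch_compute_required_lsn(void)
{
    lsn_recomputes++;
}

/* Meant to run after the slot has been written to disk. */
void
sketch_after_save(bool updated_xmin, bool updated_restart)
{
    /* The xmin horizon only needs recomputing if xmin moved. */
    if (updated_xmin)
        sketch_compute_required_xmin();

    /* The LSN horizon is recomputed if either value moved. */
    if (updated_xmin || updated_restart)
        sketch_compute_required_lsn();
}
```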


> > 4. wal sender restarts for some reason (or server crashed).

I don't think walsender alone restarting should change anything, but
crash-restart obviously would.

Greetings,

Andres Freund