Re: WAL segments removed from primary despite the fact that logical replication slot needs it.

Masahiko Sawada <sawada.mshk@gmail.com>

From: Masahiko Sawada <sawada.mshk@gmail.com>
To: Andres Freund <andres@anarazel.de>
Cc: depesz@depesz.com, Amit Kapila <amit.kapila16@gmail.com>, pgsql-bugs mailing list <pgsql-bugs@postgresql.org>
Date: 2022-11-17T14:22:12Z
Lists: pgsql-bugs

Commits

  1. Fix a possibility of logical replication slot's restart_lsn going backwards.

On Thu, Nov 17, 2022 at 5:03 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-11-15 23:59:37 +0900, Masahiko Sawada wrote:
> > > Is something like the following scenario possible to happen?
> > >
> > > 1. wal sender updates slot's restart_lsn and releases the spin lock
> > > (not saved in the disk yet)
> > > 2. someone updates slots' minimum restart_lsn (note that slot's
> > > restart_lsn in memory is already updated).
>
> You mean ReplicationSlotsComputeRequiredLSN(), or update that specific slot's
> restart_lsn? The latter shouldn't happen.

I meant the former.

>
>
> > > 3. checkpointer removes WAL files older than the minimum restart_lsn
> > > calculated at step 2.
>
> For xmin we have protection against that via the split between
> catalog_xmin/effective_catalog_xmin. We should probably mirror that for
> restart_lsn as well.
>
> We should also call ReplicationSlotsComputeRequiredLSN if only update_restart
> is true...

Agreed.
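
To illustrate the split suggested above, here is a minimal toy model in Python. It is not PostgreSQL code: the class `SplitSlot`, the field `effective_restart_lsn`, and the helper `required_lsn` are hypothetical names chosen to mirror the catalog_xmin/effective_catalog_xmin pattern. The assumption is that WAL removal only looks at a value that has already been saved to disk.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SplitSlot:
    """Hypothetical toy slot, not a PostgreSQL struct."""
    restart_lsn: int            # in-memory value, may run ahead of disk
    saved_restart_lsn: int      # value last written to disk
    effective_restart_lsn: int  # hypothetical: the value WAL removal may rely on

    def advance(self, lsn: int) -> None:
        # walsender moves restart_lsn forward in memory only.
        self.restart_lsn = max(self.restart_lsn, lsn)

    def save(self) -> None:
        # Only after the slot is on disk does the effective value move.
        self.saved_restart_lsn = self.restart_lsn
        self.effective_restart_lsn = self.saved_restart_lsn

    def crash_restart(self) -> None:
        # After a crash, only the on-disk state survives.
        self.restart_lsn = self.saved_restart_lsn
        self.effective_restart_lsn = self.saved_restart_lsn


def required_lsn(slots: Iterable[SplitSlot]) -> int:
    """Oldest LSN that must be kept, computed from the effective values."""
    return min(s.effective_restart_lsn for s in slots)
```

In this model, advancing a slot from 100 to 200 without saving leaves `required_lsn` at 100, so a crash-restart that falls back to 100 still finds its WAL. Only after `save()` does the required LSN move to 200.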

>
>
> > > 4. wal sender restarts for some reason (or server crashed).
>
> I don't think walsender alone restarting should change anything, but
> crash-restart obviously would.

Right. I've confirmed that this scenario can happen with a crash-restart.
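
The four steps quoted above can be replayed in a small toy simulation. This is a sketch in Python, not PostgreSQL code: `ToySlot`, `run_scenario`, and the LSN values 100 and 200 are made up for illustration, and WAL removal is reduced to a single "oldest LSN kept" number.

```python
from dataclasses import dataclass


@dataclass
class ToySlot:
    """Hypothetical stand-in for a replication slot (not a PostgreSQL struct)."""
    restart_lsn: int        # in-memory value, updated by the walsender
    saved_restart_lsn: int  # value last written to disk


def run_scenario() -> dict:
    slot = ToySlot(restart_lsn=100, saved_restart_lsn=100)
    oldest_wal_kept = 0  # WAL before this LSN has been removed

    # 1. walsender advances restart_lsn in memory; it is not saved to disk yet.
    slot.restart_lsn = 200

    # 2. The minimum restart_lsn is recomputed from the in-memory value.
    required_lsn = slot.restart_lsn

    # 3. checkpointer removes WAL older than that minimum.
    oldest_wal_kept = max(oldest_wal_kept, required_lsn)

    # 4. Crash-restart: the slot is reloaded from its on-disk state.
    slot.restart_lsn = slot.saved_restart_lsn

    return {
        "restart_lsn": slot.restart_lsn,
        "oldest_wal_kept": oldest_wal_kept,
        "wal_missing": slot.restart_lsn < oldest_wal_kept,
    }


if __name__ == "__main__":
    print(run_scenario())
```

After step 4 the toy slot is back at LSN 100 while everything before 200 has been removed, which corresponds to the situation in the subject line: the slot needs WAL that no longer exists.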

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com