Re: WAL segments removed from primary despite the fact that logical replication slot needs it.

From: Amit Kapila <amit.kapila16@gmail.com>
To: depesz@depesz.com
Cc: pgsql-bugs mailing list <pgsql-bugs@postgresql.org>
Date: 2022-10-19T10:44:28Z
Lists: pgsql-bugs

Commits

  1. Fix a possibility of logical replication slot's restart_lsn going backwards.

On Wed, Oct 19, 2022 at 3:50 PM hubert depesz lubaczewski
<depesz@depesz.com> wrote:
>
> On Tue, Oct 18, 2022 at 04:57:52PM +0530, Amit Kapila wrote:
> > On Mon, Oct 17, 2022 at 2:43 PM hubert depesz lubaczewski
> > <depesz@depesz.com> wrote:
> > >
> > > On Sun, Oct 16, 2022 at 10:35:17AM +0530, Amit Kapila wrote:
> > > > > WAL file has been removed. Please note that the file was, as shown earlier, still within "restart_lsn" as visible on pg12/bionic.
> > > > This is quite strange and I am not able to see the reason why this can
> > > > happen. The only way to debug this problem that comes to mind is to
> > > > add more LOGS around the code that removes the WAL files. For example,
> > > > we can try to print the value of minimumslotLSN (keep) and logSegNo in
> > > > KeepLogSeg().
> > >
> > > That would require changing pg sources, I think, recompiling, and
> > > retrying?
> > >
> >
> > Yes. BTW, are you on the latest release of pg12, if not then you can
> > once check the release notes to see if there is any related bug fix in
> > the later releases?
> >
> > Is this problem reproducible? If so, can you find out why there are
> > multiple time connection issues between walsender and walreceiver?
>
> I can try to redo it, but before I do - anything I could do to either
> side of replication to increase our chances at figuring out the
> underlying problem? I can't restart pg12, though.
>

One idea is to change the log level to DEBUG2 so that we can print which
files are removed by the server via

RemoveOldXlogFiles()
{
    ...
    elog(DEBUG2, "attempting to remove WAL segments older than log file %s",
         lastoff);
    ...
}

If we can do this then at the very least we can know whether the
required files are removed by the server or by some external
application.
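
Once DEBUG2 output is being collected (log_min_messages should, as far as I
know, only need a reload rather than a restart), the server log can be checked
against the slot's restart_lsn. Below is a rough sketch of such a check, not a
tested tool. The function names are hypothetical, the 16MB segment size is the
default and is assumed here, and the rule that the timeline part of the file
name is ignored and that segments at or before the cutoff are removal
candidates is my reading of RemoveOldXlogFiles(), so please verify it against
your version.

```python
import re

# Message text taken from the elog() call quoted above. The prefix of each
# log line depends on log_line_prefix, so only the message itself is matched.
REMOVE_RE = re.compile(
    r"attempting to remove WAL segments older than log file ([0-9A-F]{24})"
)


def lsn_to_segment_suffix(lsn, wal_segment_size=16 * 1024 * 1024):
    """Return the last 16 hex digits of the WAL file name that holds `lsn`.

    `lsn` is text such as "1/2A000000". The default 16MB segment size is an
    assumption; pass another value if the cluster was initialized differently.
    """
    hi, lo = lsn.split("/")
    pos = (int(hi, 16) << 32) | int(lo, 16)
    segno = pos // wal_segment_size
    segs_per_id = 0x100000000 // wal_segment_size
    return "%08X%08X" % (segno // segs_per_id, segno % segs_per_id)


def find_cutoffs_covering(log_lines, restart_lsn):
    """Return log lines whose removal cutoff reaches the segment of restart_lsn.

    Assumption: the first 8 characters (timeline) of the logged file name are
    ignored, and a segment is a removal candidate when it sorts at or before
    the cutoff.
    """
    needed = lsn_to_segment_suffix(restart_lsn)
    hits = []
    for line in log_lines:
        m = REMOVE_RE.search(line)
        if m and m.group(1)[8:] >= needed:
            hits.append(line.rstrip("\n"))
    return hits
```

For example, find_cutoffs_covering(open("postgresql.log"), "1/2A000000")
(the log file name is only a placeholder) lists the checkpoints at which the
server itself considered the needed segment removable. If the file is gone and
no such line exists, that points towards something outside the server.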

-- 
With Regards,
Amit Kapila.