Re: WAL segments removed from primary despite the fact that logical replication slot needs it.

hubert depesz lubaczewski <depesz@depesz.com>

From: hubert depesz lubaczewski <depesz@depesz.com>
To: Amit Kapila <amit.kapila16@gmail.com>
Cc: pgsql-bugs mailing list <pgsql-bugs@postgresql.org>
Date: 2022-11-14T12:17:03Z
Lists: pgsql-bugs

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Fix a possibility of logical replication slot's restart_lsn going backwards.

On Mon, Nov 14, 2022 at 05:08:15PM +0530, Amit Kapila wrote:
> > #v+
> > 2022-11-10 21:01:07.531 UTC,,,28925,,636378d6.70fd,4586739,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000082""",,,,,,,,,""
...
> > #v-
> >
> > So, the whole set of missing files was recycled at the same time.
> >
> > One more thing that I have is copy of pg_replication_slots info. This data is taken periodically, around every 1 minute. Each line, aside from pg_replication_slots slots data has also output of pg_current_wal_lsn() and timestamp. Looks like this:
> >
> > #v+
> > timestamp                pg_current_wal_lsn  slot_name  plugin    slot_type  datoid  database  temporary  active  active_pid  xmin  catalog_xmin  restart_lsn     confirmed_flush_lsn
> ...
> ...
> > 2022-11-10 21:02:52 UTC  103A7/B45ECEA8      focal14    pgoutput  logical    16607   canvas    f          f       \N          \N    3241434855    1039D/83825958  1039D/911A8DB0
> > 2022-11-10 21:03:52 UTC  103A7/C4355F60      focal14    pgoutput  logical    16607   canvas    f          t       21748       \N    3241443528    1039D/83825958  1039D/955633D0
> ...
> 
> I think from the above two it is clear that the slot with restart_lsn
> 1039D/83825958  is active and we seem to have recycled corresponding
> segment.
> 
> You may have the LOG for "attempting to remove WAL segments older than
> log file %s", if so, please share. Also on similar lines, I think we

Absolutely.

There is something weird happening:

#v+
2022-11-10 20:01:07.520 UTC,,,28925,,636378d6.70fd,4488667,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039B000000D9",,,,,,,,,""
2022-11-10 20:06:07.359 UTC,,,28925,,636378d6.70fd,4496840,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039B000000F8",,,,,,,,,""
2022-11-10 20:11:07.522 UTC,,,28925,,636378d6.70fd,4505099,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039C00000027",,,,,,,,,""
2022-11-10 20:16:07.466 UTC,,,28925,,636378d6.70fd,4513331,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039C00000054",,,,,,,,,""
2022-11-10 20:21:07.572 UTC,,,28925,,636378d6.70fd,4521466,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039C0000008F",,,,,,,,,""
2022-11-10 20:26:07.384 UTC,,,28925,,636378d6.70fd,4529518,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039C000000B9",,,,,,,,,""
2022-11-10 20:31:07.619 UTC,,,28925,,636378d6.70fd,4537868,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039C000000F8",,,,,,,,,""
2022-11-10 20:36:07.532 UTC,,,28925,,636378d6.70fd,4546079,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000002B",,,,,,,,,""
2022-11-10 20:41:07.603 UTC,,,28925,,636378d6.70fd,4554468,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D00000052",,,,,,,,,""
2022-11-10 20:46:07.609 UTC,,,28925,,636378d6.70fd,4562376,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000005F",,,,,,,,,""
2022-11-10 20:51:07.513 UTC,,,28925,,636378d6.70fd,4570523,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000005F",,,,,,,,,""
2022-11-10 20:56:07.727 UTC,,,28925,,636378d6.70fd,4578553,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D00000070",,,,,,,,,""
2022-11-10 21:01:07.509 UTC,,,28925,,636378d6.70fd,4586721,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D00000082",,,,,,,,,""
2022-11-10 21:06:07.521 UTC,,,28925,,636378d6.70fd,4594902,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:11:07.402 UTC,,,28925,,636378d6.70fd,4602683,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:16:07.837 UTC,,,28925,,636378d6.70fd,4610577,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:21:07.571 UTC,,,28925,,636378d6.70fd,4618490,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:26:07.559 UTC,,,28925,,636378d6.70fd,4626365,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:31:07.649 UTC,,,28925,,636378d6.70fd,4634592,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:36:07.737 UTC,,,28925,,636378d6.70fd,4642829,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:41:07.278 UTC,,,28925,,636378d6.70fd,4650983,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:46:07.670 UTC,,,28925,,636378d6.70fd,4658973,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:51:07.579 UTC,,,28925,,636378d6.70fd,4666946,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:56:07.618 UTC,,,28925,,636378d6.70fd,4674922,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
#v-

If I'll add "recycled" lines we will get:

#v+
...
2022-11-10 21:01:07.509 UTC,,,28925,,636378d6.70fd,4586721,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D00000082",,,,,,,,,""
2022-11-10 21:01:07.512 UTC,,,28925,,636378d6.70fd,4586722,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000071""",,,,,,,,,""
2022-11-10 21:01:07.514 UTC,,,28925,,636378d6.70fd,4586723,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000072""",,,,,,,,,""
2022-11-10 21:01:07.515 UTC,,,28925,,636378d6.70fd,4586724,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000073""",,,,,,,,,""
2022-11-10 21:01:07.517 UTC,,,28925,,636378d6.70fd,4586725,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000074""",,,,,,,,,""
2022-11-10 21:01:07.517 UTC,,,28925,,636378d6.70fd,4586726,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000075""",,,,,,,,,""
2022-11-10 21:01:07.519 UTC,,,28925,,636378d6.70fd,4586727,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000076""",,,,,,,,,""
2022-11-10 21:01:07.520 UTC,,,28925,,636378d6.70fd,4586728,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000077""",,,,,,,,,""
2022-11-10 21:01:07.521 UTC,,,28925,,636378d6.70fd,4586729,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000078""",,,,,,,,,""
2022-11-10 21:01:07.522 UTC,,,28925,,636378d6.70fd,4586730,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000079""",,,,,,,,,""
2022-11-10 21:01:07.523 UTC,,,28925,,636378d6.70fd,4586731,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D0000007A""",,,,,,,,,""
2022-11-10 21:01:07.524 UTC,,,28925,,636378d6.70fd,4586732,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D0000007B""",,,,,,,,,""
2022-11-10 21:01:07.524 UTC,,,28925,,636378d6.70fd,4586733,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D0000007C""",,,,,,,,,""
2022-11-10 21:01:07.525 UTC,,,28925,,636378d6.70fd,4586734,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D0000007D""",,,,,,,,,""
2022-11-10 21:01:07.527 UTC,,,28925,,636378d6.70fd,4586735,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D0000007E""",,,,,,,,,""
2022-11-10 21:01:07.528 UTC,,,28925,,636378d6.70fd,4586736,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D0000007F""",,,,,,,,,""
2022-11-10 21:01:07.529 UTC,,,28925,,636378d6.70fd,4586737,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000080""",,,,,,,,,""
2022-11-10 21:01:07.530 UTC,,,28925,,636378d6.70fd,4586738,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000081""",,,,,,,,,""
2022-11-10 21:01:07.531 UTC,,,28925,,636378d6.70fd,4586739,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000082""",,,,,,,,,""
2022-11-10 21:06:07.521 UTC,,,28925,,636378d6.70fd,4594902,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:06:07.523 UTC,,,28925,,636378d6.70fd,4594903,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000083""",,,,,,,,,""
2022-11-10 21:06:07.524 UTC,,,28925,,636378d6.70fd,4594904,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000084""",,,,,,,,,""
2022-11-10 21:06:07.525 UTC,,,28925,,636378d6.70fd,4594905,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000085""",,,,,,,,,""
2022-11-10 21:06:07.526 UTC,,,28925,,636378d6.70fd,4594906,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000086""",,,,,,,,,""
2022-11-10 21:06:07.527 UTC,,,28925,,636378d6.70fd,4594907,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000087""",,,,,,,,,""
2022-11-10 21:06:07.528 UTC,,,28925,,636378d6.70fd,4594908,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000088""",,,,,,,,,""
2022-11-10 21:06:07.529 UTC,,,28925,,636378d6.70fd,4594909,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D00000089""",,,,,,,,,""
2022-11-10 21:06:07.530 UTC,,,28925,,636378d6.70fd,4594910,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-ahead log file ""000000010001039D0000008A""",,,,,,,,,""
2022-11-10 21:11:07.402 UTC,,,28925,,636378d6.70fd,4602683,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:16:07.837 UTC,,,28925,,636378d6.70fd,4610577,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:21:07.571 UTC,,,28925,,636378d6.70fd,4618490,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:26:07.559 UTC,,,28925,,636378d6.70fd,4626365,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:31:07.649 UTC,,,28925,,636378d6.70fd,4634592,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:36:07.737 UTC,,,28925,,636378d6.70fd,4642829,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:41:07.278 UTC,,,28925,,636378d6.70fd,4650983,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:46:07.670 UTC,,,28925,,636378d6.70fd,4658973,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:51:07.579 UTC,,,28925,,636378d6.70fd,4666946,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
2022-11-10 21:56:07.618 UTC,,,28925,,636378d6.70fd,4674922,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to remove WAL segments older than log file 000000000001039D0000008A",,,,,,,,,""
#v-

> need to add some LOGs before calling RemoveOldXlogFiles() to know
> where our computation to remove files goes off. Something, like the
> attached, would be helpful but note I have added this quickly on my
> working branch to show you what I have in mind. You may need small
> changes based on the branch you are working on.

That will be complicated.

Changing pg (recompile, and rerun) requires careful preparation with our
customers. Which takes time.

Additionally we are using precompiled binaries from pgdg, that makes the
process slightly more involved.

And to top it off, as I mailed in the other message - the problem doesn't
happen always. For 8 clusters that I tried to prepare problem happened on two,
and on one of them (this one actually) it kinda solved on its own?!

I know I'm asking a lot, but is there any chance we could setup
conference/shared screen session to try to get as much data as possible
out of the situations we have now? Of course that would be paid
consulting. If you're OK with it, please mail me off list, so we can
arrange it.

Best regards,

depesz