Re: WAL segments removed from primary despite the fact that logical replication slot needs it.

From: Masahiko Sawada <sawada.mshk@gmail.com>
To: depesz@depesz.com
Cc: Amit Kapila <amit.kapila16@gmail.com>, PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Date: 2023-02-06T08:25:42Z
Lists: pgsql-bugs

Commits

  1. Fix a possibility of logical replication slot's restart_lsn going backwards.

Hi,

On Thu, Dec 8, 2022 at 8:13 PM hubert depesz lubaczewski
<depesz@depesz.com> wrote:
>
> Hi,
> just checking - has there been any progress on diagnosing/fixing the
> bug?

Sorry for the late response.

Based on our earlier analysis [1][2], I've put together a manual
scenario that reproduces this issue using the attached patch and script.

scenario.md explains the basic steps to reproduce this issue. It
consists of 13 steps (very tricky!!). It's not sophisticated and could
be improved. test.sh is the shell script I used to execute
reproduction steps 1 to 10. In my environment, I could reproduce
this issue with the following steps:

1. Apply the patch and build PostgreSQL.
2. Run test.sh.
3. Execute step 11 and the later steps described in scenario.md.

test.sh is a very hacky and dirty script that is tuned for my
environment (in particular, it adds many sleeps). You might need to
adjust it while following scenario.md.

I've also confirmed that this issue is fixed by the attached patch,
which clears candidate_restart_lsn and friends during
ReplicationSlotRelease().

[1] https://www.postgresql.org/message-id/CAA4eK1JvyWHzMwhO9jzPquctE_ha6bz3EkB3KE6qQJx63StErQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAD21AoBHMCEDcV4eBtSVvDrCN4SrMXanX5N9%2BL-E%2B4OWXYY0ew%40mail.gmail.com

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com