RE: Re:RE: Re:RE: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder()
From: "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>
To: 'ocean_li_996' <ocean_li_996@163.com>
Cc: 'Alexander Lakhin' <exclusion@gmail.com>, "pgsql-bugs@lists.postgresql.org" <pgsql-bugs@lists.postgresql.org>, "feichanghong@qq.com" <feichanghong@qq.com>, "amit.kapila16@gmail.com" <amit.kapila16@gmail.com>
Date: 2024-03-12T10:22:59Z
Lists: pgsql-bugs
Commits
- Fix catalog lookup due to wrong snapshot for subtransactions during decoding. (6b77048e571d, 14.11, cited)
Dear Haiyang,
Thanks for checking! This reply still focuses only on "Issue 2" in your notation.
>## Issue 2
>Inspired by your spec case, I've reorganized the spec case provided in [2]. The new test in attachment
>is able to reproduce the issue mentioned in [1] even before commit 6b77048e5.
Good finding. I also confirmed that the workload can fail after reverting 6b77048e5,
and that the patch [1] fixes the workload as well.
permutation "s0_init" "s0_begin" "s0_savepoint" "s0_create_part1" "s0_savepoint_release"
"s2_init" "s1_checkpoint" "s1_get_changes" "s0_commit" "s2_get_changes"
## Analysis
The point is that a snapshot serialized by another replication slot can be reused.
When the first get_changes is called, a consistent snapshot can be serialized because
of the XLOG_RUNNING_XACTS record (see later).
The get_changes for the second slot reuses that snapshot, so it can read WAL records properly.
(If the first slot did not exist, the status of the snapshot would be
SNAPBUILD_BUILDING_SNAPSHOT, so no records would be read.)
In the second get_changes, the records below are read. The first (LOCK, RUNNING_XACTS)
pair is generated by the slot creation, and the second pair comes from the
CHECKPOINT. I.e., it reads all records since the slot creation.
```
...lsn: 0/01906DB8, prev 0/01906D58, desc: LOCK ...
...lsn: 0/01906DF0, prev 0/01906DB8, desc: RUNNING_XACTS ...
...lsn: 0/01906E30, prev 0/01906DF0, desc: LOCK ...
...lsn: 0/01906E68, prev 0/01906E30, desc: RUNNING_XACTS ...
...lsn: 0/01906EA8, prev 0/01906E68, desc: CHECKPOINT_ONLINE ...
...lsn: 0/01906F20, prev 0/01906EA8, desc: COMMIT ... subxacts: 728; ... inval msgs: ...
```
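As a small illustration (not part of the original analysis), the LSNs above use the usual `high/low` hexadecimal notation, and each record's `prev` should point at the record before it. The Python sketch below parses the values copied from the excerpt and checks that chain; `parse_lsn` and `is_contiguous` are hypothetical helper names, not PostgreSQL functions.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/01906DB8' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# (lsn, prev, description) copied from the excerpt above.
RECORDS = [
    ("0/01906DB8", "0/01906D58", "LOCK"),
    ("0/01906DF0", "0/01906DB8", "RUNNING_XACTS"),
    ("0/01906E30", "0/01906DF0", "LOCK"),
    ("0/01906E68", "0/01906E30", "RUNNING_XACTS"),
    ("0/01906EA8", "0/01906E68", "CHECKPOINT_ONLINE"),
    ("0/01906F20", "0/01906EA8", "COMMIT"),
]


def is_contiguous(records):
    """True if every record's prev pointer equals the preceding record's LSN."""
    return all(
        parse_lsn(cur[1]) == parse_lsn(before[0])
        for before, cur in zip(records, records[1:])
    )
```

Calling `is_contiguous(RECORDS)` on the six records listed confirms that nothing between the first LOCK and the final COMMIT was skipped in the excerpt.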
Also, the final COMMIT record contains the info for a subtransaction and has the
XACT_XINFO_HAS_INVALS flag set, so DecodeCommit()->SnapBuildXidSetCatalogChanges()
is called for the transactions.
As a result, two ReorderBufferTXNs are created with the same LSN, which fails the
assertion in AssertTXNLsnOrder().
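To make the failure mode concrete, here is a toy Python model of the ordering check. It assumes the check requires the first LSNs of the tracked transactions to be strictly increasing. This is a simplified sketch, not PostgreSQL's actual C code: `Txn` and `lsn_order_ok` are hypothetical names, and the top-level xid 727 is invented for illustration (only the subxact 728 and the COMMIT LSN appear in the records above).

```python
from dataclasses import dataclass


@dataclass
class Txn:
    xid: int
    first_lsn: int  # LSN of the first record seen for this transaction


def lsn_order_ok(txns):
    """True if first_lsn is strictly increasing across the list."""
    return all(a.first_lsn < b.first_lsn for a, b in zip(txns, txns[1:]))


COMMIT_LSN = 0x01906F20  # LSN of the COMMIT record (0/01906F20)

# Both entries are created while decoding the same COMMIT record,
# so they share one first_lsn. The xid 727 is a hypothetical value.
created_at_commit = [Txn(727, COMMIT_LSN), Txn(728, COMMIT_LSN)]
```

In this model `lsn_order_ok(created_at_commit)` is false because the two entries are equal rather than strictly ordered, which mirrors the assertion failure described above.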
I will update the patch if the above analysis is correct.
>The approach in [3] is also LGFM.
Thanks. I agree that we should avoid relaxing the Assert() condition as much as possible.
[1]: https://www.postgresql.org/message-id/TYCPR01MB1207790E98F0A563280CC39FCF5262%40TYCPR01MB12077.jpnprd01.prod.outlook.com
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/