[bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber
From: "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>
To: "'pgsql-hackers@lists.postgresql.org'" <pgsql-hackers@lists.postgresql.org>
Cc: Amit Kapila <amit.kapila16@gmail.com>, "shveta.malik@gmail.com" <shveta.malik@gmail.com>
Date: 2024-08-08T05:07:18Z
Lists: pgsql-hackers
Commits
- Don't advance origin during apply failure.
  - 1528b0d899d6 19 (unreleased) landed
  - 2f7ffe124a9b 18.2 landed
  - 63a65adf4d8e 16.12 landed
  - 0ed8f1afb15f 17.8 landed
  - 3f28b2fcac33 18.0 landed
  - 915aafe82a7c 17.0 landed
  - b39c5272c1d2 16.5 landed
- Perform apply of large transactions by parallel workers.
  - 216a784829c2 16.0 cited
Attachments
- v2-0001-Prevent-origin-progress-advancement-if-failed-to-.patch (application/octet-stream) patch v2-0001
- test_2pc.sh (application/octet-stream)
Dear hackers,

This thread forks from [1] and can be used to discuss the second item. The part below repeats the statements written in [1]; I have copied them here just in case. The attached patch is almost the same as before, but slightly modified based on the comment from Amit [2]: an unrelated change has been removed.

Found issue
=====
When the subscriber enables two-phase commit but does not set max_prepared_transactions > 0, and a transaction is prepared on the publisher, the apply worker reports an ERROR on the subscriber. After that, the prepared transaction is not replayed, which means it is lost forever. The attached script can emulate the situation.

--
ERROR: prepared transactions are disabled
HINT: Set "max_prepared_transactions" to a nonzero value.
--

The reason is that we advance the origin progress when aborting the transaction as well (RecordTransactionAbort->replorigin_session_advance). So, after replorigin_session_origin_lsn has been set, if any ERROR happens while preparing the transaction, the transaction aborts, which incorrectly advances the origin LSN.

The easiest fix is to reset the session replication origin before calling RecordTransactionAbort(). I think this can happen when 1) LogicalRepApplyLoop() raises an ERROR or 2) the apply worker exits. The attached patch fixes the issue on HEAD.

[1]: https://www.postgresql.org/message-id/TYAPR01MB5692FA4926754B91E9D7B5F0F5AA2%40TYAPR01MB5692.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/CAA4eK1L-r8OKGdBwC6AeXSibrjr9xKsg8LjGpX_PDR5Go-A9TA%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED
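
The origin-advance problem described in this message can be illustrated with a small toy model. It is only a sketch, not PostgreSQL code and not the attached patch. The class `ToyApplyWorker`, its fields, and the helper `will_be_resent` are hypothetical names made up for illustration, and the rule "a change is resent only if its end LSN is past the recorded origin progress" is a deliberate simplification of how replay resumes.

```python
from dataclasses import dataclass

INVALID_LSN = 0  # stand-in for an unset LSN (assumption of this toy model)


@dataclass
class ToyApplyWorker:
    """Toy model of an apply worker's origin tracking (not PostgreSQL code)."""

    max_prepared_transactions: int
    reset_origin_on_error: bool             # True models the proposed fix
    session_origin_lsn: int = INVALID_LSN   # models replorigin_session_origin_lsn
    origin_progress: int = INVALID_LSN      # position replay resumes from

    def record_abort(self) -> None:
        # Models RecordTransactionAbort -> replorigin_session_advance:
        # progress moves forward whenever a session origin LSN is still set.
        if self.session_origin_lsn != INVALID_LSN:
            self.origin_progress = max(self.origin_progress,
                                       self.session_origin_lsn)

    def apply_prepare(self, end_lsn: int) -> bool:
        """Try to replay a PREPARE ending at end_lsn; return True on success."""
        self.session_origin_lsn = end_lsn
        try:
            if self.max_prepared_transactions == 0:
                raise RuntimeError("prepared transactions are disabled")
            self.origin_progress = end_lsn  # success: progress advances
            return True
        except RuntimeError:
            if self.reset_origin_on_error:
                # The fix: forget the origin LSN before the abort is recorded.
                self.session_origin_lsn = INVALID_LSN
            self.record_abort()
            return False
        finally:
            self.session_origin_lsn = INVALID_LSN


def will_be_resent(worker: ToyApplyWorker, end_lsn: int) -> bool:
    # Simplification: only changes past the recorded progress are replayed.
    return end_lsn > worker.origin_progress


buggy = ToyApplyWorker(max_prepared_transactions=0, reset_origin_on_error=False)
buggy.apply_prepare(100)
print("without reset, resent:", will_be_resent(buggy, 100))

fixed = ToyApplyWorker(max_prepared_transactions=0, reset_origin_on_error=True)
fixed.apply_prepare(100)
print("with reset, resent:", will_be_resent(fixed, 100))
```

In this model, the failed PREPARE still moves `origin_progress` forward when the origin LSN is left set, so the transaction is never retried. Resetting it before the abort keeps the progress where it was, so the PREPARE is sent again once max_prepared_transactions has been raised.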