Re: Improve pg_sync_replication_slots() to wait for primary to advance
shveta malik <shveta.malik@gmail.com>
From: shveta malik <shveta.malik@gmail.com>
To: Ajin Cherian <itsajin@gmail.com>
Cc: PostgreSQL mailing lists <pgsql-hackers@postgresql.org>,
shveta malik <shveta.malik@gmail.com>
Date: 2025-07-09T04:53:40Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Enhance slot synchronization API to respect promotion signal.
- 4bed04d39566 17.10 landed
- 94efd308bcec 18.4 landed
- 1362bc33e025 19 (unreleased) landed
-
Fix inconsistent elevel in pg_sync_replication_slots() retry logic.
- f1ddaa15357f 19 (unreleased) landed
-
Refactor slot synchronization logic in slotsync.c.
- 788ec96d591d 19 (unreleased) landed
-
Fix intermittent BF failure in 040_standby_failover_slots_sync.
- b47c50e5667b 19 (unreleased) landed
-
Add retry logic to pg_sync_replication_slots().
- 0d2d4a0ec3ec 19 (unreleased) landed
-
Fix LOCK_TIMEOUT handling in slotsync worker.
- 04396eacd3fa 19 (unreleased) cited
-
Add slotsync skip statistics.
- 76b78721ca49 19 (unreleased) cited
Please find few more comments:
1)
In pg_sync_replication_slots() doc, we have this:
"Note that this function is primarily intended for testing and
debugging purposes and should be used with caution. Additionally, this
function cannot be executed if ...."
We can get rid of this info as well and change to:
"Note that this function cannot be executed if...."
2)
We got rid of NOTE in logicaldecoding.sgml, but now the page does not
mention pg_sync_replication_slots() at all. We need to bring back the
change removed by [1] (or something on similar line) which is this:
- <command>CREATE SUBSCRIPTION</command> during slot creation, and
then calling
- <link linkend="pg-sync-replication-slots">
- <function>pg_sync_replication_slots</function></link>
- on the standby. By setting <link linkend="guc-sync-replication-slots">
+ <command>CREATE SUBSCRIPTION</command> during slot creation.
+ Additionally, enabling <link linkend="guc-sync-replication-slots">
+ <varname>sync_replication_slots</varname></link> on the standby
+ is required. By enabling <link linkend="guc-sync-replication-slots">
3)
wait_for_primary_slot_catchup():
+ /*
+ * It is possible to get null values for confirmed_lsn and
+ * catalog_xmin if on the primary server the slot is just created with
+ * a valid restart_lsn and slot-sync worker has fetched the slot
+ * before the primary server could set valid confirmed_lsn and
+ * catalog_xmin.
+ */
Do we need this special handling? We already have one such handling in
synchronize_slots(). please see:
/*
* If restart_lsn, confirmed_lsn or catalog_xmin is
invalid but the
* slot is valid, that means we have fetched the
remote_slot in its
* RS_EPHEMERAL state. In such a case, don't sync it;
we can always
* sync it in the next sync cycle when the remote_slot
is persisted
* and has valid lsn(s) and xmin values.
*/
if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) ||
XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) ||
!TransactionIdIsValid(remote_slot->catalog_xmin)) &&
remote_slot->invalidated == RS_INVAL_NONE)
pfree(remote_slot);
Due to the above check in synchronize_slots(), we will not reach
wait_for_primary_slot_catchup() when any of confirmed_lsn or
catalog_xmin is not initialized.
[1]: https://www.postgresql.org/message-id/CAJpy0uAD_La2vi%2BB%2BiSBbCYTMayMstvbF9ndrAJysL9t5fHtbQ%40mail.gmail.com
thanks
Shveta