Thread
-
[BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding
cca5507 <cca5507@qq.com> — 2025-11-22T08:55:05Z
Hi,

While working on another historic snapshot bug in [1], I found the $subject.

Here is a test case, but we need to add some logging to SnapBuildSerialize() first:

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6e18baa33cb..6d13b2d811b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1523,6 +1523,19 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	/* consistent snapshots have no next phase */
 	Assert(builder->next_phase_at == InvalidTransactionId);
 
+	StringInfoData logbuf;
+	initStringInfo(&logbuf);
+	appendStringInfo(&logbuf, "SnapBuildSerialize: lsn: %X/%08X xmin: %u, xmax: %u, committed: ",
+					 LSN_FORMAT_ARGS(lsn), builder->xmin, builder->xmax);
+	for (size_t i = 0; i < builder->committed.xcnt; i++)
+	{
+		if (i > 0)
+			appendStringInfoString(&logbuf, ", ");
+		appendStringInfo(&logbuf, "%u", builder->committed.xip[i]);
+	}
+	elog(LOG, "%s", logbuf.data);
+	pfree(logbuf.data);
+
 	/*
 	 * We identify snapshots by the LSN they are valid for. We don't need to
 	 * include timelines in the name as each LSN maps to exactly one timeline

1) create table t (id int) with (user_catalog_table = true);
2) select pg_create_logical_replication_slot('s1', 'test_decoding');
3) select pg_create_logical_replication_slot('s2', 'test_decoding');
4) insert into t values (1);
5) select pg_replication_slot_advance('s1', pg_current_wal_lsn());
6) select pg_logical_slot_get_changes('s2', pg_current_wal_lsn(), null);

Then we will see log output like this:

LOG: SnapBuildSerialize: lsn: 0/017D1318 xmin: 768, xmax: 768, committed:
STATEMENT: select pg_replication_slot_advance('s1', pg_current_wal_lsn());
LOG: SnapBuildSerialize: lsn: 0/017D1318 xmin: 768, xmax: 769, committed: 768
STATEMENT: select pg_logical_slot_get_changes('s2', pg_current_wal_lsn(), null);

At the same LSN we get two different historic snapshots, and the first one (which is incorrect) is the one serialized to disk: slot s1 is advanced in fast-forward mode in step 5, before s2 decodes the same WAL in step 6.

The main reason is that we don't handle XLOG_HEAP2_NEW_CID during fast-forwarding, so the transaction that did the insert is not considered to have made a catalog change and its xid (768) is missing from the committed list.

Attached is a patch to fix it.

Looking forward to your reply.

[1] https://www.postgresql.org/message-id/tencent_21E152AD504A814C071EDF41A4DD7BA84D06%40qq.com

--
Regards,
ChangAo Chen
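
For readers who want the failure mode in isolation, here is a minimal toy model of the explanation above. It is not PostgreSQL code: the Toy* types, the REC_* record kinds and toy_replay() are all hypothetical names invented for this sketch, and the two-record "WAL" is only assumed to resemble what the insert in step 4 produces. The one behaviour it tries to mirror is that a fast-forwarding decoder ignores the record that marks a transaction as catalog-changing, so the commit is never added to the committed list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy record kinds; illustrative stand-ins, not PostgreSQL's WAL record types. */
typedef enum { REC_NEW_CID, REC_COMMIT } ToyRecKind;

typedef struct
{
	ToyRecKind	kind;
	uint32_t	xid;
} ToyRecord;

#define TOY_MAX_XIDS 8

typedef struct
{
	bool		fast_forward;				/* true models a slot advance */
	uint32_t	catalog_xids[TOY_MAX_XIDS];	/* xids seen changing the catalog */
	size_t		ncatalog;
	uint32_t	committed[TOY_MAX_XIDS];	/* committed catalog-changing xids */
	size_t		ncommitted;
} ToyDecoder;

/* Has this xid been marked as catalog-changing by an earlier record? */
static bool
toy_has_catalog_change(const ToyDecoder *d, uint32_t xid)
{
	for (size_t i = 0; i < d->ncatalog; i++)
	{
		if (d->catalog_xids[i] == xid)
			return true;
	}
	return false;
}

static void
toy_decode(ToyDecoder *d, ToyRecord rec)
{
	switch (rec.kind)
	{
		case REC_NEW_CID:
			/* The modelled bug: fast-forwarding skips this record entirely. */
			if (d->fast_forward)
				return;
			assert(d->ncatalog < TOY_MAX_XIDS);
			d->catalog_xids[d->ncatalog++] = rec.xid;
			break;
		case REC_COMMIT:
			/* Only catalog-changing transactions enter the committed list. */
			if (toy_has_catalog_change(d, rec.xid))
			{
				assert(d->ncommitted < TOY_MAX_XIDS);
				d->committed[d->ncommitted++] = rec.xid;
			}
			break;
	}
}

/* Replay the same two records and return the size of the committed list. */
static size_t
toy_replay(bool fast_forward)
{
	const ToyRecord wal[] = {{REC_NEW_CID, 768}, {REC_COMMIT, 768}};
	ToyDecoder	d = {0};

	d.fast_forward = fast_forward;
	for (size_t i = 0; i < sizeof(wal) / sizeof(wal[0]); i++)
		toy_decode(&d, wal[i]);
	return d.ncommitted;
}
```

In this model toy_replay(true) leaves the committed list empty, while toy_replay(false) records xid 768. That is the same shape as the two SnapBuildSerialize log lines above for the same LSN.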