Thread

  1. Re: Excessive LOG messages from replication slot sync worker

    Japin Li <japinli@hotmail.com> — 2025-08-12T07:38:34Z

    On Tue, Aug 12, 2025 at 12:24:10PM +0530, shveta malik wrote:
    > On Tue, Aug 12, 2025 at 8:52 AM Fujii Masao <masao.fujii@gmail.com> wrote:
    > >
    > > Hi,
    > >
    > > While testing the replication slot synchronization feature, I set up
    > > three instances on the same machine:
    > >
    > > - Physical replication primary (also Logical replication publisher)
    > > - Physical replication standby
    > > - Logical replication subscriber
    > >
    > > ----------------------------------------
    > > initdb -D pri --encoding=UTF8 --locale=C
    > > cat <<EOF >> pri/postgresql.conf
    > > wal_level = logical
    > > log_line_prefix = '%m [pri][%b] '
    > > logging_collector = on
    > > EOF
    > > pg_ctl -D pri start
    > > pgbench -i
    > > cat <<EOF | psql
    > > CREATE PUBLICATION mypub FOR ALL TABLES;
    > > SELECT pg_create_physical_replication_slot('pslot');
    > > EOF
    > > cat <<EOF >> pri/postgresql.conf
    > > synchronized_standby_slots = 'pslot'
    > > EOF
    > > pg_ctl -D pri reload
    > >
    > > pg_basebackup -D sby -c fast
    > > touch sby/standby.signal
    > > cat <<EOF >> sby/postgresql.conf
    > > port = 5433
    > > primary_conninfo = 'host=localhost port=5432 dbname=postgres'
    > > primary_slot_name = 'pslot'
    > > sync_replication_slots = on
    > > hot_standby_feedback = on
    > > log_line_prefix = '%m [sby][%b] '
    > > EOF
    > > pg_ctl -D sby start
    > >
    > > initdb -D sub --encoding=UTF8 --locale=C
    > > cat <<EOF >> sub/postgresql.conf
    > > port = 5434
    > > wal_level = logical
    > > log_line_prefix = '%m [sub][%b] '
    > > logging_collector = on
    > > EOF
    > > pg_ctl -D sub start
    > > pgbench -i -Itp -p 5434
    > > cat <<EOF | psql -p 5434
    > > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432
    > > dbname=postgres' PUBLICATION mypub WITH (failover);
    > > EOF
    > > ----------------------------------------
    > >
    > >
    > > I noticed that the replication slot sync worker on the standby generates
    > > a large number of LOG messages, about two per 200–300 ms, during
    > > pgbench runs (pgbench -c 10 -j 10 -T 60). For example,
    > >
    > > ----------------------------------------
    > > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  starting
    > > logical decoding for slot "mysub"
    > > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  Streaming
    > > transactions committing after 0/04F317B8, reading WAL from 0/04006FC0.
    > > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] LOG:  logical
    > > decoding found consistent point at 0/04006FC0
    > > 2025-08-12 11:48:21.067 JST [sby][slotsync worker] DETAIL:  There are
    > > no running transactions.
    > > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] LOG:  starting
    > > logical decoding for slot "mysub"
    > > 2025-08-12 11:48:21.296 JST [sby][slotsync worker] DETAIL:  Streaming
    > > transactions committing after 0/04F9DC90, reading WAL from 0/04006FC0.
    > > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] LOG:  logical
    > > decoding found consistent point at 0/04006FC0
    > > 2025-08-12 11:48:21.297 JST [sby][slotsync worker] DETAIL:  There are
    > > no running transactions.
    > > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] LOG:  starting
    > > logical decoding for slot "mysub"
    > > 2025-08-12 11:48:21.526 JST [sby][slotsync worker] DETAIL:  Streaming
    > > transactions committing after 0/0506A5C0, reading WAL from 0/04006FC0.
    > > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] LOG:  logical
    > > decoding found consistent point at 0/04006FC0
    > > 2025-08-12 11:48:21.527 JST [sby][slotsync worker] DETAIL:  There are
    > > no running transactions.
    > > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] LOG:  starting
    > > logical decoding for slot "mysub"
    > > 2025-08-12 11:48:21.762 JST [sby][slotsync worker] DETAIL:  Streaming
    > > transactions committing after 0/050C7F70, reading WAL from 0/04006FC0.
    > > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] LOG:  logical
    > > decoding found consistent point at 0/04006FC0
    > > 2025-08-12 11:48:21.763 JST [sby][slotsync worker] DETAIL:  There are
    > > no running transactions.
    > > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  starting
    > > logical decoding for slot "mysub"
    > > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  Streaming
    > > transactions committing after 0/05111420, reading WAL from 0/04006FC0.
    > > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] LOG:  logical
    > > decoding found consistent point at 0/04006FC0
    > > 2025-08-12 11:48:21.999 JST [sby][slotsync worker] DETAIL:  There are
    > > no running transactions.
    > > ----------------------------------------
    > >
    > >
    > >  Is this expected behavior, a configuration mistake on my side, or perhaps
    > >  an oversight in the commit of the replication slot synchronization feature?
    > >
    > > This seems more like debug output left over from development rather than
    > > something useful for users. If this is true, would it make sense to change
    > > these messages from LOG to DEBUG?
    > >
    > 
    > Thank You for reporting the issue.
    > 
    > It looks like commit 2633dae (mistakenly) introduced a change ([1]) in
    > SnapBuildSnapshotExists(), altering the format used for snapshot file
    > names during the search. However, SnapBuildSerialize still uses the
    > old format ("%s/%X-%X.snap"), which led to the slot-sync worker being
    > unable to locate existing snapshot files. Reverting the format
    > resolved the issue for me. Could you please check if the attached
    > patch works for you as well?
    > 
    > [1]:
    > SnapBuildSnapshotExists()
    > - sprintf(path, "%s/%X-%X.snap",
    > + sprintf(path, "%s/%08X-%08X.snap",
    > 
    
    Sorry, it's my fault.  I forgot to update the format in SnapBuildSerialize().
    
    I'd prefer to update the format in SnapBuildSerialize() instead of reverting
    this change.
    
    diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
    index 8532bfd27e5..db7051170fc 100644
    --- a/src/backend/replication/logical/snapbuild.c
    +++ b/src/backend/replication/logical/snapbuild.c
    @@ -1529,7 +1529,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
          * unless the user used pg_resetwal or similar. If a user did so, there's
          * no hope continuing to decode anyway.
          */
    -    sprintf(path, "%s/%X-%X.snap",
    +    sprintf(path, "%s/%08X-%08X.snap",
                 PG_LOGICAL_SNAPSHOTS_DIR,
                 LSN_FORMAT_ARGS(lsn));
    
    @@ -1573,7 +1573,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
         elog(DEBUG1, "serializing snapshot to %s", path);
    
         /* to make sure only we will write to this tempfile, include pid */
    -    sprintf(tmppath, "%s/%X-%X.snap.%d.tmp",
    +    sprintf(tmppath, "%s/%08X-%08X.snap.%d.tmp",
                 PG_LOGICAL_SNAPSHOTS_DIR,
                 LSN_FORMAT_ARGS(lsn), MyProcPid);
    
    -- 
    Best regards,
    Japin Li
    ChengDu WenWu Information Technology Co., LTD.