Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Clear 'xid' in dummy async notify entries written to fill up pages

  2. Fix remaining race condition with CLOG truncation and LISTEN/NOTIFY

  3. Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY

  4. Escalate ERRORs during async notify processing to FATAL

  5. Limit the size of TID lists during parallel GIN build

  1. LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-08-06T04:50:01Z

    Hi,
    
    I'm bringing up a bug that was reported multiple times [1][2][3] in
    the bugs list here, for a broader audience.
    
    The issue is that an ERROR like the one below occurs when trying to
    register any listener in the database.
    
    test=# listen c21;
    ERROR:  58P01: could not access status of transaction 14279685
    DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
    LOCATION:  SlruReportIOError, slru.c:1087
    
    In [1], Andrei Varashen provided detailed reproduction steps. I’m
    copying and pasting his example from thread [1] here, with slight
    simplification.
    
    Pre-conditions:
    - Disable autovacuum to avoid its intervention.
    
    Steps to reproduce:
    
    1. Create a test table and notify (but not listen) to a channel
    (backend 1):
    
    create table test (id int);
    insert into test values (1);
    notify c1;
    
    2. List pg_xact files so we know its starting state:
    
    ➜  ls -lah ~/pg-devel/data/pg_xact
    total 16
    drwx------@  3 alex.wang  staff    96B Aug  5 20:10 .
    drwx------@ 26 alex.wang  staff   832B Aug  5 20:12 ..
    -rw-------@  1 alex.wang  staff   8.0K Aug  5 20:12 0000
    
    3. Prepare a "test.sql" file for pgbench, and then run pgbench to
    generate lots of transactions, so that pg_xact/0000 is completely
    filled and leads to pg_xact/0001.
    
    > cat test.sql
    UPDATE test SET id = 1;
    
    > pgbench -n -c 80 -j 10 -t 15000 -f ~/workspace/test.sql postgres
    
    4. Verify that pg_xact/0001 is created:
    
    ➜  ls -lah ~/pg-devel/data/pg_xact
    total 560
    drwx------@  4 alex.wang  staff   128B Aug  5 20:25 .
    drwx------@ 26 alex.wang  staff   832B Aug  5 20:12 ..
    -rw-------@  1 alex.wang  staff   256K Aug  5 20:25 0000
    -rw-------@  1 alex.wang  staff    16K Aug  5 20:25 0001
    
    5. Execute VACUUM FREEZE on every database on the server to freeze
    rows and purge pg_xact/0000.
    
    postgres=# VACUUM FREEZE;
    VACUUM
    
    postgres=# \c template1
    You are now connected to database "template1" as user "postgres".
    template1=# VACUUM FREEZE;
    VACUUM
    
    template1=# ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
    ALTER DATABASE
    template1=# \c template0
    You are now connected to database "template0" as user "postgres".
    template0=# VACUUM FREEZE;
    VACUUM
    postgres=# select datname, datfrozenxid from pg_database;
      datname  | datfrozenxid
    -----------+--------------
     postgres  |      1200774
     template0 |      1200775
     template1 |      1200774
    (3 rows)
    
    6. Ensure that pg_xact/0000 is gone:
    
    ➜   ls -lah ~/pg-devel/data/pg_xact
    total 80
    drwx------@  3 alex.wang  staff    96B Aug  5 20:29 .
    drwx------@ 26 alex.wang  staff   832B Aug  5 20:12 ..
    -rw-------@  1 alex.wang  staff    40K Aug  5 20:30 0001
    
    7. Try to listen to any channel from any backend connection on the
    same database:
    
    postgres=# listen c1;
    ERROR:  58P01: could not access status of transaction 773
    DETAIL:  Could not open file "pg_xact/0000": No such file or directory.
    LOCATION:  SlruReportIOError, slru.c:1087
    postgres=# listen c2;
    ERROR:  58P01: could not access status of transaction 773
    DETAIL:  Could not open file "pg_xact/0000": No such file or directory.
    LOCATION:  SlruReportIOError, slru.c:1087
    
    Here's the stack trace from the master branch:
    (lldb) bt
    * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 21.1
      * frame #0: 0x0000000102c798c8
    postgres`SlruReportIOError(ctl=0x000000010378e5e0, pageno=0, xid=773) at
    slru.c:1084:4
        frame #1: 0x0000000102c792b0
    postgres`SimpleLruReadPage(ctl=0x000000010378e5e0, pageno=0, write_ok=true,
    xid=773) at slru.c:603:4
        frame #2: 0x0000000102c79f7c
    postgres`SimpleLruReadPage_ReadOnly(ctl=0x000000010378e5e0, pageno=0,
    xid=773) at slru.c:661:9
        frame #3: 0x0000000102c6a6bc postgres`TransactionIdGetStatus(xid=773,
    lsn=0x000000016d2975e8) at clog.c:745:11
        frame #4: 0x0000000102c7e924
    postgres`TransactionLogFetch(transactionId=773) at transam.c:79:14
        frame #5: 0x0000000102c7e74c
    postgres`TransactionIdDidCommit(transactionId=773) at transam.c:130:14
        frame #6: 0x0000000102dc8b94
    postgres`asyncQueueProcessPageEntries(current=0x000000016d297720,
    stop=QueuePosition @ 0x000000016d297690, page_buffer="\U00000014",
    snapshot=0x000000012d812398) at async.c:2069:13
        frame #7: 0x0000000102dc8954 postgres`asyncQueueReadAllNotifications at
    async.c:1981:18
        frame #8: 0x0000000102dc6b5c postgres`Exec_ListenPreCommit at
    async.c:1127:3
        frame #9: 0x0000000102dc664c postgres`PreCommit_Notify at async.c:881:6
        frame #10: 0x0000000102c8c77c postgres`CommitTransaction at
    xact.c:2341:2
        frame #11: 0x0000000102c87b2c postgres`CommitTransactionCommandInternal
    at xact.c:3214:4
        frame #12: 0x0000000102c87a44 postgres`CommitTransactionCommand at
    xact.c:3175:10
        frame #13: 0x000000010321da94 postgres`finish_xact_command at
    postgres.c:2833:3
        frame #14: 0x000000010321b64c
    postgres`exec_simple_query(query_string="listen c1;") at postgres.c:1298:4
        frame #15: 0x000000010321a714 postgres`PostgresMain(dbname="postgres",
    username="alex.wang") at postgres.c:4767:7
        frame #16: 0x0000000103211a14
    postgres`BackendMain(startup_data=0x000000016d299e48, startup_data_len=24)
    at backend_startup.c:124:2
        frame #17: 0x00000001030e938c
    postgres`postmaster_child_launch(child_type=B_BACKEND, child_slot=56,
    startup_data=0x000000016d299e48, startup_data_len=24,
    client_sock=0x000000016d299ed8) at launch_backend.c:290:3
        frame #18: 0x00000001030f0d60
    postgres`BackendStartup(client_sock=0x000000016d299ed8) at
    postmaster.c:3587:8
        frame #19: 0x00000001030eebc4 postgres`ServerLoop at postmaster.c:1702:6
        frame #20: 0x00000001030ed67c postgres`PostmasterMain(argc=3,
    argv=0x00006000021f14e0) at postmaster.c:1400:11
        frame #21: 0x0000000102f73e40 postgres`main(argc=3,
    argv=0x00006000021f14e0) at main.c:231:4
        frame #22: 0x00000001940a2b98 dyld`start + 6076
    
    Daniil Davydov has analyzed the root cause in thread [4] and I agree with
    what he said:
    
    On Thu, Jul 31, 2025 at 8:21 PM Daniil Davydov <3danissimo@gmail.com> wrote:
    
    > We have the following logic in the notify queue :
    > If there are no listeners within all databases, and we are calling
    > LISTEN, then we must iterate from 'tail' to 'head' of the queue and
    > check statuses of transactions (see Exec_ListenPreCommit).
    > If there is a pruned-away xid in the queue, we will try to access its
    > status and get an error.
    >
    > Because the tail of the queue is not necessarily always advanced
    > forward by the listeners, we can get such error without any long lived
    > transactions.
    >
    
    The fix and workarounds were discussed in [5] and [6]: In [5], Daniil
    proposed a patch, which I’ve attached. The patch adds a call to
    asyncQueueAdvanceTail() in vac_update_datfrozenxid(), so that VACUUM
    advances the async queue tail.
    
    Similarly, calling the built-in function pg_notification_queue_usage()
    also advances the async queue tail. So when the issue occurs, calling
    this function could also make the error go away. However, this doesn’t
    prevent the error from happening in the first place.
    
    My questions:
    
    1. Is it acceptable to drop notifications from the async queue if
    there are no active listeners? There might still be notifications that
    haven’t been read by any previous listener.
    
    2. If the answer to 1 is no, how can we teach VACUUM to respect the
    minimum xid stored in all AsyncQueueEntries?
    
    Best,
    Alex
    
    [1]
    https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2%40postgresql.org
    [2]
    https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org
    [3]
    https://www.postgresql.org/message-id/18394-e7459245148578b2@postgresql.org
    [4]
    https://www.postgresql.org/message-id/CAJDiXgh3Jh2N90Fe4%3DX2qE%2BYAiZ1BSpgznhH%2BvkTHmvK3gjqxw%40mail.gmail.com
    [5]
    https://www.postgresql.org/message-id/CAJDiXgj1BmLKAZ%3DgOC1eETmctt7z%3Daj8MHfmd%2BnORE1P6qncsA%40mail.gmail.com
    [6]
    https://www.postgresql.org/message-id/CAK98qZ0uZrf1b6L4mDs%2B17M07KCNt6YMc9vD-%2BTmeWaJHRbGpA%40mail.gmail.com
    
  2. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-08-06T10:44:13Z

    On 2025-Aug-05, Alexandra Wang wrote:
    
    > I'm bringing up a bug that was reported multiple times [1][2][3] in
    > the bugs list here, for a broader audience.
    > 
    > The issue is that an ERROR like the one below occurs when trying to
    > register any listener in the database.
    > 
    > test=# listen c21;
    > ERROR:  58P01: could not access status of transaction 14279685
    > DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
    > LOCATION:  SlruReportIOError, slru.c:1087
    
    Oh, interesting problem.  Many thanks for the excellent write-up.
    
    > My questions:
    > 
    > 1. Is it acceptable to drop notifications from the async queue if
    > there are no active listeners? There might still be notifications that
    > haven’t been read by any previous listener.
    
    I'm somewhat wary of this idea -- could these inactive listeners become
    active later and expect to be able to read their notifies?
    
    > 2. If the answer to 1 is no, how can we teach VACUUM to respect the
    > minimum xid stored in all AsyncQueueEntries?
    
    Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
    listeners, and back off the pg_clog truncation based on that.  This
    could be done by having a new boolean argument that says to look up the
    XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
    would only be passed true by vac_update_datfrozenxid(), to avoid
    overhead by other callers), then collect the oldest of those and return
    it.
    
    This does create the problem that an inactive listener could cause the
    XID counter to stay far in the past.  Maybe we could try to avoid this
    by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
    send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
    are way overdue on reading notifies.  I'm not sure if this is really
    needed or useful; consider a backend stuck on SIGSTOP (debugger or
    whatever): it will just sit there forever.
    
    -- 
    Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
    
    
    
    
  3. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-11T13:41:08Z

    On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
    >> My questions:
    >> 
    >> 1. Is it acceptable to drop notifications from the async queue if
    >> there are no active listeners? There might still be notifications that
    >> haven’t been read by any previous listener.
    >
    > I'm somewhat wary of this idea -- could these inactive listeners become
    > active later and expect to be able to read their notifies?
    >
    I'm bit worry about this too.
    
    >> 2. If the answer to 1 is no, how can we teach VACUUM to respect the
    >> minimum xid stored in all AsyncQueueEntries?
    >
    > Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
    > listeners, and back off the pg_clog truncation based on that.  This
    > could be done by having a new boolean argument that says to look up the
    > XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
    > would only be passed true by vac_update_datfrozenxid(), to avoid
    > overhead by other callers), then collect the oldest of those and return
    > it.
    >
    The problem with only considering the oldest XID of listeners is that
    IIUC we may have notifications without listeners, and in this case we
    may still get this error because when the LISTEN is executed we loop
    through the AsyncQueueEntry's on asyncQueueProcessPageEntries() and we
    call TransactionIdDidCommit() that raise the error before
    IsListeningOn(channel) is called.
    
    Another option would be to add a minXid field on AsyncQueueControl and
    then update this value on asyncQueueProcessPageEntries() and
    asyncQueueAddEntries() routines, and then we could check this value on
    vac_update_datfrozenxid().
    
    > This does create the problem that an inactive listener could cause the
    > XID counter to stay far in the past.  Maybe we could try to avoid this
    > by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
    > send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
    > are way overdue on reading notifies.  I'm not sure if this is really
    > needed or useful; consider a backend stuck on SIGSTOP (debugger or
    > whatever): it will just sit there forever.
    >
    With this idea that I've proposed we still could have this problem, if a
    listener take too long to consume a message we would block vacuum freeze
    to advance the xid. For this I think that we could have two GUC's; One
    to enable and disable the oldest xmin check on async queue and the
    second to control how far we want to prevent the vacuum from freezing
    the oldest async queue xid, and if the min xid raises this limit we
    ignore and truncate the xid.
    
    I've write a draft patch that plays with the idea, see attached.
    
    --
    Matheus Alcantara
    
  4. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Daniil Davydov <3danissimo@gmail.com> — 2025-08-13T19:29:29Z

    Hi,
    
    On Mon, Aug 11, 2025 at 8:41 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
    > >> My questions:
    > >>
    > >> 1. Is it acceptable to drop notifications from the async queue if
    > >> there are no active listeners? There might still be notifications that
    > >> haven’t been read by any previous listener.
    > >
    > > I'm somewhat wary of this idea -- could these inactive listeners become
    > > active later and expect to be able to read their notifies?
    > >
    > I'm bit worry about this too.
    
    What exactly do we mean by "active listener"? According to the source code,
    the active listener (as far as I understand) is the one who listens to at least
    one channel. If we have no active listeners in the database, the new listener
    will set its pointer to the tail of the async queue. Thus, messages with old
    xid will not be touched by anybody. I don't see any point in dropping them
    in this case.
    
    If the "inactive" listener is the backend which is stuck somewhere, the
    answer is "no" - this backend should be able to process all notifications.
    
    >
    > >> 2. If the answer to 1 is no, how can we teach VACUUM to respect the
    > >> minimum xid stored in all AsyncQueueEntries?
    > >
    > > Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
    > > listeners, and back off the pg_clog truncation based on that.  This
    > > could be done by having a new boolean argument that says to look up the
    > > XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
    > > would only be passed true by vac_update_datfrozenxid(), to avoid
    > > overhead by other callers), then collect the oldest of those and return
    > > it.
    > >
    > The problem with only considering the oldest XID of listeners is that
    > IIUC we may have notifications without listeners, and in this case we
    > may still get this error because when the LISTEN is executed we loop
    > through the AsyncQueueEntry's on asyncQueueProcessPageEntries() and we
    > call TransactionIdDidCommit() that raise the error before
    > IsListeningOn(channel) is called.
    >
    
    Agree.
    
    > > This does create the problem that an inactive listener could cause the
    > > XID counter to stay far in the past.  Maybe we could try to avoid this
    > > by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
    > > send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
    > > are way overdue on reading notifies.  I'm not sure if this is really
    > > needed or useful; consider a backend stuck on SIGSTOP (debugger or
    > > whatever): it will just sit there forever.
    > >
    > With this idea that I've proposed we still could have this problem, if a
    > listener take too long to consume a message we would block vacuum freeze
    > to advance the xid. For this I think that we could have two GUC's; One
    > to enable and disable the oldest xmin check on async queue and the
    > second to control how far we want to prevent the vacuum from freezing
    > the oldest async queue xid, and if the min xid raises this limit we
    > ignore and truncate the xid.
    >
    
    1) About Álvaro's comment:
    I don't think that killing the lagging backend will save us, because entries
    in the async queue are not ordered by xid. Thus, even after killing such a
    backend, we have no guarantees that the problem is gone, because not
    lagging backends still may encounter a super old xid.
    
    2) About Matheus's comment:
    I guess that there is only one reliable way to determine the precise minimal xid
    in the queue (after message processing) - scan all pages from head to tail.
    It obviously can take a lot of time. The second GUC (if I get it
    right) logic is based
    on minimal xid determination, so it won't be the best solution for
    highload systems
    (that are forced to turn on the first GUC, because of safety requirements).
    
    > Another option would be to add a minXid field on AsyncQueueControl and
    > then update this value on asyncQueueProcessPageEntries() and
    > asyncQueueAddEntries() routines, and then we could check this value on
    > vac_update_datfrozenxid().
    >
    > I've write a draft patch that plays with the idea, see attached.
    
    Thanks for the patch! I have few comments on it :
    1)
    Maybe we should add an assert that NotifyQueueLock is held to the StoreEntryXid
    function?
    
    2)
    For fields in AsyncQueueControl we have macros like QUEUE_HEAD,
    QUEUE_TAIL and so on. Maybe minXid should also be wrapped with something
    like QUEUE_MIN_XID?
    
    3)
    This logic will not work with (for example) two listeners. Steps :
    session1=# listen c1;
    session2=# listen c1;
    session3=# notify c1; // sets minimal xid to N inside StoreEntryXid
    session3=# notify c1;
    session1=# listen c1; // sets minimal xid to (N + 1) inside ReleaseEntryXid
    
    After these steps, session2 still needs to process a message with xid = N, but
    autovacuum legally can freeze it and truncate clog.
    
    Maybe we should check whether there are no listeners with smaller queue
    position compared to the current backend?
    
    4)
    After ReleaseEntryXid, minXid field contains not actually minimal xid
    in the queue,
    but maximum processed xid. Maybe this parameter can be renamed?
    
    --
    Best regards,
    Daniil Davydov
    
    
    
    
  5. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-19T00:14:15Z

    On Wed Aug 13, 2025 at 4:29 PM -03, Daniil Davydov wrote:
    > Hi,
    >
    > On Mon, Aug 11, 2025 at 8:41 PM Matheus Alcantara
    > <matheusssilv97@gmail.com> wrote:
    >>
    >> On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
    >> >> My questions:
    >> >>
    >> >> 1. Is it acceptable to drop notifications from the async queue if
    >> >> there are no active listeners? There might still be notifications that
    >> >> haven’t been read by any previous listener.
    >> >
    >> > I'm somewhat wary of this idea -- could these inactive listeners become
    >> > active later and expect to be able to read their notifies?
    >> >
    >> I'm bit worry about this too.
    >
    > What exactly do we mean by "active listener"? According to the source code,
    > the active listener (as far as I understand) is the one who listens to at least
    > one channel. If we have no active listeners in the database, the new listener
    > will set its pointer to the tail of the async queue. Thus, messages with old
    > xid will not be touched by anybody. I don't see any point in dropping them
    > in this case.
    >
    I think that this definition is correct, but IIUC the tail can still
    have notifications with xid's that were already truncated by vacuum
    freeze. When the LISTEN is executed, we first loop through the
    notification queue to try to advance the queue pointers and we can
    eventually iterate over a notification that was added on the queue
    without any listener but it has a xid that is already truncated by vacuum
    freeze, so in this case it will fail to get the transaction status. On
    Alex steps to reproduce the issue it first executes the NOTIFY and
    then executes the LISTEN which fails after vacuum freeze.
    
    > If the "inactive" listener is the backend which is stuck somewhere, the
    > answer is "no" - this backend should be able to process all notifications.
    >
    I tried to reproduce the issue by using some kind of "inactive"
    listener but so far I didn't manage to trigger the error. This is what I
    tried:
    
    1. Create listener:
    postgres=# listen c1;
    
    2. Execute a very long query to make the backend busy to process the
    notification:
    postgres=# select * from generate_series(1,10000000000) g where g > 1;
    
    3. On another session send the notification
    postgres=# notify c1;
    
    4. Execute pgbench test:
    pgbench -n -c 80 -j 10 -t 15000 -f test.sql postgres
    
    5. Verify that we have multiple files on pg_xact:
    ➜  ls -lah ~/pg-devel/data/pg_xact
    total 608
    -rw-------@ 1 matheus  staff   256K Aug 18 20:36 0000
    -rw-------@ 1 matheus  staff    40K Aug 18 20:56 0001
    
    6. Execute VACUUM FREEZE on every database on the server
    
    postgres=# VACUUM FREEZE;
    VACUUM
    
    postgres=# \c template1
    You are now connected to database "template1" as user "postgres".
    template1=# VACUUM FREEZE;
    VACUUM
    
    template1=# \c template0
    You are now connected to database "template0" as user "postgres".
    template0=# VACUUM FREEZE;
    VACUUM
    
    After the vacuum freeze I still can see the same files on pg_xact/ and
    if I cancel the long query the notification is received correctly, and
    then if I execute vacuum freeze again on every database the oldest
    pg_xact file is truncated.
    
    So, if my tests are correct I don't think that storing the oldest xid is
    necessary anymore since I don't think that we can lose notifications
    using the patch from Daniil or I'm missing something here?
    
    Thinking about this, maybe another solution for this would be to change
    queue advancing pointers to skip the transaction status check? With this
    we would not need to touch any vacuum freeze code and instead change the
    LISTEN execution to handle this scenario.
    
    Thoughts?
    
    --
    Matheus Alcantara
    
    
    
    
  6. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Daniil Davydov <3danissimo@gmail.com> — 2025-08-19T03:57:42Z

    Hi,
    
    On Tue, Aug 19, 2025 at 7:14 AM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Wed Aug 13, 2025 at 4:29 PM -03, Daniil Davydov wrote:
    > >
    > > What exactly do we mean by "active listener"? According to the source code,
    > > the active listener (as far as I understand) is the one who listens to at least
    > > one channel. If we have no active listeners in the database, the new listener
    > > will set its pointer to the tail of the async queue. Thus, messages with old
    > > xid will not be touched by anybody. I don't see any point in dropping them
    > > in this case.
    > >
    > I think that this definition is correct, but IIUC the tail can still
    > have notifications with xid's that were already truncated by vacuum
    > freeze. When the LISTEN is executed, we first loop through the
    > notification queue to try to advance the queue pointers and we can
    > eventually iterate over a notification that was added on the queue
    > without any listener but it has a xid that is already truncated by vacuum
    > freeze, so in this case it will fail to get the transaction status. On
    > Alex steps to reproduce the issue it first executes the NOTIFY and
    > then executes the LISTEN which fails after vacuum freeze.
    >
    
    Yeah, you are right. I looked at the code again, and found out that even
    if there are no active listeners, new listener should iterate from the head
    to the tail. Thus, it may encounter truncated xid. Anyway, I still think that
    dropping notifications is not the best way to resolve this issue.
    
    > > If the "inactive" listener is the backend which is stuck somewhere, the
    > > answer is "no" - this backend should be able to process all notifications.
    > >
    > I tried to reproduce the issue by using some kind of "inactive"
    > listener but so far I didn't manage to trigger the error.
    >
    > After the vacuum freeze I still can see the same files on pg_xact/ and
    > if I cancel the long query the notification is received correctly, and
    > then if I execute vacuum freeze again on every database the oldest
    > pg_xact file is truncated.
    >
    > So, if my tests are correct I don't think that storing the oldest xid is
    > necessary anymore since I don't think that we can lose notifications
    > using the patch from Daniil or I'm missing something here?
    >
    
    You have started a very long transaction, which holds its xid and prevents
    vacuum from freezing it. But what if the backend is stuck not inside a
    transaction? Maybe we can just hardcode a huge delay (not inside the
    transaction) or stop process execution via breakpoint in gdb. If we will use it
    instead of a long query, I think that this error may be reproducible.
    
    --
    Best regards,
    Daniil Davydov
    
    
    
    
  7. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-19T11:31:00Z

    On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
    >> I think that this definition is correct, but IIUC the tail can still
    >> have notifications with xid's that were already truncated by vacuum
    >> freeze. When the LISTEN is executed, we first loop through the
    >> notification queue to try to advance the queue pointers and we can
    >> eventually iterate over a notification that was added on the queue
    >> without any listener but it has a xid that is already truncated by vacuum
    >> freeze, so in this case it will fail to get the transaction status. On
    >> Alex steps to reproduce the issue it first executes the NOTIFY and
    >> then executes the LISTEN which fails after vacuum freeze.
    >>
    >
    > Yeah, you are right. I looked at the code again, and found out that even
    > if there are no active listeners, new listener should iterate from the head
    > to the tail. Thus, it may encounter truncated xid. Anyway, I still think that
    > dropping notifications is not the best way to resolve this issue.
    >
    In the steps that Alex shared, is it expected that the "LISTEN c1" command
    consumes the notification that was sent previously with NOTIFY? IIUC the
    LISTEN command should be executed before of any NOTIFY, so executing the
    LISTEN after a NOTIFY will not consume any previous notification added
    on the channel, so how bad would be to drop this notification from the
    queue in this situation?
    
    >> > If the "inactive" listener is the backend which is stuck somewhere, the
    >> > answer is "no" - this backend should be able to process all notifications.
    >> >
    >> I tried to reproduce the issue by using some kind of "inactive"
    >> listener but so far I didn't manage to trigger the error.
    >>
    >> After the vacuum freeze I still can see the same files on pg_xact/ and
    >> if I cancel the long query the notification is received correctly, and
    >> then if I execute vacuum freeze again on every database the oldest
    >> pg_xact file is truncated.
    >>
    >> So, if my tests are correct I don't think that storing the oldest xid is
    >> necessary anymore since I don't think that we can lose notifications
    >> using the patch from Daniil or I'm missing something here?
    >>
    >
    > You have started a very long transaction, which holds its xid and prevents
    > vacuum from freezing it. But what if the backend is stuck not inside a
    > transaction? Maybe we can just hardcode a huge delay (not inside the
    > transaction) or stop process execution via breakpoint in gdb. If we will use it
    > instead of a long query, I think that this error may be reproducible.
    >
    But how could this happen in real scenarios? I mean, how the backend
    could be stuck outside a transaction?
    
    --
    Matheus Alcantara
    
    
    
    
  8. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Daniil Davydov <3danissimo@gmail.com> — 2025-08-19T17:37:54Z

    Hi,
    
    On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
    > > You have started a very long transaction, which holds its xid and prevents
    > > vacuum from freezing it. But what if the backend is stuck not inside a
    > > transaction? Maybe we can just hardcode a huge delay (not inside the
    > > transaction) or stop process execution via breakpoint in gdb. If we will use it
    > > instead of a long query, I think that this error may be reproducible.
    > >
    > But how could this happen in real scenarios? I mean, how the backend
    > could be stuck outside a transaction?
    >
    
    For now, I cannot come up with a situation where it may be possible.
    Perhaps, such a lagging may occur during network communication,
    but I couldn't reproduce it. Maybe other people know how we can achieve
    this?
    
    I think that if such a situation may be possible, the suggestion to  delete
    messages  will no longer be relevant. Therefore, first of all, I would like to
    clarify this issue.
    
    --
    Best regards,
    Daniil Davydov
    
    
    
    
  9. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Masahiko Sawada <sawada.mshk@gmail.com> — 2025-08-19T17:40:50Z

    On Mon, Aug 18, 2025 at 5:14 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Wed Aug 13, 2025 at 4:29 PM -03, Daniil Davydov wrote:
    > > Hi,
    > >
    > > On Mon, Aug 11, 2025 at 8:41 PM Matheus Alcantara
    > > <matheusssilv97@gmail.com> wrote:
    > >>
    > >> On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
    > >> >> My questions:
    > >> >>
    > >> >> 1. Is it acceptable to drop notifications from the async queue if
    > >> >> there are no active listeners? There might still be notifications that
    > >> >> haven’t been read by any previous listener.
    > >> >
    > >> > I'm somewhat wary of this idea -- could these inactive listeners become
    > >> > active later and expect to be able to read their notifies?
    > >> >
    > >> I'm bit worry about this too.
    > >
    > > What exactly do we mean by "active listener"? According to the source code,
    > > the active listener (as far as I understand) is the one who listens to at least
    > > one channel. If we have no active listeners in the database, the new listener
    > > will set its pointer to the tail of the async queue. Thus, messages with old
    > > xid will not be touched by anybody. I don't see any point in dropping them
    > > in this case.
    > >
    > I think that this definition is correct, but IIUC the tail can still
    > have notifications with xid's that were already truncated by vacuum
    > freeze. When the LISTEN is executed, we first loop through the
    > notification queue to try to advance the queue pointers and we can
    > eventually iterate over a notification that was added on the queue
    > without any listener but it has a xid that is already truncated by vacuum
    > freeze, so in this case it will fail to get the transaction status.
    
    When a process first executes the LISTEN command, it iterates through
    the notification queue, but it seems only to advance its queue pointer
    because it doesn't add the interested channels to its list yet and it
    isn't interested in the notifications queued before it registered as a
    listener. I'm wondering if we could optimize this by allowing the
    queue pointer to fast-forward without checking transaction status. If
    feasible, this might resolve the reported issue.
    
    However, I have a more fundamental concern regarding the LISTEN/NOTIFY
    implementation. Since vacuum doesn't consider the XIDs of notification
    entries, there might be a critical issue with the truncation of clog
    entries that LISTEN/NOTIFY still requires. As I understand it,
    notification queue entries aren't ordered by XID, and it's possible
    for a notification with an older XID to be positioned at the queue's
    head. If vacuum freeze then truncates the corresponding clogs,
    listeners attempting to retrieve this notification would fail to
    obtain the transaction status. To address this, we likely need to
    either implement Álvaro's suggestion[1] to make vacuum aware of the
    oldest XID in the notification queue, or develop a mechanism to
    remove/freeze XIDs of the notification entries.
    
    Regards,
    
    [1] https://www.postgresql.org/message-id/202508061044.ptcyt7aqsaaa%40alvherre.pgsql
    
    -- 
    Masahiko Sawada
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  10. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-20T21:18:32Z

    On Tue Aug 19, 2025 at 2:37 PM -03, Daniil Davydov wrote:
    > Hi,
    >
    > On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
    > <matheusssilv97@gmail.com> wrote:
    >>
    >> On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
    >> > You have started a very long transaction, which holds its xid and prevents
    >> > vacuum from freezing it. But what if the backend is stuck not inside a
    >> > transaction? Maybe we can just hardcode a huge delay (not inside the
    >> > transaction) or stop process execution via breakpoint in gdb. If we will use it
    >> > instead of a long query, I think that this error may be reproducible.
    >> >
    >> But how could this happen in real scenarios? I mean, how the backend
    >> could be stuck outside a transaction?
    >>
    >
    > For now, I cannot come up with a situation where it may be possible.
    > Perhaps, such a lagging may occur during network communication,
    > but I couldn't reproduce it. Maybe other people know how we can achieve
    > this?
    >
    Reading more the code I understand that once the a NOTIFY command is
    received by a backend (and the transaction is committed) it will
    emedialy signal all other listener backends and if the listener backend
    is in idle it will consume the notification and then send it back to the
    client as a PqMsg_NotificationResponse, so if there is a network delay
    to send the notification from the listener backend back to the client I
    don't think that it would be possible to get this error, because the
    message was already dispatched by the backend and it will eventually get
    to the client and once the notification is dispatched the backend
    doesn't need to track it anymore (the queue pointers of the backend are
    advanced after the dispatch).
    
    Assuming that every SQL command is wrapped into a transaction (if it's
    not already inside in) I think a busy listener backend will always
    prevent the vacuum from freezing clog files past from its current xid,
    so any notification that is sent while the backend is busy will not have
    their transaction status removed from clog files anyway.
    
    Is all these understandings and assumptions correct or am I missing
    something here?
    
    > I think that if such a situation may be possible, the suggestion to  delete
    > messages  will no longer be relevant. Therefore, first of all, I would like to
    > clarify this issue.
    >
    >From what I've understood until now it seems to me that this can happen
    only if we have a notification on the queue without any listener, so the
    notification may stay on the queue from a long time and a vacuum freeze
    can be executed during this time and then when we have a new listener
    (even for a different channel) it will fail to advance the pointers at
    listener setup(Exec_ListenPreCommit()) because it would not be able to
    get the transition status of this very old notification.
    
    (please note that I'm not trying to invalidate your concern, I'm also
    have this concern but unfortunately I'm unable to reproduce it and I'm
    just sharing my thoughts to see if this issue is really possible or not)
    
    --
    Matheus Alcantara
    
    
    
    
  11. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-20T21:22:25Z

    On Tue Aug 19, 2025 at 2:40 PM -03, Masahiko Sawada wrote:
    > However, I have a more fundamental concern regarding the LISTEN/NOTIFY
    > implementation. Since vacuum doesn't consider the XIDs of notification
    > entries, there might be a critical issue with the truncation of clog
    > entries that LISTEN/NOTIFY still requires. As I understand it,
    > notification queue entries aren't ordered by XID, and it's possible
    > for a notification with an older XID to be positioned at the queue's
    > head. If vacuum freeze then truncates the corresponding clogs,
    > listeners attempting to retrieve this notification would fail to
    > obtain the transaction status. To address this, we likely need to
    > either implement Álvaro's suggestion[1] to make vacuum aware of the
    > oldest XID in the notification queue, or develop a mechanism to
    > remove/freeze XIDs of the notification entries.
    >
    Thanks for the comments! Please see my reply at [1] that I mention that
    I don't think that is too easy to have this specific scenario of a busy
    backend loose dropped notifications.
    
    [1] https://www.postgresql.org/message-id/DC7KGTXW3QSG.OZA24HONT78J%40gmail.com
    
    --
    Matheus Alcantara
    
    
    
    
  12. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Masahiko Sawada <sawada.mshk@gmail.com> — 2025-08-21T22:25:34Z

    On Wed, Aug 20, 2025 at 2:18 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Tue Aug 19, 2025 at 2:37 PM -03, Daniil Davydov wrote:
    > > Hi,
    > >
    > > On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
    > > <matheusssilv97@gmail.com> wrote:
    > >>
    > >> On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
    > >> > You have started a very long transaction, which holds its xid and prevents
    > >> > vacuum from freezing it. But what if the backend is stuck not inside a
    > >> > transaction? Maybe we can just hardcode a huge delay (not inside the
    > >> > transaction) or stop process execution via breakpoint in gdb. If we will use it
    > >> > instead of a long query, I think that this error may be reproducible.
    > >> >
    > >> But how could this happen in real scenarios? I mean, how the backend
    > >> could be stuck outside a transaction?
    > >>
    > >
    > > For now, I cannot come up with a situation where it may be possible.
    > > Perhaps, such a lagging may occur during network communication,
    > > but I couldn't reproduce it. Maybe other people know how we can achieve
    > > this?
    > >
    > Reading more the code I understand that once the a NOTIFY command is
    > received by a backend (and the transaction is committed) it will
    > emedialy signal all other listener backends and if the listener backend
    > is in idle it will consume the notification and then send it back to the
    > client as a PqMsg_NotificationResponse, so if there is a network delay
    > to send the notification from the listener backend back to the client I
    > don't think that it would be possible to get this error, because the
    > message was already dispatched by the backend and it will eventually get
    > to the client and once the notification is dispatched the backend
    > doesn't need to track it anymore (the queue pointers of the backend are
    > advanced after the dispatch).
    >
    > Assuming that every SQL command is wrapped into a transaction (if it's
    > not already inside in) I think a busy listener backend will always
    > prevent the vacuum from freezing clog files past from its current xid,
    > so any notification that is sent while the backend is busy will not have
    > their transaction status removed from clog files anyway.
    
    What about backend processes that don't have any xid or xmin (i.e.,
    are read-only query and in idle-in-transaction)?
    
    IIUC we process the notification entries at the beginning of the
    server loop (see L4608 in postgres.c) and when reading a command (via
    ProcessClientReadInterrupt()), but it seems to me that if a process is
    in idle-in-transaction state it doesn't process the entries unless the
    transaction is committed. I've reproduced the missing clog entry error
    even if we have a notification on the queue with a valid listener,
    with the following steps:
    
    1. Initialize the database cluster.
    
    2. Execute "ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;".
    
    3. Start one psql session and execute:
    
    -- Session 1
    =# listen s;
    LISTEN
    =# begin;
    BEGIN
    
    (keep this session open to leave the process idle-in-transaction.)
    
    4. Open another psql session and execute:
    
    -- Session 2
    =# begin;
    BEGIN
    =# select txid_current();
     txid_current
    --------------
              756
    (1 row)
    =# notify s;
    NOTIFY
    =# commit;
    COMMIT
    
    The notification to the channel 's' should be available for the
    session-1's transaction.
    
    5. Consume enough XIDs to truncate clog entries.
    
    -- Session 2
    =# create extension xid_wraparound;
    CREATE EXTENSION
    =# select consume_xids(10_000_000);
    NOTICE:  consumed 10000000 / 10000000 XIDs, latest 0:10000757
     consume_xids
    --------------
         10000757
    (1 row)
    =# select txid_current();
     txid_current
    --------------
         10000758
    (1 row)
    
    6. Execute vacuum freeze on all databases:
    
    $ vacuumdb --all --freeze
    vacuumdb: vacuuming database "postgres"
    vacuumdb: vacuuming database "template0"
    vacuumdb: vacuuming database "template1"
    
    $ psql -d postgres -c "select datname, datfrozenxid, age(datfrozenxid)
    from pg_database"
      datname  | datfrozenxid | age
    -----------+--------------+-----
     postgres  |     10000759 |  11
     template0 |     10000759 |  11
     template1 |     10000759 |  11
    (3 rows)
    
    7. On the first psql session:
    
    -- Session 1
    =# commit;
    COMMIT
    ERROR:  could not access status of transaction 756
    DETAIL:  Could not open file "pg_xact/0000": No such file or directory.
    
    Regards,
    
    -- 
    Masahiko Sawada
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  13. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-21T23:14:47Z

    On Thu Aug 21, 2025 at 7:25 PM -03, Masahiko Sawada wrote:
    > What about backend processes that don't have any xid or xmin (i.e.,
    > are read-only query and in idle-in-transaction)?
    >
    > IIUC we process the notification entries at the beginning of the
    > server loop (see L4608 in postgres.c) and when reading a command (via
    > ProcessClientReadInterrupt()), but it seems to me that if a process is
    > in idle-in-transaction state it doesn't process the entries unless the
    > transaction is committed. I've reproduced the missing clog entry error
    > even if we have a notification on the queue with a valid listener,
    > with the following steps:
    >
    Ok, now we can reproduce this, thank you! So I think that we need a way
    to teach the VACUUM FREEZE about old xid's on async queue.
    
    I'll work on this considering the initial Daniil comments at [1]
    
    [1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com
    
    --
    Matheus Alcantara
    
    
    
    
    
  14. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-28T22:31:35Z

    On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
    > I'll work on this considering the initial Daniil comments at [1]
    >
    > [1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com
    >
    I've been working on this on the last few days, please see the attached
    patch version.
    
    In this new version I tried to follow the suggestion from Daniil of
    scanning all pages from tail to head of the async queue.
    
    I tried to use other data structures like a list or a hashmap to store
    the xids but it shows to me more complicated due to duplicated
    information being stored and also considering that this data structure
    should be stored on shared memory, so I just tried to get the oldest xid
    with the information that we already have on the async queue.
    
    To find the oldest xid on the async queue I basically read all
    AsyncQueueEntry's on async queue pages on SLRU cache from tail to head,
    skipping only uncommitted transactions and notifications for different
    databases.
    
    The code that reads the pages and the entries within the page is a bit
    coupled with the send notification logic. The important functions that
    execute this is asyncQueueReadAllNotifications() which read the pages
    from SLRU cache and then call the asyncQueueProcessPageEntries() which
    read the entries within the page and send the notification to the client
    if needed. Since I need a very similar code to read the notification
    entries on the queue to check the min xid I think that it would be good
    to have an API that just works like an iterator and returns the next
    queue entry to work on so that it can be used for both cases.
    
    In this patch I created an AsyncQueueIterator that iterates over pages
    and AsyncQueueEntry's within the pages. The code is based on the
    asyncQueueReadAllNotifications() and asyncQueueProcessPageEntries()
    functions. For now I'm just implementing the iterator logic and use at
    AsyncQueueMinXid() to get the min xid on the queue to make the review
    more easier but I think that we can have another patch that refactor
    asyncQueueReadAllNotifications() and asyncQueueProcessPageEntries()
    functions to use this iterator approach and avoid duplicated code.
    
    I'm also adding a TAP test that reproduces the issue, but I'm not sure if
    it's declared on the best path.
    
    I think that it would be good to perform some benchmark tests to see if
    we have performance issues when reading very long async queues.
    
    Thoughts?
    
    --
    Matheus Alcantara
    
  15. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Jacques Combrink <jacques@quantsolutions.co.za> — 2025-09-01T14:06:58Z

    I came across this discussion after we had this problem at least twice 
    now at work.
    I read through the discussion to see if this is the problem we are 
    experiencing.
    At first, I thought this was not exactly our problem, because of the way 
    it was showed to be reproduced.
    
    With a long running transaction that is listening, and then doing all 
    the other steps like like running through transactions until a second 
    pg_xact file is created, and then vacuuming etc etc...
    
    But, I felt like we don't do this, so it cannot be the way we run into 
    the problem. Specifically, having a listener as a long running transaction.
    
    
    So I set out to get it into the broken state, and here is how I can 
    reproduce it without long running queries:
    
    1. Create another database
    
    postgres=# CREATE DATABASE test;
    CREATE DATABASE
    
    2. Create a listener on the postgres database.
    
    postgres=# LISTEN xx;
    LISTEN
    
    3. Create notifies for the test database.
    
    Here I struggled to get a stuck notification in the queue (SELECT 
    pg_notification_queue_usage();).
    
    It can happen with only a single notify from a  psql connection, but I 
    get a higher hit rate with the following:
    
    Create a notify.sql with the only contents being:
    
    ```
    NOTIFY s, 'yappers';
    ```
    Then run this against the test database with pgbench.
    
    pgbench -n -c 80 -j 20 -t 1000 -f notify.sql test
    
    
    4. Confirm that there is now something stuck in the queue:
    
    postgres=# SELECT pg_notification_queue_usage();
      pg_notification_queue_usage
    -----------------------------
              9.5367431640625e-07
    (1 row)
    
    If this still shows 0, then run step 3 again.
    
    5. Consume xid's. I create a file consume.sql with the only contents being:
    ```
    SELECT txid_current();
    ```
    
    Then:
    
    pgbench -n -c 80 -j 30 -t 100000 -f consume.sql postgres
    
    6. Verify that a new file is created in the pg_xact folder.
    
    If not, just run the previous step again.
    
    7. Run vacuum freeze. (Remember to allow connections to template0 
    beforehand with `ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;`)
    
    $ vacuumdb --all --freeze
    vacuumdb: vacuuming database "postgres"
    vacuumdb: vacuuming database "template0"
    vacuumdb: vacuuming database "template1"
    vacuumdb: vacuuming database "test"
    
    8. Connect to the test database and try execute a listener.
    test=# LISTEN anything;
    ERROR:  could not access status of transaction 81086
    DETAIL:  Could not open file "pg_xact/0000": No such file or directory.
    test=#
    
    
    Extra weirdness:
    
    In step 2 create a connection to the test database and LISTEN there on 
    any channel, even the one you are notifying to:
    And you don't have to run anything on that connection, there is no 
    backend_xmin on that connection, you don't start a transaction nothing, 
    you just run `LISTEN something;`
    Then after step 7, as long as you don't close this connection, you will 
    not get an error when you try to set up a listener on the test database, 
    even on a new connection to the test database.
    
    What I'm saying is after step 7 you can have two connections. One to 
    postgres database with active listener, and one to test database with 
    active listener.
    As long as the postgres one is open the queue is stuck, ie `SELECT 
    pg_notification_queue_usage();` returns non-zero.
    As long as the test database connection is open, new listeners do not 
    experience the error.
    
    If you close the test database connection, but leave the postgres 
    connection, from now on any listener you try to create on test database 
    will error.
    
    If you close the postgres connection, the queue clears immediately and 
    new LISTEN's on the test database will work.
    This means it is possible to get rid of the problem without restarting 
    the postmaster if you can close connections until you close the one that 
    "caused" the problem.
    Don't know if this is actually useful, but to me it seemed like everyone 
    believed it to be impossible to recover without restarting, so it's 
    something at least.
    
    
    TLDR:
    active listener on one database causes notify on another database to get 
    stuck.
    At no point could I get a stuck notify if I don't have a listener on at 
    least one other database than the one I am notifying on. See the Extra 
    weirdness section.
    At no point do you need to have any other queries running, there is 
    never an idle in transaction query needed for bad timing with the vacuum.
    
    I hope I explained everything well enough so that one of you smart 
    people can find and fix the problem.
    
    
    
    
    
  16. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Yura Sokolov <y.sokolov@postgrespro.ru> — 2025-09-02T07:37:59Z

    29.08.2025 01:31, Matheus Alcantara пишет:
    > On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
    >> I'll work on this considering the initial Daniil comments at [1]
    >>
    >> [1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com
    >>
    > I've been working on this on the last few days, please see the attached
    > patch version.
    > 
    > In this new version I tried to follow the suggestion from Daniil of
    > scanning all pages from tail to head of the async queue.
    
    Since we still need to scan those pages, couldn't we mark finished
    transactions as committed/aborted?
    This way we may to not hold datfrozenxid for a long time and will allow
    both safe clog truncation and safe async queue notification.
    More over, most notifications could be marked as completed in
    AtCommit_Notify already. So there is a need to recheck a few notifications
    that were written but not marked in AtCommit_Notify.
    
    Since AsyncQueueEntry field is used only to check visibility, looks like it
    is safe to set it to FrozenTransactionId.
    
    -- 
    regards
    Yura Sokolov aka funny-falcon
    
    
    
    
  17. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-03T19:28:21Z

    Hi,
    
    On Tue, Sep 2, 2025 at 10:38 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
    >
    > 29.08.2025 01:31, Matheus Alcantara пишет:
    > > On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
    > >> I'll work on this considering the initial Daniil comments at [1]
    > >>
    > >> [1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com
    > >>
    > > I've been working on this on the last few days, please see the attached
    > > patch version.
    > >
    > > In this new version I tried to follow the suggestion from Daniil of
    > > scanning all pages from tail to head of the async queue.
    >
    > Since we still need to scan those pages, couldn't we mark finished
    > transactions as committed/aborted?
    > This way we may to not hold datfrozenxid for a long time and will allow
    > both safe clog truncation and safe async queue notification.
    > More over, most notifications could be marked as completed in
    > AtCommit_Notify already. So there is a need to recheck a few notifications
    > that were written but not marked in AtCommit_Notify.
    >
    > Since AsyncQueueEntry field is used only to check visibility, looks like it
    > is safe to set it to FrozenTransactionId.
    
    I like the idea to make queue records more self-sufficient, so we
    don't need to check a transaction's status to determine if a
    notification should be sent. What if we somehow mark every record of
    an aborted transaction, so listeners can skip it without checking
    transaction status? Also if the record is not marked as "aborted" we
    also can send it without checking transaction status. IIUC
    AtAbort_Notify is called before any listener can read records of the
    aborted transaction (is it correct?), so we can do such "marking" in
    AtAbort_Notify. And the listen/notify subsystem already has a similar
    "marking" mechanism. At the end of every queue page we have a "dummy"
    record which is skipped by every listener. Listeners skip it because
    record's 'dboid' equals InvalidOid (listeners ignore any record where
    the 'dboid' is different from their own). In the same way aborted
    transactions can set 'dboid' to InvalidOid for their records in
    AtAbort_Notify.
    
    Some benefits of this approach:
    1) Only aborted transactions would need to perform extra work.
    2) Listeners don't need to check notify transaction status, but ofc
    they still need to wait if the notify transaction is 'in progress'.
    3) Vacuum don't need to be listen/notify aware.
    4) We don't need to do an extra scan of the whole queue.
    
    What do you think?
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  18. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-03T20:35:49Z

    On Mon Sep 1, 2025 at 11:06 AM -03, Jacques Combrink wrote:
    > TLDR:
    > active listener on one database causes notify on another database to get
    > stuck.
    > At no point could I get a stuck notify if I don't have a listener on at
    > least one other database than the one I am notifying on. See the Extra
    > weirdness section.
    > At no point do you need to have any other queries running, there is
    > never an idle in transaction query needed for bad timing with the vacuum.
    >
    > I hope I explained everything well enough so that one of you smart
    > people can find and fix the problem.
    >
    The  long running transaction steps is just an example that we can lose
    notifications using the first patch from Daniil that Alex has shared on
    [1]. The steps that you've shared is just another way to trigger the
    issue but it's similar to the steps that Alex also shared on [1].
    
    All these different ways to trigger the error face the same underlying
    problem: If a notification is keep for too long on the queue that vacuum
    freeze can run and truncate clog files that contains transaction
    information of this notification the error will happen.
    
    The patch that I've attached on [2] aims to fix the issue following the
    steps that you've shared, but during the tests I've found a stack
    overflow bug on AsyncQueueIterNextNotification() due to the number of
    notifications. I'm attaching a new version that fix this bug and I tried
    to reproduce your steps with this new version and the issue seems to be
    fixed.
    
    Note that notifications that were added without any previous LISTEN will
    block the xid advance during VACUUM FREEZE until we have a listener on
    the database that owns these notifications. The XXX comment on vacuum.c
    is about this problem.
    
    [1] https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw%40mail.gmail.com
    [2] https://www.postgresql.org/message-id CAFY6G8cJm73_MM9SuynZUqtqcaTuepUDgDuvS661oLW7U0dgsg%40mail.gmail.com
    
    --
    Matheus Alcantara
    
  19. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-03T21:04:47Z

    Thanks for the comments!
    
    On Tue Sep 2, 2025 at 4:37 AM -03, Yura Sokolov wrote:
    > 29.08.2025 01:31, Matheus Alcantara пишет:
    >> On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
    >>> I'll work on this considering the initial Daniil comments at [1]
    >>>
    >>> [1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com
    >>>
    >> I've been working on this on the last few days, please see the attached
    >> patch version.
    >>
    >> In this new version I tried to follow the suggestion from Daniil of
    >> scanning all pages from tail to head of the async queue.
    >
    > Since we still need to scan those pages, couldn't we mark finished
    > transactions as committed/aborted?
    > This way we may to not hold datfrozenxid for a long time and will allow
    > both safe clog truncation and safe async queue notification.
    > More over, most notifications could be marked as completed in
    > AtCommit_Notify already. So there is a need to recheck a few notifications
    > that were written but not marked in AtCommit_Notify.
    >
    > Since AsyncQueueEntry field is used only to check visibility, looks like it
    > is safe to set it to FrozenTransactionId.
    >
    Maybe we could have a boolean "committed" field on AsyncQueueEntry and
    check this field before sending the notification instead of call
    TransactionIdDidCommit()? Is something similar what you are proposing?
    
    We create the AsyncQueueEntry's and add into the SLRU at
    PreCommit_Notify(), so to mark these entries as committed on
    AtCommit_Notify() we would need a way to find these entries on the SLRU
    from the pendingNotifies->events that contains the notifications added
    on the current transaction being committed.
    
    This idea looks promising to me TBH, I'm just not sure if it's
    completely safe to mark an AsyncQueueEntry on the queue as committed
    without checking on TransactionIdDidCommit(). I would like to read more
    opinions about this before working on the next version based on this
    idea.
    
    --
    Matheus Alcantara
    
    
    
    
  20. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-03T21:14:40Z

    On Wed Sep 3, 2025 at 4:28 PM -03, Arseniy Mukhin wrote:
    >> Since we still need to scan those pages, couldn't we mark finished
    >> transactions as committed/aborted?
    >> This way we may to not hold datfrozenxid for a long time and will allow
    >> both safe clog truncation and safe async queue notification.
    >> More over, most notifications could be marked as completed in
    >> AtCommit_Notify already. So there is a need to recheck a few notifications
    >> that were written but not marked in AtCommit_Notify.
    >>
    >> Since AsyncQueueEntry field is used only to check visibility, looks like it
    >> is safe to set it to FrozenTransactionId.
    >
    > I like the idea to make queue records more self-sufficient, so we
    > don't need to check a transaction's status to determine if a
    > notification should be sent. What if we somehow mark every record of
    > an aborted transaction, so listeners can skip it without checking
    > transaction status? Also if the record is not marked as "aborted" we
    > also can send it without checking transaction status. IIUC
    > AtAbort_Notify is called before any listener can read records of the
    > aborted transaction (is it correct?), so we can do such "marking" in
    > AtAbort_Notify. And the listen/notify subsystem already has a similar
    > "marking" mechanism. At the end of every queue page we have a "dummy"
    > record which is skipped by every listener. Listeners skip it because
    > record's 'dboid' equals InvalidOid (listeners ignore any record where
    > the 'dboid' is different from their own). In the same way aborted
    > transactions can set 'dboid' to InvalidOid for their records in
    > AtAbort_Notify.
    >
    > Some benefits of this approach:
    > 1) Only aborted transactions would need to perform extra work.
    > 2) Listeners don't need to check notify transaction status, but ofc
    > they still need to wait if the notify transaction is 'in progress'.
    > 3) Vacuum don't need to be listen/notify aware.
    > 4) We don't need to do an extra scan of the whole queue.
    >
    > What do you think?
    >
    IIUC we don't store notifications of aborted transactions on the queue.
    On PreCommit_Notify we add the notifications on the queue and on
    Commit_Notify() we signal the backends.
    
    Or I'm missing something here?
    
    --
    Matheus Alcantara
    
    
    
    
    
  21. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Rishu Bagga <rishu.postgres@gmail.com> — 2025-09-03T23:51:20Z

    On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara
    
    <matheusssilv97@gmail.com> wrote:
    
    
    > IIUC we don't store notifications of aborted transactions on the
    
    > queue. On PreCommit_Notify we add the notifications on the queue
    
    > and on Commit_Notify() we signal the backends.
    
    >
    
    > Or I'm missing something here?
    
    
    My understanding is that something could go wrong in between
    
    PreCommit_Notify and AtCommit_Notify, which could cause the
    
    transaction to abort, and the notification might be in the queue at
    
    this point, even though the transaction aborted, hence the dependency
    
    on the commit log.
    
    
    > Maybe we could have a boolean "committed" field on AsyncQueueEntry
    
    > and check this field before sending the notification instead of
    
    > call TransactionIdDidCommit()? Is something similar what you are
    
    > proposing?
    
    
    I like this direction, as it opens up the ability to remove the
    
    global lock held from PreCommit_Notify to the end of the transaction.
    
    In the same vein as this idea, I was experimenting with a patch
    
    inspired by Tom Lane’s idea in [1] where we write the actual
    
    notification data into the WAL, and just write the db, lsn, and xid
    
    into the notify queue in AtCommit_Notify, so the notify queue only
    
    contains information about committed transactions. Unfortunately,
    
    the additional WAL write overhead in this approach seemed to outweigh
    
    the benefits of removing the lock.
    
    
    Following that idea - I think perhaps we could have two queues in the
    
    notify system, one that stores the notification data, and another
    
    that stores the position of the committed transaction’s notification,
    
    which we would append to in AtCommit_Notify. Then the listener would
    
    go through the position queue, and find / read the notifications from
    
    the notify data queue.
    
    
    [1] https://www.postgresql.org/message-id/1878165.1752858390%40sss.pgh.pa.us
    
  22. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-04T14:14:22Z

    Hi,
    
    On Thu, Sep 4, 2025 at 2:51 AM Rishu Bagga <rishu.postgres@gmail.com> wrote:
    >
    > On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara
    >
    > <matheusssilv97@gmail.com> wrote:
    >
    >
    > > IIUC we don't store notifications of aborted transactions on the
    >
    > > queue. On PreCommit_Notify we add the notifications on the queue
    >
    > > and on Commit_Notify() we signal the backends.
    >
    > >
    >
    > > Or I'm missing something here?
    >
    >
    > My understanding is that something could go wrong in between
    >
    > PreCommit_Notify and AtCommit_Notify, which could cause the
    >
    > transaction to abort, and the notification might be in the queue at
    >
    > this point, even though the transaction aborted, hence the dependency
    >
    > on the commit log.
    
    I have the same understanding.
    
    >
    > > Maybe we could have a boolean "committed" field on AsyncQueueEntry
    >
    > > and check this field before sending the notification instead of
    >
    > > call TransactionIdDidCommit()? Is something similar what you are
    >
    > > proposing?
    >
    >
    > I like this direction, as it opens up the ability to remove the
    >
    > global lock held from PreCommit_Notify to the end of the transaction.
    >
    > In the same vein as this idea, I was experimenting with a patch
    >
    > inspired by Tom Lane’s idea in [1] where we write the actual
    >
    > notification data into the WAL, and just write the db, lsn, and xid
    >
    > into the notify queue in AtCommit_Notify, so the notify queue only
    >
    > contains information about committed transactions. Unfortunately,
    >
    > the additional WAL write overhead in this approach seemed to outweigh
    >
    > the benefits of removing the lock.
    
    
    Interesting, have you shared your patch and results somewhere? IIUC
    Tom's approach resolves this bug, because with it we have queue
    entries produced by committed transactions only, so we don't need to
    check their status and don't have dependency on clog.
    
    >
    > Following that idea - I think perhaps we could have two queues in the
    >
    > notify system, one that stores the notification data, and another
    >
    > that stores the position of the committed transaction’s notification,
    >
    > which we would append to in AtCommit_Notify. Then the listener would
    >
    > go through the position queue, and find / read the notifications from
    >
    > the notify data queue.
    >
    
    If I'm not wrong, by the time we got into AtCommit_Notify we no longer
    hold the global lock, so we can't provide notifications in the commit
    order. I think it would work if we add entries to the position queue
    concurrently with adding commit entries to the wal, as in Tom's idea
    3rd step.
    Another question: how to truncate notification data queue. It sounds
    like it will be more difficult to figure out what data in the queue is
    still needed.
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  23. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Rishu Bagga <rishu.postgres@gmail.com> — 2025-09-04T23:02:16Z

    On Thu, Sep 4, 2025 at 7:14 AM Arseniy Mukhin
    <arseniy.mukhin.dev@gmail.com> wrote:
    
    > Interesting, have you shared your patch and results somewhere? IIUC
    > Tom's approach resolves this bug, because with it we have queue
    > entries produced by committed transactions only, so we don't need to
    > check their status and don't have dependency on clog.
    
    I was eventually able to get better numbers with the patch after using
    a substantial number of connections, so I have now posted it here. [1].
    
    > Another question: how to truncate notification data queue. It sounds
    > like it will be more difficult to figure out what data in the queue is
    > still needed.
    
    Yeah, I will think about this some more, and follow up with another
    patch.
    
    Thanks,
    Rishu
    
    [1] https://www.postgresql.org/message-id/CAK80%3DjipUfGC%2BU
    QSzeA4oCP9daRtHZGm2SQZWLxC9NWmVTDtRQ%40mail.gmail.com
    
    
    
    
  24. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-05T10:44:11Z

    On Wed Sep 3, 2025 at 8:51 PM -03, Rishu Bagga wrote:
    > On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara
    >
    > <matheusssilv97@gmail.com> wrote:
    >
    >
    >> IIUC we don't store notifications of aborted transactions on the
    >
    >> queue. On PreCommit_Notify we add the notifications on the queue
    >
    >> and on Commit_Notify() we signal the backends.
    >
    >>
    >
    >> Or I'm missing something here?
    >
    >
    > My understanding is that something could go wrong in between
    >
    > PreCommit_Notify and AtCommit_Notify, which could cause the
    >
    > transaction to abort, and the notification might be in the queue at
    >
    > this point, even though the transaction aborted, hence the dependency
    >
    > on the commit log.
    >
    Ok, I agree that that this may happen but I don't see this as a common
    case to fix the issue based on this behaviour. I think that we check the
    transaction status also to skip notifications that were added on the
    queue by transactions that are not fully committed yet, and I see this
    scenario as a more common case but I could be wrong.
    
    --
    Matheus Alcantara
    
    
    
    
  25. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-05T10:45:32Z

    On Thu Sep 4, 2025 at 8:02 PM -03, Rishu Bagga wrote:
    > On Thu, Sep 4, 2025 at 7:14 AM Arseniy Mukhin
    > <arseniy.mukhin.dev@gmail.com> wrote:
    >
    >> Interesting, have you shared your patch and results somewhere? IIUC
    >> Tom's approach resolves this bug, because with it we have queue
    >> entries produced by committed transactions only, so we don't need to
    >> check their status and don't have dependency on clog.
    >
    > I was eventually able to get better numbers with the patch after using
    > a substantial number of connections, so I have now posted it here. [1].
    >
    Thanks for sharing the patch! I'll reserve some time to test and review.
    
    --
    Matheus Alcantara
    
    
    
    
  26. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-05T12:56:48Z

    On Fri, Sep 5, 2025 at 1:44 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Wed Sep 3, 2025 at 8:51 PM -03, Rishu Bagga wrote:
    > > On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara
    > >
    > > <matheusssilv97@gmail.com> wrote:
    > >
    > >
    > >> IIUC we don't store notifications of aborted transactions on the
    > >
    > >> queue. On PreCommit_Notify we add the notifications on the queue
    > >
    > >> and on Commit_Notify() we signal the backends.
    > >
    > >>
    > >
    > >> Or I'm missing something here?
    > >
    > >
    > > My understanding is that something could go wrong in between
    > >
    > > PreCommit_Notify and AtCommit_Notify, which could cause the
    > >
    > > transaction to abort, and the notification might be in the queue at
    > >
    > > this point, even though the transaction aborted, hence the dependency
    > >
    > > on the commit log.
    > >
    > Ok, I agree that that this may happen but I don't see this as a common
    > case to fix the issue based on this behaviour. I think that we check the
    > transaction status also to skip notifications that were added on the
    > queue by transactions that are not fully committed yet, and I see this
    > scenario as a more common case but I could be wrong.
    >
    
    IIUC we don't need clog to check if the notification transaction is
    still in progress, we use a snapshot for that (XidInMVCCSnapshot()).
    If we see that the notification transaction is still in progress, we
    postpone processing of that notification. If we see that notification
    transaction is not in progress, then we check its status with
    TransactionIdDidCommit() (using clog) and only then fail if clog
    doesn't have notification transcation data (as a quick check, we can
    see in the stack trace that the failure occurs during the call to
    TransactionIdDidCommit, not XidInMVCCSnapshot).
    
    So we need to check notification transaction status only to understand
    if a transaction was committed and we can send notification or it was
    aborted (somewhere in between PreCommit_Notify and AtCommit_Notify)
    and we should skip notification. And the solution that Yura Sokolov
    suggested is to introduce a 'commited' flag, so we can check it
    instead of calling TransactionIdDidCommit. Another option is to make
    aborted transactions to clean up after themselves in AtAbort_Notify(),
    so if listener see notification in the queue and transaction that
    wrote it is not in progress, then such notification was always created
    by committed transaction and we also don't need to call
    TransactionIdDidCommit. One way for aborted transactions to clean up
    is to set its notifications dbOid to 0, so listeners skip them.
    Another option I think is to set queue HEAD back to the position where
    it was when the aborted transaction started to write to the queue. It
    also makes such aborted transaction notifications 'invisible' for
    listeners. We have a global lock and there is always only one writer,
    so it sounds possible. And there is another option which is the patch
    that Rishu Bagga is working on. In the patch all transactions write to
    the listen/notify queue only after they wrote a commit record to the
    WAL, so all notifications in the queue are always by committed
    transactions and we don't need to call TransactionIdDidCommit.
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  27. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-06T14:14:54Z

    On Fri Sep 5, 2025 at 9:56 AM -03, Arseniy Mukhin wrote:
    >> Ok, I agree that that this may happen but I don't see this as a common
    >> case to fix the issue based on this behaviour. I think that we check the
    >> transaction status also to skip notifications that were added on the
    >> queue by transactions that are not fully committed yet, and I see this
    >> scenario as a more common case but I could be wrong.
    >>
    >
    > IIUC we don't need clog to check if the notification transaction is
    > still in progress, we use a snapshot for that (XidInMVCCSnapshot()).
    > If we see that the notification transaction is still in progress, we
    > postpone processing of that notification. If we see that notification
    > transaction is not in progress, then we check its status with
    > TransactionIdDidCommit() (using clog) and only then fail if clog
    > doesn't have notification transcation data (as a quick check, we can
    > see in the stack trace that the failure occurs during the call to
    > TransactionIdDidCommit, not XidInMVCCSnapshot).
    >
    > So we need to check notification transaction status only to understand
    > if a transaction was committed and we can send notification or it was
    > aborted (somewhere in between PreCommit_Notify and AtCommit_Notify)
    > and we should skip notification. And the solution that Yura Sokolov
    > suggested is to introduce a 'commited' flag, so we can check it
    > instead of calling TransactionIdDidCommit. Another option is to make
    > aborted transactions to clean up after themselves in AtAbort_Notify(),
    > so if listener see notification in the queue and transaction that
    > wrote it is not in progress, then such notification was always created
    > by committed transaction and we also don't need to call
    > TransactionIdDidCommit. One way for aborted transactions to clean up
    > is to set its notifications dbOid to 0, so listeners skip them.
    > Another option I think is to set queue HEAD back to the position where
    > it was when the aborted transaction started to write to the queue. It
    > also makes such aborted transaction notifications 'invisible' for
    > listeners. We have a global lock and there is always only one writer,
    > so it sounds possible. And there is another option which is the patch
    > that Rishu Bagga is working on. In the patch all transactions write to
    > the listen/notify queue only after they wrote a commit record to the
    > WAL, so all notifications in the queue are always by committed
    > transactions and we don't need to call TransactionIdDidCommit.
    >
    You are right, I missed the XidInMVCCSnapshot() call, sorry about that
    and thanks very much for all the explanation!
    
    I'll keep on hold the development of new patch versions for this thread
    and focus on review and test the patch from Rishu at [1] to see if we
    can make progress using the WAL approach.
    
    
    [1] https://www.postgresql.org/message-id/CAK80%3DjipUfGC%2BUQSzeA4oCP9daRtHZGm2SQZWLxC9NWmVTDtRQ%40mail.gmail.com
    
    --
    Matheus Alcantara
    
    
    
    
    
  28. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Masahiko Sawada <sawada.mshk@gmail.com> — 2025-09-15T17:40:40Z

    On Sat, Sep 6, 2025 at 7:15 AM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Fri Sep 5, 2025 at 9:56 AM -03, Arseniy Mukhin wrote:
    > >> Ok, I agree that that this may happen but I don't see this as a common
    > >> case to fix the issue based on this behaviour. I think that we check the
    > >> transaction status also to skip notifications that were added on the
    > >> queue by transactions that are not fully committed yet, and I see this
    > >> scenario as a more common case but I could be wrong.
    > >>
    > >
    > > IIUC we don't need clog to check if the notification transaction is
    > > still in progress, we use a snapshot for that (XidInMVCCSnapshot()).
    > > If we see that the notification transaction is still in progress, we
    > > postpone processing of that notification. If we see that notification
    > > transaction is not in progress, then we check its status with
    > > TransactionIdDidCommit() (using clog) and only then fail if clog
    > > doesn't have notification transcation data (as a quick check, we can
    > > see in the stack trace that the failure occurs during the call to
    > > TransactionIdDidCommit, not XidInMVCCSnapshot).
    > >
    > > So we need to check notification transaction status only to understand
    > > if a transaction was committed and we can send notification or it was
    > > aborted (somewhere in between PreCommit_Notify and AtCommit_Notify)
    > > and we should skip notification. And the solution that Yura Sokolov
    > > suggested is to introduce a 'commited' flag, so we can check it
    > > instead of calling TransactionIdDidCommit. Another option is to make
    > > aborted transactions to clean up after themselves in AtAbort_Notify(),
    > > so if listener see notification in the queue and transaction that
    > > wrote it is not in progress, then such notification was always created
    > > by committed transaction and we also don't need to call
    > > TransactionIdDidCommit. One way for aborted transactions to clean up
    > > is to set its notifications dbOid to 0, so listeners skip them.
    > > Another option I think is to set queue HEAD back to the position where
    > > it was when the aborted transaction started to write to the queue. It
    > > also makes such aborted transaction notifications 'invisible' for
    > > listeners. We have a global lock and there is always only one writer,
    > > so it sounds possible. And there is another option which is the patch
    > > that Rishu Bagga is working on. In the patch all transactions write to
    > > the listen/notify queue only after they wrote a commit record to the
    > > WAL, so all notifications in the queue are always by committed
    > > transactions and we don't need to call TransactionIdDidCommit.
    > >
    > You are right, I missed the XidInMVCCSnapshot() call, sorry about that
    > and thanks very much for all the explanation!
    >
    > I'll keep on hold the development of new patch versions for this thread
    > and focus on review and test the patch from Rishu at [1] to see if we
    > can make progress using the WAL approach.
    
    While the WAL-based approach discussed on another thread is promising,
    I think it would not be acceptable for back branches as it requires
    quite a lot of refactoring. Given that this is a long-standing bug in
    listen/notify, I think we can continue discussing how to fix the issue
    on backbranches on this thread.
    
    Regards,
    
    -- 
    Masahiko Sawada
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  29. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-18T21:34:40Z

    On Mon Sep 15, 2025 at 2:40 PM -03, Masahiko Sawada wrote:
    > While the WAL-based approach discussed on another thread is promising,
    > I think it would not be acceptable for back branches as it requires
    > quite a lot of refactoring. Given that this is a long-standing bug in
    > listen/notify, I think we can continue discussing how to fix the issue
    > on backbranches on this thread.
    >
    Please see the new attached patch, it has a different implementation
    that I've previously posted which is based on the idea that Arseniy
    posted on [1].
    
    This new version include the "committed" field on AsyncQueueEntry struct
    so that we can use this info when processing the notification instead of
    call TransactionIdDidCommit()
    
    The "committed" field is set to true when the AsyncQueueEntry is being
    added on the SLRU page buffer when the PreCommit_Notify() is called. If
    an error occurs between the PreCommit_Notify() and AtCommit_Notify() the
    AtAbort_Notify() will be called and will set the "committed" field to
    false for the notifications inside the aborted transaction.
    
    It's a bit tricky to know at AtAbort_Notify() which notifications were
    added on the SLRU page buffer by the aborted transaction, so I created a
    new data structure and a global variable to keep track of this
    information. See the commit message for more information.
    
    On the previously patch that I've posted I've created a TAP test to
    reproduce the issue with the VACUUM FREEZE, this new version also
    include this test and also a new test case that use the injection points
    extension to force an error between the PreCommit_Notify() and
    AtCommit_Notify() so that we can ensure that these notifications of an
    aborted transaction are not visible to other listener backends.
    
    
    [1] https://www.postgresql.org/message-id/CAE7r3M%2BXwf0A_aNhu7pYQd7nDQaqaEnyCjtvqg8XsgManmPOUA%40mail.gmail.com
    
    --
    Matheus Alcantara
    
  30. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-19T12:34:21Z

    Hi,
    
    On Fri, Sep 19, 2025 at 12:35 AM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Mon Sep 15, 2025 at 2:40 PM -03, Masahiko Sawada wrote:
    > > While the WAL-based approach discussed on another thread is promising,
    > > I think it would not be acceptable for back branches as it requires
    > > quite a lot of refactoring. Given that this is a long-standing bug in
    > > listen/notify, I think we can continue discussing how to fix the issue
    > > on backbranches on this thread.
    > >
    > Please see the new attached patch, it has a different implementation
    > that I've previously posted which is based on the idea that Arseniy
    > posted on [1].
    >
    
    Thank you for the new version.
    
    > This new version include the "committed" field on AsyncQueueEntry struct
    > so that we can use this info when processing the notification instead of
    > call TransactionIdDidCommit()
    >
    > The "committed" field is set to true when the AsyncQueueEntry is being
    > added on the SLRU page buffer when the PreCommit_Notify() is called. If
    > an error occurs between the PreCommit_Notify() and AtCommit_Notify() the
    > AtAbort_Notify() will be called and will set the "committed" field to
    > false for the notifications inside the aborted transaction.
    >
    > It's a bit tricky to know at AtAbort_Notify() which notifications were
    > added on the SLRU page buffer by the aborted transaction, so I created a
    > new data structure and a global variable to keep track of this
    > information. See the commit message for more information.
    >
    
    I like this approach. We got rid of dependency on clog and don't limit
    vacuum. Several points about the fix:
    
    Is it correct to remember and reuse slru slots here? IIUC we can't do
    it if we don't hold SLRU bank lock, because by the time we get in
    AtAbort_Notify() the queue page could be already evicted. Probably we
    need to use SimpleLruReadPage after we acquire the lock in
    AtAbort_Notify()?
    
    I think adding a boolean 'committed' is a good approach, but what do
    you think about setting the queue head back to the position where
    aborted transaction notifications start? We can do such a reset in
    AtAbort_Notify(). So instead of marking notifications as
    'commited=false' we completely erase them from the queue by moving the
    queue head back. From listeners perspective if there is a notification
    of completed transaction in the queue - it's always a committed
    transaction, so again get rid of TransactionIdDidCommit() call. It
    seems like a simpler approach because we don't need to remember all
    notifications positions in the queue and don't need the additional
    field 'committed'. All we need is to remember the head position before
    we write anything to the queue, and reset it back if there is an
    abort. IIUC Listeners will never send such erased notifications:
    - while the aborted transaction is looking like 'in progress',
    listeners can't send its notifications.
    - by the time the aborted transaction is completed, the head is
    already set back so erased notifications are located after the queue
    head and listeners can't read it.
    
    > On the previously patch that I've posted I've created a TAP test to
    > reproduce the issue with the VACUUM FREEZE, this new version also
    > include this test and also a new test case that use the injection points
    > extension to force an error between the PreCommit_Notify() and
    > AtCommit_Notify() so that we can ensure that these notifications of an
    > aborted transaction are not visible to other listener backends.
    >
    
    I think it's a good test to have. FWIW there is a way to reproduce the
    test condition without the injection point. We can use the fact that
    serializable conflicts are checked after tx adds notifications to the
    queue. Please find the attached patch with the example tap test. Not
    sure if using injections points is more preferable?
    
    
    Best regards,
    Arseniy Mukhin
    
  31. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-19T14:36:43Z

    On Fri Sep 19, 2025 at 9:34 AM -03, Arseniy Mukhin wrote:
    > Hi,
    >
    Thanks for taking a look at this!
    
    > I like this approach. We got rid of dependency on clog and don't limit
    > vacuum. Several points about the fix:
    >
    > Is it correct to remember and reuse slru slots here? IIUC we can't do
    > it if we don't hold SLRU bank lock, because by the time we get in
    > AtAbort_Notify() the queue page could be already evicted. Probably we
    > need to use SimpleLruReadPage after we acquire the lock in
    > AtAbort_Notify()?
    >
    Yeah, I think it make sense. Maybe we could just store the page number
    and offset and call SimpleLruReadPage() inside AtAbort_Notify() to get
    the slotno?
    
    > I think adding a boolean 'committed' is a good approach, but what do
    > you think about setting the queue head back to the position where
    > aborted transaction notifications start? We can do such a reset in
    > AtAbort_Notify(). So instead of marking notifications as
    > 'commited=false' we completely erase them from the queue by moving the
    > queue head back. From listeners perspective if there is a notification
    > of completed transaction in the queue - it's always a committed
    > transaction, so again get rid of TransactionIdDidCommit() call. It
    > seems like a simpler approach because we don't need to remember all
    > notifications positions in the queue and don't need the additional
    > field 'committed'. All we need is to remember the head position before
    > we write anything to the queue, and reset it back if there is an
    > abort. IIUC Listeners will never send such erased notifications:
    > - while the aborted transaction is looking like 'in progress',
    > listeners can't send its notifications.
    > - by the time the aborted transaction is completed, the head is
    > already set back so erased notifications are located after the queue
    > head and listeners can't read it.
    >
    I think that with this approach we may have a scenario where when we
    enter the AtAbort_Notify() the queue head may not be the notification
    from the transaction being aborted but it can be a notification from
    another committed transaction, so resetting the queue head back to the
    position before the aborted transaction would make us lose the
    notification from the committed transaction. Is that make sense? What do
    you think?
    
    > I think it's a good test to have. FWIW there is a way to reproduce the
    > test condition without the injection point. We can use the fact that
    > serializable conflicts are checked after tx adds notifications to the
    > queue. Please find the attached patch with the example tap test. Not
    > sure if using injections points is more preferable?
    >
    Great, I think that if we can have a way to reproduce this without an
    injection point would be better. I dind't look the test yet but I'll try
    to incorporate this on the next patch version. Thanks!
    
    --
    Matheus Alcantara
    
    
    
    
  32. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-19T15:13:54Z

    On Fri, Sep 19, 2025 at 5:36 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Fri Sep 19, 2025 at 9:34 AM -03, Arseniy Mukhin wrote:
    > > Hi,
    > >
    > Thanks for taking a look at this!
    >
    > > I like this approach. We got rid of dependency on clog and don't limit
    > > vacuum. Several points about the fix:
    > >
    > > Is it correct to remember and reuse slru slots here? IIUC we can't do
    > > it if we don't hold SLRU bank lock, because by the time we get in
    > > AtAbort_Notify() the queue page could be already evicted. Probably we
    > > need to use SimpleLruReadPage after we acquire the lock in
    > > AtAbort_Notify()?
    > >
    > Yeah, I think it make sense. Maybe we could just store the page number
    > and offset and call SimpleLruReadPage() inside AtAbort_Notify() to get
    > the slotno?
    
    Yeah, I think it would work.
    
    >
    > > I think adding a boolean 'committed' is a good approach, but what do
    > > you think about setting the queue head back to the position where
    > > aborted transaction notifications start? We can do such a reset in
    > > AtAbort_Notify(). So instead of marking notifications as
    > > 'commited=false' we completely erase them from the queue by moving the
    > > queue head back. From listeners perspective if there is a notification
    > > of completed transaction in the queue - it's always a committed
    > > transaction, so again get rid of TransactionIdDidCommit() call. It
    > > seems like a simpler approach because we don't need to remember all
    > > notifications positions in the queue and don't need the additional
    > > field 'committed'. All we need is to remember the head position before
    > > we write anything to the queue, and reset it back if there is an
    > > abort. IIUC Listeners will never send such erased notifications:
    > > - while the aborted transaction is looking like 'in progress',
    > > listeners can't send its notifications.
    > > - by the time the aborted transaction is completed, the head is
    > > already set back so erased notifications are located after the queue
    > > head and listeners can't read it.
    > >
    > I think that with this approach we may have a scenario where when we
    > enter the AtAbort_Notify() the queue head may not be the notification
    > from the transaction being aborted but it can be a notification from
    > another committed transaction, so resetting the queue head back to the
    > position before the aborted transaction would make us lose the
    > notification from the committed transaction. Is that make sense? What do
    > you think?
    
    I think it's impossible: before pushing anything to the queue we
    acquire global lock in PreCommit_Notify():
    
    LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
    
    While we are holding the lock, no writers can add anything to the
    queue. Then we save head position and add pending notifications to the
    queue. The moment we get in AtAbort_Notify(), we still hold the global
    lock (maybe it is worth adding Assert about it if we start relying on
    it), so we can be sure there are no notifications in the queue after
    the saved head position except ours. So it seems safe but maybe I
    missed something.
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  33. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-22T13:09:20Z

    On Fri Sep 19, 2025 at 12:13 PM -03, Arseniy Mukhin wrote:
    > I think it's impossible: before pushing anything to the queue we
    > acquire global lock in PreCommit_Notify():
    >
    > LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
    >
    > While we are holding the lock, no writers can add anything to the
    > queue. Then we save head position and add pending notifications to the
    > queue. The moment we get in AtAbort_Notify(), we still hold the global
    > lock (maybe it is worth adding Assert about it if we start relying on
    > it), so we can be sure there are no notifications in the queue after
    > the saved head position except ours. So it seems safe but maybe I
    > missed something.
    >
    Thanks for the explanation! I'm just not sure if I understand why do we
    need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) on
    PreCommit_Notify() if we already have the
    LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
    
    See the attached patch that is based on your the previous comment of
    resetting the QUEUE_HEAD at AtAbort_Notify()
    
    --
    Matheus Alcantara
    
  34. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-23T16:11:35Z

    On Mon, Sep 22, 2025 at 4:09 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Fri Sep 19, 2025 at 12:13 PM -03, Arseniy Mukhin wrote:
    > > I think it's impossible: before pushing anything to the queue we
    > > acquire global lock in PreCommit_Notify():
    > >
    > > LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
    > >
    > > While we are holding the lock, no writers can add anything to the
    > > queue. Then we save head position and add pending notifications to the
    > > queue. The moment we get in AtAbort_Notify(), we still hold the global
    > > lock (maybe it is worth adding Assert about it if we start relying on
    > > it), so we can be sure there are no notifications in the queue after
    > > the saved head position except ours. So it seems safe but maybe I
    > > missed something.
    > >
    > Thanks for the explanation! I'm just not sure if I understand why do we
    > need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) on
    > PreCommit_Notify() if we already have the
    > LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
    >
    
    Good question. It seems that it would be enough to hold
    NotifyQueueLock only during the head update, so I don't understand it
    either.
    
    > See the attached patch that is based on your the previous comment of
    > resetting the QUEUE_HEAD at AtAbort_Notify()
    
    Thank you for the new version!
    
    The fix looks exactly how I thought about it. But while thinking about
    why we need to hold NotifyQueueLock in PreCommit_Notify I realized
    there is a concurrency bug in the 'head resetting' approach. I thought
    that listeners hold NotifyQueueLock lock while reading notifications
    from the queue, but now I see that they hold it only while reading the
    current head position. So we can end up in the next situation:
    
    There are 2 backends:  listener and writer (backend with notifications)
    
    
            listener                                   writer
      ----------------------------------------------------------------------
                                             writer wants to commit, so it
                                             adds notifications to the queue
                                             in PreCommit_Notify() and advances
                                             queue head.
    
      ----------------------------------------------------------------------
    listener wants to read notifications. It
    gets into asyncQueueReadAllNotifications()
    and reads the current head (it already
    contains writer's notifications):
    
    
    asyncQueueReadAllNotifications()
              ...
        LWLockAcquire(NotifyQueueLock, LW_SHARED);
        head = QUEUE_HEAD;
        LWLockRelease(NotifyQueueLock);
              ...
    
      ----------------------------------------------------------------------
                                             writer failed to commit, so it
                                             resets queue head in AtAbort_Notify()
                                             and completes the transaction.
    
      ----------------------------------------------------------------------
    listener gets a snapshot where the writer
    is not in progress.
    
            ...
        snapshot = RegisterSnapshot(GetLatestSnapshot());
            ...
    
    
    This way the listener reads the head that includes all writer's
    notifications and a snapshot where the writer is not in progress, so
    nothing stops the listener from sending these notifications and it's
    even possible to have the listener's position that is after the queue
    head, so yes, it's bad :( Sorry about that.
    
    Probably we can fix it (swap GetLatestSnapshot() with the head
    position reading for example) but now I think that 'committed' field
    approach is a more robust way to fix the bug. What do you think?
    
    BTW one thing about 'committed' field fix version [0]. It seems that
    instead of remembering all notification positions, we can remember the
    head position after we acquire global lock and before we write
    anything to the queue and in case of abort we can just start from the
    saved position and mark all notifications until the end of the queue
    as 'committed = false'. The reason is the same as with resetting the
    head approach - as we hold global lock, no writers can add
    notifications to the queue, so when we in AtAbort_Notify() all
    notifications after the saved position are ours.
    
    
    [0] https://www.postgresql.org/message-id/CAFY6G8fui7omKfUXF%2BJJ82B34ExrwK6J7vKe61S-DhEmr_jBhA%40mail.gmail.com
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  35. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-23T22:39:57Z

    On Tue Sep 23, 2025 at 1:11 PM -03, Arseniy Mukhin wrote:
    >> Thanks for the explanation! I'm just not sure if I understand why do we
    >> need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) on
    >> PreCommit_Notify() if we already have the
    >> LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
    >>
    >
    > Good question. It seems that it would be enough to hold
    > NotifyQueueLock only during the head update, so I don't understand it
    > either.
    >
    IIUC correctly we acquire the LockSharedObject(DatabaseRelationId,
    InvalidOid, 0, AccessExclusiveLock) to make other COMMIT's wait and the
    LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) to make listener backends
    wait while we are adding the entries on the queue.
    
    > Thank you for the new version!
    >
    > The fix looks exactly how I thought about it. But while thinking about
    > why we need to hold NotifyQueueLock in PreCommit_Notify I realized
    > there is a concurrency bug in the 'head resetting' approach. I thought
    > that listeners hold NotifyQueueLock lock while reading notifications
    > from the queue, but now I see that they hold it only while reading the
    > current head position. So we can end up in the next situation:
    >
    > There are 2 backends:  listener and writer (backend with notifications)
    >
    >
    >         listener                                   writer
    >   ----------------------------------------------------------------------
    >                                          writer wants to commit, so it
    >                                          adds notifications to the queue
    >                                          in PreCommit_Notify() and advances
    >                                          queue head.
    >
    >   ----------------------------------------------------------------------
    > listener wants to read notifications. It
    > gets into asyncQueueReadAllNotifications()
    > and reads the current head (it already
    > contains writer's notifications):
    >
    >
    > asyncQueueReadAllNotifications()
    >           ...
    >     LWLockAcquire(NotifyQueueLock, LW_SHARED);
    >     head = QUEUE_HEAD;
    >     LWLockRelease(NotifyQueueLock);
    >           ...
    >
    >   ----------------------------------------------------------------------
    >                                          writer failed to commit, so it
    >                                          resets queue head in AtAbort_Notify()
    >                                          and completes the transaction.
    >
    >   ----------------------------------------------------------------------
    > listener gets a snapshot where the writer
    > is not in progress.
    >
    >         ...
    >     snapshot = RegisterSnapshot(GetLatestSnapshot());
    >         ...
    >
    >
    > This way the listener reads the head that includes all writer's
    > notifications and a snapshot where the writer is not in progress, so
    > nothing stops the listener from sending these notifications and it's
    > even possible to have the listener's position that is after the queue
    > head, so yes, it's bad :( Sorry about that.
    >
    Yeah, this is bad. I'm wondering if we could reproduce such race
    conditions scenarios with some TAP tests.
    
    > Probably we can fix it (swap GetLatestSnapshot() with the head
    > position reading for example) but now I think that 'committed' field
    > approach is a more robust way to fix the bug. What do you think?
    >
    I also agree that the committed field seems a more safe approach.
    
    > BTW one thing about 'committed' field fix version [0]. It seems that
    > instead of remembering all notification positions, we can remember the
    > head position after we acquire global lock and before we write
    > anything to the queue and in case of abort we can just start from the
    > saved position and mark all notifications until the end of the queue
    > as 'committed = false'. The reason is the same as with resetting the
    > head approach - as we hold global lock, no writers can add
    > notifications to the queue, so when we in AtAbort_Notify() all
    > notifications after the saved position are ours.
    >
    See the new attached version which implements this idea of using the
    committed field approach. I was just a bit concenerd about a race
    condition situation where the QUEUE_HEAD is changed by another publisher
    process and just iterating over all entries from the saved previous
    QUEUE_HEAD position until the current QUEUE_HEAD position we could mark
    successfully committed notifications as not committed by mistake, so in
    this new patch version I save the QUEUE_HEAD position before we add the
    entries on the shared queue and also the QUEUE_HEAD position after these
    entries are added so we ensure that we only process the entries of this
    range although we have the global lock
    LockSharedObject(DatabaseRelationId) that may prevent this situation.
    
    What do you think?
    
    --
    Matheus Alcantara
    
    
  36. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-24T13:59:07Z

    On Wed, Sep 24, 2025 at 1:40 AM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Tue Sep 23, 2025 at 1:11 PM -03, Arseniy Mukhin wrote:
    > >> Thanks for the explanation! I'm just not sure if I understand why do we
    > >> need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) on
    > >> PreCommit_Notify() if we already have the
    > >> LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
    > >>
    > >
    > > Good question. It seems that it would be enough to hold
    > > NotifyQueueLock only during the head update, so I don't understand it
    > > either.
    > >
    > IIUC correctly we acquire the LockSharedObject(DatabaseRelationId,
    > InvalidOid, 0, AccessExclusiveLock) to make other COMMIT's wait and the
    > LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) to make listener backends
    > wait while we are adding the entries on the queue.
    
    Yeah, but I don't get why we want to block them with NotifyQueueLock
    while we are writing to the queue. There is a SLRU bank lock and they
    can't read anything after the head. Maybe it's just not a big deal?
    
    > ...
    > > This way the listener reads the head that includes all writer's
    > > notifications and a snapshot where the writer is not in progress, so
    > > nothing stops the listener from sending these notifications and it's
    > > even possible to have the listener's position that is after the queue
    > > head, so yes, it's bad :( Sorry about that.
    > >
    > Yeah, this is bad. I'm wondering if we could reproduce such race
    > conditions scenarios with some TAP tests.
    >
    
    I agree it would be great to have more tests for such cases. As for
    the 'committed field' patch, I think we can add a TAP test that shows
    that listeners postpone processing of notifications until
    notifications were marked as 'committed=false' in case of aborted
    transactions. I tried to write one, but have not succeeded yet. Hope
    to finish it soon.
    
    > > Probably we can fix it (swap GetLatestSnapshot() with the head
    > > position reading for example) but now I think that 'committed' field
    > > approach is a more robust way to fix the bug. What do you think?
    > >
    > I also agree that the committed field seems a more safe approach.
    >
    > > BTW one thing about 'committed' field fix version [0]. It seems that
    > > instead of remembering all notification positions, we can remember the
    > > head position after we acquire global lock and before we write
    > > anything to the queue and in case of abort we can just start from the
    > > saved position and mark all notifications until the end of the queue
    > > as 'committed = false'. The reason is the same as with resetting the
    > > head approach - as we hold global lock, no writers can add
    > > notifications to the queue, so when we in AtAbort_Notify() all
    > > notifications after the saved position are ours.
    > >
    > See the new attached version which implements this idea of using the
    > committed field approach. I was just a bit concenerd about a race
    > condition situation where the QUEUE_HEAD is changed by another publisher
    > process and just iterating over all entries from the saved previous
    > QUEUE_HEAD position until the current QUEUE_HEAD position we could mark
    > successfully committed notifications as not committed by mistake, so in
    > this new patch version I save the QUEUE_HEAD position before we add the
    > entries on the shared queue and also the QUEUE_HEAD position after these
    > entries are added so we ensure that we only process the entries of this
    > range although we have the global lock
    > LockSharedObject(DatabaseRelationId) that may prevent this situation.
    
    Thank you! Speaking of the scenario, my understanding is that it's
    impossible as we hold the global lock, so QueuePosition head should
    always be equal to QUEUE_HEAD when we get into At_AbortNotify(), but
    maybe I'm wrong.
    
    Patch looks great. Some minor points:
    
    I have a warning when using git am with the patch:
         warning: 1 line adds whitespace errors.
    
    There is a comment in the head of the async.c file about some
    listen/notify internals. Maybe it's worth adding a comment about how
    aborted transactions do clean up.
    
    What do you think about a Assert in asyncQueueRollbackNotifications()
    that other backends still see us as 'in progress'? So we can be sure
    that they can't process our notifications before we mark notifications
    as 'committed=false'. Not sure how to do it correctly, maybe
    
              Assert(TransactionIdIsValid(MyProc->xid));
    
    will work? The TAP test that I tried to write also should test it.
    
    There are several comments where the word "crash" is used. What do you
    think about using "abort" instead? "Crash" sounds more like PANIC
    situation where we don't care about notifications because they don't
    survive restart.
    
    Thank you!
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  37. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-24T19:23:08Z

    On Wed Sep 24, 2025 at 10:59 AM -03, Arseniy Mukhin wrote:
    > Thank you! Speaking of the scenario, my understanding is that it's
    > impossible as we hold the global lock, so QueuePosition head should
    > always be equal to QUEUE_HEAD when we get into At_AbortNotify(), but
    > maybe I'm wrong.
    >
    Let's see if anyone else has any thoughts about this.
    
    > Patch looks great. Some minor points:
    >
    > I have a warning when using git am with the patch:
    >      warning: 1 line adds whitespace errors.
    >
    I don't think that this is a problem? And I don't know how to remove
    this warning, I've just created the patch with git format-patch @~1
    
    > There is a comment in the head of the async.c file about some
    > listen/notify internals. Maybe it's worth adding a comment about how
    > aborted transactions do clean up.
    >
    Added
    
    > What do you think about a Assert in asyncQueueRollbackNotifications()
    > that other backends still see us as 'in progress'? So we can be sure
    > that they can't process our notifications before we mark notifications
    > as 'committed=false'. Not sure how to do it correctly, maybe
    >
    >           Assert(TransactionIdIsValid(MyProc->xid));
    >
    > will work? The TAP test that I tried to write also should test it.
    >
    Sounds resanable to me. I think that we could use the following assert
    when iterating over the entries, what do you think?
    			Assert(TransactionIdIsInProgress(qe->xid));
    
    > There are several comments where the word "crash" is used. What do you
    > think about using "abort" instead? "Crash" sounds more like PANIC
    > situation where we don't care about notifications because they don't
    > survive restart.
    >
    Good point, fixed
    
    > Thank you!
    >
    Thanks for reviewing this!
    
    --
    Matheus Alcantara
    
    
  38. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-25T09:43:34Z

    On Wed, Sep 24, 2025 at 10:23 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Wed Sep 24, 2025 at 10:59 AM -03, Arseniy Mukhin wrote:
    > > ...
    > > Patch looks great. Some minor points:
    > >
    > > I have a warning when using git am with the patch:
    > >      warning: 1 line adds whitespace errors.
    > >
    > I don't think that this is a problem? And I don't know how to remove
    > this warning, I've just created the patch with git format-patch @~1
    >
    
    In my experience a run of pgindent usually fixes all such warnings,
    but I agree it's not a big deal.
    
    >
    > > What do you think about a Assert in asyncQueueRollbackNotifications()
    > > that other backends still see us as 'in progress'? So we can be sure
    > > that they can't process our notifications before we mark notifications
    > > as 'committed=false'. Not sure how to do it correctly, maybe
    > >
    > >           Assert(TransactionIdIsValid(MyProc->xid));
    > >
    > > will work? The TAP test that I tried to write also should test it.
    > >
    > Sounds resanable to me. I think that we could use the following assert
    > when iterating over the entries, what do you think?
    >                         Assert(TransactionIdIsInProgress(qe->xid));
    >
    
    Yeah, TransactionIdIsInProgress() looks like the right way to do it.
    And one more 'Assert' idea... maybe we can add
    Assert(TransactionIdIsCurrentTransactionId(qe->xid)) just to verify
    that we are only touching our own notifications?
    
    Thank you, the fixes look good.
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  39. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-28T13:17:48Z

    Hi,
    
    On Wed, Sep 24, 2025 at 4:59 PM Arseniy Mukhin
    <arseniy.mukhin.dev@gmail.com> wrote:
    >
    > On Wed, Sep 24, 2025 at 1:40 AM Matheus Alcantara
    > <matheusssilv97@gmail.com> wrote:
    > > ...
    > > > This way the listener reads the head that includes all writer's
    > > > notifications and a snapshot where the writer is not in progress, so
    > > > nothing stops the listener from sending these notifications and it's
    > > > even possible to have the listener's position that is after the queue
    > > > head, so yes, it's bad :( Sorry about that.
    > > >
    > > Yeah, this is bad. I'm wondering if we could reproduce such race
    > > conditions scenarios with some TAP tests.
    > >
    >
    > I agree it would be great to have more tests for such cases. As for
    > the 'committed field' patch, I think we can add a TAP test that shows
    > that listeners postpone processing of notifications until
    > notifications were marked as 'committed=false' in case of aborted
    > transactions. I tried to write one, but have not succeeded yet. Hope
    > to finish it soon.
    
    I finally managed to write a TAP test for it, so there is a new
    version with the tap test.
    
    I also realized that we can increase test coverage in
    002_aborted_tx_notifies.pl if notifications of the aborted transaction
    span several pages. This way we can better test
    asyncQueueRollbackNotifications(). So I changed
    002_aborted_tx_notifies.pl TAP test a bit.
    
    And there is a small indentation change in lmgr.h that should fix this
    git am warning.
    
    I added all changes as a separate patch file (0002) so it was more
    clear what changed. Please feel free to merge into the main patch file
    / drop any part of it that makes sense to you.
    
    
    Best regards,
    Arseniy Mukhin
    
  40. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-01T11:57:38Z

    On Sun Sep 28, 2025 at 10:17 AM -03, Arseniy Mukhin wrote:
    >> I agree it would be great to have more tests for such cases. As for
    >> the 'committed field' patch, I think we can add a TAP test that shows
    >> that listeners postpone processing of notifications until
    >> notifications were marked as 'committed=false' in case of aborted
    >> transactions. I tried to write one, but have not succeeded yet. Hope
    >> to finish it soon.
    >
    > I finally managed to write a TAP test for it, so there is a new
    > version with the tap test.
    >
    > I also realized that we can increase test coverage in
    > 002_aborted_tx_notifies.pl if notifications of the aborted transaction
    > span several pages. This way we can better test
    > asyncQueueRollbackNotifications(). So I changed
    > 002_aborted_tx_notifies.pl TAP test a bit.
    >
    > And there is a small indentation change in lmgr.h that should fix this
    > git am warning.
    >
    Thanks for the patches.
    
    I've created a CF entry so we can get more reviews and comments:
    https://commitfest.postgresql.org/patch/6095/
    
    Let's see what other think about the approach being used to fix this
    issue.
    
    --
    Matheus Alcantara
    
    
    
    
  41. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-02T12:06:38Z

    Hi,
    
    On Wed, Oct 1, 2025 at 2:57 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Sun Sep 28, 2025 at 10:17 AM -03, Arseniy Mukhin wrote:
    > >> I agree it would be great to have more tests for such cases. As for
    > >> the 'committed field' patch, I think we can add a TAP test that shows
    > >> that listeners postpone processing of notifications until
    > >> notifications were marked as 'committed=false' in case of aborted
    > >> transactions. I tried to write one, but have not succeeded yet. Hope
    > >> to finish it soon.
    > >
    > > I finally managed to write a TAP test for it, so there is a new
    > > version with the tap test.
    > >
    > > I also realized that we can increase test coverage in
    > > 002_aborted_tx_notifies.pl if notifications of the aborted transaction
    > > span several pages. This way we can better test
    > > asyncQueueRollbackNotifications(). So I changed
    > > 002_aborted_tx_notifies.pl TAP test a bit.
    > >
    > > And there is a small indentation change in lmgr.h that should fix this
    > > git am warning.
    > >
    > Thanks for the patches.
    >
    > I've created a CF entry so we can get more reviews and comments:
    > https://commitfest.postgresql.org/patch/6095/
    >
    
    Thank you.
    
    There is a test failure on CI, so please find the new patch version
    with the fix (Makefile was updated a little bit). And I merged 0002
    file with tap tests into 0001.
    
    
    Best regards,
    Arseniy Mukhin
    
  42. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-10-17T13:51:01Z

    I have the impression that this thread has lost focus on the idea of
    producing a backpatchable bugfix.  The last proposed patch has a lot of
    new mechanism that doesn't seem suitable for backpatch.  I could be
    wrong of course.
    
    -- 
    Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
    "[PostgreSQL] is a great group; in my opinion it is THE best open source
    development communities in existence anywhere."                (Lamar Owen)
    
    
    
    
  43. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-17T20:50:53Z

    On Fri, Oct 17, 2025 at 7:31 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
    >
    > I have the impression that this thread has lost focus on the idea of
    > producing a backpatchable bugfix.  The last proposed patch has a lot of
    > new mechanism that doesn't seem suitable for backpatch.  I could be
    > wrong of course.
    >
    
    Oops, I guess the TAP test that I added in the last version and that
    uses injection points is one of those things. PFA the new version
    without it. I also noticed that CheckSharedObjectLockedByMe() which
    was introduced in one of previous versions is unused, so I removed it
    too.
    
    
    Best regards,
    Arseniy Mukhin
    
  44. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-10-18T05:43:15Z

    On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
    > On Fri, Oct 17, 2025 at 7:31 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
    >>
    >> I have the impression that this thread has lost focus on the idea of
    >> producing a backpatchable bugfix.  The last proposed patch has a lot of
    >> new mechanism that doesn't seem suitable for backpatch.  I could be
    >> wrong of course.
    >>
    >
    > Oops, I guess the TAP test that I added in the last version and that
    > uses injection points is one of those things. PFA the new version
    > without it. I also noticed that CheckSharedObjectLockedByMe() which
    > was introduced in one of previous versions is unused, so I removed it
    > too.
    
    What a funny coincidence that the approach in this patch,
    has one similarity with the "Direct advancement" approach
    in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
    namely that we're both interested in QUEUE_HEAD before/after
    we push the notifications into the queue, in PreCommit_Notify().
    
    It looks to me like our new data structures are Interchangeable,
    so I guess we probably want both patches to eventually settle
    on one and the same?
    
    The differences I note between our queue head before/after code are:
    
    - In this patch, you are palloc'ing a struct with two fields.
      In [1], we're using two separate static QueuePosition variables.
    
    - In this patch, you are taking/releasing a shared lock before/after
      the loop to read QUEUE_HEAD and set previousHead/head.
      In [1], we avoid the need of the shared lock, by doing the reads
      within the existing exclusive lock inside the loop, but instead
      therefore need a firstIteration bool, to know which is the first
      iteration, and need to overwrite the after-var in each iteration.
    
    I don't think the noted differences above matter, both seems fine.
    
    Another thing I noticed in your patch that made me wonder,
    is the naming of the new AsyncQueueEntry bool field,
    which is given the name "committed".
    
    I think this name is not entirely faithful, since when set to true,
    the entry has not been committed yet.
    
    How about negating the meaning of this boolean field?
    To instead indicate when the entry has been rollbacked.
    Then, it would clearly communicate just that.
    
    Maybe naming it something like "rollbacked" or "aborted"?
    
    /Joel
    
    [1] https://www.postgresql.org/message-id/flat/6899c044-4a82-49be-8117-e6f669765f7e%40app.fastmail.com
    
    
    
    
  45. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-10-19T17:14:07Z

    On Fri, Oct 17, 2025, at 15:51, Álvaro Herrera wrote:
    > I have the impression that this thread has lost focus on the idea of
    > producing a backpatchable bugfix.  The last proposed patch has a lot of
    > new mechanism that doesn't seem suitable for backpatch.  I could be
    > wrong of course.
    
    I've tried to create a minimal isolated fix, hopefully suitable for
    backpatching, with no new mechanisms, other than the added
    GetOldestQueuedNotifyXid used by vac_update_datfrozenxid.
    
    It's based on the approach discussed earlier in this thread, that just
    goes through the notification queue from QUEUE_TAIL to QUEUE_HEAD, to
    find the oldestXid in the current database.
    
    Implementation:
    
    * Break out SLRU read page code from asyncQueueReadAllNotifications into
      new helper-function asyncQueueReadPageToBuffer.
    
    * Add GetOldestQueuedNotifyXid which uses the new helper-function
      asyncQueueReadPageToBuffer.
    
    It passes the 001_xid_freeze.pl test, not included in this patch.
    
    /Joel
  46. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-20T12:37:24Z

    On Sat Oct 18, 2025 at 2:43 AM -03, Joel Jacobson wrote:
    > On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
    > What a funny coincidence that the approach in this patch,
    > has one similarity with the "Direct advancement" approach
    > in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
    > namely that we're both interested in QUEUE_HEAD before/after
    > we push the notifications into the queue, in PreCommit_Notify().
    >
    > It looks to me like our new data structures are Interchangeable,
    > so I guess we probably want both patches to eventually settle
    > on one and the same?
    >
    > The differences I note between our queue head before/after code are:
    >
    > - In this patch, you are palloc'ing a struct with two fields.
    >   In [1], we're using two separate static QueuePosition variables.
    >
    > - In this patch, you are taking/releasing a shared lock before/after
    >   the loop to read QUEUE_HEAD and set previousHead/head.
    >   In [1], we avoid the need of the shared lock, by doing the reads
    >   within the existing exclusive lock inside the loop, but instead
    >   therefore need a firstIteration bool, to know which is the first
    >   iteration, and need to overwrite the after-var in each iteration.
    >
    > I don't think the noted differences above matter, both seems fine.
    >
    Yeah, I also think that both approach seems fine. I keep the v8 version
    with the palloc, if someone has any concern about this I'm open to
    switch to another approach.
    
    > Another thing I noticed in your patch that made me wonder,
    > is the naming of the new AsyncQueueEntry bool field,
    > which is given the name "committed".
    >
    > I think this name is not entirely faithful, since when set to true,
    > the entry has not been committed yet.
    >
    > How about negating the meaning of this boolean field?
    > To instead indicate when the entry has been rollbacked.
    > Then, it would clearly communicate just that.
    >
    > Maybe naming it something like "rollbacked" or "aborted"?
    >
    Good point. I've renamed this field on the attached v8 version.
    
    --
    Matheus Alcantara
    
    
  47. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-20T12:41:43Z

    On Sun Oct 19, 2025 at 2:14 PM -03, Joel Jacobson wrote:
    > On Fri, Oct 17, 2025, at 15:51, Álvaro Herrera wrote:
    >> I have the impression that this thread has lost focus on the idea of
    >> producing a backpatchable bugfix.  The last proposed patch has a lot of
    >> new mechanism that doesn't seem suitable for backpatch.  I could be
    >> wrong of course.
    >
    > I've tried to create a minimal isolated fix, hopefully suitable for
    > backpatching, with no new mechanisms, other than the added
    > GetOldestQueuedNotifyXid used by vac_update_datfrozenxid.
    >
    > It's based on the approach discussed earlier in this thread, that just
    > goes through the notification queue from QUEUE_TAIL to QUEUE_HEAD, to
    > find the oldestXid in the current database.
    >
    > Implementation:
    >
    > * Break out SLRU read page code from asyncQueueReadAllNotifications into
    >   new helper-function asyncQueueReadPageToBuffer.
    >
    > * Add GetOldestQueuedNotifyXid which uses the new helper-function
    >   asyncQueueReadPageToBuffer.
    >
    > It passes the 001_xid_freeze.pl test, not included in this patch.
    >
    This is similar to what was already proposed at [1]. This approach was
    abandoned because a notification on the queue may block datfrozenxid
    advance and clog truncation which can cause other issues for the users [2].
    
    
    [1] https://www.postgresql.org/message-id/CAFY6G8cJm73_MM9SuynZUqtqcaTuepUDgDuvS661oLW7U0dgsg%40mail.gmail.com
    [2] https://www.postgresql.org/message-id/d186fba0-dc65-4274-aa96-3906bbb2e530%40postgrespro.ru
    
    --
    Matheus Alcantara
    
    
    
    
    
  48. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-10-20T14:18:50Z

    On 2025-Oct-20, Matheus Alcantara wrote:
    
    > This is similar to what was already proposed at [1]. This approach was
    > abandoned because a notification on the queue may block datfrozenxid
    > advance and clog truncation which can cause other issues for the users [2].
    
    Well, I think that this is the right solution for backpatching, and that
    you were wrong to abandon it.  You can continue to design a better
    mechanism for the master branch, but in old branches we cannot really do
    all those things you're proposing to do.
    
    -- 
    Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
    "Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; jugar
    al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
    La Feria de las Tinieblas, R. Bradbury)
    
    
    
    
  49. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-20T18:19:50Z

    On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
    > On 2025-Oct-20, Matheus Alcantara wrote:
    >
    >> This is similar to what was already proposed at [1]. This approach was
    >> abandoned because a notification on the queue may block datfrozenxid
    >> advance and clog truncation which can cause other issues for the users [2].
    >
    > Well, I think that this is the right solution for backpatching, and that
    > you were wrong to abandon it.  You can continue to design a better
    > mechanism for the master branch, but in old branches we cannot really do
    > all those things you're proposing to do.
    > 
    I actually would prefer this approach TBH, but since this can cause
    other issues like transaction wraparound due to not consumed
    notifications we would need other mechanisms to prevent that and I'm not
    sure if users should expect this kind of behavior changes on minor
    version updates? 
    
    I think that to go with this solution we would need some way to drop too
    old notifications from the queue to advance the datfrozenxid, so I
    imagine that we would need some GUC to make this configurable and we can
    configure a default value of course but some use cases may not be the
    best configuration, this is something that users should expected to deal
    on minor version updates?
    
    Going with the "self contained" idea sound more easier to backpatch
    actually, so this is the main reason that I abandoned this other
    approach. Could you please point what make the v8 version not visible
    for bachpatching?
    
    Thanks for the comments!
    
    --
    Matheus Alcantara
    
    
    
    
    
  50. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Masahiko Sawada <sawada.mshk@gmail.com> — 2025-10-21T01:31:54Z

    On Mon, Oct 20, 2025 at 5:37 AM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Sat Oct 18, 2025 at 2:43 AM -03, Joel Jacobson wrote:
    > > On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
    > > What a funny coincidence that the approach in this patch,
    > > has one similarity with the "Direct advancement" approach
    > > in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
    > > namely that we're both interested in QUEUE_HEAD before/after
    > > we push the notifications into the queue, in PreCommit_Notify().
    > >
    > > It looks to me like our new data structures are Interchangeable,
    > > so I guess we probably want both patches to eventually settle
    > > on one and the same?
    > >
    > > The differences I note between our queue head before/after code are:
    > >
    > > - In this patch, you are palloc'ing a struct with two fields.
    > >   In [1], we're using two separate static QueuePosition variables.
    > >
    > > - In this patch, you are taking/releasing a shared lock before/after
    > >   the loop to read QUEUE_HEAD and set previousHead/head.
    > >   In [1], we avoid the need of the shared lock, by doing the reads
    > >   within the existing exclusive lock inside the loop, but instead
    > >   therefore need a firstIteration bool, to know which is the first
    > >   iteration, and need to overwrite the after-var in each iteration.
    > >
    > > I don't think the noted differences above matter, both seems fine.
    > >
    > Yeah, I also think that both approach seems fine. I keep the v8 version
    > with the palloc, if someone has any concern about this I'm open to
    > switch to another approach.
    >
    > > Another thing I noticed in your patch that made me wonder,
    > > is the naming of the new AsyncQueueEntry bool field,
    > > which is given the name "committed".
    > >
    > > I think this name is not entirely faithful, since when set to true,
    > > the entry has not been committed yet.
    > >
    > > How about negating the meaning of this boolean field?
    > > To instead indicate when the entry has been rollbacked.
    > > Then, it would clearly communicate just that.
    > >
    > > Maybe naming it something like "rollbacked" or "aborted"?
    > >
    > Good point. I've renamed this field on the attached v8 version.
    
    I've reviewed the v8 patch and I'm not sure it's a bullet-proof
    approach. The basic idea of the v8 patch is to add async entries with
    rollbacked=false and we set rollbacked=true in
    asyncQueueRollbackNotifications called from AtAbort_Notify() if an
    error happens between PreCommit_Notify() and AtCommit_Notify(). The
    asyncQueueRollbackNotifications() reads SLRU pages to mark entries
    'rollbacked' but reading SLRU pages could fail for some reason. In
    this case, we would end up with a PANIC for recursive errors, or even
    if we skip marking entries we would end up leaving entries as
    committed. Also, if there are a lot of notifications across multiple
    SLRU pages we need to mark as 'rollbacked',
    asyncQueueRollbackNotifications() could take time to complete but it's
    not interruptible. I don't think it's a good idea to introduce such an
    operation in abort paths.
    
    Regards,
    
    -- 
    Masahiko Sawada
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  51. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-21T20:26:55Z

    On Tue, Oct 21, 2025 at 4:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
    >
    > On Mon, Oct 20, 2025 at 5:37 AM Matheus Alcantara
    > <matheusssilv97@gmail.com> wrote:
    > >
    > > On Sat Oct 18, 2025 at 2:43 AM -03, Joel Jacobson wrote:
    > > > On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
    > > > What a funny coincidence that the approach in this patch,
    > > > has one similarity with the "Direct advancement" approach
    > > > in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
    > > > namely that we're both interested in QUEUE_HEAD before/after
    > > > we push the notifications into the queue, in PreCommit_Notify().
    > > >
    > > > It looks to me like our new data structures are Interchangeable,
    > > > so I guess we probably want both patches to eventually settle
    > > > on one and the same?
    > > >
    > > > The differences I note between our queue head before/after code are:
    > > >
    > > > - In this patch, you are palloc'ing a struct with two fields.
    > > >   In [1], we're using two separate static QueuePosition variables.
    > > >
    > > > - In this patch, you are taking/releasing a shared lock before/after
    > > >   the loop to read QUEUE_HEAD and set previousHead/head.
    > > >   In [1], we avoid the need of the shared lock, by doing the reads
    > > >   within the existing exclusive lock inside the loop, but instead
    > > >   therefore need a firstIteration bool, to know which is the first
    > > >   iteration, and need to overwrite the after-var in each iteration.
    > > >
    > > > I don't think the noted differences above matter, both seems fine.
    > > >
    > > Yeah, I also think that both approach seems fine. I keep the v8 version
    > > with the palloc, if someone has any concern about this I'm open to
    > > switch to another approach.
    > >
    > > > Another thing I noticed in your patch that made me wonder,
    > > > is the naming of the new AsyncQueueEntry bool field,
    > > > which is given the name "committed".
    > > >
    > > > I think this name is not entirely faithful, since when set to true,
    > > > the entry has not been committed yet.
    > > >
    > > > How about negating the meaning of this boolean field?
    > > > To instead indicate when the entry has been rollbacked.
    > > > Then, it would clearly communicate just that.
    > > >
    > > > Maybe naming it something like "rollbacked" or "aborted"?
    > > >
    > > Good point. I've renamed this field on the attached v8 version.
    >
    > I've reviewed the v8 patch and I'm not sure it's a bullet-proof
    > approach. The basic idea of the v8 patch is to add async entries with
    > rollbacked=false and we set rollbacked=true in
    > asyncQueueRollbackNotifications called from AtAbort_Notify() if an
    > error happens between PreCommit_Notify() and AtCommit_Notify(). The
    > asyncQueueRollbackNotifications() reads SLRU pages to mark entries
    > 'rollbacked' but reading SLRU pages could fail for some reason. In
    > this case, we would end up with a PANIC for recursive errors, or even
    > if we skip marking entries we would end up leaving entries as
    > committed. Also, if there are a lot of notifications across multiple
    > SLRU pages we need to mark as 'rollbacked',
    > asyncQueueRollbackNotifications() could take time to complete but it's
    > not interruptible. I don't think it's a good idea to introduce such an
    > operation in abort paths.
    >
    
    Thanks for the explanation! Now I see why using AtAbort_Notify() was a
    bad idea. It might be possible to solve the second problem and make
    the amount of work in AtAbort_Notify independent of the number of
    notifications. But I have no idea how to solve the SLRU page issue.
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  52. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Masahiko Sawada <sawada.mshk@gmail.com> — 2025-10-21T21:42:18Z

    On Mon, Oct 20, 2025 at 11:19 AM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
    > > On 2025-Oct-20, Matheus Alcantara wrote:
    > >
    > >> This is similar to what was already proposed at [1]. This approach was
    > >> abandoned because a notification on the queue may block datfrozenxid
    > >> advance and clog truncation which can cause other issues for the users [2].
    > >
    > > Well, I think that this is the right solution for backpatching, and that
    > > you were wrong to abandon it.  You can continue to design a better
    > > mechanism for the master branch, but in old branches we cannot really do
    > > all those things you're proposing to do.
    > >
    > I actually would prefer this approach TBH, but since this can cause
    > other issues like transaction wraparound due to not consumed
    > notifications we would need other mechanisms to prevent that and I'm not
    > sure if users should expect this kind of behavior changes on minor
    > version updates?
    
    True, unconsumed notifications could cause transaction wraparound by
    preventing datfrozenxid from advancing. However, this risk only
    applies when users have long-term unconsumed notifications, which is
    uncommon. That said, we should note that, as I mentioned
    previously[1], a process can accumulate unconsumed notifications
    simply by being in idle-in-transaction state, even without
    backend_xmin and backend_xid, which prevents datfrozenxid from
    advancing. While this might not be problematic in practice if it's
    rare, I find it concerning that we have no way to check the age of
    unconsumed notifications.
    
    >
    > I think that to go with this solution we would need some way to drop too
    > old notifications from the queue to advance the datfrozenxid, so I
    > imagine that we would need some GUC to make this configurable and we can
    > configure a default value of course but some use cases may not be the
    > best configuration, this is something that users should expected to deal
    > on minor version updates?
    
    I think adding a new GUC would be overkill for this fix. As for
    dropping old notifications from the queue, we probably don't need to
    make it configurable - we could simply drop notifications whose commit
    status is no longer available (instead of raising an error).
    
    >
    > Going with the "self contained" idea sound more easier to backpatch
    > actually, so this is the main reason that I abandoned this other
    > approach. Could you please point what make the v8 version not visible
    > for bachpatching?
    
    Regarding the v8 patch, it introduces a fundamentally new way of
    managing notification entries (adding entries with 'committed' state
    and marking them 'aborted' in abort paths). This affects all use
    cases, not just those involving very old unconsumed notifications, and
    could introduce more serious bugs like PANIC or SEGV. For
    backpatching, I prefer targeting just the problematic behavior while
    leaving unrelated parts unchanged. Though Álvaro might have a
    different perspective on this.
    
    Regards,
    
    [1] https://www.postgresql.org/message-id/CAD21AoCD%2BHXoc2QZCAS9d8ahDeikNqbnU0i6cQzpMFOEurkPPg%40mail.gmail.com
    
    --
    Masahiko Sawada
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  53. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-21T23:16:49Z

    On 21/10/25 18:42, Masahiko Sawada wrote:
    > On Mon, Oct 20, 2025 at 11:19 AM Matheus Alcantara
    > <matheusssilv97@gmail.com> wrote:
    >>
    >> On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
    >>> On 2025-Oct-20, Matheus Alcantara wrote:
    >>>
    >>>> This is similar to what was already proposed at [1]. This approach was
    >>>> abandoned because a notification on the queue may block datfrozenxid
    >>>> advance and clog truncation which can cause other issues for the users [2].
    >>>
    >>> Well, I think that this is the right solution for backpatching, and that
    >>> you were wrong to abandon it.  You can continue to design a better
    >>> mechanism for the master branch, but in old branches we cannot really do
    >>> all those things you're proposing to do.
    >>>
    >> I actually would prefer this approach TBH, but since this can cause
    >> other issues like transaction wraparound due to not consumed
    >> notifications we would need other mechanisms to prevent that and I'm not
    >> sure if users should expect this kind of behavior changes on minor
    >> version updates?
    > 
    > True, unconsumed notifications could cause transaction wraparound by
    > preventing datfrozenxid from advancing. However, this risk only
    > applies when users have long-term unconsumed notifications, which is
    > uncommon. That said, we should note that, as I mentioned
    > previously[1], a process can accumulate unconsumed notifications
    > simply by being in idle-in-transaction state, even without
    > backend_xmin and backend_xid, which prevents datfrozenxid from
    > advancing. While this might not be problematic in practice if it's
    > rare, I find it concerning that we have no way to check the age of
    > unconsumed notifications.
    >
    Ok, I think that I was too conservative when thinking about the
    transaction wraparound issue that it could happen. I agree that this
    seems a uncommon scenario.
    
    >> I think that to go with this solution we would need some way to drop too
    >> old notifications from the queue to advance the datfrozenxid, so I
    >> imagine that we would need some GUC to make this configurable and we can
    >> configure a default value of course but some use cases may not be the
    >> best configuration, this is something that users should expected to deal
    >> on minor version updates?
    > 
    > I think adding a new GUC would be overkill for this fix. As for
    > dropping old notifications from the queue, we probably don't need to
    > make it configurable - we could simply drop notifications whose commit
    > status is no longer available (instead of raising an error).
    >
    IIUC this is about not making the vacuum freeze considering the oldest
    xid on the queue but just remove notifications whose transaction status
    is no longer available right? Since currently when the error happens we
    already can't process the notifications it seems a reasonable way to go
    IMO.
    
    >> Going with the "self contained" idea sound more easier to backpatch
    >> actually, so this is the main reason that I abandoned this other
    >> approach. Could you please point what make the v8 version not visible
    >> for bachpatching?
    > 
    > Regarding the v8 patch, it introduces a fundamentally new way of
    > managing notification entries (adding entries with 'committed' state
    > and marking them 'aborted' in abort paths). This affects all use
    > cases, not just those involving very old unconsumed notifications, and
    > could introduce more serious bugs like PANIC or SEGV. For
    > backpatching, I prefer targeting just the problematic behavior while
    > leaving unrelated parts unchanged. Though Álvaro might have a
    > different perspective on this.
    >
    Thanks very much for this explanation and for what you've previously
    wrote on [1]. It's clear to me now that the v8 architecture is not a
    good way to go.
    
    [1] https://www.postgresql.org/message-id/CAD21AoCFZxXCBy%2B5DoarfG9LC9VdNwWRDpDHE5sdTh5Ym0EcqQ%40mail.gmail.com
    
    -- 
    Matheus Alcantara
    
    
    
    
  54. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-10-22T00:02:08Z

    On Wed, Oct 22, 2025, at 02:16, Matheus Alcantara wrote:
    >> Regarding the v8 patch, it introduces a fundamentally new way of
    >> managing notification entries (adding entries with 'committed' state
    >> and marking them 'aborted' in abort paths). This affects all use
    >> cases, not just those involving very old unconsumed notifications, and
    >> could introduce more serious bugs like PANIC or SEGV. For
    >> backpatching, I prefer targeting just the problematic behavior while
    >> leaving unrelated parts unchanged. Though Álvaro might have a
    >> different perspective on this.
    >>
    > Thanks very much for this explanation and for what you've previously
    > wrote on [1]. It's clear to me now that the v8 architecture is not a
    > good way to go.
    
    How about doing some more work in vac_update_datfrozenxid()?
    
    Pseudo-code sketch:
    
    ```
    void
    vac_update_datfrozenxid(void)
    {
    
        /* After computing newFrozenXid from all known sources... */
    
        TransactionId oldestNotifyXid = GetOldestQueuedNotifyXid();
    
        if (TransactionIdIsValid(oldestNotifyXid) &&
            TransactionIdPrecedes(oldestNotifyXid, newFrozenXid))
        {
            /*
             * The async queue has XIDs older than our proposed freeze point.
             * Attempt cleanup, then back off and let the next VACUUM benefit.
             */
    
            if (asyncQueueHasListeners())
            {
                /*
                 * Wake all listening backends across *all* databases
                 * that are not already at QUEUE_HEAD.
                 * They'll hopefully process notifications and advance
                 * their pointers, allowing the next VACUUM to freeze further.
                 */
                asyncQueueWakeAllListeners();
            }
            else
            {
                /*
                 * No listeners exist - discard all unread notifications.
                 * The next VACUUM should succeed in advancing datfrozenxid.
                 * asyncQueueAdvanceTailNoListeners() would take exclusive lock
                 * on NotifyQueueLock before checking
                 * QUEUE_FIRST_LISTENER == INVALID_PROC_NUMBER
                 */
                asyncQueueAdvanceTailNoListeners();
            }
    
            /*
             * Back off datfrozenxid to protect the old XIDs.
             * The cleanup we just performed should allow the next VACUUM
             * to freeze further.
             */
            newFrozenXid = oldestNotifyXid;
        }
    }
    ```
    
    Maybe it wouldn't solve all problematic situations, but to me it seems
    like these measures could help many of them, or am I missing some
    crucial insight here?
    
    /Joel
    
    
    
    
  55. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Masahiko Sawada <sawada.mshk@gmail.com> — 2025-10-22T04:31:46Z

    On Tue, Oct 21, 2025 at 4:16 PM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On 21/10/25 18:42, Masahiko Sawada wrote:
    > > On Mon, Oct 20, 2025 at 11:19 AM Matheus Alcantara
    > > <matheusssilv97@gmail.com> wrote:
    > >>
    > >> On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
    > >>> On 2025-Oct-20, Matheus Alcantara wrote:
    > >>>
    > >>>> This is similar to what was already proposed at [1]. This approach was
    > >>>> abandoned because a notification on the queue may block datfrozenxid
    > >>>> advance and clog truncation which can cause other issues for the users [2].
    > >>>
    > >>> Well, I think that this is the right solution for backpatching, and that
    > >>> you were wrong to abandon it.  You can continue to design a better
    > >>> mechanism for the master branch, but in old branches we cannot really do
    > >>> all those things you're proposing to do.
    > >>>
    > >> I actually would prefer this approach TBH, but since this can cause
    > >> other issues like transaction wraparound due to not consumed
    > >> notifications we would need other mechanisms to prevent that and I'm not
    > >> sure if users should expect this kind of behavior changes on minor
    > >> version updates?
    > >
    > > True, unconsumed notifications could cause transaction wraparound by
    > > preventing datfrozenxid from advancing. However, this risk only
    > > applies when users have long-term unconsumed notifications, which is
    > > uncommon. That said, we should note that, as I mentioned
    > > previously[1], a process can accumulate unconsumed notifications
    > > simply by being in idle-in-transaction state, even without
    > > backend_xmin and backend_xid, which prevents datfrozenxid from
    > > advancing. While this might not be problematic in practice if it's
    > > rare, I find it concerning that we have no way to check the age of
    > > unconsumed notifications.
    > >
    > Ok, I think that I was too conservative when thinking about the
    > transaction wraparound issue that it could happen. I agree that this
    > seems a uncommon scenario.
    >
    > >> I think that to go with this solution we would need some way to drop too
    > >> old notifications from the queue to advance the datfrozenxid, so I
    > >> imagine that we would need some GUC to make this configurable and we can
    > >> configure a default value of course but some use cases may not be the
    > >> best configuration, this is something that users should expected to deal
    > >> on minor version updates?
    > >
    > > I think adding a new GUC would be overkill for this fix. As for
    > > dropping old notifications from the queue, we probably don't need to
    > > make it configurable - we could simply drop notifications whose commit
    > > status is no longer available (instead of raising an error).
    > >
    > IIUC this is about not making the vacuum freeze considering the oldest
    > xid on the queue but just remove notifications whose transaction status
    > is no longer available right? Since currently when the error happens we
    > already can't process the notifications it seems a reasonable way to go
    > IMO.
    
    On second thought, simply hiding the error would be worse than our
    current behavior. Users wouldn't know their notifications are being
    dropped, as they often don't check WARNINGs. The more frequently they
    try to freeze XIDs, the more notifications they'd lose. To avoid
    silent discards, they would need to increase
    autovacuum_vacuum_max_freeze_age to accommodate more clog entries, but
    this increases the risk of XID wraparound. I think the proposed
    approach modifying the vacuum freeze to consider the oldest XID on the
    queue would be better. This has a downside as I mentioned: processes
    in idle-in-transaction state even without backend_xmin and backend_xid
    can still accumulate unconsumed notifications. However, leaving
    transactions in idle-in-transaction state for a long time is bad
    practice anyway. While we might want to consider adding a safeguard
    for this case, I guess it would rarely occur in practice.
    
    Regards,
    
    -- 
    Masahiko Sawada
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  56. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-22T15:23:42Z

    On Wed Oct 22, 2025 at 1:31 AM -03, Masahiko Sawada wrote:
    > On Tue, Oct 21, 2025 at 4:16 PM Matheus Alcantara
    >> > I think adding a new GUC would be overkill for this fix. As for
    >> > dropping old notifications from the queue, we probably don't need to
    >> > make it configurable - we could simply drop notifications whose commit
    >> > status is no longer available (instead of raising an error).
    >> >
    >> IIUC this is about not making the vacuum freeze considering the oldest
    >> xid on the queue but just remove notifications whose transaction status
    >> is no longer available right? Since currently when the error happens we
    >> already can't process the notifications it seems a reasonable way to go
    >> IMO.
    >
    > On second thought, simply hiding the error would be worse than our
    > current behavior. Users wouldn't know their notifications are being
    > dropped, as they often don't check WARNINGs. The more frequently they
    > try to freeze XIDs, the more notifications they'd lose. To avoid
    > silent discards, they would need to increase
    > autovacuum_vacuum_max_freeze_age to accommodate more clog entries, but
    > this increases the risk of XID wraparound. I think the proposed
    > approach modifying the vacuum freeze to consider the oldest XID on the
    > queue would be better. This has a downside as I mentioned: processes
    > in idle-in-transaction state even without backend_xmin and backend_xid
    > can still accumulate unconsumed notifications. However, leaving
    > transactions in idle-in-transaction state for a long time is bad
    > practice anyway. While we might want to consider adding a safeguard
    > for this case, I guess it would rarely occur in practice.
    >
    I'm attaching a v9 patch which is based on the idea of changing the
    vacuum freeze to consider the oldest xid on the listen/notify queue. The
    0001 patch is from Joel that it was previously sent on [1] with some
    small tweaks and the 0002 is the TAP tests introduced on the previously
    versions by me and by Arseniy. I keep it separate because I'm not sure
    if it's all suitable for back-pacthing.
    
    I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
    this architecture being used. I think that it's not, but perhaps is a
    good test to keep it?
    
    [1] https://www.postgresql.org/message-id/25651193-da4e-4185-a564-f2efa6b0c8a4%40app.fastmail.com
    
    --
    Matheus Alcantara
    
    
  57. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-10-22T15:49:43Z

    On 2025-Oct-22, Matheus Alcantara wrote:
    
    > I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
    > this architecture being used. I think that it's not, but perhaps is a
    > good test to keep it?
    
    I'd rather have tests than not, but I'd think it needs to be behind
    PG_TEST_EXTRA because of things like
    
    +$node->safe_psql('postgres', 'select consume_xids(10000000);');
    
    -- 
    Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
    "Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
    
    
    
    
  58. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-22T17:25:36Z

    On Wed Oct 22, 2025 at 12:49 PM -03, Álvaro Herrera wrote:
    > On 2025-Oct-22, Matheus Alcantara wrote:
    >
    >> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
    >> this architecture being used. I think that it's not, but perhaps is a
    >> good test to keep it?
    >
    > I'd rather have tests than not, but I'd think it needs to be behind
    > PG_TEST_EXTRA because of things like
    >
    > +$node->safe_psql('postgres', 'select consume_xids(10000000);');
    >
    Attached v10 with wrapping into PG_TEST_EXTRA. Should we enable this
    somewhere to be executed on build farm?
    
    --
    Matheus Alcantara
    
    
  59. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Masahiko Sawada <sawada.mshk@gmail.com> — 2025-10-23T23:42:45Z

    On Wed, Oct 22, 2025 at 10:25 AM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Wed Oct 22, 2025 at 12:49 PM -03, Álvaro Herrera wrote:
    > > On 2025-Oct-22, Matheus Alcantara wrote:
    > >
    > >> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
    > >> this architecture being used. I think that it's not, but perhaps is a
    > >> good test to keep it?
    > >
    > > I'd rather have tests than not, but I'd think it needs to be behind
    > > PG_TEST_EXTRA because of things like
    > >
    > > +$node->safe_psql('postgres', 'select consume_xids(10000000);');
    > >
    > Attached v10 with wrapping into PG_TEST_EXTRA.
    
    Thank you for updating the patches!
    
    I've reviewed the patches and here are the comments.
    
    v10-0001-Prevent-VACUUM-from-truncating-XIDs-still-presen.patch:
    
    + *
    + * Returns InvalidTransactionId if there are no unprocessed notifications.
    
    This comment is not accurate since the function returns
    InvalidTransactionId even if all unprocessed notifications are created
    on other databases.
    
    ---
    +TransactionId
    +GetOldestQueuedNotifyXid(void)
    +{
    
    How about renaming it to something like GetOldestNotifyTransactionId()?
    
    ---
    +   /* page_buffer must be adequately aligned, so use a union */
    +   union
    +   {
    +       char        buf[QUEUE_PAGESIZE];
    +       AsyncQueueEntry align;
    +   }           page_buffer;
    
    asyncQueueReadAllNotifications() uses this union too, so how about
    define this type as AlignedQueueEntryPage or something and use it in
    both functions?
    
    ---
    +
    +   LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
    +
    
    I don't think we need an exclusive lock here.
    
    ---
    In GetOldestQueuedNotifyXid() why do we keep holding NotifyQueueLock
    while calculating the oldest XID in the queue?
    
    ---
    +
    +       /* Process all entries on this page up to head */
    +       while (curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
    QUEUE_PAGESIZE &&
    +              !QUEUE_POS_EQUAL(pos, head))
    +       {
    +           qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);
    (snip)
    +           /* Advance to next entry */
    +           if (asyncQueueAdvance(&pos, qe->length))
    +               break;          /* advanced to next page */
    +
    
    I think the check "curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
    QUEUE_PAGESIZE" is essentially the same as the one we do in
    asyncQueueAdvance(). I think we can refactor the inner loop in
    GetOldestQueuedNotifyXid() to something like:
    
    for (;;)
    {
        bool reachedEndOfPage;
    
        qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);
    
        // check qe->xid here...
    
        reachedEndOfPage = asyncQueueAdvance(&pos, qe->length);
    
        if (reachedEndOfPage || QUEUE_POS_EQUAL(pos, head))
            break;
    }
    
    ---
    v10-0002-Add-tap-tests-for-listen-notify-vacuum-freeze.patch:
    
    This new test directory needs to have .gitignore.
    
    ---
    +# Copyright (c) 2024-2025, PostgreSQL Global Development Group
    +
    
    It's better to have a short description of this test here.
    
    ---
    +use File::Path qw(mkpath);
    
    It seems not necessary.
    
    ---
    +$node->safe_psql('postgres',
    +   'CREATE TABLE t AS SELECT g AS a, g+2 AS b from
    generate_series(1,100000) g;'
    
    Why does it need to insert many rows?
    
    ---
    +# --- Session 2, multiple notify's, and commit ---
    +for my $i (1 .. 10)
    +{
    +   $node->safe_psql(
    +       'postgres', "
    +       BEGIN;
    +       NOTIFY s, '$i';
    +       COMMIT;");
    +}
    
    Why does it need to send a notification 10 times?
    
    ---
    +# Consume enough XIDs to trigger truncation
    +$node->safe_psql('postgres', 'select consume_xids(10000000);');
    +
    
    I guess it consumes XID too much to trigger truncation. Given the one
    clog segment file is 256kB in size, it's enough to consume 1,050,000
    XIDs to move to the next segment file.
    
    ---
    +# Get the new datfrozenxid after vacuum freeze to ensure that is advanced but
    +# we can still get the notification status of the notification
    +my $datafronzenxid_freeze = $node->safe_psql('postgres', "select
    datfrozenxid from pg_database where datname = 'postgres'");
    +ok($datafronzenxid_freeze > $datafronzenxid, 'datfrozenxid is advanced');
    
    I think this test passes in all cases, so it is meaningless. Instead,
    what we need to check in terms of datfrozenxid is that its value
    doesn't get greater than the XID we used for the notification, even
    after vacuum freeze. I think we remember the XID used for NOTIFY and
    check if the old/new datfrozenxid values are older than that value.
    
    ---
    +is($notifications_count, 10, 'received all committed notifications');
    +
    +done_testing();
    
    After consuming the unconsumed notifications, let's do vacuum freeze
    and check if datfrozenxid now can be advanced.
    
    ---
    I would expect to add 002_aborted_tx_notifies.pl in a separate patch
    since it's not related to this bug fix.
    
    ---
    +# Test checks that listeners do not receive notifications from aborted
    +# transaction even if notifications have been added to the listen/notify
    +# queue. To reproduce it we use the fact that serializable conflicts
    +# are checked after tx adds notifications to the queue.
    
    I wonder if we could implement this test using the isolation test
    instead of the tap test. Is there any reason why you used a tap test
    for that?
    
    Regards,
    
    > Should we enable this
    > somewhere to be executed on build farm?
    
    Yeah, I hope some buildfarm animals enable it.
    
    Regards,
    
    -- 
    Masahiko Sawada
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  60. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-24T07:36:44Z

    On Fri, Oct 24, 2025 at 2:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
    >
    > On Wed, Oct 22, 2025 at 10:25 AM Matheus Alcantara
    > <matheusssilv97@gmail.com> wrote:
    > >
    > > On Wed Oct 22, 2025 at 12:49 PM -03, Álvaro Herrera wrote:
    > > > On 2025-Oct-22, Matheus Alcantara wrote:
    > > >
    > > >> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
    > > >> this architecture being used. I think that it's not, but perhaps is a
    > > >> good test to keep it?
    > > >
    > > > I'd rather have tests than not, but I'd think it needs to be behind
    > > > PG_TEST_EXTRA because of things like
    > > >
    > > > +$node->safe_psql('postgres', 'select consume_xids(10000000);');
    > > >
    > > Attached v10 with wrapping into PG_TEST_EXTRA.
    >
    > Thank you for updating the patches!
    >
    > I've reviewed the patches and here are the comments.
    
    Thank you for the review!
    
    > I would expect to add 002_aborted_tx_notifies.pl in a separate patch
    > since it's not related to this bug fix.
    >
    > ---
    > +# Test checks that listeners do not receive notifications from aborted
    > +# transaction even if notifications have been added to the listen/notify
    > +# queue. To reproduce it we use the fact that serializable conflicts
    > +# are checked after tx adds notifications to the queue.
    >
    > I wonder if we could implement this test using the isolation test
    > instead of the tap test. Is there any reason why you used a tap test
    > for that?
    >
    
    I agree it's less relevant to the patch now than it was with the new
    'committed' field approach. And there is no particular reason why it
    was implemented as a TAP test actually.. So +1 to move it to separate
    patch (does it mean to separate thread as well or just separate patch
    file?) and rewrite as an isolation test (IIUC it's better to use
    isolation test infrastructure if it's possible). I can try to do it if
    nobody else does it earlier.
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  61. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-24T08:55:02Z

    On Wed, Oct 22, 2025 at 3:02 AM Joel Jacobson <joel@compiler.org> wrote:
    >
    > On Wed, Oct 22, 2025, at 02:16, Matheus Alcantara wrote:
    > >> Regarding the v8 patch, it introduces a fundamentally new way of
    > >> managing notification entries (adding entries with 'committed' state
    > >> and marking them 'aborted' in abort paths). This affects all use
    > >> cases, not just those involving very old unconsumed notifications, and
    > >> could introduce more serious bugs like PANIC or SEGV. For
    > >> backpatching, I prefer targeting just the problematic behavior while
    > >> leaving unrelated parts unchanged. Though Álvaro might have a
    > >> different perspective on this.
    > >>
    > > Thanks very much for this explanation and for what you've previously
    > > wrote on [1]. It's clear to me now that the v8 architecture is not a
    > > good way to go.
    >
    > How about doing some more work in vac_update_datfrozenxid()?
    >
    > Pseudo-code sketch:
    >
    > ```
    > void
    > vac_update_datfrozenxid(void)
    > {
    >
    >     /* After computing newFrozenXid from all known sources... */
    >
    >     TransactionId oldestNotifyXid = GetOldestQueuedNotifyXid();
    >
    >     if (TransactionIdIsValid(oldestNotifyXid) &&
    >         TransactionIdPrecedes(oldestNotifyXid, newFrozenXid))
    >     {
    >         /*
    >          * The async queue has XIDs older than our proposed freeze point.
    >          * Attempt cleanup, then back off and let the next VACUUM benefit.
    >          */
    >
    >         if (asyncQueueHasListeners())
    >         {
    >             /*
    >              * Wake all listening backends across *all* databases
    >              * that are not already at QUEUE_HEAD.
    >              * They'll hopefully process notifications and advance
    >              * their pointers, allowing the next VACUUM to freeze further.
    >              */
    >             asyncQueueWakeAllListeners();
    >         }
    >         else
    >         {
    >             /*
    >              * No listeners exist - discard all unread notifications.
    >              * The next VACUUM should succeed in advancing datfrozenxid.
    >              * asyncQueueAdvanceTailNoListeners() would take exclusive lock
    >              * on NotifyQueueLock before checking
    >              * QUEUE_FIRST_LISTENER == INVALID_PROC_NUMBER
    >              */
    >             asyncQueueAdvanceTailNoListeners();
    >         }
    >
    >         /*
    >          * Back off datfrozenxid to protect the old XIDs.
    >          * The cleanup we just performed should allow the next VACUUM
    >          * to freeze further.
    >          */
    >         newFrozenXid = oldestNotifyXid;
    >     }
    > }
    > ```
    >
    > Maybe it wouldn't solve all problematic situations, but to me it seems
    > like these measures could help many of them, or am I missing some
    > crucial insight here?
    
    I agree we need to add something like this. Looks like with v10 it's
    possible for the listen/notify queue to block datfrozenxid advancing
    even without extreme circumstances (without hanging listeners etc).
    
    I see two thing we should take care of in v10:
    
    1) Currently asyncQueueAdvanceTail is called regularly only if we have
    a constant flow of notifications. We try to advance the tail every
    time when the head reaches every 4th page. So if we don't have new
    notifications, the tail will stay where it is forever. It means that
    if we have at least 1 notification in the queue without constant flow
    of notifications then GetOldestQueuedNotifyXid will constantly return
    the same result which will block the advancement of datfrozenxid. So
    it looks like we need something that will advance the tail in this
    case.
    
    2) Currently we wake up all listeners regularly only if we have a
    constant flow of notifications (again). Even if we have a listener and
    we never wake up such a listener because of the new notifications (for
    example there are no new notifications in the listener's database), we
    still signal such a listener sometimes as its position lags too much
    from the head. But again, if we don't have new notifications, it's
    possible that such a listener will never process some notification
    (from another database) and tail advancement will be blocked. As a
    result, datfrozenxid advancement also will be blocked. So probably we
    need something that will wake up lagging listeners if they block tail
    advancement.
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  62. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-10-24T12:12:01Z

    On Fri, Oct 24, 2025, at 11:55, Arseniy Mukhin wrote:
    > On Wed, Oct 22, 2025 at 3:02 AM Joel Jacobson <joel@compiler.org> wrote:
    >> How about doing some more work in vac_update_datfrozenxid()?
    >>
    >> Pseudo-code sketch:
    >>
    ...
    > I agree we need to add something like this. Looks like with v10 it's
    > possible for the listen/notify queue to block datfrozenxid advancing
    > even without extreme circumstances (without hanging listeners etc).
    
    Attached, two implementations of the sketched out idea, which can be
    applied on top of the v10 patch. They should both be functionally
    equivalent.
    
    vacuum_notify_queue_cleanup-with-code-dup.txt:
    
    This version doesn't try at all to avoid code duplication;
    asyncQueueAdvanceTailNoListeners is very similar to SignalBackends, and
    asyncQueueAdvanceTailNoListeners is very similar to
    asyncQueueAdvanceTail. I think this might be preferable, if the channel
    hash optimization that we're working on in the other thread, as a bonus
    solves these fundamental problems, so that these added safety
    functionality can be eliminated. I think it's quite likely we can
    achieve that, but not certain.
    
    vacuum_notify_queue_cleanup-without-code-dup.txt:
    
    This version instead equips SignalBackends and asyncQueueAdvanceTail
    with a new boolean input parameter to control their behavior, where
    passing false gives the current behavior, used at the current call
    sites, and vacuum would pass true, to signal all non-caught-up backends
    in all databases and forcibly advance the tail when there are no
    listening backends.
    
    I have no strong preference for one or the other.
    
    /Joel
  63. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-25T00:32:54Z

    Thanks for the review!
    
    On Thu Oct 23, 2025 at 8:42 PM -03, Masahiko Sawada wrote:
    > On Wed, Oct 22, 2025 at 10:25 AM Matheus Alcantara
    > <matheusssilv97@gmail.com> wrote:
    >>
    >> On Wed Oct 22, 2025 at 12:49 PM -03, Álvaro Herrera wrote:
    >> > On 2025-Oct-22, Matheus Alcantara wrote:
    >> >
    >> >> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
    >> >> this architecture being used. I think that it's not, but perhaps is a
    >> >> good test to keep it?
    >> >
    >> > I'd rather have tests than not, but I'd think it needs to be behind
    >> > PG_TEST_EXTRA because of things like
    >> >
    >> > +$node->safe_psql('postgres', 'select consume_xids(10000000);');
    >> >
    >> Attached v10 with wrapping into PG_TEST_EXTRA.
    >
    > Thank you for updating the patches!
    >
    > I've reviewed the patches and here are the comments.
    >
    > v10-0001-Prevent-VACUUM-from-truncating-XIDs-still-presen.patch:
    >
    > + *
    > + * Returns InvalidTransactionId if there are no unprocessed notifications.
    >
    > This comment is not accurate since the function returns
    > InvalidTransactionId even if all unprocessed notifications are created
    > on other databases.
    >
    Fixed
    
    > ---
    > +TransactionId
    > +GetOldestQueuedNotifyXid(void)
    > +{
    >
    > How about renaming it to something like GetOldestNotifyTransactionId()?
    >
    Yeap, sounds better, fixed.
    
    > ---
    > +   /* page_buffer must be adequately aligned, so use a union */
    > +   union
    > +   {
    > +       char        buf[QUEUE_PAGESIZE];
    > +       AsyncQueueEntry align;
    > +   }           page_buffer;
    >
    > asyncQueueReadAllNotifications() uses this union too, so how about
    > define this type as AlignedQueueEntryPage or something and use it in
    > both functions?
    >
    Good point, fixed.
    
    > ---
    > +
    > +   LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
    > +
    >
    > I don't think we need an exclusive lock here.
    >
    Fixed.
    
    > ---
    > In GetOldestQueuedNotifyXid() why do we keep holding NotifyQueueLock
    > while calculating the oldest XID in the queue?
    >
    Yeah, I don't think that it's necessary. The
    asyncQueueReadAllNotifications() for example only hold the lock when
    reading the QUEUE_BACKEND_POS and QUEUE_HEAD. I think that it's a
    similar case here, fixed.
    
    > ---
    > +
    > +       /* Process all entries on this page up to head */
    > +       while (curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
    > QUEUE_PAGESIZE &&
    > +              !QUEUE_POS_EQUAL(pos, head))
    > +       {
    > +           qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);
    > (snip)
    > +           /* Advance to next entry */
    > +           if (asyncQueueAdvance(&pos, qe->length))
    > +               break;          /* advanced to next page */
    > +
    >
    > I think the check "curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
    > QUEUE_PAGESIZE" is essentially the same as the one we do in
    > asyncQueueAdvance(). I think we can refactor the inner loop in
    > GetOldestQueuedNotifyXid() to something like:
    >
    > for (;;)
    > {
    >     bool reachedEndOfPage;
    >
    >     qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);
    >
    >     // check qe->xid here...
    >
    >     reachedEndOfPage = asyncQueueAdvance(&pos, qe->length);
    >
    >     if (reachedEndOfPage || QUEUE_POS_EQUAL(pos, head))
    >         break;
    > }
    >
    Fixed
    
    > ---
    > v10-0002-Add-tap-tests-for-listen-notify-vacuum-freeze.patch:
    >
    > This new test directory needs to have .gitignore.
    >
    Fixed
    
    > ---
    > +# Copyright (c) 2024-2025, PostgreSQL Global Development Group
    > +
    >
    > It's better to have a short description of this test here.
    >
    Fixed
    
    > ---
    > +use File::Path qw(mkpath);
    >
    > It seems not necessary.
    >
    Fixed
    
    > ---
    > +$node->safe_psql('postgres',
    > +   'CREATE TABLE t AS SELECT g AS a, g+2 AS b from
    > generate_series(1,100000) g;'
    >
    > Why does it need to insert many rows?
    >
    I think that this value was left when I was writing the first versions of
    this test, I don't think that it's necessary. I reduced to just 10 rows.
    
    > ---
    > +# --- Session 2, multiple notify's, and commit ---
    > +for my $i (1 .. 10)
    > +{
    > +   $node->safe_psql(
    > +       'postgres', "
    > +       BEGIN;
    > +       NOTIFY s, '$i';
    > +       COMMIT;");
    > +}
    >
    > Why does it need to send a notification 10 times?
    >
    I just wanted to test with multiple notifications, we can reduce to two
    or three, there is not specific reason to send 10 notifications.
    
    > ---
    > +# Consume enough XIDs to trigger truncation
    > +$node->safe_psql('postgres', 'select consume_xids(10000000);');
    > +
    >
    > I guess it consumes XID too much to trigger truncation. Given the one
    > clog segment file is 256kB in size, it's enough to consume 1,050,000
    > XIDs to move to the next segment file.
    >
    Fixed
    
    > ---
    > +# Get the new datfrozenxid after vacuum freeze to ensure that is advanced but
    > +# we can still get the notification status of the notification
    > +my $datafronzenxid_freeze = $node->safe_psql('postgres', "select
    > datfrozenxid from pg_database where datname = 'postgres'");
    > +ok($datafronzenxid_freeze > $datafronzenxid, 'datfrozenxid is advanced');
    >
    > I think this test passes in all cases, so it is meaningless. Instead,
    > what we need to check in terms of datfrozenxid is that its value
    > doesn't get greater than the XID we used for the notification, even
    > after vacuum freeze. I think we remember the XID used for NOTIFY and
    > check if the old/new datfrozenxid values are older than that value.
    >
    I think that was the goal of this test. I've swap the values to make it
    more clear. Please let me know if I misunderstood your point here.
    
    > ---
    > +is($notifications_count, 10, 'received all committed notifications');
    > +
    > +done_testing();
    >
    > After consuming the unconsumed notifications, let's do vacuum freeze
    > and check if datfrozenxid now can be advanced.
    >
    Good point, during this I realize that the datafrozenxid was not being
    advanced even after the queue being consumed. This was because when a
    backend listener is consuming the notifications it only update their own
    queue tail pointer and not the global shared tail pointer, so I've
    included a call to asyncQueueAdvanceTail() at the beginning of
    GetOldestNotifyTransactionId() to do this.
    
    > ---
    > I would expect to add 002_aborted_tx_notifies.pl in a separate patch
    > since it's not related to this bug fix.
    >
    On the new attached version I've moved this test to a separated patch.
    
    > ---
    > +# Test checks that listeners do not receive notifications from aborted
    > +# transaction even if notifications have been added to the listen/notify
    > +# queue. To reproduce it we use the fact that serializable conflicts
    > +# are checked after tx adds notifications to the queue.
    >
    > I wonder if we could implement this test using the isolation test
    > instead of the tap test. Is there any reason why you used a tap test
    > for that?
    >
    On this new version I just moved the test into a separated patch but I
    agree that we can implement this using the isolation test,
    
    I don't know how we should handle the follow-up patches that Joel have
    sent on [1], if we should incoportate on 0001 or not. I still need to
    review and test them.
    
    [1] https://www.postgresql.org/message-id/b6f6cbb8-1903-4ca0-ae64-01d84d1e12a3%40app.fastmail.com
    
    --
    Matheus Alcantara
    
  64. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-25T00:33:33Z

    On Fri Oct 24, 2025 at 4:36 AM -03, Arseniy Mukhin wrote:
    >> I would expect to add 002_aborted_tx_notifies.pl in a separate patch
    >> since it's not related to this bug fix.
    >>
    >> ---
    >> +# Test checks that listeners do not receive notifications from aborted
    >> +# transaction even if notifications have been added to the listen/notify
    >> +# queue. To reproduce it we use the fact that serializable conflicts
    >> +# are checked after tx adds notifications to the queue.
    >>
    >> I wonder if we could implement this test using the isolation test
    >> instead of the tap test. Is there any reason why you used a tap test
    >> for that?
    >>
    >
    > I agree it's less relevant to the patch now than it was with the new
    > 'committed' field approach. And there is no particular reason why it
    > was implemented as a TAP test actually.. So +1 to move it to separate
    > patch (does it mean to separate thread as well or just separate patch
    > file?) and rewrite as an isolation test (IIUC it's better to use
    > isolation test infrastructure if it's possible). I can try to do it if
    > nobody else does it earlier.
    >
    On the v11 version that I've sent on [1] I've move this test into a
    separate patch, please feel free to implement it as an isolation test if
    you want it.
    
    [1] https://www.postgresql.org/message-id/DDQZB2AD34V4.3RH2USCA72AS8%40gmail.com
    
    --
    Matheus Alcantara
    
    
    
    
    
  65. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-25T13:08:41Z

    On Sat, Oct 25, 2025 at 3:33 AM Matheus Alcantara
    <matheusssilv97@gmail.com> wrote:
    >
    > On Fri Oct 24, 2025 at 4:36 AM -03, Arseniy Mukhin wrote:
    > >> I would expect to add 002_aborted_tx_notifies.pl in a separate patch
    > >> since it's not related to this bug fix.
    > >>
    > >> ---
    > >> +# Test checks that listeners do not receive notifications from aborted
    > >> +# transaction even if notifications have been added to the listen/notify
    > >> +# queue. To reproduce it we use the fact that serializable conflicts
    > >> +# are checked after tx adds notifications to the queue.
    > >>
    > >> I wonder if we could implement this test using the isolation test
    > >> instead of the tap test. Is there any reason why you used a tap test
    > >> for that?
    > >>
    > >
    > > I agree it's less relevant to the patch now than it was with the new
    > > 'committed' field approach. And there is no particular reason why it
    > > was implemented as a TAP test actually.. So +1 to move it to separate
    > > patch (does it mean to separate thread as well or just separate patch
    > > file?) and rewrite as an isolation test (IIUC it's better to use
    > > isolation test infrastructure if it's possible). I can try to do it if
    > > nobody else does it earlier.
    > >
    > On the v11 version that I've sent on [1] I've move this test into a
    > separate patch, please feel free to implement it as an isolation test if
    > you want it.
    >
    
    Thank you!
    
    I reimplemented the test in 0002 as an isolation test and added the
    commit message. PFA the new version.
    
    
    Best regards,
    Arseniy Mukhin
    
  66. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-10-25T18:01:56Z

    On Sat, Oct 25, 2025, at 15:08, Arseniy Mukhin wrote:
    > I reimplemented the test in 0002 as an isolation test and added the
    > commit message. PFA the new version.
    
    I've rebased the patches so they can be applied on top of v12:
    - v12-vacuum_notify_queue_cleanup-without-code-dup.txt
    - v12-vacuum_notify_queue_cleanup-with-code-dup.txt
    
    I also want to share a new idea that Heikki Linnakangas came up with
    during PgConf25 in Riga.
    
    The idea is basically to scan the notification queue from tail to head,
    and update xids that precedes newFrozenXid, to either
    FrozenTransactionId if committed, or InvalidTransactionId if not.
    
    I've implemented this by adding a new extern function
    AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
    
    This way, we don't need to hold back the advancement of newFrozenXid in
    vac_update_datfrozenxid.
    
    Attempt of implementing Heikki's idea:
    - async_notify_freeze_xids.txt
    
    This idea has the benefit of never holding back newFrozenXid,
    however, I see some problems with it:
    
    - If there is a bug in the code, we could accidentally overwrite
      xid values we didn't intend to overwrite, and maybe we would never
      find out that we did.
    
    - We wouldn't know for sure that we've understood the cause of
      this bug.
    
    With v12-vacuum_notify_queue_cleanup, we instead have the downside of
    possibly holding back newFrozenXid, but with the advantage of giving us
    higher confidence in that we've correctly identified the cause of the
    bug.
    
    /Joel
  67. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-10-29T11:40:34Z

    On 25/10/2025 21:01, Joel Jacobson wrote:
    > On Sat, Oct 25, 2025, at 15:08, Arseniy Mukhin wrote:
    >> I reimplemented the test in 0002 as an isolation test and added the
    >> commit message. PFA the new version.
    > 
    > I've rebased the patches so they can be applied on top of v12:
    > - v12-vacuum_notify_queue_cleanup-without-code-dup.txt
    > - v12-vacuum_notify_queue_cleanup-with-code-dup.txt
    > 
    > I also want to share a new idea that Heikki Linnakangas came up with
    > during PgConf25 in Riga.
    > 
    > The idea is basically to scan the notification queue from tail to head,
    > and update xids that precedes newFrozenXid, to either
    > FrozenTransactionId if committed, or InvalidTransactionId if not.
    > 
    > I've implemented this by adding a new extern function
    > AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
    > 
    > This way, we don't need to hold back the advancement of newFrozenXid in
    > vac_update_datfrozenxid.
    > 
    > Attempt of implementing Heikki's idea:
    > - async_notify_freeze_xids.txt
    
    Thanks!
    
    > This idea has the benefit of never holding back newFrozenXid,
    > however, I see some problems with it:
    > 
    > - If there is a bug in the code, we could accidentally overwrite
    >    xid values we didn't intend to overwrite, and maybe we would never
    >    find out that we did.
    > 
    > - We wouldn't know for sure that we've understood the cause of
    >    this bug.
    > 
    > With v12-vacuum_notify_queue_cleanup, we instead have the downside of
    > possibly holding back newFrozenXid, but with the advantage of giving us
    > higher confidence in that we've correctly identified the cause of the
    > bug.
    
    Meh. Robustness is good and all, and in heap tuples we don't overwrite 
    the xmin/xmax but just set a FROZEN flag, precisely so that we preserve 
    evidence if something goes wrong. But I can't get too worked up about 
    that for the async notification queue.
    
    That said, we could add COMMITTED/INVALID hint bits to AsyncQueueEntry, 
    similar to heap tuples, and set the hint bit instead of replacing the 
    original xid. That might be good for performance too: If the first 
    backend to call TransactionIdDidCommit() on an AsyncQueueEntry would set 
    the hint bit, that would save the effort for other listening backends.
    
    - Heikki
    
    
    
    
  68. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-10-29T12:30:12Z

    On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
    > On 25/10/2025 21:01, Joel Jacobson wrote:
    >> I also want to share a new idea that Heikki Linnakangas came up with
    >> during PgConf25 in Riga.
    >> 
    >> The idea is basically to scan the notification queue from tail to head,
    >> and update xids that precedes newFrozenXid, to either
    >> FrozenTransactionId if committed, or InvalidTransactionId if not.
    >> 
    >> I've implemented this by adding a new extern function
    >> AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
    >> 
    >> This way, we don't need to hold back the advancement of newFrozenXid in
    >> vac_update_datfrozenxid.
    >> 
    >> Attempt of implementing Heikki's idea:
    >> - async_notify_freeze_xids.txt
    >
    > Thanks!
    
    Thanks for the idea!
    
    >> This idea has the benefit of never holding back newFrozenXid,
    >> however, I see some problems with it:
    >> 
    >> - If there is a bug in the code, we could accidentally overwrite
    >>    xid values we didn't intend to overwrite, and maybe we would never
    >>    find out that we did.
    >> 
    >> - We wouldn't know for sure that we've understood the cause of
    >>    this bug.
    >> 
    >> With v12-vacuum_notify_queue_cleanup, we instead have the downside of
    >> possibly holding back newFrozenXid, but with the advantage of giving us
    >> higher confidence in that we've correctly identified the cause of the
    >> bug.
    >
    > Meh. Robustness is good and all, and in heap tuples we don't overwrite 
    > the xmin/xmax but just set a FROZEN flag, precisely so that we preserve 
    > evidence if something goes wrong. But I can't get too worked up about 
    > that for the async notification queue.
    
    Having slept on this for some days, I'm less worried about this
    approach. I like the simplicity of it, and that we don't bolt on complexity
    to another subsystem, just for the sake of improved debugging
    capabilities of a different subsystem.
    
    I think a different way of looking at this, is that we wouldn't conceal
    a bug in async, but rather we would let vacuum do part of its job, of
    checking the commit status of the xids, when needed, and be okay with
    that split responsibility.
    
    If we can prove that the optimizations we're working on, will render
    that new code unnecessary, then I guess we can just remove it.
    
    > That said, we could add COMMITTED/INVALID hint bits to AsyncQueueEntry, 
    > similar to heap tuples, and set the hint bit instead of replacing the 
    > original xid. That might be good for performance too: If the first 
    > backend to call TransactionIdDidCommit() on an AsyncQueueEntry would set 
    > the hint bit, that would save the effort for other listening backends.
    
    Interesting, sounds like a really promising approach.
    
    /Joel
    
    
    
    
  69. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-10-29T13:05:27Z

    On 29/10/2025 14:30, Joel Jacobson wrote:
    > On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
    >> On 25/10/2025 21:01, Joel Jacobson wrote:
    >>> I also want to share a new idea that Heikki Linnakangas came up with
    >>> during PgConf25 in Riga.
    >>>
    >>> The idea is basically to scan the notification queue from tail to head,
    >>> and update xids that precedes newFrozenXid, to either
    >>> FrozenTransactionId if committed, or InvalidTransactionId if not.
    >>>
    >>> I've implemented this by adding a new extern function
    >>> AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
    >>>
    >>> This way, we don't need to hold back the advancement of newFrozenXid in
    >>> vac_update_datfrozenxid.
    >>>
    >>> Attempt of implementing Heikki's idea:
    >>> - async_notify_freeze_xids.txt
    >>
    >> Thanks!
    > 
    > Thanks for the idea!
    
    For the record, Yura suggested the same approach earlier in this thread 
    [1]. That line of discussion led to more complicated patches but I think 
    the complications came from trying to set the flag as part of 
    commit/abort. The changes are more straightforward and backpatchable if 
    we only do it at vacuum.
    
    >>> This idea has the benefit of never holding back newFrozenXid,
    >>> however, I see some problems with it:
    >>>
    >>> - If there is a bug in the code, we could accidentally overwrite
    >>>     xid values we didn't intend to overwrite, and maybe we would never
    >>>     find out that we did.
    >>>
    >>> - We wouldn't know for sure that we've understood the cause of
    >>>     this bug.
    >>>
    >>> With v12-vacuum_notify_queue_cleanup, we instead have the downside of
    >>> possibly holding back newFrozenXid, but with the advantage of giving us
    >>> higher confidence in that we've correctly identified the cause of the
    >>> bug.
    >>
    >> Meh. Robustness is good and all, and in heap tuples we don't overwrite
    >> the xmin/xmax but just set a FROZEN flag, precisely so that we preserve
    >> evidence if something goes wrong. But I can't get too worked up about
    >> that for the async notification queue.
    > 
    > Having slept on this for some days, I'm less worried about this
    > approach. I like the simplicity of it, and that we don't bolt on complexity
    > to another subsystem, just for the sake of improved debugging
    > capabilities of a different subsystem.
    > 
    > I think a different way of looking at this, is that we wouldn't conceal
    > a bug in async, but rather we would let vacuum do part of its job, of
    > checking the commit status of the xids, when needed, and be okay with
    > that split responsibility.
    
    Ok, at quick glance I think this patch is in pretty good shape. I'll 
    review it more thoroughly, and see if I can incorporate the test from 
    Matheus Alcantara's or Arseniy Mukhin's latest patches, and commit and 
    backpatch this.
    
    - Heikki
    
    
    
    
    
  70. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-29T16:48:23Z

    "Joel Jacobson" <joel@compiler.org> writes:
    > On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
    >> That said, we could add COMMITTED/INVALID hint bits to AsyncQueueEntry, 
    >> similar to heap tuples, and set the hint bit instead of replacing the 
    >> original xid. That might be good for performance too: If the first 
    >> backend to call TransactionIdDidCommit() on an AsyncQueueEntry would set 
    >> the hint bit, that would save the effort for other listening backends.
    
    > Interesting, sounds like a really promising approach.
    
    I don't see any room for adding hint bits without enlarging
    AsyncQueueEntry, which I'd rather not do.  But we could mechanize that
    without a separate hint bit by replacing the tested XID with FrozenXid
    or InvalidXid, as the patch already does.  We already require writing
    an XID to shared memory to be atomic, so that seems okay.
    
    However ... that won't actually work, the reason being that
    asyncQueueProcessPageEntries() doesn't work directly from an SLRU page
    but from a local copy.  Even if it were to modify the state of that
    copy, no other backend would see the effects.
    
    The reason it's like that is stated in the comments:
    
     * The current page must have been fetched into page_buffer from shared
     * memory.  (We could access the page right in shared memory, but that
     * would imply holding the SLRU bank lock throughout this routine.)
    
    The patch proposed here likewise appears to involve holding an SLRU
    bank lock throughout what could be a significant number of
    TransactionIdDidCommit tests.  That seems like it could result in a
    pretty bad "burp" in NOTIFY throughput.  That problem is ameliorated
    by only doing it when VACUUM is trying to advance datfrozenxid, but
    still I wonder if we can't find a less concurrency-unfriendly answer.
    
    The local-copy behavior also means that this patch isn't quite a 100%
    fix.  We could have a race condition like so:
    
    1. Backend A grabs a copy of some SLRU page and begins running
    asyncQueueProcessPageEntries(), but then loses the CPU.
    
    2. Backend B completes a VACUUM, finds it can advance datfrozenxid,
    marks some relevant XIDs as frozen in the notify SLRU, and truncates
    clog.
    
    3. Backend A gets control back, tries to discover the state of
    some XID that's still present in its local copy of the XID page,
    and fails.
    
    Step 2 will take long enough that this isn't very plausible
    timing-wise, but it's still theoretically a hole.
    
    All of this is a problem mainly because of the presumption that
    holding an SLRU bank lock for a long time is bad.  I wonder how
    dangerous that really is.
    
    			regards, tom lane
    
    
    
    
  71. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-10-29T17:13:16Z

    On Wed, Oct 29, 2025, at 17:48, Tom Lane wrote:
    > However ... that won't actually work, the reason being that
    > asyncQueueProcessPageEntries() doesn't work directly from an SLRU page
    > but from a local copy.  Even if it were to modify the state of that
    > copy, no other backend would see the effects.
    >
    > The reason it's like that is stated in the comments:
    >
    >  * The current page must have been fetched into page_buffer from shared
    >  * memory.  (We could access the page right in shared memory, but that
    >  * would imply holding the SLRU bank lock throughout this routine.)
    >
    > The patch proposed here likewise appears to involve holding an SLRU
    > bank lock throughout what could be a significant number of
    > TransactionIdDidCommit tests.  That seems like it could result in a
    > pretty bad "burp" in NOTIFY throughput.  That problem is ameliorated
    > by only doing it when VACUUM is trying to advance datfrozenxid, but
    > still I wonder if we can't find a less concurrency-unfriendly answer.
    ...
    > All of this is a problem mainly because of the presumption that
    > holding an SLRU bank lock for a long time is bad.  I wonder how
    > dangerous that really is.
    
    Ops. Sounds scary.
    
    I don't know if others have looked at the v12-vacuum_notify_queue_cleanup
    approach; if it's bad, it would be helpful to understand why.
    
    /Joel
    
    
    
    
  72. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-10-30T13:25:31Z

    On 29/10/2025 18:48, Tom Lane wrote:
    > However ... that won't actually work, the reason being that
    > asyncQueueProcessPageEntries() doesn't work directly from an SLRU page
    > but from a local copy.  Even if it were to modify the state of that
    > copy, no other backend would see the effects.
    
    Ah, I missed that local copy.
    
    > The reason it's like that is stated in the comments:
    > 
    >   * The current page must have been fetched into page_buffer from shared
    >   * memory.  (We could access the page right in shared memory, but that
    >   * would imply holding the SLRU bank lock throughout this routine.)
    > 
    > The patch proposed here likewise appears to involve holding an SLRU
    > bank lock throughout what could be a significant number of
    > TransactionIdDidCommit tests.  That seems like it could result in a
    > pretty bad "burp" in NOTIFY throughput.  That problem is ameliorated
    > by only doing it when VACUUM is trying to advance datfrozenxid, but
    > still I wonder if we can't find a less concurrency-unfriendly answer.
    
    Hmm. I thought the original comment was more worried about sending the 
    message to the client being slow. But I guess the 
    TransactionIdDidCommit() calls are not free either.
    
    > The local-copy behavior also means that this patch isn't quite a 100%
    > fix.  We could have a race condition like so:
    > 
    > 1. Backend A grabs a copy of some SLRU page and begins running
    > asyncQueueProcessPageEntries(), but then loses the CPU.
    > 
    > 2. Backend B completes a VACUUM, finds it can advance datfrozenxid,
    > marks some relevant XIDs as frozen in the notify SLRU, and truncates
    > clog.
    > 
    > 3. Backend A gets control back, tries to discover the state of
    > some XID that's still present in its local copy of the XID page,
    > and fails.
    > 
    > Step 2 will take long enough that this isn't very plausible
    > timing-wise, but it's still theoretically a hole.
    
    It would be better than what we have now, but sure would be nice to fix 
    the problem fully..
    
    A straightforward fix would be to introduce another lock, 
    NotifyQueueVacuumLock, that's held in shared mode across 
    asyncQueueReadAllNotifications(), and acquired in exclusive mode by 
    freezing. That way freezing would wait out the concurrent readers that 
    might have local copies. Or some other similar mechanism for waiting 
    them out with e.g. condition variables.
    
    > All of this is a problem mainly because of the presumption that
    > holding an SLRU bank lock for a long time is bad.  I wonder how
    > dangerous that really is.
    
    It was worse in older versions before the SLRU banks were introduced.
    
    Holding the lock while we send the notification to the client is not 
    acceptable. But perhaps we could split asyncQueueProcessPageEntries() 
    into two parts:
    
    1. Do the TransactionIdDidCommit() checks on each entry and copy the 
    visible entries to local memory.
    2. Release lock and send the notifications to the backend.
    
    Joel, since you've been working on some optimizations in this area too, 
    would you happen to have some suitable performance test scripts for this?
    
    - Heikki
    
    
    
    
    
  73. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-10-30T23:08:04Z

    On Thu, Oct 30, 2025, at 14:25, Heikki Linnakangas wrote:
    > Joel, since you've been working on some optimizations in this area too, 
    > would you happen to have some suitable performance test scripts for this?
    
    Glad you asked. I'm actually working on a benchmark+correctness tester.
    It's very much work-in-progress though, don't look too much at the code,
    or your eyes will bleed.
    
    It's a combined benchmark + correctness tester, that verifies that only
    the expected notifications are received on the expected connections,
    while at the same time doing timing measurements.
    
    ```
    % ./pg_bench_lino --help
    Usage: ./pg_bench_lino [OPTIONS]
    
    PostgreSQL LISTEN/NOTIFY Benchmark Tool (CLI version)
    
    Options:
      -c, --connections N       Number of database connections (default: 1)
      -n, --channels N          Number of notification channels (default: 1)
      -t, --tick-ms N           Tick interval in milliseconds (default: 1)
      -l, --listen-prob N       LISTEN probability % (default: 0.1, 0 to disable)
      -u, --unlisten-prob N     UNLISTEN probability % (default: 0.05, 0 to disable)
      -p, --notify-prob N       NOTIFY probability % (default: 1.0, 0 to disable)
      -a, --unlisten-all-prob N UNLISTEN * probability % (default: 0.01, 0 to disable)
      -T, --ticks N             Number of ticks to run (REQUIRED)
      -s, --seed N              Random seed for reproducibility (default: current time)
      -h, --help                Show this help message
    ```
    
    Example run on my MacBook M3 Max
    
    --- master:
    
    % ./pg_bench_lino -t 0 -c 100 -n 1000 -l 10 -u 0 -p 100 -s 42 -T 10000
    Initializing 100 connections and 1000 channels...
    Initialization complete. Starting benchmark...
    
    ========================================================
    PostgreSQL LISTEN/NOTIFY Benchmark Results
    ========================================================
    
    Configuration:
      Connections: 100
      Channels: 1000
      Tick interval: 0 ms
      LISTEN probability: 10.0000%
      UNLISTEN probability: 0.0000%
      NOTIFY probability: 100.0000%
      UNLISTEN * probability: 0.0100%
      Ticks executed: 10000
      Random seed: 42
    
    Final State:
      Active connections: 100
      Active channels: 1000
      Listening pairs: 989
      Correctness errors: 0
    
    ========================================================
    Operation Statistics:
    Operation            Count      Min(ms)      Avg(ms)      Max(ms)
    --------------------------------------------------------
    LISTEN                1009        0.016        0.029        0.291
    UNLISTEN                 0            -            -            -
    UNLISTEN *               1        0.071        0.071        0.071
    NOTIFY                9989        0.064        0.247        2.163
    NOTIFY delivery       9989        0.072        0.277        2.170
    
    ========================================================
    NOTIFY Delivery Time Distribution:
    
    0.05-0.1ms   # 4 (0.0%)
    0.1-0.15ms   ## 263 (2.6%)
    0.15-0.2ms   ####### 881 (8.8%)
    0.2-0.3ms    ################################################## 5907 (59.1%)
    0.3-0.4ms    #################### 2388 (23.9%)
    0.4-0.5ms    #### 525 (5.3%)
    0.5-0.75ms   # 18 (0.2%)
    1-2ms        # 2 (0.0%)
    2-5ms        # 1 (0.0%)
    
    
    -- Optimization v22 patch:
    
    % ./pg_bench_lino -t 0 -c 100 -n 1000 -l 10 -u 0 -p 100 -s 42 -T 10000
    Initializing 100 connections and 1000 channels...
    Initialization complete. Starting benchmark...
    
    ========================================================
    PostgreSQL LISTEN/NOTIFY Benchmark Results
    ========================================================
    
    Configuration:
      Connections: 100
      Channels: 1000
      Tick interval: 0 ms
      LISTEN probability: 10.0000%
      UNLISTEN probability: 0.0000%
      NOTIFY probability: 100.0000%
      UNLISTEN * probability: 0.0100%
      Ticks executed: 10000
      Random seed: 42
    
    Final State:
      Active connections: 100
      Active channels: 1000
      Listening pairs: 989
      Correctness errors: 0
    
    ========================================================
    Operation Statistics:
    Operation            Count      Min(ms)      Avg(ms)      Max(ms)
    --------------------------------------------------------
    LISTEN                1009        0.015        0.023        0.340
    UNLISTEN                 0            -            -            -
    UNLISTEN *               1        0.056        0.056        0.056
    NOTIFY                9989        0.018        0.031        2.037
    NOTIFY delivery       9989        0.022        0.062        2.056
    
    ========================================================
    NOTIFY Delivery Time Distribution:
    
    0-0.05ms     ################################################## 5002 (50.1%)
    0.05-0.1ms   ######################################### 4130 (41.3%)
    0.1-0.15ms   # 77 (0.8%)
    0.15-0.2ms   #### 427 (4.3%)
    0.2-0.3ms    ### 338 (3.4%)
    0.3-0.4ms    # 11 (0.1%)
    0.4-0.5ms    # 1 (0.0%)
    2-5ms        # 3 (0.0%)
    
    To compile:
    gcc -Wall -Wextra -g -I"$(pg_config --includedir)" -c pg_bench_lino.c -o pg_bench_lino.o
    gcc pg_bench_lino.o -L"$(pg_config --libdir)" -lpq -o pg_bench_lino
    
    /Joel
  74. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-10-30T23:27:11Z

    On Fri, Oct 31, 2025, at 00:08, Joel Jacobson wrote:
    > On Thu, Oct 30, 2025, at 14:25, Heikki Linnakangas wrote:
    >> Joel, since you've been working on some optimizations in this area too, 
    >> would you happen to have some suitable performance test scripts for this?
    >
    > Glad you asked. I'm actually working on a benchmark+correctness tester.
    > It's very much work-in-progress though, don't look too much at the code,
    > or your eyes will bleed.
    >
    > It's a combined benchmark + correctness tester, that verifies that only
    > the expected notifications are received on the expected connections,
    > while at the same time doing timing measurements.
    
    To run multiple pg_bench_lino processes in parallell to simulate
    concurrent workloads, I realized the randomization of the channel names
    and payloads were not random enough to avoid collissions. New version
    attached that uses real UUIDs for channel names and payloads.
    
    /Joel
    
  75. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-10-31T16:40:03Z

    On 31/10/2025 01:27, Joel Jacobson wrote:
    > On Fri, Oct 31, 2025, at 00:08, Joel Jacobson wrote:
    >> On Thu, Oct 30, 2025, at 14:25, Heikki Linnakangas wrote:
    >>> Joel, since you've been working on some optimizations in this area too,
    >>> would you happen to have some suitable performance test scripts for this?
    >>
    >> Glad you asked. I'm actually working on a benchmark+correctness tester.
    >> It's very much work-in-progress though, don't look too much at the code,
    >> or your eyes will bleed.
    >>
    >> It's a combined benchmark + correctness tester, that verifies that only
    >> the expected notifications are received on the expected connections,
    >> while at the same time doing timing measurements.
    > 
    > To run multiple pg_bench_lino processes in parallell to simulate
    > concurrent workloads, I realized the randomization of the channel names
    > and payloads were not random enough to avoid collissions. New version
    > attached that uses real UUIDs for channel names and payloads.
    
    Thanks! Here's a sketch for holding the bank lock across 
    TransactionIdDidCommit() calls. In quick testing with your test program, 
    I can't see any performance difference. However, I'm not quite sure what 
    options I should be using to stress this. My gut feeling is that it's 
    fine, but it'd be nice to do construct a real worst case test case to be 
    sure.
    
    There are some opportunities for micro-optimizations here:
    
    * IsListeningOn() is kind of expensive if a backend is listening 
    multiple channels. We really should turn that into a hash table. As the 
    patch stands, I'm doing the IsListeningOn() calls while holding the bank 
    lock, but unless we speed up IsListeningOn() in those degenerate cases, 
    we should perhaps call it only after copying and releasing the lock.
    
    * If IsListeningOn() is made fast, it might make sense to call it before 
    TransactionIdDidCommit().
    
    * Implement the hint bits.
    
    I don't know how much those matter. Again, a test case would be nice.
    
    I'll work more on performance testing next week of this, if no one else 
    picks that up.
    
    - Heikki
    
  76. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-03T11:02:32Z

    On 31/10/2025 18:40, Heikki Linnakangas wrote:
    > On 31/10/2025 01:27, Joel Jacobson wrote:
    >> On Fri, Oct 31, 2025, at 00:08, Joel Jacobson wrote:
    >>> On Thu, Oct 30, 2025, at 14:25, Heikki Linnakangas wrote:
    >>>> Joel, since you've been working on some optimizations in this area too,
    >>>> would you happen to have some suitable performance test scripts for 
    >>>> this?
    >>>
    >>> Glad you asked. I'm actually working on a benchmark+correctness tester.
    >>> It's very much work-in-progress though, don't look too much at the code,
    >>> or your eyes will bleed.
    >>>
    >>> It's a combined benchmark + correctness tester, that verifies that only
    >>> the expected notifications are received on the expected connections,
    >>> while at the same time doing timing measurements.
    >>
    >> To run multiple pg_bench_lino processes in parallell to simulate
    >> concurrent workloads, I realized the randomization of the channel names
    >> and payloads were not random enough to avoid collissions. New version
    >> attached that uses real UUIDs for channel names and payloads.
    > 
    > Thanks! Here's a sketch for holding the bank lock across 
    > TransactionIdDidCommit() calls. In quick testing with your test program, 
    > I can't see any performance difference. However, I'm not quite sure what 
    > options I should be using to stress this. My gut feeling is that it's 
    > fine, but it'd be nice to do construct a real worst case test case to be 
    > sure.
    
    I wrote another little stand-alone performance test program for this, 
    attached. It launches N connections that send NOTIFYs to a single 
    channel as fast as possible, and M threads that listen for the 
    notifications. I ran it with different combinations of N and M, on 
    'master' and on REL_14_STABLE (which didn't have SLRU banks) and I 
    cannot discern any performance difference from these patches. So it 
    seems that holding the SLRU (bank) lock across the 
    TransactionIdDidCommit() calls is fine.
    
    - Heikki
    
  77. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-11-03T21:45:00Z

    On Mon, Nov 3, 2025, at 12:02, Heikki Linnakangas wrote:
    > I wrote another little stand-alone performance test program for this, 
    > attached. It launches N connections that send NOTIFYs to a single 
    > channel as fast as possible, and M threads that listen for the 
    > notifications. I ran it with different combinations of N and M, on 
    > 'master' and on REL_14_STABLE (which didn't have SLRU banks) and I 
    > cannot discern any performance difference from these patches. So it 
    > seems that holding the SLRU (bank) lock across the 
    > TransactionIdDidCommit() calls is fine.
    
    Nice! That for the benchmark code! I took the liberty of hacking a bit
    on it, and added support for multiple channels, with separate listener
    and notifier threads per channel. Each notification now carries the
    notifier ID, a sequence number, and a send timestamp. Listeners verify
    that sequence numbers arrive in order and record delivery latency. The
    program collects latency measurements into fixed buckets and reports
    them once per second together with total and per-second send/receive
    counts.
    
    Also added a short delay before starting notifiers so that listeners
    have time to issue their LISTEN commands, and a new --channels option,
    and the meaning of --listeners and --notifiers was changed to apply per
    channel.
    
    Also fixed so the code could be compiled outside of the PostgreSQL
    source code repo, if wanting to build this as stand-alone tool.
    
    I've benchmarked master vs 0001+0002 and can't notice any differences;
    see attached output from benchmark runs.
    
    /Joel
  78. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-04T12:09:58Z

    On 03/11/2025 23:45, Joel Jacobson wrote:
    > On Mon, Nov 3, 2025, at 12:02, Heikki Linnakangas wrote:
    >> I wrote another little stand-alone performance test program for this,
    >> attached. It launches N connections that send NOTIFYs to a single
    >> channel as fast as possible, and M threads that listen for the
    >> notifications. I ran it with different combinations of N and M, on
    >> 'master' and on REL_14_STABLE (which didn't have SLRU banks) and I
    >> cannot discern any performance difference from these patches. So it
    >> seems that holding the SLRU (bank) lock across the
    >> TransactionIdDidCommit() calls is fine.
    > 
    > Nice! That for the benchmark code! I took the liberty of hacking a bit
    > on it, and added support for multiple channels, with separate listener
    > and notifier threads per channel. Each notification now carries the
    > notifier ID, a sequence number, and a send timestamp. Listeners verify
    > that sequence numbers arrive in order and record delivery latency. The
    > program collects latency measurements into fixed buckets and reports
    > them once per second together with total and per-second send/receive
    > counts.
    > 
    > Also added a short delay before starting notifiers so that listeners
    > have time to issue their LISTEN commands, and a new --channels option,
    > and the meaning of --listeners and --notifiers was changed to apply per
    > channel.
    > 
    > Also fixed so the code could be compiled outside of the PostgreSQL
    > source code repo, if wanting to build this as stand-alone tool.
    > 
    > I've benchmarked master vs 0001+0002 and can't notice any differences;
    > see attached output from benchmark runs.
    
    Thanks. After some further testing, I was able to find a scenario where 
    this patch significantly reduces performance: if the listening backends 
    subscribe to a massive number of channels, like 10000, they spend a lot 
    of time scanning the linked list of subscribed channels in 
    IsListeningOn(). With the patch, those checks were performed while 
    holding the SLRU lock, and it started to show up as lock contention 
    between notifiers and listeners. To demonstrate that, attached is 
    another version of the test program that adds an --extra-channels=N 
    argument. If you set it to e.g. 10000, each listener backends calls 
    LISTEN on 10000 additional channels that are never notified. They just 
    make the listenChannels list longer. With that and the patches I posted 
    previously, I'm getting:
    
    $ PGHOST=localhost PGDB=postgres://localhost/postgres 
    ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 
    --extra-channels=10000
    10 s: 12716 sent (1274/s), 635775 received (63737/s)
      0.00-0.01ms                0 (0.0%) avg: 0.000ms
      0.01-0.10ms                0 (0.0%) avg: 0.000ms
      0.10-1.00ms     #          1915 (0.3%) avg: 0.807ms
      1.00-10.00ms    #########  633550 (99.7%) avg: 3.502ms
      10.00-100.00ms  #          310 (0.0%) avg: 11.423ms
     >100.00ms                  0 (0.0%) avg: 0.000ms
    ^C
    
    Whereas on 'master', I see about 2-3x more notifies/s:
    
    $ PGHOST=localhost PGDB=postgres://localhost/postgres 
    ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 
    --extra-channels=10000
    10 s: 32057 sent (3296/s), 1602995 received (164896/s)
      0.00-0.01ms                0 (0.0%) avg: 0.000ms
      0.01-0.10ms     #          11574 (0.7%) avg: 0.078ms
      0.10-1.00ms     ######     1082960 (67.6%) avg: 0.577ms
      1.00-10.00ms    ###        508199 (31.7%) avg: 1.489ms
      10.00-100.00ms  #          262 (0.0%) avg: 16.178ms
     >100.00ms                  0 (0.0%) avg: 0.000ms
    ^C
    
    Fortunately that's easy to fix: We can move the IsListeningOn() check 
    after releasing the lock. See attached.
    
    The elephant in the room of course is that a lookup in a linked list is 
    O(n) and it would be very straightforward to replace it with e.g. a hash 
    table. We should do that irrespective of this bug fix. But I'm inclined 
    to do it as a separate followup patch.
    
    - Heikki
    
  79. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-04T14:40:31Z

    Hi,
    
    On Tue, Nov 4, 2025 at 3:10 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    >
    > On 03/11/2025 23:45, Joel Jacobson wrote:
    > > On Mon, Nov 3, 2025, at 12:02, Heikki Linnakangas wrote:
    > >> I wrote another little stand-alone performance test program for this,
    > >> attached. It launches N connections that send NOTIFYs to a single
    > >> channel as fast as possible, and M threads that listen for the
    > >> notifications. I ran it with different combinations of N and M, on
    > >> 'master' and on REL_14_STABLE (which didn't have SLRU banks) and I
    > >> cannot discern any performance difference from these patches. So it
    > >> seems that holding the SLRU (bank) lock across the
    > >> TransactionIdDidCommit() calls is fine.
    > >
    > > Nice! That for the benchmark code! I took the liberty of hacking a bit
    > > on it, and added support for multiple channels, with separate listener
    > > and notifier threads per channel. Each notification now carries the
    > > notifier ID, a sequence number, and a send timestamp. Listeners verify
    > > that sequence numbers arrive in order and record delivery latency. The
    > > program collects latency measurements into fixed buckets and reports
    > > them once per second together with total and per-second send/receive
    > > counts.
    > >
    > > Also added a short delay before starting notifiers so that listeners
    > > have time to issue their LISTEN commands, and a new --channels option,
    > > and the meaning of --listeners and --notifiers was changed to apply per
    > > channel.
    > >
    > > Also fixed so the code could be compiled outside of the PostgreSQL
    > > source code repo, if wanting to build this as stand-alone tool.
    > >
    > > I've benchmarked master vs 0001+0002 and can't notice any differences;
    > > see attached output from benchmark runs.
    >
    > Thanks. After some further testing, I was able to find a scenario where
    > this patch significantly reduces performance: if the listening backends
    > subscribe to a massive number of channels, like 10000, they spend a lot
    > of time scanning the linked list of subscribed channels in
    > IsListeningOn(). With the patch, those checks were performed while
    > holding the SLRU lock, and it started to show up as lock contention
    > between notifiers and listeners. To demonstrate that, attached is
    > another version of the test program that adds an --extra-channels=N
    > argument. If you set it to e.g. 10000, each listener backends calls
    > LISTEN on 10000 additional channels that are never notified. They just
    > make the listenChannels list longer. With that and the patches I posted
    > previously, I'm getting:
    >
    > $ PGHOST=localhost PGDB=postgres://localhost/postgres
    > ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1
    > --extra-channels=10000
    > 10 s: 12716 sent (1274/s), 635775 received (63737/s)
    >   0.00-0.01ms                0 (0.0%) avg: 0.000ms
    >   0.01-0.10ms                0 (0.0%) avg: 0.000ms
    >   0.10-1.00ms     #          1915 (0.3%) avg: 0.807ms
    >   1.00-10.00ms    #########  633550 (99.7%) avg: 3.502ms
    >   10.00-100.00ms  #          310 (0.0%) avg: 11.423ms
    >  >100.00ms                  0 (0.0%) avg: 0.000ms
    > ^C
    >
    > Whereas on 'master', I see about 2-3x more notifies/s:
    >
    > $ PGHOST=localhost PGDB=postgres://localhost/postgres
    > ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1
    > --extra-channels=10000
    > 10 s: 32057 sent (3296/s), 1602995 received (164896/s)
    >   0.00-0.01ms                0 (0.0%) avg: 0.000ms
    >   0.01-0.10ms     #          11574 (0.7%) avg: 0.078ms
    >   0.10-1.00ms     ######     1082960 (67.6%) avg: 0.577ms
    >   1.00-10.00ms    ###        508199 (31.7%) avg: 1.489ms
    >   10.00-100.00ms  #          262 (0.0%) avg: 16.178ms
    >  >100.00ms                  0 (0.0%) avg: 0.000ms
    > ^C
    >
    > Fortunately that's easy to fix: We can move the IsListeningOn() check
    > after releasing the lock. See attached.
    >
    
    Thank you for working on this!
    
    It seems that 0002 handles errors during NotifyMyFrontEnd a little differently.
    
    With master, in case of a failure during NotifyMyFrontEnd, the
    listener's position in PG_FINALLY is set to the beginning of the next
    notification, since we advance the "current" position only if the
    previous notification was successfully sent.
    
    With 0002, we advance the "current" position while copying
    notifications to the local buffer, and begin sending them after the
    position has already been advanced for all copied notifications. So in
    case of a failure, the listener's position in PG_FINALLY is set to the
    beginning of the next page or queue head. This means we can lose
    notifications that were copied but were not sent.
    
    If we want to preserve the previous behavior, maybe we could use a new
    local position while copying notifications and only advance the
    "current" position while sending notifications to the frontend?
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  80. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-11-04T14:44:28Z

    Heikki Linnakangas <hlinnaka@iki.fi> writes:
    > Fortunately that's easy to fix: We can move the IsListeningOn() check 
    > after releasing the lock. See attached.
    > The elephant in the room of course is that a lookup in a linked list is 
    > O(n) and it would be very straightforward to replace it with e.g. a hash 
    > table. We should do that irrespective of this bug fix. But I'm inclined 
    > to do it as a separate followup patch.
    
    There is a different patch series that's concerned with improving
    NOTIFY throughput [1].  I think this one should just focus on
    fixing the XID problem.
    
    			regards, tom lane
    
    [1] https://commitfest.postgresql.org/patch/6078/
    
    
    
    
  81. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-11-04T16:27:48Z

    On 2025-Nov-04, Heikki Linnakangas wrote:
    
    > From 7c342e6efffc8d59c2e7658f6f2f3b138d02e0bb Mon Sep 17 00:00:00 2001
    > From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
    > Date: Tue, 4 Nov 2025 13:22:08 +0200
    > Subject: [PATCH v2 1/2] Fix bug where we truncated CLOG that was still needed
    >  by LISTEN/NOTIFY
    
    > The async notification queue contains the XID of the sender, and when
    > processing notifications we call TransactionIdDidCommit() on the
    > XID. But we had no safeguards to prevent the CLOG segments containing
    > those XIDs from being truncated away. As a result, if a backend didn't
    > for some reason process its notifications for a long time, or when a
    > new backend issued LISTEN, you could get an error like:
    > 
    > test=# listen c21;
    > ERROR:  58P01: could not access status of transaction 14279685
    > DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
    > LOCATION:  SlruReportIOError, slru.c:1087
    > 
    > Note: This commit is not a full fix. [...]
    
    The commit message doesn't say just _what_ does the patch do to fix the
    bug though.  Looking through the code, I think you're setting XIDs in
    the async queue to frozenXid, but I think the commit message should
    explain that.
    
    Thanks for spending time on this.
    
    -- 
    Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
    "La fuerza no está en los medios físicos
    sino que reside en una voluntad indomable" (Gandhi)
    
    
    
    
  82. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-11-04T20:43:15Z

    On Tue, Nov 4, 2025, at 13:09, Heikki Linnakangas wrote:
    > With that and the patches I posted 
    > previously, I'm getting:
    >
    > $ PGHOST=localhost PGDB=postgres://localhost/postgres 
    > ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 
    > --extra-channels=10000
    > 10 s: 12716 sent (1274/s), 635775 received (63737/s)
    >   0.00-0.01ms                0 (0.0%) avg: 0.000ms
    >   0.01-0.10ms                0 (0.0%) avg: 0.000ms
    >   0.10-1.00ms     #          1915 (0.3%) avg: 0.807ms
    >   1.00-10.00ms    #########  633550 (99.7%) avg: 3.502ms
    >   10.00-100.00ms  #          310 (0.0%) avg: 11.423ms
    >  >100.00ms                  0 (0.0%) avg: 0.000ms
    > ^C
    >
    > Whereas on 'master', I see about 2-3x more notifies/s:
    >
    > $ PGHOST=localhost PGDB=postgres://localhost/postgres 
    > ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 
    > --extra-channels=10000
    > 10 s: 32057 sent (3296/s), 1602995 received (164896/s)
    >   0.00-0.01ms                0 (0.0%) avg: 0.000ms
    >   0.01-0.10ms     #          11574 (0.7%) avg: 0.078ms
    >   0.10-1.00ms     ######     1082960 (67.6%) avg: 0.577ms
    >   1.00-10.00ms    ###        508199 (31.7%) avg: 1.489ms
    >   10.00-100.00ms  #          262 (0.0%) avg: 16.178ms
    >  >100.00ms                  0 (0.0%) avg: 0.000ms
    
    Nice with the --extra-channels addition!
    
    Below results on my MacBook Pro M3 Max.
    I ran them for 30s to get more stable distributions.
    
    % ninja install
    C compiler for the host machine: cc (clang 17.0.0 "Apple clang version 17.0.0 (clang-1700.0.13.5)")
    C linker for the host machine: cc ld64 1167.5
    Host machine cpu family: aarch64
    Host machine cpu: aarch64
    buildtype: release
    
    % gcc -Wall -Wextra -O2 -pthread -I/Users/joel/pg19/include/postgresql/server -I/Users/joel/pg19/include -o async-notify-test-3 async-notify-test-3.c -L/Users/joel/pg19/lib -lpq -pthread -lm
    
    % ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 --extra-channels=10000
    
    # master (c98dffcb7c7010d216dc16d22cb594ef7d65fde1)
    30 s: 293329 sent (9656/s), 14653182 received (480181/s)
     0.00-0.01ms                0 (0.0%) avg: 0.000ms
     0.01-0.10ms     #          298195 (2.0%) avg: 0.078ms
     0.10-1.00ms     ##         3060085 (20.9%) avg: 0.390ms
     1.00-10.00ms    ##         3310362 (22.6%) avg: 4.242ms
     10.00-100.00ms  #####      7482595 (51.1%) avg: 44.441ms
    >100.00ms       #          501945 (3.4%) avg: 119.257ms
    
    # 0001-Fix-bug-where-we-truncated-CLOG-that-was-still-neede.patch
    # 0002-Hold-SLRU-bank-lock-across-TransactionIdDidCommit-in.patch
    30 s: 87462 sent (3174/s), 4373100 received (158698/s)
     0.00-0.01ms                0 (0.0%) avg: 0.000ms
     0.01-0.10ms     #          16 (0.0%) avg: 0.091ms
     0.10-1.00ms     #          576554 (13.2%) avg: 0.798ms
     1.00-10.00ms    ########   3796530 (86.8%) avg: 1.589ms
     10.00-100.00ms             0 (0.0%) avg: 0.000ms
    >100.00ms                  0 (0.0%) avg: 0.000ms
    
    # v2-0001-Fix-bug-where-we-truncated-CLOG-that-was-still-ne.patch
    # v2-0002-Fix-remaining-race-condition-with-CLOG-truncation.patch
    30 s: 274342 sent (10134/s), 13708284 received (525274/s)
     0.00-0.01ms                0 (0.0%) avg: 0.000ms
     0.01-0.10ms     #          241981 (1.8%) avg: 0.079ms
     0.10-1.00ms     ###        4213484 (30.7%) avg: 0.399ms
     1.00-10.00ms    #          2727917 (19.9%) avg: 4.033ms
     10.00-100.00ms  ####       5920198 (43.2%) avg: 46.500ms
    >100.00ms       #          604712 (4.4%) avg: 118.336ms
    
    /Joel
    
    
    
    
  83. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Joel Jacobson <joel@compiler.org> — 2025-11-05T01:10:45Z

    On Tue, Nov 4, 2025, at 13:09, Heikki Linnakangas wrote:
    > The elephant in the room of course is that a lookup in a linked list is 
    > O(n) and it would be very straightforward to replace it with e.g. a hash 
    > table. We should do that irrespective of this bug fix. But I'm inclined 
    > to do it as a separate followup patch.
    
    Thanks for the idea of replacing it with a local hash table.
    I've implemented it in the patch set we're working on:
    https://commitfest.postgresql.org/patch/6078/
    https://www.postgresql.org/message-id/12c08e29-c21c-4a3d-a269-a48a1a26b18d%40app.fastmail.com
    
    /Joel
    
    
    
    
  84. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-05T09:21:29Z

    On 04/11/2025 16:40, Arseniy Mukhin wrote:
    > It seems that 0002 handles errors during NotifyMyFrontEnd a little differently.
    > 
    > With master, in case of a failure during NotifyMyFrontEnd, the
    > listener's position in PG_FINALLY is set to the beginning of the next
    > notification, since we advance the "current" position only if the
    > previous notification was successfully sent.
    > 
    > With 0002, we advance the "current" position while copying
    > notifications to the local buffer, and begin sending them after the
    > position has already been advanced for all copied notifications. So in
    > case of a failure, the listener's position in PG_FINALLY is set to the
    > beginning of the next page or queue head. This means we can lose
    > notifications that were copied but were not sent.
    
    True.
    
    > If we want to preserve the previous behavior, maybe we could use a new
    > local position while copying notifications and only advance the
    > "current" position while sending notifications to the frontend?
    
    That's not good. The loop advances 'current' before calling 
    TransactionIdDidCommit() to ensure that if there's a broken entry in the 
    queue for which TransactionIdDidCommit() throws an error, we advance 
    'current' past that point. Otherwise we would get back later to try to 
    process the same broken entry again and again.
    
    We could put the NotifyMyFrontEnd() calls in a PG_FINALLY block, so that 
    the copied notifications get sent even on error. But I'm a reluctant to 
    put that in PG_FINALLY, it's not clear to me if NotifyMyFrontEnd() is 
    safe to call during error processing. And if the client is not 
    processing the notifications, the abort processing could be delayed for 
    a long time.
    
    One idea is to close the client connection on a TransactionIdDidCommit 
    error, i.e. make the error FATAL. The current behavior where we skip the 
    notification seems problematic already. Closing the connection would be 
    a clear signal that some notifications have been lost.
    
    Or we can just accept new behavior that if TransactionIdDidCommit() 
    fails, we might lose more notifications than the one that caused the error.
    
    - Heikki
    
    
    
    
    
  85. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-05T12:40:38Z

    On Wed Nov 5, 2025 at 6:21 AM -03, Heikki Linnakangas wrote:
    > On 04/11/2025 16:40, Arseniy Mukhin wrote:
    >> It seems that 0002 handles errors during NotifyMyFrontEnd a little differently.
    >>
    >> With master, in case of a failure during NotifyMyFrontEnd, the
    >> listener's position in PG_FINALLY is set to the beginning of the next
    >> notification, since we advance the "current" position only if the
    >> previous notification was successfully sent.
    >>
    >> With 0002, we advance the "current" position while copying
    >> notifications to the local buffer, and begin sending them after the
    >> position has already been advanced for all copied notifications. So in
    >> case of a failure, the listener's position in PG_FINALLY is set to the
    >> beginning of the next page or queue head. This means we can lose
    >> notifications that were copied but were not sent.
    >
    > True.
    >
    >> If we want to preserve the previous behavior, maybe we could use a new
    >> local position while copying notifications and only advance the
    >> "current" position while sending notifications to the frontend?
    >
    > That's not good. The loop advances 'current' before calling
    > TransactionIdDidCommit() to ensure that if there's a broken entry in the
    > queue for which TransactionIdDidCommit() throws an error, we advance
    > 'current' past that point. Otherwise we would get back later to try to
    > process the same broken entry again and again.
    >
    > We could put the NotifyMyFrontEnd() calls in a PG_FINALLY block, so that
    > the copied notifications get sent even on error. But I'm a reluctant to
    > put that in PG_FINALLY, it's not clear to me if NotifyMyFrontEnd() is
    > safe to call during error processing. And if the client is not
    > processing the notifications, the abort processing could be delayed for
    > a long time.
    >
    > One idea is to close the client connection on a TransactionIdDidCommit
    > error, i.e. make the error FATAL. The current behavior where we skip the
    > notification seems problematic already. Closing the connection would be
    > a clear signal that some notifications have been lost.
    >
    > Or we can just accept new behavior that if TransactionIdDidCommit()
    > fails, we might lose more notifications than the one that caused the error.
    >
    I think that we may have an error on TransactionIdDidCommit() and on
    NotifyMyFrontEnd() right?
    
    In case of an error on TransactionIdDidCommit() I think that we will
    have the same behavior as we advance the "current" position before of
    DidCommit call the PG_FINALLY block will set the backend position past
    the failing notification entry.
    
    I think that the problem is on NotifyMyFrontEnd() call that if it fails
    we will set the current backend position to the head of the queue
    or the beginning of the next page possibly losing notifications.
    
    How bad would be to store the notification entries within a list and
    store the next position with the notification entry and then wrap the
    NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
    saved "next position" on PG_CATCH? Something like this:
    do
    {
        ListEntry le;
        qe = (AsyncQueueEntry *) (page_buffer + QUEUE_POS_OFFSET(thisentry));
        reachedEndOfPage = asyncQueueAdvance(current, qe->length);
        ...
        else if (TransactionIdDidCommit(qe->xid))
        {
            le->entry = qe;
            le->nextPos = current;
        }
    } while (!reachedEndOfPage);
    
    foreach(lc, notificationEntries)
    {
        ListEntry *le = lfirst(lc);
        AsyncQueueEntry *qe = le->entry;
        char           *channel = qe->data;
    
        PG_TRY()
        {
            if (IsListeningOn(channel))
            {
                char           *payload = qe->data + strlen(channel) + 1;
                NotifyMyFrontEnd(channel, payload, qe->srcPid);
            }
        }
        PG_CATCH()
        {
            current = le->nextPos;
            PG_RE_THROW();
        }
    }
    
    My concern with this idea is about the overhead of using a list to store
    the notification entries and also the PG_TRY block but perhaps it's
    worth it to prevent losing notifications?
    
    --
    Matheus Alcantara
    EDB: http://www.enterprisedb.com
    
    
    
    
  86. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-05T12:59:15Z

    On 05/11/2025 14:40, Matheus Alcantara wrote:
    > On Wed Nov 5, 2025 at 6:21 AM -03, Heikki Linnakangas wrote:
    >> On 04/11/2025 16:40, Arseniy Mukhin wrote:
    >>> If we want to preserve the previous behavior, maybe we could use a new
    >>> local position while copying notifications and only advance the
    >>> "current" position while sending notifications to the frontend?
    >>
    >> That's not good. The loop advances 'current' before calling
    >> TransactionIdDidCommit() to ensure that if there's a broken entry in the
    >> queue for which TransactionIdDidCommit() throws an error, we advance
    >> 'current' past that point. Otherwise we would get back later to try to
    >> process the same broken entry again and again.
    >>
    >> We could put the NotifyMyFrontEnd() calls in a PG_FINALLY block, so that
    >> the copied notifications get sent even on error. But I'm a reluctant to
    >> put that in PG_FINALLY, it's not clear to me if NotifyMyFrontEnd() is
    >> safe to call during error processing. And if the client is not
    >> processing the notifications, the abort processing could be delayed for
    >> a long time.
    >>
    >> One idea is to close the client connection on a TransactionIdDidCommit
    >> error, i.e. make the error FATAL. The current behavior where we skip the
    >> notification seems problematic already. Closing the connection would be
    >> a clear signal that some notifications have been lost.
    >>
    >> Or we can just accept new behavior that if TransactionIdDidCommit()
    >> fails, we might lose more notifications than the one that caused the error.
    >>
    > I think that we may have an error on TransactionIdDidCommit() and on
    > NotifyMyFrontEnd() right?
    
    True.
    
    > In case of an error on TransactionIdDidCommit() I think that we will
    > have the same behavior as we advance the "current" position before of
    > DidCommit call the PG_FINALLY block will set the backend position past
    > the failing notification entry.
    
    With my patch, if TransactionIdDidCommit() fails, we will lose all the 
    notifications that we have buffered in the local buffer but haven't 
    passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single 
    notification, the one where TransactionIdDidCommit() failed.
    
    > I think that the problem is on NotifyMyFrontEnd() call that if it fails
    > we will set the current backend position to the head of the queue
    > or the beginning of the next page possibly losing notifications.
    
    True, a failure on NotifyMyFrontEnd can happen too, and it will also 
    lead to losing all the other notifications in the local buffer.
    
    > How bad would be to store the notification entries within a list and
    > store the next position with the notification entry and then wrap the
    > NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
    > saved "next position" on PG_CATCH? Something like this:
    > do
    > {
    >      ListEntry le;
    >      qe = (AsyncQueueEntry *) (page_buffer + QUEUE_POS_OFFSET(thisentry));
    >      reachedEndOfPage = asyncQueueAdvance(current, qe->length);
    >      ...
    >      else if (TransactionIdDidCommit(qe->xid))
    >      {
    >          le->entry = qe;
    >          le->nextPos = current;
    >      }
    > } while (!reachedEndOfPage);
    > 
    > foreach(lc, notificationEntries)
    > {
    >      ListEntry *le = lfirst(lc);
    >      AsyncQueueEntry *qe = le->entry;
    >      char           *channel = qe->data;
    > 
    >      PG_TRY()
    >      {
    >          if (IsListeningOn(channel))
    >          {
    >              char           *payload = qe->data + strlen(channel) + 1;
    >              NotifyMyFrontEnd(channel, payload, qe->srcPid);
    >          }
    >      }
    >      PG_CATCH()
    >      {
    >          current = le->nextPos;
    >          PG_RE_THROW();
    >      }
    > }
    
    That addresses the failure on NotifyMyFrontEnd, but not on 
    TransactionIdDidCommit.
    
    IMHO we should just make these errors FATAL. TransactionIdDidCommit() 
    should not really fail, after fixing the bug we're discussing. 
    NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on 
    an otherwise idle connection. Or it could fail if the client connection 
    is lost, but then the backend is about to die anyway. And arguably 
    closing the connection is better than losing even a single notification, 
    anyway.
    
    - Heikki
    
    
    
    
    
  87. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-05T18:17:19Z

    On Wed, Nov 5, 2025 at 12:21 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    >
    > On 04/11/2025 16:40, Arseniy Mukhin wrote:
    > > It seems that 0002 handles errors during NotifyMyFrontEnd a little differently.
    > >
    > > With master, in case of a failure during NotifyMyFrontEnd, the
    > > listener's position in PG_FINALLY is set to the beginning of the next
    > > notification, since we advance the "current" position only if the
    > > previous notification was successfully sent.
    > >
    > > With 0002, we advance the "current" position while copying
    > > notifications to the local buffer, and begin sending them after the
    > > position has already been advanced for all copied notifications. So in
    > > case of a failure, the listener's position in PG_FINALLY is set to the
    > > beginning of the next page or queue head. This means we can lose
    > > notifications that were copied but were not sent.
    >
    > True.
    >
    > > If we want to preserve the previous behavior, maybe we could use a new
    > > local position while copying notifications and only advance the
    > > "current" position while sending notifications to the frontend?
    >
    > That's not good. The loop advances 'current' before calling
    > TransactionIdDidCommit() to ensure that if there's a broken entry in the
    > queue for which TransactionIdDidCommit() throws an error, we advance
    > 'current' past that point. Otherwise we would get back later to try to
    > process the same broken entry again and again.
    >
    
    Ouch, I failed to realise that this try/catch saves us from
    TransactionIdDidCommit failure too.
    
    The comment around PG_TRY says:
    
        /*
         * It is possible that we fail while trying to send a message to our
         * frontend (for example, because of encoding conversion failure).  If
    
    So I thought that it's the main reason we have try/catch here.
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  88. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-05T19:02:52Z

    On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
    >> In case of an error on TransactionIdDidCommit() I think that we will
    >> have the same behavior as we advance the "current" position before of
    >> DidCommit call the PG_FINALLY block will set the backend position past
    >> the failing notification entry.
    >
    > With my patch, if TransactionIdDidCommit() fails, we will lose all the 
    > notifications that we have buffered in the local buffer but haven't 
    > passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single 
    > notification, the one where TransactionIdDidCommit() failed.
    >
    Yeah, that's true.
    
    >> How bad would be to store the notification entries within a list and
    >> store the next position with the notification entry and then wrap the
    >> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
    >> saved "next position" on PG_CATCH? Something like this:
    >
    > [ ...]
    >
    > That addresses the failure on NotifyMyFrontEnd, but not on 
    > TransactionIdDidCommit.
    >
    > IMHO we should just make these errors FATAL. TransactionIdDidCommit() 
    > should not really fail, after fixing the bug we're discussing. 
    > NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on 
    > an otherwise idle connection. Or it could fail if the client connection 
    > is lost, but then the backend is about to die anyway. And arguably 
    > closing the connection is better than losing even a single notification, 
    > anyway.
    >
    My only concern with making these errors FATAL is that if a notification
    entry causes a different, recoverable error, all subsequent messages
    will be lost. This is because if backend die and the user open a new
    connection and execute LISTEN on the channel it will not see these
    notifications past the one that caused the error. I'm not sure if we are
    completely safe from this case of a recoverable error, what do you
    think?
    
    -- 
    Matheus Alcantara
    EDB: http://www.enterprisedb.com
    
    
    
    
    
  89. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-06T10:05:20Z

    On 05/11/2025 21:02, Matheus Alcantara wrote:
    > On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
    >>> In case of an error on TransactionIdDidCommit() I think that we will
    >>> have the same behavior as we advance the "current" position before of
    >>> DidCommit call the PG_FINALLY block will set the backend position past
    >>> the failing notification entry.
    >>
    >> With my patch, if TransactionIdDidCommit() fails, we will lose all the
    >> notifications that we have buffered in the local buffer but haven't
    >> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
    >> notification, the one where TransactionIdDidCommit() failed.
    >>
    > Yeah, that's true.
    > 
    >>> How bad would be to store the notification entries within a list and
    >>> store the next position with the notification entry and then wrap the
    >>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
    >>> saved "next position" on PG_CATCH? Something like this:
    >>
    >> [ ...]
    >>
    >> That addresses the failure on NotifyMyFrontEnd, but not on
    >> TransactionIdDidCommit.
    >>
    >> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
    >> should not really fail, after fixing the bug we're discussing.
    >> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
    >> an otherwise idle connection. Or it could fail if the client connection
    >> is lost, but then the backend is about to die anyway. And arguably
    >> closing the connection is better than losing even a single notification,
    >> anyway.
    >>
    > My only concern with making these errors FATAL is that if a notification
    > entry causes a different, recoverable error, all subsequent messages
    > will be lost. This is because if backend die and the user open a new
    > connection and execute LISTEN on the channel it will not see these
    > notifications past the one that caused the error. I'm not sure if we are
    > completely safe from this case of a recoverable error, what do you
    > think?
    
    I did some more testing of the current behavior, using the encoding 
    conversion to cause an error:
    
    In backend A:
    
    SET client_encoding='latin1';
    LISTEN foo;
    
    Backend b:
    
    NOTIFY foo, 'ハ';
    
    Observations:
    
    - If the connection is idle when the notification is received, the ERROR 
    is turned into FATAL anyway:
    
    postgres=# SET client_encoding='latin1';
    SET
    postgres=# LISTEN foo;
    LISTEN
    postgres=# select 1; -- do the NOTIFY in another connection before this
    ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8" 
    has no equivalent in encoding "LATIN1"
    FATAL:  terminating connection because protocol synchronization was lost
    server closed the connection unexpectedly
    	This probably means the server terminated abnormally
    	before or while processing the request.
    
    - If there are multiple notifications pending, we stop processing the 
    subsequent notifications on the first error. The subsequent 
    notifications will only be processed when another notify interrupt is 
    received, i.e. when a backend sends yet another notification.
    
    I'm getting more and more convinced that escalating all ERRORs to FATALs 
    during notify processing is the right way to go. Attached is a new patch 
    set that does that.
    
    - Heikki
    
  90. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-11-06T10:18:13Z

    On 2025-Nov-05, Matheus Alcantara wrote:
    
    > My only concern with making these errors FATAL is that if a notification
    > entry causes a different, recoverable error, all subsequent messages
    > will be lost. This is because if backend die and the user open a new
    > connection and execute LISTEN on the channel it will not see these
    > notifications past the one that caused the error.
    
    I don't think things are supposed to work that way -- I understand that
    an application that connects is supposed to read the current state of
    things, and then the notifies ensure that any changes from that point on
    can be processed.  So if a connection dies on a FATAL, then you have to
    establish your LISTENs again and obtain the current state of the world
    at that point.  It doesn't miss anything, because any NOTIFYies that
    occurred while the connection was down should be obtained when current
    state is read.
    
    -- 
    Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
    "This is what I like so much about PostgreSQL.  Most of the surprises
    are of the "oh wow!  That's cool" Not the "oh shit!" kind.  :)"
    Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
    
    
    
    
  91. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-11-06T10:24:50Z

    Hello,
    
    I've been studying the code for freezing of items at vacuum-truncate
    time, and one thing that jumps at me is that perhaps we ought to be more
    proactive in freezing items immediately as they are processed.  We can
    do that for any transactions that precede RecentXmin, because by that
    point, every snapshot in the system should have that committed
    transaction as no longer in progress anyway. Something like
    
    diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
    index 8ac7d989641..88d5ed2b461 100644
    --- a/src/backend/commands/async.c
    +++ b/src/backend/commands/async.c
    @@ -2038,6 +2038,11 @@ asyncQueueProcessPageEntries(volatile QueuePosition *current,
     			}
     			else if (TransactionIdDidCommit(qe->xid))
     			{
    +				if (TransactionIdPrecedes(RecentXmin, qe->xid))
    +				{
    +					elog(WARNING, "freezing an entry for transaction %u", qe->xid);
    +					qe->xid = FrozenTransactionId;
    +				}
     				memcpy(local_buf_end, qe, qe->length);
     				local_buf_end += qe->length;
     			}
    
    If we do this, then the chances that we need to freeze items at vacuum
    time are much lower, and we're not creating any danger that
    TransactionIdDidCommit() errors any more than we have in the current
    code.
    
    Is there any reason this wouldn't work?
    
    
    I noticed that async-notify-test-3 doesn't actually freeze any items by
    itself ... you need to do "ALTER TABLE template0 allow_connections" and
    then do vacuumdb -a in a loop in order for this to happen (or something
    similar, I guess).  Otherwise the new code in AsyncNotifyFreezeXids is
    never executed.
    
    -- 
    Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
    "Those who use electric razors are infidels destined to burn in hell while
    we drink from rivers of beer, download free vids and mingle with naked
    well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)
    
    
    
    
  92. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-06T11:20:37Z

    On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    >
    > On 05/11/2025 21:02, Matheus Alcantara wrote:
    > > On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
    > >>> In case of an error on TransactionIdDidCommit() I think that we will
    > >>> have the same behavior as we advance the "current" position before of
    > >>> DidCommit call the PG_FINALLY block will set the backend position past
    > >>> the failing notification entry.
    > >>
    > >> With my patch, if TransactionIdDidCommit() fails, we will lose all the
    > >> notifications that we have buffered in the local buffer but haven't
    > >> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
    > >> notification, the one where TransactionIdDidCommit() failed.
    > >>
    > > Yeah, that's true.
    > >
    > >>> How bad would be to store the notification entries within a list and
    > >>> store the next position with the notification entry and then wrap the
    > >>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
    > >>> saved "next position" on PG_CATCH? Something like this:
    > >>
    > >> [ ...]
    > >>
    > >> That addresses the failure on NotifyMyFrontEnd, but not on
    > >> TransactionIdDidCommit.
    > >>
    > >> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
    > >> should not really fail, after fixing the bug we're discussing.
    > >> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
    > >> an otherwise idle connection. Or it could fail if the client connection
    > >> is lost, but then the backend is about to die anyway. And arguably
    > >> closing the connection is better than losing even a single notification,
    > >> anyway.
    > >>
    > > My only concern with making these errors FATAL is that if a notification
    > > entry causes a different, recoverable error, all subsequent messages
    > > will be lost. This is because if backend die and the user open a new
    > > connection and execute LISTEN on the channel it will not see these
    > > notifications past the one that caused the error. I'm not sure if we are
    > > completely safe from this case of a recoverable error, what do you
    > > think?
    >
    > I did some more testing of the current behavior, using the encoding
    > conversion to cause an error:
    >
    > In backend A:
    >
    > SET client_encoding='latin1';
    > LISTEN foo;
    >
    > Backend b:
    >
    > NOTIFY foo, 'ハ';
    >
    > Observations:
    >
    > - If the connection is idle when the notification is received, the ERROR
    > is turned into FATAL anyway:
    >
    > postgres=# SET client_encoding='latin1';
    > SET
    > postgres=# LISTEN foo;
    > LISTEN
    > postgres=# select 1; -- do the NOTIFY in another connection before this
    > ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
    > has no equivalent in encoding "LATIN1"
    > FATAL:  terminating connection because protocol synchronization was lost
    > server closed the connection unexpectedly
    >         This probably means the server terminated abnormally
    >         before or while processing the request.
    >
    > - If there are multiple notifications pending, we stop processing the
    > subsequent notifications on the first error. The subsequent
    > notifications will only be processed when another notify interrupt is
    > received, i.e. when a backend sends yet another notification.
    >
    > I'm getting more and more convinced that escalating all ERRORs to FATALs
    > during notify processing is the right way to go. Attached is a new patch
    > set that does that.
    >
    
    One point about removing try/catch: when we execute LISTEN for the
    very first time, we read the queue from the min(listener's pos) up to
    the head. listenChannels is empty at the moment, so we never call
    NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have
    some problematic entry in the queue that we need to process during
    initial LISTEN, it could block creation of new listeners. Is it
    something we need to worry about?
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  93. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-06T11:35:28Z

    Thanks for reviewing!
    
    On 06/11/2025 12:24, Álvaro Herrera wrote:
    > I've been studying the code for freezing of items at vacuum-truncate
    > time, and one thing that jumps at me is that perhaps we ought to be more
    > proactive in freezing items immediately as they are processed.  We can
    > do that for any transactions that precede RecentXmin, because by that
    > point, every snapshot in the system should have that committed
    > transaction as no longer in progress anyway. Something like
    > 
    > diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
    > index 8ac7d989641..88d5ed2b461 100644
    > --- a/src/backend/commands/async.c
    > +++ b/src/backend/commands/async.c
    > @@ -2038,6 +2038,11 @@ asyncQueueProcessPageEntries(volatile QueuePosition *current,
    >   			}
    >   			else if (TransactionIdDidCommit(qe->xid))
    >   			{
    > +				if (TransactionIdPrecedes(RecentXmin, qe->xid))
    > +				{
    > +					elog(WARNING, "freezing an entry for transaction %u", qe->xid);
    > +					qe->xid = FrozenTransactionId;
    > +				}
    >   				memcpy(local_buf_end, qe, qe->length);
    >   				local_buf_end += qe->length;
    >   			}
    > 
    > If we do this, then the chances that we need to freeze items at vacuum
    > time are much lower, and we're not creating any danger that
    > TransactionIdDidCommit() errors any more than we have in the current
    > code.
    > 
    > Is there any reason this wouldn't work?
    
    We could do that and it would be good for performance anyway. This is 
    the "hint bit" idea that's been discussed in this thread. I think it 
    would be better to do it with additional hint bits rather than by 
    overwriting 'xid'. Having a separate COMMIITTED/ABORTED and FROZEN bits 
    would allow setting the COMMITTED hint bit even when the xid is newer 
    than RecentXmin.
    
    Would need to mark the SLRU page dirty if we do that. Not sure if that 
    requires holding an exclusive lock. And we should also replace aborted 
    xids with InvalidTransactionId.
    
    I feel that that should be a separate patch, and not backpatched. We'd 
    still need all the other changes from this patch series even if we did 
    that, it would be just an optimization.
    
    > I noticed that async-notify-test-3 doesn't actually freeze any items by
    > itself ... you need to do "ALTER TABLE template0 allow_connections" and
    > then do vacuumdb -a in a loop in order for this to happen (or something
    > similar, I guess).  Otherwise the new code in AsyncNotifyFreezeXids is
    > never executed.
    
    Right, async-notify-test-3 was just a performance test to verify that 
    the holding the SLRU lock longer doesn't degrade performance. I haven't 
    tried performance testing CLOG truncation itself, but it happens so 
    rarely that I'm not too worried about it. See 
    src/test/modules/xid_wraparound/t/004_notify_freeze.pl for a correctness 
    test.
    
    - Heikki
    
    
    
    
    
  94. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-06T12:05:59Z

    On 06/11/2025 13:20, Arseniy Mukhin wrote:
    > On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    >>
    >> On 05/11/2025 21:02, Matheus Alcantara wrote:
    >>> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
    >>>>> In case of an error on TransactionIdDidCommit() I think that we will
    >>>>> have the same behavior as we advance the "current" position before of
    >>>>> DidCommit call the PG_FINALLY block will set the backend position past
    >>>>> the failing notification entry.
    >>>>
    >>>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
    >>>> notifications that we have buffered in the local buffer but haven't
    >>>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
    >>>> notification, the one where TransactionIdDidCommit() failed.
    >>>>
    >>> Yeah, that's true.
    >>>
    >>>>> How bad would be to store the notification entries within a list and
    >>>>> store the next position with the notification entry and then wrap the
    >>>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
    >>>>> saved "next position" on PG_CATCH? Something like this:
    >>>>
    >>>> [ ...]
    >>>>
    >>>> That addresses the failure on NotifyMyFrontEnd, but not on
    >>>> TransactionIdDidCommit.
    >>>>
    >>>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
    >>>> should not really fail, after fixing the bug we're discussing.
    >>>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
    >>>> an otherwise idle connection. Or it could fail if the client connection
    >>>> is lost, but then the backend is about to die anyway. And arguably
    >>>> closing the connection is better than losing even a single notification,
    >>>> anyway.
    >>>>
    >>> My only concern with making these errors FATAL is that if a notification
    >>> entry causes a different, recoverable error, all subsequent messages
    >>> will be lost. This is because if backend die and the user open a new
    >>> connection and execute LISTEN on the channel it will not see these
    >>> notifications past the one that caused the error. I'm not sure if we are
    >>> completely safe from this case of a recoverable error, what do you
    >>> think?
    >>
    >> I did some more testing of the current behavior, using the encoding
    >> conversion to cause an error:
    >>
    >> In backend A:
    >>
    >> SET client_encoding='latin1';
    >> LISTEN foo;
    >>
    >> Backend b:
    >>
    >> NOTIFY foo, 'ハ';
    >>
    >> Observations:
    >>
    >> - If the connection is idle when the notification is received, the ERROR
    >> is turned into FATAL anyway:
    >>
    >> postgres=# SET client_encoding='latin1';
    >> SET
    >> postgres=# LISTEN foo;
    >> LISTEN
    >> postgres=# select 1; -- do the NOTIFY in another connection before this
    >> ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
    >> has no equivalent in encoding "LATIN1"
    >> FATAL:  terminating connection because protocol synchronization was lost
    >> server closed the connection unexpectedly
    >>          This probably means the server terminated abnormally
    >>          before or while processing the request.
    >>
    >> - If there are multiple notifications pending, we stop processing the
    >> subsequent notifications on the first error. The subsequent
    >> notifications will only be processed when another notify interrupt is
    >> received, i.e. when a backend sends yet another notification.
    >>
    >> I'm getting more and more convinced that escalating all ERRORs to FATALs
    >> during notify processing is the right way to go. Attached is a new patch
    >> set that does that.
    > 
    > One point about removing try/catch: when we execute LISTEN for the
    > very first time, we read the queue from the min(listener's pos) up to
    > the head. listenChannels is empty at the moment, so we never call
    > NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have
    > some problematic entry in the queue that we need to process during
    > initial LISTEN, it could block creation of new listeners. Is it
    > something we need to worry about?
    
    Hmm, I don't see how it could block the creation of new listeners. 
    Exec_ListenPreCommit() starts from "max(listener's pos)" over existing 
    listeners, not min(). Am I missing something?
    
    That said, Exec_ListenPreCommit() could start from 'min' among all 
    listeners, if there are no listeners for the same backend. Per attached 
    patch. I don't think it makes much difference though.
    
    Another small change we could make is to check for listenChannels == NIL 
    before calling TransactionIdDidCommit.
    
    - Heikki
  95. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-06T15:13:43Z

    On Thu, Nov 6, 2025 at 3:06 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    >
    > On 06/11/2025 13:20, Arseniy Mukhin wrote:
    > > On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    > >>
    > >> On 05/11/2025 21:02, Matheus Alcantara wrote:
    > >>> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
    > >>>>> In case of an error on TransactionIdDidCommit() I think that we will
    > >>>>> have the same behavior as we advance the "current" position before of
    > >>>>> DidCommit call the PG_FINALLY block will set the backend position past
    > >>>>> the failing notification entry.
    > >>>>
    > >>>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
    > >>>> notifications that we have buffered in the local buffer but haven't
    > >>>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
    > >>>> notification, the one where TransactionIdDidCommit() failed.
    > >>>>
    > >>> Yeah, that's true.
    > >>>
    > >>>>> How bad would be to store the notification entries within a list and
    > >>>>> store the next position with the notification entry and then wrap the
    > >>>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
    > >>>>> saved "next position" on PG_CATCH? Something like this:
    > >>>>
    > >>>> [ ...]
    > >>>>
    > >>>> That addresses the failure on NotifyMyFrontEnd, but not on
    > >>>> TransactionIdDidCommit.
    > >>>>
    > >>>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
    > >>>> should not really fail, after fixing the bug we're discussing.
    > >>>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
    > >>>> an otherwise idle connection. Or it could fail if the client connection
    > >>>> is lost, but then the backend is about to die anyway. And arguably
    > >>>> closing the connection is better than losing even a single notification,
    > >>>> anyway.
    > >>>>
    > >>> My only concern with making these errors FATAL is that if a notification
    > >>> entry causes a different, recoverable error, all subsequent messages
    > >>> will be lost. This is because if backend die and the user open a new
    > >>> connection and execute LISTEN on the channel it will not see these
    > >>> notifications past the one that caused the error. I'm not sure if we are
    > >>> completely safe from this case of a recoverable error, what do you
    > >>> think?
    > >>
    > >> I did some more testing of the current behavior, using the encoding
    > >> conversion to cause an error:
    > >>
    > >> In backend A:
    > >>
    > >> SET client_encoding='latin1';
    > >> LISTEN foo;
    > >>
    > >> Backend b:
    > >>
    > >> NOTIFY foo, 'ハ';
    > >>
    > >> Observations:
    > >>
    > >> - If the connection is idle when the notification is received, the ERROR
    > >> is turned into FATAL anyway:
    > >>
    > >> postgres=# SET client_encoding='latin1';
    > >> SET
    > >> postgres=# LISTEN foo;
    > >> LISTEN
    > >> postgres=# select 1; -- do the NOTIFY in another connection before this
    > >> ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
    > >> has no equivalent in encoding "LATIN1"
    > >> FATAL:  terminating connection because protocol synchronization was lost
    > >> server closed the connection unexpectedly
    > >>          This probably means the server terminated abnormally
    > >>          before or while processing the request.
    > >>
    > >> - If there are multiple notifications pending, we stop processing the
    > >> subsequent notifications on the first error. The subsequent
    > >> notifications will only be processed when another notify interrupt is
    > >> received, i.e. when a backend sends yet another notification.
    > >>
    > >> I'm getting more and more convinced that escalating all ERRORs to FATALs
    > >> during notify processing is the right way to go. Attached is a new patch
    > >> set that does that.
    > >
    > > One point about removing try/catch: when we execute LISTEN for the
    > > very first time, we read the queue from the min(listener's pos) up to
    > > the head. listenChannels is empty at the moment, so we never call
    > > NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have
    > > some problematic entry in the queue that we need to process during
    > > initial LISTEN, it could block creation of new listeners. Is it
    > > something we need to worry about?
    >
    > Hmm, I don't see how it could block the creation of new listeners.
    > Exec_ListenPreCommit() starts from "max(listener's pos)" over existing
    > listeners, not min(). Am I missing something?
    >
    
    My bad, you are right, we use max(listener's pos). But it doesn't
    resolve the issue if I'm not missing something.
    
    Let's say we have the queue:
    
    (tail ... pos1 ... bad_entry_pos ... head)
    
    bad_entry_pos - position of the entry where TransactionIdDidCommit fails.
    
    We have the listener L1 with pos = pos1. It means every new listener
    should process the queue from pos1 (as it's max(listener's pos)) up to
    the queue head and when they try to do it, they will fail on
    'bad_entry'.
    
    > That said, Exec_ListenPreCommit() could start from 'min' among all
    > listeners, if there are no listeners for the same backend. Per attached
    > patch. I don't think it makes much difference though.
    >
    
    Yes, if there are no listeners, every new listener will start the
    initial reading from the QUEUE_TAIL and will fail on the bad_entry
    too.
    
    But this example is valid only if we think we can fail on
    TransactionIdDidCommit (even after the fix).
    
    > Another small change we could make is to check for listenChannels == NIL
    > before calling TransactionIdDidCommit.
    
    There is a comment that describes why we need this initial reading in
    Exec_ListenPreCommit:
    
        /*
         * This is our first LISTEN, so establish our pointer.
         *
         * We set our pointer to the global tail pointer and then move it forward
         * over already-committed notifications.  This ensures we cannot miss any
         * not-yet-committed notifications.  We might get a few more but that
         * doesn't hurt.
    
    It seems that with such a check we will skip not-yet-committed
    notifications and always get to the HEAD. If we decide that it's ok,
    maybe we can just set 'pos' of every new listener to the HEAD without
    reading?
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  96. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-07T08:10:25Z

    On 06/11/2025 17:13, Arseniy Mukhin wrote:
    > Let's say we have the queue:
    > 
    > (tail ... pos1 ... bad_entry_pos ... head)
    > 
    > bad_entry_pos - position of the entry where TransactionIdDidCommit fails.
    > 
    > We have the listener L1 with pos = pos1. It means every new listener
    > should process the queue from pos1 (as it's max(listener's pos)) up to
    > the queue head and when they try to do it, they will fail on
    > 'bad_entry'.
    
    Gotcha.
    
    >> Another small change we could make is to check for listenChannels == NIL
    >> before calling TransactionIdDidCommit.
    > 
    > There is a comment that describes why we need this initial reading in
    > Exec_ListenPreCommit:
    > 
    >      /*
    >       * This is our first LISTEN, so establish our pointer.
    >       *
    >       * We set our pointer to the global tail pointer and then move it forward
    >       * over already-committed notifications.  This ensures we cannot miss any
    >       * not-yet-committed notifications.  We might get a few more but that
    >       * doesn't hurt.
    > 
    > It seems that with such a check we will skip not-yet-committed
    > notifications and always get to the HEAD. If we decide that it's ok,
    > maybe we can just set 'pos' of every new listener to the HEAD without
    > reading?
    
    Right, I didn't mean skipping asyncQueueReadAllNotifications() 
    altogether. Just the TransactionIdDidCommit() calls in it, when 
    listenChannels == NIL, see attached patch. The idea is that when 
    'listenChannels == NIL', we're not going send the notification to the 
    frontend regardless of what TransactionIdDidCommit says. If we don't 
    call TransactionIdDidCommit, we won't fail on bad entries.
    
    I'm not sure how much this matters. We really shouldn't have bad entries 
    in the queue to begin with. And if we do, a broken entry could cause all 
    kinds of trouble. For example, if the broken entry's (bogus) XID is 
    higher than current XID, XidInMVCCSnapshot() will return true and we'll 
    get stuck on that.
    
    - Heikki
    
  97. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-07T09:41:48Z

    On Fri, Nov 7, 2025 at 11:10 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    >
    > On 06/11/2025 17:13, Arseniy Mukhin wrote:
    > > Let's say we have the queue:
    > >
    > > (tail ... pos1 ... bad_entry_pos ... head)
    > >
    > > bad_entry_pos - position of the entry where TransactionIdDidCommit fails.
    > >
    > > We have the listener L1 with pos = pos1. It means every new listener
    > > should process the queue from pos1 (as it's max(listener's pos)) up to
    > > the queue head and when they try to do it, they will fail on
    > > 'bad_entry'.
    >
    > Gotcha.
    >
    > >> Another small change we could make is to check for listenChannels == NIL
    > >> before calling TransactionIdDidCommit.
    > >
    > > There is a comment that describes why we need this initial reading in
    > > Exec_ListenPreCommit:
    > >
    > >      /*
    > >       * This is our first LISTEN, so establish our pointer.
    > >       *
    > >       * We set our pointer to the global tail pointer and then move it forward
    > >       * over already-committed notifications.  This ensures we cannot miss any
    > >       * not-yet-committed notifications.  We might get a few more but that
    > >       * doesn't hurt.
    > >
    > > It seems that with such a check we will skip not-yet-committed
    > > notifications and always get to the HEAD. If we decide that it's ok,
    > > maybe we can just set 'pos' of every new listener to the HEAD without
    > > reading?
    >
    > Right, I didn't mean skipping asyncQueueReadAllNotifications()
    > altogether. Just the TransactionIdDidCommit() calls in it, when
    > listenChannels == NIL, see attached patch. The idea is that when
    > 'listenChannels == NIL', we're not going send the notification to the
    > frontend regardless of what TransactionIdDidCommit says. If we don't
    > call TransactionIdDidCommit, we won't fail on bad entries.
    >
    
    Ahh, now I see, thank you. WFM.
    
    
    Best regards,
    Arseniy Mukhin
    
    
    
    
  98. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Álvaro Herrera <alvherre@kurilemu.de> — 2025-11-07T14:14:31Z

    One thing I noticed while testing this is that asyncQueueAddEntries()
    fills the end of a page with a dummy entry, when the next notify doesn't
    fit.  However, this dummy entry contains a very valid TransactionId,
    which the new freezing code will try to look up and freeze.  I think
    this is somewhat bogus -- we shouldn't even try to look up that XID in
    the first place.  I propose to clear it like this
    
    @@ -1419,6 +1424,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
     			 */
     			qe.length = QUEUE_PAGESIZE - offset;
     			qe.dboid = InvalidOid;
    +			qe.xid = InvalidTransactionId;
     			qe.data[0] = '\0';	/* empty channel */
     			qe.data[1] = '\0';	/* empty payload */
     		}
    
    
    (Line numbers do not match, because I have other local changes.)
    
    
    -- 
    Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
    "I'm always right, but sometimes I'm more right than other times."
                                                      (Linus Torvalds)
    https://lore.kernel.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/
    
    
    
    
  99. Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-12T19:39:47Z

    On 07/11/2025 16:14, Álvaro Herrera wrote:
    > One thing I noticed while testing this is that asyncQueueAddEntries()
    > fills the end of a page with a dummy entry, when the next notify doesn't
    > fit.  However, this dummy entry contains a very valid TransactionId,
    > which the new freezing code will try to look up and freeze.  I think
    > this is somewhat bogus -- we shouldn't even try to look up that XID in
    > the first place.  I propose to clear it like this
    > 
    > @@ -1419,6 +1424,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
    >   			 */
    >   			qe.length = QUEUE_PAGESIZE - offset;
    >   			qe.dboid = InvalidOid;
    > +			qe.xid = InvalidTransactionId;
    >   			qe.data[0] = '\0';	/* empty channel */
    >   			qe.data[1] = '\0';	/* empty payload */
    >   		}
    > 
    > 
    > (Line numbers do not match, because I have other local changes.)
    
    Committed. I committed the above separately, because I forgot to include 
    it in the main commit. Oops.
    
    Just to summarize what was committed, out of all the different variants 
    discussed:
    
    * Any ERROR while processing an async notification is now turned into FATAL
    * Vacuum scans the async notification queue and freezes xids before 
    truncating CLOG
    * The TransactionIdDidCommit() calls are now made while holding the SLRU 
    lock. NotifyMyFrontEnd() calls are still made after releasing the lock
    * listenChannels == NIL special case is checked before 
    TransactionIdDidCommit(). This avoids the problem that no backend can 
    LISTEN to anything anymore, if there's one broken entry in the queue for 
    some reason
    * 'xid' field on dummy entries is now set to InvalidTransactionId so 
    that they don't need to be frozen
    
    Thanks everyone!
    
    - Heikki