Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Clear 'xid' in dummy async notify entries written to fill up pages
- 84f1bf4afa5e 14.21 landed
- 21a9014cf00a 15.16 landed
- 0e8eaa2181d4 16.12 landed
- d80d5f099502 17.8 landed
- 82fa6b78dba1 18.2 landed
- 0bdc777e8007 19 (unreleased) landed
Fix remaining race condition with CLOG truncation and LISTEN/NOTIFY
- c2e58c0711fe 14.21 landed
- 0c862646cf2a 15.16 landed
- 44e8c60be66c 16.12 landed
- c2682810ab7d 17.8 landed
- 7b069a1876e4 18.2 landed
- 797e9ea6e54b 19 (unreleased) landed
Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY
- eba917d360e7 14.21 landed
- 1a469d7b5b7d 15.16 landed
- 053e1868b7ee 16.12 landed
- d02c03ddc5e3 17.8 landed
- 321ec54625fd 18.2 landed
- 8eeb4a0f7c06 19 (unreleased) landed
Escalate ERRORs during async notify processing to FATAL
- 7cb05dd2d198 14.21 landed
- b1da37de21d4 15.16 landed
- c1a5bde003b8 16.12 landed
- b821c92920f0 17.8 landed
- aab4a84bb070 18.2 landed
- 1b4699090eaf 19 (unreleased) landed
Limit the size of TID lists during parallel GIN build
- c98dffcb7c70 19 (unreleased) cited

LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Alexandra Wang <alexandra.wang.oss@gmail.com> — 2025-08-06T04:50:01Z

Hi,

I'm bringing up a bug that was reported multiple times [1][2][3] in
the bugs list here, for a broader audience.

The issue is that an ERROR like the one below occurs when trying to
register any listener in the database.

test=# listen c21;
ERROR:  58P01: could not access status of transaction 14279685
DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:1087

In [1], Andrei Varashen provided detailed reproduction steps. I’m
copying and pasting his example from thread [1] here, with slight
simplification.

Pre-conditions:
- Disable autovacuum to avoid its intervention.

Steps to reproduce:

1. Create a test table and notify (but not listen) to a channel
(backend 1):

create table test (id int);
insert into test values (1);
notify c1;

2. List pg_xact files so we know its starting state:

➜  ls -lah ~/pg-devel/data/pg_xact
total 16
drwx------@  3 alex.wang  staff    96B Aug  5 20:10 .
drwx------@ 26 alex.wang  staff   832B Aug  5 20:12 ..
-rw-------@  1 alex.wang  staff   8.0K Aug  5 20:12 0000

3. Prepare a "test.sql" file for pgbench, and then run pgbench to
generate lots of transactions, so that pg_xact/0000 is completely
filled and leads to pg_xact/0001.

> cat test.sql
UPDATE test SET id = 1;

> pgbench -n -c 80 -j 10 -t 15000 -f ~/workspace/test.sql postgres

4. Verify that pg_xact/0001 is created:

➜  ls -lah ~/pg-devel/data/pg_xact
total 560
drwx------@  4 alex.wang  staff   128B Aug  5 20:25 .
drwx------@ 26 alex.wang  staff   832B Aug  5 20:12 ..
-rw-------@  1 alex.wang  staff   256K Aug  5 20:25 0000
-rw-------@  1 alex.wang  staff    16K Aug  5 20:25 0001

5. Execute VACUUM FREEZE on every database on the server to freeze
rows and purge pg_xact/0000.

postgres=# VACUUM FREEZE;
VACUUM

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# VACUUM FREEZE;
VACUUM

template1=# ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
ALTER DATABASE
template1=# \c template0
You are now connected to database "template0" as user "postgres".
template0=# VACUUM FREEZE;
VACUUM
postgres=# select datname, datfrozenxid from pg_database;
  datname  | datfrozenxid
-----------+--------------
 postgres  |      1200774
 template0 |      1200775
 template1 |      1200774
(3 rows)

6. Ensure that pg_xact/0000 is gone:

➜   ls -lah ~/pg-devel/data/pg_xact
total 80
drwx------@  3 alex.wang  staff    96B Aug  5 20:29 .
drwx------@ 26 alex.wang  staff   832B Aug  5 20:12 ..
-rw-------@  1 alex.wang  staff    40K Aug  5 20:30 0001

7. Try to listen to any channel from any backend connection on the
same database:

postgres=# listen c1;
ERROR:  58P01: could not access status of transaction 773
DETAIL:  Could not open file "pg_xact/0000": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:1087
postgres=# listen c2;
ERROR:  58P01: could not access status of transaction 773
DETAIL:  Could not open file "pg_xact/0000": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:1087

Here's the stack trace from the master branch:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 21.1
  * frame #0: 0x0000000102c798c8
postgres`SlruReportIOError(ctl=0x000000010378e5e0, pageno=0, xid=773) at
slru.c:1084:4
    frame #1: 0x0000000102c792b0
postgres`SimpleLruReadPage(ctl=0x000000010378e5e0, pageno=0, write_ok=true,
xid=773) at slru.c:603:4
    frame #2: 0x0000000102c79f7c
postgres`SimpleLruReadPage_ReadOnly(ctl=0x000000010378e5e0, pageno=0,
xid=773) at slru.c:661:9
    frame #3: 0x0000000102c6a6bc postgres`TransactionIdGetStatus(xid=773,
lsn=0x000000016d2975e8) at clog.c:745:11
    frame #4: 0x0000000102c7e924
postgres`TransactionLogFetch(transactionId=773) at transam.c:79:14
    frame #5: 0x0000000102c7e74c
postgres`TransactionIdDidCommit(transactionId=773) at transam.c:130:14
    frame #6: 0x0000000102dc8b94
postgres`asyncQueueProcessPageEntries(current=0x000000016d297720,
stop=QueuePosition @ 0x000000016d297690, page_buffer="\U00000014",
snapshot=0x000000012d812398) at async.c:2069:13
    frame #7: 0x0000000102dc8954 postgres`asyncQueueReadAllNotifications at
async.c:1981:18
    frame #8: 0x0000000102dc6b5c postgres`Exec_ListenPreCommit at
async.c:1127:3
    frame #9: 0x0000000102dc664c postgres`PreCommit_Notify at async.c:881:6
    frame #10: 0x0000000102c8c77c postgres`CommitTransaction at
xact.c:2341:2
    frame #11: 0x0000000102c87b2c postgres`CommitTransactionCommandInternal
at xact.c:3214:4
    frame #12: 0x0000000102c87a44 postgres`CommitTransactionCommand at
xact.c:3175:10
    frame #13: 0x000000010321da94 postgres`finish_xact_command at
postgres.c:2833:3
    frame #14: 0x000000010321b64c
postgres`exec_simple_query(query_string="listen c1;") at postgres.c:1298:4
    frame #15: 0x000000010321a714 postgres`PostgresMain(dbname="postgres",
username="alex.wang") at postgres.c:4767:7
    frame #16: 0x0000000103211a14
postgres`BackendMain(startup_data=0x000000016d299e48, startup_data_len=24)
at backend_startup.c:124:2
    frame #17: 0x00000001030e938c
postgres`postmaster_child_launch(child_type=B_BACKEND, child_slot=56,
startup_data=0x000000016d299e48, startup_data_len=24,
client_sock=0x000000016d299ed8) at launch_backend.c:290:3
    frame #18: 0x00000001030f0d60
postgres`BackendStartup(client_sock=0x000000016d299ed8) at
postmaster.c:3587:8
    frame #19: 0x00000001030eebc4 postgres`ServerLoop at postmaster.c:1702:6
    frame #20: 0x00000001030ed67c postgres`PostmasterMain(argc=3,
argv=0x00006000021f14e0) at postmaster.c:1400:11
    frame #21: 0x0000000102f73e40 postgres`main(argc=3,
argv=0x00006000021f14e0) at main.c:231:4
    frame #22: 0x00000001940a2b98 dyld`start + 6076

Daniil Davydov has analyzed the root cause in thread [4] and I agree with
what he said:

On Thu, Jul 31, 2025 at 8:21 PM Daniil Davydov <3danissimo@gmail.com> wrote:

> We have the following logic in the notify queue :
> If there are no listeners within all databases, and we are calling
> LISTEN, then we must iterate from 'tail' to 'head' of the queue and
> check statuses of transactions (see Exec_ListenPreCommit).
> If there is a pruned-away xid in the queue, we will try to access its
> status and get an error.
>
> Because the tail of the queue is not necessarily always advanced
> forward by the listeners, we can get such error without any long lived
> transactions.
>

The fix and workarounds were discussed in [5] and [6]: In [5], Daniil
proposed a patch, which I’ve attached. The patch adds a call to
asyncQueueAdvanceTail() in vac_update_datfrozenxid(), so that VACUUM
advances the async queue tail.

Similarly, calling the built-in function pg_notification_queue_usage()
also advances the async queue tail. So when the issue occurs, calling
this function could also make the error go away. However, this doesn’t
prevent the error from happening in the first place.

My questions:

1. Is it acceptable to drop notifications from the async queue if
there are no active listeners? There might still be notifications that
haven’t been read by any previous listener.

2. If the answer to 1 is no, how can we teach VACUUM to respect the
minimum xid stored in all AsyncQueueEntries?

Best,
Alex

[1]
https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2%40postgresql.org
[2]
https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org
[3]
https://www.postgresql.org/message-id/18394-e7459245148578b2@postgresql.org
[4]
https://www.postgresql.org/message-id/CAJDiXgh3Jh2N90Fe4%3DX2qE%2BYAiZ1BSpgznhH%2BvkTHmvK3gjqxw%40mail.gmail.com
[5]
https://www.postgresql.org/message-id/CAJDiXgj1BmLKAZ%3DgOC1eETmctt7z%3Daj8MHfmd%2BnORE1P6qncsA%40mail.gmail.com
[6]
https://www.postgresql.org/message-id/CAK98qZ0uZrf1b6L4mDs%2B17M07KCNt6YMc9vD-%2BTmeWaJHRbGpA%40mail.gmail.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Álvaro Herrera <alvherre@kurilemu.de> — 2025-08-06T10:44:13Z

On 2025-Aug-05, Alexandra Wang wrote:

> I'm bringing up a bug that was reported multiple times [1][2][3] in
> the bugs list here, for a broader audience.
> 
> The issue is that an ERROR like the one below occurs when trying to
> register any listener in the database.
> 
> test=# listen c21;
> ERROR:  58P01: could not access status of transaction 14279685
> DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
> LOCATION:  SlruReportIOError, slru.c:1087

Oh, interesting problem.  Many thanks for the excellent write-up.

> My questions:
> 
> 1. Is it acceptable to drop notifications from the async queue if
> there are no active listeners? There might still be notifications that
> haven’t been read by any previous listener.

I'm somewhat wary of this idea -- could these inactive listeners become
active later and expect to be able to read their notifies?

> 2. If the answer to 1 is no, how can we teach VACUUM to respect the
> minimum xid stored in all AsyncQueueEntries?

Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
listeners, and back off the pg_clog truncation based on that.  This
could be done by having a new boolean argument that says to look up the
XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
would only be passed true by vac_update_datfrozenxid(), to avoid
overhead by other callers), then collect the oldest of those and return
it.

This does create the problem that an inactive listener could cause the
XID counter to stay far in the past.  Maybe we could try to avoid this
by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
are way overdue on reading notifies.  I'm not sure if this is really
needed or useful; consider a backend stuck on SIGSTOP (debugger or
whatever): it will just sit there forever.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-11T13:41:08Z

On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
>> My questions:
>> 
>> 1. Is it acceptable to drop notifications from the async queue if
>> there are no active listeners? There might still be notifications that
>> haven’t been read by any previous listener.
>
> I'm somewhat wary of this idea -- could these inactive listeners become
> active later and expect to be able to read their notifies?
>
I'm bit worry about this too.

>> 2. If the answer to 1 is no, how can we teach VACUUM to respect the
>> minimum xid stored in all AsyncQueueEntries?
>
> Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
> listeners, and back off the pg_clog truncation based on that.  This
> could be done by having a new boolean argument that says to look up the
> XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
> would only be passed true by vac_update_datfrozenxid(), to avoid
> overhead by other callers), then collect the oldest of those and return
> it.
>
The problem with only considering the oldest XID of listeners is that
IIUC we may have notifications without listeners, and in this case we
may still get this error because when the LISTEN is executed we loop
through the AsyncQueueEntry's on asyncQueueProcessPageEntries() and we
call TransactionIdDidCommit() that raise the error before
IsListeningOn(channel) is called.

Another option would be to add a minXid field on AsyncQueueControl and
then update this value on asyncQueueProcessPageEntries() and
asyncQueueAddEntries() routines, and then we could check this value on
vac_update_datfrozenxid().

> This does create the problem that an inactive listener could cause the
> XID counter to stay far in the past.  Maybe we could try to avoid this
> by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
> send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
> are way overdue on reading notifies.  I'm not sure if this is really
> needed or useful; consider a backend stuck on SIGSTOP (debugger or
> whatever): it will just sit there forever.
>
With this idea that I've proposed we still could have this problem, if a
listener take too long to consume a message we would block vacuum freeze
to advance the xid. For this I think that we could have two GUC's; One
to enable and disable the oldest xmin check on async queue and the
second to control how far we want to prevent the vacuum from freezing
the oldest async queue xid, and if the min xid raises this limit we
ignore and truncate the xid.

I've write a draft patch that plays with the idea, see attached.

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Daniil Davydov <3danissimo@gmail.com> — 2025-08-13T19:29:29Z

Hi,

On Mon, Aug 11, 2025 at 8:41 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
> >> My questions:
> >>
> >> 1. Is it acceptable to drop notifications from the async queue if
> >> there are no active listeners? There might still be notifications that
> >> haven’t been read by any previous listener.
> >
> > I'm somewhat wary of this idea -- could these inactive listeners become
> > active later and expect to be able to read their notifies?
> >
> I'm bit worry about this too.

What exactly do we mean by "active listener"? According to the source code,
the active listener (as far as I understand) is the one who listens to at least
one channel. If we have no active listeners in the database, the new listener
will set its pointer to the tail of the async queue. Thus, messages with old
xid will not be touched by anybody. I don't see any point in dropping them
in this case.

If the "inactive" listener is the backend which is stuck somewhere, the
answer is "no" - this backend should be able to process all notifications.

>
> >> 2. If the answer to 1 is no, how can we teach VACUUM to respect the
> >> minimum xid stored in all AsyncQueueEntries?
> >
> > Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
> > listeners, and back off the pg_clog truncation based on that.  This
> > could be done by having a new boolean argument that says to look up the
> > XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (which
> > would only be passed true by vac_update_datfrozenxid(), to avoid
> > overhead by other callers), then collect the oldest of those and return
> > it.
> >
> The problem with only considering the oldest XID of listeners is that
> IIUC we may have notifications without listeners, and in this case we
> may still get this error because when the LISTEN is executed we loop
> through the AsyncQueueEntry's on asyncQueueProcessPageEntries() and we
> call TransactionIdDidCommit() that raise the error before
> IsListeningOn(channel) is called.
>

Agree.

> > This does create the problem that an inactive listener could cause the
> > XID counter to stay far in the past.  Maybe we could try to avoid this
> > by adding more signalling (e.g, AsyncQueueAdvanceTail() itself could
> > send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
> > are way overdue on reading notifies.  I'm not sure if this is really
> > needed or useful; consider a backend stuck on SIGSTOP (debugger or
> > whatever): it will just sit there forever.
> >
> With this idea that I've proposed we still could have this problem, if a
> listener take too long to consume a message we would block vacuum freeze
> to advance the xid. For this I think that we could have two GUC's; One
> to enable and disable the oldest xmin check on async queue and the
> second to control how far we want to prevent the vacuum from freezing
> the oldest async queue xid, and if the min xid raises this limit we
> ignore and truncate the xid.
>

1) About Álvaro's comment:
I don't think that killing the lagging backend will save us, because entries
in the async queue are not ordered by xid. Thus, even after killing such a
backend, we have no guarantees that the problem is gone, because not
lagging backends still may encounter a super old xid.

2) About Matheus's comment:
I guess that there is only one reliable way to determine the precise minimal xid
in the queue (after message processing) - scan all pages from head to tail.
It obviously can take a lot of time. The second GUC (if I get it
right) logic is based
on minimal xid determination, so it won't be the best solution for
highload systems
(that are forced to turn on the first GUC, because of safety requirements).

> Another option would be to add a minXid field on AsyncQueueControl and
> then update this value on asyncQueueProcessPageEntries() and
> asyncQueueAddEntries() routines, and then we could check this value on
> vac_update_datfrozenxid().
>
> I've write a draft patch that plays with the idea, see attached.

Thanks for the patch! I have few comments on it :
1)
Maybe we should add an assert that NotifyQueueLock is held to the StoreEntryXid
function?

2)
For fields in AsyncQueueControl we have macros like QUEUE_HEAD,
QUEUE_TAIL and so on. Maybe minXid should also be wrapped with something
like QUEUE_MIN_XID?

3)
This logic will not work with (for example) two listeners. Steps :
session1=# listen c1;
session2=# listen c1;
session3=# notify c1; // sets minimal xid to N inside StoreEntryXid
session3=# notify c1;
session1=# listen c1; // sets minimal xid to (N + 1) inside ReleaseEntryXid

After these steps, session2 still needs to process a message with xid = N, but
autovacuum legally can freeze it and truncate clog.

Maybe we should check whether there are no listeners with smaller queue
position compared to the current backend?

4)
After ReleaseEntryXid, minXid field contains not actually minimal xid
in the queue,
but maximum processed xid. Maybe this parameter can be renamed?

--
Best regards,
Daniil Davydov

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-19T00:14:15Z

On Wed Aug 13, 2025 at 4:29 PM -03, Daniil Davydov wrote:
> Hi,
>
> On Mon, Aug 11, 2025 at 8:41 PM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>>
>> On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
>> >> My questions:
>> >>
>> >> 1. Is it acceptable to drop notifications from the async queue if
>> >> there are no active listeners? There might still be notifications that
>> >> haven’t been read by any previous listener.
>> >
>> > I'm somewhat wary of this idea -- could these inactive listeners become
>> > active later and expect to be able to read their notifies?
>> >
>> I'm bit worry about this too.
>
> What exactly do we mean by "active listener"? According to the source code,
> the active listener (as far as I understand) is the one who listens to at least
> one channel. If we have no active listeners in the database, the new listener
> will set its pointer to the tail of the async queue. Thus, messages with old
> xid will not be touched by anybody. I don't see any point in dropping them
> in this case.
>
I think that this definition is correct, but IIUC the tail can still
have notifications with xid's that were already truncated by vacuum
freeze. When the LISTEN is executed, we first loop through the
notification queue to try to advance the queue pointers and we can
eventually iterate over a notification that was added on the queue
without any listener but it has a xid that is already truncated by vacuum
freeze, so in this case it will fail to get the transaction status. On
Alex steps to reproduce the issue it first executes the NOTIFY and
then executes the LISTEN which fails after vacuum freeze.

> If the "inactive" listener is the backend which is stuck somewhere, the
> answer is "no" - this backend should be able to process all notifications.
>
I tried to reproduce the issue by using some kind of "inactive"
listener but so far I didn't manage to trigger the error. This is what I
tried:

1. Create listener:
postgres=# listen c1;

2. Execute a very long query to make the backend busy to process the
notification:
postgres=# select * from generate_series(1,10000000000) g where g > 1;

3. On another session send the notification
postgres=# notify c1;

4. Execute pgbench test:
pgbench -n -c 80 -j 10 -t 15000 -f test.sql postgres

5. Verify that we have multiple files on pg_xact:
➜  ls -lah ~/pg-devel/data/pg_xact
total 608
-rw-------@ 1 matheus  staff   256K Aug 18 20:36 0000
-rw-------@ 1 matheus  staff    40K Aug 18 20:56 0001

6. Execute VACUUM FREEZE on every database on the server

postgres=# VACUUM FREEZE;
VACUUM

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# VACUUM FREEZE;
VACUUM

template1=# \c template0
You are now connected to database "template0" as user "postgres".
template0=# VACUUM FREEZE;
VACUUM

After the vacuum freeze I still can see the same files on pg_xact/ and
if I cancel the long query the notification is received correctly, and
then if I execute vacuum freeze again on every database the oldest
pg_xact file is truncated.

So, if my tests are correct I don't think that storing the oldest xid is
necessary anymore since I don't think that we can lose notifications
using the patch from Daniil or I'm missing something here?

Thinking about this, maybe another solution for this would be to change
queue advancing pointers to skip the transaction status check? With this
we would not need to touch any vacuum freeze code and instead change the
LISTEN execution to handle this scenario.

Thoughts?

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Daniil Davydov <3danissimo@gmail.com> — 2025-08-19T03:57:42Z

Hi,

On Tue, Aug 19, 2025 at 7:14 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Wed Aug 13, 2025 at 4:29 PM -03, Daniil Davydov wrote:
> >
> > What exactly do we mean by "active listener"? According to the source code,
> > the active listener (as far as I understand) is the one who listens to at least
> > one channel. If we have no active listeners in the database, the new listener
> > will set its pointer to the tail of the async queue. Thus, messages with old
> > xid will not be touched by anybody. I don't see any point in dropping them
> > in this case.
> >
> I think that this definition is correct, but IIUC the tail can still
> have notifications with xid's that were already truncated by vacuum
> freeze. When the LISTEN is executed, we first loop through the
> notification queue to try to advance the queue pointers and we can
> eventually iterate over a notification that was added on the queue
> without any listener but it has a xid that is already truncated by vacuum
> freeze, so in this case it will fail to get the transaction status. On
> Alex steps to reproduce the issue it first executes the NOTIFY and
> then executes the LISTEN which fails after vacuum freeze.
>

Yeah, you are right. I looked at the code again, and found out that even
if there are no active listeners, new listener should iterate from the head
to the tail. Thus, it may encounter truncated xid. Anyway, I still think that
dropping notifications is not the best way to resolve this issue.

> > If the "inactive" listener is the backend which is stuck somewhere, the
> > answer is "no" - this backend should be able to process all notifications.
> >
> I tried to reproduce the issue by using some kind of "inactive"
> listener but so far I didn't manage to trigger the error.
>
> After the vacuum freeze I still can see the same files on pg_xact/ and
> if I cancel the long query the notification is received correctly, and
> then if I execute vacuum freeze again on every database the oldest
> pg_xact file is truncated.
>
> So, if my tests are correct I don't think that storing the oldest xid is
> necessary anymore since I don't think that we can lose notifications
> using the patch from Daniil or I'm missing something here?
>

You have started a very long transaction, which holds its xid and prevents
vacuum from freezing it. But what if the backend is stuck not inside a
transaction? Maybe we can just hardcode a huge delay (not inside the
transaction) or stop process execution via breakpoint in gdb. If we will use it
instead of a long query, I think that this error may be reproducible.

--
Best regards,
Daniil Davydov

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-19T11:31:00Z

On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
>> I think that this definition is correct, but IIUC the tail can still
>> have notifications with xid's that were already truncated by vacuum
>> freeze. When the LISTEN is executed, we first loop through the
>> notification queue to try to advance the queue pointers and we can
>> eventually iterate over a notification that was added on the queue
>> without any listener but it has a xid that is already truncated by vacuum
>> freeze, so in this case it will fail to get the transaction status. On
>> Alex steps to reproduce the issue it first executes the NOTIFY and
>> then executes the LISTEN which fails after vacuum freeze.
>>
>
> Yeah, you are right. I looked at the code again, and found out that even
> if there are no active listeners, new listener should iterate from the head
> to the tail. Thus, it may encounter truncated xid. Anyway, I still think that
> dropping notifications is not the best way to resolve this issue.
>
In the steps that Alex shared, is it expected that the "LISTEN c1" command
consumes the notification that was sent previously with NOTIFY? IIUC the
LISTEN command should be executed before of any NOTIFY, so executing the
LISTEN after a NOTIFY will not consume any previous notification added
on the channel, so how bad would be to drop this notification from the
queue in this situation?

>> > If the "inactive" listener is the backend which is stuck somewhere, the
>> > answer is "no" - this backend should be able to process all notifications.
>> >
>> I tried to reproduce the issue by using some kind of "inactive"
>> listener but so far I didn't manage to trigger the error.
>>
>> After the vacuum freeze I still can see the same files on pg_xact/ and
>> if I cancel the long query the notification is received correctly, and
>> then if I execute vacuum freeze again on every database the oldest
>> pg_xact file is truncated.
>>
>> So, if my tests are correct I don't think that storing the oldest xid is
>> necessary anymore since I don't think that we can lose notifications
>> using the patch from Daniil or I'm missing something here?
>>
>
> You have started a very long transaction, which holds its xid and prevents
> vacuum from freezing it. But what if the backend is stuck not inside a
> transaction? Maybe we can just hardcode a huge delay (not inside the
> transaction) or stop process execution via breakpoint in gdb. If we will use it
> instead of a long query, I think that this error may be reproducible.
>
But how could this happen in real scenarios? I mean, how the backend
could be stuck outside a transaction?

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Daniil Davydov <3danissimo@gmail.com> — 2025-08-19T17:37:54Z

Hi,

On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
> > You have started a very long transaction, which holds its xid and prevents
> > vacuum from freezing it. But what if the backend is stuck not inside a
> > transaction? Maybe we can just hardcode a huge delay (not inside the
> > transaction) or stop process execution via breakpoint in gdb. If we will use it
> > instead of a long query, I think that this error may be reproducible.
> >
> But how could this happen in real scenarios? I mean, how the backend
> could be stuck outside a transaction?
>

For now, I cannot come up with a situation where it may be possible.
Perhaps, such a lagging may occur during network communication,
but I couldn't reproduce it. Maybe other people know how we can achieve
this?

I think that if such a situation may be possible, the suggestion to  delete
messages  will no longer be relevant. Therefore, first of all, I would like to
clarify this issue.

--
Best regards,
Daniil Davydov

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Masahiko Sawada <sawada.mshk@gmail.com> — 2025-08-19T17:40:50Z

On Mon, Aug 18, 2025 at 5:14 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Wed Aug 13, 2025 at 4:29 PM -03, Daniil Davydov wrote:
> > Hi,
> >
> > On Mon, Aug 11, 2025 at 8:41 PM Matheus Alcantara
> > <matheusssilv97@gmail.com> wrote:
> >>
> >> On Wed Aug 6, 2025 at 7:44 AM -03, Álvaro Herrera wrote:
> >> >> My questions:
> >> >>
> >> >> 1. Is it acceptable to drop notifications from the async queue if
> >> >> there are no active listeners? There might still be notifications that
> >> >> haven’t been read by any previous listener.
> >> >
> >> > I'm somewhat wary of this idea -- could these inactive listeners become
> >> > active later and expect to be able to read their notifies?
> >> >
> >> I'm bit worry about this too.
> >
> > What exactly do we mean by "active listener"? According to the source code,
> > the active listener (as far as I understand) is the one who listens to at least
> > one channel. If we have no active listeners in the database, the new listener
> > will set its pointer to the tail of the async queue. Thus, messages with old
> > xid will not be touched by anybody. I don't see any point in dropping them
> > in this case.
> >
> I think that this definition is correct, but IIUC the tail can still
> have notifications with xid's that were already truncated by vacuum
> freeze. When the LISTEN is executed, we first loop through the
> notification queue to try to advance the queue pointers and we can
> eventually iterate over a notification that was added on the queue
> without any listener but it has a xid that is already truncated by vacuum
> freeze, so in this case it will fail to get the transaction status.

When a process first executes the LISTEN command, it iterates through
the notification queue, but it seems only to advance its queue pointer
because it doesn't add the interested channels to its list yet and it
isn't interested in the notifications queued before it registered as a
listener. I'm wondering if we could optimize this by allowing the
queue pointer to fast-forward without checking transaction status. If
feasible, this might resolve the reported issue.

However, I have a more fundamental concern regarding the LISTEN/NOTIFY
implementation. Since vacuum doesn't consider the XIDs of notification
entries, there might be a critical issue with the truncation of clog
entries that LISTEN/NOTIFY still requires. As I understand it,
notification queue entries aren't ordered by XID, and it's possible
for a notification with an older XID to be positioned at the queue's
head. If vacuum freeze then truncates the corresponding clogs,
listeners attempting to retrieve this notification would fail to
obtain the transaction status. To address this, we likely need to
either implement Álvaro's suggestion[1] to make vacuum aware of the
oldest XID in the notification queue, or develop a mechanism to
remove/freeze XIDs of the notification entries.

Regards,

[1] https://www.postgresql.org/message-id/202508061044.ptcyt7aqsaaa%40alvherre.pgsql

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-20T21:18:32Z

On Tue Aug 19, 2025 at 2:37 PM -03, Daniil Davydov wrote:
> Hi,
>
> On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>>
>> On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
>> > You have started a very long transaction, which holds its xid and prevents
>> > vacuum from freezing it. But what if the backend is stuck not inside a
>> > transaction? Maybe we can just hardcode a huge delay (not inside the
>> > transaction) or stop process execution via breakpoint in gdb. If we will use it
>> > instead of a long query, I think that this error may be reproducible.
>> >
>> But how could this happen in real scenarios? I mean, how the backend
>> could be stuck outside a transaction?
>>
>
> For now, I cannot come up with a situation where it may be possible.
> Perhaps, such a lagging may occur during network communication,
> but I couldn't reproduce it. Maybe other people know how we can achieve
> this?
>
Reading more the code I understand that once the a NOTIFY command is
received by a backend (and the transaction is committed) it will
emedialy signal all other listener backends and if the listener backend
is in idle it will consume the notification and then send it back to the
client as a PqMsg_NotificationResponse, so if there is a network delay
to send the notification from the listener backend back to the client I
don't think that it would be possible to get this error, because the
message was already dispatched by the backend and it will eventually get
to the client and once the notification is dispatched the backend
doesn't need to track it anymore (the queue pointers of the backend are
advanced after the dispatch).

Assuming that every SQL command is wrapped into a transaction (if it's
not already inside in) I think a busy listener backend will always
prevent the vacuum from freezing clog files past from its current xid,
so any notification that is sent while the backend is busy will not have
their transaction status removed from clog files anyway.

Is all these understandings and assumptions correct or am I missing
something here?

> I think that if such a situation may be possible, the suggestion to  delete
> messages  will no longer be relevant. Therefore, first of all, I would like to
> clarify this issue.
>
>From what I've understood until now it seems to me that this can happen
only if we have a notification on the queue without any listener, so the
notification may stay on the queue from a long time and a vacuum freeze
can be executed during this time and then when we have a new listener
(even for a different channel) it will fail to advance the pointers at
listener setup(Exec_ListenPreCommit()) because it would not be able to
get the transition status of this very old notification.

(please note that I'm not trying to invalidate your concern, I'm also
have this concern but unfortunately I'm unable to reproduce it and I'm
just sharing my thoughts to see if this issue is really possible or not)

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-20T21:22:25Z

On Tue Aug 19, 2025 at 2:40 PM -03, Masahiko Sawada wrote:
> However, I have a more fundamental concern regarding the LISTEN/NOTIFY
> implementation. Since vacuum doesn't consider the XIDs of notification
> entries, there might be a critical issue with the truncation of clog
> entries that LISTEN/NOTIFY still requires. As I understand it,
> notification queue entries aren't ordered by XID, and it's possible
> for a notification with an older XID to be positioned at the queue's
> head. If vacuum freeze then truncates the corresponding clogs,
> listeners attempting to retrieve this notification would fail to
> obtain the transaction status. To address this, we likely need to
> either implement Álvaro's suggestion[1] to make vacuum aware of the
> oldest XID in the notification queue, or develop a mechanism to
> remove/freeze XIDs of the notification entries.
>
Thanks for the comments! Please see my reply at [1] that I mention that
I don't think that is too easy to have this specific scenario of a busy
backend loose dropped notifications.

[1] https://www.postgresql.org/message-id/DC7KGTXW3QSG.OZA24HONT78J%40gmail.com

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Masahiko Sawada <sawada.mshk@gmail.com> — 2025-08-21T22:25:34Z

On Wed, Aug 20, 2025 at 2:18 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Tue Aug 19, 2025 at 2:37 PM -03, Daniil Davydov wrote:
> > Hi,
> >
> > On Tue, Aug 19, 2025 at 6:31 PM Matheus Alcantara
> > <matheusssilv97@gmail.com> wrote:
> >>
> >> On Tue Aug 19, 2025 at 12:57 AM -03, Daniil Davydov wrote:
> >> > You have started a very long transaction, which holds its xid and prevents
> >> > vacuum from freezing it. But what if the backend is stuck not inside a
> >> > transaction? Maybe we can just hardcode a huge delay (not inside the
> >> > transaction) or stop process execution via breakpoint in gdb. If we will use it
> >> > instead of a long query, I think that this error may be reproducible.
> >> >
> >> But how could this happen in real scenarios? I mean, how the backend
> >> could be stuck outside a transaction?
> >>
> >
> > For now, I cannot come up with a situation where it may be possible.
> > Perhaps, such a lagging may occur during network communication,
> > but I couldn't reproduce it. Maybe other people know how we can achieve
> > this?
> >
> Reading more the code I understand that once the a NOTIFY command is
> received by a backend (and the transaction is committed) it will
> emedialy signal all other listener backends and if the listener backend
> is in idle it will consume the notification and then send it back to the
> client as a PqMsg_NotificationResponse, so if there is a network delay
> to send the notification from the listener backend back to the client I
> don't think that it would be possible to get this error, because the
> message was already dispatched by the backend and it will eventually get
> to the client and once the notification is dispatched the backend
> doesn't need to track it anymore (the queue pointers of the backend are
> advanced after the dispatch).
>
> Assuming that every SQL command is wrapped into a transaction (if it's
> not already inside in) I think a busy listener backend will always
> prevent the vacuum from freezing clog files past from its current xid,
> so any notification that is sent while the backend is busy will not have
> their transaction status removed from clog files anyway.

What about backend processes that don't have any xid or xmin (i.e.,
are read-only query and in idle-in-transaction)?

IIUC we process the notification entries at the beginning of the
server loop (see L4608 in postgres.c) and when reading a command (via
ProcessClientReadInterrupt()), but it seems to me that if a process is
in idle-in-transaction state it doesn't process the entries unless the
transaction is committed. I've reproduced the missing clog entry error
even if we have a notification on the queue with a valid listener,
with the following steps:

1. Initialize the database cluster.

2. Execute "ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;".

3. Start one psql session and execute:

-- Session 1
=# listen s;
LISTEN
=# begin;
BEGIN

(keep this session open to leave the process idle-in-transaction.)

4. Open another psql session and execute:

-- Session 2
=# begin;
BEGIN
=# select txid_current();
 txid_current
--------------
          756
(1 row)
=# notify s;
NOTIFY
=# commit;
COMMIT

The notification to the channel 's' should be available for the
session-1's transaction.

5. Consume enough XIDs to truncate clog entries.

-- Session 2
=# create extension xid_wraparound;
CREATE EXTENSION
=# select consume_xids(10_000_000);
NOTICE:  consumed 10000000 / 10000000 XIDs, latest 0:10000757
 consume_xids
--------------
     10000757
(1 row)
=# select txid_current();
 txid_current
--------------
     10000758
(1 row)

6. Execute vacuum freeze on all databases:

$ vacuumdb --all --freeze
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template0"
vacuumdb: vacuuming database "template1"

$ psql -d postgres -c "select datname, datfrozenxid, age(datfrozenxid)
from pg_database"
  datname  | datfrozenxid | age
-----------+--------------+-----
 postgres  |     10000759 |  11
 template0 |     10000759 |  11
 template1 |     10000759 |  11
(3 rows)

7. On the first psql session:

-- Session 1
=# commit;
COMMIT
ERROR:  could not access status of transaction 756
DETAIL:  Could not open file "pg_xact/0000": No such file or directory.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-21T23:14:47Z

On Thu Aug 21, 2025 at 7:25 PM -03, Masahiko Sawada wrote:
> What about backend processes that don't have any xid or xmin (i.e.,
> are read-only query and in idle-in-transaction)?
>
> IIUC we process the notification entries at the beginning of the
> server loop (see L4608 in postgres.c) and when reading a command (via
> ProcessClientReadInterrupt()), but it seems to me that if a process is
> in idle-in-transaction state it doesn't process the entries unless the
> transaction is committed. I've reproduced the missing clog entry error
> even if we have a notification on the queue with a valid listener,
> with the following steps:
>
Ok, now we can reproduce this, thank you! So I think that we need a way
to teach the VACUUM FREEZE about old xid's on async queue.

I'll work on this considering the initial Daniil comments at [1]

[1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-08-28T22:31:35Z

On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
> I'll work on this considering the initial Daniil comments at [1]
>
> [1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com
>
I've been working on this on the last few days, please see the attached
patch version.

In this new version I tried to follow the suggestion from Daniil of
scanning all pages from tail to head of the async queue.

I tried to use other data structures like a list or a hashmap to store
the xids but it shows to me more complicated due to duplicated
information being stored and also considering that this data structure
should be stored on shared memory, so I just tried to get the oldest xid
with the information that we already have on the async queue.

To find the oldest xid on the async queue I basically read all
AsyncQueueEntry's on async queue pages on SLRU cache from tail to head,
skipping only uncommitted transactions and notifications for different
databases.

The code that reads the pages and the entries within the page is a bit
coupled with the send notification logic. The important functions that
execute this is asyncQueueReadAllNotifications() which read the pages
from SLRU cache and then call the asyncQueueProcessPageEntries() which
read the entries within the page and send the notification to the client
if needed. Since I need a very similar code to read the notification
entries on the queue to check the min xid I think that it would be good
to have an API that just works like an iterator and returns the next
queue entry to work on so that it can be used for both cases.

In this patch I created an AsyncQueueIterator that iterates over pages
and AsyncQueueEntry's within the pages. The code is based on the
asyncQueueReadAllNotifications() and asyncQueueProcessPageEntries()
functions. For now I'm just implementing the iterator logic and use at
AsyncQueueMinXid() to get the min xid on the queue to make the review
more easier but I think that we can have another patch that refactor
asyncQueueReadAllNotifications() and asyncQueueProcessPageEntries()
functions to use this iterator approach and avoid duplicated code.

I'm also adding a TAP test that reproduces the issue, but I'm not sure if
it's declared on the best path.

I think that it would be good to perform some benchmark tests to see if
we have performance issues when reading very long async queues.

Thoughts?

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Jacques Combrink <jacques@quantsolutions.co.za> — 2025-09-01T14:06:58Z

I came across this discussion after we had this problem at least twice 
now at work.
I read through the discussion to see if this is the problem we are 
experiencing.
At first, I thought this was not exactly our problem, because of the way 
it was showed to be reproduced.

With a long running transaction that is listening, and then doing all 
the other steps like like running through transactions until a second 
pg_xact file is created, and then vacuuming etc etc...

But, I felt like we don't do this, so it cannot be the way we run into 
the problem. Specifically, having a listener as a long running transaction.


So I set out to get it into the broken state, and here is how I can 
reproduce it without long running queries:

1. Create another database

postgres=# CREATE DATABASE test;
CREATE DATABASE

2. Create a listener on the postgres database.

postgres=# LISTEN xx;
LISTEN

3. Create notifies for the test database.

Here I struggled to get a stuck notification in the queue (SELECT 
pg_notification_queue_usage();).

It can happen with only a single notify from a  psql connection, but I 
get a higher hit rate with the following:

Create a notify.sql with the only contents being:

```
NOTIFY s, 'yappers';
```
Then run this against the test database with pgbench.

pgbench -n -c 80 -j 20 -t 1000 -f notify.sql test


4. Confirm that there is now something stuck in the queue:

postgres=# SELECT pg_notification_queue_usage();
  pg_notification_queue_usage
-----------------------------
          9.5367431640625e-07
(1 row)

If this still shows 0, then run step 3 again.

5. Consume xid's. I create a file consume.sql with the only contents being:
```
SELECT txid_current();
```

Then:

pgbench -n -c 80 -j 30 -t 100000 -f consume.sql postgres

6. Verify that a new file is created in the pg_xact folder.

If not, just run the previous step again.

7. Run vacuum freeze. (Remember to allow connections to template0 
beforehand with `ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;`)

$ vacuumdb --all --freeze
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template0"
vacuumdb: vacuuming database "template1"
vacuumdb: vacuuming database "test"

8. Connect to the test database and try execute a listener.
test=# LISTEN anything;
ERROR:  could not access status of transaction 81086
DETAIL:  Could not open file "pg_xact/0000": No such file or directory.
test=#


Extra weirdness:

In step 2 create a connection to the test database and LISTEN there on 
any channel, even the one you are notifying to:
And you don't have to run anything on that connection, there is no 
backend_xmin on that connection, you don't start a transaction nothing, 
you just run `LISTEN something;`
Then after step 7, as long as you don't close this connection, you will 
not get an error when you try to set up a listener on the test database, 
even on a new connection to the test database.

What I'm saying is after step 7 you can have two connections. One to 
postgres database with active listener, and one to test database with 
active listener.
As long as the postgres one is open the queue is stuck, ie `SELECT 
pg_notification_queue_usage();` returns non-zero.
As long as the test database connection is open, new listeners do not 
experience the error.

If you close the test database connection, but leave the postgres 
connection, from now on any listener you try to create on test database 
will error.

If you close the postgres connection, the queue clears immediately and 
new LISTEN's on the test database will work.
This means it is possible to get rid of the problem without restarting 
the postmaster if you can close connections until you close the one that 
"caused" the problem.
Don't know if this is actually useful, but to me it seemed like everyone 
believed it to be impossible to recover without restarting, so it's 
something at least.


TLDR:
active listener on one database causes notify on another database to get 
stuck.
At no point could I get a stuck notify if I don't have a listener on at 
least one other database than the one I am notifying on. See the Extra 
weirdness section.
At no point do you need to have any other queries running, there is 
never an idle in transaction query needed for bad timing with the vacuum.

I hope I explained everything well enough so that one of you smart 
people can find and fix the problem.

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Yura Sokolov <y.sokolov@postgrespro.ru> — 2025-09-02T07:37:59Z

29.08.2025 01:31, Matheus Alcantara пишет:
> On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
>> I'll work on this considering the initial Daniil comments at [1]
>>
>> [1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com
>>
> I've been working on this on the last few days, please see the attached
> patch version.
> 
> In this new version I tried to follow the suggestion from Daniil of
> scanning all pages from tail to head of the async queue.

Since we still need to scan those pages, couldn't we mark finished
transactions as committed/aborted?
This way we may to not hold datfrozenxid for a long time and will allow
both safe clog truncation and safe async queue notification.
More over, most notifications could be marked as completed in
AtCommit_Notify already. So there is a need to recheck a few notifications
that were written but not marked in AtCommit_Notify.

Since AsyncQueueEntry field is used only to check visibility, looks like it
is safe to set it to FrozenTransactionId.

-- 
regards
Yura Sokolov aka funny-falcon

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-03T19:28:21Z

Hi,

On Tue, Sep 2, 2025 at 10:38 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>
> 29.08.2025 01:31, Matheus Alcantara пишет:
> > On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
> >> I'll work on this considering the initial Daniil comments at [1]
> >>
> >> [1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com
> >>
> > I've been working on this on the last few days, please see the attached
> > patch version.
> >
> > In this new version I tried to follow the suggestion from Daniil of
> > scanning all pages from tail to head of the async queue.
>
> Since we still need to scan those pages, couldn't we mark finished
> transactions as committed/aborted?
> This way we may to not hold datfrozenxid for a long time and will allow
> both safe clog truncation and safe async queue notification.
> More over, most notifications could be marked as completed in
> AtCommit_Notify already. So there is a need to recheck a few notifications
> that were written but not marked in AtCommit_Notify.
>
> Since AsyncQueueEntry field is used only to check visibility, looks like it
> is safe to set it to FrozenTransactionId.

I like the idea to make queue records more self-sufficient, so we
don't need to check a transaction's status to determine if a
notification should be sent. What if we somehow mark every record of
an aborted transaction, so listeners can skip it without checking
transaction status? Also if the record is not marked as "aborted" we
also can send it without checking transaction status. IIUC
AtAbort_Notify is called before any listener can read records of the
aborted transaction (is it correct?), so we can do such "marking" in
AtAbort_Notify. And the listen/notify subsystem already has a similar
"marking" mechanism. At the end of every queue page we have a "dummy"
record which is skipped by every listener. Listeners skip it because
record's 'dboid' equals InvalidOid (listeners ignore any record where
the 'dboid' is different from their own). In the same way aborted
transactions can set 'dboid' to InvalidOid for their records in
AtAbort_Notify.

Some benefits of this approach:
1) Only aborted transactions would need to perform extra work.
2) Listeners don't need to check notify transaction status, but ofc
they still need to wait if the notify transaction is 'in progress'.
3) Vacuum don't need to be listen/notify aware.
4) We don't need to do an extra scan of the whole queue.

What do you think?

Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-03T20:35:49Z

On Mon Sep 1, 2025 at 11:06 AM -03, Jacques Combrink wrote:
> TLDR:
> active listener on one database causes notify on another database to get
> stuck.
> At no point could I get a stuck notify if I don't have a listener on at
> least one other database than the one I am notifying on. See the Extra
> weirdness section.
> At no point do you need to have any other queries running, there is
> never an idle in transaction query needed for bad timing with the vacuum.
>
> I hope I explained everything well enough so that one of you smart
> people can find and fix the problem.
>
The  long running transaction steps is just an example that we can lose
notifications using the first patch from Daniil that Alex has shared on
[1]. The steps that you've shared is just another way to trigger the
issue but it's similar to the steps that Alex also shared on [1].

All these different ways to trigger the error face the same underlying
problem: If a notification is keep for too long on the queue that vacuum
freeze can run and truncate clog files that contains transaction
information of this notification the error will happen.

The patch that I've attached on [2] aims to fix the issue following the
steps that you've shared, but during the tests I've found a stack
overflow bug on AsyncQueueIterNextNotification() due to the number of
notifications. I'm attaching a new version that fix this bug and I tried
to reproduce your steps with this new version and the issue seems to be
fixed.

Note that notifications that were added without any previous LISTEN will
block the xid advance during VACUUM FREEZE until we have a listener on
the database that owns these notifications. The XXX comment on vacuum.c
is about this problem.

[1] https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw%40mail.gmail.com
[2] https://www.postgresql.org/message-id CAFY6G8cJm73_MM9SuynZUqtqcaTuepUDgDuvS661oLW7U0dgsg%40mail.gmail.com

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-03T21:04:47Z

Thanks for the comments!

On Tue Sep 2, 2025 at 4:37 AM -03, Yura Sokolov wrote:
> 29.08.2025 01:31, Matheus Alcantara пишет:
>> On Thu Aug 21, 2025 at 8:14 PM -03, Matheus Alcantara wrote:
>>> I'll work on this considering the initial Daniil comments at [1]
>>>
>>> [1] https://www.postgresql.org/message-id/CAJDiXgg1ArRB1-6wLtXfVVnQ38P9Y%2BCDfEc9M2TXiOf_4kfBMg%40mail.gmail.com
>>>
>> I've been working on this on the last few days, please see the attached
>> patch version.
>>
>> In this new version I tried to follow the suggestion from Daniil of
>> scanning all pages from tail to head of the async queue.
>
> Since we still need to scan those pages, couldn't we mark finished
> transactions as committed/aborted?
> This way we may to not hold datfrozenxid for a long time and will allow
> both safe clog truncation and safe async queue notification.
> More over, most notifications could be marked as completed in
> AtCommit_Notify already. So there is a need to recheck a few notifications
> that were written but not marked in AtCommit_Notify.
>
> Since AsyncQueueEntry field is used only to check visibility, looks like it
> is safe to set it to FrozenTransactionId.
>
Maybe we could have a boolean "committed" field on AsyncQueueEntry and
check this field before sending the notification instead of call
TransactionIdDidCommit()? Is something similar what you are proposing?

We create the AsyncQueueEntry's and add into the SLRU at
PreCommit_Notify(), so to mark these entries as committed on
AtCommit_Notify() we would need a way to find these entries on the SLRU
from the pendingNotifies->events that contains the notifications added
on the current transaction being committed.

This idea looks promising to me TBH, I'm just not sure if it's
completely safe to mark an AsyncQueueEntry on the queue as committed
without checking on TransactionIdDidCommit(). I would like to read more
opinions about this before working on the next version based on this
idea.

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-03T21:14:40Z

On Wed Sep 3, 2025 at 4:28 PM -03, Arseniy Mukhin wrote:
>> Since we still need to scan those pages, couldn't we mark finished
>> transactions as committed/aborted?
>> This way we may to not hold datfrozenxid for a long time and will allow
>> both safe clog truncation and safe async queue notification.
>> More over, most notifications could be marked as completed in
>> AtCommit_Notify already. So there is a need to recheck a few notifications
>> that were written but not marked in AtCommit_Notify.
>>
>> Since AsyncQueueEntry field is used only to check visibility, looks like it
>> is safe to set it to FrozenTransactionId.
>
> I like the idea to make queue records more self-sufficient, so we
> don't need to check a transaction's status to determine if a
> notification should be sent. What if we somehow mark every record of
> an aborted transaction, so listeners can skip it without checking
> transaction status? Also if the record is not marked as "aborted" we
> also can send it without checking transaction status. IIUC
> AtAbort_Notify is called before any listener can read records of the
> aborted transaction (is it correct?), so we can do such "marking" in
> AtAbort_Notify. And the listen/notify subsystem already has a similar
> "marking" mechanism. At the end of every queue page we have a "dummy"
> record which is skipped by every listener. Listeners skip it because
> record's 'dboid' equals InvalidOid (listeners ignore any record where
> the 'dboid' is different from their own). In the same way aborted
> transactions can set 'dboid' to InvalidOid for their records in
> AtAbort_Notify.
>
> Some benefits of this approach:
> 1) Only aborted transactions would need to perform extra work.
> 2) Listeners don't need to check notify transaction status, but ofc
> they still need to wait if the notify transaction is 'in progress'.
> 3) Vacuum don't need to be listen/notify aware.
> 4) We don't need to do an extra scan of the whole queue.
>
> What do you think?
>
IIUC we don't store notifications of aborted transactions on the queue.
On PreCommit_Notify we add the notifications on the queue and on
Commit_Notify() we signal the backends.

Or I'm missing something here?

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Rishu Bagga <rishu.postgres@gmail.com> — 2025-09-03T23:51:20Z

On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara

<matheusssilv97@gmail.com> wrote:


> IIUC we don't store notifications of aborted transactions on the

> queue. On PreCommit_Notify we add the notifications on the queue

> and on Commit_Notify() we signal the backends.

>

> Or I'm missing something here?


My understanding is that something could go wrong in between

PreCommit_Notify and AtCommit_Notify, which could cause the

transaction to abort, and the notification might be in the queue at

this point, even though the transaction aborted, hence the dependency

on the commit log.


> Maybe we could have a boolean "committed" field on AsyncQueueEntry

> and check this field before sending the notification instead of

> call TransactionIdDidCommit()? Is something similar what you are

> proposing?


I like this direction, as it opens up the ability to remove the

global lock held from PreCommit_Notify to the end of the transaction.

In the same vein as this idea, I was experimenting with a patch

inspired by Tom Lane’s idea in [1] where we write the actual

notification data into the WAL, and just write the db, lsn, and xid

into the notify queue in AtCommit_Notify, so the notify queue only

contains information about committed transactions. Unfortunately,

the additional WAL write overhead in this approach seemed to outweigh

the benefits of removing the lock.


Following that idea - I think perhaps we could have two queues in the

notify system, one that stores the notification data, and another

that stores the position of the committed transaction’s notification,

which we would append to in AtCommit_Notify. Then the listener would

go through the position queue, and find / read the notifications from

the notify data queue.


[1] https://www.postgresql.org/message-id/1878165.1752858390%40sss.pgh.pa.us

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-04T14:14:22Z

Hi,

On Thu, Sep 4, 2025 at 2:51 AM Rishu Bagga <rishu.postgres@gmail.com> wrote:
>
> On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara
>
> <matheusssilv97@gmail.com> wrote:
>
>
> > IIUC we don't store notifications of aborted transactions on the
>
> > queue. On PreCommit_Notify we add the notifications on the queue
>
> > and on Commit_Notify() we signal the backends.
>
> >
>
> > Or I'm missing something here?
>
>
> My understanding is that something could go wrong in between
>
> PreCommit_Notify and AtCommit_Notify, which could cause the
>
> transaction to abort, and the notification might be in the queue at
>
> this point, even though the transaction aborted, hence the dependency
>
> on the commit log.

I have the same understanding.

>
> > Maybe we could have a boolean "committed" field on AsyncQueueEntry
>
> > and check this field before sending the notification instead of
>
> > call TransactionIdDidCommit()? Is something similar what you are
>
> > proposing?
>
>
> I like this direction, as it opens up the ability to remove the
>
> global lock held from PreCommit_Notify to the end of the transaction.
>
> In the same vein as this idea, I was experimenting with a patch
>
> inspired by Tom Lane’s idea in [1] where we write the actual
>
> notification data into the WAL, and just write the db, lsn, and xid
>
> into the notify queue in AtCommit_Notify, so the notify queue only
>
> contains information about committed transactions. Unfortunately,
>
> the additional WAL write overhead in this approach seemed to outweigh
>
> the benefits of removing the lock.


Interesting, have you shared your patch and results somewhere? IIUC
Tom's approach resolves this bug, because with it we have queue
entries produced by committed transactions only, so we don't need to
check their status and don't have dependency on clog.

>
> Following that idea - I think perhaps we could have two queues in the
>
> notify system, one that stores the notification data, and another
>
> that stores the position of the committed transaction’s notification,
>
> which we would append to in AtCommit_Notify. Then the listener would
>
> go through the position queue, and find / read the notifications from
>
> the notify data queue.
>

If I'm not wrong, by the time we got into AtCommit_Notify we no longer
hold the global lock, so we can't provide notifications in the commit
order. I think it would work if we add entries to the position queue
concurrently with adding commit entries to the wal, as in Tom's idea
3rd step.
Another question: how to truncate notification data queue. It sounds
like it will be more difficult to figure out what data in the queue is
still needed.


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Rishu Bagga <rishu.postgres@gmail.com> — 2025-09-04T23:02:16Z

On Thu, Sep 4, 2025 at 7:14 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

> Interesting, have you shared your patch and results somewhere? IIUC
> Tom's approach resolves this bug, because with it we have queue
> entries produced by committed transactions only, so we don't need to
> check their status and don't have dependency on clog.

I was eventually able to get better numbers with the patch after using
a substantial number of connections, so I have now posted it here. [1].

> Another question: how to truncate notification data queue. It sounds
> like it will be more difficult to figure out what data in the queue is
> still needed.

Yeah, I will think about this some more, and follow up with another
patch.

Thanks,
Rishu

[1] https://www.postgresql.org/message-id/CAK80%3DjipUfGC%2BU
QSzeA4oCP9daRtHZGm2SQZWLxC9NWmVTDtRQ%40mail.gmail.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-05T10:44:11Z

On Wed Sep 3, 2025 at 8:51 PM -03, Rishu Bagga wrote:
> On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara
>
> <matheusssilv97@gmail.com> wrote:
>
>
>> IIUC we don't store notifications of aborted transactions on the
>
>> queue. On PreCommit_Notify we add the notifications on the queue
>
>> and on Commit_Notify() we signal the backends.
>
>>
>
>> Or I'm missing something here?
>
>
> My understanding is that something could go wrong in between
>
> PreCommit_Notify and AtCommit_Notify, which could cause the
>
> transaction to abort, and the notification might be in the queue at
>
> this point, even though the transaction aborted, hence the dependency
>
> on the commit log.
>
Ok, I agree that that this may happen but I don't see this as a common
case to fix the issue based on this behaviour. I think that we check the
transaction status also to skip notifications that were added on the
queue by transactions that are not fully committed yet, and I see this
scenario as a more common case but I could be wrong.

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-05T10:45:32Z

On Thu Sep 4, 2025 at 8:02 PM -03, Rishu Bagga wrote:
> On Thu, Sep 4, 2025 at 7:14 AM Arseniy Mukhin
> <arseniy.mukhin.dev@gmail.com> wrote:
>
>> Interesting, have you shared your patch and results somewhere? IIUC
>> Tom's approach resolves this bug, because with it we have queue
>> entries produced by committed transactions only, so we don't need to
>> check their status and don't have dependency on clog.
>
> I was eventually able to get better numbers with the patch after using
> a substantial number of connections, so I have now posted it here. [1].
>
Thanks for sharing the patch! I'll reserve some time to test and review.

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-05T12:56:48Z

On Fri, Sep 5, 2025 at 1:44 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Wed Sep 3, 2025 at 8:51 PM -03, Rishu Bagga wrote:
> > On Wed, Sep 3, 2025 at 2:14 PM Matheus Alcantara
> >
> > <matheusssilv97@gmail.com> wrote:
> >
> >
> >> IIUC we don't store notifications of aborted transactions on the
> >
> >> queue. On PreCommit_Notify we add the notifications on the queue
> >
> >> and on Commit_Notify() we signal the backends.
> >
> >>
> >
> >> Or I'm missing something here?
> >
> >
> > My understanding is that something could go wrong in between
> >
> > PreCommit_Notify and AtCommit_Notify, which could cause the
> >
> > transaction to abort, and the notification might be in the queue at
> >
> > this point, even though the transaction aborted, hence the dependency
> >
> > on the commit log.
> >
> Ok, I agree that that this may happen but I don't see this as a common
> case to fix the issue based on this behaviour. I think that we check the
> transaction status also to skip notifications that were added on the
> queue by transactions that are not fully committed yet, and I see this
> scenario as a more common case but I could be wrong.
>

IIUC we don't need clog to check if the notification transaction is
still in progress, we use a snapshot for that (XidInMVCCSnapshot()).
If we see that the notification transaction is still in progress, we
postpone processing of that notification. If we see that notification
transaction is not in progress, then we check its status with
TransactionIdDidCommit() (using clog) and only then fail if clog
doesn't have notification transcation data (as a quick check, we can
see in the stack trace that the failure occurs during the call to
TransactionIdDidCommit, not XidInMVCCSnapshot).

So we need to check notification transaction status only to understand
if a transaction was committed and we can send notification or it was
aborted (somewhere in between PreCommit_Notify and AtCommit_Notify)
and we should skip notification. And the solution that Yura Sokolov
suggested is to introduce a 'commited' flag, so we can check it
instead of calling TransactionIdDidCommit. Another option is to make
aborted transactions to clean up after themselves in AtAbort_Notify(),
so if listener see notification in the queue and transaction that
wrote it is not in progress, then such notification was always created
by committed transaction and we also don't need to call
TransactionIdDidCommit. One way for aborted transactions to clean up
is to set its notifications dbOid to 0, so listeners skip them.
Another option I think is to set queue HEAD back to the position where
it was when the aborted transaction started to write to the queue. It
also makes such aborted transaction notifications 'invisible' for
listeners. We have a global lock and there is always only one writer,
so it sounds possible. And there is another option which is the patch
that Rishu Bagga is working on. In the patch all transactions write to
the listen/notify queue only after they wrote a commit record to the
WAL, so all notifications in the queue are always by committed
transactions and we don't need to call TransactionIdDidCommit.

Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-06T14:14:54Z

On Fri Sep 5, 2025 at 9:56 AM -03, Arseniy Mukhin wrote:
>> Ok, I agree that that this may happen but I don't see this as a common
>> case to fix the issue based on this behaviour. I think that we check the
>> transaction status also to skip notifications that were added on the
>> queue by transactions that are not fully committed yet, and I see this
>> scenario as a more common case but I could be wrong.
>>
>
> IIUC we don't need clog to check if the notification transaction is
> still in progress, we use a snapshot for that (XidInMVCCSnapshot()).
> If we see that the notification transaction is still in progress, we
> postpone processing of that notification. If we see that notification
> transaction is not in progress, then we check its status with
> TransactionIdDidCommit() (using clog) and only then fail if clog
> doesn't have notification transcation data (as a quick check, we can
> see in the stack trace that the failure occurs during the call to
> TransactionIdDidCommit, not XidInMVCCSnapshot).
>
> So we need to check notification transaction status only to understand
> if a transaction was committed and we can send notification or it was
> aborted (somewhere in between PreCommit_Notify and AtCommit_Notify)
> and we should skip notification. And the solution that Yura Sokolov
> suggested is to introduce a 'commited' flag, so we can check it
> instead of calling TransactionIdDidCommit. Another option is to make
> aborted transactions to clean up after themselves in AtAbort_Notify(),
> so if listener see notification in the queue and transaction that
> wrote it is not in progress, then such notification was always created
> by committed transaction and we also don't need to call
> TransactionIdDidCommit. One way for aborted transactions to clean up
> is to set its notifications dbOid to 0, so listeners skip them.
> Another option I think is to set queue HEAD back to the position where
> it was when the aborted transaction started to write to the queue. It
> also makes such aborted transaction notifications 'invisible' for
> listeners. We have a global lock and there is always only one writer,
> so it sounds possible. And there is another option which is the patch
> that Rishu Bagga is working on. In the patch all transactions write to
> the listen/notify queue only after they wrote a commit record to the
> WAL, so all notifications in the queue are always by committed
> transactions and we don't need to call TransactionIdDidCommit.
>
You are right, I missed the XidInMVCCSnapshot() call, sorry about that
and thanks very much for all the explanation!

I'll keep on hold the development of new patch versions for this thread
and focus on review and test the patch from Rishu at [1] to see if we
can make progress using the WAL approach.


[1] https://www.postgresql.org/message-id/CAK80%3DjipUfGC%2BUQSzeA4oCP9daRtHZGm2SQZWLxC9NWmVTDtRQ%40mail.gmail.com

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Masahiko Sawada <sawada.mshk@gmail.com> — 2025-09-15T17:40:40Z

On Sat, Sep 6, 2025 at 7:15 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Fri Sep 5, 2025 at 9:56 AM -03, Arseniy Mukhin wrote:
> >> Ok, I agree that that this may happen but I don't see this as a common
> >> case to fix the issue based on this behaviour. I think that we check the
> >> transaction status also to skip notifications that were added on the
> >> queue by transactions that are not fully committed yet, and I see this
> >> scenario as a more common case but I could be wrong.
> >>
> >
> > IIUC we don't need clog to check if the notification transaction is
> > still in progress, we use a snapshot for that (XidInMVCCSnapshot()).
> > If we see that the notification transaction is still in progress, we
> > postpone processing of that notification. If we see that notification
> > transaction is not in progress, then we check its status with
> > TransactionIdDidCommit() (using clog) and only then fail if clog
> > doesn't have notification transcation data (as a quick check, we can
> > see in the stack trace that the failure occurs during the call to
> > TransactionIdDidCommit, not XidInMVCCSnapshot).
> >
> > So we need to check notification transaction status only to understand
> > if a transaction was committed and we can send notification or it was
> > aborted (somewhere in between PreCommit_Notify and AtCommit_Notify)
> > and we should skip notification. And the solution that Yura Sokolov
> > suggested is to introduce a 'commited' flag, so we can check it
> > instead of calling TransactionIdDidCommit. Another option is to make
> > aborted transactions to clean up after themselves in AtAbort_Notify(),
> > so if listener see notification in the queue and transaction that
> > wrote it is not in progress, then such notification was always created
> > by committed transaction and we also don't need to call
> > TransactionIdDidCommit. One way for aborted transactions to clean up
> > is to set its notifications dbOid to 0, so listeners skip them.
> > Another option I think is to set queue HEAD back to the position where
> > it was when the aborted transaction started to write to the queue. It
> > also makes such aborted transaction notifications 'invisible' for
> > listeners. We have a global lock and there is always only one writer,
> > so it sounds possible. And there is another option which is the patch
> > that Rishu Bagga is working on. In the patch all transactions write to
> > the listen/notify queue only after they wrote a commit record to the
> > WAL, so all notifications in the queue are always by committed
> > transactions and we don't need to call TransactionIdDidCommit.
> >
> You are right, I missed the XidInMVCCSnapshot() call, sorry about that
> and thanks very much for all the explanation!
>
> I'll keep on hold the development of new patch versions for this thread
> and focus on review and test the patch from Rishu at [1] to see if we
> can make progress using the WAL approach.

While the WAL-based approach discussed on another thread is promising,
I think it would not be acceptable for back branches as it requires
quite a lot of refactoring. Given that this is a long-standing bug in
listen/notify, I think we can continue discussing how to fix the issue
on backbranches on this thread.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-18T21:34:40Z

On Mon Sep 15, 2025 at 2:40 PM -03, Masahiko Sawada wrote:
> While the WAL-based approach discussed on another thread is promising,
> I think it would not be acceptable for back branches as it requires
> quite a lot of refactoring. Given that this is a long-standing bug in
> listen/notify, I think we can continue discussing how to fix the issue
> on backbranches on this thread.
>
Please see the new attached patch, it has a different implementation
that I've previously posted which is based on the idea that Arseniy
posted on [1].

This new version include the "committed" field on AsyncQueueEntry struct
so that we can use this info when processing the notification instead of
call TransactionIdDidCommit()

The "committed" field is set to true when the AsyncQueueEntry is being
added on the SLRU page buffer when the PreCommit_Notify() is called. If
an error occurs between the PreCommit_Notify() and AtCommit_Notify() the
AtAbort_Notify() will be called and will set the "committed" field to
false for the notifications inside the aborted transaction.

It's a bit tricky to know at AtAbort_Notify() which notifications were
added on the SLRU page buffer by the aborted transaction, so I created a
new data structure and a global variable to keep track of this
information. See the commit message for more information.

On the previously patch that I've posted I've created a TAP test to
reproduce the issue with the VACUUM FREEZE, this new version also
include this test and also a new test case that use the injection points
extension to force an error between the PreCommit_Notify() and
AtCommit_Notify() so that we can ensure that these notifications of an
aborted transaction are not visible to other listener backends.

[1] https://www.postgresql.org/message-id/CAE7r3M%2BXwf0A_aNhu7pYQd7nDQaqaEnyCjtvqg8XsgManmPOUA%40mail.gmail.com

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-19T12:34:21Z

Hi,

On Fri, Sep 19, 2025 at 12:35 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Mon Sep 15, 2025 at 2:40 PM -03, Masahiko Sawada wrote:
> > While the WAL-based approach discussed on another thread is promising,
> > I think it would not be acceptable for back branches as it requires
> > quite a lot of refactoring. Given that this is a long-standing bug in
> > listen/notify, I think we can continue discussing how to fix the issue
> > on backbranches on this thread.
> >
> Please see the new attached patch, it has a different implementation
> that I've previously posted which is based on the idea that Arseniy
> posted on [1].
>

Thank you for the new version.

> This new version include the "committed" field on AsyncQueueEntry struct
> so that we can use this info when processing the notification instead of
> call TransactionIdDidCommit()
>
> The "committed" field is set to true when the AsyncQueueEntry is being
> added on the SLRU page buffer when the PreCommit_Notify() is called. If
> an error occurs between the PreCommit_Notify() and AtCommit_Notify() the
> AtAbort_Notify() will be called and will set the "committed" field to
> false for the notifications inside the aborted transaction.
>
> It's a bit tricky to know at AtAbort_Notify() which notifications were
> added on the SLRU page buffer by the aborted transaction, so I created a
> new data structure and a global variable to keep track of this
> information. See the commit message for more information.
>

I like this approach. We got rid of dependency on clog and don't limit
vacuum. Several points about the fix:

Is it correct to remember and reuse slru slots here? IIUC we can't do
it if we don't hold SLRU bank lock, because by the time we get in
AtAbort_Notify() the queue page could be already evicted. Probably we
need to use SimpleLruReadPage after we acquire the lock in
AtAbort_Notify()?

I think adding a boolean 'committed' is a good approach, but what do
you think about setting the queue head back to the position where
aborted transaction notifications start? We can do such a reset in
AtAbort_Notify(). So instead of marking notifications as
'commited=false' we completely erase them from the queue by moving the
queue head back. From listeners perspective if there is a notification
of completed transaction in the queue - it's always a committed
transaction, so again get rid of TransactionIdDidCommit() call. It
seems like a simpler approach because we don't need to remember all
notifications positions in the queue and don't need the additional
field 'committed'. All we need is to remember the head position before
we write anything to the queue, and reset it back if there is an
abort. IIUC Listeners will never send such erased notifications:
- while the aborted transaction is looking like 'in progress',
listeners can't send its notifications.
- by the time the aborted transaction is completed, the head is
already set back so erased notifications are located after the queue
head and listeners can't read it.

> On the previously patch that I've posted I've created a TAP test to
> reproduce the issue with the VACUUM FREEZE, this new version also
> include this test and also a new test case that use the injection points
> extension to force an error between the PreCommit_Notify() and
> AtCommit_Notify() so that we can ensure that these notifications of an
> aborted transaction are not visible to other listener backends.
>

I think it's a good test to have. FWIW there is a way to reproduce the
test condition without the injection point. We can use the fact that
serializable conflicts are checked after tx adds notifications to the
queue. Please find the attached patch with the example tap test. Not
sure if using injections points is more preferable?


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-19T14:36:43Z

On Fri Sep 19, 2025 at 9:34 AM -03, Arseniy Mukhin wrote:
> Hi,
>
Thanks for taking a look at this!

> I like this approach. We got rid of dependency on clog and don't limit
> vacuum. Several points about the fix:
>
> Is it correct to remember and reuse slru slots here? IIUC we can't do
> it if we don't hold SLRU bank lock, because by the time we get in
> AtAbort_Notify() the queue page could be already evicted. Probably we
> need to use SimpleLruReadPage after we acquire the lock in
> AtAbort_Notify()?
>
Yeah, I think it make sense. Maybe we could just store the page number
and offset and call SimpleLruReadPage() inside AtAbort_Notify() to get
the slotno?

> I think adding a boolean 'committed' is a good approach, but what do
> you think about setting the queue head back to the position where
> aborted transaction notifications start? We can do such a reset in
> AtAbort_Notify(). So instead of marking notifications as
> 'commited=false' we completely erase them from the queue by moving the
> queue head back. From listeners perspective if there is a notification
> of completed transaction in the queue - it's always a committed
> transaction, so again get rid of TransactionIdDidCommit() call. It
> seems like a simpler approach because we don't need to remember all
> notifications positions in the queue and don't need the additional
> field 'committed'. All we need is to remember the head position before
> we write anything to the queue, and reset it back if there is an
> abort. IIUC Listeners will never send such erased notifications:
> - while the aborted transaction is looking like 'in progress',
> listeners can't send its notifications.
> - by the time the aborted transaction is completed, the head is
> already set back so erased notifications are located after the queue
> head and listeners can't read it.
>
I think that with this approach we may have a scenario where when we
enter the AtAbort_Notify() the queue head may not be the notification
from the transaction being aborted but it can be a notification from
another committed transaction, so resetting the queue head back to the
position before the aborted transaction would make us lose the
notification from the committed transaction. Is that make sense? What do
you think?

> I think it's a good test to have. FWIW there is a way to reproduce the
> test condition without the injection point. We can use the fact that
> serializable conflicts are checked after tx adds notifications to the
> queue. Please find the attached patch with the example tap test. Not
> sure if using injections points is more preferable?
>
Great, I think that if we can have a way to reproduce this without an
injection point would be better. I dind't look the test yet but I'll try
to incorporate this on the next patch version. Thanks!

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-19T15:13:54Z

On Fri, Sep 19, 2025 at 5:36 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Fri Sep 19, 2025 at 9:34 AM -03, Arseniy Mukhin wrote:
> > Hi,
> >
> Thanks for taking a look at this!
>
> > I like this approach. We got rid of dependency on clog and don't limit
> > vacuum. Several points about the fix:
> >
> > Is it correct to remember and reuse slru slots here? IIUC we can't do
> > it if we don't hold SLRU bank lock, because by the time we get in
> > AtAbort_Notify() the queue page could be already evicted. Probably we
> > need to use SimpleLruReadPage after we acquire the lock in
> > AtAbort_Notify()?
> >
> Yeah, I think it make sense. Maybe we could just store the page number
> and offset and call SimpleLruReadPage() inside AtAbort_Notify() to get
> the slotno?

Yeah, I think it would work.

>
> > I think adding a boolean 'committed' is a good approach, but what do
> > you think about setting the queue head back to the position where
> > aborted transaction notifications start? We can do such a reset in
> > AtAbort_Notify(). So instead of marking notifications as
> > 'commited=false' we completely erase them from the queue by moving the
> > queue head back. From listeners perspective if there is a notification
> > of completed transaction in the queue - it's always a committed
> > transaction, so again get rid of TransactionIdDidCommit() call. It
> > seems like a simpler approach because we don't need to remember all
> > notifications positions in the queue and don't need the additional
> > field 'committed'. All we need is to remember the head position before
> > we write anything to the queue, and reset it back if there is an
> > abort. IIUC Listeners will never send such erased notifications:
> > - while the aborted transaction is looking like 'in progress',
> > listeners can't send its notifications.
> > - by the time the aborted transaction is completed, the head is
> > already set back so erased notifications are located after the queue
> > head and listeners can't read it.
> >
> I think that with this approach we may have a scenario where when we
> enter the AtAbort_Notify() the queue head may not be the notification
> from the transaction being aborted but it can be a notification from
> another committed transaction, so resetting the queue head back to the
> position before the aborted transaction would make us lose the
> notification from the committed transaction. Is that make sense? What do
> you think?

I think it's impossible: before pushing anything to the queue we
acquire global lock in PreCommit_Notify():

LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);

While we are holding the lock, no writers can add anything to the
queue. Then we save head position and add pending notifications to the
queue. The moment we get in AtAbort_Notify(), we still hold the global
lock (maybe it is worth adding Assert about it if we start relying on
it), so we can be sure there are no notifications in the queue after
the saved head position except ours. So it seems safe but maybe I
missed something.


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-22T13:09:20Z

On Fri Sep 19, 2025 at 12:13 PM -03, Arseniy Mukhin wrote:
> I think it's impossible: before pushing anything to the queue we
> acquire global lock in PreCommit_Notify():
>
> LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
>
> While we are holding the lock, no writers can add anything to the
> queue. Then we save head position and add pending notifications to the
> queue. The moment we get in AtAbort_Notify(), we still hold the global
> lock (maybe it is worth adding Assert about it if we start relying on
> it), so we can be sure there are no notifications in the queue after
> the saved head position except ours. So it seems safe but maybe I
> missed something.
>
Thanks for the explanation! I'm just not sure if I understand why do we
need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) on
PreCommit_Notify() if we already have the
LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);

See the attached patch that is based on your the previous comment of
resetting the QUEUE_HEAD at AtAbort_Notify()

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-23T16:11:35Z

On Mon, Sep 22, 2025 at 4:09 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Fri Sep 19, 2025 at 12:13 PM -03, Arseniy Mukhin wrote:
> > I think it's impossible: before pushing anything to the queue we
> > acquire global lock in PreCommit_Notify():
> >
> > LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
> >
> > While we are holding the lock, no writers can add anything to the
> > queue. Then we save head position and add pending notifications to the
> > queue. The moment we get in AtAbort_Notify(), we still hold the global
> > lock (maybe it is worth adding Assert about it if we start relying on
> > it), so we can be sure there are no notifications in the queue after
> > the saved head position except ours. So it seems safe but maybe I
> > missed something.
> >
> Thanks for the explanation! I'm just not sure if I understand why do we
> need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) on
> PreCommit_Notify() if we already have the
> LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
>

Good question. It seems that it would be enough to hold
NotifyQueueLock only during the head update, so I don't understand it
either.

> See the attached patch that is based on your the previous comment of
> resetting the QUEUE_HEAD at AtAbort_Notify()

Thank you for the new version!

The fix looks exactly how I thought about it. But while thinking about
why we need to hold NotifyQueueLock in PreCommit_Notify I realized
there is a concurrency bug in the 'head resetting' approach. I thought
that listeners hold NotifyQueueLock lock while reading notifications
from the queue, but now I see that they hold it only while reading the
current head position. So we can end up in the next situation:

There are 2 backends:  listener and writer (backend with notifications)

        listener                                   writer
  ----------------------------------------------------------------------
                                         writer wants to commit, so it
                                         adds notifications to the queue
                                         in PreCommit_Notify() and advances
                                         queue head.

  ----------------------------------------------------------------------
listener wants to read notifications. It
gets into asyncQueueReadAllNotifications()
and reads the current head (it already
contains writer's notifications):

asyncQueueReadAllNotifications()
          ...
    LWLockAcquire(NotifyQueueLock, LW_SHARED);
    head = QUEUE_HEAD;
    LWLockRelease(NotifyQueueLock);
          ...

  ----------------------------------------------------------------------
                                         writer failed to commit, so it
                                         resets queue head in AtAbort_Notify()
                                         and completes the transaction.

  ----------------------------------------------------------------------
listener gets a snapshot where the writer
is not in progress.

        ...
    snapshot = RegisterSnapshot(GetLatestSnapshot());
        ...

This way the listener reads the head that includes all writer's
notifications and a snapshot where the writer is not in progress, so
nothing stops the listener from sending these notifications and it's
even possible to have the listener's position that is after the queue
head, so yes, it's bad :( Sorry about that.

Probably we can fix it (swap GetLatestSnapshot() with the head
position reading for example) but now I think that 'committed' field
approach is a more robust way to fix the bug. What do you think?

BTW one thing about 'committed' field fix version [0]. It seems that
instead of remembering all notification positions, we can remember the
head position after we acquire global lock and before we write
anything to the queue and in case of abort we can just start from the
saved position and mark all notifications until the end of the queue
as 'committed = false'. The reason is the same as with resetting the
head approach - as we hold global lock, no writers can add
notifications to the queue, so when we in AtAbort_Notify() all
notifications after the saved position are ours.

[0] https://www.postgresql.org/message-id/CAFY6G8fui7omKfUXF%2BJJ82B34ExrwK6J7vKe61S-DhEmr_jBhA%40mail.gmail.com

Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-23T22:39:57Z

On Tue Sep 23, 2025 at 1:11 PM -03, Arseniy Mukhin wrote:
>> Thanks for the explanation! I'm just not sure if I understand why do we
>> need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) on
>> PreCommit_Notify() if we already have the
>> LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
>>
>
> Good question. It seems that it would be enough to hold
> NotifyQueueLock only during the head update, so I don't understand it
> either.
>
IIUC correctly we acquire the LockSharedObject(DatabaseRelationId,
InvalidOid, 0, AccessExclusiveLock) to make other COMMIT's wait and the
LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) to make listener backends
wait while we are adding the entries on the queue.

> Thank you for the new version!
>
> The fix looks exactly how I thought about it. But while thinking about
> why we need to hold NotifyQueueLock in PreCommit_Notify I realized
> there is a concurrency bug in the 'head resetting' approach. I thought
> that listeners hold NotifyQueueLock lock while reading notifications
> from the queue, but now I see that they hold it only while reading the
> current head position. So we can end up in the next situation:
>
> There are 2 backends:  listener and writer (backend with notifications)
>
>
>         listener                                   writer
>   ----------------------------------------------------------------------
>                                          writer wants to commit, so it
>                                          adds notifications to the queue
>                                          in PreCommit_Notify() and advances
>                                          queue head.
>
>   ----------------------------------------------------------------------
> listener wants to read notifications. It
> gets into asyncQueueReadAllNotifications()
> and reads the current head (it already
> contains writer's notifications):
>
>
> asyncQueueReadAllNotifications()
>           ...
>     LWLockAcquire(NotifyQueueLock, LW_SHARED);
>     head = QUEUE_HEAD;
>     LWLockRelease(NotifyQueueLock);
>           ...
>
>   ----------------------------------------------------------------------
>                                          writer failed to commit, so it
>                                          resets queue head in AtAbort_Notify()
>                                          and completes the transaction.
>
>   ----------------------------------------------------------------------
> listener gets a snapshot where the writer
> is not in progress.
>
>         ...
>     snapshot = RegisterSnapshot(GetLatestSnapshot());
>         ...
>
>
> This way the listener reads the head that includes all writer's
> notifications and a snapshot where the writer is not in progress, so
> nothing stops the listener from sending these notifications and it's
> even possible to have the listener's position that is after the queue
> head, so yes, it's bad :( Sorry about that.
>
Yeah, this is bad. I'm wondering if we could reproduce such race
conditions scenarios with some TAP tests.

> Probably we can fix it (swap GetLatestSnapshot() with the head
> position reading for example) but now I think that 'committed' field
> approach is a more robust way to fix the bug. What do you think?
>
I also agree that the committed field seems a more safe approach.

> BTW one thing about 'committed' field fix version [0]. It seems that
> instead of remembering all notification positions, we can remember the
> head position after we acquire global lock and before we write
> anything to the queue and in case of abort we can just start from the
> saved position and mark all notifications until the end of the queue
> as 'committed = false'. The reason is the same as with resetting the
> head approach - as we hold global lock, no writers can add
> notifications to the queue, so when we in AtAbort_Notify() all
> notifications after the saved position are ours.
>
See the new attached version which implements this idea of using the
committed field approach. I was just a bit concenerd about a race
condition situation where the QUEUE_HEAD is changed by another publisher
process and just iterating over all entries from the saved previous
QUEUE_HEAD position until the current QUEUE_HEAD position we could mark
successfully committed notifications as not committed by mistake, so in
this new patch version I save the QUEUE_HEAD position before we add the
entries on the shared queue and also the QUEUE_HEAD position after these
entries are added so we ensure that we only process the entries of this
range although we have the global lock
LockSharedObject(DatabaseRelationId) that may prevent this situation.

What do you think?

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-24T13:59:07Z

On Wed, Sep 24, 2025 at 1:40 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Tue Sep 23, 2025 at 1:11 PM -03, Arseniy Mukhin wrote:
> >> Thanks for the explanation! I'm just not sure if I understand why do we
> >> need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) on
> >> PreCommit_Notify() if we already have the
> >> LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
> >>
> >
> > Good question. It seems that it would be enough to hold
> > NotifyQueueLock only during the head update, so I don't understand it
> > either.
> >
> IIUC correctly we acquire the LockSharedObject(DatabaseRelationId,
> InvalidOid, 0, AccessExclusiveLock) to make other COMMIT's wait and the
> LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) to make listener backends
> wait while we are adding the entries on the queue.

Yeah, but I don't get why we want to block them with NotifyQueueLock
while we are writing to the queue. There is a SLRU bank lock and they
can't read anything after the head. Maybe it's just not a big deal?

> ...
> > This way the listener reads the head that includes all writer's
> > notifications and a snapshot where the writer is not in progress, so
> > nothing stops the listener from sending these notifications and it's
> > even possible to have the listener's position that is after the queue
> > head, so yes, it's bad :( Sorry about that.
> >
> Yeah, this is bad. I'm wondering if we could reproduce such race
> conditions scenarios with some TAP tests.
>

I agree it would be great to have more tests for such cases. As for
the 'committed field' patch, I think we can add a TAP test that shows
that listeners postpone processing of notifications until
notifications were marked as 'committed=false' in case of aborted
transactions. I tried to write one, but have not succeeded yet. Hope
to finish it soon.

> > Probably we can fix it (swap GetLatestSnapshot() with the head
> > position reading for example) but now I think that 'committed' field
> > approach is a more robust way to fix the bug. What do you think?
> >
> I also agree that the committed field seems a more safe approach.
>
> > BTW one thing about 'committed' field fix version [0]. It seems that
> > instead of remembering all notification positions, we can remember the
> > head position after we acquire global lock and before we write
> > anything to the queue and in case of abort we can just start from the
> > saved position and mark all notifications until the end of the queue
> > as 'committed = false'. The reason is the same as with resetting the
> > head approach - as we hold global lock, no writers can add
> > notifications to the queue, so when we in AtAbort_Notify() all
> > notifications after the saved position are ours.
> >
> See the new attached version which implements this idea of using the
> committed field approach. I was just a bit concenerd about a race
> condition situation where the QUEUE_HEAD is changed by another publisher
> process and just iterating over all entries from the saved previous
> QUEUE_HEAD position until the current QUEUE_HEAD position we could mark
> successfully committed notifications as not committed by mistake, so in
> this new patch version I save the QUEUE_HEAD position before we add the
> entries on the shared queue and also the QUEUE_HEAD position after these
> entries are added so we ensure that we only process the entries of this
> range although we have the global lock
> LockSharedObject(DatabaseRelationId) that may prevent this situation.

Thank you! Speaking of the scenario, my understanding is that it's
impossible as we hold the global lock, so QueuePosition head should
always be equal to QUEUE_HEAD when we get into At_AbortNotify(), but
maybe I'm wrong.

Patch looks great. Some minor points:

I have a warning when using git am with the patch:
     warning: 1 line adds whitespace errors.

There is a comment in the head of the async.c file about some
listen/notify internals. Maybe it's worth adding a comment about how
aborted transactions do clean up.

What do you think about a Assert in asyncQueueRollbackNotifications()
that other backends still see us as 'in progress'? So we can be sure
that they can't process our notifications before we mark notifications
as 'committed=false'. Not sure how to do it correctly, maybe

          Assert(TransactionIdIsValid(MyProc->xid));

will work? The TAP test that I tried to write also should test it.

There are several comments where the word "crash" is used. What do you
think about using "abort" instead? "Crash" sounds more like PANIC
situation where we don't care about notifications because they don't
survive restart.

Thank you!

Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-09-24T19:23:08Z

On Wed Sep 24, 2025 at 10:59 AM -03, Arseniy Mukhin wrote:
> Thank you! Speaking of the scenario, my understanding is that it's
> impossible as we hold the global lock, so QueuePosition head should
> always be equal to QUEUE_HEAD when we get into At_AbortNotify(), but
> maybe I'm wrong.
>
Let's see if anyone else has any thoughts about this.

> Patch looks great. Some minor points:
>
> I have a warning when using git am with the patch:
>      warning: 1 line adds whitespace errors.
>
I don't think that this is a problem? And I don't know how to remove
this warning, I've just created the patch with git format-patch @~1

> There is a comment in the head of the async.c file about some
> listen/notify internals. Maybe it's worth adding a comment about how
> aborted transactions do clean up.
>
Added

> What do you think about a Assert in asyncQueueRollbackNotifications()
> that other backends still see us as 'in progress'? So we can be sure
> that they can't process our notifications before we mark notifications
> as 'committed=false'. Not sure how to do it correctly, maybe
>
>           Assert(TransactionIdIsValid(MyProc->xid));
>
> will work? The TAP test that I tried to write also should test it.
>
Sounds resanable to me. I think that we could use the following assert
when iterating over the entries, what do you think?
			Assert(TransactionIdIsInProgress(qe->xid));

> There are several comments where the word "crash" is used. What do you
> think about using "abort" instead? "Crash" sounds more like PANIC
> situation where we don't care about notifications because they don't
> survive restart.
>
Good point, fixed

> Thank you!
>
Thanks for reviewing this!

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-25T09:43:34Z

On Wed, Sep 24, 2025 at 10:23 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Wed Sep 24, 2025 at 10:59 AM -03, Arseniy Mukhin wrote:
> > ...
> > Patch looks great. Some minor points:
> >
> > I have a warning when using git am with the patch:
> >      warning: 1 line adds whitespace errors.
> >
> I don't think that this is a problem? And I don't know how to remove
> this warning, I've just created the patch with git format-patch @~1
>

In my experience a run of pgindent usually fixes all such warnings,
but I agree it's not a big deal.

>
> > What do you think about a Assert in asyncQueueRollbackNotifications()
> > that other backends still see us as 'in progress'? So we can be sure
> > that they can't process our notifications before we mark notifications
> > as 'committed=false'. Not sure how to do it correctly, maybe
> >
> >           Assert(TransactionIdIsValid(MyProc->xid));
> >
> > will work? The TAP test that I tried to write also should test it.
> >
> Sounds resanable to me. I think that we could use the following assert
> when iterating over the entries, what do you think?
>                         Assert(TransactionIdIsInProgress(qe->xid));
>

Yeah, TransactionIdIsInProgress() looks like the right way to do it.
And one more 'Assert' idea... maybe we can add
Assert(TransactionIdIsCurrentTransactionId(qe->xid)) just to verify
that we are only touching our own notifications?

Thank you, the fixes look good.


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-09-28T13:17:48Z

Hi,

On Wed, Sep 24, 2025 at 4:59 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
>
> On Wed, Sep 24, 2025 at 1:40 AM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
> > ...
> > > This way the listener reads the head that includes all writer's
> > > notifications and a snapshot where the writer is not in progress, so
> > > nothing stops the listener from sending these notifications and it's
> > > even possible to have the listener's position that is after the queue
> > > head, so yes, it's bad :( Sorry about that.
> > >
> > Yeah, this is bad. I'm wondering if we could reproduce such race
> > conditions scenarios with some TAP tests.
> >
>
> I agree it would be great to have more tests for such cases. As for
> the 'committed field' patch, I think we can add a TAP test that shows
> that listeners postpone processing of notifications until
> notifications were marked as 'committed=false' in case of aborted
> transactions. I tried to write one, but have not succeeded yet. Hope
> to finish it soon.

I finally managed to write a TAP test for it, so there is a new
version with the tap test.

I also realized that we can increase test coverage in
002_aborted_tx_notifies.pl if notifications of the aborted transaction
span several pages. This way we can better test
asyncQueueRollbackNotifications(). So I changed
002_aborted_tx_notifies.pl TAP test a bit.

And there is a small indentation change in lmgr.h that should fix this
git am warning.

I added all changes as a separate patch file (0002) so it was more
clear what changed. Please feel free to merge into the main patch file
/ drop any part of it that makes sense to you.

Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-01T11:57:38Z

On Sun Sep 28, 2025 at 10:17 AM -03, Arseniy Mukhin wrote:
>> I agree it would be great to have more tests for such cases. As for
>> the 'committed field' patch, I think we can add a TAP test that shows
>> that listeners postpone processing of notifications until
>> notifications were marked as 'committed=false' in case of aborted
>> transactions. I tried to write one, but have not succeeded yet. Hope
>> to finish it soon.
>
> I finally managed to write a TAP test for it, so there is a new
> version with the tap test.
>
> I also realized that we can increase test coverage in
> 002_aborted_tx_notifies.pl if notifications of the aborted transaction
> span several pages. This way we can better test
> asyncQueueRollbackNotifications(). So I changed
> 002_aborted_tx_notifies.pl TAP test a bit.
>
> And there is a small indentation change in lmgr.h that should fix this
> git am warning.
>
Thanks for the patches.

I've created a CF entry so we can get more reviews and comments:
https://commitfest.postgresql.org/patch/6095/

Let's see what other think about the approach being used to fix this
issue.

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-02T12:06:38Z

Hi,

On Wed, Oct 1, 2025 at 2:57 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Sun Sep 28, 2025 at 10:17 AM -03, Arseniy Mukhin wrote:
> >> I agree it would be great to have more tests for such cases. As for
> >> the 'committed field' patch, I think we can add a TAP test that shows
> >> that listeners postpone processing of notifications until
> >> notifications were marked as 'committed=false' in case of aborted
> >> transactions. I tried to write one, but have not succeeded yet. Hope
> >> to finish it soon.
> >
> > I finally managed to write a TAP test for it, so there is a new
> > version with the tap test.
> >
> > I also realized that we can increase test coverage in
> > 002_aborted_tx_notifies.pl if notifications of the aborted transaction
> > span several pages. This way we can better test
> > asyncQueueRollbackNotifications(). So I changed
> > 002_aborted_tx_notifies.pl TAP test a bit.
> >
> > And there is a small indentation change in lmgr.h that should fix this
> > git am warning.
> >
> Thanks for the patches.
>
> I've created a CF entry so we can get more reviews and comments:
> https://commitfest.postgresql.org/patch/6095/
>

Thank you.

There is a test failure on CI, so please find the new patch version
with the fix (Makefile was updated a little bit). And I merged 0002
file with tap tests into 0001.


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Álvaro Herrera <alvherre@kurilemu.de> — 2025-10-17T13:51:01Z

I have the impression that this thread has lost focus on the idea of
producing a backpatchable bugfix.  The last proposed patch has a lot of
new mechanism that doesn't seem suitable for backpatch.  I could be
wrong of course.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"[PostgreSQL] is a great group; in my opinion it is THE best open source
development communities in existence anywhere."                (Lamar Owen)

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-17T20:50:53Z

On Fri, Oct 17, 2025 at 7:31 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> I have the impression that this thread has lost focus on the idea of
> producing a backpatchable bugfix.  The last proposed patch has a lot of
> new mechanism that doesn't seem suitable for backpatch.  I could be
> wrong of course.
>

Oops, I guess the TAP test that I added in the last version and that
uses injection points is one of those things. PFA the new version
without it. I also noticed that CheckSharedObjectLockedByMe() which
was introduced in one of previous versions is unused, so I removed it
too.

Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-10-18T05:43:15Z

On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
> On Fri, Oct 17, 2025 at 7:31 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>>
>> I have the impression that this thread has lost focus on the idea of
>> producing a backpatchable bugfix.  The last proposed patch has a lot of
>> new mechanism that doesn't seem suitable for backpatch.  I could be
>> wrong of course.
>>
>
> Oops, I guess the TAP test that I added in the last version and that
> uses injection points is one of those things. PFA the new version
> without it. I also noticed that CheckSharedObjectLockedByMe() which
> was introduced in one of previous versions is unused, so I removed it
> too.

What a funny coincidence that the approach in this patch,
has one similarity with the "Direct advancement" approach
in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
namely that we're both interested in QUEUE_HEAD before/after
we push the notifications into the queue, in PreCommit_Notify().

It looks to me like our new data structures are Interchangeable,
so I guess we probably want both patches to eventually settle
on one and the same?

The differences I note between our queue head before/after code are:

- In this patch, you are palloc'ing a struct with two fields.
  In [1], we're using two separate static QueuePosition variables.

- In this patch, you are taking/releasing a shared lock before/after
  the loop to read QUEUE_HEAD and set previousHead/head.
  In [1], we avoid the need of the shared lock, by doing the reads
  within the existing exclusive lock inside the loop, but instead
  therefore need a firstIteration bool, to know which is the first
  iteration, and need to overwrite the after-var in each iteration.

I don't think the noted differences above matter, both seems fine.

Another thing I noticed in your patch that made me wonder,
is the naming of the new AsyncQueueEntry bool field,
which is given the name "committed".

I think this name is not entirely faithful, since when set to true,
the entry has not been committed yet.

How about negating the meaning of this boolean field?
To instead indicate when the entry has been rollbacked.
Then, it would clearly communicate just that.

Maybe naming it something like "rollbacked" or "aborted"?

/Joel

[1] https://www.postgresql.org/message-id/flat/6899c044-4a82-49be-8117-e6f669765f7e%40app.fastmail.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-10-19T17:14:07Z

On Fri, Oct 17, 2025, at 15:51, Álvaro Herrera wrote:
> I have the impression that this thread has lost focus on the idea of
> producing a backpatchable bugfix.  The last proposed patch has a lot of
> new mechanism that doesn't seem suitable for backpatch.  I could be
> wrong of course.

I've tried to create a minimal isolated fix, hopefully suitable for
backpatching, with no new mechanisms, other than the added
GetOldestQueuedNotifyXid used by vac_update_datfrozenxid.

It's based on the approach discussed earlier in this thread, that just
goes through the notification queue from QUEUE_TAIL to QUEUE_HEAD, to
find the oldestXid in the current database.

Implementation:

* Break out SLRU read page code from asyncQueueReadAllNotifications into
  new helper-function asyncQueueReadPageToBuffer.

* Add GetOldestQueuedNotifyXid which uses the new helper-function
  asyncQueueReadPageToBuffer.

It passes the 001_xid_freeze.pl test, not included in this patch.

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-20T12:37:24Z

On Sat Oct 18, 2025 at 2:43 AM -03, Joel Jacobson wrote:
> On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
> What a funny coincidence that the approach in this patch,
> has one similarity with the "Direct advancement" approach
> in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
> namely that we're both interested in QUEUE_HEAD before/after
> we push the notifications into the queue, in PreCommit_Notify().
>
> It looks to me like our new data structures are Interchangeable,
> so I guess we probably want both patches to eventually settle
> on one and the same?
>
> The differences I note between our queue head before/after code are:
>
> - In this patch, you are palloc'ing a struct with two fields.
>   In [1], we're using two separate static QueuePosition variables.
>
> - In this patch, you are taking/releasing a shared lock before/after
>   the loop to read QUEUE_HEAD and set previousHead/head.
>   In [1], we avoid the need of the shared lock, by doing the reads
>   within the existing exclusive lock inside the loop, but instead
>   therefore need a firstIteration bool, to know which is the first
>   iteration, and need to overwrite the after-var in each iteration.
>
> I don't think the noted differences above matter, both seems fine.
>
Yeah, I also think that both approach seems fine. I keep the v8 version
with the palloc, if someone has any concern about this I'm open to
switch to another approach.

> Another thing I noticed in your patch that made me wonder,
> is the naming of the new AsyncQueueEntry bool field,
> which is given the name "committed".
>
> I think this name is not entirely faithful, since when set to true,
> the entry has not been committed yet.
>
> How about negating the meaning of this boolean field?
> To instead indicate when the entry has been rollbacked.
> Then, it would clearly communicate just that.
>
> Maybe naming it something like "rollbacked" or "aborted"?
>
Good point. I've renamed this field on the attached v8 version.

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-20T12:41:43Z

On Sun Oct 19, 2025 at 2:14 PM -03, Joel Jacobson wrote:
> On Fri, Oct 17, 2025, at 15:51, Álvaro Herrera wrote:
>> I have the impression that this thread has lost focus on the idea of
>> producing a backpatchable bugfix.  The last proposed patch has a lot of
>> new mechanism that doesn't seem suitable for backpatch.  I could be
>> wrong of course.
>
> I've tried to create a minimal isolated fix, hopefully suitable for
> backpatching, with no new mechanisms, other than the added
> GetOldestQueuedNotifyXid used by vac_update_datfrozenxid.
>
> It's based on the approach discussed earlier in this thread, that just
> goes through the notification queue from QUEUE_TAIL to QUEUE_HEAD, to
> find the oldestXid in the current database.
>
> Implementation:
>
> * Break out SLRU read page code from asyncQueueReadAllNotifications into
>   new helper-function asyncQueueReadPageToBuffer.
>
> * Add GetOldestQueuedNotifyXid which uses the new helper-function
>   asyncQueueReadPageToBuffer.
>
> It passes the 001_xid_freeze.pl test, not included in this patch.
>
This is similar to what was already proposed at [1]. This approach was
abandoned because a notification on the queue may block datfrozenxid
advance and clog truncation which can cause other issues for the users [2].


[1] https://www.postgresql.org/message-id/CAFY6G8cJm73_MM9SuynZUqtqcaTuepUDgDuvS661oLW7U0dgsg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/d186fba0-dc65-4274-aa96-3906bbb2e530%40postgrespro.ru

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Álvaro Herrera <alvherre@kurilemu.de> — 2025-10-20T14:18:50Z

On 2025-Oct-20, Matheus Alcantara wrote:

> This is similar to what was already proposed at [1]. This approach was
> abandoned because a notification on the queue may block datfrozenxid
> advance and clog truncation which can cause other issues for the users [2].

Well, I think that this is the right solution for backpatching, and that
you were wrong to abandon it.  You can continue to design a better
mechanism for the master branch, but in old branches we cannot really do
all those things you're proposing to do.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; jugar
al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
La Feria de las Tinieblas, R. Bradbury)

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-20T18:19:50Z

On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
> On 2025-Oct-20, Matheus Alcantara wrote:
>
>> This is similar to what was already proposed at [1]. This approach was
>> abandoned because a notification on the queue may block datfrozenxid
>> advance and clog truncation which can cause other issues for the users [2].
>
> Well, I think that this is the right solution for backpatching, and that
> you were wrong to abandon it.  You can continue to design a better
> mechanism for the master branch, but in old branches we cannot really do
> all those things you're proposing to do.
> 
I actually would prefer this approach TBH, but since this can cause
other issues like transaction wraparound due to not consumed
notifications we would need other mechanisms to prevent that and I'm not
sure if users should expect this kind of behavior changes on minor
version updates? 

I think that to go with this solution we would need some way to drop too
old notifications from the queue to advance the datfrozenxid, so I
imagine that we would need some GUC to make this configurable and we can
configure a default value of course but some use cases may not be the
best configuration, this is something that users should expected to deal
on minor version updates?

Going with the "self contained" idea sound more easier to backpatch
actually, so this is the main reason that I abandoned this other
approach. Could you please point what make the v8 version not visible
for bachpatching?

Thanks for the comments!

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Masahiko Sawada <sawada.mshk@gmail.com> — 2025-10-21T01:31:54Z

On Mon, Oct 20, 2025 at 5:37 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Sat Oct 18, 2025 at 2:43 AM -03, Joel Jacobson wrote:
> > On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
> > What a funny coincidence that the approach in this patch,
> > has one similarity with the "Direct advancement" approach
> > in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
> > namely that we're both interested in QUEUE_HEAD before/after
> > we push the notifications into the queue, in PreCommit_Notify().
> >
> > It looks to me like our new data structures are Interchangeable,
> > so I guess we probably want both patches to eventually settle
> > on one and the same?
> >
> > The differences I note between our queue head before/after code are:
> >
> > - In this patch, you are palloc'ing a struct with two fields.
> >   In [1], we're using two separate static QueuePosition variables.
> >
> > - In this patch, you are taking/releasing a shared lock before/after
> >   the loop to read QUEUE_HEAD and set previousHead/head.
> >   In [1], we avoid the need of the shared lock, by doing the reads
> >   within the existing exclusive lock inside the loop, but instead
> >   therefore need a firstIteration bool, to know which is the first
> >   iteration, and need to overwrite the after-var in each iteration.
> >
> > I don't think the noted differences above matter, both seems fine.
> >
> Yeah, I also think that both approach seems fine. I keep the v8 version
> with the palloc, if someone has any concern about this I'm open to
> switch to another approach.
>
> > Another thing I noticed in your patch that made me wonder,
> > is the naming of the new AsyncQueueEntry bool field,
> > which is given the name "committed".
> >
> > I think this name is not entirely faithful, since when set to true,
> > the entry has not been committed yet.
> >
> > How about negating the meaning of this boolean field?
> > To instead indicate when the entry has been rollbacked.
> > Then, it would clearly communicate just that.
> >
> > Maybe naming it something like "rollbacked" or "aborted"?
> >
> Good point. I've renamed this field on the attached v8 version.

I've reviewed the v8 patch and I'm not sure it's a bullet-proof
approach. The basic idea of the v8 patch is to add async entries with
rollbacked=false and we set rollbacked=true in
asyncQueueRollbackNotifications called from AtAbort_Notify() if an
error happens between PreCommit_Notify() and AtCommit_Notify(). The
asyncQueueRollbackNotifications() reads SLRU pages to mark entries
'rollbacked' but reading SLRU pages could fail for some reason. In
this case, we would end up with a PANIC for recursive errors, or even
if we skip marking entries we would end up leaving entries as
committed. Also, if there are a lot of notifications across multiple
SLRU pages we need to mark as 'rollbacked',
asyncQueueRollbackNotifications() could take time to complete but it's
not interruptible. I don't think it's a good idea to introduce such an
operation in abort paths.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-21T20:26:55Z

On Tue, Oct 21, 2025 at 4:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Oct 20, 2025 at 5:37 AM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
> >
> > On Sat Oct 18, 2025 at 2:43 AM -03, Joel Jacobson wrote:
> > > On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
> > > What a funny coincidence that the approach in this patch,
> > > has one similarity with the "Direct advancement" approach
> > > in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
> > > namely that we're both interested in QUEUE_HEAD before/after
> > > we push the notifications into the queue, in PreCommit_Notify().
> > >
> > > It looks to me like our new data structures are Interchangeable,
> > > so I guess we probably want both patches to eventually settle
> > > on one and the same?
> > >
> > > The differences I note between our queue head before/after code are:
> > >
> > > - In this patch, you are palloc'ing a struct with two fields.
> > >   In [1], we're using two separate static QueuePosition variables.
> > >
> > > - In this patch, you are taking/releasing a shared lock before/after
> > >   the loop to read QUEUE_HEAD and set previousHead/head.
> > >   In [1], we avoid the need of the shared lock, by doing the reads
> > >   within the existing exclusive lock inside the loop, but instead
> > >   therefore need a firstIteration bool, to know which is the first
> > >   iteration, and need to overwrite the after-var in each iteration.
> > >
> > > I don't think the noted differences above matter, both seems fine.
> > >
> > Yeah, I also think that both approach seems fine. I keep the v8 version
> > with the palloc, if someone has any concern about this I'm open to
> > switch to another approach.
> >
> > > Another thing I noticed in your patch that made me wonder,
> > > is the naming of the new AsyncQueueEntry bool field,
> > > which is given the name "committed".
> > >
> > > I think this name is not entirely faithful, since when set to true,
> > > the entry has not been committed yet.
> > >
> > > How about negating the meaning of this boolean field?
> > > To instead indicate when the entry has been rollbacked.
> > > Then, it would clearly communicate just that.
> > >
> > > Maybe naming it something like "rollbacked" or "aborted"?
> > >
> > Good point. I've renamed this field on the attached v8 version.
>
> I've reviewed the v8 patch and I'm not sure it's a bullet-proof
> approach. The basic idea of the v8 patch is to add async entries with
> rollbacked=false and we set rollbacked=true in
> asyncQueueRollbackNotifications called from AtAbort_Notify() if an
> error happens between PreCommit_Notify() and AtCommit_Notify(). The
> asyncQueueRollbackNotifications() reads SLRU pages to mark entries
> 'rollbacked' but reading SLRU pages could fail for some reason. In
> this case, we would end up with a PANIC for recursive errors, or even
> if we skip marking entries we would end up leaving entries as
> committed. Also, if there are a lot of notifications across multiple
> SLRU pages we need to mark as 'rollbacked',
> asyncQueueRollbackNotifications() could take time to complete but it's
> not interruptible. I don't think it's a good idea to introduce such an
> operation in abort paths.
>

Thanks for the explanation! Now I see why using AtAbort_Notify() was a
bad idea. It might be possible to solve the second problem and make
the amount of work in AtAbort_Notify independent of the number of
notifications. But I have no idea how to solve the SLRU page issue.


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Masahiko Sawada <sawada.mshk@gmail.com> — 2025-10-21T21:42:18Z

On Mon, Oct 20, 2025 at 11:19 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
> > On 2025-Oct-20, Matheus Alcantara wrote:
> >
> >> This is similar to what was already proposed at [1]. This approach was
> >> abandoned because a notification on the queue may block datfrozenxid
> >> advance and clog truncation which can cause other issues for the users [2].
> >
> > Well, I think that this is the right solution for backpatching, and that
> > you were wrong to abandon it.  You can continue to design a better
> > mechanism for the master branch, but in old branches we cannot really do
> > all those things you're proposing to do.
> >
> I actually would prefer this approach TBH, but since this can cause
> other issues like transaction wraparound due to not consumed
> notifications we would need other mechanisms to prevent that and I'm not
> sure if users should expect this kind of behavior changes on minor
> version updates?

True, unconsumed notifications could cause transaction wraparound by
preventing datfrozenxid from advancing. However, this risk only
applies when users have long-term unconsumed notifications, which is
uncommon. That said, we should note that, as I mentioned
previously[1], a process can accumulate unconsumed notifications
simply by being in idle-in-transaction state, even without
backend_xmin and backend_xid, which prevents datfrozenxid from
advancing. While this might not be problematic in practice if it's
rare, I find it concerning that we have no way to check the age of
unconsumed notifications.

>
> I think that to go with this solution we would need some way to drop too
> old notifications from the queue to advance the datfrozenxid, so I
> imagine that we would need some GUC to make this configurable and we can
> configure a default value of course but some use cases may not be the
> best configuration, this is something that users should expected to deal
> on minor version updates?

I think adding a new GUC would be overkill for this fix. As for
dropping old notifications from the queue, we probably don't need to
make it configurable - we could simply drop notifications whose commit
status is no longer available (instead of raising an error).

>
> Going with the "self contained" idea sound more easier to backpatch
> actually, so this is the main reason that I abandoned this other
> approach. Could you please point what make the v8 version not visible
> for bachpatching?

Regarding the v8 patch, it introduces a fundamentally new way of
managing notification entries (adding entries with 'committed' state
and marking them 'aborted' in abort paths). This affects all use
cases, not just those involving very old unconsumed notifications, and
could introduce more serious bugs like PANIC or SEGV. For
backpatching, I prefer targeting just the problematic behavior while
leaving unrelated parts unchanged. Though Álvaro might have a
different perspective on this.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoCD%2BHXoc2QZCAS9d8ahDeikNqbnU0i6cQzpMFOEurkPPg%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-21T23:16:49Z

On 21/10/25 18:42, Masahiko Sawada wrote:
> On Mon, Oct 20, 2025 at 11:19 AM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>>
>> On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
>>> On 2025-Oct-20, Matheus Alcantara wrote:
>>>
>>>> This is similar to what was already proposed at [1]. This approach was
>>>> abandoned because a notification on the queue may block datfrozenxid
>>>> advance and clog truncation which can cause other issues for the users [2].
>>>
>>> Well, I think that this is the right solution for backpatching, and that
>>> you were wrong to abandon it.  You can continue to design a better
>>> mechanism for the master branch, but in old branches we cannot really do
>>> all those things you're proposing to do.
>>>
>> I actually would prefer this approach TBH, but since this can cause
>> other issues like transaction wraparound due to not consumed
>> notifications we would need other mechanisms to prevent that and I'm not
>> sure if users should expect this kind of behavior changes on minor
>> version updates?
> 
> True, unconsumed notifications could cause transaction wraparound by
> preventing datfrozenxid from advancing. However, this risk only
> applies when users have long-term unconsumed notifications, which is
> uncommon. That said, we should note that, as I mentioned
> previously[1], a process can accumulate unconsumed notifications
> simply by being in idle-in-transaction state, even without
> backend_xmin and backend_xid, which prevents datfrozenxid from
> advancing. While this might not be problematic in practice if it's
> rare, I find it concerning that we have no way to check the age of
> unconsumed notifications.
>
Ok, I think that I was too conservative when thinking about the
transaction wraparound issue that it could happen. I agree that this
seems a uncommon scenario.

>> I think that to go with this solution we would need some way to drop too
>> old notifications from the queue to advance the datfrozenxid, so I
>> imagine that we would need some GUC to make this configurable and we can
>> configure a default value of course but some use cases may not be the
>> best configuration, this is something that users should expected to deal
>> on minor version updates?
> 
> I think adding a new GUC would be overkill for this fix. As for
> dropping old notifications from the queue, we probably don't need to
> make it configurable - we could simply drop notifications whose commit
> status is no longer available (instead of raising an error).
>
IIUC this is about not making the vacuum freeze considering the oldest
xid on the queue but just remove notifications whose transaction status
is no longer available right? Since currently when the error happens we
already can't process the notifications it seems a reasonable way to go
IMO.

>> Going with the "self contained" idea sound more easier to backpatch
>> actually, so this is the main reason that I abandoned this other
>> approach. Could you please point what make the v8 version not visible
>> for bachpatching?
> 
> Regarding the v8 patch, it introduces a fundamentally new way of
> managing notification entries (adding entries with 'committed' state
> and marking them 'aborted' in abort paths). This affects all use
> cases, not just those involving very old unconsumed notifications, and
> could introduce more serious bugs like PANIC or SEGV. For
> backpatching, I prefer targeting just the problematic behavior while
> leaving unrelated parts unchanged. Though Álvaro might have a
> different perspective on this.
>
Thanks very much for this explanation and for what you've previously
wrote on [1]. It's clear to me now that the v8 architecture is not a
good way to go.

[1] https://www.postgresql.org/message-id/CAD21AoCFZxXCBy%2B5DoarfG9LC9VdNwWRDpDHE5sdTh5Ym0EcqQ%40mail.gmail.com

-- 
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-10-22T00:02:08Z

On Wed, Oct 22, 2025, at 02:16, Matheus Alcantara wrote:
>> Regarding the v8 patch, it introduces a fundamentally new way of
>> managing notification entries (adding entries with 'committed' state
>> and marking them 'aborted' in abort paths). This affects all use
>> cases, not just those involving very old unconsumed notifications, and
>> could introduce more serious bugs like PANIC or SEGV. For
>> backpatching, I prefer targeting just the problematic behavior while
>> leaving unrelated parts unchanged. Though Álvaro might have a
>> different perspective on this.
>>
> Thanks very much for this explanation and for what you've previously
> wrote on [1]. It's clear to me now that the v8 architecture is not a
> good way to go.

How about doing some more work in vac_update_datfrozenxid()?

Pseudo-code sketch:

```
void
vac_update_datfrozenxid(void)
{

    /* After computing newFrozenXid from all known sources... */

    TransactionId oldestNotifyXid = GetOldestQueuedNotifyXid();

    if (TransactionIdIsValid(oldestNotifyXid) &&
        TransactionIdPrecedes(oldestNotifyXid, newFrozenXid))
    {
        /*
         * The async queue has XIDs older than our proposed freeze point.
         * Attempt cleanup, then back off and let the next VACUUM benefit.
         */

        if (asyncQueueHasListeners())
        {
            /*
             * Wake all listening backends across *all* databases
             * that are not already at QUEUE_HEAD.
             * They'll hopefully process notifications and advance
             * their pointers, allowing the next VACUUM to freeze further.
             */
            asyncQueueWakeAllListeners();
        }
        else
        {
            /*
             * No listeners exist - discard all unread notifications.
             * The next VACUUM should succeed in advancing datfrozenxid.
             * asyncQueueAdvanceTailNoListeners() would take exclusive lock
             * on NotifyQueueLock before checking
             * QUEUE_FIRST_LISTENER == INVALID_PROC_NUMBER
             */
            asyncQueueAdvanceTailNoListeners();
        }

        /*
         * Back off datfrozenxid to protect the old XIDs.
         * The cleanup we just performed should allow the next VACUUM
         * to freeze further.
         */
        newFrozenXid = oldestNotifyXid;
    }
}
```

Maybe it wouldn't solve all problematic situations, but to me it seems
like these measures could help many of them, or am I missing some
crucial insight here?

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Masahiko Sawada <sawada.mshk@gmail.com> — 2025-10-22T04:31:46Z

On Tue, Oct 21, 2025 at 4:16 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On 21/10/25 18:42, Masahiko Sawada wrote:
> > On Mon, Oct 20, 2025 at 11:19 AM Matheus Alcantara
> > <matheusssilv97@gmail.com> wrote:
> >>
> >> On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
> >>> On 2025-Oct-20, Matheus Alcantara wrote:
> >>>
> >>>> This is similar to what was already proposed at [1]. This approach was
> >>>> abandoned because a notification on the queue may block datfrozenxid
> >>>> advance and clog truncation which can cause other issues for the users [2].
> >>>
> >>> Well, I think that this is the right solution for backpatching, and that
> >>> you were wrong to abandon it.  You can continue to design a better
> >>> mechanism for the master branch, but in old branches we cannot really do
> >>> all those things you're proposing to do.
> >>>
> >> I actually would prefer this approach TBH, but since this can cause
> >> other issues like transaction wraparound due to not consumed
> >> notifications we would need other mechanisms to prevent that and I'm not
> >> sure if users should expect this kind of behavior changes on minor
> >> version updates?
> >
> > True, unconsumed notifications could cause transaction wraparound by
> > preventing datfrozenxid from advancing. However, this risk only
> > applies when users have long-term unconsumed notifications, which is
> > uncommon. That said, we should note that, as I mentioned
> > previously[1], a process can accumulate unconsumed notifications
> > simply by being in idle-in-transaction state, even without
> > backend_xmin and backend_xid, which prevents datfrozenxid from
> > advancing. While this might not be problematic in practice if it's
> > rare, I find it concerning that we have no way to check the age of
> > unconsumed notifications.
> >
> Ok, I think that I was too conservative when thinking about the
> transaction wraparound issue that it could happen. I agree that this
> seems a uncommon scenario.
>
> >> I think that to go with this solution we would need some way to drop too
> >> old notifications from the queue to advance the datfrozenxid, so I
> >> imagine that we would need some GUC to make this configurable and we can
> >> configure a default value of course but some use cases may not be the
> >> best configuration, this is something that users should expected to deal
> >> on minor version updates?
> >
> > I think adding a new GUC would be overkill for this fix. As for
> > dropping old notifications from the queue, we probably don't need to
> > make it configurable - we could simply drop notifications whose commit
> > status is no longer available (instead of raising an error).
> >
> IIUC this is about not making the vacuum freeze considering the oldest
> xid on the queue but just remove notifications whose transaction status
> is no longer available right? Since currently when the error happens we
> already can't process the notifications it seems a reasonable way to go
> IMO.

On second thought, simply hiding the error would be worse than our
current behavior. Users wouldn't know their notifications are being
dropped, as they often don't check WARNINGs. The more frequently they
try to freeze XIDs, the more notifications they'd lose. To avoid
silent discards, they would need to increase
autovacuum_vacuum_max_freeze_age to accommodate more clog entries, but
this increases the risk of XID wraparound. I think the proposed
approach modifying the vacuum freeze to consider the oldest XID on the
queue would be better. This has a downside as I mentioned: processes
in idle-in-transaction state even without backend_xmin and backend_xid
can still accumulate unconsumed notifications. However, leaving
transactions in idle-in-transaction state for a long time is bad
practice anyway. While we might want to consider adding a safeguard
for this case, I guess it would rarely occur in practice.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-22T15:23:42Z

On Wed Oct 22, 2025 at 1:31 AM -03, Masahiko Sawada wrote:
> On Tue, Oct 21, 2025 at 4:16 PM Matheus Alcantara
>> > I think adding a new GUC would be overkill for this fix. As for
>> > dropping old notifications from the queue, we probably don't need to
>> > make it configurable - we could simply drop notifications whose commit
>> > status is no longer available (instead of raising an error).
>> >
>> IIUC this is about not making the vacuum freeze considering the oldest
>> xid on the queue but just remove notifications whose transaction status
>> is no longer available right? Since currently when the error happens we
>> already can't process the notifications it seems a reasonable way to go
>> IMO.
>
> On second thought, simply hiding the error would be worse than our
> current behavior. Users wouldn't know their notifications are being
> dropped, as they often don't check WARNINGs. The more frequently they
> try to freeze XIDs, the more notifications they'd lose. To avoid
> silent discards, they would need to increase
> autovacuum_vacuum_max_freeze_age to accommodate more clog entries, but
> this increases the risk of XID wraparound. I think the proposed
> approach modifying the vacuum freeze to consider the oldest XID on the
> queue would be better. This has a downside as I mentioned: processes
> in idle-in-transaction state even without backend_xmin and backend_xid
> can still accumulate unconsumed notifications. However, leaving
> transactions in idle-in-transaction state for a long time is bad
> practice anyway. While we might want to consider adding a safeguard
> for this case, I guess it would rarely occur in practice.
>
I'm attaching a v9 patch which is based on the idea of changing the
vacuum freeze to consider the oldest xid on the listen/notify queue. The
0001 patch is from Joel that it was previously sent on [1] with some
small tweaks and the 0002 is the TAP tests introduced on the previously
versions by me and by Arseniy. I keep it separate because I'm not sure
if it's all suitable for back-pacthing.

I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
this architecture being used. I think that it's not, but perhaps is a
good test to keep it?

[1] https://www.postgresql.org/message-id/25651193-da4e-4185-a564-f2efa6b0c8a4%40app.fastmail.com

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Álvaro Herrera <alvherre@kurilemu.de> — 2025-10-22T15:49:43Z

On 2025-Oct-22, Matheus Alcantara wrote:

> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
> this architecture being used. I think that it's not, but perhaps is a
> good test to keep it?

I'd rather have tests than not, but I'd think it needs to be behind
PG_TEST_EXTRA because of things like

+$node->safe_psql('postgres', 'select consume_xids(10000000);');

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-22T17:25:36Z

On Wed Oct 22, 2025 at 12:49 PM -03, Álvaro Herrera wrote:
> On 2025-Oct-22, Matheus Alcantara wrote:
>
>> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
>> this architecture being used. I think that it's not, but perhaps is a
>> good test to keep it?
>
> I'd rather have tests than not, but I'd think it needs to be behind
> PG_TEST_EXTRA because of things like
>
> +$node->safe_psql('postgres', 'select consume_xids(10000000);');
>
Attached v10 with wrapping into PG_TEST_EXTRA. Should we enable this
somewhere to be executed on build farm?

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Masahiko Sawada <sawada.mshk@gmail.com> — 2025-10-23T23:42:45Z

On Wed, Oct 22, 2025 at 10:25 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Wed Oct 22, 2025 at 12:49 PM -03, Álvaro Herrera wrote:
> > On 2025-Oct-22, Matheus Alcantara wrote:
> >
> >> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
> >> this architecture being used. I think that it's not, but perhaps is a
> >> good test to keep it?
> >
> > I'd rather have tests than not, but I'd think it needs to be behind
> > PG_TEST_EXTRA because of things like
> >
> > +$node->safe_psql('postgres', 'select consume_xids(10000000);');
> >
> Attached v10 with wrapping into PG_TEST_EXTRA.

Thank you for updating the patches!

I've reviewed the patches and here are the comments.

v10-0001-Prevent-VACUUM-from-truncating-XIDs-still-presen.patch:

+ *
+ * Returns InvalidTransactionId if there are no unprocessed notifications.

This comment is not accurate since the function returns
InvalidTransactionId even if all unprocessed notifications are created
on other databases.

---
+TransactionId
+GetOldestQueuedNotifyXid(void)
+{

How about renaming it to something like GetOldestNotifyTransactionId()?

---
+   /* page_buffer must be adequately aligned, so use a union */
+   union
+   {
+       char        buf[QUEUE_PAGESIZE];
+       AsyncQueueEntry align;
+   }           page_buffer;

asyncQueueReadAllNotifications() uses this union too, so how about
define this type as AlignedQueueEntryPage or something and use it in
both functions?

---
+
+   LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
+

I don't think we need an exclusive lock here.

---
In GetOldestQueuedNotifyXid() why do we keep holding NotifyQueueLock
while calculating the oldest XID in the queue?

---
+
+       /* Process all entries on this page up to head */
+       while (curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
QUEUE_PAGESIZE &&
+              !QUEUE_POS_EQUAL(pos, head))
+       {
+           qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);
(snip)
+           /* Advance to next entry */
+           if (asyncQueueAdvance(&pos, qe->length))
+               break;          /* advanced to next page */
+

I think the check "curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
QUEUE_PAGESIZE" is essentially the same as the one we do in
asyncQueueAdvance(). I think we can refactor the inner loop in
GetOldestQueuedNotifyXid() to something like:

for (;;)
{
    bool reachedEndOfPage;

    qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);

    // check qe->xid here...

    reachedEndOfPage = asyncQueueAdvance(&pos, qe->length);

    if (reachedEndOfPage || QUEUE_POS_EQUAL(pos, head))
        break;
}

---
v10-0002-Add-tap-tests-for-listen-notify-vacuum-freeze.patch:

This new test directory needs to have .gitignore.

---
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+

It's better to have a short description of this test here.

---
+use File::Path qw(mkpath);

It seems not necessary.

---
+$node->safe_psql('postgres',
+   'CREATE TABLE t AS SELECT g AS a, g+2 AS b from
generate_series(1,100000) g;'

Why does it need to insert many rows?

---
+# --- Session 2, multiple notify's, and commit ---
+for my $i (1 .. 10)
+{
+   $node->safe_psql(
+       'postgres', "
+       BEGIN;
+       NOTIFY s, '$i';
+       COMMIT;");
+}

Why does it need to send a notification 10 times?

---
+# Consume enough XIDs to trigger truncation
+$node->safe_psql('postgres', 'select consume_xids(10000000);');
+

I guess it consumes XID too much to trigger truncation. Given the one
clog segment file is 256kB in size, it's enough to consume 1,050,000
XIDs to move to the next segment file.

---
+# Get the new datfrozenxid after vacuum freeze to ensure that is advanced but
+# we can still get the notification status of the notification
+my $datafronzenxid_freeze = $node->safe_psql('postgres', "select
datfrozenxid from pg_database where datname = 'postgres'");
+ok($datafronzenxid_freeze > $datafronzenxid, 'datfrozenxid is advanced');

I think this test passes in all cases, so it is meaningless. Instead,
what we need to check in terms of datfrozenxid is that its value
doesn't get greater than the XID we used for the notification, even
after vacuum freeze. I think we remember the XID used for NOTIFY and
check if the old/new datfrozenxid values are older than that value.

---
+is($notifications_count, 10, 'received all committed notifications');
+
+done_testing();

After consuming the unconsumed notifications, let's do vacuum freeze
and check if datfrozenxid now can be advanced.

---
I would expect to add 002_aborted_tx_notifies.pl in a separate patch
since it's not related to this bug fix.

---
+# Test checks that listeners do not receive notifications from aborted
+# transaction even if notifications have been added to the listen/notify
+# queue. To reproduce it we use the fact that serializable conflicts
+# are checked after tx adds notifications to the queue.

I wonder if we could implement this test using the isolation test
instead of the tap test. Is there any reason why you used a tap test
for that?

Regards,

> Should we enable this
> somewhere to be executed on build farm?

Yeah, I hope some buildfarm animals enable it.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-24T07:36:44Z

On Fri, Oct 24, 2025 at 2:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Oct 22, 2025 at 10:25 AM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
> >
> > On Wed Oct 22, 2025 at 12:49 PM -03, Álvaro Herrera wrote:
> > > On 2025-Oct-22, Matheus Alcantara wrote:
> > >
> > >> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
> > >> this architecture being used. I think that it's not, but perhaps is a
> > >> good test to keep it?
> > >
> > > I'd rather have tests than not, but I'd think it needs to be behind
> > > PG_TEST_EXTRA because of things like
> > >
> > > +$node->safe_psql('postgres', 'select consume_xids(10000000);');
> > >
> > Attached v10 with wrapping into PG_TEST_EXTRA.
>
> Thank you for updating the patches!
>
> I've reviewed the patches and here are the comments.

Thank you for the review!

> I would expect to add 002_aborted_tx_notifies.pl in a separate patch
> since it's not related to this bug fix.
>
> ---
> +# Test checks that listeners do not receive notifications from aborted
> +# transaction even if notifications have been added to the listen/notify
> +# queue. To reproduce it we use the fact that serializable conflicts
> +# are checked after tx adds notifications to the queue.
>
> I wonder if we could implement this test using the isolation test
> instead of the tap test. Is there any reason why you used a tap test
> for that?
>

I agree it's less relevant to the patch now than it was with the new
'committed' field approach. And there is no particular reason why it
was implemented as a TAP test actually.. So +1 to move it to separate
patch (does it mean to separate thread as well or just separate patch
file?) and rewrite as an isolation test (IIUC it's better to use
isolation test infrastructure if it's possible). I can try to do it if
nobody else does it earlier.


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-24T08:55:02Z

On Wed, Oct 22, 2025 at 3:02 AM Joel Jacobson <joel@compiler.org> wrote:
>
> On Wed, Oct 22, 2025, at 02:16, Matheus Alcantara wrote:
> >> Regarding the v8 patch, it introduces a fundamentally new way of
> >> managing notification entries (adding entries with 'committed' state
> >> and marking them 'aborted' in abort paths). This affects all use
> >> cases, not just those involving very old unconsumed notifications, and
> >> could introduce more serious bugs like PANIC or SEGV. For
> >> backpatching, I prefer targeting just the problematic behavior while
> >> leaving unrelated parts unchanged. Though Álvaro might have a
> >> different perspective on this.
> >>
> > Thanks very much for this explanation and for what you've previously
> > wrote on [1]. It's clear to me now that the v8 architecture is not a
> > good way to go.
>
> How about doing some more work in vac_update_datfrozenxid()?
>
> Pseudo-code sketch:
>
> ```
> void
> vac_update_datfrozenxid(void)
> {
>
>     /* After computing newFrozenXid from all known sources... */
>
>     TransactionId oldestNotifyXid = GetOldestQueuedNotifyXid();
>
>     if (TransactionIdIsValid(oldestNotifyXid) &&
>         TransactionIdPrecedes(oldestNotifyXid, newFrozenXid))
>     {
>         /*
>          * The async queue has XIDs older than our proposed freeze point.
>          * Attempt cleanup, then back off and let the next VACUUM benefit.
>          */
>
>         if (asyncQueueHasListeners())
>         {
>             /*
>              * Wake all listening backends across *all* databases
>              * that are not already at QUEUE_HEAD.
>              * They'll hopefully process notifications and advance
>              * their pointers, allowing the next VACUUM to freeze further.
>              */
>             asyncQueueWakeAllListeners();
>         }
>         else
>         {
>             /*
>              * No listeners exist - discard all unread notifications.
>              * The next VACUUM should succeed in advancing datfrozenxid.
>              * asyncQueueAdvanceTailNoListeners() would take exclusive lock
>              * on NotifyQueueLock before checking
>              * QUEUE_FIRST_LISTENER == INVALID_PROC_NUMBER
>              */
>             asyncQueueAdvanceTailNoListeners();
>         }
>
>         /*
>          * Back off datfrozenxid to protect the old XIDs.
>          * The cleanup we just performed should allow the next VACUUM
>          * to freeze further.
>          */
>         newFrozenXid = oldestNotifyXid;
>     }
> }
> ```
>
> Maybe it wouldn't solve all problematic situations, but to me it seems
> like these measures could help many of them, or am I missing some
> crucial insight here?

I agree we need to add something like this. Looks like with v10 it's
possible for the listen/notify queue to block datfrozenxid advancing
even without extreme circumstances (without hanging listeners etc).

I see two thing we should take care of in v10:

1) Currently asyncQueueAdvanceTail is called regularly only if we have
a constant flow of notifications. We try to advance the tail every
time when the head reaches every 4th page. So if we don't have new
notifications, the tail will stay where it is forever. It means that
if we have at least 1 notification in the queue without constant flow
of notifications then GetOldestQueuedNotifyXid will constantly return
the same result which will block the advancement of datfrozenxid. So
it looks like we need something that will advance the tail in this
case.

2) Currently we wake up all listeners regularly only if we have a
constant flow of notifications (again). Even if we have a listener and
we never wake up such a listener because of the new notifications (for
example there are no new notifications in the listener's database), we
still signal such a listener sometimes as its position lags too much
from the head. But again, if we don't have new notifications, it's
possible that such a listener will never process some notification
(from another database) and tail advancement will be blocked. As a
result, datfrozenxid advancement also will be blocked. So probably we
need something that will wake up lagging listeners if they block tail
advancement.

Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-10-24T12:12:01Z

On Fri, Oct 24, 2025, at 11:55, Arseniy Mukhin wrote:
> On Wed, Oct 22, 2025 at 3:02 AM Joel Jacobson <joel@compiler.org> wrote:
>> How about doing some more work in vac_update_datfrozenxid()?
>>
>> Pseudo-code sketch:
>>
...
> I agree we need to add something like this. Looks like with v10 it's
> possible for the listen/notify queue to block datfrozenxid advancing
> even without extreme circumstances (without hanging listeners etc).

Attached, two implementations of the sketched out idea, which can be
applied on top of the v10 patch. They should both be functionally
equivalent.

vacuum_notify_queue_cleanup-with-code-dup.txt:

This version doesn't try at all to avoid code duplication;
asyncQueueAdvanceTailNoListeners is very similar to SignalBackends, and
asyncQueueAdvanceTailNoListeners is very similar to
asyncQueueAdvanceTail. I think this might be preferable, if the channel
hash optimization that we're working on in the other thread, as a bonus
solves these fundamental problems, so that these added safety
functionality can be eliminated. I think it's quite likely we can
achieve that, but not certain.

vacuum_notify_queue_cleanup-without-code-dup.txt:

This version instead equips SignalBackends and asyncQueueAdvanceTail
with a new boolean input parameter to control their behavior, where
passing false gives the current behavior, used at the current call
sites, and vacuum would pass true, to signal all non-caught-up backends
in all databases and forcibly advance the tail when there are no
listening backends.

I have no strong preference for one or the other.

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-25T00:32:54Z

Thanks for the review!

On Thu Oct 23, 2025 at 8:42 PM -03, Masahiko Sawada wrote:
> On Wed, Oct 22, 2025 at 10:25 AM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>>
>> On Wed Oct 22, 2025 at 12:49 PM -03, Álvaro Herrera wrote:
>> > On 2025-Oct-22, Matheus Alcantara wrote:
>> >
>> >> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
>> >> this architecture being used. I think that it's not, but perhaps is a
>> >> good test to keep it?
>> >
>> > I'd rather have tests than not, but I'd think it needs to be behind
>> > PG_TEST_EXTRA because of things like
>> >
>> > +$node->safe_psql('postgres', 'select consume_xids(10000000);');
>> >
>> Attached v10 with wrapping into PG_TEST_EXTRA.
>
> Thank you for updating the patches!
>
> I've reviewed the patches and here are the comments.
>
> v10-0001-Prevent-VACUUM-from-truncating-XIDs-still-presen.patch:
>
> + *
> + * Returns InvalidTransactionId if there are no unprocessed notifications.
>
> This comment is not accurate since the function returns
> InvalidTransactionId even if all unprocessed notifications are created
> on other databases.
>
Fixed

> ---
> +TransactionId
> +GetOldestQueuedNotifyXid(void)
> +{
>
> How about renaming it to something like GetOldestNotifyTransactionId()?
>
Yeap, sounds better, fixed.

> ---
> +   /* page_buffer must be adequately aligned, so use a union */
> +   union
> +   {
> +       char        buf[QUEUE_PAGESIZE];
> +       AsyncQueueEntry align;
> +   }           page_buffer;
>
> asyncQueueReadAllNotifications() uses this union too, so how about
> define this type as AlignedQueueEntryPage or something and use it in
> both functions?
>
Good point, fixed.

> ---
> +
> +   LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
> +
>
> I don't think we need an exclusive lock here.
>
Fixed.

> ---
> In GetOldestQueuedNotifyXid() why do we keep holding NotifyQueueLock
> while calculating the oldest XID in the queue?
>
Yeah, I don't think that it's necessary. The
asyncQueueReadAllNotifications() for example only hold the lock when
reading the QUEUE_BACKEND_POS and QUEUE_HEAD. I think that it's a
similar case here, fixed.

> ---
> +
> +       /* Process all entries on this page up to head */
> +       while (curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
> QUEUE_PAGESIZE &&
> +              !QUEUE_POS_EQUAL(pos, head))
> +       {
> +           qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);
> (snip)
> +           /* Advance to next entry */
> +           if (asyncQueueAdvance(&pos, qe->length))
> +               break;          /* advanced to next page */
> +
>
> I think the check "curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
> QUEUE_PAGESIZE" is essentially the same as the one we do in
> asyncQueueAdvance(). I think we can refactor the inner loop in
> GetOldestQueuedNotifyXid() to something like:
>
> for (;;)
> {
>     bool reachedEndOfPage;
>
>     qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);
>
>     // check qe->xid here...
>
>     reachedEndOfPage = asyncQueueAdvance(&pos, qe->length);
>
>     if (reachedEndOfPage || QUEUE_POS_EQUAL(pos, head))
>         break;
> }
>
Fixed

> ---
> v10-0002-Add-tap-tests-for-listen-notify-vacuum-freeze.patch:
>
> This new test directory needs to have .gitignore.
>
Fixed

> ---
> +# Copyright (c) 2024-2025, PostgreSQL Global Development Group
> +
>
> It's better to have a short description of this test here.
>
Fixed

> ---
> +use File::Path qw(mkpath);
>
> It seems not necessary.
>
Fixed

> ---
> +$node->safe_psql('postgres',
> +   'CREATE TABLE t AS SELECT g AS a, g+2 AS b from
> generate_series(1,100000) g;'
>
> Why does it need to insert many rows?
>
I think that this value was left when I was writing the first versions of
this test, I don't think that it's necessary. I reduced to just 10 rows.

> ---
> +# --- Session 2, multiple notify's, and commit ---
> +for my $i (1 .. 10)
> +{
> +   $node->safe_psql(
> +       'postgres', "
> +       BEGIN;
> +       NOTIFY s, '$i';
> +       COMMIT;");
> +}
>
> Why does it need to send a notification 10 times?
>
I just wanted to test with multiple notifications, we can reduce to two
or three, there is not specific reason to send 10 notifications.

> ---
> +# Consume enough XIDs to trigger truncation
> +$node->safe_psql('postgres', 'select consume_xids(10000000);');
> +
>
> I guess it consumes XID too much to trigger truncation. Given the one
> clog segment file is 256kB in size, it's enough to consume 1,050,000
> XIDs to move to the next segment file.
>
Fixed

> ---
> +# Get the new datfrozenxid after vacuum freeze to ensure that is advanced but
> +# we can still get the notification status of the notification
> +my $datafronzenxid_freeze = $node->safe_psql('postgres', "select
> datfrozenxid from pg_database where datname = 'postgres'");
> +ok($datafronzenxid_freeze > $datafronzenxid, 'datfrozenxid is advanced');
>
> I think this test passes in all cases, so it is meaningless. Instead,
> what we need to check in terms of datfrozenxid is that its value
> doesn't get greater than the XID we used for the notification, even
> after vacuum freeze. I think we remember the XID used for NOTIFY and
> check if the old/new datfrozenxid values are older than that value.
>
I think that was the goal of this test. I've swap the values to make it
more clear. Please let me know if I misunderstood your point here.

> ---
> +is($notifications_count, 10, 'received all committed notifications');
> +
> +done_testing();
>
> After consuming the unconsumed notifications, let's do vacuum freeze
> and check if datfrozenxid now can be advanced.
>
Good point, during this I realize that the datafrozenxid was not being
advanced even after the queue being consumed. This was because when a
backend listener is consuming the notifications it only update their own
queue tail pointer and not the global shared tail pointer, so I've
included a call to asyncQueueAdvanceTail() at the beginning of
GetOldestNotifyTransactionId() to do this.

> ---
> I would expect to add 002_aborted_tx_notifies.pl in a separate patch
> since it's not related to this bug fix.
>
On the new attached version I've moved this test to a separated patch.

> ---
> +# Test checks that listeners do not receive notifications from aborted
> +# transaction even if notifications have been added to the listen/notify
> +# queue. To reproduce it we use the fact that serializable conflicts
> +# are checked after tx adds notifications to the queue.
>
> I wonder if we could implement this test using the isolation test
> instead of the tap test. Is there any reason why you used a tap test
> for that?
>
On this new version I just moved the test into a separated patch but I
agree that we can implement this using the isolation test,

I don't know how we should handle the follow-up patches that Joel have
sent on [1], if we should incoportate on 0001 or not. I still need to
review and test them.

[1] https://www.postgresql.org/message-id/b6f6cbb8-1903-4ca0-ae64-01d84d1e12a3%40app.fastmail.com

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-10-25T00:33:33Z

On Fri Oct 24, 2025 at 4:36 AM -03, Arseniy Mukhin wrote:
>> I would expect to add 002_aborted_tx_notifies.pl in a separate patch
>> since it's not related to this bug fix.
>>
>> ---
>> +# Test checks that listeners do not receive notifications from aborted
>> +# transaction even if notifications have been added to the listen/notify
>> +# queue. To reproduce it we use the fact that serializable conflicts
>> +# are checked after tx adds notifications to the queue.
>>
>> I wonder if we could implement this test using the isolation test
>> instead of the tap test. Is there any reason why you used a tap test
>> for that?
>>
>
> I agree it's less relevant to the patch now than it was with the new
> 'committed' field approach. And there is no particular reason why it
> was implemented as a TAP test actually.. So +1 to move it to separate
> patch (does it mean to separate thread as well or just separate patch
> file?) and rewrite as an isolation test (IIUC it's better to use
> isolation test infrastructure if it's possible). I can try to do it if
> nobody else does it earlier.
>
On the v11 version that I've sent on [1] I've move this test into a
separate patch, please feel free to implement it as an isolation test if
you want it.

[1] https://www.postgresql.org/message-id/DDQZB2AD34V4.3RH2USCA72AS8%40gmail.com

--
Matheus Alcantara

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-10-25T13:08:41Z

On Sat, Oct 25, 2025 at 3:33 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Fri Oct 24, 2025 at 4:36 AM -03, Arseniy Mukhin wrote:
> >> I would expect to add 002_aborted_tx_notifies.pl in a separate patch
> >> since it's not related to this bug fix.
> >>
> >> ---
> >> +# Test checks that listeners do not receive notifications from aborted
> >> +# transaction even if notifications have been added to the listen/notify
> >> +# queue. To reproduce it we use the fact that serializable conflicts
> >> +# are checked after tx adds notifications to the queue.
> >>
> >> I wonder if we could implement this test using the isolation test
> >> instead of the tap test. Is there any reason why you used a tap test
> >> for that?
> >>
> >
> > I agree it's less relevant to the patch now than it was with the new
> > 'committed' field approach. And there is no particular reason why it
> > was implemented as a TAP test actually.. So +1 to move it to separate
> > patch (does it mean to separate thread as well or just separate patch
> > file?) and rewrite as an isolation test (IIUC it's better to use
> > isolation test infrastructure if it's possible). I can try to do it if
> > nobody else does it earlier.
> >
> On the v11 version that I've sent on [1] I've move this test into a
> separate patch, please feel free to implement it as an isolation test if
> you want it.
>

Thank you!

I reimplemented the test in 0002 as an isolation test and added the
commit message. PFA the new version.


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-10-25T18:01:56Z

On Sat, Oct 25, 2025, at 15:08, Arseniy Mukhin wrote:
> I reimplemented the test in 0002 as an isolation test and added the
> commit message. PFA the new version.

I've rebased the patches so they can be applied on top of v12:
- v12-vacuum_notify_queue_cleanup-without-code-dup.txt
- v12-vacuum_notify_queue_cleanup-with-code-dup.txt

I also want to share a new idea that Heikki Linnakangas came up with
during PgConf25 in Riga.

The idea is basically to scan the notification queue from tail to head,
and update xids that precedes newFrozenXid, to either
FrozenTransactionId if committed, or InvalidTransactionId if not.

I've implemented this by adding a new extern function
AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.

This way, we don't need to hold back the advancement of newFrozenXid in
vac_update_datfrozenxid.

Attempt of implementing Heikki's idea:
- async_notify_freeze_xids.txt

This idea has the benefit of never holding back newFrozenXid,
however, I see some problems with it:

- If there is a bug in the code, we could accidentally overwrite
  xid values we didn't intend to overwrite, and maybe we would never
  find out that we did.

- We wouldn't know for sure that we've understood the cause of
  this bug.

With v12-vacuum_notify_queue_cleanup, we instead have the downside of
possibly holding back newFrozenXid, but with the advantage of giving us
higher confidence in that we've correctly identified the cause of the
bug.

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-10-29T11:40:34Z

On 25/10/2025 21:01, Joel Jacobson wrote:
> On Sat, Oct 25, 2025, at 15:08, Arseniy Mukhin wrote:
>> I reimplemented the test in 0002 as an isolation test and added the
>> commit message. PFA the new version.
> 
> I've rebased the patches so they can be applied on top of v12:
> - v12-vacuum_notify_queue_cleanup-without-code-dup.txt
> - v12-vacuum_notify_queue_cleanup-with-code-dup.txt
> 
> I also want to share a new idea that Heikki Linnakangas came up with
> during PgConf25 in Riga.
> 
> The idea is basically to scan the notification queue from tail to head,
> and update xids that precedes newFrozenXid, to either
> FrozenTransactionId if committed, or InvalidTransactionId if not.
> 
> I've implemented this by adding a new extern function
> AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
> 
> This way, we don't need to hold back the advancement of newFrozenXid in
> vac_update_datfrozenxid.
> 
> Attempt of implementing Heikki's idea:
> - async_notify_freeze_xids.txt

Thanks!

> This idea has the benefit of never holding back newFrozenXid,
> however, I see some problems with it:
> 
> - If there is a bug in the code, we could accidentally overwrite
>    xid values we didn't intend to overwrite, and maybe we would never
>    find out that we did.
> 
> - We wouldn't know for sure that we've understood the cause of
>    this bug.
> 
> With v12-vacuum_notify_queue_cleanup, we instead have the downside of
> possibly holding back newFrozenXid, but with the advantage of giving us
> higher confidence in that we've correctly identified the cause of the
> bug.

Meh. Robustness is good and all, and in heap tuples we don't overwrite 
the xmin/xmax but just set a FROZEN flag, precisely so that we preserve 
evidence if something goes wrong. But I can't get too worked up about 
that for the async notification queue.

That said, we could add COMMITTED/INVALID hint bits to AsyncQueueEntry, 
similar to heap tuples, and set the hint bit instead of replacing the 
original xid. That might be good for performance too: If the first 
backend to call TransactionIdDidCommit() on an AsyncQueueEntry would set 
the hint bit, that would save the effort for other listening backends.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-10-29T12:30:12Z

On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
> On 25/10/2025 21:01, Joel Jacobson wrote:
>> I also want to share a new idea that Heikki Linnakangas came up with
>> during PgConf25 in Riga.
>> 
>> The idea is basically to scan the notification queue from tail to head,
>> and update xids that precedes newFrozenXid, to either
>> FrozenTransactionId if committed, or InvalidTransactionId if not.
>> 
>> I've implemented this by adding a new extern function
>> AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
>> 
>> This way, we don't need to hold back the advancement of newFrozenXid in
>> vac_update_datfrozenxid.
>> 
>> Attempt of implementing Heikki's idea:
>> - async_notify_freeze_xids.txt
>
> Thanks!

Thanks for the idea!

>> This idea has the benefit of never holding back newFrozenXid,
>> however, I see some problems with it:
>> 
>> - If there is a bug in the code, we could accidentally overwrite
>>    xid values we didn't intend to overwrite, and maybe we would never
>>    find out that we did.
>> 
>> - We wouldn't know for sure that we've understood the cause of
>>    this bug.
>> 
>> With v12-vacuum_notify_queue_cleanup, we instead have the downside of
>> possibly holding back newFrozenXid, but with the advantage of giving us
>> higher confidence in that we've correctly identified the cause of the
>> bug.
>
> Meh. Robustness is good and all, and in heap tuples we don't overwrite 
> the xmin/xmax but just set a FROZEN flag, precisely so that we preserve 
> evidence if something goes wrong. But I can't get too worked up about 
> that for the async notification queue.

Having slept on this for some days, I'm less worried about this
approach. I like the simplicity of it, and that we don't bolt on complexity
to another subsystem, just for the sake of improved debugging
capabilities of a different subsystem.

I think a different way of looking at this, is that we wouldn't conceal
a bug in async, but rather we would let vacuum do part of its job, of
checking the commit status of the xids, when needed, and be okay with
that split responsibility.

If we can prove that the optimizations we're working on, will render
that new code unnecessary, then I guess we can just remove it.

> That said, we could add COMMITTED/INVALID hint bits to AsyncQueueEntry, 
> similar to heap tuples, and set the hint bit instead of replacing the 
> original xid. That might be good for performance too: If the first 
> backend to call TransactionIdDidCommit() on an AsyncQueueEntry would set 
> the hint bit, that would save the effort for other listening backends.

Interesting, sounds like a really promising approach.

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-10-29T13:05:27Z

On 29/10/2025 14:30, Joel Jacobson wrote:
> On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
>> On 25/10/2025 21:01, Joel Jacobson wrote:
>>> I also want to share a new idea that Heikki Linnakangas came up with
>>> during PgConf25 in Riga.
>>>
>>> The idea is basically to scan the notification queue from tail to head,
>>> and update xids that precedes newFrozenXid, to either
>>> FrozenTransactionId if committed, or InvalidTransactionId if not.
>>>
>>> I've implemented this by adding a new extern function
>>> AsyncNotifyFreezeXids to async.h, called from vac_update_datfrozenxid.
>>>
>>> This way, we don't need to hold back the advancement of newFrozenXid in
>>> vac_update_datfrozenxid.
>>>
>>> Attempt of implementing Heikki's idea:
>>> - async_notify_freeze_xids.txt
>>
>> Thanks!
> 
> Thanks for the idea!

For the record, Yura suggested the same approach earlier in this thread 
[1]. That line of discussion led to more complicated patches but I think 
the complications came from trying to set the flag as part of 
commit/abort. The changes are more straightforward and backpatchable if 
we only do it at vacuum.

>>> This idea has the benefit of never holding back newFrozenXid,
>>> however, I see some problems with it:
>>>
>>> - If there is a bug in the code, we could accidentally overwrite
>>>     xid values we didn't intend to overwrite, and maybe we would never
>>>     find out that we did.
>>>
>>> - We wouldn't know for sure that we've understood the cause of
>>>     this bug.
>>>
>>> With v12-vacuum_notify_queue_cleanup, we instead have the downside of
>>> possibly holding back newFrozenXid, but with the advantage of giving us
>>> higher confidence in that we've correctly identified the cause of the
>>> bug.
>>
>> Meh. Robustness is good and all, and in heap tuples we don't overwrite
>> the xmin/xmax but just set a FROZEN flag, precisely so that we preserve
>> evidence if something goes wrong. But I can't get too worked up about
>> that for the async notification queue.
> 
> Having slept on this for some days, I'm less worried about this
> approach. I like the simplicity of it, and that we don't bolt on complexity
> to another subsystem, just for the sake of improved debugging
> capabilities of a different subsystem.
> 
> I think a different way of looking at this, is that we wouldn't conceal
> a bug in async, but rather we would let vacuum do part of its job, of
> checking the commit status of the xids, when needed, and be okay with
> that split responsibility.

Ok, at quick glance I think this patch is in pretty good shape. I'll 
review it more thoroughly, and see if I can incorporate the test from 
Matheus Alcantara's or Arseniy Mukhin's latest patches, and commit and 
backpatch this.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-29T16:48:23Z

"Joel Jacobson" <joel@compiler.org> writes:
> On Wed, Oct 29, 2025, at 12:40, Heikki Linnakangas wrote:
>> That said, we could add COMMITTED/INVALID hint bits to AsyncQueueEntry, 
>> similar to heap tuples, and set the hint bit instead of replacing the 
>> original xid. That might be good for performance too: If the first 
>> backend to call TransactionIdDidCommit() on an AsyncQueueEntry would set 
>> the hint bit, that would save the effort for other listening backends.

> Interesting, sounds like a really promising approach.

I don't see any room for adding hint bits without enlarging
AsyncQueueEntry, which I'd rather not do.  But we could mechanize that
without a separate hint bit by replacing the tested XID with FrozenXid
or InvalidXid, as the patch already does.  We already require writing
an XID to shared memory to be atomic, so that seems okay.

However ... that won't actually work, the reason being that
asyncQueueProcessPageEntries() doesn't work directly from an SLRU page
but from a local copy.  Even if it were to modify the state of that
copy, no other backend would see the effects.

The reason it's like that is stated in the comments:

 * The current page must have been fetched into page_buffer from shared
 * memory.  (We could access the page right in shared memory, but that
 * would imply holding the SLRU bank lock throughout this routine.)

The patch proposed here likewise appears to involve holding an SLRU
bank lock throughout what could be a significant number of
TransactionIdDidCommit tests.  That seems like it could result in a
pretty bad "burp" in NOTIFY throughput.  That problem is ameliorated
by only doing it when VACUUM is trying to advance datfrozenxid, but
still I wonder if we can't find a less concurrency-unfriendly answer.

The local-copy behavior also means that this patch isn't quite a 100%
fix.  We could have a race condition like so:

1. Backend A grabs a copy of some SLRU page and begins running
asyncQueueProcessPageEntries(), but then loses the CPU.

2. Backend B completes a VACUUM, finds it can advance datfrozenxid,
marks some relevant XIDs as frozen in the notify SLRU, and truncates
clog.

3. Backend A gets control back, tries to discover the state of
some XID that's still present in its local copy of the XID page,
and fails.

Step 2 will take long enough that this isn't very plausible
timing-wise, but it's still theoretically a hole.

All of this is a problem mainly because of the presumption that
holding an SLRU bank lock for a long time is bad.  I wonder how
dangerous that really is.

			regards, tom lane

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-10-29T17:13:16Z

On Wed, Oct 29, 2025, at 17:48, Tom Lane wrote:
> However ... that won't actually work, the reason being that
> asyncQueueProcessPageEntries() doesn't work directly from an SLRU page
> but from a local copy.  Even if it were to modify the state of that
> copy, no other backend would see the effects.
>
> The reason it's like that is stated in the comments:
>
>  * The current page must have been fetched into page_buffer from shared
>  * memory.  (We could access the page right in shared memory, but that
>  * would imply holding the SLRU bank lock throughout this routine.)
>
> The patch proposed here likewise appears to involve holding an SLRU
> bank lock throughout what could be a significant number of
> TransactionIdDidCommit tests.  That seems like it could result in a
> pretty bad "burp" in NOTIFY throughput.  That problem is ameliorated
> by only doing it when VACUUM is trying to advance datfrozenxid, but
> still I wonder if we can't find a less concurrency-unfriendly answer.
...
> All of this is a problem mainly because of the presumption that
> holding an SLRU bank lock for a long time is bad.  I wonder how
> dangerous that really is.

Ops. Sounds scary.

I don't know if others have looked at the v12-vacuum_notify_queue_cleanup
approach; if it's bad, it would be helpful to understand why.

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-10-30T13:25:31Z

On 29/10/2025 18:48, Tom Lane wrote:
> However ... that won't actually work, the reason being that
> asyncQueueProcessPageEntries() doesn't work directly from an SLRU page
> but from a local copy.  Even if it were to modify the state of that
> copy, no other backend would see the effects.

Ah, I missed that local copy.

> The reason it's like that is stated in the comments:
> 
>   * The current page must have been fetched into page_buffer from shared
>   * memory.  (We could access the page right in shared memory, but that
>   * would imply holding the SLRU bank lock throughout this routine.)
> 
> The patch proposed here likewise appears to involve holding an SLRU
> bank lock throughout what could be a significant number of
> TransactionIdDidCommit tests.  That seems like it could result in a
> pretty bad "burp" in NOTIFY throughput.  That problem is ameliorated
> by only doing it when VACUUM is trying to advance datfrozenxid, but
> still I wonder if we can't find a less concurrency-unfriendly answer.

Hmm. I thought the original comment was more worried about sending the 
message to the client being slow. But I guess the 
TransactionIdDidCommit() calls are not free either.

> The local-copy behavior also means that this patch isn't quite a 100%
> fix.  We could have a race condition like so:
> 
> 1. Backend A grabs a copy of some SLRU page and begins running
> asyncQueueProcessPageEntries(), but then loses the CPU.
> 
> 2. Backend B completes a VACUUM, finds it can advance datfrozenxid,
> marks some relevant XIDs as frozen in the notify SLRU, and truncates
> clog.
> 
> 3. Backend A gets control back, tries to discover the state of
> some XID that's still present in its local copy of the XID page,
> and fails.
> 
> Step 2 will take long enough that this isn't very plausible
> timing-wise, but it's still theoretically a hole.

It would be better than what we have now, but sure would be nice to fix 
the problem fully..

A straightforward fix would be to introduce another lock, 
NotifyQueueVacuumLock, that's held in shared mode across 
asyncQueueReadAllNotifications(), and acquired in exclusive mode by 
freezing. That way freezing would wait out the concurrent readers that 
might have local copies. Or some other similar mechanism for waiting 
them out with e.g. condition variables.

> All of this is a problem mainly because of the presumption that
> holding an SLRU bank lock for a long time is bad.  I wonder how
> dangerous that really is.

It was worse in older versions before the SLRU banks were introduced.

Holding the lock while we send the notification to the client is not 
acceptable. But perhaps we could split asyncQueueProcessPageEntries() 
into two parts:

1. Do the TransactionIdDidCommit() checks on each entry and copy the 
visible entries to local memory.
2. Release lock and send the notifications to the backend.

Joel, since you've been working on some optimizations in this area too, 
would you happen to have some suitable performance test scripts for this?

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-10-30T23:08:04Z

On Thu, Oct 30, 2025, at 14:25, Heikki Linnakangas wrote:
> Joel, since you've been working on some optimizations in this area too, 
> would you happen to have some suitable performance test scripts for this?

Glad you asked. I'm actually working on a benchmark+correctness tester.
It's very much work-in-progress though, don't look too much at the code,
or your eyes will bleed.

It's a combined benchmark + correctness tester, that verifies that only
the expected notifications are received on the expected connections,
while at the same time doing timing measurements.

```
% ./pg_bench_lino --help
Usage: ./pg_bench_lino [OPTIONS]

PostgreSQL LISTEN/NOTIFY Benchmark Tool (CLI version)

Options:
  -c, --connections N       Number of database connections (default: 1)
  -n, --channels N          Number of notification channels (default: 1)
  -t, --tick-ms N           Tick interval in milliseconds (default: 1)
  -l, --listen-prob N       LISTEN probability % (default: 0.1, 0 to disable)
  -u, --unlisten-prob N     UNLISTEN probability % (default: 0.05, 0 to disable)
  -p, --notify-prob N       NOTIFY probability % (default: 1.0, 0 to disable)
  -a, --unlisten-all-prob N UNLISTEN * probability % (default: 0.01, 0 to disable)
  -T, --ticks N             Number of ticks to run (REQUIRED)
  -s, --seed N              Random seed for reproducibility (default: current time)
  -h, --help                Show this help message
```

Example run on my MacBook M3 Max

--- master:

% ./pg_bench_lino -t 0 -c 100 -n 1000 -l 10 -u 0 -p 100 -s 42 -T 10000
Initializing 100 connections and 1000 channels...
Initialization complete. Starting benchmark...

========================================================
PostgreSQL LISTEN/NOTIFY Benchmark Results
========================================================

Configuration:
  Connections: 100
  Channels: 1000
  Tick interval: 0 ms
  LISTEN probability: 10.0000%
  UNLISTEN probability: 0.0000%
  NOTIFY probability: 100.0000%
  UNLISTEN * probability: 0.0100%
  Ticks executed: 10000
  Random seed: 42

Final State:
  Active connections: 100
  Active channels: 1000
  Listening pairs: 989
  Correctness errors: 0

========================================================
Operation Statistics:
Operation            Count      Min(ms)      Avg(ms)      Max(ms)
--------------------------------------------------------
LISTEN                1009        0.016        0.029        0.291
UNLISTEN                 0            -            -            -
UNLISTEN *               1        0.071        0.071        0.071
NOTIFY                9989        0.064        0.247        2.163
NOTIFY delivery       9989        0.072        0.277        2.170

========================================================
NOTIFY Delivery Time Distribution:

0.05-0.1ms   # 4 (0.0%)
0.1-0.15ms   ## 263 (2.6%)
0.15-0.2ms   ####### 881 (8.8%)
0.2-0.3ms    ################################################## 5907 (59.1%)
0.3-0.4ms    #################### 2388 (23.9%)
0.4-0.5ms    #### 525 (5.3%)
0.5-0.75ms   # 18 (0.2%)
1-2ms        # 2 (0.0%)
2-5ms        # 1 (0.0%)


-- Optimization v22 patch:

% ./pg_bench_lino -t 0 -c 100 -n 1000 -l 10 -u 0 -p 100 -s 42 -T 10000
Initializing 100 connections and 1000 channels...
Initialization complete. Starting benchmark...

========================================================
PostgreSQL LISTEN/NOTIFY Benchmark Results
========================================================

Configuration:
  Connections: 100
  Channels: 1000
  Tick interval: 0 ms
  LISTEN probability: 10.0000%
  UNLISTEN probability: 0.0000%
  NOTIFY probability: 100.0000%
  UNLISTEN * probability: 0.0100%
  Ticks executed: 10000
  Random seed: 42

Final State:
  Active connections: 100
  Active channels: 1000
  Listening pairs: 989
  Correctness errors: 0

========================================================
Operation Statistics:
Operation            Count      Min(ms)      Avg(ms)      Max(ms)
--------------------------------------------------------
LISTEN                1009        0.015        0.023        0.340
UNLISTEN                 0            -            -            -
UNLISTEN *               1        0.056        0.056        0.056
NOTIFY                9989        0.018        0.031        2.037
NOTIFY delivery       9989        0.022        0.062        2.056

========================================================
NOTIFY Delivery Time Distribution:

0-0.05ms     ################################################## 5002 (50.1%)
0.05-0.1ms   ######################################### 4130 (41.3%)
0.1-0.15ms   # 77 (0.8%)
0.15-0.2ms   #### 427 (4.3%)
0.2-0.3ms    ### 338 (3.4%)
0.3-0.4ms    # 11 (0.1%)
0.4-0.5ms    # 1 (0.0%)
2-5ms        # 3 (0.0%)

To compile:
gcc -Wall -Wextra -g -I"$(pg_config --includedir)" -c pg_bench_lino.c -o pg_bench_lino.o
gcc pg_bench_lino.o -L"$(pg_config --libdir)" -lpq -o pg_bench_lino

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-10-30T23:27:11Z

On Fri, Oct 31, 2025, at 00:08, Joel Jacobson wrote:
> On Thu, Oct 30, 2025, at 14:25, Heikki Linnakangas wrote:
>> Joel, since you've been working on some optimizations in this area too, 
>> would you happen to have some suitable performance test scripts for this?
>
> Glad you asked. I'm actually working on a benchmark+correctness tester.
> It's very much work-in-progress though, don't look too much at the code,
> or your eyes will bleed.
>
> It's a combined benchmark + correctness tester, that verifies that only
> the expected notifications are received on the expected connections,
> while at the same time doing timing measurements.

To run multiple pg_bench_lino processes in parallell to simulate
concurrent workloads, I realized the randomization of the channel names
and payloads were not random enough to avoid collissions. New version
attached that uses real UUIDs for channel names and payloads.

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-10-31T16:40:03Z

On 31/10/2025 01:27, Joel Jacobson wrote:
> On Fri, Oct 31, 2025, at 00:08, Joel Jacobson wrote:
>> On Thu, Oct 30, 2025, at 14:25, Heikki Linnakangas wrote:
>>> Joel, since you've been working on some optimizations in this area too,
>>> would you happen to have some suitable performance test scripts for this?
>>
>> Glad you asked. I'm actually working on a benchmark+correctness tester.
>> It's very much work-in-progress though, don't look too much at the code,
>> or your eyes will bleed.
>>
>> It's a combined benchmark + correctness tester, that verifies that only
>> the expected notifications are received on the expected connections,
>> while at the same time doing timing measurements.
> 
> To run multiple pg_bench_lino processes in parallell to simulate
> concurrent workloads, I realized the randomization of the channel names
> and payloads were not random enough to avoid collissions. New version
> attached that uses real UUIDs for channel names and payloads.

Thanks! Here's a sketch for holding the bank lock across 
TransactionIdDidCommit() calls. In quick testing with your test program, 
I can't see any performance difference. However, I'm not quite sure what 
options I should be using to stress this. My gut feeling is that it's 
fine, but it'd be nice to do construct a real worst case test case to be 
sure.

There are some opportunities for micro-optimizations here:

* IsListeningOn() is kind of expensive if a backend is listening 
multiple channels. We really should turn that into a hash table. As the 
patch stands, I'm doing the IsListeningOn() calls while holding the bank 
lock, but unless we speed up IsListeningOn() in those degenerate cases, 
we should perhaps call it only after copying and releasing the lock.

* If IsListeningOn() is made fast, it might make sense to call it before 
TransactionIdDidCommit().

* Implement the hint bits.

I don't know how much those matter. Again, a test case would be nice.

I'll work more on performance testing next week of this, if no one else 
picks that up.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-03T11:02:32Z

On 31/10/2025 18:40, Heikki Linnakangas wrote:
> On 31/10/2025 01:27, Joel Jacobson wrote:
>> On Fri, Oct 31, 2025, at 00:08, Joel Jacobson wrote:
>>> On Thu, Oct 30, 2025, at 14:25, Heikki Linnakangas wrote:
>>>> Joel, since you've been working on some optimizations in this area too,
>>>> would you happen to have some suitable performance test scripts for 
>>>> this?
>>>
>>> Glad you asked. I'm actually working on a benchmark+correctness tester.
>>> It's very much work-in-progress though, don't look too much at the code,
>>> or your eyes will bleed.
>>>
>>> It's a combined benchmark + correctness tester, that verifies that only
>>> the expected notifications are received on the expected connections,
>>> while at the same time doing timing measurements.
>>
>> To run multiple pg_bench_lino processes in parallell to simulate
>> concurrent workloads, I realized the randomization of the channel names
>> and payloads were not random enough to avoid collissions. New version
>> attached that uses real UUIDs for channel names and payloads.
> 
> Thanks! Here's a sketch for holding the bank lock across 
> TransactionIdDidCommit() calls. In quick testing with your test program, 
> I can't see any performance difference. However, I'm not quite sure what 
> options I should be using to stress this. My gut feeling is that it's 
> fine, but it'd be nice to do construct a real worst case test case to be 
> sure.

I wrote another little stand-alone performance test program for this, 
attached. It launches N connections that send NOTIFYs to a single 
channel as fast as possible, and M threads that listen for the 
notifications. I ran it with different combinations of N and M, on 
'master' and on REL_14_STABLE (which didn't have SLRU banks) and I 
cannot discern any performance difference from these patches. So it 
seems that holding the SLRU (bank) lock across the 
TransactionIdDidCommit() calls is fine.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-11-03T21:45:00Z

On Mon, Nov 3, 2025, at 12:02, Heikki Linnakangas wrote:
> I wrote another little stand-alone performance test program for this, 
> attached. It launches N connections that send NOTIFYs to a single 
> channel as fast as possible, and M threads that listen for the 
> notifications. I ran it with different combinations of N and M, on 
> 'master' and on REL_14_STABLE (which didn't have SLRU banks) and I 
> cannot discern any performance difference from these patches. So it 
> seems that holding the SLRU (bank) lock across the 
> TransactionIdDidCommit() calls is fine.

Nice! That for the benchmark code! I took the liberty of hacking a bit
on it, and added support for multiple channels, with separate listener
and notifier threads per channel. Each notification now carries the
notifier ID, a sequence number, and a send timestamp. Listeners verify
that sequence numbers arrive in order and record delivery latency. The
program collects latency measurements into fixed buckets and reports
them once per second together with total and per-second send/receive
counts.

Also added a short delay before starting notifiers so that listeners
have time to issue their LISTEN commands, and a new --channels option,
and the meaning of --listeners and --notifiers was changed to apply per
channel.

Also fixed so the code could be compiled outside of the PostgreSQL
source code repo, if wanting to build this as stand-alone tool.

I've benchmarked master vs 0001+0002 and can't notice any differences;
see attached output from benchmark runs.

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-04T12:09:58Z

On 03/11/2025 23:45, Joel Jacobson wrote:
> On Mon, Nov 3, 2025, at 12:02, Heikki Linnakangas wrote:
>> I wrote another little stand-alone performance test program for this,
>> attached. It launches N connections that send NOTIFYs to a single
>> channel as fast as possible, and M threads that listen for the
>> notifications. I ran it with different combinations of N and M, on
>> 'master' and on REL_14_STABLE (which didn't have SLRU banks) and I
>> cannot discern any performance difference from these patches. So it
>> seems that holding the SLRU (bank) lock across the
>> TransactionIdDidCommit() calls is fine.
> 
> Nice! That for the benchmark code! I took the liberty of hacking a bit
> on it, and added support for multiple channels, with separate listener
> and notifier threads per channel. Each notification now carries the
> notifier ID, a sequence number, and a send timestamp. Listeners verify
> that sequence numbers arrive in order and record delivery latency. The
> program collects latency measurements into fixed buckets and reports
> them once per second together with total and per-second send/receive
> counts.
> 
> Also added a short delay before starting notifiers so that listeners
> have time to issue their LISTEN commands, and a new --channels option,
> and the meaning of --listeners and --notifiers was changed to apply per
> channel.
> 
> Also fixed so the code could be compiled outside of the PostgreSQL
> source code repo, if wanting to build this as stand-alone tool.
> 
> I've benchmarked master vs 0001+0002 and can't notice any differences;
> see attached output from benchmark runs.

Thanks. After some further testing, I was able to find a scenario where 
this patch significantly reduces performance: if the listening backends 
subscribe to a massive number of channels, like 10000, they spend a lot 
of time scanning the linked list of subscribed channels in 
IsListeningOn(). With the patch, those checks were performed while 
holding the SLRU lock, and it started to show up as lock contention 
between notifiers and listeners. To demonstrate that, attached is 
another version of the test program that adds an --extra-channels=N 
argument. If you set it to e.g. 10000, each listener backends calls 
LISTEN on 10000 additional channels that are never notified. They just 
make the listenChannels list longer. With that and the patches I posted 
previously, I'm getting:

$ PGHOST=localhost PGDB=postgres://localhost/postgres 
./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 
--extra-channels=10000
10 s: 12716 sent (1274/s), 635775 received (63737/s)
  0.00-0.01ms                0 (0.0%) avg: 0.000ms
  0.01-0.10ms                0 (0.0%) avg: 0.000ms
  0.10-1.00ms     #          1915 (0.3%) avg: 0.807ms
  1.00-10.00ms    #########  633550 (99.7%) avg: 3.502ms
  10.00-100.00ms  #          310 (0.0%) avg: 11.423ms
 >100.00ms                  0 (0.0%) avg: 0.000ms
^C

Whereas on 'master', I see about 2-3x more notifies/s:

$ PGHOST=localhost PGDB=postgres://localhost/postgres 
./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 
--extra-channels=10000
10 s: 32057 sent (3296/s), 1602995 received (164896/s)
  0.00-0.01ms                0 (0.0%) avg: 0.000ms
  0.01-0.10ms     #          11574 (0.7%) avg: 0.078ms
  0.10-1.00ms     ######     1082960 (67.6%) avg: 0.577ms
  1.00-10.00ms    ###        508199 (31.7%) avg: 1.489ms
  10.00-100.00ms  #          262 (0.0%) avg: 16.178ms
 >100.00ms                  0 (0.0%) avg: 0.000ms
^C

Fortunately that's easy to fix: We can move the IsListeningOn() check 
after releasing the lock. See attached.

The elephant in the room of course is that a lookup in a linked list is 
O(n) and it would be very straightforward to replace it with e.g. a hash 
table. We should do that irrespective of this bug fix. But I'm inclined 
to do it as a separate followup patch.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-04T14:40:31Z

Hi,

On Tue, Nov 4, 2025 at 3:10 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 03/11/2025 23:45, Joel Jacobson wrote:
> > On Mon, Nov 3, 2025, at 12:02, Heikki Linnakangas wrote:
> >> I wrote another little stand-alone performance test program for this,
> >> attached. It launches N connections that send NOTIFYs to a single
> >> channel as fast as possible, and M threads that listen for the
> >> notifications. I ran it with different combinations of N and M, on
> >> 'master' and on REL_14_STABLE (which didn't have SLRU banks) and I
> >> cannot discern any performance difference from these patches. So it
> >> seems that holding the SLRU (bank) lock across the
> >> TransactionIdDidCommit() calls is fine.
> >
> > Nice! That for the benchmark code! I took the liberty of hacking a bit
> > on it, and added support for multiple channels, with separate listener
> > and notifier threads per channel. Each notification now carries the
> > notifier ID, a sequence number, and a send timestamp. Listeners verify
> > that sequence numbers arrive in order and record delivery latency. The
> > program collects latency measurements into fixed buckets and reports
> > them once per second together with total and per-second send/receive
> > counts.
> >
> > Also added a short delay before starting notifiers so that listeners
> > have time to issue their LISTEN commands, and a new --channels option,
> > and the meaning of --listeners and --notifiers was changed to apply per
> > channel.
> >
> > Also fixed so the code could be compiled outside of the PostgreSQL
> > source code repo, if wanting to build this as stand-alone tool.
> >
> > I've benchmarked master vs 0001+0002 and can't notice any differences;
> > see attached output from benchmark runs.
>
> Thanks. After some further testing, I was able to find a scenario where
> this patch significantly reduces performance: if the listening backends
> subscribe to a massive number of channels, like 10000, they spend a lot
> of time scanning the linked list of subscribed channels in
> IsListeningOn(). With the patch, those checks were performed while
> holding the SLRU lock, and it started to show up as lock contention
> between notifiers and listeners. To demonstrate that, attached is
> another version of the test program that adds an --extra-channels=N
> argument. If you set it to e.g. 10000, each listener backends calls
> LISTEN on 10000 additional channels that are never notified. They just
> make the listenChannels list longer. With that and the patches I posted
> previously, I'm getting:
>
> $ PGHOST=localhost PGDB=postgres://localhost/postgres
> ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1
> --extra-channels=10000
> 10 s: 12716 sent (1274/s), 635775 received (63737/s)
>   0.00-0.01ms                0 (0.0%) avg: 0.000ms
>   0.01-0.10ms                0 (0.0%) avg: 0.000ms
>   0.10-1.00ms     #          1915 (0.3%) avg: 0.807ms
>   1.00-10.00ms    #########  633550 (99.7%) avg: 3.502ms
>   10.00-100.00ms  #          310 (0.0%) avg: 11.423ms
>  >100.00ms                  0 (0.0%) avg: 0.000ms
> ^C
>
> Whereas on 'master', I see about 2-3x more notifies/s:
>
> $ PGHOST=localhost PGDB=postgres://localhost/postgres
> ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1
> --extra-channels=10000
> 10 s: 32057 sent (3296/s), 1602995 received (164896/s)
>   0.00-0.01ms                0 (0.0%) avg: 0.000ms
>   0.01-0.10ms     #          11574 (0.7%) avg: 0.078ms
>   0.10-1.00ms     ######     1082960 (67.6%) avg: 0.577ms
>   1.00-10.00ms    ###        508199 (31.7%) avg: 1.489ms
>   10.00-100.00ms  #          262 (0.0%) avg: 16.178ms
>  >100.00ms                  0 (0.0%) avg: 0.000ms
> ^C
>
> Fortunately that's easy to fix: We can move the IsListeningOn() check
> after releasing the lock. See attached.
>

Thank you for working on this!

It seems that 0002 handles errors during NotifyMyFrontEnd a little differently.

With master, in case of a failure during NotifyMyFrontEnd, the
listener's position in PG_FINALLY is set to the beginning of the next
notification, since we advance the "current" position only if the
previous notification was successfully sent.

With 0002, we advance the "current" position while copying
notifications to the local buffer, and begin sending them after the
position has already been advanced for all copied notifications. So in
case of a failure, the listener's position in PG_FINALLY is set to the
beginning of the next page or queue head. This means we can lose
notifications that were copied but were not sent.

If we want to preserve the previous behavior, maybe we could use a new
local position while copying notifications and only advance the
"current" position while sending notifications to the frontend?


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Tom Lane <tgl@sss.pgh.pa.us> — 2025-11-04T14:44:28Z

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Fortunately that's easy to fix: We can move the IsListeningOn() check 
> after releasing the lock. See attached.
> The elephant in the room of course is that a lookup in a linked list is 
> O(n) and it would be very straightforward to replace it with e.g. a hash 
> table. We should do that irrespective of this bug fix. But I'm inclined 
> to do it as a separate followup patch.

There is a different patch series that's concerned with improving
NOTIFY throughput [1].  I think this one should just focus on
fixing the XID problem.

			regards, tom lane

[1] https://commitfest.postgresql.org/patch/6078/

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Álvaro Herrera <alvherre@kurilemu.de> — 2025-11-04T16:27:48Z

On 2025-Nov-04, Heikki Linnakangas wrote:

> From 7c342e6efffc8d59c2e7658f6f2f3b138d02e0bb Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Tue, 4 Nov 2025 13:22:08 +0200
> Subject: [PATCH v2 1/2] Fix bug where we truncated CLOG that was still needed
>  by LISTEN/NOTIFY

> The async notification queue contains the XID of the sender, and when
> processing notifications we call TransactionIdDidCommit() on the
> XID. But we had no safeguards to prevent the CLOG segments containing
> those XIDs from being truncated away. As a result, if a backend didn't
> for some reason process its notifications for a long time, or when a
> new backend issued LISTEN, you could get an error like:
> 
> test=# listen c21;
> ERROR:  58P01: could not access status of transaction 14279685
> DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
> LOCATION:  SlruReportIOError, slru.c:1087
> 
> Note: This commit is not a full fix. [...]

The commit message doesn't say just _what_ does the patch do to fix the
bug though.  Looking through the code, I think you're setting XIDs in
the async queue to frozenXid, but I think the commit message should
explain that.

Thanks for spending time on this.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La fuerza no está en los medios físicos
sino que reside en una voluntad indomable" (Gandhi)

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-11-04T20:43:15Z

On Tue, Nov 4, 2025, at 13:09, Heikki Linnakangas wrote:
> With that and the patches I posted 
> previously, I'm getting:
>
> $ PGHOST=localhost PGDB=postgres://localhost/postgres 
> ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 
> --extra-channels=10000
> 10 s: 12716 sent (1274/s), 635775 received (63737/s)
>   0.00-0.01ms                0 (0.0%) avg: 0.000ms
>   0.01-0.10ms                0 (0.0%) avg: 0.000ms
>   0.10-1.00ms     #          1915 (0.3%) avg: 0.807ms
>   1.00-10.00ms    #########  633550 (99.7%) avg: 3.502ms
>   10.00-100.00ms  #          310 (0.0%) avg: 11.423ms
>  >100.00ms                  0 (0.0%) avg: 0.000ms
> ^C
>
> Whereas on 'master', I see about 2-3x more notifies/s:
>
> $ PGHOST=localhost PGDB=postgres://localhost/postgres 
> ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 
> --extra-channels=10000
> 10 s: 32057 sent (3296/s), 1602995 received (164896/s)
>   0.00-0.01ms                0 (0.0%) avg: 0.000ms
>   0.01-0.10ms     #          11574 (0.7%) avg: 0.078ms
>   0.10-1.00ms     ######     1082960 (67.6%) avg: 0.577ms
>   1.00-10.00ms    ###        508199 (31.7%) avg: 1.489ms
>   10.00-100.00ms  #          262 (0.0%) avg: 16.178ms
>  >100.00ms                  0 (0.0%) avg: 0.000ms

Nice with the --extra-channels addition!

Below results on my MacBook Pro M3 Max.
I ran them for 30s to get more stable distributions.

% ninja install
C compiler for the host machine: cc (clang 17.0.0 "Apple clang version 17.0.0 (clang-1700.0.13.5)")
C linker for the host machine: cc ld64 1167.5
Host machine cpu family: aarch64
Host machine cpu: aarch64
buildtype: release

% gcc -Wall -Wextra -O2 -pthread -I/Users/joel/pg19/include/postgresql/server -I/Users/joel/pg19/include -o async-notify-test-3 async-notify-test-3.c -L/Users/joel/pg19/lib -lpq -pthread -lm

% ./async-notify-test-3 --listeners=50 --notifiers=4 --channels=1 --extra-channels=10000

# master (c98dffcb7c7010d216dc16d22cb594ef7d65fde1)
30 s: 293329 sent (9656/s), 14653182 received (480181/s)
 0.00-0.01ms                0 (0.0%) avg: 0.000ms
 0.01-0.10ms     #          298195 (2.0%) avg: 0.078ms
 0.10-1.00ms     ##         3060085 (20.9%) avg: 0.390ms
 1.00-10.00ms    ##         3310362 (22.6%) avg: 4.242ms
 10.00-100.00ms  #####      7482595 (51.1%) avg: 44.441ms
>100.00ms       #          501945 (3.4%) avg: 119.257ms

# 0001-Fix-bug-where-we-truncated-CLOG-that-was-still-neede.patch
# 0002-Hold-SLRU-bank-lock-across-TransactionIdDidCommit-in.patch
30 s: 87462 sent (3174/s), 4373100 received (158698/s)
 0.00-0.01ms                0 (0.0%) avg: 0.000ms
 0.01-0.10ms     #          16 (0.0%) avg: 0.091ms
 0.10-1.00ms     #          576554 (13.2%) avg: 0.798ms
 1.00-10.00ms    ########   3796530 (86.8%) avg: 1.589ms
 10.00-100.00ms             0 (0.0%) avg: 0.000ms
>100.00ms                  0 (0.0%) avg: 0.000ms

# v2-0001-Fix-bug-where-we-truncated-CLOG-that-was-still-ne.patch
# v2-0002-Fix-remaining-race-condition-with-CLOG-truncation.patch
30 s: 274342 sent (10134/s), 13708284 received (525274/s)
 0.00-0.01ms                0 (0.0%) avg: 0.000ms
 0.01-0.10ms     #          241981 (1.8%) avg: 0.079ms
 0.10-1.00ms     ###        4213484 (30.7%) avg: 0.399ms
 1.00-10.00ms    #          2727917 (19.9%) avg: 4.033ms
 10.00-100.00ms  ####       5920198 (43.2%) avg: 46.500ms
>100.00ms       #          604712 (4.4%) avg: 118.336ms

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Joel Jacobson <joel@compiler.org> — 2025-11-05T01:10:45Z

On Tue, Nov 4, 2025, at 13:09, Heikki Linnakangas wrote:
> The elephant in the room of course is that a lookup in a linked list is 
> O(n) and it would be very straightforward to replace it with e.g. a hash 
> table. We should do that irrespective of this bug fix. But I'm inclined 
> to do it as a separate followup patch.

Thanks for the idea of replacing it with a local hash table.
I've implemented it in the patch set we're working on:
https://commitfest.postgresql.org/patch/6078/
https://www.postgresql.org/message-id/12c08e29-c21c-4a3d-a269-a48a1a26b18d%40app.fastmail.com

/Joel

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-05T09:21:29Z

On 04/11/2025 16:40, Arseniy Mukhin wrote:
> It seems that 0002 handles errors during NotifyMyFrontEnd a little differently.
> 
> With master, in case of a failure during NotifyMyFrontEnd, the
> listener's position in PG_FINALLY is set to the beginning of the next
> notification, since we advance the "current" position only if the
> previous notification was successfully sent.
> 
> With 0002, we advance the "current" position while copying
> notifications to the local buffer, and begin sending them after the
> position has already been advanced for all copied notifications. So in
> case of a failure, the listener's position in PG_FINALLY is set to the
> beginning of the next page or queue head. This means we can lose
> notifications that were copied but were not sent.

True.

> If we want to preserve the previous behavior, maybe we could use a new
> local position while copying notifications and only advance the
> "current" position while sending notifications to the frontend?

That's not good. The loop advances 'current' before calling 
TransactionIdDidCommit() to ensure that if there's a broken entry in the 
queue for which TransactionIdDidCommit() throws an error, we advance 
'current' past that point. Otherwise we would get back later to try to 
process the same broken entry again and again.

We could put the NotifyMyFrontEnd() calls in a PG_FINALLY block, so that 
the copied notifications get sent even on error. But I'm a reluctant to 
put that in PG_FINALLY, it's not clear to me if NotifyMyFrontEnd() is 
safe to call during error processing. And if the client is not 
processing the notifications, the abort processing could be delayed for 
a long time.

One idea is to close the client connection on a TransactionIdDidCommit 
error, i.e. make the error FATAL. The current behavior where we skip the 
notification seems problematic already. Closing the connection would be 
a clear signal that some notifications have been lost.

Or we can just accept new behavior that if TransactionIdDidCommit() 
fails, we might lose more notifications than the one that caused the error.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-05T12:40:38Z

On Wed Nov 5, 2025 at 6:21 AM -03, Heikki Linnakangas wrote:
> On 04/11/2025 16:40, Arseniy Mukhin wrote:
>> It seems that 0002 handles errors during NotifyMyFrontEnd a little differently.
>>
>> With master, in case of a failure during NotifyMyFrontEnd, the
>> listener's position in PG_FINALLY is set to the beginning of the next
>> notification, since we advance the "current" position only if the
>> previous notification was successfully sent.
>>
>> With 0002, we advance the "current" position while copying
>> notifications to the local buffer, and begin sending them after the
>> position has already been advanced for all copied notifications. So in
>> case of a failure, the listener's position in PG_FINALLY is set to the
>> beginning of the next page or queue head. This means we can lose
>> notifications that were copied but were not sent.
>
> True.
>
>> If we want to preserve the previous behavior, maybe we could use a new
>> local position while copying notifications and only advance the
>> "current" position while sending notifications to the frontend?
>
> That's not good. The loop advances 'current' before calling
> TransactionIdDidCommit() to ensure that if there's a broken entry in the
> queue for which TransactionIdDidCommit() throws an error, we advance
> 'current' past that point. Otherwise we would get back later to try to
> process the same broken entry again and again.
>
> We could put the NotifyMyFrontEnd() calls in a PG_FINALLY block, so that
> the copied notifications get sent even on error. But I'm a reluctant to
> put that in PG_FINALLY, it's not clear to me if NotifyMyFrontEnd() is
> safe to call during error processing. And if the client is not
> processing the notifications, the abort processing could be delayed for
> a long time.
>
> One idea is to close the client connection on a TransactionIdDidCommit
> error, i.e. make the error FATAL. The current behavior where we skip the
> notification seems problematic already. Closing the connection would be
> a clear signal that some notifications have been lost.
>
> Or we can just accept new behavior that if TransactionIdDidCommit()
> fails, we might lose more notifications than the one that caused the error.
>
I think that we may have an error on TransactionIdDidCommit() and on
NotifyMyFrontEnd() right?

In case of an error on TransactionIdDidCommit() I think that we will
have the same behavior as we advance the "current" position before of
DidCommit call the PG_FINALLY block will set the backend position past
the failing notification entry.

I think that the problem is on NotifyMyFrontEnd() call that if it fails
we will set the current backend position to the head of the queue
or the beginning of the next page possibly losing notifications.

How bad would be to store the notification entries within a list and
store the next position with the notification entry and then wrap the
NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
saved "next position" on PG_CATCH? Something like this:
do
{
    ListEntry le;
    qe = (AsyncQueueEntry *) (page_buffer + QUEUE_POS_OFFSET(thisentry));
    reachedEndOfPage = asyncQueueAdvance(current, qe->length);
    ...
    else if (TransactionIdDidCommit(qe->xid))
    {
        le->entry = qe;
        le->nextPos = current;
    }
} while (!reachedEndOfPage);

foreach(lc, notificationEntries)
{
    ListEntry *le = lfirst(lc);
    AsyncQueueEntry *qe = le->entry;
    char           *channel = qe->data;

    PG_TRY()
    {
        if (IsListeningOn(channel))
        {
            char           *payload = qe->data + strlen(channel) + 1;
            NotifyMyFrontEnd(channel, payload, qe->srcPid);
        }
    }
    PG_CATCH()
    {
        current = le->nextPos;
        PG_RE_THROW();
    }
}

My concern with this idea is about the overhead of using a list to store
the notification entries and also the PG_TRY block but perhaps it's
worth it to prevent losing notifications?

--
Matheus Alcantara
EDB: http://www.enterprisedb.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-05T12:59:15Z

On 05/11/2025 14:40, Matheus Alcantara wrote:
> On Wed Nov 5, 2025 at 6:21 AM -03, Heikki Linnakangas wrote:
>> On 04/11/2025 16:40, Arseniy Mukhin wrote:
>>> If we want to preserve the previous behavior, maybe we could use a new
>>> local position while copying notifications and only advance the
>>> "current" position while sending notifications to the frontend?
>>
>> That's not good. The loop advances 'current' before calling
>> TransactionIdDidCommit() to ensure that if there's a broken entry in the
>> queue for which TransactionIdDidCommit() throws an error, we advance
>> 'current' past that point. Otherwise we would get back later to try to
>> process the same broken entry again and again.
>>
>> We could put the NotifyMyFrontEnd() calls in a PG_FINALLY block, so that
>> the copied notifications get sent even on error. But I'm a reluctant to
>> put that in PG_FINALLY, it's not clear to me if NotifyMyFrontEnd() is
>> safe to call during error processing. And if the client is not
>> processing the notifications, the abort processing could be delayed for
>> a long time.
>>
>> One idea is to close the client connection on a TransactionIdDidCommit
>> error, i.e. make the error FATAL. The current behavior where we skip the
>> notification seems problematic already. Closing the connection would be
>> a clear signal that some notifications have been lost.
>>
>> Or we can just accept new behavior that if TransactionIdDidCommit()
>> fails, we might lose more notifications than the one that caused the error.
>>
> I think that we may have an error on TransactionIdDidCommit() and on
> NotifyMyFrontEnd() right?

True.

> In case of an error on TransactionIdDidCommit() I think that we will
> have the same behavior as we advance the "current" position before of
> DidCommit call the PG_FINALLY block will set the backend position past
> the failing notification entry.

With my patch, if TransactionIdDidCommit() fails, we will lose all the 
notifications that we have buffered in the local buffer but haven't 
passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single 
notification, the one where TransactionIdDidCommit() failed.

> I think that the problem is on NotifyMyFrontEnd() call that if it fails
> we will set the current backend position to the head of the queue
> or the beginning of the next page possibly losing notifications.

True, a failure on NotifyMyFrontEnd can happen too, and it will also 
lead to losing all the other notifications in the local buffer.

> How bad would be to store the notification entries within a list and
> store the next position with the notification entry and then wrap the
> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
> saved "next position" on PG_CATCH? Something like this:
> do
> {
>      ListEntry le;
>      qe = (AsyncQueueEntry *) (page_buffer + QUEUE_POS_OFFSET(thisentry));
>      reachedEndOfPage = asyncQueueAdvance(current, qe->length);
>      ...
>      else if (TransactionIdDidCommit(qe->xid))
>      {
>          le->entry = qe;
>          le->nextPos = current;
>      }
> } while (!reachedEndOfPage);
> 
> foreach(lc, notificationEntries)
> {
>      ListEntry *le = lfirst(lc);
>      AsyncQueueEntry *qe = le->entry;
>      char           *channel = qe->data;
> 
>      PG_TRY()
>      {
>          if (IsListeningOn(channel))
>          {
>              char           *payload = qe->data + strlen(channel) + 1;
>              NotifyMyFrontEnd(channel, payload, qe->srcPid);
>          }
>      }
>      PG_CATCH()
>      {
>          current = le->nextPos;
>          PG_RE_THROW();
>      }
> }

That addresses the failure on NotifyMyFrontEnd, but not on 
TransactionIdDidCommit.

IMHO we should just make these errors FATAL. TransactionIdDidCommit() 
should not really fail, after fixing the bug we're discussing. 
NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on 
an otherwise idle connection. Or it could fail if the client connection 
is lost, but then the backend is about to die anyway. And arguably 
closing the connection is better than losing even a single notification, 
anyway.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-05T18:17:19Z

On Wed, Nov 5, 2025 at 12:21 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 04/11/2025 16:40, Arseniy Mukhin wrote:
> > It seems that 0002 handles errors during NotifyMyFrontEnd a little differently.
> >
> > With master, in case of a failure during NotifyMyFrontEnd, the
> > listener's position in PG_FINALLY is set to the beginning of the next
> > notification, since we advance the "current" position only if the
> > previous notification was successfully sent.
> >
> > With 0002, we advance the "current" position while copying
> > notifications to the local buffer, and begin sending them after the
> > position has already been advanced for all copied notifications. So in
> > case of a failure, the listener's position in PG_FINALLY is set to the
> > beginning of the next page or queue head. This means we can lose
> > notifications that were copied but were not sent.
>
> True.
>
> > If we want to preserve the previous behavior, maybe we could use a new
> > local position while copying notifications and only advance the
> > "current" position while sending notifications to the frontend?
>
> That's not good. The loop advances 'current' before calling
> TransactionIdDidCommit() to ensure that if there's a broken entry in the
> queue for which TransactionIdDidCommit() throws an error, we advance
> 'current' past that point. Otherwise we would get back later to try to
> process the same broken entry again and again.
>

Ouch, I failed to realise that this try/catch saves us from
TransactionIdDidCommit failure too.

The comment around PG_TRY says:

    /*
     * It is possible that we fail while trying to send a message to our
     * frontend (for example, because of encoding conversion failure).  If

So I thought that it's the main reason we have try/catch here.


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-05T19:02:52Z

On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
>> In case of an error on TransactionIdDidCommit() I think that we will
>> have the same behavior as we advance the "current" position before of
>> DidCommit call the PG_FINALLY block will set the backend position past
>> the failing notification entry.
>
> With my patch, if TransactionIdDidCommit() fails, we will lose all the 
> notifications that we have buffered in the local buffer but haven't 
> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single 
> notification, the one where TransactionIdDidCommit() failed.
>
Yeah, that's true.

>> How bad would be to store the notification entries within a list and
>> store the next position with the notification entry and then wrap the
>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
>> saved "next position" on PG_CATCH? Something like this:
>
> [ ...]
>
> That addresses the failure on NotifyMyFrontEnd, but not on 
> TransactionIdDidCommit.
>
> IMHO we should just make these errors FATAL. TransactionIdDidCommit() 
> should not really fail, after fixing the bug we're discussing. 
> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on 
> an otherwise idle connection. Or it could fail if the client connection 
> is lost, but then the backend is about to die anyway. And arguably 
> closing the connection is better than losing even a single notification, 
> anyway.
>
My only concern with making these errors FATAL is that if a notification
entry causes a different, recoverable error, all subsequent messages
will be lost. This is because if backend die and the user open a new
connection and execute LISTEN on the channel it will not see these
notifications past the one that caused the error. I'm not sure if we are
completely safe from this case of a recoverable error, what do you
think?

-- 
Matheus Alcantara
EDB: http://www.enterprisedb.com

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-06T10:05:20Z

On 05/11/2025 21:02, Matheus Alcantara wrote:
> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
>>> In case of an error on TransactionIdDidCommit() I think that we will
>>> have the same behavior as we advance the "current" position before of
>>> DidCommit call the PG_FINALLY block will set the backend position past
>>> the failing notification entry.
>>
>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
>> notifications that we have buffered in the local buffer but haven't
>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
>> notification, the one where TransactionIdDidCommit() failed.
>>
> Yeah, that's true.
> 
>>> How bad would be to store the notification entries within a list and
>>> store the next position with the notification entry and then wrap the
>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
>>> saved "next position" on PG_CATCH? Something like this:
>>
>> [ ...]
>>
>> That addresses the failure on NotifyMyFrontEnd, but not on
>> TransactionIdDidCommit.
>>
>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
>> should not really fail, after fixing the bug we're discussing.
>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
>> an otherwise idle connection. Or it could fail if the client connection
>> is lost, but then the backend is about to die anyway. And arguably
>> closing the connection is better than losing even a single notification,
>> anyway.
>>
> My only concern with making these errors FATAL is that if a notification
> entry causes a different, recoverable error, all subsequent messages
> will be lost. This is because if backend die and the user open a new
> connection and execute LISTEN on the channel it will not see these
> notifications past the one that caused the error. I'm not sure if we are
> completely safe from this case of a recoverable error, what do you
> think?

I did some more testing of the current behavior, using the encoding 
conversion to cause an error:

In backend A:

SET client_encoding='latin1';
LISTEN foo;

Backend b:

NOTIFY foo, 'ハ';

Observations:

- If the connection is idle when the notification is received, the ERROR 
is turned into FATAL anyway:

postgres=# SET client_encoding='latin1';
SET
postgres=# LISTEN foo;
LISTEN
postgres=# select 1; -- do the NOTIFY in another connection before this
ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8" 
has no equivalent in encoding "LATIN1"
FATAL:  terminating connection because protocol synchronization was lost
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

- If there are multiple notifications pending, we stop processing the 
subsequent notifications on the first error. The subsequent 
notifications will only be processed when another notify interrupt is 
received, i.e. when a backend sends yet another notification.

I'm getting more and more convinced that escalating all ERRORs to FATALs 
during notify processing is the right way to go. Attached is a new patch 
set that does that.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Álvaro Herrera <alvherre@kurilemu.de> — 2025-11-06T10:18:13Z

On 2025-Nov-05, Matheus Alcantara wrote:

> My only concern with making these errors FATAL is that if a notification
> entry causes a different, recoverable error, all subsequent messages
> will be lost. This is because if backend die and the user open a new
> connection and execute LISTEN on the channel it will not see these
> notifications past the one that caused the error.

I don't think things are supposed to work that way -- I understand that
an application that connects is supposed to read the current state of
things, and then the notifies ensure that any changes from that point on
can be processed.  So if a connection dies on a FATAL, then you have to
establish your LISTENs again and obtain the current state of the world
at that point.  It doesn't miss anything, because any NOTIFYies that
occurred while the connection was down should be obtained when current
state is read.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"This is what I like so much about PostgreSQL.  Most of the surprises
are of the "oh wow!  That's cool" Not the "oh shit!" kind.  :)"
Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Álvaro Herrera <alvherre@kurilemu.de> — 2025-11-06T10:24:50Z

Hello,

I've been studying the code for freezing of items at vacuum-truncate
time, and one thing that jumps at me is that perhaps we ought to be more
proactive in freezing items immediately as they are processed.  We can
do that for any transactions that precede RecentXmin, because by that
point, every snapshot in the system should have that committed
transaction as no longer in progress anyway. Something like

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8ac7d989641..88d5ed2b461 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -2038,6 +2038,11 @@ asyncQueueProcessPageEntries(volatile QueuePosition *current,
 			}
 			else if (TransactionIdDidCommit(qe->xid))
 			{
+				if (TransactionIdPrecedes(RecentXmin, qe->xid))
+				{
+					elog(WARNING, "freezing an entry for transaction %u", qe->xid);
+					qe->xid = FrozenTransactionId;
+				}
 				memcpy(local_buf_end, qe, qe->length);
 				local_buf_end += qe->length;
 			}

If we do this, then the chances that we need to freeze items at vacuum
time are much lower, and we're not creating any danger that
TransactionIdDidCommit() errors any more than we have in the current
code.

Is there any reason this wouldn't work?


I noticed that async-notify-test-3 doesn't actually freeze any items by
itself ... you need to do "ALTER TABLE template0 allow_connections" and
then do vacuumdb -a in a loop in order for this to happen (or something
similar, I guess).  Otherwise the new code in AsyncNotifyFreezeXids is
never executed.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-06T11:20:37Z

On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 05/11/2025 21:02, Matheus Alcantara wrote:
> > On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
> >>> In case of an error on TransactionIdDidCommit() I think that we will
> >>> have the same behavior as we advance the "current" position before of
> >>> DidCommit call the PG_FINALLY block will set the backend position past
> >>> the failing notification entry.
> >>
> >> With my patch, if TransactionIdDidCommit() fails, we will lose all the
> >> notifications that we have buffered in the local buffer but haven't
> >> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
> >> notification, the one where TransactionIdDidCommit() failed.
> >>
> > Yeah, that's true.
> >
> >>> How bad would be to store the notification entries within a list and
> >>> store the next position with the notification entry and then wrap the
> >>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
> >>> saved "next position" on PG_CATCH? Something like this:
> >>
> >> [ ...]
> >>
> >> That addresses the failure on NotifyMyFrontEnd, but not on
> >> TransactionIdDidCommit.
> >>
> >> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
> >> should not really fail, after fixing the bug we're discussing.
> >> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
> >> an otherwise idle connection. Or it could fail if the client connection
> >> is lost, but then the backend is about to die anyway. And arguably
> >> closing the connection is better than losing even a single notification,
> >> anyway.
> >>
> > My only concern with making these errors FATAL is that if a notification
> > entry causes a different, recoverable error, all subsequent messages
> > will be lost. This is because if backend die and the user open a new
> > connection and execute LISTEN on the channel it will not see these
> > notifications past the one that caused the error. I'm not sure if we are
> > completely safe from this case of a recoverable error, what do you
> > think?
>
> I did some more testing of the current behavior, using the encoding
> conversion to cause an error:
>
> In backend A:
>
> SET client_encoding='latin1';
> LISTEN foo;
>
> Backend b:
>
> NOTIFY foo, 'ハ';
>
> Observations:
>
> - If the connection is idle when the notification is received, the ERROR
> is turned into FATAL anyway:
>
> postgres=# SET client_encoding='latin1';
> SET
> postgres=# LISTEN foo;
> LISTEN
> postgres=# select 1; -- do the NOTIFY in another connection before this
> ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
> has no equivalent in encoding "LATIN1"
> FATAL:  terminating connection because protocol synchronization was lost
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
>
> - If there are multiple notifications pending, we stop processing the
> subsequent notifications on the first error. The subsequent
> notifications will only be processed when another notify interrupt is
> received, i.e. when a backend sends yet another notification.
>
> I'm getting more and more convinced that escalating all ERRORs to FATALs
> during notify processing is the right way to go. Attached is a new patch
> set that does that.
>

One point about removing try/catch: when we execute LISTEN for the
very first time, we read the queue from the min(listener's pos) up to
the head. listenChannels is empty at the moment, so we never call
NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have
some problematic entry in the queue that we need to process during
initial LISTEN, it could block creation of new listeners. Is it
something we need to worry about?


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-06T11:35:28Z

Thanks for reviewing!

On 06/11/2025 12:24, Álvaro Herrera wrote:
> I've been studying the code for freezing of items at vacuum-truncate
> time, and one thing that jumps at me is that perhaps we ought to be more
> proactive in freezing items immediately as they are processed.  We can
> do that for any transactions that precede RecentXmin, because by that
> point, every snapshot in the system should have that committed
> transaction as no longer in progress anyway. Something like
> 
> diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
> index 8ac7d989641..88d5ed2b461 100644
> --- a/src/backend/commands/async.c
> +++ b/src/backend/commands/async.c
> @@ -2038,6 +2038,11 @@ asyncQueueProcessPageEntries(volatile QueuePosition *current,
>   			}
>   			else if (TransactionIdDidCommit(qe->xid))
>   			{
> +				if (TransactionIdPrecedes(RecentXmin, qe->xid))
> +				{
> +					elog(WARNING, "freezing an entry for transaction %u", qe->xid);
> +					qe->xid = FrozenTransactionId;
> +				}
>   				memcpy(local_buf_end, qe, qe->length);
>   				local_buf_end += qe->length;
>   			}
> 
> If we do this, then the chances that we need to freeze items at vacuum
> time are much lower, and we're not creating any danger that
> TransactionIdDidCommit() errors any more than we have in the current
> code.
> 
> Is there any reason this wouldn't work?

We could do that and it would be good for performance anyway. This is 
the "hint bit" idea that's been discussed in this thread. I think it 
would be better to do it with additional hint bits rather than by 
overwriting 'xid'. Having a separate COMMIITTED/ABORTED and FROZEN bits 
would allow setting the COMMITTED hint bit even when the xid is newer 
than RecentXmin.

Would need to mark the SLRU page dirty if we do that. Not sure if that 
requires holding an exclusive lock. And we should also replace aborted 
xids with InvalidTransactionId.

I feel that that should be a separate patch, and not backpatched. We'd 
still need all the other changes from this patch series even if we did 
that, it would be just an optimization.

> I noticed that async-notify-test-3 doesn't actually freeze any items by
> itself ... you need to do "ALTER TABLE template0 allow_connections" and
> then do vacuumdb -a in a loop in order for this to happen (or something
> similar, I guess).  Otherwise the new code in AsyncNotifyFreezeXids is
> never executed.

Right, async-notify-test-3 was just a performance test to verify that 
the holding the SLRU lock longer doesn't degrade performance. I haven't 
tried performance testing CLOG truncation itself, but it happens so 
rarely that I'm not too worried about it. See 
src/test/modules/xid_wraparound/t/004_notify_freeze.pl for a correctness 
test.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-06T12:05:59Z

On 06/11/2025 13:20, Arseniy Mukhin wrote:
> On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> On 05/11/2025 21:02, Matheus Alcantara wrote:
>>> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
>>>>> In case of an error on TransactionIdDidCommit() I think that we will
>>>>> have the same behavior as we advance the "current" position before of
>>>>> DidCommit call the PG_FINALLY block will set the backend position past
>>>>> the failing notification entry.
>>>>
>>>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
>>>> notifications that we have buffered in the local buffer but haven't
>>>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
>>>> notification, the one where TransactionIdDidCommit() failed.
>>>>
>>> Yeah, that's true.
>>>
>>>>> How bad would be to store the notification entries within a list and
>>>>> store the next position with the notification entry and then wrap the
>>>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
>>>>> saved "next position" on PG_CATCH? Something like this:
>>>>
>>>> [ ...]
>>>>
>>>> That addresses the failure on NotifyMyFrontEnd, but not on
>>>> TransactionIdDidCommit.
>>>>
>>>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
>>>> should not really fail, after fixing the bug we're discussing.
>>>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
>>>> an otherwise idle connection. Or it could fail if the client connection
>>>> is lost, but then the backend is about to die anyway. And arguably
>>>> closing the connection is better than losing even a single notification,
>>>> anyway.
>>>>
>>> My only concern with making these errors FATAL is that if a notification
>>> entry causes a different, recoverable error, all subsequent messages
>>> will be lost. This is because if backend die and the user open a new
>>> connection and execute LISTEN on the channel it will not see these
>>> notifications past the one that caused the error. I'm not sure if we are
>>> completely safe from this case of a recoverable error, what do you
>>> think?
>>
>> I did some more testing of the current behavior, using the encoding
>> conversion to cause an error:
>>
>> In backend A:
>>
>> SET client_encoding='latin1';
>> LISTEN foo;
>>
>> Backend b:
>>
>> NOTIFY foo, 'ハ';
>>
>> Observations:
>>
>> - If the connection is idle when the notification is received, the ERROR
>> is turned into FATAL anyway:
>>
>> postgres=# SET client_encoding='latin1';
>> SET
>> postgres=# LISTEN foo;
>> LISTEN
>> postgres=# select 1; -- do the NOTIFY in another connection before this
>> ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
>> has no equivalent in encoding "LATIN1"
>> FATAL:  terminating connection because protocol synchronization was lost
>> server closed the connection unexpectedly
>>          This probably means the server terminated abnormally
>>          before or while processing the request.
>>
>> - If there are multiple notifications pending, we stop processing the
>> subsequent notifications on the first error. The subsequent
>> notifications will only be processed when another notify interrupt is
>> received, i.e. when a backend sends yet another notification.
>>
>> I'm getting more and more convinced that escalating all ERRORs to FATALs
>> during notify processing is the right way to go. Attached is a new patch
>> set that does that.
> 
> One point about removing try/catch: when we execute LISTEN for the
> very first time, we read the queue from the min(listener's pos) up to
> the head. listenChannels is empty at the moment, so we never call
> NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have
> some problematic entry in the queue that we need to process during
> initial LISTEN, it could block creation of new listeners. Is it
> something we need to worry about?

Hmm, I don't see how it could block the creation of new listeners. 
Exec_ListenPreCommit() starts from "max(listener's pos)" over existing 
listeners, not min(). Am I missing something?

That said, Exec_ListenPreCommit() could start from 'min' among all 
listeners, if there are no listeners for the same backend. Per attached 
patch. I don't think it makes much difference though.

Another small change we could make is to check for listenChannels == NIL 
before calling TransactionIdDidCommit.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-06T15:13:43Z

On Thu, Nov 6, 2025 at 3:06 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 06/11/2025 13:20, Arseniy Mukhin wrote:
> > On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >>
> >> On 05/11/2025 21:02, Matheus Alcantara wrote:
> >>> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
> >>>>> In case of an error on TransactionIdDidCommit() I think that we will
> >>>>> have the same behavior as we advance the "current" position before of
> >>>>> DidCommit call the PG_FINALLY block will set the backend position past
> >>>>> the failing notification entry.
> >>>>
> >>>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
> >>>> notifications that we have buffered in the local buffer but haven't
> >>>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
> >>>> notification, the one where TransactionIdDidCommit() failed.
> >>>>
> >>> Yeah, that's true.
> >>>
> >>>>> How bad would be to store the notification entries within a list and
> >>>>> store the next position with the notification entry and then wrap the
> >>>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
> >>>>> saved "next position" on PG_CATCH? Something like this:
> >>>>
> >>>> [ ...]
> >>>>
> >>>> That addresses the failure on NotifyMyFrontEnd, but not on
> >>>> TransactionIdDidCommit.
> >>>>
> >>>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
> >>>> should not really fail, after fixing the bug we're discussing.
> >>>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
> >>>> an otherwise idle connection. Or it could fail if the client connection
> >>>> is lost, but then the backend is about to die anyway. And arguably
> >>>> closing the connection is better than losing even a single notification,
> >>>> anyway.
> >>>>
> >>> My only concern with making these errors FATAL is that if a notification
> >>> entry causes a different, recoverable error, all subsequent messages
> >>> will be lost. This is because if backend die and the user open a new
> >>> connection and execute LISTEN on the channel it will not see these
> >>> notifications past the one that caused the error. I'm not sure if we are
> >>> completely safe from this case of a recoverable error, what do you
> >>> think?
> >>
> >> I did some more testing of the current behavior, using the encoding
> >> conversion to cause an error:
> >>
> >> In backend A:
> >>
> >> SET client_encoding='latin1';
> >> LISTEN foo;
> >>
> >> Backend b:
> >>
> >> NOTIFY foo, 'ハ';
> >>
> >> Observations:
> >>
> >> - If the connection is idle when the notification is received, the ERROR
> >> is turned into FATAL anyway:
> >>
> >> postgres=# SET client_encoding='latin1';
> >> SET
> >> postgres=# LISTEN foo;
> >> LISTEN
> >> postgres=# select 1; -- do the NOTIFY in another connection before this
> >> ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
> >> has no equivalent in encoding "LATIN1"
> >> FATAL:  terminating connection because protocol synchronization was lost
> >> server closed the connection unexpectedly
> >>          This probably means the server terminated abnormally
> >>          before or while processing the request.
> >>
> >> - If there are multiple notifications pending, we stop processing the
> >> subsequent notifications on the first error. The subsequent
> >> notifications will only be processed when another notify interrupt is
> >> received, i.e. when a backend sends yet another notification.
> >>
> >> I'm getting more and more convinced that escalating all ERRORs to FATALs
> >> during notify processing is the right way to go. Attached is a new patch
> >> set that does that.
> >
> > One point about removing try/catch: when we execute LISTEN for the
> > very first time, we read the queue from the min(listener's pos) up to
> > the head. listenChannels is empty at the moment, so we never call
> > NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have
> > some problematic entry in the queue that we need to process during
> > initial LISTEN, it could block creation of new listeners. Is it
> > something we need to worry about?
>
> Hmm, I don't see how it could block the creation of new listeners.
> Exec_ListenPreCommit() starts from "max(listener's pos)" over existing
> listeners, not min(). Am I missing something?
>

My bad, you are right, we use max(listener's pos). But it doesn't
resolve the issue if I'm not missing something.

Let's say we have the queue:

(tail ... pos1 ... bad_entry_pos ... head)

bad_entry_pos - position of the entry where TransactionIdDidCommit fails.

We have the listener L1 with pos = pos1. It means every new listener
should process the queue from pos1 (as it's max(listener's pos)) up to
the queue head and when they try to do it, they will fail on
'bad_entry'.

> That said, Exec_ListenPreCommit() could start from 'min' among all
> listeners, if there are no listeners for the same backend. Per attached
> patch. I don't think it makes much difference though.
>

Yes, if there are no listeners, every new listener will start the
initial reading from the QUEUE_TAIL and will fail on the bad_entry
too.

But this example is valid only if we think we can fail on
TransactionIdDidCommit (even after the fix).

> Another small change we could make is to check for listenChannels == NIL
> before calling TransactionIdDidCommit.

There is a comment that describes why we need this initial reading in
Exec_ListenPreCommit:

    /*
     * This is our first LISTEN, so establish our pointer.
     *
     * We set our pointer to the global tail pointer and then move it forward
     * over already-committed notifications.  This ensures we cannot miss any
     * not-yet-committed notifications.  We might get a few more but that
     * doesn't hurt.

It seems that with such a check we will skip not-yet-committed
notifications and always get to the HEAD. If we decide that it's ok,
maybe we can just set 'pos' of every new listener to the HEAD without
reading?


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-07T08:10:25Z

On 06/11/2025 17:13, Arseniy Mukhin wrote:
> Let's say we have the queue:
> 
> (tail ... pos1 ... bad_entry_pos ... head)
> 
> bad_entry_pos - position of the entry where TransactionIdDidCommit fails.
> 
> We have the listener L1 with pos = pos1. It means every new listener
> should process the queue from pos1 (as it's max(listener's pos)) up to
> the queue head and when they try to do it, they will fail on
> 'bad_entry'.

Gotcha.

>> Another small change we could make is to check for listenChannels == NIL
>> before calling TransactionIdDidCommit.
> 
> There is a comment that describes why we need this initial reading in
> Exec_ListenPreCommit:
> 
>      /*
>       * This is our first LISTEN, so establish our pointer.
>       *
>       * We set our pointer to the global tail pointer and then move it forward
>       * over already-committed notifications.  This ensures we cannot miss any
>       * not-yet-committed notifications.  We might get a few more but that
>       * doesn't hurt.
> 
> It seems that with such a check we will skip not-yet-committed
> notifications and always get to the HEAD. If we decide that it's ok,
> maybe we can just set 'pos' of every new listener to the HEAD without
> reading?

Right, I didn't mean skipping asyncQueueReadAllNotifications() 
altogether. Just the TransactionIdDidCommit() calls in it, when 
listenChannels == NIL, see attached patch. The idea is that when 
'listenChannels == NIL', we're not going send the notification to the 
frontend regardless of what TransactionIdDidCommit says. If we don't 
call TransactionIdDidCommit, we won't fail on bad entries.

I'm not sure how much this matters. We really shouldn't have bad entries 
in the queue to begin with. And if we do, a broken entry could cause all 
kinds of trouble. For example, if the broken entry's (bogus) XID is 
higher than current XID, XidInMVCCSnapshot() will return true and we'll 
get stuck on that.

- Heikki

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> — 2025-11-07T09:41:48Z

On Fri, Nov 7, 2025 at 11:10 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 06/11/2025 17:13, Arseniy Mukhin wrote:
> > Let's say we have the queue:
> >
> > (tail ... pos1 ... bad_entry_pos ... head)
> >
> > bad_entry_pos - position of the entry where TransactionIdDidCommit fails.
> >
> > We have the listener L1 with pos = pos1. It means every new listener
> > should process the queue from pos1 (as it's max(listener's pos)) up to
> > the queue head and when they try to do it, they will fail on
> > 'bad_entry'.
>
> Gotcha.
>
> >> Another small change we could make is to check for listenChannels == NIL
> >> before calling TransactionIdDidCommit.
> >
> > There is a comment that describes why we need this initial reading in
> > Exec_ListenPreCommit:
> >
> >      /*
> >       * This is our first LISTEN, so establish our pointer.
> >       *
> >       * We set our pointer to the global tail pointer and then move it forward
> >       * over already-committed notifications.  This ensures we cannot miss any
> >       * not-yet-committed notifications.  We might get a few more but that
> >       * doesn't hurt.
> >
> > It seems that with such a check we will skip not-yet-committed
> > notifications and always get to the HEAD. If we decide that it's ok,
> > maybe we can just set 'pos' of every new listener to the HEAD without
> > reading?
>
> Right, I didn't mean skipping asyncQueueReadAllNotifications()
> altogether. Just the TransactionIdDidCommit() calls in it, when
> listenChannels == NIL, see attached patch. The idea is that when
> 'listenChannels == NIL', we're not going send the notification to the
> frontend regardless of what TransactionIdDidCommit says. If we don't
> call TransactionIdDidCommit, we won't fail on bad entries.
>

Ahh, now I see, thank you. WFM.


Best regards,
Arseniy Mukhin

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Álvaro Herrera <alvherre@kurilemu.de> — 2025-11-07T14:14:31Z

One thing I noticed while testing this is that asyncQueueAddEntries()
fills the end of a page with a dummy entry, when the next notify doesn't
fit.  However, this dummy entry contains a very valid TransactionId,
which the new freezing code will try to look up and freeze.  I think
this is somewhat bogus -- we shouldn't even try to look up that XID in
the first place.  I propose to clear it like this

@@ -1419,6 +1424,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
 			 */
 			qe.length = QUEUE_PAGESIZE - offset;
 			qe.dboid = InvalidOid;
+			qe.xid = InvalidTransactionId;
 			qe.data[0] = '\0';	/* empty channel */
 			qe.data[1] = '\0';	/* empty payload */
 		}


(Line numbers do not match, because I have other local changes.)


-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I'm always right, but sometimes I'm more right than other times."
                                                  (Linus Torvalds)
https://lore.kernel.org/git/Pine.LNX.4.58.0504150753440.7211@ppc970.osdl.org/

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-12T19:39:47Z

On 07/11/2025 16:14, Álvaro Herrera wrote:
> One thing I noticed while testing this is that asyncQueueAddEntries()
> fills the end of a page with a dummy entry, when the next notify doesn't
> fit.  However, this dummy entry contains a very valid TransactionId,
> which the new freezing code will try to look up and freeze.  I think
> this is somewhat bogus -- we shouldn't even try to look up that XID in
> the first place.  I propose to clear it like this
> 
> @@ -1419,6 +1424,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
>   			 */
>   			qe.length = QUEUE_PAGESIZE - offset;
>   			qe.dboid = InvalidOid;
> +			qe.xid = InvalidTransactionId;
>   			qe.data[0] = '\0';	/* empty channel */
>   			qe.data[1] = '\0';	/* empty payload */
>   		}
> 
> 
> (Line numbers do not match, because I have other local changes.)

Committed. I committed the above separately, because I forgot to include 
it in the main commit. Oops.

Just to summarize what was committed, out of all the different variants 
discussed:

* Any ERROR while processing an async notification is now turned into FATAL
* Vacuum scans the async notification queue and freezes xids before 
truncating CLOG
* The TransactionIdDidCommit() calls are now made while holding the SLRU 
lock. NotifyMyFrontEnd() calls are still made after releasing the lock
* listenChannels == NIL special case is checked before 
TransactionIdDidCommit(). This avoids the problem that no backend can 
LISTEN to anything anymore, if there's one broken entry in the queue for 
some reason
* 'xid' field on dummy entries is now set to InvalidTransactionId so 
that they don't need to be frozen

Thanks everyone!

- Heikki