
  1. Re: Fix pg_stat_wal_receiver to show CONNECTING status

    Chao Li <li.evan.chao@gmail.com> — 2026-05-21T07:20:13Z

    
    > On May 21, 2026, at 07:06, Chao Li <li.evan.chao@gmail.com> wrote:
    > 
    >> On May 21, 2026, at 04:43, Michael Paquier <michael@paquier.xyz> wrote:
    >> 
    >> On Wed, May 20, 2026 at 03:53:38PM +0800, Chao Li wrote:
    >>> With v2, slot_name, sender_host, sender_port, and conninfo are
    >>> already left NULL while the receiver is in the CONNECTING state. I
    >>> feel we don't need to show the timestamp fields either. Since the
    >>> columns are named last_msg_send_time and last_msg_receipt_time, users
    >>> may naturally interpret them as the last time a message was sent to
    >>> or received from the primary. If we show the standby server start
    >>> time in those columns, I am afraid that could be confusing.
    >>> 
    >>> But I think it might be useful to show the *_lsn and *_tli values in
    >>> the CONNECTING state if they are available.
    >> 
    >> ready_to_display was originally introduced for the reason discussed
    >> here, where we wanted strict control over the connection information
    >> across multiple calls of pg_stat_get_wal_receiver():
    >> https://www.postgresql.org/message-id/CAB7nPqQNbHQ7F7wDD_2qvGA_FUW-Leds9HQNM6kJnto7RFNhUg@mail.gmail.com
    >> 
    >> With v2, ready_to_display still does the job it was defined for.
    >> That job does not need to cover the time fields, so IMO showing them
    >> with their initial values is not a big deal, and they can actually be
    >> useful to know even at the early stage of the connection, as they
    >> reveal the state of the code.
    >> 
    >> Note also that the time fields could still show their initial values
    >> at the early connection stage, even after walrcv_connect() completes
    >> and ready_to_display is switched to true, so these values are not
    >> that confusing: we just expose them a bit earlier in the connection
    >> attempt.  As a whole, v2 is fine and addresses your issue.
    >> --
    >> Michael
    > 
    > Thanks for the detailed explanation.
    > 
    > Now I see that, based on the original discussion you pointed out, as long as v2 clears conninfo before setting ready_to_display to true, it is okay to set ready_to_display earlier, while the state is still CONNECTING. On that point, I’m good with v2.
    > 
    > I’m still not fully convinced about displaying the *_time fields, but I don’t have a stronger argument either, so I’m fine with that. Maybe we can add a brief note to the docs, as in the attached diff?
    > 
    > Overall, v2 looks good to me now.
    > 
    > Best regards,
    > --
    > Chao Li (Evan)
    > HighGo Software Co., Ltd.
    > https://www.highgo.com/
    > 
    > <nocfbot_monitoring.sgml.diff>
    
    I spent more time on this and found that the raw conninfo can still leak through pg_stat_wal_receiver in the WAL receiver reuse path:
    
    * WalRcvWaitForStartPosition() sets the state to WALRCV_WAITING.
    * Then RequestXLogStreaming() copies the raw conninfo into walrcv->conninfo and sets the state to WALRCV_RESTARTING.
    * WalRcvWaitForStartPosition() then moves the state to WALRCV_CONNECTING, but this path does not clear walrcv->conninfo again.
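
    To make the sequence concrete, here is a minimal C model of it. It is only a sketch under simplifying assumptions: SketchWalRcv, SketchState, and the sketch_* functions are hypothetical stand-ins for WalRcvData and the real functions, there is no spinlock or shared memory, and the buffer size is arbitrary. It starts from a receiver that has already connected once, so ready_to_display is true and conninfo holds the sanitized string (assumed here to mask the password as ********).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the relevant WalRcvData fields.
 * No spinlock and no shared memory; the buffer size is arbitrary. */
#define SKETCH_MAXCONNINFO 256

typedef enum
{
	SKETCH_STOPPED,
	SKETCH_STARTING,
	SKETCH_CONNECTING,
	SKETCH_STREAMING,
	SKETCH_WAITING,
	SKETCH_RESTARTING
} SketchState;

typedef struct
{
	SketchState state;
	bool		ready_to_display;
	char		conninfo[SKETCH_MAXCONNINFO];
} SketchWalRcv;

/* Step 1: WalRcvWaitForStartPosition() parks the receiver in WAITING. */
static void
sketch_enter_waiting(SketchWalRcv *w)
{
	w->state = SKETCH_WAITING;
}

/* Step 2: RequestXLogStreaming() before the fix copies the raw conninfo
 * (which may contain a password) and, since the receiver is WAITING,
 * marks it RESTARTING. */
static void
sketch_request_streaming_before_fix(SketchWalRcv *w, const char *raw_conninfo)
{
	assert(w->state == SKETCH_STOPPED || w->state == SKETCH_WAITING);
	snprintf(w->conninfo, sizeof(w->conninfo), "%s", raw_conninfo);
	w->state = (w->state == SKETCH_STOPPED) ? SKETCH_STARTING : SKETCH_RESTARTING;
}

/* Step 3: WalRcvWaitForStartPosition() sees RESTARTING and moves to
 * CONNECTING, reusing the existing connection; conninfo is not cleared. */
static void
sketch_resume_from_waiting(SketchWalRcv *w)
{
	if (w->state == SKETCH_RESTARTING)
		w->state = SKETCH_CONNECTING;
}

/* The conninfo value a reader of the view would get, or NULL if hidden. */
static const char *
sketch_visible_conninfo(const SketchWalRcv *w)
{
	return w->ready_to_display ? w->conninfo : NULL;
}
```

    Running the three steps in that order leaves the raw string, password included, in the field the view reads, while the state is CONNECTING and ready_to_display is still true.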
    
    The attached nocfbot_test.diff demonstrates the leak.
    
    Initially I thought we could also set ready_to_display to false when setting the state to WALRCV_WAITING in WalRcvWaitForStartPosition(), and set it back to true when switching back to WALRCV_CONNECTING. However, that would make the WALRCV_WAITING and WALRCV_RESTARTING states invisible in pg_stat_wal_receiver.
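
    As far as I can tell, this is because ready_to_display gates the whole row rather than individual columns: pg_stat_get_wal_receiver() returns NULL when pid is 0 or ready_to_display is false, and the view only keeps rows with a non-NULL pid. A rough model of that gate follows; SketchWalRcv, SketchState, and sketch_view_state() are hypothetical names, and the "connecting" label for the new state is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified subset of the WAL receiver's shared state. */
typedef enum
{
	SKETCH_STOPPED,
	SKETCH_STARTING,
	SKETCH_CONNECTING,
	SKETCH_STREAMING,
	SKETCH_WAITING,
	SKETCH_RESTARTING,
	SKETCH_STOPPING
} SketchState;

typedef struct
{
	int			pid;
	SketchState state;
	bool		ready_to_display;
} SketchWalRcv;

static const char *
sketch_state_name(SketchState s)
{
	switch (s)
	{
		case SKETCH_STOPPED:
			return "stopped";
		case SKETCH_STARTING:
			return "starting";
		case SKETCH_CONNECTING:
			return "connecting";	/* label assumed for the new state */
		case SKETCH_STREAMING:
			return "streaming";
		case SKETCH_WAITING:
			return "waiting";
		case SKETCH_RESTARTING:
			return "restarting";
		case SKETCH_STOPPING:
			return "stopping";
	}
	return "unknown";
}

/*
 * Returns the "state" column of the single view row, or NULL when no row
 * would be shown at all.  The gate is all-or-nothing: when the receiver is
 * not ready to display, every column disappears, including the state.
 */
static const char *
sketch_view_state(const SketchWalRcv *w)
{
	if (w->pid == 0 || !w->ready_to_display)
		return NULL;
	return sketch_state_name(w->state);
}
```

    So toggling ready_to_display around WAITING would hide the sensitive field, but it would also hide the state column that tells users the receiver is waiting or restarting.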
    
    I ended up with a solution that copies the primary connection info to walrcv->conninfo only when RequestXLogStreaming() is switching to WALRCV_STARTING. In the WALRCV_WAITING reuse path, the WAL receiver keeps using the existing wrconn, so the raw conninfo does not need to be copied into shared memory again. See the attached nocfbot_walreceiverfuncs.c.diff.
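
    In terms of the same kind of simplified model, the fixed request path looks roughly like this. Again, SketchWalRcv and sketch_request_streaming_fixed() are hypothetical names, this is not the attached diff itself, and locking plus the slot-name handling of the real RequestXLogStreaming() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the relevant WalRcvData fields. */
#define SKETCH_MAXCONNINFO 256

typedef enum
{
	SKETCH_STOPPED,
	SKETCH_STARTING,
	SKETCH_CONNECTING,
	SKETCH_STREAMING,
	SKETCH_WAITING,
	SKETCH_RESTARTING
} SketchState;

typedef struct
{
	SketchState state;
	bool		ready_to_display;
	char		conninfo[SKETCH_MAXCONNINFO];
} SketchWalRcv;

/* Returns true if a new WAL receiver process would need to be launched. */
static bool
sketch_request_streaming_fixed(SketchWalRcv *w, const char *raw_conninfo)
{
	assert(w->state == SKETCH_STOPPED || w->state == SKETCH_WAITING);

	if (w->state == SKETCH_STOPPED)
	{
		/* Fresh start: the new process needs the raw string to connect.
		 * In v2 it is cleared before ready_to_display becomes true. */
		snprintf(w->conninfo, sizeof(w->conninfo), "%s",
				 raw_conninfo != NULL ? raw_conninfo : "");
		w->state = SKETCH_STARTING;
		return true;
	}

	/* Reuse path: the existing connection is kept, so the sanitized
	 * conninfo already in shared memory is left untouched. */
	w->state = SKETCH_RESTARTING;
	return false;
}
```

    The design choice is that the raw string only reaches shared memory on the STOPPED to STARTING path, where v2 already clears it before ready_to_display becomes true; a reused receiver keeps the sanitized copy it stored after its first connection.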
    
    With that change, the new test passes. I also ran "make check-world" successfully.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/