Thread

  1. contrib/pg_stat_tcpinfo

    Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-11-03T14:09:17Z

    Attached is pg_stat_tcpinfo, an heavy work in progress, Linux-only
    netstat/ss-like extension for showing detailed information about TCP
    connections based on information from the kernel itself. It's the
    result of some frustration and some corridor talks during recent
    PgConfEU, where we agreed with Andres that it would be cool to have
    this in core to support and resolve problems much more quickly (in
    spite of this being a Linux-only extension for a start). As it stands
    it can help troubleshooting advanced performance problems like:
    * basic connection problems (e.g. imagine being able to see that
    postgres_fdw PID is stuck in SYN_SENT)
    * bandwidth problems where [logical/physical] replication and/or
    backup (pg_basebackup) are not keeping up due any of the above: full
    sendQ/recvQ, random packet drops on connection, too high RTT (WAN),
    network jitter, etc.
    * tracking down reasons for stuck or reset-by-peer TCP connections due
    to wrongly firewalls silently killing idle connections (it's visible
    via not TCP keepalive not being activated in time, and later stuck in
    timer retransmits)
    * high latency outliers for SyncRep too (e.g. via
    lastsnd/lastrcv/lastack timers)
    * probably some more
    
    One could argue that while netstat/ss provide the necessary
    information, it's often much harder than necessary to combine the
    information with postgres information (like which connections are for
    replication, wait events, correlation to other DB activity). The idea
    is to have it in core, so it's available always.
    
    Sample use (psql issuing COPY to postgres_fdw to the same DB over
    127.0.0.1 -- 3 connections are visible: 1 for postgres_fdw itself, and
    two sides of the network socket [this happens only on localhost])
    
    postgres=# select
        pid, application_name, substring(query,1,40) q,
        wait_event, src_addr, src_port, dst_addr, dst_port,
        recvq, sendq,
        t.state, tcpinfo->>'rtt' as rtt, tcpinfo->>'rwnd_limited' as rwnd_limited,
        pg_size_pretty(cast(tcpinfo->>'delivery_rate' as bigint)) as
    peak_delivery_rate,
        substring(tcpinfo::text,1,64) || '...' as substr
    from pg_stat_get_tcpinfo() t join pg_stat_activity a using (pid)
    where
        t.state ='ESTABLISHED' and pid <> pg_backend_pid();
      pid  | application_name |                    q                     |
         wait_event      | src_addr  | src_port | dst_addr  | dst_port |
    recvq | sendq |    state    |  rtt  | rwnd_limited |
    peak_delivery_rate |                               substr
    -------+------------------+------------------------------------------+----------------------+-----------+----------+-----------+----------+-------+-------+-------------+-------+--------------+--------------------+---------------------------------------------------------------------
     81279 | psql             | COPY (SELECT bindata FROM fdw_bandwidth_ |
    PostgresFdwGetResult | 127.0.0.1 |     1234 | 127.0.0.1 |    52612 |
      0 |     0 | ESTABLISHED | 5.092 | 0            | 422 MB
    | {"ato": 40000, "rto": 206000, "rtt": 5.092, "lost": 0, "pmtu": 6...
     81279 | psql             | COPY (SELECT bindata FROM fdw_bandwidth_ |
    PostgresFdwGetResult | 127.0.0.1 |    59562 | 127.0.0.1 |     1234 |
      0 |     0 | ESTABLISHED | 0.038 | 0            | 2604 MB
    | {"ato": 40000, "rto": 201000, "rtt": 0.038, "lost": 0, "pmtu": 6...
     81282 | postgres_fdw     | FETCH 100 FROM c1                        |
    BuffileWrite         | 127.0.0.1 |     1234 | 127.0.0.1 |    59562 |
      0 |     0 | ESTABLISHED | 0.380 | 1735000      | 833 MB
    | {"ato": 40000, "rto": 201000, "rtt": 0.380, "lost": 0, "pmtu": 6...
    (3 rows)
    
    Some early feedback about direction in order to bring this into core
    would be appreciated. State of stuff:
    
    1. Andres is pushing for supporting UNIX domain sockets here, but I'm
    not sure if it is really worth the effort (and it would trigger new
    naming problem;)) and primarily making the code even more complex.
    IMHO the netlinksock_diag API is already convoluted and adding AF_UNIX
    would make it even less readable.
    2. IPv6 works, but wasn't tested much.
    3. Biggest TODO left is probably properly formatting the information
    based on struct tcpinfo variables (just like ss(1) does, so keeping
    the same unit/formatting)
    4. Patch/tests are missing intentionally as I would like first to
    stabilize the outputs/naming/code first.
    5. [security] Should this be available to pg_monitor/pg_read_all_stats
    or just to superuser?
    6. [security] Should this return info about all TCP connections or
    just the UID of the postmaster?
    
    -J.