Thread

contrib/pg_stat_tcpinfo

Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-11-03T14:09:17Z
Attached is pg_stat_tcpinfo, an heavy work in progress, Linux-only
netstat/ss-like extension for showing detailed information about TCP
connections based on information from the kernel itself. It's the
result of some frustration and some corridor talks during recent
PgConfEU, where we agreed with Andres that it would be cool to have
this in core to support and resolve problems much more quickly (in
spite of this being a Linux-only extension for a start). As it stands
it can help troubleshooting advanced performance problems like:
* basic connection problems (e.g. imagine being able to see that
postgres_fdw PID is stuck in SYN_SENT)
* bandwidth problems where [logical/physical] replication and/or
backup (pg_basebackup) are not keeping up due any of the above: full
sendQ/recvQ, random packet drops on connection, too high RTT (WAN),
network jitter, etc.
* tracking down reasons for stuck or reset-by-peer TCP connections due
to wrongly firewalls silently killing idle connections (it's visible
via not TCP keepalive not being activated in time, and later stuck in
timer retransmits)
* high latency outliers for SyncRep too (e.g. via
lastsnd/lastrcv/lastack timers)
* probably some more

One could argue that while netstat/ss provide the necessary
information, it's often much harder than necessary to combine the
information with postgres information (like which connections are for
replication, wait events, correlation to other DB activity). The idea
is to have it in core, so it's available always.

Sample use (psql issuing COPY to postgres_fdw to the same DB over
127.0.0.1 -- 3 connections are visible: 1 for postgres_fdw itself, and
two sides of the network socket [this happens only on localhost])

postgres=# select
    pid, application_name, substring(query,1,40) q,
    wait_event, src_addr, src_port, dst_addr, dst_port,
    recvq, sendq,
    t.state, tcpinfo->>'rtt' as rtt, tcpinfo->>'rwnd_limited' as rwnd_limited,
    pg_size_pretty(cast(tcpinfo->>'delivery_rate' as bigint)) as
peak_delivery_rate,
    substring(tcpinfo::text,1,64) || '...' as substr
from pg_stat_get_tcpinfo() t join pg_stat_activity a using (pid)
where
    t.state ='ESTABLISHED' and pid <> pg_backend_pid();
  pid  | application_name |                    q                     |
     wait_event      | src_addr  | src_port | dst_addr  | dst_port |
recvq | sendq |    state    |  rtt  | rwnd_limited |
peak_delivery_rate |                               substr
-------+------------------+------------------------------------------+----------------------+-----------+----------+-----------+----------+-------+-------+-------------+-------+--------------+--------------------+---------------------------------------------------------------------
 81279 | psql             | COPY (SELECT bindata FROM fdw_bandwidth_ |
PostgresFdwGetResult | 127.0.0.1 |     1234 | 127.0.0.1 |    52612 |
  0 |     0 | ESTABLISHED | 5.092 | 0            | 422 MB
| {"ato": 40000, "rto": 206000, "rtt": 5.092, "lost": 0, "pmtu": 6...
 81279 | psql             | COPY (SELECT bindata FROM fdw_bandwidth_ |
PostgresFdwGetResult | 127.0.0.1 |    59562 | 127.0.0.1 |     1234 |
  0 |     0 | ESTABLISHED | 0.038 | 0            | 2604 MB
| {"ato": 40000, "rto": 201000, "rtt": 0.038, "lost": 0, "pmtu": 6...
 81282 | postgres_fdw     | FETCH 100 FROM c1                        |
BuffileWrite         | 127.0.0.1 |     1234 | 127.0.0.1 |    59562 |
  0 |     0 | ESTABLISHED | 0.380 | 1735000      | 833 MB
| {"ato": 40000, "rto": 201000, "rtt": 0.380, "lost": 0, "pmtu": 6...
(3 rows)

Some early feedback about direction in order to bring this into core
would be appreciated. State of stuff:

1. Andres is pushing for supporting UNIX domain sockets here, but I'm
not sure if it is really worth the effort (and it would trigger new
naming problem;)) and primarily making the code even more complex.
IMHO the netlinksock_diag API is already convoluted and adding AF_UNIX
would make it even less readable.
2. IPv6 works, but wasn't tested much.
3. Biggest TODO left is probably properly formatting the information
based on struct tcpinfo variables (just like ss(1) does, so keeping
the same unit/formatting)
4. Patch/tests are missing intentionally as I would like first to
stabilize the outputs/naming/code first.
5. [security] Should this be available to pg_monitor/pg_read_all_stats
or just to superuser?
6. [security] Should this return info about all TCP connections or
just the UID of the postmaster?

-J.