Thread
-
contrib/pg_stat_tcpinfo
Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-11-03T14:09:17Z
Attached is pg_stat_tcpinfo, an heavy work in progress, Linux-only netstat/ss-like extension for showing detailed information about TCP connections based on information from the kernel itself. It's the result of some frustration and some corridor talks during recent PgConfEU, where we agreed with Andres that it would be cool to have this in core to support and resolve problems much more quickly (in spite of this being a Linux-only extension for a start). As it stands it can help troubleshooting advanced performance problems like: * basic connection problems (e.g. imagine being able to see that postgres_fdw PID is stuck in SYN_SENT) * bandwidth problems where [logical/physical] replication and/or backup (pg_basebackup) are not keeping up due any of the above: full sendQ/recvQ, random packet drops on connection, too high RTT (WAN), network jitter, etc. * tracking down reasons for stuck or reset-by-peer TCP connections due to wrongly firewalls silently killing idle connections (it's visible via not TCP keepalive not being activated in time, and later stuck in timer retransmits) * high latency outliers for SyncRep too (e.g. via lastsnd/lastrcv/lastack timers) * probably some more One could argue that while netstat/ss provide the necessary information, it's often much harder than necessary to combine the information with postgres information (like which connections are for replication, wait events, correlation to other DB activity). The idea is to have it in core, so it's available always. Sample use (psql issuing COPY to postgres_fdw to the same DB over 127.0.0.1 -- 3 connections are visible: 1 for postgres_fdw itself, and two sides of the network socket [this happens only on localhost]) postgres=# select pid, application_name, substring(query,1,40) q, wait_event, src_addr, src_port, dst_addr, dst_port, recvq, sendq, t.state, tcpinfo->>'rtt' as rtt, tcpinfo->>'rwnd_limited' as rwnd_limited, pg_size_pretty(cast(tcpinfo->>'delivery_rate' as bigint)) as peak_delivery_rate, substring(tcpinfo::text,1,64) || '...' as substr from pg_stat_get_tcpinfo() t join pg_stat_activity a using (pid) where t.state ='ESTABLISHED' and pid <> pg_backend_pid(); pid | application_name | q | wait_event | src_addr | src_port | dst_addr | dst_port | recvq | sendq | state | rtt | rwnd_limited | peak_delivery_rate | substr -------+------------------+------------------------------------------+----------------------+-----------+----------+-----------+----------+-------+-------+-------------+-------+--------------+--------------------+--------------------------------------------------------------------- 81279 | psql | COPY (SELECT bindata FROM fdw_bandwidth_ | PostgresFdwGetResult | 127.0.0.1 | 1234 | 127.0.0.1 | 52612 | 0 | 0 | ESTABLISHED | 5.092 | 0 | 422 MB | {"ato": 40000, "rto": 206000, "rtt": 5.092, "lost": 0, "pmtu": 6... 81279 | psql | COPY (SELECT bindata FROM fdw_bandwidth_ | PostgresFdwGetResult | 127.0.0.1 | 59562 | 127.0.0.1 | 1234 | 0 | 0 | ESTABLISHED | 0.038 | 0 | 2604 MB | {"ato": 40000, "rto": 201000, "rtt": 0.038, "lost": 0, "pmtu": 6... 81282 | postgres_fdw | FETCH 100 FROM c1 | BuffileWrite | 127.0.0.1 | 1234 | 127.0.0.1 | 59562 | 0 | 0 | ESTABLISHED | 0.380 | 1735000 | 833 MB | {"ato": 40000, "rto": 201000, "rtt": 0.380, "lost": 0, "pmtu": 6... (3 rows) Some early feedback about direction in order to bring this into core would be appreciated. State of stuff: 1. Andres is pushing for supporting UNIX domain sockets here, but I'm not sure if it is really worth the effort (and it would trigger new naming problem;)) and primarily making the code even more complex. IMHO the netlinksock_diag API is already convoluted and adding AF_UNIX would make it even less readable. 2. IPv6 works, but wasn't tested much. 3. Biggest TODO left is probably properly formatting the information based on struct tcpinfo variables (just like ss(1) does, so keeping the same unit/formatting) 4. Patch/tests are missing intentionally as I would like first to stabilize the outputs/naming/code first. 5. [security] Should this be available to pg_monitor/pg_read_all_stats or just to superuser? 6. [security] Should this return info about all TCP connections or just the UID of the postmaster? -J.