Thread

Sampling-based timing for EXPLAIN ANALYZE

Lukas Fittl <lukas@fittl.com> — 2023-01-02T11:36:04Z
Hi,

Since EXPLAIN ANALYZE with TIMING ON still carries noticeable overhead on
modern hardware (despite time sources being faster), I'd like to propose a
new setting EXPLAIN ANALYZE, called "TIMING SAMPLING", as compared to
TIMING ON.

This new timing mode uses a timer on a fixed recurring frequency (e.g. 100
or 1000 Hz) to gather a sampled timestamp on a predefined schedule, instead
of getting the time on-demand when InstrStartNode/InstrStopNode is called.
To implement the timer, we can use the existing timeout infrastructure,
which is backed by a wall clock timer (ITIMER_REAL).

Conceptually this is inspired by how sampling profilers work (e.g. "perf"),
but it ties into the existing per-plan node instrumentation done by EXPLAIN
ANALYZE, and simply provides a lower accuracy version of the total time for
each plan node.

In EXPLAIN output this is marked as "sampled time", and scaled to the total
wall clock time (to adjust for the sampling undercounting):

=# EXPLAIN (ANALYZE, BUFFERS, TIMING SAMPLING, SAMPLEFREQ 100) SELECT ...;
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=201747.90..201748.00 rows=10 width=12) (actual
sampled time=5490.974 rows=9 loops=1)
   ...
   ->  Hash Join  (cost=0.23..199247.90 rows=499999 width=4) (actual
sampled time=3738.619 rows=9000000 loops=1)
         ...
         ->  Seq Scan on large  (cost=0.00..144247.79 rows=9999979 width=4)
(actual sampled time=1004.671 rows=10000001 loops=1)
               ...
         ->  Hash  (cost=0.10..0.10 rows=10 width=4) (actual sampled
time=0.000 rows=10 loops=1)
               ...
 Execution Time: 5491.475 ms
---

In simple query tests like this on my local machine, this shows a
consistent benefit over TIMING ON (and behaves close to ANALYZE with TIMING
OFF), whilst providing a "good enough" accuracy to identify which part of
the query was problematic.

Attached is a prototype patch for early feedback on the concept, with tests
and documentation to come in a follow-up. Since the January commitfest is
still marked as open I'll register it there, but note that my assumption is
this is *not* Postgres 16 material.

As an open item, note that in the patch the requested sampling frequency is
not yet passed to parallel workers (it always defaults to 1000 Hz when
sampling is enabled). Also, note the timing frequency is limited to a
maximum of 1000 Hz (1ms) due to current limitations of the timeout
infrastructure.

With thanks to Andres Freund for help on refining the idea, collaborating
on early code and finding the approach to hook into the timeout API.

Thanks,
Lukas

-- 
Lukas Fittl