Thread

  1. Sampling-based timing for EXPLAIN ANALYZE

    Lukas Fittl <lukas@fittl.com> — 2023-01-02T11:36:04Z

    Hi,
    
    Since EXPLAIN ANALYZE with TIMING ON still carries noticeable overhead on
    modern hardware (despite time sources being faster), I'd like to propose a
    new setting EXPLAIN ANALYZE, called "TIMING SAMPLING", as compared to
    TIMING ON.
    
    This new timing mode uses a timer on a fixed recurring frequency (e.g. 100
    or 1000 Hz) to gather a sampled timestamp on a predefined schedule, instead
    of getting the time on-demand when InstrStartNode/InstrStopNode is called.
    To implement the timer, we can use the existing timeout infrastructure,
    which is backed by a wall clock timer (ITIMER_REAL).
    
    Conceptually this is inspired by how sampling profilers work (e.g. "perf"),
    but it ties into the existing per-plan node instrumentation done by EXPLAIN
    ANALYZE, and simply provides a lower accuracy version of the total time for
    each plan node.
    
    In EXPLAIN output this is marked as "sampled time", and scaled to the total
    wall clock time (to adjust for the sampling undercounting):
    
    =# EXPLAIN (ANALYZE, BUFFERS, TIMING SAMPLING, SAMPLEFREQ 100) SELECT ...;
                                                                 QUERY PLAN
    
    -------------------------------------------------------------------------------------------------------------------------------------
     HashAggregate  (cost=201747.90..201748.00 rows=10 width=12) (actual
    sampled time=5490.974 rows=9 loops=1)
       ...
       ->  Hash Join  (cost=0.23..199247.90 rows=499999 width=4) (actual
    sampled time=3738.619 rows=9000000 loops=1)
             ...
             ->  Seq Scan on large  (cost=0.00..144247.79 rows=9999979 width=4)
    (actual sampled time=1004.671 rows=10000001 loops=1)
                   ...
             ->  Hash  (cost=0.10..0.10 rows=10 width=4) (actual sampled
    time=0.000 rows=10 loops=1)
                   ...
     Execution Time: 5491.475 ms
    ---
    
    In simple query tests like this on my local machine, this shows a
    consistent benefit over TIMING ON (and behaves close to ANALYZE with TIMING
    OFF), whilst providing a "good enough" accuracy to identify which part of
    the query was problematic.
    
    Attached is a prototype patch for early feedback on the concept, with tests
    and documentation to come in a follow-up. Since the January commitfest is
    still marked as open I'll register it there, but note that my assumption is
    this is *not* Postgres 16 material.
    
    As an open item, note that in the patch the requested sampling frequency is
    not yet passed to parallel workers (it always defaults to 1000 Hz when
    sampling is enabled). Also, note the timing frequency is limited to a
    maximum of 1000 Hz (1ms) due to current limitations of the timeout
    infrastructure.
    
    With thanks to Andres Freund for help on refining the idea, collaborating
    on early code and finding the approach to hook into the timeout API.
    
    Thanks,
    Lukas
    
    -- 
    Lukas Fittl