Thread

  1. Re: Sampling-based timing for EXPLAIN ANALYZE

    Lukas Fittl <lukas@fittl.com> — 2023-01-17T10:50:40Z

    On Fri, Jan 6, 2023 at 1:19 AM Jelte Fennema <me@jeltef.nl> wrote:
    
    > Nice addition! And the code looks pretty straight forward.
    >
    
    Thanks for reviewing!
    
    The current patch triggers warnings:
    > https://cirrus-ci.com/task/6016013976731648 Looks like you need to add
    > void as the argument.
    >
    
    Fixed in v2 attached. This also adds a simple regression test, as well as
    fixes the parallel working handling.
    
    Do you have some performance comparison between TIMING ON and TIMING
    > SAMPLING?
    >
    
    Here are some benchmarks of auto_explain overhead on my ARM-based M1
    Macbook for the following query run with pgbench on a scale factor 100 data
    set:
    
    SELECT COUNT(*) FROM pgbench_branches JOIN pgbench_accounts USING (bid)
    JOIN pgbench_tellers USING (bid) WHERE bid = 42;
    
    (the motivation is to use a query that is more complex than the standard
    pgbench select-only test query)
    
    avg latency (best of 3), -T 300, -c 4, -s 100, shared_buffers 2GB, fsync
    off, max_parallel_workers_per_gather 0:
    
    master, log_timing = off: 871 ms (878 / 877 / 871)
    patch, log_timing = off: 869 ms (882 / 880 / 869)
    patch, log_timing = on: 890 ms (917 / 930 / 890)
    patch, log_timing = sampling, samplefreq = 1000: 869 ms (887 / 869 / 894)
    
    Additionally, here is Andres' benchmark from [1], with the sampling option
    added:
    
    %  psql -Xc 'DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT * FROM
    generate_series(1, 100000) g(i);' postgres && pgbench -n -r -t 100 -f
    <(echo -e "SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING OFF) SELECT
    COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT COUNT(*) FROM
    t;EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;") postgres |grep '^ '
    DROP TABLE
    SELECT 100000
             3.507           0  SELECT COUNT(*) FROM t;
             3.476           0  EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*)
    FROM t;
             3.576           0  EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT
    COUNT(*) FROM t;
             5.096           0  EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*)
    FROM t;
    
    My pg_test_timing data for reference:
    
    % pg_test_timing
    Testing timing overhead for 3 seconds.
    Per loop time including overhead: 23.65 ns
    Histogram of timing durations:
      < us   % of total      count
         1     97.64472  123876325
         2      2.35421    2986658
         4      0.00022        277
         8      0.00016        202
        16      0.00064        815
        32      0.00005         64
    
    In InstrStartSampling there's logic to increase/decrease the frequency of
    > an already existing timer. It's not clear to me when this can occur. I'd
    > expect sampling frequency to remain constant throughout an explain plan. If
    > it's indeed needed, I think a code comment would be useful to explain why
    > this edge case is necessary.
    >
    
    Clarified in a code comment in v2. This is needed for handling nested
    statements which could have different sampling frequencies for each nesting
    level, i.e. a function might want to sample it's queries at a higher
    frequency than its caller.
    
    Thanks,
    Lukas
    
    [1] https://postgr.es/m/20230116213913.4oseovlzvc2674z7%40awork3.anarazel.de
    
    -- 
    Lukas Fittl