Re: Sampling-based timing for EXPLAIN ANALYZE
Lukas Fittl <lukas@fittl.com>
From: Lukas Fittl <lukas@fittl.com>
To: Jelte Fennema <me@jeltef.nl>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
Andres Freund <andres@anarazel.de>
Date: 2023-01-17T10:50:40Z
Lists: pgsql-hackers
Attachments
- v2-0001-Add-TIMING-SAMPLING-option-for-EXPLAIN-ANALYZE.patch (application/octet-stream)
On Fri, Jan 6, 2023 at 1:19 AM Jelte Fennema <me@jeltef.nl> wrote:
> Nice addition! And the code looks pretty straight forward.
>
Thanks for reviewing!
The current patch triggers warnings:
> https://cirrus-ci.com/task/6016013976731648 Looks like you need to add
> void as the argument.
>
Fixed in v2 attached. This also adds a simple regression test, as well as
fixes the parallel working handling.
Do you have some performance comparison between TIMING ON and TIMING
> SAMPLING?
>
Here are some benchmarks of auto_explain overhead on my ARM-based M1
Macbook for the following query run with pgbench on a scale factor 100 data
set:
SELECT COUNT(*) FROM pgbench_branches JOIN pgbench_accounts USING (bid)
JOIN pgbench_tellers USING (bid) WHERE bid = 42;
(the motivation is to use a query that is more complex than the standard
pgbench select-only test query)
avg latency (best of 3), -T 300, -c 4, -s 100, shared_buffers 2GB, fsync
off, max_parallel_workers_per_gather 0:
master, log_timing = off: 871 ms (878 / 877 / 871)
patch, log_timing = off: 869 ms (882 / 880 / 869)
patch, log_timing = on: 890 ms (917 / 930 / 890)
patch, log_timing = sampling, samplefreq = 1000: 869 ms (887 / 869 / 894)
Additionally, here is Andres' benchmark from [1], with the sampling option
added:
% psql -Xc 'DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT * FROM
generate_series(1, 100000) g(i);' postgres && pgbench -n -r -t 100 -f
<(echo -e "SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING OFF) SELECT
COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT COUNT(*) FROM
t;EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;") postgres |grep '^ '
DROP TABLE
SELECT 100000
3.507 0 SELECT COUNT(*) FROM t;
3.476 0 EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*)
FROM t;
3.576 0 EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT
COUNT(*) FROM t;
5.096 0 EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*)
FROM t;
My pg_test_timing data for reference:
% pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 23.65 ns
Histogram of timing durations:
< us % of total count
1 97.64472 123876325
2 2.35421 2986658
4 0.00022 277
8 0.00016 202
16 0.00064 815
32 0.00005 64
In InstrStartSampling there's logic to increase/decrease the frequency of
> an already existing timer. It's not clear to me when this can occur. I'd
> expect sampling frequency to remain constant throughout an explain plan. If
> it's indeed needed, I think a code comment would be useful to explain why
> this edge case is necessary.
>
Clarified in a code comment in v2. This is needed for handling nested
statements which could have different sampling frequencies for each nesting
level, i.e. a function might want to sample it's queries at a higher
frequency than its caller.
Thanks,
Lukas
[1] https://postgr.es/m/20230116213913.4oseovlzvc2674z7%40awork3.anarazel.de
--
Lukas Fittl