Re: Sampling-based timing for EXPLAIN ANALYZE

Lukas Fittl <lukas@fittl.com>

View thread

From: Lukas Fittl <lukas@fittl.com>

To: Jelte Fennema <me@jeltef.nl>

Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Andres Freund <andres@anarazel.de>

Date: 2023-01-17T10:50:40Z

Lists: pgsql-hackers

Attachments

v2-0001-Add-TIMING-SAMPLING-option-for-EXPLAIN-ANALYZE.patch (application/octet-stream)

On Fri, Jan 6, 2023 at 1:19 AM Jelte Fennema <me@jeltef.nl> wrote:

> Nice addition! And the code looks pretty straight forward.
>

Thanks for reviewing!

The current patch triggers warnings:
> https://cirrus-ci.com/task/6016013976731648 Looks like you need to add
> void as the argument.
>

Fixed in v2 attached. This also adds a simple regression test, as well as
fixes the parallel working handling.

Do you have some performance comparison between TIMING ON and TIMING
> SAMPLING?
>

Here are some benchmarks of auto_explain overhead on my ARM-based M1
Macbook for the following query run with pgbench on a scale factor 100 data
set:

SELECT COUNT(*) FROM pgbench_branches JOIN pgbench_accounts USING (bid)
JOIN pgbench_tellers USING (bid) WHERE bid = 42;

(the motivation is to use a query that is more complex than the standard
pgbench select-only test query)

avg latency (best of 3), -T 300, -c 4, -s 100, shared_buffers 2GB, fsync
off, max_parallel_workers_per_gather 0:

master, log_timing = off: 871 ms (878 / 877 / 871)
patch, log_timing = off: 869 ms (882 / 880 / 869)
patch, log_timing = on: 890 ms (917 / 930 / 890)
patch, log_timing = sampling, samplefreq = 1000: 869 ms (887 / 869 / 894)

Additionally, here is Andres' benchmark from [1], with the sampling option
added:

%  psql -Xc 'DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT * FROM
generate_series(1, 100000) g(i);' postgres && pgbench -n -r -t 100 -f
<(echo -e "SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING OFF) SELECT
COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT COUNT(*) FROM
t;EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;") postgres |grep '^ '
DROP TABLE
SELECT 100000
         3.507           0  SELECT COUNT(*) FROM t;
         3.476           0  EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*)
FROM t;
         3.576           0  EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT
COUNT(*) FROM t;
         5.096           0  EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*)
FROM t;

My pg_test_timing data for reference:

% pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 23.65 ns
Histogram of timing durations:
  < us   % of total      count
     1     97.64472  123876325
     2      2.35421    2986658
     4      0.00022        277
     8      0.00016        202
    16      0.00064        815
    32      0.00005         64

In InstrStartSampling there's logic to increase/decrease the frequency of
> an already existing timer. It's not clear to me when this can occur. I'd
> expect sampling frequency to remain constant throughout an explain plan. If
> it's indeed needed, I think a code comment would be useful to explain why
> this edge case is necessary.
>

Clarified in a code comment in v2. This is needed for handling nested
statements which could have different sampling frequencies for each nesting
level, i.e. a function might want to sample it's queries at a higher
frequency than its caller.

Thanks,
Lukas

[1] https://postgr.es/m/20230116213913.4oseovlzvc2674z7%40awork3.anarazel.de

-- 
Lukas Fittl