Re: another autovacuum scheduling thread
Sami Imseih <samimseih@gmail.com>
From: Sami Imseih <samimseih@gmail.com>
To: Nathan Bossart <nathandbossart@gmail.com>
Cc: Robert Haas <robertmhaas@gmail.com>, David Rowley <dgrowleyml@gmail.com>, Jeremy Schneider <schneider@ardentperf.com>, pgsql-hackers@postgresql.org
Date: 2025-10-31T00:38:15Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Add rudimentary table prioritization to autovacuum.
- d7965d65fc5b 19 (unreleased) landed
-
Trigger more frequent autovacuums with relallfrozen
- 06eae9e6218a 18.0 cited
-
Harden nbtree page deletion.
- c34787f91058 14.0 cited
-
Check for interrupts inside the nbtree page deletion code.
- 3a01f68e35a3 12.0 cited
Attachments
- batch.sh (application/x-sh)
- summary.sh (application/x-sh)
- oltp.sh (application/x-sh)
- run_workloads.sh (application/x-sh)
> FWIW, I've been putting some scripts together to test some workloads > and I will share shortly what I have. Here is my attempt to test the behavior with the new prioritization. I wanted a way to run the same tests with different workloads, both with and without the prioritization patch, and to see if anything stands out as suspicious in terms of autovacuum or autoanalyze activity. For example, certain tables showing too little or too much autovacuum activity. The scripts I put together (attached) run a busy update workload (OLTP) and a separate batch workload. They use pgbench to execute custom scripts that are generated on the fly. The results are summarized by the average number of autovacuum and autoanalyze runs *per table*, along with some other DML activity stats to ensure that the workloads being compared have similar DML activity. Using the scripts: Place the attached scripts in a specific directory, and modify the section under "Caller should adjust these values" in run_workloads.sh to adjust the workload. The scripts assume you have a running cluster with your specific config file adjusted for the test. Once ready, call run_workloads.sh and at the end a summary will show up as you see below. Hopefully it works for you :) The summary.sh script can also be run while the workloads are executing. Here is a example of a test I wanted to run based on the discussion [0]: This scenario is one that was mentioned, but there are others in which a batch process performing inserts only is prioritized over the update workload. I ran this test for 10 minutes, using 200 clients for the update workload and 5 clients for the batch workload, with the following configuration: ``` max_connections=1000; autovacuum_naptime = '10s' shared_buffers = '4GB' autovacuum_max_workers = 6 ``` -- HEAD ``` Total Activity -[ RECORD 1 ]-------------+---------- total_n_dead_tup | 985183 total_n_mod_since_analyze | 220294866 total_reltuples | 247690373 total_autovacuum_count | 137 total_autoanalyze_count | 470 total_n_tup_upd | 7720012 total_n_tup_ins | 446683000 table_count | 105 Activity By Workload Type -[ RECORD 1 ]-----------------+---------------- table_group | batch_tables ** avg_autovacuum_count | 7.400 ** avg_autoanalyze_count | 8.000 avg_vacuum_count | 0.000 avg_analyze_count | 0.000 rows_inserted | 436683000 rows_updated | 0 rows_hot_updated | 0 table_count | 5 -[ RECORD 2 ]-----------------+---------------- table_group | numbered_tables ** avg_autovacuum_count | 1.000 ** avg_autoanalyze_count | 4.300 avg_vacuum_count | 1.000 avg_analyze_count | 0.000 rows_inserted | 10000000 rows_updated | 7720012 rows_hot_updated | 7094573 table_count | 100 ``` -- with v7 applied ``` Total Activity -[ RECORD 1 ]-------------+---------- total_n_dead_tup | 1233045 total_n_mod_since_analyze | 137843507 total_reltuples | 350704437 total_autovacuum_count | 146 total_autoanalyze_count | 605 total_n_tup_upd | 7896354 total_n_tup_ins | 487974000 table_count | 105 Activity By Workload Type -[ RECORD 1 ]-----------------+---------------- table_group | batch_tables ** avg_autovacuum_count | 11.000 ** avg_autoanalyze_count | 13.200 avg_vacuum_count | 0.000 avg_analyze_count | 0.000 rows_inserted | 477974000 rows_updated | 0 rows_hot_updated | 0 table_count | 5 -[ RECORD 2 ]-----------------+---------------- table_group | numbered_tables ** avg_autovacuum_count | 0.910 ** avg_autoanalyze_count | 5.390 avg_vacuum_count | 1.000 avg_analyze_count | 0.000 rows_inserted | 10000000 rows_updated | 7896354 rows_hot_updated | 7123134 table_count | 100 ``` The results above show what I expected: the batch tables receive higher priority, as seen from the averages of autovacuum and autoanalyze runs. This behavior is expected, but it may catch some users by surprise after an upgrade, since certain tables will now receive more attention than others. Longer tests might also show more bloat accumulating on heavily updated tables. In such cases, a user may need to adjust autovacuum settings on a per-table basis to restore the previous behavior. So, I am not quite sure what is the best way to test except for trying to find these non steady state workloads and see the impact of the prioritization change to (auto)vacuum/analyze activity . Maybe there is a better way? [0] https://www.postgresql.org/message-id/aQI7tGEs8IOPxG64%40nathan -- Sami Imseih Amazon Web Services (AWS)