Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Melanie Plageman <melanieplageman@gmail.com>
Commits
- Remove table_scan_analyze_next_tuple unneeded parameter OldestXmin: 284925508ae6, 19 (unreleased), landed
- Simplify visibility check in heap_page_would_be_all_visible(): 3efe58febc3c, 19 (unreleased), landed
- Eliminate use of cached VM value in lazy_scan_prune(): 648a7e28d7c2, 19 (unreleased), landed
- Combine visibilitymap_set() cases in lazy_scan_prune(): 21796c267d0a, 19 (unreleased), landed
- Fix const qualification in prune_freeze_setup(): 4877391ce894, 19 (unreleased), landed
- Simplify vacuum visibility assertion: bd298f54a0d6, 19 (unreleased), landed
- Split heap_page_prune_and_freeze() into helpers: e135e044572e, 19 (unreleased), landed
- Assert that cutoffs are provided if freezing will be attempted: cd38b7e77315, 19 (unreleased), landed
- Split PruneFreezeParams initializers to one field per line: 1e14edcea5e1, 19 (unreleased), landed
- Refactor heap_page_prune_and_freeze() parameters into a struct: 1937ed70621e, 19 (unreleased), landed
- Make heap_page_is_all_visible independent of LVRelState: 3e4705484e0c, 19 (unreleased), landed
- Inline TransactionIdFollows/Precedes[OrEquals](): 43b05b38ea4d, 19 (unreleased), landed
- Add helper for freeze determination to heap_page_prune_and_freeze: c8dd6542bae4, 19 (unreleased), landed
- Bump XLOG_PAGE_MAGIC after xl_heap_prune change: 4a8fb58671d3, 19 (unreleased), landed
- Correct prune WAL record opcode name in comment: ae8ea7278c16, 19 (unreleased), landed
- Add error codes when vacuum discovers VM corruption: 8ec97e78a771, 19 (unreleased), landed
- Remove unused xl_heap_prune member, reason: 4b5f206de2bb, 19 (unreleased), landed
- Remove unneeded VM pin from VM replay: 3399c265543e, 19 (unreleased), landed
- Add assert and log message to visibilitymap_set: e3d5ddb7ca91, 19 (unreleased), landed
- Add error codes to some corruption log messages: fd6ec93bf890, 13.0, cited
Attachments
- Set-pd_prune_xid-on-insert.txt (text/plain)
- v5-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (text/x-patch) patch v5-0003
- v5-0004-Use-xl_heap_prune-record-for-setting-empty-pages-.patch (text/x-patch) patch v5-0004
- v5-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (text/x-patch) patch v5-0001
- v5-0002-Make-heap_page_is_all_visible-independent-of-LVRe.patch (text/x-patch) patch v5-0002
- v5-0006-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch) patch v5-0006
- v5-0007-Find-and-fix-VM-corruption-in-heap_page_prune_and.patch (text/x-patch) patch v5-0007
- v5-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch) patch v5-0005
- v5-0009-Update-VM-in-pruneheap.c.patch (text/x-patch) patch v5-0009
- v5-0008-Keep-all_frozen-updated-too-in-heap_page_prune_an.patch (text/x-patch) patch v5-0008
- v5-0010-Eliminate-xl_heap_visible-from-vacuum-phase-I-pru.patch (text/x-patch) patch v5-0010
- v5-0011-Rename-PruneState.freeze-to-attempt_freeze.patch (text/x-patch) patch v5-0011
- v5-0014-Use-GlobalVisState-to-determine-page-level-visibi.patch (text/x-patch) patch v5-0014
- v5-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisXi.patch (text/x-patch) patch v5-0013
- v5-0012-Remove-xl_heap_visible-entirely.patch (text/x-patch) patch v5-0012
- v5-0015-Inline-TransactionIdFollows-Precedes.patch (text/x-patch) patch v5-0015
- v5-0016-Unset-all-visible-sooner-if-not-freezing.patch (text/x-patch) patch v5-0016
- v5-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch) patch v5-0017
- v5-0018-Add-helper-functions-to-heap_page_prune_and_freez.patch (text/x-patch) patch v5-0018
- v5-0019-Reorder-heap_page_prune_and_freeze-parameters.patch (text/x-patch) patch v5-0019
Thanks for continuing to take a look, Andrey.
On Mon, Jul 14, 2025 at 2:37 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> This might be a bit off-topic for this thread, but as long as the patch touches that code we can look into this too.
>
> If VM bit all-visible is set while page is not all-visible IndexOnlyScan will show incorrect results. I observed this inconsistency few times on production.
That's very unfortunate. I wonder what could be causing this. Do you
suspect a bug in Postgres? Or something wrong with the disk, etc?
> Two persistent subsystems (VM and heap) contradict each other, that's why I think this is a data corruption. Yes, we can repair the VM by assuming heap to be the source of truth in this case. But we must also emit ERRCODE_DATA_CORRUPTED XX001 code into the logs. In many cases this will alert on-call SRE.
>
> To do so I propose to replace elog(WARNING,...) with ereport(WARNING,(errcode(ERRCODE_DATA_CORRUPTED),..).
Ah, you mean the warnings currently in lazy_scan_prune(). To me this
suggestion makes sense. I see at least one other example with
ERRCODE_DATA_CORRUPTED that is an error level below ERROR.
I have attached a cleaned up and updated version of the patch set (it
doesn't yet include your suggested error message change).
What's new in this version
-----
In addition to general code, comment, and commit message improvements,
notable changes are as follows:
- I have used the GlobalVisState for determining if the whole page is
visible in a more natural way.
- I micro-benchmarked the additional work SELECT queries do to set
the VM and identified some sources of regression. So, there are
several new commits addressing these (for example, inlining several
functions and unsetting all-visible when we see a dead tuple if we
won't attempt freezing).
- Because heap_page_prune_and_freeze() was getting long, I added some
helper functions.
Performance impact of setting the VM on-access
-------
I found that with the patch set applied, we set many pages all-visible
in the VM on access, resulting in a higher overall number of pages set
all-visible, reducing load for vacuum, and dramatically decreasing
heap fetches by index-only scans.
I devised a simple benchmark: 8 workers each insert a single row,
update the row they just inserted, and then insert 20 more rows, all
into a table with a few columns. Another worker queries the table once
per second using an index.
After running the benchmark for a few minutes, 15% more blocks were
all-visible at the end of the run with the patch set applied, even
though the table was autovacuumed several times in both cases.
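One way to reproduce this comparison is to read the visibility map directly at the end of each run. The sketch below is an illustration only, not the method used for the numbers above: it assumes the pg_visibility extension has been created in the benchmark database, and the helper name vm_summary_sql is hypothetical.

```shell
# Hypothetical helper (not part of the benchmark script): print a SQL query
# that reports how many blocks of the given table are marked all-visible and
# all-frozen in the visibility map, next to the table's total block count.
# Assumes CREATE EXTENSION pg_visibility has been run in the target database.
vm_summary_sql() {
    printf "SELECT all_visible, all_frozen, pg_relation_size('%s') / current_setting('block_size')::int AS total_blocks FROM pg_visibility_map_summary('%s');\n" "$1" "$1"
}
```

Running `psql -X -At -c "$(vm_summary_sql simple_table)"` on master and on the patched build at the same point in the benchmark would give the two all-visible counts to compare.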
And with my patch applied, index-only scans did far fewer heap
fetches. A SELECT count(*) of the table at the same point in the
benchmark did 10,000 heap fetches on master and 500 with the patch
applied (I used auto_explain to determine this).
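The heap-fetch count can also be checked by hand, since EXPLAIN (ANALYZE) prints a "Heap Fetches" line for each index-only scan node. A minimal sketch, assuming the text output format of EXPLAIN; the helper name sum_heap_fetches is hypothetical, and this is an alternative to the auto_explain setup mentioned above rather than a reconstruction of it.

```shell
# Hypothetical helper: read EXPLAIN (ANALYZE) text output on stdin and print
# the sum of all "Heap Fetches: N" values (0 if no such line is present).
sum_heap_fetches() {
    awk '/Heap Fetches:/ { total += $NF } END { print total + 0 }'
}
```

For example: `psql -X -At -c "EXPLAIN (ANALYZE) SELECT count(*) FROM simple_table" | sum_heap_fetches`. The result is only meaningful if the planner actually chooses an index-only scan for the query.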
With my patch applied, autovacuum workers write half as much WAL as on
master. Some of this is courtesy of other patches in the set which
eliminate separate WAL records for setting the page all-visible. But,
vacuum is also scanning fewer pages and dirtying fewer buffers because
they are being set all-visible on-access.
There are more details about the benchmark at the end of the email.
Setting pd_prune_xid on insert
------
The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
patch in the set. It sets pd_prune_xid on insert (so pages filled by
COPY or insert can also be set all-visible in the VM before they are
vacuumed). I gave it a .txt extension because it currently fails
035_standby_logical_decoding due to a recovery conflict. I need to
investigate more to see if this is a bug in my patch set or elsewhere
in Postgres.
Besides the failing test, I have a feeling that my current heuristic
for whether or not to set the VM on-access is not quite right for
pages that have only been inserted to -- and if we get it wrong, we've
wasted those CPU cycles because we didn't otherwise need to prune the
page.
- Melanie
Benchmark
-------
psql -c "
DROP TABLE IF EXISTS simple_table;
CREATE TABLE simple_table (
id SERIAL PRIMARY KEY,
group_id INT NOT NULL,
data TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
create index on simple_table(group_id);
"
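# ${version} is not set in this script; set it beforehand to label the
# output files for the build under test.
pgbench \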
--no-vacuum \
--random-seed=0 \
-c 8 \
-j 8 \
-M prepared \
-T 200 \
> "pgbench_run_summary_update_${version}" \
-f- <<EOF &
\set gid random(1,1000)
INSERT INTO simple_table (group_id, data)
SELECT :gid, 'inserted'
RETURNING id \gset
update simple_table set data = 'updated' where id = :id;
insert into simple_table (group_id, data)
select :gid, 'inserted'
from generate_series(1,20);
EOF
insert_pid=$!
pgbench \
--no-vacuum \
--random-seed=0 \
-c 1 \
-j 1 \
--rate=1 \
-M prepared \
-T 200 \
> "pgbench_run_summary_select_${version}" \
-f- <<EOF
\set gid random(1, 1000)
select max(created_at) from simple_table where group_id = :gid;
select count(*) from simple_table where group_id = :gid;
EOF
wait $insert_pid