Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Melanie Plageman <melanieplageman@gmail.com>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Remove table_scan_analyze_next_tuple unneeded parameter OldestXmin
- 284925508ae6 19 (unreleased) landed
-
Simplify visibility check in heap_page_would_be_all_visible()
- 3efe58febc3c 19 (unreleased) landed
-
Eliminate use of cached VM value in lazy_scan_prune()
- 648a7e28d7c2 19 (unreleased) landed
-
Combine visibilitymap_set() cases in lazy_scan_prune()
- 21796c267d0a 19 (unreleased) landed
-
Fix const qualification in prune_freeze_setup()
- 4877391ce894 19 (unreleased) landed
-
Simplify vacuum visibility assertion
- bd298f54a0d6 19 (unreleased) landed
-
Split heap_page_prune_and_freeze() into helpers
- e135e044572e 19 (unreleased) landed
-
Assert that cutoffs are provided if freezing will be attempted
- cd38b7e77315 19 (unreleased) landed
-
Split PruneFreezeParams initializers to one field per line
- 1e14edcea5e1 19 (unreleased) landed
-
Refactor heap_page_prune_and_freeze() parameters into a struct
- 1937ed70621e 19 (unreleased) landed
-
Make heap_page_is_all_visible independent of LVRelState
- 3e4705484e0c 19 (unreleased) landed
-
Inline TransactionIdFollows/Precedes[OrEquals]()
- 43b05b38ea4d 19 (unreleased) landed
-
Add helper for freeze determination to heap_page_prune_and_freeze
- c8dd6542bae4 19 (unreleased) landed
-
Bump XLOG_PAGE_MAGIC after xl_heap_prune change
- 4a8fb58671d3 19 (unreleased) landed
-
Correct prune WAL record opcode name in comment
- ae8ea7278c16 19 (unreleased) landed
-
Add error codes when vacuum discovers VM corruption
- 8ec97e78a771 19 (unreleased) landed
-
Remove unused xl_heap_prune member, reason
- 4b5f206de2bb 19 (unreleased) landed
-
Remove unneeded VM pin from VM replay
- 3399c265543e 19 (unreleased) landed
-
Add assert and log message to visibilitymap_set
- e3d5ddb7ca91 19 (unreleased) landed
-
Add error codes to some corruption log messages
- fd6ec93bf890 13.0 cited
Attachments
- v13-0004-Use-xl_heap_prune-record-for-setting-empty-pages.patch (text/x-patch) patch v13-0004
- v13-0001-Eliminate-xl_heap_visible-in-COPY-FREEZE.patch (text/x-patch) patch v13-0001
- v13-0002-Make-heap_page_is_all_visible-independent-of-LVR.patch (text/x-patch) patch v13-0002
- v13-0003-Eliminate-xl_heap_visible-from-vacuum-phase-III.patch (text/x-patch) patch v13-0003
- v13-0007-Find-and-fix-VM-corruption-in-heap_page_prune_an.patch (text/x-patch) patch v13-0007
- v13-0006-Combine-vacuum-phase-I-VM-update-cases.patch (text/x-patch) patch v13-0006
- v13-0008-Keep-all_frozen-updated-too-in-heap_page_prune_a.patch (text/x-patch) patch v13-0008
- v13-0005-Combine-lazy_scan_prune-VM-corruption-cases.patch (text/x-patch) patch v13-0005
- v13-0010-Rename-PruneState.freeze-to-attempt_freeze.patch (text/x-patch) patch v13-0010
- v13-0009-Update-VM-in-pruneheap.c.patch (text/x-patch) patch v13-0009
- v13-0011-Eliminate-xl_heap_visible-from-vacuum-phase-I-pr.patch (text/x-patch) patch v13-0011
- v13-0012-Remove-xl_heap_visible-entirely.patch (text/x-patch) patch v13-0012
- v13-0015-Inline-TransactionIdFollows-Precedes.patch (text/x-patch) patch v13-0015
- v13-0013-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (text/x-patch) patch v13-0013
- v13-0014-Use-GlobalVisState-to-determine-page-level-visib.patch (text/x-patch) patch v13-0014
- v13-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (text/x-patch) patch v13-0017
- v13-0018-Add-helper-functions-to-heap_page_prune_and_free.patch (text/x-patch) patch v13-0018
- v13-0016-Unset-all-visible-sooner-if-not-freezing.patch (text/x-patch) patch v13-0016
- v13-0020-Set-pd_prune_xid-on-insert.patch (text/x-patch) patch v13-0020
- v13-0019-Reorder-heap_page_prune_and_freeze-parameters.patch (text/x-patch) patch v13-0019
Thanks for the review! I've made the changes to comments and minor
fixes you suggested in attached v13 and have limited my inline
responses to areas where further discussion is required.
On Tue, Sep 9, 2025 at 3:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> 0003. What you've done here with xl_heap_prune.flags is kind of
> horrifying. The problem is that, while you've added code explaining
> that VISIBILITYMAP_ALL_{VISIBLE,FROZEN} are honorary XLHP flags,
> nobody who isn't looking directly at that comment is going to
> understand the muddling of the two namespaces. I would suggest not
> doing this, even if it means defining redundant constants and writing
> technically-unnecessary code to translate between them.
Fair. I've introduced new XLHP flags in attached v13. Hopefully it
puts an end to the horror.
> 0004. It is not clear to me why you need to get
> log_heap_prune_and_freeze to do the work here. Why can't
> log_newpage_buffer get the job done already?
Well, I need something to emit the changes to the VM. I'm eliminating
all users of xl_heap_visible. Empty pages are the ones that benefit
the least from switching from xl_heap_visible -> xl_heap_prune. But,
if I don't transition them, we have to maintain all the
xl_heap_visible code (including visibilitymap_set() in its long form).
As for log_newpage_buffer(), I could keep it if you think it is too
confusing to change log_heap_prune_and_freeze()'s API (by passing
force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
there and then call log_heap_prune_and_freeze().
I just thought it seemed simple to avoid emitting the new page record
and the VM update record, so why not -- but I don't have strong
feelings.
> 0005. It looks a little curious that you delete the
> identify-corruption logic from the end of the if-nest and add it to
> the beginning. Ceteris paribus, you'd expect that to be worse, since
> corruption is a rare case.
On master, the two corruption cases are sandwiched between the normal
VM set cases. And I actually think doing it this way is brittle. If
you put the cases which set the VM first, you have to have completely
bulletproof the if statements guarding them to foreclose any possible
corruption case from entering because otherwise you will overwrite the
corruption you then try to detect.
But, specifically, from a performance perspective:
I think moving up the third case doesn't matter because the check is so cheap:
else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
And as for moving up the second case (the other corruption case), the
non-cheap thing it does is call visibilitymap_get_status()
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
But once you call visibilitymap_get_status() once, assuming there is
no corruption and you need to go set the VM, you've already got that
page of the VM read, so it is probably pretty cheap. Overall, I didn't
think this would add noticeable overhead or many wasted operations.
And I thought that reorganizing the code improved clarity as well as
decreased the likelihood of bugs from insufficiently guarding positive
cases against corrupt pages and overwriting corruption instead of
detecting it.
If we're really worried about it from a performance perspective, I
could add an extra test at the top of identify_and_fix_vm_corruption()
that dumps out early if (!all_visible_according_to_vm &&
presult.all_visible).
> + * If the heap page is all-visible but the VM bit is
> not set, we don't
> + * need to dirty the heap page. However, if checksums
> are enabled, we
> + * do need to make sure that the heap page is dirtied
> before passing
> + * it to visibilitymap_set(), because it may be logged.
> */
> - PageSetAllVisible(page);
> - MarkBufferDirty(buf);
> + if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
> + {
> + PageSetAllVisible(page);
> + MarkBufferDirty(buf);
> + }
>
> I really hate this. Maybe you're going to argue that it's not the job
> of this patch to fix the awfulness here, but surely marking a buffer
> dirty in case some other function decides to WAL-log it is a
> ridiculous plan.
Right, it isn't pretty. But I don't quite see what the alternative is.
We need to mark the buffer dirty before setting the LSN. We could
perhaps rewrite visibilitymap_set()'s API to return the LSN of the
xl_heap_visible record and stamp it on the heap buffer ourselves. But
1) I think visibilitymap_set() purposefully conceals its WAL logging
ways from the caller and propagating that info back up starts to make
the API messy in another way and 2) I'm a bit loath to make big
changes to visibilitymap_set() right now since my patch set eventually
resolves this by putting the changes to the VM and heap page in the
same WAL record.
- Melanie