Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Melanie Plageman <melanieplageman@gmail.com>
Commits
- 284925508ae6: Remove table_scan_analyze_next_tuple unneeded parameter OldestXmin [19 (unreleased), landed]
- 3efe58febc3c: Simplify visibility check in heap_page_would_be_all_visible() [19 (unreleased), landed]
- 648a7e28d7c2: Eliminate use of cached VM value in lazy_scan_prune() [19 (unreleased), landed]
- 21796c267d0a: Combine visibilitymap_set() cases in lazy_scan_prune() [19 (unreleased), landed]
- 4877391ce894: Fix const qualification in prune_freeze_setup() [19 (unreleased), landed]
- bd298f54a0d6: Simplify vacuum visibility assertion [19 (unreleased), landed]
- e135e044572e: Split heap_page_prune_and_freeze() into helpers [19 (unreleased), landed]
- cd38b7e77315: Assert that cutoffs are provided if freezing will be attempted [19 (unreleased), landed]
- 1e14edcea5e1: Split PruneFreezeParams initializers to one field per line [19 (unreleased), landed]
- 1937ed70621e: Refactor heap_page_prune_and_freeze() parameters into a struct [19 (unreleased), landed]
- 3e4705484e0c: Make heap_page_is_all_visible independent of LVRelState [19 (unreleased), landed]
- 43b05b38ea4d: Inline TransactionIdFollows/Precedes[OrEquals]() [19 (unreleased), landed]
- c8dd6542bae4: Add helper for freeze determination to heap_page_prune_and_freeze [19 (unreleased), landed]
- 4a8fb58671d3: Bump XLOG_PAGE_MAGIC after xl_heap_prune change [19 (unreleased), landed]
- ae8ea7278c16: Correct prune WAL record opcode name in comment [19 (unreleased), landed]
- 8ec97e78a771: Add error codes when vacuum discovers VM corruption [19 (unreleased), landed]
- 4b5f206de2bb: Remove unused xl_heap_prune member, reason [19 (unreleased), landed]
- 3399c265543e: Remove unneeded VM pin from VM replay [19 (unreleased), landed]
- e3d5ddb7ca91: Add assert and log message to visibilitymap_set [19 (unreleased), landed]
- fd6ec93bf890: Add error codes to some corruption log messages [13.0, cited]

On Thu, Dec 18, 2025 at 3:55 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Thu, 18 Dec 2025 at 05:30, Melanie Plageman
> <melanieplageman@gmail.com> wrote:
>
> > If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
> > getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
> > causing us to vacuum an all-frozen empty page.
>
> Yes, vacuum (disable_page_skipping);

Ah, right, that would be a reliable way for it to happen.

> > Then the question is, why wouldn't we have coverage of the empty page
> > first being set all-visible/all-frozen? It can't be COPY FREEZE
> > because the page is empty. And it can't be vacuum, because then we
> > would have coverage. It's very mysterious.
<--snip-->
> I am currently inclined to think that we cannot see an empty page that
> has PD_ALL_VISIBLE not-set. This is because when we make a page empty,
> we are in a critical section, and we WAL-log everything we do, so our
> changes should not be half-made. Maybe as of 608195a3a365, there was a
> case with empty-page-without-PD_ALL_VISIBLE, but I don't think this
> happens on HEAD.

Right. The way an empty page gets PD_ALL_VISIBLE set is that all of
its tuples are deleted, and the next vacuum then sets it all-visible
and all-frozen, with PD_ALL_VISIBLE set. (A trailing page will be
truncated instead, but any non-trailing page ends up like this.)

But you are right, I don't see any non-error code path where a heap
page would become empty (all line pointers set unused) and then not be
set all-visible. Only vacuum sets line pointers unused, and if all the
line pointers are unused it will always set the page all-visible.

I think, though, that if we error out in lazy_scan_prune() after
returning from heap_page_prune_and_freeze() such that we don't set the
empty page all-visible, we can end up with an empty page without
PD_ALL_VISIBLE set. You can see how this might work by patching the VM
set code in lazy_scan_prune() to skip empty pages.
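
To make that error path concrete, here is a minimal sketch as a toy model in C. None of this is PostgreSQL code: ToyPage, toy_prune(), and toy_vacuum_page() are hypothetical names, and the fail_after_prune flag is an assumption that stands in for an ERROR raised between pruning and setting the page all-visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not PostgreSQL code. */
typedef struct ToyPage
{
    int  nused;          /* line pointers still in use */
    bool pd_all_visible; /* stands in for PD_ALL_VISIBLE */
} ToyPage;

/* Stands in for pruning: mark every line pointer unused. */
static void
toy_prune(ToyPage *page)
{
    page->nused = 0;
}

/*
 * Stands in for vacuuming one page. When fail_after_prune is true we
 * return early, modeling an error raised after pruning but before the
 * page is marked all-visible. Returns true if the pass completed.
 */
static bool
toy_vacuum_page(ToyPage *page, bool fail_after_prune)
{
    toy_prune(page);

    if (fail_after_prune)
        return false;   /* page is now empty, flag still unset */

    if (page->nused == 0)
        page->pd_all_visible = true;

    return true;
}
```

In this model a failed pass leaves the page empty with the flag unset, and the next successful pass sets it without any special case for empty pages.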
> I did small archeology and this "if (PageIsEmpty(page)) { if
> (!PageIsAllVisible(page)) { .... }}" code originates back to
> 608195a3a365. Comment about not WAL-logged relation extension is from
> a6370fd9ed3d, and I don't think we need to think about this case.

Thanks for looking into this. Even if this code was added to handle
the error codepath I mentioned above, it seems like it would have been
good enough to just let lazy_scan_prune() handle setting the empty
page all-visible the next time the page was vacuumed. Since there is
no non-error code path where this can happen, it doesn't seem like it
would merit its own special case.

It is possible it was more common as of 608195a3a365, as you say.

I don't understand how the bug fixed by a6370fd9ed3d can happen. When
a new page is initialized, flags are set to 0, so regardless of WAL
logging of the extension not happening, how would the new page have
been set PD_ALL_VISIBLE? We'll have to ask Andres or Robert about how
this was hit.

> Also, after the whole set is committed, we should then never
> experience a discrepancy between PD_ALL_VISIBLE and VM bits? Because
> they will be set in a single WAL record. The only cases when heap and
> VM disagree on all-visibility then are corruption,
> pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
> If my understanding is correct, should we document this?

Even on current master, I don't see a scenario other than VM
corruption or truncation where PD_ALL_VISIBLE can be set but not the
VM (or vice versa). The only way would be if you error out after
setting PD_ALL_VISIBLE but before setting the VM. Setting
PD_ALL_VISIBLE is not in a critical section in lazy_scan_prune(), so
an error there won't PANIC and dump shared memory, and the buffer with
PD_ALL_VISIBLE set may later get written out. But the only obvious way
I see to error out of MarkBufferDirty() is if the buffer is not valid
-- which would have kept us from doing previous operations on the
buffer, I would think.

It's true this will no longer happen after my patches, as
PageSetAllVisible() will happen in a critical section. We could add a
comment about this particular scenario in the code somewhere. But I
don't think we should document it in any user-facing documentation
since you could still truncate the VM and have the two out of sync.
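
For reference, the possible combinations of the two bits can be written down as a small classifier. This is only an illustrative sketch: ToyVisState and toy_classify() are hypothetical names, not anything in the tree, and the two boolean arguments stand in for the page's PD_ALL_VISIBLE flag and the VM's all-visible bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not PostgreSQL code. */
typedef enum ToyVisState
{
    TOY_CONSISTENT,     /* both set, or both clear */
    TOY_PAGE_BIT_ONLY,  /* page flag set, VM bit clear (e.g. VM truncated) */
    TOY_VM_BIT_ONLY     /* VM bit set, page flag clear */
} ToyVisState;

/* Classify how the page-level flag and the VM bit relate to each other. */
static ToyVisState
toy_classify(bool pd_all_visible, bool vm_all_visible)
{
    if (pd_all_visible == vm_all_visible)
        return TOY_CONSISTENT;

    return pd_all_visible ? TOY_PAGE_BIT_ONLY : TOY_VM_BIT_ONLY;
}
```

Truncating the VM lands in the TOY_PAGE_BIT_ONLY case, which is why the two can still be out of sync even once they are set together.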
- Melanie