Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

Melanie Plageman <melanieplageman@gmail.com>

From: Melanie Plageman <melanieplageman@gmail.com>
To: Xuneng Zhou <xunengzhou@gmail.com>
Cc: Andres Freund <andres@anarazel.de>, Kirill Reshke <reshkekirill@gmail.com>, Robert Haas <robertmhaas@gmail.com>, Andrey Borodin <x4mmm@yandex-team.ru>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Heikki Linnakangas <hlinnaka@iki.fi>, Chao Li <li.evan.chao@gmail.com>
Date: 2025-12-19T21:09:47Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Remove table_scan_analyze_next_tuple unneeded parameter OldestXmin

  2. Simplify visibility check in heap_page_would_be_all_visible()

  3. Eliminate use of cached VM value in lazy_scan_prune()

  4. Combine visibilitymap_set() cases in lazy_scan_prune()

  5. Fix const qualification in prune_freeze_setup()

  6. Simplify vacuum visibility assertion

  7. Split heap_page_prune_and_freeze() into helpers

  8. Assert that cutoffs are provided if freezing will be attempted

  9. Split PruneFreezeParams initializers to one field per line

  10. Refactor heap_page_prune_and_freeze() parameters into a struct

  11. Make heap_page_is_all_visible independent of LVRelState

  12. Inline TransactionIdFollows/Precedes[OrEquals]()

  13. Add helper for freeze determination to heap_page_prune_and_freeze

  14. Bump XLOG_PAGE_MAGIC after xl_heap_prune change

  15. Correct prune WAL record opcode name in comment

  16. Add error codes when vacuum discovers VM corruption

  17. Remove unused xl_heap_prune member, reason

  18. Remove unneeded VM pin from VM replay

  19. Add assert and log message to visibilitymap_set

  20. Add error codes to some corruption log messages

Attachments

Attached v29 addresses some feedback and also corrects a small error
with the assertion I had added in the previous version's 0009.

On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> I’ve done a basic review of patches 1 and 2. Here are some comments
> which may be somewhat immature, as this is a fairly large change set
> and I’m new to some parts of the code.
>
> 1) Potential stale old_vmbits after VM repair n v2

Good catch! I've fixed this in attached v29.

> 2) Add Assert(BufferIsDirty(buf))
>
> Since the patch's core claim is "buffer must be dirty before WAL
> registration", an assertion encodes this invariant. Should we add:
>
> Assert(BufferIsValid(buf));
> Assert(BufferIsDirty(buf));
>
> right before the visibilitymap_set() call?

There are already assertions that will trip in various places -- most
importantly in XLogRegisterBuffer(), which is the one that inspired
this refactor.

> The comment at lines:
> > "The only scenario where it is not already dirty is if the VM was removed…"
>
> This phrasing could become misleading after future refactors. Can we
> make it more direct like:
>
> > "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."

I see your point about future refactors missing updating comments like
this. But, I don't think we are going to refactor the code such that
we can have PD_ALL_VISIBLE set without the VM bits set more often.
Also, it is common practice in Postgres to describe very specific edge
cases or odd scenarios in order to explain code that may seem
confusing without the comment. It does risk that comment later
becoming stale, but it is better that future developers understand why
the code is there.

That being said, I take your point that the comment is confusing. I
have updated it in a different way.

> > "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
>
> In this test we now call MarkBufferDirty() on the heap page even when
> only setting the VM, so the comments claiming “does not need to modify
> the heap buffer”/“no heap page modification” might be misleading. It
> might be better to say the test doesn’t need to modify heap
> tuples/page contents or doesn’t need to prune/freeze.

The point I'm trying to make is that we have to dirty the buffer even
if we don't modify the page because of the XLOG sub-system
requirements. And, it may seem like a waste to do that if not
modifying the page, but the page will rarely be clean anyway. I've
tried to make this more clear in attached v29.

- Melanie