Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

From: Robert Haas <robertmhaas@gmail.com>

To: Melanie Plageman <melanieplageman@gmail.com>

Cc: Andres Freund <andres@anarazel.de>, Kirill Reshke <reshkekirill@gmail.com>, Andrey Borodin <x4mmm@yandex-team.ru>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Heikki Linnakangas <hlinnaka@iki.fi>

Date: 2025-09-08T20:14:47Z

Lists: pgsql-hackers

Commits

Remove table_scan_analyze_next_tuple unneeded parameter OldestXmin
- 284925508ae6 19 (unreleased) landed
Simplify visibility check in heap_page_would_be_all_visible()
- 3efe58febc3c 19 (unreleased) landed
Eliminate use of cached VM value in lazy_scan_prune()
- 648a7e28d7c2 19 (unreleased) landed
Combine visibilitymap_set() cases in lazy_scan_prune()
- 21796c267d0a 19 (unreleased) landed
Fix const qualification in prune_freeze_setup()
- 4877391ce894 19 (unreleased) landed
Simplify vacuum visibility assertion
- bd298f54a0d6 19 (unreleased) landed
Split heap_page_prune_and_freeze() into helpers
- e135e044572e 19 (unreleased) landed
Assert that cutoffs are provided if freezing will be attempted
- cd38b7e77315 19 (unreleased) landed
Split PruneFreezeParams initializers to one field per line
- 1e14edcea5e1 19 (unreleased) landed
Refactor heap_page_prune_and_freeze() parameters into a struct
- 1937ed70621e 19 (unreleased) landed
Make heap_page_is_all_visible independent of LVRelState
- 3e4705484e0c 19 (unreleased) landed
Inline TransactionIdFollows/Precedes[OrEquals]()
- 43b05b38ea4d 19 (unreleased) landed
Add helper for freeze determination to heap_page_prune_and_freeze
- c8dd6542bae4 19 (unreleased) landed
Bump XLOG_PAGE_MAGIC after xl_heap_prune change
- 4a8fb58671d3 19 (unreleased) landed
Correct prune WAL record opcode name in comment
- ae8ea7278c16 19 (unreleased) landed
Add error codes when vacuum discovers VM corruption
- 8ec97e78a771 19 (unreleased) landed
Remove unused xl_heap_prune member, reason
- 4b5f206de2bb 19 (unreleased) landed
Remove unneeded VM pin from VM replay
- 3399c265543e 19 (unreleased) landed
Add assert and log message to visibilitymap_set
- e3d5ddb7ca91 19 (unreleased) landed
Add error codes to some corruption log messages
- fd6ec93bf890 13.0 cited

Reviewing 0003:

+               /*
+                * If we're only adding already frozen rows to a previously empty
+                * page, mark it as all-frozen and update the visibility map. We're
+                * already holding a pin on the vmbuffer.
+                */
                else if (all_frozen_set)
+               {
                        PageSetAllVisible(page);
+                       LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+                       visibilitymap_set_vmbits(relation,
+                                                BufferGetBlockNumber(buffer),
+                                                vmbuffer,
+                                                VISIBILITYMAP_ALL_VISIBLE |
+                                                VISIBILITYMAP_ALL_FROZEN);

Locking a buffer in a critical section violates the order of
operations proposed in the 'Write-Ahead Log Coding' section of
src/backend/access/transam/README.
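
To make the ordering concern concrete, here is a minimal, self-contained sketch in plain C. It does not use PostgreSQL's real API: MockBuffer, the mock_* helpers, the MOCK_VM_* constants, and the two set_frozen_* functions are hypothetical stand-ins for LockBuffer(), START_CRIT_SECTION(), and the VM update. It assumes the README ordering of pin and lock first, then enter the critical section, modify, WAL-log, leave the critical section, and unlock, and it simply counts locks taken after the critical section has begun.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PostgreSQL primitives; not the real API. */
#define MOCK_VM_ALL_VISIBLE 0x01
#define MOCK_VM_ALL_FROZEN  0x02

typedef struct MockBuffer
{
    bool          locked;
    bool          dirty;
    unsigned char vmbits;
} MockBuffer;

static int crit_section_depth = 0;  /* models the critical-section counter */
static int lock_violations = 0;     /* locks acquired inside a critical section */

static void mock_start_crit_section(void) { crit_section_depth++; }
static void mock_end_crit_section(void)   { crit_section_depth--; }

static void mock_lock_buffer(MockBuffer *buf)
{
    /* Record any lock acquisition that happens inside a critical section. */
    if (crit_section_depth > 0)
        lock_violations++;
    buf->locked = true;
}

static void mock_unlock_buffer(MockBuffer *buf) { buf->locked = false; }

static void mock_set_vmbits(MockBuffer *vmbuf, unsigned char bits)
{
    assert(vmbuf->locked);          /* caller must already hold the lock */
    vmbuf->vmbits |= bits;
    vmbuf->dirty = true;
}

/* README-style order: take every lock before entering the critical section. */
static int set_frozen_readme_order(MockBuffer *heap, MockBuffer *vm)
{
    lock_violations = 0;
    mock_lock_buffer(heap);
    mock_lock_buffer(vm);
    mock_start_crit_section();
    heap->dirty = true;             /* stands in for the heap page change */
    mock_set_vmbits(vm, MOCK_VM_ALL_VISIBLE | MOCK_VM_ALL_FROZEN);
    /* WAL insertion and page LSN update would happen here. */
    mock_end_crit_section();
    mock_unlock_buffer(vm);
    mock_unlock_buffer(heap);
    return lock_violations;
}

/* Order as in the quoted hunk: VM lock taken after the section has begun. */
static int set_frozen_patch_order(MockBuffer *heap, MockBuffer *vm)
{
    lock_violations = 0;
    mock_lock_buffer(heap);
    mock_start_crit_section();
    heap->dirty = true;
    mock_lock_buffer(vm);           /* lock acquired inside the critical section */
    mock_set_vmbits(vm, MOCK_VM_ALL_VISIBLE | MOCK_VM_ALL_FROZEN);
    mock_end_crit_section();
    mock_unlock_buffer(vm);
    mock_unlock_buffer(heap);
    return lock_violations;
}
```

Called with zero-initialized MockBuffer values, the first function reports no locks taken inside the critical section and the second reports one. Both leave the same VM bits set, so the only difference the sketch models is where the lock is acquired.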

+        * Now read and update the VM block. Even if we skipped updating the heap
+        * page due to the file being dropped or truncated later in recovery, it's
+        * still safe to update the visibility map.  Any WAL record that clears
+        * the visibility map bit does so before checking the page LSN, so any
+        * bits that need to be cleared will still be cleared.
+        *
+        * It is only okay to set the VM bits without holding the heap page lock
+        * because we can expect no other writers of this page.

The first paragraph of this paraphrases similar content in
xlog_heap_visible(), but I don't see the variation in phrasing as an
improvement.

The second paragraph does not convince me at all. I see no reason to
believe that this is safe, or that it is a good idea. The code in
xlog_heap_visible() thinks it's OK to unlock and relock the page to
make visibilitymap_set() happy, which is cringy but probably safe for
lack of concurrent writers, but skipping locking altogether seems
deeply unwise.

- *             visibilitymap_set        - set a bit in a previously pinned page
+ *             visibilitymap_set        - set bit(s) in a previously pinned page and log
+ *      visibilitymap_set_vmbits - set bit(s) in a pinned page

I suspect the indentation was done with a different mix of spaces and
tabs here, because this doesn't align for me.

In general, this idea makes some sense to me -- there doesn't seem to
be any particularly good reason why the visibility-map update should
be handled by a different WAL record than the all-visible flag on the
page itself. It's a little hard for me to make that statement too
conclusively without studying more of the patches than I've had time
to do today, but off the top of my head it seems to make sense.
However, I'm not sure you've taken enough care with the details here.

--
Robert Haas
EDB: http://www.enterprisedb.com