Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)


From: Andres Freund <andres@anarazel.de>

To: Melanie Plageman <melanieplageman@gmail.com>

Cc: Robert Haas <robertmhaas@gmail.com>, Kirill Reshke <reshkekirill@gmail.com>, Andrey Borodin <x4mmm@yandex-team.ru>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Heikki Linnakangas <hlinnaka@iki.fi>

Date: 2025-09-18T16:48:45Z

Lists: pgsql-hackers

Commits


Remove table_scan_analyze_next_tuple unneeded parameter OldestXmin
- 284925508ae6 19 (unreleased) landed
Simplify visibility check in heap_page_would_be_all_visible()
- 3efe58febc3c 19 (unreleased) landed
Eliminate use of cached VM value in lazy_scan_prune()
- 648a7e28d7c2 19 (unreleased) landed
Combine visibilitymap_set() cases in lazy_scan_prune()
- 21796c267d0a 19 (unreleased) landed
Fix const qualification in prune_freeze_setup()
- 4877391ce894 19 (unreleased) landed
Simplify vacuum visibility assertion
- bd298f54a0d6 19 (unreleased) landed
Split heap_page_prune_and_freeze() into helpers
- e135e044572e 19 (unreleased) landed
Assert that cutoffs are provided if freezing will be attempted
- cd38b7e77315 19 (unreleased) landed
Split PruneFreezeParams initializers to one field per line
- 1e14edcea5e1 19 (unreleased) landed
Refactor heap_page_prune_and_freeze() parameters into a struct
- 1937ed70621e 19 (unreleased) landed
Make heap_page_is_all_visible independent of LVRelState
- 3e4705484e0c 19 (unreleased) landed
Inline TransactionIdFollows/Precedes[OrEquals]()
- 43b05b38ea4d 19 (unreleased) landed
Add helper for freeze determination to heap_page_prune_and_freeze
- c8dd6542bae4 19 (unreleased) landed
Bump XLOG_PAGE_MAGIC after xl_heap_prune change
- 4a8fb58671d3 19 (unreleased) landed
Correct prune WAL record opcode name in comment
- ae8ea7278c16 19 (unreleased) landed
Add error codes when vacuum discovers VM corruption
- 8ec97e78a771 19 (unreleased) landed
Remove unused xl_heap_prune member, reason
- 4b5f206de2bb 19 (unreleased) landed
Remove unneeded VM pin from VM replay
- 3399c265543e 19 (unreleased) landed
Add assert and log message to visibilitymap_set
- e3d5ddb7ca91 19 (unreleased) landed
Add error codes to some corruption log messages
- fd6ec93bf890 13.0 cited

Hi,

On 2025-09-17 20:10:07 -0400, Melanie Plageman wrote:
> 0001 is RFC but waiting on one other reviewer

> From cacff6c95e38d370b87148bc48cf6ac5f086ed07 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Tue, 17 Jun 2025 17:22:10 -0400
> Subject: [PATCH v14 01/24] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE
> diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
> index cf843277938..faa7c561a8a 100644
> --- a/src/backend/access/heap/heapam_xlog.c
> +++ b/src/backend/access/heap/heapam_xlog.c
> @@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
>  	int			i;
>  	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
>  	XLogRedoAction action;
> +	Buffer		vmbuffer = InvalidBuffer;
>
>  	/*
>  	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
> @@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
>  	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
>  	{
>  		Relation	reln = CreateFakeRelcacheEntry(rlocator);
> -		Buffer		vmbuffer = InvalidBuffer;
>
>  		visibilitymap_pin(reln, blkno, &vmbuffer);
>  		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
>  		ReleaseBuffer(vmbuffer);
> +		vmbuffer = InvalidBuffer;
>  		FreeFakeRelcacheEntry(reln);
>  	}
>
> @@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
>  	if (BufferIsValid(buffer))
>  		UnlockReleaseBuffer(buffer);
>
> +	buffer = InvalidBuffer;
> +
> +	/*
> +	 * Now read and update the VM block.
> +	 *
> +	 * Note that the heap relation may have been dropped or truncated, leading
> +	 * us to skip updating the heap block due to the LSN interlock.

I don't fully understand this - how does dropping/truncating the relation lead
to skipping due to the LSN interlock?
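
For readers following along, here is a toy model of the interlock in question. It is not PostgreSQL code: every name is invented for illustration. It assumes the usual redo rule that a record is applied only when the page LSN is older than the record's LSN, and it models a missing block as a separate "not found" outcome. Under those assumptions, a dropped or truncated block is skipped without any LSN comparison taking place, which may be the distinction the question is after.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these names exist in PostgreSQL. */
typedef uint64_t ToyLsn;

typedef struct ToyPage
{
	ToyLsn		lsn;		/* LSN of the last record applied to this page */
	int			payload;	/* stand-in for the page contents */
} ToyPage;

typedef enum ToyRedoAction
{
	TOY_BLK_NEEDS_REDO,		/* page is older than the record: apply it */
	TOY_BLK_DONE,			/* page already reflects the record: skip it */
	TOY_BLK_NOTFOUND		/* block no longer exists: nothing to apply */
} ToyRedoAction;

/* Decide what a redo routine should do with one block of one record. */
static ToyRedoAction
toy_read_for_redo(ToyPage *page, ToyLsn record_lsn)
{
	if (page == NULL)
		return TOY_BLK_NOTFOUND;	/* models a dropped/truncated block */
	if (record_lsn <= page->lsn)
		return TOY_BLK_DONE;		/* the LSN interlock */
	return TOY_BLK_NEEDS_REDO;
}

/* Apply the record if needed; returns true when the page was modified. */
static bool
toy_redo(ToyPage *page, ToyLsn record_lsn, int new_payload)
{
	if (toy_read_for_redo(page, record_lsn) != TOY_BLK_NEEDS_REDO)
		return false;

	assert(page != NULL);
	page->payload = new_payload;
	page->lsn = record_lsn;		/* stamp the page so a replay is a no-op */
	return true;
}
```

Calling toy_redo() twice with the same record LSN modifies the page only the first time; passing NULL for the page yields TOY_BLK_NOTFOUND regardless of the LSN.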


> +	 * even in that case, it's still safe to update the visibility map. Any
> +	 * WAL record that clears the visibility map bit does so before checking
> +	 * the page LSN, so any bits that need to be cleared will still be
> +	 * cleared.
> +	 *
> +	 * Note that the lock on the heap page was dropped above. In normal
> +	 * operation this would never be safe because a concurrent query could
> +	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
> +	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
> +	 * the VM is set.
> +	 *
> +	 * In recovery, we expect no other writers, so writing to the VM page
> +	 * without holding a lock on the heap page is considered safe enough. It
> +	 * is done this way when replaying xl_heap_visible records (see
> +	 * heap_xlog_visible()).
> +	 */
> +	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
> +		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
> +									  &vmbuffer) == BLK_NEEDS_REDO)
> +	{

Why are we using RBM_ZERO_ON_ERROR here? I know it's copied from
heap_xlog_visible(), but I don't immediately understand (or remember) why we
do so there either?
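
As background only, and without answering why the mode is used here: RBM_ZERO_ON_ERROR reads the page and, if its header fails validation, hands back a zeroed page instead of raising an error. The sketch below models just that behavior plus the "initialize if new" step from the quoted hunk. The page layout, the magic byte, and all function names are assumptions made up for this example, not PostgreSQL's.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE	16
#define TOY_PAGE_MAGIC	0xA5	/* hypothetical "valid header" marker */

typedef enum ToyReadMode
{
	TOY_RBM_NORMAL,			/* invalid header is reported as an error */
	TOY_RBM_ZERO_ON_ERROR	/* invalid header yields a zeroed page */
} ToyReadMode;

typedef struct ToyPage
{
	unsigned char bytes[TOY_PAGE_SIZE];
} ToyPage;

/* A page whose header byte is zero counts as new (never initialized). */
static bool
toy_page_is_new(const ToyPage *page)
{
	return page->bytes[0] == 0;
}

/* New pages and pages carrying the magic byte both pass validation. */
static bool
toy_header_is_valid(const ToyPage *page)
{
	return toy_page_is_new(page) || page->bytes[0] == TOY_PAGE_MAGIC;
}

/* Reset the page to an empty but valid state. */
static void
toy_page_init(ToyPage *page)
{
	assert(page != NULL);
	memset(page->bytes, 0, TOY_PAGE_SIZE);
	page->bytes[0] = TOY_PAGE_MAGIC;
}

/* Copy a page from "disk"; returns false where a real read would error. */
static bool
toy_read_page(const ToyPage *on_disk, ToyReadMode mode, ToyPage *out)
{
	*out = *on_disk;
	if (toy_header_is_valid(out))
		return true;
	if (mode == TOY_RBM_ZERO_ON_ERROR)
	{
		memset(out->bytes, 0, TOY_PAGE_SIZE);	/* discard the bad contents */
		return true;
	}
	return false;
}
```

A caller that reads with TOY_RBM_ZERO_ON_ERROR must then check toy_page_is_new() and call toy_page_init() before using the page, mirroring the PageIsNew()/PageInit() pair in the patch.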


> +		Page		vmpage = BufferGetPage(vmbuffer);
> +		Relation	reln = CreateFakeRelcacheEntry(rlocator);

Hm. Do we really need to keep doing this ugly fake relcache stuff? I'd
really like to eventually get rid of that, and given that the new "code shape"
delegates a lot more responsibility to the redo routines, they should have a
fairly easy time not needing a fake relcache. AFAICT the relation is already
not used outside of debugging paths?


> +		/* initialize the page if it was read as zeros */
> +		if (PageIsNew(vmpage))
> +			PageInit(vmpage, BLCKSZ, 0);
> +
> +		visibilitymap_set_vmbits(reln, blkno,
> +								 vmbuffer,
> +								 VISIBILITYMAP_ALL_VISIBLE |
> +								 VISIBILITYMAP_ALL_FROZEN);
> +
> +		/*
> +		 * It is not possible that the VM was already set for this heap page,
> +		 * so the vmbuffer must have been modified and marked dirty.
> +		 */

I assume that's because we a) checked the LSN interlock and b) are replaying
a record that needed to newly set the bit?
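
The invariant discussed here can be illustrated with a small standalone bitmap. The sketch assumes the visibility map's two-bits-per-heap-block layout (all-visible = 0x01, all-frozen = 0x02); the struct, the map size, and the function names are hypothetical and do not match the signature of visibilitymap_set_vmbits() in the patch. The setter reports whether it changed anything, which is what a redo routine could assert on after passing the LSN check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VM_ALL_VISIBLE		0x01
#define TOY_VM_ALL_FROZEN		0x02
#define TOY_VM_VALID_BITS		0x03
#define TOY_BITS_PER_HEAPBLOCK	2
#define TOY_HEAPBLOCKS_PER_BYTE	4
#define TOY_MAP_BYTES			8	/* toy map covers 32 heap blocks */

typedef struct ToyVmPage
{
	uint8_t		map[TOY_MAP_BYTES];
	bool		dirty;
} ToyVmPage;

/* Return the two status bits stored for one heap block. */
static uint8_t
toy_vm_get(const ToyVmPage *vm, uint32_t heap_blk)
{
	uint32_t	byte = heap_blk / TOY_HEAPBLOCKS_PER_BYTE;
	uint32_t	shift = (heap_blk % TOY_HEAPBLOCKS_PER_BYTE) * TOY_BITS_PER_HEAPBLOCK;

	assert(byte < TOY_MAP_BYTES);
	return (vm->map[byte] >> shift) & TOY_VM_VALID_BITS;
}

/* Set the requested bits; returns true only if the map actually changed. */
static bool
toy_vm_set(ToyVmPage *vm, uint32_t heap_blk, uint8_t flags)
{
	uint32_t	byte = heap_blk / TOY_HEAPBLOCKS_PER_BYTE;
	uint32_t	shift = (heap_blk % TOY_HEAPBLOCKS_PER_BYTE) * TOY_BITS_PER_HEAPBLOCK;

	assert((flags & ~TOY_VM_VALID_BITS) == 0);
	if ((toy_vm_get(vm, heap_blk) & flags) == flags)
		return false;			/* bits were already set: page stays clean */

	vm->map[byte] |= (uint8_t) (flags << shift);
	vm->dirty = true;
	return true;
}
```

In a redo routine shaped like the quoted hunk, the return value of toy_vm_set() would be expected to be true every time, since the record is only replayed when the VM page predates it.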


Except for the above comments, this looks pretty good to me.


Seems 0002 should just be applied...


Re 0003: I wonder if it's getting to the point that a struct should be used as
the argument.
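
A generic sketch of the parameter-struct pattern being suggested follows. The message does not show which function 0003 changes, so the struct, its fields, and the helper are all invented for illustration, and the XID comparison deliberately ignores wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of what would otherwise be separate arguments. */
typedef struct ToyPruneParams
{
	uint32_t	blkno;				/* heap block being processed */
	bool		attempt_freeze;		/* should freezing be considered? */
	const uint32_t *cutoff_xid;		/* required when attempt_freeze is true */
} ToyPruneParams;

/*
 * Decide whether a tuple with the given xid would be frozen.  New inputs
 * can be added to ToyPruneParams without touching every call site.
 */
static bool
toy_would_freeze(const ToyPruneParams *params, uint32_t tuple_xid)
{
	if (!params->attempt_freeze)
		return false;

	/* The struct makes cross-field requirements easy to check in one place. */
	assert(params->cutoff_xid != NULL);
	return tuple_xid < *params->cutoff_xid;	/* toy comparison, no wraparound */
}
```

Callers would fill the struct with designated initializers, one field per line, so that each argument is named at the call site.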

Greetings,

Andres Freund