Re: Add 64-bit XIDs into PostgreSQL 15

Yura Sokolov <y.sokolov@postgrespro.ru>

From: Yura Sokolov <y.sokolov@postgrespro.ru>
To: Evgeny Voropaev <evorop.wiki@gmail.com>, Maxim Orlov <orlovmg@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2025-06-11T13:12:39Z
Lists: pgsql-hackers

Commits

  1. Add SLRU tests for 64-bit page case

  2. Make use FullTransactionId in 2PC filenames

  3. Use larger segment file names for pg_notify

  4. Index SLRUs by 64-bit integers rather than by 32-bit integers

11.06.2025 09:00, Evgeny Voropaev wrote:
> 2) About repairing fragmentation.
> 
> The original approach implemented in PG18 assumes that fragmentation 
> occurs during every `prune_freeze` operation. It happens because the 
> logic of the "redo"-function `heap_xlog_prune_freeze` assumes that 
> fragmentation has to be done by `heap_page_prune_execute`.


> Attempting to 
> omit fragmentation can result in page inconsistencies on the "redo"-side 
> (i.e. on a secondary node, or during the recovery process on primary 
> one).

No! The patch uses a flag in the WAL record to instruct the "redo" side to
skip repairing fragmentation as well when needed.
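
To illustrate the idea, here is a minimal toy sketch in C. It is not the
patch code: `ToyPage`, `ToyPruneRecord`, `toy_prune_do`, `toy_prune_redo` and
the flag name `TOY_PRUNE_SKIP_REPAIR_FRAG` are all hypothetical names made up
for this example. It only shows that when the "do" side records its decision
in the record, the "redo" side can repeat exactly the same action.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real patch may name and encode it differently. */
#define TOY_PRUNE_SKIP_REPAIR_FRAG 0x01

typedef struct ToyPruneRecord
{
    uint8_t flags;      /* travels inside the (toy) WAL record */
} ToyPruneRecord;

typedef struct ToyPage
{
    bool pruned;        /* dead items were removed */
    bool compacted;     /* tuples were moved to repair fragmentation */
} ToyPage;

/* Shared worker used by both the "do" side and the "redo" side. */
static void
toy_prune_execute(ToyPage *page, bool repairFragmentation)
{
    page->pruned = true;
    if (repairFragmentation)
        page->compacted = true;
}

/* "do" side: perform the action and remember the decision in the record. */
static ToyPruneRecord
toy_prune_do(ToyPage *page, bool repairFragmentation)
{
    ToyPruneRecord rec = {0};

    assert(page != NULL);
    if (!repairFragmentation)
        rec.flags |= TOY_PRUNE_SKIP_REPAIR_FRAG;
    toy_prune_execute(page, repairFragmentation);
    return rec;
}

/* "redo" side: decode the flag and repeat the same action. */
static void
toy_prune_redo(ToyPage *page, const ToyPruneRecord *rec)
{
    bool repairFragmentation;

    assert(page != NULL && rec != NULL);
    repairFragmentation = (rec->flags & TOY_PRUNE_SKIP_REPAIR_FRAG) == 0;
    toy_prune_execute(page, repairFragmentation);
}
```

In this toy model the page on the "redo" side always ends up in the same
state as the page on the "do" side, whichever value of `repairFragmentation`
was used.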

> So, implementation of optional repairing of fragmentation 
> conflicts with the basic assumption about "necessity of fragmentation". 
> In order to prevent inconsistency xid64v62 patch invokes 
> `heap_page_prune_and_freeze` with `repairFragmentation` equal to true 
> from everywhere in the patch code except from 
> `heap_page_prepare_for_xid` which uses `repairFragmentation=false`.
> 
> So, why must we perform a `heap_page_prune_execute` without a 
> fragmentation during the preparation of a page for xid?
> 
> What exactly would break if we did invoke `heap_page_prune_execute` with 
> `repairFragmentation=true` during performing of `heap_page_prepare_for_xid`?

Short answer:
- the `repairFragmentation` parameter was added after investigating real
production issues with earlier patch versions.

Long answer:

How does SELECT work with tuples on a page?
It:
- PINS the page
- takes the CONTENT LOCK in SHARED mode
- collects HeapTuples whose t_data POINTS INTO THE RAW PAGE (fields such as
t_data->t_choice.t_heap are read directly from the page)
- RELEASES the content lock
- may use those HeapTuples for an indefinitely long time, relying only on
the PIN of the page.
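
The steps above can be sketched with a small toy model in C. This is only an
illustration under my own assumptions: `ToyBuffer`, `ToyTuple`,
`toy_select_fetch`, `toy_select_release` and `toy_move_tuple` are
hypothetical names, not PostgreSQL structures or functions. The point is that
the reader keeps a pointer into the page bytes after releasing the content
lock, so moving the tuple while the pin is still held changes what that
pointer sees.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

typedef struct ToyBuffer
{
    char data[TOY_PAGE_SIZE];   /* raw page bytes */
    int  pin_count;             /* number of pins currently held */
    int  shared_lockers;        /* holders of the content lock, SHARED mode */
} ToyBuffer;

typedef struct ToyTuple
{
    const char *t_data;         /* points INTO the raw page, not a copy */
    int         t_len;
} ToyTuple;

/* Reader side: pin, lock shared, remember a pointer, unlock. Pin is kept. */
static ToyTuple
toy_select_fetch(ToyBuffer *buf, int offset, int len)
{
    ToyTuple tup;

    assert(offset >= 0 && len >= 0 && offset + len <= TOY_PAGE_SIZE);
    buf->pin_count++;           /* PIN the page */
    buf->shared_lockers++;      /* take content lock in SHARED mode */
    tup.t_data = buf->data + offset;
    tup.t_len = len;
    buf->shared_lockers--;      /* RELEASE content lock */
    return tup;                 /* caller still holds the pin */
}

/* Reader is done with its tuples: drop the pin. */
static void
toy_select_release(ToyBuffer *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
}

/* What repairing fragmentation does in this model: move tuple bytes. */
static void
toy_move_tuple(ToyBuffer *buf, int from, int to, int len)
{
    /* keep the toy simple: source and destination must not overlap */
    assert(to + len <= from || from + len <= to);
    memmove(buf->data + to, buf->data + from, (size_t) len);
    memset(buf->data + from, 0, (size_t) len);
}
```

If `toy_move_tuple` runs between `toy_select_fetch` and `toy_select_release`,
the bytes behind `t_data` are no longer the tuple the reader collected.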

I.e. SELECT relies on the fact that, while a page is pinned, tuples on the
page stay at the same positions in memory.

That is why LockBufferForCleanup and ConditionalLockBufferForCleanup check
that there is only a single PIN on the page: only the backend which will
perform the cleanup is allowed to hold a PIN on the page.
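
A rough sketch of that check, as a toy model in C. `ToyPinnedBuffer` and
`toy_conditional_lock_for_cleanup` are hypothetical names for illustration
only; the real functions work on shared buffer headers and are more involved.
The sketch assumes the caller already holds its own pin, so "single PIN"
means a pin count of exactly one.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyPinnedBuffer
{
    int  pin_count;         /* pins held by all backends, caller included */
    bool exclusive_locked;  /* content lock held in EXCLUSIVE mode */
} ToyPinnedBuffer;

/*
 * Try to get a cleanup lock without waiting.
 * Succeeds only if nobody else holds the lock and the caller's pin is the
 * only pin, i.e. no other backend can hold pointers into the page.
 */
static bool
toy_conditional_lock_for_cleanup(ToyPinnedBuffer *buf)
{
    assert(buf->pin_count >= 1);    /* caller must have pinned the page */

    if (buf->exclusive_locked)
        return false;               /* somebody else holds the lock */
    if (buf->pin_count != 1)
        return false;               /* another backend has the page pinned */

    buf->exclusive_locked = true;   /* safe to move tuples now */
    return true;
}
```

With two pins the call returns false and the page is left untouched; with
only the caller's pin it returns true and the caller may move tuples.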

UPDATE/INSERT/DELETE take the CONTENT LOCK in EXCLUSIVE mode because they may
add new tuples. But they are not allowed to move tuples, because concurrent
backends are allowed to read tuples from the page at exactly the same moment.

-- 
regards
Yura Sokolov aka funny-falcon