Re: POC: make mxidoff 64 bits

Maxim Orlov <orlovmg@gmail.com>

From: Maxim Orlov <orlovmg@gmail.com>
To: Heikki Linnakangas <hlinnaka@iki.fi>
Cc: wenhui qiu <qiuwenhuifx@gmail.com>, Alexander Korotkov <aekorotkov@gmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Postgres hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-10-30T16:17:00Z
Lists: pgsql-hackers

Commits

  1. Fix partial read handling in pg_upgrade's multixact conversion

  2. Increase timeout in multixid_conversion upgrade test

  3. Improve sanity checks on multixid members length

  4. Clarify comment on multixid offset wraparound check

  5. Never store 0 as the nextMXact

  6. Add runtime checks for bogus multixact offsets

  7. Widen MultiXactOffset to 64 bits

  8. Move pg_multixact SLRU page format definitions to a separate header

  9. Convert confusing macros in multixact.c to static inline functions

  10. Index SLRUs by 64-bit integers rather than by 32-bit integers

  11. Cope with possible failure of the oldest MultiXact to exist.


On Thu, 30 Oct 2025 at 12:10, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

>
> Oh I see, the 'base' is not necessarily the base offset of the first
> multixact on the page, it's the base offset of the first multixid that
> is written to the page. And the (short) offsets can be negative. That's
> a frighteningly clever encoding scheme. One upshot of that is that WAL
> redo might get construct the page with a different 'base'. I guess that
> works, but it scares me. Could we come up with a more deterministic scheme?
>

Definitely! The most stable approach is the one we had before, which
used actual 64-bit offsets in the SLRU. To be honest, I'm completely
happy with it. After all, what matters most to me is having 64-bit
xids in Postgres, and this patch is a step towards that goal.

PFA v20, which returns to using actual 64-bit offsets in the on-disk
SLRU segments.

Fortunately, now that I've separated reading and writing offsets into
different functions, switching from one implementation to another is
easy to do.

Here's a quick overview of the current state of the patch:
1) Access to offsets is encapsulated in separate functions:
   MXOffsetWrite/MXOffsetRead.
2) I abandoned byte juggling in pg_upgrade and switched to logic that
   mirrors how multixact.c handles offsets.
3) As a result, the upgrade issue comes down to implementing
   MXOffsetWrite/MXOffsetRead correctly.
4) The only remaining question is the on-disk representation of
   64-bit offsets in SLRU segments.
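For illustration, here is a minimal sketch of what a pair of accessors over plain 64-bit on-disk entries could look like. This is not the code from v20: the signatures, the OFFSET_PAGE_BYTES/OFFSETS_PER_PAGE names and the flat page layout are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

/* Assumed page size; PostgreSQL's default BLCKSZ is 8192. */
#define OFFSET_PAGE_BYTES 8192
/* With 64-bit entries, one page holds 8192 / 8 = 1024 offsets. */
#define OFFSETS_PER_PAGE (OFFSET_PAGE_BYTES / (int) sizeof(uint64_t))

/* Hypothetical reader: fetch the 64-bit offset stored in slot 'entryno'. */
static inline uint64_t
MXOffsetRead(const char *page, int entryno)
{
    uint64_t off;

    /* memcpy avoids any alignment assumption about the page buffer. */
    memcpy(&off, page + (size_t) entryno * sizeof(uint64_t), sizeof(off));
    return off;
}

/* Hypothetical writer: store 'off' in slot 'entryno'; no other slot is touched. */
static inline void
MXOffsetWrite(char *page, int entryno, uint64_t off)
{
    memcpy(page + (size_t) entryno * sizeof(uint64_t), &off, sizeof(off));
}
```

Because each slot is self-contained, the page contents do not depend on the order in which slots are written, and a damaged slot affects only that one offset.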

My thoughts on point (4).

Using 32-bit offsets + some kind of packing:
Pros:
 + Reduces the total disk space used by the segments; ideally it stays
   almost the same as before.
Cons:
 - Reduces reliability (losing part of a page will most likely result in
   losing the entire page).
 - Complicates the code, especially considering that offsets may be
   written to a page in arbitrary order.
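To make the packing trade-off concrete, here is a hypothetical sketch roughly modeled on the encoding discussed upthread: the page stores a 64-bit base taken from the first entry written, and each slot holds a signed 32-bit delta from it. The struct, the base_set flag, the entry count and the function names are invented for this example and are not the layout any version of the patch uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative entry count that keeps the struct within an 8 kB page. */
#define PACKED_ENTRIES 2044

typedef struct PackedOffsetPage
{
    uint64_t base;                  /* offset of the FIRST entry written, not of slot 0 */
    bool     base_set;              /* hypothetical "base initialised" flag */
    int32_t  delta[PACKED_ENTRIES]; /* offset = base + delta; may be negative */
} PackedOffsetPage;

/*
 * Store 'off' in slot 'entryno' (caller guarantees 0 <= entryno < PACKED_ENTRIES).
 * The first write fixes the page's base, so the base depends on write order.
 * Returns false if the delta does not fit in 32 bits.
 */
static bool
PackedOffsetWrite(PackedOffsetPage *page, int entryno, uint64_t off)
{
    int64_t delta;

    if (!page->base_set)
    {
        page->base = off;
        page->base_set = true;
    }
    delta = (int64_t) (off - page->base);
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    page->delta[entryno] = (int32_t) delta;
    return true;
}

/* Every read depends on 'base': if that field is lost, the whole page is unreadable. */
static uint64_t
PackedOffsetRead(const PackedOffsetPage *page, int entryno)
{
    /* Unsigned addition wraps modulo 2^64, so negative deltas subtract correctly. */
    return page->base + (uint64_t) (int64_t) page->delta[entryno];
}
```

Writing slot 10 before slot 3 makes slot 10's offset the base and gives slot 3 a negative delta; replaying the same writes in a different order would produce a different, equally valid, page image.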

Using 64-bit offsets in SLRU:
Pros:
 + Easy to implement/transparent logic.
Cons:
 - Increases the amount of disk space used.

In terms of speed, I'm not sure which will be faster. On the one hand,
64-bit offsets eliminate the need for extra calculations and branching.
On the other hand, the amount of data to read and write increases.

I am not opposed to either of these options, as our primary goal is
getting 64-bit offsets. However, I prefer the approach using full 64-bit
offsets in SLRU, because it is clearer and, shall we say, more robust.
Yes, it will increase the number of segments, but this is not table heap
data. Under typical circumstances, there should not be too many such
segments.
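Some back-of-the-envelope arithmetic on the space cost, assuming the default BLCKSZ of 8192 and 32 pages per SLRU segment (the constant and function names below are placeholders for this example, not the PostgreSQL macros):

```c
#include <stdint.h>

#define PAGE_BYTES        8192  /* assumed BLCKSZ (PostgreSQL default) */
#define PAGES_PER_SEGMENT 32    /* assumed SLRU_PAGES_PER_SEGMENT */

/* Number of offset entries one SLRU segment holds for a given entry width. */
static uint64_t
entries_per_segment(unsigned entry_bytes)
{
    return (uint64_t) (PAGE_BYTES / entry_bytes) * PAGES_PER_SEGMENT;
}

/* Segments needed to hold 'nmultixacts' offsets (rounded up). */
static uint64_t
segments_needed(uint64_t nmultixacts, unsigned entry_bytes)
{
    uint64_t per_seg = entries_per_segment(entry_bytes);

    return (nmultixacts + per_seg - 1) / per_seg;
}
```

With these assumptions a 256 kB segment holds 65536 32-bit offsets but only 32768 64-bit ones, so, for example, 100 million multixacts need 1526 versus 3052 segments in pg_multixact/offsets.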

-- 
Best regards,
Maxim Orlov.