Re: POC: make mxidoff 64 bits

Heikki Linnakangas <hlinnaka@iki.fi>

From: Heikki Linnakangas <hlinnaka@iki.fi>
To: Maxim Orlov <orlovmg@gmail.com>
Cc: wenhui qiu <qiuwenhuifx@gmail.com>, Alexander Korotkov <aekorotkov@gmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Postgres hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-11-05T15:37:54Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Fix partial read handling in pg_upgrade's multixact conversion

  2. Increase timeout in multixid_conversion upgrade test

  3. Improve sanity checks on multixid members length

  4. Clarify comment on multixid offset wraparound check

  5. Never store 0 as the nextMXact

  6. Add runtime checks for bogus multixact offsets

  7. Widen MultiXactOffset to 64 bits

  8. Move pg_multixact SLRU page format definitions to a separate header

  9. Convert confusing macros in multixact.c to static inline functions

  10. Index SLRUs by 64-bit integers rather than by 32-bit integers

  11. Cope with possible failure of the oldest MultiXact to exist.

Attachments

On 30/10/2025 18:17, Maxim Orlov wrote:
> PFA v20 returns to using actual 64-bit offsets for on-disk SLRU
> segments.
> 
> Fortunately, now that I've separated reading and writing offsets into
> different functions, switching from one implementation to another is
> easy to do.
> 
> Here's a quick overview of the current state of the patch:
> 1) Access to the offset is placed to separate calls:
>     MXOffsetWrite/MXOffsetRead.
> 2) I abandoned byte juggling in pg_upgrade and moved to using logic that
>     replicates the work with offsets im multixact.c
> 3) As a result, the update issue came down to the correct implementation
>     of functions MXOffsetWrite/MXOffsetRead.
> 4) The only question that remains is the question of disk representation
>     of 64-bit offsets in SLRU segments.

Here's another round of review and cleanup of this. I made a bunch of 
small changes, but haven't found any major problems. Looking pretty good.

Notable changes since v20:

- Changed MULTIXACT_MEMBER_AUTOVAC_THRESHOLD to 4000000000 instead of 
0xFFFFFFFF. The 2^32 mark doesn't have any particular meaning 
significance and using a round number makes that more clear. We should 
possibly expose that as a separate GUC, but that can be done in a 
followup patch.

- Removed the MXOffsetRead/Write functions again. They certainly make 
sense if we make them more complicated with some kind of a compression 
scheme, but I preferred to keep the code closer to 'master' for now.

- Removed more remnants of offset wraparound handling. There were still 
a few places that checked for wraparound and tried to deal with it, 
while other places just assumed that it doesn't happen. I added a check 
in GetNewMultiXactId() to error out if it would wrap around. It really 
should not happen in the real world, one reason being that we would run 
out of WAL before running out of 64-bit mxoffsets, but a sanity check 
won't hurt just in case someone e.g. abuses pg_resetwal to get into that 
situation.

- Removed MaybeExtendOffsetSlru(). It was only used to deal with binary 
upgrade from version 9.2 and below. Now that pg_upgrade rewrites the 
files, it's not needed anymore.

- Modified PerformMembersTruncation() to use SimpleLruTruncate()

Changes in pg_upgrade:

- Removed the nextMXact/nextOffset fields from MultiXactWriter. They 
were redundant with the next_multi and next_offset local variables in 
the caller.

Remaining issues:

- There's one more refactoring I'd like to do before merging this: Move 
the definitions that are now duplicated between 
src/bin/pg_upgrade/multixact_new.c and 
src/backend/access/transam/multixact.c into a new header file, 
multixact_internal.h. One complication with that is that it needs 
SLRU_PAGES_PER_SEGMENT from access/slru.h, but slru.h cannot currently 
be included in FRONTEND code. Perhaps we should move 
SLRU_PAGES_PER_SEGMENT to pg_config_manual.h, or if that feels too 
global, to a separate slru_config.h file.

- I saw Alexander's proposal for a new compression scheme but didn't 
incorporate that here. It might be a good idea, but I think we can do 
that as a followup patch before the release, if it seems worth it. I 
don't feel too bad about just making pg_multixact/offsets 2x larger either.

- Have you done any performance testing of the pg_upgrade code? How long 
does the conversion take if you have e.g. 1 billion multixids?

- Is the !oldestOffsetKnown case in the code still reachable? I left one 
FIXME comment about that. Needs a comment update at least.

- The new pg_upgrade test fails on my system with this error in the log:

> # Running: pg_dump --no-sync --restrict-key test -d port=22462 host=/tmp/5KdMvth1jk dbname='postgres' -f /home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_CINS/newnode_1_dump.sql
> pg_dump: error: aborting because of server version mismatch
> pg_dump: detail: server version: 19devel; pg_dump version: 17.6
> could not read "/home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_CINS/newnode_1_dump.sql": No such file or directory at /home/heikki/git-sandbox/postgresql/src/bin/pg_upgrade/t/007_mxoff.pl line 242.

This turns out to be an issue with IPC::Run. Setting the 
IPCRUNDEBUG=basic env variable reveals that it has a built-in command cache:

> IPC::Run 0004 [#19(109223)]: ****** harnessing *****
> IPC::Run 0004 [#19(109223)]: parsing [ pg_dump --no-sync --restrict-key test -d 'port=20999 host=/tmp/NsJKldN1Ie dbname='postgres'' -f '/home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_urgw/newnode_1_dump.sql' ]
> IPC::Run 0004 [#19(109223)]: ** starting
> IPC::Run 0004 [#19(109223)]: 'pg_dump' found in cache: '/home/heikki/pgsql.17stable/bin/pg_dump'
> IPC::Run 0004 [#19(111432) pg_dump]: execing /home/heikki/pgsql.17stable/bin/pg_dump --no-sync --restrict-key test -d 'port=20999 host=/tmp/NsJKldN1Ie dbname='postgres'' -f /home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_urgw/newnode_1_dump.sql
> IPC::Run 0004 [#19(109223)]: ** finishing
> pg_dump: error: aborting because of server version mismatch
> pg_dump: detail: server version: 19devel; pg_dump version: 17.6

The test calls pg_dump twice: first with the old version, then with the 
new version. But thanks to IPC::Run's command cache, the invocation of 
the new pg_dump version actually also calls the old version. I'm not 
sure how to fix that, but I was able to work around it by reversing the 
pg_dump calls so that thew new version is called first. That way we use 
the new pg_dump against both server versions which works.

- The new pg_ugprade test is very slow. I would love to include that 
test permanently in the test suite, but it's too slow for that currently.

- Heikki