Re: POC: make mxidoff 64 bits
Heikki Linnakangas <hlinnaka@iki.fi>
From: Heikki Linnakangas <hlinnaka@iki.fi>
To: Maxim Orlov <orlovmg@gmail.com>
Cc: wenhui qiu <qiuwenhuifx@gmail.com>,
Alexander Korotkov <aekorotkov@gmail.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>,
Postgres hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-11-05T15:37:54Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Fix partial read handling in pg_upgrade's multixact conversion
- ac94ce8194e5 19 (unreleased) landed
-
Increase timeout in multixid_conversion upgrade test
- bd43940b02b2 19 (unreleased) landed
-
Improve sanity checks on multixid members length
- ecb553ae8211 19 (unreleased) landed
-
Clarify comment on multixid offset wraparound check
- 170361d7b869 14.21 landed
- b0b52b7123ae 15.16 landed
- 7d42e2367c6b 16.12 landed
- cd1a887fe9bf 17.8 landed
- 3fbad030a24d 18.2 landed
- 366dcdaf5779 19 (unreleased) landed
-
Never store 0 as the nextMXact
- 87a350e1f284 19 (unreleased) landed
-
Add runtime checks for bogus multixact offsets
- d4b7bde4183b 19 (unreleased) landed
-
Widen MultiXactOffset to 64 bits
- bd8d9c9bdfa0 19 (unreleased) landed
-
Move pg_multixact SLRU page format definitions to a separate header
- bb3b1c4f6462 19 (unreleased) landed
-
Convert confusing macros in multixact.c to static inline functions
- 0099b9408e8c 17.0 landed
-
Index SLRUs by 64-bit integers rather than by 32-bit integers
- 4ed8f0913bfd 17.0 cited
-
Cope with possible failure of the oldest MultiXact to exist.
- b6a3444fa635 9.4.4 cited
Attachments
- v22-0001-Use-64-bit-multixact-offsets.patch (text/x-patch) patch v22-0001
- v22-0002-Add-pg_upgarde-for-64-bit-multixact-offsets.patch (text/x-patch) patch v22-0002
- v22-0003-Add-test-for-64-bit-mxoff-in-pg_resetwal.patch (text/x-patch) patch v22-0003
- v22-0004-TEST-bump-catversion.patch (text/x-patch) patch v22-0004
- v22-0005-TEST-Add-test-for-64-bit-mxoff-in-pg_upgrade.patch (text/x-patch) patch v22-0005
On 30/10/2025 18:17, Maxim Orlov wrote: > PFA v20 returns to using actual 64-bit offsets for on-disk SLRU > segments. > > Fortunately, now that I've separated reading and writing offsets into > different functions, switching from one implementation to another is > easy to do. > > Here's a quick overview of the current state of the patch: > 1) Access to the offset is placed to separate calls: > MXOffsetWrite/MXOffsetRead. > 2) I abandoned byte juggling in pg_upgrade and moved to using logic that > replicates the work with offsets im multixact.c > 3) As a result, the update issue came down to the correct implementation > of functions MXOffsetWrite/MXOffsetRead. > 4) The only question that remains is the question of disk representation > of 64-bit offsets in SLRU segments. Here's another round of review and cleanup of this. I made a bunch of small changes, but haven't found any major problems. Looking pretty good. Notable changes since v20: - Changed MULTIXACT_MEMBER_AUTOVAC_THRESHOLD to 4000000000 instead of 0xFFFFFFFF. The 2^32 mark doesn't have any particular meaning significance and using a round number makes that more clear. We should possibly expose that as a separate GUC, but that can be done in a followup patch. - Removed the MXOffsetRead/Write functions again. They certainly make sense if we make them more complicated with some kind of a compression scheme, but I preferred to keep the code closer to 'master' for now. - Removed more remnants of offset wraparound handling. There were still a few places that checked for wraparound and tried to deal with it, while other places just assumed that it doesn't happen. I added a check in GetNewMultiXactId() to error out if it would wrap around. It really should not happen in the real world, one reason being that we would run out of WAL before running out of 64-bit mxoffsets, but a sanity check won't hurt just in case someone e.g. abuses pg_resetwal to get into that situation. - Removed MaybeExtendOffsetSlru(). It was only used to deal with binary upgrade from version 9.2 and below. Now that pg_upgrade rewrites the files, it's not needed anymore. - Modified PerformMembersTruncation() to use SimpleLruTruncate() Changes in pg_upgrade: - Removed the nextMXact/nextOffset fields from MultiXactWriter. They were redundant with the next_multi and next_offset local variables in the caller. Remaining issues: - There's one more refactoring I'd like to do before merging this: Move the definitions that are now duplicated between src/bin/pg_upgrade/multixact_new.c and src/backend/access/transam/multixact.c into a new header file, multixact_internal.h. One complication with that is that it needs SLRU_PAGES_PER_SEGMENT from access/slru.h, but slru.h cannot currently be included in FRONTEND code. Perhaps we should move SLRU_PAGES_PER_SEGMENT to pg_config_manual.h, or if that feels too global, to a separate slru_config.h file. - I saw Alexander's proposal for a new compression scheme but didn't incorporate that here. It might be a good idea, but I think we can do that as a followup patch before the release, if it seems worth it. I don't feel too bad about just making pg_multixact/offsets 2x larger either. - Have you done any performance testing of the pg_upgrade code? How long does the conversion take if you have e.g. 1 billion multixids? - Is the !oldestOffsetKnown case in the code still reachable? I left one FIXME comment about that. Needs a comment update at least. - The new pg_upgrade test fails on my system with this error in the log: > # Running: pg_dump --no-sync --restrict-key test -d port=22462 host=/tmp/5KdMvth1jk dbname='postgres' -f /home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_CINS/newnode_1_dump.sql > pg_dump: error: aborting because of server version mismatch > pg_dump: detail: server version: 19devel; pg_dump version: 17.6 > could not read "/home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_CINS/newnode_1_dump.sql": No such file or directory at /home/heikki/git-sandbox/postgresql/src/bin/pg_upgrade/t/007_mxoff.pl line 242. This turns out to be an issue with IPC::Run. Setting the IPCRUNDEBUG=basic env variable reveals that it has a built-in command cache: > IPC::Run 0004 [#19(109223)]: ****** harnessing ***** > IPC::Run 0004 [#19(109223)]: parsing [ pg_dump --no-sync --restrict-key test -d 'port=20999 host=/tmp/NsJKldN1Ie dbname='postgres'' -f '/home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_urgw/newnode_1_dump.sql' ] > IPC::Run 0004 [#19(109223)]: ** starting > IPC::Run 0004 [#19(109223)]: 'pg_dump' found in cache: '/home/heikki/pgsql.17stable/bin/pg_dump' > IPC::Run 0004 [#19(111432) pg_dump]: execing /home/heikki/pgsql.17stable/bin/pg_dump --no-sync --restrict-key test -d 'port=20999 host=/tmp/NsJKldN1Ie dbname='postgres'' -f /home/heikki/git-sandbox/postgresql/build/testrun/pg_upgrade/007_mxoff/data/tmp_test_urgw/newnode_1_dump.sql > IPC::Run 0004 [#19(109223)]: ** finishing > pg_dump: error: aborting because of server version mismatch > pg_dump: detail: server version: 19devel; pg_dump version: 17.6 The test calls pg_dump twice: first with the old version, then with the new version. But thanks to IPC::Run's command cache, the invocation of the new pg_dump version actually also calls the old version. I'm not sure how to fix that, but I was able to work around it by reversing the pg_dump calls so that thew new version is called first. That way we use the new pg_dump against both server versions which works. - The new pg_ugprade test is very slow. I would love to include that test permanently in the test suite, but it's too slow for that currently. - Heikki