Re: POC: make mxidoff 64 bits

Heikki Linnakangas <hlinnaka@iki.fi>

From: Heikki Linnakangas <hlinnaka@iki.fi>

To: Maxim Orlov <orlovmg@gmail.com>, wenhui qiu <qiuwenhuifx@gmail.com>

Cc: Alexander Korotkov <aekorotkov@gmail.com>, Postgres hackers <pgsql-hackers@lists.postgresql.org>

Date: 2025-11-13T16:04:48Z

Lists: pgsql-hackers

Commits

Fix partial read handling in pg_upgrade's multixact conversion
- ac94ce8194e5 19 (unreleased) landed
Increase timeout in multixid_conversion upgrade test
- bd43940b02b2 19 (unreleased) landed
Improve sanity checks on multixid members length
- ecb553ae8211 19 (unreleased) landed
Clarify comment on multixid offset wraparound check
- 170361d7b869 14.21 landed
- b0b52b7123ae 15.16 landed
- 7d42e2367c6b 16.12 landed
- cd1a887fe9bf 17.8 landed
- 3fbad030a24d 18.2 landed
- 366dcdaf5779 19 (unreleased) landed
Never store 0 as the nextMXact
- 87a350e1f284 19 (unreleased) landed
Add runtime checks for bogus multixact offsets
- d4b7bde4183b 19 (unreleased) landed
Widen MultiXactOffset to 64 bits
- bd8d9c9bdfa0 19 (unreleased) landed
Move pg_multixact SLRU page format definitions to a separate header
- bb3b1c4f6462 19 (unreleased) landed
Convert confusing macros in multixact.c to static inline functions
- 0099b9408e8c 17.0 landed
Index SLRUs by 64-bit integers rather than by 32-bit integers
- 4ed8f0913bfd 17.0 cited
Cope with possible failure of the oldest MultiXact to exist.
- b6a3444fa635 9.4.4 cited

Attachments

v25-0001-Move-pg_multixact-SLRU-page-format-definitions-t.patch (text/x-patch) patch v25-0001
v25-0002-Use-64-bit-multixact-offsets.patch (text/x-patch) patch v25-0002
v25-0003-Add-pg_upgrade-for-64-bit-multixact-offsets.patch (text/x-patch) patch v25-0003
v25-0004-Remove-oldestOffset-oldestOffsetKnown-from-multi.patch (text/x-patch) patch v25-0004
v25-0005-Reintroduce-MultiXactMemberFreezeThreshold.patch (text/x-patch) patch v25-0005
v25-0006-TEST-bump-catversion.patch (text/x-patch) patch v25-0006
v25-0007-TEST-Add-test-for-64-bit-mxoff-in-pg_resetwal.patch (text/x-patch) patch v25-0007
v25-0008-TEST-Add-test-for-wraparound-of-next-new-multi-i.patch (text/x-patch) patch v25-0008
v25-0009-TEST-Add-test-for-64-bit-mxoff-in-pg_upgrade.patch (text/x-patch) patch v25-0009
v25-0010-TEST-add-consume_multixids-function.patch (text/x-patch) patch v25-0010

I realized that this issue was still outstanding:

On 01/04/2025 21:25, Heikki Linnakangas wrote:
> Thanks! I did some manual testing of this. I created a little helper 
> function to consume multixids, to test the autovacuum behavior, and 
> found one issue:
> 
> If you consume a lot of multixid members space, by creating lots of 
> multixids with huge number of members in each, you can end up with a 
> very bloated members SLRU, and autovacuum is in no hurry to clean it up. 
> Here's what I did:
> 
> 1. Installed attached test module
> 2. Ran "select consume_multixids(10000, 100000);" many times
> 3. ran:
> 
> $ du -h data/pg_multixact/members/
> 26G    data/pg_multixact/members/
> 
> When I run "vacuum freeze; select * from pg_database;", I can see that 
> 'datminmxid' for the current database is advanced. However, autovacuum 
> is in no hurry to vacuum 'template0' and 'template1', so pg_multixact/ 
> members/ does not get truncated. Eventually, when 
> autovacuum_multixact_freeze_max_age is reached, it presumably will, but 
> you will run out of disk space before that.
> 
> There is this check for members size at the end of SetOffsetVacuumLimit():
> 
>>
>>     /*
>>      * Do we need autovacuum?    If we're not sure, assume yes.
>>      */
>>     return !oldestOffsetKnown ||
>>         (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
> 
> And the caller (SetMultiXactIdLimit()) will in fact signal the 
> autovacuum launcher after "vacuum freeze" because of that. But 
> autovacuum launcher will look at the datminmxid / relminmxid values, see 
> that they are well within autovacuum_multixact_freeze_max_age, and do 
> nothing.
> 
> This is a very extreme case, but clearly the code to signal autovacuum 
> launcher, and the freeze age cutoff that autovacuum then uses, are not 
> in sync.
> 
> This patch removed MultiXactMemberFreezeThreshold(), per my suggestion, 
> but we threw this baby with the bathwater. We discussed that in this 
> thread, but didn't come up with any solution. But ISTM we still need 
> something like MultiXactMemberFreezeThreshold() to trigger autovacuum 
> freezing if the members have grown too large.

Here's a new patch version that addresses the above issue. I resurrected 
MultiXactMemberFreezeThreshold(), using the same logic as before, just 
using pretty arbitrary thresholds of 1 and 2 billion offsets instead of 
the safe/danger thresholds derived from MaxMultiOffset. That gives 
roughly the same behavior wrt. calculating effective freeze age as before.
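
For anyone reviewing the thresholds, here is a minimal standalone sketch of
the kind of calculation described above. It is not the code from the patch:
the function name, the constant names and the signature are hypothetical,
and in the server the member and multixact counts would come from shared
state rather than from arguments. Only the 1 and 2 billion values come from
this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values are the 1 and 2 billion mentioned above. */
#define MEMBER_SAFE_THRESHOLD   UINT64_C(1000000000)
#define MEMBER_DANGER_THRESHOLD UINT64_C(2000000000)

/*
 * Return an effective multixact freeze age.
 *
 * members        - number of member offsets currently in use
 * multixacts     - number of multixids currently in use
 * freeze_max_age - configured autovacuum_multixact_freeze_max_age
 */
int
member_freeze_threshold_sketch(uint64_t members, uint32_t multixacts,
                               int freeze_max_age)
{
    double      fraction;
    uint32_t    victims;
    uint32_t    remaining;

    assert(freeze_max_age >= 0);

    /* Low member usage: behave exactly as the GUC says. */
    if (members <= MEMBER_SAFE_THRESHOLD)
        return freeze_max_age;

    /* How far past the safe threshold are we, relative to the danger one? */
    fraction = (double) (members - MEMBER_SAFE_THRESHOLD) /
        (double) (MEMBER_DANGER_THRESHOLD - MEMBER_SAFE_THRESHOLD);

    /* At or past the danger threshold: freeze as aggressively as possible. */
    if (fraction >= 1.0)
        return 0;

    /* Try to get rid of a proportional share of the existing multixids. */
    victims = (uint32_t) (multixacts * fraction);
    remaining = multixacts - victims;

    /* Never make autovacuum less aggressive than the GUC already does. */
    return (remaining < (uint32_t) freeze_max_age) ? (int) remaining
                                                   : freeze_max_age;
}
```

With 1.5 billion members in use and 1 million multixids, this sketch returns
an effective freeze age of 500000, so autovacuum would pick up tables long
before autovacuum_multixact_freeze_max_age is reached.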

Another change is that I removed the offset-based emergency vacuum 
triggering. With 64-bit offsets, we never need to shut down the system 
to prevent offset wraparound, so even if the members SLRU grows large, 
it's not an "emergency" the same way that wraparound is. Consuming lots 
of disk space could be a problem, of course, but we can let autovacuum 
deal with that at the normal pace, like it deals with bloated tables.
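
To put the disk space concern in numbers, here is a rough sketch that
estimates the size of pg_multixact/members from the range of offsets in use.
The function and constant names are hypothetical. It assumes the default
8192-byte block size and the existing member page layout of 1636 members
per page (409 groups of four members, 20 bytes per group), and it ignores
page alignment of the oldest offset, so treat the result as an
approximation only.

```c
#include <stdint.h>

#define SKETCH_BLCKSZ           8192    /* assumed default block size */
#define SKETCH_MEMBERS_PER_PAGE 1636    /* assumed: 409 groups of 4 members */

/*
 * Approximate on-disk size, in bytes, of the members SLRU holding the
 * offsets in [oldest_offset, next_offset). With 64-bit offsets there is
 * no wraparound to account for in the subtraction.
 */
uint64_t
members_slru_bytes_sketch(uint64_t oldest_offset, uint64_t next_offset)
{
    uint64_t    members = next_offset - oldest_offset;
    uint64_t    pages;

    /* Round up to whole pages. */
    pages = (members + SKETCH_MEMBERS_PER_PAGE - 1) / SKETCH_MEMBERS_PER_PAGE;

    return pages * SKETCH_BLCKSZ;
}
```

Under those assumptions, one billion members come to about 5 GB, that is
roughly 5 bytes per member.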

The heuristics could surely be made better and/or more configurable, but 
I think this is good enough for now.

I included these changes as a separate patch for review purposes, but it 
ought to be squashed with the main patch before committing.

- Heikki