Re: POC: make mxidoff 64 bits

From: Heikki Linnakangas <hlinnaka@iki.fi>
To: Maxim Orlov <orlovmg@gmail.com>, wenhui qiu <qiuwenhuifx@gmail.com>
Cc: Alexander Korotkov <aekorotkov@gmail.com>, Postgres hackers <pgsql-hackers@lists.postgresql.org>
Date: 2025-11-13T16:04:48Z
Lists: pgsql-hackers

Commits

  1. Fix partial read handling in pg_upgrade's multixact conversion

  2. Increase timeout in multixid_conversion upgrade test

  3. Improve sanity checks on multixid members length

  4. Clarify comment on multixid offset wraparound check

  5. Never store 0 as the nextMXact

  6. Add runtime checks for bogus multixact offsets

  7. Widen MultiXactOffset to 64 bits

  8. Move pg_multixact SLRU page format definitions to a separate header

  9. Convert confusing macros in multixact.c to static inline functions

  10. Index SLRUs by 64-bit integers rather than by 32-bit integers

  11. Cope with possible failure of the oldest MultiXact to exist.

I realized that this issue was still outstanding:

On 01/04/2025 21:25, Heikki Linnakangas wrote:
> Thanks! I did some manual testing of this. I created a little helper 
> function to consume multixids, to test the autovacuum behavior, and 
> found one issue:
> 
> If you consume a lot of multixid members space, by creating lots of 
> multixids with a huge number of members in each, you can end up with a 
> very bloated members SLRU, and autovacuum is in no hurry to clean it up. 
> Here's what I did:
> 
> 1. Installed attached test module
> 2. Ran "select consume_multixids(10000, 100000);" many times
> 3. Ran:
> 
> $ du -h data/pg_multixact/members/
> 26G    data/pg_multixact/members/
> 
> When I run "vacuum freeze; select * from pg_database;", I can see that 
> 'datminmxid' for the current database is advanced. However, autovacuum 
> is in no hurry to vacuum 'template0' and 'template1', so pg_multixact/ 
> members/ does not get truncated. Eventually, when 
> autovacuum_multixact_freeze_max_age is reached, it presumably will, but 
> you will run out of disk space before that.
> 
> There is this check for members size at the end of SetOffsetVacuumLimit():
> 
>>
>>     /*
>>      * Do we need autovacuum?    If we're not sure, assume yes.
>>      */
>>     return !oldestOffsetKnown ||
>>         (nextOffset - oldestOffset > MULTIXACT_MEMBER_AUTOVAC_THRESHOLD);
> 
> And the caller (SetMultiXactIdLimit()) will in fact signal the 
> autovacuum launcher after "vacuum freeze" because of that. But 
> autovacuum launcher will look at the datminmxid / relminmxid values, see 
> that they are well within autovacuum_multixact_freeze_max_age, and do 
> nothing.
> 
> This is a very extreme case, but clearly the code to signal autovacuum 
> launcher, and the freeze age cutoff that autovacuum then uses, are not 
> in sync.
> 
> This patch removed MultiXactMemberFreezeThreshold(), per my suggestion, 
> but we threw the baby out with the bathwater. We discussed that in this 
> thread, but didn't come up with any solution. But ISTM we still need 
> something like MultiXactMemberFreezeThreshold() to trigger autovacuum 
> freezing if the members have grown too large.
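
To make the mismatch in the quoted report concrete, here is a minimal sketch of the two decisions side by side. It is not the PostgreSQL code: the function names, the threshold value, and the simplified age comparison are all assumptions chosen for illustration. Only the shape of the first condition is taken from the SetOffsetVacuumLimit() excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MULTIXACT_MEMBER_AUTOVAC_THRESHOLD; the value is illustrative only. */
#define MEMBER_AUTOVAC_THRESHOLD UINT64_C(2000000000)

/*
 * Mirrors the quoted check: signal the launcher if the oldest offset is
 * unknown, or if the members space in use exceeds the threshold.
 */
bool
needs_autovacuum_signal(bool oldest_offset_known,
                        uint64_t next_offset, uint64_t oldest_offset)
{
    assert(!oldest_offset_known || next_offset >= oldest_offset);
    return !oldest_offset_known ||
        (next_offset - oldest_offset > MEMBER_AUTOVAC_THRESHOLD);
}

/*
 * Simplified launcher-side decision (an assumption, not the real logic):
 * a database is force-vacuumed only when the age of its datminmxid,
 * counted in multixids, exceeds the freeze max age.  Member usage plays
 * no part in it.
 */
bool
launcher_would_vacuum(uint32_t next_multi, uint32_t datminmxid,
                      uint32_t freeze_max_age)
{
    return (uint32_t) (next_multi - datminmxid) > freeze_max_age;
}
```

With 3 billion member offsets in use but only 100,000 multixids consumed, needs_autovacuum_signal() returns true while launcher_would_vacuum() returns false for a freeze max age of 400 million: the launcher is woken up and then finds nothing to do.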

Here's a new patch version that addresses the above issue. I resurrected 
MultiXactMemberFreezeThreshold(), using the same logic as before, just 
using pretty arbitrary thresholds of 1 and 2 billion offsets instead of 
the safe/danger thresholds derived from MaxMultiOffset. That gives 
roughly the same behavior wrt. calculating effective freeze age as before.
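
For readers who do not have the old function in front of them, the calculation can be sketched roughly as follows. This is a standalone approximation, not the code in the patch: the function name, macro names, and parameter list are hypothetical, and the linear interpolation is an assumption based on how the pre-patch MultiXactMemberFreezeThreshold() worked. Only the 1 and 2 billion thresholds come from this message.

```c
#include <assert.h>
#include <stdint.h>

/* Thresholds named in the message; the macro names are hypothetical. */
#define MEMBER_LOW_THRESHOLD  UINT64_C(1000000000)
#define MEMBER_HIGH_THRESHOLD UINT64_C(2000000000)

/*
 * Compute an effective multixact freeze age.
 *
 * members:        member offsets currently in use
 * multixacts:     multixids currently in use
 * freeze_max_age: the configured autovacuum_multixact_freeze_max_age
 */
int
member_freeze_threshold(uint64_t members, uint32_t multixacts,
                        int freeze_max_age)
{
    double   fraction;
    uint32_t victim_multixacts;
    uint32_t result;

    assert(freeze_max_age >= 0);

    /* Low member usage: no special action, use the configured age. */
    if (members <= MEMBER_LOW_THRESHOLD)
        return freeze_max_age;

    /* How far past the low threshold are we, relative to the high one? */
    fraction = (double) (members - MEMBER_LOW_THRESHOLD) /
        (double) (MEMBER_HIGH_THRESHOLD - MEMBER_LOW_THRESHOLD);

    /* At or past the high threshold: the lowest possible freeze age is zero. */
    if (fraction >= 1.0)
        return 0;

    /* Try to eliminate a proportional share of the existing multixids. */
    victim_multixacts = (uint32_t) (multixacts * fraction);
    result = multixacts - victim_multixacts;

    /* Never make autovacuum less aggressive than the configured setting. */
    return result < (uint32_t) freeze_max_age ? (int) result : freeze_max_age;
}
```

For example, with 1.5 billion members in use, 1,000,000 multixids, and a configured age of 400 million, the sketch returns 500,000, so tables whose relminmxid is older than that become candidates for freezing.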

Another change is that I removed the offset-based emergency vacuum 
triggering. With 64-bit offsets, we never need to shut down the system 
to prevent offset wraparound, so even if the offsets SLRU grows large, 
it's not an "emergency" the same way that wraparound is. Consuming lots 
of disk space could be a problem, of course, but we can let autovacuum 
deal with that at the normal pace, like it deals with bloated tables.
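
As a rough guide to how much disk space is at stake, the following sketch estimates the size of the members SLRU for a given number of member offsets. It assumes the default 8 kB block size and a page layout in which members are stored in groups of four, each group holding 4 flag bytes followed by four 4-byte XIDs. The macro and function names are hypothetical, and the layout should be checked against the pg_multixact page format definitions before relying on the numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 8 kB pages, groups of 4 flag bytes + 4 four-byte XIDs. */
#define PAGE_SIZE_BYTES    8192
#define MEMBERS_PER_GROUP  4
#define GROUP_SIZE_BYTES   (4 + MEMBERS_PER_GROUP * 4)                 /* 20 */
#define GROUPS_PER_PAGE    (PAGE_SIZE_BYTES / GROUP_SIZE_BYTES)        /* 409 */
#define MEMBERS_PER_PAGE   (GROUPS_PER_PAGE * MEMBERS_PER_GROUP)       /* 1636 */

/* Bytes of whole pages needed to store nmembers member entries. */
uint64_t
members_disk_bytes(uint64_t nmembers)
{
    uint64_t pages;

    /* Guard the round-up addition against overflow. */
    assert(nmembers <= UINT64_MAX - MEMBERS_PER_PAGE);

    pages = (nmembers + MEMBERS_PER_PAGE - 1) / MEMBERS_PER_PAGE;
    return pages * PAGE_SIZE_BYTES;
}
```

Under these assumptions, 1 billion members occupy about 5 GB and 2 billion about 10 GB, which gives a sense of scale for the two thresholds.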

The heuristics could surely be made better and/or more configurable, but 
I think this is good enough for now.

I included these changes as a separate patch for review purposes, but it 
ought to be squashed with the main patch before committing.

- Heikki