Re: POC: make mxidoff 64 bits

Heikki Linnakangas <hlinnaka@iki.fi>

From: Heikki Linnakangas <hlinnaka@iki.fi>
To: Maxim Orlov <orlovmg@gmail.com>
Cc: Alvaro Herrera <alvherre@alvh.no-ip.org>, Alexander Korotkov <aekorotkov@gmail.com>, wenhui qiu <qiuwenhuifx@gmail.com>, Postgres hackers <pgsql-hackers@lists.postgresql.org>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: 2025-12-04T10:39:43Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Fix partial read handling in pg_upgrade's multixact conversion

  2. Increase timeout in multixid_conversion upgrade test

  3. Improve sanity checks on multixid members length

  4. Clarify comment on multixid offset wraparound check

  5. Never store 0 as the nextMXact

  6. Add runtime checks for bogus multixact offsets

  7. Widen MultiXactOffset to 64 bits

  8. Move pg_multixact SLRU page format definitions to a separate header

  9. Convert confusing macros in multixact.c to static inline functions

  10. Index SLRUs by 64-bit integers rather than by 32-bit integers

  11. Cope with possible failure of the oldest MultiXact to exist.

On 26/11/2025 17:50, Heikki Linnakangas wrote:
> On 26/11/2025 17:23, Maxim Orlov wrote:
>> On Tue, 25 Nov 2025 at 13:07, Heikki Linnakangas <hlinnaka@iki.fi 
>> <mailto:hlinnaka@iki.fi>> wrote:
>>> GetOldMultiXactIdSingleMember() currently asserts that the offset is
>>> never zero, but it should try to do something sensible in that case
>>> instead of just failing.
>>
>> Correct me if I'm wrong, but we added the assertion that offsets are
>> never 0, based on the idea that case #2 will never take place during an
>> update. If this isn't the case, this assertion could be removed.
>> The rest of the function appears to work correctly.
>>
>> I even think that, as an experiment, we could randomly reset some of the
>> offsets to zero and nothing would happen, except that some data would
>> be lost.
> 
> +1
> 
>> The most sensible thing we can do is give the user a warning, right?
>> Something like, "During the update, we encountered some weird offset
>> that shouldn't have been there, but there's nothing we can do about it,
>> just take note."
> 
> Yep, makes sense.

I read through the SLRU reading codepath, looking for all the things 
that could go wrong (not sure I got them all):

1. An SLRU file does not exist
2. An SLRU file is too short, i.e. a page does not exist
3. The offset in 'offsets' page is 0
4. The offset in 'offsets' page looks invalid, i.e. it's greater than 
nextOffset or smaller than oldestOffset.
5. The offset is out of order compared to its neighbors
6. The multixid has no members
7. The multixid has an invalid (0) member
8. A multixid has more than one updating member

Some of those situations are theoretically are possible if there was a 
crash. We don't follow the WAL-before-data rule for these SLRUs. 
Instead, we piggyback on the WAL-before-data of the heap page that would 
reference the multixid. In other words, we rely on the fact that if a 
multixid write is missed or torn because of a crash, that multixid will 
not be referenced from anywhwere and will never be read.

However, that doesn't hold for pg_upgrade. pg_upgrade will try to read 
all the multixids. So we need to make the multixact reading code 
tolerant of the situations that could be present after a crash. I think 
the right philosophy here is that we try to read all the old multixids, 
and do our best to interpret them the same way that the old server 
would. For those situations that can legitimately be present if the old 
server crashed at some point, be silent. For cases that should not 
happen, even if there was a crash, print a warning. For example, I think 
an SLRU file should never be missing (1) or truncated (2). But the zero 
offset (3), and (6) can happen.

Perhaps we should check that all the files exist and have the correct 
sizes in the pre-check stage, and abort the upgrade early if anything is 
missing. That would be pretty cheap to check.

- Heikki