Thread

  1. Re: POC: make mxidoff 64 bits

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-12-09T12:00:05Z

    On 08/12/2025 17:43, Ashutosh Bapat wrote:
    > On Sat, Dec 6, 2025 at 5:06 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
    >> On 05/12/2025 15:42, Ashutosh Bapat wrote:
    >>> And it makes the test faster by about a second or two on my laptop.
    >>> Something on those lines or other is required to reduce the output
    >>> from query_safe().
    >>
    >> Nice! That log bloat was the reason I bundled together the "COMMIT;
    >> BEGIN; SELECT ...;" steps into one statement in the loop. Your solution
    >> addresses it more directly.
    > 
    > Now we can call query_safe() separately on each of those. That will be
    > more readable and marginally less code.
    
    Done.
    
    >> I'm not entirely happy with the "Old" prefix here, because as you
    >> pointed out, we might end up needing "older" or "oldold" in the future.
    >> I couldn't come up with anything better though. "PreV19MultiXactOffset"
    >> is quite a mouthful.
    > 
    > How about MultiXactOffset32?
    
    Ooh, I like that. It doesn't sound as nice for the other "old" prefixed 
    things though. So I changed OldMultiXactOffset to MultiXactOffset32, but 
    kept OldMultiXactReader, GetOldMultiXactIdSingleMember() et al. We can 
    live with that for now, and rename in the future if we'd need "oldold".
    
    Committed with that and some other minor cleanups. Thanks everyone! This 
    patch has been brewing for a while :-).
    
    
    There are some noncritical followups that I'd like to address, now that 
    we know that in v19 the pg_multixact files will be rewritten. That gives 
    us an opportunity to clean up some backwards-compatibility stuff. The 
    committed patch already cleaned up a bunch, but there's some more we 
    could do:
    
    1. Currently, at multixid wraparound, MultiXactState->nextMXact goes to 
    0, which is invalid. All the readers must be prepared for that, and skip 
    over the 0. That's error-prone, we've already missed that a few times. 
    Let's change things so that the code that *writes* 
    MultiXactState->nextMXact skips over the zero already.
    
    2. We currently don't persist 'oldestOffset' in the control file the 
    same way as 'oldestMultiXactId'. Instead, we look up the offset of the 
    oldestMultiXactId at startup, and keep that value in memory. Originally 
    that was because we missed the need for that and had to add the offset 
    wraparound protections in a minor release without changing the control 
    file format. But we could easily do it now.
    
    With 64-bit offsets, it's actually less critical to persist the 
    oldestOffset. Previously, if we failed to look up the oldest offset 
    because the oldest multixid was invalid, it could lead to serious 
    trouble if the offsets then wrapped around and old offsets were 
    overwritten, but that won't happen anymore. Nevertheless, it leads to 
    unnecessarily aggressive vacuuming and some messages in the log.
    
    At first I thought that the failure to look up the oldest offset should 
    no longer happen, because we don't need to support reading old 9.3 era 
    SLRUs anymore that were created before we added the offset wraparound 
    protection. But it's not so: it's still possible to have multixids with 
    invalid offsets in the 'offsets' SLRU on a crash. Such multixids won't 
    be referenced from anywhere in the heap, but I think they could later 
    become the oldest multixid, and we would fail to look up its offset. 
    Persisting the oldest offset doesn't fully fix that problem, because 
    advancing the oldest offset is done by looking up the oldest multixid's 
    offset anyway.
    
    3. I think we should turn some of the assertions in 
    GetMultiXactIdMembers() into ereports(ERROR) calls. I included those 
    changes in my patch version 29 [1], as a separate patch, but I didn't 
    commit that yet.
    
    4. Compressing the offsets, per discussion. It doesn't really seem worth 
    to me and I don't intend to work on it, but if someone wants to do it, 
    now would be the time, so that we don't need to have upgrade code to 
    deal with yet another format.
    
    - Heikki