Thread

  1. Re: Fix possible 'unexpected data beyond EOF' on replica restart

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-12-18T14:18:46Z

    On 17/12/2025 10:40, Anthonin Bonnefoy wrote:
    > On Wed, Dec 17, 2025 at 8:26 AM Amul Sul <sulamul@gmail.com 
    > <mailto:sulamul@gmail.com>> wrote:
    > 
    >> The deleted code you moved to mdtruncate() should be kept where it
    >> was. We cannot ensure that every extension using this interface will
    >> adhere to that requirement what the comment describes. 
    > 
    > Yeah, I've overlooked the case of extensions. I've moved back the 
    > InvalidBlockNumber assignment.
    
    The smgr interface isn't really an extension point, so we don't need to 
    worry about that. (I wish it was, but that's a different story [1])
    
    I don't think mdtruncate() should modify smgr_cached_nblocks, we should 
    keep that in smgrtruncate(). smgr.c is responsible for all other updates 
    of smgr_cached_nblocks, so doing it in mdtruncate() would be a layering 
    violation.
    
    I'm thinking that we should do the attached. Untested, and we should 
    also add a comment to smgrtruncate() and mdtruncate() to explain how 
    they behave if nblocks > curnblk.
    
    I wonder if we should move the whole "if (nblocks > curnblk)" check and 
    ereport() from mdtruncate() to smgrtruncate(). That logic doesn't really 
    depend on anything specific to md.c. If you'd imagine a different smgr 
    implementation, it'd need to just copy-paste that check. It's the 
    caller's mistake if it passes nblocks > curnblk, when not in recovery. 
    Then again, we do have other places in md.c too that behave differently 
    when InRecovery.
    
    What do you think?
    
    > I wonder how critical it is to have an up to date value of 
    > smgr_cached_nblocks after smgr_truncate. Leaving InvalidBlockNumber was 
    > also an option as the next user will ask the kernel for the real size. 
    > There are some functions like DropRelationBuffers which rely only on the 
    > cached value, so it's probably safer to keep the same behaviour.
    
    Leaving it invalid should work. But as the comment says, we might as 
    well update the cached value since we have the value at hand. It's just 
    that we were doing it wrong.
    
    [1] 
    https://www.postgresql.org/message-id/CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK%3DPCNWEMrA%40mail.gmail.com
    
    - Heikki