Thread

  1. PoC: VALGRIND_MAKE_MEM_NOACCESS for dynamic shared memory

    Tomas Vondra <tomas@vondra.me> — 2026-05-04T08:18:31Z

    Hi,
    
    A couple of days ago we got a report regarding an incorrect shmem size
    calculation in btree [1], leading to a buffer overflow / memory
    corruption. Which became a much less common issue in our code, thanks to
    valgrind and similar tools. But it took me a while to realize valgrind
    won't catch this because we only instrument private memory (palloc et
    al), while shmem is left alone.
    
    I was wondering if it's feasible to improve this. Attached is a trivial
    patch that adjusts shm_toc.c to add a couple NOACCESS bytes after each
    entry in the segment. It seems to do the trick - with this we get a
    reasonable report (for the reproducer provided in the bug report, before
    it got fixed by 748d871b7c) from valgrind, with invalid accesses. See
    the attached .log for an example. It's much better than the confusing
    crashes due to corrupted state.
    
    There's an issue, though. It seems the valgrind memory markings are not
    shared between processes. The leader sets the shm_toc up, marks the
    ranges as NOACCESS, and then checks it while accessing the memory. But
    the parallel workers don't seem to see this, and so will produce no
    reports. I'm assuming this is the case, because all the reports come
    from the leader, never from the workers. Maybe there's a different
    explanation (e.g. maybe it's just the leader touching the memory?).
    
    Still, it seems useful even with this limitation. If the leader
    participates, it will produce a nice report (unless the worker crashes
    first, because of the corrupted state). But we also have
    
        debug_parallel_query = regress
    
    in which case everything should run in a single process, and work just
    fine. Maybe that's good enough.
    
    An alternative would be to do mprotect(), but unfortunately it requires
    page-aligned ranges, which makes it somewhat useless for small overflows
    of a couple bytes (like here). It should work cross-process, I think,
    
    FWIW the PoC patch adds a 32-byte chunk, not just a single byte. This is
    intentional, because if the state is an array, it's quite possible the
    invalid access steps over a fair number of bytes. I'm actually thinking
    it should be even larger.
    
    This modifies just the dynamic shmem, but maybe we should do this even
    for the "regular" shmem allocated at start. Issues in that would likely
    cause crashes pretty quick (unlike the btree issue, which requires a
    somewhat special reproducer), but a nice valgrind report helps.
    
    regards
    
    
    [1]
    https://www.postgresql.org/message-id/CAGCUe0Lwk3C0qdkBa%2BOLpYc7yXwW%3Dpbaz8Sju4xMXEQAmyp%2B5g%40mail.gmail.com
    
    -- 
    Tomas Vondra