Thread

  1. Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers

    Greg Burd <greg@burd.me> — 2025-11-21T19:36:12Z

    On Nov 20 2025, at 7:03 pm, Andres Freund <andres@anarazel.de> wrote:
    
    > Hi,
    > 
    > On 2025-11-20 15:45:22 -0500, Greg Burd wrote:
    >> Dave and I have been working together to get ARM64 with MSVC functional.
    >>  The attached patches accomplish that. Dave is the author of the first
    >> which addresses some build issues and fixes the spin_delay() semantics,
    >> I did the second which fixes some atomics in this combination.
    > 
    > Thanks for working on this!
    
    You're welcome, thanks for reviewing it. :)
    
    >> 
    >> MSVC's _InterlockedCompareExchange() intrinsic on ARM64 performs the
    >> atomic operation but does NOT emit the necessary Data Memory Barrier
    >> (DMB) instructions [4][5].
    > 
    > I couldn't reproduce this result when playing around on godbolt. By specifying
    > /arch:armv9.4 msvc can be convinced to emit the code for the intrinsics inline
    > (at least for most of them).  And that makes it visible that
    > _InterlockedCompareExchange() results in a "casal" instruction. Looking that
    > up shows:
    >  https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/CASA--CASAL--CAS--CASL--CASAL--CAS--CASL--A64-
    > which includes these two statements:
    > "CASA and CASAL load from memory with acquire semantics."
    > "CASL and CASAL store to memory with release semantics."
    
    I didn't even think to check for an architecture compiler flag; nice
    call!  If this emits the correct instructions, it is a much better
    approach than adding explicit barriers.  I'll give it a try, thanks for
    the nudge.
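    For comparison, here is a minimal sketch of the ordering a CASAL-backed
    compare-and-exchange gives a test-and-set spinlock. It assumes portable
    C11 <stdatomic.h> rather than the MSVC intrinsics, and `toy_lock_t` and
    `toy_tas()` are hypothetical names, not the s_lock.h API. The `acq_rel`
    success ordering mirrors CASAL's acquire-on-load and release-on-store
    semantics quoted above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only (not PostgreSQL's slock_t). */
typedef atomic_int toy_lock_t;

/*
 * Try to take the lock once.  Returns true if the lock was already held,
 * following the s_lock.h TAS() convention where nonzero means "not acquired".
 *
 * memory_order_acq_rel on success corresponds to what CASAL provides: the
 * load half has acquire semantics and the store half has release semantics,
 * so accesses in the critical section cannot move above the acquisition.
 */
static bool toy_tas(toy_lock_t *lock)
{
    int expected = 0;
    bool acquired = atomic_compare_exchange_strong_explicit(
        lock, &expected, 1,
        memory_order_acq_rel,   /* ordering when the swap succeeds */
        memory_order_acquire);  /* ordering when it fails (load only) */
    return !acquired;
}
```

    A caller spins while `toy_tas()` returns true; the acquire half is what
    makes data written by the previous lock holder visible once it succeeds.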
    
    >> Issue 2: S_UNLOCK() uses only a compiler barrier
    >> 
    >> _ReadWriteBarrier() is a compiler barrier, NOT a hardware memory
    >> barrier [6].  It prevents the compiler from reordering operations, but
    >> the CPU can still reorder memory operations. This is fundamentally
    >> insufficient for ARM64's weaker memory model.
    > 
    > Yea, that seems broken on a non-TSO architecture.  Is the problem
    > fixed if you change just this to include a proper barrier?
    
    Using the flag from above, _ReadWriteBarrier() does (in godbolt) turn
    into a casal, which (AFAIK) is going to do the trick.  I'll see if I can
    update meson.build and get this working as intended.
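    The S_UNLOCK() issue can be illustrated the same way. This is a minimal
    sketch, again assuming C11 atomics instead of the MSVC intrinsics, with
    hypothetical names `toy_unlock_broken()` and `toy_unlock()`.
    `atomic_signal_fence()` is the closest portable analogue of
    _ReadWriteBarrier(): it only constrains the compiler. A release store
    also constrains the CPU, so writes made inside the critical section are
    visible to the next thread that acquires the lock.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration only (not PostgreSQL's slock_t). */
typedef atomic_int toy_lock_t;

/*
 * Broken on weakly ordered CPUs such as ARM64: the fence only stops the
 * compiler from moving memory accesses across it, much like MSVC's
 * _ReadWriteBarrier().  The hardware may still make the store of 0 visible
 * to another core before earlier stores from the critical section.
 */
static void toy_unlock_broken(toy_lock_t *lock)
{
    atomic_signal_fence(memory_order_seq_cst);
    atomic_store_explicit(lock, 0, memory_order_relaxed);
}

/*
 * Correct: the release store pairs with the acquire in the next lock
 * acquisition.  A thread whose acquiring operation reads this 0 is
 * guaranteed to see every write made before the unlock.  GCC and Clang
 * typically compile this to STLR on ARM64.
 */
static void toy_unlock(toy_lock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

    Both versions behave identically in a single-threaded test; the
    difference only shows up as stale data read by another core, which is
    why a compiler-only barrier can look fine on x86 (TSO) and still fail on
    ARM64.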
    
    > Greetings,
    > 
    > Andres Freund
    
    best.
    
    -greg