Thread

  1. Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB ?barriers

    Greg Burd <greg@burd.me> — 2025-12-15T17:27:25Z

    On Fri, Dec 12, 2025, at 2:32 PM, Andres Freund wrote:
    > Hi,
    >
    > On 2025-12-12 14:21:47 -0500, Greg Burd wrote:
    >> 
    >> On Fri, Dec 12, 2025, at 11:03 AM, Nathan Bossart wrote:
    >> > +/*
    >> > + * _InterlockedExchange() generates a full memory barrier (or release
    >> > + * semantics that ensures all prior memory operations are visible to
    >> > + * other cores before the lock is released.
    >> > + */
    >> > +#define S_UNLOCK(lock) (InterlockedExchange(lock, 0))
    >> 
    >> Nathan, thanks for looking at the patch!
    >> 
    >> > This seems to change the implementation from
    >> >
    >> > 	#define S_UNLOCK(lock)	\
    >> > 		do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
    >> >
    >> > in some cases, but I am insufficiently caffeinated to figure out what
    >> > platforms use which implementation.  In any case, it looks like we are
    >> > changing it for some currently-supported platforms, and I'm curious why.
    >> 
    >> This change is within _MSC_VER, but AFAICT this intrinsic is available
    >> across their supported platforms.  The previous implementation of S_UNLOCK()
    >> boils down to a no-op because the _ReadWriteBarrier()[1] is a hint to the
    >> compiler and does not emit any instruction on any platform and it's also
    >> deprecated.  So, on MSVC S_UNLOCK is an unguarded assignment and then a loop
    >> that will be optimized out, not really what we wanted I'd imagine.
    
    Thanks Andres for the comments.
    
    > I don't think it can be optimized out, that should be prevented by
    > _ReadWriteBarrier() being a compiler barrier.
    
    While the documentation does mention that this has been deprecated, I tend to agree with you.  This has been in place for a while, no need to change what works at this time.  A new thread might be a better forum if there is some evidence that we should revisit this in the future.  I'd like to land the ARM64/MSVC changes and enable that platform in this thread.
    
    >> My tests with godbolt showed this to be true, no instruction barriers
    >> emitted.  I think it was Andres[2] who suggested replacing it with
    >> _InterlockedExchange()[3].  So, given that _ReadWriteBarrier() is deprecated
    >> I decided not to specialize this change to only the ARM64 platform, sorry
    >> for not making that clear in the commit or email.
    >
    > I don't think that's a good idea - the _ReadWriteBarrier() is sufficient on
    > x86 to implement a spinlock release (due to x86 being a total store order
    > architecture, once the lock is observed as being released, all the effects
    > protected by the lock are also guaranteed to be visible).  Making the
    > spinlocks use an atomic op for both acquire and release does cause measurable
    > slowdowns on x86 with gcc, so I'd expect the same to be true on windows.
    
    Got it, fixed in v9.
    
    best.
    
    -greg
    
    > Greetings,
    >
    > Andres Freund