Thread
-
Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB ?barriers
Greg Burd <greg@burd.me> — 2025-12-15T17:27:25Z
On Fri, Dec 12, 2025, at 2:32 PM, Andres Freund wrote: > Hi, > > On 2025-12-12 14:21:47 -0500, Greg Burd wrote: >> >> On Fri, Dec 12, 2025, at 11:03 AM, Nathan Bossart wrote: >> > +/* >> > + * _InterlockedExchange() generates a full memory barrier (or release >> > + * semantics that ensures all prior memory operations are visible to >> > + * other cores before the lock is released. >> > + */ >> > +#define S_UNLOCK(lock) (InterlockedExchange(lock, 0)) >> >> Nathan, thanks for looking at the patch! >> >> > This seems to change the implementation from >> > >> > #define S_UNLOCK(lock) \ >> > do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0) >> > >> > in some cases, but I am insufficiently caffeinated to figure out what >> > platforms use which implementation. In any case, it looks like we are >> > changing it for some currently-supported platforms, and I'm curious why. >> >> This change is within _MSC_VER, but AFAICT this intrinsic is available >> across their supported platforms. The previous implementation of S_UNLOCK() >> boils down to a no-op because the _ReadWriteBarrier()[1] is a hint to the >> compiler and does not emit any instruction on any platform and it's also >> deprecated. So, on MSVC S_UNLOCK is an unguarded assignment and then a loop >> that will be optimized out, not really what we wanted I'd imagine. Thanks Andres for the comments. > I don't think it can be optimized out, that should be prevented by > _ReadWriteBarrier() being a compiler barrier. While the documentation does mention that this has been deprecated, I tend to agree with you. This has been in place for a while, no need to change what works at this time. A new thread might be a better forum if there is some evidence that we should revisit this in the future. I'd like to land the ARM64/MSVC changes and enable that platform in this thread. >> My tests with godbolt showed this to be true, no instruction barriers >> emitted. I think it was Andres[2] who suggested replacing it with >> _InterlockedExchange()[3]. So, given that _ReadWriteBarrier() is deprecated >> I decided not to specialize this change to only the ARM64 platform, sorry >> for not making that clear in the commit or email. > > I don't think that's a good idea - the _ReadWriteBarrier() is sufficient on > x86 to implement a spinlock release (due to x86 being a total store order > architecture, once the lock is observed as being released, all the effects > protected by the lock are also guaranteed to be visible). Making the > spinlocks use an atomic op for both acquire and release does cause measurable > slowdowns on x86 with gcc, so I'd expect the same to be true on windows. Got it, fixed in v9. best. -greg > Greetings, > > Andres Freund