Thread
-
Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers
Greg Burd <greg@burd.me> — 2025-11-23T20:41:31Z
On Nov 23 2025, at 10:55 am, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2025-11-22 16:43:30 -0500, Greg Burd wrote: >> With the new MSVC compiler flag Andres mentioned (/arch:armv9.4) I only >> had to update the S_UNLOCK() macro, the compiler did the rest correctly >> AFAICT. > > Just to be clear - the flag shouldn't be necessary for things to work > correctly. I was only using it to have godbolt inline the intrinsics, rather > than have them generate an out-of-line function call that I couldn't easily > inspect. I'm fairly sure that the out-of-line functions also have the relevant > barriers. Well, I learned something new about MSVC (again) today. Thanks Andres for pointing that out. I fiddled with godbolt a bit and indeed on ARM64 systems the only thing that flag (/arch:armv9.4) does is to replace the out-of-line function call for the intrinsic with the inlined version of it. Without: bl _InterlockedCompareExchange With: mov w10,#2 str w8,[sp] mov x9,sp casal w8,w10,[x9] Good to know, and yes it does seem that the ASM includes the correct instruction sequence. I think, except for the build issues, the one thing I can put my finger on as a "fix" is the change from _ReadWriteBarrier() to _dmb(_ARM64_BARRIER_SY). The docs say: The _ReadWriteBarrier intrinsic limits the compiler optimizations that can remove or reorder memory accesses across the point of the call. Note that it says "compiler optimization" not "system memory barrier". So, this macro change: #define S_UNLOCK(lock) \ - do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0) + do { __dmb(_ARM64_BARRIER_SY); (*(lock)) = 0; } while (0) Makes more sense. This change replaces a compiler-only barrier with a full system memory barrier, fundamentally strengthening memory ordering guarantees for spinlock release on ARM64. When I remove the compiler flag from the patch and keep the S_UNLOCK() change the tests pass. I think we've found the critical fix. I'll update the patch set. >> @@ -2509,7 +2513,10 @@ int main(void) >> } >> ''' >> >> - if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and >> __crc32cd without -march=armv8-a+crc', >> + if cc.get_id() == 'msvc' >> + cdata.set('USE_ARMV8_CRC32C', 1) >> + have_optimized_crc = true > > Should have a comment explaining why we can do this unconditionally... Sure, the more I look at this the more I want to clean it up a bit more. > Greetings, > > Andres Freund Thanks again for the helpful nudge Andres, best. -greg