Thread

  1. Re: Trying out <stdatomic.h>

    Thomas Munro <thomas.munro@gmail.com> — 2025-11-23T21:08:19Z

    On Mon, Nov 24, 2025 at 4:23 AM Greg Burd <greg@burd.me> wrote:
    > As mentioned on a separate thread about fixing ARM64 support when building with MSVC on Win11 [1] I tried out this patch.  The reply on that thread had an issue with _mm_pause() in spin_delay(), it turns out we need to use __yield() [2].  I went ahead and fixed that, so ignore that patch on the other thread [1].  The new patch attached that layers on top of your work and supports that platform, there was one minor change that was required:
    >
    >
    > #ifdef _MSC_VER
    >
    >         /*
    >          * If using Visual C++ on Win64, inline assembly is unavailable.  Use a
    >          * _mm_pause intrinsic instead of rep nop. For ARM64, use the __yield()
    >          * intrinsic which emits the YIELD instruction as a hint to the processor.
    >          */
    > #if defined(_M_ARM64)
    >         __yield();
    > #elif defined(_WIN64)
    >         _mm_pause();
    > #else
    >         /* See comment for gcc code. Same code, MASM syntax */
    >         __asm           rep nop;
    > #endif
    > #endif                                                  /* _MSC_VER */
    
    That makes more intuitive sense... but I didn't know that people *do*
    sometimes prefer instruction synchronisation barriers for spinlock
    delays:
    
    https://stackoverflow.com/questions/70810121/why-does-hintspin-loop-use-isb-on-aarch64
    
    When reading your patch I was pretty confused by that, because it said
    it was fixing a barrier problem and apparently doing so in an
    unprincipled place.  I guess we really need to research the best delay
    mechanism for our needs on this architecture, independently of the
    compiler being used, and then write matching GCC and Visual Studio
    versions of that?  I think there were some threads about spinlock
    performance on Linux + Graviton with graphs and theories...
    
    > tests are passing, best.
    
    Great news!  Thanks.  It sounds like if I could supply the missing
    credible evidence of codegen quality on... all the computers, then I
    think we'd be down to just: when can we pull the trigger and require
    Visual Studio 2022 and do we trust /experimental:c11atomics?
    
    FTR I had earlier shared some version of this patch with Dave when he
    was trying to get his Windows/ARM system going, but I think my earlier
    version was probably too broken.  Sorry Dave.  At that stage I was
    also trying to do it as an option but keeping the existing stuff
    around.  Since then we adopted C11, so this is the all-in version.  I
    also hadn't understood a key part of the C11 memory model that your
    RISC-V animal taught me and that c5d34f4a fixed, and you can see in
    this patch set too, and I'm not sure if Visual Studio is like GCC or
    Clang in that respect.  It crossed my mind that this might even be
    related to the problem you've noticed with barriers being missing, but
    I haven't looked into that.  BTW I believe we could actually change
    our code NOT to rely on that, ie to follow the C11 memory model better
    and declare eg PgAioHandle::status as atomic_uint8 or whatever (other
    non-atomic access would be considered dependent and do the right thing
    IIUC), but I'm not sure if it's necessary and that research project
    can wait.