Thread

  1. Re: Trying out <stdatomic.h>

    Greg Burd <greg@burd.me> — 2025-11-23T21:17:25Z

    On Nov 23 2025, at 4:08 pm, Thomas Munro <thomas.munro@gmail.com> wrote:
    
    > On Mon, Nov 24, 2025 at 4:23 AM Greg Burd <greg@burd.me> wrote:
    >> As mentioned on a separate thread about fixing ARM64 support when
    >> building with MSVC on Win11 [1] I tried out this patch.  The reply on
    >> that thread had an issue with _mm_pause() in spin_delay(), it turns
    >> out we need to use __yield() [2].  I went ahead and fixed that, so
    >> ignore that patch on the other thread [1].  The new patch attached
    >> that layers on top of your work and supports that platform, there was
    >> one minor change that was required:
    >>  
    >>  
    >> #ifdef _MSC_VER
    >>  
    >>         /*
    >>          * If using Visual C++ on Win64, inline assembly is
    >> unavailable.  Use a
    >>          * _mm_pause intrinsic instead of rep nop. For ARM64, use the __yield()
    >>          * intrinsic which emits the YIELD instruction as a hint to
    >> the processor.
    >>          */
    >> #if defined(_M_ARM64)
    >>         __yield();
    >> #elif defined(_WIN64)
    >>         _mm_pause();
    >> #else
    >>         /* See comment for gcc code. Same code, MASM syntax */
    >>         __asm           rep nop;
    >> #endif
    >> #endif                                                  /* _MSC_VER */
    >  
    > That makes more intuitive sense... but I didn't know that people *do*
    > sometimes prefer instruction synchronisation barriers for spinlock
    > delays:
    >  
    > https://stackoverflow.com/questions/70810121/why-does-hintspin-loop-use-isb-on-aarch64
    >  
    > When reading your patch I was pretty confused by that, because it said
    > it was fixing a barrier problem and apparently doing so in an
    > unprincipled place.  I guess we really need to research the best delay
    > mechanism for our needs on this architecture, independently of the
    > compiler being used, and then write matching GCC and Visual Studio
    > versions of that?  I think there were some threads about spinlock
    > performance on Linux + Graviton with graphs and theories...
    
    Interesting, I think I was rushing to get past that compile issue rather
    than optimizing.  This sounds like yet another place where we should
    choose based on arch and it seems hint::spin_loop() does.
    
    >> tests are passing, best.
    >  
    > Great news!  Thanks.  It sounds like if I could supply the missing
    > credible evidence of codegen quality on... all the computers, then I
    > think we'd be down to just: when can we pull the trigger and require
    > Visual Studio 2022 and do we trust /experimental:c11atomics?
    
    I'm in favor of the stdatomic approach.  I can't speak to codegen
    quality on *all the platforms* or how *experimental* c11 atomics are
    when using MSVC.
    
    > FTR I had earlier shared some version of this patch with Dave when he
    > was trying to get his Windows/ARM system going, but I think my earlier
    > version was probably too broken.  Sorry Dave.  At that stage I was
    > also trying to do it as an option but keeping the existing stuff
    > around.  Since then we adopted C11, so this is the all-in version.  I
    > also hadn't understood a key part of the C11 memory model that your
    > RISC-V animal taught me and that c5d34f4a fixed, and you can see in
    > this patch set too, and I'm not sure if Visual Studio is like GCC or
    > Clang in that respect.
    
    Thanks for that work on RISC-V, I appreciate that!  Much more digging to
    be done to answer those questions for sure.
    
    > It crossed my mind that this might even be
    > related to the problem you've noticed with barriers being missing, but
    > I haven't looked into that.  BTW I believe we could actually change
    > our code NOT to rely on that, ie to follow the C11 memory model better
    > and declare eg PgAioHandle::status as atomic_uint8 or whatever (other
    > non-atomic access would be considered dependent and do the right thing
    > IIUC), but I'm not sure if it's necessary and that research project
    > can wait.
    
    Interesting.  Yeah, let's do one thing and then move to the next, but I
    do like the idea.
    
    best.
    
    -greg