Thread

  1. Re: spinlock contention

    Robert Haas <robertmhaas@gmail.com> — 2011-06-23T21:40:33Z

    On Thu, Jun 23, 2011 at 5:35 PM, Florian Pflug <fgp@phlo.org> wrote:
    >> Well, I'm sure there is some effect, but my experiments seem to
    >> indicate that it's not a very important one.  Again, please feel free
    >> to provide contrary evidence.  I think the basic issue is that - in
    >> the best possible case - padding the LWLocks so that you don't have
    >> two locks sharing a cache line can reduce contention on the busier
    >> lock by at most 2x.  (The less busy lock may get a larger reduction
    >> but that may not help you much.)  If what you really need is for
    >> the contention to decrease by 1000x, you're just not really moving the
    >> needle.
    >
    > Agreed. OTOH, adding a few dummy entries to the LWLocks array to separate
    > the most heavily contested LWLocks from the others might still be
    > worthwhile.
    
    Hey, if we can show that it works, sign me up.
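
    For illustration, here is a minimal sketch of what cache-line padding
    could look like.  It assumes a 64-byte cache line and C11; DemoLock,
    DemoLockPadded, demo_locks and demo_line_of are made-up names, and
    the struct is a stand-in, not the real LWLock layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64   /* assumption: common x86-64 line size */

/* Hypothetical stand-in for a lock; not PostgreSQL's actual LWLock. */
typedef struct DemoLock
{
    volatile int mutex;
    int          exclusive;
    int          shared;
} DemoLock;

/* Pad each lock out to a full cache line so neighbors never share one. */
typedef union DemoLockPadded
{
    DemoLock lock;
    char     pad[DEMO_CACHE_LINE_SIZE];
} DemoLockPadded;

static_assert(sizeof(DemoLockPadded) == DEMO_CACHE_LINE_SIZE,
              "padded lock must occupy exactly one cache line");

/* The array itself must start on a line boundary for the padding to help. */
static _Alignas(DEMO_CACHE_LINE_SIZE) DemoLockPadded demo_locks[4];

/* Cache line index of an address, counted from the start of the array. */
static size_t demo_line_of(const void *p)
{
    return (size_t) ((uintptr_t) p - (uintptr_t) demo_locks)
           / DEMO_CACHE_LINE_SIZE;
}
```

    Inserting dummy entries into an unpadded array would aim for the same
    separation, just for a few chosen locks rather than all of them.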
    
    >> That's why the basic fast-relation-lock patch helps so much:
    >> it replaces a system where every lock request results in contention
    >> with a system where NONE of them do.
    >>
    >> I tried rewriting the LWLocks using CAS.  It actually seems to make
    >> things slightly worse on the tests I've done so far, perhaps because I
    >> didn't make it respect spins_per_delay.  Perhaps fetch-and-add would
    >> be better, but I'm not holding my breath.  Everything I'm seeing so
    >> far leads me to the belief that we need to get rid of the contention
    >> altogether, not just contend more quickly.
    >
    > Is there a patch available? How did you do the slow path (i.e. the
    > case where there's contention and you need to block)? It seems to
    > me that without some kernel support like futexes it's impossible
    > to do better than LWLocks already do, because any simpler scheme
    > like
    >  while (atomic_inc_post(lock) > 0) {
    >    atomic_dec(lock);
    >    block(lock);
    >  }
    > for the shared-locker case suffers from a race condition (the lock
    > might be released before you actually block()).
    
    Attached...
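
    To make the race in the quoted loop concrete, here is a rough sketch
    of just the non-blocking fast path using C11 atomics.  It is not the
    attached patch.  The names (DemoRWLock, demo_try_shared, and so on)
    and the bias-based encoding of the lock word are assumptions made
    for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: an exclusive holder adds this bias; each sharer adds 1. */
#define DEMO_EXCLUSIVE_BIAS 0x10000

typedef struct DemoRWLock
{
    atomic_int state;   /* 0 = free, below BIAS = sharers only */
} DemoRWLock;

/* Try to take the lock in shared mode without blocking. */
static bool demo_try_shared(DemoRWLock *l)
{
    int old = atomic_fetch_add(&l->state, 1);

    if (old < DEMO_EXCLUSIVE_BIAS)
        return true;                    /* no exclusive holder: acquired */

    atomic_fetch_sub(&l->state, 1);     /* back out our increment */

    /*
     * Race window: the exclusive holder may release right here, before
     * the caller goes to sleep, so a naive block() could miss the wakeup.
     */
    return false;
}

static void demo_release_shared(DemoRWLock *l)
{
    int old = atomic_fetch_sub(&l->state, 1);

    assert(old > 0);
    (void) old;
}

/* Try to take the lock in exclusive mode; succeeds only if it is free. */
static bool demo_try_exclusive(DemoRWLock *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->state, &expected,
                                          DEMO_EXCLUSIVE_BIAS);
}

static void demo_release_exclusive(DemoRWLock *l)
{
    atomic_fetch_sub(&l->state, DEMO_EXCLUSIVE_BIAS);
}
```

    The sketch deliberately stops where demo_try_shared returns false.
    Closing that window needs either a queue protected by another lock
    or a kernel primitive that rechecks the lock word before sleeping.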
    
    > The idea would be to start out with something trivial like the above.
    > Maybe with an #if for compilers which have something like GCC's
    > __sync_synchronize(). We could then gradually add implementations
    > for specific architectures, hopefully done by people who actually
    > own the hardware and can test.
    
    Yes.  But if we go that route, we also have to support a fallback code
    path for architectures where we don't have that support.  That's
    going to be more work, so I don't want to do it until we have a case
    where there is a good, clear benefit.
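
    As a rough picture of the two code paths, the sketch below selects
    an atomic fetch-and-add at compile time.  DEMO_HAVE_ATOMICS and
    demo_fetch_add are hypothetical names.  The #else branch is only a
    placeholder and is NOT thread-safe as written; a real port would
    have to take a platform spinlock around it.

```c
#include <assert.h>

#if defined(__GNUC__)
/* GCC-compatible compilers provide the __sync builtins. */
#define DEMO_HAVE_ATOMICS 1

static inline int demo_fetch_add(volatile int *p, int v)
{
    assert(p);
    return __sync_fetch_and_add(p, v);   /* returns the previous value */
}
#else
/* Fallback path for compilers without the builtin. */
#define DEMO_HAVE_ATOMICS 0

static inline int demo_fetch_add(volatile int *p, int v)
{
    int old;

    assert(p);
    /* Placeholder only: a real port would hold a spinlock here. */
    old = *p;
    *p = old + v;
    return old;                          /* returns the previous value */
}
#endif
```

    Every caller then has to be correct under both branches, which is
    the extra work in question.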
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company