Thread

Re: spinlock contention

Robert Haas <robertmhaas@gmail.com> — 2011-06-23T21:40:33Z
On Thu, Jun 23, 2011 at 5:35 PM, Florian Pflug <fgp@phlo.org> wrote:
>> Well, I'm sure there is some effect, but my experiments seem to
>> indicate that it's not a very important one.  Again, please feel free
>> to provide contrary evidence.  I think the basic issue is that - in
>> the best possible case - padding the LWLocks so that you don't have
>> two locks sharing a cache line can reduce contention on the busier
>> lock by at most 2x.  (The less busy lock may get a larger reduction
>> but that may not help you much.)  If what you really need is for
>> the contention to decrease by 1000x, you're just not really moving the
>> needle.
>
> Agreed. OTOH, adding a few dummy entries to the LWLocks array to separate
> the most heavily contested LWLocks from the others might still be
> worthwhile.

Hey, if we can show that it works, sign me up.
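
For readers following along, the padding idea discussed above can be
sketched roughly as below. This is only an illustration, not PostgreSQL
code: DemoLock, PaddedLock, demo_locks and same_cache_line are
hypothetical names, and the 64-byte cache line size is an assumption
that varies by CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; real hardware may differ. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-in for a lightweight lock's state. */
typedef struct DemoLock {
    unsigned char mutex;   /* protects the fields below */
    int exclusive;         /* number of exclusive holders (0 or 1) */
    int shared;            /* number of shared holders */
} DemoLock;

/* Pad each lock out to a full cache line so neighbours never share one. */
typedef union PaddedLock {
    DemoLock lock;
    char pad[CACHE_LINE_SIZE];
} PaddedLock;

static_assert(sizeof(PaddedLock) == CACHE_LINE_SIZE,
              "DemoLock must fit in one cache line");

/* The array itself must start on a cache-line boundary for this to work. */
static _Alignas(CACHE_LINE_SIZE) PaddedLock demo_locks[8];

/* Returns nonzero if both addresses fall within the same cache line. */
static int same_cache_line(const void *a, const void *b)
{
    return ((uintptr_t) a / CACHE_LINE_SIZE) ==
           ((uintptr_t) b / CACHE_LINE_SIZE);
}
```

With this layout, demo_locks[0] and demo_locks[1] always land on
different lines, at the cost of more memory per lock. As noted above,
that bounds the benefit: it removes false sharing between locks but does
nothing about contention on a single busy lock.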

>> That's why the basic fast-relation-lock patch helps so much:
>> it replaces a system where every lock request results in contention
>> with a system where NONE of them do.
>>
>> I tried rewriting the LWLocks using CAS.  It actually seems to make
>> things slightly worse on the tests I've done so far, perhaps because I
>> didn't make it respect spins_per_delay.  Perhaps fetch-and-add would
>> be better, but I'm not holding my breath.  Everything I'm seeing so
>> far leads me to the belief that we need to get rid of the contention
>> altogether, not just contend more quickly.
>
> Is there a patch available? How did you do the slow path (i.e. the
> case where there's contention and you need to block)? It seems to
> me that without some kernel support like futexes it's impossible
> to do better than LWLocks already do, because any simpler scheme
> like
>  while (atomic_inc_post(lock) > 0) {
>    atomic_dec(lock);
>    block(lock);
>  }
> for the shared-locker case suffers from a race condition (the lock
> might be released before you actually block()).

Attached...
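
Separately from the attachment, here is a minimal sketch of one way to
close the lost-wakeup race described in the quoted text without kernel
support like futexes: re-check the lock state while holding a mutex
before sleeping on a condition variable. It is not the attached patch
and not how LWLocks work; all demo_* names and the state encoding are
assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock: state is -1 (exclusive), 0 (free), n > 0 (n sharers). */
typedef struct DemoRWLock {
    atomic_int      state;
    pthread_mutex_t mtx;   /* only used on the slow (blocking) path */
    pthread_cond_t  cv;
} DemoRWLock;

#define DEMO_RWLOCK_INIT \
    { 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Fast path: CAS the sharer count upward unless held exclusively. */
static bool demo_shared_try_acquire(DemoRWLock *l)
{
    int cur = atomic_load(&l->state);

    while (cur >= 0) {
        /* On failure, cur is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&l->state, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Slow path: the state is re-checked under mtx, so a release that
 * happens between the failed CAS and the wait cannot be missed. */
static void demo_shared_acquire(DemoRWLock *l)
{
    while (!demo_shared_try_acquire(l)) {
        pthread_mutex_lock(&l->mtx);
        while (atomic_load(&l->state) < 0)
            pthread_cond_wait(&l->cv, &l->mtx);
        pthread_mutex_unlock(&l->mtx);
    }
}

static void demo_shared_release(DemoRWLock *l)
{
    int prev = atomic_fetch_sub(&l->state, 1);

    assert(prev > 0);      /* must have been held in shared mode */
    (void) prev;
}

static bool demo_exclusive_try_acquire(DemoRWLock *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

/* Publish the new state first, then wake waiters while holding mtx. */
static void demo_exclusive_release(DemoRWLock *l)
{
    atomic_store(&l->state, 0);
    pthread_mutex_lock(&l->mtx);
    pthread_cond_broadcast(&l->cv);
    pthread_mutex_unlock(&l->mtx);
}
```

The price is that every exclusive release takes the mutex and
broadcasts even when nobody is waiting, which is contention of exactly
the kind this thread is trying to get rid of. A real implementation
would want to track waiters and skip the wakeup when there are none.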

> The idea would be to start out with something trivial like the above.
> Maybe with an #if for compilers which have something like GCC's
> __sync_synchronize(). We could then gradually add implementations
> for specific architectures, hopefully done by people who actually
> own the hardware and can test.

Yes.  But if we go that route, then we have to also support a code
path for architectures for which we don't have that support.  That's
going to be more work, so I don't want to do it until we have a case
where there is a good, clear benefit.
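
To make the two code paths concrete, a rough sketch of the #if approach
might look like the following. The demo_* names are hypothetical, and
the fallback assumes that locking and unlocking a pthread mutex is an
adequate stand-in for a full memory barrier on platforms without
__sync_synchronize(); that assumption would need checking per platform.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
/* Compilers that provide GCC's full-barrier builtin. */
#define DEMO_HAVE_NATIVE_BARRIER 1
#define demo_memory_barrier() __sync_synchronize()
#else
/* Fallback path (assumption): a mutex lock/unlock pair synchronizes
 * memory, so it can serve as a slow but portable barrier. */
#include <pthread.h>
#define DEMO_HAVE_NATIVE_BARRIER 0
static pthread_mutex_t demo_barrier_mtx = PTHREAD_MUTEX_INITIALIZER;
static void demo_memory_barrier(void)
{
    pthread_mutex_lock(&demo_barrier_mtx);
    pthread_mutex_unlock(&demo_barrier_mtx);
}
#endif

static int demo_payload;          /* data being handed over */
static volatile int demo_ready;   /* flag announcing the data */

/* Writer: store the data, then the flag, with a barrier in between
 * so the flag cannot become visible before the data. */
static void demo_publish(int value)
{
    demo_payload = value;
    demo_memory_barrier();
    demo_ready = 1;
}

/* Reader: returns 1 and fills *out if data has been published. */
static int demo_consume(int *out)
{
    if (!demo_ready)
        return 0;
    demo_memory_barrier();        /* order the flag read before the data read */
    *out = demo_payload;
    return 1;
}
```

Both branches have to be maintained and tested, which is the extra work
referred to above.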

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company