Re: mosbench revisited

Jeff Janes <jeff.janes@gmail.com> — 2011-08-06T17:43:03Z
On Wed, Aug 3, 2011 at 11:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> About nine months ago, we had a discussion of some benchmarking that
> was done by the mosbench folks at MIT:
>
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg00160.php
>
> Although the authors used PostgreSQL as a test harness for driving
> load, it's pretty clear from reading the paper that their primary goal
> was to stress the Linux kernel, so the applicability of the paper to
> real-world PostgreSQL performance improvement is less than it might
> be.  Still, having now actually investigated in some detail many of
> the same performance issues that they were struggling with, I have a
> much clearer understanding of what's really going on here.  In
> PostgreSQL terms, here are the bottlenecks they ran into:
>
> 1. "We configure PostgreSQL to use a 2 Gbyte application-level cache
> because PostgreSQL protects its free-list with a single lock and thus
> scales poorly with smaller caches."  This is a complaint about
> BufFreeList lock which, in fact, I've seen as a huge point of
> contention on some workloads.  In fact, on read-only workloads, with
> my lazy vxid lock patch applied, this is, I believe, the only
> remaining unpartitioned LWLock that is ever taken in exclusive mode;
> or at least the only one that's taken anywhere near often enough to
> matter.  I think we're going to do something about this, although I
> don't have a specific idea in mind at the moment.

I was going to ask whether you had done any benchmarks at a scale
where the tables fit in RAM but not in shared_buffers.  I guess you
have.

The attached experimental patch fixed freelist contention on 8 cores.
It would be nice to see what happens above that.

It has been cherry-picked up to HEAD, but not tested against it.  (It
was last tested in Dec 2010; my, how time flies.)

The approach is to move the important things from an LWLock to a
spinlock, and to do no locking at all for increments of the clock
hand and numBufferAllocs.
That means that some buffers might occasionally get inspected twice
and some might not get inspected at all during any given clock cycle,
but this should not lead to any correctness problems.   (Disclosure:
Tom didn't like this approach when it was last discussed.)

I just offer this for whatever it is worth to you--I'm not proposing
it as an actual patch to be applied.

When data fits in RAM but not shared_buffers, maybe the easiest fix is
to increase shared_buffers.  Which brings up the other question I had
for you about your work with Nate's celebrated loaner machine.  Have
you tried to reproduce the performance problems that have been
reported (but without public disclosure of how to reproduce them) with
shared_buffers > 8GB on machines with RAM >> 8GB?

Cheers,

Jeff