Thread

  1. Re: mosbench revisited

    Jeff Janes <jeff.janes@gmail.com> — 2011-08-06T17:43:03Z

    On Wed, Aug 3, 2011 at 11:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
    > About nine months ago, we had a discussion of some benchmarking that
    > was done by the mosbench folks at MIT:
    >
    > http://archives.postgresql.org/pgsql-hackers/2010-10/msg00160.php
    >
    > Although the authors used PostgreSQL as a test harness for driving
    > load, it's pretty clear from reading the paper that their primary goal
    > was to stress the Linux kernel, so the applicability of the paper to
    > real-world PostgreSQL performance improvement is less than it might
    > be.  Still, having now actually investigated in some detail many of
    > the same performance issues that they were struggling with, I have a
    > much clearer understanding of what's really going on here.  In
    > PostgreSQL terms, here are the bottlenecks they ran into:
    >
    > 1. "We configure PostgreSQL to use a 2 Gbyte application-level cache
    > because PostgreSQL protects its free-list with a single lock and thus
    > scales poorly with smaller caches."  This is a complaint about
    > BufFreeList lock which, in fact, I've seen as a huge point of
    > contention on some workloads.  In fact, on read-only workloads, with
    > my lazy vxid lock patch applied, this is, I believe, the only
    > remaining unpartitioned LWLock that is ever taken in exclusive mode;
    > or at least the only one that's taken anywhere near often enough to
    > matter.  I think we're going to do something about this, although I
    > don't have a specific idea in mind at the moment.
    
    I was going to ask if you had done any benchmarks at a scale such
    that the tables fit in RAM but not in shared_buffers.  I guess you
    have.
    
    The attached experimental patch fixed freelist contention on 8 cores.
    It would be nice to see what happens above that.
    
    It has been cherry-picked up to HEAD, but not tested against it. (Last
    tested in Dec 2010; my, how time flies.)
    
    The approach is to move the important things from an LWLock to a
    spinlock, and to not do any locking for increments of the clock hand
    and numBufferAllocs.  That means that some buffers might occasionally
    get inspected twice and some might not get inspected at all during
    any given clock cycle, but this should not lead to any correctness
    problems.  (Disclosure: Tom didn't like this approach when it was
    last discussed.)
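
    To make the idea concrete, here is a minimal sketch of a clock sweep
    whose hand and allocation counter are advanced without any lock.  It
    is not the attached patch and not PostgreSQL's actual code: every
    Sketch*/sketch_* name, the pool size, and the retry bound are
    hypothetical.  It also assumes C11 atomics, using a separate relaxed
    load and store to stand in for an unlocked increment (a plain
    unsynchronized increment would be a data race in C11).  Two backends
    that load the same hand value will inspect the same buffer, and one
    of the two increments is lost, which is exactly the "inspected twice
    or not at all" behavior described above.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NBUFFERS 4               /* hypothetical pool size */

typedef struct {
    atomic_flag hdr_lock;               /* per-buffer spinlock (hypothetical) */
    unsigned    refcount;               /* pins; protected by hdr_lock */
    unsigned    usage_count;            /* sweep counter; protected by hdr_lock */
} SketchBuffer;

static SketchBuffer sketch_buffers[SKETCH_NBUFFERS];
static atomic_uint  sketch_next_victim; /* clock hand, advanced with no lock */
static atomic_uint  sketch_num_allocs;  /* statistics counter, also unlocked */

static void sketch_init(void)
{
    for (int i = 0; i < SKETCH_NBUFFERS; i++) {
        atomic_flag_clear(&sketch_buffers[i].hdr_lock);
        sketch_buffers[i].refcount = 0;
        sketch_buffers[i].usage_count = 0;
    }
    atomic_store(&sketch_next_victim, 0);
    atomic_store(&sketch_num_allocs, 0);
}

/* Returns the index of a newly pinned victim buffer, or -1 if none was found. */
static int sketch_get_victim(void)
{
    /* Unlocked read-modify-write: a concurrent caller's update may be lost. */
    unsigned n = atomic_load_explicit(&sketch_num_allocs, memory_order_relaxed);
    atomic_store_explicit(&sketch_num_allocs, n + 1, memory_order_relaxed);

    /* Bound the sweep so the sketch terminates when every buffer is pinned. */
    for (int tries = 0; tries < SKETCH_NBUFFERS * 8; tries++) {
        /* Same unlocked pattern for the hand: ticks can repeat or be skipped. */
        unsigned hand = atomic_load_explicit(&sketch_next_victim,
                                             memory_order_relaxed);
        atomic_store_explicit(&sketch_next_victim,
                              (hand + 1) % SKETCH_NBUFFERS,
                              memory_order_relaxed);
        assert(hand < SKETCH_NBUFFERS);

        SketchBuffer *buf = &sketch_buffers[hand];

        /* The buffer header itself is still protected, here by a spinlock. */
        while (atomic_flag_test_and_set_explicit(&buf->hdr_lock,
                                                 memory_order_acquire))
            ;                           /* spin */

        if (buf->refcount == 0) {
            if (buf->usage_count == 0) {
                buf->refcount = 1;      /* pin the victim before releasing */
                atomic_flag_clear_explicit(&buf->hdr_lock,
                                           memory_order_release);
                return (int) hand;
            }
            buf->usage_count--;         /* give it another trip around */
        }
        atomic_flag_clear_explicit(&buf->hdr_lock, memory_order_release);
    }
    return -1;
}
```

    Because the victim is checked and pinned under its own header lock,
    a repeated or skipped tick only changes which buffer is chosen, not
    whether the choice is safe.  The cost is that numBufferAllocs-style
    statistics become approximate.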
    
    I just offer this for whatever it is worth to you--I'm not proposing
    it as an actual patch to be applied.
    
    When data fits in RAM but not shared_buffers, maybe the easiest fix is
    to increase shared_buffers.  Which brings up the other question I had
    for you about your work with Nate's celebrated loaner machine.  Have
    you tried to reproduce the performance problems that have been
    reported (but without public disclosure of how to reproduce) with
    shared_buffers > 8GB on machines with RAM >> 8GB?
    
    Cheers,
    
    Jeff