Thread

  1. Re: spinlocks on HP-UX

    Kevin Grittner <kevin.grittner@wicourts.gov> — 2011-08-29T18:48:12Z

    Robert Haas <robertmhaas@gmail.com> wrote:
     
    > Stepping beyond the immediate issue of whether we want an unlocked
    > test in there or not (and I agree that based on these numbers we
    > don't), there's a clear and puzzling difference between those sets
    > of numbers.  The Opteron test is showing 32 clients getting about
    > 23.9 times the throughput of a single client, which is not exactly
    > linear but is at least respectable, whereas the PPC64 test is
    > showing 32 clients getting just 14.5 times the throughput of a
    > single client, which is pretty significantly less good.  Moreover,
    > cranking it up to 64 clients is squeezing a significant amount of
    > additional work out on Opteron, but not on PPC64.  The
    > HP-UX/Itanium numbers in my OP give a ratio of 17.3x - a little
    > better than your PPC64 numbers, but certainly not great.
     
    I wouldn't make too much of that without comparing to a STREAM test
    (properly configured -- the default array size is probably too small
    for these machines).  On a recently delivered 32-core machine with
    256 GB RAM, I saw numbers like this for RAM access alone:
     
    Threads       Copy       Scale         Add       Triad
    1        3332.3721   3374.8146   4500.1954   4309.7392
    2        5822.8107   6158.4621   8563.3236   7756.9050
    4       12474.9646  12282.3401  16960.7216  15399.2406
    8       22353.6013  23502.4389  31911.5206  29574.8124
    16      35760.8782  40946.6710  49108.4386  49264.6576
    32      47567.3882  44935.4608  52983.9355  52278.1373
    64      48428.9939  47851.7320  54141.8830  54560.0520
    128     49354.4303  49586.6092  55663.2606  57489.5798
    256     45081.3601  44303.1032  49561.3815  50073.3530
    512     42271.9688  41689.8609  47362.4190  46388.9720
     
    Graph attached for those who are visually inclined and have support
    for the display of JPEG files.
     
    Note that this is a machine which is *not* configured to be
    blazingly fast for a single connection, but to scale up well for a
    large number of concurrent processes:
     
    http://www.redbooks.ibm.com/redpapers/pdfs/redp4650.pdf
     
    Unless your benchmarks are falling off a lot faster than the STREAM
    test on that hardware, I wouldn't worry.
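
To make that comparison concrete, here is a rough sketch that turns the STREAM table above into speedup ratios relative to one thread, so they can be set beside the client-scaling ratios quoted earlier (23.9x, 14.5x, 17.3x). The figures are copied from the table; the names STREAM_RATES, KERNELS and speedup are made up for illustration and are not part of STREAM or of any benchmark tool.

```python
# Rates copied verbatim from the STREAM table in this message.
# Keys are thread counts; values are (Copy, Scale, Add, Triad).
STREAM_RATES = {
    1:   (3332.3721, 3374.8146, 4500.1954, 4309.7392),
    2:   (5822.8107, 6158.4621, 8563.3236, 7756.9050),
    4:   (12474.9646, 12282.3401, 16960.7216, 15399.2406),
    8:   (22353.6013, 23502.4389, 31911.5206, 29574.8124),
    16:  (35760.8782, 40946.6710, 49108.4386, 49264.6576),
    32:  (47567.3882, 44935.4608, 52983.9355, 52278.1373),
    64:  (48428.9939, 47851.7320, 54141.8830, 54560.0520),
    128: (49354.4303, 49586.6092, 55663.2606, 57489.5798),
    256: (45081.3601, 44303.1032, 49561.3815, 50073.3530),
    512: (42271.9688, 41689.8609, 47362.4190, 46388.9720),
}

KERNELS = ("Copy", "Scale", "Add", "Triad")


def speedup(threads, kernel):
    """Return the rate at `threads` divided by the single-thread rate."""
    idx = KERNELS.index(kernel)
    return STREAM_RATES[threads][idx] / STREAM_RATES[1][idx]


if __name__ == "__main__":
    # Print one row of speedup ratios per thread count.
    print("Threads " + " ".join(f"{k:>7}" for k in KERNELS))
    for n in sorted(STREAM_RATES):
        print(f"{n:<7} " + " ".join(f"{speedup(n, k):7.2f}" for k in KERNELS))
```

On these numbers, raw memory bandwidth at 32 threads is only about 12x to 14x the single-thread rate, depending on the kernel, so a benchmark scaling in that neighborhood on similar hardware is not obviously being held back by locking.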
     
    -Kevin