Inlining comparators as a performance optimisation

Peter Geoghegan <peter@2ndquadrant.com>

From: Peter Geoghegan <peter@2ndquadrant.com>
To: PG Hackers <pgsql-hackers@postgresql.org>
Date: 2011-09-20T01:56:01Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Speed up conversion of signed integers to C strings.

  2. Remove some unnecessary tests of pgstat_track_counts.

  3. Remove cvs keywords from all files.

  4. Code cleanup for function prototypes: change two K&R-style prototypes

  5. Use Min() instead of min() in qsort, for consistency and to avoid

  6. pgindent run for 8.2.

  7. Switch over to using our own qsort() all the time, as has been proposed

Attachments

Recent discussions on the threads "Double sorting split patch" and
"CUDA sorting" raised the possibility that there could be significant
performance optimisation "low-hanging fruit" picked by having the
executor treat integers and floats as a special case during sorting,
avoiding going to the trouble of calling a comparator using the
built-in SQL function machinery, and taking advantage of inlining of
the comparator, which has been shown to have a considerable
performance advantage (at least compared to a general purpose c stdlib
qsort(), that takes a function pointer as its comparator, much like
tuplesort).

I've hacked together a sloppy POC implementation in a hurry
(basically, some code is shifted around) , which is attached - I felt
that it would be useful to determine if the community feels that this
is a worth-while undertaking in advance of a business trip that I'm
leaving on tomorrow lasting until Friday, during which I will be
mostly unavailable. The patch breaks the Postgres sorting executor
node (at least when it quicksorts) for any type other than int4. I
apologise for how rough the patch is, but the code itself isn't
important right now - the idea is. I anticipate that the value
state->datumType or something similar will be set in a near future
revision, so that tuplesort_performsort will know which type-specific
optimisation it can use for the type, while falling back on the
existing generic qsort_arg + qsort_arg_comparator, and sorting won't
actually be broken.

I've been doing some preliminary testing using the dell store 2 sample
database. I increase work_mem to '50MB', to ensure that a quicksort
will be performed for sorting (otherwise, I'm using the
postgresql.conf that initdb gave me). The query is:

explain analyze select * from orderlines order by prod_id;

Once the cache has been warmed, explain analyze very consistently
reports a runtime of 123ms for this query on master/HEAD, which varies
+/- 1 ms, with a few outliers of maybe +/- 2ms. However, when I apply
this patch, that goes down to 107ms +/- 1ms at -O0. I think that
that's a pretty good start. Funnily enough, the difference/advantage
vanishes at -O2 (I'm guessing that the higher optimisation level of
GCC 4.5 hyper-corrects away the inlining, but I don't have time to
check that right now).

I imagine the version that I actually submit for patch review will
have a macro-based infrastructure for inlining the sorting of various
built-in types, initially integers and floats. It will most likely
have some other optimisations - I haven't even used a profiler yet.

This performance patch differs from most in that it's difficult in
principle to imagine a performance regression occurring.

Thoughts?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services