Inlining comparators as a performance optimisation
Peter Geoghegan <peter@2ndquadrant.com>
From: Peter Geoghegan <peter@2ndquadrant.com>
To: PG Hackers <pgsql-hackers@postgresql.org>
Date: 2011-09-20T01:56:01Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Speed up conversion of signed integers to C strings.
- 4fc115b2e981 9.1.0 cited
-
Remove some unnecessary tests of pgstat_track_counts.
- f4d242ef9473 9.1.0 cited
-
Remove cvs keywords from all files.
- 9f2e21138693 9.1.0 cited
-
Code cleanup for function prototypes: change two K&R-style prototypes
- b9954fbb4ef2 8.3.0 cited
-
Use Min() instead of min() in qsort, for consistency and to avoid
- b38900c76776 8.2.0 cited
-
pgindent run for 8.2.
- f99a569a2ee3 8.2.0 cited
-
Switch over to using our own qsort() all the time, as has been proposed
- 6edd2b4a91bd 8.2.0 cited
Attachments
- inline_comp.patch (text/x-patch) patch
Recent discussions on the threads "Double sorting split patch" and "CUDA sorting" raised the possibility that there could be significant performance optimisation "low-hanging fruit" picked by having the executor treat integers and floats as a special case during sorting, avoiding going to the trouble of calling a comparator using the built-in SQL function machinery, and taking advantage of inlining of the comparator, which has been shown to have a considerable performance advantage (at least compared to a general purpose c stdlib qsort(), that takes a function pointer as its comparator, much like tuplesort). I've hacked together a sloppy POC implementation in a hurry (basically, some code is shifted around) , which is attached - I felt that it would be useful to determine if the community feels that this is a worth-while undertaking in advance of a business trip that I'm leaving on tomorrow lasting until Friday, during which I will be mostly unavailable. The patch breaks the Postgres sorting executor node (at least when it quicksorts) for any type other than int4. I apologise for how rough the patch is, but the code itself isn't important right now - the idea is. I anticipate that the value state->datumType or something similar will be set in a near future revision, so that tuplesort_performsort will know which type-specific optimisation it can use for the type, while falling back on the existing generic qsort_arg + qsort_arg_comparator, and sorting won't actually be broken. I've been doing some preliminary testing using the dell store 2 sample database. I increase work_mem to '50MB', to ensure that a quicksort will be performed for sorting (otherwise, I'm using the postgresql.conf that initdb gave me). The query is: explain analyze select * from orderlines order by prod_id; Once the cache has been warmed, explain analyze very consistently reports a runtime of 123ms for this query on master/HEAD, which varies +/- 1 ms, with a few outliers of maybe +/- 2ms. However, when I apply this patch, that goes down to 107ms +/- 1ms at -O0. I think that that's a pretty good start. Funnily enough, the difference/advantage vanishes at -O2 (I'm guessing that the higher optimisation level of GCC 4.5 hyper-corrects away the inlining, but I don't have time to check that right now). I imagine the version that I actually submit for patch review will have a macro-based infrastructure for inlining the sorting of various built-in types, initially integers and floats. It will most likely have some other optimisations - I haven't even used a profiler yet. This performance patch differs from most in that it's difficult in principle to imagine a performance regression occurring. Thoughts? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services