Thread
-
Re: Add a greedy join search algorithm to handle large join problems
Tomas Vondra <tomas@vondra.me> — 2025-12-11T17:30:46Z
Hi, Here's a more complete set of results from a TPC-DS run. See the run-queries-2.sh script for more details. There are also .sql files with DDL to create the database, etc. It does not include the parts to generate the data etc. (you'll need to the generator from TPC site). The attached CSV has results for scales 1 and 10, with 0 and 4 parallel workers. It runs three configurations: - master (geqo=off, threshold=12) - master-geqo (goo=off, threshold=2) - master-goo (goo=on, threshold=2) There's a couple more fields, e.g. whether it's cold/cached run, etc. A very simple summary of the results is the total duration of the run, for all 99 queries combined: scale workers caching master master-geqo master-goo =================================================================== 1 0 cold 816 399 1124 warm 784 369 1097 4 cold 797 384 1085 warm 774 366 1069 ------------------------------------------------------------------- 10 0 cold 2760 2653 2340 warm 2580 2470 2177 4 cold 2563 2423 1969 warm 2439 2325 1859 This is interesting, and also a bit funny. The funny part is that geqo seems to do better than master - on scale 1 it's pretty clear, on scale 10 the difference is much smaller. The interesting part is that "goo" is doing worse than master (or geqo) on scale 1, and better on scale 10. I wonder how would it do on larger scales, but I don't have such results. There's a PDF with per-query results too. This may be a little bit misleading because the statement timeout was set to 300s, and there's a couple queries that did not complete before this timeout. Maybe it'd be better to not include these queries. I haven't tried, though. It might be interesting to look at some of the queries that got worse, and check why. Maybe that'd help you with picking the heuristics? FWIW I still think no heuristics can be perfect, so getting slower plans for some queries should not be treated as "hard failure". The other thing is the quality of plans depends on GUCs like random_page_cost, and I kept them at default values. Anyway, I hope this is helpful input. regards -- Tomas Vondra