Thread

Re: MergeAppend could consider sorting cheapest child path

Alexander Pyhalov <a.pyhalov@postgrespro.ru> — 2025-04-25T09:16:29Z
Andrei Lepikhov писал(а) 2025-04-24 16:01:
> On 3/28/25 09:19, Alexander Pyhalov wrote:
>> Andy Fan писал(а) 2024-10-17 03:33:
>> I've updated patch. One more interesting case which we found - when 
>> fractional path is selected, it still can be more expensive than 
>> sorted cheapest total path (as we look only on indexes whith necessary 
>> pathkeys, not on indexes which allow efficiently fetch data).
>> So far couldn't find artificial example, but we've seen inadequate 
>> index selection due to this issue - instead of using index suited for 
>> filters in where, index, suitable for sorting was selected as one 
>> having the cheapest fractional cost.
> I think it is necessary to generalise the approach a little.
> 
> Each MergeAppend subpath candidate that fits pathkeys should be 
> compared to the overall-optimal path + explicit Sort node. Let's label 
> this two-node composition as base_path. There are three cases exist: 
> startup-optimal, total-optimal and fractional-optimal.
> In the startup case, base_path should use cheapest_startup_path, the 
> total-optimal case - cheapest_total_path and for a fractional case, we 
> may employ the get_cheapest_fractional_path routine to detect proper 
> base_path.
> 
> It may provide a more effective plan either in full, fractional and 
> partial scan cases:
> 1. The Sort node may be pushed down to subpaths under a parallel or 
> async Append.
> 2. When a minor set of subpaths doesn't have a proper index, and it is 
> profitable to sort them instead of switching to plain Append.
> 
> In general, analysing the regression tests changed by this code, I see 
> that the cost model prefers a series of small sortings instead of a 
> single massive one. May be it will show some profit for execution time.
> 
> I am not afraid of any palpable planning overhead here because we just 
> do cheap cost estimation and comparison operations that don't need 
> additional memory allocations. The caller is responsible for building a 
> proper Sort node if this method is chosen as optimal.
> 
> In the attachment, see the patch written according to the idea. There 
> are I introduced two new routines:
> get_cheapest_path_for_pathkeys_ext
> get_cheapest_fractional_path_for_pathkeys_ext

Hi. I'm a bit confused that 
get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting 
cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in 
STARTUP_COST case looks only on sorting cheapest_startup_path.
Usually, sorted cheapest_total_path will be cheaper than sorted 
fractional/startup path at least by startup cost (as after sorting it 
includes total_cost of input path). But we ignore this case when 
selecting cheapest_startup and cheapest_fractional subpaths. As result 
selected cheapest_startup and cheapest_fractional can be not cheapest 
for startup or selecting a fraction of rows.

Consider the partition with the following access paths:

1) cheapest_startup without required pathkeys:
   startup_cost: 0.42
   total_cost: 4004

2) some index_path  with required pathkeys:
   startup_cost: 6.6
   total_cost: 2000

3) cheapest_total_path:
   startup_cost: 0.42
   total_cost: 3.48

Here, when selecting cheapest startup subpath we'll compare costs of 
index path (2) and sorted cheapest_startup (1), but will ignore sorted 
cheapest_total_path (3), despite the fact that it really can be the 
cheapest startup path, providing required sorting order.

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional