Thread

  1. Re: MergeAppend could consider sorting cheapest child path

    Alexander Pyhalov <a.pyhalov@postgrespro.ru> — 2025-04-25T09:16:29Z

    Andrei Lepikhov писал(а) 2025-04-24 16:01:
    > On 3/28/25 09:19, Alexander Pyhalov wrote:
    >> Andy Fan писал(а) 2024-10-17 03:33:
    >> I've updated patch. One more interesting case which we found - when 
    >> fractional path is selected, it still can be more expensive than 
    >> sorted cheapest total path (as we look only on indexes whith necessary 
    >> pathkeys, not on indexes which allow efficiently fetch data).
    >> So far couldn't find artificial example, but we've seen inadequate 
    >> index selection due to this issue - instead of using index suited for 
    >> filters in where, index, suitable for sorting was selected as one 
    >> having the cheapest fractional cost.
    > I think it is necessary to generalise the approach a little.
    > 
    > Each MergeAppend subpath candidate that fits pathkeys should be 
    > compared to the overall-optimal path + explicit Sort node. Let's label 
    > this two-node composition as base_path. There are three cases exist: 
    > startup-optimal, total-optimal and fractional-optimal.
    > In the startup case, base_path should use cheapest_startup_path, the 
    > total-optimal case - cheapest_total_path and for a fractional case, we 
    > may employ the get_cheapest_fractional_path routine to detect proper 
    > base_path.
    > 
    > It may provide a more effective plan either in full, fractional and 
    > partial scan cases:
    > 1. The Sort node may be pushed down to subpaths under a parallel or 
    > async Append.
    > 2. When a minor set of subpaths doesn't have a proper index, and it is 
    > profitable to sort them instead of switching to plain Append.
    > 
    > In general, analysing the regression tests changed by this code, I see 
    > that the cost model prefers a series of small sortings instead of a 
    > single massive one. May be it will show some profit for execution time.
    > 
    > I am not afraid of any palpable planning overhead here because we just 
    > do cheap cost estimation and comparison operations that don't need 
    > additional memory allocations. The caller is responsible for building a 
    > proper Sort node if this method is chosen as optimal.
    > 
    > In the attachment, see the patch written according to the idea. There 
    > are I introduced two new routines:
    > get_cheapest_path_for_pathkeys_ext
    > get_cheapest_fractional_path_for_pathkeys_ext
    
    Hi. I'm a bit confused that 
    get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting 
    cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in 
    STARTUP_COST case looks only on sorting cheapest_startup_path.
    Usually, sorted cheapest_total_path will be cheaper than sorted 
    fractional/startup path at least by startup cost (as after sorting it 
    includes total_cost of input path). But we ignore this case when 
    selecting cheapest_startup and cheapest_fractional subpaths. As result 
    selected cheapest_startup and cheapest_fractional can be not cheapest 
    for startup or selecting a fraction of rows.
    
    Consider the partition with the following access paths:
    
    1) cheapest_startup without required pathkeys:
       startup_cost: 0.42
       total_cost: 4004
    
    2) some index_path  with required pathkeys:
       startup_cost: 6.6
       total_cost: 2000
    
    3) cheapest_total_path:
       startup_cost: 0.42
       total_cost: 3.48
    
    Here, when selecting cheapest startup subpath we'll compare costs of 
    index path (2) and sorted cheapest_startup (1), but will ignore sorted 
    cheapest_total_path (3), despite the fact that it really can be the 
    cheapest startup path, providing required sorting order.
    
    -- 
    Best regards,
    Alexander Pyhalov,
    Postgres Professional