Thread

  1. Re: MergeAppend could consider sorting cheapest child path

    Alexander Pyhalov <a.pyhalov@postgrespro.ru> — 2025-04-29T17:25:48Z

    Andrei Lepikhov писал(а) 2025-04-29 16:52:
    > On 4/25/25 17:13, Andrei Lepikhov wrote:
    >> On 4/25/25 11:16, Alexander Pyhalov wrote:
    >>> Usually, sorted cheapest_total_path will be cheaper than sorted 
    >>> fractional/startup path at least by startup cost (as after sorting it 
    >>> includes total_cost of input path). But we ignore this case when 
    >>> selecting cheapest_startup and cheapest_fractional subpaths. As 
    >>> result selected cheapest_startup and cheapest_fractional can be not 
    >>> cheapest for startup or selecting a fraction of rows.
    >> I don't know what you mean by that. The cheapest_total_path is 
    >> considered when we chose optimal cheapest_total path. The same works 
    >> for the fractional path - get_cheapest_fractional_path gives us the 
    >> most optimal fractional path and probes cheapest_total_path too.
    >> As above, not sure about min-startup case for now. I can imagine 
    >> MergeAppend over sophisticated subquery: non-sorted includes highly 
    >> parameterised JOINs and the alternative (with pathkeys) includes 
    >> HashJoin, drastically increasing startup cost. It is only a theory, of 
    >> course. So, lets discover how min-startup works.
    > After a second thought I have caught your idea. I agree that for a 
    > fractional path it have no sense to choose any other path except a 
    > cheapest total one.
    > There are the modified patch in the attachment.
    > 
    > Also, to be more objective, I propose to use examples in argumentation 
    > - something like in attached test2.sql script.
    
    Hi.
    I've looked through new patch and found minor inconsistencies in 
    get_cheapest_path_for_pathkeys_ext() and 
    get_cheapest_fractional_path_for_pathkeys_ext().
    
    In get_cheapest_fractional_path_for_pathkeys_ext() we check that 
    base_path is not NULL
            path = get_cheapest_fractional_path_for_pathkeys(rel->pathlist, 
    pathkeys,
                                                                              
                                    required_outer, fraction);
    
            base_path = rel->cheapest_total_path;
    
            /* Stop here if the path doesn't satisfy necessary conditions */
            if (!base_path || !bms_is_subset(PATH_REQ_OUTER(base_path), 
    required_outer))
                    return path;
    
    But it seems, base_path can't be NULL (as add_paths_to_append_rel() is 
    called after set_rel_pathlist() for childrels).
    However,  path can. Can we do these two functions 
    get_cheapest_path_for_pathkeys_ext() and 
    get_cheapest_fractional_path_for_pathkeys_ext()
    more similar?
    
    Also we check base_path for required_outer and require_parallel_safe, 
    but if cheapest path for pathkeys is NULL, these checks are not 
    performed. Luckily, they seen to be no-op anyway due to 
    cheapest_total->param_info == NULL and function arguments being NULL 
    (required_outer) and false (require_parallel_safe). Should we do 
    something about this? Don't know, perhaps, remove these misleading 
    arguments?
    
    Now, if we return cheapest_total_path from 
    get_cheapest_fractional_path_for_pathkeys_ext() if cheapest paths for 
    pathkeys don't exist, do the following lines
    
    
                                     /*
                                      * If we found no path with matching 
    pathkeys, use the
                                      * cheapest total path instead.
                                      *
                                      * XXX We might consider partially 
    sorted paths too (with an
                                      * incremental sort on top). But we'd 
    have to build all the
                                      * incremental paths, do the costing 
    etc.
                                      */
                                     if (!cheapest_fractional)
                                             cheapest_fractional = 
    cheapest_total;
    
    become no-op? And we do return non-null path from  
    get_cheapest_fractional_path_for_pathkeys_ext(), as it seems we return 
    either cheapest_total_path or cheapest fractional path from 
    get_cheapest_fractional_path_for_pathkeys_ext().
    
    -- 
    Best regards,
    Alexander Pyhalov,
    Postgres Professional