Thread

  1. Re: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized)

    Richard Guo <guofenglinux@gmail.com> — 2025-12-12T09:28:43Z

    On Thu, Dec 11, 2025 at 11:32 AM Haowu Ge <gehaowu@bitmoe.com> wrote:
    > Based on the "Minimal Reproducible Example" principle,
    > I previously created a simplified version of the example,
    > which you successfully fixed. However, during actual testing,
    > I found that the behavior does not meet expectations.
    > Specifically, I observed inconsistent index behavior when using a view compared to querying the underlying table directly.
    
    This seems to be a separate issue.  The query against the view chooses
    different indexes than the query against the underlying table.  This
    isn't the index capability issue you initially reported, but rather a
    cost estimation issue.
    
    Upon closer look, the issue is that the planner fails to look through
    PHVs when looking up statistical data.  Since examine_variable()
    relies on strict structural matching, the presence of PHVs prevents
    the planner from matching the expression to the table columns.  As a
    result, the view-based query falls back to default selectivity
    estimates, leading to a poor plan choice.
    
    To fix, we can strip all PHVs when looking up statistical data.  This
    is safe during estimation because PHVs are transparent for the purpose
    of statistics lookup; they do not alter the value distribution of the
    underlying expression.
    
    Regarding implementation, I considered reusing the stripper for index
    operands in 0001 but decided to use a dedicated function.  While the
    traversal structure is similar, the logic is fundamentally different.
    Stripping PHVs for index matching requires strict checks on
    phnullingrels and phexpr to ensure execution correctness.  For
    statistics lookup, however, we can be permissive and unconditionally
    strip all PHVs.  It could be argued that we can combine them into a
    single function using flags, but that seems to unnecessarily entangle
    two unrelated logics.
    
    There is one plan change in the regression tests with 0002, but that
    is precisely because the row count estimates become more accurate with
    this patch.  For instance:
    
    on master:
    ->  Seq Scan on parttbl1 parttbl  (cost=0.00..41.88 rows=13 width=8)
                              (actual time=0.034..0.036 rows=1.00 loops=1)
    
    
    on patched:
    ->  Seq Scan on parttbl1 parttbl  (cost=0.00..41.88 rows=1 width=8)
                              (actual time=0.062..0.064 rows=1.00 loops=1)
    
    - Richard