Thread

  1. Re: Qual push down to table AM

    amit <amitlangote09@gmail.com> — 2025-12-15T12:56:12Z

    On Thu, Dec 11, 2025 at 12:41 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > On Tue, Dec 9, 2025 at 6:08 PM Andres Freund <andres@anarazel.de> wrote:
    > > On 2025-12-09 16:40:17 -0500, Robert Haas wrote:
    > > > On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien@tachoires.me> wrote:
    > > > Potentially, there could be a performance problem
    > >
    > > I think the big performance hazard with this is repeated deforming. The
    > > scankey infrastructure deforms attributes one-by-one *and* it does not
    > > "persist" the work of deforming for later accesses.  So if you e.g. have
    > > something like
    > >
    > >   SELECT sum(col_29) FROM tbl WHERE col_30 = common_value;
    > > or
    > >   SELECT * FROM tbl WHERE col_30 = common_value;
    > >
    > > we'll now deform col_30 in isolation for the ScanKey evaluation and then we'll
    > > deform columns 1-29 in the slot (because we always deform all the leading
    > > columns), during projection.
    >
    > Hmm, this is a good point, and I agree that it's a huge challenge for
    > this patch set. Repeated tuple deforming is REALLY expensive, which is
    > why we've spent so much energy trying to use slots in an as many
    > places as possible. I find it easy to believe that HeapKeyTest's loop
    > over heap_getattr() is going to prohibitively painful and that this
    > code will need to somehow also be slot-ified for this to be a viable
    > project.
    >
    > > I don't really see this being viable without first tackling two nontrivial
    > > projects:
    > >
    > > 1) Make slot deforming for expressions & projections selective, i.e. don't
    > >    deform all the leading columns, but only ones that will eventually be
    > >    needed
    > > 2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
    > >    and to make deforming of multiple columns sufficiently efficient.
    >
    > IOW, I agree that we probably need to do #2. I am not entirely sure
    > about #1.
    
    I'm also curious to understand why Andres sees #1 as a prerequisite
    for qual pushdown.
    
    > I'm a little afraid that trying to skip over columns without
    > deforming them will add a bunch of code complexity that doesn't really
    > pay off.
    
    I think it might be worthwhile. I have a PoC [1] I worked on (at
    Andres's suggestion) that showed ~2x improvement on simple aggregation
    queries over wide tables (all pages buffered) by tracking the minimum
    needed attribute and using cached offsets stored in TupleDesc to skip
    fixed-not-null prefixes. I'm thinking of reviving it with proper
    tracking of which attributes are needed and deformed (bitmapset or
    flag array in TupleTableSlot).
    
    > > > So, somewhat to my surprise, I think that v4-0001 might be basically
    > > > fine. I wonder if anyone else sees a problem that I'm missing?
    > >
    > > I doubt this would be safe as-is: ISTM that if you release the page lock
    > > between tuples, things like the number of items on the page can change. But we
    > > store stuff like that in registers / on the stack, which could change while
    > > the lock is not held.
    > >
    > > We could refetch the number items on the page for every loop iteration, but
    > > that'd probably not be free. OTOH, it's probably nothing compared to the cost
    > > of relocking the page...
    >
    > We still hold a pin, though, which I think means very little can
    > change. More items can be added to the page, so we might want to
    > refresh the number of items on the page at least when we think we're
    > done, but I believe that any sort of more invasive page rearrangement
    > would be precluded by the pin.
    >
    > I kind of wonder if it would be good to make a change along the lines
    > of v4-0001 even if this patch set doesn't move forward overall, or
    > will need a lot of slot-ification to do so. It seems weird to me that
    > we're OK with calling out to arbitrary code with a buffer lock held,
    > and even weirder that whether or not we do that depends on whether
    > SO_ALLOW_PAGEMODE was set. I don't think a difference of this kind
    > between pagemode behavior and non-pagemode behavior would survive
    > review if someone proposed it today; the fact that it works the way it
    > does is probably an artifact of this mechanism having been added
    > twenty years ago when the project was in a very different place.
    
    One maybe crazy thought: what about only enabling qual pushdown when
    pagemode is used, since it already processes all tuples on a page in
    one locked phase? That raises the question of whether there's a class
    of quals simple enough (built-in ops?) that evaluating them alongside
    visibility checking would be acceptable, with lock held that is -- but
    it would avoid the lock churn and racy loop termination issues with
    v4-0001.
    
    --
    Thanks, Amit Langote
    
    [1] https://www.postgresql.org/message-id/CA%2BHiwqHXDY6TxegR2Cr_4sRa_LY1QJnoL8XRmOqdfrx21pZ6cw%40mail.gmail.com