Thread

  1. Re: Qual push down to table AM

    Robert Haas <robertmhaas@gmail.com> — 2025-12-10T15:41:19Z

    On Tue, Dec 9, 2025 at 6:08 PM Andres Freund <andres@anarazel.de> wrote:
    > On 2025-12-09 16:40:17 -0500, Robert Haas wrote:
    > > On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien@tachoires.me> wrote:
    > > Potentially, there could be a performance problem
    >
    > I think the big performance hazard with this is repeated deforming. The
    > scankey infrastructure deforms attributes one-by-one *and* it does not
    > "persist" the work of deforming for later accesses.  So if you e.g. have
    > something like
    >
    >   SELECT sum(col_29) FROM tbl WHERE col_30 = common_value;
    > or
    >   SELECT * FROM tbl WHERE col_30 = common_value;
    >
    > we'll now deform col_30 in isolation for the ScanKey evaluation and then we'll
    > deform columns 1-29 in the slot (because we always deform all the leading
    > columns), during projection.
    
    Hmm, this is a good point, and I agree that it's a huge challenge for
    this patch set. Repeated tuple deforming is REALLY expensive, which is
    why we've spent so much energy trying to use slots in an as many
    places as possible. I find it easy to believe that HeapKeyTest's loop
    over heap_getattr() is going to prohibitively painful and that this
    code will need to somehow also be slot-ified for this to be a viable
    project.
    
    > I don't really see this being viable without first tackling two nontrivial
    > projects:
    >
    > 1) Make slot deforming for expressions & projections selective, i.e. don't
    >    deform all the leading columns, but only ones that will eventually be
    >    needed
    > 2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
    >    and to make deforming of multiple columns sufficiently efficient.
    
    IOW, I agree that we probably need to do #2. I am not entirely sure
    about #1. I'm a little afraid that trying to skip over columns without
    deforming them will add a bunch of code complexity that doesn't really
    pay off. You have to do the bookkeeping to know what to skip, and then
    how much are you really gaining by skipping it? If you can skip over a
    bunch of fixed-width columns, that's cool, but it's probably fairly
    normal to have lots of varlena columns, and then I don't really see
    that you're gaining much here. You still have to iterate through the
    tuple, and not storing the pointer to the start of each column as you
    find it doesn't seem like it will save much. What's your reasoning
    behind thinking that #1 will be necessary?
    
    > > So, somewhat to my surprise, I think that v4-0001 might be basically
    > > fine. I wonder if anyone else sees a problem that I'm missing?
    >
    > I doubt this would be safe as-is: ISTM that if you release the page lock
    > between tuples, things like the number of items on the page can change. But we
    > store stuff like that in registers / on the stack, which could change while
    > the lock is not held.
    >
    > We could refetch the number items on the page for every loop iteration, but
    > that'd probably not be free. OTOH, it's probably nothing compared to the cost
    > of relocking the page...
    
    We still hold a pin, though, which I think means very little can
    change. More items can be added to the page, so we might want to
    refresh the number of items on the page at least when we think we're
    done, but I believe that any sort of more invasive page rearrangement
    would be precluded by the pin.
    
    I kind of wonder if it would be good to make a change along the lines
    of v4-0001 even if this patch set doesn't move forward overall, or
    will need a lot of slot-ification to do so. It seems weird to me that
    we're OK with calling out to arbitrary code with a buffer lock held,
    and even weirder that whether or not we do that depends on whether
    SO_ALLOW_PAGEMODE was set. I don't think a difference of this kind
    between pagemode behavior and non-pagemode behavior would survive
    review if someone proposed it today; the fact that it works the way it
    does is probably an artifact of this mechanism having been added
    twenty years ago when the project was in a very different place.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com