Thread

Re: Qual push down to table AM

amit <amitlangote09@gmail.com> — 2025-12-15T12:56:12Z
On Thu, Dec 11, 2025 at 12:41 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Dec 9, 2025 at 6:08 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2025-12-09 16:40:17 -0500, Robert Haas wrote:
> > > On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien@tachoires.me> wrote:
> > > Potentially, there could be a performance problem
> >
> > I think the big performance hazard with this is repeated deforming. The
> > scankey infrastructure deforms attributes one-by-one *and* it does not
> > "persist" the work of deforming for later accesses.  So if you e.g. have
> > something like
> >
> >   SELECT sum(col_29) FROM tbl WHERE col_30 = common_value;
> > or
> >   SELECT * FROM tbl WHERE col_30 = common_value;
> >
> > we'll now deform col_30 in isolation for the ScanKey evaluation and then we'll
> > deform columns 1-29 in the slot (because we always deform all the leading
> > columns), during projection.
>
> Hmm, this is a good point, and I agree that it's a huge challenge for
> this patch set. Repeated tuple deforming is REALLY expensive, which is
> why we've spent so much energy trying to use slots in an as many
> places as possible. I find it easy to believe that HeapKeyTest's loop
> over heap_getattr() is going to prohibitively painful and that this
> code will need to somehow also be slot-ified for this to be a viable
> project.
>
> > I don't really see this being viable without first tackling two nontrivial
> > projects:
> >
> > 1) Make slot deforming for expressions & projections selective, i.e. don't
> >    deform all the leading columns, but only ones that will eventually be
> >    needed
> > 2) Perform ScanKey evaluation in slot form, to be able to cache the deforming
> >    and to make deforming of multiple columns sufficiently efficient.
>
> IOW, I agree that we probably need to do #2. I am not entirely sure
> about #1.

I'm also curious to understand why Andres sees #1 as a prerequisite
for qual pushdown.

> I'm a little afraid that trying to skip over columns without
> deforming them will add a bunch of code complexity that doesn't really
> pay off.

I think it might be worthwhile. I have a PoC [1] I worked on (at
Andres's suggestion) that showed ~2x improvement on simple aggregation
queries over wide tables (all pages buffered) by tracking the minimum
needed attribute and using cached offsets stored in TupleDesc to skip
fixed-not-null prefixes. I'm thinking of reviving it with proper
tracking of which attributes are needed and deformed (bitmapset or
flag array in TupleTableSlot).

> > > So, somewhat to my surprise, I think that v4-0001 might be basically
> > > fine. I wonder if anyone else sees a problem that I'm missing?
> >
> > I doubt this would be safe as-is: ISTM that if you release the page lock
> > between tuples, things like the number of items on the page can change. But we
> > store stuff like that in registers / on the stack, which could change while
> > the lock is not held.
> >
> > We could refetch the number items on the page for every loop iteration, but
> > that'd probably not be free. OTOH, it's probably nothing compared to the cost
> > of relocking the page...
>
> We still hold a pin, though, which I think means very little can
> change. More items can be added to the page, so we might want to
> refresh the number of items on the page at least when we think we're
> done, but I believe that any sort of more invasive page rearrangement
> would be precluded by the pin.
>
> I kind of wonder if it would be good to make a change along the lines
> of v4-0001 even if this patch set doesn't move forward overall, or
> will need a lot of slot-ification to do so. It seems weird to me that
> we're OK with calling out to arbitrary code with a buffer lock held,
> and even weirder that whether or not we do that depends on whether
> SO_ALLOW_PAGEMODE was set. I don't think a difference of this kind
> between pagemode behavior and non-pagemode behavior would survive
> review if someone proposed it today; the fact that it works the way it
> does is probably an artifact of this mechanism having been added
> twenty years ago when the project was in a very different place.

One maybe crazy thought: what about only enabling qual pushdown when
pagemode is used, since it already processes all tuples on a page in
one locked phase? That raises the question of whether there's a class
of quals simple enough (built-in ops?) that evaluating them alongside
visibility checking would be acceptable, with lock held that is -- but
it would avoid the lock churn and racy loop termination issues with
v4-0001.

--
Thanks, Amit Langote

[1] https://www.postgresql.org/message-id/CA%2BHiwqHXDY6TxegR2Cr_4sRa_LY1QJnoL8XRmOqdfrx21pZ6cw%40mail.gmail.com