Re: Virtual generated columns

jian he <jian.universality@gmail.com>

From: jian he <jian.universality@gmail.com>
To: Dean Rasheed <dean.a.rasheed@gmail.com>
Cc: Peter Eisentraut <peter@eisentraut.org>, pgsql-hackers <pgsql-hackers@postgresql.org>
Date: 2024-08-29T09:01:00Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Expand virtual generated columns for ALTER COLUMN TYPE

  2. Eliminate code duplication in replace_rte_variables callbacks

  3. Expand virtual generated columns in the planner

  4. Virtual generated columns

  5. Additional tests for stored generated columns

  6. Improve generated_stored test

  7. Fix handling of CREATE DOMAIN with GENERATED constraint syntax

  8. Add pg_constraint rows for not-null constraints

  9. Put generated_stored test objects in a schema

  10. Rename regress test generated to generated_stored

  11. Small code simplification

  12. Remove useless code

  13. Remove useless initializations

  14. doc: Clarify that pg_attrdef also stores generation expressions

  15. Clean out column-level pg_init_privs entries when dropping tables.

  16. Re-implement the ereport() macro using __VA_ARGS__.

Attachments

On Wed, Aug 21, 2024 at 6:52 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Wed, 21 Aug 2024 at 08:00, Peter Eisentraut <peter@eisentraut.org> wrote:
> >
> > On 08.08.24 20:22, Dean Rasheed wrote:
> > > Looking at the rewriter changes, it occurred to me that it could
> > > perhaps be done more simply using ReplaceVarsFromTargetList() for each
> > > RTE with virtual generated columns. That function already has the
> > > required wholerow handling code, so there'd be less code duplication.
> >
> > Hmm, I don't quite see how ReplaceVarsFromTargetList() could be used
> > here.  It does have the wholerow logic that we need somehow, but other
> > than that it seems to target something different?
> >
>
> Well what I was thinking was that (in fireRIRrules()'s final loop over
> relations in the rtable), if the relation had any virtual generated
> columns, you'd build a targetlist containing a TLE for each one,
> containing the generated expression. Then you could just call
> ReplaceVarsFromTargetList() to replace any Vars in the query with the
> corresponding generated expressions. That takes care of descending
> into subqueries, adjusting varlevelsup, and expanding wholerow Vars
> that might refer to the generated expression.
>
> I also have half an eye on how this patch will interact with my patch
> to support RETURNING OLD/NEW values. If you use
> ReplaceVarsFromTargetList(), it should just do the right thing for
> RETURNING OLD/NEW generated expressions.
>
> > > I think it might be better to do this from within fireRIRrules(), just
> > > after RLS policies are applied, so it wouldn't need to worry about
> > > CTEs and sublink subqueries. That would also make the
> > > hasGeneratedVirtual flags unnecessary, since we'd already only be
> > > doing the extra work for tables with virtual generated columns. That
> > > would eliminate possible bugs caused by failing to set those flags.
> >
> > Yes, ideally, we'd piggy-back this into fireRIRrules().  One thing I'm
> > missing is that if you're descending into subqueries, there is no link
> > to the upper levels' range tables, which we need to lookup the
> > pg_attribute entries of column referencing Vars.  That's why there is
> > this whole custom walk with its own context data.  Maybe there is a way
> > to do this already that I missed?
> >
>
> That link to the upper levels' range tables wouldn't be needed because
> essentially using ReplaceVarsFromTargetList() flips the whole thing
> round: instead of traversing the tree looking for Var nodes that need
> to be replaced (possibly from upper query levels), you build a list of
> replacement expressions to be applied and apply them from the top,
> descending into subqueries as needed.
>
> Another argument for doing it that way round is to not add too many
> extra cycles to the processing of existing queries that don't
> reference generated expressions. ISTM that this patch is potentially
> adding quite a lot of additional overhead -- it looks like, for every
> Var in the tree, it's calling get_attgenerated(), which involves a
> syscache lookup to see if that column is a generated expression (which
> most won't be). Ideally, we should be trying to do the minimum amount
> of extra work in the common case where there are no generated
> expressions.
>
> Looking ahead, I can also imagine that one day we might want to
> support subqueries in generated expressions. That would require
> recursive processing of generated expressions in the generated
> expression's subquery, as well as applying RLS policies to the new
> relations pulled in, and checks to guard against infinite recursion.
> fireRIRrules() already has the infrastructure to support all of that,
> so that feels like a much more natural place to do this.
>

Is the attached something you are thinking of?
(mainly see changes of src/backend/rewrite/rewriteHandler.c)

i bloated rewriteHandler.c a lot, mainly because
expand_generated_columns_in_expr
not using ReplaceVarsFromTargetList, only expand_generated_columns_in_query do.



if we are using ReplaceVarsFromTargetList, then
expand_generated_columns_in_expr also needs to use ReplaceVarsFromTargetList?


I don't think we can call ReplaceVarsFromTargetList within
expand_generated_columns_in_expr.


if so, then the pattern would be like:
{
            Node       *tgqual;
            tgqual = (Node *) expand_generated_columns_in_expr(tgqual,
relinfo->ri_RelationDesc, context);
           ReplaceVarsFromTargetList
}

There are 6 of expand_generated_columns_in_expr called.