Re: proposal: schema variables

Dmitry Dolgov <9erthalion6@gmail.com>

From: Dmitry Dolgov <9erthalion6@gmail.com>

To: Pavel Stehule <pavel.stehule@gmail.com>

Cc: Laurenz Albe <laurenz.albe@cybertec.at>, Erik Rijkers <er@xs4all.nl>, Michael Paquier <michael@paquier.xyz>, Amit Kapila <amit.kapila16@gmail.com>, DUVAL REMI <REMI.DUVAL@cheops.fr>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>

Date: 2024-11-16T14:27:30Z

Lists: pgsql-hackers, pgsql-performance

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Move WAL sequence code into its own file
- a87987cafca6 19 (unreleased) cited
Add ExplainState argument to pg_plan_query() and planner().
- c83ac02ec730 19 (unreleased) cited
Don't include access/htup_details.h in executor/tuptable.h
- 1a8b5b11e48a 19 (unreleased) cited
Refactor to avoid code duplication in transformPLAssignStmt.
- b0fb2c6aa5a4 19 (unreleased) cited
Avoid including commands/dbcommands.h in so many places
- 325fc0ab14d1 19 (unreleased) cited
Restrict psql meta-commands in plain-text dumps.
- 71ea0d679543 19 (unreleased) cited
Split func.sgml into more manageable pieces
- 4e23c9ef65ac 19 (unreleased) cited
Fix squashing algorithm for query texts
- 0f65f3eec478 18.0 cited
EXPLAIN: Always use two fractional digits for row counts.
- 95dbd827f2ed 18.0 cited
Preliminary refactoring of plpgsql expression construction.
- a654af21ae52 18.0 cited
plpgsql: pure parser and reentrant scanner
- 7b27f5fd36cb 18.0 cited
Add some sanity checks in executor for query ID reporting
- 24f520594809 18.0 cited
Fix misleading error message context
- 4af123ad45bd 18.0 cited
Add macros for looping through a List without a ListCell.
- 14dd0f27d7cd 17.0 cited

> On Sat, Nov 16, 2024 at 07:10:31AM GMT, Pavel Stehule wrote:

Sorry, got distracted. Let me try to answer step by step.

> > As far as I recall, last time this topic was discussed in hackers, two
> > options were proposed: the one with VARIABLE(name), what you mention
> > here; and another one with adding variables to the FROM clause. The
> > VARIABLE(...) syntax didn't get much negative feedback, so I guess why
> > not -- if you find it fitting, it would be interesting to see the
> > implementation.
> >
> > I'm afraid it should not be just an alternative syntax, but the only one
> > allowed, because otherwise I don't see how scenarious like "drop a
> > column with the same name" could be avoided. As in the previous thread:
> >
> >     -- we've got a variable b at the same time
> >     SELECT a, b FROM table1;
> >
>
> I am sorry, but I am in very strong opposition  against this idea. Nobody
> did reply to my questions, that can change my opinion.

From your reply it's not quite clear, are you opposed to have a mandatory
VARIABLE syntax, or having variables in the FROM clause? My main proposal was
about the former, but the points that are following seems to talk about the
latter. I think it's fine to reject the idea about the FROM clause, as long as
you got some reasonable arguments.

> > Then dropping the column b, but everything still works beause the
> > variable b got silently picked up. But if it would be required to say
> > VARIABLE(b), then all fine.
>
> but same risk you have any time in plpgsql - all time. I don't remember any
> bug report related to this issue.

Which exactly scenario about plpgsql do you have in mind? Just have tried to
declare a variable inside a plpgsql function with the same name as a table
column, and got an error about an ambiguous reference.

> Theoretically, variables can have the same names as tables. The table
> overshadows the variable, so it can work. But when somebody drops the
> variable, then the query still can work. So requirement of usage variable
> in FROM clause protects us just against drop column, but not against
> dropping table. In Postgres the dropping table is possibly risky due
> search_path (that introduces shadowing concept) without introduction
> variables. There is a possibility of this issue, but how common is this
> issue?

This sounds to me like an argument against allowing name clashing between
variables and tables. It makes even more sense, since session variables are in
many ways similar to tables.

> I think this issue can be partially similar to creating two equally named
> tables in different schemas (both schemas are in search path). When you
> drop one table, the query will work, but the result is different. It is the
> same issue. The SQL has no concept of shadowing and on the base line it is
> not necessary.

The point is that most of users are aware about schemas and search path
dangers. But to me such a precedent is not an excuse to introduce a new feature
with similar traps, which folks would have to learn by making mistakes. Judging
from the feedback to this patch over time, I've got an impression that lots of
people are also not fans of that.

> > Then dropping the column b, but everything still works beause the
> > variable b got silently picked up. But if it would be required to say
> > VARIABLE(b), then all fine.
> >
>
> In this scenario you will get  a warning related to variable shadowing
> (before you drop a column).
>
> [...]
>
> What do you think about the following design?  I can implement a warning
> "variable_usage_guard" when the variable is accessed without using
> VARIABLE() syntax. We can discuss later if this warning can be enabled by
> default or not. There I am open to any variant.

I don't follow what are you winning by that? In the context of problem above
(i.e. dropping a column), such a warning is functionally equivalend to a
warning about shadowing.

The problem is that it doesn't sound very appealing to have a feature, which
requires some extra efforts to be used in a right way (e.g. put everything into
a special vars schema, or keep an eye on logs). Most certainly there are such
bits in PostgreSQL today, with all the best practices, crowd wisdom, etc. But
the bar for new features in this sense is much higher, you can see it from the
feedback to this patch. Thus I believe it makes sense, from purely tactical
reasons, to not try to convince half of the community to lower the bar, but
instead try to modify the feature to make it more acceptable, even if some
parts you might not like.

Btw, could you repeat, what was exactly your argument against mandatory
VARIABLE() syntax? It's somehow scattered across many replies, would be great
to summarize it in a couple of phrases.

> Shadowing by self is not an issue, probably, but it is a signal of code
> quality problems.

Agree, but I'm afraid code quality of an average application using PostgreSQL
is quite low, so here we are.

As a side note, I've recently caught myself thinking "it would be cool to have
session variables here". The use case was preparing a policy for RLS, based on
some session-level data set by an application. This session-level data is of a
composite data type, so simple current_setting is cumbersome to use, and a
temporary table will be dropped at the end, taking the policy with it due to
the recorded dependency between them. Thus a session variable of some composite
type sounds like a good fit.