Re: proposal: schema variables

Tomas Vondra <tomas.vondra@2ndquadrant.com>

From: Tomas Vondra <tomas.vondra@2ndquadrant.com>

To: Pavel Stehule <pavel.stehule@gmail.com>

Cc: Philippe BEAUDOIN <phb07@apra.asso.fr>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>

Date: 2020-01-21T23:41:41Z

Lists: pgsql-hackers, pgsql-performance

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Move WAL sequence code into its own file
- a87987cafca6 19 (unreleased) cited
Add ExplainState argument to pg_plan_query() and planner().
- c83ac02ec730 19 (unreleased) cited
Don't include access/htup_details.h in executor/tuptable.h
- 1a8b5b11e48a 19 (unreleased) cited
Refactor to avoid code duplication in transformPLAssignStmt.
- b0fb2c6aa5a4 19 (unreleased) cited
Avoid including commands/dbcommands.h in so many places
- 325fc0ab14d1 19 (unreleased) cited
Restrict psql meta-commands in plain-text dumps.
- 71ea0d679543 19 (unreleased) cited
Split func.sgml into more manageable pieces
- 4e23c9ef65ac 19 (unreleased) cited
Fix squashing algorithm for query texts
- 0f65f3eec478 18.0 cited
EXPLAIN: Always use two fractional digits for row counts.
- 95dbd827f2ed 18.0 cited
Preliminary refactoring of plpgsql expression construction.
- a654af21ae52 18.0 cited
plpgsql: pure parser and reentrant scanner
- 7b27f5fd36cb 18.0 cited
Add some sanity checks in executor for query ID reporting
- 24f520594809 18.0 cited
Fix misleading error message context
- 4af123ad45bd 18.0 cited
Add macros for looping through a List without a ListCell.
- 14dd0f27d7cd 17.0 cited

Hi,

I did a quick review of this patch today, so let me share a couple of
comments.

Firstly, the patch is pretty large - it has ~270kB. Not the largest
patch ever, but large. I think breaking the patch into smaller pieces
would significantly improve the chnce of it getting committed.

Is it possible to break the patch into smaller pieces that could be
applied incrementally? For example, we could move the "transactional"
behavior into a separate patch, but it's not clear to me how much code
would that actually move to that second patch. Any other parts that
could be moved to a separate patch?

I see one of the main contention points was a rather long discussion
about transactional vs. non-transactional behavior. I agree with Pavel's
position that the non-transactional behavior should be the default,
simply because it's better aligned with what the other databases are
doing (and supporting migrations seems like one of the main use cases
for this feature).

I do understand it may not be suitable for some other use cases,
mentioned by Fabien, but IMHO it's fine to require explicit
specification of transactional behavior. Well, we can't have both as
default, and I don't think there's an obvious reason why it should be
the other way around.

Now, a bunch of comments about the code (some of that nitpicking):


1) This hunk in doc/src/sgml/catalogs.sgml still talks about "table
creation" instead of schema creation:

      <row>
       <entry><structfield>vartypmod</structfield></entry>
       <entry><type>int4</type></entry>
       <entry></entry>
       <entry>
        <structfield>vartypmod</structfield> records type-specific data
        supplied at table creation time (for example, the maximum
        length of a <type>varchar</type> column).  It is passed to
        type-specific input functions and length coercion functions.
        The value will generally be -1 for types that do not need <structfield>vartypmod</structfield>.
       </entry>
      </row>


2) This hunk in doc/src/sgml/ref/alter_default_privileges.sgml uses
"role_name" instead of "variable_name"

     GRANT { READ | WRITE | ALL [ PRIVILEGES ] }
     ON VARIABLES
     TO { [ GROUP ] <replaceable class="parameter">role_name</replaceable> | PUBLIC } [, ...] [ WITH GRANT OPTION ]


3) I find the syntax in create_variable.sgml a bit too over-complicated:

<synopsis>
CREATE [ { TEMPORARY | TEMP } ] [ { TRANSACTIONAL | TRANSACTION } ] VARIABLE [ IF NOT EXISTS ] <replaceable class="parameter">name</replaceable> [ AS ] <replaceable class="parameter">data_type</replaceable> ] [ COLLATE <replaceable class="parameter">collation</replaceable> ]
     [ NOT NULL ] [ DEFAULT <replaceable class="parameter">default_expr</replaceable> ] [ { ON COMMIT DROP | ON { TRANSACTIONAL | TRANSACTION } END RESET } ]
</synopsis>

Do we really need both TRANSACTION and TRANSACTIONAL? Why not just one
that we already have in the grammar (i.e. TRANSACTION)?


4) I think we should rename schemavariable.h to schema_variable.h.


5) objectaddress.c has extra line after 'break;' in one switch.


6) The comment is wrong:

     /*
      * Find the ObjectAddress for a type or domain
      */
     static ObjectAddress
     get_object_address_variable(List *object, bool missing_ok)


7) I think the code/comments are really inconsistent in talking about
"variables" and "schema variables". For example in objectaddress.c we do
these two things:

         case OCLASS_VARIABLE:
             appendStringInfoString(&buffer, "schema variable");
             break;

vs.

         case DEFACLOBJ_VARIABLE:
             appendStringInfoString(&buffer,
                                    " on variables");
             break;

That's going to be confusing for people.


8) I'm rather confused by CMD_PLAN_UTILITY, which seems to be defined
merely to support LET. I'm not sure why that's necessary (Why wouldn't
CMD_UTILITY be sufficient?).

Having to add conditions checking for CMD_PLAN_UTILITY to various places
in planner.c is rather annoying, and I wonder how likely it's this will
unnecessarily break external code in extensions etc.


9) This comment in planner.c seems obsolete (not updated to reflect
addition of the CMD_PLAN_UTILITY check):

   /*
    * If this is an INSERT/UPDATE/DELETE, and we're not being called from
    * inheritance_planner, add the ModifyTable node.
    */
   if (parse->commandType != CMD_SELECT && parse->commandType != CMD_PLAN_UTILITY && !inheritance_update)


10) I kinda wonder what happens when a function is used in a WHERE
condition, but it depends on a variable and alsu mutates it on each
call ...


11) I think Query->hasSchemaVariable (in analyze.c) should be renamed to
hasSchemaVariables (which reflects the other fields referring to things
like window functions etc.)


12) I find it rather suspicious that we make decisions in utility.c
solely based on commandType (whether it's CMD_UTILITY or not). IMO
it's pretty strange/ugly that T_LetStmt can be both CMD_UTILITY and
CMD_PLAN_UTILITY:

     case T_LetStmt:
         {
             if (pstmt->commandType == CMD_UTILITY)
                 doLetStmtReset(pstmt);
             else
             {
                 Assert(pstmt->commandType == CMD_PLAN_UTILITY);
                 doLetStmtEval(pstmt, params, queryEnv, queryString);
             }

             if (completionTag)
                 strcpy(completionTag, "LET");
         }
         break;


13) Not sure why we moved DO_TABLE in addBoundaryDependencies
(pg_dump.c), seems unnecessary:

      case DO_CONVERSION:
-    case DO_TABLE:
+    case DO_VARIABLE:
      case DO_ATTRDEF:
+    case DO_TABLE:
      case DO_PROCLANG:


14) namespace.c defines VariableIsVisible twice:

     extern bool VariableIsVisible(Oid relid);
     ...
     extern bool VariableIsVisible(Oid varid);


15) I'd say lookup_variable and identify_variable should use camelcase
just like the other functions in the same file.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services