Re: Make relation_openrv atomic wrt DDL

Noah Misch <noah@2ndquadrant.com>

From: Noah Misch <noah@2ndQuadrant.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Greg Stark <stark@mit.edu>, Simon Riggs <simon@2ndquadrant.com>, pgsql-hackers@postgresql.org, heikki.linnakangas@enterprisedb.com
Date: 2011-07-07T19:54:37Z
Lists: pgsql-hackers
On Thu, Jul 07, 2011 at 11:43:30AM -0400, Robert Haas wrote:
> On Wed, Jul 6, 2011 at 10:44 PM, Noah Misch <noah@2ndquadrant.com> wrote:
> > Attached.  I made the counter 64 bits wide, handled the nothing-found case per
> > your idea, and improved a few comments cosmetically.  I have not attempted to
> > improve the search_path interposition case.  We can recommend the workaround
> > above, and doing better looks like an excursion much larger than the one
> > represented by this patch.
> 
> I looked at this some more and started to get uncomfortable with the
> whole idea of having RangeVarLockRelid() be a wrapper around
> RangeVarGetRelid().  This hazard exists everywhere the latter function
> gets called, not just in relation_open().  So it doesn't seem right to
> fix the problem only in those places.

Yes; I wished to focus on the major case for this round, then address the
other callers later.  We can do it this way, though.

It does make long-term sense to expose only the lock-taking form, making
otherwise-unaffected callers say NoLock explicitly.

> So I went through and incorporated the logic proposed for
> RangeVarLockRelid() into RangeVarGetRelid() itself, and then went
> through and examined all the callers of RangeVarGetRelid().  There are
> some, such as has_table_privilege(), where it's really impractical to
> take any lock, first because we might have no privileges at all on
> that table and second because that could easily lead to a massive
> amount of locking for no particular good reason.  I believe Tom
> suggested that the right fix for these functions is to have them
> index-scan the system catalogs using the caller's MVCC snapshot, which
> would be right at least for pg_dump.  And there are other callers that
> cannot acquire the lock as part of RangeVarGetRelid() for a variety of
> other reasons.  However, having said that, there do appear to be a
> number of cases that are can be fixed fairly easily.
> 
> So here's a (heavily) updated patch that tries to do that, along with
> adding comments to the places where things still need more fixing.  In
> addition to the problems corrected by your last version, this fixes
> LOCK TABLE, ALTER SEQUENCE, ALTER TABLE .. RENAME, the whole-table
> variant of REINDEX, CREATE CONSTRAINT TRIGGER (which is flat-out wrong
> as it stands, since it acquires *no lock at all* on the table
> specified in the FROM clause, never mind the question of doing so
> atomically), CREATE RULE, and (partially) DROP TRIGGER and DROP RULE.

Looks basically sound; see a few code comments below.

> Regardless of exactly how we decide to proceed here, it strikes me
> that there is a heck of a lot more work that could stand to be done in
> this area...  :-(

Yes.  DDL-DDL concurrency is a much smaller practical concern, but it is a
quality-of-implementation hole.

> --- a/src/backend/catalog/namespace.c
> +++ b/src/backend/catalog/namespace.c

> @@ -238,37 +246,121 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
>  	}
>  
>  	/*
> -	 * Some non-default relpersistence value may have been specified.  The
> -	 * parser never generates such a RangeVar in simple DML, but it can happen
> -	 * in contexts such as "CREATE TEMP TABLE foo (f1 int PRIMARY KEY)".  Such
> -	 * a command will generate an added CREATE INDEX operation, which must be
> -	 * careful to find the temp table, even when pg_temp is not first in the
> -	 * search path.
> +	 * If lockmode = NoLock, the caller is assumed to already hold some sort
> +	 * of heavyweight lock on the target relation.  Otherwise, we're preceding
> +	 * here with no lock at all, which means that any answers we get must be
> +	 * viewed with a certain degree of suspicion.  In particular, any time we
> +	 * AcceptInvalidationMessages(), the anwer might change.  We handle that
> +	 * case by retrying the operation until either (1) no more invalidation
> +	 * messages show up or (2) the answer doesn't change.

The third sentence is true for all lock levels.  The fourth sentence is true
for lock levels _except_ NoLock.

> +		/*
> +		 * If no lock requested, we assume the caller knows what they're
> +		 * doing.  They should have already acquired a heavyweight lock on
> +		 * this relation earlier in the processing of this same statement,
> +		 * so it wouldn't be appropriate to AcceptInvalidationMessages()
> +		 * here, as that might pull the rug out from under them.
> +		 */

What sort of rug-pullout do you have in mind?  Also, I don't think it matters
if the caller acquired the lock during this _statement_.  It just needs to
hold a lock, somehow.

> --- a/src/backend/commands/lockcmds.c
> +++ b/src/backend/commands/lockcmds.c

> @@ -69,67 +70,10 @@ LockTableCommand(LockStmt *lockstmt)
>   * "rv" is NULL and we should just silently ignore any dropped child rel.

This comment refers to a now-removed argument.

>   */
>  static void
> -LockTableRecurse(Oid reloid, RangeVar *rv,
> -				 LOCKMODE lockmode, bool nowait, bool recurse)
> +LockTableRecurse(Relation rel, LOCKMODE lockmode, bool nowait, bool recurse)

> --- a/src/backend/commands/tablecmds.c
> +++ b/src/backend/commands/tablecmds.c
> @@ -764,8 +764,15 @@ RemoveRelations(DropStmt *drop)
>  		 */
>  		AcceptInvalidationMessages();
>  
> -		/* Look up the appropriate relation using namespace search */
> -		relOid = RangeVarGetRelid(rel, true);
> +		/*
> +		 * Look up the appropriate relation using namespace search.
> +		 *
> +		 * XXX: Doing this without a lock is unsafe in the presence of
> +		 * concurrent DDL, but acquiring a lock here might violate the rule
> +		 * that a table must be locked before its corresponding index.
> +		 * So, for now, we ignore the hazzard.

Spelling.

> --- a/src/backend/rewrite/rewriteDefine.c
> +++ b/src/backend/rewrite/rewriteDefine.c
> @@ -196,11 +196,15 @@ DefineRule(RuleStmt *stmt, const char *queryString)
>  	Node	   *whereClause;
>  	Oid			relId;
>  
> -	/* Parse analysis ... */
> +	/* Parse analysis. */
>  	transformRuleStmt(stmt, queryString, &actions, &whereClause);
>  
> -	/* ... find the relation ... */
> -	relId = RangeVarGetRelid(stmt->relation, false);
> +	/*
> +	 * Find and lock the relation.  Lock level should match
> +	 * DefineQueryRewrite.
> +	 */
> +	relId = RangeVarGetRelid(stmt->relation, AccessExclusiveLock, false,
> +							 false);

Seems better to just pass the RangeVar to DefineQueryRewrite() ...

>  
>  	/* ... and execute */
>  	DefineQueryRewrite(stmt->rulename,
> @@ -235,17 +239,8 @@ DefineQueryRewrite(char *rulename,
>  	Query	   *query;
>  	bool		RelisBecomingView = false;
>  
> -	/*
> -	 * If we are installing an ON SELECT rule, we had better grab
> -	 * AccessExclusiveLock to ensure no SELECTs are currently running on the
> -	 * event relation.	For other types of rules, it is sufficient to grab
> -	 * ShareRowExclusiveLock to lock out insert/update/delete actions and to
> -	 * ensure that we lock out current CREATE RULE statements.
> -	 */
> -	if (event_type == CMD_SELECT)
> -		event_relation = heap_open(event_relid, AccessExclusiveLock);
> -	else
> -		event_relation = heap_open(event_relid, ShareRowExclusiveLock);
> +	/* lock level should match the one used in DefineRule */
> +	event_relation = heap_open(event_relid, AccessExclusiveLock);

... also making it cleaner to preserve this optimization.

Incidentally, you've added in many places this pattern of commenting that a
lock level must match another lock level used elsewhere.  Would it be better
to migrate away from looking up a relation oid in one function and opening the
relation in another function, instead passing either a Relation or a RangeVar?
You did substitute passing a Relation in a couple of places.

Thanks,
nm