Re: Parallel INSERT SELECT take 2

Tomas Vondra <tomas@vondra.me>
From: Tomas Vondra <tomas@vondra.me>
To: "houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>, Greg Nancarrow <gregn4422@gmail.com>
Cc: PostgreSQL Developers <pgsql-hackers@lists.postgresql.org>, Amit Kapila <amit.kapila16@gmail.com>, Tsunakawa, Takayuki/綱川貴之 <tsunakawa.takay@fujitsu.com>, Dilip Kumar <dilipbalaut@gmail.com>
Date: 2026-05-08T23:58:35Z
Lists: pgsql-hackers
Attachments

v15-0001-CREATE-ALTER-TABLE-PARALLEL-DML.patch (text/x-patch)
v15-0002-parallel-SELECT-for-INSERT.patch (text/x-patch)
v15-0003-get-parallel-safety-functions.patch (text/x-patch)
v15-0004-regression-test-and-doc-updates.patch (text/x-patch)
v15-0005-wip-psql-tab-completion.patch (text/x-patch)
Hi,

I've been recently reminded of this thread by a user asking if there's a
way to enable parallel SELECT for an expensive ETL job implemented as an
INSERT + SELECT. IIRC the user was migrating from Oracle, which supports
parallelism for this. I was wondering what happened to the patches, and
what would need to happen to move it forward for PG20.

So I took a quick look. This is my attempt to summarize the existing
patches, main problems, the proposed solutions as presented in 2021, and
a couple of my opinions and alternative ideas.

Attached is the last version from [1], rebased on current master. If
fixes a bunch of regression failures (so that "make check" passes), adds
a couple XXX review comments, and basic tab completion.

I'm going to present (my understanding of) the patch history, the issues
and the approach in the latest patch version, and an alternative idea.


patch history overview
----------------------

There are multiple long-ish threads related to this patch series. The
main ones being

A) Parallel INSERT (INTO ... SELECT ...)
https://www.postgresql.org/message-id/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com

B) Bug in query rewriter - hasModifyingCTE not getting set
https://www.postgresql.org/message-id/CAJcOf-fAdj%3DnDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g%40mail.gmail.com

C) adding enable_parallel_insert GUC (+ revert of the feature)
https://www.postgresql.org/message-id/E1lMiB9-0001c3-SY@gemulon.postgresql.org

D) Parallel INSERT SELECT take 2
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709%40TYAPR01MB2990.jpnprd01.prod.outlook.com

These are the main threads directly about to this patch. There are
additional threads about related stuff, but let's keep it short. I
skimmed all the threads, but I think (C) and (D) have the most relevant
information about current direction and open issues.

The initial PoC patch discussed in (A) was very ambitious - trying to
parallelize both the INSERT and SELECT parts, if possible. Or at least
the SELECT part. But AFAIK that was rather ambitions, and after a while
the scope was reduced to parallelizing just the SELECT part (so the
INSERT part happens only in the leader).

Ultimately, that's what got committed in 05c8482f7f, but then reverted
after a discussion in (C). The primary reason being the potential
overhead due to checking parallel-safety when inserting into partitioned
tables. For large hierarchies this can be costly, and it happens during
planning for each individual - for expensive queries it likely wins, but
for smaller ones it can easily cause regressions.

There were other issues, but AFAICS this was the primary one. The GUC
introduced in (C) was not deemed an acceptable solution, and the feature
got reverted and a discussion about solutions started in (C) and (D).

But after a while, the thread died off :-(

The (D) thread identifies the following two issues:

(1) lack of mechanism for parallel workers to assign an XID

(2) significant overhead of checking parallel-safety

I'm not sure if (1) is an issue we have to solve or not, and in (C) the
opinions are not quite clear either. Robert does not like it, Andres
seems to be somewhat OK with just obtaining the XID every time before
entering a parallel mode. I personally agree with Andres, more or less.
Of course, we may want a more "proper" way for workers to obtain a
"shared" XID, in some way. I suppose they could request it from the
leader somehow? Not sure, but the "hack" seems to be working fine, even
if it's not pretty.

The (2) seems to be the fundamental issue. The reverted patch checked
parallel safety in each planning cycle, and evaluating parallel safety
for large partition hierarchies is costly, and it requires locking all
the partitions (not just those the query ends up using).


declarative approach
--------------------

The new declarative approach in inspired by Tom's post [3]  - there's a
new pg_class attribute, determining if a given target relation
(including all it's child partitions) is assumed to be "safe" (i.e. at
least restricted).

I like that this is "caching" the safety information in pg_class (which
means we don't have to do the expensive stuff even for the first insert
in a connection). That seems like a good approach.

But I very much dislike that we're shifting all the responsibility to
the user/developer. I have very little faith people will understand what
is required for safety, and I believe we'd get plenty of bogus bug
reports due to this.

I 100% agree with Robert [4] that labeling functions as parallel-safe is
not the same as labeling target relations. We can't determine safety for
functions (halting theorem and all that). We can determine safety for
target relations, we just choose not to because of overhead.

But why can't we keep the pg_class updated automatically? I mean, every
time a parallel safety of a relation changes (e.g. because a trigger is
added/removed, a default expression changes, ...), we could recalculate
the pg_class flag, and then update all the parent relations too.

If we did that, then we'd not need the manual declarations at all, I
think. Sure, determining the safety may not be cheap, but so what? It
should be only a very rare operation. If you do ALTER TABLE often enough
for this to matter, you have other problems, I guess.

Yes, there may be other problems, e.g. there may be deadlock hazards
when updating the parent relations, as pointed out by Robert [5]. But
that outght to be solvable, e.g. by figuring out which relations to lock
and then locking them from the top (and retrying if needed).

There's also suggestions there might be race conditions, e.g. [3]. I
haven't thought about this very much, but I don't quite see why this
would be any different from other ALTER TABLE commands. Maybe I'm
missing something obvious ...

I'm not sure how difficult it is to figure out all possibly affected
relations, when e.g. a marking on a function changes. But these claims
[6] seem rather hand-wavy, and I'd like to see some clear explanation
why this is not practical. Maybe it'd require improving some of the
dependency tracking, or whatever. But so what? New features often
require that kind of stuff.


alternative idea
----------------

What if we took a very different approach, and just made sure the INSERT
part never runs concurrently with the SELECT? Say we fully materialize
the SELECT result (e.g. write it into a tuplestore), finish the parallel
mode, and only then do the INSERT?

That's pretty much what existing workarounds do - they do the SELECT,
stash the result somewhere (either outside the database or in a CTAS),
and then do a separate INSERT into the table.

AFAIK that wouldn't have any of the issues mentioned here. Yes, it may
require writing the data into an temporary file, but that seems like
something we could cost.

I haven't even tried to implement this, but it seems workable. Or am I
missing something?



regards


[1]
https://www.postgresql.org/message-id/CAJcOf-cL1Kyb2fJkjD_%3Dp_8vhmKDL%2B5zp6quFYh4EkutAbahWQ%40mail.gmail.com

[2]
https://www.postgresql.org/message-id/b54f2e306780449093c311118cd8a04e%40G08CNEXMBPEKD05.g08.fujitsu.local

[3] https://www.postgresql.org/message-id/1030301.1616560249%40sss.pgh.pa.us

[4]
https://www.postgresql.org/message-id/flat/E1lMiB9-0001c3-SY%40gemulon.postgresql.org

[5]
https://www.postgresql.org/message-id/CA%2BTgmobaQFUBkVOsmdOt9ag%3D3g0WjKfefP8ys1TV5GhHELfLOA%40mail.gmail.com

[6]
https://www.postgresql.org/message-id/OS0PR01MB5716A603121216A70266A86594419%40OS0PR01MB5716.jpnprd01.prod.outlook.com