Re: Parallel INSERT SELECT take 2
Tomas Vondra <tomas@vondra.me>
From: Tomas Vondra <tomas@vondra.me>
To: "houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>,
Greg Nancarrow <gregn4422@gmail.com>
Cc: PostgreSQL Developers <pgsql-hackers@lists.postgresql.org>, Amit Kapila <amit.kapila16@gmail.com>, Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>, Dilip Kumar <dilipbalaut@gmail.com>
Date: 2026-05-08T23:58:35Z
Lists: pgsql-hackers
Attachments
- v15-0001-CREATE-ALTER-TABLE-PARALLEL-DML.patch (text/x-patch)
- v15-0002-parallel-SELECT-for-INSERT.patch (text/x-patch)
- v15-0003-get-parallel-safety-functions.patch (text/x-patch)
- v15-0004-regression-test-and-doc-updates.patch (text/x-patch)
- v15-0005-wip-psql-tab-completion.patch (text/x-patch)
Hi, I've been recently reminded of this thread by a user asking if there's a way to enable parallel SELECT for an expensive ETL job implemented as an INSERT + SELECT. IIRC the user was migrating from Oracle, which supports parallelism for this. I was wondering what happened to the patches, and what would need to happen to move it forward for PG20. So I took a quick look. This is my attempt to summarize the existing patches, main problems, the proposed solutions as presented in 2021, and a couple of my opinions and alternative ideas. Attached is the last version from [1], rebased on current master. If fixes a bunch of regression failures (so that "make check" passes), adds a couple XXX review comments, and basic tab completion. I'm going to present (my understanding of) the patch history, the issues and the approach in the latest patch version, and an alternative idea. patch history overview ---------------------- There are multiple long-ish threads related to this patch series. The main ones being A) Parallel INSERT (INTO ... SELECT ...) https://www.postgresql.org/message-id/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com B) Bug in query rewriter - hasModifyingCTE not getting set https://www.postgresql.org/message-id/CAJcOf-fAdj%3DnDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g%40mail.gmail.com C) adding enable_parallel_insert GUC (+ revert of the feature) https://www.postgresql.org/message-id/E1lMiB9-0001c3-SY@gemulon.postgresql.org D) Parallel INSERT SELECT take 2 https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709%40TYAPR01MB2990.jpnprd01.prod.outlook.com These are the main threads directly about to this patch. There are additional threads about related stuff, but let's keep it short. I skimmed all the threads, but I think (C) and (D) have the most relevant information about current direction and open issues. The initial PoC patch discussed in (A) was very ambitious - trying to parallelize both the INSERT and SELECT parts, if possible. Or at least the SELECT part. But AFAIK that was rather ambitions, and after a while the scope was reduced to parallelizing just the SELECT part (so the INSERT part happens only in the leader). Ultimately, that's what got committed in 05c8482f7f, but then reverted after a discussion in (C). The primary reason being the potential overhead due to checking parallel-safety when inserting into partitioned tables. For large hierarchies this can be costly, and it happens during planning for each individual - for expensive queries it likely wins, but for smaller ones it can easily cause regressions. There were other issues, but AFAICS this was the primary one. The GUC introduced in (C) was not deemed an acceptable solution, and the feature got reverted and a discussion about solutions started in (C) and (D). But after a while, the thread died off :-( The (D) thread identifies the following two issues: (1) lack of mechanism for parallel workers to assign an XID (2) significant overhead of checking parallel-safety I'm not sure if (1) is an issue we have to solve or not, and in (C) the opinions are not quite clear either. Robert does not like it, Andres seems to be somewhat OK with just obtaining the XID every time before entering a parallel mode. I personally agree with Andres, more or less. Of course, we may want a more "proper" way for workers to obtain a "shared" XID, in some way. I suppose they could request it from the leader somehow? Not sure, but the "hack" seems to be working fine, even if it's not pretty. The (2) seems to be the fundamental issue. The reverted patch checked parallel safety in each planning cycle, and evaluating parallel safety for large partition hierarchies is costly, and it requires locking all the partitions (not just those the query ends up using). declarative approach -------------------- The new declarative approach in inspired by Tom's post [3] - there's a new pg_class attribute, determining if a given target relation (including all it's child partitions) is assumed to be "safe" (i.e. at least restricted). I like that this is "caching" the safety information in pg_class (which means we don't have to do the expensive stuff even for the first insert in a connection). That seems like a good approach. But I very much dislike that we're shifting all the responsibility to the user/developer. I have very little faith people will understand what is required for safety, and I believe we'd get plenty of bogus bug reports due to this. I 100% agree with Robert [4] that labeling functions as parallel-safe is not the same as labeling target relations. We can't determine safety for functions (halting theorem and all that). We can determine safety for target relations, we just choose not to because of overhead. But why can't we keep the pg_class updated automatically? I mean, every time a parallel safety of a relation changes (e.g. because a trigger is added/removed, a default expression changes, ...), we could recalculate the pg_class flag, and then update all the parent relations too. If we did that, then we'd not need the manual declarations at all, I think. Sure, determining the safety may not be cheap, but so what? It should be only a very rare operation. If you do ALTER TABLE often enough for this to matter, you have other problems, I guess. Yes, there may be other problems, e.g. there may be deadlock hazards when updating the parent relations, as pointed out by Robert [5]. But that outght to be solvable, e.g. by figuring out which relations to lock and then locking them from the top (and retrying if needed). There's also suggestions there might be race conditions, e.g. [3]. I haven't thought about this very much, but I don't quite see why this would be any different from other ALTER TABLE commands. Maybe I'm missing something obvious ... I'm not sure how difficult it is to figure out all possibly affected relations, when e.g. a marking on a function changes. But these claims [6] seem rather hand-wavy, and I'd like to see some clear explanation why this is not practical. Maybe it'd require improving some of the dependency tracking, or whatever. But so what? New features often require that kind of stuff. alternative idea ---------------- What if we took a very different approach, and just made sure the INSERT part never runs concurrently with the SELECT? Say we fully materialize the SELECT result (e.g. write it into a tuplestore), finish the parallel mode, and only then do the INSERT? That's pretty much what existing workarounds do - they do the SELECT, stash the result somewhere (either outside the database or in a CTAS), and then do a separate INSERT into the table. AFAIK that wouldn't have any of the issues mentioned here. Yes, it may require writing the data into an temporary file, but that seems like something we could cost. I haven't even tried to implement this, but it seems workable. Or am I missing something? regards [1] https://www.postgresql.org/message-id/CAJcOf-cL1Kyb2fJkjD_%3Dp_8vhmKDL%2B5zp6quFYh4EkutAbahWQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/b54f2e306780449093c311118cd8a04e%40G08CNEXMBPEKD05.g08.fujitsu.local [3] https://www.postgresql.org/message-id/1030301.1616560249%40sss.pgh.pa.us [4] https://www.postgresql.org/message-id/flat/E1lMiB9-0001c3-SY%40gemulon.postgresql.org [5] https://www.postgresql.org/message-id/CA%2BTgmobaQFUBkVOsmdOt9ag%3D3g0WjKfefP8ys1TV5GhHELfLOA%40mail.gmail.com [6] https://www.postgresql.org/message-id/OS0PR01MB5716A603121216A70266A86594419%40OS0PR01MB5716.jpnprd01.prod.outlook.com