Thread

  1. Re: Parallel INSERT SELECT take 2

    Tomas Vondra <tomas@vondra.me> — 2026-05-17T11:24:29Z

    On 5/11/26 13:44, Tomas Vondra wrote:
    > ...
    > 
    >>>
    >>>>> alternative idea
    >>>>> ----------------
    >>>>>
    >>>>> What if we took a very different approach, and just made sure the INSERT
    >>>>> part never runs concurrently with the SELECT? Say we fully materialize the
    >>>>> SELECT result (e.g. write it into a tuplestore), finish the parallel mode, and
    >>>>> only then do the INSERT?
    >>>>> ...
    
    
    I've been wondering about aspects of this approach, particularly about
    item (b) in my message, i.e. why we start/end the parallel mode
    ExecutePlan and not in the Gather node. Even if acquiring the XID before
    entering the parallel mode works, it's a bit annoying.
    
    So I asked about it on our Discord, and there was a little bit
    discussion, with the explanation that it's difficult to predict how
    exactly will the nodes interleave, and with doing this in Gather the
    nodes above might observe various values depending on the plan. At least
    that's how I understand/interpret the discussion. Of course, with the
    materialization this would not be an issue (I think).
    
    But Tom also mentioned one additional issue for parallelizing DML this
    way, and that's EvalPlanQual (EPQ), i.e. the mechanism we use to recheck
    concurrently updated rows in READ COMMITTED isolation mode.
    
    I don't see EPQ mentioned in these threads about parallelizing INSERT +
    SELECT at all. I suppose that's because EPQ is an issue only for
    UPDATE/DELETE queries, not for INSERT. Maybe everyone was aware of it,
    which is why the effort was limited to INSERT, not sure?
    
    The question is - is this acceptable? Did people expect to eventually
    extend the INSERT solution to UPDATE/DELETE in the future? Because the
    materialization does not really allow that, I think. Is it worth it, or
    would it be something we regret later? Not sure.
    
    In a way, users can already do exactly what the materialization does by
    
        CREATE UNLOGGED TABLE tmp AS SELECT ...;
        INSERT INTO t SELECT * FROM tmp;
    
    and they could even do something like this for UPDATE/DELETE, except
    that it (of course) won't do the EPQ recheck against relations in the
    original query.
    
    I'm not sure if we need to do something about EPQ. If it's acceptable to
    have parallel SELECT only for INSERT, then we don't need to do anything.
    In that case it's maybe a bit wrong to use "PARALLEL DML" in the syntax.
    
    If we need to eventually support all DML types, then I guess we'd need
    to come up with some way to do EPQ. A couple random ideas about that:
    
    a) It's probably not feasible to do EPQ rechecks with parallel plans, at
    least not exactly the way we do them now. The EPQ injects tuples into
    various scan nodes, and it's unclear how to do that with parallel plans.
    The leader would need to inject tuples into workers, it's not clear
    which worker(s) should get the tuple, maybe the workers are not still
    running at that point, etc. The EPQ state would have to be shared with
    the workers somehow, etc. That seems like a substantial amount of work.
    
    b) Perhaps it would be possible to identify plans that are somehow
    "safe" to parallelize / materialize as proposed for INSERTs? Are there
    plans that won't need to inject tuples into the "subquery" part of the
    ModifyTable? The executor/README says In UPDATE/DELETE/MERGE, only the
    target relation needs to be handled this way. so what if the target
    relation is not in the subplan at all (or in some part of it)? Of
    course, I'm not sure common such queries are, but in my experience it's
    not uncommon to have a large ETL job that aggregates a large table then
    inserts/merges the result into a different one. Also, queries with
    UPDATE ... FROM ... do a join with the target table, and EPQ may need to
    rerun the join, so the materialization would need to happen only on one
    side of the join (I think).
    
    c) Another option I can think of is doing the EPQ in the leader, a bit
    as if the parallel plan runs with no workers. So maybe we could "force"
    it to do this for EPQ rechecks? I suppose it'd need to have a completely
    separate plan for this purpose, not sure of that's possible.
    
    d) Maybe the answer to parallel DML is "don't do EPQ"? There's an
    article explaining how CockroachDB does EPQ, or rather how it solved the
    problem without EPQ by retrying the DML. Would be a significant change
    in behavior, of course, and I'm sure it has other disadvantages.
    
    e) AFAIK this is a problem only in READ COMMITTED. In SERIALIZABLE the
    DML simply fails if there are concurrent modifications, and it's up to
    the application to retry the whole thing (which for the ETL jobs would
    almost never happen). In a way this is a version of (d) but with the
    retry performed by the application. But it means parallelizing DML would
    be simpler in SERIALIZABLE, i.e. a stricter serialization mode, which is
    a bit surprising (but also not entirely).
    
    
    Anyway, all of this is mostly just a brain dump. I don't have a clear
    plan, or even a full understanding of how EPQ works, and the trade offs.
    
    
    regards
    
    -- 
    Tomas Vondra