Thread

  1. Re: postgres_fdw: Use COPY to speed up batch inserts

    Matheus Alcantara <matheusssilv97@gmail.com> — 2025-11-06T23:49:37Z

    On Fri Oct 31, 2025 at 4:02 PM -03, I wrote:
    > It's showing a bit complicated to decide at runtime if we should use the
    > COPY or INSERT for batch insert into a foreign table. Perhaps we could
    > add a new option on CREATE FOREIGN TABLE to enable this usage or not? We
    > could document the performance improvements and the limitations so the
    > user can decide if it should enable or not.
    >
    Here is v5 that implement this idea.
    
    On this version I've introduced a foreign table and foreign server
    option "use_copy_for_insert" (I'm open for a better name) that enable
    the use of the COPY as remote command to execute an INSERT into a
    foreign table. The COPY can be used if the user enable this option on
    the foreign table or the foreign server and if the original INSERT
    statement don't have a RETURNING clause.
    
    See the benchmark results:
    
    pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres
    
    Master (batch_size = 1 with a single row to insert):
    tps = 16000.768037
    
    Master (batch_size = 1 with 1000 rows to insert):
    tps = 133.451518
    
    Master (batch_size = 100 with 1000 rows to insert):
    tps = 1274.096347
    
    -----------------
    
    Patch(batch_size = 1, use_copy_for_insert = false with single row to
    insert)
    tps = 15734.155705
    
    Master (batch_size = 1, use_copy_for_insert = false with 1000 rows to
    insert):
    tps = 132.644801
    
    Master (batch_size = 100, use_copy_for_insert = false with 1000 rows to
    insert):
    tps = 1245.514591
    
    -----------------
    
    Patch(batch_size = 1, use_copy_for_insert = true with single row to
    insert)
    tps = 17604.394057
    
    Master (batch_size = 1, use_copy_for_insert = true with 1000 rows to
    insert):
    tps = 88.998804
    
    Master (batch_size = 100, use_copy_for_insert = true with 1000 rows to
    insert):
    tps = 2406.009249
    
    -----------------
    
    We can see that when batching inserting with the batch_size configured
    properly we have a very significant performance improvement and when the
    "use_copy_for_insert" option is disabled the performance are close
    compared with master.
    
    The problem is when the "batch_size" is 1 (default) and
    "use_copy_for_insert" is enabled. This is because on this scenario we
    are sending multiple COPY commands with a single row to the foreign
    server.
    
    One way to fix this would to decide at runtime (at
    execute_foreign_modify()) if the COPY can be used based on the number of
    rows being insert. I don't think that I like this option because it
    would make the EXPLAIN output different when the ANALYZE option is used
    since during planning time we don't have the number of rows being
    inserted, so if just EXPLAIN(VERBOSE) is executed we would show the
    INSERT as remote SQL, and if the ANALYZE is included and we have enough
    rows to enable the COPY usage, the remote SQL would show the COPY
    command.
    
    Since the new "use_copy_for_insert" option is be disabled by default I
    think that we could document this limitation and mention the performance
    improvements when used correctly with the batch_size option.
    
    Another option would be to use the COPY command only if the
    "use_copy_for_insert" is true and also if the "batch_size" is > 1. We
    would still have the performance issue if the user insert a single row
    but we would close to less scenarios. The attached 0002 implement this
    idea.
    
    Thoughts?
    
    -- 
    Matheus Alcantara
    EDB: http://www.enterprisedb.com