Thread

  1. Re: [PATCH] Better Performance for PostgreSQL with large INSERTs

    Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-12-12T12:54:38Z

    On Wed, Nov 26, 2025 at 3:03 PM Filip Janus <fjanus@redhat.com> wrote:
    >
    >
    >
    >     -Filip-
    >
    >
    > út 7. 10. 2025 v 16:54 odesílatel Andres Freund <andres@anarazel.de> napsal:
    >>
    >> Hi,
    >>
    >> On 2025-10-07 15:03:29 +0200, Philipp Marek wrote:
    >> > > Have you tried to verify that this doesn't cause performance regressions
    >> > > in
    >> > > other workloads? pq_recvbuf() has this code:
    >> > >
    >> > ...
    >> > >
    >> > > I do seem to recall that just increasing the buffer size substantially
    >> > > lead to
    >> > > more time being spent inside that memmove() (likely due to exceeding
    >> > > L1/L2).
    >> >
    >> >
    >> > Do you have any pointers to discussions or other data about that?
    >> >
    >> >
    >> > My (quick) analysis was that clients that send one request,
    >> > wait for an answer, then send the next request wouldn't run that code
    >> > as there's nothing behind the individual requests that could be moved.
    >> >
    >> >
    >> > But yes, Pipeline Mode[1] might/would be affected.
    >> >
    >> > The interesting question is how much data can userspace copy before
    >> > that means more load than doing a userspace-kernel-userspace round trip.
    >> > (I guess that moving 64kB or 128kB should be quicker, especially since
    >> > the various CPU mitigations.)
    >>
    >> I unfortunately don't remember the details of where I saw it
    >> happening. Unfortunately I suspect it'll depend a lot on hardware and
    >> operating system details (like the security mitigations you mention) when it
    >> matters too.
    >>
    >>
    >> > As long as there are complete requests in the buffer the memmove()
    >> > could be avoided; only the initial part of the first incomplete request
    >> > might need moving to the beginning.
    >>
    >> Right.  I'd be inclined that that ought to be addressed as part of this patch,
    >> that way we can be sure that it's pretty sure it's not going to cause
    >> regressions.
    >
    >
    > I tried to benchmark the usage of memmove(), but I wasn’t able to hit the memmove() part of the code. This led me to a deeper investigation, and I realized that the memmove() call is probably in a dead part of the code.
    > pq_recvbuf is called when PqRecvPointer >= PqRecvLength, while memmove() is called later only if PqRecvLength > PqRecvPointer.
    > This results in a contradiction.
    >
    >>
    >> > The documentation says
    >> >
    >> >   > Pipelining is less useful, and more complex,
    >> >   > when a single pipeline contains multiple transactions
    >> >   > (see Section 32.5.1.3).
    >> >
    >> > are there any benchmarks/usage statistics for pipeline mode?
    >>
    >> You can write benchmarks for it using pgbench's pipeline support, with a
    >> custom script.
    >>
    >> Greetings,
    >>
    >> Andres Freund
    >>
    > I am also proposing the introduction of a new GUC variable for setting PQ_RECV_BUFFER_SIZE in the first patch. And the second patch removes the dead code.
    >
    
    
    Hi Filip,
    
    Can you please how have you verified it is giving you that some perf. increase?
    
    3 tries each, best:
    
    @ pq_recv_buffers = 2MB best of 3:
        latency average = 2.594 ms
        latency stddev = 0.352 ms
        initial connection time = 9.419 ms
        tps = 385.431723 (without initial connection time)
    
    @ pq_recv_buffers = default (8kB) best of 3:
        latency average = 2.629 ms
        latency stddev = 0.929 ms
        initial connection time = 9.937 ms
        tps = 380.336257 (without initial connection time)
    
    /usr/pgsql19/bin/pgbench  -h xxx -U app -f insert.sql -c 1 -P 1 -n -T
    5 -M prepared postgres
    where insert.sql was:
        echo "CREATE TEMPORARY TABLE IF NOT EXISTS file_storage (data
    BYTEA STORAGE EXTERNAL) ON COMMIT PRESERVE ROWS;" > insert.sql
        echo "INSERT INTO file_storage(data) VALUES ('" >> insert.sql
        perl -e 'print "A"x(1024*1024);' >> insert.sql # 1MB
        echo "');" >> insert.sql
    
    Some description of the env I had:
    - tuned TCP rmem/wmem and set congestion to BBR (to eliminate TCP as
    being a bottlneck)
    - low RTT (same AWS zone), max 25Gbps total, max 9.6 Gbps single stream TCP
    - as TOAST pglz compression way taking most of CPU in my case , I've
    changed it to lz4 also didn't help a lot, so I've changed it to avoid
    *any* compression
    - switched to temporary table to avoid I/O as much as possible,
    wal_level=minimal too
    - had to use prepared statements as otherwise I was hitting way too
    much CPU in parser (yylex routines)
    
    So I'm looking for a way to demonstrate the effect. I've also written
    a simple psypong2 based LO upload benchmark as pgbench cannot
    apparently benchmark this. Sadly of course, then you cannot I think
    disable compression and/or load into TEMPORARY table so it's far worse
    and hits I/O heavy (as it hit pg_largeobjects*)
    
    -J.