Thread

Re: Speed up COPY FROM text/CSV parsing using SIMD

Nazir Bilal Yavuz <byavuz81@gmail.com> — 2025-12-18T07:35:44Z
Hi,

On Sat, 13 Dec 2025 at 02:09, Manni Wood <manni.wood@enterprisedb.com> wrote:
>
> Hello, Everyone!
>
> I have attached two files. 1) the shell script that Mark and I have been using to get our test results, and 2) a screenshot of a spreadsheet of my latest test results. (Please let me know if there's a different format than a screenshot that I could share my spreadsheet in.)
>
> I took greater care this time to compile all three variants of Postgres (master at bfb335df, master at bfb335df with v4.2 patches installed, master at bfb335df with v3 patches installed) with the same gcc optimization flags that would be used to build Postgres packages. To the best of my knowledge, the two gcc flags of greatest interest would be -g and -O2. I built all three variants of Postgres using meson like so:
>
> BRANCH=$(git branch --show-current)
> meson setup build --prefix=/home/mwood/compiled-pg-instances/${BRANCH} --buildtype=debugoptimized
>
> It occurred to me that end users only care about 1) wall clock time (is the speedup noticeable in "real time" or just technically faster / uses less CPU?) and 2) Postgres binaries compiled with the same optimization level one would get when installing Postgres from packages like .deb or .rpm; in other words, will the user see speedups without having to manually compile Postgres?
>
> My interesting finding, on my laptop (ThinkPad P14s Gen 1 running Ubuntu 24.04.3), is different from Mark Wong's. On my laptop, using three Postgres installations all compiled with the -O2 optimization flag, I see speedups with the v4.2 patch except for a 2% slowdown with CSV with 1/3rd quotes. But with Nazir's proposed v3 patch, I see improvements across the board. So even for a text file with 1/3rd escape characters, and even with a CSV file with 1/3rd quotes, I see speedups of 11% and 26% respectively.
>
> The format of these test files originally comes from Ayoub Kazar's test scripts; all Mark and I have done in playing with them is make them much larger: 5,000,000 rows, based on the assumption that longer tests are better tests.
>
> I find my results interesting enough that I'd be curious to know if anybody else can reproduce them. It is very interesting that Mark's results are noticeably different from mine.

Thank you for sharing the benchmark script! I ran the benchmarks using
your script with --buildtype=debugoptimized. My results are below:

master: 85ddcc2f4c

text, no special: 102294
text, 1/3 special: 108946
csv, no special: 121831
csv, 1/3 special: 140063

v3

text, no special: 88890 (13.1% speedup)
text, 1/3 special: 110463 (1.4% regression)
csv, no special: 89781 (26.3% speedup)
csv, 1/3 special: 147094 (5.0% regression)

v4.2

text, no special: 87785 (14.2% speedup)
text, 1/3 special: 127008 (16.6% regression)
csv, no special: 88093 (27.7% speedup)
csv, 1/3 special: 164487 (17.4% regression)
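
The percentages above are relative to the master timing for the same
case. A minimal sketch of that arithmetic, assuming the formula
(master - patched) / master * 100; the helper name pct_change is
hypothetical and is not part of the shared benchmark script:

```shell
#!/bin/sh
# pct_change BASE NEW
# Prints the relative change of NEW against BASE, rounded to one decimal.
# A lower timing than BASE is reported as a speedup, a higher one as a
# regression. (Hypothetical helper, for illustration only.)
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        delta = (base - new) / base * 100
        if (delta >= 0)
            printf "%.1f%% speedup\n", delta
        else
            printf "%.1f%% regression\n", -delta
    }'
}

pct_change 102294 88890     # text, no special, v3  -> 13.1% speedup
pct_change 140063 147094    # csv, 1/3 special, v3  -> 5.0% regression
```

The first argument is always the master timing, so the sign of the
result directly tells whether the patch helped or hurt for that case.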

One thing I noticed is that your benchmark timings appear to have some
variance. In my runs, I did not observe differences greater than one
second between runs. It is possible that this variance is affecting
your results.

Before running the benchmarks, I use these commands [1] to improve
result stability; they might be helpful if you are not already using
something similar.

I ran this benchmark on my local machine; the specs are an Intel i5
13600k, 32GB of memory and a SATA SSD.

[1]
sudo cpupower frequency-set --governor=performance
sudo cpupower idle-set -D 0 # disable idle
echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo # Intel only
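
To confirm that the settings took effect before starting a run, a check
along these lines could be used. This is only a sketch: the function
name check_cpu_settings is hypothetical, it looks at cpu0 only, and it
assumes the usual Linux cpufreq sysfs layout
(cpu0/cpufreq/scaling_governor) alongside the intel_pstate path used
above.

```shell
#!/bin/sh
# check_cpu_settings [SYSFS_CPU_DIR]
# Reports whether the governor (cpu0 only) and the intel_pstate turbo
# switch match the benchmark setup. Returns non-zero if either check
# fails. The optional directory argument defaults to the real sysfs tree
# and exists only so the check can be pointed at another location.
# (Hypothetical helper, for illustration only.)
check_cpu_settings() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    status=0

    gov_file="$cpu_dir/cpu0/cpufreq/scaling_governor"
    if [ -r "$gov_file" ] && [ "$(cat "$gov_file")" = "performance" ]; then
        echo "governor: performance"
    else
        echo "governor: not set to performance (or not readable)"
        status=1
    fi

    turbo_file="$cpu_dir/intel_pstate/no_turbo"
    if [ ! -e "$turbo_file" ]; then
        # Non-Intel systems or other drivers do not expose this file.
        echo "turbo: intel_pstate not present, skipped"
    elif [ "$(cat "$turbo_file")" = "1" ]; then
        echo "turbo: disabled"
    else
        echo "turbo: still enabled"
        status=1
    fi

    return "$status"
}
```

Calling check_cpu_settings with no arguments inspects the live system;
a non-zero exit status means at least one of the settings is not in the
state the commands above are meant to produce.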

-- 
Regards,
Nazir Bilal Yavuz
Microsoft