Re: Speed up COPY FROM text/CSV parsing using SIMD

Nazir Bilal Yavuz <byavuz81@gmail.com>

From: Nazir Bilal Yavuz <byavuz81@gmail.com>

To: Shinya Kato <shinya11.kato@gmail.com>

Cc: pgsql-hackers@postgresql.org

Date: 2025-08-11T08:52:25Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Optimize COPY FROM (FORMAT {text,csv}) using SIMD.
- e0a3a3fd5361 19 (unreleased) landed
Speedup COPY FROM with additional function inlining.
- dc592a41557b 19 (unreleased) landed
doc: Fix incorrect wording for --file in pg_dump
- 07961ef86625 19 (unreleased) cited

Attachments

v2-0001-Feedback.txt (text/plain)

Hi,

On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> On Thu, 7 Aug 2025 at 04:49, Shinya Kato <shinya11.kato@gmail.com> wrote:
> >
> > I have implemented SIMD optimization for the COPY FROM (FORMAT {csv,
> > text}) command and observed approximately a 5% performance
> > improvement. Please see the detailed test results below.
>
> Also, I did a benchmark on text format. I created a benchmark for line
> length in a table being from 1 byte to 1 megabyte.The peak improvement
> is line length being 4096 and the improvement is more than 20% [1], I
> saw no regression on your patch.

I did the same benchmark for the CSV format. The peak improvement is
line length being 4096 and the improvement is more than 25% [1]. I saw
a 5% regression on the 1 byte benchmark, there are no other
regressions.

> What do you think about adding SIMD to CopyReadAttributesText() and
> CopyReadAttributesCSV() functions? When I add your SIMD approach to
> CopyReadAttributesText() function, the improvement on the 4096 byte
> line length input [1] goes from 20% to 30%.

I wanted to try using SIMD in CopyReadAttributesCSV() as well. The
improvement on the 4096 byte line length input [1] goes from 25% to
35%, the regression on the 1 byte input is the same.

CopyReadAttributesCSV() changes are attached as feedback v2.

--
Regards,
Nazir Bilal Yavuz
Microsoft