Re: Speed up COPY FROM text/CSV parsing using SIMD
KAZAR Ayoub <ma_kazar@esi.dz>
From: KAZAR Ayoub <ma_kazar@esi.dz>
To: Nathan Bossart <nathandbossart@gmail.com>
Cc: Nazir Bilal Yavuz <byavuz81@gmail.com>,
Andrew Dunstan <andrew@dunslane.net>, Shinya Kato <shinya11.kato@gmail.com>,
Manni Wood <manni.wood@enterprisedb.com>, PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: 2025-11-26T11:50:58Z
Lists: pgsql-hackers
Attachments
- copyfromparse-constant.asm (application/octet-stream)
- copyfromparse-variable.asm (application/octet-stream)
Hello,

On Wed, Nov 19, 2025 at 10:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks, done.
>
> I took a look at the v3 patches.  Here are my high-level thoughts:
>
> +    /*
> +     * Parse data and transfer into line_buf. To get benefit from inlining,
> +     * call CopyReadLineText() with the constant boolean variables.
> +     */
> +    if (cstate->simd_continue)
> +        result = CopyReadLineText(cstate, is_csv, true);
> +    else
> +        result = CopyReadLineText(cstate, is_csv, false);
>
> I'm curious whether this actually generates different code, and if it does,
> if it's actually faster.  We're already branching on cstate->simd_continue
> here.

I've compiled both versions with -O2 and confirmed that they generate
different code. When simd_continue is passed as a constant to
CopyReadLineText(), the compiler optimizes the condition checks out of the
SIMD path.

A small benchmark on a 1GB+ file shows the expected benefit: a performance
improvement of around 6%.

I've attached the assembly output of both versions in case someone wants to
check something else.

Regards,
Ayoub Kazar
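
For readers who want to try the effect outside the tree, here is a minimal
standalone sketch of the same pattern. It is not the patch: DemoState,
demo_scan_line() and demo_read_line() are hypothetical names, and memchr()
merely stands in for the SIMD path. The idea is that an inline function
called with a literal boolean can be specialized per call site, so the flag
need not be re-tested inside the function body.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the COPY state; not the real CopyFromStateData. */
typedef struct DemoState
{
    bool        simd_continue;
} DemoState;

/*
 * Hypothetical stand-in for CopyReadLineText(). Returns the offset of the
 * first '\n' in buf, or len if there is none. When use_fast is a
 * compile-time constant at the call site, an optimizing compiler can keep
 * only the branch that applies.
 */
static inline size_t
demo_scan_line(const char *buf, size_t len, bool use_fast)
{
    size_t      i;

    if (use_fast)
    {
        /* Assumption: memchr() plays the role of the SIMD scan here. */
        const char *nl = memchr(buf, '\n', len);

        return nl ? (size_t) (nl - buf) : len;
    }

    /* Scalar fallback: check one byte at a time. */
    for (i = 0; i < len; i++)
    {
        if (buf[i] == '\n')
            break;
    }
    return i;
}

/*
 * Caller mirroring the quoted hunk: branch once on the runtime flag, then
 * pass a literal so each call can be specialized.
 */
static size_t
demo_read_line(const DemoState *st, const char *buf, size_t len)
{
    if (st->simd_continue)
        return demo_scan_line(buf, len, true);
    else
        return demo_scan_line(buf, len, false);
}
```

Compiling this with -O2 -S, once as written and once with
demo_scan_line(buf, len, st->simd_continue) as the only call, is one way to
compare the generated code. Whether the two differ depends on the compiler
and its inlining decisions.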