Re: Speed up COPY FROM text/CSV parsing using SIMD
From: Nathan Bossart <nathandbossart@gmail.com>
To: Nazir Bilal Yavuz <byavuz81@gmail.com>
Cc: Andrew Dunstan <andrew@dunslane.net>, Shinya Kato <shinya11.kato@gmail.com>, Manni Wood <manni.wood@enterprisedb.com>, KAZAR Ayoub <ma_kazar@esi.dz>, PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: 2025-11-24T21:59:21Z
Lists: pgsql-hackers
Commits
- Optimize COPY FROM (FORMAT {text,csv}) using SIMD.
  e0a3a3fd5361, 19 (unreleased), landed
- Speedup COPY FROM with additional function inlining.
  dc592a41557b, 19 (unreleased), landed
- doc: Fix incorrect wording for --file in pg_dump
  07961ef86625, 19 (unreleased), cited
On Thu, Nov 20, 2025 at 03:55:43PM +0300, Nazir Bilal Yavuz wrote:
> On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <nathandbossart@gmail.com> wrote:
>> +        /* Load a chunk of data into a vector register */
>> +        vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
>>
>> In other places, processing 2 or 4 vectors of data at a time has proven
>> faster.  Have you tried that here?
>
> Sorry, I could not find the related code piece. I only saw the
> vector8_load() inside of hex_decode_safe() function and its comment
> says:
>
> /*
>  * We must process 2 vectors at a time since the output will be half the
>  * length of the input.
>  */
>
> But this does not mention any speedup from using 2 vectors at a time.
> Could you please show the related code?

See pg_lfind32().

-- 
nathan
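
For readers without the tree at hand, here is a rough, standalone sketch of
the general shape being suggested: load several chunks per loop iteration,
combine the comparison results, and branch once.  This is not the code from
pg_lfind32() or from the patch.  It assumes a plain uint64_t as a stand-in
for a vector register so that it compiles anywhere, and the names chunk8,
chunk_load(), chunk_has_byte() and has_special_char() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a vector register: 8 bytes in a uint64_t. */
typedef uint64_t chunk8;

#define ONES	UINT64_C(0x0101010101010101)
#define HIGHS	UINT64_C(0x8080808080808080)

/* Load 8 bytes without alignment assumptions. */
static inline chunk8
chunk_load(const unsigned char *p)
{
	chunk8		v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Returns nonzero iff at least one byte of v equals c. */
static inline chunk8
chunk_has_byte(chunk8 v, unsigned char c)
{
	chunk8		x = v ^ (ONES * c);	/* matching bytes become zero */

	return (x - ONES) & ~x & HIGHS;	/* classic "has zero byte" test */
}

/* Returns true if buf[0..len) contains a newline or a backslash. */
static bool
has_special_char(const unsigned char *buf, size_t len)
{
	const size_t block = 4 * sizeof(chunk8);
	size_t		i = 0;

	/* Main loop: four chunks per iteration, one branch on the result. */
	for (; i + block <= len; i += block)
	{
		chunk8		v0 = chunk_load(buf + i);
		chunk8		v1 = chunk_load(buf + i + sizeof(chunk8));
		chunk8		v2 = chunk_load(buf + i + 2 * sizeof(chunk8));
		chunk8		v3 = chunk_load(buf + i + 3 * sizeof(chunk8));
		chunk8		hit;

		hit = chunk_has_byte(v0, '\n') | chunk_has_byte(v0, '\\');
		hit |= chunk_has_byte(v1, '\n') | chunk_has_byte(v1, '\\');
		hit |= chunk_has_byte(v2, '\n') | chunk_has_byte(v2, '\\');
		hit |= chunk_has_byte(v3, '\n') | chunk_has_byte(v3, '\\');

		if (hit != 0)
			return true;
	}

	/* Tail: fewer than four chunks remain, check byte by byte. */
	for (; i < len; i++)
	{
		if (buf[i] == '\n' || buf[i] == '\\')
			return true;
	}

	return false;
}
```

Whether this kind of unrolling helps COPY FROM is the open question in this
thread.  The sketch only reports that a special character is present in the
block; a real parser would still have to locate it, which may reduce the
benefit.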