Re: Speed up COPY FROM text/CSV parsing using SIMD
Nathan Bossart <nathandbossart@gmail.com>
From: Nathan Bossart <nathandbossart@gmail.com>
To: KAZAR Ayoub <ma_kazar@esi.dz>
Cc: Nazir Bilal Yavuz <byavuz81@gmail.com>, "ants.aasma@cybertec.at" <ants.aasma@cybertec.at>, Andrew Dunstan <andrew@dunslane.net>, Shinya Kato <shinya11.kato@gmail.com>, pgsql-hackers@postgresql.org
Date: 2025-10-21T18:55:07Z
Lists: pgsql-hackers
Commits
- Optimize COPY FROM (FORMAT {text,csv}) using SIMD.
  e0a3a3fd5361, 19 (unreleased), landed
- Speedup COPY FROM with additional function inlining.
  dc592a41557b, 19 (unreleased), landed
- doc: Fix incorrect wording for --file in pg_dump
  07961ef86625, 19 (unreleased), cited

On Tue, Oct 21, 2025 at 08:17:01AM +0200, KAZAR Ayoub wrote:
>>> I'm also trying the idea of doing SIMD inside quotes with prefix XOR
>>> using carry less multiplication avoiding the slow path in all cases even
>>> with weird looking input, but it needs to take into consideration the
>>> availability of PCLMULQDQ instruction set with <wmmintrin.h> and here we
>>> go, it quickly starts to become dirty OR we can wait for the decision to
>>> start requiring x86-64-v2 or v3 which has SSE4.2 and AVX2.
>
> [...]
>
> Currently we are at 200-400Mbps which isn't that terrible compared to
> production and non production grade parsers (of course we don't only parse
> in our case), also we are using SSE2 only so theoretically if we add
> support for avx later on we'll have even better numbers.
> Maybe more micro optimizations to the current heuristic can squeeze it more.

I'd greatly prefer that we stick with SSE2/Neon (i.e., simd.h) unless the
gains are extraordinary. Beyond the inherent complexity of using
architecture-specific intrinsics, you also have to deal with configure-time
checks, runtime checks, and function pointer overhead juggling. That tends
to be a lot of work for the amount of gain.

-- 
nathan
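
For readers unfamiliar with the "prefix XOR" idea quoted above, here is a
minimal sketch of what it computes. It is not code from any patch in this
thread. It builds the in-quote mask for one block of up to 64 bytes with
plain shifts and XORs, which gives the same result as a carry-less multiply
of the quote bitmask by all-ones (the PCLMULQDQ approach) without needing
that instruction. The function names quote_bitmask, prefix_xor, and
in_quote_mask are hypothetical, and the sketch assumes no escape character
and no quote state carried over from a previous block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return a bitmask in which bit i is set when buf[i]
 * equals the quote character.  A SIMD version would build this with a
 * vector compare followed by a movemask-style reduction.
 */
static uint64_t
quote_bitmask(const char *buf, size_t len, char quote)
{
	uint64_t	mask = 0;

	assert(len <= 64);
	for (size_t i = 0; i < len; i++)
	{
		if (buf[i] == quote)
			mask |= UINT64_C(1) << i;
	}
	return mask;
}

/*
 * Prefix XOR: bit i of the result is the XOR of bits 0..i of the input.
 * This equals the low 64 bits of a carry-less multiplication by all-ones,
 * computed here in six shift/XOR steps instead.
 */
static uint64_t
prefix_xor(uint64_t x)
{
	x ^= x << 1;
	x ^= x << 2;
	x ^= x << 4;
	x ^= x << 8;
	x ^= x << 16;
	x ^= x << 32;
	return x;
}

/*
 * Bits are set from each opening quote (inclusive) up to the matching
 * closing quote (exclusive).  Assumption: the block starts outside quotes.
 */
static uint64_t
in_quote_mask(const char *buf, size_t len, char quote)
{
	return prefix_xor(quote_bitmask(buf, len, quote));
}
```

With such a mask, delimiters inside quoted fields can be discarded with a
single AND-NOT against the delimiter bitmask, so quoted data would not need
a separate byte-at-a-time path. Whether that is worth the extra
configure-time and runtime checks is exactly the trade-off discussed above.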