Re: Speed up COPY FROM text/CSV parsing using SIMD
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
To: KAZAR Ayoub <ma_kazar@esi.dz>
Cc: Shinya Kato <shinya11.kato@gmail.com>, pgsql-hackers@postgresql.org
Date: 2025-08-14T10:29:35Z
Lists: pgsql-hackers
Commits
- Optimize COPY FROM (FORMAT {text,csv}) using SIMD.
  e0a3a3fd5361, 19 (unreleased), landed
- Speedup COPY FROM with additional function inlining.
  dc592a41557b, 19 (unreleased), landed
- doc: Fix incorrect wording for --file in pg_dump
  07961ef86625, 19 (unreleased), cited
Hi,

On Thu, 14 Aug 2025 at 05:25, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>
> Following Nazir's findings about 4096 bytes being the best-performing line length, I ran more benchmarks on my side in both TEXT and CSV formats, with two different cases: normal data (no special characters) and data with many special characters.
>
> Results are as good as expected and similar to previous benchmarks:
> ~30.9% faster copy in TEXT format
> ~32.4% faster copy in CSV format
> 20%-30% reduction in cycles per instruction
>
> In the case of lines with a lot of special characters (e.g., tables with large numbers of columns, maybe), we obviously expect regressions because of the overhead of many fallbacks to scalar processing.
> Results when special characters make up 1/3 of the line length:
> ~43.9% slower copy in TEXT format
> ~16.7% slower copy in CSV format
> So even with fewer occurrences of special characters, or wider distances between them, there might still be some regressions. This may not be a significant case, but it can be handled in other patches if we consider not using the SIMD path in some cases.
>
> I hope this helps and confirms the patch.

Thanks for running that benchmark! Would you mind sharing a reproducer for the regression you observed?

--
Regards,
Nazir Bilal Yavuz
Microsoft