Re: Speed up COPY FROM text/CSV parsing using SIMD
From: KAZAR Ayoub <ma_kazar@esi.dz>
To: Nazir Bilal Yavuz <byavuz81@gmail.com>
Cc: Shinya Kato <shinya11.kato@gmail.com>, pgsql-hackers@postgresql.org
Date: 2025-08-14T14:59:55Z
Lists: pgsql-hackers
Attachments
- simd-copy-from-bench.sql (application/sql)
> Hi,
>
> On Thu, 14 Aug 2025 at 05:25, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
> >
> > Following Nazir's findings about 4096 bytes being the performant line
> > length, I did more benchmarks from my side on both TEXT and CSV formats
> > with two different cases: normal data (no special characters) and data
> > with many special characters.
> >
> > Results are as good as expected and similar to previous benchmarks:
> > ~30.9% faster copy in TEXT format
> > ~32.4% faster copy in CSV format
> > 20%-30% reduction in cycles per instruction
> >
> > When the lines contain a lot of special characters (e.g., tables with
> > large numbers of columns maybe), we obviously expect regressions
> > because of the overhead of many fallbacks to scalar processing.
> > Results when special characters make up 1/3 of the line length:
> > ~43.9% slower copy in TEXT format
> > ~16.7% slower copy in CSV format
> > So even with fewer special characters, or wider distances between
> > them, there might still be some regressions. This is maybe not a
> > significant case, and it can be treated in other patches if we
> > consider not using the SIMD path in some situations.
> >
> > I hope this helps more and confirms the patch.
>
> Thanks for running that benchmark! Would you mind sharing a reproducer
> for the regression you observed?
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft

Of course, I attached the SQL to generate the text and CSV test files.
If having special characters make up 1/3 of the line length is an
exaggeration, something lower might still reproduce some regressions,
since the same idea applies.

Best regards,
Ayoub Kazar