Re: Speed up COPY FROM text/CSV parsing using SIMD

From: KAZAR Ayoub <ma_kazar@esi.dz>
To: Nazir Bilal Yavuz <byavuz81@gmail.com>
Cc: Shinya Kato <shinya11.kato@gmail.com>, pgsql-hackers@postgresql.org
Date: 2025-08-21T19:36:42Z
Lists: pgsql-hackers
> On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
> >> Thanks for running that benchmark! Would you mind sharing a reproducer
> >> for the regression you observed?
> >
> > Of course, I attached the sql to generate the text and csv test files.
> > If having 1/3 of the line length be special characters is an
> > exaggeration, something lower might still reproduce some regression
> > for the same reason.
>
> Thank you so much!
>
> I am able to reproduce the regressions you mentioned, but both are
> 20% on my end. By experimenting, I found that SIMD causes a
> regression if it advances fewer than 5 characters.
>
> So, I implemented a small heuristic. It works like this:
>
> - If advance < 5 -> insert a sleep penalty (n cycles).
> - Each time advance < 5, n is doubled.
> - Each time advance ≥ 5, n is halved.
>
> I am sharing a POC patch to show the heuristic; it can be applied on
> top of v1-0001. The heuristic version has the same performance
> improvements as v1-0001, but the regression is 5% instead of 20%
> compared to master.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft

Yes, this is good. I'm also getting only about a 5% regression now.
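
For readers following the thread, here is a rough sketch of the backoff
heuristic quoted above. It is not the POC patch itself. It assumes that
the "sleep penalty" means skipping the SIMD path for n scalar steps.
The names SimdBackoff, simd_backoff_should_try and simd_backoff_report
are hypothetical, and the cap SIMD_MAX_PENALTY is an added assumption to
keep n bounded; only the threshold of 5 comes from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIMD_MIN_ADVANCE 5      /* threshold mentioned in the thread */
#define SIMD_MAX_PENALTY 1024   /* hypothetical cap, not from the thread */

typedef struct SimdBackoff
{
    uint32_t penalty;    /* current n: steps to sit out after a short advance */
    uint32_t remaining;  /* steps still to sit out before retrying SIMD */
} SimdBackoff;

/* Returns true if the caller should attempt the SIMD path on this step. */
static bool
simd_backoff_should_try(SimdBackoff *b)
{
    if (b->remaining > 0)
    {
        b->remaining--;
        return false;        /* still paying the penalty: stay scalar */
    }
    return true;
}

/* Record how many characters the last SIMD attempt advanced. */
static void
simd_backoff_report(SimdBackoff *b, uint32_t advance)
{
    if (advance < SIMD_MIN_ADVANCE)
    {
        /* Short advance: double n (starting at 1) and sit out n steps. */
        b->penalty = (b->penalty == 0) ? 1 : b->penalty * 2;
        if (b->penalty > SIMD_MAX_PENALTY)
            b->penalty = SIMD_MAX_PENALTY;
        b->remaining = b->penalty;
    }
    else
    {
        /* Long advance: halve n so SIMD is retried sooner. */
        b->penalty /= 2;
    }
}
```

A parsing loop would call simd_backoff_should_try() before each chunk,
fall back to the scalar path when it returns false, and otherwise run
the SIMD scan and pass the number of characters it consumed to
simd_backoff_report(). On input dense with special characters n grows
quickly and most steps stay scalar, which would be consistent with the
regression shrinking from 20% to 5%.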



Regards,
Ayoub Kazar