Re: Speed up COPY FROM text/CSV parsing using SIMD

Nathan Bossart <nathandbossart@gmail.com>

From: Nathan Bossart <nathandbossart@gmail.com>
To: Nazir Bilal Yavuz <byavuz81@gmail.com>
Cc: Andrew Dunstan <andrew@dunslane.net>, Shinya Kato <shinya11.kato@gmail.com>, Manni Wood <manni.wood@enterprisedb.com>, KAZAR Ayoub <ma_kazar@esi.dz>, PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: 2025-11-24T21:59:21Z
Lists: pgsql-hackers

Commits

  1. Optimize COPY FROM (FORMAT {text,csv}) using SIMD.

  2. Speedup COPY FROM with additional function inlining.

  3. doc: Fix incorrect wording for --file in pg_dump

On Thu, Nov 20, 2025 at 03:55:43PM +0300, Nazir Bilal Yavuz wrote:
> On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <nathandbossart@gmail.com> wrote:
>> +            /* Load a chunk of data into a vector register */
>> +            vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
>>
>> In other places, processing 2 or 4 vectors of data at a time has proven
>> faster.  Have you tried that here?
> 
> Sorry, I could not find the related code. I only saw vector8_load()
> used inside the hex_decode_safe() function, and its comment
> says:
> 
> /*
>  * We must process 2 vectors at a time since the output will be half the
>  * length of the input.
>  */
> 
> But this does not mention any speedup from using 2 vectors at a time.
> Could you please show the related code?

See pg_lfind32().
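
For context, here is a minimal sketch of the "several vectors per iteration" idea in portable C. It is not the actual pg_lfind32() code. The names lfind32_sketch() and block_has_key() are hypothetical, the 4-lane "vector" width is an assumption, and a plain loop stands in for the real SIMD compare. What it illustrates is the loop shape: handle four vector-sized blocks per iteration, OR the results together, and branch once per iteration instead of once per vector.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES          4    /* uint32 elements per "vector" (assumed 128-bit) */
#define VECS_PER_ITER  4    /* vector-sized blocks handled per loop iteration */

/*
 * Hypothetical stand-in for a SIMD equality compare: returns true if any
 * of the LANES elements in the block equals key.  Written without an early
 * exit so the work per block is branch-free, as a vector compare would be.
 */
static inline bool
block_has_key(const uint32_t *block, uint32_t key)
{
    uint32_t    hit = 0;

    for (int i = 0; i < LANES; i++)
        hit |= (uint32_t) (block[i] == key);

    return hit != 0;
}

/* Linear search for key in base[0 .. nelem - 1]. */
static bool
lfind32_sketch(uint32_t key, const uint32_t *base, size_t nelem)
{
    const size_t per_iter = (size_t) LANES * VECS_PER_ITER;
    size_t      i = 0;

    /* Main loop: four blocks per iteration, one combined test. */
    for (; i + per_iter <= nelem; i += per_iter)
    {
        bool        r0 = block_has_key(&base[i + 0 * LANES], key);
        bool        r1 = block_has_key(&base[i + 1 * LANES], key);
        bool        r2 = block_has_key(&base[i + 2 * LANES], key);
        bool        r3 = block_has_key(&base[i + 3 * LANES], key);

        /* Combine first, then branch once for all four blocks. */
        if (r0 | r1 | r2 | r3)
            return true;
    }

    /* Tail: fewer than per_iter elements remain, check them one by one. */
    for (; i < nelem; i++)
    {
        if (base[i] == key)
            return true;
    }

    return false;
}
```

Calling lfind32_sketch(key, array, n) returns true if key occurs anywhere in the array.  Whether the same unrolling helps the COPY FROM loop would need to be measured, since that loop looks for several special characters rather than one key.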

-- 
nathan