Re: Speed up COPY FROM text/CSV parsing using SIMD
From: Ants Aasma <ants.aasma@cybertec.at>
To: Nazir Bilal Yavuz <byavuz81@gmail.com>
Cc: Shinya Kato <shinya11.kato@gmail.com>, pgsql-hackers@postgresql.org
Date: 2025-08-19T09:09:20Z
Lists: pgsql-hackers
Commits
- Optimize COPY FROM (FORMAT {text,csv}) using SIMD.
  e0a3a3fd5361, 19 (unreleased), landed
- Speedup COPY FROM with additional function inlining.
  dc592a41557b, 19 (unreleased), landed
- doc: Fix incorrect wording for --file in pg_dump
  07961ef86625, 19 (unreleased), cited
On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> I have a couple of ideas that I was working on:
> ---
>
> + * However, SIMD optimization cannot be applied in the following cases:
> + * - Inside quoted fields, where escape sequences and closing quotes
> + * require sequential processing to handle correctly.
>
> I think you can continue SIMD inside quoted fields. Only important
> thing is you need to set last_was_esc to false when SIMD skipped the
> chunk.

There is a trick with doing carryless multiplication with -1 that can
be used to SIMD process transitions between quoted/not-quoted. [1] This
is able to convert a bitmask of unescaped quote character positions to
a quote mask in a single operation. I last looked at it 5 years ago,
but I remember coming to the conclusion that it would work for
implementing PostgreSQL's interpretation of CSV.

[1] https://github.com/geofflangdale/simdcsv/blob/master/src/main.cpp#L76

--
Ants
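
For readers unfamiliar with the trick, here is a minimal portable sketch of what the carryless multiplication by -1 computes. The low 64 bits of that product are a prefix XOR of the input bits, so the sketch uses a shift-and-XOR ladder instead of a hardware carryless-multiply instruction. The function name quote_mask and the prev_inside carry parameter are hypothetical, chosen for illustration; they come from neither the patch nor the linked repository.

```c
#include <stdint.h>

/*
 * Sketch only: quote_mask and prev_inside are illustrative names.
 *
 * Bit i of quote_bits is set when byte i of a 64-byte chunk is an
 * unescaped quote character. The result has bit i set when byte i lies
 * inside a quoted region (opening quote included, closing quote
 * excluded).
 *
 * *prev_inside carries state between chunks: 0 when the previous chunk
 * ended outside quotes, all ones when it ended inside.
 */
static uint64_t
quote_mask(uint64_t quote_bits, uint64_t *prev_inside)
{
    uint64_t mask = quote_bits;

    /* Prefix XOR: afterwards bit i is the XOR of input bits 0..i. */
    mask ^= mask << 1;
    mask ^= mask << 2;
    mask ^= mask << 4;
    mask ^= mask << 8;
    mask ^= mask << 16;
    mask ^= mask << 32;

    /* Flip every bit if this chunk started inside a quoted field. */
    mask ^= *prev_inside;

    /* Broadcast the top bit as the carry for the next chunk. */
    *prev_inside = (uint64_t) 0 - (mask >> 63);

    return mask;
}
```

With quotes at byte positions 2 and 5 (quote_bits = 0x24) and no carry-in, the result is 0x1C, that is, bytes 2 to 4 are marked as quoted. A carryless-multiply instruction would replace the six shift-and-XOR steps with one operation, which is the "single operation" mentioned above.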