Thread

  1. Re: Speed up COPY FROM text/CSV parsing using SIMD

    Manni Wood <manni.wood@enterprisedb.com> — 2025-11-26T00:09:42Z

    Hello.
    
    I tried Ayoub Kazar's test files again, using Nazir Bilal Yavuz's v3
    patches, but with one difference since my last attempt: this time, I used 5
    million lines per file. For each 5 million line file, I ran the import 5
    times and averaged the results.
    
    (I found that even 1 million lines was sometimes too few: the runs could
    show surprising speedups in cases where the newer algorithm should be at
    least a tiny bit slower than the non-SIMD version.)
    
    The text file with no special characters is 30% faster. The CSV file with
    no special characters is 39% faster. The text file with roughly one third
    special characters is 0.5% slower. The CSV file with roughly one third
    special characters is 2.7% slower.
    
    I also tried files that alternated lines with no special characters and
    lines with roughly one third special characters, thinking I could force
    the algorithm to continually re-check whether it should use SIMD, and so
    add overhead in the try-SIMD/don't-try-SIMD housekeeping code. The text
    file was still 50% faster. The CSV file was still 13% faster.
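
    For anyone who wants to build a similar alternating file, here is a
    minimal sketch of the line layout I mean. It is not the script that
    produced the files above: make_line() and make_alternating_line() are
    hypothetical names, the 'x' filler is arbitrary, and the caller chooses the
    special byte. Whether that byte has to be escaped or doubled to remain
    valid COPY input is not handled here.

```c
#include <stddef.h>

/*
 * Write `width` payload bytes, then '\n' and a NUL terminator, into `out`.
 * `out` must have room for width + 2 bytes.  When with_specials is nonzero,
 * every third payload byte is `special`, giving roughly 1/3 special bytes.
 */
void
make_line(char *out, size_t width, int with_specials, char special)
{
    for (size_t i = 0; i < width; i++)
        out[i] = (with_specials && i % 3 == 2) ? special : 'x';
    out[width] = '\n';
    out[width + 1] = '\0';
}

/*
 * Even-numbered lines are plain; odd-numbered lines carry the special bytes,
 * so a parser that adapts per line keeps flipping between the two cases.
 */
void
make_alternating_line(char *out, size_t width, size_t lineno, char special)
{
    make_line(out, width, (lineno % 2) == 1, special);
}
```

    Calling make_alternating_line() for line numbers 0..N-1 and writing each
    buffer out with fputs() would give a file of the shape described above.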
    
    
    
    On Mon, Nov 24, 2025 at 3:59 PM Nathan Bossart <nathandbossart@gmail.com>
    wrote:
    
    > On Thu, Nov 20, 2025 at 03:55:43PM +0300, Nazir Bilal Yavuz wrote:
    > > On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <nathandbossart@gmail.com> wrote:
    > >> +            /* Load a chunk of data into a vector register */
    > >> +            vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
    > >>
    > >> In other places, processing 2 or 4 vectors of data at a time has proven
    > >> faster.  Have you tried that here?
    > >
    > > Sorry, I could not find the related code piece. I only saw the
    > > vector8_load() inside of hex_decode_safe() function and its comment
    > > says:
    > >
    > > /*
    > >  * We must process 2 vectors at a time since the output will be half the
    > >  * length of the input.
    > >  */
    > >
    > > But this does not mention any speedup from using 2 vectors at a time.
    > > Could you please show the related code?
    >
    > See pg_lfind32().
    >
    > --
    > nathan
    >
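
    To make the quoted suggestion concrete: as I read it, pg_lfind32() (in
    src/include/port/pg_lfind.h) handles several vectors per loop iteration and
    tests the combined result once, instead of branching after every vector.
    Below is a rough, portable sketch of that loop shape applied to scanning
    for special bytes. It is not code from the patch. It uses plain 64-bit
    words and a bit trick in place of vector8 registers, find_special() and
    the helpers are hypothetical names, and the set of special bytes
    (backslash, newline, carriage return) is an assumption for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Nonzero iff at least one byte of v equals c. */
static inline uint64_t
has_byte(uint64_t v, unsigned char c)
{
    uint64_t x = v ^ (ONES * c);    /* matching bytes become zero */

    return (x - ONES) & ~x & HIGHS; /* nonzero iff some byte of x is zero */
}

/* Nonzero iff the word holds a backslash, newline, or carriage return. */
static inline uint64_t
has_special(uint64_t v)
{
    return has_byte(v, '\\') | has_byte(v, '\n') | has_byte(v, '\r');
}

/* Return the index of the first special byte in buf, or len if none. */
size_t
find_special(const char *buf, size_t len)
{
    size_t i = 0;

    /* Four 8-byte words per iteration, combined with OR, one branch. */
    while (len - i >= 32)
    {
        uint64_t w[4];

        memcpy(w, buf + i, sizeof(w));
        if (has_special(w[0]) | has_special(w[1]) |
            has_special(w[2]) | has_special(w[3]))
            break;
        i += 32;
    }

    /* Scalar loop: handles the tail and pinpoints the byte in a hit block. */
    for (; i < len; i++)
    {
        char c = buf[i];

        if (c == '\\' || c == '\n' || c == '\r')
            break;
    }
    return i;
}
```

    The point of the wider stride is fewer loop branches per byte scanned;
    whether that helps COPY FROM would have to be measured.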
    
    
    -- 
    Manni Wood EDB: https://www.enterprisedb.com