Thread

  1. Re: Speed up COPY FROM text/CSV parsing using SIMD

    Manni Wood <manni.wood@enterprisedb.com> — 2025-11-26T14:21:46Z

    On Wed, Nov 26, 2025 at 5:51 AM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
    
    > Hello,
    > On Wed, Nov 19, 2025 at 10:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
    >
    >> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
    >> > Thanks, done.
    >>
    >> I took a look at the v3 patches.  Here are my high-level thoughts:
    >>
    >> +    /*
    >> +     * Parse data and transfer into line_buf. To get benefit from inlining,
    >> +     * call CopyReadLineText() with the constant boolean variables.
    >> +     */
    >> +    if (cstate->simd_continue)
    >> +        result = CopyReadLineText(cstate, is_csv, true);
    >> +    else
    >> +        result = CopyReadLineText(cstate, is_csv, false);
    >>
    >> I'm curious whether this actually generates different code, and if it
    >> does, if it's actually faster.  We're already branching on
    >> cstate->simd_continue here.
    >
    > I've compiled both versions with -O2 and confirmed they generate different
    > code. When simd_continue is passed as a constant to CopyReadLineText, the
    > compiler optimizes out the condition checks from the SIMD path.
    > A small benchmark on a file larger than 1 GB shows the expected benefit:
    > a performance improvement of about 6%.
    > I've attached the assembly outputs in case someone wants to check
    > something else.
    >
    >
    > Regards,
    > Ayoub Kazar
    >
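
For anyone following the inlining question quoted above, here is a minimal sketch of the pattern in isolation. It is not the patch's code: scan_line() and scan_line_dispatch() are hypothetical stand-ins for CopyReadLineText() and its caller, and the "wide" loop uses plain C rather than real SIMD intrinsics. The idea is only that when an inline function receives a literal true or false, the compiler can emit two specialized copies and drop the flag test from each; whether a given compiler does so at -O2 has to be checked in the generated assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for CopyReadLineText(): returns the number of bytes
 * before the first newline in buf (or len if there is none).  The use_simd
 * flag only selects an extra chunked loop; no real SIMD is involved.
 */
static inline size_t
scan_line(const char *buf, size_t len, bool use_simd)
{
    size_t      i = 0;

    assert(buf != NULL);

    if (use_simd)
    {
        /* "Wide" path: skip whole 8-byte chunks that contain no newline. */
        while (i + 8 <= len)
        {
            bool        found = false;

            for (size_t j = 0; j < 8; j++)
                found |= (buf[i + j] == '\n');
            if (found)
                break;
            i += 8;
        }
    }

    /* Scalar path: also finishes whatever the chunked loop left over. */
    while (i < len && buf[i] != '\n')
        i++;

    return i;
}

/*
 * Branch once on the runtime flag, then call the inline function with a
 * compile-time constant so each call site can be specialized separately.
 */
size_t
scan_line_dispatch(const char *buf, size_t len, bool simd_continue)
{
    if (simd_continue)
        return scan_line(buf, len, true);
    else
        return scan_line(buf, len, false);
}
```

Both branches must return the same answer for any input; only the generated code is expected to differ. Comparing the assembly for scan_line_dispatch() against a version that passes simd_continue straight through is one way to see whether the specialization actually happened.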
    
    Correction to my last post:
    
    I also tried files that alternated lines containing no special characters
    with lines made up of one-third special characters, hoping to force the
    algorithm to keep re-checking whether it should use SIMD and thereby add
    overhead in the try-SIMD/don't-try-SIMD housekeeping code. Even so, the
    text file was still 20% faster (not 50% faster, as I originally stated;
    that was a typo), and the CSV file was still 13% faster.
    
    Also, apologies for top-posting in my last e-mail.
    -- 
    Manni Wood EDB: https://www.enterprisedb.com