Thread

  1. Re: Speed up COPY FROM text/CSV parsing using SIMD

    Andrew Dunstan <andrew@dunslane.net> — 2025-10-29T22:22:46Z

    On 2025-10-22 We 3:24 PM, Nathan Bossart wrote:
    > On Wed, Oct 22, 2025 at 03:33:37PM +0300, Nazir Bilal Yavuz wrote:
    >> On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <nathandbossart@gmail.com> wrote:
    >>> I wonder if we could mitigate the regression further by spacing out the
    >>> checks a bit more.  It could be worth comparing a variety of values to
    >>> identify what works best with the test data.
    >> Do you mean that instead of doubling the SIMD sleep, we should
    >> multiply it by 3 (or another factor)? Or are you referring to
    >> increasing the maximum sleep from 1024? Or possibly both?
    > I'm not sure of the precise details, but the main thrust of my suggestion
    > is to assume that whatever sampling you do to determine whether to use SIMD
    > is good for a larger chunk of data.  That is, if you are sampling 1K lines
    > and then using the result to choose whether to use SIMD for the next 100K
    > lines, we could instead bump the latter number to 1M lines (or something).
    > That way we minimize the regression for relatively uniform data sets while
    > retaining some ability to adapt in case things change halfway through a
    > large table.
    >
    
    
    I'd be ok with numbers like this, although I suspect the number of
    cases where we see shape shifts like this in the middle of a data set
    would be vanishingly small.
    
    
    cheers
    
    
    andrew
    
    
    --
    Andrew Dunstan
    EDB: https://www.enterprisedb.com
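
For illustration, the "sample a small window, then trust the result for a much larger one" scheme suggested in the quoted text could be sketched roughly as below. This is not code from the patch. The names (SimdChooser, chooser_init, chooser_note_line), the decision rule (prefer SIMD when special characters are sparse) and the 1-in-16 threshold are all assumptions made for the example. Only the 1K and 1M line counts come from the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical knobs; the line counts echo the figures floated in the thread. */
#define SAMPLE_LINES   1000     /* lines inspected before deciding */
#define APPLY_LINES    1000000  /* lines the decision is then trusted for */
#define SPECIAL_DENOM  16       /* assumed threshold: < 1 special char per 16 */

typedef struct SimdChooser
{
    bool   sampling;       /* true while a sample window is being collected */
    bool   use_simd;       /* decision from the most recent sample */
    size_t lines_left;     /* lines remaining in the current window */
    size_t special_chars;  /* special characters seen in this sample */
    size_t total_chars;    /* all characters seen in this sample */
} SimdChooser;

static void
chooser_init(SimdChooser *c)
{
    c->sampling = true;
    c->use_simd = false;
    c->lines_left = SAMPLE_LINES;
    c->special_chars = 0;
    c->total_chars = 0;
}

/*
 * Record one parsed line and return whether the NEXT line should take the
 * SIMD path.  Sample windows always use the scalar path so that special
 * characters can be counted.
 */
static bool
chooser_note_line(SimdChooser *c, size_t line_len, size_t n_special)
{
    if (c->sampling)
    {
        c->special_chars += n_special;
        c->total_chars += line_len;
    }

    if (--c->lines_left == 0)
    {
        if (c->sampling)
        {
            /* Sample complete: decide, then trust it for APPLY_LINES lines. */
            c->use_simd =
                c->special_chars * SPECIAL_DENOM < c->total_chars;
            c->sampling = false;
            c->lines_left = APPLY_LINES;
        }
        else
        {
            /* Apply window exhausted: start a fresh sample. */
            c->sampling = true;
            c->special_chars = 0;
            c->total_chars = 0;
            c->lines_left = SAMPLE_LINES;
        }
    }

    return c->sampling ? false : c->use_simd;
}
```

With these values the scalar-only sampling window covers about 0.1% of the lines, so a uniform data set pays very little for the checks, while a change in shape is still picked up at the next sample.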