Thread

  1. Re: [Proposal] Adding TRIM_SPACE option to COPY

    Tatsuya Kawata <kawatatatsuya0913@gmail.com> — 2025-11-26T16:00:41Z

    Hi all,
    
    Thank you for the feedback and for taking the time to review this patch.
    
    ## Performance Measurement Results
    > I also agree that this feature probably won't add noticeable overhead to
    > COPY when it isn't used, but it would still be good to measure
    > the performance impact of the patch.
    I conducted simple performance measurements with 5 million rows to assess
    the overhead
    when TRIM_SPACE is not used:
    
    **BEFORE patch (master branch):**
    Time: 5565.584 ms (00:05.566)
    Time: 5626.593 ms (00:05.627)
    
    **AFTER patch (TRIM_SPACE disabled):**
    Time: 5840.472 ms (00:05.840)
    Time: 5523.806 ms (00:05.524)
    
    The overhead appears to be approximately 1.5%, which seems to be within
    an acceptable range for typical COPY operations.
    
    ## Regarding COPY PROGRAM as an Alternative
    
    I understand Tom's concern about adding features to core that could be
    implemented externally.
    
    > As for TRIM_SPACE, it seems possible to implement it as an external module
    > and call it via COPY PROGRAM. Is this true?
    You're right that TRIM_SPACE can technically be
    achieved using COPY PROGRAM with an external script. I'd like to share
    some observations about this approach:
    
    **Example implementation with COPY PROGRAM:**
    
    ```python
    #!/usr/bin/env python3
    import sys
    import csv
    
    reader = csv.reader(sys.stdin)
    writer = csv.writer(sys.stdout)
    
    for row in reader:
        trimmed_row = [field.strip() for field in row]
        writer.writerow(trimmed_row)
    ```
    
    ```sql
    COPY users FROM PROGRAM 'python3 /tmp/trim_csv.py'
    WITH (FORMAT csv, HEADER true);
    ```
    
    **Some considerations:**
    
    While the COPY PROGRAM approach works, it does have some practical
    limitations:
    it requires elevated privileges (`pg_execute_server_program`), may not be
    available on all platforms (particularly Windows), and requires users to
    handle parsing correctly. That said, I understand the concern about
    where to draw the line for features in core, especially given that options
    like FORCE_NULL follow a similar pattern.
    
    ## My Thoughts
    
    > So the question is which ones belong in core. On second thought,
    > perhaps features that are difficult or impossible to implement outside
    the core
    > are the ones that can be considered for COPY itself. Otherwise, it might
    be
    > better to avoid expanding COPY unnecessarily. Anyway I'd like to hear more
    > opinons on this.
    I can see the validity of the concern about feature creep.
    I'm open to the community's perspective on whether this
    belongs in core.
    
    If the general feeling is that this functionality should remain external,
    I'm happy to withdraw the patch.
    
    Thank you again for your time and valuable feedback. This discussion has
    been
    very educational for me.
    
    Best regards,
    Tatsuya Kawata