Thread

  1. Re: Should CSV parsing be stricter about mid-field quotes?

    Kirk Wolak <wolakk@gmail.com> — 2023-05-17T22:18:05Z

    On Wed, May 17, 2023 at 5:47 PM Joel Jacobson <joel@compiler.org> wrote:
    
    > On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote:
    > > You can use CSV mode pretty reliably for TSV files. The trick is to use a
    > > quoting char that shouldn't appear, such as E'\x01' as well as setting
    > the
    > > delimiter to E'\t'. Yes, it's far from obvious.
    >
    > I've been using that trick myself many times in the past, but thanks to
    > this
    > deep-dive into this topic, it looks to me like TEXT would be a better
    > format
    > fit when dealing with unquoted TSV files, or?
    >
    > OTOH, one would then need to inspect the TSV file doesn't contain \. on an
    > empty
    > line...
    >
    > I was about to suggest we perhaps should consider adding a TSV format, that
    > is like TEXT excluding the PostgreSQL specific things like \. and \N,
    > but then I tested exporting TSV from Numbers on Mac and Google Sheets,
    > and I can see there are incompatible differences. Numbers quote fields
    > that contain double-quote marks, while Google Sheets doesn't.
    > None of them (unsurpringly) uses midfield quoting though.
    >
    > Anyone using Excel that could try exporting the following example as
    > CSV/TSV?
    >
    > CREATE TABLE t (a text, b text, c text, d text);
    > INSERT INTO t (a, b, c, d)
    > VALUES ('unquoted','a "quoted" string', 'field, with a comma', E'field\t
    > with a tab');
    >
    >
    Here you go. Not horrible handling.  (I use DataGrip so I saved it from
    there directly as TSV,
    just for an extra datapoint).
    
    FWIW, if you copy/paste in windows, the data, the field with the tab gets
    split into another column in Excel.
    But saving it as a file, and opening it.
    Saving it as XLSX, and then having Excel save it as a TSV (versus opening a
    text file, and saving it back)
    
    Kirk...