Thread
-
Re: Should CSV parsing be stricter about mid-field quotes?
Kirk Wolak <wolakk@gmail.com> — 2023-05-17T22:18:05Z
On Wed, May 17, 2023 at 5:47 PM Joel Jacobson <joel@compiler.org> wrote: > On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote: > > You can use CSV mode pretty reliably for TSV files. The trick is to use a > > quoting char that shouldn't appear, such as E'\x01' as well as setting > the > > delimiter to E'\t'. Yes, it's far from obvious. > > I've been using that trick myself many times in the past, but thanks to > this > deep-dive into this topic, it looks to me like TEXT would be a better > format > fit when dealing with unquoted TSV files, or? > > OTOH, one would then need to inspect the TSV file doesn't contain \. on an > empty > line... > > I was about to suggest we perhaps should consider adding a TSV format, that > is like TEXT excluding the PostgreSQL specific things like \. and \N, > but then I tested exporting TSV from Numbers on Mac and Google Sheets, > and I can see there are incompatible differences. Numbers quote fields > that contain double-quote marks, while Google Sheets doesn't. > None of them (unsurpringly) uses midfield quoting though. > > Anyone using Excel that could try exporting the following example as > CSV/TSV? > > CREATE TABLE t (a text, b text, c text, d text); > INSERT INTO t (a, b, c, d) > VALUES ('unquoted','a "quoted" string', 'field, with a comma', E'field\t > with a tab'); > > Here you go. Not horrible handling. (I use DataGrip so I saved it from there directly as TSV, just for an extra datapoint). FWIW, if you copy/paste in windows, the data, the field with the tab gets split into another column in Excel. But saving it as a file, and opening it. Saving it as XLSX, and then having Excel save it as a TSV (versus opening a text file, and saving it back) Kirk...