Re: Support UTF-8 files with BOM in COPY FROM

David Wheeler <david@kineticode.com>

From: "David E. Wheeler" <david@kineticode.com>
To: Itagaki Takahiro <itagaki.takahiro@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2011-09-26T06:14:03Z
Lists: pgsql-hackers

On Sep 25, 2011, at 9:58 PM, Itagaki Takahiro wrote:

> I'd like to support UTF-8 text or csv files that has BOM (byte order mark)
> in COPY FROM command. BOM will be automatically detected and ignored
> if the file encoding is UTF-8. WIP patch attached.

By my reading of http://unicode.org/faq/utf_bom.html#bom5, I'd say +1.

So I think what you propose makes sense.
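
To make the read side concrete, here is a rough sketch of the behavior as I understand it from your description. It is Python for brevity, not the C in your WIP patch, and the function name and the encoding-name normalization are my own assumptions, not anything COPY actually uses. It drops a leading EF BB BF only when the declared file encoding is UTF-8 and leaves everything else untouched.

```python
import codecs


def strip_utf8_bom(data: bytes, encoding: str = "UTF8") -> bytes:
    """Return data without a leading UTF-8 BOM, but only for UTF-8 input.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    # Treat "UTF8", "utf-8" and "UTF_8" as the same encoding name (assumption).
    normalized = encoding.upper().replace("-", "").replace("_", "")
    if normalized == "UTF8" and data.startswith(codecs.BOM_UTF8):
        # codecs.BOM_UTF8 is the three bytes EF BB BF.
        return data[len(codecs.BOM_UTF8):]
    # Any other encoding, or no BOM at the start: pass the bytes through.
    return data
```

Only a BOM at the very start of the input is treated as a signature here; the same byte sequence later in the file is left as data.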

> I'm thinking about only COPY FROM for reads, but if someone wants to add
> BOM in COPY TO, we might also support COPY TO WITH BOM for writes.

I think it would have to be optional since, as that same FAQ entry notes, "some recipients of UTF-8 encoded data do not expect a BOM."
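
Roughly what I have in mind for the write side, again as a Python sketch rather than real COPY code. The function and the `with_bom` flag are hypothetical stand-ins for whatever a COPY TO ... WITH BOM option would end up looking like. The point is only that the BOM is off unless explicitly requested.

```python
import codecs


def encode_copy_output(text: str, with_bom: bool = False) -> bytes:
    """Encode text as UTF-8, prepending a BOM only when asked to.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    payload = text.encode("utf-8")
    # Default is no BOM, since some consumers of UTF-8 data do not expect one.
    if with_bom:
        return codecs.BOM_UTF8 + payload
    return payload
```

With the default left off, existing consumers of COPY TO output would see no change.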

Best,

David