Re: Support UTF-8 files with BOM in COPY FROM

From: Robert Haas <robertmhaas@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Tatsuo Ishii <ishii@postgresql.org>, david@kineticode.com, itagaki.takahiro@gmail.com, pgsql-hackers@postgresql.org
Date: 2011-09-26T17:34:23Z
Lists: pgsql-hackers

On Mon, Sep 26, 2011 at 1:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The thing that makes me doubt that is this comment from Tatsuo Ishii:
>> TI> COPY explicitly specifies the encoding (to be UTF-8 in this case).  So
>> TI> I think we should not regard U+FEFF as "BOM" in COPY, rather we should
>> TI> regard U+FEFF as "ZERO WIDTH NO-BREAK SPACE".
>
> Yeah, that's a reasonable argument for rejecting the patch altogether.

Yeah, or for making the behavior optional.
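
To make "optional" concrete, here is a rough sketch of the idea only. It is
not taken from the patch, and the helper name `strip_utf8_bom` and its
`skip_bom` flag are hypothetical. By default a leading U+FEFF is kept as data
(ZERO WIDTH NO-BREAK SPACE); it is dropped only when the caller asks for that.

```python
import codecs


def strip_utf8_bom(data: bytes, skip_bom: bool = False) -> bytes:
    """Drop a leading UTF-8 BOM only when the caller opts in.

    Hypothetical helper for illustration; not an existing COPY option.
    """
    if skip_bom and data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A UTF-8 input whose first three bytes are EF BB BF.
raw = codecs.BOM_UTF8 + "1,alice\n".encode("utf-8")

# Default: U+FEFF is treated as ordinary data and survives decoding.
kept = strip_utf8_bom(raw).decode("utf-8")

# Opt-in: the leading U+FEFF is treated as a BOM and removed.
stripped = strip_utf8_bom(raw, skip_bom=True).decode("utf-8")
```

With the flag off, the first column of the first row would begin with U+FEFF,
which is the behavior Tatsuo's reading implies.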

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company