Re: Support UTF-8 files with BOM in COPY FROM
From: Tatsuo Ishii <ishii@postgresql.org>
To: itagaki.takahiro@gmail.com
Cc: pgsql-hackers@postgresql.org
Date: 2011-09-26T14:33:50Z
Lists: pgsql-hackers

> I'd like to support UTF-8 text or csv files that has BOM (byte order mark)
> in COPY FROM command. BOM will be automatically detected and ignored
> if the file encoding is UTF-8. WIP patch attached.

From RFC3629 (http://tools.ietf.org/html/rfc3629#section-6):

   o  A protocol SHOULD forbid use of U+FEFF as a signature for those
      textual protocol elements that the protocol mandates to be always
      UTF-8, the signature function being totally useless in those
      cases.

COPY explicitly specifies the encoding (to be UTF-8 in this case). So I
think we should not regard U+FEFF as "BOM" in COPY, rather we should
regard U+FEFF as "ZERO WIDTH NO-BREAK SPACE".
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
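
For illustration only, the two readings of a leading U+FEFF discussed in this message can be sketched in a few lines of Python. This is not taken from the WIP patch and is not how COPY is implemented; the function names and the sample row are hypothetical. It only shows that the same three bytes (EF BB BF) either survive as a ZERO WIDTH NO-BREAK SPACE character in the first field or are dropped as a signature, depending on which interpretation the reader chooses.

```python
import codecs

# UTF-8 encoding of U+FEFF: the bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def decode_keep_feff(data: bytes) -> str:
    """Hypothetical helper: treat a leading U+FEFF as an ordinary character
    (ZERO WIDTH NO-BREAK SPACE), so it stays in the decoded text."""
    return data.decode("utf-8")


def decode_strip_bom(data: bytes) -> str:
    """Hypothetical helper: treat a leading U+FEFF as a signature (BOM)
    and drop it before returning the text."""
    return data.decode("utf-8-sig")


# Hypothetical first line of a CSV file saved with a BOM.
sample = UTF8_BOM + b"1,foo\n"

kept = decode_keep_feff(sample)      # first field begins with U+FEFF
stripped = decode_strip_bom(sample)  # first field is just "1"
```

Under the first reading the first column would carry an invisible extra character; under the second the file loads as if the signature had never been written.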