Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON


From: Andrew Dunstan <andrew@dunslane.net>
To: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Cc: pgsql-hackers@postgresql.org, Tom Lane <tgl@sss.pgh.pa.us>
Date: 2026-05-04T14:19:21Z
Lists: pgsql-hackers


On 2026-04-29 We 12:49 PM, Ayush Tiwari wrote:
> Hi,
>
> On Mon, 20 Apr 2026 at 20:31, Ayush Tiwari 
> <ayushtiwari.slg01@gmail.com> wrote:
>
>     Hi,
>
>
>         On Mon, 20 Apr 2026 at 19:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>             Seems to me the correct thing here is to make it work like the
>             other cases, ie perform pg_server_to_any().  I have exactly no
>             sympathy for the argument about the RFC saying it must be
>             UTF-8, not least because that's not in fact what is
>             implemented (what if the server encoding isn't UTF-8?).
>
>
>         Agreed. I initially thought rejecting the option was the safer
>         route given the RFC, but as you pointed out, we aren't enforcing
>         UTF-8 strictly on the server side anyway.
>
>
>             Rejecting this option altogether doesn't improve anything, not
>             functionally, not specs-compliance-wise, nor according to the
>             principle of least surprise.
>
>         Makes sense. Implementing the conversion properly
>         keeps JSON format consistent with how the text and CSV formats
>         behave.
>
>
>             No, you don't get to punt this till later.  Once we ship v19
>             there's going to be a strong expectation of backwards
>             compatibility.
>
>             The idea of sending UTF-8 to a client that's set
>             client_encoding to something else would be risible, if it
>             weren't a security hazard.
>
>
>         I agree sending unconverted bytes to a mismatched client encoding
>         is clearly a security hazard that needs addressing. Did not
>         consider the backward compatibility part, my bad.
>
>         Was trying out adding pg_server_to_any() to the json_buf after
>         composite_to_json() returns, correctly covering both explicit
>         ENCODING option specifications and implicit client_encoding
>         mismatches.
>
>         Let me send a patch with code and associated test cases.
>
>     Attached patch with round trip test case. Please review and let me
>     know if it's in the right direction.
>
>
> I have registered this patch set in the CommitFest for tracking:
> https://commitfest.postgresql.org/patch/6700/
>
> Please let me know if the patch looks good, and if I need to add it
> in the open items list for PG 19.
>
>

Basically good, I think. I have modified your test a bit: it now checks
more directly for the presence of the LATIN1-encoded character and the
absence of the UTF-8-encoded character by reading the output file back
with pg_read_binary_file(). I have also added a test for the implicit
case, where the target encoding comes from setting client_encoding
rather than from the ENCODING option.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com