Thread

  1. Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON

    Andrew Dunstan <andrew@dunslane.net> — 2026-05-04T14:19:21Z

    On 2026-04-29 We 12:49 PM, Ayush Tiwari wrote:
    > Hi,
    >
    > On Mon, 20 Apr 2026 at 20:31, Ayush Tiwari 
    > <ayushtiwari.slg01@gmail.com> wrote:
    >
    >     Hi,
    >
    >
    >         On Mon, 20 Apr 2026 at 19:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    >             Seems to me the correct thing here is to make it work like
    >             the other
    >             cases, ie perform pg_server_to_any().  I have exactly no
    >             sympathy for
    >             the argument about the RFC saying it must be UTF-8, not
    >             least because
    >             that's not in fact what is implemented (what if the server
    >             encoding
    >             isn't UTF-8?).
    >
    >
    >         Agreed. I initially thought rejecting the option was the safer
    >         route
    >         given the RFC, but as you pointed out, we aren't enforcing
    >         UTF-8 strictly on the server side anyway.
    >
    >
    >             Rejecting this option altogether doesn't improve anything, not
    >             functionally, not specs-compliance-wise, nor according to the
    >             principle of least surprise.
    >
    >         Makes sense. Implementing the conversion properly
    >         keeps JSON format consistent with how the text and CSV formats
    >         behave.
    >
    >
    >             No, you don't get to punt this till later.  Once we ship
    >             v19 there's
    >             going to be a strong expectation of backwards compatibility.
    >
    >             The idea of sending UTF-8 to a client that's set
    >             client_encoding to
    >             something else would be risible, if it weren't a security
    >             hazard.
    >
    >
    >         I agree that sending unconverted bytes to a client with a
    >         mismatched client_encoding is clearly a security hazard that
    >         needs addressing. I did not consider the backward
    >         compatibility part, my bad.
    >
    >         I was trying out applying pg_server_to_any() to json_buf after
    >         composite_to_json() returns, which covers both explicit
    >         ENCODING option specifications and implicit client_encoding
    >         mismatches.
    >
    >         Let me send a patch with code and associated test cases.
    >
    >     Attached patch with round trip test case. Please review and let me
    >     know if it's in the right direction.
    >
    >
    > I have registered this patch set in the CommitFest for tracking:
    > https://commitfest.postgresql.org/patch/6700/
    >
    > Please let me know if the patch looks good, and if I need to add it
    > in the open items list for PG 19.
    >
    >
    
    Basically good, I think. I have modified your test a bit so that it
    tests more directly for the presence of the LATIN-1 encoded character
    and the absence of the UTF-8 encoded character, by reading the file in
    with pg_read_binary_file. I have also added a test for implicit
    encoding, which sets client_encoding.
    
    
    cheers
    
    
    andrew
    
    --
    Andrew Dunstan
    EDB: https://www.enterprisedb.com
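
The byte-level check described in the reply above (LATIN-1 encoded character present, UTF-8 encoded character absent) can be illustrated outside the server. This is only a rough sketch and not the regression test from the patch: the sample JSON string, the choice of U+00E9 as the test character, and the helper name check_latin1_output are all assumptions made for illustration.

```python
# Hypothetical illustration; NOT the actual test from the patch.
# Assumed sample: one JSON row containing U+00E9 (e with acute accent).
sample_json = '{"v":"\u00e9"}'

# What the output file would contain under each encoding.
latin1_bytes = sample_json.encode("latin-1")
utf8_bytes = sample_json.encode("utf-8")

# Byte sequences for U+00E9 in each encoding.
E_ACUTE_LATIN1 = b"\xe9"      # single byte in LATIN-1
E_ACUTE_UTF8 = b"\xc3\xa9"    # two bytes in UTF-8


def check_latin1_output(data: bytes) -> bool:
    """Return True if data holds the LATIN-1 form and not the UTF-8 form."""
    return E_ACUTE_LATIN1 in data and E_ACUTE_UTF8 not in data


# Converted output passes; unconverted UTF-8 output fails.
print(check_latin1_output(latin1_bytes))
print(check_latin1_output(utf8_bytes))
```

Checking raw bytes rather than decoded text is the point of this approach: a comparison on decoded strings could pass even when no conversion happened, whereas the two encodings of U+00E9 are distinguishable at the byte level.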