Thread

  1. Re: C11: should we use char32_t for unicode code points?

    Tatsuo Ishii <ishii@postgresql.org> — 2025-10-24T09:43:15Z

    > Now that we're using C11, should we use char32_t for unicode code
    > points?
    > 
    > Right now, we use pg_wchar for two purposes: 
    > 
    >   1. to abstract away some problems with wchar_t on platforms where
    > it's 16 bits; and
    >   2. hold unicode code point values
    > 
    > In UTF8, they are equivalent and can be freely cast back and forth,
    > but not necessarily in other encodings. That can be confusing in some
    > contexts. Attached is a patch to use char32_t for the second purpose.
    > 
    > Both are equivalent to uint32, so there's no functional change and no
    > actual typechecking, it's just for readability.
    > 
    > Is this helpful, or needless code churn?
    
    Unless char32_t is solely used for the Unicode code point data, I
    think it would be better to define something like "pg_unicode" and use
    it instead of directly using char32_t because it would be cleaner for
    code readers.
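
A minimal sketch of what such an alias might look like, assuming C11's
<uchar.h> is available. The name "pg_unicode" comes from the suggestion
above; the helper pg_unicode_is_valid() is purely hypothetical and is
not an existing PostgreSQL function. As with the proposed patch, a
typedef gives no extra typechecking; it only documents intent.

```c
#include <assert.h>   /* static_assert */
#include <limits.h>   /* CHAR_BIT */
#include <stdbool.h>
#include <uchar.h>    /* char32_t (C11) */

/*
 * Hypothetical alias: marks values that are Unicode code points, as
 * opposed to pg_wchar values, which depend on the encoding in use.
 */
typedef char32_t pg_unicode;

static_assert(sizeof(pg_unicode) * CHAR_BIT >= 32,
              "pg_unicode must be able to hold any code point");

/*
 * Hypothetical helper: true if cp is a Unicode scalar value, i.e. in
 * the range 0..0x10FFFF and not a UTF-16 surrogate (0xD800..0xDFFF).
 */
static inline bool
pg_unicode_is_valid(pg_unicode cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

If char32_t later turned out to be needed for something other than code
points, only the typedef would have to change, not every declaration.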
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese: http://www.sraoss.co.jp