Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Force strings passed to and from plperl to be in UTF8 encoding.

  1. pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Andrew Dunstan <andrew@dunslane.net> — 2011-02-06T22:31:13Z

    Force strings passed to and from plperl to be in UTF8 encoding.
    
    String are converted to UTF8 on the way into perl and to the
    database encoding on the way back. This avoids a number of
    observed anomalies, and ensures Perl a consistent view of the
    world.
    
    Some minor code cleanups are also accomplished.
    
    Alex Hunsaker, reviewed by Andy Colson.
    
    Branch
    ------
    master
    
    Details
    -------
    http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=50d89d422f9c68a52a6964e5468e8eb4f90b1d95
    
    Modified Files
    --------------
    doc/src/sgml/plperl.sgml       |    8 ++
    src/pl/plperl/SPI.xs           |   52 ++++++---
    src/pl/plperl/Util.xs          |   66 +++++-----
    src/pl/plperl/plperl.c         |  260 +++++++++++++++++++++++-----------------
    src/pl/plperl/plperl_helpers.h |   69 +++++++++++
    5 files changed, 295 insertions(+), 160 deletions(-)
    
    
  2. Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Andrew Dunstan <andrew@dunslane.net> — 2011-02-07T01:02:05Z

    
    
    
    
    On 02/06/2011 05:31 PM, Andrew Dunstan wrote:
    > Force strings passed to and from plperl to be in UTF8 encoding.
    >
    > String are converted to UTF8 on the way into perl and to the
    > database encoding on the way back. This avoids a number of
    > observed anomalies, and ensures Perl a consistent view of the
    > world.
    >
    > Some minor code cleanups are also accomplished.
    >
    > Alex Hunsaker, reviewed by Andy Colson.
    
    
    This has broken the buildfarm :-(
    
    perl ppport.h reports:
    
        *** WARNING: Uses HeUTF8, which may not be portable below perl
        5.11.0, even with 'ppport.h'
    
    
    Experimentation on a CentOS machine suggests we can cure it with this:
    
        #ifndef HeUTF8
        #define HeUTF8(he)             ((HeKLEN(he) == HEf_SVKEY) ?            \
                                         SvUTF8(HeKEY_sv(he))
        :                 \
                                         (U32)HeKUTF8(he))
        #endif
    
    
    cheers
    
    andrew
    
    
  3. Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-02-07T02:13:22Z

    On Sun, Feb 6, 2011 at 18:02, Andrew Dunstan <andrew@dunslane.net> wrote:
    >
    >
    >
    >
    >
    > On 02/06/2011 05:31 PM, Andrew Dunstan wrote:
    >>
    >> Force strings passed to and from plperl to be in UTF8 encoding.
    >>
    >> String are converted to UTF8 on the way into perl and to the
    >> database encoding on the way back. This avoids a number of
    >> observed anomalies, and ensures Perl a consistent view of the
    >> world.
    >>
    >> Some minor code cleanups are also accomplished.
    >>
    >> Alex Hunsaker, reviewed by Andy Colson.
    >
    >
    > This has broken the buildfarm :-(
    
    Drat.
    
    > perl ppport.h reports:
    >
    >   *** WARNING: Uses HeUTF8, which may not be portable below perl
    >   5.11.0, even with 'ppport.h'
    > Experimentation on a CentOS machine suggests we can cure it with this:
    >
    >   #ifndef HeUTF8
    >   #define HeUTF8(he)             ((HeKLEN(he) == HEf_SVKEY) ?            \
    >                                    SvUTF8(HeKEY_sv(he))
    >   :                 \
    >                                    (U32)HeKUTF8(he))
    >   #endif
    
    Yeah, that should work. BTW I would have loved to add some regression
    tests for some of this (like the example hek2cstr states). Is there
    any way to do that?
    
    
  4. Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Andrew Dunstan <andrew@dunslane.net> — 2011-02-07T02:52:48Z

    
    On 02/06/2011 09:13 PM, Alex Hunsaker wrote:
    > I would have loved to add some regression
    > tests for some of this (like the example hek2cstr states). Is there
    > any way to do that?
    >
    
    I can't think of an obvious way. Anyone else?
    
    cheers
    
    ndrew
    
    
  5. Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-02-12T09:18:36Z

    On Sun, Feb 6, 2011 at 15:31, Andrew Dunstan <andrew@dunslane.net> wrote:
    > Force strings passed to and from plperl to be in UTF8 encoding.
    >
    > String are converted to UTF8 on the way into perl and to the
    > database encoding on the way back. This avoids a number of
    > observed anomalies, and ensures Perl a consistent view of the
    > world.
    
    So I noticed a problem while playing with this in my discussion with
    David Wheeler. pg_do_encoding() does nothing when the src encoding ==
    the dest encoding. That means on a UTF-8 database we fail make sure
    our strings are valid utf8.
    
    An easy way to see this is to embed a null in the middle of a string:
    => create or replace function zerob() returns text as $$ return
    "abcd\0efg"; $$ language plperl;
    => SELECT zerob();
    abcd
    
    Also It seems bogus to bogus to do any encoding conversion when we are
    SQL_ASCII, and its really trivial to fix.
    
    With the attached:
    - when we are on a utf8 database make sure to verify our output string
    in sv2cstr (we assume database strings coming in are already valid)
    
    - Do no string conversion when we are SQL_ASCII in or out
    
    - add plperl_helpers.h as a dep to plperl.o in our makefile
    
    - remove some redundant calls to pg_verify_mbstr()
    
    - as utf_e2u only as one caller dont pstrdup() instead have the caller
    check (saves some cycles and memory)
    
  6. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-03T10:20:30Z

    On 12 February 2011 14:48, Alex Hunsaker <badalex@gmail.com> wrote:
    > On Sun, Feb 6, 2011 at 15:31, Andrew Dunstan <andrew@dunslane.net> wrote:
    >> Force strings passed to and from plperl to be in UTF8 encoding.
    >>
    >> String are converted to UTF8 on the way into perl and to the
    >> database encoding on the way back. This avoids a number of
    >> observed anomalies, and ensures Perl a consistent view of the
    >> world.
    >
    > So I noticed a problem while playing with this in my discussion with
    > David Wheeler. pg_do_encoding() does nothing when the src encoding ==
    > the dest encoding. That means on a UTF-8 database we fail make sure
    > our strings are valid utf8.
    >
    > An easy way to see this is to embed a null in the middle of a string:
    > => create or replace function zerob() returns text as $$ return
    > "abcd\0efg"; $$ language plperl;
    > => SELECT zerob();
    > abcd
    >
    > Also It seems bogus to bogus to do any encoding conversion when we are
    > SQL_ASCII, and its really trivial to fix.
    >
    > With the attached:
    > - when we are on a utf8 database make sure to verify our output string
    > in sv2cstr (we assume database strings coming in are already valid)
    >
    > - Do no string conversion when we are SQL_ASCII in or out
    >
    > - add plperl_helpers.h as a dep to plperl.o in our makefile
    >
    > - remove some redundant calls to pg_verify_mbstr()
    >
    > - as utf_e2u only as one caller dont pstrdup() instead have the caller
    > check (saves some cycles and memory)
    >
    
    Is there a plan to commit this issue? I am still seeing this issue on
    PG 9.1 STABLE branch. Attached is a small patch that targets only the
    specific issue in the described testcase :
    
    create or replace function zerob() returns text as $$ return
    "abcd\0efg"; $$ language plperl;
    SELECT zerob();
    
    The patch does the perl data validation in the function utf_u2e() itself.
    
    >
    > --
    > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    > To make changes to your subscription:
    > http://www.postgresql.org/mailpref/pgsql-hackers
    >
    >
    
  7. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-10-03T17:07:01Z

    On Mon, Oct 3, 2011 at 04:20, Amit Khandekar
    <amit.khandekar@enterprisedb.com> wrote:
    
    > Is there a plan to commit this issue? I am still seeing this issue on
    > PG 9.1 STABLE branch. Attached is a small patch that targets only the
    > specific issue in the described testcase :
    >
    > create or replace function zerob() returns text as $$ return
    > "abcd\0efg"; $$ language plperl;
    > SELECT zerob();
    >
    > The patch does the perl data validation in the function utf_u2e() itself.
    
    I think thats fine, but as coded it will verify the string twice in
    the GetDatabaseEncoding() != PG_UTF8 case (once for
    pg_do_encoding_conversion() and again with the added
    pg_verify_mbstr_len), which seems a bit wasteful.
    
    It might be worth adding a regression test also...
    
    
  8. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-04T05:35:21Z

    On 3 October 2011 22:37, Alex Hunsaker <badalex@gmail.com> wrote:
    > On Mon, Oct 3, 2011 at 04:20, Amit Khandekar
    > <amit.khandekar@enterprisedb.com> wrote:
    >
    >> Is there a plan to commit this issue? I am still seeing this issue on
    >> PG 9.1 STABLE branch. Attached is a small patch that targets only the
    >> specific issue in the described testcase :
    >>
    >> create or replace function zerob() returns text as $$ return
    >> "abcd\0efg"; $$ language plperl;
    >> SELECT zerob();
    >>
    >> The patch does the perl data validation in the function utf_u2e() itself.
    >
    > I think thats fine, but as coded it will verify the string twice in
    > the GetDatabaseEncoding() != PG_UTF8 case (once for
    > pg_do_encoding_conversion() and again with the added
    > pg_verify_mbstr_len), which seems a bit wasteful.
    
    WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
    utf8_str, so pg_verify_mbstr_len() will not get called. That's the
    reason, pg_verify_mbstr_len() is under the ( ret == utf8_str )
    condition.
    
    
    >
    > It might be worth adding a regression test also...
    
    I could not find any basic pl/perl tests in the regression
    serial_schedule. I am not sure if we want to add just this scenario
    without any basic tests for pl/perl ?
    
    >
    
    
  9. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-04T06:43:53Z

    On 04.10.2011 08:35, Amit Khandekar wrote:
    > On 3 October 2011 22:37, Alex Hunsaker<badalex@gmail.com>  wrote:
    >> It might be worth adding a regression test also...
    >
    > I could not find any basic pl/perl tests in the regression
    > serial_schedule. I am not sure if we want to add just this scenario
    > without any basic tests for pl/perl ?
    
    See src/pl/plperl/[sql|expected]
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com
    
    
  10. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-10-04T08:34:14Z

    On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
    <amit.khandekar@enterprisedb.com> wrote:
    
    > WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
    > utf8_str, so pg_verify_mbstr_len() will not get called. That's the
    > reason, pg_verify_mbstr_len() is under the ( ret == utf8_str )
    > condition.
    
    Consider a latin1 database where utf8_str was a string of ascii
    characters. Then no conversion would take place and ret == utf8_str
    but the string would be verified by pg_do_encdoing_conversion() and
    verified again by your added check :-).
    
    >> It might be worth adding a regression test also...
    >
    > I could not find any basic pl/perl tests in the regression
    > serial_schedule. I am not sure if we want to add just this scenario
    > without any basic tests for pl/perl ?
    
    I went ahead and added one in the attached based upon your example.
    
    Look ok to you?
    
    BTW thanks for the patch!
    
    [ side note ]
    I still think we should not be doing any conversion in the SQL_ASCII
    case but this slimmed down patch should be less controversial.
    
  11. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-04T09:09:44Z

    On 4 October 2011 14:04, Alex Hunsaker <badalex@gmail.com> wrote:
    > On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
    > <amit.khandekar@enterprisedb.com> wrote:
    >
    >> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
    >> utf8_str, so pg_verify_mbstr_len() will not get called. That's the
    >> reason, pg_verify_mbstr_len() is under the ( ret == utf8_str )
    >> condition.
    >
    > Consider a latin1 database where utf8_str was a string of ascii
    > characters. Then no conversion would take place and ret == utf8_str
    > but the string would be verified by pg_do_encdoing_conversion() and
    > verified again by your added check :-).
    >
    >>> It might be worth adding a regression test also...
    >>
    >> I could not find any basic pl/perl tests in the regression
    >> serial_schedule. I am not sure if we want to add just this scenario
    >> without any basic tests for pl/perl ?
    >
    > I went ahead and added one in the attached based upon your example.
    >
    > Look ok to you?
    >
    
    + 	if(GetDatabaseEncoding() == PG_UTF8)
    + 		pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    
    In your patch, the above will again skip mb-validation if the database
    encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
    the un-converted string even if *one* of the src and dest encodings is
    SQL_ASCII.
    
    I think :
            if (ret == utf8_str)
    +       {
    +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
                    ret = pstrdup(ret);
    +       }
    
    This (ret == utf8_str) condition would be a reliable way for knowing
    whether pg_do_encoding_conversion() has done the conversion at all.
    
    I am ok with the new test. Thanks for doing it yourself !
    
    
    > BTW thanks for the patch!
    >
    > [ side note ]
    > I still think we should not be doing any conversion in the SQL_ASCII
    > case but this slimmed down patch should be less controversial.
    >
    
    
  12. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-10-04T17:27:32Z

    On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
    <amit.khandekar@enterprisedb.com> wrote:
    > On 4 October 2011 14:04, Alex Hunsaker <badalex@gmail.com> wrote:
    >> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
    >> <amit.khandekar@enterprisedb.com> wrote:
    >>
    >>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
    >>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
    >>
    >> Consider a latin1 database where utf8_str was a string of ascii
    >> characters. [...]
    
    >> [Patch] Look ok to you?
    >>
    >
    > +       if(GetDatabaseEncoding() == PG_UTF8)
    > +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >
    > In your patch, the above will again skip mb-validation if the database
    > encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
    > the un-converted string even if *one* of the src and dest encodings is
    > SQL_ASCII.
    
    *scratches head* I thought the point of SQL_ASCII was no encoding
    conversion was done and so there would be nothing to verify.
    
    Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
    NULL bytes in the string when we are a single byte encoding.
    
    > I think :
    >        if (ret == utf8_str)
    > +       {
    > +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >                ret = pstrdup(ret);
    > +       }
    >
    > This (ret == utf8_str) condition would be a reliable way for knowing
    > whether pg_do_encoding_conversion() has done the conversion at all.
    
    Yes. However (and maybe im nitpicking here), I dont see any reason to
    verify certain strings twice if we can avoid it.
    
    What do you think about:
    +    /*
    +    * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
    +    * will not do any conversion or verification. we need to do it
    manually instead.
    +    */
    +       if( GetDatabaseEncoding() == PG_UTF8 ||
                  GetDatabaseEncoding() == SQL_ASCII)
    +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    
    
  13. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-05T05:46:02Z

    On 4 October 2011 22:57, Alex Hunsaker <badalex@gmail.com> wrote:
    > On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
    > <amit.khandekar@enterprisedb.com> wrote:
    >> On 4 October 2011 14:04, Alex Hunsaker <badalex@gmail.com> wrote:
    >>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
    >>> <amit.khandekar@enterprisedb.com> wrote:
    >>>
    >>>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
    >>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
    >>>
    >>> Consider a latin1 database where utf8_str was a string of ascii
    >>> characters. [...]
    >
    >>> [Patch] Look ok to you?
    >>>
    >>
    >> +       if(GetDatabaseEncoding() == PG_UTF8)
    >> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >>
    >> In your patch, the above will again skip mb-validation if the database
    >> encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
    >> the un-converted string even if *one* of the src and dest encodings is
    >> SQL_ASCII.
    >
    > *scratches head* I thought the point of SQL_ASCII was no encoding
    > conversion was done and so there would be nothing to verify.
    >
    > Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
    > NULL bytes in the string when we are a single byte encoding.
    >
    >> I think :
    >>        if (ret == utf8_str)
    >> +       {
    >> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >>                ret = pstrdup(ret);
    >> +       }
    >>
    >> This (ret == utf8_str) condition would be a reliable way for knowing
    >> whether pg_do_encoding_conversion() has done the conversion at all.
    >
    > Yes. However (and maybe im nitpicking here), I dont see any reason to
    > verify certain strings twice if we can avoid it.
    >
    > What do you think about:
    > +    /*
    > +    * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
    > +    * will not do any conversion or verification. we need to do it
    > manually instead.
    > +    */
    > +       if( GetDatabaseEncoding() == PG_UTF8 ||
    >              GetDatabaseEncoding() == SQL_ASCII)
    > +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >
    
    You mean the final changes in plperl_helpers.h would look like
    something like this right? :
    
     static inline char *
     utf_u2e(const char *utf8_str, size_t len)
     {
            char       *ret = (char *) pg_do_encoding_conversion((unsigned
    char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
    
            if (ret == utf8_str)
    +       {
    +               if (GetDatabaseEncoding() == PG_UTF8 ||
    +                       GetDatabaseEncoding() == PG_SQL_ASCII)
    +               {
    +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    +               }
    +
                    ret = pstrdup(ret);
    +       }
            return ret;
     }
    
    
    Yeah I am ok with that. It's just an additional check besides (ret ==
    utf8_str) to know if we really require validation.
    
    
  14. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-10-05T06:30:01Z

    On Tue, Oct 4, 2011 at 23:46, Amit Khandekar
    <amit.khandekar@enterprisedb.com> wrote:
    > On 4 October 2011 22:57, Alex Hunsaker <badalex@gmail.com> wrote:
    >> On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
    >> <amit.khandekar@enterprisedb.com> wrote:
    >>> On 4 October 2011 14:04, Alex Hunsaker <badalex@gmail.com> wrote:
    >>>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
    >>>> <amit.khandekar@enterprisedb.com> wrote:
    >>>>
    >>>>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
    >>>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
    >>>>
    >>>> Consider a latin1 database where utf8_str was a string of ascii
    >>>> characters. [...]
    >>
    >>>> [Patch] Look ok to you?
    >>>>
    >>>
    >>> +       if(GetDatabaseEncoding() == PG_UTF8)
    >>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >>>
    >>> In your patch, the above will again skip mb-validation if the database
    >>> encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
    >>> the un-converted string even if *one* of the src and dest encodings is
    >>> SQL_ASCII.
    >>
    >> *scratches head* I thought the point of SQL_ASCII was no encoding
    >> conversion was done and so there would be nothing to verify.
    >>
    >> Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
    >> NULL bytes in the string when we are a single byte encoding.
    >>
    >>> I think :
    >>>        if (ret == utf8_str)
    >>> +       {
    >>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >>>                ret = pstrdup(ret);
    >>> +       }
    >>>
    >>> This (ret == utf8_str) condition would be a reliable way for knowing
    >>> whether pg_do_encoding_conversion() has done the conversion at all.
    >>
    >> Yes. However (and maybe im nitpicking here), I dont see any reason to
    >> verify certain strings twice if we can avoid it.
    >>
    >> What do you think about:
    >> +    /*
    >> +    * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
    >> +    * will not do any conversion or verification. we need to do it
    >> manually instead.
    >> +    */
    >> +       if( GetDatabaseEncoding() == PG_UTF8 ||
    >>              GetDatabaseEncoding() == SQL_ASCII)
    >> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >>
    >
    > You mean the final changes in plperl_helpers.h would look like
    > something like this right? :
    >
    >  static inline char *
    >  utf_u2e(const char *utf8_str, size_t len)
    >  {
    >        char       *ret = (char *) pg_do_encoding_conversion((unsigned
    > char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
    >
    >        if (ret == utf8_str)
    > +       {
    > +               if (GetDatabaseEncoding() == PG_UTF8 ||
    > +                       GetDatabaseEncoding() == PG_SQL_ASCII)
    > +               {
    > +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    > +               }
    > +
    >                ret = pstrdup(ret);
    > +       }
    >        return ret;
    >  }
    
    Yes.
    
    > Yeah I am ok with that. It's just an additional check besides (ret ==
    > utf8_str) to know if we really require validation.
    >
    
    
  15. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-10-05T06:59:47Z

    On Wed, Oct 5, 2011 at 00:30, Alex Hunsaker <badalex@gmail.com> wrote:
    > On Tue, Oct 4, 2011 at 23:46, Amit Khandekar
    > <amit.khandekar@enterprisedb.com> wrote:
    
    >> You mean the final changes in plperl_helpers.h would look like
    >> something like this right? :
    >>
    >>  static inline char *
    >>  utf_u2e(const char *utf8_str, size_t len)
    >>  {
    >>        char       *ret = (char *) pg_do_encoding_conversion((unsigned
    >> char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
    >>
    >>        if (ret == utf8_str)
    >> +       {
    >> +               if (GetDatabaseEncoding() == PG_UTF8 ||
    >> +                       GetDatabaseEncoding() == PG_SQL_ASCII)
    >> +               {
    >> +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >> +               }
    >> +
    >>                ret = pstrdup(ret);
    >> +       }
    >>        return ret;
    >>  }
    >
    
    >> Yeah I am ok with that. It's just an additional check besides (ret ==
    >> utf8_str) to know if we really require validation.
    
    Find it attached. [ Note I didn't put the check inside the if (ret ==
    utf8_str) as it seemed a bit cleaner (indentation wise) to have it
    outside ]
    
  16. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-05T07:58:51Z

    On 5 October 2011 12:29, Alex Hunsaker <badalex@gmail.com> wrote:
    > On Wed, Oct 5, 2011 at 00:30, Alex Hunsaker <badalex@gmail.com> wrote:
    >> On Tue, Oct 4, 2011 at 23:46, Amit Khandekar
    >> <amit.khandekar@enterprisedb.com> wrote:
    >
    >>> You mean the final changes in plperl_helpers.h would look like
    >>> something like this right? :
    >>>
    >>>  static inline char *
    >>>  utf_u2e(const char *utf8_str, size_t len)
    >>>  {
    >>>        char       *ret = (char *) pg_do_encoding_conversion((unsigned
    >>> char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
    >>>
    >>>        if (ret == utf8_str)
    >>> +       {
    >>> +               if (GetDatabaseEncoding() == PG_UTF8 ||
    >>> +                       GetDatabaseEncoding() == PG_SQL_ASCII)
    >>> +               {
    >>> +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
    >>> +               }
    >>> +
    >>>                ret = pstrdup(ret);
    >>> +       }
    >>>        return ret;
    >>>  }
    >>
    >
    >>> Yeah I am ok with that. It's just an additional check besides (ret ==
    >>> utf8_str) to know if we really require validation.
    >
    > Find it attached. [ Note I didn't put the check inside the if (ret ==
    > utf8_str) as it seemed a bit cleaner (indentation wise) to have it
    > outside ]
    
    I have no more issues with the patch.
    Thanks!
    -Amit
    
    >
    
    
  17. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Robert Haas <robertmhaas@gmail.com> — 2011-10-05T14:18:16Z

    On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
    <amit.khandekar@enterprisedb.com> wrote:
    > I have no more issues with the patch.
    > Thanks!
    
    I think this patch needs to be added to the open CommitFest, with
    links to the reviews, and marked Ready for Committer.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  18. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-10-05T21:03:41Z

    On Wed, Oct 5, 2011 at 08:18, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
    > <amit.khandekar@enterprisedb.com> wrote:
    >> I have no more issues with the patch.
    >> Thanks!
    >
    > I think this patch needs to be added to the open CommitFest, with
    > links to the reviews, and marked Ready for Committer.
    
    The open commitfest? Even if its an "important" bug fix that should be
    backpatched?
    
    
  19. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Robert Haas <robertmhaas@gmail.com> — 2011-10-06T02:36:27Z

    On Wed, Oct 5, 2011 at 5:03 PM, Alex Hunsaker <badalex@gmail.com> wrote:
    > On Wed, Oct 5, 2011 at 08:18, Robert Haas <robertmhaas@gmail.com> wrote:
    >> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
    >> <amit.khandekar@enterprisedb.com> wrote:
    >>> I have no more issues with the patch.
    >>> Thanks!
    >>
    >> I think this patch needs to be added to the open CommitFest, with
    >> links to the reviews, and marked Ready for Committer.
    >
    > The open commitfest? Even if its an "important" bug fix that should be
    > backpatched?
    
    Considering that the issue appears to have been ignored from
    mid-February until early October, I don't see why it should now get to
    jump to the head of the queue.  Other people may have different
    opinions, of course.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  20. Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    David Wheeler <david@kineticode.com> — 2011-10-06T16:19:23Z

    On Oct 5, 2011, at 7:36 PM, Robert Haas wrote:
    
    >> The open commitfest? Even if its an "important" bug fix that should be
    >> backpatched?
    > 
    > Considering that the issue appears to have been ignored from
    > mid-February until early October, I don't see why it should now get to
    > jump to the head of the queue.  Other people may have different
    > opinions, of course.
    
    Proper encoding of text data to and from Perl is more important every day, as it's used for more and more uses beyond ASCII. I think it's important to backport such issues modulo behavioral changes.
    
    Best,
    
    David
    
    
    
  21. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-10-07T16:51:46Z

    On Wed, Oct 5, 2011 at 20:36, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Wed, Oct 5, 2011 at 5:03 PM, Alex Hunsaker <badalex@gmail.com> wrote:
    >> On Wed, Oct 5, 2011 at 08:18, Robert Haas <robertmhaas@gmail.com> wrote:
    >>> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
    >>> <amit.khandekar@enterprisedb.com> wrote:
    >>>> I have no more issues with the patch.
    >>>> Thanks!
    >>>
    >>> I think this patch needs to be added to the open CommitFest, with
    >>> links to the reviews, and marked Ready for Committer.
    >>
    >> The open commitfest? Even if its an "important" bug fix that should be
    >> backpatched?
    >
    > Considering that the issue appears to have been ignored from
    > mid-February until early October, I don't see why it should now get to
    > jump to the head of the queue.  Other people may have different
    > opinions, of course.
    
    Added. :-)
    
    
  22. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Andrew Dunstan <andrew@dunslane.net> — 2011-11-02T23:12:24Z

    
    On 10/07/2011 12:51 PM, Alex Hunsaker wrote:
    > On Wed, Oct 5, 2011 at 20:36, Robert Haas<robertmhaas@gmail.com>  wrote:
    >> On Wed, Oct 5, 2011 at 5:03 PM, Alex Hunsaker<badalex@gmail.com>  wrote:
    >>> On Wed, Oct 5, 2011 at 08:18, Robert Haas<robertmhaas@gmail.com>  wrote:
    >>>> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
    >>>> <amit.khandekar@enterprisedb.com>  wrote:
    >>>>> I have no more issues with the patch.
    >>>>> Thanks!
    >>>> I think this patch needs to be added to the open CommitFest, with
    >>>> links to the reviews, and marked Ready for Committer.
    >>> The open commitfest? Even if its an "important" bug fix that should be
    >>> backpatched?
    >> Considering that the issue appears to have been ignored from
    >> mid-February until early October, I don't see why it should now get to
    >> jump to the head of the queue.  Other people may have different
    >> opinions, of course.
    > Added. :-)
    >
    
    I'm just starting to look at this, by way of a break in staring at 
    pg_dump code ;-). This just needs to be backpatched to 9.1, is that right?
    
    cheers
    
    andrew
    
    
  23. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Alex Hunsaker <badalex@gmail.com> — 2011-11-02T23:39:19Z

    On Wed, Nov 2, 2011 at 17:12, Andrew Dunstan <andrew@dunslane.net> wrote:
    >>> Considering that the issue appears to have been ignored from
    >>> mid-February until early October, I don't see why it should now get to
    >>> jump to the head of the queue.  Other people may have different
    >>> opinions, of course.
    >>
    >> Added. :-)
    >>
    >
    > I'm just starting to look at this, by way of a break in staring at pg_dump
    > code ;-). This just needs to be backpatched to 9.1, is that right?
    
    Yes please.
    
    9.0 did not have this problem (or at least if it did it was a separate issue).
    
    
  24. Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

    Andrew Dunstan <andrew@dunslane.net> — 2011-11-26T17:23:30Z

    
    On 10/05/2011 03:58 AM, Amit Khandekar wrote:
    > On 5 October 2011 12:29, Alex Hunsaker<badalex@gmail.com>  wrote:
    >>
    >> Find it attached. [ Note I didn't put the check inside the if (ret ==
    >> utf8_str) as it seemed a bit cleaner (indentation wise) to have it
    >> outside ]
    > I have no more issues with the patch.
    >
    
    Committed.
    
    cheers
    
    andrew