Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Force strings passed to and from plperl to be in UTF8 encoding.
- 50d89d422f9c 9.1.0 cited

pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Andrew Dunstan <andrew@dunslane.net> — 2011-02-06T22:31:13Z

Force strings passed to and from plperl to be in UTF8 encoding.

String are converted to UTF8 on the way into perl and to the
database encoding on the way back. This avoids a number of
observed anomalies, and ensures Perl a consistent view of the
world.

Some minor code cleanups are also accomplished.

Alex Hunsaker, reviewed by Andy Colson.

Branch
------
master

Details
-------
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=50d89d422f9c68a52a6964e5468e8eb4f90b1d95

Modified Files
--------------
doc/src/sgml/plperl.sgml       |    8 ++
src/pl/plperl/SPI.xs           |   52 ++++++---
src/pl/plperl/Util.xs          |   66 +++++-----
src/pl/plperl/plperl.c         |  260 +++++++++++++++++++++++-----------------
src/pl/plperl/plperl_helpers.h |   69 +++++++++++
5 files changed, 295 insertions(+), 160 deletions(-)

Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Andrew Dunstan <andrew@dunslane.net> — 2011-02-07T01:02:05Z





On 02/06/2011 05:31 PM, Andrew Dunstan wrote:
> Force strings passed to and from plperl to be in UTF8 encoding.
>
> String are converted to UTF8 on the way into perl and to the
> database encoding on the way back. This avoids a number of
> observed anomalies, and ensures Perl a consistent view of the
> world.
>
> Some minor code cleanups are also accomplished.
>
> Alex Hunsaker, reviewed by Andy Colson.


This has broken the buildfarm :-(

perl ppport.h reports:

    *** WARNING: Uses HeUTF8, which may not be portable below perl
    5.11.0, even with 'ppport.h'


Experimentation on a CentOS machine suggests we can cure it with this:

    #ifndef HeUTF8
    #define HeUTF8(he)             ((HeKLEN(he) == HEf_SVKEY) ?            \
                                     SvUTF8(HeKEY_sv(he))
    :                 \
                                     (U32)HeKUTF8(he))
    #endif


cheers

andrew

Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-02-07T02:13:22Z

On Sun, Feb 6, 2011 at 18:02, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
>
>
>
> On 02/06/2011 05:31 PM, Andrew Dunstan wrote:
>>
>> Force strings passed to and from plperl to be in UTF8 encoding.
>>
>> String are converted to UTF8 on the way into perl and to the
>> database encoding on the way back. This avoids a number of
>> observed anomalies, and ensures Perl a consistent view of the
>> world.
>>
>> Some minor code cleanups are also accomplished.
>>
>> Alex Hunsaker, reviewed by Andy Colson.
>
>
> This has broken the buildfarm :-(

Drat.

> perl ppport.h reports:
>
>   *** WARNING: Uses HeUTF8, which may not be portable below perl
>   5.11.0, even with 'ppport.h'
> Experimentation on a CentOS machine suggests we can cure it with this:
>
>   #ifndef HeUTF8
>   #define HeUTF8(he)             ((HeKLEN(he) == HEf_SVKEY) ?            \
>                                    SvUTF8(HeKEY_sv(he))
>   :                 \
>                                    (U32)HeKUTF8(he))
>   #endif

Yeah, that should work. BTW I would have loved to add some regression
tests for some of this (like the example hek2cstr states). Is there
any way to do that?

Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Andrew Dunstan <andrew@dunslane.net> — 2011-02-07T02:52:48Z


On 02/06/2011 09:13 PM, Alex Hunsaker wrote:
> I would have loved to add some regression
> tests for some of this (like the example hek2cstr states). Is there
> any way to do that?
>

I can't think of an obvious way. Anyone else?

cheers

ndrew

Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-02-12T09:18:36Z

On Sun, Feb 6, 2011 at 15:31, Andrew Dunstan <andrew@dunslane.net> wrote:
> Force strings passed to and from plperl to be in UTF8 encoding.
>
> String are converted to UTF8 on the way into perl and to the
> database encoding on the way back. This avoids a number of
> observed anomalies, and ensures Perl a consistent view of the
> world.

So I noticed a problem while playing with this in my discussion with
David Wheeler. pg_do_encoding() does nothing when the src encoding ==
the dest encoding. That means on a UTF-8 database we fail make sure
our strings are valid utf8.

An easy way to see this is to embed a null in the middle of a string:
=> create or replace function zerob() returns text as $$ return
"abcd\0efg"; $$ language plperl;
=> SELECT zerob();
abcd

Also It seems bogus to bogus to do any encoding conversion when we are
SQL_ASCII, and its really trivial to fix.

With the attached:
- when we are on a utf8 database make sure to verify our output string
in sv2cstr (we assume database strings coming in are already valid)

- Do no string conversion when we are SQL_ASCII in or out

- add plperl_helpers.h as a dep to plperl.o in our makefile

- remove some redundant calls to pg_verify_mbstr()

- as utf_e2u only as one caller dont pstrdup() instead have the caller
check (saves some cycles and memory)

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-03T10:20:30Z

On 12 February 2011 14:48, Alex Hunsaker <badalex@gmail.com> wrote:
> On Sun, Feb 6, 2011 at 15:31, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Force strings passed to and from plperl to be in UTF8 encoding.
>>
>> String are converted to UTF8 on the way into perl and to the
>> database encoding on the way back. This avoids a number of
>> observed anomalies, and ensures Perl a consistent view of the
>> world.
>
> So I noticed a problem while playing with this in my discussion with
> David Wheeler. pg_do_encoding() does nothing when the src encoding ==
> the dest encoding. That means on a UTF-8 database we fail make sure
> our strings are valid utf8.
>
> An easy way to see this is to embed a null in the middle of a string:
> => create or replace function zerob() returns text as $$ return
> "abcd\0efg"; $$ language plperl;
> => SELECT zerob();
> abcd
>
> Also It seems bogus to bogus to do any encoding conversion when we are
> SQL_ASCII, and its really trivial to fix.
>
> With the attached:
> - when we are on a utf8 database make sure to verify our output string
> in sv2cstr (we assume database strings coming in are already valid)
>
> - Do no string conversion when we are SQL_ASCII in or out
>
> - add plperl_helpers.h as a dep to plperl.o in our makefile
>
> - remove some redundant calls to pg_verify_mbstr()
>
> - as utf_e2u only as one caller dont pstrdup() instead have the caller
> check (saves some cycles and memory)
>

Is there a plan to commit this issue? I am still seeing this issue on
PG 9.1 STABLE branch. Attached is a small patch that targets only the
specific issue in the described testcase :

create or replace function zerob() returns text as $$ return
"abcd\0efg"; $$ language plperl;
SELECT zerob();

The patch does the perl data validation in the function utf_u2e() itself.

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-10-03T17:07:01Z

On Mon, Oct 3, 2011 at 04:20, Amit Khandekar
<amit.khandekar@enterprisedb.com> wrote:

> Is there a plan to commit this issue? I am still seeing this issue on
> PG 9.1 STABLE branch. Attached is a small patch that targets only the
> specific issue in the described testcase :
>
> create or replace function zerob() returns text as $$ return
> "abcd\0efg"; $$ language plperl;
> SELECT zerob();
>
> The patch does the perl data validation in the function utf_u2e() itself.

I think thats fine, but as coded it will verify the string twice in
the GetDatabaseEncoding() != PG_UTF8 case (once for
pg_do_encoding_conversion() and again with the added
pg_verify_mbstr_len), which seems a bit wasteful.

It might be worth adding a regression test also...

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-04T05:35:21Z

On 3 October 2011 22:37, Alex Hunsaker <badalex@gmail.com> wrote:
> On Mon, Oct 3, 2011 at 04:20, Amit Khandekar
> <amit.khandekar@enterprisedb.com> wrote:
>
>> Is there a plan to commit this issue? I am still seeing this issue on
>> PG 9.1 STABLE branch. Attached is a small patch that targets only the
>> specific issue in the described testcase :
>>
>> create or replace function zerob() returns text as $$ return
>> "abcd\0efg"; $$ language plperl;
>> SELECT zerob();
>>
>> The patch does the perl data validation in the function utf_u2e() itself.
>
> I think thats fine, but as coded it will verify the string twice in
> the GetDatabaseEncoding() != PG_UTF8 case (once for
> pg_do_encoding_conversion() and again with the added
> pg_verify_mbstr_len), which seems a bit wasteful.

WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
utf8_str, so pg_verify_mbstr_len() will not get called. That's the
reason, pg_verify_mbstr_len() is under the ( ret == utf8_str )
condition.

>
> It might be worth adding a regression test also...

I could not find any basic pl/perl tests in the regression
serial_schedule. I am not sure if we want to add just this scenario
without any basic tests for pl/perl ?

>

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-04T06:43:53Z

On 04.10.2011 08:35, Amit Khandekar wrote:
> On 3 October 2011 22:37, Alex Hunsaker<badalex@gmail.com>  wrote:
>> It might be worth adding a regression test also...
>
> I could not find any basic pl/perl tests in the regression
> serial_schedule. I am not sure if we want to add just this scenario
> without any basic tests for pl/perl ?

See src/pl/plperl/[sql|expected]

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-10-04T08:34:14Z

On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
<amit.khandekar@enterprisedb.com> wrote:

> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
> utf8_str, so pg_verify_mbstr_len() will not get called. That's the
> reason, pg_verify_mbstr_len() is under the ( ret == utf8_str )
> condition.

Consider a latin1 database where utf8_str was a string of ascii
characters. Then no conversion would take place and ret == utf8_str
but the string would be verified by pg_do_encdoing_conversion() and
verified again by your added check :-).

>> It might be worth adding a regression test also...
>
> I could not find any basic pl/perl tests in the regression
> serial_schedule. I am not sure if we want to add just this scenario
> without any basic tests for pl/perl ?

I went ahead and added one in the attached based upon your example.

Look ok to you?

BTW thanks for the patch!

[ side note ]
I still think we should not be doing any conversion in the SQL_ASCII
case but this slimmed down patch should be less controversial.

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-04T09:09:44Z

On 4 October 2011 14:04, Alex Hunsaker <badalex@gmail.com> wrote:
> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
> <amit.khandekar@enterprisedb.com> wrote:
>
>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
>> utf8_str, so pg_verify_mbstr_len() will not get called. That's the
>> reason, pg_verify_mbstr_len() is under the ( ret == utf8_str )
>> condition.
>
> Consider a latin1 database where utf8_str was a string of ascii
> characters. Then no conversion would take place and ret == utf8_str
> but the string would be verified by pg_do_encdoing_conversion() and
> verified again by your added check :-).
>
>>> It might be worth adding a regression test also...
>>
>> I could not find any basic pl/perl tests in the regression
>> serial_schedule. I am not sure if we want to add just this scenario
>> without any basic tests for pl/perl ?
>
> I went ahead and added one in the attached based upon your example.
>
> Look ok to you?
>

+ 	if(GetDatabaseEncoding() == PG_UTF8)
+ 		pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);

In your patch, the above will again skip mb-validation if the database
encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
the un-converted string even if *one* of the src and dest encodings is
SQL_ASCII.

I think :
        if (ret == utf8_str)
+       {
+               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
                ret = pstrdup(ret);
+       }

This (ret == utf8_str) condition would be a reliable way for knowing
whether pg_do_encoding_conversion() has done the conversion at all.

I am ok with the new test. Thanks for doing it yourself !


> BTW thanks for the patch!
>
> [ side note ]
> I still think we should not be doing any conversion in the SQL_ASCII
> case but this slimmed down patch should be less controversial.
>

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-10-04T17:27:32Z

On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
<amit.khandekar@enterprisedb.com> wrote:
> On 4 October 2011 14:04, Alex Hunsaker <badalex@gmail.com> wrote:
>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
>> <amit.khandekar@enterprisedb.com> wrote:
>>
>>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
>>
>> Consider a latin1 database where utf8_str was a string of ascii
>> characters. [...]

>> [Patch] Look ok to you?
>>
>
> +       if(GetDatabaseEncoding() == PG_UTF8)
> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>
> In your patch, the above will again skip mb-validation if the database
> encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
> the un-converted string even if *one* of the src and dest encodings is
> SQL_ASCII.

*scratches head* I thought the point of SQL_ASCII was no encoding
conversion was done and so there would be nothing to verify.

Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
NULL bytes in the string when we are a single byte encoding.

> I think :
>        if (ret == utf8_str)
> +       {
> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>                ret = pstrdup(ret);
> +       }
>
> This (ret == utf8_str) condition would be a reliable way for knowing
> whether pg_do_encoding_conversion() has done the conversion at all.

Yes. However (and maybe im nitpicking here), I dont see any reason to
verify certain strings twice if we can avoid it.

What do you think about:
+    /*
+    * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
+    * will not do any conversion or verification. we need to do it
manually instead.
+    */
+       if( GetDatabaseEncoding() == PG_UTF8 ||
              GetDatabaseEncoding() == SQL_ASCII)
+               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-05T05:46:02Z

On 4 October 2011 22:57, Alex Hunsaker <badalex@gmail.com> wrote:
> On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
> <amit.khandekar@enterprisedb.com> wrote:
>> On 4 October 2011 14:04, Alex Hunsaker <badalex@gmail.com> wrote:
>>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
>>> <amit.khandekar@enterprisedb.com> wrote:
>>>
>>>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
>>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
>>>
>>> Consider a latin1 database where utf8_str was a string of ascii
>>> characters. [...]
>
>>> [Patch] Look ok to you?
>>>
>>
>> +       if(GetDatabaseEncoding() == PG_UTF8)
>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>
>> In your patch, the above will again skip mb-validation if the database
>> encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
>> the un-converted string even if *one* of the src and dest encodings is
>> SQL_ASCII.
>
> *scratches head* I thought the point of SQL_ASCII was no encoding
> conversion was done and so there would be nothing to verify.
>
> Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
> NULL bytes in the string when we are a single byte encoding.
>
>> I think :
>>        if (ret == utf8_str)
>> +       {
>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>                ret = pstrdup(ret);
>> +       }
>>
>> This (ret == utf8_str) condition would be a reliable way for knowing
>> whether pg_do_encoding_conversion() has done the conversion at all.
>
> Yes. However (and maybe im nitpicking here), I dont see any reason to
> verify certain strings twice if we can avoid it.
>
> What do you think about:
> +    /*
> +    * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
> +    * will not do any conversion or verification. we need to do it
> manually instead.
> +    */
> +       if( GetDatabaseEncoding() == PG_UTF8 ||
>              GetDatabaseEncoding() == SQL_ASCII)
> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>

You mean the final changes in plperl_helpers.h would look like
something like this right? :

 static inline char *
 utf_u2e(const char *utf8_str, size_t len)
 {
        char       *ret = (char *) pg_do_encoding_conversion((unsigned
char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());

        if (ret == utf8_str)
+       {
+               if (GetDatabaseEncoding() == PG_UTF8 ||
+                       GetDatabaseEncoding() == PG_SQL_ASCII)
+               {
+                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
+               }
+
                ret = pstrdup(ret);
+       }
        return ret;
 }


Yeah I am ok with that. It's just an additional check besides (ret ==
utf8_str) to know if we really require validation.

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-10-05T06:30:01Z

On Tue, Oct 4, 2011 at 23:46, Amit Khandekar
<amit.khandekar@enterprisedb.com> wrote:
> On 4 October 2011 22:57, Alex Hunsaker <badalex@gmail.com> wrote:
>> On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
>> <amit.khandekar@enterprisedb.com> wrote:
>>> On 4 October 2011 14:04, Alex Hunsaker <badalex@gmail.com> wrote:
>>>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
>>>> <amit.khandekar@enterprisedb.com> wrote:
>>>>
>>>>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
>>>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
>>>>
>>>> Consider a latin1 database where utf8_str was a string of ascii
>>>> characters. [...]
>>
>>>> [Patch] Look ok to you?
>>>>
>>>
>>> +       if(GetDatabaseEncoding() == PG_UTF8)
>>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>>
>>> In your patch, the above will again skip mb-validation if the database
>>> encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
>>> the un-converted string even if *one* of the src and dest encodings is
>>> SQL_ASCII.
>>
>> *scratches head* I thought the point of SQL_ASCII was no encoding
>> conversion was done and so there would be nothing to verify.
>>
>> Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
>> NULL bytes in the string when we are a single byte encoding.
>>
>>> I think :
>>>        if (ret == utf8_str)
>>> +       {
>>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>>                ret = pstrdup(ret);
>>> +       }
>>>
>>> This (ret == utf8_str) condition would be a reliable way for knowing
>>> whether pg_do_encoding_conversion() has done the conversion at all.
>>
>> Yes. However (and maybe im nitpicking here), I dont see any reason to
>> verify certain strings twice if we can avoid it.
>>
>> What do you think about:
>> +    /*
>> +    * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
>> +    * will not do any conversion or verification. we need to do it
>> manually instead.
>> +    */
>> +       if( GetDatabaseEncoding() == PG_UTF8 ||
>>              GetDatabaseEncoding() == SQL_ASCII)
>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>
>
> You mean the final changes in plperl_helpers.h would look like
> something like this right? :
>
>  static inline char *
>  utf_u2e(const char *utf8_str, size_t len)
>  {
>        char       *ret = (char *) pg_do_encoding_conversion((unsigned
> char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
>
>        if (ret == utf8_str)
> +       {
> +               if (GetDatabaseEncoding() == PG_UTF8 ||
> +                       GetDatabaseEncoding() == PG_SQL_ASCII)
> +               {
> +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
> +               }
> +
>                ret = pstrdup(ret);
> +       }
>        return ret;
>  }

Yes.

> Yeah I am ok with that. It's just an additional check besides (ret ==
> utf8_str) to know if we really require validation.
>

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-10-05T06:59:47Z

On Wed, Oct 5, 2011 at 00:30, Alex Hunsaker <badalex@gmail.com> wrote:
> On Tue, Oct 4, 2011 at 23:46, Amit Khandekar
> <amit.khandekar@enterprisedb.com> wrote:

>> You mean the final changes in plperl_helpers.h would look like
>> something like this right? :
>>
>>  static inline char *
>>  utf_u2e(const char *utf8_str, size_t len)
>>  {
>>        char       *ret = (char *) pg_do_encoding_conversion((unsigned
>> char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
>>
>>        if (ret == utf8_str)
>> +       {
>> +               if (GetDatabaseEncoding() == PG_UTF8 ||
>> +                       GetDatabaseEncoding() == PG_SQL_ASCII)
>> +               {
>> +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>> +               }
>> +
>>                ret = pstrdup(ret);
>> +       }
>>        return ret;
>>  }
>

>> Yeah I am ok with that. It's just an additional check besides (ret ==
>> utf8_str) to know if we really require validation.

Find it attached. [ Note I didn't put the check inside the if (ret ==
utf8_str) as it seemed a bit cleaner (indentation wise) to have it
outside ]

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Amit Khandekar <amit.khandekar@enterprisedb.com> — 2011-10-05T07:58:51Z

On 5 October 2011 12:29, Alex Hunsaker <badalex@gmail.com> wrote:
> On Wed, Oct 5, 2011 at 00:30, Alex Hunsaker <badalex@gmail.com> wrote:
>> On Tue, Oct 4, 2011 at 23:46, Amit Khandekar
>> <amit.khandekar@enterprisedb.com> wrote:
>
>>> You mean the final changes in plperl_helpers.h would look like
>>> something like this right? :
>>>
>>>  static inline char *
>>>  utf_u2e(const char *utf8_str, size_t len)
>>>  {
>>>        char       *ret = (char *) pg_do_encoding_conversion((unsigned
>>> char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
>>>
>>>        if (ret == utf8_str)
>>> +       {
>>> +               if (GetDatabaseEncoding() == PG_UTF8 ||
>>> +                       GetDatabaseEncoding() == PG_SQL_ASCII)
>>> +               {
>>> +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>> +               }
>>> +
>>>                ret = pstrdup(ret);
>>> +       }
>>>        return ret;
>>>  }
>>
>
>>> Yeah I am ok with that. It's just an additional check besides (ret ==
>>> utf8_str) to know if we really require validation.
>
> Find it attached. [ Note I didn't put the check inside the if (ret ==
> utf8_str) as it seemed a bit cleaner (indentation wise) to have it
> outside ]

I have no more issues with the patch.
Thanks!
-Amit

>

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Robert Haas <robertmhaas@gmail.com> — 2011-10-05T14:18:16Z

On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
<amit.khandekar@enterprisedb.com> wrote:
> I have no more issues with the patch.
> Thanks!

I think this patch needs to be added to the open CommitFest, with
links to the reviews, and marked Ready for Committer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-10-05T21:03:41Z

On Wed, Oct 5, 2011 at 08:18, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
> <amit.khandekar@enterprisedb.com> wrote:
>> I have no more issues with the patch.
>> Thanks!
>
> I think this patch needs to be added to the open CommitFest, with
> links to the reviews, and marked Ready for Committer.

The open commitfest? Even if its an "important" bug fix that should be
backpatched?

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Robert Haas <robertmhaas@gmail.com> — 2011-10-06T02:36:27Z

On Wed, Oct 5, 2011 at 5:03 PM, Alex Hunsaker <badalex@gmail.com> wrote:
> On Wed, Oct 5, 2011 at 08:18, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
>> <amit.khandekar@enterprisedb.com> wrote:
>>> I have no more issues with the patch.
>>> Thanks!
>>
>> I think this patch needs to be added to the open CommitFest, with
>> links to the reviews, and marked Ready for Committer.
>
> The open commitfest? Even if its an "important" bug fix that should be
> backpatched?

Considering that the issue appears to have been ignored from
mid-February until early October, I don't see why it should now get to
jump to the head of the queue.  Other people may have different
opinions, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

David Wheeler <david@kineticode.com> — 2011-10-06T16:19:23Z

On Oct 5, 2011, at 7:36 PM, Robert Haas wrote:

>> The open commitfest? Even if its an "important" bug fix that should be
>> backpatched?
> 
> Considering that the issue appears to have been ignored from
> mid-February until early October, I don't see why it should now get to
> jump to the head of the queue.  Other people may have different
> opinions, of course.

Proper encoding of text data to and from Perl is more important every day, as it's used for more and more uses beyond ASCII. I think it's important to backport such issues modulo behavioral changes.

Best,

David

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-10-07T16:51:46Z

On Wed, Oct 5, 2011 at 20:36, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 5, 2011 at 5:03 PM, Alex Hunsaker <badalex@gmail.com> wrote:
>> On Wed, Oct 5, 2011 at 08:18, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
>>> <amit.khandekar@enterprisedb.com> wrote:
>>>> I have no more issues with the patch.
>>>> Thanks!
>>>
>>> I think this patch needs to be added to the open CommitFest, with
>>> links to the reviews, and marked Ready for Committer.
>>
>> The open commitfest? Even if its an "important" bug fix that should be
>> backpatched?
>
> Considering that the issue appears to have been ignored from
> mid-February until early October, I don't see why it should now get to
> jump to the head of the queue.  Other people may have different
> opinions, of course.

Added. :-)

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Andrew Dunstan <andrew@dunslane.net> — 2011-11-02T23:12:24Z


On 10/07/2011 12:51 PM, Alex Hunsaker wrote:
> On Wed, Oct 5, 2011 at 20:36, Robert Haas<robertmhaas@gmail.com>  wrote:
>> On Wed, Oct 5, 2011 at 5:03 PM, Alex Hunsaker<badalex@gmail.com>  wrote:
>>> On Wed, Oct 5, 2011 at 08:18, Robert Haas<robertmhaas@gmail.com>  wrote:
>>>> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
>>>> <amit.khandekar@enterprisedb.com>  wrote:
>>>>> I have no more issues with the patch.
>>>>> Thanks!
>>>> I think this patch needs to be added to the open CommitFest, with
>>>> links to the reviews, and marked Ready for Committer.
>>> The open commitfest? Even if its an "important" bug fix that should be
>>> backpatched?
>> Considering that the issue appears to have been ignored from
>> mid-February until early October, I don't see why it should now get to
>> jump to the head of the queue.  Other people may have different
>> opinions, of course.
> Added. :-)
>

I'm just starting to look at this, by way of a break in staring at 
pg_dump code ;-). This just needs to be backpatched to 9.1, is that right?

cheers

andrew

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Alex Hunsaker <badalex@gmail.com> — 2011-11-02T23:39:19Z

On Wed, Nov 2, 2011 at 17:12, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> Considering that the issue appears to have been ignored from
>>> mid-February until early October, I don't see why it should now get to
>>> jump to the head of the queue.  Other people may have different
>>> opinions, of course.
>>
>> Added. :-)
>>
>
> I'm just starting to look at this, by way of a break in staring at pg_dump
> code ;-). This just needs to be backpatched to 9.1, is that right?

Yes please.

9.0 did not have this problem (or at least if it did it was a separate issue).

Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Andrew Dunstan <andrew@dunslane.net> — 2011-11-26T17:23:30Z


On 10/05/2011 03:58 AM, Amit Khandekar wrote:
> On 5 October 2011 12:29, Alex Hunsaker<badalex@gmail.com>  wrote:
>>
>> Find it attached. [ Note I didn't put the check inside the if (ret ==
>> utf8_str) as it seemed a bit cleaner (indentation wise) to have it
>> outside ]
> I have no more issues with the patch.
>

Committed.

cheers

andrew