Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Replace the "New Linear" GiST split algorithm for boxes and points with a
- 7f3bd86843e5 9.2.0 cited

Range Types

Jeff Davis <pgsql@j-davis.com> — 2011-08-23T17:23:28Z

Attached is the latest version of the Range Types patch. I will get it
into better shape before the commitfest, but wanted to put up a draft in
case anyone had comments on the TODO items.

Changes:

  * Uses BTree opclass rather than compare function.
  * Collation specified at type definition time.
  * Various fixes.

TODO:

  * Should the catalog hold the opclass or the opfamily? This doesn't
affect much, but I wasn't sure which to actually store in the catalog.

  * Use Robert Haas' suggestion for auto-generating constructors with
the same name as the range type, e.g. "int8range(1,10,'[]')", where the
third argument defaults to '[)'. This allows better type inference for
constructors, especially when there are multiple range types over the
same base type (and collation is a common case of this). I believe this
was the best idea after significant discussion:
http://archives.postgresql.org/pgsql-hackers/2011-06/msg02046.php
http://archives.postgresql.org/pgsql-hackers/2011-07/msg00210.php

  * Send/recv functions

  * cleanup

  * documentation updates

Regards,
	Jeff Davis

Re: Range Types

Jeff Davis <pgsql@j-davis.com> — 2011-09-01T07:32:25Z

Updated patch attached.

Changes:

1. Now supports new constructor scheme. If you define a range foorange,
you get the following constructors:
 * foorange() -- produces empty foorange
 * foorange(S) -- produces singleton range [S]
 * foorange(L,B) -- produces range [L,B)
 * foorange(L,B,'(]') -- produces range (L,B]

Actually, the two-argument form uses a special "default_flags" that can
be specified at creation time, and that defaults to '[)'.

The way I accomplish this is by generating 4 functions at definition
time -- a little ugly, and I am open to suggestions. I ran into a
problem using the default argument as Robert suggested because
pg_node_tree doesn't have a working input function (intentionally so),
so I couldn't get the built-in range types to work with initdb. The
constructors all essentially point to the same C function, aside from
some indirection that I did to avoid excessive complaining from the
opr_sanity test. Again, suggestions welcome.

2. Documentation has been updated.

3. Now there's support for multiple range types over a single base type,
e.g. two text ranges using different collations.

TODO:

There is still some cleanup to do, e.g. pg_dump. I'd like to get some
feedback to stabilize the user-facing behavior before I put too much
effort into the code cleanup (which shouldn't take long, but I just
don't want to work toward a moving target).

Regards,
	Jeff Davis

Re: Range Types

Jeff Davis <pgsql@j-davis.com> — 2011-09-13T08:41:36Z

Another updated patch is attached.

Changes:

  * support for send/recv
  * significant cleanup and fixes
  * test improvements

TODO:

  * pg_dump support. This requires outputting collation names and
opclass names (because those are part of the range type definition).
Currently, that's only done for indexes through special functions in
ruleutils.c. Should I define such functions there as well, or is there a
simpler approach? Also, I have to filter out the generated constructor
functions because those are created internally when defining a new range
type.
  * some error messages should be improved
  * Originally, I wasn't sure whether to define a RangeCoerceExpr
(similar to ArrayCoerceExpr), because the only use I had was for typmod.
But that is necessary for casts as well, so I'll go ahead and do that
and we'll get both casts and typmod for range types.
  * I think I should avoid some syscache lookups for some of the generic
range functions. Right now they are done for every invocation, but it
would be pretty simple to avoid lookups when looking up the same range
type as last time.

Review questions:

  * Do we like the new constructor behavior from the users' standpoint?
  * Right now, the patch accomplishes that behavior by generating
several constructor functions every time a new range type is defined. Is
that acceptable? Is there a better way?

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Erik Rijkers <er@xs4all.nl> — 2011-09-18T16:08:22Z

On Tue, September 13, 2011 10:41, Jeff Davis wrote:
> Another updated patch is attached.
>

Hi,

Below are 2 changes.  The first change is an elog saying 'lower' instead of 'upper'.

The second change is less straightforward, but I think it should be changed too:

Rangetypes as it stands uses NULL to indicate INF or -INF:

select int4range(2, NULL);

 int4range
------------
 [ 2, INF )
(1 row)


but refuses to accept it in the string-form:

select '[ 2 , NULL )'::int4range;
ERROR:  NULL range boundaries are not supported
LINE 1: select '[ 2 , NULL )'::int4range;
               ^

Second part below changes that to accept NULL for INF and -INF
in the string-form construction. (not complete: it still is
case-sensitive).


Thanks,

Erik Rijkers




--- src/backend/utils/adt/rangetypes.c.orig  2011-09-18 12:35:29.000000000 +0200
+++ src/backend/utils/adt/rangetypes.c      2011-09-18 16:03:34.000000000 +0200
@@ -387,7 +387,7 @@
        if (empty)
                elog(ERROR, "range is empty");
        if (upper.infinite)
-               elog(ERROR, "range lower bound is infinite");
+               elog(ERROR, "range upper bound is infinite");

        PG_RETURN_DATUM(upper.val);
 }
@@ -1579,9 +1579,9 @@
                fl = RANGE_EMPTY;

        if (!lb_quoted && strncmp(lb, "NULL", ilen) == 0)
-               elog(ERROR, "NULL range boundaries are not supported");
+               fl |= RANGE_LB_INF;
        if (!ub_quoted && strncmp(ub, "NULL", ilen) == 0)
-               elog(ERROR, "NULL range boundaries are not supported");
+               fl |= RANGE_UB_INF;
        if (!lb_quoted && strncmp(lb, "-INF", ilen) == 0)
                fl |= RANGE_LB_INF;
        if (!ub_quoted && strncmp(ub, "INF", ilen) == 0)

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-09-19T05:51:34Z

On Sun, 2011-09-18 at 18:08 +0200, Erik Rijkers wrote:
> Below are 2 changes.  The first change is an elog saying 'lower' instead of 'upper'.

Done, thank you. New patch attached.

Changes:
 * documentation fixes
 * added document for pg_range catalog
 * cleaned up errors, increased error checking
 * improved pg_dump

TODO:
 * Support casts and typmod.
   - This requires adding a RangeCoerceExpr, or possibly
     overloading ArrayCoerceExpr somehow. This is likely to
     require a lot of boilerplate code and a fairly large diff.
 * Cache lookups better to avoid unnecessary SearchSysCache calls.
 * I need to find a clean way to get the operator class name in pg_dump.

> Rangetypes as it stands uses NULL to indicate INF or -INF:
> 
> select int4range(2, NULL);
> 
>  int4range
> ------------
>  [ 2, INF )
> (1 row)
> 
> 
> but refuses to accept it in the string-form:
> 
> select '[ 2 , NULL )'::int4range;
> ERROR:  NULL range boundaries are not supported
> LINE 1: select '[ 2 , NULL )'::int4range;

I think this might require more opinions. There is a trade-off here
between convenience and confusion: accepting NULL is convenient in the
constructors, because it avoids the need to have extra constructors just
for unbounded ranges; but could lead to confusion between NULL and INF
(which are not the same).

In the string form, it doesn't add any convenience to accept NULL; but
as you point out, it seems inconsistent without it.

Thoughts?

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Pavel Stehule <pavel.stehule@gmail.com> — 2011-09-19T07:30:11Z

hello

sorry for late assign to discussion.

I don't think so using NULL instead INF is a good idea.

Regards

Pavel Stehule

2011/9/19 Jeff Davis <pgsql@j-davis.com>:
> On Sun, 2011-09-18 at 18:08 +0200, Erik Rijkers wrote:
>> Below are 2 changes.  The first change is an elog saying 'lower' instead of 'upper'.
>
> Done, thank you. New patch attached.
>
> Changes:
>  * documentation fixes
>  * added document for pg_range catalog
>  * cleaned up errors, increased error checking
>  * improved pg_dump
>
> TODO:
>  * Support casts and typmod.
>   - This requires adding a RangeCoerceExpr, or possibly
>     overloading ArrayCoerceExpr somehow. This is likely to
>     require a lot of boilerplate code and a fairly large diff.
>  * Cache lookups better to avoid unnecessary SearchSysCache calls.
>  * I need to find a clean way to get the operator class name in pg_dump.
>
>> Rangetypes as it stands uses NULL to indicate INF or -INF:
>>
>> select int4range(2, NULL);
>>
>>  int4range
>> ------------
>>  [ 2, INF )
>> (1 row)
>>
>>
>> but refuses to accept it in the string-form:
>>
>> select '[ 2 , NULL )'::int4range;
>> ERROR:  NULL range boundaries are not supported
>> LINE 1: select '[ 2 , NULL )'::int4range;
>
> I think this might require more opinions. There is a trade-off here
> between convenience and confusion: accepting NULL is convenient in the
> constructors, because it avoids the need to have extra constructors just
> for unbounded ranges; but could lead to confusion between NULL and INF
> (which are not the same).
>
> In the string form, it doesn't add any convenience to accept NULL; but
> as you point out, it seems inconsistent without it.
>
> Thoughts?
>
> Regards,
>        Jeff Davis
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Re: Range Types - typo + NULL string constructor

Robert Haas <robertmhaas@gmail.com> — 2011-09-19T13:33:29Z

On Mon, Sep 19, 2011 at 1:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> select '[ 2 , NULL )'::int4range;
>> ERROR:  NULL range boundaries are not supported
>> LINE 1: select '[ 2 , NULL )'::int4range;
>
> I think this might require more opinions. There is a trade-off here
> between convenience and confusion: accepting NULL is convenient in the
> constructors, because it avoids the need to have extra constructors just
> for unbounded ranges; but could lead to confusion between NULL and INF
> (which are not the same).

I agree with this line of reasoning.  I think we will be making pain
for ourselves if we need to invent a bunch more constructors just to
have a way of indicating an unbounded range, but OTOH I don't see any
compelling reason why the type input function needs to accept N-U-L-L.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-09-19T15:23:34Z

On Sep19, 2011, at 15:33 , Robert Haas wrote:
> On Mon, Sep 19, 2011 at 1:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>>> select '[ 2 , NULL )'::int4range;
>>> ERROR:  NULL range boundaries are not supported
>>> LINE 1: select '[ 2 , NULL )'::int4range;
>> 
>> I think this might require more opinions. There is a trade-off here
>> between convenience and confusion: accepting NULL is convenient in the
>> constructors, because it avoids the need to have extra constructors just
>> for unbounded ranges; but could lead to confusion between NULL and INF
>> (which are not the same).
> 
> I agree with this line of reasoning.  I think we will be making pain
> for ourselves if we need to invent a bunch more constructors just to
> have a way of indicating an unbounded range, but OTOH I don't see any
> compelling reason why the type input function needs to accept N-U-L-L.

The one reason I can see in favour of supporting N-U-L-L there is 
compatibility with arrays. I've recently had the questionable pleasure
of writing PHP functions to parse and emit our textual representations of
arrays, records, dates and timestamps. After that experience, I feel that
the number of similar-yet-slightly-different textual input output format
for non-primitive types is already excessive, and any further additions
should be modeled after some existing ones.

(And BTW, why in heavens sake, is date and time input and output
asymmetric for some DateStyle settings? Asymmetric like in you need to
send one format, but get back another...)

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-09-19T15:46:27Z

On Mon, 2011-09-19 at 17:23 +0200, Florian Pflug wrote:
> The one reason I can see in favour of supporting N-U-L-L there is 
> compatibility with arrays.

But arrays actually do store and produce NULLs; ranges don't.

>  I've recently had the questionable pleasure
> of writing PHP functions to parse and emit our textual representations of
> arrays, records, dates and timestamps. After that experience, I feel that
> the number of similar-yet-slightly-different textual input output format
> for non-primitive types is already excessive, and any further additions
> should be modeled after some existing ones.

I'm not clear on how accepting "NULL" would really save effort. With
ranges, the brackets have an actual meaning (inclusivity), and empty
ranges have no brackets at all. So I don't think it's going to be easy
to write one function to parse everything.

What about binary formats?

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Kevin Grittner <kevin.grittner@wicourts.gov> — 2011-09-19T16:00:18Z

Robert Haas <robertmhaas@gmail.com> wrote:
> Jeff Davis <pgsql@j-davis.com> wrote:
>>> select '[ 2 , NULL )'::int4range;
>>> ERROR:  NULL range boundaries are not supported
>>> LINE 1: select '[ 2 , NULL )'::int4range;
>>
>> I think this might require more opinions. There is a trade-off
>> here between convenience and confusion: accepting NULL is
>> convenient in the constructors, because it avoids the need to
>> have extra constructors just for unbounded ranges; but could lead
>> to confusion between NULL and INF (which are not the same).
> 
> I agree with this line of reasoning.  I think we will be making
> pain for ourselves if we need to invent a bunch more constructors
> just to have a way of indicating an unbounded range, but OTOH I
> don't see any compelling reason why the type input function needs
> to accept N-U-L-L.

FWIW, the existing semantics of NULL include not only "UNKNOWN" but
also "NOT APPLICABLE".  It seems fairly natural to think of a range
as being unbounded if the boundary limit is "not applicable".

On a practical level, our shop is already effectively doing this. 
We have several tables where part of the primary key is "effective
date" and there is a null capable "expiration date" -- with a NULL
meaning that no expiration date has been set.  It would be nice to
be able to have a "generated column" function which used these two
dates to build a range for exclusion constraints and such.

-Kevin

Re: Range Types - typo + NULL string constructor

Robert Haas <robertmhaas@gmail.com> — 2011-09-19T16:26:58Z

On Mon, Sep 19, 2011 at 11:23 AM, Florian Pflug <fgp@phlo.org> wrote:
> On Sep19, 2011, at 15:33 , Robert Haas wrote:
>> On Mon, Sep 19, 2011 at 1:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>>>> select '[ 2 , NULL )'::int4range;
>>>> ERROR:  NULL range boundaries are not supported
>>>> LINE 1: select '[ 2 , NULL )'::int4range;
>>>
>>> I think this might require more opinions. There is a trade-off here
>>> between convenience and confusion: accepting NULL is convenient in the
>>> constructors, because it avoids the need to have extra constructors just
>>> for unbounded ranges; but could lead to confusion between NULL and INF
>>> (which are not the same).
>>
>> I agree with this line of reasoning.  I think we will be making pain
>> for ourselves if we need to invent a bunch more constructors just to
>> have a way of indicating an unbounded range, but OTOH I don't see any
>> compelling reason why the type input function needs to accept N-U-L-L.
>
> The one reason I can see in favour of supporting N-U-L-L there is
> compatibility with arrays. I've recently had the questionable pleasure
> of writing PHP functions to parse and emit our textual representations of
> arrays, records, dates and timestamps. After that experience, I feel that
> the number of similar-yet-slightly-different textual input output format
> for non-primitive types is already excessive, and any further additions
> should be modeled after some existing ones.

Well, I'm not violently opposed to accepting NULL to mean an unbounded
range.  The semantics of "no bound at all" and "an unknown bound"
(i.e. NULL) are pretty close, as Kevin also points out downthread.
But I think the way Jeff actually did it is OK, too.  What I really
care about is that we don't talk ourselves into needing a zillion
constructor functions.  Making things work with a single constructor
function seems to me to simplify life quite a bit, and allowing there
seems essential for that.

(I am also vaguely wondering what happens if if you have a text
range.... is (nubile, null) ambiguous?)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-09-19T16:32:44Z

On Sep19, 2011, at 17:46 , Jeff Davis wrote:
> On Mon, 2011-09-19 at 17:23 +0200, Florian Pflug wrote:
>> The one reason I can see in favour of supporting N-U-L-L there is 
>> compatibility with arrays.
> 
> But arrays actually do store and produce NULLs; ranges don't.

Hm, yeah, granted. But OTOH, clients will very likely use NULL, null, nil
or something similar as a bound to represent unbounded ranges. And those
will probably already be mapped to SQL's NULL. So in practice, people will
think of unbounded ranges having the (lower or upper) bound NULL I think.

>> I've recently had the questionable pleasure
>> of writing PHP functions to parse and emit our textual representations of
>> arrays, records, dates and timestamps. After that experience, I feel that
>> the number of similar-yet-slightly-different textual input output format
>> for non-primitive types is already excessive, and any further additions
>> should be modeled after some existing ones.
> 
> I'm not clear on how accepting "NULL" would really save effort. With
> ranges, the brackets have an actual meaning (inclusivity), and empty
> ranges have no brackets at all. So I don't think it's going to be easy
> to write one function to parse everything.

No, but more similar the format are the easier it gets to at least factor
the hairy parts of such a parser into a common subroutine. Assume that we
don't support NULL as an alias for INF. What would then be the result of

  '[A,NULL)'::textrange?

Presumably, it'd be the same as textrange('A','NULL','[)'). Which think
is a bit surprising, since '[A,NULL]'::text[] produces ARRAY['A',NULL],
*NOT* ARRAY['A','NULL'].

BTW, we currently represent infinity for floating point values as
'Infinity', not 'INF'. Shouldn't we do the same for ranges, i.e. make

  int4range(0,NULL,'[)')::text

return 

  '[0,Infinity)'?

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-09-19T16:48:42Z

On Mon, 2011-09-19 at 11:00 -0500, Kevin Grittner wrote:
> On a practical level, our shop is already effectively doing this. 
> We have several tables where part of the primary key is "effective
> date" and there is a null capable "expiration date" -- with a NULL
> meaning that no expiration date has been set.  It would be nice to
> be able to have a "generated column" function which used these two
> dates to build a range for exclusion constraints and such.

Agreed, that's a good convenience argument for accepting NULL boundaries
in the constructors.

Underneath though, we don't use NULL semantics (because they don't make
sense for ranges -- in fact, avoiding the need to constantly
special-case NULLs is one of the reasons to use range types). So, we
want to avoid confusion where possible.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-09-21T07:23:39Z

On Mon, 2011-09-19 at 18:32 +0200, Florian Pflug wrote:
> No, but more similar the format are the easier it gets to at least factor
> the hairy parts of such a parser into a common subroutine. Assume that we
> don't support NULL as an alias for INF. What would then be the result of
> 
>   '[A,NULL)'::textrange?

I think that the range input should *parse* NULL in a similar way, but
reject it. So, to make it the range between two definite strings, you'd
do:

  '[A,"NULL")'::textrange

which would be equal to textrange('A','NULL','[)'). Without the quotes,
it would detect the NULL, and give an error. Open to suggestion here,
though.

> Presumably, it'd be the same as textrange('A','NULL','[)'). Which think
> is a bit surprising, since '[A,NULL]'::text[] produces ARRAY['A',NULL],
> *NOT* ARRAY['A','NULL'].
> 
> BTW, we currently represent infinity for floating point values as
> 'Infinity', not 'INF'. Shouldn't we do the same for ranges, i.e. make
> 
>   int4range(0,NULL,'[)')::text
> 
> return 
> 
>   '[0,Infinity)'?

I'm open to that, if you think it's an improvement I'll do it (but we
should probably pick one identifiable string and stick with it). What
I'd like to avoid is adding to the NULL/infinity confusion.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-09-21T07:29:47Z

On Mon, 2011-09-19 at 12:26 -0400, Robert Haas wrote:
> What I really
> care about is that we don't talk ourselves into needing a zillion
> constructor functions.  Making things work with a single constructor
> function seems to me to simplify life quite a bit, and allowing there
> seems essential for that.

I think we pretty much all agree on that. However, you did see the note
about the difficulty of using default parameters in built-in functions,
right?

I ultimately ended up with 4 constructors, each with the same name but
0, 1, 2, and 3 parameters. Suggestions welcome.

> (I am also vaguely wondering what happens if if you have a text
> range.... is (nubile, null) ambiguous?)

There are a few ways to handle that. I would lean toward parsing the
NULL as a special keyword, and then rejecting it (does it matter if it's
upper case?).

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-09-21T11:24:03Z

On Sep21, 2011, at 09:23 , Jeff Davis wrote:
> On Mon, 2011-09-19 at 18:32 +0200, Florian Pflug wrote:
>> No, but more similar the format are the easier it gets to at least factor
>> the hairy parts of such a parser into a common subroutine. Assume that we
>> don't support NULL as an alias for INF. What would then be the result of
>> 
>>  '[A,NULL)'::textrange?
> 
> I think that the range input should *parse* NULL in a similar way, but
> reject it. So, to make it the range between two definite strings, you'd
> do:
> 
>  '[A,"NULL")'::textrange
> 
> which would be equal to textrange('A','NULL','[)'). Without the quotes,
> it would detect the NULL, and give an error. Open to suggestion here,
> though.

Hm, that seems like a reasonable compromise. As long as range types and
arrays agree on the same basic lexical rules regarding quoting and whitespace
(i.e. that spaces outside of double-quotes are non-significant, that keywords
like NULL and INF/Infinity are case-insensitive, ...) I'm happy I guess.

>> BTW, we currently represent infinity for floating point values as
>> 'Infinity', not 'INF'. Shouldn't we do the same for ranges, i.e. make
>> 
>>  int4range(0,NULL,'[)')::text
>> 
>> return 
>> 
>>  '[0,Infinity)'?
> 
> I'm open to that, if you think it's an improvement I'll do it (but we
> should probably pick one identifiable string and stick with it). What
> I'd like to avoid is adding to the NULL/infinity confusion.

I've thought about this some more, and came to realize that the question
here really is whether

  floatrange(0, 'Infinity'::float, '[)')

and

  floatrange(0, NULL, '[)')

are the same thing or not. If they're not, then obviously using "Infinity"
to represent omitted bounds is going to be very confusing. If they are,
then using "Infinity" seems preferable. Maybe boundaries should be restricted
to numeric float values (i.e. +/-Infinity and NaN should be rejected), though
I dunno if the range type infrastructure supports that. Thoughts?

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Robert Haas <robertmhaas@gmail.com> — 2011-09-21T12:00:26Z

On Wed, Sep 21, 2011 at 3:29 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2011-09-19 at 12:26 -0400, Robert Haas wrote:
>> What I really
>> care about is that we don't talk ourselves into needing a zillion
>> constructor functions.  Making things work with a single constructor
>> function seems to me to simplify life quite a bit, and allowing there
>> seems essential for that.
>
> I think we pretty much all agree on that. However, you did see the note
> about the difficulty of using default parameters in built-in functions,
> right?
>
> I ultimately ended up with 4 constructors, each with the same name but
> 0, 1, 2, and 3 parameters. Suggestions welcome.
>
>> (I am also vaguely wondering what happens if if you have a text
>> range.... is (nubile, null) ambiguous?)
>
> There are a few ways to handle that. I would lean toward parsing the
> NULL as a special keyword, and then rejecting it (does it matter if it's
> upper case?).

Boy, that seems really weird to me.  If you're going to do it, it
ought to be case-insensitive, but I think detecting the case only for
the purpose of rejecting it is probably a mistake.  I mean, if
(nubile, nutty) is OK, then (nubile, null) and (null, nutty) don't
really seem like they ought to be any different.  Otherwise, anyone
who wants to construct these strings programatically is going to need
to escape everything and always write ("cat","dog") or however you do
that, and that seems like an unnecessary imposition.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-09-21T12:41:15Z

On Sep21, 2011, at 14:00 , Robert Haas wrote:
> On Wed, Sep 21, 2011 at 3:29 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> On Mon, 2011-09-19 at 12:26 -0400, Robert Haas wrote:
>>> (I am also vaguely wondering what happens if if you have a text
>>> range.... is (nubile, null) ambiguous?)
>> 
>> There are a few ways to handle that. I would lean toward parsing the
>> NULL as a special keyword, and then rejecting it (does it matter if it's
>> upper case?).
> 
> Boy, that seems really weird to me.  If you're going to do it, it
> ought to be case-insensitive, but I think detecting the case only for
> the purpose of rejecting it is probably a mistake.  I mean, if
> (nubile, nutty) is OK, then (nubile, null) and (null, nutty) don't
> really seem like they ought to be any different.

But that's exactly how arrays behave too. '{null,nutty}' is interpreted
as ARRAY[NULL,'nutty'] while '{nubile,nutty}' is interpreted as
ARRAY['nubile','nutty'].

> Otherwise, anyone
> who wants to construct these strings programatically is going to need
> to escape everything and always write ("cat","dog") or however you do
> that, and that seems like an unnecessary imposition.

Unless you fully depart from what arrays you, you'll have to do that anyway
because leading and trailing spaces aren't considered to be significant in
non-quoted elements. In other words, '( cat , dog )' represents
textrange('cat', 'dog', '()'), *not* textrange(' cat ', ' dog ', '()').

Also, as long as we need to recognize at least one special value meaning
a non-existing bound ('INF' or 'Infinity' or whatever), I don't see a way
around the need for quotes in the general case. Well, expect making the
representation of

  range(X, NULL, '[)') be '[X)',

the one of

  range(NULL, X, '(]') be '(X]'

and the one of

  range(NULL, NULL, '()') be '()',

but I'm not sure that's an improvement. And even if it was, you'd still
need to quote X if it contained one of "(",")","[","]" or ",". So most
client would probably still choose to quote unconditionally, instead of
detecting whether it was necessary or not.

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Robert Haas <robertmhaas@gmail.com> — 2011-09-21T12:47:00Z

On Wed, Sep 21, 2011 at 8:41 AM, Florian Pflug <fgp@phlo.org> wrote:
>> Boy, that seems really weird to me.  If you're going to do it, it
>> ought to be case-insensitive, but I think detecting the case only for
>> the purpose of rejecting it is probably a mistake.  I mean, if
>> (nubile, nutty) is OK, then (nubile, null) and (null, nutty) don't
>> really seem like they ought to be any different.
>
> But that's exactly how arrays behave too. '{null,nutty}' is interpreted
> as ARRAY[NULL,'nutty'] while '{nubile,nutty}' is interpreted as
> ARRAY['nubile','nutty'].

Oh.  Well, never mind then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Tom Lane <tgl@sss.pgh.pa.us> — 2011-09-21T14:41:39Z

Florian Pflug <fgp@phlo.org> writes:
> On Sep21, 2011, at 14:00 , Robert Haas wrote:
>> Otherwise, anyone
>> who wants to construct these strings programatically is going to need
>> to escape everything and always write ("cat","dog") or however you do
>> that, and that seems like an unnecessary imposition.

> Unless you fully depart from what arrays you, you'll have to do that anyway
> because leading and trailing spaces aren't considered to be significant in
> non-quoted elements. In other words, '( cat , dog )' represents
> textrange('cat', 'dog', '()'), *not* textrange(' cat ', ' dog ', '()').

Keep in mind that the array I/O behavior is widely considered to suck.
When we defined the record I/O behavior, we did not emulate that
whitespace weirdness, nor a number of other weirdnesses.  I would argue
that ranges ought to model their I/O behavior on records not arrays,
because that's not as much of a legacy syntax.

> Also, as long as we need to recognize at least one special value meaning
> a non-existing bound ('INF' or 'Infinity' or whatever), I don't see a way
> around the need for quotes in the general case.

Right.  In the record case, we used an empty string for NULL, and then
had to insist on quotes for actual empty strings.

			regards, tom lane

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-09-21T15:20:10Z

On Sep21, 2011, at 16:41 , Tom Lane wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> On Sep21, 2011, at 14:00 , Robert Haas wrote:
>>> Otherwise, anyone
>>> who wants to construct these strings programatically is going to need
>>> to escape everything and always write ("cat","dog") or however you do
>>> that, and that seems like an unnecessary imposition.
> 
>> Unless you fully depart from what arrays you, you'll have to do that anyway
>> because leading and trailing spaces aren't considered to be significant in
>> non-quoted elements. In other words, '( cat , dog )' represents
>> textrange('cat', 'dog', '()'), *not* textrange(' cat ', ' dog ', '()').
> 
> Keep in mind that the array I/O behavior is widely considered to suck.
> When we defined the record I/O behavior, we did not emulate that
> whitespace weirdness, nor a number of other weirdnesses.  I would argue
> that ranges ought to model their I/O behavior on records not arrays,
> because that's not as much of a legacy syntax.

Interesting. Funnily enough, I always assumed it was the other way around.
Probably because I don't care much for the empty-unquoted-string-is-NULL
behaviour of records. But leaving that personal opinion aside, yeah, in that
case we should make range I/O follow record I/O.

>> Also, as long as we need to recognize at least one special value meaning
>> a non-existing bound ('INF' or 'Infinity' or whatever), I don't see a way
>> around the need for quotes in the general case.
> 
> Right.  In the record case, we used an empty string for NULL, and then
> had to insist on quotes for actual empty strings.

Hm, so we'd have

  '(X,)' for range(X, NULL, '()'),
  '(,X)' for range(NULL, X, '()') and
  '(,)' for range(NULL, NULL, '()').

We'd then have the choice of either declaring

  '(X,]' to mean '(X,)',
  '[,X)' to mean '(,X)' and
  '[,]' to mean '(,)'

or to forbid the use of '[' and ']' for unspecified bounds.

(Leaving out the ',' in the case of only one bound as in my reply to Robert's
mail somewhere else in this thread doesn't actually work, since it'd be 
ambiguous whether '(X)' means range(X, NULL, '()') or range(NULL, X, '()').)

One nice property is that, apart from the different brackets used, this
representation is identical to the one used by records while still avoiding
the infinity vs. NULL confusion.

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-09-21T17:00:39Z

On Wed, 2011-09-21 at 17:20 +0200, Florian Pflug wrote:
> Hm, so we'd have
> 
>   '(X,)' for range(X, NULL, '()'),
>   '(,X)' for range(NULL, X, '()') and
>   '(,)' for range(NULL, NULL, '()').
> 
> We'd then have the choice of either declaring
> 
>   '(X,]' to mean '(X,)',
>   '[,X)' to mean '(,X)' and
>   '[,]' to mean '(,)'
> 
> or to forbid the use of '[' and ']' for unspecified bounds.

Right now, I just canonicalize it to round brackets if infinite. Seems
pointless to reject it, but I can if someone thinks it's better.

> (Leaving out the ',' in the case of only one bound as in my reply to Robert's
> mail somewhere else in this thread doesn't actually work, since it'd be 
> ambiguous whether '(X)' means range(X, NULL, '()') or range(NULL, X, '()').)
> 
> One nice property is that, apart from the different brackets used, this
> representation is identical to the one used by records while still avoiding
> the infinity vs. NULL confusion.

OK, I like that. Slightly strange to require quoting empty strings, but
not stranger than the alternatives.

While we're at it, any suggestions on the text representation of an
empty range?

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-09-21T17:02:32Z

On Wed, 2011-09-21 at 13:24 +0200, Florian Pflug wrote:
> I've thought about this some more, and came to realize that the question
> here really is whether
> 
>   floatrange(0, 'Infinity'::float, '[)')
> 
> and
> 
>   floatrange(0, NULL, '[)')
> 
> are the same thing or not.

The unbounded side of a range is never equal to a value in the data
type's domain, so no, it's not the same.

I think that we pretty much settled on just using an empty string for
infinity in the other thread, right? So that makes this a non-issue.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-09-22T00:31:45Z

On Sep21, 2011, at 19:00 , Jeff Davis wrote:
> While we're at it, any suggestions on the text representation of an
> empty range?

My personal favourite would be '0', since it resembles the symbol used
for empty sets in mathematics, and we already decided to use mathematical
notation for ranges.

If we're concerned that most of our users won't get that, then 'empty'
would be a viable alternative I think.

From a consistency POV it'd make sense to use a bracket-based syntax
also for empty ranges. But the only available options would be '()' and '[]',
which are too easily confused with '(,)' and '[,]' (which we already
decided should represent the full range).

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-09-22T00:40:29Z

On Sep21, 2011, at 19:02 , Jeff Davis wrote:
> On Wed, 2011-09-21 at 13:24 +0200, Florian Pflug wrote:
>> I've thought about this some more, and came to realize that the question
>> here really is whether
>> 
>>  floatrange(0, 'Infinity'::float, '[)')
>> 
>> and
>> 
>>  floatrange(0, NULL, '[)')
>> 
>> are the same thing or not.
> 
> The unbounded side of a range is never equal to a value in the data
> type's domain, so no, it's not the same.
> 
> I think that we pretty much settled on just using an empty string for
> infinity in the other thread, right? So that makes this a non-issue.

Agreed.

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-09-22T06:01:22Z

On Thu, 2011-09-22 at 02:31 +0200, Florian Pflug wrote:
> My personal favourite would be '0', since it resembles the symbol used
> for empty sets in mathematics, and we already decided to use mathematical
> notation for ranges.
> 
> If we're concerned that most of our users won't get that, then 'empty'
> would be a viable alternative I think.
> 
> From a consistency POV it'd make sense to use a bracket-based syntax
> also for empty ranges. But the only available options would be '()' and '[]',
> which are too easily confused with '(,)' and '[,]' (which we already
> decided should represent the full range).

Yes, I think () is too close to (,).

Brainstorming so far:
 0       : simple, looks like the empty set symbol
 empty   : simple
 <empty> : a little more obvious that it's special
 <>      : visually looks empty
 -       : also looks empty
 {}      : mathematical notation, but doesn't quite fit ranges

I don't have a strong opinion. I'd be OK with any of those.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-02T06:12:03Z

On Wed, 2011-09-21 at 10:41 -0400, Tom Lane wrote:
> Florian Pflug <fgp@phlo.org> writes:
> > On Sep21, 2011, at 14:00 , Robert Haas wrote:
> >> Otherwise, anyone
> >> who wants to construct these strings programatically is going to need
> >> to escape everything and always write ("cat","dog") or however you do
> >> that, and that seems like an unnecessary imposition.
> 
> > Unless you fully depart from what arrays you, you'll have to do that anyway
> > because leading and trailing spaces aren't considered to be significant in
> > non-quoted elements. In other words, '( cat , dog )' represents
> > textrange('cat', 'dog', '()'), *not* textrange(' cat ', ' dog ', '()').
> 
> Keep in mind that the array I/O behavior is widely considered to suck.
> When we defined the record I/O behavior, we did not emulate that
> whitespace weirdness, nor a number of other weirdnesses.  I would argue
> that ranges ought to model their I/O behavior on records not arrays,
> because that's not as much of a legacy syntax.

Done. Now range types more closely resemble records in parsing behavior.
Patch attached.

Changes:
 * new parser + doc changes
 * merged with master
 * pg_dump now work
 * various cleanup

TODO:
 * ultimately, there should be a simple cache to avoid repeated syscache
lookups. I have put this off mainly to avoid premature optimization, but
that code has been pretty stable for a while
 * support casts and typmod -- this requires the work for the
RangeCoerceExpr that I mentioned before

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-10-02T09:32:29Z

On Oct2, 2011, at 08:12 , Jeff Davis wrote:
> Done. Now range types more closely resemble records in parsing behavior.
> Patch attached.

Cool!

Looking at the patch, I noticed that it's possible to specify the default
boundaries ([], [), (] or ()) per individual float type with the
DEFAULT_FLAGS clause of CREATE TYPE .. AS RANGE. I wonder if that doesn't
do more harm then good - it makes it impossible to deduce the meaning of
e.g. numericrange(1.0, 2.0) without looking up the definition of numericrange.

I suggest we pick one set of default boundaries, ideally '[)' since that
is what all the built-in canonization functions produce, and stick with it.

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-02T16:01:10Z

On Sun, 2011-10-02 at 11:32 +0200, Florian Pflug wrote:
> Looking at the patch, I noticed that it's possible to specify the default
> boundaries ([], [), (] or ()) per individual float type with the
> DEFAULT_FLAGS clause of CREATE TYPE .. AS RANGE. I wonder if that doesn't
> do more harm then good - it makes it impossible to deduce the meaning of
> e.g. numericrange(1.0, 2.0) without looking up the definition of numericrange.
> 
> I suggest we pick one set of default boundaries, ideally '[)' since that
> is what all the built-in canonization functions produce, and stick with it.

That sounds reasonable to me. Unless someone objects, I'll make the
change in the next patch.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-02T19:05:31Z

On Sun, 2011-10-02 at 11:32 +0200, Florian Pflug wrote:
> Looking at the patch, I noticed that it's possible to specify the default
> boundaries ([], [), (] or ()) per individual float type with the
> DEFAULT_FLAGS clause of CREATE TYPE .. AS RANGE. I wonder if that doesn't
> do more harm then good - it makes it impossible to deduce the meaning of
> e.g. numericrange(1.0, 2.0) without looking up the definition of numericrange.
> 
> I suggest we pick one set of default boundaries, ideally '[)' since that
> is what all the built-in canonization functions produce, and stick with it.

Done.

Also, made the range parsing even more like records with more code
copied verbatim. And fixed some parsing tests along the way.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Alexander Korotkov <aekorotkov@gmail.com> — 2011-10-06T19:26:09Z

Hi, Jeff!

Heikki has recently commited my patch about picksplit for GiST on points and
boxes:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7f3bd86843e5aad84585a57d3f6b80db3c609916
I would like to try this picksplit method on ranges. I believe that it might
be much more efficient on highly overlapping ranges than current picksplit.
Range types patch isn't commited, yet. So, what way of work publishing is
prefered in this case? Add-on patch, separate branch in git or something
else?

------
With best regards,
Alexander Korotkov.

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-07T03:41:42Z

On Thu, 2011-10-06 at 23:26 +0400, Alexander Korotkov wrote:
> Hi, Jeff!
> 
> 
> Heikki has recently commited my patch about picksplit for GiST on
> points and boxes:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7f3bd86843e5aad84585a57d3f6b80db3c609916
> I would like to try this picksplit method on ranges. I believe that it
> might be much more efficient on highly overlapping ranges than current
> picksplit. Range types patch isn't commited, yet. So, what way of work
> publishing is prefered in this case? Add-on patch, separate branch in
> git or something else?

Sounds great! Thank you.

The repo is available here:

http://git.postgresql.org/gitweb/?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes

I'd prefer to include it in the initial patch. If the current GiST code
is going to be replaced, then there's not much sense reviewing/testing
it.

You may need to consider unbounded and empty ranges specially. I made an
attempt to do so in the current GiST code, and you might want to take a
look at that first. I'm not particularly attached to my approach, but we
should do something reasonable with unbounded and empty ranges.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Jeff Janes <jeff.janes@gmail.com> — 2011-10-08T19:44:52Z

On Sun, Oct 2, 2011 at 12:05 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sun, 2011-10-02 at 11:32 +0200, Florian Pflug wrote:
>> Looking at the patch, I noticed that it's possible to specify the default
>> boundaries ([], [), (] or ()) per individual float type with the
>> DEFAULT_FLAGS clause of CREATE TYPE .. AS RANGE. I wonder if that doesn't
>> do more harm then good - it makes it impossible to deduce the meaning of
>> e.g. numericrange(1.0, 2.0) without looking up the definition of numericrange.
>>
>> I suggest we pick one set of default boundaries, ideally '[)' since that
>> is what all the built-in canonization functions produce, and stick with it.
>
> Done.
>
> Also, made the range parsing even more like records with more code
> copied verbatim. And fixed some parsing tests along the way.

When I apply this to head, "make check" fails with:

  create type textrange_en_us as range(subtype=text, collation="en_US");
+ ERROR:  collation "en_US" for encoding "SQL_ASCII" does not exist

Is this a problem?  e.g. will it break the build-farm?

"make check LANG=en_US" does pass

Cheers,

Jeff

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-09T00:14:21Z

On Sat, 2011-10-08 at 12:44 -0700, Jeff Janes wrote:
> When I apply this to head, "make check" fails with:
> 
>   create type textrange_en_us as range(subtype=text, collation="en_US");
> + ERROR:  collation "en_US" for encoding "SQL_ASCII" does not exist
> 
> Is this a problem?  e.g. will it break the build-farm?
> 
> "make check LANG=en_US" does pass

Thank you for pointing that out. I think I need to remove those before
commit, but I just wanted them in there now to exercise that part of the
code.

Is there a better way to test collations like that?

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-09T16:06:34Z

Jeff Davis <pgsql@j-davis.com> writes:
> On Sat, 2011-10-08 at 12:44 -0700, Jeff Janes wrote:
>> When I apply this to head, "make check" fails with:
>> 
>> create type textrange_en_us as range(subtype=text, collation="en_US");
>> + ERROR:  collation "en_US" for encoding "SQL_ASCII" does not exist

> Thank you for pointing that out. I think I need to remove those before
> commit, but I just wanted them in there now to exercise that part of the
> code.

> Is there a better way to test collations like that?

You could add some code that assumes particular collations are present
into collate.linux.utf8.sql.  It's not going to work to depend on
anything beyond C locale being present in a mainline regression test
case.

			regards, tom lane

Re: Range Types - typo + NULL string constructor

Thom Brown <thom@linux.com> — 2011-10-10T13:27:03Z

On 2 October 2011 20:05, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sun, 2011-10-02 at 11:32 +0200, Florian Pflug wrote:
>> Looking at the patch, I noticed that it's possible to specify the default
>> boundaries ([], [), (] or ()) per individual float type with the
>> DEFAULT_FLAGS clause of CREATE TYPE .. AS RANGE. I wonder if that doesn't
>> do more harm then good - it makes it impossible to deduce the meaning of
>> e.g. numericrange(1.0, 2.0) without looking up the definition of numericrange.
>>
>> I suggest we pick one set of default boundaries, ideally '[)' since that
>> is what all the built-in canonization functions produce, and stick with it.
>
> Done.
>
> Also, made the range parsing even more like records with more code
> copied verbatim. And fixed some parsing tests along the way.

I don't know if this has already been discussed, but can you explain
the following:

postgres=# select '[1,8]'::int4range;
 int4range
-----------
 [1,9)
(1 row)

It seems unintuitive to represent a discrete range using an exclusive
upper bound.  While I agree that the value itself is correct, it's
representation looks odd.  Is it necessary?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-10T16:44:51Z

On Mon, 2011-10-10 at 14:27 +0100, Thom Brown wrote:
> I don't know if this has already been discussed, but can you explain
> the following:
> 
> postgres=# select '[1,8]'::int4range;
>  int4range
> -----------
>  [1,9)
> (1 row)
> 
> It seems unintuitive to represent a discrete range using an exclusive
> upper bound.  While I agree that the value itself is correct, it's
> representation looks odd.  Is it necessary?

The "canonicalize" function (specified at type creation time) allows you
to specify the canonical output representation. So, I can change the
canonical form for discrete ranges to use '[]' notation if we think
that's more expected.

But then "int4range(1,8)" would still mean "int4range(1,8,'[)')" and
therefore '[1,7]'. I used to have a "default_flags" parameter that could
also be specified at type creation time that would control the default
third parameter (the parameter that controls inclusivity) of the
constructor. However, I removed the "default_flags" parameter because,
as Florian pointed out, it's better to have a consistent output from the
constructor.

I'm open to suggestions, including potentially bringing back
"default_flags".

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-10T16:53:54Z

Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2011-10-10 at 14:27 +0100, Thom Brown wrote:
>> I don't know if this has already been discussed, but can you explain
>> the following:
>> 
>> postgres=# select '[1,8]'::int4range;
>> int4range
>> -----------
>> [1,9)
>> (1 row)
>> 
>> It seems unintuitive to represent a discrete range using an exclusive
>> upper bound.  While I agree that the value itself is correct, it's
>> representation looks odd.  Is it necessary?

> The "canonicalize" function (specified at type creation time) allows you
> to specify the canonical output representation. So, I can change the
> canonical form for discrete ranges to use '[]' notation if we think
> that's more expected.

What if I write '[1,INT_MAX]'::int4range?  The open-parenthesis form will
fail with an integer overflow.  I suppose you could canonicalize it to
an unbounded range, but that seems unnecessarily surprising.

			regards, tom lane

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-10-10T17:22:50Z

On Oct10, 2011, at 18:53 , Tom Lane wrote:
> What if I write '[1,INT_MAX]'::int4range?  The open-parenthesis form will
> fail with an integer overflow.  I suppose you could canonicalize it to
> an unbounded range, but that seems unnecessarily surprising.

That is a very good point. Canonicalizing to an unbounded range doesn't work,
because, as it stands, the ranges '[1, INT_MAX]' and '[1,)' are *not* equal. So
the only remaining option is to canonicalize to the closed form always.

I still think we should strive for consistency here, so let's also make
'[]' the default flags for the range constructors.

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-10T17:31:09Z

On Mon, 2011-10-10 at 12:53 -0400, Tom Lane wrote:
> > The "canonicalize" function (specified at type creation time) allows you
> > to specify the canonical output representation. So, I can change the
> > canonical form for discrete ranges to use '[]' notation if we think
> > that's more expected.
> 
> What if I write '[1,INT_MAX]'::int4range?  The open-parenthesis form will
> fail with an integer overflow.  I suppose you could canonicalize it to
> an unbounded range, but that seems unnecessarily surprising.

So, are you suggesting that I canonicalize to '[]' then? That seems
reasonable to me, but there's still some slight awkwardness because
int4range(1,10) would be '[1,9]'.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Thom Brown <thom@linux.com> — 2011-10-10T17:39:20Z

On 10 October 2011 18:31, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2011-10-10 at 12:53 -0400, Tom Lane wrote:
>> > The "canonicalize" function (specified at type creation time) allows you
>> > to specify the canonical output representation. So, I can change the
>> > canonical form for discrete ranges to use '[]' notation if we think
>> > that's more expected.
>>
>> What if I write '[1,INT_MAX]'::int4range?  The open-parenthesis form will
>> fail with an integer overflow.  I suppose you could canonicalize it to
>> an unbounded range, but that seems unnecessarily surprising.
>
> So, are you suggesting that I canonicalize to '[]' then? That seems
> reasonable to me, but there's still some slight awkwardness because
> int4range(1,10) would be '[1,9]'.

Why?  int4range(1,10,'[]') returns:

 int4range
-----------
 [1,11)
(1 row)

Which if corrected to display the proposed way would just be '[1,10]'.
 So the default boundaries should be '[]' as opposed to '[)' as it is
now.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-10T17:41:42Z

On Mon, 2011-10-10 at 19:22 +0200, Florian Pflug wrote:
> I still think we should strive for consistency here, so let's also make
> '[]' the default flags for the range constructors.

For continuous ranges I don't think that's a good idea. Closed-open is a
very widely-accepted convention and there are good reasons for it -- for
instance, it's good for specifying contiguous-but-non-overlapping
ranges.

So, I think we either need to standardize on '[)' or allow different
default_flags for different types. Or, always specify the inclusivity in
the constructor (hopefully in a convenient way).

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-10T17:45:01Z

On Mon, 2011-10-10 at 18:39 +0100, Thom Brown wrote:
>  So the default boundaries should be '[]' as opposed to '[)' as it is
> now.

Would that vary between range types? In other words, do I bring back
default_flags?

If not, I think a lot of people will object. The most common use-case
for range types are for continuous ranges like timestamps. And (as I
pointed out in reply to Florian) there are good reasons to use the '[)'
convention for those cases.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Thom Brown <thom@linux.com> — 2011-10-10T17:53:34Z

On 10 October 2011 18:45, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2011-10-10 at 18:39 +0100, Thom Brown wrote:
>>  So the default boundaries should be '[]' as opposed to '[)' as it is
>> now.
>
> Would that vary between range types? In other words, do I bring back
> default_flags?
>
> If not, I think a lot of people will object. The most common use-case
> for range types are for continuous ranges like timestamps. And (as I
> pointed out in reply to Florian) there are good reasons to use the '[)'
> convention for those cases.

I'm proposing it for discrete ranges.  For continuous ranges, I guess
it makes sense to have "up to, but not including".  The same boundary
inclusivity/exclusivity thing seems unintuitive for discrete ranges.
This has the downside of inconsistency, but I don't think that's
really a solid argument against it since their use will be different
anyway.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Thom Brown <thom@linux.com> — 2011-10-10T18:06:16Z

On 10 October 2011 18:53, Thom Brown <thom@linux.com> wrote:
> On 10 October 2011 18:45, Jeff Davis <pgsql@j-davis.com> wrote:
>> On Mon, 2011-10-10 at 18:39 +0100, Thom Brown wrote:
>>>  So the default boundaries should be '[]' as opposed to '[)' as it is
>>> now.
>>
>> Would that vary between range types? In other words, do I bring back
>> default_flags?
>>
>> If not, I think a lot of people will object. The most common use-case
>> for range types are for continuous ranges like timestamps. And (as I
>> pointed out in reply to Florian) there are good reasons to use the '[)'
>> convention for those cases.
>
> I'm proposing it for discrete ranges.  For continuous ranges, I guess
> it makes sense to have "up to, but not including".  The same boundary
> inclusivity/exclusivity thing seems unintuitive for discrete ranges.
> This has the downside of inconsistency, but I don't think that's
> really a solid argument against it since their use will be different
> anyway.

Okay, a real example of why discrete should be '[]' and continuous
should be '[)'.

If you book a meeting from 09:00 to 11:00 (tsrange), at 11:00
precisely it either becomes free or is available to someone else, so
it can be booked 11:00 to 12:00 without conflict.

If you have raffle tickets numbered 1 to 100 (int4range), and you ask
for tickets 9 to 11, no-one else can use 11 as it aligns with the last
one you bought.

So for me, it's intuitive for them to behave differently.  So yes,
default behaviour would vary between types, but I didn't previously
read anything on default_flags, so I don't know where that comes into
it.  Shouldn't it be the case that if a type has a canonical function,
it's entirely inclusive, otherwise it's upper boundary is exclusive?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-10-11T01:14:57Z

On Oct10, 2011, at 20:06 , Thom Brown wrote:
> Okay, a real example of why discrete should be '[]' and continuous
> should be '[)'.
> 
> If you book a meeting from 09:00 to 11:00 (tsrange), at 11:00
> precisely it either becomes free or is available to someone else, so
> it can be booked 11:00 to 12:00 without conflict.
> 
> If you have raffle tickets numbered 1 to 100 (int4range), and you ask
> for tickets 9 to 11, no-one else can use 11 as it aligns with the last
> one you bought.
> 
> So for me, it's intuitive for them to behave differently.  So yes,
> default behaviour would vary between types, but I didn't previously
> read anything on default_flags, so I don't know where that comes into
> it.  Shouldn't it be the case that if a type has a canonical function,
> it's entirely inclusive, otherwise it's upper boundary is exclusive?

I'm not convinced by this. The question here is not whether which types
of ranges we *support*, only which type we consider to be more *canonical*,
and whether the bounds provided to a range constructor are inclusive or
exclusive by *default*.

Maybe ranges over discrete types are slightly more likely to be closed,
and ranges over continuous types slightly more likely to be open. Still,
I very much doubt that the skew in the distribution is large enough to
warrant the confusion and possibility of subtle bugs we introduce by making
the semantics of a range type's constructor depend on the definition of the
range and/or base type. Especially since we're talking about only *6* extra
characters to communicate the intended inclusivity/exclusivity of the bounds
to the range constructor.

Also, the distinction between types for which ranges should "obviously"
be closed, and those for which they should "obviously" be half-open is nowhere
as clear-cut as it seems at first sight.

First, there's the type "date", which in my book is discrete. So we'd make
date ranges closed by default, not half-open. And there's timestamp, which
is continuous so we'd make its default half-open. That doesn't seem exactly
intuitive to me.

Second, there's int4 and float8, one discrete, one continuous. So would we
make int4range(1, 2) include 2, but float8range(1.0, 2.0) *not* include 2.0?

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-10-11T01:26:01Z

On Oct10, 2011, at 19:41 , Jeff Davis wrote:
> On Mon, 2011-10-10 at 19:22 +0200, Florian Pflug wrote:
>> I still think we should strive for consistency here, so let's also make
>> '[]' the default flags for the range constructors.
> 
> For continuous ranges I don't think that's a good idea. Closed-open is a
> very widely-accepted convention and there are good reasons for it -- for
> instance, it's good for specifying contiguous-but-non-overlapping
> ranges.

It really depends on what you're using ranges for. Yeah, if you're "convering"
something with ranges (like mapping things to a certain period of time, or
an area of space), then half-open ranges are probably very common.

If, OTOH, you're storing measurement with error margins, then open ranges,
i.e. '()', are probably what you want.

I still firmly believe that consistency trumps convenience here. Specifying
the range boundaries' exclusivity/inclusivity explicitly is quite cheap...

> So, I think we either need to standardize on '[)' or allow different
> default_flags for different types. Or, always specify the inclusivity in
> the constructor (hopefully in a convenient way).

In the light of Tom's argument, my pick would be '[]'. It's seem strange
to normalize ranges over discrete types to closed ranges, yet make the
construction function expect open boundaries by default.

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-11T05:25:18Z

On Tue, 2011-10-11 at 03:14 +0200, Florian Pflug wrote:
> Maybe ranges over discrete types are slightly more likely to be closed,
> and ranges over continuous types slightly more likely to be open. Still,
> I very much doubt that the skew in the distribution is large enough to
> warrant the confusion and possibility of subtle bugs we introduce by making
> the semantics of a range type's constructor depend on the definition of the
> range and/or base type.

I think you persuaded me on the consistency aspect.

I'm wondering whether to do away with the default argument entirely, and
just force the user to always specify it during construction. It seems
like a shame that such pain is caused over the syntax, because in a
perfect world it wouldn't be a bother to specify it at all. I even
considered using prefix/postfix operators to try to make it nicer, but
it seems like every idea I had was just short of practical. Maybe a few
extra characters at the end aren't so bad.

I'd like to hear from some potential users though to see if anyone
recoils at the common case.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Thom Brown <thom@linux.com> — 2011-10-11T07:25:47Z

On 11 October 2011 02:14, Florian Pflug <fgp@phlo.org> wrote:
> On Oct10, 2011, at 20:06 , Thom Brown wrote:
>> Okay, a real example of why discrete should be '[]' and continuous
>> should be '[)'.
>>
>> If you book a meeting from 09:00 to 11:00 (tsrange), at 11:00
>> precisely it either becomes free or is available to someone else, so
>> it can be booked 11:00 to 12:00 without conflict.
>>
>> If you have raffle tickets numbered 1 to 100 (int4range), and you ask
>> for tickets 9 to 11, no-one else can use 11 as it aligns with the last
>> one you bought.
>>
>> So for me, it's intuitive for them to behave differently.  So yes,
>> default behaviour would vary between types, but I didn't previously
>> read anything on default_flags, so I don't know where that comes into
>> it.  Shouldn't it be the case that if a type has a canonical function,
>> it's entirely inclusive, otherwise it's upper boundary is exclusive?
>
> First, there's the type "date", which in my book is discrete. So we'd make
> date ranges closed by default, not half-open. And there's timestamp, which
> is continuous so we'd make its default half-open. That doesn't seem exactly
> intuitive to me.

Ah yes, I agree there.  Okay, I see your point.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

David Fetter <david@fetter.org> — 2011-10-11T12:43:07Z

On Mon, Oct 10, 2011 at 10:25:18PM -0700, Jeff Davis wrote:
> On Tue, 2011-10-11 at 03:14 +0200, Florian Pflug wrote:
> > Maybe ranges over discrete types are slightly more likely to be
> > closed, and ranges over continuous types slightly more likely to
> > be open. Still, I very much doubt that the skew in the
> > distribution is large enough to warrant the confusion and
> > possibility of subtle bugs we introduce by making the semantics of
> > a range type's constructor depend on the definition of the range
> > and/or base type.
> 
> I think you persuaded me on the consistency aspect.
> 
> I'm wondering whether to do away with the default argument entirely,
> and just force the user to always specify it during construction. It
> seems like a shame that such pain is caused over the syntax, because
> in a perfect world it wouldn't be a bother to specify it at all. I
> even considered using prefix/postfix operators to try to make it
> nicer, but it seems like every idea I had was just short of
> practical. Maybe a few extra characters at the end aren't so bad.
> 
> I'd like to hear from some potential users though to see if anyone
> recoils at the common case.

I'd recoil at not having ranges default to left-closed, right-open.
The use case for that one is so compelling that I'm OK with making it
the default from which deviations need to be specified.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Range Types - typo + NULL string constructor

Florian G. Pflug <fgp@phlo.org> — 2011-10-11T13:20:05Z

On Oct11, 2011, at 14:43 , David Fetter wrote:
> I'd recoil at not having ranges default to left-closed, right-open.
> The use case for that one is so compelling that I'm OK with making it
> the default from which deviations need to be specified.

The downside of that is that, as Tom pointed out upthread, we cannot
make [) the canonical representation of ranges. It'd require us to
increment the right boundary of a closed range, but that incremented
boundary might no longer be in the base type's domain.

So we'd end up with [) being the default for range construction,
but [] being the canonical representation, i.e. what you get back
when SELECTing a range (over a discrete base type).

Certainly not the end of the world, but is the convenience of being
able to write somerange(a, b) instead of somerange(a, b, '[)') really
worth it? I kind of doubt that...

best regards,
Florian Pflug

Re: Range Types - typo + NULL string constructor

David Fetter <david@fetter.org> — 2011-10-11T13:28:30Z

On Tue, Oct 11, 2011 at 03:20:05PM +0200, Florian Pflug wrote:
> On Oct11, 2011, at 14:43 , David Fetter wrote:
> > I'd recoil at not having ranges default to left-closed,
> > right-open.  The use case for that one is so compelling that I'm
> > OK with making it the default from which deviations need to be
> > specified.
> 
> The downside of that is that, as Tom pointed out upthread, we cannot
> make [) the canonical representation of ranges. It'd require us to
> increment the right boundary of a closed range, but that incremented
> boundary might no longer be in the base type's domain.
> 
> So we'd end up with [) being the default for range construction, but
> [] being the canonical representation, i.e. what you get back when
> SELECTing a range (over a discrete base type).
> 
> Certainly not the end of the world, but is the convenience of being
> able to write somerange(a, b) instead of somerange(a, b, '[)')
> really worth it? I kind of doubt that...

You're making a persuasive argument for the latter based solely on the
clarity.  If people see that 3rd element in the DDL, or need to
provide it, it's *very* obvious what's going on.

Cheers,
David (who suspects that having a syntax like somerange[a,b) just
won't work with the current state of parsers, etc.)
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-11T16:03:12Z

On Tue, 2011-10-11 at 06:28 -0700, David Fetter wrote:
> > Certainly not the end of the world, but is the convenience of being
> > able to write somerange(a, b) instead of somerange(a, b, '[)')
> > really worth it? I kind of doubt that...
> 
> You're making a persuasive argument for the latter based solely on the
> clarity.  If people see that 3rd element in the DDL, or need to
> provide it, it's *very* obvious what's going on.

That was how I originally thought, but we're also providing built-in
range types like tsrange and daterange. I could see how if the former
excluded the endpoint and the latter included it, it could be confusing.

We could go back to having different constructor names for different
inclusivity; e.g. int4range_cc(1,10). That at least removes the
awkwardness of typing (and seeing) '[]'.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Robert Haas <robertmhaas@gmail.com> — 2011-10-11T16:09:01Z

On Tue, Oct 11, 2011 at 12:03 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2011-10-11 at 06:28 -0700, David Fetter wrote:
>> > Certainly not the end of the world, but is the convenience of being
>> > able to write somerange(a, b) instead of somerange(a, b, '[)')
>> > really worth it? I kind of doubt that...
>>
>> You're making a persuasive argument for the latter based solely on the
>> clarity.  If people see that 3rd element in the DDL, or need to
>> provide it, it's *very* obvious what's going on.
>
> That was how I originally thought, but we're also providing built-in
> range types like tsrange and daterange. I could see how if the former
> excluded the endpoint and the latter included it, it could be confusing.
>
> We could go back to having different constructor names for different
> inclusivity; e.g. int4range_cc(1,10). That at least removes the
> awkwardness of typing (and seeing) '[]'.

The cure seems worse than the disease.  What is so bad about '[]'?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

David Fetter <david@fetter.org> — 2011-10-11T16:12:10Z

On Tue, Oct 11, 2011 at 12:09:01PM -0400, Robert Haas wrote:
> On Tue, Oct 11, 2011 at 12:03 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> > On Tue, 2011-10-11 at 06:28 -0700, David Fetter wrote:
> >> > Certainly not the end of the world, but is the convenience of being
> >> > able to write somerange(a, b) instead of somerange(a, b, '[)')
> >> > really worth it? I kind of doubt that...
> >>
> >> You're making a persuasive argument for the latter based solely on the
> >> clarity.  If people see that 3rd element in the DDL, or need to
> >> provide it, it's *very* obvious what's going on.
> >
> > That was how I originally thought, but we're also providing built-in
> > range types like tsrange and daterange. I could see how if the former
> > excluded the endpoint and the latter included it, it could be confusing.
> >
> > We could go back to having different constructor names for different
> > inclusivity; e.g. int4range_cc(1,10). That at least removes the
> > awkwardness of typing (and seeing) '[]'.
> 
> The cure seems worse than the disease.  What is so bad about '[]'?

Nothing's bad about '[]' per se.  What's better, but possibly out of
the reach of our current lexing and parsing system, would be things
like:

[1::int, 10)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Range Types - typo + NULL string constructor

Robert Haas <robertmhaas@gmail.com> — 2011-10-11T16:18:18Z

On Tue, Oct 11, 2011 at 12:12 PM, David Fetter <david@fetter.org> wrote:
> Nothing's bad about '[]' per se.  What's better, but possibly out of
> the reach of our current lexing and parsing system, would be things
> like:
>
> [1::int, 10)

That's been discussed before.  Aside from the parser issues (which are
formidable) it would break brace-matching in most if not all commonly
used editors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-11T16:30:02Z

On Tue, 2011-10-11 at 12:09 -0400, Robert Haas wrote:
> The cure seems worse than the disease.  What is so bad about '[]'?

OK, so we stick with the 3-argument form. Do we have a default for the
third argument, or do we scrap it to avoid confusion?

There were some fairly strong objections to using '[]' as the default or
having the default vary between types. So, the only real option
remaining, if we do have a default, is '[)'.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Robert Haas <robertmhaas@gmail.com> — 2011-10-11T16:40:32Z

On Tue, Oct 11, 2011 at 12:30 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2011-10-11 at 12:09 -0400, Robert Haas wrote:
>> The cure seems worse than the disease.  What is so bad about '[]'?
>
> OK, so we stick with the 3-argument form. Do we have a default for the
> third argument, or do we scrap it to avoid confusion?
>
> There were some fairly strong objections to using '[]' as the default or
> having the default vary between types. So, the only real option
> remaining, if we do have a default, is '[)'.

I think using '[)' is fine.  At some level, this is just a question of
expectations.  If you expect that int4range(1,4) will create a range
that includes 4, well, you're wrong.  Once you get used to it, it will
seem normal, and you'll know that you need to write
int4range(1,4,'[]') if that's what you want.  As long as the system is
designed around a set of consistent and well-thought-out principles,
people will get used to the details.  I don't see that the idea of a
half-open range over a discrete-valued type is particularly confusing
- we use them all the time in programming, when we make the end
pointer point to the byte following the end of the array, rather than
the last element - but even if it is, overall design consistency
trumps what someone may find to be the absolutely perfect behavior in
some particular case.  And saving typing is nearly always good -
unless it creates a LOT more confusion than I think this will.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-11T17:48:41Z

On Tue, 2011-10-11 at 12:40 -0400, Robert Haas wrote:
> I think using '[)' is fine.  At some level, this is just a question of
> expectations.  If you expect that int4range(1,4) will create a range
> that includes 4, well, you're wrong.  Once you get used to it, it will
> seem normal, and you'll know that you need to write
> int4range(1,4,'[]') if that's what you want.  As long as the system is
> designed around a set of consistent and well-thought-out principles,
> people will get used to the details.  I don't see that the idea of a
> half-open range over a discrete-valued type is particularly confusing
> - we use them all the time in programming, when we make the end
> pointer point to the byte following the end of the array, rather than
> the last element - but even if it is, overall design consistency
> trumps what someone may find to be the absolutely perfect behavior in
> some particular case.  And saving typing is nearly always good -
> unless it creates a LOT more confusion than I think this will.

That sounds very reasonable to me.

Tom made an observation about '[1,INT_MAX]' thowing an error because
canonicalization would try to increment INT_MAX. But I'm not
particularly disturbed by it. If you want a bigger range, use int8range
or numrange -- the same advice we give to people who want unsigned
types. Or, for people who really need the entire range of signed int4
exactly, they can easily make their own range type that canonicalizes to
'[]'.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-11T17:51:46Z

Jeff Davis <pgsql@j-davis.com> writes:
> Tom made an observation about '[1,INT_MAX]' thowing an error because
> canonicalization would try to increment INT_MAX. But I'm not
> particularly disturbed by it. If you want a bigger range, use int8range
> or numrange -- the same advice we give to people who want unsigned
> types. Or, for people who really need the entire range of signed int4
> exactly, they can easily make their own range type that canonicalizes to
> '[]'.

I agree we shouldn't contort the entire design to avoid that corner
case.  We should, however, make sure that the increment *does* throw
an error, and not just silently overflow.

			regards, tom lane

Re: Range Types - typo + NULL string constructor

David Fetter <david@fetter.org> — 2011-10-11T18:04:07Z

On Tue, Oct 11, 2011 at 12:18:18PM -0400, Robert Haas wrote:
> On Tue, Oct 11, 2011 at 12:12 PM, David Fetter <david@fetter.org> wrote:
> > Nothing's bad about '[]' per se.  What's better, but possibly out
> > of the reach of our current lexing and parsing system, would be
> > things like:
> >
> > [1::int, 10)
> 
> That's been discussed before.  Aside from the parser issues (which
> are formidable) it would break brace-matching in most if not all
> commonly used editors.

That being the situation, ubiquitous support for the natural syntax
looks like it's a decade away, minimum. :(

Trying to be cheery,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Range Types - typo + NULL string constructor

Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-11T19:57:26Z

Florian Pflug <fgp@phlo.org> writes:
> On Oct11, 2011, at 14:43 , David Fetter wrote:
>> I'd recoil at not having ranges default to left-closed, right-open.
>> The use case for that one is so compelling that I'm OK with making it
>> the default from which deviations need to be specified.

I agree with David on this.

> The downside of that is that, as Tom pointed out upthread, we cannot
> make [) the canonical representation of ranges.

Yeah, we certainly *can* do that, we just have to allow ranges that
include the last element of the domain to be corner cases that require
special handling.  If we don't want to just fail, we have to
canonicalize them to closed instead of open ranges.  It does not follow
that the default on input has to be closed.

Note that the INT_MAX case is probably not the worst issue in practice.
What is going to be an issue is ranges over enum types, where having the
last element being part of the range is a much more likely use-case.

			regards, tom lane

Re: Range Types - typo + NULL string constructor

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-14T22:46:26Z

On 07.10.2011 06:41, Jeff Davis wrote:
> The repo is available here:
>
> http://git.postgresql.org/gitweb/?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes

I took a look at this, fixed a bunch of minor issues, and pushed to my
git repo: git://git.postgresql.org/git/users/heikki/postgres.git.
Attached is an updated patch including those changes, for the archives.

A few issues that I didn't fix straight away:

* Why do you need to be a superuser to create a range type? In
DefineRange, the comment explaining why superuser privilege is required
seems bogus, copy-pasted from base types. So what is the reason?

* The binary i/o format includes the length of the lower and upper
bounds twice, once explicitly in range_send, and second time within the
send-function of the subtype. Seems wasteful.

* In findRangeSubOpclass, I think it's OK to hard-code "btree" into the
error message, no need to look that up from pg_am.

* Do we really need non_empty(anyrange) ? You can just do "NOT empty(x)"

* Range_before and range_after: I think "input range is empty" would
sound better than "undefined for empty ranges".

* In range_hash, rotating the hash of the initial flags byte seems pointless

* range_constructor_internal - I think it would be better to move logic
to figure out the the arguments into the callers.

* The gist support functions frequently call range_deserialize(), which
does catalog lookups. Isn't that horrendously expensive?

* In range_serialize and range_deserialize, reduce the number of catalog
lookups by combining the get_range_subtype, get_typlen, get_typalign,
get_typbyval and get_typstorage calls
with just one catalog lookup, and fetch all those fields at once.
Something like get_typlenbyvalalign()

* Have you tested this on an architecture with strict alignment? I don't
see any alignment bugs, but I think there's a lot of potential for them..

Documentation
-------------
* Section "Built-in Range Types". How about "PostgreSQL comes with the
following built-in range types:" <examples> "In addition, you can define
your own"

* In section "Inclusive and Exclusive Bounds", it is said "An inclusive
lower bound is represented by ...". That begs the question: where is it
represented like that? That doesn't become clear until you read the
section on Input/Output format. Reordering the sections might help. Are
the double-quotes around [ and ( necessary?

* It would be good to have some examples on the input formats in the
Input/Output section. There's one in the Examples section, but it's not
very clear that it's related. Perhaps the whole Examples section should
be moved to the end, or the examples be dispersed into the other sections.

* Likewise it would be nice to have some examples in the Inclusive and
Exclusive Bounds section.

* subtype_float, how is it supposed to work? I couldn't find any
explanation in the docs. Can we come up with a better name for the function?

* Constructing Ranges section: should describe the 0-3 argument forms of
the constructors, not only in the examples but in the main text too.

* What exactly is canonical function supposed to return? It's not clear
what format one should choose as the canonical format when writing a
custom range type. And even for the built-in types, it would be good to
explain which types use which canonical format (I saw the discussions on
that, so I understand that might still be subject to change).

* Defining New RangeTypes section: A more general description in
addition to the example would be good.

* "GiST Indexing" title: how about making it just "Indexing". Also, it
would be nice to list exactly which operators can be sped up by gist.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Re: Range Types - typo + NULL string constructor

Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-15T01:58:29Z

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> * Have you tested this on an architecture with strict alignment? I don't 
> see any alignment bugs, but I think there's a lot of potential for them..

Well, fwiw, this patch passes its regression tests on HPPA, except for this
which seems more to do with the already-noted unacceptable dependency on
non-C collations:

*** /home/postgres/pgsql/src/test/regress/expected/rangetypes.out	Fri Oct 14 21:19:19 2011
--- /home/postgres/pgsql/src/test/regress/results/rangetypes.out	Fri Oct 14 21:50:11 2011
***************
*** 842,857 ****
  --
  create type textrange_c as range(subtype=text, collation="C");
  create type textrange_en_us as range(subtype=text, collation="en_US");
  select textrange_c('a','Z') @> 'b'::text;
  ERROR:  range lower bound must be less than or equal to range upper bound
  select textrange_en_us('a','Z') @> 'b'::text;
!  ?column? 
! ----------
!  t
! (1 row)
! 
  drop type textrange_c;
  drop type textrange_en_us;
  --
  -- Test out polymorphic type system
  --
--- 842,858 ----
  --
  create type textrange_c as range(subtype=text, collation="C");
  create type textrange_en_us as range(subtype=text, collation="en_US");
+ ERROR:  collation "en_US" for encoding "SQL_ASCII" does not exist
  select textrange_c('a','Z') @> 'b'::text;
  ERROR:  range lower bound must be less than or equal to range upper bound
  select textrange_en_us('a','Z') @> 'b'::text;
! ERROR:  function textrange_en_us(unknown, unknown) does not exist
! LINE 1: select textrange_en_us('a','Z') @> 'b'::text;
!                ^
! HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
  drop type textrange_c;
  drop type textrange_en_us;
+ ERROR:  type "textrange_en_us" does not exist
  --
  -- Test out polymorphic type system
  --

======================================================================


Also, I notice you forgot to update serial_schedule:

diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index bb654f9c612970ef777e8cc39369a91e343f6afc..2e87d9eefd6fbb70a5603bc000ffe833
5945201f 100644
*** a/src/test/regress/serial_schedule
--- b/src/test/regress/serial_schedule
*************** test: txid
*** 18,23 ****
--- 18,24 ----
  test: uuid
  test: enum
  test: money
+ test: rangetypes
  test: strings
  test: numerology
  test: point


			regards, tom lane

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-16T22:09:13Z

On Sat, 2011-10-15 at 01:46 +0300, Heikki Linnakangas wrote:
> On 07.10.2011 06:41, Jeff Davis wrote:
> > The repo is available here:
> >
> > http://git.postgresql.org/gitweb/?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes
> 
> I took a look at this, fixed a bunch of minor issues, and pushed to my 
> git repo: git://git.postgresql.org/git/users/heikki/postgres.git. 
> Attached is an updated patch including those changes, for the archives.

Thank you for the review!

Updated patch attached (and pushed to repo).

> A few issues that I didn't fix straight away:
> 
> * Why do you need to be a superuser to create a range type? In 
> DefineRange, the comment explaining why superuser privilege is required 
> seems bogus, copy-pasted from base types. So what is the reason?

No particular reason. I thought someone might ask me to include such as
restriction. I have removed it to allow non-superuser creation.

> * The binary i/o format includes the length of the lower and upper 
> bounds twice, once explicitly in range_send, and second time within the 
> send-function of the subtype. Seems wasteful.

Any ideas how to fix that? How else do I know how much the underlying
send function will consume?

> * In findRangeSubOpclass, I think it's OK to hard-code "btree" into the 
> error message, no need to look that up from pg_am.

Done.

> * Do we really need non_empty(anyrange) ? You can just do "NOT empty(x)"

To make it a searchable (via GiST) condition, I need an operator. I
could either remove that operator (as it's not amazingly useful), or I
could just not document the function but leave the operator there.

> * Range_before and range_after: I think "input range is empty" would 
> sound better than "undefined for empty ranges".

Done.

> * In range_hash, rotating the hash of the initial flags byte seems pointless

Fixed.

> * range_constructor_internal - I think it would be better to move logic 
> to figure out the the arguments into the callers.

Done.

> * The gist support functions frequently call range_deserialize(), which 
> does catalog lookups. Isn't that horrendously expensive?

Yes, it was. I have introduced a cached structure that avoids syscache
lookups when it's the same range as the last lookup (the common case).

> * In range_serialize and range_deserialize, reduce the number of catalog 
> lookups by combining the get_range_subtype, get_typlen, get_typalign, 
> get_typbyval and get_typstorage calls
> with just one catalog lookup, and fetch all those fields at once. 
> Something like get_typlenbyvalalign()

Done in conjunction with the cache structure I mentioned above.

> * Have you tested this on an architecture with strict alignment? I don't 
> see any alignment bugs, but I think there's a lot of potential for them..

I don't have such an architecture readily available.

> Documentation
> -------------
> * Section "Built-in Range Types". How about "PostgreSQL comes with the 
> following built-in range types:" <examples> "In addition, you can define 
> your own"

Done.

> * In section "Inclusive and Exclusive Bounds", it is said "An inclusive 
> lower bound is represented by ...". That begs the question: where is it 
> represented like that? That doesn't become clear until you read the 
> section on Input/Output format. Reordering the sections might help. Are 
> the double-quotes around [ and ( necessary?

I added a forward reference. Rearranging the sections seemed to create
the opposite problem. And I removed the extra double-quotes.

> * It would be good to have some examples on the input formats in the 
> Input/Output section. There's one in the Examples section, but it's not 
> very clear that it's related. Perhaps the whole Examples section should 
> be moved to the end, or the examples be dispersed into the other sections.

Done.

> * Likewise it would be nice to have some examples in the Inclusive and 
> Exclusive Bounds section.

I think the new examples in the I/O section cover that, and there's a
forward link from the inclusivity/exclusivity section.

> * subtype_float, how is it supposed to work? I couldn't find any 
> explanation in the docs. Can we come up with a better name for the function?

It's only for the gist penalty function. It's difficult to know how to
penalize without being able to do some math on the boundaries, so
subtype_float it meant to return a float that might be useful for that
purpose.

It's likely that this will change based on Alexander Korotkov's comments
here:

http://archives.postgresql.org/message-id/CAPpHfdsVow
+Yeoq5Be=sxPFPsbYG3Gj3QyVvVdt8dg7XqNyHpg@mail.gmail.com

So I will defer the documentation updates until we work that out.

> * Constructing Ranges section: should describe the 0-3 argument forms of 
> the constructors, not only in the examples but in the main text too.

Done.

> * What exactly is canonical function supposed to return? It's not clear 
> what format one should choose as the canonical format when writing a 
> custom range type. And even for the built-in types, it would be good to 
> explain which types use which canonical format (I saw the discussions on 
> that, so I understand that might still be subject to change).

The canonical function is just supposed to return a new range such that
two equal values will have equal representations. I have listed the
built-in discrete range types and their canonical form.

As far as describing what a custom range type is supposed to use for the
canonical form, I don't know which part is exactly unclear. There aren't
too many rules to defining one -- the only guideline is that ranges of
equal value going in should be put in one canonical representation.

> * Defining New RangeTypes section: A more general description in 
> addition to the example would be good.

Done.

> * "GiST Indexing" title: how about making it just "Indexing". Also, it 
> would be nice to list exactly which operators can be sped up by gist.

Done.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Erik Rijkers <er@xs4all.nl> — 2011-10-17T14:07:23Z

On Mon, October 17, 2011 00:09, Jeff Davis wrote:

To facilitate would-be testers let me mention that to apply this patch one needs to remove the
catversion changes from the patch.

(as discussed before: http://archives.postgresql.org/pgsql-hackers/2011-02/msg00817.php )


Erik Rijkers

Re: Range Types - typo + NULL string constructor

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-24T10:15:03Z

On 17.10.2011 01:09, Jeff Davis wrote:
> On Sat, 2011-10-15 at 01:46 +0300, Heikki Linnakangas wrote:
>> * The binary i/o format includes the length of the lower and upper
>> bounds twice, once explicitly in range_send, and second time within the
>> send-function of the subtype. Seems wasteful.
>
> Any ideas how to fix that? How else do I know how much the underlying
> send function will consume?

Oh, never mind. I was misreading the code, it's not sending the length 
twice.

>> * range_constructor_internal - I think it would be better to move logic
>> to figure out the the arguments into the callers.
>
> Done.

The comment above range_constructor0() is now outdated.

>> * The gist support functions frequently call range_deserialize(), which
>> does catalog lookups. Isn't that horrendously expensive?
>
> Yes, it was. I have introduced a cached structure that avoids syscache
> lookups when it's the same range as the last lookup (the common case).

Hmm, I don't think that's safe. After Oid wraparound, a range type oid 
might get reused for some other range type, and the cache would return 
stale values. Extremely unlikely to happen by accident, but could be 
exploited by an attacker.

>> * What exactly is canonical function supposed to return? It's not clear
>> what format one should choose as the canonical format when writing a
>> custom range type. And even for the built-in types, it would be good to
>> explain which types use which canonical format (I saw the discussions on
>> that, so I understand that might still be subject to change).
>
> The canonical function is just supposed to return a new range such that
> two equal values will have equal representations. I have listed the
> built-in discrete range types and their canonical form.
>
> As far as describing what a custom range type is supposed to use for the
> canonical form, I don't know which part is exactly unclear. There aren't
> too many rules to defining one -- the only guideline is that ranges of
> equal value going in should be put in one canonical representation.

Ok. The name "canonical" certainly hints at that, but it would be good 
to explicitly state that guideline. As the text stands, it would seem 
that a canonical function that maps "[1,7]" to "[1,8)", and also vice 
versa, "[1,8)" to "[1,7]", would be valid. That would be pretty silly, 
but it would be good to say something like "The canonical output for two 
values that are equal, like [1,7] and [1,8), must be equal. It doesn't 
matter which representation you choose to be the canonical one, as long 
as two equal values with different formattings are always mapped to the 
same value with same formatting"

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-25T16:37:12Z

On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:
> Hmm, I don't think that's safe. After Oid wraparound, a range type oid 
> might get reused for some other range type, and the cache would return 
> stale values. Extremely unlikely to happen by accident, but could be 
> exploited by an attacker.
> 

Any ideas on how to remedy that? I don't have another plan for making it
perform well. Plugging it into the cache invalidation mechanism seems
like overkill, but I suppose that would solve the problem.

Aren't there a few other cases like this floating around the code? I
know the single-xid cache is potentially vulnerable to xid wraparound
for the same reason.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-26T06:53:46Z

On 25.10.2011 19:37, Jeff Davis wrote:
> On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:
>> Hmm, I don't think that's safe. After Oid wraparound, a range type oid
>> might get reused for some other range type, and the cache would return
>> stale values. Extremely unlikely to happen by accident, but could be
>> exploited by an attacker.
>
> Any ideas on how to remedy that? I don't have another plan for making it
> perform well. Plugging it into the cache invalidation mechanism seems
> like overkill, but I suppose that would solve the problem.

I think we should look at the array-functions for precedent. array_in et 
al cache the information in fn_extra, so that when it's called 
repeatedly in one statement for the same type, the information is only 
looked up once. That's good enough, it covers repeated execution in a 
single query, as well as COPY and comparison calls from index searches, 
for example.

> Aren't there a few other cases like this floating around the code?

Not that I know of. That said, I wouldn't be too surprised if there was.

> I know the single-xid cache is potentially vulnerable to xid wraparound
> for the same reason.

True.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Range Types - typo + NULL string constructor

Robert Haas <robertmhaas@gmail.com> — 2011-10-26T15:42:52Z

On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:
>> Hmm, I don't think that's safe. After Oid wraparound, a range type oid
>> might get reused for some other range type, and the cache would return
>> stale values. Extremely unlikely to happen by accident, but could be
>> exploited by an attacker.
>
> Any ideas on how to remedy that? I don't have another plan for making it
> perform well. Plugging it into the cache invalidation mechanism seems
> like overkill, but I suppose that would solve the problem.
>
> Aren't there a few other cases like this floating around the code? I
> know the single-xid cache is potentially vulnerable to xid wraparound
> for the same reason.

I believe that we're in trouble with XIDs as soon as you have two
active XIDs that are separated by a billion, because then you could
have a situation where some people think a given XID is in the future
and others think it's in the past.  I have been wondering if we should
have some sort of active guard against that scenario; I don't think we
do at present.

But OID wraparound is not the same as XID wraparound.  It's far more
common, I think, for a single transaction to use lots of OIDs than it
is for it to use lots of XIDs (i.e. have many subtransactions).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-26T15:52:28Z

On 26.10.2011 18:42, Robert Haas wrote:
> On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis<pgsql@j-davis.com>  wrote:
>> Aren't there a few other cases like this floating around the code? I
>> know the single-xid cache is potentially vulnerable to xid wraparound
>> for the same reason.
>
> I believe that we're in trouble with XIDs as soon as you have two
> active XIDs that are separated by a billion, ...

That's not what Jeff is referring to here, though (correct me if I'm 
wrong). He's talking about the one-item cache in 
TransactionIdLogFetch(). You don't need need long-running transactions 
for that to get confused. Specifically, this could happen:

1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
    The row has xmin = 123456, and it is cached as committed in the 
one-item cache by TransactionLogFetch.
2. A lot of time passes. Everything is frozen, and XID wrap-around 
happens. (Session A is idle but not in a transaction, so it doesn't 
inhibit freezing.)
3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
    By coincidence, this transaction was assigned XID 123456.
4. In session A: SELECT * FROM foo WHERE id = 2;
    The one-item cache still says that 123456 committed, so we return 
the tuple inserted by the aborted transaction. Oops.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Range Types - typo + NULL string constructor

Robert Haas <robertmhaas@gmail.com> — 2011-10-26T16:19:47Z

On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 26.10.2011 18:42, Robert Haas wrote:
>>
>> On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis<pgsql@j-davis.com>  wrote:
>>>
>>> Aren't there a few other cases like this floating around the code? I
>>> know the single-xid cache is potentially vulnerable to xid wraparound
>>> for the same reason.
>>
>> I believe that we're in trouble with XIDs as soon as you have two
>> active XIDs that are separated by a billion, ...
>
> That's not what Jeff is referring to here, though (correct me if I'm wrong).
> He's talking about the one-item cache in TransactionIdLogFetch(). You don't
> need need long-running transactions for that to get confused. Specifically,
> this could happen:
>
> 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
>   The row has xmin = 123456, and it is cached as committed in the one-item
> cache by TransactionLogFetch.
> 2. A lot of time passes. Everything is frozen, and XID wrap-around happens.
> (Session A is idle but not in a transaction, so it doesn't inhibit
> freezing.)
> 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
>   By coincidence, this transaction was assigned XID 123456.
> 4. In session A: SELECT * FROM foo WHERE id = 2;
>   The one-item cache still says that 123456 committed, so we return the
> tuple inserted by the aborted transaction. Oops.

Oh, hmm.  That sucks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types - typo + NULL string constructor

Alvaro Herrera <alvherre@commandprompt.com> — 2011-10-26T16:31:43Z

Excerpts from Robert Haas's message of mié oct 26 13:19:47 -0300 2011:
> On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:

> > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
> >   The row has xmin = 123456, and it is cached as committed in the one-item
> > cache by TransactionLogFetch.
> > 2. A lot of time passes. Everything is frozen, and XID wrap-around happens.
> > (Session A is idle but not in a transaction, so it doesn't inhibit
> > freezing.)
> > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
> >   By coincidence, this transaction was assigned XID 123456.
> > 4. In session A: SELECT * FROM foo WHERE id = 2;
> >   The one-item cache still says that 123456 committed, so we return the
> > tuple inserted by the aborted transaction. Oops.
> 
> Oh, hmm.  That sucks.

Can this be solved by simple application of the Xid epoch?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-26T16:47:05Z

On Wed, 2011-10-26 at 12:19 -0400, Robert Haas wrote:
> > 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
> >   The row has xmin = 123456, and it is cached as committed in the one-item
> > cache by TransactionLogFetch.
> > 2. A lot of time passes. Everything is frozen, and XID wrap-around happens.
> > (Session A is idle but not in a transaction, so it doesn't inhibit
> > freezing.)
> > 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
> >   By coincidence, this transaction was assigned XID 123456.
> > 4. In session A: SELECT * FROM foo WHERE id = 2;
> >   The one-item cache still says that 123456 committed, so we return the
> > tuple inserted by the aborted transaction. Oops.

Yes, that's the scenario I was talking about.

> Oh, hmm.  That sucks.

It didn't seem like a major concern a while ago:
http://archives.postgresql.org/message-id/28107.1291264452@sss.pgh.pa.us

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-26T17:27:03Z

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> That's not what Jeff is referring to here, though (correct me if I'm 
> wrong). He's talking about the one-item cache in 
> TransactionIdLogFetch(). You don't need need long-running transactions 
> for that to get confused. Specifically, this could happen:

> 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
>     The row has xmin = 123456, and it is cached as committed in the 
> one-item cache by TransactionLogFetch.
> 2. A lot of time passes. Everything is frozen, and XID wrap-around 
> happens. (Session A is idle but not in a transaction, so it doesn't 
> inhibit freezing.)
> 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
>     By coincidence, this transaction was assigned XID 123456.
> 4. In session A: SELECT * FROM foo WHERE id = 2;
>     The one-item cache still says that 123456 committed, so we return 
> the tuple inserted by the aborted transaction. Oops.

I think this is probably a red herring, because it's impossible for a
session to remain totally idle for that long --- see sinval updating.
(If you wanted to be really sure, we could have sinval reset clear
the TransactionLogFetch cache, but I doubt it's worth the trouble.)

			regards, tom lane

Re: Range Types - typo + NULL string constructor

Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-26T17:28:52Z

Robert Haas <robertmhaas@gmail.com> writes:
> I believe that we're in trouble with XIDs as soon as you have two
> active XIDs that are separated by a billion, because then you could
> have a situation where some people think a given XID is in the future
> and others think it's in the past.  I have been wondering if we should
> have some sort of active guard against that scenario; I don't think we
> do at present.

Sure we do.  It's covered by the XID wraparound prevention logic.

			regards, tom lane

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-10-31T18:01:44Z

On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:
> >> * range_constructor_internal - I think it would be better to move logic
> >> to figure out the the arguments into the callers.
> >
> > Done.
> 
> The comment above range_constructor0() is now outdated.

Removed.

> >> * The gist support functions frequently call range_deserialize(), which
> >> does catalog lookups. Isn't that horrendously expensive?
> >
> > Yes, it was. I have introduced a cached structure that avoids syscache
> > lookups when it's the same range as the last lookup (the common case).
> 
> Hmm, I don't think that's safe. After Oid wraparound, a range type oid 
> might get reused for some other range type, and the cache would return 
> stale values. Extremely unlikely to happen by accident, but could be 
> exploited by an attacker.

Done.

> Ok. The name "canonical" certainly hints at that, but it would be good 
> to explicitly state that guideline. As the text stands, it would seem 
> that a canonical function that maps "[1,7]" to "[1,8)", and also vice 
> versa, "[1,8)" to "[1,7]", would be valid. That would be pretty silly, 
> but it would be good to say something like "The canonical output for two 
> values that are equal, like [1,7] and [1,8), must be equal. It doesn't 
> matter which representation you choose to be the canonical one, as long 
> as two equal values with different formattings are always mapped to the 
> same value with same formatting"

Used your wording, thank you.

Regards,
	Jeff Davis

Re: Range Types - typo + NULL string constructor

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-11-03T08:37:00Z

On 17.10.2011 01:09, Jeff Davis wrote:
> On Sat, 2011-10-15 at 01:46 +0300, Heikki Linnakangas wrote:
>> * Do we really need non_empty(anyrange) ? You can just do "NOT empty(x)"
>
> To make it a searchable (via GiST) condition, I need an operator. I
> could either remove that operator (as it's not amazingly useful), or I
> could just not document the function but leave the operator there.

Looking at the most recent patch, I don't actually see any GiST support 
for the empty and non-empty operators (!? and ?). I don't see how those 
could be accelerated with GiST, anyway; I think if you want to use an 
index for those operators, you might as well create a partial or 
functional index on empty(x).

So I'm actually inclined to remove not only the nonempty function, but 
also the ? and !? operators. They don't seem very useful, and ? and !? 
don't feel very intuitive to me, anyway. I'll just leave the empty(x) 
function.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Range Types - typo + NULL string constructor

Jeff Davis <pgsql@j-davis.com> — 2011-11-03T08:46:09Z

On Thu, 2011-11-03 at 10:37 +0200, Heikki Linnakangas wrote:
> Looking at the most recent patch, I don't actually see any GiST support 
> for the empty and non-empty operators (!? and ?). I don't see how those 
> could be accelerated with GiST, anyway; I think if you want to use an 
> index for those operators, you might as well create a partial or 
> functional index on empty(x).
> 
> So I'm actually inclined to remove not only the nonempty function, but 
> also the ? and !? operators. They don't seem very useful, and ? and !? 
> don't feel very intuitive to me, anyway. I'll just leave the empty(x) 
> function.

That's fine with me. At best, they were only marginally useful.

Regards,
	Jeff Davis