Thread

  1. Re: [HACKERS] backslash in psql output

    Tom Lane <tgl@sss.pgh.pa.us> — 1998-10-10T15:19:44Z

    Bruce Momjian <maillist@candle.pha.pa.us> writes:
    > I realize the double-backslash is confusing, but I don't think we can
    > make such a user-visible change at this time.  I think we need to open
    > discussion on this issue on the general list, and to include discussion
    > of NULL displays, and any other issues, as well as how to properly
    > output the column separation character if that appears in the data.
    > So, I think we have to put it back to the old way, and open discussion
    > about this after 6.4.
    
    Well, actually there *was* public discussion of this issue, on the
    pgsql-interfaces list around 12/13 August.  The consensus was that
    unnecessary backslashing was a bad idea --- in fact, I didn't see
    *anyone* arguing in favor of the old behavior, and the people who
    actually had backslashes in their data definitely didn't want it.
    Admittedly it was a pretty small sample (Tom Lockhart and I were
    two of the primary complainers) but there wasn't any sentiment
    for keeping the old behavior.
    
    Keep in mind that what we are discussing here is the behavior of
    PQprint(), not the behavior of FE/BE transport protocol or anything
    else that affects data received by applications.  PQprint's goal in
    life is to present data in a reasonably human-friendly way, *not*
    to produce a completely unambiguous machine-readable syntax.  Its
    output format is in fact very ambiguous.  Here's an example:
    
    play=> create table test(id int4, val text);
    play=> insert into test values (1, NULL);
    play=> insert into test values (2, '    ');
    play=> insert into test values (3, 'foobar');
    play=> insert into test values (4, 'oneback\\slash');
    play=> insert into test values (5, 'onevert|bar');
    play=> select * from test;
    id|val
    --+-------------
     1|
     2|
     3|foobar
     4|oneback\slash
     5|onevert|bar
    (5 rows)
    
    You can't tell the difference between a NULL field and an all-blanks
    value in this format; nor can you really be sure how many trailing
    blanks there are in tuples 3 and 5.  So the goal is readability,
    not lack of ambiguity.  Given that goal, I don't see the value of
    printing backslash escapes.  Are you really having difficulty telling
    the data vertical bar from the ones used as column separators?
    Physical alignment is the cue the eye relies on, I think.
    
    The only cases that PQprint inserted backslashes for were the column
    separator char (unnecessary per above example), newlines (also not
    exactly hard to recognize), and backslash itself.  All of these
    seem unnecessary and confusing to me.
    
    I'm sorry that this change sat in my to-do queue for so long, but
    I don't see it as a last-minute thing.  The consensus to do it was
    established two months ago.
    
    			regards, tom lane
    
    
  2. Re: [HACKERS] backslash in psql output

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-10-11T01:45:50Z

    > You can't tell the difference between a NULL field and an all-blanks
    > value in this format; nor can you really be sure how many trailing
    > blanks there are in tuples 3 and 5.  So the goal is readability,
    > not lack of ambiguity.  Given that goal, I don't see the value of
    > printing backslash escapes.  Are you really having difficulty telling
    > the data vertical bar from the ones used as column separators?
    > Physical alignment is the cue the eye relies on, I think.
    > 
    > The only cases that PQprint inserted backslashes for were the column
    > separator char (unnecessary per above example), newlines (also not
    > exactly hard to recognize), and backslash itself.  All of these
    > seem unnecessary and confusing to me.
    
    OK, I understand your point here, that we must maximize readability, and
    that robustness is not as important.
    
    OK, let's keep the removal of backslashes.  Can you recommend a nice
    NULL display, perhaps '[NULL]' or '<NULL>'?
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      maillist@candle.pha.pa.us            |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
    
  3. Re: [HACKERS] backslash in psql output

    D'Arcy Cain <darcy@druid.net> — 1998-10-11T04:40:24Z

    Thus spake Bruce Momjian
    > OK, let's keep the removal of backslashes.  Can you recommend a nice
    > NULL display, perhaps '[NULL]' or '<NULL>'?
    
    I'd like to make at least one vote to keep the status quo.  Some of
    us have come to depend on the existing behaviour.  At least make
    it an option that you have to turn on.
    
    -- 
    D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
    http://www.druid.net/darcy/                |  and a sheep voting on
    +1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.
    
    
  4. Re: [HACKERS] backslash in psql output

    Bruce Momjian <maillist@candle.pha.pa.us> — 1998-10-11T05:02:56Z

    > Thus spake Bruce Momjian
    > > OK, let's keep the removal of backslashes.  Can you recommend a nice
    > > NULL display, perhaps '[NULL]' or '<NULL>'?
    > 
    > I'd like to make at least one vote to keep the status quo.  Some of
    > us have come to depend on the existing behaviour.  At least make
    > it an option that you have to turn on.
    
    Man, I can't win.  :-)
    
    I vote for the status quo, and people want double backslashes removed. 
    I want to add a NULL display, and people want status quo.
    
    -- 
      Bruce Momjian                        |  http://www.op.net/~candle
      maillist@candle.pha.pa.us            |  (610) 853-3000
      +  If your life is a hard drive,     |  830 Blythe Avenue
      +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
    
    
    
  5. Re: [HACKERS] backslash in psql output

    Massimo Dal Zotto <dz@cs.unitn.it> — 1998-10-11T12:24:14Z

    > 
    > Bruce Momjian <maillist@candle.pha.pa.us> writes:
    > > I realize the double-backslash is confusing, but I don't think we can
    > > make such a user-visible change at this time.  I think we need to open
    > > discussion on this issue on the general list, and to include discussion
    > > of NULL displays, and any other issues, as well as how to properly
    > > output the column separation character if that appears in the data.
    > > So, I think we have to put it back to the old way, and open discussion
    > > about this after 6.4.
    > 
    > Well, actually there *was* public discussion of this issue, on the
    > pgsql-interfaces list around 12/13 August.  The consensus was that
    > unnecessary backslashing was a bad idea --- in fact, I didn't see
    > *anyone* arguing in favor of the old behavior, and the people who
    > actually had backslashes in their data definitely didn't want it.
    > Admittedly it was a pretty small sample (Tom Lockhart and I were
    > two of the primary complainers) but there wasn't any sentiment
    > for keeping the old behavior.
    > 
    > Keep in mind that what we are discussing here is the behavior of
    > PQprint(), not the behavior of FE/BE transport protocol or anything
    > else that affects data received by applications.  PQprint's goal in
    > life is to present data in a reasonably human-friendly way, *not*
    > to produce a completely unambiguous machine-readable syntax.  Its
    > output format is in fact very ambiguous.  Here's an example:
    > 
    > play=> create table test(id int4, val text);
    > play=> insert into test values (1, NULL);
    > play=> insert into test values (2, '    ');
    > play=> insert into test values (3, 'foobar');
    > play=> insert into test values (4, 'oneback\\slash');
    > play=> insert into test values (5, 'onevert|bar');
    > play=> select * from test;
    > id|val
    > --+-------------
    >  1|
    >  2|
    >  3|foobar
    >  4|oneback\slash
    >  5|onevert|bar
    > (5 rows)
    > 
    > You can't tell the difference between a NULL field and an all-blanks
    > value in this format; nor can you really be sure how many trailing
    > blanks there are in tuples 3 and 5.  So the goal is readability,
    > not lack of ambiguity.  Given that goal, I don't see the value of
    > printing backslash escapes.  Are you really having difficulty telling
    > the data vertical bar from the ones used as column separators?
    > Physical alignment is the cue the eye relies on, I think.
    > 
    > The only cases that PQprint inserted backslashes for were the column
    > separator char (unnecessary per above example), newlines (also not
    > exactly hard to recognize), and backslash itself.  All of these
    > seem unnecessary and confusing to me.
    > 
    > I'm sorry that this change sat in my to-do queue for so long, but
    > I don't see it as a last-minute thing.  The consensus to do it was
    > established two months ago.
    > 
    > 			regards, tom lane
    > 
    > 
    > 
    
    In my opinion we should privilege machine-readableness first and then provide
    some user option to enable user-friendly conversion in psql output if one
    really needs it.
    
    In situations where data is processed by other programs it is very important
    that there is no ambiguity in strings exchanged between the application and
    the backend. This is already done for input, which supports C-like escape,
    but not yet for output, which can produce ambiguous data when nulls, arrays
    or non-printing characters are involved. This is the reason why I always use
    my C-like output functions (contrib/string-io) in all my applications.
    
    These arguments apply also to the copy command which uses the same output
    functions. Consider the case where a text field contains a multi-line string
    with newlines embedded; if you export the table into an external files the
    field is split into many lines which are interpreted as separate records by
    commonly used line-oriented filters like awk or grep.
    
    I believe that the right way to handle all this stuff is the following:
    
      input:
    
               binary data            escaped data
                    |                       |
            (user conversion)         (psql input)
                    |                       |
                    +-----------------------+
                    |
              escaped query
                    |
                 (libpq)
                    |
              escaped query            escaped data
                    |                       |
            (parser unescape)      (copy-from unescape)
                    |                       |
                    +-----------------------+
                    |
               binary data
                    |
             (input function)
                    |
              internal data
    
    
      output:
    
              internal data
                    |
            (output function)
                    |
              escaped data
                    |
                    +-----------------------+
                    |                       |
                 (libpq)                (copy-to)
                    |                       |
              escaped data            escaped data
                    |
                    |
                    +-----------------------+-----------------------+
                    |                       |                       |
            (user conversion)         (psql output)           (psql unescape)
                    |                       |                       |
               binary data            escaped data             binary data
    
    
    In the above schema binary data means the external representation of data
    containing non-printing or delimiters characters like quotes or newlines.
    In this schema all the data exchanged with the backend should be escaped
    in order to guarantee unambiguity to applications. The input and output
    user conversion functions could be provided by libpq as utilities, and the
    conversion could possibly be done automatically by libpq itself if some
    global flag is set by the application.
    Psql input should accept only escaped data while the output could be escaped
    (default) or binary depending on a user supplied switch.
    Files read or written by the copy command should be always escaped with
    exactly one record for line. Pg_dump should produce escaped strings.
    All this stuff requires the use of new output functions like those provided
    in contrib/string-io.
    
    There is still the problem of distinguishing between scalars and arrays which
    is necessary for user output conversion. In my output functions I solved the
    problem by escaping the first '{' of each field if it is not an array.
    Another problem is that array input requires a double escaping, one for the
    query parser and a second one for the array parser. Also nulls (\0) are not
    handled by the input code. This should be fixed if we want true binary data.
    
    I don't know if C-escapes violate the ansi sql standard but I believe they
    makes life easier for the programmer. And if we add some global flag in
    libpq we could also do automatic conversion to be compatible with ansi sql
    and old applications. Note that arrays aren't ansi sql anyway.
    
    Anyway a runtime switch is preferable to a configure switch.
    
    -- 
    Massimo Dal Zotto
    
    +----------------------------------------------------------------------+
    |  Massimo Dal Zotto                email:  dz@cs.unitn.it             |
    |  Via Marconi, 141                 phone:  ++39-461-534251            |
    |  38057 Pergine Valsugana (TN)     www:  http://www.cs.unitn.it/~dz/  |
    |  Italy                            pgp:  finger dz@tango.cs.unitn.it  |
    +----------------------------------------------------------------------+
    
    
  6. Re: [HACKERS] backslash in psql output

    Marc Howard Zuckman <marc@fallon.classyad.com> — 1998-10-11T17:10:41Z

    On Sun, 11 Oct 1998, D'Arcy J.M. Cain wrote:
    
    > I'd like to make at least one vote to keep the status quo.  Some of
    > us have come to depend on the existing behaviour.  At least make
    > it an option that you have to turn on.
    > 
    > -- 
    > D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
    > http://www.druid.net/darcy/                |  and a sheep voting on
    > +1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.
    > 
    
    To which I add my hearty approval (I'd like both options)!
    
    Marc Zuckman
    marc@fallon.classyad.com
    
    _\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
    _     Visit The Home and Condo MarketPlace		      _
    _          http://www.ClassyAd.com			      _
    _							      _
    _  FREE basic property listings/advertisements and searches.  _
    _							      _
    _  Try our premium, yet inexpensive services for a real	      _
    _   selling or buying edge!				      _
    _\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_