Thread

  1. synchronized snapshots

    Joachim Wieland <joe@mcknight.de> — 2011-08-15T01:31:20Z

    This is a patch to implement synchronized snapshots. It is based on
    Alvaro's specifications in:
    
    http://archives.postgresql.org/pgsql-hackers/2011-02/msg02074.php
    
    In short, this is how it works:
    
    SELECT pg_export_snapshot();
     pg_export_snapshot
    --------------------
     000003A1-1
    (1 row)
    
    
    (and then in a different session)
    
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ (SNAPSHOT = '000003A1-1');
    
    
    The one thing that it does not implement is leaving the transaction in
    an aborted state if the BEGIN TRANSACTION command failed for an
    invalid snapshot identifier. I can certainly see that this would be
    useful but I am not sure if it justifies introducing this
    inconsistency. We would have a BEGIN TRANSACTION command that left the
    session in a different state depending on why it failed...
    
    Also I was unsure if we really need to do further checking beyond the
    existence of the file, why exactly is this necessary?
    
    The patch is adding an extra "stemplate" parameter to the GetSnapshot
    functions, the primary reason for this is to make it work with SSI,
    which gets a snapshot and then does stuff with it. The alternative
    would have been splitting up the SSI function so that we can smuggle
    in our own snapshot but that didn't seem to be less ugly. The way it
    works now is that the lowest function checks if a template is being
    passed from higher up and if so, it doesn't get a fresh snapshot but
    returns just a copy of the template.
    
    I am wondering if pg_export_snapshot() is still the right name, since
    the snapshot is no longer exported to the user. It is exported to a
    file but that's an implementation detail.
    
    
    Joachim
    
  2. Re: synchronized snapshots

    Simon Riggs <simon@2ndquadrant.com> — 2011-08-15T07:40:34Z

    On Mon, Aug 15, 2011 at 2:31 AM, Joachim Wieland <joe@mcknight.de> wrote:
    
    > In short, this is how it works:
    >
    > SELECT pg_export_snapshot();
    >  pg_export_snapshot
    > --------------------
    >  000003A1-1
    > (1 row)
    >
    >
    > (and then in a different session)
    >
    > BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ (SNAPSHOT = '000003A1-1');
    
    I don't see the need to change the BEGIN command, which is SQL
    Standard. We don't normally do that.
    
    If we have pg_export_snapshot() why not pg_import_snapshot() as well?
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  3. Re: synchronized snapshots

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-08-15T07:47:45Z

    On 15.08.2011 04:31, Joachim Wieland wrote:
    > The one thing that it does not implement is leaving the transaction in
    > an aborted state if the BEGIN TRANSACTION command failed for an
    > invalid snapshot identifier.
    
    So what if the snapshot is invalid, the SNAPSHOT clause silently 
    ignored? That sounds really bad.
    
    > I can certainly see that this would be
    > useful but I am not sure if it justifies introducing this
    > inconsistency. We would have a BEGIN TRANSACTION command that left the
    > session in a different state depending on why it failed...
    
    I don't understand what inconsistency you're talking about. What else 
    can cause BEGIN TRANSACTION to fail? Is there currently any failure mode 
    that doesn't leave the transaction in aborted state?
    
    > I am wondering if pg_export_snapshot() is still the right name, since
    > the snapshot is no longer exported to the user. It is exported to a
    > file but that's an implementation detail.
    
    It's still exporting the snapshot to other sessions, that name still 
    seems appropriate to me.
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com
    
    
  4. Re: synchronized snapshots

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-08-15T07:51:42Z

    On 15.08.2011 10:40, Simon Riggs wrote:
    > On Mon, Aug 15, 2011 at 2:31 AM, Joachim Wieland<joe@mcknight.de>  wrote:
    >
    >> In short, this is how it works:
    >>
    >> SELECT pg_export_snapshot();
    >>   pg_export_snapshot
    >> --------------------
    >>   000003A1-1
    >> (1 row)
    >>
    >>
    >> (and then in a different session)
    >>
    >> BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ (SNAPSHOT = '000003A1-1');
    >
    > I don't see the need to change the BEGIN command, which is SQL
    > Standard. We don't normally do that.
    >
    > If we have pg_export_snapshot() why not pg_import_snapshot() as well?
    
    It would be nice a symmetry, but you'd need a limitation that 
    pg_import_snapshot() must be the first thing you do in the session. And 
    it might be hard to enforce that, as once you get control into the 
    function, you've already acquired another snapshot in the transaction to 
    run the "SELECT pg_import_snapshot()" query with. Specifying the 
    snapshot in the BEGIN command makes sense.
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com
    
    
  5. Re: synchronized snapshots

    Andres Freund <andres@anarazel.de> — 2011-08-15T08:07:31Z

    On Monday, August 15, 2011 08:40:34 Simon Riggs wrote:
    > On Mon, Aug 15, 2011 at 2:31 AM, Joachim Wieland <joe@mcknight.de> wrote:
    > > In short, this is how it works:
    > > 
    > > SELECT pg_export_snapshot();
    > >  pg_export_snapshot
    > > --------------------
    > >  000003A1-1
    > > (1 row)
    > > 
    > > 
    > > (and then in a different session)
    > > 
    > > BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ (SNAPSHOT =
    > > '000003A1-1');
    > I don't see the need to change the BEGIN command, which is SQL
    > Standard. We don't normally do that.
    Uhm. There already are several extensions to begin transaction. Like the just 
    added "DEFERRABLE".
    
    > If we have pg_export_snapshot() why not pg_import_snapshot() as well?
    Using BEGIN has the advantage of making it explicit that it cannot be used 
    inside an existing transaction. Which I do find advantageous.
    
    Andres
    
    
  6. Re: synchronized snapshots

    Hans-Jürgen Schönig <postgres@cybertec.at> — 2011-08-15T08:51:54Z

    On Aug 15, 2011, at 9:40 AM, Simon Riggs wrote:
    
    > On Mon, Aug 15, 2011 at 2:31 AM, Joachim Wieland <joe@mcknight.de> wrote:
    > 
    >> In short, this is how it works:
    >> 
    >> SELECT pg_export_snapshot();
    >>  pg_export_snapshot
    >> --------------------
    >>  000003A1-1
    >> (1 row)
    >> 
    >> 
    >> (and then in a different session)
    >> 
    >> BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ (SNAPSHOT = '000003A1-1');
    > 
    > I don't see the need to change the BEGIN command, which is SQL
    > Standard. We don't normally do that.
    > 
    > If we have pg_export_snapshot() why not pg_import_snapshot() as well?
    > 
    > -- 
    >  Simon Riggs                   http://www.2ndQuadrant.com/
    >  PostgreSQL Development, 24x7 Support, Training & Services
    
    
    
    i would definitely argue for a syntax like the one proposed by Joachim.. i could stay the same if this is turned into some sort of flashback implementation some day.
    
    	regards,
    
    		hans
    
    --
    Cybertec Schönig & Schönig GmbH
    Gröhrmühlgasse 26
    A-2700 Wiener Neustadt, Austria
    Web: http://www.postgresql-support.de
    
    
    
  7. Re: synchronized snapshots

    Florian Weimer <fweimer@bfk.de> — 2011-08-15T10:41:57Z

    * Simon Riggs:
    
    > I don't see the need to change the BEGIN command, which is SQL
    > Standard. We don't normally do that.
    
    Some language bindings treat BEGIN specially, so it might be difficult
    to use this feature.
    
    -- 
    Florian Weimer                <fweimer@bfk.de>
    BFK edv-consulting GmbH       http://www.bfk.de/
    Kriegsstraße 100              tel: +49-721-96201-1
    D-76133 Karlsruhe             fax: +49-721-96201-99
    
    
  8. Re: synchronized snapshots

    Joachim Wieland <joe@mcknight.de> — 2011-08-15T11:23:29Z

    On Mon, Aug 15, 2011 at 3:47 AM, Heikki Linnakangas
    <heikki.linnakangas@enterprisedb.com> wrote:
    > On 15.08.2011 04:31, Joachim Wieland wrote:
    >>
    >> The one thing that it does not implement is leaving the transaction in
    >> an aborted state if the BEGIN TRANSACTION command failed for an
    >> invalid snapshot identifier.
    >
    > So what if the snapshot is invalid, the SNAPSHOT clause silently ignored?
    > That sounds really bad.
    
    No, the command would fail, but since it fails, it doesn't change the
    transaction state.
    
    What was proposed originally was to start a transaction but throw an
    error that leaves the transaction in the aborted state. But then the
    command had some effect because it started a transaction block, even
    though it failed.
    
    
    >> I can certainly see that this would be
    >> useful but I am not sure if it justifies introducing this
    >> inconsistency. We would have a BEGIN TRANSACTION command that left the
    >> session in a different state depending on why it failed...
    >
    > I don't understand what inconsistency you're talking about. What else can
    > cause BEGIN TRANSACTION to fail? Is there currently any failure mode that
    > doesn't leave the transaction in aborted state?
    
    Granted, it might only fail for parse errors so far, but that would
    include for example sending BEGIN DEFERRABLE to a pre-9.1 server. It
    wouldn't start a transaction and leave it in an aborted state, but it
    would just fail.
    
    
    >> I am wondering if pg_export_snapshot() is still the right name, since
    >> the snapshot is no longer exported to the user. It is exported to a
    >> file but that's an implementation detail.
    >
    > It's still exporting the snapshot to other sessions, that name still seems
    > appropriate to me.
    
    ok.
    
    
    Joachim
    
    
  9. Re: synchronized snapshots

    Joachim Wieland <joe@mcknight.de> — 2011-08-15T11:30:10Z

    On Mon, Aug 15, 2011 at 6:41 AM, Florian Weimer <fweimer@bfk.de> wrote:
    > * Simon Riggs:
    >
    >> I don't see the need to change the BEGIN command, which is SQL
    >> Standard. We don't normally do that.
    >
    > Some language bindings treat BEGIN specially, so it might be difficult
    > to use this feature.
    
    It's true, the command might require explicit support from language
    bindings. However I used some perl test scripts, where you can also
    send a START TRANSACTION command in an $dbh->do(...).
    
    The intended use case of this feature is still pg_dump btw...
    
    
    Joachim
    
    
  10. Re: synchronized snapshots

    Robert Haas <robertmhaas@gmail.com> — 2011-08-15T13:37:54Z

    On Mon, Aug 15, 2011 at 3:51 AM, Heikki Linnakangas
    <heikki.linnakangas@enterprisedb.com> wrote:
    > It would be nice a symmetry, but you'd need a limitation that
    > pg_import_snapshot() must be the first thing you do in the session. And it
    > might be hard to enforce that, as once you get control into the function,
    > you've already acquired another snapshot in the transaction to run the
    > "SELECT pg_import_snapshot()" query with. Specifying the snapshot in the
    > BEGIN command makes sense.
    
    +1.  Also, I am pretty sure that there are drivers out there, and
    connection poolers, that keep track of the transaction state by
    watching commands go by.  Right now you can tell by the first word of
    the command whether it's something that might change the transaction
    state; I wouldn't like to make that harder.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  11. Re: synchronized snapshots

    Kevin Grittner <kevin.grittner@wicourts.gov> — 2011-08-15T14:56:00Z

    Simon Riggs <simon@2ndQuadrant.com> wrote:
    > Joachim Wieland <joe@mcknight.de> wrote:
     
    >> BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ (SNAPSHOT =
    >> '000003A1-1');
    > 
    > I don't see the need to change the BEGIN command, which is SQL
    > Standard.
     
    No, it's not standard.
     
    To quote from our docs at:
     
    http://www.postgresql.org/docs/9.0/interactive/sql-begin.html#AEN58214
     
    | BEGIN is a PostgreSQL language extension. It is equivalent to the
    | SQL-standard command START TRANSACTION, whose reference page
    | contains additional compatibility information.
    | 
    | Incidentally, the BEGIN key word is used for a different purpose
    | in embedded SQL. You are advised to be careful about the
    | transaction semantics when porting database applications. 
     
    In checking the most recent standards draft I have available, it
    appears that besides embedded SQL, this keyword is also used in the
    standard trigger declaration syntax.  Using BEGIN to start a
    transaction is a PostgreSQL extension to the standard.  That said,
    if we support a feature on the nonstandard BEGIN statement, we
    typically add it as an extension to the standard START TRANSACTION
    and SET TRANSACTION statements.  Through 9.0 that consisted of
    having a non-standard default for isolation level and the ability to
    omit commas required by the standard.  In 9.1 we added another
    optional transaction property which defaults to standard behavior:
    DEFERRABLE.
     
    If we're talking about a property of a transaction, like the
    transaction snapshot, it seems to me to be best to support it using
    the same statements we use for other transaction properties.
     
    -Kevin
    
    
  12. Re: synchronized snapshots

    Jim Nasby <jim@nasby.net> — 2011-08-15T22:09:35Z

    On Aug 15, 2011, at 6:23 AM, Joachim Wieland wrote:
    > On Mon, Aug 15, 2011 at 3:47 AM, Heikki Linnakangas
    > <heikki.linnakangas@enterprisedb.com> wrote:
    >> On 15.08.2011 04:31, Joachim Wieland wrote:
    >>> 
    >>> The one thing that it does not implement is leaving the transaction in
    >>> an aborted state if the BEGIN TRANSACTION command failed for an
    >>> invalid snapshot identifier.
    >> 
    >> So what if the snapshot is invalid, the SNAPSHOT clause silently ignored?
    >> That sounds really bad.
    > 
    > No, the command would fail, but since it fails, it doesn't change the
    > transaction state.
    > 
    > What was proposed originally was to start a transaction but throw an
    > error that leaves the transaction in the aborted state. But then the
    > command had some effect because it started a transaction block, even
    > though it failed.
    
    It certainly seems safer to me to set the transaction to an aborted state; you were expecting a set of commands to run with one snapshot, but if we don't abort the transaction they'll end up running anyway and doing so with the *wrong* snapshot. That could certainly lead to data corruption.
    
    I suspect that all the other cases of BEGIN failing would be syntax errors, so you would immediately know in testing that something was wrong. A missing file is definitely not a syntax error, so we can't really depend on user testing to ensure this is handled correctly. IMO, that makes it critical that that error puts us in an aborted transaction.
    --
    Jim C. Nasby, Database Architect                   jim@nasby.net
    512.569.9461 (cell)                         http://jim.nasby.net
    
    
    
    
  13. Re: synchronized snapshots

    Joachim Wieland <joe@mcknight.de> — 2011-08-15T22:46:14Z

    On Mon, Aug 15, 2011 at 6:09 PM, Jim Nasby <jim@nasby.net> wrote:
    > I suspect that all the other cases of BEGIN failing would be syntax errors, so
    > you would immediately know in testing that something was wrong. A missing file
    > is definitely not a syntax error, so we can't really depend on user testing to ensure
    > this is handled correctly. IMO, that makes it critical that that error puts us in an
    > aborted transaction.
    
    Why can we not just require the user to verify if his BEGIN query
    failed or succeeded?
    Is that really too much to ask for?
    
    Also see what Robert wrote about proxies in between that keep track of
    the transaction
    state. Consider they see a BEGIN query that fails. How would they know
    if the session
    is now in an aborted transaction or not in a transaction at all?
    
    
    Joachim
    
    
  14. Re: synchronized snapshots

    Robert Haas <robertmhaas@gmail.com> — 2011-08-16T13:59:04Z

    On Mon, Aug 15, 2011 at 6:46 PM, Joachim Wieland <joe@mcknight.de> wrote:
    > On Mon, Aug 15, 2011 at 6:09 PM, Jim Nasby <jim@nasby.net> wrote:
    >> I suspect that all the other cases of BEGIN failing would be syntax errors, so
    >> you would immediately know in testing that something was wrong. A missing file
    >> is definitely not a syntax error, so we can't really depend on user testing to ensure
    >> this is handled correctly. IMO, that makes it critical that that error puts us in an
    >> aborted transaction.
    >
    > Why can we not just require the user to verify if his BEGIN query
    > failed or succeeded?
    > Is that really too much to ask for?
    >
    > Also see what Robert wrote about proxies in between that keep track of
    > the transaction
    > state. Consider they see a BEGIN query that fails. How would they know
    > if the session
    > is now in an aborted transaction or not in a transaction at all?
    
    I think the point here is that we should be consistent.  Currently,
    you can make BEGIN fail by doing it on the standby, and asking for
    READ WRITE mode:
    
    rhaas=# begin transaction read write;
    ERROR:  cannot set transaction read-write mode during recovery
    
    After doing that, you are NOT in a transaction context:
    
    rhaas=# select 1;
     ?column?
    ----------
            1
    (1 row)
    
    So whatever this does should be consistent with that, at least IMHO.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  15. Re: synchronized snapshots

    Alvaro Herrera <alvherre@commandprompt.com> — 2011-08-16T14:43:20Z

    Excerpts from Robert Haas's message of mar ago 16 09:59:04 -0400 2011:
    > On Mon, Aug 15, 2011 at 6:46 PM, Joachim Wieland <joe@mcknight.de> wrote:
    
    > > Also see what Robert wrote about proxies in between that keep track
    > > of the transaction state. Consider they see a BEGIN query that
    > > fails. How would they know if the session is now in an aborted
    > > transaction or not in a transaction at all?
    > 
    > I think the point here is that we should be consistent.  Currently,
    > you can make BEGIN fail by doing it on the standby, and asking for
    > READ WRITE mode:
    > 
    > rhaas=# begin transaction read write;
    > ERROR:  cannot set transaction read-write mode during recovery
    > 
    > After doing that, you are NOT in a transaction context:
    > 
    > rhaas=# select 1;
    >  ?column?
    > ----------
    >         1
    > (1 row)
    > 
    > So whatever this does should be consistent with that, at least IMHO.
    
    I think we argued about a very similar problem years ago and the outcome
    was that you should be left in an aborted transaction block; otherwise
    running a dumb SQL script (which has no way to "abort if it fails")
    could wreak serious havoc (?).  I think this failure to behave in that
    fashion on the standby is something to be fixed, not imitated.
    
    What this says is that a driver or app seeing BEGIN fail should issue
    ROLLBACK before going further -- which seems the intuitive way to behave
    to me.  No?
    
    -- 
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
    
    
  16. Re: synchronized snapshots

    Robert Haas <robertmhaas@gmail.com> — 2011-08-16T14:46:55Z

    On Tue, Aug 16, 2011 at 10:43 AM, Alvaro Herrera
    <alvherre@commandprompt.com> wrote:
    > Excerpts from Robert Haas's message of mar ago 16 09:59:04 -0400 2011:
    >> On Mon, Aug 15, 2011 at 6:46 PM, Joachim Wieland <joe@mcknight.de> wrote:
    >
    >> > Also see what Robert wrote about proxies in between that keep track
    >> > of the transaction state. Consider they see a BEGIN query that
    >> > fails. How would they know if the session is now in an aborted
    >> > transaction or not in a transaction at all?
    >>
    >> I think the point here is that we should be consistent.  Currently,
    >> you can make BEGIN fail by doing it on the standby, and asking for
    >> READ WRITE mode:
    >>
    >> rhaas=# begin transaction read write;
    >> ERROR:  cannot set transaction read-write mode during recovery
    >>
    >> After doing that, you are NOT in a transaction context:
    >>
    >> rhaas=# select 1;
    >>  ?column?
    >> ----------
    >>         1
    >> (1 row)
    >>
    >> So whatever this does should be consistent with that, at least IMHO.
    >
    > I think we argued about a very similar problem years ago and the outcome
    > was that you should be left in an aborted transaction block; otherwise
    > running a dumb SQL script (which has no way to "abort if it fails")
    > could wreak serious havoc (?).  I think this failure to behave in that
    > fashion on the standby is something to be fixed, not imitated.
    >
    > What this says is that a driver or app seeing BEGIN fail should issue
    > ROLLBACK before going further -- which seems the intuitive way to behave
    > to me.  No?
    
    Maybe.  But if we're going to change the behavior of BEGIN, then (1)
    we need to think about backward compatibility and (2) we should change
    it across the board.  It's not for this patch to go invent something
    that's inconsistent with what we're already doing elsewhere.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  17. Re: synchronized snapshots

    Jim Nasby <jim@nasby.net> — 2011-08-16T16:01:39Z

    On Aug 15, 2011, at 5:46 PM, Joachim Wieland wrote:
    > On Mon, Aug 15, 2011 at 6:09 PM, Jim Nasby <jim@nasby.net> wrote:
    >> I suspect that all the other cases of BEGIN failing would be syntax errors, so
    >> you would immediately know in testing that something was wrong. A missing file
    >> is definitely not a syntax error, so we can't really depend on user testing to ensure
    >> this is handled correctly. IMO, that makes it critical that that error puts us in an
    >> aborted transaction.
    > 
    > Why can we not just require the user to verify if his BEGIN query
    > failed or succeeded?
    > Is that really too much to ask for?
    
    It's something else that you have to remember to get right. psql, for example, will blindly continue on unless you remembered to tell it to exit on an error.
    
    Also, an invalid transaction seems to be the result of least surprise... if you cared enough to begin a transaction, you're going to expect that either everything between that and the COMMIT succeeds or fails, not something in-between.
    
    > Also see what Robert wrote about proxies in between that keep track of
    > the transaction
    > state. Consider they see a BEGIN query that fails. How would they know
    > if the session
    > is now in an aborted transaction or not in a transaction at all?
    
    AFAIK a proxy can tell if a transaction is in progress or not via libpq. Worst-case, it just needs to send an extra ROLLBACK.
    --
    Jim C. Nasby, Database Architect                   jim@nasby.net
    512.569.9461 (cell)                         http://jim.nasby.net
    
    
    
    
  18. Re: synchronized snapshots

    Jeff Davis <pgsql@j-davis.com> — 2011-08-16T22:40:06Z

    On Tue, 2011-08-16 at 11:01 -0500, Jim Nasby wrote:
    > Also, an invalid transaction seems to be the result of least
    > surprise... if you cared enough to begin a transaction, you're going
    > to expect that either everything between that and the COMMIT succeeds
    > or fails, not something in-between.
    
    Agreed.
    
    Perhaps we need a new utility command to set the snapshot to make the
    error handling a little more obvious?
    
    Regards,
    	Jeff Davis
    
    
    
  19. Re: synchronized snapshots

    Jim Nasby <jim@nasby.net> — 2011-08-16T22:53:43Z

    On Aug 16, 2011, at 5:40 PM, Jeff Davis wrote:
    > On Tue, 2011-08-16 at 11:01 -0500, Jim Nasby wrote:
    >> Also, an invalid transaction seems to be the result of least
    >> surprise... if you cared enough to begin a transaction, you're going
    >> to expect that either everything between that and the COMMIT succeeds
    >> or fails, not something in-between.
    > 
    > Agreed.
    > 
    > Perhaps we need a new utility command to set the snapshot to make the
    > error handling a little more obvious?
    
    Well, it appears we have a larger problem, as Robert pointed out that trying to start a writable transaction on a hot standby leaves you not in a transaction (which I feel is a problem).
    
    So IMHO the right thing to do here is make it so that runtime errors in BEGIN leave you in an invalid transaction. Then we can decide on the API for synchronized snapshots that makes sense instead of working around the behavior of BEGIN.
    
    I guess the big question to answer now is: what's the backwards compatibility impact of changing how BEGIN deals with runtime errors?
    --
    Jim C. Nasby, Database Architect                   jim@nasby.net
    512.569.9461 (cell)                         http://jim.nasby.net
    
    
    
    
  20. Re: synchronized snapshots

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-08-17T00:35:50Z

    Jim Nasby <jim@nasby.net> writes:
    > Well, it appears we have a larger problem, as Robert pointed out that trying to start a writable transaction on a hot standby leaves you not in a transaction (which I feel is a problem).
    
    > So IMHO the right thing to do here is make it so that runtime errors in BEGIN leave you in an invalid transaction. Then we can decide on the API for synchronized snapshots that makes sense instead of working around the behavior of BEGIN.
    
    I'm not convinced by the above argument, because it requires that
    you pretend there's a significant difference between syntax errors and
    "run time" errors (whatever those are).  Syntax errors in a BEGIN
    command are not going to leave you in an aborted transaction, because
    the backend is not going to recognize the command as a BEGIN at all.
    This means that frontends *must* be capable of dealing with the case
    that a failed BEGIN didn't start a transaction.  (Either that, or
    they just assume their commands are always syntactically perfect,
    which seems like pretty fragile programming to me; and the more weird
    nonstandard options we load onto BEGIN, the less tenable the position
    becomes.  For example, if you feed BEGIN option-foo to a server that's
    a bit older than you thought it was, you will get a syntax error.)
    If we have some failure cases that start a transaction and some that do
    not, we just have a mess, IMO.
    
    I think we'd be far better off to maintain the position that a failed
    BEGIN does not start a transaction, under any circumstances.  To do
    that, we cannot have this new option attached to the BEGIN, which is a
    good thing anyway IMO from a standards compatibility point of view.
    It'd be better to make it a separate utility statement.  There is no
    logical problem in doing that, and we already have a precedent for
    utility statements that do something special before the transaction
    snapshot is taken: see LOCK.
    
    In fact, now that I think about it, setting the transaction snapshot
    from a utility statement would be functionally useful because then you
    could take locks beforehand.
    
    And as a bonus, we don't have a backwards compatibility problem to solve.
    
    			regards, tom lane
    
    
  21. Re: synchronized snapshots

    Robert Haas <robertmhaas@gmail.com> — 2011-08-17T00:42:31Z

    On Tue, Aug 16, 2011 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Jim Nasby <jim@nasby.net> writes:
    >> Well, it appears we have a larger problem, as Robert pointed out that trying to start a writable transaction on a hot standby leaves you not in a transaction (which I feel is a problem).
    >
    >> So IMHO the right thing to do here is make it so that runtime errors in BEGIN leave you in an invalid transaction. Then we can decide on the API for synchronized snapshots that makes sense instead of working around the behavior of BEGIN.
    >
    > I'm not convinced by the above argument, because it requires that
    > you pretend there's a significant difference between syntax errors and
    > "run time" errors (whatever those are).  Syntax errors in a BEGIN
    > command are not going to leave you in an aborted transaction, because
    > the backend is not going to recognize the command as a BEGIN at all.
    > This means that frontends *must* be capable of dealing with the case
    > that a failed BEGIN didn't start a transaction.  (Either that, or
    > they just assume their commands are always syntactically perfect,
    > which seems like pretty fragile programming to me; and the more weird
    > nonstandard options we load onto BEGIN, the less tenable the position
    > becomes.  For example, if you feed BEGIN option-foo to a server that's
    > a bit older than you thought it was, you will get a syntax error.)
    > If we have some failure cases that start a transaction and some that do
    > not, we just have a mess, IMO.
    
    More or less agreed.
    
    > I think we'd be far better off to maintain the position that a failed
    > BEGIN does not start a transaction, under any circumstances.
    
    Also agreed.
    
    > To do
    > that, we cannot have this new option attached to the BEGIN, ...
    
    Eh, why not?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  22. Re: synchronized snapshots

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-08-17T00:53:34Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > On Tue, Aug 16, 2011 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> I think we'd be far better off to maintain the position that a failed
    >> BEGIN does not start a transaction, under any circumstances.
    
    > Also agreed.
    
    >> To do
    >> that, we cannot have this new option attached to the BEGIN, ...
    
    > Eh, why not?
    
    Maybe I wasn't paying close enough attention to the thread, but I had
    the idea that there was some implementation reason why not.  If not,
    we could still load the option onto BEGIN ... but I still find myself
    liking the idea of a separate command better, because of the locking
    issue.
    
    			regards, tom lane
    
    
  23. Re: synchronized snapshots

    Robert Haas <robertmhaas@gmail.com> — 2011-08-17T01:08:48Z

    On Tue, Aug 16, 2011 at 8:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    >> On Tue, Aug 16, 2011 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >>> I think we'd be far better off to maintain the position that a failed
    >>> BEGIN does not start a transaction, under any circumstances.
    >
    >> Also agreed.
    >
    >>> To do
    >>> that, we cannot have this new option attached to the BEGIN, ...
    >
    >> Eh, why not?
    >
    > Maybe I wasn't paying close enough attention to the thread, but I had
    > the idea that there was some implementation reason why not.  If not,
    > we could still load the option onto BEGIN ... but I still find myself
    > liking the idea of a separate command better, because of the locking
    > issue.
    
    Why does it matter whether you take the locks before or after the snapshot?
    
    If you're concerned with minimizing the race, what you should do is
    take all relevant locks in the parent before exporting the snapshot.
    
    I am not wild about adding another toplevel command for this.  It
    seems a rather narrow use case, and attaching it to BEGIN feels
    natural to me.  There may be some small benefit also in terms of
    minimizing the amount of sanity checking that must be done - for
    example, at BEGIN time, you don't have to check for the case where a
    snapshot has already been set.
    
    If we did add another toplevel command, what would we call it?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  24. Re: synchronized snapshots

    Jeff Davis <pgsql@j-davis.com> — 2011-08-17T01:44:56Z

    On Tue, 2011-08-16 at 20:35 -0400, Tom Lane wrote:
    > I'm not convinced by the above argument, because it requires that
    > you pretend there's a significant difference between syntax errors and
    > "run time" errors (whatever those are).
    
    After a syntax error like "COMMMIT" the transaction will remain inside
    the failed transaction block, but an error during COMMIT (e.g. deferred
    constraint check failure) will exit the transaction block.
    
    > I think we'd be far better off to maintain the position that a failed
    > BEGIN does not start a transaction, under any circumstances.  To do
    > that, we cannot have this new option attached to the BEGIN, which is a
    > good thing anyway IMO from a standards compatibility point of view.
    > It'd be better to make it a separate utility statement.
    
    +1 for a utility statement. Much clearer from the user's standpoint what
    kind of errors they might expect, and whether the session will remain in
    a transaction block.
    
    Regards,
    	Jeff Davis
    
    
    
  25. Re: synchronized snapshots

    Jeff Davis <pgsql@j-davis.com> — 2011-08-17T01:54:44Z

    On Tue, 2011-08-16 at 21:08 -0400, Robert Haas wrote: 
    > attaching it to BEGIN feels natural to me.
    
    My only objection is that people have different expectations about
    whether the session will remain in a transaction block when they
    encounter an error. So, it's hard to make this work without surprising
    about half the users.
    
    And there are some fairly significant consequences to users who guess
    that they will remain in a transaction block in case of an error; or who
    are just careless and don't consider that an error may occur.
    
    > If we did add another toplevel command, what would we call it?
    
    "SET TRANSACTION SNAPSHOT" perhaps?
    
    Regards,
    	Jeff Davis
    
    
    
  26. Re: synchronized snapshots

    Robert Haas <robertmhaas@gmail.com> — 2011-08-17T02:07:56Z

    On Tue, Aug 16, 2011 at 9:54 PM, Jeff Davis <pgsql@j-davis.com> wrote:
    >> If we did add another toplevel command, what would we call it?
    >
    > "SET TRANSACTION SNAPSHOT" perhaps?
    
    Hmm, that's not bad, but I think we'd have to partially reserve
    TRANSACTION to make it work, which doesn't seem worth the pain it
    would cause.
    
    We could do something like TRANSACTION SNAPSHOT IS 'xyz', but that's a
    bit awkard.
    
    I still like BEGIN SNAPSHOT 'xyz' -- or even adding a generic options
    list like we use for some other commands, i.e. BEGIN (snapshot 'xyz'),
    which would leave the door open to arbitrary amounts of future
    nonstandard fiddling without the need for any more keywords.  I
    understand the point about the results of a BEGIN failure leaving you
    outside a transaction, but that really only matters if you're doing
    "psql < dumb_script.sql", which is impractical for this feature
    anyway.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  27. Re: synchronized snapshots

    Peter Eisentraut <peter_e@gmx.net> — 2011-08-17T06:59:41Z

    On tis, 2011-08-16 at 20:35 -0400, Tom Lane wrote:
    > In fact, now that I think about it, setting the transaction snapshot
    > from a utility statement would be functionally useful because then you
    > could take locks beforehand.
    
    Another issue is that in some client interfaces, BEGIN and COMMIT are
    hidden behind API calls, which cannot easily be changed or equipped with
    new parameters.  So in order to have this functionality available
    through those interfaces, we'd need a separately callable command.
    
    
    
  28. Re: synchronized snapshots

    Bruce Momjian <bruce@momjian.us> — 2011-08-20T13:56:58Z

    Peter Eisentraut wrote:
    > On tis, 2011-08-16 at 20:35 -0400, Tom Lane wrote:
    > > In fact, now that I think about it, setting the transaction snapshot
    > > from a utility statement would be functionally useful because then you
    > > could take locks beforehand.
    > 
    > Another issue is that in some client interfaces, BEGIN and COMMIT are
    > hidden behind API calls, which cannot easily be changed or equipped with
    > new parameters.  So in order to have this functionality available
    > through those interfaces, we'd need a separately callable command.
    
    How do they set a transaction to SERIALIZABLE?  Seem the same syntax
    should be used here.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + It's impossible for everything to be true. +
    
    
  29. Re: synchronized snapshots

    Peter Eisentraut <peter_e@gmx.net> — 2011-08-24T18:38:42Z

    On lör, 2011-08-20 at 09:56 -0400, Bruce Momjian wrote:
    > Peter Eisentraut wrote:
    > > On tis, 2011-08-16 at 20:35 -0400, Tom Lane wrote:
    > > > In fact, now that I think about it, setting the transaction snapshot
    > > > from a utility statement would be functionally useful because then you
    > > > could take locks beforehand.
    > > 
    > > Another issue is that in some client interfaces, BEGIN and COMMIT are
    > > hidden behind API calls, which cannot easily be changed or equipped with
    > > new parameters.  So in order to have this functionality available
    > > through those interfaces, we'd need a separately callable command.
    > 
    > How do they set a transaction to SERIALIZABLE?  Seem the same syntax
    > should be used here.
    
    The API typically has parameters to set the isolation level, since
    that's a standardized property.
    
    
    
    
  30. Re: synchronized snapshots

    Joachim Wieland <joe@mcknight.de> — 2011-09-14T02:37:56Z

    On Sun, Aug 14, 2011 at 9:31 PM, Joachim Wieland <joe@mcknight.de> wrote:
    > This is a patch to implement synchronized snapshots. It is based on
    > Alvaro's specifications in:
    >
    > http://archives.postgresql.org/pgsql-hackers/2011-02/msg02074.php
    
    Here is a new version of this patch, what has changed is that the
    snapshot is now imported via:
    
    BEGIN
    [... set serializable or read committed on the BEGIN or via SET TRANSACTION ...]
    SET TRANSACTION SNAPSHOT '00000801-1'
    
    This way any failure importing the snapshot leaves the transaction in
    the aborted state.
    
    I am also attaching a small perl script that demonstrates a
    serialization failure with an imported snapshot.
    
    This is the link to the previous patch:
    http://archives.postgresql.org/pgsql-hackers/2011-08/msg00684.php
    
    
    Joachim
    
  31. Re: synchronized snapshots

    Marko Tiikkaja <marko.tiikkaja@2ndquadrant.com> — 2011-09-28T06:29:08Z

    Hi Joachim,
    
    On 14/09/2011 05:37, Joachim Wieland wrote:
    > Here is a new version of this patch
    
    In a sequence such as this:
    
       BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
       INSERT INTO foo VALUES (-1);
       SELECT pg_export_snapshot();
    
    the row added to "foo" is not visible in the exported snapshot.  If 
    that's the desired behaviour, I think it should be mentioned in the 
    documentation.
    
    I can make a patched backend die with an assertion failure by trying to 
    export a snapshot after rolling back a transaction which exported a 
    snapshot.  Seems like no cleanup is done at transaction abort.
    
    I think that trying to import a snapshot that doesn't exist deserves a 
    better error message.  There's currently no way for the user to know 
    that the snapshot didn't exist, other than looking at the SQLSTATE 
    (22023), and even that doesn't tell me a whole lot without looking at 
    the manual.
    
    Finally, the comment in ImportSnapshot() still mentions the old syntax.
    
    Other than these four problems, the patch looks good.
    
    
    -- 
    Marko Tiikkaja                http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
    
    
  32. Re: synchronized snapshots

    Joachim Wieland <joe@mcknight.de> — 2011-09-28T12:25:09Z

    Hi Marko,
    
    On Wed, Sep 28, 2011 at 2:29 AM, Marko Tiikkaja
    <marko.tiikkaja@2ndquadrant.com> wrote:
    > In a sequence such as this:
    >
    >  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    >  INSERT INTO foo VALUES (-1);
    >  SELECT pg_export_snapshot();
    >
    > the row added to "foo" is not visible in the exported snapshot.  If that's
    > the desired behaviour, I think it should be mentioned in the documentation.
    
    Yes, that's the desired behaviour, the patch add this paragraph to the
    documentation already:
    
    "Also note that even after the synchronization both clients still run
    their own independent transactions. As a consequence, even though
    synchronized with respect to reading pre-existing data, both
    transactions won't be able to see each other's uncommitted data."
    
    I'll take a look at the other issues and update the patch either
    tonight or tomorrow.
    
    
    Thank you,
    Joachim
    
    
  33. Re: synchronized snapshots

    Joachim Wieland <joe@mcknight.de> — 2011-09-29T02:16:35Z

    On Wed, Sep 28, 2011 at 2:29 AM, Marko Tiikkaja
    <marko.tiikkaja@2ndquadrant.com> wrote:
    > Other than these four problems, the patch looks good.
    
    The attached patch addresses the reported issues.
    
    
    Thanks for the review,
    Joachim
    
  34. Re: synchronized snapshots

    Marko Tiikkaja <marko.tiikkaja@2ndquadrant.com> — 2011-10-03T06:09:12Z

    On 2011-09-28 15:25, Joachim Wieland wrote:
    > Yes, that's the desired behaviour, the patch add this paragraph to the
    > documentation already:
    
    I can't believe I missed that.  My apologies.
    
    On 2011-09-29 05:16, Joachim Wieland wrote:
    > The attached patch addresses the reported issues.
    
    Thanks, this one looks good to me.  Going to mark this patch as ready 
    for committer.
    
    
    -- 
    Marko Tiikkaja                http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training & Services
    
    
  35. Re: synchronized snapshots

    Simon Riggs <simon@2ndquadrant.com> — 2011-10-18T15:18:14Z

    On Mon, Oct 3, 2011 at 7:09 AM, Marko Tiikkaja
    <marko.tiikkaja@2ndquadrant.com> wrote:
    > On 2011-09-28 15:25, Joachim Wieland wrote:
    >>
    >> Yes, that's the desired behaviour, the patch add this paragraph to the
    >> documentation already:
    >
    > I can't believe I missed that.  My apologies.
    >
    > On 2011-09-29 05:16, Joachim Wieland wrote:
    >>
    >> The attached patch addresses the reported issues.
    >
    > Thanks, this one looks good to me.  Going to mark this patch as ready for
    > committer.
    
    
    I don't see any tests with this patch, so I personally won't be the
    committer on this just yet.
    
    Also, not sure why the snapshot id syntax has leading zeroes on first
    part of the number, but not on second part. It will still sort
    incorrectly if that's what we were trying to achieve. Either leave off
    the leading zeroes on first part of add them to second.
    
    We probably need some more discussion added to the README about this.
    
    I'm also concerned that we are adding this to the BEGIN statement as
    the only option. I don't have a problem with it being there, but I do
    think it is a problem to make it the *only* way to use this. Altering
    BEGIN gives clear problems with any API that does the begin and commit
    for you, such as perl DBI, java JDBC to name just two. I can't really
    see its a good implementation if we say this won't work until client
    APIs follow our new non-standard syntax. I wouldn't block commit on
    this point, but I think we should work on alternative ways to invoke
    this feature as well.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  36. Re: synchronized snapshots

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-18T17:22:51Z

    Simon Riggs <simon@2ndQuadrant.com> writes:
    > On Mon, Oct 3, 2011 at 7:09 AM, Marko Tiikkaja
    >> Thanks, this one looks good to me. Going to mark this patch as ready for
    >> committer.
    
    > I don't see any tests with this patch, so I personally won't be the
    > committer on this just yet.
    
    I've already taken it according to the commitfest app.  There's a lot of
    things I don't like stylistically, but they all seem fixable, and I'm
    working through them now.  The only actual bug I've found so far is a
    race condition while setting MyProc->xmin (you can't do that separately
    from verifying that the source transaction is still running, else
    somebody else could see a global xmin that's gone backwards).
    
    > Also, not sure why the snapshot id syntax has leading zeroes on first
    > part of the number, but not on second part. It will still sort
    > incorrectly if that's what we were trying to achieve. Either leave off
    > the leading zeroes on first part of add them to second.
    
    The first part is of fixed length, the second not so much.  I'm not
    wedded to the syntax but I don't see anything wrong with it either.
    
    > I'm also concerned that we are adding this to the BEGIN statement as
    > the only option.
    
    Huh?  The last version of the patch has it only as SET TRANSACTION
    SNAPSHOT, which I think is the right way.
    
    			regards, tom lane
    
    
  37. Re: synchronized snapshots

    Simon Riggs <simon@2ndquadrant.com> — 2011-10-18T17:32:37Z

    On Tue, Oct 18, 2011 at 6:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    
    >> I'm also concerned that we are adding this to the BEGIN statement as
    >> the only option.
    >
    > Huh?  The last version of the patch has it only as SET TRANSACTION
    > SNAPSHOT, which I think is the right way.
    
    Sorry Tom, didn't see your name on it earlier, thats not shown on the
    main CF display and I didn't think to check on the detail. Please
    continue.
    
    I misread the SET TRANSACTION docs changes. Happy with that.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  38. Re: synchronized snapshots

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-19T16:17:28Z

    Joachim Wieland <joe@mcknight.de> writes:
    > [ synchronized-snapshots patch ]
    
    Looking through this code, it strikes me that SET TRANSACTION SNAPSHOT
    is fundamentally incompatible with SERIALIZABLE READ ONLY DEFERRABLE
    mode.  That mode assumes that you should be able to just take a new
    snapshot, repeatedly, until you get one that's "safe".  With the patch
    as written, if the supplied snapshot is "unsafe", GetSafeSnapshot()
    will just go into an infinite loop.
    
    AFAICS we should just throw an error if SET TRANSACTION SNAPSHOT is done
    in a transaction with those properties.  Has anyone got another
    interpretation?  Would it be better to silently ignore the DEFERRABLE
    property?
    
    			regards, tom lane
    
    
  39. Re: synchronized snapshots

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-19T16:20:17Z

    On 19.10.2011 19:17, Tom Lane wrote:
    > Joachim Wieland<joe@mcknight.de>  writes:
    >> [ synchronized-snapshots patch ]
    >
    > Looking through this code, it strikes me that SET TRANSACTION SNAPSHOT
    > is fundamentally incompatible with SERIALIZABLE READ ONLY DEFERRABLE
    > mode.  That mode assumes that you should be able to just take a new
    > snapshot, repeatedly, until you get one that's "safe".  With the patch
    > as written, if the supplied snapshot is "unsafe", GetSafeSnapshot()
    > will just go into an infinite loop.
    >
    > AFAICS we should just throw an error if SET TRANSACTION SNAPSHOT is done
    > in a transaction with those properties.  Has anyone got another
    > interpretation?  Would it be better to silently ignore the DEFERRABLE
    > property?
    
    An error seems appropriate to me.
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com
    
    
  40. Re: synchronized snapshots

    Florian G. Pflug <fgp@phlo.org> — 2011-10-19T16:44:08Z

    On Oct19, 2011, at 18:17 , Tom Lane wrote:
    > Joachim Wieland <joe@mcknight.de> writes:
    >> [ synchronized-snapshots patch ]
    > 
    > Looking through this code, it strikes me that SET TRANSACTION SNAPSHOT
    > is fundamentally incompatible with SERIALIZABLE READ ONLY DEFERRABLE
    > mode.  That mode assumes that you should be able to just take a new
    > snapshot, repeatedly, until you get one that's "safe".  With the patch
    > as written, if the supplied snapshot is "unsafe", GetSafeSnapshot()
    > will just go into an infinite loop.
    > 
    > AFAICS we should just throw an error if SET TRANSACTION SNAPSHOT is done
    > in a transaction with those properties.  Has anyone got another
    > interpretation?  Would it be better to silently ignore the DEFERRABLE
    > property?
    
    Hm, both features are meant to be used by pg_dump, so think we should
    make the combination work. It'd say SET TRANSACTION SNAPSHOT should throw
    an error only if the transaction is marked READ ONLY DEFERRABLE *and*
    the provided snapshot isn't "safe".
    
    This allows a deferrable snapshot to be used on a second connection (
    by e.g. pg_dump), and still be marked as DEFERRABLE. If we throw an
    error unconditionally, the second connection has to import the snapshot
    without marking it DEFERRABLE, which I think has consequences for
    performance. AFAIR, the SERIALIZABLE implementation is able to skip
    almost all (or all? Kevin?) SIREAD lock acquisitions in DEFERRABLE READ
    ONLY transaction, because those cannot participate in dangerous (i.e.
    non-serializable) dependency structures.
    
    best regards,
    Florian Pflug
    
    
    
    
  41. Re: synchronized snapshots

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-19T17:02:34Z

    Florian Pflug <fgp@phlo.org> writes:
    > On Oct19, 2011, at 18:17 , Tom Lane wrote:
    >> AFAICS we should just throw an error if SET TRANSACTION SNAPSHOT is done
    >> in a transaction with those properties.  Has anyone got another
    >> interpretation?  Would it be better to silently ignore the DEFERRABLE
    >> property?
    
    > Hm, both features are meant to be used by pg_dump, so think we should
    > make the combination work. It'd say SET TRANSACTION SNAPSHOT should throw
    > an error only if the transaction is marked READ ONLY DEFERRABLE *and*
    > the provided snapshot isn't "safe".
    
    Um, no, I don't think so.  It would be sensible for the "leader"
    transaction to use READ ONLY DEFERRABLE and then export the snapshot it
    got (possibly after waiting).  It doesn't follow that the child
    transactions should use DEFERRABLE too.  They're not going to wait.
    
    > This allows a deferrable snapshot to be used on a second connection (
    > by e.g. pg_dump), and still be marked as DEFERRABLE. If we throw an
    > error unconditionally, the second connection has to import the snapshot
    > without marking it DEFERRABLE, which I think has consequences for
    > performance.
    
    No, I don't believe that either.  AIUI the performance benefit comes if
    the snapshot is recognized as safe.  DEFERRABLE only means to keep
    retrying until you get a safe one.  This is nonsense when you're
    importing the snapshot.
    
    			regards, tom lane
    
    
  42. Re: synchronized snapshots

    Robert Haas <robertmhaas@gmail.com> — 2011-10-19T17:10:02Z

    On Wed, Oct 19, 2011 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Florian Pflug <fgp@phlo.org> writes:
    >> On Oct19, 2011, at 18:17 , Tom Lane wrote:
    >>> AFAICS we should just throw an error if SET TRANSACTION SNAPSHOT is done
    >>> in a transaction with those properties.  Has anyone got another
    >>> interpretation?  Would it be better to silently ignore the DEFERRABLE
    >>> property?
    >
    >> Hm, both features are meant to be used by pg_dump, so think we should
    >> make the combination work. It'd say SET TRANSACTION SNAPSHOT should throw
    >> an error only if the transaction is marked READ ONLY DEFERRABLE *and*
    >> the provided snapshot isn't "safe".
    >
    > Um, no, I don't think so.  It would be sensible for the "leader"
    > transaction to use READ ONLY DEFERRABLE and then export the snapshot it
    > got (possibly after waiting).  It doesn't follow that the child
    > transactions should use DEFERRABLE too.  They're not going to wait.
    >
    >> This allows a deferrable snapshot to be used on a second connection (
    >> by e.g. pg_dump), and still be marked as DEFERRABLE. If we throw an
    >> error unconditionally, the second connection has to import the snapshot
    >> without marking it DEFERRABLE, which I think has consequences for
    >> performance.
    >
    > No, I don't believe that either.  AIUI the performance benefit comes if
    > the snapshot is recognized as safe.  DEFERRABLE only means to keep
    > retrying until you get a safe one.  This is nonsense when you're
    > importing the snapshot.
    
    I think the requirement is that we need to do the appropriate push-ups
    so that the people who import the snapshot know that it's safe, and
    that the SSI stuff can all be skipped.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  43. Re: synchronized snapshots

    Kevin Grittner <kevin.grittner@wicourts.gov> — 2011-10-19T17:49:07Z

    Robert Haas <robertmhaas@gmail.com> wrote:
    > Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> Florian Pflug <fgp@phlo.org> writes:
     
    >>> This allows a deferrable snapshot to be used on a second
    >>> connection (by e.g. pg_dump), and still be marked as DEFERRABLE.
    >>> If we throw an error unconditionally, the second connection has
    >>> to import the snapshot without marking it DEFERRABLE, which I
    >>> think has consequences for performance.
    >>
    >> No, I don't believe that either.  AIUI the performance benefit
    >> comes if the snapshot is recognized as safe.  DEFERRABLE only
    >> means to keep retrying until you get a safe one.
     
    Right, there are other circumstances in which a READ ONLY
    transaction's snapshot may be recognized as safe, and it can opt out
    of all the additional SSI logic.  As you say, DEFERRABLE means we
    *wait* for that.
     
    >> This is nonsense when you're importing the snapshot.
     
    Agreed.
     
    > I think the requirement is that we need to do the appropriate
    > push-ups so that the people who import the snapshot know that it's
    > safe, and that the SSI stuff can all be skipped.
     
    If the snapshot was safe in the first process, it will be safe for
    any others with which it is shared.  Basically, a SERIALIZABLE READ
    ONLY DEFERRABLE transaction waits for a snapshot which, as a READ
    ONLY transaction, can't see any serialization anomalies.  It can run
    exactly like a REPEATABLE READ transaction.  In fact, it would be OK
    from a functional perspective if the first transaction in pg_dump
    got a safe snapshot through DEFERRABLE techniques and then shared it
    with REPEATABLE READ transactions.
     
    I don't know which is the best way to implement this, but it should
    be fine to skip the DEFERRABLE logic for secondary users, as long as
    they are READ ONLY.
     
    -Kevin
    
    
  44. Re: synchronized snapshots

    Florian G. Pflug <fgp@phlo.org> — 2011-10-19T20:30:26Z

    On Oct19, 2011, at 19:49 , Kevin Grittner wrote:
    > Robert Haas <robertmhaas@gmail.com> wrote:
    >> Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >>> Florian Pflug <fgp@phlo.org> writes:
    > 
    >>>> This allows a deferrable snapshot to be used on a second
    >>>> connection (by e.g. pg_dump), and still be marked as DEFERRABLE.
    >>>> If we throw an error unconditionally, the second connection has
    >>>> to import the snapshot without marking it DEFERRABLE, which I
    >>>> think has consequences for performance.
    >>> 
    >>> No, I don't believe that either.  AIUI the performance benefit
    >>> comes if the snapshot is recognized as safe.  DEFERRABLE only
    >>> means to keep retrying until you get a safe one.
    > 
    > Right, there are other circumstances in which a READ ONLY
    > transaction's snapshot may be recognized as safe, and it can opt out
    > of all the additional SSI logic.  As you say, DEFERRABLE means we
    > *wait* for that.
    
    Oh, cool. I thought the opt-out only works for explicitly DEFERRABLE
    transactions.
    
    best regards,
    Florian Pflug
    
    
    
  45. Re: synchronized snapshots

    Kevin Grittner <kevin.grittner@wicourts.gov> — 2011-10-19T20:52:07Z

    Florian Pflug <fgp@phlo.org> wrote:
     
    > Oh, cool. I thought the opt-out only works for explicitly
    > DEFERRABLE transactions.
     
    Basically, if there is no serializable read-write transaction active
    which overlaps the read-only transaction and also overlaps a
    serializable transaction which wrote something and committed in time
    to be visible to the read-only transaction, then the read-only
    transaction's snapshot is "safe" and it can stop worrying about SSI
    logic.  If these conditions happen to exist when a read-only
    transaction is starting, it never needs to set up for SSI; it can
    run just like a REPEATABLE READ transaction and still be safe from
    serialization anomalies.  We make some effort to spot the transition
    to this state while a read-only transaction is running, allowing it
    to "drop out" of SSI while running.
     
    The fact that a read-only transaction can often skip some or all of
    the SSI overhead (beyond determining that opting out is safe) is why
    declaring transactions to be READ ONLY when possible is #1 on my
    list of performance considerations for SSI.
     
    -Kevin
    
    
  46. Re: synchronized snapshots

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-22T22:25:52Z

    Joachim Wieland <joe@mcknight.de> writes:
    > [ synchronized snapshots patch ]
    
    Applied with, um, rather extensive editorialization.
    
    I'm not convinced that the SSI case is bulletproof yet, but it'll be
    easier to test with the code committed.
    
    			regards, tom lane
    
    
  47. Re: synchronized snapshots

    Thom Brown <thom@linux.com> — 2011-10-23T00:19:07Z

    On 23 October 2011 00:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Joachim Wieland <joe@mcknight.de> writes:
    >> [ synchronized snapshots patch ]
    >
    > Applied with, um, rather extensive editorialization.
    >
    > I'm not convinced that the SSI case is bulletproof yet, but it'll be
    > easier to test with the code committed.
    
    Can I ask why it doesn't return the same snapshot ID each time?
    Surely it can't change since you can only export the snapshot of a
    serializable or repeatable read transaction?  A "SELECT
    count(pg_export_snapshot()) FROM generate_series(1,10000000);" would
    quickly bork the pg_snapshots directory which any user can run.
    
    -- 
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
    
    EnterpriseDB UK: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  48. Re: synchronized snapshots

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-23T02:15:00Z

    Thom Brown <thom@linux.com> writes:
    > Can I ask why it doesn't return the same snapshot ID each time?
    > Surely it can't change since you can only export the snapshot of a
    > serializable or repeatable read transaction?
    
    No, that's incorrect.  You can export from a READ COMMITTED transaction;
    indeed, you'd more or less have to, if you want the control transaction
    to be able to see what the slaves do.
    
    > A "SELECT
    > count(pg_export_snapshot()) FROM generate_series(1,10000000);" would
    > quickly bork the pg_snapshots directory which any user can run.
    
    Shrug ... you can create a much more severe DOS problem by making
    zillions of tables, if the filesystem doesn't handle lots-o-files
    well.
    
    			regards, tom lane
    
    
  49. Re: synchronized snapshots

    Thom Brown <thom@linux.com> — 2011-10-23T09:00:30Z

    On 23 October 2011 03:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Thom Brown <thom@linux.com> writes:
    >> Can I ask why it doesn't return the same snapshot ID each time?
    >> Surely it can't change since you can only export the snapshot of a
    >> serializable or repeatable read transaction?
    >
    > No, that's incorrect.  You can export from a READ COMMITTED transaction;
    > indeed, you'd more or less have to, if you want the control transaction
    > to be able to see what the slaves do.
    
    My bad.  I didn't read the documentation carefully enough.  I can make
    sense of it now.
    
    Thanks
    
    -- 
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
    
    EnterpriseDB UK: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company