Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Treat 2PC commit/abort the same as regular xacts in recovery.

  1. pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-08T08:36:38Z

    On Thu, Sep 8, 2011 at 7:06 AM, Chris Redekop <chris@replicon.com> wrote:
    > Is there anything available to get the last time a transaction
    > occurred?....like say "pg_last_xact_timestamp"?  In order to accurately
    > calculate how far behind my slave is I need to do something like
    > master::pg_last_xact_timestamp() -
    > slave::pg_last_xact_replay_timestamp()....currently I'm using now() instead
    > of the pg_last_xact_timestamp() call, but then when the master is not busy
    > the slave appears to lag behind.  I'm considering writing a C module to get
    > the last modified file time of the xlog, but I'm hoping there is a better
    > alternative that I haven't found yet....
    
    The above has been posted in pgsql-general. The reason why Chris thinks
    a counterpart of pg_last_xact_replay_timestamp() is required sounds
    reasonable to me. So I'd like to propose new function
    "pg_last_xact_insert_timestamp" as the counterpart, which returns the
    timestamp of the last inserted commit or abort WAL record. I'll add the
    attached patch into the next CF.
    
    Comments?
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  2. Re: pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-09-08T08:55:40Z

    On Thu, Sep 8, 2011 at 9:36 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Thu, Sep 8, 2011 at 7:06 AM, Chris Redekop <chris@replicon.com> wrote:
    >> Is there anything available to get the last time a transaction
    >> occurred?....like say "pg_last_xact_timestamp"?  In order to accurately
    >> calculate how far behind my slave is I need to do something like
    >> master::pg_last_xact_timestamp() -
    >> slave::pg_last_xact_replay_timestamp()....currently I'm using now() instead
    >> of the pg_last_xact_timestamp() call, but then when the master is not busy
    >> the slave appears to lag behind.  I'm considering writing a C module to get
    >> the last modified file time of the xlog, but I'm hoping there is a better
    >> alternative that I haven't found yet....
    >
    > The above has been posted in pgsql-general. The reason why Chris thinks
    > a counterpart of pg_last_xact_replay_timestamp() is required sounds
    > reasonable to me. So I'd like to propose new function
    > "pg_last_xact_insert_timestamp" as the counterpart, which returns the
    > timestamp of the last inserted commit or abort WAL record. I'll add the
    > attached patch into the next CF.
    
    For reasons stated on the original thread, I don't think we need this.
    
    The OP can calculate what he wants without this.
    
    The code already exists in recovery and has some purpose there. Adding
    similar code to the mainline just so somebody can run this function
    occasionally is not good. Based on this I will be looking to see if we
    can optimise the recovery path some more.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  3. Re: pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-08T10:14:46Z

    On Thu, Sep 8, 2011 at 5:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Thu, Sep 8, 2011 at 9:36 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> The above has been posted in pgsql-general. The reason why Chris thinks
    >> a counterpart of pg_last_xact_replay_timestamp() is required sounds
    >> reasonable to me. So I'd like to propose new function
    >> "pg_last_xact_insert_timestamp" as the counterpart, which returns the
    >> timestamp of the last inserted commit or abort WAL record. I'll add the
    >> attached patch into the next CF.
    >
    > For reasons stated on the original thread, I don't think we need this.
    >
    > The OP can calculate what he wants without this.
    >
    > The code already exists in recovery and has some purpose there. Adding
    > similar code to the mainline just so somebody can run this function
    > occasionally is not good. Based on this I will be looking to see if we
    > can optimise the recovery path some more.
    
    Okay. Let me explain another use case of pg_last_xact_insert_timestamp().
    The existing functions might be enough for checking whether the standby
    has already caught up with the master. But I think that new function would be
    very useful to calculate *how much* the standby is behind from the master.
    
    Of course, we can do that by using the existing functions. But a problem is
    that they return LSN and the calculated delay is represented as the size of
    WAL. For users, it's not easy to handle LSN (e.g., there is no function to do
    calculation of LSN), and the delay in WAL size is not intuitive. I've sometimes
    received such a complaint.
    
    OTOH, new function enables users to monitor the delay as a timestamp.
    For users, a timestamp is obviously easier to handle than LSN, and the delay
    as a timestamp is more intuitive. So, I think that it's worth adding
    something like pg_last_xact_insert_timestamp into core for improvement
    of user-friendness.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  4. Re: pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-09-08T13:03:14Z

    On Thu, Sep 8, 2011 at 6:14 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > OTOH, new function enables users to monitor the delay as a timestamp.
    > For users, a timestamp is obviously easier to handle than LSN, and the delay
    > as a timestamp is more intuitive. So, I think that it's worth adding
    > something like pg_last_xact_insert_timestamp into core for improvement
    > of user-friendness.
    
    It seems very nice from a usability point of view, but I have to agree
    with Simon's concern about performance.  Actually, as of today,
    WALInsertLock is such a gigantic bottleneck that I suspect the
    overhead of this additional bookkeeping would be completely
    unnoticeable.  But I'm still reluctant to add more centralized
    spinlocks that everyone has to fight over, having recently put a lot
    of effort into getting rid of some of the ones we've traditionally
    had.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  5. Re: pg_last_xact_insert_timestamp

    Chris Redekop <chris@replicon.com> — 2011-09-08T22:33:39Z

    Thanks for all the feedback guys.  Just to throw another monkey wrench in
    here - I've been playing with Simon's proposed solution of returning 0 when
    the WAL positions match, and I've come to the realizatiion that even if
    using pg_last_xact_insert_timestamp, although it would help, we still
    wouldn't be able to get a 100% accurate "how far behind?" counter....not
    that this is a big deal, but I know my ops team is going to bitch to me
    about it :).....take this situation: there's a lull of 30 seconds where
    there are no transactions committed on the server....the slave is totally
    caught up, WAL positions match, I'm reporting 0, everything is happy.  Then
    a transaction is committed on the master....before the slave gets it my
    query hits it and sees that we're 30 seconds behind (when in reality we're
    <1sec behind).....Because of this affect my graph is a little spikey...I
    mean it's not a huge deal or anything - I can put some sanity checking in my
    number reporting ("if 1 second ago you were 0 seconds behind, you can't be
    more than 1 second behind now" sorta thing).  But if we wanted to go for
    super-ideal solution there would be a way to get the timestamp of
    pg_stat_replication.replay_location+1 (the first transaction that the slave
    does not have).
    
    
    On Thu, Sep 8, 2011 at 7:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Thu, Sep 8, 2011 at 6:14 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > > OTOH, new function enables users to monitor the delay as a timestamp.
    > > For users, a timestamp is obviously easier to handle than LSN, and the
    > delay
    > > as a timestamp is more intuitive. So, I think that it's worth adding
    > > something like pg_last_xact_insert_timestamp into core for improvement
    > > of user-friendness.
    >
    > It seems very nice from a usability point of view, but I have to agree
    > with Simon's concern about performance.  Actually, as of today,
    > WALInsertLock is such a gigantic bottleneck that I suspect the
    > overhead of this additional bookkeeping would be completely
    > unnoticeable.  But I'm still reluctant to add more centralized
    > spinlocks that everyone has to fight over, having recently put a lot
    > of effort into getting rid of some of the ones we've traditionally
    > had.
    >
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    
  6. Re: pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-09T00:42:03Z

    On Thu, Sep 8, 2011 at 10:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Thu, Sep 8, 2011 at 6:14 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> OTOH, new function enables users to monitor the delay as a timestamp.
    >> For users, a timestamp is obviously easier to handle than LSN, and the delay
    >> as a timestamp is more intuitive. So, I think that it's worth adding
    >> something like pg_last_xact_insert_timestamp into core for improvement
    >> of user-friendness.
    >
    > It seems very nice from a usability point of view, but I have to agree
    > with Simon's concern about performance.  Actually, as of today,
    > WALInsertLock is such a gigantic bottleneck that I suspect the
    > overhead of this additional bookkeeping would be completely
    > unnoticeable.
    
    The patch I've posted adds one acquisition and release of spinlock per
    a commit or abort. But it's done outside of WALInsertLock. So I don't
    think that the patch degrades a performance.
    
    > But I'm still reluctant to add more centralized
    > spinlocks that everyone has to fight over, having recently put a lot
    > of effort into getting rid of some of the ones we've traditionally
    > had.
    
    You are against adding new acquisition and release of spinlock itself
    even if it has nothing to do with WALInsertLock? The patch uses
    XLogCtl->info_lck spinlock to save the last insert timestamp in the
    shmem. XLogCtl->info_lck already protects many shmem variables
    related to XLOG. So using XLogCtl->info_lck additionally might make
    its contention heavier and degrade a performance. If the patch
    defines new spinlock and uses it to save the timestamp to avoid
    such a contention, you feel satisfied with the patch?
    
    Another idea to avoid spinlock contention is save the timestamp in
    PgBackendStatus (which contains information for pg_stat_activity).
    This enables us to write and read the timestamp without spinlock.
    Comments?
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  7. Re: pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-09-09T15:32:46Z

    On Thu, Sep 8, 2011 at 8:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > Another idea to avoid spinlock contention is save the timestamp in
    > PgBackendStatus (which contains information for pg_stat_activity).
    > This enables us to write and read the timestamp without spinlock.
    > Comments?
    
    That seems like a possibly promising approach, in that each backend
    could update the information separately, and it's the reader's job to
    go find the maximum of all those values when needed.  So the overhead
    is (properly, in this case) placed on the reader instead of the
    writer.  But it's a bit tricky, because when the reader wants that
    maximum, it has to take into account inactive backends that may have
    committed transactions before exiting, not just the ones that are
    still connected.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  8. Re: pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-11T04:24:10Z

    On Sat, Sep 10, 2011 at 12:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Thu, Sep 8, 2011 at 8:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> Another idea to avoid spinlock contention is save the timestamp in
    >> PgBackendStatus (which contains information for pg_stat_activity).
    >> This enables us to write and read the timestamp without spinlock.
    >> Comments?
    >
    > That seems like a possibly promising approach, in that each backend
    > could update the information separately, and it's the reader's job to
    > go find the maximum of all those values when needed.  So the overhead
    > is (properly, in this case) placed on the reader instead of the
    > writer.  But it's a bit tricky, because when the reader wants that
    > maximum, it has to take into account inactive backends that may have
    > committed transactions before exiting, not just the ones that are
    > still connected.
    
    Yes, that's what I was thinking. The attached patch has adopted that
    approach.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  9. [REVIEW] pg_last_xact_insert_timestamp

    Kyotaro Horiguchi <horiguchi.kyotaro@oss.ntt.co.jp> — 2011-09-14T09:21:08Z

    Hi, This is a review for pg_last_xact_insert_timestamp patch.
    (https://commitfest.postgresql.org/action/patch_view?id=634)
    
    
    Summary
    ====================
    There's one question and two comments.
    
    Q1: The shmem entry for timestamp is not initialized on
    allocating. Is this OK? (I don't know that for OSs other than
    Linux) And zeroing double field is OK for all OSs?
    
    And you can find the two more comment in "Code details" section.
    
    I have no objection for commiter review with the Q1 above is
    cleared.
    
    
    
    Purpose and function of this patch
    ====================
    
    This patch allows users to grip the amount of replay lag on the
    standby by giving the timestamp of the WAL latestly inserted on
    the primary.
    
    Each backend writes the timestamp of the just inserted log record
    of commit/abort freely on PgBackendStatus, and the function
    pg_last_xact_insert_timestamp() gathers them for all backends
    including inactive ones and returns the latest one, or NULL when
    no timestamp is saved.
    
    Implemented in lockless way.
    
    
    
    Patch application, regression test
    ====================
    This patch applies cleanly onto HEAD.
    make world completed successfully. make check passed.
    Style of the code and patch seems correct.
    
    
    Documentation
    ====================
    It looks properly documented for new functions.
    
    
    Performance
    ====================
    
    This patch writes one TimestampTz value on shmem per one WAL
    insertion for commit/abort with no lock, and collect the values
    for all backends on reading without any locks.
    
    I think the former gives virtually no penalty for performance of
    transactions, and the latter has no influence on other
    transactions thanks to the lockless implement.
    
    
    Code details
    ====================
    
    == Shared memory initialization
    
    beentry->st_xact_end_timestamp is not initialized on backend
    startup. This is because the life time of the field is beyond
    that of backends. On the other hand, Linux man page says that
    newly created shared memory segment is zeroed, but I don't have
    information about other OSs.
    
    Nevertheless this is ok for all OSs, I don't know whether
    initializing TimestampTz(double, int64 is ok) field with 8 bytes
    zeros is OK or not, for all platforms. (It is ok for
    IEEE754-binary64).
    
    
    
    == Modification detection protocol in pgstat.c
    
    In pgstat_report_xact_end_timestamp, `beentry->st_changecount
    protocol' is used. It is for avoiding reading halfway-updated
    beentry as described in pgstat.h. Meanwhile,
    beentry->st_xact_end_timestamp is not read or (re-)initialized in
    pgstat.c and xlog.c reads only this field of whole beentry and
    st_changecount is not get cared here..
    
    So I think at present it is needless to touch st_changecount
    here.
    
    Of cource the penalty is very close to nothing so it may be there
    for future use...
    
    
    == Code duplication in xact.c
    
    in xact.c, same lines inserted into the end of both IF and ELSE
    blocks.
    
    xact.c:1047>	pgstat_report_xact_end_timestamp(xlrec.xact_time);
    xact.c:1073>	pgstat_report_xact_end_timestamp(xlrec.xact_time);
    
    These two lines refer to xlrec.xact_time, both of which comes
    from xactStopTimestamp freezed at xact.c:986
    
    xact.c:986>	SetCurrentTransactionStopTimestamp();
    xact.c:1008>	xlrec.xact_time = xactStopTimestamp;
    xact.c:1051>	xlrec.xact_time = xactStopTimestamp;
    
    I think it is better to move this line to just after this ELSE
    block using xactStopTimestamp instead of xlrec.xact_time.  Please
    leave it as it is if you put more importance on the similarity
    with the code inserted into RecordTransactionAbort().
    
    
    Conclusion
    ====================
    
    With the issue of shmem zeroing on other OSs is cleard, I have no
    objection to commit this patch.
    
    
    
    Sincerely,
    
    -- 
    Kyotaro Horiguchi
    NTT Open Source Software Center
    
    
  10. Re: [REVIEW] pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-15T08:52:53Z

    On Wed, Sep 14, 2011 at 6:21 PM, Kyotaro HORIGUCHI
    <horiguchi.kyotaro@oss.ntt.co.jp> wrote:
    > Hi, This is a review for pg_last_xact_insert_timestamp patch.
    > (https://commitfest.postgresql.org/action/patch_view?id=634)
    
    Thanks for the review!
    
    > Q1: The shmem entry for timestamp is not initialized on
    > allocating. Is this OK? (I don't know that for OSs other than
    > Linux) And zeroing double field is OK for all OSs?
    
    CreateSharedBackendStatus() initializes that shmem entries by doing
    MemSet(BackendStatusArray, 0, size). You think this is not enough?
    
    > Nevertheless this is ok for all OSs, I don't know whether
    > initializing TimestampTz(double, int64 is ok) field with 8 bytes
    > zeros is OK or not, for all platforms. (It is ok for
    > IEEE754-binary64).
    
    Which code are you concerned about?
    
    > == Modification detection protocol in pgstat.c
    >
    > In pgstat_report_xact_end_timestamp, `beentry->st_changecount
    > protocol' is used. It is for avoiding reading halfway-updated
    > beentry as described in pgstat.h. Meanwhile,
    > beentry->st_xact_end_timestamp is not read or (re-)initialized in
    > pgstat.c and xlog.c reads only this field of whole beentry and
    > st_changecount is not get cared here..
    
    No, st_changecount is used to read st_xact_end_timestamp.
    st_xact_end_timestamp is read from the shmem to the local memory
    in pgstat_read_current_status(), and this function always checks
    st_changecount when reading the shmem value.
    
    > == Code duplication in xact.c
    >
    > in xact.c, same lines inserted into the end of both IF and ELSE
    > blocks.
    >
    > xact.c:1047>    pgstat_report_xact_end_timestamp(xlrec.xact_time);
    > xact.c:1073>    pgstat_report_xact_end_timestamp(xlrec.xact_time);
    >
    > These two lines refer to xlrec.xact_time, both of which comes
    > from xactStopTimestamp freezed at xact.c:986
    >
    > xact.c:986>     SetCurrentTransactionStopTimestamp();
    > xact.c:1008>    xlrec.xact_time = xactStopTimestamp;
    > xact.c:1051>    xlrec.xact_time = xactStopTimestamp;
    >
    > I think it is better to move this line to just after this ELSE
    > block using xactStopTimestamp instead of xlrec.xact_time.
    
    Okay, I've changed the patch in that way.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  11. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-09-23T13:34:19Z

    On Thu, Sep 15, 2011 at 4:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > Thanks for the review!
    
    Koyotaro Horiguchi -
    
    Are you going to re-review the latest version of this patch?
    
    Thanks,
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  12. Re: [REVIEW] pg_last_xact_insert_timestamp

    Kyotaro Horiguchi <horiguchi.kyotaro@oss.ntt.co.jp> — 2011-09-29T08:20:33Z

    Sorry for late to re-review.
    
     One question is remaind, 
    
    > > Q1: The shmem entry for timestamp is not initialized on
    > > allocating. Is this OK? (I don't know that for OSs other than
    > > Linux) And zeroing double field is OK for all OSs?
    > 
    > CreateSharedBackendStatus() initializes that shmem entries by doing
    > MemSet(BackendStatusArray, 0, size). You think this is not enough?
    
     Sorry for missing it. That's enough.
    
    > > Nevertheless this is ok for all OSs, I don't know whether
    > > initializing TimestampTz(double, int64 is ok) field with 8 bytes
    > > zeros is OK or not, for all platforms. (It is ok for
    > > IEEE754-binary64).
    > 
    > Which code are you concerned about?
    
    xlog.c: 5889
    > 	beentry = pgstat_fetch_all_beentry();
    > 
    > 	for (i = 0; i < MaxBackends; i++, beentry++)
    > 	{
    > 		xtime = beentry->st_xact_end_timestamp;
    
     I think the last line in quoted code above reads possibly
    zero-initialized double (or int64) value, then the doubted will
    be compared and copied to another double.
    
    > 		if (result < xtime)
    > 			result = xtime;
    > 
    
      
    > No, st_changecount is used to read st_xact_end_timestamp.
    > st_xact_end_timestamp is read from the shmem to the local memory
    > in pgstat_read_current_status(), and this function always checks
    > st_changecount when reading the shmem value.
    
     Yes, I confirmed that pg_lats_xact_insert_timestamp looks local
    copy of BackendStatus. I've found it not unnecessary. 
    
    -- 
    Kyotaro Horiguchi
    NTT Open Source Software Center
    
    
  13. Re: [REVIEW] pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-29T12:21:32Z

    On Thu, Sep 29, 2011 at 5:20 PM, Kyotaro HORIGUCHI
    <horiguchi.kyotaro@oss.ntt.co.jp> wrote:
    > Sorry for late to re-review.
    
    Thanks!
    
    >> > Nevertheless this is ok for all OSs, I don't know whether
    >> > initializing TimestampTz(double, int64 is ok) field with 8 bytes
    >> > zeros is OK or not, for all platforms. (It is ok for
    >> > IEEE754-binary64).
    >>
    >> Which code are you concerned about?
    >
    > xlog.c: 5889
    >>       beentry = pgstat_fetch_all_beentry();
    >>
    >>       for (i = 0; i < MaxBackends; i++, beentry++)
    >>       {
    >>               xtime = beentry->st_xact_end_timestamp;
    >
    >  I think the last line in quoted code above reads possibly
    > zero-initialized double (or int64) value, then the doubted will
    > be compared and copied to another double.
    >
    >>               if (result < xtime)
    >>                       result = xtime;
    
    I believe it's safe. Such a code is placed elsewhere in the source, too.
    If it's unsafe, we should have received lots of bug reports related
    to that. But we've not.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  14. Re: [REVIEW] pg_last_xact_insert_timestamp

    Kyotaro Horiguchi <horiguchi.kyotaro@oss.ntt.co.jp> — 2011-09-30T02:24:43Z

    Hello,
    
    I understand that it has been at least practically no problem.
    
    Ok, I send this patch to comitters.
    
    Thanks for your dealing with nuisance questions.
    
    
    At Thu, 29 Sep 2011 21:21:32 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwG0C21F0CZY5ExX-49dxdx7hJuNeiBBJ0Tzvh+7vMXWgw@mail.gmail.com>
    > >> > Nevertheless this is ok for all OSs, I don't know whether
    > >> > initializing TimestampTz(double, int64 is ok) field with 8 bytes
    ...
    > I believe it's safe. Such a code is placed elsewhere in the source, too.
    > If it's unsafe, we should have received lots of bug reports related
    > to that. But we've not.
    
    Regards,
    
    -- 
    Kyotaro Horiguchi
    NTT Open Source Software Center
    
    
  15. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-09-30T07:18:53Z

    On Fri, Sep 30, 2011 at 3:24 AM, Kyotaro HORIGUCHI
    <horiguchi.kyotaro@oss.ntt.co.jp> wrote:
    
    > Ok, I send this patch to comitters.
    
    I repeat my objection to this patch. I'm very sorry I haven't been
    around much in last few weeks to keep up a dialogue about this and to
    make it clear how wrong I think this is.
    
    Adding something onto the main path of transaction commit is not good,
    especially when the only purpose of it is to run an occasional
    monitoring query for those people that use replication. Not everybody
    uses replication and even people that do use it don't need to run it
    so frequently that every single commit needs this. This is just bloat
    that we do not need and can also easily avoid.
    
    The calculation itself would be problematic since it differs from the
    way standby_delay is calculated on the standby, which will cause much
    confusion. Some thought or comment should be made about that also.
    
    If we want to measure times, we can easily send regular messages into
    WAL to provide this function. Using checkpoint records would seem
    frequent enough to me. We don't always send checkpoint records but we
    can send an info message instead if we are streaming. If
    archive_timeout is not set and max_wal_senders > 0 then we can send an
    info WAL message with timestamp on a regular cycle.
    
    Alternatively, we can send regular special messages from WALSender to
    WALReceiver that do not form part of the WAL stream, so we don't bulk
    up WAL archives. (i.e. don't use "w" messages). That seems like the
    most viable approach to this problem.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  16. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-09-30T19:11:30Z

    On Fri, Sep 30, 2011 at 3:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Fri, Sep 30, 2011 at 3:24 AM, Kyotaro HORIGUCHI
    > <horiguchi.kyotaro@oss.ntt.co.jp> wrote:
    >
    >> Ok, I send this patch to comitters.
    >
    > I repeat my objection to this patch. I'm very sorry I haven't been
    > around much in last few weeks to keep up a dialogue about this and to
    > make it clear how wrong I think this is.
    >
    > Adding something onto the main path of transaction commit is not good,
    > especially when the only purpose of it is to run an occasional
    > monitoring query for those people that use replication. Not everybody
    > uses replication and even people that do use it don't need to run it
    > so frequently that every single commit needs this. This is just bloat
    > that we do not need and can also easily avoid.
    
    I think the overhead of this is so completely trivial that we
    shouldn't be concerned about it.  It's writing 12 bytes to shared
    memory for each commit, without taking a lock, in a cache line that
    should be minimally contended.  We write plenty of other data to
    shared memory on each commit, and that's nowhere close to being the
    expensive part of committing a transaction.  What's expensive is
    fighting over WALInsertLock and ProcArrayLock and CLogControlLock, and
    possibly waiting for the commit to be durably flushed to disk, and
    maybe (at the margin) wrenching the cache lines in our PGPROC away
    from whatever processor last stole them to do GetSnapshotData().  I
    don't think that a couple of stores to uncontended shared memory are
    going to make any difference.
    
    > The calculation itself would be problematic since it differs from the
    > way standby_delay is calculated on the standby, which will cause much
    > confusion. Some thought or comment should be made about that also.
    
    The string standby_delay doesn't appear in our source tree anywhere,
    so I'm not sure what this is referring to.  In any case, I'm in favor
    of this feature.  Currently, the only way to measure the lag on the
    standby is to measure it in WAL bytes - and you get to write your own
    script to parse the WAL positions.  This will allow people to measure
    it in minutes.  That seems like a significant usability improvement.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  17. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-09-30T19:22:24Z

    On Fri, Sep 30, 2011 at 8:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Fri, Sep 30, 2011 at 3:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >> On Fri, Sep 30, 2011 at 3:24 AM, Kyotaro HORIGUCHI
    >> <horiguchi.kyotaro@oss.ntt.co.jp> wrote:
    >>
    >>> Ok, I send this patch to comitters.
    >>
    >> I repeat my objection to this patch. I'm very sorry I haven't been
    >> around much in last few weeks to keep up a dialogue about this and to
    >> make it clear how wrong I think this is.
    >>
    >> Adding something onto the main path of transaction commit is not good,
    >> especially when the only purpose of it is to run an occasional
    >> monitoring query for those people that use replication. Not everybody
    >> uses replication and even people that do use it don't need to run it
    >> so frequently that every single commit needs this. This is just bloat
    >> that we do not need and can also easily avoid.
    >
    > I think the overhead of this is so completely trivial that we
    > shouldn't be concerned about it.  It's writing 12 bytes to shared
    > memory for each commit, without taking a lock, in a cache line that
    > should be minimally contended.  We write plenty of other data to
    > shared memory on each commit, and that's nowhere close to being the
    > expensive part of committing a transaction.  What's expensive is
    > fighting over WALInsertLock and ProcArrayLock and CLogControlLock, and
    > possibly waiting for the commit to be durably flushed to disk, and
    > maybe (at the margin) wrenching the cache lines in our PGPROC away
    > from whatever processor last stole them to do GetSnapshotData().  I
    > don't think that a couple of stores to uncontended shared memory are
    > going to make any difference.
    
    If the feature could not be done another way, easily, I might agree.
    
    The point is it isn't necessary, useful or elegant to do it this way
    and *any* cycles added to mainline transactions need to have careful
    justification. And there really isn't any justification for doing
    things this way other than its the first way somebody thought of.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  18. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-09-30T19:57:19Z

    On Fri, Sep 30, 2011 at 3:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > If the feature could not be done another way, easily, I might agree.
    
    I don't see that you've offered a reasonable alternative.  The
    alternative proposals that you proposed don't appear to me to be
    solving the same problem.  AIUI, the requested feature is to be able
    to get, on the master, the timestamp of the last commit or abort, just
    as we can already get the timestamp of the last commit or abort
    replayed on the standby.  Nothing you WAL log and no messages you send
    to the standby are going to accomplish that goal.
    
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  19. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-09-30T20:07:58Z

    On Fri, Sep 30, 2011 at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Fri, Sep 30, 2011 at 3:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >> If the feature could not be done another way, easily, I might agree.
    >
    > I don't see that you've offered a reasonable alternative.  The
    > alternative proposals that you proposed don't appear to me to be
    > solving the same problem.  AIUI, the requested feature is to be able
    > to get, on the master, the timestamp of the last commit or abort, just
    > as we can already get the timestamp of the last commit or abort
    > replayed on the standby.  Nothing you WAL log and no messages you send
    > to the standby are going to accomplish that goal.
    
    The goal of the OP was to find out the replication delay. This
    function was suggested, but its not the only way.
    
    My alternative proposals solve the original problem in a better way.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  20. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-10-02T11:10:11Z

    On Fri, Sep 30, 2011 at 4:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Fri, Sep 30, 2011 at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >> On Fri, Sep 30, 2011 at 3:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >>> If the feature could not be done another way, easily, I might agree.
    >>
    >> I don't see that you've offered a reasonable alternative.  The
    >> alternative proposals that you proposed don't appear to me to be
    >> solving the same problem.  AIUI, the requested feature is to be able
    >> to get, on the master, the timestamp of the last commit or abort, just
    >> as we can already get the timestamp of the last commit or abort
    >> replayed on the standby.  Nothing you WAL log and no messages you send
    >> to the standby are going to accomplish that goal.
    >
    > The goal of the OP was to find out the replication delay. This
    > function was suggested, but its not the only way.
    >
    > My alternative proposals solve the original problem in a better way.
    
    As far as I can see, they don't solve the problem at all.  Your
    proposals involve sending additional information from the master to
    the slave, but the slave already knows both its WAL position and the
    timestamp of the transaction it has most recently replayed, because
    the startup process on the slave tracks that information and publishes
    it in shared memory.  On the master, however, only the WAL position is
    centrally tracked; the transaction timestamps are not.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  21. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-10-02T11:19:33Z

    On Thu, Sep 15, 2011 at 4:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > Okay, I've changed the patch in that way.
    
    It occurs to me that pgstat_report_xact_end_timestamp doesn't really
    need to follow the protocol of bumping the change count before and
    after bumping the timestamp.  We elsewhere assume that four-byte reads
    and writes are atomic, so there's no harm in assuming the same thing
    here (and if they're not... then the change-count thing is pretty
    dubious anyway).  I think it's sufficient to just set the value, full
    stop.
    
    Also, in pg_last_xact_insert_timestamp, the errhint() seems a little
    strange - this is not exactly a WAL *control* function, is it?
    
    In the documentation, for the short description of
    pg_last_xact_insert_timestamp(), how about something like "returns the
    time at which a transaction commit or transaction about record was
    last inserted into the transaction log"?  Or maybe that's too long.
    But the current description doesn't seem to do much other than
    recapitulate the function name, so I'm wondering if we can do any
    better than that.
    
    I think that instead of hacking up the backend-status copying code to
    have a mode where it copies everything, you should just have a
    special-purpose function that computes the value you need directly off
    the backend status entries themselves.  This approach seems like it
    both clutters the code and adds lots of extra data copying.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  22. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-10-02T12:21:31Z

    On Sun, Oct 2, 2011 at 12:10 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Fri, Sep 30, 2011 at 4:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >> On Fri, Sep 30, 2011 at 8:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >>> On Fri, Sep 30, 2011 at 3:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >>>> If the feature could not be done another way, easily, I might agree.
    >>>
    >>> I don't see that you've offered a reasonable alternative.  The
    >>> alternative proposals that you proposed don't appear to me to be
    >>> solving the same problem.  AIUI, the requested feature is to be able
    >>> to get, on the master, the timestamp of the last commit or abort, just
    >>> as we can already get the timestamp of the last commit or abort
    >>> replayed on the standby.  Nothing you WAL log and no messages you send
    >>> to the standby are going to accomplish that goal.
    >>
    >> The goal of the OP was to find out the replication delay. This
    >> function was suggested, but its not the only way.
    >>
    >> My alternative proposals solve the original problem in a better way.
    >
    > As far as I can see, they don't solve the problem at all.  Your
    > proposals involve sending additional information from the master to
    > the slave, but the slave already knows both its WAL position and the
    > timestamp of the transaction it has most recently replayed, because
    > the startup process on the slave tracks that information and publishes
    > it in shared memory.  On the master, however, only the WAL position is
    > centrally tracked; the transaction timestamps are not.
    
    The problem is to find the replication delay, even when the system is quiet.
    
    What I have proposed finds the replication delay more accurately even
    than looking at the last commit, since often there are writes but no
    commits.
    
    If we focus on the problem, rather than the first suggested solution
    to that problem, we'll come out on top.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  23. Re: [REVIEW] pg_last_xact_insert_timestamp

    Tom Lane <tgl@sss.pgh.pa.us> — 2011-10-02T21:35:31Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > It occurs to me that pgstat_report_xact_end_timestamp doesn't really
    > need to follow the protocol of bumping the change count before and
    > after bumping the timestamp.  We elsewhere assume that four-byte reads
    > and writes are atomic, so there's no harm in assuming the same thing
    > here (and if they're not... then the change-count thing is pretty
    > dubious anyway).  I think it's sufficient to just set the value, full
    > stop.
    
    I agree you can read the value without that, but I think that setting
    it should still bump the change count.  Otherwise there's no way for
    another process to collect the whole struct and be sure that it's
    self-consistent.  We may not have a critical need for that right now,
    but it's silly to foreclose the possibility to save a cycle or so.
    
    			regards, tom lane
    
    
  24. Re: [REVIEW] pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-03T09:31:47Z

    On Sun, Oct 2, 2011 at 8:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    > It occurs to me that pgstat_report_xact_end_timestamp doesn't really
    > need to follow the protocol of bumping the change count before and
    > after bumping the timestamp.  We elsewhere assume that four-byte reads
    > and writes are atomic, so there's no harm in assuming the same thing
    > here (and if they're not... then the change-count thing is pretty
    > dubious anyway).  I think it's sufficient to just set the value, full
    > stop.
    
    I agree with Tom here. It seems to be safer to follow the protocol even if
    that's not required for now.
    
    > Also, in pg_last_xact_insert_timestamp, the errhint() seems a little
    > strange - this is not exactly a WAL *control* function, is it?
    
    Not only "control" but also "WAL" might be confusing. What about
    "transaction information functions"?
    
    BTW, pg_current_xlog_location() and pg_current_xlog_insert_location()
    use the same HINT message as I used for pg_last_xact_insert_timestamp(),
    but they are also not WAL *control* function. And, in the document,
    they are categorized as "Backup Control Functions", but which sounds also
    strange. We should call them "WAL information functions" in both
    HINT message and the document?
    
    > In the documentation, for the short description of
    > pg_last_xact_insert_timestamp(), how about something like "returns the
    > time at which a transaction commit or transaction about record was
    > last inserted into the transaction log"?  Or maybe that's too long.
    > But the current description doesn't seem to do much other than
    > recapitulate the function name, so I'm wondering if we can do any
    > better than that.
    
    Agreed. I will change the description per your suggestion.
    
    > I think that instead of hacking up the backend-status copying code to
    > have a mode where it copies everything, you should just have a
    > special-purpose function that computes the value you need directly off
    > the backend status entries themselves.  This approach seems like it
    > both clutters the code and adds lots of extra data copying.
    
    Agreed. Will change.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  25. Re: [REVIEW] pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-03T09:56:53Z

    On Fri, Sep 30, 2011 at 4:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > If we want to measure times, we can easily send regular messages into
    > WAL to provide this function. Using checkpoint records would seem
    > frequent enough to me. We don't always send checkpoint records but we
    > can send an info message instead if we are streaming. If
    > archive_timeout is not set and max_wal_senders > 0 then we can send an
    > info WAL message with timestamp on a regular cycle.
    
    What timestamp are you thinking the walsender should send? What we need
    is the timestamp which is comparable with the result of
    pg_last_xact_replay_timestamp() which returns the timestamp of the
    transaction commit or abort record. So, even if we adopt your proposal,
    ISTM that we still need to collect the timestamp for each commit. No?
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  26. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-10-03T19:07:54Z

    On Sun, Oct 2, 2011 at 8:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > The problem is to find the replication delay, even when the system is quiet.
    >
    > What I have proposed finds the replication delay more accurately even
    > than looking at the last commit, since often there are writes but no
    > commits.
    >
    > If we focus on the problem, rather than the first suggested solution
    > to that problem, we'll come out on top.
    
    Sorry, but I still don't really think it's fair to say that you've
    proposed a solution to this problem.  Or if you have, neither I nor
    Fujii Masao understand that proposal well enough to decide whether we
    like it.  You said "maybe we could WAL log something once per
    checkpoint cycle" or "maybe we could add a new protocol message".
    We've both replied with various emails saying "we don't understand how
    that would solve the problem".  If you want to add some detail to your
    proposal, then we can weigh the pros and cons as compared with what
    the patch does - but right now all you've provided is a theory that
    there might be a better solution to this problem out there, not any
    details about how it would work.  Or if you have, then please post a
    link to the message where those details are written out, because I
    cannot find them on the thread.
    
    I do, however, agree that that the case where the system is quiet is
    the problem case for computing replication delay.  It seems to me
    that, even without this patch, if the system has a continuous stream
    of commits, you can easily find the replication delay by differencing
    the current time on the master with the value returned by
    pg_last_xact_replay_timestamp().  However, if the master goes quiet,
    then the slave will appear to be progressively farther behind.  With
    the addition of this patch, that problem goes away: you can now
    difference the return value of pg_last_xact_insert_timestamp() on the
    master with the return value of pg_last_xact_replay_timestamp() on the
    slave.  If the master goes quiet, then pg_last_xact_insert_timestamp()
    will stop advancing, and so the two values you are comparing will be
    equal once the slave has caught up, and remain equal until activity
    resumes on the master.
    
    Now, there is a more subtle remaining problem, which is that when
    activity *does* resume on the master, there will be a (probably quite
    short) window of time during which the slave will have a much earlier
    timestamp than the one on the master.  When the master has a commit
    after a long idle period but the slave has not yet replayed the commit
    record, the replication delay will appear to be equal to the length of
    the idle period.  But that doesn't seem like a sufficient reason to
    reject the whole approach, because there are several ways around it.
    First, you could simply decide that the large computed lag value,
    although counterintuitive, is accurate under some definition, because,
    well, that really is the lag between the last transaction committed on
    the master and the last transaction committed on the standby, and if
    you don't like the fact that timestamps behave that way, you should
    compare using WAL positions instead.  If you don't like that approach,
    then a second, also viable approach is to teach your monitoring
    software that the replication delay can never increase faster than the
    rate at which clock time is passing.  So if you were caught up a
    minute ago, then you can't be more than a minute behind now.
    
    Another point I want to make here is that there's probably more than
    one useful definition of replication delay.  The previous question
    presupposes that you're trying to answer the question "if I have a
    transaction that committed N seconds ago on the master, will it be
    visible on the standby?".  It's also a reasonable time-based
    substitute for measuring the difference in master and standby WAL
    positions, although certainly it's going to work better if the rate of
    WAL generation is relatively even.  But for a lot of people, it may be
    that what they really want to know is "what is the expected time for
    the standby to replay all generated but not yet applied WAL?" - or
    maybe some third thing that I'm not thinking of - and this function
    won't provide that.  I think we can ultimately afford to provide more
    than one mechanism here, so I don't see doing this as foreclosing any
    other also-useful calculation that someone may wish to add in the
    future.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  27. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-10-03T20:25:01Z

    On Mon, Oct 3, 2011 at 8:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    
    > Sorry, but I still don't really think it's fair to say that you've
    > proposed a solution to this problem.  Or if you have, neither I nor
    > Fujii Masao understand that proposal well enough to decide whether we
    > like it.
    
    Arguing between trenches doesn't really get us anywhere.
    
    As ever, when someone claims to have a better solution then it is up
    to them to prove that is the case.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  28. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-10-03T20:33:01Z

    On Mon, Oct 3, 2011 at 4:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Mon, Oct 3, 2011 at 8:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >
    >> Sorry, but I still don't really think it's fair to say that you've
    >> proposed a solution to this problem.  Or if you have, neither I nor
    >> Fujii Masao understand that proposal well enough to decide whether we
    >> like it.
    >
    > Arguing between trenches doesn't really get us anywhere.
    >
    > As ever, when someone claims to have a better solution then it is up
    > to them to prove that is the case.
    
    So... are you going to do that?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  29. Re: [REVIEW] pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-04T11:52:59Z

    On Mon, Oct 3, 2011 at 6:31 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> Also, in pg_last_xact_insert_timestamp, the errhint() seems a little
    >> strange - this is not exactly a WAL *control* function, is it?
    >
    > Not only "control" but also "WAL" might be confusing. What about
    > "transaction information functions"?
    
    Attached is the updated version of the patch. In the patch, I used the
    function name itself in the HINT message, i.e., the HINT message is
    the following.
    
        pg_last_xact_insert_timestamp() cannot be executed during recovery.
    
    >> In the documentation, for the short description of
    >> pg_last_xact_insert_timestamp(), how about something like "returns the
    >> time at which a transaction commit or transaction about record was
    >> last inserted into the transaction log"?  Or maybe that's too long.
    >> But the current description doesn't seem to do much other than
    >> recapitulate the function name, so I'm wondering if we can do any
    >> better than that.
    >
    > Agreed. I will change the description per your suggestion.
    
    Done.
    
    >> I think that instead of hacking up the backend-status copying code to
    >> have a mode where it copies everything, you should just have a
    >> special-purpose function that computes the value you need directly off
    >> the backend status entries themselves.  This approach seems like it
    >> both clutters the code and adds lots of extra data copying.
    >
    > Agreed. Will change.
    
    Done.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  30. Re: [REVIEW] pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-04T12:05:29Z

    On Tue, Oct 4, 2011 at 5:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Mon, Oct 3, 2011 at 4:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >> On Mon, Oct 3, 2011 at 8:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >>
    >>> Sorry, but I still don't really think it's fair to say that you've
    >>> proposed a solution to this problem.  Or if you have, neither I nor
    >>> Fujii Masao understand that proposal well enough to decide whether we
    >>> like it.
    >>
    >> Arguing between trenches doesn't really get us anywhere.
    >>
    >> As ever, when someone claims to have a better solution then it is up
    >> to them to prove that is the case.
    >
    > So... are you going to do that?
    
    Simon, could you? If your proposal turns out to be better than mine, I'd be
    happy to agree to drop my patch and adopt yours.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  31. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-10-04T13:15:11Z

    On Tue, Oct 4, 2011 at 1:05 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Tue, Oct 4, 2011 at 5:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
    >> On Mon, Oct 3, 2011 at 4:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >>> On Mon, Oct 3, 2011 at 8:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >>>
    >>>> Sorry, but I still don't really think it's fair to say that you've
    >>>> proposed a solution to this problem.  Or if you have, neither I nor
    >>>> Fujii Masao understand that proposal well enough to decide whether we
    >>>> like it.
    >>>
    >>> Arguing between trenches doesn't really get us anywhere.
    >>>
    >>> As ever, when someone claims to have a better solution then it is up
    >>> to them to prove that is the case.
    >>
    >> So... are you going to do that?
    >
    > Simon, could you? If your proposal turns out to be better than mine, I'd be
    > happy to agree to drop my patch and adopt yours.
    
    Yes, will do.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  32. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-10-13T12:25:28Z

    On Tue, Oct 4, 2011 at 9:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >> Simon, could you? If your proposal turns out to be better than mine, I'd be
    >> happy to agree to drop my patch and adopt yours.
    >
    > Yes, will do.
    
    Simon,
    
    I believe that we are still waiting for this.
    
    Thanks,
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  33. Re: [REVIEW] pg_last_xact_insert_timestamp

    Magnus Hagander <magnus@hagander.net> — 2011-12-06T11:49:29Z

    On Thu, Oct 13, 2011 at 14:25, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Tue, Oct 4, 2011 at 9:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >>> Simon, could you? If your proposal turns out to be better than mine, I'd be
    >>> happy to agree to drop my patch and adopt yours.
    >>
    >> Yes, will do.
    >
    > Simon,
    >
    > I believe that we are still waiting for this.
    
    Are we going to hear anything back on this one for the current CF? If
    not, I suggest we go with Fujiis version for now - we can always
    change it for a potentially better version later.
    
    -- 
     Magnus Hagander
     Me: http://www.hagander.net/
     Work: http://www.redpill-linpro.com/
    
    
  34. Re: [REVIEW] pg_last_xact_insert_timestamp

    Greg Smith <greg@2ndquadrant.com> — 2011-12-10T12:29:39Z

    On 10/02/2011 07:10 AM, Robert Haas wrote:
    > Your proposals involve sending additional information from the master to
    > the slave, but the slave already knows both its WAL position and the
    > timestamp of the transaction it has most recently replayed, because
    > the startup process on the slave tracks that information and publishes
    > it in shared memory.  On the master, however, only the WAL position is
    > centrally tracked; the transaction timestamps are not.
    >    
    
    This seems to be the question that was never really answered well enough 
    to satisfy anyone, so let's rewind to here for a bit.  I wasn't 
    following this closely until now, so I just did my own review from 
    scratch against the latest patch.  I found a few issues, and I don't 
    think all of them have been vented here yet:
    
    -It adds overhead at every commit, even for people who aren't using it.  
    Probably not enough to matter, but it's yet another thing going through 
    the often maligned as too heavy pgstat system, often.
    
    -In order to measure lag this way, you need access to both the master 
    and the standby.  Yuck, dblink or application code doing timestamp math, 
    either idea makes me shudder.  It would be nice to answer "how many 
    seconds of lag do have?" directly from the standby.  Ideally I would 
    like both the master and standby to know those numbers.
    
    -After the server is restarted, you get a hole in the monitoring data 
    until the first transaction is committed or aborted.  The way the 
    existing pg_last_xact_replay_timestamp side of this computation goes 
    NULL for some unpredictable period after restart is going to drive 
    monitoring systems batty.  Building this new facility similarly will 
    force everyone who writes a lag monitor to special case for that 
    condition on both sides.  Sure, that's less user hostile than the status 
    quo, but it isn't going to help PostgreSQL's battered reputation in the 
    area of monitoring either.
    
    -The transaction ID advancing is not a very good proxy for real-world 
    lag.  You can have a very large amount of writing in between commits.  
    The existing lag monitoring possibilities focus on WAL position instead, 
    which is better correlated with the sort of activity that causes lag.  
    Making one measurement of lag WAL position based (the existing ones) and 
    another based on transaction advance (this proposal) is bound to raise 
    some "which of these is the correct lag?" questions, when the two 
    diverge.  Large data loading operations come to mind as a not unusual at 
    all situation where this would happen.
    
    I'm normally a fan of building the simplest thing that will do something 
    useful, and this patch succeeds there.  But the best data to collect 
    needs to have a timestamp that keeps moving forward in a way that 
    correlates reasonably well to the system WAL load.  Using the 
    transaction ID doesn't do that.  Simon did some hand waving around 
    sending a timestamp every checkpoint.  That would allow the standby to 
    compute its own lag, limit overhead to something much lighter than 
    per-transaction, and better track write volume.  There could still be a 
    bigger than normal discontinuity after server restart, if the server was 
    down a while, but at least there wouldn't ever be a point where the 
    value was returned by the master but was NULL.
    
    But as Simon mentioned in passing, it will bloat the WAL streams for 
    everyone.  Here's the as yet uncoded mini-proposal that seems to have 
    slid by uncommented upon:
    
    "We can send regular special messages from WALSender to WALReceiver that 
    do not form part of the WAL stream, so we don't bulk
    up WAL archives. (i.e. don't use "w" messages)."
    
    Here's my understanding of how this would work.  Each time a 
    commit/abort record appears in the WAL, that updates XLogCtl with the 
    associated timestamp.  If WALReceiver received regular updates 
    containing the master's clock timestamp and stored them 
    similarly--either via regular streaming or the heartbeat--it could 
    compute lag with the same resolution as this patch aims to do, for the 
    price of two spinlocks:  just subtract the two timestamps.  No overhead 
    on the master, and lag can be directly computed and queried from each 
    standby.  If you want to feed that data back to the master so it can 
    appear in pg_stat_replication in both places, send it periodically via 
    the same channel sync rep and standby feedback use.  I believe that will 
    be cheap in many cases, since it can piggyback on messages that will 
    still be quite small relative to minimum packet size on most networks.  
    (Exception for compressed/encrypted networks where packets aren't 
    discrete in this way doesn't seem that relevant, presuming that if 
    you're doing one of those I would think this overhead is the least of 
    your worries)
    
    Now, there's still one problem here.  This doesn't address the "lots of 
    write volume but no commit" problem any better than the proposed patch 
    does.  Maybe there's some fancy way to inject it along with the received 
    WAL on the standby, I'm out of brain power to think through that right 
    now.  One way to force this to go away is to add a "time update" WAL 
    record type here, one that only appears in some of these unusual 
    situations.  My first idea for keeping that overhead small is to put the 
    injection logic for it at WAL file switch time.  If you haven't 
    committed a transaction during that entire 16MB of writes, start the new 
    WAL segment with a little time update record.  That would ensure you 
    never do too many writes before the WAL replay clock is updated, while 
    avoiding this record type altogether during commit-heavy periods.
    
    The WAL sender and receiver pair should be able to work out how to 
    handle the other corner case, where neither WAL advance nor transactions 
    occured.  You don't want lag to keep increasing forever there.  If the 
    transaction ID hasn't moved forward, the heartbeat can still update the 
    time.  I think if you do that, all you'd need to special case is that if 
    master XID=standby replay XID, lag time should be forced to 0 instead of 
    being its usual value (master timestamp - last standby commit/abort record).
    
    As for "what do you return if asked for lag before the first data with a 
    timestamp appears?" problem, there are surely still cases where that 
    happens in this approach.  I'm less concerned about that if there's only 
    a single function involved though.  The part that worries me is the high 
    probability people are going to do NULL math wrong if they have to 
    combine two values from different servers, and not catch it during 
    development.  If I had a penny for every NULL handling mistake ever made 
    in that category, I'd be writing this from my island lair instead of 
    Baltimore.  I'm more comfortable saying "this lag interval might be 
    NULL"; if that's the only exception people have to worry about, that 
    doesn't stress me as much.
    
    Last disclaimer:  if this argument is persuasive enough to punt 
    pg_last_xact_insert_timestamp for now, in favor of a new specification 
    along the lines I've outlined here, I have no intention of letting 
    myself or Simon end up blocking progress on that.  This one is important 
    enough for 9.2 that if there's not another patch offering the same 
    feature sitting in the queue by the end of the month, I'll ask someone 
    else here to sit on this problem a while.  Probably Jaime, because he's 
    suffering with this problem as much as I am.  We maintain code in repmgr 
    to do this job with brute force:  it saves a history table to translate 
    XID->timestamps and works out lag from there.  Window function query + 
    constantly churning monitoring table=high overhead.  It would really be 
    great to EOL that whole messy section as deprecated starting in 9.2.
    
    -- 
    Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
    
    
    
  35. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-12-12T12:17:10Z

    On Sat, Dec 10, 2011 at 12:29 PM, Greg Smith <greg@2ndquadrant.com> wrote:
    
    > "We can send regular special messages from WALSender to WALReceiver that do
    > not form part of the WAL stream, so we don't bulk
    > up WAL archives. (i.e. don't use "w" messages)."
    >
    > Here's my understanding of how this would work.
    
    Let me explain a little more and provide a very partial patch.
    
    We define a new replication protocol message 'k' which sends a
    keepalive from primary to standby when there is no WAL to send. The
    message does not form part of the WAL stream so does not bloat WAL
    files, nor cause them to fill when unattended.
    
    Keepalives contain current end of WAL and a current timestamp.
    
    Keepalive processing is all done on the standby and there is no
    overhead on a primary which does not use replication. There is a
    slight overhead on primary for keepalives but this happens only when
    there are no writes. On the standby we already update shared state
    when we receive some data, so not much else to do there.
    
    When the standby has applied up to the end of WAL the replication
    delay is receipt time - send time of keepalive.
    
    When standby receives a data packet it records WAL ptr and time. As
    standby applies each chunk it removes the record for each data packet
    and sets the last applied timestamp.
    
    If standby falls behind the number of data packet records will build
    up, so we begin to keep record every 2 packets, then every 4 packets
    etc. So the further the standby falls behind the less accurately we
    record the replication delay - though the accuracy remains
    proportional to the delay.
    
    To complete the patch I need to
    * send the keepalive messages when no WAL outstanding
    * receive the messages
    * store timestamp info for data and keepalives
    * progressively filter the messages if we get too many
    
    I will be working on this patch some more this week.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
  36. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-12-12T13:45:25Z

    On Sat, Dec 10, 2011 at 7:29 AM, Greg Smith <greg@2ndquadrant.com> wrote:
    > -It adds overhead at every commit, even for people who aren't using it.
    >  Probably not enough to matter, but it's yet another thing going through the
    > often maligned as too heavy pgstat system, often.
    
    The bit about the pgstat system being too heavy is a red herring; this
    could equally well be stored in the PGPROC.  So, the overhead is
    basically one additional store to shared memory per commit, and we can
    arrange for that store to involve a cache line that has to be touched
    anyway (by ProcArrayEndTransaction).  I suppose you can make the
    argument that that still isn't free, but there aren't very many places
    in the code where we worry about effects this small.  Operations like
    AcceptInvalidationMessages() or LockRelationOid() happen more often
    than transaction commit, and yet we were happy for years with a system
    where AcceptInvalidationMessages() did three spinlock
    acquire-and-release cycles that were unnecessary in 99% of cases.
    That cost was vastly greater than what we're talking about here, and
    it affected an operation that is more frequent than this one.
    
    > [ usability complaints ]
    
    Fair enough.
    
    > I'm normally a fan of building the simplest thing that will do something
    > useful, and this patch succeeds there.  But the best data to collect needs
    > to have a timestamp that keeps moving forward in a way that correlates
    > reasonably well to the system WAL load.  Using the transaction ID doesn't do
    > that.  Simon did some hand waving around sending a timestamp every
    > checkpoint.  That would allow the standby to compute its own lag, limit
    > overhead to something much lighter than per-transaction, and better track
    > write volume.  There could still be a bigger than normal discontinuity after
    > server restart, if the server was down a while, but at least there wouldn't
    > ever be a point where the value was returned by the master but was NULL.
    >
    > But as Simon mentioned in passing, it will bloat the WAL streams for
    > everyone.
    
    But, as with this proposal, the overhead seems largely irrelevant; the
    question is whether it actually solves the problem.  Unless I am
    misunderstanding, we are talking about 4 bytes of WAL per checkpoint
    cycle, which strikes me as even less worth worrying about than one
    store to shared memory per transaction commit.  So, personally, I see
    no reason to fret about the overhead.  But I'm skeptical that anything
    that we only update once per checkpoint cycle will help much in
    calculating an accurate lag value.  It also strikes me that anything
    that is based on augmenting the walsender/walreceiver protocol leaves
    anyone who is using WAL shipping out in the cold.  I'm not clear from
    the comments you or Simon have made how important you think that use
    case still is.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  37. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-12-12T14:24:41Z

    On Mon, Dec 12, 2011 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    
    > It also strikes me that anything
    > that is based on augmenting the walsender/walreceiver protocol leaves
    > anyone who is using WAL shipping out in the cold.  I'm not clear from
    > the comments you or Simon have made how important you think that use
    > case still is.
    
    archive_timeout > 0 works just fine at generating files even when
    quiet, or if it does not, it is a bug.
    
    So I don't understand your comments, please explain.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  38. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-12-12T14:47:34Z

    On Mon, Dec 12, 2011 at 9:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Mon, Dec 12, 2011 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >> It also strikes me that anything
    >> that is based on augmenting the walsender/walreceiver protocol leaves
    >> anyone who is using WAL shipping out in the cold.  I'm not clear from
    >> the comments you or Simon have made how important you think that use
    >> case still is.
    >
    > archive_timeout > 0 works just fine at generating files even when
    > quiet, or if it does not, it is a bug.
    >
    > So I don't understand your comments, please explain.
    
    If the standby has restore_command set but not primary_conninfo, then
    it will never make a direct connection to the master.  So anything
    that's based on extending that protocol won't get used in that case.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  39. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-12-12T14:51:10Z

    On Mon, Dec 12, 2011 at 2:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Mon, Dec 12, 2011 at 9:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >> On Mon, Dec 12, 2011 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >>> It also strikes me that anything
    >>> that is based on augmenting the walsender/walreceiver protocol leaves
    >>> anyone who is using WAL shipping out in the cold.  I'm not clear from
    >>> the comments you or Simon have made how important you think that use
    >>> case still is.
    >>
    >> archive_timeout > 0 works just fine at generating files even when
    >> quiet, or if it does not, it is a bug.
    >>
    >> So I don't understand your comments, please explain.
    >
    > If the standby has restore_command set but not primary_conninfo, then
    > it will never make a direct connection to the master.  So anything
    > that's based on extending that protocol won't get used in that case.
    
    Got that, but now explain the reason for saying such people are "out
    in the cold".
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  40. Re: [REVIEW] pg_last_xact_insert_timestamp

    Robert Haas <robertmhaas@gmail.com> — 2011-12-12T14:53:26Z

    On Mon, Dec 12, 2011 at 9:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Mon, Dec 12, 2011 at 2:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >> On Mon, Dec 12, 2011 at 9:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >>> On Mon, Dec 12, 2011 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >>>> It also strikes me that anything
    >>>> that is based on augmenting the walsender/walreceiver protocol leaves
    >>>> anyone who is using WAL shipping out in the cold.  I'm not clear from
    >>>> the comments you or Simon have made how important you think that use
    >>>> case still is.
    >>>
    >>> archive_timeout > 0 works just fine at generating files even when
    >>> quiet, or if it does not, it is a bug.
    >>>
    >>> So I don't understand your comments, please explain.
    >>
    >> If the standby has restore_command set but not primary_conninfo, then
    >> it will never make a direct connection to the master.  So anything
    >> that's based on extending that protocol won't get used in that case.
    >
    > Got that, but now explain the reason for saying such people are "out
    > in the cold".
    
    By that I only meant that if we add a lag-monitoring feature that
    relies on the streaming replication protocol, then people who are not
    using the streaming replication replication protocol will be unable to
    monitor lag (or at least, the will be unable to do it any better than
    they can do it today).
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  41. Re: [REVIEW] pg_last_xact_insert_timestamp

    Greg Smith <greg@2ndquadrant.com> — 2011-12-12T22:08:51Z

    On 12/12/2011 08:45 AM, Robert Haas wrote:
    > But I'm skeptical that anything that we only update once per 
    > checkpoint cycle will help much in
    > calculating an accurate lag value.
    
    I'm sure there is no upper bound on how much WAL lag you can build up 
    between commit/abort records either; they can be far less frequent than 
    checkpoints.  All it takes is a multi-hour COPY with no other commits to 
    completely hose lag measured by that advance, and that is not an unusual 
    situation at all.  Overnight daily ETL or reporting MV-ish roll-ups, 
    scheduled specifically for when no one is normally at the office, are 
    the first thing that spring to mind.
    
    Anyway, I wasn't suggesting checkpoints as anything other than a worst 
    case behavior.  We can always thump out more frequent updates to reduce 
    lag, and in what I expect to the most common case the WAL send/receive 
    stuff will usually do much better.  I see the XID vs. WAL position UI 
    issues as being fundamentally unsolvable, which really bothers me.  I'd 
    have preferred to run screaming away from this thread if it hadn't.
    
    > It also strikes me that anything that is based on augmenting the walsender/walreceiver protocol leaves
    > anyone who is using WAL shipping out in the cold.  I'm not clear from
    > the comments you or Simon have made how important you think that use
    > case still is.
    >    
    
    There's a number of reasons why we might want more timestamps streamed 
    into the WAL; this might be one.  We'd just need one to pop out one as 
    part of the archive_timeout switch to in theory make it possible for 
    these people to be happy.  I think Simon was hoping to avoid WAL 
    timestamps, I wouldn't bet too much on that myself.  The obvious 
    implementation problem here is that the logical place to put the 
    timestamps is right at the end of the WAL file, just before it's closed 
    for archiving.  But that position isn't known until you've at least 
    started processing it, which you clearly are not doing fast enough if 
    lag exists.
    
    As far as who's still important here, two observations.  Note that the 
    pg_last_xact_insert_timestamp approach can fail to satisfy WAL shipping 
    people who are going to a separate network, where it's impractical to 
    connect to both servers with libpq.  I have some customers who like 
    putting a one-way WAL wall (sorry) between production and the standby 
    server, with the log shipping being the only route between them; that's 
    one reason why they might still be doing this instead of using 
    streaming.  There's really no good way to make these people happy and 
    provide time lag monitoring inside the database.
    
    I was actually the last person I recall who suggested some extra 
    monitoring mainly aimed at WAL shipping environments:  
    http://archives.postgresql.org/pgsql-hackers/2010-01/msg01522.php  Had 
    some pg_standby changes I was also working on back then, almost two 
    years ago.  I never circled back to any of it due to having zero demand 
    since 9.0 shipped, the requests I had been regularly getting about this 
    all dried up.  While I'm all for keeping new features working for 
    everyone when it doesn't hold progress back, it's not unreasonable to 
    recognize we can't support every monitoring option through all of the 
    weird ways WAL files can move around.  pg_stat_replication isn't very 
    helpful for 9.0+ WAL shippers either, yet they still go on doing their 
    thing.
    
    In the other direction, people who will immediately adopt the latest 
    hotness, cascading is a whole new layer of use case concerns on top of 
    the ones considered so far.  Now you're talking two layers of 
    connections users have to navigate though to compute master->cascaded 
    standby lag.  Cascade the WALSender timestamps instead, which seems 
    pretty simple to do, and then people can just ask their local standby.
    
    -- 
    Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
    
    
    
  42. Re: [REVIEW] pg_last_xact_insert_timestamp

    Simon Riggs <simon@2ndquadrant.com> — 2011-12-14T16:06:08Z

    On Mon, Dec 12, 2011 at 12:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Sat, Dec 10, 2011 at 12:29 PM, Greg Smith <greg@2ndquadrant.com> wrote:
    >
    >> "We can send regular special messages from WALSender to WALReceiver that do
    >> not form part of the WAL stream, so we don't bulk
    >> up WAL archives. (i.e. don't use "w" messages)."
    >>
    >> Here's my understanding of how this would work.
    >
    > Let me explain a little more and provide a very partial patch.
    >
    > We define a new replication protocol message 'k' which sends a
    > keepalive from primary to standby when there is no WAL to send. The
    > message does not form part of the WAL stream so does not bloat WAL
    > files, nor cause them to fill when unattended.
    >
    > Keepalives contain current end of WAL and a current timestamp.
    >
    > Keepalive processing is all done on the standby and there is no
    > overhead on a primary which does not use replication. There is a
    > slight overhead on primary for keepalives but this happens only when
    > there are no writes. On the standby we already update shared state
    > when we receive some data, so not much else to do there.
    >
    > When the standby has applied up to the end of WAL the replication
    > delay is receipt time - send time of keepalive.
    
    Patch introduces regular keepalives from WALsender to WALreceiver,
    using a new protocol message 'k'.
    These are sent at intervals of a fraction of replication_delay or 10s
    if not set,
    whenever no WAL records have been sent recently.
    
    Patch exposes info used for standby_delay calculation as used in 9.0+
    In addition introduces direct calculations of replication apply delay
    and replication transfer latency, both in ms.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
  43. Re: [REVIEW] pg_last_xact_insert_timestamp

    Andres Freund <andres@2ndquadrant.com> — 2014-08-11T07:46:23Z

    Hi,
    
    On 2011-10-04 20:52:59 +0900, Fujii Masao wrote:
    > *** a/src/backend/access/transam/xact.c
    > --- b/src/backend/access/transam/xact.c
    > ***************
    > *** 1066,1071 **** RecordTransactionCommit(void)
    > --- 1066,1074 ----
    >   
    >   			(void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata);
    >   		}
    > + 
    > + 		/* Save timestamp of latest transaction commit record */
    > + 		pgstat_report_xact_end_timestamp(xactStopTimestamp);
    >   	}
    >
    
    Perhaps that pgstat_report() should instead be combined with the
    pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number
    of changecount increases and cacheline references would stay the
    same. The only thing that'd change would be a single additional
    assignment.
    
    Greetings,
    
    Andres Freund
    
    -- 
     Andres Freund	                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
    
  44. Re: [REVIEW] pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2014-08-11T11:27:28Z

    On Mon, Aug 11, 2014 at 4:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
    > Hi,
    >
    > On 2011-10-04 20:52:59 +0900, Fujii Masao wrote:
    >> *** a/src/backend/access/transam/xact.c
    >> --- b/src/backend/access/transam/xact.c
    >> ***************
    >> *** 1066,1071 **** RecordTransactionCommit(void)
    >> --- 1066,1074 ----
    >>
    >>                       (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata);
    >>               }
    >> +
    >> +             /* Save timestamp of latest transaction commit record */
    >> +             pgstat_report_xact_end_timestamp(xactStopTimestamp);
    >>       }
    >>
    >
    > Perhaps that pgstat_report() should instead be combined with the
    > pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number
    > of changecount increases and cacheline references would stay the
    > same. The only thing that'd change would be a single additional
    > assignment.
    
    Sounds good suggestion.
    
    While reading the patch again, I found it didn't handle the COMMIT/ABORT
    PREPARED case properly. According to the commit e74e090, now
    pg_last_xact_replay_timestamp() returns the timestamp of COMMIT/ABORT PREPARED.
    pg_last_xact_insert_timestamp() is mainly expected to be used to calculate
    the replication delay, so it also needs to return that timestam. But the patch
    didn't change 2PC code at all. We need to add pgstat_report_xact_end_timestamp()
    into FinishPreparedTransaction(), RecordTransactionCommitPrepared() or
    RecordTransactionAbortPrepared().
    
    Regards,
    
    -- 
    Fujii Masao
    
    
    
  45. Re: [REVIEW] pg_last_xact_insert_timestamp

    Fujii Masao <masao.fujii@gmail.com> — 2014-08-13T03:41:24Z

    On Mon, Aug 11, 2014 at 8:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Mon, Aug 11, 2014 at 4:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
    >> Hi,
    >>
    >> On 2011-10-04 20:52:59 +0900, Fujii Masao wrote:
    >>> *** a/src/backend/access/transam/xact.c
    >>> --- b/src/backend/access/transam/xact.c
    >>> ***************
    >>> *** 1066,1071 **** RecordTransactionCommit(void)
    >>> --- 1066,1074 ----
    >>>
    >>>                       (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata);
    >>>               }
    >>> +
    >>> +             /* Save timestamp of latest transaction commit record */
    >>> +             pgstat_report_xact_end_timestamp(xactStopTimestamp);
    >>>       }
    >>>
    >>
    >> Perhaps that pgstat_report() should instead be combined with the
    >> pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number
    >> of changecount increases and cacheline references would stay the
    >> same. The only thing that'd change would be a single additional
    >> assignment.
    >
    > Sounds good suggestion.
    
    I attached the updated version of the patch. I changed pgstat_report_xx
    functions like Andres suggested.
    
    > While reading the patch again, I found it didn't handle the COMMIT/ABORT
    > PREPARED case properly. According to the commit e74e090, now
    > pg_last_xact_replay_timestamp() returns the timestamp of COMMIT/ABORT PREPARED.
    > pg_last_xact_insert_timestamp() is mainly expected to be used to calculate
    > the replication delay, so it also needs to return that timestam. But the patch
    > didn't change 2PC code at all. We need to add pgstat_report_xact_end_timestamp()
    > into FinishPreparedTransaction(), RecordTransactionCommitPrepared() or
    > RecordTransactionAbortPrepared().
    
    Done.
    
    Regards,
    
    -- 
    Fujii Masao
    
  46. Re: [REVIEW] pg_last_xact_insert_timestamp

    Bruce Momjian <bruce@momjian.us> — 2014-10-12T00:32:09Z

    On Wed, Aug 13, 2014 at 12:41:24PM +0900, Fujii Masao wrote:
    > On Mon, Aug 11, 2014 at 8:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > > On Mon, Aug 11, 2014 at 4:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
    > >> Hi,
    > >>
    > >> On 2011-10-04 20:52:59 +0900, Fujii Masao wrote:
    > >>> *** a/src/backend/access/transam/xact.c
    > >>> --- b/src/backend/access/transam/xact.c
    > >>> ***************
    > >>> *** 1066,1071 **** RecordTransactionCommit(void)
    > >>> --- 1066,1074 ----
    > >>>
    > >>>                       (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata);
    > >>>               }
    > >>> +
    > >>> +             /* Save timestamp of latest transaction commit record */
    > >>> +             pgstat_report_xact_end_timestamp(xactStopTimestamp);
    > >>>       }
    > >>>
    > >>
    > >> Perhaps that pgstat_report() should instead be combined with the
    > >> pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number
    > >> of changecount increases and cacheline references would stay the
    > >> same. The only thing that'd change would be a single additional
    > >> assignment.
    > >
    > > Sounds good suggestion.
    > 
    > I attached the updated version of the patch. I changed pgstat_report_xx
    > functions like Andres suggested.
    > 
    > > While reading the patch again, I found it didn't handle the COMMIT/ABORT
    > > PREPARED case properly. According to the commit e74e090, now
    > > pg_last_xact_replay_timestamp() returns the timestamp of COMMIT/ABORT PREPARED.
    > > pg_last_xact_insert_timestamp() is mainly expected to be used to calculate
    > > the replication delay, so it also needs to return that timestam. But the patch
    > > didn't change 2PC code at all. We need to add pgstat_report_xact_end_timestamp()
    > > into FinishPreparedTransaction(), RecordTransactionCommitPrepared() or
    > > RecordTransactionAbortPrepared().
    > 
    > Done.
    
    Is this going to be applied?
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
      + Everyone has their own god. +