Thread

  1. Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-05T06:45:47Z

    > I will provide a patch which can execute pg_start/stop_backup
    > and which addresses the above comments and conditions in the next stage.
    > Then please review.
    
    done.
    
    
    * Procedure
    
    1. Call pg_start_backup('x') on the standby.
    2. Take a backup of the data dir.
    3. Call pg_stop_backup() on the standby.
    4. Copy the control file on the standby to the backup.
    5. Use pg_controldata to check that the copied control file still shows
       the hot-standby state.
       -> If the standby is promoted between steps 3 and 4, the backup cannot
          be used for recovery.
          -> In pg_control, "Minimum recovery ending location" equals 0/0.
          -> No backup-end record is written.
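
The check in step 5 could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, and the two sample texts are hand-written excerpts in the style of pg_controldata output, not captured from a real server.

```python
import re


def min_recovery_end(controldata_text):
    """Return the "Minimum recovery ending location" value, or None if absent."""
    match = re.search(
        r"^Minimum recovery ending location:\s*(\S+)\s*$",
        controldata_text,
        flags=re.MULTILINE,
    )
    return match.group(1) if match else None


def backup_control_file_usable(controldata_text):
    """True when the copied control file carries a non-zero minimum recovery
    point, i.e. it does not look like it was copied after a promotion."""
    location = min_recovery_end(controldata_text)
    if location is None:
        return False
    hi, lo = location.split("/")
    return (int(hi, 16), int(lo, 16)) != (0, 0)


# Hand-written sample excerpts (assumption: real output has many more lines).
sample_standby = """Database cluster state:               in archive recovery
Minimum recovery ending location:     0/3000020
"""
sample_promoted = """Database cluster state:               in production
Minimum recovery ending location:     0/0
"""

print(backup_control_file_usable(sample_standby))
print(backup_control_file_usable(sample_promoted))
```

In practice the text would come from running pg_controldata against the backup directory.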
    
    * Not yet addressed
    
      * full_page_writes = off
        -> If the primary runs with "full_page_writes = off", archive recovery
           may not work correctly. The standby may therefore need to detect
           "full_page_writes = off" from the WAL.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  2. Re: Online base backup from the hot-standby

    Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-08-05T08:02:15Z

    2011/8/5 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >> I will provide a patch which can exeute pg_start/stop_backup
    >> including to solve above comment and conditions in next stage.
    >> Then please review.
    >
    > done.
    
    great !
    
    >
    >
    > * Procedure
    >
    > 1. Call pg_start_backup('x') on the standby.
    > 2. Take a backup of the data dir.
    > 3. Call pg_stop_backup() on the standby.
    > 4. Copy the control file on the standby to the backup.
    > 5. Check whether the control file is status during hot standby with pg_controldata.
    >   -> If the standby promote between 3. and 4., the backup can not recovery.
    >      -> pg_control is that "Minimum recovery ending location" is equals 0/0.
    >      -> backup-end record is not written.
    >
    > * Not correspond yet
    >
    >  * full_page_write = off
    >    -> If the primary is "full_page_write = off", archive recovery may not act
    >       normally. Therefore the standby may need to check whether "full_page_write
    >       = off" to WAL.
    
    Doesn't having a standby force full_page_writes = on in all cases
    (bypassing the configuration)?
    
    
    
    
    -- 
    Cédric Villemain +33 (0)6 20 30 22 52
    http://2ndQuadrant.fr/
    PostgreSQL: Support 24x7 - Développement, Expertise et Formation
    
    
  3. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-15T08:46:53Z

    > > * Not correspond yet
    > >
    > >  * full_page_write = off
    > >    -> If the primary is "full_page_write = off", archive recovery may not act
    > >       normally. Therefore the standby may need to check whether "full_page_write
    > >       = off" to WAL.
    > 
    > Isn't having a standby make the full_page_write = on in all case
    > (bypass configuration) ?
    
    What do you mean?
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  4. Re: Online base backup from the hot-standby

    Robert Haas <robertmhaas@gmail.com> — 2011-08-15T11:52:21Z

    2011/8/15 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >> > * Not correspond yet
    >> >
    >> >  * full_page_write = off
    >> >    -> If the primary is "full_page_write = off", archive recovery may not act
    >> >       normally. Therefore the standby may need to check whether "full_page_write
    >> >       = off" to WAL.
    >>
    >> Isn't having a standby make the full_page_write = on in all case
    >> (bypass configuration) ?
    >
    > what's the meaning?
    
    Yeah.  full_page_writes is a WAL generation parameter.  Standbys don't
    generate WAL.  I think you just have to insist that the master has it
    on.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  5. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-16T06:09:16Z

    > >> > * Not correspond yet
    > >> >
    > >> >  * full_page_write = off
    > >> >    -> If the primary is "full_page_write = off", archive recovery may not act
    > >> >       normally. Therefore the standby may need to check whether "full_page_write
    > >> >       = off" to WAL.
    > >>
    > >> Isn't having a standby make the full_page_write = on in all case
    > >> (bypass configuration) ?
    > >
    > > what's the meaning?
    
    Thanks. 
    
    This raises the following two problems.
     * pg_start_backup() would have to turn full_page_writes on at the master,
       which actually writes the WAL, not at the standby.
     * The standby is not necessarily connected to the master that actually
       writes the WAL.
       (Ex. Standby2 in cascading replication: Master - Standby1 - Standby2)
    
    I am not sure how to resolve these problems.
    
    Regards.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  6. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-16T06:12:59Z

    > >> > * Not correspond yet
    > >> >
    > >> >  * full_page_write = off
    > >> >    -> If the primary is "full_page_write = off", archive recovery may not act
    > >> >       normally. Therefore the standby may need to check whether "full_page_write
    > >> >       = off" to WAL.
    > >>
    > >> Isn't having a standby make the full_page_write = on in all case
    > >> (bypass configuration) ?
    > >
    > > what's the meaning?
    > 
    > Yeah.  full_page_writes is a WAL generation parameter.  Standbys don't
    > generate WAL.  I think you just have to insist that the master has it
    > on.
    
    Thanks. 
    
    This raises the following two problems.
     * pg_start_backup() would have to turn full_page_writes on at the master,
       which actually writes the WAL, not at the standby.
     * The standby is not necessarily connected to the master that actually
       writes the WAL.
       (Ex. Standby2 in cascading replication: Master - Standby1 - Standby2)
    
    I am not sure how to resolve these problems.
    
    Regards.
    
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  7. Re: Online base backup from the hot-standby

    Steve Singer <ssinger_pg@sympatico.ca> — 2011-08-16T15:24:24Z

    On 11-08-16 02:09 AM, Jun Ishiduka wrote:
    >
    > Thanks. 
    >
    > This has the following two problems.
    >  * pg_start_backup() must set 'on' to full_page_writes of the master that 
    >    is actual writing of the WAL, but not the standby.
    Is there any way to tell from the WAL segments if they contain the full
    page data? If so could you verify this on the second slave when it is
    brought up? Or can you track this on the first slave and produce an
    error in either pg_start_backup() or pg_stop_backup()?
    
    I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
    flag is used to indicate that the archiver can compress the full page
    blocks to non-full page blocks. I am not familiar with where in the code
    this actually happens but will this cause issues if the first standby is
    processing WAL files from the archive?
    
    
    >  * The standby doesn't need to connect to the master that's actual writing 
    >    WAL.
    >    (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)
    >
    > I'm worried how I should clear these problems.
    
    
    
  8. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-17T08:59:37Z

    > Is there any way to tell from the WAL segments if they contain the full
    > page data? If so could you verify this on the second slave when it is
    > brought up? Or can you track this on the first slave and produce an
    > error in either pg_start_backup or pg_stop_backup()
    
    Sure.
    I will write a patch that can tell from the WAL segments whether they
    contain full-page data.
    
    
    > I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
    > flag is used to indicate that the archiver can compress the full page
    > blocks to non-full page blocks. I am not familiar with where in the code
    > this actually happens but will this cause issues if the first standby is
    > processing WAL files from the archive?
    
    I checked the flag in xlog.c; it appears to be set only in
    XLogInsert(). I will consider whether it can be used.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  9. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-08-17T10:19:03Z

    2011/8/17 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >> I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
    >> flag is used to indicate that the archiver can compress the full page
    >> blocks to non-full page blocks. I am not familiar with where in the code
    >> this actually happens but will this cause issues if the first standby is
    >> processing WAL files from the archive?
    >
    > I confirmed the flag in xlog.c, so I seemed to only insert it in
    > XLogInsert(). I consider whether it is available.
    
    That flag cannot be used to check whether full-page writes were
    skipped, because it appears only on records that carry full-page data,
    not on records without it.
    
    The straightforward approach to address the problem you raised is to log
    the change of full_page_writes on the master. Since such a WAL record is also
    replicated to the standby, the standby can know whether full_page_writes is
    enabled or not in the master, from the WAL record. If it's disabled,
    pg_start_backup() in the standby should emit an error and refuse standby-only
    backup. If the WAL record indicating that full_page_writes was disabled
    on the master arrives during standby-only backup, the standby should cancel
    the backup.
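
The behaviour proposed above can be modelled with a small state machine. This is a toy sketch, not PostgreSQL code: the class, its methods and the "full_page_writes changed" record are all hypothetical stand-ins for whatever the patch would actually add.

```python
class StandbyBackupState:
    """Toy model of a standby that learns the master's full_page_writes
    setting from replayed WAL records (hypothetical record type)."""

    def __init__(self, master_fpw=True):
        self.master_fpw = master_fpw
        self.backup_in_progress = False
        self.backup_cancelled = False

    def replay_fpw_change(self, enabled):
        # Called when a (hypothetical) "full_page_writes changed" record
        # from the master is replayed on the standby.
        self.master_fpw = enabled
        if not enabled and self.backup_in_progress:
            # The change arrived during a standby-only backup: cancel it.
            self.backup_in_progress = False
            self.backup_cancelled = True

    def start_backup(self):
        # Refuse a standby-only backup while the master has fpw disabled.
        if not self.master_fpw:
            raise RuntimeError("full_page_writes is off on the master")
        self.backup_in_progress = True
        self.backup_cancelled = False

    def stop_backup(self):
        if self.backup_cancelled:
            raise RuntimeError("backup cancelled: full_page_writes was disabled")
        if not self.backup_in_progress:
            raise RuntimeError("no backup in progress")
        self.backup_in_progress = False


state = StandbyBackupState(master_fpw=True)
state.start_backup()
state.replay_fpw_change(False)  # master turns fpw off mid-backup
state.replay_fpw_change(True)   # ...and back on again
try:
    state.stop_backup()
except RuntimeError as exc:
    print("stop_backup failed:", exc)
```

Because the cancellation is remembered, a setting that is switched off and back on during the backup still makes the final step fail.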
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  10. Re: Online base backup from the hot-standby

    Robert Haas <robertmhaas@gmail.com> — 2011-08-17T12:40:15Z

    On Wed, Aug 17, 2011 at 6:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > 2011/8/17 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >>> I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
    >>> flag is used to indicate that the archiver can compress the full page
    >>> blocks to non-full page blocks. I am not familiar with where in the code
    >>> this actually happens but will this cause issues if the first standby is
    >>> processing WAL files from the archive?
    >>
    >> I confirmed the flag in xlog.c, so I seemed to only insert it in
    >> XLogInsert(). I consider whether it is available.
    >
    > That flag is not available to check whether full-page writing was
    > skipped or not.
    > Because it's in full-page data, not non-full-page one.
    >
    > The straightforward approach to address the problem you raised is to log
    > the change of full_page_writes on the master. Since such a WAL record is also
    > replicated to the standby, the standby can know whether full_page_writes is
    > enabled or not in the master, from the WAL record. If it's disabled,
    > pg_start_backup() in the standby should emit an error and refuse standby-only
    > backup. If the WAL record indicating that full_page_writes was disabled
    > on the master arrives during standby-only backup, the standby should cancel
    > the backup.
    
    Seems like something we could add to XLOG_PARAMETER_CHANGE fairly easily.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  11. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-08-17T13:53:08Z

    On Wed, Aug 17, 2011 at 9:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Wed, Aug 17, 2011 at 6:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> The straightforward approach to address the problem you raised is to log
    >> the change of full_page_writes on the master. Since such a WAL record is also
    >> replicated to the standby, the standby can know whether full_page_writes is
    >> enabled or not in the master, from the WAL record. If it's disabled,
    >> pg_start_backup() in the standby should emit an error and refuse standby-only
    >> backup. If the WAL record indicating that full_page_writes was disabled
    >> on the master arrives during standby-only backup, the standby should cancel
    >> the backup.
    >
    > Seems like something we could add to XLOG_PARAMETER_CHANGE fairly easily.
    
    I'm afraid it's not so easy. Because fpw can be changed by SIGHUP, it's
    hard to ensure that the change of fpw is logged before the behavior
    actually changes. Probably we need to make the backend that first
    detects the change of fpw log it.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  12. Re: Online base backup from the hot-standby

    Robert Haas <robertmhaas@gmail.com> — 2011-08-17T15:09:43Z

    On Wed, Aug 17, 2011 at 9:53 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Wed, Aug 17, 2011 at 9:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
    >> On Wed, Aug 17, 2011 at 6:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >>> The straightforward approach to address the problem you raised is to log
    >>> the change of full_page_writes on the master. Since such a WAL record is also
    >>> replicated to the standby, the standby can know whether full_page_writes is
    >>> enabled or not in the master, from the WAL record. If it's disabled,
    >>> pg_start_backup() in the standby should emit an error and refuse standby-only
    >>> backup. If the WAL record indicating that full_page_writes was disabled
    >>> on the master arrives during standby-only backup, the standby should cancel
    >>> the backup.
    >>
    >> Seems like something we could add to XLOG_PARAMETER_CHANGE fairly easily.
    >
    > I'm afraid it's not so easy. Because since fpw can be changed by
    > SIGHUP, it's not
    > easy to ensure that logging the change of fpw must happen ahead of the actual
    > behavior change by that. Probably we need to make the backend which detects
    > the change of fpw first log that.
    
    Ugh, you're right.  But then you might have problems if the state
    changes again before all backends have picked up the previous change.
    What I've thought about before is making one backend (say, bgwriter)
    store its latest value in shared memory, protected by some lock that
    would already be held at the time the value is needed.  Everyone else
    uses the shared memory copy instead of relying on their local value.
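
A rough illustration of that idea, using a Python lock and a list in place of shared memory and WAL. All names here are hypothetical; the point is only that one writer publishes the value under the same lock that record insertion already takes, so the logged change always precedes the first record written under the new setting.

```python
import threading


class SharedFullPageWrites:
    """Toy model: one shared copy of fpw, guarded by the insertion lock."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._value = initial
        self.log = []  # stands in for the WAL stream

    def publish(self, new_value):
        # Called only by the single designated process (bgwriter in the
        # proposal). The change is logged and applied atomically.
        with self._lock:
            if new_value != self._value:
                self.log.append(("fpw_change", new_value))
                self._value = new_value

    def insert_record(self, payload):
        # Every other process reads the shared copy while holding the lock
        # it needs anyway, instead of trusting its local setting.
        with self._lock:
            fpw = self._value
            self.log.append(("record", payload, fpw))
            return fpw


shared = SharedFullPageWrites(True)
shared.insert_record("a")
shared.publish(False)
shared.insert_record("b")
print(shared.log)
```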
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  13. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-08-18T01:43:21Z

    On Thu, Aug 18, 2011 at 12:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
    > Ugh, you're right.  But then you might have problems if the state
    > changes again before all backends have picked up the previous change.
    
    Right.
    
    > What I've thought about before is making one backend (say, bgwriter)
    > store its latest value in shared memory, protected by some lock that
    > would already be held at the time the value is needed.  Everyone else
    > uses the shared memory copy instead of relying on their local value.
    
    Sounds reasonable.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  14. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-08-18T02:12:55Z

    2011/8/5 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    > * Procedure
    >
    > 1. Call pg_start_backup('x') on the standby.
    > 2. Take a backup of the data dir.
    > 3. Call pg_stop_backup() on the standby.
    > 4. Copy the control file on the standby to the backup.
    > 5. Check whether the control file is status during hot standby with pg_controldata.
    >   -> If the standby promote between 3. and 4., the backup can not recovery.
    >      -> pg_control is that "Minimum recovery ending location" is equals 0/0.
    >      -> backup-end record is not written.
    
    What if we do #4 before #3? The backup gets corrupted? My guess is
    that the backup is still valid even if we copy pg_control before executing
    pg_stop_backup(). Which would not require #5 because if the standby
    promotion happens before pg_stop_backup(), pg_stop_backup() can
    detect that status change and cancel the backup.
    
    #5 looks fragile. If we can get rid of it, the procedure becomes more
    robust, I think.
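
The reordered procedure can be sketched as below. ToyStandby is a hypothetical in-memory stand-in for the server, and the two copy callbacks are placeholders for whatever tool performs the file copies. Since pg_stop_backup() runs last, a promotion at any earlier point makes the whole run fail and no separate check of the control file is needed.

```python
class ToyStandby:
    """In-memory stand-in for a standby server (hypothetical)."""

    def __init__(self):
        self.in_recovery = True
        self.backup_active = False

    def pg_start_backup(self):
        if not self.in_recovery:
            raise RuntimeError("server is not a standby")
        self.backup_active = True

    def promote(self):
        self.in_recovery = False

    def pg_stop_backup(self):
        if not self.backup_active:
            raise RuntimeError("no backup in progress")
        self.backup_active = False
        if not self.in_recovery:
            raise RuntimeError("standby was promoted during the backup")


def take_backup(server, copy_data_dir, copy_control_file):
    server.pg_start_backup()
    copy_data_dir()
    copy_control_file()      # old step 4, now done before pg_stop_backup()
    server.pg_stop_backup()  # raises if the standby was promoted meanwhile
    return True


steps = []
standby = ToyStandby()
print(take_backup(standby, lambda: steps.append("data"),
                  lambda: steps.append("control")))
```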
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  15. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-18T05:47:25Z

    > > * Procedure
    > >
    > > 1. Call pg_start_backup('x') on the standby.
    > > 2. Take a backup of the data dir.
    > > 3. Call pg_stop_backup() on the standby.
    > > 4. Copy the control file on the standby to the backup.
    > > 5. Check whether the control file is status during hot standby with pg_controldata.
    > >   -> If the standby promote between 3. and 4., the backup can not recovery.
    > >      -> pg_control is that "Minimum recovery ending location" is equals 0/0.
    > >      -> backup-end record is not written.
    > 
    > What if we do #4 before #3? The backup gets corrupted? My guess is
    > that the backup is still valid even if we copy pg_control before executing
    > pg_stop_backup(). Which would not require #5 because if the standby
    > promotion happens before pg_stop_backup(), pg_stop_backup() can
    > detect that status change and cancel the backup.
    > 
    > #5 looks fragile. If we can get rid of it, the procedure becomes more
    > robust, I think.
    
    Sure, you're right.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  16. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-09-12T06:46:37Z

    Hi, I have created a patch in response to the comments.
    
    
    * Procedure
    1. Call pg_start_backup('x') on hot standby.
    2. Take a backup of the data dir.
    3. Copy the control file on hot standby to the backup.
    4. Call pg_stop_backup() on hot standby.
    
    
    * Behavior
    (take backup)
     When pg_start_backup() is executed on the hot standby, it performs a
     restartpoint, writes the string "FROM: slave" to backup_label and enters
     backup mode, but it does not force full_page_writes to "on".
    
     When pg_stop_backup() is executed on the hot standby, it renames
     backup_label and leaves backup mode, but it neither writes a backup-end
     record or history file nor waits for WAL archiving to complete.
     pg_stop_backup() returns the current MinRecoveryPoint as its result.
    
     When pg_stop_backup() is executed on a server that has been promoted, it
     reads backup_label and reports an error.
    
    (recovery)
     When recovering from a backup taken on the hot standby, MinRecoveryPoint
     in the control file copied in step 3 of the procedure above is used
     instead of the backup-end record.
    
     When recovery starts for the first time, BackupEndPoint in the control
     file is set to the same value as MinRecoveryPoint, so that this value of
     MinRecoveryPoint is remembered during recovery.
    
     The HINT message ("If this has ...") is always output when recovering
     from a backup taken on the hot standby.
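
A minimal sketch of the recovery-side logic described above. The dictionary keys and function names are hypothetical, not taken from the patch. LSNs are written as two hexadecimal numbers separated by a slash, so they have to be compared numerically rather than as strings.

```python
def parse_lsn(text):
    """Convert an LSN such as "0/3000020" into a single integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def init_backup_end_point(control):
    """On the first start from a standby backup, remember MinRecoveryPoint
    as the backup end point (hypothetical dict standing in for pg_control)."""
    if control.get("backup_end_point") is None:
        control["backup_end_point"] = control["min_recovery_point"]
    return control["backup_end_point"]


def reached_backup_end(replayed_lsn, control):
    """True once replay has reached the remembered backup end point."""
    end = init_backup_end_point(control)
    return parse_lsn(replayed_lsn) >= parse_lsn(end)


control = {"min_recovery_point": "0/3000020", "backup_end_point": None}
print(reached_backup_end("0/2FFFFFF", control))
print(reached_backup_end("0/3000020", control))
```

Keeping a separate copy means later changes to the minimum recovery point in this model do not move the backup end point.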
    
    
    * Problem
     full_page_writes's problem.
      > This has the following two problems.
      >  * pg_start_backup() must set 'on' to full_page_writes of the master that 
      >    is actual writing of the WAL, but not the standby.
      >  * The standby doesn't need to connect to the master that's actual writing 
      >    WAL.
      >    (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)
      > 
      > I'm worried how I should clear these problems.
    
     Status: Considering
      (Latest: http://archives.postgresql.org/pgsql-hackers/2011-08/msg00880.php)
    
    
    Regards.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  17. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-09-13T06:50:28Z

    I have updated the patch.
    
    Changes:
      * the user must set full_page_writes to 'on' (in the documentation)
      * read "FROM: XX" in backup_label (in xlog.c)
      * check the server status when pg_stop_backup is executed (in xlog.c)
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  18. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-21T02:50:24Z

    2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >
    > Update patch.
    >
    > Changes:
    >  * set 'on' full_page_writes by user (in document)
    >  * read "FROM: XX" in backup_label (in xlog.c)
    >  * check status when pg_stop_backup is executed (in xlog.c)
    
    Thanks for updating the patch.
    
    Before reviewing the patch, and to encourage people to comment on and
    review it, let me explain what it provides:
    
    This patch provides the capability to take a base backup during recovery,
    i.e., from the standby server. This is a very useful feature for offloading
    the expense of periodic backups from the master. The backup procedure is
    similar to that during normal running, but slightly different:
    
    1. Execute pg_start_backup on the standby. To execute a query on the
       standby, hot standby must be enabled.
    
    2. Perform a file system backup on the standby.
    
    3. Copy the pg_control file from the cluster directory on the standby to
        the backup as follows:
    
        cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    
    4. Execute pg_stop_backup on the standby.
    
    The backup taken by the above procedure is available for an archive
    recovery or standby server.
    
    If the standby is promoted during a backup, pg_stop_backup() detects
    the change of the server status and fails. The data backed up before the
    promotion is invalid and not available for recovery.
    
    Taking a backup from the standby by using pg_basebackup is still not
    possible. But we can relax that restriction after applying this patch.
    
    To take a base backup during recovery safely, several parameters must be
    set properly. Hot standby must be enabled on the standby, i.e., wal_level
    must be set to hot_standby on the master and hot_standby must be enabled
    on the standby. FPW (full page writes) is required for a base backup,
    so full_page_writes must be enabled on the master.
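
These requirements could be checked up front along the following lines. This is a sketch under assumptions: the function name is hypothetical, and the two dictionaries stand in for settings read from each server (for example with SHOW).

```python
def check_standby_backup_settings(master, standby):
    """Return a list of problems that would make a standby backup unsafe."""
    problems = []
    if master.get("wal_level") != "hot_standby":
        problems.append("master: wal_level must be hot_standby")
    if standby.get("hot_standby") != "on":
        problems.append("standby: hot_standby must be on")
    if master.get("full_page_writes") != "on":
        problems.append("master: full_page_writes must be on")
    return problems


master = {"wal_level": "hot_standby", "full_page_writes": "off"}
standby = {"hot_standby": "on"}
print(check_standby_backup_settings(master, standby))
```

A check like this only covers the moment it runs; a setting changed later, during the backup, would not be noticed by it.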
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  19. Re: Online base backup from the hot-standby

    Magnus Hagander <magnus@hagander.net> — 2011-09-21T05:13:21Z

    On Wed, Sep 21, 2011 at 04:50, Fujii Masao <masao.fujii@gmail.com> wrote:
    > 2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >>
    >> Update patch.
    >>
    >> Changes:
    >>  * set 'on' full_page_writes by user (in document)
    >>  * read "FROM: XX" in backup_label (in xlog.c)
    >>  * check status when pg_stop_backup is executed (in xlog.c)
    >
    > Thanks for updating the patch.
    >
    > Before reviewing the patch, to encourage people to comment and
    > review the patch, I explain what this patch provides:
    >
    > This patch provides the capability to take a base backup during recovery,
    > i.e., from the standby server. This is very useful feature to offload the
    > expense of periodic backups from the master. That backup procedure is
    > similar to that during normal running, but slightly different:
    >
    > 1. Execute pg_start_backup on the standby. To execute a query on the
    >   standby, hot standby must be enabled.
    >
    > 2. Perform a file system backup on the standby.
    >
    > 3. Copy the pg_control file from the cluster directory on the standby to
    >    the backup as follows:
    >
    >    cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    
    But this is done as part of step 2 already. I assume what this really
    means is that the pg_control file must be the last file backed up?
    
    (Since there are certainly a lot of other ways to do the backup than just
    cp to a mounted directory..)
    
    
    > 4. Execute pg_stop_backup on the standby.
    >
    > The backup taken by the above procedure is available for an archive
    > recovery or standby server.
    >
    > If the standby is promoted during a backup, pg_stop_backup() detects
    > the change of the server status and fails. The data backed up before the
    > promotion is invalid and not available for recovery.
    >
    > Taking a backup from the standby by using pg_basebackup is still not
    > possible. But we can relax that restriction after applying this patch.
    
    I think that this is going to be very important, particularly given
    the requirements on pt 3 above. (But yes, it certainly doesn't have to
    be done as part of this patch, but it really should be the plan to
    have this included in the same version)
    
    
    > To take a base backup during recovery safely, some sort of parameters
    > must be set properly. Hot standby must be enabled on the standby, i.e.,
    > wal_level and hot_standby must be enabled on the master and the standby,
    > respectively. FPW (full page writes) is required for a base backup,
    > so full_page_writes must be enabled on the master.
    
    Presumably pg_start_backup() will check this. And we'll somehow track
    this before pg_stop_backup() as well? (For evil things such as the user
    changing FPW from on to off and then back to on again during a backup,
    which will make it look correct both at start and at stop but incorrect
    in the middle. pg_stop_backup() needs to fail in that case as well.)
    
    -- 
     Magnus Hagander
     Me: http://www.hagander.net/
     Work: http://www.redpill-linpro.com/
    
    
  20. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-21T06:23:53Z

    On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander <magnus@hagander.net> wrote:
    > On Wed, Sep 21, 2011 at 04:50, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> 3. Copy the pg_control file from the cluster directory on the standby to
    >>    the backup as follows:
    >>
    >>    cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    >
    > But this is done as part of step 2 already. I assume what this really
    > means is that the pg_control file must be the last file backed up?
    
    Yes.
    
    When we perform an archive recovery from a backup taken during
    normal processing, we get the backup end location from the backup-end
    WAL record written by pg_stop_backup(). But since no WAL writing is
    allowed during recovery, pg_stop_backup() on the standby cannot write
    a backup-end WAL record. So, in his patch, instead of a backup-end
    WAL record, the startup process uses the minimum recovery point
    recorded in the pg_control included in the backup as the backup end
    location. BTW, the backup end location is used to check whether
    recovery has reached a consistent state (i.e., end-of-backup).
    
    To use the minimum recovery point in pg_control as the backup end
    location safely, pg_control must be backed up last. Otherwise, a data
    page with a newer LSN than the minimum recovery point might be
    included in the backup.
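    
    A minimal sketch of that ordering, not taken from the patch: it copies
    a cluster directory file by file and defers global/pg_control to the
    very end. The function name and the plain file-by-file copy are
    assumptions for illustration only; a real backup tool also has to cope
    with files that change or vanish while the server is running.

```python
import os
import shutil

# Path of the control file relative to the cluster directory.
CONTROL_FILE = os.path.join("global", "pg_control")


def copy_cluster_control_file_last(data_dir, backup_dir):
    """Copy data_dir into backup_dir, copying global/pg_control last.

    Returns the relative paths in the order they were copied.
    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    copied = []
    for root, _dirs, files in os.walk(data_dir):
        rel_root = os.path.relpath(root, data_dir)
        target_root = os.path.normpath(os.path.join(backup_dir, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            rel_path = os.path.normpath(os.path.join(rel_root, name))
            if rel_path == CONTROL_FILE:
                continue  # deferred until every other file is copied
            shutil.copy2(os.path.join(data_dir, rel_path),
                         os.path.join(backup_dir, rel_path))
            copied.append(rel_path)
    # Copy pg_control only now, so its minimum recovery point is not
    # older than any data page already in the backup.
    shutil.copy2(os.path.join(data_dir, CONTROL_FILE),
                 os.path.join(backup_dir, CONTROL_FILE))
    copied.append(CONTROL_FILE)
    return copied
```

    In the procedure under discussion this copy would run between
    pg_start_backup() and pg_stop_backup() on the standby.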
    
    > (Since there are certainly a lot other ways to do the backup than just
    > cp to a mounted directory..)
    
    Yes. The above command I described is just an example.
    
    >> 4. Execute pg_stop_backup on the standby.
    >>
    >> The backup taken by the above procedure is available for an archive
    >> recovery or standby server.
    >>
    >> If the standby is promoted during a backup, pg_stop_backup() detects
    >> the change of the server status and fails. The data backed up before the
    >> promotion is invalid and not available for recovery.
    >>
    >> Taking a backup from the standby by using pg_basebackup is still not
    >> possible. But we can relax that restriction after applying this patch.
    >
    > I think that this is going to be very important, particularly given
    > the requirements on pt 3 above. (But yes, it certainly doesn't have to
    > be done as part of this patch, but it really should be the plan to
    > have this included in the same version)
    
    Agreed.
    
    >> To take a base backup during recovery safely, some sort of parameters
    >> must be set properly. Hot standby must be enabled on the standby, i.e.,
    >> wal_level and hot_standby must be enabled on the master and the standby,
    >> respectively. FPW (full page writes) is required for a base backup,
    >> so full_page_writes must be enabled on the master.
    >
    > Presumably pg_start_backup() will check this. And we'll somehow track
    > this before pg_stop_backup() as well? (for such evil things such as
    > the user changing FPW from on to off and then back to on again during
    > a backup, will will make it look correct both during start and stop,
    > but incorrect in the middle - pg_stop_backup needs to fail in that
    > case as well)
    
    Right. As I suggested upthread, to address that problem, we need to log
    the change of FPW on the master, and then we need to check whether
    such a WAL record is replayed on the standby during the backup. If so,
    pg_stop_backup() should emit an error.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  21. Re: Online base backup from the hot-standby

    Magnus Hagander <magnus@hagander.net> — 2011-09-21T08:34:26Z

    On Wed, Sep 21, 2011 at 08:23, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander <magnus@hagander.net> wrote:
    >> On Wed, Sep 21, 2011 at 04:50, Fujii Masao <masao.fujii@gmail.com> wrote:
    >>> 3. Copy the pg_control file from the cluster directory on the standby to
    >>>    the backup as follows:
    >>>
    >>>    cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    >>
    >> But this is done as part of step 2 already. I assume what this really
    >> means is that the pg_control file must be the last file backed up?
    >
    > Yes.
    >
    > When we perform an archive recovery from the backup taken during
    > normal processing, we gets a backup end location from the backup-end
    > WAL record which was written by pg_stop_backup(). But since no WAL
    > writing is allowed during recovery, pg_stop_backup() on the standby
    > cannot write a backup-end WAL record. So, in his patch, instead of
    > a backup-end WAL record, the startup process uses the minimum
    > recovery point recorded in pg_control which has been included in the
    > backup, as a backup end location. BTW, a backup end location is
    > used to check whether recovery has reached a consistency state
    > (i.e., end-of-backup).
    >
    > To use the minimum recovery point in pg_control as a backup end
    > location safely, pg_control must be backed up last. Otherwise, data
    > page which has the newer LSN than the minimum recovery point
    > might be included in the backup.
    
    Ah, check.
    
    
    >> (Since there are certainly a lot other ways to do the backup than just
    >> cp to a mounted directory..)
    >
    > Yes. The above command I described is just an example.
    
    ok.
    
    
    >>> 4. Execute pg_stop_backup on the standby.
    >>>
    >>> The backup taken by the above procedure is available for an archive
    >>> recovery or standby server.
    >>>
    >>> If the standby is promoted during a backup, pg_stop_backup() detects
    >>> the change of the server status and fails. The data backed up before the
    >>> promotion is invalid and not available for recovery.
    >>>
    >>> Taking a backup from the standby by using pg_basebackup is still not
    >>> possible. But we can relax that restriction after applying this patch.
    >>
    >> I think that this is going to be very important, particularly given
    >> the requirements on pt 3 above. (But yes, it certainly doesn't have to
    >> be done as part of this patch, but it really should be the plan to
    >> have this included in the same version)
    >
    > Agreed.
    >
    >>> To take a base backup during recovery safely, some sort of parameters
    >>> must be set properly. Hot standby must be enabled on the standby, i.e.,
    >>> wal_level and hot_standby must be enabled on the master and the standby,
    >>> respectively. FPW (full page writes) is required for a base backup,
    >>> so full_page_writes must be enabled on the master.
    >>
    >> Presumably pg_start_backup() will check this. And we'll somehow track
    >> this before pg_stop_backup() as well? (for such evil things such as
    >> the user changing FPW from on to off and then back to on again during
    >> a backup, will will make it look correct both during start and stop,
    >> but incorrect in the middle - pg_stop_backup needs to fail in that
    >> case as well)
    >
    > Right. As I suggested upthread, to address that problem, we need to log
    > the change of FPW on the master, and then we need to check whether
    > such a WAL is replayed on the standby during the backup. If it's done,
    > pg_stop_backup() should emit an error.
    
    I somehow missed this thread completely, so I didn't catch your
    previous comments - oops, sorry. The important point is that we
    need to track whether this happens at all, even if the setting has
    since been reset to a valid value. So we can't just check the state
    of the variable at the beginning and at the end.
    
    -- 
     Magnus Hagander
     Me: http://www.hagander.net/
     Work: http://www.redpill-linpro.com/
    
    
  22. Remastering using streaming only replication?

    Josh Berkus <josh@agliodbs.com> — 2011-09-21T16:52:18Z

    Fujii,
    
    I haven't really been following your latest patches about taking backups
    from the standby and cascading replication, but I wanted to see if it
    fulfills another TODO: the ability to "remaster" (that is, designate the
    "lead standby" as the new master) without needing to copy WAL files.
    
    Supporting remastering using streaming replication only was on your TODO
    list when we closed 9.1.  It seems like this would get solved as a
    side-effect, but I wanted to confirm that.
    
    -- 
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
    
    
  23. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-22T12:13:43Z

    On Wed, Sep 21, 2011 at 5:34 PM, Magnus Hagander <magnus@hagander.net> wrote:
    > On Wed, Sep 21, 2011 at 08:23, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander <magnus@hagander.net> wrote:
    >>> Presumably pg_start_backup() will check this. And we'll somehow track
    >>> this before pg_stop_backup() as well? (for such evil things such as
    >>> the user changing FPW from on to off and then back to on again during
    >>> a backup, will will make it look correct both during start and stop,
    >>> but incorrect in the middle - pg_stop_backup needs to fail in that
    >>> case as well)
    >>
    >> Right. As I suggested upthread, to address that problem, we need to log
    >> the change of FPW on the master, and then we need to check whether
    >> such a WAL is replayed on the standby during the backup. If it's done,
    >> pg_stop_backup() should emit an error.
    >
    > I somehow missed this thread completely, so I didn't catch your
    > previous comments - oops, sorry. The important point being that we
    > need to track if when this happens even if it has been reset to a
    > valid value. So we can't just check the state of the variable at the
    > beginning and at the end.
    
    Right. Let me explain again what I'm thinking.
    
    When FPW is changed, the master always writes a WAL record
    containing the new value of FPW. This means that the standby
    can track all changes of FPW by reading WAL records.
    
    The standby has two flags: one indicates whether FPW has always
    been TRUE since the last restartpoint; the other indicates whether
    FPW has always been TRUE since the last pg_start_backup(). The
    standby can maintain those flags by reading the WAL records
    streamed from the master.
    
    If the former flag is FALSE (i.e., the WAL records which the
    standby has replayed since the last restartpoint might not contain
    the required FPW), pg_start_backup() fails. If the latter flag is
    FALSE (i.e., the WAL records which the standby has replayed
    during the backup might not contain the required FPW),
    pg_stop_backup() fails.
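    
    The two-flag idea can be modeled roughly as follows. This is only a
    Python sketch of the logic, not code from the patch: the class and
    method names are hypothetical, and resetting the first flag to the
    current FPW value at each restartpoint is an assumption about how
    "since the last restartpoint" would be maintained.

```python
class FpwTracker:
    """Toy model of a standby tracking full_page_writes (FPW) changes."""

    def __init__(self, fpw_on=True):
        self.fpw_on = fpw_on                        # latest value seen in WAL
        self.always_on_since_restartpoint = fpw_on  # flag 1
        self.always_on_since_backup_start = fpw_on  # flag 2
        self.backup_in_progress = False

    def replay_fpw_change(self, new_value):
        """Replay a WAL record carrying the master's new FPW value."""
        self.fpw_on = new_value
        if not new_value:
            # Once FPW has been off, both flags stay FALSE even if FPW
            # is switched back on later.
            self.always_on_since_restartpoint = False
            self.always_on_since_backup_start = False

    def restartpoint(self):
        """Assumption: flag 1 restarts from the current FPW value."""
        self.always_on_since_restartpoint = self.fpw_on

    def start_backup(self):
        if not self.always_on_since_restartpoint:
            raise RuntimeError("FPW was off since the last restartpoint")
        self.always_on_since_backup_start = True
        self.backup_in_progress = True

    def stop_backup(self):
        if not self.backup_in_progress:
            raise RuntimeError("a backup is not in progress")
        self.backup_in_progress = False
        if not self.always_on_since_backup_start:
            raise RuntimeError("FPW was off during the backup")
```

    With this model, an on -> off -> on sequence replayed during the
    backup still makes stop_backup() fail, which is the case Magnus
    raised.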
    
    If I'm not missing something, this approach can address the problem
    which you're concerned about.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  24. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-22T13:24:51Z

    On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > 2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >>
    >> Update patch.
    >>
    >> Changes:
    >>  * set 'on' full_page_writes by user (in document)
    >>  * read "FROM: XX" in backup_label (in xlog.c)
    >>  * check status when pg_stop_backup is executed (in xlog.c)
    >
    > Thanks for updating the patch.
    >
    > Before reviewing the patch, to encourage people to comment and
    > review the patch, I explain what this patch provides:
    
    Attached is the updated version of the patch. I refactored the code, fixed
    some bugs, added lots of source code comments, improved the document,
    but didn't change the basic design. Please check this patch, and let's use
    this patch as the base if you agree with that.
    
    In the current patch, there is no safeguard to prevent users from
    taking a backup during recovery when FPW is disabled. This is unsafe.
    Are you planning to implement such a safeguard?
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  25. Re: Online base backup from the hot-standby

    Magnus Hagander <magnus@hagander.net> — 2011-09-22T15:44:44Z

    On Thu, Sep 22, 2011 at 14:13, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Wed, Sep 21, 2011 at 5:34 PM, Magnus Hagander <magnus@hagander.net> wrote:
    >> On Wed, Sep 21, 2011 at 08:23, Fujii Masao <masao.fujii@gmail.com> wrote:
    >>> On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander <magnus@hagander.net> wrote:
    >>>> Presumably pg_start_backup() will check this. And we'll somehow track
    >>>> this before pg_stop_backup() as well? (for such evil things such as
    >>>> the user changing FPW from on to off and then back to on again during
    >>>> a backup, will will make it look correct both during start and stop,
    >>>> but incorrect in the middle - pg_stop_backup needs to fail in that
    >>>> case as well)
    >>>
    >>> Right. As I suggested upthread, to address that problem, we need to log
    >>> the change of FPW on the master, and then we need to check whether
    >>> such a WAL is replayed on the standby during the backup. If it's done,
    >>> pg_stop_backup() should emit an error.
    >>
    >> I somehow missed this thread completely, so I didn't catch your
    >> previous comments - oops, sorry. The important point being that we
    >> need to track if when this happens even if it has been reset to a
    >> valid value. So we can't just check the state of the variable at the
    >> beginning and at the end.
    >
    > Right. Let me explain again what I'm thinking.
    >
    > When FPW is changed, the master always writes the WAL record
    > which contains the current value of FPW. This means that the standby
    > can track all changes of FPW by reading WAL records.
    >
    > The standby has two flags: One indicates whether FPW has always
    > been TRUE since last restartpoint. Another indicates whether FPW
    > has always been TRUE since last pg_start_backup(). The standby
    > can maintain those flags by reading WAL records streamed from
    > the master.
    >
    > If the former flag indicates FALSE (i.e., the WAL records which
    > the standby has replayed since last restartpoint might not contain
    > required FPW), pg_start_backup() fails. If the latter flag indicates
    > FALSE (i.e., the WAL records which the standby has replayed
    > during the backup might not contain required FPW),
    > pg_stop_backup() fails.
    >
    > If I'm not missing something, this approach can address the problem
    > which you're concerned about.
    
    Yeah, it sounds safe to me.
    
    Would it make sense for pg_start_backup() to have the ability to wait
    for the next restartpoint in a case like this, if we know that FPW has
    been set? Instead of failing? Or maybe that's just overcomplicating
    things when trying to be user-friendly.
    
    -- 
     Magnus Hagander
     Me: http://www.hagander.net/
     Work: http://www.redpill-linpro.com/
    
    
  26. Re: Online base backup from the hot-standby

    Steve Singer <ssinger_pg@sympatico.ca> — 2011-09-26T02:39:00Z

    On 11-09-22 09:24 AM, Fujii Masao wrote:
    > On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masao<masao.fujii@gmail.com>  wrote:
    >> 2011/9/13 Jun Ishiduka<ishizuka.jun@po.ntts.co.jp>:
    >>> Update patch.
    >>>
    >>> Changes:
    >>>   * set 'on' full_page_writes by user (in document)
    >>>   * read "FROM: XX" in backup_label (in xlog.c)
    >>>   * check status when pg_stop_backup is executed (in xlog.c)
    >> Thanks for updating the patch.
    >>
    >> Before reviewing the patch, to encourage people to comment and
    >> review the patch, I explain what this patch provides:
    > Attached is the updated version of the patch. I refactored the code, fixed
    > some bugs, added lots of source code comments, improved the document,
    > but didn't change the basic design. Please check this patch, and let's use
    > this patch as the base if you agree with that.
    >
    
    I have looked at both Jun's patch from Sept 13 and Fujii's updates to 
    the patch.  I agree that Fujii's updated version should be used as the 
    basis for changes going forward.   My comments below refer to that 
    version (unless otherwise noted).
    
    
    In backup.sgml, in the new section titled "Making a Base Backup during 
    Recovery", I would prefer to see some mention in the title that this 
    procedure is for standby servers, i.e. "Making a Base Backup from a 
    Standby Database".  Users who have set up a hot-standby database should 
    be familiar with the 'standby' terminology. I agree that the "during 
    recovery" description is technically correct, but I'm not sure someone 
    who is looking through the manual for instructions on making a base 
    backup from their standby will realize this is the section they should read.
    
    Around line 969, where you give an example of copying the control file, I 
    would make it a bit clearer that this is an example command, i.e.: Copy 
    the pg_control file from the cluster directory to the global 
    sub-directory of the backup.  For example "cp $PGDATA/global/pg_control 
    /mnt/server/backupdir/global".
    
    
    Testing Notes
    -----------------------------
    
    I created a standby server from a base backup of another standby server. 
    On this new standby server I then
    
    1. Ran pg_start_backup('3'); and left the psql connection open
    2. touch /tmp/3 -- my trigger_file
    
    ssinger@ssinger-laptop:/usr/local/pgsql92git/bin$ LOG:  trigger file 
    found: /tmp/3
    FATAL:  terminating walreceiver process due to administrator command
    LOG:  restored log file "000000010000000000000006" from archive
    LOG:  record with zero length at 0/60002F0
    LOG:  restored log file "000000010000000000000006" from archive
    LOG:  redo done at 0/6000298
    LOG:  restored log file "000000010000000000000006" from archive
    PANIC:  record with zero length at 0/6000298
    LOG:  startup process (PID 19011) was terminated by signal 6: Aborted
    LOG:  terminating any other active server processes
    WARNING:  terminating connection because of crash of another server process
    DETAIL:  The postmaster has commanded this server process to roll back 
    the current transaction and exit, because another server process exited 
    abnormally and possibly corrupted shared memory.
    HINT:  In a moment you should be able to reconnect to the database and 
    repeat your command.
    
    The new postmaster (the one trying to be promoted) dies.  This is 
    somewhat repeatable.
    
    ----
    
    If a base backup is in progress on a recovery database and that 
    database is promoted to master, then after the promotion (if you don't 
    restart the postmaster) I see
    select pg_stop_backup();
    ERROR:  database system status mismatches between pg_start_backup() and 
    pg_stop_backup()
    
    If you restart the postmaster this goes away.  When the postmaster 
    leaves recovery mode I think it should abort an existing base backup so 
    pg_stop_backup() will say no backup in progress, or give an error 
    message on pg_stop_backup() saying that the base backup won't be 
    usable.  The above error doesn't really tell the user why there is a 
    mismatch.
    
    ---------
    
    In my testing I got, a few times, into a situation where a standby server 
    coming from a recovery target took a while to finish recovery (this is 
    on a database with no activity).  Then when I tried promoting that 
    server to master I got
    
    LOG:  trigger file found: /tmp/3
    FATAL:  terminating walreceiver process due to administrator command
    LOG:  restored log file "000000010000000000000009" from archive
    LOG:  restored log file "000000010000000000000009" from archive
    LOG:  redo done at 0/90000E8
    LOG:  restored log file "000000010000000000000009" from archive
    PANIC:  unexpected pageaddr 0/6000000 in log file 0, segment 9, offset 0
    LOG:  startup process (PID 1804) was terminated by signal 6: Aborted
    LOG:  terminating any other active server processes
    
    
    It is *possible* I mixed up the order of a step somewhere since my 
    testing isn't script based. A standby server that 'looks' okay but can't 
    actually be promoted is dangerous.
    
    This version of the patch (I was testing the Sept 22nd version) seems 
    less stable than how I remember the version from the July CF.  Maybe I'm 
    just testing it harder or maybe something has been broken.
    
    
    
    > In the current patch, there is no safeguard for preventing users from
    > taking backup during recovery when FPW is disabled. This is unsafe.
    > Are you planning to implement such a safeguard?
    >
    
    I agree with Fujii that we need a way (on the recovery machine) to 
    detect if the master doesn't have FPW on. The ideas up-thread on how to 
    do this sound good.
    
    
    > Regards,
    >
    >
    >
    >
    
    
  27. Re: Remastering using streaming only replication?

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-26T08:07:21Z

    On Thu, Sep 22, 2011 at 1:52 AM, Josh Berkus <josh@agliodbs.com> wrote:
    > Fujii,
    >
    > I haven't really been following your latest patches about taking backups
    > from the standby and cascading replication, but I wanted to see if it
    > fulfills another TODO: the ability to "remaster" (that is, designate the
    > "lead standby" as the new master) without needing to copy WAL files.
    
    Sorry, I could not follow you. I believe that we can "remaster" even in 9.1.
    When the master crashes, we can choose the "lead standby" by comparing
    each standby's replay location, and can promote it with pg_ctl promote.
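    
    As a rough illustration of that comparison (a sketch only, with
    hypothetical function names and standby names): replay locations are
    printed as two hexadecimal numbers separated by a slash, for example
    as returned by pg_last_xlog_replay_location() on each standby, so
    they have to be compared numerically rather than as strings.

```python
def parse_lsn(text):
    """Convert an 'X/Y' WAL location (hex/hex) into one integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_lead_standby(replay_locations):
    """Return the name of the standby with the newest replay location.

    replay_locations maps a standby name to its replay location string.
    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    return max(replay_locations,
               key=lambda name: parse_lsn(replay_locations[name]))
```

    For example, given {"standby1": "0/6000298", "standby2": "0/90000E8"},
    pick_lead_standby() returns "standby2".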
    
    What "remaster" feature are you expecting we should develop in 9.2?
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  28. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-09-26T08:41:06Z

    > Attached is the updated version of the patch. I refactored the code, fixed
    > some bugs, added lots of source code comments, improved the document,
    > but didn't change the basic design. Please check this patch, and let's use
    > this patch as the base if you agree with that.
    
    Thanks for updating the patch.
    Yes, I agree.
    
    
    > In the current patch, there is no safeguard for preventing users from
    > taking backup during recovery when FPW is disabled. This is unsafe.
    > Are you planning to implement such a safeguard?
    
    Yes.
    I want to follow Fujii's comments, quoted below.
    
    -------------------------------------------------------------------------
    > Right. Let me explain again what I'm thinking.
    > 
    > When FPW is changed, the master always writes the WAL record
    > which contains the current value of FPW. This means that the standby
    > can track all changes of FPW by reading WAL records.
    > 
    > The standby has two flags: One indicates whether FPW has always
    > been TRUE since last restartpoint. Another indicates whether FPW
    > has always been TRUE since last pg_start_backup(). The standby
    > can maintain those flags by reading WAL records streamed from
    > the master.
    > 
    > If the former flag indicates FALSE (i.e., the WAL records which
    > the standby has replayed since last restartpoint might not contain
    > required FPW), pg_start_backup() fails. If the latter flag indicates
    > FALSE (i.e., the WAL records which the standby has replayed
    > during the backup might not contain required FPW),
    > pg_stop_backup() fails.
    > 
    > If I'm not missing something, this approach can address the problem
    > which you're concerned about.
    -------------------------------------------------------------------------
    
    Regards.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  29. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-26T12:12:41Z

    On Fri, Sep 23, 2011 at 12:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
    > Would it make sense for pg_start_backup() to have the ability to wait
    > for the next restartpoint in a case like this, if we know that FPW has
    > been set? Instead of failing? Or maybe that's just overcomplicating
    > things when trying to be user-friendly.
    
    I don't think that it's worth adding code for such a feature, because I
    believe there are not many users who would enable FPW on the fly for a
    standby-only backup and use such a feature.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  30. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-27T02:56:25Z

    On Mon, Sep 26, 2011 at 11:39 AM, Steve Singer <ssinger_pg@sympatico.ca> wrote:
    > I have looked at both Jun's patch from Sept 13 and Fujii's updates to the
    > patch.  I agree that Fujii's updated version should be used as the basis for
    > changes going forward.   My comments below refer to that version (unless
    > otherwise noted).
    
    Thanks for the tests and comments!
    
    > In backup.sgml  the new section titled "Making a Base Backup during
    > Recovery"  I would prefer to see some mention in the title that this
    > procedure is for standby servers ie "Making a Base Backup from a Standby
    > Database".  Users who have setup a hot-standby database should be familiar
    > with the 'standby' terminology. I agree that the "during recovery"
    > description is technically correct but I'm not sure someone who is looking
    > through the manual for instructions on making a base backup from here
    > standby will realize this is the section they should read.
    
    I used the term "recovery" rather than "standby" because we can take
    a backup even from a server that is in normal archive recovery mode
    rather than standby mode. But there are not many users who take a
    backup during normal archive recovery, so I agree that "standby" is
    the better term to use in the documentation. Will change.
    
    > Around line 969 where you give an example of copying the control file I
    > would be a bit clearer that this is an example command.  Ie (Copy the
    > pg_control file from the cluster directory to the global sub-directory of
    > the backup.  For example "cp $PGDATA/global/pg_control
    > /mnt/server/backupdir/global")
    
    Looks better. Will change.
    
    > Testing Notes
    > -----------------------------
    >
    > I created a standby server from a base backup of another standby server. On
    > this new standby server I then
    >
    > 1. Ran pg_start_backup('3'); and left the psql connection open
    > 2. touch /tmp/3 -- my trigger_file
    >
    > ssinger@ssinger-laptop:/usr/local/pgsql92git/bin$ LOG:  trigger file found:
    > /tmp/3
    > FATAL:  terminating walreceiver process due to administrator command
    > LOG:  restored log file "000000010000000000000006" from archive
    > LOG:  record with zero length at 0/60002F0
    > LOG:  restored log file "000000010000000000000006" from archive
    > LOG:  redo done at 0/6000298
    > LOG:  restored log file "000000010000000000000006" from archive
    > PANIC:  record with zero length at 0/6000298
    > LOG:  startup process (PID 19011) was terminated by signal 6: Aborted
    > LOG:  terminating any other active server processes
    > WARNING:  terminating connection because of crash of another server process
    > DETAIL:  The postmaster has commanded this server process to roll back the
    > current transaction and exit, because another server process exited
    > abnormally and possibly corrupted shared memory.
    > HINT:  In a moment you should be able to reconnect to the database and
    > repeat your command.
    >
    > The new postmaster (the one trying to be promoted) dies.  This is somewhat
    > repeatable.
    
    Looks weird. Although the WAL record starting at 0/6000298 was read
    successfully, re-fetching the same record fails at the end of recovery.
    One possible cause is corruption of the archived WAL file. What
    restore_command on the standby and archive_command on the master
    are you using? Could you confirm that there is no chance of archived
    WAL files being overwritten in your environment?
    
    I tried to reproduce this problem several times, but I could not. Could
    you provide a test case which reproduces the problem?
    
    > If a base backup is in progress on a recovery database and that recovery
    > database is promoted to master, following the promotion (if you don't
    > restart the postmaster).  I see
    > select pg_stop_backup();
    > ERROR:  database system status mismatches between pg_start_backup() and
    > pg_stop_backup()
    >
    > If you restart the postmaster this goes away.  When the postmaster leaves
    > recovery mode I think it should abort an existing base backup so
    > pg_stop_backup() will say no backup in progress,
    
    I don't think that it's a good idea to cancel the backup when promoting
    the standby. If we do so, we need to correctly handle the case where the
    cancellation of the backup and pg_start_backup/pg_stop_backup are
    performed at the same time. We could simply do that by protecting those
    whole operations, including pg_start_backup's checkpoint, with an lwlock.
    But I don't think that it's worth introducing a new lwlock only for that,
    and it's not good to hold an lwlock through a time-consuming checkpoint
    operation. Of course we could avoid such an lwlock, but that would
    require more complicated code.
    
    > or give an error message on
    > pg_stop_backup() saying that the base backup won't be usable.  The above
    > error doesn't really tell the user why there is a mismatch.
    
    What about the following error message?
    
    ERROR:  pg_stop_backup() was executed during normal processing though
    pg_start_backup() was executed during recovery
    HINT:  The database backup will not be usable.
    
    Or do you have a better idea?
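    
    To make the intended check concrete, here is a minimal Python model of
    it. It is a sketch under assumptions, not the patch's code: the class
    name is hypothetical, and the model keeps the backup state after a
    failed pg_stop_backup(), matching Steve's observation that the error
    persists until the postmaster is restarted.

```python
class BackupSession:
    """Toy model of pg_start_backup()/pg_stop_backup() status checking."""

    def __init__(self):
        # None means no backup is in progress; otherwise it records
        # whether the server was in recovery when the backup started.
        self.started_in_recovery = None

    def start_backup(self, in_recovery):
        self.started_in_recovery = in_recovery

    def stop_backup(self, in_recovery):
        if self.started_in_recovery is None:
            raise RuntimeError("a backup is not in progress")
        if self.started_in_recovery and not in_recovery:
            # The standby was promoted while the backup was running.
            raise RuntimeError(
                "pg_stop_backup() was executed during normal processing "
                "though pg_start_backup() was executed during recovery")
        self.started_in_recovery = None
```

    Calling start_backup(in_recovery=True) followed by
    stop_backup(in_recovery=False) raises the proposed error.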
    
    > In my testing a few times I got into a situation where a standby server
    > coming from a recovery target took a while to finish recovery (this is on a
    > database with no activity).  Then when i tried promoting that server to
    > master I got
    >
    > LOG:  trigger file found: /tmp/3
    > FATAL:  terminating walreceiver process due to administrator command
    > LOG:  restored log file "000000010000000000000009" from archive
    > LOG:  restored log file "000000010000000000000009" from archive
    > LOG:  redo done at 0/90000E8
    > LOG:  restored log file "000000010000000000000009" from archive
    > PANIC:  unexpected pageaddr 0/6000000 in log file 0, segment 9, offset 0
    > LOG:  startup process (PID 1804) was terminated by signal 6: Aborted
    > LOG:  terminating any other active server processes
    >
    > It is *possible* I mixed up the order of a step somewhere since my testing
    > isn't script based. A standby server that 'looks' okay but can't actually be
    > promoted is dangerous.
    
    This looks like the same problem as the above. Another weird point is
    that the same archived WAL file is restored twice before redo is done.
    I'm not sure why this happens... Could you provide a test case which
    reproduces this problem? I will diagnose it.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  31. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-27T05:51:38Z

    On Tue, Sep 27, 2011 at 11:56 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> In backup.sgml  the new section titled "Making a Base Backup during
    >> Recovery"  I would prefer to see some mention in the title that this
    >> procedure is for standby servers ie "Making a Base Backup from a Standby
    >> Database".  Users who have setup a hot-standby database should be familiar
    >> with the 'standby' terminology. I agree that the "during recovery"
    >> description is technically correct but I'm not sure someone who is looking
    >> through the manual for instructions on making a base backup from here
    >> standby will realize this is the section they should read.
    >
    > I used the term "recovery" rather than "standby" because we can take
    > a backup even from the server in normal archive recovery mode but not
    > standby mode. But there are not many users who take a backup during
    > normal archive recovery, so I agree that the term "standby" is better to
    > be used in the document. Will change.
    
    Done.
    
    >> Around line 969 where you give an example of copying the control file I
    >> would be a bit clearer that this is an example command.  Ie (Copy the
    >> pg_control file from the cluster directory to the global sub-directory of
    >> the backup.  For example "cp $PGDATA/global/pg_control
    >> /mnt/server/backupdir/global")
    >
    > Looks better. Will change.
    
    Done.
    
    >> or give an error message on
    >> pg_stop_backup() saying that the base backup won't be usable.  The above
    >> error doesn't really tell the user why there is a mismatch.
    >
    > What about the following error message?
    >
    > ERROR:  pg_stop_backup() was executed during normal processing though
    > pg_start_backup() was executed during recovery
    > HINT:  The database backup will not be usable.
    
    Done. I attached the new version of the patch.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  32. Re: Online base backup from the hot-standby

    Steve Singer <ssinger_pg@sympatico.ca> — 2011-09-27T23:10:39Z

    On 11-09-26 10:56 PM, Fujii Masao wrote:
    >
    > Looks weird. Though the WAL record starting from 0/6000298 was read
    > successfully, then re-fetch of the same record fails at the end of recovery.
    > One possible cause is the corruption of archived WAL file. What
    > restore_command on the standby and archive_command on the master
    > are you using? Could you confirm that there is no chance to overwrite
    > archive WAL files in your environment?
    >
    > I tried to reproduce this problem several times, but I could not. Could
    > you provide the test case which reproduces the problem?
    >
    
    This is the test procedure I'm trying today, I wasn't able to reproduce 
    the crash.  What I was doing the other day was similar but I can't speak 
    to unintentional differences.
    
    
    I have my master server
    data
    port=5439
    wal_level=hot_standby
    archive_mode=on
    archive_command='cp -i %p /usr/local/pgsql92git/archive/%f'
    hot_standby=on
    
    I then run
    select pg_start_backup('foo');
    $ rm -r ../data2
    $ cp -r ../data ../data2
    $ rm ../data2/postmaster.pid
    select pg_stop_backup();
    I edit data2/postgresql.conf so
    port=5438
    I commented out archive_mode and archive_command (or at least today I did)
    recovery.conf is
    
    standby_mode='on'
    primary_conninfo='host=127.0.0.1 port=5439 user=ssinger dbname=test'
    restore_command='cp /usr/local/pgsql92git/archive/%f %p'
    
    I then start up the second cluster. On it I run
    
    select pg_start_backup('1');
    $ rm -r ../data3
    $ rm -r ../archive2
    $ cp -r ../data2 ../data3
    $ cp ../data2/global/pg_control ../data3/global
    
    select pg_stop_backup();
    I edit ../data3/postgresql.conf
    port=5437
    archive_mode=on
                                     # (change requires restart)
    archive_command='cp -i %p /usr/local/pgsql92git/archive2/%f'
    
    recovery.conf is
    
    standby_mode='on'
    primary_conninfo='host=127.0.0.1 port=5439 user=ssinger dbname=test'
    restore_command='cp /usr/local/pgsql92git/archive/%f %p'
    trigger_file='/tmp/3'
    
    $ postgres -D ../data3
    
    The first time I did this postgres came up quickly.
    
    $ touch /tmp/3
    
    worked fine.
    
    I then stopped data3
    $ rm -r ../data3
    on data 2 I run
    pg_start_backup('1')
    $ cp -r ../data2 ../data3
    $ cp ../data2/global/pg_control ../data3/global
    select pg_stop_backup() # on data2
    $ rm ../data3/postmaster.pid
    vi ../data3/postgresql.conf # same changes as above for data3
    vi ../data3/recovery.conf # same as above for data 3
    postgres -D ../data3
    
    This time I got
    ./postgres -D ../data3
    LOG:  database system was interrupted while in recovery at log time 
    2011-09-27 22:04:17 GMT
    HINT:  If this has occurred more than once some data might be corrupted 
    and you might need to choose an earlier recovery target.
    LOG:  entering standby mode
    cp: cannot stat 
    `/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
    or directory
    LOG:  redo starts at 0/C000020
    LOG:  record with incorrect prev-link 0/9000058 at 0/C0000B0
    cp: cannot stat 
    `/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
    or directory
    LOG:  streaming replication successfully connected to primary
    FATAL:  the database system is starting up
    FATAL:  the database system is starting up
    LOG:  consistent recovery state reached at 0/C0000E8
    LOG:  database system is ready to accept read only connections
    
    In order to get the database to come up in read-only mode I manually issued 
    a checkpoint on the master (data). Shortly after the checkpoint command, 
    the data3 instance went into read-only mode.
    
    then
    
    touch /tmp/3
    
    trigger file found: /tmp/3
    FATAL:  terminating walreceiver process due to administrator command
    cp: cannot stat 
    `/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
    or directory
    LOG:  record with incorrect prev-link 0/9000298 at 0/C0002F0
    cp: cannot stat 
    `/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
    or directory
    LOG:  redo done at 0/C000298
    cp: cannot stat 
    `/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
    or directory
    cp: cannot stat `/usr/local/pgsql92git/archive/00000002.history': No 
    such file or directory
    LOG:  selected new timeline ID: 2
    cp: cannot stat `/usr/local/pgsql92git/archive/00000001.history': No 
    such file or directory
    LOG:  archive recovery complete
    LOG:  database system is ready to accept connections
    LOG:  autovacuum launcher started
    
    
    It looks like data3 is still pulling files with the recovery command 
    after it sees the touch file (is this expected behaviour?)
    $ grep archive ../data3/postgresql.conf
    #wal_level = minimal            # minimal, archive, or hot_standby
    #archive_mode = off        # allows archiving to be done
    archive_mode=on
    archive_command='cp -i %p /usr/local/pgsql92git/archive2/%f'
    
    
    I have NOT been able to make postgres crash during a recovery (today).  
    It is *possible* that on some of my runs the other day I had skipped 
    changing the archive command on data3 to write to archive2 instead of 
    archive.
    
    I have also today not been able to get it to attempt to restore the same 
    WAL file twice.
    
    
    >> If a base backup is in progress on a recovery database and that recovery
    >> database is promoted to master, following the promotion (if you don't
    >> restart the postmaster).  I see
    >> select pg_stop_backup();
    >> ERROR:  database system status mismatches between pg_start_backup() and
    >> pg_stop_backup()
    >>
    >> If you restart the postmaster this goes away.  When the postmaster leaves
    >> recovery mode I think it should abort an existing base backup so
    >> pg_stop_backup() will say no backup in progress,
    > I don't think that it's good idea to cancel the backup when promoting
    > the standby.
    > Because if we do so, we need to handle correctly the case where cancel of backup
    > and pg_start_backup/pg_stop_backup are performed at the same time. We can
    > simply do that by protecting those whole operations including pg_start_backup's
    > checkpoint by the lwlock. But I don't think that it's worth
    > introducing new lwlock
    > only for that. And it's not good to take a lwlock through
    > time-consuming checkpoint
    > operation. Of course we can avoid such a lwlock, but which would require more
    > complicated code.
    >
    >> or give an error message on
    >> pg_stop_backup() saying that the base backup won't be usable.  The above
    >> error doesn't really tell the user why there is a mismatch.
    > What about the following error message?
    >
    > ERROR:  pg_stop_backup() was executed during normal processing though
    > pg_start_backup() was executed during recovery
    > HINT:  The database backup will not be usable.
    >
    > Or, you have better idea?
    
    I like that error message better.  It tells me what is going on versus 
    complaining about a state mismatch.
    >> In my testing a few times I got into a situation where a standby server
    >> coming from a recovery target took a while to finish recovery (this is on a
    >> database with no activity).  Then when i tried promoting that server to
    >> master I got
    >>
    >> LOG:  trigger file found: /tmp/3
    >> FATAL:  terminating walreceiver process due to administrator command
    >> LOG:  restored log file "000000010000000000000009" from archive
    >> LOG:  restored log file "000000010000000000000009" from archive
    >> LOG:  redo done at 0/90000E8
    >> LOG:  restored log file "000000010000000000000009" from archive
    >> PANIC:  unexpected pageaddr 0/6000000 in log file 0, segment 9, offset 0
    >> LOG:  startup process (PID 1804) was terminated by signal 6: Aborted
    >> LOG:  terminating any other active server processes
    >>
    >> It is *possible* I mixed up the order of a step somewhere since my testing
    >> isn't script based. A standby server that 'looks' okay but can't actually be
    >> promoted is dangerous.
    > Looks the same problem as the above. Another weird point is that
    > the same archived WAL file is restored two times before redo is done.
    > I'm not sure why this happens... Could you provide the test case which
    > reproduces this problem? Will diagnose.
    >
    > Regards,
    >
    
    
    
  33. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-09-28T01:58:19Z

    On Wed, Sep 28, 2011 at 8:10 AM, Steve Singer <ssinger_pg@sympatico.ca> wrote:
    > This is the test procedure I'm trying today, I wasn't able to reproduce the
    > crash.  What I was doing the other day was similar but I can't speak to
    > unintentional differences.
    
    Thanks for the info! I tried your test case three times, but was not able to
    reproduce the issue either.
    
    BTW, I created the shell script (attached) which runs your test scenario and
    used it for the test.
    
    If the issue happens again, please feel free to share the details.
    I will diagnose it.
    
    > It looks like data3 is still pulling files with the recovery command after
    > it sees the touch file (is this expected behaviour?)
    
    Yes, that's expected behavior. After the trigger file is found, PostgreSQL
    tries to replay all available WAL files in the pg_xlog directory and in the
    archive. So, if there is an unreplayed archived WAL file at that time,
    PostgreSQL tries to pull it by calling the recovery command.
    
    And, after WAL replay is done, PostgreSQL tries to re-fetch the last
    replayed WAL record in order to identify the end of replay location. So,
    if the last replayed record is included in the archived WAL file, it's pulled
    by the recovery command.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  34. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-09T18:04:25Z

    I created a patch that handles full_page_writes (FPW).
    It is based on Fujii's patch (ver 9).
    
     Manage the master's own FPW in shared memory (on master)
       * The startup and walwriter processes update it. The startup process
         initializes it after REDO. The walwriter updates it when it starts
         or receives SIGHUP.
    
     Insert a WAL record containing the current FPW value (on master)
       * At the same time as each update, they insert a WAL record (named
         XLOG_FPW_CHANGE). XLOG_FPW_CHANGE carries the changed FPW value.
       * When a CHECKPOINT is created, the current FPW value is added to the
         CHECKPOINT WAL record.
    
     Manage the master's FPW in local memory in the startup process (on standby)
       * It obtains the master's FPW value by reading XLOG_FPW_CHANGE during
         REDO.
    
     Check at pg_start_backup/pg_stop_backup (on standby)
       * The check uses these two values:
           * the master's FPW at the latest CHECKPOINT
           * the master's current FPW from XLOG_FPW_CHANGE
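    
    As a rough illustration of the standby-side check described above, here
    is a minimal standalone sketch in C. It is not taken from the patch: the
    type StandbyFpwState and the functions replay_checkpoint(),
    replay_fpw_change() and backup_allowed() are hypothetical names, and the
    rule "both values must be on" is an assumption about how the two values
    are combined.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    
    /* Hypothetical standby-side view of the master's full_page_writes (FPW). */
    typedef struct
    {
        bool fpw_at_checkpoint; /* value carried by the latest CHECKPOINT record */
        bool fpw_current;       /* value carried by the latest FPW-change record */
    } StandbyFpwState;
    
    /* Called when REDO replays a CHECKPOINT record that carries the FPW value. */
    void replay_checkpoint(StandbyFpwState *state, bool fpw)
    {
        assert(state != NULL);
        state->fpw_at_checkpoint = fpw;
    }
    
    /* Called when REDO replays an FPW-change record (XLOG_FPW_CHANGE above). */
    void replay_fpw_change(StandbyFpwState *state, bool fpw)
    {
        assert(state != NULL);
        state->fpw_current = fpw;
    }
    
    /* Assumed rule: a backup on the standby is only safe if both values are on. */
    bool backup_allowed(const StandbyFpwState *state)
    {
        assert(state != NULL);
        return state->fpw_at_checkpoint && state->fpw_current;
    }
    ```
    
    In this model, replaying a single FPW-change record with the value
    'off' is enough to make backup_allowed() return false until 'on' has
    been seen again.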
    
    Regards.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  35. Re: Online base backup from the hot-standby

    Simon Riggs <simon@2ndquadrant.com> — 2011-10-09T18:56:11Z

    2011/10/9 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    
    >  Insert WAL including a value of current FPW (on master)
    >   * In the same timing as update, they insert WAL (is named
    >     XLOG_FPW_CHANGE). XLOG_FPW_CHANGE has a value of the changed FPW.
    >   * When it creates CHECKPOINT, it adds a value of current FPW to the
    >     CHECKPOINT WAL.
    
    I can't see a reason why we would use a new WAL record for this,
    rather than modify the XLOG_PARAMETER_CHANGE record type which was
    created for a very similar reason.
    The code would be much simpler if we just extend
    XLOG_PARAMETER_CHANGE, so please can we do that?
    
    The log message "full_page_writes on master is set invalid more than
    once during online backup" should read "at least once" rather than
    "more than once".
    
    lastFpwDisabledLSN needs to be initialized.
    
    Is there a reason to add lastFpwDisabledLSN onto the Control file? If
    we log parameters after every checkpoint then we'll know the values
    when we startup. If we keep logging parameters this way we'll end up
    with a very awkward and large control file. I would personally prefer
    to avoid that, but that thought could go either way. Let's see if
    anyone else thinks that also.
    
    Looks good.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  36. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-11T10:15:54Z

    > I can't see a reason why we would use a new WAL record for this,
    > rather than modify the XLOG_PARAMETER_CHANGE record type which was
    > created for a very similar reason.
    > The code would be much simpler if we just extend
    > XLOG_PARAMETER_CHANGE, so please can we do that?
    
    Sure.
    
    > The log message "full_page_writes on master is set invalid more than
    > once during online backup" should read "at least once" rather than
    > "more than once".
    
    Yes.
    
    > lastFpwDisabledLSN needs to be initialized.
    
    I think it doesn't need to be, because all values in XLogCtl are initialized to 0.
    
    > Is there a reason to add lastFpwDisabledLSN onto the Control file? If
    > we log parameters after every checkpoint then we'll know the values
    > when we startup. If we keep logging parameters this way we'll end up
    > with a very awkward and large control file. I would personally prefer
    > to avoid that, but that thought could go either way. Let's see if
    > anyone else thinks that also.
    
    Yes. I will add it to CreateCheckPoint().
    
    Image:
      CreateCheckPoint()
      {
         if (!shutdown && XLogStandbyInfoActive())
         {
            LogStandbySnapshot()
            XLogReportParameters()
         }
       }
    
      XLogReportParameters()
      {
         if (fpw == 'off' || ... )
             XLOGINSERT()
      }
    
    However, it will write XLOG_PARAMETER_CHANGE at every checkpoint when FPW is 'off'.
    (It will increase the amount of WAL.)
    Is that OK?
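    
    To make the WAL-volume concern concrete, the image above can be turned
    into a small standalone model in C. This is only a sketch, not
    PostgreSQL code: CheckpointModel, model_create_checkpoint() and
    model_report_parameters() are hypothetical names, LogStandbySnapshot()
    is left out, and only the FPW condition is modelled (the conditions
    elided as "..." above are not filled in).
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    
    /* Hypothetical model of the state the checkpoint code looks at. */
    typedef struct
    {
        bool standby_info_active; /* stands in for XLogStandbyInfoActive() */
        bool full_page_writes;    /* current setting on the master */
        int  parameter_records;   /* parameter-change records written so far */
    } CheckpointModel;
    
    /* Models XLogReportParameters(): write a record whenever FPW is off. */
    void model_report_parameters(CheckpointModel *m)
    {
        assert(m != NULL);
        if (!m->full_page_writes)
            m->parameter_records++;
    }
    
    /* Models CreateCheckPoint(): report only for non-shutdown checkpoints. */
    void model_create_checkpoint(CheckpointModel *m, bool shutdown)
    {
        assert(m != NULL);
        if (!shutdown && m->standby_info_active)
            model_report_parameters(m);
    }
    ```
    
    With full_page_writes off, every non-shutdown checkpoint in this model
    adds one record, which is the extra WAL being asked about.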
    
    
    Regards.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  37. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-11T15:17:27Z

    > > I can't see a reason why we would use a new WAL record for this,
    > > rather than modify the XLOG_PARAMETER_CHANGE record type which was
    > > created for a very similar reason.
    > > The code would be much simpler if we just extend
    > > XLOG_PARAMETER_CHANGE, so please can we do that?
    > 
    > Sure.
    > 
    > > The log message "full_page_writes on master is set invalid more than
    > > once during online backup" should read "at least once" rather than
    > > "more than once".
    > 
    > Yes.
    > 
    > > lastFpwDisabledLSN needs to be initialized.
    > 
    > I think it doesn't need to be, because all values in XLogCtl are initialized to 0.
    > 
    > > Is there a reason to add lastFpwDisabledLSN onto the Control file? If
    > > we log parameters after every checkpoint then we'll know the values
    > > when we startup. If we keep logging parameters this way we'll end up
    > > with a very awkward and large control file. I would personally prefer
    > > to avoid that, but that thought could go either way. Let's see if
    > > anyone else thinks that also.
    > 
    > Yes. I will add it to CreateCheckPoint().
    > 
    > Image:
    >   CreateCheckPoint()
    >   {
    >      if (!shutdown && XLogStandbyInfoActive())
    >      {
    >         LogStandbySnapshot()
    >         XLogReportParameters()
    >      }
    >    }
    > 
    >   XLogReportParameters()
    >   {
    >      if (fpw == 'off' || ... )
    >          XLOGINSERT()
    >   }
    > 
    > However, it will write XLOG_PARAMETER_CHANGE at every checkpoint when FPW is 'off'.
    > (It will increase the amount of WAL.)
    > Is that OK?
    
    Done.
    
    Updated patch attached.
    
    Regards.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  38. Re: Online base backup from the hot-standby

    Steve Singer <ssinger_pg@sympatico.ca> — 2011-10-11T21:44:34Z

    On 11-10-11 11:17 AM, Jun Ishiduka wrote:
    > Done.
    >
    > Updated patch attached.
    >
    
    I have taken Jun's latest patch and applied it on top of Fujii's most
    recent patch. I did some testing with the result, but nothing thorough
    enough to stumble on any race conditions.
    
    Some testing notes
    ------------------------------
    select pg_start_backup('x');
    ERROR: full_page_writes on master is set invalid at least once since
    latest checkpoint
    
    I think this error should be rewritten as
    ERROR: full_page_writes on master has been off at some point since
    latest checkpoint
    
    We should be using 'off' instead of 'invalid' since that is what the
    user sets it to.
    
    
    I switched full_page_writes=on , on the master
    
    did a pg_start_backup() on the slave1.
    
    Then I switched full_page_writes=off on the master, did a reload +
    checkpoint.
    
    I was able to then do my backup of slave1, copy the control file, and
    pg_stop_backup().
    When I did the test slave2 started okay, but is this safe? Do we need a
    warning from pg_stop_backup() that is printed if it is detected that
    full_page_writes was turned off on the master during the backup period?
    
    
    Code Notes
    ---------------------
    *** 6865,6870 ****
    --- 6871,6886 ----
    /* Pre-scan prepared transactions to find out the range of XIDs present */
    oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
    
    + /*
    + * The startup updates FPW in shaerd-memory after REDO. However, it must
    + * perform before writing the WAL of the CHECKPOINT. The reason is that
    + * it uses a value of fpw in shared-memory when it writes a WAL of its
    + * CHECKPOTNT.
    + */
    
    Minor typo above at 'CHECKPOTNT'
    
    
    
    If my concern about full page writes being switched to off in the middle
    of a backup is unfounded then I think this patch is ready for a
    committer. They can clean the two editorial changes when they apply the
    patches.
    
    If do_pg_stop_backup is going to need some logic to recheck the full
    page write status then an updated patch is required.
    
    
    
    
    
    
    
  39. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-12T02:43:59Z

    > Some testing notes
    > ------------------------------
    > select pg_start_backup('x');
    > ERROR: full_page_writes on master is set invalid at least once since
    > latest checkpoint
    > 
    > I think this error should be rewritten as
    > ERROR: full_page_writes on master has been off at some point since
    > latest checkpoint
    > 
    > We should be using 'off' instead of 'invalid' since that is what the
    > user sets it to.
    
    Sure.
    
    
    > I switched full_page_writes=on , on the master
    > 
    > did a pg_start_backup() on the slave1.
    > 
    > Then I switched full_page_writes=off on the master, did a reload +
    > checkpoint.
    > 
    > I was able to then do my backup of slave1, copy the control file, and
    > pg_stop_backup().
    >
    > When I did the test slave2 started okay, but is this safe? Do we need a
    > warning from pg_stop_backup() that is printed if it is detected that
    > full_page_writes was turned off on the master during the backup period?
    
    I reproduced this as well.
    
    pg_stop_backup() fails in most cases.
    However, it succeeds if both of the following conditions are true:
      * the checkpoint is done before walwriter receives SIGHUP.
      * slave1 has not yet received the WAL record for the 'off' setting
        triggered by SIGHUP.
    
    
    
    > Minor typo above at 'CHECKPOTNT'
    
    Yes.
    
    
    > If my concern about full page writes being switched to off in the middle
    > of a backup is unfounded then I think this patch is ready for a
    > committer. They can clean the two editorial changes when they apply the
    > patches.
    
    Yes. I'll clean these up in line with the comments.
    
    
    > If do_pg_stop_backup is going to need some logic to recheck the full
    > page write status then an updated patch is required.
    
    The patch already contains that logic.
    
    
    Regards.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  40. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-12T07:27:02Z

    > > Some testing notes
    > > ------------------------------
    > > select pg_start_backup('x');
    > > ERROR: full_page_writes on master is set invalid at least once since
    > > latest checkpoint
    > > 
    > > I think this error should be rewritten as
    > > ERROR: full_page_writes on master has been off at some point since
    > > latest checkpoint
    > > 
    > > We should be using 'off' instead of 'invalid' since that is what the
    > > user sets it to.
    > 
    > Sure.
    
    
    > > Minor typo above at 'CHECKPOTNT'
    > 
    > Yes.
    
    
    I updated the patch to address the comments above.
    
    Regards.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  41. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-12T07:53:52Z

    2011/10/12 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    > > ERROR: full_page_writes on master is set invalid at least once since
    > > latest checkpoint
    > >
    > > I think this error should be rewritten as
    > > ERROR: full_page_writes on master has been off at some point since
    > > latest checkpoint
    > >
    > > We should be using 'off' instead of 'invalid' since that is what the
    > > user sets it to.
    >
    > Sure.
    
    What about the following message? It sounds more precise to me.
    
    ERROR: WAL generated with full_page_writes=off was replayed since last
    restartpoint
    
    > I updated to patch corresponded above-comments.
    
    Thanks for updating the patch! Here are the comments:
    
     	 * don't yet have the insert lock, forcePageWrites could change under us,
     	 * but we'll recheck it once we have the lock.
     	 */
    -	doPageWrites = fullPageWrites || Insert->forcePageWrites;
    +	doPageWrites = Insert->fullPageWrites || Insert->forcePageWrites;
    
    The source comment needs to be modified.
    
     	 * just turned off, we could recompute the record without full pages, but
     	 * we choose not to bother.)
     	 */
    -	if (Insert->forcePageWrites && !doPageWrites)
    +	if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites)
    
    Same as above.
    
    +	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    +	XLogCtl->Insert.fullPageWrites = fullPageWrites;
    +	LWLockRelease(WALInsertLock);
    
    I don't think WALInsertLock needs to be held here because there is no
    concurrently running process which can access Insert.fullPageWrites.
    For example, Insert->currpos and Insert->LogwrtResult are also changed
    without the lock there.
    
    The source comment of XLogReportParameters() needs to be modified.
    
    XLogReportParameters() should skip writing WAL if full_page_writes has not been
    changed by SIGHUP.
    
    XLogReportParameters() should skip updating pg_control if no parameter related
    to hot standby has been changed.
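    
    The first of these two points ("skip writing WAL if full_page_writes has
    not been changed") can be sketched as a small standalone model in C.
    This is not the patch's code: FpwReporter and report_fpw_if_changed()
    are hypothetical names, and the sketch assumes the last logged value is
    remembered somewhere so that it can be compared with the current
    setting.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    
    /* Hypothetical bookkeeping for the last FPW value written to WAL. */
    typedef struct
    {
        bool logged_fpw;  /* FPW value most recently written to WAL */
        int  wal_records; /* number of parameter-change records written */
    } FpwReporter;
    
    /* Returns true if a record was written, false if it was skipped. */
    bool report_fpw_if_changed(FpwReporter *r, bool current_fpw)
    {
        assert(r != NULL);
        if (r->logged_fpw == current_fpw)
            return false;         /* unchanged: no WAL needed */
        r->logged_fpw = current_fpw;
        r->wal_records++;         /* stands in for the real WAL insert */
        return true;
    }
    ```
    
    A SIGHUP that leaves full_page_writes untouched then costs no WAL in
    this model; only an actual on/off transition produces a record.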
    
    +	if (!fpw_manager)
    +	{
    +		LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    +		fpw = XLogCtl->Insert.fullPageWrites;
    +		LWLockRelease(WALInsertLock);
    
    It's safe to take WALInsertLock in shared mode here.
    
    In checkpoint, XLogReportParameters() is called only when wal_level is
    hot_standby.
    OTOH, in walwriter, it's always called even when wal_level is not hot_standby.
    Can't we skip calling XLogReportParameters() whenever wal_level is not
    hot_standby?
    
    In do_pg_start_backup() and do_pg_stop_backup(), the spinlock must be held to
    see XLogCtl->lastFpwDisabledLSN.
    
    +	/* check whether the master's FPW is 'off' since pg_start_backup. */
    +	if (recovery_in_progress && XLByteLE(startpoint, XLogCtl->lastFpwDisabledLSN))
    +		ereport(ERROR,
    +				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    +			  errmsg("full_page_writes on master has been off at some point during online backup")));
    
    What about changing the error message to:
    ERROR: WAL generated with full_page_writes=off was replayed during online backup
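    
    To illustrate the spinlock point together with the check above, here is
    a minimal standalone sketch in C. It is an assumption-laden model, not
    the patch: FpwSharedState, record_fpw_disabled() and
    fpw_off_seen_since() are hypothetical names, a C11 atomic_flag stands in
    for the server's own spinlock, and a plain 64-bit integer stands in for
    the LSN type. The comparison mirrors XLByteLE(startpoint,
    lastFpwDisabledLSN), that is, startpoint <= lastFpwDisabledLSN.
    
    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    
    /* Hypothetical stand-in for an LSN. */
    typedef uint64_t Lsn;
    
    /* Hypothetical shared state protected by a simple spinlock. */
    typedef struct
    {
        atomic_flag lock;
        Lsn         last_fpw_disabled_lsn;
    } FpwSharedState;
    
    static void state_lock(FpwSharedState *s)
    {
        while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
            ;                       /* spin until the flag is released */
    }
    
    static void state_unlock(FpwSharedState *s)
    {
        atomic_flag_clear_explicit(&s->lock, memory_order_release);
    }
    
    /* Replay side: remember where FPW=off WAL was last seen. */
    void record_fpw_disabled(FpwSharedState *s, Lsn lsn)
    {
        assert(s != NULL);
        state_lock(s);
        s->last_fpw_disabled_lsn = lsn;
        state_unlock(s);
    }
    
    /* Backup side: read the value under the lock, then compare. */
    bool fpw_off_seen_since(FpwSharedState *s, Lsn startpoint)
    {
        Lsn last;
    
        assert(s != NULL);
        state_lock(s);
        last = s->last_fpw_disabled_lsn;
        state_unlock(s);
        return startpoint <= last;
    }
    ```
    
    The value is copied out while the lock is held and compared afterwards,
    so the lock is held only for the duration of a single read or write.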
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  42. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-13T04:31:27Z

    > > > ERROR: full_page_writes on master is set invalid at least once since
    > > > latest checkpoint
    > > >
    > > > I think this error should be rewritten as
    > > > ERROR: full_page_writes on master has been off at some point since
    > > > latest checkpoint
    > > >
    > > > We should be using 'off' instead of 'invalid' since that is what the
    > > > user sets it to.
    > >
    > > Sure.
    > 
    > What about the following message? It sounds more precise to me.
    > 
    > ERROR: WAL generated with full_page_writes=off was replayed since last
    > restartpoint
    
    Okay, I changed the patch to use this message.
    If someone has a better idea, I will reconsider.
    
    
    > > I updated to patch corresponded above-comments.
    > 
    > Thanks for updating the patch! Here are the comments:
    > 
    >  	 * don't yet have the insert lock, forcePageWrites could change under us,
    >  	 * but we'll recheck it once we have the lock.
    >  	 */
    > -	doPageWrites = fullPageWrites || Insert->forcePageWrites;
    > +	doPageWrites = Insert->fullPageWrites || Insert->forcePageWrites;
    > 
    > The source comment needs to be modified.
    >
    >  	 * just turned off, we could recompute the record without full pages, but
    >  	 * we choose not to bother.)
    >  	 */
    > -	if (Insert->forcePageWrites && !doPageWrites)
    > +	if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites)
    > 
    > Same as above.
    
    Sure.
    
    
    > XLogReportParameters() should skip writing WAL if full_page_writes has not been
    > changed by SIGHUP.
    > 
    > XLogReportParameters() should skip updating pg_control if any parameter related
    > to hot standby has not been changed.
    
    YES.
    
    
    > In checkpoint, XLogReportParameters() is called only when wal_level is
    > hot_standby.
    > OTOH, in walwriter, it's always called even when wal_level is not hot_standby.
    > Can't we skip calling XLogReportParameters() whenever wal_level is not
    > hot_standby?
    
    Yes, it is possible.
    
    
    > In do_pg_start_backup() and do_pg_stop_backup(), the spinlock must be held to
    > see XLogCtl->lastFpwDisabledLSN.
    
    Yes.
    
    
    > What about changing the error message to:
    > ERROR: WAL generated with full_page_writes=off was replayed during online backup
    
    Okay, too.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  43. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-13T05:01:38Z

    Sorry.
    I was not able to answer all of Fujii's comments earlier.
    Here are the remaining answers.
    
    
    > +	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    > +	XLogCtl->Insert.fullPageWrites = fullPageWrites;
    > +	LWLockRelease(WALInsertLock);
    > 
    > I don't think WALInsertLock needs to be held here because there is no
    > concurrently running process which can access Insert.fullPageWrites.
    > For example, Insert->currpos and Insert->LogwrtResult are also changed
    > without the lock there.
    > 
    
    Yes. 
    
    > The source comment of XLogReportParameters() needs to be modified.
    
    Yes, too.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  44. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-13T09:39:09Z

    > 
    > > > > ERROR: full_page_writes on master is set invalid at least once since
    > > > > latest checkpoint
    > > > >
    > > > > I think this error should be rewritten as
    > > > > ERROR: full_page_writes on master has been off at some point since
    > > > > latest checkpoint
    > > > >
    > > > > We should be using 'off' instead of 'invalid' since that is what is what
    > > > > the user sets it to.
    > > >
    > > > Sure.
    > > 
    > > What about the following message? It sounds more precise to me.
    > > 
    > > ERROR: WAL generated with full_page_writes=off was replayed since last
    > > restartpoint
    > 
    > Okay, I changes the patch to this messages.
    > If someone says there is a idea better than it, I will consider again.
    > 
    > 
    > > > I updated to patch corresponded above-comments.
    > > 
    > > Thanks for updating the patch! Here are the comments:
    > > 
    > >  	 * don't yet have the insert lock, forcePageWrites could change under us,
    > >  	 * but we'll recheck it once we have the lock.
    > >  	 */
    > > -	doPageWrites = fullPageWrites || Insert->forcePageWrites;
    > > +	doPageWrites = Insert->fullPageWrites || Insert->forcePageWrites;
    > > 
    > > The source comment needs to be modified.
    > >
    > >  	 * just turned off, we could recompute the record without full pages, but
    > >  	 * we choose not to bother.)
    > >  	 */
    > > -	if (Insert->forcePageWrites && !doPageWrites)
    > > +	if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites)
    > > 
    > > Same as above.
    > 
    > Sure.
    > 
    > 
    > > XLogReportParameters() should skip writing WAL if full_page_writes has not been
    > > changed by SIGHUP.
    > > 
    > > XLogReportParameters() should skip updating pg_control if any parameter related
    > > to hot standby has not been changed.
    > 
    > YES.
    > 
    > 
    > > In checkpoint, XLogReportParameters() is called only when wal_level is
    > > hot_standby.
    > > OTOH, in walwriter, it's always called even when wal_level is not hot_standby.
    > > Can't we skip calling XLogReportParameters() whenever wal_level is not
    > > hot_standby?
    > 
    > Yes, It is possible.
    > 
    > 
    > > In do_pg_start_backup() and do_pg_stop_backup(), the spinlock must be held to
    > > see XLogCtl->lastFpwDisabledLSN.
    > 
    > Yes.
    > 
    > 
    > > What about changing the error message to:
    > > ERROR: WAL generated with full_page_writes=off was replayed during online backup
    > 
    > Okay, too.
    
    > Sorry.
    > I was not previously able to answer fujii's all comments.
    > This is the remaining answers.
    > 
    > 
    > > +	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    > > +	XLogCtl->Insert.fullPageWrites = fullPageWrites;
    > > +	LWLockRelease(WALInsertLock);
    > > 
    > > I don't think WALInsertLock needs to be hold here because there is no
    > > concurrently running process which can access Insert.fullPageWrites.
    > > For example, Insert->currpos and Insert->LogwrtResult are also changed
    > > without the lock there.
    > > 
    > 
    > Yes. 
    > 
    > > The source comment of XLogReportParameters() needs to be modified.
    > 
    > Yes, too.
    
    Done.
    I updated the patch to address the comments above.
    
    Regards.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  45. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-13T13:44:50Z

    On Mon, Oct 10, 2011 at 3:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > 2011/10/9 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >
    >>  Insert WAL including a value of current FPW (on master)
    >>   * In the the same timing as update, they insert WAL (is named
    >>     XLOG_FPW_CHANGE). XLOG_FPW_CHANGE has a value of the changed FPW.
    >>   * When it creates CHECKPOINT, it adds a value of current FPW to the
    >>     CHECKPOINT WAL.
    >
    > I can't see a reason why we would use a new WAL record for this,
    > rather than modify the XLOG_PARAMETER_CHANGE record type which was
    > created for a very similar reason.
    > The code would be much simpler if we just extend
    > XLOG_PARAMETER_CHANGE, so please can we do that?
    
    After reading Ishiduka-san's patch, I'm thinking the opposite because
    (1) whenever full_page_writes must be WAL-logged, there is no need
    to WAL-log the HS parameters, and vice versa, and (2) how a
    full_page_writes record should be replayed is quite different from
    how an HS parameters record is replayed.
    
    So ISTM that the code would be simpler if we introduce a new WAL
    record for full_page_writes. Thoughts?
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  46. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-14T12:28:29Z

    2011/10/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    > I updated to patch corresponded above-comments.
    
    Thanks for updating the patch!
    
    As I suggested in the reply to Simon, I think that a change of FPW
    should be WAL-logged separately from a change of the HS parameters. ISTM
    packing them into one WAL record makes XLogReportParameters()
    quite confusing. Thoughts?
    
     	if (!shutdown && XLogStandbyInfoActive())
    +	{
     		LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
    +		XLogReportParameters(REPORT_ON_BACKEND);
    +	}
    
    Why doesn't the change of FPW need to be WAL-logged when a
    shutdown checkpoint is performed? It would be helpful to add a comment
    explaining why.
    
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  47. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-15T01:35:45Z

    > As I suggested in the reply to Simon, I think that the change of FPW
    > should be WAL-logged separately from that of HS parameters. ISTM
    > packing them in one WAL record makes XLogReportParameters()
    > quite confusing. Thought?
    
    I would like to wait for Simon's reply. I don't think we can decide how
    this code should look without it.
    
    
    >  	if (!shutdown && XLogStandbyInfoActive())
    > +	{
    >  		LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
    > +		XLogReportParameters(REPORT_ON_BACKEND);
    > +	}
    > 
    > Why doesn't the change of FPW need to be WAL-logged when
    > shutdown checkpoint is performed? It's helpful to add the comment
    > explaining why.
    
    Sure. I will update the patch soon.
    
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  48. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-15T02:12:36Z

    > >  	if (!shutdown && XLogStandbyInfoActive())
    > > +	{
    > >  		LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
    > > +		XLogReportParameters(REPORT_ON_BACKEND);
    > > +	}
    > > 
    > > Why doesn't the change of FPW need to be WAL-logged when
    > > shutdown checkpoint is performed? It's helpful to add the comment
    > > explaining why.
    > 
    > Sure. I update the patch soon.
    
    Done.
    Please check this.
    
    Regards.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  49. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-17T07:16:05Z

    2011/10/15 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >
    >> >     if (!shutdown && XLogStandbyInfoActive())
    >> > +   {
    >> >             LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
    >> > +           XLogReportParameters(REPORT_ON_BACKEND);
    >> > +   }
    >> >
    >> > Why doesn't the change of FPW need to be WAL-logged when
    >> > shutdown checkpoint is performed? It's helpful to add the comment
    >> > explaining why.
    >>
    >> Sure. I update the patch soon.
    >
    > Done.
    
    + 		/*
    + 		 * The backend writes WAL of FPW at checkpoint. However, The backend do
    + 		 * not need to write WAL of FPW at checkpoint shutdown because it
    + 		 * performs when startup finishes.
    + 		 */
    + 		XLogReportParameters(REPORT_ON_BACKEND);
    
    I'm still unclear why that WAL doesn't need to be written at shutdown
    checkpoint.
    Anyway, the first sentence in the above comments is not right. Not a backend but
    a bgwriter writes that WAL at checkpoint.
    
    The second also seems not to be right. It implies that a shutdown checkpoint is
    performed only at end of startup. But it may be done when smart or fast shutdown
    is requested.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  50. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-18T06:25:46Z

    > + 		/*
    > + 		 * The backend writes WAL of FPW at checkpoint. However, The backend do
    > + 		 * not need to write WAL of FPW at checkpoint shutdown because it
    > + 		 * performs when startup finishes.
    > + 		 */
    > + 		XLogReportParameters(REPORT_ON_BACKEND);
    > 
    > I'm still unclear why that WAL doesn't need to be written at shutdown
    > checkpoint.
    > Anyway, the first sentence in the above comments is not right. Not a backend but
    > a bgwriter writes that WAL at checkpoint.
    > 
    > The second also seems not to be right. It implies that a shutdown checkpoint is
    > performed only at end of startup. But it may be done when smart or fast shutdown
    > is requested.
    
    
    Okay.
    I will change the comment to the following.
    
    /* 
     * The bgwriter writes WAL of FPW at checkpoint. But does not at shutdown.
     * Because XLogReportParameters() is always called at the end of startup
     * process, it does not need to be called at shutdown.
     */
    
    
    In addition, I will change the macro name.
    
    REPORT_ON_BACKEND -> REPORT_ON_BGWRITER
    
    
    Regards.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
    
    
    
  51. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-19T02:47:08Z

    > > + 		/*
    > > + 		 * The backend writes WAL of FPW at checkpoint. However, The backend do
    > > + 		 * not need to write WAL of FPW at checkpoint shutdown because it
    > > + 		 * performs when startup finishes.
    > > + 		 */
    > > + 		XLogReportParameters(REPORT_ON_BACKEND);
    > > 
    > > I'm still unclear why that WAL doesn't need to be written at shutdown
    > > checkpoint.
    > > Anyway, the first sentence in the above comments is not right. Not a backend but
    > > a bgwriter writes that WAL at checkpoint.
    > > 
    > > The second also seems not to be right. It implies that a shutdown checkpoint is
    > > performed only at end of startup. But it may be done when smart or fast shutdown
    > > is requested.
    > 
    > 
    > Okay. 
    > I change to the following messages.
    > 
    > /* 
    >  * The bgwriter writes WAL of FPW at checkpoint. But does not at shutdown.
    >  * Because XLogReportParameters() is always called at the end of startup
    >  * process, it does not need to be called at shutdown.
    >  */
    > 
    > 
    > In addition, I change macro name.
    > 
    > REPORT_ON_BACKEND -> REPORT_ON_BGWRITER
    
    I have updated the patch as described above.
    Please check this.
    
    Regards.
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  52. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-19T07:37:32Z

    > As I suggested in the reply to Simon, I think that the change of FPW
    > should be WAL-logged separately from that of HS parameters. ISTM
    > packing them in one WAL record makes XLogReportParameters()
    > quite confusing. Thought?
    
    I updated the patch as you suggested (so that the change of FPW
    is WAL-logged separately from that of the HS parameters).
    
    I would like to use this patch as the base going forward if there are no
    other opinions.
    
    Regards.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  53. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-24T12:29:20Z

    2011/10/19 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
    >> As I suggested in the reply to Simon, I think that the change of FPW
    >> should be WAL-logged separately from that of HS parameters. ISTM
    >> packing them in one WAL record makes XLogReportParameters()
    >> quite confusing. Thought?
    >
    > I updated a patch for what you have suggested (that the change of FPW
    > should be WAL-logged separately from that of HS parameters).
    >
    > I want to base on this patch if there are no other opinions.
    
    Thanks for updating the patch!
    
    Attached is the updated version of the patch. I merged your patch into
    standby_online_backup_09_fujii.patch, refactored the code, fixed some
    bugs, added lots of source code comments, but didn't change the basic
    design that you proposed.
    
    In your patch, FPW is always WAL-logged at startup even when FPW has
    not been changed since the last shutdown. I don't think that's required.
    I changed the recovery code so that it keeps track of the last FPW value
    indicated by a WAL record. Then, at the end of startup, if that value is
    equal to the FPW specified in postgresql.conf (which means that FPW has
    not been changed since the last shutdown or crash), WAL-logging of FPW is
    skipped. This change prevents unnecessary WAL-logging. Thoughts?
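    
    To illustrate the bookkeeping described above, here is a minimal
    stand-alone sketch in C. It is not code from the patch: the FpwTracker
    struct and the three function names are hypothetical, and the real logic
    lives in the recovery code in xlog.c. It only models "remember the last
    FPW value seen in WAL, and log a new record at the end of startup only if
    the configured value differs".
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    
    /* Hypothetical model: the last full_page_writes value seen during replay. */
    typedef struct FpwTracker
    {
        bool last_replayed_fpw;
    } FpwTracker;
    
    /* Start from the FPW value recorded in the checkpoint we recover from. */
    static void
    fpw_tracker_init(FpwTracker *t, bool fpw_at_checkpoint)
    {
        assert(t != NULL);
        t->last_replayed_fpw = fpw_at_checkpoint;
    }
    
    /* Called whenever replay meets a record that carries a new FPW value. */
    static void
    fpw_tracker_replay_change(FpwTracker *t, bool new_fpw)
    {
        assert(t != NULL);
        t->last_replayed_fpw = new_fpw;
    }
    
    /*
     * At the end of startup: a new FPW record is needed only if the value in
     * the configuration differs from the last value that WAL told us about.
     */
    static bool
    fpw_needs_logging(const FpwTracker *t, bool configured_fpw)
    {
        assert(t != NULL);
        return t->last_replayed_fpw != configured_fpw;
    }
    ```
    
    With this model, a server that replays "FPW turned off" and then starts
    with full_page_writes = on would log the change, while one that starts
    with the same value it last replayed would skip the record.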
    
    Is the patch in good enough shape to mark as ready-for-committer? It would
    be very helpful if you could review the patch.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  54. Re: Online base backup from the hot-standby

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-24T15:24:28Z

    On 24.10.2011 15:29, Fujii Masao wrote:
    > +    <listitem>
    > +     <para>
    > +      Copy the pg_control file from the cluster directory to the global
    > +      sub-directory of the backup. For example:
    > + <programlisting>
    > + cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    > + </programlisting>
    > +     </para>
    > +    </listitem>
    
    Why is this step required? The control file is overwritten by 
    information from the backup_label anyway, no?
    
    > +    <listitem>
    > +     <para>
    > +      Again connect to the database as a superuser, and execute
    > +      <function>pg_stop_backup</>. This terminates the backup mode, but does not
    > +      perform a switch to the next WAL segment, create a backup history file and
    > +      wait for all required WAL segments to be archived,
    > +      unlike that during normal processing.
    > +     </para>
    > +    </listitem>
    
    How do you ensure that all the required WAL segments have been archived, 
    then?
    
    > +   </orderedlist>
    > +    </para>
    > +
    > +    <para>
    > +     You cannot use the <application>pg_basebackup</> tool to take the backup
    > +     from the standby.
    > +    </para>
    
    Why not? We have cascading replication now.
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com
    
    
  55. Re: Online base backup from the hot-standby

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-24T15:33:40Z

    On 24.10.2011 15:29, Fujii Masao wrote:
    > In your patch, FPW is always WAL-logged at startup even when FPW has
    > not been changed since last shutdown. I don't think that's required.
    > I changed the recovery code so that it keeps track of last FPW indicated
    > by WAL record. Then, at end of startup, if that FPW is equal to FPW
    > specified in postgresql.conf (which means that FPW has not been changed
    > since last shutdown or crash), WAL-logging of FPW is skipped. This change
    > prevents unnecessary WAL-logging. Thought?
    
    One problem with this whole FPW-tracking is that pg_lesslog makes it 
    fail. I'm not sure what we need to do about that - maybe just add a 
    warning to the docs. But it leaves a bit of a bad taste in my mouth. 
    Usually we try to make features work orthogonally, without dependencies 
    on other settings. Now this feature requires that full_page_writes is 
    turned on in the master, and also that you don't use pg_lesslog to 
    compress the WAL segments or your base backup might be corrupt. The 
    procedure to take a backup from the standby seems more complicated than 
    taking it on the master - there are more steps to follow.
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com
    
    
  56. Re: Online base backup from the hot-standby

    Robert Haas <robertmhaas@gmail.com> — 2011-10-24T15:38:18Z

    On Mon, Oct 24, 2011 at 11:33 AM, Heikki Linnakangas
    <heikki.linnakangas@enterprisedb.com> wrote:
    > On 24.10.2011 15:29, Fujii Masao wrote:
    >>
    >> In your patch, FPW is always WAL-logged at startup even when FPW has
    >> not been changed since last shutdown. I don't think that's required.
    >> I changed the recovery code so that it keeps track of last FPW indicated
    >> by WAL record. Then, at end of startup, if that FPW is equal to FPW
    >> specified in postgresql.conf (which means that FPW has not been changed
    >> since last shutdown or crash), WAL-logging of FPW is skipped. This change
    >> prevents unnecessary WAL-logging. Thought?
    >
    > One problem with this whole FPW-tracking is that pg_lesslog makes it fail.
    > I'm not sure what we need to do about that - maybe just add a warning to the
    > docs. But it leaves a bit bad feeling in my mouth. Usually we try to make
    > features work orthogonally, without dependencies to other settings. Now this
    > feature requires that full_page_writes is turned on in the master, and also
    > that you don't use pg_lesslog to compress the WAL segments or your base
    > backup might be corrupt. The procedure to take a backup from the standby
    > seems more complicated than taking it on the master - there are more steps
    > to follow.
    
    Doing it on the master isn't as easy as I'd like it to be, either.
    
    But it's not really clear how to make it simpler.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  57. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-25T05:12:20Z

    Thanks for the review!
    
    On Tue, Oct 25, 2011 at 12:24 AM, Heikki Linnakangas
    <heikki.linnakangas@enterprisedb.com> wrote:
    > On 24.10.2011 15:29, Fujii Masao wrote:
    >>
    >> +    <listitem>
    >> +     <para>
    >> +      Copy the pg_control file from the cluster directory to the global
    >> +      sub-directory of the backup. For example:
    >> + <programlisting>
    >> + cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    >> + </programlisting>
    >> +     </para>
    >> +    </listitem>
    >
    > Why is this step required? The control file is overwritten by information
    > from the backup_label anyway, no?
    
    Yes, when recovery starts, the control file is overwritten. But before that,
    we retrieve the minimum recovery point from the control file. Then it's used
    as the backup end location.
    
    During recovery, pg_stop_backup() cannot write an end-of-backup record.
    So, in a standby-only backup, another way to retrieve the backup end
    location (instead of an end-of-backup record) is required. Ishiduka-san
    used the control file for that, according to your suggestion ;)
    http://archives.postgresql.org/pgsql-hackers/2011-05/msg01405.php
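    
    As a rough sketch of how the minimum recovery point stands in for the
    end-of-backup record, the check could be modelled like this in C. This
    is an illustration under assumptions, not the patch's code: the enum,
    the function name, and the use of a plain 64-bit integer for a WAL
    location are all hypothetical. The 0/0 case reflects the situation
    mentioned earlier in the thread, where the control file was copied after
    the standby had been promoted and so carries no usable end location.
    
    ```c
    #include <assert.h>
    #include <stdint.h>
    
    /* Hypothetical result codes for this illustration only. */
    typedef enum BackupEndState
    {
        BACKUP_END_UNKNOWN,     /* control file holds 0/0: no end location */
        BACKUP_END_NOT_REACHED, /* replay has not passed the end location yet */
        BACKUP_END_REACHED      /* replay is at or beyond the end location */
    } BackupEndState;
    
    /*
     * min_recovery_point: value read from the backed-up control file.
     * replayed_upto:      how far WAL replay has progressed.
     * Both are WAL locations flattened into 64-bit integers (an assumption).
     */
    static BackupEndState
    check_backup_end(uint64_t min_recovery_point, uint64_t replayed_upto)
    {
        if (min_recovery_point == 0)
            return BACKUP_END_UNKNOWN;
        if (replayed_upto >= min_recovery_point)
            return BACKUP_END_REACHED;
        return BACKUP_END_NOT_REACHED;
    }
    ```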
    
    >> +    <listitem>
    >> +     <para>
    >> +      Again connect to the database as a superuser, and execute
    >> +      <function>pg_stop_backup</>. This terminates the backup mode, but
    >> does not
    >> +      perform a switch to the next WAL segment, create a backup history
    >> file and
    >> +      wait for all required WAL segments to be archived,
    >> +      unlike that during normal processing.
    >> +     </para>
    >> +    </listitem>
    >
    > How do you ensure that all the required WAL segments have been archived,
    > then?
    
    The patch doesn't provide any capability to ensure that, IOW assumes that's
    a user responsibility. If a user wants to ensure that, he/she needs to calculate
    the backup start and end WAL files from the result of pg_start_backup()
    and pg_stop_backup() respectively, and needs to wait until those files have
    appeared in the archive. Also if the required WAL file has not been archived
    yet, a user might need to execute pg_switch_xlog() in the master.
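    
    For reference, the calculation of a WAL file name from a location could
    look roughly like the following C sketch. It assumes the default 16MB
    segment size and the usual 24-hex-digit file name layout (timeline, log
    id, segment); wal_file_name() and WAL_SEG_SIZE are hypothetical names
    for this illustration, and the timeline ID has to be supplied by the
    caller. Note that for the backup end location, a value that falls
    exactly on a segment boundary refers to the previous segment, which
    this sketch does not handle.
    
    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    
    #define WAL_SEG_SIZE (16u * 1024u * 1024u)  /* assumed default: 16MB */
    
    /*
     * Build the WAL segment file name that contains the location
     * xlogid/xrecoff (the two halves of "X/Y" as printed by
     * pg_start_backup() and pg_stop_backup()) on timeline tli.
     * buf must have room for 24 characters plus the terminating NUL.
     */
    static void
    wal_file_name(char *buf, size_t buflen,
                  uint32_t tli, uint32_t xlogid, uint32_t xrecoff)
    {
        assert(buf != NULL && buflen >= 25);
        snprintf(buf, buflen, "%08X%08X%08X",
                 (unsigned int) tli,
                 (unsigned int) xlogid,
                 (unsigned int) (xrecoff / WAL_SEG_SIZE));
    }
    ```
    
    A user would compute this name for both the start and end locations and
    then wait until every file in that range shows up in the archive.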
    
    If we change pg_stop_backup() so that, even during recovery, it waits until
    all required WAL files have been archived, we would need to WAL-log
    the completion of WAL archiving in the master. This enables the standby to
    check whether the specified WAL files have been archived. Should we change
    the patch in this way? But even if we do, you still might need to execute
    pg_switch_xlog() in the master additionally, and pg_stop_backup() might keep
    waiting indefinitely if the master is not making progress.
    
    >> +   </orderedlist>
    >> +    </para>
    >> +
    >> +    <para>
    >> +     You cannot use the <application>pg_basebackup</> tool to take the
    >> backup
    >> +     from the standby.
    >> +    </para>
    >
    > Why not? We have cascading replication now.
    
    Because no one has implemented that feature.
    
    Yeah, we have cascading replication, but without adopting the standby-only
    backup patch, pg_basebackup cannot execute do_pg_start_backup() and
    do_pg_stop_backup() during recovery. So we can regard the patch that
    Ishiduka-san proposed as the first step toward extending pg_basebackup so
    that it can take a backup from the standby.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  58. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-25T05:37:48Z

    On Tue, Oct 25, 2011 at 12:33 AM, Heikki Linnakangas
    <heikki.linnakangas@enterprisedb.com> wrote:
    > One problem with this whole FPW-tracking is that pg_lesslog makes it fail.
    > I'm not sure what we need to do about that - maybe just add a warning to the
    > docs. But it leaves a bit bad feeling in my mouth. Usually we try to make
    > features work orthogonally, without dependencies to other settings. Now this
    > feature requires that full_page_writes is turned on in the master, and also
    > that you don't use pg_lesslog to compress the WAL segments or your base
    > backup might be corrupt.
    
    Right, pg_lesslog users cannot use the documented procedure. They need to
    follow a more complex one:
    
    1. Execute pg_start_backup() in the master, and save its return value.
    2. Wait until the backup starting checkpoint record has been replayed
        in the standby. You can do this by comparing the return value of
        pg_start_backup() with pg_last_replay_location().
    3. Do the documented standby-only backup procedure.
    4. Execute pg_stop_backup() in the master.
    
    This is complicated, but I'm not sure how we can simplify it. Anyway, we can
    document this procedure for pg_lesslog users. Should we?
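    
    The comparison in step 2 has one pitfall worth spelling out: locations
    are printed as two hex numbers ("X/Y"), so comparing them as plain text
    gives wrong answers (for example, "0/A000000" sorts after "0/10000000"
    as a string but is the smaller location). A minimal C sketch of a
    numeric comparison follows; parse_lsn() and standby_has_replayed() are
    hypothetical helper names, not existing functions.
    
    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    
    /* Parse "X/Y" (two hex numbers) into one 64-bit value; 0 on success. */
    static int
    parse_lsn(const char *text, uint64_t *lsn)
    {
        unsigned int hi, lo;
        char         junk;
    
        assert(text != NULL && lsn != NULL);
        /* Exactly two fields must match; trailing characters are an error. */
        if (sscanf(text, "%X/%X%c", &hi, &lo, &junk) != 2)
            return -1;
        *lsn = ((uint64_t) hi << 32) | lo;
        return 0;
    }
    
    /*
     * Returns 1 if the standby has replayed up to (or past) the location
     * returned by pg_start_backup() on the master, 0 if not yet, and -1 if
     * either string cannot be parsed.
     */
    static int
    standby_has_replayed(const char *start_lsn_text, const char *replay_lsn_text)
    {
        uint64_t start_lsn, replay_lsn;
    
        if (parse_lsn(start_lsn_text, &start_lsn) != 0 ||
            parse_lsn(replay_lsn_text, &replay_lsn) != 0)
            return -1;
        return replay_lsn >= start_lsn ? 1 : 0;
    }
    ```
    
    A script would poll pg_last_replay_location() on the standby and start
    step 3 only once this check succeeds.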
    
    > The procedure to take a backup from the standby
    > seems more complicated than taking it on the master - there are more steps
    > to follow.
    
    Extending pg_basebackup so that it can take a backup from the standby would
    simplify the procedure to a certain extent, I think, though a user would
    still need to enable FPW in the master and must not use pg_lesslog.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  59. Re: Online base backup from the hot-standby

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-25T06:44:30Z

    On 25.10.2011 08:12, Fujii Masao wrote:
    > On Tue, Oct 25, 2011 at 12:24 AM, Heikki Linnakangas
    > <heikki.linnakangas@enterprisedb.com>  wrote:
    >> On 24.10.2011 15:29, Fujii Masao wrote:
    >>>
    >>> +<listitem>
    >>> +<para>
    >>> +      Copy the pg_control file from the cluster directory to the global
    >>> +      sub-directory of the backup. For example:
    >>> +<programlisting>
    >>> + cp $PGDATA/global/pg_control /mnt/server/backupdir/global
    >>> +</programlisting>
    >>> +</para>
    >>> +</listitem>
    >>
    >> Why is this step required? The control file is overwritten by information
    >> from the backup_label anyway, no?
    >
    > Yes, when recovery starts, the control file is overwritten. But before that,
    > we retrieve the minimum recovery point from the control file. Then it's used
    > as the backup end location.
    >
    > During recovery, pg_stop_backup() cannot write an end-of-backup record.
    > So, in standby-only backup, other way to retrieve the backup end location
    > (instead of an end-of-backup record) is required. Ishiduka-san used the
    > control file as that, according to your suggestion ;)
    > http://archives.postgresql.org/pgsql-hackers/2011-05/msg01405.php
    
    Oh :-)
    
    >>> +<para>
    >>> +      Again connect to the database as a superuser, and execute
    >>> +<function>pg_stop_backup</>. This terminates the backup mode, but
    >>> does not
    >>> +      perform a switch to the next WAL segment, create a backup history
    >>> file and
    >>> +      wait for all required WAL segments to be archived,
    >>> +      unlike that during normal processing.
    >>> +</para>
    >>> +</listitem>
    >>
    >> How do you ensure that all the required WAL segments have been archived,
    >> then?
    >
    > The patch doesn't provide any capability to ensure that, IOW assumes that's
    > a user responsibility. If a user wants to ensure that, he/she needs to calculate
    > the backup start and end WAL files from the result of pg_start_backup()
    > and pg_stop_backup() respectively, and needs to wait until those files have
    > appeared in the archive. Also if the required WAL file has not been archived
    > yet, a user might need to execute pg_switch_xlog() in the master.
    
    Frankly, I think this whole thing is too fragile. The procedure is 
    superficially similar to what you do on master: run pg_start_backup(), 
    rsync data directory, run pg_stop_backup(), but is actually subtly 
    different and more complicated. If you don't know that, and don't follow 
    the full procedure, you get a corrupt backup. And the backup might look 
    ok, and might even sometimes work, which means that you won't notice in 
    quick testing. That's a *huge* foot-gun.
    
    I think we need to step back and find a way to make this:
    a) less complicated, or at least
    b) more robust, so that if you don't follow the procedure, you get an error.
    
    With pg_basebackup, we have a fighting chance of getting this right, 
    because we have more control over how the backup is made. For example, 
    we can co-operate with the buffer manager to avoid torn-pages, 
    eliminating the need for full_page_writes=on, and we can include a 
    control file with the correct end-of-backup location automatically, 
    without requiring user intervention. pg_basebackup is less flexible than 
    the pg_start/stop_backup method, and unfortunately you're more likely to 
    need the flexibility in a more complicated setup with a hot standby 
    server and all, but making the generic pg_start/stop_backup method work 
    seems infeasible at the moment.
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com
    
    
  60. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-25T08:50:10Z

    On Tue, Oct 25, 2011 at 3:44 PM, Heikki Linnakangas
    <heikki.linnakangas@enterprisedb.com> wrote:
    >>>> +<para>
    >>>> +      Again connect to the database as a superuser, and execute
    >>>> +<function>pg_stop_backup</>. This terminates the backup mode, but
    >>>> does not
    >>>> +      perform a switch to the next WAL segment, create a backup history
    >>>> file and
    >>>> +      wait for all required WAL segments to be archived,
    >>>> +      unlike that during normal processing.
    >>>> +</para>
    >>>> +</listitem>
    >>>
    >>> How do you ensure that all the required WAL segments have been archived,
    >>> then?
    >>
    >> The patch doesn't provide any capability to ensure that, IOW assumes
    >> that's
    >> a user responsibility. If a user wants to ensure that, he/she needs to
    >> calculate
    >> the backup start and end WAL files from the result of pg_start_backup()
    >> and pg_stop_backup() respectively, and needs to wait until those files
    >> have
    >> appeared in the archive. Also if the required WAL file has not been
    >> archived
    >> yet, a user might need to execute pg_switch_xlog() in the master.
    >
    > Frankly, I think this whole thing is too fragile. The procedure is
    > superficially similar to what you do on master: run pg_start_backup(), rsync
    > data directory, run pg_stop_backup(), but is actually subtly different and
    > more complicated. If you don't know that, and don't follow the full
    > procedure, you get a corrupt backup. And the backup might look ok, and might
    > even sometimes work, which means that you won't notice in quick testing.
    > That's a *huge* foot-gun.
    >
    > I think we need to step back and find a way to make this:
    > a) less complicated, or at least
    > b) more robust, so that if you don't follow the procedure, you get an error.
    
    One idea to make this more robust is to change PostgreSQL so that
    it writes buffer pages to a temporary space instead of the database files
    during a backup. This means that there are no torn pages in the database
    files of the backup. After the backup, the data blocks are written back to
    the database files over time. When recovery starts from that backup (i.e.,
    backup_label is found), it first clears the temporary space in the backup
    and continues recovery by using the database files, which contain no torn
    pages. OTOH, in crash recovery (i.e., backup_label is not found), recovery
    is performed by using both the database files and the temporary space. This
    whole approach would make the standby-only backup available even if FPW is
    disabled in the master, and you would not need to care about the order in
    which the control file is backed up.
    
    But this idea looks like overkill. It seems very complicated to implement,
    and likely to invite other bugs. I don't have any other good and simple
    idea for now.
    
    > With pg_basebackup, we have a fighting chance of getting this right, because
    > we have more control over how the backup is made. For example, we can
    > co-operate with the buffer manager to avoid torn-pages, eliminating the need
    > for full_page_writes=on, and we can include a control file with the correct
    > end-of-backup location automatically, without requiring user intervention.
    > pg_basebackup is less flexible than the pg_start/stop_backup method, and
    > unfortunately you're more likely to need the flexibility in a more
    > complicated setup with a hot standby server and all, but making the generic
    > pg_start/stop_backup method work seems infeasible at the moment.
    
    Yes, so should we give up supporting the manual procedure, and extend
    pg_basebackup for the standby-only backup first? I can live with this.
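    
    For reference, the manual check described in the quoted text (work out the
    WAL files containing the locations returned by pg_start_backup() and
    pg_stop_backup(), then wait until they appear in the archive) could be
    sketched roughly as below. This is only an illustration, assuming the
    default 16MB segment size. wal_file_name() and missing_from_archive() are
    made-up helper names, the archive path is a placeholder, and the server's
    own pg_xlogfile_name() remains the authoritative way to map a location to
    a file name.

```python
import os

# Assumption: default WAL segment size of 16MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(timeline, location):
    """Map a WAL location such as '0/21000000' to a 24-character file name.

    Hypothetical helper: plain integer division is used, so a location that
    falls exactly on a segment boundary is attributed to the segment that
    starts there.
    """
    log_id_hex, offset_hex = location.split("/")
    log_id = int(log_id_hex, 16)
    segment = int(offset_hex, 16) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, log_id, segment)


def missing_from_archive(archive_dir, file_names):
    """Return the WAL file names that are not yet present in archive_dir."""
    return [name for name in file_names
            if not os.path.exists(os.path.join(archive_dir, name))]


if __name__ == "__main__":
    # Example locations only; real values come from pg_start_backup()
    # and pg_stop_backup().
    needed = [wal_file_name(1, "0/1F000020"), wal_file_name(1, "0/21000000")]
    print(needed)
    # Placeholder archive directory; poll until this list is empty.
    print(missing_from_archive("/path/to/archive", needed))
```

    A script would have to repeat the second call until it returns an empty
    list, which is exactly the kind of extra step that is easy to forget.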
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  61. Re: Online base backup from the hot-standby

    Magnus Hagander <magnus@hagander.net> — 2011-10-25T10:19:33Z

    On Tue, Oct 25, 2011 at 10:50, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Tue, Oct 25, 2011 at 3:44 PM, Heikki Linnakangas
    > <heikki.linnakangas@enterprisedb.com> wrote:
    >>>>> +<para>
    >>>>> +      Again connect to the database as a superuser, and execute
    >>>>> +<function>pg_stop_backup</>. This terminates the backup mode, but
    >>>>> does not
    >>>>> +      perform a switch to the next WAL segment, create a backup history
    >>>>> file and
    >>>>> +      wait for all required WAL segments to be archived,
    >>>>> +      unlike that during normal processing.
    >>>>> +</para>
    >>>>> +</listitem>
    >>>>
    >>>> How do you ensure that all the required WAL segments have been archived,
    >>>> then?
    >>>
    >>> The patch doesn't provide any capability to ensure that, IOW assumes
    >>> that's
    >>> a user responsibility. If a user wants to ensure that, he/she needs to
    >>> calculate
    >>> the backup start and end WAL files from the result of pg_start_backup()
    >>> and pg_stop_backup() respectively, and needs to wait until those files
    >>> have
    >>> appeared in the archive. Also if the required WAL file has not been
    >>> archived
    >>> yet, a user might need to execute pg_switch_xlog() in the master.
    >>
    >> Frankly, I think this whole thing is too fragile. The procedure is
    >> superficially similar to what you do on master: run pg_start_backup(), rsync
    >> data directory, run pg_stop_backup(), but is actually subtly different and
    >> more complicated. If you don't know that, and don't follow the full
    >> procedure, you get a corrupt backup. And the backup might look ok, and might
    >> even sometimes work, which means that you won't notice in quick testing.
    >> That's a *huge* foot-gun.
    >>
    >> I think we need to step back and find a way to make this:
    >> a) less complicated, or at least
    >> b) more robust, so that if you don't follow the procedure, you get an error.
    >
    > One idea to make the way more robust is to change the PostgreSQL so that
    > it writes the buffer page to a temporary space instead of database file
    > during a backup. This means that there is no torn-pages in the database files
    > of the backup. After backup, the data blocks are written back to the database
    > files over time. When recovery starts from that backup(i.e., backup_label is
    > found), it clears the temporary space in the backup first and continues recovery
    > by using the database files which contain no torn-pages. OTOH,
    > in crash recovery (i.e., backup_label is not found), recovery is performed by
    > using both database files and temporary space. This whole approach would
    > make the standby-only backup available even if FPW is disabled in the master
    > and you don't care about the order to backup the control file.
    >
    > But this idea looks overkill. It seems very complicated to implement that, and
    > likely to invite other bugs. I don't have any other good and simple
    > idea for now.
    >
    >> With pg_basebackup, we have a fighting chance of getting this right, because
    >> we have more control over how the backup is made. For example, we can
    >> co-operate with the buffer manager to avoid torn-pages, eliminating the need
    >> for full_page_writes=on, and we can include a control file with the correct
    >> end-of-backup location automatically, without requiring user intervention.
    >> pg_basebackup is less flexible than the pg_start/stop_backup method, and
    >> unfortunately you're more likely to need the flexibility in a more
    >> complicated setup with a hot standby server and all, but making the generic
    >> pg_start/stop_backup method work seems infeasible at the moment.
    >
    > Yes, so we should give up supporting manual procedure? And extend
    > pg_basebackup for the standby-only backup, first? I can live with this.
    
    I don't think we should necessarily give up completely. But doing a
    pg_basebackup way *first* seems reasonable - because it's going to be
    the easiest one to "get right", given that we have more control there.
    Doesn't mean we shouldn't extend it in the future...
    
    -- 
     Magnus Hagander
     Me: http://www.hagander.net/
     Work: http://www.redpill-linpro.com/
    
    
  62. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-25T11:54:58Z

    On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
    > I don't think we should necessarily give up completely. But doing a
    > pg_basebackup way *first* seems reasonable - because it's going to be
    > the easiest one to "get right", given that we have more control there.
    > Doesn't mean we shouldn't extend it in the future...
    
    Agreed. The question is -- how far should we change pg_basebackup to
    "get it right"? I think it's not difficult to change it so that it backs up
    the control file at the end. But eliminating the need for full_page_writes=on
    does not seem easy, no? So I'm not inclined to do that, at least in the first commit.
    Otherwise, I'm afraid the patch would become huge.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  63. Re: Online base backup from the hot-standby

    Magnus Hagander <magnus@hagander.net> — 2011-10-25T12:03:57Z

    On Tue, Oct 25, 2011 at 13:54, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
    >> I don't think we should necessarily give up completely. But doing a
    >> pg_basebackup way *first* seems reasonable - because it's going to be
    >> the easiest one to "get right", given that we have more control there.
    >> Doesn't mean we shouldn't extend it in the future...
    >
    > Agreed. The question is -- how far should we change pg_basebackup to
    > "get right"? I think it's not difficult to change it so that it backs up
    > the control file at the end. But eliminating the need for full_page_writes=on
    > seems not easy. No? So I'm not inclined to do that in at least first commit.
    > Otherwise, I'm afraid the patch would become huge.
    
    It's more the server side of base backups than the actual pg_basebackup
    tool of course, but I'm sure that's what we're all referring to here.
    
    Personally, I'd see the fpw stuff as part of the infrastructure
    needed. Meaning that the fpw stuff should go in *first*, and the
    pg_basebackup stuff later.
    
    If we want something to go in early, that could be as simple as a
    version of pg_basebackup that runs against the slave but only if
    full_page_writes=on on the master. If it's not, it throws an error.
    Then we can improve upon that by adding handling of fpw=off, first by
    infrastructure, then by tool.
    
    Doing it piece by piece like that is probably a good idea, since as
    you say, all at once will be pretty huge.
    
    -- 
     Magnus Hagander
     Me: http://www.hagander.net/
     Work: http://www.redpill-linpro.com/
    
    
  64. Re: Online base backup from the hot-standby

    Steve Singer <ssinger_pg@sympatico.ca> — 2011-10-25T12:56:47Z

    On 11-10-25 02:44 AM, Heikki Linnakangas wrote:
    > With pg_basebackup, we have a fighting chance of getting this right, 
    > because we have more control over how the backup is made. For example, 
    > we can co-operate with the buffer manager to avoid torn-pages, 
    > eliminating the need for full_page_writes=on, and we can include a 
    > control file with the correct end-of-backup location automatically, 
    > without requiring user intervention. pg_basebackup is less flexible 
    > than the pg_start/stop_backup method, and unfortunately you're more 
    > likely to need the flexibility in a more complicated setup with a hot 
    > standby server and all, but making the generic pg_start/stop_backup 
    > method work seems infeasible at the moment.
    
    Would pg_basebackup be able to work with the buffer manager on the slave 
    to avoid full_page_writes=on needing to be set on the master?  (the 
    point of this is to be able to take the base backup without having the 
    backup program contact the master). If so could pg_start_backup() not 
    just put the buffer manager into the same state?
    
    
    
    
    
  65. Re: Online base backup from the hot-standby

    Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-25T13:05:11Z

    On 25.10.2011 15:56, Steve Singer wrote:
    > On 11-10-25 02:44 AM, Heikki Linnakangas wrote:
    >> With pg_basebackup, we have a fighting chance of getting this right,
    >> because we have more control over how the backup is made. For example,
    >> we can co-operate with the buffer manager to avoid torn-pages,
    >> eliminating the need for full_page_writes=on, and we can include a
    >> control file with the correct end-of-backup location automatically,
    >> without requiring user intervention. pg_basebackup is less flexible
    >> than the pg_start/stop_backup method, and unfortunately you're more
    >> likely to need the flexibility in a more complicated setup with a hot
    >> standby server and all, but making the generic pg_start/stop_backup
    >> method work seems infeasible at the moment.
    >
    > Would pg_basebackup be able to work with the buffer manager on the slave
    > to avoid full_page_writes=on needing to be set on the master? (the point
    > of this is to be able to take the base backup without having the backup
    > program contact the master).
    
    In theory, yes. I'm not sure how difficult it would be in practice. 
    Currently, the walsender process just scans and copies everything in the 
    data directory, at the filesystem level. It would have to go through the 
    buffer manager instead, to avoid reading a page at the same time that 
    the buffer manager is writing it out.
    
    > If so could pg_start_backup() not just put the buffer manager into the same state?
    
    No. The trick that pg_basebackup (= walsender) can do is to co-operate
    with the buffer manager when reading each page. An external program 
    cannot do that.
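    
    To illustrate the difference with a toy model (this is not PostgreSQL
    code; ToyPage and its methods are invented for this sketch): a reader that
    takes the same per-page lock as the writer always sees a whole page, while
    a reader that bypasses the lock, like an external copy program, can see a
    half-written one.

```python
import threading

PAGE_SIZE = 8192


class ToyPage:
    """Toy stand-in for one on-disk page guarded by a lock (invented here)."""

    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.lock = threading.Lock()

    def write_out(self, fill, pause_midway=None):
        """Overwrite the page in two halves, as a non-atomic write might."""
        with self.lock:
            half = PAGE_SIZE // 2
            self.data[:half] = bytes([fill]) * half
            if pause_midway is not None:
                pause_midway()  # hook to observe the half-written state
            self.data[half:] = bytes([fill]) * half

    def read_cooperating(self):
        """Read while holding the page lock, so no write can be in progress."""
        with self.lock:
            return bytes(self.data)

    def read_external(self):
        """Read without the lock, as a filesystem-level copy would."""
        return bytes(self.data)


def is_torn(image):
    """A page image is torn if it mixes bytes from two different writes."""
    return len(set(image)) > 1


if __name__ == "__main__":
    page = ToyPage()
    snapshots = []
    # Take an unlocked snapshot in the middle of the write.
    page.write_out(7, pause_midway=lambda: snapshots.append(page.read_external()))
    print("external read torn:", is_torn(snapshots[0]))
    print("cooperating read torn:", is_torn(page.read_cooperating()))
```

    The real buffer manager is of course far more involved; the sketch only
    shows why the reader has to live inside the server to take part in the
    locking.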
    
    -- 
       Heikki Linnakangas
       EnterpriseDB   http://www.enterprisedb.com
    
    
  66. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-10-26T01:48:05Z

    On Tue, Oct 25, 2011 at 9:03 PM, Magnus Hagander <magnus@hagander.net> wrote:
    > On Tue, Oct 25, 2011 at 13:54, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
    >>> I don't think we should necessarily give up completely. But doing a
    >>> pg_basebackup way *first* seems reasonable - because it's going to be
    >>> the easiest one to "get right", given that we have more control there.
    >>> Doesn't mean we shouldn't extend it in the future...
    >>
    >> Agreed. The question is -- how far should we change pg_basebackup to
    >> "get right"? I think it's not difficult to change it so that it backs up
    >> the control file at the end. But eliminating the need for full_page_writes=on
    >> seems not easy. No? So I'm not inclined to do that in at least first commit.
    >> Otherwise, I'm afraid the patch would become huge.
    >
    > It's more server side of base backups than the actual pg_basebackup
    > tool of course, but I'm sure that's what we're all referring to here.
    >
    > Personally, I'd see the fpw stuff as part of the infrastructure
    > needed. Meaning that the fpw stuff should go in *first*, and the
    > pg_basebackup stuff later.
    
    Agreed. I'll extract the FPW stuff from the patch that I submitted, and revise it
    as the infrastructure patch.
    
    The changes to pg_start_backup() etc. that Ishiduka-san made are also
    server-side infrastructure. I will extract them as another infrastructure patch.
    
    Ishiduka-san, if you have time, feel free to try the above, barring objection.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  67. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-31T04:11:19Z

    > > On Tue, Oct 25, 2011 at 13:54, Fujii Masao <masao.fujii@gmail.com> wrote:
    > >> On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
    > >>> I don't think we should necessarily give up completely. But doing a
    > >>> pg_basebackup way *first* seems reasonable - because it's going to be
    > >>> the easiest one to "get right", given that we have more control there.
    > >>> Doesn't mean we shouldn't extend it in the future...
    > >>
    > >> Agreed. The question is -- how far should we change pg_basebackup to
    > >> "get right"? I think it's not difficult to change it so that it backs up
    > >> the control file at the end. But eliminating the need for full_page_writes=on
    > >> seems not easy. No? So I'm not inclined to do that in at least first commit.
    > >> Otherwise, I'm afraid the patch would become huge.
    > >
    > > It's more server side of base backups than the actual pg_basebackup
    > > tool of course, but I'm sure that's what we're all referring to here.
    > >
    > > Personally, I'd see the fpw stuff as part of the infrastructure
    > > needed. Meaning that the fpw stuff should go in *first*, and the
    > > pg_basebackup stuff later.
    > 
    > Agreed. I'll extract FPW stuff from the patch that I submitted, and revise it
    > as the infrastructure patch.
    > 
    > The changes of pg_start_backup() etc that Ishiduka-san did are also
    > a server-side infrastructure. I will extract them as another infrastructure one.
    > 
    > Ishiduka-san, if you have time, feel free to try the above, barring objection.
    
    
    Done.
    Changed the name of the patch.
    
    <Modifications>
     Since the patch is now positioned as infrastructure:
       * Removed the documentation.
       * Running pg_start/stop_backup() on the standby now raises an error.
    
    
    Regards.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------
    
  68. Re: Online base backup from the hot-standby

    Josh Berkus <josh@agliodbs.com> — 2011-11-03T23:06:16Z

    On 10/25/11 5:03 AM, Magnus Hagander wrote:
    > If we want something to go in early, that could be as simple as a
    > version of pg_basebackup that runs against the slave but only if
    > full_page_writes=on on the master. If it's not, it throws an error.
    > Then we can improve upon that by adding handling of fpw=off, first by
    > infrastructure, then by tool.
    
    Just to be clear, the idea is to require full_page_writes to do backup
    from the standby in 9.2, but to remove the requirement later?
    
    -- 
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
    
    
  69. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2011-11-04T04:20:31Z

    On Fri, Nov 4, 2011 at 8:06 AM, Josh Berkus <josh@agliodbs.com> wrote:
    > On 10/25/11 5:03 AM, Magnus Hagander wrote:
    >> If we want something to go in early, that could be as simple as a
    >> version of pg_basebackup that runs against the slave but only if
    >> full_page_writes=on on the master. If it's not, it throws an error.
    >> Then we can improve upon that by adding handling of fpw=off, first by
    >> infrastructure, then by tool.
    >
    > Just to be clear, the idea is to require full_page_writes to do backup
    > from the standby in 9.2, but to remove the requirement later?
    
    Yes unless I'm missing something. Not sure if we can remove that in 9.2, though.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  70. Re: Online base backup from the hot-standby

    Steve Singer <ssinger_pg@sympatico.ca> — 2011-11-15T02:11:38Z

    On 11-10-31 12:11 AM, Jun Ishiduka wrote:
    >>
    >> Agreed. I'll extract FPW stuff from the patch that I submitted, and revise it
    >> as the infrastructure patch.
    >>
    >> The changes of pg_start_backup() etc that Ishiduka-san did are also
    >> a server-side infrastructure. I will extract them as another infrastructure one.
    >>
    >> Ishiduka-san, if you have time, feel free to try the above, barring objection.
    >
    > Done.
    > Changed the name of the patch.
    >
    > <Modifications>
    >  So changed to the positioning of infrastructure,
    >    * Removed the documentation.
    >    * changed to an error when you run pg_start/stop_backup() on the standby.
    >
    >
    
    Here is my stab at reviewing this version of the patch.
    
    Submission
    -------------------
    The purpose of this version of the patch is to provide some
    infrastructure needed for backups from the slave without having to solve
    some of the usability issues raised in previous versions of the patch.
    
    This patch applied fine to earlier versions of HEAD, but it doesn't today.
    Simon moved some of the code touched by this patch as part of the xlog
    refactoring. Please post an updated/rebased version of the patch.
    
    
    I think the purpose of this patch is to provide
    
    a) The code changes to record changes to fpw state of the master in WAL.
    b) Track the state of FPW while in recovery mode
    
    This version of the patch is NOT intended to allow SQL calls to
    pg_start_backup() on slaves to work. This patch lays the infrastructure
    for another patch (which I haven't seen) to allow pg_basebackup to do a
    base backup from a slave assuming fpw=on has been set on the master (my
    understanding of this patch is that it puts into place all of the pieces
    required for the pg_basebackup patch to detect if fpw!=on and abort).
    
    
    The consensus upthread was to get this infrastructure in and figure out
    a safe+usable way of doing a slave backup without pg_basebackup later.
    
    The patch seems to do what I expect of it.
    
    I don't see any issues with most of the code changes in this patch.
    However I admit that even after reviewing many versions of this patch I
    still am not familiar enough with the recovery code to comment on a lot
    of the details.
    
    One thing I did see:
    
    In pg_ctl.c
    
    !             if (stat(recovery_file, &statbuf) != 0)
    !                 print_msg(_("WARNING: online backup mode is active\n"
    !                             "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
    !             else
    !                 print_msg(_("WARNING: online backup mode is active if you can connect as a superuser to server\n"
    !                             "If so, shutdown will not complete until pg_stop_backup() is called.\n\n"));
    
    I am having difficulty understanding what this error message is trying
    to tell me. I think it is telling me (based on the code comments) that
    if I can't connect to the server because the server is not yet accepting
    connections then I shouldn't worry about anything. However if the server
    is accepting connections then I need to login and call pg_stop_backup().
    
    Maybe
    "WARNING: online backup mode is active. If your server is accepting
    connections then you must connect as superuser and run pg_stop_backup()
    before shutdown will complete"
    
    I will wait on attempting to test the patch until you have sent a
    version that applies against the current HEAD.
    
    
    > Regards.
    >
    >
    > --------------------------------------------
    > Jun Ishizuka
    > NTT Software Corporation
    > TEL:045-317-7018
    > E-Mail: ishizuka.jun@po.ntts.co.jp
    > --------------------------------------------
    >
    >
    >
    
    
  71. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2012-01-13T08:02:55Z

    Sorry for the delay.
    
    2011/11/15 Steve Singer <ssinger_pg@sympatico.ca>:
    > Here is my stab at reviewing this version of the patch.
    
    Thanks for the review!
    
    > This version of the patch is NOT intended to allow SQL calls to
    > pg_start_backup() on slaves to work.   This patch lays the infrastructure
    > for another patch (which I haven't seen) to allow pg_basebackup to do a base
    > backup from a slave assuming fpw=on has been set on the master (my
    > understanding of this patch is that it puts into place all of the pieces
    > required for the pg_basebackup patch to detect if fpw!=on and abort).
    
    The amount of code changes needed to allow pg_basebackup to make a backup from
    the standby seems to be small. So I ended up merging those changes into the
    infrastructure patch. WIP patch attached. But I'd be happy to split the patch again
    if you want.
    
    > In pg_ctl.c
    >
    > !             if (stat(recovery_file, &statbuf) != 0)
    > !                 print_msg(_("WARNING: online backup mode is active\n"
    > !                             "Shutdown will not complete until
    > pg_stop_backup() is called.\n\n"));
    > !             else
    > !                 print_msg(_("WARNING: online backup mode is active if you
    > can connect as a superuser to server\n"
    > !                             "If so, shutdown will not complete until
    > pg_stop_backup() is called.\n\n"));
    >
    > I am having difficulty understanding what this error message is trying to
    > tell me.   I think it is telling me (based on the code comments) that if I
    > can't connect to the server because the server is not yet accepting
    > connections then I shouldn't worry about anything.   However if the server
    > is accepting connections then I need to login and call pg_stop_backup().
    >
    > Maybe
    > "WARNING:  online backup mode is active.  If your server is accepting
    > connections then you must connect as superuser and run pg_stop_backup()
    > before shutdown will complete"
    
    The above change to pg_ctl.c was required because a new
    backup_label could be created by a standby-only backup during recovery.
    But we have now decided to again disallow calling pg_start_backup() and
    pg_stop_backup() during recovery, and to allow only pg_basebackup to make
    a base backup from the standby, which means that backup_label will not be
    created during recovery. So the above change to pg_ctl.c is no longer
    required. I excluded that change from the patch.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  72. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2012-01-17T10:38:23Z

    On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > The amount of code changes to allow pg_basebackup to make a backup from
    > the standby seems to be small. So I ended up merging that changes and the
    > infrastructure patch. WIP patch attached. But I'd happy to split the patch again
    > if you want.
    
    Attached is the updated version of the patch. I documented the limitations of
    standby-only backup and changed the error messages.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  73. Re: Online base backup from the hot-standby

    Steve Singer <ssinger_pg@sympatico.ca> — 2012-01-20T04:01:49Z

    On 12-01-17 05:38 AM, Fujii Masao wrote:
    > On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao<masao.fujii@gmail.com>  wrote:
    >> The amount of code changes to allow pg_basebackup to make a backup from
    >> the standby seems to be small. So I ended up merging that changes and the
    >> infrastructure patch. WIP patch attached. But I'd happy to split the patch again
    >> if you want.
    > Attached is the updated version of the patch. I wrote the limitations of
    > standby-only backup in the document and changed the error messages.
    >
    
    Here is my review of this version of the patch. I think this patch has
    been in every CF for 9.2 and I feel it is getting close to being
    committed. The only significant issue is a crash I encountered
    while testing; see below.
    
    I am fine with including the pg_basebackup changes in the patch; it also
    makes testing some of the other changes possible.
    
    
    The documentation updates you have are good
    
    I don't see any issues looking at the code.
    
    
    
    Testing Review
    --------------------------------
    
    I encountered this on my first replica (the one based on the master).  I 
    am not sure if it is related to this patch, it happened after the 
    pg_basebackup against the replica finished.
    
    TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1238)
    LOG:  startup process (PID 12222) was terminated by signal 6: Aborted
    LOG:  terminating any other active server processes
    
    A little earlier this postmaster had printed:
    
    LOG:  restored log file "00000001000000000000001F" from archive
    LOG:  restored log file "000000010000000000000020" from archive
    cp: cannot stat `/usr/local/pgsql92git/archive/000000010000000000000021': No such file or directory
    LOG:  unexpected pageaddr 0/19000000 in log file 0, segment 33, offset 0
    cp: cannot stat `/usr/local/pgsql92git/archive/000000010000000000000021': No such file or directory
    
    
    I have NOT been able to replicate this error, and I am not sure exactly
    what I had done in my testing prior to that point.
    
    
    In another test run I had
    
    - set full_page_writes=off and did a checkpoint
    - Started the pg_basebackup
    - set full_page_writes=on and did a HUP + some database activity that 
    might have forced a checkpoint.
    
    I got this message from pg_basebackup.
    ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
    pg_basebackup: could not get WAL end position from server
    
    I point this out because the message is different from the normal "could
    not initiate base backup: FATAL:  WAL generated with
    full_page_writes=off" that I normally see. We might want to add a
    PQerrorMessage(conn) to pg_basebackup to print the error details.
    Since this patch didn't actually change pg_basebackup, I don't think you're
    required to improve the error messages in it. I am just mentioning this
    because it came up in testing.
    
    
    The rest of the tests I did, involving changing full_page_writes
    with/without checkpoints and SIGHUPs and promoting the replica, seemed to
    work as expected.
    
    
    
    
    > Regards,
    >
    >
    >
    >
    
    
  74. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2012-01-20T09:48:59Z

    On Fri, Jan 20, 2012 at 1:01 PM, Steve Singer <ssinger_pg@sympatico.ca> wrote:
    > Here is my review of this version of the patch. I think this patch has been
    > in every CF for 9.2 and I feel it is getting close to being committed.
    
    Thanks for the review!
    
    > Testing Review
    > --------------------------------
    >
    > I encountered this on my first replica (the one based on the master).  I am
    > not sure if it is related to this patch, it happened after the pg_basebackup
    > against the replica finished.
    >
    > TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
    > "twophase.c", Line: 1238)
    > LOG:  startup process (PID 12222) was terminated by signal 6: Aborted
    
    I spent an hour trying to reproduce that issue, but in the end I was not
    able to :(
    For now I have no idea what causes that issue. But basically the patch doesn't
    touch any code related to that issue, so I'm guessing that it's a problem in
    HEAD rather than in the patch...
    
    I will spend more time diagnosing the issue. If you notice something, please
    let me know.
    
    > - set full page writes=off and did a checkpoint
    > - Started the pg_basebackup
    > - set full_page_writes=on and did a HUP + some database activity that might
    > have forced a checkpoint.
    >
    > I got this message from pg_basebackup.
    > ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
    > pg_basebackup: could not get WAL end position from server
    >
    > I point this out because the message is different than the normal "could not
    > initiate base backup: FATAL:  WAL generated with full_page_writes=off" that I
    > normally see.
    
    I guess that's because you started pg_basebackup before the checkpoint record
    with full_page_writes=off had been replicated to and replayed on the standby.
    In this case, when you start pg_basebackup, it uses the previous checkpoint
    record, probably with full_page_writes=on, as the backup starting checkpoint, so
    pg_basebackup passes the full_page_writes check at the start of the backup.
    Then it fails the check at the end of the backup, so you got that error message.
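    
    A toy model of that sequence, for illustration only (ToyStandby, its
    fields and the second error string are invented here and are not the
    patch's actual code): the start check looks only at the last replayed
    checkpoint, while the end check looks at whether any full_page_writes=off
    record was replayed after the backup started.

```python
class BackupError(Exception):
    """Raised when a simulated backup has to be rejected."""


class ToyStandby:
    """Tracks what the standby has replayed so far (invented for this sketch)."""

    def __init__(self):
        self.replay_pos = 0
        # FPW setting seen in the last replayed checkpoint record.
        self.checkpoint_fpw = True
        # Position of the last replayed record that had FPW off, if any.
        self.last_fpw_off_pos = None

    def replay_checkpoint(self, fpw):
        """Replay one checkpoint record carrying the master's FPW setting."""
        self.replay_pos += 1
        self.checkpoint_fpw = fpw
        if not fpw:
            self.last_fpw_off_pos = self.replay_pos

    def start_backup(self):
        """Start check: only the last replayed checkpoint is consulted."""
        if not self.checkpoint_fpw:
            raise BackupError("WAL generated with full_page_writes=off")
        return self.replay_pos

    def stop_backup(self, start_pos):
        """End check: fail if FPW-off WAL was replayed since the backup began."""
        if self.last_fpw_off_pos is not None and self.last_fpw_off_pos > start_pos:
            raise BackupError("full_page_writes=off WAL replayed during backup")
        return self.replay_pos


if __name__ == "__main__":
    standby = ToyStandby()
    standby.replay_checkpoint(fpw=True)    # older checkpoint, FPW still on
    start = standby.start_backup()         # start check passes
    standby.replay_checkpoint(fpw=False)   # FPW-off checkpoint arrives mid-backup
    try:
        standby.stop_backup(start)
    except BackupError as exc:
        print("backup failed at end:", exc)
```

    In this model, turning full_page_writes back on before the backup ends
    does not help, because the end check still sees the earlier
    full_page_writes=off record.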
    
    > We might want to add a PQerrorMessage(conn) to
    > pg_basebackup to print the error details.  Since this patch didn't actually
    > change pg_basebackup I don't think you're required to improve the error
    > messages in it.  I am just mentioning this because it came up in testing.
    
    Agreed.
    
    When PQresultStatus() returns an unexpected status, the error message
    from PQerrorMessage() should normally be reported. But in the one case where
    pg_basebackup could not get the WAL end position, PQerrorMessage() was
    not reported... This looks like an oversight in pg_basebackup. I think
    it's better to fix that as a separate patch (attached). Thoughts?
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
  75. Re: Online base backup from the hot-standby

    Erik Rijkers <er@xs4all.nl> — 2012-01-20T10:37:41Z

    On Fri, January 20, 2012 05:01, Steve Singer wrote:
    > On 12-01-17 05:38 AM, Fujii Masao wrote:
    >> On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao<masao.fujii@gmail.com>  wrote:
    >>> The amount of code changes to allow pg_basebackup to make a backup from
    >>> the standby seems to be small. So I ended up merging those changes and the
    >>> infrastructure patch. WIP patch attached. But I'd be happy to split the patch again
    >>> if you want.
    >> Attached is the updated version of the patch. I wrote the limitations of
    >> standby-only backup in the document and changed the error messages.
    >>
    >
    > Here is my review of this version of the patch. I think this patch has
    > been in every CF for 9.2 and I feel it is getting close to being
    > committed.  The only issue of significance is a crash I encountered
    > while testing, see below.
    >
    > I am fine with including the pg_basebackup changes in the patch; it also
    > makes testing some of the other changes possible.
    >
    >
    > The documentation updates you have are good
    >
    > I don't see any issues looking at the code.
    >
    >
    >
    > Testing Review
    > --------------------------------
    >
    > I encountered this on my first replica (the one based on the master).  I
    > am not sure if it is related to this patch, it happened after the
    > pg_basebackup against the replica finished.
    >
    > TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
    > "twophase.c", Line: 1238)
    > LOG:  startup process (PID 12222) was terminated by signal 6: Aborted
    > LOG:  terminating any other active server processes
    >
    > A little earlier this postmaster had printed.
    >
    > LOG:  restored log file "00000001000000000000001F" from archive
    > LOG:  restored log file "000000010000000000000020" from archive
    > cp: cannot stat
    > `/usr/local/pgsql92git/archive/000000010000000000000021': No such file
    > or directory
    > LOG:  unexpected pageaddr 0/19000000 in log file 0, segment 33, offset 0
    > cp: cannot stat
    > `/usr/local/pgsql92git/archive/000000010000000000000021': No such file
    > or directory
    >
    >
    > I have NOT been able to replicate this error  and I am not sure exactly
    > what I had done in my testing prior to that point.
    >
    
    I'm not sure, but it does look like this is the "mystery" bug that I encountered repeatedly
    already in 9.0devel; but I was never able to reproduce it reliably.  But I don't think it was ever
    solved.
    
      http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php
    
    Erik Rijkers
    
    >
    > In another test run I had
    >
    > - set full_page_writes=off and did a checkpoint
    > - Started the pg_basebackup
    > - set full_page_writes=on and did a HUP + some database activity that
    > might have forced a checkpoint.
    >
    > I got this message from pg_basebackup.
    > ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
    > pg_basebackup: could not get WAL end position from server
    >
    > I point this out because the message is different than the normal "could
    > not initiate base backup: FATAL:  WAL generated with
    > full_page_writes=off" that I normally see.  We might want to add a
    > PQerrorMessage(conn) to pg_basebackup to print the error details.
    > Since this patch didn't actually change pg_basebackup I don't think you're
    > required to improve the error messages in it.  I am just mentioning this
    > because it came up in testing.
    >
    >
    > The rest of the tests I did involving changing full_page_writes
    > with/without checkpoints and sighups and promoting the replica seemed to
    > work as expected.
    >
    
    
  76. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2012-01-20T11:04:29Z

    On Fri, Jan 20, 2012 at 7:37 PM, Erik Rijkers <er@xs4all.nl> wrote:
    > I'm not sure, but it does look like this is the "mystery" bug that I encountered repeatedly
    > already in 9.0devel; but I was never able to reproduce it reliably.  But I don't think it was ever
    > solved.
    >
    >  http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php
    
    I also encountered the same issue about a year ago:
    http://archives.postgresql.org/pgsql-hackers/2010-11/msg01579.php
    
    At that time, I identified its cause:
    http://archives.postgresql.org/pgsql-hackers/2010-11/msg01700.php
    
    It was eventually fixed:
    http://archives.postgresql.org/pgsql-hackers/2010-11/msg01910.php
    
    But Steve encountered it again, which means that the above fix is not
    sufficient. Unless the issue is caused by my patch, we should do
    another round of diagnosis.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  77. Re: Online base backup from the hot-standby

    Simon Riggs <simon@2ndquadrant.com> — 2012-01-20T11:15:55Z

    On Tue, Jan 17, 2012 at 10:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> The amount of code changes to allow pg_basebackup to make a backup from
    >> the standby seems to be small. So I ended up merging those changes and the
    >> infrastructure patch. WIP patch attached. But I'd be happy to split the patch again
    >> if you want.
    >
    > Attached is the updated version of the patch. I wrote the limitations of
    > standby-only backup in the document and changed the error messages.
    
    
    I'm looking at this patch and wondering why we're doing so many
    press-ups to ensure full_page_writes parameter is on. This will still
    fail if you use a utility that removes the full page writes, but fail
    silently.
    
    I think it would be beneficial to explicitly check that all WAL
    records have full page writes actually attached to them until we
    achieve consistency.
    
    Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
    and it was removed. Not sure why? We already track other parameters
    when they change, so I don't want to introduce a whole new WAL record
    for each new parameter whose change needs tracking.
    
    Please make a note for committer that WAL version needs bumping.
    
    I think it's probably time to start a README.recovery to explain why
    this works the way it does. Other changes can then start to do that as
    well, so we can keep this to sane levels of complexity.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  78. Re: Online base backup from the hot-standby

    Simon Riggs <simon@2ndquadrant.com> — 2012-01-20T11:26:35Z

    On Fri, Jan 20, 2012 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    
    > But Steve encountered it again, which means that the above fix is not
    > sufficient. Unless the issue is caused by my patch, we should do
    > another round of diagnosis.
    
    It's my bug, and I've posted a fix but not yet applied it, just added
    to open items list. The only reason for that was time pressure, which
    is now gone, so I'll look to apply it sooner.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  79. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2012-01-20T12:54:28Z

    Thanks for the review!
    
    On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > I'm looking at this patch and wondering why we're doing so many
    > press-ups to ensure full_page_writes parameter is on. This will still
    > fail if you use a utility that removes the full page writes, but fail
    > silently.
    >
    > I think it would be beneficial to explicitly check that all WAL
    > records have full page writes actually attached to them until we
    > achieve consistency.
    
    I agree that it's worth adding such a safeguard. That can be a self-contained
    feature, so I'll submit a separate patch for that, to keep each patch small.
    
    > Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
    > and it was removed. Not sure why? We already track other parameters
    > when they change, so I don't want to introduce a whole new WAL record
    > for each new parameter whose change needs tracking.
    
    I revived that because whenever full_page_writes must be WAL-logged
    or replayed, there is no need to WAL-log or replay the HS parameters.
    The opposite is also true. Logging or replaying all of them every time
    seems a bit wasteful, and makes the code unreadable. ISTM that a separate
    XLOG_FPW_CHANGE record makes the code simpler and avoids the useless
    WAL activity that merging them into one WAL record would add.
    
    > Please make a note for committer that WAL version needs bumping.
    
    Okay, will add the note about bumping XLOG_PAGE_MAGIC.
    
    > I think it's probably time to start a README.recovery to explain why
    > this works the way it does. Other changes can then start to do that as
    > well, so we can keep this to sane levels of complexity.
    
    In this CF, there are other patches which change the recovery code. So
    I think that it's better to do that after all of them have been committed.
    There's no need to hurry to do that now.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  80. Re: Online base backup from the hot-standby

    Simon Riggs <simon@2ndquadrant.com> — 2012-01-20T14:34:31Z

    On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > Thanks for the review!
    >
    > On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >> I'm looking at this patch and wondering why we're doing so many
    >> press-ups to ensure full_page_writes parameter is on. This will still
    >> fail if you use a utility that removes the full page writes, but fail
    >> silently.
    >>
    >> I think it would be beneficial to explicitly check that all WAL
    >> records have full page writes actually attached to them until we
    >> achieve consistency.
    >
    > I agree that it's worth adding such a safeguard. That can be a self-contained
    > feature, so I'll submit a separate patch for that, to keep each patch small.
    
    Maybe, but you mean do this now as well? Not sure I like silent errors.
    
    >> Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
    >> and it was removed. Not sure why? We already track other parameters
    >> when they change, so I don't want to introduce a whole new WAL record
    >> for each new parameter whose change needs tracking.
    >
    > I revived that because whenever full_page_writes must be WAL-logged
    > or replayed, there is no need to WAL-log or replay the HS parameters.
    > The opposite is also true. Logging or replaying all of them every time
    > seems a bit wasteful, and makes the code unreadable. ISTM that a separate
    > XLOG_FPW_CHANGE record makes the code simpler and avoids the useless
    > WAL activity that merging them into one WAL record would add.
    
    I don't agree, but for the sake of getting on with things I say this
    is minor so is no reason to block this.
    
    >> Please make a note for committer that WAL version needs bumping.
    >
    > Okay, will add the note about bumping XLOG_PAGE_MAGIC.
    >
    >> I think it's probably time to start a README.recovery to explain why
    >> this works the way it does. Other changes can then start to do that as
    >> well, so we can keep this to sane levels of complexity.
    >
    > In this CF, there are other patches which change the recovery code. So
    > I think that it's better to do that after all of them have been committed.
    > There's no need to hurry to do that now.
    
    Agreed.
    
    Will proceed to final review and if all OK, commit.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  81. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2012-01-23T10:29:20Z

    On Fri, Jan 20, 2012 at 11:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> Thanks for the review!
    >>
    >> On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >>> I'm looking at this patch and wondering why we're doing so many
    >>> press-ups to ensure full_page_writes parameter is on. This will still
    >>> fail if you use a utility that removes the full page writes, but fail
    >>> silently.
    >>>
    >>> I think it would be beneficial to explicitly check that all WAL
    >>> records have full page writes actually attached to them until we
    >>> achieve consistency.
    >>
    >> I agree that it's worth adding such a safeguard. That can be a self-contained
    >> feature, so I'll submit a separate patch for that, to keep each patch small.
    >
    > Maybe, but you mean do this now as well? Not sure I like silent errors.
    
    If many people think the patch is not acceptable without such a safeguard,
    I will do that right now. Otherwise, I'd like to take more time to do
    that, i.e., add it to the 9.2dev Open Items.
    
    I haven't come up with a good idea. An ugly idea is to keep track of all replays
    of full-page writes for every buffer page (i.e., prepare a table with 1 bit per
    buffer page and set the corresponding bit to 1 when a full-page write is
    applied), and then check whether a full-page write has already been applied
    when replaying a normal WAL record... Do you have any better idea?
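
    The "1 bit per buffer page" table could be sketched roughly as below. This
    is only an illustration of the idea, not PostgreSQL code: FpwTracker and
    its functions are hypothetical names, and pages are identified by a plain
    index rather than by a real buffer tag.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical tracker: one bit per buffer page, set once a full-page
 * image has been replayed for that page. */
typedef struct FpwTracker
{
    size_t         npages;
    unsigned char *bits;
} FpwTracker;

/* Allocate a zeroed bitmap covering npages pages; false on failure. */
static bool
fpw_tracker_init(FpwTracker *t, size_t npages)
{
    assert(t != NULL);
    t->npages = npages;
    t->bits = calloc((npages + CHAR_BIT - 1) / CHAR_BIT, 1);
    return t->bits != NULL;
}

/* Call when a full-page image is replayed for the given page. */
static void
fpw_tracker_mark(FpwTracker *t, size_t page)
{
    assert(t != NULL && t->bits != NULL);
    if (page < t->npages)
        t->bits[page / CHAR_BIT] |= (unsigned char) (1u << (page % CHAR_BIT));
}

/* Call before replaying a normal record: has a full-page image
 * already been applied to this page? */
static bool
fpw_tracker_seen(const FpwTracker *t, size_t page)
{
    assert(t != NULL && t->bits != NULL);
    return page < t->npages &&
        (t->bits[page / CHAR_BIT] & (1u << (page % CHAR_BIT))) != 0;
}

/* Release the bitmap. */
static void
fpw_tracker_free(FpwTracker *t)
{
    assert(t != NULL);
    free(t->bits);
    t->bits = NULL;
    t->npages = 0;
}
```

    Replay would call fpw_tracker_mark() when it restores a full-page image and
    fpw_tracker_seen() before applying an ordinary record to the same page.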
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  82. Re: Online base backup from the hot-standby

    Simon Riggs <simon@2ndquadrant.com> — 2012-01-23T13:11:04Z

    On Mon, Jan 23, 2012 at 10:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > On Fri, Jan 20, 2012 at 11:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >> On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >>> Thanks for the review!
    >>>
    >>> On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >>>> I'm looking at this patch and wondering why we're doing so many
    >>>> press-ups to ensure full_page_writes parameter is on. This will still
    >>>> fail if you use a utility that removes the full page writes, but fail
    >>>> silently.
    >>>>
    >>>> I think it would be beneficial to explicitly check that all WAL
    >>>> records have full page writes actually attached to them until we
    >>>> achieve consistency.
    >>>
    >>> I agree that it's worth adding such a safeguard. That can be a self-contained
    >>> feature, so I'll submit a separate patch for that, to keep each patch small.
    >>
    >> Maybe, but you mean do this now as well? Not sure I like silent errors.
    >
    > If many people think the patch is not acceptable without such a safeguard,
    > I will do that right now. Otherwise, I'd like to take more time to do
    > that, i.e., add it to the 9.2dev Open Items.
    
    > I haven't come up with a good idea. An ugly idea is to keep track of all replays
    > of full-page writes for every buffer page (i.e., prepare a table with 1 bit per
    > buffer page and set the corresponding bit to 1 when a full-page write is
    > applied), and then check whether a full-page write has already been applied
    > when replaying a normal WAL record... Do you have any better idea?
    
    Not sure.
    
    I think the only possible bug here is one introduced by an outside utility.
    
    In that case, I don't think it should be the job of the backend to go
    too far to protect against such an atypical error. So if we can't solve
    it fairly easily and with no overhead then I'd say let's skip it. We
    could easily introduce a bug here just by having faulty checking code.
    
    So let's add it to 9.2 open items as a non-priority item. I'll proceed
    to commit for this now.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  83. Re: Online base backup from the hot-standby

    Robert Haas <robertmhaas@gmail.com> — 2012-01-23T13:11:10Z

    On Mon, Jan 23, 2012 at 5:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    > If many people think the patch is not acceptable without such a safeguard,
    > I will do that right now.
    
    That's my view.  I think we ought to resolve this issue before commit,
    especially since it seems unclear that we know how to fix it.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  84. Re: Online base backup from the hot-standby

    Robert Haas <robertmhaas@gmail.com> — 2012-01-23T13:13:51Z

    On Mon, Jan 23, 2012 at 8:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
    > On Mon, Jan 23, 2012 at 5:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> If many people think the patch is not acceptable without such a safeguard,
    >> I will do that right now.
    >
    > That's my view.  I think we ought to resolve this issue before commit,
    > especially since it seems unclear that we know how to fix it.
    
    Actually, never mind.  On reading this more carefully, I'm not too
    concerned about the possibility of people breaking it with pg_lesslog
    or similar.  But it should be solid if you use only the functionality
    built into core.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
  85. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2012-01-24T09:51:13Z

    On Mon, Jan 23, 2012 at 10:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Mon, Jan 23, 2012 at 10:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >> On Fri, Jan 20, 2012 at 11:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >>> On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >>>> Thanks for the review!
    >>>>
    >>>> On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >>>>> I'm looking at this patch and wondering why we're doing so many
    >>>>> press-ups to ensure full_page_writes parameter is on. This will still
    >>>>> fail if you use a utility that removes the full page writes, but fail
    >>>>> silently.
    >>>>>
    >>>>> I think it would be beneficial to explicitly check that all WAL
    >>>>> records have full page writes actually attached to them until we
    >>>>> achieve consistency.
    >>>>
    >>>> I agree that it's worth adding such a safeguard. That can be a self-contained
    >>>> feature, so I'll submit a separate patch for that, to keep each patch small.
    >>>
    >>> Maybe, but you mean do this now as well? Not sure I like silent errors.
    >>
    >> If many people think the patch is not acceptable without such a safeguard,
    >> I will do that right now. Otherwise, I'd like to take more time to do
    >> that, i.e., add it to the 9.2dev Open Items.
    >
    >> I haven't come up with a good idea. An ugly idea is to keep track of all replays
    >> of full-page writes for every buffer page (i.e., prepare a table with 1 bit per
    >> buffer page and set the corresponding bit to 1 when a full-page write is
    >> applied), and then check whether a full-page write has already been applied
    >> when replaying a normal WAL record... Do you have any better idea?
    >
    > Not sure.
    >
    > I think the only possible bug here is one introduced by an outside utility.
    >
    > In that case, I don't think it should be the job of the backend to go
    > too far to protect against such an atypical error. So if we can't solve
    > it fairly easily and with no overhead then I'd say let's skip it. We
    > could easily introduce a bug here just by having faulty checking code.
    >
    > So let's add it to 9.2 open items as a non-priority item.
    
    Agreed.
    
    > I'll proceed to commit for this now.
    
    Thanks a lot!
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  86. Re: Online base backup from the hot-standby

    Simon Riggs <simon@2ndquadrant.com> — 2012-01-24T10:54:56Z

    On Tue, Jan 24, 2012 at 9:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    
    >> I'll proceed to commit for this now.
    >
    > Thanks a lot!
    
    Can I just check a few things?
    
    You say
    /*
    +        * Update full_page_writes in shared memory and write an
    +        * XLOG_FPW_CHANGE record before resource manager writes cleanup
    +        * WAL records or checkpoint record is written.
    +        */
    
    why does it need to be before the cleanup and checkpoint?
    
    You say
    /*
    +        * Currently only non-exclusive backup can be taken during recovery.
    +        */
    
    why?
    
    You mention in the docs
    "The backup history file is not created in the database cluster backed up."
    but we need to explain the bad effect, if any.
    
    You say
    "If the standby is promoted to the master during online backup, the
    backup fails."
    but no explanation of why?
    
    I could work those things out, but I don't want to have to, plus we
    may disagree if I did.
    
    There are some good explanations in comments of other things, just not
    everywhere needed.
    
    What happens if we shutdown the WALwriter and then issue SIGHUP?
    
    Are we sure we want to make the change of file format mandatory? That
    means earlier versions of clients such as pg_basebackup will fail
    against this version. Should we allow that if BACKUP FROM is missing
    we assume it was master?
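
    The fallback suggested above could look something like the sketch below.
    It assumes the label file carries a line of the form "BACKUP FROM: standby"
    or "BACKUP FROM: master"; the function name is hypothetical and this is not
    the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Scan backup_label text line by line for a "BACKUP FROM: " field.
 * Returns true only if the field is present and its value is "standby".
 * If the field is missing (e.g. a label written by an older version),
 * the backup is assumed to have been taken from the master.
 */
static bool
backup_from_standby(const char *label)
{
    static const char key[] = "BACKUP FROM: ";
    const size_t keylen = sizeof(key) - 1;
    const char *line = label;

    assert(label != NULL);
    while (*line != '\0')
    {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t) (eol - line) : strlen(line);

        if (len >= keylen && strncmp(line, key, keylen) == 0)
        {
            const char *val = line + keylen;
            size_t vlen = len - keylen;

            return vlen == 7 && strncmp(val, "standby", 7) == 0;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return false;               /* field absent: treat as taken from master */
}
```

    With this shape, a label without the new field is still accepted, so the
    format change would not have to be mandatory for readers of the file.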
    
    There is nothing in the main docs to explain that the new feature is
    available, or to explain the restrictions.
    I expect you will add that later after commit.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  87. Re: Online base backup from the hot-standby

    Simon Riggs <simon@2ndquadrant.com> — 2012-01-24T11:22:16Z

    On Tue, Jan 24, 2012 at 10:54 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Tue, Jan 24, 2012 at 9:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >
    >>> I'll proceed to commit for this now.
    >>
    >> Thanks a lot!
    >
    > Can I just check a few things?
    
    Just to clarify, not expecting another patch version, just reply here
    and I can edit.
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  88. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2012-01-25T08:16:40Z

    On Tue, Jan 24, 2012 at 7:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Tue, Jan 24, 2012 at 9:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >
    >>> I'll proceed to commit for this now.
    >>
    >> Thanks a lot!
    >
    > Can I just check a few things?
    
    Sure!
    
    > You say
    > /*
    > +        * Update full_page_writes in shared memory and write an
    > +        * XLOG_FPW_CHANGE record before resource manager writes cleanup
    > +        * WAL records or checkpoint record is written.
    > +        */
    >
    > why does it need to be before the cleanup and checkpoint?
    
    Because the cleanup and checkpoint need to see FPW in shared memory.
    If FPW in shared memory is not updated there, the cleanup and (end-of-recovery)
    checkpoint always use the initial value (= false) of FPW in shared memory.
    
    > You say
    > /*
    > +        * Currently only non-exclusive backup can be taken during recovery.
    > +        */
    >
    > why?
    
    At first I proposed to allow exclusive backup to be taken during recovery. But
    Heikki disagreed with the proposal because he thought that the exclusive backup
    procedure which I proposed was too fragile. No one could come up with any good
    user-friendly easy-to-implement procedure. So we decided to allow only
    non-exclusive backup to be taken during recovery. In non-exclusive backup,
    the complicated procedure is performed by pg_basebackup, so a user doesn't
    need to care about that.
    
    > You mention in the docs
    > "The backup history file is not created in the database cluster backed up."
    > but we need to explain the bad effect, if any.
    
    Users cannot find out various information (e.g., which WAL files are required
    for backups, backup starting/ending time, etc) about backups which have been
    taken so far. If they need such information, they need to record it manually.
    
    Users cannot pass the backup history file to pg_archivecleanup, which might make
    the usage of pg_archivecleanup more difficult.
    
    After a little thought, pg_basebackup would be able to create the backup history
    file in the backup, though it cannot be archived. Should we implement that
    feature to alleviate the bad effect?
    
    > You say
    > "If the standby is promoted to the master during online backup, the
    > backup fails."
    > but no explanation of why?
    >
    > I could work those things out, but I don't want to have to, plus we
    > may disagree if I did.
    
    If the backup succeeds in that case, when we start an archive recovery from that
    backup, the recovery needs to cross between two timelines, which means that
    we need to set recovery_target_timeline before starting recovery. Whether
    recovery_target_timeline needs to be set or not depends on whether the standby
    was promoted while the backup was being taken. Leaving such a decision to a user
    seems fragile.
    
    pg_basebackup -x ensures that all required files are included in the backup and
    we can start recovery without restoring any file from the archive. But if the
    standby is promoted during the backup, the timeline history file would become
    an essential file for recovery, but it's not included in the backup.
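
    To illustrate why the history file becomes essential: a WAL segment file
    name starts with the timeline ID as 8 hex digits, so a set of segments that
    spans a promotion carries two different prefixes. The sketch below detects
    that case and names the history file that would be needed. The function
    names are hypothetical, and this is not something pg_basebackup does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Timeline ID from the first 8 hex digits of a 24-character WAL segment
 * file name; returns 0 if the name is not a well-formed segment name. */
static unsigned long
wal_segment_timeline(const char *fname)
{
    char tli[9];

    assert(fname != NULL);
    if (strlen(fname) != 24 || strspn(fname, "0123456789ABCDEF") != 24)
        return 0;
    memcpy(tli, fname, 8);
    tli[8] = '\0';
    return strtoul(tli, NULL, 16);
}

/* If the segments span more than one timeline, write the history file
 * name of the newest timeline into buf and return 1; otherwise return 0. */
static int
required_history_file(const char *const *segs, size_t n,
                      char *buf, size_t buflen)
{
    unsigned long lo = 0, hi = 0;
    size_t i;

    assert(buf != NULL && (segs != NULL || n == 0));
    for (i = 0; i < n; i++)
    {
        unsigned long tli = wal_segment_timeline(segs[i]);

        if (tli == 0)
            continue;           /* skip malformed names */
        if (lo == 0 || tli < lo)
            lo = tli;
        if (tli > hi)
            hi = tli;
    }
    if (hi == lo)
        return 0;               /* single timeline: no history file needed */
    snprintf(buf, buflen, "%08lX.history", hi);
    return 1;
}
```

    For segments on timelines 1 and 2 this reports 00000002.history, which is
    the file a backup taken across a promotion would be missing.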
    
    > There are some good explanations in comments of other things, just not
    > everywhere needed.
    >
    > What happens if we shutdown the WALwriter and then issue SIGHUP?
    
    SIGHUP doesn't affect full_page_writes in that case. Oh, you are concerned about
    the case where smart shutdown kills walwriter but some backends are still
    running? Currently SIGHUP affects full_page_writes and running backends use the
    changed new value of full_page_writes. But in the patch, SIGHUP doesn't affect...
    
    To address the problem, we should either postpone the shutdown of walwriter
    until all backends have gone away, or leave the update of full_page_writes to
    the checkpointer process instead of walwriter. Thoughts?
    
    > Are we sure we want to make the change of file format mandatory? That
    > means earlier versions of clients such as pg_basebackup will fail
    > against this version.
    
    Really? Unless I'm missing something, pg_basebackup doesn't care about the
    file format of backup_label. So I don't think that an earlier version of
    pg_basebackup fails.
    
    > There is nothing in the main docs to explain that the new feature is
    > available, or to explain the restrictions.
    > I expect you will add that later after commit.
    
    Okay. Will do.
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    
    
  89. Re: Online base backup from the hot-standby

    Simon Riggs <simon@2ndquadrant.com> — 2012-01-25T08:49:42Z

    On Wed, Jan 25, 2012 at 8:16 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    
    >> What happens if we shutdown the WALwriter and then issue SIGHUP?
    >
    > SIGHUP doesn't affect full_page_writes in that case. Oh, you are concerned about
    > the case where smart shutdown kills walwriter but some backends are still
    > running? Currently SIGHUP affects full_page_writes and running backends use the
    > changed new value of full_page_writes. But in the patch, SIGHUP doesn't affect...
    >
    > To address the problem, we should either postpone the shutdown of walwriter
    > until all backends have gone away, or leave the update of full_page_writes to
    > the checkpointer process instead of walwriter. Thoughts?
    
    checkpointer seems the correct place to me
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  90. Re: Online base backup from the hot-standby

    Simon Riggs <simon@2ndquadrant.com> — 2012-01-25T18:07:24Z

    On Wed, Jan 25, 2012 at 8:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Wed, Jan 25, 2012 at 8:16 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >
    >>> What happens if we shutdown the WALwriter and then issue SIGHUP?
    >>
    >> SIGHUP doesn't affect full_page_writes in that case. Oh, you are concerned about
    >> the case where smart shutdown kills walwriter but some backends are still
    >> running? Currently SIGHUP affects full_page_writes and running backends use the
    >> changed new value of full_page_writes. But in the patch, SIGHUP doesn't affect...
    >>
    >> To address the problem, we should either postpone the shutdown of walwriter
    >> until all backends have gone away, or leave the update of full_page_writes to
    >> the checkpointer process instead of walwriter. Thoughts?
    >
    > checkpointer seems the correct place to me
    
    
    Done.
    
    
    -- 
     Simon Riggs                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    
    
  91. Re: Online base backup from the hot-standby

    Fujii Masao <masao.fujii@gmail.com> — 2012-01-26T06:09:33Z

    On Thu, Jan 26, 2012 at 3:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    > On Wed, Jan 25, 2012 at 8:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
    >> On Wed, Jan 25, 2012 at 8:16 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
    >>
    >>>> What happens if we shutdown the WALwriter and then issue SIGHUP?
    >>>
    >>> SIGHUP doesn't affect full_page_writes in that case. Oh, you are concerned about
    >>> the case where smart shutdown kills walwriter but some backends are still
    >>> running? Currently SIGHUP affects full_page_writes and running backends use the
    >>> changed new value of full_page_writes. But in the patch, SIGHUP doesn't affect...
    >>>
    >>> To address the problem, we should either postpone the shutdown of walwriter
    >>> until all backends have gone away, or leave the update of full_page_writes to
    >>> the checkpointer process instead of walwriter. Thoughts?
    >>
    >> checkpointer seems the correct place to me
    >
    >
    > Done.
    
    Thanks a lot!!
    
    Upthread I proposed another small patch which fixes the issue with the
    pg_basebackup error message. If it's reasonable, could you commit it?
    http://archives.postgresql.org/message-id/CAHGQGwENjSDN=f_VDPwVQ53QRU0cu9+wZKBvwNaEXMawj-y-GQ@mail.gmail.com
    
    Regards,
    
    -- 
    Fujii Masao
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center