Thread

Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-05T06:45:47Z

> I will provide a patch which can exeute pg_start/stop_backup
> including to solve above comment and conditions in next stage.
> Then please review.

done.


* Procedure

1. Call pg_start_backup('x') on the standby.
2. Take a backup of the data dir.
3. Call pg_stop_backup() on the standby.
4. Copy the control file on the standby to the backup.
5. Check whether the control file is status during hot standby with pg_controldata.
   -> If the standby promote between 3. and 4., the backup can not recovery.
      -> pg_control is that "Minimum recovery ending location" is equals 0/0.
      -> backup-end record is not written.

* Not correspond yet

  * full_page_write = off
    -> If the primary is "full_page_write = off", archive recovery may not act 
       normally. Therefore the standby may need to check whether "full_page_write
       = off" to WAL.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Cédric Villemain <cedric.villemain.debian@gmail.com> — 2011-08-05T08:02:15Z

2011/8/5 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>> I will provide a patch which can exeute pg_start/stop_backup
>> including to solve above comment and conditions in next stage.
>> Then please review.
>
> done.

great !

>
>
> * Procedure
>
> 1. Call pg_start_backup('x') on the standby.
> 2. Take a backup of the data dir.
> 3. Call pg_stop_backup() on the standby.
> 4. Copy the control file on the standby to the backup.
> 5. Check whether the control file is status during hot standby with pg_controldata.
>   -> If the standby promote between 3. and 4., the backup can not recovery.
>      -> pg_control is that "Minimum recovery ending location" is equals 0/0.
>      -> backup-end record is not written.
>
> * Not correspond yet
>
>  * full_page_write = off
>    -> If the primary is "full_page_write = off", archive recovery may not act
>       normally. Therefore the standby may need to check whether "full_page_write
>       = off" to WAL.

Isn't having a standby make the full_page_write = on in all case
(bypass configuration) ?

>
> --------------------------------------------
> Jun Ishizuka
> NTT Software Corporation
> TEL：045-317-7018
> E-Mail: ishizuka.jun@po.ntts.co.jp
> --------------------------------------------
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-15T08:46:53Z

> > * Not correspond yet
> >
> >  * full_page_write = off
> >    -> If the primary is "full_page_write = off", archive recovery may not act
> >       normally. Therefore the standby may need to check whether "full_page_write
> >       = off" to WAL.
> 
> Isn't having a standby make the full_page_write = on in all case
> (bypass configuration) ?

what's the meaning?


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Robert Haas <robertmhaas@gmail.com> — 2011-08-15T11:52:21Z

2011/8/15 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>> > * Not correspond yet
>> >
>> >  * full_page_write = off
>> >    -> If the primary is "full_page_write = off", archive recovery may not act
>> >       normally. Therefore the standby may need to check whether "full_page_write
>> >       = off" to WAL.
>>
>> Isn't having a standby make the full_page_write = on in all case
>> (bypass configuration) ?
>
> what's the meaning?

Yeah.  full_page_writes is a WAL generation parameter.  Standbys don't
generate WAL.  I think you just have to insist that the master has it
on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-16T06:09:16Z

> >> > * Not correspond yet
> >> >
> >> > ?* full_page_write = off
> >> > ? ?-> If the primary is "full_page_write = off", archive recovery may not act
> >> > ? ? ? normally. Therefore the standby may need to check whether "full_page_write
> >> > ? ? ? = off" to WAL.
> >>
> >> Isn't having a standby make the full_page_write = on in all case
> >> (bypass configuration) ?
> >
> > what's the meaning?

Thanks. 

This has the following two problems.
 * pg_start_backup() must set 'on' to full_page_writes of the master that 
   is actual writing of the WAL, but not the standby.
 * The standby doesn't need to connect to the master that's actual writing 
   WAL.
   (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)

I'm worried how I should clear these problems.

Regards.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-16T06:12:59Z

> >> > * Not correspond yet
> >> >
> >> > ?* full_page_write = off
> >> > ? ?-> If the primary is "full_page_write = off", archive recovery may not act
> >> > ? ? ? normally. Therefore the standby may need to check whether "full_page_write
> >> > ? ? ? = off" to WAL.
> >>
> >> Isn't having a standby make the full_page_write = on in all case
> >> (bypass configuration) ?
> >
> > what's the meaning?
> 
> Yeah.  full_page_writes is a WAL generation parameter.  Standbys don't
> generate WAL.  I think you just have to insist that the master has it
> on.

Thanks. 

This has the following two problems.
 * pg_start_backup() must set 'on' to full_page_writes of the master that 
   is actual writing of the WAL, but not the standby.
 * The standby doesn't need to connect to the master that's actual writing 
   WAL.
   (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)

I'm worried how I should clear these problems.

Regards.



--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Steve Singer <ssinger_pg@sympatico.ca> — 2011-08-16T15:24:24Z

On 11-08-16 02:09 AM, Jun Ishiduka wrote:
>
> Thanks. 
>
> This has the following two problems.
>  * pg_start_backup() must set 'on' to full_page_writes of the master that 
>    is actual writing of the WAL, but not the standby.
Is there any way to tell from the WAL segments if they contain the full
page data? If so could you verify this on the second slave when it is
brought up? Or can you track this on the first slave and produce an
error in either pg_start_backup or pg_stop_backup()

I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
flag is used to indicate that the archiver can compress the full page
blocks to non-full page blocks. I am not familiar with where in the code
this actually happens but will this cause issues if the first standby is
processing WAL files from the archive?


>  * The standby doesn't need to connect to the master that's actual writing 
>    WAL.
>    (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)
>
> I'm worried how I should clear these problems.
>
> Regards.
>
> --------------------------------------------
> Jun Ishizuka
> NTT Software Corporation
> TEL：045-317-7018
> E-Mail: ishizuka.jun@po.ntts.co.jp
> --------------------------------------------
>
>
>

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-17T08:59:37Z

> Is there any way to tell from the WAL segments if they contain the full
> page data? If so could you verify this on the second slave when it is
> brought up? Or can you track this on the first slave and produce an
> error in either pg_start_backup or pg_stop_backup()

Sure.
I will make a patch with the way to tell from the WAL segments if they 
contain the full page data.


> I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
> flag is used to indicate that the archiver can compress the full page
> blocks to non-full page blocks. I am not familiar with where in the code
> this actually happens but will this cause issues if the first standby is
> processing WAL files from the archive?

I confirmed the flag in xlog.c, so I seemed to only insert it in 
XLogInsert(). I consider whether it is available.


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-08-17T10:19:03Z

2011/8/17 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>> I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
>> flag is used to indicate that the archiver can compress the full page
>> blocks to non-full page blocks. I am not familiar with where in the code
>> this actually happens but will this cause issues if the first standby is
>> processing WAL files from the archive?
>
> I confirmed the flag in xlog.c, so I seemed to only insert it in
> XLogInsert(). I consider whether it is available.

That flag is not available to check whether full-page writing was
skipped or not.
Because it's in full-page data, not non-full-page one.

The straightforward approach to address the problem you raised is to log
the change of full_page_writes on the master. Since such a WAL record is also
replicated to the standby, the standby can know whether full_page_writes is
enabled or not in the master, from the WAL record. If it's disabled,
pg_start_backup() in the standby should emit an error and refuse standby-only
backup. If the WAL record indicating that full_page_writes was disabled
on the master arrives during standby-only backup, the standby should cancel
the backup.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Robert Haas <robertmhaas@gmail.com> — 2011-08-17T12:40:15Z

On Wed, Aug 17, 2011 at 6:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> 2011/8/17 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>>> I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this
>>> flag is used to indicate that the archiver can compress the full page
>>> blocks to non-full page blocks. I am not familiar with where in the code
>>> this actually happens but will this cause issues if the first standby is
>>> processing WAL files from the archive?
>>
>> I confirmed the flag in xlog.c, so I seemed to only insert it in
>> XLogInsert(). I consider whether it is available.
>
> That flag is not available to check whether full-page writing was
> skipped or not.
> Because it's in full-page data, not non-full-page one.
>
> The straightforward approach to address the problem you raised is to log
> the change of full_page_writes on the master. Since such a WAL record is also
> replicated to the standby, the standby can know whether full_page_writes is
> enabled or not in the master, from the WAL record. If it's disabled,
> pg_start_backup() in the standby should emit an error and refuse standby-only
> backup. If the WAL record indicating that full_page_writes was disabled
> on the master arrives during standby-only backup, the standby should cancel
> the backup.

Seems like something we could add to XLOG_PARAMETER_CHANGE fairly easily.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-08-17T13:53:08Z

On Wed, Aug 17, 2011 at 9:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 17, 2011 at 6:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> The straightforward approach to address the problem you raised is to log
>> the change of full_page_writes on the master. Since such a WAL record is also
>> replicated to the standby, the standby can know whether full_page_writes is
>> enabled or not in the master, from the WAL record. If it's disabled,
>> pg_start_backup() in the standby should emit an error and refuse standby-only
>> backup. If the WAL record indicating that full_page_writes was disabled
>> on the master arrives during standby-only backup, the standby should cancel
>> the backup.
>
> Seems like something we could add to XLOG_PARAMETER_CHANGE fairly easily.

I'm afraid it's not so easy. Because since fpw can be changed by
SIGHUP, it's not
easy to ensure that logging the change of fpw must happen ahead of the actual
behavior change by that. Probably we need to make the backend which detects
the change of fpw first log that.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Robert Haas <robertmhaas@gmail.com> — 2011-08-17T15:09:43Z

On Wed, Aug 17, 2011 at 9:53 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Aug 17, 2011 at 9:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Aug 17, 2011 at 6:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> The straightforward approach to address the problem you raised is to log
>>> the change of full_page_writes on the master. Since such a WAL record is also
>>> replicated to the standby, the standby can know whether full_page_writes is
>>> enabled or not in the master, from the WAL record. If it's disabled,
>>> pg_start_backup() in the standby should emit an error and refuse standby-only
>>> backup. If the WAL record indicating that full_page_writes was disabled
>>> on the master arrives during standby-only backup, the standby should cancel
>>> the backup.
>>
>> Seems like something we could add to XLOG_PARAMETER_CHANGE fairly easily.
>
> I'm afraid it's not so easy. Because since fpw can be changed by
> SIGHUP, it's not
> easy to ensure that logging the change of fpw must happen ahead of the actual
> behavior change by that. Probably we need to make the backend which detects
> the change of fpw first log that.

Ugh, you're right.  But then you might have problems if the state
changes again before all backends have picked up the previous change.
What I've thought about before is making one backend (say, bgwriter)
store its latest value in shared memory, protected by some lock that
would already be held at the time the value is needed.  Everyone else
uses the shared memory copy instead of relying on their local value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-08-18T01:43:21Z

On Thu, Aug 18, 2011 at 12:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Ugh, you're right.  But then you might have problems if the state
> changes again before all backends have picked up the previous change.

Right.

> What I've thought about before is making one backend (say, bgwriter)
> store its latest value in shared memory, protected by some lock that
> would already be held at the time the value is needed.  Everyone else
> uses the shared memory copy instead of relying on their local value.

Sounds reasonable.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-08-18T02:12:55Z

2011/8/5 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
> * Procedure
>
> 1. Call pg_start_backup('x') on the standby.
> 2. Take a backup of the data dir.
> 3. Call pg_stop_backup() on the standby.
> 4. Copy the control file on the standby to the backup.
> 5. Check whether the control file is status during hot standby with pg_controldata.
>   -> If the standby promote between 3. and 4., the backup can not recovery.
>      -> pg_control is that "Minimum recovery ending location" is equals 0/0.
>      -> backup-end record is not written.

What if we do #4 before #3? The backup gets corrupted? My guess is
that the backup is still valid even if we copy pg_control before executing
pg_stop_backup(). Which would not require #5 because if the standby
promotion happens before pg_stop_backup(), pg_stop_backup() can
detect that status change and cancel the backup.

#5 looks fragile. If we can get rid of it, the procedure becomes more
robust, I think.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-08-18T05:47:25Z

> > * Procedure
> >
> > 1. Call pg_start_backup('x') on the standby.
> > 2. Take a backup of the data dir.
> > 3. Call pg_stop_backup() on the standby.
> > 4. Copy the control file on the standby to the backup.
> > 5. Check whether the control file is status during hot standby with pg_controldata.
> > ? -> If the standby promote between 3. and 4., the backup can not recovery.
> > ? ? ?-> pg_control is that "Minimum recovery ending location" is equals 0/0.
> > ? ? ?-> backup-end record is not written.
> 
> What if we do #4 before #3? The backup gets corrupted? My guess is
> that the backup is still valid even if we copy pg_control before executing
> pg_stop_backup(). Which would not require #5 because if the standby
> promotion happens before pg_stop_backup(), pg_stop_backup() can
> detect that status change and cancel the backup.
> 
> #5 looks fragile. If we can get rid of it, the procedure becomes more
> robust, I think.

Sure, you're right.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-09-12T06:46:37Z

Hi, Created a patch in response to comments.


* Procedure
1. Call pg_start_backup('x') on hot standby.
2. Take a backup of the data dir.
3. Copy the control file on hot standby to the backup.
4. Call pg_stop_backup() on hot standby.


* Behavior
(take backup)
 If we execute pg_start_backup() on hot standby then execute restartpoint,
 write a strings as "FROM: slave" in backup_label and change backup mode,
 but do not change full_page_writes into "on" forcibly.

 If we execute pg_stop_backup() on hot standby then rename backup_label
 and change backup mode, but neither write backup end record and history
 file nor wait to complete the WAL archiving.
 pg_stop_backup() is returned this MinRecoveryPoint as result.

 If we execute pg_stop_backup() on the server promoted then error
 message is output since read the backup_label.

(recovery)
 If we recover with the backup taken on hot standby, MinRecoveryPoint in
 the control file copied by 3 of above-procedure is used instead of backup
 end record.

 If recovery starts as first, BackupEndPoint in the control file is written
 a same value as MinRecoveryPoint. This is for remembering the value of
 MinRecoveryPoint during recovery.

 HINT message("If this has ...") is always output when we recover with the
 backup taken on hot standby.


* Problem
 full_page_writes's problem.
  > This has the following two problems.
  >  * pg_start_backup() must set 'on' to full_page_writes of the master that 
  >    is actual writing of the WAL, but not the standby.
  >  * The standby doesn't need to connect to the master that's actual writing 
  >    WAL.
  >    (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)
  > 
  > I'm worried how I should clear these problems.

 Status: Considering
  (Latest: http://archives.postgresql.org/pgsql-hackers/2011-08/msg00880.php)


Regards.


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-09-13T06:50:28Z

Update patch.

Changes:
  * set 'on' full_page_writes by user (in document)
  * read "FROM: XX" in backup_label (in xlog.c)
  * check status when pg_stop_backup is executed (in xlog.c)

> Hi, Created a patch in response to comments.
> 
> 
> * Procedure
> 1. Call pg_start_backup('x') on hot standby.
> 2. Take a backup of the data dir.
> 3. Copy the control file on hot standby to the backup.
> 4. Call pg_stop_backup() on hot standby.
> 
> 
> * Behavior
> (take backup)
>  If we execute pg_start_backup() on hot standby then execute restartpoint,
>  write a strings as "FROM: slave" in backup_label and change backup mode,
>  but do not change full_page_writes into "on" forcibly.
> 
>  If we execute pg_stop_backup() on hot standby then rename backup_label
>  and change backup mode, but neither write backup end record and history
>  file nor wait to complete the WAL archiving.
>  pg_stop_backup() is returned this MinRecoveryPoint as result.
> 
>  If we execute pg_stop_backup() on the server promoted then error
>  message is output since read the backup_label.
> 
> (recovery)
>  If we recover with the backup taken on hot standby, MinRecoveryPoint in
>  the control file copied by 3 of above-procedure is used instead of backup
>  end record.
> 
>  If recovery starts as first, BackupEndPoint in the control file is written
>  a same value as MinRecoveryPoint. This is for remembering the value of
>  MinRecoveryPoint during recovery.
> 
>  HINT message("If this has ...") is always output when we recover with the
>  backup taken on hot standby.
> 
> 
> * Problem
>  full_page_writes's problem.
>   > This has the following two problems.
>   >  * pg_start_backup() must set 'on' to full_page_writes of the master that 
>   >    is actual writing of the WAL, but not the standby.
>   >  * The standby doesn't need to connect to the master that's actual writing 
>   >    WAL.
>   >    (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2)
>   > 
>   > I'm worried how I should clear these problems.
> 
>  Status: Considering
>   (Latest: http://archives.postgresql.org/pgsql-hackers/2011-08/msg00880.php)
> 
> 
> Regards.
> 
> 
> --------------------------------------------
> Jun Ishizuka
> NTT Software Corporation
> TEL：045-317-7018
> E-Mail: ishizuka.jun@po.ntts.co.jp
> --------------------------------------------


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-09-21T02:50:24Z

2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>
> Update patch.
>
> Changes:
>  * set 'on' full_page_writes by user (in document)
>  * read "FROM: XX" in backup_label (in xlog.c)
>  * check status when pg_stop_backup is executed (in xlog.c)

Thanks for updating the patch.

Before reviewing the patch, to encourage people to comment and
review the patch, I explain what this patch provides:

This patch provides the capability to take a base backup during recovery,
i.e., from the standby server. This is very useful feature to offload the
expense of periodic backups from the master. That backup procedure is
similar to that during normal running, but slightly different:

1. Execute pg_start_backup on the standby. To execute a query on the
   standby, hot standby must be enabled.

2. Perform a file system backup on the standby.

3. Copy the pg_control file from the cluster directory on the standby to
    the backup as follows:

    cp $PGDATA/global/pg_control /mnt/server/backupdir/global

4. Execute pg_stop_backup on the standby.

The backup taken by the above procedure is available for an archive
recovery or standby server.

If the standby is promoted during a backup, pg_stop_backup() detects
the change of the server status and fails. The data backed up before the
promotion is invalid and not available for recovery.

Taking a backup from the standby by using pg_basebackup is still not
possible. But we can relax that restriction after applying this patch.

To take a base backup during recovery safely, some sort of parameters
must be set properly. Hot standby must be enabled on the standby, i.e.,
wal_level and hot_standby must be enabled on the master and the standby,
respectively. FPW (full page writes) is required for a base backup,
so full_page_writes must be enabled on the master.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Magnus Hagander <magnus@hagander.net> — 2011-09-21T05:13:21Z

On Wed, Sep 21, 2011 at 04:50, Fujii Masao <masao.fujii@gmail.com> wrote:
> 2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>>
>> Update patch.
>>
>> Changes:
>>  * set 'on' full_page_writes by user (in document)
>>  * read "FROM: XX" in backup_label (in xlog.c)
>>  * check status when pg_stop_backup is executed (in xlog.c)
>
> Thanks for updating the patch.
>
> Before reviewing the patch, to encourage people to comment and
> review the patch, I explain what this patch provides:
>
> This patch provides the capability to take a base backup during recovery,
> i.e., from the standby server. This is very useful feature to offload the
> expense of periodic backups from the master. That backup procedure is
> similar to that during normal running, but slightly different:
>
> 1. Execute pg_start_backup on the standby. To execute a query on the
>   standby, hot standby must be enabled.
>
> 2. Perform a file system backup on the standby.
>
> 3. Copy the pg_control file from the cluster directory on the standby to
>    the backup as follows:
>
>    cp $PGDATA/global/pg_control /mnt/server/backupdir/global

But this is done as part of step 2 already. I assume what this really
means is that the pg_control file must be the last file backed up?

(Since there are certainly a lot other ways to do the backup than just
cp to a mounted directory..)


> 4. Execute pg_stop_backup on the standby.
>
> The backup taken by the above procedure is available for an archive
> recovery or standby server.
>
> If the standby is promoted during a backup, pg_stop_backup() detects
> the change of the server status and fails. The data backed up before the
> promotion is invalid and not available for recovery.
>
> Taking a backup from the standby by using pg_basebackup is still not
> possible. But we can relax that restriction after applying this patch.

I think that this is going to be very important, particularly given
the requirements on pt 3 above. (But yes, it certainly doesn't have to
be done as part of this patch, but it really should be the plan to
have this included in the same version)


> To take a base backup during recovery safely, some sort of parameters
> must be set properly. Hot standby must be enabled on the standby, i.e.,
> wal_level and hot_standby must be enabled on the master and the standby,
> respectively. FPW (full page writes) is required for a base backup,
> so full_page_writes must be enabled on the master.

Presumably pg_start_backup() will check this. And we'll somehow track
this before pg_stop_backup() as well? (for such evil things such as
the user changing FPW from on to off and then back to on again during
a backup, will will make it look correct both during start and stop,
but incorrect in the middle - pg_stop_backup needs to fail in that
case as well)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-09-21T06:23:53Z

On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Sep 21, 2011 at 04:50, Fujii Masao <masao.fujii@gmail.com> wrote:
>> 3. Copy the pg_control file from the cluster directory on the standby to
>>    the backup as follows:
>>
>>    cp $PGDATA/global/pg_control /mnt/server/backupdir/global
>
> But this is done as part of step 2 already. I assume what this really
> means is that the pg_control file must be the last file backed up?

Yes.

When we perform an archive recovery from the backup taken during
normal processing, we gets a backup end location from the backup-end
WAL record which was written by pg_stop_backup(). But since no WAL
writing is allowed during recovery, pg_stop_backup() on the standby
cannot write a backup-end WAL record. So, in his patch, instead of
a backup-end WAL record, the startup process uses the minimum
recovery point recorded in pg_control which has been included in the
backup, as a backup end location. BTW, a backup end location is
used to check whether recovery has reached a consistency state
(i.e., end-of-backup).

To use the minimum recovery point in pg_control as a backup end
location safely, pg_control must be backed up last. Otherwise, data
page which has the newer LSN than the minimum recovery point
might be included in the backup.

> (Since there are certainly a lot other ways to do the backup than just
> cp to a mounted directory..)

Yes. The above command I described is just an example.

>> 4. Execute pg_stop_backup on the standby.
>>
>> The backup taken by the above procedure is available for an archive
>> recovery or standby server.
>>
>> If the standby is promoted during a backup, pg_stop_backup() detects
>> the change of the server status and fails. The data backed up before the
>> promotion is invalid and not available for recovery.
>>
>> Taking a backup from the standby by using pg_basebackup is still not
>> possible. But we can relax that restriction after applying this patch.
>
> I think that this is going to be very important, particularly given
> the requirements on pt 3 above. (But yes, it certainly doesn't have to
> be done as part of this patch, but it really should be the plan to
> have this included in the same version)

Agreed.

>> To take a base backup during recovery safely, some sort of parameters
>> must be set properly. Hot standby must be enabled on the standby, i.e.,
>> wal_level and hot_standby must be enabled on the master and the standby,
>> respectively. FPW (full page writes) is required for a base backup,
>> so full_page_writes must be enabled on the master.
>
> Presumably pg_start_backup() will check this. And we'll somehow track
> this before pg_stop_backup() as well? (for such evil things such as
> the user changing FPW from on to off and then back to on again during
> a backup, will will make it look correct both during start and stop,
> but incorrect in the middle - pg_stop_backup needs to fail in that
> case as well)

Right. As I suggested upthread, to address that problem, we need to log
the change of FPW on the master, and then we need to check whether
such a WAL is replayed on the standby during the backup. If it's done,
pg_stop_backup() should emit an error.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Magnus Hagander <magnus@hagander.net> — 2011-09-21T08:34:26Z

On Wed, Sep 21, 2011 at 08:23, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Wed, Sep 21, 2011 at 04:50, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> 3. Copy the pg_control file from the cluster directory on the standby to
>>>    the backup as follows:
>>>
>>>    cp $PGDATA/global/pg_control /mnt/server/backupdir/global
>>
>> But this is done as part of step 2 already. I assume what this really
>> means is that the pg_control file must be the last file backed up?
>
> Yes.
>
> When we perform an archive recovery from the backup taken during
> normal processing, we gets a backup end location from the backup-end
> WAL record which was written by pg_stop_backup(). But since no WAL
> writing is allowed during recovery, pg_stop_backup() on the standby
> cannot write a backup-end WAL record. So, in his patch, instead of
> a backup-end WAL record, the startup process uses the minimum
> recovery point recorded in pg_control which has been included in the
> backup, as a backup end location. BTW, a backup end location is
> used to check whether recovery has reached a consistency state
> (i.e., end-of-backup).
>
> To use the minimum recovery point in pg_control as a backup end
> location safely, pg_control must be backed up last. Otherwise, data
> page which has the newer LSN than the minimum recovery point
> might be included in the backup.

Ah, check.


>> (Since there are certainly a lot other ways to do the backup than just
>> cp to a mounted directory..)
>
> Yes. The above command I described is just an example.

ok.


>>> 4. Execute pg_stop_backup on the standby.
>>>
>>> The backup taken by the above procedure is available for an archive
>>> recovery or standby server.
>>>
>>> If the standby is promoted during a backup, pg_stop_backup() detects
>>> the change of the server status and fails. The data backed up before the
>>> promotion is invalid and not available for recovery.
>>>
>>> Taking a backup from the standby by using pg_basebackup is still not
>>> possible. But we can relax that restriction after applying this patch.
>>
>> I think that this is going to be very important, particularly given
>> the requirements on pt 3 above. (But yes, it certainly doesn't have to
>> be done as part of this patch, but it really should be the plan to
>> have this included in the same version)
>
> Agreed.
>
>>> To take a base backup during recovery safely, some sort of parameters
>>> must be set properly. Hot standby must be enabled on the standby, i.e.,
>>> wal_level and hot_standby must be enabled on the master and the standby,
>>> respectively. FPW (full page writes) is required for a base backup,
>>> so full_page_writes must be enabled on the master.
>>
>> Presumably pg_start_backup() will check this. And we'll somehow track
>> this before pg_stop_backup() as well? (for such evil things such as
>> the user changing FPW from on to off and then back to on again during
>> a backup, will will make it look correct both during start and stop,
>> but incorrect in the middle - pg_stop_backup needs to fail in that
>> case as well)
>
> Right. As I suggested upthread, to address that problem, we need to log
> the change of FPW on the master, and then we need to check whether
> such a WAL is replayed on the standby during the backup. If it's done,
> pg_stop_backup() should emit an error.

I somehow missed this thread completely, so I didn't catch your
previous comments - oops, sorry. The important point being that we
need to track if when this happens even if it has been reset to a
valid value. So we can't just check the state of the variable at the
beginning and at the end.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Remastering using streaming only replication?

Josh Berkus <josh@agliodbs.com> — 2011-09-21T16:52:18Z

Fujii,

I haven't really been following your latest patches about taking backups
from the standby and cascading replication, but I wanted to see if it
fulfills another TODO: the ability to "remaster" (that is, designate the
"lead standby" as the new master) without needing to copy WAL files.

Supporting remastering using steaming replication only was on your TODO
list when we closed 9.1.  It seems like this would get solved as a
side-effect, but I wanted to confirm that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-09-22T12:13:43Z

On Wed, Sep 21, 2011 at 5:34 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Sep 21, 2011 at 08:23, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> Presumably pg_start_backup() will check this. And we'll somehow track
>>> this before pg_stop_backup() as well? (for such evil things such as
>>> the user changing FPW from on to off and then back to on again during
>>> a backup, will will make it look correct both during start and stop,
>>> but incorrect in the middle - pg_stop_backup needs to fail in that
>>> case as well)
>>
>> Right. As I suggested upthread, to address that problem, we need to log
>> the change of FPW on the master, and then we need to check whether
>> such a WAL is replayed on the standby during the backup. If it's done,
>> pg_stop_backup() should emit an error.
>
> I somehow missed this thread completely, so I didn't catch your
> previous comments - oops, sorry. The important point being that we
> need to track if when this happens even if it has been reset to a
> valid value. So we can't just check the state of the variable at the
> beginning and at the end.

Right. Let me explain again what I'm thinking.

When FPW is changed, the master always writes the WAL record
which contains the current value of FPW. This means that the standby
can track all changes of FPW by reading WAL records.

The standby has two flags: One indicates whether FPW has always
been TRUE since last restartpoint. Another indicates whether FPW
has always been TRUE since last pg_start_backup(). The standby
can maintain those flags by reading WAL records streamed from
the master.

If the former flag indicates FALSE (i.e., the WAL records which
the standby has replayed since last restartpoint might not contain
required FPW), pg_start_backup() fails. If the latter flag indicates
FALSE (i.e., the WAL records which the standby has replayed
during the backup might not contain required FPW),
pg_stop_backup() fails.

If I'm not missing something, this approach can address the problem
which you're concerned about.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-09-22T13:24:51Z

On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> 2011/9/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>>
>> Update patch.
>>
>> Changes:
>>  * set 'on' full_page_writes by user (in document)
>>  * read "FROM: XX" in backup_label (in xlog.c)
>>  * check status when pg_stop_backup is executed (in xlog.c)
>
> Thanks for updating the patch.
>
> Before reviewing the patch, to encourage people to comment and
> review the patch, I explain what this patch provides:

Attached is the updated version of the patch. I refactored the code, fixed
some bugs, added lots of source code comments, improved the document,
but didn't change the basic design. Please check this patch, and let's use
this patch as the base if you agree with that.

In the current patch, there is no safeguard for preventing users from
taking backup during recovery when FPW is disabled. This is unsafe.
Are you planning to implement such a safeguard?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Magnus Hagander <magnus@hagander.net> — 2011-09-22T15:44:44Z

On Thu, Sep 22, 2011 at 14:13, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Sep 21, 2011 at 5:34 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Wed, Sep 21, 2011 at 08:23, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Wed, Sep 21, 2011 at 2:13 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> Presumably pg_start_backup() will check this. And we'll somehow track
>>>> this before pg_stop_backup() as well? (for such evil things such as
>>>> the user changing FPW from on to off and then back to on again during
>>>> a backup, will will make it look correct both during start and stop,
>>>> but incorrect in the middle - pg_stop_backup needs to fail in that
>>>> case as well)
>>>
>>> Right. As I suggested upthread, to address that problem, we need to log
>>> the change of FPW on the master, and then we need to check whether
>>> such a WAL is replayed on the standby during the backup. If it's done,
>>> pg_stop_backup() should emit an error.
>>
>> I somehow missed this thread completely, so I didn't catch your
>> previous comments - oops, sorry. The important point being that we
>> need to track if when this happens even if it has been reset to a
>> valid value. So we can't just check the state of the variable at the
>> beginning and at the end.
>
> Right. Let me explain again what I'm thinking.
>
> When FPW is changed, the master always writes the WAL record
> which contains the current value of FPW. This means that the standby
> can track all changes of FPW by reading WAL records.
>
> The standby has two flags: One indicates whether FPW has always
> been TRUE since last restartpoint. Another indicates whether FPW
> has always been TRUE since last pg_start_backup(). The standby
> can maintain those flags by reading WAL records streamed from
> the master.
>
> If the former flag indicates FALSE (i.e., the WAL records which
> the standby has replayed since last restartpoint might not contain
> required FPW), pg_start_backup() fails. If the latter flag indicates
> FALSE (i.e., the WAL records which the standby has replayed
> during the backup might not contain required FPW),
> pg_stop_backup() fails.
>
> If I'm not missing something, this approach can address the problem
> which you're concerned about.

Yeah, it sounds safe to me.

Would it make sense for pg_start_backup() to have the ability to wait
for the next restartpoint in a case like this, if we know that FPW has
been set? Instead of failing? Or maybe that's just overcomplicating
things when trying to be user-friendly.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Online base backup from the hot-standby

Steve Singer <ssinger_pg@sympatico.ca> — 2011-09-26T02:39:00Z

On 11-09-22 09:24 AM, Fujii Masao wrote:
> On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>> 2011/9/13 Jun Ishiduka<ishizuka.jun@po.ntts.co.jp>:
>>> Update patch.
>>>
>>> Changes:
>>>   * set 'on' full_page_writes by user (in document)
>>>   * read "FROM: XX" in backup_label (in xlog.c)
>>>   * check status when pg_stop_backup is executed (in xlog.c)
>> Thanks for updating the patch.
>>
>> Before reviewing the patch, to encourage people to comment and
>> review the patch, I explain what this patch provides:
> Attached is the updated version of the patch. I refactored the code, fixed
> some bugs, added lots of source code comments, improved the document,
> but didn't change the basic design. Please check this patch, and let's use
> this patch as the base if you agree with that.
>

I have looked at both Jun's patch from Sept 13 and Fujii's updates to 
the patch.  I agree that Fujii's updated version should be used as the 
basis for changes going forward.   My comments below refer to that 
version (unless otherwise noted).

In backup.sgml  the new section titled "Making a Base Backup during 
Recovery"  I would prefer to see some mention in the title that this 
procedure is for standby servers ie "Making a Base Backup from a Standby 
Database".  Users who have setup a hot-standby database should be 
familiar with the 'standby' terminology. I agree that the "during 
recovery" description is technically correct but I'm not sure someone 
who is looking through the manual for instructions on making a base 
backup from here standby will realize this is the section they should read.

Around line 969 where you give an example of copying the control file I 
would be a bit clearer that this is an example command.  Ie (Copy the 
pg_control file from the cluster directory to the global sub-directory 
of the backup.  For example "cp $PGDATA/global/pg_control 
/mnt/server/backupdir/global")

Testing Notes
-----------------------------

I created a standby server from a base backup of another standby server. 
On this new standby server I then

1. Ran pg_start_backup('3'); and left the psql connection open
2. touch /tmp/3 -- my trigger_file

ssinger@ssinger-laptop:/usr/local/pgsql92git/bin$ LOG:  trigger file 
found: /tmp/3
FATAL:  terminating walreceiver process due to administrator command
LOG:  restored log file "000000010000000000000006" from archive
LOG:  record with zero length at 0/60002F0
LOG:  restored log file "000000010000000000000006" from archive
LOG:  redo done at 0/6000298
LOG:  restored log file "000000010000000000000006" from archive
PANIC:  record with zero length at 0/6000298
LOG:  startup process (PID 19011) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back 
the current transaction and exit, because another server process exited 
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.

The new postmaster (the one trying to be promoted) dies.  This is 
somewhat repeatable.

----

If a base backup is in progress on a recovery database and that recovery 
database is promoted to master, following the promotion (if you don't 
restart the postmaster).  I see
select pg_stop_backup();
ERROR:  database system status mismatches between pg_start_backup() and 
pg_stop_backup()

If you restart the postmaster this goes away.  When the postmaster 
leaves recovery mode I think it should abort an existing base backup so 
pg_stop_backup() will say no backup in progress, or give an error 
message on pg_stop_backup() saying that the base backup won't be 
usable.  The above error doesn't really tell the user why there is a 
mismatch.

---------

In my testing a few times I got into a situation where a standby server 
coming from a recovery target took a while to finish recovery (this is 
on a database with no activity).  Then when i tried promoting that 
server to master I got

LOG:  trigger file found: /tmp/3
FATAL:  terminating walreceiver process due to administrator command
LOG:  restored log file "000000010000000000000009" from archive
LOG:  restored log file "000000010000000000000009" from archive
LOG:  redo done at 0/90000E8
LOG:  restored log file "000000010000000000000009" from archive
PANIC:  unexpected pageaddr 0/6000000 in log file 0, segment 9, offset 0
LOG:  startup process (PID 1804) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

It is *possible* I mixed up the order of a step somewhere since my 
testing isn't script based. A standby server that 'looks' okay but can't 
actually be promoted is dangerous.

This version of the patch (I was testing the Sept 22nd version) seems 
less stable than how I remember the version from the July CF.  Maybe I'm 
just testing it harder or maybe something has been broken.

> In the current patch, there is no safeguard for preventing users from
> taking backup during recovery when FPW is disabled. This is unsafe.
> Are you planning to implement such a safeguard?
>

I agree with Fujii that we need a way (on the recovery machine) to 
detect if the master doesn't have FPW on. The ideas up-thread on how to 
do this sound good.

> Regards,
>
>
>
>

Re: Remastering using streaming only replication?

Fujii Masao <masao.fujii@gmail.com> — 2011-09-26T08:07:21Z

On Thu, Sep 22, 2011 at 1:52 AM, Josh Berkus <josh@agliodbs.com> wrote:
> Fujii,
>
> I haven't really been following your latest patches about taking backups
> from the standby and cascading replication, but I wanted to see if it
> fulfills another TODO: the ability to "remaster" (that is, designate the
> "lead standby" as the new master) without needing to copy WAL files.

Sorry, I could not follow you. I believe that we can "remaster" even in 9.1.
When the master crashes, we can choose the "lead standby" by comparing
each standby replay location, and can promote it by pg_ctl promote.

What "remaster" feature are you expecting we should develop in 9.2?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-09-26T08:41:06Z

> Attached is the updated version of the patch. I refactored the code, fixed
> some bugs, added lots of source code comments, improved the document,
> but didn't change the basic design. Please check this patch, and let's use
> this patch as the base if you agree with that.

Thanks for update patch.
Yes. I agree.


> In the current patch, there is no safeguard for preventing users from
> taking backup during recovery when FPW is disabled. This is unsafe.
> Are you planning to implement such a safeguard?

Yes.
I want to reference the following Fujii's comments.

-------------------------------------------------------------------------
> Right. Let me explain again what I'm thinking.
> 
> When FPW is changed, the master always writes the WAL record
> which contains the current value of FPW. This means that the standby
> can track all changes of FPW by reading WAL records.
> 
> The standby has two flags: One indicates whether FPW has always
> been TRUE since last restartpoint. Another indicates whether FPW
> has always been TRUE since last pg_start_backup(). The standby
> can maintain those flags by reading WAL records streamed from
> the master.
> 
> If the former flag indicates FALSE (i.e., the WAL records which
> the standby has replayed since last restartpoint might not contain
> required FPW), pg_start_backup() fails. If the latter flag indicates
> FALSE (i.e., the WAL records which the standby has replayed
> during the backup might not contain required FPW),
> pg_stop_backup() fails.
> 
> If I'm not missing something, this approach can address the problem
> which you're concerned about.
-------------------------------------------------------------------------

Regards.


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-09-26T12:12:41Z

On Fri, Sep 23, 2011 at 12:44 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Would it make sense for pg_start_backup() to have the ability to wait
> for the next restartpoint in a case like this, if we know that FPW has
> been set? Instead of failing? Or maybe that's just overcomplicating
> things when trying to be user-friendly.

I don't think that it's worth adding code for such a feature. Because I believe
there are not many users who enable FPW on-the-fly for standby-only backup
and use such a feature.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-09-27T02:56:25Z

On Mon, Sep 26, 2011 at 11:39 AM, Steve Singer <ssinger_pg@sympatico.ca> wrote:
> I have looked at both Jun's patch from Sept 13 and Fujii's updates to the
> patch.  I agree that Fujii's updated version should be used as the basis for
> changes going forward.   My comments below refer to that version (unless
> otherwise noted).

Thanks for the tests and comments!

> In backup.sgml  the new section titled "Making a Base Backup during
> Recovery"  I would prefer to see some mention in the title that this
> procedure is for standby servers ie "Making a Base Backup from a Standby
> Database".  Users who have setup a hot-standby database should be familiar
> with the 'standby' terminology. I agree that the "during recovery"
> description is technically correct but I'm not sure someone who is looking
> through the manual for instructions on making a base backup from here
> standby will realize this is the section they should read.

I used the term "recovery" rather than "standby" because we can take
a backup even from the server in normal archive recovery mode but not
standby mode. But there is not many users who take a backup during
normal archive recovery, so I agree that the term "standby" is better to
be used in the document. Will change.

> Around line 969 where you give an example of copying the control file I
> would be a bit clearer that this is an example command.  Ie (Copy the
> pg_control file from the cluster directory to the global sub-directory of
> the backup.  For example "cp $PGDATA/global/pg_control
> /mnt/server/backupdir/global")

Looks better. Will change.

> Testing Notes
> -----------------------------
>
> I created a standby server from a base backup of another standby server. On
> this new standby server I then
>
> 1. Ran pg_start_backup('3'); and left the psql connection open
> 2. touch /tmp/3 -- my trigger_file
>
> ssinger@ssinger-laptop:/usr/local/pgsql92git/bin$ LOG:  trigger file found:
> /tmp/3
> FATAL:  terminating walreceiver process due to administrator command
> LOG:  restored log file "000000010000000000000006" from archive
> LOG:  record with zero length at 0/60002F0
> LOG:  restored log file "000000010000000000000006" from archive
> LOG:  redo done at 0/6000298
> LOG:  restored log file "000000010000000000000006" from archive
> PANIC:  record with zero length at 0/6000298
> LOG:  startup process (PID 19011) was terminated by signal 6: Aborted
> LOG:  terminating any other active server processes
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
>
> The new postmaster (the one trying to be promoted) dies.  This is somewhat
> repeatable.

Looks weired. Though the WAL record starting from 0/6000298 was read
successfully, then re-fetch of the same record fails at the end of recovery.
One possible cause is the corruption of archived WAL file. What
restore_command on the standby and archive_command on the master
are you using? Could you confirm that there is no chance to overwrite
archive WAL files in your environment?

I tried to reproduce this problem several times, but I could not. Could
you provide the test case which reproduces the problem?

> If a base backup is in progress on a recovery database and that recovery
> database is promoted to master, following the promotion (if you don't
> restart the postmaster).  I see
> select pg_stop_backup();
> ERROR:  database system status mismatches between pg_start_backup() and
> pg_stop_backup()
>
> If you restart the postmaster this goes away.  When the postmaster leaves
> recovery mode I think it should abort an existing base backup so
> pg_stop_backup() will say no backup in progress,

I don't think that it's good idea to cancel the backup when promoting
the standby.
Because if we do so, we need to handle correctly the case where cancel of backup
and pg_start_backup/pg_stop_backup are performed at the same time. We can
simply do that by protecting those whole operations including pg_start_backup's
checkpoint by the lwlock. But I don't think that it's worth
introducing new lwlock
only for that. And it's not good to take a lwlock through
time-consuming checkpoint
operation. Of course we can avoid such a lwlock, but which would require more
complicated code.

> or give an error message on
> pg_stop_backup() saying that the base backup won't be usable.  The above
> error doesn't really tell the user why there is a mismatch.

What about the following error message?

ERROR:  pg_stop_backup() was executed during normal processing though
pg_start_backup() was executed during recovery
HINT:  The database backup will not be usable.

Or, you have better idea?

> In my testing a few times I got into a situation where a standby server
> coming from a recovery target took a while to finish recovery (this is on a
> database with no activity).  Then when i tried promoting that server to
> master I got
>
> LOG:  trigger file found: /tmp/3
> FATAL:  terminating walreceiver process due to administrator command
> LOG:  restored log file "000000010000000000000009" from archive
> LOG:  restored log file "000000010000000000000009" from archive
> LOG:  redo done at 0/90000E8
> LOG:  restored log file "000000010000000000000009" from archive
> PANIC:  unexpected pageaddr 0/6000000 in log file 0, segment 9, offset 0
> LOG:  startup process (PID 1804) was terminated by signal 6: Aborted
> LOG:  terminating any other active server processes
>
> It is *possible* I mixed up the order of a step somewhere since my testing
> isn't script based. A standby server that 'looks' okay but can't actually be
> promoted is dangerous.

Looks the same problem as the above. Another weired point is that
the same archived WAL file is restored two times before redo is done.
I'm not sure why this happens... Could you provide the test case which
reproduces this problem? Will diagnose.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-09-27T05:51:38Z

On Tue, Sep 27, 2011 at 11:56 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> In backup.sgml  the new section titled "Making a Base Backup during
>> Recovery"  I would prefer to see some mention in the title that this
>> procedure is for standby servers ie "Making a Base Backup from a Standby
>> Database".  Users who have setup a hot-standby database should be familiar
>> with the 'standby' terminology. I agree that the "during recovery"
>> description is technically correct but I'm not sure someone who is looking
>> through the manual for instructions on making a base backup from here
>> standby will realize this is the section they should read.
>
> I used the term "recovery" rather than "standby" because we can take
> a backup even from the server in normal archive recovery mode but not
> standby mode. But there is not many users who take a backup during
> normal archive recovery, so I agree that the term "standby" is better to
> be used in the document. Will change.

Done.

>> Around line 969 where you give an example of copying the control file I
>> would be a bit clearer that this is an example command.  Ie (Copy the
>> pg_control file from the cluster directory to the global sub-directory of
>> the backup.  For example "cp $PGDATA/global/pg_control
>> /mnt/server/backupdir/global")
>
> Looks better. Will change.

Done.

>> or give an error message on
>> pg_stop_backup() saying that the base backup won't be usable.  The above
>> error doesn't really tell the user why there is a mismatch.
>
> What about the following error message?
>
> ERROR:  pg_stop_backup() was executed during normal processing though
> pg_start_backup() was executed during recovery
> HINT:  The database backup will not be usable.

Done. I attached the new version of the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Steve Singer <ssinger_pg@sympatico.ca> — 2011-09-27T23:10:39Z

On 11-09-26 10:56 PM, Fujii Masao wrote:
>
> Looks weired. Though the WAL record starting from 0/6000298 was read
> successfully, then re-fetch of the same record fails at the end of recovery.
> One possible cause is the corruption of archived WAL file. What
> restore_command on the standby and archive_command on the master
> are you using? Could you confirm that there is no chance to overwrite
> archive WAL files in your environment?
>
> I tried to reproduce this problem several times, but I could not. Could
> you provide the test case which reproduces the problem?
>

This is the test procedure I'm trying today, I wasn't able to reproduce 
the crash.  What I was doing the other day was similar but I can't speak 
to unintentional differences.


I have my master server
data
port=5439
wal_level=hot_standby
archive_mode=on
archive_command='cp -i %p /usr/local/pgsql92git/archive/%f'
hot_standby=on

I then run
select pg_start_backup('foo');
$ rm -r ../data2
$ cp -r ../data ../data2
$ rm ../data2/postmaster.pid
select pg_stop_backup();
I edit data2/postgresql.conf so
port=5438
I commented out archive_mode and archive_command (or at least today I did)
recovery.conf is

standby_mode='on'
primary_conninfo='host=127.0.0.1 port=5439 user=ssinger dbname=test'
restore_command='cp /usr/local/pgsql92git/archive/%f %p'

I then start up the second cluster. On it I run

select pg_start_backup('1');
$ rm -r ../data3
$ rm -r ../archive2
$ cp -r ../data2 ../data3
$ cp ../data2/global/pg_control ../data3/global

select pg_stop_backup();
I edit ../data2/postgresql.conf
port=5437
archive_mode=on
                                 # (change requires restart)
archive_command='cp -i %p /usr/local/pgsql92git/archive2/%f'

recovery.conf is

standby_mode='on'
primary_conninfo='host=127.0.0.1 port=5439 user=ssinger dbname=test'
restore_command='cp /usr/local/pgsql92git/archive/%f %p'
trigger_file='/tmp/3'

$ postgres -D ../data3

The first time I did this postgres came up quickly.

$ touch /tmp/3

worked fine.

I then stopped data3
$ rm -r ../data3
on data 2 I run
pg_start_backup('1')
$ cp -r ../data2 ../data3
$ cp ../data2/global/pg_control ../data3/global
select pg_stop_backup() # on data2
$ rm ../data3/postmaster.pid
vi ../data3/postgresql.conf # same changes as above for data3
vi ../data3/recovery.conf # same as above for data 3
postgres -D ../data3

This time I got
./postgres -D ../data3
LOG:  database system was interrupted while in recovery at log time 
2011-09-27 22:04:17 GMT
HINT:  If this has occurred more than once some data might be corrupted 
and you might need to choose an earlier recovery target.
LOG:  entering standby mode
cp: cannot stat 
`/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
or directory
LOG:  redo starts at 0/C000020
LOG:  record with incorrect prev-link 0/9000058 at 0/C0000B0
cp: cannot stat 
`/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
or directory
LOG:  streaming replication successfully connected to primary
FATAL:  the database system is starting up
FATAL:  the database system is starting up
LOG:  consistent recovery state reached at 0/C0000E8
LOG:  database system is ready to accept read only connections

In order to get the database to come in read only mode I manually issued 
a checkpoint on the master (data) shortly after the checkpoint command 
the data3 instance went to read only mode.

then

touch /tmp/3

trigger file found: /tmp/3
FATAL:  terminating walreceiver process due to administrator command
cp: cannot stat 
`/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
or directory
LOG:  record with incorrect prev-link 0/9000298 at 0/C0002F0
cp: cannot stat 
`/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
or directory
LOG:  redo done at 0/C000298
cp: cannot stat 
`/usr/local/pgsql92git/archive/00000001000000000000000C': No such file 
or directory
cp: cannot stat `/usr/local/pgsql92git/archive/00000002.history': No 
such file or directory
LOG:  selected new timeline ID: 2
cp: cannot stat `/usr/local/pgsql92git/archive/00000001.history': No 
such file or directory
LOG:  archive recovery complete
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started


It looks like data3 is still pulling files with the recovery command 
after it sees the touch file (is this expected behaviour?)
$ grep archive ../data3/postgresql.conf
#wal_level = minimal            # minimal, archive, or hot_standby
#archive_mode = off        # allows archiving to be done
archive_mode=on
archive_command='cp -i %p /usr/local/pgsql92git/archive2/%f'


I have NOT been able to make postgres crash during a recovery (today).  
It is *possible* that on some of my runs the other day I had skipped 
changing the archive command on data3 to write to archive2 instead of 
archive.

I have also today not been able to get it to attempt to restore the same 
WAL file twice.


>> If a base backup is in progress on a recovery database and that recovery
>> database is promoted to master, following the promotion (if you don't
>> restart the postmaster).  I see
>> select pg_stop_backup();
>> ERROR:  database system status mismatches between pg_start_backup() and
>> pg_stop_backup()
>>
>> If you restart the postmaster this goes away.  When the postmaster leaves
>> recovery mode I think it should abort an existing base backup so
>> pg_stop_backup() will say no backup in progress,
> I don't think that it's good idea to cancel the backup when promoting
> the standby.
> Because if we do so, we need to handle correctly the case where cancel of backup
> and pg_start_backup/pg_stop_backup are performed at the same time. We can
> simply do that by protecting those whole operations including pg_start_backup's
> checkpoint by the lwlock. But I don't think that it's worth
> introducing new lwlock
> only for that. And it's not good to take a lwlock through
> time-consuming checkpoint
> operation. Of course we can avoid such a lwlock, but which would require more
> complicated code.
>
>> or give an error message on
>> pg_stop_backup() saying that the base backup won't be usable.  The above
>> error doesn't really tell the user why there is a mismatch.
> What about the following error message?
>
> ERROR:  pg_stop_backup() was executed during normal processing though
> pg_start_backup() was executed during recovery
> HINT:  The database backup will not be usable.
>
> Or, you have better idea?

I like that error message better.  It tells me what is going on versus 
complaining about a state mismatch.
>> In my testing a few times I got into a situation where a standby server
>> coming from a recovery target took a while to finish recovery (this is on a
>> database with no activity).  Then when i tried promoting that server to
>> master I got
>>
>> LOG:  trigger file found: /tmp/3
>> FATAL:  terminating walreceiver process due to administrator command
>> LOG:  restored log file "000000010000000000000009" from archive
>> LOG:  restored log file "000000010000000000000009" from archive
>> LOG:  redo done at 0/90000E8
>> LOG:  restored log file "000000010000000000000009" from archive
>> PANIC:  unexpected pageaddr 0/6000000 in log file 0, segment 9, offset 0
>> LOG:  startup process (PID 1804) was terminated by signal 6: Aborted
>> LOG:  terminating any other active server processes
>>
>> It is *possible* I mixed up the order of a step somewhere since my testing
>> isn't script based. A standby server that 'looks' okay but can't actually be
>> promoted is dangerous.
> Looks the same problem as the above. Another weired point is that
> the same archived WAL file is restored two times before redo is done.
> I'm not sure why this happens... Could you provide the test case which
> reproduces this problem? Will diagnose.
>
> Regards,
>

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-09-28T01:58:19Z

On Wed, Sep 28, 2011 at 8:10 AM, Steve Singer <ssinger_pg@sympatico.ca> wrote:
> This is the test procedure I'm trying today, I wasn't able to reproduce the
> crash.  What I was doing the other day was similar but I can't speak to
> unintentional differences.

Thanks for the info! I tried your test case three times, but was not able to
reproduce the issue, too.

BTW, I created the shell script (attached) which runs your test scenario and
used it for the test.

If the issue will happen again, please feel free to share the information about
it. I will diagnose it.

> It looks like data3 is still pulling files with the recovery command after
> it sees the touch file (is this expected behaviour?)

Yes, that's expected behavior. After the trigger file is found, PostgreSQL
tries to replay all available WAL files in pg_xlog directory and archive one.
So, if there is unreplayed archived WAL file at that time, PostgreSQL tries
to pull it by calling the recovery command.

And, after WAL replay is done, PostgreSQL tries to re-fetch the last
replayed WAL record in order to identify the end of replay location. So,
if the last replayed record is included in the archived WAL file, it's pulled
by the recovery command.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-09T18:04:25Z

I created a patch corresponding FPW.
Fujii's patch (ver 9) is based.

 Manage own FPW in shared-memory (on master)
   * startup and walwriter process update it. startup initializes it
     after REDO. walwriter updates it when started or received SIGHUP.

 Insert WAL including a value of current FPW (on master)
   * In the the same timing as update, they insert WAL (is named
     XLOG_FPW_CHANGE). XLOG_FPW_CHANGE has a value of the changed FPW.
   * When it creates CHECKPOINT, it adds a value of current FPW to the
     CHECKPOINT WAL.

 Manage master's FPW in local-memory in startup (on standby)
   * It takes a value of the master's FPW by reading XLOG_FPW_CHANGE at
     REDO.

 Check when pg_start_backup/pg_stop_backup (on standby)
   * It checks to use these two value.
       * master's FPW at latest CHECKPOINT
       * current master's FPW by XLOG_FPW_CHANGE

Regards.


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Simon Riggs <simon@2ndquadrant.com> — 2011-10-09T18:56:11Z

2011/10/9 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:

>  Insert WAL including a value of current FPW (on master)
>   * In the the same timing as update, they insert WAL (is named
>     XLOG_FPW_CHANGE). XLOG_FPW_CHANGE has a value of the changed FPW.
>   * When it creates CHECKPOINT, it adds a value of current FPW to the
>     CHECKPOINT WAL.

I can't see a reason why we would use a new WAL record for this,
rather than modify the XLOG_PARAMETER_CHANGE record type which was
created for a very similar reason.
The code would be much simpler if we just extend
XLOG_PARAMETER_CHANGE, so please can we do that?

The log message "full_page_writes on master is set invalid more than
once during online backup" should read "at least once" rather than
"more than once".

lastFpwDisabledLSN needs to be initialized.

Is there a reason to add lastFpwDisabledLSN onto the Control file? If
we log parameters after every checkpoint then we'll know the values
when we startup. If we keep logging parameters this way we'll end up
with a very awkward and large control file. I would personally prefer
to avoid that, but that thought could go either way. Let's see if
anyone else thinks that also.

Looks good.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-11T10:15:54Z

> I can't see a reason why we would use a new WAL record for this,
> rather than modify the XLOG_PARAMETER_CHANGE record type which was
> created for a very similar reason.
> The code would be much simpler if we just extend
> XLOG_PARAMETER_CHANGE, so please can we do that?

Sure.

> The log message "full_page_writes on master is set invalid more than
> once during online backup" should read "at least once" rather than
> "more than once".

Yes.

> lastFpwDisabledLSN needs to be initialized.

I think it don't need because all values in XLogCtl is initialized 0.

> Is there a reason to add lastFpwDisabledLSN onto the Control file? If
> we log parameters after every checkpoint then we'll know the values
> when we startup. If we keep logging parameters this way we'll end up
> with a very awkward and large control file. I would personally prefer
> to avoid that, but that thought could go either way. Let's see if
> anyone else thinks that also.

Yes. I add to CreateCheckPoint().

Image:
  CreateCheckPoint()
  {
     if (!shutdown && XLogStandbyInfoActive())
     {
        LogStandbySnapshot()
        XLogReportParameters()
     }
   }

  XLogReportParameters()
  {
     if (fpw == 'off' || ... )
         XLOGINSERT()
  }

However, it'll write XLOG_PARAMETER_CHANGE every checkpoints when FPW is 'off'.
(It will increases the amount of WAL.)
Is it OK?


Regards.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-11T15:17:27Z

> > I can't see a reason why we would use a new WAL record for this,
> > rather than modify the XLOG_PARAMETER_CHANGE record type which was
> > created for a very similar reason.
> > The code would be much simpler if we just extend
> > XLOG_PARAMETER_CHANGE, so please can we do that?
> 
> Sure.
> 
> > The log message "full_page_writes on master is set invalid more than
> > once during online backup" should read "at least once" rather than
> > "more than once".
> 
> Yes.
> 
> > lastFpwDisabledLSN needs to be initialized.
> 
> I think it don't need because all values in XLogCtl is initialized 0.
> 
> > Is there a reason to add lastFpwDisabledLSN onto the Control file? If
> > we log parameters after every checkpoint then we'll know the values
> > when we startup. If we keep logging parameters this way we'll end up
> > with a very awkward and large control file. I would personally prefer
> > to avoid that, but that thought could go either way. Let's see if
> > anyone else thinks that also.
> 
> Yes. I add to CreateCheckPoint().
> 
> Image:
>   CreateCheckPoint()
>   {
>      if (!shutdown && XLogStandbyInfoActive())
>      {
>         LogStandbySnapshot()
>         XLogReportParameters()
>      }
>    }
> 
>   XLogReportParameters()
>   {
>      if (fpw == 'off' || ... )
>          XLOGINSERT()
>   }
> 
> However, it'll write XLOG_PARAMETER_CHANGE every checkpoints when FPW is 'off'.
> (It will increases the amount of WAL.)
> Is it OK?

Done.

Updated patch attached.

Regards.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Steve Singer <ssinger_pg@sympatico.ca> — 2011-10-11T21:44:34Z

On 11-10-11 11:17 AM, Jun Ishiduka wrote:
> Done.
>
> Updated patch attached.
>

I have taken Jun's latest patch and applied it on top of Fujii's most
recent patch. I did some testing with the result but nothing theory
enough to stumble on any race conditions.

Some testing notes
------------------------------
select pg_start_backup('x');
ERROR: full_page_writes on master is set invalid at least once since
latest checkpoint

I think this error should be rewritten as
ERROR: full_page_writes on master has been off at some point since
latest checkpoint

We should be using 'off' instead of 'invalid' since that is what is what
the user sets it to.

I switched full_page_writes=on , on the master

did a pg_start_backup() on the slave1.

Then I switched full_page_writes=off on the master, did a reload +
checkpoint.

I was able to then do my backup of slave1, copy the control file, and
pg_stop_backup().
When I did the test slave2 started okay, but is this safe? Do we need a
warning from pg_stop_backup() that is printed if it is detected that
full_page_writes was turned off on the master during the backup period?

Code Notes
---------------------
*** 6865,6870 ****
--- 6871,6886 ----
/* Pre-scan prepared transactions to find out the range of XIDs present */
oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);

+ /*
+ * The startup updates FPW in shaerd-memory after REDO. However, it must
+ * perform before writing the WAL of the CHECKPOINT. The reason is that
+ * it uses a value of fpw in shared-memory when it writes a WAL of its
+ * CHECKPOTNT.
+ */

Minor typo above at 'CHECKPOTNT'

If my concern about full page writes being switched to off in the middle
of a backup is unfounded then I think this patch is ready for a
committer. They can clean the two editorial changes when they apply the
patches.

If do_pg_stop_backup is going to need some logic to recheck the full
page write status then an updated patch is required.

> Regards.
>
> --------------------------------------------
> Jun Ishizuka
> NTT Software Corporation
> TEL：045-317-7018
> E-Mail: ishizuka.jun@po.ntts.co.jp
> --------------------------------------------
>
>
>

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-12T02:43:59Z

> Some testing notes
> ------------------------------
> select pg_start_backup('x');
> ERROR: full_page_writes on master is set invalid at least once since
> latest checkpoint
> 
> I think this error should be rewritten as
> ERROR: full_page_writes on master has been off at some point since
> latest checkpoint
> 
> We should be using 'off' instead of 'invalid' since that is what is what
> the user sets it to.

Sure.


> I switched full_page_writes=on , on the master
> 
> did a pg_start_backup() on the slave1.
> 
> Then I switched full_page_writes=off on the master, did a reload +
> checkpoint.
> 
> I was able to then do my backup of slave1, copy the control file, and
> pg_stop_backup().
>
> When I did the test slave2 started okay, but is this safe? Do we need a
> warning from pg_stop_backup() that is printed if it is detected that
> full_page_writes was turned off on the master during the backup period?

I also reproduced.

pg_stop_backup() fails in most cases.
However, it succeeds if both the following cases are true.
  * checkpoint is done before walwriter recieves SIGHUP.
  * slave1 has not received the WAL of 'off' by SIGHUP yet.



> Minor typo above at 'CHECKPOTNT'

Yes.


> If my concern about full page writes being switched to off in the middle
> of a backup is unfounded then I think this patch is ready for a
> committer. They can clean the two editorial changes when they apply the
> patches.

Yes. I'll clean since these comments fix.


> If do_pg_stop_backup is going to need some logic to recheck the full
> page write status then an updated patch is required.

It already contains.


Regards.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-12T07:27:02Z

> > Some testing notes
> > ------------------------------
> > select pg_start_backup('x');
> > ERROR: full_page_writes on master is set invalid at least once since
> > latest checkpoint
> > 
> > I think this error should be rewritten as
> > ERROR: full_page_writes on master has been off at some point since
> > latest checkpoint
> > 
> > We should be using 'off' instead of 'invalid' since that is what is what
> > the user sets it to.
> 
> Sure.


> > Minor typo above at 'CHECKPOTNT'
> 
> Yes.


I updated to patch corresponded above-comments.

Regards.


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-12T07:53:52Z

2011/10/12 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
> > ERROR: full_page_writes on master is set invalid at least once since
> > latest checkpoint
> >
> > I think this error should be rewritten as
> > ERROR: full_page_writes on master has been off at some point since
> > latest checkpoint
> >
> > We should be using 'off' instead of 'invalid' since that is what is what
> > the user sets it to.
>
> Sure.

What about the following message? It sounds more precise to me.

ERROR: WAL generated with full_page_writes=off was replayed since last
restartpoint

> I updated to patch corresponded above-comments.

Thanks for updating the patch! Here are the comments:

 	 * don't yet have the insert lock, forcePageWrites could change under us,
 	 * but we'll recheck it once we have the lock.
 	 */
-	doPageWrites = fullPageWrites || Insert->forcePageWrites;
+	doPageWrites = Insert->fullPageWrites || Insert->forcePageWrites;

The source comment needs to be modified.

 	 * just turned off, we could recompute the record without full pages, but
 	 * we choose not to bother.)
 	 */
-	if (Insert->forcePageWrites && !doPageWrites)
+	if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites)

Same as above.

+	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
+	XLogCtl->Insert.fullPageWrites = fullPageWrites;
+	LWLockRelease(WALInsertLock);

I don't think WALInsertLock needs to be hold here because there is no
concurrently running process which can access Insert.fullPageWrites.
For example, Insert->currpos and Insert->LogwrtResult are also changed
without the lock there.

The source comment of XLogReportParameters() needs to be modified.

XLogReportParameters() should skip writing WAL if full_page_writes has not been
changed by SIGHUP.

XLogReportParameters() should skip updating pg_control if any parameter related
to hot standby has not been changed.

+	if (!fpw_manager)
+	{
+		LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
+		fpw = XLogCtl->Insert.fullPageWrites;
+		LWLockRelease(WALInsertLock);

It's safe to take WALInsertLock with shared mode here.

In checkpoint, XLogReportParameters() is called only when wal_level is
hot_standby.
OTOH, in walwriter, it's always called even when wal_level is not hot_standby.
Can't we skip calling XLogReportParameters() whenever wal_level is not
hot_standby?

In do_pg_start_backup() and do_pg_stop_backup(), the spinlock must be held to
see XLogCtl->lastFpwDisabledLSN.

+	/* check whether the master's FPW is 'off' since pg_start_backup. */
+	if (recovery_in_progress && XLByteLE(startpoint, XLogCtl->lastFpwDisabledLSN))
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			  errmsg("full_page_writes on master has been off at some point
during online backup")));

What about changing the error message to:
ERROR: WAL generated with full_page_writes=off was replayed during online backup

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-13T04:31:27Z

> > > ERROR: full_page_writes on master is set invalid at least once since
> > > latest checkpoint
> > >
> > > I think this error should be rewritten as
> > > ERROR: full_page_writes on master has been off at some point since
> > > latest checkpoint
> > >
> > > We should be using 'off' instead of 'invalid' since that is what is what
> > > the user sets it to.
> >
> > Sure.
> 
> What about the following message? It sounds more precise to me.
> 
> ERROR: WAL generated with full_page_writes=off was replayed since last
> restartpoint

Okay, I changes the patch to this messages.
If someone says there is a idea better than it, I will consider again.


> > I updated to patch corresponded above-comments.
> 
> Thanks for updating the patch! Here are the comments:
> 
>  	 * don't yet have the insert lock, forcePageWrites could change under us,
>  	 * but we'll recheck it once we have the lock.
>  	 */
> -	doPageWrites = fullPageWrites || Insert->forcePageWrites;
> +	doPageWrites = Insert->fullPageWrites || Insert->forcePageWrites;
> 
> The source comment needs to be modified.
>
>  	 * just turned off, we could recompute the record without full pages, but
>  	 * we choose not to bother.)
>  	 */
> -	if (Insert->forcePageWrites && !doPageWrites)
> +	if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites)
> 
> Same as above.

Sure.


> XLogReportParameters() should skip writing WAL if full_page_writes has not been
> changed by SIGHUP.
> 
> XLogReportParameters() should skip updating pg_control if any parameter related
> to hot standby has not been changed.

YES.


> In checkpoint, XLogReportParameters() is called only when wal_level is
> hot_standby.
> OTOH, in walwriter, it's always called even when wal_level is not hot_standby.
> Can't we skip calling XLogReportParameters() whenever wal_level is not
> hot_standby?

Yes, It is possible.


> In do_pg_start_backup() and do_pg_stop_backup(), the spinlock must be held to
> see XLogCtl->lastFpwDisabledLSN.

Yes.


> What about changing the error message to:
> ERROR: WAL generated with full_page_writes=off was replayed during online backup

Okay, too.


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-13T05:01:38Z

Sorry.
I was not previously able to answer fujii's all comments.
This is the remaining answers.


> +	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
> +	XLogCtl->Insert.fullPageWrites = fullPageWrites;
> +	LWLockRelease(WALInsertLock);
> 
> I don't think WALInsertLock needs to be hold here because there is no
> concurrently running process which can access Insert.fullPageWrites.
> For example, Insert->currpos and Insert->LogwrtResult are also changed
> without the lock there.
> 

Yes. 

> The source comment of XLogReportParameters() needs to be modified.

Yes, too.


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-13T09:39:09Z

> 
> > > > ERROR: full_page_writes on master is set invalid at least once since
> > > > latest checkpoint
> > > >
> > > > I think this error should be rewritten as
> > > > ERROR: full_page_writes on master has been off at some point since
> > > > latest checkpoint
> > > >
> > > > We should be using 'off' instead of 'invalid' since that is what is what
> > > > the user sets it to.
> > >
> > > Sure.
> > 
> > What about the following message? It sounds more precise to me.
> > 
> > ERROR: WAL generated with full_page_writes=off was replayed since last
> > restartpoint
> 
> Okay, I changes the patch to this messages.
> If someone says there is a idea better than it, I will consider again.
> 
> 
> > > I updated to patch corresponded above-comments.
> > 
> > Thanks for updating the patch! Here are the comments:
> > 
> >  	 * don't yet have the insert lock, forcePageWrites could change under us,
> >  	 * but we'll recheck it once we have the lock.
> >  	 */
> > -	doPageWrites = fullPageWrites || Insert->forcePageWrites;
> > +	doPageWrites = Insert->fullPageWrites || Insert->forcePageWrites;
> > 
> > The source comment needs to be modified.
> >
> >  	 * just turned off, we could recompute the record without full pages, but
> >  	 * we choose not to bother.)
> >  	 */
> > -	if (Insert->forcePageWrites && !doPageWrites)
> > +	if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites)
> > 
> > Same as above.
> 
> Sure.
> 
> 
> > XLogReportParameters() should skip writing WAL if full_page_writes has not been
> > changed by SIGHUP.
> > 
> > XLogReportParameters() should skip updating pg_control if any parameter related
> > to hot standby has not been changed.
> 
> YES.
> 
> 
> > In checkpoint, XLogReportParameters() is called only when wal_level is
> > hot_standby.
> > OTOH, in walwriter, it's always called even when wal_level is not hot_standby.
> > Can't we skip calling XLogReportParameters() whenever wal_level is not
> > hot_standby?
> 
> Yes, It is possible.
> 
> 
> > In do_pg_start_backup() and do_pg_stop_backup(), the spinlock must be held to
> > see XLogCtl->lastFpwDisabledLSN.
> 
> Yes.
> 
> 
> > What about changing the error message to:
> > ERROR: WAL generated with full_page_writes=off was replayed during online backup
> 
> Okay, too.

> Sorry.
> I was not previously able to answer fujii's all comments.
> This is the remaining answers.
> 
> 
> > +	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
> > +	XLogCtl->Insert.fullPageWrites = fullPageWrites;
> > +	LWLockRelease(WALInsertLock);
> > 
> > I don't think WALInsertLock needs to be hold here because there is no
> > concurrently running process which can access Insert.fullPageWrites.
> > For example, Insert->currpos and Insert->LogwrtResult are also changed
> > without the lock there.
> > 
> 
> Yes. 
> 
> > The source comment of XLogReportParameters() needs to be modified.
> 
> Yes, too.

Done.
I updated to patch corresponded above-comments.

Regards.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-13T13:44:50Z

On Mon, Oct 10, 2011 at 3:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> 2011/10/9 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>
>>  Insert WAL including a value of current FPW (on master)
>>   * In the the same timing as update, they insert WAL (is named
>>     XLOG_FPW_CHANGE). XLOG_FPW_CHANGE has a value of the changed FPW.
>>   * When it creates CHECKPOINT, it adds a value of current FPW to the
>>     CHECKPOINT WAL.
>
> I can't see a reason why we would use a new WAL record for this,
> rather than modify the XLOG_PARAMETER_CHANGE record type which was
> created for a very similar reason.
> The code would be much simpler if we just extend
> XLOG_PARAMETER_CHANGE, so please can we do that?

After reading Ishiduka-san's patch, I'm thinking the opposite because
(1) Whenever full_page_writes must be WAL-logged, there is no need
to WAL-log the HS parameters. The opposite is also true. (2) How
full_page_writes record should be replayed is quite different from
how HS parameters record is.

So ISTM that the code would be simpler if we introduce new WAL
record for full_page_writes. Thought?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-14T12:28:29Z

2011/10/13 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
> I updated to patch corresponded above-comments.

Thanks for updating the patch!

As I suggested in the reply to Simon, I think that the change of FPW
should be WAL-logged separately from that of HS parameters. ISTM
packing them in one WAL record makes XLogReportParameters()
quite confusing. Thought?

 	if (!shutdown && XLogStandbyInfoActive())
+	{
 		LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
+		XLogReportParameters(REPORT_ON_BACKEND);
+	}

Why doesn't the change of FPW need to be WAL-logged when
shutdown checkpoint is performed? It's helpful to add the comment
explaining why.


Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-15T01:35:45Z

> As I suggested in the reply to Simon, I think that the change of FPW
> should be WAL-logged separately from that of HS parameters. ISTM
> packing them in one WAL record makes XLogReportParameters()
> quite confusing. Thought?

I want to confirm the reply of Simon. I think we cannot decide how this
code should be if there is not the reply.


>  	if (!shutdown && XLogStandbyInfoActive())
> +	{
>  		LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
> +		XLogReportParameters(REPORT_ON_BACKEND);
> +	}
> 
> Why doesn't the change of FPW need to be WAL-logged when
> shutdown checkpoint is performed? It's helpful to add the comment
> explaining why.

Sure. I update the patch soon.



--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-15T02:12:36Z

> >  	if (!shutdown && XLogStandbyInfoActive())
> > +	{
> >  		LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
> > +		XLogReportParameters(REPORT_ON_BACKEND);
> > +	}
> > 
> > Why doesn't the change of FPW need to be WAL-logged when
> > shutdown checkpoint is performed? It's helpful to add the comment
> > explaining why.
> 
> Sure. I update the patch soon.

Done.
Please check this.

Regards.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-17T07:16:05Z

2011/10/15 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>
>> >     if (!shutdown && XLogStandbyInfoActive())
>> > +   {
>> >             LogStandbySnapshot(&checkPoint.oldestActiveXid, &checkPoint.nextXid);
>> > +           XLogReportParameters(REPORT_ON_BACKEND);
>> > +   }
>> >
>> > Why doesn't the change of FPW need to be WAL-logged when
>> > shutdown checkpoint is performed? It's helpful to add the comment
>> > explaining why.
>>
>> Sure. I update the patch soon.
>
> Done.

+ 		/*
+ 		 * The backend writes WAL of FPW at checkpoint. However, The backend do
+ 		 * not need to write WAL of FPW at checkpoint shutdown because it
+ 		 * performs when startup finishes.
+ 		 */
+ 		XLogReportParameters(REPORT_ON_BACKEND);

I'm still unclear why that WAL doesn't need to be written at shutdown
checkpoint.
Anyway, the first sentence in the above comments is not right. Not a backend but
a bgwriter writes that WAL at checkpoint.

The second also seems not to be right. It implies that a shutdown checkpoint is
performed only at end of startup. But it may be done when smart or fast shutdown
is requested.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-18T06:25:46Z

> + 		/*
> + 		 * The backend writes WAL of FPW at checkpoint. However, The backend do
> + 		 * not need to write WAL of FPW at checkpoint shutdown because it
> + 		 * performs when startup finishes.
> + 		 */
> + 		XLogReportParameters(REPORT_ON_BACKEND);
> 
> I'm still unclear why that WAL doesn't need to be written at shutdown
> checkpoint.
> Anyway, the first sentence in the above comments is not right. Not a backend but
> a bgwriter writes that WAL at checkpoint.
> 
> The second also seems not to be right. It implies that a shutdown checkpoint is
> performed only at end of startup. But it may be done when smart or fast shutdown
> is requested.


Okay. 
I change to the following messages.

/* 
 * The bgwriter writes WAL of FPW at checkpoint. But does not at shutdown.
 * Because XLogReportParameters() is always called at the end of startup
 * process, it does not need to be called at shutdown.
 */


In addition, I change macro name.

REPORT_ON_BACKEND -> REPORT_ON_BGWRITER


Regards.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-19T02:47:08Z

> > + 		/*
> > + 		 * The backend writes WAL of FPW at checkpoint. However, The backend do
> > + 		 * not need to write WAL of FPW at checkpoint shutdown because it
> > + 		 * performs when startup finishes.
> > + 		 */
> > + 		XLogReportParameters(REPORT_ON_BACKEND);
> > 
> > I'm still unclear why that WAL doesn't need to be written at shutdown
> > checkpoint.
> > Anyway, the first sentence in the above comments is not right. Not a backend but
> > a bgwriter writes that WAL at checkpoint.
> > 
> > The second also seems not to be right. It implies that a shutdown checkpoint is
> > performed only at end of startup. But it may be done when smart or fast shutdown
> > is requested.
> 
> 
> Okay. 
> I change to the following messages.
> 
> /* 
>  * The bgwriter writes WAL of FPW at checkpoint. But does not at shutdown.
>  * Because XLogReportParameters() is always called at the end of startup
>  * process, it does not need to be called at shutdown.
>  */
> 
> 
> In addition, I change macro name.
> 
> REPORT_ON_BACKEND -> REPORT_ON_BGWRITER

I have updated as above-comment.
Please check this.

Regards.

--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-19T07:37:32Z

> As I suggested in the reply to Simon, I think that the change of FPW
> should be WAL-logged separately from that of HS parameters. ISTM
> packing them in one WAL record makes XLogReportParameters()
> quite confusing. Thought?

I updated a patch for what you have suggested (that the change of FPW
should be WAL-logged separately from that of HS parameters).

I want to base on this patch if there are no other opinions.

Regards.


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-24T12:29:20Z

2011/10/19 Jun Ishiduka <ishizuka.jun@po.ntts.co.jp>:
>> As I suggested in the reply to Simon, I think that the change of FPW
>> should be WAL-logged separately from that of HS parameters. ISTM
>> packing them in one WAL record makes XLogReportParameters()
>> quite confusing. Thought?
>
> I updated a patch for what you have suggested (that the change of FPW
> should be WAL-logged separately from that of HS parameters).
>
> I want to base on this patch if there are no other opinions.

Thanks for updating the patch!

Attached is the updated version of the patch. I merged your patch into
standby_online_backup_09_fujii.patch, refactored the code, fixed some
bugs, added lots of source code comments, but didn't change the basic
design that you proposed.

In your patch, FPW is always WAL-logged at startup even when FPW has
not been changed since last shutdown. I don't think that's required.
I changed the recovery code so that it keeps track of last FPW indicated
by WAL record. Then, at end of startup, if that FPW is equal to FPW
specified in postgresql.conf (which means that FPW has not been changed
since last shutdown or crash), WAL-logging of FPW is skipped. This change
prevents unnecessary WAL-logging. Thought?

Is the patch well-formed enough to mark as ready-for-committer? It would
be very helpful if you review the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-24T15:24:28Z

On 24.10.2011 15:29, Fujii Masao wrote:
> +    <listitem>
> +     <para>
> +      Copy the pg_control file from the cluster directory to the global
> +      sub-directory of the backup. For example:
> + <programlisting>
> + cp $PGDATA/global/pg_control /mnt/server/backupdir/global
> + </programlisting>
> +     </para>
> +    </listitem>

Why is this step required? The control file is overwritten by 
information from the backup_label anyway, no?

> +    <listitem>
> +     <para>
> +      Again connect to the database as a superuser, and execute
> +      <function>pg_stop_backup</>. This terminates the backup mode, but does not
> +      perform a switch to the next WAL segment, create a backup history file and
> +      wait for all required WAL segments to be archived,
> +      unlike that during normal processing.
> +     </para>
> +    </listitem>

How do you ensure that all the required WAL segments have been archived, 
then?

> +   </orderedlist>
> +    </para>
> +
> +    <para>
> +     You cannot use the <application>pg_basebackup</> tool to take the backup
> +     from the standby.
> +    </para>

Why not? We have cascading replication now.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Online base backup from the hot-standby

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-24T15:33:40Z

On 24.10.2011 15:29, Fujii Masao wrote:
> In your patch, FPW is always WAL-logged at startup even when FPW has
> not been changed since last shutdown. I don't think that's required.
> I changed the recovery code so that it keeps track of last FPW indicated
> by WAL record. Then, at end of startup, if that FPW is equal to FPW
> specified in postgresql.conf (which means that FPW has not been changed
> since last shutdown or crash), WAL-logging of FPW is skipped. This change
> prevents unnecessary WAL-logging. Thought?

One problem with this whole FPW-tracking is that pg_lesslog makes it 
fail. I'm not sure what we need to do about that - maybe just add a 
warning to the docs. But it leaves a bit bad feeling in my mouth. 
Usually we try to make features work orthogonally, without dependencies 
to other settings. Now this feature requires that full_page_writes is 
turned on in the master, and also that you don't use pg_lesslog to 
compress the WAL segments or your base backup might be corrupt. The 
procedure to take a backup from the standby seems more complicated than 
taking it on the master - there are more steps to follow.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Online base backup from the hot-standby

Robert Haas <robertmhaas@gmail.com> — 2011-10-24T15:38:18Z

On Mon, Oct 24, 2011 at 11:33 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 24.10.2011 15:29, Fujii Masao wrote:
>>
>> In your patch, FPW is always WAL-logged at startup even when FPW has
>> not been changed since last shutdown. I don't think that's required.
>> I changed the recovery code so that it keeps track of last FPW indicated
>> by WAL record. Then, at end of startup, if that FPW is equal to FPW
>> specified in postgresql.conf (which means that FPW has not been changed
>> since last shutdown or crash), WAL-logging of FPW is skipped. This change
>> prevents unnecessary WAL-logging. Thought?
>
> One problem with this whole FPW-tracking is that pg_lesslog makes it fail.
> I'm not sure what we need to do about that - maybe just add a warning to the
> docs. But it leaves a bit bad feeling in my mouth. Usually we try to make
> features work orthogonally, without dependencies to other settings. Now this
> feature requires that full_page_writes is turned on in the master, and also
> that you don't use pg_lesslog to compress the WAL segments or your base
> backup might be corrupt. The procedure to take a backup from the standby
> seems more complicated than taking it on the master - there are more steps
> to follow.

Doing it on the master isn't as easy as I'd like it to be, either.

But it's not really clear how to make it simpler.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-25T05:12:20Z

Thanks for the review!

On Tue, Oct 25, 2011 at 12:24 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 24.10.2011 15:29, Fujii Masao wrote:
>>
>> +    <listitem>
>> +     <para>
>> +      Copy the pg_control file from the cluster directory to the global
>> +      sub-directory of the backup. For example:
>> + <programlisting>
>> + cp $PGDATA/global/pg_control /mnt/server/backupdir/global
>> + </programlisting>
>> +     </para>
>> +    </listitem>
>
> Why is this step required? The control file is overwritten by information
> from the backup_label anyway, no?

Yes, when recovery starts, the control file is overwritten. But before that,
we retrieve the minimum recovery point from the control file. Then it's used
as the backup end location.

During recovery, pg_stop_backup() cannot write an end-of-backup record.
So, in standby-only backup, other way to retrieve the backup end location
(instead of an end-of-backup record) is required. Ishiduka-san used the
control file as that, according to your suggestion ;)
http://archives.postgresql.org/pgsql-hackers/2011-05/msg01405.php

>> +    <listitem>
>> +     <para>
>> +      Again connect to the database as a superuser, and execute
>> +      <function>pg_stop_backup</>. This terminates the backup mode, but
>> does not
>> +      perform a switch to the next WAL segment, create a backup history
>> file and
>> +      wait for all required WAL segments to be archived,
>> +      unlike that during normal processing.
>> +     </para>
>> +    </listitem>
>
> How do you ensure that all the required WAL segments have been archived,
> then?

The patch doesn't provide any capability to ensure that, IOW assumes that's
a user responsibility. If a user wants to ensure that, he/she needs to calculate
the backup start and end WAL files from the result of pg_start_backup()
and pg_stop_backup() respectively, and needs to wait until those files have
appeared in the archive. Also if the required WAL file has not been archived
yet, a user might need to execute pg_switch_xlog() in the master.

If we change pg_stop_backup() so that, even during recovery, it waits until
all required WAL files have been archived, we would need to WAL-log
the completion of WAL archiving in the master. This enables the standby to
check whether specified WAL files have been archived. We should change
the patch in this way? But even if we change, you still might need to execute
pg_switch_xlog() in the master additionally, and pg_stop_backup() might keep
waiting infinitely if the master is not in progress.

>> +   </orderedlist>
>> +    </para>
>> +
>> +    <para>
>> +     You cannot use the <application>pg_basebackup</> tool to take the
>> backup
>> +     from the standby.
>> +    </para>
>
> Why not? We have cascading replication now.

Because no one has implemented that feature.

Yeah, we have cascading replication, but without adopting the standby-only
backup patch, pg_basebackup cannot execute do_pg_start_backup() and
do_pg_stop_backup() during recovery. So we can think that the patch that
Ishiduka-san proposed is the first step to extend pg_basebackup so that it
can take backup from the standby.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-25T05:37:48Z

On Tue, Oct 25, 2011 at 12:33 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> One problem with this whole FPW-tracking is that pg_lesslog makes it fail.
> I'm not sure what we need to do about that - maybe just add a warning to the
> docs. But it leaves a bit bad feeling in my mouth. Usually we try to make
> features work orthogonally, without dependencies to other settings. Now this
> feature requires that full_page_writes is turned on in the master, and also
> that you don't use pg_lesslog to compress the WAL segments or your base
> backup might be corrupt.

Right, pg_lesslog users cannot use the documented procedure. They need to
do more complex one;

1. Execute pg_start_backup() in the master, and save its return value.
2. Wait until the backup starting checkpoint record has been replayed
    in the standby. You can do this by comparing the return value of
    pg_start_backup() with pg_last_replay_location().
3. Do the documented standby-only backup procedure.
4. Execute pg_stop_backup() in the master.

This is complicated, but I'm not sure how we can simplify it. Anyway we can
document this procedure for pg_lesslog users. We should?

> The procedure to take a backup from the standby
> seems more complicated than taking it on the master - there are more steps
> to follow.

Extending pg_basebackup so that it can take a backup from the standby would
make the procedure simple to a certain extent, I think. Though a user
still needs
to enable FPW in the master and must not use pg_lesslog.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-25T06:44:30Z

On 25.10.2011 08:12, Fujii Masao wrote:
> On Tue, Oct 25, 2011 at 12:24 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> On 24.10.2011 15:29, Fujii Masao wrote:
>>>
>>> +<listitem>
>>> +<para>
>>> +      Copy the pg_control file from the cluster directory to the global
>>> +      sub-directory of the backup. For example:
>>> +<programlisting>
>>> + cp $PGDATA/global/pg_control /mnt/server/backupdir/global
>>> +</programlisting>
>>> +</para>
>>> +</listitem>
>>
>> Why is this step required? The control file is overwritten by information
>> from the backup_label anyway, no?
>
> Yes, when recovery starts, the control file is overwritten. But before that,
> we retrieve the minimum recovery point from the control file. Then it's used
> as the backup end location.
>
> During recovery, pg_stop_backup() cannot write an end-of-backup record.
> So, in standby-only backup, other way to retrieve the backup end location
> (instead of an end-of-backup record) is required. Ishiduka-san used the
> control file as that, according to your suggestion ;)
> http://archives.postgresql.org/pgsql-hackers/2011-05/msg01405.php

Oh :-)

>>> +<para>
>>> +      Again connect to the database as a superuser, and execute
>>> +<function>pg_stop_backup</>. This terminates the backup mode, but
>>> does not
>>> +      perform a switch to the next WAL segment, create a backup history
>>> file and
>>> +      wait for all required WAL segments to be archived,
>>> +      unlike that during normal processing.
>>> +</para>
>>> +</listitem>
>>
>> How do you ensure that all the required WAL segments have been archived,
>> then?
>
> The patch doesn't provide any capability to ensure that, IOW assumes that's
> a user responsibility. If a user wants to ensure that, he/she needs to calculate
> the backup start and end WAL files from the result of pg_start_backup()
> and pg_stop_backup() respectively, and needs to wait until those files have
> appeared in the archive. Also if the required WAL file has not been archived
> yet, a user might need to execute pg_switch_xlog() in the master.

Frankly, I think this whole thing is too fragile. The procedure is 
superficially similar to what you do on master: run pg_start_backup(), 
rsync data directory, run pg_stop_backup(), but is actually subtly 
different and more complicated. If you don't know that, and don't follow 
the full procedure, you get a corrupt backup. And the backup might look 
ok, and might even sometimes work, which means that you won't notice in 
quick testing. That's a *huge* foot-gun.

I think we need to step back and find a way to make this:
a) less complicated, or at least
b) more robust, so that if you don't follow the procedure, you get an error.

With pg_basebackup, we have a fighting chance of getting this right, 
because we have more control over how the backup is made. For example, 
we can co-operate with the buffer manager to avoid torn-pages, 
eliminating the need for full_page_writes=on, and we can include a 
control file with the correct end-of-backup location automatically, 
without requiring user intervention. pg_basebackup is less flexible than 
the pg_start/stop_backup method, and unfortunately you're more likely to 
need the flexibility in a more complicated setup with a hot standby 
server and all, but making the generic pg_start/stop_backup method work 
seems infeasible at the moment.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-25T08:50:10Z

On Tue, Oct 25, 2011 at 3:44 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>>>> +<para>
>>>> +      Again connect to the database as a superuser, and execute
>>>> +<function>pg_stop_backup</>. This terminates the backup mode, but
>>>> does not
>>>> +      perform a switch to the next WAL segment, create a backup history
>>>> file and
>>>> +      wait for all required WAL segments to be archived,
>>>> +      unlike that during normal processing.
>>>> +</para>
>>>> +</listitem>
>>>
>>> How do you ensure that all the required WAL segments have been archived,
>>> then?
>>
>> The patch doesn't provide any capability to ensure that, IOW assumes
>> that's
>> a user responsibility. If a user wants to ensure that, he/she needs to
>> calculate
>> the backup start and end WAL files from the result of pg_start_backup()
>> and pg_stop_backup() respectively, and needs to wait until those files
>> have
>> appeared in the archive. Also if the required WAL file has not been
>> archived
>> yet, a user might need to execute pg_switch_xlog() in the master.
>
> Frankly, I think this whole thing is too fragile. The procedure is
> superficially similar to what you do on master: run pg_start_backup(), rsync
> data directory, run pg_stop_backup(), but is actually subtly different and
> more complicated. If you don't know that, and don't follow the full
> procedure, you get a corrupt backup. And the backup might look ok, and might
> even sometimes work, which means that you won't notice in quick testing.
> That's a *huge* foot-gun.
>
> I think we need to step back and find a way to make this:
> a) less complicated, or at least
> b) more robust, so that if you don't follow the procedure, you get an error.

One idea to make the way more robust is to change the PostgreSQL so that
it writes the buffer page to a temporary space instead of database file
during a backup. This means that there is no torn-pages in the database files
of the backup. After backup, the data blocks are written back to the database
files over time. When recovery starts from that backup(i.e., backup_label is
found), it clears the temporary space in the backup first and continues recovery
by using the database files which contain no torn-pages. OTOH,
in crash recovery (i.e., backup_label is not found), recovery is performed by
using both database files and temporary space. This whole approach would
make the standby-only backup available even if FPW is disabled in the master
and you don't care about the order to backup the control file.

But this idea looks overkill. It seems very complicated to implement that, and
likely to invite other bugs. I don't have any other good and simple
idea for now.

> With pg_basebackup, we have a fighting chance of getting this right, because
> we have more control over how the backup is made. For example, we can
> co-operate with the buffer manager to avoid torn-pages, eliminating the need
> for full_page_writes=on, and we can include a control file with the correct
> end-of-backup location automatically, without requiring user intervention.
> pg_basebackup is less flexible than the pg_start/stop_backup method, and
> unfortunately you're more likely to need the flexibility in a more
> complicated setup with a hot standby server and all, but making the generic
> pg_start/stop_backup method work seems infeasible at the moment.

Yes, so we should give up supporting manual procedure? And extend
pg_basebackup for the standby-only backup, first? I can live with this.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Magnus Hagander <magnus@hagander.net> — 2011-10-25T10:19:33Z

On Tue, Oct 25, 2011 at 10:50, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Oct 25, 2011 at 3:44 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>>>>> +<para>
>>>>> +      Again connect to the database as a superuser, and execute
>>>>> +<function>pg_stop_backup</>. This terminates the backup mode, but
>>>>> does not
>>>>> +      perform a switch to the next WAL segment, create a backup history
>>>>> file and
>>>>> +      wait for all required WAL segments to be archived,
>>>>> +      unlike that during normal processing.
>>>>> +</para>
>>>>> +</listitem>
>>>>
>>>> How do you ensure that all the required WAL segments have been archived,
>>>> then?
>>>
>>> The patch doesn't provide any capability to ensure that, IOW assumes
>>> that's
>>> a user responsibility. If a user wants to ensure that, he/she needs to
>>> calculate
>>> the backup start and end WAL files from the result of pg_start_backup()
>>> and pg_stop_backup() respectively, and needs to wait until those files
>>> have
>>> appeared in the archive. Also if the required WAL file has not been
>>> archived
>>> yet, a user might need to execute pg_switch_xlog() in the master.
>>
>> Frankly, I think this whole thing is too fragile. The procedure is
>> superficially similar to what you do on master: run pg_start_backup(), rsync
>> data directory, run pg_stop_backup(), but is actually subtly different and
>> more complicated. If you don't know that, and don't follow the full
>> procedure, you get a corrupt backup. And the backup might look ok, and might
>> even sometimes work, which means that you won't notice in quick testing.
>> That's a *huge* foot-gun.
>>
>> I think we need to step back and find a way to make this:
>> a) less complicated, or at least
>> b) more robust, so that if you don't follow the procedure, you get an error.
>
> One idea to make the way more robust is to change the PostgreSQL so that
> it writes the buffer page to a temporary space instead of database file
> during a backup. This means that there is no torn-pages in the database files
> of the backup. After backup, the data blocks are written back to the database
> files over time. When recovery starts from that backup(i.e., backup_label is
> found), it clears the temporary space in the backup first and continues recovery
> by using the database files which contain no torn-pages. OTOH,
> in crash recovery (i.e., backup_label is not found), recovery is performed by
> using both database files and temporary space. This whole approach would
> make the standby-only backup available even if FPW is disabled in the master
> and you don't care about the order to backup the control file.
>
> But this idea looks overkill. It seems very complicated to implement that, and
> likely to invite other bugs. I don't have any other good and simple
> idea for now.
>
>> With pg_basebackup, we have a fighting chance of getting this right, because
>> we have more control over how the backup is made. For example, we can
>> co-operate with the buffer manager to avoid torn-pages, eliminating the need
>> for full_page_writes=on, and we can include a control file with the correct
>> end-of-backup location automatically, without requiring user intervention.
>> pg_basebackup is less flexible than the pg_start/stop_backup method, and
>> unfortunately you're more likely to need the flexibility in a more
>> complicated setup with a hot standby server and all, but making the generic
>> pg_start/stop_backup method work seems infeasible at the moment.
>
> Yes, so we should give up supporting manual procedure? And extend
> pg_basebackup for the standby-only backup, first? I can live with this.

I don't think we should necessarily give up completely. But doing a
pg_basebackup way *first* seems reasonable - because it's going to be
the easiest one to "get right", given that we have more control there.
Doesn't mean we shouldn't extend it in the future...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-25T11:54:58Z

On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I don't think we should necessarily give up completely. But doing a
> pg_basebackup way *first* seems reasonable - because it's going to be
> the easiest one to "get right", given that we have more control there.
> Doesn't mean we shouldn't extend it in the future...

Agreed. The question is -- how far should we change pg_basebackup to
"get right"? I think it's not difficult to change it so that it backs up
the control file at the end. But eliminating the need for full_page_writes=on
seems not easy. No? So I'm not inclined to do that in at least first commit.
Otherwise, I'm afraid the patch would become huge.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Magnus Hagander <magnus@hagander.net> — 2011-10-25T12:03:57Z

On Tue, Oct 25, 2011 at 13:54, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> I don't think we should necessarily give up completely. But doing a
>> pg_basebackup way *first* seems reasonable - because it's going to be
>> the easiest one to "get right", given that we have more control there.
>> Doesn't mean we shouldn't extend it in the future...
>
> Agreed. The question is -- how far should we change pg_basebackup to
> "get right"? I think it's not difficult to change it so that it backs up
> the control file at the end. But eliminating the need for full_page_writes=on
> seems not easy. No? So I'm not inclined to do that in at least first commit.
> Otherwise, I'm afraid the patch would become huge.

It's more server side of base backups than the actual pg_basebackup
tool of course, but I'm sure that's what we're all referring to here.

Personally, I'd see the fpw stuff as part of the infrastructure
needed. Meaning that the fpw stuff should go in *first*, and the
pg_basebackup stuff later.

If we want something to go in early, that could be as simple as a
version of pg_basebackup that runs against the slave but only if
full_page_writes=on on the master. If it's not, it throws an error.
Then we can improve upon that by adding handling of fpw=off, first by
infrastructure, then by tool.

Doing it piece by piece like that is probably a good idea, since as
you say, all at once will be pretty huge.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Online base backup from the hot-standby

Steve Singer <ssinger_pg@sympatico.ca> — 2011-10-25T12:56:47Z

On 11-10-25 02:44 AM, Heikki Linnakangas wrote:
> With pg_basebackup, we have a fighting chance of getting this right, 
> because we have more control over how the backup is made. For example, 
> we can co-operate with the buffer manager to avoid torn-pages, 
> eliminating the need for full_page_writes=on, and we can include a 
> control file with the correct end-of-backup location automatically, 
> without requiring user intervention. pg_basebackup is less flexible 
> than the pg_start/stop_backup method, and unfortunately you're more 
> likely to need the flexibility in a more complicated setup with a hot 
> standby server and all, but making the generic pg_start/stop_backup 
> method work seems infeasible at the moment.

Would pg_basebackup be able to work with the buffer manager on the slave 
to avoid full_page_writes=on needing to be set on the master?  (the 
point of this is to be able to take the base backup without having the 
backup program contact the master). If so could pg_start_backup() not 
just put the buffer manager into the same state?

Re: Online base backup from the hot-standby

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> — 2011-10-25T13:05:11Z

On 25.10.2011 15:56, Steve Singer wrote:
> On 11-10-25 02:44 AM, Heikki Linnakangas wrote:
>> With pg_basebackup, we have a fighting chance of getting this right,
>> because we have more control over how the backup is made. For example,
>> we can co-operate with the buffer manager to avoid torn-pages,
>> eliminating the need for full_page_writes=on, and we can include a
>> control file with the correct end-of-backup location automatically,
>> without requiring user intervention. pg_basebackup is less flexible
>> than the pg_start/stop_backup method, and unfortunately you're more
>> likely to need the flexibility in a more complicated setup with a hot
>> standby server and all, but making the generic pg_start/stop_backup
>> method work seems infeasible at the moment.
>
> Would pg_basebackup be able to work with the buffer manager on the slave
> to avoid full_page_writes=on needing to be set on the master? (the point
> of this is to be able to take the base backup without having the backup
> program contact the master).

In theory, yes. I'm not sure how difficult it would be in practice. 
Currently, the walsender process just scans and copies everything in the 
data directory, at the filesystem level. It would have to go through the 
buffer manager instead, to avoid reading a page at the same time that 
the buffer manager is writing it out.

> If so could pg_start_backup() not just put the buffer manager into the same state?

No. . The trick that pg_basebackup (= walsender) can do is to co-operate 
with the buffer manager when reading each page. An external program 
cannot do that.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-10-26T01:48:05Z

On Tue, Oct 25, 2011 at 9:03 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Oct 25, 2011 at 13:54, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> I don't think we should necessarily give up completely. But doing a
>>> pg_basebackup way *first* seems reasonable - because it's going to be
>>> the easiest one to "get right", given that we have more control there.
>>> Doesn't mean we shouldn't extend it in the future...
>>
>> Agreed. The question is -- how far should we change pg_basebackup to
>> "get right"? I think it's not difficult to change it so that it backs up
>> the control file at the end. But eliminating the need for full_page_writes=on
>> seems not easy. No? So I'm not inclined to do that in at least first commit.
>> Otherwise, I'm afraid the patch would become huge.
>
> It's more server side of base backups than the actual pg_basebackup
> tool of course, but I'm sure that's what we're all referring to here.
>
> Personally, I'd see the fpw stuff as part of the infrastructure
> needed. Meaning that the fpw stuff should go in *first*, and the
> pg_basebackup stuff later.

Agreed. I'll extract FPW stuff from the patch that I submitted, and revise it
as the infrastructure patch.

The changes of pg_start_backup() etc that Ishiduka-san did are also
a server-side infrastructure. I will extract them as another infrastructure one.

Ishiduka-san, if you have time, feel free to try the above, barring objection.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-10-31T04:11:19Z

> > On Tue, Oct 25, 2011 at 13:54, Fujii Masao <masao.fujii@gmail.com> wrote:
> >> On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
> >>> I don't think we should necessarily give up completely. But doing a
> >>> pg_basebackup way *first* seems reasonable - because it's going to be
> >>> the easiest one to "get right", given that we have more control there.
> >>> Doesn't mean we shouldn't extend it in the future...
> >>
> >> Agreed. The question is -- how far should we change pg_basebackup to
> >> "get right"? I think it's not difficult to change it so that it backs up
> >> the control file at the end. But eliminating the need for full_page_writes=on
> >> seems not easy. No? So I'm not inclined to do that in at least first commit.
> >> Otherwise, I'm afraid the patch would become huge.
> >
> > It's more server side of base backups than the actual pg_basebackup
> > tool of course, but I'm sure that's what we're all referring to here.
> >
> > Personally, I'd see the fpw stuff as part of the infrastructure
> > needed. Meaning that the fpw stuff should go in *first*, and the
> > pg_basebackup stuff later.
> 
> Agreed. I'll extract FPW stuff from the patch that I submitted, and revise it
> as the infrastructure patch.
> 
> The changes of pg_start_backup() etc that Ishiduka-san did are also
> a server-side infrastructure. I will extract them as another infrastructure one.
> 
> Ishiduka-san, if you have time, feel free to try the above, barring objection.


Done.
Changed the name of the patch.

<Modifications>
 So changed to the positioning of infrastructure,
   * Removed the documentation.
   * changed to an error when you run pg_start/stop_backup() on the standby.


Regards.


--------------------------------------------
Jun Ishizuka
NTT Software Corporation
TEL：045-317-7018
E-Mail: ishizuka.jun@po.ntts.co.jp
--------------------------------------------

Re: Online base backup from the hot-standby

Josh Berkus <josh@agliodbs.com> — 2011-11-03T23:06:16Z

On 10/25/11 5:03 AM, Magnus Hagander wrote:
> If we want something to go in early, that could be as simple as a
> version of pg_basebackup that runs against the slave but only if
> full_page_writes=on on the master. If it's not, it throws an error.
> Then we can improve upon that by adding handling of fpw=off, first by
> infrastructure, then by tool.

Just to be clear, the idea is to require full_page_writes to do backup
from the standby in 9.2, but to remove the requirement later?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2011-11-04T04:20:31Z

On Fri, Nov 4, 2011 at 8:06 AM, Josh Berkus <josh@agliodbs.com> wrote:
> On 10/25/11 5:03 AM, Magnus Hagander wrote:
>> If we want something to go in early, that could be as simple as a
>> version of pg_basebackup that runs against the slave but only if
>> full_page_writes=on on the master. If it's not, it throws an error.
>> Then we can improve upon that by adding handling of fpw=off, first by
>> infrastructure, then by tool.
>
> Just to be clear, the idea is to require full_page_writes to do backup
> from the standby in 9.2, but to remove the requirement later?

Yes unless I'm missing something. Not sure if we can remove that in 9.2, though.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Steve Singer <ssinger_pg@sympatico.ca> — 2011-11-15T02:11:38Z

On 11-10-31 12:11 AM, Jun Ishiduka wrote:
>>
>> Agreed. I'll extract FPW stuff from the patch that I submitted, and revise it
>> as the infrastructure patch.
>>
>> The changes of pg_start_backup() etc that Ishiduka-san did are also
>> a server-side infrastructure. I will extract them as another infrastructure one.
>>
>> Ishiduka-san, if you have time, feel free to try the above, barring objection.
>
> Done.
> Changed the name of the patch.
>
> <Modifications>
>  So changed to the positioning of infrastructure,
>    * Removed the documentation.
>    * changed to an error when you run pg_start/stop_backup() on the standby.
>
>

Here is my stab at reviewing this version of this version of the patch.

Submission
-------------------
The purpose of this version of the patch is to provide some
infrastructure needed for backups from the slave without having to solve
some of the usability issues raised in previous versions of the patch.

This patch applied fine earlier versions of head but it doesn't today.
Simon moved some of the code touched by this patch as part of the xlog
refactoring. Please post an updated/rebased version of the patch.

I think the purpose of this patch is to provide

a) The code changes to record changes to fpw state of the master in WAL.
b) Track the state of FPW while in recovery mode

This version of the patch is NOT intended to allow SQL calls to
pg_start_backup() on slaves to work. This patch lays the infrastructure
for another patch (which I haven't seen) to allow pg_basebackup to do a
base backup from a slave assuming fpw=on has been set on the master (my
understanding of this patch is that it puts into place all of the pieces
required for the pg_basebackup patch to detect if fpw!=on and abort).

The consensus upthread was to get this infrastructure in and figure out
a safe+usable way of doing a slave backup without pg_basebackup later.

The patch seems to do what I expect of it.

I don't see any issues with most of the code changes in this patch.
However I admit that even after reviewing many versions of this patch I
still am not familiar enough with the recovery code to comment on a lot
of the details.

One thing I did see:

In pg_ctl.c

! if (stat(recovery_file, &statbuf) != 0)
! print_msg(_("WARNING: online backup mode is active\n"
! "Shutdown will not complete until pg_stop_backup() is called.\n\n"));
! else
! print_msg(_("WARNING: online backup mode is active if you can connect
as a superuser to server\n"
! "If so, shutdown will not complete until pg_stop_backup() is
called.\n\n"));

I am having difficulty understanding what this error message is trying
to tell me. I think it is telling me (based on the code comments) that
if I can't connect to the server because the server is not yet accepting
connections then I shouldn't worry about anything. However if the server
is accepting connections then I need to login and call pg_stop_backup().

Maybe
"WARNING: online backup mode is active. If your server is accepting
connections then you must connect as superuser and run pg_stop_backup()
before shutdown will complete"

I will wait on attempting to test the patch until you have sent a
version that applies against the current HEAD.

> Regards.
>
>
> --------------------------------------------
> Jun Ishizuka
> NTT Software Corporation
> TEL：045-317-7018
> E-Mail: ishizuka.jun@po.ntts.co.jp
> --------------------------------------------
>
>
>

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2012-01-13T08:02:55Z

Sorry for the delay.

2011/11/15 Steve Singer <ssinger_pg@sympatico.ca>:
> Here is my stab at reviewing this version of this version of the patch.

Thanks for the review!

> This version of the patch is NOT intended to allow SQL calls to
> pg_start_backup() on slaves to work.   This patch lays the infrastructure
> for another patch (which I haven't seen) to allow pg_basebackup to do a base
> backup from a slave assuming fpw=on has been set on the master (my
> understanding of this patch is that it puts into place all of the pieces
> required for the pg_basebackup patch to detect if fpw!=on and abort).

The amount of code changes to allow pg_basebackup to make a backup from
the standby seems to be small. So I ended up merging that changes and the
infrastructure patch. WIP patch attached. But I'd happy to split the patch again
if you want.

> In pg_ctl.c
>
> !             if (stat(recovery_file, &statbuf) != 0)
> !                 print_msg(_("WARNING: online backup mode is active\n"
> !                             "Shutdown will not complete until
> pg_stop_backup() is called.\n\n"));
> !             else
> !                 print_msg(_("WARNING: online backup mode is active if you
> can connect as a superuser to server\n"
> !                             "If so, shutdown will not complete until
> pg_stop_backup() is called.\n\n"));
>
> I am having difficulty understanding what this error message is trying to
> tell me.   I think it is telling me (based on the code comments) that if I
> can't connect to the server because the server is not yet accepting
> connections then I shouldn't worry about anything.   However if the server
> is accepting connections then I need to login and call pg_stop_backup().
>
> Maybe
> "WARNING:  online backup mode is active.  If your server is accepting
> connections then you must connect as superuser and run pg_stop_backup()
> before shutdown will complete"

The reason why the above change of pg_ctl.c was required is that new
backup_label can be created by standby-only backup during recovery.
But, now, we decided to disallow pg_start_backup() and pg_stop_backup()
to be called during recovery again, and allow only pg_basebackup to make
a base backup from the standby, which means that backup_label will not be
created during recovery. So the above change of pg_ctl.c has not been
required now. I excluded that change from the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2012-01-17T10:38:23Z

On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> The amount of code changes to allow pg_basebackup to make a backup from
> the standby seems to be small. So I ended up merging that changes and the
> infrastructure patch. WIP patch attached. But I'd happy to split the patch again
> if you want.

Attached is the updated version of the patch. I wrote the limitations of
standby-only backup in the document and changed the error messages.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Steve Singer <ssinger_pg@sympatico.ca> — 2012-01-20T04:01:49Z

On 12-01-17 05:38 AM, Fujii Masao wrote:
> On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>> The amount of code changes to allow pg_basebackup to make a backup from
>> the standby seems to be small. So I ended up merging that changes and the
>> infrastructure patch. WIP patch attached. But I'd happy to split the patch again
>> if you want.
> Attached is the updated version of the patch. I wrote the limitations of
> standby-only backup in the document and changed the error messages.
>

Here is my review of this verison of the patch. I think this patch has 
been in every CF for 9.2 and I feel it is getting close to being 
committed.  The only issue of significants is a crash I encountered 
while testing, see below.

I am fine with including the pg_basebackup changes in the patch it also 
makes testing some of the other changes possible.

The documentation updates you have are good

I don't see any issues looking at the code.

Testing Review
--------------------------------

I encountered this on my first replica (the one based on the master).  I 
am not sure if it is related to this patch, it happened after the 
pg_basebackup against the replica finished.

TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: 
"twophase.c", Line: 1238)
LOG:  startup process (PID 12222) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

A little earlier this postmaster had printed.

LOG:  restored log file "00000001000000000000001F" from archive
LOG:  restored log file "000000010000000000000020" from archive
cp: cannot stat 
`/usr/local/pgsql92git/archive/000000010000000000000021': No such file 
or directory
LOG:  unexpected pageaddr 0/19000000 in log file 0, segment 33, offset 0
cp: cannot stat 
`/usr/local/pgsql92git/archive/000000010000000000000021': No such file 
or directory

I have NOT been able to replicate this error  and I am not sure exactly 
what I had done in my testing prior to that point.

In another test run I had

- set full page writes=off and did a checkpoint
- Started the pg_basebackup
- set full_page_writes=on and did a HUP + some database activity that 
might have forced a checkpoint.

I got this message from pg_basebackup.
./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
pg_basebackup: could not get WAL end position from server

I point this out because the message is different than the normal "could 
not initiate base backup: FATAL:  WAL generated with 
full_page_writes=off" thatI normally see.    We might want to add a 
PQerrorMessage(conn)) to pg_basebackup to print the error details.  
Since this patch didn't actually change pg_basebackup I don't think your 
required to improve the error messages in it.  I am just mentioning this 
because it came up in testing.

The rest of the tests I did involving changing full_page_writes  
with/without checkpoints and sighups and promoting the replica seemed to 
work as expected.

> Regards,
>
>
>
>

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2012-01-20T09:48:59Z

On Fri, Jan 20, 2012 at 1:01 PM, Steve Singer <ssinger_pg@sympatico.ca> wrote:
> Here is my review of this verison of the patch. I think this patch has been
> in every CF for 9.2 and I feel it is getting close to being committed.

Thanks for the review!

> Testing Review
> --------------------------------
>
> I encountered this on my first replica (the one based on the master).  I am
> not sure if it is related to this patch, it happened after the pg_basebackup
> against the replica finished.
>
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "twophase.c", Line: 1238)
> LOG:  startup process (PID 12222) was terminated by signal 6: Aborted

I spent one hour to reproduce that issue, but finally I was not able
to do that :(
For now I have no idea what causes that issue. But basically the patch doesn't
touch any codes related to that issue, so I'm guessing that it's a problem of
the HEAD rather than the patch...

I will spend more time to diagnose the issue. If you notice something, please
let me know.

> - set full page writes=off and did a checkpoint
> - Started the pg_basebackup
> - set full_page_writes=on and did a HUP + some database activity that might
> have forced a checkpoint.
>
> I got this message from pg_basebackup.
> ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
> pg_basebackup: could not get WAL end position from server
>
> I point this out because the message is different than the normal "could not
> initiate base backup: FATAL:  WAL generated with full_page_writes=off" thatI
> normally see.

I guess that's because you started pg_basebackup before checkpoint record
with full_page_writes=off had been replicated and replayed to the standby.
In this case, when you starts pg_basebackup, it uses the previous checkpoint
record with maybe full_page_writes=on as the backup starting checkpoint, so
pg_basebackup passes the check of full_page_writes at the start of backup.
Then, it fails the check at the end of backup, so you got such an error message.

> We might want to add a PQerrorMessage(conn)) to
> pg_basebackup to print the error details.  Since this patch didn't actually
> change pg_basebackup I don't think your required to improve the error
> messages in it.  I am just mentioning this because it came up in testing.

Agreed.

When PQresultStatus() returns an unexpected status, basically the error
message from PQerrorMessage() should be reported. But only when
pg_basebackup could not get WAL end position, PQerrorMessage() was
not reported... This looks like a oversight of pg_basebackup... I think that
it's better to fix that as a separate patch (attached). Thought?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Erik Rijkers <er@xs4all.nl> — 2012-01-20T10:37:41Z

On Fri, January 20, 2012 05:01, Steve Singer wrote:
> On 12-01-17 05:38 AM, Fujii Masao wrote:
>> On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>>> The amount of code changes to allow pg_basebackup to make a backup from
>>> the standby seems to be small. So I ended up merging that changes and the
>>> infrastructure patch. WIP patch attached. But I'd happy to split the patch again
>>> if you want.
>> Attached is the updated version of the patch. I wrote the limitations of
>> standby-only backup in the document and changed the error messages.
>>
>
> Here is my review of this verison of the patch. I think this patch has
> been in every CF for 9.2 and I feel it is getting close to being
> committed.  The only issue of significants is a crash I encountered
> while testing, see below.
>
> I am fine with including the pg_basebackup changes in the patch it also
> makes testing some of the other changes possible.
>
>
> The documentation updates you have are good
>
> I don't see any issues looking at the code.
>
>
>
> Testing Review
> --------------------------------
>
> I encountered this on my first replica (the one based on the master).  I
> am not sure if it is related to this patch, it happened after the
> pg_basebackup against the replica finished.
>
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "twophase.c", Line: 1238)
> LOG:  startup process (PID 12222) was terminated by signal 6: Aborted
> LOG:  terminating any other active server processes
>
> A little earlier this postmaster had printed.
>
> LOG:  restored log file "00000001000000000000001F" from archive
> LOG:  restored log file "000000010000000000000020" from archive
> cp: cannot stat
> `/usr/local/pgsql92git/archive/000000010000000000000021': No such file
> or directory
> LOG:  unexpected pageaddr 0/19000000 in log file 0, segment 33, offset 0
> cp: cannot stat
> `/usr/local/pgsql92git/archive/000000010000000000000021': No such file
> or directory
>
>
> I have NOT been able to replicate this error  and I am not sure exactly
> what I had done in my testing prior to that point.
>

I'm not sure, but it does look like this is the "mystery" bug that I encountered repeatedly
already in 9.0devel; but I was never able to reproduce it reliably.  But I don't think it was ever
solved.

  http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php

Erik Rijkers















>
> In another test run I had
>
> - set full page writes=off and did a checkpoint
> - Started the pg_basebackup
> - set full_page_writes=on and did a HUP + some database activity that
> might have forced a checkpoint.
>
> I got this message from pg_basebackup.
> ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
> pg_basebackup: could not get WAL end position from server
>
> I point this out because the message is different than the normal "could
> not initiate base backup: FATAL:  WAL generated with
> full_page_writes=off" thatI normally see.    We might want to add a
> PQerrorMessage(conn)) to pg_basebackup to print the error details.
> Since this patch didn't actually change pg_basebackup I don't think your
> required to improve the error messages in it.  I am just mentioning this
> because it came up in testing.
>
>
> The rest of the tests I did involving changing full_page_writes
> with/without checkpoints and sighups and promoting the replica seemed to
> work as expected.
>

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2012-01-20T11:04:29Z

On Fri, Jan 20, 2012 at 7:37 PM, Erik Rijkers <er@xs4all.nl> wrote:
> I'm not sure, but it does look like this is the "mystery" bug that I encountered repeatedly
> already in 9.0devel; but I was never able to reproduce it reliably.  But I don't think it was ever
> solved.
>
>  http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php

I also encountered the same issue one year before:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01579.php

At that moment, I identified its cause:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01700.php

At last it was fixed:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01910.php

But Steve encountered it again, which means that the above fix is not
sufficient. Unless the issue is derived from my patch, we should do
another cycle of diagnosis of it.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Simon Riggs <simon@2ndquadrant.com> — 2012-01-20T11:15:55Z

On Tue, Jan 17, 2012 at 10:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> The amount of code changes to allow pg_basebackup to make a backup from
>> the standby seems to be small. So I ended up merging that changes and the
>> infrastructure patch. WIP patch attached. But I'd happy to split the patch again
>> if you want.
>
> Attached is the updated version of the patch. I wrote the limitations of
> standby-only backup in the document and changed the error messages.

I'm looking at this patch and wondering why we're doing so many
press-ups to ensure full_page_writes parameter is on. This will still
fail if you use a utility that removes the full page writes, but fail
silently.

I think it would be beneficial to explicitly check that all WAL
records have full page writes actually attached to them until we
achieve consistency.

Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
and it was removed. Not sure why? We already track other parameters
when they change, so I don't want to introduce a whole new WAL record
for each new parameter whose change needs tracking.

Please make a note for committer that wal version needs bumping.

I think its probably time to start a README.recovery to explain why
this works the way it does. Other changes can then start to do that as
well, so we can keep this to sane levels of complexity.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online base backup from the hot-standby

Simon Riggs <simon@2ndquadrant.com> — 2012-01-20T11:26:35Z

On Fri, Jan 20, 2012 at 11:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

> But Steve encountered it again, which means that the above fix is not
> sufficient. Unless the issue is derived from my patch, we should do
> another cycle of diagnosis of it.

It's my bug, and I've posted a fix but not yet applied it, just added
to open items list. The only reason for that was time pressure, which
is now gone, so I'll look to apply it sooner.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2012-01-20T12:54:28Z

Thanks for the review!

On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I'm looking at this patch and wondering why we're doing so many
> press-ups to ensure full_page_writes parameter is on. This will still
> fail if you use a utility that removes the full page writes, but fail
> silently.
>
> I think it would be beneficial to explicitly check that all WAL
> records have full page writes actually attached to them until we
> achieve consistency.

I agree that it's worth adding such a safeguard. That can be a self-contained
feature, so I'll submit a separate patch for that, to keep each patch small.

> Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
> and it was removed. Not sure why? We already track other parameters
> when they change, so I don't want to introduce a whole new WAL record
> for each new parameter whose change needs tracking.

I revived that because whenever full_page_writes must be WAL-logged
or replayed, there is no need to WAL-log or replay the HS parameters.
The opposite is also true. Logging or replaying all of them every time
seems to be a bit useless, and to make the code unreadable. ISTM that
XLOG_FPW_CHANGE can make the code simpler and avoid adding useless
WAL activity by merging them into one WAL record.

> Please make a note for committer that wal version needs bumping.

Okay, will add the note about bumping XLOG_PAGE_MAGIC.

> I think its probably time to start a README.recovery to explain why
> this works the way it does. Other changes can then start to do that as
> well, so we can keep this to sane levels of complexity.

In this CF, there are other patches which change recovery codes. So
I think that it's better to do that after all of them will have been committed.
No need to hurry up to do that now.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Simon Riggs <simon@2ndquadrant.com> — 2012-01-20T14:34:31Z

On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Thanks for the review!
>
> On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I'm looking at this patch and wondering why we're doing so many
>> press-ups to ensure full_page_writes parameter is on. This will still
>> fail if you use a utility that removes the full page writes, but fail
>> silently.
>>
>> I think it would be beneficial to explicitly check that all WAL
>> records have full page writes actually attached to them until we
>> achieve consistency.
>
> I agree that it's worth adding such a safeguard. That can be a self-contained
> feature, so I'll submit a separate patch for that, to keep each patch small.

Maybe, but you mean do this now as well? Not sure I like silent errors.

>> Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
>> and it was removed. Not sure why? We already track other parameters
>> when they change, so I don't want to introduce a whole new WAL record
>> for each new parameter whose change needs tracking.
>
> I revived that because whenever full_page_writes must be WAL-logged
> or replayed, there is no need to WAL-log or replay the HS parameters.
> The opposite is also true. Logging or replaying all of them every time
> seems to be a bit useless, and to make the code unreadable. ISTM that
> XLOG_FPW_CHANGE can make the code simpler and avoid adding useless
> WAL activity by merging them into one WAL record.

I don't agree, but for the sake of getting on with things I say this
is minor so is no reason to block this.

>> Please make a note for committer that wal version needs bumping.
>
> Okay, will add the note about bumping XLOG_PAGE_MAGIC.
>
>> I think its probably time to start a README.recovery to explain why
>> this works the way it does. Other changes can then start to do that as
>> well, so we can keep this to sane levels of complexity.
>
> In this CF, there are other patches which change recovery codes. So
> I think that it's better to do that after all of them will have been committed.
> No need to hurry up to do that now.

Agreed.

Will proceed to final review and if all OK, commit.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2012-01-23T10:29:20Z

On Fri, Jan 20, 2012 at 11:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Thanks for the review!
>>
>> On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> I'm looking at this patch and wondering why we're doing so many
>>> press-ups to ensure full_page_writes parameter is on. This will still
>>> fail if you use a utility that removes the full page writes, but fail
>>> silently.
>>>
>>> I think it would be beneficial to explicitly check that all WAL
>>> records have full page writes actually attached to them until we
>>> achieve consistency.
>>
>> I agree that it's worth adding such a safeguard. That can be a self-contained
>> feature, so I'll submit a separate patch for that, to keep each patch small.
>
> Maybe, but you mean do this now as well? Not sure I like silent errors.

If many people think the patch is not acceptable without such a safeguard,
I will do that right now. Otherwise, I'd like to take more time to do
that, i.e.,
add it to 9.2dev Oepn Items.

I've not come up with good idea. Ugly idea is to keep track of all replays of
full_page_writes for every buffer pages (i.e., prepare 1-bit per buffer page
table and set the specified bit to 1 when full_page_writes is applied),
and then check whether full_page_writes has been already applied when
replaying normal WAL record... Do you have any better idea?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Simon Riggs <simon@2ndquadrant.com> — 2012-01-23T13:11:04Z

On Mon, Jan 23, 2012 at 10:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jan 20, 2012 at 11:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> Thanks for the review!
>>>
>>> On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> I'm looking at this patch and wondering why we're doing so many
>>>> press-ups to ensure full_page_writes parameter is on. This will still
>>>> fail if you use a utility that removes the full page writes, but fail
>>>> silently.
>>>>
>>>> I think it would be beneficial to explicitly check that all WAL
>>>> records have full page writes actually attached to them until we
>>>> achieve consistency.
>>>
>>> I agree that it's worth adding such a safeguard. That can be a self-contained
>>> feature, so I'll submit a separate patch for that, to keep each patch small.
>>
>> Maybe, but you mean do this now as well? Not sure I like silent errors.
>
> If many people think the patch is not acceptable without such a safeguard,
> I will do that right now. Otherwise, I'd like to take more time to do
> that, i.e.,
> add it to 9.2dev Oepn Items.

> I've not come up with good idea. Ugly idea is to keep track of all replays of
> full_page_writes for every buffer pages (i.e., prepare 1-bit per buffer page
> table and set the specified bit to 1 when full_page_writes is applied),
> and then check whether full_page_writes has been already applied when
> replaying normal WAL record... Do you have any better idea?

Not sure.

I think the only possible bug here is one introduced by an outside utility.

In that case, I don't think it should be the job of the backend to go
too far to protect against such atypical error. So if we can't solve
it fairly easily and with no overhead then I'd say lets skip it. We
could easily introduce a bug here just by having faulty checking code.

So lets add it to 9.2 open items as a non-priority item. I'll proceed
to commit for this now.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online base backup from the hot-standby

Robert Haas <robertmhaas@gmail.com> — 2012-01-23T13:11:10Z

On Mon, Jan 23, 2012 at 5:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> If many people think the patch is not acceptable without such a safeguard,
> I will do that right now.

That's my view.  I think we ought to resolve this issue before commit,
especially since it seems unclear that we know how to fix it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Online base backup from the hot-standby

Robert Haas <robertmhaas@gmail.com> — 2012-01-23T13:13:51Z

On Mon, Jan 23, 2012 at 8:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 23, 2012 at 5:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> If many people think the patch is not acceptable without such a safeguard,
>> I will do that right now.
>
> That's my view.  I think we ought to resolve this issue before commit,
> especially since it seems unclear that we know how to fix it.

Actually, never mind.  On reading this more carefully, I'm not too
concerned about the possibility of people breaking it with pg_lesslog
or similar.  But it should be solid if you use only the functionality
built into core.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2012-01-24T09:51:13Z

On Mon, Jan 23, 2012 at 10:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jan 23, 2012 at 10:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Fri, Jan 20, 2012 at 11:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> Thanks for the review!
>>>>
>>>> On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>>> I'm looking at this patch and wondering why we're doing so many
>>>>> press-ups to ensure full_page_writes parameter is on. This will still
>>>>> fail if you use a utility that removes the full page writes, but fail
>>>>> silently.
>>>>>
>>>>> I think it would be beneficial to explicitly check that all WAL
>>>>> records have full page writes actually attached to them until we
>>>>> achieve consistency.
>>>>
>>>> I agree that it's worth adding such a safeguard. That can be a self-contained
>>>> feature, so I'll submit a separate patch for that, to keep each patch small.
>>>
>>> Maybe, but you mean do this now as well? Not sure I like silent errors.
>>
>> If many people think the patch is not acceptable without such a safeguard,
>> I will do that right now. Otherwise, I'd like to take more time to do
>> that, i.e.,
>> add it to 9.2dev Oepn Items.
>
>> I've not come up with good idea. Ugly idea is to keep track of all replays of
>> full_page_writes for every buffer pages (i.e., prepare 1-bit per buffer page
>> table and set the specified bit to 1 when full_page_writes is applied),
>> and then check whether full_page_writes has been already applied when
>> replaying normal WAL record... Do you have any better idea?
>
> Not sure.
>
> I think the only possible bug here is one introduced by an outside utility.
>
> In that case, I don't think it should be the job of the backend to go
> too far to protect against such atypical error. So if we can't solve
> it fairly easily and with no overhead then I'd say lets skip it. We
> could easily introduce a bug here just by having faulty checking code.
>
> So lets add it to 9.2 open items as a non-priority item.

Agreed.

> I'll proceed to commit for this now.

Thanks a lot!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Simon Riggs <simon@2ndquadrant.com> — 2012-01-24T10:54:56Z

On Tue, Jan 24, 2012 at 9:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

>> I'll proceed to commit for this now.
>
> Thanks a lot!

Can I just check a few things?

You say
/*
+        * Update full_page_writes in shared memory and write an
+        * XLOG_FPW_CHANGE record before resource manager writes cleanup
+        * WAL records or checkpoint record is written.
+        */

why does it need to be before the cleanup and checkpoint?

You say
/*
+        * Currently only non-exclusive backup can be taken during recovery.
+        */

why?

You mention in the docs
"The backup history file is not created in the database cluster backed up."
but we need to explain the bad effect, if any.

You say
"If the standby is promoted to the master during online backup, the
backup fails."
but no explanation of why?

I could work those things out, but I don't want to have to, plus we
may disagree if I did.

There are some good explanations in comments of other things, just not
everywhere needed.

What happens if we shutdown the WALwriter and then issue SIGHUP?

Are we sure we want to make the change of file format mandatory? That
means earlier versions of clients such as pg_basebackup will fail
against this version. Should we allow that if BACKUP FROM is missing
we assume it was master?

There are no docs to explain the new feature is available in the main
docs, or to explain the restrictions.
I expect you will add that later after commit.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online base backup from the hot-standby

Simon Riggs <simon@2ndquadrant.com> — 2012-01-24T11:22:16Z

On Tue, Jan 24, 2012 at 10:54 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jan 24, 2012 at 9:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>>> I'll proceed to commit for this now.
>>
>> Thanks a lot!
>
> Can I just check a few things?

Just to clarify, not expecting another patch version, just reply here
and I can edit.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2012-01-25T08:16:40Z

On Tue, Jan 24, 2012 at 7:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jan 24, 2012 at 9:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>>> I'll proceed to commit for this now.
>>
>> Thanks a lot!
>
> Can I just check a few things?

Sure!

> You say
> /*
> +        * Update full_page_writes in shared memory and write an
> +        * XLOG_FPW_CHANGE record before resource manager writes cleanup
> +        * WAL records or checkpoint record is written.
> +        */
>
> why does it need to be before the cleanup and checkpoint?

Because the cleanup and checkpoint need to see FPW in shared memory.
If FPW in shared memory is not updated there, the cleanup and (end-of-recovery)
checkpoint always use an initial value (= false) of FPW in shared memory.

> You say
> /*
> +        * Currently only non-exclusive backup can be taken during recovery.
> +        */
>
> why?

At first I proposed to allow exclusive backup to be taken during recovery. But
Heikki disagreed with the proposal because he thought that the exclusive backup
procedure which I proposed was too fragile. No one could come up with any good
user-friendly easy-to-implement procedure. So we decided to allow only
non-exclusive backup to be taken during recovery. In non-exclusive backup,
the complicated procedure is performed by pg_basebackup, so a user doesn't
need to care about that.

> You mention in the docs
> "The backup history file is not created in the database cluster backed up."
> but we need to explain the bad effect, if any.

Users cannot know various information (e.g., which WAL files are required for
backups, backup starting/ending time, etc) about backups which have been taken
so far. If they need such information, they need to record that manually.

Users cannot pass the backup history file to pg_archivecleanup. Which might make
the usage of pg_archivecleanup more difficult.

After a little thought, pg_basebackup would be able to create the backup history
file in the backup, though it cannot be archived. We shoud implement
that feature
to alleviate the bad effect?

> You say
> "If the standby is promoted to the master during online backup, the
> backup fails."
> but no explanation of why?
>
> I could work those things out, but I don't want to have to, plus we
> may disagree if I did.

If the backup succeeds in that case, when we start an archive recovery from that
backup, the recovery needs to cross between two timelines. Which means that
we need to set recovery_target_timeline before starting recovery. Whether
recovery_target_timeline needs to be set or not depends on whether the standby
was promoted during taking the backup. Leaving such a decision to a user seems
fragile.

pg_basebackup -x ensures that all required files are included in the backup and
we can start recovery without restoring any file from the archive. But
if the standby
is promoted during the backup, the timeline history file would become
an essential
file for recovery, but it's not included in the backup.

> There are some good explanations in comments of other things, just not
> everywhere needed.
>
> What happens if we shutdown the WALwriter and then issue SIGHUP?

SIGHUP doesn't affect full_page_writes in that case. Oh, you are concerned about
the case where smart shutdown kills walwriter but some backends are
still running?
Currently SIGHUP affects full_page_writes and running backends use the changed
new value of full_page_writes. But in the patch, SIGHUP doesn't affect...

To address the problem, we should either postpone the shutdown of walwriter
until all backends have gone away, or leave the update of full_page_writes to
checkpointer process instead of walwriter. Thought?

> Are we sure we want to make the change of file format mandatory? That
> means earlier versions of clients such as pg_basebackup will fail
> against this version.

Really? Unless I'm missing something, pg_basebackup doesn't care about the
file format of backup_label. So I don't think that earlier version of
pg_basebackup
fails.

> There are no docs to explain the new feature is available in the main
> docs, or to explain the restrictions.
> I expect you will add that later after commit.

Okay. Will do.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Online base backup from the hot-standby

Simon Riggs <simon@2ndquadrant.com> — 2012-01-25T08:49:42Z

On Wed, Jan 25, 2012 at 8:16 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

>> What happens if we shutdown the WALwriter and then issue SIGHUP?
>
> SIGHUP doesn't affect full_page_writes in that case. Oh, you are concerned about
> the case where smart shutdown kills walwriter but some backends are
> still running?
> Currently SIGHUP affects full_page_writes and running backends use the changed
> new value of full_page_writes. But in the patch, SIGHUP doesn't affect...
>
> To address the problem, we should either postpone the shutdown of walwriter
> until all backends have gone away, or leave the update of full_page_writes to
> checkpointer process instead of walwriter. Thought?

checkpointer seems the correct place to me

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online base backup from the hot-standby

Simon Riggs <simon@2ndquadrant.com> — 2012-01-25T18:07:24Z

On Wed, Jan 25, 2012 at 8:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Jan 25, 2012 at 8:16 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>>> What happens if we shutdown the WALwriter and then issue SIGHUP?
>>
>> SIGHUP doesn't affect full_page_writes in that case. Oh, you are concerned about
>> the case where smart shutdown kills walwriter but some backends are
>> still running?
>> Currently SIGHUP affects full_page_writes and running backends use the changed
>> new value of full_page_writes. But in the patch, SIGHUP doesn't affect...
>>
>> To address the problem, we should either postpone the shutdown of walwriter
>> until all backends have gone away, or leave the update of full_page_writes to
>> checkpointer process instead of walwriter. Thought?
>
> checkpointer seems the correct place to me


Done.


-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online base backup from the hot-standby

Fujii Masao <masao.fujii@gmail.com> — 2012-01-26T06:09:33Z

On Thu, Jan 26, 2012 at 3:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, Jan 25, 2012 at 8:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Wed, Jan 25, 2012 at 8:16 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>>>> What happens if we shutdown the WALwriter and then issue SIGHUP?
>>>
>>> SIGHUP doesn't affect full_page_writes in that case. Oh, you are concerned about
>>> the case where smart shutdown kills walwriter but some backends are
>>> still running?
>>> Currently SIGHUP affects full_page_writes and running backends use the changed
>>> new value of full_page_writes. But in the patch, SIGHUP doesn't affect...
>>>
>>> To address the problem, we should either postpone the shutdown of walwriter
>>> until all backends have gone away, or leave the update of full_page_writes to
>>> checkpointer process instead of walwriter. Thought?
>>
>> checkpointer seems the correct place to me
>
>
> Done.

Thanks a lot!!

I proposed another small patch which fixes the issue about an error message of
pg_basebackup, in this upthread. If it's reasonable, could you commit it?
http://archives.postgresql.org/message-id/CAHGQGwENjSDN=f_VDPwVQ53QRU0cu9+wZKBvwNaEXMawj-y-GQ@mail.gmail.com

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center