Thread

  1. Re: Online base backup from the hot-standby

    Jun Ishiduka <ishizuka.jun@po.ntts.co.jp> — 2011-07-12T07:12:44Z

    > This version of the patch adds a field into pg_controldata that tries to
    > store the source of the base backup while in recovery mode.
    > I think your ultimate goal with this patch is to be able to take a
    > backup of a running hot-standby slave and recover it as another
    > instance. This patch seems to provide the ability to have the second
    > slave stop recovery at minRecoveryPoint from the control file.
    > 
    > 
    > My understanding of the procedure you want to get to to take base
    > backups off a slave is
    > 
    > 1. execute pg_start_backup('x') on the slave (*)
    > 2. take a backup of the data dir
    > 3. call pg_stop_backup() on the slave
    > 4. Copy the control file on the slave
    > 
    > This patch only addresses the recovery portions.
    
    Yes.
    
    
    > I don't think the above comment is very clear on what backupserver is.
    > Perhaps
    > 
    > /**
    > * backupserver is used while postgresql is in recovery mode to
    > * store the location of where the backup comes from.
    > * When Postgres starts recovery operations
    > * it is set to "none". During recovery it is updated to either "master",
    > or "slave"
    > * When recovery operations finish it is updated back to "none"
    > **/
    
    Done.
    
    
    > Also shouldn't backupServer be the enum type of 'BackupServer' not int?
    > Other enums in the structure such as DBState are defined this way.
    
    Now, this is a same as wal_level, not DBState. No?
    
    
    > Since I can't yet call pg_start_backup or pg_stop_backup() on the slave
    > I am calling them on the master.
    > (I also did some testing where I didn't put the system into backup
    > mode). I admit that I am not sure what to look for as an indication that
    > the system isn't recovering to the correct point. In much of my testing
    > I was just verifying that the slave started and my data 'looked' okay.
    
    Updated patch as can execute pg_start/stop_backup() on standby server.
    One-pass of above steps(from 1. to 4.) is now done on this.
    However, there are conditions.
     * Master's full_page_write = on.
     * On the slave,  do not execute stop/promote operation before pg_stop_backup() is executed.
     * the result of pg_start_backup() may exceed the result of pg_stop_backup().
    
    
    > I seem to get this warning in my logs when I start up the instance based
    > on the slave backup.
    > LOG: 00000: database system was interrupted while in recovery at log
    > time 2011-07-08 18:40:20 EDT
    > HINT: If this has occurred more than once some data might be corrupted
    > and you might need to choose an earlier recovery target
    > 
    > I'm wondering if this warning is a bit misleading to users because it is
    > an expected message when starting up an instance based on a slave backup
    > (because the slave was already in recovery mode). If I shutdown this
    > instance and start it up again I keep getting the warning. My
    > understanding of your patch is that there shouldn't be any risk of
    > corruption in that case (assuming your patch has no bugs). Can/should we
    > be suppressing this message when we detect that we are recovering from a
    > slave backup?
    
    This has not been supported yet.
    I do not see what state of this message.
    
    Always happens when backup is taken from slave.
    What do you think about an approach to add context, "unless take backup from slave"?
    
    
    > The direction of the patch has changed a bit during this commit fest. I
    > think it would be good to provide an update on the rest of the changes
    > you plan for this to be a complete useable feature. That would make it
    > easier to comment on something you
    > missed versus something your planning on dealing with in the next stage.
    
    I see.
    
    I will provide a patch which can exeute pg_start/stop_backup
    including to solve above comment and conditions in next stage.
    Then please review.
    
    I change this patch status to "Returned with feedback".
    
    Regards.
    
    
    --------------------------------------------
    Jun Ishizuka
    NTT Software Corporation
    TEL:045-317-7018
    E-Mail: ishizuka.jun@po.ntts.co.jp
    --------------------------------------------