Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From: David Zhang <david.zhang@highgo.ca>

To: Michael Paquier <michael@paquier.xyz>

Cc: Postgres hackers <pgsql-hackers@lists.postgresql.org>

Date: 2023-07-19T18:21:17Z

Lists: pgsql-hackers

Commits

Delay recovery mode LOG after reading backup_label and/or checkpoint record
- dc5bd3889437 17.0 landed
Mention standby.signal in FATALs for checkpoint record missing at recovery
- 1ffdc03c21ae 17.0 landed
XLOG file archiving and point-in-time recovery. There are still some
- 66ec2db72840 8.0.0 cited

On 2023-07-16 6:27 p.m., Michael Paquier wrote:
>
> Delete a backup_label from a fresh base backup can easily lead to data
> corruption, as the startup process would pick up as LSN to start
> recovery from the control file rather than the backup_label file.
> This would happen if a checkpoint updates the redo LSN in the control
> file while a backup happens and the control file is copied after the
> checkpoint, for instance.  If one wishes to deploy a new primary from
> a base backup, recovery.signal is the way to go, making sure that the
> new primary is bumped into a new timeline once recovery finishes, on
> top of making sure that the startup process starts recovery from a
> position where the cluster would be able to achieve a consistent
> state.

Thanks a lot for sharing this information.

>
> How would you rewrite that?  I am not sure how many details we want to
> put here in terms of differences between recovery.signal and
> standby.signal, still we surely should mention these are the two
> possible choices.

Honestly, I am not convinced that backup_label needs to be mentioned
here as well. However, I can share some information about my testing of
the patch and the corresponding results.

To assess the impact of the patch, I ran the following commands both
before and after applying it:

pg_basebackup -h localhost -p 5432 -U david -D pg_backup1

pg_ctl -D pg_backup1 -l /tmp/logfile start

Before the patch, the server started from the base backup as an
independent primary without any issues.

However, after applying the patch, I observed the following behavior 
when starting from the base backup:

1) Simply starting the server from the base backup fails with:

FATAL:  could not find recovery.signal or standby.signal when recovering with backup_label

HINT:  If you are restoring from a backup, touch "/media/david/disk1/pg_backup1/recovery.signal" or "/media/david/disk1/pg_backup1/standby.signal" and add required recovery options.

2) After touching a recovery.signal file and trying to start the server
again, the following error was encountered:

FATAL:  must specify restore_command when standby mode is not enabled

3) After touching a standby.signal file, the server started
successfully. However, it operates in standby mode, whereas the intent
was for it to run as a primary server.
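
To summarize the three cases above, here is a small shell sketch. It only looks at which files exist in the data directory. The function name classify_datadir and its output strings are hypothetical and written for illustration; this is not how the startup process implements the check. It assumes standby.signal takes precedence when both signal files are present.

```shell
# Hypothetical helper (not part of PostgreSQL): report which of the cases
# above a data directory would fall into, based only on the files present.
classify_datadir() {
    datadir=$1
    if [ ! -f "$datadir/backup_label" ]; then
        # Not a fresh base backup; outside the cases tested above.
        echo "no backup_label"
    elif [ -f "$datadir/standby.signal" ]; then
        # Case 3: the server starts, but as a standby.
        echo "standby mode"
    elif [ -f "$datadir/recovery.signal" ]; then
        # Case 2: startup fails unless restore_command is set.
        echo "archive recovery, restore_command required"
    else
        # Case 1: with the patch, startup stops with a FATAL.
        echo "FATAL: could not find recovery.signal or standby.signal"
    fi
}
```

For example, running "classify_datadir pg_backup1" right after pg_basebackup should report the FATAL case, since the backup contains a backup_label but neither signal file.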

Best regards,

David