Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

Michael Paquier <michael@paquier.xyz>

From: Michael Paquier <michael@paquier.xyz>
To: Andres Freund <andres@anarazel.de>
Cc: Robert Haas <robertmhaas@gmail.com>, David Steele <david@pgmasters.net>, Kyotaro Horiguchi <horikyota.ntt@gmail.com>, pgsql-hackers@lists.postgresql.org, zxwsbg12138@gmail.com, david.zhang@highgo.ca
Date: 2023-11-14T00:13:44Z
Lists: pgsql-hackers

Commits

  1. Delay recovery mode LOG after reading backup_label and/or checkpoint record

  2. Mention standby.signal in FATALs for checkpoint record missing at recovery

  3. XLOG file archiving and point-in-time recovery. There are still some

On Mon, Nov 13, 2023 at 03:41:44PM -0800, Andres Freund wrote:
> On 2023-11-09 12:16:52 +0900, Michael Paquier wrote:
>> On Thu, Nov 09, 2023 at 12:04:19PM +0900, Michael Paquier wrote:
>> > Sure, sorry for the confusion.  By "we'd do nothing", I mean precisely
>> > "to take no specific action related to archive recovery and recovery
>> > parameters at the end of recovery", meaning that a combination of
>> > backup_label with no signal file would be the same as crash recovery,
>> > replaying WAL up to the end of what can be found in pg_wal/, and only
>> > that.
> 
> I don't think those are equivalent - in the "backup_label with no signal file"
> case we start recovery at a different location than the "crash recovery" case
> does.

It depends on how you see things.  Based on my read of the thread and
of the code, we've never really put a clear definition on what a
"backup_label with no signal file" should do.  The definition I was
suggesting is to make it work the same way as crash recovery
internally:
- use the start LSN from the backup_label.
- replay up to the end of local WAL.
- don't rely on any recovery GUCs.
- if at the end of recovery replay has not reached the end-of-backup
record, then fail.
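
As a rough illustration of the four points above, here is a toy model
in Python.  It is only a sketch and not PostgreSQL code: the names
BackupLabel, RecoveryFailure and recover_with_label_only are
hypothetical, WAL is reduced to a sorted list of record LSNs, and the
end-of-backup record is passed in as a plain LSN.

```python
from dataclasses import dataclass
from typing import List


class RecoveryFailure(Exception):
    """Stands in for the FATAL raised when recovery cannot complete."""


@dataclass
class BackupLabel:
    # Hypothetical stand-in for the start LSN read from backup_label.
    start_lsn: int


def recover_with_label_only(label: BackupLabel,
                            local_wal: List[int],
                            end_of_backup_lsn: int) -> int:
    """Model "backup_label with no signal file" as described above.

    local_wal is a sorted list of record LSNs found in pg_wal/
    (a simplification).  Returns the last LSN replayed.
    """
    # Start from the LSN in the backup_label, not from the control file.
    # Replay everything available locally; no recovery GUCs are consulted.
    replayed = [lsn for lsn in local_wal if lsn >= label.start_lsn]

    # Fail if replay ended before the end-of-backup record.
    if not replayed or replayed[-1] < end_of_backup_lsn:
        raise RecoveryFailure(
            "end of local WAL reached before the end-of-backup record")
    return replayed[-1]
```

With a label starting at LSN 100 and local WAL [50, 100, 150, 200],
an end-of-backup record at 150 lets recovery finish at 200.  With
local WAL [50, 100, 120], the same call raises RecoveryFailure.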

>> To be slightly more precise, I also mean to fail recovery if it is
>> not possible to replay up to the end-of-backup LSN marked in the label
>> file because we are missing some stuff in pg_wal/, which is something
>> that the code is currently able to handle.
> 
> "able to handle" as in detect and error out? Because that's the only possible
> sane thing to do, correct?

By "able to handle", I mean detecting that the expected LSN has not
been reached and raising a FATAL, that is, failing recovery.  So yes.
--
Michael