Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
From: Andres Freund <andres@anarazel.de>
To: Michael Paquier <michael@paquier.xyz>, Robert Haas <robertmhaas@gmail.com>
Cc: David Steele <david@pgmasters.net>, Kyotaro Horiguchi <horikyota.ntt@gmail.com>, pgsql-hackers@lists.postgresql.org, zxwsbg12138@gmail.com, david.zhang@highgo.ca
Date: 2023-10-30T19:47:41Z
Lists: pgsql-hackers
Commits
- Delay recovery mode LOG after reading backup_label and/or checkpoint record (dc5bd3889437, 17.0, landed)
- Mention standby.signal in FATALs for checkpoint record missing at recovery (1ffdc03c21ae, 17.0, landed)
- XLOG file archiving and point-in-time recovery. There are still some (66ec2db72840, 8.0.0, cited)
Hi,

On 2023-10-30 16:08:50 +0900, Michael Paquier wrote:
> From 26a8432fe3ab8426e7797d85d19b0fe69d3384c9 Mon Sep 17 00:00:00 2001
> From: Michael Paquier <michael@paquier.xyz>
> Date: Mon, 30 Oct 2023 16:02:52 +0900
> Subject: [PATCH v4] Require recovery.signal or standby.signal when reading a
>  backup_file
>
> Historically, the startup process uses two static variables to control
> if archive recovery should happen, when either recovery.signal or
> standby.signal are defined in the data folder at the beginning of
> recovery:

I think the problem with these variables is that they're a really messy state machine, something this patch doesn't meaningfully improve IMO.

> This configuration was possible when recovering from a base backup taken
> by pg_basebackup without -R. Note that the documentation requires at
> least to set recovery.signal to restore from a backup, but the startup
> process was not making this policy explicit.

Maybe I just didn't check the right place, but from what I saw, this is at most implied, rather than explicitly stated.

> In most cases, one would have been able to complete recovery, but that's a
> matter of luck, really, as it depends on the workload of the origin server.

With -X ... we have all the necessary WAL locally; how does the workload on the primary matter? If you pass --no-slot, pg_basebackup might fail to fetch the necessary WAL, but then you'd also have gotten an error.

I agree with Robert that this would be a good error check on a green field, but I am less convinced that it's going to help more than it hurts now.

Right now, running pg_basebackup with -X stream, without --write-recovery-conf, gives you a copy of a cluster that will come up correctly as a distinct instance.

With this change applied, you need to know that the way to avoid the existing FATAL about restore_command at startup (when recovery.signal exists but restore_command isn't set) is to set "restore_command = false", something we don't explain anywhere afaict.
We should lessen the need to ever use restore_command, not increase it.

It also seems risky to have people get used to restore_command = false, because that effectively disables detection of other timelines etc. But this method does force a new timeline, which will be the same on each clone of the database...

I also just don't think that it's always desirable to create a new timeline.

Greetings,

Andres Freund