Thread
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
pg_createsubscriber: Improve handling of automated recovery configuration
- 639352d904c8 19 (unreleased) landed
-
Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-07-16T05:57:18Z
Hi Hackers, I noticed that pg_createsubscriber sets recovery target params for correct recovery before converting a physical replica to a logical one but does not reset them afterward. It can lead to recovery failures in certain scenarios. For example, if recovery begins from a checkpoint where no WAL records need to be applied, the system might incorrectly determine that the recovery target was never reached because these parameters remain active. I’ve attached a TAP test to reproduce the issue. The proposed patch ensures all recovery parameters are reset after conversion to prevent such edge cases. I would appreciate any feedback. -- Regards, Alyona Vinter
-
RE: Resetting recovery target parameters in pg_createsubscriber
Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> — 2025-09-01T02:06:34Z
Dear Laaren, > I noticed that pg_createsubscriber sets recovery target params for > correct recovery before converting a physical replica to a logical > one but does not reset them afterward. It can lead to recovery > failures in certain scenarios. > For example, if recovery begins from a checkpoint where no WAL records > need to be applied, the system might incorrectly determine that the > recovery target was never reached because these parameters remain > active. Thanks for reporting. I have known that parameters won't be overwritten, but I didn't recognize the case that recovery fails. > I’ve attached a TAP test to reproduce the issue. > The proposed patch ensures all recovery parameters are reset after > conversion to prevent such edge cases. WriteRecoveryConfig() has been used to setup the recovery parameters. Can we follow the way to restore them? Also, can we add a test to 040_pg_createsubscriber? IIUC it is enough to check one of recovery parameter is reset after the conversion. Best regards, Hayato Kuroda FUJITSU LIMITED
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-09-01T02:29:22Z
On Mon, Sep 01, 2025 at 02:06:34AM +0000, Hayato Kuroda (Fujitsu) wrote: > WriteRecoveryConfig() has been used to setup the recovery parameters. Can we > follow the way to restore them? > > Also, can we add a test to 040_pg_createsubscriber? IIUC it is enough to check > one of recovery parameter is reset after the conversion. Yeah, we'd want some tests to check the behaviors and expectations in this tool. This tool is complex enough that this is going to be mandatory, and making a test cheaper is always nicer. FWIW, I find the proposed patch a bit dangerous. It updates pg_createsubscriber.c so as an ALTER SYSTEM is used to reset the parameters, but the recovery parameters are updated via WriteRecoveryConfig() which is the code path holding the knowledge that postgresql.auto.conf is used to hold the recovery parameters. I don't like much the fact this creates a duplication with setup_recovery() for the list of parameters handled. All the recovery parameters are forced to a hardcoded value, except recovery_target_lsn. So perhaps it would be better to maintain in pg_createsubscriber.c a list made of (GUC names, values), with the LSN part handled as an exception for the value to assign. GenerateRecoveryConfig() can work with a replication connection, relying on ALTER SYSTEM would not be able to do that properly, so perhaps we should just invent a new routine that resets a portion of the file on disk because recovery_gen.c's code already assumes that it has write access to a data folder to do its work? -- Michael
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-09-02T12:36:57Z
Dear Michael and Hayato, Thank you both for your valuable feedback on the previous patch version. I've reworked the patch based on your suggestions - the new version should address the concerns about ALTER SYSTEM and follows the same patterns as the 'setup_recovery' code. I kept primary_conninfo as-is for now since I'm not totally sure if we need to touch it I look forward to your feedback! ;) Best regards, Alyona Vinter
-
RE: Resetting recovery target parameters in pg_createsubscriber
Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> — 2025-09-05T02:55:14Z
Dear Alyona, Thanks for updating the patch! Sadly, your patch cannot be applied cleanly. Even after the manual merge, it could not be built. Maybe `dbinfo` should be `dbinfos.dbinfo`. Obtained message is written in [1]. (cfbot seemed not to run correctly) Regarding patch content, your patch restores the postgresql.auto.conf after the command runs. Initially I felt that it is enough to set below GUCs becasue only they are changed from the default. Is there a reason why you fully restore them? ``` recovery_target_inclusive true recovery_target_action pause recovery_target_lsn "" ``` [1] ``` ../postgres/src/bin/pg_basebackup/pg_createsubscriber.c: In function ‘main’: ../postgres/src/bin/pg_basebackup/pg_createsubscriber.c:2526:31: error: ‘dbinfo’ undeclared (first use in this function); did you mean ‘dbinfos’? 2526 | reset_recovery_params(dbinfo, subscriber_dir); | ^~~~~~ | dbinfos ``` Best regards, Hayato Kuroda FUJITSU LIMITED -
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-09-05T05:51:12Z
Dear Hayato, Thank you for the review! My apologies for the error in the patch -- it looks like I accidentally modified it before sending =(. I've attached the fixed versions below. > Regarding patch content, your patch restores the postgresql.auto.conf after the > command runs. Initially I felt that it is enough to set below GUCs becasue only > they are changed from the default. Is there a reason why you fully restore them? I just found it easier to restore the original state of 'postgresql.auto.conf', as removing parameters from the file resets them to their default values. This approach achieves the same final state without having to explicitly set each one.
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-09-08T04:07:22Z
Hi, CFbot indicated some issues with the patch. I've attached rebased versions of the patches, so hopefully everything will be ok this time. Best regards, Alyona Vinter On Fri, 5 Sept 2025 at 12:51, Alyona Vinter <dlaaren8@gmail.com> wrote: > Dear Hayato, > > Thank you for the review! My apologies for the error in the patch -- it > looks like I accidentally modified it before sending =(. I've attached the > fixed versions below. > > > Regarding patch content, your patch restores the postgresql.auto.conf > after the > > command runs. Initially I felt that it is enough to set below GUCs > becasue only > > they are changed from the default. Is there a reason why you fully > restore them? > > I just found it easier to restore the original state of > 'postgresql.auto.conf', as removing parameters from the file resets them to > their default values. This approach achieves the same final state without > having to explicitly set each one. >
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-09-08T05:35:29Z
Sorry, wrong patches again. Here are the correct ones. Best regards, Alyona Vinter
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alexander Korotkov <aekorotkov@gmail.com> — 2025-09-15T07:29:47Z
Hello, Alyona! On Mon, Sep 8, 2025 at 8:35 AM Alyona Vinter <dlaaren8@gmail.com> wrote: > > Sorry, wrong patches again. Here are the correct ones. I went though this patches. 1) I've removed the array of parameters. I see it was proposed by Michael upthread. But I think his proposal came from the fact we walk trough the parameters twice. But we end up walking trough the parameter once in setup_recovery(), while reset_recovery_params() just restores the previous contents. I think it makes sense to keep the changes minimal. 2) I reordered patches so that helper function goes first. I think it essential to order commit in the way that every commit leaves our tree in working state. 3) I make pgpreltidy run over 040_pg_createsubscriber.pl. Any thought? ------ Regards, Alexander Korotkov Supabase
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-09-16T01:51:20Z
On Mon, Sep 15, 2025 at 10:29:47AM +0300, Alexander Korotkov wrote: > I went though this patches. > 1) I've removed the array of parameters. I see it was proposed by > Michael upthread. But I think his proposal came from the fact we walk > trough the parameters twice. But we end up walking trough the > parameter once in setup_recovery(), while reset_recovery_params() just > restores the previous contents. I think it makes sense to keep the > changes minimal. Yeah, my concern was about the duplication of the list. As long as a fix does not do any of that, I'm OK. Sorry if my idea of a list of parameters felt misguided if we make recovery_gen.c smarter with the handling of the on-disk files. > 2) I reordered patches so that helper function goes first. I think it > essential to order commit in the way that every commit leaves our tree > in working state. Yep. That would create some noise if one bisects for example. These are always annoying because they make analysis of a range of commits longer with more false positives. If you have a large range of commits, the odds are usually very low, but who knows.. > 3) I make pgpreltidy run over 040_pg_createsubscriber.pl. > Any thought? GetRecoveryConfig() and ReplaceRecoveryConfig() should have some documentation, regarding what the callers of these functions can expect from them. + use_recovery_conf = + PQserverVersion(pgconn) < MINIMUM_VERSION_FOR_RECOVERY_GUC; + + snprintf(tmp_filename, MAXPGPATH, "%s/%s.tmp", target_dir, + use_recovery_conf ? "recovery.conf" : "postgresql.auto.conf"); + + snprintf(filename, MAXPGPATH, "%s/%s", target_dir, + use_recovery_conf ? "recovery.conf" : "postgresql.auto.conf" No need for use_recovery_conf. You could just set a pointer to the file name instead and avoid the duplication. + cf = fopen(tmp_filename, "w"); + if (cf == NULL) + pg_fatal("could not open file \"%s\": %m", tmp_filename); "a" is used in fopen() when calling WriteRecoveryConfig() when under use_recovery_conf. Perhaps this inconsistency should be specified as a comment because we are generating a temporary file from scratch with the new recovery GUC contents? This patch also means that pg_createsubscriber is written so as the contents added to recovery.conf/postgresql.auto.conf by setup_recovery() are never reset if there is a failure in-flight. Is that OK or should we also have an exit callback to do the cleanup work in such cases? Perhaps these internal manipulations should be documented as well, to make the users of this tool aware of steps they may need to take in the event of an in-flight failure? pg_createsubscriber includes a "How it works" section that explains how the tool works, including the part about the recovery parameters. The changes of this patch become implied facts, and are not reflected in the docs. That sounds like a problem to me because we are hiding some of the the internal logic, but the docs are written so as they explain all these details. -- Michael -
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-09-16T10:27:43Z
Hi Michael and Alexander, Thank you both for your help! I really appreciate it. As a newcomer here, I might make some mistakes, but I hope with your guidance I can avoid making them in the future =) > Yeah, my concern was about the duplication of the list. As long as a > fix does not do any of that, I'm OK. Sorry if my idea of a list of > parameters felt misguided if we make recovery_gen.c smarter with the > handling of the on-disk files. I got your concern about avoiding duplication. I thought that defining all parameters explicitly in the file header would lead to clearer and nicer code, which is why I left it that way (even without duplicating). But now I agree with Alexander's point about keeping the changes minimal. > This patch also means that pg_createsubscriber is written so as the > contents added to recovery.conf/postgresql.auto.conf by > setup_recovery() are never reset if there is a failure in-flight. Is > that OK or should we also have an exit callback to do the cleanup work > in such cases? It's a good idea to add an exit callback. Additionally, I'd like to propose adding a pre-flight check at the start. This check would look for any existing recovery configuration that might be an artifact from a previous aborted run and warn the user or handle it appropriately. What do you think about implementing both the exit callback and the pre-flight check? > pg_createsubscriber includes a > "How it works" section that explains how the tool works, including the > part about the recovery parameters. I looked through the `pg_createsubscriber.c` file but wasn't able to locate a "How it works" section. Could you please point me to the specific file or line number you are referring to? Or do you mean all the descriptive comments? For context, I'm currently working on the version where my patch is being tested in CI. I will work on improving the code and will also add the documentation notes that Michael has pointed out ASAP. Best regards, Alyona Vinter
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-09-19T01:08:02Z
On Tue, Sep 16, 2025 at 05:27:43PM +0700, Alyona Vinter wrote: >> This patch also means that pg_createsubscriber is written so as the >> contents added to recovery.conf/postgresql.auto.conf by >> setup_recovery() are never reset if there is a failure in-flight. Is >> that OK or should we also have an exit callback to do the cleanup work >> in such cases? > > It's a good idea to add an exit callback. Additionally, I'd like to propose > adding a pre-flight check at the start. This check would look for any > existing recovery configuration that might be an artifact from a previous > aborted run and warn the user or handle it appropriately. What do you think > about implementing both the exit callback and the pre-flight check? I am not sure how much a pre-flight check would help if we have an exit callback that would make sure that things are cleaned up on exit. Is there any need to worry about a kill(9) that would cause the exit cleanup callback to not be called? We don't bother about that usually, so I don't see a strong case for it here, either. :) Alexander may have a different opinion. > I will work on improving the code and will also add the documentation notes > that Michael has pointed out ASAP. Thanks. -- Michael
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-09-23T05:04:04Z
Hi, I'm back with improvements :) I've added code comments in `recovery_gen.c` and expanded the documentation in `pg_createsubscriber.sgml`. About the recovery parameters cleanup: I thought about adding an exit callback, but it doesn't really make sense because once the target server gets promoted (which happens soon after we set the parameters), there's no point in cleaning up - the server is already promoted and can't be used as a replica again and must be recreated. Also, `reset_recovery_params()` might call `exit()` itself, which could cause problems with the cleanup callback. So I think it's better to just warn users about leftover parameters and let them handle the cleanup manually if needed. By the way, is it ok that the second patch includes both code and test changes together, or should I split them into separate commits? I look forward to your feedback! Regards, Alena Vinter
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-09-26T00:40:50Z
On Tue, Sep 23, 2025 at 12:04:04PM +0700, Alena Vinter wrote: > About the recovery parameters cleanup: I thought about adding an exit > callback, but it doesn't really make sense because once the target server > gets promoted (which happens soon after we set the parameters), there's no > point in cleaning up - the server is already promoted and can't be used as > a replica again and must be recreated. Also, `reset_recovery_params()` > might call `exit()` itself, which could cause problems with the cleanup > callback. Your argument does not consider one case, which is very common: pg_rewind. Even if the standby finishes recovery and is promoted with its new recovery parameters, we could rewind it rather than recreate a new standby from scratch. That's cheaper than recreating a new physical replica from scratch. Keeping the recovery parameters added by pg_createsubscriber around would make pg_rewind's work more complicated, because it does similar manipulations, for different requirements. The tipping point where we would not be able to reuse the promoted standby happens as the last step of pg_createsuscriber in modify_subscriber_sysid() where its system ID is changed. Before that, the code also makes an effort of cleaning up anything that's been created in-betwee. Even the system ID argument is not entirely true, actually. One could also decide to switch the system ID back to what it was previously to match with the primary. That requires a bit more magic, but that's not impossible. > So I think it's better to just warn users about leftover parameters and let > them handle the cleanup manually if needed. Warnings tend to be ignored and missed, especially these days where vendors automate these actions. It is true that there could be an argument about requiring extra implementation steps on each vendor side, but they would also need to keep up with any new GUCs that pg_createsubscriber may add in the future when setting up its recovery parameters, which would mean extra work for everybody, increasing the range of problems for some logic that's isolated to pg_createsubscriber. In short, I disagree with what you are doing here: we should take the extra step and clean up anything that's been created by the tool when we know we can safely do so (aka adding a static flag that the existing cleanup callback should rely on, which is already what your patch 0003 does to show a warning). > By the way, is it ok that the second patch includes both code and test > changes together, or should I split them into separate commits? The tests and the fix touch entirely separate code paths, keeping them together is no big deal. -- Michael
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-09-29T09:57:09Z
Hi, > In short, I disagree with what you are doing here: we should take the > extra step and clean up anything that's been created by the tool when > we know we can safely do so I got your point, thanks for pointing to the `pg_rewind` case. I've attached a new version of the patches. I've changed `ReplaceRecoveryConfig` a little bit -- now it returns false in case of an error instead of exiting. Best wishes, Alena Vinter
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-09-30T02:16:41Z
On Mon, Sep 29, 2025 at 04:57:09PM +0700, Alena Vinter wrote: > I got your point, thanks for pointing to the `pg_rewind` case. I've > attached a new version of the patches. I've changed `ReplaceRecoveryConfig` > a little bit -- now it returns false in case of an error instead of exiting. #include "common/logging.h" +#include "common/file_utils.h" Incorrect include file ordering. +GetRecoveryConfig(PGconn *pgconn, const char *target_dir) [...] + char data[1024]; [...] + while ((bytes_read = fread(data, 1, sizeof(data), cf)) > 0) + { + data[bytes_read] = '\0'; + appendPQExpBufferStr(contents, data); + } You are assuming that this will never overflow. However, recovery parameters could include commands, which are mostly limited to MAXPGPATH, itself 1024. So that's unsafe. The in-core routine pg_get_line(), or the rest in pg_get_line.c is safer to use, relying on malloc() for the frontend and the lines fetched. + pg_log_warning_hint("Manual removal of recovery parameters is required from 'postgresql.auto.conf' (PostgreSQL %d+) or 'recovery.conf' (older versions)", + MINIMUM_VERSION_FOR_RECOVERY_GUC / 10000); Hmm, okay here. You would need that hint anyway if you cannot connect to determine to which file the recovery parameters need to go to, the other code paths failures in ReplaceRecoveryConfig() would include the file name, which offers a sufficient hint about the version, but a connect_database() failure does not. +static bool recovery_params_set = false; +static bool recovery_params_reset = false; Hmm. We may need an explanation about these, in the shape of a comment, to document what's expected from them. Rather than two booleans, using an enum tracking the state of the parameters would be cleaner? And actually, you do not need two flags. Why not just switch recovery_params_set to false once ReplaceRecoveryConfig() is called? +reset_recovery_params(const struct LogicalRepInfo *dbinfo, const char *datadir) [...] + recoveryconfcontents = GenerateRecoveryConfig(conn, NULL, NULL); Why do we need to call again GenerateRecoveryConfig() when resetting recovery.conf/postgresql.conf.sample with its original contents before switching the system ID of the new replica? I may be missing something, of course, but we're done with recovery so I don't quite see the point in appending the recovery config generated with the original contents. If this is justified (don't think it is), this deserves a comment to explain the reason behind this logic. -- Michael -
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-09-30T05:22:08Z
HI Michael, Thank you for the review! > Why not just > switch recovery_params_set to false once ReplaceRecoveryConfig() is > called? Stupid me! > Why do we need to call again GenerateRecoveryConfig() when resetting > recovery.conf/postgresql.conf.sample with its original contents before > switching the system ID of the new replica? I may be missing > something, of course, but we're done with recovery so I don't quite > see the point in appending the recovery config generated with the > original contents. If this is justified (don't think it is), this > deserves a comment to explain the reason behind this logic. This relates to the point I mentioned earlier about being unsure whether we should preserve `primary_conninfo`: > I kept primary_conninfo as-is for now since I'm not totally sure if we need to touch it The reason I called `GenerateRecoveryConfig()` was to regenerate the `primary_conninfo` string in the recovery configuration file. If we should remove it, then the reset function can be much simpler. Could you please help me to clarify should we regenerate `primary_conninfo` or we can safely purge it? Best regards, Alena Vinter
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-10-01T04:02:25Z
On Tue, Sep 30, 2025 at 12:22:08PM +0700, Alena Vinter wrote: > This relates to the point I mentioned earlier about being unsure whether we > should preserve `primary_conninfo`: > > I kept primary_conninfo as-is for now since I'm not totally sure if we > need to touch it > > The reason I called `GenerateRecoveryConfig()` was to regenerate the > `primary_conninfo` string in the recovery configuration file. If we should > remove it, then the reset function can be much simpler. Could you please > help me to clarify should we regenerate `primary_conninfo` or we can safely > purge it? Based on the contents of the latest patch, we reset the parameters after promoting the node, and primary_conninfo only matters while we are in recovery, for a standby recovery WAL using the streaming replication protocol. -- Michael
-
RE: Resetting recovery target parameters in pg_createsubscriber
Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> — 2025-10-01T04:38:42Z
Dear Alena, Thanks for updating the patch. Few comments. ``` + /* Before setting up the recovery parameters save the original content. */ + savedrecoveryconfcontents = GetRecoveryConfig(conn, datadir); ``` To confirm, you put the connection to the primary/publisher instead of standby/subscriber. But it is harmless because streaming replication requires that both instances have the same major version. Is it correct? ``` + pg_log_warning_hint("Manual removal of recovery parameters is required from 'postgresql.auto.conf' (PostgreSQL %d+) or 'recovery.conf' (older versions)", + MINIMUM_VERSION_FOR_RECOVERY_GUC / 10000); ``` Can we cache the version info when we firstly connect to the primary node to print appropriate filename? Or is it hacky? ``` + if (dry_run) + { + appendPQExpBufferStr(savedrecoveryconfcontents, "# dry run mode"); + } ``` Per my understanding, setup_recovery() puts the indicator becasue the content can be printed. I think it is not needed since reset_recovery_params() does not have that, or we can even print the parameters. ``` +sub test_param_absent +{ + my ($node, $param) = @_; + my $auto_conf = $node->data_dir . '/postgresql.auto.conf'; + + return 1 unless -e $auto_conf; + + my $content = slurp_file($auto_conf); + return $content !~ /^\s*$param\s*=/m; +} ``` Can you add a short comment atop the function? Something like: "Check whether the given parameter is specified in postgresql.auto.conf" Best regards, Hayato Kuroda FUJITSU LIMITED -
RE: Resetting recovery target parameters in pg_createsubscriber
Ilyasov Ian <ianilyasov@outlook.com> — 2025-10-05T22:30:53Z
Hello Alena! I am new in reviewing here and tried to review your patch: I agree with the review of Michael considering +char data[1024]; This looks unsafe. +static bool recovery_params_set = false; +static bool recovery_params_reset = false; Using two booleans here looks wrong to me. Maybe one is enough with refactored logic in cleanup_objects_atexit()? +pg_log_warning_hint("Manual removal of recovery parameters is required from 'postgresql.auto.conf' (PostgreSQL %d+) or 'recovery.conf' (older versions)", Do we need info about recovery.conf here since patch applies only to master? Also I am not sure what scenario we are protecting against. I set up logical replication via pg_createsubscriber first and did this: ./bin/pg_ctl -D standby -l standby stop -m fast touch standby/recovery.signal ./bin/pg_ctl -D standby -l standby start with restore_command = 'cp /home/postgresql-install/wal_archive/%f "%p"' With no patch I got: LOG: invalid record length at 0/A0000A0: expected at least 24, got 0 LOG: redo is not required FATAL: recovery ended before configured recovery target was reached But with patches applied I successfully started the standby. Did I get the idea right? Kind regards, Ian Ilyasov. -
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-10-06T03:58:07Z
On Sun, Oct 05, 2025 at 10:30:53PM +0000, Ilyasov Ian wrote: > Do we need info about recovery.conf here since patch applies only to > master? And actually, I think that you are pointing at a bug here. pg_createsubscriber does updates of the control file but it includes zero checks based on PG_CONTROL_VERSION to make sure that it is able to work with a version compatible with what's on disk. The CRC check would be reported as incorrect after calling get_controlfile(), but it's disturbing to claim that the control file looks corrupted. So, oops? [.. checks ..] The last control file update has been done in 44fe30fdab67, and attempting to run pg_createsubscriber on a v17 cluster leads to: $ pg_createsubscriber -D $HOME/data/5433 -P "host=/tmp port=5432" -d postgres pg_createsubscriber: error: control file appears to be corrupt So, yes, oops. We document that pg_cretesubscriber should have the same major version as the source and target servers, which is good. This error is no good, especially as checking it is just a few lines of code, and that the take is actually PG_CONTROL_VERSION for control file consistency. -- Michael
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-10-06T06:25:12Z
Hi everyone, Thank you for all the valuable feedback! I've improved the patches in the latest version. > Based on the contents of the latest patch, we reset the parameters > after promoting the node, and primary_conninfo only matters while we > are in recovery, for a standby recovery WAL using the streaming > replication protocol. Michael, thanks for helping! This fact simplifies the code. I put resetting the parameters exclusively in the `atexit` callback -- this approach seems neater to me. What do you think? > Did I get the idea right? Ian, yes, you got it right. The core issue occurs when postgres encounters a checkpoint during recovery, determines redo isn't needed (because there are no records after the checkpoint), but then fails with a fatal error because it cannot reach the specified LSN target (which is lower than the checkpoint LSN). I reckon this is a recovery logic issue, but I also believe the component that sets recovery parameters should be responsible for cleaning them up when they're no longer required. Best wishes, Alena Vinter
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-10-08T11:42:31Z
On Mon, Oct 06, 2025 at 01:25:12PM +0700, Alena Vinter wrote: > Michael, thanks for helping! This fact simplifies the code. I put resetting > the parameters exclusively in the `atexit` callback -- this approach seems > neater to me. What do you think? I have been looking at this patch for a couple of hours, and I don't really like the result, for a variety of reasons. Some of the reasons come with the changes in recovery_gen.c themselves, as proposed in the patch, where the only thing we want to do it replace the contents of one file by the other, some other reasons come from the way pg_createsubscriber complicates its life on HEAD. There is no need to read the contents line by line and write them back, we can just do file manipulations. The reason why the patch does things this way currently is that it has zero knowledge of the file location where the recovery parameters contents are written, because this location is internal in recovery_gen.c, at least based on how pg_createsubscriber is written. And well, this fact is wrong even on HEAD: we know where the recovery parameters are written because pg_createsubscriber is documented as only supporting the same major version as the one where the tool has been compiled. So it is pointless to call WriteRecoveryConfig() with a connection object (using a PGconn pointer in this API is an artifact of pg_basebackup, where we support base backups taken from older major versions when using a newer version of the tool). pg_createsubscriber has no need to bind to this limitation, but we don't need to improve this point for the sake of this thread. The proposed patch is written without taking into account this issue, and the patch has a lot of logic that's not necessary. There is no point in referring to recovery.conf in the code and the tests, as well. Anyway, a second reason why I am not cool with the patch is that the contents written by pg_createsubscriber are entirely erased from existence, and I see a good point in keeping a trace of them at least for post-operation debugging purposes. With all that in mind, I came up with the following solution, which is able to fix what you want to address (aka not load any of the recovery parameters written by the tool if you reactivate a standby with a new signal file), while also satisfying my condition, which is to keep a track of the parameters written. Hence, let's: - forget about the changes in recovery_gen.c. - call WriteRecoveryConfig() with only one line added in the contents written to the "recovery" file (which is postgresql.conf.auto, okay): include_if_exists = 'pg_createsubscriber.conf' - Write the parameters generated by pg_createsubscriber to this new configuration file. - In the exit callback, call durable_rename() and rename pg_createsubscriber.conf to a pg_createsubscriber.conf.old. There is no need to cache the backend version or rely on a connection. We'll unlikely see a failure. Even if there is a failure, fixing the problem would be just to move or delete the extra file, and documenting that is simpler. All that points to the direction that we may not want to backpatch any of this, considering these changes as improvements in usability. -- Michael
-
Re: Resetting recovery target parameters in pg_createsubscriber
Robert Haas <robertmhaas@gmail.com> — 2025-10-14T18:04:22Z
On Wed, Oct 8, 2025 at 7:43 AM Michael Paquier <michael@paquier.xyz> wrote: > With all that in mind, I came > up with the following solution, which is able to fix what you want to > address (aka not load any of the recovery parameters written by the > tool if you reactivate a standby with a new signal file), while also > satisfying my condition, which is to keep a track of the parameters > written. I'd like to back up one more step: why do we think that this is even a valid scenario in the first place? The original scenario involves running pg_createsubscriber and then putting the server back into recovery mode. But why is it valid to just put the server back into recovery mode at that point? That doesn't seem like something that you can just go do and expect it to work, especially if you don't check that other parameters have the values that you want. Generally, recovery is a one-time event, and once you exit, you only reenter on a newly-taken backup or after a crash or a pg_rewind. There are, of course, other times when you can force a server back into recovery without anything bad happening, but it's not my impression that we support that in general; it's something you can choose to do as an expert operator if you are certain that it's OK in your scenario. So my question is: why should we do anything at all about this? -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-10-27T12:22:12Z
Hi everyone, Michael, thank you for outlining your alternative approach. After rethinking the current patch state with a clearer vision, I realized that simply truncating the postgresql.auto.conf file is sufficient. All modifications made by pg_createsubscriber in this file are append-only, so truncation reliably restores it to its original state without adding extra logic. This keeps the patch small and clean. For older versions using recovery.conf, the situation differs — since that file is fully rewritten during recovery setup, we instead restore the previously saved original file using a durable rename. Regarding debugging: the contents are not entirely lost. pg_createsubscriber already prints the new recovery configuration as debug output, so the full parameter set remains visible in the logs for inspection when needed. My point is that adding include directives isn't needed, as we already have debug output, and, moreover, they aren't applied to recovery.conf. Robert, this scenario actually occurred in production at one of our customer environments. Even though this workflow may be uncommon, PostgreSQL should still handle it gracefully. The fact that the server can end up in a state where it cannot start because it fails to reach a recovery target point far in the past suggests a potential area for improvement in the recovery process. It would be helpful if the system could detect such a case — where the recovery target precedes the current consistent point — and either skip recovery or emit a clear diagnostic message rather than failing to start. Best regards, Vinter Alena
-
Re: Resetting recovery target parameters in pg_createsubscriber
Robert Haas <robertmhaas@gmail.com> — 2025-10-29T19:46:38Z
On Mon, Oct 27, 2025 at 8:22 AM Alena Vinter <dlaaren8@gmail.com> wrote: > Robert, this scenario actually occurred in production at one of our customer environments. Even though this workflow may be uncommon, PostgreSQL should still handle it gracefully. The fact that the server can end up in a state where it cannot start because it fails to reach a recovery target point far in the past suggests a potential area for improvement in the recovery process. It would be helpful if the system could detect such a case — where the recovery target precedes the current consistent point — and either skip recovery or emit a clear diagnostic message rather than failing to start. The question isn't whether the workflow is common. If something is broken, we should ideally fix it even if we don't believe that it is very likely to occur. The question is whether the workflow is something that a user can reasonably expect to work. If you remove all of your data files and then complain that the database doesn't work any more, that's not an indication of a problem with PostgreSQL, but rather an indication that it's not a good idea to remove all of your data files. More generally, if you make manual changes to the data directory and the results are unsatisfactory, we generally consider that to be an error on your part rather than a problem with PostgreSQL. You can of course edit configuration files like postgresql.conf or pg_hba.conf and expect things to work as long as the resulting configuration file is still valid, but you can't manually modify pg_control on disk and then call it a bug when something goes wrong. In the case at hand, you've offered no justification for why it's OK to put the server back into recovery at all -- normally, you only put a server in recovery if you're spinning it up from a backup, which is not the case in this scenario. I don't really understand why that would be a valid thing to do, and I even less understand why you should expect to be able to do it without checking the recovery configuration and still have things work. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alexander Korotkov <aekorotkov@gmail.com> — 2025-10-30T23:08:17Z
On Wed, Oct 29, 2025 at 9:47 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Oct 27, 2025 at 8:22 AM Alena Vinter <dlaaren8@gmail.com> wrote: > > Robert, this scenario actually occurred in production at one of our customer environments. Even though this workflow may be uncommon, PostgreSQL should still handle it gracefully. The fact that the server can end up in a state where it cannot start because it fails to reach a recovery target point far in the past suggests a potential area for improvement in the recovery process. It would be helpful if the system could detect such a case — where the recovery target precedes the current consistent point — and either skip recovery or emit a clear diagnostic message rather than failing to start. > > The question isn't whether the workflow is common. If something is > broken, we should ideally fix it even if we don't believe that it is > very likely to occur. The question is whether the workflow is > something that a user can reasonably expect to work. If you remove all > of your data files and then complain that the database doesn't work > any more, that's not an indication of a problem with PostgreSQL, but > rather an indication that it's not a good idea to remove all of your > data files. More generally, if you make manual changes to the data > directory and the results are unsatisfactory, we generally consider > that to be an error on your part rather than a problem with > PostgreSQL. You can of course edit configuration files like > postgresql.conf or pg_hba.conf and expect things to work as long as > the resulting configuration file is still valid, but you can't > manually modify pg_control on disk and then call it a bug when > something goes wrong. > > In the case at hand, you've offered no justification for why it's OK > to put the server back into recovery at all -- normally, you only put > a server in recovery if you're spinning it up from a backup, which is > not the case in this scenario. I don't really understand why that > would be a valid thing to do, and I even less understand why you > should expect to be able to do it without checking the recovery > configuration and still have things work. Can we see this from the different prospective? pg_createsubscriber is intended to turn physical replica into a logical replica. And it leaves subscriber database cluster with rudimentary configuration options needed purely for its intermediate step. Whatever usage scenario is, user might deleted these extra options if needed. This is not a big deal. However, it's certainly cleaner for pg_createsubscriber to avoid leaving this options (especially if their appearance is not documented). ------ Regards, Alexander Korotkov Supabase
-
Re: Resetting recovery target parameters in pg_createsubscriber
Robert Haas <robertmhaas@gmail.com> — 2025-10-31T13:30:36Z
On Thu, Oct 30, 2025 at 7:08 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Can we see this from the different prospective? pg_createsubscriber > is intended to turn physical replica into a logical replica. And it > leaves subscriber database cluster with rudimentary configuration > options needed purely for its intermediate step. Whatever usage > scenario is, user might deleted these extra options if needed. This > is not a big deal. However, it's certainly cleaner for > pg_createsubscriber to avoid leaving this options (especially if their > appearance is not documented). I think that's a fair point. Michael's proposal doesn't sound too bad to me from that point of view, as it seems to reduce the amount of machinery that we need to introduce in order to accomplish the goal of not leaving configuration behind. It's a little unfortunate that pg_createsubscriber needs to modify the configuration file at all, rather than say passing flags through on the PostgreSQL command line. However, doing it that way might make the code a lot more complicated, and it's not worth a lot of code complexity to avoid leaving recovery parameters behind given that, as you can say, the user can always just delete them. Also, to your point, the fact that they will be added really ought to be documented. But what I don't want to us to do is write a commit message, or add tests, that imply that the scenario so far proposed is a reasonable one. What we were told is that the server is put back into recovery just after being made into a primary, which seems clearly nonsensical. You might think of using pg_rewind to put a server back into recovery after a failover, but in that situation, there is at least 1 standby following the primary, and you're trying to get them to switch roles. But here, just after running pg_subscriber, there's no standby yet. If you put the one and only server with that system ID back into recovery, there's no other server it can possibly follow, so what's the point? Maybe what we could use a scenario is one where the node created by pg_createsubscriber continues to run as a primary, and eventually someone creates a standby from it which also inherits its physical replication configuration and then that cause some kind of problem. Now, to be fair, that scenario could also happen without using pg_createsubscriber at all: you could spin up a new primary via PITR, for example, and that configuration could propagate to standbys created from the primary and then cause problems, and as far as I know we do nothing to protect against that. That doesn't mean that we can't try to protect against this, but I do think it needs to be justified by reference to a sequence of actions that would be reasonable to perform on unpatched PostgreSQL. Otherwise, I think we'll be adding to future confusion in an area that can already be extremely confusing. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alexander Korotkov <aekorotkov@gmail.com> — 2025-11-03T21:27:59Z
On Mon, Oct 27, 2025 at 2:22 PM Alena Vinter <dlaaren8@gmail.com> wrote: > Michael, thank you for outlining your alternative approach. > After rethinking the current patch state with a clearer vision, I realized that simply truncating the postgresql.auto.conf file is sufficient. All modifications made by pg_createsubscriber in this file are append-only, so truncation reliably restores it to its original state without adding extra logic. This keeps the patch small and clean. > > For older versions using recovery.conf, the situation differs — since that file is fully rewritten during recovery setup, we instead restore the previously saved original file using a durable rename. > > Regarding debugging: the contents are not entirely lost. pg_createsubscriber already prints the new recovery configuration as debug output, so the full parameter set remains visible in the logs for inspection when needed. My point is that adding include directives isn't needed, as we already have debug output, and, moreover, they aren't applied to recovery.conf. I have rechecked this. It appears that pg_createsubscriber writes the recovery configuration to the output and only in verbose mode. So, it's far no guaranteed that this information would be accessible. One may run pg_createsubscriber not in verbose mode or don't save its output. I suggest we should re-implement this in a way Michael proposed [1]: save the configuration to pg_createsubscriber.conf.old file. Links. 1. https://www.postgresql.org/message-id/aOZOJ8p8LEcw0SpH%40paquier.xyz ------ Regards, Alexander Korotkov Supabase
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-11-10T05:02:14Z
On Tue, 4 Nov 2025 at 04:28, Alexander Korotkov <aekorotkov@gmail.com> wrote: > I have rechecked this. It appears that pg_createsubscriber writes the > recovery configuration to the output and only in verbose mode. So, > it's far no guaranteed that this information would be accessible. One > may run pg_createsubscriber not in verbose mode or don't save its > output. I suggest we should re-implement this in a way Michael > proposed [1]: save the configuration to pg_createsubscriber.conf.old > file. > Alexander, I'm not in favor of saving additional files. This approach seems to replace one type of leftover artifact (recovery params) with another (debug-files). Neither option is good. As Michael pointed out, the parameters might be useful for post-debugging purposes. This suggests to me that they are, by nature, debugging information. Therefore, it seems appropriate that they should be captured by the verbose/debug mode. If verbose mode isn't used, we lose more than just the recovery parameters — we also lose the sequence of commands for managing replication slots and other steps. Following this logic, why not save all information in non-verbose mode that might be used for debugging? Robert, I'll think more about a valid scenario (including the one you proposed) and get back with results later. --- Regards, Alena Vinter
-
Re: Resetting recovery target parameters in pg_createsubscriber
Robert Haas <robertmhaas@gmail.com> — 2025-11-19T19:49:29Z
On Thu, Nov 13, 2025 at 7:46 AM Alena Vinter <dlaaren8@gmail.com> wrote: > Robert, here's a realistic scenario where the issue occurs: > 1. Start with a primary and physical standby > 2. Convert the standby to a logical replica using `pg_createsubscriber` > 3. Later, create a new physical standby from a backup of this logical replica > 4. The new standby fails to start because it cannot reach consistency point > > The root cause is that `pg_createsubscriber` leaves behind recovery parameters that interfere with the new standby's startup process, causing recovery to stop before reaching a consistency point. Yes, I agree this is a much better scenario to test. Thanks. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-11-25T06:45:56Z
It seems my previous message isn't visible in the thread. I can see Robert's reply, but not the original message he was responding to. I'm resending the message and all attachments to ensure it's saved in the thread. On Thu, 13 Nov 2025 at 19:45, Alena Vinter <dlaaren8@gmail.com> wrote: > Hi everyone! > > Robert, here's a realistic scenario where the issue occurs: > 1. Start with a primary and physical standby > 2. Convert the standby to a logical replica using `pg_createsubscriber` > 3. Later, create a new physical standby from a backup of this logical > replica > 4. The new standby fails to start because it cannot reach consistency point > > The root cause is that `pg_createsubscriber` leaves behind recovery > parameters that interfere with the new standby's startup process, causing > recovery to stop before reaching a consistency point. > > To demonstrate this, I've expanded the existing ' > 040_pg_createsubscriber.pl' test to include this scenario. I've also > attached a standalone TAP test below for easier verification of the > specific failure case. > > --- > Regards, > Alena Vinter >
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-12-03T05:08:12Z
On Tue, Nov 25, 2025 at 01:45:56PM +0700, Alena Vinter wrote: > It seems my previous message isn't visible in the thread. I can see > Robert's reply, but not the original message he was responding to. I'm > resending the message and all attachments to ensure it's saved in the > thread. I don't understand the fuzziness of the patch around recovery.conf. Why would that be required knowing that pg_createsubscriber only supports the same major version as the target server? We have dropped support for recovery.conf a long time ago, and this includes all stable branches. FWIW, based on the concerns I'm grabbing, nothing presented on this thread would be candidate material for a backpatch. Just mentioning in passing. -- Michael
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-12-03T05:11:39Z
On Wed, Nov 19, 2025 at 02:49:29PM -0500, Robert Haas wrote: > On Thu, Nov 13, 2025 at 7:46 AM Alena Vinter <dlaaren8@gmail.com> wrote: >> The root cause is that `pg_createsubscriber` leaves behind recovery >> parameters that interfere with the new standby's startup process, >> causing recovery to stop before reaching a consistency point. > > Yes, I agree this is a much better scenario to test. Thanks. Robert, does this line of arguments address the concerns you had upthread? Or do you still see a reason why not cleaning up these recovery parameters would not be a good idea? I don't see a strong need for a backpatch here as the approach I have mentioned with a secondary configuration file, and an include_if_exists would be a kind of design change for the tool itself. So this would qualify as a HEAD-only thing, if of course you wouldd agree with the change to clean up the recovery parameters. -- Michael
-
Re: Resetting recovery target parameters in pg_createsubscriber
Robert Haas <robertmhaas@gmail.com> — 2025-12-03T14:08:42Z
On Wed, Dec 3, 2025 at 12:11 AM Michael Paquier <michael@paquier.xyz> wrote: > Robert, does this line of arguments address the concerns you had > upthread? Or do you still see a reason why not cleaning up these > recovery parameters would not be a good idea? > > I don't see a strong need for a backpatch here as the approach I have > mentioned with a secondary configuration file, and an > include_if_exists would be a kind of design change for the tool > itself. So this would qualify as a HEAD-only thing, if of course > you wouldd agree with the change to clean up the recovery parameters. As I see it, there's no bug here, because I don't think we've made any promise to the user that they don't need to check what recovery parameters are configured before running recovery. However, your proposal seems like it could make life more convenient for users, which seems like a good thing to do. The revised test case doesn't, in my opinion, represent a scenario that absolutely has to work without user intervention, but it's nicer if it does. So, I would see committing this as a feature improvement but not back-patching it as a reasonable way forward. -- Robert Haas EDB: http://www.enterprisedb.com
-
Re: Resetting recovery target parameters in pg_createsubscriber
Andrey Rudometov <unlimitedhikari@gmail.com> — 2025-12-13T05:07:35Z
Hello Alena! While reading through your patch, i noticed backend's include of +#include "storage/fd.h" which, as I understand, is needed for +err = durable_rename(filename_with_original_contents, filename, ERROR); +durable_rename(filename, filename_with_original_contents, FATAL); Postgres' source code actually has two different durable_rename signatures - one for frontend, and one for backend. As pg_createsubscriber uses frontend definitions, I suggest replacing this durable_rename with one from common/file_utils.h. A patch with my approach to replacement is in attachments (to be applied after yours). p.s. I wonder why this builds at all - as far as I have tried, using BE in FE usually leads to a ton of compilation errors or, at least, warnings treated as errors - p.e, in this case backend definition uses ereport(), which does not work in frontend utilities. -- best regards, Andrey Rudometov
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-12-17T04:10:35Z
On Sat, Dec 13, 2025 at 12:07:35PM +0700, Andrey Rudometov wrote: > A patch with my approach to replacement is in attachments > (to be applied after yours). > > p.s. I wonder why this builds at all - as far as I have tried, using > BE in FE usually leads to a ton of compilation errors or, at least, > warnings treated as errors - p.e, in this case backend definition > uses ereport(), which does not work in frontend utilities. Yep, that's indeed something that the patch should not do at all. > - durable_rename(filename, filename_with_original_contents, FATAL); > + err = durable_rename(filename, filename_with_original_contents); > + if (err) > + pg_fatal("could not rename file \"%s\"", filename); I don't see a point in re-emitting an error message as well as you are suggesting here. durable_rename() does this job already on failure with a set of pg_log_error(). The only difference is that pg_fatal() is a shortcut for pg_log_error()+exit(). -- Michael -
Re: Resetting recovery target parameters in pg_createsubscriber
Andrey Rudometov <unlimitedhikari@gmail.com> — 2025-12-24T11:12:24Z
Hello Alena! As discussion here came to a halt, I want to add some (counter)arguments, which may or may not make Michael's proposal sound preferable to you. On Mon, 10 Nov 2025 at 12:02, Alena Vinter <dlaaren8@gmail.com> wrote: > I'm not in favor of saving additional files. As far as I understand, there should be only one to two additional files - it's not like they will multiply. Some extensions can use additional files, so why pg_createsubscriber couldn't? > Therefore, it seems appropriate that they should be > captured by the verbose/debug mode. If verbose mode > isn't used, we lose more than just the recovery parameters But if you save them as messages in server log, then user might lose them even in verbose mode. Log is not stored infinitely, and user may not have access to messages if some time has passed after the incident. A file seems to be a more reliable solution here. P.s. About re-emitting error message with pg_fatal in my proposal - I agree, simple exit() should suffice. -- best regards, Andrey Rudometov
-
Re: Resetting recovery target parameters in pg_createsubscriber
Michael Paquier <michael@paquier.xyz> — 2025-12-25T01:50:16Z
On Wed, Dec 24, 2025 at 06:12:24PM +0700, Andrey Rudometov wrote: > As discussion here came to a halt, I want to add some > (counter)arguments, which may or may not make Michael's > proposal sound preferable to you. If somebody could post a refreshed patch set that addresses the comments raised until now, I would be happy to look at it and get something done for this release. -- Michael
-
Re: Resetting recovery target parameters in pg_createsubscriber
Alena Vinter <dlaaren8@gmail.com> — 2025-12-30T07:34:21Z
Hi everyone! Apologies for the long silence — I was temporarily pulled away from this work and didn’t want to send an update until I could properly address the feedback. I’m now back on track and have refined the implementation based on earlier discussions. The current version fully adopts the `include_if_exists` approach, writes temporary recovery settings to a separate config file, and disables it on exit by renaming to `.disabled`. Thank you for your patience — I appreciate any further review! --- Alena Vinter