Re: backup manifests

Andres Freund <andres@anarazel.de>

From: Andres Freund <andres@anarazel.de>

To: Robert Haas <robertmhaas@gmail.com>

Cc: David Steele <david@pgmasters.net>, Noah Misch <noah@leadboat.com>, Stephen Frost <sfrost@snowman.net>, Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, tushar <tushar.ahuja@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>

Date: 2020-03-31T22:50:34Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited

Hi,

On 2020-03-31 14:10:34 -0400, Robert Haas wrote:
> I made an attempt to implement this.

Awesome!


> In the attached patch set, 0001 I'm going to work on those things. I
> would appreciate *very timely* feedback on anything people do or do
> not like about this, because I want to commit this patch set by the
> end of the work week and that isn't very far away. I would also
> appreciate if people would bear in mind the principle that half a loaf
> is better than none, and further improvements can be made in future
> releases.
> 
> As part of my light testing, I tried promoting a standby that was
> running pg_basebackup, and found that pg_basebackup failed like this:
> 
> pg_basebackup: error: could not get COPY data stream: ERROR:  the
> standby was promoted during online backup
> HINT:  This means that the backup being taken is corrupt and should
> not be used. Try taking another online backup.
> pg_basebackup: removing data directory "/Users/rhaas/pgslave2"
> 
> My first thought was that this error message is hard to reconcile with
> this comment:
> 
>         /*
>          * Send timeline history files too. Only the latest timeline history
>          * file is required for recovery, and even that only if there happens
>          * to be a timeline switch in the first WAL segment that contains the
>          * checkpoint record, or if we're taking a base backup from a standby
>          * server and the target timeline changes while the backup is taken.
>          * But they are small and highly useful for debugging purposes, so
>          * better include them all, always.
>          */
> 
> But then it occurred to me that this might be a cascading standby.

Yea. The check just prevents the walsender's database from being
promoted:

		/*
		 * Check if the postmaster has signaled us to exit, and abort with an
		 * error in that case. The error handler further up will call
		 * do_pg_abort_backup() for us. Also check that if the backup was
		 * started while still in recovery, the server wasn't promoted.
		 * do_pg_stop_backup() will check that too, but it's better to stop
		 * the backup early than continue to the end and fail there.
		 */
		CHECK_FOR_INTERRUPTS();
		if (RecoveryInProgress() != backup_started_in_recovery)
			ereport(ERROR,
					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
					 errmsg("the standby was promoted during online backup"),
					 errhint("This means that the backup being taken is corrupt "
							 "and should not be used. "
							 "Try taking another online backup.")));
and

	if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("the standby was promoted during online backup"),
				 errhint("This means that the backup being taken is corrupt "
						 "and should not be used. "
						 "Try taking another online backup.")));

So that just prevents promotions of the current node, afaict.



> Regardless, it seems like a really good idea to store a list of WAL
> ranges rather than a single start/end/timeline, because even if it's
> impossible today it might become possible in the future.

Indeed.


> Still, unless there's an easy way to set up a test scenario where
> multiple WAL ranges need to be verified, it may be hard to test that
> this code actually behaves properly.

I think it'd be possible to test without a fully cascading setup, by
creating an initial base backup, then do some work to create a bunch of
new timelines, and then start the initial base backup. That'd have to
follow all those timelines.  Not sure that's better than a cascading
setup though.


> +/*
> + * Add information about the WAL that will need to be replayed when restoring
> + * this backup to the manifest.
> + */
> +static void
> +AddWALInfoToManifest(manifest_info *manifest, XLogRecPtr startptr,
> +					 TimeLineID starttli, XLogRecPtr endptr, TimeLineID endtli)
> +{
> +	List *timelines = readTimeLineHistory(endtli);

should probably happen after the manifest->buffile check.


> +	ListCell *lc;
> +	bool	first_wal_range = true;
> +	bool	found_ending_tli = false;
> +
> +	/* If there is no buffile, then the user doesn't want a manifest. */
> +	if (manifest->buffile == NULL)
> +		return;

Not really about this patch/function specifically: I wonder if this'd
look better if you added ManifestEnabled() macro instead of repeating
the comment repeatedly.



> +	/* Unless --no-parse-wal was specified, we will need pg_waldump. */
> +	if (!no_parse_wal)
> +	{
> +		int		ret;
> +
> +		pg_waldump_path = pg_malloc(MAXPGPATH);
> +		ret = find_other_exec(argv[0], "pg_waldump",
> +							  "pg_waldump (PostgreSQL) " PG_VERSION "\n",
> +							 pg_waldump_path);
> +		if (ret < 0)
> +		{
> +			char	full_path[MAXPGPATH];
> +
> +			if (find_my_exec(argv[0], full_path) < 0)
> +				strlcpy(full_path, progname, sizeof(full_path));
> +			if (ret == -1)
> +				pg_log_fatal("The program \"%s\" is needed by %s but was\n"
> +							 "not found in the same directory as \"%s\".\n"
> +							 "Check your installation.",
> +							 "pg_waldump", "pg_validatebackup", full_path);
> +			else
> +				pg_log_fatal("The program \"%s\" was found by \"%s\" but was\n"
> +							 "not the same version as %s.\n"
> +							 "Check your installation.",
> +							 "pg_waldump", full_path, "pg_validatebackup");
> +		}
> +	}

ISTM, and this can definitely wait for another time, that we should have
one wrapper doing all of this, instead of having quite a few copies of
very similar logic to the above.


> +/*
> + * Attempt to parse the WAL files required to restore from backup using
> + * pg_waldump.
> + */
> +static void
> +parse_required_wal(validator_context *context, char *pg_waldump_path,
> +				   char *wal_directory, manifest_wal_range *first_wal_range)
> +{
> +	manifest_wal_range *this_wal_range = first_wal_range;
> +
> +	while (this_wal_range != NULL)
> +	{
> +		char *pg_waldump_cmd;
> +
> +		pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%X --end=%X/%X\n",
> +			   pg_waldump_path, wal_directory, this_wal_range->tli,
> +			   (uint32) (this_wal_range->start_lsn >> 32),
> +			   (uint32) this_wal_range->start_lsn,
> +			   (uint32) (this_wal_range->end_lsn >> 32),
> +			   (uint32) this_wal_range->end_lsn);
> +		if (system(pg_waldump_cmd) != 0)
> +			report_backup_error(context,
> +								"WAL parsing failed for timeline %u",
> +								this_wal_range->tli);
> +
> +		this_wal_range = this_wal_range->next;
> +	}
> +}

Should we have a function to properly escape paths in cases like this?
Not that it's likely or really problematic, but the quoting for path
could be "circumvented".


Greetings,

Andres Freund