Re: backup manifests
Stephen Frost <sfrost@snowman.net>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > I think I forgot an initializer. Try this version. Just took a quick look through this. I'm pretty sure David wants to look at it too. Anyway, some comments below. > diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml > index f139ba0231..d1ff53e8e8 100644 > --- a/doc/src/sgml/protocol.sgml > +++ b/doc/src/sgml/protocol.sgml > @@ -2466,7 +2466,7 @@ The commands accepted in replication mode are: > </varlistentry> > > <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP"> > - <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] > + <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>MANIFEST</literal> <replaceable>manifest_option</replaceable> ] [ <literal>MANIFEST_CHECKSUMS</literal> <replaceable>checksum_algorithm</replaceable> ] > <indexterm><primary>BASE_BACKUP</primary></indexterm> > </term> > <listitem> > @@ -2576,6 +2576,37 @@ The commands accepted in replication mode are: > </para> > </listitem> > </varlistentry> > + > + <varlistentry> > + <term><literal>MANIFEST</literal></term> > + <listitem> > + <para> > + When this option is specified with a value of <literal>ye'</literal> > + or <literal>force-escape</literal>, a backup manifest is created > + and sent along with the backup. The latter value forces all filenames > + to be hex-encoded; otherwise, this type of encoding is performed only > + for files whose names are non-UTF8 octet sequences. > + <literal>force-escape</literal> is intended primarily for testing > + purposes, to be sure that clients which read the backup manifest > + can handle this case. For compatibility with previous releases, > + the default is <literal>MANIFEST 'no'</literal>. > + </para> > + </listitem> > + </varlistentry> > + > + <varlistentry> > + <term><literal>MANIFEST_CHECKSUMS</literal></term> > + <listitem> > + <para> > + Specifies the algorithm that should be used to checksum each file > + for purposes of the backup manifest. Currently, the available > + algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>, > + <literal>SHA224</literal>, <literal>SHA256</literal>, > + <literal>SHA384</literal>, and <literal>SHA512</literal>. > + The default is <literal>CRC32C</literal>. > + </para> > + </listitem> > + </varlistentry> > </variablelist> > </para> > <para> While I get the desire to have a default here that includes checksums, the way the command is structured, it strikes me as odd that the lack of MANIFEST_CHECKSUMS in the command actually results in checksums being included. I would think that we'd either: - have the lack of MANIFEST_CHECKSUMS mean 'No checksums' or - Require MANIFEST_CHECKSUMS to be specified and not have it be optional We aren't expecting people to actually be typing these commands out and so I don't think it's a *huge* deal to have it the way you've written it, but it still strikes me as odd. I don't think I have a real preference between the two options that I suggest above, maybe very slightly in favor of the first. > diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml > index 90638aad0e..bf6963a595 100644 > --- a/doc/src/sgml/ref/pg_basebackup.sgml > +++ b/doc/src/sgml/ref/pg_basebackup.sgml > @@ -561,6 +561,69 @@ PostgreSQL documentation > </para> > </listitem> > </varlistentry> > + > + <varlistentry> > + <term><option>--no-manifest</option></term> > + <listitem> > + <para> > + Disables generation of a backup manifest. If this option is not > + specified, the server will and send generate a backup manifest > + which can be verified using <xref linkend="app-pgvalidatebackup" />. > + </para> > + </listitem> > + </varlistentry> How about "If this option is not specified, the server will generate and send a backup manifest which can be verified using ..." > + <varlistentry> > + <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term> > + <listitem> > + <para> > + Specifies the algorithm that should be used to checksum each file > + for purposes of the backup manifest. Currently, the available > + algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>, > + <literal>SHA224</literal>, <literal>SHA256</literal>, > + <literal>SHA384</literal>, and <literal>SHA512</literal>. > + The default is <literal>CRC32C</literal>. > + </para> As I recall, there was an invitation to argue about the defaults at one point, and so I'm going to say here that I would advocate for a different default than 'crc32c'. Specifically, I would think sha256 or 512 would be better. I don't recall seeing a debate about this that conclusively found crc32c to be better, but I'm happy to go back and reread anything someone wants to point me at. > + <para> > + If <literal>NONE</literal> is selected, the backup manifest will > + not contain any checksums. Otherwise, it will contain a checksum > + of each file in the backup using the specified algorithm. In addition, > + the manifest itself will always contain a <literal>SHA256</literal> > + checksum of its own contents. The <literal>SHA</literal> algorithms > + are significantly more CPU-intensive than <literal>CRC32C</literal>, > + so selecting one of them may increase the time required to complete > + the backup. > + </para> It also seems a bit silly to me that using the defaults means having to deal with two different algorithms- crc32c and sha256. Considering how fast these algorithms are, compared to everything else involved in a backup (particularly one that's likely going across a network...), I wonder if we should say "may slightly increase" above. > + <para> > + On the other hand, <literal>CRC32C</literal> is not a cryptographic > + hash function, so it is only suitable for protecting against > + inadvertent or random modifications to a backup. An adversary > + who can modify the backup could easily do so in such a way that > + the CRC does not change, whereas a SHA collision will be hard > + to manufacture. (However, note that if the attacker also has access > + to modify the backup manifest itself, no checksum algorithm will > + provide any protection.) An additional advantage of the > + <literal>SHA</literal> family of functions is that they output > + a much larger number of bits. > + </para> I'm not really sure that this paragraph is sensible to include.. We certainly don't talk about adversaries and cryptographic hash functions when we talk about our page-level checksums, for example. I'm not completely against including it, but I don't want to give the impression that this is something we routinely consider or that lack of discussion elsewhere implies we have protections against a determined attacker. > diff --git a/doc/src/sgml/ref/pg_validatebackup.sgml b/doc/src/sgml/ref/pg_validatebackup.sgml > new file mode 100644 > index 0000000000..1c171f6970 > --- /dev/null > +++ b/doc/src/sgml/ref/pg_validatebackup.sgml > @@ -0,0 +1,232 @@ > +<!-- > +doc/src/sgml/ref/pg_validatebackup.sgml > +PostgreSQL documentation > +--> > + > +<refentry id="app-pgvalidatebackup"> > + <indexterm zone="app-pgvalidatebackup"> > + <primary>pg_validatebackup</primary> > + </indexterm> > + > + <refmeta> > + <refentrytitle>pg_validatebackup</refentrytitle> > + <manvolnum>1</manvolnum> > + <refmiscinfo>Application</refmiscinfo> > + </refmeta> > + > + <refnamediv> > + <refname>pg_validatebackup</refname> > + <refpurpose>verify the integrity of a base backup of a > + <productname>PostgreSQL</productname> cluster</refpurpose> > + </refnamediv> "verify the integrity of a backup taken using pg_basebackup" > + <refsect1> > + <title> > + Description > + </title> > + <para> > + <application>pg_validatebackup</application> is used to check the integrity > + of a database cluster backup. The backup being checked should have been > + created by <command>pg_basebackup</command> or some other tool that includes > + a <literal>backup_manifest</literal> file with the backup. The backup > + must be stored in the "plain" format; a "tar" format backup can be checked > + after extracting it. Backup manifests are created by the server beginning > + with <productname>PostgreSQL</productname> version 13, so older backups > + cannot be validated using this tool. > + </para> This seems to invite the idea that pg_validatebackup should be able to work with external backup solutions- but I'm a bit concerned by that idea because it seems like it would then mean we'd have to be particularly careful when changing things in this area, and I'm not thrilled by that. I'd like to make sure that new versions of pg_validatebackup work with older backups, and, ideally, older versions of pg_validatebackup would work even with newer backups, all of which I think the json structure of the manifest helps us with, but that's when we're building the manifest and know what it's going to look like. Maybe to put it another way- would a patch be accepted to make pg_validatebackup work with other manifests..? If not, then I'd keep this to the more specific "this tool is used to validate backups taken using pg_basebackup". > + <para> > + <application>pg_validatebackup</application> reads the manifest file of a > + backup, verifies the manifest against its own internal checksum, and then > + verifies that the same files are present in the target directory as in the > + manifest itself. It then verifies that each file has the expected checksum, > + unless the backup was taken the checksum algorithm set to "was taken with the checksum algorithm"... > + <literal>none</literal>, in which case checksum verification is not > + performed. The presence or absence of directories is not checked, except > + indirectly: if a directory is missing, any files it should have contained > + will necessarily also be missing. Certain files and directories are > + excluded from verification: > + </para> > + > + <itemizedlist> > + <listitem> > + <para> > + <literal>backup_manifest</literal> is ignored because the backup > + manifest is logically not part of the backup and does not include > + any entry for itself. > + </para> > + </listitem> This seems a bit confusing, doesn't it? The backup_manifest must exist, and its checksum is internal, and is checked, isn't it? Why say that it's excluded..? > + <listitem> > + <para> > + <literal>pg_wal</literal> is ignored because WAL files are sent > + separately from the backup, and are therefore not described by the > + backup manifest. > + </para> > + </listitem> I don't agree with the choice to exclude the WAL files, considering they're an integral part of a backup, to exclude them means that if they've been corrupted at all then the entire backup is invalid. You don't want to be discovering that when you're trying to do a restore of a backup that you took with pg_basebackup and which pg_validatebackup says is valid. After all, the tool being used here, pg_basebackup, *does* also stream the WAL files- there's no reason why we can't calculate a checksum on them and store that checksum somewhere and use it to validate the WAL files. This, in my opinion, is actually a show-stopper for this feature. Claiming it's a valid backup when we don't check the absolutely necessary-for-restore WAL is making a false claim, no matter how well it's documented. I do understand that it's possible to run pg_basebackup without the WAL files being grabbed as part of that run- in such a case, we should be able to detect that was the case for the backup and when running pg_validatebackup we should issue a WARNING that the WAL files weren't able to be verified (we could have an option to suppress that warning if people feel that's needed). > + <listitem> > + <para> > + <literal>postgesql.auto.conf</literal>, > + <literal>standby.signal</literal>, > + and <literal>recovery.signal</literal> are ignored because they may > + sometimes be created or modified by the backup client itself. > + (For example, <literal>pg_basebackup -R</literal> will modify > + <literal>postgresql.auto.conf</literal> and create > + <literal>standby.signal</literal>.) > + </para> > + </listitem> > + </itemizedlist> > + </refsect1> Not really thrilled with this (pg_basebackup certainly could figure out the checksum for those files...), but I also don't think it's a huge issue as they can be recreated by a user (unlike a WAL file..). I got through most of the pg_basebackup changes, and they looked pretty good in general. Will try to review more tomorrow. Thanks, Stephen