Re: backup manifests
David Steele <david@pgmasters.net>
Commits
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
On 3/26/20 11:37 AM, Robert Haas wrote:
>> On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> This is where I feel like I'm trying to make decisions in a vacuum. If
> we had a few more people weighing in on the thread on this point, I'd
> be happy to go with whatever the consensus was. If most people think
> having both --no-manifest (suppressing the manifest completely) and
> --manifest-checksums=none (suppressing only the checksums) is useless
> and confusing, then sure, let's rip the latter one out. If most people
> like the flexibility, let's keep it: it's already implemented and
> tested. But I hate to base the decision on what one or two people
> think.

I'm not sure I see a lot of value to being able to build manifest with
no checksums, especially if overhead for the default checksum algorithm
is negligible.

However, I'd still prefer that the default be something more robust and
allow users to tune it down rather than the other way around. But I've
made that pretty clear up-thread and I consider that argument lost at
this point.

>> As for folks who are that close to the edge on their backup timing that
>> they can't have it slow down- chances are pretty darn good that they're
>> not far from ending up needing to find a better solution than
>> pg_basebackup anyway. Or they don't need to generate a manifest (or, I
>> suppose, they could have one but not have checksums..).
>
> 40-50% is a lot more than "if you were on the edge."

For the record I think this is a very misleading number. Sure, if you
are doing your backup to a local SSD on a powerful development laptop it
makes sense. But backups are generally placed on slower storage,
remotely, with compression. Even without compression the first two are
going to bring this percentage down by a lot.

When you get to page-level incremental backups, which is where this all
started, I'd still recommend using a stronger checksum algorithm to
verify that the file was reconstructed correctly on restore. That much I
believe we have agreed on.

>> Even pg_basebackup (in both fetch and stream modes...) checks that we at
>> least got all the WAL that's needed for the backup from the server
>> before considering the backup to be valid and telling the user that
>> there was a successful backup. With what you're proposing here, we
>> could have someone do a pg_basebackup, get back an ERROR saying the
>> backup wasn't valid, and then run pg_validatebackup and be told that the
>> backup is valid. I don't get how that's sensible.
>
> I'm sorry that you can't see how that's sensible, but it doesn't mean
> that it isn't sensible. It is totally unrealistic to expect that any
> backup verification tool can verify that you won't get an error when
> trying to use the backup. That would require that everything that the
> validation tool try to do everything that PostgreSQL will try to do
> when the backup is used, including running recovery and updating the
> data files. Anything less than that creates a real possibility that
> the backup will verify good but fail when used. This tool has a much
> narrower purpose, which is to try to verify that we (still) have the
> files the server sent as part of the backup and that, to the best of
> our ability to detect such things, they have not been modified. As you
> know, or should know, the WAL files are not sent as part of the
> backup, and so are not verified. Other things that would also be
> useful to check are also not verified. It would be fantastic to have
> more verification tools in the future, but it is difficult to see why
> anyone would bother trying if an attempt to get the first one
> committed gets blocked because it does not yet do everything. Very few
> patches try to do everything, and those that do usually get blocked
> because, by trying to do too much, they get some of it badly wrong.

I agree with Stephen that this should be done, but I agree with you that
it can wait for a future commit. However, I do think:

1) It should be called out rather plainly in the documentation.
2) If there are files in pg_wal then pg_validatebackup should inform the
user that those files have not been validated.

I know you and Stephen have agreed on a number of doc changes, would it
be possible to get a new patch with those included? I finally have time
to do a review of this tomorrow. I saw some mistakes in the docs in the
current patch but I know those patches are not current.

Regards,
-- 
-David
david@pgmasters.net
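
To make point 2 above concrete, here is a minimal sketch in Python of the narrow check described in the quoted text (files still present and unmodified), with files under pg_wal reported as not validated rather than silently skipped. It is illustration only, not how pg_validatebackup is implemented: the function name verify_backup, the in-memory manifest layout (relative path mapped to size and SHA-256 hex digest), and the message wording are all assumptions made for this example.

```python
import hashlib
import os
import tempfile


def verify_backup(backup_dir, manifest):
    """Check files under backup_dir against a manifest.

    manifest is a hypothetical layout for this sketch:
    {relative_path: (size_in_bytes, sha256_hex_digest)}.
    Returns (problems, unverified): problems lists mismatches,
    unverified lists pg_wal files that were not checked at all.
    """
    problems = []
    unverified = []
    seen = set()
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, backup_dir).replace(os.sep, "/")
            if rel.startswith("pg_wal/"):
                # WAL is not covered by the manifest; report it, don't check it.
                unverified.append(rel)
                continue
            seen.add(rel)
            if rel not in manifest:
                problems.append(f"{rel}: present on disk but not in manifest")
                continue
            size, digest = manifest[rel]
            with open(full, "rb") as fh:
                data = fh.read()
            if len(data) != size:
                problems.append(f"{rel}: size mismatch")
            elif hashlib.sha256(data).hexdigest() != digest:
                problems.append(f"{rel}: checksum mismatch")
    # Anything listed in the manifest but never seen on disk is missing.
    for rel in sorted(set(manifest) - seen):
        problems.append(f"{rel}: in manifest but missing on disk")
    return problems, sorted(unverified)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        os.makedirs(os.path.join(d, "base"))
        os.makedirs(os.path.join(d, "pg_wal"))
        with open(os.path.join(d, "base", "1"), "wb") as fh:
            fh.write(b"hello")
        with open(os.path.join(d, "pg_wal", "segment"), "wb") as fh:
            fh.write(b"wal data")
        manifest = {"base/1": (5, hashlib.sha256(b"hello").hexdigest())}
        problems, unverified = verify_backup(d, manifest)
        for line in problems:
            print("error:", line)
        for rel in unverified:
            print("notice: not validated:", rel)
```

Returning the unverified files separately from the problems keeps "the backup matches its manifest" distinct from "the backup will recover", which is the distinction the documentation would need to call out.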