Re: backup manifests
Stephen Frost <sfrost@snowman.net>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Mar 25, 2020 at 9:31 AM Stephen Frost <sfrost@snowman.net> wrote: > > I get that the default for manifest is 'no', but I don't really see how > > that means that the lack of saying anything about checksums should mean > > "give me crc32c checksums". It's really rather common that if we don't > > specify something, it means don't do that thing- like an 'ORDER BY' > > clause. > > That's a fair argument, but I think the other relevant principle is > that we try to give people useful defaults for things. I think that > checksums are a sufficiently useful thing that having the default be > not to do it doesn't make sense. I had the impression that you and > David were in agreement on that point, actually. I agree with wanting to have useful defaults and that checksums should be included by default, and I'm alright even with letting people pick what algorithms they'd like to have too. The construct here is made odd because we've got this idea that "no checksum" is an option, which is actually something that I don't particularly like, but that's what's making this particular syntax weird. I don't suppose you'd be open to the idea of just dropping that though..? There wouldn't be any issue with this syntax if we just always had checksums included when a manifest is requested. :) Somehow, I don't think I'm going to win that argument. > > There were also comments made up-thread about how it might not be great > > for larger (eg: 1GB files, like we tend to have quite a few of...), and > > something about it being a 40 year old algorithm.. > > Well, the 512MB "limit" for CRC-32C means only that for certain very > specific types of errors, detection is not guaranteed above that file > size. So if you have a single flipped bit, for example, and the file > size is greater than 512MB, then CRC-32C has only a 99.9999999767169% > chance of detecting the error, whereas if the file size is less than > 512MB, it is 100% certain, because of the design of the algorithm. But > nine nines is plenty, and neither SHA nor our page-level checksums > provide guaranteed error detection properties anyway. Right, so we know that CRC-32C has an upper-bound of 512MB to be useful for exactly what it's designed to be useful for, but we also know that we're going to have larger files- at least 1GB ones, and quite possibly larger, so why are we choosing this? At the least, wouldn't it make sense to consider a larger CRC, one whose limit is above the size of commonly expected files, if we're going to use a CRC? > I'm not sure why the fact that it's a 40-year-old algorithm is > relevant. There are many 40-year-old algorithms that are very good. Sure there are, but there probably wasn't a lot of thought about GB-sized files, and this doesn't really seem to be the direction people are going in for larger objects. s3, as an example, uses sha256. Google, it seems, suggests folks use "HighwayHash" (from their crc32c github repo- https://github.com/google/crc32c). Most CRC uses seem to be for much smaller data sets. > My guess is that if this patch is adopted as currently proposed, we > will eventually need to replace the cryptographic hash functions due > to the march of time. As I'm sure you realize, the problem with hash > functions that are designed to foil an adversary is that adversaries > keep getting smarter. So, eventually someone will probably figure out > how to do something nefarious with SHA-512. Some other technique that > nobody's cracked yet will need to be adopted, and then people will > begin trying to crack that, and the whole thing will repeat. But I > suspect that we can keep using the same non-cryptographic hash > function essentially forever. It does not matter that people know how > the algorithm works because it makes no pretensions of trying to foil > an opponent. It is just trying to mix up the bits in such a way that a > change to the file is likely to cause a change in the checksum. The > bit-mixing properties of the algorithm do not degrade with the passage > of time. Sure, there's a good chance we'll need newer algorithms in the future, I don't doubt that. On the other hand, if crc32c, or CRC whatever, was the perfect answer and no one will ever need something better, then what's with folks like Google suggesting something else..? > > I'm sure that sha256 takes a lot more time than crc32c, I'm certainly > > not trying to dispute that, but what's relevent here is how much it > > impacts the time required to run the overall backup (including sync'ing > > it to disk, and possibly network transmission time.. if we're just > > comparing the time to run it through memory then, sure, the sha256 > > computation time might end up being quite a bit of the time, but that's > > not really that interesting of a test..). > > I think that http://postgr.es/m/38e29a1c-0d20-fc73-badd-ca05f7f07ffa@pgmasters.net > is one of the more interesting emails on this topic. My conclusion > from that email, and the ones that led up to it, was that there is a > 40-50% overhead from doing a SHA checksum, but in pgbackrest, users > don't see it because backups are compressed. Because the compression > uses so much CPU time, the additional overhead from the SHA checksum > is only a few percent more. But I don't think that it would be smart > to slow down uncompressed backups by 40-50%. That's going to cause a > problem for somebody, almost for sure. I like that email on the topic also, as it points out again (as I tried to do earlier also..) that it depends on what we're actually including in the test- and it seems, again, that those tests didn't consider the time to actually write the data somewhere, either network or disk. As for folks who are that close to the edge on their backup timing that they can't have it slow down- chances are pretty darn good that they're not far from ending up needing to find a better solution than pg_basebackup anyway. Or they don't need to generate a manifest (or, I suppose, they could have one but not have checksums..). > > In particular, the sentence right above this list is: > > > > "Certain files and directories are excluded from verification:" > > > > but we actually do verify the manifest, that's all I'm saying here. > > > > Maybe rewording that a bit is what would help, say: > > > > "Certain files and directories are not included in the manifest:" > > Well, that'd be wrong, though. It's true that backup_manifest won't > have an entry in the manifest, and neither will WAL files, but > postgresql.auto.conf will. We'll just skip complaining about it if the > checksum doesn't match or whatever. The server generates manifest > entries for everything, and the client decides not to pay attention to > some of them because it knows that pg_basebackup may have made certain > changes that were not known to the server. Ok, but it's also wrong to say that the backup_label is excluded from verification. > > Yeah, I get that it's not easy to figure out how to validate the WAL, > > but I stand by my opinion that it's simply not acceptable to exclude the > > necessary WAL from verification so and to claim that a backup is valid > > when we haven't checked the WAL. > > I hear that, but I don't agree that having nothing is better than > having this much committed. I would be fine with renaming the tool > (pg_validatebackupmanifest? pg_validatemanifest?), or with updating > the documentation to be more clear about what is and is not checked, > but I'm not going to extent the tool to do totally new things for > which we don't even have an agreed design yet. I believe in trying to > create patches that do one thing and do it well, and this patch does > that. The fact that it doesn't do some other thing that is > conceptually related yet different is a good thing, not a bad one. I fail to see the usefulness of a tool that doesn't actually verify that the backup is able to be restored from. Even pg_basebackup (in both fetch and stream modes...) checks that we at least got all the WAL that's needed for the backup from the server before considering the backup to be valid and telling the user that there was a successful backup. With what you're proposing here, we could have someone do a pg_basebackup, get back an ERROR saying the backup wasn't valid, and then run pg_validatebackup and be told that the backup is valid. I don't get how that's sensible. Thanks, Stephen