Re: backup manifests
Robert Haas <robertmhaas@gmail.com>
Commits
- Try to avoid compiler warnings in optimized builds. (05021a2c0cd2, 13.0, landed)
- Fix option related issues in pg_verifybackup. (0a89e93bfaa6, 13.0, landed)
- Add index term for backup manifest in documentation. (4db819ba4039, 13.0, landed)
- Code review for backup manifest. (a2ac73e7be7a, 13.0, landed)
- Document the backup manifest file format. (149f2ae88ab0, 13.0, landed)
- Fix typo in pg_validatebackup documentation. (c4f82a779d26, 13.0, landed)
- Exclude backup_manifest file that existed in database, from BASE_BACKUP. (1ec50a81ec0a, 13.0, landed)
- Msys2 tweaks for pg_validatebackup corruption test (c3e4cbaab936, 13.0, landed)
- Fix resource management bug with replication=database. (3e0d80fd8d3d, 13.0, cited)
- Be more careful about time_t vs. pg_time_t in basebackup.c. (db1531cae009, 13.0, cited)
- pg_validatebackup: Fix 'make clean' to remove tmp_check. (9f8f881caa0f, 13.0, landed)
- pg_validatebackup: Also use perl2host in TAP tests. (460314db08e8, 13.0, landed)
- Generate backup manifests for base backups, and validate them. (0d8c9c1210c4, 13.0, landed)
- Add checksum helper functions. (c12e43a2e0d4, 13.0, landed)
- pg_waldump: Add a --quiet option. (ac44367efbef, 13.0, landed)
- Catversion bump for b9b408c48724 (afb5465e0cfc, 13.0, cited)
- pg_basebackup: Refactor code for reading COPY and tar data. (431ba7bebf13, 13.0, landed)
- Use a ResourceOwner to track buffer pins in all cases. (3cb646264e8c, 12.0, cited)
- Use ARMv8 CRC instructions where available. (f044d71e331d, 11.0, cited)
- Logical replication support for initial data copy (7c4f52409a8c, 10.0, cited)
- Use Intel SSE 4.2 CRC instructions where available. (3dc2d62d0486, 9.5.0, cited)
- Switch to CRC-32C in WAL and other places. (5028f22f6eb0, 9.5.0, cited)
- Remove support for 64-bit CRC. (404bc51cde9d, 9.5.0, cited)
- Change CRCs in WAL records from 64bit to 32bit for performance reasons. (21fda22ec46d, 8.1.0, cited)

On Fri, Nov 22, 2019 at 2:29 PM David Steele <david@pgmasters.net> wrote:
> See:
> https://www.nist.gov/system/files/documents/2017/04/26/lrdc_systems_part2_032713.pdf
> Search for "The maximum block size"

Hmm, so it says: "The maximum block size that can be protected by a 32-bit CRC is 512MB." My problem is that (1) it doesn't back this up with a citation or any kind of logical explanation and (2) it's not very clear what "protected" means.

Tels replies downthread to explain that the internal state of the 32-bit CRC calculation is also limited to 32 bits, and changes once per bit, so that after processing 512MB = 2^29 bytes = 2^32 bits of data, you're guaranteed to start repeating internal states. Perhaps this is also what the NIST folks had in mind, though it's hard to know.

This link provides some more details:

https://community.arm.com/developer/tools-software/tools/f/keil-forum/17467/crc-for-256-byte-data

Not everyone on the thread agrees with everybody else, but it seems like there are size limits below which a CRC-n is guaranteed to detect all 1-bit and 2-bit errors, and above which this is no longer guaranteed. They put the limit *lower* than what NIST supposes, namely 2^(n-1)-1 bits, which would be 256MB, not 512MB, if I'm doing the math correctly. However, they also say that above that value, you are still likely to detect most errors. Absent an intelligent adversary, the chance of a random collision when corruption is present is still about 1 in 4 billion (2^-32).

To me, guaranteed detection of 1-bit and 2-bit errors (and the other kinds of specific things CRC is designed to catch) doesn't seem like a principal design consideration. It's nice if we can get it and I'm not against it, but these are algorithms that are designed to be used when data undergoes a digital-to-analog-to-digital conversion, where for example it's possible that the conversion back to digital loses sync and reads 9 bits or 7 bits rather than 8 bits.
And that's not really what we're doing here: we all know that bits get flipped sometimes, but nobody uses scp to copy a 1GB file and ends up with a file that is 1GB +/- a few bits. Some lower-level part of the communication stack is handling that part of the work; you're going to get exactly 1GB. So it seems to me that here, as with XLOG, we're not relying on the specific CRC properties that were intended to be used to catch and in some cases repair bit flips caused by wrinkles in an A-to-D conversion, but just on its general tendency to probably not match if any bits got flipped. And those properties hold regardless of input length.

That being said, having done some reading on this, I am a little concerned that we're getting further and further from the design center of the CRC algorithm. Like relation segment files, XLOG records are not packets subject to bit insertions, but at least they're small, and relation files are not. Using a 40-year-old algorithm that was intended to be used for things like making sure the modem hadn't lost framing in the last second to verify 1GB files feels, in some nebulous way, like we might be stretching. That being said, I'm not sure what we think the reasonable alternatives are. Users aren't going to be better off if we say that, because CRC-32C might not do a great job detecting errors, we're not going to check for errors at all. If we go the other way and say we're going to use some variant of SHA, they will be better off, but at the price of what looks like a *significant* hit in terms of backup time.

> > "Typically an n-bit CRC applied to a data block of arbitrary length
> > will detect any single error burst not longer than n bits, and the
> > fraction of all longer error bursts that it will detect is (1 −
> > 2^−n)."
>
> I'm not sure how encouraging I find this -- a four-byte error not a lot
> and 2^32 is only 4 billion. We have individual users who have backed up
> more than 4 billion files over the last few years.

I agree that people have a lot more than 4 billion files backed up, but I'm not sure it matters very much given the use case I'm trying to enable. There's a lot of difference between delta restore and backup integrity checking. For backup integrity checking, my goal is that, on those occasions when a file gets corrupted, the chances are good that we notice that it has been corrupted. For that purpose, a 32-bit checksum is probably sufficient. If a file gets corrupted, we have about a 1-in-4-billion chance of being unable to detect it. If 4 billion files get corrupted, we'll miss, on average, one of those corruption events. That's sad, but so is the fact that you had *4 billion corrupted files*. This is not the total number of files backed up; this is the number of those that got corrupted. I don't really know how common it is to copy a file and end up with a corrupt copy, but if you say it's one-in-a-million, which I suspect is far too high, then you'd have to back up something like 4 quadrillion files before you missed a corruption event, and that's a *very* big number.

Now delta restore is a whole different kettle of fish. The birthday problem is huge here. If you've got a 32-bit checksum for file A, and you go and look it up in a database of checksums, and that database has even 1 billion things in it, you've got a pretty decent shot of latching onto a file that is not actually the same as file A. The problem goes away almost entirely if you only compare against previous versions of that file from that database cluster. You've probably only got tens or maybe at the very outside hundreds or thousands of backups of that particular file, and a collision is unlikely even with only a 32-bit checksum -- though even there maybe you'd like to use something larger just to be on the safe side. But if you're going to compare to other files from the same cluster, or even worse any file from any cluster, 32 bits is *woefully* inadequate.
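
The arithmetic in the two paragraphs above can be checked with a minimal Python sketch. It rests on an assumption that the thread does not establish: that the checksums of corrupted or unrelated files behave like independent, uniformly random 32-bit values. The function names are made up for illustration.

```python
import math

CHECKSUM_BITS = 32  # width of the checksum under discussion


def expected_missed(corrupted_files, bits=CHECKSUM_BITS):
    """Expected number of corrupted files whose checksum still matches by chance."""
    return corrupted_files / 2 ** bits


def files_per_missed_corruption(corruption_rate, bits=CHECKSUM_BITS):
    """Average number of files backed up per undetected corruption event,
    given the fraction of copies that end up corrupted."""
    return 2 ** bits / corruption_rate


def false_match_probability(database_size, bits=CHECKSUM_BITS):
    """Chance that one file's checksum equals at least one of `database_size`
    unrelated checksums: 1 - (1 - 2^-bits)^database_size.

    Assumes uniformly random, independent checksums (an idealization).
    """
    # log1p/expm1 keep precision when 2^-bits is tiny.
    return -math.expm1(database_size * math.log1p(-1.0 / 2 ** bits))


if __name__ == "__main__":
    print(expected_missed(2 ** 32))              # corrupted files missed, on average
    print(files_per_missed_corruption(1e-6))     # files per miss at a one-in-a-million rate
    print(false_match_probability(10 ** 9))      # lookup against a billion checksums
    print(false_match_probability(1000))         # lookup against earlier versions of one file
```

Under that assumption, a lookup against a billion-entry database finds a false match roughly one time in five, while a lookup against a thousand earlier versions of the same file does so about twice in ten million.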
TBH even using SHA for such use cases feels a little scary to me. It's probably good enough -- 2^160 for SHA-1 is a *lot* bigger than 2^32, and 2^512 for SHA-512 is enormous. But I'd want to spend time thinking very carefully about the math before designing such a system.

> OK, I'll buy that. But I *don't* think CRCs should be allowed for
> deltas (when we have them) and I *do* think we should caveat their
> effectiveness (assuming we can agree on them).

Sounds good.

> In general the answer to faster backups should be more cores/faster
> network/faster disk, not compromising backup integrity. I understand
> we'll need to wait until we have parallelism in pg_basebackup to justify
> that answer.

I would like to dispute that characterization of what we're talking about here. If we added a 1-bit checksum (parity bit) it would be *strictly better* than what we're doing right now, which is nothing. That's not a serious proposal because it's obvious we can do a lot better for trivial additional cost, but deciding that we're going to use a weaker kind of checksum to avoid adding too much overhead is not wimping out, because it's still going to be strong enough to catch the overwhelming majority of problems that go undetected today. Even an *8-bit* checksum would give us a >99% chance of catching a corrupted file, which would be noticeably better than the 0% chance we have today. Even a manifest with no checksums at all that just checked the presence and size of files would catch tons of operator error, e.g.

- wait, that database had tablespaces?
- were those logs in pg_clog anything important?
- oh, i wasn't supposed to start postgres on the copy of the database stored in the backup directory?

So I don't think we're talking about whether to compromise backup integrity. I think we're talking about - if we're going to make backup integrity better than it is today, how much better should we try to make it, and what are the trade-offs there?
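
To make the checksum-free case concrete, here is a rough Python sketch of a presence-and-size check. It is only an illustration: the function name and the `{relative_path: size}` mapping are hypothetical, and it does not reflect the manifest format or the checks that pg_validatebackup actually implements.

```python
from pathlib import Path


def check_presence_and_size(backup_dir, expected_sizes):
    """Compare a backup directory against a {relative_path: size_in_bytes} mapping.

    Returns a list of problem descriptions; an empty list means every expected
    file exists with the expected size and no unexpected files were found.
    (Hypothetical helper, not part of any PostgreSQL tool.)
    """
    root = Path(backup_dir)
    problems = []

    # Every file named in the mapping must exist and have the recorded size.
    for rel_path, size in sorted(expected_sizes.items()):
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        found = path.stat().st_size
        if found != size:
            problems.append(
                f"size mismatch: {rel_path} (expected {size}, found {found})"
            )

    # Files on disk that the mapping does not mention, for example ones
    # created by starting a server on the backup directory.
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    for rel_path in sorted(on_disk - set(expected_sizes)):
        problems.append(f"unexpected: {rel_path}")

    return problems
```

A missing tablespace directory shows up as "missing" entries, and files written after the backup was taken show up as "unexpected" or as size mismatches, all without reading any file contents.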

The straw man here is that we could make the database infinitely secure if we put it in a concrete bunker and sank it to the bottom of the ocean, with the small price that we'd no longer be able to access it either. Somewhere between that extreme and the other extreme of setting the authentication method to 0.0.0.0/0 trust there's a happy medium where security is tolerably good but ease of access isn't crippled, and the same thing applies here. We could (probably) be the first database on the planet to store a 1024-bit encrypted checksum of every 8kB block, but that seems like it's going too far in the "concrete bunker" direction. IMHO, at least, we should be aiming for something that has a high probability of catching real problems and a low probability of being super-annoying.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company