Re: backup manifests
David Steele <david@pgmasters.net>
Commits
- 05021a2c0cd2 (13.0, landed): Try to avoid compiler warnings in optimized builds.
- 0a89e93bfaa6 (13.0, landed): Fix option related issues in pg_verifybackup.
- 4db819ba4039 (13.0, landed): Add index term for backup manifest in documentation.
- a2ac73e7be7a (13.0, landed): Code review for backup manifest.
- 149f2ae88ab0 (13.0, landed): Document the backup manifest file format.
- c4f82a779d26 (13.0, landed): Fix typo in pg_validatebackup documentation.
- 1ec50a81ec0a (13.0, landed): Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- c3e4cbaab936 (13.0, landed): Msys2 tweaks for pg_validatebackup corruption test
- 3e0d80fd8d3d (13.0, cited): Fix resource management bug with replication=database.
- db1531cae009 (13.0, cited): Be more careful about time_t vs. pg_time_t in basebackup.c.
- 9f8f881caa0f (13.0, landed): pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 460314db08e8 (13.0, landed): pg_validatebackup: Also use perl2host in TAP tests.
- 0d8c9c1210c4 (13.0, landed): Generate backup manifests for base backups, and validate them.
- c12e43a2e0d4 (13.0, landed): Add checksum helper functions.
- ac44367efbef (13.0, landed): pg_waldump: Add a --quiet option.
- afb5465e0cfc (13.0, cited): Catversion bump for b9b408c48724
- 431ba7bebf13 (13.0, landed): pg_basebackup: Refactor code for reading COPY and tar data.
- 3cb646264e8c (12.0, cited): Use a ResourceOwner to track buffer pins in all cases.
- f044d71e331d (11.0, cited): Use ARMv8 CRC instructions where available.
- 7c4f52409a8c (10.0, cited): Logical replication support for initial data copy
- 3dc2d62d0486 (9.5.0, cited): Use Intel SSE 4.2 CRC instructions where available.
- 5028f22f6eb0 (9.5.0, cited): Switch to CRC-32C in WAL and other places.
- 404bc51cde9d (9.5.0, cited): Remove support for 64-bit CRC.
- 21fda22ec46d (8.1.0, cited): Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On 11/22/19 2:01 PM, Robert Haas wrote:
> On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
>> Well, the maximum amount of data that can be protected with a 32-bit CRC
>> is 512MB according to all the sources I found (NIST, Wikipedia, etc). I
>> presume that's what we are talking about since I can't find any 64-bit
>> CRC code in core or this patch.
>
> Could you give a more precise citation for this?

See:
https://www.nist.gov/system/files/documents/2017/04/26/lrdc_systems_part2_032713.pdf

Search for "The maximum block size"

https://en.wikipedia.org/wiki/Cyclic_redundancy_check

"The design of the CRC polynomial depends on the maximum total length of
the block to be protected (data + CRC bits)", which I took to mean there
are limits.

Here's another interesting bit from:
https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks

"Because a CRC is based on division, no polynomial can detect errors
consisting of a string of zeroes prepended to the data, or of missing
leading zeroes" -- but it appears to matter what CRC you are using.
There's a variation that works in this case and hopefully we are using
that one.

This paper talks about appropriate block lengths vs CRC length:
http://users.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf

but it is concerned with network transmission and small block lengths.

> "Typically an n-bit CRC applied to a data block of arbitrary length
> will detect any single error burst not longer than n bits, and the
> fraction of all longer error bursts that it will detect is (1 −
> 2^−n)."

I'm not sure how encouraging I find this -- a four-byte error is not a
lot and 2^32 is only 4 billion. We have individual users who have backed
up more than 4 billion files over the last few years.

>> This is the basic premise of what we call delta restore which can speed
>> up restores by orders of magnitude.
>>
>> Delta restore is the main advantage that made us decide to require SHA1
>> checksums. In most cases, restore speed is more important than backup
>> speed.
>
> I see your point, but it's not the whole story. We've encountered a
> bunch of cases where the time it took to complete a backup exceeded
> the user's desired backup interval, which is obviously very bad, or
> even more commonly where it exceeded the length of the user's
> "low-usage" period when they could tolerate the extra overhead imposed
> by the backup. A few percentage points is probably not a big deal, but
> a user who has an 8-hour window to get the backup done overnight will
> not be happy if it's taking 6 hours now and we tack 40%-50% on to
> that. So I think that we either have to disable backup checksums by
> default, or figure out a way to get the overhead down to something a
> lot smaller than what current tests are showing -- which we could
> possibly do without changing the algorithm if we can somehow make it a
> lot cheaper, but otherwise I think the choice is between disabling the
> functionality altogether by default and adopting a less-expensive
> algorithm. Maybe someday when delta restore is in core and widely used
> and CPUs are faster, it'll make sense to revise the default, and
> that's cool, but I can't see imposing a big overhead by default to
> enable a feature core doesn't have yet...

OK, I'll buy that. But I *don't* think CRCs should be allowed for
deltas (when we have them) and I *do* think we should caveat their
effectiveness (assuming we can agree on them).

In general the answer to faster backups should be more cores/faster
network/faster disk, not compromising backup integrity. I understand
we'll need to wait until we have parallelism in pg_basebackup to justify
that answer.

Regards,
-- 
-David
david@pgmasters.net
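
For anyone who wants to see the leading-zeroes point from the Wikipedia quote in action, here is a minimal sketch in Python. It is not taken from the patch or from core: the `crc32` helper, the sample payload, and the choice of the zlib CRC-32 polynomial (not CRC-32C) are assumptions made for illustration. It compares a CRC whose register starts at zero with the common variation that starts at all ones and inverts the result.

```python
import zlib

POLY = 0xEDB88320  # reflected form of the CRC-32 polynomial used by zlib


def crc32(data: bytes, init: int, xorout: int) -> int:
    """Bit-at-a-time reflected CRC-32 with a configurable initial value."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ xorout


payload = b"backup_manifest"          # arbitrary sample data
padded = b"\x00" * 4 + payload        # same data with zero bytes prepended

# Zero initial value: leading zero bytes leave the register at zero,
# so the prepended bytes go unnoticed.
plain_same = crc32(payload, 0, 0) == crc32(padded, 0, 0)

# All-ones initial value plus final inversion (the zlib convention):
# the prepended zero bytes now change the result.
preset_same = (crc32(payload, 0xFFFFFFFF, 0xFFFFFFFF)
               == crc32(padded, 0xFFFFFFFF, 0xFFFFFFFF))

# Sanity check of the helper against the library implementation.
matches_zlib = crc32(payload, 0xFFFFFFFF, 0xFFFFFFFF) == zlib.crc32(payload)

print(plain_same, preset_same, matches_zlib)
```

The only difference between the two cases is the initial register value, which is why the choice of CRC variation matters for this class of error.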