Re: backup manifests

Andres Freund <andres@anarazel.de>

From: Andres Freund <andres@anarazel.de>

To: Robert Haas <robertmhaas@gmail.com>

Cc: Stephen Frost <sfrost@snowman.net>, Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, tushar <tushar.ahuja@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, David Steele <david@pgmasters.net>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>

Date: 2020-03-27T19:56:24Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited

Hi,

On 2020-03-27 14:13:17 -0400, Robert Haas wrote:
> On Thu, Mar 26, 2020 at 4:44 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > I mean, the property that I care about is the one where it detects
> > > better than 999,999,999 errors out of every 1,000,000,000, regardless
> > > of input length.
> >
> > Throwing these kinds of things around I really don't think is useful.
> 
> ...but I don't understand his reasoning, or yours.
> 
> My reasoning for thinking that the number is accurate is that a 32-bit
> checksum has 2^32 possible results. If all of those results are
> equally probable, then the probability that two files with unequal
> contents produce the same result is 2^-32. This does assume that the
> hash function is perfect, which no hash function is, so the actual
> probability of a collision is likely higher. But if the hash function
> is pretty good, it shouldn't be all that much higher. Note that I am
> making no assumptions here about how many bits are different, nor am I
> making any assumption about the length of a file. I am simply saying
> that an n-bit checksum should detect a difference between two files
> with a probability of roughly 1-2^{-n}, modulo the imperfections of
> the hash function. I thought that this was a well-accepted fact that
> would produce little argument from anybody, and I'm confused that
> people seem to feel otherwise.

Well: crc32 is a terrible hash, if you're looking for even distribution
of hashed values. That's not too surprising - its design goals included
guaranteed error detection for certain lengths, and error correction of
single bit errors.  My understanding of the underlying math is spotty at
best, but from what I understand that does pretty directly imply less
independence between source data -> hash value than what we'd want from
a good hash function.

Here's an smhasher result page for crc32 (at least the latter is crc32
afaict):
https://notabug.org/vaeringjar/smhasher/src/master/doc/crc32
https://notabug.org/vaeringjar/smhasher/src/master/doc/crc32_hw

and then compare that with something like xxhash, or even lookup3 (which
I think is what our hash is a variant of):
https://notabug.org/vaeringjar/smhasher/src/master/doc/xxHash32
https://notabug.org/vaeringjar/smhasher/src/master/doc/lookup3

The birthday paradoxon doesn't apply (otherwise 32bit would never be
enough, at a 50% chance of conflict at around 80k hashes), but still I
do wonder if it matters that we're trying to detect errors in not one,
but commonly tens of thousands to millions of files. But since we just
need to detect one error to call the whole backup corrupt...

> One explanation that would make sense to me is if somebody said, well,
> the nature of this particular algorithm means that, although values
> are uniformly distributed in general, the kinds of errors that are
> likely to occur in practice are likely to cancel out. For instance, if
> you imagine trivial algorithms such as adding or xor-ing all the
> bytes, adding zero bytes doesn't change the answer, and neither do
> transpositions. However, CRC is, AIUI, designed to be resistant to
> such problems. Your remark about large blocks of zero bytes is
> interesting to me in this context, but in a quick search I couldn't
> find anything stating that CRC was weak for such use cases.

My main point was that CRC's error detection guarantees are pretty much
irrelevant for us. I.e. while the right CRC will guarantee that all
single 2 bit errors will be detected, that's not a helpful property for
us. There rarely are single bit errors, and the bursts are too long to
to benefit from any >2 bit guarantees. Nor are multiple failures rare
once you hit a problem.

> The old thread about switching from 64-bit CRC to 32-bit CRC had a
> link to a page which has subsequently been moved to here:
> 
> https://www.ece.unb.ca/tervo/ee4253/crc.shtml
> 
> Down towards the bottom, it says:
> 
> "In general, bit errors and bursts up to N-bits long will be detected
> for a P(x) of degree N. For arbitrary bit errors longer than N-bits,
> the odds are one in 2^{N} than a totally false bit pattern will
> nonetheless lead to a zero remainder."

That's still about a single sequence of bit errors though, as far as I
can tell. I.e. it doesn't hold for CRCs if you have two errors at
different places.

Greetings,

Andres Freund