Re: backup manifests

Robert Haas <robertmhaas@gmail.com>

From: Robert Haas <robertmhaas@gmail.com>

To: David Steele <david@pgmasters.net>

Cc: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>

Date: 2019-11-22T19:01:44Z

Lists: pgsql-hackers

Commits


Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited

On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
> Well, the maximum amount of data that can be protected with a 32-bit CRC
> is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
> presume that's what we are talking about since I can't find any 64-bit
> CRC code in core or this patch.

Could you give a more precise citation for this? I can't find a
reference to that in the Wikipedia article off-hand and I don't know
where to look in NIST. I apologize if I'm being dense here, but I
don't see why there should be any limit on the amount of data that can
be protected. The important thing is that if the original file F is
altered to F', we hope that CHECKSUM(F) != CHECKSUM(F'). The
probability of that, assuming that the alteration is random rather
than malicious and that the checksum function is equally likely to
produce every possible output, is just 1-2^-${CHECKSUM_BITS},
regardless of the length of the message (except that there might be
some special cases for very short messages, which don't matter here).

My analysis seems to match
https://en.wikipedia.org/wiki/Cyclic_redundancy_check, which says:

"Typically an n-bit CRC applied to a data block of arbitrary length
will detect any single error burst not longer than n bits, and the
fraction of all longer error bursts that it will detect is (1 −
2^−n)."

Notice the phrase "a data block of arbitrary length" and the formula "1 - 2^-n".
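
A minimal sketch of the length-independence claim above, under stated assumptions: the checksum is the low 8 bits of zlib's CRC-32 (a stand-in chosen so that misses are frequent enough to count, not a real 8-bit CRC), and an "alteration" replaces the whole block with fresh random bytes. The names small_checksum and miss_rate are illustrative only. If the reasoning holds, the miss rate should stay near 2^-8 at every block length.

```python
import random
import zlib

CHECKSUM_BITS = 8  # deliberately tiny so that misses show up in a short run


def random_block(length: int, rng: random.Random) -> bytes:
    """Return `length` pseudo-random bytes drawn from `rng`."""
    return rng.getrandbits(8 * length).to_bytes(length, "little")


def small_checksum(data: bytes) -> int:
    """Low CHECKSUM_BITS bits of CRC-32 (assumption: stand-in for a short checksum)."""
    return zlib.crc32(data) & ((1 << CHECKSUM_BITS) - 1)


def miss_rate(length: int, trials: int, rng: random.Random) -> float:
    """Fraction of random alterations that leave the checksum unchanged."""
    misses = 0
    for _ in range(trials):
        original = random_block(length, rng)
        altered = random_block(length, rng)
        if altered == original:
            continue  # not an alteration at all; vanishingly rare
        if small_checksum(original) == small_checksum(altered):
            misses += 1
    return misses / trials


if __name__ == "__main__":
    rng = random.Random(2019)
    print("expected miss rate:", 2 ** -CHECKSUM_BITS)
    for length in (16, 256, 4096):
        print(length, "bytes:", miss_rate(length, 20000, rng))
```

The block lengths span more than two orders of magnitude, yet the only quantity that should matter to the observed rate is CHECKSUM_BITS.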

> > Phrased more positively, if you want a cryptographic hash
> > at all, you should probably use one that isn't widely viewed as too
> > weak.
>
> Sure.  There's another advantage to picking an algorithm with lower
> collision rates, though.
>
> CRCs are fine for catching transmission errors (as caveated above) but
> not as great for comparing two files for equality.  With strong hashes
> you can confidently compare local files against the path, size, and hash
> stored in the manifest and save yourself a round-trip to the remote
> storage to grab the file if it has not changed locally.

I agree in part. I think there are two reasons why a cryptographically
strong hash is desirable for delta restore. First, since the checksums
are longer, the probability of a false match happening randomly is
lower, which is important. Even if the above analysis is correct and
the chance of a false match is just 2^-32 with a 32-bit CRC, if you
back up ten million files every day, you'll likely get a false match
within a few years or less, and once is too often. Second, contrary to
what I assumed above, the contents of a PostgreSQL data file are not
chosen at random, whereas transmission errors probably are more or
less random. It seems somewhat possible that there is an adversary
who is trying to choose the data that gets stored in some particular
record so as to create a false checksum match. A CRC is a lot easier
to fool than a cryptographic hash, so I think that using a CRC of *any*
length for this kind of use case would be extremely dangerous no
matter the probability of an accidental match.
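
Both points in the paragraph above can be sketched in a few lines, with the assumptions labeled. For the first, the estimate treats every backed-up file as one independent comparison of differing contents against a uniformly distributed checksum, which is a worst-case simplification. For the second, the code checks a well-known structural property of CRCs: for equal-length inputs, the CRC of an XOR of three blocks equals the XOR of their CRCs. That predictability is what makes a CRC far easier to steer than a cryptographic hash. The function names are illustrative only.

```python
import zlib


def expected_days_to_false_match(checksum_bits: int, comparisons_per_day: float) -> float:
    """Mean days until the first accidental match, assuming uniform checksums."""
    return 2 ** checksum_bits / comparisons_per_day


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("blocks must have equal length")
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def crc_is_affine(a: bytes, b: bytes, c: bytes) -> bool:
    """True if crc32(a ^ b ^ c) == crc32(a) ^ crc32(b) ^ crc32(c)."""
    combined = zlib.crc32(xor_blocks(a, b, c))
    return combined == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)


if __name__ == "__main__":
    files_per_day = 10_000_000
    for bits in (32, 160):
        days = expected_days_to_false_match(bits, files_per_day)
        print(f"{bits}-bit checksum: about {days / 365.25:.3g} years to first false match")
    print("CRC-32 affine over XOR:",
          crc_is_affine(b"block one..", b"block two..", b"block three"))
```

With a 32-bit checksum and ten million comparisons a day, the mean wait is 2^32 / 10^7 days, a little over a year. A SHA-1-sized checksum pushes the accidental-match estimate far beyond any practical horizon, though the estimate says nothing about deliberate collisions.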

> This is the basic premise of what we call delta restore which can speed
> up restores by orders of magnitude.
>
> Delta restore is the main advantage that made us decide to require SHA1
> checksums.  In most cases, restore speed is more important than backup
> speed.

I see your point, but it's not the whole story. We've encountered a
bunch of cases where the time it took to complete a backup exceeded
the user's desired backup interval, which is obviously very bad, or
even more commonly where it exceeded the length of the user's
"low-usage" period when they could tolerate the extra overhead imposed
by the backup. A few percentage points is probably not a big deal, but
a user who has an 8-hour window to get the backup done overnight will
not be happy if it's taking 6 hours now and we tack 40%-50% on to
that. So I think that we either have to disable backup checksums by
default, or figure out a way to get the overhead down to something a
lot smaller than what current tests are showing -- which we could
possibly do without changing the algorithm if we can somehow make it a
lot cheaper, but otherwise I think the choice is between disabling the
functionality altogether by default and adopting a less-expensive
algorithm. Maybe someday when delta restore is in core and widely used
and CPUs are faster, it'll make sense to revise the default, and
that's cool, but I can't see imposing a big overhead by default to
enable a feature core doesn't have yet...
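
The backup-window arithmetic above, worked through as a small sketch. The 6-hour backup, 8-hour window, and 40%-50% overhead figures come from the paragraph; the function names are illustrative, and the model assumes the checksum overhead scales the whole backup time linearly.

```python
def projected_hours(current_hours: float, overhead_fraction: float) -> float:
    """Backup duration after adding a fractional overhead (0.4 means +40%)."""
    return current_hours * (1.0 + overhead_fraction)


def max_overhead_fraction(current_hours: float, window_hours: float) -> float:
    """Largest fractional overhead that still fits inside the window."""
    return window_hours / current_hours - 1.0


if __name__ == "__main__":
    current, window = 6.0, 8.0
    for overhead in (0.40, 0.50):
        hours = projected_hours(current, overhead)
        verdict = "fits" if hours <= window else "overruns the window"
        print(f"+{overhead:.0%}: {hours:.1f} h, {verdict}")
    print(f"headroom: {max_overhead_fraction(current, window):.1%}")
```

Under these assumptions the headroom is about a third of the current backup time, so both ends of the 40%-50% range overrun the window.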

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company