Re: backup manifests

From: Stephen Frost <sfrost@snowman.net>

To: Robert Haas <robertmhaas@gmail.com>

Cc: Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, tushar <tushar.ahuja@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, David Steele <david@pgmasters.net>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>

Date: 2020-03-26T20:44:14Z

Lists: pgsql-hackers

Commits

Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Mar 26, 2020 at 12:34 PM Stephen Frost <sfrost@snowman.net> wrote:
> > I do agree with excluding things like md5 and others that aren't good
> > options.  I wasn't saying we should necessarily exclude crc32c either..
> > but rather saying that it shouldn't be the default.
> >
> > Here's another way to look at it- where do we use crc32c today, and how
> > much data might we possibly be covering with that crc?
> 
> WAL record size is a 32-bit unsigned integer, so in theory, up to 4GB
> minus 1 byte. In practice, most of them are not more than a few
> hundred bytes, but the amount we might possibly be covering is a lot more.

Is it actually possible, today, in PG, to have a 4GB WAL record?
Judging this based on the WAL record size doesn't seem quite right.

> > Why was crc32c
> > picked for that purpose?
> 
> Because it was discovered that 64-bit CRC was too slow, per commit
> 21fda22ec46deb7734f793ef4d7fa6c226b4c78e.

... 15 years ago.  I actually find it pretty interesting that we started
out with a 64bit CRC there, I didn't know that was the case.  Also
interesting is that we had 64bit CRC code already.

> > If the individual who decided to pick crc32c
> > for that case was contemplating a checksum for up-to-1GB files, would
> > they have picked crc32c?  Seems unlikely to me.
> 
> It's hard to be sure what someone who isn't us would have done in some
> situation that they didn't face, but we do have the discussion thread:
> 
> https://www.postgresql.org/message-id/flat/9291.1117593389%40sss.pgh.pa.us#c4e413bbf3d7fbeced7786da1c3aca9c
> 
> The question of how much data is protected by the CRC was discussed,
> mostly in the first few messages, in general terms, but it doesn't
> seem to have covered the question very thoroughly. I'm sure we could
> each draw things from that discussion that support our view of the
> situation, but I'm not sure it would be very productive.

Interesting.

> What confuses me is that you seem to have a view of the upsides and
> downsides of these various algorithms that seems to me to be highly
> skewed. Like, suppose we change the default from CRC-32C to
> SHA-something. On the upside, the error detection rate will increase
> from 99.9999999+% to something much closer to 100%. On the downside,
> backups will get as much as 40-50% slower for some users. I hope we
> can agree that both detecting errors and taking backups quickly are
> important. However, it is hard for me to imagine that the typical user
> would want to pay even a 5-10% performance penalty when taking a
> backup in order to improve an error detection feature which they may
> not even use and which already has less than a one-in-a-billion chance
> of going wrong. We routinely reject features for causing, say, a 2%
> regression on general workloads. Base backup speed is probably less
> important than how many SELECT or INSERT queries you can pump through
> the system in a second, but it's still a pain point for lots of
> people. I think if you said to some users "hey, would you like to have
> error detection for your backups? it'll cost 10%" many people would
> say "yes, please." But I think if you went to the same users and said
> "hey, would you like to make the error detection for your backups
> better? it currently has a less than 1-in-a-billion chance of failing
> to detect random corruption, and you can reduce that by many orders of
> magnitude for an extra 10% on your backup time," I think the results
> would be much more mixed. Some people would like it, but certainly
> not everybody.

I think you're right that slowing down base backups is much less of an
issue than slowing down SELECT or INSERT workloads, but I do also
understand that it isn't completely unimportant, which is why having
options isn't
a bad idea here.  That said, the options presented for users should all
be reasonable options, and for the default we should pick something
sensible, erring on the "be safer" side, if anything.

There are lots of options for speeding up base backups, with this patch,
even if the default is to have a manifest with sha256 hashes- it could
be changed to some form of CRC, or changed to not have checksums, or
changed to not have a manifest.  Users will have options.
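
To make the range of per-file checksum choices concrete, here is a minimal
Python sketch. It is only an illustration and not the patch's code: the
function names and the set of algorithm names are assumptions chosen to
mirror the options mentioned in this thread (no checksum, CRC-32C, SHA-2
variants). The CRC-32C is a plain table-driven implementation, since
Python's zlib.crc32 uses a different polynomial.

```python
import hashlib

# Reflected form of the CRC-32C (Castagnoli) polynomial.
_CRC32C_POLY = 0x82F63B78


def _make_table():
    """Build the 256-entry lookup table for a byte-at-a-time CRC-32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Return the CRC-32C of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def file_checksum(data: bytes, algorithm: str) -> str:
    """Hypothetical helper: hex checksum of data for a named algorithm."""
    if algorithm == "none":
        return ""
    if algorithm == "crc32c":
        return format(crc32c(data), "08x")
    if algorithm in ("sha224", "sha256", "sha384", "sha512"):
        return hashlib.new(algorithm, data).hexdigest()
    raise ValueError("unknown checksum algorithm: " + algorithm)
```

The practical difference between the choices is the size of the resulting
value (8 hex digits for CRC-32C versus 64 for SHA-256) and the CPU cost per
byte, which is the trade-off being argued about above.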

Again, I'm not against having a checksum algorithm as an option.  I'm not
saying that it must be SHA512 as the default.

> > I'm not actually arguing about which hash functions we should support,
> > but rather what the default is and if crc32c, specifically, is actually
> > a reasonable choice.  Just because it's fast and we already had an
> > implementation of it doesn't justify its use as the default.  Given that
> > it doesn't actually provide the check that is generally expected of
> > CRC checksums (100% detection of single-bit errors) when the file size
> > gets over 512MB makes me wonder if we should have it at all, yes, but it
> > definitely makes me think it shouldn't be our default.
> 
> I mean, the property that I care about is the one where it detects
> better than 999,999,999 errors out of every 1,000,000,000, regardless
> of input length.

I really don't think that throwing these kinds of things around is useful.
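
For reference, the "one in a billion" figure quoted above is what falls out
of an idealized model in which a corrupted file yields a uniformly random
checksum value, so an n-bit checksum misses a fraction 2^-n of corruptions.
The sketch below only reproduces that arithmetic; the function name is
hypothetical, and the model says nothing about structured errors such as
particular bit-flip patterns in large files.

```python
from fractions import Fraction


def undetected_fraction(checksum_bits: int) -> Fraction:
    """Fraction of random corruptions missed, assuming the checksum of
    corrupted data is uniformly distributed over all n-bit values."""
    return Fraction(1, 2 ** checksum_bits)


crc32 = undetected_fraction(32)
sha256 = undetected_fraction(256)

# 2**32 is 4294967296, so the miss rate is below one in a billion.
print(crc32 < Fraction(1, 10 ** 9))
# Under the same model a 256-bit hash misses far fewer corruptions.
print(sha256 < crc32)
```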

> > I don't agree with limiting our view to only those algorithms that we've
> > already got implemented in PG.
> 
> I mean, opening that giant can of worms ~2 weeks before feature freeze
> is not very nice. This patch has been around for months, and the
> algorithms were openly discussed a long time ago. 

Yes, they were discussed before, and these issues were brought up before
and there was specifically concern brought up about exactly the same
issues that I'm repeating here.  Those concerns seem to have been
largely ignored, apparently with "we don't have that in PG today" as
at least one of the considerations- even though we used to.  I don't
think that was the right response and, yeah, I saw that you were
planning to commit and that prompted me to look into it right now.  I
don't think that's entirely uncommon around here.  I also had hoped that
David's concerns that were raised before had been heeded, as I knew he
was involved in the discussion previously, but that turns out to not
have been the case.

> > It's saying, removing the listing aspect, exactly that "backup_label is
> > excluded from verification".  That's what I am taking issue with.  I've
> > made multiple attempts to suggest other language to avoid saying that
> > because it's clearly wrong- the manifest is verified.
> 
> Well, it's talking about the particular kind of verification that has
> just been discussed, not any form of verification. As one idea,
> perhaps instead of:
> 
> + Certain files and directories are
> +   excluded from verification:
> 
> ...I could maybe insert a paragraph break there and then continue with
> something like this:
> 
> When pg_basebackup compares the files and directories in the manifest
> to those which are present on disk, it will ignore the presence of, or
> changes to, certain files:
> 
> backup_manifest will not be present in the manifest itself, and is
> therefore ignored. Note that the manifest is still verified
> internally, as described above, but no error will be issued about the
> presence of a backup_manifest file in the backup directory even though
> it is not listed in the manifest.
> 
> Would that be more clear? Do you want to suggest something else?

Yes, that looks fine.  Feels slightly redundant to include the "as
described above ..." bit, and I think that could be dropped, but up to
you.
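
The comparison described in the proposed wording can be sketched roughly as
follows. This is a hedged illustration in Python, not the tool's actual
logic: the function name is hypothetical, and the ignore set contains only
backup_manifest because that is the only file named in the text above.

```python
def compare_with_manifest(manifest_files, disk_files,
                          ignored=frozenset({"backup_manifest"})):
    """Return (missing, extra) path lists, skipping ignored paths.

    missing: listed in the manifest but not found on disk.
    extra:   found on disk but not listed in the manifest.
    """
    manifest = set(manifest_files) - set(ignored)
    on_disk = set(disk_files) - set(ignored)
    missing = sorted(manifest - on_disk)
    extra = sorted(on_disk - manifest)
    return missing, extra


missing, extra = compare_with_manifest(
    ["PG_VERSION", "base/1/1234"],
    ["PG_VERSION", "backup_manifest", "stray.txt"],
)
print(missing)  # ['base/1/1234']
print(extra)    # ['stray.txt']
```

Note that backup_manifest does not show up as an extra file even though it
is not listed in the manifest, which is the behavior the paragraph describes.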

> > I'm not talking about making sure that no error ever happens when doing
> > I'm saying that the existing tool that takes the backup has a *really*
> > *important* verification check that this proposed "validate backup" tool
> > doesn't have, and that isn't sensible.  It leads to situations where the
> > backup tool itself, pg_basebackup, can fail or be killed before it's
> > actually completed, and the "validate backup" tool would say that the
> > backup is perfectly fine.  That is not sensible.
> 
> If someone's procedure for taking and restoring backups involves not
> knowing whether or not pg_basebackup completed without error and then
> trying to use the backup anyway, they are doing something which is
> very foolish, and it's questionable whether any technological solution
> has much hope of getting them out of trouble. But on the plus side,
> this patch would have a good chance of detecting the problem, which is
> a noticeable improvement over what we have now, which has no chance of
> detecting the problem, because we have nothing.

This doesn't address my concern at all.  Even if it seems ridiculous and
foolish to think that a backup was successful when the system was
rebooted and pg_basebackup was killed before all of the WAL had made it
into pg_wal, there is absolutely zero doubt in my mind that it's going
to happen and users are going to, entirely reasonably, think that
pg_validatebackup at least includes all the checks that pg_basebackup
does about making sure that the backup is valid.

I really don't understand how we can have a backup validation tool that
doesn't do the absolute basics, like making sure that we have all of the
WAL for the backup.  I've routinely, almost jokingly, said to folks that
any backup tool that doesn't check that isn't really a backup tool, and
I was glad that pg_basebackup had that check, so, yeah, I'm going to
continue to object to committing a backup validation tool that doesn't
have that absolutely basic and necessary check.
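
To illustrate the kind of check I mean, here is a rough Python sketch of
verifying that every WAL segment between a backup's start and end LSN is
present. It is not what pg_basebackup does internally: the function names
are hypothetical, and it assumes a single timeline, the default 16MB
segment size, and that the start and end LSNs are already known.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'XXXXXXXX/YYYYYYYY' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, segno: int,
                  seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build the 24-hex-digit WAL segment file name for a segment number."""
    segs_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_id,
                             segno % segs_per_id)


def required_wal_files(timeline, start_lsn, end_lsn,
                       seg_size=WAL_SEGMENT_SIZE):
    """List the segment files covering start_lsn up to end_lsn."""
    first = parse_lsn(start_lsn) // seg_size
    # The end LSN points just past the last needed byte, so step back one.
    last = (parse_lsn(end_lsn) - 1) // seg_size
    return [wal_file_name(timeline, s, seg_size)
            for s in range(first, last + 1)]


def missing_wal_files(timeline, start_lsn, end_lsn, present):
    """Return required segment files that are not in the present set."""
    have = set(present)
    return [name
            for name in required_wal_files(timeline, start_lsn, end_lsn)
            if name not in have]


print(missing_wal_files(1, "0/2000028", "0/4000100",
                        ["000000010000000000000002",
                         "000000010000000000000004"]))
# ['000000010000000000000003']
```

A backup whose pg_basebackup was killed before all of the WAL arrived would
show up here as a non-empty list, which is exactly the case I'm worried
about.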

Thanks,

Stephen