Re: backup manifests

From: David Steele <david@pgmasters.net>
To: Stephen Frost <sfrost@snowman.net>, Robert Haas <robertmhaas@gmail.com>
Cc: Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, tushar <tushar.ahuja@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-03-27T20:39:29Z
Lists: pgsql-hackers

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On 3/27/20 3:55 PM, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> I think that what we have seen so far is that all of the SHA-n
>> algorithms that PostgreSQL supports are about equally slow, so it
>> doesn't really matter which one you pick there from a performance
>> point of view. If you're not saying it has to be SHA-512 but you do
>> want it to be SHA-256, I don't think that really fixes anything. Using
>> CRC-32C does fix the performance issue, but I don't think you like
>> that, either. We could default to having no checksums at all, or even
>> no manifest at all, but I didn't get the impression that David, at
>> least, wanted to go that way, and I don't like it either. It's not the
>> world's best feature, but I think it's good enough to justify enabling
>> it by default. So I'm not sure we have any options here that will
>> satisfy you.
> 
> I do like having a manifest by default.  At this point it's pretty clear
> that we've just got a fundamental disagreement that more words aren't
> going to fix.  I'd rather we play it safe and use a sha256 hash and
> accept that it's going to be slower by default, and then give users an
> option to make it go faster if they want (though I'd much rather that
> alternative be a 64bit CRC than a 32bit one).
> 
> Andres seems to agree with you.  I'm not sure where David sits on this
> specific question.

I would prefer a stronger checksum as the default, but I would be fine
with SHA1, which is a bit faster.

I believe the overhead of checksums is being overstated. In my experience,
the vast majority of users compress their backups and run them over a
network. Once you have done those two things, the cost of SHA1 is pretty
negligible. As I posted way up-thread, we found that gzip -6 alone pushed
the cost of SHA1 below 3%, and that did not include network transfer.
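
For anyone who wants to check the relative cost on their own hardware, here is a minimal sketch in Python. It is not the benchmark referred to above: it assumes zlib at level 6 is a fair stand-in for gzip -6 (both use DEFLATE), it runs on synthetic in-memory data rather than a real cluster, and the helper names (time_call, checksum_overhead, make_sample) are made up for illustration. The fraction it reports will vary a lot with how compressible the data is.

```python
import hashlib
import os
import time
import zlib


def time_call(func, data, repeats=3):
    """Return the best wall-clock time in seconds of func(data) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


def checksum_overhead(data, level=6):
    """Return SHA-1 time as a fraction of combined compression + SHA-1 time.

    Assumption: zlib.compress at the given level approximates `gzip -<level>`.
    """
    t_compress = time_call(lambda d: zlib.compress(d, level), data)
    t_sha1 = time_call(lambda d: hashlib.sha1(d).digest(), data)
    return t_sha1 / (t_compress + t_sha1)


def make_sample(size=8 * 1024 * 1024):
    """Build hypothetical test data: half incompressible bytes, half zeros."""
    half = size // 2
    return os.urandom(half) + bytes(size - half)


if __name__ == "__main__":
    sample = make_sample()
    fraction = checksum_overhead(sample)
    print(f"SHA-1 share of compress+checksum time: {fraction:.1%}")
```

Network transfer is deliberately left out here, as it was in the number quoted above, so the share seen in a real backup over a network should be lower still.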

Regards,
-- 
-David
david@pgmasters.net