Re: backup manifests


From: Robert Haas <robertmhaas@gmail.com>
To: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Cc: Rushabh Lathia <rushabh.lathia@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>, David Steele <david@pgmasters.net>
Date: 2019-11-22T15:58:27Z
Lists: pgsql-hackers

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On Tue, Nov 19, 2019 at 8:49 AM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> I admit I haven't been following along closely, but why do we need a
> cryptographic checksum here instead of, say, a CRC? Do we think that
> somehow the checksum might be forged? Use of cryptographic hashes as
> general purpose checksums has become far too common IMNSHO.

I tend to agree with you. I suspect that if we just use CRC, some people
are going to complain that they want something "stronger" because that
will make them feel better about error detection rates or obscure
threat models or whatever other things a SHA-based approach might be
able to catch that CRC would not catch. However, I think that for
normal use cases CRC would be totally adequate. The performance
overhead is almost none for CRC vs. a whole lot for SHA, at least in
this test setup (other results might vary depending on what you test),
which makes CRC look pretty appealing.
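
For anyone who wants to check the overhead difference on their own hardware, here is a rough sketch of one way to time the two approaches. It is not the test setup referred to above: the function name, the buffer size, and the use of Python's zlib CRC-32 and hashlib SHA-256 are all assumptions made for illustration, and the numbers it prints will depend on the machine and the data.

```python
import hashlib
import time
import zlib


def time_checksums(data: bytes, repeats: int = 5) -> dict:
    """Return the best-of-N wall-clock time in seconds for each algorithm.

    Hypothetical helper: "crc32" is zlib's CRC-32 and "sha256" is
    hashlib's SHA-256, chosen only as stand-ins for "a CRC" and
    "a SHA-based checksum".
    """
    candidates = {
        "crc32": lambda buf: zlib.crc32(buf),
        "sha256": lambda buf: hashlib.sha256(buf).digest(),
    }
    results = {}
    for name, func in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MiB of zeros, an arbitrary size
    for name, seconds in time_checksums(payload).items():
        print(f"{name}: {seconds:.4f} s")
```

This only measures the checksum computation in isolation, so it says nothing about how much of a full backup's runtime the checksum would account for.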

My gut reaction is to make CRC the default, but have an option that
you can use to either turn it off entirely (if even 1-2% is too much
for you) or opt in to SHA-something if you want it. I don't think we
should offer an option for MD5, because MD5 is a dirty word these days
and will cause problems for users who have to worry about FIPS 140-2
compliance. Phrased more positively, if you want a cryptographic hash
at all, you should probably use one that isn't widely viewed as too
weak.
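
To make the shape of that proposal concrete, here is a minimal sketch of the option handling, assuming a hypothetical function and hypothetical algorithm names. It is not code from any patch: zlib's CRC-32 stands in for whichever CRC would actually be chosen, and the SHA-2 variants listed are just one guess at what "SHA-something" could mean.

```python
import hashlib
import zlib

# Hypothetical set of choices: no checksum, a CRC (the default), or a
# SHA-2 hash. MD5 is deliberately absent, per the reasoning above.
ALLOWED_ALGORITHMS = ("none", "crc32", "sha224", "sha256", "sha384", "sha512")


def manifest_checksum(data: bytes, algorithm: str = "crc32"):
    """Return a hex checksum of data, or None if checksums are disabled."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    if algorithm == "none":
        return None
    if algorithm == "crc32":
        # zlib's CRC-32, used here only as a stand-in for "a CRC".
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    return hashlib.new(algorithm, data).hexdigest()


if __name__ == "__main__":
    sample = b"123456789"
    print(manifest_checksum(sample))            # default: CRC
    print(manifest_checksum(sample, "sha256"))  # opt in to SHA-256
    print(manifest_checksum(sample, "none"))    # checksums turned off
```

The point of the sketch is just the three-way choice: the cheap checksum is what you get if you say nothing, and both "off" and "stronger" are explicit opt-ins.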

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company