Re: backup manifests

David Steele <david@pgmasters.net>

From: David Steele <david@pgmasters.net>
To: Robert Haas <robertmhaas@gmail.com>, Andres Freund <andres@anarazel.de>
Cc: Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, tushar <tushar.ahuja@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-03-27T20:02:00Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On 3/27/20 3:20 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
> 
>> Hm. Is it a great choice to include the checksum for the manifest inside
>> the manifest itself? With a cryptographic checksum it seems like it
>> could make a ton of sense to store the checksum somewhere "safe", but
>> keep the manifest itself alongside the base backup itself. While not
>> huge, they won't be tiny either.
> 
> Seems like the user could just copy the manifest checksum and store it
> somewhere, if they wish. Then they can check it against the manifest
> itself later, if they wish. Or they can take a SHA-512 of the whole
> file and store that securely. The problem is that we have no idea how
> to write that checksum to a more security storage. We could write
> backup_manifest and backup_manifest.checksum into separate files, but
> that seems like it's adding complexity without any real benefit.

I agree that this seems like a separate problem. What Robert has done 
here is detect random mutilation of the manifest.

To prevent malicious modifications you either need to store the checksum 
in another place, or digitally sign the file and store that alongside it 
(or inside it even). Either way seems pretty far out of scope to me.

>> Hm. I think it'd be good to verify that the checksummed size is the same
>> as the size of the file in the manifest.
> 
> That's checked in an earlier phase. Are you worried about the file
> being modified after the first pass checks the size and before we come
> through to do the checksumming?

I prefer to validate the size and checksum in the same pass, but I'm not 
sure it's that big a deal.  If the backup is being corrupted under the 
validate process that would also apply to files that had already been 
validated.

Regards,
-- 
-David
david@pgmasters.net