Re: documenting the backup manifest file format

David Steele <david@pgmasters.net>

From: David Steele <david@pgmasters.net>
To: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>, Robert Haas <robertmhaas@gmail.com>
Cc: Alvaro Herrera <alvherre@2ndquadrant.com>, Justin Pryzby <pryzby@telsasoft.com>, Andres Freund <andres@anarazel.de>, Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, tushar <tushar.ahuja@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-04-15T22:54:14Z
Lists: pgsql-hackers

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On 4/15/20 6:43 PM, Jehan-Guillaume de Rorthais wrote:
> On Wed, 15 Apr 2020 12:03:28 -0400
> Robert Haas <robertmhaas@gmail.com> wrote:
> 
>> On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
>> <jgdr@dalibo.com> wrote:
>>> But for backup_manifest, it's kind of a shame we have to check the checksum
>>> against a transformed version of the file. Did you consider creating e.g. a
>>> separate backup_manifest.sha256 file?
>>>
>>> I'm very sorry in advance if this has been discussed previously.
>>
>> It was briefly mentioned in the original (lengthy) discussion, but I
>> think there was one vote in favor and two votes against or something
>> like that, so it didn't go anywhere.
> 
> Argh.
> 
>> I didn't realize that there were handy command-line tools for manipulating
>> json like that, or I probably would have considered that idea more strongly.
> 
> That was indeed a lengthy thread with various details discussed. I'm sorry I
> didn't catch the ball back then.

One of the reasons to use JSON was so that command-line tools like jq
could be used to work with the manifest (I use it myself). But I think
only the pg_verifybackup tool should be used to verify the internal
checksum.
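
As a rough illustration of the kind of task JSON makes easy, here is a
minimal Python sketch that summarizes a manifest, much as one might do
with jq. The helper name summarize_manifest and the SAMPLE data are made
up for this example (the sizes and checksums are placeholders, not from a
real backup); only the key names follow the manifest format. It reads the
manifest but does not verify anything.

```python
import json


def summarize_manifest(manifest_text):
    """Return (file_count, total_bytes, checksum_algorithms) for manifest JSON text."""
    manifest = json.loads(manifest_text)
    files = manifest.get("Files", [])
    # Sum the recorded sizes of all files listed in the manifest.
    total_bytes = sum(entry.get("Size", 0) for entry in files)
    # Collect the distinct per-file checksum algorithms, if any are recorded.
    algorithms = sorted(
        {entry["Checksum-Algorithm"] for entry in files if "Checksum-Algorithm" in entry}
    )
    return len(files), total_bytes, algorithms


# Hypothetical, hand-written manifest: all values below are placeholders.
SAMPLE = """
{ "PostgreSQL-Backup-Manifest-Version": 1,
  "Files": [
    { "Path": "backup_label", "Size": 224,
      "Last-Modified": "2020-04-15 22:54:14 GMT",
      "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" },
    { "Path": "global/pg_control", "Size": 8192,
      "Last-Modified": "2020-04-15 22:54:14 GMT",
      "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" }
  ],
  "WAL-Ranges": [],
  "Manifest-Checksum": "0000"
}
"""

if __name__ == "__main__":
    count, size, algos = summarize_manifest(SAMPLE)
    print(f"{count} files, {size} bytes, checksums: {', '.join(algos)}")
```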

Two thoughts:

1) You can always generate an external checksum when you generate the 
backup if you want to do your own verification without running 
pg_verifybackup.

2) Perhaps it would be good if the pg_verifybackup command had a 
--verify-manifest-checksum option (or something) to check that the 
manifest file looks valid without checking any files. That's not going 
to happen for PG13, but it's possible for PG14.
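
To make point 1 concrete, here is a minimal sketch of an external
checksum, assuming a sidecar file named backup_manifest.sha256 as
suggested earlier in the thread. The function names are hypothetical and
this is not something pg_basebackup or pg_verifybackup does for you. It
hashes the raw bytes of the manifest as written, so it only tells you the
manifest file itself is unchanged; it says nothing about the other files
in the backup.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_path(manifest_path):
    """Sidecar lives next to the manifest, e.g. backup_manifest.sha256."""
    manifest_path = Path(manifest_path)
    return manifest_path.with_name(manifest_path.name + ".sha256")


def write_sidecar(manifest_path):
    """Record the manifest's digest in '<hex>  <filename>' form."""
    manifest_path = Path(manifest_path)
    sidecar = sidecar_path(manifest_path)
    sidecar.write_text(f"{sha256_of_file(manifest_path)}  {manifest_path.name}\n")
    return sidecar


def check_sidecar(manifest_path):
    """Return True if the manifest still matches the recorded digest."""
    recorded = sidecar_path(manifest_path).read_text().split()[0]
    return recorded == sha256_of_file(manifest_path)
```

Run write_sidecar() right after taking the backup and check_sidecar()
before relying on it. The '<hex>  <filename>' layout is chosen so that
the sidecar should also be readable by sha256sum -c.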

Regards,
-- 
-David
david@pgmasters.net