Re: documenting the backup manifest file format

Jehan-Guillaume de Rorthais <jgdr@dalibo.com>

From: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
To: David Steele <david@pgmasters.net>
Cc: Robert Haas <robertmhaas@gmail.com>, Alvaro Herrera <alvherre@2ndquadrant.com>, Justin Pryzby <pryzby@telsasoft.com>, Andres Freund <andres@anarazel.de>, Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, tushar <tushar.ahuja@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-04-16T22:23:27Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On Wed, 15 Apr 2020 18:54:14 -0400
David Steele <david@pgmasters.net> wrote:

> On 4/15/20 6:43 PM, Jehan-Guillaume de Rorthais wrote:
> > On Wed, 15 Apr 2020 12:03:28 -0400
> > Robert Haas <robertmhaas@gmail.com> wrote:
> >   
> >> On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
> >> <jgdr@dalibo.com> wrote:  
> >>> But for backup_manifest, it's kind of shame we have to check the checksum
> >>> against an transformed version of the file. Did you consider creating eg.
> >>> a separate backup_manifest.sha256 file?
> >>>
> >>> I'm very sorry in advance if this has been discussed previously.  
> >>
> >> It was briefly mentioned in the original (lengthy) discussion, but I
> >> think there was one vote in favor and two votes against or something
> >> like that, so it didn't go anywhere.  
> > 
> > Argh.
> >   
> >> I didn't realize that there were handy command-line tools for manipulating
> >> json like that, or I probably would have considered that idea more
> >> strongly.  
> > 
> > That was indeed a lengthy thread with various details discussed. I'm sorry I
> > didn't catch the ball back then.  
> 
> One of the reasons to use JSON was to be able to use command line tools 
> like jq to do tasks (I use it myself).

That's perfectly fine. I was only wondering about having the manifest checksum
outside of the manifest itself.

> But I think only the pg_verifybackup tool should be used to verify the
> internal checksum.

true.

> Two thoughts:
> 
> 1) You can always generate an external checksum when you generate the 
> backup if you want to do your own verification without running 
> pg_verifybackup.

Sure, but by the time I want to produce an external checksum, the manifest
would have travel around quite a bit with various danger on its way to corrupt
it. Checksuming it from the original process that produced it sounds safer.

> 2) Perhaps it would be good if the pg_verifybackup command had a 
> --verify-manifest-checksum option (or something) to check that the 
> manifest file looks valid without checking any files. That's not going 
> to happen for PG13, but it's possible for PG14.

Sure.

I just liked the idea to be able to check the manifest using an external
command line implementing the same standardized checksum algo. Without editing
the manifest first. But I understand it's too late to discuss this now.

Regards,