Re: documenting the backup manifest file format
David Steele <david@pgmasters.net>
Commits
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
On 4/13/20 4:14 PM, Robert Haas wrote:
> On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
>> Also, I
>> see no mention of prettification-chars such as newlines or indentation.
>> I suppose if I pass a manifest file through prettification (or Windows
>> newline conversion), the checksum may break.
>
> It would indeed break. I'm not sure what you want me to say here,
> though. If you're trying to parse a manifest, you shouldn't care about
> how the whitespace is arranged. If you're trying to generate one, you
> can arrange it any way you like, as long as you also include it in the
> checksum.

pgBackRest ignores whitespace but this is a legacy of the way Perl calculated checksums, not an intentional feature. This worked well when the manifest was loaded as a whole, converted to JSON, and checksummed, but it is a major pain for the streaming code we now have in C.

I guarantee that our next manifest version will do a simple checksum of bytes as Robert has done in this feature.

So, I'm +1 as implemented.

>> Why is the top-level checksum only allowed to be SHA-256, if the files
>> can use up to SHA-512?

<snip>

> I agree that it's a little bit weird that you can have a stronger
> checksum for the files instead of the manifest itself, but I also
> wonder what the use case would be for using a stronger checksum on the
> manifest. David Steele argued that strong checksums on the files could
> be useful to software that wants to rifle through all the backups
> you've ever taken and find another copy of that file by looking for
> something with a matching checksum. CRC-32C wouldn't be strong enough
> for that, because eventually you could have enough files that you
> start to have collisions. The SHA algorithms output enough bits to
> make that quite unlikely. But this argument only makes sense for the
> files, not the manifest.

Agreed. I think SHA-256 is *more* than enough to protect the manifest against corruption.

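To put rough numbers on the collision argument quoted above, here is a minimal sketch using the birthday approximation. It assumes an idealized hash whose outputs are uniformly distributed, which CRC-32C is not designed to be, so its figure is only a ballpark. The function name `collision_probability` and the file count are hypothetical and are not taken from PostgreSQL or pgBackRest.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_items values
    drawn uniformly at random from a space of 2**hash_bits values.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2 * 2**bits)).
    math.expm1 keeps precision when the probability is extremely small.
    """
    pairs = n_items * (n_items - 1) / 2
    return -math.expm1(-pairs / 2.0 ** hash_bits)


if __name__ == "__main__":
    # Hypothetical number of distinct files across all retained backups.
    n_files = 100_000
    for name, bits in [("CRC-32C", 32), ("SHA-256", 256), ("SHA-512", 512)]:
        p = collision_probability(n_files, bits)
        print(f"{name:8s} ({bits:3d} bits): {p:.3g}")
```

Under this model, a 32-bit checksum over a hundred thousand files already has a collision chance well above one half, while for SHA-256 the chance is vanishingly small. That is why the argument applies to matching files across backups and not to the single manifest checksum.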
That said, since the cost of SHA-256 vs. SHA-512 in the context of the manifest is negligible we could just use the stronger algorithm to deflect a similar question going forward.

That choice might not age well, but we could always say, well, we picked it because it was the strongest available at the time. Allowing a choice of which algorithm to use for the manifest checksum seems like it will just make verifying the file harder with no tangible benefit.

Maybe just a comment in the docs about why SHA-256 was used would be fine.

>> (Also, did we intentionally omit the dash in
>> hash names, so "SHA-256" to make it SHA256? This will also be critical
>> for checksumming the manifest itself.)
>
> I debated this with myself, settled on this spelling, and nobody
> complained until now. It could be changed, though. I didn't have any
> particular reason for choosing it except the feeling that people would
> probably prefer to type --manifest-checksum=sha256 rather than
> --manifest-checksum=sha-256.

+1 for sha256 rather than sha-256.

Regards,
--
-David
david@pgmasters.net