Re: backup manifests

From: David Steele <david@pgmasters.net>
To: Robert Haas <robertmhaas@gmail.com>, Stephen Frost <sfrost@snowman.net>
Cc: Andres Freund <andres@anarazel.de>, Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, tushar <tushar.ahuja@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-03-27T20:16:11Z
Lists: pgsql-hackers

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On 3/27/20 3:29 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
>>> Seems better to (later?) add support for generating manifests for WAL
>>> files, and then have a tool that can verify all the manifests required
>>> to restore a base backup.
>>
>> I'm not trying to expand on the feature set here or move the goalposts
>> way down the road, which seems to be what's being suggested
>> here.  To be clear, I don't have any objection to adding a generic tool
>> for validating WAL as you're talking about here, but I also don't think
>> that's required for pg_validatebackup.  What I do think we need is a
>> check of the WAL that's fetched when people use pg_basebackup -Xstream
>> or -Xfetch.  pg_basebackup itself has that check because it's critical
>> to the backup being successful and valid.  Not having that basic
>> validation of a backup really just isn't OK; there's a reason
>> pg_basebackup has that check.
> 
> I don't understand how this could be done without significantly
> complicating the architecture. As I said before, -Xstream sends WAL
> over a separate connection that is unrelated to the one running
> BASE_BACKUP, so the base-backup connection doesn't know what to
> include in the manifest. Now you could do something like: once all of
> the WAL files have been fetched, the client checksums all of those and
> sends their names and checksums to the server, which turns around and
> puts them into the manifest, which it then sends back to the client.
> But that is actually quite a bit of additional complexity, and it's
> pretty strange, too, because now you have the client checksumming some
> files and the server checksumming others. I know you mentioned a few
> different ideas before, but I think they all kinda have some problem
> along these lines.
> 
> I also kinda disagree with the idea that the WAL should be considered
> an integral part of the backup. I don't know how pgbackrest does
> things, 

We checksum each WAL file while it is read and transmitted to the repo 
by the archive_command.  Then at the end of the backup we ensure that 
all the WAL required to make the backup consistent has made it to the repo.
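
To illustrate the two steps described above (hash while copying, then a 
completeness check at the end of the backup), here is a minimal Python 
sketch. It is not pgBackRest code: the function names, the repo layout, 
and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read size; arbitrary choice for the sketch


def push_wal_segment(src, repo_dir):
    """Copy one WAL segment into repo_dir, hashing the bytes as they pass.

    Hypothetical helper: returns the hex digest so the caller can record it.
    """
    src = Path(src)
    repo_dir = Path(repo_dir)
    repo_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()  # hash algorithm is an assumption
    with src.open("rb") as fin, (repo_dir / src.name).open("wb") as fout:
        while True:
            chunk = fin.read(CHUNK_SIZE)
            if not chunk:
                break
            digest.update(chunk)  # checksum the same bytes we write
            fout.write(chunk)
    return digest.hexdigest()


def missing_wal(required_names, repo_dir):
    """Return the required segment names that never reached repo_dir."""
    repo_dir = Path(repo_dir)
    return [name for name in required_names if not (repo_dir / name).is_file()]
```

The point of hashing inside the copy loop is that the file is read only 
once; the end-of-backup check then only has to compare names.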

> but BART stores each backup in a separate directory without any
> associated WAL, and then keeps all the WAL together in a different
> directory. I imagine that people who are using continuous archiving
> also tend to use -Xnone, or if they do backups by copying the files
> rather than using pgBackRest, they exclude pg_wal. In fact, for
> people with big, important databases, I'd assume that would be the
> normal pattern. You presumably wouldn't want to keep one copy of the
> WAL files taken during the backup with the backup itself, and a
> separate copy in the archive.

pgBackRest does provide the option to copy WAL into the backup directory 
for the super-paranoid, though it is not the default. It is pretty handy 
for moving individual backups to some other medium like tape, though.

If -Xnone is specified then it seems like pg_validatebackup is 
completely off the hook.  But in the case of -Xstream or -Xfetch 
couldn't we at least verify that the expected WAL segments are present 
and the correct size?
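
A presence-and-size check of that kind could look roughly like the 
Python sketch below. This is not how pg_validatebackup works; the 
function name and arguments are hypothetical, and the 16MB segment size 
is just the default, so a cluster initialized with a different WAL 
segment size would need to pass its own value.

```python
from pathlib import Path

DEFAULT_WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default; clusters may differ


def check_wal_segments(wal_dir, expected_names,
                       segment_size=DEFAULT_WAL_SEGMENT_SIZE):
    """Return a list of problems found; an empty list means the check passed.

    expected_names is assumed to be supplied by the caller (for example,
    derived from the backup's start and stop positions).
    """
    problems = []
    for name in expected_names:
        path = Path(wal_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        actual = path.stat().st_size
        if actual != segment_size:
            problems.append(f"{name}: size {actual}, expected {segment_size}")
    return problems
```

This only shows that the files exist and are full-length; it says 
nothing about whether their contents are intact.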

Storing the start/stop LSN in the manifest would be a nice thing to have 
anyway, and that would make this feature pretty trivial. Yeah, that's in 
the backup_label file as well, but the manifest is so much easier to read.
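
As a sketch of why start/stop LSNs make this easy: given a timeline and 
the two LSNs, the expected segment file names follow from arithmetic. 
The Python below assumes a single timeline for the whole backup and 
takes the LSNs as "X/Y" hex strings; the function names are made up for 
the example, and treating a stop LSN that falls exactly on a segment 
boundary as belonging to the previous segment is my assumption about 
the desired behavior.

```python
DEFAULT_WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default; clusters may differ


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline, segno, segment_size=DEFAULT_WAL_SEGMENT_SIZE):
    """Build the 24-hex-digit WAL segment file name for a segment number."""
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (timeline,
                             segno // segments_per_id,
                             segno % segments_per_id)


def required_segments(timeline, start_lsn, stop_lsn,
                      segment_size=DEFAULT_WAL_SEGMENT_SIZE):
    """List segment names covering start_lsn up to (not including) stop_lsn."""
    start = parse_lsn(start_lsn)
    stop = parse_lsn(stop_lsn)
    if stop <= start:
        raise ValueError("stop LSN must be greater than start LSN")
    first = start // segment_size
    last = (stop - 1) // segment_size  # boundary case maps to previous segment
    return [wal_file_name(timeline, segno, segment_size)
            for segno in range(first, last + 1)]
```

For example, required_segments(1, "0/2000028", "0/2000100") yields the 
single name 000000010000000000000002.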

Regards,
-- 
-David
david@pgmasters.net