Re: backup manifests
Robert Haas <robertmhaas@gmail.com>
Commits
- Try to avoid compiler warnings in optimized builds. (05021a2c0cd2, 13.0, landed)
- Fix option related issues in pg_verifybackup. (0a89e93bfaa6, 13.0, landed)
- Add index term for backup manifest in documentation. (4db819ba4039, 13.0, landed)
- Code review for backup manifest. (a2ac73e7be7a, 13.0, landed)
- Document the backup manifest file format. (149f2ae88ab0, 13.0, landed)
- Fix typo in pg_validatebackup documentation. (c4f82a779d26, 13.0, landed)
- Exclude backup_manifest file that existed in database, from BASE_BACKUP. (1ec50a81ec0a, 13.0, landed)
- Msys2 tweaks for pg_validatebackup corruption test (c3e4cbaab936, 13.0, landed)
- Fix resource management bug with replication=database. (3e0d80fd8d3d, 13.0, cited)
- Be more careful about time_t vs. pg_time_t in basebackup.c. (db1531cae009, 13.0, cited)
- pg_validatebackup: Fix 'make clean' to remove tmp_check. (9f8f881caa0f, 13.0, landed)
- pg_validatebackup: Also use perl2host in TAP tests. (460314db08e8, 13.0, landed)
- Generate backup manifests for base backups, and validate them. (0d8c9c1210c4, 13.0, landed)
- Add checksum helper functions. (c12e43a2e0d4, 13.0, landed)
- pg_waldump: Add a --quiet option. (ac44367efbef, 13.0, landed)
- Catversion bump for b9b408c48724 (afb5465e0cfc, 13.0, cited)
- pg_basebackup: Refactor code for reading COPY and tar data. (431ba7bebf13, 13.0, landed)
- Use a ResourceOwner to track buffer pins in all cases. (3cb646264e8c, 12.0, cited)
- Use ARMv8 CRC instructions where available. (f044d71e331d, 11.0, cited)
- Logical replication support for initial data copy (7c4f52409a8c, 10.0, cited)
- Use Intel SSE 4.2 CRC instructions where available. (3dc2d62d0486, 9.5.0, cited)
- Switch to CRC-32C in WAL and other places. (5028f22f6eb0, 9.5.0, cited)
- Remove support for 64-bit CRC. (404bc51cde9d, 9.5.0, cited)
- Change CRCs in WAL records from 64bit to 32bit for performance reasons. (21fda22ec46d, 8.1.0, cited)

On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <noah@leadboat.com> wrote:
> Stephen Frost mentioned that a backup could pass validation even if
> pg_basebackup were killed after writing the base backup and before finishing
> the writing of pg_wal. One might avoid that by simply writing the manifest to
> a temporary name and renaming it to the final name after populating pg_wal.

Huh, that's an idea. I'll have a look at the code and see what would
be involved.

> What do you think of having the verification process also call pg_waldump to
> validate the WAL CRCs (shown upthread)? That looked helpful and simple.

I don't love calls to external binaries, but I think the thing that
really bothers me is that pg_waldump is practically bound to terminate
with an error, because the last WAL segment will end with a partial
record. For the same reason, I think there's really no such thing as
validating a single WAL file. I suppose you'd need to know the exact
start and end locations for a minimal WAL replay and check that all
records between those LSNs appear OK, ignoring any apparent problems
after the minimum ending point, or at least ignoring any problems due
to an incomplete record in the last file. We don't have a tool for
that currently, and I don't think I can write one this week. Or at
least, not a good one.

> I think this functionality doesn't belong in its own program. If you suspect
> pg_basebackup or pg_restore will eventually gain the ability to merge
> incremental backups into a recovery-ready base backup, I would put the
> functionality in that program. Otherwise, I would put it in pg_checksums.
> For me, part of the friction here is that the program description indicates
> general verification, but the actual functionality merely checks hashes on a
> directory tree that happens to represent a PostgreSQL base backup.

Suraj's original patch made this part of pg_basebackup, but I didn't
really like that, because I wanted it to have its own set of options.
I still think all the options I've added are pretty useful ones, and I
can think of other things somebody might want to do. It feels very
uncomfortable to make pg_basebackup, or pg_checksums, take either
options from set A and do thing X, or options from set B and do thing
Y. But it feels clear that the name pg_validatebackup is not going
over very well with anyone. I think I should rename it to
pg_validatemanifest.

> > + parse->pathname = palloc(raw_length + 1);
>
> I don't see this freed anywhere; is it? (It's useful to make peak memory
> consumption not grow in proportion to the number of files backed up.)

We need the hash table to remain populated for the whole run time of
the tool, because we're essentially doing a full join of the actual
directory contents against the manifest contents. That's a bit
unfortunate but it doesn't seem simple to improve. I think the only
people who are really going to suffer are people who have an enormous
pile of empty or nearly-empty relations. People who have large
databases for the normal reason - i.e. a reasonable number of tables
that hold a lot of data - will have manifests of very manageable size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
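
For readers following the memory discussion, the "full join" of directory contents against manifest contents might look roughly like the sketch below. It is a Python illustration only, not the tool's actual C code: the function name compare_backup_to_manifest and the simplified manifest shape (a dict of relative path to expected size) are assumptions made for the example, and checksums are left out. It shows why every manifest entry has to stay in memory until the directory walk is finished.

```python
import os


def compare_backup_to_manifest(backup_dir, manifest_sizes):
    """Full join of the files on disk against the manifest entries.

    manifest_sizes is a hypothetical, simplified stand-in for a parsed
    manifest: a dict mapping relative path -> expected size in bytes.
    Returns (missing, extra, mismatched) as sorted lists of relative paths.
    """
    seen = set()       # manifest entries that were matched by a file on disk
    extra = []         # files on disk with no manifest entry
    mismatched = []    # files whose size differs from the manifest

    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, backup_dir).replace(os.sep, "/")
            expected = manifest_sizes.get(rel)
            if expected is None:
                extra.append(rel)
                continue
            seen.add(rel)
            if os.path.getsize(full) != expected:
                mismatched.append(rel)

    # Only after the whole walk can we tell which manifest entries were
    # never matched, so the table must stay populated until this point.
    missing = sorted(set(manifest_sizes) - seen)
    return missing, sorted(extra), sorted(mismatched)
```

The lookup table grows with the number of files rather than with the amount of data, which matches the observation that many empty or nearly-empty relations are the expensive case.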