Re: backup manifests
David Steele <david@pgmasters.net>
Commits
- Try to avoid compiler warnings in optimized builds. (05021a2c0cd2, 13.0, landed)
- Fix option related issues in pg_verifybackup. (0a89e93bfaa6, 13.0, landed)
- Add index term for backup manifest in documentation. (4db819ba4039, 13.0, landed)
- Code review for backup manifest. (a2ac73e7be7a, 13.0, landed)
- Document the backup manifest file format. (149f2ae88ab0, 13.0, landed)
- Fix typo in pg_validatebackup documentation. (c4f82a779d26, 13.0, landed)
- Exclude backup_manifest file that existed in database, from BASE_BACKUP. (1ec50a81ec0a, 13.0, landed)
- Msys2 tweaks for pg_validatebackup corruption test (c3e4cbaab936, 13.0, landed)
- Fix resource management bug with replication=database. (3e0d80fd8d3d, 13.0, cited)
- Be more careful about time_t vs. pg_time_t in basebackup.c. (db1531cae009, 13.0, cited)
- pg_validatebackup: Fix 'make clean' to remove tmp_check. (9f8f881caa0f, 13.0, landed)
- pg_validatebackup: Also use perl2host in TAP tests. (460314db08e8, 13.0, landed)
- Generate backup manifests for base backups, and validate them. (0d8c9c1210c4, 13.0, landed)
- Add checksum helper functions. (c12e43a2e0d4, 13.0, landed)
- pg_waldump: Add a --quiet option. (ac44367efbef, 13.0, landed)
- Catversion bump for b9b408c48724 (afb5465e0cfc, 13.0, cited)
- pg_basebackup: Refactor code for reading COPY and tar data. (431ba7bebf13, 13.0, landed)
- Use a ResourceOwner to track buffer pins in all cases. (3cb646264e8c, 12.0, cited)
- Use ARMv8 CRC instructions where available. (f044d71e331d, 11.0, cited)
- Logical replication support for initial data copy (7c4f52409a8c, 10.0, cited)
- Use Intel SSE 4.2 CRC instructions where available. (3dc2d62d0486, 9.5.0, cited)
- Switch to CRC-32C in WAL and other places. (5028f22f6eb0, 9.5.0, cited)
- Remove support for 64-bit CRC. (404bc51cde9d, 9.5.0, cited)
- Change CRCs in WAL records from 64bit to 32bit for performance reasons. (21fda22ec46d, 8.1.0, cited)

Hi Robert,

On 9/19/19 9:51 AM, Robert Haas wrote:
> On Wed, Sep 18, 2019 at 9:11 PM David Steele <david@pgmasters.net> wrote:
>> Also consider adding the timestamp.
>
> Sounds reasonable, even if only for the benefit of humans who might
> look at the file. We can decide later whether to use it for anything
> else (and third-party tools could make different decisions from core).
> I assume we're talking about file mtime here, not file ctime or file
> atime or the time the manifest was generated, but let me know if I'm
> wrong.

In my experience only mtime is useful.

>> Based on my original calculations (which sadly I don't have anymore),
>> the combination of SHA1, size, and file name is *extremely* unlikely to
>> generate a collision. As in, unlikely to happen before the end of the
>> universe kind of unlikely. Though, I guess it depends on your
>> expectations for the lifetime of the universe.
> What I'd say is: if
> the probability of getting a collision is demonstrably many orders of
> magnitude less than the probability of the disk writing the block
> incorrectly, then I think we're probably reasonably OK. Somebody might
> differ, which is perhaps a mild point in favor of LSN-based
> approaches, but as a practical matter, if a bad block is a billion
> times more likely to be the result of a disk error than a checksum
> mismatch, then it's a negligible risk.

Agreed.

>> We include the version/sysid of the cluster to avoid mixups. It's a
>> great extra check on top of references to be sure everything is kosher.
>
> I don't think it's a good idea to duplicate the information that's
> already in the backup_label. Storing two copies of the same
> information is just an invitation to having to worry about what
> happens if they don't agree.

OK, but now we have backup_label, tablespace_map,
XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.backup (in the WAL) and now perhaps a
backup.manifest file. I feel like we may be drowning in backup info files.
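
For a rough sense of the collision odds discussed above, here is a minimal sketch of a birthday-style upper bound. It is not the original calculation (which is lost), and every figure in it is an assumption chosen for illustration: the file count, the treatment of SHA1 as an ideal 160-bit random function, and the disk error rate. It also ignores the size and file name, which would only make an accidental collision less likely, and it says nothing about deliberately constructed collisions.

```python
from fractions import Fraction


def birthday_collision_bound(num_files: int, hash_bits: int) -> Fraction:
    """Union bound on the chance that any two of num_files distinct
    inputs share a digest, assuming an ideal hash of hash_bits bits."""
    pairs = num_files * (num_files - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


# Illustrative assumptions only, not measurements.
NUM_FILES = 10 ** 9                               # hypothetical file count
SHA1_BITS = 160                                   # SHA1 digest length
ASSUMED_DISK_ERROR_RATE = Fraction(1, 10 ** 15)   # hypothetical error rate

bound = birthday_collision_bound(NUM_FILES, SHA1_BITS)
ratio = ASSUMED_DISK_ERROR_RATE / bound

print(f"collision bound: {float(bound):.3e}")
print(f"assumed disk error rate is {float(ratio):.3e} times larger")
```

Under these assumptions the disk error rate exceeds the collision bound by far more than the factor of a billion mentioned in the quoted text.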

>> I'd
>> recommend JSON for the format since it is so ubiquitous and easily
>> handles escaping which can be gotchas in a home-grown format. We
>> currently have a format that is a combination of Windows INI and JSON
>> (for human-readability in theory) and we have become painfully aware of
>> escaping issues. Really, why would you drop files with '=' in their
>> name in PGDATA? And yet it happens.
>
> I am not crazy about JSON because it requires that I get a json parser
> into src/common, which I could do, but given the possibly-imminent end
> of the universe, I'm not sure it's the greatest use of time. You're
> right that if we pick an ad-hoc format, we've got to worry about
> escaping, which isn't lovely.

My experience is that JSON is simple to implement and has already dealt
with escaping and data structure considerations. A home-grown solution
will be at least as complex but have the disadvantage of being non-standard.

>>> One thing I'm not quite sure about is where to store the backup
>>> manifest. If you take a base backup in tar format, you get base.tar,
>>> pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
>>> Does the backup manifest go into base.tar? Get written into a separate
>>> file outside of any tar archive? Something else? And what about a
>>> plain-format backup? I suppose then we should just write the manifest
>>> into the top level of the main data directory, but perhaps someone has
>>> another idea.
>>
>> We do:
>>
>> [backup_label]/
>>     backup.manifest
>>     pg_data/
>>     pg_tblspc/
>>
>> In general, having the manifest easily accessible is ideal.
>
> That's a fine choice for a tool, but a I'm talking about something
> that is part of the actual backup format supported by PostgreSQL, not
> what a tool might wrap around it. The choice is whether, for a
> tar-format backup, the manifest goes inside a tar file or as a
> separate file. To put that another way, a patch adding backup
> manifests does not get to redesign where pg_basebackup puts anything
> else; it only gets to decide where to put the manifest.

Fair enough. The point is to make the manifest easily accessible. I'd
keep it in the data directory for file-based backups and as a separate
file for tar-based backups. The advantage here is that we can pick a
file name that becomes reserved which a tool can't do.

Regards,
-- 
-David
david@pgmasters.net
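
To illustrate the escaping point made above, here is a minimal sketch of a JSON manifest round trip next to a naive key=value line. The field names ("path", "size", "mtime", "sha1") and the `manifest_entry` helper are hypothetical, chosen only for this example; they are not the format proposed in the thread, nor the format of any existing tool.

```python
import hashlib
import json


def manifest_entry(path: str, content: bytes, mtime: str) -> dict:
    """Build one hypothetical manifest record for a file."""
    return {
        "path": path,
        "size": len(content),
        "mtime": mtime,
        "sha1": hashlib.sha1(content).hexdigest(),
    }


# File names that tend to break ad-hoc formats: '=', quotes, a newline.
awkward_names = ["base/1/1234", "notes=final.txt", 'odd "quoted"\nname']

entries = [
    manifest_entry(name, b"example", "2019-09-19 13:51:00 GMT")
    for name in awkward_names
]
manifest_text = json.dumps({"files": entries}, indent=2)

# JSON escapes and restores every name without special handling.
round_tripped = [f["path"] for f in json.loads(manifest_text)["files"]]
print(round_tripped == awkward_names)

# A naive "name=size" line splits at the first '=' and loses the name.
naive_line = f"{awkward_names[1]}=7"
key, _, value = naive_line.partition("=")
print(repr(key), repr(value))
```

The naive parser recovers only part of the file name, which is the kind of escaping problem a standard format avoids; an ad-hoc format would need its own quoting rules to handle the same inputs.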