Re: backup manifests
Robert Haas <robertmhaas@gmail.com>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
Attachments
- 0002-POC-of-backup-manifest-with-file-names-sizes-timesta.patch (application/octet-stream) patch 0002
- 0001-Refactor-some-pg_basebackup-code.patch (application/octet-stream) patch 0001
On Fri, Sep 20, 2019 at 11:09 AM David Steele <david@pgmasters.net> wrote: > Seems to me we are overdue for elog()/ereport() compatible > error-handling in the front end. Plus mem contexts. > > It sucks to make that a prereq for this project but the longer we kick > that can down the road... There are no doubt many patches that would benefit from having more backend infrastructure exposed in frontend contexts, and I think we're slowly moving in that direction, but I generally do not believe in burdening feature patches with major infrastructure improvements. Sometimes it's necessary, as in the case of parallel query, which required upgrading a whole lot of backend infrastructure in order to have any chance of doing something useful. In most cases, however, there's a way of getting the patch done that dodges the problem. For example, I think there's a pretty good argument that Heikki's design for relation forks was a bad one. It's proven to scale poorly and create performance problems and extra complexity in quite a few places. It would likely have been better, from a strictly theoretical point of view, to insist on a design where the FSM and VM pages got stored inside the relation itself, and the heap was responsible for figuring out how various pages were being used. When BRIN came along, we insisted on precisely that design, because it was clear that further straining the relation fork system was not a good plan. However, if we'd insisted on that when Heikki did the original work, it might have delayed the arrival of the free space map for one or more releases, and we got big benefits out of having that done sooner. There's nothing stopping someone from writing a patch to get rid of relation forks and allow a heap AM to have multiple relfilenodes (with the extra ones used for the FSM and VM) or with multiplexing all the data inside of a single file. Nobody has, though, because it's hard, and the problems with the status quo are not so bad as to justify the amount of development effort that would be required to fix it. At some point, that problem is probably going to work its way to the top of somebody's priority list, but it's already been about 10 years since that all happened and everyone has so far dodged dealing with the problem, which in turn has enabled them to work on other things that are perhaps more important. I think the same principle applies here. It's reasonable to ask the author of a feature patch to fix issues that are closely related to the feature in question, or even problems that are not new but would be greatly exacerbated by the addition of the feature. It's not reasonable to stack up a list of infrastructure upgrades that somebody has to do as a condition of having a feature patch accepted that does not necessarily require those upgrades. I am not convinced that JSON is actually a better format for a backup manifest (more on that below), but even if I were, I believe that getting a backup manifest functionality into PostgreSQL 13, and perhaps incremental backup on top of that, is valuable enough to justify making some compromises to make that happen. And I don't mean "compromises" as in "let's commit something that doesn't work very well;" rather, I mean making design choices that are aimed at making the project something that is feasible and can be completed in reasonable time, rather than not. And saying, well, the backup manifest format *has* to be JSON because everything else suxxor is not that. We don't have a single other example of a file that we read and write in JSON format. Extension control files use a custom format. Backup labels and backup history files and timeline history files and tablespace map files use custom formats. postgresql.conf, pg_hba.conf, and pg_ident.conf use custom formats. postmaster.opts and postmaster.pid use custom formats. If JSON is better and easier, at least one of the various people who coded those things up would have chosen to use it, but none of them did, and nobody's made a serious attempt to convert them to use it. That might be because we lack the infrastructure for dealing with JSON and building it is more work than anybody's willing to do, or it might be because JSON is not actually better for these kinds of use cases, but either way, it's hard to see why this particular patch should be burdened with a requirement that none of the previous ones had to satisfy. Personally, I'd be intensely unhappy if a motion to convert postgresql.conf or pg_hba.conf to JSON format gathered enough steam to be adopted. It would be darn useful, because you could specify complex values for options instead of being limited to scalars, but it would also make the configuration files a lot harder for human beings to read and grep and the quality of error reporting would probably decline significantly. Also, appending a setting to the file, something which is currently quite simple, would get a lot harder. Ad-hoc file formats can be problematic, but they can also have real advantages in terms of readability, brevity, and fitness for purpose. > This talk was good fun. The largest number of tables we've seen is a > few hundred thousand, but that still adds up to more than a million > files to backup. A quick survey of some of my colleagues turned up a few examples of people with 2-4 million files to backup, so similar kind of ballpark. Probably not big enough for the manifest to hit the 1GB mark, but getting close. > > Or we could just decide that you have to have enough memory > > to hold the parsed version of the entire manifest file in memory all > > at once, and if you don't, maybe you should drop some tables or buy > > more RAM. > > I assume you meant "un-parsed" here? I don't think I meant that, although it seems like you might need to store either all the parsed data or all the unparsed data or even both, depending on exactly what you are trying to do. > > I hear you saying that this is going to end up being just as complex > > in the end, but I don't think I believe it. It sounds to me like the > > difference between spending a couple of hours figuring this out and > > spending a couple of months trying to figure it out and maybe not > > actually getting anywhere. > > Maybe the initial implementation will be easier but I am confident we'll > pay for it down the road. Also, don't we want users to be able to read > this file? Do we really want them to need to cook up a custom parser in > Perl, Go, Python, etc.? Well, I haven't heard anybody complain that they can't read a backup_label file because it's too hard to cook up a parser. And I think the reason is pretty clear: such files are not hard to parse. Similarly for a pg_hba.conf file. This case is a little more complicated than those, but AFAICS, not enormously so. Actually, it seems like a combination of those two cases: it has some fixed metadata fields that can be represented with one line per field, like a backup_label, and then a bunch of entries for files that are somewhat like entries in a pg_hba.conf file, in that they can be represented by a line per record with a certain number of fields on each line. I attach here a couple of patches. The first one does some refactoring of relevant code in pg_basebackup, and the second one adds checksum manifests using a format that I pulled out of my ear. It probably needs some adjustment but I don't think it's crazy. Each file gets a line that looks like this: File $FILENAME $FILESIZE $FILEMTIME $FILECHECKSUM Right now, the file checksums are computed using SHA-256 but it could be changed to anything else for which we've got code. On my system, shasum -a256 $FILE produces the same answer that shows up here. At the bottom of the manifest there's a checksum of the manifest itself, which looks like this: Manifest-Checksum 385fe156a8c6306db40937d59f46027cc079350ecf5221027d71367675c5f781 That's a SHA-256 checksum of the file contents excluding the final line. It can be verified by feeding all the file contents except the last line to shasum -a256. I can't help but observe that if the file were defined to be a JSONB blob, it's not very clear how you would include a checksum of the blob contents in the blob itself, but with a format based on a bunch of lines of data, it's super-easy to generate and super-easy to write tools that verify it. This is just a prototype so I haven't written a verification tool, and there's a bunch of testing and documentation and so forth that would need to be done aside from whatever we've got to hammer out in terms of design issues and file formats. But I think it's cool, and perhaps some discussion of how it could be evolved will get us closer to a resolution everybody can at least live with. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company