Re: backup manifests

Robert Haas <robertmhaas@gmail.com>

From: Robert Haas <robertmhaas@gmail.com>

To: David Steele <david@pgmasters.net>

Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>

Date: 2019-09-20T18:55:15Z

Lists: pgsql-hackers

Commits

Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited

Attachments

0002-POC-of-backup-manifest-with-file-names-sizes-timesta.patch (application/octet-stream) patch 0002
0001-Refactor-some-pg_basebackup-code.patch (application/octet-stream) patch 0001

On Fri, Sep 20, 2019 at 11:09 AM David Steele <david@pgmasters.net> wrote:
> Seems to me we are overdue for elog()/ereport() compatible
> error-handling in the front end.  Plus mem contexts.
>
> It sucks to make that a prereq for this project but the longer we kick
> that can down the road...

There are no doubt many patches that would benefit from having more
backend infrastructure exposed in frontend contexts, and I think we're
slowly moving in that direction, but I generally do not believe in
burdening feature patches with major infrastructure improvements.
Sometimes it's necessary, as in the case of parallel query, which
required upgrading a whole lot of backend infrastructure in order to
have any chance of doing something useful. In most cases, however,
there's a way of getting the patch done that dodges the problem.

For example, I think there's a pretty good argument that Heikki's
design for relation forks was a bad one. It's proven to scale poorly
and create performance problems and extra complexity in quite a few
places. It would likely have been better, from a strictly theoretical
point of view, to insist on a design where the FSM and VM pages got
stored inside the relation itself, and the heap was responsible for
figuring out how various pages were being used. When BRIN came along,
we insisted on precisely that design, because it was clear that
further straining the relation fork system was not a good plan.
However, if we'd insisted on that when Heikki did the original work,
it might have delayed the arrival of the free space map for one or
more releases, and we got big benefits out of having that done sooner.
There's nothing stopping someone from writing a patch to get rid of
relation forks, either by allowing a heap AM to have multiple
relfilenodes (with the extra ones used for the FSM and VM) or by
multiplexing all the data inside of a single file. Nobody has, though,
because it's hard, and the problems with the status quo are not so bad
as to justify the amount of development effort that would be required
to fix them. At some
point, that problem is probably going to work its way to the top of
somebody's priority list, but it's already been about 10 years since
that all happened and everyone has so far dodged dealing with the
problem, which in turn has enabled them to work on other things that
are perhaps more important.

I think the same principle applies here. It's reasonable to ask the
author of a feature patch to fix issues that are closely related to
the feature in question, or even problems that are not new but would
be greatly exacerbated by the addition of the feature. It's not
reasonable to stack up a list of infrastructure upgrades that somebody
has to do as a condition of having a feature patch accepted that does
not necessarily require those upgrades. I am not convinced that JSON
is actually a better format for a backup manifest (more on that
below), but even if I were, I believe that getting backup manifest
functionality into PostgreSQL 13, and perhaps incremental backup on
top of that, is valuable enough to justify making some compromises to
make that happen. And I don't mean "compromises" as in "let's commit
something that doesn't work very well;" rather, I mean making design
choices that are aimed at making the project something that is
feasible and can be completed in reasonable time, rather than not.

And saying, well, the backup manifest format *has* to be JSON because
everything else suxxor is not that. We don't have a single other
example of a file that we read and write in JSON format. Extension
control files use a custom format. Backup labels and backup history
files and timeline history files and tablespace map files use custom
formats. postgresql.conf, pg_hba.conf, and pg_ident.conf use custom
formats. postmaster.opts and postmaster.pid use custom formats. If
JSON were better and easier, at least one of the various people who
coded those things up would have chosen to use it, but none of them
did, and nobody's made a serious attempt to convert them to use it.
That might be because we lack the infrastructure for dealing with JSON
and building it is more work than anybody's willing to do, or it might
be because JSON is not actually better for these kinds of use cases,
but either way, it's hard to see why this particular patch should be
burdened with a requirement that none of the previous ones had to
satisfy.

Personally, I'd be intensely unhappy if a motion to convert
postgresql.conf or pg_hba.conf to JSON format gathered enough steam to
be adopted.  It would be darn useful, because you could specify
complex values for options instead of being limited to scalars, but it
would also make the configuration files a lot harder for human beings
to read and grep and the quality of error reporting would probably
decline significantly.  Also, appending a setting to the file,
something which is currently quite simple, would get a lot harder.
Ad-hoc file formats can be problematic, but they can also have real
advantages in terms of readability, brevity, and fitness for purpose.
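
To make the point about appending a setting concrete, here is a minimal
sketch, not taken from any patch. Both function names and the idea of a
JSON-format configuration file are hypothetical and exist only for
illustration. With a line-oriented file, adding a setting is a single
append; with a JSON document, the whole file has to be parsed, modified,
and written back out.

```python
import json


def append_setting_line_based(path, name, value):
    """Add a setting to a line-oriented config file by appending one line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{name} = {value}\n")


def append_setting_json(path, name, value):
    """Add a setting to a (hypothetical) JSON config file.

    The document has to be read and parsed in full, changed in memory,
    and then rewritten, because a JSON object cannot be extended by
    appending text after its closing brace.
    """
    with open(path, "r", encoding="utf-8") as f:
        settings = json.load(f)
    settings[name] = value
    with open(path, "w", encoding="utf-8") as f:
        json.dump(settings, f, indent=2)
        f.write("\n")
```

The rewrite in the JSON case also discards any hand-made layout in the
original file, which is part of why the line-based form is easier to
maintain by hand.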

> This talk was good fun.  The largest number of tables we've seen is a
> few hundred thousand, but that still adds up to more than a million
> files to backup.

A quick survey of some of my colleagues turned up a few examples of
people with 2-4 million files to back up, so a similar kind of ballpark.
Probably not big enough for the manifest to hit the 1GB mark, but
getting close.

> > Or we could just decide that you have to have enough memory
> > to hold the parsed version of the entire manifest file in memory all
> > at once, and if you don't, maybe you should drop some tables or buy
> > more RAM.
>
> I assume you meant "un-parsed" here?

I don't think I meant that, although it seems like you might need to
store either all the parsed data or all the unparsed data or even
both, depending on exactly what you are trying to do.

> > I hear you saying that this is going to end up being just as complex
> > in the end, but I don't think I believe it.  It sounds to me like the
> > difference between spending a couple of hours figuring this out and
> > spending a couple of months trying to figure it out and maybe not
> > actually getting anywhere.
>
> Maybe the initial implementation will be easier but I am confident we'll
> pay for it down the road.  Also, don't we want users to be able to read
> this file?  Do we really want them to need to cook up a custom parser in
> Perl, Go, Python, etc.?

Well, I haven't heard anybody complain that they can't read a
backup_label file because it's too hard to cook up a parser.  And I
think the reason is pretty clear: such files are not hard to parse.
Similarly for a pg_hba.conf file.  This case is a little more
complicated than those, but AFAICS, not enormously so. Actually, it
seems like a combination of those two cases: it has some fixed
metadata fields that can be represented with one line per field, like
a backup_label, and then a bunch of entries for files that are
somewhat like entries in a pg_hba.conf file, in that they can be
represented by a line per record with a certain number of fields on
each line.

I attach here a couple of patches.  The first one does some
refactoring of relevant code in pg_basebackup, and the second one adds
checksum manifests using a format that I pulled out of my ear. It
probably needs some adjustment but I don't think it's crazy.  Each
file gets a line that looks like this:

File $FILENAME $FILESIZE $FILEMTIME $FILECHECKSUM
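
As a rough illustration of how little parsing such a line needs, here is
a minimal sketch in Python. It is not part of the attached patches. The
names ManifestFile and parse_file_line are made up for this example, and
it assumes that the size, mtime, and checksum fields never contain
spaces, which the prototype format does not promise. The mtime is kept
as text because its exact representation is not specified here.

```python
from typing import NamedTuple


class ManifestFile(NamedTuple):
    name: str
    size: int
    mtime: str      # kept as text; the timestamp format is an open question
    checksum: str   # hex digest as written in the manifest


def parse_file_line(line: str) -> ManifestFile:
    """Parse one 'File NAME SIZE MTIME CHECKSUM' manifest line."""
    line = line.rstrip("\n")
    keyword, _, rest = line.partition(" ")
    if keyword != "File":
        raise ValueError(f"not a File line: {line!r}")
    # Split from the right so that a file name containing spaces stays
    # intact (assumption: the last three fields are single tokens).
    parts = rest.rsplit(" ", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields after 'File': {line!r}")
    name, size, mtime, checksum = parts
    return ManifestFile(name, int(size), mtime, checksum)
```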

Right now, the file checksums are computed using SHA-256 but it could
be changed to anything else for which we've got code. On my system,
shasum -a256 $FILE produces the same answer that shows up here.  At
the bottom of the manifest there's a checksum of the manifest itself,
which looks like this:

Manifest-Checksum 385fe156a8c6306db40937d59f46027cc079350ecf5221027d71367675c5f781

That's a SHA-256 checksum of the file contents excluding the final
line. It can be verified by feeding all the file contents except the
last line to shasum -a256. I can't help but observe that if the file
were defined to be a JSONB blob, it's not very clear how you would
include a checksum of the blob contents in the blob itself, but with a
format based on a bunch of lines of data, it's super-easy to generate
and super-easy to write tools that verify it.
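
A verification tool along those lines could look roughly like the
following sketch. It is not part of the attached patches. The function
name verify_manifest is made up, and the sketch assumes the final line
has the form "Manifest-Checksum <hex digest>" and that the checksum
covers every byte before that line, newlines included. It reads the file
one line at a time, so only the current and previous lines are held in
memory regardless of how large the manifest gets.

```python
import hashlib


def verify_manifest(path: str) -> bool:
    """Return True if the trailing Manifest-Checksum matches the content.

    Every line except the last is fed to SHA-256; the last line is held
    back and compared against the resulting digest.
    """
    digest = hashlib.sha256()
    previous = None
    with open(path, "rb") as f:
        for line in f:
            if previous is not None:
                digest.update(previous)   # includes the trailing newline
            previous = line
    if previous is None:
        raise ValueError("manifest is empty")
    fields = previous.decode("ascii").split()
    if len(fields) != 2 or fields[0] != "Manifest-Checksum":
        raise ValueError("last line is not a Manifest-Checksum line")
    return digest.hexdigest() == fields[1].lower()
```

Because the checksum line sits outside the data it covers, neither the
writer nor the verifier needs to know anything about the structure of
the lines above it.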

This is just a prototype so I haven't written a verification tool, and
there's a bunch of testing and documentation and so forth that would
need to be done aside from whatever we've got to hammer out in terms
of design issues and file formats.  But I think it's cool, and perhaps
some discussion of how it could be evolved will get us closer to a
resolution everybody can at least live with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company