Re: backup manifests

Robert Haas <robertmhaas@gmail.com>

From: Robert Haas <robertmhaas@gmail.com>

To: David Steele <david@pgmasters.net>

Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>

Date: 2019-09-19T13:51:11Z

Lists: pgsql-hackers

Commits


Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited

On Wed, Sep 18, 2019 at 9:11 PM David Steele <david@pgmasters.net> wrote:
> Also consider adding the timestamp.

Sounds reasonable, even if only for the benefit of humans who might
look at the file.  We can decide later whether to use it for anything
else (and third-party tools could make different decisions from core).
I assume we're talking about file mtime here, not file ctime or file
atime or the time the manifest was generated, but let me know if I'm
wrong.
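
A minimal sketch of what a single manifest entry could carry if the file mtime is added. The field names, the SHA-256 choice, and the timestamp layout are all assumptions for illustration; the thread has not settled any of them. Only st_mtime is recorded, matching the file-mtime reading above.

```python
import hashlib
import os
from datetime import datetime, timezone


def manifest_entry(data_dir: str, rel_path: str) -> dict:
    """Build one hypothetical manifest entry: path, size, mtime, checksum."""
    full_path = os.path.join(data_dir, rel_path)
    st = os.stat(full_path)
    digest = hashlib.sha256()  # algorithm choice is an assumption
    with open(full_path, "rb") as f:
        # Read in 64 kB chunks so large relation files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    # File modification time only; ctime and atime are deliberately ignored.
    mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    return {
        "path": rel_path,
        "size": st.st_size,
        "mtime": mtime.strftime("%Y-%m-%d %H:%M:%S GMT"),
        "checksum": digest.hexdigest(),
    }
```

The mtime here is informational for humans; nothing in the sketch relies on it for validation.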

> Consider adding a reference to each file that specifies where the file
> can be found if it is not in this backup.  As I understand the
> pg_basebackup proposal, it would only be implementing differential
> backups, i.e. an incremental that is *only* based on the last full
> backup.  So, the reference can be inferred in this case.  However, if
> the user selects the wrong full backup on restore, and we have labeled
> each backup, then a differential restore with references against the
> wrong full backup would result in a hard error rather than corruption.

I intend that we should be able to support incremental backups based
either on a previous full backup or based on a previous incremental
backup. I am not aware of a technical reason why we need to identify
the specific backup that must be used. If incremental backup B is
taken based on a pre-existing backup A, then I think that B can be
restored using either A or *any other backup taken after A and before
B*. In the normal case, there probably wouldn't be any such backup,
but AFAICS the start-LSNs are a sufficient cross-check that the chosen
base backup is legal.
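
A sketch of the start-LSN cross-check described above, assuming (hypothetically) that an incremental's manifest records the start LSN of the backup it was taken against. LSNs use PostgreSQL's textual "X/Y" form, two hex halves of a 64-bit value. The rule encoded is the one stated: a candidate base is acceptable if it started no earlier than A and before B.

```python
def parse_lsn(text: str) -> int:
    """Parse PostgreSQL's textual LSN form 'XXXXXXXX/XXXXXXXX' into an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def base_backup_is_usable(reference_start_lsn: str,
                          candidate_start_lsn: str,
                          incremental_start_lsn: str) -> bool:
    """True if the candidate may serve as the base for incremental B.

    reference_start_lsn: start LSN of backup A (hypothetical manifest field
    in B). The candidate must start no earlier than A and strictly before B.
    """
    ref = parse_lsn(reference_start_lsn)
    cand = parse_lsn(candidate_start_lsn)
    inc = parse_lsn(incremental_start_lsn)
    return ref <= cand < inc
```

With this check, picking a base older than A, or one taken after B, fails loudly instead of silently producing a bad restore.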

> Based on my original calculations (which sadly I don't have anymore),
> the combination of SHA1, size, and file name is *extremely* unlikely to
> generate a collision.  As in, unlikely to happen before the end of the
> universe kind of unlikely.  Though, I guess it depends on your
> expectations for the lifetime of the universe.

Somebody once said that we should be prepared for it to end at any
time, or not, and that the time at which it actually was due to end
would not be disclosed in advance. This is probably good life advice
which I ought to take more frequently than I do, but I think we can
finesse the issue for purposes of this discussion. What I'd say is: if
the probability of getting a collision is demonstrably many orders of
magnitude less than the probability of the disk writing the block
incorrectly, then I think we're probably reasonably OK. Somebody might
differ, which is perhaps a mild point in favor of LSN-based
approaches, but as a practical matter, if a bad block is a billion
times more likely to be the result of a disk error than a checksum
mismatch, then it's a negligible risk.
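
The "billion times" rule of thumb can be written down as a small calculation. This sketch idealizes a b-bit checksum as uniformly distributed, so a random corruption goes undetected with probability 2^-b; real CRCs only approximate that. The disk error probability is an input the caller must supply; the 1e-15 used in the usage line is purely illustrative, not a measured figure.

```python
def undetected_corruption_probability(checksum_bits: int) -> float:
    """Chance that a random corruption leaves an ideal b-bit checksum unchanged."""
    return 2.0 ** -checksum_bits


def risk_is_negligible(checksum_bits: int, disk_error_probability: float,
                       margin: float = 1e9) -> bool:
    """True if a disk error is at least `margin` times likelier than an
    undetected checksum match (margin defaults to 'a billion times')."""
    return disk_error_probability >= margin * undetected_corruption_probability(checksum_bits)


# Illustrative only: a 160-bit digest clears the bar, a 32-bit one does not.
print(risk_is_negligible(160, 1e-15), risk_is_negligible(32, 1e-15))
```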

> > And maybe a few other bits of metadata, but I'm not sure
> > exactly what.  Ideas?
>
> A backup label for sure.  You can also use this as the directory/tar
> name to save the user coming up with one.  We use YYYYMMDDHH24MMSSF for
> full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
> incrementals and have logic to prevent two backups from having the same
> label.  This is unlikely outside of testing but still a good idea.
>
> Knowing the start/stop time of the backup is useful in all kinds of
> ways, especially monitoring and time-targeted PITR.  Start/stop LSN is
> also good.  I know this is also in backup_label but having it all in one
> place is nice.
>
> We include the version/sysid of the cluster to avoid mixups.  It's a
> great extra check on top of references to be sure everything is kosher.
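
A sketch of the labeling scheme quoted above: "<timestamp>F" for full backups and "<full label>_<timestamp>D" or "...I" for differential and incremental ones. The email does not say how duplicate labels are prevented; raising an error is an assumption made here.

```python
from datetime import datetime
from typing import Optional, Set

STAMP = "%Y%m%d%H%M%S"  # year, month, day, 24-hour hour, minute, second


def backup_label(start: datetime,
                 full_label: Optional[str] = None,
                 differential: bool = False,
                 existing: Optional[Set[str]] = None) -> str:
    """Build a label: full -> <stamp>F; otherwise <full>_<stamp>D or I."""
    if full_label is None:
        label = start.strftime(STAMP) + "F"
    else:
        suffix = "D" if differential else "I"
        label = f"{full_label}_{start.strftime(STAMP)}{suffix}"
    # Hypothetical duplicate handling: refuse rather than overwrite.
    if existing is not None and label in existing:
        raise ValueError(f"backup label {label} already in use")
    return label
```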

I don't think it's a good idea to duplicate the information that's
already in the backup_label. Storing two copies of the same
information is just an invitation to having to worry about what
happens if they don't agree.
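
If the manifest does not duplicate backup_label, a tool that wants the start LSN or start time can read it from backup_label directly. This sketch assumes the "KEY: value" line layout backup_label uses (e.g. "START WAL LOCATION: 0/2000028 (file ...)"); the sample text in the usage line is made up.

```python
def read_backup_label(text: str) -> dict:
    """Split each 'KEY: value' line on the first ': ' only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def start_lsn_from_backup_label(text: str) -> str:
    # "0/2000028 (file 000000010000000000000002)" -> "0/2000028"
    return read_backup_label(text)["START WAL LOCATION"].split()[0]


sample = "START WAL LOCATION: 0/2000028 (file 000000010000000000000002)\n"
print(start_lsn_from_backup_label(sample))
```

Splitting only on the first ": " keeps values such as a user-supplied LABEL intact even when they contain colons.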

> A manifest version is good in case we change the format later.

Yeah.

> I'd recommend JSON for the format since it is so ubiquitous and easily
> handles escaping, which can be a gotcha in a home-grown format.  We
> currently have a format that is a combination of Windows INI and JSON
> (for human-readability in theory) and we have become painfully aware of
> escaping issues.  Really, why would you drop files with '=' in their
> name in PGDATA?  And yet it happens.

I am not crazy about JSON because it requires that I get a JSON parser
into src/common, which I could do, but given the possibly-imminent end
of the universe, I'm not sure it's the greatest use of time. You're
right that if we pick an ad-hoc format, we've got to worry about
escaping, which isn't lovely.
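
To show what the escaping burden of an ad-hoc format looks like, here is a sketch of a hypothetical tab-separated manifest with a version header. The header text and column order are invented for illustration. Backslash, tab, and newline are escaped in paths so that a file named with tabs, newlines, or '=' still round-trips.

```python
MANIFEST_VERSION = 1  # hypothetical header: "Manifest-Version 1"


def escape_field(s: str) -> str:
    # Escape backslash first so the later escapes are not double-processed.
    return s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def unescape_field(s: str) -> str:
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append({"\\": "\\", "t": "\t", "n": "\n"}.get(s[i + 1], s[i + 1]))
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def format_manifest(entries):
    """entries: list of (path, size, checksum) tuples."""
    lines = [f"Manifest-Version {MANIFEST_VERSION}"]
    for path, size, checksum in entries:
        lines.append("\t".join([escape_field(path), str(size), checksum]))
    return "\n".join(lines) + "\n"


def parse_manifest(text):
    lines = text.rstrip("\n").split("\n")
    header = lines[0].split()
    if len(header) != 2 or header[0] != "Manifest-Version" or header[1] != str(MANIFEST_VERSION):
        raise ValueError("unsupported manifest version")
    entries = []
    for line in lines[1:]:
        path, size, checksum = line.split("\t")
        entries.append((unescape_field(path), int(size), checksum))
    return entries
```

The version check up front is what lets a later format change be rejected cleanly by older readers.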

> > (1) When taking a backup, have the option (perhaps enabled by default)
> > to include a backup manifest.
>
> Manifests are cheap to build so I wouldn't make it an option.

Huh. That's an interesting idea. Thanks.

> > (3) Cross-check a manifest against a backup and complain about extra
> > files, missing files, size differences, or checksum mismatches.
>
> Verification is the best part of the manifest.  Plus, you can do
> verification pretty cheaply on restore.  We also restore pg_control last
> so clusters that have a restore error won't start.

There's no "restore" operation here, really. A backup taken by
pg_basebackup can be "restored" by copying the whole thing, but it can
also be used just where it is. If we were going to build something
into some in-core tool to copy backups around, this would be a smart
way to implement said tool, but I'm not planning on that myself.
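
Step (3), cross-checking a manifest against a backup in place, might look roughly like this. The manifest shape (path to size and SHA-256 hex digest) is an assumption, and `ignore` exists so the manifest file itself is not reported as extra. Size is compared before checksumming, since a size mismatch is cheap to detect.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def _sha256_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(backup_dir: str,
                  manifest: Dict[str, Tuple[int, str]],
                  ignore: frozenset = frozenset()) -> List[str]:
    """Report extra files, missing files, size and checksum mismatches."""
    on_disk = set()
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), backup_dir)
            rel = rel.replace(os.sep, "/")  # manifest paths use '/'
            if rel not in ignore:
                on_disk.add(rel)
    expected = set(manifest)
    problems = [f"extra file: {p}" for p in sorted(on_disk - expected)]
    problems += [f"missing file: {p}" for p in sorted(expected - on_disk)]
    for p in sorted(on_disk & expected):
        want_size, want_sum = manifest[p]
        full = os.path.join(backup_dir, *p.split("/"))
        size = os.path.getsize(full)
        if size != want_size:
            problems.append(f"size mismatch: {p} ({size} != {want_size})")
        elif _sha256_file(full) != want_sum:
            problems.append(f"checksum mismatch: {p}")
    return problems
```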

> > One thing I'm not quite sure about is where to store the backup
> > manifest. If you take a base backup in tar format, you get base.tar,
> > pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
> > Does the backup manifest go into base.tar? Get written into a separate
> > file outside of any tar archive? Something else? And what about a
> > plain-format backup? I suppose then we should just write the manifest
> > into the top level of the main data directory, but perhaps someone has
> > another idea.
>
> We do:
>
> [backup_label]/
>     backup.manifest
>     pg_data/
>     pg_tblspc/
>
> In general, having the manifest easily accessible is ideal.

That's a fine choice for a tool, but I'm talking about something
that is part of the actual backup format supported by PostgreSQL, not
what a tool might wrap around it. The choice is whether, for a
tar-format backup, the manifest goes inside a tar file or as a
separate file. To put that another way, a patch adding backup
manifests does not get to redesign where pg_basebackup puts anything
else; it only gets to decide where to put the manifest.
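
The two tar-format options can be sketched side by side. The file name `backup_manifest` is an assumption here. Appending into base.tar with mode "a" only works for an uncompressed archive, since a compressed tar would have to be rewritten; the separate-file option avoids that and keeps the manifest readable without unpacking anything.

```python
import io
import os
import tarfile

MANIFEST_NAME = "backup_manifest"  # assumed name


def store_manifest(backup_dir: str, manifest_bytes: bytes, inside_tar: bool) -> str:
    """Put the manifest either inside base.tar or beside the tar files."""
    if inside_tar:
        tar_path = os.path.join(backup_dir, "base.tar")
        # Mode "a" appends to an uncompressed tar (creating it if absent).
        with tarfile.open(tar_path, "a") as tar:
            info = tarfile.TarInfo(name=MANIFEST_NAME)
            info.size = len(manifest_bytes)
            tar.addfile(info, io.BytesIO(manifest_bytes))
        return tar_path
    path = os.path.join(backup_dir, MANIFEST_NAME)
    with open(path, "wb") as f:
        f.write(manifest_bytes)
    return path
```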

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company