Re: backup manifests

David Steele <david@pgmasters.net>

From: David Steele <david@pgmasters.net>
To: Robert Haas <robertmhaas@gmail.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Date: 2019-09-20T03:06:04Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

Hi Robert,

On 9/19/19 9:51 AM, Robert Haas wrote:
> On Wed, Sep 18, 2019 at 9:11 PM David Steele <david@pgmasters.net> wrote:
>> Also consider adding the timestamp.
> 
> Sounds reasonable, even if only for the benefit of humans who might
> look at the file.  We can decide later whether to use it for anything
> else (and third-party tools could make different decisions from core).
> I assume we're talking about file mtime here, not file ctime or file
> atime or the time the manifest was generated, but let me know if I'm
> wrong.

In my experience only mtime is useful.

>> Based on my original calculations (which sadly I don't have anymore),
>> the combination of SHA1, size, and file name is *extremely* unlikely to
>> generate a collision.  As in, unlikely to happen before the end of the
>> universe kind of unlikely.  Though, I guess it depends on your
>> expectations for the lifetime of the universe.

> What I'd say is: if
> the probability of getting a collision is demonstrably many orders of
> magnitude less than the probability of the disk writing the block
> incorrectly, then I think we're probably reasonably OK. Somebody might
> differ, which is perhaps a mild point in favor of LSN-based
> approaches, but as a practical matter, if a bad block is a billion
> times more likely to be the result of a disk error than a checksum
> mismatch, then it's a negligible risk.

Agreed.

>> We include the version/sysid of the cluster to avoid mixups.  It's a
>> great extra check on top of references to be sure everything is kosher.
> 
> I don't think it's a good idea to duplicate the information that's
> already in the backup_label. Storing two copies of the same
> information is just an invitation to having to worry about what
> happens if they don't agree.

OK, but now we have backup_label, tablespace_map, 
XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.backup (in the WAL) and now perhaps a 
backup.manifest file.  I feel like we may be drowning in backup info files.

>> I'd
>> recommend JSON for the format since it is so ubiquitous and easily
>> handles escaping which can be gotchas in a home-grown format.  We
>> currently have a format that is a combination of Windows INI and JSON
>> (for human-readability in theory) and we have become painfully aware of
>> escaping issues.  Really, why would you drop files with '=' in their
>> name in PGDATA?  And yet it happens.
> 
> I am not crazy about JSON because it requires that I get a json parser
> into src/common, which I could do, but given the possibly-imminent end
> of the universe, I'm not sure it's the greatest use of time. You're
> right that if we pick an ad-hoc format, we've got to worry about
> escaping, which isn't lovely.

My experience is that JSON is simple to implement and has already dealt 
with escaping and data structure considerations.  A home-grown solution 
will be at least as complex but have the disadvantage of being non-standard.

>>> One thing I'm not quite sure about is where to store the backup
>>> manifest. If you take a base backup in tar format, you get base.tar,
>>> pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
>>> Does the backup manifest go into base.tar? Get written into a separate
>>> file outside of any tar archive? Something else? And what about a
>>> plain-format backup? I suppose then we should just write the manifest
>>> into the top level of the main data directory, but perhaps someone has
>>> another idea.
>>
>> We do:
>>
>> [backup_label]/
>>      backup.manifest
>>      pg_data/
>>      pg_tblspc/
>>
>> In general, having the manifest easily accessible is ideal.
> 
> That's a fine choice for a tool, but a I'm talking about something
> that is part of the actual backup format supported by PostgreSQL, not
> what a tool might wrap around it. The choice is whether, for a
> tar-format backup, the manifest goes inside a tar file or as a
> separate file. To put that another way, a patch adding backup
> manifests does not get to redesign where pg_basebackup puts anything
> else; it only gets to decide where to put the manifest.

Fair enough.  The point is to make the manifest easily accessible.

I'd keep it in the data directory for file-based backups and as a 
separate file for tar-based backups.  The advantage here is that we can 
pick a file name that becomes reserved which a tool can't do.

Regards,
-- 
-David
david@pgmasters.net