Re: backup manifests

From: Robert Haas <robertmhaas@gmail.com>
To: Suraj Kharage <suraj.kharage@enterprisedb.com>
Cc: Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, David Steele <david@pgmasters.net>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-02-27T15:52:25Z
Lists: pgsql-hackers

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On Fri, Jan 3, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
> Thank you for review comments.

Here's a new patch set for this feature.

0001 adds checksum helper functions, similar to what Suraj had
incorporated into my original patch, but split out into its own
patch and with some different aesthetic decisions. I also decided to
support all of the SHA variants that PG knows about as options and
added a function to parse a checksum algorithm name, along the lines I
suggested previously.
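
As a rough illustration of the idea in 0001 (a sketch in Python, not code
from the patch, which is C), a name parser plus a small checksum helper
might look like the following. The function names and the exact set of
accepted names are assumptions made for this example; only the SHA
variants and a "none" option are covered here.

```python
import hashlib

# Assumed algorithm names, for illustration only; the patch's real list
# and spelling may differ.
_SHA_VARIANTS = ("sha224", "sha256", "sha384", "sha512")


def parse_checksum_algorithm(name):
    """Return a normalized algorithm name, or None if it is not recognized."""
    normalized = name.strip().lower()
    if normalized == "none" or normalized in _SHA_VARIANTS:
        return normalized
    return None


def checksum_bytes(algorithm, data):
    """Return the hex digest of data, or an empty string for "none"."""
    if algorithm == "none":
        return ""
    return hashlib.new(algorithm, data).hexdigest()
```

Returning None for an unknown name leaves it to the caller to decide how
to report the error, which is one possible design among several.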

0002 teaches the server to generate a backup manifest using the format
I originally proposed. This is similar to the patch I posted
previously, but it spools the manifest to disk as it's being
generated, so that we don't run the server out of memory or fail when
hitting the 1GB allocation limit.
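
The spooling approach described for 0002 can be sketched roughly as
follows, in Python rather than the server's C. This is a minimal sketch
under assumptions: the entry format, the `send` callback, and the use of
SHA-256 for the running checksum are all hypothetical choices made for
the example. The point is only that memory use stays bounded by the
chunk size no matter how large the manifest grows.

```python
import hashlib
import tempfile


def spool_and_send(entries, send, chunk_size=8192):
    """Spool manifest lines to a temp file, then stream them out in chunks.

    entries: iterable of text lines (hypothetical manifest entries)
    send:    callable that accepts one bytes chunk (hypothetical transport)
    Returns (total_bytes_sent, hex_digest_of_manifest).
    """
    digest = hashlib.sha256()
    total = 0
    with tempfile.TemporaryFile() as spool:
        # Write each entry as it is generated; nothing accumulates in memory.
        for line in entries:
            data = (line + "\n").encode("utf-8")
            digest.update(data)
            spool.write(data)
        # Rewind and hand the manifest back in fixed-size pieces.
        spool.seek(0)
        while True:
            chunk = spool.read(chunk_size)
            if not chunk:
                break
            send(chunk)
            total += len(chunk)
    return total, digest.hexdigest()
```

Computing the checksum while writing avoids a second pass over the
spool file just to checksum the manifest.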

0003 adds a new utility, pg_validatebackup, to validate a backup
against a manifest. Suraj tried to incorporate this into
pg_basebackup, which I initially thought might be OK but eventually
decided wasn't good, partly because this really wants to take some
command-line options entirely unrelated to the options accepted by
pg_basebackup. I tried to improve the error checking and the order in
which various things are done, too. This is basically a complete
rewrite as compared with Suraj's version.
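
To make the idea of validating a backup against a manifest concrete, here
is a simplified sketch in Python. It is not how pg_validatebackup is
implemented. The manifest is assumed to be an in-memory dict mapping a
relative path to a (size, SHA-256 hex digest) pair, and the function name
and message wording are hypothetical. It checks the cheap things first
(presence, then size) before reading file contents.

```python
import hashlib
import os


def validate_backup(backup_dir, manifest):
    """Compare files under backup_dir with manifest; return a list of problems.

    manifest: dict of relative path -> (expected_size, expected_sha256_hex).
    This layout is an assumption for the example, not the real format.
    """
    problems = []
    seen = set()
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, backup_dir).replace(os.sep, "/")
            seen.add(rel)
            if rel not in manifest:
                problems.append(f"{rel}: present on disk but not in manifest")
                continue
            expected_size, expected_sha256 = manifest[rel]
            # Size check is cheap, so do it before checksumming.
            if os.path.getsize(full) != expected_size:
                problems.append(f"{rel}: size mismatch")
                continue
            hasher = hashlib.sha256()
            with open(full, "rb") as f:
                for block in iter(lambda: f.read(65536), b""):
                    hasher.update(block)
            if hasher.hexdigest() != expected_sha256:
                problems.append(f"{rel}: checksum mismatch")
    # Anything listed in the manifest but never seen on disk is missing.
    for rel in sorted(set(manifest) - seen):
        problems.append(f"{rel}: in manifest but missing on disk")
    return problems
```

An empty list means every file matched. A real tool would also need to
decide how to treat the manifest file itself and the WAL, which this
sketch ignores.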

0004 modifies the server to generate a backup manifest in JSON format
rather than my originally proposed format. This allows for some
comparison of the code doing it one way vs. the other. Assuming we
stick with JSON, I will squash this with 0002 at some point.

0005 is very much a work-in-progress proof-of-concept to modify
the backup validator to understand the JSON format. It doesn't
validate the manifest checksum at this point; it just prints it out.
The error handling needs work. It also has other problems and bugs.
Although I'm still not very happy about the idea of using JSON here,
I'm pretty happy with the basic approach this patch takes. It
demonstrates that the JSON parser can be used for non-trivial things
in frontend code, and I'd say the code even looks reasonably clean -
with the exception of small details like being buggy and
under-commented.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company