Re: backup manifests

From: Robert Haas <robertmhaas@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: David Fetter <david@fetter.org>, David Steele <david@pgmasters.net>, Tels <nospam-pg-abuse@bloodgate.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-01-02T01:57:11Z
Lists: pgsql-hackers

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On Wed, Jan 1, 2020 at 7:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Fetter <david@fetter.org> writes:
> > On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
> >> So, if someone can suggest to me how I could read JSON from a tool in
> >> src/bin without writing a lot of code, I'm all ears.
>
> > Maybe I'm missing something obvious, but wouldn't combining
> > pg_read_file() with a cast to JSONB fix this, as below?
>
> Only if you're prepared to restrict the use of the tool to superusers
> (or at least people with whatever privilege that function requires).
>
> Admittedly, you can probably feed the data to the backend without
> use of an intermediate file; but it still requires a working backend
> connection, which might be a bit of a leap for backup-related tools.
> I'm sure Robert was envisioning doing this processing inside the tool.

Yeah, exactly. I don't think verifying a backup should require a
running server, let alone a running server on the same machine where
the backup is stored and for which you have superuser privileges.
AFAICS, the only options to make that work with JSON are (1) introduce
a new hand-coded JSON parser designed for frontend operation, (2) add
a dependency on an external JSON parser that we can use from frontend
code, or (3) adapt the existing JSON parser used in the backend so
that it can also be used in the frontend.

I'd be willing to do (1) -- it wouldn't be the first time I've written
a JSON parser for PostgreSQL -- but I think it will take an order of
magnitude more code than using a file with tab-separated columns as
I've proposed, and I assume that there will be complaints about having
two JSON parsers in core. I'd also be willing to do (2) if that's the
consensus, but I'd vote against such an approach if somebody else
proposed it because (a) I'm not aware of a widely-available library
upon which we could depend and (b) introducing such a dependency for a
minor feature like this seems fairly unpalatable to me, and it'd
probably still be more code than just using a tab-separated file.  I'd
be willing to do (3) if somebody could explain to me how to solve the
problems with porting that code to work on the frontend side, but the
only suggestion so far as to how to do that is to port memory
contexts, elog/ereport, and presumably encoding handling to work on the
frontend side. That seems to me to be an unreasonably large lift,
especially given that we have lots of other files that use ad-hoc
formats already, and if somebody ever gets around to converting all of
those to JSON, they can certainly convert this one at the same time.
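
To give a rough sense of how little frontend code a tab-separated format
needs, here is a minimal sketch in C. The column layout (path, size,
checksum), the ManifestEntry struct, and the parse_manifest_line() function
are all hypothetical, chosen only for illustration; they are not the format
or code from the proposed patch. A real format would also need some way to
escape tabs and newlines in file names, which this sketch does not attempt.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical manifest entry: path, size in bytes, checksum as hex text. */
typedef struct ManifestEntry
{
    char        path[1024];
    long long   size;
    char        checksum[129];
} ManifestEntry;

/*
 * Parse one line of the assumed form "path<TAB>size<TAB>checksum".
 * Returns 1 on success, 0 if the line is malformed.
 */
int
parse_manifest_line(const char *line, ManifestEntry *entry)
{
    const char *tab1 = strchr(line, '\t');
    const char *tab2;
    char       *end;
    size_t      len;

    if (tab1 == NULL)
        return 0;
    tab2 = strchr(tab1 + 1, '\t');
    if (tab2 == NULL)
        return 0;

    /* Field 1: path, must be non-empty and fit in the buffer. */
    len = (size_t) (tab1 - line);
    if (len == 0 || len >= sizeof(entry->path))
        return 0;
    memcpy(entry->path, line, len);
    entry->path[len] = '\0';

    /* Field 2: size, decimal digits that fill the whole field. */
    if (tab1[1] < '0' || tab1[1] > '9')
        return 0;
    errno = 0;
    entry->size = strtoll(tab1 + 1, &end, 10);
    if (errno != 0 || end != tab2)
        return 0;

    /* Field 3: checksum, running up to the end of the line. */
    len = strcspn(tab2 + 1, "\r\n");
    if (len == 0 || len >= sizeof(entry->checksum))
        return 0;
    memcpy(entry->checksum, tab2 + 1, len);
    entry->checksum[len] = '\0';

    return 1;
}
```

A caller would read the manifest with fgets() and pass each line to
parse_manifest_line(), treating a return value of 0 as a corrupt manifest.
Nothing here depends on memory contexts, ereport, or encoding conversion,
which is the contrast being drawn with option (3).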

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company