Re: backup manifests

David Fetter <david@fetter.org>

From: David Fetter <david@fetter.org>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, David Steele <david@pgmasters.net>, Tels <nospam-pg-abuse@bloodgate.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-01-02T18:03:23Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On Wed, Jan 01, 2020 at 08:57:11PM -0500, Robert Haas wrote:
> On Wed, Jan 1, 2020 at 7:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > David Fetter <david@fetter.org> writes:
> > > On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
> > >> So, if someone can suggest to me how I could read JSON from a tool in
> > >> src/bin without writing a lot of code, I'm all ears.
> >
> > > Maybe I'm missing something obvious, but wouldn't combining
> > > pg_read_file() with a cast to JSONB fix this, as below?
> >
> > Only if you're prepared to restrict the use of the tool to superusers
> > (or at least people with whatever privilege that function requires).
> >
> > Admittedly, you can probably feed the data to the backend without
> > use of an intermediate file; but it still requires a working backend
> > connection, which might be a bit of a leap for backup-related tools.
> > I'm sure Robert was envisioning doing this processing inside the tool.
> 
> Yeah, exactly. I don't think verifying a backup should require a
> running server, let alone a running server on the same machine where
> the backup is stored and for which you have superuser privileges.

Thanks for clarifying the context.

> AFAICS, the only options to make that work with JSON are (1) introduce
> a new hand-coded JSON parser designed for frontend operation, (2) add
> a dependency on an external JSON parser that we can use from frontend
> code, or (3) adapt the existing JSON parser used in the backend so
> that it can also be used in the frontend.
> 
> I'd be willing to do (1) -- it wouldn't be the first time I've written
> JSON parser for PostgreSQL -- but I think it will take an order of
> magnitude more code than using a file with tab-separated columns as
> I've proposed, and I assume that there will be complaints about having
> two JSON parsers in core. I'd also be willing to do (2) if that's the
> consensus, but I'd vote against such an approach if somebody else
> proposed it because (a) I'm not aware of a widely-available library
> upon which we could depend and

I believe jq has an excellent one that's available under a suitable
license.

Making jq a dependency seems like a separate discussion, though. At
the moment, we don't use git tools like submodel/subtree, and deciding
which (or whether) seems like a gigantic discussion all on its own.

> (b) introducing such a dependency for a minor feature like this
> seems fairly unpalatable to me, and it'd probably still be more code
> than just using a tab-separated file.  I'd be willing to do (3) if
> somebody could explain to me how to solve the problems with porting
> that code to work on the frontend side, but the only suggestion so
> far as to how to do that is to port memory contexts, elog/report,
> and presumably encoding handling to work on the frontend side.

This port has come up several times recently in different contexts.
How big a chunk of work would it be?  Just so we're clear, I'm not
suggesting that this port should gate this feature.

> That seems to me to be an unreasonably large lift, especially given
> that we have lots of other files that use ad-hoc formats already,
> and if somebody ever gets around to converting all of those to JSON,
> they can certainly convert this one at the same time.

Would that require some kind of file converter program, or just a
really loud notice in the release notes?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate