Re: backup manifests

From: David Fetter <david@fetter.org>

To: Robert Haas <robertmhaas@gmail.com>

Cc: Tom Lane <tgl@sss.pgh.pa.us>, David Steele <david@pgmasters.net>, Tels <nospam-pg-abuse@bloodgate.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>

Date: 2020-01-02T18:03:23Z

Lists: pgsql-hackers

Commits

Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited

On Wed, Jan 01, 2020 at 08:57:11PM -0500, Robert Haas wrote:
> On Wed, Jan 1, 2020 at 7:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > David Fetter <david@fetter.org> writes:
> > > On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
> > >> So, if someone can suggest to me how I could read JSON from a tool in
> > >> src/bin without writing a lot of code, I'm all ears.
> >
> > > Maybe I'm missing something obvious, but wouldn't combining
> > > pg_read_file() with a cast to JSONB fix this, as below?
> >
> > Only if you're prepared to restrict the use of the tool to superusers
> > (or at least people with whatever privilege that function requires).
> >
> > Admittedly, you can probably feed the data to the backend without
> > use of an intermediate file; but it still requires a working backend
> > connection, which might be a bit of a leap for backup-related tools.
> > I'm sure Robert was envisioning doing this processing inside the tool.
> 
> Yeah, exactly. I don't think verifying a backup should require a
> running server, let alone a running server on the same machine where
> the backup is stored and for which you have superuser privileges.

Thanks for clarifying the context.

> AFAICS, the only options to make that work with JSON are (1) introduce
> a new hand-coded JSON parser designed for frontend operation, (2) add
> a dependency on an external JSON parser that we can use from frontend
> code, or (3) adapt the existing JSON parser used in the backend so
> that it can also be used in the frontend.
> 
> I'd be willing to do (1) -- it wouldn't be the first time I've written
> a JSON parser for PostgreSQL -- but I think it will take an order of
> magnitude more code than using a file with tab-separated columns as
> I've proposed, and I assume that there will be complaints about having
> two JSON parsers in core. I'd also be willing to do (2) if that's the
> consensus, but I'd vote against such an approach if somebody else
> proposed it because (a) I'm not aware of a widely-available library
> upon which we could depend and

I believe jq has an excellent one that's available under a suitable
license.

Making jq a dependency seems like a separate discussion, though. At
the moment, we don't use git tools like submodule/subtree, and deciding
which (or whether) seems like a gigantic discussion all on its own.

> (b) introducing such a dependency for a minor feature like this
> seems fairly unpalatable to me, and it'd probably still be more code
> than just using a tab-separated file.  I'd be willing to do (3) if
> somebody could explain to me how to solve the problems with porting
> that code to work on the frontend side, but the only suggestion so
> far as to how to do that is to port memory contexts, elog/ereport,
> and presumably encoding handling to work on the frontend side.

This port has come up several times recently in different contexts.
How big a chunk of work would it be?  Just so we're clear, I'm not
suggesting that this port should gate this feature.

> That seems to me to be an unreasonably large lift, especially given
> that we have lots of other files that use ad-hoc formats already,
> and if somebody ever gets around to converting all of those to JSON,
> they can certainly convert this one at the same time.

Would that require some kind of file converter program, or just a
really loud notice in the release notes?
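
To make the converter option concrete, here is a rough sketch of what
such a program might look like. It assumes a hypothetical manifest
layout of one file per line with path, size, and checksum separated by
tabs; the column names, the "files" wrapper key, and the function name
are all made up for illustration and are not taken from the patch.

```python
import json

# Hypothetical column layout; the real manifest format may differ.
COLUMNS = ("path", "size", "checksum")


def manifest_to_json(text: str) -> str:
    """Convert a tab-separated manifest (one file per line) to JSON."""
    files = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {lineno}: expected {len(COLUMNS)} fields, "
                f"got {len(fields)}"
            )
        entry = dict(zip(COLUMNS, fields))
        entry["size"] = int(entry["size"])  # sizes are stored as integers
        files.append(entry)
    return json.dumps({"files": files}, indent=2)
```

Something of this size would only cover the easy part; a real converter
would also have to deal with whatever escaping the tab-separated format
uses for unusual file names.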

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate