Re: backup manifests

David Fetter <david@fetter.org>

From: David Fetter <david@fetter.org>
To: Stephen Frost <sfrost@snowman.net>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, Robert Haas <robertmhaas@gmail.com>, David Steele <david@pgmasters.net>, Tels <nospam-pg-abuse@bloodgate.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-01-14T22:14:49Z
Lists: pgsql-hackers

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

On Tue, Jan 14, 2020 at 03:35:40PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * David Fetter (david@fetter.org) wrote:
> > On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
> > > Robert Haas <robertmhaas@gmail.com> writes:
> > > > ... I would also expect that depending on an external package
> > > > would provoke significant opposition. If we suck the code into core,
> > > > then we have to keep it up to date with the upstream, which is a
> > > > significant maintenance burden - look at all the time Tom has spent on
> > > > snowball, regex, and time zone code over the years.
> > > 
> > > Also worth noting is that we have a seriously bad track record about
> > > choosing external packages to depend on.  The regex code has no upstream
> > > maintainer anymore (well, the Tcl guys seem to think that *we* are
> > > upstream for that now), and snowball is next door to moribund.
> > > With C not being a particularly hip language to develop in anymore,
> > > it wouldn't surprise me in the least for any C-code JSON parser
> > > we might pick to go dead pretty soon.
> > 
> > Given jq's extreme popularity and compatible license, I'd nominate that.
> 
> I don't think that really changes Tom's concerns here about having an
> "upstream" for this.
> 
> For my part, I don't really agree with the whole "we don't want two
> different JSON parsers" when we've got two of a bunch of stuff between
> the frontend and the backend, particularly since I don't really think
> it'll end up being *that* much code.
> 
> My thought, which I had expressed to David (though he obviously didn't
> entirely agree with me since he suggested the other options), was to
> adapt the pgBackRest JSON parser, which isn't really all that much code.
> 
> Frustratingly, that code has got some internal pgBackRest dependency on
> things like the memory context system (which looks, unsurprisingly, an
> awful lot like what is in PG backend), the error handling and logging
> systems (which are different from PG because they're quite intentionally
> segregated from each other- something PG would benefit from, imv..), and
> Variadics (known in the PG backend as Datums, and quite similar to
> them..).

It might be more fun to put in that infrastructure and have it gate
the manifest feature than to have two vastly different parsers to
contend with. I get that putting off the backup manifests isn't an
awesome prospect, but neither is rushing them in and getting them
wrong in ways we'll still be regretting a decade hence.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate