Re: backup manifests

From: Stephen Frost <sfrost@snowman.net>
To: David Fetter <david@fetter.org>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, Robert Haas <robertmhaas@gmail.com>, David Steele <david@pgmasters.net>, Tels <nospam-pg-abuse@bloodgate.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-01-14T20:35:40Z
Lists: pgsql-hackers

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

Greetings,

* David Fetter (david@fetter.org) wrote:
> On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > ... I would also expect that depending on an external package
> > > would provoke significant opposition. If we suck the code into core,
> > > then we have to keep it up to date with the upstream, which is a
> > > significant maintenance burden - look at all the time Tom has spent on
> > > snowball, regex, and time zone code over the years.
> > 
> > Also worth noting is that we have a seriously bad track record about
> > choosing external packages to depend on.  The regex code has no upstream
> > maintainer anymore (well, the Tcl guys seem to think that *we* are
> > upstream for that now), and snowball is next door to moribund.
> > With C not being a particularly hip language to develop in anymore,
> > it wouldn't surprise me in the least for any C-code JSON parser
> > we might pick to go dead pretty soon.
> 
> Given jq's extreme popularity and compatible license, I'd nominate that.

I don't think that really changes Tom's concerns here about having an
"upstream" for this.

For my part, I don't really agree with the whole "we don't want two
different JSON parsers" argument when we've already got two of a bunch
of things between the frontend and the backend, particularly since I
don't really think it'll end up being *that* much code.

My thought, which I had expressed to David (though he obviously didn't
entirely agree with me since he suggested the other options), was to
adapt the pgBackRest JSON parser, which isn't really all that much code.

Frustratingly, that code has some internal pgBackRest dependencies: the
memory context system (which looks, unsurprisingly, an awful lot like
what is in the PG backend), the error handling and logging systems
(which differ from PG's because they're quite intentionally segregated
from each other, something PG would benefit from, imv), and Variants
(known in the PG backend as Datums, and quite similar to them).

Even so, David's offered to adjust the code to use the frontend's memory
management (*cough* malloc()..) and error handling/logging, and he had
some idea for Variants (or maybe just pulling the backend's Datum
system in..?  He could answer better).  The result would basically be a
frontend JSON parser for PG without too much code and with no external
dependencies, written to make sure it answers this requirement.  I've
agreed that he can spend some time on that instead of pgBackRest to get
us through this, if everyone else is agreeable to the idea.  Obviously
this isn't intended to box anyone in: if, even after the code's been
written, there turns out to be some fatal issue with using it, so be it,
but we're offering to help.

Thanks,

Stephen