Re: backup manifests

Suraj Kharage <suraj.kharage@enterprisedb.com>

From: Suraj Kharage <suraj.kharage@enterprisedb.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, David Steele <david@pgmasters.net>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2019-12-12T12:32:49Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

Attachments

Thanks, Robert for the review.

On Wed, Dec 11, 2019 at 1:10 AM Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage
> <suraj.kharage@enterprisedb.com> wrote:
> > Please find attached patch for backup validator implementation (0004
> patch). This patch is based
> > on Rushabh's latest patch for backup manifest.
> >
> > There are some functions required at client side as well, so I have
> moved those functions
> > and some data structure at common place so that they can be accessible
> for both. (0003 patch).
> >
> > My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for
> tap test cases which
> > is also attached. As of now, test cases related to the tablespace and
> tar backup  format are missing,
> > will continue work on same and submit the complete patch.
> >
> > With this mail, I have attached the complete patch stack for backup
> manifest and backup
> > validate implementation.
> >
> > Please let me know your thoughts on the same.
>
> Well, for the second time on this thread, please don't take a bunch of
> somebody else's code and post it in a patch that doesn't attribute
> that person as one of the authors. For the second time on this thread,
> the person is me, but don't borrow *anyone's* code without proper
> attribution. It's really important!
>
> On a related note, it's a very good idea to use git format-patch and
> git rebase -i to maintain patch stacks like this. Rushabh seems to
> have done that, but the files you're posting look like raw 'git diff'
> output. Notice that this gives him a way to include authorship
> information and a tentative commit message in each patch, but you
> don't have any of that.
>

Sorry, I have corrected this in the attached v2 patch set.


> Also on a related note, part of the process of adapting existing code
> to a new purpose is adapting the comments. You haven't done that:
>
> + * Search a result-set hash table for a row matching a given filename.
> ...
> + * Insert a row into a result-set hash table, provided no such row is
> already
> ...
> + * Most of the values
> + * that we're hashing are short integers formatted as text, so there
> + * shouldn't be much room for pathological input.
>
Corrected in v2 patch.


> I think that what we should actually do here is try to use simplehash.
> Right now, it won't work for frontend code, but I posted some patches
> to try to address that issue:
>
>
> https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com
>
> That would have a few advantages. One, we wouldn't need to know the
> number of elements in advance, because simplehash can grow
> dynamically. Two, we could use the iteration interfaces to walk the
> hash table.  Your solution to that is pgrhash_seq_search, but that's
> actually not well-designed, because it's not a generic iterator
> function but something that knows specifically about the 'touch' flag.
> I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not
> bad, but I think 'matched' will be a little more recognizable.
>

Thanks for the suggestion. Will try to implement the same and update
accordingly.
I am assuming that I need to build the patch based on the changes that you
proposed on the mentioned thread.


> Please run pgindent. If needed, first add locally defined types to
> typedefs.list, so that things indent properly.
>
> It's not a crazy idea to try to share some data structures and code
> between the frontend and the backend here, but I think
> src/common/backup.c and src/include/common/backup.h is a far too
> generic name given what the code is actually doing. It's mostly about
> checksums, not backup, and I think it should be named accordingly. I
> suggest removing "manifestinfo" and renaming the rest to just talk
> about checksums rather than manifests. That would make it logical to
> reuse this for any other future code that needs a configurable
> checksum type. Also, how about adding a function like:
>
> extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo);
>
> ...which would return true and set *algo if name is recognized, and
> return false otherwise. That code could be used on both the client and
> server sides of this patch, and by any future patches that want to
> return this scaffolding.
>

Corrected the filename and implemented the function as suggested.


> The file header for backup.h has the wrong filename (string.h). The
> header format looks somewhat atypical compared to what we normally do,
> too.


My bad, corrected the header format as well.


>
>
It's arguable, but I tend to think that it would be better to
> hex-encode the CRC rather than printing it as an integer.  Maybe
> hex_encode() is another thing that could be moved into the new
> src/common file.


We are already encoding the CRC checksum as well. Please let me know if I
misunderstood anything.
Moved hex_encode into src/common.


> As I said before about Rushabh's patch set, it's very confusing that
> we have so many patches here stacked up. Like, you have 0002 moving
> stuff, and then 0003 moving it again. That's super-confusing. Please
> try to structure the patch set so as to make it as easy to review as
> possible.
>

Sorry for the confusion. I have squashed 0001 to 0003 patches in one patch.


> Regarding the test case patch, error checks are important! Don't do
> things like this:
>
> +open my $modify_file_sha256, '>>',
> "$tempdir/backup_verify/postgresql.conf";
> +print $modify_file_sha256 "port = 5555\n";
> +close $modify_file_sha256;
>
> If the open fails, then it and the print and the close are going to
> silently do nothing. That's bad. I don't know exactly what the
> customary error-checking is for things like this in TAP tests, but I
> hope it's not like this, because this has a pretty fair chance of
> looking like it's testing something that it isn't. Let's figure out
> what the best practice in this area is and adhere to it.
>

Rajkumar has fixed this, please find attached 0003 patch for same.

Please find attached v2 set patches.

TODO: will implement the simplehash as suggested.


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.