Re: backup manifests
Robert Haas <robertmhaas@gmail.com>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote: > Please find attached patch for backup validator implementation (0004 patch). This patch is based > on Rushabh's latest patch for backup manifest. > > There are some functions required at client side as well, so I have moved those functions > and some data structure at common place so that they can be accessible for both. (0003 patch). > > My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for tap test cases which > is also attached. As of now, test cases related to the tablespace and tar backup format are missing, > will continue work on same and submit the complete patch. > > With this mail, I have attached the complete patch stack for backup manifest and backup > validate implementation. > > Please let me know your thoughts on the same. Well, for the second time on this thread, please don't take a bunch of somebody else's code and post it in a patch that doesn't attribute that person as one of the authors. For the second time on this thread, the person is me, but don't borrow *anyone's* code without proper attribution. It's really important! On a related note, it's a very good idea to use git format-patch and git rebase -i to maintain patch stacks like this. Rushabh seems to have done that, but the files you're posting look like raw 'git diff' output. Notice that this gives him a way to include authorship information and a tentative commit message in each patch, but you don't have any of that. Also on a related note, part of the process of adapting existing code to a new purpose is adapting the comments. You haven't done that: + * Search a result-set hash table for a row matching a given filename. ... + * Insert a row into a result-set hash table, provided no such row is already ... + * Most of the values + * that we're hashing are short integers formatted as text, so there + * shouldn't be much room for pathological input. I think that what we should actually do here is try to use simplehash. Right now, it won't work for frontend code, but I posted some patches to try to address that issue: https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com That would have a few advantages. One, we wouldn't need to know the number of elements in advance, because simplehash can grow dynamically. Two, we could use the iteration interfaces to walk the hash table. Your solution to that is pgrhash_seq_search, but that's actually not well-designed, because it's not a generic iterator function but something that knows specifically about the 'touch' flag. I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not bad, but I think 'matched' will be a little more recognizable. Please run pgindent. If needed, first add locally defined types to typedefs.list, so that things indent properly. It's not a crazy idea to try to share some data structures and code between the frontend and the backend here, but I think src/common/backup.c and src/include/common/backup.h is a far too generic name given what the code is actually doing. It's mostly about checksums, not backup, and I think it should be named accordingly. I suggest removing "manifestinfo" and renaming the rest to just talk about checksums rather than manifests. That would make it logical to reuse this for any other future code that needs a configurable checksum type. Also, how about adding a function like: extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo); ...which would return true and set *algo if name is recognized, and return false otherwise. That code could be used on both the client and server sides of this patch, and by any future patches that want to return this scaffolding. The file header for backup.h has the wrong filename (string.h). The header format looks somewhat atypical compared to what we normally do, too. It's arguable, but I tend to think that it would be better to hex-encode the CRC rather than printing it as an integer. Maybe hex_encode() is another thing that could be moved into the new src/common file. As I said before about Rushabh's patch set, it's very confusing that we have so many patches here stacked up. Like, you have 0002 moving stuff, and then 0003 moving it again. That's super-confusing. Please try to structure the patch set so as to make it as easy to review as possible. Regarding the test case patch, error checks are important! Don't do things like this: +open my $modify_file_sha256, '>>', "$tempdir/backup_verify/postgresql.conf"; +print $modify_file_sha256 "port = 5555\n"; +close $modify_file_sha256; If the open fails, then it and the print and the close are going to silently do nothing. That's bad. I don't know exactly what the customary error-checking is for things like this in TAP tests, but I hope it's not like this, because this has a pretty fair chance of looking like it's testing something that it isn't. Let's figure out what the best practice in this area is and adhere to it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company