Re: backup manifests
Suraj Kharage <suraj.kharage@enterprisedb.com>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
Attachments
- v2-0001-Backup-manifest-with-file-names-sizes-timestamps-.patch (application/octet-stream) patch v2-0001
- v2-0002-Implemenation-of-backup-validator.patch (application/octet-stream) patch v2-0002
- v2-0003-Tap-test-case-patch-to-verify-the-backup-using-ve.patch (application/octet-stream) patch v2-0003
Thanks, Robert for the review. On Wed, Dec 11, 2019 at 1:10 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage > <suraj.kharage@enterprisedb.com> wrote: > > Please find attached patch for backup validator implementation (0004 > patch). This patch is based > > on Rushabh's latest patch for backup manifest. > > > > There are some functions required at client side as well, so I have > moved those functions > > and some data structure at common place so that they can be accessible > for both. (0003 patch). > > > > My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for > tap test cases which > > is also attached. As of now, test cases related to the tablespace and > tar backup format are missing, > > will continue work on same and submit the complete patch. > > > > With this mail, I have attached the complete patch stack for backup > manifest and backup > > validate implementation. > > > > Please let me know your thoughts on the same. > > Well, for the second time on this thread, please don't take a bunch of > somebody else's code and post it in a patch that doesn't attribute > that person as one of the authors. For the second time on this thread, > the person is me, but don't borrow *anyone's* code without proper > attribution. It's really important! > > On a related note, it's a very good idea to use git format-patch and > git rebase -i to maintain patch stacks like this. Rushabh seems to > have done that, but the files you're posting look like raw 'git diff' > output. Notice that this gives him a way to include authorship > information and a tentative commit message in each patch, but you > don't have any of that. > Sorry, I have corrected this in the attached v2 patch set. > Also on a related note, part of the process of adapting existing code > to a new purpose is adapting the comments. You haven't done that: > > + * Search a result-set hash table for a row matching a given filename. > ... > + * Insert a row into a result-set hash table, provided no such row is > already > ... > + * Most of the values > + * that we're hashing are short integers formatted as text, so there > + * shouldn't be much room for pathological input. > Corrected in v2 patch. > I think that what we should actually do here is try to use simplehash. > Right now, it won't work for frontend code, but I posted some patches > to try to address that issue: > > > https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com > > That would have a few advantages. One, we wouldn't need to know the > number of elements in advance, because simplehash can grow > dynamically. Two, we could use the iteration interfaces to walk the > hash table. Your solution to that is pgrhash_seq_search, but that's > actually not well-designed, because it's not a generic iterator > function but something that knows specifically about the 'touch' flag. > I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not > bad, but I think 'matched' will be a little more recognizable. > Thanks for the suggestion. Will try to implement the same and update accordingly. I am assuming that I need to build the patch based on the changes that you proposed on the mentioned thread. > Please run pgindent. If needed, first add locally defined types to > typedefs.list, so that things indent properly. > > It's not a crazy idea to try to share some data structures and code > between the frontend and the backend here, but I think > src/common/backup.c and src/include/common/backup.h is a far too > generic name given what the code is actually doing. It's mostly about > checksums, not backup, and I think it should be named accordingly. I > suggest removing "manifestinfo" and renaming the rest to just talk > about checksums rather than manifests. That would make it logical to > reuse this for any other future code that needs a configurable > checksum type. Also, how about adding a function like: > > extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo); > > ...which would return true and set *algo if name is recognized, and > return false otherwise. That code could be used on both the client and > server sides of this patch, and by any future patches that want to > return this scaffolding. > Corrected the filename and implemented the function as suggested. > The file header for backup.h has the wrong filename (string.h). The > header format looks somewhat atypical compared to what we normally do, > too. My bad, corrected the header format as well. > > It's arguable, but I tend to think that it would be better to > hex-encode the CRC rather than printing it as an integer. Maybe > hex_encode() is another thing that could be moved into the new > src/common file. We are already encoding the CRC checksum as well. Please let me know if I misunderstood anything. Moved hex_encode into src/common. > As I said before about Rushabh's patch set, it's very confusing that > we have so many patches here stacked up. Like, you have 0002 moving > stuff, and then 0003 moving it again. That's super-confusing. Please > try to structure the patch set so as to make it as easy to review as > possible. > Sorry for the confusion. I have squashed 0001 to 0003 patches in one patch. > Regarding the test case patch, error checks are important! Don't do > things like this: > > +open my $modify_file_sha256, '>>', > "$tempdir/backup_verify/postgresql.conf"; > +print $modify_file_sha256 "port = 5555\n"; > +close $modify_file_sha256; > > If the open fails, then it and the print and the close are going to > silently do nothing. That's bad. I don't know exactly what the > customary error-checking is for things like this in TAP tests, but I > hope it's not like this, because this has a pretty fair chance of > looking like it's testing something that it isn't. Let's figure out > what the best practice in this area is and adhere to it. > Rajkumar has fixed this, please find attached 0003 patch for same. Please find attached v2 set patches. TODO: will implement the simplehash as suggested. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- -- Thanks & Regards, Suraj kharage, EnterpriseDB Corporation, The Postgres Database Company.