Re: backup manifests
Rushabh Lathia <rushabh.lathia@gmail.com>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
Attachments
- 0002_new_data_structure_v2.patch (text/x-patch) patch v2
On Mon, Dec 9, 2019 at 2:52 PM Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:
>
> Thanks Jeevan for reviewing the patch and offline discussion.
>
> On Mon, Dec 9, 2019 at 11:15 AM Jeevan Chalke <
> jeevan.chalke@enterprisedb.com> wrote:
>
>>
>>
>> On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia <rushabh.lathia@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com>
>>> wrote:
>>>
>>>> On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <
>>>> rushabh.lathia@gmail.com> wrote:
>>>> > Here is the whole stack of patches.
>>>>
>>>> I committed 0001, as that's just refactoring and I think (hope) it's
>>>> uncontroversial. I think 0002-0005 need to be squashed together
>>>> (crediting all authors properly and in the appropriate order) as it's
>>>> quite hard to understand right now,
>>>
>>>
>>> Please find attached single patch and I tried to add the credit to all
>>> the authors.
>>>
>>
>> I had a look over the patch and here are my few review comments:
>>
>> 1.
>> + if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
>> + manifest_checksums = MC_SHA256;
>> + else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") ==
>> 0)
>> + manifest_checksums = MC_CRC32C;
>> + else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
>> + manifest_checksums = MC_NONE;
>> + else
>> + ereport(ERROR,
>>
>> Is NONE is a valid input? I think the default is "NONE" only and thus no
>> need
>> of this as an input. It will be better if we simply error out if input is
>> neither "SHA256" nor "CRC32C".
>>
>> I believe you have done this way as from pg_basebackup you are always
>> passing
>> MANIFEST_CHECKSUMS '%s' string which will have "NONE" if no user input is
>> given. But I think passing that conditional will be better like we have
>> maxrate_clause for example.
>>
>> Well, this is what I think, feel free to ignore as I don't see any
>> correctness
>> issue over here.
>>
>>
> I would still keep this NONE as it's look more cleaner in the say of
> given options to the checksums.
>
>
>> 2.
>> + if (manifest_checksums != MC_NONE)
>> + {
>> + checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
>> + switch (manifest_checksums)
>> + {
>> + case MC_NONE:
>> + break;
>> + }
>>
>> Since switch case is within "if (manifest_checksums != MC_NONE)"
>> condition,
>> I don't think we need a case for MC_NONE here. Rather we can use a default
>> case to error out.
>>
>>
> Yeah, with the new patch we don't have this part of code.
>
>
>> 3.
>> + if (manifest_checksums != MC_NONE)
>> + {
>> + initialize_manifest_checksum(&cCtx);
>> + update_manifest_checksum(&cCtx, content, len);
>> + }
>>
>> @@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char
>> *tarfilename, struct stat *statbuf
>> int segmentno = 0;
>> char *segmentpath;
>> bool verify_checksum = false;
>> + ChecksumCtx cCtx;
>> +
>> + initialize_manifest_checksum(&cCtx);
>>
>>
>> I see that in a few cases you are calling
>> initialize/update_manifest_checksum()
>> conditional and at some other places call is unconditional. It seems like
>> calling unconditional will not have any issues as switch cases inside them
>> return doing nothing when manifest_checksums is MC_NONE.
>>
>>
> Fixed.
>
>
>> 4.
>> initialize/update/finalize_manifest_checksum() functions may be needed by
>> the
>> validation patch as well. And thus I think these functions should not
>> depend
>> on a global variable as such. Also, it will be good if we keep them in a
>> file
>> that is accessible to frontend-only code. Well, you can ignore these
>> comments
>> with the argument saying that this refactoring can be done by the patch
>> adding
>> validation support. I have no issues. Since both the patches are
>> dependent and
>> posted on the same email chain, thought of putting that observation.
>>
>>
> Make sense, I just changed those API to that it doesn't have to
> access the global.
>
>
>> 5.
>> + switch (manifest_checksums)
>> + {
>> + case MC_SHA256:
>> + checksumlabel = "SHA256:";
>> + break;
>> + case MC_CRC32C:
>> + checksumlabel = "CRC32C:";
>> + break;
>> + case MC_NONE:
>> + break;
>> + }
>>
>> This code in AddFileToManifest() is executed for every file for which we
>> are
>> adding an entry. However, the checksumlabel will be going to remain the
>> same
>> throughout. Can it be set just once and then used as is?
>>
>>
> Yeah, with the attached patch we no more have this part of code.
>
>
>> 6.
>> Can we avoid manifest_checksums from declaring it as a global variable?
>> I think for that, we need to pass that to every function and thus need to
>> change the function signature of various functions. Currently, we pass
>> "StringInfo manifest" to all the required function, will it better to pass
>> the struct variable instead? A struct may have members like,
>> "StringInfo manifest" in it, checksum type (manifest_checksums),
>> checksum label, etc.
>>
>>
> I agree. Earlier I was not sure about this because that require data
> structure
> to expose. But in the given attached patch that's what I tried,
> introduced new
> data structure and defined in basebackup.h and passed the same through the
> function so that doesn't require to pass an individual members. Also
> removed
> global manifest_checksum and added the same in the newly introduced
> structure.
>
> Attaching the patch, which need to apply on the top of earlier 0001 patch.
>
Attaching another version of 0002 patch, as my collogue Jeevan Chalke
pointed
few indentation problem in 0002 patch which I sent earlier. Fixed the same
in
the latest patch.
> Thanks,
>
> --
> Rushabh Lathia
> www.EnterpriseDB.com
>
--
Rushabh Lathia