Re: backup manifests
Rushabh Lathia <rushabh.lathia@gmail.com>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
Attachments
- 0002_new_data_structure.patch (text/x-patch) patch
Thanks Jeevan for reviewing the patch and offline discussion.
On Mon, Dec 9, 2019 at 11:15 AM Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:
>
>
> On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia <rushabh.lathia@gmail.com>
> wrote:
>
>>
>>
>> On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>> On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com>
>>> wrote:
>>> > Here is the whole stack of patches.
>>>
>>> I committed 0001, as that's just refactoring and I think (hope) it's
>>> uncontroversial. I think 0002-0005 need to be squashed together
>>> (crediting all authors properly and in the appropriate order) as it's
>>> quite hard to understand right now,
>>
>>
>> Please find attached single patch and I tried to add the credit to all
>> the authors.
>>
>
> I had a look over the patch and here are my few review comments:
>
> 1.
> + if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
> + manifest_checksums = MC_SHA256;
> + else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") == 0)
> + manifest_checksums = MC_CRC32C;
> + else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
> + manifest_checksums = MC_NONE;
> + else
> + ereport(ERROR,
>
> Is NONE is a valid input? I think the default is "NONE" only and thus no
> need
> of this as an input. It will be better if we simply error out if input is
> neither "SHA256" nor "CRC32C".
>
> I believe you have done this way as from pg_basebackup you are always
> passing
> MANIFEST_CHECKSUMS '%s' string which will have "NONE" if no user input is
> given. But I think passing that conditional will be better like we have
> maxrate_clause for example.
>
> Well, this is what I think, feel free to ignore as I don't see any
> correctness
> issue over here.
>
>
I would still keep this NONE as it's look more cleaner in the say of
given options to the checksums.
> 2.
> + if (manifest_checksums != MC_NONE)
> + {
> + checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
> + switch (manifest_checksums)
> + {
> + case MC_NONE:
> + break;
> + }
>
> Since switch case is within "if (manifest_checksums != MC_NONE)" condition,
> I don't think we need a case for MC_NONE here. Rather we can use a default
> case to error out.
>
>
Yeah, with the new patch we don't have this part of code.
> 3.
> + if (manifest_checksums != MC_NONE)
> + {
> + initialize_manifest_checksum(&cCtx);
> + update_manifest_checksum(&cCtx, content, len);
> + }
>
> @@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char
> *tarfilename, struct stat *statbuf
> int segmentno = 0;
> char *segmentpath;
> bool verify_checksum = false;
> + ChecksumCtx cCtx;
> +
> + initialize_manifest_checksum(&cCtx);
>
>
> I see that in a few cases you are calling
> initialize/update_manifest_checksum()
> conditional and at some other places call is unconditional. It seems like
> calling unconditional will not have any issues as switch cases inside them
> return doing nothing when manifest_checksums is MC_NONE.
>
>
Fixed.
> 4.
> initialize/update/finalize_manifest_checksum() functions may be needed by
> the
> validation patch as well. And thus I think these functions should not
> depend
> on a global variable as such. Also, it will be good if we keep them in a
> file
> that is accessible to frontend-only code. Well, you can ignore these
> comments
> with the argument saying that this refactoring can be done by the patch
> adding
> validation support. I have no issues. Since both the patches are dependent
> and
> posted on the same email chain, thought of putting that observation.
>
>
Make sense, I just changed those API to that it doesn't have to
access the global.
> 5.
> + switch (manifest_checksums)
> + {
> + case MC_SHA256:
> + checksumlabel = "SHA256:";
> + break;
> + case MC_CRC32C:
> + checksumlabel = "CRC32C:";
> + break;
> + case MC_NONE:
> + break;
> + }
>
> This code in AddFileToManifest() is executed for every file for which we
> are
> adding an entry. However, the checksumlabel will be going to remain the
> same
> throughout. Can it be set just once and then used as is?
>
>
Yeah, with the attached patch we no more have this part of code.
> 6.
> Can we avoid manifest_checksums from declaring it as a global variable?
> I think for that, we need to pass that to every function and thus need to
> change the function signature of various functions. Currently, we pass
> "StringInfo manifest" to all the required function, will it better to pass
> the struct variable instead? A struct may have members like,
> "StringInfo manifest" in it, checksum type (manifest_checksums),
> checksum label, etc.
>
>
I agree. Earlier I was not sure about this because that require data
structure
to expose. But in the given attached patch that's what I tried, introduced
new
data structure and defined in basebackup.h and passed the same through the
function so that doesn't require to pass an individual members. Also
removed
global manifest_checksum and added the same in the newly introduced
structure.
Attaching the patch, which need to apply on the top of earlier 0001 patch.
Thanks,
--
Rushabh Lathia
www.EnterpriseDB.com