Re: backup manifests

Suraj Kharage <suraj.kharage@enterprisedb.com>

From: Suraj Kharage <suraj.kharage@enterprisedb.com>

To: Robert Haas <robertmhaas@gmail.com>

Cc: Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, David Steele <david@pgmasters.net>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>

Date: 2019-12-20T13:24:20Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited

Attachments

v4-0002-Implementation-of-backup-validator.patch (application/octet-stream) patch v4-0002
v4-0001-Backup-manifest-with-file-names-sizes-timestamps-.patch (application/octet-stream) patch v4-0001

Thank you for review comments.

On Thu, Dec 19, 2019 at 2:54 AM Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, Dec 17, 2019 at 12:54 AM Suraj Kharage
> <suraj.kharage@enterprisedb.com> wrote:
> > I have implemented the simplehash in backup validator patch as Robert
> suggested. Please find attached 0002 patch for the same.
> >
> > kindly review and let me know your thoughts.
>
> +#define CHECKSUM_LENGTH 256
>
> This seems wrong. Not all checksums are the same length, and none of
> the ones we're using are 256 bytes in length, and if we've got to have
> a constant someplace for the maximum checksum length, it should
> probably be in the new header file, not here. But I don't think we
> should need this in the first place; see comments below about how to
> revise the parsing of the manifest file.
>

I agree. Removed.

+    char        filetype[10];
>
> A mysterious 10-byte field with no comments explaining what it
> means... and the same magic number 10 appears in at least one other
> place in the patch.
>

with current logic, we don't need this anymore.
I have removed the filetype from the structure as we are not doing any
comparison anywhere.


>
> +typedef struct manifesthash_hash *hashtab;
>
> This declares a new *type* called hashtab, not a variable called
> hashtab. The new type is not used anywhere, but later, you have
> several variables of the same type that have this name. Just remove
> this: it's wrong and unused.
>
>
corrected.


> +static enum ChecksumAlgorithm checksum_type = MC_NONE;
>
> Remove "enum". Not needed, because you have a typedef for it in the
> header, and not per style.
>
> corrected.


> +static  manifesthash_hash *create_manifest_hash(char
> manifest_path[MAXPGPATH]);
>
> Whitespace is wrong. The whole patch needs a visit from pgindent with
> a properly-updated typedefs.list.
>
> Also, you will struggle to find anywhere else in the code base where
> pass a character array as a function argument. I don't know why this
> isn't just char *.
>

Corrected.


>
> +    if(verify_backup)
>
> Whitespace wrong here, too.
>
>
Fixed


>
> It's also pretty unhelpful, here and elsewhere, to refer to "the hash
> table" as if there were only one, and as if the reader were supposed
> to know something about it when you haven't told them anything about
> it.
>
> +        if (!entry->matched)
> +        {
> +            pg_log_info("missing file: %s", entry->filename);
> +        }
> +
>
> The braces here are not project style. We usually omit braces when
> only a single line of code is present.
>

fixed


>
> I think some work needs to be done to standardize and improve the
> messages that get produced here.  You have:
>
> 1. missing file: %s
> 2. duplicate file present: %s
> 3. size changed for file: %s, original size: %d, current size: %zu
> 4. checksum difference for file: %s
> 5. extra file found: %s
>
> I suggest:
>
> 1. file \"%s\" is present in manifest but missing from the backup
> 2. file \"%s\" has multiple manifest entries
> (this one should probably be pg_log_error(), not pg_log_info(), as it
> represents a corrupt-manifest problem)
> 3. file \"%s" has size %lu in manifest but size %lu in backup
> 4. file \"%s" has checksum %s in manifest but checksum %s in backup
> 5. file \"%s" is present in backup but not in manifest
>

Corrected.


>
> Your patch actually doesn't compile on my system, because for the
> third message above, it uses %zu to print the size. But %zu is for
> size_t, not off_t. I went looking for other places in the code where
> we print off_t; based on that, I think the right thing to do is to
> print it using %lu and write (unsigned long) st.st_size.
>

Corrected.

+    char        file_checksum[256];
> +    char        header[1024];
>
> More arbitrary constants.



>
> +    if (!file)
> +    {
> +        pg_log_error("could not open backup_manifest");
>
> That's bad error reporting.  See e.g. readfile() in initdb.c.
>

Corrected.


>
> +    if (fscanf(file, "%1023[^\n]\n", header) != 1)
> +    {
> +        pg_log_error("error while reading the header from
> backup_manifest");
>
> That's also bad error reporting. It is only a slight step up from
> "ERROR: error".
>
> And we have another magic number (1023).
>

With current logic, we don't need this anymore.


>
> +    appendPQExpBufferStr(manifest, header);
> +    appendPQExpBufferStr(manifest, "\n");
> ...
> +        appendPQExpBuffer(manifest, "File\t%s\t%d\t%s\t%s\n", filename,
> +                          filesize, mtime, checksum_with_type);
>
> This whole thing seems completely crazy to me. Basically, you're
> trying to use fscanf() to parse the file. But then, because fscanf()
> doesn't give you the original bytes back, you're trying to reassemble
> the data that you parsed to recover the original line, so that you can
> stuff it in the buffer and eventually checksum it. However, that's
> highly error-prone. You're basically duplicating server code, and thus
> risking getting out of sync in the server code, to work around a
> problem that is entirely self-inflicted, namely, deciding to use
> fscanf().
>
> What I would recommend is:
>
> 1. Use open(), read(), close() rather than the fopen() family of
> functions. As we have discovered elsewhere, fread() doesn't promise to
> set errno, so we can't necessarily get reliable error-reporting out of
> it.
>
> 2. Before you start reading the file, create a buffer that's large
> enough to hold the whole thing, by using fstat() to figure out how big
> the file is. Read the whole file into that buffer.  If you're not able
> to read the whole file -- i.e. open() or read() or close() fail --
> then just error out and exit.
>
> 3. Now advance through the file line by line. Write a function that
> knows how to search forward for the next \r or \n but with checks to
> make sure it can't run off the end of the buffer, and use that to
> locate the end of each line so that you can walk forward. As you walk
> forward line by line, add the line you just processed to the checksum.
> That way, you only need a single pass over the data. Also, you can
> modify it in place.  More on that below.
>
> 4. As you examine each line, start by examining the first word. You'll
> need a function that finds the first word by searching forward for a
> tab character, but not beyond the end of the line. The first word of
> the first line should be PostgreSQL-Backup-Manifest-Version and the
> second word should be 1. Then on each subsequent line check whether
> the first word is File or Manifest-Checksum or something else,
> erroring out in the last case. If it's Manifest-Checksum, verify that
> this is the last line of the file and that the checksum matches. If
> it's File, break the line into fields so you can add it to the hash
> table. You'll want a pointer to the filename and a pointer to the
> checksum, and you'll want to parse the size as an integer. Instead of
> allocating new memory for those fields, just overwrite the character
> that follows the field with a \0. There must be one - either \t or \n
> - so you shouldn't run off the end of the buffer.
>
> If you do this, a bunch of the fixed-size buffers you have right now
> go away. You don't need the variable filetype[10] any more, or
> checksum_with_type[CHECKSUM_LENGTH], or checksum[CHECKSUM_LENGTH], or
> the character arrays inside DataDirectoryFileInfo. Instead you can
> just have pointers into the buffer that contains the file. And you
> don't need this code to back up using fseek() and reread the lines,
> either.
>
>
Thanks for the suggestion. I tried to mimic your approach in the attached
v4-0002 patch.
Please let me know your thoughts on the same.

Also read this article:
>
> https://stackoverflow.com/questions/2430303/disadvantages-of-scanf
>
> Note that the very first point in the article talks about the problem
> of overrunning the buffer, which you certainly have in the current
> code right here:
>
> +        if (fscanf(file, "%s\t%s\t%d\t%23[^\t] %s\n", filetype, filename,
>
> filetype is declared as char[10], but %s could read arbitrarily much data.
>

now with this revised logic, we don't use this anymore.


>
> +        filename = (char*) pg_malloc(MAXPGPATH);
>
> pg_malloc returns void *, so no cast is required.
>
>
fixed.


> +        if (strcmp(checksum_with_type, "-") == 0)
> +        {
> +            checksum_type = MC_NONE;
> +        }
> +        else
> +        {
> +            if (strncmp(checksum_with_type, "SHA256", 6) == 0)
>
> Use parse_checksum_algorithm. Right now you've invented a "common"
> function with 1 caller, but I explicitly suggested previously that you
> put it in common so that you could reuse it.
>

while parsing the record, we get <checktype>:<checksum> as a string for
checksum.
parse_checksum_algorithm uses pg_strcasecmp()  so we need to pass exact
string to that function.
with current logic, we can't add '\0' in between the line unless we parse
it completely.
So we may need to allocate another small buffer and copy only checksum type
in that and pass that to
 parse_checksum_algorithm.  I don't think of any other solution apart from
this. I might be missing something
here, please correct me if I am wrong.


> +        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0
> ||
> +            strcmp(de->d_name, "pg_wal") == 0)
> +            continue;
>
> Ignoring pg_wal at the top level might be OK, but this will ignore a
> pg_wal entry anywhere in the directory tree.
>
> +    /* Skip backup manifest file. */
> +    if (strcmp(de->d_name, "backup_manifest") == 0)
> +        return;
>
> Same problem.
>

You are right. Added extra check for this.


>
> +    filename = createPQExpBuffer();
> +    if (!filename)
> +    {
> +        pg_log_error("out of memory");
> +        exit(1);
> +    }
> +
> +    appendPQExpBuffer(filename, "%s%s", relative_path, de->d_name);
>
> Just use char filename[MAXPGPATH] and snprintf here, as you do
> elsewhere. It will be simpler and save memory.
>
Fixed.

TAP test case patch needs some modification, Will do that and submit.

-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.