Thread

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

  1. backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-09-18T17:48:06Z

    In the lengthy thread on block-level incremental backup,[1] both
    Vignesh C[2] and Stephen Frost[3] have suggested storing a manifest as
    part of each backup, something that could be useful not only for
    incremental backups but also for full backups. I initially didn't
    think this was necessary,[4] but some of my colleagues figured out
    that my design was broken, because my proposal was to detect new
    blocks just using LSN, and that ignores the fact that CREATE DATABASE
    and ALTER TABLE .. SET TABLESPACE do physical copies without bumping
    page LSNs, which I knew but somehow forgot about.  Fortunately, some
    of my colleagues realized my mistake in testing.[5] Because of this
    problem, for an LSN-based approach to work, we'll need to send not
    only an LSN, but also a list of files (and file sizes) that exist in
    the previous full backup; so, some kind of backup manifest now seems
    like a good idea to me.[6] That whole approach might still be dead on
    arrival if it's possible to add new blocks with old LSNs to existing
    files,[7] but there seems to be room to hope that there are no such
    cases.[8]
    
    So, let's suppose we invent a backup manifest. What should it contain?
    I imagine that it would consist of a list of files, and the lengths of
    those files, and a checksum for each file. I think you should have a
    choice of what kind of checksums to use, because algorithms that used
    to seem like good choices (e.g. MD5) no longer do; this trend can
    probably be expected to continue. Even if we initially support only
    one kind of checksum -- presumably SHA-something since we have code
    for that already for SCRAM -- I think that it would also be a good
    idea to allow for future changes. And maybe it's best to just allow a
    choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
    gate, so that we can avoid bikeshedding over which one is secure
    enough. I guess we'll still have to argue about the default. I also
    think that it should be possible to build a manifest with no
    checksums, so that one need not pay the overhead of computing
    checksums if one does not wish. Of course, such a manifest is of much
    less utility for checking backup integrity, but you can still check
    that you've got the right files, which is noticeably better than
    nothing.  The manifest should probably also contain a checksum of its
    own contents so that the integrity of the manifest itself can be
    verified. And maybe a few other bits of metadata, but I'm not sure
    exactly what.  Ideas?
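
To make the proposal above concrete, here is a minimal sketch of such a manifest: one entry per file with its path, length, and an optional checksum, plus a checksum over the manifest's own contents. The field names, the JSON-style layout, and the `build_manifest` helper are assumptions made for illustration; they are not taken from the thread or from any PostgreSQL patch.

```python
import hashlib
import json
import os

# Hypothetical set of selectable algorithms; "none" skips checksumming.
SUPPORTED = {"none", "sha224", "sha256", "sha384", "sha512"}


def file_checksum(path, algorithm):
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(backup_dir, algorithm="sha256"):
    """Describe every file under backup_dir by path, size, and checksum."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported checksum algorithm: {algorithm}")
    files = []
    for root, _dirs, names in os.walk(backup_dir):
        for name in names:
            full = os.path.join(root, name)
            entry = {
                "path": os.path.relpath(full, backup_dir).replace(os.sep, "/"),
                "size": os.path.getsize(full),
            }
            if algorithm != "none":
                entry["checksum"] = file_checksum(full, algorithm)
            files.append(entry)
    files.sort(key=lambda e: e["path"])
    body = {"checksum_algorithm": algorithm, "files": files}
    # Checksum of the manifest itself, taken over a canonical encoding
    # so that the manifest's integrity can be verified later.
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    body["manifest_checksum"] = hashlib.sha256(encoded).hexdigest()
    return body
```

Passing `algorithm="none"` yields the checksum-free variant mentioned above: it still records which files exist and how long they are.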
    
    Once we invent the concept of a backup manifest, what do we need to do
    with them? I think we'd want three things initially:
    
    (1) When taking a backup, have the option (perhaps enabled by default)
    to include a backup manifest.
    (2) Given an existing backup that has not got a manifest, construct one.
    (3) Cross-check a manifest against a backup and complain about extra
    files, missing files, size differences, or checksum mismatches.
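
Item (3) could be sketched roughly as follows. The manifest layout (a dict with `checksum_algorithm` and a `files` list of `path`, `size`, and optional `checksum`) and the `verify_backup` name are hypothetical, chosen only for illustration.

```python
import hashlib
import os


def verify_backup(backup_dir, manifest):
    """Compare a backup directory with a manifest; return a list of problems."""
    problems = []
    expected = {e["path"]: e for e in manifest["files"]}
    expected_paths = set(expected)
    algorithm = manifest.get("checksum_algorithm", "none")

    # Collect what is actually on disk, as '/'-separated relative paths.
    actual = set()
    for root, _dirs, names in os.walk(backup_dir):
        for name in names:
            rel = os.path.relpath(os.path.join(root, name), backup_dir)
            actual.add(rel.replace(os.sep, "/"))

    for path in sorted(actual - expected_paths):
        problems.append(f"extra file: {path}")
    for path in sorted(expected_paths - actual):
        problems.append(f"missing file: {path}")

    for path in sorted(actual & expected_paths):
        entry = expected[path]
        full = os.path.join(backup_dir, path)
        if os.path.getsize(full) != entry["size"]:
            problems.append(f"size mismatch: {path}")
            continue  # skip checksumming a file of the wrong length
        if algorithm != "none" and "checksum" in entry:
            h = hashlib.new(algorithm)
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            if h.hexdigest() != entry["checksum"]:
                problems.append(f"checksum mismatch: {path}")
    return problems
```

An empty return value means the backup matches the manifest. With a checksum-free manifest, only the extra, missing, and size checks run.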
    
    One thing I'm not quite sure about is where to store the backup
    manifest. If you take a base backup in tar format, you get base.tar,
    pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
    Does the backup manifest go into base.tar? Get written into a separate
    file outside of any tar archive? Something else? And what about a
    plain-format backup? I suppose then we should just write the manifest
    into the top level of the main data directory, but perhaps someone has
    another idea.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    [1] https://www.postgresql.org/message-id/flat/CA%2BTgmoYxQLL%3DmVyN90HZgH0X_EUrw%2BaZ0xsXJk7XV3-3LygTvA%40mail.gmail.com
    [2] https://www.postgresql.org/message-id/CALDaNm310fUZ72nM2n%3DcD0eSHKRAoJPuCyvvR0dhTEZ9Oytyzg%40mail.gmail.com
    [3] https://www.postgresql.org/message-id/20190916143817.GA6962%40tamriel.snowman.net
    [4] https://www.postgresql.org/message-id/CA%2BTgmoaj-zw4Mou4YBcJSkHmQM%2BJA-dAVJnRP8zSASP1S4ZVgw%40mail.gmail.com
    [5] https://www.postgresql.org/message-id/CAM2%2B6%3DXfJX%3DKXvpTgDvgd1rQjya_Am27j4UvJtL3nA%2BJMCTGVQ%40mail.gmail.com
    [6] https://www.postgresql.org/message-id/CA%2BTgmoYg9i8TZhyjf8MqCyU8unUVuW%2B03FeBF1LGDu_-eOONag%40mail.gmail.com
    [7] https://www.postgresql.org/message-id/CA%2BTgmoYT9xODgEB6y6j93hFHqobVcdiRCRCp0dHh%2BfFzZALn%3Dw%40mail.gmail.com
    and nearby messages
    [8] https://www.postgresql.org/message-id/20190916173933.GE6962%40tamriel.snowman.net
    
    
    
    
  2. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-09-19T01:11:36Z

    Hi Robert,
    
    On 9/18/19 1:48 PM, Robert Haas wrote:
    > That whole approach might still be dead on
    > arrival if it's possible to add new blocks with old LSNs to existing
    > files,[7] but there seems to be room to hope that there are no such
    > cases.[8]
    
    I sure hope there are no such cases, but we should be open to the idea
    just in case.
    
    > So, let's suppose we invent a backup manifest. What should it contain?
    > I imagine that it would consist of a list of files, and the lengths of
    > those files, and a checksum for each file. 
    
    These are essential.
    
    Also consider adding the timestamp.  You have justifiable concerns about
    using timestamps for deltas and I get that.  However, there are a number
    of methods that can be employed to make it *much* safer.  I won't go
    into that here since it is an entire thread in itself.  Suffice to say
    we can detect many anomalies in the timestamps and require a checksum
    backup when we see them.  I'm really interested in scanning the WAL for
    changed files but that method is very complex and getting it right might
    be harder than ensuring FS checksums are reliable.  Still worth trying,
    though, since the benefits are enormous.  We are planning to use
    timestamp + size + wal data to do incrementals if we get there.
    
    Consider adding a reference to each file that specifies where the file
    can be found if it is not in this backup.  As I understand the
    pg_basebackup proposal, it would only be implementing differential
    backups, i.e. an incremental that is *only* based on the last full
    backup.  So, the reference can be inferred in this case.  However, if
    the user selects the wrong full backup on restore, and we have labeled
    each backup, then a differential restore with references against the
    wrong full backup would result in a hard error rather than corruption.
    
    > I think you should have a
    > choice of what kind of checksums to use, because algorithms that used
    > to seem like good choices (e.g. MD5) no longer do; this trend can
    > probably be expected to continue. Even if we initially support only
    > one kind of checksum -- presumably SHA-something since we have code
    > for that already for SCRAM -- I think that it would also be a good
    > idea to allow for future changes. And maybe it's best to just allow a
    > choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
    > gate, so that we can avoid bikeshedding over which one is secure
    > enough. I guess we'll still have to argue about the default. 
    
    Based on my original calculations (which sadly I don't have anymore),
    the combination of SHA1, size, and file name is *extremely* unlikely to
    generate a collision.  As in, unlikely to happen before the end of the
    universe kind of unlikely.  Though, I guess it depends on your
    expectations for the lifetime of the universe.
    
    These checksums don't have to be cryptographically secure, in the sense
    that you could infer the plaintext from the checksum.  They just need to
    have a suitably low collision rate.  These days I would choose something
    with more bits because the computation time is similar, though the
    larger size requires more storage.
    
    > I also
    > think that it should be possible to build a manifest with no
    > checksums, so that one need not pay the overhead of computing
    > checksums if one does not wish. 
    
    Our benchmarks have indicated that checksums only account for about 1%
    of total cpu time when gzip -6 compression is used.  Without compression
    the percentage may be higher of course, but in that case we find network
    latency is the primary bottleneck.
    
    For S3 backups we do a SHA1 hash for our manifest, a SHA256 hash for
    authv4 and a good-old-fashioned MD5 checksum for each upload part.  This
    is barely noticeable when compression is enabled.
    
    > Of course, such a manifest is of much
    > less utility for checking backup integrity, but you can still check
    > that you've got the right files, which is noticeably better than
    > nothing.  
    
    Absolutely -- and yet.  There was a time when we made checksums optional
    but eventually gave up on that once we profiled and realized how low the
    cost was vs. the benefit.
    
    > The manifest should probably also contain a checksum of its
    > own contents so that the integrity of the manifest itself can be
    > verified. 
    
    This is a good idea.  Amazingly we've never seen a manifest checksum
    error in the field but it's only a matter of time.
    
    > And maybe a few other bits of metadata, but I'm not sure
    > exactly what.  Ideas?
    
    A backup label for sure.  You can also use this as the directory/tar
    name to save the user coming up with one.  We use YYYYMMDDHH24MMSSF for
    full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
    incrementals and have logic to prevent two backups from having the same
    label.  This is unlikely outside of testing but still a good idea.
    
    Knowing the start/stop time of the backup is useful in all kinds of
    ways, especially monitoring and time-targeted PITR.  Start/stop LSN is
    also good.  I know this is also in backup_label but having it all in one
    place is nice.
    
    We include the version/sysid of the cluster to avoid mixups.  It's a
    great extra check on top of references to be sure everything is kosher.
    
    A manifest version is good in case we change the format later.  I'd
    recommend JSON for the format since it is so ubiquitous and easily
    handles escaping which can be gotchas in a home-grown format.  We
    currently have a format that is a combination of Windows INI and JSON
    (for human-readability in theory) and we have become painfully aware of
    escaping issues.  Really, why would you drop files with '=' in their
    name in PGDATA?  And yet it happens.
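
The escaping problem described above can be illustrated with a small sketch. The `key=value` line format and the file names are made up for the example and do not represent any real tool's manifest format.

```python
import json

# File names that would confuse a naive "key=value" line format.
names = ["base/1/1234", "odd=name", 'quote"and\\backslash', "line\nbreak"]

# Naive INI-style encoding: splitting on the first '=' recovers the wrong key.
line = f"{names[1]}=size:8192"
naive_key, _, naive_value = line.partition("=")

# JSON encoding: the serializer handles escaping, so every name round-trips.
encoded = json.dumps([{"path": n, "size": 8192} for n in names])
decoded = [entry["path"] for entry in json.loads(encoded)]
```

Here `naive_key` comes back as `odd` rather than `odd=name`, while `decoded` is identical to `names`, including the entry containing a newline.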
    
    > Once we invent the concept of a backup manifest, what do we need to do
    > with them? I think we'd want three things initially:
    > 
    > (1) When taking a backup, have the option (perhaps enabled by default)
    > to include a backup manifest.
    
    Manifests are cheap to build so I wouldn't make it an option.
    
    > (2) Given an existing backup that has not got a manifest, construct one.
    
    Might be too late to be trusted and we'd have to write extra code for
    it.  I'd leave this for a project down the road, if at all.
    
    > (3) Cross-check a manifest against a backup and complain about extra
    > files, missing files, size differences, or checksum mismatches.
    
    Verification is the best part of the manifest.  Plus, you can do
    verification pretty cheaply on restore.  We also restore pg_control last
    so clusters that have a restore error won't start.
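
A rough sketch of the "restore pg_control last" idea, assuming a plain file-by-file copy. The `restore` and `restore_order` helpers are hypothetical and are not taken from any existing tool.

```python
import os
import shutil

# Location of pg_control relative to the top of a data directory.
CONTROL_FILE = "global/pg_control"


def restore_order(paths):
    """Return the paths sorted, with pg_control moved to the very end."""
    others = sorted(p for p in paths if p != CONTROL_FILE)
    last = [p for p in paths if p == CONTROL_FILE]
    return others + last


def restore(backup_dir, target_dir, paths):
    """Copy files from backup_dir to target_dir, pg_control last.

    If any copy raises, pg_control has not been written yet, so the
    target has no control file and the server will refuse to start.
    """
    for rel in restore_order(paths):
        dest = os.path.join(target_dir, rel)
        os.makedirs(os.path.dirname(dest) or target_dir, exist_ok=True)
        shutil.copyfile(os.path.join(backup_dir, rel), dest)
```

Per-file verification against the manifest could be slotted into the same loop, so that a mismatch aborts the restore before the control file is ever written.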
    
    > One thing I'm not quite sure about is where to store the backup
    > manifest. If you take a base backup in tar format, you get base.tar,
    > pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
    > Does the backup manifest go into base.tar? Get written into a separate
    > file outside of any tar archive? Something else? And what about a
    > plain-format backup? I suppose then we should just write the manifest
    > into the top level of the main data directory, but perhaps someone has
    > another idea.
    
    We do:
    
    [backup_label]/
        backup.manifest
        pg_data/
        pg_tblspc/
    
    In general, having the manifest easily accessible is ideal.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  3. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-09-19T13:51:11Z

    On Wed, Sep 18, 2019 at 9:11 PM David Steele <david@pgmasters.net> wrote:
    > Also consider adding the timestamp.
    
    Sounds reasonable, even if only for the benefit of humans who might
    look at the file.  We can decide later whether to use it for anything
    else (and third-party tools could make different decisions from core).
    I assume we're talking about file mtime here, not file ctime or file
    atime or the time the manifest was generated, but let me know if I'm
    wrong.
    
    > Consider adding a reference to each file that specifies where the file
    > can be found if it is not in this backup.  As I understand the
    > pg_basebackup proposal, it would only be implementing differential
    > backups, i.e. an incremental that is *only* based on the last full
    > backup.  So, the reference can be inferred in this case.  However, if
    > the user selects the wrong full backup on restore, and we have labeled
    > each backup, then a differential restore with references against the
    > wrong full backup would result in a hard error rather than corruption.
    
    I intend that we should be able to support incremental backups based
    either on a previous full backup or based on a previous incremental
    backup. I am not aware of a technical reason why we need to identify
    the specific backup that must be used. If incremental backup B is
    taken based on a pre-existing backup A, then I think that B can be
    restored using either A or *any other backup taken after A and before
    B*. In the normal case, there probably wouldn't be any such backup,
    but AFAICS the start-LSNs are a sufficient cross-check that the chosen
    base backup is legal.
    
    > Based on my original calculations (which sadly I don't have anymore),
    > the combination of SHA1, size, and file name is *extremely* unlikely to
    > generate a collision.  As in, unlikely to happen before the end of the
    > universe kind of unlikely.  Though, I guess it depends on your
    > expectations for the lifetime of the universe.
    
    Somebody once said that we should be prepared for it to end at any
    time, or not, and that the time at which it actually was due to end
    would not be disclosed in advance. This is probably good life advice
    which I ought to take more frequently than I do, but I think we can
    finesse the issue for purposes of this discussion. What I'd say is: if
    the probability of getting a collision is demonstrably many orders of
    magnitude less than the probability of the disk writing the block
    incorrectly, then I think we're probably reasonably OK. Somebody might
    differ, which is perhaps a mild point in favor of LSN-based
    approaches, but as a practical matter, if a bad block is a billion
    times more likely to be the result of a disk error than a checksum
    mismatch, then it's a negligible risk.
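
The orders of magnitude involved can be estimated with the usual birthday-bound approximation. The file count and the disk error figure below are illustrative assumptions, not numbers from the thread, and the estimate covers accidental collisions only.

```python
def collision_probability(num_files, hash_bits):
    """Birthday-bound estimate p ~= n(n-1)/2 / 2**bits (valid while p is small)."""
    pairs = num_files * (num_files - 1) // 2
    return pairs / 2 ** hash_bits


# Chance that any two of a billion distinct files share a 160-bit digest.
p_sha1 = collision_probability(10**9, 160)

# Illustrative assumption only: a per-bit unrecoverable read error rate.
assumed_disk_error = 1e-15

# How many times more likely the assumed disk error is than a collision.
ratio = assumed_disk_error / p_sha1
```

Under these assumptions `p_sha1` is on the order of 1e-31, so the assumed disk error is far more than a billion times as likely as an accidental collision.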
    
    > > And maybe a few other bits of metadata, but I'm not sure
    > > exactly what.  Ideas?
    >
    > A backup label for sure.  You can also use this as the directory/tar
    > name to save the user coming up with one.  We use YYYYMMDDHH24MMSSF for
    > full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
    > incrementals and have logic to prevent two backups from having the same
    > label.  This is unlikely outside of testing but still a good idea.
    >
    > Knowing the start/stop time of the backup is useful in all kinds of
    > ways, especially monitoring and time-targeted PITR.  Start/stop LSN is
    > also good.  I know this is also in backup_label but having it all in one
    > place is nice.
    >
    > We include the version/sysid of the cluster to avoid mixups.  It's a
    > great extra check on top of references to be sure everything is kosher.
    
    I don't think it's a good idea to duplicate the information that's
    already in the backup_label. Storing two copies of the same
    information is just an invitation to having to worry about what
    happens if they don't agree.
    
    > A manifest version is good in case we change the format later.
    
    Yeah.
    
    > I'd
    > recommend JSON for the format since it is so ubiquitous and easily
    > handles escaping which can be gotchas in a home-grown format.  We
    > currently have a format that is a combination of Windows INI and JSON
    > (for human-readability in theory) and we have become painfully aware of
    > escaping issues.  Really, why would you drop files with '=' in their
    > name in PGDATA?  And yet it happens.
    
    I am not crazy about JSON because it requires that I get a json parser
    into src/common, which I could do, but given the possibly-imminent end
    of the universe, I'm not sure it's the greatest use of time. You're
    right that if we pick an ad-hoc format, we've got to worry about
    escaping, which isn't lovely.
    
    > > (1) When taking a backup, have the option (perhaps enabled by default)
    > > to include a backup manifest.
    >
    > Manifests are cheap to build so I wouldn't make it an option.
    
    Huh. That's an interesting idea. Thanks.
    
    > > (3) Cross-check a manifest against a backup and complain about extra
    > > files, missing files, size differences, or checksum mismatches.
    >
    > Verification is the best part of the manifest.  Plus, you can do
    > verification pretty cheaply on restore.  We also restore pg_control last
    > so clusters that have a restore error won't start.
    
    There's no "restore" operation here, really. A backup taken by
    pg_basebackup can be "restored" by copying the whole thing, but it can
    also be used just where it is. If we were going to build something
    into some in-core tool to copy backups around, this would be a smart
    way to implement said tool, but I'm not planning on that myself.
    
    > > One thing I'm not quite sure about is where to store the backup
    > > manifest. If you take a base backup in tar format, you get base.tar,
    > > pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
    > > Does the backup manifest go into base.tar? Get written into a separate
    > > file outside of any tar archive? Something else? And what about a
    > > plain-format backup? I suppose then we should just write the manifest
    > > into the top level of the main data directory, but perhaps someone has
    > > another idea.
    >
    > We do:
    >
    > [backup_label]/
    >     backup.manifest
    >     pg_data/
    >     pg_tblspc/
    >
    > In general, having the manifest easily accessible is ideal.
    
    That's a fine choice for a tool, but I'm talking about something
    that is part of the actual backup format supported by PostgreSQL, not
    what a tool might wrap around it. The choice is whether, for a
    tar-format backup, the manifest goes inside a tar file or as a
    separate file. To put that another way, a patch adding backup
    manifests does not get to redesign where pg_basebackup puts anything
    else; it only gets to decide where to put the manifest.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  4. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-09-19T15:00:45Z

    On Thu, Sep 19, 2019 at 9:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > I intend that we should be able to support incremental backups based
    > either on a previous full backup or based on a previous incremental
    > backup. I am not aware of a technical reason why we need to identify
    > the specific backup that must be used. If incremental backup B is
    > taken based on a pre-existing backup A, then I think that B can be
    > restored using either A or *any other backup taken after A and before
    > B*. In the normal case, there probably wouldn't be any such backup,
    > but AFAICS the start-LSNs are a sufficient cross-check that the chosen
    > base backup is legal.
    
    Scratch that: there can be overlapping backups, so you have to
    cross-check both start and stop LSNs.
    
    > > > (3) Cross-check a manifest against a backup and complain about extra
    > > > files, missing files, size differences, or checksum mismatches.
    > >
    > > Verification is the best part of the manifest.  Plus, you can do
    > > verification pretty cheaply on restore.  We also restore pg_control last
    > > so clusters that have a restore error won't start.
    >
    > There's no "restore" operation here, really. A backup taken by
    > pg_basebackup can be "restored" by copying the whole thing, but it can
    > also be used just where it is. If we were going to build something
    > into some in-core tool to copy backups around, this would be a smart
    > way to implement said tool, but I'm not planning on that myself.
    
    Scratch that: incremental backups need a restore tool, so we can use
    this technique there. And it can work for full backups too, because
    why not?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  5. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-09-20T03:06:04Z

    Hi Robert,
    
    On 9/19/19 9:51 AM, Robert Haas wrote:
    > On Wed, Sep 18, 2019 at 9:11 PM David Steele <david@pgmasters.net> wrote:
    >> Also consider adding the timestamp.
    > 
    > Sounds reasonable, even if only for the benefit of humans who might
    > look at the file.  We can decide later whether to use it for anything
    > else (and third-party tools could make different decisions from core).
    > I assume we're talking about file mtime here, not file ctime or file
    > atime or the time the manifest was generated, but let me know if I'm
    > wrong.
    
    In my experience only mtime is useful.
    
    >> Based on my original calculations (which sadly I don't have anymore),
    >> the combination of SHA1, size, and file name is *extremely* unlikely to
    >> generate a collision.  As in, unlikely to happen before the end of the
    >> universe kind of unlikely.  Though, I guess it depends on your
    >> expectations for the lifetime of the universe.
    
    > What I'd say is: if
    > the probability of getting a collision is demonstrably many orders of
    > magnitude less than the probability of the disk writing the block
    > incorrectly, then I think we're probably reasonably OK. Somebody might
    > differ, which is perhaps a mild point in favor of LSN-based
    > approaches, but as a practical matter, if a bad block is a billion
    > times more likely to be the result of a disk error than a checksum
    > mismatch, then it's a negligible risk.
    
    Agreed.
    
    >> We include the version/sysid of the cluster to avoid mixups.  It's a
    >> great extra check on top of references to be sure everything is kosher.
    > 
    > I don't think it's a good idea to duplicate the information that's
    > already in the backup_label. Storing two copies of the same
    > information is just an invitation to having to worry about what
    > happens if they don't agree.
    
    OK, but now we have backup_label, tablespace_map, 
    XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.backup (in the WAL) and now perhaps a 
    backup.manifest file.  I feel like we may be drowning in backup info files.
    
    >> I'd
    >> recommend JSON for the format since it is so ubiquitous and easily
    >> handles escaping which can be gotchas in a home-grown format.  We
    >> currently have a format that is a combination of Windows INI and JSON
    >> (for human-readability in theory) and we have become painfully aware of
    >> escaping issues.  Really, why would you drop files with '=' in their
    >> name in PGDATA?  And yet it happens.
    > 
    > I am not crazy about JSON because it requires that I get a json parser
    > into src/common, which I could do, but given the possibly-imminent end
    > of the universe, I'm not sure it's the greatest use of time. You're
    > right that if we pick an ad-hoc format, we've got to worry about
    > escaping, which isn't lovely.
    
    My experience is that JSON is simple to implement and has already dealt 
    with escaping and data structure considerations.  A home-grown solution 
    will be at least as complex but have the disadvantage of being non-standard.
    
    >>> One thing I'm not quite sure about is where to store the backup
    >>> manifest. If you take a base backup in tar format, you get base.tar,
    >>> pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
    >>> Does the backup manifest go into base.tar? Get written into a separate
    >>> file outside of any tar archive? Something else? And what about a
    >>> plain-format backup? I suppose then we should just write the manifest
    >>> into the top level of the main data directory, but perhaps someone has
    >>> another idea.
    >>
    >> We do:
    >>
    >> [backup_label]/
    >>      backup.manifest
    >>      pg_data/
    >>      pg_tblspc/
    >>
    >> In general, having the manifest easily accessible is ideal.
    > 
    > That's a fine choice for a tool, but I'm talking about something
    > that is part of the actual backup format supported by PostgreSQL, not
    > what a tool might wrap around it. The choice is whether, for a
    > tar-format backup, the manifest goes inside a tar file or as a
    > separate file. To put that another way, a patch adding backup
    > manifests does not get to redesign where pg_basebackup puts anything
    > else; it only gets to decide where to put the manifest.
    
    Fair enough.  The point is to make the manifest easily accessible.
    
    I'd keep it in the data directory for file-based backups and as a 
    separate file for tar-based backups.  The advantage here is that we can 
    pick a file name that becomes reserved which a tool can't do.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  6. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-09-20T03:10:46Z

    On 9/19/19 11:00 AM, Robert Haas wrote:
    
    > On Thu, Sep 19, 2019 at 9:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >> I intend that we should be able to support incremental backups based
    >> either on a previous full backup or based on a previous incremental
    >> backup. I am not aware of a technical reason why we need to identify
    >> the specific backup that must be used. If incremental backup B is
    >> taken based on a pre-existing backup A, then I think that B can be
    >> restored using either A or *any other backup taken after A and before
    >> B*. In the normal case, there probably wouldn't be any such backup,
    >> but AFAICS the start-LSNs are a sufficient cross-check that the chosen
    >> base backup is legal.
    > 
    > Scratch that: there can be overlapping backups, so you have to
    > cross-check both start and stop LSNs.
    
    Overall we have found it's much simpler to label each backup and 
    cross-check that against the pg version and system id.  Start LSN is 
    pretty unique, but backup labels work really well and are more widely 
    understood.
    
    >>>> (3) Cross-check a manifest against a backup and complain about extra
    >>>> files, missing files, size differences, or checksum mismatches.
    >>>
    >>> Verification is the best part of the manifest.  Plus, you can do
    >>> verification pretty cheaply on restore.  We also restore pg_control last
    >>> so clusters that have a restore error won't start.
    >>
    >> There's no "restore" operation here, really. A backup taken by
    >> pg_basebackup can be "restored" by copying the whole thing, but it can
    >> also be used just where it is. If we were going to build something
    >> into some in-core tool to copy backups around, this would be a smart
    >> way to implement said tool, but I'm not planning on that myself.
    > 
    > Scratch that: incremental backups need a restore tool, so we can use
    > this technique there. And it can work for full backups too, because
    > why not?
    
    Agreed, once we have a restore tool, use it for everything.
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  7. Re: backup manifests

    Michael Paquier <michael@paquier.xyz> — 2019-09-20T07:15:28Z

    On Thu, Sep 19, 2019 at 11:10:46PM -0400, David Steele wrote:
    > On 9/19/19 11:00 AM, Robert Haas wrote:
    >> On Thu, Sep 19, 2019 at 9:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >> > I intend that we should be able to support incremental backups based
    >> > either on a previous full backup or based on a previous incremental
    >> > backup. I am not aware of a technical reason why we need to identify
    >> > the specific backup that must be used. If incremental backup B is
    >> > taken based on a pre-existing backup A, then I think that B can be
    >> > restored using either A or *any other backup taken after A and before
    >> > B*. In the normal case, there probably wouldn't be any such backup,
    >> > but AFAICS the start-LSNs are a sufficient cross-check that the chosen
    >> > base backup is legal.
    >> 
    >> Scratch that: there can be overlapping backups, so you have to
    >> cross-check both start and stop LSNs.
    > 
    > Overall we have found it's much simpler to label each backup and cross-check
    > that against the pg version and system id.  Start LSN is pretty unique, but
    > backup labels work really well and are more widely understood.
    
    Warning.  The start LSN could be the same for multiple backups when
    taken from a standby.
    --
    Michael
    
  8. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-09-20T12:58:40Z

    On Thu, Sep 19, 2019 at 11:10 PM David Steele <david@pgmasters.net> wrote:
    > Overall we have found it's much simpler to label each backup and
    > cross-check that against the pg version and system id.  Start LSN is
    > pretty unique, but backup labels work really well and are more widely
    > understood.
    
    I see your point, but part of my point is that uniqueness is not a
    technical requirement. However, it may be a requirement for user
    comprehension.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  9. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-09-20T13:46:11Z

    On Thu, Sep 19, 2019 at 11:06 PM David Steele <david@pgmasters.net> wrote:
    > > I am not crazy about JSON because it requires that I get a json parser
    > > into src/common, which I could do, but given the possibly-imminent end
    > > of the universe, I'm not sure it's the greatest use of time. You're
    > > right that if we pick an ad-hoc format, we've got to worry about
    > > escaping, which isn't lovely.
    >
    > My experience is that JSON is simple to implement and has already dealt
    > with escaping and data structure considerations.  A home-grown solution
    > will be at least as complex but have the disadvantage of being non-standard.
    
    I think that's fair and just spent a little while investigating how
    difficult it would be to disentangle the JSON parser from the backend.
    It has dependencies on the following bits of backend-only
    functionality:
    
    - check_stack_depth(). No problem, I think.  Just skip it for frontend code.
    
    - pg_mblen() / GetDatabaseEncoding(). Not sure what to do about this.
    Some of our infrastructure for dealing with encoding is available in
    the frontend and backend, but this part is backend-only.
    
    - elog() / ereport(). Kind of a pain. We could just kill the program
    if an error occurs, but that seems a bit ham-fisted. Refactoring the
    code so that the error is returned rather than thrown might be the way
    to go, but it's not simple, because you're not just passing a string.
    
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                 errmsg("invalid input syntax for type %s", "json"),
                 errdetail("Character with value 0x%02x must be escaped.",
                           (unsigned char) *s),
                 report_json_context(lex)));
    
    - appendStringInfo et al. I don't think it would be that hard to move
    this to src/common, but I'm also not sure it really solves the
    problem, because StringInfo has a 1GB limit, and there's no rule at
    all that a backup manifest has got to be less than 1GB.
    
    https://www.pgcon.org/2013/schedule/events/595.en.html
    
    This gets at another problem that I just started to think about. If
    the file is just a series of lines, you can parse it one line at a
    time and do something with that line, then move on. If it's a JSON
    blob, you have to parse the whole file and get a potentially giant
    data structure back, and then operate on that data structure. At
    least, I think you do. There's probably some way to create a callback
    structure that lets you presuppose that the toplevel data structure is
    an array (or object) and get back each element of that array (or
    key/value pair) as it's parsed, but that sounds pretty annoying to get
    working. Or we could just decide that you have to have enough memory
    to hold the parsed version of the entire manifest file in memory all
    at once, and if you don't, maybe you should drop some tables or buy
    more RAM. That still leaves you with bypassing the 1GB size limit on
    StringInfo, maybe by having a "huge" option, or perhaps by
    memory-mapping the file and then making the StringInfo point directly
    into the mapped region. Perhaps I'm overthinking this and maybe you
    have a simpler idea in mind about how it can be made to work, but I
    find all this complexity pretty unappealing.
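
The callback-style idea described above can be sketched as follows. This is only an illustration in Python, not anything proposed in the thread: it assumes the manifest is a single top-level JSON array, and the function name `iter_array_items` and the `chunk_size` parameter are hypothetical. It relies on the standard library's `json.JSONDecoder.raw_decode` to pull one element at a time out of a buffer that is refilled in chunks, so only the current element (plus at most one chunk) has to be held as unparsed text.

```python
import io
import json


def iter_array_items(fp, chunk_size=65536):
    """Yield the elements of a top-level JSON array read from text file fp.

    Hypothetical helper for illustration. Malformed input is only
    reported after the rest of the file has been read into the buffer.
    """
    decoder = json.JSONDecoder()
    buf = ""

    def refill():
        nonlocal buf
        more = fp.read(chunk_size)
        if not more:
            raise ValueError("unexpected end of JSON input")
        buf += more

    def skip_ws():
        nonlocal buf
        buf = buf.lstrip()
        while not buf:
            refill()
            buf = buf.lstrip()

    skip_ws()
    if buf[0] != "[":
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    first = True
    while True:
        skip_ws()
        if buf[0] == "]":
            return
        if not first:
            if buf[0] != ",":
                raise ValueError("expected ',' or ']'")
            buf = buf[1:]
            skip_ws()
        while True:
            try:
                item, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                refill()  # element is probably split across chunks
                continue
            rest = buf[end:].lstrip()
            # Accept the value only once its terminator is visible, so a
            # number cut off at a chunk boundary is not taken too early.
            if rest and rest[0] in ",]":
                break
            refill()
        buf = buf[end:]
        first = False
        yield item
```

A caller would loop over `iter_array_items(open(path))` and handle each entry as it arrives. The sketch re-parses a partial element every time it refills, which is wasteful for very large elements but keeps the code short.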
    
    Here's a competing proposal: let's decide that lines consist of
    tab-separated fields. If a field contains a \t, \r, or \n, put a " at
    the beginning, a " at the end, and double any " that appears in the
    middle. This is easy to generate and easy to parse. It lets us
    completely ignore encoding considerations. Incremental parsing is
    straightforward. Quoting will rarely be needed because there's very
    little reason to create a file inside a PostgreSQL data directory that
    contains a tab or a newline, but if you do it'll still work.  The lack
    of quoting is nice for humans reading the manifest, and nice in terms
    of keeping the manifest succinct; in contrast, note that using JSON
    doubles every backslash.
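
A minimal sketch of the quoting rule proposed above, written in Python purely for illustration. The function names are hypothetical. One assumption goes beyond the proposal as stated: the encoder also quotes any field that begins with a double quote, since otherwise a parser could not tell such a field from a quoted one.

```python
SPECIAL = ("\t", "\r", "\n")


def encode_field(value):
    """Quote a field only when the proposed rule (or the added
    leading-quote assumption) requires it."""
    if value.startswith('"') or any(ch in value for ch in SPECIAL):
        return '"' + value.replace('"', '""') + '"'
    return value


def encode_line(fields):
    """Join fields with tabs and terminate the record with a newline."""
    return "\t".join(encode_field(f) for f in fields) + "\n"


def parse_records(text):
    """Yield one list of fields per record, scanning left to right."""
    pos, n = 0, len(text)
    while pos < n:
        fields = []
        while True:
            if pos < n and text[pos] == '"':
                # Quoted field: read up to the first unpaired quote.
                pos += 1
                chars = []
                while True:
                    if pos >= n:
                        raise ValueError("unterminated quoted field")
                    if text[pos] != '"':
                        chars.append(text[pos])
                        pos += 1
                    elif text[pos + 1:pos + 2] == '"':
                        chars.append('"')  # doubled quote is a literal quote
                        pos += 2
                    else:
                        pos += 1
                        break
                fields.append("".join(chars))
            else:
                # Unquoted field: runs to the next tab, newline, or EOF.
                end = pos
                while end < n and text[end] not in "\t\n":
                    end += 1
                fields.append(text[pos:end])
                pos = end
            if pos < n and text[pos] == "\t":
                pos += 1
                continue
            if pos < n and text[pos] != "\n":
                raise ValueError("garbage after closing quote")
            pos += 1
            break
        yield fields
```

Because the parser is a generator that consumes one record at a time, it never needs the whole manifest as a parsed structure, and ordinary file names pass through without any quoting.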
    
    I hear you saying that this is going to end up being just as complex
    in the end, but I don't think I believe it.  It sounds to me like the
    difference between spending a couple of hours figuring this out and
    spending a couple of months trying to figure it out and maybe not
    actually getting anywhere.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  10. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-09-20T14:40:06Z

    On Thu, Sep 19, 2019 at 11:06 PM David Steele <david@pgmasters.net> wrote:
    > > I don't think it's a good idea to duplicate the information that's
    > > already in the backup_label. Storing two copies of the same
    > > information is just an invitation to having to worry about what
    > > happens if they don't agree.
    >
    > OK, but now we have backup_label, tablespace_map,
    > XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.backup (in the WAL) and now perhaps a
    > backup.manifest file.  I feel like we may be drowning in backup info files.
    
    I agree!
    
    I'm not sure what to do about it, though.  The information that is
    present in the tablespace_map file could have been stored in the
    backup_label file, I think, and that would have made sense, because
    both files are serving a very similar purpose: they tell the server
    that it needs to do some non-standard stuff when it starts up, and
    they give it instructions for what those things are. And, as a
    secondary purpose, humans or third-party tools can read them and use
    that information for whatever purpose they wish.
    
    The proposed backup_manifest file is a little different. I don't think
    that anyone is proposing that the server should read that file: it is
    there solely for the purpose of helping our own tools or third-party
    tools or human beings who are, uh, acting like tools.[1] We're also
    proposing to put it in a different place: the backup_label goes into
    one of the tar files, but the backup_manifest would sit outside of any
    tar file.
    
    If we were designing this from scratch, maybe we'd roll all of this
    into one file that serves as backup manifest, tablespace map, backup
    label, and backup history file, but then again, maybe separating the
    instructions-to-the-server part from the backup-integrity-checking
    part makes sense.  At any rate, even if we knew for sure that's the
    direction we wanted to go, getting there from here looks a bit rough.
    If we just add a backup manifest, people who don't care can mostly
    ignore it and then should be mostly fine. If we start trying to create
    the one backup information system to rule them all, we're going to
    break people's tools. Maybe that's worth doing someday, but the paint
    isn't even dry on removing recovery.conf yet.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    [1] There are a surprising number of installations where, in effect,
    the DBA is the backup-and-restore tool, performing all the steps by
    hand and hoping not to mess any of them up. The fact that nearly every
    PostgreSQL company offers tools to make this easier does not seem to
    have done a whole lot to diminish the number of people using ad-hoc
    solutions.
    
    
    
    
  11. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-09-20T14:59:55Z

    On Fri, Sep 20, 2019 at 9:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > - appendStringInfo et al. I don't think it would be that hard to move
    > this to src/common, but I'm also not sure it really solves the
    > problem, because StringInfo has a 1GB limit, and there's no rule at
    > all that a backup manifest has got to be less than 1GB.
    
    Hmm.  That's actually going to be a problem on the server side, no
    matter what we do on the client side.  We have to send the manifest
    after we send everything else, so that we know what we sent. But if we
    sent a lot of files, the manifest might be really huge. I had been
    thinking that we would generate the manifest on the server and send it
    to the client after everything else, but maybe this is an argument for
    generating the manifest on the client side and writing it
    incrementally. That would require the client to peek at the contents
    of every tar file it receives all the time, which it currently doesn't
    need to do, but it does peek inside them a little bit, so maybe it's
    OK.
    
    Another alternative would be to have the server spill the manifest in
    progress to a temp file and then stream it from there to the client.
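
The spill-to-a-temp-file alternative can be sketched as below. This is an illustration in Python rather than server code, and the class name `ManifestSpool` and its methods are hypothetical: manifest lines are appended to a temporary file while the backup runs, so memory use stays flat, and the file is replayed in fixed-size chunks once everything else has been sent.

```python
import tempfile


class ManifestSpool:
    """Hypothetical helper: collect manifest lines on disk, then replay them."""

    def __init__(self):
        # Anonymous temporary file, removed automatically when closed.
        self._tmp = tempfile.TemporaryFile()

    def add_line(self, line: bytes) -> None:
        """Append one already-formatted manifest line."""
        self._tmp.write(line)

    def stream(self, chunk_size: int = 65536):
        """Yield the spooled manifest in chunks of at most chunk_size bytes."""
        self._tmp.seek(0)
        while chunk := self._tmp.read(chunk_size):
            yield chunk

    def close(self) -> None:
        self._tmp.close()
```

Neither side ever holds the full manifest in one buffer, which sidesteps the 1GB StringInfo limit at the cost of some extra disk I/O on the server.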
    
    Thoughts?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  12. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-09-20T15:09:34Z

    On 9/20/19 9:46 AM, Robert Haas wrote:
    > On Thu, Sep 19, 2019 at 11:06 PM David Steele <david@pgmasters.net> wrote:
    >
    >> My experience is that JSON is simple to implement and has already dealt
    >> with escaping and data structure considerations.  A home-grown solution
    >> will be at least as complex but have the disadvantage of being non-standard.
    >
    > I think that's fair and just spent a little while investigating how
    > difficult it would be to disentangle the JSON parser from the backend.
    > It has dependencies on the following bits of backend-only
    > functionality:
    
    > - elog() / ereport(). Kind of a pain. We could just kill the program
    > if an error occurs, but that seems a bit ham-fisted. Refactoring the
    > code so that the error is returned rather than thrown might be the way
    > to go, but it's not simple, because you're not just passing a string.
    
    Seems to me we are overdue for elog()/ereport() compatible
    error-handling in the front end.  Plus mem contexts.
    
    It sucks to make that a prereq for this project but the longer we kick
    that can down the road...
    
    > https://www.pgcon.org/2013/schedule/events/595.en.html
    
    This talk was good fun.  The largest number of tables we've seen is a
    few hundred thousand, but that still adds up to more than a million
    files to backup.
    
    > This gets at another problem that I just started to think about. If
    > the file is just a series of lines, you can parse it one line at a
    > time and do something with that line, then move on. If it's a JSON
    > blob, you have to parse the whole file and get a potentially giant
    > data structure back, and then operate on that data structure. At
    > least, I think you do. 
    
    JSON can definitely be parsed incrementally, but for practical reasons
    certain structures work better than others.
    
    > There's probably some way to create a callback
    > structure that lets you presuppose that the toplevel data structure is
    > an array (or object) and get back each element of that array (or
    > key/value pair) as it's parsed, but that sounds pretty annoying to get
    > working. 
    
    And that's how we do it.  It's annoying and yeah it's complicated but it
    is very fast and memory-efficient.
    
    > Or we could just decide that you have to have enough memory
    > to hold the parsed version of the entire manifest file in memory all
    > at once, and if you don't, maybe you should drop some tables or buy
    > more RAM. 
    
    I assume you meant "un-parsed" here?
    
    > That still leaves you with bypassing the 1GB size limit on
    > StringInfo, maybe by having a "huge" option, or perhaps by
    > memory-mapping the file and then making the StringInfo point directly
    > into the mapped region. Perhaps I'm overthinking this and maybe you
    > have a simpler idea in mind about how it can be made to work, but I
    > find all this complexity pretty unappealing.
    
    Our String object has the same 1GB limit.  Partly because it works and
    saves a bit of memory per object, but also because if we find ourselves
    exceeding that limit we know we've probably made a design error.
    
    Parsing in stream means that you only need to store the final in-memory
    representation of the manifest which can be much more compact.  Yeah,
    it's complicated, but the memory and time savings are worth it.
    
    Note that our Perl implementation took the naive approach and has worked
    pretty well for six years, but can choke on really large manifests with
    out of memory errors.  Overall, I'd say getting the format right is more
    important than having the perfect initial implementation.
    
    > Here's a competing proposal: let's decide that lines consist of
    > tab-separated fields. If a field contains a \t, \r, or \n, put a " at
    > the beginning, a " at the end, and double any " that appears in the
    > middle. This is easy to generate and easy to parse. It lets us
    > completely ignore encoding considerations. Incremental parsing is
    > straightforward. Quoting will rarely be needed because there's very
    > little reason to create a file inside a PostgreSQL data directory that
    > contains a tab or a newline, but if you do it'll still work.  The lack
    > of quoting is nice for humans reading the manifest, and nice in terms
    > of keeping the manifest succinct; in contrast, note that using JSON
    > doubles every backslash.
    
    There's other information you'll want to store that is not strictly file
    info so you need a way to denote that.  It gets complicated quickly.
    
    > I hear you saying that this is going to end up being just as complex
    > in the end, but I don't think I believe it.  It sounds to me like the
    > difference between spending a couple of hours figuring this out and
    > spending a couple of months trying to figure it out and maybe not
    > actually getting anywhere.
    
    Maybe the initial implementation will be easier but I am confident we'll
    pay for it down the road.  Also, don't we want users to be able to read
    this file?  Do we really want them to need to cook up a custom parser in
    Perl, Go, Python, etc.?
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  13. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-09-20T15:21:24Z

    On 9/20/19 10:59 AM, Robert Haas wrote:
    > On Fri, Sep 20, 2019 at 9:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >> - appendStringInfo et al. I don't think it would be that hard to move
    >> this to src/common, but I'm also not sure it really solves the
    >> problem, because StringInfo has a 1GB limit, and there's no rule at
    >> all that a backup manifest has got to be less than 1GB.
    > 
    > Hmm.  That's actually going to be a problem on the server side, no
    > matter what we do on the client side.  We have to send the manifest
    > after we send everything else, so that we know what we sent. But if we
    > sent a lot of files, the manifest might be really huge. I had been
    > thinking that we would generate the manifest on the server and send it
    > to the client after everything else, but maybe this is an argument for
    > generating the manifest on the client side and writing it
    > incrementally. That would require the client to peek at the contents
    > of every tar file it receives all the time, which it currently doesn't
    > need to do, but it does peek inside them a little bit, so maybe it's
    > OK.
    > 
    > Another alternative would be to have the server spill the manifest in
    > progress to a temp file and then stream it from there to the client.
    
    This seems reasonable to me.
    
    We keep an in-memory representation which is just an array of structs
    and is fairly compact -- 1 million files uses ~150MB of memory.  We just
    format and stream this to storage when saving.  Saving is easier than
    loading, of course.
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  14. Re: backup manifests

    Chapman Flack <chap@anastigmatix.net> — 2019-09-20T15:24:39Z

    On 9/20/19 9:46 AM, Robert Haas wrote:
    
    > least, I think you do. There's probably some way to create a callback
    > structure that lets you presuppose that the toplevel data structure is
    > an array (or object) and get back each element of that array (or
    > key/value pair) as it's parsed,
    
    If a JSON parser does find its way into src/common, it probably wants
    to have such an incremental mode available, similar to [2] offered
    in the "Jackson" library for Java.
    
    The Jackson developer has propounded a thesis[1] that such a parsing
    library ought to offer "Three -- and Only Three" different styles of
    API corresponding to three ways of organizing the code using the
    library ([2], [3], [4], which also resemble the different APIs
    supplied in Java for XML processing).
    
    Regards,
    -Chap
    
    
    [1] http://www.cowtowncoder.com/blog/archives/2009/01/entry_132.html
    [2] http://www.cowtowncoder.com/blog/archives/2009/01/entry_137.html
    [3] http://www.cowtowncoder.com/blog/archives/2009/01/entry_153.html
    [4] http://www.cowtowncoder.com/blog/archives/2009/01/entry_152.html
    
    
    
    
  15. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-09-20T18:55:15Z

    On Fri, Sep 20, 2019 at 11:09 AM David Steele <david@pgmasters.net> wrote:
    > Seems to me we are overdue for elog()/ereport() compatible
    > error-handling in the front end.  Plus mem contexts.
    >
    > It sucks to make that a prereq for this project but the longer we kick
    > that can down the road...
    
    There are no doubt many patches that would benefit from having more
    backend infrastructure exposed in frontend contexts, and I think we're
    slowly moving in that direction, but I generally do not believe in
    burdening feature patches with major infrastructure improvements.
    Sometimes it's necessary, as in the case of parallel query, which
    required upgrading a whole lot of backend infrastructure in order to
    have any chance of doing something useful. In most cases, however,
    there's a way of getting the patch done that dodges the problem.
    
    For example, I think there's a pretty good argument that Heikki's
    design for relation forks was a bad one. It's proven to scale poorly
    and create performance problems and extra complexity in quite a few
    places. It would likely have been better, from a strictly theoretical
    point of view, to insist on a design where the FSM and VM pages got
    stored inside the relation itself, and the heap was responsible for
    figuring out how various pages were being used. When BRIN came along,
    we insisted on precisely that design, because it was clear that
    further straining the relation fork system was not a good plan.
    However, if we'd insisted on that when Heikki did the original work,
    it might have delayed the arrival of the free space map for one or
    more releases, and we got big benefits out of having that done sooner.
    There's nothing stopping someone from writing a patch to get rid of
    relation forks and allow a heap AM to have multiple relfilenodes (with
    the extra ones used for the FSM and VM) or with multiplexing all the
    data inside of a single file. Nobody has, though, because it's hard,
    and the problems with the status quo are not so bad as to justify the
    amount of development effort that would be required to fix it. At some
    point, that problem is probably going to work its way to the top of
    somebody's priority list, but it's already been about 10 years since
    that all happened and everyone has so far dodged dealing with the
    problem, which in turn has enabled them to work on other things that
    are perhaps more important.
    
    I think the same principle applies here. It's reasonable to ask the
    author of a feature patch to fix issues that are closely related to
    the feature in question, or even problems that are not new but would
    be greatly exacerbated by the addition of the feature. It's not
    reasonable to stack up a list of infrastructure upgrades that somebody
    has to do as a condition of having a feature patch accepted that does
    not necessarily require those upgrades. I am not convinced that JSON
    is actually a better format for a backup manifest (more on that
    below), but even if I were, I believe that getting a backup manifest
    functionality into PostgreSQL 13, and perhaps incremental backup on
    top of that, is valuable enough to justify making some compromises to
    make that happen. And I don't mean "compromises" as in "let's commit
    something that doesn't work very well;" rather, I mean making design
    choices that are aimed at making the project something that is
    feasible and can be completed in reasonable time, rather than not.
    
    And saying, well, the backup manifest format *has* to be JSON because
    everything else suxxor is not that. We don't have a single other
    example of a file that we read and write in JSON format. Extension
    control files use a custom format. Backup labels and backup history
    files and timeline history files and tablespace map files use custom
    formats. postgresql.conf, pg_hba.conf, and pg_ident.conf use custom
    formats. postmaster.opts and postmaster.pid use custom formats. If
    JSON is better and easier, at least one of the various people who
    coded those things up would have chosen to use it, but none of them
    did, and nobody's made a serious attempt to convert them to use it.
    That might be because we lack the infrastructure for dealing with JSON
    and building it is more work than anybody's willing to do, or it might
    be because JSON is not actually better for these kinds of use cases,
    but either way, it's hard to see why this particular patch should be
    burdened with a requirement that none of the previous ones had to
    satisfy.
    
    Personally, I'd be intensely unhappy if a motion to convert
    postgresql.conf or pg_hba.conf to JSON format gathered enough steam to
    be adopted.  It would be darn useful, because you could specify
    complex values for options instead of being limited to scalars, but it
    would also make the configuration files a lot harder for human beings
    to read and grep and the quality of error reporting would probably
    decline significantly.  Also, appending a setting to the file,
    something which is currently quite simple, would get a lot harder.
    Ad-hoc file formats can be problematic, but they can also have real
    advantages in terms of readability, brevity, and fitness for purpose.
    
    > This talk was good fun.  The largest number of tables we've seen is a
    > few hundred thousand, but that still adds up to more than a million
    > files to backup.
    
    A quick survey of some of my colleagues turned up a few examples of
    people with 2-4 million files to backup, so similar kind of ballpark.
    Probably not big enough for the manifest to hit the 1GB mark, but
    getting close.
    
    > > Or we could just decide that you have to have enough memory
    > > to hold the parsed version of the entire manifest file in memory all
    > > at once, and if you don't, maybe you should drop some tables or buy
    > > more RAM.
    >
    > I assume you meant "un-parsed" here?
    
    I don't think I meant that, although it seems like you might need to
    store either all the parsed data or all the unparsed data or even
    both, depending on exactly what you are trying to do.
    
    > > I hear you saying that this is going to end up being just as complex
    > > in the end, but I don't think I believe it.  It sounds to me like the
    > > difference between spending a couple of hours figuring this out and
    > > spending a couple of months trying to figure it out and maybe not
    > > actually getting anywhere.
    >
    > Maybe the initial implementation will be easier but I am confident we'll
    > pay for it down the road.  Also, don't we want users to be able to read
    > this file?  Do we really want them to need to cook up a custom parser in
    > Perl, Go, Python, etc.?
    
    Well, I haven't heard anybody complain that they can't read a
    backup_label file because it's too hard to cook up a parser.  And I
    think the reason is pretty clear: such files are not hard to parse.
    Similarly for a pg_hba.conf file.  This case is a little more
    complicated than those, but AFAICS, not enormously so. Actually, it
    seems like a combination of those two cases: it has some fixed
    metadata fields that can be represented with one line per field, like
    a backup_label, and then a bunch of entries for files that are
    somewhat like entries in a pg_hba.conf file, in that they can be
    represented by a line per record with a certain number of fields on
    each line.
    
    I attach here a couple of patches.  The first one does some
    refactoring of relevant code in pg_basebackup, and the second one adds
    checksum manifests using a format that I pulled out of my ear. It
    probably needs some adjustment but I don't think it's crazy.  Each
    file gets a line that looks like this:
    
    File $FILENAME $FILESIZE $FILEMTIME $FILECHECKSUM
    
    Right now, the file checksums are computed using SHA-256 but it could
    be changed to anything else for which we've got code. On my system,
    shasum -a256 $FILE produces the same answer that shows up here.  At
    the bottom of the manifest there's a checksum of the manifest itself,
    which looks like this:
    
    Manifest-Checksum 385fe156a8c6306db40937d59f46027cc079350ecf5221027d71367675c5f781
    
    That's a SHA-256 checksum of the file contents excluding the final
    line. It can be verified by feeding all the file contents except the
    last line to shasum -a256. I can't help but observe that if the file
    were defined to be a JSONB blob, it's not very clear how you would
    include a checksum of the blob contents in the blob itself, but with a
    format based on a bunch of lines of data, it's super-easy to generate
    and super-easy to write tools that verify it.
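
The verification described above might look roughly like this in Python. It is a sketch under stated assumptions, not the (as yet unwritten) verification tool: it assumes the final line is a single line of the form `Manifest-Checksum <hex digest>`, and that the checksum covers every byte before that line, including the newline that ends the preceding line. The function names are hypothetical.

```python
import hashlib

LABEL = "Manifest-Checksum"


def split_manifest(data: bytes):
    """Return (body, final_line) where body is everything before the last line."""
    end = len(data) - 1 if data.endswith(b"\n") else len(data)
    cut = data.rfind(b"\n", 0, end) + 1  # offset where the final line starts
    return data[:cut], data[cut:end]


def append_checksum(body: bytes) -> bytes:
    """Append the checksum line to a manifest body that ends in a newline."""
    digest = hashlib.sha256(body).hexdigest()
    return body + f"{LABEL} {digest}\n".encode("ascii")


def verify_manifest(data: bytes) -> bool:
    """Recompute the SHA-256 of the body and compare it with the final line."""
    body, last = split_manifest(data)
    label, _, recorded = last.decode("ascii").partition(" ")
    if label != LABEL:
        raise ValueError("final line is not a Manifest-Checksum line")
    return hashlib.sha256(body).hexdigest() == recorded.strip().lower()
```

The same check can be done from a shell by piping everything except the last line into shasum -a256 and comparing the result by eye.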
    
    This is just a prototype so I haven't written a verification tool, and
    there's a bunch of testing and documentation and so forth that would
    need to be done aside from whatever we've got to hammer out in terms
    of design issues and file formats.  But I think it's cool, and perhaps
    some discussion of how it could be evolved will get us closer to a
    resolution everybody can at least live with.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  16. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-09-20T23:11:47Z

    On 9/20/19 2:55 PM, Robert Haas wrote:
    > On Fri, Sep 20, 2019 at 11:09 AM David Steele <david@pgmasters.net> wrote:
    >>
    >> It sucks to make that a prereq for this project but the longer we kick
    >> that can down the road...
    > 
    > There are no doubt many patches that would benefit from having more
    > backend infrastructure exposed in frontend contexts, and I think we're
    > slowly moving in that direction, but I generally do not believe in
    > burdening feature patches with major infrastructure improvements.
    
    The hardest part about technical debt is knowing when to incur it.  It
    is never a cut-and-dried choice.
    
    >> This talk was good fun.  The largest number of tables we've seen is a
    >> few hundred thousand, but that still adds up to more than a million
    >> files to backup.
    > 
    > A quick survey of some of my colleagues turned up a few examples of
    > people with 2-4 million files to backup, so similar kind of ballpark.
    > Probably not big enough for the manifest to hit the 1GB mark, but
    > getting close.
    
    I have so many doubts about clusters with this many tables, but we do
    support it, so...
    
    >>> I hear you saying that this is going to end up being just as complex
    >>> in the end, but I don't think I believe it.  It sounds to me like the
    >>> difference between spending a couple of hours figuring this out and
    >>> spending a couple of months trying to figure it out and maybe not
    >>> actually getting anywhere.
    >>
    >> Maybe the initial implementation will be easier but I am confident we'll
    >> pay for it down the road.  Also, don't we want users to be able to read
    >> this file?  Do we really want them to need to cook up a custom parser in
    >> Perl, Go, Python, etc.?
    > 
    > Well, I haven't heard anybody complain that they can't read a
    > backup_label file because it's too hard to cook up a parser.  And I
    > think the reason is pretty clear: such files are not hard to parse.
    > Similarly for a pg_hba.conf file.  This case is a little more
    > complicated than those, but AFAICS, not enormously so. Actually, it
    > seems like a combination of those two cases: it has some fixed
    > metadata fields that can be represented with one line per field, like
    > a backup_label, and then a bunch of entries for files that are
    > somewhat like entries in a pg_hba.conf file, in that they can be
    > represented by a line per record with a certain number of fields on
    > each line.
    
    Yeah, they are not hard to parse, but *everyone* has to cook up code for
    it.  A bit of a bummer, that.
    
    > I attach here a couple of patches.  The first one does some
    > refactoring of relevant code in pg_basebackup, and the second one adds
    > checksum manifests using a format that I pulled out of my ear. It
    > probably needs some adjustment but I don't think it's crazy.  Each
    > file gets a line that looks like this:
    > 
    > File $FILENAME $FILESIZE $FILEMTIME $FILECHECKSUM
    
    We also include page checksum validation failures in the file record.
    Not critical for the first pass, perhaps, but something to keep in mind.
    
    > Right now, the file checksums are computed using SHA-256 but it could
    > be changed to anything else for which we've got code. On my system,
    > shasum -a256 $FILE produces the same answer that shows up here.  At
    > the bottom of the manifest there's a checksum of the manifest itself,
    > which looks like this:
    > 
    > Manifest-Checksum 385fe156a8c6306db40937d59f46027cc079350ecf5221027d71367675c5f781
    > 
    > That's a SHA-256 checksum of the file contents excluding the final
    > line. It can be verified by feeding all the file contents except the
    > last line to shasum -a256. I can't help but observe that if the file
    > were defined to be a JSONB blob, it's not very clear how you would
    > include a checksum of the blob contents in the blob itself, but with a
    > format based on a bunch of lines of data, it's super-easy to generate
    > and super-easy to write tools that verify it.
    
    You can do this in JSON pretty easily by handling the terminating
    brace/bracket:
    
    {
    <some json contents>*,
    "checksum":<sha256>
    }
    
    But of course a linefeed-delimited file is even easier.
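
    To make the "everything except the final line" rule concrete, here is a
    small sketch that only finds the byte range a verifier would feed to the
    hash function. The function name is hypothetical, and the hashing itself
    is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the number of leading bytes of buf covered by the manifest
 * checksum: everything up to and including the newline that precedes
 * the final line.  Returns 0 if the buffer holds only one line.
 */
static size_t
manifest_checksummed_length(const char *buf, size_t len)
{
    size_t      end = len;

    assert(buf != NULL);

    /* Ignore the newline that terminates the final line, if present. */
    if (end > 0 && buf[end - 1] == '\n')
        end--;

    /* Walk back to the newline that ends the second-to-last line. */
    while (end > 0 && buf[end - 1] != '\n')
        end--;

    return end;
}
```

    The bytes buf[0 .. result) are what would be hashed; the remainder is the
    Manifest-Checksum line itself.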
    
    > This is just a prototype so I haven't written a verification tool, and
    > there's a bunch of testing and documentation and so forth that would
    > need to be done aside from whatever we've got to hammer out in terms
    > of design issues and file formats.  But I think it's cool, and perhaps
    > some discussion of how it could be evolved will get us closer to a
    > resolution everybody can at least live with.
    
    I had a quick look and it seems pretty reasonable.  I'll need to
    generate a manifest to see if I can spot any obvious gotchas.
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  17. Re: backup manifests

    vignesh C <vignesh21@gmail.com> — 2019-09-25T12:46:54Z

    On Sat, Sep 21, 2019 at 12:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    Some comments:
    
    The manifest file is written as plain text even when compression is
    specified; should we compress it?
    Maybe this is intended; I am raising it only to make sure that it is.
    +static void
    +ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
    +{
    + WriteManifestState *state = callback_data;
    +
    + if (fwrite(copybuf, r, 1, state->file) != 1)
    + {
    + pg_log_error("could not write to file \"%s\": %m", state->filename);
    + exit(1);
    + }
    +}
    
    The WALfile.done file gets added, but WAL file information is not included
    in the manifest file; should we include the WAL files as well?
    @@ -599,16 +618,20 @@ perform_base_backup(basebackup_options *opt)
      (errcode_for_file_access(),
      errmsg("could not stat file \"%s\": %m", pathbuf)));
    
    - sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
    + sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid, manifest,
    + NULL);
    
      /* unconditionally mark file as archived */
      StatusFilePath(pathbuf, fname, ".done");
    - sendFileWithContent(pathbuf, "");
    + sendFileWithContent(pathbuf, "", manifest);
    
    Should we add an option to make manifest generation configurable to
    reduce overhead during backup?
    
    Manifest file does not include directory information, should we include it?
    
    There is one warning:
    In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
                     from pg_basebackup.c:34:
    pg_basebackup.c: In function ‘ReceiveTarFile’:
    ../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
    comparison will always evaluate as ‘false’ for the address of ‘buf’
    will never be NULL [-Waddress]
      ((str) == NULL || (str)->maxlen == 0)
             ^
    pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
       if (PQExpBufferBroken(&buf))
    
    pg_gmtime can fail in case of malloc failure:
    + /*
    + * Convert time to a string. Since it's not clear what time zone to use
    + * and since time zone definitions can change, possibly causing confusion,
    + * use GMT always.
    + */
    + pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
    + pg_gmtime(&mtime));
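
    A defensive version of that conversion might look like the sketch below.
    It uses the standard C gmtime() and strftime() rather than the pg_
    variants, so it only illustrates checking the return value before use;
    the function name is hypothetical, and a literal "GMT" stands in for %Z
    to keep the output the same on every platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * Format mtime as "YYYY-MM-DD HH:MM:SS GMT" into buf.
 * Returns false if the time cannot be converted or does not fit.
 */
static bool
format_mtime_gmt(time_t mtime, char *buf, size_t buflen)
{
    struct tm  *tm;

    assert(buf != NULL && buflen > 0);

    /* gmtime() may return NULL, so check before passing it on. */
    tm = gmtime(&mtime);
    if (tm == NULL)
    {
        buf[0] = '\0';
        return false;
    }

    /* strftime() returns 0 when the result does not fit in the buffer. */
    return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S GMT", tm) != 0;
}
```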
    
    Regards,
    Vignesh
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  18. Re: backup manifests

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-09-30T09:31:25Z

    No entry for a directory is added to the manifest, so it might be difficult
    for the client to learn about the directories. Would it be good to add
    an entry for each directory too? Maybe like:
    Dir    <dirname> <mtime>
    
    Also, the patches do not apply on the latest HEAD.
    
    On Wed, Sep 25, 2019 at 6:17 PM vignesh C <vignesh21@gmail.com> wrote:
    
    > On Sat, Sep 21, 2019 at 12:25 AM Robert Haas <robertmhaas@gmail.com>
    > wrote:
    > >
    > Some comments:
    >
    > Manifest file will be in plain text format even if compression is
    > specified, should we compress it?
    > May be this is intended, just raised the point to make sure that it is
    > intended.
    > +static void
    > +ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
    > +{
    > + WriteManifestState *state = callback_data;
    > +
    > + if (fwrite(copybuf, r, 1, state->file) != 1)
    > + {
    > + pg_log_error("could not write to file \"%s\": %m", state->filename);
    > + exit(1);
    > + }
    > +}
    >
    > WALfile.done file gets added but wal file information is not included
    > in the manifest file, should we include WAL file also?
    > @@ -599,16 +618,20 @@ perform_base_backup(basebackup_options *opt)
    >   (errcode_for_file_access(),
    >   errmsg("could not stat file \"%s\": %m", pathbuf)));
    >
    > - sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
    > + sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid, manifest,
    > + NULL);
    >
    >   /* unconditionally mark file as archived */
    >   StatusFilePath(pathbuf, fname, ".done");
    > - sendFileWithContent(pathbuf, "");
    > + sendFileWithContent(pathbuf, "", manifest);
    >
    > Should we add an option to make manifest generation configurable to
    > reduce overhead during backup?
    >
    > Manifest file does not include directory information, should we include it?
    >
    > There is one warning:
    > In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
    >                  from pg_basebackup.c:34:
    > pg_basebackup.c: In function ‘ReceiveTarFile’:
    > ../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
    > comparison will always evaluate as ‘false’ for the address of ‘buf’
    > will never be NULL [-Waddress]
    >   ((str) == NULL || (str)->maxlen == 0)
    >          ^
    > pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
    >    if (PQExpBufferBroken(&buf))
    >
    >
    Yes, I too observed this warning.
    
    
    > pg_gmtime can fail in case of malloc failure:
    > + /*
    > + * Convert time to a string. Since it's not clear what time zone to use
    > + * and since time zone definitions can change, possibly causing confusion,
    > + * use GMT always.
    > + */
    > + pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
    > + pg_gmtime(&mtime));
    >
    > Regards,
    > Vignesh
    > EnterpriseDB: http://www.enterprisedb.com
    >
    >
    >
    
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  19. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-09-30T10:07:09Z

    On Wed, Sep 25, 2019 at 6:17 PM vignesh C <vignesh21@gmail.com> wrote:
    
    > On Sat, Sep 21, 2019 at 12:25 AM Robert Haas <robertmhaas@gmail.com>
    > wrote:
    > >
    > Some comments:
    >
    > Manifest file will be in plain text format even if compression is
    > specified, should we compress it?
    > May be this is intended, just raised the point to make sure that it is
    > intended.
    > +static void
    > +ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
    > +{
    > + WriteManifestState *state = callback_data;
    > +
    > + if (fwrite(copybuf, r, 1, state->file) != 1)
    > + {
    > + pg_log_error("could not write to file \"%s\": %m", state->filename);
    > + exit(1);
    > + }
    > +}
    >
    > WALfile.done file gets added but wal file information is not included
    > in the manifest file, should we include WAL file also?
    > @@ -599,16 +618,20 @@ perform_base_backup(basebackup_options *opt)
    >   (errcode_for_file_access(),
    >   errmsg("could not stat file \"%s\": %m", pathbuf)));
    >
    > - sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
    > + sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid, manifest,
    > + NULL);
    >
    >   /* unconditionally mark file as archived */
    >   StatusFilePath(pathbuf, fname, ".done");
    > - sendFileWithContent(pathbuf, "");
    > + sendFileWithContent(pathbuf, "", manifest);
    >
    > Should we add an option to make manifest generation configurable to
    > reduce overhead during backup?
    >
    > Manifest file does not include directory information, should we include it?
    >
    > There is one warning:
    > In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
    >                  from pg_basebackup.c:34:
    > pg_basebackup.c: In function ‘ReceiveTarFile’:
    > ../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
    > comparison will always evaluate as ‘false’ for the address of ‘buf’
    > will never be NULL [-Waddress]
    >   ((str) == NULL || (str)->maxlen == 0)
    >          ^
    > pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
    >    if (PQExpBufferBroken(&buf))
    >
    >
    I also observed this warning.  Please find attached a patch that fixes it.
    
    > pg_gmtime can fail in case of malloc failure:
    > + /*
    > + * Convert time to a string. Since it's not clear what time zone to use
    > + * and since time zone definitions can change, possibly causing confusion,
    > + * use GMT always.
    > + */
    > + pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
    > + pg_gmtime(&mtime));
    >
    >
    Fixed that in the attached patch.
    
    
    
    
    Regards.
    Rushabh Lathia
    www.EnterpriseDB.com
    
  20. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-10-01T12:13:05Z

    On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
    <jeevan.chalke@enterprisedb.com> wrote:
    > Entry for directory is not added in manifest. So it might be difficult
    > at client to get to know about the directories. Will it be good to add
    > an entry for each directory too? May be like:
    > Dir    <dirname> <mtime>
    
    Well, what kind of corruption would this allow us to detect that we
    can't detect as things stand? I think the only case is an empty
    directory. If it's not empty, we'd have some entries for the files in
    that directory, and those files won't be able to exist unless the
    directory does. But, how would we end up backing up an empty
    directory, anyway?
    
    I don't really *mind* adding directories into the manifest, but I'm
    not sure how much it helps.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  21. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-11-19T10:00:17Z

    My colleague Suraj did some testing and noticed a performance impact
    from the checksums.  On further testing, he found that the impact is
    largest with SHA.
    
    Please find below statistics:
    
    Tables (size each)  Checksum  real        user       sys        Overhead
    10 (100 MB)         none      0m10.957s   0m0.367s   0m2.275s   -
    10 (100 MB)         SHA-256   0m16.816s   0m0.210s   0m2.067s   53%
    10 (100 MB)         md5       0m11.895s   0m0.174s   0m1.725s   8%
    10 (100 MB)         CRC       0m11.136s   0m0.365s   0m2.298s   2%
    20 (100 MB)         none      0m20.610s   0m0.484s   0m3.198s   -
    20 (100 MB)         SHA-256   0m31.745s   0m0.569s   0m4.089s   54%
    20 (100 MB)         md5       0m22.717s   0m0.638s   0m4.026s   10%
    20 (100 MB)         CRC       0m21.075s   0m0.538s   0m3.417s   2%
    50 (100 MB)         none      0m49.143s   0m1.646s   0m8.499s   -
    50 (100 MB)         SHA-256   1m13.683s   0m1.305s   0m10.541s  50%
    50 (100 MB)         md5       0m51.856s   0m0.932s   0m7.702s   6%
    50 (100 MB)         CRC       0m49.689s   0m1.028s   0m6.921s   1%
    100 (100 MB)        none      1m34.308s   0m2.265s   0m14.717s  -
    100 (100 MB)        SHA-256   2m22.403s   0m2.613s   0m20.776s  51%
    100 (100 MB)        md5       1m41.524s   0m2.158s   0m15.949s  8%
    100 (100 MB)        CRC       1m35.045s   0m2.061s   0m16.308s  1%
    100 (1 GB)          none      17m18.336s  0m20.222s  3m12.960s  -
    100 (1 GB)          SHA-256   24m45.942s  0m26.911s  3m33.501s  43%
    100 (1 GB)          md5       17m41.670s  0m26.506s  3m18.402s  2%
    100 (1 GB)          CRC       17m22.296s  0m26.811s  3m56.653s  approx. 0.5% (*)
    
    Overhead is the % performance overhead relative to the run without checksum.
    (*) Sometimes this test completes within the same time as without checksum.
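
    The overhead percentages can be reproduced from the "real" times with
    simple arithmetic. The sketch below assumes the overhead is defined as
    (time with checksum - time without) / time without; the helper names are
    hypothetical.

```c
#include <assert.h>

/* Convert a "real" time such as 1m13.683s into seconds. */
static double
to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Relative slowdown of a checksummed run against the baseline, in percent. */
static double
overhead_percent(double baseline_secs, double with_checksum_secs)
{
    assert(baseline_secs > 0.0);
    return (with_checksum_secs - baseline_secs) / baseline_secs * 100.0;
}
```

    For the 10-table case, overhead_percent(to_seconds(0, 10.957),
    to_seconds(0, 16.816)) gives roughly 53.5, in line with the 53% reported
    for SHA-256.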
    
    
    Considering the above results, I modified Robert's earlier patch and
    added a "manifest_with_checksums" option to pg_basebackup.  With the new
    patch, checksums are disabled by default and are enabled only when the
    "manifest_with_checksums" option is provided.  I also rebased the whole
    patch set.
    
    
    
    Regards,
    
    -- 
    Rushabh Lathia
    www.EnterpriseDB.com
    
    On Tue, Oct 1, 2019 at 5:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
    > <jeevan.chalke@enterprisedb.com> wrote:
    > > Entry for directory is not added in manifest. So it might be difficult
    > > at client to get to know about the directories. Will it be good to add
    > > an entry for each directory too? May be like:
    > > Dir    <dirname> <mtime>
    >
    > Well, what kind of corruption would this allow us to detect that we
    > can't detect as things stand? I think the only case is an empty
    > directory. If it's not empty, we'd have some entries for the files in
    > that directory, and those files won't be able to exist unless the
    > directory does. But, how would we end up backing up an empty
    > directory, anyway?
    >
    > I don't really *mind* adding directories into the manifest, but I'm
    > not sure how much it helps.
    >
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    >
    >
    
    -- 
    Rushabh Lathia
    
  22. Re: backup manifests

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2019-11-19T13:49:24Z

    On 11/19/19 5:00 AM, Rushabh Lathia wrote:
    >
    >
    > My colleague Suraj did testing and noticed the performance impact
    > with the checksums.   On further testing, he found that specifically with
    > sha its more of performance impact.  
    >
    >
    
    I admit I haven't been following along closely, but why do we need a
    cryptographic checksum here instead of, say, a CRC? Do we think that
    somehow the checksum might be forged? Use of cryptographic hashes as
    general purpose checksums has become far too common IMNSHO.
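
    For comparison, a non-cryptographic checksum of the kind suggested here
    is only a few lines of code. The following is a plain bitwise CRC-32C
    (Castagnoli polynomial) sketch for illustration; it is not PostgreSQL's
    implementation, which uses lookup tables or hardware instructions where
    available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C over a buffer, using the reflected Castagnoli
 * polynomial 0x82F63B78 with the usual initial value and final XOR.
 */
static uint32_t
crc32c(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
        {
            /* Shift out one bit; XOR in the polynomial if that bit was set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

    A CRC like this detects accidental corruption well but offers no
    protection against deliberate forgery, which is the trade-off being
    questioned.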
    
    
    cheers
    
    
    andrew
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
    
  23. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-11-19T21:34:16Z

    On 11/19/19 5:00 AM, Rushabh Lathia wrote:
    > 
    > My colleague Suraj did testing and noticed the performance impact
    > with the checksums.   On further testing, he found that specifically with
    > sha its more of performance impact.  
    
    We have found that SHA1 adds about 3% overhead when the backup is also
    compressed (gzip -6), which is what most people want to do.  This
    percentage goes down even more if the backup is being transferred over a
    network or to an object store such as S3.
    
    We judged that the lower collision rate of SHA1 justified the additional
    expense.
    
    That said, making SHA256 optional seems reasonable.  We decided not to
    make our SHA1 checksums optional to reduce the test matrix and because
    parallelism largely addressed performance concerns.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  24. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-11-20T05:28:18Z

    On Tue, Nov 19, 2019 at 7:19 PM Andrew Dunstan <
    andrew.dunstan@2ndquadrant.com> wrote:
    
    >
    > On 11/19/19 5:00 AM, Rushabh Lathia wrote:
    > >
    > >
    > > My colleague Suraj did testing and noticed the performance impact
    > > with the checksums.   On further testing, he found that specifically with
    > > sha its more of performance impact.
    > >
    > >
    >
    > I admit I haven't been following along closely, but why do we need a
    > cryptographic checksum here instead of, say, a CRC? Do we think that
    > somehow the checksum might be forged? Use of cryptographic hashes as
    > general purpose checksums has become far too common IMNSHO.
    >
    
    Yeah, maybe.  I was thinking of giving the user an option to choose the
    checksum algorithm (SHA256, CRC, MD5, etc.), so that they can pick what
    suits their environment.
    
    If we decide to do that, then we need to store the checksum algorithm
    information in the manifest file.
    
    Thoughts?
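
    One way to record the algorithm would be to prefix each checksum with its
    name. The "ALGORITHM:hex" layout, the enum, and the function below are
    purely hypothetical, sketched only to show what a reader of such a field
    would have to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum ManifestChecksumType
{
    MANIFEST_CHECKSUM_UNKNOWN,
    MANIFEST_CHECKSUM_CRC32C,
    MANIFEST_CHECKSUM_MD5,
    MANIFEST_CHECKSUM_SHA256
} ManifestChecksumType;

/*
 * Split a hypothetical "ALGORITHM:hexdigits" field.  On success, *hex
 * points at the hex digits inside field; on failure it is set to NULL
 * and MANIFEST_CHECKSUM_UNKNOWN is returned.
 */
static ManifestChecksumType
parse_checksum_field(const char *field, const char **hex)
{
    static const struct
    {
        const char *name;
        ManifestChecksumType type;
    }           known[] = {
        {"CRC32C", MANIFEST_CHECKSUM_CRC32C},
        {"MD5", MANIFEST_CHECKSUM_MD5},
        {"SHA256", MANIFEST_CHECKSUM_SHA256}
    };
    const char *colon;
    size_t      namelen;

    assert(field != NULL && hex != NULL);

    *hex = NULL;
    colon = strchr(field, ':');
    if (colon == NULL)
        return MANIFEST_CHECKSUM_UNKNOWN;
    namelen = (size_t) (colon - field);

    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
    {
        if (strlen(known[i].name) == namelen &&
            strncmp(field, known[i].name, namelen) == 0)
        {
            *hex = colon + 1;
            return known[i].type;
        }
    }
    return MANIFEST_CHECKSUM_UNKNOWN;
}
```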
    
    
    
    >
    > cheers
    >
    >
    > andrew
    >
    >
    > --
    > Andrew Dunstan                https://www.2ndQuadrant.com
    > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    >
    >
    
    -- 
    Rushabh Lathia
    
  25. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2019-11-20T05:35:11Z

    Hi,
    
    Since we are now generating a backup manifest file with each backup, it
    gives us a way to validate a given backup.
    Let's say we have taken a backup and, after a few days, we want to check
    whether that backup is valid and corruption-free without restarting the
    server.
    
    Please find attached a POC patch for this, based on the latest
    backup manifest patch from Rushabh. With this functionality, we add a new
    option to pg_basebackup, something like --verify-backup.
    So, the syntax would be:
    ./bin/pg_basebackup --verify-backup -D <backup_directory_path>
    
    Basically, we read the backup_manifest file line by line from the given
    directory path and build the hash table, then scan the directory and
    compare each file with the hash entry.
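
    The flow described above can be sketched as follows. This is not the POC
    patch: the type and function names are hypothetical, the table size is
    fixed, and computing the checksum of each on-disk file is left to the
    caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024           /* hypothetical size; a power of two */

/* One manifest entry, chained within its hash bucket. */
typedef struct ManifestEntry
{
    struct ManifestEntry *next;
    char       *filename;
    char       *checksum;       /* hex string taken from the manifest line */
    int         seen;           /* set once the file is found on disk */
} ManifestEntry;

typedef struct ManifestHash
{
    ManifestEntry *bucket[NBUCKETS];
} ManifestHash;

static unsigned
hash_filename(const char *s)
{
    unsigned    h = 5381;

    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h & (NBUCKETS - 1);
}

static char *
copy_string(const char *s)
{
    size_t      n = strlen(s) + 1;
    char       *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Step 1: call once per file line while reading the manifest. */
static int
manifest_add(ManifestHash *h, const char *filename, const char *checksum)
{
    ManifestEntry *e;
    unsigned    b;

    assert(h != NULL && filename != NULL && checksum != NULL);
    e = malloc(sizeof(ManifestEntry));
    if (e == NULL)
        return -1;
    e->filename = copy_string(filename);
    e->checksum = copy_string(checksum);
    if (e->filename == NULL || e->checksum == NULL)
    {
        free(e->filename);
        free(e->checksum);
        free(e);
        return -1;
    }
    e->seen = 0;
    b = hash_filename(filename);
    e->next = h->bucket[b];
    h->bucket[b] = e;
    return 0;
}

/*
 * Step 2: call once per file found while scanning the backup directory.
 * Returns 1 on match, 0 on checksum mismatch, -1 if not in the manifest.
 */
static int
manifest_check(ManifestHash *h, const char *filename, const char *actual)
{
    ManifestEntry *e;

    for (e = h->bucket[hash_filename(filename)]; e != NULL; e = e->next)
    {
        if (strcmp(e->filename, filename) == 0)
        {
            e->seen = 1;
            return strcmp(e->checksum, actual) == 0;
        }
    }
    return -1;
}

/* Step 3: entries never seen are files missing from the backup. */
static int
manifest_count_missing(const ManifestHash *h)
{
    int         missing = 0;

    for (int b = 0; b < NBUCKETS; b++)
        for (const ManifestEntry *e = h->bucket[b]; e != NULL; e = e->next)
            if (!e->seen)
                missing++;
    return missing;
}
```

    The table must start zero-initialized. The three result codes of
    manifest_check correspond to a valid file, a corrupted file, and an
    unexpected extra file; manifest_count_missing covers files that
    disappeared.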
    
    Thoughts/suggestions?
    
    On Tue, Nov 19, 2019 at 3:30 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    wrote:
    
    >
    >
    > My colleague Suraj did testing and noticed the performance impact
    > with the checksums.   On further testing, he found that specifically with
    > sha its more of performance impact.
    >
    > Please find below statistics:
    >
    > Tables (size each)  Checksum  real        user       sys        Overhead
    > 10 (100 MB)         none      0m10.957s   0m0.367s   0m2.275s   -
    > 10 (100 MB)         SHA-256   0m16.816s   0m0.210s   0m2.067s   53%
    > 10 (100 MB)         md5       0m11.895s   0m0.174s   0m1.725s   8%
    > 10 (100 MB)         CRC       0m11.136s   0m0.365s   0m2.298s   2%
    > 20 (100 MB)         none      0m20.610s   0m0.484s   0m3.198s   -
    > 20 (100 MB)         SHA-256   0m31.745s   0m0.569s   0m4.089s   54%
    > 20 (100 MB)         md5       0m22.717s   0m0.638s   0m4.026s   10%
    > 20 (100 MB)         CRC       0m21.075s   0m0.538s   0m3.417s   2%
    > 50 (100 MB)         none      0m49.143s   0m1.646s   0m8.499s   -
    > 50 (100 MB)         SHA-256   1m13.683s   0m1.305s   0m10.541s  50%
    > 50 (100 MB)         md5       0m51.856s   0m0.932s   0m7.702s   6%
    > 50 (100 MB)         CRC       0m49.689s   0m1.028s   0m6.921s   1%
    > 100 (100 MB)        none      1m34.308s   0m2.265s   0m14.717s  -
    > 100 (100 MB)        SHA-256   2m22.403s   0m2.613s   0m20.776s  51%
    > 100 (100 MB)        md5       1m41.524s   0m2.158s   0m15.949s  8%
    > 100 (100 MB)        CRC       1m35.045s   0m2.061s   0m16.308s  1%
    > 100 (1 GB)          none      17m18.336s  0m20.222s  3m12.960s  -
    > 100 (1 GB)          SHA-256   24m45.942s  0m26.911s  3m33.501s  43%
    > 100 (1 GB)          md5       17m41.670s  0m26.506s  3m18.402s  2%
    > 100 (1 GB)          CRC       17m22.296s  0m26.811s  3m56.653s  approx. 0.5% (*)
    >
    > Overhead is the % performance overhead relative to the run without checksum.
    > (*) Sometimes this test completes within the same time as without checksum.
    >
    >
    > Considering the above results, I modified the earlier Robert's patch and
    > added
    > "manifest_with_checksums" option to pg_basebackup.  With a new patch.
    > by default, checksums will be disabled and will be only enabled when
    > "manifest_with_checksums" option is provided.  Also re-based all patch set.
    >
    >
    >
    > Regards,
    >
    > --
    > Rushabh Lathia
    > www.EnterpriseDB.com
    >
    > On Tue, Oct 1, 2019 at 5:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    >> On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
    >> <jeevan.chalke@enterprisedb.com> wrote:
    >> > Entry for directory is not added in manifest. So it might be difficult
    >> > at client to get to know about the directories. Will it be good to add
    >> > an entry for each directory too? May be like:
    >> > Dir    <dirname> <mtime>
    >>
    >> Well, what kind of corruption would this allow us to detect that we
    >> can't detect as things stand? I think the only case is an empty
    >> directory. If it's not empty, we'd have some entries for the files in
    >> that directory, and those files won't be able to exist unless the
    >> directory does. But, how would we end up backing up an empty
    >> directory, anyway?
    >>
    >> I don't really *mind* adding directories into the manifest, but I'm
    >> not sure how much it helps.
    >>
    >> --
    >> Robert Haas
    >> EnterpriseDB: http://www.enterprisedb.com
    >> The Enterprise PostgreSQL Company
    >>
    >>
    >>
    >
    > --
    > Rushabh Lathia
    >
    
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  26. Re: backup manifests

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-11-21T09:03:05Z

    On Tue, Nov 19, 2019 at 3:30 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    wrote:
    
    >
    >
    > My colleague Suraj did testing and noticed the performance impact
    > with the checksums.   On further testing, he found that specifically with
    > sha its more of performance impact.
    >
    > Please find below statistics:
    >
    > Tables (size each)  Checksum  real        user       sys        Overhead
    > 10 (100 MB)         none      0m10.957s   0m0.367s   0m2.275s   -
    > 10 (100 MB)         SHA-256   0m16.816s   0m0.210s   0m2.067s   53%
    > 10 (100 MB)         md5       0m11.895s   0m0.174s   0m1.725s   8%
    > 10 (100 MB)         CRC       0m11.136s   0m0.365s   0m2.298s   2%
    > 20 (100 MB)         none      0m20.610s   0m0.484s   0m3.198s   -
    > 20 (100 MB)         SHA-256   0m31.745s   0m0.569s   0m4.089s   54%
    > 20 (100 MB)         md5       0m22.717s   0m0.638s   0m4.026s   10%
    > 20 (100 MB)         CRC       0m21.075s   0m0.538s   0m3.417s   2%
    > 50 (100 MB)         none      0m49.143s   0m1.646s   0m8.499s   -
    > 50 (100 MB)         SHA-256   1m13.683s   0m1.305s   0m10.541s  50%
    > 50 (100 MB)         md5       0m51.856s   0m0.932s   0m7.702s   6%
    > 50 (100 MB)         CRC       0m49.689s   0m1.028s   0m6.921s   1%
    > 100 (100 MB)        none      1m34.308s   0m2.265s   0m14.717s  -
    > 100 (100 MB)        SHA-256   2m22.403s   0m2.613s   0m20.776s  51%
    > 100 (100 MB)        md5       1m41.524s   0m2.158s   0m15.949s  8%
    > 100 (100 MB)        CRC       1m35.045s   0m2.061s   0m16.308s  1%
    > 100 (1 GB)          none      17m18.336s  0m20.222s  3m12.960s  -
    > 100 (1 GB)          SHA-256   24m45.942s  0m26.911s  3m33.501s  43%
    > 100 (1 GB)          md5       17m41.670s  0m26.506s  3m18.402s  2%
    > 100 (1 GB)          CRC       17m22.296s  0m26.811s  3m56.653s  approx. 0.5% (*)
    >
    > Overhead is the % performance overhead relative to the run without checksum.
    > (*) Sometimes this test completes within the same time as without checksum.
    >
    >
    > Considering the above results, I modified the earlier Robert's patch and
    > added
    > "manifest_with_checksums" option to pg_basebackup.  With a new patch.
    > by default, checksums will be disabled and will be only enabled when
    > "manifest_with_checksums" option is provided.  Also re-based all patch set.
    >
    
    Review comments on 0004:
    
    1.
    I don't think we need o_manifest_with_checksums variable,
    manifest_with_checksums can be used instead.
    
    2.
    We need to document this new option for pg_basebackup and basebackup.
    
    3.
    Also, instead of keeping manifest_with_checksums as a global variable, we
    should pass it to the functions that need it. Patch 0002 already modified the
    signature of all relevant functions anyway, so we just need to add one more
    bool parameter there.
    
    4.
    Why do we need "File" at the start of each entry when we are adding files only?
    I wonder if we also need to provide a tablespace name and a directory marker,
    so that we have "Tablespace" and "Dir" at the start.
    
    5.
    Even if I don't provide the manifest-with-checksums option, I see that a
    checksum is still calculated for the backup_manifest file itself. Is that
    intentional or an oversight?
    I think we should omit that too if this option is not provided.
    
    6.
    Is it possible to get only the backup manifest from the server? A client like
    pg_basebackup could then read it and use it to fetch the files.
    
    Thanks
    
    
    >
    >
    >
    > Regards,
    >
    > --
    > Rushabh Lathia
    > www.EnterpriseDB.com
    >
    > On Tue, Oct 1, 2019 at 5:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    >> On Mon, Sep 30, 2019 at 5:31 AM Jeevan Chalke
    >> <jeevan.chalke@enterprisedb.com> wrote:
    >> > Entry for directory is not added in manifest. So it might be difficult
    >> > at client to get to know about the directories. Will it be good to add
    >> > an entry for each directory too? May be like:
    >> > Dir    <dirname> <mtime>
    >>
    >> Well, what kind of corruption would this allow us to detect that we
    >> can't detect as things stand? I think the only case is an empty
    >> directory. If it's not empty, we'd have some entries for the files in
    >> that directory, and those files won't be able to exist unless the
    >> directory does. But, how would we end up backing up an empty
    >> directory, anyway?
    >>
    >> I don't really *mind* adding directories into the manifest, but I'm
    >> not sure how much it helps.
    >>
    >> --
    >> Robert Haas
    >> EnterpriseDB: http://www.enterprisedb.com
    >> The Enterprise PostgreSQL Company
    >>
    >>
    >>
    >
    > --
    > Rushabh Lathia
    >
    
    
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  27. Re: backup manifests

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-11-21T09:21:01Z

    On Wed, Nov 20, 2019 at 11:05 AM Suraj Kharage <
    suraj.kharage@enterprisedb.com> wrote:
    
    > Hi,
    >
    > Since now we are generating the backup manifest file with each backup, it
    > provides us an option to validate the given backup.
    > Let's say, we have taken a backup and after a few days, we want to check
    > whether that backup is validated or corruption-free without restarting the
    > server.
    >
    > Please find attached POC patch for same which will be based on the latest
    > backup manifest patch from Rushabh. With this functionality, we add new
    > option to pg_basebackup, something like --verify-backup.
    > So, the syntax would be:
    > ./bin/pg_basebackup --verify-backup -D <backup_directory_path>
    >
    > Basically, we read the backup_manifest file line by line from the given
    > directory path and build the hash table, then scan the directory and
    > compare each file with the hash entry.
    >
    > Thoughts/suggestions?
    >
    
    
    I like the idea of verifying the backup once we have a backup_manifest.
    Periodically verifying an already-taken backup becomes easy with this
    simple tool.
    
    I have reviewed this patch and here are my comments:
    
    1.
    @@ -30,7 +30,9 @@
     #include "common/file_perm.h"
     #include "common/file_utils.h"
     #include "common/logging.h"
    +#include "common/sha2.h"
     #include "common/string.h"
    +#include "fe_utils/simple_list.h"
     #include "fe_utils/recovery_gen.h"
     #include "fe_utils/string_utils.h"
     #include "getopt_long.h"
    @@ -38,12 +40,19 @@
     #include "pgtar.h"
     #include "pgtime.h"
     #include "pqexpbuffer.h"
    +#include "pgrhash.h"
     #include "receivelog.h"
     #include "replication/basebackup.h"
     #include "streamutil.h"
    
    Please add new files in order.
    
    2.
    Can hash related file names be renamed to backuphash.c and backuphash.h?
    
    3.
    Need indentation adjustments at various places.
    
    4.
    +            char        buf[1000000];  // 1MB chunk
    
    It would be good to use a multiple of the block/page size (or at least a
    power of 2).
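
    As an illustration of that suggestion, the read loop could use a buffer
    that is a multiple of the block size and is allocated on the heap instead
    of the stack. The macro names, the callback type, and the function are
    hypothetical; 8192 is assumed here because it is the default PostgreSQL
    block size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE      8192                /* assumed default block size */
#define READ_CHUNK_SIZE (128 * BLOCK_SIZE)  /* 1 MiB, a power of two */

/* Called for each chunk read, e.g. to feed a checksum context. */
typedef void (*chunk_callback) (const unsigned char *data, size_t len,
                                void *arg);

/*
 * Read fp to EOF in READ_CHUNK_SIZE pieces, passing each piece to cb
 * (if not NULL).  Returns the number of bytes read, or -1 on error.
 */
static long
read_file_in_chunks(FILE *fp, chunk_callback cb, void *arg)
{
    unsigned char *buf;
    size_t      n;
    long        total = 0;

    assert(fp != NULL);

    buf = malloc(READ_CHUNK_SIZE);
    if (buf == NULL)
        return -1;

    while ((n = fread(buf, 1, READ_CHUNK_SIZE, fp)) > 0)
    {
        if (cb != NULL)
            cb(buf, n, arg);
        total += (long) n;
    }

    free(buf);
    return ferror(fp) ? -1 : total;
}
```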
    
    5.
    +typedef struct pgrhash_entry
    +{
    +    struct pgrhash_entry *next; /* link to next entry in same bucket */
    +    DataDirectoryFileInfo *record;
    +} pgrhash_entry;
    +
    +struct pgrhash
    +{
    +    unsigned    nbuckets;        /* number of buckets */
    +    pgrhash_entry **bucket;        /* pointer to hash entries */
    +};
    +
    +typedef struct pgrhash pgrhash;
    
    These two can be moved to .h file instead of redefining over there.
    
    6.
    +/*
    + * TODO: this function is not necessary, can be removed.
    + * Test whether the given row number is match for the supplied keys.
    + */
    +static bool
    +pgrhash_compare(char *bt_filename, char *filename)
    
    Yeah, it can be removed by doing strcmp() at the required places rather than
    doing it in a separate function.
    
    7.
    mdate is not compared anywhere. I understand that it can't be compared with
    the file in the backup directory and its entry in the manifest as manifest
    entry gives mtime from server file whereas the same file in the backup will
    have different mtime. But adding a few comments there will be good.
    
    8.
    +    char        mdate[24];
    
    Should this be mtime instead?
    
    
    Thanks
    
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  28. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-11-22T09:57:53Z

    Thank you Jeevan for reviewing the patch.
    
    On Thu, Nov 21, 2019 at 2:33 PM Jeevan Chalke <
    jeevan.chalke@enterprisedb.com> wrote:
    
    >
    >
    > On Tue, Nov 19, 2019 at 3:30 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    > wrote:
    >
    >>
    >>
    >> My colleague Suraj did testing and noticed the performance impact
    >> with the checksums.   On further testing, he found that specifically with
    >> sha its more of performance impact.
    >>
    >> Please find below statistics:
    >>
    >> Tables (size each)  Checksum  real        user       sys        Overhead
    >> 10 (100 MB)         none      0m10.957s   0m0.367s   0m2.275s   -
    >> 10 (100 MB)         SHA-256   0m16.816s   0m0.210s   0m2.067s   53%
    >> 10 (100 MB)         md5       0m11.895s   0m0.174s   0m1.725s   8%
    >> 10 (100 MB)         CRC       0m11.136s   0m0.365s   0m2.298s   2%
    >> 20 (100 MB)         none      0m20.610s   0m0.484s   0m3.198s   -
    >> 20 (100 MB)         SHA-256   0m31.745s   0m0.569s   0m4.089s   54%
    >> 20 (100 MB)         md5       0m22.717s   0m0.638s   0m4.026s   10%
    >> 20 (100 MB)         CRC       0m21.075s   0m0.538s   0m3.417s   2%
    >> 50 (100 MB)         none      0m49.143s   0m1.646s   0m8.499s   -
    >> 50 (100 MB)         SHA-256   1m13.683s   0m1.305s   0m10.541s  50%
    >> 50 (100 MB)         md5       0m51.856s   0m0.932s   0m7.702s   6%
    >> 50 (100 MB)         CRC       0m49.689s   0m1.028s   0m6.921s   1%
    >> 100 (100 MB)        none      1m34.308s   0m2.265s   0m14.717s  -
    >> 100 (100 MB)        SHA-256   2m22.403s   0m2.613s   0m20.776s  51%
    >> 100 (100 MB)        md5       1m41.524s   0m2.158s   0m15.949s  8%
    >> 100 (100 MB)        CRC       1m35.045s   0m2.061s   0m16.308s  1%
    >> 100 (1 GB)          none      17m18.336s  0m20.222s  3m12.960s  -
    >> 100 (1 GB)          SHA-256   24m45.942s  0m26.911s  3m33.501s  43%
    >> 100 (1 GB)          md5       17m41.670s  0m26.506s  3m18.402s  2%
    >> 100 (1 GB)          CRC       17m22.296s  0m26.811s  3m56.653s  approx. 0.5% (*)
    >>
    >> Overhead is the % performance overhead relative to the run without checksum.
    >> (*) Sometimes this test completes within the same time as without checksum.
    >>
    >>
    >> Considering the above results, I modified Robert's earlier patch and
    >> added a "manifest_with_checksums" option to pg_basebackup.  With the new
    >> patch, checksums are disabled by default and are only enabled when the
    >> "manifest_with_checksums" option is provided.  I also rebased the whole
    >> patch set.
    >>
    >
    > Review comments on 0004:
    >
    > 1.
    > I don't think we need o_manifest_with_checksums variable,
    > manifest_with_checksums can be used instead.
    >
    
    Yes, done in the latest version of the patch.
    
    
    > 2.
    > We need to document this new option for pg_basebackup and basebackup.
    >
    >
    Done, attaching documentation patch with the mail.
    
    > 3.
    > Also, instead of keeping manifest_with_checksums as a global variable, we
    > should pass that to the required function. Patch 0002 already modified the
    > signature of all relevant functions anyways. So just need to add one more
    > bool
    > variable there.
    >
    >
    Yes, I implemented it that way earlier, but later found that we already
    have a checksum-related global variable, noverify_checksums. Following
    that precedent seemed cleaner than modifying the function definitions
    to pass the variable (which is effectively global for the operation).
    
    > 4.
    > Why we need a "File" at the start of each entry as we are adding files
    > only?
    > I wonder if we also need to provide a tablespace name and directory marker
    > so
    > that we have "Tablespace" and "Dir" at the start.
    >
    
    Sorry, I am not quite sure about this; maybe Robert is the right person
    to answer this.
    
    
    > 5.
    > If I don't provide manifest-with-checksums option then too I see that
    > checksum
    > is calculated for backup_manifest file itself. Is that intentional or
    > missed?
    > I think we should omit that too if this option is not provided.
    >
    >
    Oops yeah, corrected this in the latest version of patch.
    
    > 6.
    > Is it possible to get only a backup manifest from the server? A client like
    > pg_basebackup can then use that to fetch files reading that.
    >
    >
    Currently we don't have any option to just get the manifest file from the
    server.  I am not sure why we would need this at this point in time.
    
    
    
    Regards,
    
    Rushabh Lathia
    www.EnterpriseDB.com
    
  29. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-11-22T15:58:27Z

    On Tue, Nov 19, 2019 at 8:49 AM Andrew Dunstan
    <andrew.dunstan@2ndquadrant.com> wrote:
    > I admit I haven't been following along closely, but why do we need a
    > cryptographic checksum here instead of, say, a CRC? Do we think that
    > somehow the checksum might be forged? Use of cryptographic hashes as
    > general purpose checksums has become far too common IMNSHO.
    
    I tend to agree with you. I suspect if we just use CRC, some people
    are going to complain that they want something "stronger" because that
    will make them feel better about error detection rates or obscure
    threat models or whatever other things a SHA-based approach might be
    able to catch that CRC would not catch. However, I suspect that for
    normal use cases, CRC would be totally adequate, and the fact that the
    performance overhead is almost none vs. a whole lot - at least in this
    test setup, other results might vary depending on what you test -
    makes it look pretty appealing.
    
    My gut reaction is to make CRC the default, but have an option that
    you can use to either turn it off entirely (if even 1-2% is too much
    for you) or opt in to SHA-something if you want it. I don't think we
    should offer an option for MD5, because MD5 is a dirty word these days
    and will cause problems for users who have to worry about FIPS 140-2
    compliance. Phrased more positively, if you want a cryptographic hash
    at all, you should probably use one that isn't widely viewed as too
    weak.
    
    Thoughts?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  30. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-11-22T18:10:06Z

    On 11/22/19 10:58 AM, Robert Haas wrote:
    > On Tue, Nov 19, 2019 at 8:49 AM Andrew Dunstan
    > <andrew.dunstan@2ndquadrant.com> wrote:
    >> I admit I haven't been following along closely, but why do we need a
    >> cryptographic checksum here instead of, say, a CRC? Do we think that
    >> somehow the checksum might be forged? Use of cryptographic hashes as
    >> general purpose checksums has become far too common IMNSHO.
    > 
    > I tend to agree with you. I suspect if we just use CRC, some people
    > are going to complain that they want something "stronger" because that
    > will make them feel better about error detection rates or obscure
    > threat models or whatever other things a SHA-based approach might be
    > able to catch that CRC would not catch. 
    
    Well, the maximum amount of data that can be protected with a 32-bit CRC
    is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
    presume that's what we are talking about since I can't find any 64-bit
    CRC code in core or this patch.
    
    So, that's half of what we need with the default relation segment size
    (I've seen larger in the field).
    
    > I don't think we
    > should offer an option for MD5, because MD5 is a dirty word these days
    > and will cause problems for users who have to worry about FIPS 140-2
    > compliance. 
    
    +1.
    
    > Phrased more positively, if you want a cryptographic hash
    > at all, you should probably use one that isn't widely viewed as too
    > weak.
    
    Sure.  There's another advantage to picking an algorithm with lower
    collision rates, though.
    
    CRCs are fine for catching transmission errors (as caveated above) but
    not as great for comparing two files for equality.  With strong hashes
    you can confidently compare local files against the path, size, and hash
    stored in the manifest and save yourself a round-trip to the remote
    storage to grab the file if it has not changed locally.
    
    This is the basic premise of what we call delta restore which can speed
    up restores by orders of magnitude.
    
    Delta restore is the main advantage that made us decide to require SHA1
    checksums.  In most cases, restore speed is more important than backup
    speed.
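
The delta-restore check described above can be sketched as follows. This is a minimal illustration, not pgBackRest's or PostgreSQL's actual code: the manifest entry is modeled as a plain dict with hypothetical keys `path`, `size`, and `sha1`, and the helper names are made up for this example.

```python
import hashlib
import os

def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def needs_fetch(local_root, entry):
    """Decide whether a file must be fetched from the backup repository.

    `entry` is a hypothetical manifest record: {"path", "size", "sha1"}.
    The cheap checks (existence, size) run before the hash is computed.
    """
    local_path = os.path.join(local_root, entry["path"])
    if not os.path.isfile(local_path):
        return True
    if os.path.getsize(local_path) != entry["size"]:
        return True
    return sha1_of(local_path) != entry["sha1"]
```

Only files for which `needs_fetch` returns True would have to be copied from remote storage, which is where the restore-time saving comes from.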
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  31. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-11-22T18:24:16Z

    On Tue, Nov 19, 2019 at 4:34 PM David Steele <david@pgmasters.net> wrote:
    > On 11/19/19 5:00 AM, Rushabh Lathia wrote:
    > > My colleague Suraj did testing and noticed the performance impact
    > > with the checksums.   On further testing, he found that specifically with
    > > sha its more of performance impact.
    >
    > We have found that SHA1 adds about 3% overhead when the backup is also
    > compressed (gzip -6), which is what most people want to do.  This
    > percentage goes down even more if the backup is being transferred over a
    > network or to an object store such as S3.
    
    I don't really understand why your tests and Suraj's tests are showing
    such different results, or how compression plays into it. I tried
    running shasum -a$N lineitem-big.csv on my laptop, where that file
    contains ~70MB of random-looking data whose source I no longer
    remember. Here are the results by algorithm: SHA1, ~25 seconds; SHA224
    or SHA256, ~52 seconds; SHA384 and SHA512, ~39 seconds. Aside from the
    interesting discovery that the algorithms with more bits actually run
    faster on this machine, this seems to show that there's only about a
    ~2x difference between the SHA1 that you used and that I (pretty much
    arbitrarily) used. But Rushabh and Suraj are reporting 43-54%
    overhead, and even if you divide that by two it's a lot more than 3%.
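
The 43-54% range can be reproduced from the "real" timings that Rushabh and Suraj posted earlier in the thread. The sketch below only redoes that arithmetic; the helper names are made up for this example.

```python
import re

def to_seconds(t):
    """Convert a `time`-style value such as '1m13.683s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", t).groups()
    return int(minutes) * 60 + float(seconds)

def overhead_percent(baseline, with_checksum):
    """Extra wall-clock time relative to the baseline run, in percent."""
    base = to_seconds(baseline)
    return 100.0 * (to_seconds(with_checksum) - base) / base

# 'real' times from the posted results: (without checksum, with SHA-256)
SHA256_RUNS = {
    "10 tables x 100 MB": ("0m10.957s", "0m16.816s"),
    "20 tables x 100 MB": ("0m20.610s", "0m31.745s"),
    "50 tables x 100 MB": ("0m49.143s", "1m13.683s"),
    "100 tables x 100 MB": ("1m34.308s", "2m22.403s"),
    "100 tables x 1 GB": ("17m18.336s", "24m45.942s"),
}

for label, (base, sha) in SHA256_RUNS.items():
    print(f"{label}: {overhead_percent(base, sha):.1f}% overhead")
```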
    
    One possible explanation is that the compression is really slow, and
    so it makes the checksum overhead a smaller percentage of the total.
    Like, if you've already slowed down the backup by 8x, then 24%
    overhead turns into 3% overhead! But I assume that's not the real
    explanation here. Another explanation is that your tests were
    I/O-bound rather than CPU-bound, maybe because you tested with a much
    larger database or a much smaller amount of I/O bandwidth. If you had
    CPU cycles to burn, then neither compression nor checksums will cost
    much in terms of overall runtime. But that's a little hard to swallow,
    too, because I don't think the testing mentioned above was done using
    any sort of exotic test configuration, so why would yours be so
    different? Another possibility is that Suraj and Rushabh messed up the
    tests, or alternatively that you did. Or, it could be that your
    checksum implementation is way faster than the one PG uses, and so the
    impact was much less. I don't know, but I'm having a hard time
    understanding the divergent results. Any ideas?
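
One way to separate the cost of the checksum itself from I/O and compression is to time the algorithms on an in-memory buffer. The sketch below is only a rough probe under stated assumptions: it uses Python's `hashlib` and `zlib`, not the implementations in PostgreSQL or pgBackRest, and `zlib.crc32` is plain CRC-32, not the CRC-32C variant used elsewhere in PostgreSQL. Absolute numbers will vary by machine and library build.

```python
import hashlib
import time
import zlib

def throughput_mb_per_s(fn, data, repeat=5):
    """Time `fn(data)` `repeat` times and return MB processed per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(data)
    elapsed = time.perf_counter() - start
    return (len(data) * repeat / 1e6) / max(elapsed, 1e-9)

def compare_algorithms(data):
    """Return {algorithm name: MB/s} for a few checksum candidates."""
    candidates = {
        "crc32": lambda d: zlib.crc32(d),
        "sha1": lambda d: hashlib.sha1(d).digest(),
        "sha256": lambda d: hashlib.sha256(d).digest(),
        "sha512": lambda d: hashlib.sha512(d).digest(),
    }
    return {name: throughput_mb_per_s(fn, data) for name, fn in candidates.items()}

if __name__ == "__main__":
    buffer = bytes(16 * 1024 * 1024)  # 16 MiB of zeros, held in memory
    for name, rate in compare_algorithms(buffer).items():
        print(f"{name:>7}: {rate:10.1f} MB/s")
```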
    
    > We judged that the lower collision rate of SHA1 justified the additional
    > expense.
    >
    > That said, making SHA256 optional seems reasonable.  We decided not to
    > make our SHA1 checksums optional to reduce the test matrix and because
    > parallelism largely addressed performance concerns.
    
    Just to be clear, I really don't have any objection to using SHA1
    instead of SHA256, or anything else for that matter. I picked the one
    to use out of a hat for the purpose of having a POC quickly; I didn't
    have any intention to insist on that as the final selection. It seems
    likely that anything we pick here will eventually be considered
    obsolete, so I think we need to allow for configurability, but I don't
    have a horse in the game as far as an initial selection goes.
    
    Except - and this gets back to the previous point - I don't want to
    slow down backups by 40% by default. I wouldn't mind slowing them down
    3% by default, but 40% is too much overhead. I think we've gotta
    either get the overhead of using SHA way down or not use SHA by default.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  32. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-11-22T19:01:44Z

    On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
    > Well, the maximum amount of data that can be protected with a 32-bit CRC
    > is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
    > presume that's what we are talking about since I can't find any 64-bit
    > CRC code in core or this patch.
    
    Could you give a more precise citation for this? I can't find a
    reference to that in the Wikipedia article off-hand and I don't know
    where to look in NIST. I apologize if I'm being dense here, but I
    don't see why there should be any limit on the amount of data that can
    be protected. The important thing is that if the original file F is
    altered to F', we hope that CHECKSUM(F) != CHECKSUM(F'). The
    probability of that, assuming that the alteration is random rather
    than malicious and that the checksum function is equally likely to
    produce every possible output, is just 1-2^-${CHECKSUM_BITS},
    regardless of the length of the message (except that there might be
    some special cases for very short messages, which don't matter here).
    
    This analysis by me seems to match
    https://en.wikipedia.org/wiki/Cyclic_redundancy_check, which says:
    
    "Typically an n-bit CRC applied to a data block of arbitrary length
    will detect any single error burst not longer than n bits, and the
    fraction of all longer error bursts that it will detect is (1 −
    2^−n)."
    
    Notice the phrase "a data block of arbitrary length" and the formula "1 - 2^-n".
    
    > > Phrased more positively, if you want a cryptographic hash
    > > at all, you should probably use one that isn't widely viewed as too
    > > weak.
    >
    > Sure.  There's another advantage to picking an algorithm with lower
    > collision rates, though.
    >
    > CRCs are fine for catching transmission errors (as caveated above) but
    > not as great for comparing two files for equality.  With strong hashes
    > you can confidently compare local files against the path, size, and hash
    > stored in the manifest and save yourself a round-trip to the remote
    > storage to grab the file if it has not changed locally.
    
    I agree in part. I think there are two reasons why a cryptographically
    strong hash is desirable for delta restore. First, since the checksums
    are longer, the probability of a false match happening randomly is
    lower, which is important. Even if the above analysis is correct and
    the chance of a false match is just 2^-32 with a 32-bit CRC, if you
    back up ten million files every day, you'll likely get a false match
    within a few years or less, and once is too often. Second, unlike what
    I supposed above, the contents of a PostgreSQL data file are not
    chosen at random, unlike transmission errors, which probably are more
    or less random. It seems somewhat possible that there is an adversary
    who is trying to choose the data that gets stored in some particular
    record so as to create a false checksum match. A CRC is a lot easier
    to fool than a cryptographic hash, so I think that using a CRC of *any*
    length for this kind of use case would be extremely dangerous no
    matter the probability of an accidental match.
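
The "within a few years or less" estimate can be checked with a short calculation. This is a sketch under simplifying assumptions that are not spelled out in the message: each of the ten million daily files is treated as one independent comparison of changed content against a stored checksum, and the checksum is assumed to be uniformly distributed. The function names are made up for this example.

```python
import math

def p_false_match(comparisons, bits=32):
    """Probability of at least one false match in `comparisons` trials,
    each with an independent 2**-bits chance of matching by accident."""
    p = 2.0 ** -bits
    # 1 - (1 - p)**n, computed without losing precision for tiny p
    return -math.expm1(comparisons * math.log1p(-p))

def expected_days_to_first_false_match(files_per_day, bits=32):
    """Mean waiting time, in days, until the first accidental match."""
    return (2.0 ** bits) / files_per_day

files_per_day = 10_000_000
print(expected_days_to_first_false_match(files_per_day))
for years in (1, 2, 3):
    print(years, p_false_match(files_per_day * 365 * years))
```

Under these assumptions the chance of at least one accidental match passes one half within the first year and is above 90% after three years, which is consistent with the estimate above.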
    
    > This is the basic premise of what we call delta restore which can speed
    > up restores by orders of magnitude.
    >
    > Delta restore is the main advantage that made us decide to require SHA1
    > checksums.  In most cases, restore speed is more important than backup
    > speed.
    
    I see your point, but it's not the whole story. We've encountered a
    bunch of cases where the time it took to complete a backup exceeded
    the user's desired backup interval, which is obviously very bad, or
    even more commonly where it exceeded the length of the user's
    "low-usage" period when they could tolerate the extra overhead imposed
    by the backup. A few percentage points is probably not a big deal, but
    a user who has an 8-hour window to get the backup done overnight will
    not be happy if it's taking 6 hours now and we tack 40%-50% on to
    that. So I think that we either have to disable backup checksums by
    default, or figure out a way to get the overhead down to something a
    lot smaller than what current tests are showing -- which we could
    possibly do without changing the algorithm if we can somehow make it a
    lot cheaper, but otherwise I think the choice is between disabling the
    functionality altogether by default and adopting a less-expensive
    algorithm. Maybe someday when delta restore is in core and widely used
    and CPUs are faster, it'll make sense to revise the default, and
    that's cool, but I can't see imposing a big overhead by default to
    enable a feature core doesn't have yet...
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  33. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-11-22T19:02:12Z

    On 11/22/19 1:24 PM, Robert Haas wrote:
    > On Tue, Nov 19, 2019 at 4:34 PM David Steele <david@pgmasters.net> wrote:
    >> On 11/19/19 5:00 AM, Rushabh Lathia wrote:
    >>> My colleague Suraj did testing and noticed the performance impact
    >>> with the checksums.   On further testing, he found that specifically with
    >>> sha its more of performance impact.
    >>
    >> We have found that SHA1 adds about 3% overhead when the backup is also
    >> compressed (gzip -6), which is what most people want to do.  This
    >> percentage goes down even more if the backup is being transferred over a
    >> network or to an object store such as S3.
    > 
    > I don't really understand why your tests and Suraj's tests are showing
    > such different results, or how compression plays into it. I tried
    > running shasum -a$N lineitem-big.csv on my laptop, where that file
    > contains ~70MB of random-looking data whose source I no longer
    > remember. Here are the results by algorithm: SHA1, ~25 seconds; SHA224
    > or SHA256, ~52 seconds; SHA384 and SHA512, ~39 seconds. Aside from the
    > interesting discovery that the algorithms with more bits actually run
    > faster on this machine, this seems to show that there's only about a
    > ~2x difference between the SHA1 that you used and that I (pretty much
    > arbitrarily) used. But Rushabh and Suraj are reporting 43-54%
    > overhead, and even if you divide that by two it's a lot more than 3%.
    > 
    > One possible explanation is that the compression is really slow, and
    > so it makes the checksum overhead a smaller percentage of the total.
    > Like, if you've already slowed down the backup by 8x, then 24%
    > overhead turns into 3% overhead! But I assume that's not the real
    > explanation here. 
    
    That's the real explanation here.  Hash calculations run at the same
    speed, they just become a smaller portion of the *total* time once
    compression (gzip -6) is added.  With something like lz4 hashing will
    obviously be a big percentage of the total.
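
The dilution effect is simple arithmetic. In the sketch below the checksum cost is expressed as a fraction of the uncompressed backup time, and `slowdown` is how many times longer the backup takes once compression is added; both parameter names are made up for this example, and the 24% and 8x figures are the ones quoted above.

```python
def diluted_overhead(checksum_overhead, slowdown):
    """Checksum cost as a fraction of total runtime after a slowdown.

    The absolute hashing time stays the same; only the total it is
    compared against grows by the factor `slowdown`.
    """
    return checksum_overhead / slowdown

# 24% overhead on an uncompressed backup, backup 8x slower with compression
print(f"{diluted_overhead(0.24, 8):.0%}")
```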
    
    Also consider how much extra latency you get from copying over a
    network.  My 3% did not include that but realistically most backups are
    running over a network (hopefully).
    
    >> That said, making SHA256 optional seems reasonable.  We decided not to
    >> make our SHA1 checksums optional to reduce the test matrix and because
    >> parallelism largely addressed performance concerns.
    > 
    > Just to be clear, I really don't have any objection to using SHA1
    > instead of SHA256, or anything else for that matter. I picked the one
    > to use out of a hat for the purpose of having a POC quickly; I didn't
    > have any intention to insist on that as the final selection. It seems
    > likely that anything we pick here will eventually be considered
    > obsolete, so I think we need to allow for configurability, but I don't
    > have a horse in the game as far as an initial selection goes.
    
    We decided that SHA1 was good enough and there was no need to go up to
    SHA256.  What we were interested in was collision rates and what the
    chance of getting a false positive was based on the combination of
    path, size, and hash.  With SHA1 the chance of a collision was literally
    astronomically low (as in the universe would probably end before it
    happened, depending on whether you are an expand forever or contract
    proponent).
    
    > Except - and this gets back to the previous point - I don't want to
    > slow down backups by 40% by default. I wouldn't mind slowing them down
    > 3% by default, but 40% is too much overhead. I think we've gotta
    > either get the overhead of using SHA way down or not use SHA by default.
    
    Maybe -- my take is that the measurements, an uncompressed backup to the
    local filesystem, are not a very realistic use case.
    
    However, I'm still fine with leaving the user the option of checksums or
    no.  I just wanted to point out that CRCs have their limits so maybe
    that's not a great option unless it is properly caveated and perhaps not
    the default.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  34. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-11-22T19:29:17Z

    On 11/22/19 2:01 PM, Robert Haas wrote:
    > On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
    >> Well, the maximum amount of data that can be protected with a 32-bit CRC
    >> is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
    >> presume that's what we are talking about since I can't find any 64-bit
    >> CRC code in core or this patch.
    > 
    > Could you give a more precise citation for this? 
    
    See:
    https://www.nist.gov/system/files/documents/2017/04/26/lrdc_systems_part2_032713.pdf
    Search for "The maximum block size"
    
    https://en.wikipedia.org/wiki/Cyclic_redundancy_check
    "The design of the CRC polynomial depends on the maximum total length of
    the block to be protected (data + CRC bits)", which I took to mean there
    are limits.
    
    Here's another interesting bit from:
    https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks
    "Because a CRC is based on division, no polynomial can detect errors
    consisting of a string of zeroes prepended to the data, or of missing
    leading zeroes" -- but it appears to matter what CRC you are using.
    There's a variation that works in this case and hopefully we are using
    that one.
    
    This paper talks about appropriate block lengths vs crc length:
    http://users.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf
    but it is concerned with network transmission and small block lengths.
    
    > "Typically an n-bit CRC applied to a data block of arbitrary length
    > will detect any single error burst not longer than n bits, and the
    > fraction of all longer error bursts that it will detect is (1 −
    > 2^−n)."
    
    I'm not sure how encouraging I find this -- a four-byte error is not a lot
    and 2^32 is only 4 billion.  We have individual users who have backed up
    more than 4 billion files over the last few years.
    
    >> This is the basic premise of what we call delta restore which can speed
    >> up restores by orders of magnitude.
    >>
    >> Delta restore is the main advantage that made us decide to require SHA1
    >> checksums.  In most cases, restore speed is more important than backup
    >> speed.
    > 
    > I see your point, but it's not the whole story. We've encountered a
    > bunch of cases where the time it took to complete a backup exceeded
    > the user's desired backup interval, which is obviously very bad, or
    > even more commonly where it exceeded the length of the user's
    > "low-usage" period when they could tolerate the extra overhead imposed
    > by the backup. A few percentage points is probably not a big deal, but
    > a user who has an 8-hour window to get the backup done overnight will
    > not be happy if it's taking 6 hours now and we tack 40%-50% on to
    > that. So I think that we either have to disable backup checksums by
    > default, or figure out a way to get the overhead down to something a
    > lot smaller than what current tests are showing -- which we could
    > possibly do without changing the algorithm if we can somehow make it a
    > lot cheaper, but otherwise I think the choice is between disabling the
    > functionality altogether by default and adopting a less-expensive
    > algorithm. Maybe someday when delta restore is in core and widely used
    > and CPUs are faster, it'll make sense to revise the default, and
    > that's cool, but I can't see imposing a big overhead by default to
    > enable a feature core doesn't have yet...
    
    OK, I'll buy that.  But I *don't* think CRCs should be allowed for
    deltas (when we have them) and I *do* think we should caveat their
    effectiveness (assuming we can agree on them).
    
    In general the answer to faster backups should be more cores/faster
    network/faster disk, not compromising backup integrity.  I understand
    we'll need to wait until we have parallelism in pg_basebackup to justify
    that answer.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  35. Re: backup manifests

    Tels <nospam-pg-abuse@bloodgate.com> — 2019-11-22T22:15:29Z

    Moin Robert,
    
    On 2019-11-22 20:01, Robert Haas wrote:
    > On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
    >> Well, the maximum amount of data that can be protected with a 32-bit CRC
    >> is 512MB according to all the sources I found (NIST, Wikipedia, etc).  I
    >> presume that's what we are talking about since I can't find any 64-bit
    >> CRC code in core or this patch.
    > 
    > Could you give a more precise citation for this? I can't find a
    > reference to that in the Wikipedia article off-hand and I don't know
    > where to look in NIST. I apologize if I'm being dense here, but I
    > don't see why there should be any limit on the amount of data that can
    > be protected. The important thing is that if the original file F is
    > altered to F', we hope that CHECKSUM(F) != CHECKSUM(F'). The
    > probability of that, assuming that the alteration is random rather
    > than malicious and that the checksum function is equally likely to
    > produce every possible output, is just 1-2^-${CHECKSUM_BITS},
    > regardless of the length of the message (except that there might be
    > some special cases for very short messages, which don't matter here).
    > 
    > This analysis by me seems to match
    > https://en.wikipedia.org/wiki/Cyclic_redundancy_check, which says:
    > 
    > "Typically an n-bit CRC applied to a data block of arbitrary length
    > will detect any single error burst not longer than n bits, and the
    > fraction of all longer error bursts that it will detect is (1 −
    > 2^−n)."
    > 
    > Notice the phrase "a data block of arbitrary length" and the formula "1 - 2^-n".
    
    It is related to the number of states, and the birthday problem factors
    in it, too:

        https://en.wikipedia.org/wiki/Birthday_problem

    If you have a 32-bit checksum or hash, it can represent only 2**32-1
    states at most (or fewer, if the algorithm isn't really good).

    Each byte is 8 bits, so 2 ** 32 / 8 is 512 Mbyte. If you process your
    data bit by bit, each new bit would add a new state (consider: missing
    bit == 0, added bit == 1). If each new state is represented by a
    different checksum, all possible 2 ** 32 values are exhausted after
    processing 512 Mbyte; after that you get one of the former states again,
    aka a collision.

    There is no way around it with so few bits, no matter what algorithm
    you choose.
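
For reference, the two numbers in play here can be computed directly. The sketch below only does the arithmetic: it converts 2**32 bits to MiB, and it evaluates the standard birthday approximation for the chance that any two of `items` independent, uniformly distributed checksums coincide. The function names are made up for this example. Note that the birthday figure concerns collisions among pairs of different files, which is a separate question from whether one altered file still matches its own stored checksum, the case analyzed in the quoted message.

```python
import math

def bits_as_mib(n_bits):
    """Express a number of bits as MiB (8 bits per byte)."""
    return n_bits / 8 / (1024 * 1024)

def birthday_collision_probability(items, bits=32):
    """Approximate chance that at least two of `items` uniformly random
    `bits`-bit values are equal: 1 - exp(-k(k-1) / (2 * 2**bits))."""
    space = 2.0 ** bits
    return -math.expm1(-items * (items - 1) / (2.0 * space))

print(bits_as_mib(2 ** 32))
for k in (10_000, 77_164, 1_000_000):
    print(k, birthday_collision_probability(k))
```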
    
    >> > Phrased more positively, if you want a cryptographic hash
    >> > at all, you should probably use one that isn't widely viewed as too
    >> > weak.
    >> 
    >> Sure.  There's another advantage to picking an algorithm with lower
    >> collision rates, though.
    >> 
    >> CRCs are fine for catching transmission errors (as caveated above) but
    >> not as great for comparing two files for equality.  With strong hashes
    >> you can confidently compare local files against the path, size, and hash
    >> stored in the manifest and save yourself a round-trip to the remote
    >> storage to grab the file if it has not changed locally.
    > 
    > I agree in part. I think there are two reasons why a cryptographically
    > strong hash is desirable for delta restore. First, since the checksums
    > are longer, the probability of a false match happening randomly is
    > lower, which is important. Even if the above analysis is correct and
    > the chance of a false match is just 2^-32 with a 32-bit CRC, if you
    > back up ten million files every day, you'll likely get a false match
    > within a few years or less, and once is too often. Second, unlike what
    > I supposed above, the contents of a PostgreSQL data file are not
    > chosen at random, unlike transmission errors, which probably are more
    > or less random. It seems somewhat possible that there is an adversary
    > who is trying to choose the data that gets stored in some particular
    > record so as to create a false checksum match. A CRC is a lot easier
    > to fool than a cryptographic hash, so I think that using a CRC of *any*
    > length for this kind of use case would be extremely dangerous no
    > matter the probability of an accidental match.
    
    Agreed. See above.
    
    However, if you choose a hash, please do not go below SHA-256. Both MD5
    and SHA-1 have already had collision attacks, and these are bound to
    only get worse.
    
       https://www.mscs.dal.ca/~selinger/md5collision/
       https://shattered.io/
    
    It might even be a wise idea to encode the hash algorithm used into the
    manifest file, so it can be changed later. The hash length might not be
    enough to decide which algorithm is the one used.
    
    >> This is the basic premise of what we call delta restore which can speed
    >> up restores by orders of magnitude.
    >>
    >> Delta restore is the main advantage that made us decide to require SHA1
    >> checksums.  In most cases, restore speed is more important than backup
    >> speed.
    > 
    > I see your point, but it's not the whole story. We've encountered a
    > bunch of cases where the time it took to complete a backup exceeded
    > the user's desired backup interval, which is obviously very bad, or
    > even more commonly where it exceeded the length of the user's
    > "low-usage" period when they could tolerate the extra overhead imposed
    > by the backup. A few percentage points is probably not a big deal, but
    > a user who has an 8-hour window to get the backup done overnight will
    > not be happy if it's taking 6 hours now and we tack 40%-50% on to
    > that. So I think that we either have to disable backup checksums by
    > default, or figure out a way to get the overhead down to something a
    > lot smaller than what current tests are showing -- which we could
    > possibly do without changing the algorithm if we can somehow make it a
    > lot cheaper, but otherwise I think the choice is between disabling the
    > functionality altogether by default and adopting a less-expensive
    > algorithm. Maybe someday when delta restore is in core and widely used
    > and CPUs are faster, it'll make sense to revise the default, and
    > that's cool, but I can't see imposing a big overhead by default to
    > enable a feature core doesn't have yet...
    
    Modern algorithms are amazingly fast on modern hardware; some are even
    implemented in hardware nowadays:
    
      https://software.intel.com/en-us/articles/intel-sha-extensions
    
    Quote from:
    
      
      https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring-sha-extensions-to-intels-cpus/
    
      "Despite the extremely limited availability of SHA extension support
       in modern desktop and mobile processors, crypto libraries have already
       upstreamed support to great effect. Botan’s SHA extension patches show a
       significant 3x to 5x performance boost when taking advantage of the hardware
       extensions, and the Linux kernel itself shipped with hardware SHA support
       with version 4.4, bringing a very respectable 3.6x performance upgrade over
       the already hardware-assisted SSE3-enabled code."
    
    If you need to load the data from disk and shove it over a network, the
    hashing will certainly add very little overhead; it might even be
    completely invisible, since it can run in parallel with everything else.
    Sure, there is the thing called zero-copy networking, but if you have to
    compress the data before sending it over the network, you have to put it
    through the CPU anyway. And if you have more than one core, the second
    one can do the hashing in parallel while the first one does the
    compression.
    
    To get a feeling one can use:
    
        openssl speed md5 sha1 sha256 sha512
    
    On my really-not-fast desktop CPU (i5-4690T CPU @ 2.50GHz) it says:
    
      The 'numbers' are in 1000s of bytes per second processed.
       type       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
       md5       122638.55k   277023.96k   487725.57k   630806.19k   683892.74k   688553.98k
       sha1      127226.45k   313891.52k   632510.55k   865753.43k   960995.33k   977215.19k
       sha256     77611.02k   173368.15k   325460.99k   412633.43k   447022.92k   448020.48k
       sha512     51164.77k   205189.87k   361345.79k   543883.26k   638372.52k   645933.74k
    
    Or in other words, it can hash nearly 931 MByte/s with SHA-1 and about
    427 MByte/s with SHA-256 (if I haven't miscalculated something). You'd
    need a pretty fast disk (aka M.2 SSD) and network (aka > 1 Gbit) to top
    these speeds, and then you'd use a real CPU for your server, not some
    poor Intel powersaving surfing thingy-majingy :)
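
    The conversion from the openssl figures to MByte/s can be re-checked
    with a short sketch. It assumes that the trailing "k" means 1000 bytes
    per second (as the openssl header line says) and that "MByte" means 2^20
    bytes; the helper name is illustrative only.

```python
# Convert `openssl speed` throughput figures (units of 1000 bytes/s)
# into MiB/s. The inputs are the 16384-byte column from the table above.

def kbytes_to_mib_per_s(figure_k: float) -> float:
    """figure_k is openssl's number with the trailing 'k' stripped."""
    return figure_k * 1000 / 2**20

sha1_mib = kbytes_to_mib_per_s(977215.19)
sha256_mib = kbytes_to_mib_per_s(448020.48)

print(f"SHA-1:   {sha1_mib:.1f} MiB/s")
print(f"SHA-256: {sha256_mib:.1f} MiB/s")
```

    These are single-core, in-memory numbers; they say nothing about what
    a backup tool achieves end to end.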
    
    Best regards,
    
    Tels
    
    
    
    
  36. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-11-22T22:30:18Z

    On 11/22/19 5:15 PM, Tels wrote:
    > On 2019-11-22 20:01, Robert Haas wrote:
    >> On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> wrote:
    > 
    >>> > Phrased more positively, if you want a cryptographic hash
    >>> > at all, you should probably use one that isn't widely viewed as too
    >>> > weak.
    >>>
    >>> Sure.  There's another advantage to picking an algorithm with lower
    >>> collision rates, though.
    >>>
    >>> CRCs are fine for catching transmission errors (as caveated above) but
    >>> not as great for comparing two files for equality.  With strong hashes
    >>> you can confidently compare local files against the path, size, and hash
    >>> stored in the manifest and save yourself a round-trip to the remote
    >>> storage to grab the file if it has not changed locally.
    >>
    >> I agree in part. I think there are two reasons why a cryptographically
    >> strong hash is desirable for delta restore. First, since the checksums
    >> are longer, the probability of a false match happening randomly is
    >> lower, which is important. Even if the above analysis is correct and
    >> the chance of a false match is just 2^-32 with a 32-bit CRC, if you
    >> back up ten million files every day, you'll likely get a false match
    >> within a few years or less, and once is too often. Second, unlike what
    >> I supposed above, the contents of a PostgreSQL data file are not
    >> chosen at random, unlike transmission errors, which probably are more
    >> or less random. It seems somewhat possible that there is an adversary
    >> who is trying to choose the data that gets stored in some particular
    >> record so as to create a false checksum match. A CRC is a lot easier
    >> to fool than a cryptographic hash, so I think that using a CRC of *any*
    >> length for this kind of use case would be extremely dangerous no
    >> matter the probability of an accidental match.
    > 
    > Agreed. See above.
    > 
    > However, if you choose a hash, please do not go below SHA-256. Both MD5
    > and SHA-1 have already had collision attacks, and these are only bound
    > to get worse.
    
    I don't think collision attacks are a big consideration in the general
    case.  The manifest is generally stored with the backup files so if a
    file is modified it is then trivial to modify the manifest as well.
    
    Of course, you could store the manifest separately or even just know the
    hash of the manifest and store that separately.  In that case SHA-256
    might be useful and it would be good to have the option, which I believe
    is the plan.
    
    I do wonder if you could construct a successful collision attack (even
    in MD5) that would also result in a valid relation file.  Probably, at
    least eventually.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  37. Re: backup manifests

    Tels <nospam-pg-abuse@bloodgate.com> — 2019-11-23T08:13:21Z

    Moin,
    
    On 2019-11-22 23:30, David Steele wrote:
    > On 11/22/19 5:15 PM, Tels wrote:
    >> On 2019-11-22 20:01, Robert Haas wrote:
    >>> On Fri, Nov 22, 2019 at 1:10 PM David Steele <david@pgmasters.net> 
    >>> wrote:
    >> 
    >>>> > Phrased more positively, if you want a cryptographic hash
    >>>> > at all, you should probably use one that isn't widely viewed as too
    >>>> > weak.
    >>>> 
    >>>> Sure.  There's another advantage to picking an algorithm with lower
    >>>> collision rates, though.
    >>>> 
    >>>> CRCs are fine for catching transmission errors (as caveated above) 
    >>>> but
    >>>> not as great for comparing two files for equality.  With strong 
    >>>> hashes
    >>>> you can confidently compare local files against the path, size, and 
    >>>> hash
    >>>> stored in the manifest and save yourself a round-trip to the remote
    >>>> storage to grab the file if it has not changed locally.
    >>> 
    >>> I agree in part. I think there are two reasons why a 
    >>> cryptographically
    >>> strong hash is desirable for delta restore. First, since the 
    >>> checksums
    >>> are longer, the probability of a false match happening randomly is
    >>> lower, which is important. Even if the above analysis is correct and
    >>> the chance of a false match is just 2^-32 with a 32-bit CRC, if you
    >>> back up ten million files every day, you'll likely get a false match
    >>> within a few years or less, and once is too often. Second, unlike 
    >>> what
    >>> I supposed above, the contents of a PostgreSQL data file are not
    >>> chosen at random, unlike transmission errors, which probably are more
    >>> or less random. It seems somewhat possible that there is an adversary
    >>> who is trying to choose the data that gets stored in some particular
    >>> record so as to create a false checksum match. A CRC is a lot easier
    >>> to fool than a cryptographic hash, so I think that using a CRC of 
    >>> *any*
    >>> length for this kind of use case would be extremely dangerous no
    >>> matter the probability of an accidental match.
    >> 
    >> Agreed. See above.
    >> 
    >> However, if you choose a hash, please do not go below SHA-256. Both
    >> MD5 and SHA-1 have already had collision attacks, and these are only
    >> bound to get worse.
    > 
    > I don't think collision attacks are a big consideration in the general
    > case.  The manifest is generally stored with the backup files so if a
    > file is modified it is then trivial to modify the manifest as well.
    
    That is true. However, a simple way around this is to sign the manifest
    with a public-key signature (GPG or similar). And if the manifest contains
    strong, hard-to-forge hashes, we get a much more secure backup, where
    (almost) nobody else can alter the manifest or mount easy collision
    attacks against the individual files.
    
    Without the strong hashes it would be pointless to sign the manifest.
    
    > Of course, you could store the manifest separately or even just know 
    > the
    > hash of the manifest and store that separately.  In that case SHA-256
    > might be useful and it would be good to have the option, which I 
    > believe
    > is the plan.
    > 
    > I do wonder if you could construct a successful collision attack (even
    > in MD5) that would also result in a valid relation file.  Probably, at
    > least eventually.
    
    With MD5, certainly. One way is to have two blocks of 512 bits that hash
    to the same MD5. It is trivial to reuse a pair that already exists among
    the known examples.
    
    Here is one, where the researchers constructed 12 PDFs that all
    have the same MD5 hash:
    
       https://www.win.tue.nl/hashclash/Nostradamus/
    
    If you insert one of these blocks into a relation and dump it, you could
    swap it (probably?) out on disk for the other block. I'm not sure this
    is of practical usage as an attack, tho. It would, however, cast doubt
    on the integrity of the backup and prove that MD5 is useless.
    
    OTOH, finding a full collision with MD5 should also be within reach of
    today's hardware. It is hard to find exact numbers, but this:
    
        https://www.win.tue.nl/hashclash/SingleBlock/
    
    gives the following numbers for 2008/2009:
    
       "Finding the birthday bits took 47 hours (expected was 3 days) on the
       cluster of 215 Playstation 3 game consoles at LACAL, EPFL. This is
       roughly equivalent to 400,000 hours on a single PC core. The single
       near-collision block construction took 18 hours and 20 minutes on a
       single PC core."
    
    Today one can probably compute it on a single GPU in mere hours. And you
    can rent massive amounts of them in the cloud for real cheap.
    
    Here are a few, now a bit dated, references:
    
        https://blog.codinghorror.com/speed-hashing/
        http://codahale.com/how-to-safely-store-a-password/
    
    Best regards,
    
    Tels
    
    
    
    
  38. Re: backup manifests

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2019-11-23T21:34:05Z

    On 11/23/19 3:13 AM, Tels wrote:
    >
    > Without the strong hashes it would be pointless to sign the manifest.
    >
    >
    
    I guess I must have missed where we are planning to add a cryptographic
    signature.
    
    
    cheers
    
    
    andrew
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
    
  39. Re: backup manifests

    David Steele <david@pgmasters.net> — 2019-11-24T14:38:09Z

    On 11/23/19 4:34 PM, Andrew Dunstan wrote:
    > 
    > On 11/23/19 3:13 AM, Tels wrote:
    >>
    >> Without the strong hashes it would be pointless to sign the manifest.
    >>
    > 
    > I guess I must have missed where we are planning to add a cryptographic
    > signature.
    
    I don't think we were planning to, but the user could do so if they wished.
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  40. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2019-11-25T09:35:22Z

    Hi Jeevan,
    
    I have incorporated all the comments in the attached patch. Please review
    and let me know your thoughts.
    
    On Thu, Nov 21, 2019 at 2:51 PM Jeevan Chalke <
    jeevan.chalke@enterprisedb.com> wrote:
    
    >
    >
    > On Wed, Nov 20, 2019 at 11:05 AM Suraj Kharage <
    > suraj.kharage@enterprisedb.com> wrote:
    >
    >> Hi,
    >>
    >> Since now we are generating the backup manifest file with each backup, it
    >> provides us an option to validate the given backup.
    >> Let's say, we have taken a backup and after a few days, we want to check
    >> whether that backup is validated or corruption-free without restarting the
    >> server.
    >>
    >> Please find attached POC patch for same which will be based on the latest
    >> backup manifest patch from Rushabh. With this functionality, we add new
    >> option to pg_basebackup, something like --verify-backup.
    >> So, the syntax would be:
    >> ./bin/pg_basebackup --verify-backup -D <backup_directory_path>
    >>
    >> Basically, we read the backup_manifest file line by line from the given
    >> directory path and build the hash table, then scan the directory and
    >> compare each file with the hash entry.
    >>
    >> Thoughts/suggestions?
    >>
    >
    >
    > I like the idea of verifying the backup once we have backup_manifest with
    > us.
    > Periodically verifying the already taken backup with this simple tool
    > becomes
    > easy now.
    >
    > I have reviewed this patch and here are my comments:
    >
    > 1.
    > @@ -30,7 +30,9 @@
    >  #include "common/file_perm.h"
    >  #include "common/file_utils.h"
    >  #include "common/logging.h"
    > +#include "common/sha2.h"
    >  #include "common/string.h"
    > +#include "fe_utils/simple_list.h"
    >  #include "fe_utils/recovery_gen.h"
    >  #include "fe_utils/string_utils.h"
    >  #include "getopt_long.h"
    > @@ -38,12 +40,19 @@
    >  #include "pgtar.h"
    >  #include "pgtime.h"
    >  #include "pqexpbuffer.h"
    > +#include "pgrhash.h"
    >  #include "receivelog.h"
    >  #include "replication/basebackup.h"
    >  #include "streamutil.h"
    >
    > Please add new files in order.
    >
    > 2.
    > Can hash related file names be renamed to backuphash.c and backuphash.h?
    >
    > 3.
    > Need indentation adjustments at various places.
    >
    > 4.
    > +            char        buf[1000000];  // 1MB chunk
    >
    > It will be good if we have multiple of block /page size (or at-least power
    > of 2
    > number).
    >
    > 5.
    > +typedef struct pgrhash_entry
    > +{
    > +    struct pgrhash_entry *next; /* link to next entry in same bucket */
    > +    DataDirectoryFileInfo *record;
    > +} pgrhash_entry;
    > +
    > +struct pgrhash
    > +{
    > +    unsigned    nbuckets;        /* number of buckets */
    > +    pgrhash_entry **bucket;        /* pointer to hash entries */
    > +};
    > +
    > +typedef struct pgrhash pgrhash;
    >
    > These two can be moved to .h file instead of redefining over there.
    >
    > 6.
    > +/*
    > + * TODO: this function is not necessary, can be removed.
    > + * Test whether the given row number is match for the supplied keys.
    > + */
    > +static bool
    > +pgrhash_compare(char *bt_filename, char *filename)
    >
    > Yeah, it can be removed by doing strcmp() at the required places rather
    > than
    > doing it in a separate function.
    >
    > 7.
    > mdate is not compared anywhere. I understand that it can't be compared with
    > the file in the backup directory and its entry in the manifest as manifest
    > entry gives mtime from server file whereas the same file in the backup will
    > have different mtime. But adding a few comments there will be good.
    >
    > 8.
    > +    char        mdate[24];
    >
    > should be mtime instead?
    >
    >
    > Thanks
    >
    > --
    > Jeevan Chalke
    > Associate Database Architect & Team Lead, Product Development
    > EnterpriseDB Corporation
    > The Enterprise PostgreSQL Company
    >
    >
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  41. Re: backup manifests

    Tels <nospam-pg-abuse@bloodgate.com> — 2019-11-25T16:24:48Z

    On 2019-11-24 15:38, David Steele wrote:
    > On 11/23/19 4:34 PM, Andrew Dunstan wrote:
    >> 
    >> On 11/23/19 3:13 AM, Tels wrote:
    >>> 
    >>> Without the strong hashes it would be pointless to sign the manifest.
    >>> 
    >> 
    >> I guess I must have missed where we are planning to add a 
    >> cryptographic
    >> signature.
    > 
    > I don't think we were planning to, but the user could do so if they 
    > wished.
    
    That was what I meant.
    
    Best regards,
    
    Tels
    
    
    
    
  42. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-11-25T17:11:53Z

    On Fri, Nov 22, 2019 at 2:29 PM David Steele <david@pgmasters.net> wrote:
    > See:
    > https://www.nist.gov/system/files/documents/2017/04/26/lrdc_systems_part2_032713.pdf
    > Search for "The maximum block size"
    
    Hmm, so it says: "The maximum block size that can be protected by a
    32-bit CRC is 512MB." My problem is that (1) it doesn't back this up
    with a citation or any kind of logical explanation and (2) it's not
    very clear what "protected" means. Tels replies downthread to explain
    that the internal state of the 32-bit CRC calculation is also limited
    to 32 bits, and changes once per bit, so that after processing 512MB =
    2^29 bytes = 2^32 bits of data, you're guaranteed to start repeating
    internal states. Perhaps this is also what the NIST folks had in mind,
    though it's hard to know.
    
    This link provides some more details:
    
    https://community.arm.com/developer/tools-software/tools/f/keil-forum/17467/crc-for-256-byte-data
    
    Not everyone on the thread agrees with everybody else, but it seems
    like there are size limits below which a CRC-n is guaranteed to detect
    all 1-bit and 2-bit errors, and above which this is no longer
    guaranteed. They put the limit *lower* than what NIST supposes, namely
    2^(n-1)-1 bits, which would be 256MB, not 512MB, if I'm doing math
    correctly. However, they also say that above that value, you are still
    likely to detect most errors. Absent an intelligent adversary, the
    chance of a random collision when corruption is present is still about
    1 in 4 billion (2^-32).
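
    The two size limits above are easy to recompute. A minimal sketch,
    assuming n = 32 and reading "MB" as 2^20 bytes; the variable names are
    illustrative:

```python
# Recompute the two block-size limits discussed for a 32-bit CRC.
N = 32
MIB = 2**20

# Limit from the state-repetition argument: 2^n bits of input.
state_limit_bytes = 2**N // 8

# Limit cited from the linked forum thread: 2^(n-1) - 1 bits.
forum_limit_bits = 2**(N - 1) - 1
forum_limit_bytes = forum_limit_bits // 8

print(state_limit_bytes / MIB)   # 512.0
print(forum_limit_bytes / MIB)   # just under 256

# Chance that a random corruption leaves a 32-bit checksum unchanged.
print(2**-N)
```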
    
    To me, guaranteed detection of 1-bit and 2-bit errors (and the other
    kinds of specific things CRC is designed to catch) doesn't seem like a
    principal design consideration. It's nice if we can get it and I'm not
    against it, but these are algorithms that are designed to be used when
    data undergoes a digital-to-analog-to-digital conversion, where for
    example it's possible that the conversion back to digital loses
    sync and reads 9 bits or 7 bits rather than 8 bits. And that's not
    really what we're doing here: we all know that bits get flipped
    sometimes, but nobody uses scp to copy a 1GB file and ends up with a
    file that is 1GB +/- a few bits. Some lower-level part of the
    communication stack is handling that part of the work; you're going to
    get exactly 1GB. So it seems to me that here, as with XLOG, we're not
    relying on the specific CRC properties that were intended to be used
    to catch and in some cases repair bit flips caused by wrinkles in an
    A-to-D conversion, but just on its general tendency to probably not
    match if any bits got flipped. And those properties hold regardless of
    input length.
    
    That being said, having done some reading on this, I am a little
    concerned that we're getting further and further from the design
    center of the CRC algorithm. Like relation segment files, XLOG records
    are not packets subject to bit insertions, but at least they're small,
    and relation files are not. Using a 40-year-old algorithm that was
    intended to be used for things like making sure the modem hadn't lost
    framing in the last second to verify 1GB files feels, in some nebulous
    way, like we might be stretching. That being said, I'm not sure what
    we think the reasonable alternatives are. Users aren't going to be
    better off if we say that, because CRC-32C might not do a great job
    detecting errors, we're not going to check for errors at all. If we go
    the other way and say we're going to use some variant of SHA, they
    will be better off, but at the price of what looks like a
    *significant* hit in terms of backup time.
    
    > > "Typically an n-bit CRC applied to a data block of arbitrary length
    > > will detect any single error burst not longer than n bits, and the
    > > fraction of all longer error bursts that it will detect is (1 −
    > > 2^−n)."
    >
    > I'm not sure how encouraging I find this -- a four-byte error is not a lot
    > and 2^32 is only 4 billion.  We have individual users who have backed up
    > more than 4 billion files over the last few years.
    
    I agree that people have a lot more than 4 billion files backed up,
    but I'm not sure it matters very much given the use case I'm trying to
    enable. There's a lot of difference between delta restore and backup
    integrity checking. For backup integrity checking, my goal is that, on
    those occasions when a file gets corrupted, the chances that we notice
    that it has been corrupted are high. For that purpose, a 32-bit checksum is
    probably sufficient. If a file gets corrupted, we have about a
    1-in-4-billion chance of being unable to detect it. If 4 billion files
    get corrupted, we'll miss, on average, one of those corruption events.
    That's sad, but so is the fact that you had *4 billion corrupted
    files*. This is not the total number of files backed up; this is the
    number of those that got corrupted. I don't really know how common it
    is to copy a file and end up with a corrupt copy, but if you say it's
    one-in-a-million, which I suspect is far too high, then you'd have to
    back up something like 4 quadrillion files before you missed a
    corruption event, and that's a *very* big number.
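
    The arithmetic behind that estimate, as a small sketch. The
    one-in-a-million corruption rate is the assumption from the paragraph
    above, not a measured figure, and the variable names are illustrative:

```python
# Expected number of files backed up before one corruption slips past
# a 32-bit checksum, under the assumptions stated in the text.
miss_probability = 2**-32   # checksum still matches, given a corrupted file
corruption_rate = 1e-6      # assumed: one copy in a million is corrupted

corrupted_before_miss = 1 / miss_probability
files_before_miss = corrupted_before_miss / corruption_rate

print(f"{corrupted_before_miss:.3e} corrupted files per missed event")
print(f"{files_before_miss:.3e} files backed up per missed event")
```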
    
    Now delta restore is a whole different kettle of fish. The birthday
    problem is huge here. If you've got a 32-bit checksum for file A, and
    you go and look it up in a database of checksums, and that database
    has even 1 billion things in it, you've got a pretty decent shot of
    latching onto a file that is not actually the same as file A. The
    problem goes away almost entirely if you only compare against previous
    versions of that file from that database cluster. You've probably only
    got tens or maybe at the very outside hundreds or thousands of backups
    of that particular file, and a collision is unlikely even with only a
    32-bit checksum -- though even there maybe you'd like to use something
    larger just to be on the safe side. But if you're going to compare to
    other files from the same cluster, or even worse any file from any
    cluster, 32 bits is *woefully* inadequate. TBH even using SHA for such
    use cases feels a little scary to me. It's probably good enough --
    2^160 for SHA-1 is a *lot* bigger than 2^32, and 2^512 for SHA-512 is
    enormous. But I'd want to spend time thinking very carefully about the
    math before designing such a system.
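
    To put rough numbers on the lookup problem: the sketch below estimates
    the chance that at least one unrelated entry in a checksum database
    matches a given checksum. It assumes checksum values are uniformly
    distributed and independent, which is an idealization; the function
    name is illustrative.

```python
import math

def false_match_probability(entries: int, bits: int) -> float:
    """Chance that at least one of `entries` unrelated checksums equals a
    given `bits`-bit checksum, assuming uniformly distributed values."""
    # 1 - (1 - 2^-bits)^entries, computed stably for tiny probabilities.
    return -math.expm1(entries * math.log1p(-(2.0 ** -bits)))

for entries in (1_000, 1_000_000_000):
    for bits in (32, 160, 512):
        p = false_match_probability(entries, bits)
        print(f"{entries:>13,} entries, {bits:3d}-bit: {p:.3e}")
```

    With a billion entries and 32 bits the probability is on the order of
    one in five, while a thousand prior versions of the same file keep it
    well below one in a million.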
    
    > OK, I'll buy that.  But I *don't* think CRCs should be allowed for
    > deltas (when we have them) and I *do* think we should caveat their
    > effectiveness (assuming we can agree on them).
    
    Sounds good.
    
    > In general the answer to faster backups should be more cores/faster
    > network/faster disk, not compromising backup integrity.  I understand
    > we'll need to wait until we have parallelism in pg_basebackup to justify
    > that answer.
    
    I would like to dispute that characterization of what we're talking
    about here. If we added a 1-bit checksum (parity bit) it would be
    *strictly better* than what we're doing right now, which is nothing.
    That's not a serious proposal because it's obvious we can do a lot
    better for trivial additional cost, but deciding that we're going to
    use a weaker kind of checksum to avoid adding too much overhead is not
    wimping out, because it's still going to be strong enough to catch the
    overwhelming majority of problems that go undetected today. Even an
    *8-bit* checksum would give us a >99% chance of catching a corrupted
    file, which would be noticeably better than the 0% chance we have
    today. Even a manifest with no checksums at all that just checked the
    presence and size of files would catch tons of operator error, e.g.
    
    - wait, that database had tablespaces?
    - were those logs in pg_clog anything important?
    - oh, i wasn't supposed to start postgres on the copy of the database
    stored in the backup directory?
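
    A check of presence and size alone, with no checksums, might look
    roughly like the sketch below. The `expected` mapping and the function
    name are hypothetical and not taken from any patch in this thread.

```python
import os
import tempfile

def check_presence_and_size(backup_dir: str, expected: dict) -> list:
    """Compare files under backup_dir against {relative_path: size}.
    Returns a list of human-readable problems (empty means none found)."""
    problems = []
    seen = set()
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, backup_dir).replace(os.sep, "/")
            seen.add(rel)
            if rel not in expected:
                problems.append(f"unexpected file: {rel}")
            elif os.path.getsize(full) != expected[rel]:
                problems.append(f"size mismatch: {rel}")
    for rel in sorted(set(expected) - seen):
        problems.append(f"missing file: {rel}")
    return problems

# Tiny demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "pg_xact"))
    with open(os.path.join(d, "pg_xact", "0000"), "wb") as f:
        f.write(b"\0" * 8192)
    demo_problems = check_presence_and_size(
        d, {"pg_xact/0000": 8192, "backup_label": 226}
    )
    print(demo_problems)
```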
    
    So I don't think we're talking about whether to compromise backup
    integrity. I think we're talking about - if we're going to make backup
    integrity better than it is today, how much better should we try to
    make it, and what are the trade-offs there? The straw man here is that
    we could make the database infinitely secure if we put it in a
    concrete bunker and sunk it to the bottom of the ocean, with the small
    price that we'd no longer be able to access it either. Somewhere
    between that extreme and the other extreme of setting the
    authentication method to 0.0.0.0/0 trust there's a happy medium where
    security is tolerably good but ease of access isn't crippled, and the
    same thing applies here. We could (probably) be the first database on
    the planet to store a 1024-bit encrypted checksum of every 8kB block,
    but that seems like it's going too far in the "concrete bunker"
    direction. IMHO, at least, we should be aiming for something that has
    a high probability of catching real problems and a low probability of
    being super-annoying.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  43. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-11-25T17:21:56Z

    On Fri, Nov 22, 2019 at 2:02 PM David Steele <david@pgmasters.net> wrote:
    > > Except - and this gets back to the previous point - I don't want to
    > > slow down backups by 40% by default. I wouldn't mind slowing them down
    > > 3% by default, but 40% is too much overhead. I think we've gotta
    > > either get the overhead of using SHA way down or not use SHA by default.
    >
    > Maybe -- my take is that the measurements, an uncompressed backup to the
    > local filesystem, are not a very realistic use case.
    
    Well, compression is a feature we don't have yet, in core. So for
    people who are only using core tools, an uncompressed backup is a very
    realistic use case, because it's the only kind they can get. Granted
    the situation is different if you are using pgbackrest.
    
    I don't have enough experience to know how often people back up to
    local filesystems vs. remote filesystems mounted locally vs. overtly
    over-the-network. I sometimes get the impression that users choose
    their backup tools and procedures with, as Tom would say, the aid of a
    dart board, but that's probably the cynic in me talking. Or maybe a
    reflection of the fact that I usually end up talking to the users for
    whom things have gone really, really badly wrong, rather than the ones
    for whom things went as planned.
    
    > However, I'm still fine with leaving the user the option of checksums or
    > no.  I just wanted to point out that CRCs have their limits so maybe
    > that's not a great option unless it is properly caveated and perhaps not
    > the default.
    
    I think the default is the sticking point here. To me, it looks like
    CRC is a better default than nothing at all because it should still
    catch a high percentage of issues that would otherwise be missed, and
    a better default than SHA because it's so cheap to compute. However,
    I'm certainly willing to consider other theories.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  44. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-11-25T17:43:18Z

    On Fri, Nov 22, 2019 at 5:15 PM Tels <nospam-pg-abuse@bloodgate.com> wrote:
    > It is related to the number of states...
    
    Thanks for this explanation. See my reply to David where I also
    discuss this point.
    
    > However, if you choose a hash, please do not go below SHA-256. Both MD5
    > and SHA-1 have already had collision attacks, and these are only bound
    > to get worse.
    >
    >    https://www.mscs.dal.ca/~selinger/md5collision/
    >    https://shattered.io/
    
    Yikes, that second link, about SHA-1, is depressing. Now, it's not
    likely that an attacker has access to your backup repository and can
    spend 6500 years of CPU time to engineer a Trojan file there (maybe
    more, because the files are probably bigger than the PDFs they used in
    that case) and then induce you to restore and rely upon that backup.
    However, it's entirely likely that somebody is going to eventually ban
    SHA-1 as the attacks get better, which is going to be a problem for us
    whether the underlying exposures are problems or not.
    
    > It might even be a wise idea to encode the used Hash-Algorithm into the
    > manifest file, so it can be changed later. The hash length might be not
    > enough to decide which algorithm is the one used.
    
    I agree. Let's write
    SHA256:bc1c3a57369acd0d2183a927fb2e07acbbb1c97f317bbc3b39d93ec65b754af5
    or similar rather than just the hash. That way even if the entire SHA
    family gets cracked, we can easily substitute in something else that
    hasn't been cracked yet.
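
    A validator could then dispatch on the prefix. This is a minimal
    sketch, assuming the algorithm name maps directly onto a name that
    Python's hashlib knows; it does not handle CRC, and the function name
    is illustrative.

```python
import hashlib

def verify_tagged_checksum(data: bytes, tagged: str) -> bool:
    """Check data against a string of the form 'ALGORITHM:hexdigest'."""
    algorithm, _, hexdigest = tagged.partition(":")
    # hashlib.new() raises ValueError for names it does not know.
    digest = hashlib.new(algorithm.lower(), data).hexdigest()
    return digest == hexdigest.lower()

payload = b"example file contents"
tag = "SHA256:" + hashlib.sha256(payload).hexdigest()

print(verify_tagged_checksum(payload, tag))
print(verify_tagged_checksum(payload + b"!", tag))
```

    Swapping in a different algorithm later only requires writing a
    different prefix; old manifests stay readable.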
    
    (It is unclear to me why anyone supposes that *any* popular hash
    function won't eventually be cracked. For a K-bit hash function, there
    are 2^K possible outputs, where K is probably in the hundreds. But
    there are 2^{2^33} possible 1GB files. So for every possible output
    value, there are 2^{2^33-K} inputs that produce that value, which is a
    very very big number. The probability that any given input produces a
    certain output is very low, but the number of possible inputs that
    produce a given output is very high; so assuming that nobody's ever
    going to figure out how to construct them seems optimistic.)
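
    The counting argument can be checked by working with exponents only,
    since the numbers themselves are far too large to write out. The
    sketch assumes K = 256 as an example width and treats the count per
    output as an average over all outputs.

```python
# Inputs per output for a K-bit hash over all 1GB files, as powers of two.
K = 256                        # assumed hash width, e.g. SHA-256
bits_per_file = 8 * 2**30      # 1GB = 2^30 bytes = 2^33 bits

log2_possible_files = bits_per_file               # 2^(2^33) distinct files
log2_inputs_per_output = log2_possible_files - K  # average, by pigeonhole

print(log2_possible_files == 2**33)
print(log2_inputs_per_output)
```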
    
    > To get a feeling one can use:
    >
    >     openssl speed md5 sha1 sha256 sha512
    >
    > On my really-not-fast desktop CPU (i5-4690T CPU @ 2.50GHz) it says:
    >
    >   The 'numbers' are in 1000s of bytes per second processed.
    >    type       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    >    md5       122638.55k   277023.96k   487725.57k   630806.19k   683892.74k   688553.98k
    >    sha1      127226.45k   313891.52k   632510.55k   865753.43k   960995.33k   977215.19k
    >    sha256     77611.02k   173368.15k   325460.99k   412633.43k   447022.92k   448020.48k
    >    sha512     51164.77k   205189.87k   361345.79k   543883.26k   638372.52k   645933.74k
    >
    > Or in other words, it can hash nearly 931 MByte/s with SHA-1 and about
    > 427 MByte/s with SHA-256 (if I haven't miscalculated something). You'd
    > need a pretty fast disk (aka M.2 SSD) and network (aka > 1 Gbit) to top
    > these speeds, and then you'd use a real CPU for your server, not some
    > poor Intel powersaving surfing thingy-majingy :)
    
    I mean, how fast it is in theory doesn't matter nearly as much as what
    happens when you benchmark the proposed implementation, and the
    results we have so far don't support the theory that this is so cheap
    as to be negligible.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  45. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-12-04T18:01:37Z

    As per the discussion on the thread, here is a patch which:
    
    a) Makes the checksum for the manifest file optional.
    b) Allows the user to choose a particular algorithm.
    
    Currently, the WIP patch supports the SHA256 and CRC checksum
    algorithms.  The patch also changes the manifest file format to put
    the name of the algorithm used before the checksum, so that it is
    easy for the validator to know which algorithm to use.
    
    Ex:
    ./db/bin/pg_basebackup -D bksha/ --manifest-with-checksums=SHA256
    
    $ cat bksha/backup_manifest  | more
    PostgreSQL-Backup-Manifest-Version 1
    File backup_label 226 2019-12-04 17:46:46 GMT SHA256:7cf53d1b9facca908678ab70d93a9e7460cd35cedf7891de948dcf858f8a281a
    File pg_xact/0000 8192 2019-12-04 17:46:46 GMT SHA256:8d2b6cb1dc1a6e8cee763b52d75e73571fddce06eb573861d44082c7d8c03c26
    
    ./db/bin/pg_basebackup -D bkcrc/ --manifest-with-checksums=CRC
    PostgreSQL-Backup-Manifest-Version 1
    File backup_label 226 2019-12-04 17:58:40 GMT CRC:343138313931333134
    File pg_xact/0000 8192 2019-12-04 17:46:46 GMT CRC:363538343433333133
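    
    To illustrate why the algorithm prefix helps a validator, a checksum
    field in the format shown above could be split roughly as follows.
    This is a minimal stand-alone sketch, not code from the patch;
    split_checksum_field() and its parameters are hypothetical names.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>
    
    /*
     * Split a manifest checksum field of the form "ALGO:hexdigits".
     * On success, copies the algorithm name into algo (NUL-terminated),
     * points *hex at the hex digits inside field, and returns 1.
     * Returns 0 if the field has no prefix, the name does not fit, or the
     * digits are not an even-length string of hexadecimal characters.
     */
    static int
    split_checksum_field(const char *field, char *algo, size_t algo_size,
                         const char **hex)
    {
        const char *colon;
        size_t      name_len;
        size_t      hex_len;
    
        assert(field != NULL && algo != NULL && hex != NULL);
    
        colon = strchr(field, ':');
        if (colon == NULL || colon == field)
            return 0;               /* no algorithm prefix */
    
        name_len = (size_t) (colon - field);
        if (name_len >= algo_size)
            return 0;               /* name does not fit in caller's buffer */
    
        hex_len = strlen(colon + 1);
        if (hex_len == 0 || hex_len % 2 != 0)
            return 0;               /* need whole bytes */
        if (strspn(colon + 1, "0123456789abcdefABCDEF") != hex_len)
            return 0;               /* non-hex character present */
    
        memcpy(algo, field, name_len);
        algo[name_len] = '\0';
        *hex = colon + 1;
        return 1;
    }
    ```
    
    A validator could then pick the checksum routine based on the
    returned name, and reject entries whose prefix it does not recognize.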
    
    Pending TODOs:
    - Documentation update
    - Code cleanup
    - Testing.
    
    I will continue to work on the patch; meanwhile, feel free to
    provide thoughts/inputs.
    
    Thanks,
    
    
    
    
    -- 
    Rushabh Lathia
    
  46. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-12-04T18:47:07Z

    On Wed, Dec 4, 2019 at 1:01 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
    > As per the  discussion on the thread, here is the patch which
    >
    > a) Make checksum for manifest file optional.
    > b) Allow user to choose a particular algorithm.
    >
    > Currently with the WIP patch SHA256 and CRC checksum algorithm
    > supported.  Patch also changed the manifest file format to append
    > the used algorithm name before the checksum, this way it will be
    > easy to validator to know which algorithm to used.
    >
    > Ex:
    > ./db/bin/pg_basebackup -D bksha/ --manifest-with-checksums=SHA256
    >
    > $ cat bksha/backup_manifest  | more
    > PostgreSQL-Backup-Manifest-Version 1
    > File backup_label 226 2019-12-04 17:46:46 GMT SHA256:7cf53d1b9facca908678ab70d93a9e7460cd35cedf7891de948dcf858f8a281a
    > File pg_xact/0000 8192 2019-12-04 17:46:46 GMT SHA256:8d2b6cb1dc1a6e8cee763b52d75e73571fddce06eb573861d44082c7d8c03c26
    >
    > ./db/bin/pg_basebackup -D bkcrc/ --manifest-with-checksums=CRC
    > PostgreSQL-Backup-Manifest-Version 1
    > File backup_label 226 2019-12-04 17:58:40 GMT CRC:343138313931333134
    > File pg_xact/0000 8192 2019-12-04 17:46:46 GMT CRC:363538343433333133
    >
    > Pending TODOs:
    > - Documentation update
    > - Code cleanup
    > - Testing.
    >
    > I will further continue to work on the patch and meanwhile feel free to provide
    > thoughts/inputs.
    
    + initilize_manifest_checksum(&cCtx);
    
    Spelling.
    
    -
    
    Spurious.
    
    + case MC_CRC:
    + INIT_CRC32C(cCtx->crc_ctx);
    
    Suggest that we do CRC -> CRC32C throughout the patch. Someone might
    conceivably want some other CRC variant, most likely 64-bit, in the
    future.
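    
    For reference, CRC-32C is the Castagnoli variant of CRC-32. The
    bitwise sketch below is only meant to show what the name refers to;
    the patch itself uses the INIT_CRC32C macro seen above, and
    crc32c_update() and crc32c() are hypothetical stand-alone helpers,
    not PostgreSQL functions.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    
    /* Bit-reflected form of the Castagnoli polynomial 0x1EDC6F41. */
    #define CRC32C_POLY_REFLECTED 0x82F63B78u
    
    /* Fold len bytes from data into a running (not yet finalized) CRC. */
    static uint32_t
    crc32c_update(uint32_t crc, const void *data, size_t len)
    {
        const unsigned char *p = data;
    
        assert(data != NULL || len == 0);
    
        while (len-- > 0)
        {
            crc ^= *p++;
            for (int bit = 0; bit < 8; bit++)
            {
                /* XOR in the polynomial only when the low bit is set. */
                crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
            }
        }
        return crc;
    }
    
    /* One-shot CRC-32C: initialize to all ones, update, then invert. */
    static uint32_t
    crc32c(const void *data, size_t len)
    {
        return crc32c_update(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
    }
    ```
    
    Real implementations use lookup tables or CPU instructions instead of
    the per-bit loop; the loop is used here only because it is short.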
    
    +final_manifest_checksum(ChecksumCtx *cCtx, char *checksumbuf)
    
    finalize
    
      printf(_("      --manifest-with-checksums\n"
    - "                         do calculate checksums for manifest files\n"));
    + "                         calculate checksums for manifest files
    using provided algorithm\n"));
    
    Switch name is wrong. Suggest --manifest-checksums.
    Help usually shows that an argument is expected, e.g.
    --manifest-checksums=ALGORITHM or
    --manifest-checksums=sha256|crc32c|none
    
    This seems to apply over some earlier version of the patch.  A
    consolidated patch, or the whole stack, would be better.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  47. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-12-05T16:22:21Z

    On Thu, Dec 5, 2019 at 12:17 AM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Wed, Dec 4, 2019 at 1:01 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    > wrote:
    > > As per the  discussion on the thread, here is the patch which
    > >
    > > a) Make checksum for manifest file optional.
    > > b) Allow user to choose a particular algorithm.
    > >
    > > Currently with the WIP patch SHA256 and CRC checksum algorithm
    > > supported.  Patch also changed the manifest file format to append
    > > the used algorithm name before the checksum, this way it will be
    > > easy to validator to know which algorithm to used.
    > >
    > > Ex:
    > > ./db/bin/pg_basebackup -D bksha/ --manifest-with-checksums=SHA256
    > >
    > > $ cat bksha/backup_manifest  | more
    > > PostgreSQL-Backup-Manifest-Version 1
    > > File backup_label 226 2019-12-04 17:46:46 GMT SHA256:7cf53d1b9facca908678ab70d93a9e7460cd35cedf7891de948dcf858f8a281a
    > > File pg_xact/0000 8192 2019-12-04 17:46:46 GMT SHA256:8d2b6cb1dc1a6e8cee763b52d75e73571fddce06eb573861d44082c7d8c03c26
    > >
    > > ./db/bin/pg_basebackup -D bkcrc/ --manifest-with-checksums=CRC
    > > PostgreSQL-Backup-Manifest-Version 1
    > > File backup_label 226 2019-12-04 17:58:40 GMT CRC:343138313931333134
    > > File pg_xact/0000 8192 2019-12-04 17:46:46 GMT CRC:363538343433333133
    > >
    > > Pending TODOs:
    > > - Documentation update
    > > - Code cleanup
    > > - Testing.
    > >
    > > I will further continue to work on the patch and meanwhile feel free to provide
    > > thoughts/inputs.
    >
    > + initilize_manifest_checksum(&cCtx);
    >
    > Spelling.
    >
    >
    Fixed.
    
    > -
    >
    > Spurious.
    >
    > + case MC_CRC:
    > + INIT_CRC32C(cCtx->crc_ctx);
    >
    > Suggest that we do CRC -> CRC32C throughout the patch. Someone might
    > conceivably want some other CRC variant, mostly likely 64-bit, in the
    > future.
    >
    >
    Makes sense, done.
    
    > +final_manifest_checksum(ChecksumCtx *cCtx, char *checksumbuf)
    >
    > finalize
    >
    >
    Done.
    
    >   printf(_("      --manifest-with-checksums\n"
    > - "                         do calculate checksums for manifest files\n"));
    > + "                         calculate checksums for manifest files
    > using provided algorithm\n"));
    >
    > Switch name is wrong. Suggest --manifest-checksums.
    > Help usually shows that an argument is expected, e.g.
    > --manifest-checksums=ALGORITHM or
    > --manifest-checksums=sha256|crc32c|none
    >
    >
    Fixed.
    
    > This seems to apply over some earlier version of the patch.  A
    > consolidated patch, or the whole stack, would be better.
    >
    
    Here is the whole stack of patches.
    
    
    Thanks,
    Rushabh Lathia
    www.EnterpriseDB.com
    
  48. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-12-05T18:46:08Z

    On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
    > Here is the whole stack of patches.
    
    Please include proper attribution and, where somebody's written them,
    commit messages in each patch in the stack. For example, I see that
    your 0001 is mostly the same as my 0001 from upthread, but now it
    says:
    
    
  49. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-12-05T20:14:34Z

    On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
    > Here is the whole stack of patches.
    
    I committed 0001, as that's just refactoring and I think (hope) it's
    uncontroversial. I think 0002-0005 need to be squashed together
    (crediting all authors properly and in the appropriate order) as it's
    quite hard to understand right now, and that Suraj's patch to validate
    the backup should be included in the patch stack. It needs
    documentation. Also, we need, either in that patch or a separate one, TAP
    tests that exercise this feature. Things we should try to check:
    
    - Plain format backups can be verified against the manifest.
    - Tar format backups can be verified against the manifest after
    untarring (this might be a problem; not sure there's any guarantee
    that we have a working "tar" command available).
    - Verification succeeds for all available checksum algorithms and
    also for no checksum algorithm (should still check which files are
    present, and sizes).
    - If we tamper with a backup by removing a file, adding a file, or
    changing the size of a file, the modification is detected even without
    checksums.
    - If we tamper with a backup by changing the contents of a file but
    not the size, the modification is detected if checksums are used.
    - Everything above still works if there is a user-defined tablespace
    that contains a table.
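    
    The checks in the list above could be modeled per file roughly as
    follows. This is a sketch under stated assumptions, not the
    validation patch: the file_check struct, verify_result enum, and
    verify_file() are hypothetical names, and the checksums are assumed
    to have already been rendered as comparable strings.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    
    typedef enum
    {
        VERIFY_OK,
        VERIFY_MISSING,             /* listed in manifest, absent from backup */
        VERIFY_EXTRA,               /* present in backup, not in manifest */
        VERIFY_SIZE_MISMATCH,
        VERIFY_CHECKSUM_MISMATCH
    } verify_result;
    
    typedef struct
    {
        bool        in_manifest;
        bool        on_disk;
        uint64_t    manifest_size;
        uint64_t    disk_size;
        const char *manifest_checksum;  /* NULL if manifest has no checksum */
        const char *disk_checksum;      /* recomputed from the file, or NULL */
    } file_check;
    
    /*
     * Presence and size are checked first, so removed, added, and resized
     * files are caught even when no checksum algorithm was used.
     */
    static verify_result
    verify_file(const file_check *fc)
    {
        assert(fc != NULL);
        assert(fc->in_manifest || fc->on_disk);
    
        if (!fc->on_disk)
            return VERIFY_MISSING;
        if (!fc->in_manifest)
            return VERIFY_EXTRA;
        if (fc->manifest_size != fc->disk_size)
            return VERIFY_SIZE_MISMATCH;
    
        /* Same-size content changes are only detectable with a checksum. */
        if (fc->manifest_checksum != NULL &&
            (fc->disk_checksum == NULL ||
             strcmp(fc->manifest_checksum, fc->disk_checksum) != 0))
            return VERIFY_CHECKSUM_MISMATCH;
    
        return VERIFY_OK;
    }
    ```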
    
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  50. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-12-06T06:35:19Z

    On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com>
    > wrote:
    > > Here is the whole stack of patches.
    >
    > I committed 0001, as that's just refactoring and I think (hope) it's
    > uncontroversial. I think 0002-0005 need to be squashed together
    > (crediting all authors properly and in the appropriate order) as it's
    > quite hard to understand right now,
    
    
    Please find attached a single patch; I have tried to credit all
    the authors.
    
    There is one review comment from Jeevan Chalke that is still
    pending:
    
    4.
    > Why we need a "File" at the start of each entry as we are adding files
    > only?
    > I wonder if we also need to provide a tablespace name and directory marker
    > so
    > that we have "Tablespace" and "Dir" at the start.
    >
    
    Sorry, I am not quite sure about this; maybe Robert is the right
    person to answer it.
    
    > and that Suraj's patch to validate
    > the backup should be included in the patch stack. It needs
    > documentation. Also, we need, either in that patch or a separate, TAP
    > tests that exercise this feature. Things we should try to check:
    >
    > - Plain format backups can be verified against the manifest.
    > - Tar format backups can be verified against the manifest after
    > untarring (this might be a problem; not sure there's any guarantee
    > that we have a working "tar" command available).
    > - Verification succeeds for all available checksums algorithms and
    > also for no checksum algorithm (should still check which files are
    > present, and sizes).
    > - If we tamper with a backup by removing a file, adding a file, or
    > changing the size of a file, the modification is detected even without
    > checksums.
    > - If we tamper with a backup by changing the contents of a file but
    > not the size, the modification is detected if checksums are used.
    > - Everything above still works if there is user-defined tablespace
    > that contains a table.
    >
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    
    
    Thanks.
    Rushabh Lathia
    www.EnterpriseDB.com
    
  51. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-12-06T13:33:39Z

    On Fri, Dec 6, 2019 at 1:35 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
    > There is one review comment from Jeevan Chalke, which still pending
    > to address is:
    >
    >> 4.
    >> Why we need a "File" at the start of each entry as we are adding files only?
    >> I wonder if we also need to provide a tablespace name and directory marker so
    >> that we have "Tablespace" and "Dir" at the start.
    >
    > Sorry, I am not quite sure about this, may be Robert is right person
    > to answer this.
    
    I did it that way for extensibility. Notice that the first and last
    line of the manifest begin with other words, so someone parsing the
    manifest can identify the line type by looking just at the first word.
    Someone might in the future find some need to add other kinds of lines
    that don't exist today.
    
    "Tablespace" and "Dir" are, in fact, pretty good examples of things
    that someone might want to add in the future. I don't really see a
    clear need for either one today, although maybe somebody else will,
    but I think we should leave ourselves room to add such things in the
    future.
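    
    As an illustration of identifying the line type from the first word,
    a parser for the format shown earlier in the thread might dispatch
    like this. It is a minimal sketch with hypothetical names
    (manifest_line_type, classify_manifest_line); only the two leading
    words visible in the examples upthread are recognized, and anything
    else is reported as unknown rather than guessed at.
    
    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>
    
    typedef enum
    {
        LINE_VERSION,               /* "PostgreSQL-Backup-Manifest-Version ..." */
        LINE_FILE,                  /* "File ..." */
        LINE_UNKNOWN                /* a line type this parser does not know */
    } manifest_line_type;
    
    /* Return nonzero if the first whitespace-delimited word of line is word. */
    static int
    first_word_is(const char *line, const char *word)
    {
        size_t      len = strcspn(line, " \t\r\n");
    
        return len == strlen(word) && strncmp(line, word, len) == 0;
    }
    
    static manifest_line_type
    classify_manifest_line(const char *line)
    {
        assert(line != NULL);
    
        if (first_word_is(line, "PostgreSQL-Backup-Manifest-Version"))
            return LINE_VERSION;
        if (first_word_is(line, "File"))
            return LINE_FILE;
    
        /* Future words such as "Tablespace" or "Dir" would be added here. */
        return LINE_UNKNOWN;
    }
    ```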
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  52. Re: backup manifests

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-12-09T05:45:23Z

    On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    wrote:
    
    >
    >
    > On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    >> On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com>
    >> wrote:
    >> > Here is the whole stack of patches.
    >>
    >> I committed 0001, as that's just refactoring and I think (hope) it's
    >> uncontroversial. I think 0002-0005 need to be squashed together
    >> (crediting all authors properly and in the appropriate order) as it's
    >> quite hard to understand right now,
    >
    >
    > Please find attached single patch and I tried to add the credit to all
    > the authors.
    >
    
    I had a look over the patch; here are a few review comments:
    
    1.
    +            if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
    +                manifest_checksums = MC_SHA256;
    +            else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") == 0)
    +                manifest_checksums = MC_CRC32C;
    +            else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
    +                manifest_checksums = MC_NONE;
    +            else
    +                ereport(ERROR,
    
    Is NONE a valid input? I think "NONE" is only the default, so there is
    no need to accept it as an input. It would be better to simply error
    out if the input is neither "SHA256" nor "CRC32C".
    
    I believe you have done it this way because pg_basebackup always
    passes a MANIFEST_CHECKSUMS '%s' string, which will contain "NONE" if
    no user input is given. But I think passing that clause conditionally
    would be better, as we do with maxrate_clause, for example.
    
    Well, this is just what I think; feel free to ignore it, as I don't
    see any correctness issue here.
    
    
    2.
    +    if (manifest_checksums != MC_NONE)
    +    {
    +        checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
    +        switch (manifest_checksums)
    +        {
    +            case MC_NONE:
    +                break;
    +        }
    
    Since the switch is within the "if (manifest_checksums != MC_NONE)"
    condition, I don't think we need a case for MC_NONE here. Instead, we
    can use a default case to error out.
    
    
    3.
    +    if (manifest_checksums != MC_NONE)
    +    {
    +        initialize_manifest_checksum(&cCtx);
    +        update_manifest_checksum(&cCtx, content, len);
    +    }
    
    @@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
         int            segmentno = 0;
         char       *segmentpath;
         bool        verify_checksum = false;
    +    ChecksumCtx cCtx;
    +
    +    initialize_manifest_checksum(&cCtx);
    
    
    I see that in a few places you call
    initialize/update_manifest_checksum() conditionally, and in other
    places the call is unconditional. It seems that calling them
    unconditionally will not cause any issues, as the switch cases inside
    them return without doing anything when manifest_checksums is MC_NONE.
    
    
    4.
    The initialize/update/finalize_manifest_checksum() functions may be
    needed by the validation patch as well, so I think they should not
    depend on a global variable. Also, it would be good to keep them in a
    file that is accessible to frontend-only code. You can ignore these
    comments on the grounds that this refactoring can be done by the patch
    adding validation support; I have no objection. Since both patches
    depend on each other and are posted on the same email chain, I thought
    I would mention it.
    
    
    5.
    +        switch (manifest_checksums)
    +        {
    +            case MC_SHA256:
    +                checksumlabel = "SHA256:";
    +                break;
    +            case MC_CRC32C:
    +                checksumlabel = "CRC32C:";
    +                break;
    +            case MC_NONE:
    +                break;
    +        }
    
    This code in AddFileToManifest() is executed for every file for which
    we add an entry. However, the checksumlabel will remain the same
    throughout. Can it be set just once and then used as is?
    
    
    6.
    Can we avoid declaring manifest_checksums as a global variable?
    I think for that, we need to pass it to every function, and thus need
    to change the signatures of various functions. Currently, we pass
    "StringInfo manifest" to all the required functions; would it be
    better to pass a struct variable instead? The struct could have
    members such as "StringInfo manifest", the checksum type
    (manifest_checksums), the checksum label, etc.
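    
    One possible shape for such a struct, with the label set once when
    the option is parsed (as suggested in comment 5), is sketched below.
    This is an assumption-laden illustration, not the patch: the names
    manifest_info, manifest_info_init(), and equals_ignore_case() are
    hypothetical, the StringInfo member is left out so that the sketch
    compiles outside the PostgreSQL tree, and a local helper stands in for
    pg_strcasecmp.
    
    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stdbool.h>
    #include <stddef.h>
    
    typedef enum { MC_NONE, MC_SHA256, MC_CRC32C } manifest_checksum_type;
    
    /* State passed to manifest functions instead of a global variable. */
    typedef struct
    {
        manifest_checksum_type checksum_type;
        const char *checksum_label;     /* set once; NULL for MC_NONE */
    } manifest_info;
    
    /* Case-insensitive string equality (stand-in for pg_strcasecmp). */
    static bool
    equals_ignore_case(const char *a, const char *b)
    {
        while (*a != '\0' && *b != '\0')
        {
            if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
                return false;
            a++;
            b++;
        }
        return *a == *b;            /* both must end together */
    }
    
    /* Parse the algorithm name; returns false for an unrecognized name. */
    static bool
    manifest_info_init(manifest_info *info, const char *algo)
    {
        assert(info != NULL && algo != NULL);
    
        if (equals_ignore_case(algo, "SHA256"))
        {
            info->checksum_type = MC_SHA256;
            info->checksum_label = "SHA256:";
        }
        else if (equals_ignore_case(algo, "CRC32C"))
        {
            info->checksum_type = MC_CRC32C;
            info->checksum_label = "CRC32C:";
        }
        else if (equals_ignore_case(algo, "NONE"))
        {
            info->checksum_type = MC_NONE;
            info->checksum_label = NULL;
        }
        else
            return false;           /* caller reports the error */
    
        return true;
    }
    ```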
    
    
    Thanks
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  53. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-12-09T09:22:34Z

    Thanks, Jeevan, for reviewing the patch and for the offline discussion.
    
    On Mon, Dec 9, 2019 at 11:15 AM Jeevan Chalke <
    jeevan.chalke@enterprisedb.com> wrote:
    
    >
    >
    > On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    > wrote:
    >
    >>
    >>
    >> On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >>
    >>> On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com>
    >>> wrote:
    >>> > Here is the whole stack of patches.
    >>>
    >>> I committed 0001, as that's just refactoring and I think (hope) it's
    >>> uncontroversial. I think 0002-0005 need to be squashed together
    >>> (crediting all authors properly and in the appropriate order) as it's
    >>> quite hard to understand right now,
    >>
    >>
    >> Please find attached single patch and I tried to add the credit to all
    >> the authors.
    >>
    >
    > I had a look over the patch and here are my few review comments:
    >
    > 1.
    > +            if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
    > +                manifest_checksums = MC_SHA256;
    > +            else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") == 0)
    > +                manifest_checksums = MC_CRC32C;
    > +            else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
    > +                manifest_checksums = MC_NONE;
    > +            else
    > +                ereport(ERROR,
    >
    > Is NONE is a valid input? I think the default is "NONE" only and thus no
    > need
    > of this as an input. It will be better if we simply error out if input is
    > neither "SHA256" nor "CRC32C".
    >
    > I believe you have done this way as from pg_basebackup you are always
    > passing
    > MANIFEST_CHECKSUMS '%s' string which will have "NONE" if no user input is
    > given. But I think passing that conditional will be better like we have
    > maxrate_clause for example.
    >
    > Well, this is what I think, feel free to ignore as I don't see any
    > correctness
    > issue over here.
    >
    >
    I would still keep NONE, as it looks cleaner to have it among the
    options accepted for the checksums.
    
    
    > 2.
    > +    if (manifest_checksums != MC_NONE)
    > +    {
    > +        checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
    > +        switch (manifest_checksums)
    > +        {
    > +            case MC_NONE:
    > +                break;
    > +        }
    >
    > Since switch case is within "if (manifest_checksums != MC_NONE)" condition,
    > I don't think we need a case for MC_NONE here. Rather we can use a default
    > case to error out.
    >
    >
    Yeah, with the new patch we don't have this part of code.
    
    
    > 3.
    > +    if (manifest_checksums != MC_NONE)
    > +    {
    > +        initialize_manifest_checksum(&cCtx);
    > +        update_manifest_checksum(&cCtx, content, len);
    > +    }
    >
    > @@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char
    > *tarfilename, struct stat *statbuf
    >      int            segmentno = 0;
    >      char       *segmentpath;
    >      bool        verify_checksum = false;
    > +    ChecksumCtx cCtx;
    > +
    > +    initialize_manifest_checksum(&cCtx);
    >
    >
    > I see that in a few cases you are calling
    > initialize/update_manifest_checksum()
    > conditional and at some other places call is unconditional. It seems like
    > calling unconditional will not have any issues as switch cases inside them
    > return doing nothing when manifest_checksums is MC_NONE.
    >
    >
    Fixed.
    
    
    > 4.
    > initialize/update/finalize_manifest_checksum() functions may be needed by
    > the
    > validation patch as well. And thus I think these functions should not
    > depend
    > on a global variable as such. Also, it will be good if we keep them in a
    > file
    > that is accessible to frontend-only code. Well, you can ignore these
    > comments
    > with the argument saying that this refactoring can be done by the patch
    > adding
    > validation support. I have no issues. Since both the patches are dependent
    > and
    > posted on the same email chain, thought of putting that observation.
    >
    >
    Makes sense; I changed those APIs so that they don't have to
    access the global.
    
    
    > 5.
    > +        switch (manifest_checksums)
    > +        {
    > +            case MC_SHA256:
    > +                checksumlabel = "SHA256:";
    > +                break;
    > +            case MC_CRC32C:
    > +                checksumlabel = "CRC32C:";
    > +                break;
    > +            case MC_NONE:
    > +                break;
    > +        }
    >
    > This code in AddFileToManifest() is executed for every file for which we
    > are
    > adding an entry. However, the checksumlabel will be going to remain the
    > same
    > throughout. Can it be set just once and then used as is?
    >
    >
    Yeah, with the attached patch we no longer have this part of the code.
    
    
    > 6.
    > Can we avoid manifest_checksums from declaring it as a global variable?
    > I think for that, we need to pass that to every function and thus need to
    > change the function signature of various functions. Currently, we pass
    > "StringInfo manifest" to all the required function, will it better to pass
    > the struct variable instead? A struct may have members like,
    > "StringInfo manifest" in it, checksum type (manifest_checksums),
    > checksum label, etc.
    >
    >
    I agree.  Earlier I was not sure about this because it requires
    exposing the data structure.  But that is what I have tried in the
    attached patch: I introduced a new data structure, defined it in
    basebackup.h, and passed it through the functions so that individual
    members don't have to be passed.  I also removed the global
    manifest_checksum and added it to the newly introduced structure.
    
    Attaching the patch, which needs to be applied on top of the earlier 0001 patch.
    
    Thanks,
    
    -- 
    Rushabh Lathia
    www.EnterpriseDB.com
    
  54. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-12-10T09:59:43Z

    On Mon, Dec 9, 2019 at 2:52 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    wrote:
    
    >
    > Thanks Jeevan for reviewing the patch and offline discussion.
    >
    > On Mon, Dec 9, 2019 at 11:15 AM Jeevan Chalke <
    > jeevan.chalke@enterprisedb.com> wrote:
    >
    >>
    >>
    >> On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    >> wrote:
    >>
    >>>
    >>>
    >>> On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com>
    >>> wrote:
    >>>
    >>>> On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <
    >>>> rushabh.lathia@gmail.com> wrote:
    >>>> > Here is the whole stack of patches.
    >>>>
    >>>> I committed 0001, as that's just refactoring and I think (hope) it's
    >>>> uncontroversial. I think 0002-0005 need to be squashed together
    >>>> (crediting all authors properly and in the appropriate order) as it's
    >>>> quite hard to understand right now,
    >>>
    >>>
    >>> Please find attached single patch and I tried to add the credit to all
    >>> the authors.
    >>>
    >>
    >> I had a look over the patch and here are my few review comments:
    >>
    >> 1.
    >> +            if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
    >> +                manifest_checksums = MC_SHA256;
    >> +            else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") ==
    >> 0)
    >> +                manifest_checksums = MC_CRC32C;
    >> +            else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
    >> +                manifest_checksums = MC_NONE;
    >> +            else
    >> +                ereport(ERROR,
    >>
    >> Is NONE is a valid input? I think the default is "NONE" only and thus no
    >> need
    >> of this as an input. It will be better if we simply error out if input is
    >> neither "SHA256" nor "CRC32C".
    >>
    >> I believe you have done this way as from pg_basebackup you are always
    >> passing
    >> MANIFEST_CHECKSUMS '%s' string which will have "NONE" if no user input is
    >> given. But I think passing that conditional will be better like we have
    >> maxrate_clause for example.
    >>
    >> Well, this is what I think, feel free to ignore as I don't see any
    >> correctness
    >> issue over here.
    >>
    >>
    > I would still keep this NONE as it's look more cleaner in the say of
    > given options to the checksums.
    >
    >
    >> 2.
    >> +    if (manifest_checksums != MC_NONE)
    >> +    {
    >> +        checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
    >> +        switch (manifest_checksums)
    >> +        {
    >> +            case MC_NONE:
    >> +                break;
    >> +        }
    >>
    >> Since switch case is within "if (manifest_checksums != MC_NONE)"
    >> condition,
    >> I don't think we need a case for MC_NONE here. Rather we can use a default
    >> case to error out.
    >>
    >>
    > Yeah, with the new patch we don't have this part of code.
    >
    >
    >> 3.
    >> +    if (manifest_checksums != MC_NONE)
    >> +    {
    >> +        initialize_manifest_checksum(&cCtx);
    >> +        update_manifest_checksum(&cCtx, content, len);
    >> +    }
    >>
    >> @@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char
    >> *tarfilename, struct stat *statbuf
    >>      int            segmentno = 0;
    >>      char       *segmentpath;
    >>      bool        verify_checksum = false;
    >> +    ChecksumCtx cCtx;
    >> +
    >> +    initialize_manifest_checksum(&cCtx);
    >>
    >>
    >> I see that in a few cases you are calling
    >> initialize/update_manifest_checksum()
    >> conditional and at some other places call is unconditional. It seems like
    >> calling unconditional will not have any issues as switch cases inside them
    >> return doing nothing when manifest_checksums is MC_NONE.
    >>
    >>
    > Fixed.
    >
    >
    >> 4.
    >> initialize/update/finalize_manifest_checksum() functions may be needed by
    >> the
    >> validation patch as well. And thus I think these functions should not
    >> depend
    >> on a global variable as such. Also, it will be good if we keep them in a
    >> file
    >> that is accessible to frontend-only code. Well, you can ignore these
    >> comments
    >> with the argument saying that this refactoring can be done by the patch
    >> adding
    >> validation support. I have no issues. Since both the patches are
    >> dependent and
    >> posted on the same email chain, thought of putting that observation.
    >>
    >>
    > Makes sense. I just changed those APIs so that they don't have to
    > access the global.
    >
    >
    >> 5.
    >> +        switch (manifest_checksums)
    >> +        {
    >> +            case MC_SHA256:
    >> +                checksumlabel = "SHA256:";
    >> +                break;
    >> +            case MC_CRC32C:
    >> +                checksumlabel = "CRC32C:";
    >> +                break;
    >> +            case MC_NONE:
    >> +                break;
    >> +        }
    >>
    >> This code in AddFileToManifest() is executed for every file for which we
    >> are
    >> adding an entry. However, the checksumlabel will be going to remain the
    >> same
    >> throughout. Can it be set just once and then used as is?
    >>
    >>
    > Yeah, with the attached patch we no longer have this part of the code.
    >
    >
    >> 6.
    >> Can we avoid manifest_checksums from declaring it as a global variable?
    >> I think for that, we need to pass that to every function and thus need to
    >> change the function signature of various functions. Currently, we pass
    >> "StringInfo manifest" to all the required function, will it better to pass
    >> the struct variable instead? A struct may have members like,
    >> "StringInfo manifest" in it, checksum type (manifest_checksums),
    >> checksum label, etc.
    >>
    >>
    > I agree.  Earlier I was not sure about this because it requires exposing
    > a data structure.  But in the attached patch that is what I tried: I
    > introduced a new data structure, defined it in basebackup.h, and passed
    > it through the functions so that the individual members don't have to
    > be passed separately.  I also removed the global manifest_checksum and
    > added it to the newly introduced structure.
    >
    > Attaching the patch, which needs to be applied on top of the earlier
    > 0001 patch.
    >
    
    Attaching another version of the 0002 patch, as my colleague Jeevan Chalke
    pointed out a few indentation problems in the 0002 patch which I sent
    earlier.  Fixed the same in the latest patch.
    
    
    
    
    > Thanks,
    >
    > --
    > Rushabh Lathia
    > www.EnterpriseDB.com
    >
    
    
    -- 
    Rushabh Lathia
    
  55. Re: backup manifests

    Jeevan Chalke <jeevan.chalke@enterprisedb.com> — 2019-12-10T10:55:50Z

    On Tue, Dec 10, 2019 at 3:29 PM Rushabh Lathia <rushabh.lathia@gmail.com>
    wrote:
    
    >
    > Attaching another version of the 0002 patch, as my colleague Jeevan Chalke
    > pointed out a few indentation problems in the 0002 patch which I sent
    > earlier.  Fixed the same in the latest patch.
    >
    
    I had a look over the new patch and see no issues. Looks good to me.
    Thanks for quickly fixing the review comments posted earlier.
    
    However, here are some minor comments:
    
    1.
    @@ -122,6 +133,7 @@ static long long int total_checksum_failures;
     /* Do not verify checksums. */
     static bool noverify_checksums = false;
    
    +
     /*
      * The contents of these directories are removed or recreated during server
      * start so they are not included in backups.  The directories themselves are
    
    
    Please remove this unnecessary change.
    
    You also need to run the indentation tool on the patch.
    
    Thanks
    -- 
    Jeevan Chalke
    Associate Database Architect & Team Lead, Product Development
    EnterpriseDB Corporation
    The Enterprise PostgreSQL Company
    
  56. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2019-12-10T11:40:35Z

    Hi,
    
    Please find attached the patch for the backup validator implementation
    (0004 patch). This patch is based on Rushabh's latest patch for the
    backup manifest.
    
    Some functions are required on the client side as well, so I have moved
    those functions and some data structures to a common place so that they
    are accessible to both (0003 patch).
    
    My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for
    TAP test cases, which is also attached. As of now, test cases related to
    tablespaces and the tar backup format are missing; we will continue to
    work on those and submit the complete patch.
    
    With this mail, I have attached the complete patch stack for the backup
    manifest and backup validation implementation.
    
    Please let me know your thoughts on the same.
    
    On Fri, Dec 6, 2019 at 1:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <rushabh.lathia@gmail.com>
    > wrote:
    > > Here is the whole stack of patches.
    >
    > I committed 0001, as that's just refactoring and I think (hope) it's
    > uncontroversial. I think 0002-0005 need to be squashed together
    > (crediting all authors properly and in the appropriate order) as it's
    > quite hard to understand right now, and that Suraj's patch to validate
    > the backup should be included in the patch stack. It needs
    > documentation. Also, we need, either in that patch or a separate, TAP
    > tests that exercise this feature. Things we should try to check:
    >
    > - Plain format backups can be verified against the manifest.
    > - Tar format backups can be verified against the manifest after
    > untarring (this might be a problem; not sure there's any guarantee
    > that we have a working "tar" command available).
    > - Verification succeeds for all available checksum algorithms and
    > also for no checksum algorithm (should still check which files are
    > present, and sizes).
    > - If we tamper with a backup by removing a file, adding a file, or
    > changing the size of a file, the modification is detected even without
    > checksums.
    > - If we tamper with a backup by changing the contents of a file but
    > not the size, the modification is detected if checksums are used.
    > - Everything above still works if there is a user-defined tablespace
    > that contains a table.
    >
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    >
    >
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  57. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-12-10T19:39:47Z

    On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage
    <suraj.kharage@enterprisedb.com> wrote:
    > Please find attached patch for backup validator implementation (0004 patch). This patch is based
    > on Rushabh's latest patch for backup manifest.
    >
    > There are some functions required at client side as well, so I have moved those functions
    > and some data structure at common place so that they can be accessible for both. (0003 patch).
    >
    > My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for tap test cases which
    > is also attached. As of now, test cases related to the tablespace and tar backup  format are missing,
    > will continue work on same and submit the complete patch.
    >
    > With this mail, I have attached the complete patch stack for backup manifest and backup
    > validate implementation.
    >
    > Please let me know your thoughts on the same.
    
    Well, for the second time on this thread, please don't take a bunch of
    somebody else's code and post it in a patch that doesn't attribute
    that person as one of the authors. For the second time on this thread,
    the person is me, but don't borrow *anyone's* code without proper
    attribution. It's really important!
    
    On a related note, it's a very good idea to use git format-patch and
    git rebase -i to maintain patch stacks like this. Rushabh seems to
    have done that, but the files you're posting look like raw 'git diff'
    output. Notice that this gives him a way to include authorship
    information and a tentative commit message in each patch, but you
    don't have any of that.
    
    Also on a related note, part of the process of adapting existing code
    to a new purpose is adapting the comments. You haven't done that:
    
    + * Search a result-set hash table for a row matching a given filename.
    ...
    + * Insert a row into a result-set hash table, provided no such row is already
    ...
    + * Most of the values
    + * that we're hashing are short integers formatted as text, so there
    + * shouldn't be much room for pathological input.
    
    I think that what we should actually do here is try to use simplehash.
    Right now, it won't work for frontend code, but I posted some patches
    to try to address that issue:
    
    https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com
    
    That would have a few advantages. One, we wouldn't need to know the
    number of elements in advance, because simplehash can grow
    dynamically. Two, we could use the iteration interfaces to walk the
    hash table.  Your solution to that is pgrhash_seq_search, but that's
    actually not well-designed, because it's not a generic iterator
    function but something that knows specifically about the 'touch' flag.
    I incidentally suggest renaming 'touch' to 'matched'; 'touch' is not
    bad, but I think 'matched' will be a little more recognizable.
    
    Please run pgindent. If needed, first add locally defined types to
    typedefs.list, so that things indent properly.
    
    It's not a crazy idea to try to share some data structures and code
    between the frontend and the backend here, but I think
    src/common/backup.c and src/include/common/backup.h are far too
    generic names given what the code is actually doing. It's mostly about
    checksums, not backup, and I think it should be named accordingly. I
    suggest removing "manifestinfo" and renaming the rest to just talk
    about checksums rather than manifests. That would make it logical to
    reuse this for any other future code that needs a configurable
    checksum type. Also, how about adding a function like:
    
    extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo);
    
    ...which would return true and set *algo if name is recognized, and
    return false otherwise. That code could be used on both the client and
    server sides of this patch, and by any future patches that want to
    reuse this scaffolding.
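    
    As an illustration of the helper proposed above, here is a minimal
    sketch in plain C. It assumes the MC_NONE / MC_CRC32C / MC_SHA256 names
    from the patch quoted earlier in this thread. The lookup table, the
    const qualifier on name, and the use of POSIX strcasecmp (PostgreSQL
    code would use pg_strcasecmp) are assumptions made so the example
    stands alone; this is not the committed implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>            /* strcasecmp (POSIX); assumption for this sketch */

/* Hypothetical enum; member names follow the patch quoted in the thread. */
typedef enum ChecksumAlgorithm
{
    MC_NONE,
    MC_CRC32C,
    MC_SHA256
} ChecksumAlgorithm;

/*
 * Return true and set *algo if name is a recognized algorithm name
 * (case-insensitive); return false and leave *algo untouched otherwise.
 */
bool
parse_checksum_algorithm(const char *name, ChecksumAlgorithm *algo)
{
    static const struct
    {
        const char *name;
        ChecksumAlgorithm algo;
    }           known[] = {
        {"NONE", MC_NONE},
        {"CRC32C", MC_CRC32C},
        {"SHA256", MC_SHA256}
    };
    size_t      i;

    for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
    {
        if (strcasecmp(name, known[i].name) == 0)
        {
            *algo = known[i].algo;
            return true;
        }
    }
    return false;
}
```

    A caller on either side would report its own error when the function
    returns false, which keeps the error-reporting style (ereport on the
    server, pg_log_error in frontend code) out of the shared file.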
    
    The file header for backup.h has the wrong filename (string.h). The
    header format looks somewhat atypical compared to what we normally do,
    too.
    
    It's arguable, but I tend to think that it would be better to
    hex-encode the CRC rather than printing it as an integer.  Maybe
    hex_encode() is another thing that could be moved into the new
    src/common file.
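    
    To make the hex-encoding suggestion concrete, here is a small sketch.
    The function name sketch_hex_encode and its signature are hypothetical
    and are not the existing hex_encode() mentioned above; the byte order
    in which a CRC would be serialized is likewise left as an assumption
    of the caller.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write 2 * len lowercase hex digits followed by a terminating NUL into
 * dst.  The caller must provide at least 2 * len + 1 bytes.
 */
void
sketch_hex_encode(const uint8_t *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";
    size_t      i;

    for (i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];       /* high nibble */
        dst[2 * i + 1] = digits[src[i] & 0x0F]; /* low nibble */
    }
    dst[2 * len] = '\0';
}
```

    With this shape a 4-byte CRC and a 32-byte SHA-256 digest go through
    the same code path and differ only in output length.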
    
    As I said before about Rushabh's patch set, it's very confusing that
    we have so many patches here stacked up. Like, you have 0002 moving
    stuff, and then 0003 moving it again. That's super-confusing. Please
    try to structure the patch set so as to make it as easy to review as
    possible.
    
    Regarding the test case patch, error checks are important! Don't do
    things like this:
    
    +open my $modify_file_sha256, '>>', "$tempdir/backup_verify/postgresql.conf";
    +print $modify_file_sha256 "port = 5555\n";
    +close $modify_file_sha256;
    
    If the open fails, then it and the print and the close are going to
    silently do nothing. That's bad. I don't know exactly what the
    customary error-checking is for things like this in TAP tests, but I
    hope it's not like this, because this has a pretty fair chance of
    looking like it's testing something that it isn't. Let's figure out
    what the best practice in this area is and adhere to it.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  58. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2019-12-12T12:32:49Z

    Thanks, Robert, for the review.
    
    On Wed, Dec 11, 2019 at 1:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage
    > <suraj.kharage@enterprisedb.com> wrote:
    > > Please find attached patch for backup validator implementation (0004
    > patch). This patch is based
    > > on Rushabh's latest patch for backup manifest.
    > >
    > > There are some functions required at client side as well, so I have
    > moved those functions
    > > and some data structure at common place so that they can be accessible
    > for both. (0003 patch).
    > >
    > > My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for
    > tap test cases which
    > > is also attached. As of now, test cases related to the tablespace and
    > tar backup  format are missing,
    > > will continue work on same and submit the complete patch.
    > >
    > > With this mail, I have attached the complete patch stack for backup
    > manifest and backup
    > > validate implementation.
    > >
    > > Please let me know your thoughts on the same.
    >
    > Well, for the second time on this thread, please don't take a bunch of
    > somebody else's code and post it in a patch that doesn't attribute
    > that person as one of the authors. For the second time on this thread,
    > the person is me, but don't borrow *anyone's* code without proper
    > attribution. It's really important!
    >
    > On a related note, it's a very good idea to use git format-patch and
    > git rebase -i to maintain patch stacks like this. Rushabh seems to
    > have done that, but the files you're posting look like raw 'git diff'
    > output. Notice that this gives him a way to include authorship
    > information and a tentative commit message in each patch, but you
    > don't have any of that.
    >
    
    Sorry, I have corrected this in the attached v2 patch set.
    
    
    > Also on a related note, part of the process of adapting existing code
    > to a new purpose is adapting the comments. You haven't done that:
    >
    > + * Search a result-set hash table for a row matching a given filename.
    > ...
    > + * Insert a row into a result-set hash table, provided no such row is
    > already
    > ...
    > + * Most of the values
    > + * that we're hashing are short integers formatted as text, so there
    > + * shouldn't be much room for pathological input.
    >
    Corrected in v2 patch.
    
    
    > I think that what we should actually do here is try to use simplehash.
    > Right now, it won't work for frontend code, but I posted some patches
    > to try to address that issue:
    >
    >
    > https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com
    >
    > That would have a few advantages. One, we wouldn't need to know the
    > number of elements in advance, because simplehash can grow
    > dynamically. Two, we could use the iteration interfaces to walk the
    > hash table.  Your solution to that is pgrhash_seq_search, but that's
    > actually not well-designed, because it's not a generic iterator
    > function but something that knows specifically about the 'touch' flag.
    > I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not
    > bad, but I think 'matched' will be a little more recognizable.
    >
    
    Thanks for the suggestion. Will try to implement the same and update
    accordingly.
    I am assuming that I need to build the patch based on the changes that you
    proposed on the mentioned thread.
    
    
    > Please run pgindent. If needed, first add locally defined types to
    > typedefs.list, so that things indent properly.
    >
    > It's not a crazy idea to try to share some data structures and code
    > between the frontend and the backend here, but I think
    > src/common/backup.c and src/include/common/backup.h is a far too
    > generic name given what the code is actually doing. It's mostly about
    > checksums, not backup, and I think it should be named accordingly. I
    > suggest removing "manifestinfo" and renaming the rest to just talk
    > about checksums rather than manifests. That would make it logical to
    > reuse this for any other future code that needs a configurable
    > checksum type. Also, how about adding a function like:
    >
    > extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo);
    >
    > ...which would return true and set *algo if name is recognized, and
    > return false otherwise. That code could be used on both the client and
    > server sides of this patch, and by any future patches that want to
    > return this scaffolding.
    >
    
    Corrected the filename and implemented the function as suggested.
    
    
    > The file header for backup.h has the wrong filename (string.h). The
    > header format looks somewhat atypical compared to what we normally do,
    > too.
    
    
    My bad, corrected the header format as well.
    
    
    >
    >
    > It's arguable, but I tend to think that it would be better to
    > hex-encode the CRC rather than printing it as an integer.  Maybe
    > hex_encode() is another thing that could be moved into the new
    > src/common file.
    
    
    We are already encoding the CRC checksum as well. Please let me know if I
    misunderstood anything.
    Moved hex_encode into src/common.
    
    
    > As I said before about Rushabh's patch set, it's very confusing that
    > we have so many patches here stacked up. Like, you have 0002 moving
    > stuff, and then 0003 moving it again. That's super-confusing. Please
    > try to structure the patch set so as to make it as easy to review as
    > possible.
    >
    
    Sorry for the confusion. I have squashed patches 0001 to 0003 into one patch.
    
    
    > Regarding the test case patch, error checks are important! Don't do
    > things like this:
    >
    > +open my $modify_file_sha256, '>>',
    > "$tempdir/backup_verify/postgresql.conf";
    > +print $modify_file_sha256 "port = 5555\n";
    > +close $modify_file_sha256;
    >
    > If the open fails, then it and the print and the close are going to
    > silently do nothing. That's bad. I don't know exactly what the
    > customary error-checking is for things like this in TAP tests, but I
    > hope it's not like this, because this has a pretty fair chance of
    > looking like it's testing something that it isn't. Let's figure out
    > what the best practice in this area is and adhere to it.
    >
    
    Rajkumar has fixed this; please find the attached 0003 patch for the same.
    
    Please find attached the v2 patch set.
    
    TODO: will implement the simplehash as suggested.
    
    
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  59. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2019-12-17T05:54:46Z

    Hi,
    
    
    >> I think that what we should actually do here is try to use simplehash.
    >> Right now, it won't work for frontend code, but I posted some patches
    >> to try to address that issue:
    >>
    >>
    >> https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com
    >>
    >> That would have a few advantages. One, we wouldn't need to know the
    >> number of elements in advance, because simplehash can grow
    >> dynamically. Two, we could use the iteration interfaces to walk the
    >> hash table.  Your solution to that is pgrhash_seq_search, but that's
    >> actually not well-designed, because it's not a generic iterator
    >> function but something that knows specifically about the 'touch' flag.
    >> I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not
    >> bad, but I think 'matched' will be a little more recognizable.
    >>
    >
    > Thanks for the suggestion. Will try to implement the same and update
    > accordingly.
    > I am assuming that I need to build the patch based on the changes that you
    > proposed on the mentioned thread.
    >
    >
    
    I have implemented simplehash in the backup validator patch as Robert
    suggested. Please find attached the 0002 patch for the same.
    
    Kindly review and let me know your thoughts.
    
    Also attached are the remaining patches. 0001 and 0003 are the same as
    in v2; only the patch version is bumped.
    
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  60. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-12-18T21:24:03Z

    On Tue, Dec 17, 2019 at 12:54 AM Suraj Kharage
    <suraj.kharage@enterprisedb.com> wrote:
    > I have implemented the simplehash in backup validator patch as Robert suggested. Please find attached 0002 patch for the same.
    >
    > kindly review and let me know your thoughts.
    
    +#define CHECKSUM_LENGTH 256
    
    This seems wrong. Not all checksums are the same length, and none of
    the ones we're using are 256 bytes in length, and if we've got to have
    a constant someplace for the maximum checksum length, it should
    probably be in the new header file, not here. But I don't think we
    should need this in the first place; see comments below about how to
    revise the parsing of the manifest file.
    
    +    char        filetype[10];
    
    A mysterious 10-byte field with no comments explaining what it
    means... and the same magic number 10 appears in at least one other
    place in the patch.
    
    +typedef struct manifesthash_hash *hashtab;
    
    This declares a new *type* called hashtab, not a variable called
    hashtab. The new type is not used anywhere, but later, you have
    several variables of the same type that have this name. Just remove
    this: it's wrong and unused.
    
    +static enum ChecksumAlgorithm checksum_type = MC_NONE;
    
    Remove "enum". Not needed, because you have a typedef for it in the
    header, and not per style.
    
    +static  manifesthash_hash *create_manifest_hash(char manifest_path[MAXPGPATH]);
    
    Whitespace is wrong. The whole patch needs a visit from pgindent with
    a properly-updated typedefs.list.
    
    Also, you will struggle to find anywhere else in the code base where
    we pass a character array as a function argument. I don't know why this
    isn't just char *.
    
    +    if(verify_backup)
    
    Whitespace wrong here, too.
    
    + * Read the backup_manifest file and generate the hash table, then scan data
    + * directroy and verify each file. Finally, iterate on hash table to find
    + * out missing files.
    
    You've got a word spelled wrong here, but the bigger problem is that
    this comment doesn't actually describe what this function is trying to
    do. Instead, it describes how it does it. If it's necessary to explain
    what steps the function takes in order to accomplish some goal, you
    should comment individual bits of code in the function. The header
    comment is a high-level overview, not a description of the algorithm.
    
    It's also pretty unhelpful, here and elsewhere, to refer to "the hash
    table" as if there were only one, and as if the reader were supposed
    to know something about it when you haven't told them anything about
    it.
    
    +        if (!entry->matched)
    +        {
    +            pg_log_info("missing file: %s", entry->filename);
    +        }
    +
    
    The braces here are not project style. We usually omit braces when
    only a single line of code is present.
    
    I think some work needs to be done to standardize and improve the
    messages that get produced here.  You have:
    
    1. missing file: %s
    2. duplicate file present: %s
    3. size changed for file: %s, original size: %d, current size: %zu
    4. checksum difference for file: %s
    5. extra file found: %s
    
    I suggest:
    
    1. file \"%s\" is present in manifest but missing from the backup
    2. file \"%s\" has multiple manifest entries
    (this one should probably be pg_log_error(), not pg_log_info(), as it
    represents a corrupt-manifest problem)
    3. file \"%s\" has size %lu in manifest but size %lu in backup
    4. file \"%s\" has checksum %s in manifest but checksum %s in backup
    5. file \"%s\" is present in backup but not in manifest
    
    Your patch actually doesn't compile on my system, because for the
    third message above, it uses %zu to print the size. But %zu is for
    size_t, not off_t. I went looking for other places in the code where
    we print off_t; based on that, I think the right thing to do is to
    print it using %lu and write (unsigned long) st.st_size.
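    
    A minimal sketch of that cast, using the message wording suggested
    above. The helper name format_size_mismatch and its parameters are
    hypothetical, and snprintf stands in for whatever logging call the
    patch ends up using. Note that on platforms where long is 32 bits the
    cast would truncate very large sizes, so this only illustrates the
    format/argument pairing.

```c
#include <stdio.h>
#include <sys/types.h>          /* off_t */

/*
 * Format the "size differs" message into buf.  %zu is for size_t, so the
 * off_t value is cast explicitly to match the %lu conversion.
 */
int
format_size_mismatch(char *buf, size_t buflen, const char *path,
                     unsigned long manifest_size, off_t actual_size)
{
    return snprintf(buf, buflen,
                    "file \"%s\" has size %lu in manifest but size %lu in backup",
                    path, manifest_size, (unsigned long) actual_size);
}
```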
    
    +    char        file_checksum[256];
    +    char        header[1024];
    
    More arbitrary constants.
    
    +    if (!file)
    +    {
    +        pg_log_error("could not open backup_manifest");
    
    That's bad error reporting.  See e.g. readfile() in initdb.c.
    
    +    if (fscanf(file, "%1023[^\n]\n", header) != 1)
    +    {
    +        pg_log_error("error while reading the header from backup_manifest");
    
    That's also bad error reporting. It is only a slight step up from
    "ERROR: error".
    
    And we have another magic number (1023).
    
    +    appendPQExpBufferStr(manifest, header);
    +    appendPQExpBufferStr(manifest, "\n");
    ...
    +        appendPQExpBuffer(manifest, "File\t%s\t%d\t%s\t%s\n", filename,
    +                          filesize, mtime, checksum_with_type);
    
    This whole thing seems completely crazy to me. Basically, you're
    trying to use fscanf() to parse the file. But then, because fscanf()
    doesn't give you the original bytes back, you're trying to reassemble
    the data that you parsed to recover the original line, so that you can
    stuff it in the buffer and eventually checksum it. However, that's
    highly error-prone. You're basically duplicating server code, and thus
    risking getting out of sync with the server code, to work around a
    problem that is entirely self-inflicted, namely, deciding to use
    fscanf().
    
    What I would recommend is:
    
    1. Use open(), read(), close() rather than the fopen() family of
    functions. As we have discovered elsewhere, fread() doesn't promise to
    set errno, so we can't necessarily get reliable error-reporting out of
    it.
    
    2. Before you start reading the file, create a buffer that's large
    enough to hold the whole thing, by using fstat() to figure out how big
    the file is. Read the whole file into that buffer.  If you're not able
    to read the whole file -- i.e. open() or read() or close() fail --
    then just error out and exit.
    
    3. Now advance through the file line by line. Write a function that
    knows how to search forward for the next \r or \n but with checks to
    make sure it can't run off the end of the buffer, and use that to
    locate the end of each line so that you can walk forward. As you walk
    forward line by line, add the line you just processed to the checksum.
    That way, you only need a single pass over the data. Also, you can
    modify it in place.  More on that below.
    
    4. As you examine each line, start by examining the first word. You'll
    need a function that finds the first word by searching forward for a
    tab character, but not beyond the end of the line. The first word of
    the first line should be PostgreSQL-Backup-Manifest-Version and the
    second word should be 1. Then on each subsequent line check whether
    the first word is File or Manifest-Checksum or something else,
    erroring out in the last case. If it's Manifest-Checksum, verify that
    this is the last line of the file and that the checksum matches. If
    it's File, break the line into fields so you can add it to the hash
    table. You'll want a pointer to the filename and a pointer to the
    checksum, and you'll want to parse the size as an integer. Instead of
    allocating new memory for those fields, just overwrite the character
    that follows the field with a \0. There must be one - either \t or \n
    - so you shouldn't run off the end of the buffer.
    
    If you do this, a bunch of the fixed-size buffers you have right now
    go away. You don't need the variable filetype[10] any more, or
    checksum_with_type[CHECKSUM_LENGTH], or checksum[CHECKSUM_LENGTH], or
    the character arrays inside DataDirectoryFileInfo. Instead you can
    just have pointers into the buffer that contains the file. And you
    don't need this code to back up using fseek() and reread the lines,
    either.
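    
    Steps 3 and 4 above can be sketched roughly as follows, assuming the
    whole manifest has already been read into a writable buffer. The
    function names (find_eol, split_fields, count_file_entries) and the
    MAX_FIELDS limit are hypothetical, and the tab-separated layout is the
    one used by the patch under review. Checksumming is only marked with
    a comment, since it has to see the bytes before they are modified in
    place.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 8            /* arbitrary limit for this sketch */

/* Return the first '\r' or '\n' in [p, end), or end if there is none. */
char *
find_eol(char *p, char *end)
{
    while (p < end && *p != '\n' && *p != '\r')
        p++;
    return p;
}

/*
 * Split [line, eol) on tabs in place.  *eol must be a line terminator
 * inside the buffer; it and every tab are overwritten with '\0'.
 * Returns the number of fields, or -1 if there are more than maxfields.
 */
int
split_fields(char *line, char *eol, char **fields, int maxfields)
{
    int         n = 0;
    char       *p = line;

    *eol = '\0';
    for (;;)
    {
        char       *tab;

        if (n == maxfields)
            return -1;
        fields[n++] = p;
        tab = memchr(p, '\t', (size_t) (eol - p));
        if (tab == NULL)
            break;
        *tab = '\0';
        p = tab + 1;
    }
    return n;
}

/* Walk the buffer line by line; return number of "File" lines, or -1. */
int
count_file_entries(char *buf, size_t len)
{
    char       *p = buf;
    char       *end = buf + len;
    int         nfiles = 0;

    while (p < end)
    {
        char       *eol = find_eol(p, end);
        char       *fields[MAX_FIELDS];
        char        term;

        if (eol == end)
            return -1;          /* unterminated last line */
        term = *eol;
        /* A real verifier would add [p, eol] to the checksum here. */
        if (split_fields(p, eol, fields, MAX_FIELDS) < 0)
            return -1;
        if (strcmp(fields[0], "File") == 0)
            nfiles++;
        p = eol + 1;
        if (term == '\r' && p < end && *p == '\n')
            p++;                /* treat \r\n as one terminator */
    }
    return nfiles;
}
```

    Because every field pointer aims into the original buffer, nothing
    here needs a fixed-size array such as filetype[10].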
    
    Also read this article:
    
    https://stackoverflow.com/questions/2430303/disadvantages-of-scanf
    
    Note that the very first point in the article talks about the problem
    of overrunning the buffer, which you certainly have in the current
    code right here:
    
    +        if (fscanf(file, "%s\t%s\t%d\t%23[^\t] %s\n", filetype, filename,
    
    filetype is declared as char[10], but %s could read arbitrarily much data.
    
    +        filename = (char*) pg_malloc(MAXPGPATH);
    
    pg_malloc returns void *, so no cast is required.
    
    +        if (strcmp(checksum_with_type, "-") == 0)
    +        {
    +            checksum_type = MC_NONE;
    +        }
    +        else
    +        {
    +            if (strncmp(checksum_with_type, "SHA256", 6) == 0)
    
    Use parse_checksum_algorithm. Right now you've invented a "common"
    function with 1 caller, but I explicitly suggested previously that you
    put it in common so that you could reuse it.
    
    +        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0 ||
    +            strcmp(de->d_name, "pg_wal") == 0)
    +            continue;
    
    Ignoring pg_wal at the top level might be OK, but this will ignore a
    pg_wal entry anywhere in the directory tree.
    
    +    /* Skip backup manifest file. */
    +    if (strcmp(de->d_name, "backup_manifest") == 0)
    +        return;
    
    Same problem.
    
    +    filename = createPQExpBuffer();
    +    if (!filename)
    +    {
    +        pg_log_error("out of memory");
    +        exit(1);
    +    }
    +
    +    appendPQExpBuffer(filename, "%s%s", relative_path, de->d_name);
    
    Just use char filename[MAXPGPATH] and snprintf here, as you do
    elsewhere. It will be simpler and save memory.
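
    A minimal sketch of that approach, with a truncation check added. The
    helper join_path() is hypothetical; in the patch the destination would
    presumably be a char filename[MAXPGPATH] local.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Concatenate 'relative_path' and 'name' into 'dst', which has room for
 * 'dstlen' bytes.  Returns 0 on success, or -1 if the result would not fit
 * (snprintf reports the length it wanted to write).
 */
static int
join_path(char *dst, size_t dstlen, const char *relative_path,
          const char *name)
{
    int         n;

    assert(dst != NULL && dstlen > 0);

    n = snprintf(dst, dstlen, "%s%s", relative_path, name);
    if (n < 0 || (size_t) n >= dstlen)
        return -1;

    return 0;
}
```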
    
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  61. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2019-12-20T13:24:20Z

    Thank you for review comments.
    
    On Thu, Dec 19, 2019 at 2:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Tue, Dec 17, 2019 at 12:54 AM Suraj Kharage
    > <suraj.kharage@enterprisedb.com> wrote:
    > > I have implemented the simplehash in backup validator patch as Robert
    > suggested. Please find attached 0002 patch for the same.
    > >
    > > kindly review and let me know your thoughts.
    >
    > +#define CHECKSUM_LENGTH 256
    >
    > This seems wrong. Not all checksums are the same length, and none of
    > the ones we're using are 256 bytes in length, and if we've got to have
    > a constant someplace for the maximum checksum length, it should
    > probably be in the new header file, not here. But I don't think we
    > should need this in the first place; see comments below about how to
    > revise the parsing of the manifest file.
    >
    
    I agree. Removed.
    
    > +    char        filetype[10];
    >
    > A mysterious 10-byte field with no comments explaining what it
    > means... and the same magic number 10 appears in at least one other
    > place in the patch.
    >
    
    With the current logic, we don't need this anymore.
    I have removed filetype from the structure, as we no longer compare
    against it anywhere.
    
    
    >
    > +typedef struct manifesthash_hash *hashtab;
    >
    > This declares a new *type* called hashtab, not a variable called
    > hashtab. The new type is not used anywhere, but later, you have
    > several variables of the same type that have this name. Just remove
    > this: it's wrong and unused.
    >
    >
    corrected.
    
    
    > +static enum ChecksumAlgorithm checksum_type = MC_NONE;
    >
    > Remove "enum". Not needed, because you have a typedef for it in the
    > header, and not per style.
    >
    
    Corrected.
    
    
    > +static  manifesthash_hash *create_manifest_hash(char
    > manifest_path[MAXPGPATH]);
    >
    > Whitespace is wrong. The whole patch needs a visit from pgindent with
    > a properly-updated typedefs.list.
    >
    > Also, you will struggle to find anywhere else in the code base where
    > pass a character array as a function argument. I don't know why this
    > isn't just char *.
    >
    
    Corrected.
    
    
    >
    > +    if(verify_backup)
    >
    > Whitespace wrong here, too.
    >
    >
    Fixed
    
    
    >
    > It's also pretty unhelpful, here and elsewhere, to refer to "the hash
    > table" as if there were only one, and as if the reader were supposed
    > to know something about it when you haven't told them anything about
    > it.
    >
    > +        if (!entry->matched)
    > +        {
    > +            pg_log_info("missing file: %s", entry->filename);
    > +        }
    > +
    >
    > The braces here are not project style. We usually omit braces when
    > only a single line of code is present.
    >
    
    fixed
    
    
    >
    > I think some work needs to be done to standardize and improve the
    > messages that get produced here.  You have:
    >
    > 1. missing file: %s
    > 2. duplicate file present: %s
    > 3. size changed for file: %s, original size: %d, current size: %zu
    > 4. checksum difference for file: %s
    > 5. extra file found: %s
    >
    > I suggest:
    >
    > 1. file \"%s\" is present in manifest but missing from the backup
    > 2. file \"%s\" has multiple manifest entries
    > (this one should probably be pg_log_error(), not pg_log_info(), as it
    > represents a corrupt-manifest problem)
    > 3. file \"%s" has size %lu in manifest but size %lu in backup
    > 4. file \"%s" has checksum %s in manifest but checksum %s in backup
    > 5. file \"%s" is present in backup but not in manifest
    >
    
    Corrected.
    
    
    >
    > Your patch actually doesn't compile on my system, because for the
    > third message above, it uses %zu to print the size. But %zu is for
    > size_t, not off_t. I went looking for other places in the code where
    > we print off_t; based on that, I think the right thing to do is to
    > print it using %lu and write (unsigned long) st.st_size.
    >
    
    Corrected.
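
    For reference, the cast described in the quoted paragraph can be
    sketched as follows. The helper format_size_mismatch() is hypothetical
    and writes into a caller-supplied buffer instead of calling
    pg_log_error(), so that it stands alone.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format the size-mismatch message.  off_t has no portable printf
 * conversion, so both sizes are cast to unsigned long and printed with
 * %lu.  Returns snprintf's result.
 */
static int
format_size_mismatch(char *dst, size_t dstlen, const char *filename,
                     off_t manifest_size, off_t backup_size)
{
    assert(dst != NULL && dstlen > 0);

    return snprintf(dst, dstlen,
                    "file \"%s\" has size %lu in manifest but size %lu in backup",
                    filename,
                    (unsigned long) manifest_size,
                    (unsigned long) backup_size);
}
```

    One caveat: on platforms where unsigned long is 32 bits wide, this cast
    cannot represent sizes of 4GB or more.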
    
    > +    char        file_checksum[256];
    > +    char        header[1024];
    >
    > More arbitrary constants.
    
    
    
    >
    > +    if (!file)
    > +    {
    > +        pg_log_error("could not open backup_manifest");
    >
    > That's bad error reporting.  See e.g. readfile() in initdb.c.
    >
    
    Corrected.
    
    
    >
    > +    if (fscanf(file, "%1023[^\n]\n", header) != 1)
    > +    {
    > +        pg_log_error("error while reading the header from
    > backup_manifest");
    >
    > That's also bad error reporting. It is only a slight step up from
    > "ERROR: error".
    >
    > And we have another magic number (1023).
    >
    
    With current logic, we don't need this anymore.
    
    
    >
    > +    appendPQExpBufferStr(manifest, header);
    > +    appendPQExpBufferStr(manifest, "\n");
    > ...
    > +        appendPQExpBuffer(manifest, "File\t%s\t%d\t%s\t%s\n", filename,
    > +                          filesize, mtime, checksum_with_type);
    >
    > This whole thing seems completely crazy to me. Basically, you're
    > trying to use fscanf() to parse the file. But then, because fscanf()
    > doesn't give you the original bytes back, you're trying to reassemble
    > the data that you parsed to recover the original line, so that you can
    > stuff it in the buffer and eventually checksum it. However, that's
    > highly error-prone. You're basically duplicating server code, and thus
    > risking getting out of sync in the server code, to work around a
    > problem that is entirely self-inflicted, namely, deciding to use
    > fscanf().
    >
    > What I would recommend is:
    >
    > 1. Use open(), read(), close() rather than the fopen() family of
    > functions. As we have discovered elsewhere, fread() doesn't promise to
    > set errno, so we can't necessarily get reliable error-reporting out of
    > it.
    >
    > 2. Before you start reading the file, create a buffer that's large
    > enough to hold the whole thing, by using fstat() to figure out how big
    > the file is. Read the whole file into that buffer.  If you're not able
    > to read the whole file -- i.e. open() or read() or close() fail --
    > then just error out and exit.
    >
    > 3. Now advance through the file line by line. Write a function that
    > knows how to search forward for the next \r or \n but with checks to
    > make sure it can't run off the end of the buffer, and use that to
    > locate the end of each line so that you can walk forward. As you walk
    > forward line by line, add the line you just processed to the checksum.
    > That way, you only need a single pass over the data. Also, you can
    > modify it in place.  More on that below.
    >
    > 4. As you examine each line, start by examining the first word. You'll
    > need a function that finds the first word by searching forward for a
    > tab character, but not beyond the end of the line. The first word of
    > the first line should be PostgreSQL-Backup-Manifest-Version and the
    > second word should be 1. Then on each subsequent line check whether
    > the first word is File or Manifest-Checksum or something else,
    > erroring out in the last case. If it's Manifest-Checksum, verify that
    > this is the last line of the file and that the checksum matches. If
    > it's File, break the line into fields so you can add it to the hash
    > table. You'll want a pointer to the filename and a pointer to the
    > checksum, and you'll want to parse the size as an integer. Instead of
    > allocating new memory for those fields, just overwrite the character
    > that follows the field with a \0. There must be one - either \t or \n
    > - so you shouldn't run off the end of the buffer.
    >
    > If you do this, a bunch of the fixed-size buffers you have right now
    > go away. You don't need the variable filetype[10] any more, or
    > checksum_with_type[CHECKSUM_LENGTH], or checksum[CHECKSUM_LENGTH], or
    > the character arrays inside DataDirectoryFileInfo. Instead you can
    > just have pointers into the buffer that contains the file. And you
    > don't need this code to back up using fseek() and reread the lines,
    > either.
    >
    >
    Thanks for the suggestion. I tried to mimic your approach in the attached
    v4-0002 patch.
    Please let me know your thoughts on the same.
    
    > Also read this article:
    >
    > https://stackoverflow.com/questions/2430303/disadvantages-of-scanf
    >
    > Note that the very first point in the article talks about the problem
    > of overrunning the buffer, which you certainly have in the current
    > code right here:
    >
    > +        if (fscanf(file, "%s\t%s\t%d\t%23[^\t] %s\n", filetype, filename,
    >
    > filetype is declared as char[10], but %s could read arbitrarily much data.
    >
    
    With the revised logic, we don't use this anymore.
    
    
    >
    > +        filename = (char*) pg_malloc(MAXPGPATH);
    >
    > pg_malloc returns void *, so no cast is required.
    >
    >
    fixed.
    
    
    > +        if (strcmp(checksum_with_type, "-") == 0)
    > +        {
    > +            checksum_type = MC_NONE;
    > +        }
    > +        else
    > +        {
    > +            if (strncmp(checksum_with_type, "SHA256", 6) == 0)
    >
    > Use parse_checksum_algorithm. Right now you've invented a "common"
    > function with 1 caller, but I explicitly suggested previously that you
    > put it in common so that you could reuse it.
    >
    
    While parsing the record, we get the checksum field as a single string of
    the form <checksum type>:<checksum>. parse_checksum_algorithm() uses
    pg_strcasecmp(), so we need to pass it exactly the algorithm name and
    nothing more. With the current logic, we can't write a '\0' into the
    middle of the line until we have parsed it completely, so we may need to
    allocate another small buffer, copy only the checksum type into it, and
    pass that to parse_checksum_algorithm(). I can't think of any other
    solution apart from this. I might be missing something here; please
    correct me if I am wrong.
    
    
    > +        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0
    > ||
    > +            strcmp(de->d_name, "pg_wal") == 0)
    > +            continue;
    >
    > Ignoring pg_wal at the top level might be OK, but this will ignore a
    > pg_wal entry anywhere in the directory tree.
    >
    > +    /* Skip backup manifest file. */
    > +    if (strcmp(de->d_name, "backup_manifest") == 0)
    > +        return;
    >
    > Same problem.
    >
    
    You are right. Added extra check for this.
    
    
    >
    > +    filename = createPQExpBuffer();
    > +    if (!filename)
    > +    {
    > +        pg_log_error("out of memory");
    > +        exit(1);
    > +    }
    > +
    > +    appendPQExpBuffer(filename, "%s%s", relative_path, de->d_name);
    >
    > Just use char filename[MAXPGPATH] and snprintf here, as you do
    > elsewhere. It will be simpler and save memory.
    >
    Fixed.
    
    The TAP test case patch needs some modification; I will do that and
    submit it.
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  62. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2019-12-20T15:10:57Z

    Fixed some typos in attached v5-0002 patch. Please consider this patch for
    review.
    
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  63. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-12-20T15:43:57Z

    On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage
    <suraj.kharage@enterprisedb.com> wrote:
    > Thank you for review comments.
    
    Thanks for the new version.
    
    +      <term><option>--verify-backup </option></term>
    
    Whitespace.
    
    +struct manifesthash_hash *hashtab;
    
    Uh, I had it in mind that you would nuke this line completely, not
    just remove "typedef" from it. You shouldn't need a global variable
    here.
    
    + if (buf == NULL)
    
    pg_malloc seems to have an internal check such that it never returns
    NULL. I don't see anything like this test in other callers.
    
    The order of operations in create_manifest_hash() seems unusual:
    
    + fd = open(manifest_path, O_RDONLY, 0);
    + if (fstat(fd, &stat))
    + buf = pg_malloc(stat.st_size);
    + hashtab = manifesthash_create(1024, NULL);
    ...
    + entry = manifesthash_insert(hashtab, filename, &found);
    ...
    + close(fd);
    
    I would have expected open-fstat-read-close to be consecutive, and the
    manifesthash stuff all done afterwards. In fact, it seems like reading
    the file could be a separate function.
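
    A minimal sketch of such a separate function, keeping open, fstat, read
    and close together. The name slurp_file() is hypothetical, and plain
    malloc() with a NULL return is used here instead of pg_malloc() and
    pg_log_error() so that the example is self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the whole file at 'path' into a malloc'd, '\0'-terminated buffer.
 * On success, returns the buffer and stores the number of bytes read in
 * *len.  On failure, returns NULL with errno left by the failing call.
 */
static char *
slurp_file(const char *path, size_t *len)
{
    int         fd;
    struct stat st;
    char       *buf;
    size_t      size;
    size_t      done = 0;
    int         save_errno;

    assert(path != NULL && len != NULL);

    fd = open(path, O_RDONLY, 0);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0)
        goto fail_close;
    size = (size_t) st.st_size;

    buf = malloc(size + 1);
    if (buf == NULL)
        goto fail_close;

    while (done < size)
    {
        ssize_t     rc = read(fd, buf + done, size - done);

        if (rc < 0)
        {
            save_errno = errno;
            free(buf);
            errno = save_errno;
            goto fail_close;
        }
        if (rc == 0)
            break;              /* file is shorter than fstat reported */
        done += (size_t) rc;
    }

    if (close(fd) != 0)
    {
        save_errno = errno;
        free(buf);
        errno = save_errno;
        return NULL;
    }

    buf[done] = '\0';
    *len = done;
    return buf;

fail_close:
    save_errno = errno;         /* don't let close() clobber the cause */
    close(fd);
    errno = save_errno;
    return NULL;
}
```

    The hash table would then be built from the returned buffer, after the
    file descriptor has already been closed.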
    
    + if (strncmp(checksum, "SHA256", 6) == 0)
    
    This isn't really right; it would give a false match if we had a
    checksum algorithm with a name like SHA2560 or SHA256C or
    SHA256ExceptWayBetter. The right thing to do is find the colon first,
    and then probably overwrite it with '\0' so that you have a string
    that you can pass to parse_checksum_algorithm().
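
    A minimal sketch of the find-the-colon approach. The helper
    split_checksum_field() is hypothetical, and it assumes the field has
    already been '\0'-terminated and that the original line has already been
    fed to the manifest checksum before being modified.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "ALGORITHM:DIGEST" in place.  The first ':' is overwritten with
 * '\0', so 'field' becomes the bare algorithm name, and a pointer to the
 * digest is returned.  Returns NULL, leaving 'field' untouched, if there
 * is no colon.
 */
static char *
split_checksum_field(char *field)
{
    char       *colon;

    assert(field != NULL);

    colon = strchr(field, ':');
    if (colon == NULL)
        return NULL;

    *colon = '\0';
    return colon + 1;
}
```

    After the split, the algorithm name can be compared as a whole string,
    so a name such as SHA2560 no longer matches SHA256.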
    
    + /*
    + * we don't have checksum type in the header, so need to
    + * read through the first file enttry to find the checksum
    + * type for the manifest file and initilize the checksum
    + * for the manifest file itself.
    + */
    
    This seems to be proceeding on the assumption that the checksum type
    for the manifest itself will always be the same as the checksum type
    for the first file in the manifest. I don't think that's the right
    approach. I think the manifest should always have a SHA256 checksum,
    regardless of what type of checksum is used for the individual files
    within the manifest. Since the volume of data in the manifest is
    presumably very small compared to the size of the database cluster
    itself, I don't think there should be any performance problem there.
    
    + filesize = atol(size);
    
    Using strtol() would allow for some error checking.
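
    A minimal sketch of the kind of checking strtol() makes possible. The
    helper parse_size() is hypothetical, and rejecting negative values is an
    assumption about what a file size should look like.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a non-negative decimal number.  Returns 0 and stores the value in
 * *result on success.  Returns -1 if no digits were consumed, if anything
 * follows the number, if the value is negative, or if it is out of range
 * for long.
 */
static int
parse_size(const char *str, long *result)
{
    char       *endptr;
    long        val;

    assert(str != NULL && result != NULL);

    errno = 0;
    val = strtol(str, &endptr, 10);
    if (endptr == str || *endptr != '\0')
        return -1;
    if (errno == ERANGE || val < 0)
        return -1;

    *result = val;
    return 0;
}
```

    atol() offers none of these checks: it cannot distinguish "0" from
    garbage input.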
    
    + * Increase the checksum by its lable length so that we can
    + checksum = checksum + checksum_lable_length;
    
    Spelling.
    
    + pg_log_error("invalid record found in \"%s\"", manifest_path);
    
    Error message needs work.
    
    +VerifyBackup(void)
    +create_manifest_hash(char *manifest_path)
    +nextLine(char *buf)
    
    Your function names should be consistent with the surrounding style,
    and with each other, as far as possible. Three different conventions
    within the same patch and source file seems over the top.
    
    Also keep in mind that you're not writing code in a vacuum. There's a
    whole file of code here, and around that, a whole project.
    scan_data_directory() is a good example of a function whose name is
    clearly too generic. It's not a general-purpose function for scanning
    the data directory; it's specifically a support function for verifying
    a backup. Yet, the name gives no hint of this.
    
    +verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st,
    + char relative_path[MAXPGPATH], manifesthash_hash *hashtab)
    
    I think I commented on the use of char[] parameters in my previous review.
    
    + /* Skip backup manifest file. */
    + if (strcmp(de->d_name, "backup_manifest") == 0)
    + return;
    
    Still looks like this will be skipped at any level of the directory
    hierarchy, not just the top. And why are we skipping backup_manifest
    here but pg_wal in scan_data_directory? That's a rhetorical question,
    because I know the answer: verify_file() is only getting called for
    files, so you can't use it to skip directories. But that's not a good
    excuse for putting closely-related checks in different parts of the
    code. It's just going to result in the checks being inconsistent and
    each one having its own bugs that have to be fixed separately from the
    other one, as here. Please try to reorganize this code so that it can
    be done in a consistent way.
    
    I think this is related to the way you're traversing the directory
    tree, which somehow looks a bit awkward to me. At the top of
    scan_data_directory(), you've got code that uses basedir and
    subdirpath to construct path and relative_path. I was initially
    surprised to see that this was the job of this function, rather than
    the caller, but then I thought: well, as long as it makes life easy
    for the caller, it's probably fine. However, I notice that the only
    non-trivial caller is scan_data_directory() itself, and it has to
    go and construct newsubdirpath from subdirpath and the directory name.
    
    It seems to me that this would get easier if you defined
    scan_data_directory() -- or whatever we end up calling it -- to take
    two pathname-related arguments:
    
    - basepath, which would be $PGDATA and would never change as we
    recurse down, so same as what you're now calling basedir
    - pathsuffix, which would be an empty string at the top level and at
    each recursive level we'd add a slash and then de->d_name.
    
    So at the top of the function we wouldn't need an if statement,
    because you could just do:
    
    snprintf(path, MAXPGPATH, "%s%s", basepath, pathsuffix);
    
    And when you recurse you wouldn't need an if statement either, because
    you could just do:
    
    snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);
    
    What I'd suggest is constructing newpathsuffix right after rejecting
    "." and ".." entries, and then you can reject both pg_wal and
    backup_manifest, at the top-level only, using symmetric and elegant
    code:
    
    if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
    "/backup_manifest") == 0)
        continue;
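
    To make the top-level-only behavior concrete, here is a minimal sketch
    of the suffix construction and skip test pulled into one helper. The
    helper name and its return convention are assumptions for illustration;
    the real code would do this inline in the directory loop.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<pathsuffix>/<name>" in 'newpathsuffix' and decide whether the
 * entry should be skipped.  'pathsuffix' is "" at the top level.
 * Returns 1 to skip, 0 to process, -1 if the result does not fit.
 */
static int
build_suffix_and_check_skip(char *newpathsuffix, size_t len,
                            const char *pathsuffix, const char *name)
{
    int         n;

    assert(newpathsuffix != NULL && pathsuffix != NULL && name != NULL);

    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return 1;

    n = snprintf(newpathsuffix, len, "%s/%s", pathsuffix, name);
    if (n < 0 || (size_t) n >= len)
        return -1;

    /* These comparisons can only match entries at the top level. */
    if (strcmp(newpathsuffix, "/pg_wal") == 0 ||
        strcmp(newpathsuffix, "/backup_manifest") == 0)
        return 1;

    return 0;
}
```

    Because the comparison is against the whole suffix, an entry named
    pg_wal under, say, /base yields "/base/pg_wal" and is not skipped.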
    
    + record = manifesthash_lookup(hashtab, filename);;
    + if (record)
    + {
    ...long block...
    + }
    + else
    + pg_log_info("file \"%s\" is present in backup but not in manifest",
    + filename);
    
    Try to structure the code in such a way that you minimize unnecessary
    indentation. For example, in this case, you could instead write:
    
    if (record == NULL)
    {
        pg_log_info(...)
        return;
    }
    
    and the result would be that everything inside that long if-block is
    now at the top level of the function and indented one level less. And
    I think if you look at this function you'll see a way that you can
    save a *second* level of indentation for much of that code. Please
    check the rest of the patch for similar cases, too.
    
    +static char *
    +nextLine(char *buf)
    +{
    + while (*buf != '\0' && *buf != '\n')
    + buf = buf + 1;
    +
    + return buf + 1;
    +}
    
    I'm pretty sure that my previous review mentioned the importance of
    protecting against buffer overruns here.
    
    +static char *
    +nextWord(char *line)
    +{
    + while (*line != '\0' && *line != '\t' && *line != '\n')
    + line = line + 1;
    +
    + return line + 1;
    +}
    
    Same problem here.
    
    In both cases, ++ is more idiomatic.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  64. Re: backup manifests

    Rushabh Lathia <rushabh.lathia@gmail.com> — 2019-12-23T04:32:28Z

    On Fri, Dec 20, 2019 at 9:14 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage
    > <suraj.kharage@enterprisedb.com> wrote:
    > > Thank you for review comments.
    >
    > Thanks for the new version.
    >
    > +      <term><option>--verify-backup </option></term>
    >
    > Whitespace.
    >
    > +struct manifesthash_hash *hashtab;
    >
    > Uh, I had it in mind that you would nuke this line completely, not
    > just remove "typedef" from it. You shouldn't need a global variable
    > here.
    >
    > + if (buf == NULL)
    >
    > pg_malloc seems to have an internal check such that it never returns
    > NULL. I don't see anything like this test in other callers.
    >
    > The order of operations in create_manifest_hash() seems unusual:
    >
    > + fd = open(manifest_path, O_RDONLY, 0);
    > + if (fstat(fd, &stat))
    > + buf = pg_malloc(stat.st_size);
    > + hashtab = manifesthash_create(1024, NULL);
    > ...
    > + entry = manifesthash_insert(hashtab, filename, &found);
    > ...
    > + close(fd);
    >
    > I would have expected open-fstat-read-close to be consecutive, and the
    > manifesthash stuff all done afterwards. In fact, it seems like reading
    > the file could be a separate function.
    >
    > + if (strncmp(checksum, "SHA256", 6) == 0)
    >
    > This isn't really right; it would give a false match if we had a
    > checksum algorithm with a name like SHA2560 or SHA256C or
    > SHA256ExceptWayBetter. The right thing to do is find the colon first,
    > and then probably overwrite it with '\0' so that you have a string
    > that you can pass to parse_checksum_algorithm().
    >
    > + /*
    > + * we don't have checksum type in the header, so need to
    > + * read through the first file enttry to find the checksum
    > + * type for the manifest file and initilize the checksum
    > + * for the manifest file itself.
    > + */
    >
    > This seems to be proceeding on the assumption that the checksum type
    > for the manifest itself will always be the same as the checksum type
    > for the first file in the manifest. I don't think that's the right
    > approach. I think the manifest should always have a SHA256 checksum,
    > regardless of what type of checksum is used for the individual files
    > within the manifest. Since the volume of data in the manifest is
    > presumably very small compared to the size of the database cluster
    > itself, I don't think there should be any performance problem there.
    >
    
    Agreed that performance won't be a problem, but that will be a bit confusing
    to the user: at the start the user provides the manifest-checksum (say,
    CRC32C), and at the end the user finds a SHA256 checksum string in the
    backup_manifest file.
    
    Does this also mean that, irrespective of whether the user provided a
    checksum option or not, we will always generate a checksum for the
    backup_manifest file?
    
    
    > + filesize = atol(size);
    >
    > Using strtol() would allow for some error checking.
    >
    > + * Increase the checksum by its lable length so that we can
    > + checksum = checksum + checksum_lable_length;
    >
    > Spelling.
    >
    > + pg_log_error("invalid record found in \"%s\"", manifest_path);
    >
    > Error message needs work.
    >
    > +VerifyBackup(void)
    > +create_manifest_hash(char *manifest_path)
    > +nextLine(char *buf)
    >
    > Your function names should be consistent with the surrounding style,
    > and with each other, as far as possible. Three different conventions
    > within the same patch and source file seems over the top.
    >
    > Also keep in mind that you're not writing code in a vacuum. There's a
    > whole file of code here, and around that, a whole project.
    > scan_data_directory() is a good example of a function whose name is
    > clearly too generic. It's not a general-purpose function for scanning
    > the data directory; it's specifically a support function for verifying
    > a backup. Yet, the name gives no hint of this.
    >
    > +verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st,
    > + char relative_path[MAXPGPATH], manifesthash_hash *hashtab)
    >
    > I think I commented on the use of char[] parameters in my previous review.
    >
    > + /* Skip backup manifest file. */
    > + if (strcmp(de->d_name, "backup_manifest") == 0)
    > + return;
    >
    > Still looks like this will be skipped at any level of the directory
    > hierarchy, not just the top. And why are we skipping backup_manifest
    > here but pg_wal in scan_data_directory? That's a rhetorical question,
    > because I know the answer: verify_file() is only getting called for
    > files, so you can't use it to skip directories. But that's not a good
    > excuse for putting closely-related checks in different parts of the
    > code. It's just going to result in the checks being inconsistent and
    > each one having its own bugs that have to be fixed separately from the
    > other one, as here. Please try to reorganize this code so that it can
    > be done in a consistent way.
    >
    > I think this is related to the way you're traversing the directory
    > tree, which somehow looks a bit awkward to me. At the top of
    > scan_data_directory(), you've got code that uses basedir and
    > subdirpath to construct path and relative_path. I was initially
    > surprised to see that this was the job of this function, rather than
    > the caller, but then I thought: well, as long as it makes life easy
    > for the caller, it's probably fine. However, I notice that the only
    > non-trivial caller is the scan_data_directory() itself, and it has to
    > go and construct newsubdirpath from subdirpath and the directory name.
    >
    > It seems to me that this would get easier if you defined
    > scan_data_directory() -- or whatever we end up calling it -- to take
    > two pathname-related arguments:
    >
    > - basepath, which would be $PGDATA and would never change as we
    > recurse down, so same as what you're now calling basedir
    > - pathsuffix, which would be an empty string at the top level and at
    > each recursive level we'd add a slash and then de->d_name.
    >
    > So at the top of the function we wouldn't need an if statement,
    > because you could just do:
    >
    > snprintf(path, MAXPGPATH, "%s%s", basedir, pathsuffix);
    >
    > And when you recurse you wouldn't need an if statement either, because
    > you could just do:
    >
    > snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);
    >
    > What I'd suggest is constructing newpathsuffix right after rejecting
    > "." and ".." entries, and then you can reject both pg_wal and
    > backup_manifest, at the top-level only, using symmetric and elegant
    > code:
    >
    > if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
    > "/backup_manifest") == 0)
    >     continue;
    >
    > + record = manifesthash_lookup(hashtab, filename);;
    > + if (record)
    > + {
    > ...long block...
    > + }
    > + else
    > + pg_log_info("file \"%s\" is present in backup but not in manifest",
    > + filename);
    >
    > Try to structure the code in such a way that you minimize unnecessary
    > indentation. For example, in this case, you could instead write:
    >
    > if (record == NULL)
    > {
    >     pg_log_info(...)
    >     return;
    > }
    >
    > and the result would be that everything inside that long if-block is
    > now at the top level of the function and indented one level less. And
    > I think if you look at this function you'll see a way that you can
    > save a *second* level of indentation for much of that code. Please
    > check the rest of the patch for similar cases, too.
    >
    > +static char *
    > +nextLine(char *buf)
    > +{
    > + while (*buf != '\0' && *buf != '\n')
    > + buf = buf + 1;
    > +
    > + return buf + 1;
    > +}
    >
    > I'm pretty sure that my previous review mentioned the importance of
    > protecting against buffer overruns here.
    >
    > +static char *
    > +nextWord(char *line)
    > +{
    > + while (*line != '\0' && *line != '\t' && *line != '\n')
    > + line = line + 1;
    > +
    > + return line + 1;
    > +}
    >
    > Same problem here.
    >
    > In both cases, ++ is more idiomatic.
    >
    > --
    > Robert Haas
    > EnterpriseDB: http://www.enterprisedb.com
    > The Enterprise PostgreSQL Company
    >
    
    
    -- 
    Rushabh Lathia
    
  65. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-12-24T04:50:54Z

    On Sun, Dec 22, 2019 at 8:32 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
    > Agreed that performance won't be a problem, but that will be a bit confusing
    > to the user: at the start the user provides the manifest-checksum (say,
    > CRC32C), and at the end the user finds a SHA256 checksum string in the
    > backup_manifest file.
    
    I don't think that's particularly confusing. The documentation should
    say that this is the algorithm to be used for checksumming the files
    which are backed up. The algorithm to be used for the manifest itself
    is another matter. To me, it seems far MORE confusing if the algorithm
    used for the manifest itself is magically inferred from the algorithm
    used for one of the File lines therein.
    
    > Does this also mean that, irrespective of whether the user provided a
    > checksum option or not, we will always generate a checksum for the
    > backup_manifest file?
    
    Yes, that is what I am proposing.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  66. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2019-12-24T10:41:50Z

    Thank you for review comments.
    
    On Fri, Dec 20, 2019 at 9:14 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage
    > <suraj.kharage@enterprisedb.com> wrote:
    > > Thank you for review comments.
    >
    > Thanks for the new version.
    >
    > +      <term><option>--verify-backup </option></term>
    >
    > Whitespace.
    >
    Corrected.
    
    
    >
    > +struct manifesthash_hash *hashtab;
    >
    > Uh, I had it in mind that you would nuke this line completely, not
    > just remove "typedef" from it. You shouldn't need a global variable
    > here.
    >
    
    Removed.
    
    
    > + if (buf == NULL)
    >
    > pg_malloc seems to have an internal check such that it never returns
    > NULL. I don't see anything like this test in other callers.
    >
    
    Yeah, removed this check.
    
    
    >
    > The order of operations in create_manifest_hash() seems unusual:
    >
    > + fd = open(manifest_path, O_RDONLY, 0);
    > + if (fstat(fd, &stat))
    > + buf = pg_malloc(stat.st_size);
    > + hashtab = manifesthash_create(1024, NULL);
    > ...
    > + entry = manifesthash_insert(hashtab, filename, &found);
    > ...
    > + close(fd);
    >
    > I would have expected open-fstat-read-close to be consecutive, and the
    > manifesthash stuff all done afterwards. In fact, it seems like reading
    > the file could be a separate function.
    >
    
    Yes, created a new function that reads the file and returns the buffer.
    
    
    >
    > + if (strncmp(checksum, "SHA256", 6) == 0)
    >
    > This isn't really right; it would give a false match if we had a
    > checksum algorithm with a name like SHA2560 or SHA256C or
    > SHA256ExceptWayBetter. The right thing to do is find the colon first,
    > and then probably overwrite it with '\0' so that you have a string
    > that you can pass to parse_checksum_algorithm().
    >
    
    Corrected this check. The suggestion below allows us to put '\0' in the
    middle of the line: since SHA256 is always used for the backup manifest's
    own checksum, we can feed each line to the checksum machinery early,
    before the line is modified.
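    
    A minimal sketch of the "find the colon first, then overwrite it" idea
    from the review, assuming the checksum field has the form
    "ALGORITHM:hexdigits". The function name split_checksum_field() is
    hypothetical. The thread names parse_checksum_algorithm() but does not
    show its signature, so the sketch stops at producing the
    NUL-terminated algorithm name that such a function could be given.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "ALGORITHM:hexdigits" in place at the first colon.
 *
 * On success the colon is overwritten with '\0', so 'field' now holds only
 * the algorithm name, and a pointer to the hex digits is returned.
 * Returns NULL, leaving 'field' untouched, if there is no colon.
 */
static char *
split_checksum_field(char *field)
{
    char       *colon = strchr(field, ':');

    if (colon == NULL)
        return NULL;
    *colon = '\0';
    return colon + 1;
}
```

    With the name isolated this way, a whole-string comparison no longer
    treats "SHA256C" or "SHA2560" as a match for "SHA256", which is the
    false match the strncmp() prefix test allowed.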
    
    
    >
    > + /*
    > + * we don't have checksum type in the header, so need to
    > + * read through the first file enttry to find the checksum
    > + * type for the manifest file and initilize the checksum
    > + * for the manifest file itself.
    > + */
    >
    > This seems to be proceeding on the assumption that the checksum type
    > for the manifest itself will always be the same as the checksum type
    > for the first file in the manifest. I don't think that's the right
    > approach. I think the manifest should always have a SHA256 checksum,
    > regardless of what type of checksum is used for the individual files
    > within the manifest. Since the volume of data in the manifest is
    > presumably very small compared to the size of the database cluster
    > itself, I don't think there should be any performance problem there.
    >
    Made the change in the backup manifest patch as well as in the backup
    validator patch. Thanks to Rushabh Lathia for the offline discussion and help.
    
    To examine the first word of each line, I am using the check below:
    if (strncmp(line, "File", 4) == 0)
    {
    ..
    }
    else if (strncmp(line, "Manifest-Checksum", 17) == 0)
    {
    ..
    }
    else
        error
    
    strncmp might not be right here, but we cannot put '\0' in the middle of
    the line (to find the first word) before we recognize the line type.
    All lines except the last one (which holds the manifest checksum) are
    fed to the checksum machinery to calculate the manifest checksum,
    so update_checksum() should be called after recognizing the type, i.e. if
    it is a File type record. Do you see any issues with this?
    
    > + filesize = atol(size);
    >
    > Using strtol() would allow for some error checking.
    >
    corrected.
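    
    For reference, a minimal sketch of the kind of error checking strtol()
    makes possible and atol() does not. The function name parse_file_size()
    and the decision to reject negative values are assumptions made for
    illustration, not details taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a decimal file size. Returns false if the string is empty, has
 * trailing garbage, is out of range for long, or is negative.
 * Note that strtol() itself still accepts leading whitespace and a sign.
 */
static bool
parse_file_size(const char *str, long *result)
{
    char       *end;
    long        val;

    errno = 0;
    val = strtol(str, &end, 10);
    if (end == str || *end != '\0')
        return false;           /* no digits at all, or junk after them */
    if (errno == ERANGE || val < 0)
        return false;           /* overflow, or a nonsensical size */
    *result = val;
    return true;
}
```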
    
    
    >
    > + * Increase the checksum by its lable length so that we can
    > + checksum = checksum + checksum_lable_length;
    >
    > Spelling.
    >
    corrected.
    
    
    >
    > + pg_log_error("invalid record found in \"%s\"", manifest_path);
    >
    > Error message needs work.
    >
    > +VerifyBackup(void)
    > +create_manifest_hash(char *manifest_path)
    > +nextLine(char *buf)
    >
    > Your function names should be consistent with the surrounding style,
    > and with each other, as far as possible. Three different conventions
    > within the same patch and source file seems over the top.
    >
    > Also keep in mind that you're not writing code in a vacuum. There's a
    > whole file of code here, and around that, a whole project.
    > scan_data_directory() is a good example of a function whose name is
    > clearly too generic. It's not a general-purpose function for scanning
    > the data directory; it's specifically a support function for verifying
    > a backup. Yet, the name gives no hint of this.
    >
    > +verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st,
    > + char relative_path[MAXPGPATH], manifesthash_hash *hashtab)
    >
    > I think I commented on the use of char[] parameters in my previous review.
    >
    > + /* Skip backup manifest file. */
    > + if (strcmp(de->d_name, "backup_manifest") == 0)
    > + return;
    >
    > Still looks like this will be skipped at any level of the directory
    > hierarchy, not just the top. And why are we skipping backup_manifest
    > here but pg_wal in scan_data_directory? That's a rhetorical question,
    > because I know the answer: verify_file() is only getting called for
    > files, so you can't use it to skip directories. But that's not a good
    > excuse for putting closely-related checks in different parts of the
    > code. It's just going to result in the checks being inconsistent and
    > each one having its own bugs that have to be fixed separately from the
    > other one, as here. Please try to reorganize this code so that it can
    > be done in a consistent way.
    >
    > I think this is related to the way you're traversing the directory
    > tree, which somehow looks a bit awkward to me. At the top of
    > scan_data_directory(), you've got code that uses basedir and
    > subdirpath to construct path and relative_path. I was initially
    > surprised to see that this was the job of this function, rather than
    > the caller, but then I thought: well, as long as it makes life easy
    > for the caller, it's probably fine. However, I notice that the only
    > non-trivial caller is the scan_data_directory() itself, and it has to
    > go and construct newsubdirpath from subdirpath and the directory name.
    >
    > It seems to me that this would get easier if you defined
    > scan_data_directory() -- or whatever we end up calling it -- to take
    > two pathname-related arguments:
    >
    > - basepath, which would be $PGDATA and would never change as we
    > recurse down, so same as what you're now calling basedir
    > - pathsuffix, which would be an empty string at the top level and at
    > each recursive level we'd add a slash and then de->d_name.
    >
    > So at the top of the function we wouldn't need an if statement,
    > because you could just do:
    >
    > snprintf(path, MAXPGPATH, "%s%s", basedir, pathsuffix);
    >
    > And when you recurse you wouldn't need an if statement either, because
    > you could just do:
    >
    > snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name);
    >
    > What I'd suggest is constructing newpathsuffix right after rejecting
    > "." and ".." entries, and then you can reject both pg_wal and
    > backup_manifest, at the top-level only, using symmetric and elegant
    > code:
    >
    > if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
    > "/backup_manifest") == 0)
    >     continue;
    >
    
    Thanks for the suggestion. Corrected as per the above inputs.
    
    
    > + record = manifesthash_lookup(hashtab, filename);;
    > + if (record)
    > + {
    > ...long block...
    > + }
    > + else
    > + pg_log_info("file \"%s\" is present in backup but not in manifest",
    > + filename);
    >
    > Try to structure the code in such a way that you minimize unnecessary
    > indentation. For example, in this case, you could instead write:
    >
    > if (record == NULL)
    > {
    >     pg_log_info(...)
    >     return;
    > }
    >
    > and the result would be that everything inside that long if-block is
    > now at the top level of the function and indented one level less. And
    > I think if you look at this function you'll see a way that you can
    > save a *second* level of indentation for much of that code. Please
    > check the rest of the patch for similar cases, too.
    >
    
    Makes sense. Corrected.
    
    
    >
    > +static char *
    > +nextLine(char *buf)
    > +{
    > + while (*buf != '\0' && *buf != '\n')
    > + buf = buf + 1;
    > +
    > + return buf + 1;
    > +}
    >
    > I'm pretty sure that my previous review mentioned the importance of
    > protecting against buffer overruns here.
    >
    > +static char *
    > +nextWord(char *line)
    > +{
    > + while (*line != '\0' && *line != '\t' && *line != '\n')
    > + line = line + 1;
    > +
    > + return line + 1;
    > +}
    >
    > Same problem here.
    >
    > In both cases, ++ is more idiomatic.
    >
    I have added a check for EOF, but I am not sure whether that would be right here.
    Do we need to check the length of the buffer as well?
    
    Rajkaumar has changed the TAP test case patch as per the revised error
    messages.
    Please find attached the patch stack incorporating the above comments.
    
    -- 
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  67. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2019-12-30T18:22:59Z

    On Tue, Dec 24, 2019 at 5:42 AM Suraj Kharage
    <suraj.kharage@enterprisedb.com> wrote:
    > Made the change in the backup manifest patch as well as in the backup validator patch. Thanks to Rushabh Lathia for the offline discussion and help.
    >
    > To examine the first word of each line, I am using the check below:
    > if (strncmp(line, "File", 4) == 0)
    > {
    > ..
    > }
    > else if (strncmp(line, "Manifest-Checksum", 17) == 0)
    > {
    > ..
    > }
    > else
    >     error
    >
    > strncmp might not be right here, but we cannot put '\0' in the middle of the line (to find the first word)
    > before we recognize the line type.
    > All lines except the last one (which holds the manifest checksum) are fed to the checksum machinery to calculate the manifest checksum,
    > so update_checksum() should be called after recognizing the type, i.e. if it is a File type record. Do you see any issues with this?
    
    I see the problem, but I don't think your solution is right, because
    the first test would pass if the line said FiletMignon rather than
    just File, which we certainly don't want. You've got to write the test
    so that you're checking against the whole first word, not just some
    prefix of it. There are several possible ways to accomplish that, but
    this isn't one of them.
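    
    One possible way to compare against the whole first word without writing
    a '\0' into the line, so the unmodified line can still be fed to the
    checksum machinery afterwards. This is only a sketch: the name
    first_word_is() is hypothetical, and it assumes words are delimited by a
    tab or newline, as in the NextWord() loop quoted earlier in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the first word of the line starting at 'line' (and ending
 * before 'endptr') is exactly 'keyword'. The buffer is never modified and
 * is never read at or beyond 'endptr'.
 */
static bool
first_word_is(const char *line, const char *endptr, const char *keyword)
{
    size_t      kwlen = strlen(keyword);
    const char *p = line;

    /* Find the end of the first word. */
    while (p < endptr && *p != '\t' && *p != '\n')
        p++;

    /* The lengths must match too, so "FiletMignon" is not "File". */
    return (size_t) (p - line) == kwlen && memcmp(line, keyword, kwlen) == 0;
}
```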
    
    >> + pg_log_error("invalid record found in \"%s\"", manifest_path);
    >>
    >> Error message needs work.
    
    Looks better now, but you have a message that says "invalid checksums
    type \"%s\" found in \"%s\"". This is wrong because checksums would
    need to be singular in this context (checksum). Also, I think it could
    be better phrased as "manifest file \"%s\" specifies unknown checksum
    algorithm \"%s\" at line %d".
    
    >> Your function names should be consistent with the surrounding style,
    >> and with each other, as far as possible. Three different conventions
    >> within the same patch and source file seems over the top.
    
    This appears to be fixed.
    
    >> Also keep in mind that you're not writing code in a vacuum. There's a
    >> whole file of code here, and around that, a whole project.
    >> scan_data_directory() is a good example of a function whose name is
    >> clearly too generic. It's not a general-purpose function for scanning
    >> the data directory; it's specifically a support function for verifying
    >> a backup. Yet, the name gives no hint of this.
    
    But this appears not to be fixed.
    
    >> if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
    >> "/backup_manifest") == 0)
    >>     continue;
    >
    > Thanks for the suggestion. Corrected as per the above inputs.
    
    You need a comment here, like "Ignore the possible presence of a
    backup_manifest file and/or a pg_wal directory in the backup being
    verified." and then maybe another sentence explaining why that's the
    right thing to do.
    
    +             * The forth parameter to VerifyFile() will pass the relative path
    +             * of file to match exactly with the filename present in manifest.
    
    I don't know what this comment is trying to tell me, which might be
    something you want to try to fix. However, I'm pretty sure it's
    supposed to say "fourth" not "forth".
    
    >> and the result would be that everything inside that long if-block is
    >> now at the top level of the function and indented one level less. And
    >> I think if you look at this function you'll see a way that you can
    >> save a *second* level of indentation for much of that code. Please
    >> check the rest of the patch for similar cases, too.
    >
    > Makes sense. Corrected.
    
    I don't agree. A large chunk of VerifyFile() is still subject to a
    quite unnecessary level of indentation.
    
    > I have added a check for EOF, but I am not sure whether that would be right here.
    > Do we need to check the length of the buffer as well?
    
    That's really, really not right. EOF is not a character that can
    appear in the buffer. It's chosen on purpose to be a value that never
    matches any actual character when both the character and the EOF value
    are regarded as values of type 'int'. That guarantee doesn't apply
    here though because you're dealing with values of type 'char'. So what
    this code is doing is searching for an impossible value using
    incorrect logic, which has very little to do with the actual need
    here, which is to avoid running off the end of the buffer. To see what
    the problem is, try creating a file with no terminating newline, like
    this:
    
    echo -n this file has no terminating newline >> some-file
    
    I doubt it will be very hard to make this patch crash horribly. Even
    if you can't, it seems pretty clear that the logic isn't right.
    
    I don't really know what the \0 tests in NextLine() and NextWord()
    think they're doing either. If there's a \0 in the buffer before you
    add one, it was in the original input data, and pretending like that
    marks a word or line boundary seems like a fairly arbitrary choice.
    
    What I suggest is:
    
    (1) Allocate one byte more than the file size for the buffer that's
    going to hold the file, so that if you write a \0 just after the last
    byte of the file, you don't overrun the allocated buffer.
    
    (2) Compute char *endptr = buf + len.
    
    (3) Pass endptr to NextLine and NextWord and write the loop condition
    something like while (*buf != '\n' && buf < endptr).
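    
    Steps (2) and (3) can be sketched as follows. The lower-case name
    next_line() is hypothetical. The sketch tests the bound before
    dereferencing the pointer, which differs slightly from the loop condition
    as written in step (3); that ordering keeps the function safe even when
    the caller has not allocated the extra byte from step (1).

```c
#include <stddef.h>

/*
 * Return a pointer to the start of the next line, i.e. just past the next
 * '\n' found in [buf, endptr), or 'endptr' if no newline remains.
 * Nothing at or beyond 'endptr' is ever read.
 */
static char *
next_line(char *buf, char *endptr)
{
    while (buf < endptr && *buf != '\n')
        buf++;
    if (buf < endptr)
        buf++;                  /* step over the newline itself */
    return buf;
}
```

    For a file with no terminating newline, the final call simply returns
    endptr, which the caller can treat as end of input.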
    
    Other notes:
    
    - The error handling in ReadFileIntoBuffer() does not seem to consider
    the case of a short read. If you look through the source tree, you can
    find examples of how we normally handle that.
    
    - Putting string_hash_sdbm() into encode.c seems like a surprising
    choice. What does this have to do with encoding anything? And why is
    it going into src/common at all if it's only intended for frontend
    use?
    
    - It seems like whether or not any problems were found while verifying
    the manifest ought to affect the exit status of pg_basebackup. I'm not
    exactly sure what exit codes ought to be used, but you could look for
    similar precedents. Document this, too.
    
    - As much as possible let's have errors in the manifest file report
    the line number, and let's also try to make them more specific, e.g.
    instead of "invalid manifest record found in \"%s\"", perhaps
    "manifest file \"%s\" contains invalid keyword \"%s\" at line %d".
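    
    On the short-read note above, a minimal sketch of a reader that treats a
    short read as a failure and also reserves the extra byte for a trailing
    '\0'. The name read_into_buffer() is hypothetical, plain malloc() is
    used instead of pg_malloc() to keep the example standalone, and
    reporting a short read as an error (rather than retrying) is an
    assumption about the desired behavior.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Read exactly 'len' bytes from 'fd' into a newly allocated buffer of
 * len + 1 bytes and NUL-terminate it. Returns NULL on allocation failure,
 * on a read error, or on a short read (fewer than 'len' bytes returned).
 */
static char *
read_into_buffer(int fd, size_t len)
{
    char       *buf = malloc(len + 1);
    ssize_t     rc;

    if (buf == NULL)
        return NULL;

    rc = read(fd, buf, len);
    if (rc < 0 || (size_t) rc != len)
    {
        /* rc < 0 is an I/O error; anything else here is a short read. */
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    return buf;
}
```

    A real caller would want to distinguish the two failure cases in its
    error message, since only the first one sets errno.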
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  68. Re: backup manifests

    Tels <nospam-pg-abuse@bloodgate.com> — 2019-12-31T12:30:01Z

    Moin,
    
    sorry for the very late reply. There was a discussion about the specific 
    format of the backup manifests, and maybe that was already discussed and 
    I just overlooked it:
    
    1) Why invent your own format, and not just use a machine-readable
    format that already exists? It doesn't have to be full-blown XML, or
    even JSON; something as simple as YAML would already be better. That way
    not everyone has to write their own parser. Or maybe it is already YAML
    and just the different keywords were under discussion?
    
    2) It would be very wise to add a version number to the format. That
    will make extending it later much easier and avoid the "we need to
    add X, but that breaks compatibility with all software out there"
    situations that often arise a few years down the line.
    
    Best regards,
    
    and a happy New Year 2020
    
    Tels
    
    
    
    
  69. Re: backup manifests

    David Fetter <david@fetter.org> — 2019-12-31T17:43:26Z

    On Tue, Dec 31, 2019 at 01:30:01PM +0100, Tels wrote:
    > Moin,
    > 
    > sorry for the very late reply. There was a discussion about the specific
    > format of the backup manifests, and maybe that was already discussed and I
    > just overlooked it:
    > 
    > 1) Why invent your own format, and not just use a machine-readable format
    > that already exists? It doesn't have to be full-blown XML, or even JSON;
    > something as simple as YAML would already be better. That way not everyone has
    > to write their own parser. Or maybe it is already YAML and just the
    > different keywords were under discussion?
    
    YAML is extremely fragile and error-prone. It's also a superset of
    JSON, so I don't understand what you mean by "as simple as."
    
    -1 from me on YAML
    
    That said, I agree that there's no reason to come up with a bespoke
    format and parser when JSON is already available in every PostgreSQL
    installation.  Imposing a structure atop that includes a version
    number, as you suggest, seems pretty straightforward, and should be
    done.
    
    Would it make sense to include some kind of capability description in
    the format along with the version number?
    
    Best,
    David.
    -- 
    David Fetter <david(at)fetter(dot)org> http://fetter.org/
    Phone: +1 415 235 3778
    
    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
    
    
    
    
  70. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-01-01T02:16:53Z

    On 12/31/19 10:43 AM, David Fetter wrote:
    > On Tue, Dec 31, 2019 at 01:30:01PM +0100, Tels wrote:
    >> Moin,
    >>
    >> sorry for the very late reply. There was a discussion about the specific
    >> format of the backup manifests, and maybe that was already discussed and I
    >> just overlooked it:
    >>
    >> 1) Why invent your own format, and not just use a machine-readable format
    >> that already exists? It doesn't have to be full-blown XML, or even JSON;
    >> something as simple as YAML would already be better. That way not everyone has
    >> to write their own parser. Or maybe it is already YAML and just the
    >> different keywords were under discussion?
    > 
    > YAML is extremely fragile and error-prone. It's also a superset of
    > JSON, so I don't understand what you mean by "as simple as."
    > 
    > -1 from me on YAML
    
    -1 from me as well.  YAML is easy to write but definitely non-trivial to 
    read.
    
    > That said, I agree that there's no reason to come up with a bespoke
    > format and parser when JSON is already available in every PostgreSQL
    > installation.  Imposing a structure atop that includes a version
    > number, as you suggest, seems pretty straightforward, and should be
    > done.
    
    +1.  I continue to support a format that would be easily readable 
    without writing a lot of code.
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  71. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-01-01T18:43:40Z

    On Tue, Dec 31, 2019 at 9:16 PM David Steele <david@pgmasters.net> wrote:
    > > That said, I agree that there's no reason to come up with a bespoke
    > > format and parser when JSON is already available in every PostgreSQL
    > > installation.  Imposing a structure atop that includes a version
    > > number, as you suggest, seems pretty straightforward, and should be
    > > done.
    >
    > +1.  I continue to support a format that would be easily readable
    > without writing a lot of code.
    
    So, if someone can suggest to me how I could read JSON from a tool in
    src/bin without writing a lot of code, I'm all ears. So far that's
    been asserted but not been demonstrated to be possible. Getting the
    JSON parser that we have in the backend to work from frontend doesn't
    look all that straightforward, for reasons that I talked about in
    http://postgr.es/m/CA+TgmobZrNYR-ATtfZiZ_k-W7tSPgvmYZmyiqumQig4R4fkzHw@mail.gmail.com
    
    As to the suggestion that a version number be included, that's been
    there in every version of the patch I've posted.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  72. Re: backup manifests

    David Fetter <david@fetter.org> — 2020-01-01T19:09:18Z

    On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
    > On Tue, Dec 31, 2019 at 9:16 PM David Steele <david@pgmasters.net> wrote:
    > > > That said, I agree that there's no reason to come up with a bespoke
    > > > format and parser when JSON is already available in every PostgreSQL
    > > > installation.  Imposing a structure atop that includes a version
    > > > number, as you suggest, seems pretty straightforward, and should be
    > > > done.
    > >
    > > +1.  I continue to support a format that would be easily readable
    > > without writing a lot of code.
    > 
    > So, if someone can suggest to me how I could read JSON from a tool in
    > src/bin without writing a lot of code, I'm all ears. So far that's
    > been asserted but not been demonstrated to be possible. Getting the
    > JSON parser that we have in the backend to work from frontend doesn't
    > look all that straightforward, for reasons that I talked about in
    > http://postgr.es/m/CA+TgmobZrNYR-ATtfZiZ_k-W7tSPgvmYZmyiqumQig4R4fkzHw@mail.gmail.com
    
    Maybe I'm missing something obvious, but wouldn't combining
    pg_read_file() with a cast to JSONB fix this, as below?
    
    shackle@[local]:5413/postgres(13devel)(892328) # SELECT jsonb_pretty(j::jsonb) FROM pg_read_file('/home/shackle/advanced_comparison.json') AS t(j);
                jsonb_pretty            
    ════════════════════════════════════
     [                                 ↵
         {                             ↵
             "message": "hello world!",↵
             "severity": "[DEBUG]"     ↵
         },                            ↵
         {                             ↵
             "message": "boz",         ↵
             "severity": "[INFO]"      ↵
         },                            ↵
         {                             ↵
             "message": "foo",         ↵
             "severity": "[DEBUG]"     ↵
         },                            ↵
         {                             ↵
             "message": "null",        ↵
             "severity": "null"        ↵
         }                             ↵
     ]
    (1 row)
    
    Time: 3.050 ms
    
    > As to the suggestion that a version number be included, that's been
    > there in every version of the patch I've posted.
    
    and thanks for that!
    
    Best,
    David.
    -- 
    David Fetter <david(at)fetter(dot)org> http://fetter.org/
    Phone: +1 415 235 3778
    
    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
    
    
    
    
  73. Re: backup manifests

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-01-02T00:46:15Z

    David Fetter <david@fetter.org> writes:
    > On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
    >> So, if someone can suggest to me how I could read JSON from a tool in
    >> src/bin without writing a lot of code, I'm all ears.
    
    > Maybe I'm missing something obvious, but wouldn't combining
    > pg_read_file() with a cast to JSONB fix this, as below?
    
    Only if you're prepared to restrict the use of the tool to superusers
    (or at least people with whatever privilege that function requires).
    
    Admittedly, you can probably feed the data to the backend without
    use of an intermediate file; but it still requires a working backend
    connection, which might be a bit of a leap for backup-related tools.
    I'm sure Robert was envisioning doing this processing inside the tool.
    
    			regards, tom lane
    
    
    
    
  74. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-01-02T01:57:11Z

    On Wed, Jan 1, 2020 at 7:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > David Fetter <david@fetter.org> writes:
    > > On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
    > >> So, if someone can suggest to me how I could read JSON from a tool in
    > >> src/bin without writing a lot of code, I'm all ears.
    >
    > > Maybe I'm missing something obvious, but wouldn't combining
    > > pg_read_file() with a cast to JSONB fix this, as below?
    >
    > Only if you're prepared to restrict the use of the tool to superusers
    > (or at least people with whatever privilege that function requires).
    >
    > Admittedly, you can probably feed the data to the backend without
    > use of an intermediate file; but it still requires a working backend
    > connection, which might be a bit of a leap for backup-related tools.
    > I'm sure Robert was envisioning doing this processing inside the tool.
    
    Yeah, exactly. I don't think verifying a backup should require a
    running server, let alone a running server on the same machine where
    the backup is stored and for which you have superuser privileges.
    AFAICS, the only options to make that work with JSON are (1) introduce
    a new hand-coded JSON parser designed for frontend operation, (2) add
    a dependency on an external JSON parser that we can use from frontend
    code, or (3) adapt the existing JSON parser used in the backend so
    that it can also be used in the frontend.
    
    I'd be willing to do (1) -- it wouldn't be the first time I've written
    a JSON parser for PostgreSQL -- but I think it will take an order of
    magnitude more code than using a file with tab-separated columns as
    I've proposed, and I assume that there will be complaints about having
    two JSON parsers in core. I'd also be willing to do (2) if that's the
    consensus, but I'd vote against such an approach if somebody else
    proposed it because (a) I'm not aware of a widely-available library
    upon which we could depend and (b) introducing such a dependency for a
    minor feature like this seems fairly unpalatable to me, and it'd
    probably still be more code than just using a tab-separated file.  I'd
    be willing to do (3) if somebody could explain to me how to solve the
    problems with porting that code to work on the frontend side, but the
    only suggestion so far as to how to do that is to port memory
    contexts, elog/ereport, and presumably encoding handling to work on the
    frontend side. That seems to me to be an unreasonably large lift,
    especially given that we have lots of other files that use ad-hoc
    formats already, and if somebody ever gets around to converting all of
    those to JSON, they can certainly convert this one at the same time.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  75. Re: backup manifests

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-01-02T02:20:11Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > AFAICS, the only options to make that work with JSON are (1) introduce
    > a new hand-coded JSON parser designed for frontend operation, (2) add
    > a dependency on an external JSON parser that we can use from frontend
    > code, or (3) adapt the existing JSON parser used in the backend so
    > that it can also be used in the frontend.
    > ...  I'd
    > be willing to do (3) if somebody could explain to me how to solve the
    > problems with porting that code to work on the frontend side, but the
    > only suggestion so far as to how to do that is to port memory
    > contexts, elog/ereport, and presumably encoding handling to work on the
    > frontend side. That seems to me to be an unreasonably large lift,
    
    Yeah, agreed.  The only consideration that'd make that a remotely
    sane idea is that if somebody did the work, there would be other
    uses for it.  (One that comes to mind immediately is cleaning up
    ecpg's miserably-maintained fork of the backend datetime code.)
    
    But there's no denying that it would be a large amount of work
    (if it's even feasible), and nobody has stepped up to volunteer.
    It's not reasonable to hold up this particular feature waiting
    for that to happen.
    
    If a tab-delimited file can handle this requirement, that seems
    like a sane choice to me.
    
    			regards, tom lane
    
    
    
    
  76. Re: backup manifests

    David Fetter <david@fetter.org> — 2020-01-02T18:03:23Z

    On Wed, Jan 01, 2020 at 08:57:11PM -0500, Robert Haas wrote:
    > On Wed, Jan 1, 2020 at 7:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > David Fetter <david@fetter.org> writes:
    > > > On Wed, Jan 01, 2020 at 01:43:40PM -0500, Robert Haas wrote:
    > > >> So, if someone can suggest to me how I could read JSON from a tool in
    > > >> src/bin without writing a lot of code, I'm all ears.
    > >
    > > > Maybe I'm missing something obvious, but wouldn't combining
    > > > pg_read_file() with a cast to JSONB fix this, as below?
    > >
    > > Only if you're prepared to restrict the use of the tool to superusers
    > > (or at least people with whatever privilege that function requires).
    > >
    > > Admittedly, you can probably feed the data to the backend without
    > > use of an intermediate file; but it still requires a working backend
    > > connection, which might be a bit of a leap for backup-related tools.
    > > I'm sure Robert was envisioning doing this processing inside the tool.
    > 
    > Yeah, exactly. I don't think verifying a backup should require a
    > running server, let alone a running server on the same machine where
    > the backup is stored and for which you have superuser privileges.
    
    Thanks for clarifying the context.
    
    > AFAICS, the only options to make that work with JSON are (1) introduce
    > a new hand-coded JSON parser designed for frontend operation, (2) add
    > a dependency on an external JSON parser that we can use from frontend
    > code, or (3) adapt the existing JSON parser used in the backend so
    > that it can also be used in the frontend.
    > 
    > I'd be willing to do (1) -- it wouldn't be the first time I've written
    > a JSON parser for PostgreSQL -- but I think it will take an order of
    > magnitude more code than using a file with tab-separated columns as
    > I've proposed, and I assume that there will be complaints about having
    > two JSON parsers in core. I'd also be willing to do (2) if that's the
    > consensus, but I'd vote against such an approach if somebody else
    > proposed it because (a) I'm not aware of a widely-available library
    > upon which we could depend and
    
    I believe jq has an excellent one that's available under a suitable
    license.
    
    Making jq a dependency seems like a separate discussion, though. At
    the moment, we don't use git tools like submodule/subtree, and deciding
    which (or whether) seems like a gigantic discussion all on its own.
    
    > (b) introducing such a dependency for a minor feature like this
    > seems fairly unpalatable to me, and it'd probably still be more code
    > than just using a tab-separated file.  I'd be willing to do (3) if
    > somebody could explain to me how to solve the problems with porting
    > that code to work on the frontend side, but the only suggestion so
    > far as to how to do that is to port memory contexts, elog/ereport,
    > and presumably encoding handling to work on the frontend side.
    
    This port has come up several times recently in different contexts.
    How big a chunk of work would it be?  Just so we're clear, I'm not
    suggesting that this port should gate this feature.
    
    > That seems to me to be an unreasonably large lift, especially given
    > that we have lots of other files that use ad-hoc formats already,
    > and if somebody ever gets around to converting all of those to JSON,
    > they can certainly convert this one at the same time.
    
    Would that require some kind of file converter program, or just a
    really loud notice in the release notes?
    
    Best,
    David.
    -- 
    David Fetter <david(at)fetter(dot)org> http://fetter.org/
    Phone: +1 415 235 3778
    
    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
    
    
    
    
  77. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-01-02T18:34:44Z

    On Thu, Jan 2, 2020 at 1:03 PM David Fetter <david@fetter.org> wrote:
    > I believe jq has an excellent one that's available under a suitable
    > license.
    >
    > Making jq a dependency seems like a separate discussion, though. At
    > the moment, we don't use git tools like submodule/subtree, and deciding
    > which (or whether) seems like a gigantic discussion all on its own.
    
    Yep. And it doesn't seem worth it for a relatively small feature like
    this. If we already had it, it might be worth using for a relatively
    small feature like this, but that's a different issue.
    
    > > (b) introducing such a dependency for a minor feature like this
    > > seems fairly unpalatable to me, and it'd probably still be more code
    > > than just using a tab-separated file.  I'd be willing to do (3) if
    > > somebody could explain to me how to solve the problems with porting
    > > that code to work on the frontend side, but the only suggestion so
    > > far as to how to do that is to port memory contexts, elog/ereport,
    > > and presumably encoding handling to work on the frontend side.
    >
    > This port has come up several times recently in different contexts.
    > How big a chunk of work would it be?  Just so we're clear, I'm not
    > suggesting that this port should gate this feature.
    
    I don't really know. It's more of a research project than a coding
    project, at least initially, I think. For instance, psql has its own
    non-local-transfer-of-control mechanism using sigsetjmp(). If you
    wanted to introduce elog/ereport on the frontend, would you make psql
    use it? Or just let psql continue to do what it does now and introduce
    the new mechanism as an option for code going forward? Or try to make
    the two mechanisms work together somehow? Will you start using the
    same error codes that we use in the backend on the frontend side, and
    if so, what will they do, given that what the backend does is just
    embed them in a protocol message that any particular client may or may
    not display? Similarly, should frontend errors support reporting a
    hint, detail, statement, or query? Will it be confusing if backend and
    frontend errors are too similar? If you make memory contexts available
    in the frontend, what if any code will you adapt to use them? There's
    a lot of stuff in src/bin. If you want the encoding machinery on the
    front end, what will you use in place of the backend's idea of the
    "database encoding"? What will you do about dependencies on Datum in
    frontend code? Somebody would need to study all this stuff, come up
    with a tentative set of decisions, write patches, get it all working,
    and then quite possibly have the choices they made get second-guessed
    by other people who have different ideas. If you come up with a really
    good, clean proposal that doesn't provoke any major disagreements, you
    might be able to get this done in a couple of months. If you can't
    come up with something people think is good, or if you're the only one who
    thinks what you come up with is good, it might take years.
    
    It seems to me that in a perfect world a lot of the code we have in
    the backend that is usefully reusable in other contexts would be
    structured so that it doesn't have random dependencies on backend-only
    machinery like memory contexts and elog/ereport. For example, if you
    write a function that returns an error message rather than throwing an
    error, then you can arrange to call that from either frontend or
    backend code and the caller can do whatever it wishes with that error
    text. However, once you've written your code so that an error gets
    thrown six layers down in the call stack, it's really hard to
    rearrange that so that the error is returned, and if you are
    populating not only the primary error message but error code, detail,
    hint, etc. it's almost impractical to think that you can rearrange
    things that way anyway. And generally you want to be populating those
    things, as a best practice for backend code. So while in theory I kind
    of like the idea of adapting the JSON parser we've already got to just
    not depend so heavily on a backend environment, it's not really very
    clear how to actually make that happen. At least not to me.
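
    To make the "return an error message rather than throwing" idea
    concrete, here is a minimal sketch. It is not taken from any patch in
    this thread: parse_file_size() and its messages are hypothetical, and
    it only illustrates a routine with no dependency on elog/ereport or
    memory contexts, so that either a frontend or a backend caller can
    decide what to do with the error text.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: parse a non-negative decimal number.
 *
 * On success, store the value in *result and return true.  On failure,
 * store a constant message in *errmsg and return false.  Nothing is
 * thrown and nothing is allocated, so the caller chooses how to report.
 */
static bool
parse_file_size(const char *str, unsigned long *result, const char **errmsg)
{
    unsigned long value = 0;

    assert(str != NULL && result != NULL && errmsg != NULL);

    if (*str == '\0')
    {
        *errmsg = "file size is empty";
        return false;
    }

    for (; *str != '\0'; str++)
    {
        unsigned long digit;

        if (*str < '0' || *str > '9')
        {
            *errmsg = "file size contains a non-digit character";
            return false;
        }

        digit = (unsigned long) (*str - '0');

        /* Reject values that would overflow before computing them. */
        if (value > (ULONG_MAX - digit) / 10)
        {
            *errmsg = "file size is out of range";
            return false;
        }
        value = value * 10 + digit;
    }

    *result = value;
    return true;
}
```

    A backend caller could pass the returned text to ereport(), while a
    tool in src/bin could print it with its own logging routine. This
    works only because the message is the sole piece of error
    information; as noted above, it gets much harder once error code,
    detail, and hint have to travel along with it.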
    
    > > That seems to me to be an unreasonably large lift, especially given
    > > that we have lots of other files that use ad-hoc formats already,
    > > and if somebody ever gets around to converting all of those to JSON,
    > > they can certainly convert this one at the same time.
    >
    > Would that require some kind of file converter program, or just a
    > really loud notice in the release notes?
    
    Maybe neither. I don't see why it wouldn't be possible to be
    backward-compatible just by keeping the old code around and having it
    parse as far as the version number. Then it could decide to continue
    on with the old code or call the new code, depending.
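
    A rough sketch of what I mean, with the caveat that everything named
    here is made up for illustration: the "Manifest-Version" header
    keyword, the status codes, and the two stub parsers are assumptions,
    not the format or code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum
{
    MANIFEST_PARSED_OLD,        /* handled by the old-format code */
    MANIFEST_PARSED_NEW,        /* handled by the new-format code */
    MANIFEST_BAD_HEADER,
    MANIFEST_UNKNOWN_VERSION
} ManifestStatus;

/* Stand-ins for the real parsers; each gets the text after the header. */
static ManifestStatus
parse_old_format(const char *body)
{
    (void) body;
    return MANIFEST_PARSED_OLD;
}

static ManifestStatus
parse_new_format(const char *body)
{
    (void) body;
    return MANIFEST_PARSED_NEW;
}

/*
 * Parse only as far as the version number on the first line, then hand
 * the rest of the file to whichever parser understands that version.
 */
static ManifestStatus
parse_manifest(const char *text)
{
    static const char header[] = "Manifest-Version ";
    const size_t hlen = sizeof(header) - 1;
    char       *end;
    long        version;

    assert(text != NULL);

    if (strncmp(text, header, hlen) != 0)
        return MANIFEST_BAD_HEADER;

    version = strtol(text + hlen, &end, 10);
    if (end == text + hlen || *end != '\n')
        return MANIFEST_BAD_HEADER;

    switch (version)
    {
        case 1:
            return parse_old_format(end + 1);
        case 2:
            return parse_new_format(end + 1);
        default:
            return MANIFEST_UNKNOWN_VERSION;
    }
}
```

    The only thing the two formats would have to share under this scheme
    is the first line.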
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  78. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-01-03T12:41:45Z

    Thank you for review comments.
    
    On Mon, Dec 30, 2019 at 11:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Tue, Dec 24, 2019 at 5:42 AM Suraj Kharage
    > <suraj.kharage@enterprisedb.com> wrote:
    > > To examine the first word of each line, I am using below check:
    > > if (strncmp(line, "File", 4) == 0)
    > > {
    > > ..
    > > }
    > > else if (strncmp(line, "Manifest-Checksum", 17) == 0)
    > > {
    > > ..
    > > }
    > > else
    > >     error
    > >
    > > strncmp might not be right here, but we cannot put '\0' in the middle of
    > > the line (to find the first word) before we recognize the line type.
    > > All the lines except the last one (where we have the manifest checksum)
    > > are fed to the checksum machinery to calculate the manifest checksum,
    > > so update_checksum() should be called after recognizing the type, i.e.
    > > if it is a File type record. Do you see any issues with this?
    >
    > I see the problem, but I don't think your solution is right, because
    > the first test would pass if the line said FiletMignon rather than
    > just File, which we certainly don't want. You've got to write the test
    > so that you're checking against the whole first word, not just some
    > prefix of it. There are several possible ways to accomplish that, but
    > this isn't one of them.
    >
    
    Yeah. Fixed in the attached patch.
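
    For illustration only, one possible way to compare against the whole
    first word without writing a '\0' into the line looks like the sketch
    below. This is not the code from the attached patch: the helper name
    first_word_is() and the choice of space, tab, and newline as word
    delimiters are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the first word of the line that starts at 'line' and
 * ends just before 'end' is exactly 'keyword'.
 *
 * The buffer is never modified, so the untouched line can still be fed
 * to the checksum code after its type has been recognized.
 */
static bool
first_word_is(const char *line, const char *end, const char *keyword)
{
    size_t      kwlen = strlen(keyword);
    const char *p = line;

    assert(line != NULL && end != NULL && line <= end);

    /* Find the end of the first word, staying inside the line. */
    while (p < end && *p != ' ' && *p != '\t' && *p != '\n')
        p++;

    /* The lengths must agree, not just the leading bytes. */
    return (size_t) (p - line) == kwlen &&
        memcmp(line, keyword, kwlen) == 0;
}
```

    With this check, a line beginning with "FiletMignon" is not accepted
    as a File record, because the word length (11) differs from the
    keyword length (4).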
    
    
    >
    > >> + pg_log_error("invalid record found in \"%s\"", manifest_path);
    > >>
    > >> Error message needs work.
    >
    > Looks better now, but you have a message that says "invalid checksums
    > type \"%s\" found in \"%s\"". This is wrong because checksums would
    > need to be singular in this context (checksum). Also, I think it could
    > be better phrased as "manifest file \"%s\" specifies unknown checksum
    > algorithm \"%s\" at line %d".
    >
    
    Corrected.
    
    
    >
    > >> Your function names should be consistent with the surrounding style,
    > >> and with each other, as far as possible. Three different conventions
    > >> within the same patch and source file seems over the top.
    >
    > This appears to be fixed.
    >
    > >> Also keep in mind that you're not writing code in a vacuum. There's a
    > >> whole file of code here, and around that, a whole project.
    > >> scan_data_directory() is a good example of a function whose name is
    > >> clearly too generic. It's not a general-purpose function for scanning
    > >> the data directory; it's specifically a support function for verifying
    > >> a backup. Yet, the name gives no hint of this.
    >
    > But this appears not to be fixed.
    >
    
    I have changed this function name to "VerifyDir", following the pattern
    of sendDir and sendFile in basebackup.c.
    
    
    >
    > >> if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
    > >> "/backup_manifest") == 0)
    > >>     continue;
    > >
    > > Thanks for the suggestion. Corrected as per the above inputs.
    >
    > You need a comment here, like "Ignore the possible presence of a
    > backup_manifest file and/or a pg_wal directory in the backup being
    > verified." and then maybe another sentence explaining why that's the
    > right thing to do.
    >
    
    Corrected.
    
    
    >
    > +             * The forth parameter to VerifyFile() will pass the relative path
    > +             * of file to match exactly with the filename present in manifest.
    >
    > I don't know what this comment is trying to tell me, which might be
    > something you want to try to fix. However, I'm pretty sure it's
    > supposed to say "fourth" not "forth".
    >
    
    I have changed the fourth parameter of VerifyFile(), so my comment
    there is no longer valid.
    
    
    >
    > >> and the result would be that everything inside that long if-block is
    > >> now at the top level of the function and indented one level less. And
    > >> I think if you look at this function you'll see a way that you can
    > >> save a *second* level of indentation for much of that code. Please
    > >> check the rest of the patch for similar cases, too.
    > >
    > > Makes sense. Corrected.
    >
    > I don't agree. A large chunk of VerifyFile() is still subject to a
    > quite unnecessary level of indentation.
    >
    
    Yeah, corrected.
    
    
    >
    > > I have added a check for EOF, but not sure whether that would be right here.
    > > Do we need to check the length of buffer as well?
    >
    > That's really, really not right. EOF is not a character that can
    > appear in the buffer. It's chosen on purpose to be a value that never
    > matches any actual character when both the character and the EOF value
    > are regarded as values of type 'int'. That guarantee doesn't apply
    > here though because you're dealing with values of type 'char'. So what
    > this code is doing is searching for an impossible value using
    > incorrect logic, which has very little to do with the actual need
    > here, which is to avoid running off the end of the buffer. To see what
    > the problem is, try creating a file with no terminating newline, like
    > this:
    >
    > echo -n this file has no terminating newline >> some-file
    >
    > I doubt it will be very hard to make this patch crash horribly. Even
    > if you can't, it seems pretty clear that the logic isn't right.
    >
    > I don't really know what the \0 tests in NextLine() and NextWord()
    > think they're doing either. If there's a \0 in the buffer before you
    > add one, it was in the original input data, and pretending like that
    > marks a word or line boundary seems like a fairly arbitrary choice.
    >
    > What I suggest is:
    >
    > (1) Allocate one byte more than the file size for the buffer that's
    > going to hold the file, so that if you write a \0 just after the last
    > byte of the file, you don't overrun the allocated buffer.
    >
    > (2) Compute char *endptr = buf + len.
    >
    > (3) Pass endptr to NextLine and NextWord and write the loop condition
    > something like while (*buf != '\n' && buf < endptr).
    >
    
    Thanks for the suggestion. Corrected as per above suggestion.
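
    For reference, the three steps suggested above can be sketched roughly
    as follows. This is an illustration rather than the code in the
    attached patch; copy_with_terminator() and next_line() are
    hypothetical names, and the sketch checks the bound before
    dereferencing the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Step (1): hold 'len' bytes of file data in a buffer that is one byte
 * larger, so a '\0' can be stored just past the last byte of the file.
 * Returns NULL if memory cannot be allocated.
 */
static char *
copy_with_terminator(const char *data, size_t len)
{
    char       *buf = malloc(len + 1);

    if (buf == NULL)
        return NULL;
    memcpy(buf, data, len);
    buf[len] = '\0';
    return buf;
}

/*
 * Step (3): terminate the line that starts at 'buf' and return a pointer
 * to the start of the next line, or 'endptr' if there is none.  The scan
 * never moves past 'endptr', which the caller computes as buf + len
 * (step 2), so a file without a trailing newline is handled safely.
 */
static char *
next_line(char *buf, char *endptr)
{
    assert(buf != NULL && buf <= endptr);

    while (buf < endptr && *buf != '\n')
        buf++;

    if (buf < endptr)
        *buf++ = '\0';          /* replace the newline and step past it */

    return buf;
}
```

    If the last line has no newline, the loop stops at endptr, where the
    extra byte from step (1) already holds the terminating '\0'.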
    
    
    >
    > Other notes:
    >
    > - The error handling in ReadFileIntoBuffer() does not seem to consider
    > the case of a short read. If you look through the source tree, you can
    > find examples of how we normally handle that.
    >
    
    yeah, corrected.
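
    As an illustration of what handling a short read can look like, here
    is a minimal sketch using plain POSIX read(). It is an assumption
    about one reasonable shape for the logic, not the patch's
    ReadFileIntoBuffer() and not necessarily the convention used elsewhere
    in the tree; read_fully() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read exactly 'len' bytes from 'fd' into 'buf'.
 *
 * Returns true on success.  On failure returns false, with errno set by
 * read() for a real I/O error, or errno == 0 if the file ended before
 * 'len' bytes were available (a short read).
 */
static bool
read_fully(int fd, char *buf, size_t len)
{
    size_t      done = 0;

    assert(buf != NULL);

    while (done < len)
    {
        ssize_t     rc = read(fd, buf + done, len - done);

        if (rc < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            return false;       /* real error, errno says why */
        }
        if (rc == 0)
        {
            errno = 0;          /* end of file before 'len' bytes */
            return false;
        }
        done += (size_t) rc;
    }
    return true;
}
```

    The caller can then report "could not read file" when errno is set,
    and a message giving the expected size when it is zero.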
    
    
    >
    > - Putting string_hash_sdbm() into encode.c seems like a surprising
    > choice. What does this have to do with encoding anything? And why is
    > it going into src/common at all if it's only intended for frontend
    > use?
    >
    I thought this function could be used in the backend as well, as we
    do with simplehash, so I kept it in src/common.
    After your comment, I have moved this to pg_basebackup.c.
    I think this can be kept in a common place, but not in
    "src/common/encode.c". Thoughts?
    
    
    >
    > - It seems like whether or not any problems were found while verifying
    > the manifest ought to affect the exit status of pg_basebackup. I'm not
    > exactly sure what exit codes ought to be used, but you could look for
    > similar precedents. Document this, too.
    >
    I might not be getting this completely right, but as per my observation,
    if any error occurs, pg_basebackup terminates with exit(1),
    whereas in the normal case (without an error), the main function returns 0.
    The "help" and "version" options terminate normally with exit(0).
    So in our case, exit(0) would be appropriate. Please correct me if I
    misunderstood anything.
    
    
    >
    > - As much as possible let's have errors in the manifest file report
    > the line number, and let's also try to make them more specific, e.g.
    > instead of "invalid manifest record found in \"%s\"", perhaps
    > "manifest file \"%s\" contains invalid keyword \"%s\" at line %d".
    >
    yeah, added line number at possible places.
    
    I have also fixed a few comments given by Jeevan Chalke off-list.
    
    Please find attached v7 patches and let me know your comments.
    -- 
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  79. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-01-03T16:44:24Z

    Greetings,
    
    * Tom Lane (tgl@sss.pgh.pa.us) wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    > > AFAICS, the only options to make that work with JSON are (1) introduce
    > > a new hand-coded JSON parser designed for frontend operation, (2) add
    > > a dependency on an external JSON parser that we can use from frontend
    > > code, or (3) adapt the existing JSON parser used in the backend so
    > > that it can also be used in the frontend.
    > > ...  I'd
    > > be willing to do (3) if somebody could explain to me how to solve the
    > > problems with porting that code to work on the frontend side, but the
    > > only suggestion so far as to how to do that is to port memory
    > > contexts, elog/ereport, and presumably encoding handling to work on the
    > > frontend side. That seems to me to be an unreasonably large lift,
    > 
    > Yeah, agreed.  The only consideration that'd make that a remotely
    > sane idea is that if somebody did the work, there would be other
    > uses for it.  (One that comes to mind immediately is cleaning up
    > ecpg's miserably-maintained fork of the backend datetime code.)
    > 
    > But there's no denying that it would be a large amount of work
    > (if it's even feasible), and nobody has stepped up to volunteer.
    > It's not reasonable to hold up this particular feature waiting
    > for that to happen.
    
    Sure, it'd be work, and for "adding a simple backup manifest", maybe too
    much to be worth considering ... but that's not what is going on here,
    is it?  Are we really *just* going to add a backup manifest to
    pg_basebackup and call it done?  That's not what I understood the goal
    here to be but rather to start doing a lot of other things with
    pg_basebackup beyond just having a manifest and if you think just a bit
    farther down the path, I think you start to realize that you're going to
    need this base set of capabilities to get to a point where pg_basebackup
    (or whatever it ends up being called) is able to have the kind of
    capabilities that exist in other PG backup software already.
    
    I'm sure I don't need to say where to find it, but I can point you to a
    pretty good example of a similar effort, and we didn't start with "build
    a manifest into a custom format" as the first thing implemented, but
    rather a great deal of work was first put into building out things like
    logging, memory management/contexts, error handling/try-catch, having a
    string type, a variant type, etc.
    
    In some ways, it's kind of impressive what we've got in our front-end
    tools even though we don't have these things, really, and certainly not
    all in one nice library that they all use...  but at the same time, I
    think that lack has also held those tools back, pg_basebackup among
    them.
    
    Anyway, off my high horse, I'll just say I agree w/ David and David wrt
    using JSON for this over hacking together yet another format.  We didn't
    do that as thoroughly as we should have (we've got a JSON parser and all
    that, and use JSON quite a bit, but the actual manifest format is a mix
    of ini-style and JSON, because it's got more in it than just a list of
    files, something that I suspect will also end up being true of this down
    the road and for good reasons, and we started with the ini format and
    discovered it sucked and then started embedding JSON in it...), and
    we've come to realize that was a bad idea, and intend to fix it in our
    next manifest major version bump.  Would be unfortunate to see PG making
    that same mistake.  
    
    Thanks,
    
    Stephen
    
  80. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-01-03T16:51:06Z

    On Fri, Jan 3, 2020 at 11:44 AM Stephen Frost <sfrost@snowman.net> wrote:
    > Sure, it'd be work, and for "adding a simple backup manifest", maybe too
    > much to be worth considering ... but that's not what is going on here,
    > is it?  Are we really *just* going to add a backup manifest to
    > pg_basebackup and call it done?  That's not what I understood the goal
    > here to be but rather to start doing a lot of other things with
    > pg_basebackup beyond just having a manifest and if you think just a bit
    > farther down the path, I think you start to realize that you're going to
    > need this base set of capabilities to get to a point where pg_basebackup
    > (or whatever it ends up being called) is able to have the kind of
    > capabilities that exist in other PG backup software already.
    
    I have no development plans for pg_basebackup that require extending
    the format of the manifest file in any significant way, and am not
    aware that anyone else has such plans either. If you are aware of
    something I'm not, or if anyone else is, it would be helpful to know
    about it.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  81. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-01-03T17:01:23Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Fri, Jan 3, 2020 at 11:44 AM Stephen Frost <sfrost@snowman.net> wrote:
    > > Sure, it'd be work, and for "adding a simple backup manifest", maybe too
    > > much to be worth considering ... but that's not what is going on here,
    > > is it?  Are we really *just* going to add a backup manifest to
    > > pg_basebackup and call it done?  That's not what I understood the goal
    > > here to be but rather to start doing a lot of other things with
    > > pg_basebackup beyond just having a manifest and if you think just a bit
    > > farther down the path, I think you start to realize that you're going to
    > > need this base set of capabilities to get to a point where pg_basebackup
    > > (or whatever it ends up being called) is able to have the kind of
    > > capabilities that exist in other PG backup software already.
    > 
    > I have no development plans for pg_basebackup that require extending
    > the format of the manifest file in any significant way, and am not
    > aware that anyone else has such plans either. If you are aware of
    > something I'm not, or if anyone else is, it would be helpful to know
    > about it.
    
    You're certainly intending to do *something* with the manifest, and
    while I appreciate that you feel you've come up with a complete use-case
    that this simple manifest will be sufficient for, I frankly doubt
    that'll actually be the case.  Not long ago it wasn't completely clear
    that a manifest at *all* was even going to be necessary for the specific
    use-case you had in mind (I'll admit I wasn't 100% sure myself at the
    time either), but now that we're down the road of having one, I can't
    agree with the blanket assumption that we're never going to want to
    extend it, or even that it won't be necessary to add to it before this
    particular use-case is fully addressed.
    
    And the same goes for the other things that were discussed up-thread
    regarding memory context and error handling and such.
    
    I'm happy to outline the other things that one *might* want to include
    in a manifest, if that would be helpful, but I'll also say that I'm not
    planning to hack on adding that to pg_basebackup in the next month or
    two.  Once we've actually got a manifest, if it's in an extendable
    format, I could certainly see people wanting to do more with it though.
    
    Thanks,
    
    Stephen
    
  82. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-01-03T17:37:47Z

    On Fri, Jan 3, 2020 at 12:01 PM Stephen Frost <sfrost@snowman.net> wrote:
    > You're certainly intending to do *something* with the manifest, and
    > while I appreciate that you feel you've come up with a complete use-case
    > that this simple manifest will be sufficient for, I frankly doubt
    > that'll actually be the case.  Not long ago it wasn't completely clear
    > that a manifest at *all* was even going to be necessary for the specific
    > use-case you had in mind (I'll admit I wasn't 100% sure myself at the
    > time either), but now that we're down the road of having one, I can't
    > agree with the blanket assumption that we're never going to want to
    > extend it, or even that it won't be necessary to add to it before this
    > particular use-case is fully addressed.
    >
    > And the same goes for the other things that were discussed up-thread
    > regarding memory context and error handling and such.
    
    Well, I don't know how to make you happy here. It looks to me like
    insisting on a JSON-format manifest will likely mean that this doesn't
    get into PG13 or PG14 or probably PG15, because a port of all that
    machinery to work in frontend code will be neither simple nor quick.
    If you want this to happen for this release, you've got to be willing
    to settle for something that can be implemented in the time we have.
    
    I'm not sure whether what you and David are arguing boils down to
    thinking that I'm wrong when I say that doing that is hard, or whether
    you know it's hard but you just don't care because you'd rather see
    the feature go nowhere than use a format other than JSON. I don't see
    much difference between the latter position and a desire to block the
    feature permanently. And if it's the former then you have yet to make
    any suggestions for how to get it done with reasonable effort.
    
    > I'm happy to outline the other things that one *might* want to include
    > in a manifest, if that would be helpful, but I'll also say that I'm not
    > planning to hack on adding that to pg_basebackup in the next month or
    > two.  Once we've actually got a manifest, if it's in an extendable
    > format, I could certainly see people wanting to do more with it though.
    
    Well, as I say, it's got a version number, so somebody can always come
    along with something better. I really think this is a red herring,
    though. If somebody wants to track additional data about a backup,
    there's no rule that they have to include it in the backup manifest. A
    backup management solution might want to track things like who
    initiated the backup, or for what purpose it was taken, or the IP
    address of the machine where it was taken, or the backup system's own
    identifier, but any of that stuff could (and probably should) be
    stored in a file managed by that tool rather than in the server's own
    manifest.  As to the per-file information, I believe that David and I
    discussed that and the list of fields that I had seemed relatively OK,
    and I believe I added at least one (mtime) per his suggestion. Of
    course, it's a tab-separated file; more fields could easily be added
    at the end, separated by tabs. Or, you could modify the file so that
    after each "File" line you had another line with supplementary
    information about that file, beginning with some other word. Or, you
    could convert the whole file to JSON for v2 of the manifest, if,
    contrary to my belief, that's a fairly simple thing to do. There are
    probably other approaches as well. This file format has already had
    considerably more thought about forward-compatibility than
    pg_hba.conf, which has been retrofitted multiple times without
    breaking the world.
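
    To illustrate the forward-compatibility argument above, here is a minimal sketch of a reader for tab-separated "File" lines that tolerates extra fields appended at the end by a newer writer. The column names and sample values are assumptions made up for the example; they are not the manifest format proposed in this thread.

```python
# Hypothetical column layout; the real manifest fields are not specified here.
KNOWN_FIELDS = ("path", "size", "mtime", "checksum")


def parse_file_line(line):
    """Parse one tab-separated 'File' line, ignoring unknown trailing fields."""
    parts = line.rstrip("\n").split("\t")
    if parts[0] != "File":
        # Some other line type, e.g. a supplementary record for a file.
        return None
    values = parts[1:]
    if len(values) < len(KNOWN_FIELDS):
        raise ValueError("too few fields: %r" % line)
    # zip() stops at the shorter sequence, so fields appended by a newer
    # writer are silently dropped by this older reader.
    record = dict(zip(KNOWN_FIELDS, values))
    record["size"] = int(record["size"])
    return record


# The second line carries one extra, made-up field at the end.
v1_line = "File\tbase/1/1234\t8192\t2020-01-03 16:51:06 GMT\tabc123\n"
v2_line = "File\tbase/1/1234\t8192\t2020-01-03 16:51:06 GMT\tabc123\textra\n"
```

    Both sample lines yield the same record, which is the sense in which trailing fields could be added without breaking a reader written this way.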
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  83. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-01-03T19:35:59Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Fri, Jan 3, 2020 at 12:01 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > You're certainly intending to do *something* with the manifest, and
    > > while I appreciate that you feel you've come up with a complete use-case
    > > that this simple manifest will be sufficient for, I frankly doubt
    > > that'll actually be the case.  Not long ago it wasn't completely clear
    > > that a manifest at *all* was even going to be necessary for the specific
    > > use-case you had in mind (I'll admit I wasn't 100% sure myself at the
    > > time either), but now that we're down the road of having one, I can't
    > > agree with the blanket assumption that we're never going to want to
    > > extend it, or even that it won't be necessary to add to it before this
    > > particular use-case is fully addressed.
    > >
    > > And the same goes for the other things that were discussed up-thread
    > > regarding memory context and error handling and such.
    > 
    > Well, I don't know how to make you happy here.
    
    I suppose I should admit that, first off, I don't feel you're required
    to make me happy, and I don't think it's necessary to make me happy to
    get this feature into PG.
    
    Since you expressed that interest though, I'll go out on a limb and say
    that what would make me *really* happy would be to think about where the
    project should be taking pg_basebackup, what we should be working on
    *today* to address the concerns we hear about from our users, and to
    consider the best way to implement solutions to what they're actively
    asking for a core backup solution to be providing.  I get that maybe
    that isn't how the world works and that sometimes we have people who
    write our paychecks wanting us to work on something else, and yes, I'm
    sure there are some users who are asking for this specific thing but I
    certainly don't think it's a common ask of pg_basebackup or what users
    feel is missing from the backup options we offer in core; we had users
    on this list specifically saying they *wouldn't* use this feature
    (referring to the differential backup stuff, of course), in fact,
    because of the things which are missing, which is pretty darn rare.
    
    That's what would make *me* happy.  Even some comments about how to
    *get* there while also working towards these features would be likely
    to make me happy.  Instead, I feel like we're being told that we need
    this feature badly in v13 and we're going to cut bait and do whatever
    is necessary to get us there.
    
    > It looks to me like
    > insisting on a JSON-format manifest will likely mean that this doesn't
    > get into PG13 or PG14 or probably PG15, because a port of all that
    > machinery to work in frontend code will be neither simple nor quick.
    
    I certainly understand that these things take time, sometimes quite a
    bit of it as the past 2 years have shown in this other little side
    project, and that was hacking without having to go through the much
    larger effort involved in getting things into PG core.  That doesn't
    mean that kind of effort isn't worthwhile or that, because something is
    a bunch of work, we shouldn't spend the time on it.  I do feel what
    you're after here is a multi-year project, and I've said before that I
    don't agree that this is a feature (the differential backup with
    pg_basebackup thing) that makes any sense going into PG at this time,
    but I'm also not trying to block this feature, just to share the
    experience that we've gotten from working in this area for quite a
    while and hopefully help guide the effort in PG away from pitfalls and
    in a good direction long-term.
    
    > If you want this to happen for this release, you've got to be willing
    > to settle for something that can be implemented in the time we have.
    
    I'm not sure what you're expecting here, but for my part, at least, I'm
    not going to be terribly upset if this feature doesn't make this release
    because there's an agreement and understanding that the current
    direction isn't a good long-term solution.  Nor am I going to be
    terribly upset about the time that's been spent on this particular
    approach given that there's been no shortage of people commenting that
    they'd rather see an extensible format, like JSON, and has been for
    quite some time.
    
    All that said- one thing we've done is to consider that *we* are the
    ones who are writing the JSON, while also being the ones to read it- we
    don't need the parsing side to understand and deal with *any* JSON that
    might exist out there, just whatever it is the server creates/created.
    It may be possible to use that to simplify the parser, or perhaps at
    least to accept that if it ends up being given something else that it
    might not perform as well with it.  I'm not sure how helpful that will
    be to you, but I recall David finding it a helpful thought.
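
    One way to picture the "we only parse what we write" idea above is a reader that accepts a deliberately tiny JSON subset and rejects everything else. The sketch below assumes, purely for illustration, that the writer emits flat objects whose values are integers or strings without escape sequences; neither that restriction nor the key names come from this thread.

```python
import re

# Matches only the shapes the hypothetical writer produces:
#   "key": "string-without-escapes"   or   "key": 123
_PAIR = re.compile(r'\s*"([^"\\]*)"\s*:\s*(?:"([^"\\]*)"|(-?\d+))\s*')


def parse_flat_object(text):
    """Parse a flat JSON object of unescaped strings and integers."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("not a flat object")
    body = text[1:-1]
    result = {}
    if body.strip() == "":
        return result
    pos = 0
    while True:
        m = _PAIR.match(body, pos)
        if m is None:
            # Nested objects, arrays, escapes, floats etc. all end up here.
            raise ValueError("unsupported JSON at offset %d" % pos)
        key, string_value, int_value = m.group(1), m.group(2), m.group(3)
        result[key] = string_value if string_value is not None else int(int_value)
        pos = m.end()
        if pos == len(body):
            return result
        if body[pos] != ",":
            raise ValueError("expected ',' at offset %d" % pos)
        pos += 1


sample = '{"Path": "base/1/1234", "Size": 8192}'
```

    Anything outside the subset raises an error rather than being parsed, so this is not a general JSON parser and says nothing about how hard a full one would be to write in frontend C code.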
    
    > I'm not sure whether what you and David are arguing boils down to
    > thinking that I'm wrong when I say that doing that is hard, or whether
    > you know it's hard but you just don't care because you'd rather see
    > the feature go nowhere than use a format other than JSON. I don't see
    > much difference between the latter position and a desire to block the
    > feature permanently. And if it's the former then you have yet to make
    > any suggestions for how to get it done with reasonable effort.
    
    There seems to be a great deal of daylight between the two positions
    you're proposing I might have (as I don't speak for David..).
    
    I *do* think there's a lot of work that would need to be done here to
    make this a good solution.  I'm *not* completely against other formats
    besides JSON.  Even more so though, I am *not* arguing that this
    feature should go 'nowhere', whether it uses JSON or not.
    
    What I don't care for is having a hand-hacked inflexible format that's
    going to require everyone down the road to implement their own parser
    for it and bespoke code for every version of the custom format that
    there ends up being, *including* PG core, to be clear.  Whatever utility
    is going to be utilizing this manifest, it's going to need to support
    older versions, just like pg_dump deals with older versions of custom
    format dumps (though we still get people complaining about not being
    able to use older tools with newer dumps- it'd be awful nice if we
    could use JSON, or something, and then just *add* things that wouldn't
    break older tools, except for the rare case where we don't have a
    choice..).  Not to mention the debugging grief and such, since we can't
    just use a tool like jq to check out what's going on.  
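
    The extensibility point above can be sketched as follows: a reader that looks up only the keys it knows keeps working when a newer writer adds keys. The key names ("Path", "Size", "Owner") are hypothetical and chosen only for the example.

```python
import json


def read_file_entry_v1(text):
    """An 'older tool': reads only the keys it knows and ignores the rest."""
    obj = json.loads(text)
    return {"path": obj["Path"], "size": obj["Size"]}


# Entry as an older writer might emit it.
old_manifest_entry = '{"Path": "base/1/1234", "Size": 8192}'
# Entry from a hypothetical newer writer that added one key.
new_manifest_entry = '{"Path": "base/1/1234", "Size": 8192, "Owner": "postgres"}'
```

    The older reader returns the same result for both entries; only a change that removes or redefines an existing key would break it.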
    
    As to the reference to pg_hba.conf- I don't think the packagers would
    necessarily agree that there's been little grief around that, but even
    so, a given pg_hba.conf is only going to be used with a given major
    version and, sure, it might have to be updated to that newer major
    version's format if we change the format and someone copies the old
    version to the new version, but that's during a major version upgrade of
    the server, and at least newer tools don't have to deal with the older
    pg_hba.conf version.
    
    Also, pg_hba.conf doesn't seem like a terribly good example in any case-
    the last time the actual structure of that file was changed in a
    breaking way was in 2002 when the 'user' column was added, and the
    example pg_hba.conf from that commit works just fine with PG12, it
    seems, based on some quick tests.  There have been other
    backwards-incompatible changes, of course, the last being 6 years ago, I
    think, when 'krb5' was removed.  I suppose there is some chance that you
    might have a PG12-configured pg_hba.conf and you try copying that back
    to a PG11 or PG10 server and it doesn't work, but that strikes me as far
    less of an issue than trying to read a PG12 backup with a PG11 tool,
    which we know people do because they complain on the lists about it with
    pg_dump/pg_restore.
    
    Thanks,
    
    Stephen
    
  84. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-01-07T18:05:33Z

    On Fri, Jan 3, 2020 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > Well, I don't know how to make you happy here.
    >
    > I suppose I should admit that, first off, I don't feel you're required
    > to make me happy, and I don't think it's necessary to make me happy to
    > get this feature into PG.
    
    Fair enough. That is gracious of you, but I would like to try to make
    you happy if it is possible to do so.
    
    > Since you expressed that interest though, I'll go out on a limb and say
    > that what would make me *really* happy would be to think about where the
    > project should be taking pg_basebackup, what we should be working on
    > *today* to address the concerns we hear about from our users, and to
    > consider the best way to implement solutions to what they're actively
    > asking for a core backup solution to be providing.  I get that maybe
    > that isn't how the world works and that sometimes we have people who
    > write our paychecks wanting us to work on something else, and yes, I'm
    > sure there are some users who are asking for this specific thing but I
    > certainly don't think it's a common ask of pg_basebackup or what users
    > feel is missing from the backup options we offer in core; we had users
    > on this list specifically saying they *wouldn't* use this feature
    > (referring to the differential backup stuff, of course), in fact,
    > because of the things which are missing, which is pretty darn rare.
    
    Well, I mean, what you seem to be suggesting here is that somebody is
    driving me with a stick to do something that I don't really like but
    have to do because otherwise I won't be able to make rent, but that's
    actually not the case. I genuinely believe that this is a good design,
    and it's driven by me, not some shadowy conglomerate of EnterpriseDB
    executives who are out to make PostgreSQL suck. If I'm wrong and the
    design sucks, that's again not the fault of shadowy EnterpriseDB
    executives; it's my fault. Incidentally, my boss is not very shadowy
    anyhow; he's a super-nice guy, and a major reason why I work here. :-)
    
    I don't think the issue here is that I haven't thought about what
    users want, but that not everybody wants the same thing, and it
    seems like the people with whom I interact want somewhat different
    things than those with whom you interact. EnterpriseDB has an existing
    tool that does parallel and block-level incremental backup, and I
    started out with the goal of providing those same capabilities in
    core. They are quite popular with EnterpriseDB customers, and I'd like
    to make them more widely available and, as far as I can, improve on
    them. From our previous discussion and from a (brief) look at
    pgbackrest, I gather that the interests of your customers are somewhat
    different. Apparently, block-level incremental backup isn't quite as
    important to your customers, perhaps because you've already got
    file-level incremental backup, but various other things like
    encryption and backup verification are extremely important, and you've
    got a set of ideas about what would be valuable in the future which
    I'm sure is based on real input from your customers. I hope you pursue
    those ideas, and I hope you do it in core rather than in a separate
    piece of software, but that's up to you. Meanwhile, I think that if I
    have somewhat different ideas about what I'd like to pursue, that
    ought to be just fine. And I don't think it is unreasonable to hope
    that you'll acknowledge my goals as legitimate even if you have
    different ones.
    
    I want to point out that my idea about how to do all of this has
    shifted by a considerable amount based on the input that you and David
    have provided. My original design didn't involve a backup manifest,
    but now it does. That turned out to be necessary, but it was also
    something you suggested, and something where I asked and took advice
    on what ought to go into it. Likewise, you suggested that the process
    of taking the backup should involve giving the client more control
    rather than trying to do everything on the server side, and that is
    now the design which I plan to pursue. You suggested that because it
    would be more advantageous for out-of-core backup tools, such as
    pgbackrest, and I acknowledge that as a benefit and I think we're
    headed in that direction. I am not doing a single thing which, to my
    knowledge, blocks anything that you might want to do with
    pg_basebackup in the future. I have accepted as much of your input as
    I believe that I can without killing the project off completely. To go
    further, I'd have to either accept years of delay or abandon my
    priorities entirely and pursue yours.
    
    > That's what would make *me* happy.  Even some comments about how to
    > *get* there while also working towards these features would be likely
    > to make me happy.  Instead, I feel like we're being told that we need
    > this feature badly in v13 and we're going to cut bait and do whatever
    > is necessary to get us there.
    
    This seems like a really unfair accusation given how much work I've
    put into trying to satisfy you and David. If this patch, the parallel
    full backup patch, and the incremental backup patch were all to get
    committed to v13, an outcome which seems pretty unlikely to me at this
    point, then you would have a very significant number of things that
    you have requested in the course of the various discussions, and
    AFAICS the only thing you'd have that you don't want is the need to
    parse the manifest file using while (<>) { @a = split /\t/, $_ } rather
    than $a = parse_json(join '', <>). You would, for example, have the
    ability to request an individual file from the server rather than a
    complete tarball. Maybe the command that requests a file would lack an
    encryption option, something which IIUC you would like to have, but
    that certainly does not leave you worse off. It is easier to add an
    encryption option to a command which you already have than it is to
    invent a whole new command -- or really several whole new commands,
    since such a command is not really usable unless you also have
    facilities to start and stop a backup through the replication
    protocol.
    
    All that being said, I continue to maintain that insisting on JSON is
    not a reasonable request. It is not easy to parse JSON, or a subset of
    JSON. The amount of code required to write even a stripped-down JSON
    parser is far more than the amount required to split a file on tabs,
    and the existing code we have for the backend cannot be easily (or
    even with moderate effort) adapted to work in the frontend. On the
    other hand, the code that pgbackrest would need to parse the manifest
    file format I've proposed could have easily been written in less time
    than you've spent arguing about it. Heck, if it helps, I'll offer to
    write that patch myself (I could be a pgbackrest contributor!). I
    don't want this effort to suck because something gets rushed through
    too quickly, but I also don't want it to get derailed because of what
    I view as a relatively minor detail. It is not always right to take
    the easier road, but it is also not always wrong. I have no illusions
    that what is being proposed here is perfect, but lots of features
    started out imperfect and got better over time -- RLS and parallel
    query come to mind, among others -- and we often learn from the
    experience of shipping something which parts of the feature are most
    in need of improvement.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  85. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-01-08T01:33:48Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Fri, Jan 3, 2020 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > > Well, I don't know how to make you happy here.
    > >
    > > I suppose I should admit that, first off, I don't feel you're required
    > > to make me happy, and I don't think it's necessary to make me happy to
    > > get this feature into PG.
    > 
    > Fair enough. That is gracious of you, but I would like to try to make
    > you happy if it is possible to do so.
    
    I certainly appreciate that, but I don't know that it is possible to do
    so while approaching this in the order that you are, which I tried to
    point out previously.
    
    > > Since you expressed that interest though, I'll go out on a limb and say
    > > that what would make me *really* happy would be to think about where the
    > > project should be taking pg_basebackup, what we should be working on
    > > *today* to address the concerns we hear about from our users, and to
    > > consider the best way to implement solutions to what they're actively
    > > asking for a core backup solution to be providing.  I get that maybe
    > > that isn't how the world works and that sometimes we have people who
    > > write our paychecks wanting us to work on something else, and yes, I'm
    > > sure there are some users who are asking for this specific thing but I
    > > certainly don't think it's a common ask of pg_basebackup or what users
    > > feel is missing from the backup options we offer in core; we had users
    > > on this list specifically saying they *wouldn't* use this feature
    > > (referring to the differential backup stuff, of course), in fact,
    > > because of the things which are missing, which is pretty darn rare.
    > 
    > Well, I mean, what you seem to be suggesting here is that somebody is
    > driving me with a stick to do something that I don't really like but
    > have to do because otherwise I won't be able to make rent, but that's
    > actually not the case. I genuinely believe that this is a good design,
    > and it's driven by me, not some shadowy conglomerate of EnterpriseDB
    > executives who are out to make PostgreSQL suck. If I'm wrong and the
    > design sucks, that's again not the fault of shadowy EnterpriseDB
    > executives; it's my fault. Incidentally, my boss is not very shadowy
    > anyhow; he's a super-nice guy, and a major reason why I work here. :-)
    
    Then I just have to disagree, really vehemently, that having a
    block-level incremental backup solution without solid dependency
    handling between incremental and full backups, solid WAL management and
    archiving, expiration handling for incremental/full backups and WAL, and
    the manifest that this thread has been about, is a good design.
    
    Ultimately, what this calls for is some kind of 'repository' which
    you've stressed you don't think is a good idea for pg_basebackup to ever
    deal with and I just can't disagree more with that.  I could perhaps
    agree that it isn't appropriate for the specific tool "pg_basebackup" to
    work with a repo because of the goal of that particular tool, but in
    that case, I don't think pg_basebackup should be the tool to provide a
    block-level incremental backup solution, it should continue to be a tool
    to provide a simple and easy way to take a one-time, complete, snapshot
    of a running PG system over the replication protocol- and adding support
    for parallel backups, or encrypted backups, or similar things would be
    completely in-line and appropriate for such a tool, and I'm not against
    those features being added to pg_basebackup even in advance of anything
    like support for a repo or dependency handling.
    
    > I don't think the issue here is that I haven't thought about what
    > users want, but that not everybody wants the same thing, and it
    > seems like the people with whom I interact want somewhat different
    > things than those with whom you interact. EnterpriseDB has an existing
    > tool that does parallel and block-level incremental backup, and I
    > started out with the goal of providing those same capabilities in
    > core. They are quite popular with EnterpriseDB customers, and I'd like
    > to make them more widely available and, as far as I can, improve on
    > them. From our previous discussion and from a (brief) look at
    > pgbackrest, I gather that the interests of your customers are somewhat
    > different. Apparently, block-level incremental backup isn't quite as
    > important to your customers, perhaps because you've already got
    > file-level incremental backup, but various other things like
    > encryption and backup verification are extremely important, and you've
    > got a set of ideas about what would be valuable in the future which
    > I'm sure is based on real input from your customers. I hope you pursue
    > those ideas, and I hope you do it in core rather than in a separate
    > piece of software, but that's up to you. Meanwhile, I think that if I
    > have somewhat different ideas about what I'd like to pursue, that
    > ought to be just fine. And I don't think it is unreasonable to hope
    > that you'll acknowledge my goals as legitimate even if you have
    > different ones.
    
    I'm all for block-level incremental backup, in general (though I've got
    concerns about it from a correctness standpoint..  I certainly think
    it's going to be difficult to get right and probably finicky, but
    hopefully your experience with BART has let you identify where the
    dragons lie and it'll be interesting to see what that code looks like
    and if the approach used can be leveraged in other tools), but I am
    concerned about how we're getting there.
    
    > I want to point out that my idea about how to do all of this has
    > shifted by a considerable amount based on the input that you and David
    > have provided. My original design didn't involve a backup manifest,
    > but now it does. That turned out to be necessary, but it was also
    > something you suggested, and something where I asked and took advice
    > on what ought to go into it. Likewise, you suggested that the process
    > of taking the backup should involve giving the client more control
    > rather than trying to do everything on the server side, and that is
    > now the design which I plan to pursue. You suggested that because it
    > would be more advantageous for out-of-core backup tools, such as
    > pgbackrest, and I acknowledge that as a benefit and I think we're
    > headed in that direction. I am not doing a single thing which, to my
    > knowledge, blocks anything that you might want to do with
    > pg_basebackup in the future. I have accepted as much of your input as
    > I believe that I can without killing the project off completely. To go
    > further, I'd have to either accept years of delay or abandon my
    > priorities entirely and pursue yours.
    
    While I'm hopeful that the parallel backup pieces will be useful to
    out-of-core backup tools, I've been increasingly less confident that
    it'll end up being very useful to pgbackrest, as much as I would like it
    to be.  Perhaps after it's in place we might be able to work on it to
    make it useful, but we'd need to push all the features like encryption
    and options for compression and such into the backend, in a way that
    works for pgbackrest, to be able to leverage it, and I'm not sure that
    would get much support or that it could be done in a way that doesn't
    end up causing problems for pg_basebackup, which clearly wouldn't be
    acceptable.  Further, if we can't leverage the PG backup protocol that
    you're building here, it seems pretty darn unlikely we'd have much use
    for the manifest that's built as part of that.
    
    I'm probably going to lose what credibility I have in criticizing what
    you're doing with pg_basebackup here, but I started off saying you don't
    have to make me happy and this is part of why- I really don't think
    there's much that you're doing with pg_basebackup that is ultimately
    going to impact what plans I have for the future, for pretty much
    anything.  I haven't got any real specific plans around pg_basebackup,
    though, point-in-fact, if you put in a bunch of code that shows how to
    get PG and pg_basebackup to do block-level incremental backups in a safe
    and trusted way, that would actually be *really* useful to the
    pgbackrest project because we could then lift that logic out of
    pg_basebackup and leverage it.  If I wanted to be entirely selfish, I'd
    be pushing you to get block-level incremental backup into pg_basebackup
    as quickly as possible so that we could have such an example of "how to
    do it in a way that, if it breaks, the PG community will figure out what
    went wrong and fix it".  If you look at other things we've done, such as
    not backing up unlogged tables, that's exactly the approach we've used:
    introduce the feature into pg_basebackup *first*, make sure the
    community agrees that it's a valid approach and will deal with any
    issues with it (and will take pains to avoid *breaking* it in future
    versions..), and only *then* introduce it into pgbackrest by using the
    same approach.  Those other features were well in-line with what makes
    sense for pg_basebackup too though.
    
    We haven't done that though, and I haven't been pushing in that
    direction, not because I think it's a bad feature or that I want to
    block something going into pg_basebackup or whatever, but because I
    think it's actually going to cause more problems for users than it
    solves because some users will want to use it (though not all, as we've
    seen on this list, as there's at least some users out there who are as
    scared of the idea of having *just* this in pg_basebackup without the
    other things I talk about above as I am) and then they're going to try
    and hack together all those other things they need around WAL management
    and archiving and expiration and they're likely to get it wrong- perhaps
    in obvious ways, perhaps in relatively subtle ways, but either way,
    they'll end up with backups that aren't valid that they only discover
    when they're in an emergency.  Again, perhaps selfish me would say "oh
    good, then they'll call me and pay me lots to fix it for them", but it
    certainly wouldn't look good for the community- even if all of the
    documentation and everything we put out there says that the way they
    were doing it had this subtle issue or whatever (considering our docs
    still promote a really bad, imv anyway, archive command kinda makes this
    likely, if you ask me anyway..), and it wouldn't be good for the user.
    
    > > That's what would make *me* happy.  Even some comments about how to
    > > *get* there while also working towards these features would be likely
    > > to make me happy.  Instead, I feel like we're being told that we need
    > > this feature badly in v13 and we're going to cut bait and do whatever
    > > is necessary to get us there.
    > 
    > This seems like a really unfair accusation given how much work I've
    > put into trying to satisfy you and David. If this patch, the parallel
    > full backup patch, and the incremental backup patch were all to get
    > committed to v13, an outcome which seems pretty unlikely to me at this
    > point, then you would have a very significant number of things that
    > you have requested in the course of the various discussions, and
    > AFAICS the only thing you'd have that you don't want is the need to
    > parse the manifest file using while (<>) { @a = split /\t/, $_ } rather
    > than $a = parse_json(join '', <>). You would, for example, have the
    > ability to request an individual file from the server rather than a
    > complete tarball. Maybe the command that requests a file would lack an
    > encryption option, something which IIUC you would like to have, but
    > that certainly does not leave you worse off. It is easier to add an
    > encryption option to a command which you already have than it is to
    > invent a whole new command -- or really several whole new commands,
    > since such a command is not really usable unless you also have
    > facilities to start and stop a backup through the replication
    > protocol.
    
    No, the manifest format is definitely not the only issue that I have
    with this- but as it relates to the thread about building a manifest, my
    complaint really is isolated to the format and just forward thinking
    about how the format you're advocating for will mean custom code for who
    knows how many different tools.  While I appreciate the offer to write
    all the bespoke code for every version of the manifest for pgbackrest,
    I'm really not thrilled about the idea of having to have that extra code
    and having to then maintain it.  Yes, when you compare the single format
    of the manifest and the code required for it against a JSON parser, if
    we only ever have this one format then it'd win in terms of code, but I
    don't believe it'll end up being one format, instead we're going to end
    up with multiple formats, each of which will have some additional code
    for dealing with parsing it, and that's going to add up.  That's also
    going to, as I said before, make it almost certain that we can't use
    older tools with newer backups.  These are issues that we've thought
    about and worried about over the years of pgbackrest and with that
    experience we've come down on the side that a JSON-based format would be
    an altogether better design.  That's why we're advocating for it, not
    because it requires more code or so that it delays the efforts here, but
    because we've been there, we've used other formats, we've dealt with
    user complaints when we do break things, this is all history for us
    that's helped us learn- for PG, it looks like the future with a static
    format, and I get that the future is hard to predict and pg_basebackup
    isn't pgbackrest and yeah, I could be completely wrong because I don't
    actually have a crystal ball, but this starting point sure looks really
    familiar.
    
    Thanks,
    
    Stephen
    
  86. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-01-10T01:19:00Z

    Hi Robert,
    
    On 1/7/20 6:33 PM, Stephen Frost wrote:
    
     > These are issues that we've thought
     > about and worried about over the years of pgbackrest and with that
     > experience we've come down on the side that a JSON-based format would be
     > an altogether better design.  That's why we're advocating for it, not
     > because it requires more code or so that it delays the efforts here, but
     > because we've been there, we've used other formats, we've dealt with
     > user complaints when we do break things, this is all history for us
     > that's helped us learn- for PG, it looks like the future with a static
     > format, and I get that the future is hard to predict and pg_basebackup
     > isn't pgbackrest and yeah, I could be completely wrong because I don't
     > actually have a crystal ball, but this starting point sure looks really
     > familiar.
    
    For example, have you considered what will happen if you have a file in 
    the cluster with a tab in the name?  This is perfectly valid in Posix 
    filesystems, at least.  You may already be escaping tabs but the simple 
    code snippet you provided earlier isn't going to work so well either 
    way.  It gets complicated quickly.
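
    A minimal sketch of the problem, in Python for brevity. The two-column
    layout and the "Path"/"Size" key names are assumptions made up for the
    illustration, not the format from the patch:

```python
import json

# Hypothetical record: a file name and its size.
name = "base/1/odd\tname"   # a tab in a file name is legal on POSIX filesystems
size = 8192

# Tab-delimited line with no escaping: splitting yields one field too many.
naive_line = f"{name}\t{size}"
naive_fields = naive_line.split("\t")

# JSON escapes the tab inside the string, so the record round-trips intact.
json_line = json.dumps({"Path": name, "Size": size})
decoded = json.loads(json_line)
```

    The naive split sees three fields where two were written, while the
    JSON line decodes back to the original name.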
    
    I know users should not be creating weird files in PGDATA, but it's 
    amazing how often this sort of thing pops up.  We currently have an open 
    issue because = in file names breaks our file format.  Tab is surely 
    less common but it's amazing what users will do.
    
    Another fun one is 03849840 which fixes the handling of \ characters in 
    the code which checksums the manifest.  The file is not fully JSON but 
    the checksums are and that was initially missed in the C migration.  The 
    bug never got released but it easily could have been.
    
    In short, using a quick-and-dirty homegrown format seemed great at first 
    but has caused many headaches.  Because we don't change the repo format 
    across releases we are kind of stuck with past sins until we create a 
    new repo format and write update/compatibility code.  Users are 
    understandably concerned if new versions of the software won't work with 
    their repo, some of which contain years of backups (really).
    
    This doesn't even get into the work everyone else will need to do to 
    read a custom format.  I do appreciate your offer of contributing parser 
    code to pgBackRest, but honestly I'd rather it were not necessary. 
    Though of course I'd still love to see a contribution of some sort from you!
    
    Hard experience tells me that using a standard format where all these 
    issues have been worked out is the way to go.
    
    There are a few MIT-licensed JSON projects that are implemented in a 
    single file.  cJSON is very capable while JSMN is very minimal. Is it 
    possible that one of those (or something like it) would be acceptable? 
    It looks like the one requirement we have is that the JSON can be 
    streamed rather than just building up one big blob?  Even with that 
    requirement there are a few tricks that can be used.  JSON nests rather 
    nicely after all so the individual file records can be transmitted 
    independently of the overall file format.
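
    One such trick, sketched in Python under assumed names (the "Files"
    key and the record fields are hypothetical): write the outer wrapper
    by hand and serialize each file record on its own, so the full
    document never has to exist in memory on the writing side.

```python
import io
import json

def write_manifest(out, records):
    """Write one JSON document to `out`, one file record at a time."""
    out.write('{"Files": [\n')
    first = True
    for rec in records:
        if not first:
            out.write(",\n")
        out.write(json.dumps(rec))   # each record is serialized independently
        first = False
    out.write("\n]}\n")

def sample_records():
    # A generator stands in for files discovered while the backup runs.
    yield {"Path": "PG_VERSION", "Size": 3}
    yield {"Path": "base/1/1259", "Size": 8192}

buf = io.StringIO()
write_manifest(buf, sample_records())
manifest = json.loads(buf.getvalue())   # the result is still one valid document
```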
    
    Your first question may be why didn't pgBackRest use one of those 
    parsers?  The answer is that JSON parsing/rendering is pretty trivial. 
    Memory management and a (datum-like) type system are the hard parts and 
    pgBackRest already had those.
    
    Would it be acceptable to bring in JSON code with a compatible license 
    to use in libcommon?  If so I'm willing to help adapt that code for use 
    in Postgres.  It's possible that the pgBackRest code could be adapted 
    similarly, but it might make more sense to start from one of these 
    general purpose parsers.
    
    Thoughts?
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  87. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-01-14T16:54:30Z

    On Thu, Jan 9, 2020 at 8:19 PM David Steele <david@pgmasters.net> wrote:
    > For example, have you considered what will happen if you have a file in
    > the cluster with a tab in the name?  This is perfectly valid in Posix
    > filesystems, at least.
    
    Yeah, there's code for that in the patch I posted. I don't think the
    validator patch deals with it, but that's fixable.
    
    > You may already be escaping tabs but the simple
    > code snippet you provided earlier isn't going to work so well either
    > way.  It gets complicated quickly.
    
    Sure, but obviously neither of those code snippets were intended to be
    used straight out of the box. Even after you parse the manifest as
    JSON, you would still - if you really want to validate it - check that
    you have the keys and values you expect, that the individual field
    values are sensible, etc. I still stand by my earlier contention that,
    as things stand today, you can parse an ad-hoc format in less code
    than a JSON format. If we had a JSON parser available on the front
    end, I think it'd be roughly comparable, but maybe the JSON format
    would come out a bit ahead. Not sure.
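
    A sketch of the kind of post-parse checking meant here, in Python. The
    required keys ("Path", "Size") and the error wording are assumptions
    for illustration only:

```python
import json

def check_record(rec):
    """Return a list of problems found in one parsed manifest record."""
    if not isinstance(rec, dict):
        return ["record is not an object"]
    errors = []
    path = rec.get("Path")
    size = rec.get("Size")
    if not isinstance(path, str) or path == "":
        errors.append("missing or invalid Path")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(size, int) or isinstance(size, bool) or size < 0:
        errors.append("missing or invalid Size")
    return errors

good = check_record(json.loads('{"Path": "PG_VERSION", "Size": 3}'))
bad = check_record(json.loads('{"Path": "PG_VERSION", "Size": -1}'))
```

    Parsing succeeds for both inputs; only the second check catches the
    nonsensical size.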
    
    > There are a few MIT-licensed JSON projects that are implemented in a
    > single file.  cJSON is very capable while JSMN is very minimal. Is it
    > possible that one of those (or something like it) would be acceptable?
    > It looks like the one requirement we have is that the JSON can be
    > streamed rather than just building up one big blob?  Even with that
    > requirement there are a few tricks that can be used.  JSON nests rather
    > nicely after all so the individual file records can be transmitted
    > independently of the overall file format.
    
    I haven't really looked at these. I would have expected that including
    a second JSON parser in core would provoke significant opposition.
    Generally, people dislike having more than one piece of code to do the
    same thing. I would also expect that depending on an external package
    would provoke significant opposition. If we suck the code into core,
    then we have to keep it up to date with the upstream, which is a
    significant maintenance burden - look at all the time Tom has spent on
    snowball, regex, and time zone code over the years. If we don't suck
    the code into core but depend on it, then every developer needs to
    have that package installed on their operating system, and every
    packager has to make sure that it is being built for their OS so that
    PostgreSQL can depend on it. Perhaps JSON is so popular today that
    imposing such a requirement would provoke only a groundswell of
    support, but based on past precedent I would assume that if I
    committed a patch of this sort the chances that I'd have to revert it
    would be about 99.9%. Optional dependencies for optional features are
    usually pretty well-tolerated when they're clearly necessary: e.g. you
    can't really do JIT without depending on something like LLVM, but the
    bar for a mandatory dependency has historically been quite high.
    
    > Would it be acceptable to bring in JSON code with a compatible license
    > to use in libcommon?  If so I'm willing to help adapt that code for use
    > in Postgres.  It's possible that the pgBackRest code could be adapted
    > similarly, but it might make more sense to start from one of these
    > general purpose parsers.
    
    For the reasons above, I expect this approach would be rejected, by
    Tom and by others.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  88. Re: backup manifests

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-01-14T17:53:04Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > ... I would also expect that depending on an external package
    > would provoke significant opposition. If we suck the code into core,
    > then we have to keep it up to date with the upstream, which is a
    > significant maintenance burden - look at all the time Tom has spent on
    > snowball, regex, and time zone code over the years.
    
    Also worth noting is that we have a seriously bad track record about
    choosing external packages to depend on.  The regex code has no upstream
    maintainer anymore (well, the Tcl guys seem to think that *we* are
    upstream for that now), and snowball is next door to moribund.
    With C not being a particularly hip language to develop in anymore,
    it wouldn't surprise me in the least for any C-code JSON parser
    we might pick to go dead pretty soon.
    
    Between that problem and the likelihood that we'd need to make
    significant code changes anyway to meet our own coding style etc
    expectations, I think really we'd have to assume that we're going
    to fork and maintain our own copy of any code we pick.
    
    Now, if it's a small enough chunk of code (and really, how complex
    is JSON parsing anyway) maybe that doesn't matter.  But I tend to
    agree with Robert's position that it's a big ask for this patch
    to introduce a frontend JSON parser.
    
    			regards, tom lane
    
    
    
    
  89. Re: backup manifests

    David Fetter <david@fetter.org> — 2020-01-14T18:33:12Z

    On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    > > ... I would also expect that depending on an external package
    > > would provoke significant opposition. If we suck the code into core,
    > > then we have to keep it up to date with the upstream, which is a
    > > significant maintenance burden - look at all the time Tom has spent on
    > > snowball, regex, and time zone code over the years.
    > 
    > Also worth noting is that we have a seriously bad track record about
    > choosing external packages to depend on.  The regex code has no upstream
    > maintainer anymore (well, the Tcl guys seem to think that *we* are
    > upstream for that now), and snowball is next door to moribund.
    > With C not being a particularly hip language to develop in anymore,
    > it wouldn't surprise me in the least for any C-code JSON parser
    > we might pick to go dead pretty soon.
    
    Given jq's extreme popularity and compatible license, I'd nominate that.
    
    Best,
    David.
    -- 
    David Fetter <david(at)fetter(dot)org> http://fetter.org/
    Phone: +1 415 235 3778
    
    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
    
    
    
    
  90. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-01-14T20:35:40Z

    Greetings,
    
    * David Fetter (david@fetter.org) wrote:
    > On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
    > > Robert Haas <robertmhaas@gmail.com> writes:
    > > > ... I would also expect that depending on an external package
    > > > would provoke significant opposition. If we suck the code into core,
    > > > then we have to keep it up to date with the upstream, which is a
    > > > significant maintenance burden - look at all the time Tom has spent on
    > > > snowball, regex, and time zone code over the years.
    > > 
    > > Also worth noting is that we have a seriously bad track record about
    > > choosing external packages to depend on.  The regex code has no upstream
    > > maintainer anymore (well, the Tcl guys seem to think that *we* are
    > > upstream for that now), and snowball is next door to moribund.
    > > With C not being a particularly hip language to develop in anymore,
    > > it wouldn't surprise me in the least for any C-code JSON parser
    > > we might pick to go dead pretty soon.
    > 
    > Given jq's extreme popularity and compatible license, I'd nominate that.
    
    I don't think that really changes Tom's concerns here about having an
    "upstream" for this.
    
    For my part, I don't really agree with the whole "we don't want two
    different JSON parsers" when we've got two of a bunch of stuff between
    the frontend and the backend, particularly since I don't really think
    it'll end up being *that* much code.
    
    My thought, which I had expressed to David (though he obviously didn't
    entirely agree with me since he suggested the other options), was to
    adapt the pgBackRest JSON parser, which isn't really all that much code.
    
    Frustratingly, that code has got some internal pgBackRest dependency on
    things like the memory context system (which looks, unsurprisingly, an
    awful lot like what is in PG backend), the error handling and logging
    systems (which are different from PG because they're quite intentionally
    segregated from each other- something PG would benefit from, imv..), and
    Variadics (known in the PG backend as Datums, and quite similar to
    them..).
    
    Even so, David's offered to adjust the code to use the frontend's memory
    management (*cough* malloc()..), and error handling/logging, and he had
    some idea for Variadics (or maybe just pulling the backend's Datum
    system in..?  He could answer better), and basically write a frontend
    JSON parser for PG without too much code, no external dependencies, and
    to make sure it answers this requirement, and I've agreed that he can
    spend some time on that instead of pgBackRest to get us through this, if
    everyone else is agreeable to the idea.  Obviously this isn't intended
    to box anyone in- if there turns out even after the code's been written
    to be some fatal issue with using it, so be it, but we're offering to
    help.
    
    Thanks,
    
    Stephen
    
  91. Re: backup manifests

    David Fetter <david@fetter.org> — 2020-01-14T22:14:49Z

    On Tue, Jan 14, 2020 at 03:35:40PM -0500, Stephen Frost wrote:
    > Greetings,
    > 
    > * David Fetter (david@fetter.org) wrote:
    > > On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
    > > > Robert Haas <robertmhaas@gmail.com> writes:
    > > > > ... I would also expect that depending on an external package
    > > > > would provoke significant opposition. If we suck the code into core,
    > > > > then we have to keep it up to date with the upstream, which is a
    > > > > significant maintenance burden - look at all the time Tom has spent on
    > > > > snowball, regex, and time zone code over the years.
    > > > 
    > > > Also worth noting is that we have a seriously bad track record about
    > > > choosing external packages to depend on.  The regex code has no upstream
    > > > maintainer anymore (well, the Tcl guys seem to think that *we* are
    > > > upstream for that now), and snowball is next door to moribund.
    > > > With C not being a particularly hip language to develop in anymore,
    > > > it wouldn't surprise me in the least for any C-code JSON parser
    > > > we might pick to go dead pretty soon.
    > > 
    > > Given jq's extreme popularity and compatible license, I'd nominate that.
    > 
    > I don't think that really changes Tom's concerns here about having an
    > "upstream" for this.
    > 
    > For my part, I don't really agree with the whole "we don't want two
    > different JSON parsers" when we've got two of a bunch of stuff between
    > the frontend and the backend, particularly since I don't really think
    > it'll end up being *that* much code.
    > 
    > My thought, which I had expressed to David (though he obviously didn't
    > entirely agree with me since he suggested the other options), was to
    > adapt the pgBackRest JSON parser, which isn't really all that much code.
    > 
    > Frustratingly, that code has got some internal pgBackRest dependency on
    > things like the memory context system (which looks, unsurprisingly, an
    > awful lot like what is in PG backend), the error handling and logging
    > systems (which are different from PG because they're quite intentionally
    > segregated from each other- something PG would benefit from, imv..), and
    > Variadics (known in the PG backend as Datums, and quite similar to
    > them..).
    
    It might be more fun to put in that infrastructure and have it gate
    the manifest feature than to have two vastly different parsers to
    contend with. I get that putting off the backup manifests isn't an
    awesome prospect, but neither is rushing them in and getting them
    wrong in ways we'll still be regretting a decade hence.
    
    Best,
    David.
    -- 
    David Fetter <david(at)fetter(dot)org> http://fetter.org/
    Phone: +1 415 235 3778
    
    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
    
    
    
    
  92. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-01-15T04:36:30Z

    Hi Stephen,
    
    On 1/14/20 1:35 PM, Stephen Frost wrote:
    > 
    > My thought, which I had expressed to David (though he obviously didn't
    > entirely agree with me since he suggested the other options), was to
    > adapt the pgBackRest JSON parser, which isn't really all that much code.
    
    It's not that I didn't agree, it's just that the pgBackRest code does 
    use mem contexts, the type system, etc.  After looking at some other 
    solutions with similar amounts of code I thought they might be more 
    acceptable.  At least it seemed like a good idea to throw it out there.
    
    > Even so, David's offered to adjust the code to use the frontend's memory
    > management (*cough* malloc()..), and error handling/logging, and he had
    > some idea for Variadics (or maybe just pulling the backend's Datum
    > system in..?  He could answer better), and basically write a frontend
    > JSON parser for PG without too much code, no external dependencies, and
    > to make sure it answers this requirement, and I've agreed that he can
    > spend some time on that instead of pgBackRest to get us through this, if
    > everyone else is agreeable to the idea.  
    
    To keep it simple I think we are left with callbacks or a somewhat 
    static "what's the next datum" kind of approach.  I think the latter 
    could get us through a release or two while we make improvements.
    
    > Obviously this isn't intended
    > to box anyone in- if there turns out even after the code's been written
    > to be some fatal issue with using it, so be it, but we're offering to
    > help.
    
    I'm happy to work up a prototype unless the consensus is that we 
    absolutely don't want a second JSON parser in core.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  93. Re: backup manifests

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-01-15T04:47:13Z

    David Steele <david@pgmasters.net> writes:
    > I'm happy to work up a prototype unless the consensus is that we 
    > absolutely don't want a second JSON parser in core.
    
    How much code are we talking about?  If the answer is "a few hundred
    lines", it's a lot easier to swallow than if it's "a few thousand".
    
    			regards, tom lane
    
    
    
    
  94. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-01-15T05:21:00Z

    Hi Tom,
    
    On 1/14/20 9:47 PM, Tom Lane wrote:
    > David Steele <david@pgmasters.net> writes:
    >> I'm happy to work up a prototype unless the consensus is that we
    >> absolutely don't want a second JSON parser in core.
    > 
    > How much code are we talking about?  If the answer is "a few hundred
    > lines", it's a lot easier to swallow than if it's "a few thousand".
    
    It's currently about a thousand lines but we have a lot of functions to 
    convert to/from specific types.  I imagine the line count would be 
    similar using one of the approaches I discussed above.
    
    Current source attached for reference.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
  95. Re: backup manifests

    Bruce Momjian <bruce@momjian.us> — 2020-01-18T14:36:14Z

    On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
    > Also worth noting is that we have a seriously bad track record about
    > choosing external packages to depend on.  The regex code has no upstream
    > maintainer anymore (well, the Tcl guys seem to think that *we* are
    > upstream for that now), and snowball is next door to moribund.
    > With C not being a particularly hip language to develop in anymore,
    > it wouldn't surprise me in the least for any C-code JSON parser
    > we might pick to go dead pretty soon.
    > 
    > Between that problem and the likelihood that we'd need to make
    > significant code changes anyway to meet our own coding style etc
    > expectations, I think really we'd have to assume that we're going
    > to fork and maintain our own copy of any code we pick.
    > 
    > Now, if it's a small enough chunk of code (and really, how complex
    > is JSON parsing anyway) maybe that doesn't matter.  But I tend to
    > agree with Robert's position that it's a big ask for this patch
    > to introduce a frontend JSON parser.
    
    I know we have talked about our experience in maintaining external code:
    
    *  TCL regex
    *  Snowball
    *  Timezone handling
    
    However, the regex code is complex, and the Snowball and timezone code
    is improved as they add new languages and time zones.  I don't see JSON
    parsing as complex or likely to change much, so it might be acceptable
    to include it in our frontend code.
    
    As far as using tab-delimited data, I know this usage was compared to
    postgresql.conf and pg_hba.conf, which don't change much.  However,
    those files are not usually written, and do not contain user data, while
    the backup file might contain user-specified paths if they are not just
    relative to the PGDATA directory, and that would make escaping a
    requirement.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
    + As you are, so once was I.  As I am, so you will be. +
    +                      Ancient Roman grave inscription +
    
    
    
    
  96. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-02-27T15:52:25Z

    On Fri, Jan 3, 2020 at 6:11 PM Suraj Kharage
    <suraj.kharage@enterprisedb.com> wrote:
    > Thank you for review comments.
    
    Here's a new patch set for this feature.
    
    0001 adds checksum helper functions, similar to what Suraj had
    incorporated into my original patch but separated out into a separate
    patch and with some different aesthetic decisions. I also decided to
    support all of the SHA variants that PG knows about as options and
    added a function to parse a checksum algorithm name, along the lines I
    suggested previously.
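
    As a rough illustration of the two pieces described for 0001, here is
    a Python sketch rather than the C from the patch. The function names,
    the accepted spellings, and the "none" option are hypothetical; only
    the SHA-2 variants are covered because that is what hashlib offers.

```python
import hashlib

SUPPORTED = {"sha224", "sha256", "sha384", "sha512"}

def parse_checksum_algorithm(name):
    """Map a user-supplied name to a canonical algorithm, or None for 'none'."""
    key = name.strip().lower()
    if key == "none":
        return None
    if key not in SUPPORTED:
        raise ValueError(f"unrecognized checksum algorithm: {name!r}")
    return key

def checksum_chunks(algorithm, chunks):
    """Feed data incrementally and return the hex digest (None if disabled)."""
    if algorithm is None:
        return None
    ctx = hashlib.new(algorithm)
    for chunk in chunks:
        ctx.update(chunk)
    return ctx.hexdigest()

algo = parse_checksum_algorithm("SHA224")
digest = checksum_chunks(algo, [b"first block", b"second block"])
```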
    
    0002 teaches the server to generate a backup manifest using the format
    I originally proposed. This is similar to the patch I posted
    previously, but it spools the manifest to disk as it's being
    generated, so that we don't run the server out of memory or fail when
    hitting the 1GB allocation limit.
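
    The spooling idea can be sketched as follows, in Python and under
    stated assumptions: one JSON line per record and SHA-256 for the
    manifest checksum are choices made for the example, not a description
    of the patch.

```python
import hashlib
import json
import tempfile

def spool_manifest(records):
    """Append records to a temporary file, hashing as we go.

    Memory use stays bounded by one record, however large the backup is.
    """
    spool = tempfile.TemporaryFile(mode="w+b")
    ctx = hashlib.sha256()
    for rec in records:
        line = (json.dumps(rec) + "\n").encode("utf-8")
        spool.write(line)
        ctx.update(line)
    spool.seek(0)   # rewind so the caller can stream the file back out
    return spool, ctx.hexdigest()

spool, checksum = spool_manifest(
    {"Path": f"base/1/{n}", "Size": 8192} for n in range(1000)
)
data = spool.read()
spool.close()
```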
    
    0003 adds a new utility, pg_validatebackup, to validate a backup
    against a manifest. Suraj tried to incorporate this into
    pg_basebackup, which I initially thought might be OK but eventually
    decided wasn't good, partly because this really wants to take some
    command-line options entirely unrelated to the options accepted by
    pg_basebackup. I tried to improve the error checking and the order in
    which various things are done, too. This is basically a complete
    rewrite as compared with Suraj's version.
    
    0004 modifies the server to generate a backup manifest in JSON format
    rather than my originally proposed format. This allows for some
    comparison of the code doing it one way vs. the other. Assuming we
    stick with JSON, I will squash this with 0002 at some point.
    
    0005 is a very much work-in-progress and proof-of-concept to modify
    the backup validator to understand the JSON format. It doesn't
    validate the manifest checksum at this point; it just prints it out.
    The error handling needs work. It has other problems, and bugs.
    Although I'm still not very happy about the idea of using JSON here,
    I'm pretty happy with the basic approach this patch takes. It
    demonstrates that the JSON parser can be used for non-trivial things
    in frontend code, and I'd say the code even looks reasonably clean -
    with the exception of small details like being buggy and
    under-commented.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  97. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-03T10:34:07Z

    On 2/27/20 9:22 PM, Robert Haas wrote:
    > Here's a new patch set for this feature.
    
    Thanks Robert.  After applying all 5 patches (v8-00*) against PG v13 
    (commit id afb5465e0cfce7637066eaaaeecab30b0f23fbe3),
    
    there are a few issues/observations:
    
    1) pg_validatebackup hits a segmentation fault when run against a 
    valid backup_manifest file with a WRONG data directory path
    
    [centos@tushar-ldap-docker bin]$ ./pg_basebackup -D bk --manifest-checksums=sha224
    
    [centos@tushar-ldap-docker bin]$ cp bk/backup_manifest /tmp/.
    
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m /tmp/backup_manifest    random_directory/
    pg_validatebackup: * manifest_checksum = f0460cd6aa13cf0c5e35426a41af940a9231e6425cd65115a19778b7abfdaef9
    pg_validatebackup: error: could not open directory "random_directory": No such file or directory
    Segmentation fault
    
    2) Errors are reported when the '-R' option was used while creating the base backup
    
    [centos@tushar-ldap-docker bin]$ ./pg_basebackup -D bar -R
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup  bar
    pg_validatebackup: * manifest_checksum = a195d3a3a82a41200c9ac92c12d764d23c810e7e91b31c44a7d04f67ce012edc
    pg_validatebackup: error: "standby.signal" is present on disk but not in the manifest
    pg_validatebackup: error: "postgresql.auto.conf" has size 286 on disk but size 88 in the manifest
    [centos@tushar-ldap-docker bin]$
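
    For readers following along, the two errors above come from comparing
    the directory contents with the manifest. A Python sketch of that kind
    of comparison is below; it is not pg_validatebackup's code, the
    manifest is reduced to a hypothetical path-to-size mapping, and the
    third message ("present in the manifest but not on disk") is assumed
    wording.

```python
import os
import tempfile

def compare_with_manifest(backup_dir, manifest_sizes):
    """Return sorted messages for files that disagree with the manifest."""
    problems = []
    seen = set()
    for root, _dirs, files in os.walk(backup_dir):
        for fname in files:
            full = os.path.join(root, fname)
            rel = os.path.relpath(full, backup_dir).replace(os.sep, "/")
            seen.add(rel)
            if rel not in manifest_sizes:
                problems.append(f'"{rel}" is present on disk but not in the manifest')
                continue
            size = os.path.getsize(full)
            if size != manifest_sizes[rel]:
                problems.append(
                    f'"{rel}" has size {size} on disk but size '
                    f'{manifest_sizes[rel]} in the manifest'
                )
    for rel in set(manifest_sizes) - seen:
        problems.append(f'"{rel}" is present in the manifest but not on disk')
    return sorted(problems)

# Recreate the -R situation: files written after the manifest was generated.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "standby.signal"), "w").close()
    with open(os.path.join(d, "postgresql.auto.conf"), "w") as f:
        f.write("x" * 286)
    problems = compare_with_manifest(d, {"postgresql.auto.conf": 88})
```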
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
  98. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-03T14:49:42Z

    On 3/3/20 4:04 PM, tushar wrote:
    > Thanks Robert.  After applying all the 5 patches (v8-00*) against PG 
    > v13 (commit id -afb5465e0cfce7637066eaaaeecab30b0f23fbe3) , 
    
    There is a scenario where pg_validatebackup does not report an error
    when a file has been deleted from the pg_wal/ directory, but starting
    the server from that backup later fails:
    
    [centos@tushar-ldap-docker bin]$ ./pg_basebackup  -D test1
    
    [centos@tushar-ldap-docker bin]$ ls test1/pg_wal/
    000000010000000000000010  archive_status
    
    [centos@tushar-ldap-docker bin]$ rm -rf test1/pg_wal/*
    
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup test1
    pg_validatebackup: * manifest_checksum = 88f1ed995c83e86252466a2c88b3e660a69cfc76c169991134b101c4f16c9df7
    pg_validatebackup: backup successfully verified
    
    [centos@tushar-ldap-docker bin]$ ./pg_ctl -D test1 start -o '-p 3333'
    waiting for server to start....2020-03-02 20:05:22.732 IST [21441] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
    2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv6 address "::1", port 3333
    2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv4 address "127.0.0.1", port 3333
    2020-03-02 20:05:22.736 IST [21441] LOG:  listening on Unix socket "/tmp/.s.PGSQL.3333"
    2020-03-02 20:05:22.739 IST [21442] LOG:  database system was interrupted; last known up at 2020-03-02 20:04:35 IST
    2020-03-02 20:05:22.739 IST [21442] LOG:  creating missing WAL directory "pg_wal/archive_status"
    2020-03-02 20:05:22.886 IST [21442] LOG:  invalid checkpoint record
    2020-03-02 20:05:22.886 IST [21442] FATAL:  could not locate required checkpoint record
    2020-03-02 20:05:22.886 IST [21442] HINT:  If you are restoring from a backup, touch "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/recovery.signal" and add required recovery options.
         If you are not restoring from a backup, try removing the file "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label".
         Be careful: removing "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label" will result in a corrupt cluster if restoring from a backup.
    2020-03-02 20:05:22.886 IST [21441] LOG:  startup process (PID 21442) exited with exit code 1
    2020-03-02 20:05:22.886 IST [21441] LOG:  aborting startup due to startup process failure
    2020-03-02 20:05:22.889 IST [21441] LOG:  database system is shut down
      stopped waiting
    pg_ctl: could not start server
    Examine the log output.
    [centos@tushar-ldap-docker bin]$
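    
    Startup fails because the server cannot find the checkpoint record once
    the only WAL segment in pg_wal/ has been deleted. The sketch below is
    not what pg_validatebackup does in this patch set. It only illustrates
    how a WAL segment file name follows from a timeline ID and an LSN, and
    how one could list segments that are absent from a directory. It
    assumes the default 16 MB segment size, and all helper names are
    hypothetical.

```python
import os

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default; can be changed at initdb


def parse_lsn(text):
    """Convert an LSN such as '0/10000028' to an integer byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_segment_name(timeline, lsn, segment_size=WAL_SEGMENT_SIZE):
    """Return the 24-hex-digit WAL file name that contains the given LSN."""
    segments_per_xlogid = 0x100000000 // segment_size
    segno = lsn // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


def missing_segments(wal_dir, timeline, start_lsn, end_lsn):
    """List segment names covering [start_lsn, end_lsn] absent from wal_dir."""
    present = set(os.listdir(wal_dir)) if os.path.isdir(wal_dir) else set()
    first = parse_lsn(start_lsn) // WAL_SEGMENT_SIZE
    last = parse_lsn(end_lsn) // WAL_SEGMENT_SIZE
    names = [
        wal_segment_name(timeline, segno * WAL_SEGMENT_SIZE)
        for segno in range(first, last + 1)
    ]
    return [name for name in names if name not in present]
```

    For example, wal_segment_name(1, parse_lsn("0/10000028")) gives
    000000010000000000000010, the segment name shown in the ls output
    above. The LSN used here is an illustrative value, not one taken from
    the report.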
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
  99. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-04T09:56:16Z

    Hi,
    Another observation: if I change the ownership of a file under the
    global/ directory, i.e.
    
    [root@tushar-ldap-docker global]# chown enterprisedb 2396
    
    and run the pg_validatebackup command, I get this output:
    
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup gggg
    pg_validatebackup: * manifest_checksum = e8cb007bcc9c0deab6eff51cd8d9d9af6af35b86e02f3055e60e70e56737e877
    pg_validatebackup: error: could not open file "global/2396": Permission denied
    *** Error in `./pg_validatebackup': double free or corruption (!prev): 0x0000000001850ba0 ***
    ======= Backtrace: =========
    /lib64/libc.so.6(+0x81679)[0x7fa2248e3679]
    ./pg_validatebackup[0x401f4c]
    /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa224884505]
    ./pg_validatebackup[0x402049]
    ======= Memory map: ========
    00400000-00415000 r-xp 00000000 fd:03 4044545 /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
    00614000-00615000 r--p 00014000 fd:03 4044545 /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
    00615000-00616000 rw-p 00015000 fd:03 4044545 /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
    017f3000-01878000 rw-p 00000000 00:00 0 [heap]
    7fa218000000-7fa218021000 rw-p 00000000 00:00 0
    7fa218021000-7fa21c000000 ---p 00000000 00:00 0
    7fa21e122000-7fa21e137000 r-xp 00000000 fd:03 141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
    7fa21e137000-7fa21e336000 ---p 00015000 fd:03 141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
    7fa21e336000-7fa21e337000 r--p 00014000 fd:03 141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
    7fa21e337000-7fa21e338000 rw-p 00015000 fd:03 141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
    7fa21e338000-7fa224862000 r--p 00000000 fd:03 266442 /usr/lib/locale/locale-archive
    7fa224862000-7fa224a25000 r-xp 00000000 fd:03 134456 /usr/lib64/libc-2.17.so
    7fa224a25000-7fa224c25000 ---p 001c3000 fd:03 134456 /usr/lib64/libc-2.17.so
    7fa224c25000-7fa224c29000 r--p 001c3000 fd:03 134456 /usr/lib64/libc-2.17.so
    7fa224c29000-7fa224c2b000 rw-p 001c7000 fd:03 134456 /usr/lib64/libc-2.17.so
    7fa224c2b000-7fa224c30000 rw-p 00000000 00:00 0
    7fa224c30000-7fa224c47000 r-xp 00000000 fd:03 134485 /usr/lib64/libpthread-2.17.so
    7fa224c47000-7fa224e46000 ---p 00017000 fd:03 134485 /usr/lib64/libpthread-2.17.so
    7fa224e46000-7fa224e47000 r--p 00016000 fd:03 134485 /usr/lib64/libpthread-2.17.so
    7fa224e47000-7fa224e48000 rw-p 00017000 fd:03 134485 /usr/lib64/libpthread-2.17.so
    7fa224e48000-7fa224e4c000 rw-p 00000000 00:00 0
    7fa224e4c000-7fa224e90000 r-xp 00000000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
    7fa224e90000-7fa225090000 ---p 00044000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
    7fa225090000-7fa225093000 r--p 00044000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
    7fa225093000-7fa225094000 rw-p 00047000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
    7fa225094000-7fa2250b6000 r-xp 00000000 fd:03 130333 /usr/lib64/ld-2.17.so
    7fa22527d000-7fa2252a2000 rw-p 00000000 00:00 0
    7fa2252b3000-7fa2252b5000 rw-p 00000000 00:00 0
    7fa2252b5000-7fa2252b6000 r--p 00021000 fd:03 130333 /usr/lib64/ld-2.17.so
    7fa2252b6000-7fa2252b7000 rw-p 00022000 fd:03 130333 /usr/lib64/ld-2.17.so
    7fa2252b7000-7fa2252b8000 rw-p 00000000 00:00 0
    7ffdf354f000-7ffdf3570000 rw-p 00000000 00:00 0 [stack]
    7ffdf3572000-7ffdf3574000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
    Aborted
    [centos@tushar-ldap-docker bin]$
    
    
    The error message itself is printed, but it is followed by the
    "*** Error in `./pg_validatebackup': double free or corruption (!prev):
    0x0000000001850ba0 ***" abort output.
    
    Is this expected?
    
    regards,
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
  100. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-04T10:21:32Z

    Another scenario: if we modify the "Manifest-Checksum" value in the
    backup_manifest file, we do not get an error.
    
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
    pg_validatebackup: * manifest_checksum = 28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d
    pg_validatebackup: backup successfully verified
    
    Open the backup_manifest file and replace
    
    "Manifest-Checksum": "8d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}
    with
    "Manifest-Checksum": "Hello World"}
    
    Then rerun pg_validatebackup:
    
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
    pg_validatebackup: * manifest_checksum = Hello World
    pg_validatebackup: backup successfully verified
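    
    A check for this case could recompute the checksum and compare it with
    the stored value. The sketch below is a hedged illustration, not the
    code in the patch. It assumes that the last line of the manifest holds
    "Manifest-Checksum" and that its value is the SHA-256 digest of every
    byte before that line. The function names and the toy manifest are
    hypothetical.

```python
import hashlib
import re


def split_manifest(raw):
    """Split raw manifest bytes into (preceding_bytes, stored_checksum)."""
    body = raw.rstrip(b"\n")
    cut = body.rfind(b"\n") + 1  # offset where the last line starts
    last_line = body[cut:].decode("utf-8")
    match = re.search(r'"Manifest-Checksum"\s*:\s*"([^"]*)"', last_line)
    if match is None:
        raise ValueError("last line does not contain Manifest-Checksum")
    return raw[:cut], match.group(1)


def manifest_checksum_ok(raw):
    """Recompute SHA-256 over the preceding bytes and compare (assumed rule)."""
    preceding, stored = split_manifest(raw)
    return hashlib.sha256(preceding).hexdigest() == stored.lower()


def build_toy_manifest(files_json):
    """Build a tiny manifest in the assumed layout, for demonstration only."""
    head = ('{ "Files": ' + files_json + ",\n").encode("utf-8")
    digest = hashlib.sha256(head).hexdigest()
    return head + ('"Manifest-Checksum": "%s"}\n' % digest).encode("utf-8")
```

    With this rule, a manifest whose checksum value has been replaced by
    "Hello World" would fail the comparison instead of being reported as
    successfully verified.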
    
    regards,
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
  101. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-04T13:51:03Z

    Hi,
    
    There is a scenario in which, if I add a file inside the tablespace
    directory, I get an error like:
    
    pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
    pg_validatebackup: error: "pg_tblspc/16385/PG_13_202002271/test" is present on disk but not in the manifest
    
    but if I remove the 'PG_13_202002271' directory, no error is reported:
    
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
    pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
    pg_validatebackup: backup successfully verified
    
    Steps to reproduce:
    -- connect to a psql terminal and create a tablespace
    postgres=# \! mkdir /tmp/my_tblspc
    postgres=# create tablespace tbs location '/tmp/my_tblspc';
    CREATE TABLESPACE
    postgres=# \q
    
    -- run pg_basebackup
    [centos@tushar-ldap-docker bin]$ ./pg_basebackup -D data_dir -T /tmp/my_tblspc/=/tmp/new_my_tblspc
    [centos@tushar-ldap-docker bin]$
    [centos@tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/
    PG_13_202002271
    
    -- create a new file under the PG_13_* folder
    [centos@tushar-ldap-docker bin]$ touch /tmp/new_my_tblspc/PG_13_202002271/test
    [centos@tushar-ldap-docker bin]$
    
    -- run pg_validatebackup; the error looks expected
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
    pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
    pg_validatebackup: error: "pg_tblspc/16386/PG_13_202002271/test" is present on disk but not in the manifest
    [centos@tushar-ldap-docker bin]$
    
    -- remove the added file
    [centos@tushar-ldap-docker bin]$ rm -rf /tmp/new_my_tblspc/PG_13_202002271/test
    
    -- run pg_validatebackup; works fine
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
    pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
    pg_validatebackup: backup successfully verified
    [centos@tushar-ldap-docker bin]$
    
    -- remove the PG_13_* folder
    [centos@tushar-ldap-docker bin]$ rm -rf /tmp/new_my_tblspc/PG_13_202002271/
    [centos@tushar-ldap-docker bin]$
    [centos@tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/
    
    -- run pg_validatebackup; no error is reported?
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
    pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
    pg_validatebackup: backup successfully verified
    [centos@tushar-ldap-docker bin]$
    
    Start the server -
    
    [centos@tushar-ldap-docker bin]$ ./pg_ctl -D data_dir/ start -o '-p 9033'
    waiting for server to start....2020-03-04 19:18:54.839 IST [13097] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
    2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv6 address "::1", port 9033
    2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv4 address "127.0.0.1", port 9033
    2020-03-04 19:18:54.842 IST [13097] LOG:  listening on Unix socket "/tmp/.s.PGSQL.9033"
    2020-03-04 19:18:54.843 IST [13097] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
    2020-03-04 19:18:54.845 IST [13098] LOG:  database system was interrupted; last known up at 2020-03-04 19:14:50 IST
    2020-03-04 19:18:54.937 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
    2020-03-04 19:18:54.939 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
    2020-03-04 19:18:54.939 IST [13098] LOG:  redo starts at 0/18000028
    2020-03-04 19:18:54.939 IST [13098] LOG:  consistent recovery state reached at 0/18000100
    2020-03-04 19:18:54.939 IST [13098] LOG:  redo done at 0/18000100
    2020-03-04 19:18:54.941 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
    2020-03-04 19:18:54.984 IST [13097] LOG:  database system is ready to accept connections
      done
    server started
    [centos@tushar-ldap-docker bin]$
    
    regards,
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
    
  102. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-03-05T03:50:19Z

    On Wed, Mar 4, 2020 at 3:51 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
    
    > Another scenario: if we modify the "Manifest-Checksum" value in the
    > backup_manifest file, we do not get an error
    >
    > [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
    > pg_validatebackup: * manifest_checksum =
    > 28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d
    > pg_validatebackup: backup successfully verified
    >
    > open backup_manifest file and replace
    >
    > "Manifest-Checksum":
    > "8d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}
    > with
    > "Manifest-Checksum": "Hello World"}
    >
    > rerun the pg_validatebackup
    >
    > [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
    > pg_validatebackup: * manifest_checksum = Hello World
    > pg_validatebackup: backup successfully verified
    >
    > regards,
    >
    
    Yeah, this handling is missing in the provided WIP patch. I believe Robert
    will consider fixing this in an upcoming version of the validator patch.
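
The "Hello World" case suggests one cheap sanity check that could run before any hashing: a SHA-256 checksum written in hex is exactly 64 hexadecimal digits. The following is only a minimal sketch, not code from the patch; `looks_like_sha256_hex` is a hypothetical helper name, and it assumes the manifest checksum is SHA-256 (which matches the 64-digit values shown in the output). A real validator would still have to recompute the checksum over the manifest contents and compare it.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if s is exactly 64 hexadecimal digits. */
static bool
looks_like_sha256_hex(const char *s)
{
    size_t len = strlen(s);

    if (len != 64)
        return false;
    for (size_t i = 0; i < len; i++)
    {
        /* Cast avoids undefined behavior for negative char values. */
        if (!isxdigit((unsigned char) s[i]))
            return false;
    }
    return true;
}
```

With this check, `looks_like_sha256_hex("Hello World")` returns false, so such a manifest could be rejected early as malformed.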
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  103. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-03-05T04:07:13Z

    On Wed, Mar 4, 2020 at 7:21 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
    
    > Hi,
    >
    > There is a scenario in which i add something inside the pg_tablespace
    > directory , i am getting an error like-
    >
    > pg_validatebackup: * manifest_checksum =
    > 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
    > pg_validatebackup: error: "pg_tblspc/16385/*PG_13_202002271*/test" is
    > present on disk but not in the manifest
    >
    > but if i remove 'PG_13_202002271 ' directory then there is no error
    >
    > [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
    > pg_validatebackup: * manifest_checksum =
    > 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
    > pg_validatebackup: backup successfully verified
    >
    >
    This seems expected given the current design, since we don't record
    directory entries in backup_manifest. In your case the tablespace has no
    objects (it is empty), so backup_manifest has no entry for it; hence,
    when you remove the tablespace directory, the validator cannot detect it.
    
    We can either document this or add entries for directories to the manifest.
    Robert may have a better idea on this.
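
To illustrate why an empty directory is invisible to the validator, here is a small sketch that models the manifest as nothing more than a list of file paths. This is an assumed simplification, not the patch's data structure, and `manifest_covers_dir` is a hypothetical name. A directory is only "known" to such a manifest if at least one listed file lives beneath it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: the manifest is just a list of file paths.
 * A directory is covered only if some listed file lives under it.
 */
static bool
manifest_covers_dir(const char *const *files, size_t nfiles, const char *dir)
{
    size_t dirlen = strlen(dir);

    for (size_t i = 0; i < nfiles; i++)
    {
        /* Require a '/' right after the prefix so "base/1" != "base/10". */
        if (strncmp(files[i], dir, dirlen) == 0 && files[i][dirlen] == '/')
            return true;
    }
    return false;
}
```

Under this model, an empty tablespace directory is never covered, so removing it leaves nothing for the validator to notice.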
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  104. Re: backup manifests

    Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> — 2020-03-05T07:39:02Z

    Hi,
    
    In a negative test scenario, if I change a file size to -1 in
    backup_manifest, pg_validatebackup reports an error with a seemingly
    random size number.
    
    [edb@localhost bin]$ ./pg_basebackup -p 5551 -D /tmp/bold
    --manifest-checksum 'SHA256'
    [edb@localhost bin]$ ./pg_validatebackup /tmp/bold
    pg_validatebackup: backup successfully verified
    
    --change a file size to -1 and generate new checksum.
    [edb@localhost bin]$ vi /tmp/bold/backup_manifest
    [edb@localhost bin]$ shasum -a256 /tmp/bold/backup_manifest
    c3d7838cbbf991c6108f9c1ab78f673c20d8073114500f14da6ed07ede2dc44a
     /tmp/bold/backup_manifest
    [edb@localhost bin]$ vi /tmp/bold/backup_manifest
    
    [edb@localhost bin]$ ./pg_validatebackup /tmp/bold
    pg_validatebackup: error: "global/4183" has size 0 on disk but size
    *18446744073709551615* in the manifest
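
The reported number is not random: 18446744073709551615 is 2^64 - 1, which is what -1 becomes when read as an unsigned 64-bit integer. One plausible explanation, assuming the size field is parsed with a `strtoul`-family function (the thread does not show the parsing code), is that those functions accept a leading minus sign and negate the result in unsigned arithmetic. The sketch below shows that behavior next to a stricter, hypothetical parser that rejects signs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Lenient: strtoull() accepts a leading '-' and negates as unsigned. */
static uint64_t
parse_size_lenient(const char *s)
{
    return (uint64_t) strtoull(s, NULL, 10);
}

/* Stricter variant (hypothetical): digits only, no sign, no overflow. */
static bool
parse_size_strict(const char *s, uint64_t *result)
{
    char *end;
    unsigned long long v;

    /* Reject empty input, signs, and leading whitespace up front. */
    if (s[0] < '0' || s[0] > '9')
        return false;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;
    *result = (uint64_t) v;
    return true;
}
```

Here `parse_size_lenient("-1")` yields `UINT64_MAX`, while `parse_size_strict("-1", &v)` returns false and leaves the caller free to report a corrupt manifest.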
    
    Thanks & Regards,
    Rajkumar Raghuwanshi
    
    
    
  105. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-05T10:10:46Z

    Hi,
    
    There is one scenario where I am able to run pg_validatebackup
    successfully, but the server then fails to start.
    
    Steps to reproduce -
    --create 2 base backup directories
    [centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db1
    [centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db2
    
    --run pg_validatebackup using the backup_manifest of db1 against
    db2/; this gives an error
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m 
    db1/backup_manifest db2/
    pg_validatebackup: * manifest_checksum = 
    5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede
    pg_validatebackup: error: checksum mismatch for file "backup_label"
    --copy the backup_label of db1 to the db2 folder
    [centos@tushar-ldap-docker bin]$ cp db1/backup_label db2/.
    
    --run pg_validatebackup .. working fine
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m 
    db1/backup_manifest db2/
    pg_validatebackup: * manifest_checksum = 
    5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede
    pg_validatebackup: backup successfully verified
    [centos@tushar-ldap-docker bin]$
    
    --try to start the server
    [centos@tushar-ldap-docker bin]$ ./pg_ctl -D db2 start -o '-p 7777'
    waiting for server to start....2020-03-05 15:33:53.471 IST [24049] LOG:  
    starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc 
    (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
    2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv6 address 
    "::1", port 7777
    2020-03-05 15:33:53.471 IST [24049] LOG:  listening on IPv4 address 
    "127.0.0.1", port 7777
    2020-03-05 15:33:53.473 IST [24049] LOG:  listening on Unix socket 
    "/tmp/.s.PGSQL.7777"
    2020-03-05 15:33:53.476 IST [24050] LOG:  database system was 
    interrupted; last known up at 2020-03-05 15:32:51 IST
    2020-03-05 15:33:53.573 IST [24050] LOG:  invalid checkpoint record
    2020-03-05 15:33:53.573 IST [24050] FATAL:  could not locate required 
    checkpoint record
    2020-03-05 15:33:53.573 IST [24050] HINT:  If you are restoring from a 
    backup, touch 
    "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/recovery.signal" and add 
    required recovery options.
         If you are not restoring from a backup, try removing the file 
    "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label".
         Be careful: removing 
    "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label" will result 
    in a corrupt cluster if restoring from a backup.
    2020-03-05 15:33:53.574 IST [24049] LOG:  startup process (PID 24050) 
    exited with exit code 1
    2020-03-05 15:33:53.574 IST [24049] LOG:  aborting startup due to 
    startup process failure
    2020-03-05 15:33:53.575 IST [24049] LOG:  database system is shut down
      stopped waiting
    pg_ctl: could not start server
    Examine the log output.
    [centos@tushar-ldap-docker bin]$
    
    regards,
    
    
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
    
  106. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-05T12:05:28Z

    One small observation: if we use a trailing slash (/) with the -i
    option, we do not get the desired result.
    
    Steps to reproduce -
    ==============
    
    [centos@tushar-ldap-docker bin]$ ./pg_basebackup -D test
    
    [centos@tushar-ldap-docker bin]$ touch test/*pg_notify*/dummy_file
    
    --working
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup   
    --ignore=*pg_notify*  test
    pg_validatebackup: * manifest_checksum = 
    be9b72e1320c6c34c131533de19371a10dd5011940181724e43277f786026c7b
    pg_validatebackup: backup successfully verified
    
    --not working
    
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup   
    --ignore=*pg_notify/*  test
    pg_validatebackup: * manifest_checksum = 
    be9b72e1320c6c34c131533de19371a10dd5011940181724e43277f786026c7b
    pg_validatebackup: error: "pg_notify/dummy_file" is present on disk but 
    not in the manifest
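
One way to make an ignore entry behave the same with or without a trailing slash is to strip trailing slashes before comparing, and then match only on a path-component boundary. This is a minimal sketch of that idea, not the fix that went into the patch; `should_ignore` is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical matcher: true if path equals the ignore entry or lies
 * beneath it. Trailing slashes on the ignore entry are disregarded.
 */
static bool
should_ignore(const char *path, const char *ignore)
{
    size_t len = strlen(ignore);

    /* "pg_notify/" and "pg_notify" should mean the same thing. */
    while (len > 0 && ignore[len - 1] == '/')
        len--;
    if (len == 0)
        return false;
    if (strncmp(path, ignore, len) != 0)
        return false;
    /* Match only whole components: "pg_notify" must not hit "pg_notify2". */
    return path[len] == '\0' || path[len] == '/';
}
```

With this approach, both `--ignore=pg_notify` and `--ignore=pg_notify/` would skip `pg_notify/dummy_file`.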
    
    regards,
    
    
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
    
  107. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-05T16:55:39Z

    On Thu, Mar 5, 2020 at 7:05 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
    > There is one small observation if we use slash (/) with option -i then not getting the desired result
    
    Here's an updated patch set responding to many of the comments
    received thus far. Since there are quite a few emails, let me
    consolidate my comments and responses here.
    
    Report: Segmentation fault if -m is used to point to a valid manifest,
    but actual backup directory is nonexistent.
    Response: Fixed; thanks for the report.
    
    Report: pg_validatebackup doesn't complain about problems within the
    pg_wal directory.
    Response: That's out of scope. The WAL files are fetched separately
    and are therefore not part of the manifest.
    
    Report: Inaccessible file in data directory being validated leads to a
    double free.
    Response: Fixed; thanks for the report.
    
    Report: Patch 0005 doesn't validate the manifest checksum.
    Response: I know. I mentioned that when posting the previous patch
    set. Fixed in this version, though.
    
    Report: Removing an empty directory doesn't make backup validation
    fail, even though it might cause problems for the server.
    Response: That's a little unfortunate, but I'm not sure it's really
    worth complicating the patch to deal with it. It's something of a
    corner case.
    
    Report: Negative file sizes in the backup manifest are interpreted as
    large integers.
    Response: That's also a little unfortunate, but I doubt it's worth
    adding code to catch it, since any such manifest is corrupt. Also,
    it's not like we're ignoring it; the error just isn't ideal.
    
    Report: If I take the backup label from backup #1 and stick it into
    otherwise-identical backup #2, validation succeeds but the server
    won't start.
    Response: That's because we can't validate the pg_wal directory. As
    noted above, that's out of scope.
    
    Report: Using --ignore with a slash-terminated pathname doesn't work
    as expected.
    Response: Fixed, thanks for the report.
    
    Off-List Report: You forgot a PG_BINARY flag.
    Response: Fixed. I thought I'd done this before but there were two
    places and I'd only fixed one of them.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  108. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-03-06T08:58:24Z

    Thanks, Robert.
    
    1: I get the error below while compiling the 0002 patch.
    
    edb@localhost:postgres$ mi > mi.log
    basebackup.c: In function ‘AddFileToManifest’:
    basebackup.c:1052:6: error: ‘pathname’ undeclared (first use in this
    function)
          pathname);
          ^
    basebackup.c:1052:6: note: each undeclared identifier is reported only once
    for each function it appears in
    make[3]: *** [basebackup.o] Error 1
    make[2]: *** [replication-recursive] Error 2
    make[1]: *** [install-backend-recurse] Error 2
    make: *** [install-src-recurse] Error 2
    
    
    I can see you have renamed the filename argument of AddFileToManifest() to
    pathname, but those changes are part of 0003 (the validator patch).
    I think the changes to src/backend/replication/basebackup.c should not be
    in the validator patch (0003). We can move them to a backup manifest
    patch, either 0002 or 0004, for better readability of the patch set.
    
    2:
    
    #define KW_MANIFEST_VERSION "PostgreSQL-Backup-Manifest-Version"
    #define KW_MANIFEST_FILE "File"
    #define KW_MANIFEST_CHECKSUM "Manifest-Checksum"
    #define KWL_MANIFEST_VERSION (sizeof(KW_MANIFEST_VERSION)-1)
    #define KWL_MANIFEST_FILE (sizeof(KW_MANIFEST_FILE)-1)
    #define KWL_MANIFEST_CHECKSUM (sizeof(KW_MANIFEST_CHECKSUM)-1)
    
    #define FIELDS_PER_FILE_LINE 4
    
    A few macros defined in the 0003 patch are not used anywhere in the 0005
    patch. We can either replace them with hard-coded values or remove them.
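
For readers unfamiliar with the idiom in these macros: `sizeof` applied to a string literal counts the terminating NUL byte, so `sizeof(literal) - 1` gives the keyword length as a compile-time constant. The sketch below shows one way such a pair might be used; how the patch actually uses them is not shown in the thread, and `line_has_checksum_keyword` is a hypothetical function.

```c
#include <stdbool.h>
#include <string.h>

#define KW_MANIFEST_CHECKSUM "Manifest-Checksum"
/* sizeof counts the terminating NUL, so subtract 1 for the length. */
#define KWL_MANIFEST_CHECKSUM (sizeof(KW_MANIFEST_CHECKSUM) - 1)

/* Hypothetical use: does this line start with the keyword? */
static bool
line_has_checksum_keyword(const char *line)
{
    return strncmp(line, KW_MANIFEST_CHECKSUM, KWL_MANIFEST_CHECKSUM) == 0;
}
```

Compared with hard-coding the length, this keeps the keyword and its length in sync if the keyword text ever changes.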
    
    
    
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  109. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-09T16:22:17Z

    On 3/5/20 10:25 PM, Robert Haas wrote:
    > Here's an updated patch set responding to many of the comments
    > received thus far.
    Thanks Robert. There is a scenario: if the user supplies the port of a
    v11 server when taking a base backup with pg_basebackup (v13 with your
    patch applied) and the option --manifest-checksums, it leads to the
    error below.
    
    [centos@tushar-ldap-docker bin]$ ./pg_basebackup -R -p 9045 
    --manifest-checksums=SHA224 -D dc1
    pg_basebackup: error: could not initiate base backup: ERROR: syntax error
    pg_basebackup: removing data directory "dc1"
    [centos@tushar-ldap-docker bin]$
    
    Steps to reproduce -
    PG v11 is running
    run pg_basebackup against that with option --manifest-checksums
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
    
    
    
    
  110. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-09T17:16:47Z

    On Mon, Mar 9, 2020 at 12:22 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
    > On 3/5/20 10:25 PM, Robert Haas wrote:
    > > Here's an updated patch set responding to many of the comments
    > > received thus far.
    > Thanks Robert. There is a scenario - if user provide port of v11 server
    > at the time of  creating 'base backup'  against pg_basebackup(v13+ your
    > patch applied)
    > with option --manifest-checksums,will lead to  this  below error
    >
    > [centos@tushar-ldap-docker bin]$ ./pg_basebackup -R -p 9045
    > --manifest-checksums=SHA224 -D dc1
    > pg_basebackup: error: could not initiate base backup: ERROR: syntax error
    > pg_basebackup: removing data directory "dc1"
    > [centos@tushar-ldap-docker bin]$
    >
    > Steps to reproduce -
    > PG v11 is running
    > run pg_basebackup against that with option --manifest-checksums
    
    Seems like expected behavior to me. We could consider providing a more
    descriptive error message, but there's no way for it to work.
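
A more descriptive message could come from a client-side version check made before the manifest option is sent. The following is only a sketch of that idea and not the change made in the patch: `server_supports_manifest` and `MIN_MANIFEST_SERVER_VERSION` are hypothetical names, and the argument is assumed to be the integer returned by libpq's `PQserverVersion()`.

```c
#include <stdbool.h>

/* Backup manifests first appear in PostgreSQL 13 (version number 130000). */
#define MIN_MANIFEST_SERVER_VERSION 130000

/*
 * Hypothetical client-side guard. server_version_num is the integer form
 * reported by libpq's PQserverVersion(), e.g. 110007 for 11.7.
 */
static bool
server_supports_manifest(int server_version_num)
{
    return server_version_num >= MIN_MANIFEST_SERVER_VERSION;
}
```

A caller could test this right after connecting and print something clearer than "syntax error" when the server is too old.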
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  111. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-11T20:08:39Z

    On Fri, Mar 6, 2020 at 3:58 AM Suraj Kharage
    <suraj.kharage@enterprisedb.com> wrote:
    > 1: Getting below error while compiling 0002 patch.
    > 2:
    >
    > Few macros defined in 0003 patch not used anywhere in 0005 patch. Either we can replace these with hard-coded values or remove them.
    
    Thanks. I hope that I have straightened those things out in the new
    version which is attached. This version also includes some other
    changes. The non-JSON code is now completely gone. Also, I've
    refactored the code that parses the JSON manifest to make it
    cleaner, and I've moved it out into a separate file. This might be
    useful if anyone ends up wanting to reuse that code for some other
    purpose, and I think it makes it easier to understand, too, since the
    manifest parsing is now much better separated from the task of
    actually validating the given directory against the manifest. I've
    also added some tests, which are based in part on testing ideas from
    Rajkumar Raghuwanshi and Mark Dilger, but this test code was written
    by me. So now it's like this:
    
    0001 - checksum helper functions. same as before.
    0002 - patch the server to generate and send a manifest, and
    pg_basebackup to receive it
    0003 - add pg_validatebackup
    0004 - TAP tests
    
    Comments?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  112. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-12T14:46:57Z

    On 3/9/20 10:46 PM, Robert Haas wrote:
    > Seems like expected behavior to me. We could consider providing a more
    > descriptive error message, but there's no way for it to work.
    
    Right, the error message needs to be more user-friendly.
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
    
    
    
    
  113. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-13T13:53:03Z

    On 3/12/20 8:16 PM, tushar wrote:
    >> Seems like expected behavior to me. We could consider providing a more
    >> descriptive error message, but there's no way for it to work.
    >
    > Right , Error message need to be more user friendly . 
    
    One scenario which I feel should error out even if the -s option is
    specified:
    
    Create a base backup directory (./pg_basebackup -D data1).
    As root, take away the permissions on the pg_hba.conf file
    (chmod 004 pg_hba.conf).
    
    run pg_validatebackup -
    
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data1
    pg_validatebackup: error: could not open file "pg_hba.conf": Permission 
    denied
    
    run pg_validatebackup  with switch -s
    
    [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data1 -s
    pg_validatebackup: backup successfully verified
    
    Here the file is not accessible, so I think it should throw an error
    (the same one as above) instead of blindly skipping it.
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
    
  114. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-13T16:54:18Z

    On Fri, Mar 13, 2020 at 9:53 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
    > Run pg_validatebackup:
    >
    > [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data1
    > pg_validatebackup: error: could not open file "pg_hba.conf": Permission denied
    >
    > Run pg_validatebackup with the -s switch:
    >
    > [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data1 -s
    > pg_validatebackup: backup successfully verified
    >
    > Here the file is not accessible, so I think it should throw an error (the same one as above) instead of blindly skipping it.
    
    I don't really want to do that. That would require it to open every
    file even if it doesn't need to read the data in the files. I think in
    most cases that would just slow it down for no real benefit. If you've
    specified -s, you have to be OK with getting a less complete check for
    problems.
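    
    The trade-off described above can be sketched as follows. This is a
    minimal illustration, not pg_validatebackup's actual code: it assumes
    that with -s the tool only checks that each file exists and has the
    expected size, and that a POSIX system is in use. The names
    demo_check_file and demo_run_skip are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns true if the file passes the requested checks.
 * With skip_checksums set, the file is never opened, so a file that exists
 * but cannot be read is not detected.
 */
bool
demo_check_file(const char *path, off_t expected_size, bool skip_checksums)
{
    struct stat st;
    int         fd;

    /* stat() does not require read permission on the file itself. */
    if (stat(path, &st) != 0 || st.st_size != expected_size)
        return false;

    if (skip_checksums)
        return true;

    /* Checksumming needs the contents, so an unreadable file fails here. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;

    /* A real tool would read the data and compute a checksum here. */
    close(fd);
    return true;
}

/* Exercises the helper on a temporary file; returns the number of failures. */
int
demo_run_skip(void)
{
    char        path[] = "/tmp/demo_manifest_XXXXXX";
    int         fd = mkstemp(path);
    int         failures = 0;

    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5)
        failures++;
    close(fd);

    if (!demo_check_file(path, 5, true))
        failures++;             /* size matches, checksums skipped */
    if (!demo_check_file(path, 5, false))
        failures++;             /* size matches, file is readable */
    if (demo_check_file(path, 6, true))
        failures++;             /* wrong size must be rejected */

    unlink(path);
    if (demo_check_file(path, 5, true))
        failures++;             /* missing file must be rejected */

    return failures;
}
```

    In this sketch the permission problem only surfaces on the open() path,
    which is skipped entirely when skip_checksums is true.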
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  115. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-13T20:34:27Z

    On Thu, Mar 12, 2020 at 10:47 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
    > On 3/9/20 10:46 PM, Robert Haas wrote:
    > > Seems like expected behavior to me. We could consider providing a more
    > > descriptive error message, but there's no way for it to work.
    >
    > Right, the error message needs to be more user-friendly.
    
    OK. Done in the attached version, which also includes a few other changes:
    
    - I expanded the regression tests. They now cover every line of code
    in parse_manifest.c except for a few that I believe to be unreachable
    (though I might be mistaken). Coverage for pg_validatebackup.c is also
    improved, but it's not 100%; there are some cases that I don't know
    how to hit outside of a kernel malfunction, and others that I only
    know how to hit on non-Windows systems. For instance, it's easy to use
    perl to make a file inaccessible on Linux with chmod(0, $filename),
    but I gather that doesn't work on Windows. I'm going to spend a bit
    more time looking at this, but I think it's already reasonably good.
    
    - I fixed a couple of very minor bugs which I discovered by writing those tests.
    
    - I added documentation, in part based on a draft Mark Dilger shared
    with me off-list.
    
    I don't think this is committable just yet, but I think it's getting
    fairly close, so if anyone has major objections please speak up soon.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  116. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-03-16T05:07:35Z

    Thank you, Robert.
    
    I am getting the warning below while compiling
    v11-0003-pg_validatebackup-Validate-a-backup-against-the-.patch:
    
    pg_validatebackup.c: In function ‘report_manifest_error’:
    pg_validatebackup.c:356:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
      pg_log_generic_v(PG_LOG_FATAL, fmt, ap);
    
    To resolve this, can we use "pg_attribute_printf(2, 3)" in the function
    declaration, something like below?
    e.g.:
    
    diff --git a/src/bin/pg_validatebackup/parse_manifest.h b/src/bin/pg_validatebackup/parse_manifest.h
    index b0b18a5..25d140f 100644
    --- a/src/bin/pg_validatebackup/parse_manifest.h
    +++ b/src/bin/pg_validatebackup/parse_manifest.h
    @@ -25,7 +25,7 @@ typedef void (*json_manifest_perfile_callback)(JsonManifestParseContext *,
                                                                     size_t size, pg_checksum_type checksum_type,
                                                                     int checksum_length, uint8 *checksum_payload);
     typedef void (*json_manifest_error_callback)(JsonManifestParseContext *,
    -                                                                char *fmt, ...);
    +                                                                char *fmt,...) pg_attribute_printf(2, 3);
    
     struct JsonManifestParseContext
     {
    diff --git a/src/bin/pg_validatebackup/pg_validatebackup.c b/src/bin/pg_validatebackup/pg_validatebackup.c
    index 0e7299b..6ccbe59 100644
    --- a/src/bin/pg_validatebackup/pg_validatebackup.c
    +++ b/src/bin/pg_validatebackup/pg_validatebackup.c
    @@ -95,7 +95,7 @@ static void record_manifest_details_for_file(JsonManifestParseContext *context,
                 int checksum_length,
                 uint8 *checksum_payload);
     static void report_manifest_error(JsonManifestParseContext *context,
    -                                                                 char *fmt, ...);
    +                                                                 char *fmt,...) pg_attribute_printf(2, 3);
    
     static void validate_backup_directory(validator_context *context,
                 char *relpath, char *fullpath);
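    
    For readers unfamiliar with the attribute, here is a small standalone
    sketch of the same idea. It does not use PostgreSQL headers:
    demo_attribute_printf is a stand-in macro for pg_attribute_printf that
    assumes a GCC-compatible compiler, and DemoContext, demo_error_callback,
    demo_report_error and demo_run_format are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for pg_attribute_printf(); expands to nothing elsewhere. */
#if defined(__GNUC__)
#define demo_attribute_printf(f, a) __attribute__((format(printf, f, a)))
#else
#define demo_attribute_printf(f, a)
#endif

typedef struct DemoContext
{
    char        last_error[128];    /* most recent formatted message */
} DemoContext;

/* Argument 2 is the format string; the variadic arguments start at 3. */
typedef void (*demo_error_callback) (DemoContext *context,
                                     const char *fmt,...)
            demo_attribute_printf(2, 3);

void        demo_report_error(DemoContext *context, const char *fmt,...)
            demo_attribute_printf(2, 3);

/* Formats the message into the context instead of printing it. */
void
demo_report_error(DemoContext *context, const char *fmt,...)
{
    va_list     ap;

    va_start(ap, fmt);
    vsnprintf(context->last_error, sizeof(context->last_error), fmt, ap);
    va_end(ap);
}

/* Calls the function through the annotated pointer type; 0 means success. */
int
demo_run_format(void)
{
    DemoContext ctx = {{0}};
    demo_error_callback cb = demo_report_error;

    /* The compiler can now check these arguments against the format. */
    cb(&ctx, "missing %s at line %d", "size", 7);
    return strcmp(ctx.last_error, "missing size at line 7");
}
```

    The sketch annotates both the callback type and the function because,
    as far as I understand it, -Wsuggest-attribute=format can also flag an
    assignment to a function pointer whose type lacks the attribute.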
    
    
    Typos:
    
    0004 patch
    unexpctedly => unexpectedly
    
    0005 patch
    bacup => backup
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  117. Re: backup manifests

    Suraj Kharage <suraj.kharage@enterprisedb.com> — 2020-03-16T06:03:23Z

    One more suggestion: a recent commit (1933ae62) added the PostgreSQL home
    page to the --help output.
    
    e.g.:
    PostgreSQL home page: <https://www.postgresql.org/>
    
    We might need to consider this change for the pg_validatebackup binary.
    
    -- 
    --
    
    Thanks & Regards,
    Suraj kharage,
    EnterpriseDB Corporation,
    The Postgres Database Company.
    
  118. Re: backup manifests

    tushar <tushar.ahuja@enterprisedb.com> — 2020-03-16T10:22:04Z

    On 3/14/20 2:04 AM, Robert Haas wrote:
    > OK. Done in the attached version
    
    Thanks. Verified.
    
    -- 
    regards,tushar
    EnterpriseDB  https://www.enterprisedb.com/
    The Enterprise PostgreSQL Company
    
    
    
    
    
  119. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-20T22:29:48Z

    On Mon, Mar 16, 2020 at 2:03 AM Suraj Kharage
    <suraj.kharage@enterprisedb.com> wrote:
    > One more suggestion, recent commit (1933ae62) has added the PostgreSQL home page to --help output.
    
    Good catch. Fixed. I also attempted to address the compiler warning
    you mentioned in your other email.
    
    Also, I realized that the previous patch versions didn't handle the
    hex-encoded path format that we need to use for non-UTF8 filenames,
    and that there was no easy way to test that format. So, in this
    version I added an option to force all pathnames to be encoded in that
    format. I also made that option capable of suppressing the backup
    manifest altogether. Other than that, this version is pretty much the
    same as the last version, except for a few additional test cases which
    I added to get the code coverage up even a little more. It would be
    nice if someone could test whether the tests pass on Windows.
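    
    As a rough illustration of what hex-encoding a pathname involves, here
    is a minimal sketch. It is not the code from the patch: demo_hex_encode
    and demo_run_hex are hypothetical names, and the use of lowercase digits
    is an arbitrary choice for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns a malloc'd, NUL-terminated hex representation of the given bytes,
 * two digits per byte, or NULL on allocation failure. The caller frees it.
 */
char *
demo_hex_encode(const unsigned char *src, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    char       *dst = malloc(len * 2 + 1);
    size_t      i;

    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];       /* high nibble */
        dst[2 * i + 1] = digits[src[i] & 0x0F]; /* low nibble */
    }
    dst[len * 2] = '\0';
    return dst;
}

/* Encodes a name containing a byte that is not valid UTF-8; 0 means success. */
int
demo_run_hex(void)
{
    const unsigned char name[] = {0xFF, 'a', 'b'};
    char       *encoded = demo_hex_encode(name, sizeof(name));
    int         result;

    if (encoded == NULL)
        return -1;
    result = strcmp(encoded, "ff6162");
    free(encoded);
    return result;
}
```

    Because every byte maps to two ASCII digits, the encoded form can be
    stored in a text manifest regardless of the original byte sequence.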
    
    I have squashed the series down to just 2 commits, since that seems
    like the way that this should probably be committed. Barring strong
    objections and/or the end of the world, I plan to do that next week.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  120. Re: backup manifests

    Amit Kapila <amit.kapila16@gmail.com> — 2020-03-21T12:26:53Z

    On Sat, Mar 21, 2020 at 4:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > On Mon, Mar 16, 2020 at 2:03 AM Suraj Kharage
    > <suraj.kharage@enterprisedb.com> wrote:
    > > One more suggestion, recent commit (1933ae62) has added the PostgreSQL home page to --help output.
    >
    > Good catch. Fixed. I also attempted to address the compiler warning
    > you mentioned in your other email.
    >
    > Also, I realized that the previous patch versions didn't handle the
    > hex-encoded path format that we need to use for non-UTF8 filenames,
    > and that there was no easy way to test that format. So, in this
    > version I added an option to force all pathnames to be encoded in that
    > format. I also made that option capable of suppressing the backup
    > manifest altogether. Other than that, this version is pretty much the
    > same as the last version, except for a few additional test cases which
    > I added to get the code coverage up even a little more. It would be
    > nice if someone could test whether the tests pass on Windows.
    >
    
    On my CentOS machine, the patch gives the compilation failure below:
    pg_validatebackup.c: In function ‘parse_manifest_file’:
    pg_validatebackup.c:335:19: error: assignment left-hand side might be
    a candidate for a format attribute [-Werror=suggest-attribute=format]
      context.error_cb = report_manifest_error;
    
    I have tested it on Windows and found there are multiple failures.
    The failures are as below:
    Test Summary Report
    ---------------------------------------
    t/002_algorithm.pl   (Wstat: 512 Tests: 5 Failed: 4)
      Failed tests:  2-5
      Non-zero exit status: 2
      Parse errors: Bad plan.  You planned 19 tests but ran 5.
    t/003_corruption.pl  (Wstat: 256 Tests: 14 Failed: 7)
      Failed tests:  2, 4, 6, 8, 10, 12, 14
      Non-zero exit status: 1
      Parse errors: Bad plan.  You planned 44 tests but ran 14.
    t/004_options.pl     (Wstat: 4352 Tests: 25 Failed: 17)
      Failed tests:  2, 4, 6-12, 14-17, 19-20, 22, 25
      Non-zero exit status: 17
    t/005_bad_manifest.pl (Wstat: 1792 Tests: 44 Failed: 7)
      Failed tests:  18, 24, 26, 30, 32, 34, 36
      Non-zero exit status: 7
    Files=6, Tests=109, 72 wallclock secs ( 0.05 usr +  0.01 sys =  0.06 CPU)
    Result: FAIL
    
    Failure Report
    ------------------------
    t/002_algorithm.pl ..... 1/19
    #   Failed test 'backup ok with algorithm "none"'
    #   at t/002_algorithm.pl line 33.
    
    #   Failed test 'backup manifest exists'
    #   at t/002_algorithm.pl line 39.
    
    t/002_algorithm.pl ..... 4/19 #   Failed test 'validate backup with
    algorithm "none"'
    #   at t/002_algorithm.pl line 53.
    
    #   Failed test 'backup ok with algorithm "crc32c"'
    #   at t/002_algorithm.pl line 33.
    # Looks like you planned 19 tests but ran 5.
    # Looks like you failed 4 tests of 5 run.
    # Looks like your test exited with 2 just after 5.
    t/002_algorithm.pl ..... Dubious, test returned 2 (wstat 512, 0x200)
    Failed 18/19 subtests
    t/003_corruption.pl .... 1/44
    #   Failed test 'intact backup validated'
    #   at t/003_corruption.pl line 110.
    
    #   Failed test 'corrupt backup fails validation: extra_file: matches'
    #   at t/003_corruption.pl line 117.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:extra_file.*present on disk but not in the manifest)'
    t/003_corruption.pl .... 5/44
    #   Failed test 'intact backup validated'
    #   at t/003_corruption.pl line 110.
    t/003_corruption.pl .... 7/44
    #   Failed test 'corrupt backup fails validation:
    extra_tablespace_file: matches'
    #   at t/003_corruption.pl line 117.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:extra_ts_file.*present on disk but not in the
    manifest)'
    t/003_corruption.pl .... 9/44
    #   Failed test 'intact backup validated'
    #   at t/003_corruption.pl line 110.
    
    #   Failed test 'corrupt backup fails validation: missing_file: matches'
    #   at t/003_corruption.pl line 117.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:pg_xact/0000.*present in the manifest but not on disk)'
    t/003_corruption.pl .... 13/44
    #   Failed test 'intact backup validated'
    #   at t/003_corruption.pl line 110.
    # Looks like you planned 44 tests but ran 14.
    # Looks like you failed 7 tests of 14 run.
    # Looks like your test exited with 1 just after 14.
    t/003_corruption.pl .... Dubious, test returned 1 (wstat 256, 0x100)
    Failed 37/44 subtests
    t/004_options.pl ....... 1/25
    #   Failed test '-q succeeds: exit code 0'
    #   at t/004_options.pl line 25.
    
    #   Failed test '-q succeeds: no stderr'
    #   at t/004_options.pl line 27.
    #          got: 'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     expected: ''
    
    #   Failed test '-q checksum mismatch: matches'
    #   at t/004_options.pl line 37.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:checksum mismatch for file \"PG_VERSION\")'
    t/004_options.pl ....... 7/25
    #   Failed test '-s skips checksumming: exit code 0'
    #   at t/004_options.pl line 43.
    
    #   Failed test '-s skips checksumming: no stderr'
    #   at t/004_options.pl line 43.
    #          got: 'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     expected: ''
    
    #   Failed test '-s skips checksumming: matches'
    #   at t/004_options.pl line 43.
    #                   ''
    #     doesn't match '(?^:backup successfully verified)'
    
    #   Failed test '-i ignores problem file: exit code 0'
    #   at t/004_options.pl line 48.
    
    #   Failed test '-i ignores problem file: no stderr'
    #   at t/004_options.pl line 48.
    #          got: 'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     expected: ''
    
    #   Failed test '-i ignores problem file: matches'
    #   at t/004_options.pl line 48.
    #                   ''
    #     doesn't match '(?^:backup successfully verified)'
    
    #   Failed test '-i does not ignore all problems: matches'
    #   at t/004_options.pl line 57.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:pg_xact.*is present in the manifest but not on disk)'
    
    #   Failed test 'multiple -i options work: exit code 0'
    #   at t/004_options.pl line 62.
    
    #   Failed test 'multiple -i options work: no stderr'
    #   at t/004_options.pl line 62.
    #          got: 'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     expected: ''
    
    #   Failed test 'multiple -i options work: matches'
    #   at t/004_options.pl line 62.
    #                   ''
    #     doesn't match '(?^:backup successfully verified)'
    
    #   Failed test 'multiple problems: missing files reported'
    #   at t/004_options.pl line 71.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:pg_xact.*is present in the manifest but not on disk)'
    
    #   Failed test 'multiple problems: checksum mismatch reported'
    #   at t/004_options.pl line 73.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:checksum mismatch for file \"PG_VERSION\")'
    
    #   Failed test '-e reports 1 error: missing files reported'
    #   at t/004_options.pl line 80.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:pg_xact.*is present in the manifest but not on disk)'
    
    #   Failed test 'nonexistent backup directory: matches'
    #   at t/004_options.pl line 86.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:could not open directory)'
    # Looks like you failed 17 tests of 25.
    t/004_options.pl ....... Dubious, test returned 17 (wstat 4352, 0x1100)
    Failed 17/25 subtests
    t/005_bad_manifest.pl .. 1/44
    #   Failed test 'missing pathname: matches'
    #   at t/005_bad_manifest.pl line 156.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: missing size
    # '
    #     doesn't match '(?^:could not parse backup manifest: missing pathname)'
    
    #   Failed test 'missing size: matches'
    #   at t/005_bad_manifest.pl line 156.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:could not parse backup manifest: missing size)'
    
    #   Failed test 'file size is not an integer: matches'
    #   at t/005_bad_manifest.pl line 156.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:could not parse backup manifest: file size is
    not an integer)'
    
    #   Failed test 'duplicate pathname in backup manifest: matches'
    #   at t/005_bad_manifest.pl line 156.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:fatal: duplicate pathname in backup manifest)'
    t/005_bad_manifest.pl .. 31/44
    #   Failed test 'checksum without algorithm: matches'
    #   at t/005_bad_manifest.pl line 156.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:could not parse backup manifest: checksum
    without algorithm)'
    
    #   Failed test 'unrecognized checksum algorithm: matches'
    #   at t/005_bad_manifest.pl line 156.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:fatal: unrecognized checksum algorithm)'
    
    #   Failed test 'invalid checksum for file: matches'
    #   at t/005_bad_manifest.pl line 156.
    #                   'pg_validatebackup: fatal: could not parse backup
    manifest: both pathname and encoded pathname
    # '
    #     doesn't match '(?^:fatal: invalid checksum for file)'
    # Looks like you failed 7 tests of 44.
    t/005_bad_manifest.pl .. Dubious, test returned 7 (wstat 1792, 0x700)
    Failed 7/44 subtests
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  121. Re: backup manifests

    Amit Kapila <amit.kapila16@gmail.com> — 2020-03-23T11:04:28Z

    On Sat, Mar 21, 2020 at 5:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    >
    > On my CentOS, the patch gives below compilation failure:
    > pg_validatebackup.c: In function ‘parse_manifest_file’:
    > pg_validatebackup.c:335:19: error: assignment left-hand side might be
    > a candidate for a format attribute [-Werror=suggest-attribute=format]
    >   context.error_cb = report_manifest_error;
    >
    > I have tested it on Windows and found there are multiple failures.
    > The failures are as below:
    >
    
    I have started to investigate the failures.
    
    >
    > Failure Report
    > ------------------------
    > t/002_algorithm.pl ..... 1/19
    > #   Failed test 'backup ok with algorithm "none"'
    > #   at t/002_algorithm.pl line 33.
    >
    
    I checked the log, and it showed this error:
    
    /src/bin/pg_validatebackup/tmp_check/t_002_algorithm_master_data/backup/none
    --manifest-checksum none --no-sync
    \tmp_install\bin\pg_basebackup.EXE: illegal option -- manifest-checksum
    
    It seems the option to be used should be --manifest-checksums.  The
    attached patch fixes this problem for me.
    
    > t/002_algorithm.pl ..... 4/19 #   Failed test 'validate backup with
    > algorithm "none"'
    > #   at t/002_algorithm.pl line 53.
    >
    
    The error message for the above failure is:
    pg_validatebackup: fatal: could not parse backup manifest: both
    pathname and encoded pathname
    
    I don't know at this stage what could cause this. Any pointers?
    
    Attached are logs of failed runs (regression.tar.gz).
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
  122. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-23T16:15:54Z

    On Mon, Mar 23, 2020 at 7:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > /src/bin/pg_validatebackup/tmp_check/t_002_algorithm_master_data/backup/none
    > --manifest-checksum none --no-sync
    > \tmp_install\bin\pg_basebackup.EXE: illegal option -- manifest-checksum
    >
    > It seems the option to be used should be --manifest-checksums.  The
    > attached patch fixes this problem for me.
    
    OK, incorporated that.
    
    > > t/002_algorithm.pl ..... 4/19 #   Failed test 'validate backup with
    > > algorithm "none"'
    > > #   at t/002_algorithm.pl line 53.
    > >
    >
    > The error message for the above failure is:
    > pg_validatebackup: fatal: could not parse backup manifest: both
    > pathname and encoded pathname
    >
    > I don't know at this stage what could cause this?  Any pointers?
    
    I think I forgot an initializer. Try this version.
    
    I also incorporated a fix previously proposed by Suraj for the
    compiler warning you mentioned in the other email.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  123. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-23T22:42:17Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > I think I forgot an initializer. Try this version.
    
    Just took a quick look through this.  I'm pretty sure David wants to
    look at it too.  Anyway, some comments below.
    
    > diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
    > index f139ba0231..d1ff53e8e8 100644
    > --- a/doc/src/sgml/protocol.sgml
    > +++ b/doc/src/sgml/protocol.sgml
    > @@ -2466,7 +2466,7 @@ The commands accepted in replication mode are:
    >    </varlistentry>
    >  
    >    <varlistentry id="protocol-replication-base-backup" xreflabel="BASE_BACKUP">
    > -    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ]
    > +    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ] [ <literal>MANIFEST</literal> <replaceable>manifest_option</replaceable> ] [ <literal>MANIFEST_CHECKSUMS</literal> <replaceable>checksum_algorithm</replaceable> ]
    >       <indexterm><primary>BASE_BACKUP</primary></indexterm>
    >      </term>
    >      <listitem>
    > @@ -2576,6 +2576,37 @@ The commands accepted in replication mode are:
    >           </para>
    >          </listitem>
    >         </varlistentry>
    > +
    > +       <varlistentry>
    > +        <term><literal>MANIFEST</literal></term>
    > +        <listitem>
    > +         <para>
    > +          When this option is specified with a value of <literal>ye'</literal>
    > +          or <literal>force-escape</literal>, a backup manifest is created
    > +          and sent along with the backup. The latter value forces all filenames
    > +          to be hex-encoded; otherwise, this type of encoding is performed only
    > +          for files whose names are non-UTF8 octet sequences.
    > +          <literal>force-escape</literal> is intended primarily for testing
    > +          purposes, to be sure that clients which read the backup manifest
    > +          can handle this case. For compatibility with previous releases,
    > +          the default is <literal>MANIFEST 'no'</literal>.
    > +         </para>
    > +        </listitem>
    > +       </varlistentry>
    > +
    > +       <varlistentry>
    > +        <term><literal>MANIFEST_CHECKSUMS</literal></term>
    > +        <listitem>
    > +         <para>
    > +          Specifies the algorithm that should be used to checksum each file
    > +          for purposes of the backup manifest. Currently, the available
    > +          algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
    > +          <literal>SHA224</literal>, <literal>SHA256</literal>,
    > +          <literal>SHA384</literal>, and <literal>SHA512</literal>.
    > +          The default is <literal>CRC32C</literal>.
    > +         </para>
    > +        </listitem>
    > +       </varlistentry>
    >        </variablelist>
    >       </para>
    >       <para>
    
    While I get the desire to have a default here that includes checksums,
    the way the command is structured, it strikes me as odd that the lack of
    MANIFEST_CHECKSUMS in the command actually results in checksums being
    included.  I would think that we'd either:
    
    - have the lack of MANIFEST_CHECKSUMS mean 'No checksums'
    
    or
    
    - Require MANIFEST_CHECKSUMS to be specified and not have it be optional
    
    We aren't expecting people to actually be typing these commands out and
    so I don't think it's a *huge* deal to have it the way you've written
    it, but it still strikes me as odd.  I don't think I have a real
    preference between the two options that I suggest above, maybe very
    slightly in favor of the first.
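
    The filename rule in the MANIFEST paragraph quoted above (hex-encode
    only names that are not valid UTF-8, unless force-escape is given) can
    be sketched as follows.  This is a minimal illustration, not the patch's
    code: the function name and the "path" / "encoded-path" labels are
    hypothetical.

```python
from typing import Tuple


def encode_pathname(raw: bytes, force_escape: bool = False) -> Tuple[str, str]:
    """Return a (label, value) pair describing how a file name would be stored.

    The labels "path" and "encoded-path" are hypothetical names used only
    for this illustration.
    """
    if not force_escape:
        try:
            # Valid UTF-8 names are stored as ordinary text.
            return ("path", raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Not valid UTF-8: fall through to hex encoding.
            pass
    # Either force-escape was requested or the name is not valid UTF-8.
    return ("encoded-path", raw.hex())
```

    With force_escape=True every name takes the hex branch, which is what
    makes that mode useful for testing clients that read the manifest.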
    
    > diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
    > index 90638aad0e..bf6963a595 100644
    > --- a/doc/src/sgml/ref/pg_basebackup.sgml
    > +++ b/doc/src/sgml/ref/pg_basebackup.sgml
    > @@ -561,6 +561,69 @@ PostgreSQL documentation
    >         </para>
    >        </listitem>
    >       </varlistentry>
    > +
    > +     <varlistentry>
    > +      <term><option>--no-manifest</option></term>
    > +      <listitem>
    > +       <para>
    > +        Disables generation of a backup manifest. If this option is not
    > +        specified, the server will and send generate a backup manifest
    > +        which can be verified using <xref linkend="app-pgvalidatebackup" />.
    > +       </para>
    > +      </listitem>
    > +     </varlistentry>
    
    How about "If this option is not specified, the server will generate and
    send a backup manifest which can be verified using ..."
    
    > +     <varlistentry>
    > +      <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
    > +      <listitem>
    > +       <para>
    > +        Specifies the algorithm that should be used to checksum each file
    > +        for purposes of the backup manifest. Currently, the available
    > +        algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
    > +        <literal>SHA224</literal>, <literal>SHA256</literal>,
    > +        <literal>SHA384</literal>, and <literal>SHA512</literal>.
    > +        The default is <literal>CRC32C</literal>.
    > +       </para>
    
    As I recall, there was an invitation to argue about the defaults at one
    point, and so I'm going to say here that I would advocate for a
    different default than 'crc32c'.  Specifically, I would think sha256 or
    512 would be better.  I don't recall seeing a debate about this that
    conclusively found crc32c to be better, but I'm happy to go back and
    reread anything someone wants to point me at.
    
    > +       <para>
    > +        If <literal>NONE</literal> is selected, the backup manifest will
    > +        not contain any checksums. Otherwise, it will contain a checksum
    > +        of each file in the backup using the specified algorithm. In addition,
    > +        the manifest itself will always contain a <literal>SHA256</literal>
    > +        checksum of its own contents. The <literal>SHA</literal> algorithms
    > +        are significantly more CPU-intensive than <literal>CRC32C</literal>,
    > +        so selecting one of them may increase the time required to complete
    > +        the backup.
    > +       </para>
    
    It also seems a bit silly to me that using the defaults means having to
    deal with two different algorithms- crc32c and sha256.  Considering how
    fast these algorithms are, compared to everything else involved in a
    backup (particularly one that's likely going across a network...), I
    wonder if we should say "may slightly increase" above.
    
    > +       <para>
    > +        On the other hand, <literal>CRC32C</literal> is not a cryptographic
    > +        hash function, so it is only suitable for protecting against
    > +        inadvertent or random modifications to a backup. An adversary
    > +        who can modify the backup could easily do so in such a way that
    > +        the CRC does not change, whereas a SHA collision will be hard
    > +        to manufacture. (However, note that if the attacker also has access
    > +        to modify the backup manifest itself, no checksum algorithm will
    > +        provide any protection.) An additional advantage of the
    > +        <literal>SHA</literal> family of functions is that they output
    > +        a much larger number of bits.
    > +       </para>
    
    I'm not really sure that this paragraph is sensible to include..  We
    certainly don't talk about adversaries and cryptographic hash functions
    when we talk about our page-level checksums, for example.  I'm not
    completely against including it, but I don't want to give the impression
    that this is something we routinely consider or that lack of discussion
    elsewhere implies we have protections against a determined attacker.
    
    > diff --git a/doc/src/sgml/ref/pg_validatebackup.sgml b/doc/src/sgml/ref/pg_validatebackup.sgml
    > new file mode 100644
    > index 0000000000..1c171f6970
    > --- /dev/null
    > +++ b/doc/src/sgml/ref/pg_validatebackup.sgml
    > @@ -0,0 +1,232 @@
    > +<!--
    > +doc/src/sgml/ref/pg_validatebackup.sgml
    > +PostgreSQL documentation
    > +-->
    > +
    > +<refentry id="app-pgvalidatebackup">
    > + <indexterm zone="app-pgvalidatebackup">
    > +  <primary>pg_validatebackup</primary>
    > + </indexterm>
    > +
    > + <refmeta>
    > +  <refentrytitle>pg_validatebackup</refentrytitle>
    > +  <manvolnum>1</manvolnum>
    > +  <refmiscinfo>Application</refmiscinfo>
    > + </refmeta>
    > +
    > + <refnamediv>
    > +  <refname>pg_validatebackup</refname>
    > +  <refpurpose>verify the integrity of a base backup of a
    > +  <productname>PostgreSQL</productname> cluster</refpurpose>
    > + </refnamediv>
    
    "verify the integrity of a backup taken using pg_basebackup"
    
    > + <refsect1>
    > +  <title>
    > +   Description
    > +  </title>
    > +  <para>
    > +   <application>pg_validatebackup</application> is used to check the integrity
    > +   of a database cluster backup.  The backup being checked should have been
    > +   created by <command>pg_basebackup</command> or some other tool that includes
    > +   a <literal>backup_manifest</literal> file with the backup. The backup
    > +   must be stored in the "plain" format; a "tar" format backup can be checked
    > +   after extracting it. Backup manifests are created by the server beginning
    > +   with <productname>PostgreSQL</productname> version 13, so older backups
    > +   cannot be validated using this tool.
    > +  </para>
    
    This seems to invite the idea that pg_validatebackup should be able to
    work with external backup solutions- but I'm a bit concerned by that
    idea because it seems like it would then mean we'd have to be
    particularly careful when changing things in this area, and I'm not
    thrilled by that.  I'd like to make sure that new versions of
    pg_validatebackup work with older backups, and, ideally, older versions
    of pg_validatebackup would work even with newer backups, all of which I
    think the json structure of the manifest helps us with, but that's when
    we're building the manifest and know what it's going to look like.
    
    Maybe to put it another way- would a patch be accepted to make
    pg_validatebackup work with other manifests..?  If not, then I'd keep
    this to the more specific "this tool is used to validate backups taken
    using pg_basebackup".
    
    > +  <para>
    > +   <application>pg_validatebackup</application> reads the manifest file of a
    > +   backup, verifies the manifest against its own internal checksum, and then
    > +   verifies that the same files are present in the target directory as in the
    > +   manifest itself. It then verifies that each file has the expected checksum,
    > +   unless the backup was taken the checksum algorithm set to
    
    "was taken with the checksum algorithm"...
    
    > +   <literal>none</literal>, in which case checksum verification is not
    > +   performed. The presence or absence of directories is not checked, except
    > +   indirectly: if a directory is missing, any files it should have contained
    > +   will necessarily also be missing. Certain files and directories are
    > +   excluded from verification:
    > +  </para>
    > +
    > +  <itemizedlist>
    > +    <listitem>
    > +      <para>
    > +        <literal>backup_manifest</literal> is ignored because the backup
    > +        manifest is logically not part of the backup and does not include
    > +        any entry for itself.
    > +      </para>
    > +    </listitem>
    
    This seems a bit confusing, doesn't it?  The backup_manifest must exist,
    and its checksum is internal, and is checked, isn't it?  Why say that
    it's excluded..?
    
    > +    <listitem>
    > +      <para>
    > +        <literal>pg_wal</literal> is ignored because WAL files are sent
    > +        separately from the backup, and are therefore not described by the
    > +        backup manifest.
    > +      </para>
    > +    </listitem>
    
    I don't agree with the choice to exclude the WAL files, considering
    they're an integral part of a backup, to exclude them means that if
    they've been corrupted at all then the entire backup is invalid.  You
    don't want to be discovering that when you're trying to do a restore of
    a backup that you took with pg_basebackup and which pg_validatebackup
    says is valid.  After all, the tool being used here, pg_basebackup,
    *does* also stream the WAL files- there's no reason why we can't
    calculate a checksum on them and store that checksum somewhere and use
    it to validate the WAL files.  This, in my opinion, is actually a
    show-stopper for this feature.  Claiming it's a valid backup when we
    don't check the absolutely necessary-for-restore WAL is making a false
    claim, no matter how well it's documented.
    
    I do understand that it's possible to run pg_basebackup without the WAL
    files being grabbed as part of that run- in such a case, we should be
    able to detect that was the case for the backup and when running
    pg_validatebackup we should issue a WARNING that the WAL files weren't
    able to be verified (we could have an option to suppress that warning if
    people feel that's needed).
    
    > +    <listitem>
    > +      <para>
    > +        <literal>postgresql.auto.conf</literal>,
    > +        <literal>standby.signal</literal>,
    > +        and <literal>recovery.signal</literal> are ignored because they may
    > +        sometimes be created or modified by the backup client itself.
    > +        (For example, <literal>pg_basebackup -R</literal> will modify
    > +        <literal>postgresql.auto.conf</literal> and create
    > +        <literal>standby.signal</literal>.)
    > +      </para>
    > +    </listitem>
    > +  </itemizedlist>
    > + </refsect1>
    
    Not really thrilled with this (pg_basebackup certainly could figure out
    the checksum for those files...), but I also don't think it's a huge
    issue as they can be recreated by a user (unlike a WAL file..).
    
    I got through most of the pg_basebackup changes, and they looked pretty
    good in general.  Will try to review more tomorrow.
    
    Thanks,
    
    Stephen
    
  124. Re: backup manifests

    Amit Kapila <amit.kapila16@gmail.com> — 2020-03-24T03:43:04Z

    On Mon, Mar 23, 2020 at 9:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > On Mon, Mar 23, 2020 at 7:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > > /src/bin/pg_validatebackup/tmp_check/t_002_algorithm_master_data/backup/none
    > > --manifest-checksum none --no-sync
    > > \tmp_install\bin\pg_basebackup.EXE: illegal option -- manifest-checksum
    > >
    > > It seems the option to be used should be --manifest-checksums.  The
    > > attached patch fixes this problem for me.
    >
    > OK, incorporated that.
    >
    > > > t/002_algorithm.pl ..... 4/19 #   Failed test 'validate backup with
    > > > algorithm "none"'
    > > > #   at t/002_algorithm.pl line 53.
    > > >
    > >
    > > The error message for the above failure is:
    > > pg_validatebackup: fatal: could not parse backup manifest: both
    > > pathname and encoded pathname
    > >
    > > I don't know at this stage what could cause this?  Any pointers?
    >
    > I think I forgot an initializer. Try this version.
    >
    
    All others except one are passing now.  See the summary of the failed
    test below and attached are failed run logs.
    
    Test Summary Report
    -------------------
    t/003_corruption.pl  (Wstat: 65280 Tests: 14 Failed: 0)
      Non-zero exit status: 255
      Parse errors: Bad plan.  You planned 44 tests but ran 14.
    Files=6, Tests=123, 164 wallclock secs ( 0.06 usr +  0.02 sys =  0.08 CPU)
    Result: FAIL
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
  125. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-24T17:00:38Z

    On Mon, Mar 23, 2020 at 11:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > All others except one are passing now.  See the summary of the failed
    > test below and attached are failed run logs.
    >
    > Test Summary Report
    > -------------------
    > t/003_corruption.pl  (Wstat: 65280 Tests: 14 Failed: 0)
    >   Non-zero exit status: 255
    >   Parse errors: Bad plan.  You planned 44 tests but ran 14.
    > Files=6, Tests=123, 164 wallclock secs ( 0.06 usr +  0.02 sys =  0.08 CPU)
    > Result: FAIL
    
    Hmm. It looks like it's trying to remove the symlink that points to
    the tablespace directory, and failing with no error message. I could
    set that permutation to be skipped on Windows, or maybe there's an
    alternate method you can suggest that would work?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  126. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-24T18:04:24Z

    On Mon, Mar 23, 2020 at 6:42 PM Stephen Frost <sfrost@snowman.net> wrote:
    > While I get the desire to have a default here that includes checksums,
    > the way the command is structured, it strikes me as odd that the lack of
    > MANIFEST_CHECKSUMS in the command actually results in checksums being
    > included.
    
    I don't think that's quite accurate, because the default for the
    MANIFEST option is 'no', so the actual default, if you say nothing
    about manifests at all, is that you don't get one. However, it is true that if
    you ask for a manifest and you don't specify the type of checksums,
    you get CRC-32C. We could change it so that if you ask for a manifest
    you must also specify the type of checksum, but I don't see any
    advantage in that approach. Nothing prevents the client from
    specifying the value if it cares, but making the default "I don't
    care, you pick" seems pretty sensible. It could be really helpful if,
    for example, we decide to remove the initial default in a future
    release for some reason. Then the client just keeps working without
    needing to change anything, but anyone who explicitly specified the
    old default gets an error.
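
    The defaulting behaviour described above can be sketched as follows.
    This is only an illustration of the rule as stated in this message
    (MANIFEST defaults to 'no'; a requested manifest without an explicit
    algorithm gets CRC32C); the function name and return shape are
    hypothetical, and the sketch simply ignores MANIFEST_CHECKSUMS when no
    manifest is requested, which is an assumption.

```python
from typing import Optional, Tuple

VALID_ALGORITHMS = {"NONE", "CRC32C", "SHA224", "SHA256", "SHA384", "SHA512"}


def resolve_manifest_options(
    manifest: Optional[str] = None,
    manifest_checksums: Optional[str] = None,
) -> Tuple[bool, bool, Optional[str]]:
    """Return (want_manifest, force_escape, algorithm)."""
    # Saying nothing about manifests means no manifest at all.
    manifest = (manifest or "no").lower()
    if manifest not in {"yes", "no", "force-escape"}:
        raise ValueError(f"unrecognized MANIFEST value: {manifest!r}")
    if manifest == "no":
        return (False, False, None)

    # A manifest was requested; "I don't care, you pick" resolves to CRC32C.
    algorithm = (manifest_checksums or "CRC32C").upper()
    if algorithm not in VALID_ALGORITHMS:
        raise ValueError(f"unrecognized checksum algorithm: {algorithm!r}")
    return (True, manifest == "force-escape", algorithm)
```

    A client that omits the algorithm keeps working if the server-side
    default changes, while a client that names an algorithm explicitly is
    the one that would see an error if that algorithm were removed.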
    
    > > +        Disables generation of a backup manifest. If this option is not
    > > +        specified, the server will and send generate a backup manifest
    > > +        which can be verified using <xref linkend="app-pgvalidatebackup" />.
    > > +       </para>
    > > +      </listitem>
    > > +     </varlistentry>
    >
    > How about "If this option is not specified, the server will generate and
    > send a backup manifest which can be verified using ..."
    
    Good suggestion. :-)
    
    > > +     <varlistentry>
    > > +      <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
    > > +      <listitem>
    > > +       <para>
    > > +        Specifies the algorithm that should be used to checksum each file
    > > +        for purposes of the backup manifest. Currently, the available
    > > +        algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
    > > +        <literal>SHA224</literal>, <literal>SHA256</literal>,
    > > +        <literal>SHA384</literal>, and <literal>SHA512</literal>.
    > > +        The default is <literal>CRC32C</literal>.
    > > +       </para>
    >
    > As I recall, there was an invitation to argue about the defaults at one
    > point, and so I'm going to say here that I would advocate for a
    > different default than 'crc32c'.  Specifically, I would think sha256 or
    > 512 would be better.  I don't recall seeing a debate about this that
    > conclusively found crc32c to be better, but I'm happy to go back and
    > reread anything someone wants to point me at.
    
    It was discussed upthread. Andrew Dunstan argued that there was no
    reason to use a cryptographic checksum here and that we shouldn't do
    so gratuitously. Suraj Kharage found that CRC-32C has very little
    performance impact but that any of the SHA functions slow down backups
    considerably. David Steele pointed out that you'd need a better
    checksum if you wanted to use it for purposes such as delta restore,
    with which I agree, but that's not the design center for this feature.
    I concluded that different people wanted different things, so that we
    ought to make this configurable, but that CRC-32C is a good default.
    It has approximately a 99.9999999767169% chance of detecting a random
    error, which is pretty good, and it doesn't drastically slow down
    backups, which is also good.
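
    The percentage above appears to correspond to 1 - 2^-32.  The worked
    arithmetic below assumes that a randomly corrupted file yields a
    uniformly distributed 32-bit checksum, so that a corruption goes
    unnoticed only when the new checksum happens to equal the old one.  The
    function name is made up for this example.

```python
from fractions import Fraction


def random_error_detection_probability(bits: int) -> Fraction:
    """Probability that a uniformly random checksum differs from the stored one."""
    return 1 - Fraction(1, 2 ** bits)


crc32c_detect = random_error_detection_probability(32)
print(f"{float(crc32c_detect) * 100:.13f}%")
```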
    
    > It also seems a bit silly to me that using the defaults means having to
    > deal with two different algorithms- crc32c and sha256.  Considering how
    > fast these algorithms are, compared to everything else involved in a
    > backup (particularly one that's likely going across a network...), I
    > wonder if we should say "may slightly increase" above.
    
    Actually, Suraj's results upthread show that it's a pretty big hit.
    
    > > +       <para>
    > > +        On the other hand, <literal>CRC32C</literal> is not a cryptographic
    > > +        hash function, so it is only suitable for protecting against
    > > +        inadvertent or random modifications to a backup. An adversary
    > > +        who can modify the backup could easily do so in such a way that
    > > +        the CRC does not change, whereas a SHA collision will be hard
    > > +        to manufacture. (However, note that if the attacker also has access
    > > +        to modify the backup manifest itself, no checksum algorithm will
    > > +        provide any protection.) An additional advantage of the
    > > +        <literal>SHA</literal> family of functions is that they output
    > > +        a much larger number of bits.
    > > +       </para>
    >
    > I'm not really sure that this paragraph is sensible to include..  We
    > certainly don't talk about adversaries and cryptographic hash functions
    > when we talk about our page-level checksums, for example.  I'm not
    > completely against including it, but I don't want to give the impression
    > that this is something we routinely consider or that lack of discussion
    > elsewhere implies we have protections against a determined attacker.
    
    Given the skepticism from some quarters about CRC-32C on this thread,
    I didn't want to oversell it. Also, I do think that these things are
    possibly things that we should consider more widely. I agree with
    Andrew's complaint that it's far too easy to just throw SHA<lots> at
    problems that don't really require it without any actually good
    reason. Spelling out our reasons for choosing certain algorithms for
    certain purposes seems like a good habit to get into, and if we
    haven't done it in other places, maybe we should. On the other hand,
    while I'm inclined to keep this paragraph, I won't lose much sleep if
    we decide to remove it.
    
    > > + <refnamediv>
    > > +  <refname>pg_validatebackup</refname>
    > > +  <refpurpose>verify the integrity of a base backup of a
    > > +  <productname>PostgreSQL</productname> cluster</refpurpose>
    > > + </refnamediv>
    >
    > "verify the integrity of a backup taken using pg_basebackup"
    
    OK.
    
    > This seems to invite the idea that pg_validatebackup should be able to
    > work with external backup solutions- but I'm a bit concerned by that
    > idea because it seems like it would then mean we'd have to be
    > particularly careful when changing things in this area, and I'm not
    > thrilled by that.  I'd like to make sure that new versions of
    > pg_validatebackup work with older backups, and, ideally, older versions
    > of pg_validatebackup would work even with newer backups, all of which I
    > think the json structure of the manifest helps us with, but that's when
    > we're building the manifest and know what it's going to look like.
    
    Both you and David made forceful arguments that this needed to be JSON
    rather than an ad-hoc text format precisely so that other tools could
    parse it more easily, and I just spent *a lot* of time making the JSON
    parsing stuff work precisely so that you could have that. This project
    would've been done a month ago if not for that. I don't care all that
    much whether we remove the mention here, but the idea that using JSON
    was so that pg_validatebackup could manage compatibility issues is
    just not correct. The version number on line 1 of the file was more
    than sufficient for that purpose.
    
    > > +  <para>
    > > +   <application>pg_validatebackup</application> reads the manifest file of a
    > > +   backup, verifies the manifest against its own internal checksum, and then
    > > +   verifies that the same files are present in the target directory as in the
    > > +   manifest itself. It then verifies that each file has the expected checksum,
    > > +   unless the backup was taken the checksum algorithm set to
    >
    > "was taken with the checksum algorithm"...
    
    Oops. Will fix.
    
    > > +  <itemizedlist>
    > > +    <listitem>
    > > +      <para>
    > > +        <literal>backup_manifest</literal> is ignored because the backup
    > > +        manifest is logically not part of the backup and does not include
    > > +        any entry for itself.
    > > +      </para>
    > > +    </listitem>
    >
    > This seems a bit confusing, doesn't it?  The backup_manifest must exist,
    > and its checksum is internal, and is checked, isn't it?  Why say that
    > it's excluded..?
    
    Well, there's no entry in the backup manifest for backup_manifest
    itself. Normally, the presence of a file not mentioned in
    backup_manifest would cause a complaint about an extra file, but
    because backup_manifest is in the ignore list, it doesn't.
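
    A minimal sketch of that comparison, assuming the manifest has already
    been parsed into a set of relative paths.  The names IGNORE_LIST,
    is_ignored and compare_file_sets are hypothetical; the ignore list
    contents are taken from the documentation text quoted in this thread.

```python
from typing import Iterable, List, Tuple

# Entries excluded from verification, per the documentation quoted above.
IGNORE_LIST = {
    "backup_manifest",
    "pg_wal",
    "postgresql.auto.conf",
    "standby.signal",
    "recovery.signal",
}


def is_ignored(relpath: str) -> bool:
    """True if the path is an ignored file or lies under an ignored directory."""
    return relpath.split("/")[0] in IGNORE_LIST


def compare_file_sets(
    manifest_paths: Iterable[str], backup_paths: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, extra) after dropping ignored paths from the backup."""
    listed = set(manifest_paths)
    present = {p for p in backup_paths if not is_ignored(p)}
    missing = sorted(listed - present)  # in the manifest, not on disk
    extra = sorted(present - listed)    # on disk, not in the manifest
    return missing, extra
```

    Because backup_manifest is filtered out before the comparison, it never
    shows up as an extra file even though the manifest has no entry for it.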
    
    > > +    <listitem>
    > > +      <para>
    > > +        <literal>pg_wal</literal> is ignored because WAL files are sent
    > > +        separately from the backup, and are therefore not described by the
    > > +        backup manifest.
    > > +      </para>
    > > +    </listitem>
    >
    > I don't agree with the choice to exclude the WAL files, considering
    > they're an integral part of a backup, to exclude them means that if
    > they've been corrupted at all then the entire backup is invalid.  You
    > don't want to be discovering that when you're trying to do a restore of
    > a backup that you took with pg_basebackup and which pg_validatebackup
    > says is valid.  After all, the tool being used here, pg_basebackup,
    > *does* also stream the WAL files- there's no reason why we can't
    > calculate a checksum on them and store that checksum somewhere and use
    > it to validate the WAL files.  This, in my opinion, is actually a
    > show-stopper for this feature.  Claiming it's a valid backup when we
    > don't check the absolutely necessary-for-restore WAL is making a false
    > claim, no matter how well it's documented.
    
    The default for pg_basebackup is -Xstream, which means that the WAL
    files are being sent over a separate connection that has no connection
    to the original session. The server, when generating the backup
    manifest, has no idea what WAL files are being sent over that separate
    connection, and thus cannot include them in the manifest. This problem
    could be "solved" by having the client generate the manifest rather
    than the server, but I think that cure would be worse than the
    disease. As it stands, the manifest provides some protection against
    transmission errors, which would be lost with that design. As you
    point out, this clearly can't be done with -Xnone. I think it would be
    possible to support this with -Xfetch, but we'd have to have the
    manifest itself specify whether or not it included files in pg_wal,
    which would require complicating the format a bit. I don't think that
    makes sense. I assume -Xstream is the most commonly-used mode, because
    the default used to be -Xfetch and we changed it, which I think we
    would not have done unless people liked -Xstream significantly better.
    Adding complexity to cater to a non-default case which I suspect is
    not widely used doesn't really make sense to me.
    
    In the future, we might want to consider improvements which could make
    validation of pg_wal feasible in common cases. Specifically, suppose
    that pg_basebackup could receive the manifest from the server, keep
    all the entries for the existing files just as they are, but add
    entries for WAL files and anything else it may have added to the
    backup, recompute the manifest checksum, and store the resulting
    revised manifest with the backup. That, I think, would be fairly cool,
    but it's a significant body of additional development work, and this
    is already quite a large patch. The patch itself has grown to about
    3000 lines, and has already 10 preparatory commits doing another ~1500
    lines of refactoring to prepare for it.
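
    The client-side revision idea in the previous paragraph might look
    roughly like the sketch below.  Everything here is hypothetical: the
    manifest is modelled as a plain dict, the key names are invented, and
    the manifest checksum is computed over a sorted-key JSON dump of the
    file list, which is an assumption made for this example and not the
    patch's on-disk format.

```python
import hashlib
import json
from typing import Dict, List


def file_entry(path: str, data: bytes) -> Dict[str, object]:
    """Build a manifest entry for one file (hypothetical layout)."""
    return {
        "path": path,
        "size": len(data),
        "algorithm": "sha256",
        "checksum": hashlib.sha256(data).hexdigest(),
    }


def manifest_checksum(files: List[Dict[str, object]]) -> str:
    """SHA-256 over a canonical serialization of the file list (assumption)."""
    payload = json.dumps(files, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def revise_manifest(server_manifest: Dict[str, object],
                    added_files: Dict[str, bytes]) -> Dict[str, object]:
    """Keep the server's entries as they are, append entries for files the
    client added (for example WAL), and recompute the manifest checksum."""
    files = list(server_manifest["files"])
    files += [file_entry(p, d) for p, d in sorted(added_files.items())]
    return {"files": files, "manifest_checksum": manifest_checksum(files)}
```

    The server-computed entries are carried over untouched, so they still
    protect against transmission errors; only the added entries and the
    top-level checksum come from the client.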
    
    > Not really thrilled with this (pg_basebackup certainly could figure out
    > the checksum for those files...), but I also don't think it's a huge
    > issue as they can be recreated by a user (unlike a WAL file..).
    
    Yeah, same issues, though. Here again, there are several possible
    fixes: (1) make the server modify those files rather than letting
    pg_basebackup do it; (2) make the client compute the manifest rather
    than the server; (3) have the client revise the manifest.  (3) makes
    most sense to me, but I think that it would be better to return to
    that topic at a later date. This is certainly not a perfect feature as
    things stand but I believe it is good enough to provide significant
    benefits.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  127. Re: backup manifests

    Amit Kapila <amit.kapila16@gmail.com> — 2020-03-25T06:53:23Z

    On Tue, Mar 24, 2020 at 10:30 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > On Mon, Mar 23, 2020 at 11:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > > All others except one are passing now.  See the summary of the failed
    > > test below and attached are failed run logs.
    > >
    > > Test Summary Report
    > > -------------------
    > > t/003_corruption.pl  (Wstat: 65280 Tests: 14 Failed: 0)
    > >   Non-zero exit status: 255
    > >   Parse errors: Bad plan.  You planned 44 tests but ran 14.
    > > Files=6, Tests=123, 164 wallclock secs ( 0.06 usr +  0.02 sys =  0.08 CPU)
    > > Result: FAIL
    >
    > Hmm. It looks like it's trying to remove the symlink that points to
    > the tablespace directory, and failing with no error message. I could
    > set that permutation to be skipped on Windows, or maybe there's an
    > alternate method you can suggest that would work?
    >
    
    We can use rmdir() for Windows.  The attached patch fixes the failure
    for me. I have tried the test on CentOS as well after the fix and it
    passes there as well.
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
  128. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-25T13:31:06Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Mon, Mar 23, 2020 at 6:42 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > While I get the desire to have a default here that includes checksums,
    > > the way the command is structured, it strikes me as odd that the lack of
    > > MANIFEST_CHECKSUMS in the command actually results in checksums being
    > > included.
    > 
    > I don't think that's quite accurate, because the default for the
    > MANIFEST option is 'no', so the actual default, if you say nothing
    > about manifests at all, is that you don't get one. However, it is true that if
    > you ask for a manifest and you don't specify the type of checksums,
    > you get CRC-32C. We could change it so that if you ask for a manifest
    > you must also specify the type of checksum, but I don't see any
    > advantage in that approach. Nothing prevents the client from
    > specifying the value if it cares, but making the default "I don't
    > care, you pick" seems pretty sensible. It could be really helpful if,
    > for example, we decide to remove the initial default in a future
    > release for some reason. Then the client just keeps working without
    > needing to change anything, but anyone who explicitly specified the
    > old default gets an error.
    
    I get that the default for manifest is 'no', but I don't really see how
    that means that the lack of saying anything about checksums should mean
    "give me crc32c checksums".  It's really rather common that if we don't
    specify something, it means don't do that thing- like an 'ORDER BY'
    clause.  We aren't designing SQL here, so I'm not going to get terribly
    upset if you push forward with "if you don't want checksums, you have to
    explicitly say MANIFEST_CHECKSUMS no", but I don't agree with the
    reasoning here.
    
    > > > +     <varlistentry>
    > > > +      <term><option>--manifest-checksums=<replaceable class="parameter">algorithm</replaceable></option></term>
    > > > +      <listitem>
    > > > +       <para>
    > > > +        Specifies the algorithm that should be used to checksum each file
    > > > +        for purposes of the backup manifest. Currently, the available
    > > > +        algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
    > > > +        <literal>SHA224</literal>, <literal>SHA256</literal>,
    > > > +        <literal>SHA384</literal>, and <literal>SHA512</literal>.
    > > > +        The default is <literal>CRC32C</literal>.
    > > > +       </para>
    > >
    > > As I recall, there was an invitation to argue about the defaults at one
    > > point, and so I'm going to say here that I would advocate for a
    > > different default than 'crc32c'.  Specifically, I would think sha256 or
    > > 512 would be better.  I don't recall seeing a debate about this that
    > > conclusively found crc32c to be better, but I'm happy to go back and
    > > reread anything someone wants to point me at.
    > 
    > It was discussed upthread. Andrew Dunstan argued that there was no
    > reason to use a cryptographic checksum here and that we shouldn't do
    > so gratuitously. Suraj Kharage found that CRC-32C has very little
    > performance impact but that any of the SHA functions slow down backups
    > considerably. David Steele pointed out that you'd need a better
    > checksum if you wanted to use it for purposes such as delta restore,
    > with which I agree, but that's not the design center for this feature.
    > I concluded that different people wanted different things, so that we
    > ought to make this configurable, but that CRC-32C is a good default.
    > It has approximately a 99.9999999767169% chance of detecting a random
    > error, which is pretty good, and it doesn't drastically slow down
    > backups, which is also good.
    
    There were also comments made up-thread about how it might not be great
    for larger (eg: 1GB files, like we tend to have quite a few of...), and
    something about it being a 40 year old algorithm..  Having re-read some
    of the discussion, I'm actually more inclined to say we should be using
    sha256 instead of crc32c.
    
    > > It also seems a bit silly to me that using the defaults means having to
    > > deal with two different algorithms- crc32c and sha256.  Considering how
    > > fast these algorithms are, compared to everything else involved in a
    > > backup (particularly one that's likely going across a network...), I
    > > wonder if we should say "may slightly increase" above.
    > 
    > Actually, Suraj's results upthread show that it's a pretty big hit.
    
    So, I went back and re-read part of the thread and looked at the
    (seemingly, only one..?) post regarding timing and didn't understand
    what, exactly, was being timed there, because I didn't see the actual
    commands/script/whatever that was used to get those results included.
    
    I'm sure that sha256 takes a lot more time than crc32c, I'm certainly
    not trying to dispute that, but what's relevant here is how much it
    impacts the time required to run the overall backup (including sync'ing
    it to disk, and possibly network transmission time..  if we're just
    comparing the time to run it through memory then, sure, the sha256
    computation time might end up being quite a bit of the time, but that's
    not really that interesting of a test..).
    
    > > > +       <para>
    > > > +        On the other hand, <literal>CRC32C</literal> is not a cryptographic
    > > > +        hash function, so it is only suitable for protecting against
    > > > +        inadvertent or random modifications to a backup. An adversary
    > > > +        who can modify the backup could easily do so in such a way that
    > > > +        the CRC does not change, whereas a SHA collision will be hard
    > > > +        to manufacture. (However, note that if the attacker also has access
    > > > +        to modify the backup manifest itself, no checksum algorithm will
    > > > +        provide any protection.) An additional advantage of the
    > > > +        <literal>SHA</literal> family of functions is that they output
    > > > +        a much larger number of bits.
    > > > +       </para>
    > >
    > > I'm not really sure that this paragraph is sensible to include..  We
    > > certainly don't talk about adversaries and cryptographic hash functions
    > > when we talk about our page-level checksums, for example.  I'm not
    > > completely against including it, but I don't want to give the impression
    > > that this is something we routinely consider or that lack of discussion
    > > elsewhere implies we have protections against a determined attacker.
    > 
    > Given the skepticism from some quarters about CRC-32C on this thread,
    > I didn't want to oversell it. Also, I do think that these things are
    > possibly things that we should consider more widely. I agree with
    > Andrew's complaint that it's far too easy to just throw SHA<lots> at
    > problems that don't really require it without any actually good
    > reason. Spelling out our reasons for choosing certain algorithms for
    > certain purposes seems like a good habit to get into, and if we
    > haven't done it in other places, maybe we should. On the other hand,
    > while I'm inclined to keep this paragraph, I won't lose much sleep if
    > we decide to remove it.
    
    I don't mind spelling out reasoning for certain algorithms over others,
    in general, this just seems a bit much.  I'm not sure we need to be
    going into what being a cryptographic hash function means every time we
    talk about any hash or checksum.  Those who actually care about
    cryptographic hash function usage really don't need someone to explain
    to them that crc32c isn't cryptographically secure.  The last sentence
    also seems kind of odd (why is a much larger number of bits, alone, an
    advantage..?).
    
    I tried to figure out a way to rewrite this and I feel like I keep
    ending up coming back to something like "CRC32C is a CRC, not a hash"
    and that kind of truism just doesn't feel terribly useful to include in
    our documentation.
    
    Maybe:
    
    "Using a SHA hash function provides a cryptographically secure digest
    of each file for users who wish to verify that the backup has not been
    tampered with, while the CRC32C algorithm provides a checksum which is
    much faster to calculate and good at catching errors due to accidental
    changes but is not resistant to targeted modifications.  Note that, to
    be useful against an adversary who has access to the backup, the backup
    manifest would need to be stored securely elsewhere or otherwise
    verified to have not been modified since the backup was taken."
    
    This at least talks about things in a positive direction (SHA hash
    functions do this, CRC32C does that) rather than in a negative tone.
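As a rough illustration of the distinction the proposed wording draws, the sketch below computes both kinds of value for the same bytes and shows that an accidental single-bit flip changes each of them. The `crc32c` helper is a slow bitwise implementation written for this example only (the Python standard library ships CRC-32 but not CRC-32C), and the payload is made up; none of this is the code used by the patch.

```python
import hashlib

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data: bytes) -> int:
    """Bitwise (slow, dependency-free) CRC-32C; illustrative helper only."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Made-up payload standing in for a file in the backup.
original = b"relation file contents" * 1000
damaged = bytearray(original)
damaged[1234] ^= 0x01  # flip a single bit, as an accidental error might
damaged = bytes(damaged)

for label, payload in (("original", original), ("damaged", damaged)):
    digest = hashlib.sha256(payload).hexdigest()
    print(f"{label:9s} crc32c={crc32c(payload):08x} sha256={digest[:16]}...")
```

Both values catch the accidental change. The difference discussed above only shows up against deliberate modification, which this sketch does not attempt to demonstrate.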
    
    > > This seems to invite the idea that pg_validatebackup should be able to
    > > work with external backup solutions- but I'm a bit concerned by that
    > > idea because it seems like it would then mean we'd have to be
    > > particularly careful when changing things in this area, and I'm not
    > > thrilled by that.  I'd like to make sure that new versions of
    > > pg_validatebackup work with older backups, and, ideally, older versions
    > > of pg_validatebackup would work even with newer backups, all of which I
    > > think the json structure of the manifest helps us with, but that's when
    > > we're building the manifest and know what it's going to look like.
    > 
    > Both you and David made forceful arguments that this needed to be JSON
    > rather than an ad-hoc text format precisely so that other tools could
    > parse it more easily, and I just spent *a lot* of time making the JSON
    > parsing stuff work precisely so that you could have that. This project
    > would've been done a month ago if not for that. I don't care all that
    > much whether we remove the mention here, but the idea that using JSON
    > was so that pg_validatebackup could manage compatibility issues is
    > just not correct. The version number on line 1 of the file was more
    > than sufficient for that purpose.
    
    I stand by the decision that the manifest should be in JSON, but that's
    what is produced by the backend server as part of a base backup, which
    is quite likely going to be used by some external tools, and isn't at
    all the same as the external pg_validatebackup command that the
    discussion here is about.  I also did make the argument up-thread,
    though I'll admit that it seemed to be mostly ignored, but I make it
    still, that a simple version number sucks and using JSON does avoid some
    of the downsides from it.  Particularly, I'd love to see a v13
    pg_validatebackup able to work with a v14 pg_basebackup, even if that
    v14 pg_basebackup added some extra stuff to the manifest.  That's
    possible to do with a generic structure like JSON and not something that
    a simple version number would allow.  Yes, I admit that we might change
    the structure or the contents in a way where that wouldn't be possible
    and I'm not going to raise a fuss if we do so, but this approach gives
    us more options.
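A minimal sketch of the forward-compatibility argument, assuming a reader that simply skips keys it does not recognize. The key names (`manifest_version`, `files`, `path`, and so on) are hypothetical and chosen for the example; they are not taken from the patch or from any released manifest format.

```python
import json

# Hypothetical set of per-file keys that an "older" reader knows about.
KNOWN_FILE_KEYS = {"path", "size", "checksum_algorithm", "checksum"}


def read_file_entries(manifest_text: str) -> list:
    """Return per-file entries, keeping only the keys this reader understands."""
    manifest = json.loads(manifest_text)
    entries = []
    for raw in manifest.get("files", []):
        # Unknown keys are skipped rather than treated as errors, so a
        # manifest with additional fields can still be processed.
        entries.append({k: v for k, v in raw.items() if k in KNOWN_FILE_KEYS})
    return entries


# A manifest as a hypothetical newer writer might produce it, with extra
# information at both the top level and the per-file level.
newer_manifest = json.dumps({
    "manifest_version": 2,
    "files": [{
        "path": "base/1/1259",
        "size": 8192,
        "checksum_algorithm": "CRC32C",
        "checksum": "deadbeef",
        "added_in_later_release": True,
    }],
    "extra_section": {"anything": "else"},
})

entries = read_file_entries(newer_manifest)
print(entries)
```

With a bare version number on line 1, the older reader would have no choice but to reject the file once the number changed; with a self-describing structure it can at least use the parts it recognizes.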
    
    Anyway, my point here was really just that *pg_validatebackup* is about
    validating backups taken with pg_basebackup.  While it's possible that
    it could be used for backups taken with other tools, I don't think
    that's really part of its actual mandate or that we're going to actively
    work to add such support in the future.
    
    > > > +  <itemizedlist>
    > > > +    <listitem>
    > > > +      <para>
    > > > +        <literal>backup_manifest</literal> is ignored because the backup
    > > > +        manifest is logically not part of the backup and does not include
    > > > +        any entry for itself.
    > > > +      </para>
    > > > +    </listitem>
    > >
    > > This seems a bit confusing, doesn't it?  The backup_manifest must exist,
    > > and its checksum is internal, and is checked, isn't it?  Why say that
    > > it's excluded..?
    > 
    > Well, there's no entry in the backup manifest for backup_manifest
    > itself. Normally, the presence of a file not mentioned in
    > backup_manifest would cause a complaint about an extra file, but
    > because backup_manifest is in the ignore list, it doesn't.
    
    Yes, I get why it's excluded from the manifest and why we have code to
    avoid complaining about it being an extra file, but this is
    documentation and, in this part of the docs, we seem to be saying that
    we're not checking/validating the manifest, and that's certainly not
    actually true.
    
    In particular, the sentence right above this list is:
    
    "Certain files and directories are excluded from verification:"
    
    but we actually do verify the manifest, that's all I'm saying here.
    
    Maybe rewording that a bit is what would help, say:
    
    "Certain files and directories are not included in the manifest:"
    
    then have the entry for backup_manifest be something like:
    "backup_manifest is not included as it is the manifest itself and is not
    logically part of the backup; backup_manifest is checked using its own
    internal validation digest" or something along those lines.
    
    > > > +    <listitem>
    > > > +      <para>
    > > > +        <literal>pg_wal</literal> is ignored because WAL files are sent
    > > > +        separately from the backup, and are therefore not described by the
    > > > +        backup manifest.
    > > > +      </para>
    > > > +    </listitem>
    > >
    > > I don't agree with the choice to exclude the WAL files, considering
    > > they're an integral part of a backup, to exclude them means that if
    > > they've been corrupted at all then the entire backup is invalid.  You
    > > don't want to be discovering that when you're trying to do a restore of
    > > a backup that you took with pg_basebackup and which pg_validatebackup
    > > says is valid.  After all, the tool being used here, pg_basebackup,
    > > *does* also stream the WAL files- there's no reason why we can't
    > > calculate a checksum on them and store that checksum somewhere and use
    > > it to validate the WAL files.  This, in my opinion, is actually a
    > > show-stopper for this feature.  Claiming it's a valid backup when we
    > > don't check the absolutely necessary-for-restore WAL is making a false
    > > claim, no matter how well it's documented.
    > 
    > The default for pg_basebackup is -Xstream, which means that the WAL
    > files are being sent over a separate connection that has no connection
    > to the original session. The server, when generating the backup
    > manifest, has no idea what WAL files are being sent over that separate
    > connection, and thus cannot include them in the manifest. This problem
    > could be "solved" by having the client generate the manifest rather
    > than the server, but I think that cure would be worse than the
    > disease. As it stands, the manifest provides some protection against
    > transmission errors, which would be lost with that design. As you
    > point out, this clearly can't be done with -Xnone. I think it would be
    > possible to support this with -Xfetch, but we'd have to have the
    > manifest itself specify whether or not it included files in pg_wal,
    > which would require complicating the format a bit. I don't think that
    > makes sense. I assume -Xstream is the most commonly-used mode, because
    > the default used to be -Xfetch and we changed it, which I think we
    > would not have done unless people liked -Xstream significantly better.
    > Adding complexity to cater to a non-default case which I suspect is
    > not widely used doesn't really make sense to me.
    
    Yeah, I get that it's not easy to figure out how to validate the WAL,
    but I stand by my opinion that it's simply not acceptable to exclude the
    necessary WAL from verification and to claim that a backup is valid
    when we haven't checked the WAL.
    
    I agree that -Xfetch isn't commonly used and only supporting validation
    of WAL when that's used isn't a good answer.
    
    > In the future, we might want to consider improvements which could make
    > validation of pg_wal feasible in common cases. Specifically, suppose
    > that pg_basebackup could receive the manifest from the server, keep
    > all the entries for the existing files just as they are, but add
    > entries for WAL files and anything else it may have added to the
    > backup, recompute the manifest checksum, and store the resulting
    > revised manifest with the backup. That, I think, would be fairly cool,
    > but it's a significant body of additional development work, and this
    > is already quite a large patch. The patch itself has grown to about
    > 3000 lines, and has already 10 preparatory commits doing another ~1500
    > lines of refactoring to prepare for it.
    
    Having the client calculate the checksums for the WAL and add them to
    the manifest is one approach and could work, but there's others-
    
    - Have the WAL checksums be calculated during the base backup and kept
      somewhere, and then included in the manifest sent by the server- the
      backup_manifest is the last thing we send anyway, isn't it?  And
      surely at the end of the backup we actually do know all of the WAL
      that's needed for the backup to be valid, because we pass that
      information to pg_basebackup to construct the necessary backup_label
      file.
    
    - Validate the WAL using its own internal checksums instead of having
      the manifest involved at all.  That's not ideal since we wouldn't have
      cryptographically secure digests for the WAL, but at least we will
      have validated it and raised the chances that the backup will be able
      to actually be restored using PG a whole bunch.
    
    - With the 'checksum none' option, we aren't really validating contents
      of anything, so in that case it'd actually be alright to simply scan
      the WAL and make sure that we've at least got all of the WAL files
      needed to go from the start of the backup to the end.  I don't think
      just checking that the WAL files exist is a proper solution when it
      comes to a backup where the user has asked for checksums to be
      included though.  I will say that I'm really very surprised that
      pg_validatebackup wasn't already checking that we at least had the WAL
      that is needed, but I don't see any code for that.
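The weakest of the checks listed above, confirming that every WAL segment between the start and end of the backup is at least present, can be sketched as follows. This assumes the default 16MB segment size and the usual 24-hex-digit segment file naming; the function names and the way the start and end LSNs are passed in are hypothetical, and this is not code from pg_validatebackup.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default; configurable at initdb time
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def wal_file_name(timeline: int, segno: int) -> str:
    """Build the 24-hex-digit WAL segment file name for a segment number."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def missing_wal_segments(timeline: int, start_lsn: int, end_lsn: int, present: set) -> list:
    """List the segment files needed for [start_lsn, end_lsn) that are absent.

    Assumption: an end LSN that falls exactly on a segment boundary does not
    require the following segment, hence the (end_lsn - 1).
    """
    first = start_lsn // WAL_SEGMENT_SIZE
    last = (end_lsn - 1) // WAL_SEGMENT_SIZE
    needed = [wal_file_name(timeline, s) for s in range(first, last + 1)]
    return [name for name in needed if name not in present]


# Hypothetical backup spanning segments 2 through 4 on timeline 1,
# with the middle segment absent from pg_wal.
present_files = {wal_file_name(1, 2), wal_file_name(1, 4)}
missing = missing_wal_segments(1, 0x2000000, 0x4000001, present_files)
print(missing)
```

A presence check of this kind says nothing about the contents of the segments, which is why it is suggested above only for the case where the user asked for no checksums.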
    
    > > Not really thrilled with this (pg_basebackup certainly could figure out
    > > the checksum for those files...), but I also don't think it's a huge
    > > issue as they can be recreated by a user (unlike a WAL file..).
    > 
    > Yeah, same issues, though. Here again, there are several possible
    > fixes: (1) make the server modify those files rather than letting
    > pg_basebackup do it; (2) make the client compute the manifest rather
    > than the server; (3) have the client revise the manifest.  (3) makes
    > most sense to me, but I think that it would be better to return to
    > that topic at a later date. This is certainly not a perfect feature as
    > things stand but I believe it is good enough to provide significant
    > benefits.
    
    As I said, I don't consider these files to be as much of an issue and
    therefore excluding them and documenting that we do would be alright.  I
    don't feel that's an acceptable option for the WAL though.
    
    Thanks,
    
    Stephen
    
  129. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-25T16:50:14Z

    On Wed, Mar 25, 2020 at 9:31 AM Stephen Frost <sfrost@snowman.net> wrote:
    > I get that the default for manifest is 'no', but I don't really see how
    > that means that the lack of saying anything about checksums should mean
    > "give me crc32c checksums".  It's really rather common that if we don't
    > specify something, it means don't do that thing- like an 'ORDER BY'
    > clause.
    
    That's a fair argument, but I think the other relevant principle is
    that we try to give people useful defaults for things. I think that
    checksums are a sufficiently useful thing that having the default be
    not to do it doesn't make sense. I had the impression that you and
    David were in agreement on that point, actually.
    
    > There were also comments made up-thread about how it might not be great
    > for larger (eg: 1GB files, like we tend to have quite a few of...), and
    > something about it being a 40 year old algorithm..
    
    Well, the 512MB "limit" for CRC-32C means only that for certain very
    specific types of errors, detection is not guaranteed above that file
    size. So if you have a single flipped bit, for example, and the file
    size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
    chance of detecting the error, whereas if the file size is less than
    512MB, it is 100% certain, because of the design of the algorithm. But
    nine nines is plenty, and neither SHA nor our page-level checksums
    provide guaranteed error detection properties anyway.
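The 99.9999999767169% figure follows from treating a 32-bit checksum as matching a randomly damaged file by chance once in 2^32 cases. A short check of that arithmetic, under that uniform-randomness assumption:

```python
from fractions import Fraction


def random_error_detection_probability(checksum_bits: int) -> Fraction:
    """Probability that a random corruption changes an n-bit checksum,
    assuming the corrupted file's checksum is uniformly distributed."""
    return 1 - Fraction(1, 2 ** checksum_bits)


p = random_error_detection_probability(32)
print(f"{float(p) * 100:.13f}%")
```

The same function with a larger `checksum_bits` shows how quickly the remaining miss probability shrinks as the checksum gets wider.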
    
    I'm not sure why the fact that it's a 40-year-old algorithm is
    relevant. There are many 40-year-old algorithms that are very good.
    Generally, if we discover that we're using bad 40-year-old algorithms,
    like Knuth's tape sorting stuff, we eventually figure out how to
    replace them with something else that's better. But there's no reason
    to retire an algorithm simply because it's old. I have not heard
    anyone say, for example, that we should stop using CRC-32C for XLOG
    checksums. We continue to use it for that purpose because it (1) is
    highly likely to detect any errors and (2) is very fast. Those are the
    same reasons why I think it's a good fit for this case.
    
    My guess is that if this patch is adopted as currently proposed, we
    will eventually need to replace the cryptographic hash functions due
    to the march of time. As I'm sure you realize, the problem with hash
    functions that are designed to foil an adversary is that adversaries
    keep getting smarter. So, eventually someone will probably figure out
    how to do something nefarious with SHA-512. Some other technique that
    nobody's cracked yet will need to be adopted, and then people will
    begin trying to crack that, and the whole thing will repeat. But I
    suspect that we can keep using the same non-cryptographic hash
    function essentially forever. It does not matter that people know how
    the algorithm works because it makes no pretensions of trying to foil
    an opponent. It is just trying to mix up the bits in such a way that a
    change to the file is likely to cause a change in the checksum. The
    bit-mixing properties of the algorithm do not degrade with the passage
    of time.
    
    > I'm sure that sha256 takes a lot more time than crc32c, I'm certainly
    > not trying to dispute that, but what's relevant here is how much it
    > impacts the time required to run the overall backup (including sync'ing
    > it to disk, and possibly network transmission time..  if we're just
    > comparing the time to run it through memory then, sure, the sha256
    > computation time might end up being quite a bit of the time, but that's
    > not really that interesting of a test..).
    
    I think that http://postgr.es/m/38e29a1c-0d20-fc73-badd-ca05f7f07ffa@pgmasters.net
    is one of the more interesting emails on this topic.  My conclusion
    from that email, and the ones that led up to it, was that there is a
    40-50% overhead from doing a SHA checksum, but in pgbackrest, users
    don't see it because backups are compressed. Because the compression
    uses so much CPU time, the additional overhead from the SHA checksum
    is only a few percent more. But I don't think that it would be smart
    to slow down uncompressed backups by 40-50%. That's going to cause a
    problem for somebody, almost for sure.
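The reasoning above can be made concrete with a small calculation. The cost units are invented for illustration: the checksum cost of 0.45 stands in for the "40-50%" figure quoted above, and the compression cost of 10 units is a purely hypothetical value chosen to show the effect, not a measurement from the thread.

```python
def relative_overhead(extra_cost: float, baseline_cost: float) -> float:
    """Extra cost expressed as a fraction of the baseline it is added to."""
    return extra_cost / baseline_cost


# Illustrative unit costs, NOT measurements.
COPY_COST = 1.0          # reading and sending the data, uncompressed
SHA_COST = 0.45          # stands in for the quoted 40-50% overhead
COMPRESSION_COST = 10.0  # hypothetical CPU cost of compressing the backup

uncompressed = relative_overhead(SHA_COST, COPY_COST)
compressed = relative_overhead(SHA_COST, COPY_COST + COMPRESSION_COST)

print(f"uncompressed backup: +{uncompressed:.0%}")
print(f"compressed backup:   +{compressed:.1%}")
```

The absolute checksum cost is the same in both cases; only the baseline it is measured against changes, which is the point being made about compressed backups hiding the overhead.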
    
    > Maybe:
    >
    > "Using a SHA hash function provides a cryptographically secure digest
    > of each file for users who wish to verify that the backup has not been
    > tampered with, while the CRC32C algorithm provides a checksum which is
    > much faster to calculate and good at catching errors due to accidental
    > changes but is not resistant to targeted modifications.  Note that, to
    > be useful against an adversary who has access to the backup, the backup
    > manifest would need to be stored securely elsewhere or otherwise
    > verified to have not been modified since the backup was taken."
    >
    > This at least talks about things in a positive direction (SHA hash
    > functions do this, CRC32C does that) rather than in a negative tone.
    
    Cool. I like it.
    
    > Anyway, my point here was really just that *pg_validatebackup* is about
    > validating backups taken with pg_basebackup.  While it's possible that
    > it could be used for backups taken with other tools, I don't think
    > that's really part of its actual mandate or that we're going to actively
    > work to add such support in the future.
    
    I think you're kind of just nitpicking here, because the statement that
    pg_validatebackup can validate not only a backup taken by
    pg_basebackup but also a backup taken using some compatible method
    is just a tautology. But I'll remove the reference.
    
    > In particular, the sentence right above this list is:
    >
    > "Certain files and directories are excluded from verification:"
    >
    > but we actually do verify the manifest, that's all I'm saying here.
    >
    > Maybe rewording that a bit is what would help, say:
    >
    > "Certain files and directories are not included in the manifest:"
    
    Well, that'd be wrong, though. It's true that backup_manifest won't
    have an entry in the manifest, and neither will WAL files, but
    postgresql.auto.conf will. We'll just skip complaining about it if the
    checksum doesn't match or whatever. The server generates manifest
    entries for everything, and the client decides not to pay attention to
    some of them because it knows that pg_basebackup may have made certain
    changes that were not known to the server.
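A minimal sketch of the behaviour described above, assuming the manifest and the backup directory have each been reduced to a mapping from relative path to checksum string. The function names, the mapping representation, and the exact contents of the ignore list are assumptions made for the example; this is not the pg_validatebackup source, and it does not model the manifest's own internal checksum.

```python
# Paths the client declines to complain about, per the discussion above.
# The exact list here is illustrative.
IGNORED = ("backup_manifest", "pg_wal", "postgresql.auto.conf")


def is_ignored(path: str) -> bool:
    """True for an ignored file, or anything underneath an ignored directory."""
    return any(path == name or path.startswith(name + "/") for name in IGNORED)


def verify(manifest: dict, on_disk: dict) -> list:
    """Compare two {relative path: checksum} mappings and list the problems."""
    problems = []
    for path in sorted(on_disk):
        if is_ignored(path):
            continue  # no complaint, whether or not the manifest has an entry
        if path not in manifest:
            problems.append(f"extra file: {path}")
        elif manifest[path] != on_disk[path]:
            problems.append(f"checksum mismatch: {path}")
    for path in sorted(manifest):
        if not is_ignored(path) and path not in on_disk:
            problems.append(f"missing file: {path}")
    return problems


# Made-up example: the server listed postgresql.auto.conf, the client
# later changed it, and the backup also contains WAL and the manifest.
manifest = {"PG_VERSION": "aa", "base/1/1259": "bb", "postgresql.auto.conf": "cc"}
on_disk = {
    "PG_VERSION": "aa",
    "base/1/1259": "xx",
    "postgresql.auto.conf": "changed",
    "backup_manifest": "zz",
    "pg_wal/000000010000000000000002": "ww",
    "stray.txt": "qq",
}
problems = verify(manifest, on_disk)
print(problems)
```

Note how postgresql.auto.conf has a manifest entry yet produces no complaint, while backup_manifest and the WAL file have no entry and are not reported as extra files.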
    
    > Yeah, I get that it's not easy to figure out how to validate the WAL,
    > but I stand by my opinion that it's simply not acceptable to exclude the
    > necessary WAL from verification and to claim that a backup is valid
    > when we haven't checked the WAL.
    
    I hear that, but I don't agree that having nothing is better than
    having this much committed. I would be fine with renaming the tool
    (pg_validatebackupmanifest? pg_validatemanifest?), or with updating
    the documentation to be more clear about what is and is not checked,
    but I'm not going to extend the tool to do totally new things for
    which we don't even have an agreed design yet. I believe in trying to
    create patches that do one thing and do it well, and this patch does
    that. The fact that it doesn't do some other thing that is
    conceptually related yet different is a good thing, not a bad one.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  130. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-25T20:54:33Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Wed, Mar 25, 2020 at 9:31 AM Stephen Frost <sfrost@snowman.net> wrote:
    > > I get that the default for manifest is 'no', but I don't really see how
    > > that means that the lack of saying anything about checksums should mean
    > > "give me crc32c checksums".  It's really rather common that if we don't
    > > specify something, it means don't do that thing- like an 'ORDER BY'
    > > clause.
    > 
    > That's a fair argument, but I think the other relevant principle is
    > that we try to give people useful defaults for things. I think that
    > checksums are a sufficiently useful thing that having the default be
    > not to do it doesn't make sense. I had the impression that you and
    > David were in agreement on that point, actually.
    
    I agree with wanting to have useful defaults and that checksums should
    be included by default, and I'm alright even with letting people pick
    what algorithms they'd like to have too.  The construct here is made odd
    because we've got this idea that "no checksum" is an option, which is
    actually something that I don't particularly like, but that's what's
    making this particular syntax weird.  I don't suppose you'd be open to
    the idea of just dropping that though..?  There wouldn't be any issue
    with this syntax if we just always had checksums included when a
    manifest is requested. :)
    
    Somehow, I don't think I'm going to win that argument.
    
    > > There were also comments made up-thread about how it might not be great
    > > for larger (eg: 1GB files, like we tend to have quite a few of...), and
    > > something about it being a 40 year old algorithm..
    > 
    > Well, the 512MB "limit" for CRC-32C means only that for certain very
    > specific types of errors, detection is not guaranteed above that file
    > size. So if you have a single flipped bit, for example, and the file
    > size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
    > chance of detecting the error, whereas if the file size is less than
    > 512MB, it is 100% certain, because of the design of the algorithm. But
    > nine nines is plenty, and neither SHA nor our page-level checksums
    > provide guaranteed error detection properties anyway.
    
    Right, so we know that CRC-32C has an upper-bound of 512MB to be useful
    for exactly what it's designed to be useful for, but we also know that
    we're going to have larger files- at least 1GB ones, and quite possibly
    larger, so why are we choosing this?
    
    At the least, wouldn't it make sense to consider a larger CRC, one whose
    limit is above the size of commonly expected files, if we're going to
    use a CRC?
    
    > I'm not sure why the fact that it's a 40-year-old algorithm is
    > relevant. There are many 40-year-old algorithms that are very good.
    
    Sure there are, but there probably wasn't a lot of thought about
    GB-sized files, and this doesn't really seem to be the direction people
    are going in for larger objects.  s3, as an example, uses sha256.
    Google, it seems, suggests folks use "HighwayHash" (from their crc32c
    github repo- https://github.com/google/crc32c).  Most CRC uses seem to
    be for much smaller data sets.
    
    > My guess is that if this patch is adopted as currently proposed, we
    > will eventually need to replace the cryptographic hash functions due
    > to the march of time. As I'm sure you realize, the problem with hash
    > functions that are designed to foil an adversary is that adversaries
    > keep getting smarter. So, eventually someone will probably figure out
    > how to do something nefarious with SHA-512. Some other technique that
    > nobody's cracked yet will need to be adopted, and then people will
    > begin trying to crack that, and the whole thing will repeat. But I
    > suspect that we can keep using the same non-cryptographic hash
    > function essentially forever. It does not matter that people know how
    > the algorithm works because it makes no pretensions of trying to foil
    > an opponent. It is just trying to mix up the bits in such a way that a
    > change to the file is likely to cause a change in the checksum. The
    > bit-mixing properties of the algorithm do not degrade with the passage
    > of time.
    
    Sure, there's a good chance we'll need newer algorithms in the future, I
    don't doubt that.  On the other hand, if crc32c, or CRC whatever, was
    the perfect answer and no one will ever need something better, then
    what's with folks like Google suggesting something else..?
    
    > > I'm sure that sha256 takes a lot more time than crc32c, I'm certainly
    > > not trying to dispute that, but what's relevant here is how much it
    > > impacts the time required to run the overall backup (including sync'ing
    > > it to disk, and possibly network transmission time..  if we're just
    > > comparing the time to run it through memory then, sure, the sha256
    > > computation time might end up being quite a bit of the time, but that's
    > > not really that interesting of a test..).
    > 
    > I think that http://postgr.es/m/38e29a1c-0d20-fc73-badd-ca05f7f07ffa@pgmasters.net
    > is one of the more interesting emails on this topic.  My conclusion
    > from that email, and the ones that led up to it, was that there is a
    > 40-50% overhead from doing a SHA checksum, but in pgbackrest, users
    > don't see it because backups are compressed. Because the compression
    > uses so much CPU time, the additional overhead from the SHA checksum
    > is only a few percent more. But I don't think that it would be smart
    > to slow down uncompressed backups by 40-50%. That's going to cause a
    > problem for somebody, almost for sure.
    
    I like that email on the topic also, as it points out again (as I tried
    to do earlier also..) that it depends on what we're actually including
    in the test- and it seems, again, that those tests didn't consider the
    time to actually write the data somewhere, either network or disk.
    
    As for folks who are that close to the edge on their backup timing that
    they can't have it slow down- chances are pretty darn good that they're
    not far from ending up needing to find a better solution than
    pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
    suppose, they could have one but not have checksums..).
    
    > > In particular, the sentence right above this list is:
    > >
    > > "Certain files and directories are excluded from verification:"
    > >
    > > but we actually do verify the manifest, that's all I'm saying here.
    > >
    > > Maybe rewording that a bit is what would help, say:
    > >
    > > "Certain files and directories are not included in the manifest:"
    > 
    > Well, that'd be wrong, though. It's true that backup_manifest won't
    > have an entry in the manifest, and neither will WAL files, but
    > postgresql.auto.conf will. We'll just skip complaining about it if the
    > checksum doesn't match or whatever. The server generates manifest
    > entries for everything, and the client decides not to pay attention to
    > some of them because it knows that pg_basebackup may have made certain
    > changes that were not known to the server.
    
    Ok, but it's also wrong to say that the backup_label is excluded from
    verification.
    
    > > Yeah, I get that it's not easy to figure out how to validate the WAL,
    > > but I stand by my opinion that it's simply not acceptable to exclude the
    > > necessary WAL from verification and to claim that a backup is valid
    > > when we haven't checked the WAL.
    > 
    > I hear that, but I don't agree that having nothing is better than
    > having this much committed. I would be fine with renaming the tool
    > (pg_validatebackupmanifest? pg_validatemanifest?), or with updating
    > the documentation to be more clear about what is and is not checked,
    > but I'm not going to extend the tool to do totally new things for
    > which we don't even have an agreed design yet. I believe in trying to
    > create patches that do one thing and do it well, and this patch does
    > that. The fact that it doesn't do some other thing that is
    > conceptually related yet different is a good thing, not a bad one.
    
    I fail to see the usefulness of a tool that doesn't actually verify that
    the backup is able to be restored from.
    
    Even pg_basebackup (in both fetch and stream modes...) checks that we at
    least got all the WAL that's needed for the backup from the server
    before considering the backup to be valid and telling the user that
    there was a successful backup.  With what you're proposing here, we
    could have someone do a pg_basebackup, get back an ERROR saying the
    backup wasn't valid, and then run pg_validatebackup and be told that the
    backup is valid.  I don't get how that's sensible.
    
    Thanks,
    
    Stephen
    
  131. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-26T15:37:48Z

    On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > That's a fair argument, but I think the other relevant principle is
    > > that we try to give people useful defaults for things. I think that
    > > checksums are a sufficiently useful thing that having the default be
    > > not to do it doesn't make sense. I had the impression that you and
    > > David were in agreement on that point, actually.
    >
    > I agree with wanting to have useful defaults and that checksums should
    > be included by default, and I'm alright even with letting people pick
    > what algorithms they'd like to have too.  The construct here is made odd
    > because we've got this idea that "no checksum" is an option, which is
    > actually something that I don't particularly like, but that's what's
    > making this particular syntax weird.  I don't suppose you'd be open to
    > the idea of just dropping that though..?  There wouldn't be any issue
    > with this syntax if we just always had checksums included when a
    > manifest is requested. :)
    >
    > Somehow, I don't think I'm going to win that argument.
    
    Well, it's not a crazy idea. So, at some point, I had the idea that
    you were always going to get a manifest, and therefore you ought at
    least to have the option of not checksumming to avoid the
    overhead. But, as things stand now, you can suppress the manifest
    altogether, so that you can still take a backup even if you've got no
    disk space to spool the manifest on the master. So, if you really want
    no overhead from manifests, just don't have a manifest. And if you are
    OK with some overhead, why not at least have a CRC-32C checksum, which
    is, after all, pretty cheap?
    
    Now, on the other hand, I don't have any strong evidence that the
    manifest-without-checksums mode is useless. You can still use it to
    verify that you have the correct files and that those files have the
    expected sizes. And, verifying those things is very cheap, because you
    only need to stat() each file, not open and read them all. True, you
    can do those things by using pg_validatebackup -s. But, you'd still
    incur the (admittedly fairly low) overhead of computing checksums that
    you don't intend to use.
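
    A minimal sketch of the size-only check described above, assuming the
    manifest has already been parsed into a mapping from relative path to
    expected size. The function name, the mapping, and the message wording
    are illustrative and are not taken from pg_validatebackup itself. Only
    stat() is called, so no file contents are read.

```python
import os


def check_sizes(backup_dir, expected_sizes):
    """Compare on-disk sizes with expected sizes using stat() only.

    expected_sizes is a hypothetical {relative_path: size_in_bytes} mapping
    standing in for a parsed manifest. Returns a list of problem strings.
    """
    problems = []
    for rel_path, expected in expected_sizes.items():
        full_path = os.path.join(backup_dir, rel_path)
        try:
            actual = os.stat(full_path).st_size  # no open(), no read()
        except FileNotFoundError:
            problems.append(f'"{rel_path}" is listed but not on disk')
            continue
        if actual != expected:
            problems.append(
                f'"{rel_path}" has size {actual} on disk, expected {expected}'
            )
    return problems
```

    An empty result means every listed file exists with the expected size;
    it says nothing about whether the contents are intact.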
    
    This is where I feel like I'm trying to make decisions in a vacuum. If
    we had a few more people weighing in on the thread on this point, I'd
    be happy to go with whatever the consensus was. If most people think
    having both --no-manifest (suppressing the manifest completely) and
    --manifest-checksums=none (suppressing only the checksums) is useless
    and confusing, then sure, let's rip the latter one out. If most people
    like the flexibility, let's keep it: it's already implemented and
    tested. But I hate to base the decision on what one or two people
    think.
    
    > > Well, the 512MB "limit" for CRC-32C means only that for certain very
    > > specific types of errors, detection is not guaranteed above that file
    > > size. So if you have a single flipped bit, for example, and the file
    > > size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
    > > chance of detecting the error, whereas if the file size is less than
    > > 512MB, it is 100% certain, because of the design of the algorithm. But
    > > nine nines is plenty, and neither SHA nor our page-level checksums
    > > provide guaranteed error detection properties anyway.
    >
    > Right, so we know that CRC-32C has an upper-bound of 512MB to be useful
    > for exactly what it's designed to be useful for, but we also know that
    > we're going to have larger files- at least 1GB ones, and quite possibly
    > larger, so why are we choosing this?
    >
    > At the least, wouldn't it make sense to consider a larger CRC, one whose
    > limit is above the size of commonly expected files, if we're going to
    > use a CRC?
    
    I mean, you're just repeating the same argument here, and it's just
    not valid. Regardless of the file size, the chances of a false
    checksum match are literally less than one in a billion. There is
    every reason to believe that users will be happy with a low-overhead
    method that has a 99.9999999+% chance of detecting corrupt files. I do
    agree that a 64-bit CRC would probably be not much more expensive and
    improve the probability of detecting errors even further, but I wanted
    to restrict this patch to using infrastructure we already have. The
    choices there are the various SHA functions (so I supported those),
    MD5 (which I deliberately omitted, for reasons I hope you'll be the
    first to agree with), CRC-32C (which is fast), a couple of other
    CRC-32 variants (which I omitted because they seemed redundant and one
    of them only ever existed in PostgreSQL because of a coding mistake),
    and the hacked-up version of FNV that we use for page-level checksums
    (which is only 16 bits and seems to have no advantages for this
    purpose).
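
    The percentage quoted above is what falls out if a corrupted file is
    assumed to produce an effectively random 32-bit checksum, so that it
    matches the stored value with probability 2^-32. That uniformity is an
    assumption of this sketch, not a property established in this thread.

```python
# Assumption: a corrupted file yields a uniformly random checksum value,
# so a false match happens with probability 2**-bits.

def false_match_probability(bits):
    """Chance that a random checksum of the given width matches by accident."""
    return 2.0 ** -bits


def detection_percent(bits):
    """Chance, in percent, that the corruption is noticed."""
    return (1.0 - false_match_probability(bits)) * 100.0


if __name__ == "__main__":
    for bits in (16, 32, 64):
        print(bits, false_match_probability(bits), detection_percent(bits))
```

    Under the same assumption, each additional checksum bit halves the
    chance of a false match.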
    
    > > I'm not sure why the fact that it's a 40-year-old algorithm is
    > > relevant. There are many 40-year-old algorithms that are very good.
    >
    > Sure there are, but there probably wasn't a lot of thought about
    > GB-sized files, and this doesn't really seem to be the direction people
    > are going in for larger objects.  s3, as an example, uses sha256.
    > Google, it seems, suggests folks use "HighwayHash" (from their crc32c
    > github repo- https://github.com/google/crc32c).  Most CRC uses seem to
    > be for much smaller data sets.
    
    Again, I really want to stick with infrastructure we already have.
    Trying to find a hash function that will please everybody is a hole
    with no bottom, or more to the point, a bikeshed in need of painting.
    There are TONS of great hash functions out there on the Internet, and
    as previous discussions on pgsql-hackers will attest, as soon as you
    go down that road, somebody will say "well, what about xxhash" or
    whatever, and then you spend the rest of your life trying to figure
    out what hash function we could try to commit that is fast and secure
    and doesn't have copyright or patent problems. There have been
    multiple efforts to introduce such hash functions in the past, and I
    think basically all of those have crashed into a brick wall.
    
    I don't think that's because introducing new hash functions is a bad
    idea. I think that there are various reasons why it might be a good
    idea. For instance, highwayhash purports to be a cryptographic hash
    function that is fast enough to replace non-cryptographic hash
    functions. It's easy to see why someone might want that, here. For
    example, it would be entirely reasonable to copy the backup manifest
    onto a USB key and store it in a vault. Later, if you get the USB key
    back out of the vault and validate it against the backup, you pretty
    much know that none of the data files have been tampered with,
    provided that you used a cryptographic hash. So, SHA is a good option
    for people who have a USB key and a vault, and a faster cryptographic
    hash might be even better. I don't have any desire to block such proposals,
    and I would be thrilled if this work inspires other people to add such
    options. However, I also don't want this patch to get blocked by an
    interminable argument about which hash functions we ought to use. The
    ones we have in core now are good enough for a start, and more can be
    added later.
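
    As a rough illustration of the vault scenario, the following sketch
    recomputes a SHA-256 digest for one file and compares it with a
    separately stored value. The helper names and the chunk size are
    hypothetical; this is not how the patch computes or stores checksums.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path, expected_hex):
    """True if the file's current digest equals the stored one."""
    return sha256_of_file(path) == expected_hex.lower()
```

    The comparison is only as trustworthy as the stored digest, which is
    why the copy of the manifest has to be kept somewhere an attacker
    cannot modify it.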
    
    > Sure, there's a good chance we'll need newer algorithms in the future, I
    > don't doubt that.  On the other hand, if crc32c, or CRC whatever, was
    > the perfect answer and no one will ever need something better, then
    > what's with folks like Google suggesting something else..?
    
    I have never said that CRC was the perfect answer, and the reason why
    Google is suggesting something different is because they wanted a fast
    hash (not SHA) that still has cryptographic properties. What I have
    said is that using CRC-32C by default means that there is very little
    downside as compared with current releases. Backups will not get
    slower, and error detection will get better. If you pick any other
    default from the menu of options currently available, then either
    backups get noticeably slower, or we get less error detection
    capability than that option gives us.
    
    > As for folks who are that close to the edge on their backup timing that
    > they can't have it slow down- chances are pretty darn good that they're
    > not far from ending up needing to find a better solution than
    > pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
    > suppose, they could have one but not have checksums..).
    
    40-50% is a lot more than "if you were on the edge."
    
    > > Well, that'd be wrong, though. It's true that backup_manifest won't
    > > have an entry in the manifest, and neither will WAL files, but
    > > postgresql.auto.conf will. We'll just skip complaining about it if the
    > > checksum doesn't match or whatever. The server generates manifest
    > > entries for everything, and the client decides not to pay attention to
    > > some of them because it knows that pg_basebackup may have made certain
    > > changes that were not known to the server.
    >
    > Ok, but it's also wrong to say that the backup_label is excluded from
    > verification.
    
    The docs don't say that backup_label is excluded from verification.
    They do say that backup_manifest is excluded from verification
    *against the manifest*, because it is. I'm not sure if you're honestly
    confused here or if we're just devolving into arguing for the sake of
    argument, but right now the code looks like this:
    
        simple_string_list_append(&context.ignore_list, "backup_manifest");
        simple_string_list_append(&context.ignore_list, "pg_wal");
        simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
        simple_string_list_append(&context.ignore_list, "recovery.signal");
        simple_string_list_append(&context.ignore_list, "standby.signal");
    
    Notice that this is the same list of files mentioned in the
    documentation. Now let's suppose we remove the first of those lines of
    code, so that backup_manifest is not in the exclude list by default.
    Now let's try to validate a backup:
    
    [rhaas pgsql]$ src/bin/pg_validatebackup/pg_validatebackup ~/pgslave
    pg_validatebackup: error: "backup_manifest" is present on disk but not
    in the manifest
    
    Oops. If you read that error carefully, you can see that the complaint
    is 100% valid. backup_manifest is indeed present on disk, but not in
    the manifest. However, because this situation is expected and known
    not to be a problem, the right thing to do is suppress the error. That
    is why it is in the ignore_list by default. The documentation is
    attempting to explain this. If it's unclear, we should try to make it
    better, but it is absolutely NOT saying that there is no internal
    validation of the backup_manifest. In fact, the previous paragraph
    tries to explain that:
    
    +   <application>pg_validatebackup</application> reads the manifest file of a
    +   backup, verifies the manifest against its own internal checksum, and then
    
    It is, however, saying, and *entirely correctly*, that
    pg_validatebackup will not check the backup_manifest file against the
    backup_manifest. If it did, it would find that it's not there. It
    would then emit an error message like the one above even though
    there's no problem with the backup.
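
    The effect of the ignore list can be sketched as follows. This is a
    simplified model, not the pg_validatebackup code: the function names
    are made up, and the manifest is assumed to be a plain set of relative
    paths. Only the five ignore-list entries are taken from the snippet
    quoted above.

```python
import os

# Entries copied from the ignore_list snippet quoted in this message.
DEFAULT_IGNORE_LIST = [
    "backup_manifest",
    "pg_wal",
    "postgresql.auto.conf",
    "recovery.signal",
    "standby.signal",
]


def should_ignore(rel_path, ignore_list):
    """True if rel_path is an ignored entry or lies underneath one."""
    return any(
        rel_path == entry or rel_path.startswith(entry + "/")
        for entry in ignore_list
    )


def files_not_in_manifest(backup_dir, manifest_paths, ignore_list=None):
    """List files present on disk that the manifest does not mention."""
    if ignore_list is None:
        ignore_list = DEFAULT_IGNORE_LIST
    extras = []
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, backup_dir).replace(os.sep, "/")
            if rel_path in manifest_paths or should_ignore(rel_path, ignore_list):
                continue
            extras.append(rel_path)
    return sorted(extras)
```

    With the default list, backup_manifest is skipped silently; with
    backup_manifest removed from the list, it is reported as an extra
    file, which mirrors the error shown above.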
    
    > I fail to see the usefulness of a tool that doesn't actually verify that
    > the backup is able to be restored from.
    >
    > Even pg_basebackup (in both fetch and stream modes...) checks that we at
    > least got all the WAL that's needed for the backup from the server
    > before considering the backup to be valid and telling the user that
    > there was a successful backup.  With what you're proposing here, we
    > could have someone do a pg_basebackup, get back an ERROR saying the
    > backup wasn't valid, and then run pg_validatebackup and be told that the
    > backup is valid.  I don't get how that's sensible.
    
    I'm sorry that you can't see how that's sensible, but it doesn't mean
    that it isn't sensible. It is totally unrealistic to expect that any
    backup verification tool can verify that you won't get an error when
    trying to use the backup. That would require that the validation
    tool try to do everything that PostgreSQL will try to do
    when the backup is used, including running recovery and updating the
    data files. Anything less than that creates a real possibility that
    the backup will verify good but fail when used. This tool has a much
    narrower purpose, which is to try to verify that we (still) have the
    files the server sent as part of the backup and that, to the best of
    our ability to detect such things, they have not been modified. As you
    know, or should know, the WAL files are not sent as part of the
    backup, and so are not verified. Other things that would also be
    useful to check are also not verified. It would be fantastic to have
    more verification tools in the future, but it is difficult to see why
    anyone would bother trying if an attempt to get the first one
    committed gets blocked because it does not yet do everything. Very few
    patches try to do everything, and those that do usually get blocked
    because, by trying to do too much, they get some of it badly wrong.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  132. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-26T16:34:52Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > > That's a fair argument, but I think the other relevant principle is
    > > > that we try to give people useful defaults for things. I think that
    > > > checksums are a sufficiently useful thing that having the default be
    > > > not to do it doesn't make sense. I had the impression that you and
    > > > David were in agreement on that point, actually.
    > >
    > > I agree with wanting to have useful defaults and that checksums should
    > > be included by default, and I'm alright even with letting people pick
    > > what algorithms they'd like to have too.  The construct here is made odd
    > > because we've got this idea that "no checksum" is an option, which is
    > > actually something that I don't particularly like, but that's what's
    > > making this particular syntax weird.  I don't suppose you'd be open to
    > > the idea of just dropping that though..?  There wouldn't be any issue
    > > with this syntax if we just always had checksums included when a
    > > manifest is requested. :)
    > >
    > > Somehow, I don't think I'm going to win that argument.
    > 
    > Well, it's not a crazy idea. So, at some point, I had the idea that
    > you were always going to get a manifest, and therefore you ought at
    > least to have the option of not checksumming to avoid the
    > overhead. But, as things stand now, you can suppress the manifest
    > altogether, so that you can still take a backup even if you've got no
    > disk space to spool the manifest on the master. So, if you really want
    > no overhead from manifests, just don't have a manifest. And if you are
    > OK with some overhead, why not at least have a CRC-32C checksum, which
    > is, after all, pretty cheap?
    > 
    > Now, on the other hand, I don't have any strong evidence that the
    > manifest-without-checksums mode is useless. You can still use it to
    > verify that you have the correct files and that those files have the
    > expected sizes. And, verifying those things is very cheap, because you
    > only need to stat() each file, not open and read them all. True, you
    > can do those things by using pg_validatebackup -s. But, you'd still
    > incur the (admittedly fairly low) overhead of computing checksums that
    > you don't intend to use.
    > 
    > This is where I feel like I'm trying to make decisions in a vacuum. If
    > we had a few more people weighing in on the thread on this point, I'd
    > be happy to go with whatever the consensus was. If most people think
    > having both --no-manifest (suppressing the manifest completely) and
    > --manifest-checksums=none (suppressing only the checksums) is useless
    > and confusing, then sure, let's rip the latter one out. If most people
    > like the flexibility, let's keep it: it's already implemented and
    > tested. But I hate to base the decision on what one or two people
    > think.
    
    I'm frustrated at the lack of involvement from others also.
    
    Just to be clear- I'm not completely against having a 'manifest but no
    checksum' option, but if that's what we're going to have then it seems
    like the syntax should be such that if you don't specify checksums then
    you don't get checksums and "MANIFEST_CHECKSUM none" shouldn't be a
    thing.
    
    All that said, as I said up-thread, I appreciate that we aren't
    designing SQL here and that this is pretty special syntax to begin with,
    so if you ended up committing it the way you have it now, so be it, I
    wouldn't be asking for it to be reverted over this.  It's a bit awkward
    and kind of a thorn, but it's not entirely unreasonable, and we'd
    probably end up there anyway if we started out without a 'none' option
    and someone did come up with a good argument and a patch to add such an
    option in the future.
    
    > > > Well, the 512MB "limit" for CRC-32C means only that for certain very
    > > > specific types of errors, detection is not guaranteed above that file
    > > > size. So if you have a single flipped bit, for example, and the file
    > > > size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
    > > > chance of detecting the error, whereas if the file size is less than
    > > > 512MB, it is 100% certain, because of the design of the algorithm. But
    > > > nine nines is plenty, and neither SHA nor our page-level checksums
    > > > provide guaranteed error detection properties anyway.
    > >
    > > Right, so we know that CRC-32C has an upper-bound of 512MB to be useful
    > > for exactly what it's designed to be useful for, but we also know that
    > > we're going to have larger files- at least 1GB ones, and quite possibly
    > > larger, so why are we choosing this?
    > >
    > > At the least, wouldn't it make sense to consider a larger CRC, one whose
    > > limit is above the size of commonly expected files, if we're going to
    > > use a CRC?
    > 
    > I mean, you're just repeating the same argument here, and it's just
    > not valid. Regardless of the file size, the chances of a false
    > checksum match are literally less than one in a billion. There is
    > every reason to believe that users will be happy with a low-overhead
    > method that has a 99.9999999+% chance of detecting corrupt files. I do
    > agree that a 64-bit CRC would probably be not much more expensive and
    > improve the probability of detecting errors even further, but I wanted
    > to restrict this patch to using infrastructure we already have. The
    > choices there are the various SHA functions (so I supported those),
    > MD5 (which I deliberately omitted, for reasons I hope you'll be the
    > first to agree with), CRC-32C (which is fast), a couple of other
    > CRC-32 variants (which I omitted because they seemed redundant and one
    > of them only ever existed in PostgreSQL because of a coding mistake),
    > and the hacked-up version of FNV that we use for page-level checksums
    > (which is only 16 bits and seems to have no advantages for this
    > purpose).
    
    The argument that "well, we happened to already have it, even though we
    used it for much smaller data sets, which are well within the
    100%-single-bit-error detection limit" certainly doesn't make me any
    more supportive of this.  Choosing the right algorithm to use maybe
    shouldn't be based on the age of that algorithm, but it also certainly
    shouldn't be "just because we already have it" when we're using it for a
    very different use-case.
    
    I'm guessing folks have already seen it, but I thought this was an
    interesting run-down of actual collisions based on various checksum
    lengths using one data set (though it's not clear exactly how big it is,
    from what I can see)-
    
    http://www.backplane.com/matt/crc64.html
    
    I do agree with excluding things like md5 and others that aren't good
    options.  I wasn't saying we should necessarily exclude crc32c either..
    but rather saying that it shouldn't be the default.
    
    Here's another way to look at it- where do we use crc32c today, and how
    much data might we possibly be covering with that crc?  Why was crc32c
    picked for that purpose?  If the individual who decided to pick crc32c
    for that case was contemplating a checksum for up-to-1GB files, would
    they have picked crc32c?  Seems unlikely to me.
    
    > > > I'm not sure why the fact that it's a 40-year-old algorithm is
    > > > relevant. There are many 40-year-old algorithms that are very good.
    > >
    > > Sure there are, but there probably wasn't a lot of thought about
    > > GB-sized files, and this doesn't really seem to be the direction people
    > > are going in for larger objects.  s3, as an example, uses sha256.
    > > Google, it seems, suggests folks use "HighwayHash" (from their crc32c
    > > github repo- https://github.com/google/crc32c).  Most CRC uses seem to
    > > be for much smaller data sets.
    > 
    > Again, I really want to stick with infrastructure we already have.
    
    I don't agree with that as a sensible justification for picking it for
    this case, because it's clearly not the same use-case.
    
    > Trying to find a hash function that will please everybody is a hole
    > with no bottom, or more to the point, a bikeshed in need of painting.
    > There are TONS of great hash functions out there on the Internet, and
    > as previous discussions on pgsql-hackers will attest, as soon as you
    > go down that road, somebody will say "well, what about xxhash" or
    > whatever, and then you spend the rest of your life trying to figure
    > out what hash function we could try to commit that is fast and secure
    > and doesn't have copyright or patent problems. There have been
    > multiple efforts to introduce such hash functions in the past, and I
    > think basically all of those have crashed into a brick wall.
    > 
    > I don't think that's because introducing new hash functions is a bad
    > idea. I think that there are various reasons why it might be a good
    > idea. For instance, highwayhash purports to be a cryptographic hash
    > function that is fast enough to replace non-cryptographic hash
    > functions. It's easy to see why someone might want that, here. For
    > example, it would be entirely reasonable to copy the backup manifest
    > onto a USB key and store it in a vault. Later, if you get the USB key
    > back out of the vault and validate it against the backup, you pretty
    > much know that none of the data files have been tampered with,
    > provided that you used a cryptographic hash. So, SHA is a good option
    > for people who have a USB key and a vault, and a faster cryptographic
    > hash might be even better. I don't have any desire to block such proposals,
    > and I would be thrilled if this work inspires other people to add such
    > options. However, I also don't want this patch to get blocked by an
    > interminable argument about which hash functions we ought to use. The
    > ones we have in core now are good enough for a start, and more can be
    > added later.
    
    I'm not actually arguing about which hash functions we should support,
    but rather what the default is and if crc32c, specifically, is actually
    a reasonable choice.  Just because it's fast and we already had an
    implementation of it doesn't justify its use as the default.  The fact
    that it doesn't actually provide the check that is generally expected of
    CRC checksums (100% detection of single-bit errors) once the file size
    gets over 512MB makes me wonder if we should have it at all, yes, but it
    definitely makes me think it shouldn't be our default.
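
    For readers who want to experiment with the small-input behaviour
    being discussed, here is a bitwise CRC-32C (Castagnoli polynomial,
    reflected form 0x82F63B78) together with a check that every single-bit
    flip in a short buffer changes the checksum. It is a slow reference
    sketch, not PostgreSQL's implementation, and it demonstrates nothing
    about files larger than 512MB.

```python
def crc32c(data: bytes) -> int:
    """Bit-at-a-time CRC-32C; slow, but easy to verify by eye."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the reflected polynomial when the low bit is set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def single_bit_flips_all_detected(data: bytes) -> bool:
    """Flip each bit of data in turn and confirm the checksum changes."""
    original = crc32c(data)
    buf = bytearray(data)
    for i in range(len(buf)):
        for bit in range(8):
            buf[i] ^= 1 << bit          # corrupt one bit
            changed = crc32c(bytes(buf)) != original
            buf[i] ^= 1 << bit          # restore it
            if not changed:
                return False
    return True
```

    The standard check input b"123456789" gives 0xE3069283 with this
    function, which is a quick way to confirm the polynomial and bit
    ordering are the CRC-32C ones.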
    
    Folks look to PG as being pretty good at figuring things out and doing
    the thing that makes sense to minimize risk of data loss or corruption.
    I can understand and agree with the desire to have a faster alternative
    to sha256 for those who don't need a cryptographically safe hash, but if
    we're going to provide that option, it should be the right answer and
    it's pretty clear, at least to me, that crc32c isn't a good choice for
    gigabyte-size files.
    
    > > Sure, there's a good chance we'll need newer algorithms in the future, I
    > > don't doubt that.  On the other hand, if crc32c, or CRC whatever, was
    > > the perfect answer and no one will ever need something better, then
    > > what's with folks like Google suggesting something else..?
    > 
    > I have never said that CRC was the perfect answer, and the reason why
    > Google is suggesting something different is because they wanted a fast
    > hash (not SHA) that still has cryptographic properties. What I have
    > said is that using CRC-32C by default means that there is very little
    > downside as compared with current releases. Backups will not get
    > slower, and error detection will get better. If you pick any other
    > default from the menu of options currently available, then either
    > backups get noticeably slower, or we get less error detection
    > capability than that option gives us.
    
    I don't agree with limiting our view to only those algorithms that we've
    already got implemented in PG.
    
    > > As for folks who are that close to the edge on their backup timing that
    > > they can't have it slow down- chances are pretty darn good that they're
    > > not far from ending up needing to find a better solution than
    > > pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
    > > suppose, they could have one but not have checksums..).
    > 
    > 40-50% is a lot more than "if you were on the edge."
    
    We can agree to disagree on this, it's not particularly relevant in the
    end.
    
    > > > Well, that'd be wrong, though. It's true that backup_manifest won't
    > > > have an entry in the manifest, and neither will WAL files, but
    > > > postgresql.auto.conf will. We'll just skip complaining about it if the
    > > > checksum doesn't match or whatever. The server generates manifest
    > > > entries for everything, and the client decides not to pay attention to
    > > > some of them because it knows that pg_basebackup may have made certain
    > > > changes that were not known to the server.
    > >
    > > Ok, but it's also wrong to say that the backup_label is excluded from
    > > verification.
    > 
    > The docs don't say that backup_label is excluded from verification.
    > They do say that backup_manifest is excluded from verification
    > *against the manifest*, because it is. I'm not sure if you're honestly
    > confused here or if we're just devolving into arguing for the sake of
    > argument, but right now the code looks like this:
    
    That you're bringing up code here is really just not sensible- we're
    talking about the documentation, not about the code here.  I do
    understand what the code is doing and I don't have any complaint about
    the code.
    
    > Oops. If you read that error carefully, you can see that the complaint
    > is 100% valid. backup_manifest is indeed present on disk, but not in
    > the manifest. However, because this situation is expected and known
    > not to be a problem, the right thing to do is suppress the error. That
    > is why it is in the ignore_list by default. The documentation is
    > attempting to explain this. If it's unclear, we should try to make it
    > better, but it is absolutely NOT saying that there is no internal
    > validation of the backup_manifest. In fact, the previous paragraph
    > tries to explain that:
    
    Yes, I think the documentation is unclear, as I said before, because it
    purports to list things that aren't being validated and then includes
    backup_manifest in that list, which doesn't make sense.  The sentence in
    question does *not* say "Certain files and directories are excluded from
    the manifest" (which is wording that I actually proposed up-thread, to
    try to address this...), it says, from the patch:
    
    "Certain files and directories are excluded from verification:"
    
    Excluded from verification.  Then lists backup_manifest.  Even though,
    earlier in that same paragraph it says that the manifest is verified
    against its own checksum.
    
    > +   <application>pg_validatebackup</application> reads the manifest file of a
    > +   backup, verifies the manifest against its own internal checksum, and then
    > 
    > It is, however, saying, and *entirely correctly*, that
    > pg_validatebackup will not check the backup_manifest file against the
    > backup_manifest. If it did, it would find that it's not there. It
    > would then emit an error message like the one above even though
    > there's no problem with the backup.
    
    It's saying, removing the listing aspect, exactly that "backup_manifest is
    excluded from verification".  That's what I am taking issue with.  I've
    made multiple attempts to suggest other language to avoid saying that
    because it's clearly wrong- the manifest is verified.
    
    > > I fail to see the usefulness of a tool that doesn't actually verify that
    > > the backup is able to be restored from.
    > >
    > > Even pg_basebackup (in both fetch and stream modes...) checks that we at
    > > least got all the WAL that's needed for the backup from the server
    > > before considering the backup to be valid and telling the user that
    > > there was a successful backup.  With what you're proposing here, we
    > > could have someone do a pg_basebackup, get back an ERROR saying the
    > > backup wasn't valid, and then run pg_validatebackup and be told that the
    > > backup is valid.  I don't get how that's sensible.
    > 
    > I'm sorry that you can't see how that's sensible, but it doesn't mean
    > that it isn't sensible. It is totally unrealistic to expect that any
    > backup verification tool can verify that you won't get an error when
    > trying to use the backup. That would require that the validation
    > tool try to do everything that PostgreSQL will try to do
    > when the backup is used, including running recovery and updating the
    > data files. Anything less than that creates a real possibility that
    > the backup will verify good but fail when used. This tool has a much
    > narrower purpose, which is to try to verify that we (still) have the
    > files the server sent as part of the backup and that, to the best of
    > our ability to detect such things, they have not been modified. As you
    > know, or should know, the WAL files are not sent as part of the
    > backup, and so are not verified. Other things that would also be
    > useful to check are also not verified. It would be fantastic to have
    > more verification tools in the future, but it is difficult to see why
    > anyone would bother trying if an attempt to get the first one
    > committed gets blocked because it does not yet do everything. Very few
    > patches try to do everything, and those that do usually get blocked
    > because, by trying to do too much, they get some of it badly wrong.
    
    I'm not talking about making sure that no error ever happens when doing
    a restore of a particular backup.  You're arguing against something that
    I have not advocated for and which I don't advocate for.
    
    I'm saying that the existing tool that takes the backup has a *really*
    *important* verification check that this proposed "validate backup" tool
    doesn't have, and that isn't sensible.  It leads to situations where the
    backup tool itself, pg_basebackup, can fail or be killed before it's
    actually completed, and the "validate backup" tool would say that the
    backup is perfectly fine.  That is not sensible.
    
    That there might be other reasons why a backup can't be restored isn't
    relevant and I'm not asking for a tool that is perfect and does some
    kind of proof that the backup is able to be restored.
    
    Thanks,
    
    Stephen
    
  133. Re: backup manifests

    Mark Dilger <mark.dilger@enterprisedb.com> — 2020-03-26T17:40:55Z

    
    > On Mar 26, 2020, at 9:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
    > 
    > I'm not actually arguing about which hash functions we should support,
    > but rather what the default is and if crc32c, specifically, is actually
    > a reasonable choice.  Just because it's fast and we already had an
    > implementation of it doesn't justify its use as the default.  Given that
    > it doesn't actually provide the check that is generally expected of
    > CRC checksums (100% detection of single-bit errors) when the file size
    > gets over 512MB makes me wonder if we should have it at all, yes, but it
    > definitely makes me think it shouldn't be our default.
    
    I don't understand your focus on the single-bit error issue.  If you are sending your backup across the wire, single bit errors during transmission should already be detected as part of the networking protocol.  The real issue has to be detection of the kinds of errors or modifications that are most likely to happen in practice.  Which are those?  People manually mucking with the files?  Bugs in backup scripts?  Corruption on the storage device?  Truncated files?  The more bits in the checksum (assuming a well designed checksum algorithm), the more likely we are to detect accidental modification, so it is no surprise if a 64-bit crc does better than 32-bit crc.  But that logic can be taken arbitrarily far.  I don't see the connection between, on the one hand, an analysis of single-bit error detection against file size, and on the other hand, the verification of backups.
    
    From a support perspective, I think the much more important issue is making certain that checksums are turned on.  A one in a billion chance of missing an error seems pretty acceptable compared to the, let's say, one in two chance that your customer didn't use checksums.  Why are we even allowing this to be turned off?  Is there a usage case compelling that option?
    
    —
    Mark Dilger
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
    
    
    
  134. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-26T18:02:29Z

    On Thu, Mar 26, 2020 at 12:34 PM Stephen Frost <sfrost@snowman.net> wrote:
    > I do agree with excluding things like md5 and others that aren't good
    > options.  I wasn't saying we should necessarily exclude crc32c either..
    > but rather saying that it shouldn't be the default.
    >
    > Here's another way to look at it- where do we use crc32c today, and how
    > much data might we possibly be covering with that crc?
    
    WAL record size is a 32-bit unsigned integer, so in theory, up to 4GB
    minus 1 byte. In practice, most of them are not more than a few
    hundred bytes, but the amount we might possibly be covering is a lot more.
    
    > Why was crc32c
    > picked for that purpose?
    
    Because it was discovered that 64-bit CRC was too slow, per commit
    21fda22ec46deb7734f793ef4d7fa6c226b4c78e.
    
    > If the individual who decided to pick crc32c
    > for that case was contemplating a checksum for up-to-1GB files, would
    > they have picked crc32c?  Seems unlikely to me.
    
    It's hard to be sure what someone who isn't us would have done in some
    situation that they didn't face, but we do have the discussion thread:
    
    https://www.postgresql.org/message-id/flat/9291.1117593389%40sss.pgh.pa.us#c4e413bbf3d7fbeced7786da1c3aca9c
    
    The question of how much data is protected by the CRC was discussed,
    mostly in the first few messages, in general terms, but it doesn't
    seem to have covered the question very thoroughly. I'm sure we could
    each draw things from that discussion that support our view of the
    situation, but I'm not sure it would be very productive.
    
    What confuses me is that you seem to have a view of the upsides and
    downsides of these various algorithms that seems to me to be highly
    skewed. Like, suppose we change the default from CRC-32C to
    SHA-something. On the upside, the error detection rate will increase
    from 99.9999999+% to something much closer to 100%. On the downside,
    backups will get as much as 40-50% slower for some users. I hope we
    can agree that both detecting errors and taking backups quickly are
    important. However, it is hard for me to imagine that the typical user
    would want to pay even a 5-10% performance penalty when taking a
    backup in order to improve an error detection feature which they may
    not even use and which already has less than a one-in-a-billion chance
    of going wrong. We routinely reject features for causing, say, a 2%
    regression on general workloads. Base backup speed is probably less
    important than how many SELECT or INSERT queries you can pump through
    the system in a second, but it's still a pain point for lots of
    people. I think if you said to some users "hey, would you like to have
    error detection for your backups? it'll cost 10%" many people would
    say "yes, please." But I think if you went to the same users and said
    "hey, would you like to make the error detection for your backups
    better? it currently has a less than 1-in-a-billion chance of failing
    to detect random corruption, and you can reduce that by many orders of
    magnitude for an extra 10% on your backup time," I think the results
    would be much more mixed. Some people would like it, but certainly
    not everybody.
    
    > I'm not actually arguing about which hash functions we should support,
    > but rather what the default is and if crc32c, specifically, is actually
    > a reasonable choice.  Just because it's fast and we already had an
    > implementation of it doesn't justify its use as the default.  Given that
    > it doesn't actually provide the check that is generally expected of
    > CRC checksums (100% detection of single-bit errors) when the file size
    > gets over 512MB makes me wonder if we should have it at all, yes, but it
    > definitely makes me think it shouldn't be our default.
    
    I mean, the property that I care about is the one where it detects
    better than 999,999,999 errors out of every 1,000,000,000, regardless
    of input length.
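
To illustrate the arithmetic behind this figure (this sketch is not part of the original message): under the idealized assumption that a corrupted file produces an effectively random checksum value, an n-bit checksum misses a corruption with probability 1 in 2^n. The function name `miss_probability` is hypothetical, and the model says nothing about the guaranteed error-detection properties of any particular CRC polynomial.

```python
from fractions import Fraction


def miss_probability(bits: int) -> Fraction:
    """Chance that a random corruption leaves an idealized `bits`-bit checksum unchanged."""
    return Fraction(1, 2 ** bits)


# Compare the checksum widths mentioned in the thread.
for bits in (16, 32, 64):
    p = miss_probability(bits)
    print(f"{bits}-bit checksum: misses about 1 corruption in {p.denominator:,}")
```

Under this model a 32-bit checksum misses roughly 1 corruption in 4.29 billion, which is consistent with the "better than 999,999,999 out of every 1,000,000,000" figure quoted above.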
    
    > I don't agree with limiting our view to only those algorithms that we've
    > already got implemented in PG.
    
    I mean, opening that giant can of worms ~2 weeks before feature freeze
    is not very nice. This patch has been around for months, and the
    algorithms were openly discussed a long time ago. I checked and found
    out that the CRC-64 code was nuked in commit
    404bc51cde9dce1c674abe4695635612f08fe27e, so in theory we could revert
    that, but how much confidence do we have that the code in question
    actually did the right thing, or that it's actually fast? An awful lot
    of work has been done on the CRC-32C code over the years, including
    several rounds of speeding it up
    (f044d71e331d77a0039cec0a11859b5a3c72bc95,
    3dc2d62d0486325bf263655c2d9a96aee0b02abe) and one round of fixing it
    because it was producing completely wrong answers
    (5028f22f6eb0579890689655285a4778b4ffc460), so I don't have a lot of
    confidence about that CRC-64 code being totally without problems.
    
    The commit message for that last commit,
    5028f22f6eb0579890689655285a4778b4ffc460, seems pretty relevant in
    this context, too. It observes that, because it "does not correspond
    to any bit-wise CRC calculation" it is "difficult to reason about its
    properties." In other words, the algorithm that we used for WAL
    records for many years likely did not have the guaranteed
    error-detection properties with which you are so concerned (nor do
    most hash functions we might choose; CRC-64 is probably the only
    choice that would). Despite that, the commit message also observed
    that "it has worked well in practice." I realize I'm not convincing
    you of anything here, but the guaranteed error-detection properties of
    CRC are almost totally uninteresting in this context. I'm not
    concerned that CRC-32C doesn't have those properties. I'm not
    concerned that SHA-n wouldn't have those properties. I'm not concerned
    that xxhash or HighwayHash don't have that property either. I doubt
    the fact that CRC-64 would have that property would give us much
    benefit. I think the only things that matter here are (1) how many
    bits you get (more bits = better chance of finding errors, but even
    *sixteen* bits would give you a pretty fair chance of noticing if
    things are broken) and (2) whether you want a cryptographic hash
    function so that you can keep the backup manifest in a vault.
    
    > It's saying, removing the listing aspect, exactly that "backup_manifest is
    > excluded from verification".  That's what I am taking issue with.  I've
    > made multiple attempts to suggest other language to avoid saying that
    > because it's clearly wrong- the manifest is verified.
    
    Well, it's talking about the particular kind of verification that has
    just been discussed, not any form of verification. As one idea,
    perhaps instead of:
    
    + Certain files and directories are
    +   excluded from verification:
    
    ...I could maybe insert a paragraph break there and then continue with
    something like this:
    
    When pg_validatebackup compares the files and directories in the manifest
    to those which are present on disk, it will ignore the presence of, or
    changes to, certain files:
    
    backup_manifest will not be present in the manifest itself, and is
    therefore ignored. Note that the manifest is still verified
    internally, as described above, but no error will be issued about the
    presence of a backup_manifest file in the backup directory even though
    it is not listed in the manifest.
    
    Would that be more clear? Do you want to suggest something else?
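
The comparison described in the proposed wording can be sketched roughly as follows. This is a simplified illustration, not the tool's actual code: the function name `compare_with_manifest`, the exact-match ignore list, and the sample file names are all assumptions made for the example.

```python
import os
import tempfile


def compare_with_manifest(backup_dir, manifest_paths, ignore=("backup_manifest",)):
    """Return (missing, extra): paths listed but absent, and present but unlisted.

    Paths named in `ignore` are skipped in both directions, so the presence of
    backup_manifest on disk is not reported even though it is not listed.
    """
    on_disk = set()
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), backup_dir)
            on_disk.add(rel.replace(os.sep, "/"))
    listed = set(manifest_paths)
    ignored = set(ignore)
    missing = sorted(listed - on_disk - ignored)
    extra = sorted(on_disk - listed - ignored)
    return missing, extra


# Hypothetical backup directory: one listed file, the manifest itself, one stray file.
with tempfile.TemporaryDirectory() as backup_dir:
    for name in ("PG_VERSION", "backup_manifest", "stray.txt"):
        with open(os.path.join(backup_dir, name), "w") as f:
            f.write("x")
    missing, extra = compare_with_manifest(
        backup_dir, ["PG_VERSION", "global/pg_control"]
    )

print("missing:", missing)  # listed in the manifest but not on disk
print("extra:", extra)      # on disk but not listed (backup_manifest is ignored)
```

Checking the manifest against its own internal checksum is a separate step that this sketch does not model.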
    
    > I'm not talking about making sure that no error ever happens when doing
    > I'm saying that the existing tool that takes the backup has a *really*
    > *important* verification check that this proposed "validate backup" tool
    > doesn't have, and that isn't sensible.  It leads to situations where the
    > backup tool itself, pg_basebackup, can fail or be killed before it's
    > actually completed, and the "validate backup" tool would say that the
    > backup is perfectly fine.  That is not sensible.
    
    If someone's procedure for taking and restoring backups involves not
    knowing whether or not pg_basebackup completed without error and then
    trying to use the backup anyway, they are doing something which is
    very foolish, and it's questionable whether any technological solution
    has much hope of getting them out of trouble. But on the plus side,
    this patch would have a good chance of detecting the problem, which is
    a noticeable improvement over what we have now, which has no chance of
    detecting the problem, because we have nothing.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  135. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-26T19:37:11Z

    Greetings,
    
    * Mark Dilger (mark.dilger@enterprisedb.com) wrote:
    > > On Mar 26, 2020, at 9:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
    > > I'm not actually arguing about which hash functions we should support,
    > > but rather what the default is and if crc32c, specifically, is actually
    > > a reasonable choice.  Just because it's fast and we already had an
    > > implementation of it doesn't justify its use as the default.  Given that
    > > it doesn't actually provide the check that is generally expected of
    > > CRC checksums (100% detection of single-bit errors) when the file size
    > > gets over 512MB makes me wonder if we should have it at all, yes, but it
    > > definitely makes me think it shouldn't be our default.
    > 
    > I don't understand your focus on the single-bit error issue.  
    
    Maybe I'm wrong, but my understanding was that detecting single-bit
    errors was one of the primary design goals of CRC and why people talk
    about CRCs of certain sizes having 'limits'- that's the size at which
    single-bit errors will no longer, necessarily, be picked up and
    therefore that's where the CRC of that size starts falling down on that
    goal.
    
    > If you are sending your backup across the wire, single bit errors during transmission should already be detected as part of the networking protocol.  The real issue has to be detection of the kinds of errors or modifications that are most likely to happen in practice.  Which are those?  People manually mucking with the files?  Bugs in backup scripts?  Corruption on the storage device?  Truncated files?  The more bits in the checksum (assuming a well designed checksum algorithm), the more likely we are to detect accidental modification, so it is no surprise if a 64-bit crc does better than 32-bit crc.  But that logic can be taken arbitrarily far.  I don't see the connection between, on the one hand, an analysis of single-bit error detection against file size, and on the other hand, the verification of backups.
    
    We'd like something that does a good job at detecting any differences
    between when the file was copied off of the server and when the command
    is run- potentially weeks or months later.  I would expect most issues
    to end up being storage-level corruption over time where the backup is
    stored, which could be single bit flips or whole pages getting zeroed or
    various other things.  Files changing size probably is one of the less
    common things, but, sure, that too.
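
A small experiment along these lines (not part of the original message) is sketched below. It uses Python's `zlib.crc32`, which implements the common CRC-32 polynomial rather than the CRC-32C variant discussed in this thread, and a 1 KiB sample buffer rather than a real relation file, so it only illustrates the kind of check being described. The helper `flip_bit` is hypothetical.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


original = bytes(range(256)) * 4  # 1 KiB of sample data
baseline = zlib.crc32(original)

# Try every possible single-bit flip and count the ones the CRC fails to notice.
undetected = sum(
    1
    for i in range(len(original) * 8)
    if zlib.crc32(flip_bit(original, i)) == baseline
)
print("single-bit flips missed:", undetected)

# Simulate a zeroed block at the start of the file.
zeroed = bytes(512) + original[512:]
print("zeroed block detected:", zlib.crc32(zeroed) != baseline)
```

On a buffer this small every single-bit flip changes the CRC; the behaviour at file sizes in the hundreds of megabytes, which is the point under dispute here, is not something this sketch can settle.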
    
    That we could take this "arbitrarily far" is actually entirely fine-
    that's a good reason to have alternatives, which this patch does have,
    but that doesn't mean we should have a default that's not suitable for
    the files that we know we're going to be storing.
    
    Consider that we could have used a 16-bit CRC instead, but does that
    actually make sense?  Ok, sure, maybe someone really wants something
    super fast- but should that be our default?  If not, then what criteria
    should we use for the default?
    
    > From a support perspective, I think the much more important issue is making certain that checksums are turned on.  A one in a billion chance of missing an error seems pretty acceptable compared to the, let's say, one in two chance that your customer didn't use checksums.  Why are we even allowing this to be turned off?  Is there a usage case compelling that option?
    
    The argument is that adding checksums takes more time.  I can understand
    that argument, though I don't really agree with it.  Certainly a few
    percent really shouldn't be that big of an issue, and in many cases even
    a sha256 hash isn't going to have that dramatic of an impact on the
    actual overall time.
    
    Thanks,
    
    Stephen
    
  136. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-26T20:37:47Z

    On 3/26/20 11:37 AM, Robert Haas wrote:
    >> On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfrost@snowman.net> wrote:
    >
    > This is where I feel like I'm trying to make decisions in a vacuum. If
    > we had a few more people weighing in on the thread on this point, I'd
    > be happy to go with whatever the consensus was. If most people think
    > having both --no-manifest (suppressing the manifest completely) and
    > --manifest-checksums=none (suppressing only the checksums) is useless
    > and confusing, then sure, let's rip the latter one out. If most people
    > like the flexibility, let's keep it: it's already implemented and
    > tested. But I hate to base the decision on what one or two people
    > think.
    
    I'm not sure I see a lot of value in being able to build a manifest with
    no checksums, especially if overhead for the default checksum algorithm 
    is negligible.
    
    However, I'd still prefer that the default be something more robust and 
    allow users to tune it down rather than the other way around.  But I've 
    made that pretty clear up-thread and I consider that argument lost at 
    this point.
    
    >> As for folks who are that close to the edge on their backup timing that
    >> they can't have it slow down- chances are pretty darn good that they're
    >> not far from ending up needing to find a better solution than
    >> pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
    >> suppose, they could have one but not have checksums..).
    > 
    > 40-50% is a lot more than "if you were on the edge."
    
    For the record I think this is a very misleading number.  Sure, if you 
    are doing your backup to a local SSD on a powerful development laptop it 
    makes sense.
    
    But backups are generally placed on slower storage, remotely, with 
    compression.  Even without compression the first two are going to bring 
    this percentage down by a lot.
    
    When you get to page-level incremental backups, which is where this all 
    started, I'd still recommend using a stronger checksum algorithm to 
    verify that the file was reconstructed correctly on restore.  That much 
    I believe we have agreed on.
    
    >> Even pg_basebackup (in both fetch and stream modes...) checks that we at
    >> least got all the WAL that's needed for the backup from the server
    >> before considering the backup to be valid and telling the user that
    >> there was a successful backup.  With what you're proposing here, we
    >> could have someone do a pg_basebackup, get back an ERROR saying the
    >> backup wasn't valid, and then run pg_validatebackup and be told that the
    >> backup is valid.  I don't get how that's sensible.
    > 
    > I'm sorry that you can't see how that's sensible, but it doesn't mean
    > that it isn't sensible. It is totally unrealistic to expect that any
    > backup verification tool can verify that you won't get an error when
    > trying to use the backup. That would require that the validation
    > tool try to do everything that PostgreSQL will try to do
    > when the backup is used, including running recovery and updating the
    > data files. Anything less than that creates a real possibility that
    > the backup will verify good but fail when used. This tool has a much
    > narrower purpose, which is to try to verify that we (still) have the
    > files the server sent as part of the backup and that, to the best of
    > our ability to detect such things, they have not been modified. As you
    > know, or should know, the WAL files are not sent as part of the
    > backup, and so are not verified. Other things that would also be
    > useful to check are also not verified. It would be fantastic to have
    > more verification tools in the future, but it is difficult to see why
    > anyone would bother trying if an attempt to get the first one
    > committed gets blocked because it does not yet do everything. Very few
    > patches try to do everything, and those that do usually get blocked
    > because, by trying to do too much, they get some of it badly wrong.
    
    I agree with Stephen that this should be done, but I agree with you that 
    it can wait for a future commit. However, I do think:
    
    1) It should be called out rather plainly in the documentation.
    2) If there are files in pg_wal then pg_validatebackup should inform the 
    user that those files have not been validated.
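
Point 2 could look something like the following sketch. It is purely illustrative: the function name `unvalidated_wal_notice`, the wording of the notice, and the sample WAL file name are assumptions, and nothing here reflects how pg_validatebackup is actually implemented.

```python
import os
import tempfile


def unvalidated_wal_notice(backup_dir):
    """Return a notice string if pg_wal contains files, otherwise None."""
    wal_dir = os.path.join(backup_dir, "pg_wal")
    if not os.path.isdir(wal_dir):
        return None
    names = sorted(
        n for n in os.listdir(wal_dir) if os.path.isfile(os.path.join(wal_dir, n))
    )
    if not names:
        return None
    return f"notice: {len(names)} file(s) in pg_wal were not validated"


# Hypothetical backup directory with an empty pg_wal, then with one WAL segment.
with tempfile.TemporaryDirectory() as backup_dir:
    os.mkdir(os.path.join(backup_dir, "pg_wal"))
    empty_result = unvalidated_wal_notice(backup_dir)
    with open(os.path.join(backup_dir, "pg_wal", "000000010000000000000001"), "w") as f:
        f.write("wal")
    notice = unvalidated_wal_notice(backup_dir)

print(empty_result)
print(notice)
```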
    
    I know you and Stephen have agreed on a number of doc changes, would it 
    be possible to get a new patch with those included? I finally have time 
    to do a review of this tomorrow.  I saw some mistakes in the docs in the 
    current patch but I know those patches are not current.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  137. Re: backup manifests

    Mark Dilger <mark.dilger@enterprisedb.com> — 2020-03-26T20:38:13Z

    
    > On Mar 26, 2020, at 12:37 PM, Stephen Frost <sfrost@snowman.net> wrote:
    > 
    > Greetings,
    > 
    > * Mark Dilger (mark.dilger@enterprisedb.com) wrote:
    >>> On Mar 26, 2020, at 9:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
    >>> I'm not actually arguing about which hash functions we should support,
    >>> but rather what the default is and if crc32c, specifically, is actually
    >>> a reasonable choice.  Just because it's fast and we already had an
    >>> implementation of it doesn't justify its use as the default.  Given that
    >>> it doesn't actually provide the check that is generally expected of
    >>> CRC checksums (100% detection of single-bit errors) when the file size
    >>> gets over 512MB makes me wonder if we should have it at all, yes, but it
    >>> definitely makes me think it shouldn't be our default.
    >> 
    >> I don't understand your focus on the single-bit error issue.  
    > 
    > Maybe I'm wrong, but my understanding was that detecting single-bit
    > errors was one of the primary design goals of CRC and why people talk
    > about CRCs of certain sizes having 'limits'- that's the size at which
    > single-bit errors will no longer, necessarily, be picked up and
    > therefore that's where the CRC of that size starts falling down on that
    > goal.
    
    I think I agree with all that.  I'm not sure it is relevant.  When people use CRCs to detect things *other than* transmission errors, they are in some sense using a hammer to drive a screw.  At that point, the analysis of how good the hammer is, and how big a nail it can drive, is no longer relevant.  The relevant discussion here is how appropriate a CRC is for our purpose.  I don't know the answer to that, but it doesn't seem the single-bit error analysis is the right analysis.
    
    >> If you are sending your backup across the wire, single bit errors during transmission should already be detected as part of the networking protocol.  The real issue has to be detection of the kinds of errors or modifications that are most likely to happen in practice.  Which are those?  People manually mucking with the files?  Bugs in backup scripts?  Corruption on the storage device?  Truncated files?  The more bits in the checksum (assuming a well designed checksum algorithm), the more likely we are to detect accidental modification, so it is no surprise if a 64-bit crc does better than 32-bit crc.  But that logic can be taken arbitrarily far.  I don't see the connection between, on the one hand, an analysis of single-bit error detection against file size, and on the other hand, the verification of backups.
    > 
    > We'd like something that does a good job at detecting any differences
    > between when the file was copied off of the server and when the command
    > is run- potentially weeks or months later.  I would expect most issues
    > to end up being storage-level corruption over time where the backup is
    > stored, which could be single bit flips or whole pages getting zeroed or
    > various other things.  Files changing size probably is one of the less
    > common things, but, sure, that too.
    > 
    > That we could take this "arbitrarily far" is actually entirely fine-
    > that's a good reason to have alternatives, which this patch does have,
    > but that doesn't mean we should have a default that's not suitable for
    > the files that we know we're going to be storing.
    > 
    > Consider that we could have used a 16-bit CRC instead, but does that
    > actually make sense?  Ok, sure, maybe someone really wants something
    > super fast- but should that be our default?  If not, then what criteria
    > should we use for the default?
    
    I'll answer this below....
    
    >> From a support perspective, I think the much more important issue is making certain that checksums are turned on.  A one in a billion chance of missing an error seems pretty acceptable compared to the, let's say, one in two chance that your customer didn't use checksums.  Why are we even allowing this to be turned off?  Is there a usage case compelling that option?
    > 
    > The argument is that adding checksums takes more time.  I can understand
    > that argument, though I don't really agree with it.  Certainly a few
    > percent really shouldn't be that big of an issue, and in many cases even
    > a sha256 hash isn't going to have that dramatic of an impact on the
    > actual overall time.
    
    I see two dangers here:
    
    (1) The user enables checksums of some type, and due to checksums not being perfect, corruption happens but goes undetected, leaving her in a bad place.
    
    (2) The user makes no checksum selection at all, gets checksums of the *default* type, determines it is too slow for her purposes, and instead of adjusting the checksum algorithm to something faster, simply turns checksums off; corruption happens and of course is undetected, leaving her in a bad place.
    
    I think the risk of (2) is far worse, which makes me tend towards a default that is fast enough not to encourage anybody to disable checksums altogether.  I have no opinion about which algorithm is best suited to that purpose, because I haven't benchmarked any.  I'm pretty much going off what Robert said, in terms of how big an impact using a heavier algorithm would be.  Perhaps you'd like to run benchmarks and make a concrete proposal for another algorithm, with numbers showing the runtime changes?  You mentioned up-thread that prior timings which showed a 40-50% slowdown were not including all the relevant stuff, so perhaps you could fix that in your benchmark and let us know what is included in the timings?
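
A starting point for such a benchmark might look like the sketch below (not part of the original message). It measures only in-memory hashing throughput, so it leaves out disk, network, and compression costs, which are exactly the factors said up-thread to change the picture. It also uses `zlib.crc32` (plain CRC-32) as a stand-in because the Python standard library has no CRC-32C. The helper name `throughput_mib_s` and the 16 MiB payload size are arbitrary choices.

```python
import hashlib
import time
import zlib


def throughput_mib_s(func, data: bytes, repeats: int = 5) -> float:
    """Best-of-N throughput of `func(data)` in MiB per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return len(data) / (1024 * 1024) / best


payload = bytes(16 * 1024 * 1024)  # 16 MiB of zero bytes held in memory

results = {
    "crc32 (zlib)": throughput_mib_s(zlib.crc32, payload),
    "sha256": throughput_mib_s(lambda d: hashlib.sha256(d).digest(), payload),
}

for name, rate in results.items():
    print(f"{name}: {rate:,.0f} MiB/s")
```

The numbers depend heavily on the CPU and on how the libraries were built, so they say little about end-to-end backup time on their own.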
    
    I don't think we should be contemplating for v13 any checksum algorithms for the default except the ones already in the options list.  Doing that just derails the patch.  If you want highwayhash or similar to be the default, can't we hold off until v14 and think about changing the default?  Maybe I'm missing something, but I don't see any reason why it would be hard to change this after the first version has already been released.
    
    —
    Mark Dilger
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
    
    
    
  138. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-26T20:44:14Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Thu, Mar 26, 2020 at 12:34 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > I do agree with excluding things like md5 and others that aren't good
    > > options.  I wasn't saying we should necessarily exclude crc32c either..
    > > but rather saying that it shouldn't be the default.
    > >
    > > Here's another way to look at it- where do we use crc32c today, and how
    > > much data might we possibly be covering with that crc?
    > 
    > WAL record size is a 32-bit unsigned integer, so in theory, up to 4GB
    > minus 1 byte. In practice, most of them are not more than a few
    > hundred bytes, but the amount we might possibly be covering is a lot more.
    
    Is it actually possible, today, in PG, to have a 4GB WAL record?
    Judging this based on the WAL record size doesn't seem quite right.
    
    > > Why was crc32c
    > > picked for that purpose?
    > 
    > Because it was discovered that 64-bit CRC was too slow, per commit
    > 21fda22ec46deb7734f793ef4d7fa6c226b4c78e.
    
    ... 15 years ago.  I actually find it pretty interesting that we started
    out with a 64bit CRC there, I didn't know that was the case.  Also
    interesting is that we had 64bit CRC code already.
    
    > > If the individual who decided to pick crc32c
    > > for that case was contemplating a checksum for up-to-1GB files, would
    > > they have picked crc32c?  Seems unlikely to me.
    > 
    > It's hard to be sure what someone who isn't us would have done in some
    > situation that they didn't face, but we do have the discussion thread:
    > 
    > https://www.postgresql.org/message-id/flat/9291.1117593389%40sss.pgh.pa.us#c4e413bbf3d7fbeced7786da1c3aca9c
    > 
    > The question of how much data is protected by the CRC was discussed,
    > mostly in the first few messages, in general terms, but it doesn't
    > seem to have covered the question very thoroughly. I'm sure we could
    > each draw things from that discussion that support our view of the
    > situation, but I'm not sure it would be very productive.
    
    Interesting.
    
    > What confuses me is that you seem to have a view of the upsides and
    > downsides of these various algorithms that seems to me to be highly
    > skewed. Like, suppose we change the default from CRC-32C to
    > SHA-something. On the upside, the error detection rate will increase
    > from 99.9999999+% to something much closer to 100%. On the downside,
    > backups will get as much as 40-50% slower for some users. I hope we
    > can agree that both detecting errors and taking backups quickly are
    > important. However, it is hard for me to imagine that the typical user
    > would want to pay even a 5-10% performance penalty when taking a
    > backup in order to improve an error detection feature which they may
    > not even use and which already has less than a one-in-a-billion chance
    > of going wrong. We routinely reject features for causing, say, a 2%
    > regression on general workloads. Base backup speed is probably less
    > important than how many SELECT or INSERT queries you can pump through
    > the system in a second, but it's still a pain point for lots of
    > people. I think if you said to some users "hey, would you like to have
    > error detection for your backups? it'll cost 10%" many people would
    > say "yes, please." But I think if you went to the same users and said
    > "hey, would you like to make the error detection for your backups
    > better? it currently has a less than 1-in-a-billion chance of failing
    > to detect random corruption, and you can reduce that by many orders of
    > magnitude for an extra 10% on your backup time," I think the results
    > would be much more mixed. Some people would like it, but certainly
    > not everybody.
    
    I think you're right that base backup speed is much less of an issue to
    slow down than SELECT or INSERT workloads, but I do also understand
    that it isn't completely unimportant, which is why having options isn't
    a bad idea here.  That said, the options presented for users should all
    be reasonable options, and for the default we should pick something
    sensible, erring on the "be safer" side, if anything.
    
    There's lots of options for speeding up base backups, with this patch,
    even if the default is to have a manifest with sha256 hashes- it could
    be changed to some form of CRC, or changed to not have checksums, or
    changed to not have a manifest.  Users will have options.
    
    Again, I'm not against having a checksum algorithm as an option.  I'm not
    saying that it must be SHA512 as the default.
    
    > > I'm not actually arguing about which hash functions we should support,
    > > but rather what the default is and if crc32c, specifically, is actually
    > > a reasonable choice.  Just because it's fast and we already had an
    > > implementation of it doesn't justify its use as the default.  Given that
    > > it doesn't actually provide the check that is generally expected of
    > > CRC checksums (100% detection of single-bit errors) when the file size
    > > gets over 512MB makes me wonder if we should have it at all, yes, but it
    > > definitely makes me think it shouldn't be our default.
    > 
    > I mean, the property that I care about is the one where it detects
    > better than 999,999,999 errors out of every 1,000,000,000, regardless
    > of input length.
    
    Throwing these kinds of things around I really don't think is useful.
    
    > > I don't agree with limiting our view to only those algorithms that we've
    > > already got implemented in PG.
    > 
    > I mean, opening that giant can of worms ~2 weeks before feature freeze
    > is not very nice. This patch has been around for months, and the
    > algorithms were openly discussed a long time ago. 
    
    Yes, they were discussed before, and these issues were brought up before
    and there was specifically concern brought up about exactly the same
    issues that I'm repeating here.  Those concerns seem to have been
    largely ignored, apparently because "we don't have that in PG today" as
    at least one of the considerations- even though we used to.  I don't
    think that was the right response and, yeah, I saw that you were
    planning to commit and that prompted me to look into it right now.  I
    don't think that's entirely uncommon around here.  I also had hoped that
    David's concerns that were raised before had been heeded, as I knew he
    was involved in the discussion previously, but that turns out to not
    have been the case.
    
    > > It's saying, removing the listing aspect, exactly that "backup_label is
    > > excluded from verification".  That's what I am taking issue with.  I've
    > > made multiple attempts to suggest other language to avoid saying that
    > > because it's clearly wrong- the manifest is verified.
    > 
    > Well, it's talking about the particular kind of verification that has
    > just been discussed, not any form of verification. As one idea,
    > perhaps instead of:
    > 
    > + Certain files and directories are
    > +   excluded from verification:
    > 
    > ...I could maybe insert a paragraph break there and then continue with
    > something like this:
    > 
    > When pg_basebackup compares the files and directories in the manifest
    > to those which are present on disk, it will ignore the presence of, or
    > changes to, certain files:
    > 
    > backup_manifest will not be present in the manifest itself, and is
    > therefore ignored. Note that the manifest is still verified
    > internally, as described above, but no error will be issued about the
    > presence of a backup_manifest file in the backup directory even though
    > it is not listed in the manifest.
    > 
    > Would that be more clear? Do you want to suggest something else?
    
    Yes, that looks fine.  Feels slightly redundant to include the "as
    described above ..." bit, and I think that could be dropped, but up to
    you.
    
    > > I'm not talking about making sure that no error ever happens when doing
    > > I'm saying that the existing tool that takes the backup has a *really*
    > > *important* verification check that this proposed "validate backup" tool
    > > doesn't have, and that isn't sensible.  It leads to situations where the
    > > backup tool itself, pg_basebackup, can fail or be killed before it's
    > > actually completed, and the "validate backup" tool would say that the
    > > backup is perfectly fine.  That is not sensible.
    > 
    > If someone's procedure for taking and restoring backups involves not
    > knowing whether or not pg_basebackup completed without error and then
    > trying to use the backup anyway, they are doing something which is
    > very foolish, and it's questionable whether any technological solution
    > has much hope of getting them out of trouble. But on the plus side,
    > this patch would have a good chance of detecting the problem, which is
    > a noticeable improvement over what we have now, which has no chance of
    > detecting the problem, because we have nothing.
    
    This doesn't address my concern at all.  Even if it seems ridiculous and
    foolish to think that a backup was successful when the system was
    rebooted and pg_basebackup was killed before all of the WAL had made it
    into pg_wal, there is absolutely zero doubt in my mind that it's going
    to happen and users are going to, entirely reasonably, think that
    pg_validatebackup at least includes all the checks that pg_basebackup
    does about making sure that the backup is valid.
    
    I really don't understand how we can have a backup validation tool that
    doesn't do the absolute basics, like making sure that we have all of the
    WAL for the backup.  I've routinely, almost jokingly, said to folks that
    any backup tool that doesn't check that isn't really a backup tool, and
    I was glad that pg_basebackup had that check, so, yeah, I'm going to
    continue to object to committing a backup validation tool that doesn't
    have that absolutely basic and necessary check.
    
    Thanks,
    
    Stephen
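
The "less than one in a billion" figure quoted in this message can be reproduced with a little arithmetic. The sketch below assumes an idealized model in which a corrupted file's checksum is uniformly distributed over all possible values, so an n-bit checksum misses a random corruption with probability 2^-n. That model is an assumption for illustration, not a proven property of CRC-32C, and the function name is hypothetical.

```c
/*
 * Probability that a random corruption leaves an n-bit checksum unchanged,
 * ASSUMING the checksum of corrupted data is uniformly distributed
 * (an idealization; real CRCs have structure this model ignores).
 */
double
random_match_probability(int checksum_bits)
{
	double		p = 1.0;
	int			i;

	/* Halve once per checksum bit, giving 2^-checksum_bits exactly. */
	for (i = 0; i < checksum_bits; i++)
		p /= 2.0;
	return p;
}
```

Under this model a 32-bit checksum gives about 2.3e-10, which is below 1e-9, while a 16-bit checksum gives 1 in 65536. Whether the uniform model fits the corruption patterns seen in real backups is exactly what the thread is disputing.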
    
  139. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-26T21:00:00Z

    Greetings,
    
    * Mark Dilger (mark.dilger@enterprisedb.com) wrote:
    > > On Mar 26, 2020, at 12:37 PM, Stephen Frost <sfrost@snowman.net> wrote:
    > > * Mark Dilger (mark.dilger@enterprisedb.com) wrote:
    > >>> On Mar 26, 2020, at 9:34 AM, Stephen Frost <sfrost@snowman.net> wrote:
    > >>> I'm not actually arguing about which hash functions we should support,
    > >>> but rather what the default is and if crc32c, specifically, is actually
    > >>> a reasonable choice.  Just because it's fast and we already had an
    > >>> implementation of it doesn't justify its use as the default.  Given that
    > >>> it doesn't actually provide the check that is generally expected of
    > >>> CRC checksums (100% detection of single-bit errors) when the file size
    > >>> gets over 512MB makes me wonder if we should have it at all, yes, but it
    > >>> definitely makes me think it shouldn't be our default.
    > >> 
    > >> I don't understand your focus on the single-bit error issue.  
    > > 
    > > Maybe I'm wrong, but my understanding was that detecting single-bit
    > > errors was one of the primary design goals of CRC and why people talk
    > > about CRCs of certain sizes having 'limits'- that's the size at which
    > > single-bit errors will no longer, necessarily, be picked up and
    > > therefore that's where the CRC of that size starts falling down on that
    > > goal.
    > 
    > I think I agree with all that.  I'm not sure it is relevant.  When people use CRCs to detect things *other than* transmission errors, they are in some sense using a hammer to drive a screw.  At that point, the analysis of how good the hammer is, and how big a nail it can drive, is no longer relevant.  The relevant discussion here is how appropriate a CRC is for our purpose.  I don't know the answer to that, but it doesn't seem the single-bit error analysis is the right analysis.
    
    I disagree that it's not relevant- it's, in fact, the one really clear
    thing we can get a pretty straight-forward answer on, and that seems
    really useful to me.
    
    > >> If you are sending your backup across the wire, single bit errors during transmission should already be detected as part of the networking protocol.  The real issue has to be detection of the kinds of errors or modifications that are most likely to happen in practice.  Which are those?  People manually mucking with the files?  Bugs in backup scripts?  Corruption on the storage device?  Truncated files?  The more bits in the checksum (assuming a well designed checksum algorithm), the more likely we are to detect accidental modification, so it is no surprise if a 64-bit crc does better than 32-bit crc.  But that logic can be taken arbitrarily far.  I don't see the connection between, on the one hand, an analysis of single-bit error detection against file size, and on the other hand, the verification of backups.
    > > 
    > > We'd like something that does a good job at detecting any differences
    > > between when the file was copied off of the server and when the command
    > > is run- potentially weeks or months later.  I would expect most issues
    > > to end up being storage-level corruption over time where the backup is
    > > stored, which could be single bit flips or whole pages getting zeroed or
    > > various other things.  Files changing size probably is one of the less
    > > common things, but, sure, that too.
    > > 
    > > That we could take this "arbitrarily far" is actually entirely fine-
    > > that's a good reason to have alternatives, which this patch does have,
    > > but that doesn't mean we should have a default that's not suitable for
    > > the files that we know we're going to be storing.
    > > 
    > > Consider that we could have used a 16-bit CRC instead, but does that
    > > actually make sense?  Ok, sure, maybe someone really wants something
    > > super fast- but should that be our default?  If not, then what criteria
    > > should we use for the default?
    > 
    > I'll answer this below....
    > 
    > >> From a support perspective, I think the much more important issue is making certain that checksums are turned on.  A one in a billion chance of missing an error seems pretty acceptable compared to the, let's say, one in two chance that your customer didn't use checksums.  Why are we even allowing this to be turned off?  Is there a usage case compelling that option?
    > > 
    > > The argument is that adding checksums takes more time.  I can understand
    > > that argument, though I don't really agree with it.  Certainly a few
    > > percent really shouldn't be that big of an issue, and in many cases even
    > > a sha256 hash isn't going to have that dramatic of an impact on the
    > > actual overall time.
    > 
    > I see two dangers here:
    > 
    > (1) The user enables checksums of some type, and due to checksums not being perfect, corruption happens but goes undetected, leaving her in a bad place.
    > 
    > (2) The user makes no checksum selection at all, gets checksums of the *default* type, determines it is too slow for her purposes, and instead of adjusting the checksum algorithm to something faster, simply turns checksums off; corruption happens and of course is undetected, leaving her in a bad place.
    
    Alright, I have tried to avoid referring back to pgbackrest, but I can't
    help it here.
    
    We have never, ever, had a user come to us and complain that pgbackrest
    is too slow because we're using a SHA hash.  We have also had them by
    default since absolutely day number one, and we even removed the option
    to disable them in 1.0.  We've never even been asked if we should
    implement some other hash or checksum which is faster.
    
    > I think the risk of (2) is far worse, which makes me tend towards a default that is fast enough not to encourage anybody to disable checksums altogether.  I have no opinion about which algorithm is best suited to that purpose, because I haven't benchmarked any.  I'm pretty much going off what Robert said, in terms of how big an impact using a heavier algorithm would be.  Perhaps you'd like to run benchmarks and make a concrete proposal for another algorithm, with numbers showing the runtime changes?  You mentioned up-thread that prior timings which showed a 40-50% slowdown were not including all the relevant stuff, so perhaps you could fix that in your benchmark and let us know what is included in the timings?
    
    I don't even know what the 40-50% slowdown numbers included.  Also, the
    general expectation in this community is that whoever is pushing a
    given patch forward should be providing the benchmarks to justify their
    choice.
    
    > I don't think we should be contemplating for v13 any checksum algorithms for the default except the ones already in the options list.  Doing that just derails the patch.  If you want highwayhash or similar to be the default, can't we hold off until v14 and think about changing the default?  Maybe I'm missing something, but I don't see any reason why it would be hard to change this after the first version has already been released.
    
    I'd rather we default to something that we are all confident and happy
    with, erring on the side of it being overkill rather than something
    that we know isn't really appropriate for the data volume.
    
    Thanks,
    
    Stephen
    
  140. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T04:30:09Z

    Hi,
    
    On 2020-03-26 11:37:48 -0400, Robert Haas wrote:
    > I mean, you're just repeating the same argument here, and it's just
    > not valid. Regardless of the file size, the chances of a false
    > checksum match are literally less than one in a billion. There is
    > every reason to believe that users will be happy with a low-overhead
    > method that has a 99.9999999+% chance of detecting corrupt files. I do
    > agree that a 64-bit CRC would probably be not much more expensive and
    > improve the probability of detecting errors even further
    
    I *seriously* doubt that it's true that 64bit CRCs wouldn't be
    slower. The only reason CRC32C is semi-fast is that we're accelerating
    it using hardware instructions (on x86-64 and ARM at least). Before that
    it was very regularly the bottleneck for processing WAL - and it still
    sometimes is. Most CRCs aren't actually very fast to compute, because
    they don't lend themselves to benefit from ILP or SIMD.  We spent a fair
    bit of time optimizing our crc implementation before the hardware
    support was widespread.
    
    
    > but I wanted to restrict this patch to using infrastructure we already
    > have. The choices there are the various SHA functions (so I supported
    > those), MD5 (which I deliberately omitted, for reasons I hope you'll
    > be the first to agree with), CRC-32C (which is fast), a couple of
    > other CRC-32 variants (which I omitted because they seemed redundant
    > and one of them only ever existed in PostgreSQL because of a coding
    > mistake), and the hacked-up version of FNV that we use for page-level
    > checksums (which is only 16 bits and seems to have no advantages for
    > this purpose).
    
    FWIW, FNV is only 16bit because we reduce its size to 16 bit. See the
    tail of pg_checksum_page.
    
    
    I'm not sure the error detection guarantees of various CRC algorithms
    are that relevant here, btw. IMO, for something like checksums in a
    backup, just having a single one-bit error isn't as common as having
    larger errors (e.g. entire blocks being zeroed). And to detect that
    32bit checksums aren't that good.
    
    
    > > As for folks who are that close to the edge on their backup timing that
    > > they can't have it slow down- chances are pretty darn good that they're
    > > not far from ending up needing to find a better solution than
    > > pg_basebackup anyway.  Or they don't need to generate a manifest (or, I
    > > suppose, they could have one but not have checksums..).
    > 
    > 40-50% is a lot more than "if you were on the edge."
    
    sha256 does approximately 400MB/s per core on modern intel CPUs. That's
    way below commonly accessible storage / network capabilities (and even
    if you're only doing 200MB/s, you're still going to spend roughly half
    of the CPU time just doing hashing).  It's unlikely that you're going to
    see much speedup for sha256 just by upgrading a CPU. While there are
    hardware instructions available, they don't result in all that large
    improvements. Of course, we could also start using the GPU (err, really
    no).
    
    Defaulting to that makes very little sense to me. You're not just going
    to spend that time while backing up, but also when validating backups
    (i.e. network limits suddenly aren't a relevant bottleneck anymore).
    
    
    > > I fail to see the usefulness of a tool that doesn't actually verify that
    > > the backup is able to be restored from.
    > >
    > > Even pg_basebackup (in both fetch and stream modes...) checks that we at
    > > least got all the WAL that's needed for the backup from the server
    > > before considering the backup to be valid and telling the user that
    > > there was a successful backup.  With what you're proposing here, we
    > > could have someone do a pg_basebackup, get back an ERROR saying the
    > > backup wasn't valid, and then run pg_validatebackup and be told that the
    > > backup is valid.  I don't get how that's sensible.
    > 
    > I'm sorry that you can't see how that's sensible, but it doesn't mean
    > that it isn't sensible. It is totally unrealistic to expect that any
    > backup verification tool can verify that you won't get an error when
    > trying to use the backup. That would require that the
    > validation tool try to do everything that PostgreSQL will try to do
    > when the backup is used, including running recovery and updating the
    > data files. Anything less than that creates a real possibility that
    > the backup will verify good but fail when used. This tool has a much
    > narrower purpose, which is to try to verify that we (still) have the
    > files the server sent as part of the backup and that, to the best of
    > our ability to detect such things, they have not been modified. As you
    > know, or should know, the WAL files are not sent as part of the
    > backup, and so are not verified. Other things that would also be
    > useful to check are also not verified. It would be fantastic to have
    > more verification tools in the future, but it is difficult to see why
    > anyone would bother trying if an attempt to get the first one
    > committed gets blocked because it does not yet do everything. Very few
    > patches try to do everything, and those that do usually get blocked
    > because, by trying to do too much, they get some of it badly wrong.
    
    It sounds to me that if there are to be manifests for the WAL, it should
    be a separate (set of) manifests. Trying to somehow tie together the
    manifest for the base backup, and the one for the WAL, makes little
    sense to me. They're commonly not computed in one place, often not even
    stored in the same place. For PITR relevant WAL doesn't even exist yet
    at the time the manifest is created (and thus obviously cannot be
    included in the base backup manifest). And fairly obviously one would
    want to be able to verify the correctness of WAL between two
    basebackups.
    
    I don't see much point in complicating the design to somehow capture WAL
    in the manifest, when it's only going to solve a small set of cases.
    
    Seems better to (later?) add support for generating manifests for WAL
    files, and then have a tool that can verify all the manifests required
    to restore a base backup.
    
    Greetings,
    
    Andres Freund
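
As a rough illustration of the remark about the tail of pg_checksum_page, the sketch below shows how a 32-bit intermediate value can be folded into a 16-bit result that is never zero. It is written from memory of the PostgreSQL source and should be treated as an assumption to check against checksum_impl.h. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Sketch of the final reduction step: mix in the block number, then fold
 * the 32-bit value into the range 1..65535 so the result fits in 16 bits
 * and is never zero.
 */
uint16_t
reduce_checksum_to_16(uint32_t checksum, uint32_t blkno)
{
	/* Mixing in the block number helps detect transposed pages. */
	checksum ^= blkno;

	return (uint16_t) ((checksum % 65535) + 1);
}
```

The point for this thread is that the 16-bit width comes from this last step and not from the hash itself, so a wider result would be available by skipping the reduction.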
    
    
    
    
  141. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T05:06:44Z

    Hi,
    
    On 2020-03-26 14:02:29 -0400, Robert Haas wrote:
    > On Thu, Mar 26, 2020 at 12:34 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > Why was crc32c
    > > picked for that purpose?
    > 
    > Because it was discovered that 64-bit CRC was too slow, per commit
    > 21fda22ec46deb7734f793ef4d7fa6c226b4c78e.
    
    Well, a 32bit crc, not crc32c. IIRC it was the ethernet polynomial (+
    bug). We switched to crc32c at some point because there are hardware
    implementations:
    
    commit 5028f22f6eb0579890689655285a4778b4ffc460
    Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
    Date:   2014-11-04 11:35:15 +0200
    
        Switch to CRC-32C in WAL and other places.
    
    
    > Like, suppose we change the default from CRC-32C to SHA-something. On
    > the upside, the error detection rate will increase from 99.9999999+%
    > to something much closer to 100%.
    
    FWIW, I don't buy the relevancy of 99.9999999+% at all. That's assuming
    a single bit error (at relevant lengths, before that it's single burst
    errors of a greater length), which isn't that relevant for our purposes.
    
    That's not to say that I don't think a CRC check can provide value. It
    does provide a high likelihood of detecting enough errors, including
    coding errors in how data is restored (not unimportant), that you're
    likely to find out about a problem soon.
    
    
    > On the downside,
    > backups will get as much as 40-50% slower for some users. I hope we
    > can agree that both detecting errors and taking backups quickly are
    > important. However, it is hard for me to imagine that the typical user
    > would want to pay even a 5-10% performance penalty when taking a
    > backup in order to improve an error detection feature which they may
    > not even use and which already has less than a one-in-a-billion chance
    > of going wrong.
    
    FWIW, that seems far too large a slowdown to default to for me. Most
    people aren't going to be able to figure out that it's the checksum
    parameter that causes this slowdown, they're just going to feel the pain
    of the backup being much slower than their hardware.
    
    A few hundred megabytes of streaming reads/writes really doesn't take a
    beefy server these days. Medium sized VMs + a bit larger network block
    devices at all the common cloud providers have considerably higher
    bandwidth. Even a raid5x of 4 spinning disks can deliver > 500MB/s.
    
    And plenty of even the smaller instances at many providers have >
    5gbit/s network. At the upper end it's way more than that.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  142. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T05:31:21Z

    Hi,
    
    On 2020-03-26 15:37:11 -0400, Stephen Frost wrote:
    > The argument is that adding checksums takes more time.  I can understand
    > that argument, though I don't really agree with it.  Certainly a few
    > percent really shouldn't be that big of an issue, and in many cases even
    > a sha256 hash isn't going to have that dramatic of an impact on the
    > actual overall time.
    
    I don't understand how you can come to that conclusion?  It doesn't take
    very long to measure openssl's sha256 performance (which is pretty well
    optimized). Note that we do use openssl's sha256, when compiled with
    openssl support.
    
    On my workstation, with a pretty new (but not fastest single core perf
    model) intel Xeon Gold 5215, I get:
    
    $ openssl speed sha256
    ...
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    sha256           76711.75k   172036.78k   321566.89k   399008.09k   431423.49k   433689.94k
    
    IOW, ~430MB/s.
    
    
    On my laptop, with pretty fast cores:
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    sha256           97054.91k   217188.63k   394864.13k   493441.02k   532100.44k   533441.19k
    
    IOW, 530MB/s
    
    
    530 MB/s is well within the realm of medium sized VMs.
    
    And, as mentioned before, even if you do only half of that, you're still
    going to be spending roughly half of the CPU time of sending a base
    backup.
    
    What makes you think that a few hundred MB/s is out of reach for a large
    fraction of PG installations that actually keep backups?
    
    Greetings,
    
    Andres Freund
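
The throughput argument in this message can be put as two small calculations. The sketch below assumes a simple model: `openssl speed` reports figures in thousands of bytes per second, and hashing a stream at a given rate occupies a share of one core equal to the stream rate divided by the hash rate. Both function names are hypothetical, and the model ignores every other cost of sending a backup.

```c
/* Convert an `openssl speed` figure (thousands of bytes/s) to MB/s. */
double
openssl_k_to_mb_per_s(double k_per_s)
{
	return k_per_s / 1000.0;
}

/*
 * Fraction of one core spent hashing when data streams at stream_mb_s and
 * the hash runs at hash_mb_s on that core.  A value >= 1.0 means the hash
 * is the bottleneck.
 */
double
hashing_core_fraction(double stream_mb_s, double hash_mb_s)
{
	return stream_mb_s / hash_mb_s;
}
```

With the workstation figure of 433689.94k this gives roughly 434 MB/s, and a 200 MB/s stream against a 400 MB/s hash gives 0.5, which matches the "roughly half" estimate given earlier in the thread.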
    
    
    
    
  143. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T06:29:27Z

    Hi,
    
    On 2020-03-23 12:15:54 -0400, Robert Haas wrote:
    > +       <varlistentry>
    > +        <term><literal>MANIFEST</literal></term>
    > +        <listitem>
    > +         <para>
    > +          When this option is specified with a value of <literal>ye'</literal>
    
    s/ye'/yes/
    
    > +          or <literal>force-escape</literal>, a backup manifest is created
    > +          and sent along with the backup. The latter value forces all filenames
    > +          to be hex-encoded; otherwise, this type of encoding is performed only
    > +          for files whose names are non-UTF8 octet sequences.
    > +          <literal>force-escape</literal> is intended primarily for testing
    > +          purposes, to be sure that clients which read the backup manifest
    > +          can handle this case. For compatibility with previous releases,
    > +          the default is <literal>MANIFEST 'no'</literal>.
    > +         </para>
    > +        </listitem>
    > +       </varlistentry>
    
    Are you planning to include a specification of the manifest file format
    anywhere? I looked through the patches and didn't find anything.
    
    I think it'd also be good to include more information about what the
    point of manifest files actually is.
    
    
    > +  <para>
    > +   <application>pg_validatebackup</application> reads the manifest file of a
    > +   backup, verifies the manifest against its own internal checksum, and then
    > +   verifies that the same files are present in the target directory as in the
    > +   manifest itself. It then verifies that each file has the expected checksum,
    > +   unless the backup was taken with the checksum algorithm set to
    > +   <literal>none</literal>, in which case checksum verification is not
    > +   performed. The presence or absence of directories is not checked, except
    > +   indirectly: if a directory is missing, any files it should have contained
    > +   will necessarily also be missing. Certain files and directories are
    > +   excluded from verification:
    > +  </para>
    
    Depending on what you want to use the manifest for, we'd also need to
    check that there are no additional files. That seems to actually be
    implemented, which imo should be mentioned here.
    
    
    
    
    > +/*
    > + * Finalize the backup manifest, and send it to the client.
    > + */
    > +static void
    > +SendBackupManifest(manifest_info *manifest)
    > +{
    > +	StringInfoData protobuf;
    > +	uint8		checksumbuf[PG_SHA256_DIGEST_LENGTH];
    > +	char		checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
    > +	size_t		manifest_bytes_done = 0;
    > +
    > +	/*
    > +	 * If there is no buffile, then the user doesn't want a manifest, so
    > +	 * don't waste any time generating one.
    > +	 */
    > +	if (manifest->buffile == NULL)
    > +		return;
    > +
    > +	/* Terminate the list of files. */
    > +	AppendStringToManifest(manifest, "],\n");
    > +
    > +	/*
    > +	 * Append manifest checksum, so that the problems with the manifest itself
    > +	 * can be detected.
    > +	 *
    > +	 * We always use SHA-256 for this, regardless of what algorithm is chosen
    > +	 * for checksumming the files.  If we ever want to make the checksum
    > +	 * algorithm used for the manifest file variable, the client will need a
    > +	 * way to figure out which algorithm to use as close to the beginning of
    > +	 * the manifest file as possible, to avoid having to read the whole thing
    > +	 * twice.
    > +	 */
    > +	manifest->still_checksumming = false;
    > +	pg_sha256_final(&manifest->manifest_ctx, checksumbuf);
    > +	AppendStringToManifest(manifest, "\"Manifest-Checksum\": \"");
    > +	hex_encode((char *) checksumbuf, sizeof checksumbuf, checksumstringbuf);
    > +	checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH - 1] = '\0';
    > +	AppendStringToManifest(manifest, checksumstringbuf);
    > +	AppendStringToManifest(manifest, "\"}\n");
    
    Hm. Is it a great choice to include the checksum for the manifest inside
    the manifest itself? With a cryptographic checksum it seems like it
    could make a ton of sense to store the checksum somewhere "safe", but
    keep the manifest itself alongside the base backup itself. While not
    huge, they won't be tiny either.
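
For the hex-encoding step in the quoted SendBackupManifest code, a minimal standalone sketch may help. It assumes the usual layout in which each digest byte becomes two lowercase hex digits and the string buffer holds 2 * len + 1 bytes (so 65 bytes for a 32-byte SHA-256 digest). The helper name is hypothetical and is not PostgreSQL's hex_encode.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write 2 * len lowercase hex digits followed by a
 * terminating NUL into dst.  dst must have room for 2 * len + 1 bytes.
 */
void
hex_encode_sketch(const uint8_t *src, size_t len, char *dst)
{
	static const char digits[] = "0123456789abcdef";
	size_t		i;

	for (i = 0; i < len; i++)
	{
		dst[2 * i] = digits[src[i] >> 4];		/* high nibble */
		dst[2 * i + 1] = digits[src[i] & 0x0F]; /* low nibble */
	}
	dst[2 * len] = '\0';
}
```

Unlike the quoted code, this sketch writes the terminator itself, so the caller does not need a separate assignment to the last buffer byte.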
    
    
    
    > diff --git a/src/bin/pg_validatebackup/parse_manifest.c b/src/bin/pg_validatebackup/parse_manifest.c
    > new file mode 100644
    > index 0000000000..e6b42adfda
    > --- /dev/null
    > +++ b/src/bin/pg_validatebackup/parse_manifest.c
    > @@ -0,0 +1,576 @@
    > +/*-------------------------------------------------------------------------
    > + *
    > + * parse_manifest.c
    > + *	  Parse a backup manifest in JSON format.
    > + *
    > + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
    > + * Portions Copyright (c) 1994, Regents of the University of California
    > + *
    > + * src/bin/pg_validatebackup/parse_manifest.c
    > + *
    > + *-------------------------------------------------------------------------
    > + */
    
    Doesn't have to be in the first version, but could it be useful to move
    this to common/ or such?
    
    
    
    > +/*
    > + * Validate one directory.
    > + *
    > + * 'relpath' is NULL if we are to validate the top-level backup directory,
    > + * and otherwise the relative path to the directory that is to be validated.
    > + *
    > + * 'fullpath' is the backup directory with 'relpath' appended; i.e. the actual
    > + * filesystem path at which it can be found.
    > + */
    > +static void
    > +validate_backup_directory(validator_context *context, char *relpath,
    > +						  char *fullpath)
    > +{
    
    Hm. Should this warn if the directory's permissions are set too openly
    (world writable?)?
    
    
    > +/*
    > + * Validate the checksum of a single file.
    > + */
    > +static void
    > +validate_file_checksum(validator_context *context, manifestfile *tabent,
    > +					   char *fullpath)
    > +{
    > +	pg_checksum_context checksum_ctx;
    > +	char	   *relpath = tabent->pathname;
    > +	int			fd;
    > +	int			rc;
    > +	uint8		buffer[READ_CHUNK_SIZE];
    > +	uint8		checksumbuf[PG_CHECKSUM_MAX_LENGTH];
    > +	int			checksumlen;
    > +
    > +	/* Open the target file. */
    > +	if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
    > +	{
    > +		report_backup_error(context, "could not open file \"%s\": %m",
    > +						   relpath);
    > +		return;
    > +	}
    > +
    > +	/* Initialize checksum context. */
    > +	pg_checksum_init(&checksum_ctx, tabent->checksum_type);
    > +
    > +	/* Read the file chunk by chunk, updating the checksum as we go. */
    > +	while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
    > +		pg_checksum_update(&checksum_ctx, buffer, rc);
    > +	if (rc < 0)
    > +		report_backup_error(context, "could not read file \"%s\": %m",
    > +						   relpath);
    > +
    
    Hm. I think it'd be good to verify that the checksummed size is the same
    as the size of the file in the manifest.
    
    
    
    Greetings,
    
    Andres Freund
    
    
    
    
  144. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-27T15:26:56Z

    Greetings,
    
    * Andres Freund (andres@anarazel.de) wrote:
    > On 2020-03-26 11:37:48 -0400, Robert Haas wrote:
    > > I'm sorry that you can't see how that's sensible, but it doesn't mean
    > > that it isn't sensible. It is totally unrealistic to expect that any
    > > backup verification tool can verify that you won't get an error when
    > > trying to use the backup. That would require that the
    > > validation tool try to do everything that PostgreSQL will try to do
    > > when the backup is used, including running recovery and updating the
    > > data files. Anything less than that creates a real possibility that
    > > the backup will verify good but fail when used. This tool has a much
    > > narrower purpose, which is to try to verify that we (still) have the
    > > files the server sent as part of the backup and that, to the best of
    > > our ability to detect such things, they have not been modified. As you
    > > know, or should know, the WAL files are not sent as part of the
    > > backup, and so are not verified. Other things that would also be
    > > useful to check are also not verified. It would be fantastic to have
    > > more verification tools in the future, but it is difficult to see why
    > > anyone would bother trying if an attempt to get the first one
    > > committed gets blocked because it does not yet do everything. Very few
    > > patches try to do everything, and those that do usually get blocked
    > > because, by trying to do too much, they get some of it badly wrong.
    > 
    > It sounds to me that if there are to be manifests for the WAL, it should
    > be a separate (set of) manifests. Trying to somehow tie together the
    > manifest for the base backup, and the one for the WAL, makes little
    > sense to me. They're commonly not computed in one place, often not even
    > stored in the same place. For PITR relevant WAL doesn't even exist yet
    > at the time the manifest is created (and thus obviously cannot be
    > included in the base backup manifest). And fairly obviously one would
    > want to be able to verify the correctness of WAL between two
    > basebackups.
    
    We aren't talking about generic PITR or about tools other than
    pg_basebackup, which has specific options for grabbing the WAL, and
    making sure that it is all there for the backup that was taken.
    
    > I don't see much point in complicating the design to somehow capture WAL
    > in the manifest, when it's only going to solve a small set of cases.
    
    As it relates to this, I tend to think that it solves the exact case
    that pg_basebackup is built for and used for.  I said up-thread that if
    someone does decide to use -X none then we could just throw a warning
    (and perhaps have a way to override that if there's desire for it).
    
    > Seems better to (later?) add support for generating manifests for WAL
    > files, and then have a tool that can verify all the manifests required
    > to restore a base backup.
    
    I'm not trying to expand on the feature set here or move the goalposts
    way down the road, which is what seems to be what's being suggested
    here.  To be clear, I don't have any objection to adding a generic tool
    for validating WAL as you're talking about here, but I also don't think
    that's required for pg_validatebackup.  What I do think we need is a
    check of the WAL that's fetched when people use pg_basebackup -Xstream
    or -Xfetch.  pg_basebackup itself has that check because it's critical
    to the backup being successful and valid.  Not having that basic
    validation of a backup really just isn't ok- there's a reason
    pg_basebackup has that check.
    
    Thanks,
    
    Stephen
    
  145. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-27T17:53:54Z

    On Thu, Mar 26, 2020 at 4:37 PM David Steele <david@pgmasters.net> wrote:
    > I know you and Stephen have agreed on a number of doc changes, would it
    > be possible to get a new patch with those included? I finally have time
    > to do a review of this tomorrow.  I saw some mistakes in the docs in the
    > current patch but I know those patches are not current.
    
    Hi David,
    
    Here's a new version with some fixes:
    
    - Fixes for doc typos noted by Stephen Frost and Andres Freund.
    - Replace a doc paragraph about the advantages and disadvantages of
    CRC-32C with one by Stephen Frost, with a slight change by me that I
    thought made it sound more grammatical.
    - Change the pg_validatebackup documentation so that it makes no
    mention of compatible tools, per Stephen.
    - Reword the discussion of the exclude list in the pg_validatebackup
    documentation, per discussion between Stephen and myself.
    - Try to make the documentation more clear about the fact that we
    check for both extra and missing files.
    - Incorporate a fix from Amit Kapila to make 003_corruption.pl pass on Windows.
    
    HTH,
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  146. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-27T18:13:17Z

    On Fri, Mar 27, 2020 at 1:06 AM Andres Freund <andres@anarazel.de> wrote:
    > > Like, suppose we change the default from CRC-32C to SHA-something. On
    > > the upside, the error detection rate will increase from 99.9999999+%
    > > to something much closer to 100%.
    >
    > FWIW, I don't buy the relevancy of 99.9999999+% at all. That's assuming
    > a single bit error (at relevant lengths, before that it's single burst
    > errors of a greater length), which isn't that relevant for our purposes.
    >
    > That's not to say that I don't think a CRC check can provide value. It
    > does provide a high likelihood of detecting enough errors, including
    > coding errors in how data is restored (not unimportant), that you're
    > likely not find out about a problem soon.
    
    So, I'm glad that you think a CRC check gives a sufficiently good
    chance of detecting errors, but I don't understand what your objection
    to the percentage is.  Stephen just objected to it again, too:
    
    On Thu, Mar 26, 2020 at 4:44 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > I mean, the property that I care about is the one where it detects
    > > better than 999,999,999 errors out of every 1,000,000,000, regardless
    > > of input length.
    >
    > Throwing these kinds of things around I really don't think is useful.
    
    ...but I don't understand his reasoning, or yours.
    
    My reasoning for thinking that the number is accurate is that a 32-bit
    checksum has 2^32 possible results. If all of those results are
    equally probable, then the probability that two files with unequal
    contents produce the same result is 2^-32. This does assume that the
    hash function is perfect, which no hash function is, so the actual
    probability of a collision is likely higher. But if the hash function
    is pretty good, it shouldn't be all that much higher. Note that I am
    making no assumptions here about how many bits are different, nor am I
    making any assumption about the length of a file. I am simply saying
    that an n-bit checksum should detect a difference between two files
    with a probability of roughly 1-2^{-n}, modulo the imperfections of
    the hash function. I thought that this was a well-accepted fact that
    would produce little argument from anybody, and I'm confused that
    people seem to feel otherwise.
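
    The arithmetic behind that claim can be written out as a small sketch.
    It assumes the idealization stated above (every checksum value equally
    likely); the function names are illustrative only.

```python
from fractions import Fraction


def undetected_probability(n_bits: int) -> Fraction:
    """Chance that two different inputs share an n-bit checksum,
    assuming all 2**n_bits checksum values are equally likely."""
    return Fraction(1, 2 ** n_bits)


def detection_rate(n_bits: int) -> Fraction:
    """Complement of the collision chance: 1 - 2**-n_bits."""
    return 1 - undetected_probability(n_bits)


if __name__ == "__main__":
    for bits in (32, 64, 256):
        print(bits, float(undetected_probability(bits)))
```

    For n = 32 the collision chance is 1 in 4,294,967,296, which is where
    "better than 999,999,999 out of every 1,000,000,000" comes from. A real
    hash function is not perfectly uniform, so this is an estimate of the
    detection rate, not a guarantee.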
    
    One explanation that would make sense to me is if somebody said, well,
    the nature of this particular algorithm means that, although values
    are uniformly distributed in general, the kinds of errors that are
    likely to occur in practice are likely to cancel out. For instance, if
    you imagine trivial algorithms such as adding or xor-ing all the
    bytes, adding zero bytes doesn't change the answer, and neither do
    transpositions. However, CRC is, AIUI, designed to be resistant to
    such problems. Your remark about large blocks of zero bytes is
    interesting to me in this context, but in a quick search I couldn't
    find anything stating that CRC was weak for such use cases.
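
    A small sketch of the contrast described above: an xor of all bytes and
    a plain sum of all bytes both ignore transpositions and appended zero
    bytes, while a CRC notices the transposition. The crc32c function is a
    slow bit-at-a-time version using the reflected Castagnoli polynomial
    0x82F63B78. It is written for illustration and is not PostgreSQL's
    implementation.

```python
def xor_sum(data: bytes) -> int:
    """Trivial checksum: xor of all bytes."""
    result = 0
    for byte in data:
        result ^= byte
    return result


def add_sum(data: bytes) -> int:
    """Trivial checksum: sum of all bytes, truncated to 32 bits."""
    return sum(data) & 0xFFFFFFFF


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected form."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; xor in the polynomial if a 1 fell out.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    for name, func in (("xor", xor_sum), ("add", add_sum), ("crc32c", crc32c)):
        print(name, hex(func(b"ab")), hex(func(b"ba")), hex(func(b"ab\x00")))
```

    Swapping two adjacent bytes changes only a 16-bit span, and a 32-bit
    CRC detects every error burst that short, so the transposition case is
    always caught. The sketch says nothing either way about long runs of
    zero bytes.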
    
    The old thread about switching from 64-bit CRC to 32-bit CRC had a
    link to a page which has subsequently been moved to here:
    
    https://www.ece.unb.ca/tervo/ee4253/crc.shtml
    
    Down towards the bottom, it says:
    
    "In general, bit errors and bursts up to N-bits long will be detected
    for a P(x) of degree N. For arbitrary bit errors longer than N-bits,
    the odds are one in 2^{N} than a totally false bit pattern will
    nonetheless lead to a zero remainder."
    
    Which I think is the same thing I'm saying: the chances of failing to
    detect an error with a decent n-bit checksum ought to be about
    2^{-N}. If that's not right, I'd really like to understand why.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  147. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-27T18:34:19Z

    On Thu, Mar 26, 2020 at 4:37 PM David Steele <david@pgmasters.net> wrote:
    > I agree with Stephen that this should be done, but I agree with you that
    > it can wait for a future commit. However, I do think:
    >
    > 1) It should be called out rather plainly in the documentation.
    > 2) If there are files in pg_wal then pg_validatebackup should inform the
    > user that those files have not been validated.
    
    I agree with you about #1, and I suspect that there's a way to improve
    what I've got here now, but I think I might be too close to this to
    figure out what the best way would be, so suggestions welcome.
    
    I think #2 is an interesting idea and could possibly reduce the danger
    of user confusion on this point considerably - because, let's face it,
    not everyone is going to read the documentation. However, I'm having a
    hard time figuring out exactly what we'd print. Right now on success,
    unless you specify -q, you get:
    
    [rhaas ~]$ pg_validatebackup  ~/pgslave
    backup successfully verified
    
    But it feels strange and possibly confusing to me to print something like:
    
    [rhaas ~]$ pg_validatebackup  ~/pgslave
    backup successfully verified (except for pg_wal)
    
    ...because there are a few other exceptions too, and also because it
    might make the user think that we normally check that but for some
    reason decided to skip it in this case. Maybe something more verbose
    like:
    
    [rhaas ~]$ pg_validatebackup  ~/pgslave
    backup files successfully verified
    your backup contains a pg_wal directory, but this tool can't validate
    that, so do it yourself
    
    ...but that seems a little obnoxious and a little silly to print out every time.
    
    Ideas?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  148. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-27T18:53:10Z

    On Thu, Mar 26, 2020 at 4:44 PM Stephen Frost <sfrost@snowman.net> wrote:
    > Is it actually possible, today, in PG, to have a 4GB WAL record?
    > Judging this based on the WAL record size doesn't seem quite right.
    
    I'm not sure. I mean, most records are quite small, but I think if you
    set REPLICA IDENTITY FULL on a table with a bunch of very wide columns
    (and also wal_level=logical) it can get really big. I haven't tested
    to figure out just how big it can get. (If I have a table with lots of
    almost-1GB-blobs in it, does it work without logical replication and
    fail with logical replication? I don't know, but I doubt a WAL
    record >4GB is possible, because it seems unlikely that the code has a way to
    cope with that struct field overflowing.)
    
    > Again, I'm not against having a checksum algorithm as a option.  I'm not
    > saying that it must be SHA512 as the default.
    
    I think that what we have seen so far is that all of the SHA-n
    algorithms that PostgreSQL supports are about equally slow, so it
    doesn't really matter which one you pick there from a performance
    point of view. If you're not saying it has to be SHA-512 but you do
    want it to be SHA-256, I don't think that really fixes anything. Using
    CRC-32C does fix the performance issue, but I don't think you like
    that, either. We could default to having no checksums at all, or even
    no manifest at all, but I didn't get the impression that David, at
    least, wanted to go that way, and I don't like it either. It's not the
    world's best feature, but I think it's good enough to justify enabling
    it by default. So I'm not sure we have any options here that will
    satisfy you.
    
    > > > I don't agree with limiting our view to only those algorithms that we've
    > > > already got implemented in PG.
    > >
    > > I mean, opening that giant can of worms ~2 weeks before feature freeze
    > > is not very nice. This patch has been around for months, and the
    > > algorithms were openly discussed a long time ago.
    >
    > Yes, they were discussed before, and these issues were brought up before
    > and there was specifically concern brought up about exactly the same
    > issues that I'm repeating here. Those concerns seem to have been
    > largely ignored, apparently because "we don't have that in PG today" as
    > at least one of the considerations- even though we used to.
    
    I might have missed something, but I don't remember any suggestion of
    CRC-64 or other algorithms for which PG does not currently have
    support prior to this week. The only thing I remember having been
    suggested previously was SHA, and I responded to that by adding
    support for SHA, not by ignoring the suggestion. If there was another
    suggestion made earlier, I must have missed it.
    
    > I also had hoped that
    > David's concerns that were raised before had been heeded, as I knew he
    > was involved in the discussion previously, but that turns out to not
    > have been the case.
    
    Well, I mean, I am trying pretty hard here, but I realize that I'm not
    succeeding. I don't know which specific suggestion you're talking
    about here. I understand that there is a concern about a 32-bit CRC
    somehow not being valid for more than 512MB, but based on my research,
    I believe that to be incorrect. I've explained the reasons why I
    believe it to be incorrect several times now, but I feel like we're
    just going around in circles. If my explanation of why it's incorrect
    is itself incorrect, tell me why, but let's not just keep saying the
    things we've both already said.
    
    > Yes, that looks fine.  Feels slightly redundant to include the "as
    > described above ..." bit, and I think that could be dropped, but up to
    > you.
    
    Done in the version I posted a bit ago.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  149. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-27T19:20:27Z

    On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
    > s/ye'/yes/
    
    Ugh, sorry. Fixed in the version posted earlier.
    
    > Are you planning to include a specification of the manifest file format
    > anywhere? I looked through the patches and didn't find anything.
    
    I thought about that. I think it would be good to have. I was sort of
    hoping to leave it for a follow-on patch, but maybe that's cheating
    too much.
    
    > I think it'd also be good to include more information about what the
    > point of manifest files actually is.
    
    What kind of information do you want to see included there? Basically,
    the way the documentation is written right now, it essentially says,
    well, we have this manifest thing so that you can later run
    pg_validatebackup, and pg_validatebackup says that it's there to check
    the integrity of backups using the manifest. This is all a bit
    circular, though, and maybe needs elaboration.
    
    What I've experienced is that:
    
    - Sometimes people take a backup and then wonder later whether the
    disk has flipped some bits.
    - Sometimes people restore a backup and forget some of the parts, like
    the user-defined tablespaces.
    - Sometimes anti-virus software, or a poorly-run cron job run amok,
    wanders around inflicting unpredictable damage.
    
    It would be nice to have a system that would notice these kinds of
    things on a running system, but here I've got the more modest goal of
    checking for them in the context of a backup. If the data gets corrupted in
    transit, or if the disk mutilates it, or if the user mutilates it, you
    need something to check the backup against to find out that bad things
    have happened; the manifest is that thing. But I don't know exactly how
    much of all that should go in the docs, or in what way.
    
    > > +  <para>
    > > +   <application>pg_validatebackup</application> reads the manifest file of a
    > > +   backup, verifies the manifest against its own internal checksum, and then
    > > +   verifies that the same files are present in the target directory as in the
    > > +   manifest itself. It then verifies that each file has the expected checksum,
    >
    > Depending on what you want to use the manifest for, we'd also need to
    > check that there are no additional files. That seems to actually be
    > implemented, which imo should be mentioned here.
    
    I intended the text to say that, because it says that it checks that
    the two things are "the same," which is symmetric.  In the new version
    I posted a bit ago, I tried to make it more explicit, because
    apparently it was not sufficiently clear.
    
    > Hm. Is it a great choice to include the checksum for the manifest inside
    > the manifest itself? With a cryptographic checksum it seems like it
    > could make a ton of sense to store the checksum somewhere "safe", but
    > keep the manifest itself alongside the base backup itself. While not
    > huge, they won't be tiny either.
    
    Seems like the user could just copy the manifest checksum and store it
    somewhere, if they wish. Then they can check it against the manifest
    itself later, if they wish. Or they can take a SHA-512 of the whole
    file and store that securely. The problem is that we have no idea how
    to write that checksum to more secure storage. We could write
    backup_manifest and backup_manifest.checksum into separate files, but
    that seems like it's adding complexity without any real benefit.
    
    To me, the security-related uses of this patch seem to be fairly
    niche. I think it's nice that they exist, but I don't think that's the
    main selling point. For me, the main selling point is that you can
    check that your disk didn't eat your data and that nobody nuked any
    files that were supposed to be there.
    
    > Doesn't have to be in the first version, but could it be useful to move
    > this to common/ or such?
    
    Yeah. At one point, this code was written in a way that was totally
    specific to pg_validatebackup, but I then realized that it would be
    better to make it more general, so I refactored it into the form
    you see now, where pg_validatebackup.c depends on parse_manifest.c but
    not the reverse. I suspect that if someone wants to use this for
    something else they might need to change a few more things - not sure
    exactly what - but I don't think it would be too hard. I thought it
    would be best to leave that task until someone has a concrete use case
    in mind, but I did want it to be relatively easy to do that down
    the road, and I hope that the way I've organized the code achieves
    that.
    
    > > +static void
    > > +validate_backup_directory(validator_context *context, char *relpath,
    > > +                                               char *fullpath)
    > > +{
    >
    > Hm. Should this warn if the directory's permissions are set too openly
    > (world writable?)?
    
    I don't think so, but it's pretty clear that different people have
    different ideas about what the scope of this tool ought to be, even in
    this first version.
    
    > Hm. I think it'd be good to verify that the checksummed size is the same
    > as the size of the file in the manifest.
    
    That's checked in an earlier phase. Are you worried about the file
    being modified after the first pass checks the size and before we come
    through to do the checksumming?
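
    One way to close that window, sketched here in Python and not taken
    from the patch, is to count bytes in the same loop that feeds the
    checksum, so the size and the checksum come from a single read pass.
    SHA-256 stands in for whatever algorithm the manifest names, and the
    function names and chunk size are assumptions for illustration.

```python
import hashlib
import io
from typing import BinaryIO, Tuple

READ_CHUNK_SIZE = 128 * 1024  # arbitrary; not the value used by the patch


def checksum_and_size(stream: BinaryIO) -> Tuple[str, int]:
    """Read the stream chunk by chunk, tracking digest and byte count."""
    ctx = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(READ_CHUNK_SIZE)
        if not chunk:
            break
        ctx.update(chunk)
        total += len(chunk)
    return ctx.hexdigest(), total


def file_matches(stream: BinaryIO, expected_size: int, expected_hex: str) -> bool:
    """True only if both the bytes read and the digest match the manifest."""
    digest, size = checksum_and_size(stream)
    return size == expected_size and digest == expected_hex


if __name__ == "__main__":
    data = b"x" * 1000
    print(file_matches(io.BytesIO(data), 1000, hashlib.sha256(data).hexdigest()))
```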
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  150. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-27T19:29:02Z

    On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
    > > Seems better to (later?) add support for generating manifests for WAL
    > > files, and then have a tool that can verify all the manifests required
    > > to restore a base backup.
    >
    > I'm not trying to expand on the feature set here or move the goalposts
    > way down the road, which is what seems to be what's being suggested
    > here.  To be clear, I don't have any objection to adding a generic tool
    > for validating WAL as you're talking about here, but I also don't think
    > that's required for pg_validatebackup.  What I do think we need is a
    > check of the WAL that's fetched when people use pg_basebackup -Xstream
    > or -Xfetch.  pg_basebackup itself has that check because it's critical
    > to the backup being successful and valid.  Not having that basic
    > validation of a backup really just isn't ok- there's a reason
    > pg_basebackup has that check.
    
    I don't understand how this could be done without significantly
    complicating the architecture. As I said before, -Xstream sends WAL
    over a separate connection that is unrelated to the one running
    BASE_BACKUP, so the base-backup connection doesn't know what to
    include in the manifest. Now you could do something like: once all of
    the WAL files have been fetched, the client checksums all of those and
    sends their names and checksums to the server, which turns around and
    puts them into the manifest, which it then sends back to the client.
    But that is actually quite a bit of additional complexity, and it's
    pretty strange, too, because now you have the client checksumming some
    files and the server checksumming others. I know you mentioned a few
    different ideas before, but I think they all kinda have some problem
    along these lines.
    
    I also kinda disagree with the idea that the WAL should be considered
    an integral part of the backup. I don't know how pgbackrest does
    things, but BART stores each backup in a separate directory without any
    associated WAL, and then keeps all the WAL together in a different
    directory. I imagine that people who are using continuous archiving
    also tend to use -Xnone, or if they do backups by copying the files
    rather than using pg_basebackup, they exclude pg_wal. In fact, for
    people with big, important databases, I'd assume that would be the
    normal pattern. You presumably wouldn't want to keep one copy of the
    WAL files taken during the backup with the backup itself, and a
    separate copy in the archive.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  151. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-27T19:48:50Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Thu, Mar 26, 2020 at 4:37 PM David Steele <david@pgmasters.net> wrote:
    > > I agree with Stephen that this should be done, but I agree with you that
    > > it can wait for a future commit. However, I do think:
    > >
    > > 1) It should be called out rather plainly in the documentation.
    > > 2) If there are files in pg_wal then pg_validatebackup should inform the
    > > user that those files have not been validated.
    > 
    > I agree with you about #1, and I suspect that there's a way to improve
    > what I've got here now, but I think I might be too close to this to
    > figure out what the best way would be, so suggestions welcome.
    > 
    > I think #2 is an interesting idea and could possibly reduce the danger
    > of user confusion on this point considerably - because, let's face it,
    > not everyone is going to read the documentation. However, I'm having a
    > hard time figuring out exactly what we'd print. Right now on success,
    > unless you specify -q, you get:
    > 
    > [rhaas ~]$ pg_validatebackup  ~/pgslave
    > backup successfully verified
    > 
    > But it feels strange and possibly confusing to me to print something like:
    > 
    > [rhaas ~]$ pg_validatebackup  ~/pgslave
    > backup successfully verified (except for pg_wal)
    > 
    > ...because there are a few other exceptions too, and also because it
    
    The exceptions you're referring to here are things like the various
    signal files, that the user can recreate pretty easily..?  I don't
    think those really rise to the level of pg_wal.
    
    What I would hope to see (... well, we know what I *really* would hope
    to see, but if we really go this route) is something like:
    
    WARNING: pg_wal not empty, WAL files are not validated by this tool
    data files successfully verified
    
    and a non-zero exit code.
    
    Basically, if you're doing WAL yourself, then you'd use pg_receivewal
    and maybe your own manifest-building code for WAL or something and then
    use -X none with pg_basebackup.
    
    Then again, I'd have -X none throw a warning too.  I'd be alright with
    all of these having override switches to say "ok, I get it, don't
    complain about it".
    
    I disagree with the idea of writing "backup successfully verified" when
    we aren't doing any checking of the WAL that's essential for the backup
    (unlike various signal files and whatnot, which aren't...).
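
    A rough sketch of the behaviour proposed above, in Python for brevity.
    The warning text is the one suggested in this message. The function
    name and the choice of 1 as the non-zero exit code are assumptions,
    and only regular files directly inside pg_wal are counted.

```python
import os
import sys
import tempfile


def check_pg_wal(backup_dir: str) -> int:
    """Return 1 and warn if pg_wal holds any regular files, else return 0."""
    wal_dir = os.path.join(backup_dir, "pg_wal")
    if os.path.isdir(wal_dir):
        with os.scandir(wal_dir) as entries:
            has_files = any(entry.is_file() for entry in entries)
        if has_files:
            print("WARNING: pg_wal not empty, WAL files are not validated "
                  "by this tool", file=sys.stderr)
            return 1
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as backup:
        os.mkdir(os.path.join(backup, "pg_wal"))
        print(check_pg_wal(backup))  # empty pg_wal, so no warning
```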
    
    Thanks,
    
    Stephen
    
  152. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-27T19:50:56Z

    On 3/27/20 1:53 PM, Robert Haas wrote:
    > On Thu, Mar 26, 2020 at 4:37 PM David Steele <david@pgmasters.net> wrote:
    >> I know you and Stephen have agreed on a number of doc changes, would it
    >> be possible to get a new patch with those included? I finally have time
    >> to do a review of this tomorrow.  I saw some mistakes in the docs in the
    >> current patch but I know those patches are not current.
    > 
    > Hi David,
    > 
    > Here's a new version with some fixes:
    > 
    > - Fixes for doc typos noted by Stephen Frost and Andres Freund.
    > - Replace a doc paragraph about the advantages and disadvantages of
    > CRC-32C with one by Stephen Frost, with a slight change by me that I
    > thought made it sound more grammatical.
    > - Change the pg_validatebackup documentation so that it makes no
    > mention of compatible tools, per Stephen.
    > - Reword the discussion of the exclude list in the pg_validatebackup
    > documentation, per discussion between Stephen and myself.
    > - Try to make the documentation more clear about the fact that we
    > check for both extra and missing files.
    > - Incorporate a fix from Amit Kapila to make 003_corruption.pl pass on Windows.
    
    Thanks!
    
    There appear to be conflicts with 67e0adfb3f98:
    
    $ git apply -3 ../download/v14-0002-Generate-backup-manifests-for-base-backups-and-v.patch
    ../download/v14-0002-Generate-backup-manifests-for-base-backups-and-v.patch:3396: trailing whitespace.
    sub cleanup_search_directory_fails
    error: patch failed: src/backend/replication/basebackup.c:258
    Falling back to three-way merge...
    Applied patch to 'src/backend/replication/basebackup.c' with conflicts.
    U src/backend/replication/basebackup.c
    warning: 1 line adds whitespace errors.
    
     > +          Specifies the algorithm that should be used to checksum each file
     > +          for purposes of the backup manifest. Currently, the available
    
    perhaps "for inclusion in the backup manifest"?  Anyway, I think this 
    sentence is awkward.
    
     > +        Specifies the algorithm that should be used to checksum each file
     > +        for purposes of the backup manifest. Currently, the available
    
    And again.
    
     > +        because the files themselves do not need to read.
    
    should be "need to be read".
    
     > +        the manifest itself will always contain a <literal>SHA256</literal>
    
    I think just "the manifest will always contain" is fine.
    
     > +        manifeste itself, and is therefore ignored. Note that the manifest
    
    typo "manifeste", perhaps remove itself.
    
     > { "Path": "backup_label", "Size": 224, "Last-Modified": "2020-03-27 
    18:33:18 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b914bec9" },
    
    Storing the checksum type with each file seems pretty redundant. 
    Perhaps that could go in the header?  You could always override if a 
    specific file had a different checksum type, though that seems unlikely.
    
    In general it might be good to go with shorter keys: "mod", "chk", etc. 
    Manifests can get pretty big and that's a lot of extra bytes.
    
    I'm also partial to using epoch time in the manifest because it is 
    generally easier for programs to work with.  But, human-readable doesn't 
    suck, either.
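
To get a feel for how much the suggestions above might save, here is a minimal sketch that rewrites the sample entry quoted above with short keys, an epoch timestamp, and the checksum algorithm left out when it matches a manifest-wide default. The compact key names ("mod", "chk", "alg") and the idea of a header-level default are assumptions taken from the review comments, not part of the patch.

```python
import calendar
import json
import time

# Entry copied from the sample manifest line quoted in the review.
verbose_entry = {
    "Path": "backup_label",
    "Size": 224,
    "Last-Modified": "2020-03-27 18:33:18 GMT",
    "Checksum-Algorithm": "CRC32C",
    "Checksum": "b914bec9",
}


def to_compact(entry, default_algorithm):
    """Hypothetical compact form: short keys, epoch time, per-file
    algorithm only when it differs from the manifest-wide default."""
    parsed = time.strptime(entry["Last-Modified"], "%Y-%m-%d %H:%M:%S GMT")
    compact = {
        "path": entry["Path"],
        "size": entry["Size"],
        "mod": calendar.timegm(parsed),  # seconds since the Unix epoch, UTC
        "chk": entry["Checksum"],
    }
    if entry["Checksum-Algorithm"] != default_algorithm:
        compact["alg"] = entry["Checksum-Algorithm"]
    return compact


compact_entry = to_compact(verbose_entry, default_algorithm="CRC32C")
verbose_len = len(json.dumps(verbose_entry))
compact_len = len(json.dumps(compact_entry))
print(verbose_len, compact_len)
```

The per-entry saving is multiplied by the number of files in the backup, which is where the "lot of extra bytes" concern comes from.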
    
     >  	if (maxrate > 0)
     > 		maxrate_clause = psprintf("MAX_RATE %u", maxrate);
     > +	if (manifest)
    
    A linefeed here would be nice.
    
     > +	manifestfile *tabent;
    
    This is an odd name.  A holdover from the tab-delimited version?
    
     > +	printf(_("Usage:\n  %s [OPTION]... BACKUPDIR\n\n"), progname);
    
    When I ran pg_validatebackup I expected to use -D to specify the backup 
    dir since pg_basebackup does.  On the other hand -D is weird because I 
    *really* expect that to be the pg data dir.
    
    But, do we want this to be different from pg_basebackup?
    
     > +		checksum_length = checksum_string_length / 2;
    
    This check is defeated if a single character is added to the checksum.
    
    Not too big a deal since you still get an error, but still.
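
The problem described here is that integer division by two silently drops a trailing odd character. A minimal sketch of a stricter check, written in Python rather than the patch's C, with a hypothetical function name:

```python
import string

HEX_DIGITS = set(string.hexdigits)


def parse_hex_checksum(text):
    """Decode a hex checksum string, rejecting odd lengths explicitly
    instead of letting len(text) // 2 hide the extra character."""
    if len(text) % 2 != 0:
        raise ValueError("checksum has an odd number of hex digits: %r" % text)
    if not set(text) <= HEX_DIGITS:
        raise ValueError("checksum contains non-hex characters: %r" % text)
    return bytes.fromhex(text)
```

With this shape, a string such as "b914bec9a" is reported as malformed rather than being compared as if it were "b914bec9".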
    
     > + * Verify that the manifest checksum is correct.
    
    This is not working the way I would expect -- I could freely modify the 
    manifest without getting a checksum error on the manifest.  For example:
    
    $ /home/vagrant/test/pg/bin/pg_validatebackup test/backup3
    pg_validatebackup: fatal: invalid checksum for file "backup_label": 
    "408901e0814f40f8ceb7796309a59c7248458325a21941e7c55568e381f53831?"
    
    So, if I deleted the entry above, I got a manifest checksum error.  But 
    if I just modified the checksum I get a file checksum error with no 
    manifest checksum error.
    
    I would prefer a manifest checksum error in all cases where it is wrong, 
    unless --exit-on-error is specified.
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  153. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-27T19:55:12Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Thu, Mar 26, 2020 at 4:44 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > Is it actually possible, today, in PG, to have a 4GB WAL record?
    > > Judging this based on the WAL record size doesn't seem quite right.
    > 
    > I'm not sure. I mean, most records are quite small, but I think if you
    > set REPLICA IDENTITY FULL on a table with a bunch of very wide columns
    > (and also wal_level=logical) it can get really big. I haven't tested
    > to figure out just how big it can get. (If I have a table with lots of
    > almost-1GB-blobs in it, does it work without logical replication and
    > fail with logical replication? I don't know, but I doubt a WAL record
    > >4GB is possible, because it seems unlikely that the code has a way to
    > cope with that struct field overflowing.)
    
    Interesting..  Well, topic for another thread, but I'd say if we believe
    that's possible then we might want to consider if the crc32c is a good
    decision to use still there.
    
    > > Again, I'm not against having a checksum algorithm as a option.  I'm not
    > > saying that it must be SHA512 as the default.
    > 
    > I think that what we have seen so far is that all of the SHA-n
    > algorithms that PostgreSQL supports are about equally slow, so it
    > doesn't really matter which one you pick there from a performance
    > point of view. If you're not saying it has to be SHA-512 but you do
    > want it to be SHA-256, I don't think that really fixes anything. Using
    > CRC-32C does fix the performance issue, but I don't think you like
    > that, either. We could default to having no checksums at all, or even
    > no manifest at all, but I didn't get the impression that David, at
    > least, wanted to go that way, and I don't like it either. It's not the
    > world's best feature, but I think it's good enough to justify enabling
    > it by default. So I'm not sure we have any options here that will
    > satisfy you.
    
    I do like having a manifest by default.  At this point it's pretty clear
    that we've just got a fundamental disagreement that more words aren't
    going to fix.  I'd rather we play it safe and use a sha256 hash and
    accept that it's going to be slower by default, and then give users an
    option to make it go faster if they want (though I'd much rather that
    alternative be a 64bit CRC than a 32bit one).
    
    Andres seems to agree with you.  I'm not sure where David sits on this
    specific question.
    
    Thanks,
    
    Stephen
    
  154. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T19:56:24Z

    Hi,
    
    On 2020-03-27 14:13:17 -0400, Robert Haas wrote:
    > On Thu, Mar 26, 2020 at 4:44 PM Stephen Frost <sfrost@snowman.net> wrote:
    > > > I mean, the property that I care about is the one where it detects
    > > > better than 999,999,999 errors out of every 1,000,000,000, regardless
    > > > of input length.
    > >
    > > Throwing these kinds of things around I really don't think is useful.
    > 
    > ...but I don't understand his reasoning, or yours.
    > 
    > My reasoning for thinking that the number is accurate is that a 32-bit
    > checksum has 2^32 possible results. If all of those results are
    > equally probable, then the probability that two files with unequal
    > contents produce the same result is 2^-32. This does assume that the
    > hash function is perfect, which no hash function is, so the actual
    > probability of a collision is likely higher. But if the hash function
    > is pretty good, it shouldn't be all that much higher. Note that I am
    > making no assumptions here about how many bits are different, nor am I
    > making any assumption about the length of a file. I am simply saying
    > that an n-bit checksum should detect a difference between two files
    > with a probability of roughly 1-2^{-n}, modulo the imperfections of
    > the hash function. I thought that this was a well-accepted fact that
    > would produce little argument from anybody, and I'm confused that
    > people seem to feel otherwise.
    
    Well: crc32 is a terrible hash, if you're looking for even distribution
    of hashed values. That's not too surprising - its design goals included
    guaranteed error detection for certain lengths, and error correction of
    single bit errors.  My understanding of the underlying math is spotty at
    best, but from what I understand that does pretty directly imply less
    independence between source data -> hash value than what we'd want from
    a good hash function.
    
    Here's an smhasher result page for crc32 (at least the latter is crc32
    afaict):
    https://notabug.org/vaeringjar/smhasher/src/master/doc/crc32
    https://notabug.org/vaeringjar/smhasher/src/master/doc/crc32_hw
    
    and then compare that with something like xxhash, or even lookup3 (which
    I think is what our hash is a variant of):
    https://notabug.org/vaeringjar/smhasher/src/master/doc/xxHash32
    https://notabug.org/vaeringjar/smhasher/src/master/doc/lookup3
    
    The birthday paradox doesn't apply (otherwise 32bit would never be
    enough, at a 50% chance of conflict at around 80k hashes), but still I
    do wonder if it matters that we're trying to detect errors in not one,
    but commonly tens of thousands to millions of files. But since we just
    need to detect one error to call the whole backup corrupt...
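
The numbers in this exchange can be reproduced with a few lines of arithmetic. The sketch below assumes an idealized checksum whose outputs are uniformly distributed, which is exactly the assumption being questioned for CRC, so treat the results as a best case rather than a property of any real algorithm.

```python
import math


def undetected_probability(bits):
    """Chance that one corrupted file still matches its n-bit checksum,
    assuming uniformly distributed checksum values."""
    return 2.0 ** -bits


def birthday_threshold(bits, probability=0.5):
    """Approximate number of random n-bit values needed before any two
    of them collide with the given probability."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - probability)))


def chance_backup_passes(bits, corrupted_files):
    """Chance that every one of several independently corrupted files
    slips through, so the backup as a whole is not flagged."""
    return undetected_probability(bits) ** corrupted_files


print(undetected_probability(32))
print(round(birthday_threshold(32)))
print(chance_backup_passes(32, 3))
```

The birthday figure is the "around 80k hashes" mentioned above. It does not apply here because each file is compared only against its own recorded checksum, not against every other file's checksum.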
    
    
    > One explanation that would make sense to me is if somebody said, well,
    > the nature of this particular algorithm means that, although values
    > are uniformly distributed in general, the kinds of errors that are
    > likely to occur in practice are likely to cancel out. For instance, if
    > you imagine trivial algorithms such as adding or xor-ing all the
    > bytes, adding zero bytes doesn't change the answer, and neither do
    > transpositions. However, CRC is, AIUI, designed to be resistant to
    > such problems. Your remark about large blocks of zero bytes is
    > interesting to me in this context, but in a quick search I couldn't
    > find anything stating that CRC was weak for such use cases.
    
    My main point was that CRC's error detection guarantees are pretty much
    irrelevant for us. I.e. while the right CRC will guarantee that all
    single 2 bit errors will be detected, that's not a helpful property for
    us. There rarely are single bit errors, and the bursts are too long to
    benefit from any >2 bit guarantees. Nor are multiple failures rare
    once you hit a problem.
    
    
    > The old thread about switching from 64-bit CRC to 32-bit CRC had a
    > link to a page which has subsequently been moved to here:
    > 
    > https://www.ece.unb.ca/tervo/ee4253/crc.shtml
    > 
    > Down towards the bottom, it says:
    > 
    > "In general, bit errors and bursts up to N-bits long will be detected
    > for a P(x) of degree N. For arbitrary bit errors longer than N-bits,
    > the odds are one in 2^{N} than a totally false bit pattern will
    > nonetheless lead to a zero remainder."
    
    That's still about a single sequence of bit errors though, as far as I
    can tell. I.e. it doesn't hold for CRCs if you have two errors at
    different places.
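
For readers who want to experiment with the burst-error guarantee being discussed, here is a small bit-at-a-time CRC-32C using the standard Castagnoli parameters (reflected polynomial 0x82F63B78, all-ones initial value and final XOR). It is a sketch for experimentation, not PostgreSQL's optimized implementation, and the helper names are made up. The sampling function only exercises a single contiguous burst of at most 32 bits, which is the case the quoted guarantee covers. It says nothing about two separate corruptions, which is the point made above.

```python
import random


def crc32c(data):
    """Bitwise CRC-32C (Castagnoli), reflected form."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def flip_burst(data, start_bit, pattern):
    """XOR a 32-bit error pattern into data, starting at start_bit.
    Bits are numbered least-significant first within each byte, which
    matches the bit order of the reflected CRC above."""
    out = bytearray(data)
    for i in range(32):
        if (pattern >> i) & 1:
            pos = start_bit + i
            out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


def sample_bursts_detected(trials=2000, seed=0):
    """Apply random single bursts of up to 32 bits and report whether
    every one of them changed the CRC."""
    rng = random.Random(seed)
    data = bytes(rng.randrange(256) for _ in range(64))
    original = crc32c(data)
    for _ in range(trials):
        pattern = rng.randrange(1, 1 << 32)
        start = rng.randrange(0, len(data) * 8 - 32 + 1)
        if crc32c(flip_burst(data, start, pattern)) == original:
            return False
    return True
```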
    
    Greetings,
    
    Andres Freund
    
    
    
    
  155. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-27T20:02:00Z

    On 3/27/20 3:20 PM, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
    > 
    >> Hm. Is it a great choice to include the checksum for the manifest inside
    >> the manifest itself? With a cryptographic checksum it seems like it
    >> could make a ton of sense to store the checksum somewhere "safe", but
    >> keep the manifest itself alongside the base backup itself. While not
    >> huge, they won't be tiny either.
    > 
    > Seems like the user could just copy the manifest checksum and store it
    > somewhere, if they wish. Then they can check it against the manifest
    > itself later, if they wish. Or they can take a SHA-512 of the whole
    > file and store that securely. The problem is that we have no idea how
    > to write that checksum to more secure storage. We could write
    > backup_manifest and backup_manifest.checksum into separate files, but
    > that seems like it's adding complexity without any real benefit.
    
    I agree that this seems like a separate problem. What Robert has done 
    here is detect random mutilation of the manifest.
    
    To prevent malicious modifications you either need to store the checksum 
    in another place, or digitally sign the file and store that alongside it 
    (or inside it even). Either way seems pretty far out of scope to me.
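
The "store the checksum in another place" option can be done entirely outside the backup tooling. A minimal sketch, assuming the user records the digest somewhere trusted right after the backup and re-checks it before restore; the function names are hypothetical:

```python
import hashlib
import hmac


def manifest_digest(path):
    """SHA-512 of the whole manifest file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_matches(path, stored_digest):
    """Compare the current digest with one recorded earlier."""
    return hmac.compare_digest(manifest_digest(path), stored_digest)
```

This only helps against malicious modification if the stored digest lives somewhere an attacker with access to the backup cannot also change.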
    
    >> Hm. I think it'd be good to verify that the checksummed size is the same
    >> as the size of the file in the manifest.
    > 
    > That's checked in an earlier phase. Are you worried about the file
    > being modified after the first pass checks the size and before we come
    > through to do the checksumming?
    
    I prefer to validate the size and checksum in the same pass, but I'm not 
    sure it's that big a deal.  If the backup is being corrupted under the 
    validate process that would also apply to files that had already been 
    validated.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  156. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T20:08:59Z

    Hi,
    
    On 2020-03-27 15:29:02 -0400, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
    > > > Seems better to (later?) add support for generating manifests for WAL
    > > > files, and then have a tool that can verify all the manifests required
    > > > to restore a base backup.
    > >
    > > I'm not trying to expand on the feature set here or move the goalposts
    > > way down the road, which is what seems to be what's being suggested
    > > here.  To be clear, I don't have any objection to adding a generic tool
    > > for validating WAL as you're talking about here, but I also don't think
    > > that's required for pg_validatebackup.  What I do think we need is a
    > > check of the WAL that's fetched when people use pg_basebackup -Xstream
    > > or -Xfetch.  pg_basebackup itself has that check because it's critical
    > > to the backup being successful and valid.  Not having that basic
    > > validation of a backup really just isn't ok- there's a reason
    > > pg_basebackup has that check.
    > 
    > I don't understand how this could be done without significantly
    > complicating the architecture. As I said before, -Xstream sends WAL
    > over a separate connection that is unrelated to the one running
    > BASE_BACKUP, so the base-backup connection doesn't know what to
    > include in the manifest. Now you could do something like: once all of
    > the WAL files have been fetched, the client checksums all of those and
    > sends their names and checksums to the server, which turns around and
    > puts them into the manifest, which it then sends back to the client.
    > But that is actually quite a bit of additional complexity, and it's
    > pretty strange, too, because now you have the client checksumming some
    > files and the server checksumming others. I know you mentioned a few
    > different ideas before, but I think they all kinda have some problem
    > along these lines.
    
    How about having separate manifests for segments? And have them stay
    separate? And then have an option to verify the manifests for all the
    WAL files that are required for a specific restore? The easiest way
    would be to just add a separate manifest file for each segment, and name
    them accordingly. But inventing a naming pattern that specifies both
    start-end segments wouldn't be hard either, and result in fewer
    manifests.
    
    Base backups (in the backup sense, not for bringing up replicas etc)
    without the ability to apply newer WAL are fairly pointless imo. And if
    newer WAL is applied, there's not much point in just verifying the WAL
    that's necessary to restore the base backup. Instead you'd want to be
    able to verify all the WAL since the base backup to the "current" point
    (or the next base backup).
    
    For me having something inside pg_basebackup (or the server, for
    -Xfetch) that somehow includes the WAL files in the manifest doesn't
    really gain us much - it's obviously not something that'll help us to
    verify all the WAL that needs to be applied (to either get the base
    backup into a consistent state, or to roll forward to the desired
    point).
    
    
    
    > I also kinda disagree with the idea that the WAL should be considered
    > an integral part of the backup. I don't know how pgbackrest does
    > things, but BART stores each backup in a separate directory without any
    > associated WAL, and then keeps all the WAL together in a different
    > directory. I imagine that people who are using continuous archiving
    > also tend to use -Xnone, or if they do backups by copying the files
    > rather than using pg_backrest, they exclude pg_wal. In fact, for
    > people with big, important databases, I'd assume that would be the
    > normal pattern. You presumably wouldn't want to keep one copy of the
    > WAL files taken during the backup with the backup itself, and a
    > separate copy in the archive.
    
    +1
    
    I also don't see them as being as important, due to the already existing
    checksums (which are of a much much much higher quality than what we
    have for database pages, both by being wider, and by being much more
    frequent in most cases). There's obviously a need to validate the WAL in
    a nicer way than scripting pg_waldump - but that seems separate anyway.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  157. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T20:12:22Z

    Hi,
    
    On 2020-03-27 14:34:19 -0400, Robert Haas wrote:
    > I think #2 is an interesting idea and could possibly reduce the danger
    > of user confusion on this point considerably - because, let's face it,
    > not everyone is going to read the documentation. However, I'm having a
    > hard time figuring out exactly what we'd print. Right now on success,
    > unless you specify -q, you get:
    > 
    > [rhaas ~]$ pg_validatebackup  ~/pgslave
    > backup successfully verified
    > 
    > But it feels strange and possibly confusing to me to print something like:
    > 
    > [rhaas ~]$ pg_validatebackup  ~/pgslave
    > backup successfully verified (except for pg_wal)
    
    You could print something like:
    WAL necessary to restore this base backup can be validated with:
    
    pg_waldump -p ~/pgslave -t tl -s backup_start_location -e backup_end_loc > /dev/null && echo true
    
    Obviously that specific invocation sucks, but it'd not be hard to add an
    option to waldump to not output anything.
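
A wrapper script could assemble the same invocation and discard the output itself. The sketch below only uses the pg_waldump options shown in the command above (-p, -t, -s, -e). The timeline and LSN values are parameters because the message leaves them as placeholders, and the function names are hypothetical.

```python
import subprocess


def waldump_check_command(wal_dir, timeline, start_lsn, end_lsn):
    """Build the pg_waldump argument list for one WAL range."""
    return [
        "pg_waldump",
        "-p", wal_dir,
        "-t", str(timeline),
        "-s", start_lsn,
        "-e", end_lsn,
    ]


def wal_range_is_readable(wal_dir, timeline, start_lsn, end_lsn):
    """Run pg_waldump over the range, throw away the record listing,
    and report success based on the exit status alone."""
    command = waldump_check_command(wal_dir, timeline, start_lsn, end_lsn)
    result = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE
    )
    return result.returncode == 0
```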
    
    Greetings,
    
    Andres Freund
    
    
    
    
  158. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-27T20:16:11Z

    On 3/27/20 3:29 PM, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
    >>> Seems better to (later?) add support for generating manifests for WAL
    >>> files, and then have a tool that can verify all the manifests required
    >>> to restore a base backup.
    >>
    >> I'm not trying to expand on the feature set here or move the goalposts
    >> way down the road, which is what seems to be what's being suggested
    >> here.  To be clear, I don't have any objection to adding a generic tool
    >> for validating WAL as you're talking about here, but I also don't think
    >> that's required for pg_validatebackup.  What I do think we need is a
    >> check of the WAL that's fetched when people use pg_basebackup -Xstream
    >> or -Xfetch.  pg_basebackup itself has that check because it's critical
    >> to the backup being successful and valid.  Not having that basic
    >> validation of a backup really just isn't ok- there's a reason
    >> pg_basebackup has that check.
    > 
    > I don't understand how this could be done without significantly
    > complicating the architecture. As I said before, -Xstream sends WAL
    > over a separate connection that is unrelated to the one running
    > BASE_BACKUP, so the base-backup connection doesn't know what to
    > include in the manifest. Now you could do something like: once all of
    > the WAL files have been fetched, the client checksums all of those and
    > sends their names and checksums to the server, which turns around and
    > puts them into the manifest, which it then sends back to the client.
    > But that is actually quite a bit of additional complexity, and it's
    > pretty strange, too, because now you have the client checksumming some
    > files and the server checksumming others. I know you mentioned a few
    > different ideas before, but I think they all kinda have some problem
    > along these lines.
    > 
    > I also kinda disagree with the idea that the WAL should be considered
    > an integral part of the backup. I don't know how pgbackrest does
    > things, 
    
    We checksum each WAL file while it is read and transmitted to the repo 
    by the archive_command.  Then at the end of the backup we ensure that 
    all the WAL required to make the backup consistent has made it to the repo.
    
    > but BART stores each backup in a separate directory without any
    > associated WAL, and then keeps all the WAL together in a different
    > directory. I imagine that people who are using continuous archiving
    > also tend to use -Xnone, or if they do backups by copying the files
    > rather than using pg_backrest, they exclude pg_wal. In fact, for
    > people with big, important databases, I'd assume that would be the
    > normal pattern. You presumably wouldn't want to keep one copy of the
    > WAL files taken during the backup with the backup itself, and a
    > separate copy in the archive.
    
    pgBackRest does provide the option to copy WAL into the backup directory 
    for the super-paranoid, though it is not the default. It is pretty handy 
    for moving individual backups to some other medium like tape, though.
    
    If -Xnone is specified then it seems like pg_validatebackup is 
    completely off the hook.  But in the case of -Xstream or -Xfetch 
    couldn't we at least verify that the expected WAL segments are present 
    and the correct size?
    
    Storing the start/stop lsn in the manifest would be a nice thing to have 
    anyway and that would make this feature pretty trivial. Yeah, that's in 
    the backup_label file as well but the manifest is so much easier to read.
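
Given a timeline plus start and stop LSNs, the presence-and-size check suggested here is mostly arithmetic. The sketch below assumes the default 16MB WAL segment size and the usual 24-hex-digit segment file naming. Treating the stop LSN as exclusive (so a stop position exactly on a segment boundary does not require the following segment) is also an assumption of this sketch, and the function names are made up.

```python
import os

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default; configurable at initdb


def parse_lsn(text):
    """Convert an LSN written as 'X/Y' (hex) into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline, segno, seg_size=WAL_SEGMENT_SIZE):
    """Name of the segment file for a timeline and segment number."""
    segments_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        timeline, segno // segments_per_id, segno % segments_per_id
    )


def expected_wal_files(timeline, start_lsn, end_lsn, seg_size=WAL_SEGMENT_SIZE):
    """Segment files covering [start_lsn, end_lsn)."""
    first = parse_lsn(start_lsn) // seg_size
    last = (parse_lsn(end_lsn) - 1) // seg_size
    return [wal_file_name(timeline, s, seg_size) for s in range(first, last + 1)]


def wal_problems(wal_dir, names, seg_size=WAL_SEGMENT_SIZE):
    """List (name, reason) pairs for segments that are missing or mis-sized."""
    problems = []
    for name in names:
        path = os.path.join(wal_dir, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
        elif os.path.getsize(path) != seg_size:
            problems.append((name, "wrong size"))
    return problems
```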
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  159. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T20:32:25Z

    Hi,
    
    On 2020-03-27 15:20:27 -0400, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
    > > Are you planning to include a specification of the manifest file format
    > > anywhere? I looked through the patches and didn't find anything.
    > 
    > I thought about that. I think it would be good to have. I was sort of
    > hoping to leave it for a follow-on patch, but maybe that's cheating
    > too much.
    
    I don't like having a file format that's intended to be used by external
    tools too that's undocumented except for code that assembles it in a
    piecemeal fashion.  Do you mean in a follow-on patch this release, or
    later? I don't have a problem with the former.
    
    
    > > I think it'd also be good to include more information about what the
    > > point of manifest files actually is.
    > 
    > What kind of information do you want to see included there? Basically,
    > the way the documentation is written right now, it essentially says,
    > well, we have this manifest thing so that you can later run
    > pg_validatebackup, and pg_validatebackup says that it's there to check
    > the integrity of backups using the manifest. This is all a bit
    > circular, though, and maybe needs elaboration.
    
    I do find it to be circular. I think we mostly need a paragraph or two
    somewhere that explains on a higher level what the point of verifying
    base backups is and what is verified.
    
    
    > > Hm. Is it a great choice to include the checksum for the manifest inside
    > > the manifest itself? With a cryptographic checksum it seems like it
    > > could make a ton of sense to store the checksum somewhere "safe", but
    > > keep the manifest itself alongside the base backup itself. While not
    > > huge, they won't be tiny either.
    > 
    > Seems like the user could just copy the manifest checksum and store it
    > somewhere, if they wish. Then they can check it against the manifest
    > itself later, if they wish. Or they can take a SHA-512 of the whole
    > file and store that securely. The problem is that we have no idea how
    > to write that checksum to more secure storage. We could write
    > backup_manifest and backup_manifest.checksum into separate files, but
    > that seems like it's adding complexity without any real benefit.
    > 
    > To me, the security-related uses of this patch seem to be fairly
    > niche. I think it's nice that they exist, but I don't think that's the
    > main selling point. For me, the main selling point is that you can
    > check that your disk didn't eat your data and that nobody nuked any
    > files that were supposed to be there.
    
    Oh, I agree. I wasn't really mentioning the crypto checksum because of
    it being "security" stuff, but because of the quality of the guarantee
    it gives. I don't know how large the manifest file will be for a setup
    with a lot of partitioned tables, but I'd expect it to not be
    tiny. So not having to store it in the 'archiving system' is nice.
    
    FWIW, I was thinking of backup_manifest.checksum potentially being
    desirable for another reason: The need to embed the checksum inside the
    document imo adds a fair bit of rigidity to the file format. See
    
    > +static void
    > +verify_manifest_checksum(JsonManifestParseState *parse, char *buffer,
    > +						 size_t size)
    > +{
    ...
    > +
    > +	/* Find the last two newlines in the file. */
    > +	for (i = 0; i < size; ++i)
    > +	{
    > +		if (buffer[i] == '\n')
    > +		{
    > +			++number_of_newlines;
    > +			penultimate_newline = ultimate_newline;
    > +			ultimate_newline = i;
    > +		}
    > +	}
    > +
    > +	/*
    > +	 * Make sure that the last newline is right at the end, and that there are
    > +	 * at least two lines total. We need this to be true in order for the
    > +	 * following code, which computes the manifest checksum, to work properly.
    > +	 */
    > +	if (number_of_newlines < 2)
    > +		json_manifest_parse_failure(parse->context,
    > +									"expected at least 2 lines");
    > +	if (ultimate_newline != size - 1)
    > +		json_manifest_parse_failure(parse->context,
    > +									"last line not newline-terminated");
    > +
    > +	/* Checksum the rest. */
    > +	pg_sha256_init(&manifest_ctx);
    > +	pg_sha256_update(&manifest_ctx, (uint8 *) buffer, penultimate_newline + 1);
    > +	pg_sha256_final(&manifest_ctx, manifest_checksum_actual);
    
    which certainly isn't "free form json".
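
The rigidity is easier to see when the same logic is written compactly. This Python sketch mirrors the quoted C: everything up to and including the penultimate newline is hashed with SHA-256, and the final line is expected to carry that digest. The exact syntax of the last line is not shown in the excerpt, so the "Manifest-Checksum" line built by the helper below is an assumption, as are the function names.

```python
import hashlib


def split_manifest(buffer):
    """Split a manifest into (checksummed body, final line), applying
    the same line-structure checks as the quoted code."""
    if buffer.count(b"\n") < 2:
        raise ValueError("expected at least 2 lines")
    if not buffer.endswith(b"\n"):
        raise ValueError("last line not newline-terminated")
    ultimate_newline = len(buffer) - 1
    penultimate_newline = buffer.rfind(b"\n", 0, ultimate_newline)
    return buffer[:penultimate_newline + 1], buffer[penultimate_newline + 1:]


def append_checksum_line(body):
    """Hypothetical writer side: append a last line holding the digest."""
    digest = hashlib.sha256(body).hexdigest().encode("ascii")
    return body + b'"Manifest-Checksum": "' + digest + b'"}\n'


def manifest_checksum_ok(buffer):
    """True if the final line contains the SHA-256 of everything before it."""
    body, last_line = split_manifest(buffer)
    actual = hashlib.sha256(body).hexdigest().encode("ascii")
    return actual in last_line
```

Any reformatting that moves the checksum off the last line, or changes a single byte of whitespace before it, breaks verification, which is the sense in which the format is not free-form JSON.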
    
    
    > > Doesn't have to be in the first version, but could it be useful to move
    > > this to common/ or such?
    > 
    > Yeah. At one point, this code was written in a way that was totally
    > specific to pg_validatebackup, but I then realized that it would be
    > better to make it more general, so I refactored it into the form
    > you see now, where pg_validatebackup.c depends on parse_manifest.c but
    > not the reverse. I suspect that if someone wants to use this for
    > something else they might need to change a few more things - not sure
    > exactly what - but I don't think it would be too hard. I thought it
    > would be best to leave that task until someone has a concrete use case
    > in mind, but I did want it to be relatively easy to do that down
    > the road, and I hope that the way I've organized the code achieves
    > that.
    
    Cool.
    
    
    > > > +static void
    > > > +validate_backup_directory(validator_context *context, char *relpath,
    > > > +                                               char *fullpath)
    > > > +{
    > >
    > > Hm. Should this warn if the directory's permissions are set too openly
    > > (world writable?)?
    > 
    > I don't think so, but it's pretty clear that different people have
    > different ideas about what the scope of this tool ought to be, even in
    > this first version.
    
    Yea. I don't have a strong opinion on this specific issue. I was mostly
    wondering because I've repeatedly seen people restore backups with world
    readable properties, and with that it's obviously possible for somebody
    else to change the contents after the checksum was computed.
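
A warning of the kind floated here would amount to a walk over the backup directory looking at the "other" write bit. A minimal sketch for POSIX-style permissions, with a hypothetical function name; it is not something the patch does:

```python
import os
import stat


def world_writable_paths(root):
    """Return directories and files under root that anyone can write to.
    Symbolic links are skipped because their own mode bits are not
    meaningful on most systems."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue
            if mode & stat.S_IWOTH:
                flagged.append(path)
    return flagged
```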
    
    
    > > Hm. I think it'd be good to verify that the checksummed size is the same
    > > as the size of the file in the manifest.
    > 
    > That's checked in an earlier phase. Are you worried about the file
    > being modified after the first pass checks the size and before we come
    > through to do the checksumming?
    
    Not really, I wondered about it for a bit, and then decided that it's
    too remote an issue.
    
    What I've seen a couple of times is that actually reading a file can
    result in the file ending to be reported at a different position than
    what stat() said. So by crosschecking the size while reading with the
    one from stat (which was compared with the source system one) we'd make
    the errors much better. It's certainly easier to know where to start
    looking when validate says "error: read %llu bytes from file, expected
    %llu" or something along those lines, than when it just reports a
    checksum error.
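
The cross-check described above fits naturally into the checksumming loop, since the loop already sees every byte. A minimal sketch with hypothetical names, where expected_size stands for the size recorded in the manifest:

```python
import hashlib


def checksum_with_size_check(path, expected_size, algorithm="sha256"):
    """Hash a file in one pass while counting the bytes actually read,
    and complain about a size mismatch before any checksum comparison."""
    digest = hashlib.new(algorithm)
    bytes_read = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(128 * 1024)
            if not chunk:
                break
            bytes_read += len(chunk)
            digest.update(chunk)
    if bytes_read != expected_size:
        raise ValueError(
            'read %d bytes from file "%s", expected %d'
            % (bytes_read, path, expected_size)
        )
    return digest.hexdigest()
```

A short read then surfaces as a size error that points at the file and the byte count, instead of as a bare checksum mismatch.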
    
    There's also some crypto hash algorithm weaknesses that are easier to
    exploit when it's possible to append data to a known prefix, but that
    doesn't seem an obvious threat here.
    
    
    Greetings,
    
    Andres Freund
    
    
    
    
  160. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-27T20:39:29Z

    On 3/27/20 3:55 PM, Stephen Frost wrote:
    > * Robert Haas (robertmhaas@gmail.com) wrote:
    >> I think that what we have seen so far is that all of the SHA-n
    >> algorithms that PostgreSQL supports are about equally slow, so it
    >> doesn't really matter which one you pick there from a performance
    >> point of view. If you're not saying it has to be SHA-512 but you do
    >> want it to be SHA-256, I don't think that really fixes anything. Using
    >> CRC-32C does fix the performance issue, but I don't think you like
    >> that, either. We could default to having no checksums at all, or even
    >> no manifest at all, but I didn't get the impression that David, at
    >> least, wanted to go that way, and I don't like it either. It's not the
    >> world's best feature, but I think it's good enough to justify enabling
    >> it by default. So I'm not sure we have any options here that will
    >> satisfy you.
    > 
    > I do like having a manifest by default.  At this point it's pretty clear
    > that we've just got a fundamental disagreement that more words aren't
    > going to fix.  I'd rather we play it safe and use a sha256 hash and
    > accept that it's going to be slower by default, and then give users an
    > option to make it go faster if they want (though I'd much rather that
    > alternative be a 64bit CRC than a 32bit one).
    > 
    > Andres seems to agree with you.  I'm not sure where David sits on this
    > specific question.
    
    I would prefer a stronger checksum as the default but I would be fine 
    with SHA1, which is a bit faster.
    
    I believe the overhead of checksums is being overblown. In my experience 
    the vast majority of users are using compression and running the backup 
    over a network.  Once you have done those two things the cost of SHA1 is 
    pretty negligible.  As I posted way up-thread we found that just gzip -6 
    pushed the cost of SHA1 below 3% and that did not include network transfer.
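
Anyone wanting to sanity-check the relative cost on their own hardware can time hashing against compression on the same buffer. The sketch below is a rough, single-run measurement using Python's standard library, with zlib level 6 standing in for gzip -6. It is not the benchmark referred to above, and the result depends heavily on the data and the machine.

```python
import hashlib
import time
import zlib


def time_call(func, payload):
    """Wall-clock seconds for one call of func(payload)."""
    start = time.perf_counter()
    func(payload)
    return time.perf_counter() - start


def measure(payload):
    """Time SHA-1, SHA-256 and zlib level 6 over the same bytes."""
    return {
        "sha1": time_call(lambda d: hashlib.sha1(d).digest(), payload),
        "sha256": time_call(lambda d: hashlib.sha256(d).digest(), payload),
        "zlib-6": time_call(lambda d: zlib.compress(d, 6), payload),
    }


if __name__ == "__main__":
    # Mildly compressible sample data; real table files will behave differently.
    sample = (b"backup manifests " * 64 + bytes(range(256))) * 4096
    timings = measure(sample)
    for name, seconds in timings.items():
        print("%-8s %.4f s" % (name, seconds))
    print("sha1 / zlib-6 ratio: %.3f" % (timings["sha1"] / timings["zlib-6"]))
```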
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  161. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-27T20:57:46Z

    Greetings,
    
    * Robert Haas (robertmhaas@gmail.com) wrote:
    > On Fri, Mar 27, 2020 at 11:26 AM Stephen Frost <sfrost@snowman.net> wrote:
    > > > Seems better to (later?) add support for generating manifests for WAL
    > > > files, and then have a tool that can verify all the manifests required
    > > > to restore a base backup.
    > >
    > > I'm not trying to expand on the feature set here or move the goalposts
    > > way down the road, which seems to be what's being suggested
    > > here.  To be clear, I don't have any objection to adding a generic tool
    > > for validating WAL as you're talking about here, but I also don't think
    > > that's required for pg_validatebackup.  What I do think we need is a
    > > check of the WAL that's fetched when people use pg_basebackup -Xstream
    > > or -Xfetch.  pg_basebackup itself has that check because it's critical
    > > to the backup being successful and valid.  Not having that basic
    > > validation of a backup really just isn't ok- there's a reason
    > > pg_basebackup has that check.
    > 
    > I don't understand how this could be done without significantly
    > complicating the architecture. As I said before, -Xstream sends WAL
    > over a separate connection that is unrelated to the one running
    > BASE_BACKUP, so the base-backup connection doesn't know what to
    > include in the manifest. Now you could do something like: once all of
    > the WAL files have been fetched, the client checksums all of those and
    > sends their names and checksums to the server, which turns around and
    > puts them into the manifest, which it then sends back to the client.
    > But that is actually quite a bit of additional complexity, and it's
    > pretty strange, too, because now you have the client checksumming some
    > files and the server checksumming others. I know you mentioned a few
    > different ideas before, but I think they all kinda have some problem
    > along these lines.
    
    I've made some suggestions before, also chatted about an idea with David
    that I'll outline here.
    
    First off- I'm a bit mystified why you are saying that the base backup
    connection doesn't know what to include in the manifest regarding WAL.
    The base-backup process determines the starting position (and then even
    puts it into the backup_label that's sent to the client), and then it
    directly returns the ending position at the end of the BASE_BACKUP
    command.  Given that we do know that information, then we just need to
    get the checksums/hashes for each of the WAL files, if it's been asked
    for.  How do we know checksums or hashes have been asked for in the
    WAL streaming connection?  We can have the pg_basebackup process ask for
    that when it connects to stream the WAL that's needed.
    
    Now the only part that's a little grotty is dealing with passing the
    checksums/hashes that the WAL stream connection calculates over to the
    base backup connection to include in the manifest.  Offhand though, it
    seems like we could drop a file in archive_status for that, perhaps
    "wal_checksums.PID" or such (the PID would be that of the PG backend
    that's doing the base backup, which we'd pass to START_REPLICATION).  Of
    course, the backup process would have to check and make sure that it got
    all the needed WAL file checksums, but since it knows the end, that
    shouldn't be too bad.
    
    > I also kinda disagree with the idea that the WAL should be considered
    > an integral part of the backup. I don't know how pgbackrest does
    > things, but BART stores each backup in a separate directory without any
    > associated WAL, and then keeps all the WAL together in a different
    > directory. I imagine that people who are using continuous archiving
    > also tend to use -Xnone, or if they do backups by copying the files
    > rather than using pg_backrest, they exclude pg_wal. In fact, for
    > people with big, important databases, I'd assume that would be the
    > normal pattern. You presumably wouldn't want to keep one copy of the
    > WAL files taken during the backup with the backup itself, and a
    > separate copy in the archive.
    
    I really don't know what to say to this.  WAL is absolutely critical to
    a backup being valid.  pgBackRest doesn't have a way to *just* validate
    a backup today, unfortunately, but we're planning to support it in the
    future and we will absolutely include in that validation checking all of
    the WAL that's part of the backup.
    
    I'm fine with forgoing all of this in the -X none case, as I've said
    elsewhere.  I think it'd be great for pg_receivewal to have a way to
    validate WAL and such, but that's a clearly new feature and it's
    independent from validating a backup.
    
    As it relates to how pgBackRest stores WAL, we actually do support both
    of the options you mention, because people with big important databases
    like to be extra paranoid.  WAL can either be stored in just the
    archive, or it can be stored in both the archive and in the backup (with
    '--archive-copy').  Note that this isn't done by just grabbing whatever
    is in pg_wal at the time of the backup, as that wouldn't actually work,
    but rather by copying the necessary WAL from the archive at the end of
    the backup.
    
    We do also check all WAL that's pulled from the archive by the restore
    command, though exactly what WAL is needed isn't something we know ahead
    of time (yet, anyway..  we are working on WAL parsing code that'll
    change that by actually scanning the WAL and storing all restore points,
    starting/ending times and transaction IDs, and anything else that can be
    used as a restore target, so we can figure out exactly all WAL that's
    needed to get to a particular restore target).
    
    We actually have someone who implemented an independent tool called
    check_pgbackrest which specifically has an "archives" check, for checking
    that the WAL is in the archive.  We plan to also provide a way to ask
    pgbackrest to confirm that there's no missing WAL, and that all of the
    WAL is valid.
    
    WAL is critical to a backup that's been taken in an online manner, no
    matter where it's stored.  A backup isn't valid without the WAL that's
    needed to reach consistency.
    
    Thanks,
    
    Stephen
    
  162. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-27T21:07:42Z

    Greetings,
    
    * Andres Freund (andres@anarazel.de) wrote:
    > On 2020-03-27 14:34:19 -0400, Robert Haas wrote:
    > > I think #2 is an interesting idea and could possibly reduce the danger
    > > of user confusion on this point considerably - because, let's face it,
    > > not everyone is going to read the documentation. However, I'm having a
    > > hard time figuring out exactly what we'd print. Right now on success,
    > > unless you specify -q, you get:
    > > 
    > > [rhaas ~]$ pg_validatebackup  ~/pgslave
    > > backup successfully verified
    > > 
    > > But it feels strange and possibly confusing to me to print something like:
    > > 
    > > [rhaas ~]$ pg_validatebackup  ~/pgslave
    > > backup successfully verified (except for pg_wal)
    > 
    > You could print something like:
    > WAL necessary to restore this base backup can be validated with:
    > 
    > pg_waldump -p ~/pgslave -t tl -s backup_start_location -e backup_end_loc > /dev/null && echo true
    > 
    > Obviously that specific invocation sucks, but it'd not be hard to add an
    > option to waldump to not output anything.
    
    Interesting idea to use pg_waldump.
    
    I had suggested up-thread, and I'm still fine with, having
    pg_validatebackup scan the WAL and check the internal checksums.  I'd
    prefer an option that uses hashes to check when the user has asked for
    hashes with SHA256 or something, but at least scanning the WAL and
    making sure it validates its internal checksum (and is actually all
    there, which is pretty darn critical) would be enough to say that we're
    pretty sure the backup is valid.
    
    Thanks,
    
    Stephen
    
  163. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-27T21:44:07Z

    Greetings,
    
    * Andres Freund (andres@anarazel.de) wrote:
    > On 2020-03-27 15:20:27 -0400, Robert Haas wrote:
    > > On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
    > > > Hm. Should this warn if the directory's permissions are set too openly
    > > > (world writable?)?
    > > 
    > > I don't think so, but it's pretty clear that different people have
    > > different ideas about what the scope of this tool ought to be, even in
    > > this first version.
    > 
    > Yea. I don't have a strong opinion on this specific issue. I was mostly
    > wondering because I've repeatedly seen people restore backups with world
    > readable properties, and with that it's obviously possible for somebody
    > else to change the contents after the checksum was computed.
    
    For my 2c, at least, I don't think we need to check the directory
    permissions, but I wouldn't object to including a warning if they're set
    such that PG won't start.  I suppose +0 for "warn if they are such that
    PG won't start".
    
    Thanks,
    
    Stephen
    
  164. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T21:56:03Z

    Hi,
    
    On 2020-03-27 17:44:07 -0400, Stephen Frost wrote:
    > * Andres Freund (andres@anarazel.de) wrote:
    > > On 2020-03-27 15:20:27 -0400, Robert Haas wrote:
    > > > On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
    > > > > Hm. Should this warn if the directory's permissions are set too openly
    > > > > (world writable?)?
    > > > 
    > > > I don't think so, but it's pretty clear that different people have
    > > > different ideas about what the scope of this tool ought to be, even in
    > > > this first version.
    > > 
    > > Yea. I don't have a strong opinion on this specific issue. I was mostly
    > > wondering because I've repeatedly seen people restore backups with world
    > > readable properties, and with that it's obviously possible for somebody
    > > else to change the contents after the checksum was computed.
    > 
    > For my 2c, at least, I don't think we need to check the directory
    > permissions, but I wouldn't object to including a warning if they're set
    > such that PG won't start.  I suppose +0 for "warn if they are such that
    > PG won't start".
    
    I was thinking of that check not being just at the top-level, but in
    subdirectories too. It's easy to screw up the top and subdirectory
    permissions in different ways, e.g. when manually creating the database
    dir and then restoring a data directory directly into that.  IIRC
    postmaster doesn't check that at start.
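
A minimal sketch of the kind of recursive permission check discussed here, assuming the goal is only to report directories that are group- or world-writable anywhere under a restored data directory. The function name `find_loose_directories` is hypothetical, and this is not something the patch under discussion does.

```python
import os
import stat


def find_loose_directories(root):
    """Return (path, mode) pairs for directories under root (including root
    itself) that are writable by group or others."""
    loose = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = stat.S_IMODE(os.lstat(dirpath).st_mode)
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            loose.append((dirpath, mode))
    return loose
```

A caller could print a warning for each returned path. Whether a warning like that belongs in the tool at all is exactly the scope question raised above.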
    
    
    Greetings,
    
    Andres Freund
    
    
    
    
  165. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T22:00:40Z

    Hi,
    
    On 2020-03-27 17:07:42 -0400, Stephen Frost wrote:
    > I had suggested up-thread, and I'm still fine with, having
    > pg_validatebackup scan the WAL and check the internal checksums.  I'd
    > prefer an option that uses hashes to check when the user has asked for
    > hashes with SHA256 or something, but at least scanning the WAL and
    > making sure it validates its internal checksum (and is actually all
    > there, which is pretty darn critical) would be enough to say that we're
    > pretty sure the backup is valid.
    
    I'd say that actually parsing the WAL will give you a lot higher
    confidence than verifying a sha256 for each file. There's plenty of ways
    to screw up the pg_wal on the source server (I've seen several
    restore_commands doing so, particularly when eagerly fetching). Sure,
    it'll not help against an attacker, but I'm not sure I see the threat
    model.
    
    There's imo a cost argument against doing WAL verification by reading
    it, but that'd mostly be a factor when comparing against a faster
    whole-file checksum.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  166. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-27T22:07:23Z

    Hi,
    
    On 2020-03-27 16:57:46 -0400, Stephen Frost wrote:
    > I really don't know what to say to this.  WAL is absolutely critical to
    > a backup being valid.  pgBackRest doesn't have a way to *just* validate
    > a backup today, unfortunately, but we're planning to support it in the
    > future and we will absolutely include in that validation checking all of
    > the WAL that's part of the backup.
    
    Could you please address the fact that just about everybody uses base
    backups + later WAL to have a short data loss window? Integrating the
    WAL files necessary to make the base backup consistent doesn't achieve
    much if we can't verify the WAL files afterwards. And fairly obviously
    pg_basebackup can't do much about WAL created after its invocation.
    
    Given that we need something separate to address that "verification
    hole", I don't see why it's useful to have a special case solution (or
    rather multiple ones, for stream and fetch) inside pg_basebackup.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  167. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-27T22:09:10Z

    Greetings,
    
    * Andres Freund (andres@anarazel.de) wrote:
    > On 2020-03-27 17:44:07 -0400, Stephen Frost wrote:
    > > * Andres Freund (andres@anarazel.de) wrote:
    > > > On 2020-03-27 15:20:27 -0400, Robert Haas wrote:
    > > > > On Fri, Mar 27, 2020 at 2:29 AM Andres Freund <andres@anarazel.de> wrote:
    > > > > > Hm. Should this warn if the directory's permissions are set too openly
    > > > > > (world writable?)?
    > > > > 
    > > > > I don't think so, but it's pretty clear that different people have
    > > > > different ideas about what the scope of this tool ought to be, even in
    > > > > this first version.
    > > > 
    > > > Yea. I don't have a strong opinion on this specific issue. I was mostly
    > > > wondering because I've repeatedly seen people restore backups with world
    > > > readable properties, and with that it's obviously possible for somebody
    > > > else to change the contents after the checksum was computed.
    > > 
    > > For my 2c, at least, I don't think we need to check the directory
    > > permissions, but I wouldn't object to including a warning if they're set
    > > such that PG won't start.  I suppose +0 for "warn if they are such that
    > > PG won't start".
    > 
    > I was thinking of that check not being just at the top-level, but in
    > subdirectories too. It's easy to screw up the top and subdirectory
    > permissions in different ways, e.g. when manually creating the database
    > dir and then restoring a data directory directly into that.  IIRC
    > postmaster doesn't check that at start.
    
    Yeah, I'm pretty sure we don't check that at postmaster start..  which
    also means that we'll start up just fine even if the perms on
    subdirectories are odd or wrong, unless maybe we end up in a really odd
    state where a directory is 000'd or something.
    
    Of course..  this is all a mess when it comes to pg_basebackup, really,
    as previously discussed elsewhere, because what permissions and such you
    end up with actually depends on what *format* you use with
    pg_basebackup- it's different between 'tar' format and 'plain' format.
    That is, if you use 'tar' format, and then actually use 'tar' to
    extract, you get one set of privs, but if you use 'plain', you get
    something different.
    
    I mean..  pgBackRest sets all perms to whatever is in the manifest on
    restore (or delta), but this patch doesn't include the permissions on
    files, or ownership (something pgBackRest also tries to set, if
    possible, on restore), does it...?  Doesn't look like it on a quick
    look.  So if we want to compare to pgBackRest then, yes, we should
    include the permissions in the manifest and we should check that
    everything in the manifest matches what's on the filesystem.
    
    I don't think we should just compare all permissions or ownership with
    some arbitrary idea of what we think they should be, even though if you
    use pg_basebackup in 'plain' format, you actually end up with
    differences, today, from what the source system has.  In my view, that
    should actually be fixed, to the extent possible.
    
    Thanks,
    
    Stephen
    
  168. Re: backup manifests

    Alvaro Herrera <alvherre@2ndquadrant.com> — 2020-03-27T22:17:00Z

    On 2020-Mar-27, Stephen Frost wrote:
    
    > I don't think we should just compare all permissions or ownership with
    > some arbitrary idea of what we think they should be, even though if you
    > use pg_basebackup in 'plain' format, you actually end up with
    > differences, today, from what the source system has.  In my view, that
    > should actually be fixed, to the extent possible.
    
    I posted some thoughts about this at
    https://www.postgresql.org/message-id/20190904201117.GA12986%40alvherre.pgsql
    I didn't get time to work on that myself.
    
    -- 
    Álvaro Herrera                https://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  169. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-27T22:24:31Z

    Greetings,
    
    * Andres Freund (andres@anarazel.de) wrote:
    > On 2020-03-27 16:57:46 -0400, Stephen Frost wrote:
    > > I really don't know what to say to this.  WAL is absolutely critical to
    > > a backup being valid.  pgBackRest doesn't have a way to *just* validate
    > > a backup today, unfortunately, but we're planning to support it in the
    > > future and we will absolutely include in that validation checking all of
    > > the WAL that's part of the backup.
    > 
    > Could you please address the fact that just about everybody uses base
    > backups + later WAL to have a short data loss window? Integrating the
    > WAL files necessary to make the base backup consistent doesn't achieve
    > much if we can't verify the WAL files afterwards. And fairly obviously
    > pg_basebackup can't do much about WAL created after its invocation.
    
    I feel like we have very different ideas about what "just about
    everybody" does here.  In my view, folks use pg_basebackup because it's
    easy and they can create self-contained backups that include all the WAL
    needed to get the backup up and running again and they don't typically
    care about PITR all that much.  Folks who care about PITR use something
    that manages WAL for them, which pg_basebackup and pg_receivewal really
    don't do and it's not easy to add scripting around them to figure out
    what WAL is needed for what backup, etc.
    
    If we didn't think that the ability to create a self-contained backup
    was useful, it sure seems odd that we've done a lot to make that work
    (having both fetch and stream modes for it) and that it's the default.
    
    > Given that we need something separate to address that "verification
    > hole", I don't see why it's useful to have a special case solution (or
    > rather multiple ones, for stream and fetch) inside pg_basebackup.
    
    Well, the proposal up-thread would end up with almost zero changes to
    pg_basebackup itself, but, yes, there'd be changes to BASE_BACKUP and
    different ones for START_REPLICATION to support getting the WAL
    checksums into the manifest.
    
    Thanks,
    
    Stephen
    
  170. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-27T22:33:51Z

    On 3/27/20 6:07 PM, Andres Freund wrote:
    > Hi,
    > 
    > On 2020-03-27 16:57:46 -0400, Stephen Frost wrote:
    >> I really don't know what to say to this.  WAL is absolutely critical to
    >> a backup being valid.  pgBackRest doesn't have a way to *just* validate
    >> a backup today, unfortunately, but we're planning to support it in the
    >> future and we will absolutely include in that validation checking all of
    >> the WAL that's part of the backup.
    > 
    > Could you please address the fact that just about everybody uses base
    > backups + later WAL to have a short data loss window? Integrating the
    > WAL files necessary to make the base backup consistent doesn't achieve
    > much if we can't verify the WAL files afterwards. And fairly obviously
    > pg_basebackup can't do much about WAL created after its invocation.
    > 
    > Given that we need something separate to address that "verification
    > hole", I don't see why it's useful to have a special case solution (or
    > rather multiple ones, for stream and fetch) inside pg_basebackup.
    
    There's a pretty big difference between not being able to play forward 
    to the end of WAL and not being able to get the backup to restore to 
    consistency at all.
    
    The WAL that is generated during the backup has special 
    importance. Without it you have no backup at all.  It's the difference 
    between *some* data loss and *total* data loss.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  171. Re: backup manifests

    Bruce Momjian <bruce@momjian.us> — 2020-03-27T22:36:17Z

    On Thu, Mar 26, 2020 at 12:34:52PM -0400, Stephen Frost wrote:
    > * Robert Haas (robertmhaas@gmail.com) wrote:
    > > This is where I feel like I'm trying to make decisions in a vacuum. If
    > > we had a few more people weighing in on the thread on this point, I'd
    > > be happy to go with whatever the consensus was. If most people think
    > > having both --no-manifest (suppressing the manifest completely) and
    > > --manifest-checksums=none (suppressing only the checksums) is useless
    > > and confusing, then sure, let's rip the latter one out. If most people
    > > like the flexibility, let's keep it: it's already implemented and
    > > tested. But I hate to base the decision on what one or two people
    > > think.
    > 
    > I'm frustrated at the lack of involvement from others also.
    
    Well, the topic of backup manifests feels like it has generated a lot of
    bickering emails, and people don't want to spend their time dealing with
    that.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EnterpriseDB                             https://enterprisedb.com
    
    + As you are, so once was I.  As I am, so you will be. +
    +                      Ancient Roman grave inscription +
    
    
    
    
  172. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-27T22:38:33Z

    Greetings,
    
    On Fri, Mar 27, 2020 at 18:36 Bruce Momjian <bruce@momjian.us> wrote:
    
    > On Thu, Mar 26, 2020 at 12:34:52PM -0400, Stephen Frost wrote:
    > > * Robert Haas (robertmhaas@gmail.com) wrote:
    > > > This is where I feel like I'm trying to make decisions in a vacuum. If
    > > > we had a few more people weighing in on the thread on this point, I'd
    > > > be happy to go with whatever the consensus was. If most people think
    > > > having both --no-manifest (suppressing the manifest completely) and
    > > > --manifest-checksums=none (suppressing only the checksums) is useless
    > > > and confusing, then sure, let's rip the latter one out. If most people
    > > > like the flexibility, let's keep it: it's already implemented and
    > > > tested. But I hate to base the decision on what one or two people
    > > > think.
    > >
    > > I'm frustrated at the lack of involvement from others also.
    >
    > Well, the topic of backup manifests feels like it has generated a lot of
    > bickering emails, and people don't want to spend their time dealing with
    > that.
    
    
    I’d like to not also.  I suppose it’s just an area that I’m particularly
    concerned with that allows me to overcome that. Backups are important to me.
    
    Thanks,
    
    Stephen
    
    
  173. Re: backup manifests

    Bruce Momjian <bruce@momjian.us> — 2020-03-27T22:39:46Z

    On Fri, Mar 27, 2020 at 06:38:33PM -0400, Stephen Frost wrote:
    > Greetings,
    > 
    > On Fri, Mar 27, 2020 at 18:36 Bruce Momjian <bruce@momjian.us> wrote:
    > 
    >     On Thu, Mar 26, 2020 at 12:34:52PM -0400, Stephen Frost wrote:
    >     > * Robert Haas (robertmhaas@gmail.com) wrote:
    >     > > This is where I feel like I'm trying to make decisions in a vacuum. If
    >     > > we had a few more people weighing in on the thread on this point, I'd
    >     > > be happy to go with whatever the consensus was. If most people think
    >     > > having both --no-manifest (suppressing the manifest completely) and
    >     > > --manifest-checksums=none (suppressing only the checksums) is useless
    >     > > and confusing, then sure, let's rip the latter one out. If most people
    >     > > like the flexibility, let's keep it: it's already implemented and
    >     > > tested. But I hate to base the decision on what one or two people
    >     > > think.
    >     >
    >     > I'm frustrated at the lack of involvement from others also.
    > 
    >     Well, the topic of backup manifests feels like it has generated a lot of
    >     bickering emails, and people don't want to spend their time dealing with
    >     that.
    > 
    > 
    > I’d like to not also.  I suppose it’s just an area that I’m particularly
    > concerned with that allows me to overcome that. Backups are important to me.
    
    The big question is whether the discussion _needs_ to be that way.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EnterpriseDB                             https://enterprisedb.com
    
    + As you are, so once was I.  As I am, so you will be. +
    +                      Ancient Roman grave inscription +
    
    
    
    
  174. Re: backup manifests

    Noah Misch <noah@leadboat.com> — 2020-03-29T03:40:10Z

    On Fri, Mar 27, 2020 at 01:53:54PM -0400, Robert Haas wrote:
    > - Replace a doc paragraph about the advantages and disadvantages of
    > CRC-32C with one by Stephen Frost, with a slight change by me that I
    > thought made it sound more grammatical.
    
    Defaulting to CRC-32C seems prudent to me:
    
    - As Andres Freund said, SHA-512 is slow relative to storage now available.
      Since gzip is a needlessly-slow choice for backups (or any application that
      copies the compressed data just a few times), comparison to "gzip -6" speed
      is immaterial.
    
    - While I'm sure some other fast hash would be a superior default, introducing
      a new algorithm is a bikeshed, as you said.  This design makes it easy,
      technically, for someone to introduce a new algorithm later.  CRC-32C is not
      catastrophically unfit for 1GiB files.
    
    - Defaulting to SHA-512 would, in the absence of a WAL archive that also uses
      a cryptographic hash function, give a false sense of having achieved some
      coherent cryptographic goal.  With the CRC-32C default, WAL and the rest get
      similar protection.  I'm discounting the case of using BASE_BACKUP without a
      WAL archive, because I expect little intersection between sites "worried
      enough to hash everything" and those "not worried enough to use an archive".
      (On the other hand, the program that manages the WAL archive can reasonably
      own hashing base backups; putting ownership in the server isn't achieving
      much extra.)
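
To make the two options being compared concrete, here is a small sketch that computes both a CRC-32C and a SHA-512 over the same bytes. The bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78) is a slow reference implementation for illustration only; it is not how PostgreSQL computes it. The helper name `digests` is hypothetical.

```python
import hashlib

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c(data, crc=0):
    """Bitwise CRC-32C. Pass a previous result as crc to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def digests(data):
    """Return both checksums as hex strings: 8 hex digits vs. 128."""
    return {
        "crc32c": format(crc32c(data), "08x"),
        "sha512": hashlib.sha512(data).hexdigest(),
    }
```

The CRC detects accidental corruption with a 32-bit check value. The SHA-512 digest is sixteen times longer and is designed to resist deliberate tampering, which is the distinction the bullets above turn on.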
    
    > + <refnamediv>
    > +  <refname>pg_validatebackup</refname>
    > +  <refpurpose>verify the integrity of a base backup of a
    > +  <productname>PostgreSQL</productname> cluster</refpurpose>
    > + </refnamediv>
    
    > +    <listitem>
    > +      <para>
    > +        <literal>pg_wal</literal> is ignored because WAL files are sent
    > +        separately from the backup, and are therefore not described by the
    > +        backup manifest.
    > +      </para>
    > +    </listitem>
    
    Stephen Frost mentioned that a backup could pass validation even if
    pg_basebackup were killed after writing the base backup and before finishing
    the writing of pg_wal.  One might avoid that by simply writing the manifest to
    a temporary name and renaming it to the final name after populating pg_wal.
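
A rough sketch of that write-then-rename ordering, assuming the client writes the manifest itself and that `populate_wal` is a caller-supplied step that finishes writing pg_wal. The temporary file name `backup_manifest.tmp` and the function name `publish_manifest` are hypothetical.

```python
import os


def publish_manifest(backup_dir, manifest_bytes, populate_wal):
    """Write the manifest under a temporary name, finish pg_wal, then rename.

    If populate_wal() raises, or the process is killed before the rename,
    no file named backup_manifest exists, so validation cannot succeed.
    """
    tmp_path = os.path.join(backup_dir, "backup_manifest.tmp")
    final_path = os.path.join(backup_dir, "backup_manifest")
    with open(tmp_path, "wb") as f:
        f.write(manifest_bytes)
        f.flush()
        os.fsync(f.fileno())
    populate_wal()
    os.replace(tmp_path, final_path)  # atomic rename within one filesystem
    return final_path
```

The rename is the only step that makes the backup look complete, so an interrupted run leaves at most a leftover temporary file.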
    
    What do you think of having the verification process also call pg_waldump to
    validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
    
    I think this functionality doesn't belong in its own program.  If you suspect
    pg_basebackup or pg_restore will eventually gain the ability to merge
    incremental backups into a recovery-ready base backup, I would put the
    functionality in that program.  Otherwise, I would put it in pg_checksums.
    For me, part of the friction here is that the program description indicates
    general verification, but the actual functionality merely checks hashes on a
    directory tree that happens to represent a PostgreSQL base backup.
    
    > +		parse->pathname = palloc(raw_length + 1);
    
    I don't see this freed anywhere; is it?  (It's useful to make peak memory
    consumption not grow in proportion to the number of files backed up.)
    
    [This message is not a full code review.]
    
    
    
    
  175. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-30T00:33:51Z

    On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
    > I don't like having a file format that's intended to be used by external
    > tools too that's undocumented except for code that assembles it in a
    > piecemeal fashion.  Do you mean in a follow-on patch this release, or
    > later? I don't have a problem with the former.
    
    This release. I'm happy to work on that as soon as this gets
    committed, assuming it gets committed.
    
    > I do find it to be circular. I think we mostly need a paragraph or two
    > somewhere that explains on a higher level what the point of verifying
    > base backups is and what is verified.
    
    Fair enough.
    
    > FWIW, I was thinking of backup_manifest.checksum potentially being
    > desirable for another reason: The need to embed the checksum inside the
    > document imo adds a fair bit of rigidity to the file format. See
    
    Well, David Steele suggested this approach. I didn't particularly like
    it, but nobody showed up to agree with me or propose anything
    different, so here we are. I don't think it's the end of the world.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  176. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-30T00:42:35Z

    On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <noah@leadboat.com> wrote:
    > Stephen Frost mentioned that a backup could pass validation even if
    > pg_basebackup were killed after writing the base backup and before finishing
    > the writing of pg_wal.  One might avoid that by simply writing the manifest to
    > a temporary name and renaming it to the final name after populating pg_wal.
    
    Huh, that's an idea. I'll have a look at the code and see what would
    be involved.
    
    > What do you think of having the verification process also call pg_waldump to
    > validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
    
    I don't love calls to external binaries, but I think the thing that
    really bothers me is that pg_waldump is practically bound to terminate
    with an error, because the last WAL segment will end with a partial
    record. For the same reason, I think there's really no such thing as
    validating a single WAL file. I suppose you'd need to know the exact
    start and end locations for a minimal WAL replay and check that all
    records between those LSNs appear OK, ignoring any apparent problems
    after the minimum ending point, or at least ignoring any problems due
    to an incomplete record in the last file. We don't have a tool for
    that currently, and I don't think I can write one this week. Or at
    least, not a good one.
    
    > I think this functionality doesn't belong in its own program.  If you suspect
    > pg_basebackup or pg_restore will eventually gain the ability to merge
    > incremental backups into a recovery-ready base backup, I would put the
    > functionality in that program.  Otherwise, I would put it in pg_checksums.
    > For me, part of the friction here is that the program description indicates
    > general verification, but the actual functionality merely checks hashes on a
    > directory tree that happens to represent a PostgreSQL base backup.
    
    Suraj's original patch made this part of pg_basebackup, but I didn't
    really like that, because I wanted it to have its own set of options.
    I still think all the options I've added are pretty useful ones, and I
    can think of other things somebody might want to do. It feels very
    uncomfortable to make pg_basebackup, or pg_checksums, take either
    options from set A and do thing X, or options from set B and do thing
    Y. But it feels clear that the name pg_validatebackup is not going
    over very well with anyone. I think I should rename it to
    pg_validatemanifest.
    
    > > +             parse->pathname = palloc(raw_length + 1);
    >
    > I don't see this freed anywhere; is it?  (It's useful to make peak memory
    > consumption not grow in proportion to the number of files backed up.)
    
    We need the hash table to remain populated for the whole run time of
    the tool, because we're essentially doing a full join of the actual
    directory contents against the manifest contents. That's a bit
    unfortunate but it doesn't seem simple to improve. I think the only
    people who are really going to suffer are people who have an enormous
    pile of empty or nearly-empty relations. People who have large
    databases for the normal reason - i.e. a reasonable number of tables
    that hold a lot of data - will have manifests of very manageable size.
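
    The full join described above can be sketched roughly as follows.
    This is an illustration only: ManifestEntry and both functions are
    hypothetical names, and a linear search stands in for the hash table
    lookup that the tool actually uses. It does show why every entry has
    to stay in memory until the directory scan is finished.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified manifest entry (not the struct in the patch). */
typedef struct
{
    const char *pathname;
    bool        matched;    /* set once the file has been seen on disk */
} ManifestEntry;

/* Linear search standing in for the hash table lookup. */
ManifestEntry *
manifest_lookup(ManifestEntry *entries, size_t nentries, const char *pathname)
{
    for (size_t i = 0; i < nentries; i++)
    {
        if (strcmp(entries[i].pathname, pathname) == 0)
            return &entries[i];
    }
    return NULL;
}

/*
 * Full join of the manifest against the files found on disk.
 * *extra counts on-disk files that the manifest does not list;
 * *missing counts manifest entries that were never seen on disk.
 */
void
join_manifest_with_disk(ManifestEntry *entries, size_t nentries,
                        const char **disk_files, size_t nfiles,
                        size_t *extra, size_t *missing)
{
    *extra = 0;
    *missing = 0;

    /* Probe the manifest once per on-disk file. */
    for (size_t i = 0; i < nfiles; i++)
    {
        ManifestEntry *m = manifest_lookup(entries, nentries, disk_files[i]);

        if (m == NULL)
            (*extra)++;
        else
            m->matched = true;
    }

    /* Entries that were never matched are missing from the backup. */
    for (size_t i = 0; i < nentries; i++)
    {
        if (!entries[i].matched)
            (*missing)++;
    }
}
```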
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  177. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-30T00:47:40Z

    On Fri, Mar 27, 2020 at 4:02 PM David Steele <david@pgmasters.net> wrote:
    > I prefer to validate the size and checksum in the same pass, but I'm not
    > sure it's that big a deal.  If the backup is being corrupted under the
    > validate process that would also apply to files that had already been
    > validated.
    
    I did it like this because I thought that in typical scenarios it
    would be likely to produce useful results more quickly. For instance,
    suppose that you forget to restore the tablespace directories, and
    just get the main $PGDATA directory. Well, if you do it all in one
    pass, you might spend a long time checksumming things before you
    realize that some files are completely missing. I thought it would be
    useful to complain about files that are extra or missing or the wrong
    size FIRST, because that only requires us to stat() each file, and
    only after that do the comparatively expensive checksumming step that
    requires us to read the entire contents of each file. Granted, unless
    you use --exit-on-error, you're going to get all the complaints
    eventually anyway, but you might use that option, or you might hit ^C
    when you start to see a slew of complaints popping out.
    
    Maybe that was the wrong idea, but I thought people would like the
    idea of running cheaper checks first. I wasn't worried about
    concurrent modification of the backup because then you're super-hosed
    no matter what.
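
    As a rough sketch of the cheap-checks-first ordering, assuming a
    POSIX stat(): the names below are hypothetical and the second pass
    only counts bytes where a real tool would feed each chunk to its
    checksum routine.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical result codes for the cheap first pass. */
typedef enum
{
    SIZE_CHECK_OK,
    SIZE_CHECK_MISSING,
    SIZE_CHECK_WRONG_SIZE
} SizeCheckResult;

/*
 * Pass 1: compare the on-disk size with the size recorded in the
 * manifest. This needs only stat(), so it is cheap even for huge files.
 */
SizeCheckResult
check_file_size(const char *path, off_t expected_size)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return SIZE_CHECK_MISSING;
    if (sb.st_size != expected_size)
        return SIZE_CHECK_WRONG_SIZE;
    return SIZE_CHECK_OK;
}

/*
 * Pass 2: run only after pass 1 has covered every file. Reads the whole
 * file and returns the number of bytes read, or -1 if it cannot be
 * opened. Comparing that count with the manifest size again would catch
 * a case where the data ends at a different place than stat() reported.
 */
long
read_whole_file(const char *path)
{
    FILE   *fp = fopen(path, "rb");
    char    buf[8192];
    long    total = 0;
    size_t  n;

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        total += (long) n;
    fclose(fp);
    return total;
}
```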
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  178. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-30T00:48:58Z

    On 3/29/20 8:33 PM, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
    > 
    >> FWIW, I was thinking of backup_manifest.checksum potentially being
    >> desirable for another reason: The need to embed the checksum inside the
    >> document imo adds a fair bit of rigidity to the file format. See
    > 
    > Well, David Steele suggested this approach. I didn't particularly like
    > it, but nobody showed up to agree with me or propose anything
    > different, so here we are. I don't think it's the end of the world.
    
    I prefer the embedded checksum even though it is a pain. It's a lot less 
    likely to go missing.
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  179. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-30T00:54:41Z

    On 3/29/20 8:42 PM, Robert Haas wrote:
    > On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <noah@leadboat.com> wrote:
    >> I don't see this freed anywhere; is it?  (It's useful to make peak memory
    >> consumption not grow in proportion to the number of files backed up.)
    > 
    > We need the hash table to remain populated for the whole run time of
    > the tool, because we're essentially doing a full join of the actual
    > directory contents against the manifest contents. That's a bit
    > unfortunate but it doesn't seem simple to improve. I think the only
    > people who are really going to suffer are people who have an enormous
    > pile of empty or nearly-empty relations. People who have large
    > databases for the normal reason - i.e. a reasonable number of tables
    > that hold a lot of data - will have manifests of very manageable size.
    
    +1
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  180. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-30T00:59:19Z

    Hi,
    
    On 2020-03-29 20:47:40 -0400, Robert Haas wrote:
    > Maybe that was the wrong idea, but I thought people would like the
    > idea of running cheaper checks first. I wasn't worried about
    > concurrent modification of the backup because then you're super-hosed
    > no matter what.
    
    I do like that approach.
    
    To be clear: I'm suggesting the additional crosscheck not because I'm
    concerned about concurrent modifications, but because I've seen
    filesystem per-inode metadata and the actual data / extent-tree
    differ, leading to EOF being reported while reading at a different
    place than what the size via stat() would indicate.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  181. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-30T01:05:17Z

    On 3/29/20 8:47 PM, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 4:02 PM David Steele <david@pgmasters.net> wrote:
    >> I prefer to validate the size and checksum in the same pass, but I'm not
    >> sure it's that big a deal.  If the backup is being corrupted under the
    >> validate process that would also apply to files that had already been
    >> validated.
    > 
    > I did it like this because I thought that in typical scenarios it
    > would be likely to produce useful results more quickly. For instance,
    > suppose that you forget to restore the tablespace directories, and
    > just get the main $PGDATA directory. Well, if you do it all in one
    > pass, you might spend a long time checksumming things before you
    > realize that some files are completely missing. I thought it would be
    > useful to complain about files that are extra or missing or the wrong
    > size FIRST, because that only requires us to stat() each file, and
    > only after that do the comparatively expensive checksumming step that
    > requires us to read the entire contents of each file. Granted, unless
    > you use --exit-on-error, you're going to get all the complaints
    > eventually anyway, but you might use that option, or you might hit ^C
    > when you start to see a slew of complaints popping out.
    
    Yeah, that seems reasonable.
    
    In our case backups are nearly always compressed and/or encrypted so 
    even checking the original size is a bit of work. Getting the checksum 
    at the same time seems like an obvious win.
    
    Currently we don't have a separate validate command outside of restore 
    but when we do we'll consider doing a pass to check for file presence 
    (and size when possible) first. Thanks!
    
    > I wasn't worried about
    > concurrent modification of the backup because then you're super-hosed
    > no matter what.
    
    Really, really, super-hosed.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  182. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-30T01:07:40Z

    Hi,
    
    On 2020-03-29 20:42:35 -0400, Robert Haas wrote:
    > > What do you think of having the verification process also call pg_waldump to
    > > validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
    > 
    > I don't love calls to external binaries, but I think the thing that
    > really bothers me is that pg_waldump is practically bound to terminate
    > with an error, because the last WAL segment will end with a partial
    > record.
    
    I don't think that's the case here. You should know the last required
    record, which should allow you to specify the precise end for
    pg_waldump. If it errors out reading to that point, we'd be in trouble.
    
    
    > For the same reason, I think there's really no such thing as
    > validating a single WAL file. I suppose you'd need to know the exact
    > start and end locations for a minimal WAL replay and check that all
    > records between those LSNs appear OK, ignoring any apparent problems
    > after the minimum ending point, or at least ignoring any problems due
    > to an incomplete record in the last file. We don't have a tool for
    > that currently, and I don't think I can write one this week. Or at
    > least, not a good one.
    
    pg_waldump -s / -e?
    
    
    > > > +             parse->pathname = palloc(raw_length + 1);
    > >
    > > I don't see this freed anywhere; is it?  (It's useful to make peak memory
    > > consumption not grow in proportion to the number of files backed up.)
    > 
    > We need the hash table to remain populated for the whole run time of
    > the tool, because we're essentially doing a full join of the actual
    > directory contents against the manifest contents. That's a bit
    > unfortunate but it doesn't seem simple to improve. I think the only
    > people who are really going to suffer are people who have an enormous
    > pile of empty or nearly-empty relations. People who have large
    > databases for the normal reason - i.e. a reasonable number of tables
    > that hold a lot of data - will have manifests of very manageable size.
    
    Given that that's a pre-existing issue - at a significantly larger scale
    imo - e.g. for pg_dump (even in the --schema-only case), and that there
    are tons of backend side issues with lots of relations too, I think
    that's fine.
    
    You could of course implement something merge-join like, and implement
    the sorted input via a disk-based sort. But that's a lot of work (good
    luck making tuplesort work in the frontend...). So I'd not go there
    unless there's a lot of evidence this is a serious practical issue.
    
    If we find this uses too much memory, I think we'd be better off
    condensing pathnames into either fewer allocations, or a RelFileNode as
    part of the struct (with a fallback to string for other types of
    files). But I'd also not go there for now.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  183. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-30T01:23:06Z

    On 3/29/20 9:07 PM, Andres Freund wrote:
    > On 2020-03-29 20:42:35 -0400, Robert Haas wrote:
    >>> What do you think of having the verification process also call pg_waldump to
    >>> validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
    >>
    >> I don't love calls to external binaries, but I think the thing that
    >> really bothers me is that pg_waldump is practically bound to terminate
    >> with an error, because the last WAL segment will end with a partial
    >> record.
    > 
    > I don't think that's the case here. You should know the last required
    > record, which should allow to specify the precise end for pg_waldump. If
    > it errors out reading to that point, we'd be in trouble.
    
    Exactly. All WAL generated during the backup should read fine with 
    pg_waldump or there is a problem.
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  184. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-30T02:08:14Z

    Hi,
    
    On 2020-03-29 21:23:06 -0400, David Steele wrote:
    > On 3/29/20 9:07 PM, Andres Freund wrote:
    > > On 2020-03-29 20:42:35 -0400, Robert Haas wrote:
    > > > > What do you think of having the verification process also call pg_waldump to
    > > > > validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
    > > > 
    > > > I don't love calls to external binaries, but I think the thing that
    > > > really bothers me is that pg_waldump is practically bound to terminate
    > > > with an error, because the last WAL segment will end with a partial
    > > > record.
    > > 
    > > I don't think that's the case here. You should know the last required
    > > record, which should allow to specify the precise end for pg_waldump. If
    > > it errors out reading to that point, we'd be in trouble.
    > 
    > Exactly. All WAL generated during the backup should read fine with
    > pg_waldump or there is a problem.
    
    See the attached minimal prototype for what I am thinking of.
    
    This would not correctly handle the case where the timeline changes
    while taking a base backup. But I'm not sure that'd be all that serious
    a limitation for now?
    
    I'd personally not want to use a base backup that included a timeline
    switch...
    
    Greetings,
    
    Andres Freund
    
  185. Re: backup manifests

    Noah Misch <noah@leadboat.com> — 2020-03-30T05:58:54Z

    On Sun, Mar 29, 2020 at 08:42:35PM -0400, Robert Haas wrote:
    > On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <noah@leadboat.com> wrote:
    > > I think this functionality doesn't belong in its own program.  If you suspect
    > > pg_basebackup or pg_restore will eventually gain the ability to merge
    > > incremental backups into a recovery-ready base backup, I would put the
    > > functionality in that program.  Otherwise, I would put it in pg_checksums.
    > > For me, part of the friction here is that the program description indicates
    > > general verification, but the actual functionality merely checks hashes on a
    > > directory tree that happens to represent a PostgreSQL base backup.
    > 
    > Suraj's original patch made this part of pg_basebackup, but I didn't
    > really like that, because I wanted it to have its own set of options.
    > I still think all the options I've added are pretty useful ones, and I
    > can think of other things somebody might want to do. It feels very
    > uncomfortable to make pg_basebackup, or pg_checksums, take either
    > options from set A and do thing X, or options from set B and do thing
    > Y.
    
    pg_checksums does already have that property, for what it's worth.  (More
    specifically, certain options dictate the mode, and it reports an error if
    another option is incompatible with the mode.)
    
    > But it feels clear that the name pg_validatebackup is not going
    > over very well with anyone. I think I should rename it to
    > pg_validatemanifest.
    
    Between those two, I would use "pg_validatebackup" if there's a fair chance it
    will end up doing the pg_waldump check.  Otherwise, I would use
    "pg_validatemanifest".  I still most prefer delivering this as a mode of an
    existing program.
    
    > > > +             parse->pathname = palloc(raw_length + 1);
    > >
    > > I don't see this freed anywhere; is it?  (It's useful to make peak memory
    > > consumption not grow in proportion to the number of files backed up.)
    > 
    > We need the hash table to remain populated for the whole run time of
    > the tool, because we're essentially doing a full join of the actual
    > directory contents against the manifest contents. That's a bit
    > unfortunate but it doesn't seem simple to improve. I think the only
    > people who are really going to suffer are people who have an enormous
    > pile of empty or nearly-empty relations. People who have large
    > databases for the normal reason - i.e. a reasonable number of tables
    > that hold a lot of data - will have manifests of very manageable size.
    
    Okay.
    
    
    
    
  186. Re: backup manifests

    Amit Kapila <amit.kapila16@gmail.com> — 2020-03-30T06:24:33Z

    On Mon, Mar 30, 2020 at 11:28 AM Noah Misch <noah@leadboat.com> wrote:
    >
    > On Sun, Mar 29, 2020 at 08:42:35PM -0400, Robert Haas wrote:
    >
    > > But it feels clear that the name pg_validatebackup is not going
    > > over very well with anyone. I think I should rename it to
    > > pg_validatemanifest.
    >
    > Between those two, I would use "pg_validatebackup" if there's a fair chance it
    > will end up doing the pg_waldump check.  Otherwise, I would use
    > "pg_validatemanifest".
    >
    
    +1.
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  187. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-30T18:35:40Z

    On Sun, Mar 29, 2020 at 10:08 PM Andres Freund <andres@anarazel.de> wrote:
    > See the attached minimal prototype for what I am thinking of.
    >
    > This would not correctly handle the case where the timeline changes
    > while taking a base backup. But I'm not sure that'd be all that serious
    > a limitation for now?
    >
    > I'd personally not want to use a base backup that included a timeline
    > switch...
    
    Interesting concept. I've never (or almost never) used the -s and -e
    options to pg_waldump, so I didn't think about using those. I think
    having a --just-parse option to pg_waldump is a good idea, though
    maybe not with that name e.g. we could call it --quiet.
    
    It is less obvious to me what to do about all that as it pertains to
    the current patch. If we want pg_validatebackup to run pg_waldump in
    that mode or print out a hint about how to run pg_waldump in that
    mode, it would need to obtain the relevant LSNs. I guess that would
    require reading the backup_label file. It's not clear to me what we
    would do if the backup crosses a timeline switch, assuming that's even
    a case pg_basebackup allows. If we don't want to do anything in
    pg_validatebackup automatically but just want to document this as a
    possible technique, we could finesse that problem with some
    weasel-wording.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  188. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-30T18:59:44Z

    Hi,
    
    On 2020-03-30 14:35:40 -0400, Robert Haas wrote:
    > On Sun, Mar 29, 2020 at 10:08 PM Andres Freund <andres@anarazel.de> wrote:
    > > See the attached minimal prototype for what I am thinking of.
    > >
    > > This would not correctly handle the case where the timeline changes
    > > while taking a base backup. But I'm not sure that'd be all that serious
    > > a limitation for now?
    > >
    > > I'd personally not want to use a base backup that included a timeline
    > > switch...
    >
    > Interesting concept. I've never (or almost never) used the -s and -e
    > options to pg_waldump, so I didn't think about using those.
    
    Oh - it's how I use it most of the time when investigating a specific
    problem. I just about always use -s, and often -e. Besides just reducing
    the logging output, and avoiding spurious errors, it makes it a lot
    easier to iteratively expand the logging for records that are
    problematic for the case at hand.
    
    
    > I think
    > having a --just-parse option to pg_waldump is a good idea, though
    > maybe not with that name e.g. we could call it --quiet.
    
    Yea, I didn't like the option's name. It's just the first thing that
    came to mind.
    
    
    > It is less obvious to me what to do about all that as it pertains to
    > the current patch.
    
    FWIW, I personally think we can live with this not validating WAL in the
    first release. But I also think it'd be within reach to do better and
    allow for WAL verification.
    
    
    > If we want pg_validatebackup to run pg_waldump in that mode or print
    > out a hint about how to run pg_waldump in that mode, it would need to
    > obtain the relevant LSNs.
    
    We could just include those in the manifest. Seems like good information
    to have in there to me, as it allows building the complete list of files
    needed for a restore.
    
    
    > It's not clear to me what we would do if the backup crosses a timeline
    > switch, assuming that's even a case pg_basebackup allows.
    
    I've not tested it, but it sure looks like it's possible. Both by having
    a standby replaying from a node that promotes (multiple timeline
    switches possible too, I think, if the WAL source follows timelines),
    and by backing up from a standby that's being promoted.
    
    
    > If we don't want to do anything in pg_validatebackup automatically but
    > just want to document this as a possible technique, we could finesse
    > that problem with some weasel-wording.
    
    It'd probably not be too hard to simply emit multiple commands, one for
    each timeline "segment".
    
    I wonder if it'd not be best, independent of whether we build in this
    verification, to include that metadata in the manifest file. That's for
    sure better than having to build a separate tool to parse timeline
    history files.
    
    I think it wouldn't be too hard to compute that information while taking
    the base backup. We know the end timeline (ThisTimeLineID), so we can
    just call readTimeLineHistory(ThisTimeLineID). Which should then allow
    for something pretty trivial along the lines of
    
    timelines = readTimeLineHistory(ThisTimeLineID);
    last_start = InvalidXLogRecPtr;
    foreach(lc, timelines)
    {
        TimeLineHistoryEntry *he = lfirst(lc);
    
        if (he->end < startptr)
            continue;
    
        /* this timeline overlaps the backup, so emit its WAL range */
        manifest_emit_wal_range(Min(he->begin, startptr), he->end);
        last_start = he->end;
    }
    
    if (last_start == InvalidXLogRecPtr)
       start = startptr;
    else
       start = last_start;
    
    manifest_emit_wal_range(start, endptr);
    
    
    Btw, just in case somebody suggests it: I don't think it's possible to
    compute the WAL checksums at this point. In stream mode WAL very well
    might already have been removed.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  189. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-30T19:04:55Z

    On Mon, Mar 30, 2020 at 2:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > > Between those two, I would use "pg_validatebackup" if there's a fair chance it
    > > will end up doing the pg_waldump check.  Otherwise, I would use
    > > "pg_validatemanifest".
    >
    > +1.
    
    I guess I'd like to be clear here that I have no fundamental
    disagreement with taking this tool in any direction that people would
    like it to go. For me it's just a question of timing. Feature freeze
    is now a week or so away, and nothing complicated is going to get done
    in that time. If we can all agree on something simple based on
    Andres's recent proposal, cool, but I'm not yet sure that will be the
    case, so what's plan B? We could decide that what I have here is just
    too little to be a viable facility on its own, but I think Stephen is
    the only one taking that position. We could release it as
    pg_validatemanifest with a plan to rename it if other backup-related
    checks are added later. We could release it as pg_validatebackup with
    the idea to avoid having to rename it when more backup-related checks
    are added later, but with a greater possibility of confusion in the
    meantime and no hard guarantee that anyone will actually develop such
    checks. We could put it into pg_checksums, but I think that's really
    backing ourselves into a corner: if backup validation develops other
    checks that are not checksum-related, what then? I'd much rather
    gamble on keeping things together by topic (backup) than technology
    used internally (checksum). Putting it into pg_basebackup is another
    option, and would avoid that problem, but it's not my preferred
    option, because as I noted before, I think the command-line options
    will get confusing.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  190. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-30T19:16:31Z

    Hi,
    
    On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
    > I guess I'd like to be clear here that I have no fundamental
    > disagreement with taking this tool in any direction that people would
    > like it to go. For me it's just a question of timing. Feature freeze
    > is now a week or so away, and nothing complicated is going to get done
    > in that time. If we can all agree on something simple based on
    > Andres's recent proposal, cool, but I'm not yet sure that will be the
    > case, so what's plan B? We could decide that what I have here is just
    > too little to be a viable facility on its own, but I think Stephen is
    > the only one taking that position. We could release it as
    > pg_validatemanifest with a plan to rename it if other backup-related
    > checks are added later. We could release it as pg_validatebackup with
    > the idea to avoid having to rename it when more backup-related checks
    > are added later, but with a greater possibility of confusion in the
    > meantime and no hard guarantee that anyone will actually develop such
    > checks. We could put it into pg_checksums, but I think that's really
    > backing ourselves into a corner: if backup validation develops other
    > checks that are not checksum-related, what then? I'd much rather
    > gamble on keeping things together by topic (backup) than technology
    > used internally (checksum). Putting it into pg_basebackup is another
    > option, and would avoid that problem, but it's not my preferred
    > option, because as I noted before, I think the command-line options
    > will get confusing.
    
    I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
    such. And eventually (definitely not this release) subsume pg_checksums
    in it. That way we can add other checkers too.
    
    I don't really see a point in ending up with lots of different commands
    over time. Partially because there are probably plenty of checks where
    the overall cost can be drastically reduced by combining IO. Partially
    because there's probably plenty of shareable infrastructure. And
    partially because I think it makes discovery for users a lot easier.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  191. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-30T19:23:08Z

    On Mon, Mar 30, 2020 at 2:59 PM Andres Freund <andres@anarazel.de> wrote:
    > I wonder if it'd not be best, independent of whether we build in this
    > verification, to include that metadata in the manifest file. That's for
    > sure better than having to build a separate tool to parse timeline
    > history files.
    
    I don't think that's better, or at least not "for sure better". The
    backup_label is going to include the START TIMELINE, and if -Xfetch is
    used, we're also going to have all the timeline history files. If the
    backup manifest includes those same pieces of information, then we've
    got two sources of truth: one copy in the files the server's actually
    going to read, and another copy in the backup_manifest which we're
    going to potentially use for validation but ignore at runtime. That
    seems not great.
    
    > Btw, just in case somebody suggests it: I don't think it's possible to
    > compute the WAL checksums at this point. In stream mode WAL very well
    > might already have been removed.
    
    Right.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  192. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-30T20:16:10Z

    On Fri, Mar 27, 2020 at 3:51 PM David Steele <david@pgmasters.net> wrote:
    > There appear to be conflicts with 67e0adfb3f98:
    
    Rebased.
    
    >  > +          Specifies the algorithm that should be used to checksum each file
    >  > +          for purposes of the backup manifest. Currently, the available
    >
    > perhaps "for inclusion in the backup manifest"?  Anyway, I think this
    > sentence is awkward.
    
    I changed it to "Specifies the checksum algorithm that should be
    applied to each file included in the backup manifest." I hope that's
    better. I also added, in both of the places where this text occurs, an
    explanation a little higher up of what a backup manifest actually is.
    
    >  > +        because the files themselves do not need to read.
    >
    > should be "need to be read".
    
    Fixed.
    
    >  > +        the manifest itself will always contain a <literal>SHA256</literal>
    >
    > I think just "the manifest will always contain" is fine.
    
    OK.
    
    >  > +        manifeste itself, and is therefore ignored. Note that the manifest
    >
    > typo "manifeste", perhaps remove itself.
    
    OK, fixed.
    
    >  > { "Path": "backup_label", "Size": 224, "Last-Modified": "2020-03-27 18:33:18 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b914bec9" },
    >
    > Storing the checksum type with each file seems pretty redundant.
    > Perhaps that could go in the header?  You could always override if a
    > specific file had a different checksum type, though that seems unlikely.
    >
    > In general it might be good to go with shorter keys: "mod", "chk", etc.
    > Manifests can get pretty big and that's a lot of extra bytes.
    >
    > I'm also partial to using epoch time in the manifest because it is
    > generally easier for programs to work with.  But, human-readable doesn't
    > suck, either.
    
    It doesn't seem impossible for it to come up; for example, consider a
    file-level incremental backup facility. You might retain whatever
    checksums you have for the unchanged files (to avoid rereading them)
    and add checksums for modified or added files.
    
    I am not convinced that minimizing the size of the file here is a
    particularly important goal, because I don't think it's going to get
    that big in normal cases. I also think having the keys and values be
    easily understandable by human beings is a plus. If we really want a
    minimal format without redundancy, we should've gone with what I
    proposed before (though admittedly that could've been tamped down even
    further if we'd cared to squeeze, which I didn't think was important
    then either).
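
    To illustrate the epoch-versus-readable trade-off discussed above, here
    is a minimal sketch (not taken from the patch) that turns an epoch value
    into the "Last-Modified" style shown in the sample manifest line. The
    function name format_manifest_time is hypothetical, and the format
    string is only inferred from that one sample.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format an epoch timestamp the way the sample
 * manifest entry in this thread shows it ("2020-03-27 18:33:18 GMT").
 * Returns the number of characters written, or 0 on failure.
 */
size_t
format_manifest_time(time_t epoch, char *buf, size_t buflen)
{
    struct tm *tm;

    assert(buf != NULL);

    /* Always format in UTC so the result does not depend on the local zone. */
    tm = gmtime(&epoch);
    if (tm == NULL)
        return 0;

    return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S GMT", tm);
}
```

    A program that prefers epoch time can keep the integer internally and
    call a helper like this only when writing the human-readable field.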
    
    >
    >  >      if (maxrate > 0)
    >  >              maxrate_clause = psprintf("MAX_RATE %u", maxrate);
    >  > +    if (manifest)
    >
    > A linefeed here would be nice.
    
    Added.
    
    >  > +    manifestfile *tabent;
    >
    > This is an odd name.  A holdover from the tab-delimited version?
    
    No, it was meant to stand for table entry. (Now we find out what
    happens when I break my own rule against using abbreviated words.)
    
    >  > +    printf(_("Usage:\n  %s [OPTION]... BACKUPDIR\n\n"), progname);
    >
    > When I ran pg_validatebackup I expected to use -D to specify the backup
    > dir since pg_basebackup does.  On the other hand -D is weird because I
    > *really* expect that to be the pg data dir.
    >
    > But, do we want this to be different from pg_basebackup?
    
    I think it's pretty distinguishable, because pg_basebackup needs an
    input (server) and an output (directory), whereas pg_validatebackup
    only needs one. I don't really care if we want to change it, but I was
    thinking of this as being more analogous to, say, pg_resetwal.
    Granted, that's a danger-don't-use-this tool and this isn't, but I
    don't think we want the -D-is-optional behavior that tools like pg_ctl
    have, because having a tool that isn't supposed to be used on a
    running cluster default to $PGDATA seems inadvisable. And if the
    argument is mandatory then it's not clear to me why we should make
    people type -D in front of it.
    
    >  > +            checksum_length = checksum_string_length / 2;
    >
    > This check is defeated if a single character is added to the checksum.
    >
    > Not too big a deal since you still get an error, but still.
    
    I don't see what the problem is here. We speculatively divide by two
    and allocate memory assuming that the value was even, but then
    before doing anything critical we bail out if it was actually odd.
    That's harmless. We could get around it by saying:
    
    if (checksum_string_length % 2 != 0)
        context->error_cb(...);
    checksum_length = checksum_string_length / 2;
    checksum_payload = palloc(checksum_length);
    if (!hexdecode_string(...))
        context->error_cb(...);
    
    ...but that would be adding additional code, and error messages, for
    what's basically a can't-happen-unless-the-user-is-messing-with-us
    case.
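
    For readers following along, here is a self-contained sketch of the
    stricter variant described above: check the length first, then decode.
    It is not the patch's code; decode_hex_checksum and hex_digit_value are
    hypothetical names standing in for hexdecode_string and its helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical decoder: convert a hex string into raw bytes.
 * Fails (returns false) on an odd-length string, on any non-hex
 * character, or if the output buffer is too small.
 */
bool
decode_hex_checksum(const char *hex, uint8_t *out, size_t out_cap,
                    size_t *out_len)
{
    size_t      len;
    size_t      i;

    assert(hex != NULL && out != NULL && out_len != NULL);

    len = strlen(hex);

    /* Reject odd lengths up front instead of after allocating. */
    if (len % 2 != 0)
        return false;
    if (len / 2 > out_cap)
        return false;

    for (i = 0; i < len / 2; i++)
    {
        int         hi = hex_digit_value(hex[2 * i]);
        int         lo = hex_digit_value(hex[2 * i + 1]);

        if (hi < 0 || lo < 0)
            return false;
        out[i] = (uint8_t) ((hi << 4) | lo);
    }

    *out_len = len / 2;
    return true;
}
```

    Either ordering rejects the malformed input; the difference is only
    which check fires first and how many distinct error paths exist.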
    
    >  > + * Verify that the manifest checksum is correct.
    >
    > This is not working the way I would expect -- I could freely modify the
    > manifest without getting a checksum error on the manifest.  For example:
    >
    > $ /home/vagrant/test/pg/bin/pg_validatebackup test/backup3
    > pg_validatebackup: fatal: invalid checksum for file "backup_label":
    > "408901e0814f40f8ceb7796309a59c7248458325a21941e7c55568e381f53831?"
    >
    > So, if I deleted the entry above, I got a manifest checksum error.  But
    > if I just modified the checksum I get a file checksum error with no
    > manifest checksum error.
    >
    > I would prefer a manifest checksum error in all cases where it is wrong,
    > unless --exit-on-error is specified.
    
    I think I would too, but I'm confused as to what you're doing, because
    if I just modified the manifest -- by deleting a file, for example, or
    changing the checksum of a file, I just get:
    
    pg_validatebackup: fatal: manifest checksum mismatch
    
    I'm confused as to why you're not seeing that. What's the exact
    sequence of steps?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  193. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-30T20:43:59Z

    On Sun, Mar 29, 2020 at 9:05 PM David Steele <david@pgmasters.net> wrote:
    > Yeah, that seems reasonable.
    >
    > In our case backups are nearly always compressed and/or encrypted so
    > even checking the original size is a bit of work. Getting the checksum
    > at the same time seems like an obvious win.
    
    Makes sense. If this ever got extended so it could read from tar-files
    instead of the filesystem directly, we'd surely want to take the
    opposite approach and just make a single pass. I'm not sure whether
    it's worth doing that at some point in the future, but it might be. If
    we're going to add the capability to compress or encrypt backups to
    pg_basebackup, we might want to do that first, and then make this tool
    handle all of those formats in one go.
    
    (As always, I don't have the ability to control how arbitrary
    developers spend their development time... so this is just a thought.)
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  194. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-30T21:08:01Z

    Hi,
    
    On 2020-03-30 15:23:08 -0400, Robert Haas wrote:
    > On Mon, Mar 30, 2020 at 2:59 PM Andres Freund <andres@anarazel.de> wrote:
    > > I wonder if it'd not be best, independent of whether we build in this
    > > verification, to include that metadata in the manifest file. That's for
    > > sure better than having to build a separate tool to parse timeline
    > > history files.
    > 
    > I don't think that's better, or at least not "for sure better". The
    > backup_label is going to include the START TIMELINE, and if -Xfetch is
    > used, we're also going to have all the timeline history files. If the
    > backup manifest includes those same pieces of information, then we've
    > got two sources of truth: one copy in the files the server's actually
    > going to read, and another copy in the backup_manifest which we're
    > going to potentially use for validation but ignore at runtime. That
    > seems not great.
    
    The data in the backup label isn't sufficient though. Without having
    parsed the timeline file there's no way to verify that the correct WAL
    is present. I guess we can also add client side tools to parse
    timelines, add commands to fetch all of the required files, and then
    interpret that somehow.
    
    But that seems much more complicated.
    
    Imo it makes sense to want to be able to verify that WAL looks correct even
    when transporting WAL using another method (say archiving) and thus using
    pg_basebackup's -Xnone.
    
    For the manifest to actually list what's required for the base backup
    doesn't seem redundant to me. Imo it makes the manifest file make a good
    bit more sense, since afterwards it actually describes the whole base
    backup.
    
    Taking the redundancy argument a bit further you can argue that we
    don't need a list of relation files at all, since they're in the catalog
    :P. Obviously going to that extreme doesn't make all that much
    sense... But I do think it's a second source of truth that's independent
    of what the backends actually are going to read.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  195. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-30T22:56:58Z

    On 3/30/20 5:08 PM, Andres Freund wrote:
    > 
    > The data in the backup label isn't sufficient though. Without having
    > parsed the timeline file there's no way to verify that the correct WAL
    > is present. I guess we can also add client side tools to parse
    > timelines, add commands to fetch all of the required files, and then
    > interpret that somehow.
    > 
    > But that seems much more complicated.
    > 
    > Imo it makes sense to want to be able to verify that WAL looks correct even
    > when transporting WAL using another method (say archiving) and thus using
    > pg_basebackup's -Xnone.
    > 
    > For the manifest to actually list what's required for the base backup
    > doesn't seem redundant to me. Imo it makes the manifest file make a good
    > bit more sense, since afterwards it actually describes the whole base
    > backup.
    
    FWIW, pgBackRest stores the backup WAL stop/start in the manifest. To 
    get this information after the backup is complete requires parsing the 
    .backup file which doesn't get stored in the backup directory by 
    pg_basebackup. As far as I know, this is only accessible to solutions 
    that implement archive_command. So, pgBackRest could do that but it 
    seems far more trouble than it is worth.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  196. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-03-30T23:24:08Z

    On 3/30/20 4:16 PM, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 3:51 PM David Steele <david@pgmasters.net> wrote:
    > 
    >>   > { "Path": "backup_label", "Size": 224, "Last-Modified": "2020-03-27
    >> 18:33:18 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b914bec9" },
    >>
    >> Storing the checksum type with each file seems pretty redundant.
    >> Perhaps that could go in the header?  You could always override if a
    >> specific file had a different checksum type, though that seems unlikely.
    >>
    >> In general it might be good to go with shorter keys: "mod", "chk", etc.
    >> Manifests can get pretty big and that's a lot of extra bytes.
    >>
    >> I'm also partial to using epoch time in the manifest because it is
    >> generally easier for programs to work with.  But, human-readable doesn't
    >> suck, either.
    > 
    > It doesn't seem impossible for it to come up; for example, consider a
    > file-level incremental backup facility. You might retain whatever
    > checksums you have for the unchanged files (to avoid rereading them)
    > and add checksums for modified or added files.
    
    OK.
    
    > I am not convinced that minimizing the size of the file here is a
    > particularly important goal, because I don't think it's going to get
    > that big in normal cases. I also think having the keys and values be
    > easily understandable by human beings is a plus. If we really want a
    > minimal format without redundancy, we should've gone with what I
    > proposed before (though admittedly that could've been tamped down even
    > further if we'd cared to squeeze, which I didn't think was important
    > then either).
    
    Well, normal cases is the key.  But fine, in general we have found that 
    the in memory representation is more important in terms of supporting 
    clusters with very large numbers of files.
    
    >> When I ran pg_validatebackup I expected to use -D to specify the backup
    >> dir since pg_basebackup does.  On the other hand -D is weird because I
    >> *really* expect that to be the pg data dir.
    >>
    >> But, do we want this to be different from pg_basebackup?
    > 
    > I think it's pretty distinguishable, because pg_basebackup needs an
    > input (server) and an output (directory), whereas pg_validatebackup
    > only needs one. I don't really care if we want to change it, but I was
    > thinking of this as being more analogous to, say, pg_resetwal.
    > Granted, that's a danger-don't-use-this tool and this isn't, but I
    > don't think we want the -D-is-optional behavior that tools like pg_ctl
    > have, because having a tool that isn't supposed to be used on a
    > running cluster default to $PGDATA seems inadvisable. And if the
    > argument is mandatory then it's not clear to me why we should make
    > people type -D in front of it.
    
    Honestly I think pg_basebackup is the confusing one, because in most 
    cases -D points at the running cluster dir. So, OK.
    
    >>   > +            checksum_length = checksum_string_length / 2;
    >>
    >> This check is defeated if a single character is added to the checksum.
    >>
    >> Not too big a deal since you still get an error, but still.
    > 
    > I don't see what the problem is here. We speculatively divide by two
    > and allocate memory assuming that the value was even, but then
    > before doing anything critical we bail out if it was actually odd.
    > That's harmless. We could get around it by saying:
    > 
    > if (checksum_string_length % 2 != 0)
    >      context->error_cb(...);
    > checksum_length = checksum_string_length / 2;
    > checksum_payload = palloc(checksum_length);
    > if (!hexdecode_string(...))
    >      context->error_cb(...);
    > 
    > ...but that would be adding additional code, and error messages, for
    > what's basically a can't-happen-unless-the-user-is-messing-with-us
    > case.
    
    Sorry, pasted the wrong code and even then still didn't get it quite 
    right.
    
    The problem:
    
    If I remove an even number of characters from a checksum it appears the 
    checksum passes but the manifest checksum fails:
    
    $ pg_basebackup -D test/backup5 --manifest-checksums=SHA256
    
    $ vi test/backup5/backup_manifest
         * Remove two characters from the checksum of backup_label
    
    $ pg_validatebackup test/backup5
    
    pg_validatebackup: fatal: manifest checksum mismatch
    
    But if I add any number of characters or remove an odd number of 
    characters I get:
    
    pg_validatebackup: fatal: invalid checksum for file "backup_label": 
    "a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fXX"
    
    and no manifest checksum failure.
    
    >>   > + * Verify that the manifest checksum is correct.
    >>
    >> This is not working the way I would expect -- I could freely modify the
    >> manifest without getting a checksum error on the manifest.  For example:
    >>
    >> $ /home/vagrant/test/pg/bin/pg_validatebackup test/backup3
    >> pg_validatebackup: fatal: invalid checksum for file "backup_label":
    >> "408901e0814f40f8ceb7796309a59c7248458325a21941e7c55568e381f53831?"
    >>
    >> So, if I deleted the entry above, I got a manifest checksum error.  But
    >> if I just modified the checksum I get a file checksum error with no
    >> manifest checksum error.
    >>
    >> I would prefer a manifest checksum error in all cases where it is wrong,
    >> unless --exit-on-error is specified.
    > 
    > I think I would too, but I'm confused as to what you're doing, because
    > if I just modified the manifest -- by deleting a file, for example, or
    > changing the checksum of a file, I just get:
    > 
    > pg_validatebackup: fatal: manifest checksum mismatch
    > 
    > I'm confused as to why you're not seeing that. What's the exact
    > sequence of steps?
    
    $ pg_basebackup -D test/backup5 --manifest-checksums=SHA256
    
    $ vi test/backup5/backup_manifest
         * Add 'X' to the checksum of backup_label
    
    $ pg_validatebackup test/backup5
    pg_validatebackup: fatal: invalid checksum for file "backup_label": 
    "a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fX"
    
    No mention of the manifest checksum being invalid.  But if I remove the 
    backup label file from the manifest:
    
    pg_validatebackup: fatal: manifest checksum mismatch
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  197. Re: backup manifests

    Noah Misch <noah@leadboat.com> — 2020-03-31T05:40:14Z

    On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote:
    > On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
    > > I guess I'd like to be clear here that I have no fundamental
    > > disagreement with taking this tool in any direction that people would
    > > like it to go. For me it's just a question of timing. Feature freeze
    > > is now a week or so away, and nothing complicated is going to get done
    > > in that time. If we can all agree on something simple based on
    > > Andres's recent proposal, cool, but I'm not yet sure that will be the
    > > case, so what's plan B? We could decide that what I have here is just
    > > too little to be a viable facility on its own, but I think Stephen is
    > > the only one taking that position. We could release it as
    > > pg_validatemanifest with a plan to rename it if other backup-related
    > > checks are added later. We could release it as pg_validatebackup with
    > > the idea to avoid having to rename it when more backup-related checks
    > > are added later, but with a greater possibility of confusion in the
    > > meantime and no hard guarantee that anyone will actually develop such
    > > checks. We could put it in to pg_checksums, but I think that's really
    > > backing ourselves into a corner: if backup validation develops other
    > > checks that are not checksum-related, what then? I'd much rather
    > > gamble on keeping things together by topic (backup) than technology
    > > used internally (checksum). Putting it into pg_basebackup is another
    > > option, and would avoid that problem, but it's not my preferred
    > > option, because as I noted before, I think the command-line options
    > > will get confusing.
    > 
    > I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
    > such. And eventually (definitely not this release) subsume pg_checksums
    > in it. That way we can add other checkers too.
    
    Works for me; of those two, I prefer pg_validate.
    
    
    
    
  198. Re: backup manifests

    Amit Kapila <amit.kapila16@gmail.com> — 2020-03-31T09:26:07Z

    On Tue, Mar 31, 2020 at 11:10 AM Noah Misch <noah@leadboat.com> wrote:
    >
    > On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote:
    > > On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
    > > > I guess I'd like to be clear here that I have no fundamental
    > > > disagreement with taking this tool in any direction that people would
    > > > like it to go. For me it's just a question of timing. Feature freeze
    > > > is now a week or so away, and nothing complicated is going to get done
    > > > in that time. If we can all agree on something simple based on
    > > > Andres's recent proposal, cool, but I'm not yet sure that will be the
    > > > case, so what's plan B? We could decide that what I have here is just
    > > > too little to be a viable facility on its own, but I think Stephen is
    > > > the only one taking that position. We could release it as
    > > > pg_validatemanifest with a plan to rename it if other backup-related
    > > > checks are added later. We could release it as pg_validatebackup with
    > > > the idea to avoid having to rename it when more backup-related checks
    > > > are added later, but with a greater possibility of confusion in the
    > > > meantime and no hard guarantee that anyone will actually develop such
    > > > checks. We could put it in to pg_checksums, but I think that's really
    > > > backing ourselves into a corner: if backup validation develops other
    > > > checks that are not checksum-related, what then? I'd much rather
    > > > gamble on keeping things together by topic (backup) than technology
    > > > used internally (checksum). Putting it into pg_basebackup is another
    > > > option, and would avoid that problem, but it's not my preferred
    > > > option, because as I noted before, I think the command-line options
    > > > will get confusing.
    > >
    > > I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
    > > such. And eventually (definitely not this release) subsume pg_checksums
    > > in it. That way we can add other checkers too.
    >
    > Works for me; of those two, I prefer pg_validate.
    >
    
    pg_validate sounds like a tool with a much bigger purpose.  I think
    even things like amcheck could also fall under it.
    
    This patch has two parts (a) Generate backup manifests for base
    backups, and (b) Validate backup (manifest).  It seems to me that
    there are not many things pending for (a), can't we commit that first
    or is it the case that (a) depends on (b)?  This is *not* a suggestion
    to leave pg_validatebackup out of this release, but rather just to commit if
    something is ready and meaningful on its own.
    
    -- 
    With Regards,
    Amit Kapila.
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
    
  199. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-31T11:57:01Z

    On Mon, Mar 30, 2020 at 7:24 PM David Steele <david@pgmasters.net> wrote:
    > > I'm confused as to why you're not seeing that. What's the exact
    > > sequence of steps?
    >
    > $ pg_basebackup -D test/backup5 --manifest-checksums=SHA256
    >
    > $ vi test/backup5/backup_manifest
    >      * Add 'X' to the checksum of backup_label
    >
    > $ pg_validatebackup test/backup5
    > pg_validatebackup: fatal: invalid checksum for file "backup_label":
    > "a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fX"
    >
    > No mention of the manifest checksum being invalid.  But if I remove the
    > backup label file from the manifest:
    >
    > pg_validatebackup: fatal: manifest checksum mismatch
    
    Oh, I see what's happening now. If the checksum is not an even-length
    string of hexadecimal characters, it's treated as a syntax error, so
    it bails out at that point. Generally, a syntax error in the manifest
    file is treated as a fatal error, and you just die right there. You'd
    get the same behavior if you had malformed JSON, like a stray { or }
    or [ or ] someplace that it doesn't belong according to the rules of
    JSON. On the other hand, if you corrupt the checksum by adding AA or
    EE or 54 or some other even-length string of hex characters, then you
    have (in this code's view) a semantic error rather than a syntax
    error, so it will finish loading all the manifest data and then bail
    because the checksum doesn't match.
    
    We really can't avoid bailing out early sometimes, because if the file
    is totally malformed at the JSON level, there's just no way to
    continue. We could cause this particular error to get treated as a
    semantic error rather than a syntax error, but I don't really see much
    advantage in so doing. This way was easier to code, and I don't think
    it really matters which error we find first.
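
    The distinction drawn above can be sketched as a small classifier. This
    is an illustration only, not the tool's code: the enum, the function
    names, and the idea of comparing two hex strings directly are all
    assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for one recorded checksum. */
typedef enum
{
    CHECKSUM_OK,            /* well-formed and equal to the computed value */
    CHECKSUM_SYNTAX_ERROR,  /* not an even-length hex string: bail out early */
    CHECKSUM_MISMATCH       /* well-formed but different: report after loading */
} checksum_status;

/* True if s is a non-empty, even-length string of hex digits. */
static int
is_wellformed_hex(const char *s)
{
    size_t      len = strlen(s);
    size_t      i;

    if (len == 0 || len % 2 != 0)
        return 0;
    for (i = 0; i < len; i++)
    {
        if (!isxdigit((unsigned char) s[i]))
            return 0;
    }
    return 1;
}

/*
 * Classify a recorded checksum against a computed one, both given as
 * hex strings. Malformed input is a syntax error; well-formed input
 * that differs (in length or content) is a mismatch.
 */
checksum_status
classify_checksum(const char *recorded, const char *computed)
{
    size_t      i;

    assert(recorded != NULL && computed != NULL);

    if (!is_wellformed_hex(recorded))
        return CHECKSUM_SYNTAX_ERROR;

    if (strlen(recorded) != strlen(computed))
        return CHECKSUM_MISMATCH;

    /* Hex digits compare case-insensitively. */
    for (i = 0; recorded[i] != '\0'; i++)
    {
        if (tolower((unsigned char) recorded[i]) !=
            tolower((unsigned char) computed[i]))
            return CHECKSUM_MISMATCH;
    }
    return CHECKSUM_OK;
}
```

    Under this sketch, appending a single "X" is a syntax error, while
    appending "AA" or dropping two characters is a mismatch, which matches
    the behavior reported in the test steps quoted above.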
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  200. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-03-31T11:58:15Z

    Greetings,
    
    * Amit Kapila (amit.kapila16@gmail.com) wrote:
    > On Tue, Mar 31, 2020 at 11:10 AM Noah Misch <noah@leadboat.com> wrote:
    > > On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote:
    > > > On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
    > > > > I guess I'd like to be clear here that I have no fundamental
    > > > > disagreement with taking this tool in any direction that people would
    > > > > like it to go. For me it's just a question of timing. Feature freeze
    > > > > is now a week or so away, and nothing complicated is going to get done
    > > > > in that time. If we can all agree on something simple based on
    > > > > Andres's recent proposal, cool, but I'm not yet sure that will be the
    > > > > case, so what's plan B? We could decide that what I have here is just
    > > > > too little to be a viable facility on its own, but I think Stephen is
    > > > > the only one taking that position. We could release it as
    > > > > pg_validatemanifest with a plan to rename it if other backup-related
    > > > > checks are added later. We could release it as pg_validatebackup with
    > > > > the idea to avoid having to rename it when more backup-related checks
    > > > > are added later, but with a greater possibility of confusion in the
    > > > > meantime and no hard guarantee that anyone will actually develop such
    > > > > checks. We could put it in to pg_checksums, but I think that's really
    > > > > backing ourselves into a corner: if backup validation develops other
    > > > > checks that are not checksum-related, what then? I'd much rather
    > > > > gamble on keeping things together by topic (backup) than technology
    > > > > used internally (checksum). Putting it into pg_basebackup is another
    > > > > option, and would avoid that problem, but it's not my preferred
    > > > > option, because as I noted before, I think the command-line options
    > > > > will get confusing.
    > > >
    > > > I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
    > > > such. And eventually (definitely not this release) subsume pg_checksums
    > > > in it. That way we can add other checkers too.
    > >
    > > Works for me; of those two, I prefer pg_validate.
    > 
    > pg_validate sounds like a tool with a much bigger purpose.  I think
    > even things like amcheck could also fall under it.
    
    Yeah, I tend to agree with this.
    
    > This patch has two parts (a) Generate backup manifests for base
    > backups, and (b) Validate backup (manifest).  It seems to me that
    > there are not many things pending for (a), can't we commit that first
    > or is it the case that (a) depends on (b)?  This is *not* a suggestion
    > to leave pg_validatebackup out of this release, but rather just to commit if
    > something is ready and meaningful on its own.
    
    I suspect the idea here is that we don't really want to commit something
    that nothing is actually using, and that's understandable and justified
    here- consider that even in this recent discussion there was talk that
    maybe we should have included permissions and ownership in the manifest,
    or starting and ending WAL positions, so that they'd be able to be
    checked by this tool more easily (and because it's just useful to have
    all that info in one place...  I don't really agree with the concerns
    that it's an issue for static information like that to be duplicated).
    
    In other words, while the manifest creation code might be something we
    could commit, without a tool to use it (which does all the things that
    we think it needs to, to perform some high-level task, such as "validate
    a backup") we don't know that the manifest that's actually generated is
    really up to snuff and has what it needs to have to perform that task.
    
    I had been hoping that the discussion Andres was leading regarding
    leveraging pg_waldump (or maybe just code from it..) would get us to a
    point where pg_validatebackup would check that we have all of the WAL
    needed for the backup to be consistent and that it would then verify the
    internal checksums of the WAL.  That would certainly be a good solution
    for this time around, in my view, and is already all existing
    client-side code.  I do think we'd want to have a note about how we
    verify pg_wal differently from the other files which are in the
    manifest.
    
    Thanks,
    
    Stephen
    
  201. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-03-31T18:10:34Z

    On Mon, Mar 30, 2020 at 2:59 PM Andres Freund <andres@anarazel.de> wrote:
    > I think it wouldn't be too hard to compute that information while taking
    > the base backup. We know the end timeline (ThisTimeLineID), so we can
    > just call readTimeLineHistory(ThisTimeLineID). Which should then allow
    > for something pretty trivial along the lines of
    >
    > timelines = readTimeLineHistory(ThisTimeLineID);
    > last_start = InvalidXLogRecPtr;
    > foreach(lc, timelines)
    > {
    >     TimeLineHistoryEntry *he = lfirst(lc);
    >
    >     if (he->end < startptr)
    >         continue;
    >
    >     //
    >     manifest_emit_wal_range(Min(he->begin, startptr), he->end);
    >     last_start = he->end;
    > }
    >
    > if (last_start == InvalidXLogRecPtr)
    >    start = startptr;
    > else
    >    start = last_start;
    >
    > manifest_emit_wal_range(start, endptr);
    
    I made an attempt to implement this. In the attached patch set, 0001
    and 0002 are (I think) unmodified from the last version. 0003 is a
    slightly-rejiggered version of your new pg_waldump option. 0004 whacks
    0002 around so that the WAL ranges are included in the manifest and
    pg_validatebackup tries to run pg_waldump for each WAL range. It
    appears to work in light testing, but I haven't yet (1) tested it
    extensively, (2) written good regression tests for it above and beyond
    what pg_validatebackup had already, or (3) updated the documentation.
    I'm going to work on those things. I would appreciate *very timely*
    feedback on anything people do or do not like about this, because I
    want to commit this patch set by the end of the work week and that
    isn't very far away. I would also appreciate if people would bear in
    mind the principle that half a loaf is better than none, and further
    improvements can be made in future releases.
    
    As part of my light testing, I tried promoting a standby that was
    running pg_basebackup, and found that pg_basebackup failed like this:
    
    pg_basebackup: error: could not get COPY data stream: ERROR:  the
    standby was promoted during online backup
    HINT:  This means that the backup being taken is corrupt and should
    not be used. Try taking another online backup.
    pg_basebackup: removing data directory "/Users/rhaas/pgslave2"
    
    My first thought was that this error message is hard to reconcile with
    this comment:
    
            /*
             * Send timeline history files too. Only the latest timeline history
             * file is required for recovery, and even that only if there happens
             * to be a timeline switch in the first WAL segment that contains the
             * checkpoint record, or if we're taking a base backup from a standby
             * server and the target timeline changes while the backup is taken.
             * But they are small and highly useful for debugging purposes, so
             * better include them all, always.
             */
    
    But then it occurred to me that this might be a cascading standby.
    Maybe the original master died and this machine's master got promoted,
    so it has to follow a timeline switch but doesn't itself get promoted.
    I think I might try to test out that scenario and see what happens,
    but I haven't done so as of this writing. Regardless, it seems like a
    really good idea to store a list of WAL ranges rather than a single
    start/end/timeline, because even if it's impossible today it might
    become possible in the future. Still, unless there's an easy way to
    set up a test scenario where multiple WAL ranges need to be verified,
    it may be hard to test that this code actually behaves properly.
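
    To make the "list of WAL ranges" idea concrete, here is a standalone
    sketch of walking a timeline history and producing one range per
    timeline that was active during the backup. It is not the code in the
    attached patches. All type and function names are hypothetical, wal_pos
    stands in for XLogRecPtr, and the history array is assumed to be
    ordered newest timeline first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t wal_pos;               /* stand-in for XLogRecPtr */
#define WAL_POS_INVALID ((wal_pos) 0)   /* "no position" marker */

/* One timeline history entry; end == WAL_POS_INVALID means "not ended". */
typedef struct
{
    uint32_t    tli;
    wal_pos     begin;
    wal_pos     end;
} timeline_entry;

/* One WAL range that must be present for the backup to be usable. */
typedef struct
{
    uint32_t    tli;
    wal_pos     start;
    wal_pos     end;
} wal_range;

/*
 * Walk the history (newest first) from the backup's end position back
 * to its start position, emitting one range per timeline crossed.
 * Returns the number of ranges written, or -1 if the start timeline is
 * not found or the output array is too small.
 */
int
compute_wal_ranges(const timeline_entry *history, size_t nhistory,
                   uint32_t start_tli, wal_pos start_ptr, wal_pos end_ptr,
                   wal_range *out, size_t out_cap)
{
    size_t      n = 0;
    size_t      i;
    wal_pos     range_end = end_ptr;

    assert(history != NULL && out != NULL);

    for (i = 0; i < nhistory; i++)
    {
        const timeline_entry *e = &history[i];
        wal_pos     range_start;

        /* Skip timelines that ended before the backup started. */
        if (e->end != WAL_POS_INVALID && e->end < start_ptr)
            continue;

        /* On the starting timeline, the range begins where the backup did. */
        range_start = (e->tli == start_tli) ? start_ptr : e->begin;

        if (n == out_cap)
            return -1;
        out[n++] = (wal_range) {e->tli, range_start, range_end};

        if (e->tli == start_tli)
            return (int) n;

        /* The next (older) timeline ends where this one began. */
        range_end = e->begin;
    }

    return -1;                  /* never reached the starting timeline */
}
```

    With a single timeline this yields exactly one range, so the common
    case costs nothing extra; a timeline switch during the backup simply
    yields two or more entries.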
    
    Thoughts?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  202. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-03-31T22:50:34Z

    Hi,
    
    On 2020-03-31 14:10:34 -0400, Robert Haas wrote:
    > I made an attempt to implement this.
    
    Awesome!
    
    
    > In the attached patch set, 0001 [...] I'm going to work on those things. I
    > would appreciate *very timely* feedback on anything people do or do
    > not like about this, because I want to commit this patch set by the
    > end of the work week and that isn't very far away. I would also
    > appreciate if people would bear in mind the principle that half a loaf
    > is better than none, and further improvements can be made in future
    > releases.
    > 
    > As part of my light testing, I tried promoting a standby that was
    > running pg_basebackup, and found that pg_basebackup failed like this:
    > 
    > pg_basebackup: error: could not get COPY data stream: ERROR:  the
    > standby was promoted during online backup
    > HINT:  This means that the backup being taken is corrupt and should
    > not be used. Try taking another online backup.
    > pg_basebackup: removing data directory "/Users/rhaas/pgslave2"
    > 
    > My first thought was that this error message is hard to reconcile with
    > this comment:
    > 
    >         /*
    >          * Send timeline history files too. Only the latest timeline history
    >          * file is required for recovery, and even that only if there happens
    >          * to be a timeline switch in the first WAL segment that contains the
    >          * checkpoint record, or if we're taking a base backup from a standby
    >          * server and the target timeline changes while the backup is taken.
    >          * But they are small and highly useful for debugging purposes, so
    >          * better include them all, always.
    >          */
    > 
    > But then it occurred to me that this might be a cascading standby.
    
    Yea. The check just prevents the walsender's database from being
    promoted:
    
    		/*
    		 * Check if the postmaster has signaled us to exit, and abort with an
    		 * error in that case. The error handler further up will call
    		 * do_pg_abort_backup() for us. Also check that if the backup was
    		 * started while still in recovery, the server wasn't promoted.
    		 * do_pg_stop_backup() will check that too, but it's better to stop
    		 * the backup early than continue to the end and fail there.
    		 */
    		CHECK_FOR_INTERRUPTS();
    		if (RecoveryInProgress() != backup_started_in_recovery)
    			ereport(ERROR,
    					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    					 errmsg("the standby was promoted during online backup"),
    					 errhint("This means that the backup being taken is corrupt "
    							 "and should not be used. "
    							 "Try taking another online backup.")));
    and
    
    	if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
    		ereport(ERROR,
    				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    				 errmsg("the standby was promoted during online backup"),
    				 errhint("This means that the backup being taken is corrupt "
    						 "and should not be used. "
    						 "Try taking another online backup.")));
    
    So that just prevents promotions of the current node, afaict.
    
    
    
    > Regardless, it seems like a really good idea to store a list of WAL
    > ranges rather than a single start/end/timeline, because even if it's
    > impossible today it might become possible in the future.
    
    Indeed.
    
    
    > Still, unless there's an easy way to set up a test scenario where
    > multiple WAL ranges need to be verified, it may be hard to test that
    > this code actually behaves properly.
    
    I think it'd be possible to test without a fully cascading setup, by
    creating an initial base backup, then do some work to create a bunch of
    new timelines, and then start the initial base backup. That'd have to
    follow all those timelines.  Not sure that's better than a cascading
    setup though.
    
    
    > +/*
    > + * Add information about the WAL that will need to be replayed when restoring
    > + * this backup to the manifest.
    > + */
    > +static void
    > +AddWALInfoToManifest(manifest_info *manifest, XLogRecPtr startptr,
    > +					 TimeLineID starttli, XLogRecPtr endptr, TimeLineID endtli)
    > +{
    > +	List *timelines = readTimeLineHistory(endtli);
    
    should probably happen after the manifest->buffile check.
    
    
    > +	ListCell *lc;
    > +	bool	first_wal_range = true;
    > +	bool	found_ending_tli = false;
    > +
    > +	/* If there is no buffile, then the user doesn't want a manifest. */
    > +	if (manifest->buffile == NULL)
    > +		return;
    
    Not really about this patch/function specifically: I wonder if this'd
    look better if you added a ManifestEnabled() macro instead of repeating
    the comment everywhere.
    
    
    
    > +	/* Unless --no-parse-wal was specified, we will need pg_waldump. */
    > +	if (!no_parse_wal)
    > +	{
    > +		int		ret;
    > +
    > +		pg_waldump_path = pg_malloc(MAXPGPATH);
    > +		ret = find_other_exec(argv[0], "pg_waldump",
    > +							  "pg_waldump (PostgreSQL) " PG_VERSION "\n",
    > +							 pg_waldump_path);
    > +		if (ret < 0)
    > +		{
    > +			char	full_path[MAXPGPATH];
    > +
    > +			if (find_my_exec(argv[0], full_path) < 0)
    > +				strlcpy(full_path, progname, sizeof(full_path));
    > +			if (ret == -1)
    > +				pg_log_fatal("The program \"%s\" is needed by %s but was\n"
    > +							 "not found in the same directory as \"%s\".\n"
    > +							 "Check your installation.",
    > +							 "pg_waldump", "pg_validatebackup", full_path);
    > +			else
    > +				pg_log_fatal("The program \"%s\" was found by \"%s\" but was\n"
    > +							 "not the same version as %s.\n"
    > +							 "Check your installation.",
    > +							 "pg_waldump", full_path, "pg_validatebackup");
    > +		}
    > +	}
    
    ISTM, and this can definitely wait for another time, that we should have
    one wrapper doing all of this, instead of having quite a few copies of
    very similar logic to the above.
    
    
    > +/*
    > + * Attempt to parse the WAL files required to restore from backup using
    > + * pg_waldump.
    > + */
    > +static void
    > +parse_required_wal(validator_context *context, char *pg_waldump_path,
    > +				   char *wal_directory, manifest_wal_range *first_wal_range)
    > +{
    > +	manifest_wal_range *this_wal_range = first_wal_range;
    > +
    > +	while (this_wal_range != NULL)
    > +	{
    > +		char *pg_waldump_cmd;
    > +
    > +		pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%X --end=%X/%X\n",
    > +			   pg_waldump_path, wal_directory, this_wal_range->tli,
    > +			   (uint32) (this_wal_range->start_lsn >> 32),
    > +			   (uint32) this_wal_range->start_lsn,
    > +			   (uint32) (this_wal_range->end_lsn >> 32),
    > +			   (uint32) this_wal_range->end_lsn);
    > +		if (system(pg_waldump_cmd) != 0)
    > +			report_backup_error(context,
    > +								"WAL parsing failed for timeline %u",
    > +								this_wal_range->tli);
    > +
    > +		this_wal_range = this_wal_range->next;
    > +	}
    > +}
    
    Should we have a function to properly escape paths in cases like this?
    Not that it's likely or really problematic, but the quoting for path
    could be "circumvented".
    
    
    Greetings,
    
    Andres Freund
    
    
    
    
  203. Re: backup manifests

    Noah Misch <noah@leadboat.com> — 2020-04-01T05:15:04Z

    On Tue, Mar 31, 2020 at 03:50:34PM -0700, Andres Freund wrote:
    > On 2020-03-31 14:10:34 -0400, Robert Haas wrote:
    > > +/*
    > > + * Attempt to parse the WAL files required to restore from backup using
    > > + * pg_waldump.
    > > + */
    > > +static void
    > > +parse_required_wal(validator_context *context, char *pg_waldump_path,
    > > +				   char *wal_directory, manifest_wal_range *first_wal_range)
    > > +{
    > > +	manifest_wal_range *this_wal_range = first_wal_range;
    > > +
    > > +	while (this_wal_range != NULL)
    > > +	{
    > > +		char *pg_waldump_cmd;
    > > +
    > > +		pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%X --end=%X/%X\n",
    > > +			   pg_waldump_path, wal_directory, this_wal_range->tli,
    > > +			   (uint32) (this_wal_range->start_lsn >> 32),
    > > +			   (uint32) this_wal_range->start_lsn,
    > > +			   (uint32) (this_wal_range->end_lsn >> 32),
    > > +			   (uint32) this_wal_range->end_lsn);
    > > +		if (system(pg_waldump_cmd) != 0)
    > > +			report_backup_error(context,
    > > +								"WAL parsing failed for timeline %u",
    > > +								this_wal_range->tli);
    > > +
    > > +		this_wal_range = this_wal_range->next;
    > > +	}
    > > +}
    > 
    > Should we have a function to properly escape paths in cases like this?
    > Not that it's likely or really problematic, but the quoting for path
    > could be "circumvented".
    
    Are you looking for appendShellString(), or something different?
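
    To make the quoting concern concrete, here is a minimal sketch of
    POSIX-shell single-quote escaping. It is not the code of
    appendShellString() and not part of the patch set; the helper name
    shell_quote() and its NULL-on-newline behavior are assumptions made
    for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: return a malloc'd copy of "s"
 * wrapped in single quotes for a POSIX shell.  Each embedded single quote
 * becomes '\'' (close the quote, emit an escaped quote, reopen the quote).
 * Returns NULL on allocation failure, or if "s" contains a newline or
 * carriage return, which this sketch simply refuses to quote.
 */
char *
shell_quote(const char *s)
{
    size_t      len = 2;        /* opening and closing quote */
    const char *p;
    char       *result;
    char       *out;

    /* First pass: reject unsupported characters and size the result. */
    for (p = s; *p != '\0'; p++)
    {
        if (*p == '\n' || *p == '\r')
            return NULL;
        len += (*p == '\'') ? 4 : 1;
    }

    result = malloc(len + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy, expanding each single quote to four bytes. */
    out = result;
    *out++ = '\'';
    for (p = s; *p != '\0'; p++)
    {
        if (*p == '\'')
        {
            memcpy(out, "'\\''", 4);
            out += 4;
        }
        else
            *out++ = *p;
    }
    *out++ = '\'';
    *out = '\0';
    return result;
}
```

    With quoting of this kind, a wal_directory value containing a double
    quote or a space would reach pg_waldump as a single argument instead
    of terminating the --path="..." quoting early.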
    
    
    
    
  204. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-01T20:47:56Z

    On Tue, Mar 31, 2020 at 6:50 PM Andres Freund <andres@anarazel.de> wrote:
    > On 2020-03-31 14:10:34 -0400, Robert Haas wrote:
    > > I made an attempt to implement this.
    >
    > Awesome!
    
    Here's a new patch set. I haven't fixed the things in your latest
    round of review comments yet, but I did rewrite the documentation for
    pg_validatebackup, add documentation for the new pg_waldump option,
    and add regression tests for the new WAL-checking facility of
    pg_validatebackup.
    
    0001 - add pg_waldump -q
    0002 - add checksum helpers
    0003 - core backup manifest patch, now with WAL verification included
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  205. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-04-01T20:59:36Z

    Hi,
    
    On 2020-03-31 22:15:04 -0700, Noah Misch wrote:
    > On Tue, Mar 31, 2020 at 03:50:34PM -0700, Andres Freund wrote:
    > > On 2020-03-31 14:10:34 -0400, Robert Haas wrote:
    > > > +/*
    > > > + * Attempt to parse the WAL files required to restore from backup using
    > > > + * pg_waldump.
    > > > + */
    > > > +static void
    > > > +parse_required_wal(validator_context *context, char *pg_waldump_path,
    > > > +				   char *wal_directory, manifest_wal_range *first_wal_range)
    > > > +{
    > > > +	manifest_wal_range *this_wal_range = first_wal_range;
    > > > +
    > > > +	while (this_wal_range != NULL)
    > > > +	{
    > > > +		char *pg_waldump_cmd;
    > > > +
    > > > +		pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%X --end=%X/%X\n",
    > > > +			   pg_waldump_path, wal_directory, this_wal_range->tli,
    > > > +			   (uint32) (this_wal_range->start_lsn >> 32),
    > > > +			   (uint32) this_wal_range->start_lsn,
    > > > +			   (uint32) (this_wal_range->end_lsn >> 32),
    > > > +			   (uint32) this_wal_range->end_lsn);
    > > > +		if (system(pg_waldump_cmd) != 0)
    > > > +			report_backup_error(context,
    > > > +								"WAL parsing failed for timeline %u",
    > > > +								this_wal_range->tli);
    > > > +
    > > > +		this_wal_range = this_wal_range->next;
    > > > +	}
    > > > +}
    > > 
    > > Should we have a function to properly escape paths in cases like this?
    > > Not that it's likely or really problematic, but the quoting for path
    > > could be "circumvented".
    > 
    > Are you looking for appendShellString(), or something different?
    
    Looks like that'd be it. Thanks.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  206. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-04-01T21:01:55Z

    Hi,
    
    On 2020-03-31 14:56:07 +0530, Amit Kapila wrote:
    > On Tue, Mar 31, 2020 at 11:10 AM Noah Misch <noah@leadboat.com> wrote:
    > > On Mon, Mar 30, 2020 at 12:16:31PM -0700, Andres Freund wrote:
    > > > On 2020-03-30 15:04:55 -0400, Robert Haas wrote:
    > > > I'm mildly inclined to name it pg_validate, pg_validate_dbdir or
    > > > such. And eventually (definitely not this release) subsume pg_checksums
    > > > in it. That way we can add other checkers too.
    > >
    > > Works for me; of those two, I prefer pg_validate.
    > >
    > 
    > pg_validate sounds like a tool with a much bigger purpose.  I think
    > even things like amcheck could also fall under it.
    
    Intentionally so. We don't serve our users by collecting a lot of
    differently named commands to work with data directories. As I wrote
    above, the point would be to eventually have that tool also perform
    checksum validation etc.  Potentially even in a single pass over the
    data directory.
    
    
    > This patch has two parts (a) Generate backup manifests for base
    > backups, and (b) Validate backup (manifest).  It seems to me that
    > there are not many things pending for (a), can't we commit that first
    > or is it the case that (a) depends on (b)?  This is *not* a suggestion
    > to leave pg_validatebackup from this release rather just to commit if
    > something is ready and meaningful on its own.
    
    IDK, it seems easier to be able to modify both at the same time.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  207. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-04-01T21:19:47Z

    On 3/31/20 7:57 AM, Robert Haas wrote:
    > On Mon, Mar 30, 2020 at 7:24 PM David Steele <david@pgmasters.net> wrote:
    >>> I'm confused as to why you're not seeing that. What's the exact
    >>> sequence of steps?
    >>
    >> $ pg_basebackup -D test/backup5 --manifest-checksums=SHA256
    >>
    >> $ vi test/backup5/backup_manifest
    >>       * Add 'X' to the checksum of backup_label
    >>
    >> $ pg_validatebackup test/backup5
    >> pg_validatebackup: fatal: invalid checksum for file "backup_label":
    >> "a98e9164fd59d498d14cfdf19c67d1c2208a30e7b939d1b4a09f524c7adfc11fX"
    >>
    >> No mention of the manifest checksum being invalid.  But if I remove the
    >> backup label file from the manifest:
    >>
    >> pg_validatebackup: fatal: manifest checksum mismatch
    > 
    > Oh, I see what's happening now. If the checksum is not an even-length
    > string of hexadecimal characters, it's treated as a syntax error, so
    > it bails out at that point. Generally, a syntax error in the manifest
    > file is treated as a fatal error, and you just die right there. You'd
    > get the same behavior if you had malformed JSON, like a stray { or }
    > or [ or ] someplace that it doesn't belong according to the rules of
    > JSON. On the other hand, if you corrupt the checksum by adding AA or
    > EE or 54 or some other even-length string of hex characters, then you
    > have (in this code's view) a semantic error rather than a syntax
    > error, so it will finish loading all the manifest data and then bail
    > because the checksum doesn't match.
    > 
    > We really can't avoid bailing out early sometimes, because if the file
    > is totally malformed at the JSON level, there's just no way to
    > continue. We could cause this particular error to get treated as a
    > semantic error rather than a syntax error, but I don't really see much
    > advantage in so doing. This way was easier to code, and I don't think
    > it really matters which error we find first.
    
    I think it would be good to know that the manifest checksum is bad in 
    all cases because that may well inform other errors.
    
    That said, I know you have a lot on your plate with this patch so I'm 
    not going to make a fuss about such a minor gripe. Perhaps this can be 
    considered for future improvement.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
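
    The syntax-versus-semantic distinction described in the quoted text
    can be sketched as a small predicate. This is only an illustration
    under assumptions: the function name is hypothetical, and treating an
    empty string as malformed is this sketch's choice, not something the
    thread states.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: report whether "s" is a syntactically acceptable
 * checksum string, i.e. a non-empty, even-length run of hex digits.
 * A string that fails here would be a syntax error; a string that passes
 * but does not match the computed checksum would be a semantic error.
 */
bool
is_wellformed_hex_checksum(const char *s)
{
    size_t      len = strlen(s);
    size_t      i;

    if (len == 0 || len % 2 != 0)
        return false;

    for (i = 0; i < len; i++)
    {
        if (!isxdigit((unsigned char) s[i]))
            return false;
    }
    return true;
}
```

    Appending "X" to a checksum fails this test twice over (odd length,
    non-hex character), whereas appending "AA" passes it and would only be
    caught later, when the value is compared.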
    
    
    
    
  208. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T17:04:45Z

    On Wed, Apr 1, 2020 at 4:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > Here's a new patch set. I haven't fixed the things in your latest
    > round of review comments yet, but I did rewrite the documentation for
    > pg_validatebackup, add documentation for the new pg_waldump option,
    > and add regression tests for the new WAL-checking facility of
    > pg_validatebackup.
    >
    > 0001 - add pg_waldump -q
    > 0002 - add checksum helpers
    > 0003 - core backup manifest patch, now with WAL verification included
    
    And here's another new patch set. After some experimentation, I was
    able to manually test the timeline-switch-during-a-base-backup case
    and found that it had bugs in both pg_validatebackup and the code I
    added to the backend's basebackup.c. So I fixed those. It would be
    nice to have automated tests, but you need a large database (so that
    backing it up takes non-trivial time) and a load on the primary (so
    that WAL is being replayed during the backup) and there's a race
    condition (because the backup has to not finish before the cascading
    standby learns that the upstream has been promoted), so I don't at
    present see a practical way to automate that. I did verify, in manual
    testing, that a problem with WAL files on either timeline caused a
    validation failure. I also verified that the LSNs at which the standby
    began replay and reached consistency matched what was stored in the
    manifest.
    
    I also implemented Noah's suggestion that we should write the backup
    manifest under a temporary name and then rename it afterward.
    Stephen's original complaint that you could end up with a backup that
    validates successfully even though we died before we got the WAL is,
    at this point, moot, because pg_validatebackup is now capable of
    noticing that the WAL is missing. Nevertheless, this seems like a nice
    belt-and-suspenders check. I was able to position the rename *after*
    we fsync() the backup directory, as well as after we get all of the
    WAL, so unless those steps complete you'll have backup_manifest.tmp
    rather than backup_manifest. It's true that, if we suffered an OS
    crash before the fsync() completed and lost some files or some file
    data, pg_validatebackup ought to fail anyway, but this way it is
    absolutely certain to fail, and to do so immediately. Likewise for a
    failure while fetching WAL that manages to leave the output directory
    behind.
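
    As a rough illustration of the write-then-rename idea (not the code in
    the patch), the sketch below writes data under a temporary name,
    fsync()s it, and only then renames it into place. The function name
    and the 0600 file mode are assumptions, and the sketch fsync()s only
    the file itself; it does not model fsyncing the backup directory or
    waiting for the WAL.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "len" bytes of "data" to tmp_path, flush
 * them to disk, then rename tmp_path to final_path.  If anything fails
 * before the rename, only the temporary file can exist.
 * Returns 0 on success and -1 on failure.
 */
int
write_file_then_rename(const char *tmp_path, const char *final_path,
                       const char *data, size_t len)
{
    size_t      done = 0;
    int         fd;

    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is written. */
    while (done < len)
    {
        ssize_t     rc = write(fd, data + done, len - done);

        if (rc <= 0)
        {
            close(fd);
            return -1;
        }
        done += (size_t) rc;
    }

    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Only now does the file become visible under its final name. */
    return rename(tmp_path, final_path) == 0 ? 0 : -1;
}
```

    A caller would pass something like "backup_manifest.tmp" and
    "backup_manifest" as the two paths, so that a validator looking for
    the final name fails immediately when an earlier step did not
    complete.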
    
    This version has also had a visit from the pgindent police.
    
    I think this responds to pretty much all of the complaints that I know
    about and upon which we have a reasonable degree of consensus. There
    are still some things that not everybody is happy about. In
    particular, Stephen and David are unhappy about using CRC-32C as the
    default algorithm, but Andres and Noah both think it's a reasonable
    choice, even if not as robust as everybody will want. As I agree, I'm
    going to stick with that choice.
    
    Also, there is still some debate about what the tool ought to be
    called. My previous suggestion to rename this from pg_validatebackup
    to pg_validatemanifest seems wrong now that WAL validation has been
    added; in fact, given that we now have two independent sanity checks
    on a backup, I'm going to argue that it would be reasonable to extend
    that by adding more kinds of backup validation, perhaps even including
    the permissions check that Andres suggested before. I don't plan to
    pursue that at present, though. There remains the idea of merging this
    with some other tool, but I still don't like that. On the one hand,
    it's been suggested that it could be merged into pg_checksums, but I
    think that is less appealing now that it seems to be growing into a
    general-purpose backup validation tool. It may do things that have
    nothing to do with checksums. On the other hand, it's been suggested
    that it ought to be called pg_validate and that pg_checksums ought to
    eventually be merged into it, but I don't think we have sufficient
    consensus here to commit the project to such a plan. Nobody
    responsible for the pg_checksums work has endorsed it, for example.
    Moreover, pg_checksums does things other than validation, such as
    enabling and disabling checksums. Therefore, I think it's unclear that
    such a plan would achieve a sufficient degree of consensus.
    
    For my part, I think this is a general issue that is not really this
    patch's problem to solve. We have had multiple discussions over the
    years about reducing the number of binaries that we ship. We could
    have a general binary called "pg" or similar and use subcommands: pg
    createdb, pg basebackup, pg validatebackup, etc. I think such an
    approach is worth considering, though it would certainly be an
    adjustment for everyone. Or we might do something else. But I don't
    want to deal with that in this patch.
    
    A couple of other minor suggestions have been made: (1) rejigger
    things to avoid message duplication related to launching external
    binaries, (2) maybe use appendShellString, and (3) change some details
    of error-reporting related to manifest parsing. I don't believe anyone
    views these as blockers; (1) and (2) are preexisting issues that this
    patch extends to one new case.
    
    Considering all the foregoing, I would like to go ahead and commit this stuff.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  209. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-04-02T17:23:18Z

    Hi,
    
    On 2020-04-02 13:04:45 -0400, Robert Haas wrote:
    > And here's another new patch set. After some experimentation, I was
    > able to manually test the timeline-switch-during-a-base-backup case
    > and found that it had bugs in both pg_validatebackup and the code I
    > added to the backend's basebackup.c. So I fixed those.
    
    Cool.
    
    
    > It would be
    > nice to have automated tests, but you need a large database (so that
    > backing it up takes non-trivial time) and a load on the primary (so
    > that WAL is being replayed during the backup) and there's a race
    > condition (because the backup has to not finish before the cascading
    > standby learns that the upstream has been promoted), so I don't at
    > present see a practical way to automate that. I did verify, in manual
    > testing, that a problem with WAL files on either timeline caused a
    > validation failure. I also verified that the LSNs at which the standby
    > began replay and reached consistency matched what was stored in the
    > manifest.
    
    I suspect it's possible to control the timing by preventing the
    checkpoint at the end of recovery from completing within a relevant
    timeframe. I think configuring a large checkpoint_timeout and using a
    non-fast base backup ought to do the trick. The state can be advanced by
    separately triggering an immediate checkpoint? Or by changing the
    checkpoint_timeout?
    
    
    
    > I also implemented Noah's suggestion that we should write the backup
    > manifest under a temporary name and then rename it afterward.
    > Stephen's original complaint that you could end up with a backup that
    > validates successfully even though we died before we got the WAL is,
    > at this point, moot, because pg_validatebackup is now capable of
    > noticing that the WAL is missing. Nevertheless, this seems like a nice
    > belt-and-suspenders check.
    
    Yea, it's imo generally a good idea.
    
    
    > I think this responds to pretty much all of the complaints that I know
    > about and upon which we have a reasonable degree of consensus. There
    > are still some things that not everybody is happy about. In
    > particular, Stephen and David are unhappy about using CRC-32C as the
    > default algorithm, but Andres and Noah both think it's a reasonable
    > choice, even if not as robust as everybody will want. As I agree, I'm
    > going to stick with that choice.
    
    I think it might be worth looking, in a later release, at something like
    blake3 for a fast cryptographic checksum. By allowing for instruction
    parallelism (by independently checksumming different blocks in data, and
    only advancing the "shared" checksum separately) it achieves
    considerably higher throughput rates.
    
    I suspect we should also look at a better non-crypto hash. xxhash or
    whatever. Not just for these checksums, but also for in-memory.
    
    
    > Also, there is still some debate about what the tool ought to be
    > called. My previous suggestion to rename this from pg_validatebackup
    > to pg_validatemanifest seems wrong now that WAL validation has been
    > added; in fact, given that we now have two independent sanity checks
    > on a backup, I'm going to argue that it would be reasonable to extend
    > that by adding more kinds of backup validation, perhaps even including
    > the permissions check that Andres suggested before.
    
    FWIW, the only check I'd really like to see in this release is the
    crosscheck with the file's length and the actually read data (to be able
    to diagnose FS issues).
    
    
    Greetings,
    
    Andres Freund
    
    
    
    
  210. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T18:16:27Z

    On Thu, Apr 2, 2020 at 1:23 PM Andres Freund <andres@anarazel.de> wrote:
    > I suspect it's possible to control the timing by preventing the
    > checkpoint at the end of recovery from completing within a relevant
    > timeframe. I think configuring a large checkpoint_timeout and using a
    > non-fast base backup ought to do the trick. The state can be advanced by
    > separately triggering an immediate checkpoint? Or by changing the
    > checkpoint_timeout?
    
    That might make the window fairly wide on normal systems, but I'm not
    sure about Raspberry Pi BF members or things running
    CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.
    
    > I think it might be worth looking, in a later release, at something like
    > blake3 for a fast cryptographic checksum. By allowing for instruction
    > parallelism (by independently checksumming different blocks in data, and
    > only advancing the "shared" checksum separately) it achieves
    > considerably higher throughput rates.
    >
    > I suspect we should also look at a better non-crypto hash. xxhash or
    > whatever. Not just for these checksums, but also for in-memory.
    
    I have no problem with that. I don't feel that I am well-placed to
    recommend for or against specific algorithms. Speed is easy to
    measure, but there's also code stability, the license under which
    something is released, the quality of the hashes it produces, and the
    extent to which it is cryptographically secure. I'm not an expert in
    any of that stuff, but if we get consensus on something it should be
    easy enough to plug it into this framework. Even changing the default
    would be no big deal.
    
    > FWIW, the only check I'd really like to see in this release is the
    > crosscheck with the file's length and the actually read data (to be able
    > to diagnose FS issues).
    
    Not sure I understand this comment. Isn't that a subset of what the
    patch already does? Are you asking for something to be changed?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  211. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-04-02T18:23:46Z

    Hi,
    
    On 2020-04-02 14:16:27 -0400, Robert Haas wrote:
    > On Thu, Apr 2, 2020 at 1:23 PM Andres Freund <andres@anarazel.de> wrote:
    > > I suspect it's possible to control the timing by preventing the
    > > checkpoint at the end of recovery from completing within a relevant
    > > timeframe. I think configuring a large checkpoint_timeout and using a
    > > non-fast base backup ought to do the trick. The state can be advanced by
    > > separately triggering an immediate checkpoint? Or by changing the
    > > checkpoint_timeout?
    > 
    > That might make the window fairly wide on normal systems, but I'm not
    > sure about Raspberry Pi BF members or things running
    > CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.
    
    You can set checkpoint_timeout to be a day. If that's not enough, well,
    then I think we have other problems.
    
    
    > > FWIW, the only check I'd really like to see in this release is the
    > > crosscheck with the file's length and the actually read data (to be able
    > > to diagnose FS issues).
    > 
    > Not sure I understand this comment. Isn't that a subset of what the
    > patch already does? Are you asking for something to be changed?
    
    Yes, I am asking for something to be changed: I'd like the code that
    read()s the file when computing the checksum to add up how many bytes
    were read, and compare that to the size in the manifest. And if there's
    a difference report an error about that, instead of a checksum failure.
    
    I've repeatedly seen filesystem issues lead to earlier EOFs when
    read()ing than what stat() returns. It'll be pretty annoying to have to
    debug a general "checksum failure", rather than just knowing that
    reading stopped after 100MB of 1GB.
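
    A minimal sketch of the requested check, under assumptions: the
    function and enum names are hypothetical, and a real verifier would
    feed each buffer into the checksum inside the same loop. The point is
    that the byte count is taken from what read() actually returned and is
    compared with the size recorded in the manifest, so a short read gets
    its own error instead of surfacing as a checksum failure.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

typedef enum
{
    FILE_OK,
    FILE_OPEN_FAILED,
    FILE_READ_FAILED,
    FILE_SIZE_MISMATCH
} file_check_result;

/*
 * Hypothetical helper: read "path" to EOF, count the bytes that read()
 * actually delivered, and compare the total with expected_size (the size
 * the manifest claims).  *bytes_read receives the observed total.
 */
file_check_result
check_bytes_read(const char *path, long long expected_size,
                 long long *bytes_read)
{
    char        buf[8192];
    ssize_t     rc;
    int         fd;

    *bytes_read = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return FILE_OPEN_FAILED;

    while ((rc = read(fd, buf, sizeof(buf))) > 0)
    {
        /* A real verifier would also update the checksum with buf here. */
        *bytes_read += rc;
    }
    close(fd);

    if (rc < 0)
        return FILE_READ_FAILED;
    if (*bytes_read != expected_size)
        return FILE_SIZE_MISMATCH;
    return FILE_OK;
}
```

    On FILE_SIZE_MISMATCH the caller can report both numbers, for example
    that reading stopped after *bytes_read bytes of expected_size.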
    
    Greetings,
    
    Andres Freund
    
    
    
    
  212. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T18:55:19Z

    On Thu, Apr 2, 2020 at 2:23 PM Andres Freund <andres@anarazel.de> wrote:
    > > That might make the window fairly wide on normal systems, but I'm not
    > > sure about Raspberry Pi BF members or things running
    > > CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.
    >
    > You can set checkpoint_timeout to be a day. If that's not enough, well,
    > then I think we have other problems.
    
    I'm not sure that's the only issue here, but I'll try it.
    
    > Yes, I am asking for something to be changed: I'd like the code that
    > read()s the file when computing the checksum to add up how many bytes
    > were read, and compare that to the size in the manifest. And if there's
    > a difference report an error about that, instead of a checksum failure.
    >
    > I've repeatedly seen filesystem issues lead to earlier EOFs when
    > read()ing than what stat() returns. It'll be pretty annoying to have to
    > debug a general "checksum failure", rather than just knowing that
    > reading stopped after 100MB of 1GB.
    
    Is 0004 attached like what you have in mind?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  213. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-04-02T19:02:55Z

    On 2020-04-02 14:55:19 -0400, Robert Haas wrote:
    > > Yes, I am asking for something to be changed: I'd like the code that
    > > read()s the file when computing the checksum to add up how many bytes
    > > were read, and compare that to the size in the manifest. And if there's
    > > a difference report an error about that, instead of a checksum failure.
    > >
    > > I've repeatedly seen filesystem issues lead to earlier EOFs when
    > > read()ing than what stat() returns. It'll be pretty annoying to have to
    > > debug a general "checksum failure", rather than just knowing that
    > > reading stopped after 100MB of 1GB.
    > 
    > Is 0004 attached like what you have in mind?
    
    Yes. Thanks!
    
    - Andres
    
    
    
    
  214. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-04-02T19:26:15Z

    On 4/2/20 1:04 PM, Robert Haas wrote:
    >
    > There
    > are still some things that not everybody is happy about. In
    > particular, Stephen and David are unhappy about using CRC-32C as the
    > default algorithm, but Andres and Noah both think it's a reasonable
    > choice, even if not as robust as everybody will want. As I agree, I'm
    > going to stick with that choice.
    
    Yeah, I seem to be on the losing side of this argument, at least for 
    now, so I don't think it should block the commit of this patch. It's an 
    easy enough tweak if we change our minds.
    
    > For my part, I think this is a general issue that is not really this
    > patch's problem to solve. We have had multiple discussions over the
    > years about reducing the number of binaries that we ship. We could
    > have a general binary called "pg" or similar and use subcommands: pg
    > createdb, pg basebackup, pg validatebackup, etc. I think such an
    > approach is worth considering, though it would certainly be an
    > adjustment for everyone. Or we might do something else. But I don't
    > want to deal with that in this patch.
    
    I'm fine with the current name, especially now that WAL is validated.
    
    > A couple of other minor suggestions have been made: (1) rejigger
    > things to avoid message duplication related to launching external
    > binaries, 
    
    That'd be nice to have, but I think we can live without it for now.
    
    > (2) maybe use appendShellString
    
    Seems like this would be good to have but I'm not going to make a fuss 
    about it.
    
    > and (3) change some details
    > of error-reporting related to manifest parsing. I don't believe anyone
    > views these as blockers
    
    I'd view this as later refinement once we see how the tool is being used 
    and/or get gripes from the field.
    
    So, with the addition of the 0004 patch down-thread this looks 
    committable to me.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  215. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T19:42:48Z

    On Thu, Apr 2, 2020 at 2:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > On Thu, Apr 2, 2020 at 2:23 PM Andres Freund <andres@anarazel.de> wrote:
    > > > That might make the window fairly wide on normal systems, but I'm not
    > > > sure about Raspberry Pi BF members or things running
    > > > CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.
    > >
    > > You can set checkpoint_timeout to be a day. If that's not enough, well,
    > > then I think we have other problems.
    >
    > I'm not sure that's the only issue here, but I'll try it.
    
    I ran into a few problems here. In trying to set this up manually, I
    always began with the following steps:
    
    ====
    # (1) create cluster
    initdb
    
    # (2) add to configuration file
    log_checkpoints=on
    checkpoint_timeout=1d
    checkpoint_completion_target=0.99
    
    # (3) fire it up
    postgres
    createdb
    ====
    
    If at this point I do "pg_basebackup -D pgslave -R -c spread", it
    completes within a few seconds anyway, because there's basically
    nothing dirty, and no matter how slowly you write out no data, it's
    still pretty quick. If I run "pgbench -i" first, and then
    "pg_basebackup -D pgslave -R -c spread", it hangs, apparently
    essentially forever, because now the checkpoint has something to do,
    and it does it super-slowly, and "psql -c checkpoint" makes it finish
    immediately. However, this experiment isn't testing quite the right
    thing, because what I actually need is a slow backup off of a
    cascading standby, so that I have time to promote the parent standby
    before the backup completes. I tried continuing like this:
    
    ====
    # (4) set up standby
    pg_basebackup -D pgslave -R
    postgres -D pgslave -c port=5433
    
    # (5) set up cascading standby
    pg_basebackup -D pgslave2 -d port=5433 -R
    postgres -c port=5434 -D pgslave2
    
    # (6) dirty some pages on the master
    pgbench -i
    
    # (7) start a backup of the cascading standby
    pg_basebackup -D pgslave3 -d port=5434 -R -c spread
    ====
    
    However, the pg_basebackup in the last step completes after only a few
    seconds. If it were hanging, then I could continue with "pg_ctl
    promote -D pgslave" and that might give me what I need, but that's not
    what happens.
    
    I suspect I'm not doing quite what you had in mind here... thoughts?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  216. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-02T19:43:09Z

    On Thu, Apr 2, 2020 at 3:26 PM David Steele <david@pgmasters.net> wrote:
    > So, with the addition of the 0004 patch down-thread this looks
    > committable to me.
    
    Glad to hear it. Thank you.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  217. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-04-02T19:47:50Z

    On 2020-04-02 15:42:48 -0400, Robert Haas wrote:
    > I suspect I'm not doing quite what you had in mind here... thoughts?
    
    I have some ideas, but I think it's complicated enough that I'd not put
    it in the "pre commit path" for now.
    
    
    
    
  218. Re: backup manifests

    David Steele <david@pgmasters.net> — 2020-04-02T20:34:26Z

    On 4/2/20 3:47 PM, Andres Freund wrote:
    > On 2020-04-02 15:42:48 -0400, Robert Haas wrote:
    >> I suspect I'm not doing quite what you had in mind here... thoughts?
    > 
    > I have some ideas, but I think it's complicated enough that I'd not put
    > it in the "pre commit path" for now.
    
    +1. These would be great tests to have and a win for pg_basebackup 
    overall but I don't think they should be a prerequisite for this commit.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  219. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-03T19:22:23Z

    On Thu, Apr 2, 2020 at 4:34 PM David Steele <david@pgmasters.net> wrote:
    > +1. These would be great tests to have and a win for pg_basebackup
    > overall but I don't think they should be a prerequisite for this commit.
    
    Not to mention the server. I can't say that I have a lot of confidence
    that all of the server behavior in this area is well-understood and
    sane.
    
    I've pushed all the patches. Hopefully everyone is happy now, or at
    least not so unhappy that they're going to break quarantine to beat me
    up. I hope I acknowledged all of the relevant people in the commit
    message, but it's possible that I missed somebody; if so, my
    apologies. As is my usual custom, I added entries in roughly the order
    that people chimed in on the thread, so the ordering should not be
    taken as a reflection of magnitude of contribution or, well, anything
    other than the approximate order in which they chimed in.
    
    It looks like the buildfarm is unhappy though, so I guess I'd better
    go look at that.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  220. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-03T19:53:06Z

    On Fri, Apr 3, 2020 at 3:22 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > It looks like the buildfarm is unhappy though, so I guess I'd better
    > go look at that.
    
    I fixed two things so far, and there seems to be at least one more
    possible issue that I don't understand.
    
    1. Apparently, we have an automated perlcritic run built in to the
    build farm, and apparently, it really hates Perl subroutines that
    don't end with an explicit return statement. We have that overridden
    to severity 5 in our Perl critic configuration. I guess I should've
    known this, but didn't. I've pushed a fix adding return statements. I
    believe I'm on record as thinking that perlcritic is a tool for
    complaining about a lot of things that don't really matter and very
    few that actually do -- but it's project style, so I'll suck it up!
    
    2. Also, a bunch of machines were super-unhappy with
    003_corruption.pl, failing with this sort of thing:
    
    pg_basebackup: error: could not get COPY data stream: ERROR:  symbolic
    link target too long for tar format: file name "pg_tblspc/16387",
    target "/home/fabien/pg/build-farm-11/buildroot/HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/tmp_test_7w0w"
    
    Apparently, this is a known problem and the solution is to use
    TestLib::tempdir_short instead of TestLib::tempdir, so I pushed a fix
    to make it do that.
    
    3. spurfowl has failed its last two runs like this:
    
    sh: 1: ./configure: not found
    
    I am not sure how this patch could've caused that to happen, but the
    timing of the failures is certainly suspicious.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
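
For context on the second failure: the ustar header reserves 100 bytes for a link target, which is why a long build directory can push a tablespace symlink over the limit. A minimal shell sketch of that length check follows. The function name fits_tar_linkname is hypothetical, and the exact threshold the server enforces is not stated in this thread, so the limit is a parameter that defaults to the field width.

```shell
# fits_tar_linkname PATH [LIMIT]
# Succeeds if PATH, measured in bytes, fits within LIMIT (default 100,
# the width of the ustar linkname field; the server's own cutoff is an
# assumption here and may be slightly lower).
fits_tar_linkname() {
    path=$1
    limit=${2:-100}
    # Count bytes rather than characters, since the tar field is byte-sized.
    len=$(printf '%s' "$path" | wc -c | tr -d ' ')
    [ "$len" -le "$limit" ]
}
```

Applied to the target quoted in the error message above, the check fails, while a short path such as one under /tmp passes.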
    
    
    
    
  221. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-03T20:49:19Z

    On Fri, Apr 3, 2020 at 3:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > 2. Also, a bunch of machines were super-unhappy with
    > 003_corruption.pl, failing with this sort of thing:
    >
    > pg_basebackup: error: could not get COPY data stream: ERROR:  symbolic
    > link target too long for tar format: file name "pg_tblspc/16387",
    > target "/home/fabien/pg/build-farm-11/buildroot/HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/tmp_test_7w0w"
    >
    > Apparently, this is a known problem and the solution is to use
    > TestLib::tempdir_short instead of TestLib::tempdir, so I pushed a fix
    > to make it do that.
    
    By and large, the buildfarm is a lot happier now, but fairywren
    (Windows / Msys Server 2019 / 2 gcc 7.3.0 x86_64) failed like this:
    
    # Postmaster PID for node "master" is 198420
    error running SQL: 'psql:<stdin>:3: ERROR:  directory
    "/tmp/9peoZHrEia" does not exist'
    while running 'psql -XAtq -d port=51493 host=127.0.0.1
    dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'CREATE TABLE x1
    (a int);
    INSERT INTO x1 VALUES (111);
    CREATE TABLESPACE ts1 LOCATION '/tmp/9peoZHrEia';
    CREATE TABLE x2 (a int) TABLESPACE ts1;
    INSERT INTO x1 VALUES (222);
    ' at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
    line 1531.
    ### Stopping node "master" using mode immediate
    
    I wondered why this should be failing on this machine when none of the
    other places where tempdir_short is used are similarly failing. The
    answer appears to be that most of the TAP tests that use tempdir_short
    just do this:
    
    my $tempdir_short = TestLib::tempdir_short;
    
    ...and then ignore that variable completely for the rest of the
    script.  That's not ideal, and we should probably remove those calls
    to avoid giving the impression that it's actually used for something. The two TAP
    tests that actually do something with it - apart from the one I just
    added - are pg_basebackup's 010_pg_basebackup.pl and pg_ctl's
    001_start_stop.pl. However, both of those are skipped on Windows.
    Also, PostgresNode.pm itself uses it, but only when UNIX sockets are
    used, so again not on Windows. So it sorta looks to me like we have no
    preexisting tests that meaningfully exercise TestLib::tempdir_short on
    Windows.
    
    Given that, I suppose I should consider myself lucky if this ends up
    working on *any* of the Windows critters, but given the implementation
    I'm kinda surprised we have a problem. That function is just:
    
    sub tempdir_short
    {
    
            return File::Temp::tempdir(CLEANUP => 1);
    }
    
    And File::Temp's documentation says that the temporary directory is
    picked using File::Spec's tmpdir(), which says that it knows about
    different operating systems and will DTRT on Unix, Mac, OS2, Win32,
    and VMS. Yet on fairywren it is apparently DTWT. I'm not sure why.
    
    Any ideas?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  222. Re: backup manifests

    Alvaro Herrera <alvherre@2ndquadrant.com> — 2020-04-03T20:54:12Z

    On 2020-Apr-03, Robert Haas wrote:
    
    > sub tempdir_short
    > {
    > 
    >         return File::Temp::tempdir(CLEANUP => 1);
    > }
    > 
    > And File::Temp's documentation says that the temporary directory is
    > picked using File::Spec's tmpdir(), which says that it knows about
    > different operating systems and will DTRT on Unix, Mac, OS2, Win32,
    > and VMS. Yet on fairywren it is apparently DTWT. I'm not sure why.
    
    Maybe it needs perl2host?
    
    -- 
    Álvaro Herrera                https://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  223. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-03T21:07:00Z

    On Fri, Apr 3, 2020 at 4:54 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
    > Maybe it needs perl2host?
    
    *jaw drops*
    
    Wow, OK, yeah, that looks like the thing.  Thanks for the suggestion;
    I didn't know that existed (and I kinda wish I still didn't).
    
    I'll go see about adding that.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  224. Re: backup manifests

    Justin Pryzby <pryzby@telsasoft.com> — 2020-04-03T21:24:45Z

    On Fri, Apr 03, 2020 at 03:22:23PM -0400, Robert Haas wrote:
    > I've pushed all the patches.
    
    I didn't manage to look at this in advance but have some doc fixes.
    
    word-diff:
    
    diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
    index 536de9a698..d84afb7b18 100644
    --- a/doc/src/sgml/protocol.sgml
    +++ b/doc/src/sgml/protocol.sgml
    @@ -2586,7 +2586,7 @@ The commands accepted in replication mode are:
              and sent along with the backup.  The manifest is a list of every
              file present in the backup with the exception of any WAL files that
              may be included. It also stores the size, last modification time, and
              [-an optional-]{+optionally a+} checksum for each file.
              A value of <literal>force-escape</literal> forces all filenames
              to be hex-encoded; otherwise, this type of encoding is performed only
              for files whose names are non-UTF8 octet sequences.
    @@ -2602,7 +2602,7 @@ The commands accepted in replication mode are:
            <term><literal>MANIFEST_CHECKSUMS</literal></term>
            <listitem>
             <para>
              Specifies the {+checksum+} algorithm that should be applied to each file included
              in the backup manifest. Currently, the available
              algorithms are <literal>NONE</literal>, <literal>CRC32C</literal>,
              <literal>SHA224</literal>, <literal>SHA256</literal>,
    diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
    index c778e061f3..922688e227 100644
    --- a/doc/src/sgml/ref/pg_basebackup.sgml
    +++ b/doc/src/sgml/ref/pg_basebackup.sgml
    @@ -604,7 +604,7 @@ PostgreSQL documentation
            not contain any checksums. Otherwise, it will contain a checksum
            of each file in the backup using the specified algorithm. In addition,
            the manifest will always contain a <literal>SHA256</literal>
            checksum of its own [-contents.-]{+content.+} The <literal>SHA</literal> algorithms
            are significantly more CPU-intensive than <literal>CRC32C</literal>,
            so selecting one of them may increase the time required to complete
            the backup.
    @@ -614,7 +614,7 @@ PostgreSQL documentation
            of each file for users who wish to verify that the backup has not been
            tampered with, while the CRC32C algorithm provides a checksum which is
            much faster to calculate and good at catching errors due to accidental
            changes but is not resistant to [-targeted-]{+malicious+} modifications.  Note that, to
            be useful against an adversary who has access to the backup, the backup
            manifest would need to be stored securely elsewhere or otherwise
            verified not to have been modified since the backup was taken.
    diff --git a/doc/src/sgml/ref/pg_validatebackup.sgml b/doc/src/sgml/ref/pg_validatebackup.sgml
    index 19888dc196..748ac439a6 100644
    --- a/doc/src/sgml/ref/pg_validatebackup.sgml
    +++ b/doc/src/sgml/ref/pg_validatebackup.sgml
    @@ -41,12 +41,12 @@ PostgreSQL documentation
      </para>
    
      <para>
       It is important to note that[-that-] the validation which is performed by
       <application>pg_validatebackup</application> does not and [-can not-]{+cannot+} include
       every check which will be performed by a running server when attempting
       to make use of the backup. Even if you use this tool, you should still
       perform test restores and verify that the resulting databases work as
       expected and that they[-appear to-] contain the correct data. However,
       <application>pg_validatebackup</application> can detect many problems
       that commonly occur due to storage problems or user error.
      </para>
    @@ -73,7 +73,7 @@ PostgreSQL documentation
       a <literal>backup_manifest</literal> file in the target directory or
       about anything inside <literal>pg_wal</literal>, even though these
       files won't be listed in the backup manifest. Only files are checked;
       the presence or absence [-or-]{+of+} directories is not verified, except
       indirectly: if a directory is missing, any files it should have contained
       will necessarily also be missing. 
      </para>
    @@ -84,7 +84,7 @@ PostgreSQL documentation
       for any files for which the computed checksum does not match the
       checksum stored in the manifest. This step is not performed for any files
       which produced errors in the previous step, since they are already known
       to have problems. [-Also, files-]{+Files+} which were ignored in the previous step are
       also ignored in this step.
      </para>
    
    @@ -123,7 +123,7 @@ PostgreSQL documentation
      <title>Options</title>
    
       <para>
        The following command-line options control the [-behavior.-]{+behavior of this program.+}
    
        <variablelist>
         <varlistentry>
    diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
    index 3b18e733cd..aa72a6ff10 100644
    --- a/src/backend/replication/basebackup.c
    +++ b/src/backend/replication/basebackup.c
    @@ -1148,7 +1148,7 @@ AddFileToManifest(manifest_info *manifest, const char *spcoid,
    	}
    
    	/*
    	 * Each file's entry [-need-]{+needs+} to be separated from any entry that follows by a
    	 * comma, but there's no comma before the first one or after the last one.
    	 * To make that work, adding a file to the manifest starts by terminating
    	 * the most recently added line, with a comma if appropriate, but does not
    
    -- 
    Justin
    
  225. backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-03T21:27:21Z

    [ splitting this off into a separate thread ]
    
    On Fri, Apr 3, 2020 at 5:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > I'll go see about adding that.
    
    Done now. Meanwhile, two more machines have reported the mysterious message:
    
    sh: ./configure: not found
    
    ...that first appeared on spurfowl a few hours ago. The other two
    machines are eelpout and elver, both of which list Thomas Munro as a
    maintainer. spurfowl lists Stephen Frost. Thomas, Stephen, can one of
    you check and see what's going on? spurfowl has failed this way four
    times now, and eelpout and elver have each failed the last two runs,
    but since there's no helpful information in the logs, it's hard to
    guess what went wrong.
    
    I'm sort of afraid that something in the new TAP tests accidentally
    removed way too many files during the cleanup phase - e.g. it decided
    the temporary directory was / and removed every file it could access,
    or something like that. It doesn't do that here, or I, uh, would've
    noticed by now. But sometimes strange things happen on other people's
    machines. Hopefully one of those strange things is not that my test
    code is single-handedly destroying the entire buildfarm, but it's
    possible.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  226. Re: backup manifests and contemporaneous buildfarm failures

    Fabien COELHO <coelho@cri.ensmp.fr> — 2020-04-03T21:58:30Z

    Hello Robert,
    
    > Done now. Meanwhile, two more machines have reported the mysterious message:
    >
    > sh: ./configure: not found
    >
    > ...that first appeared on spurfowl a few hours ago. The other two
    > machines are eelpout and elver, both of which list Thomas Munro as a
    > maintainer. spurfowl lists Stephen Frost. Thomas, Stephen, can one of
    > you check and see what's going on? spurfowl has failed this way four
    > times now, and eelpout and elver have each failed the last two runs,
    > but since there's no helpful information in the logs, it's hard to
    > guess what went wrong.
    >
    > I'm sort of afraid that something in the new TAP tests accidentally
    > removed way too many files during the cleanup phase - e.g. it decided
    > the temporary directory was / and removed every file it could access,
    > or something like that. It doesn't do that here, or I, uh, would've
    > noticed by now. But sometimes strange things happen on other people's
    > machines. Hopefully one of those strange things is not that my test
    > code is single-handedly destroying the entire buildfarm, but it's
    > possible.
    
    seawasp just failed the same way. Good news, I can see "configure" under 
    "HEAD/pgsql".
    
    The only strange thing under buildroot I found is:
    
    HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/
    
    this last directory's perms are d---------, which seems to break cleanup.
    
    It may be a leftover from a previous run which failed (possibly 21dc488 
    ?). I cannot see how this would be related to configure, though. Maybe 
    something else fails silently and the message is about a consequence of 
    the prior silent failure.
    
    I commented out the cron job and will try to look into it tomorrow if 
    the status has not changed by then.
    
    -- 
    Fabien.
    
    
    
    
  227. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-03T22:12:48Z

    Fabien COELHO <coelho@cri.ensmp.fr> writes:
    > The only strange thing under buildroot I found is:
    
    > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/
    
    > this last directory perms are d--------- which seems to break cleanup.
    
    Locally, I observe that "make clean" in src/bin/pg_validatebackup fails
    to clean up the tmp_check directory left behind by "make check".
    So the new makefile is not fully plugged into its standard
    responsibilities.  I don't see any unreadable subdirectories though.
    
    I wonder if VPATH versus not-VPATH might be a relevant factor ...
    
    			regards, tom lane
    
    
    
    
  228. Re: backup manifests and contemporaneous buildfarm failures

    Alvaro Herrera <alvherre@2ndquadrant.com> — 2020-04-03T22:24:41Z

    On 2020-Apr-03, Tom Lane wrote:
    
    > I wonder if VPATH versus not-VPATH might be a relevant factor ...
    
    Oh, absolutely.  The ones that failed show, in the last successful run,
    the configure line invoked as "./configure", while the animals that are
    still running are invoking configure from some other directory.
    
    -- 
    Álvaro Herrera                https://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  229. Re: backup manifests and contemporaneous buildfarm failures

    Thomas Munro <thomas.munro@gmail.com> — 2020-04-03T22:29:50Z

    On Sat, Apr 4, 2020 at 11:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Fabien COELHO <coelho@cri.ensmp.fr> writes:
    > > The only strange thing under buildroot I found is:
    >
    > > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/
    >
    > > this last directory perms are d--------- which seems to break cleanup.
    
    Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
    presumably by the perl subroutine mutilate_open_directory_fails.  I
    see this in my inbox (the build farm wrote it to stderr or stdout
    rather than the log file):
    
    cannot chdir to child for
    pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
    Permission denied at ./run_build.pl line 1013.
    cannot remove directory for
    pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
    Directory not empty at ./run_build.pl line 1013.
    cannot remove directory for
    pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup:
    Directory not empty at ./run_build.pl line 1013.
    cannot remove directory for
    pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data:
    Directory not empty at ./run_build.pl line 1013.
    cannot remove directory for
    pgsql.build/src/bin/pg_validatebackup/tmp_check: Directory not empty
    at ./run_build.pl line 1013.
    cannot remove directory for pgsql.build/src/bin/pg_validatebackup:
    Directory not empty at ./run_build.pl line 1013.
    cannot remove directory for pgsql.build/src/bin: Directory not empty
    at ./run_build.pl line 1013.
    cannot remove directory for pgsql.build/src: Directory not empty at
    ./run_build.pl line 1013.
    cannot remove directory for pgsql.build: Directory not empty at
    ./run_build.pl line 1013.
    cannot chdir to child for
    pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
    Permission denied at ./run_build.pl line 589.
    cannot remove directory for
    pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
    Directory not empty at ./run_build.pl line 589.
    cannot remove directory for
    pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup:
    Directory not empty at ./run_build.pl line 589.
    cannot remove directory for
    pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data:
    Directory not empty at ./run_build.pl line 589.
    cannot remove directory for
    pgsql.build/src/bin/pg_validatebackup/tmp_check: Directory not empty
    at ./run_build.pl line 589.
    cannot remove directory for pgsql.build/src/bin/pg_validatebackup:
    Directory not empty at ./run_build.pl line 589.
    cannot remove directory for pgsql.build/src/bin: Directory not empty
    at ./run_build.pl line 589.
    cannot remove directory for pgsql.build/src: Directory not empty at
    ./run_build.pl line 589.
    cannot remove directory for pgsql.build: Directory not empty at
    ./run_build.pl line 589.
    
    
    
    
  230. Re: backup manifests and contemporaneous buildfarm failures

    Stephen Frost <sfrost@snowman.net> — 2020-04-03T22:39:41Z

    Greetings,
    
    * Thomas Munro (thomas.munro@gmail.com) wrote:
    > On Sat, Apr 4, 2020 at 11:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > > Fabien COELHO <coelho@cri.ensmp.fr> writes:
    > > > The only strange thing under buildroot I found is:
    > >
    > > > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/
    > >
    > > > this last directory perms are d--------- which seems to break cleanup.
    > 
    > Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
    > presumably by the perl subroutine mutilate_open_directory_fails.  I
    > see this in my inbox (the build farm wrote it to stderr or stdout
    > rather than the log file):
    
    Yup, saw the same here.
    
    chmod'ing it to 755 seemed to result in the next run cleaning it up, at
    least.  Not sure how things will go on the next actual build tho.
    
    Thanks,
    
    Stephen
    
  231. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-03T22:48:01Z

    Thomas Munro <thomas.munro@gmail.com> writes:
    > Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
    > presumably by the perl subroutine mutilate_open_directory_fails.  I
    > see this in my inbox (the build farm wrote it to stderr or stdout
    > rather than the log file):
    
    > cannot chdir to child for
    > pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
    > Permission denied at ./run_build.pl line 1013.
    > cannot remove directory for
    > pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
    > Directory not empty at ./run_build.pl line 1013.
    
    I'm guessing that we're looking at a platform-specific difference in
    whether "rm -rf" fails outright on an unreadable subdirectory, or
    just tries to carry on by unlinking it anyway.
    
    A partial fix would be to have the test script put back normal
    permissions on that directory before it exits ... but any failure
    partway through the script would leave a time bomb requiring manual
    cleanup.
    
    On the whole, I'd argue that testing that behavior is not valuable
    enough to take risks of periodically breaking buildfarm members
    in a way that will require manual recovery --- to say nothing of
    annoying developers who trip over it.  So my vote is to remove
    that part of the test and be satisfied with checking the behavior
    for an unreadable file.
    
    This doesn't directly explain the failure-at-next-configure behavior
    that we're seeing in the buildfarm, but it wouldn't be too surprising
    if it ends up being that the buildfarm client script doesn't manage
    to fully recover from the situation.
    
    			regards, tom lane
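
One way to narrow the "time bomb" window described above is to register the permission restore as an exit handler, so it runs even when a step fails partway through. The following is a shell sketch under stated assumptions: the real test is a Perl TAP script, the function name run_with_unreadable_dir is hypothetical, and a handler like this still will not run if the process is killed with SIGKILL.

```shell
# run_with_unreadable_dir WORKDIR COMMAND...
# Makes WORKDIR/pg_subtrans unreadable, runs COMMAND, and on exit
# restores permissions and removes WORKDIR, whether or not COMMAND
# succeeded. The function body is a subshell so the trap stays local.
run_with_unreadable_dir() (
    workdir=$1
    shift
    victim="$workdir/pg_subtrans"
    mkdir -p "$victim" || exit 1
    # Restore permissions first; otherwise rm -rf may be unable to descend.
    trap 'chmod 700 "$victim" 2>/dev/null; rm -rf "$workdir"' EXIT
    chmod 000 "$victim"
    "$@"
)
```

The exit status is that of COMMAND, so a caller can still assert that the check against the unreadable directory failed as expected.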
    
    
    
    
  232. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-03T23:02:13Z

    I wrote:
    > I'm guessing that we're looking at a platform-specific difference in
    > whether "rm -rf" fails outright on an unreadable subdirectory, or
    > just tries to carry on by unlinking it anyway.
    
    Yeah... on my RHEL6 box, "make check" cleans up the working directories
    under tmp_check, but on a FreeBSD 12.1 box, not so much: I'm left with
    
    $ ls tmp_check/
    log/                            t_003_corruption_master_data/
    tgl@oldmini$ ls -R tmp_check/t_003_corruption_master_data/
    backup/
    
    tmp_check/t_003_corruption_master_data/backup:
    open_directory_fails/
    
    tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
    pg_subtrans/
    
    tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
    ls: tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans: Permission denied
    
    I did not see any complaints printed to the terminal, but in
    regress_log_003_corruption there's
    
    ...
    ok 40 - corrupt backup fails validation: open_directory_fails: matches
    cannot chdir to child for /usr/home/tgl/pgsql/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans: Permission denied at t/003_corruption.pl line 126.
    cannot remove directory for /usr/home/tgl/pgsql/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails: Directory not empty at t/003_corruption.pl line 126.
    # Running: pg_basebackup -D /usr/home/tgl/pgsql/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/search_directory_fails --no-sync -T /tmp/lxaL_sLcnr=/tmp/_fegwVjoDR
    ok 41 - base backup ok
    ...
    
    This may be more of a Perl version issue than a platform issue,
    but either way it's a problem.
    
    Also, on the FreeBSD box, "rm -rf" isn't happy either:
    
    $ rm -rf tmp_check
    rm: tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans: Permission denied
    rm: tmp_check/t_003_corruption_master_data/backup/open_directory_fails: Directory not empty
    rm: tmp_check/t_003_corruption_master_data/backup: Directory not empty
    rm: tmp_check/t_003_corruption_master_data: Directory not empty
    rm: tmp_check: Directory not empty
    
    
    			regards, tom lane
    
    
    
    
  233. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-03T23:50:13Z

    On Fri, Apr 3, 2020 at 6:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > I'm guessing that we're looking at a platform-specific difference in
    > whether "rm -rf" fails outright on an unreadable subdirectory, or
    > just tries to carry on by unlinking it anyway.
    
    My intention was that it would be cleaned by the TAP framework itself,
    since the temporary directories it creates are marked for cleanup. But
    it may be that there's a platform dependency in the behavior of Perl's
    File::Path::rmtree, too.
    
    > A partial fix would be to have the test script put back normal
    > permissions on that directory before it exits ... but any failure
    > partway through the script would leave a time bomb requiring manual
    > cleanup.
    
    Yeah. I've pushed that fix for now, but as you say, it may not survive
    contact with the enemy. That's kind of disappointing, because I put a
    lot of work into trying to make the tests cover every line of code
    that they possibly could, and there's no reason to suppose that
    pg_validatebackup is the only tool that could benefit from having code
    coverage of those kinds of scenarios. It's probably not even the tool
    that is most in need of such testing; it must be far worse if, say,
    pg_rewind can't cope with it.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
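
The partial fix discussed above (put normal permissions back before cleanup) can be illustrated outside the TAP framework. What follows is a minimal Python sketch of that idea, not the Perl change that was actually pushed; the helper names restore_permissions, force_rmtree and make_unreadable_tree are hypothetical, and the directory layout only imitates the one shown earlier in the thread.

```python
import os
import shutil
import stat
import tempfile


def restore_permissions(root: str) -> None:
    """Give the owner rwx on root and every directory below it, top-down."""
    os.chmod(root, stat.S_IRWXU)
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so we never chmod something outside the tree.
            if not os.path.islink(path):
                # Fix the child before os.walk descends into it.
                os.chmod(path, stat.S_IRWXU)


def force_rmtree(root: str) -> None:
    """Remove a tree even if a test left mode-0 or mode-0400 directories in it."""
    restore_permissions(root)
    shutil.rmtree(root)


def make_unreadable_tree() -> str:
    """Build <tmp>/backup/open_directory_fails/pg_subtrans, with pg_subtrans at mode 0."""
    base = tempfile.mkdtemp(prefix="tmp_check_")
    victim = os.path.join(base, "backup", "open_directory_fails", "pg_subtrans")
    os.makedirs(victim)
    with open(os.path.join(victim, "0000"), "w") as fh:
        fh.write("x")
    os.chmod(victim, 0)
    return base
```

As noted in the message, this only helps if the cleanup code actually runs; a script that dies partway through still leaves the unreadable directory behind.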
    
    
    
    
  234. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-03T23:55:34Z

    On Fri, Apr 3, 2020 at 5:58 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
    > seawasp just failed the same way. Good news, I can see "configure" under
    > "HEAD/pgsql".
    
    Ah, good.
    
    > The only strange thing under buildroot I found is:
    >
    > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/
    
    Huh. I wonder how that got left behind ... it should've been cleaned
    up by the TAP test framework. But I pushed a commit to change the
    permissions back explicitly before exiting. As Tom says, I probably
    need to remove that entire test, but I'm going to try this first.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  235. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-04T00:12:15Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > On Fri, Apr 3, 2020 at 6:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> I'm guessing that we're looking at a platform-specific difference in
    >> whether "rm -rf" fails outright on an unreadable subdirectory, or
    >> just tries to carry on by unlinking it anyway.
    
    > My intention was that it would be cleaned by the TAP framework itself,
    > since the temporary directories it creates are marked for cleanup. But
    > it may be that there's a platform dependency in the behavior of Perl's
    > File::Path::rmtree, too.
    
    Yeah, so it would seem.  The buildfarm script uses rmtree to clean out
    the old build tree.  The man page for File::Path suggests (but can't
    quite bring itself to say in so many words) that by default, rmtree
    will adjust the permissions on target directories to allow the deletion
    to succeed.  But that's very clearly not happening on some platforms.
    (Maybe that represents a local patch on the part of some packagers
    who thought it was too unsafe?)
    
    Anyway, the end state presumably is that the pgsql.build directory
    is still there at the end of the buildfarm run, and the next run's
    attempt to also rmtree it fares no better.  Then look what it does
    to set up the new build:
    
    		system("cp -R -p $target $build_path 2>&1");
    
    Of course, if $build_path already exists, then cp copies to a subdirectory
    of the target not the target itself.  So that explains the symptom
    "./configure does not exist" --- it exists all right, but in a
    subdirectory below the one where the buildfarm expects it to be.
    
    It looks to me like the same problem would occur with VPATH or no.
    The lack of failures among the VPATH-using critters probably has
    more to do with whether their rmtree is willing to deal with this
    case than with VPATH.
    
    Anyway, it's evident that the buildfarm critters that are busted
    will need manual cleanup, because the script is not going to be
    able to get out of this by itself.  I remain of the opinion that
    the hazard of that happening again in the future (eg, if a buildfarm
    animal loses power during the test) is sufficient reason to remove
    this test case.
    
    			regards, tom lane
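
The "cp" behavior described above is easy to reproduce in isolation. The sketch below is a hedged Python illustration that shells out to the system "cp" (assumed to be a POSIX-style cp on the PATH); the directory names pgsql, fresh.build and stale.build are made up for the demonstration and are not the buildfarm's real paths.

```python
import os
import subprocess
import tempfile


def copy_tree(target: str, build_path: str) -> None:
    """Mimic the quoted call: cp -R -p $target $build_path."""
    subprocess.run(["cp", "-R", "-p", target, build_path], check=True)


def demo() -> dict:
    work = tempfile.mkdtemp(prefix="bf_demo_")
    target = os.path.join(work, "pgsql")
    os.mkdir(target)
    with open(os.path.join(target, "configure"), "w") as fh:
        fh.write("#!/bin/sh\n")

    fresh = os.path.join(work, "fresh.build")  # does not exist yet
    stale = os.path.join(work, "stale.build")  # survivor of a failed cleanup
    os.mkdir(stale)

    copy_tree(target, fresh)  # copies pgsql to fresh.build
    copy_tree(target, stale)  # copies pgsql to stale.build/pgsql

    return {
        "fresh_has_configure": os.path.exists(os.path.join(fresh, "configure")),
        "stale_has_configure": os.path.exists(os.path.join(stale, "configure")),
        "stale_nested_configure": os.path.exists(
            os.path.join(stale, "pgsql", "configure")
        ),
    }
```

When the destination already exists, configure ends up one level deeper than expected, which matches the "./configure does not exist" symptom.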
    
    
    
    
  236. Re: backup manifests

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-04T00:18:49Z

    BTW, some of the buildfarm is showing a simpler portability problem:
    they think you were too cavalier about the difference between time_t
    and pg_time_t.  (On a platform with 32-bit time_t, that's an actual
    bug, probably.)  lapwing is actually failing:
    
    https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2020-04-03%2021%3A41%3A49
    
    ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -Werror -I. -I. -I../../../src/include  -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  -c -o basebackup.o basebackup.c
    basebackup.c: In function 'AddFileToManifest':
    basebackup.c:1199:10: error: passing argument 1 of 'pg_gmtime' from incompatible pointer type [-Werror]
    In file included from ../../../src/include/access/xlog_internal.h:26:0,
                     from basebackup.c:20:
    ../../../src/include/pgtime.h:49:22: note: expected 'const pg_time_t *' but argument is of type 'time_t *'
    cc1: all warnings being treated as errors
    make[3]: *** [basebackup.o] Error 1
    
    but some others are showing it as a warning.
    
    I suppose that judicious s/time_t/pg_time_t/ would fix this.
    
    			regards, tom lane
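
To see why the time_t versus pg_time_t mismatch is more than a cosmetic warning when time_t is 32 bits wide, here is a small Python sketch using the struct module. It assumes a little-endian layout and models memory as a plain byte string; the function name read_as_int64 and the "neighbour" value are hypothetical, and this is an illustration of the hazard rather than what basebackup.c actually does at run time.

```python
import struct


def read_as_int64(time32: int, neighbour: int) -> int:
    """Store a 32-bit time value followed by 4 unrelated bytes, then read 8 bytes.

    This imitates handing a pointer to a 32-bit time_t to a function that
    dereferences a pointer to a 64-bit pg_time_t on a little-endian machine.
    """
    memory = struct.pack("<iI", time32, neighbour)  # 4 bytes + 4 bytes, no padding
    (value,) = struct.unpack("<q", memory)          # read all 8 bytes as one int64
    return value
```

If the adjacent four bytes happen to be zero the result looks right, which is one way such a bug can go unnoticed; any other neighbouring value changes the timestamp that is read.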
    
    
    
    
  237. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-04T00:48:09Z

    On Fri, Apr 3, 2020 at 6:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Locally, I observe that "make clean" in src/bin/pg_validatebackup fails
    > to clean up the tmp_check directory left behind by "make check".
    
    Fixed.
    
    I also tried to fix 'lapwing', which was complaining about a
    call to pg_gmtime, saying that it "expected 'const pg_time_t *' but
    argument is of type 'time_t *'". I was thinking that the problem had
    something to do with const, but Thomas pointed out to me that
    pg_time_t != time_t, so I pushed a fix which assumes that was the
    issue. (It was certainly *an* issue.)
    
    'prairiedog' is also unhappy, and it looks related:
    
    /bin/sh ../../../../config/install-sh -c -d '/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/commit_ts'/tmp_check
    cd . && TESTDIR='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/commit_ts' PATH="/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/Users/buildfarm/bf-data/HEAD/inst/bin:$PATH" DYLD_LIBRARY_PATH="/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/Users/buildfarm/bf-data/HEAD/inst/lib:$DYLD_LIBRARY_PATH" PGPORT='65678' PG_REGRESS='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/commit_ts/../../../../src/test/regress/pg_regress' REGRESS_SHLIB='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/regress/regress.so' /usr/local/perl5.8.3/bin/prove -I ../../../../src/test/perl/ -I . t/*.pl
    t/001_base.........ok
    t/002_standby......FAILED--Further testing stopped: system pg_basebackup failed
    make: *** [check] Error 25
    
    Unfortunately, that error message is not very informative and for some
    reason the TAP logs don't seem to be included in the buildfarm output
    in this case, so it's hard to tell exactly what went wrong. This
    appears to be another 32-bit critter, which may be related somehow,
    but I don't know how exactly.
    
    'serinus' is also failing. This is less obviously related:
    
    [02:08:55] t/003_constraints.pl .. ok     2048 ms ( 0.01 usr  0.00 sys +  1.28 cusr  0.38 csys =  1.67 CPU)
    # poll_query_until timed out executing this query:
    # SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');
    # expecting this output:
    # t
    # last actual query output:
    # f
    # with stderr:
    
    But there's also this:
    
    2020-04-04 02:08:57.297 CEST [5e87d019.506c1:1] LOG:  connection received: host=[local]
    2020-04-04 02:08:57.298 CEST [5e87d019.506c1:2] LOG:  replication connection authorized: user=bf application_name=tap_sub_16390_sync_16384
    2020-04-04 02:08:57.299 CEST [5e87d019.506c1:3] LOG:  statement: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
    2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication slot "tap_sub_16390_sync_16384" already exists
    TRAP: FailedAssertion("owner->bufferarr.nitems == 0", File: "/home/bf/build/buildfarm-serinus/HEAD/pgsql.build/../pgsql/src/backend/utils/resowner/resowner.c", Line: 718)
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(ExceptionalCondition+0x5c)[0x9a13ac]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(ResourceOwnerDelete+0x295)[0x9db8e5]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x54c61f]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(AbortOutOfAnyTransaction+0x122)[0x550e32]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x9b3bc9]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(shmem_exit+0x35)[0x80db45]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x80dc77]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(proc_exit+0x8)[0x80dd08]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(PostgresMain+0x59f)[0x83bd0f]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x7a0264]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(PostmasterMain+0xbfc)[0x7a2b8c]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(main+0x6fb)[0x49749b]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fc52d83bbbb]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(_start+0x2a)[0x49753a]
    2020-04-04 02:08:57.302 CEST [5e87d018.5066b:4] LOG:  server process (PID 329409) was terminated by signal 6: Aborted
    2020-04-04 02:08:57.302 CEST [5e87d018.5066b:5] DETAIL:  Failed process was running: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
    
    That might well be related. I note in passing that the DETAIL emitted
    by the postmaster shows the previous SQL command rather than the
    more-recent replication command, which seems like something to fix. (I
    still really dislike the fact that we have this evil hack allowing one
    connection to mix and match those sets of commands...)
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  238. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-04T00:54:17Z

    On Fri, Apr 3, 2020 at 8:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Yeah, so it would seem.  The buildfarm script uses rmtree to clean out
    > the old build tree.  The man page for File::Path suggests (but can't
    > quite bring itself to say in so many words) that by default, rmtree
    > will adjust the permissions on target directories to allow the deletion
    > to succeed.  But that's very clearly not happening on some platforms.
    > (Maybe that represents a local patch on the part of some packagers
    > who thought it was too unsafe?)
    
    Interestingly, on my machine, rmtree coped with a mode 0 directory
    just fine, but mode 0400 was more than its tiny brain could handle, so
    the originally committed fix had code to revert 0400 back to 0700, but
    I didn't add similar code to revert from 0 back to 0700 because that
    was working fine.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  239. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-04T01:52:05Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > 'prairiedog' is also unhappy, and it looks related:
    
    Yeah, gaur also failed in the same place.  Both of those are
    alignment-picky 32-bit hardware, so I'm thinking the problem is
    pg_gmtime() trying to fetch a 64-bit pg_time_t from an insufficiently
    aligned address.  I'm trying to confirm that on gaur's host right now,
    but it's a slow machine ...
    
    > 'serinus' is also failing. This is less obviously related:
    
    Dunno about this one.
    
    			regards, tom lane
    
    
    
    
  240. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-04T01:53:42Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > Interestingly, on my machine, rmtree coped with a mode 0 directory
    > just fine, but mode 0400 was more than its tiny brain could handle, so
    > the originally committed fix had code to revert 0400 back to 0700, but
    > I didn't add similar code to revert from 0 back to 0700 because that
    > was working fine.
    
    It seems really odd that an implementation could cope with mode-0
    but not mode-400.  Not sure I care enough to dig into the Perl
    library code, though.
    
    			regards, tom lane
    
    
    
    
  241. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-04T02:43:31Z

    On Fri, Apr 3, 2020 at 9:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    > > 'prairiedog' is also unhappy, and it looks related:
    >
    > Yeah, gaur also failed in the same place.  Both of those are
    > alignment-picky 32-bit hardware, so I'm thinking the problem is
    > pg_gmtime() trying to fetch a 64-bit pg_time_t from an insufficiently
    > aligned address.  I'm trying to confirm that on gaur's host right now,
    > but it's a slow machine ...
    
    You might just want to wait until tomorrow and see whether it clears
    up in newer runs. I just pushed yet another fix that might be
    relevant.
    
    I think I've done about as much as I can do for tonight, though. Most
    things are green now, and the ones that aren't are failing because of
    stuff that is at least plausibly fixed. By morning it should be
    clearer how much broken stuff is left, although that will be somewhat
    complicated by at least sidewinder and seawasp needing manual
    intervention to get back on track.
    
    I apologize to everyone who has been or will be inconvenienced by all
    of this. So far I've pushed 4 test case fixes, 2 bug fixes, and 1
    makefile fix, which I'm pretty sure is over quota for one patch. :-(
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  242. Re: backup manifests and contemporaneous buildfarm failures

    Andres Freund <andres@anarazel.de> — 2020-04-04T03:06:28Z

    Hi,
    
    Peter, Petr, CCed you because it's probably a bug somewhere around the
    initial copy code for logical replication.
    
    
    On 2020-04-03 20:48:09 -0400, Robert Haas wrote:
    > 'serinus' is also failing. This is less obviously related:
    
    Hm. Tests passed once since then.
    
    
    > 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received
    > replication command: CREATE_REPLICATION_SLOT
    > "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    > 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication
    > slot "tap_sub_16390_sync_16384" already exists
    
    That already seems suspicious. I checked the following (successful) run
    and I did not see that in the stage's logs.
    
    Looking at the failing log, it fails because for some reason there are
    multiple rounds (once due to a refresh, once due to an intentional
    replication failure) of copying the relation. Each creates its own
    temporary slot.
    
    first time:
    2020-04-04 02:08:57.276 CEST [5e87d019.506bd:1] LOG:  connection received: host=[local]
    2020-04-04 02:08:57.278 CEST [5e87d019.506bd:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    2020-04-04 02:08:57.282 CEST [5e87d019.506bd:9] LOG:  statement: COPY public.tab_rep TO STDOUT
    2020-04-04 02:08:57.284 CEST [5e87d019.506bd:10] LOG:  disconnection: session time: 0:00:00.007 user=bf database=postgres host=[local]
    
    second time:
    2020-04-04 02:08:57.288 CEST [5e87d019.506bf:1] LOG:  connection received: host=[local]
    2020-04-04 02:08:57.289 CEST [5e87d019.506bf:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    2020-04-04 02:08:57.293 CEST [5e87d019.506bf:9] LOG:  statement: COPY public.tab_rep TO STDOUT
    
    third time:
    2020-04-04 02:08:57.297 CEST [5e87d019.506c1:1] LOG:  connection received: host=[local]
    2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication slot "tap_sub_16390_sync_16384" already exists
    
    Note that the connection from the second attempt has not yet
    disconnected. Hence the error about the replication slot already
    existing - it's a temporary replication slot that'd otherwise already
    have been dropped.
    
    
    Seems the logical rep code needs to do something about this race?
    
    
    About the assertion failure:
    
    TRAP: FailedAssertion("owner->bufferarr.nitems == 0", File: "/home/bf/build/buildfarm-serinus/HEAD/pgsql.build/../pgsql/src/backend/utils/resowner/resowner.c", Line: 718)
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(ExceptionalCondition+0x5c)[0x9a13ac]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(ResourceOwnerDelete+0x295)[0x9db8e5]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x54c61f]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(AbortOutOfAnyTransaction+0x122)[0x550e32]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x9b3bc9]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(shmem_exit+0x35)[0x80db45]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x80dc77]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(proc_exit+0x8)[0x80dd08]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(PostgresMain+0x59f)[0x83bd0f]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)[0x7a0264]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(PostmasterMain+0xbfc)[0x7a2b8c]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(main+0x6fb)[0x49749b]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fc52d83bbbb]
    postgres: publisher: walsender bf [local] idle in transaction (aborted)(_start+0x2a)[0x49753a]
    2020-04-04 02:08:57.302 CEST [5e87d018.5066b:4] LOG:  server process (PID 329409) was terminated by signal 6: Aborted
    
    Due to the log_line_prefix used, I was at first not entirely sure the
    backend that crashed was the one with the ERROR. But it appears we print
    the pid as hex for '%c' (why?), so it indeed is the one.
    
    
    I, again, have to say that the amount of stuff that was done as part of
    
    commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
    Author: Peter Eisentraut <peter_e@gmx.net>
    Date:   2017-03-23 08:36:36 -0400
    
        Logical replication support for initial data copy
    
    is insane. Adding support for running sql over replication connections
    and extending CREATE_REPLICATION_SLOT with new options (without even
    mentioning that in the commit message!) as part of a commit described as
    "Logical replication support for initial data copy" shouldn't happen.
    
    
    It's not obvious to me what buffer pins could be held at this point. I
    wonder if this could be somehow related to
    
    commit 3cb646264e8ced9f25557ce271284da512d92043
    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Date:   2018-07-18 12:15:16 -0400
    
        Use a ResourceOwner to track buffer pins in all cases.
    ...
        In passing, remove some other ad-hoc resource owner creations that had
        gotten cargo-culted into various other places.  As far as I can tell
        that was all unnecessary, and if it had been necessary it was incomplete,
        due to lacking any provision for clearing those resowners later.
        (Also worth noting in this connection is that a process that hasn't called
        InitBufferPoolBackend has no business accessing buffers; so there's more
        to do than just add the resowner if we want to touch buffers in processes
        not covered by this patch.)
    
    which removed the resowner previously used in walsender. At the very
    least we should remove the SavedResourceOwnerDuringExport dance that's
    still done in snapbuild.c.  But it can't really be at fault here,
    because the crashing backend won't have used that.
    
    
    So I'm a bit confused here. The best approach is probably to try to
    reproduce this by adding an artificial delay into backend shutdown.
    
    
    > (I still really dislike the fact that we have this evil hack allowing
    > one connection to mix and match those sets of commands...)
    
    FWIW, I think the opposite. We should get rid of the difference as much
    as possible.
    
    Greetings,
    
    Andres Freund
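
The point above about '%c' can be checked mechanically. The PostgreSQL documentation describes the '%c' session ID as two hexadecimal numbers separated by a dot: the process start time and the process ID. A minimal Python sketch for decoding it (the function name parse_session_id is hypothetical):

```python
from datetime import datetime, timezone


def parse_session_id(session_id: str) -> tuple[datetime, int]:
    """Split a '%c' session ID into (backend start time in UTC, PID)."""
    start_hex, pid_hex = session_id.split(".")
    started = datetime.fromtimestamp(int(start_hex, 16), tz=timezone.utc)
    return started, int(pid_hex, 16)


started, pid = parse_session_id("5e87d019.506c1")
print(started.isoformat(), pid)
```

For the session that logged the ERROR, "5e87d019.506c1", the hex PID 506c1 is 329409 in decimal, the same PID the postmaster reports as terminated by signal 6.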
    
    
    
    
  243. Re: backup manifests and contemporaneous buildfarm failures

    Petr Jelinek <petr@2ndquadrant.com> — 2020-04-04T05:01:36Z

    On 04/04/2020 05:06, Andres Freund wrote:
    > Hi,
    > 
    > Peter, Petr, CCed you because it's probably a bug somewhere around the
    > initial copy code for logical replication.
    > 
    > 
    > On 2020-04-03 20:48:09 -0400, Robert Haas wrote:
    >> 'serinus' is also failing. This is less obviously related:
    > 
    > Hm. Tests passed once since then.
    > 
    > 
    >> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received
    >> replication command: CREATE_REPLICATION_SLOT
    >> "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    >> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication
    >> slot "tap_sub_16390_sync_16384" already exists
    > 
    > That already seems suspicious. I checked the following (successful) run
    > and I did not see that in the stage's logs.
    > 
    > Looking at the failing log, it fails because for some reason there are
    > multiple rounds (once due to a refresh, once due to an intentional
    > replication failure) of copying the relation. Each creates its own
    > temporary slot.
    > 
    > first time:
    > 2020-04-04 02:08:57.276 CEST [5e87d019.506bd:1] LOG:  connection received: host=[local]
    > 2020-04-04 02:08:57.278 CEST [5e87d019.506bd:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    > 2020-04-04 02:08:57.282 CEST [5e87d019.506bd:9] LOG:  statement: COPY public.tab_rep TO STDOUT
    > 2020-04-04 02:08:57.284 CEST [5e87d019.506bd:10] LOG:  disconnection: session time: 0:00:00.007 user=bf database=postgres host=[local]
    > 
    > second time:
    > 2020-04-04 02:08:57.288 CEST [5e87d019.506bf:1] LOG:  connection received: host=[local]
    > 2020-04-04 02:08:57.289 CEST [5e87d019.506bf:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    > 2020-04-04 02:08:57.293 CEST [5e87d019.506bf:9] LOG:  statement: COPY public.tab_rep TO STDOUT
    > 
    > third time:
    > 2020-04-04 02:08:57.297 CEST [5e87d019.506c1:1] LOG:  connection received: host=[local]
    > 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received replication command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
    > 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication slot "tap_sub_16390_sync_16384" already exists
    > 
    > Note that the connection from the second attempt has not yet
    > disconnected. Hence the error about the replication slot already
    > existing - it's a temporary replication slot that'd otherwise already
    > have been dropped.
    > 
    > 
    > Seems the logical rep code needs to do something about this race?
    > 
    
    The downstream:
    
    > 2020-04-04 02:08:57.275 CEST [5e87d019.506bc:1] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
    > 2020-04-04 02:08:57.282 CEST [5e87d019.506bc:2] ERROR:  duplicate key value violates unique constraint "tab_rep_pkey"
    > 2020-04-04 02:08:57.282 CEST [5e87d019.506bc:3] DETAIL:  Key (a)=(1) already exists.
    > 2020-04-04 02:08:57.282 CEST [5e87d019.506bc:4] CONTEXT:  COPY tab_rep, line 1
    > 2020-04-04 02:08:57.283 CEST [5e87d018.50689:5] LOG:  background worker "logical replication worker" (PID 329404) exited with exit code 1
    > 2020-04-04 02:08:57.287 CEST [5e87d019.506be:1] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
    > 2020-04-04 02:08:57.293 CEST [5e87d019.506be:2] ERROR:  duplicate key value violates unique constraint "tab_rep_pkey"
    > 2020-04-04 02:08:57.293 CEST [5e87d019.506be:3] DETAIL:  Key (a)=(1) already exists.
    > 2020-04-04 02:08:57.293 CEST [5e87d019.506be:4] CONTEXT:  COPY tab_rep, line 1
    > 2020-04-04 02:08:57.295 CEST [5e87d018.50689:6] LOG:  background worker "logical replication worker" (PID 329406) exited with exit code 1
    > 2020-04-04 02:08:57.297 CEST [5e87d019.506c0:1] LOG:  logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
    > 2020-04-04 02:08:57.299 CEST [5e87d019.506c0:2] ERROR:  could not create replication slot "tap_sub_16390_sync_16384": ERROR:  replication slot "tap_sub_16390_sync_16384" already exists
    > 2020-04-04 02:08:57.300 CEST [5e87d018.50689:7] LOG:  background worker "logical replication worker" (PID 329408) exited with exit code 
    
    Looks like we are simply retrying so fast that upstream has not
    finished cleanup after the second try by the time we run the third one.
    
    The last_start_times is supposed to protect against that so I guess 
    there is some issue with how that works.
    
    -- 
    Petr Jelinek
    2ndQuadrant - PostgreSQL Solutions for the Enterprise
    https://www.2ndQuadrant.com/
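
The kind of protection that a per-table "last start time" is meant to give can be sketched as a simple throttle. This is a hypothetical Python illustration of the general idea only; the class name StartThrottle, its interface, and the 5-second interval are assumptions, and it does not reflect how last_start_times is actually implemented in the logical replication code.

```python
class StartThrottle:
    """Refuse to restart work for the same key until min_interval has elapsed."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_start: dict[str, float] = {}

    def may_start(self, key: str, now: float) -> bool:
        """Return True and record the start if enough time has passed for key."""
        last = self._last_start.get(key)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_start[key] = now
        return True


throttle = StartThrottle(min_interval=5.0)
# Seconds-within-the-minute taken from the downstream log above.
for now in (57.275, 57.287, 57.297):
    print(now, throttle.may_start("tab_rep", now))
```

With a throttle like this, the second and third sync worker starts in the log (about 12 ms and 22 ms after the first) would have been refused, giving the upstream time to drop the temporary slot.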
    
    
    
    
  244. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-04T13:20:51Z

    On Fri, Apr 3, 2020 at 11:06 PM Andres Freund <andres@anarazel.de> wrote:
    > On 2020-04-03 20:48:09 -0400, Robert Haas wrote:
    > > 'serinus' is also failing. This is less obviously related:
    >
    > Hm. Tests passed once since then.
    
    Yeah, but conchuela also failed once in what I think was a similar
    way. I suspect the fix I pushed last night
    (3e0d80fd8d3dd4f999e0d3aa3e591f480d8ad1fd) may have been enough to
    clear this up.
    
    > That already seems suspicious. I checked the following (successful) run
    > and I did not see that in the stage's logs.
    
    Yeah, the behavior of the test case doesn't seem to be entirely deterministic.
    
    > I, again, have to say that the amount of stuff that was done as part of
    >
    > commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
    > Author: Peter Eisentraut <peter_e@gmx.net>
    > Date:   2017-03-23 08:36:36 -0400
    >
    >     Logical replication support for initial data copy
    >
    > is insane. Adding support for running sql over replication connections
    > and extending CREATE_REPLICATION_SLOT with new options (without even
    > mentioning that in the commit message!) as part of a commit described as
    > "Logical replication support for initial data copy" shouldn't happen.
    
    I agreed then and still do.
    
    > So I'm a bit confused here. The best approach is probably to try to
    > reproduce this by adding an artificial delay into backend shutdown.
    
    I was able to reproduce an assertion failure by starting a
    transaction, running a replication command that failed, and then
    exiting the backend. 3e0d80fd8d3dd4f999e0d3aa3e591f480d8ad1fd made
    that go away. I had wrongly assumed that there was no other way for a
    walsender to have a ResourceOwner, and in the face of SQL commands
    also being executed by walsenders, that's clearly not true. I'm not
    sure *precisely* how that led to the BF failures, but it was really
    clear that it was wrong.
    
    > > (I still really dislike the fact that we have this evil hack allowing
    > > one connection to mix and match those sets of commands...)
    >
    > FWIW, I think the opposite. We should get rid of the difference as much
    > as possible.
    
    Well, that's another approach. It's OK to have one system and it's OK
    to have two systems, but one and a half is not ideal.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  245. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-04T13:34:52Z

    On Fri, Apr 3, 2020 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > BTW, some of the buildfarm is showing a simpler portability problem:
    > they think you were too cavalier about the difference between time_t
    > and pg_time_t.  (On a platform with 32-bit time_t, that's an actual
    > bug, probably.)  lapwing is actually failing:
    >
    > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2020-04-03%2021%3A41%3A49
    >
    > ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -Werror -I. -I. -I../../../src/include  -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  -c -o basebackup.o basebackup.c
    > basebackup.c: In function 'AddFileToManifest':
    > basebackup.c:1199:10: error: passing argument 1 of 'pg_gmtime' from incompatible pointer type [-Werror]
    > In file included from ../../../src/include/access/xlog_internal.h:26:0,
    >                  from basebackup.c:20:
    > ../../../src/include/pgtime.h:49:22: note: expected 'const pg_time_t *' but argument is of type 'time_t *'
    > cc1: all warnings being treated as errors
    > make[3]: *** [basebackup.o] Error 1
    >
    > but some others are showing it as a warning.
    >
    > I suppose that judicious s/time_t/pg_time_t/ would fix this.
    
    I think you sent this email just after I pushed
    db1531cae00941bfe4f6321fdef1e1ef355b6bed, or maybe after I'd committed
    it locally and just before I pushed it. If you prefer a different fix
    than what I did there, I can certainly whack it around some more.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
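
    A minimal sketch of the kind of type mismatch behind the compiler
    error quoted above. It is not the committed fix. It assumes only
    that pg_time_t is a 64-bit integer; demo_pg_time_t and
    demo_seconds_of_day are hypothetical stand-ins for pg_time_t and a
    pg_gmtime-style function that takes a pointer to the 64-bit type.
    Copying the time_t value into a variable of the wider type before
    taking its address avoids reading 8 bytes through a pointer to a
    4-byte object on platforms with a 32-bit time_t.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for pg_time_t (assumed to be a 64-bit integer). */
typedef int64_t demo_pg_time_t;

/*
 * Hypothetical stand-in for a pg_gmtime-style function: it takes a
 * pointer to the 64-bit type, never a pointer to plain time_t.
 */
static int64_t
demo_seconds_of_day(const demo_pg_time_t *t)
{
    return *t % 86400;
}

/*
 * Convert by value first, then pass the address of the converted copy.
 * Passing (demo_pg_time_t *) &mtime directly would be wrong wherever
 * time_t is narrower than 64 bits.
 */
static int64_t
demo_seconds_from_time_t(time_t mtime)
{
    demo_pg_time_t t = (demo_pg_time_t) mtime;

    assert(sizeof(demo_pg_time_t) == 8);
    return demo_seconds_of_day(&t);
}
```

    The conversion costs one copy and behaves the same whether time_t
    is 32 or 64 bits wide.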
    
    
    
    
  246. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-04T13:36:08Z

    On Fri, Apr 3, 2020 at 10:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
    > I think I've done about as much as I can do for tonight, though. Most
    > things are green now, and the ones that aren't are failing because of
    > stuff that is at least plausibly fixed. By morning it should be
    > clearer how much broken stuff is left, although that will be somewhat
    > complicated by at least sidewinder and seawasp needing manual
    > intervention to get back on track.
    
    Taking stock of the situation this morning, most of the buildfarm is
    now green. There are three failures, on eelpout (6 hours ago),
    fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).
    
    eelpout is unhappy because:
    
    +WARNING:  could not remove shared memory segment "/PostgreSQL.248989127": No such file or directory
    +WARNING:  could not remove shared memory segment "/PostgreSQL.1450751626": No such file or directory
      multibatch
     ------------
      f
    @@ -861,22 +863,15 @@
    
     select length(max(s.t))
     from wide left join (select id, coalesce(t, '') || '' as t from wide) s using (id);
    - length
    ---------
    - 320000
    -(1 row)
    -
    +ERROR:  could not open shared memory segment "/PostgreSQL.605707657": No such file or directory
    +CONTEXT:  parallel worker
    
    I'm not sure what caused that exactly, but it sorta looks like
    operator intervention. Thomas, any ideas?
    
    fairywren's last run was on 21dc488, and commit
    460314db08e8688e1a54a0a26657941e058e45c5 was an attempt to fix what
    broke there. I guess we'll find out whether that worked the next time
    it runs.
    
    hyrax's last run was before any of this happened, so it seems to have
    an unrelated problem. The last two runs, three and six days ago, both
    failed like this:
    
    -ERROR:  stack depth limit exceeded
    +ERROR:  stack depth limit exceeded at character 8
    
    Not sure what that's about.
    
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  247. Re: backup manifests

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-04T14:43:51Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > On Fri, Apr 3, 2020 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> I suppose that judicious s/time_t/pg_time_t/ would fix this.
    
    > I think you sent this email just after I pushed
    > db1531cae00941bfe4f6321fdef1e1ef355b6bed, or maybe after I'd committed
    > it locally and just before I pushed it. If you prefer a different fix
    > than what I did there, I can certainly whack it around some more.
    
    Yeah, that commit showed up moments after I sent this.  Your fix
    seems fine -- at least prairiedog and gaur are OK with it.
    (I did verify that gaur was reproducibly crashing at that new
    pg_strftime call, so we know it was that and not some on-again-
    off-again issue.)
    
    			regards, tom lane
    
    
    
    
  248. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-04T14:57:32Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > hyrax's last run was before any of this happened, so it seems to have
    > an unrelated problem. The last two runs, three and six days ago, both
    > failed like this:
    
    > -ERROR:  stack depth limit exceeded
    > +ERROR:  stack depth limit exceeded at character 8
    
    > Not sure what that's about.
    
    What it looks like is that hyrax is managing to detect stack overflow
    at a point where an errcontext callback is active that adds an error
    cursor to the failure.
    
    It's not so surprising that we could get a different result that way
    from a CLOBBER_CACHE_ALWAYS animal like hyrax, since CCA-forced
    cache reloads would cause extra stack expenditure at a lot of places.
    And it could vary depending on totally random details, like the number
    of local variables in seemingly unrelated code.  What is odd is that
    (AFAIR) we've never seen this before.  Maybe somebody recently added
    an error cursor callback in a place that didn't have it before, and
    is involved in SQL-function processing?  None of the commits leading
    up to the earlier failure look promising for that, though.
    
    			regards, tom lane
    
    
    
    
  249. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-04T17:05:22Z

    On Sat, Apr 4, 2020 at 10:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > It's not so surprising that we could get a different result that way
    > from a CLOBBER_CACHE_ALWAYS animal like hyrax, since CCA-forced
    > cache reloads would cause extra stack expenditure at a lot of places.
    > And it could vary depending on totally random details, like the number
    > of local variables in seemingly unrelated code.
    
    Oh, yeah. That's unfortunate.
    
    > What is odd is that
    > (AFAIR) we've never seen this before.  Maybe somebody recently added
    > an error cursor callback in a place that didn't have it before, and
    > is involved in SQL-function processing?  None of the commits leading
    > up to the earlier failure look promising for that, though.
    
    The relevant range of commits (e8b1774fc2 to a7b9d24e4e) includes an
    ereport change (bda6dedbea) and a couple of "simple expression"
    changes (8f59f6b9c0, fbc7a71608) but I don't know exactly why they
    would have caused this. It seems at least possible, though, that
    changing the return type of functions involved in error reporting
    would slightly change the amount of stack space used; and the others
    are related to SQL-function processing. Other than experimenting on
    that machine, I'm not sure how we could really determine the relevant
    factors here.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  250. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-04T18:36:26Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > On Sat, Apr 4, 2020 at 10:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> What is odd is that
    >> (AFAIR) we've never seen this before.  Maybe somebody recently added
    >> an error cursor callback in a place that didn't have it before, and
    >> is involved in SQL-function processing?  None of the commits leading
    >> up to the earlier failure look promising for that, though.
    
    > The relevant range of commits (e8b1774fc2 to a7b9d24e4e) includes an
    > ereport change (bda6dedbea) and a couple of "simple expression"
    > changes (8f59f6b9c0, fbc7a71608) but I don't know exactly why they
    > would have caused this.
    
    When I first noticed hyrax's failure, some days ago, I immediately
    thought of the "simple expression" patch.  But that should not have
    affected SQL-function processing in any way: the bulk of the changes
    were in plpgsql, and even the changes in plancache could not be
    relevant, because functions.c does not use the plancache.
    
    As for ereport, you'd think that that would only matter once you were
    already doing an ereport.  The point at which the stack overflow
    check triggers should be in normal code, not error recovery.
    
    > It seems at least possible, though, that
    > changing the return type of functions involved in error reporting
    > would slightly change the amount of stack space used;
    
    Right, but if it's down to that sort of phase-of-the-moon codegen
    difference, you'd think this failure would have been coming and
    going for years.  I still suppose that some fairly recent change
    must be contributing to this, but haven't had time to investigate.
    
    > Other than experimenting on
    > that machine, I'm not sure how we could really determine the relevant
    > factors here.
    
    We don't have a lot of CCA buildfarm machines, so I'm suspecting that
    it's probably not that hard to repro if you build with CCA.
    
    			regards, tom lane
    
    
    
    
  251. Re: backup manifests and contemporaneous buildfarm failures

    Thomas Munro <thomas.munro@gmail.com> — 2020-04-04T21:54:11Z

    On Sun, Apr 5, 2020 at 2:36 AM Robert Haas <robertmhaas@gmail.com> wrote:
    > eelpout is unhappy because:
    >
    > +WARNING:  could not remove shared memory segment "/PostgreSQL.248989127": No such file or directory
    > +WARNING:  could not remove shared memory segment "/PostgreSQL.1450751626": No such file or directory
    
    Seems to have fixed itself while I was sleeping. I did happen to run
    apt-get upgrade on that box some time yesterday-ish, but I don't
    understand what mechanism would trash my /dev/shm in that process.
    /me eyes systemd with suspicion
    
    
    
    
  252. Re: backup manifests and contemporaneous buildfarm failures

    Mikael Kjellström <mikael.kjellstrom@mksoft.nu> — 2020-04-05T13:10:15Z

    On 2020-04-04 04:43, Robert Haas wrote:
    
    > I think I've done about as much as I can do for tonight, though. Most
    > things are green now, and the ones that aren't are failing because of
    > stuff that is at least plausibly fixed. By morning it should be
    > clearer how much broken stuff is left, although that will be somewhat
    > complicated by at least sidewinder and seawasp needing manual
    > intervention to get back on track.
    
    I fixed sidewinder, I think.  Should clear up the next time it runs.
    
    It was the mode on the directory that it couldn't handle.  A regular
    rm -rf didn't work; I had to do a chmod -R 700 on all directories to
    be able to manually remove it.
    
    /Mikael
    
    
    
    
  253. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-04-05T19:31:18Z

    Hi,
    
    On 2020-04-03 15:22:23 -0400, Robert Haas wrote:
    > I've pushed all the patches.
    
    Seeing new warnings in an optimized build
    
    /home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c: In function 'json_manifest_object_end':
    /home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c:591:2: warning: 'end_lsn' may be used uninitialized in this function [-Wmaybe-uninitialized]
      591 |  context->perwalrange_cb(context, tli, start_lsn, end_lsn);
          |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c:567:5: note: 'end_lsn' was declared here
      567 |     end_lsn;
          |     ^~~~~~~
    /home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c:591:2: warning: 'start_lsn' may be used uninitialized in this function [-Wmaybe-uninitialized]
      591 |  context->perwalrange_cb(context, tli, start_lsn, end_lsn);
          |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /home/andres/src/postgresql-master/src/bin/pg_validatebackup/parse_manifest.c:566:13: note: 'start_lsn' was declared here
      566 |  XLogRecPtr start_lsn,
          |             ^~~~~~~~~
    
    The warnings don't seem too unreasonable. The compiler can't see that
    the error_cb inside json_manifest_parse_failure() is not expected to
    return. Probably worth adding a wrapper around the calls to
    context->error_cb and mark that as noreturn.
    
    - Andres
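
    A minimal sketch of the wrapper idea suggested above, under stated
    assumptions. All demo_* names are hypothetical and this is not the
    parse_manifest.c code. The error callback is an ordinary function
    pointer, so the compiler cannot know it never returns. Routing
    every call through one wrapper that is marked noreturn gives the
    compiler that information, and it can then see that the LSN
    variables are always set before use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro; GCC and Clang accept this attribute spelling. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_NORETURN __attribute__((noreturn))
#else
#define DEMO_NORETURN
#endif

typedef void (*demo_error_cb) (const char *msg);

typedef struct demo_context
{
    demo_error_cb error_cb;     /* expected not to return */
} demo_context;

static void demo_parse_failure(demo_context *ctx, const char *msg) DEMO_NORETURN;

/* Single place that invokes the callback; marked noreturn. */
static void
demo_parse_failure(demo_context *ctx, const char *msg)
{
    ctx->error_cb(msg);
    abort();                    /* backstop if the callback does return */
}

/* Example callback: report the problem and exit. */
static void
demo_exit_on_error(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
    exit(1);
}

/* Parse two hexadecimal fields and return end minus start. */
static uint64_t
demo_range_length(demo_context *ctx, const char *start_field,
                  const char *end_field)
{
    unsigned long long start_lsn;
    unsigned long long end_lsn;

    assert(ctx != NULL && ctx->error_cb != NULL);
    if (sscanf(start_field, "%llx", &start_lsn) != 1)
        demo_parse_failure(ctx, "could not parse start LSN");
    if (sscanf(end_field, "%llx", &end_lsn) != 1)
        demo_parse_failure(ctx, "could not parse end LSN");

    /* Reached only when both values were parsed successfully. */
    return (uint64_t) (end_lsn - start_lsn);
}
```

    The abort() call keeps the noreturn promise true even if a
    callback misbehaves and returns.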
    
    
    
    
  254. Re: backup manifests and contemporaneous buildfarm failures

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-05T20:06:58Z

    On 4/5/20 9:10 AM, Mikael Kjellström wrote:
    > On 2020-04-04 04:43, Robert Haas wrote:
    >
    >> I think I've done about as much as I can do for tonight, though. Most
    >> things are green now, and the ones that aren't are failing because of
    >> stuff that is at least plausibly fixed. By morning it should be
    >> clearer how much broken stuff is left, although that will be somewhat
    >> complicated by at least sidewinder and seawasp needing manual
    >> intervention to get back on track.
    >
    > I fixed sidewinder I think.  Should clear up the next time it runs.
    >
    > It was the mode on the directory it couldn't handle-  A regular rm -rf
    > didn't work I had to do a chmod -R 700 on all directories to be able
    > to manually remove it.
    >
    >
    
    
    Hmm, the buildfarm client does this at the beginning of each run to
    remove anything that might be left over from a previous run:
    
    
        rmtree("inst");
        rmtree("$pgsql") unless ($from_source && !$use_vpath);
    
    
    Do I need to precede those with some recursive chmod commands? Perhaps
    the client should refuse to run if there is still something left after
    these.
    
    
    cheers
    
    
    andrew
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
    
  255. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-05T20:12:26Z

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
    > Hmm, the buildfarm client does this at the beginning of each run to
    > remove anything that might be left over from a previous run:
    
    >     rmtree("inst");
    >     rmtree("$pgsql") unless ($from_source && !$use_vpath);
    
    Right, the point is precisely that some versions of rmtree() fail
    to remove a mode-0 subdirectory.
    
    > Do I need to precede those with some recursive chmod commands? Perhaps
    > the client should refuse to run if there is still something left after
    > these.
    
    I think the latter would be a very good idea, just so that this sort of
    failure is less obscure.  Not sure about whether a recursive chmod is
    really going to be worth the cycles.  (On the other hand, the normal
    case should be that there's nothing there anyway, so maybe it's not
    going to be costly.)  
    
    			regards, tom lane
    
    
    
    
  256. Re: backup manifests and contemporaneous buildfarm failures

    Fabien COELHO <coelho@cri.ensmp.fr> — 2020-04-06T05:18:10Z

    Hello,
    
    >> Do I need to precede those with some recursive chmod commands? Perhaps
    >> the client should refuse to run if there is still something left after
    >> these.
    >
    > I think the latter would be a very good idea, just so that this sort of
    > failure is less obscure.  Not sure about whether a recursive chmod is
    > really going to be worth the cycles.  (On the other hand, the normal
    > case should be that there's nothing there anyway, so maybe it's not
    > going to be costly.)
    
    Could it be a two-stage process to minimize cost but still be resilient?
    
       rmtree
       if (-d $DIR) {
         emit warning
         chmodtree
         rmtree again
         if (-d $DIR)
           emit error
       }
    
    -- 
    Fabien.
    
    
    
    
  257. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-06T11:53:07Z

    On Sun, Apr 5, 2020 at 4:07 PM Andrew Dunstan
    <andrew.dunstan@2ndquadrant.com> wrote:
    > Do I need to precede those with some recursive chmod commands?
    
    +1.
    
    > Perhaps
    > the client should refuse to run if there is still something left after
    > these.
    
    +1 to that, too.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  258. Re: backup manifests and contemporaneous buildfarm failures

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-06T20:06:52Z

    On 4/6/20 7:53 AM, Robert Haas wrote:
    > On Sun, Apr 5, 2020 at 4:07 PM Andrew Dunstan
    > <andrew.dunstan@2ndquadrant.com> wrote:
    >> Do I need to precede those with some recursive chmod commands?
    > +1.
    >
    >> Perhaps
    >> the client should refuse to run if there is still something left after
    >> these.
    > +1 to that, too.
    >
    
    
    See
    https://github.com/PGBuildFarm/client-code/commit/0ef76bb1e2629713898631b9a3380d02d41c60ad
    
    
    This will be in the next release, probably fairly soon.
    
    
    cheers
    
    
    andrew
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
    
  259. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-07T04:37:24Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > Taking stock of the situation this morning, most of the buildfarm is
    > now green. There are three failures, on eelpout (6 hours ago),
    > fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).
    
    fairywren has now done this twice in the pg_validatebackupCheck step:
    
    exec failed: Bad address at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
     at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
    
    I'm a tad suspicious that it needs another perl2host()
    somewhere, but the log isn't very clear as to where.
    
    More generally, I wonder if we ought to be trying to
    centralize those perl2host() calls instead of sticking
    them into individual test cases.
    
    			regards, tom lane
    
    
    
    
  260. Re: backup manifests and contemporaneous buildfarm failures

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-07T13:11:02Z

    On Mon, Apr 6, 2020 at 1:18 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
    >
    >
    > Hello,
    >
    > >> Do I need to precede those with some recursive chmod commands? Perhaps
    > >> the client should refuse to run if there is still something left after
    > >> these.
    > >
    > > I think the latter would be a very good idea, just so that this sort of
    > > failure is less obscure.  Not sure about whether a recursive chmod is
    > > really going to be worth the cycles.  (On the other hand, the normal
    > > case should be that there's nothing there anyway, so maybe it's not
    > > going to be costly.)
    >
    > Could it be a two-stage process to minimize cost but still be resilient?
    >
    >    rmtree
    >    if (-d $DIR) {
    >      emit warning
    >      chmodtree
    >      rmtree again
    >      if (-d $DIR)
    >        emit error
    >    }
    >
    
    
    I thought about doing that. However, it's not really necessary. In the
    normal course of events these directories should have been removed at
    the end of the previous run, so we're only dealing with exceptional
    cases here.
    
    cheers
    
    andrew
    
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  261. Re: backup manifests and contemporaneous buildfarm failures

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-07T13:42:09Z

    On Tue, Apr 7, 2020 at 12:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    > Robert Haas <robertmhaas@gmail.com> writes:
    > > Taking stock of the situation this morning, most of the buildfarm is
    > > now green. There are three failures, on eelpout (6 hours ago),
    > > fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).
    >
    > fairywren has now done this twice in the pg_validatebackupCheck step:
    >
    > exec failed: Bad address at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
    >  at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
    >
    > I'm a tad suspicious that it needs another perl2host()
    > somewhere, but the log isn't very clear as to where.
    >
    > More generally, I wonder if we ought to be trying to
    > centralize those perl2host() calls instead of sticking
    > them into individual test cases.
    >
    >
    
    
    Not sure about that. I'll see if I can run it by hand and get some
    more info. What's quite odd is that jacana (a very similar setup) is
    passing this happily.
    
    cheers
    
    andrew
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  262. Re: backup manifests

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-04-08T05:15:38Z

    
    On 2020/04/04 4:22, Robert Haas wrote:
    > On Thu, Apr 2, 2020 at 4:34 PM David Steele <david@pgmasters.net> wrote:
    >> +1. These would be great tests to have and a win for pg_basebackup
    >> overall but I don't think they should be a prerequisite for this commit.
    > 
    > Not to mention the server. I can't say that I have a lot of confidence
    > that all of the server behavior in this area is well-understood and
    > sane.
    > 
    > I've pushed all the patches.
    
    When there is a backup_manifest in the database cluster, it's included in
    the backup even when --no-manifest is specified. ISTM that this is problematic
    because the backup_manifest is obviously not valid for the backup.
    So, isn't it better to always exclude the *existing* backup_manifest in the
    cluster from the backup, like backup_label/tablespace_map? Patch attached.
    
    Also, I found a typo in the documentation. Patch attached.
    
    Regards,
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
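
    A minimal sketch of the exclusion proposed above, assuming a simple
    name-based skip list. The array and function names are
    hypothetical; the attached patch itself is not reproduced here. The
    idea is that a backup_manifest already present in the data
    directory is skipped unconditionally, the same way backup_label and
    tablespace_map are, so a stale manifest never ends up in a new
    backup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical skip list: files never copied from the data directory. */
static const char *const demo_exclude_files[] = {
    "backup_label",
    "tablespace_map",
    "backup_manifest",          /* the proposed addition */
    NULL
};

/* Return true if a file with this name should be left out of the backup. */
static bool
demo_is_excluded(const char *filename)
{
    assert(filename != NULL);
    for (size_t i = 0; demo_exclude_files[i] != NULL; i++)
    {
        if (strcmp(filename, demo_exclude_files[i]) == 0)
            return true;
    }
    return false;
}
```

    Because the check does not depend on any option, the file would be
    skipped whether or not --no-manifest is given.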
    
  263. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-08T17:35:52Z

    On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
    > When there is a backup_manifest in the database cluster, it's included in
    > the backup even when --no-manifest is specified. ISTM that this is problematic
    > because the backup_manifest is obviously not valid for the backup.
    > So, isn't it better to always exclude the *existing* backup_manifest in the
    > cluster from the backup, like backup_label/tablespace_map? Patch attached.
    >
    > Also I found the typo in the document. Patch attached.
    
    Both patches look good. The second one is definitely a mistake on my
    part, and the first one seems like a totally reasonable change.
    Thanks!
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  264. Re: backup manifests and contemporaneous buildfarm failures

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-08T17:45:39Z

    On 4/7/20 9:42 AM, Andrew Dunstan wrote:
    > On Tue, Apr 7, 2020 at 12:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> Robert Haas <robertmhaas@gmail.com> writes:
    >>> Taking stock of the situation this morning, most of the buildfarm is
    >>> now green. There are three failures, on eelpout (6 hours ago),
    >>> fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).
    >> fairywren has now done this twice in the pg_validatebackupCheck step:
    >>
    >> exec failed: Bad address at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
    >>  at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 340.
    >>
    >> I'm a tad suspicious that it needs another perl2host()
    >> somewhere, but the log isn't very clear as to where.
    >>
    >> More generally, I wonder if we ought to be trying to
    >> centralize those perl2host() calls instead of sticking
    >> them into individual test cases.
    >>
    >>
    >
    > Not sure about that. I'll see if I can run it by hand and get some
    > more info. What's quite odd is that jacana (a very similar setup) is
    > passing this happily.
    >
    
    
    OK, tricky, but here's what I did to get this working on fairywren.
    
    
    First, on Msys2 there is a problem with name mangling. We've had to fix
    this before by telling it to ignore certain argument prefixes.
    
    
    Second, once that was fixed rmdir was failing on the tablespace. On
    Windows this is a junction, so unlink is the correct thing to do, I
    believe, just as it is on Unix where it's a symlink.
    
    
    cheers
    
    
    andrew
    
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
  265. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-08T17:59:38Z

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
    > OK, tricky, but here's what I did to get this working on fairywren.
    > First, on Msys2 there is a problem with name mangling. We've had to fix
    > this before by telling it to ignore certain argument prefixes.
    > Second, once that was fixed rmdir was failing on the tablespace. On
    > Windows this is a junction, so unlink is the correct thing to do, I
    > believe, just as it is on Unix where it's a symlink.
    
    Hmm, no opinion about the name mangling business, but the other part
    seems like it might break jacana and/or bowerbird, which are currently
    happy with this test?  (AFAICS we only have four Windows animals
    running the TAP tests, and the fourth (drongo) hasn't reported in
    for a while.)
    
    I guess we could commit it and find out.  I'm all for the simpler
    coding if it works.
    
    			regards, tom lane
    
    
    
    
  266. Re: backup manifests and contemporaneous buildfarm failures

    Robert Haas <robertmhaas@gmail.com> — 2020-04-08T19:41:30Z

    On Wed, Apr 8, 2020 at 1:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > I guess we could commit it and find out.  I'm all for the simpler
    > coding if it works.
    
    I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} =
    $source_ts_prefix does, but the remove/unlink condition was suggested
    by Amit Kapila on the basis of testing on his Windows development
    environment, so I suspect that's actually needed on at least some
    systems. I just work here, though.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  267. Re: backup manifests and contemporaneous buildfarm failures

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-08T19:48:40Z

    On 4/8/20 3:41 PM, Robert Haas wrote:
    > On Wed, Apr 8, 2020 at 1:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >> I guess we could commit it and find out.  I'm all for the simpler
    >> coding if it works.
    > I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} =
    > $source_ts_prefix does, 
    
    
    You don't want to know ....
    
    
    See <https://www.msys2.org/wiki/Porting/#filesystem-namespaces> for the
    gory details.
    
    
    It's the tablespace map parameter that is upsetting it.
    
    
    
    > but the remove/unlink condition was suggested
    > by Amit Kapila on the basis of testing on his Windows development
    > environment, so I suspect that's actually needed on at least some
    > systems. I just work here, though.
    >
    
    Yeah, drongo doesn't like it, so we'll have to tweak the logic.
    
    
    I'll update after some more testing.
    
    
    cheers
    
    
    andrew
    
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
    
  268. Re: backup manifests and contemporaneous buildfarm failures

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-04-08T20:30:22Z

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
    > On 4/8/20 3:41 PM, Robert Haas wrote:
    >> I don't understand what the local $ENV{MSYS2_ARG_CONV_EXCL} =
    >> $source_ts_prefix does, 
    
    > You don't want to know ....
    > See <https://www.msys2.org/wiki/Porting/#filesystem-namespaces> for the
    > gory details.
    
    I don't want to know either, but maybe that reference should be cited
    somewhere near where we use this sort of hack.
    
    			regards, tom lane
    
    
    
    
  269. Re: backup manifests

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-04-09T14:06:56Z

    
    On 2020/04/09 2:35, Robert Haas wrote:
    > On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
    >> When there is a backup_manifest in the database cluster, it's included in
    >> the backup even when --no-manifest is specified. ISTM that this is problematic
    >> because the backup_manifest is obviously not valid for the backup.
    >> So, isn't it better to always exclude the *existing* backup_manifest in the
    >> cluster from the backup, like backup_label/tablespace_map? Patch attached.
    >>
    >> Also I found the typo in the document. Patch attached.
    > 
    > Both patches look good. The second one is definitely a mistake on my
    > part, and the first one seems like a totally reasonable change.
    > Thanks!
    
    Thanks for reviewing them! I pushed them.
    
    Please note that the commit messages have not been delivered to
    pgsql-committers yet.
    
    Regards,
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
    
    
    
    
  270. Re: backup manifests

    Stephen Frost <sfrost@snowman.net> — 2020-04-09T14:10:55Z

    Greetings,
    
    * Fujii Masao (masao.fujii@oss.nttdata.com) wrote:
    > On 2020/04/09 2:35, Robert Haas wrote:
    > >On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
    > >>When there is a backup_manifest in the database cluster, it's included in
    > >>the backup even when --no-manifest is specified. ISTM that this is problematic
    > >>because the backup_manifest is obviously not valid for the backup.
    > >>So, isn't it better to always exclude the *existing* backup_manifest in the
    > >>cluster from the backup, like backup_label/tablespace_map? Patch attached.
    > >>
    > >>Also I found the typo in the document. Patch attached.
    > >
    > >Both patches look good. The second one is definitely a mistake on my
    > >part, and the first one seems like a totally reasonable change.
    > >Thanks!
    > 
    > Thanks for reviewing them! I pushed them.
    > 
    > Please note that the commit messages have not been delivered to
    > pgsql-committers yet.
    
    They've been released and your address whitelisted.
    
    Thanks,
    
    Stephen
    
  271. Re: backup manifests

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-04-09T14:11:58Z

    
    On 2020/04/09 23:10, Stephen Frost wrote:
    > Greetings,
    > 
    > * Fujii Masao (masao.fujii@oss.nttdata.com) wrote:
    >> On 2020/04/09 2:35, Robert Haas wrote:
    >>> On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
    >>>> When there is a backup_manifest in the database cluster, it's included in
    >>>> the backup even when --no-manifest is specified. ISTM that this is problematic
    >>>> because the backup_manifest is obviously not valid for the backup.
    >>>> So, isn't it better to always exclude the *existing* backup_manifest in the
    >>>> cluster from the backup, like backup_label/tablespace_map? Patch attached.
    >>>>
    >>>> Also I found the typo in the document. Patch attached.
    >>>
    >>> Both patches look good. The second one is definitely a mistake on my
    >>> part, and the first one seems like a totally reasonable change.
    >>> Thanks!
    >>
    >> Thanks for reviewing them! I pushed them.
    >>
    >> Please note that the commit messages have not been delivered to
    >> pgsql-committers yet.
    > 
    > They've been released and your address whitelisted.
    
    Many thanks!!
    
    Regards,
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
    
    
    
    
  272. Re: backup manifests

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-04-13T02:09:34Z

    
    On 2020/04/09 23:06, Fujii Masao wrote:
    > 
    > 
    > On 2020/04/09 2:35, Robert Haas wrote:
    >> On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
    >>> When there is a backup_manifest in the database cluster, it's included in
    >>> the backup even when --no-manifest is specified. ISTM that this is problematic
    >>> because the backup_manifest is obviously not valid for the backup.
    >>> So, isn't it better to always exclude the *existing* backup_manifest in the
    >>> cluster from the backup, like backup_label/tablespace_map? Patch attached.
    >>>
    >>> Also I found the typo in the document. Patch attached.
    >>
    >> Both patches look good. The second one is definitely a mistake on my
    >> part, and the first one seems like a totally reasonable change.
    >> Thanks!
    > 
    > Thanks for reviewing them! I pushed them.
    
    I found other minor issues.
    
    +          When this option is specified with a value of <literal>yes</literal>
    +          or <literal>force-escape</literal>, a backup manifest is created
    
    force-escape should be force-encode.
    Patch attached.
    
    -	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
    +	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPm:",
    
    "m:" seems unnecessary, so should be removed?
    Patch attached.
    
    +	if (strcmp(basedir, "-") == 0)
    +	{
    +		char		header[512];
    +		PQExpBufferData	buf;
    +
    +		initPQExpBuffer(&buf);
    +		ReceiveBackupManifestInMemory(conn, &buf);
    
    backup_manifest should be received only when the manifest is enabled,
    so ISTM that the flag "manifest" should be checked in the above if-condition.
    Thoughts? Patch attached.
    
    Regards,
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
    
  273. Re: backup manifests

    Michael Paquier <michael@paquier.xyz> — 2020-04-13T03:25:01Z

    On Mon, Apr 13, 2020 at 11:09:34AM +0900, Fujii Masao wrote:
    > -	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
    > +	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvPm:",
    > 
    > "m:" seems unnecessary, so should be removed?
    > Patch attached.
    
    Smells like some remnant diff from a previous version.
    
    > +	if (strcmp(basedir, "-") == 0)
    > +	{
    > +		char		header[512];
    > +		PQExpBufferData	buf;
    > +
    > +		initPQExpBuffer(&buf);
    > +		ReceiveBackupManifestInMemory(conn, &buf);
    > 
    > backup_manifest should be received only when the manifest is enabled,
    > so ISTM that the flag "manifest" should be checked in the above if-condition.
    > Thoughts? Patch attached.
    >
    > -	if (strcmp(basedir, "-") == 0)
    > +	if (strcmp(basedir, "-") == 0 && manifest)
    >  	{
    >  		char		header[512];
    >  		PQExpBufferData	buf;
    
    Indeed.  Using the tar format with --no-manifest causes a failure:
    pg_basebackup -D - --format=t --wal-method=none \
        --no-manifest > /dev/null
    
    The doc changes look right to me.  Nice catches.
    --
    Michael
    
  274. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-13T15:15:53Z

    On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao
    <masao.fujii@oss.nttdata.com> wrote:
    > I found other minor issues.
    
    I think these are all correct fixes. Thanks for the post-commit
    review, and sorry for these mistakes.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  275. documenting the backup manifest file format

    Robert Haas <robertmhaas@gmail.com> — 2020-04-13T17:40:56Z

    On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
    > I don't like having a file format that's intended to be used by external
    > tools too that's undocumented except for code that assembles it in a
    > piecemeal fashion.  Do you mean in a follow-on patch this release, or
    > later? I don't have a problem with the former.
    
    Here is a patch for that.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  276. Re: documenting the backup manifest file format

    Justin Pryzby <pryzby@telsasoft.com> — 2020-04-13T17:55:53Z

    On Mon, Apr 13, 2020 at 01:40:56PM -0400, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
    > > I don't like having a file format that's intended to be used by external
    > > tools too that's undocumented except for code that assembles it in a
    > > piecemeal fashion.  Do you mean in a follow-on patch this release, or
    > > later? I don't have a problem with the former.
    > 
    > Here is a patch for that.
    
    typos:
    manifes
    hexademical (twice)
    
    -- 
    Justin
    
    
    
    
  277. Re: documenting the backup manifest file format

    Robert Haas <robertmhaas@gmail.com> — 2020-04-13T18:08:59Z

    On Mon, Apr 13, 2020 at 1:55 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
    > typos:
    > manifes
    > hexademical (twice)
    
    Thanks. v2 attached.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  278. Re: documenting the backup manifest file format

    Erik Rijkers <er@xs4all.nl> — 2020-04-13T18:28:31Z

    On 2020-04-13 20:08, Robert Haas wrote:
    > [v2-0001-Document-the-backup-manifest-file-format.patch]
    
    Can you double check this sentence?  Seems strange to me but I don't 
    know why; it may well be that my English is not good enough.  Maybe a 
    comma after 'required' makes reading easier?
    
        The timeline from which this range of WAL records will be required in
        order to make use of this backup. The value is an integer.
    
    
    One typo:
    
    'when making using'  should be
    'when making use'
    
    
    
    Erik Rijkers
    
    
    
    
    
  279. Re: documenting the backup manifest file format

    Alvaro Herrera <alvherre@2ndquadrant.com> — 2020-04-13T19:34:49Z

    +      The LSN at which replay must begin on the indicated timeline in order to
    +      make use of this backup.  The LSN is stored in the format normally used
    +      by <productname>PostgreSQL</productname>; that is, it is a string
    +      consisting of two strings of hexademical characters, each with a length
    +      of between 1 and 8, separated by a slash.
    
    typo "hexademical"
    
    Are these hex figures upper or lower case?  No leading zeroes?  This
    would normally not matter, but the toplevel checksum will care.  Also, I
    see no mention of prettification-chars such as newlines or indentation.
    I suppose if I pass a manifest file through prettification (or Windows
    newline conversion), the checksum may break.
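
For readers who want to consume the LSN fields discussed here, the following is a minimal sketch of a parser for the quoted format: two strings of hexadecimal characters, each 1 to 8 long, separated by a slash. The names `parse_lsn` and `format_lsn` are hypothetical. The parser accepts either letter case and any leading zeroes within the 8-character limit; the formatter assumes upper-case digits without zero padding, which is an assumption about the generator and not something the quoted text specifies.

```python
import re

# Two hex strings of 1..8 characters separated by a slash, per the quoted text.
_LSN_RE = re.compile(r"([0-9A-Fa-f]{1,8})/([0-9A-Fa-f]{1,8})")


def parse_lsn(text):
    """Convert an LSN string such as '0/2000028' to a 64-bit integer."""
    match = _LSN_RE.fullmatch(text)
    if match is None:
        raise ValueError("invalid LSN: %r" % (text,))
    high, low = (int(part, 16) for part in match.groups())
    return (high << 32) | low


def format_lsn(value):
    """Render a 64-bit integer as an LSN string (assumed style: upper case,
    no zero padding)."""
    return "%X/%X" % (value >> 32, value & 0xFFFFFFFF)
```

Comparing parsed integers rather than raw strings sidesteps the letter-case and leading-zero questions for anyone reading a manifest, although the bytes on disk still have to stay untouched for the manifest checksum to match.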
    
    As for Last-Modification, I think the spec should indicate the exact
    format that's used, because it'll also be critical for checksumming.
    
    Why is the top-level checksum only allowed to be SHA-256, if the files
    can use up to SHA-512?  (Also, did we intentionally omit the dash in
    hash names, so "SHA-256" to make it SHA256?  This will also be critical
    for checksumming the manifest itself.)
    
    -- 
    Álvaro Herrera                https://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  280. Re: documenting the backup manifest file format

    Robert Haas <robertmhaas@gmail.com> — 2020-04-13T19:51:14Z

    On Mon, Apr 13, 2020 at 2:28 PM Erik Rijkers <er@xs4all.nl> wrote:
    > Can you double check this sentence?  Seems strange to me but I don't
    > know why; it may well be that my English is not good enough.  Maybe a
    > comma after 'required' makes reading easier?
    >
    >     The timeline from which this range of WAL records will be required in
    >     order to make use of this backup. The value is an integer.
    
    It sounds a little awkward to me, but not outright wrong. I'm not
    exactly sure how to rephrase it, though. Maybe just shorten it to "the
    timeline for this range of WAL records"?
    
    > One typo:
    >
    > 'when making using'  should be
    > 'when making use'
    
    Right, thanks, fixed in my local copy.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  281. Re: documenting the backup manifest file format

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-13T20:10:20Z

    On 4/13/20 1:40 PM, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
    >> I don't like having a file format that's intended to be used by external
    >> tools too that's undocumented except for code that assembles it in a
    >> piecemeal fashion.  Do you mean in a follow-on patch this release, or
    >> later? I don't have a problem with the former.
    > Here is a patch for that.
    >
    
    
    Seems ok. A tiny example, or an excerpt, might be nice.
    
    
    cheers
    
    
    andrew
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
    
  282. Re: documenting the backup manifest file format

    Robert Haas <robertmhaas@gmail.com> — 2020-04-13T20:14:38Z

    On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
    > Are these hex figures upper or lower case?  No leading zeroes?  This
    > would normally not matter, but the toplevel checksum will care.
    
    Not really. You just feed the whole file except for the last line
    through shasum and you get the answer.
    
    It so happens that the server generates lower-case, but
    pg_verifybackup will accept either.
    
    Leading zeroes are not omitted. If the checksum's not the right
    length, it ain't gonna work. If SHA is used, it's the same output you
    would get from running shasum -a<whatever> on the file, which is
    certainly a fixed length. I assumed that this followed from the
    statement that there are two characters per byte in the checksum, and
    from the fact that no checksum algorithm I know about drops leading
    zeroes in the output.
    
    > Also, I
    > see no mention of prettification-chars such as newlines or indentation.
    > I suppose if I pass a manifest file through prettification (or Windows
    > newline conversion), the checksum may break.
    
    It would indeed break. I'm not sure what you want me to say here,
    though. If you're trying to parse a manifest, you shouldn't care about
    how the whitespace is arranged. If you're trying to generate one, you
    can arrange it any way you like, as long as you also include it in the
    checksum.
    
    > As for Last-Modification, I think the spec should indicate the exact
    > format that's used, because it'll also be critical for checksumming.
    
    Again, I don't think it really matters for checksumming, but it's
    "YYYY-MM-DD HH:MM:SS TZ" format, where TZ is always GMT.
    
    > Why is the top-level checksum only allowed to be SHA-256, if the files
    > can use up to SHA-512?
    
    If we allowed the top-level checksum to be changed to something else,
    then we'd probably want to indicate which kind of checksum is being
    used at the beginning of the file, so as to enable incremental parsing
    with checksum verification at the end. pg_verifybackup doesn't
    currently do incremental parsing, but I'd like to add that sometime,
    if I get time to hash out the details. I think the use case for
    varying the checksum type of the manifest itself is much less than for
    varying it for the files. The big problem with checksumming the files
    is that it can be slow, because the files can be big. However, unless
    you have a truckload of empty files in the database, the manifest is
    going to be very small compared to the sizes of all the files, so it
    seemed harmless to use a stronger checksum algorithm for the manifest
    itself. Maybe someone with a ton of empty or nearly-empty relations
    will complain, but they can always use --no-manifest if they want.
    
    I agree that it's a little bit weird that you can have a stronger
    checksum for the files than for the manifest itself, but I also
    wonder what the use case would be for using a stronger checksum on the
    manifest. David Steele argued that strong checksums on the files could
    be useful to software that wants to rifle through all the backups
    you've ever taken and find another copy of that file by looking for
    something with a matching checksum. CRC-32C wouldn't be strong enough
    for that, because eventually you could have enough files that you
    start to have collisions. The SHA algorithms output enough bits to
    make that quite unlikely. But this argument only makes sense for the
    files, not the manifest.
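
The collision argument above can be made concrete with the usual birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n files and a b-bit checksum. This is a rough sketch that assumes checksum values behave like uniformly random b-bit strings; `collision_probability` is a hypothetical helper, not anything from the patch.

```python
import math


def collision_probability(n_items, hash_bits):
    """Approximate probability that at least two of n_items share a checksum,
    assuming uniformly distributed hash_bits-bit values (birthday bound)."""
    exponent = -n_items * (n_items - 1) / (2.0 * 2.0 ** hash_bits)
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny.
    return -math.expm1(exponent)


if __name__ == "__main__":
    for bits in (32, 224, 256):
        print(bits, collision_probability(1000000, bits))
```

Under that assumption a 32-bit checksum reaches roughly even odds of a collision at under a hundred thousand files, while the SHA-2 widths stay negligibly small for any realistic file count.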
    
    Naturally, all this is arguable, though, and a good deal of arguing
    about it has been done, as you have probably noticed. I am still of
    the opinion that if somebody's goal is to use this facility for its
    intended purpose, which is to find out whether your backup got
    corrupted, any of these algorithms are fine, and are highly likely to
    tell you that you have a problem if, in fact, you do. In fact, I bet
    that even a checksum algorithm considerably stupider than anything I'd
    actually consider using would accomplish that goal in a high
    percentage of cases. But not everybody agrees with me, to the point
    where I am starting to wonder if I really understand how computers
    work.
    
    > (Also, did we intentionally omit the dash in
    > hash names, so "SHA-256" to make it SHA256?  This will also be critical
    > for checksumming the manifest itself.)
    
    I debated this with myself, settled on this spelling, and nobody
    complained until now. It could be changed, though. I didn't have any
    particular reason for choosing it except the feeling that people would
    probably prefer to type --manifest-checksum=sha256 rather than
    --manifest-checksum=sha-256.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  283. Re: documenting the backup manifest file format

    Robert Haas <robertmhaas@gmail.com> — 2020-04-13T20:16:06Z

    On Mon, Apr 13, 2020 at 4:10 PM Andrew Dunstan
    <andrew.dunstan@2ndquadrant.com> wrote:
    > Seems ok. A tiny example, or an excerpt, might be nice.
    
    An empty database produces a manifest about 1200 lines long, so a full
    example seems like too much to include in the documentation. An
    excerpt could be included, I suppose.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  284. Re: documenting the backup manifest file format

    David Steele <david@pgmasters.net> — 2020-04-13T20:42:03Z

    On 4/13/20 4:14 PM, Robert Haas wrote:
    > On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
    > 
    >> Also, I
    >> see no mention of prettification-chars such as newlines or indentation.
    >> I suppose if I pass a manifest file through prettification (or Windows
    >> newline conversion), the checksum may break.
    > 
    > It would indeed break. I'm not sure what you want me to say here,
    > though. If you're trying to parse a manifest, you shouldn't care about
    > how the whitespace is arranged. If you're trying to generate one, you
    > can arrange it any way you like, as long as you also include it in the
    > checksum.
    
    pgBackRest ignores whitespace but this is a legacy of the way Perl 
    calculated checksums, not an intentional feature. This worked well when 
    the manifest was loaded as a whole, converted to JSON, and checksummed, 
    but it is a major pain for the streaming code we now have in C.
    
    I guarantee that our next manifest version will do a simple 
    checksum of bytes as Robert has done in this feature.
    
    So, I'm +1 as implemented.
    
    >> Why is the top-level checksum only allowed to be SHA-256, if the files
    >> can use up to SHA-512?
    
    <snip>
    
    > I agree that it's a little bit weird that you can have a stronger
    > checksum for the files than for the manifest itself, but I also
    > wonder what the use case would be for using a stronger checksum on the
    > manifest. David Steele argued that strong checksums on the files could
    > be useful to software that wants to rifle through all the backups
    > you've ever taken and find another copy of that file by looking for
    > something with a matching checksum. CRC-32C wouldn't be strong enough
    > for that, because eventually you could have enough files that you
    > start to have collisions. The SHA algorithms output enough bits to
    > make that quite unlikely. But this argument only makes sense for the
    > files, not the manifest.
    
    Agreed. I think SHA-256 is *more* than enough to protect the manifest 
    against corruption. That said, since the cost of SHA-256 vs. SHA-512 in 
    the context of the manifest is negligible, we could just use the stronger 
    algorithm to deflect a similar question going forward.
    
    That choice might not age well, but we could always say, well, we picked 
    it because it was the strongest available at the time. Allowing a choice 
    of which algorithm to use for the manifest checksum seems like it will 
    just make verifying the file harder with no tangible benefit.
    
    Maybe just a comment in the docs about why SHA-256 was used would be fine.
    
    >> (Also, did we intentionally omit the dash in
    >> hash names, so "SHA-256" to make it SHA256?  This will also be critical
    >> for checksumming the manifest itself.)
    > 
    > I debated this with myself, settled on this spelling, and nobody
    > complained until now. It could be changed, though. I didn't have any
    > particular reason for choosing it except the feeling that people would
    > probably prefer to type --manifest-checksum=sha256 rather than
    > --manifest-checksum=sha-256.
    
    +1 for sha256 rather than sha-256.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  285. Re: documenting the backup manifest file format

    Alvaro Herrera <alvherre@2ndquadrant.com> — 2020-04-13T21:42:56Z

    On 2020-Apr-13, Robert Haas wrote:
    
    > On Mon, Apr 13, 2020 at 3:34 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
    > > Are these hex figures upper or lower case?  No leading zeroes?  This
    > > would normally not matter, but the toplevel checksum will care.
    > 
    > Not really. You just feed the whole file except for the last line
    > through shasum and you get the answer.
    > 
    > It so happens that the server generates lower-case, but
    > pg_verifybackup will accept either.
    > 
    > Leading zeroes are not omitted. If the checksum's not the right
    > length, it ain't gonna work. If SHA is used, it's the same output you
    > would get from running shasum -a<whatever> on the file, which is
    > certainly a fixed length. I assumed that this followed from the
    > statement that there are two characters per byte in the checksum, and
    > from the fact that no checksum algorithm I know about drops leading
    > zeroes in the output.
    
    Eh, apologies, I was completely unclear -- I was looking at the LSN
    fields when writing the above.  So the leading zeroes and letter case
    comment refers to those in the LSN values.  I agree that it doesn't
    matter as long as the same tool generates the json file and writes the
    checksum.
    
    > > Also, I see no mention of prettification-chars such as newlines or
    > > indentation.  I suppose if I pass a manifest file through
    > > prettification (or Windows newline conversion), the checksum may
    > > break.
    > 
    > It would indeed break. I'm not sure what you want me to say here,
    > though. If you're trying to parse a manifest, you shouldn't care about
    > how the whitespace is arranged. If you're trying to generate one, you
    > can arrange it any way you like, as long as you also include it in the
    > checksum.
    
    Yeah, I guess I'm just saying that it feels brittle to have a file
    format that's supposed to be good for data exchange and then make it
    itself depend on representation details such as the order that fields
    appear in, the letter case, or the format of newlines.  Maybe this isn't
    really of concern, but it seemed strange.
    
    > > As for Last-Modification, I think the spec should indicate the exact
    > > format that's used, because it'll also be critical for checksumming.
    > 
    > Again, I don't think it really matters for checksumming, but it's
    > "YYYY-MM-DD HH:MM:SS TZ" format, where TZ is always GMT.
    
    I agree that whatever format you use will work as long as it isn't
    modified.
    
    I think strict ISO 8601 might be preferable (with the T in the middle
    and ending in Z instead of " GMT").
    
    > > Why is the top-level checksum only allowed to be SHA-256, if the
    > > files can use up to SHA-512?
    
    Thanks for the discussion.  I think you mostly want to make sure that
    the manifest is sensible (not corrupt) rather than defend against
    somebody maliciously giving you an attacking manifest (??).  I incline
    to agree that any SHA-2 hash is going to serve that purpose and have no
    further comment to make.
    
    -- 
    Álvaro Herrera                https://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  286. Re: documenting the backup manifest file format

    Robert Haas <robertmhaas@gmail.com> — 2020-04-14T16:56:49Z

    On Mon, Apr 13, 2020 at 5:43 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
    > Yeah, I guess I'm just saying that it feels brittle to have a file
    > format that's supposed to be good for data exchange and then make it
    > itself depend on representation details such as the order that fields
    > appear in, the letter case, or the format of newlines.  Maybe this isn't
    > really of concern, but it seemed strange.
    
    I didn't want to use JSON for this at all, but I got outvoted. When I
    raised this issue, it was suggested that I deal with it in this way,
    so I did. I can't really defend it too far beyond that, although I do
    think that one nice thing about this is that you can verify the
    checksum using shell commands if you want. Just figure out the number
    of lines in the file, minus one, and do head -n$LINES backup_manifest
    | shasum -a256 and boom. If there were some whitespace-skipping thing,
    figuring out how to reproduce the checksum calculation would be hard.
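
The same check as the shell recipe above can be sketched in Python: hash every byte before the final line with SHA-256 and compare the result with the value stored on that line. This sketch assumes the final line carries a `"Manifest-Checksum"` key followed by 64 hexadecimal digits, which is the layout described by the manifest documentation and not quoted in this message; `verify_manifest_checksum` is a hypothetical name.

```python
import hashlib
import re

# Assumed shape of the last line: "Manifest-Checksum": "<64 hex digits>"}
_CHECKSUM_RE = re.compile(rb'"Manifest-Checksum"\s*:\s*"([0-9A-Fa-f]{64})"')


def verify_manifest_checksum(data):
    """Return True if the SHA-256 of all bytes before the last line equals
    the checksum recorded on the last line of the manifest."""
    # Start of the last line: one past the final newline that is not the
    # file's own trailing newline.
    body_end = data.rstrip(b"\n").rfind(b"\n") + 1
    body, last_line = data[:body_end], data[body_end:]
    match = _CHECKSUM_RE.search(last_line)
    if match is None:
        raise ValueError("no manifest checksum found on the last line")
    expected = match.group(1).decode("ascii").lower()
    return hashlib.sha256(body).hexdigest() == expected


if __name__ == "__main__":
    with open("backup_manifest", "rb") as f:
        print(verify_manifest_checksum(f.read()))
```

The file is read in binary mode on purpose: any newline translation or re-indentation changes the bytes being hashed, so the comparison fails, as discussed earlier in the thread.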
    
    > I think strict ISO 8601 might be preferable (with the T in the middle
    > and ending in Z instead of " GMT").
    
    Hmm, did David suggest that before? I don't recall for sure. I think
    he had some suggestion, but I'm not sure if it was the same one.
    
    > > > Why is the top-level checksum only allowed to be SHA-256, if the
    > > > files can use up to SHA-512?
    >
    > Thanks for the discussion.  I think you mostly want to make sure that
    > the manifest is sensible (not corrupt) rather than defend against
    > somebody maliciously giving you an attacking manifest (??).  I incline
    > to agree that any SHA-2 hash is going to serve that purpose and have no
    > further comment to make.
    
    The code has other sanity checks against the manifest failing to parse
    properly, so you can't (I hope) crash it or anything even if you
    falsify the checksum. But suppose that there is a gremlin running
    around your system flipping occasional bits. If said gremlin flips a
    bit in a "0" that appears in a file's checksum string, it could become
    a "1", a "2", a "4", or an "8", all of which are still valid characters for a
    hex string. When you then tried to verify the backup, verification for
    that file would fail, but you'd think it was a problem with the file,
    rather than a problem with the manifest. The manifest checksum
    prevents that: you'll get a complaint about the manifest checksum
    being wrong rather than a complaint about the file not matching the
    manifest checksum. A sufficiently smart gremlin could figure out the
    expected checksum for the revised manifest and flip bits to make the
    actual value match the expected one, but I think we're worried about
    "chaotic neutral" gremlins, not "lawful evil" ones.
    
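The single-bit-flip scenario can be explored with a short sketch that lists which characters a one-bit error can turn a given hex digit into while still leaving a valid hex digit behind. It assumes the manifest is ASCII text; both function names are hypothetical.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def single_bit_flips(ch):
    """Return the eight characters obtained by flipping one bit of ch's byte."""
    code = ord(ch)
    return [chr(code ^ (1 << bit)) for bit in range(8)]


def hex_preserving_flips(ch):
    """Return the single-bit corruptions of ch that are still hex digits."""
    return sorted(c for c in single_bit_flips(ch) if c in HEX_DIGITS)


if __name__ == "__main__":
    for digit in "0123456789abcdef":
        print(digit, hex_preserving_flips(digit))
```

For ASCII "0" (0x30) the surviving candidates are "1", "2", "4", and "8"; the other four flips produce characters that are not hex digits and would be rejected when the manifest is parsed.
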
    That having been said, there was some discussion on the original
    thread about keeping your backup on regular storage and your manifest
    checksum in a concrete bunker at the bottom of the ocean; in that
    scenario, it should be possible to detect tampering in either the
    manifest itself or in non-WAL data files, as long as the adversary
    can't break SHA-256. But I'm not sure how much we should really worry
    about that. For me, the design center for this feature is a user who
    untars base.tar and forgets about 43965.tar. If that person runs
    pg_verifybackup, it's gonna tell them that things are broken, and
    that's good enough for me. It may not be good enough for everybody,
    but it's good enough for me.
    
    I think I'm going to go ahead and push this now, maybe with a small
    wording tweak as discussed upthread with Andrew. The rest of this
    discussion is really about whether the patch needs any design changes
    rather than about whether the documentation describes what the patch
    does, so it makes sense to me to commit this first and then if
    somebody wants to argue for a change they certainly can.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  287. Re: documenting the backup manifest file format

    David Steele <david@pgmasters.net> — 2020-04-14T17:12:51Z

    On 4/14/20 12:56 PM, Robert Haas wrote:
    > On Mon, Apr 13, 2020 at 5:43 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
    >> Yeah, I guess I'm just saying that it feels brittle to have a file
    >> format that's supposed to be good for data exchange and then make it
    >> itself depend on representation details such as the order that fields
    >> appear in, the letter case, or the format of newlines.  Maybe this isn't
    >> really of concern, but it seemed strange.
    > 
    > I didn't want to use JSON for this at all, but I got outvoted. When I
    > raised this issue, it was suggested that I deal with it in this way,
    > so I did. I can't really defend it too far beyond that, although I do
    > think that one nice thing about this is that you can verify the
    > checksum using shell commands if you want. Just figure out the number
    > of lines in the file, minus one, and do head -n$LINES backup_manifest
    > | shasum -a256 and boom. If there were some whitespace-skipping thing,
    > figuring out how to reproduce the checksum calculation would be hard.
    > 
    >> I think strict ISO 8601 might be preferable (with the T in the middle
    >> and ending in Z instead of " GMT").
    > 
    > Hmm, did David suggest that before? I don't recall for sure. I think
    > he had some suggestion, but I'm not sure if it was the same one.
    
    "I'm also partial to using epoch time in the manifest because it is 
    generally easier for programs to work with.  But, human-readable doesn't 
    suck, either."
    
    Also you don't need to worry about time-zone conversion errors -- even 
    if the source time is UTC this can easily happen if you are not careful. 
    It also saves a parsing step.
    
    The downside is it is not human-readable but this is intended to be a 
    machine-readable format so I don't think it's a big deal (encoded 
    filenames will be just as opaque). If a user really needs to know the 
    time for some file (rare, I think), they can paste it into a web tool to 
    find out.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  288. Re: documenting the backup manifest file format

    Alvaro Herrera <alvherre@2ndquadrant.com> — 2020-04-14T17:27:31Z

    On 2020-Apr-14, David Steele wrote:
    
    > On 4/14/20 12:56 PM, Robert Haas wrote:
    >
    > > Hmm, did David suggest that before? I don't recall for sure. I think
    > > he had some suggestion, but I'm not sure if it was the same one.
    > 
    > "I'm also partial to using epoch time in the manifest because it is
    > generally easier for programs to work with.  But, human-readable doesn't
    > suck, either."
    
    Ugh.  If you go down that road, why write human-readable contents at
    all?  You may as well just use a binary format.  But that's a very
    slippery slope and you won't like to be in the bottom -- I don't see
    what that gains you.  It's not like it's a lot of work to parse a
    timestamp in a non-internationalized well-defined human-readable format.
    
    -- 
    Álvaro Herrera                https://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  289. Re: documenting the backup manifest file format

    David Steele <david@pgmasters.net> — 2020-04-14T17:33:44Z

    On 4/14/20 1:27 PM, Alvaro Herrera wrote:
    > On 2020-Apr-14, David Steele wrote:
    > 
    >> On 4/14/20 12:56 PM, Robert Haas wrote:
    >>
    >>> Hmm, did David suggest that before? I don't recall for sure. I think
    >>> he had some suggestion, but I'm not sure if it was the same one.
    >>
    >> "I'm also partial to using epoch time in the manifest because it is
    >> generally easier for programs to work with.  But, human-readable doesn't
    >> suck, either."
    > 
    > Ugh.  If you go down that road, why write human-readable contents at
    > all?  You may as well just use a binary format.  But that's a very
    > slippery slope and you won't like to be in the bottom -- I don't see
    > what that gains you.  It's not like it's a lot of work to parse a
    > timestamp in a non-internationalized well-defined human-readable format.
    
    Well, times are a special case because they are so easy to mess up. Try 
    converting ISO-8601 to epoch time using the standard C functions on a 
    system where TZ != UTC. Fun times.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  290. Re: documenting the backup manifest file format

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-14T19:03:12Z

    On 4/14/20 1:33 PM, David Steele wrote:
    > On 4/14/20 1:27 PM, Alvaro Herrera wrote:
    >> On 2020-Apr-14, David Steele wrote:
    >>
    >>> On 4/14/20 12:56 PM, Robert Haas wrote:
    >>>
    >>>> Hmm, did David suggest that before? I don't recall for sure. I think
    >>>> he had some suggestion, but I'm not sure if it was the same one.
    >>>
    >>> "I'm also partial to using epoch time in the manifest because it is
    >>> generally easier for programs to work with.  But, human-readable
    >>> doesn't
    >>> suck, either."
    >>
    >> Ugh.  If you go down that road, why write human-readable contents at
    >> all?  You may as well just use a binary format.  But that's a very
    >> slippery slope and you won't like to be in the bottom -- I don't see
    >> what that gains you.  It's not like it's a lot of work to parse a
    >> timestamp in a non-internationalized well-defined human-readable format.
    >
    > Well, times are a special case because they are so easy to mess up.
    > Try converting ISO-8601 to epoch time using the standard C functions
    > on a system where TZ != UTC. Fun times.
    >
    >
    
    
    Even if it's a zulu time? That would be pretty damn sad.
    
    
    cheers
    
    
    andrew
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
    
  291. Re: documenting the backup manifest file format

    David Steele <david@pgmasters.net> — 2020-04-14T19:19:23Z

    On 4/14/20 3:03 PM, Andrew Dunstan wrote:
    > 
    > On 4/14/20 1:33 PM, David Steele wrote:
    >> On 4/14/20 1:27 PM, Alvaro Herrera wrote:
    >>> On 2020-Apr-14, David Steele wrote:
    >>>
    >>>> On 4/14/20 12:56 PM, Robert Haas wrote:
    >>>>
    >>>>> Hmm, did David suggest that before? I don't recall for sure. I think
    >>>>> he had some suggestion, but I'm not sure if it was the same one.
    >>>>
    >>>> "I'm also partial to using epoch time in the manifest because it is
    >>>> generally easier for programs to work with.  But, human-readable
    >>>> doesn't
    >>>> suck, either."
    >>>
    >>> Ugh.  If you go down that road, why write human-readable contents at
    >>> all?  You may as well just use a binary format.  But that's a very
    >>> slippery slope and you won't like to be in the bottom -- I don't see
    >>> what that gains you.  It's not like it's a lot of work to parse a
    >>> timestamp in a non-internationalized well-defined human-readable format.
    >>
    >> Well, times are a special case because they are so easy to mess up.
    >> Try converting ISO-8601 to epoch time using the standard C functions
    >> on a system where TZ != UTC. Fun times.
    > 
    > Even if it's a zulu time? That would be pretty damn sad.
    
    ZULU/GMT/UTC are all fine. But if the server timezone is EDT for example
    (not that I recommend this) you are likely to get the wrong result. 
    Results vary based on your platform. For instance, we found MacOS was 
    more likely to work the way you would expect and Linux was hopeless.
    
    There are all kinds of fun tricks to get around this (sort of). One is 
    to temporarily set TZ=UTC which sucks if an error happens before it gets 
    set back. There are some hacks to try to determine your offset which 
    have inherent race conditions around DST changes.
    
    After some experimentation we just used the Posix definition for epoch 
    time and used that to do our conversions:
    
    https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_16
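
The conversion described above can be sketched roughly as follows. This is only an illustration of the POSIX "Seconds Since the Epoch" formula, not the code referred to in this message. The function names are hypothetical, and the sketch assumes a "Z"-suffixed timestamp with a year of 1970 or later. It never consults TZ, so the host timezone cannot affect the result.

```c
#include <stdint.h>
#include <stdio.h>

/* Days elapsed before the first of each month in a non-leap year. */
static const int days_before_month[12] =
    {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};

static int
is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/*
 * Apply the POSIX formula to broken-down UTC fields.
 * Hypothetical helper; assumes year >= 1970 and validated fields.
 */
int64_t
utc_fields_to_epoch(int year, int month, int day, int hour, int min, int sec)
{
    int64_t tm_year = year - 1900;  /* years since 1900, as in struct tm */
    int64_t tm_yday = days_before_month[month - 1] + (day - 1);

    if (month > 2 && is_leap_year(year))
        tm_yday++;

    return sec + min * 60 + hour * 3600 + tm_yday * 86400 +
        (tm_year - 70) * 31536000 + ((tm_year - 69) / 4) * 86400 -
        ((tm_year - 1) / 100) * 86400 + ((tm_year + 299) / 400) * 86400;
}

/*
 * Parse "YYYY-MM-DDTHH:MM:SSZ" and store the epoch time in *out.
 * Returns 0 on success, -1 if the string is malformed. The day-of-month
 * check is deliberately rough (1..31) to keep the sketch short.
 */
int
iso8601_utc_to_epoch(const char *s, int64_t *out)
{
    int  y, mo, d, h, mi, se, n = 0;
    char z;

    if (sscanf(s, "%4d-%2d-%2dT%2d:%2d:%2d%c%n",
               &y, &mo, &d, &h, &mi, &se, &z, &n) != 7 ||
        z != 'Z' || s[n] != '\0')
        return -1;
    if (y < 1970 || mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || se < 0 || se > 60)
        return -1;

    *out = utc_fields_to_epoch(y, mo, d, h, mi, se);
    return 0;
}
```

Since no call to mktime() or localtime() is involved, there is nothing to save and restore around the conversion and no DST race.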
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  292. Re: documenting the backup manifest file format

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-14T19:55:55Z

    On 4/14/20 3:19 PM, David Steele wrote:
    > On 4/14/20 3:03 PM, Andrew Dunstan wrote:
    >>
    >> On 4/14/20 1:33 PM, David Steele wrote:
    >>> On 4/14/20 1:27 PM, Alvaro Herrera wrote:
    >>>> On 2020-Apr-14, David Steele wrote:
    >>>>
    >>>>> On 4/14/20 12:56 PM, Robert Haas wrote:
    >>>>>
    >>>>>> Hmm, did David suggest that before? I don't recall for sure. I think
    >>>>>> he had some suggestion, but I'm not sure if it was the same one.
    >>>>>
    >>>>> "I'm also partial to using epoch time in the manifest because it is
    >>>>> generally easier for programs to work with.  But, human-readable
    >>>>> doesn't
    >>>>> suck, either."
    >>>>
    >>>> Ugh.  If you go down that road, why write human-readable contents at
    >>>> all?  You may as well just use a binary format.  But that's a very
    >>>> slippery slope and you won't like to be in the bottom -- I don't see
    >>>> what that gains you.  It's not like it's a lot of work to parse a
    >>>> timestamp in a non-internationalized well-defined human-readable
    >>>> format.
    >>>
    >>> Well, times are a special case because they are so easy to mess up.
    >>> Try converting ISO-8601 to epoch time using the standard C functions
    >>> on a system where TZ != UTC. Fun times.
    >>
    >> Even if it's a zulu time? That would be pretty damn sad.
    > ZULU/GMT/UTC are all fine. But if the server timezone is EDT for
    > example (not that I recommend this) you are likely to get the wrong
    > result. Results vary based on your platform. For instance, we found
    > MacOS was more likely to work the way you would expect and Linux was
    > hopeless.
    >
    > There are all kinds of fun tricks to get around this (sort of). One is
    > to temporarily set TZ=UTC which sucks if an error happens before it
    > gets set back. There are some hacks to try to determine your offset
    > which have inherent race conditions around DST changes.
    >
    > After some experimentation we just used the Posix definition for epoch
    > time and used that to do our conversions:
    >
    > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_16
    >
    >
    >
    
    OK, but I think if we're putting a timestamp string in ISO-8601 format
    in the manifest it should be in UTC / Zulu time, precisely to avoid
    these issues. If that's too much trouble then yes an epoch time will
    probably do.
    
    
    cheers
    
    
    andrew
    
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
    
  293. Re: documenting the backup manifest file format

    David Steele <david@pgmasters.net> — 2020-04-14T20:01:40Z

    On 4/14/20 3:55 PM, Andrew Dunstan wrote:
    > 
    > OK, but I think if we're putting a timestamp string in ISO-8601 format
    > in the manifest it should be in UTC / Zulu time, precisely to avoid
    > these issues. If that's too much trouble then yes an epoch time will
    > probably do.
    
    Happily ISO-8601 is always UTC. The problem I'm referring to is the 
    timezone setting on the host system when doing conversions in C.
    
    To be fair most languages handle this well and C is C so I'm not sure we 
    need to make a big deal of it. In JSON/XML it's pretty common to use 
    ISO-8601 so that seems like a rational choice.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  294. Re: documenting the backup manifest file format

    Alvaro Herrera <alvherre@2ndquadrant.com> — 2020-04-14T20:09:22Z

    On 2020-Apr-14, Andrew Dunstan wrote:
    
    > OK, but I think if we're putting a timestamp string in ISO-8601 format
    > in the manifest it should be in UTC / Zulu time, precisely to avoid
    > these issues. If that's too much trouble then yes an epoch time will
    > probably do.
    
    The timestamp is always specified and always UTC (except the code calls
    it GMT).
    
    +   /*
    +    * Convert last modification time to a string and append it to the
    +    * manifest. Since it's not clear what time zone to use and since time
    +    * zone definitions can change, possibly causing confusion, use GMT
    +    * always.
    +    */
    +   appendStringInfoString(&buf, "\"Last-Modified\": \"");
    +   enlargeStringInfo(&buf, 128);
    +   buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%d %H:%M:%S %Z",
    +                          pg_gmtime(&mtime));
    +   appendStringInfoString(&buf, "\"");
    
    I was merely saying that it's trivial to make this iso-8601 compliant as
    
        buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%dT%H:%M:%SZ",
    
    ie. omit the "GMT" string and replace it with a literal Z, and remove
    the space and replace it with a T.
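
For readers without the PostgreSQL tree at hand, the same format string can be tried with the standard C library. This is a minimal sketch, assuming plain gmtime() and strftime() in place of pg_gmtime() and pg_strftime(); the function name is hypothetical.

```c
#include <stddef.h>
#include <time.h>

/*
 * Format a time_t as an ISO-8601 UTC timestamp, "YYYY-MM-DDTHH:MM:SSZ".
 * Returns the number of characters written (excluding the terminating
 * NUL), or 0 if the conversion failed or the buffer is too small.
 * Note that gmtime() uses a static buffer and is not thread-safe.
 */
size_t
format_iso8601_utc(time_t t, char *buf, size_t buflen)
{
    struct tm *tm_utc = gmtime(&t);

    if (tm_utc == NULL)
        return 0;

    /* Literal 'T' and 'Z' replace the space and the %Z ("GMT") field. */
    return strftime(buf, buflen, "%Y-%m-%dT%H:%M:%SZ", tm_utc);
}
```

Because gmtime() always produces UTC fields, the literal Z is accurate regardless of the TZ setting of the process.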
    
    -- 
    Álvaro Herrera                https://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  295. Re: documenting the backup manifest file format

    Alvaro Herrera <alvherre@2ndquadrant.com> — 2020-04-14T20:11:00Z

    On 2020-Apr-14, David Steele wrote:
    
    > Happily ISO-8601 is always UTC.
    
    Uh, it is not --
    https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators
    
    -- 
    Álvaro Herrera                https://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  296. Re: documenting the backup manifest file format

    Andrew Dunstan <andrew.dunstan@2ndquadrant.com> — 2020-04-14T20:33:31Z

    On 4/14/20 4:09 PM, Alvaro Herrera wrote:
    > On 2020-Apr-14, Andrew Dunstan wrote:
    >
    >> OK, but I think if we're putting a timestamp string in ISO-8601 format
    >> in the manifest it should be in UTC / Zulu time, precisely to avoid
    >> these issues. If that's too much trouble then yes an epoch time will
    >> probably do.
    > The timestamp is always specified and always UTC (except the code calls
    > it GMT).
    >
    > +   /*
    > +    * Convert last modification time to a string and append it to the
    > +    * manifest. Since it's not clear what time zone to use and since time
    > +    * zone definitions can change, possibly causing confusion, use GMT
    > +    * always.
    > +    */
    > +   appendStringInfoString(&buf, "\"Last-Modified\": \"");
    > +   enlargeStringInfo(&buf, 128);
    > +   buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%d %H:%M:%S %Z",
    > +                          pg_gmtime(&mtime));
    > +   appendStringInfoString(&buf, "\"");
    >
    > I was merely saying that it's trivial to make this iso-8601 compliant as
    >
    >     buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%dT%H:%M:%SZ",
    >
    > ie. omit the "GMT" string and replace it with a literal Z, and remove
    > the space and replace it with a T.
    >
    
    +1
    
    
    cheers
    
    
    andrew
    
    
    
    -- 
    Andrew Dunstan                https://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
    
  297. Re: documenting the backup manifest file format

    David Steele <david@pgmasters.net> — 2020-04-14T20:40:12Z

    On 4/14/20 4:11 PM, Alvaro Herrera wrote:
    > On 2020-Apr-14, David Steele wrote:
    > 
    >> Happily ISO-8601 is always UTC.
    > 
    > Uh, it is not --
    > https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators
    
    Whoops, you are correct. I've just never seen non-UTC in the wild yet.
    
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  298. Re: backup manifests

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-04-15T02:18:38Z

    
    On 2020/04/14 0:15, Robert Haas wrote:
    > On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao
    > <masao.fujii@oss.nttdata.com> wrote:
    >> I found other minor issues.
    > 
    > I think these are all correct fixes. Thanks for the post-commit
    > review, and sorry for these mistakes.
    
    Thanks for the review, Michael and Robert. Pushed the patches!
    
    Regards,
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
    
    
    
    
  299. Re: documenting the backup manifest file format

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-04-15T03:49:11Z

    
    On 2020/04/14 2:40, Robert Haas wrote:
    > On Fri, Mar 27, 2020 at 4:32 PM Andres Freund <andres@anarazel.de> wrote:
    >> I don't like having a file format that's intended to be used by external
    >> tools too that's undocumented except for code that assembles it in a
    >> piecemeal fashion.  Do you mean in a follow-on patch this release, or
    >> later? I don't have a problem with the former.
    > 
    > Here is a patch for that.
    
    While reading the document that you pushed, I thought it would be better
    to define an index term for the backup manifest, so that we can easily
    reach this document from the index page. Thoughts? Patch attached.
    
    Regards,
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
    
  300. Re: documenting the backup manifest file format

    Robert Haas <robertmhaas@gmail.com> — 2020-04-15T13:24:38Z

    On Tue, Apr 14, 2020 at 11:49 PM Fujii Masao
    <masao.fujii@oss.nttdata.com> wrote:
    > While reading the document that you pushed, I thought that it's better
    > to define index term for backup manifest, so that we can easily reach
    > this document from the index page. Thought? Patch attached.
    
    Fine with me. I tend not to think about the index very much, so I'm
    glad you are. :-)
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  301. Re: documenting the backup manifest file format

    Jehan-Guillaume de Rorthais <jgdr@dalibo.com> — 2020-04-15T15:23:21Z

    On Tue, 14 Apr 2020 12:56:49 -0400
    Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Mon, Apr 13, 2020 at 5:43 PM Alvaro Herrera <alvherre@2ndquadrant.com>
    > wrote:
    > > Yeah, I guess I'm just saying that it feels brittle to have a file
    > > format that's supposed to be good for data exchange and then make it
    > > itself depend on representation details such as the order that fields
    > > appear in, the letter case, or the format of newlines.  Maybe this isn't
    > > really of concern, but it seemed strange.  
    > 
    > I didn't want to use JSON for this at all, but I got outvoted. When I
    > raised this issue, it was suggested that I deal with it in this way,
    > so I did. I can't really defend it too far beyond that, although I do
    > think that one nice thing about this is that you can verify the
    > checksum using shell commands if you want. Just figure out the number
    > of lines in the file, minus one, and do head -n$LINES backup_manifest
    > | shasum -a256 and boom. If there were some whitespace-skipping thing
    > figuring out how to reproduce the checksum calculation would be hard.
    
    FWIW, shell commands (md5sum and sha*sum) read checksums from a separate file
    with a very simple format: one file per line with format "CHECKSUM FILEPATH".
    
    Thanks to json, it is fairly easy to extract checksums and filenames from the
    current manifest file format and check them all with one command:
    
      jq -r '.Files|.[]|.Checksum+" "+.Path' backup_manifest > checksums.sha256
      sha256sum --check --quiet checksums.sha256
    
    You can even pipe these commands together to avoid the intermediary file.
    
    But for backup_manifest, it's kind of a shame we have to check the checksum
    against a transformed version of the file. Did you consider creating, e.g., a
    separate backup_manifest.sha256 file?
    
    I'm very sorry in advance if this has been discussed previously.
    
    Regards,
    
    
    
    
  302. Re: documenting the backup manifest file format

    Robert Haas <robertmhaas@gmail.com> — 2020-04-15T16:03:28Z

    On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
    <jgdr@dalibo.com> wrote:
    > But for backup_manifest, it's kind of shame we have to check the checksum
    > against an transformed version of the file. Did you consider creating eg. a
    > separate backup_manifest.sha256 file?
    >
    > I'm very sorry in advance if this has been discussed previously.
    
    It was briefly mentioned in the original (lengthy) discussion, but I
    think there was one vote in favor and two votes against or something
    like that, so it didn't go anywhere. I didn't realize that there were
    handy command-line tools for manipulating json like that, or I
    probably would have considered that idea more strongly.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  303. Re: documenting the backup manifest file format

    Jehan-Guillaume de Rorthais <jgdr@dalibo.com> — 2020-04-15T22:43:15Z

    On Wed, 15 Apr 2020 12:03:28 -0400
    Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
    > <jgdr@dalibo.com> wrote:
    > > But for backup_manifest, it's kind of shame we have to check the checksum
    > > against an transformed version of the file. Did you consider creating eg. a
    > > separate backup_manifest.sha256 file?
    > >
    > > I'm very sorry in advance if this has been discussed previously.  
    > 
    > It was briefly mentioned in the original (lengthy) discussion, but I
    > think there was one vote in favor and two votes against or something
    > like that, so it didn't go anywhere.
    
    Argh.
    
    > I didn't realize that there were handy command-line tools for manipulating
    > json like that, or I probably would have considered that idea more strongly.
    
    That was indeed a lengthy thread with various details discussed. I'm sorry I
    didn't catch the ball back then.
    
    Regards,
    
    
    
    
  304. Re: documenting the backup manifest file format

    David Steele <david@pgmasters.net> — 2020-04-15T22:54:14Z

    On 4/15/20 6:43 PM, Jehan-Guillaume de Rorthais wrote:
    > On Wed, 15 Apr 2020 12:03:28 -0400
    > Robert Haas <robertmhaas@gmail.com> wrote:
    > 
    >> On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
    >> <jgdr@dalibo.com> wrote:
    >>> But for backup_manifest, it's kind of shame we have to check the checksum
    >>> against an transformed version of the file. Did you consider creating eg. a
    >>> separate backup_manifest.sha256 file?
    >>>
    >>> I'm very sorry in advance if this has been discussed previously.
    >>
    >> It was briefly mentioned in the original (lengthy) discussion, but I
    >> think there was one vote in favor and two votes against or something
    >> like that, so it didn't go anywhere.
    > 
    > Argh.
    > 
    >> I didn't realize that there were handy command-line tools for manipulating
    >> json like that, or I probably would have considered that idea more strongly.
    > 
    > That was indeed a lengthy thread with various details discussed. I'm sorry I
    > didn't catch the ball back then.
    
    One of the reasons to use JSON was to be able to use command line tools 
    like jq to do tasks (I use it myself). But I think only the 
    pg_verifybackup tool should be used to verify the internal checksum.
    
    Two thoughts:
    
    1) You can always generate an external checksum when you generate the 
    backup if you want to do your own verification without running 
    pg_verifybackup.
    
    2) Perhaps it would be good if the pg_verifybackup command had a 
    --verify-manifest-checksum option (or something) to check that the 
    manifest file looks valid without checking any files. That's not going 
    to happen for PG13, but it's possible for PG14.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
    
    
    
    
  305. Re: documenting the backup manifest file format

    Jehan-Guillaume de Rorthais <jgdr@dalibo.com> — 2020-04-16T22:23:27Z

    On Wed, 15 Apr 2020 18:54:14 -0400
    David Steele <david@pgmasters.net> wrote:
    
    > On 4/15/20 6:43 PM, Jehan-Guillaume de Rorthais wrote:
    > > On Wed, 15 Apr 2020 12:03:28 -0400
    > > Robert Haas <robertmhaas@gmail.com> wrote:
    > >   
    > >> On Wed, Apr 15, 2020 at 11:23 AM Jehan-Guillaume de Rorthais
    > >> <jgdr@dalibo.com> wrote:  
    > >>> But for backup_manifest, it's kind of shame we have to check the checksum
    > >>> against an transformed version of the file. Did you consider creating eg.
    > >>> a separate backup_manifest.sha256 file?
    > >>>
    > >>> I'm very sorry in advance if this has been discussed previously.  
    > >>
    > >> It was briefly mentioned in the original (lengthy) discussion, but I
    > >> think there was one vote in favor and two votes against or something
    > >> like that, so it didn't go anywhere.  
    > > 
    > > Argh.
    > >   
    > >> I didn't realize that there were handy command-line tools for manipulating
    > >> json like that, or I probably would have considered that idea more
    > >> strongly.  
    > > 
    > > That was indeed a lengthy thread with various details discussed. I'm sorry I
    > > didn't catch the ball back then.  
    > 
    > One of the reasons to use JSON was to be able to use command line tools 
    > like jq to do tasks (I use it myself).
    
    That's perfectly fine. I was only wondering about having the manifest checksum
    outside of the manifest itself.
    
    > But I think only the pg_verifybackup tool should be used to verify the
    > internal checksum.
    
    true.
    
    > Two thoughts:
    > 
    > 1) You can always generate an external checksum when you generate the 
    > backup if you want to do your own verification without running 
    > pg_verifybackup.
    
    Sure, but by the time I want to produce an external checksum, the manifest
    would have traveled around quite a bit, with various dangers along the way
    that could corrupt it. Checksumming it from the original process that
    produced it sounds safer.
    
    > 2) Perhaps it would be good if the pg_verifybackup command had a 
    > --verify-manifest-checksum option (or something) to check that the 
    > manifest file looks valid without checking any files. That's not going 
    > to happen for PG13, but it's possible for PG14.
    
    Sure.
    
    I just liked the idea of being able to check the manifest using an external
    command-line tool implementing the same standardized checksum algorithm,
    without editing the manifest first. But I understand it's too late to
    discuss this now.
    
    Regards,
    
    
    
    
  306. Re: documenting the backup manifest file format

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-04-17T09:39:21Z

    
    On 2020/04/15 22:24, Robert Haas wrote:
    > On Tue, Apr 14, 2020 at 11:49 PM Fujii Masao
    > <masao.fujii@oss.nttdata.com> wrote:
    >> While reading the document that you pushed, I thought that it's better
    >> to define index term for backup manifest, so that we can easily reach
    >> this document from the index page. Thought? Patch attached.
    > 
    > Fine with me. I tend not to think about the index very much, so I'm
    > glad you are. :-)
    
    Pushed! Thanks!
    
    Regards,
      
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
    
    
    
    
  307. Re: backup manifests

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-04-22T16:21:40Z

    
    On 2020/04/15 11:18, Fujii Masao wrote:
    > 
    > 
    > On 2020/04/14 0:15, Robert Haas wrote:
    >> On Sun, Apr 12, 2020 at 10:09 PM Fujii Masao
    >> <masao.fujii@oss.nttdata.com> wrote:
    >>> I found other minor issues.
    >>
    >> I think these are all correct fixes. Thanks for the post-commit
    >> review, and sorry for these mistakes.
    > 
    > Thanks for the review, Michael and Robert. Pushed the patches!
    
    I found three minor issues in pg_verifybackup.
    
    +		{"print-parse-wal", no_argument, NULL, 'p'},
    
    This is an unused option, so this line should be removed.
    
    +	printf(_("  -m, --manifest=PATH         use specified path for manifest\n"));
    
    Typo: --manifest should be --manifest-path
    
    pg_verifybackup accepts the --quiet option, but its usage() doesn't
    print any message for it.
    
    Attached is the patch that fixes those issues.
    
    Regards,
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
    
  308. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-22T16:28:38Z

    On Wed, Apr 22, 2020 at 12:21 PM Fujii Masao
    <masao.fujii@oss.nttdata.com> wrote:
    > I found three minor issues in pg_verifybackup.
    >
    > +               {"print-parse-wal", no_argument, NULL, 'p'},
    >
    > This is unused option, so this line should be removed.
    >
    > +       printf(_("  -m, --manifest=PATH         use specified path for manifest\n"));
    >
    > Typo: --manifest should be --manifest-path
    >
    > pg_verifybackup accepts --quiet option, but its usage() doesn't
    > print any message for --quiet option.
    >
    > Attached is the patch that fixes those issues.
    
    Thanks; LGTM.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  309. Re: backup manifests

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-04-23T02:33:46Z

    
    On 2020/04/23 1:28, Robert Haas wrote:
    > On Wed, Apr 22, 2020 at 12:21 PM Fujii Masao
    > <masao.fujii@oss.nttdata.com> wrote:
    >> I found three minor issues in pg_verifybackup.
    >>
    >> +               {"print-parse-wal", no_argument, NULL, 'p'},
    >>
    >> This is unused option, so this line should be removed.
    >>
    >> +       printf(_("  -m, --manifest=PATH         use specified path for manifest\n"));
    >>
    >> Typo: --manifest should be --manifest-path
    >>
    >> pg_verifybackup accepts --quiet option, but its usage() doesn't
    >> print any message for --quiet option.
    >>
    >> Attached is the patch that fixes those issues.
    > 
    > Thanks; LGTM.
    
    Thanks for the review! Pushed.
    
    Regards,  
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
    
    
    
    
  310. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-23T12:57:39Z

    On Sun, Apr 5, 2020 at 3:31 PM Andres Freund <andres@anarazel.de> wrote:
    > The warnings don't seem too unreasonable. The compiler can't see that
    > the error_cb inside json_manifest_parse_failure() is not expected to
    > return. Probably worth adding a wrapper around the calls to
    > context->error_cb and mark that as noreturn.
    
    Eh, how? The callback is declared as:
    
    typedef void (*json_manifest_error_callback)(JsonManifestParseContext *,
                                                 char *fmt, ...) pg_attribute_printf(2, 3);
    
    I don't know of a way to create a wrapper around that, because of the
    variable argument list. We could change the callback to take va_list,
    I guess.
    
    Does it work for you to just add pg_attribute_noreturn() to this
    typedef, as in the attached?
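
Both options mentioned above can be sketched outside the PostgreSQL tree. The following is a hypothetical illustration, not the attached patch: the callback takes a va_list so that a variadic wrapper can forward to it, and both the callback type and the wrapper carry a noreturn attribute. All names are made up, the attribute macros assume GCC or clang, and longjmp() stands in for whatever non-returning error exit a real caller would use.

```c
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>

/* Assumed GCC/clang attribute syntax; the macros expand to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NORETURN __attribute__((noreturn))
#define SKETCH_PRINTF(f, a) __attribute__((format(printf, f, a)))
#else
#define SKETCH_NORETURN
#define SKETCH_PRINTF(f, a)
#endif

typedef struct ParseContext ParseContext;

/* The noreturn attribute is attached to the callback type itself. */
typedef void (*error_callback)(ParseContext *ctx, const char *fmt,
                               va_list ap) SKETCH_NORETURN;

struct ParseContext
{
    error_callback error_cb;
    jmp_buf        recover;       /* where the error path resumes */
    char           message[128];  /* formatted error text */
};

/* Variadic wrapper; possible only because the callback takes va_list. */
void parse_failure(ParseContext *ctx, const char *fmt, ...)
    SKETCH_PRINTF(2, 3) SKETCH_NORETURN;

void
parse_failure(ParseContext *ctx, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    ctx->error_cb(ctx, fmt, ap);  /* does not return */
}

static void report_and_jump(ParseContext *ctx, const char *fmt,
                            va_list ap) SKETCH_NORETURN;

/* Example callback: record the message, then jump back to the caller. */
static void
report_and_jump(ParseContext *ctx, const char *fmt, va_list ap)
{
    vsnprintf(ctx->message, sizeof(ctx->message), fmt, ap);
    longjmp(ctx->recover, 1);
}

/* Returns 0 if the input starts with '{', or 1 after the error path ran. */
int
check_manifest_start(ParseContext *ctx, const char *input)
{
    ctx->error_cb = report_and_jump;
    if (setjmp(ctx->recover) != 0)
        return 1;
    if (input[0] != '{')
        parse_failure(ctx, "expected '{' but found '%c'", input[0]);
    return 0;
}
```

With the attribute on the typedef, the compiler can see that control never continues past the call through the function pointer, which is the information the warning in optimized builds was missing.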
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
  311. Re: backup manifests

    Andres Freund <andres@anarazel.de> — 2020-04-23T21:16:29Z

    Hi,
    
    On 2020-04-23 08:57:39 -0400, Robert Haas wrote:
    > On Sun, Apr 5, 2020 at 3:31 PM Andres Freund <andres@anarazel.de> wrote:
    > > The warnings don't seem too unreasonable. The compiler can't see that
    > > the error_cb inside json_manifest_parse_failure() is not expected to
    > > return. Probably worth adding a wrapper around the calls to
    > > context->error_cb and mark that as noreturn.
    > 
    > Eh, how? The callback is declared as:
    > 
    > typedef void (*json_manifest_error_callback)(JsonManifestParseContext *,
    >                                              char *fmt, ...) pg_attribute_printf(2, 3);
    > 
    > I don't know of a way to create a wrapper around that, because of the
    > variable argument list.
    
    Didn't think that far...
    
    
    > We could change the callback to take va_list, I guess.
    
    I'd argue that that'd be a good idea anyway, otherwise there's no way to
    wrap the invocation anywhere in the code. But that's an independent
    consideration, as:
    
    > Does it work for you to just add pg_attribute_noreturn() to this
    > typedef, as in the attached?
    
    does fix the problem for me, cool.
    
    Do you not see a warning when compiling with optimizations enabled?
    
    Greetings,
    
    Andres Freund
    
    
    
    
  312. Re: backup manifests

    Robert Haas <robertmhaas@gmail.com> — 2020-04-24T12:03:15Z

    On Thu, Apr 23, 2020 at 5:16 PM Andres Freund <andres@anarazel.de> wrote:
    > Do you not see a warning when compiling with optimizations enabled?
    
    No, I don't. I tried it with -O{0,1,2,3} and I always use -Wall
    -Werror. No warnings.
    
    [rhaas pgsql]$ clang -v
    clang version 5.0.2 (tags/RELEASE_502/final)
    Target: x86_64-apple-darwin19.4.0
    Thread model: posix
    InstalledDir: /opt/local/libexec/llvm-5.0/bin
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  313. Re: documenting the backup manifest file format

    Fujii Masao <masao.fujii@oss.nttdata.com> — 2020-05-15T06:10:38Z

    
    On 2020/04/15 5:33, Andrew Dunstan wrote:
    > 
    > On 4/14/20 4:09 PM, Alvaro Herrera wrote:
    >> On 2020-Apr-14, Andrew Dunstan wrote:
    >>
    >>> OK, but I think if we're putting a timestamp string in ISO-8601 format
    >>> in the manifest it should be in UTC / Zulu time, precisely to avoid
    >>> these issues. If that's too much trouble then yes an epoch time will
    >>> probably do.
    >> The timestamp is always specified and always UTC (except the code calls
    >> it GMT).
    >>
    >> +   /*
    >> +    * Convert last modification time to a string and append it to the
    >> +    * manifest. Since it's not clear what time zone to use and since time
    >> +    * zone definitions can change, possibly causing confusion, use GMT
    >> +    * always.
    >> +    */
    >> +   appendStringInfoString(&buf, "\"Last-Modified\": \"");
    >> +   enlargeStringInfo(&buf, 128);
    >> +   buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%d %H:%M:%S %Z",
    >> +                          pg_gmtime(&mtime));
    >> +   appendStringInfoString(&buf, "\"");
    >>
    >> I was merely saying that it's trivial to make this iso-8601 compliant as
    >>
    >>      buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%dT%H:%M:%SZ",
    >>
    >> ie. omit the "GMT" string and replace it with a literal Z, and remove
    >> the space and replace it with a T.
    
    I have one question related to this; Why don't we use log_timezone,
    like backup_label? log_timezone is used for "START TIME" field in
    backup_label. Sorry if this was already discussed.
    
    		/* Use the log timezone here, not the session timezone */
    		stamp_time = (pg_time_t) time(NULL);
    		pg_strftime(strfbuf, sizeof(strfbuf),
    					"%Y-%m-%d %H:%M:%S %Z",
    					pg_localtime(&stamp_time, log_timezone));
    
    OTOH, *if* we want to use the same timezone for backup-related files because
    backup can be used in different environments and timezone setting
    may be different there or for other reasons, backup_label also should use
    GMT or something for the sake of consistency?
    
    Regards,
    
    -- 
    Fujii Masao
    Advanced Computing Technology Center
    Research and Development Headquarters
    NTT DATA CORPORATION
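
    The ISO-8601 variant quoted above (a literal "T" between date and time,
    a literal "Z" instead of the zone name) can be sketched in standard C as
    follows. This is only an illustration: it uses the C library's gmtime()
    and strftime() rather than PostgreSQL's pg_gmtime() and pg_strftime(),
    and format_iso8601_utc is a hypothetical helper name, not something from
    the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Format a Unix epoch time as an ISO-8601 UTC string such as
 * "1970-01-01T00:00:00Z".
 *
 * Returns the number of bytes written (excluding the terminating NUL),
 * or 0 if the conversion failed or the buffer was too small.
 *
 * Hypothetical helper for illustration; not part of PostgreSQL.
 */
static size_t
format_iso8601_utc(time_t t, char *buf, size_t buflen)
{
    struct tm *tm;

    assert(buf != NULL);

    /* gmtime() always yields UTC, independent of the TZ setting. */
    tm = gmtime(&t);
    if (tm == NULL)
        return 0;

    /* Literal 'T' and 'Z' instead of a space and the %Z zone name. */
    return strftime(buf, buflen, "%Y-%m-%dT%H:%M:%SZ", tm);
}
```

    Because the zone is a literal "Z" rather than %Z, the output does not
    depend on how the platform spells the zone name. For example, epoch
    1589523038 would be rendered as "2020-05-15T06:10:38Z".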
    
    
    
    
  314. Re: documenting the backup manifest file format

    Robert Haas <robertmhaas@gmail.com> — 2020-05-15T13:14:32Z

    On Fri, May 15, 2020 at 2:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
    > I have one question related to this; Why don't we use log_timezone,
    > like backup_label? log_timezone is used for "START TIME" field in
    > backup_label. Sorry if this was already discussed.
    >
    >                 /* Use the log timezone here, not the session timezone */
    >                 stamp_time = (pg_time_t) time(NULL);
    >                 pg_strftime(strfbuf, sizeof(strfbuf),
    >                                         "%Y-%m-%d %H:%M:%S %Z",
    >                                         pg_localtime(&stamp_time, log_timezone));
    >
    > OTOH, *if* we want to use the same timezone for backup-related files because
    > backup can be used in different environments and timezone setting
    > may be different there or for other reasons, backup_label also should use
    > GMT or something for the sake of consistency?
    
    It's a good question. My inclination was to think that GMT would be
    the clearest thing, but I also didn't realize that the result would
    thus be inconsistent with backup_label. Not sure what's best here.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
    
  315. Re: documenting the backup manifest file format

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-05-15T13:34:34Z

    Robert Haas <robertmhaas@gmail.com> writes:
    > It's a good question. My inclination was to think that GMT would be
    > the clearest thing, but I also didn't realize that the result would
    > thus be inconsistent with backup_label. Not sure what's best here.
    
    I vote for following the backup_label precedent; that's stood for quite
    some years now.
    
    			regards, tom lane
    
    
    
    
  316. Re: documenting the backup manifest file format

    David Steele <david@pgmasters.net> — 2020-05-15T14:06:52Z

    On 5/15/20 9:34 AM, Tom Lane wrote:
    > Robert Haas <robertmhaas@gmail.com> writes:
    >> It's a good question. My inclination was to think that GMT would be
    >> the clearest thing, but I also didn't realize that the result would
    >> thus be inconsistent with backup_label. Not sure what's best here.
    > 
    > I vote for following the backup_label precedent; that's stood for quite
    > some years now.
    
    I'd rather keep it GMT. The timestamps in the backup label are purely 
    informational, but the timestamps in the manifest are useful, e.g. to 
    set the mtime on a restore to the original value.
    
    Forcing the user to do timezone conversions is prone to error. Some 
    languages, like C, simply aren't good at it.
    
    Of course, my actual preference is to use epoch time which is easy to 
    work with and eliminates the possibility of conversion errors. It is 
    also compact.
    
    Regards,
    -- 
    -David
    david@pgmasters.net
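
    To illustrate the conversion a restore tool would need if it wanted to
    set mtime from a GMT timestamp string, here is a minimal sketch in C.
    It assumes the "%Y-%m-%d %H:%M:%S %Z" layout shown earlier in the thread
    with the zone spelled "GMT", and it avoids the non-standard timegm() by
    computing the day count directly. parse_gmt_timestamp and
    days_from_civil are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Days between 1970-01-01 and the given proleptic Gregorian date.
 * Shifts the year so that it starts in March, which puts the leap day
 * at the end of the year and makes the month arithmetic uniform.
 */
static long long
days_from_civil(long long y, int m, int d)
{
    long long era, yoe, doy, doe;

    y -= (m <= 2);
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = y - era * 400;                                    /* [0, 399] */
    doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  /* [0, 365] */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/*
 * Parse "YYYY-MM-DD HH:MM:SS GMT" into seconds since the Unix epoch.
 * Returns 0 on success, -1 if the string does not match or the zone is
 * not "GMT". Only coarse range checks are done; this is a sketch.
 */
static int
parse_gmt_timestamp(const char *s, long long *epoch)
{
    int  year, mon, day, hour, min, sec;
    char zone[4];

    assert(s != NULL && epoch != NULL);

    if (sscanf(s, "%d-%d-%d %d:%d:%d %3s",
               &year, &mon, &day, &hour, &min, &sec, zone) != 7)
        return -1;
    if (strcmp(zone, "GMT") != 0)
        return -1;
    if (mon < 1 || mon > 12 || day < 1 || day > 31 ||
        hour < 0 || hour > 23 || min < 0 || min > 59 ||
        sec < 0 || sec > 60)
        return -1;

    *epoch = days_from_civil(year, mon, day) * 86400LL
        + hour * 3600 + min * 60 + sec;
    return 0;
}
```

    With an epoch value in the manifest, none of this would be needed: the
    reader would parse a single integer. With a local-time zone instead of
    GMT, the reader would additionally have to resolve the zone name to an
    offset, which this sketch does not attempt.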
    
    
    
    
  317. Re: documenting the backup manifest file format

    Tom Lane <tgl@sss.pgh.pa.us> — 2020-05-15T14:17:19Z

    David Steele <david@pgmasters.net> writes:
    > On 5/15/20 9:34 AM, Tom Lane wrote:
    >> I vote for following the backup_label precedent; that's stood for quite
    >> some years now.
    
    > Of course, my actual preference is to use epoch time which is easy to 
    > work with and eliminates the possibility of conversion errors. It is 
    > also compact.
    
    Well, if we did that then it'd be sufficiently different from the backup
    label as to remove any risk of confusion.  But "easy to work with" is in
    the eye of the beholder; do we really want a format that's basically
    unreadable to the naked eye?
    
    			regards, tom lane
    
    
    
    
  318. Re: documenting the backup manifest file format

    David Steele <david@pgmasters.net> — 2020-05-15T15:05:02Z

    On 5/15/20 10:17 AM, Tom Lane wrote:
    > David Steele <david@pgmasters.net> writes:
    >> On 5/15/20 9:34 AM, Tom Lane wrote:
    >>> I vote for following the backup_label precedent; that's stood for quite
    >>> some years now.
    > 
    >> Of course, my actual preference is to use epoch time which is easy to
    >> work with and eliminates the possibility of conversion errors. It is
    >> also compact.
    > 
    > Well, if we did that then it'd be sufficiently different from the backup
    > label as to remove any risk of confusion.  But "easy to work with" is in
    > the eye of the beholder; do we really want a format that's basically
    > unreadable to the naked eye?
    
    Well, I lost this argument before so it seems I'm in the minority on 
    easy-to-use. We use epoch time in the pgBackRest manifests which has 
    been easy to deal with in both C and Perl, so experience tells me it 
    really is easy, at least for programs.
    
    The manifest (to me, at least) is generally intended to be 
    machine-processed. For instance, it contains checksums which are not all 
    that useful unless they are checked programmatically -- they can't just 
    be eye-balled.
    
    Regards,
    -- 
    -David
    david@pgmasters.net