Re: backup manifests
Rushabh Lathia <rushabh.lathia@gmail.com>
Commits
- 05021a2c0cd2 (13.0, landed): Try to avoid compiler warnings in optimized builds.
- 0a89e93bfaa6 (13.0, landed): Fix option related issues in pg_verifybackup.
- 4db819ba4039 (13.0, landed): Add index term for backup manifest in documentation.
- a2ac73e7be7a (13.0, landed): Code review for backup manifest.
- 149f2ae88ab0 (13.0, landed): Document the backup manifest file format.
- c4f82a779d26 (13.0, landed): Fix typo in pg_validatebackup documentation.
- 1ec50a81ec0a (13.0, landed): Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- c3e4cbaab936 (13.0, landed): Msys2 tweaks for pg_validatebackup corruption test
- 3e0d80fd8d3d (13.0, cited): Fix resource management bug with replication=database.
- db1531cae009 (13.0, cited): Be more careful about time_t vs. pg_time_t in basebackup.c.
- 9f8f881caa0f (13.0, landed): pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 460314db08e8 (13.0, landed): pg_validatebackup: Also use perl2host in TAP tests.
- 0d8c9c1210c4 (13.0, landed): Generate backup manifests for base backups, and validate them.
- c12e43a2e0d4 (13.0, landed): Add checksum helper functions.
- ac44367efbef (13.0, landed): pg_waldump: Add a --quiet option.
- afb5465e0cfc (13.0, cited): Catversion bump for b9b408c48724
- 431ba7bebf13 (13.0, landed): pg_basebackup: Refactor code for reading COPY and tar data.
- 3cb646264e8c (12.0, cited): Use a ResourceOwner to track buffer pins in all cases.
- f044d71e331d (11.0, cited): Use ARMv8 CRC instructions where available.
- 7c4f52409a8c (10.0, cited): Logical replication support for initial data copy
- 3dc2d62d0486 (9.5.0, cited): Use Intel SSE 4.2 CRC instructions where available.
- 5028f22f6eb0 (9.5.0, cited): Switch to CRC-32C in WAL and other places.
- 404bc51cde9d (9.5.0, cited): Remove support for 64-bit CRC.
- 21fda22ec46d (8.1.0, cited): Change CRCs in WAL records from 64bit to 32bit for performance reasons.
Attachments
- 0006-checksum-algo-option.patch (text/x-patch) patch 0006
As per the discussion on the thread, here is a patch which:
a) Makes the checksum for the manifest file optional.
b) Allows the user to choose a particular algorithm.
Currently the WIP patch supports the SHA256 and CRC checksum
algorithms. The patch also changes the manifest file format to put
the name of the algorithm used before the checksum, so that the
validator can easily tell which algorithm to use.
Ex:
./db/bin/pg_basebackup -D bksha/ --manifest-with-checksums=SHA256
$ cat bksha/backup_manifest | more
PostgreSQL-Backup-Manifest-Version 1
File backup_label 226 2019-12-04 17:46:46 GMT SHA256:7cf53d1b9facca908678ab70d93a9e7460cd35cedf7891de948dcf858f8a281a
File pg_xact/0000 8192 2019-12-04 17:46:46 GMT SHA256:8d2b6cb1dc1a6e8cee763b52d75e73571fddce06eb573861d44082c7d8c03c26
./db/bin/pg_basebackup -D bkcrc/ --manifest-with-checksums=CRC
PostgreSQL-Backup-Manifest-Version 1
File backup_label 226 2019-12-04 17:58:40 GMT CRC:343138313931333134
File pg_xact/0000 8192 2019-12-04 17:46:46 GMT CRC:363538343433333133
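To illustrate how a validator could pick the algorithm out of such an entry, here is a minimal sketch in Python. It assumes the WIP layout shown in the examples above: the word File, a path without spaces, a size, a three-field timestamp ending in GMT, and then an optional ALGORITHM:hex field. The function name parse_manifest_line and the returned dictionary are hypothetical and are not taken from the patch.

```python
def parse_manifest_line(line):
    """Split one 'File ...' manifest entry into its fields.

    Assumes the WIP layout from the examples above; paths containing
    whitespace are not handled by this sketch.
    """
    fields = line.split()
    if len(fields) < 6 or fields[0] != "File":
        raise ValueError("not a File entry: %r" % line)

    entry = {
        "path": fields[1],
        "size": int(fields[2]),
        "mtime": " ".join(fields[3:6]),  # date, time, "GMT"
        "algorithm": None,               # stays None when checksums are off
        "checksum": None,
    }
    if len(fields) > 6:
        # The algorithm name comes first, separated from the hex digest
        # by a colon, e.g. "SHA256:7cf5..." or "CRC:3431...".
        algorithm, _, checksum = fields[6].partition(":")
        entry["algorithm"] = algorithm
        entry["checksum"] = checksum
    return entry


if __name__ == "__main__":
    sample = "File backup_label 226 2019-12-04 17:58:40 GMT CRC:343138313931333134"
    print(parse_manifest_line(sample))
```

Because the algorithm name travels with each checksum, a validator written this way would not need to guess the algorithm from the digest length.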
Pending TODOs:
- Documentation update
- Code cleanup
- Testing.
I will continue to work on the patch; meanwhile, feel free to
provide thoughts/input.
Thanks,
On Mon, Nov 25, 2019 at 11:13 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Nov 22, 2019 at 5:15 PM Tels <nospam-pg-abuse@bloodgate.com> wrote:
> > It is related to the number of states...
>
> Thanks for this explanation. See my reply to David where I also
> discuss this point.
>
> > However, if you choose a hash, please do not go below SHA-256. Both MD5
> > and SHA-1 have already had collision attacks, and these are bound to
> > only get worse.
> >
> > https://www.mscs.dal.ca/~selinger/md5collision/
> > https://shattered.io/
>
> Yikes, that second link, about SHA-1, is depressing. Now, it's not
> likely that an attacker has access to your backup repository and can
> spend 6500 years of CPU time to engineer a Trojan file there (maybe
> more, because the files are probably bigger than the PDFs they used in
> that case) and then induce you to restore and rely upon that backup.
> However, it's entirely likely that somebody is going to eventually ban
> SHA-1 as the attacks get better, which is going to be a problem for us
> whether the underlying exposures are problems or not.
>
> > It might even be a wise idea to encode the Hash-Algorithm used into the
> > manifest file, so it can be changed later. The hash length might not be
> > enough to decide which algorithm is the one used.
>
> I agree. Let's write
> SHA256:bc1c3a57369acd0d2183a927fb2e07acbbb1c97f317bbc3b39d93ec65b754af5
> or similar rather than just the hash. That way even if the entire SHA
> family gets cracked, we can easily substitute in something else that
> hasn't been cracked yet.
>
> (It is unclear to me why anyone supposes that *any* popular hash
> function won't eventually be cracked. For a K-bit hash function, there
> are 2^K possible outputs, where K is probably in the hundreds. But
> there are 2^{2^33} possible 1GB files. So for every possible output
> value, there are 2^{2^33-K} inputs that produce that value, which is a
> very very big number. The probability that any given input produces a
> certain output is very low, but the number of possible inputs that
> produce a given output is very high; so assuming that nobody's ever
> going to figure out how to construct them seems optimistic.)
>
> > To get a feeling one can use:
> >
> > openssl speed md5 sha1 sha256 sha512
> >
> > On my really-not-fast desktop CPU (i5-4690T CPU @ 2.50GHz) it says:
> >
> > The 'numbers' are in 1000s of bytes per second processed.
> > type       16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
> > md5      122638.55k  277023.96k  487725.57k  630806.19k  683892.74k  688553.98k
> > sha1     127226.45k  313891.52k  632510.55k  865753.43k  960995.33k  977215.19k
> > sha256    77611.02k  173368.15k  325460.99k  412633.43k  447022.92k  448020.48k
> > sha512    51164.77k  205189.87k  361345.79k  543883.26k  638372.52k  645933.74k
> >
> > Or in other words, it can hash nearly 931 MByte/s with SHA-1 and about
> > 427 MByte/s with SHA-256 (if I haven't miscalculated something). You'd
> > need a pretty fast disk (aka M.2 SSD) and network (aka > 1 Gbit) to top
> > these speeds, and then you'd use a real CPU for your server, not some
> > poor Intel powersaving surfing thingy-majingy :)
>
> I mean, how fast it is in theory doesn't matter nearly as much as what
> happens when you benchmark the proposed implementation, and the
> results we have so far don't support the theory that this is so cheap
> as to be negligible.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
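For reference, the throughput arithmetic in the quoted benchmark can be reproduced with a short Python sketch. It assumes that "MByte" in the quoted text means 2^20 bytes and that the openssl speed figures are in units of 1000 bytes per second, as the quoted output states. The function name k_to_mib_per_s is hypothetical.

```python
def k_to_mib_per_s(value_k):
    """Convert an 'openssl speed' figure (1000s of bytes/s) to MiB/s.

    Assumption: 1 MByte is taken as 2**20 bytes.
    """
    return value_k * 1000 / (1024 * 1024)


# 16384-byte block figures from the quoted table.
sha1_mib = k_to_mib_per_s(977215.19)
sha256_mib = k_to_mib_per_s(448020.48)

if __name__ == "__main__":
    print("SHA-1:   %.1f MiB/s" % sha1_mib)
    print("SHA-256: %.1f MiB/s" % sha256_mib)
```

Under those assumptions the results truncate to 931 and 427, matching the quoted numbers. This only checks the unit conversion; it says nothing about the cost measured when benchmarking the actual patch.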
--
Rushabh Lathia