Re: backup manifests

Andres Freund <andres@anarazel.de>

From: Andres Freund <andres@anarazel.de>

To: Robert Haas <robertmhaas@gmail.com>

Cc: Noah Misch <noah@leadboat.com>, David Steele <david@pgmasters.net>, Stephen Frost <sfrost@snowman.net>, Amit Kapila <amit.kapila16@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>, tushar <tushar.ahuja@enterprisedb.com>, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>

Date: 2020-03-30T01:07:40Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited

Hi,

On 2020-03-29 20:42:35 -0400, Robert Haas wrote:
> > What do you think of having the verification process also call pg_waldump to
> > validate the WAL CRCs (shown upthread)?  That looked helpful and simple.
> 
> I don't love calls to external binaries, but I think the thing that
> really bothers me is that pg_waldump is practically bound to terminate
> with an error, because the last WAL segment will end with a partial
> record.

I don't think that's the case here. You should know the last required
record, which should allow to specify the precise end for pg_waldump. If
it errors out reading to that point, we'd be in trouble.

> For the same reason, I think there's really no such thing as
> validating a single WAL file. I suppose you'd need to know the exact
> start and end locations for a minimal WAL replay and check that all
> records between those LSNs appear OK, ignoring any apparent problems
> after the minimum ending point, or at least ignoring any problems due
> to an incomplete record in the last file. We don't have a tool for
> that currently, and I don't think I can write one this week. Or at
> least, not a good one.

pg_waldump -s / -e?

> > > +             parse->pathname = palloc(raw_length + 1);
> >
> > I don't see this freed anywhere; is it?  (It's useful to make peak memory
> > consumption not grow in proportion to the number of files backed up.)
> 
> We need the hash table to remain populated for the whole run time of
> the tool, because we're essentially doing a full join of the actual
> directory contents against the manifest contents. That's a bit
> unfortunate but it doesn't seem simple to improve. I think the only
> people who are really going to suffer are people who have an enormous
> pile of empty or nearly-empty relations. People who have large
> databases for the normal reason - i.e. a reasonable number of tables
> that hold a lot of data - will have manifests of very manageable size.

Given that that's a pre-existing issue - at a significantly larger scale
imo - e.g. for pg_dump (even in the --schema-only case), and that there
are tons of backend side issues with lots of relations too, I think
that's fine.

You could of course implement something merge-join like, and implement
the sorted input via a disk base sort. But that's a lot of work (good
luck making tuplesort work in the frontend...). So I'd not go there
unless there's a lot of evidence this is a serious practical issue.

If we find this use too much memory, I think we'd be better off
condensing pathnames into either fewer allocations, or a RelFileNode as
part of the struct (with a fallback to string for other types of
files). But I'd also not go there for now.

Greetings,

Andres Freund