Re: backup manifests
Robert Haas <robertmhaas@gmail.com>
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Try to avoid compiler warnings in optimized builds.
- 05021a2c0cd2 13.0 landed
-
Fix option related issues in pg_verifybackup.
- 0a89e93bfaa6 13.0 landed
-
Add index term for backup manifest in documentation.
- 4db819ba4039 13.0 landed
-
Code review for backup manifest.
- a2ac73e7be7a 13.0 landed
-
Document the backup manifest file format.
- 149f2ae88ab0 13.0 landed
-
Fix typo in pg_validatebackup documentation.
- c4f82a779d26 13.0 landed
-
Exclude backup_manifest file that existed in database, from BASE_BACKUP.
- 1ec50a81ec0a 13.0 landed
-
Msys2 tweaks for pg_validatebackup corruption test
- c3e4cbaab936 13.0 landed
-
Fix resource management bug with replication=database.
- 3e0d80fd8d3d 13.0 cited
-
Be more careful about time_t vs. pg_time_t in basebackup.c.
- db1531cae009 13.0 cited
-
pg_validatebackup: Fix 'make clean' to remove tmp_check.
- 9f8f881caa0f 13.0 landed
-
pg_validatebackup: Also use perl2host in TAP tests.
- 460314db08e8 13.0 landed
-
Generate backup manifests for base backups, and validate them.
- 0d8c9c1210c4 13.0 landed
-
Add checksum helper functions.
- c12e43a2e0d4 13.0 landed
-
pg_waldump: Add a --quiet option.
- ac44367efbef 13.0 landed
-
Catversion bump for b9b408c48724
- afb5465e0cfc 13.0 cited
-
pg_basebackup: Refactor code for reading COPY and tar data.
- 431ba7bebf13 13.0 landed
-
Use a ResourceOwner to track buffer pins in all cases.
- 3cb646264e8c 12.0 cited
-
Use ARMv8 CRC instructions where available.
- f044d71e331d 11.0 cited
-
Logical replication support for initial data copy
- 7c4f52409a8c 10.0 cited
-
Use Intel SSE 4.2 CRC instructions where available.
- 3dc2d62d0486 9.5.0 cited
-
Switch to CRC-32C in WAL and other places.
- 5028f22f6eb0 9.5.0 cited
-
Remove support for 64-bit CRC.
- 404bc51cde9d 9.5.0 cited
-
Change CRCs in WAL records from 64bit to 32bit for performance reasons.
- 21fda22ec46d 8.1.0 cited
Attachments
- v16-0001-Add-checksum-helper-functions.patch (application/octet-stream) patch v16-0001
- v16-0002-Generate-backup-manifests-for-base-backups-and-v.patch (application/octet-stream) patch v16-0002
- v16-0004-WIP-Store-WAL-ranges-in-manifest-and-validate-th.patch (application/octet-stream) patch v16-0004
- v16-0003-pg_waldump-Add-quiet-option.patch (application/octet-stream) patch v16-0003
On Mon, Mar 30, 2020 at 2:59 PM Andres Freund <andres@anarazel.de> wrote:
> I think it wouldn't be too hard to compute that information while taking
> the base backup. We know the end timeline (ThisTimeLineID), so we can
> just call readTimeLineHistory(ThisTimeLineID). Which should then allow
> for something pretty trivial along the lines of
>
> timelines = readTimeLineHistory(ThisTimeLineID);
> last_start = InvalidXLogRecPtr;
> foreach(lc, timelines)
> {
> TimeLineHistoryEntry *he = lfirst(lc);
>
> if (he->end < startptr)
> continue;
>
> //
> manifest_emit_wal_range(Min(he->begin, startptr), he->end);
> last_start = he->end;
> }
>
> if (last_start == InvalidXlogRecPtr)
> start = startptr;
> else
> start = last_start;
>
> manifest_emit_wal_range(start, entptr);
I made an attempt to implement this. In the attached patch set, 0001
and 0002 are (I think) unmodified from the last version. 0003 is a
slightly-rejiggered version of your new pg_waldump option. 0004 whacks
0002 around so that the WAL ranges are included in the manifest and
pg_validatebackup tries to run pg_waldump for each WAL range. It
appears to work in light testing, but I haven't yet (1) tested it
extensively, (2) written good regression tests for it above and beyond
what pg_validatebackup had already, or (3) updated the documentation.
I'm going to work on those things. I would appreciate *very timely*
feedback on anything people do or do not like about this, because I
want to commit this patch set by the end of the work week and that
isn't very far away. I would also appreciate if people would bear in
mind the principle that half a loaf is better than none, and further
improvements can be made in future releases.
As part of my light testing, I tried promoting a standby that was
running pg_basebackup, and found that pg_basebackup failed like this:
pg_basebackup: error: could not get COPY data stream: ERROR: the
standby was promoted during online backup
HINT: This means that the backup being taken is corrupt and should
not be used. Try taking another online backup.
pg_basebackup: removing data directory "/Users/rhaas/pgslave2"
My first thought was that this error message is hard to reconcile with
this comment:
/*
* Send timeline history files too. Only the latest timeline history
* file is required for recovery, and even that only if there happens
* to be a timeline switch in the first WAL segment that contains the
* checkpoint record, or if we're taking a base backup from a standby
* server and the target timeline changes while the backup is taken.
* But they are small and highly useful for debugging purposes, so
* better include them all, always.
*/
But then it occurred to me that this might be a cascading standby.
Maybe the original master died and this machine's master got promoted,
so it has to follow a timeline switch but doesn't itself get promoted.
I think I might try to test out that scenario and see what happens,
but I haven't done so as of this writing. Regardless, it seems like a
really good idea to store a list of WAL ranges rather than a single
start/end/timeline, because even if it's impossible today it might
become possible in the future. Still, unless there's an easy way to
set up a test scenario where multiple WAL ranges need to be verified,
it may be hard to test that this code actually behaves properly.
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company