Thread

  1. Re: pg_waldump: support decoding of WAL inside tarfile

    Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-09-08T13:37:02Z

    On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul@gmail.com> wrote:
    >
    [..patch]
    
    Hi Amul!
    
    0001: LGTM, maybe I would just slightly enhance the commit message
    ("This is in preparation for adding a second source file to this
    directory.") -- maye bit a bit more verbose or use a message from
    0002?
    0002: LGTM
    0003: LGTM
    
    Tested here (after partial patch apply, and test suite did work fine).
    
    0004:
    
        a.  Why should it be necessary to provide startLSN (-s) ? Couldn't
    it autodetect the first WAL (tar file) inside and just use that with
    some info message?
        $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar
        pg_waldump: error: no start WAL location given
    
        b. Why would it like to open "blah" dir if I wanted that "blah"
    segment from the archive? Shouldn't it tell that it was looking in the
    archive and couldn find it inside?
        $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar blah
        pg_waldump: error: could not open file "blah": Not a directory
    
        c. It doesnt work when using SEGSTART, but it's there:
        $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar
    000000010000000000000059
        pg_waldump: error: could not open file "000000010000000000000059":
    Not a directory
        $ tar tf /tmp/base/pg_wal.tar | head -1
        000000010000000000000059
    
        d.     I've later noticed that follow-up patches seem to use the
    -s switch and there it seems to work OK. The above SEGSTART issue was
    not detected, probably because tests need to be extended cover  of
    segment name rather than just --start LSN (see test_pg_waldump):
        $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats
    -s 0/59000358
        pg_waldump: first record is after 0/59000358, at 0/590003E8,
    skipping over 144 bytes
        WAL statistics between 0/590003E8 and 0/61000000:
        [..]
    
        e. Code around`if (walpath == NULL && directory != NULL)` needs
    some comments.
    
        f. Code around `if (fname != NULL && is_tar_file(fname,
    &compression))` , so if fname is WAL segment here
    (00000001000000000000005A) and we do check again if that has been
    tar-ed (is_tar_file())? Why?
    
        g. Just a question: the commit message says `Note that this patch
    requires that the WAL files within the archive be in sequential order;
    an error will be reported otherwise`. I'm wondering if such
    occurrences are known to be happening in the wild? Or is it just an
    assumption that if someone would modify the tar somehow? (either way
    we could just add a reason why we need to handle such a case if we
    know -- is manual alternation the only source of such state?). For the
    record, I've tested crafting custom archives with out of sequence WAL
    archives and the code seems to work (it was done using: tar --append
    -f pg_wal.tar --format=ustar ..)
    
        h. Anyway, in case of typo/wrong LSN, 0004 emits wrong error
    message I think:
    
        $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats
    -s 0/50000358
        pg_waldump: error: WAL files are not archived in sequential order
        pg_waldump: detail: Expecting segment number 80 but found 89.
    
        it's just that the 50000358 LSN above is below the minimal LSN
    present in the WAL segments (first segment is 000000010000000000000059
    there, i've just intentionally provided a bad value 50.. as a typo and
    it causes the wrong message). Now it might not be an issue as with
    0005 patch the same test behaves OK (`pg_waldump: error: could not
    find a valid record after 0/50000358`). It is just relevant if this
    would be committed not all at once.
    
        i. If I give wrong --timeline=999 to pg_waldump it fails with
    misleading error message: could not read WAL data from "pg_wal.tar"
    archive: read -1 of 8192
    
    0005:
        a. I'm wondering if we shouldn't log (to stderr?) some kind of
    notification message (just once) that non-sequential WAL files were
    discovered and that pg_waldump is starting to write to $somewhere as
    it may be causing bigger I/O than anticipated when running the
    command. This can easily help when troubleshooting why it is not fast,
    and also having set TMPDIR to usually /tmp can be slow or too small.
    
        b. IMHO member_prepare_tmp_write() / get_tmp_wal_file_path() with
    TMPDIR can be prone to symlink attack. Consider setting TMPDIR=/tmp .
    We are writing to e.g. /tmp/<WALsegment>.waldump.tmp in 0004 , but
    that path is completely guessable. If an attacker prepares some
    symlinks and links those to some other places, I think the code will
    happily open and overwrite the contents of the rogue symlink. I think
    using mkstemp(3)/tmpfile(3) would be a safer choice if TMPDIR needs to
    be in play. Consider that pg_waldump can be run as root (there's no
    mechanism preventing it from being used that way).
    
        c. IMHO that unlink() might be not guaranteed to always remove
    files, as in case of any trouble and exit() , those files might be
    left over. I think we need some atexit() handlers. This can be
    triggered with combo of options of nonsequential files in tar + wrong
    LSN given:
    
        $ tar tf pg_wal.tar
        00000001000000000000005A
        00000001000000000000005B
        00000001000000000000005C
        [..]
        000000010000000000000060
        000000010000000000000059 <-- out of order, appended last
        $ ls -lh 0*
        ls: cannot access '0*': No such file or directory
        $ /usr/pgsql19/bin/pg_waldump --path=/tmp/ble/pg_wal.tar --stats
    -s 0/10000358 #wrong LSN
        pg_waldump: error: could not find a valid record after 0/10000358
        $ ls -lh 0*
        -rw------- 1 postgres postgres 16M Sep  8 14:44
    000000010000000000000059.waldump.tmp
        -rw------- 1 postgres postgres 16M Sep  8 14:44
    00000001000000000000005A.waldump.tmp
        [..]
    
    0006: LGTM
    
    0007:
        a. Commit message says `Future patches to pg_waldump will enable
    it to decode WAL directly` , but those pg_waldump are earlier patches,
    right?
    
        b. pg_verifybackup should print some info with --progress that it
    is spawning pg_waldump (pg_verifybackup --progress mode does not
    display anything related to verifing WALs, but it could)
    
        c. I'm wondering, but pg_waldump seems to be not complaining if
    --end=LSN is made into such a future that it doesn't exist. E.g. If
    the latest WAL segment is 60 (with end LSN 0/60A77A59), but I run
    pg_waldump `--end=0/7000000` , it will return code 0 and nothing on
    stderr. So how sure are we that the necessary WAL segments (as per
    backup_manifest) are actually inside the tar? It's supposed to be
    verified, but it isn't for this use case? Same happens if craft
    special tar and remove just one WAL segment from pg_wal.tar (simulate
    missing WAL segment), but ask the pg_verifybackup/pg_waldump to verify
    it to exact last LSN sequence, e.g.:
    
        $ /usr/pgsql19/bin/pg_waldump --quiet
    --path=/tmp/missing/pg_wal.tar --timeline=1 --start=0/59000028
    --end=0/60A77A58 && echo OK # but it is not OK
        OK
        $ /usr/pgsql19/bin/pg_waldump --stats
    --path=/tmp/missing/pg_wal.tar --timeline=1 --start=0/59000028
    --end=0/60A77A58
        WAL statistics between 0/59000028 and 0/5CFFFFD0: # <-- 0/5C LSN
    maximum detected
        [..]
    
        Notice it has read till 0/5C (but I've asked till 0/60), because
    I've removed 0D:
        $ tar tf /tmp/missing/pg_wal.tar| grep ^0
        000000010000000000000059
        00000001000000000000005A
        00000001000000000000005B
        00000001000000000000005C
        00000001000000000000005E <-- missing 5D
    
        Yet it reported no errors.
    
    0008:
        LGTM
    
    Another open question I have is this: shouldn't backup_manifest come
    with CRC checksum for the archived WALs? Or does that guarantee that
    backup_manifest WAL-Ranges are present in pg_wal.tar is good enough
    because individual WAL files are CRC-protected itself?
    
    -J.