Re: backup manifests

From: tushar <tushar.ahuja@enterprisedb.com>
To: Robert Haas <robertmhaas@gmail.com>, Suraj Kharage <suraj.kharage@enterprisedb.com>
Cc: Rushabh Lathia <rushabh.lathia@gmail.com>, Tels <nospam-pg-abuse@bloodgate.com>, David Steele <david@pgmasters.net>, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>, Jeevan Chalke <jeevan.chalke@enterprisedb.com>, vignesh C <vignesh21@gmail.com>
Date: 2020-03-04T13:51:03Z
Lists: pgsql-hackers

Commits

  1. Try to avoid compiler warnings in optimized builds.

  2. Fix option related issues in pg_verifybackup.

  3. Add index term for backup manifest in documentation.

  4. Code review for backup manifest.

  5. Document the backup manifest file format.

  6. Fix typo in pg_validatebackup documentation.

  7. Exclude backup_manifest file that existed in database, from BASE_BACKUP.

  8. Msys2 tweaks for pg_validatebackup corruption test

  9. Fix resource management bug with replication=database.

  10. Be more careful about time_t vs. pg_time_t in basebackup.c.

  11. pg_validatebackup: Fix 'make clean' to remove tmp_check.

  12. pg_validatebackup: Also use perl2host in TAP tests.

  13. Generate backup manifests for base backups, and validate them.

  14. Add checksum helper functions.

  15. pg_waldump: Add a --quiet option.

  16. Catversion bump for b9b408c48724

  17. pg_basebackup: Refactor code for reading COPY and tar data.

  18. Use a ResourceOwner to track buffer pins in all cases.

  19. Use ARMv8 CRC instructions where available.

  20. Logical replication support for initial data copy

  21. Use Intel SSE 4.2 CRC instructions where available.

  22. Switch to CRC-32C in WAL and other places.

  23. Remove support for 64-bit CRC.

  24. Change CRCs in WAL records from 64bit to 32bit for performance reasons.

Hi,

There is a scenario in which, if I add a file inside the tablespace
directory, I get an error like this:

pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error: "pg_tblspc/16385/PG_13_202002271/test" is present on disk but not in the manifest

But if I remove the 'PG_13_202002271' directory itself, no error is reported:

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: backup successfully verified

Steps to reproduce:
--connect with psql and create a tablespace
postgres=# \! mkdir /tmp/my_tblspc
postgres=# create tablespace tbs location '/tmp/my_tblspc';
CREATE TABLESPACE
postgres=# \q

--run pg_basebackup
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D data_dir -T /tmp/my_tblspc/=/tmp/new_my_tblspc
[centos@tushar-ldap-docker bin]$
[centos@tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/
PG_13_202002271

--create a new file under the PG_13_* folder
[centos@tushar-ldap-docker bin]$ touch /tmp/new_my_tblspc/PG_13_202002271/test
[centos@tushar-ldap-docker bin]$

--run pg_validatebackup; it reports an error, which looks expected
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: error: "pg_tblspc/16386/PG_13_202002271/test" is present on disk but not in the manifest
[centos@tushar-ldap-docker bin]$

--remove the added file
[centos@tushar-ldap-docker bin]$ rm -rf /tmp/new_my_tblspc/PG_13_202002271/test

--run pg_validatebackup; it works fine
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$

--remove the PG_13* folder
[centos@tushar-ldap-docker bin]$ rm -rf /tmp/new_my_tblspc/PG_13_202002271/
[centos@tushar-ldap-docker bin]$
[centos@tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/

--run pg_validatebackup; no error is reported?
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$
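
One possible explanation, offered as an assumption rather than something
confirmed from the patch: the manifest lists files only, so a check that
compares manifest paths against files found on disk has nothing to compare
when an empty directory disappears. A minimal Python sketch of such a
file-only comparison follows. The function name compare_to_manifest and the
simplified "set of paths" manifest are hypothetical, not pg_validatebackup
internals.

```python
import os
import tempfile


def compare_to_manifest(backup_dir, manifest_paths):
    """Compare regular files under backup_dir against a set of manifest paths.

    Hypothetical helper: only files are compared, mirroring the assumption
    that the manifest carries no entries for directories.
    """
    on_disk = set()
    # followlinks=True because tablespaces appear as symlinks under pg_tblspc
    for root, _dirs, files in os.walk(backup_dir, followlinks=True):
        for name in files:
            full = os.path.join(root, name)
            on_disk.add(os.path.relpath(full, backup_dir).replace(os.sep, "/"))
    extra = sorted(on_disk - manifest_paths)    # on disk, not in manifest
    missing = sorted(manifest_paths - on_disk)  # in manifest, not on disk
    return extra, missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as backup:
        ts_dir = os.path.join(backup, "pg_tblspc", "16386", "PG_13_202002271")
        os.makedirs(ts_dir)          # empty tablespace version directory
        manifest = set()             # no files there, so no manifest entries

        open(os.path.join(ts_dir, "test"), "w").close()
        print(compare_to_manifest(backup, manifest))  # the extra file shows up

        os.remove(os.path.join(ts_dir, "test"))
        os.rmdir(ts_dir)             # now remove the empty directory itself
        print(compare_to_manifest(backup, manifest))  # nothing is flagged
```

If that assumption holds, catching this case would need either directory
entries in the manifest or a separate existence check for each tablespace
version directory.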

Starting the server from this backup logs the missing directory, but the
server still comes up:

[centos@tushar-ldap-docker bin]$ ./pg_ctl -D data_dir/ start -o '-p 9033'
waiting for server to start....2020-03-04 19:18:54.839 IST [13097] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv6 address "::1", port 9033
2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv4 address "127.0.0.1", port 9033
2020-03-04 19:18:54.842 IST [13097] LOG:  listening on Unix socket "/tmp/.s.PGSQL.9033"
2020-03-04 19:18:54.843 IST [13097] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.845 IST [13098] LOG:  database system was interrupted; last known up at 2020-03-04 19:14:50 IST
2020-03-04 19:18:54.937 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.939 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.939 IST [13098] LOG:  redo starts at 0/18000028
2020-03-04 19:18:54.939 IST [13098] LOG:  consistent recovery state reached at 0/18000100
2020-03-04 19:18:54.939 IST [13098] LOG:  redo done at 0/18000100
2020-03-04 19:18:54.941 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.984 IST [13097] LOG:  database system is ready to accept connections
  done
server started
[centos@tushar-ldap-docker bin]$

regards,

On 3/4/20 3:51 PM, tushar wrote:
> Another scenario: if we modify the "Manifest-Checksum" value in the 
> backup_manifest file, we do not get an error.
>
> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
> pg_validatebackup: * manifest_checksum = 
> 28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d
> pg_validatebackup: backup successfully verified
>
> open backup_manifest file and replace
>
> "Manifest-Checksum": 
> "28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}
> with
> "Manifest-Checksum": "Hello World"}
>
> rerun the pg_validatebackup
>
> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
> pg_validatebackup: * manifest_checksum = Hello World
> pg_validatebackup: backup successfully verified
>
> regards,
>
> On 3/4/20 3:26 PM, tushar wrote:
>> Hi,
>> Another observation: if I change the ownership of a file under the 
>> global/ directory, e.g.
>>
>> [root@tushar-ldap-docker global]# chown enterprisedb 2396
>>
>> and run the pg_validatebackup command, I get this output:
>>
>> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup gggg
>> pg_validatebackup: * manifest_checksum = 
>> e8cb007bcc9c0deab6eff51cd8d9d9af6af35b86e02f3055e60e70e56737e877
>> pg_validatebackup: error: could not open file "global/2396": 
>> Permission denied
>> *** Error in `./pg_validatebackup': double free or corruption 
>> (!prev): 0x0000000001850ba0 ***
>> ======= Backtrace: =========
>> /lib64/libc.so.6(+0x81679)[0x7fa2248e3679]
>> ./pg_validatebackup[0x401f4c]
>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa224884505]
>> ./pg_validatebackup[0x402049]
>> ======= Memory map: ========
>> 00400000-00415000 r-xp 00000000 fd:03 4044545 
>> /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
>> 00614000-00615000 r--p 00014000 fd:03 4044545 
>> /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
>> 00615000-00616000 rw-p 00015000 fd:03 4044545 
>> /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
>> 017f3000-01878000 rw-p 00000000 00:00 
>> 0                                  [heap]
>> 7fa218000000-7fa218021000 rw-p 00000000 00:00 0
>> 7fa218021000-7fa21c000000 ---p 00000000 00:00 0
>> 7fa21e122000-7fa21e137000 r-xp 00000000 fd:03 141697 
>> /usr/lib64/libgcc_s-4.8.5-20150702.so.1
>> 7fa21e137000-7fa21e336000 ---p 00015000 fd:03 141697 
>> /usr/lib64/libgcc_s-4.8.5-20150702.so.1
>> 7fa21e336000-7fa21e337000 r--p 00014000 fd:03 141697 
>> /usr/lib64/libgcc_s-4.8.5-20150702.so.1
>> 7fa21e337000-7fa21e338000 rw-p 00015000 fd:03 141697 
>> /usr/lib64/libgcc_s-4.8.5-20150702.so.1
>> 7fa21e338000-7fa224862000 r--p 00000000 fd:03 
>> 266442                     /usr/lib/locale/locale-archive
>> 7fa224862000-7fa224a25000 r-xp 00000000 fd:03 
>> 134456                     /usr/lib64/libc-2.17.so
>> 7fa224a25000-7fa224c25000 ---p 001c3000 fd:03 
>> 134456                     /usr/lib64/libc-2.17.so
>> 7fa224c25000-7fa224c29000 r--p 001c3000 fd:03 
>> 134456                     /usr/lib64/libc-2.17.so
>> 7fa224c29000-7fa224c2b000 rw-p 001c7000 fd:03 
>> 134456                     /usr/lib64/libc-2.17.so
>> 7fa224c2b000-7fa224c30000 rw-p 00000000 00:00 0
>> 7fa224c30000-7fa224c47000 r-xp 00000000 fd:03 
>> 134485                     /usr/lib64/libpthread-2.17.so
>> 7fa224c47000-7fa224e46000 ---p 00017000 fd:03 
>> 134485                     /usr/lib64/libpthread-2.17.so
>> 7fa224e46000-7fa224e47000 r--p 00016000 fd:03 
>> 134485                     /usr/lib64/libpthread-2.17.so
>> 7fa224e47000-7fa224e48000 rw-p 00017000 fd:03 
>> 134485                     /usr/lib64/libpthread-2.17.so
>> 7fa224e48000-7fa224e4c000 rw-p 00000000 00:00 0
>> 7fa224e4c000-7fa224e90000 r-xp 00000000 fd:03 4044478 
>> /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
>> 7fa224e90000-7fa225090000 ---p 00044000 fd:03 4044478 
>> /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
>> 7fa225090000-7fa225093000 r--p 00044000 fd:03 4044478 
>> /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
>> 7fa225093000-7fa225094000 rw-p 00047000 fd:03 4044478 
>> /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
>> 7fa225094000-7fa2250b6000 r-xp 00000000 fd:03 
>> 130333                     /usr/lib64/ld-2.17.so
>> 7fa22527d000-7fa2252a2000 rw-p 00000000 00:00 0
>> 7fa2252b3000-7fa2252b5000 rw-p 00000000 00:00 0
>> 7fa2252b5000-7fa2252b6000 r--p 00021000 fd:03 
>> 130333                     /usr/lib64/ld-2.17.so
>> 7fa2252b6000-7fa2252b7000 rw-p 00022000 fd:03 
>> 130333                     /usr/lib64/ld-2.17.so
>> 7fa2252b7000-7fa2252b8000 rw-p 00000000 00:00 0
>> 7ffdf354f000-7ffdf3570000 rw-p 00000000 00:00 
>> 0                          [stack]
>> 7ffdf3572000-7ffdf3574000 r-xp 00000000 00:00 
>> 0                          [vdso]
>> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 
>> 0                  [vsyscall]
>> Aborted
>> [centos@tushar-ldap-docker bin]$
>>
>>
>> The "Permission denied" error is reported, but it is followed by 
>> "*** Error in `./pg_validatebackup': double free or corruption 
>> (!prev): 0x0000000001850ba0 ***" and an abort.
>>
>> Is this expected?
>>
>> regards,
>>
>> On 3/3/20 8:19 PM, tushar wrote:
>>> On 3/3/20 4:04 PM, tushar wrote:
>>>> Thanks Robert.  After applying all the 5 patches (v8-00*) against 
>>>> PG v13 (commit id -afb5465e0cfce7637066eaaaeecab30b0f23fbe3) , 
>>>
>>> There is a scenario where pg_validatebackup does not throw an error 
>>> when files have been deleted from the pg_wal/ folder, but later, at 
>>> restore time, we get an error.
>>>
>>> [centos@tushar-ldap-docker bin]$ ./pg_basebackup  -D test1
>>>
>>> [centos@tushar-ldap-docker bin]$ ls test1/pg_wal/
>>> 000000010000000000000010  archive_status
>>>
>>> [centos@tushar-ldap-docker bin]$ rm -rf test1/pg_wal/*
>>>
>>> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup test1
>>> pg_validatebackup: * manifest_checksum = 
>>> 88f1ed995c83e86252466a2c88b3e660a69cfc76c169991134b101c4f16c9df7
>>> pg_validatebackup: backup successfully verified
>>>
>>> [centos@tushar-ldap-docker bin]$ ./pg_ctl -D test1 start -o '-p 3333'
>>> waiting for server to start....2020-03-02 20:05:22.732 IST [21441] 
>>> LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled 
>>> by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
>>> 2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv6 address 
>>> "::1", port 3333
>>> 2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv4 address 
>>> "127.0.0.1", port 3333
>>> 2020-03-02 20:05:22.736 IST [21441] LOG:  listening on Unix socket 
>>> "/tmp/.s.PGSQL.3333"
>>> 2020-03-02 20:05:22.739 IST [21442] LOG:  database system was 
>>> interrupted; last known up at 2020-03-02 20:04:35 IST
>>> 2020-03-02 20:05:22.739 IST [21442] LOG:  creating missing WAL 
>>> directory "pg_wal/archive_status"
>>> 2020-03-02 20:05:22.886 IST [21442] LOG:  invalid checkpoint record
>>> 2020-03-02 20:05:22.886 IST [21442] FATAL:  could not locate 
>>> required checkpoint record
>>> 2020-03-02 20:05:22.886 IST [21442] HINT:  If you are restoring from 
>>> a backup, touch 
>>> "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/recovery.signal" 
>>> and add required recovery options.
>>>     If you are not restoring from a backup, try removing the file 
>>> "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label".
>>>     Be careful: removing 
>>> "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label" will 
>>> result in a corrupt cluster if restoring from a backup.
>>> 2020-03-02 20:05:22.886 IST [21441] LOG:  startup process (PID 
>>> 21442) exited with exit code 1
>>> 2020-03-02 20:05:22.886 IST [21441] LOG:  aborting startup due to 
>>> startup process failure
>>> 2020-03-02 20:05:22.889 IST [21441] LOG:  database system is shut down
>>>  stopped waiting
>>> pg_ctl: could not start server
>>> Examine the log output.
>>> [centos@tushar-ldap-docker bin]$
>>>
>>
>
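
For the Manifest-Checksum scenario quoted above, here is a rough sketch of
the check I would have expected. It assumes the value is a hex SHA-256
digest of every byte of the manifest that precedes the final
"Manifest-Checksum" line; that layout is an assumption on my part, the
function name manifest_checksum_ok is hypothetical, and the sample manifest
is made up for illustration.

```python
import hashlib


def manifest_checksum_ok(manifest_bytes):
    """Return True if the trailing Manifest-Checksum matches the preceding bytes.

    Assumes the last line has the form  "Manifest-Checksum": "<hex>"}
    and that the digest covers all bytes before that line.
    """
    body = manifest_bytes.rstrip(b"\n")
    cut = body.rfind(b"\n") + 1              # offset where the last line starts
    preceding, last_line = body[:cut], body[cut:]
    key = b'"Manifest-Checksum": "'
    if not last_line.startswith(key):
        return False
    claimed = last_line[len(key):].split(b'"', 1)[0].decode("ascii", "replace")
    actual = hashlib.sha256(preceding).hexdigest()
    return claimed.lower() == actual


if __name__ == "__main__":
    # Made-up sample manifest, not output from a real backup
    head = b'{ "PostgreSQL-Backup-Manifest-Version": 1,\n"Files": [],\n'
    digest = hashlib.sha256(head).hexdigest().encode("ascii")
    good = head + b'"Manifest-Checksum": "' + digest + b'"}\n'
    bad = head + b'"Manifest-Checksum": "Hello World"}\n'
    print(manifest_checksum_ok(good))
    print(manifest_checksum_ok(bad))
```

With a check along these lines, the "Hello World" value would be rejected
instead of being echoed back as the manifest checksum.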

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company