Re: WIP/PoC for parallel backup
From: Asif Rehman <asifr.rehman@gmail.com>
To: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Cc: Jeevan Chalke <jeevan.chalke@enterprisedb.com>,
Robert Haas <robertmhaas@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2020-03-13T16:21:09Z
Lists: pgsql-hackers
Commits
- Fix failures in incremental_sort due to number of workers (23ba3b5ee278, 13.0, cited)
- In jsonb_plpython.c, suppress warning message from gcc 10. (a06921816370, 13.0, cited)
- Fix minor problems with non-exclusive backup cleanup. (303640199d04, 13.0, cited)
Attachments
- 0001-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v9.patch (application/octet-stream) patch v9-0001
- 0004-Parallel-Backup-pg_basebackup_v9.patch (application/octet-stream) patch v9-0004
- 0002-Refactor-some-backup-code-to-increase-reusability.-T_v9.patch (application/octet-stream) patch v9-0002
- 0003-Parallel-Backup-Backend-Replication-commands_v9.patch (application/octet-stream) patch v9-0003
- 0005-parallel-backup-testcase_v9.patch (application/octet-stream) patch v9-0005
- 0006-parallel-backup-documentation_v9.patch (application/octet-stream) patch v9-0006
On Wed, Mar 11, 2020 at 2:38 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:

> Hi Asif
>
> I have started testing this feature. I have applied v6 patch on commit
> a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
> I got few observations, please take a look.
>
> *--if backup failed, backup directory is not getting removed.*
> [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
> pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
> [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
> pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty
>
>
> *--giving large number of jobs leading segmentation fault.*
> ./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
> pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
> .
> .
> .
> pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: could not fork new process for connection: Resource temporarily unavailable
>
> could not fork new process for connection: Resource temporarily unavailable
> pg_basebackup: error: failed to create thread: Resource temporarily unavailable
> Segmentation fault (core dumped)
>
> --stack-trace
> gdb -q -c core.11824 pg_basebackup
> Loaded symbols for /lib64/libnss_files.so.2
> Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D /tmp/test_bkp/bkp10'.
> Program terminated with signal 11, Segmentation fault.
> #0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
> 46 if (INVALID_NOT_TERMINATED_TD_P (pd))
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
> #1 0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
> #2 0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
> #3 0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
> #4 exit (status=1) at exit.c:100
> #5 0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at pg_basebackup.c:2713
> #6 0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
> #7 0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at pg_basebackup.c:2668
>
>
> *--with tablespace is in the same directory as data, parallel_backup crashed*
> [edb@localhost bin]$ ./initdb -D /tmp/data
> [edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
> [edb@localhost bin]$ mkdir /tmp/ts
> [edb@localhost bin]$ ./psql postgres
> psql (13devel)
> Type "help" for help.
>
> postgres=# create tablespace ts location '/tmp/ts';
> CREATE TABLESPACE
> postgres=# create table tx (a int) tablespace ts;
> CREATE TABLE
> postgres=# \q
> [edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
> Segmentation fault (core dumped)
>
> --stack-trace
> [edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
> Loaded symbols for /lib64/libnss_files.so.2
> Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
> 3000 backupInfo->curr->next = file;
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
> #1 0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at pg_basebackup.c:2739
> #2 0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
> #3 0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at pg_basebackup.c:2668
> (gdb)
>

Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64). (V9 of the patches are attached).

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
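
Note on the first backtrace in the quoted report: it ends in pthread_join(), reached from cleanup_workers() through an exit handler, immediately after "failed to create thread". One plausible reading (an assumption, since the patch source is not shown in this message) is that the cleanup path tried to join a thread that was never created. A minimal C sketch of guarding against that follows. WorkerSlot, start_workers, and cleanup_workers_sketch are hypothetical names for illustration, not taken from the patch, and this is not necessarily how v9 fixes the issue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker bookkeeping; not the struct used in the patch. */
typedef struct WorkerSlot
{
	pthread_t	thread;
	bool		started;	/* true only after pthread_create() succeeded */
} WorkerSlot;

/* Trivial worker body so the sketch is self-contained. */
static void *
worker_main(void *arg)
{
	(void) arg;
	return NULL;
}

/*
 * Try to start nworkers threads. Stops at the first failure and
 * returns the number of threads that actually started.
 */
static int
start_workers(WorkerSlot *slots, int nworkers)
{
	int			nstarted = 0;

	assert(slots != NULL);

	/* Mark every slot as not started before creating any thread. */
	for (int i = 0; i < nworkers; i++)
		slots[i].started = false;

	for (int i = 0; i < nworkers; i++)
	{
		if (pthread_create(&slots[i].thread, NULL, worker_main, NULL) != 0)
			break;
		slots[i].started = true;
		nstarted++;
	}
	return nstarted;
}

/*
 * Join only the slots whose thread was really created, so it is safe
 * to call from an exit handler after a partial startup. Returns the
 * number of threads joined.
 */
static int
cleanup_workers_sketch(WorkerSlot *slots, int nslots)
{
	int			njoined = 0;

	for (int i = 0; i < nslots; i++)
	{
		if (!slots[i].started)
			continue;			/* never created: nothing to join */
		if (pthread_join(slots[i].thread, NULL) == 0)
			njoined++;
		slots[i].started = false;	/* make repeated cleanup harmless */
	}
	return njoined;
}
```

With a zero-initialized array of four slots of which only two were started, cleanup_workers_sketch() joins two threads and skips the rest; calling it a second time joins nothing.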