Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr.rehman@gmail.com>
To: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Cc: Jeevan Chalke <jeevan.chalke@enterprisedb.com>, Robert Haas <robertmhaas@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2020-03-13T16:21:09Z
Lists: pgsql-hackers

Commits

  1. Fix failures in incremental_sort due to number of workers

  2. In jsonb_plpython.c, suppress warning message from gcc 10.

  3. Fix minor problems with non-exclusive backup cleanup.

On Wed, Mar 11, 2020 at 2:38 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:

> Hi Asif
>
> I have started testing this feature. I applied the v6 patch on commit
> a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
> I have a few observations; please take a look.
>
> *--if the backup fails, the backup directory is not removed.*
> [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
> pg_basebackup: error: could not connect to server: FATAL:  number of requested standby connections exceeds max_wal_senders (currently 10)
> [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
> pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty
>
>
> *--giving a large number of jobs leads to a segmentation fault.*
> ./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> .
> .
> .
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: FATAL:  number of
> requested standby connections exceeds max_wal_senders (currently 10)
> pg_basebackup: error: could not connect to server: could not fork new
> process for connection: Resource temporarily unavailable
>
> could not fork new process for connection: Resource temporarily unavailable
> pg_basebackup: error: failed to create thread: Resource temporarily
> unavailable
> Segmentation fault (core dumped)
>
> --stack-trace
> gdb -q -c core.11824 pg_basebackup
> Loaded symbols for /lib64/libnss_files.so.2
> Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D /tmp/test_bkp/bkp10'.
> Program terminated with signal 11, Segmentation fault.
> #0  pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
> 46  if (INVALID_NOT_TERMINATED_TD_P (pd))
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
> #1  0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
> #2  0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
> #3  0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
> #4  exit (status=1) at exit.c:100
> #5  0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at pg_basebackup.c:2713
> #6  0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
> #7  0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at pg_basebackup.c:2668
>
>
> *--when the tablespace is in the same directory as data, parallel backup
> crashes*
> [edb@localhost bin]$ ./initdb -D /tmp/data
> [edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
> [edb@localhost bin]$ mkdir /tmp/ts
> [edb@localhost bin]$ ./psql postgres
> psql (13devel)
> Type "help" for help.
>
> postgres=# create tablespace ts location '/tmp/ts';
> CREATE TABLESPACE
> postgres=# create table tx (a int) tablespace ts;
> CREATE TABLE
> postgres=# \q
> [edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
> Segmentation fault (core dumped)
>
> --stack-trace
> [edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
> Loaded symbols for /lib64/libnss_files.so.2
> Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
> 3000 backupInfo->curr->next = file;
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
> #1  0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at pg_basebackup.c:2739
> #2  0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
> #3  0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at pg_basebackup.c:2668
> (gdb)
>
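
For the --jobs=1000 case, the backtrace shows pthread_join() crashing inside cleanup_workers(), reached from the atexit handler after thread creation had failed. One plausible reading (an assumption; the trace alone does not prove it) is that cleanup tried to join workers that were never started. Below is a minimal C sketch of guarding against that. The Worker struct, MAX_WORKERS, and the function names are hypothetical and are not taken from the patch.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_WORKERS 8           /* hypothetical cap, for illustration only */

/* Hypothetical bookkeeping; not the structures used by the patch. */
typedef struct
{
    pthread_t   thread;
    bool        started;        /* set only after pthread_create() succeeds */
} Worker;

static Worker workers[MAX_WORKERS];     /* zero-initialized: started = false */

/* Placeholder worker body; a real worker would fetch files here. */
static void *
worker_main(void *arg)
{
    (void) arg;
    return NULL;
}

/* Start up to n workers, stopping at the first failure. Returns the number started. */
static int
start_workers(int n)
{
    int         count = 0;

    for (int i = 0; i < n && i < MAX_WORKERS; i++)
    {
        if (pthread_create(&workers[i].thread, NULL, worker_main, NULL) != 0)
            break;              /* leave started = false for this slot */
        workers[i].started = true;
        count++;
    }
    return count;
}

/*
 * Join only the workers that were actually started. Clearing the flag makes
 * the function safe to call again, e.g. once explicitly and once from an
 * atexit handler.
 */
static int
join_started_workers(void)
{
    int         joined = 0;

    for (int i = 0; i < MAX_WORKERS; i++)
    {
        if (!workers[i].started)
            continue;
        pthread_join(workers[i].thread, NULL);
        workers[i].started = false;
        joined++;
    }
    return joined;
}
```

With this shape, an exit path that runs after a failed pthread_create() only touches thread handles that are known to be valid.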


Thanks, Rajkumar. I have fixed the above issues and rebased the patches
onto the latest master (b7f64c64).
V9 of the patches is attached.
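
For the tablespace crash, the faulting line in the trace is `backupInfo->curr->next = file;` in get_backup_filelist(). If curr was NULL at that point (an assumption; the trace does not show its value), the usual remedy is to treat the first append as a special case. A minimal, self-contained C sketch of that pattern follows; FileEntry, FileList, and the function names are hypothetical and do not reflect the patch's actual data structures.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node holding one file path. */
typedef struct FileEntry
{
    char               *path;
    struct FileEntry   *next;
} FileEntry;

/* Hypothetical list with head and tail pointers; both are NULL when empty. */
typedef struct
{
    FileEntry  *head;
    FileEntry  *tail;
    size_t      count;
} FileList;

/* Append a copy of path to the list. Returns 0 on success, -1 on allocation failure. */
static int
file_list_append(FileList *list, const char *path)
{
    size_t      len = strlen(path) + 1;
    FileEntry  *entry = malloc(sizeof(FileEntry));

    if (entry == NULL)
        return -1;
    entry->path = malloc(len);
    if (entry->path == NULL)
    {
        free(entry);
        return -1;
    }
    memcpy(entry->path, path, len);
    entry->next = NULL;

    if (list->tail == NULL)
        list->head = entry;         /* first element: there is no tail to link from */
    else
        list->tail->next = entry;   /* safe: tail is known to be non-NULL here */
    list->tail = entry;
    list->count++;
    return 0;
}

/* Free every entry and reset the list to the empty state. */
static void
file_list_free(FileList *list)
{
    FileEntry  *entry = list->head;

    while (entry != NULL)
    {
        FileEntry  *next = entry->next;

        free(entry->path);
        free(entry);
        entry = next;
    }
    list->head = NULL;
    list->tail = NULL;
    list->count = 0;
}
```

The same check applies whenever the list is reset between tablespaces: after a reset the tail pointer must be NULL again so the next append takes the first-element branch.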


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca