Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com>

From: Asif Rehman <asifr.rehman@gmail.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Ibrar Ahmed <ibrar.ahmad@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2019-11-12T12:07:14Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Fix failures in incremental_sort due to number of workers

  2. In jsonb_plpython.c, suppress warning message from gcc 10.

  3. Fix minor problems with non-exclusive backup cleanup.

Attachments

On Mon, Nov 4, 2019 at 6:08 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
>> On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>> > 'startptr' is used by sendFile() during checksum verification. Since
>> > SendBackupFiles() is using sendFIle we have to set a valid WAL location.
>>
>> Ugh, global variables.
>>
>> Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
>> STOP_BACKUP all using the same base_backup_opt_list production as
>> BASE_BACKUP? Presumably most of those options are not applicable to
>> most of those commands, and the productions should therefore be
>> separated.
>>
>
> Are you expecting something like the attached patch? Basically I have
> reorganised the grammar
> rules so each command can have the options required by it.
>
> I was feeling a bit reluctant for this change because it may add some
> unwanted grammar rules in
> the replication grammar. Since these commands are using the same options
> as base backup, may
> be we could throw error inside the relevant functions on unwanted options?
>
>
>
>> You should add docs, too.  I wouldn't have to guess what some of this
>> stuff was for if you wrote documentation explaining what this stuff
>> was for. :-)
>>
>
> Yes I will add it in the next patch.
>
>
>>
>> >> The tablespace_path option appears entirely unused, and I don't know
>> >> why that should be necessary here, either.
>> >
>> > This is to calculate the basepathlen. We need to exclude the tablespace
>> location (or
>> > base path) from the filename before it is sent to the client with
>> sendFile call. I added
>> > this option primarily to avoid performing string manipulation on
>> filename to extract the
>> > tablespace location and then calculate the basepathlen.
>> >
>> > Alternatively we can do it by extracting the base path from the
>> received filename. What
>> > do you suggest?
>>
>> I don't think the server needs any information from the client in
>> order to be able to exclude the tablespace location from the pathname.
>> Whatever it needs to know, it should be able to figure out, just as it
>> would in a non-parallel backup.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
I have updated the replication grammar with some new rules to differentiate
the options production
for base backup and newly added commands.

I have also created a separate patch to include the documentation changes.
The current syntax is as below:

- START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
- STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
- SEND_BACKUP_FILELIST
- SEND_BACKUP_FILES ( 'FILE' [, ...] )  [ MAX_RATE rate ] [
NOVERIFY_CHECKSUMS ] [ START_WAL_LOCATION ]


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca