Re: WIP/PoC for parallel backup

Asif Rehman <asifr.rehman@gmail.com>

From: Asif Rehman <asifr.rehman@gmail.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Ibrar Ahmed <ibrar.ahmad@gmail.com>, PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2019-11-01T15:26:02Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Fix failures in incremental_sort due to number of workers

  2. In jsonb_plpython.c, suppress warning message from gcc 10.

  3. Fix minor problems with non-exclusive backup cleanup.

Attachments

On Wed, Oct 30, 2019 at 7:16 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

>
>
> On Mon, Oct 28, 2019 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
>> On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com>
>> wrote:
>> > I have updated the patch to include the changes suggested by Jeevan.
>> This patch also implements the thread workers instead of
>> > processes and fetches a single file at a time. The tar format has been
>> disabled for first version of parallel backup.
>>
>> Looking at 0001-0003:
>>
>> It's not clear to me what the purpose of the start WAL location is
>> supposed to be. As far as I can see, SendBackupFiles() stores it in a
>> variable which is then used for exactly nothing, and nothing else uses
>> it.  It seems like that would be part of a potential incremental
>> backup feature, but I don't see what it's got to do with parallel full
>> backup.
>>
>
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.
>
>
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>>
>
> This is to calculate the basepathlen. We need to exclude the tablespace
> location (or
> base path) from the filename before it is sent to the client with sendFile
> call. I added
> this option primarily to avoid performing string manipulation on filename
> to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received
> filename. What
> do you suggest?
>
>
>>
>> STORE_BACKUPFILE() seems like maybe it should be a function rather
>> than a macro, and also probably be renamed, because it doesn't store
>> files and the argument's not necessarily a file.
>>
> Sure.
>
>
>>
>> SendBackupManifest() does not send a backup manifest in the sense
>> contemplated by the email thread on that subject.  It sends a file
>> list.  That seems like the right idea - IMHO, anyway - but you need to
>> do a thorough renaming.
>>
>
> I'm considering the following command names:
> START_BACKUP
> - Starts the backup process
>
> SEND_BACKUP_FILELIST (Instead of SEND_BACKUP_MANIFEST)
> - Sends the list of all files (along with file information such as
> filename, file type (directory/file/link),
> file size and file mtime for each file) to be backed up.
>
> SEND_BACKUP_FILES
> - Sends one or more files to the client.
>
> STOP_BACKUP
> - Stops the backup process.
>
> I'll update the function names accordingly after your confirmation. Of
> course, suggestions for
> better names are welcome.
>
>
>>
>> I think it would be fine to decide that this facility won't support
>> exclusive-mode backup.
>>
>
> Sure. Will drop this patch.
>
>
>>
>> I don't think much of having both sendDir() and sendDir_(). The latter
>> name is inconsistent with any naming convention we have, and there
>> seems to be no reason not to just add an argument to sendDir() and
>> change the callers.
>
>
>> I think we should rename - perhaps as a preparatory patch - the
>> sizeonly flag to dryrun, or something like that.
>>
>
> Sure, will take care of it.
>
>
>> The resource cleanup does not look right.  You've included calls to
>> PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
>> and StopBackup(), but what happens if there is an error or even a
>> clean shutdown of the connection in between? I think that there needs
>
> to be some change here to ensure that a walsender will always call
>> base_backup_cleanup() when it exits; I think that'd probably remove
>> the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
>> we have already.  This might also be something that could be done as a
>> separate, prepatory refactoring patch.
>>
>
> You're right. I didn't handle this case properly. I will removed
> PG_ENSURE_ERROR_CLEANUP
> calls and replace it with before_shmem_exit handler. This way
> whenever backend process exits,
> base_backup_cleanup will be called:
> - If it exists before calling the do_pg_stop_backup, base_backup_cleanup
> will take care of cleanup.
> - otherwise in case of a clean shutdown (after calling do_pg_stop_backup)
> then base_backup_cleanup
> will simply return without doing anything.
>
>
>
The updated patches are attached.

Thanks,

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca