Re: block-level incremental backup

Ibrar Ahmed <ibrar.ahmad@gmail.com>

From: Ibrar Ahmed <ibrar.ahmad@gmail.com>

To: Jeevan Chalke <jeevan.chalke@enterprisedb.com>

Cc: vignesh C <vignesh21@gmail.com>, Robert Haas <robertmhaas@gmail.com>, Anastasia Lubennikova <a.lubennikova@postgrespro.ru>, Stephen Frost <sfrost@snowman.net>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>

Date: 2019-08-16T14:36:50Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Don't call data type input functions in GUC check hooks
- 21f428ebde39 12.0 cited

On Fri, Aug 16, 2019 at 4:12 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

>
>
>
>
> On Fri, Aug 16, 2019 at 3:24 PM Jeevan Chalke <
> jeevan.chalke@enterprisedb.com> wrote:
>
>>
>>
>> On Fri, Aug 2, 2019 at 6:43 PM vignesh C <vignesh21@gmail.com> wrote:
>>
>>> Some comments:
>>> 1) There will be some link files created for tablespace, we might
>>> require some special handling for it
>>>
>>
>> Yep. I have that in my ToDo.
>> Will start working on that soon.
>>
>>
>>> 2)
>>> Retry functionality is hanlded only for copying of full files, should
>>> we handle retry for copying of partial files
>>> 3)
>>> we can have some range for maxretries similar to sleeptime
>>>
>>
>> I took help from pg_standby code related to maxentries and sleeptime.
>>
>> However, as we don't want to use system() call now, I have
>> removed all this kludge and just used fread/fwrite as discussed.
>>
>>
>>> 4)
>>> Should we check for malloc failure
>>>
>>
>> Used pg_malloc() instead. Same is also suggested by Ibrar.
>>
>>
>>>
>>> 5) Should we add display of progress as backup may take some time,
>>> this can be added as enhancement. We can get other's opinion on this.
>>>
>>
>> Can be done afterward once we have the functionality in place.
>>
>>
>>>
>>> 6)
>>> If the backup count increases providing the input may be difficult,
>>> Shall user provide all the incremental backups from a parent folder
>>> and can we handle the ordering of incremental backup internally
>>>
>>
>> I am not sure of this yet. We need to provide the tablespace mapping too.
>> But thanks for putting a point here. Will keep that in mind when I
>> revisit this.
>>
>>
>>>
>>> 7)
>>> Add verbose for copying whole file
>>>
>> Done
>>
>>
>>>
>>> 8) We can also check if approximate space is available in disk before
>>> starting combine backup, this can be added as enhancement. We can get
>>> other's opinion on this.
>>>
>>
>> Hmm... will leave it for now. User will get an error anyway.
>>
>>
>>>
>>> 9)
>>> Combine backup into directory can be combine backup directory
>>>
>> Done
>>
>>
>>>
>>> 10)
>>> MAX_INCR_BK_COUNT can be increased little, some applications use 1
>>> full backup at the beginning of the month and use 30 incremental
>>> backups rest of the days in the month
>>>
>>
>> Yeah, agree. But using any number here is debatable.
>> Let's see others opinion too.
>>
> Why not use a list?
>
>
>>
>>
>>> Regards,
>>> Vignesh
>>> EnterpriseDB: http://www.enterprisedb.com
>>>
>>
>>
>> Attached new sets of patches with refactoring done separately.
>> Incremental backup patch became small now and hopefully more
>> readable than the first version.
>>
>> --
>> Jeevan Chalke
>> Technical Architect, Product Development
>> EnterpriseDB Corporation
>> The Enterprise PostgreSQL Company
>>
>>
>
> +       buf = (char *) malloc(statbuf->st_size);
>
> +       if (buf == NULL)
>
> +               ereport(ERROR,
>
> +                               (errcode(ERRCODE_OUT_OF_MEMORY),
>
> +                                errmsg("out of memory")));
>
> Why are you using malloc, you can use palloc here.
>
>
>
> Hi, I gave another look at the patch and have some quick comments.


-
> char       *extptr = strstr(fn, ".partial");

I think there should be a better and strict way to check the file
extension.

-
> +               extptr = strstr(outfn, ".partial");
> +               Assert (extptr != NULL);

Why are you checking that again, you just appended that in the above
statement?

-
> +       if (verbose && statbuf.st_size > (RELSEG_SIZE * BLCKSZ))
> +               pg_log_info("found big file \"%s\" (size: %.2lfGB): %m",
fromfn,
> +                                       (double) statbuf.st_size /
(RELSEG_SIZE * BLCKSZ));

This is not just a log, you find a file which is bigger which surely has
some problem.

-
> +        * We do read entire 1GB file in memory while taking incremental
backup; so
> +        * I don't see any reason why can't we do that here.  Also,
copying data in
> +        * chunks is expensive.  However, for bigger files, we still
slice at 1GB
> +        * border.


What do you mean by bigger file, a file greater than 1GB? In which case you
get file > 1GB?


-- 
Ibrar Ahmed