Thread

Re: block-level incremental backup

Adam Brusselback <adambrusselback@gmail.com> — 2019-04-23T19:12:27Z
I hope it's alright to throw in my $0.02 as a user. I've been following
this (and the other thread on reading WAL to find modified blocks,
prefaulting, whatever else) since the start with great excitement and would
love to see the built-in backup capabilities in Postgres greatly improved.
I know this is not completely on-topic for just incremental backups, so I
apologize in advance. It just seemed like the most apt place to chime in.


Just to preface where I am coming from, I have been using pgBackRest for
the past couple of years and used wal-e prior to that. I am not a big *nix
user beyond running all my servers on it; I do all my development on
Windows and primarily use Java. The command line is not where I feel most
comfortable despite my best efforts over the last 5-6 years. Prior to
Postgres, I used SQL Server for quite a few years at previous companies,
but I had more of a junior / intermediate skill set back then. I just
wanted to put that out there so you can see where my biases are.




With all that said, I would not be comfortable using pg_basebackup as my
main backup tool simply because I’d have to cobble together numerous tools
to get backups stored in a safe (not on the same server) location, I’d have
to manage expiring backups and the WAL which is no longer needed, along
with the rest of the stuff that makes these backup management tools useful.


The command line scares me, and even if I were able to get all that
working, I would not feel confident that I hadn’t messed something up
horribly, or that I wouldn’t hit an edge case which destroys backups,
silently corrupts data, etc.

I love that there are tools that manage all of it: backups, WAL archiving,
and remote storage. They integrate with cloud storage (S3 and the like),
manage the retention of these backups with all their dependencies for me,
and have all the necessary restore options built in as well.


Block-level incremental backup would be amazing for my use case. I have
small updates / deletes that happen to data all over some of my largest
tables. With pgBackRest, since the diff/incremental backups are at the file
level, a single update / delete that touches a random spot in a table
requires that whole 1 GB file to be backed up again. That said, even if
pg_basebackup were the only tool that did block-level incremental backup
tomorrow, I still wouldn’t start using it directly. I went into the issues
I’d have to deal with if I used pg_basebackup above, and I suspect that
using incremental backups correctly without a management tool would be
even harder.
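
To put rough numbers on the file-level versus block-level difference, here
is a back-of-the-envelope sketch in Python. It assumes PostgreSQL's default
8 kB block size and 1 GB segment files, and it ignores any per-block
metadata a real incremental format would add. The function names and the
sample workload are made up for illustration and do not describe how any
existing tool is implemented.

```python
BLOCK_SIZE = 8 * 1024        # assumed: PostgreSQL default block size (8 kB)
SEGMENT_SIZE = 1024 ** 3     # assumed: default relation segment size (1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE


def file_level_bytes(modified_blocks):
    """Bytes copied if any change forces the whole segment file to be copied."""
    segments = {block // BLOCKS_PER_SEGMENT for block in modified_blocks}
    return len(segments) * SEGMENT_SIZE


def block_level_bytes(modified_blocks):
    """Bytes copied if only the modified blocks themselves are copied."""
    return len(set(modified_blocks)) * BLOCK_SIZE


# Hypothetical workload: one modified block in each of 10 different segments.
modified = [i * BLOCKS_PER_SEGMENT + 42 for i in range(10)]

print("file level: ", file_level_bytes(modified), "bytes")
print("block level:", block_level_bytes(modified), "bytes")
```

With that workload, the file-level approach copies ten full segment files,
while the block-level approach copies ten blocks.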


I know this thread is just about incremental backup, and that pretty much
everything in core is built up from small features into larger, more
complex ones. I understand that and am not trying to dump on any efforts;
I am super excited to see work being done in this area! I just wanted to
share my perspective on how crucial good backup management is to me (and
I’m sure a few others share my sentiment, considering how popular all the
external tools are).

I would never put a system in production unless I had some backup
management in place. If core builds a backup management tool which uses
pg_basebackup as a building block for its solution…awesome! That may be
something I’d use. If pg_basebackup can be improved so that most external
backup management tools can build on top of it, that’s also great. The
fact that practically every Postgres company has built an external tool
shows that this is obviously a need for a lot of users. I know core will
never solve every single problem for all users. It would just be great to
see some of the fundamental features of backup management baked into core
in an extensible way.

With that, there could be a recommended way to set up backups
(full/incremental, parallel, compressed), point-in-time recovery, and
backup retention, and to perform restores (to a point in time, on a replica
server, etc.) using just the tooling within core, with a nice and simple
user interface and great performance.

If the features core supports in its internal tooling are built in an
extensible way (as has been discussed), there could be much less
duplication of work implementing the same base features over and over for
each external tool. Those companies could focus on adding features to their
own products that core would never support, or on improving the
tooling/performance/features core provides.


Well, this is way longer and a lot less coherent than I was hoping, so I
apologize for that. Hopefully my stream of thoughts made a little bit of
sense to someone.


-Adam