Thread

  1. Re: block-level incremental backup

    Adam Brusselback <adambrusselback@gmail.com> — 2019-04-23T19:12:27Z

    I hope it's alright to throw in my $0.02 as a user. I've been following
    this (and the other thread on reading WAL to find modified blocks,
    prefaulting, whatever else) since the start with great excitement and would
    love to see the built-in backup capabilities in Postgres greatly improved.
    I know this is not entirely on-topic for a thread about incremental
    backups, so I apologize in advance. It just seemed like the most apt place
    to chime in.
    
    
    Just to preface where I am coming from: I have been using pgBackRest for
    the past couple of years and used WAL-E before that. I am not a big *nix
    user apart from my servers; I do all my development on Windows and
    primarily use Java. The command line is not where I feel most comfortable,
    despite my best efforts over the last 5-6 years. Before Postgres, I used
    SQL Server for quite a few years at previous companies, but my skill set
    was more junior/intermediate back then. I just wanted to put that out
    there so you can see where my biases are.
    
    With all that said, I would not be comfortable using pg_basebackup as my
    main backup tool, simply because I’d have to cobble together numerous tools
    to get backups stored in a safe location (not on the same server), manage
    expiring old backups and the WAL that is no longer needed, and handle the
    rest of the stuff that makes these backup management tools useful.
    
    
    The command line scares me, and even if I were able to get all that
    working, I would not feel warm and fuzzy that I hadn’t messed something up
    horribly, or that I wouldn’t hit an edge case that destroys backups,
    silently corrupts data, etc.
    
    I love that there are tools that manage all of it: backups, WAL archiving,
    remote storage, integration with cloud storage (S3 and the like), and
    retention of those backups with all their dependencies, plus all the
    necessary restore options built in.
    
    
    Block-level incremental backup would be amazing for my use case. I have
    small updates / deletes that happen to data all over some of my largest
    tables. With pgBackRest, since the diff/incremental backups are at the file
    level, a single update / delete that touches a random spot in a table
    means the whole 1 GB segment file has to be backed up again. That said,
    even if pg_basebackup were the only tool that did block-level incremental
    backup tomorrow, I still wouldn’t start using it directly. I covered the
    issues I’d have to deal with if I used pg_basebackup above, and
    incremental backups without a management tool would only make using it
    correctly harder.
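
    To put rough numbers on the file-level vs. block-level difference: with
    PostgreSQL's default 8 kB block size, a 1 GB segment file holds 131,072
    blocks, so one changed block means copying 1 GB instead of 8 kB. The
    minimal sketch below assumes changed blocks are found by comparing
    per-block SHA-256 digests against those recorded at the previous backup.
    The helper names (`block_digests`, `changed_blocks`, `file_level_bytes`,
    `block_level_bytes`) are hypothetical, and the digest approach is only an
    illustration, not a description of how pgBackRest or pg_basebackup detect
    changes.

```python
import hashlib

BLOCK_SIZE = 8192  # PostgreSQL's default block size (8 kB)


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """SHA-256 digest of each block (a hypothetical per-file manifest)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(prev_digests: list[bytes], data: bytes,
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of blocks that differ from (or were added since) the last backup."""
    return [i for i, d in enumerate(block_digests(data, block_size))
            if i >= len(prev_digests) or d != prev_digests[i]]


def file_level_bytes(prev_digests: list[bytes], data: bytes) -> int:
    """File-level incremental: any change means copying the whole file."""
    return len(data) if changed_blocks(prev_digests, data) else 0


def block_level_bytes(prev_digests: list[bytes], data: bytes,
                      block_size: int = BLOCK_SIZE) -> int:
    """Block-level incremental: copy only the blocks that changed."""
    return sum(len(data[i * block_size:(i + 1) * block_size])
               for i in changed_blocks(prev_digests, data, block_size))


if __name__ == "__main__":
    # A small 16-block stand-in for a segment file (a real one is 1 GB).
    old = bytes(16 * BLOCK_SIZE)
    new = bytearray(old)
    new[5 * BLOCK_SIZE + 100] = 1  # one small update in block 5
    new = bytes(new)

    prev = block_digests(old)
    print(changed_blocks(prev, new))     # [5]
    print(file_level_bytes(prev, new))   # 131072
    print(block_level_bytes(prev, new))  # 8192
```

    Scaled up to a real 1 GB segment with one modified block, the same
    comparison would report 1,073,741,824 bytes for file-level versus 8,192
    bytes for block-level.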
    
    
    I know this thread is just about incremental backup, and that pretty much
    everything in core is built up from small features into larger, more
    complex ones. I understand that and am not trying to dump on anyone’s
    efforts; I am super excited to see work being done in this area! I just
    wanted to share my perspective on how crucial good backup management is
    to me (and I’m sure a few others share my sentiment, considering how
    popular all the external tools are).
    
    I would never put a system in production without some backup management
    in place. If core builds a backup management tool that uses pg_basebackup
    as a building block for its solution…awesome! That may be something I’d
    use. If pg_basebackup can be improved so that it can serve as the base
    most external backup management tools build on top of, that’s also
    great. The external tools that practically every Postgres company has
    built show that this is clearly a need for a lot of users. Core will
    never solve every single problem for all users, I know that. It would just
    be great to see some of the fundamental features of backup management baked
    into core in an extensible way.
    
    With that, there could be a recommended way to set up backups
    (full/incremental, parallel, compressed), point-in-time recovery, and
    backup retention, and to perform restores (to a point in time, on a
    replica server, etc.) using only the tooling in core, with a nice, simple
    user interface and great performance.
    
    If the features core supports in its own tooling are built in an
    extensible way (as has been discussed), there could be much less
    duplicated work, with each external tool no longer implementing the same
    base features over and over. Those companies could then focus on
    value-added features in their own products that core would never
    support, or on improving the tooling/performance/features core provides.
    
    
    Well, this is way longer and a lot less coherent than I was hoping, so I
    apologize for that. Hopefully my stream of thoughts made a little bit of
    sense to someone.
    
    
    -Adam