Thread

  1. Automatic tablespace management in pg_basebackup

    Thom Brown <thom@linux.com> — 2024-04-27T03:07:14Z

    Hi,
    
    Manually specifying tablespace mappings in pg_basebackup, especially in
    environments where tablespaces can come and go, or with incremental
    backups, can be tedious and error-prone. I propose a solution using
    pattern-based mapping to automate this process.
    
    So rather than having to specify:
    
    -T /path/to/original/tablespace/a=/path/to/backup/tablespace/a
    -T /path/to/original/tablespace/b=/path/to/backup/tablespace/b
    
    And then coming up with a new location to map to for the subsequent
    incremental backups, perhaps we could have a parameter (I’m just going to
    choose M for “mapping”), like so:
    
    -M %p/%d_backup_1.1
    
    Where it can interpolate the following values:
    %p = parent path of the original tablespace location
    %d = final directory name of the original tablespace location
    %l = backup label (not sure about this one)
    
    
    Using the -M example above, when pg_basebackup finds:
    
    /path/to/original/tablespace/a
    /path/to/original/tablespace/b
    
    It creates:
    
    /path/to/original/tablespace/a_backup_1.1
    /path/to/original/tablespace/b_backup_1.1
    
    
    Or:
    
    -M /path/to/backup/tablespaces/1.1/%d
    
    Creates:
    
    /path/to/backup/tablespaces/1.1/a
    /path/to/backup/tablespaces/1.1/b
    
    
    Or possibly allowing something like %l to insert the backup label.
    
    For example:
    
    -M /path/to/backup/tablespaces/%d_%l -l 1.1
    
    Creates:
    
    /path/to/backup/tablespaces/a_1.1
    /path/to/backup/tablespaces/b_1.1
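
    As a rough sketch of how that interpolation might behave (Python, purely
    for illustration; -M does not exist, and expand_mapping, the %%
    escape, and the error on unknown placeholders are all assumptions
    rather than anything pg_basebackup does today):

```python
import posixpath


def expand_mapping(pattern, tablespace_path, label=""):
    """Expand a hypothetical -M pattern for one tablespace location."""
    path = tablespace_path.rstrip("/")
    values = {
        "p": posixpath.dirname(path),   # %p: parent path of the tablespace
        "d": posixpath.basename(path),  # %d: final directory name
        "l": label,                     # %l: backup label, as passed to -l
        "%": "%",                       # %%: literal percent sign (assumed)
    }
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            # Reject a trailing % or a placeholder we do not know about.
            if i + 1 >= len(pattern) or pattern[i + 1] not in values:
                raise ValueError(f"unknown placeholder at offset {i} in {pattern!r}")
            out.append(values[pattern[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


for ts in ("/path/to/original/tablespace/a", "/path/to/original/tablespace/b"):
    print(expand_mapping("%p/%d_backup_1.1", ts))
    print(expand_mapping("/path/to/backup/tablespaces/1.1/%d", ts))
    print(expand_mapping("/path/to/backup/tablespaces/%d_%l", ts, label="1.1"))
```

    Failing loudly on an unknown placeholder seems safer than passing it
    through, since a typo would otherwise end up as a literal directory name.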
    
    
    This of course would not work if there were tablespaces as follows:
    
    /path/to/first/tablespace/a
    /path/to/second/tablespace/a
    
    Where %d would yield the same result for both tablespaces.  However, this
    seems like an unlikely scenario: tablespace names within the database have
    to be unique, so it would be odd for the user to then choose directory
    names that aren't.  This could just be a scenario that isn't supported.
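
    If the unsupported case were to be rejected up front, the check could be
    as simple as the following sketch (Python for illustration only;
    find_collisions is a made-up name, and it only matters for patterns
    that use %d without %p):

```python
import posixpath
from collections import defaultdict


def find_collisions(locations):
    """Group tablespace locations by final directory name (what %d would
    yield) and return only the names shared by more than one location."""
    by_name = defaultdict(list)
    for loc in locations:
        by_name[posixpath.basename(loc.rstrip("/"))].append(loc)
    return {name: locs for name, locs in by_name.items() if len(locs) > 1}


clashes = find_collisions([
    "/path/to/first/tablespace/a",
    "/path/to/second/tablespace/a",
    "/path/to/first/tablespace/b",
])
print(clashes)
# {'a': ['/path/to/first/tablespace/a', '/path/to/second/tablespace/a']}
```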
    
    Perhaps even allow it to auto-increment a version number it defines
    itself.  Maybe %v implies “make up a version number here, and if one
    existed in the manifest previously, increment it”.
    
    
    Ultimately, it would turn this:
    
    pg_basebackup
      -D /Users/thombrown/Development/backups/data1.5
      -h /tmp
      -p 5999
      -c fast
      -U thombrown
      -l 1.5
      -T /Users/thombrown/Development/tablespaces/ts_a=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_a
      -T /Users/thombrown/Development/tablespaces/ts_b=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_b
      -T /Users/thombrown/Development/tablespaces/ts_c=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_c
      -T /Users/thombrown/Development/tablespaces/ts_d=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_d
      -i /Users/thombrown/Development/backups/data1.4/backup_manifest
    
    Into this:
    
    pg_basebackup
      -D /Users/thombrown/Development/backups/1.5/data
      -h /tmp
      -p 5999
      -c fast
      -U thombrown
      -l 1.5
      -M /Users/thombrown/Development/backups/tablespaces/%v/%d
      -i /Users/thombrown/Development/backups/data1.4/backup_manifest
    
    In fact, if I were permitted to get carried away:
    
    -D /Users/thombrown/Development/backups/%v/%d
    
    Then, the only thing that needs changing for each incremental backup is the
    manifest location (and optionally the label).
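
    Until something like this exists, the same effect can be approximated
    with a small wrapper that turns a pattern into the explicit -T options
    pg_basebackup already accepts.  A minimal sketch, assuming the caller
    supplies the version string and the list of tablespace locations
    (tablespace_mapping_args is a hypothetical helper, and only %v and %d
    are handled):

```python
import posixpath


def tablespace_mapping_args(pattern, locations, version):
    """Build explicit -T OLDDIR=NEWDIR arguments from a %v/%d pattern."""
    args = []
    for loc in locations:
        name = posixpath.basename(loc.rstrip("/"))
        # Plain substitution is enough for a sketch; no %% escaping here.
        target = pattern.replace("%v", version).replace("%d", name)
        args += ["-T", f"{loc}={target}"]
    return args


args = tablespace_mapping_args(
    "/Users/thombrown/Development/backups/tablespaces/%v/%d",
    [
        "/Users/thombrown/Development/tablespaces/ts_a",
        "/Users/thombrown/Development/tablespaces/ts_b",
    ],
    version="1.5",
)
for flag, mapping in zip(args[::2], args[1::2]):
    print(flag, mapping)
```

    The resulting list can be appended to the rest of the pg_basebackup
    argument vector before invoking it.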
    
    
    Given that pg_combinebackup has the same -T option, I imagine something
    similar would need to be added there too.  We should already know where the
    tablespaces reside, as they are in the final backup specified in the list
    of backups, so that seems to just be a matter of getting input on how the
    tablespaces should be named in the reconstructed backup.
    
    For example:
    
    pg_combinebackup
      -T /Users/thombrown/Development/backups/tablespaces/1.4/ts_a=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_a
      -T /Users/thombrown/Development/backups/tablespaces/1.4/ts_b=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_b
      -T /Users/thombrown/Development/backups/tablespaces/1.4/ts_c=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_c
      -T /Users/thombrown/Development/backups/tablespaces/1.4/ts_d=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_d
      -o /Users/thombrown/Development/backups/combined
      /Users/thombrown/Development/backups/data{1.0_full,1.1,1.2,1.3,1.4}
    
    Becomes:
    
    pg_combinebackup
      -M /Users/thombrown/Development/backups/tablespaces/%v_combined/%d
      -o /Users/thombrown/Development/backups/%v_combined/%d
      /Users/thombrown/Development/backups/{1.0_full,1.1,1.2,1.3,1.4}/data
    
    You may have inferred that I decided pg_combinebackup increments the
    version to the next major version, whereas pg_basebackup in incremental
    mode increments the minor version number.
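
    In other words, the %v rule would be roughly the following (a Python
    sketch only; next_version is a made-up name, it assumes versions always
    look like MAJOR.MINOR, and where the previous version is read from is
    left open):

```python
def next_version(previous, combine=False):
    """Return the version %v would expand to, given the previous one.

    An incremental pg_basebackup bumps the minor number; pg_combinebackup
    moves to the next major version and resets the minor number to 0.
    """
    major, minor = (int(part) for part in previous.split("."))
    if combine:
        return f"{major + 1}.0"
    return f"{major}.{minor + 1}"


print(next_version("1.4"))                # 1.5
print(next_version("1.4", combine=True))  # 2.0
```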
    
    This, of course, becomes messy if the user decides to include the version
    number in the backup tablespace directory name, but then these sorts of
    things need to be figured out before going into production anyway.
    
    I also get the feeling that accepting an unquoted % on the command line
    could be problematic, for instance if it has a special meaning in some
    shell that I haven't accounted for here.  In that case, the pattern may
    need quoting.
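
    One quick sanity check for POSIX shells is Python's shlex.quote, which
    only adds quotes when a string contains characters it considers unsafe
    for the shell.  It leaves % alone, which suggests a pattern like the ones
    above can be passed unquoted there.  Other environments are a different
    story: an unescaped % in a crontab entry is turned into a newline, and
    Windows batch files use % for variable expansion, so quoting or escaping
    may still be needed in those places.

```python
import shlex

pattern = "/path/to/backup/tablespaces/%v/%d"

# % is in shlex's set of safe characters, so the pattern comes back unchanged.
print(shlex.quote(pattern))

# Quotes appear only once something shell-significant (here, a space) is present.
print(shlex.quote("/path/to/backup dir/%d"))
```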
    
    Thoughts?
    
    Regards
    
    Thom