0007-parallel-backup-documentation.patch

application/octet-stream

Filename: 0007-parallel-backup-documentation.patch
Type: application/octet-stream
Part: 6
Message: Re: WIP/PoC for parallel backup

From fa4fe2ed932ddef90ff2e4cff1e42715139f8d4c Mon Sep 17 00:00:00 2001
From: Asif Rehman <asif.rehman@highgo.ca>
Date: Thu, 7 Nov 2019 16:52:40 +0500
Subject: [PATCH 7/7] parallel backup documentation

---
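
Reviewer note (not part of the patch): the SEND_BACKUP_FILELIST and
SEND_BACKUP_FILES sections below suggest how a parallel client might
divide work. The following is only a minimal sketch under stated
assumptions, not the pg_basebackup implementation. The helper names
(partition_files, quote, send_backup_files_command) are hypothetical,
the SQL-style quoting of file names is an assumption, requesting only
regular files is an assumption, and START_WAL_LOCATION is left out
because its argument syntax is not shown in the synopsis.

```python
import heapq


def partition_files(files, jobs):
    """Greedily split (path, type, size_kb) rows into `jobs` batches.

    ASSUMPTION: only regular files ('f') are requested; rows of type
    'd' and 'l' have a null size in the file list and are skipped here.
    """
    heap = [(0, i) for i in range(jobs)]  # (total_kb so far, batch index)
    batches = [[] for _ in range(jobs)]
    regular = [f for f in files if f[1] == "f"]
    # Largest files first, each going to the currently lightest batch.
    for path, _type, size_kb in sorted(regular, key=lambda f: f[2], reverse=True):
        total, idx = heapq.heappop(heap)
        batches[idx].append(path)
        heapq.heappush(heap, (total + size_kb, idx))
    return batches


def quote(path):
    # ASSUMPTION: file names are quoted like SQL string literals.
    return "'" + path.replace("'", "''") + "'"


def send_backup_files_command(paths, max_rate_kb=0, verify_checksums=True):
    """Build the command text for one worker connection."""
    # Documented range: zero, or 32 kB through 1 GB (1048576 kB) inclusive.
    if max_rate_kb != 0 and not 32 <= max_rate_kb <= 1048576:
        raise ValueError("MAX_RATE must be 0 or within 32..1048576 kB/s")
    parts = ["SEND_BACKUP_FILES ( " + ", ".join(quote(p) for p in paths) + " )"]
    if max_rate_kb:
        parts.append(f"MAX_RATE {max_rate_kb}")
    if not verify_checksums:
        parts.append("NOVERIFY_CHECKSUMS")
    return " ".join(parts)
```

With -j n, each of the n batches would be sent on its own worker
connection, which matches the n + 1 connections described in the
pg_basebackup section (one extra for the controlling connection).
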
 doc/src/sgml/protocol.sgml          | 386 ++++++++++++++++++++++++++++
 doc/src/sgml/ref/pg_basebackup.sgml |  20 ++
 2 files changed, 406 insertions(+)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 80275215e0..22d620c346 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2700,6 +2700,392 @@ The commands accepted in replication mode are:
      </para>
     </listitem>
   </varlistentry>
+  
+  <varlistentry>
+    <term><literal>START_BACKUP</literal>
+        [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+        [ <literal>PROGRESS</literal> ]
+        [ <literal>FAST</literal> ]
+        [ <literal>TABLESPACE_MAP</literal> ]
+
+     <indexterm><primary>START_BACKUP</primary></indexterm>
+    </term>
+
+    <listitem>
+     <para>
+      Instructs the server to prepare for performing an on-line backup. The following
+      options are accepted:
+      <variablelist>
+       <varlistentry>
+        <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+        <listitem>
+         <para>
+          Sets the label of the backup. If none is specified, a backup label
+          of <literal>start backup</literal> will be used. The quoting rules
+          for the label are the same as a standard SQL string with
+          <xref linkend="guc-standard-conforming-strings"/> turned on.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>PROGRESS</literal></term>
+        <listitem>
+         <para>
+          Request information required to generate a progress report. This will
+          send back an approximate size in the header of each tablespace, which
+          can be used to calculate how far along the stream is done. This is
+          calculated by enumerating all the file sizes once before the transfer
+          is even started, and might as such have a negative impact on the
+          performance.  In particular, it might take longer before the first data
+          is streamed. Since the database files can change during the backup,
+          the size is only approximate and might both grow and shrink between
+          the time of approximation and the sending of the actual files.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>FAST</literal></term>
+        <listitem>
+         <para>
+          Request a fast checkpoint.
+         </para>
+        </listitem>
+       </varlistentry>
+
+       <varlistentry>
+        <term><literal>TABLESPACE_MAP</literal></term>
+        <listitem>
+         <para>
+          Include information about symbolic links present in the directory
+          <filename>pg_tblspc</filename> in a file named
+          <filename>tablespace_map</filename>. The tablespace map file includes
+          each symbolic link name as it exists in the directory
+          <filename>pg_tblspc/</filename> and the full path of that symbolic link.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+     
+     <para>
+      In response to this command, the server will send three result sets.
+     </para>
+     <para>
+      The first ordinary result set contains the starting position of the
+      backup, in a single row with two columns. The first column contains
+      the start position given in XLogRecPtr format, and the second column
+      contains the corresponding timeline ID.
+     </para>
+     
+     <para>
+      The second ordinary result set has one row for each tablespace.
+      The fields in this row are:
+      <variablelist>
+       <varlistentry>
+        <term><literal>spcoid</literal> (<type>oid</type>)</term>
+        <listitem>
+         <para>
+          The OID of the tablespace, or null if it's the base
+          directory.
+         </para>
+        </listitem>
+       </varlistentry>
+       <varlistentry>
+        <term><literal>spclocation</literal> (<type>text</type>)</term>
+        <listitem>
+         <para>
+          The full path of the tablespace directory, or null
+          if it's the base directory.
+         </para>
+        </listitem>
+       </varlistentry>
+       <varlistentry>
+        <term><literal>size</literal> (<type>int8</type>)</term>
+        <listitem>
+         <para>
+          The approximate size of the tablespace, in kilobytes (1024 bytes),
+          if a progress report has been requested; otherwise it's null.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+
+     <para>
+      The final result set contains a single row with two columns. The first
+      column contains the contents of the <filename>backup_label</filename> file,
+      and the second column contains the contents of <filename>tablespace_map</filename>.
+     </para>
+
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><literal>STOP_BACKUP</literal>
+        [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ]
+        [ <literal>WAL</literal> ]
+        [ <literal>NOWAIT</literal> ]
+
+     <indexterm><primary>STOP_BACKUP</primary></indexterm>
+    </term>
+
+    <listitem>
+     <para>
+      Instructs the server to finish performing an on-line backup. The following
+      options are accepted:
+      <variablelist>
+       <varlistentry>
+        <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+        <listitem>
+         <para>
+          Provides the contents of the <filename>backup_label</filename> file to the
+          backup. The contents are the same as those returned by <command>START_BACKUP</command>.
+         </para>
+        </listitem>
+       </varlistentry>
+       <varlistentry>
+        <term><literal>WAL</literal></term>
+        <listitem>
+         <para>
+          Include the necessary WAL segments in the backup. This will include
+          all the files between start and stop backup in the
+          <filename>pg_wal</filename> directory of the base directory tar
+          file.
+         </para>
+        </listitem>
+       </varlistentry>
+       <varlistentry>
+        <term><literal>NOWAIT</literal></term>
+        <listitem>
+         <para>
+          By default, the backup will wait until the last required WAL
+          segment has been archived, or emit a warning if log archiving is
+          not enabled. Specifying <literal>NOWAIT</literal> disables both
+          the waiting and the warning, leaving the client responsible for
+          ensuring the required log is available.
+         </para>
+        </listitem>
+       </varlistentry>
+      </variablelist>
+     </para>
+
+     <para>
+      In response to this command, the server will send one or more CopyResponse
+      results followed by a single result set containing the WAL end position of
+      the backup. The CopyResponse results contain <filename>pg_control</filename> and
+      WAL files, if <literal>STOP_BACKUP</literal> is run with the <literal>WAL</literal> option.
+     </para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><literal>SEND_BACKUP_FILELIST</literal>
+        <indexterm><primary>SEND_BACKUP_FILELIST</primary></indexterm>
+    </term>
+
+    <listitem>
+     <para>
+      Instructs the server to return a list of the files and directories available in
+      the data directory. In response to this command, the server will send one result
+      set per tablespace. Each result set consists of the following fields:
+     </para>
+
+     <variablelist>
+      <varlistentry>
+       <term><literal>path</literal> (<type>text</type>)</term>
+       <listitem>
+        <para>
+         The path and name of the file. For a tablespace, it is an absolute
+         path on the database server; for the <filename>base</filename>
+         tablespace, it is relative to $PGDATA.
+        </para>
+       </listitem>
+      </varlistentry>
+    
+      <varlistentry>
+       <term><literal>type</literal> (<type>char</type>)</term>
+       <listitem>
+        <para>
+         A single character identifying the type of file.
+         <itemizedlist spacing="compact" mark="bullet">
+          <listitem>
+           <para>
+            <literal>'f'</literal> - Regular file. Can be any relation or
+            non-relation file in $PGDATA.
+           </para>
+          </listitem>
+    
+          <listitem>
+           <para>
+            <literal>'d'</literal> - Directory.
+           </para>
+          </listitem>
+    
+          <listitem>
+           <para>
+            <literal>'l'</literal> - Symbolic link.
+           </para>
+          </listitem>
+         </itemizedlist>
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term><literal>size</literal> (<type>int8</type>)</term>
+        <listitem>
+         <para>
+          The approximate size of the file, in kilobytes (1024 bytes). It's null if
+          type is 'd' or 'l'.
+         </para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term><literal>mtime</literal> (<type>int8</type>)</term>
+        <listitem>
+         <para>
+          The last modification time of the file or directory, in seconds since the Epoch.
+         </para>
+        </listitem>
+      </varlistentry>
+     </variablelist>
+
+      <para>
+       This list will contain all files and directories in $PGDATA, regardless of
+       whether they are PostgreSQL files or other files added to the same directory.
+       The only excluded files are:
+       <itemizedlist spacing="compact" mark="bullet">
+        <listitem>
+         <para>
+          <filename>postmaster.pid</filename>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <filename>postmaster.opts</filename>
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <filename>pg_internal.init</filename> (found in multiple directories)
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Various temporary files and directories created during the operation
+          of the PostgreSQL server, such as any file or directory beginning
+          with <filename>pgsql_tmp</filename> and temporary relations.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Unlogged relations, except for the init fork which is required to
+          recreate the (empty) unlogged relation on recovery.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <filename>pg_wal</filename>, including subdirectories. If the backup is run
+          with WAL files included, a synthesized version of <filename>pg_wal</filename> will be
+          included, but it will only contain the files necessary for the
+          backup to work, not the rest of the contents.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <filename>pg_dynshmem</filename>, <filename>pg_notify</filename>,
+          <filename>pg_replslot</filename>, <filename>pg_serial</filename>,
+          <filename>pg_snapshots</filename>, <filename>pg_stat_tmp</filename>, and
+          <filename>pg_subtrans</filename> are copied as empty directories (even if
+          they are symbolic links).
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Files other than regular files and directories, such as symbolic
+          links (other than for the directories listed above) and special
+          device files, are skipped.  (Symbolic links
+          in <filename>pg_tblspc</filename> are maintained.)
+         </para>
+        </listitem>
+       </itemizedlist>
+       Owner, group, and file mode are set if the underlying file system on the server
+       supports it.
+      </para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><literal>SEND_BACKUP_FILES ( <replaceable class="parameter">'FILE'</replaceable> [, ...] )</literal>
+        [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ]
+        [ <literal>NOVERIFY_CHECKSUMS</literal> ]
+        [ <literal>START_WAL_LOCATION</literal> ]
+
+        <indexterm><primary>SEND_BACKUP_FILES</primary></indexterm>
+    </term>
+
+    <listitem>
+     <para>
+      Instructs the server to send the contents of the requested FILE(s).
+     </para>
+      
+     <para>
+      A clause of the form <literal>SEND_BACKUP_FILES ( 'FILE', 'FILE', ... ) [OPTIONS]</literal>
+      is accepted where one or more FILE(s) can be requested.
+     </para>
+
+     <para>
+      In response to this command, one or more CopyResponse results will be sent,
+      one for each FILE requested. The data in the CopyResponse results will be
+      a tar format (following the <quote>ustar interchange format</quote> specified in the
+      POSIX 1003.1-2008 standard) dump of the tablespace contents, except that
+      the two trailing blocks of zeroes specified in the standard are omitted.
+     </para>
+
+     <para>
+      The following options are accepted:
+       <variablelist>
+        <varlistentry>
+         <term><literal>MAX_RATE</literal> <replaceable>rate</replaceable></term>
+         <listitem>
+          <para>
+           Limit (throttle) the maximum amount of data transferred from server
+           to client per unit of time. The expected unit is kilobytes per second.
+           If this option is specified, the value must either be equal to zero
+           or it must fall within the range from 32 kB through 1 GB (inclusive).
+           If zero is passed or the option is not specified, no restriction is
+           imposed on the transfer.
+          </para>
+         </listitem>
+        </varlistentry>
+
+        <varlistentry>
+         <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+         <listitem>
+          <para>
+           By default, checksums are verified during a base backup if they are
+           enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+           this verification.
+          </para>
+         </listitem>
+        </varlistentry>
+
+        <varlistentry>
+         <term><literal>START_WAL_LOCATION</literal></term>
+         <listitem>
+          <para>
+           The starting WAL position at the time the <command>START_BACKUP</command>
+           command was issued, given in XLogRecPtr format.
+          </para>
+         </listitem>
+        </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+  </varlistentry>
 </variablelist>
 
 </para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index fc9e222f8d..339e68bda7 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -536,6 +536,26 @@ PostgreSQL documentation
        </para>
       </listitem>
      </varlistentry>
+     
+     <varlistentry>
+      <term><option>-j <replaceable class="parameter">n</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">n</replaceable></option></term>
+      <listitem>
+       <para>
+        Create <replaceable class="parameter">n</replaceable> threads to copy
+        backup files from the database server. <application>pg_basebackup</application>
+        will open <replaceable class="parameter">n</replaceable> + 1 connections
+        to the database. Therefore, the server must be configured with
+        <xref linkend="guc-max-wal-senders"/> set high enough to accommodate all
+        connections.
+       </para>
+       
+       <para>
+        Parallel mode only works with the plain format.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
    </para>
 
-- 
2.21.0 (Apple Git-122.2)