gist-bufferbuild-docupdates-edited.patch
text/x-diff
Filename: gist-bufferbuild-docupdates-edited.patch
Type: text/x-diff
Part: 0
Message:
Re: WIP: Fast GiST index build
diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 78171cf..244a2e2 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -642,6 +642,35 @@ my_distance(PG_FUNCTION_ARGS)
</variablelist>
+ <sect2 id="gist-buffering-build">
+ <title>GiST buffering build</title>
+ <para>
+ Building large GiST indexes that don't fit in cache by simply inserting
+ all the tuples tends to be slow, because if the index tuples are scattered
+ across the index, a large fraction of the insertions need to perform
+ I/O. The exception is well-ordered datasets, where the part of the index
+ where new insertions go to stays well cached.
+ PostgreSQL from version 9.2 supports a more efficient method to build
+ GiST indexes based on buffering, which can dramatically reduce number of
+ random I/O needed.
+ </para>
+
+ <para>
+ However, buffering index build needs to call the <function>penalty</>
+ function more often, which consumes some extra CPU resources. Also, it can
+ infuence the quality of the produced index, in both positive and negative
+ directions. That influence depends on various factors, like the
+ distribution of the input data and operator class implementation.
+ </para>
+
+ <para>
+ By default, the index build switches to the buffering method when the
+ index size reaches <xref linkend="guc-effective-cache-size">. It can
+ be manually turned on or off by the <literal>BUFFERING</literal> parameter
+ to the CREATE INDEX clause.
+ </para>
+
+ </sect2>
</sect1>
<sect1 id="gist-examples">
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 1a1e8d6..1b2969e 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -341,6 +341,52 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</
</varlistentry>
</variablelist>
+ <para>
+ GiST indexes accept the following parameter:
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>BUFFERING</></term>
+ <listitem>
+ <para>
+ Determines whether the buffering build technique described in
+ <xref linkend="gist-buffering-build"> is used to build the index. With
+ <literal>OFF</> it is disabled, with <literal>ON</> it is enabled, and
+ with <literal>AUTO</> (default) it is initially disabled, but turned on
+ on-the-fly once the index size reaches <xref linkend="guc-effective-cache-size">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>LEVELSTEP</></term>
+ <listitem>
+ <para>
+ In buffering build buffers located at tree levels i * <literal>LEVELSTEP</>,
+ i > 0 (we use upward level numbering, level = 0 corresponds to leaf pages).
+ By default <literal>LEVELSTEP</> is calculated so that sub-tree
+ of <literal>LEVELSTEP</> height fits <xref linkend="guc-effective-cache-size">
+ and <xref linkend="guc-maintenance-work-mem">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>BUFFERSIZE</></term>
+ <listitem>
+ <para>
+ Maximum size of node buffer in pages. By default it is calculated so that
+ half emptying of node buffer fill in average one page per underlying node
+ buffer. This ratio guarantees effective IO usage. In some cases lower
+ <literal>BUFFERSIZE</> can give comparable IO economy with less CPU
+ overhead.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
</refsect2>
<refsect2 id="SQL-CREATEINDEX-CONCURRENTLY">