Thread

Re: Multi-Entry Indexing for GiST & SP-GiST

Andrey Borodin <x4mmm@yandex-team.ru> — 2026-05-31T18:27:12Z

> On 21 May 2026, at 22:34, Maxime Schoemans <maxime.schoemans@enterprisedb.com> wrote:
> 
> patches attached

Hi Maxime,

I have been reading through the patch set.  I will focus on the GiST side
here - I know the SP-GiST internals far less well. So I would rather
discuss the architecture where I can actually be useful.

Skipping dedup for non-duplicated entries
------------------------------------------

On the scan path, once an opclass has extractValue, every leaf entry
goes through the TID hash even when the indexed value produced a single
sub-entry and therefore cannot collide.  GiST scans are CPU-bound (we
examine every tuple on the page and run consistent on each), so this
probe lands on the hot path rather than being hidden behind I/O.

Since multi-entry is gated on a new, non-default opclass, no existing
index takes this path, so the leaf format for these opclasses is
effectively new and free to extend. INDEX_AM_RESERVED_BIT (0x2000 in
t_info) is reserved for exactly such stuff and is currently unused anywhere
in the backend. We could set it at insert/build time only when extractValue
returns nentries > 1, and skip the hash on scan for entries without the
bit; the hash then grows only with genuinely multiplied TIDs.  I am not
proposing it as a must, just noting the format is new enough to allow it.

One related concern: I am not a big fan of the single-key-column
restriction. Features like this should be orthogonal to the rest of the
AM, and "throws an error on more than one column" tends to calcify into a
permanent limitation rather than a temporary one.

BTW sorting build ignores extract_value. But that's kinda not important at
current stage.

extractValue == new compress
----------------------------

What strikes me in the catalog is that multirange_me_ops drops the
compress support proc (3) and adds extractValue (13), while multirange_ops
is the reverse. So extractValue already supplants compress here: it emits
leaf-typed values directly. Conceptually compress is just extractValue
constrained to nentries == 1, and the SP-GiST side already makes compress
optional when extractValue is present, which points at the same overlap.

Was unifying the two considered, rather than carrying two parallel
support procs?  For example a single "produce leaf entries" entry point,
with a 1->1 shim over compress for the existing opclasses. That would
keep the insert/build path single rather than branching on whether
extractValue exists, and it would frame multi-entry as a generalization
of what compress already does rather than a parallel mechanism.

Is this useful to PostGIS?
--------------------------

The motivation that matters most to me is whether the real heavy users of
GiST will adopt this. Multiranges are a fairly narrow audience on their
own; the compelling case is multi-part geometries (MultiPolygon with
holes, routes, regions with exclaves), which is PostGIS territory.

I am adding Darafei and Paul to CC - it would be very helpful to
hear whether PostGIS would actually use extractValue in their GiST
opclasses, and whether the single-column restriction or the per-entry
dedup cost would be a problem in practice for them. If the GIS side is
on board, the feature is clearly worth itю If not, it is worth knowing
that when designing the AM-level machinery.


Best regards, Andrey Borodin.