Re: Multi-Entry Indexing for GiST & SP-GiST
Andrey Borodin <x4mmm@yandex-team.ru> — 2026-05-31T18:27:12Z
> On 21 May 2026, at 22:34, Maxime Schoemans <maxime.schoemans@enterprisedb.com> wrote:
>
> patches attached

Hi Maxime,

I have been reading through the patch set. I will focus on the GiST side here; I know the SP-GiST internals far less well, so I would rather discuss the architecture where I can actually be useful.

Skipping dedup for non-duplicated entries
------------------------------------------

On the scan path, once an opclass has extractValue, every leaf entry goes through the TID hash, even when the indexed value produced a single sub-entry and therefore cannot collide. GiST scans are CPU-bound (we examine every tuple on the page and run consistent on each), so this probe lands on the hot path rather than being hidden behind I/O.

Since multi-entry is gated on a new, non-default opclass, no existing index takes this path, so the leaf format for these opclasses is effectively new and free to extend. INDEX_AM_RESERVED_BIT (0x2000 in t_info) is reserved for exactly this kind of use, and GiST does not use it today (its meaning is per-AM; hash, for instance, uses it as INDEX_MOVED_BY_SPLIT_MASK). We could set it at insert/build time only when extractValue returns nentries > 1, and skip the hash on scan for entries without the bit; the hash then grows only with genuinely multiplied TIDs. I am not proposing it as a must, just noting that the format is new enough to allow it.

One related concern: I am not a big fan of the single-key-column restriction. Features like this should be orthogonal to the rest of the AM, and "throws an error on more than one column" tends to calcify into a permanent limitation rather than a temporary one.

BTW, the sorted build ignores extractValue. But that is not important at the current stage.

extractValue == new compress
----------------------------

What strikes me in the catalog is that multirange_me_ops drops the compress support proc (3) and adds extractValue (13), while multirange_ops does the reverse. So extractValue already supplants compress here: it emits leaf-typed values directly.
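Since extractValue already emits leaf-typed values directly, compress can be read as its single-entry special case. Below is a rough C sketch of that unified shape, with the reserved-bit marking from the previous section folded into the same insert path. Everything here is hypothetical: Datum is a plain typedef rather than PostgreSQL's, and LeafEntryProcs, produce_leaf_entries, compress_as_extract, leaf_tuple_flags, leaf_needs_dedup and the toy_* opclass functions are invented names, not the patch's API or the fmgr calling convention. Only the 0x2000 value of INDEX_AM_RESERVED_BIT is taken from itup.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's Datum; NOT the real fmgr calling convention. */
typedef uintptr_t Datum;

/* Same value as INDEX_AM_RESERVED_BIT in access/itup.h. */
#define INDEX_AM_RESERVED_BIT 0x2000

/* Hypothetical signatures: extractValue fills *entries (malloc'd) and
 * returns their count; compress maps one value to one leaf key. */
typedef int (*ExtractValueFn) (Datum value, Datum **entries);
typedef Datum (*CompressFn) (Datum value);

typedef struct LeafEntryProcs
{
	ExtractValueFn extract_value;	/* NULL for existing opclasses */
	CompressFn	compress;		/* NULL means leaf key == value */
} LeafEntryProcs;

/* 1->1 shim: a compress-only opclass acts as extractValue with nentries == 1. */
static int
compress_as_extract(CompressFn compress, Datum value, Datum **entries)
{
	*entries = malloc(sizeof(Datum));
	if (*entries == NULL)
		abort();
	(*entries)[0] = compress ? compress(value) : value;
	return 1;
}

/* Single "produce leaf entries" entry point, used by insert and build alike. */
static int
produce_leaf_entries(const LeafEntryProcs *procs, Datum value, Datum **entries)
{
	if (procs->extract_value != NULL)
		return procs->extract_value(value, entries);
	return compress_as_extract(procs->compress, value, entries);
}

/* Insert/build side: mark only tuples whose value yielded several entries. */
static uint16_t
leaf_tuple_flags(int nentries)
{
	return nentries > 1 ? INDEX_AM_RESERVED_BIT : 0;
}

/* Scan side: only marked tuples need to go through the TID hash. */
static bool
leaf_needs_dedup(uint16_t t_info)
{
	return (t_info & INDEX_AM_RESERVED_BIT) != 0;
}

/* Toy opclass: a value is a bitmask, each set bit is one "part"
 * (think: one range of a multirange, one polygon of a MultiPolygon). */
static int
toy_extract_value(Datum value, Datum **entries)
{
	size_t		nbits = CHAR_BIT * sizeof(Datum);
	int			n = 0;

	*entries = malloc(nbits * sizeof(Datum));
	if (*entries == NULL)
		abort();
	for (size_t i = 0; i < nbits; i++)
		if (value & ((Datum) 1 << i))
			(*entries)[n++] = (Datum) 1 << i;
	return n;
}

/* Toy compress: one covering key made of all bits from the lowest to the
 * highest set bit (the analogue of a single bounding box). */
static Datum
toy_compress(Datum value)
{
	if (value == 0)
		return 0;
	Datum		lowest = value & (~value + 1);	/* isolate lowest set bit */
	Datum		span = 0;

	for (Datum b = lowest; b != 0 && b <= value; b <<= 1)
		span |= b;
	return span;
}
```

In this sketch an existing opclass keeps working unchanged through the shim, and a value that yields a single part (e.g. the bitmask 4 through toy_extract_value) is never marked, so on scan it costs one mask test instead of a hash probe.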
Conceptually, compress is just extractValue constrained to nentries == 1, and the SP-GiST side already makes compress optional when extractValue is present, which points at the same overlap. Was unifying the two considered, rather than carrying two parallel support procs? For example, a single "produce leaf entries" entry point, with a 1->1 shim over compress for the existing opclasses. That would keep the insert/build path single rather than branching on whether extractValue exists, and it would frame multi-entry as a generalization of what compress already does rather than a parallel mechanism.

Is this useful to PostGIS?
--------------------------

The motivation that matters most to me is whether the real heavy users of GiST will adopt this. Multiranges are a fairly narrow audience on their own; the compelling case is multi-part geometries (MultiPolygon with holes, routes, regions with exclaves), which is PostGIS territory.

I am adding Darafei and Paul to CC. It would be very helpful to hear whether PostGIS would actually use extractValue in its GiST opclasses, and whether the single-column restriction or the per-entry dedup cost would be a problem for them in practice. If the GIS side is on board, the feature is clearly worth it. If not, it is worth knowing that when designing the AM-level machinery.

Best regards,
Andrey Borodin.