Thread

  1. Re: Multi-Entry Indexing for GiST & SP-GiST

    Andrey Borodin <x4mmm@yandex-team.ru> — 2026-05-31T18:27:12Z

    
    > On 21 May 2026, at 22:34, Maxime Schoemans <maxime.schoemans@enterprisedb.com> wrote:
    > 
    > patches attached
    
    Hi Maxime,
    
    I have been reading through the patch set.  I will focus on the GiST side
    here - I know the SP-GiST internals far less well. So I would rather
    discuss the architecture where I can actually be useful.
    
    Skipping dedup for non-duplicated entries
    ------------------------------------------
    
    On the scan path, once an opclass has extractValue, every leaf entry
    goes through the TID hash even when the indexed value produced a single
    sub-entry and therefore cannot collide.  GiST scans are CPU-bound (we
    examine every tuple on the page and run consistent on each), so this
    probe lands on the hot path rather than being hidden behind I/O.
    
    Since multi-entry is gated on a new, non-default opclass, no existing
    index takes this path, so the leaf format for these opclasses is
    effectively new and free to extend. INDEX_AM_RESERVED_BIT (0x2000 in
    t_info) is reserved for exactly such stuff and is currently unused anywhere
    in the backend. We could set it at insert/build time only when extractValue
    returns nentries > 1, and skip the hash on scan for entries without the
    bit; the hash then grows only with genuinely multiplied TIDs.  I am not
    proposing it as a must, just noting the format is new enough to allow it.
    
    One related concern: I am not a big fan of the single-key-column
    restriction. Features like this should be orthogonal to the rest of the
    AM, and "throws an error on more than one column" tends to calcify into a
    permanent limitation rather than a temporary one.
    
    BTW sorting build ignores extract_value. But that's kinda not important at
    current stage.
    
    extractValue == new compress
    ----------------------------
    
    What strikes me in the catalog is that multirange_me_ops drops the
    compress support proc (3) and adds extractValue (13), while multirange_ops
    is the reverse. So extractValue already supplants compress here: it emits
    leaf-typed values directly. Conceptually compress is just extractValue
    constrained to nentries == 1, and the SP-GiST side already makes compress
    optional when extractValue is present, which points at the same overlap.
    
    Was unifying the two considered, rather than carrying two parallel
    support procs?  For example a single "produce leaf entries" entry point,
    with a 1->1 shim over compress for the existing opclasses. That would
    keep the insert/build path single rather than branching on whether
    extractValue exists, and it would frame multi-entry as a generalization
    of what compress already does rather than a parallel mechanism.
    
    Is this useful to PostGIS?
    --------------------------
    
    The motivation that matters most to me is whether the real heavy users of
    GiST will adopt this. Multiranges are a fairly narrow audience on their
    own; the compelling case is multi-part geometries (MultiPolygon with
    holes, routes, regions with exclaves), which is PostGIS territory.
    
    I am adding Darafei and Paul to CC - it would be very helpful to
    hear whether PostGIS would actually use extractValue in their GiST
    opclasses, and whether the single-column restriction or the per-entry
    dedup cost would be a problem in practice for them. If the GIS side is
    on board, the feature is clearly worth itю If not, it is worth knowing
    that when designing the AM-level machinery.
    
    
    Best regards, Andrey Borodin.