Thread

  1. Re: Expanding HOT updates for expression and partial indexes

    Greg Burd <greg@burd.me> — 2025-11-19T18:00:29Z

    On Nov 16 2025, at 1:53 pm, Greg Burd <greg@burd.me> wrote:
    
    > 0004 - Enable HOT updates for expression and partial indexes
    > 
    > This finally gets us back to where this project started, but on much
    > more firm ground than before because we're not going to self-deadlock. 
    > The idea has grown from a small function into something larger, but only
    > out of necessity.
    > 
    > In this patch I add ExecWhichIndexesRequireUpdates() in execIndexing.c
    > which implements (c) finding the set of attributes that force new index
    > updates.  This set can be very different from the modified indexed
    > attributes.  We know that some attributes are not equal to their
    > previous versions, but does that mean that the index that references
    > that attribute needs a new index tuple?  It may, or it may not.  Here's
    > the comment on that function that explains:
    > 
    > /*
    > * ExecWhichIndexesRequireUpdates
    > *
    > * Determine which indexes need updating given modified indexed attributes.
    > * This function is a companion to ExecCheckIndexedAttrsForChanges(). 
    > On the
    > * surface, they appear similar but they are doing two very different things.
    > *
    > * For a standard index on a set of attributes this is the intersection of
    > * the mix_attrs and the index attrs (key, expression, but not predicate).
    > *
    > * For expression indexes and indexes which implement the amcomparedatums()
    > * index AM API we'll need to form index datum and compare each
    > attribute to
    > * see if any actually changed.
    > *
    > * For expression indexes the result of the expression might not change
    > at all,
    > * this is common with JSONB columns which require expression indexes
    > and where
    > * it is commonplace to index a field within a document and have
    > updates that
    > * generally don't update that field.
    > *
    > * Partial indexes won't trigger index tuples when the old/new tuples
    > are both
    > * outside of the predicate range.
    > *
    > * For nbtree the amcomparedatums() API is critical as it requires that key
    > * attributes are equal when they memcmp(), which might not be the case when
    > * using type-specific comparison or factoring in collation which might make
    > * an index case insensitive.
    > *
    > * All of this is to say that the goal is for the executor to know,
    > ahead of
    > * calling into the table AM for the update and before calling into the index
    > * AM for inserting new index tuples, which attributes at a minimum will
    > * necessitate a new index tuple.
    > *
    > ...
    > */
    
    Attached are rebased (d5b4f3a6d4e) patches with the only changes
    happening in the last patch in the series.
    
    0004 - Enable HOT updates for expression and partial indexes
    
    I was never happy with the dual functions
    ExecCheckIndexedAttrsForChanges() and ExecWhichIndexesRequireUpdates(),
    it felt like too much overhead and duplication of effort.  While
    updating my tests, adding a few cases, I found that there was also a
    flaw in the logic.  So, time to rewrite and combine them.
    
    What did I discover?  Before the logic was to find the set of modified
    indexed attributes then review all the indexes for changed attributes
    using FormIndexDatum() and comparing before/after to see if expressions
    really changed the value to be indexed or not.  The first pass didn't
    take into account expressions, the second did.  So, an expression index
    over JSONB data wouldn't extract and test the field within the document,
    it was just comparing the entire document before/after using the jsonb
    comparison function, no bueno.
    
    This approach wraps both functions into one somewhat simplified
    function. The logic is basically, iterate over the indexes reviewing
    indexed attributes for changes.  Along the way we call into the new
    index AM's comparison function when present, otherwise we find and use
    the proper type-specific comparison function for the datum.  At the end
    of the function we have our Bitmapset of attributes that should trigger
    new index tuples.
    
    > What's left undone?
    > 
    > * I need to check code coverage so that I might
    
    I did this and it was quite good, I'll do it again for this new series
    but it's nice to see that the tests are exercising the vast majority of
    the code paths.
    
    > * create tests covering all the new cases
    
    I think the coverage is good, maybe even redundant or overly complex in places.
    
    > * update the README.HOT documentation, wiki, etc.
    
    Soon, I hope to have this approach solid and under review before
    solidifying the docs.
    
    > * performance...
    
    Still as yet unmeasured, I know that there is more work per-update to
    perform these checks, so some overhead, but I don't know if that
    overhead is more than before with HeapDetermineColumnsInfo() and
    index_unchanged_by_update().  Those two functions did essentially the
    same thing, only with binary comparison (datumIsEqual()). I need to
    measure that.  What about doing all this work outside of the buffer lock
    in heap_update()?  Surely that'll give back a bit or at least add to
    concurrency.  Forming index tuples a few extra times and evaluating the
    expressions 3 times rather than 1 is going to hurt, I think I can come
    up with a way to cache the formed datum and use it later on, but is that
    worth it?  Complex expressions, yes.  Also, what about expressions that
    expect to be executed once... and now are 3x?  That's what forced my
    update to the insert-conflict-specconflict.out test, but AFAICT there is
    no way to test if an expression's value is going to change on update
    without exercising it once for the old tuple and once for the new tuple.
    Even if it were possible for an index to provide the key it might have
    changed after the expression evaluation (as is the case in hash), so I
    don't think this is avoidable.  Maybe that's reason enough to add a
    reloption to disable the expression evaluation piece of this?  Given
    that it might create a logic or performance regression.  The flip side
    is the potential to use the HOT path, that's a real savings.
    
    One concerning thing is that nbtree's assumption that key attributes for
    TIDs must use binary comparison for equality.  This means that for our
    common case (heap/btree) there is more work per-update than before,
    which is why I need to measure.  I could look into eliminating the
    nbtree requirement, I don't understand it too well as yet by I believe
    that on page split there is an attempt to deduplicate TIDs into a
    TIDBitmap and the test for when that's possible is datumIsEqual().  If
    that were the same as in this new code, possibly evening using
    tts_attr_equal(), then... I don't know, I'll have to investigate.  Chime
    in here if you can educate me on this one. :)
    
    best.
    
    -greg