Thread

Re: Expanding HOT updates for expression and partial indexes

Greg Burd <greg@burd.me> — 2025-11-19T18:00:29Z
On Nov 16 2025, at 1:53 pm, Greg Burd <greg@burd.me> wrote:

> 0004 - Enable HOT updates for expression and partial indexes
> 
> This finally gets us back to where this project started, but on much
> more firm ground than before because we're not going to self-deadlock. 
> The idea has grown from a small function into something larger, but only
> out of necessity.
> 
> In this patch I add ExecWhichIndexesRequireUpdates() in execIndexing.c
> which implements (c) finding the set of attributes that force new index
> updates.  This set can be very different from the modified indexed
> attributes.  We know that some attributes are not equal to their
> previous versions, but does that mean that the index that references
> that attribute needs a new index tuple?  It may, or it may not.  Here's
> the comment on that function that explains:
> 
> /*
> * ExecWhichIndexesRequireUpdates
> *
> * Determine which indexes need updating given modified indexed attributes.
> * This function is a companion to ExecCheckIndexedAttrsForChanges(). 
> On the
> * surface, they appear similar but they are doing two very different things.
> *
> * For a standard index on a set of attributes this is the intersection of
> * the mix_attrs and the index attrs (key, expression, but not predicate).
> *
> * For expression indexes and indexes which implement the amcomparedatums()
> * index AM API we'll need to form index datum and compare each
> attribute to
> * see if any actually changed.
> *
> * For expression indexes the result of the expression might not change
> at all,
> * this is common with JSONB columns which require expression indexes
> and where
> * it is commonplace to index a field within a document and have
> updates that
> * generally don't update that field.
> *
> * Partial indexes won't trigger index tuples when the old/new tuples
> are both
> * outside of the predicate range.
> *
> * For nbtree the amcomparedatums() API is critical as it requires that key
> * attributes are equal when they memcmp(), which might not be the case when
> * using type-specific comparison or factoring in collation which might make
> * an index case insensitive.
> *
> * All of this is to say that the goal is for the executor to know,
> ahead of
> * calling into the table AM for the update and before calling into the index
> * AM for inserting new index tuples, which attributes at a minimum will
> * necessitate a new index tuple.
> *
> ...
> */

Attached are rebased (d5b4f3a6d4e) patches with the only changes
happening in the last patch in the series.

0004 - Enable HOT updates for expression and partial indexes

I was never happy with the dual functions
ExecCheckIndexedAttrsForChanges() and ExecWhichIndexesRequireUpdates(),
it felt like too much overhead and duplication of effort.  While
updating my tests, adding a few cases, I found that there was also a
flaw in the logic.  So, time to rewrite and combine them.

What did I discover?  Before the logic was to find the set of modified
indexed attributes then review all the indexes for changed attributes
using FormIndexDatum() and comparing before/after to see if expressions
really changed the value to be indexed or not.  The first pass didn't
take into account expressions, the second did.  So, an expression index
over JSONB data wouldn't extract and test the field within the document,
it was just comparing the entire document before/after using the jsonb
comparison function, no bueno.

This approach wraps both functions into one somewhat simplified
function. The logic is basically, iterate over the indexes reviewing
indexed attributes for changes.  Along the way we call into the new
index AM's comparison function when present, otherwise we find and use
the proper type-specific comparison function for the datum.  At the end
of the function we have our Bitmapset of attributes that should trigger
new index tuples.

> What's left undone?
> 
> * I need to check code coverage so that I might

I did this and it was quite good, I'll do it again for this new series
but it's nice to see that the tests are exercising the vast majority of
the code paths.

> * create tests covering all the new cases

I think the coverage is good, maybe even redundant or overly complex in places.

> * update the README.HOT documentation, wiki, etc.

Soon, I hope to have this approach solid and under review before
solidifying the docs.

> * performance...

Still as yet unmeasured, I know that there is more work per-update to
perform these checks, so some overhead, but I don't know if that
overhead is more than before with HeapDetermineColumnsInfo() and
index_unchanged_by_update().  Those two functions did essentially the
same thing, only with binary comparison (datumIsEqual()). I need to
measure that.  What about doing all this work outside of the buffer lock
in heap_update()?  Surely that'll give back a bit or at least add to
concurrency.  Forming index tuples a few extra times and evaluating the
expressions 3 times rather than 1 is going to hurt, I think I can come
up with a way to cache the formed datum and use it later on, but is that
worth it?  Complex expressions, yes.  Also, what about expressions that
expect to be executed once... and now are 3x?  That's what forced my
update to the insert-conflict-specconflict.out test, but AFAICT there is
no way to test if an expression's value is going to change on update
without exercising it once for the old tuple and once for the new tuple.
Even if it were possible for an index to provide the key it might have
changed after the expression evaluation (as is the case in hash), so I
don't think this is avoidable.  Maybe that's reason enough to add a
reloption to disable the expression evaluation piece of this?  Given
that it might create a logic or performance regression.  The flip side
is the potential to use the HOT path, that's a real savings.

One concerning thing is that nbtree's assumption that key attributes for
TIDs must use binary comparison for equality.  This means that for our
common case (heap/btree) there is more work per-update than before,
which is why I need to measure.  I could look into eliminating the
nbtree requirement, I don't understand it too well as yet by I believe
that on page split there is an attempt to deduplicate TIDs into a
TIDBitmap and the test for when that's possible is datumIsEqual().  If
that were the same as in this new code, possibly evening using
tts_attr_equal(), then... I don't know, I'll have to investigate.  Chime
in here if you can educate me on this one. :)

best.

-greg