Thread

  1. Re: Expanding HOT updates for expression and partial indexes

    Greg Burd <greg@burd.me> — 2025-12-03T22:06:06Z

    On Nov 24 2025, at 1:59 pm, Matthias van de Meent
    <boekewurm+postgres@gmail.com> wrote:
    
    > ... <awesome thoughtful questions, insights, etc> ...
    > 
    > Kind regards,
    > 
    > Matthias van de Meent
    
    Hey Matthias,
    
    I've updated the patch set to v25 and taken your suggested approach of
    minimizing changes, in the hope of getting the majority of this patch
    series committed soon.  By that I mean that $subject isn't really
    accurate anymore.  This patch uses datumIsEqual(), but still
    introduces the new index AM API and still moves
    HeapDetermineColumnsInfo() into a function in nodeModifyTable.c called
    ExecWhichIndexesRequireUpdates().  The "big idea" is that before calling
    into the table AM to update a tuple the executor should know what the
    impact of that update will be for the indexes on the relation. 
    ExecWhichIndexesRequireUpdates() finds the set of attributes that were
    a) modified and b) are referenced by an index and cause that index to
    need a new index tuple.  This is left up to heap now, but really should
    be generic across all table AMs (IMO), so that's what I've done.
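
    As an illustration only (this is not code from the patch), the
    "modified and indexed" computation described above can be sketched in
    plain C.  AttrValue, attr_is_equal() and changed_indexed_attrs() are
    hypothetical names, a 64-bit mask stands in for a Bitmapset, and the
    comparison is bytewise in the spirit of datumIsEqual():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one attribute value; not a PostgreSQL Datum. */
typedef struct AttrValue
{
    bool        isnull;
    size_t      len;
    const void *data;
} AttrValue;

/* Bytewise equality only; no type-specific equality operators are used. */
bool
attr_is_equal(const AttrValue *a, const AttrValue *b)
{
    if (a->isnull || b->isnull)
        return a->isnull && b->isnull;
    if (a->len != b->len)
        return false;
    return a->len == 0 || memcmp(a->data, b->data, a->len) == 0;
}

/*
 * Return a mask in which bit i is set when attribute i is referenced by
 * some index (per indexed_attrs) and its value differs between the old
 * and new tuple.  Attributes that no index references are never compared.
 */
uint64_t
changed_indexed_attrs(const AttrValue *oldtup, const AttrValue *newtup,
                      int natts, uint64_t indexed_attrs)
{
    uint64_t    changed = 0;

    assert(natts >= 0 && natts <= 64);

    for (int i = 0; i < natts; i++)
    {
        uint64_t    bit = UINT64_C(1) << i;

        if ((indexed_attrs & bit) == 0)
            continue;
        if (!attr_is_equal(&oldtup[i], &newtup[i]))
            changed |= bit;
    }
    return changed;
}
```

    Nothing in this sketch touches a page or a buffer, which is what lets
    the comparison run before the table AM is called.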
    
    At some point in the future maybe there's a way to switch from the
    heap-specific model of all/none/summarized-only to something where we
    only update indexes that really require the updates (HOT, WARM, etc.)
    and I think this is a step in that direction, but for now the logic
    remains the same as does the signal (TU_Updated).
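
    For readers less familiar with the all/none/summarized-only model, a
    rough sketch of the three-way decision follows.  The enum and function
    names are hypothetical, 64-bit masks stand in for attribute sets, and
    the sketch assumes the new tuple version fits on the same page (the
    real heap code also has to check that before choosing HOT):

```c
#include <stdint.h>

/* Hypothetical result type for this sketch only. */
typedef enum UpdateIndexesSketch
{
    SKETCH_UPDATE_NONE,         /* HOT: no index needs a new entry */
    SKETCH_UPDATE_SUMMARIZING,  /* HOT, but summarizing indexes need updates */
    SKETCH_UPDATE_ALL           /* non-HOT: every index gets a new entry */
} UpdateIndexesSketch;

/*
 * modified:         attributes whose values changed
 * blocking_attrs:   attributes referenced by non-summarizing indexes
 * summarized_attrs: attributes referenced by summarizing indexes
 */
UpdateIndexesSketch
classify_update(uint64_t modified, uint64_t blocking_attrs,
                uint64_t summarized_attrs)
{
    if (modified & blocking_attrs)
        return SKETCH_UPDATE_ALL;
    if (modified & summarized_attrs)
        return SKETCH_UPDATE_SUMMARIZING;
    return SKETCH_UPDATE_NONE;
}
```

    Note that a single changed attribute in any non-summarizing index
    forces new entries in every index, which is the all-or-nothing
    behaviour a per-index model would relax.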
    
    I'll re-introduce $subject as a layer on v24 next week at which point
    I'll try to address all the good points you raised in your email.  Those
    additions (HOT expressions, HOT partial indexes, and type-specific
    equality tests) are the most controversial.
    
    I think these first few patches may be less controversial and could
    make the cut sooner while those other ideas remain up for debate.  I'm
    open to that; I think the work in the attached set is good and
    valuable on its own.
    
    So, the attached patch set 
    
    Benefits of the patch:
    * the tests for what changed move outside of the buffer lock
    * the redundant index_unchanged_by_update() is removed
    
    Downsides of the patch:
    * a bit of new overhead in some cases
    * a bit more complicated logic than before
    
    Recall that this patch set combines with another one of mine on the
    list [1], which covers the simple_heap_update() path.  This one doesn't
    cover that case, and you'll see that simple_heap_update() still depends
    on HeapDetermineColumnsInfo().
    
    * 0001 - Prepare heapam_tuple_update() and simple_heap_update() for divergence
    
    This splits off the top of heap_update() and places that logic in both
    heapam_tuple_update() and simple_heap_update().  This patch is also
    present in the other thread [1] and essentially the same.  That thread
    addresses the changes to the catalog tuple updates.  No real effort was
    made to make this patch "pretty" or "stand-alone" as it really is a
    precursor to the work in 0002 and in [1].
    
    * 0002 - Track changed indexed columns in the executor during UPDATEs
    
    This is where the meat is, as described in earlier emails and the commit
    message. HeapDetermineColumnsInfo() logic moves up into the executor
    into ExecWhichIndexesRequireUpdates().  Some heap-specific logic related
    to replica identity remains in heapam_tuple_update().
    
    * 0003 - Replace index_unchanged_by_update() with ri_ChangedIndexedCols
    
    This removes the now-redundant index_unchanged_by_update() function and
    instead uses the information gathered in
    ExecWhichIndexesRequireUpdates() and recorded in ri_ChangedIndexedCols
    for the same outcome.
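
    As a rough sketch of that idea (not the patch's code), a per-index
    "unchanged" hint can be derived from one precomputed set of changed
    indexed columns instead of re-examining the update for each index.
    IndexSketch and both function names are hypothetical, and a 64-bit
    mask again stands in for a Bitmapset:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-index descriptor: bit i set means attribute i is a key column. */
typedef struct IndexSketch
{
    uint64_t    key_attrs;
} IndexSketch;

/* True when none of the index's key columns are in the changed-indexed set. */
bool
index_keys_unchanged(const IndexSketch *index, uint64_t changed_indexed_cols)
{
    return (index->key_attrs & changed_indexed_cols) == 0;
}

/* Fill hints[i] for every index from the single precomputed set. */
void
compute_unchanged_hints(const IndexSketch *indexes, int nindexes,
                        uint64_t changed_indexed_cols, bool *hints)
{
    assert(nindexes >= 0);

    for (int i = 0; i < nindexes; i++)
        hints[i] = index_keys_unchanged(&indexes[i], changed_indexed_cols);
}
```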
    
    
    best.
    
    -greg
    
    [1] https://www.postgresql.org/message-id/flat/2C5C8B8D-8B36-4547-88EB-BDCF9A7C8D94@greg.burd.me