Thread

  1. Refactor how we form HeapTuples for CatalogTuple(Insert|Update)

    Greg Burd <greg@burd.me> — 2025-11-14T15:31:28Z

    Hello,
    
    I've been working on expanding HOT updates [1] and have recently had
    some success (will be posted soon to the thread) moving the work done
    within HeapDetermineColumnsInfo() into the executor.  I found that
    heap_update() is used in two paths: heap_tuple_update() and
    simple_tuple_update(), both call into heap_update() which then uses
    HeapDetermineColumnsInfo() to find the set of indexed attributes that
    have been modified during the update.  This makes sense on the normal
    path from the executor, but the simple_tuple_update() path is only
    called from CatalogTupleUpdate() and when I looked at how catalog tuples
    were formed/mutated on that path I discovered something that I had to fix.
    
    Catalog tuples have knowledge of what attributes are mutated implicitly,
    but they don't preserve that information and so
    HeapDetermineColumnsInfo() has to re-discover that later on (while the
    page is locked).  While on the surface this isn't such a big deal, it's
    been that way and worked well for years I have other motivations (see
    [1] again) for improving this.
    
    So, I set about to create a way to retain the knowledge of what was
    modified while mutating catalog tuples and pass that information on to
    CatalogTupleUpdate() as a Bitmapset.  That is the underlying goal of
    this patch set.  To make no behavioral changes, only syntactic ones, and
    end up with something better overall.
    
    That turned out to be a bit of a challenge as we have a lot of places
    where catalog information is updated.  We also have two different
    methods for updating said information.  And we intermix them.  At times
    we hack our way to a solution and ignore convention.  I went down the
    rabbit hole on this one, but I'm back up for a cup of tea because what
    I've done seems materially better to me and I've accomplished the goal
    (and can eventually make use of the result in [1]).  Is this patch
    useful even without that other work?  I think so, just on the basis of
    cleanup. Let me explain.
    
    Historically there have been two methods for modifying heap tuples
    destined for the catalog; Form/GETSTRUCT, and values/nulls/replaces.
    This new method provides a more intuitive and less error-prone approach
    without changing the fundamentals of the process.  In addition, it is
    now possible to retain knowledge of the set of mutated attributes when
    working with catalog tuples.  This isn't used in practice (yet [1]), but
    could be a way to avoid the overhead of HeapDetermineColumnsInfo() in
    heap_update() where (while holding a lock) we re-discover that set by
    comparing old/new HeapTuple datums when trying to find what indexed
    attributes have new values and prevent HOT updates.
    
    Method 1: Form/GETSTRUCT
    
    The "Form/GETSTRUCT" model allows for direct access to the tuple data
    that is then modified, copied, and then updated via CatalogTupleUpdate().
    
    Old:
    Form_pg_index relform = (Form_pg_index) GETSTRUCT(tuple);
    relform->inisclustered = false;
    CatalogTupleUpdate(pg_index, &tuple->t_self, tuple);
    
    New:
    Bitmapset *updated = NULL;
    Form_pg_index relform = (Form_pg_index) GETSTRUCT(tuple);
    HeapTupleUpdateField(pg_index, indisclustered, false, indexForm, updated);
    CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple, updated, NULL);
    bms_free(updated);
    
    
    Method 2: values/nulls/replaces
    
    The "values/nulls/replaces" model collects the necessary information to
    either update or create a heap tuple using heap_modify_tuple() or
    heap_form_tuple() or sometimes heap_modify_tuple_by_cols().  While all
    those functions remain unchanged now the prefered function for catalog
    tuple modification is heap_update_tuple().
    
    Old:
    bool    nulls[Natts_pg_type];
    bool    replaces[Natts_pg_type];
    Datum   values[Natts_pg_type];
    
    values[Anum_pg_type_typtype - 1] = CharGetDatum(typeType);
    nulls[Anum_pg_type_typdefaultbin - 1] = true;
    replaces[Anum_pg_type_oid - 1] = false;
    
    tuple = heap_modify_tuple(tuple, desc, values, nulls, replaces);
    CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple);
    
    New:
    Datum       values[Natts_pg_type] = {0};
    bool        nulls[Natts_pg_type] = {false};
    Bitmapset  *updated = NULL;
    
    HeapTupleUpdateValue(pg_type, typtype, CharGetDatum(typeType), values,
    nulls, updated);
    HeapTupleUpdateValueNull(pg_type, typdefaultbin, values, nulls, updated);
    HeapTupleSetColumnNotUpdated(pg_type, oid, updated);
    
    tuple = heap_update_tuple(tuple, desc, values, nulls, updated);
    CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple, updated, NULL);
    
    In addition CatalogTupleUpdate and CatalogTupleInsert now have an
    optional argument for the CatalogIndexState, when not provided it is
    created and then released for the caller.  This eliminates the need for
    their "WithInfo" companion functions.
    
    heap_update_tuple() is functionally equivalent to heap_modify_tuple(),
    but takes a Bitmapset called "updated" rather than an array of bool
    (generally) called "replaces" as a method for indicating what was
    modified. Additionally, this new function tries to balance the
    trade-offs of calling heap_getattr() versus heap_deform_tuple() based on
    the ratio of attributes updated and their known runtime complexities. 
    Both paths are functionally equivalent.
    
    The changes also include initialization of the values/nulls arrays
    rather than loops or memset().  Some effort was also given to unify
    naming of these variables and their order so as to lower cognitive
    overhead.
    
    So, let me dive into the attached patches.  On advice from my mentor
    Noah Misch I've divided the changes in two.  The first patch contains
    all the infrastructure and the set of files where applying the changes
    got "interesting" and should be highlighted.  The second is the
    mechanical changes to the remaining files, and it's HUGE, apologies for
    the blast radius here but it couldn't be helped.
    
    0001 - Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
    
    htup.h: is where you'll find the new macros.  There is a "getter" called
    HeapTupleValue() to find the field in the values array and not worry
    about "-1".  After that, they break down into a two cases, insert and
    update, with forms addressing method 1 (Form/GETSTRUCT) and method 2 (values/nulls/updated).
    
    Insert macros take the form "HeapTupleSet...()" as in
    "HeapTupleSetField(...)" or "HeapTupleSetValue(...)".  There is the
    assignment to NULL which is "HeapTupleSetFieldNull(...)" and
    "HeapTupleSetValueNull(...)" as well.
    
    Arguments to these macros are:
       table_name, field, value, (nulls), (form_ptr)
    
    These macros simply translate back to the forms that we use now and in
    some cases combine work or ensure correctness.
    
    Update macros follow a similar pattern "HeapTupleUpdate...()" and then
    have "HeapTupleUpdateField..." and "HeapTupleUpdateValue..." cases to
    cover the two methods we use.
    
    Arguments are again uniform:
        table_name, field, value, values, nulls, updated
    
    There are additional macros for a few update-related special cases as well:
    
    * HeapTupleMarkColumnUpdated() - not something you'll normally use as
    the update methods all capture this information already, but there are
    occasions where this is useful.
    
    * HeapTupleUpdateSetAllColumnsUpdated() - 99.9% of the time we have
    historically started off with a "replaces" array where every value in
    that array is "false", but there are a few places where we had occasion
    to do the opposite, start with the assumption that all attributes were
    modified.  This covers that case setting all the bits in the "updated"
    Bitmapset correctly.
    
    
    heaptuple.c: you'll find here my new heap_update_tuple() function.  This
    one is interesting as it tries to balance out the cost of heap_getattr()
    vs heap_deform_tuple().  Why did I bother?  Read the comment on
    heap_modify_tuple(), and I was there... so why not?  You may disagree
    with my approach, here's a portion of the comment that explains it:
    
     * Performance strategy:
     * - If updating many attributes (> 2*natts/3), use heap_getattr() to extract
     *   only the few non-updated attributes. This is O(k*n) where k is the number
     *   of non-updated attributes, which is small when updating many.
     * - If updating few attributes (<= 2*natts/3), use heap_deform_tuple() to
     *   extract all attributes at once (O(n)), then replace the updated ones.
     *   This avoids the O(n^2) cost of many heap_getattr() calls.
     *
     * The threshold of 2*natts/3 balances the fixed O(n) cost of heap_deform_tuple
     * against the variable O(k*n) cost of heap_getattr, where k = natts - num_updated.
    
    
    indexing.c: has the change to CatalogTupleUpdate/Insert and removal of
    their "WithInfo" companions.  If that second part is controversial then
    I don't mind reverting it, but IMO this is cleaner and might either
    inspire more call sites to create and reuse CatalogIndexState or for
    someone (me?) to address the comment that pre-dates my work:
    
     * XXX: At some point we might cache the CatalogIndexState data
    somewhere (perhaps
     * in the relcache) so that callers needn't trouble over this.
    
    The remaining files in this patch (we're still on 0001) are in the
    category of *interesting* or non-obvious changes I ran into that were
    "fun" to fix...
    
    
    pg_aggregate.c: in AggregateCreate() we run into a "compromise" pattern
    where the code code either update or insert.  In such cases hackers
    should use the macros for update, not insert, everywhere.  This records
    the additional information for the update path and doesn't burden the
    insert path with much more than a Bitmapset to free.
    
    This reminds me, there is also the case where a function is looping and
    updating a number of catalog tuples.  Care must be taken to scope the
    Bitmapset and free it every iteration as in:
    
    while( doing all the things )
    {
       Bitmapset updated = NULL;
    
       ... update a tuple
       CatalogTupleUpdate();
       bms_free(updated);
    }
    
    This also comes up when a function has multiple updates not in a loop. 
    Don't reuse the Bitmapset, just scope it.  If you do reuse it then:
    
    bms_free(updated);
    updated = NULL;
    
    and later... when you start using it again:
    
    Assert(bms_is_empty(updated));
    
    
    pg_constraint.c: brings us to namestrcpy(), take a look at
    RenameConstraintById().  When you run into namestrcpy() you've likely
    just done the mutation to the form directly but you'll want to mark that
    field as updated, as in:
    
    namestrcpy(&(con->conname), newname);
    HeapTupleMarkColumnUpdated(pg_constraint, conname, updated);
    
    This pattern is also needed after functions like pg_add_s32_overflow(),
    just HeapTupleMarkColumnUpdated() and you're fine.  I used the regex:
    "namestrcpy|pg_.*_.*flow" to find these cases, YMMV.
    
    
    alter.c: take a look at AlterObjectRename_internal() which uses
    get_object_attnum_name() to get the AttrNumber that is to be altered. 
    This is fine, but it's not something we can just plug into a macro and
    get the proper name expansion.  So, we do it the manual way and add a comment:
    
    /*
     * NOTE: We can't use the HeapTupleMarkColumnUpdated() macro here because
     * 'Anum_name' isn't a table/column name, it's a index for the relation
     * passed into the function as an argument.
     */
    namestrcpy(&nameattrdata, new_name);
    values[Anum_name - 1] = NameGetDatum(&nameattrdata);
    updated = bms_add_member(updated, Anum_name - FirstLowInvalidHeapAttributeNumber);
    
    
    analyze.c: uses the HeapTupleSetAllColumnsUpdated() at the start of the
    update_attstats() function to match the replaced code.  Later there's
    another interesting deviation from the normal pattern in that function
    where we want to update the nth stakind/opt/coll this was frustrating at
    first until I found that this worked:
    
    HeapTupleSetValue(pg_statistic, stakind1 + k,
    Int16GetDatum(stats->stakind[k]), values);
    
    I'll be "cpp", here's what that turns into:
    
    (values)[Anum_pg_statistic_stakind1 + k - 1] = (Int16GetDatum(stats->stakind[k]));
    
    So, we're taking the attribute number adding "k" then subtracting 1, and
    voila "Bob's your uncle."  While a bit unconventional, I think the
    resulting form is understandable, much more than before which was:
    
    i = Anum_pg_statistic_stakind1 - 1;
    for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
    	values[i++] = Int16GetDatum(stats->stakind[k]); /* stakindN */
    
    
    attribute_stats.c: take a look at set_stats_slot() where I have a
    comment explaining this "field + n" method adding a "Don't Panic" might
    help too.  Here's that comment:
    
    /*
     * NOTE: This might seem odd, to be adding 'i' to the name of the field on
     * these macros, but that's what we need to do here to find the proper
     * slot offset and record the proper value into the updated bitmap.
     *
     * Example: HeapTupleValue(pg_statistic, stakind1 + i, values);
     *
     * Becomes: values[Anum_pg_statistic_stakind1 + i - 1];
     *
     * The result is you're indexing the i'th value, exactly what we needed.
     */
    
    The really amazing and valuable piece here is how that plays out in the following:
    
    HeapTupleUpdateValue(pg_statistic, stakind1 + i, Int16GetDatum(stakind),
    values, nulls, updated);
    
    Playing the role of "cpp" again we get:
    
    do { \
    	(values)[Anum_statistic_stakind1 + i - 1] = (Datum)
    (Int16GetDatum(stakind)); \
    	(nulls)[Anum_statistic_stakind1 + i - 1] = false; \
    	Anum_statistic_stakind1 - (-7))
    } while(0);
    
    The result is that we've set the value/isnull at the correct position in
    both the values and nulls arrays and then added the correct attribute
    number into the Bitmapset (drops the mic).
    
    
    Okay, and that about does it for "0001" with the foundation and then the
    odd cases.
    
    0002 - Update the remainder of catalog updates using the new APIs
    
    This is a lot of applying the pattern over and over until it all works. 
    That's it.  Nothing much more interesting here.
    
    
    This is a lot of change, I get that, I think there's value in change
    when the result is cleaner and more maintainable.  The one place I'll
    agree up front where this might case problems is back-patching fixes. 
    That'll have to have the new approach in some branches and the old
    approach in others.  It's a 5-year commitment, I don't take that lightly.
    
    If you made it this far, I owe you a beer/coffee/token-of-appreciation. 
    Without this patch my [1] will have an ugly wart
    (HeapDetermineColumnsInfo()) for the CatalogTupleUpdate path only.  I
    can live with that, I'd rather not.
    
    best,
    
    -greg
    
    
    [1] https://www.postgresql.org/message-id/flat/78574B24-BE0A-42C5-8075-3FA9FA63B8FC%40amazon.com