Thread
-
Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
Greg Burd <greg@burd.me> — 2025-11-14T15:31:28Z
Hello, I've been working on expanding HOT updates [1] and have recently had some success (will be posted soon to the thread) moving the work done within HeapDetermineColumnsInfo() into the executor. I found that heap_update() is used in two paths: heap_tuple_update() and simple_tuple_update(), both call into heap_update() which then uses HeapDetermineColumnsInfo() to find the set of indexed attributes that have been modified during the update. This makes sense on the normal path from the executor, but the simple_tuple_update() path is only called from CatalogTupleUpdate() and when I looked at how catalog tuples were formed/mutated on that path I discovered something that I had to fix. Catalog tuples have knowledge of what attributes are mutated implicitly, but they don't preserve that information and so HeapDetermineColumnsInfo() has to re-discover that later on (while the page is locked). While on the surface this isn't such a big deal, it's been that way and worked well for years I have other motivations (see [1] again) for improving this. So, I set about to create a way to retain the knowledge of what was modified while mutating catalog tuples and pass that information on to CatalogTupleUpdate() as a Bitmapset. That is the underlying goal of this patch set. To make no behavioral changes, only syntactic ones, and end up with something better overall. That turned out to be a bit of a challenge as we have a lot of places where catalog information is updated. We also have two different methods for updating said information. And we intermix them. At times we hack our way to a solution and ignore convention. I went down the rabbit hole on this one, but I'm back up for a cup of tea because what I've done seems materially better to me and I've accomplished the goal (and can eventually make use of the result in [1]). Is this patch useful even without that other work? I think so, just on the basis of cleanup. Let me explain. Historically there have been two methods for modifying heap tuples destined for the catalog; Form/GETSTRUCT, and values/nulls/replaces. This new method provides a more intuitive and less error-prone approach without changing the fundamentals of the process. In addition, it is now possible to retain knowledge of the set of mutated attributes when working with catalog tuples. This isn't used in practice (yet [1]), but could be a way to avoid the overhead of HeapDetermineColumnsInfo() in heap_update() where (while holding a lock) we re-discover that set by comparing old/new HeapTuple datums when trying to find what indexed attributes have new values and prevent HOT updates. Method 1: Form/GETSTRUCT The "Form/GETSTRUCT" model allows for direct access to the tuple data that is then modified, copied, and then updated via CatalogTupleUpdate(). Old: Form_pg_index relform = (Form_pg_index) GETSTRUCT(tuple); relform->inisclustered = false; CatalogTupleUpdate(pg_index, &tuple->t_self, tuple); New: Bitmapset *updated = NULL; Form_pg_index relform = (Form_pg_index) GETSTRUCT(tuple); HeapTupleUpdateField(pg_index, indisclustered, false, indexForm, updated); CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple, updated, NULL); bms_free(updated); Method 2: values/nulls/replaces The "values/nulls/replaces" model collects the necessary information to either update or create a heap tuple using heap_modify_tuple() or heap_form_tuple() or sometimes heap_modify_tuple_by_cols(). While all those functions remain unchanged now the prefered function for catalog tuple modification is heap_update_tuple(). Old: bool nulls[Natts_pg_type]; bool replaces[Natts_pg_type]; Datum values[Natts_pg_type]; values[Anum_pg_type_typtype - 1] = CharGetDatum(typeType); nulls[Anum_pg_type_typdefaultbin - 1] = true; replaces[Anum_pg_type_oid - 1] = false; tuple = heap_modify_tuple(tuple, desc, values, nulls, replaces); CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple); New: Datum values[Natts_pg_type] = {0}; bool nulls[Natts_pg_type] = {false}; Bitmapset *updated = NULL; HeapTupleUpdateValue(pg_type, typtype, CharGetDatum(typeType), values, nulls, updated); HeapTupleUpdateValueNull(pg_type, typdefaultbin, values, nulls, updated); HeapTupleSetColumnNotUpdated(pg_type, oid, updated); tuple = heap_update_tuple(tuple, desc, values, nulls, updated); CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple, updated, NULL); In addition CatalogTupleUpdate and CatalogTupleInsert now have an optional argument for the CatalogIndexState, when not provided it is created and then released for the caller. This eliminates the need for their "WithInfo" companion functions. heap_update_tuple() is functionally equivalent to heap_modify_tuple(), but takes a Bitmapset called "updated" rather than an array of bool (generally) called "replaces" as a method for indicating what was modified. Additionally, this new function tries to balance the trade-offs of calling heap_getattr() versus heap_deform_tuple() based on the ratio of attributes updated and their known runtime complexities. Both paths are functionally equivalent. The changes also include initialization of the values/nulls arrays rather than loops or memset(). Some effort was also given to unify naming of these variables and their order so as to lower cognitive overhead. So, let me dive into the attached patches. On advice from my mentor Noah Misch I've divided the changes in two. The first patch contains all the infrastructure and the set of files where applying the changes got "interesting" and should be highlighted. The second is the mechanical changes to the remaining files, and it's HUGE, apologies for the blast radius here but it couldn't be helped. 0001 - Refactor how we form HeapTuples for CatalogTuple(Insert|Update) htup.h: is where you'll find the new macros. There is a "getter" called HeapTupleValue() to find the field in the values array and not worry about "-1". After that, they break down into a two cases, insert and update, with forms addressing method 1 (Form/GETSTRUCT) and method 2 (values/nulls/updated). Insert macros take the form "HeapTupleSet...()" as in "HeapTupleSetField(...)" or "HeapTupleSetValue(...)". There is the assignment to NULL which is "HeapTupleSetFieldNull(...)" and "HeapTupleSetValueNull(...)" as well. Arguments to these macros are: table_name, field, value, (nulls), (form_ptr) These macros simply translate back to the forms that we use now and in some cases combine work or ensure correctness. Update macros follow a similar pattern "HeapTupleUpdate...()" and then have "HeapTupleUpdateField..." and "HeapTupleUpdateValue..." cases to cover the two methods we use. Arguments are again uniform: table_name, field, value, values, nulls, updated There are additional macros for a few update-related special cases as well: * HeapTupleMarkColumnUpdated() - not something you'll normally use as the update methods all capture this information already, but there are occasions where this is useful. * HeapTupleUpdateSetAllColumnsUpdated() - 99.9% of the time we have historically started off with a "replaces" array where every value in that array is "false", but there are a few places where we had occasion to do the opposite, start with the assumption that all attributes were modified. This covers that case setting all the bits in the "updated" Bitmapset correctly. heaptuple.c: you'll find here my new heap_update_tuple() function. This one is interesting as it tries to balance out the cost of heap_getattr() vs heap_deform_tuple(). Why did I bother? Read the comment on heap_modify_tuple(), and I was there... so why not? You may disagree with my approach, here's a portion of the comment that explains it: * Performance strategy: * - If updating many attributes (> 2*natts/3), use heap_getattr() to extract * only the few non-updated attributes. This is O(k*n) where k is the number * of non-updated attributes, which is small when updating many. * - If updating few attributes (<= 2*natts/3), use heap_deform_tuple() to * extract all attributes at once (O(n)), then replace the updated ones. * This avoids the O(n^2) cost of many heap_getattr() calls. * * The threshold of 2*natts/3 balances the fixed O(n) cost of heap_deform_tuple * against the variable O(k*n) cost of heap_getattr, where k = natts - num_updated. indexing.c: has the change to CatalogTupleUpdate/Insert and removal of their "WithInfo" companions. If that second part is controversial then I don't mind reverting it, but IMO this is cleaner and might either inspire more call sites to create and reuse CatalogIndexState or for someone (me?) to address the comment that pre-dates my work: * XXX: At some point we might cache the CatalogIndexState data somewhere (perhaps * in the relcache) so that callers needn't trouble over this. The remaining files in this patch (we're still on 0001) are in the category of *interesting* or non-obvious changes I ran into that were "fun" to fix... pg_aggregate.c: in AggregateCreate() we run into a "compromise" pattern where the code code either update or insert. In such cases hackers should use the macros for update, not insert, everywhere. This records the additional information for the update path and doesn't burden the insert path with much more than a Bitmapset to free. This reminds me, there is also the case where a function is looping and updating a number of catalog tuples. Care must be taken to scope the Bitmapset and free it every iteration as in: while( doing all the things ) { Bitmapset updated = NULL; ... update a tuple CatalogTupleUpdate(); bms_free(updated); } This also comes up when a function has multiple updates not in a loop. Don't reuse the Bitmapset, just scope it. If you do reuse it then: bms_free(updated); updated = NULL; and later... when you start using it again: Assert(bms_is_empty(updated)); pg_constraint.c: brings us to namestrcpy(), take a look at RenameConstraintById(). When you run into namestrcpy() you've likely just done the mutation to the form directly but you'll want to mark that field as updated, as in: namestrcpy(&(con->conname), newname); HeapTupleMarkColumnUpdated(pg_constraint, conname, updated); This pattern is also needed after functions like pg_add_s32_overflow(), just HeapTupleMarkColumnUpdated() and you're fine. I used the regex: "namestrcpy|pg_.*_.*flow" to find these cases, YMMV. alter.c: take a look at AlterObjectRename_internal() which uses get_object_attnum_name() to get the AttrNumber that is to be altered. This is fine, but it's not something we can just plug into a macro and get the proper name expansion. So, we do it the manual way and add a comment: /* * NOTE: We can't use the HeapTupleMarkColumnUpdated() macro here because * 'Anum_name' isn't a table/column name, it's a index for the relation * passed into the function as an argument. */ namestrcpy(&nameattrdata, new_name); values[Anum_name - 1] = NameGetDatum(&nameattrdata); updated = bms_add_member(updated, Anum_name - FirstLowInvalidHeapAttributeNumber); analyze.c: uses the HeapTupleSetAllColumnsUpdated() at the start of the update_attstats() function to match the replaced code. Later there's another interesting deviation from the normal pattern in that function where we want to update the nth stakind/opt/coll this was frustrating at first until I found that this worked: HeapTupleSetValue(pg_statistic, stakind1 + k, Int16GetDatum(stats->stakind[k]), values); I'll be "cpp", here's what that turns into: (values)[Anum_pg_statistic_stakind1 + k - 1] = (Int16GetDatum(stats->stakind[k])); So, we're taking the attribute number adding "k" then subtracting 1, and voila "Bob's your uncle." While a bit unconventional, I think the resulting form is understandable, much more than before which was: i = Anum_pg_statistic_stakind1 - 1; for (k = 0; k < STATISTIC_NUM_SLOTS; k++) values[i++] = Int16GetDatum(stats->stakind[k]); /* stakindN */ attribute_stats.c: take a look at set_stats_slot() where I have a comment explaining this "field + n" method adding a "Don't Panic" might help too. Here's that comment: /* * NOTE: This might seem odd, to be adding 'i' to the name of the field on * these macros, but that's what we need to do here to find the proper * slot offset and record the proper value into the updated bitmap. * * Example: HeapTupleValue(pg_statistic, stakind1 + i, values); * * Becomes: values[Anum_pg_statistic_stakind1 + i - 1]; * * The result is you're indexing the i'th value, exactly what we needed. */ The really amazing and valuable piece here is how that plays out in the following: HeapTupleUpdateValue(pg_statistic, stakind1 + i, Int16GetDatum(stakind), values, nulls, updated); Playing the role of "cpp" again we get: do { \ (values)[Anum_statistic_stakind1 + i - 1] = (Datum) (Int16GetDatum(stakind)); \ (nulls)[Anum_statistic_stakind1 + i - 1] = false; \ Anum_statistic_stakind1 - (-7)) } while(0); The result is that we've set the value/isnull at the correct position in both the values and nulls arrays and then added the correct attribute number into the Bitmapset (drops the mic). Okay, and that about does it for "0001" with the foundation and then the odd cases. 0002 - Update the remainder of catalog updates using the new APIs This is a lot of applying the pattern over and over until it all works. That's it. Nothing much more interesting here. This is a lot of change, I get that, I think there's value in change when the result is cleaner and more maintainable. The one place I'll agree up front where this might case problems is back-patching fixes. That'll have to have the new approach in some branches and the old approach in others. It's a 5-year commitment, I don't take that lightly. If you made it this far, I owe you a beer/coffee/token-of-appreciation. Without this patch my [1] will have an ugly wart (HeapDetermineColumnsInfo()) for the CatalogTupleUpdate path only. I can live with that, I'd rather not. best, -greg [1] https://www.postgresql.org/message-id/flat/78574B24-BE0A-42C5-8075-3FA9FA63B8FC%40amazon.com