Thread

  1. Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array

    Xuneng Zhou <xunengzhou@gmail.com> — 2025-12-16T10:54:09Z

    Hi,
    
    On Tue, Dec 16, 2025 at 6:29 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
    >
    > On Mon, Nov 10, 2025 at 11:22 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
    > >
    > > Hi,
    > >
    > > With a sorted commited.xip array, we could replace the iteration with
    > > two binary searches to find the interval to keep.
    > >
    > > Proposed Optimization
    > > ---------------------
    > >
    > > Use binary search to locate the boundaries of XIDs to remove, then
    > > compact with a single memmove. The key insight requires understanding
    > > how XID precedence relates to numeric ordering.
    > >
    > > XID Precedence Definition
    > > ~~~~~~~~~~~~~~~~~~~~~~~~~~
    > >
    > > PostgreSQL defines XID precedence as:
    > >
    > > /* compare two XIDs already known to be normal; this is a macro for speed */
    > > #define NormalTransactionIdPrecedes(id1, id2) \
    > > (AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
    > > (int32) ((id1) - (id2)) < 0)
    > >
    > > This means: id1 precedes id2 if (int32)(id1 - id2) < 0.
    > >
    > > Equivalently, this identifies all XIDs in the modular interval
    > > [id2 - 2^31, id2) on the 32-bit ring as "preceding id2". So XIDs
    > > preceding xmin are exactly those in [xmin - 2^31, xmin).
    > >
    > > From Modular Interval to Array Positions
    > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    > >
    > > The arrays are sorted in numeric uint32 order (xip[i] <= xip[i+1] in
    > > unsigned sense), which is a total order—not wraparound-aware. Therefore,
    > > the modular interval we want to remove may appear as one or two numeric
    > > blocks in the sorted array.
    > >
    > > Let boundary = xmin - 2^31 (mod 2^32). The modular interval [boundary, xmin)
    > > contains all XIDs to remove (half-open: xmin itself is kept, matching
    > > NormalTransactionIdPrecedes). Where does it appear in the numerically sorted
    > > array?
    > >
    > > Case A: (uint32)boundary <= (uint32)xmin (numeric no wrap)
    > >   Example: xmin = 3,000,000,000
    > >            boundary = 3,000,000,000 - 2,147,483,648 = 852,516,352
    > >
    > >   Here, (uint32)boundary < (uint32)xmin, so the interval does not cross
    > >   0 numerically. In the sorted array, XIDs to remove form one contiguous
    > >   block: [idx_boundary, idx_xmin).
    > >
    > >   Array layout:
    > >     [... keep ...][=== remove ===][... keep ...]
    > >     0 ............ idx_boundary ... idx_xmin ...... n
    > >
    > >   Action: Keep prefix [0, idx_boundary) and suffix [idx_xmin, n).
    > >
    > > Case B: (uint32)boundary > (uint32)xmin (numeric wrap)
    > >   Example: xmin = 100
    > >            boundary = 100 - 2^31 (mod 2^32) = 2,147,483,748
    > >
    > >   Since (uint32)boundary > (uint32)xmin, the interval wraps through 0
    > >   numerically. In the sorted array, XIDs to remove form two blocks:
    > >   [0, idx_xmin) and [idx_boundary, n).
    > >
    > >   Array layout:
    > >     [= remove =][... keep ...][= remove =]
    > >     0 ......... idx_xmin .... idx_boundary ......... n
    > >
    > >   Action: Keep only the middle [idx_xmin, idx_boundary).
    > >
    > > Note: Case B often occurs when xmin is "small" (e.g., right after
    > > startup), making xmin - 2^31 wrap numerically. This is purely about
    > > positions in the numeric order; it does not imply the cluster has
    > > "wrapped" XIDs operationally.
    > >
    > > In both cases, we locate idx_boundary and idx_xmin using binary search
    > > in O(log n) time, then use one memmove to compact
    > >
    > > The algorithm:
    > > 1. Compute boundary = xmin - 2^31
    > > 2. Binary search for idx_boundary (first index with xip[i] >= boundary)
    > > 3. Binary search for idx_xmin (first index with xip[i] >= xmin)
    > > 4. Use memmove to compact based on case A or B above
    > >
    > > Benefits
    > > --------
    > >
    > > 1. Performance: O(log n) binary search vs O(n) linear scan
    > > 2. Memory: No workspace allocation needed
    > > 3. Simplicity: One memmove instead of allocate + two copies + free
    > >
    > > The same logic is applied to both committed.xip and catchange.xip arrays.
    > >
    > > Faster binary search
    > > --------
    > >
    > > While faster binary search variants exist, the current code already
    > > introduces more complexity than the original. It’s uncertain that
    > > further optimization would deliver a meaningful performance gain.
    > >
    >
    > Adapt the patch with two-phase optimization:
    >
    > - Pre-CONSISTENT: Use in-place compaction O(n) since committed.xip is
    > unsorted during this phase.
    >
    > - Post-CONSISTENT: Use binary search O(log n) since committed.xip is
    > maintained in sorted order after reaching consistency.
    >
    
    v3-0001 fixes a critical issue where the snapshot->xip array in
    SnapBuildBuildSnapshot might not be sorted before reaching the
    consistent state. Sorry for the noise here.
    
    -- 
    Best,
    Xuneng