Thread

  1. Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions

    Matthias van de Meent <boekewurm+postgres@gmail.com> — 2026-05-12T11:47:23Z

    On Fri, 8 May 2026 at 16:40, Salma El-Sayed <salmasayed182003@gmail.com> wrote:
    >
    > Hello Hackers,
    >
    > My name is Salma El-Sayed, and I am working on B-tree Index Bloat Reduction as part of GSoC 2026. This is my introduction email[1]. I would like to share my current approach for page merging and get your feedback.
    >
    > *The Core Idea*
    > Two sibling B-tree pages L (left) and R (right) are candidates for merging when both pages are below a size threshold (less than 10% full for example) and their combined size fits within the fill factor.
    >
    > The merge copies all data from L into R, then:
    > 1- R is marked BTP_MERGED --> it now contains L's data.
    > 2- L is marked BTP_MERGED_AWAY --> it is logically gone but kept around temporarily for any active scan that still holds a reference to it.
    > 3- L is unlinked from its parent. VACUUM will handle updating the right-link of L's left sibling as part of normal cleanup.
    
    The merging of pages is the simple part. Making scans not skip or
    return duplicated tuples is the hard part.
    
    I think it's useful and important to have an up-to-date design with
    all such considerations included, so that reviewers also are able to
    have the whole picture in mind when providing guidance.
    
    > *Questions for the Community*
    >
    > 1- Triggering the merge: how aggressive should we be?
    >
    > How should the merge process be triggered? I see two broad options:
    > a- Full-index scan: aggressively find all merge candidates in one pass.
    > b- Bounded scan: a separate dedicated scan that stops after a controlled number of pages, giving the user control over how much of the index to process at a time.
    
    I suggest to keep track of candidate pages (i.e., leaf pages with more
    than X% free space) in the btree vacuum scans. Whenever you hit a page
    which are a merge candidate (or, would become a merge candidate), you
    check if any of its sibling pages are already present as merge
    candidate in a list of candidate merge pages. If one is, merge the
    pages. If not; then add the current page not; and check if siblings of
    the current page are in our candidate list. If so, try to merge with
    the sibling; if they're not (or the merge failed), then you add the
    current page to that candidate list.
    
    Presumably, the amount of active candidates would remain small enough
    to fit in a reasonable amount of memory; as merged candidates can be
    removed once they've been used.
    
    > A possible interface for the candidate scanner might look something like:
    > bt_merge_pages(index regclass, min_fillfactor float8, max_fillfactor float8, max_pages int4)
    
    I don't think a separate scan method for this makes much sense.
    
    > 2- Flag naming and free bits
    >
    > I am proposing two new flags in btpo_flags:
    > - BTP_MERGED: set on R to indicate it absorbed L's data.
    > - BTP_MERGED_AWAY: set on L to indicate it has been merged away.
    >
    >  ** Are there free bits available in btpo_flags suitable for this purpose? And does the community see any objection to consuming two bits in this field for the merge mechanism?
    
    btpo_flags has 16 bits, of which 7 don't yet have a meaning.
    
    You might even be able to combine one new bit with BTP_HALF_DEAD, to
    limit the need for new bits to just one: presumably, pages cannot be
    both HALF_DEAD and MERGED/MERGED_AWAY at the same time.
    
    
    Kind regards,
    
    Matthias van de Meent
    
    
    PS.
    Some food for thought and questions in return:
    
    1.) Page space usage
    The idea that you shared in [0] described a page-level "Merge ID".
    Adding this to the page would affect the amount of space available on
    btree pages (for at least merged btree pages), and that seems to imply
    a possibly backward-incompatible change to currently indexed tables
    (less space on a page available -> smaller max tuple size -> current
    indexed data may not be reindexable with the new checks in place).
    
    Have you thought about this, and/or figured out whether the decrease
    in page space is relevant or can safely be worked around?
    
    2.) changes to page lifecycle operations
    There are various concurrent operations that may impact scan
    consistency. You've covered how to recover from merges with documents
    linked to from your earlier mail, but it's not clear to me if these
    things work without causing issues w.r.t. scan output:
    
      a. Deleting a BTP_MERGED page
        When vacuum runs a second time all items on the target page may
    now be removable; can that page be deleted safely?
    
      b. Deleting a BTP_MERGED_AWAY page
        BTP_MERGED_AWAY pages keep a copy of their old tuples around,
    which you mention are used by backward scans. This means its contents
    must also be cleaned up by subsequent VACUUM runs, as the backwards
    IOS may otherwise return TIDs that have been recycled and recieved new
    indexed values. This cleanup can result in an empty page - which can
    happen earlier than the XID horizon in the MergeID. Does this design
    allow those pages to be reclaimed?
    
      c. A BTP_MERGED page must be split
       What happens? And what happens if it keeps splitting, before any
    further vacuum (maintenance) operations can affect the merged
    contents?
    
      d. Are BTP_MERGED_AWAY pages still part of the data structure?
        So, is L still pointed to by both L's left sibling and R, or is it
    immediately removed from the structure (or at least as immediate as a
    HALF_DEAD page would)?
        If it's kept in the structure for extended periods: Why?
    
      e. When/how does VACUUM determine to recycle the BTP_MERGED_AWAY
    page (mark it BTP_HALF_DEAD or BTP_DELETED)?
        The docs [1] linked from [0] indicate MergeID is present only on
    BTP_MERGED pages, so it can't be used to locally determine to start
    the recycling process for a BTP_MERGED_AWAY page. What process is used
    to clean up this index page?
    
    3.) General performance
        I'm worried about the cost of maintaining the scan boundary;
    especially for index-only scans. Though it's likely you'll only have
    to keep track of this at page boundaries, it's still a new and
    possibly significant overhead; index key boundaries are not always as
    small as int4/int8 + TID.
    
    4.) Workload
        Do you have an example workload that you expect to show improvements?
    
    5.) Inner pages
        Are inner pages ever considered for merging, or are you only
    targeting leaf pages?
    
    
    [0]: https://postgr.es/m/CANBEAPFW64QUGmMYoPekV6jGrn5Yx8kTNfeEtcTG9CCBUUBoLQ@mail.gmail.com
    [1]: https://docs.google.com/document/d/1PIMG0N__4BIB0uDWOfcK-AruatNL8SCTlfTlBTCaoCo/edit?tab=t.0