Thread

  1. Re: Add a berief general comment on BTScanInsertData's nextkey and backward

    Yugo Nagata <nagata@sraoss.co.jp> — 2025-12-31T10:21:24Z

    On Tue, 18 Nov 2025 20:15:40 -0500
    Peter Geoghegan <pg@bowt.ie> wrote:
    
    > On Tue, Nov 18, 2025 at 2:28 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
    > > I've attached a patch that adds the following comment:
    > >
    > > + * nextkey determines how the scankey's boundary is interpreted, and backward
    > > + * indicates a backward scan.  See comments in _bt_first for a more detailed
    > > + * explanation of these fields.
    > >
    > > What do think?
    > 
    > Seems reasonable to me.
    
    Thank you for your review.
    
    > 
    > I wonder if we also do something about these existing _bt_binsrch comments:
    > 
    >  * On an internal (non-leaf) page, _bt_binsrch() returns the OffsetNumber
    >  * of the last key < given scankey, or last key <= given scankey if nextkey
    >  * is true.  (Since _bt_compare treats the first data key of such a page as
    >  * minus infinity, there will be at least one key < scankey, so the result
    >  * always points at one of the keys on the page.)
    > 
    > Here we describe what happens on an internal page. This is correct,
    > but *seems* to contradict the higher level comments at the end of
    > _bt_first. There is actually no contradiction between the two -- not
    > when you understand that _bt_first describes the whole end-to-end
    > process of calling _bt_search and then calling _bt_binsrch on the leaf
    > page (not of calling _bt_binsrch once, against an internal page).
    > 
    > One could think of this _bt_binsrch internal page behavior as
    > compensating for the fact that internal pages have pivot tuples that
    > consist of a separator key (except for the first key, which is just
    > has a -inf key/no key), plus a downlink that points to the *next* page
    > after that separator key one level down (except for the internal page
    > high key, which has no downlink at all). Might be useful to say
    > something like that instead.
    > 
    > This is hard to explain without an example. Right now, an internal
    > page might have pivot tuples that look like this:
    > 
    > Separator: -inf, Downlink: 1
    > Separator: 'a', Downlink: 2
    > Separator: 'c', Downlink: 3
    > Separator: 'f', Downlink: (none, this is the high key)
    > 
    > But _bt_binsrch "pretends" that our internal page actually contains:
    > 
    > Downlink: 1
    > Separator: 'a'
    > Downlink: 2
    > Separator: 'c'
    > Downlink: 3
    > Separator: 'f'
    > 
    > So if our = scan key is (say) 'c', we should descend using the
    > downlink that points to block 2 (not the one that points to block 3,
    > as might be expected from looking at the real physical representation
    > consisting of pivot tuples). That is what the scan needs to do to get
    > to the page one level down whose high key is also 'c' -- that's where
    > the scan needs to look. (If the next level down is the leaf level,
    > then the leaf page we descend to might also contain a *non*-pivot
    > tuple with the key value 'c', "right before" the high key with 'c',
    > which the scan will need to return in _bt_readpage. Lehman & Yao allow
    > the key before a leaf page's high key to be equal to the high key, but
    > otherwise forbid all duplicate keys.)
    > 
    > I find it very hard to know what explanation will be the least
    > confusing to other people, at least in this area. Since you're
    > interested in this area, I thought I'd ask what you think.
    
    I do not see any contradiction between the comment on _bt_binsrch and
    the comments at the end of _bt_first. The former explicitly states that
    it refers to internal (non-leaf) pages, and I understand the latter to
    describe loading data from a leaf page.
    
    That said, we could make this clearer by explicitly mentioning the leaf
    page in the first sentence, for example:
    
          * Now load data from the first leaf page of the scan (usually the
          *page currently in so->currPos.buf).
    
    Regards,
    Yugo Nagata
    
    -- 
    Yugo Nagata <nagata@sraoss.co.jp>