Thread

  1. Re: WIP: Fast GiST index build

    Alexander Korotkov <aekorotkov@gmail.com> — 2011-08-07T19:28:10Z

    Hi!
    
    There is last version of patch. There is the list of most significant
    changes in comparison with your version of patch:
    1) Reference counting of path items was added. It should helps against
    possible accumulation of path items.
    2) Neighbor relocation was added.
    3) Subtree prefetching was added.
    4) Final emptying algorithm was reverted to the original one. My experiments
    shows that typical number of passes in final emptying in your version of
    patch is about 5. It may be significant itself. Also I haven't estimate of
    number of passes and haven't guarantees that it will not be high in some
    corner cases. I.e. I prefer more predictable single-pass algorithm in spite
    of it's a little more complex.
    5) Unloading even last page of node buffer from main memory to the disk.
    Imagine that that with levelstep = 1 each inner node has buffer. It seems to
    me that keeping one page of each buffer in memory may be memory consuming.
    
    Open items I see at this moment:
    1) I dislike my switching to buffering build method because it's based on
    very unproven assumptions. And I didn't more reliable assumptions in
    scientific papers while. I would like to replace it with something much
    simplier. For example, switching to buffering build when regular build
    actually starts to produce a lot of IO. For this approach implementation I
    need to somehow detect actual IO (not just buffer read but miss of OS
    cache).
    2) I'm worrying about possible size of nodeBuffersTab and path items. If we
    imagine extremely large tree with levelstep = 1 size of this datastructures
    will be singnificant. And it's hard to predict that size without knowing of
    tree size.
    
    ------
    With best regards,
    Alexander Korotkov.