Thread

  1. Re: crash-safe visibility map, take five

    Robert Haas <robertmhaas@gmail.com> — 2011-06-19T02:04:52Z

    On Sat, Jun 18, 2011 at 9:37 AM, Noah Misch <noah@leadboat.com> wrote:
    > Indeed.  If we conflicted on the xmin of the most-recent all-visible tuple
    > when setting the bit, all-visible on the standby would become a superset of
    > all-visible on the master, and we could cease to ignore the bits.  This is an
    > excellent trade-off for masters that do UPDATE and DELETE, because they're
    > already conflicting (or using avoidance measures) on similar xids at VACUUM
    > time.  (Some INSERT-only masters will disfavor the trade-off, but we could
    > placate them with a GUC.  On the other hand, hot_standby_feedback works and
    > costs an INSERT-only master so little.)
    
    I hadn't really considered the possibility that making more things
    conflict on the standby server could work out to a win.  I think that
    would need some careful testing, and I don't want to get tied up in
    that for purposes of this patch.  There is no law against a WAL format
    change, so we can always do it later if someone wants to do the
    legwork.
    
    > No problem.  I ran these test cases with the new patch:
    >
    > -Description-                                           -Expected bits-         -Result-
    > set bit, VACUUM commit, crash           1,5                                     1,5
    > set bit, crash                                          0,1                                     0,1
    > set bit, flush heap page, crash         0,5                                     0,5
    > set bit, flush vm page, crash           1,5                                     1,5
    >
    > xlog flush request locations look correct, too.  Overall, looking good.  Do
    > you know of other notable cases to check?  The last two were somewhat tricky
    > to inject; if anyone wishes to reproduce them, I'm happy to share my recipe.
    
    Those sound like interesting recipes...
    
    > One more thing came to mind: don't we need a critical section inside the "if"
    > block in visibilitymap_set()?  I can't see anything in there that could
    > elog(ERROR, ...), but my general impression is that we use one anyway.
    
    Seems like a good idea.
    
    > I ran one repetition of my benchmark from last report, and the amount of WAL
    > has not changed.  Given that, I think the previous runs remain valid.
    
    As far as I can see, the upshot of those benchmarks was that more WAL
    was generated, but the time to execute vacuum didn't really change.  I
    think that's going to be pretty typical.  In your test, the additional
    WAL amounted to about 0.6% of the amount of data VACUUM is dirtying,
    which I think is pretty much in the noise, assuming wal_buffers and
    checkpoint_segments are tuned to suitable values.
    
    The other thing to worry about from a performance view is whether the
    push-ups required to relocate the clear-the-vm-bits logic to within
    the critical sections is going to have a significant impact on
    ordinary insert/update/delete operations.  I was a bit unsure at first
    about Heikki's contention that it wouldn't matter, but after going
    through the exercise of writing the code I think I see now why he
    wasn't concerned.  The only possibly-noticeable overhead comes from
    the possibility that we might decide not to pin the visibility map
    buffer before locking the page(s) we're operating on, and need to
    unlock-pin-and-relock.  But for that to happen, VACUUM's got to buzz
    through and mark the page all-visible just around the time that we
    examine bit.  And I have a tough time imagining that happening very
    often.  You have to have a page that has been modified since the last
    VACUUM, but not too recently (otherwise it's not actually all-visible)
    and then VACUUM has to get there and process it at almost the same
    moment that someone decides to make another modification.  It's hard
    even to think about an artificial test case that would hit that
    situation with any regularity.
    
    >> Hmm, now that I look at it, I think this test is backwards too.  I
    >> think you might be right that it wouldn't hurt anything to go ahead
    >> and set it, but I'm inclined to leave it in for now.
    >
    > Okay.  Consider expanding the comment to note that this is out of conservatism
    > rather than known necessity.
    
    Done.
    
    > Probably not worth mentioning, but there remains one space instead of one tab
    > after "visibilitymap_clear" in this line:
    >  *              visibilitymap_clear      - clear a bit in the visibility map
    
    Gee, I'm not sure whether there's a one true way of doing that.  I
    know we have a no-spaces-before-tabs rule but I'm not sure what the
    guidelines are for indenting elsewhere in the line.
    
    > FWIW, I just do C-s TAB in emacs and then scan for places where the boundaries
    > of the highlighted regions fail to line up vertically.
    
    vim.
    
    >> > This fills RelationGetBufferForTuple()'s needs, but the name doesn't seem to
    >> > fit the task too well. ?This function doesn't check the pin or that the buffer
    >> > applies to the correct relation. ?Consider naming it something like
    >> > `visibilitymap_buffer_covers_block'.
    >>
    >> I think that this may be related to the fact that visibilitymap_pin()
    >> doesn't do what you might expect.  But that name is pre-existing, so I
    >> kept it, and I don't really want this to be something that sounds
    >> totally unrelated.
    >
    > Ah, so visibilitymap_pin_ok() answers the question "Will visibilitymap_pin()
    > be a long-winded no-op given these arguments?"
    
    Bingo.  In HEAD, there's no real reason to care; you may as well just
    try it and see what happens.  But the buffer lock dance makes it
    necessary.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company