Thread

Re: crash-safe visibility map, take five

Robert Haas <robertmhaas@gmail.com> — 2011-06-19T02:04:52Z
On Sat, Jun 18, 2011 at 9:37 AM, Noah Misch <noah@leadboat.com> wrote:
> Indeed.  If we conflicted on the xmin of the most-recent all-visible tuple
> when setting the bit, all-visible on the standby would become a superset of
> all-visible on the master, and we could cease to ignore the bits.  This is an
> excellent trade-off for masters that do UPDATE and DELETE, because they're
> already conflicting (or using avoidance measures) on similar xids at VACUUM
> time.  (Some INSERT-only masters will disfavor the trade-off, but we could
> placate them with a GUC.  On the other hand, hot_standby_feedback works and
> costs an INSERT-only master so little.)

I hadn't really considered the possibility that making more things
conflict on the standby server could work out to a win.  I think that
would need some careful testing, and I don't want to get tied up in
that for purposes of this patch.  There is no law against a WAL format
change, so we can always do it later if someone wants to do the
legwork.

> No problem.  I ran these test cases with the new patch:
>
> -Description-                                           -Expected bits-         -Result-
> set bit, VACUUM commit, crash           1,5                                     1,5
> set bit, crash                                          0,1                                     0,1
> set bit, flush heap page, crash         0,5                                     0,5
> set bit, flush vm page, crash           1,5                                     1,5
>
> xlog flush request locations look correct, too.  Overall, looking good.  Do
> you know of other notable cases to check?  The last two were somewhat tricky
> to inject; if anyone wishes to reproduce them, I'm happy to share my recipe.

Those sound like interesting recipes...

> One more thing came to mind: don't we need a critical section inside the "if"
> block in visibilitymap_set()?  I can't see anything in there that could
> elog(ERROR, ...), but my general impression is that we use one anyway.

Seems like a good idea.

> I ran one repetition of my benchmark from last report, and the amount of WAL
> has not changed.  Given that, I think the previous runs remain valid.

As far as I can see, the upshot of those benchmarks was that more WAL
was generated, but the time to execute vacuum didn't really change.  I
think that's going to be pretty typical.  In your test, the additional
WAL amounted to about 0.6% of the amount of data VACUUM is dirtying,
which I think is pretty much in the noise, assuming wal_buffers and
checkpoint_segments are tuned to suitable values.

The other thing to worry about from a performance view is whether the
push-ups required to relocate the clear-the-vm-bits logic to within
the critical sections is going to have a significant impact on
ordinary insert/update/delete operations.  I was a bit unsure at first
about Heikki's contention that it wouldn't matter, but after going
through the exercise of writing the code I think I see now why he
wasn't concerned.  The only possibly-noticeable overhead comes from
the possibility that we might decide not to pin the visibility map
buffer before locking the page(s) we're operating on, and need to
unlock-pin-and-relock.  But for that to happen, VACUUM's got to buzz
through and mark the page all-visible just around the time that we
examine bit.  And I have a tough time imagining that happening very
often.  You have to have a page that has been modified since the last
VACUUM, but not too recently (otherwise it's not actually all-visible)
and then VACUUM has to get there and process it at almost the same
moment that someone decides to make another modification.  It's hard
even to think about an artificial test case that would hit that
situation with any regularity.

>> Hmm, now that I look at it, I think this test is backwards too.  I
>> think you might be right that it wouldn't hurt anything to go ahead
>> and set it, but I'm inclined to leave it in for now.
>
> Okay.  Consider expanding the comment to note that this is out of conservatism
> rather than known necessity.

Done.

> Probably not worth mentioning, but there remains one space instead of one tab
> after "visibilitymap_clear" in this line:
>  *              visibilitymap_clear      - clear a bit in the visibility map

Gee, I'm not sure whether there's a one true way of doing that.  I
know we have a no-spaces-before-tabs rule but I'm not sure what the
guidelines are for indenting elsewhere in the line.

> FWIW, I just do C-s TAB in emacs and then scan for places where the boundaries
> of the highlighted regions fail to line up vertically.

vim.

>> > This fills RelationGetBufferForTuple()'s needs, but the name doesn't seem to
>> > fit the task too well. ?This function doesn't check the pin or that the buffer
>> > applies to the correct relation. ?Consider naming it something like
>> > `visibilitymap_buffer_covers_block'.
>>
>> I think that this may be related to the fact that visibilitymap_pin()
>> doesn't do what you might expect.  But that name is pre-existing, so I
>> kept it, and I don't really want this to be something that sounds
>> totally unrelated.
>
> Ah, so visibilitymap_pin_ok() answers the question "Will visibilitymap_pin()
> be a long-winded no-op given these arguments?"

Bingo.  In HEAD, there's no real reason to care; you may as well just
try it and see what happens.  But the buffer lock dance makes it
necessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company