Merge lp:~dave-cheney/tomb/001-fix-data-race-on-reason into lp:tomb

Proposed by Dave Cheney
Status: Needs review
Proposed branch: lp:~dave-cheney/tomb/001-fix-data-race-on-reason
Merge into: lp:tomb
Diff against target: 11 lines (+2/-0)
1 file modified
tomb.go (+2/-0)
To merge this branch: bzr merge lp:~dave-cheney/tomb/001-fix-data-race-on-reason
Reviewer Review Type Date Requested Status
Gustavo Niemeyer Pending
Review via email: mp+166445@code.launchpad.net

Description of the change

tomb: fix data race on t.reason

t.m.Lock must be held to properly observe changes to t.reason.

https://codereview.appspot.com/9857046/

To post a comment you must log in.
Revision history for this message
Dimiter Naydenov (dimitern) wrote :
Revision history for this message
William Reade (fwereade) wrote :
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

On Thu, May 30, 2013 at 8:28 PM, Dave Cheney <email address hidden> wrote:
> Without this change, tomb.Wait is not guaranteed to see the value
> assigned to t.reason on line 144. To see this in effect, run

Closing the channel in Done, which is meant to be the last thing
executed on the goroutine that the tomb is monitoring, Happens Before
the receiving of the zero value on Wait, according to the memory
model:

"The closing of a channel happens before a receive that returns a zero
value because the channel is closed."

> go test -race launchpad.net/juju-core/cmd/jujud

I see. There's still no race there. It is complaining because there
are two goroutines calling Wait, and a third goroutine calling Done.
It doesn't know that the Kill(nil) that happens on Stop, before one of
the waits, in fact will always hold off until the Done is finished
running, which _also_ calls Kill(nil) before the close of the done
channel, and thus establishes a happens-before relationship that makes
the whole thing safe from a memory model standpoint.

Thanks for silencing the warning, though. I wouldn't expect the race
detector to infer such situations, and it's better to have it quiet.

Please just drop the defer and use the convention:

    Lock()
    reason := t.reason
    Unlock()
    return reason

There's no point in deferring something for logic that you know is
impossible to return early.

gustavo @ http://niemeyer.net

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

On Thu, May 30, 2013 at 8:57 PM, Dave Cheney <email address hidden> wrote:
> Mate, i'm not going to ask you to commit this if it is wrong. If you
> prefer I will raise a bug with Dmitry about the race detector.

That seems to reflect very poorly what I said, Dave.

Please leave the issue with me. Thank you.

gustavo @ http://niemeyer.net

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

On Thu, May 30, 2013 at 9:47 PM, Gustavo Niemeyer <email address hidden> wrote:
> On Thu, May 30, 2013 at 8:57 PM, Dave Cheney <email address hidden> wrote:
>> Mate, i'm not going to ask you to commit this if it is wrong. If you
>> prefer I will raise a bug with Dmitry about the race detector.
>
> That seems to reflect very poorly what I said, Dave.
>
> Please leave the issue with me. Thank you.

------------------------------------------------------------
revno: 17
revision-id: <email address hidden>
parent: <email address hidden>
committer: Gustavo Niemeyer <email address hidden>
branch nick: tomb
timestamp: Thu 2013-05-30 21:38:18 -0300
message:
  Fix race found by race detector, and reported by Dave Cheney.

  Here is more information than you'll want to know:

  If goroutine 1 is doing:

      t.Kill(nil)
      t.Wait()

  and goroutine 2 is doing:

      t.Wait()

  and goroutine 3 is doing:

      t.Done()

  There is a race, because goroutine 3 could finish first, and then
  goroutine 1 might run, and the memory model won't guarantee that
  goroutine 2 actually sees t.reason recorded by the Kill(nil) of
  goroutine 1.

  This specific case is more of a theoretical problem than a real one,
  for the following reasons:

  - The memory model guarantees that the close(t.done) performed
    by t.Done() will Happen Before the <-t.done done by either t.Wait,
    which means both goroutine 1 and 2 will always necessarily see a
    real error coming out of the goroutine(s) being monitored by t.

  - The Kill(nil) performed by goroutine 1 won't affect the value
    of t.reason due to the logic in the Kill method (although,
    observing the memory model pedantically means goroutine 1 might
    do whatever it pleases with the memory region, due to lack of
    synchronization).

  In a different scenario, though, t.Kill in goroutine 1 might be
  performed with a non-nil error, and goroutine 2 would see either
  nil or the error, and t.reason might be in an intermediate unknown
  state, so the race is in fact real. Solving this specific race via
  a mutex, as done in this change, still won't mean that this
  behavior is sane, though: what t.Wait() in goroutine 2 observes
  will be either the prior error value, or the new error value,
  based purely on timing of the two Wait calls.

  The change introduced protects t.reason with the tomb's mutex,
  which silents the race detector, and makes the logic entirely
  memory-model friendly.

Unmerged revisions

17. By Dave Cheney

Fix data race on t.reason.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'tomb.go'
2--- tomb.go 2012-04-04 03:22:16 +0000
3+++ tomb.go 2013-05-30 06:42:23 +0000
4@@ -108,6 +108,8 @@
5 func (t *Tomb) Wait() error {
6 t.init()
7 <-t.dead
8+ t.m.Lock()
9+ defer t.m.Unlock()
10 return t.reason
11 }
12

Subscribers

People subscribed via source and target branches

to all changes: