relationunit_test.go: 2 tests fail with mgo >= 241

Bug #1221705 reported by Dimiter Naydenov
Affects            Status        Importance  Assigned to  Milestone
juju-core (1.14)   Fix Released  Critical    Unassigned   (status tracked in Trunk)
juju-core (Trunk)  Fix Released  Critical    Unassigned
mgo                Fix Released  Undecided   Unassigned

Bug Description

See: https://pastebin.canonical.com/97052/

I cannot waste more time on this issue. Let's look
more into it when jam's back.

Proposed a branch that skips those tests: https://codereview.appspot.com/13487044/

Related branches

Revision history for this message
John A Meinel (jameinel) wrote :

I don't know why, but the bot is running mgo@revno 243. This includes some of Gustavo's changes to the mgo driver. I did not personally update mgo, but maybe some other trigger did.

If I update to rev 242 of mgo on my local box, these tests fail for me.

I only thought of this because of Gustavo's recent release of mgo:
http://blog.labix.org/2013/09/05/mgo-r2013-09-04-released

Note that one of the mgo changes includes support for "Improved Timeouts", which involves "Socket level deadlines".

So I'm guessing something in the stack picked up the change, and something about these tests causes operations to take longer in Mongo.

If I use mgo@revno 238 (the last version I had locally), those 2 tests take 1.5s and 1.4s, which is about 10× slower than the other similar tests (all the others take <200ms).

So my guess is that an updated mgo added socket timeouts, which were hiding whatever was actually going wrong. I don't know what that is, but I do think it is worth investigating. We can disable the tests for now, since we shouldn't block landing code, but it looks like we might have a genuine issue behind them.

summary: - relationunit_test.go: 2 tests fail only on the bot
+ relationunit_test.go: 2 tests fail with mgo >= 241
John A Meinel (jameinel) wrote :

I don't know if this is strictly a bug in mgo. But I do know that we had tests that passed with mgo revno 240 that now break with mgo revno 241.

John A Meinel (jameinel) wrote :

I'm restoring the tests and updating mgo because it appears to have been fixed there.

Changed in juju-core:
status: Confirmed → In Progress
Go Bot (go-bot)
Changed in juju-core:
status: In Progress → Fix Committed
John A Meinel (jameinel) wrote :

Only a problem in trunk, so marking released.

Changed in mgo:
status: New → Fix Committed
Changed in juju-core:
status: Fix Committed → Fix Released
John A Meinel (jameinel) wrote :

So the original tests that failed now pass with mgo rev 245; however, we have another test failing periodically even with that fix:
----------------------------------------------------------------------
FAIL: assign_test.go:913: assignCleanSuite.TestAssignUsingConstraintsToMachine

[LOG] 20.47509 INFO juju.state opening state; mongo addresses: ["localhost:45631"]; entity ""
[LOG] 20.47805 INFO juju.state connection established
[LOG] 20.49497 INFO juju.state initializing environment
test 0
test 1
test 2
test 3
test 4
test 5
test 6
test 7
test 8
test 9
test 10
test 11
test 12
test 13
test 14
test 15
assign_test.go:921:
    c.Assert(err, gc.IsNil)
... value *errors.errorString = &errors.errorString{s:"cannot add unit to service \"wordpress\": read tcp 127.0.0.1:45631: i/o timeout"} ("cannot add unit to service \"wordpress\": read tcp 127.0.0.1:45631: i/o timeout")

OOPS: 321 passed, 1 FAILED
--- FAIL: TestPackage (68.58 seconds)

So for now, we're using rev 240 on the bot running the test suite.

Changed in mgo:
status: Fix Committed → Confirmed
Gustavo Niemeyer (niemeyer) wrote :

We have had timeouts working in mgo for a while now, and this bug doesn't really say anything other than that a timeout was seen. We can't do much about it without further information.

Changed in mgo:
status: Confirmed → Invalid
status: Invalid → Fix Released