relationunit_test.go: 2 tests fail with mgo >= 241

Bug #1221705 reported by Dimiter Naydenov
Affects            Status        Importance  Assigned to  Milestone
juju-core (1.14)   Fix Released  Critical    Unassigned   (status tracked in Trunk)
juju-core (Trunk)  Fix Released  Critical    Unassigned
mgo                Fix Released  Undecided   Unassigned

Bug Description

See: https://pastebin.canonical.com/97052/

I cannot waste more time on this issue. Let's look
more into it when jam's back.

Proposed a branch that skips those tests: https://codereview.appspot.com/13487044/

Related branches

Revision history for this message
John A Meinel (jameinel) wrote :

I don't know why, but the bot is running mgo@revno 243. This includes some of Gustavo's changes to the mgo driver. I did not personally update mgo, but maybe some other trigger did.

If I update to rev 242 of mgo on my local box, these tests fail for me.

I only thought of this because of Gustavo's recent release of mgo:
http://blog.labix.org/2013/09/05/mgo-r2013-09-04-released

Note that one of the mgo changes includes support for "Improved Timeouts", which involves "Socket level deadlines".

So I'm guessing something in the stack picked up the change, and something about these tests causes operations to take longer in Mongo.

If I use mgo@revno 238 (the last version I had locally), those 2 tests take 1.5s and 1.4s, which is about 10× slower than the other similar tests (all the others take <200ms).

So my guess is that an updated mgo added socket timeouts, which were hiding whatever was actually going wrong. I don't know what that is, but I do think it is worth investigating. We can disable the tests for now, since we shouldn't block landing code, but it looks like we might have a genuine issue behind them.

summary: - relationunit_test.go: 2 tests fail only on the bot
+ relationunit_test.go: 2 tests fail with mgo >= 241
John A Meinel (jameinel) wrote :

I don't know if this is strictly a bug in mgo. But I do know that we had tests that passed with mgo revno 240 that now break with mgo revno 241.

John A Meinel (jameinel) wrote :

I'm restoring the tests and updating mgo because it appears to have been fixed there.

Changed in juju-core:
status: Confirmed → In Progress
Go Bot (go-bot)
Changed in juju-core:
status: In Progress → Fix Committed
John A Meinel (jameinel) wrote :

Only a problem in trunk, so marking released.

Changed in mgo:
status: New → Fix Committed
Changed in juju-core:
status: Fix Committed → Fix Released
John A Meinel (jameinel) wrote :

So the original tests that failed now pass with mgo rev 245; however, we have another test failing periodically even with that fix:
----------------------------------------------------------------------
FAIL: assign_test.go:913: assignCleanSuite.TestAssignUsingConstraintsToMachine

[LOG] 20.47509 INFO juju.state opening state; mongo addresses: ["localhost:45631"]; entity ""
[LOG] 20.47805 INFO juju.state connection established
[LOG] 20.49497 INFO juju.state initializing environment
test 0
test 1
test 2
test 3
test 4
test 5
test 6
test 7
test 8
test 9
test 10
test 11
test 12
test 13
test 14
test 15
assign_test.go:921:
    c.Assert(err, gc.IsNil)
... value *errors.errorString = &errors.errorString{s:"cannot add unit to service \"wordpress\": read tcp 127.0.0.1:45631: i/o timeout"} ("cannot add unit to service \"wordpress\": read tcp 127.0.0.1:45631: i/o timeout")

OOPS: 321 passed, 1 FAILED
--- FAIL: TestPackage (68.58 seconds)

So for now, we're using rev 240 on the bot running the test suite.

Changed in mgo:
status: Fix Committed → Confirmed
Gustavo Niemeyer (niemeyer) wrote :

We have had timeouts working in mgo for a while now, and this bug doesn't really say anything other than that a timeout was seen. We can't do much about it without further information.

Changed in mgo:
status: Confirmed → Invalid
status: Invalid → Fix Released