Merge lp:~flacoste/storm/bug-360846-upstream into lp:storm
Proposed by
Francis J. Lacoste
Status: | Merged | ||||
---|---|---|---|---|---|
Merged at revision: | not available | ||||
Proposed branch: | lp:~flacoste/storm/bug-360846-upstream | ||||
Merge into: | lp:storm | ||||
Diff against target: | 93 lines | ||||
To merge this branch: | bzr merge lp:~flacoste/storm/bug-360846-upstream | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
James Henstridge | Approve | ||
Review via email: mp+6358@code.launchpad.net |
To post a comment you must log in.
Hi again,
So it seems that the previous fix didn't work in practice. But we found the
root cause (at least from the storm pov): the disconnection was happening in
the tracer code when setting the statement timeout.
So this new branch runs the tracing code through _check_disconnect. That way
any disconnect in the tracer code will trigger a reconnection. I have tests
this time.
One issue pending is what to do with the InterfaceError. That error is raised
when using a connection which has been disconnected. So if everything is fine,
this error is never triggered in practice. But if there is a Storm bug, like
here, the damages are great since that error will never trigger a
reconnection and we'll need to restart app servers. But at the same time,
making InterfaceError reconnects, would hide the underlying storm bug
(something should be run through _check_disconnect, but isn't).
I discussed this fix with Jamu and we couldn't decide on what was the right
thing to do. We thought that a compromise where the error is recovered from,
but at the same time a 'soft oops' is raised would be best. Unfortunately,
storm doesn't use the OOPS system (which isn't a flaw in itself). So I'd like
to hear your opinion on that.