Merge lp:~stub/charms/trusty/cassandra/ensure-thrift into lp:charms/trusty/cassandra
| Status: | Merged |
|---|---|
| Merge reported by: | Cory Johns |
| Merged at revision: | not available |
| Proposed branch: | lp:~stub/charms/trusty/cassandra/ensure-thrift |
| Merge into: | lp:charms/trusty/cassandra |
| Diff against target: | 639 lines (+267/-48), 10 files modified |
| To merge this branch: | bzr merge lp:~stub/charms/trusty/cassandra/ensure-thrift |
| Related bugs: | |

Files modified:

- Makefile (+14/-2)
- README.md (+6/-5)
- config.yaml (+6/-2)
- hooks/actions.py (+2/-1)
- hooks/helpers.py (+61/-21)
- hooks/hooks.py (+3/-4)
- tests/test_actions.py (+17/-6)
- tests/test_helpers.py (+124/-4)
- tests/test_integration.py (+32/-3)
- tests/tests.yaml (+2/-0)
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Cory Johns | | 2016-01-20 | Needs Fixing on 2016-02-25 |
| Review Queue (community) | automated testing | | Approve on 2016-02-01 |
| Adam Israel | | 2015-12-19 | Needs Fixing on 2016-01-18 |
| Stuart Bishop | | | Abstain on 2015-12-19 |

Review via email:
Description of the Change
Fix Bug #1523546 by supporting Cassandra 2.2.
Note that Cassandra 2.1 is still the release series recommended for production use.
| Review Queue (review-queue) wrote : | # |
This item has failed automated testing! Results available here http://
| Stuart Bishop (stub) wrote : | # |
Cassandra 2.2 is now the default, as it is now what upstream recommends.
Also added 3.0 support for the brave (it is not recommended for production yet).
Also rolled up all the outstanding feature and bug fixes still up for review, as the fixes are needed here and conflicts need to be resolved somewhere. Up to reviewers if they want to digest this in small chunks, or just deal with this rollup branch.
| Review Queue (review-queue) wrote : | # |
The results (PASS) are in and available here: http://
| Review Queue (review-queue) wrote : | # |
This item has failed automated testing! Results available here http://
| Review Queue (review-queue) wrote : | # |
This item has failed automated testing! Results available here http://
| Review Queue (review-queue) wrote : | # |
The results (PASS) are in and available here: http://
| Review Queue (review-queue) wrote : | # |
This item has failed automated testing! Results available here http://
| Review Queue (review-queue) wrote : | # |
The results (PASS) are in and available here: http://
| Review Queue (review-queue) wrote : | # |
The results (PASS) are in and available here: http://
| Review Queue (review-queue) wrote : | # |
This item has failed automated testing! Results available here http://
| Stuart Bishop (stub) wrote : | # |
Successful run at http://
| Review Queue (review-queue) wrote : | # |
This item has failed automated testing! Results available here http://
| Review Queue (review-queue) wrote : | # |
The results (PASS) are in and available here: http://
| Adam Israel (aisrael) wrote : | # |
I see we've had recent tests pass and fail. I attempted to run the tests myself, using both the local and amazon provider. In both cases, I ran into fairly consistent timeouts causing the tests to fail.
In one test run, I saw the test fail while the units were in the state of either "Waiting for permission to bootstrap" or "Waiting for agent initialization to finish". In another, the testrunner sat for hours while all units were deployed and idle. I wonder if the current timeout logic is prone to error or race conditions?
Have you seen similar timeouts in your own testing? I'd really like to see the tests pass consistently, especially as we're preparing to roll out the cloud weather report.
Here's the log from bundletester:
http://
Here's the output of juju status immediately after the failure:
http://
I also have the full (~5M) output of debug-log, if that'd be useful.
| Stuart Bishop (stub) wrote : | # |
I'm mainly interested in a code review for this branch. Tests are passing locally and at http://
Looking at your bundletester log, the test suite ran juju-deployer to deploy a single unit and gave up waiting an hour and a half later. The environment was not in an error state, or we wouldn't have gotten as far as timing out (the 124 return code is from the timeout(1) command that wraps juju-wait). My assumption is that one of the 2 machines needed for these tests never completed provisioning, which is an issue I see regularly with juju and the local provider (lxc sets up a unit fine, but it gets stuck in juju in the 'provisioning' state, either without an agent state or with an agent state but no IP address). If provisioning doesn't complete locally within a couple of minutes, it is never going to complete. The timeout is only set to 1.5 hours to handle other clouds, and I'd appreciate input on whether it should be reduced (what is the worst case you see to provision 4 VMs?).

The alternative explanation is that one of the hooks ended up in an infinite loop, but I haven't seen that happen on my recent branches, as loops like 'wait for Cassandra to start responding' have timeouts. It would also be obvious from the debug-log, as you would see a hook repeatedly spamming messages in a loop for most of the hour and a half before Jan 18 18:56:32.
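For reference, a minimal sketch of the timeout behaviour described above, assuming the test harness wraps juju-wait in timeout(1) with a 90 minute limit; the exact invocation is illustrative, not the charm's actual Makefile target:

```python
import subprocess

# Illustrative only: run juju-wait under timeout(1) with a 90 minute (5400s) limit.
# timeout(1) exits with status 124 when the deadline is hit, which is the return
# code mentioned above: the environment never settled, but no unit was in error.
proc = subprocess.run(["timeout", "5400", "juju-wait"])
if proc.returncode == 124:
    print("juju-wait timed out; a machine probably never finished provisioning")
elif proc.returncode != 0:
    print("juju-wait reported a unit in an error state")
```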
The output of juju status is for the next set of tests, which deploys three units, and shows the units happily executing their hooks and setting things up. 'waiting for agent initialization to finish' is a Juju message: the subordinates are waiting for hooks on their primaries to complete so they can be set up. 'waiting for permission to bootstrap' is a normal message from the charm: the unit is waiting for permission from the leader to continue Cassandra setup (we need to coordinate adding units to the Cassandra cluster, one node at a time, with a two minute delay between each one).
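A rough sketch of that coordination, assuming charmhelpers-style leader APIs; the function names, arguments, and leader-data key below are hypothetical, and this is not the charm's actual implementation:

```python
import time

from charmhelpers.core import hookenv

BOOTSTRAP_DELAY = 120  # two-minute gap between nodes joining the cluster


def grant_next_bootstrap(waiting_units, last_grant_time):
    """Leader-side sketch: allow one waiting unit at a time to join the cluster."""
    if not hookenv.is_leader():
        return
    if waiting_units and time.time() - last_grant_time >= BOOTSTRAP_DELAY:
        hookenv.leader_set({"bootstrap_allowed": sorted(waiting_units)[0]})


def bootstrap_permitted():
    """Per-unit sketch: the state behind the 'Waiting for permission' status."""
    if hookenv.leader_get("bootstrap_allowed") == hookenv.local_unit():
        return True
    hookenv.status_set("waiting", "Waiting for permission to bootstrap")
    return False
```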
About the only thing I can think of to do here is to catch provisioning failures and loudly blame others where appropriate (or perhaps skipping the tests is better?). I normally Ctrl-C, rebootstrap and try again, but I think catching the failure makes sense for CI runs.
| Stuart Bishop (stub) wrote : | # |
https:/
| Review Queue (review-queue) wrote : | # |
This item has failed automated testing! Results available here http://
| Review Queue (review-queue) wrote : | # |
The results (PASS) are in and available here: http://
| Review Queue (review-queue) wrote : | # |
The results (PASS) are in and available here: http://
| Cory Johns (johnsca) wrote : | # |
I merged this and was then notified that it broke someone's bundle. It turns out that the two new PPAs in the install_sources option are absolutely required, so anyone who was using that option previously ends up with a broken install if they don't add them: https:/
I think the mandatory PPAs should not be part of that config option, and instead should be added by the code.
This has been reverted for the time being.
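For illustration, a minimal sketch of the "added by the code" approach suggested above, assuming charmhelpers.fetch is available; the PPA names are placeholders, not the archives this charm actually needs:

```python
from charmhelpers import fetch

# Placeholder archive names; the real sources are whatever the charm requires.
REQUIRED_PPAS = ["ppa:example/cassandra", "ppa:example/openjdk"]


def add_required_sources():
    # Hard-code the archives the charm cannot install without...
    for ppa in REQUIRED_PPAS:
        fetch.add_source(ppa)
    # ...then layer on whatever the operator supplied via the install_sources
    # and install_keys config options, and refresh the package index.
    fetch.configure_sources(update=True)
```

As the follow-up comments explain, the charm deliberately avoids hard-coding the PPAs so that network-restricted deployments can point install_sources at internal archives instead.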
| Cory Johns (johnsca) wrote : | # |
Note: a workaround is to ensure that the PPAs are always included in the config value (e.g. http://
| Stuart Bishop (stub) wrote : | # |
This works as intended, per Bug #1517803. The PPA can't be hard coded, as that would mean the charm requires network access to ppa.launchpad.net to install.
| Stuart Bishop (stub) wrote : | # |
To clarify, the PPAs are not mandatory. In our internal and customer deployments the necessary packages are placed on internal archives.
| Antoni Segura Puimedon (celebdor) wrote : | # |
I understand the need for the change; I just wish that charms had major and minor releases so I could pin to a major version instead of a minor one in my bundles and still get non-breaking changes.
@johnsca: I'll add the options to the bundle tomorrow and test it. I would appreciate it if you could delay the re-merge until then.
| Cory Johns (johnsca) wrote : | # |
This has been merged again.
This item has failed automated testing! Results available here http://juju-ci.vapour.ws:8080/job/charm-bundle-test-lxc/1719/