juju environment not usable after the upgrade

Bug #1473517 reported by Mario Splivalo
This bug affects 2 people
Affects      Status         Importance   Assigned to   Milestone
juju-core    Fix Released   Critical     Ian Booth
1.22         Fix Released   Critical     Ian Booth
1.24         Fix Released   Critical     Ian Booth

Bug Description

After I managed to upgrade my deployment from 1.18 to 1.22 (as explained in bug 1473450) and fixed the issues (by adding the missing bits to the failed units' agent.conf files), I was still not able to use the environment.

Trying to deploy a new service, or to add unit(s) to existing services, always leaves the agent in the 'installing' state.

These are the log messages that are repeated over and over again:

2015-07-10 15:44:03 ERROR juju.worker.uniter.filter filter.go:132 tomb: dying
2015-07-10 15:44:03 ERROR juju.worker runner.go:219 exited "uniter": ModeInstalling cs:trusty/ubuntu-3: preparing operation "install cs:trusty/ubuntu-3": failed to download charm "cs:trusty/ubuntu-3" from ["https://10.5.2.172:17070/environment/76b73a45-46ff-436b-81b6-a475a52ef548/charms?file=%2A&url=cs%3Atrusty%2Fubuntu-3"]: cannot download "https://10.5.2.172:17070/environment/76b73a45-46ff-436b-81b6-a475a52ef548/charms?file=%2A&url=cs%3Atrusty%2Fubuntu-3": bad http response: 404 Not Found
2015-07-10 15:44:06 ERROR juju.worker.uniter.filter filter.go:132 tomb: dying
2015-07-10 15:44:06 ERROR juju.worker runner.go:219 exited "uniter": ModeInstalling cs:trusty/ubuntu-3: preparing operation "install cs:trusty/ubuntu-3": failed to download charm "cs:trusty/ubuntu-3" from ["https://10.5.2.172:17070/environment/76b73a45-46ff-436b-81b6-a475a52ef548/charms?file=%2A&url=cs%3Atrusty%2Fubuntu-3"]: cannot download "https://10.5.2.172:17070/environment/76b73a45-46ff-436b-81b6-a475a52ef548/charms?file=%2A&url=cs%3Atrusty%2Fubuntu-3": bad http response: 404 Not Found
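
To see at a glance which charms the uniter is failing to fetch, the repeated lines above can be reduced to a list of distinct charm URLs. This is only a sketch: `failed_charms` is a hypothetical helper name, and it assumes the unit log uses exactly the message wording shown in this report.

```shell
# Hypothetical helper: read a unit log on stdin and print each distinct
# charm URL whose download failed with "404 Not Found".
# Assumes the message format shown in this bug report.
failed_charms() {
  grep 'bad http response: 404 Not Found' |
    sed -n 's/.*failed to download charm "\([^"]*\)".*/\1/p' |
    sort -u
}
```

Fed the log lines quoted above, it would print cs:trusty/ubuntu-3 once, since both failures refer to the same charm.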

Mario Splivalo (mariosplivalo) wrote :

I have observed similar behavior when upgrading to 1.24.2 too.

Ian Booth (wallyworld) wrote :

This looks like the upgrade from 1.18 has not properly copied charms that were previously stored in cloud storage into the environment blob store. Can you please attach (debug) logs from when the upgrades were run so we can see what steps were taken during the upgrade, and look for any errors.

I assume the upgrade was done first to 1.20.14 and then 1.22?

Also, can you expand on the missing apiaddress info - it should have been there from when the 1.18 system was installed. Does it disappear during the upgrade? How many units are affected? Is the password also missing?

Ian Booth (wallyworld) wrote :

I ran an experiment:

juju bootstrap a 1.18 system
juju deploy ubuntu -n 5
juju upgrade-juju --version 1.20.14

The upgrade worked and all agents on the deployed units upgraded to 1.20.14 and restarted.

Then I added a unit to the ubuntu service:

juju add-unit ubuntu

This resulted in a failure to provision a new machine:

  "6":
    agent-state-info: '(error: cannot make user data: invalid machine configuration:
      missing API hosts)'
    instance-id: pending
    series: trusty

For some reason, the 1.20.14 agents, on restart, failed to properly load and process a file called "provider-state" in the environment's control bucket. This meant that the jujud agent did not have the API host addresses available.

As a manual workaround, I restarted the jujud agent on machine 0:

juju ssh 0
sudo service jujud-machine-0 restart

Then I could add a unit.

To get the previously failed add-unit operations to work, I told juju to retry provisioning of those failed machines, e.g.:

juju retry-provisioning 6

After a minute or so, Juju wakes up and retries the failed machines, and the earlier add-unit operation completes.

This workaround fixed the observed issues on an upgraded 1.20.14 system coming from 1.18 in a simple example. It would need testing in a more complex environment to ensure it is robust. There are no plans for another 1.20 release.
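
The manual steps above can be collected into a small dry-run helper that only prints the commands, so they can be reviewed before anything is run. This is a sketch under assumptions: `workaround_commands` is a hypothetical name, the state server is machine 0 as in this experiment, and passing the remote command directly to `juju ssh` (rather than typing it in an interactive session, as was done above) is assumed to work.

```shell
# Hypothetical dry-run helper: print the workaround commands from this
# report for the given failed machine IDs, without executing anything.
workaround_commands() {
  # Restart the machine agent on machine 0 (assumes juju ssh accepts a
  # remote command; the report ran it from an interactive session).
  echo "juju ssh 0 sudo service jujud-machine-0 restart"
  # Ask juju to retry provisioning for each machine that failed earlier.
  for id in "$@"; do
    echo "juju retry-provisioning $id"
  done
}
```

For the example above, `workaround_commands 6` prints the restart command followed by `juju retry-provisioning 6`.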

Ian Booth (wallyworld) wrote :

I then upgraded the 1.20.14 environment to 1.22.6:

juju upgrade-juju --version=1.22.6

I didn't see the 404 error adding units BUT a close examination of the logs shows a serious upgrade issue.

The TL;DR is that the upgrade step in 1.22 to import charms from cloud storage is broken due to how it works with the JES rollout across 1.21 and 1.22. This must be fixed for the current juju release.

For 1.22, we need to either:

1. cut another 1.22 release which fixes the issue
2. insist that upgrades from 1.20 go through 1.21

For option 2, to upgrade from 1.18, the steps would be:

juju upgrade-juju --version 1.20.14
(wait for all agents to upgrade)
juju upgrade-juju --version 1.21.3
(wait for all agents to upgrade)

then you can upgrade to 1.22.6 or 1.24.x or later.
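
The "wait for all agents to upgrade" steps can be checked rather than eyeballed. The sketch below assumes that `juju status` output carries an `agent-version:` line for every machine and unit; `all_agents_at` is a hypothetical helper name, not part of juju.

```shell
# Hypothetical helper: read `juju status` output on stdin and succeed only
# if every agent-version field equals the target version.
# Assumes each machine and unit reports an "agent-version:" line.
all_agents_at() {
  target=$1
  # Collect the distinct versions reported; exactly one value equal to
  # the target means every agent has finished upgrading.
  versions=$(sed -n 's/^ *agent-version: *//p' | sort -u)
  [ "$versions" = "$target" ]
}
```

A possible use is `juju status | all_agents_at 1.20.14 && echo "all agents upgraded"`, repeated until it succeeds before starting the next hop.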

Changed in juju-core:
milestone: none → 1.25.0
status: New → Triaged
importance: Undecided → Critical
Ian Booth (wallyworld)
Changed in juju-core:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → Fix Committed
Mario Splivalo (mariosplivalo) wrote :

Ian, thank you for the comments and instructions. I was able to upgrade my test environment from 1.18 to 1.24.2 with your explanation. I did need to save the agent.conf files, as some of them were missing the apiserver entry (that happened from 1.18 to 1.20 and then from 1.20 to 1.21).
From there I restarted jujud on the bootstrap node, and I ended up with a usable 1.24.2 environment.

I will test the upgrade of the OpenStack deployment, which will mimic what my customer has. I'll report back here if I run into trouble.

Ian Booth (wallyworld) wrote :

@Mario

Turns out my idea to go via 1.21 is flawed - it can result in errors because 1.21 does not recognise wily.

We are about to release a 1.22.7 which will fix the underlying upgrade issue so you can go straight from 1.20 to 1.22.7 in one step.
I hope to have that in a proposed ppa within a day and then stable next week.

Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Felipe Reyes (freyes)
tags: added: sts
Chris J Arges (arges) wrote : Please test proposed package

Hello Mario, or anyone else affected,

Accepted juju-core into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/juju-core/1.22.8-0ubuntu1~14.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed