juju 2.0-beta12 ERROR unable to contact api server after 61 attempts: upgrade in progress (upgrade in progress)

Bug #1605313 reported by Felipe Reyes
This bug affects 1 person
Affects    Status  Importance  Assigned to  Milestone
juju-core  New     Undecided   Unassigned

Bug Description

I have an environment consisting of 3 VMs registered in MAAS that I have been using since approximately 2.0-beta7. Since I upgraded to beta12, the bootstrap process fails with the following error:

$ juju bootstrap --keep-broken --upload-tools --no-gui --constraints tags=bootstrap mymaas mymaas
Creating Juju controller "mymaas" on mymaas
Bootstrapping model "controller"
Starting new instance for initial controller
Launching instance
WARNING no architecture was specified, acquiring an arbitrary node
 - 4y3h7s
Building tools to upload (2.0-beta12.1-xenial-amd64)
Installing Juju agent on bootstrap instance
Juju GUI installation has been disabled
Waiting for address
Attempting to connect to 192.168.30.3:22
Attempting to connect to fd39:6c94:2e21:a491:5054:ff:fe47:9fb5:22
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:5lZm8VTwGRekXIFJfVfHPckLGBmC57ZJ11BPcrUGCPs.
Please contact your system administrator.
Add correct host key in /home/ubuntu/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/ubuntu/.ssh/known_hosts:6
  remove with:
  ssh-keygen -f "/home/ubuntu/.ssh/known_hosts" -R 192.168.30.3
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Logging to /var/log/cloud-init-output.log on remote host
Running apt-get update
Running apt-get upgrade
Installing package: curl
Installing package: cpu-checker
Installing package: bridge-utils
Installing package: cloud-utils
Installing package: cloud-image-utils
Installing package: tmux
Bootstrapping Juju machine agent
Starting Juju machine agent (jujud-machine-0)
Bootstrap agent installed
Waiting for API to become available: upgrade in progress (upgrade in progress)
[...]
Waiting for API to become available: upgrade in progress (upgrade in progress)
ERROR unable to contact api server after 61 attempts: upgrade in progress (upgrade in progress)

This is probably a race condition that my environment is triggering, because I was able to deploy successfully once.

In the machine-0.log ( http://paste.ubuntu.com/20323646/ ) there are a lot of "juju.api.watcher watcher.go:86 error trying to stop watcher: connection is shut down" and "juju.rpc server.go:540 error writing response: write tcp 127.0.0.1:17070->127.0.0.1:56230: write: broken pipe" errors.
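
To put a number on "a lot", the two messages quoted above can be tallied from a local copy of machine-0.log. This is only a rough sketch: the helper name count_errors is made up for illustration, and the patterns are just the substrings quoted in this report.

```python
from collections import Counter

# Substrings taken from the two messages quoted in this report.
PATTERNS = (
    "error trying to stop watcher: connection is shut down",
    "write: broken pipe",
)


def count_errors(lines, patterns=PATTERNS):
    """Return a Counter mapping each pattern to the number of lines containing it.

    count_errors is a hypothetical helper, not part of juju.
    """
    counts = Counter()
    for line in lines:
        for pattern in patterns:
            if pattern in line:
                counts[pattern] += 1
    return counts
```

For example, `count_errors(open("machine-0.log", errors="replace"))` would return the per-message totals for a log file saved under that name.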

Also, while inspecting /var/log/syslog I found an "exception: E11000 duplicate key error collection" error[0], but I'm not sure whether this is something that juju handles properly internally.

[Other info]

* juju-2.0, version: 2.0-beta12~20160715~4141~abcd123-20160715+4141+abcd123~16.04
* attaching /var/log/ directory from the controller node.

[0] Jul 21 14:55:22 ubuntu mongod.37017[5332]: [conn22] command juju.ip.addresses command: insert { insert: "ip.addresses", documents: [ { _id: "f77a530a-e744-4dd2-83c8-60f84ca449a2:m#0#d#ens3#ip#192.168.30.3", model-uuid: "f77a530a-e744-4dd2-83c8-60f84ca449a2", providerid: "525", device-name: "ens3", machine-id: "0", subnet-cidr: "192.168.30.0/24", config-method: "manual", value: "192.168.30.3", dns-servers: [ "192.168.30.1" ], gateway-address: "192.168.30.1", txn-revno: 2, txn-queue: [ "5790e25a44e8db15873c746c_69c65be8" ] } ], writeConcern: { getLastError: 1, j: true }, ordered: true } ninserted:0 keyUpdates:0 writeConflicts:0 exception: E11000 duplicate key error collection: juju.ip.addresses index: _id_ dup key: { : "f77a530a-e744-4dd2-83c8-60f84ca449a2:m#0#d#ens3#ip#192.168.30.3" } code:11000 numYields:0 reslen:309 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 112777 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_query 123ms
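
For anyone who wants to check their own syslog for the same symptom, the collection, index and duplicated key can be pulled out of lines like [0] with a small script. This is a sketch under assumptions: the regular expression is derived only from the single mongod line above, so other MongoDB versions may word the message differently, and find_duplicate_keys is a hypothetical helper name.

```python
import re

# Pattern modelled on the one mongod log line quoted in this report (assumption:
# other MongoDB versions may format the E11000 message differently).
E11000_RE = re.compile(
    r"E11000 duplicate key error collection: (?P<collection>\S+) "
    r"index: (?P<index>\S+) dup key: \{ : \"(?P<key>[^\"]*)\" \}"
)


def find_duplicate_keys(lines):
    """Return (collection, index, key) tuples for every E11000 line found."""
    results = []
    for line in lines:
        match = E11000_RE.search(line)
        if match:
            results.append(
                (match.group("collection"), match.group("index"), match.group("key"))
            )
    return results
```

Running `find_duplicate_keys(open("/var/log/syslog", errors="replace"))` on the controller node would list each affected collection and key, which makes it easier to see whether the failure always involves juju.ip.addresses.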

Tags: sts
Felipe Reyes (freyes) wrote :
tags: added: sts
Adam Stokes (adam-stokes) wrote :

I've been battling the E11000 error a lot. Can you try the binaries from https://bugs.launchpad.net/juju-core/+bug/1604644/comments/11 ?

I haven't run into this issue with those. Make sure to use --upload-tools.
