ceph-relation-changed hook fails due to hypervisor connection failure

Bug #1427660 reported by Liam Young
This bug affects 4 people
Affects: nova-compute (Juju Charms Collection)
Status: Fix Released
Importance: Medium
Assigned to: Liam Young

Bug Description

## Issue
libvirt fails to start in Precise-Icehouse deployments (stable charm set). Restarting libvirt AND nova-compute was necessary for me, as nova-compute had also died. http://paste.ubuntu.com/10768920/

## nova-compute juju unit log
2015-04-07 21:29:17 INFO unit.nova-compute/0.ceph-relation-changed logger.go:40 subprocess.CalledProcessError: Command '['virsh', '-c', 'qemu:///system', 'secret-list']' returned non-zero exit status 1

## libvirtd.log from nova compute node
2015-04-07 21:26:22.456+0000: 7508: info : libvirt version: 1.2.2
2015-04-07 21:26:22.456+0000: 7508: error : virDriverLoadModule:79 : failed to load module /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so: undefined symbol: rbd_create3

## Original description
When running mojo spec specs/full_stack/next_deploy/icehouse on precise, the ceph-relation-changed hook fails with:

error: failed to connect to the hypervisor
error: no connection driver available for qemu:///system
Traceback (most recent call last):
  File "./hooks/ceph-relation-changed", line 335, in <module>
    main()
  File "./hooks/ceph-relation-changed", line 329, in main
    hooks.execute(sys.argv)
  File "/var/lib/juju/agents/unit-nova-compute-1/charm/hooks/charmhelpers/core/hookenv.py", line 544, in execute
    self._hooks[hook_name]()
  File "/var/lib/juju/agents/unit-nova-compute-1/charm/hooks/charmhelpers/core/host.py", line 312, in wrapped_f
    f(*args, **kwargs)
  File "./hooks/ceph-relation-changed", line 261, in ceph_changed
    key=relation_get('key'))
  File "/var/lib/juju/agents/unit-nova-compute-1/charm/hooks/nova_compute_utils.py", line 451, in create_libvirt_secret
    if secret_uuid in check_output(['virsh', '-c', uri, 'secret-list']):
  File "/usr/lib/python2.7/subprocess.py", line 544, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['virsh', '-c', 'qemu:///system', 'secret-list']' returned non-zero exit status 1

Restarting libvirt-bin solves the problem.
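
The manual workaround above could be sketched in Python roughly as follows. This is an illustration only, not the charm's actual fix: the function names, the injectable `run` and `restart` parameters, and the single-retry policy are all hypothetical, and the service name `libvirt-bin` is taken from this report.

```python
import subprocess


def list_secrets(uri="qemu:///system", run=subprocess.check_output):
    """Return the output of `virsh -c <uri> secret-list` as text."""
    out = run(["virsh", "-c", uri, "secret-list"])
    return out.decode() if isinstance(out, bytes) else out


def restart_libvirt():
    """Restart libvirt (assumption: the service is named libvirt-bin)."""
    subprocess.check_call(["service", "libvirt-bin", "restart"])


def list_secrets_with_restart(uri="qemu:///system",
                              run=subprocess.check_output,
                              restart=restart_libvirt):
    """Try secret-list; if virsh fails, restart libvirt once and retry."""
    try:
        return list_secrets(uri, run)
    except subprocess.CalledProcessError:
        # virsh could not reach the hypervisor; restart and try once more.
        restart()
        return list_secrets(uri, run)
```

A second failure after the restart still raises `CalledProcessError`, so a genuinely broken libvirt is not hidden from the hook.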

Additional info:

root@juju-lytrusty-machine-14:~# cat /var/log/libvirt/libvirtd.log
2015-03-03 11:20:20.609+0000: 7494: info : libvirt version: 1.2.2
2015-03-03 11:20:20.609+0000: 7494: error : virDriverLoadModule:79 : failed to load module /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so: undefined symbol: rbd_create3
2015-03-03 11:20:20.617+0000: 7494: error : virDriverLoadModule:79 : failed to load module /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so: undefined symbol: virStorageFileStat
2015-03-03 11:20:41.866+0000: 8506: info : libvirt version: 1.2.2
2015-03-03 11:20:41.866+0000: 8506: error : virDriverLoadModule:79 : failed to load module /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so: undefined symbol: rbd_create3
2015-03-03 11:20:41.872+0000: 8506: error : virDriverLoadModule:79 : failed to load module /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so: undefined symbol: virStorageFileStat
2015-03-03 11:22:51.725+0000: 8509: error : do_open:1170 : no connection driver available for qemu:///system
2015-03-03 11:22:51.726+0000: 8506: error : virNetSocketReadWire:1454 : End of file while reading data: Input/output error
2015-03-03 11:23:17.691+0000: 8516: error : do_open:1170 : no connection driver available for qemu:///system
2015-03-03 11:23:17.792+0000: 8506: error : virNetSocketReadWire:1454 : End of file while reading data: Input/output error
2015-03-03 12:18:19.220+0000: 8511: error : do_open:1170 : no connection driver available for qemu:///system
2015-03-03 12:18:19.221+0000: 8506: error : virNetSocketReadWire:1454 : End of file while reading data: Input/output error
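
For triage, the driver load failures in a log like the one above can be pulled out with a short script. This is a minimal sketch, assuming the log line layout shown in this report; the function name `missing_symbols` is hypothetical.

```python
import re

# Matches libvirtd lines of the form seen in this report:
#   ... failed to load module <path> <path>: undefined symbol: <name>
_UNDEFINED = re.compile(
    r"failed to load module (?P<module>\S+) .*undefined symbol: (?P<symbol>\S+)"
)


def missing_symbols(log_text):
    """Return a sorted list of unique (module file name, symbol) pairs."""
    found = set()
    for line in log_text.splitlines():
        match = _UNDEFINED.search(line)
        if match:
            module = match.group("module").rsplit("/", 1)[-1]
            found.add((module, match.group("symbol")))
    return sorted(found)
```

Passing the contents of /var/log/libvirt/libvirtd.log to `missing_symbols` would list each driver module together with the symbol it could not resolve.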

Liam Young (gnuoy)
Changed in nova-compute (Juju Charms Collection):
milestone: none → 15.04
assignee: nobody → Liam Young (gnuoy)
importance: Undecided → Medium
status: New → In Progress
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
status: In Progress → Fix Committed
Ryan Beisner (1chb1n) wrote :

FYI, found and flagged a few duplicate bugs, as I almost filed another duplicate. ;-)

Ryan Beisner (1chb1n) wrote :

## Issue
libvirt fails to start in Precise-Icehouse deployments (stable charm set). Restarting libvirt AND nova-compute was necessary for me, as nova-compute had also died.

## nova-compute juju unit log
2015-04-07 21:29:17 INFO unit.nova-compute/0.ceph-relation-changed logger.go:40 subprocess.CalledProcessError: Command '['virsh', '-c', 'qemu:///system', 'secret-list']' returned non-zero exit status 1

## libvirtd.log from nova compute node
2015-04-07 21:26:22.456+0000: 7508: info : libvirt version: 1.2.2
2015-04-07 21:26:22.456+0000: 7508: error : virDriverLoadModule:79 : failed to load module /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so /usr/lib/libvirt/connection-driver/libvirt_driver_storage.so: undefined symbol: rbd_create3

## versions
ubuntu@juju-osci-sv19-machine-13:~$ dpkg-query --show libvirt* nova* qemu-system-common qemu-utils
libvirt-bin 1.2.2-0ubuntu13.1.6~cloud2
libvirt0 1.2.2-0ubuntu13.1.6~cloud2
nova-common 1:2014.1.3-0ubuntu1~cloud0
nova-compute 1:2014.1.3-0ubuntu1~cloud0
nova-compute-hypervisor
nova-compute-kvm 1:2014.1.3-0ubuntu1~cloud0
nova-compute-libvirt 1:2014.1.3-0ubuntu1~cloud0
qemu-system-common 2.0.0+dfsg-2ubuntu1.6~cloud0
qemu-utils 2.0.0+dfsg-2ubuntu1.6~cloud0

ubuntu@juju-osci-sv19-machine-13:~$ uname -a
Linux juju-osci-sv19-machine-13 3.2.0-79-virtual #115-Ubuntu SMP Thu Mar 12 14:37:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@juju-osci-sv19-machine-13:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.5 LTS
Release: 12.04
Codename: precise

...

See this pastebin for a lot of detail:

http://paste.ubuntu.com/10768920/

Ryan Beisner (1chb1n) wrote :

Setting back to new, as it's a blocker for precise-icehouse stable charm deployments.

tags: added: amulet openstack uosci
Changed in nova-compute (Juju Charms Collection):
status: Fix Committed → New
Ryan Beisner (1chb1n) wrote :

FYI, a potentially related precise rbd / qemu issue: https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/1440948

tags: added: oil
Ryan Beisner (1chb1n)
description: updated
description: updated
James Page (james-page) wrote :

This is caused by librbd1 symbol versioning being a bit bust in older ceph versions. Specifically, libvirt thinks it will quite happily work with the librbd1 from 12.04, which lacks one symbol that libvirt uses because it was built against the 14.04 version of ceph.

We should probably fix this by upgrading librbd1 and librados1 on charm install, OR by versioning the depends in libvirt to ensure that the required librbd1 version is installed.
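
Whether the installed librbd1 provides the symbol can be checked directly from Python. This is a minimal sketch, assuming the library is loadable under the name `librbd.so.1` and using the symbol `rbd_create3` from the libvirtd log; the helper name `exports_symbol` is hypothetical.

```python
import ctypes


def exports_symbol(library, symbol):
    """Return True if `library` can be loaded and exports `symbol`."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        # Library is not installed or cannot be loaded.
        return False
    # ctypes raises AttributeError for an unknown symbol, which hasattr catches.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Assumption: librbd1 is installed as librbd.so.1.
    print(exports_symbol("librbd.so.1", "rbd_create3"))
```

A result of False on a compute node would be consistent with the "undefined symbol: rbd_create3" error in the libvirtd log.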

Serge Hallyn (serge-hallyn) wrote :

@james-page,

do you know offhand which librados version libvirt (and qemu) should minimally depend on?

Ryan Beisner (1chb1n) wrote :

Proposed and tested a fix for stable / trunk nova-compute charm. Tested full deployment, launched & connected to instances.

Juju stat:
  http://paste.ubuntu.com/10784531/

Take note that the n-c amulet test is now failing on a different issue (bug 1440953).

jenkins@juju-osci-machine-17:~/tools/openstack-charm-testing$ juju run --service nova-compute "dpkg-query --show nova-compute; lsb_release -c"
- MachineId: "13"
  Stdout: "nova-compute\t1:2014.1.3-0ubuntu1~cloud0\nCodename:\tprecise\n"
  UnitId: nova-compute/0
- MachineId: "14"
  Stdout: "nova-compute\t1:2014.1.3-0ubuntu1~cloud0\nCodename:\tprecise\n"
  UnitId: nova-compute/1
- MachineId: "15"
  Stdout: "nova-compute\t1:2014.1.3-0ubuntu1~cloud0\nCodename:\tprecise\n"
  UnitId: nova-compute/2

| 599bcae5-be48-40fe-943b-27ea5afcb975 | trusty182313 | ACTIVE | - | Running | private=192.168.21.4, 172.17.114.202 |

Ryan Beisner (1chb1n) wrote :

librbd1 0.80.9-0ubuntu0.14.04.1 was eventually getting installed in nova-compute's ceph-relation-joined hook (see http://paste.ubuntu.com/10785243/) ... after 0.41-1ubuntu2.1 was installed earlier (http://paste.ubuntu.com/10785263/).

Libvirt needed a restart after that install, so I pulled gnuoy's next branch fix.

James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
status: New → Fix Committed
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
status: Fix Committed → Fix Released
Joe Guo (guoqiao) wrote :

Somehow I hit this issue again today, with Ubuntu 18.04, lxc version 3.15, and OpenStack Rocky.
The error was exactly the same: the ceph-relation-changed hook fails on the virsh -c command.

It was fixed, as Ryan Beisner suggested, by restarting the libvirtd service on the nova-compute node:

juju ssh nova-compute/0
sudo service libvirtd restart
exit
juju resolve nova-compute/0

I may retry the process to confirm later.
