Compose machine failure: Start tag expected, '<' not found, line 1, column 1

Bug #1690781 reported by Данило Шеган (Danilo Šegan)
This bug affects 2 people
Affects: MAAS
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

Attempting to compose a machine on a virsh pod failed for me with an error like the one below:

  $ maas maas1 pod compose 2
  Unable to compose machine because: Failed talking to pod: Start tag expected, '<' not found, line 1, column 1

The full traceback from regiond.log is at http://paste.ubuntu.com/24549398/, and the one from rackd.log is at http://paste.ubuntu.com/24549396/

See the bottom of the bug for full reproduction steps. The error goes away after a reboot if your virsh pod is properly set up, so I don't think this is a high priority. The workaround (other than a reboot) is to restart the libvirt-bin service.

Looking at what get_domain_capabilites() (note the typo in the name; watch out when you grep for it) does, it basically just calls out to

  virsh domcapabilities --virttype kvm

This was on a nested VM instance, where the call failed with:

  ubuntu@maas1:~$ virsh domcapabilities --virttype kvm
  error: failed to get emulator capabilities
  error: invalid argument: unable to find any emulator to serve 'x86_64' architecture

so blindly attempting to parse that as XML failed.
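
For illustration, here is a minimal sketch of why the parse fails with exactly this message. It assumes the captured output is handed to lxml's etree.XML() (the parser mentioned further down); the exact captured text is a guess:

  from lxml import etree

  # Hypothetical stand-in for what the failing virsh call produced;
  # the exact captured text is an assumption.
  output = (
      "error: failed to get emulator capabilities\n"
      "error: invalid argument: unable to find any emulator to serve "
      "'x86_64' architecture\n"
  )

  try:
      etree.XML(output)
  except etree.XMLSyntaxError as exc:
      print(exc)  # Start tag expected, '<' not found, line 1, column 1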

Calling out to "virsh domcapabilities --virttype kvm /usr/bin/qemu-system-x86_64" worked for me, and afterwards, even just "virsh domcapabilities --virttype kvm" started to work.

It turns out the problem was that after installing the qemu-kvm package, one needs to restart the libvirt-bin service so that it realizes the x86_64 emulator is present:

  sudo systemctl restart libvirt-bin.service

After that, everything worked fine too.

We should:

 1. Fix get_domain_capabilites() to check the exit code of the virsh call (it is 1 for me, so it does indicate failure) and not attempt to parse the output as XML in case of an error (see the sketch below)

 2. Surface the error from the virsh call instead of the XML parsing error

 3. Perhaps suggest a workaround to the user (install the qemu-kvm package and restart the libvirt-bin service), even though they might not be an administrator of the pod

 4. Maybe even file a bug against e.g. the qemu-kvm package (to restart libvirt-bin itself) or libvirtd (to watch for newly appearing emulators with inotify or whatever the latest FS monitoring solution is)

Alternatively, we could just use the detected architecture ourselves and try to find the emulator, to "teach" the libvirt-bin service about its presence if the package is already installed, though I am not sure whether this would work well with a remote virsh connection.
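
A rough sketch of what points 1 and 2 could look like, using subprocess purely for illustration (the real driver drives virsh through a persistent session, and VirshError here is a made-up exception):

  import subprocess

  from lxml import etree

  class VirshError(Exception):
      """Hypothetical exception carrying virsh's own error text."""

  def get_domain_capabilities(virttype="kvm"):
      proc = subprocess.run(
          ["virsh", "domcapabilities", "--virttype", virttype],
          capture_output=True, text=True,
      )
      # Point 1: check the exit code (1 in the failure above) instead
      # of blindly parsing whatever came back.
      if proc.returncode != 0:
          # Point 2: surface virsh's error, not an XML parsing error.
          raise VirshError(proc.stderr.strip() or "virsh call failed")
      return etree.XML(proc.stdout)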

To reproduce (probably also possible with just the qemu-system-x86 package instead of qemu-kvm):

 sudo apt remove -y qemu-kvm; sudo apt autoremove -y; sudo apt install -y qemu-kvm; sudo systemctl restart libvirt-bin.service
 virsh domcapabilities --virttype kvm
 maas maas-connection pod compose POD-ID (get it using "maas maas-connection pods read")

All the other places in the virsh pod driver seem to at least check for the XML output being None (though in this case I get "\n" on stdout, not an empty string) before attempting to parse it as XML with etree.XML().
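
That guard, in hypothetical form, also shows why it would not have helped here:

  from lxml import etree

  xml = "\n"  # what the failing call actually left on stdout
  if xml is None:
      capabilities = None
  else:
      # Still raises etree.XMLSyntaxError: the output is not None,
      # it is just not XML.
      capabilities = etree.XML(xml)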

Changed in maas:
importance: Undecided → Medium
milestone: none → 2.2.0rc5
status: New → Triaged
Changed in maas:
assignee: nobody → Newell Jensen (newell-jensen)
Changed in maas:
status: Triaged → In Progress
Changed in maas:
milestone: 2.2.0rc5 → 2.2.1
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
Changed in maas:
status: Fix Released → Fix Committed
Revision history for this message
Marian Gasparovic (marosg) wrote :

We encountered this bug (or at least something that looks like it) three times while testing 2.8.5-deb.

After we redeployed the machine, it worked again.

Here is one of the runs:

https://oil-jenkins.canonical.com/artifacts/a1657646-db57-49fd-9d8d-df28c777f653/index.html

Changed in maas:
status: Fix Committed → New
tags: added: cdo-qa cdo-release-blocker
Revision history for this message
Marian Gasparovic (marosg) wrote :

Observation: once we hit this issue, all subsequent runs hit it too (we don't restart or redeploy infra nodes between runs).
After rebooting all infra nodes and running the test again, it passed this critical point.

Alberto Donato (ack)
Changed in maas:
assignee: Newell Jensen (newell-jensen) → nobody
Alberto Donato (ack)
Changed in maas:
milestone: 2.2.1 → none
Alberto Donato (ack)
Changed in maas:
status: New → Triaged
Revision history for this message
Jerzy Husakowski (jhusakowski) wrote :

Is this issue reproducible on MAAS 3.2 or later?

Changed in maas:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired