arm64 - services not running that should be - missing capabilities

Bug #1891203 reported by Andrew McLeod
This bug affects 1 person
Affects                        Status        Importance  Assigned to    Milestone
OpenStack Nova Compute Charm   New           Undecided   Unassigned
qemu (Ubuntu)                  Fix Released  Undecided   Unassigned
  Focal                        Fix Released  Undecided   Andrew McLeod

Bug Description

[Impact]

 * Libvirt (and other management software on top of qemu/kvm) probes
   the capabilities of the current platform. With the current builds this
   probing breaks on arm, which makes it impossible to use KVM on this
   platform through the management stack.

 * The fix ensures that probing with -machine none won't break on
   platforms with different -cpu support (differs per platform)

[Test Case]

 * We will run the regression tests to cover any implications for non-arm
   platforms.
 * The OpenStack team will re-run the tests that uncovered the issue to
   confirm that execution on arm is no longer affected by the bug (and
   shows no new issues).

[Regression Potential]

 * The fix only affects "-machine none" cases. These are used almost
   exclusively for capability probing, and that probing was formerly
   broken. Therefore there should not be any regressions. If there are
   any, they would show up on non-arm machines in that probing.
   Any reboot of a machine running libvirt re-probes capabilities and
   would reveal such a problem, so we should be safe once testing passes.

[Other Info]

 * n/a

---

bionic ussuri - arm64 (aarch64)

QEMU emulator version 4.2.0 (Debian 1:4.2-3ubuntu6.3~cloud0)

Error in log is: nova.exception.InternalError: Nova requires QEMU version 2.11.0 or greater

This error may be a red herring as this particular version of nova does not display capabilities for aarch64, only armv6l

https://pastebin.ubuntu.com/p/JmPsDSwHBv/

If I downgrade to the version in the train cloud-archive (1:4.0+dfsg-0ubuntu9.8~cloud0), the nova-compute service starts. This problem is not present on focal, which also has version 1:4.2-3ubuntu6.3.

Christian Ehrhardt  (paelzer) wrote :

carry over from IRC

<admcleod> just putting this here for tomorrow: https://bugs.launchpad.net/charm-nova-compute/+bug/1891203
<cpaelzer> looked at the bug - doesn't ring a known bell
<cpaelzer> maybe a dependency change
<cpaelzer> so this is Ussuri - so we are talking about the Focal backported to Bionic here right?
<admcleod> bionic ussuri, right - focal ussuri works fine with that same version
<cpaelzer> also if you could compare dpkg -l 'qemu*' 'libvirt*' on the working focal-ussuri vs bionuc-ussuri
<admcleod> will do
<cpaelzer> I'm concerned if just an extra qemu-something-arm might be missing
<admcleod> heh right
<cpaelzer> and for arm that also includes packages like ovmf or such
<cpaelzer> what else then ovmf might come to ming ...
<cpaelzer> mind
<cpaelzer> maybe just be brutal and Diff full `dpkg -l` among the systemd
<cpaelzer> systems
<cpaelzer> in particular
<cpaelzer> your armv6l has this
<cpaelzer>  <emulator>/usr/bin/qemu-system-arm</emulator>
<cpaelzer> whatever the good system has as emulator for aarch64 really needs to be there
<cpaelzer> also I've seen the capability prober processes fail, so if nothing of the above helps check the logs of libvirt for failing qemu processes
<admcleod> right if i downgrade to prev version
<admcleod> i do have <guest>
<admcleod>  <os_type>hvm</os_type>
<admcleod>  <arch name='armv7l'>
<admcleod>  <wordsize>32</wordsize>
<admcleod>  <emulator>/usr/bin/qemu-system-aarch64</emulator>
<admcleod> bionic has libvirt 6.0.0-0ubuntu8.2 and focal 6.0.0-0ubuntu8.3
<admcleod> nothing else obvious w qemu or libvirt packages
<cpaelzer> hmm
<cpaelzer> there were a few things I mentioned to james that need to be removed/reverted on backport
<cpaelzer> I mostly send these mails to allow my brain forgetting about them
* cpaelzer tries to search what I have mentioned there
<admcleod> yeah updating libvirt to that latest (...8.3) didnt work
<admcleod> there are no libvirt/qemu logs either
<admcleod> come tot hink about it, it feels as if there is a binary which reports the version and that aarch64 that is missing somehow
<admcleod> that how it 'feels'
<admcleod> .....without understanding how it 'works'
<cpaelzer> on startup libvirt probes for binaries of qemu
<cpaelzer> then it calls all...

Christian Ehrhardt  (paelzer) wrote :

In further discussions and checks Andrew and I found:

Thread 1 "qemu-system-aar" received signal SIGSEGV, Segmentation fault.
strlen () at ../sysdeps/aarch64/strlen.S:94
94 ../sysdeps/aarch64/strlen.S: No such file or directory.

(gdb) bt
#0 strlen () at ../sysdeps/aarch64/strlen.S:94
#1 0x0000aaaaab04fd84 in qmp_query_cpu_model_expansion (type=<optimized out>, model=0xaaaaabcd91b0, errp=errp@entry=0xffffffffee60) at ./target/arm/monitor.c:140
#2 0x0000aaaaab0300a0 in qmp_marshal_query_cpu_model_expansion (args=<optimized out>, ret=0xffffffffef10, errp=0xffffffffef08) at ./b/qemu/qapi/qapi-commands-machine-target.c:183
#3 0x0000aaaaab412ba0 in do_qmp_dispatch (errp=0xffffffffef60, allow_oob=<optimized out>, request=<optimized out>, cmds=0xaaaaabcd92f0) at ./qapi/qmp-dispatch.c:132
#4 qmp_dispatch (cmds=0xaaaaabcd92f0, request=0xffffec006c70, allow_oob=<optimized out>) at ./qapi/qmp-dispatch.c:175
#5 0x0000aaaaab306850 in monitor_qmp_dispatch (mon=0xaaaaabcd91d0, req=<optimized out>) at ./monitor/qmp.c:145
#6 0x0000aaaaab307088 in monitor_qmp_bh_dispatcher (data=<optimized out>) at ./monitor/qmp.c:234
#7 0x0000aaaaab45eb00 in aio_bh_call (bh=0xaaaaabbf32c0) at ./util/async.c:89
#8 aio_bh_poll (ctx=ctx@entry=0xaaaaabbf1e80) at ./util/async.c:117
#9 0x0000aaaaab46243c in aio_dispatch (ctx=0xaaaaabbf1e80) at ./util/aio-posix.c:459
#10 0x0000aaaaab45e9d8 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ./util/async.c:268
#11 0x0000fffff7d1bc30 in g_main_context_dispatch () from /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0
#12 0x0000aaaaab46147c in glib_pollfds_poll () at ./util/main-loop.c:219
#13 os_host_main_loop_wait (timeout=187650001905680) at ./util/main-loop.c:242
#14 main_loop_wait (nonblocking=<optimized out>) at ./util/main-loop.c:518
#15 0x0000aaaaab113330 in main_loop () at ./vl.c:1810
#16 0x0000aaaaaaed4bd0 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ./vl.c:4492
(gdb)

This is triggered by capability probing such as:
$ nc 127.0.0.1 4444
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "Debian 1:4.2-3ubuntu6.3~cloud0"}, "capabilities": ["oob"]}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{"execute":"query-cpu-model-expansion","arguments":{"type":"full","model":{"name":"host"}}}

So a NULL pointer is passed to strlen() and QEMU crashes.

I'm not 100% sure that this "is it", but it is definitely worth taking a look.

@Andrew - with debug symbols and backtrace it already makes more sense :-)

(gdb) frame 1
#1 0x0000aaaaab04fd84 in qmp_query_cpu_model_expansion (type=<optimized out>, model=0xaaaaabcd91b0, errp=errp@entry=0xffffffffee60) at ./target/arm/monitor.c:140
140 int len = strlen(cpu_type) - strlen(ARM_CPU_TYPE_SUFFIX);
(gdb) p cpu_type
$1 = 0x0

cpu_type is taken from current_machine->cpu_type, which is NULL here.

(gdb) p *current_machine
$3 = {parent_obj = {class = 0xaaaaabc1aa30, free = 0xfffff7d21680 <g_free>, properties = 0xaaaaabcc0cc0, ref = 2, parent = 0xaaaaabcc33d0}, sysbus_notifier = {
    notify = 0xaaaaab18f320 <machine_init_notify>, node = {le_next = 0xaaaaabadaf20 <chardev_machine_done_notify>, le_prev = 0xaaaaabb17558 <machine_init_don...

Christian Ehrhardt  (paelzer) wrote :

Maybe
https://git.qemu.org/?p=qemu.git;a=commit;h=0999a4ba8718aa96105b978d3567fc7e90244c7e

I don't yet see why you see this only on Bionic-ussuri and not on Focal-ussuri, though?!

Christian Ehrhardt  (paelzer) wrote :

Maybe what I found is a valid issue present in all qemu 4.2 builds we have, but unrelated to your initial struggle to get bionic-ussuri running?

Could you please apply that patch on top of your B-ussuri builds and check whether everything then magically works? Or did we only fix this issue, but not the "other" one that keeps your system from recognizing aarch64 as a valid arch to run KVM on?

Changed in qemu (Ubuntu):
status: New → Incomplete
Christian Ehrhardt  (paelzer) wrote :

Added a (so far incomplete) qemu task for tracking until we know whether all 4.2 builds are affected, and in which way.

Andrew McLeod (admcleod) wrote :

Corey rebuilt qemu with the suggested patch. I've tested it on one machine and it worked. I am going to retest on a fresh deployment, but it looks good.

ppa:corey.bryant/bionic-ussuri

1:4.2-3ubuntu6.3~cloud1~ubuntu18.04.1~ppa202008131000

This patch is already in 4.2.1

Christian Ehrhardt  (paelzer) wrote :

I had already planned to work on the stable qemu, but only in a bit (closing out things for the groovy feature freeze first).

This will be part of a focal SRU then. Feel free to carry the patch in the cloud archive now if you want; you can drop it later when rebasing to the latest version.

Changed in qemu (Ubuntu):
status: Incomplete → Triaged
description: updated
Changed in qemu (Ubuntu Focal):
status: New → Triaged
Changed in qemu (Ubuntu):
status: Triaged → Fix Released
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Andrew, or anyone else affected,

Accepted qemu into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:4.2-3ubuntu6.5)

All autopkgtests for the newly accepted qemu (1:4.2-3ubuntu6.5) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

ubuntu-image/1.9+20.04ubuntu1 (amd64)
systemd/245.4-4ubuntu3.2 (amd64, armhf, s390x, ppc64el)
livecd-rootfs/2.664.4 (amd64, arm64, s390x, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Christian Ehrhardt  (paelzer) wrote :

I've talked to Andrew, he will give this verification a shot on the hardware the issue was first seen.

Changed in qemu (Ubuntu Focal):
assignee: nobody → Andrew McLeod (admcleod)
Andrew McLeod (admcleod) wrote :

I have finally verified this on focal: I am able to launch nova instances with 1:4.2-3ubuntu6.5.

Christian Ehrhardt  (paelzer) wrote :

Thank you Andrew, setting tags.

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.2-3ubuntu6.5

---------------
qemu (1:4.2-3ubuntu6.5) focal; urgency=medium

  * further stabilize qemu by importing patches of qemu v4.2.1
    Fixes (LP: #1891203) and (LP: #1891877)
    - d/p/stable/lp-1891877-*
    - as part of the stabilization this also fixes a
      riscv emulation issue due to the CVE-2020-13754 fixes via
      d/p/ubuntu/hw-riscv-Allow-64-bit-access-to-SiFive-CLINT.patch
  * fix s390x SQXBR emulation (LP: #1883984)
    - d/p/ubuntu/lp-1883984-target-s390x-Fix-SQXBR.patch
  * fix -no-reboot for s390x protvirt guests (LP: #1890154)
    - d/p/ubuntu/lp-1890154-s390x-protvirt-allow-to-IPL-secure-guests-with-*

 -- Christian Ehrhardt <email address hidden> Wed, 19 Aug 2020 13:40:49 +0200

Changed in qemu (Ubuntu Focal):
status: Fix Committed → Fix Released