Ubuntu18.04:POWER9:DD2.2 - Unable to start a KVM guest with default machine type(pseries-bionic) complaining "KVM implementation does not support Transactional Memory, try cap-htm=off" (kvm)

Bug #1752026 reported by bugproxy
This bug affects 2 people
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Fix Released
Critical
Canonical Server
linux (Ubuntu)
Fix Released
Critical
Seth Forshee
qemu (Ubuntu)
Fix Released
Critical
Canonical Server

Bug Description

== Comment: #0 - Satheesh Rajendran <email address hidden> - 2018-02-23 08:31:06 ==
---Problem Description---
libvirt is unable to start a KVM guest; QEMU complains that the cap-htm machine property must be set to off

Host Env:
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 160
On-line CPU(s) list: 0-159
Thread(s) per core: 4
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Model: 2.2 (pvr 004e 1202)
Model name: POWER9 (raw), altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2166.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-79
NUMA node8 CPU(s): 80-159

ii qemu-kvm 1:2.11+dfsg-1ubuntu2 ppc64el QEMU Full virtualization on x86 hardware

ii libvirt-bin 4.0.0-1ubuntu3 ppc64el programs for the libvirt library

# lsmcode
Version of System Firmware :
 Product Name : OpenPOWER Firmware
 Product Version : open-power-SUPERMICRO-P9DSU-V1.03-20180205-imp
 Product Extra : occ-577915f
 Product Extra : skiboot-v5.9-240-g081882690163-pcbedce4
 Product Extra : petitboot-v1.6.6-p019c87e
 Product Extra : sbe-095e608
 Product Extra : machine-xml-fb5f933
 Product Extra : hostboot-9bfb201
 Product Extra : linux-4.14.13-openpower1-p78d7eee

Contact Information = <email address hidden>

---uname output---
4.15.0-10-generic

Machine Type = power9 boston 2.2 (pvr 004e 1202)

---Debugger---
A debugger is not configured

---Steps to Reproduce---
 1. Boot a guest from libvirt with default pseries machine type or pseries-bionic

/usr/bin/virt-install --connect=qemu:///system --hvm --accelerate --name 'virt-tests-vm1' --machine pseries --memory=32768 --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics --serial pty --memballoon model=virtio --controller type=scsi,model=virtio-scsi --disk path=/var/lib/libvirt/images/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/ubuntu-18.04-ppc64le.qcow2,bus=scsi,size=10,format=qcow2 --network=bridge=virbr0,model=virtio,mac=52:54:00:77:78:79 --noautoconsole
WARNING No operating system detected, VM performance may suffer. Specify an OS with --os-variant for optimal results.

Starting install...
ERROR internal error: process exited while connecting to monitor: ,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:77:78:79,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
2018-02-23T14:21:11.081809Z qemu-system-ppc64: KVM implementation does not support Transactional Memory, try cap-htm=off
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
  virsh --connect qemu:///system start virt-tests-vm1
otherwise, please restart your installation.

2. Fails to boot..

Note: if we specify the machine type as pseries-2.12, it boots fine, as shown below

/usr/bin/virt-install --connect=qemu:///system --hvm --accelerate --name 'virt-tests-vm1' --machine pseries-2.12 --memory=32768 --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics --serial pty --memballoon model=virtio --controller type=scsi,model=virtio-scsi --disk path=/var/lib/libvirt/images/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/ubuntu-18.04-ppc64le.qcow2,bus=scsi,size=10,format=qcow2 --network=bridge=virbr0,model=virtio,mac=52:54:00:77:78:79 --noautoconsole
WARNING No operating system detected, VM performance may suffer. Specify an OS with --os-variant for optimal results.

qemu-cmd line:

libvirt+ 4283 1 99 09:26 ? 00:00:38 qemu-system-ppc64 -enable-kvm -name guest=virt-tests-vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-virt-tests-vm1/master-key.aes -machine pseries-2.12,accel=kvm,usb=off,dump-guest-core=off -m 32768 -realtime mlock=off -smp 32,sockets=1,cores=32,threads=1 -uuid 108ac2b5-e8b2-4399-a925-a707e8020871 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-4-virt-tests-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -drive file=/var/lib/libvirt/images/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/ubuntu-18.04-ppc64le.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:77:78:79,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on

Userspace tool common name: ii libvirt-bin 4.0.0-1ubuntu3 ppc64el programs for the libvirt library

The userspace tool has the following bit modes: both

Userspace rpm: ii libvirt-bin 4.0.0-1ubuntu3 ppc64el programs for the libvirt library

Userspace tool obtained from project website: na

*Additional Instructions for <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach ltrace and strace of userspace application.

== Comment: #1 - Satheesh Rajendran <email address hidden> - 2018-02-23 08:35:17 ==
vm qemu log for failed and passed cases:

Failed:(pseries-bionic)
2018-02-23 14:21:10.806+0000: starting up libvirt version: 4.0.0, package: 1ubuntu3 (Christian Ehrhardt <email address hidden> Mon, 19 Feb 2018 14:18:44 +0100), qemu version: 2.11.1(Debian 1:2.11+dfsg-1ubuntu2), hostname: ltc-boston8.aus.stglabs.ibm.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name guest=virt-tests-vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-virt-tests-vm1/master-key.aes -machine pseries-bionic,accel=kvm,usb=off,dump-guest-core=off -m 32768 -realtime mlock=off -smp 32,sockets=1,cores=32,threads=1 -uuid 36c37d3b-fb24-4350-94f9-3271b257f75c -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-virt-tests-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -drive file=/var/lib/libvirt/images/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/ubuntu-18.04-ppc64le.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:77:78:79,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
2018-02-23T14:21:10.909242Z qemu-system-ppc64: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0)
2018-02-23T14:21:11.081809Z qemu-system-ppc64: KVM implementation does not support Transactional Memory, try cap-htm=off
2018-02-23 14:21:18.857+0000: shutting down, reason=failed

Passed:(pseries-2.12)
2018-02-23 14:26:07.047+0000: starting up libvirt version: 4.0.0, package: 1ubuntu3 (Christian Ehrhardt <email address hidden> Mon, 19 Feb 2018 14:18:44 +0100), qemu version: 2.11.1(Debian 1:2.11+dfsg-1ubuntu2), hostname: ltc-boston8.aus.stglabs.ibm.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name guest=virt-tests-vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-virt-tests-vm1/master-key.aes -machine pseries-2.12,accel=kvm,usb=off,dump-guest-core=off -m 32768 -realtime mlock=off -smp 32,sockets=1,cores=32,threads=1 -uuid 108ac2b5-e8b2-4399-a925-a707e8020871 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-4-virt-tests-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -drive file=/var/lib/libvirt/images/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/ubuntu-18.04-ppc64le.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:77:78:79,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
2018-02-23T14:26:07.116991Z qemu-system-ppc64: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0)

Regards,
-Satheesh

== Comment: #8 - VIPIN K. PARASHAR <email address hidden> - 2018-02-25 23:38:29 ==
Starting install...
ERROR internal error: process exited while connecting to monitor: ,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:77:78:79,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
2018-02-23T14:21:11.081809Z qemu-system-ppc64: KVM implementation does not support Transactional Memory, try cap-htm=off
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
  virsh --connect qemu:///system start virt-tests-vm1
otherwise, please restart your installation.

As per above message, qemu is reporting TM to be not supported by KVM on this hardware
and thus recommending to turn off cap-htm.

== Comment: #12 - Suraj Jitindar Singh <email address hidden> - 2018-02-26 23:35:02 ==
I don't know what a pseries-bionic is, there is no reference to it upstream.

What you are seeing is expected behaviour as far as I can tell. POWER9 currently does not support HTM for a guest and thus it must not be turned on, otherwise qemu will fail to start.

HTM can be disabled from the qemu command line by setting cap-htm=off, as stated in the error message. The pseries-2.12 machine type has htm disabled by default and thus with that machine type there is no requirement to set cap-htm=off on the command line to get qemu to start.

So depending on what machine pseries-bionic is based on it will be required to disable htm on the command line (with cap-htm=off) if it is not disabled by default for the machine.
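
As a rough illustration of the override described above, the value for the -machine option could be assembled as follows. This is only a sketch: the helper name machine_arg is hypothetical, and it assumes a QEMU build whose pseries machine accepts the cap-htm property, as the error message suggests.

```shell
#!/bin/sh
# Hypothetical helper: build the value for QEMU's -machine option,
# optionally appending an explicit cap-htm setting.
# $1 = machine type (e.g. pseries-bionic)
# $2 = on | off | "" (empty string keeps the machine type's default)
machine_arg() {
    mtype=$1
    htm=$2
    case "$htm" in
        on|off) printf '%s,accel=kvm,cap-htm=%s\n' "$mtype" "$htm" ;;
        "")     printf '%s,accel=kvm\n' "$mtype" ;;
        *)      echo "invalid cap-htm value: $htm" >&2; return 1 ;;
    esac
}

# Print the option as it would be passed to qemu-system-ppc64.
echo "-machine $(machine_arg pseries-bionic off)"
```

With libvirt in the picture the option would have to reach QEMU some other way, since (as noted later in this bug) libvirt does not know about cap-htm yet.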

== Comment: #13 - Satheesh Rajendran <email address hidden> - 2018-02-27 00:05:12 ==
Had a chat with Suraj and here is the summary

1. Currently the host kernel on Power9 DD2.2 does not support HTM, so the guest should be booted with cap-htm=off. It looks like a host kernel patch rework is in progress --> initial patch: https://www.spinics.net/lists/kvm-ppc/msg13378.html
2. Libvirt does not know about cap-htm yet and currently does not set any default value for it?
3. pseries-2.12 has cap-htm=off by default, but the older machine types do not, which is why the guest boots from libvirt with pseries-2.12.
4. Once 1 is fixed, we can boot with cap-htm=on; I guess by that time pseries-2.12 will be changed to cap-htm=on by default.

---> 3. An immediate fix could be for Canonical to default their machine type (pseries-bionic) to pseries-2.12...
---> In the future, 1 and 4 are to be addressed; not sure about 2?

Needs a mirror to Canonical to address this.

Regards,
-Satheesh

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-165081 severity-critical targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → qemu (Ubuntu)
no longer affects: qemu
Changed in ubuntu-power-systems:
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Hi, the long description in this bug is a little confusing. Some of the comments appear to suggest that the required support is not yet upstream.

Could the reporter confirm whether there are any current upstream patches that require backporting urgently (the bug is marked as critical)?

Or whether further upstream work is required before backporting should begin.

Thanks, Andy.

Changed in ubuntu-power-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → Canonical Server Team (canonical-server)
assignee: Canonical Server Team (canonical-server) → nobody
status: New → Incomplete
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
Thanks Andrew - I agree in general.
The following is based on the assumption that the linked discussion (kernel change) is not upstream yet.
Any clarification on that will help though.

OTOH I want to start the discussion on the options we have early on.

I have seen the pseries-2.12 changes in the qemu 2.11.1 stable release (didn't like them).
Things like the one you mentioned ("... I guess by that time pseries-2.12 to be changed to cap-htm=on by default") are exactly the reason I can't pick a 2.12 type until 2.12 is final and released.

We never can allow a case where pseries-2.12 != pseries-2.12 (for migrations and such).

So at the moment the default pseries-bionic is based on 2.11 being the usual default of qemu 2.11 and the one that is meant to be (and stay) stable.

So on the proposed change "3. Immediate fix can be Canonical defaults their machine type (pseries-bionic) to pseries-2.12" I'm reluctant to do so, as:
  - only pseries would be 2.12
  - there is a high chance we end up with 2.12 != 2.12 down the road

Suggestion #1:
If you (=IBM as the authoritative entity for Power) decide that you want htm to be off in the 2.11 machine type in the Ubuntu 18.04 (=Bionic) release we can do that (as Bionic is not released yet we can still change it).

But that would stay for the entire time of the Bionic release.
So pseries-bionic (the default) => pseries-2.11 (+htm off) will be the default until year 2023

Once (if) the host kernel at some point supports htm properly you can surely change the 2.12 type upstream, we would pick that up and later releases will default to a htm on case then.
Also people could run Bionic (which sets htm=off by default then) and run if needed with a htm=on override.

But even all that would mean that e.g. a new qemu from the Ubuntu cloud archive in a year, would fail the same on a 18.04 base kernel.

The real fix is to get that host support upstream (kernel) and get it in the Ubuntu kernel prior to the release of 18.04 - is that a realistic timeline, when do you expect this gets upstream?

I hope those clarifications helped to see why I think just choosing the 2.12 type is no option.

Thereby Counter-proposing:
1. in qemu we can make default pseries-bionic => pseries-2.11 (+htm off) if you want.
   That makes things safe to use for now, but OTOH makes htm an opt-in feature on Ubuntu 18.04
   That would stay that way for the support time of 18.04
OR
2. You get the kernel fix upstream asap and Ubuntu integrates before release of 18.04
   Then qemu/libvirt as is would work on P9 DD 2.2+
   (until that happens you can test with an override to set htm=off)

But any decision between #1/#2 depends very much on:
- the expected timeline of your kernel changes
- your preference in regard to the htm feature
So it is up to you to clarify on that as Andrew pointed out.

tags: added: kernel-da-key
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-02-27 22:50 EDT-------
1.
If it's decided that we want to default to cap-htm=off, then that can be achieved by adding:
```smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;```
to the spapr-bionic machine class init

2.
What is the timeframe we are talking about here, a week/a month? It's hard to give a firm timeframe on the patches going upstream

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

For 1. I mostly agree, the default is currently off in code and in 2.11 there is this for backwards compatibility:
  smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_ON;

We would have to
- Move that "keep the old default" entry to 2.10 (to cover <=2.10)
  spapr_machine_2_10_class_options
   smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_ON;
- And we would set it explicit off in 2.11 (which is what pseries-bionic refers to)
  spapr_machine_2_11_class_options
   smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
- 2.12 we would not change IMHO, that might become whatever it becomes with 2.12 development
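
To make the proposed defaults easier to follow, here is a small lookup sketch. It is not QEMU code: the function name default_htm is hypothetical, it only covers the upstream pseries-2.x names plus pseries-bionic, and it simply restates the list above (2.10 and older keep HTM on, 2.11 and therefore pseries-bionic get it off, 2.12 is left open).

```shell
#!/bin/sh
# Hypothetical lookup: default cap-htm value per pseries machine type
# under the proposal above. pseries-bionic is treated as pseries-2.11.
default_htm() {
    case "$1" in
        pseries-bionic|pseries-2.11)   echo off ;;        # explicitly off
        pseries-2.[1-9]|pseries-2.10)  echo on ;;         # old default kept
        pseries-2.12)                  echo undecided ;;  # still in development
        *)                             echo unknown ;;
    esac
}

for m in pseries-2.10 pseries-bionic pseries-2.12; do
    printf '%s -> %s\n' "$m" "$(default_htm "$m")"
done
```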

For #2 - this is a bug fix so it does not fall under the Feature Freeze (tomorrow).
But I don't know how much lead time the kernel team needs.
Given that kernel fixes are involved this clearly needs a kernel task for them to know about - adding ...

@Kernel - please read the context - what is the last date you'd need to have this commit upstream by IBM to be able to pick it and still be in the initial 18.04 release kernel (not in -updates)?

@IBM - how about this approach:
A) We switch the default to HTM=off in qemu "now" (as soon as you ack this) to be safe
B) If you get the kernel fixes upstream fast enough for the kernel Team to pick up in time:
  B1) a fixed kernel will be pushed (before 18.04 release)
  B2) we unroll this change in qemu (before 18.04 release)

That way we would surely have something that "works" by default via (A) and if (B) is in time we can switch back to "working but with HTM enabled".
And if (B) is too late we will keep HTM disabled in the 2.11/Bionic machine type.

Changed in qemu (Ubuntu):
status: New → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-01 00:00 EDT-------
*** Bug 165240 has been marked as a duplicate of this bug. ***

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-01 01:22 EDT-------
Seems like the best option in my opinion

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Sorry, but to be sure is that a clear "yes please disable HTM by default in qemu on ppc64el for Ubuntu 18.04" ?

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-01 10:42 EDT-------
@paelzer yes, that is us agreeing with your plan
>A) We switch the default to HTM=off in qemu "now" (as soon as you ack this) to be safe
>B) If you get the kernel fixes upstream fast enough for the kernel Team to pick up in time:
>...

Sorry for the delay.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Incomplete → Confirmed
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-01 12:41 EDT-------
(In reply to comment #27)
> @paelzer yes, that is us agreeing with your plan

Does it mean that we are not going to have HTM support on KVM guests, even in POWER8?

tags: removed: bugnameltc-165081 kernel-da-key severity-critical triage-g
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-01 13:17 EDT-------
(In reply to comment #28)
> (In reply to comment #27)
> > @paelzer yes, that is us agreeing with your plan
>
> Does it mean that we are not going to have HTM support on KVM guests, even
> in POWER8?

you would use the 2.10 machine type for that, right?

tags: added: bugnameltc-165081 severity-critical
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-01 15:07 EDT-------
(In reply to comment #29)
> (In reply to comment #28)
> > (In reply to comment #27)
> > > @paelzer yes, that is us agreeing with your plan
> >
> > Does it mean that we are not going to have HTM support on KVM guests, even
> > in POWER8?
>
> you would use the 2.10 machine type for that, right?
Right. This is about the new machine type.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

To the mini-discussion above - yes it would be default off on P8 as well then.
But HTM can still be used by selecting an older machine type, or - even better - by using the new type with cap-htm=on.

Starting the fix in qemu early next week then (the one outlined as (A) in comment #4).

Changed in qemu (Ubuntu):
status: Incomplete → In Progress
Manoj Iyer (manjo)
Changed in qemu (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Server Team (canonical-server)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Manoj Iyer (manjo)
Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Critical
Changed in qemu (Ubuntu):
importance: Undecided → Critical
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Server Team (canonical-server)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: There is bug 1753826 which postponed the release/testing of this one a bit.
Currently in rebuild/test together.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Fix pushed to bionic-proposed; I'll track the migration once it has built and the tests have had some time to pass.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

BTW - tests on P8 are already good on my side, and since the request from IBM came for P9 I have to assume it will be good there. But e.g. cross release migration X->B and such I had tested explicitly to be sure.

That said, please be aware that this will be a remaining "itch" for you with the current solution.
If somebody had guests on pre-Xenial they will have HTM enabled by default.
If those users migrate to a P9 system on Bionic they will still carry the feature of HTM being enabled and run into the issue, but there is nothing we can do about it other than getting your kernel fix completed and integrated. I thought I'd make you aware, to be sure.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu4

---------------
qemu (1:2.11+dfsg-1ubuntu4) bionic; urgency=medium

  * d/p/ubuntu/define-ubuntu-machine-types.patch: Disable HTM feature for
    ppc64el in spapr to let the defaults not fail on Power9 HW (LP: #1752026).
  * d/p/ubuntu/lp1753826-memfd-fix-configure-test.patch: fix FTBFS with newer
    versions of glibc >=2.27 (LP: #1753826)

 -- Christian Ehrhardt <email address hidden> Mon, 05 Mar 2018 16:43:01 +0100

Changed in qemu (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-13 12:52 EDT-------
Paul's RFC patch to kernel, https://www.spinics.net/lists/kvm/msg165629.html

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Patch posted to lkml, but not yet accepted upstream.

Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Confirmed → Triaged
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-20 16:38 EDT-------
I went to https://www.spinics.net/lists/kvm/msg165629.html but it only has source code change for the problem.

Can the Linux team build a patch against the Ubuntu 18.04 kernel 4.15.0-12-generic for the test team to install? Thanks

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-21 07:12 EDT-------
The latest version of the patches has been posted and is available at https://patchwork.ozlabs.org/project/kvm-ppc/list/?series=35017. I will add a note when the series has been put in a maintainer's tree.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-21 12:20 EDT-------
Gustavo, would you please add this patch to the Ubuntu kernel you created with the trap/HMI patches?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-22 15:48 EDT-------
- Today, I set up Ubuntu KVM on my Boston system using the Ubuntu 18.04 daily build below

http://cdimages.ubuntu.com/ubuntu-server/daily/current/

- With today's build, the Transactional Memory error no longer appears when starting up a KVM guest. So the fix in this LTC bug must have made it into the Ubuntu daily build.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-03-23 01:53 EDT-------
(In reply to comment #43)
> - Today, I set up Ubuntu KVM on my Boston system using the Ubuntu 18.04 daily
> build below
>
> http://cdimages.ubuntu.com/ubuntu-server/daily/current/
>
> - With today's build, the Transactional Memory error no longer appears when
> starting up a KVM guest. So the fix in this LTC bug must have made it into
> the Ubuntu daily build.

Hi Nguyen,

Paul's patch series (https://patchwork.ozlabs.org/project/kvm-ppc/list/?series=35017) is yet to be merged into Linux master as of now {f36b7534b833 (HEAD -> master, upstream/master) Merge branch 'akpm' (patches from Andrew)}

Does the Ubuntu daily kernel include custom patches?

If it does not include custom patches, one reason why you no longer hit the issue could be that qemu now disables cap-htm by default, temporarily, until the above kernel patches are included. Please check whether that is the case.

You can confirm whether the TM patches are included by explicitly starting the qemu-kvm command with cap-htm=on.

Regards,
-Satheesh.
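
A possible way to script the check suggested above is to look for the error text from this bug in QEMU's stderr. This is only a sketch: htm_check is a hypothetical helper name, and the absence of the message only means this particular error did not appear, not that the kernel patches are proven to be present.

```shell
#!/bin/sh
# Hypothetical helper: read QEMU's stderr on stdin and report whether the
# host rejected HTM. The matched text is the error quoted in this bug.
htm_check() {
    if grep -q 'KVM implementation does not support Transactional Memory'; then
        echo "unsupported"
    else
        echo "no-htm-error"
    fi
}

# Example using the log line from this report:
printf '%s\n' '2018-02-23T14:21:11.081809Z qemu-system-ppc64: KVM implementation does not support Transactional Memory, try cap-htm=off' | htm_check
```

In practice the stderr of a guest started with cap-htm=on in its -machine option would be piped into the helper.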

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-02 19:22 EDT-------
The patches that make it possible to use HTM in guests running on POWER9 processors are now in the PowerPC kernel maintainer tree and will be requested to get merged into kernel 4.17:

681c617b7c42 KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state
87a11bb6a7f7 KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode
4bb3c7a0208f KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
7672691a08c8 powerpc/powernv: Provide a way to force a core into SMT4 mode
b5af4f279323 powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2
9bbf0b576d32 powerpc: Free up CPU feature bits on 64-bit machines
dd0efb3f11cc powerpc: Book E: Remove unused CPU_FTR_L2CSR bit
c0d64cf9fefd powerpc: Use feature bit for RTC presence rather than timebase presence

From Paul Mackerras: for a backport, we can probably avoid the feature bit rework, I hope, and just find two free CPU feature bits. If there aren't 2 free feature bits then let me know, we might be able to scavenge some that are only used on Book E or something.
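
Since backported commits get new hashes, one rough way to track which of the patches listed above have landed in a given tree is to compare commit subjects. The sketch below assumes two plain text files with one subject per line; the helper name missing_subjects is hypothetical, and an exact match will miss backports whose subject line was changed.

```shell
#!/bin/sh
# Hypothetical helper: print each wanted commit subject that does not
# appear (as an exact line) in the list of subjects from the tree.
# $1 = file with wanted subjects, $2 = file with subjects found in the tree
missing_subjects() {
    # grep exits non-zero when nothing is missing; that is not an error here.
    grep -F -x -v -f "$2" "$1" || true
}

# Small demonstration with two of the subjects from the list above.
wanted=$(mktemp)
have=$(mktemp)
printf '%s\n' \
    'KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9' \
    'powerpc/powernv: Provide a way to force a core into SMT4 mode' > "$wanted"
printf '%s\n' \
    'powerpc/powernv: Provide a way to force a core into SMT4 mode' > "$have"
missing_subjects "$wanted" "$have"
rm -f "$wanted" "$have"
```

The second file could be produced from the tree under test with something like `git log --format=%s`, limited to a suitable revision range.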

Frank Heimes (fheimes)
tags: added: kernel-da-key
Seth Forshee (sforshee)
Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → Seth Forshee (sforshee)
status: Incomplete → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → Fix Committed
Revision history for this message
Frank Heimes (fheimes) wrote :

Since the kernel team is now investigating the integration of the kernel patches into bionic (the kernel bits are already Fix Committed), we should plan to get the temporary workaround removed from qemu-kvm again.
It's hard to time it in a way that the kernel changes are rolled out and the qemu workaround is removed at the same time.
So this could lead to a short period in which the qemu mitigation is reverted but the kernel is not yet released, which would make you need the cap-htm=off workaround. But that way it would be much safer that both changes are available prior to the release of 18.04 Bionic.
Hence the question (to IBM) is whether it would be okay to plan ahead and get the qemu changes reverted now?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-04 04:18 EDT-------
(In reply to comment #48)
> Since the kernel team is now investigating the integration of the kernel
> patches into bionic (the kernel bits are already Fix Committed), we should
> plan to get the temporary workaround again removed from qemu-kvm.
> It's hard to time it in a way that the kernel changes are rolled-out and the
> qemu workaround got removed at the same time.
> So this could lead to a short time of the qemu mitigation being reverted,
> but the kernel not yet being released, which would make you need the
> cap-htm=off workaround. But that way it would be much safer that both
> changes are available prior to the release of 18.04 Bionic.
> Hence the question (to IBM) is if it would be okay to plan ahead and to get
> the qemu changes reverted back now?

Yes, we need the qemu workaround to be removed. I see all required kernel patches are in master-next of bionic. I understand that these two cannot be timed together, but if the qemu changes were reverted today, would we get the updated kernel and qemu in tomorrow's daily build? I just wanted to understand what the time window would be.

Revision history for this message
Seth Forshee (sforshee) wrote :

It's not possible to turn around a kernel that quickly. I intend to get a kernel with the fix uploaded to bionic-proposed today, but it takes a few days at minimum to get it built, tested, and promoted to the -release pocket.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Seth - that is fine, for the Kernel we only need to rely on "will be out before Bionic release" and that looks good - don't feel pushed.
The updates were about asking IBM "If we assume the kernel fixes will be there, should we remove the qemu mitigation (as we can't remove it after Bionic release date)".

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-04 08:50 EDT-------
Hi Frank,

First of all, thanks for caring about this bug and accepting the out-of-the-tree patches.

You are right, the motivation to include this patchset is to re-enable HTM in the KVM guests. So, we just need to schedule the fixes on both sides.

Here are some assumptions I have:

1) If you need to revert the qemu-kvm "HTM off" patches right now, We can survive, since we have a internal kernel that contains the fix, and we can use this custom kernel in the mean time. Not a big deal.

2) On the other side, I understand that the Final Freeze for Ubuntu will be in April 19th, so, we still have some time to release qemu, how long can we wait without affecting the time to we spend testing this package?

3) Releasing the fixed kernel prior to the fixed qemu package would be better than the other way around.

4) Can't we fix qemu now and keep it in proposed until the kernel is released?

Thanks!

------- Comment From <email address hidden> 2018-04-04 08:55 EDT-------
One other possibility could be to have the changes going into the kernel and, then, remove the workaround from QEMU. QEMU with the workaround should continue to work with the kernel with the proper changes. HTM will be disabled in the guests, which would not be needed anymore, but would not block a VM from running.

Anyway, I am OK either way. We need to make progress here and I understand this came in late. We need to have the fixes in the kernel and the workaround out of QEMU. If that means we will have a broken QEMU for a few days, that would be OK. We can continue our tests with previous versions of the kernel and QEMU until everything is settled in the bionic repositories, or disable HTM by hand when running tests.
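
Disabling HTM by hand, as mentioned above, could look roughly like the following when QEMU is started directly. The `cap-htm=off` machine property is the one named in the error message of this bug. The disk path, memory size and CPU count are placeholders and not taken from this report, and the `run` helper is a hypothetical dry-run wrapper that only prints the command unless DRY_RUN is set to something other than 1.

```shell
#!/bin/sh
# Sketch only: start a pseries guest with HTM switched off through the
# machine capability named in the error message ("try cap-htm=off").
DISK="${DISK:-/var/lib/libvirt/images/guest.qcow2}"   # hypothetical path
MACHINE_OPTS="pseries,accel=kvm,cap-htm=off"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command, and execute it only when
# DRY_RUN is not "1".
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Memory and vCPU values are arbitrary examples.
run qemu-system-ppc64 -machine "$MACHINE_OPTS" \
    -m 4096 -smp 4 -nographic \
    -drive "file=$DISK,format=qcow2,if=virtio"
```

Guests managed through libvirt would need the same machine property passed down to QEMU; how to do that depends on the libvirt version in use and is not covered in this thread.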

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Breno - I agree to #3, but since we have no hard ETA on the kernel I want to avoid punting qemu to the very last few days. History has taught me that something always happens or blocks, and if we miss GA we can't SRU the change, because the final pseries-bionic type has to stay in sync.

For #4 there is no good "keep in proposed" for the current Dev release.

I discussed with lagarcia and JFH on IRC once more:
...
[15:02] <lagarcia_> cpaelzer, I am OK with that. TBH, we can live with QEMU removing the workaround now or after the kernel has been changed.
[15:05] <cpaelzer> there was a bug update by breno 3 minutes ago
[15:05] * cpaelzer is reading
[15:06] <cpaelzer> oh I see, and your comment - mirrored both at once
[15:06] <cpaelzer> my concern is that if anything comes up late
[15:06] <cpaelzer> we might end up with the qemu change not reverted
[15:07] <cpaelzer> as after release it becomes an issue
[15:07] <cpaelzer> and will no more be removable
[15:07] <cpaelzer> for consistency of the pseries-bionic type
[15:07] <jfh1> cpaelzer, lagarcia: ah - just saw the ticket update and a reply from Breno ...
[15:08] <cpaelzer> jfh1: do we have anything like an expected date by the kernel Team?
[15:08] <cpaelzer> I'd not want to wait with qemu later than end of this week TBH
[15:08] <cpaelzer> history teached me not to try changing things last minute
[15:08] <jfh1> cpaelzer: I agree - there is always the option to pin a package to prevent it from getting updated
[15:09] <jfh1> that can be an option for those guys who still need to KVM on P9 ...
[15:09] <cpaelzer> jfh1: lagarcia_: so are we agreeing that I'll revert the avoidance in qemu now considering the various constraints?
[15:09] * cpaelzer is +1
[15:10] <jfh1> I think so ...
[15:12] <lagarcia_> cpaelzer, yep
[15:13] <lagarcia_> cpaelzer, when the patches reach bionic kernel, everything should work out of the box. Meanwhile, we can implement the workaround by hand.

With all that said, I'm including the revert of the current mitigation in the next qemu upload.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Revert will be handled via bug 1761175

Revision history for this message
Frank Heimes (fheimes) wrote :

@ lagarcia and Breno:
People/testers who still need the patched qemu can prevent it from being upgraded by pinning it, aka marking it as hold (https://help.ubuntu.com/community/PinningHowto).
Even in case of an accidental upgrade, it's easy to go back to that version again.
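
A minimal sketch of marking the package as hold with `apt-mark`, assuming the patched build is the one currently installed. The package list in PKGS is an assumption (check `dpkg -l 'qemu*'` for what is actually installed), and `run` is a hypothetical dry-run helper that only prints each command unless DRY_RUN is set to something other than 1.

```shell
#!/bin/sh
# Sketch only: keep the installed (patched) qemu packages from being upgraded.
PKGS="${PKGS:-qemu-kvm qemu-system-ppc}"   # assumed package names
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command, and execute it only when
# DRY_RUN is not "1" (the real commands need root).
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# $PKGS is intentionally unquoted so each package becomes its own argument.
run apt-mark hold $PKGS      # mark the packages as held
run apt-mark showhold        # list everything currently on hold

# Once the fixed kernel has landed, release the hold again:
# run apt-mark unhold $PKGS
```

After an accidental upgrade, a specific older build can usually be reinstalled with `apt-get install <package>=<version>`, provided that version is still available from a configured archive; the exact version string of the patched qemu is not given in this thread.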

Frank Heimes (fheimes)
tags: added: triage-g
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-10 06:53 EDT-------
We got the qemu build with htm-on today [April 10th]. We are now able to start compat mode guests with htm-on. Apart from bug 166570, things look good. We can close this one.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-15.16

---------------
linux (4.15.0-15.16) bionic; urgency=medium

  * linux: 4.15.0-15.16 -proposed tracker (LP: #1761177)

  * FFe: Enable configuring resume offset via sysfs (LP: #1760106)
    - PM / hibernate: Make passing hibernate offsets more friendly

  * /dev/bcache/by-uuid links not created after reboot (LP: #1729145)
    - SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent

  * Ubuntu18.04:POWER9:DD2.2 - Unable to start a KVM guest with default machine
    type(pseries-bionic) complaining "KVM implementation does not support
    Transactional Memory, try cap-htm=off" (kvm) (LP: #1752026)
    - powerpc: Use feature bit for RTC presence rather than timebase presence
    - powerpc: Book E: Remove unused CPU_FTR_L2CSR bit
    - powerpc: Free up CPU feature bits on 64-bit machines
    - powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2
    - powerpc/powernv: Provide a way to force a core into SMT4 mode
    - KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
    - KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode
    - KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state

  * Important Kernel fixes to be backported for Power9 (kvm) (LP: #1758910)
    - powerpc/mm: Fixup tlbie vs store ordering issue on POWER9

  * Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
    namespaces (Bolt / NVMe) (LP: #1757497)
    - powerpc/64s: Fix lost pending interrupt due to race causing lost update to
      irq_happened

  * fwts-efi-runtime-dkms 18.03.00-0ubuntu1: fwts-efi-runtime-dkms kernel module
    failed to build (LP: #1760876)
    - [Packaging] include the retpoline extractor in the headers

linux (4.15.0-14.15) bionic; urgency=medium

  * linux: 4.15.0-14.15 -proposed tracker (LP: #1760678)

  * [Bionic] mlx4 ETH - mlnx_qos failed when set some TC to vendor
    (LP: #1758662)
    - net/mlx4_en: Change default QoS settings

  * AT_BASE_PLATFORM in AUXV is absent on kernels available on Ubuntu 17.10
    (LP: #1759312)
    - powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features

  * Bionic update to 4.15.15 stable release (LP: #1760585)
    - net: dsa: Fix dsa_is_user_port() test inversion
    - openvswitch: meter: fix the incorrect calculation of max delta_t
    - qed: Fix MPA unalign flow in case header is split across two packets.
    - tcp: purge write queue upon aborting the connection
    - qed: Fix non TCP packets should be dropped on iWARP ll2 connection
    - sysfs: symlink: export sysfs_create_link_nowarn()
    - net: phy: relax error checking when creating sysfs link netdev->phydev
    - devlink: Remove redundant free on error path
    - macvlan: filter out unsupported feature flags
    - net: ipv6: keep sk status consistent after datagram connect failure
    - ipv6: old_dport should be a __be16 in __ip6_datagram_connect()
    - ipv6: sr: fix NULL pointer dereference when setting encap source address
    - ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
    - mlxsw: spectrum_buffers: Set a minimum quota for CPU port traffic
    - net: phy: Tell caller result ...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Andy Whitcroft (apw)
tags: added: kernel-fixup-verification-needed-bionic
removed: verification-needed-bionic
Brad Figg (brad-figg)
tags: added: verification-needed-bionic
Revision history for this message
Andy Whitcroft (apw) wrote :

This bug was erroneously marked for verification in bionic; verification is not required and verification-needed-bionic is being removed.

tags: removed: verification-needed-bionic
tags: added: verification-done-bionic
Brad Figg (brad-figg)
tags: added: cscc