Regression: problems migrating recent wily/vivid Xen VMs due to memory hotplug fix

Bug #1542941 reported by Malcolm Scott
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Triaged
Medium
Unassigned
Vivid
Fix Released
Undecided
Unassigned
Wily
Fix Released
Undecided
Unassigned

Bug Description

Commit 633d6f17cd91ad5bf2370265946f716e42d388c6 (aka 38d30afb12140c0e3a446fe779dc9cd29548f313 in vivid) in Xen domU causes high resource requirements in the underlying target dom0 (migrating a 64-bit domU involves a 1GB malloc in dom0, as well as a lot of unnecessary work).

In my specific case, that 1GB malloc fails as my dom0s aren't big enough, causing all migrations to fail with the migrating VM suspended.

This is fixed in http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=98dd166ea3a3c3b57919e20d9b0d1237fcd0349d :

"""
x86/xen/p2m: hint at the last populated P2M entry"

With commit 633d6f17cd91ad5bf2370265946f716e42d388c6 (x86/xen: prepare
p2m list for memory hotplug) the P2M may be sized to accomdate a much
larger amount of memory than the domain currently has.

When saving a domain, the toolstack must scan all the P2M looking for
populated pages. This results in a performance regression due to the
unnecessary scanning.

Instead of reporting (via shared_info) the maximum possible size of
the P2M, hint at the last PFN which might be populated. This hint is
increased as new leaves are added to the P2M (in the expectation that
they will be used for populated entries).

Signed-off-by: David Vrabel <email address hidden>
Cc: <email address hidden> # 4.0+
"""

CVE References

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1542941

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: vivid
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Tim Gardner (timg-tpi) wrote :

ubuntu-vivid.git:
38d30afb12140c0e3a446fe779dc9cd29548f313 x86/xen: prepare p2m list for memory hotplug
git describe --contains 38d30afb12140c0e3a446fe779dc9cd29548f313
Ubuntu-3.19.0-17.17~133

ubuntu-wily.git:
633d6f17cd91ad5bf2370265946f716e42d388c6 x86/xen: prepare p2m list for memory hotplug
git describe --contains 633d6f17cd91ad5bf2370265946f716e42d388c6
v4.0-rc7~14^2~1

This patch has been in both kernels for quite awhile now.

Revision history for this message
Malcolm Scott (malcscott) wrote :

That's the patch which *introduces* the regression.

The missing one is 98dd166ea3a3c3b57919e20d9b0d1237fcd0349d.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Malcolm - perhaps you should correct the bug description. The subject for 98dd166ea3a3c3b57919e20d9b0d1237fcd0349d does not match upstream (should be 'x86/xen/p2m: hint at the last populated P2M entry'). nor does the commit log entry make sense.

tags: added: kernel-da-key
removed: vivid
Changed in linux (Ubuntu):
importance: Undecided → Medium
Revision history for this message
penalvch (penalvch) wrote :

Malcolm Scott, I've adjust the Bug Description as presumed intention. Please adjust as applicable.

tags: added: cherry-pick
tags: added: reverse-bisect-done
description: updated
description: updated
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Malcolm Scott (malcscott) wrote :

Apologies for the confusion; I had the right commit ID but pasted the wrong description. Now fixed, I think!

description: updated
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Vivid):
status: New → Fix Committed
Changed in linux (Ubuntu Wily):
status: New → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-vivid
tags: added: verification-needed-wily
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-done-wily
removed: verification-needed-wily
tags: added: verification-done-vivid
removed: verification-needed-vivid
Revision history for this message
Malcolm Scott (malcscott) wrote :

Verified fixed in:

wily-proposed: linux-image-4.2.0-32-generic:amd64 4.2.0-32.37
vivid-propsed: linux-image-3.19.0-53-generic:amd64 3.19.0-53.59

(Note: the vivid-proposed kernel does run into another bug on Xen migrate which can cause the VM to freeze, but that is unrelated and is not a regression in this version -- the same happens on the latest released vivid kernel. I'll file that bug separately once I've got to the bottom of it. _This_ bug is fixed in both vivid-proposed and wily-proposed.)

Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (7.7 KiB)

This bug was fixed in the package linux - 4.2.0-34.39

---------------
linux (4.2.0-34.39) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1555821

  [ Florian Westphal ]

  * SAUCE: [nf] netfilter: x_tables: check for size overflow
    - LP: #1555353
  * SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
    userspace
    - LP: #1555338

linux (4.2.0-33.38) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1554649

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon: call hpd_irq_event on resume"
    - LP: #1554608
  * cxl: Fix PSL timebase synchronization detection
    - LP: #1532914

linux (4.2.0-32.37) wily; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1550045

  [ Kamal Mostafa ]

  * Merged back Ubuntu-4.2.0-31.36

linux (4.2.0-31.36) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1548579

  [ Andy Whitcroft ]

  * [Debian] hv: hv_set_ifconfig -- convert to python3
    - LP: #1506521
  * [Debian] hv: hv_set_ifconfig -- switch to approved indentation
    - LP: #1540586
  * [Debian] hv: hv_set_ifconfig -- fix numerous parameter handling issues
    - LP: #1540586

  [ Carol L Soto ]

  * SAUCE: IB/IPoIB: Do not set skb truesize since using one linearskb
    - LP: #1541326

  [ Dan Streetman ]

  * SAUCE: nbd: ratelimit error msgs after socket close
    - LP: #1505564

  [ Tim Gardner ]

  * Revert "SAUCE: (noup) cxlflash: Fix to avoid virtual LUN failover
    failure"
    - LP: #1541635
  * Revert "SAUCE: (noup) cxlflash: Fix to escalate LINK_RESET also on port
    1"
    - LP: #1541635
  * [Config] ARMV8_DEPRECATED=y
    - LP: #1545542

  [ Upstream Kernel Changes ]

  * x86/xen/p2m: hint at the last populated P2M entry
    - LP: #1542941
  * mm: add dma_pool_zalloc() call to DMA API
    - LP: #1543737
  * sctp: Prevent soft lockup when sctp_accept() is called during a timeout
    event
    - LP: #1543737
  * xen-netback: respect user provided max_queues
    - LP: #1543737
  * xen-netfront: respect user provided max_queues
    - LP: #1543737
  * xen-netfront: update num_queues to real created
    - LP: #1543737
  * iio: adis_buffer: Fix out-of-bounds memory access
    - LP: #1543737
  * KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
    - LP: #1543737
  * KVM: PPC: Fix ONE_REG AltiVec support
    - LP: #1543737
  * x86/irq: Call chip->irq_set_affinity in proper context
    - LP: #1543737
  * drm/amdgpu: fix tonga smu resume
    - LP: #1543737
  * perf kvm record/report: 'unprocessable sample' error while
    recording/reporting guest data
    - LP: #1543737
  * hrtimer: Handle remaining time proper for TIME_LOW_RES
    - LP: #1543737
  * timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
    - LP: #1543737
  * posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
    - LP: #1543737
  * itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
    - LP: #1543737
  * drm/amdgpu: Use drm_calloc_large for VM page_tables array
    - LP: #1543737
  * drm/amdgpu: fix amdgpu_bo_pin_restricted VRAM placing v2
    - LP: #1543737
  * drm/radeon: properly byte swap vce firmware setup
    - LP: #1543737
  ...

Read more...

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (21.3 KiB)

This bug was fixed in the package linux - 3.19.0-56.62

---------------
linux (3.19.0-56.62) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1555832

  [ Florian Westphal ]

  * SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
    userspace
    - LP: #1555338

linux (3.19.0-55.61) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1554708

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon: call hpd_irq_event on resume"
    - LP: #1554608

linux (3.19.0-54.60) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1552337

  [ Upstream Kernel Changes ]

  * Revert "firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6"
    - LP: #1551419

linux (3.19.0-53.59) vivid; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1550576

  [ Kamal Mostafa ]

  * Merged back 3.19.0-52.58

linux (3.19.0-52.58) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1548548

  [ Dan Streetman ]

  * SAUCE: nbd: ratelimit error msgs after socket close
    - LP: #1505564

  [ Upstream Kernel Changes ]

  * Revert "ACPI / LPSS: allow to use specific PM domain during ->probe()"
    - LP: #1542457
  * Revert "workqueue: make sure delayed work run in local cpu"
    - LP: #1546320
  * net: ipmr: fix static mfc/dev leaks on table destruction
    - LP: #1542457
  * drm/nouveau/nv46: Change mc subdev oclass from nv44 to nv4c
    - LP: #1542457
  * ovl: allow zero size xattr
    - LP: #1542457
  * ovl: use a minimal buffer in ovl_copy_xattr
    - LP: #1542457
  * [media] vb2: fix a regression in poll() behavior for output,streams
    - LP: #1542457
  * [media] gspca: ov534/topro: prevent a division by 0
    - LP: #1542457
  * [media] media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
    - LP: #1542457
  * tools lib traceevent: Fix output of %llu for 64 bit values read on 32
    bit machines
    - LP: #1542457
  * KVM: x86: expose MSR_TSC_AUX to userspace
    - LP: #1542457
  * KVM: x86: correctly print #AC in traces
    - LP: #1542457
  * drm/radeon: call hpd_irq_event on resume
    - LP: #1542457
  * xhci: refuse loading if nousb is used
    - LP: #1542457
  * arm64: Clear out any singlestep state on a ptrace detach operation
    - LP: #1542457
  * time: Avoid signed overflow in timekeeping_get_ns()
    - LP: #1542457
  * ovl: root: copy attr
    - LP: #1542457
  * Bluetooth: Add support of Toshiba Broadcom based devices
    - LP: #1522949, #1542457
  * rtlwifi: fix memory leak for USB device
    - LP: #1542457
  * wlcore/wl12xx: spi: fix oops on firmware load
    - LP: #1542457
  * ovl: check dentry positiveness in ovl_cleanup_whiteouts()
    - LP: #1542457
  * EDAC, mc_sysfs: Fix freeing bus' name
    - LP: #1542457
  * EDAC: Robustify workqueues destruction
    - LP: #1542457
  * arm64: mm: ensure that the zero page is visible to the page table
    walker
    - LP: #1542457
  * powerpc: Make value-returning atomics fully ordered
    - LP: #1542457
  * powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
    - LP: #1542457
  * dm space map metadata: remove unused variable in brb_pop()
    - LP: #1542457
  * dm thi...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.