~kamalmostafa/ubuntu/+source/linux-aws/+git/bionic:aws-hibernation-lp1804533

Last commit made on 2018-11-21
Get this branch:
git clone -b aws-hibernation-lp1804533 https://git.launchpad.net/~kamalmostafa/ubuntu/+source/linux-aws/+git/bionic

Branch information

Name: aws-hibernation-lp1804533
Repository: lp:~kamalmostafa/ubuntu/+source/linux-aws/+git/bionic

Recent commits

ef893ba... by Aleksei Besogonov

UBUNTU: SAUCE [aws] PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA

BugLink: http://bugs.launchpad.net/bugs/1804533

The SNAPSHOT_SET_SWAP_AREA ioctl is supposed to set the hibernation
offset on a running kernel to enable hibernating to a swap file.
However, it doesn't actually update the swsusp_resume_block variable. As
a result, hibernation fails at the last step (after all the data is
written out), when mark_swapfiles() validates the swap signature.

Before this patch, the command line processing was the only place where
swsusp_resume_block was set.
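
The failure mode can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: `ToyHibernation`, `set_swap_area` and `hibernate_to_swap_file` are hypothetical names, the `mark_swapfiles` method merely borrows the kernel function's name for readability, and the logic is a simplification rather than the kernel code.

```python
class ToyHibernation:
    """Toy model: the signature is validated at swsusp_resume_block."""

    def __init__(self, cmdline_resume_offset=0):
        # Stand-in for swsusp_resume_block, initially set only from
        # command line processing.
        self.swsusp_resume_block = cmdline_resume_offset
        # Offsets that hold a valid swap header in this model (hypothetical).
        self.swap_headers = set()

    def set_swap_area(self, offset, update_resume_block):
        """Stand-in for the SNAPSHOT_SET_SWAP_AREA handler."""
        self.swap_headers.add(offset)
        if update_resume_block:
            # Behaviour with the patch applied.
            self.swsusp_resume_block = offset

    def mark_swapfiles(self):
        """Last step: check the swap signature at swsusp_resume_block."""
        return self.swsusp_resume_block in self.swap_headers


def hibernate_to_swap_file(offset, patched):
    """Return True if the final signature validation succeeds."""
    h = ToyHibernation(cmdline_resume_offset=0)
    h.set_swap_area(offset, update_resume_block=patched)
    return h.mark_swapfiles()
```

In this model, `hibernate_to_swap_file(34816, patched=False)` fails because the validation still looks at the command-line offset, while the patched variant succeeds. The offset value is arbitrary.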

Signed-off-by: Aleksei Besogonov <email address hidden>
Signed-off-by: Munehisa Kamata <email address hidden>
Signed-off-by: Anchal Agarwal <email address hidden>
Reviewed-by: Munehisa Kamata <email address hidden>
Reviewed-by: Eduardo Valentin <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>

a1be81a... by Munehisa Kamata

UBUNTU: SAUCE [aws] x86/xen: close event channels for PIRQs in system core suspend callback

BugLink: http://bugs.launchpad.net/bugs/1804533

Close event channels allocated for devices that are backed by PIRQs and
are still active when suspending the system core. Normally, these are
emulated legacy devices, e.g. a PS/2 keyboard or floppy controller.

Without this, in PM hibernation, information about the event channels
remains in the hibernation image, but there is no guarantee that the same
event channel numbers are assigned to the devices when the system is
restored. This may cause a conflict like the following and prevent some
devices from being restored correctly.

[ 102.330821] ------------[ cut here ]------------
[ 102.333264] WARNING: CPU: 0 PID: 2324 at
drivers/xen/events/events_base.c:878 bind_evtchn_to_irq+0x88/0xf0
...
[ 102.348057] Call Trace:
[ 102.348057] [<ffffffff813001df>] dump_stack+0x63/0x84
[ 102.348057] [<ffffffff81071811>] __warn+0xd1/0xf0
[ 102.348057] [<ffffffff810718fd>] warn_slowpath_null+0x1d/0x20
[ 102.348057] [<ffffffff8139a1f8>] bind_evtchn_to_irq+0x88/0xf0
[ 102.348057] [<ffffffffa00cd420>] ? blkif_copy_from_grant+0xb0/0xb0 [xen_blkfront]
[ 102.348057] [<ffffffff8139a307>] bind_evtchn_to_irqhandler+0x27/0x80
[ 102.348057] [<ffffffffa00cc785>] talk_to_blkback+0x425/0xcd0 [xen_blkfront]
[ 102.348057] [<ffffffff811e0c8a>] ? __kmalloc+0x1ea/0x200
[ 102.348057] [<ffffffffa00ce84d>] blkfront_restore+0x2d/0x60 [xen_blkfront]
[ 102.348057] [<ffffffff813a0078>] xenbus_dev_restore+0x58/0x100
[ 102.348057] [<ffffffff813a1ff0>] ? xenbus_frontend_delayed_resume+0x20/0x20
[ 102.348057] [<ffffffff813a200e>] xenbus_dev_cond_restore+0x1e/0x30
[ 102.348057] [<ffffffff813f797e>] dpm_run_callback+0x4e/0x130
[ 102.348057] [<ffffffff813f7f17>] device_resume+0xe7/0x210
[ 102.348057] [<ffffffff813f7810>] ? pm_dev_dbg+0x80/0x80
[ 102.348057] [<ffffffff813f9374>] dpm_resume+0x114/0x2f0
[ 102.348057] [<ffffffff810c00cf>] hibernation_snapshot+0x15f/0x380
[ 102.348057] [<ffffffff810c0ac3>] hibernate+0x183/0x290
[ 102.348057] [<ffffffff810be1af>] state_store+0xcf/0xe0
[ 102.348057] [<ffffffff813020bf>] kobj_attr_store+0xf/0x20
[ 102.348057] [<ffffffff8127c88a>] sysfs_kf_write+0x3a/0x50
[ 102.348057] [<ffffffff8127c3bb>] kernfs_fop_write+0x10b/0x190
[ 102.348057] [<ffffffff81200008>] __vfs_write+0x28/0x120
[ 102.348057] [<ffffffff81200c19>] ? rw_verify_area+0x49/0xb0
[ 102.348057] [<ffffffff81200e62>] vfs_write+0xb2/0x1b0
[ 102.348057] [<ffffffff81202196>] SyS_write+0x46/0xa0
[ 102.348057] [<ffffffff81520cf7>] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 102.423005] ---[ end trace b8d6718e22e2b107 ]---
[ 102.425031] genirq: Flags mismatch irq 6. 00000000 (blkif) vs. 00000000 (floppy)

Note that we don't explicitly re-allocate event channels for such
devices in the resume callback. Re-allocation occurs when the PM core
re-enables IRQs for the devices at a later point.
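
The conflict described above can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyGuest`, its methods, and the port number 5 are hypothetical, and the real bookkeeping in the Xen events code is considerably more involved.

```python
class ToyGuest:
    """Toy model of a guest's event-channel-to-device table."""

    def __init__(self):
        self.port_owner = {}  # event channel port -> device name

    def bind(self, port, device):
        """Bind device to port; report a conflict on a stale entry."""
        owner = self.port_owner.get(port)
        if owner is not None and owner != device:
            return f"conflict: port {port} wanted by {device}, held by {owner}"
        self.port_owner[port] = device
        return "ok"

    def shutdown_pirqs(self, pirq_devices):
        """Close event channels of PIRQ-backed devices before the snapshot."""
        self.port_owner = {port: dev
                           for port, dev in self.port_owner.items()
                           if dev not in pirq_devices}


def restore_after_hibernation(close_pirqs):
    guest = ToyGuest()
    guest.bind(5, "floppy")  # PIRQ-backed legacy device
    if close_pirqs:
        guest.shutdown_pirqs({"floppy"})
    # The hibernation image carries guest.port_owner. After restore, the
    # fresh domain may hand the same port number to a different device.
    return guest.bind(5, "blkif")
```

Without closing the PIRQ event channels, the stale `floppy` entry survives in the image and the `blkif` bind reports a conflict; with them closed, the bind succeeds.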

Signed-off-by: Munehisa Kamata <email address hidden>
Signed-off-by: Anchal Agarwal <email address hidden>
Reviewed-by: Munehisa Kamata <email address hidden>
Reviewed-by: Eduardo Valentin <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>

90ed798... by Munehisa Kamata

UBUNTU: SAUCE [aws] xen/events: add xen_shutdown_pirqs helper function

BugLink: http://bugs.launchpad.net/bugs/1804533

Add a simple helper function to "shutdown" active PIRQs, which actually
closes event channels but keeps related IRQ structures intact. PM
suspend/hibernation code will rely on this.

Signed-off-by: Munehisa Kamata <email address hidden>
Signed-off-by: Anchal Agarwal <email address hidden>
Reviewed-by: Munehisa Kamata <email address hidden>
Reviewed-by: Eduardo Valentin <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>

6099c2f... by Munehisa Kamata

UBUNTU: SAUCE [aws] x86/xen: save and restore steal clock

BugLink: http://bugs.launchpad.net/bugs/1804533

Save steal clock values of all present CPUs in the system core ops
suspend callbacks. Also, restore the boot CPU's steal clock in the system
core resume callback. For non-boot CPUs, restore after they're brought
up, because runstate info for non-boot CPUs is not active until then.

Signed-off-by: Munehisa Kamata <email address hidden>
Signed-off-by: Anchal Agarwal <email address hidden>
Reviewed-by: Munehisa Kamata <email address hidden>
Reviewed-by: Eduardo Valentin <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>

aa4ef43... by Munehisa Kamata

UBUNTU: SAUCE [aws] xen/time: introduce xen_{save,restore}_steal_clock

BugLink: http://bugs.launchpad.net/bugs/1804533

Currently, the steal time accounting code in the scheduler expects the
steal clock callback to provide a monotonically increasing value. If the
accounting code receives a value smaller than the previous one, it uses a
negative value to calculate steal time, which results in incorrectly
updated idle and steal time accounting. This breaks userspace tools that
read /proc/stat.

top - 08:05:35 up 2:12, 3 users, load average: 0.00, 0.07, 0.23
Tasks: 80 total, 1 running, 79 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,30100.0%id, 0.0%wa, 0.0%hi, 0.0%si,-1253874204672.0%st

This can actually happen when a Xen PVHVM guest is restored from
hibernation, because such a restored guest is just a fresh domain from
Xen's perspective and the time information in runstate info starts over
from scratch.

This patch introduces xen_save_steal_clock(), which saves the current
values in runstate info into per-cpu variables. Its counterpart,
xen_restore_steal_clock(), sets an offset if it finds that the current
values in runstate info are smaller than the previous ones.
xen_steal_clock() is also modified to use the offset to ensure that the
scheduler only sees a monotonically increasing number.
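
The save/restore/offset idea can be sketched as a small Python model. This is a sketch, not the patch itself: `ToyStealClock` and `simulate` are hypothetical names, the methods only mirror the roles of xen_save_steal_clock(), xen_restore_steal_clock() and xen_steal_clock(), and the exact way the offset is computed here is an assumption.

```python
class ToyStealClock:
    """Toy per-CPU steal clock with a save/restore offset."""

    def __init__(self):
        self.runstate = 0  # steal time as reported in runstate info
        self.saved = 0     # raw value captured before hibernation
        self.offset = 0    # correction accumulated across restores

    def save(self):
        """Role of xen_save_steal_clock(): remember the raw value."""
        self.saved = self.runstate

    def restore(self):
        """Role of xen_restore_steal_clock(): set an offset if time went back."""
        if self.runstate < self.saved:
            self.offset += self.saved - self.runstate

    def read(self):
        """Role of xen_steal_clock(): raw value plus offset."""
        return self.runstate + self.offset


def simulate(use_offset):
    clock = ToyStealClock()
    readings = []
    clock.runstate = 500
    readings.append(clock.read())
    clock.save()
    clock.runstate = 3  # fresh domain after restore: counters start over
    if use_offset:
        clock.restore()
    readings.append(clock.read())
    clock.runstate = 40
    readings.append(clock.read())
    return readings
```

With the offset applied, the readings never decrease across the restore. Without it, the second reading is smaller than the first, which is the situation that yields a negative delta in the steal time accounting.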

Signed-off-by: Munehisa Kamata <email address hidden>
Signed-off-by: Anchal Agarwal <email address hidden>
Reviewed-by: Munehisa Kamata <email address hidden>
Reviewed-by: Eduardo Valentin <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>

48f3103... by Munehisa Kamata

UBUNTU: SAUCE [aws] xen-netfront: add callbacks for PM suspend and hibernation support

BugLink: http://bugs.launchpad.net/bugs/1804533

Add freeze and restore callbacks for PM suspend and hibernation support.
The freeze handler simply disconnects the frontend from the backend and
frees resources associated with queues after disabling the net_device
from the system. The restore handler just changes the frontend state and
lets the xenbus handler re-allocate the resources and re-connect to the
backend. This can be performed transparently to the rest of the system.
The handlers are used for both PM suspend and hibernation so that we can
keep the existing suspend/resume callbacks for Xen suspend without
modification.

Freezing netfront devices is normally expected to finish within a few
hundred milliseconds, but in rare cases it can take more than 5 seconds
and hit the hard-coded timeout, depending on the backend state, which may
be congested and/or have a complex configuration. While this is a rare
case, a longer default timeout seems more reasonable here to avoid
hitting the timeout. Also, make it configurable via a module parameter so
that we can cover broader setups than those we currently know of.
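
The effect of a configurable timeout can be sketched in Python. This is an illustrative sketch only: `wait_for_freeze`, its parameters, and the 10-second value in the usage note are hypothetical; the commit message does not state the new default or the module parameter's name.

```python
import time


def wait_for_freeze(backend_is_closed, timeout_s, poll_s=0.01,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until the backend reports closed or timeout_s elapses.

    Returns True if the freeze completed, False if the timeout was hit.
    clock and sleep are injectable so the behaviour can be tested
    without real delays.
    """
    deadline = clock() + timeout_s
    while not backend_is_closed():
        if clock() >= deadline:
            return False
        sleep(poll_s)
    return True
```

A backend that needs about 7 seconds to close would hit a 5-second limit but complete under a hypothetical 10-second one, which is the kind of setup a tunable timeout is meant to cover.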

Signed-off-by: Munehisa Kamata <email address hidden>
Signed-off-by: Anchal Agarwal <email address hidden>
Reviewed-by: Eduardo Valentin <email address hidden>
Reviewed-by: Munehisa Kamata <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>

f266d75... by Munehisa Kamata

UBUNTU: SAUCE [aws] x86/xen: add system core suspend and resume callbacks

BugLink: http://bugs.launchpad.net/bugs/1804533

Add Xen PVHVM specific system core callbacks for PM suspend and
hibernation support. The callbacks suspend and resume Xen primitives,
like shared_info, pvclock and grant table. Note that Xen suspend can
handle them in a different manner, but system core callbacks are also
called from that context. So if the callbacks are called from Xen suspend
context, return immediately.
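
The early return can be sketched as follows. The names `syscore_suspend`, `in_xen_suspend` and `XEN_PRIMITIVES` are hypothetical, and the order in which the primitives are listed is arbitrary; the sketch only shows the control flow described above.

```python
XEN_PRIMITIVES = ("shared_info", "pvclock", "grant table")


def syscore_suspend(in_xen_suspend, suspended):
    """Toy system core suspend callback.

    Appends the names of the primitives it handled to `suspended`.
    """
    if in_xen_suspend:
        # Xen suspend handles these primitives in its own way,
        # so the callback does nothing in that context.
        return 0
    suspended.extend(XEN_PRIMITIVES)
    return 0
```

Called from the Xen suspend context the callback leaves the list untouched; called from PM suspend or hibernation it handles all three primitives.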

Signed-off-by: Munehisa Kamata <email address hidden>
Signed-off-by: Anchal Agarwal <email address hidden>
Reviewed-by: Munehisa Kamata <email address hidden>
Reviewed-by: Eduardo Valentin <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>

cc926ac... by Anchal Agarwal

UBUNTU: SAUCE [aws] x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume

BugLink: http://bugs.launchpad.net/bugs/1804533

Introduce a small function that reuses the shared page's physical
address (PA), allocated during guest initialization in
reserve_shared_info(), instead of allocating a new page in the resume
flow. The function also maps shared_info_page by calling
xen_hvm_init_shared_info().

Signed-off-by: Anchal Agarwal <email address hidden>
Reviewed-by: Sebastian Biemueller <email address hidden>
Reviewed-by: Munehisa Kamata <email address hidden>
Reviewed-by: Eduardo Valentin <email address hidden>
CR: https://cr.amazon.com/r/8273203/
Signed-off-by: Kamal Mostafa <email address hidden>

38c16eb... by Khaled El Mously

UBUNTU: Ubuntu-aws-4.15.0-1028.29

Signed-off-by: Khalid Elmously <email address hidden>

ae164a5... by Khaled El Mously

UBUNTU: link-to-tracker: update tracking bug

BugLink: https://bugs.launchpad.net/bugs/1802558
Signed-off-by: Khalid Elmously <email address hidden>