The SNAPSHOT_SET_SWAP_AREA is supposed to be used to set the hibernation
offset on a running kernel to enable hibernating to a swap file.
However, it doesn't actually update the swsusp_resume_block variable. As
a result, the hibernation fails at the last step (after all the data is
written out) in the validation of the swap signature in
mark_swapfiles().
Before this patch, the command line processing was the only place where
swsusp_resume_block was set.
Freezing netfront devices is normally expected to finish within a few
hundred milliseconds, but it can rarely take more than 5 seconds and
hit the hard coded timeout, it would depend on backend state which may
be congested and/or have complex configuration. While it's rare case,
longer default timeout seems a bit more reasonable here to avoid hitting
the timeout. Also, make it configurable via module parameter so that we
can cover broader setups than what we know currently.
Currently, Xen CPU hotplug code only handles CPU_UP_PREPARE action, but
it's not sufficient to perform PM suspend/hibernation in guests with
multiple VCPUs. Let's handle CPU_UP_PREPARE_FROZEN so that non-boot
processors can be configured properly.
Add freeze and restore callbacks for PM suspend and hibernation support.
The freeze handler stops a block-layer queue and disconnect the frontend
from the backend while freeing ring_info and associated resources. The
restore handler re-allocates ring_info and re-connect to the backedend,
so the rest of the kernel can continue to use the block device
transparently. The handlers are used for both PM suspend and hibernation
so that we can keep the existing suspend/resume callbacks for Xen
suspend without modification.
Signed-off-by: Munehisa Kamata <email address hidden>
Reviewed-by: Vallish Vaidyeshwara <email address hidden>
Reviewed-by: Eduardo Valentin <email address hidden>
Reviewed-by: Anchal Agarwal <email address hidden>
CR: https://cr.amazon.com/r/7367680/
[kamal: backport to 4.4: context at end of xlvbd_init_blk_queue]
Signed-off-by: Kamal Mostafa <email address hidden>
Add freeze and restore callbacks for PM suspend and hibernation support.
The freeze handler simply disconnects the frotnend from the backend and
frees resources associated with queues after disabling the net_device
from the system. The restore handler just changes the frontend state and
let the xenbus handler to re-allocate the resources and re-connect to the
backend. This can be performed transparently to the rest of the system.
The handlers are used for both PM suspend and hibernation so that we can
keep the existing suspend/resume callbacks for Xen suspend without
modification.
Close event channels allocated for devices which are backed by PIRQ and
still active when suspending the system core. Normally, the devices are
emulated legacy devices, e.g. PS/2 keyboard, floppy controller and etc.
Without this, in PM hibernation, information about the event channel
remains in hibernation image, but there is no guarantee that the same
event channel numbers are assigned to the devices when restoring the
system. This may cause conflict like the following and prevent some
devices from being restored correctly.
Note that we don't explicitly re-allocate event channels for such
devices in the resume callback. Re-allocation will occur when PM core
re-enable IRQs for the devices at later point.
Add a simple helper function to "shutdown" active PIRQs, which actually
closes event channels but keeps related IRQ structures intact. PM
suspend/hibernation code will rely on this.
Save steal clock values of all present CPUs in the system core ops
suspend callbacks. Also, restore a boot CPU's steal clock in the system
core resume callback. For non-boot CPUs, restore after they're brought
up, because runstate info for non-boot CPUs are not active until then.
Currently, steal time accounting code in scheduler expects steal clock
callback to provide monotonically increasing value. If the accounting
code receives a smaller value than previous one, it uses a negative
value to calculate steal time and results in incorrectly updated idle
and steal time accounting. This breaks userspace tools which read
/proc/stat.
This can actually happen when a Xen PVHVM guest gets restored from
hibernation, because such a restored guest is just a fresh domain from
Xen perspective and the time information in runstate info starts over
from scratch.
This patch introduces xen_save_steal_clock() which saves current values
in runstate info into per-cpu variables. Its couterpart,
xen_restore_steal_clock(), sets offset if it found the current values in
runstate info are smaller than previous ones. xen_steal_clock() is also
modified to use the offset to ensure that scheduler only sees
monotonically increasing number.
Add Xen PVHVM specific system core callbacks for PM suspend and
hibernation support. The callbacks suspend and resume Xen primitives,
like shared_info, pvclock and grant table. Note that Xen suspend can
handle them in a different manner, but system core callbacks are called
from the context. So if the callbacks are called from Xen suspend
context, return immediately.