~arighi/ubuntu/+source/linux-aws/+git/groovy:aws-xen-hibernation-sync

Last commit made on 2021-02-24
Get this branch:
git clone -b aws-xen-hibernation-sync https://git.launchpad.net/~arighi/ubuntu/+source/linux-aws/+git/groovy

Branch information

Name:
aws-xen-hibernation-sync
Repository:
lp:~arighi/ubuntu/+source/linux-aws/+git/groovy

Recent commits

f7396c3... by Eduardo Valentin <email address hidden>

UBUNTU: SAUCE: x86: tsc: avoid system instability in hibernation

BugLink: https://bugs.launchpad.net/bugs/1913410

System instability is seen during resume from hibernation when the
system is under heavy CPU load. This is because the sched clock data
is not updated, so the scheduler thinks that heavy CPU hog tasks need
more time on the CPU, causing the system to freeze during the
unfreezing of tasks. For example, threaded IRQs and kernel processes
servicing the network interface may be delayed for several tens of
seconds, causing the system to be unreachable.

Situations like this can be reported by lockup detectors such as the
workqueue lockup detector:

[root@ip-172-31-67-114 ec2-user]# echo disk > /sys/power/state

Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
 kernel:BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
 kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May 7 18:23:21 ...
 kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 57s!

Message from syslogd@ip-172-31-67-114 at May 7 18:29:06 ...
 kernel:BUG: workqueue lockup - pool cpus=3 node=0 flags=0x1 nice=0 stuck for 403s!

The fix for this situation is to mark the sched clock as unstable
as early as possible in the resume path, leaving it unstable
for the duration of the resume process. This will force the
scheduler to attempt to align the sched clock across CPUs using
the delta with time of day, updating sched clock data. In a post
hibernation event, we can then mark the sched clock as stable
again, avoiding unnecessary syncs with time of day on systems
in which TSC is reliable.

Reviewed-by: Erik Quanstrom <email address hidden>
Reviewed-by: Frank van der Linden <email address hidden>
Reviewed-by: Balbir Singh <email address hidden>
Reviewed-by: Munehisa Kamata <email address hidden>
Tested-by: Anchal Agarwal <email address hidden>
Signed-off-by: Eduardo Valentin <email address hidden>

CR: https://cr.amazon.com/r/8440112/
(cherry picked from aae9984b06c262e9bb9995c7015c940932307427 https://github.com/amazonlinux/linux.git)
Signed-off-by: Andrea Righi <email address hidden>

51ad85d... by Andrea Righi

UBUNTU: SAUCE: xen-netfront: prevent unnecessary close on hibernate

BugLink: https://bugs.launchpad.net/bugs/1906850

If the device in the Xen bus is already in the "closed" state when
hibernating, there's no need to close the bus again. Doing so can only
cause errors that would prevent the system from hibernating correctly.
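
The early-return idea described above can be sketched as a small userspace model. This is not the actual xen-netfront change: the state enum, struct frontend_dev, frontend_close() and frontend_freeze() are hypothetical names chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bus states that matter in this sketch. */
enum bus_state { STATE_CONNECTED, STATE_CLOSING, STATE_CLOSED };

struct frontend_dev {
    enum bus_state state;
    int close_requests;   /* how many times a close was actually issued */
};

/* Hypothetical close routine: the model only counts calls and moves
 * the device to the closed state. */
void frontend_close(struct frontend_dev *dev)
{
    assert(dev != NULL);
    dev->close_requests++;
    dev->state = STATE_CLOSED;
}

/* Model of a freeze handler: skip the close when the device is already
 * closed, so hibernation is not failed by a redundant close. */
int frontend_freeze(struct frontend_dev *dev)
{
    assert(dev != NULL);
    if (dev->state == STATE_CLOSED)
        return 0;         /* nothing to do, already closed */
    frontend_close(dev);
    return 0;
}
```

Calling frontend_freeze() twice on the same device issues only one close in this model; the second call returns immediately.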

Signed-off-by: Andrea Righi <email address hidden>
Acked-by: William Breathitt Gray <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Signed-off-by: Ian May <email address hidden>
Signed-off-by: Andrea Righi <email address hidden>

23296c0... by Anchal Agarwal <email address hidden>

UBUNTU: SAUCE: xen: Update sched clock offset to avoid system instability in hibernation

BugLink: https://bugs.launchpad.net/bugs/1913410

Save/restore xen_sched_clock_offset in syscore suspend/resume during PM
hibernation. Commit 867cefb4cb1012 ("xen: Fix x86 sched_clock() interface
for xen") fixes Xen guest time handling during migration. A similar issue
is seen during PM hibernation when the system runs a CPU-intensive
workload. After resume, pvclock resets its value to 0, but
xen_sched_clock_offset is never updated. System instability is then seen
during resume from hibernation when the system is under heavy CPU load.
Since xen_sched_clock_offset is not updated, the system does not see a
monotonic clock value, and the scheduler then thinks that heavy CPU hog
tasks need more time on the CPU, causing the system to freeze.
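
A minimal userspace model of the offset bookkeeping, assuming the sched clock is computed as the raw pvclock value minus an offset. Only the name xen_sched_clock_offset comes from the commit message; model_pvclock_ns, saved_sched_clock and the model_* functions are hypothetical, and the real patch may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the hypervisor-provided clock; it restarts from 0 on resume. */
uint64_t model_pvclock_ns;

/* Plays the role of xen_sched_clock_offset in this sketch. */
uint64_t sched_clock_offset;

/* Hypothetical: the sched clock value captured at suspend time. */
uint64_t saved_sched_clock;

/* What the scheduler would see: raw clock minus the offset. */
uint64_t model_sched_clock(void)
{
    return model_pvclock_ns - sched_clock_offset;
}

/* Suspend side: remember the value the scheduler last saw. */
void model_save_offset(void)
{
    saved_sched_clock = model_sched_clock();
}

/* Resume side: recompute the offset so the sched clock continues from
 * the saved value. Unsigned wraparound keeps this correct even when
 * the raw clock is now smaller than the saved value. */
void model_restore_offset(void)
{
    sched_clock_offset = model_pvclock_ns - saved_sched_clock;
    assert(model_sched_clock() == saved_sched_clock);
}
```

Without the restore step, the raw clock restarting at 0 would make the modelled sched clock jump instead of continuing from the saved value.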

Signed-off-by: Anchal Agarwal <email address hidden>
(backported from https://<email address hidden>/)
Signed-off-by: Andrea Righi <email address hidden>

ec1a12a... by Anchal Agarwal <email address hidden>

UBUNTU: SAUCE: xen: Introduce wrapper for save/restore sched clock offset

BugLink: https://bugs.launchpad.net/bugs/1913410

Introduce wrappers for save/restore xen_sched_clock_offset to be
used by PM hibernation code to avoid system instability during resume.

Signed-off-by: Anchal Agarwal <email address hidden>
(backported from https://<email address hidden>/)
Signed-off-by: Andrea Righi <email address hidden>

ccf159a... by Munehisa Kamata

UBUNTU: SAUCE: x86/xen: save and restore steal clock

BugLink: https://bugs.launchpad.net/bugs/1913410

Save steal clock values of all present CPUs in the system core ops
suspend callbacks. Also, restore the boot CPU's steal clock in the system
core resume callback. For non-boot CPUs, restore after they're brought
up, because runstate info for non-boot CPUs is not active until then.

Signed-off-by: Munehisa Kamata <email address hidden>
Signed-off-by: Anchal Agarwal <email address hidden>
(backported from https://<email address hidden>/)
Signed-off-by: Andrea Righi <email address hidden>

0a20890... by Munehisa Kamata

UBUNTU: SAUCE: xen/time: introduce xen_{save,restore}_steal_clock

BugLink: https://bugs.launchpad.net/bugs/1913410

Currently, the steal time accounting code in the scheduler expects the
steal clock callback to provide a monotonically increasing value. If the
accounting code receives a smaller value than the previous one, it uses a
negative value to calculate steal time, which results in incorrectly
updated idle and steal time accounting. This breaks userspace tools which
read /proc/stat.

top - 08:05:35 up 2:12, 3 users, load average: 0.00, 0.07, 0.23
Tasks: 80 total, 1 running, 79 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,30100.0%id, 0.0%wa, 0.0%hi, 0.0%si,-1253874204672.0%st

This can actually happen when a Xen PVHVM guest gets restored from
hibernation, because such a restored guest is just a fresh domain from
Xen perspective and the time information in runstate info starts over
from scratch.

This patch introduces xen_save_steal_clock() which saves current values
in runstate info into per-cpu variables. Its counterpart,
xen_restore_steal_clock(), sets an offset if it finds the current values
in runstate info are smaller than the previous ones. xen_steal_clock() is
also modified to use the offset to ensure that the scheduler only sees
a monotonically increasing number.
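
The save/restore/offset scheme can be illustrated with a small userspace model. It assumes the saved value is the offset-adjusted steal time; the arrays, NR_MODEL_CPUS and the model_* functions are hypothetical and only mirror the roles of xen_save_steal_clock(), xen_restore_steal_clock() and xen_steal_clock() described above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MODEL_CPUS 4   /* hypothetical fixed CPU count for the model */

/* Raw steal time as reported by the hypervisor's runstate info. */
uint64_t runstate_steal[NR_MODEL_CPUS];
/* Per-CPU value saved before hibernation. */
uint64_t saved_steal[NR_MODEL_CPUS];
/* Per-CPU offset applied after a restore. */
uint64_t steal_offset[NR_MODEL_CPUS];

/* What the scheduler would see: raw value plus the per-CPU offset. */
uint64_t model_steal_clock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
    return runstate_steal[cpu] + steal_offset[cpu];
}

/* Save the value the scheduler last saw for this CPU. */
void model_save_steal_clock(int cpu)
{
    saved_steal[cpu] = model_steal_clock(cpu);
}

/* After resume the raw value starts over. If it is smaller than the
 * saved one, set an offset so the reported value does not go backwards. */
void model_restore_steal_clock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
    if (runstate_steal[cpu] < saved_steal[cpu])
        steal_offset[cpu] = saved_steal[cpu] - runstate_steal[cpu];
}
```

In this model the value returned by model_steal_clock() never decreases across a save/restore cycle, which is the property the accounting code relies on.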

Signed-off-by: Munehisa Kamata <email address hidden>
Signed-off-by: Anchal Agarwal <email address hidden>
(backported from https://<email address hidden>/)
Signed-off-by: Andrea Righi <email address hidden>

ec7ec93... by Munehisa Kamata

UBUNTU: SAUCE: xen-blkfront: add callbacks for PM suspend and hibernation

BugLink: https://bugs.launchpad.net/bugs/1913410

Add freeze, thaw and restore callbacks for PM suspend and hibernation
support. All frontend drivers that need to use PM_HIBERNATION/PM_SUSPEND
events need to implement these xenbus_driver callbacks.
The freeze handler stops the block-layer queue and disconnects the
frontend from the backend while freeing ring_info and associated resources.
The restore handler re-allocates ring_info and re-connects to the
backend, so the rest of the kernel can continue to use the block device
transparently. Also, the handlers are used for both PM suspend and
hibernation so that we can keep the existing suspend/resume callbacks for
Xen suspend without modification. Before disconnecting from the backend,
we need to prevent any new IO from being queued and wait for existing
IO to complete. Freeze/unfreeze of the queues will guarantee that there
are no requests in use on the shared ring.

Note: For older backends, if a backend doesn't have commit 12ea729645ace
("xen/blkback: unmap all persistent grants when frontend gets
disconnected"), the frontend may see a massive amount of grant table
warnings when freeing resources.
[ 36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
[ 36.855089] xen:grant_table: WARNING: g.e. 0x112 still in use!

In this case, persistent grants would need to be disabled.

[Anchal Changelog: Removed timeout/request during blkfront freeze.
Fixed major part of the code to work with blk-mq]

Signed-off-by: Anchal Agarwal <email address hidden>
Signed-off-by: Munehisa Kamata <email address hidden>
(backported from https://<email address hidden>/)
Signed-off-by: Andrea Righi <email address hidden>

1a65f8e... by Munehisa Kamata

UBUNTU: SAUCE: xen-netfront: add callbacks for PM suspend and hibernation support

BugLink: https://bugs.launchpad.net/bugs/1913410

Add freeze, thaw and restore callbacks for PM suspend and hibernation
support. The freeze handler simply disconnects the frontend from the
backend and frees resources associated with queues after disabling the
net_device from the system. The restore handler just changes the
frontend state and lets the xenbus handler re-allocate the resources
and re-connect to the backend. This can be performed transparently to
the rest of the system. The handlers are used for both PM suspend and
hibernation so that we can keep the existing suspend/resume callbacks
for Xen suspend without modification. Freezing netfront devices is
normally expected to finish within a few hundred milliseconds, but in
rare cases it can take more than 5 seconds and hit the hard-coded
timeout, depending on the backend state, which may be congested and/or
have a complex configuration. While this is a rare case, a longer default
timeout seems more reasonable here to avoid hitting the timeout.
Also, make it configurable via a module parameter so that we can cover
broader setups than what we know currently.

[Anchal changelog: Variable name fix and checkpatch.pl fixes]

Signed-off-by: Anchal Agarwal <email address hidden>
Signed-off-by: Munehisa Kamata <email address hidden>
(backported from https://<email address hidden>/)
Signed-off-by: Andrea Righi <email address hidden>

4e71875... by Munehisa Kamata

UBUNTU: SAUCE: x86/xen: add system core suspend and resume callbacks

BugLink: https://bugs.launchpad.net/bugs/1913410

Add Xen PVHVM specific system core callbacks for PM suspend and
hibernation support. The callbacks suspend and resume Xen
primitives, like shared_info, pvclock and grant table. Note that
Xen suspend can handle them in a different manner, but the system
core callbacks are also called from that context. So if the callbacks
are called from Xen suspend context, return immediately.
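
The "return immediately" guard can be sketched as a userspace model. The flag in_xen_suspend, the counter primitives_suspended and the model_syscore_* functions are hypothetical; the real callbacks operate on shared_info, pvclock and the grant table rather than a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: true while a Xen-initiated suspend is in progress. */
bool in_xen_suspend;

/* Hypothetical counter standing in for "Xen primitives are suspended". */
int primitives_suspended;

/* Model of the syscore suspend callback: do nothing when called from
 * Xen suspend context, because Xen suspend handles the primitives in
 * its own way. */
int model_syscore_suspend(void)
{
    if (in_xen_suspend)
        return 0;
    primitives_suspended++;
    return 0;
}

/* Model of the syscore resume callback, with the same guard. */
void model_syscore_resume(void)
{
    if (in_xen_suspend)
        return;
    assert(primitives_suspended > 0);
    primitives_suspended--;
}
```

The guard keeps the PM hibernation path and the Xen suspend path from both acting on the same state in this model.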

Signed-off-by: Agarwal Anchal <email address hidden>
Signed-off-by: Munehisa Kamata <email address hidden>
(backported from https://<email address hidden>/)
Signed-off-by: Andrea Righi <email address hidden>

8c9e526... by Anchal Agarwal <email address hidden>

UBUNTU: SAUCE: x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume

BugLink: https://bugs.launchpad.net/bugs/1913410

Introduce a small function that re-uses the shared page's PA allocated
during guest initialization in reserve_shared_info() instead of
allocating a new page during the resume flow.
It also maps shared_info_page by calling
xen_hvm_init_shared_info() to use the function.

Signed-off-by: Anchal Agarwal <email address hidden>
(backported from https://<email address hidden>/)
Signed-off-by: Andrea Righi <email address hidden>