~canonical-kernel/ubuntu/+source/linux-aws/+git/disco:master

Last commit made on 2020-02-14
Get this branch:
git clone -b master https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-aws/+git/disco
Members of Canonical Kernel can upload to this branch.

Recent commits

5e955fe... by Frank van der Linden <email address hidden>

UBUNTU SAUCE [aws]: xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.

BugLink: https://bugs.launchpad.net/bugs/1831940

Restoring all PIRQs, which is the right thing to do, was causing problems
on larger instances. This is a horrible workaround until this issue is fully
understood.

Signed-off-by: Frank van der Linden <email address hidden>
Reviewed-by: Alakesh Haloi <email address hidden>
Reviewed-by: Anchal Agarwal <email address hidden>
Reviewed-by: Qian Lu <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>

8f1e118... by Frank van der Linden <email address hidden>

UBUNTU SAUCE [aws]: xen: restore pirqs on resume from hibernation.

BugLink: https://bugs.launchpad.net/bugs/1831940

The hibernation code unlinks event channels from these (legacy) IRQs, so they
must be reinitialized on wakeup, much like in the Xen suspend/resume case.

Signed-off-by: Frank van der Linden <email address hidden>
Reviewed-by: Cristian Gafton <email address hidden>
Reviewed-by: Anchal Agarwal <email address hidden>
Reviewed-by: Alakesh Haloi <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>

fb4b73d... by Eduardo Valentin <email address hidden>

UBUNTU SAUCE [aws]: block: xen-blkfront: consider new dom0 features on restore

BugLink: https://bugs.launchpad.net/bugs/1831940

On a regular start, the instance performs a regular boot, in which the rootfs
is mounted according to the xen-blkback features (in particular
feature-barrier and feature-flush-cache). That sets up the journal
according to the features provided on SB.
On a start from hibernation, the instance boots, detects that a hibernation
image is present, loads the image into memory and jumps back to where it was.
There is no regular mount of the rootfs; it uses the data structures already
in the previously saved memory image.
Now, when the instance hibernates, it may move from its original dom0 to a
new dom0 when it is restarted.
So, given the above, if the xen-blkback features change then the guest
can be in trouble. And I see the original assumption was that the
dom0 environment would be preserved. I did a couple of experiments,
and I confirm that these particular features change quite a lot across
hibernation attempts:
[ 2343.157903] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2444.712339] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2537.105884] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2636.641298] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2729.868349] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2827.118979] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 2924.812599] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3018.063399] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3116.685040] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3209.164475] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3317.981362] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3415.939725] blkfront: xvda: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3514.202478] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[ 3619.355791] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;

Now, considering the above, this patch fixes the following scenario:
a. The instance boots and sets up the bio queue on dom0 A, which supports
softbarrier.
b. The instance hibernates.
c. When asked to restore, the instance comes back on dom0 B, which does not
support softbarrier.
d. Restoration goes well until the next journal commit is issued. Remember
that the instance is still using the rootfs data structures from the previous
image, so it is going to request a softbarrier.
e. The bio errors out with an "operation not supported" message and causes
the journal to fail, and the filesystem decides to remount the rootfs as RO.
[ 1138.909290] print_req_error: operation not supported error, dev xvda, sector 4470400, flags 6008
[ 1139.025685] Aborting journal on device xvda1-8.
[ 1139.029758] print_req_error: operation not supported error, dev xvda, sector 4460544, flags 26008
[ 1139.326119] Buffer I/O error on dev xvda1, logical block 0, lost sync page write
[ 1139.331398] EXT4-fs error (device xvda1): ext4_journal_check_start:61: Detected aborted journal
[ 1139.337296] EXT4-fs (xvda1): Remounting filesystem read-only
[ 1139.341006] EXT4-fs (xvda1): previous I/O error to superblock detected
[ 1139.345704] print_req_error: operation not supported error, dev xvda, sector 4096, flags 26008

The fix is essentially to read xenbus to query the new xen
blkback capabilities and update them into the request queue.

Reviewed-by: Balbir Singh <email address hidden>
Reviewed-by: Vallish Vaidyeshwara <email address hidden>
Signed-off-by: Eduardo Valentin <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>
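
The scenario in commit fb4b73d above can be pictured with a small user-space model. This is only a sketch under stated assumptions: "feature-barrier" and "feature-flush-cache" are the xenstore keys named in the commit message, but every function name, the dictionary layout, and the precedence between the two features are hypothetical and are not taken from the xen-blkfront source.

```python
# Toy model of the stale-feature problem; not the xen-blkfront code.
# All function names and the dict layout are hypothetical.

def gather_backend_features(xenstore):
    """Derive the queue's flush mode from what the backend advertises."""
    # Assumed precedence for this sketch: flush-cache wins over barrier.
    if xenstore.get("feature-flush-cache", 0):
        return {"flush": "flush diskcache"}
    if xenstore.get("feature-barrier", 0):
        return {"flush": "barrier"}
    return {"flush": None}


def restore_queue(queue, xenstore):
    """On restore, overwrite the settings carried over in the memory image."""
    queue.update(gather_backend_features(xenstore))
    return queue


def journal_commit(queue, xenstore):
    """Issue a journal commit using whatever flush mode the queue remembers."""
    mode = queue["flush"]
    if mode is None:
        return "ok (no flush issued)"
    key = {"barrier": "feature-barrier",
           "flush diskcache": "feature-flush-cache"}[mode]
    if not xenstore.get(key, 0):
        raise OSError("operation not supported")
    return "ok"


dom0_a = {"feature-barrier": 1}   # backend seen at boot
dom0_b = {}                       # backend seen after restore

queue = gather_backend_features(dom0_a)          # set up at boot on dom0 A
print(journal_commit(dict(queue), dom0_a))       # works on the original dom0
print(journal_commit(restore_queue(dict(queue), dom0_b), dom0_b))
```

Without the `restore_queue()` step, the same commit against `dom0_b` raises the "operation not supported" error, which mirrors step e of the scenario.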

43752b9... by Anchal Agarwal <email address hidden>

UBUNTU SAUCE [aws]: ACPICA: Enable sleep button on ACPI legacy wake

BugLink: https://bugs.launchpad.net/bugs/1831940

Currently we do not see the sleep_enable bit set after the guest resumes from
hibernation. Hibernation is triggered in the guest on receiving a sleep
trigger from the hypervisor (S4 state). On wakeup the power button is
enabled, but the sleep button is not. This causes the second sleep trigger to
fail, because after resume the PMEN register does not have bit 9 set, and the
hypervisor expects that bit to be set before it can send an SCI interrupt to
the guest.
Expected PMEN=0x320, as on a fresh boot; after resume, however, PMEN=0x120.

Signed-off-by: Anchal Agarwal <email address hidden>
Reviewed-by: Balbir Singh <email address hidden>
Reviewed-by: Frank van der Linden <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>
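
The two PMEN values quoted in commit 43752b9 above differ in exactly one bit, which a few lines of arithmetic make visible. The bit positions below follow the ACPI PM1 Enable register layout as I understand it; the constant and function names are chosen for this sketch and are not taken from the ACPICA source.

```python
# Bit positions assumed from the ACPI PM1 Enable register layout.
GBL_EN = 1 << 5      # global enable
PWRBTN_EN = 1 << 8   # power button enable
SLPBTN_EN = 1 << 9   # sleep button enable (the "bit 9" in the commit message)

NAMES = {GBL_EN: "GBL_EN", PWRBTN_EN: "PWRBTN_EN", SLPBTN_EN: "SLPBTN_EN"}


def decode_pm1_enable(value):
    """Return the names of the known enable bits set in value."""
    return [name for bit, name in NAMES.items() if value & bit]


fresh_boot = 0x320
after_resume = 0x120

# Bits that were set on fresh boot but are missing after resume.
missing = fresh_boot & ~after_resume

print(decode_pm1_enable(fresh_boot))
print(decode_pm1_enable(after_resume))
print(hex(missing), decode_pm1_enable(missing))
```

The only missing bit is 0x200, that is bit 9, which is the sleep button enable the commit restores.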

24db9e1... by Andrea Righi

UBUNTU SAUCE [aws]: PM / hibernate: set image_size to 0 by default

BugLink: https://bugs.launchpad.net/bugs/1831940

The main bottleneck during hibernation is writing the image to disk.
Set image_size to 0 by default so that the kernel is forced to make the
image as small as possible, which reduces the amount of I/O.

Signed-off-by: Andrea Righi <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>
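
On a running system the value that commit 24db9e1 changes by default is exposed through the sysfs file /sys/power/image_size (in bytes). The helpers below are a minimal sketch for inspecting or overriding it; the function names are hypothetical, and writing the real file requires root.

```python
from pathlib import Path

# Standard sysfs location of the hibernation image size hint, in bytes.
IMAGE_SIZE = Path("/sys/power/image_size")


def read_image_size(path=IMAGE_SIZE):
    """Return the current image size hint as an integer number of bytes."""
    return int(Path(path).read_text().strip())


def write_image_size(nbytes, path=IMAGE_SIZE):
    """Set the image size hint; 0 asks for the smallest possible image."""
    if nbytes < 0:
        raise ValueError("image size must be non-negative")
    Path(path).write_text(f"{nbytes}\n")
```

For example, `read_image_size()` on a kernel carrying this patch should report 0 unless something has overridden it, and `write_image_size(0)` applies the same setting by hand on a kernel without the patch.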

6bdd93f... by Andrea Righi

UBUNTU SAUCE [aws]: PM / hibernate: make sure pm_async is always disabled

BugLink: https://bugs.launchpad.net/bugs/1831940

We have experienced deadlock conditions on hibernate under memory
pressure with pm_async enabled.

To prevent such deadlocks, make sure that pm_async is never enabled.

Signed-off-by: Andrea Righi <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>

b4c7003... by Andrea Righi

UBUNTU SAUCE [aws]: mm: swap: improve swap readahead heuristic

BugLink: https://bugs.launchpad.net/bugs/1858618

Apply a more aggressive swapin readahead policy to improve swapoff
performance.

The idea is to start with no readahead (only read one page) and linearly
increment the amount of readahead pages each time swapin_readahead() is
called, up to the maximum cluster size (defined by vm.page-cluster),
then go back to one page to give the disk enough time to prefetch the
requested pages and avoid re-requesting them multiple times.

Also increase the default vm.page-cluster size to 8 (that seems to work
better with this new heuristic).

Signed-off-by: Andrea Righi <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>
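
The readahead policy described in commit b4c7003 above can be sketched as a generator of window sizes. This is an illustration, not the kernel code: the function name is hypothetical, the step of one page per call is an assumption (the message only says "linearly"), and the maximum is taken as 2 to the power of vm.page-cluster, since that sysctl is a logarithmic value.

```python
from itertools import islice


def readahead_windows(page_cluster):
    """Yield the number of pages to read on each successive swap-in call.

    Starts at one page, grows by one page per call (assumed step), and
    falls back to one page after reaching the maximum cluster size.
    """
    max_pages = 1 << page_cluster
    window = 1
    while True:
        yield window
        window = window + 1 if window < max_pages else 1


# Small cluster so the wrap-around is visible: max is 4 pages.
print(list(islice(readahead_windows(2), 6)))
```

With the new default of vm.page-cluster = 8, the same generator would ramp from 1 up to 256 pages before dropping back to 1.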

fa27513... by Andrea Righi

UBUNTU SAUCE [aws] PM / hibernate: reduce memory pressure during image writing

BugLink: https://bugs.launchpad.net/bugs/1831940

Get rid of the reqd_free_pages logic and make sure I/O requests are
completed every time a swap page is written.

This reduces the risk of running out of memory during hibernation when
the system is under memory pressure.

Signed-off-by: Andrea Righi <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>

11e0ed1... by Vineeth Remanan Pillai <email address hidden>

mm: rid swapoff of quadratic complexity

BugLink: https://bugs.launchpad.net/bugs/1858618

This patch was initially posted by Kelley Nielsen. Reposting the patch
with all review comments addressed and with minor modifications and
optimizations. Also, folding in the fixes offered by Hugh Dickins and
Huang Ying. Tests were rerun and commit message updated with new
results.

try_to_unuse() is of quadratic complexity, with a lot of wasted effort.
It unuses swap entries one by one, potentially iterating over all the
page tables for all the processes in the system for each one.

This new proposed implementation of try_to_unuse simplifies its
complexity to linear. It iterates over the system's mms once, unusing
all the affected entries as it walks each set of page tables. It also
makes similar changes to shmem_unuse.

Improvement

swapoff was called on a swap partition containing about 6G of data, in a
VM (8 CPUs, 16G RAM), and calls to unuse_pte_range() were counted.

Present implementation: about 1200M calls (8 min, avg 80% CPU util).
Prototype:              about 9.0K calls (3 min, avg 5% CPU util).

Details

In shmem_unuse(), iterate over the shmem_swaplist and, for each
shmem_inode_info that contains a swap entry, pass it to
shmem_unuse_inode(), along with the swap type. In shmem_unuse_inode(),
iterate over its associated xarray, and store the index and value of
each swap entry in an array for passing to shmem_swapin_page() outside
of the RCU critical section.

In try_to_unuse(), instead of iterating over the entries in the type and
unusing them one by one, perhaps walking all the page tables for all the
processes for each one, iterate over the mmlist, making one pass. Pass
each mm to unuse_mm() to begin its page table walk, and during the walk,
unuse all the ptes that have backing store in the swap type received by
try_to_unuse(). After the walk, check the type for orphaned swap
entries with find_next_to_unuse(), and remove them from the swap cache.
If find_next_to_unuse() starts over at the beginning of the type, repeat
the check of the shmem_swaplist and the walk a maximum of three times.

Change unuse_mm() and the intervening walk functions down to
unuse_pte_range() to take the type as a parameter, and to iterate over
their entire range, calling the next function down on every iteration.
In unuse_pte_range(), make a swap entry from each pte in the range using
the passed in type. If it has backing store in the type, call
swapin_readahead() to retrieve the page and pass it to unuse_pte().

Pass the count of pages_to_unuse down the page table walks in
try_to_unuse(), and return from the walk when the desired number of
pages has been swapped back in.

Link: http://<email address hidden>
Signed-off-by: Vineeth Remanan Pillai <email address hidden>
Signed-off-by: Kelley Nielsen <email address hidden>
Signed-off-by: Huang Ying <email address hidden>
Acked-by: Hugh Dickins <email address hidden>
Cc: Rik van Riel <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit b56a2d8af9147a4efe4011b60d93779c0461ca97)
Signed-off-by: Andrea Righi <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>
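
The complexity argument in commit 11e0ed1 above can be illustrated with a toy model that counts pte visits. This is a sketch under simplifying assumptions, not the kernel algorithm: each mm is a flat list of ptes, a pte is either None (page present) or a swap offset, each swap entry is mapped exactly once, and the old path is modelled in its worst case, rescanning every pte for every entry. All names are hypothetical.

```python
def make_mms(num_mms, ptes_per_mm, swapped_every):
    """Build mms as lists of ptes; every Nth pte holds a unique swap offset."""
    mms = []
    next_offset = 0
    for _ in range(num_mms):
        mm = []
        for i in range(ptes_per_mm):
            if i % swapped_every == 0:
                mm.append(next_offset)
                next_offset += 1
            else:
                mm.append(None)
        mms.append(mm)
    return mms, next_offset


def swapoff_per_entry(mms, num_entries):
    """Old shape: for each swap entry, walk every pte of every mm."""
    visits = 0
    for entry in range(num_entries):
        for mm in mms:
            for i, pte in enumerate(mm):
                visits += 1
                if pte == entry:
                    mm[i] = None  # "swap the page back in"
    return visits


def swapoff_single_pass(mms):
    """New shape: walk each mm once, unusing every swapped pte on the way."""
    visits = 0
    for mm in mms:
        for i, pte in enumerate(mm):
            visits += 1
            if pte is not None:
                mm[i] = None
    return visits


old_mms, entries = make_mms(4, 100, 10)
new_mms, _ = make_mms(4, 100, 10)
print("per-entry visits:  ", swapoff_per_entry(old_mms, entries))
print("single-pass visits:", swapoff_single_pass(new_mms))
```

In this model the per-entry walk costs entries times total ptes, while the single pass costs total ptes once. The absolute numbers are artifacts of the toy setup and are not comparable to the measurements quoted in the commit message.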

e687817... by Vineeth Remanan Pillai <email address hidden>

mm: refactor swap-in logic out of shmem_getpage_gfp

BugLink: https://bugs.launchpad.net/bugs/1858618

The swap-in logic can be reused independently of the rest of the logic in
shmem_getpage_gfp, so let's refactor it out into an independent function.

Link: http://<email address hidden>
Signed-off-by: Vineeth Remanan Pillai <email address hidden>
Reviewed-by: Andrew Morton <email address hidden>
Cc: Huang Ying <email address hidden>
Cc: Hugh Dickins <email address hidden>
Cc: Kelley Nielsen <email address hidden>
Cc: Rik van Riel <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit c5bf121e4350a933bd431385e6fcb72a898ecc68)
Signed-off-by: Andrea Righi <email address hidden>
Acked-by: Kamal Mostafa <email address hidden>
Acked-by: Khalid Elmously <email address hidden>
Signed-off-by: Khalid Elmously <email address hidden>