~kamalmostafa/ubuntu/+source/linux/+git/xenial:lp1573062

Last commit made on 2016-07-14
Get this branch:
git clone -b lp1573062 https://git.launchpad.net/~kamalmostafa/ubuntu/+source/linux/+git/xenial

Recent commits

53632c8... by Kamal Mostafa

TEST KERNEL LP: #1573062.1

65cead3... by Tejun Heo

UBUNTU: SAUCE: memcg: remove lru_add_drain_all() invocation from mem_cgroup_move_charge()

BugLink: http://bugs.launchpad.net/bugs/1573062

mem_cgroup_move_charge() invokes lru_add_drain_all() so that the pvec
pages can be moved too. lru_add_drain_all() schedules and flushes
work items on system_wq which depends on being able to create new
kworkers to make forward progress. Since 1ed1328792ff ("sched,
cgroup: replace signal_struct->group_rwsem with a global
percpu_rwsem"), a new task can't be created while in the cgroup
migration path and the described lru_add_drain_all() invocation can
easily lead to a deadlock.

Charge moving is best-effort and whether the pvec pages are migrated
or not doesn't really matter. Don't call it during charge moving.
Eventually, we want to move the actual charge moving outside the
migration path.

Signed-off-by: Tejun Heo <email address hidden>
Debugged-and-tested-by: Petr Mladek <email address hidden>
Reported-by: Cyril Hrubis <email address hidden>
Reported-by: Johannes Weiner <email address hidden>
Suggested-by: Michal Hocko <email address hidden>
Acked-by: Michal Hocko <email address hidden>
Fixes: 1ed1328792ff ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem")
Cc: <email address hidden> # v4.4+
(back-ported from https://lkml.org/lkml/2016/4/21/554)
Signed-off-by: Kamal Mostafa <email address hidden>

f1942c4... by Michal Hocko <email address hidden>

oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head

BugLink: http://bugs.launchpad.net/bugs/1573062

Commit bb29902a7515 ("oom, oom_reaper: protect oom_reaper_list using
simpler way") has simplified the check for tasks already enqueued for
the oom reaper by checking tsk->oom_reaper_list != NULL. This check is
not sufficient because the tsk might be the head of the queue without
any other tasks queued, and then we would simply lock up, looping on the
same task. Fix the condition by checking for the head as well.
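The condition can be illustrated outside the kernel. What follows is a
minimal userspace sketch, not the kernel code: struct task, reaper_list,
queued_buggy(), queued_fixed() and enqueue() are hypothetical names, and
locking is omitted. It models a head-insert singly linked list whose last
entry has a NULL link, so a lone task at the head looks unqueued to a
link-only check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only the list link is modeled. */
struct task {
    struct task *next;   /* plays the role of tsk->oom_reaper_list */
};

/* Head of the pending list; NULL when the list is empty. */
static struct task *reaper_list;

/* Link-only check: misses a task that is the sole entry at the head. */
static bool queued_buggy(const struct task *tsk)
{
    return tsk->next != NULL;
}

/* Also treat the current head as already queued. */
static bool queued_fixed(const struct task *tsk)
{
    return tsk == reaper_list || tsk->next != NULL;
}

/*
 * Push tsk at the head unless the supplied predicate reports it as
 * already queued. Returns true if the task was pushed.
 */
static bool enqueue(struct task *tsk, bool (*queued)(const struct task *))
{
    assert(tsk != NULL);
    if (queued(tsk))
        return false;
    tsk->next = reaper_list;
    reaper_list = tsk;
    return true;
}
```

With queued_buggy(), enqueueing the same lone task twice makes its link
point at itself, which is the kind of cycle a consumer would loop on.
With queued_fixed(), the second enqueue is rejected.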

Fixes: bb29902a7515 ("oom, oom_reaper: protect oom_reaper_list using simpler way")
Signed-off-by: Michal Hocko <email address hidden>
Acked-by: Tetsuo Handa <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit af8e15cc85a253155fdcea707588bf6ddfc0be2e)
Signed-off-by: Kamal Mostafa <email address hidden>

c20760f... by Tetsuo Handa <email address hidden>

oom, oom_reaper: protect oom_reaper_list using simpler way

BugLink: http://bugs.launchpad.net/bugs/1573062

"oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task" tried
to protect oom_reaper_list using MMF_OOM_KILLED flag. But we can do it
by simply checking tsk->oom_reaper_list != NULL.

Signed-off-by: Tetsuo Handa <email address hidden>
Signed-off-by: Michal Hocko <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit bb29902a7515208846114b3b36a4281a9bbf766a)
Signed-off-by: Kamal Mostafa <email address hidden>

b92b2b4... by Michal Hocko <email address hidden>

oom: make oom_reaper freezable

BugLink: http://bugs.launchpad.net/bugs/1573062

After "oom: clear TIF_MEMDIE after oom_reaper managed to unmap the
address space" oom_reaper will call exit_oom_victim on the target task
after it is done. This might however race with the PM freezer:

CPU0                          CPU1                          CPU2
freeze_processes
  try_to_freeze_tasks
                              # Allocation request
                              out_of_memory
  oom_killer_disable
                                wake_oom_reaper(P1)
                                                            __oom_reap_task
                                                              exit_oom_victim(P1)
    wait_event(oom_victims==0)
[...]
                              do_exit(P1)
                                perform IO/interfere with the freezer
which breaks the oom_killer_disable semantic. We no longer have a
guarantee that the oom victim won't interfere with the freezer because
it might be anywhere on the way to do_exit while the freezer thinks the
task has already terminated. It might trigger IO or touch devices which
are frozen already.

In order to close this race, make the oom_reaper thread freezable. This
will work because
 a) an already running oom_reaper will block the freezer from entering
    the quiescent state
 b) wake_oom_reaper will not wake up the reaper after it has been
    frozen
 c) the only way to call exit_oom_victim after try_to_freeze_tasks
    is from the oom victim's context when we know the further
    interference shouldn't be possible

Signed-off-by: Michal Hocko <email address hidden>
Cc: Tetsuo Handa <email address hidden>
Cc: David Rientjes <email address hidden>
Cc: Mel Gorman <email address hidden>
Cc: Oleg Nesterov <email address hidden>
Cc: Hugh Dickins <email address hidden>
Cc: Rik van Riel <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit e26796066fdf929cbba22dabb801808f986acdb9)
Signed-off-by: Kamal Mostafa <email address hidden>

525de54... by Vladimir Davydov <email address hidden>

oom: make oom_reaper_list single linked

BugLink: http://bugs.launchpad.net/bugs/1573062

Entries are only added/removed from oom_reaper_list at head so we can
use a single linked list and hence save a word in task_struct.
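The saving can be checked with a small userspace sketch. The struct
names below are hypothetical and the layouts are far simpler than
task_struct; the point is only that dropping the prev pointer removes
one pointer-sized word per task on typical ABIs.

```c
#include <assert.h>
#include <stddef.h>

/* Doubly linked node, shaped like the kernel's struct list_head. */
struct dlist_node {
    struct dlist_node *next;
    struct dlist_node *prev;
};

/* Hypothetical task layouts that differ only in the list link. */
struct task_with_list_head {
    int pid;
    struct dlist_node reaper_link;
};

struct task_with_single_link {
    int pid;
    struct task_with_single_link *reaper_next;
};

/* Bytes saved per task by keeping only a next pointer. */
static size_t bytes_saved(void)
{
    /* Guard the unsigned subtraction against wrapping. */
    assert(sizeof(struct task_with_list_head) >=
           sizeof(struct task_with_single_link));
    return sizeof(struct task_with_list_head) -
           sizeof(struct task_with_single_link);
}
```

On common 32-bit and 64-bit ABIs bytes_saved() should equal
sizeof(void *), although padding rules are implementation-defined.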

Signed-off-by: Vladimir Davydov <email address hidden>
Signed-off-by: Michal Hocko <email address hidden>
Cc: Tetsuo Handa <email address hidden>
Cc: David Rientjes <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit 29c696e1c6eceb5db6b21f0c89495fcfcd40c0eb)
Signed-off-by: Kamal Mostafa <email address hidden>

41a5679... by Michal Hocko <email address hidden>

oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task

BugLink: http://bugs.launchpad.net/bugs/1573062

Tetsuo has reported that oom_kill_allocating_task=1 will cause
oom_reaper_list corruption because oom_kill_process doesn't follow
standard OOM exclusion (aka ignores TIF_MEMDIE) and allows the same task
to be enqueued multiple times - e.g. by sacrificing the same child
multiple times.

This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag
which is set in oom_kill_process atomically and oom reaper is disabled
if the flag was already set.
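The "set atomically, act only if it was not set before" pattern can be
sketched in userspace as follows. This is an illustration under stated
assumptions, not the patch itself: struct mm, MM_FLAG_OOM_KILLED and
mark_oom_killed() are hypothetical names, and C11 atomic_fetch_or stands
in for the kernel's own atomic bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MM_FLAG_OOM_KILLED (1UL << 0)   /* hypothetical bit position */

/* Hypothetical stand-in for mm_struct; only the flags word is modeled. */
struct mm {
    atomic_ulong flags;
};

/*
 * Atomically set the flag and report whether this caller was the first
 * to set it. Only the first caller should go on to queue the task.
 */
static bool mark_oom_killed(struct mm *mm)
{
    unsigned long old;

    assert(mm != NULL);
    old = atomic_fetch_or(&mm->flags, MM_FLAG_OOM_KILLED);
    return !(old & MM_FLAG_OOM_KILLED);
}
```

Because the read and the write happen in one atomic step, two racing
callers cannot both observe the flag as clear.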

Signed-off-by: Michal Hocko <email address hidden>
Reported-by: Tetsuo Handa <email address hidden>
Cc: David Rientjes <email address hidden>
Cc: Mel Gorman <email address hidden>
Cc: Oleg Nesterov <email address hidden>
Cc: Hugh Dickins <email address hidden>
Cc: Rik van Riel <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit 855b018325737f7691f9b7d86339df40aa4e47c3)
Signed-off-by: Kamal Mostafa <email address hidden>

9ac5d80... by Michal Hocko <email address hidden>

mm, oom_reaper: implement OOM victims queuing

BugLink: http://bugs.launchpad.net/bugs/1573062

wake_oom_reaper has allowed only 1 oom victim to be queued. The main
reason for that was the simplicity as other solutions would require some
way of queuing. The current approach is racy and that was deemed
sufficient as the oom_reaper is considered a best effort approach to
help with oom handling when the OOM victim cannot terminate in a
reasonable time. The race could lead to missing an oom victim which can
get stuck

out_of_memory
  wake_oom_reaper
    cmpxchg // OK
                        oom_reaper
                          oom_reap_task
                            __oom_reap_task
oom_victim terminates
                              atomic_inc_not_zero // fail
out_of_memory
  wake_oom_reaper
    cmpxchg // fails
                          task_to_reap = NULL
This race requires 2 OOM invocations in a short time period which is not
very likely but certainly not impossible. E.g. the original victim
might not have released a lot of memory for some reason.

The situation would improve considerably if wake_oom_reaper used a more
robust queuing. This is what this patch implements. This means adding
oom_reaper_list list_head into task_struct (eat a hole before embedded
thread_struct for that purpose) and an oom_reaper_lock spinlock for
queuing synchronization. wake_oom_reaper will then add the task on the
queue and oom_reaper will dequeue it.
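The difference between the single-slot scheme and a queue can be
sketched in userspace. This is a single-threaded illustration under
stated assumptions: struct task, slot_post(), queue_post() and
queue_take() are hypothetical names, C11 compare-and-swap stands in for
cmpxchg, and the spinlock that guards the real queue is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct. */
struct task {
    int pid;
    struct task *next;
};

/* Old scheme: one slot, claimed with compare-and-swap. */
static _Atomic(struct task *) task_to_reap;

/* Returns false when the slot is occupied, so the victim is dropped. */
static bool slot_post(struct task *tsk)
{
    struct task *expected = NULL;

    assert(tsk != NULL);
    return atomic_compare_exchange_strong(&task_to_reap, &expected, tsk);
}

/* New scheme: a list, so a second victim is never turned away. */
static struct task *queue_head;

/* Add at the head. A real implementation would hold a lock here. */
static void queue_post(struct task *tsk)
{
    assert(tsk != NULL);
    tsk->next = queue_head;
    queue_head = tsk;
}

/* Remove and return the head entry, or NULL when the queue is empty. */
static struct task *queue_take(void)
{
    struct task *tsk = queue_head;

    if (tsk != NULL) {
        queue_head = tsk->next;
        tsk->next = NULL;
    }
    return tsk;
}
```

Posting two victims back to back succeeds only once with slot_post(),
while queue_post() accepts both and queue_take() later returns each one.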

Signed-off-by: Michal Hocko <email address hidden>
Cc: Vladimir Davydov <email address hidden>
Cc: Andrea Arcangeli <email address hidden>
Cc: David Rientjes <email address hidden>
Cc: Hugh Dickins <email address hidden>
Cc: Johannes Weiner <email address hidden>
Cc: Mel Gorman <email address hidden>
Cc: Oleg Nesterov <email address hidden>
Cc: Rik van Riel <email address hidden>
Cc: Tetsuo Handa <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit 03049269de433cb5fe2859be9ae4469ceb1163ed)
Signed-off-by: Kamal Mostafa <email address hidden>

0eca01c... by Michal Hocko <email address hidden>

mm, oom_reaper: report success/failure

BugLink: http://bugs.launchpad.net/bugs/1573062

Inform about the successful/failed oom_reaper attempts and dump all the
held locks to tell us more about who is blocking progress.

[<email address hidden>: fix CONFIG_MMU=n build]
Signed-off-by: Michal Hocko <email address hidden>
Cc: Andrea Arcangeli <email address hidden>
Cc: David Rientjes <email address hidden>
Cc: Hugh Dickins <email address hidden>
Cc: Johannes Weiner <email address hidden>
Cc: Mel Gorman <email address hidden>
Cc: Oleg Nesterov <email address hidden>
Cc: Rik van Riel <email address hidden>
Cc: Tetsuo Handa <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(back-ported from commit bc448e897b6d24aae32701763b8a1fe15d29fa26)
Signed-off-by: Kamal Mostafa <email address hidden>

64f4451... by Michal Hocko <email address hidden>

oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space

BugLink: http://bugs.launchpad.net/bugs/1573062

When oom_reaper manages to unmap all the eligible vmas there shouldn't
be much of the freeable memory held by the oom victim left anymore so it
makes sense to clear the TIF_MEMDIE flag for the victim and allow the
OOM killer to select another task.

The lack of TIF_MEMDIE also means that the victim cannot access memory
reserves anymore but that shouldn't be a problem because it would get
the access again if it needs to allocate and hits the OOM killer again
due to the fatal_signal_pending resp. PF_EXITING check. We can safely
hide the task from the OOM killer because it is clearly not a good
candidate anymore as everything reclaimable has been torn down already.

This patch makes it possible to cap the time an OOM victim can keep
TIF_MEMDIE and thus hold off further global OOM killer actions, provided
the oom reaper is able to take mmap_sem for the associated mm struct. This is
not guaranteed now but further steps should make sure that mmap_sem for
write should be blocked killable which will help to reduce such a lock
contention. This is not done by this patch.

Note that exit_oom_victim might be called on a remote task from
__oom_reap_task now so we have to check and clear the flag atomically
otherwise we might race and underflow oom_victims or wake up waiters too
early.
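Why the check and the clear must be one atomic step can be sketched in
userspace. The names below (struct task, FLAG_MEMDIE, victim_mark(),
victim_exit(), victim_count) are hypothetical, and C11 atomics stand in
for the kernel primitives. The counter is decremented only by the caller
that actually cleared the flag, so a second caller cannot underflow it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_MEMDIE (1UL << 0)   /* hypothetical bit position */

/* Hypothetical stand-in for the per-task flags word. */
struct task {
    atomic_ulong flags;
};

/* Number of tasks currently marked as victims. */
static atomic_int victim_count;

/* Mark a task as a victim; count it only on the first marking. */
static void victim_mark(struct task *t)
{
    assert(t != NULL);
    if (!(atomic_fetch_or(&t->flags, FLAG_MEMDIE) & FLAG_MEMDIE))
        atomic_fetch_add(&victim_count, 1);
}

/*
 * Atomically test and clear the flag. Returns true, and drops the
 * count, only for the one caller that found the flag still set.
 */
static bool victim_exit(struct task *t)
{
    unsigned long old;

    assert(t != NULL);
    old = atomic_fetch_and(&t->flags, ~FLAG_MEMDIE);
    if (!(old & FLAG_MEMDIE))
        return false;
    atomic_fetch_sub(&victim_count, 1);
    return true;
}
```

If the test and the clear were separate steps, the task itself and a
remote caller could both see the flag set and both decrement the count.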

Signed-off-by: Michal Hocko <email address hidden>
Suggested-by: Johannes Weiner <email address hidden>
Suggested-by: Tetsuo Handa <email address hidden>
Cc: Andrea Arcangeli <email address hidden>
Cc: David Rientjes <email address hidden>
Cc: Hugh Dickins <email address hidden>
Cc: Mel Gorman <email address hidden>
Cc: Oleg Nesterov <email address hidden>
Cc: Rik van Riel <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit 36324a990cf578b57828c04cd85ac62cd25cf5a4)
Signed-off-by: Kamal Mostafa <email address hidden>