~vicamo/+git/ubuntu-kernel:bug-1857511/power-off-usb-port-at-shutdown/mainline

Last commit made on 2020-03-30
Get this branch:
git clone -b bug-1857511/power-off-usb-port-at-shutdown/mainline https://git.launchpad.net/~vicamo/+git/ubuntu-kernel
Only You-Sheng Yang can upload to this branch. If you are You-Sheng Yang please log in for upload directions.

Branch merges

Branch information

Name:
bug-1857511/power-off-usb-port-at-shutdown/mainline
Repository:
lp:~vicamo/+git/ubuntu-kernel

Recent commits

b3ecea5... by Kai-Heng Feng

Bug 1857511/1868536: try port power cycle in usb_port_shutdown

7111951... by Linus Torvalds <email address hidden>

Linux 5.6

570203e... by Linus Torvalds <email address hidden>

Merge branch 'akpm' (patches from Andrew)

Merge vm fixes from Andrew Morton:
 "5 fixes"

* emailed patches from Andrew Morton <email address hidden>:
  mm/sparse: fix kernel crash with pfn_section_valid check
  mm: fork: fix kernel_stack memcg stats for various stack implementations
  hugetlb_cgroup: fix illegal access to memory
  drivers/base/memory.c: indicate all memory blocks as removable
  mm/swapfile.c: move inode_lock out of claim_swapfile

ab93e98... by Linus Torvalds <email address hidden>

Merge tag 'timers-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix for the Hyper-V clocksource driver to make sched clock
  actually return nanoseconds and not the virtual clock value which
  increments at 10e7 HZ (100ns)"

* tag 'timers-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/hyper-v: Make sched clock return nanoseconds correctly

01af08b... by Linus Torvalds <email address hidden>

Merge tag 'irq-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single bugfix to prevent reference leaks in irq affinity notifiers"

* tag 'irq-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Fix reference leaks on irq affinity notifiers

b943f04... by "Aneesh Kumar K.V" <email address hidden>

mm/sparse: fix kernel crash with pfn_section_valid check

Fix the crash like this:

    BUG: Kernel NULL pointer dereference on read at 0x00000000
    Faulting instruction address: 0xc000000000c3447c
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
    CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
    ...
    NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
    LR [c000000000088354] vmemmap_free+0x144/0x320
    Call Trace:
       section_deactivate+0x220/0x240
       __remove_pages+0x118/0x170
       arch_remove_memory+0x3c/0x150
       memunmap_pages+0x1cc/0x2f0
       devm_action_release+0x30/0x50
       release_nodes+0x2f8/0x3e0
       device_release_driver_internal+0x168/0x270
       unbind_store+0x130/0x170
       drv_attr_store+0x44/0x60
       sysfs_kf_write+0x68/0x80
       kernfs_fop_write+0x100/0x290
       __vfs_write+0x3c/0x70
       vfs_write+0xcc/0x240
       ksys_write+0x7c/0x140
       system_call+0x5c/0x68

The crash is due to NULL dereference at

 test_bit(idx, ms->usage->subsection_map);

due to ms->usage = NULL in pfn_section_valid()

With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in
SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
depopulate_section_mem(). This was done so that pfn_page() can work
correctly with kernel config that disables SPARSEMEM_VMEMMAP. With that
config pfn_to_page does

 __section_mem_map_addr(__sec) + __pfn;

where

  static inline struct page *__section_mem_map_addr(struct mem_section *section)
  {
 unsigned long map = section->section_mem_map;
 map &= SECTION_MAP_MASK;
 return (struct page *)map;
  }

Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is
used to check the pfn validity (pfn_valid()). Since section_deactivate
release mem_section->usage if a section is fully deactivated,
pfn_valid() check after a subsection_deactivate cause a kernel crash.

  static inline int pfn_valid(unsigned long pfn)
  {
  ...
 return early_section(ms) || pfn_section_valid(ms, pfn);
  }

where

  static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
  {
 int idx = subsection_map_index(pfn);

 return test_bit(idx, ms->usage->subsection_map);
  }

Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is
freed. For architectures like ppc64 where large pages are used for
vmmemap mapping (16MB), a specific vmemmap mapping can cover multiple
sections. Hence before a vmemmap mapping page can be freed, the kernel
needs to make sure there are no valid sections within that mapping.
Clearing the section valid bit before depopulate_section_memap enables
this.

[<email address hidden>: add comment]
  Link: http://<email address hidden>: http://<email address hidden>
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Reported-by: Sachin Sant <email address hidden>
Signed-off-by: Aneesh Kumar K.V <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Tested-by: Sachin Sant <email address hidden>
Reviewed-by: Baoquan He <email address hidden>
Reviewed-by: Wei Yang <email address hidden>
Acked-by: Michal Hocko <email address hidden>
Acked-by: Pankaj Gupta <email address hidden>
Cc: Michael Ellerman <email address hidden>
Cc: Dan Williams <email address hidden>
Cc: David Hildenbrand <email address hidden>
Cc: Oscar Salvador <email address hidden>
Cc: Mike Rapoport <email address hidden>
Cc: <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>

8380ce4... by Roman Gushchin <email address hidden>

mm: fork: fix kernel_stack memcg stats for various stack implementations

Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the
space for task stacks can be allocated using __vmalloc_node_range(),
alloc_pages_node() and kmem_cache_alloc_node().

In the first and the second cases page->mem_cgroup pointer is set, but
in the third it's not: memcg membership of a slab page should be
determined using the memcg_from_slab_page() function, which looks at
page->slab_cache->memcg_params.memcg . In this case, using
mod_memcg_page_state() (as in account_kernel_stack()) is incorrect:
page->mem_cgroup pointer is NULL even for pages charged to a non-root
memory cgroup.

It can lead to kernel_stack per-memcg counters permanently showing 0 on
some architectures (depending on the configuration).

In order to fix it, let's introduce a mod_memcg_obj_state() helper,
which takes a pointer to a kernel object as a first argument, uses
mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls
mod_memcg_state(). It allows to handle all possible configurations
(CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without
spilling any memcg/kmem specifics into fork.c .

Note: This is a special version of the patch created for stable
backports. It contains code from the following two patches:
  - mm: memcg/slab: introduce mem_cgroup_from_obj()
  - mm: fork: fix kernel_stack memcg stats for various stack implementations

[<email address hidden>: introduce mem_cgroup_from_obj()]
  Link: http://<email address hidden>
Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Reviewed-by: Shakeel Butt <email address hidden>
Acked-by: Johannes Weiner <email address hidden>
Cc: Michal Hocko <email address hidden>
Cc: Bharata B Rao <email address hidden>
Cc: Shakeel Butt <email address hidden>
Cc: <email address hidden>
Link: http://<email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>

726b7bb... by Mina Almasry

hugetlb_cgroup: fix illegal access to memory

This appears to be a mistake in commit faced7e0806cf ("mm: hugetlb
controller for cgroups v2").

Essentially that commit does a hugetlb_cgroup_from_counter assuming that
page_counter_try_charge has initialized counter.

But if that has failed then it seems will not initialize counter, so
hugetlb_cgroup_from_counter(counter) ends up pointing to random memory,
causing kasan to complain.

The solution is to simply use 'h_cg', instead of
hugetlb_cgroup_from_counter(counter), since that is a reference to the
hugetlb_cgroup anyway. After this change kasan ceases to complain.

Fixes: faced7e0806cf ("mm: hugetlb controller for cgroups v2")
Reported-by: <email address hidden>
Signed-off-by: Mina Almasry <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Acked-by: Giuseppe Scrivano <email address hidden>
Acked-by: Tejun Heo <email address hidden>
Cc: Mike Kravetz <email address hidden>
Cc: David Rientjes <email address hidden>
Link: http://<email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>

53cdc1c... by David Hildenbrand

drivers/base/memory.c: indicate all memory blocks as removable

We see multiple issues with the implementation/interface to compute
whether a memory block can be offlined (exposed via
/sys/devices/system/memory/memoryX/removable) and would like to simplify
it (remove the implementation).

1. It runs basically lockless. While this might be good for performance,
   we see possible races with memory offlining that will require at
   least some sort of locking to fix.

2. Nowadays, more false positives are possible. No arch-specific checks
   are performed that validate if memory offlining will not be denied
   right away (and such check will require locking). For example, arm64
   won't allow to offline any memory block that was added during boot -
   which will imply a very high error rate. Other archs have other
   constraints.

3. The interface is inherently racy. E.g., if a memory block is detected
   to be removable (and was not a false positive at that time), there is
   still no guarantee that offlining will actually succeed. So any
   caller already has to deal with false positives.

4. It is unclear which performance benefit this interface actually
   provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add
   sysfs removable attribute for hotplug memory remove") mentioned

 "A user-level agent must be able to identify which sections
  of memory are likely to be removable before attempting the
  potentially expensive operation."

   However, no actual performance comparison was included.

Known users:

 - lsmem: Will group memory blocks based on the "removable" property. [1]

 - chmem: Indirect user. It has a RANGE mode where one can specify
          removable ranges identified via lsmem to be offlined. However,
          it also has a "SIZE" mode, which allows a sysadmin to skip the
          manual "identify removable blocks" step. [2]

 - powerpc-utils: Uses the "removable" attribute to skip some memory
          blocks right away when trying to find some to offline+remove.
          However, with ballooning enabled, it already skips this
          information completely (because it once resulted in many false
          negatives). Therefore, the implementation can deal with false
          positives properly already. [3]

According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
driven from userspace via the drmgr command (powerpc-utils). Nowadays
it's managed in the kernel - including onlining/offlining of memory
blocks - triggered by drmgr writing to /sys/kernel/dlpar. So the
affected legacy userspace handling is only active on old kernels. Only
very old versions of drmgr on a new kernel (unlikely) might execute
slower - totally acceptable.

With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
break any user space tool. We implement a very bad heuristic now.
Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report
"not removable" as before.

Original discussion can be found in [4] ("[PATCH RFC v1] mm:
is_mem_section_removable() overhaul").

Other users of is_mem_section_removable() will be removed next, so that
we can remove is_mem_section_removable() completely.

[1] http://man7.org/linux/man-pages/man1/lsmem.1.html
[2] http://man7.org/linux/man-pages/man8/chmem.8.html
[3] https://github.com/ibm-power-utilities/powerpc-utils
[4] https://<email address hidden>

Also, this patch probably fixes a crash reported by Steve.
http://lkml.<email address hidden>

Reported-by: "Scargall, Steve" <email address hidden>
Suggested-by: Michal Hocko <email address hidden>
Signed-off-by: David Hildenbrand <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Reviewed-by: Nathan Fontenot <email address hidden>
Acked-by: Michal Hocko <email address hidden>
Cc: Dan Williams <email address hidden>
Cc: Greg Kroah-Hartman <email address hidden>
Cc: "Rafael J. Wysocki" <email address hidden>
Cc: Badari Pulavarty <email address hidden>
Cc: Robert Jennings <email address hidden>
Cc: Heiko Carstens <email address hidden>
Cc: Karel Zak <email address hidden>
Cc: <email address hidden>
Link: http://<email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>

d795a90... by Naohiro Aota <email address hidden>

mm/swapfile.c: move inode_lock out of claim_swapfile

claim_swapfile() currently keeps the inode locked when it is successful,
or the file is already swapfile (with -EBUSY). And, on the other error
cases, it does not lock the inode.

This inconsistency of the lock state and return value is quite confusing
and actually causing a bad unlock balance as below in the "bad_swap"
section of __do_sys_swapon().

This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
check out of claim_swapfile(). The inode is unlocked in
"bad_swap_unlock_inode" section, so that the inode is ensured to be
unlocked at "bad_swap". Thus, error handling codes after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".

    =====================================
    WARNING: bad unlock balance detected!
    5.5.0-rc7+ #176 Not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
    Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
    Call Trace:
     dump_stack+0xa1/0xea
     print_unlock_imbalance_bug.cold+0x114/0x123
     lock_release+0x562/0xed0
     up_write+0x2d/0x490
     __do_sys_swapon+0x94b/0x3550
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f15da0a0dc7

Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Tested-by: Qais Youef <email address hidden>
Reviewed-by: Andrew Morton <email address hidden>
Reviewed-by: Darrick J. Wong <email address hidden>
Cc: Christoph Hellwig <email address hidden>
Cc: <email address hidden>
Link: http://<email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>