Whenever thresholds are changed the hash tables are rebuilt. This is
done by enumerating all policies and hashing and inserting them into
the right table according to the thresholds and direction.
Because socket policies are also contained in net->xfrm.policy_all but
no hash tables are defined for their direction (dir + XFRM_POLICY_MAX)
this causes a NULL or invalid pointer dereference after returning from
policy_hash_bysel() if the rebuild is done while any socket policies
are installed.
Since the rebuild after changing thresholds is scheduled this crash
could even occur if the userland sets thresholds seemingly before
installing any socket policies.
Fixes: 53c2e285f970 ("xfrm: Do not hash socket policies")
Signed-off-by: Tobias Brunner <email address hidden>
Acked-by: Herbert Xu <email address hidden>
Signed-off-by: Steffen Klassert <email address hidden>
(cherry picked from linux-next commit 6916fb3b10b3cbe3b1f9f5b680675f53e4e299eb)
Signed-off-by: Joseph Salisbury <email address hidden>
Acked-by: Tim Gardner <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
queue_io() so that inodes which have only dirty timestamps are properly
written on fsync(2) and sync(2). However there are other call sites -
most notably going through write_inode_now() - which expect inode to be
clean after WB_SYNC_ALL writeback. This is not currently true as we do
not clear I_DIRTY_TIME in __writeback_single_inode() even for
WB_SYNC_ALL writeback in all the cases. This then resulted in the
following oops because bdev_write_inode() did not clean the inode and
writeback code later stumbled over a dirty inode with detached wb.
Switch the setting of psl_fir_cntl from debug to production
environment recommended value. It mostly affects the PSL behavior when
an error is raised in psl_fir1/2.
Tested with cxlflash.
Signed-off-by: Frederic Barrat <email address hidden>
Reviewed-by: Uma Krishnan <email address hidden>
Signed-off-by: Michael Ellerman <email address hidden>
(back ported from linux-next commit c6d2ee09c2fffd3efdd31be2b2811d081a45bb99)
Signed-off-by: Tim Gardner <email address hidden>
Conflicts:
drivers/misc/cxl/pci.c
Acked-by: Christopher Arges <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
3442aaa...
by
Greg Kroah-Hartman <email address hidden>
Since commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure
after many small jobs") swap entries do not pin memcg->css.refcnt
directly. Instead, they pin memcg->id.ref. So we should adjust the
reference counters accordingly when moving swap charges between cgroups.
Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Link: http://lkml.kernel.org/r/<email address hidden>
Signed-off-by: Vladimir Davydov <email address hidden>
Acked-by: Michal Hocko <email address hidden>
Acked-by: Johannes Weiner <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
Signed-off-by: Michal Hocko <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
50276bc...
by
Vladimir Davydov <email address hidden>
mm: memcontrol: fix swap counter leak on swapout from offline cgroup
An offline memory cgroup might have anonymous memory or shmem left
charged to it and no swap. Since only swap entries pin the id of an
offline cgroup, such a cgroup will have no id and so an attempt to
swapout its anon/shmem will not store memory cgroup info in the swap
cgroup map. As a result, memcg->swap or memcg->memsw will never get
uncharged from it and any of its ascendants.
Fix this by always charging swapout to the first ancestor cgroup that
hasn't released its id yet.
[<email address hidden>: add comment to mem_cgroup_swapout]
[<email address hidden>: use WARN_ON_ONCE() in mem_cgroup_id_get_online()]
Link: http://lkml.kernel.org/r/20160803123445.GJ13263@esperanza
Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Link: http://lkml.kernel.org/r/<email address hidden>
Signed-off-by: Vladimir Davydov <email address hidden>
Acked-by: Johannes Weiner <email address hidden>
Acked-by: Michal Hocko <email address hidden>
Cc: <email address hidden> [3.19+]
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
Signed-off-by: Michal Hocko <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
45c2b48...
by
Johannes Weiner <email address hidden>
mm: memcontrol: fix cgroup creation failure after many small jobs
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears. At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild. Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs. Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.
Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.
Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later. They pose no hurdle.
Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages. And those
references are under the user's control, so they are manageable.
This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that. This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:
set -e
mkdir -p pages
for x in `seq 128000`; do
[ $((x % 1000)) -eq 0 ] && echo $x
mkdir /cgroup/foo
echo $$ >/cgroup/foo/cgroup.procs
echo trex >pages/$x
echo $$ >/cgroup/cgroup.procs
rmdir /cgroup/foo
done
When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:
[root@ham ~]# ./cssidstress.sh
[...]
65000
mkdir: cannot create directory '/cgroup/foo': No space left on device
After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.
[<email address hidden>: init the IDR]
Link: http://<email address hidden>
Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://<email address hidden>
Signed-off-by: Johannes Weiner <email address hidden>
Reported-by: John Garcia <email address hidden>
Reviewed-by: Vladimir Davydov <email address hidden>
Acked-by: Tejun Heo <email address hidden>
Cc: Nikolay Borisov <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
Signed-off-by: Michal Hocko <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
If we hit this error when mounted with errors=continue or
errors=remount-ro:
EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
continue. However, ext4_mb_release_context() is the wrong thing to call
here since we are still actually using the allocation context.
Instead, just error out. We could retry the allocation, but there is a
possibility of getting stuck in an infinite loop instead, so this seems
safer.
[ Fixed up so we don't return EAGAIN to userspace. --tytso ]
Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation")
Signed-off-by: Vegard Nossum <email address hidden>
Signed-off-by: Theodore Ts'o <email address hidden>
Cc: Aneesh Kumar K.V <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.
EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
Aborting journal on device loop0-8.
EXT4-fs (loop0): Remounting filesystem read-only
EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
[...]
See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list")
Cc: Jan Kara <email address hidden>
Signed-off-by: Vegard Nossum <email address hidden>
Signed-off-by: Theodore Ts'o <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
Signed-off-by: Tim Gardner <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>