As pointed out by Michael Ellerman, the ptrace ABI on powerpc does not
allow or require the return code to be set on syscall entry when
skipping the syscall. It will always return ENOSYS and the return code
must be set on syscall exit.
This code does that, behaving more similarly to strace. It still sets
the return code on entry, which is overridden on powerpc, and it will
always repeat the same on exit. Also, on powerpc, the errno is not
inverted, and depends on ccr.so being set.
This has been tested on powerpc and amd64.
https://lore<email address hidden>/
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
Acked-by: Andrea Righi <email address hidden>
Acked-by: Juerg Haefliger <email address hidden>
Signed-off-by: Kleber Sacilotto de Souza <email address hidden>
A customer with large SMP systems (up to 16 sockets) with application
that uses large amount of static hugepages (~500-1500GB) are
experiencing random multisecond delays. These delays were caused by the
long time it took to scan the VMA interval tree with mmap_sem held.
The sharing of huge PMD does not require changes to the i_mmap at all.
Therefore, we can just take the read lock and let other threads
searching for the right VMA share it in parallel. Once the right VMA is
found, either the PMD lock (2M huge page for x86-64) or the
mm->page_table_lock will be acquired to perform the actual PMD sharing.
Lock contention, if present, will happen in the spinlock. That is much
better than contention in the rwsem where the time needed to scan the
the interval tree is indeterminate.
With this patch applied, the customer is seeing significant performance
improvement over the unpatched kernel.
Link: http://<email address hidden>
Signed-off-by: Waiman Long <email address hidden>
Suggested-by: Mike Kravetz <email address hidden>
Reviewed-by: Mike Kravetz <email address hidden>
Cc: Davidlohr Bueso <email address hidden>
Cc: Peter Zijlstra <email address hidden>
Cc: Ingo Molnar <email address hidden>
Cc: Will Deacon <email address hidden>
Cc: Matthew Wilcox <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(cherry picked from commit 930668c34408ba983049322e04f13f03b6f1fafa)
Signed-off-by: Gavin Guo <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: Juerg Haefliger <email address hidden>
Signed-off-by: Kleber Sacilotto de Souza <email address hidden>
When using the PtrAuth feature in a guest, we need to save the host's
keys before allowing the guest to program them. For that, we dump
them in a per-CPU data structure (the so called host context).
But both call sites that do this are in preemptible context,
which may end up in disaster should the vcpu thread get preempted
before reentering the guest.
Instead, save the keys eagerly on each vcpu_load(). This has an
increased overhead, but is at least safe.
Since the switch of floppy driver to blk-mq, the contended (fdc_busy) case
in floppy_queue_rq() is not handled correctly.
In case we reach floppy_queue_rq() with fdc_busy set (i.e. with the floppy
locked due to another request still being in-flight), we put the request
on the list of requests and return BLK_STS_OK to the block core, without
actually scheduling delayed work / doing further processing of the
request. This means that processing of this request is postponed until
another request comes and passess uncontended.
Which in some cases might actually never happen and we keep waiting
indefinitely. The simple testcase is
for i in `seq 1 2000`; do echo -en $i '\r'; blkid --info /dev/fd0 2> /dev/null; done
run in quemu. That reliably causes blkid eventually indefinitely hanging
in __floppy_read_block_0() waiting for completion, as the BIO callback
never happens, and no further IO is ever submitted on the (non-existent)
floppy device. This was observed reliably on qemu-emulated device.
Fix that by not queuing the request in the contended case, and return
BLK_STS_RESOURCE instead, so that blk core handles the request
rescheduling and let it pass properly non-contended later.