6200eb6...
by
Linus Torvalds <email address hidden>
mm: delay rmap removal until after TLB flush
When we remove a page table entry, we are very careful to only free the
page after we have flushed the TLB, because other CPUs could still be
using the page through stale TLB entries until after the flush.
However, we have removed the rmap entry for that page early, which means
that functions like folio_mkclean() would end up not serializing with
the page table lock because the page had already been made invisible to
rmap.
And that is a problem, because while the TLB entry exists, we could end
up with the following situation:
(a) one CPU could come in and clean it, never seeing our mapping of
the page
(b) another CPU could continue to use the stale and dirty TLB entry
and continue to write to said page
resulting in a page that has been dirtied, but then marked clean again,
all while another CPU might have dirtied it some more.
End result: possibly lost dirty data.
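To make the interleaving above concrete, here is a hypothetical single-threaded replay of it. This is a userspace sketch, not kernel code. The struct and function names are invented for illustration. It assumes the behaviour the message describes: a cleaner that can still see the mapping through rmap serializes on the page table lock, and the zapping CPU releases that lock only after flushing the TLB.

```c
#include <stdbool.h>

/* Hypothetical replay of the race; names are illustrative, not kernel APIs. */
struct replay_state {
	bool rmap_visible;  /* can an rmap walker (e.g. folio_mkclean()) see the mapping? */
	bool stale_tlb;     /* does CPU B still hold a writable, dirty TLB entry? */
	bool page_dirty;    /* page-level dirty flag */
};

/* Returns true if CPU B's writes end up in a page that is marked clean. */
static bool replay_zap_race(bool delay_rmap)
{
	struct replay_state s = { .rmap_visible = true, .stale_tlb = true,
	                          .page_dirty = false };

	/* The zapping CPU clears the dirty PTE and moves its dirty bit to the page. */
	s.page_dirty = true;
	if (!delay_rmap)
		s.rmap_visible = false;  /* old behaviour: rmap removed before the flush */

	/* (a) Another CPU cleans the page. If the mapping is still visible via
	 * rmap, the cleaner serializes on the page table lock, which (by the
	 * assumption above) is released only after the TLB flush. */
	if (s.rmap_visible)
		s.stale_tlb = false;
	s.page_dirty = false;        /* written back and marked clean */

	/* (b) CPU B writes through its stale TLB entry, if it still has one.
	 * That entry is already dirty, so nothing marks the page dirty again. */
	bool late_write = s.stale_tlb;

	/* The zapping CPU finally flushes the TLB. */
	s.stale_tlb = false;

	return late_write && !s.page_dirty;
}
```

With `delay_rmap == false` the replay reports lost data, and with `delay_rmap == true` it does not. This matches the "End result" line above, under the stated locking assumption.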
This commit uses the same old TLB gather array that we use to delay the
freeing of the page to also say 'remove from rmap after flush', so that
we can keep the rmap entries alive until all TLB entries have been
flushed.
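The idea of reusing the gather array can be sketched as follows. This is a userspace model in which each batched page carries a "remove from rmap after flush" flag. All names (`gather`, `gather_add`, `gather_finish`, `model_page`) are hypothetical. The per-entry `bool` is an assumption chosen for clarity, and the sketch makes no claim about how the actual commit encodes the flag inside the kernel's mmu_gather batch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a TLB-gather batch; not the kernel's mmu_gather. */
struct model_page {
	int mapcount;   /* rmap-visible mappings */
	int refcount;   /* references keeping the page allocated */
};

struct gather_entry {
	struct model_page *page;
	bool delay_rmap;        /* "remove from rmap after flush" */
};

#define GATHER_MAX 8

struct gather {
	struct gather_entry entries[GATHER_MAX];
	size_t nr;
	unsigned int flushes;   /* number of (simulated) TLB flushes issued */
};

/* Queue a page whose PTE was just cleared; the page table's reference now
 * belongs to the batch. Returns false if the batch is full. */
static bool gather_add(struct gather *g, struct model_page *page, bool delay_rmap)
{
	if (g->nr == GATHER_MAX)
		return false;
	g->entries[g->nr++] = (struct gather_entry){ page, delay_rmap };
	return true;
}

/* Flush first; only then drop delayed rmap entries and the batch's references. */
static void gather_finish(struct gather *g)
{
	g->flushes++;                           /* stand-in for the TLB flush */
	for (size_t i = 0; i < g->nr; i++) {
		struct gather_entry *e = &g->entries[i];
		if (e->delay_rmap)
			e->page->mapcount--;    /* rmap removal happens after the flush */
		e->page->refcount--;            /* page may be freed once this reaches 0 */
	}
	g->nr = 0;
}
```

Between `gather_add()` and `gather_finish()` the page's mapcount is unchanged, so an rmap walker in this model still finds the mapping while stale TLB entries may exist.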
NOTE! While the "possibly lost dirty data" sounds catastrophic, for this
all to happen you need to have a user thread doing either madvise() with
MADV_DONTNEED or a full re-mmap() of the area concurrently with another
thread continuing to use said mapping.
So arguably this is about user space doing crazy things, but from a VM
consistency standpoint it's better if we track the dirty bit properly
even when user space goes off the rails.
Reported-by: Nadav Amit <email address hidden>
Link: https://<email address hidden>/
Cc: Will Deacon <email address hidden>
Cc: Aneesh Kumar <email address hidden>
Cc: Andrew Morton <email address hidden>
Cc: Nick Piggin <email address hidden>
Cc: Peter Zijlstra <email address hidden>
Cc: Heiko Carstens <email address hidden>
Cc: Vasily Gorbik <email address hidden>
Cc: Alexander Gordeev <email address hidden>
Cc: Christian Borntraeger <email address hidden>
Cc: Sven Schnelle <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
d12cec6...
by
Linus Torvalds <email address hidden>
mm: re-unify the simplified page_zap_*_rmap() function
Now that we've simplified both the anonymous and file-backed page zap
functions, they end up being identical except for which page statistic
they update, and we can re-unify the implementation of that much
simplified code.
To make it very clear that this is only for the final pte zapping (since
a lot of the simplifications depended on that), name the unified
function 'page_zap_pte_rmap()'.
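A rough picture of what "identical except for which page statistic they update" means: a single final-PTE helper that picks the counter based on whether the page is anonymous. This is a hypothetical userspace model. `model_zap_pte_rmap`, `model_page`, and the stat enum are invented names, and the statistics are plain globals rather than the kernel's per-memcg counters. It mimics the kernel's `_mapcount` convention, where -1 means "not mapped", and reproduces the semantics of `atomic_add_negative()` with C11 atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not kernel APIs. */
enum model_stat { MODEL_ANON_MAPPED, MODEL_FILE_MAPPED, MODEL_NR_STATS };

struct model_page {
	atomic_int mapcount;  /* like _mapcount: -1 means "not mapped" */
	bool anon;
};

static long model_stats[MODEL_NR_STATS];

/* Same semantics as atomic_add_negative(): add i to *v and report
 * whether the new value is negative. */
static bool add_negative(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i < 0;
}

/* Final-PTE zap only, so no compound or huge-page handling is needed.
 * The anon and file cases differ only in which counter is decremented. */
static void model_zap_pte_rmap(struct model_page *page)
{
	if (!add_negative(-1, &page->mapcount))
		return;               /* other mappings remain */
	model_stats[page->anon ? MODEL_ANON_MAPPED : MODEL_FILE_MAPPED]--;
}
```

The counter is decremented only when the last mapping goes away, which is the point at which the page stops counting as "mapped".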
Link: https://<email address hidden>/
Cc: Nadav Amit <email address hidden>
Cc: Peter Zijlstra <email address hidden>
Cc: John Hubbard <email address hidden>
Cc: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
4305e9c...
by
Linus Torvalds <email address hidden>
mm: inline simpler case of page_remove_file_rmap()
Now that we have a simplified special case of 'page_remove_rmap()' that
doesn't deal with the 'compound' case and always gets a file-mapped (ie
not anonymous) page, it ended up doing just
	lock_page_memcg(page);
	page_remove_file_rmap(page, false);
	unlock_page_memcg(page);
but 'page_remove_file_rmap()' is actually trivial when 'compound' is false.
So just inline that non-compound case in the caller, and - like we did
in the previous commit for the anon pages - only do the memcg locking for
the parts that actually matter: the page statistics.
Also, as the previous commit did for anonymous pages, knowing we only
get called for the last-level page table entries allows for a further
simplification: we can get rid of the 'PageHuge(page)' case too.
You can't map a huge-page in a pte without splitting it (and the full
code in the generic page_remove_file_rmap() function has a comment to
that effect: "hugetlb pages are always mapped with pmds").
That means that the page_zap_file_rmap() case of that whole function is
really small and trivial.
Link: https://<email address hidden>/
Cc: Nadav Amit <email address hidden>
Cc: Peter Zijlstra <email address hidden>
Cc: John Hubbard <email address hidden>
Cc: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
e2dd770...
by
Linus Torvalds <email address hidden>
mm: introduce simplified versions of 'page_remove_rmap()'
The rmap handling is proving a bit problematic, and part of it comes
from the complexities of all the different cases of our implementation
of 'page_remove_rmap()'.
And a large part of that complexity comes from the fact that while we
have multiple different versions of _adding_ an rmap, this 'remove rmap'
function tries to deal with all possible cases.
So we have these specific versions for page_add_anon_rmap(),
page_add_new_anon_rmap() and page_add_file_rmap() which all do slightly
different things, but then 'page_remove_rmap()' has to handle all the
cases.
That's particularly annoying for 'zap_pte_range()', which already knows
which special case it's dealing with. It already checked for its own
reasons whether it's an anonymous page, and it already knows it's not
the compound page case and passed in an unconditional 'false' argument.
So this introduces the specialized versions of 'page_remove_rmap()' for
the cases that zap_pte_range() wants. We also make it the job of the
caller to do the munlock_vma_page(), which is really unrelated and is
the only thing that cares about the 'vma'.
This just means that we end up with several simplifications:
- there's no 'vma' argument any more, because it's not used
- there's no 'compound' argument any more, because it was always false
- we can get rid of the tests for 'compound' and 'PageAnon()' since we
know what they are
and so instead of having that fairly complicated page_remove_rmap()
function, we end up with a couple of specialized functions that are
_much_ simpler.
There is supposed to be no semantic difference from this change,
although this does end up simplifying the code further by moving the
atomic_add_negative() on the PageAnon mapcount to outside the memcg
locking.
That locking protects other data structures (the page state statistics),
and this not only avoids an ugly 'goto', but also means that we don't need to
take and release the lock when we're not actually doing anything with
the state statistics.
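The effect of moving the mapcount update outside the memcg locking can be illustrated with a small model of the specialized anon-page, final-PTE variant. This is a sketch under assumptions. A C11 `atomic_flag` spinlock stands in for the memcg lock. `model_zap_anon_rmap`, `nr_anon_mapped`, and `lock_taken` are hypothetical names. The lock counter exists only to show how often the lock is taken.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; a spinlock stands in for the memcg lock. */
struct model_page {
	atomic_int mapcount;  /* like _mapcount: -1 means "not mapped" */
};

static atomic_flag stats_lock = ATOMIC_FLAG_INIT;
static long nr_anon_mapped;     /* protected by stats_lock */
static unsigned int lock_taken; /* lock acquisitions, for illustration */

static void stats_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&stats_lock))
		;                     /* spin */
}

static void stats_lock_release(void)
{
	atomic_flag_clear(&stats_lock);
}

/* Same semantics as atomic_add_negative(). */
static bool add_negative(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i < 0;
}

/* No 'vma' and no 'compound' argument: the caller already knows this is an
 * anonymous page at the final pte level. The mapcount update is atomic on
 * its own, so it happens outside the lock; the lock is taken only when the
 * statistic actually changes. */
static void model_zap_anon_rmap(struct model_page *page)
{
	if (!add_negative(-1, &page->mapcount))
		return;
	stats_lock_acquire();
	lock_taken++;
	nr_anon_mapped--;
	stats_lock_release();
}
```

Unmapping a page that is mapped three times takes the lock only once, on the final unmap. If the lock were wrapped around the whole function, it would be taken on every call.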
We also remove the test for PageTransCompound(), since this is only
called for the final pte level from zap_pte_range().
Cc: Nadav Amit <email address hidden>
Cc: Peter Zijlstra <email address hidden>
Cc: John Hubbard <email address hidden>
Cc: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
30a0b95...
by
Linus Torvalds <email address hidden>
Linux 6.1-rc3
b72018a...
by
Linus Torvalds <email address hidden>
Merge tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
"A use-after-free bugfix in the smscufx driver and various minor error
path fixes, smaller build fixes, sysfs fixes and typos in comments in
the stifb, sisfb, da8xxfb, xilinxfb, sm501fb, gbefb and cyber2000fb
drivers"
* tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: cyber2000fb: fix missing pci_disable_device()
fbdev: sisfb: use explicitly signed char
fbdev: smscufx: Fix several use-after-free bugs
fbdev: xilinxfb: Make xilinxfb_release() return void
fbdev: sisfb: fix repeated word in comment
fbdev: gbefb: Convert sysfs snprintf to sysfs_emit
fbdev: sm501fb: Convert sysfs snprintf to sysfs_emit
fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards
fbdev: da8xx-fb: Fix error handling in .remove()
fbdev: MIPS supports iomem addresses
9f12754...
by
Linus Torvalds <email address hidden>
Merge tag 'char-misc-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Some small driver fixes for 6.1-rc3. They include:
- iio driver bugfixes
- counter driver bugfixes
- coresight bugfixes, including a revert and then a second fix to get
it right.
All of these have been in linux-next with no reported problems"
* tag 'char-misc-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
misc: sgi-gru: use explicitly signed char
coresight: cti: Fix hang in cti_disable_hw()
Revert "coresight: cti: Fix hang in cti_disable_hw()"
counter: 104-quad-8: Fix race getting function mode and direction
counter: microchip-tcb-capture: Handle Signal1 read and Synapse
coresight: cti: Fix hang in cti_disable_hw()
coresight: Fix possible deadlock with lock dependency
counter: ti-ecap-capture: fix IS_ERR() vs NULL check
counter: Reduce DEFINE_COUNTER_ARRAY_POLARITY() to defining counter_array
iio: bmc150-accel-core: Fix unsafe buffer attributes
iio: adxl367: Fix unsafe buffer attributes
iio: adxl372: Fix unsafe buffer attributes
iio: at91-sama5d2_adc: Fix unsafe buffer attributes
iio: temperature: ltc2983: allocate iio channels once
tools: iio: iio_utils: fix digit calculation
iio: adc: stm32-adc: fix channel sampling time init
iio: adc: mcp3911: mask out device ID in debug prints
iio: adc: mcp3911: use correct id bits
iio: adc: mcp3911: return proper error code on failure to allocate trigger
iio: adc: mcp3911: fix sizeof() vs ARRAY_SIZE() bug
...
c4d25ce...
by
Linus Torvalds <email address hidden>
Merge tag 'usb-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"A few small USB fixes for 6.1-rc3. Included in here are:
- MAINTAINERS update, including a big one for the USB gadget
subsystem. Many thanks to Felipe for all of the years of hard work
he has done on this codebase, it was greatly appreciated.
- dwc3 driver fixes for reported problems.
- xhci driver fixes for reported problems.
- typec driver fixes for minor issues.
- uvc gadget driver change, and then revert as it wasn't relevant for
6.1-final, as it is a new feature and people are still reviewing
and modifying it.
All of these have been in the linux-next tree with no reported issues"
* tag 'usb-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: dwc3: gadget: Don't set IMI for no_interrupt
usb: dwc3: gadget: Stop processing more requests on IMI
Revert "usb: gadget: uvc: limit isoc_sg to super speed gadgets"
xhci: Remove device endpoints from bandwidth list when freeing the device
xhci-pci: Set runtime PM as default policy on all xHC 1.2 or later devices
xhci: Add quirk to reset host back to default state at shutdown
usb: xhci: add XHCI_SPURIOUS_SUCCESS to ASM1042 despite being a V0.96 controller
usb: dwc3: st: Rely on child's compatible instead of name
usb: gadget: uvc: limit isoc_sg to super speed gadgets
usb: bdc: change state when port disconnected
usb: typec: ucsi: acpi: Implement resume callback
usb: typec: ucsi: Check the connection on resume
usb: gadget: aspeed: Fix probe regression
usb: gadget: uvc: fix sg handling during video encode
usb: gadget: uvc: fix sg handling in error case
usb: gadget: uvc: fix dropped frame after missed isoc
usb: dwc3: gadget: Don't delay End Transfer on delayed_status
usb: dwc3: Don't switch OTG -> peripheral if extcon is present
MAINTAINERS: Update maintainers for broadcom USB
MAINTAINERS: move USB gadget and phy entries under the main USB entry
ef3c094...
by
Linus Torvalds <email address hidden>
Merge tag 'gpio-fixes-for-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- convert gpio-tegra to using an immutable irqchip
- MAINTAINERS update
* tag 'gpio-fixes-for-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: Change myself to a maintainer
gpio: tegra: Convert to immutable irq chip
4347660...
by
Linus Torvalds <email address hidden>
Merge tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Rename a perf memory level event define to denote it is of CXL type
- Add Alder and Raptor Lakes support to RAPL
- Make sure raw sample data is output with tracepoints
* tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
perf/x86/rapl: Add support for Intel Raptor Lake
perf/x86/rapl: Add support for Intel AlderLake-N
perf: Fix missing raw data on tracepoint events