Currently there are four places in which a PCI function is scanned
and made available to drivers:
1. In pci_scan_root_bus() as part of the initial zbus
creation.
2. In zpci_bus_add_devices() when registering
a device in configured state on a zbus that has already been
scanned.
3. When a function is already known to zPCI (in reserved/standby state)
and configuration is triggered through firmware by PEC 0x301.
4. When a device is already known to zPCI (in standby/reserved state)
and configuration is triggered from within Linux using
enable_slot().
The PF/VF linking step and setting of pdev->is_virtfn introduced with
commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") was
only triggered for the second case, which is where VFs created through
sriov_numvfs usually land. However unlike some other platforms but like
POWER VFs can be individually enabled/disabled through
/sys/bus/pci/slots.
Fix this by doing VF setup as part of pcibios_bus_add_device() which is
called in all of the above cases.
Finally to remove the PF/VF links call the common code
pci_iov_remove_virtfn() function to remove linked VFs.
This takes care of the necessary sysfs cleanup.
For fixing the PF to VF link removal we need to perform some action on
every removal of a zdev from the common PCI subsystem.
So in preparation re-introduce zpci_remove_device() and use that instead
of directly calling the common code functions. This was actually still
declared from earlier code but no longer implemented.
Reviewed-by: Pierre Morel <email address hidden>
Signed-off-by: Niklas Schnelle <email address hidden>
Signed-off-by: Heiko Carstens <email address hidden>
(cherry picked from commit 2f0230b2f2d5fd287a85583eefb5aed35b6fe510)
Signed-off-by: Frank Heimes <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: Kleber Sacilotto de Souza <email address hidden>
Signed-off-by: Kelsey Skunberg <email address hidden>
We were missing the pci_dev_put() for candidate PFs. Furhtermore in
discussion with upstream it turns out that somewhat counterintuitively
some common code, in particular the vfio-pci driver, assumes that
pdev->is_virtfn always implies that pdev->physfn is set, i.e. that VFs
are always linked.
While POWER does seem to set pdev->is_virtfn even for unlinked functions
(see comments in arch/powerpc/kernel/eeh.c:eeh_debugfs_break_device())
for now just be safe and only set pdev->is_virtfn on linking.
Also make sure that we only search for parent PFs if the zbus is
multifunction and we thus know the devfn values supplied by firmware
come from the RID.
Fixes: e5794cf1a270 ("s390/pci: create links between PFs and VFs")
Cc: <email address hidden> # 5.8
Reviewed-by: Pierre Morel <email address hidden>
Signed-off-by: Niklas Schnelle <email address hidden>
Signed-off-by: Heiko Carstens <email address hidden>
(cherry picked from commit 3cddb79afc60bcdb5fd9dd7a1c64a8d03bdd460f)
Signed-off-by: Frank Heimes <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: Kleber Sacilotto de Souza <email address hidden>
Signed-off-by: Kelsey Skunberg <email address hidden>
cd72208...
by
Thomas Richter <email address hidden>
s390/cpum_cf, perf: change DFLT_CCERROR counter name
Change the counter name DLFT_CCERROR to DLFT_CCFINISH on IBM z15.
This counter counts completed DEFLATE instructions with exit code
0, 1 or 2. Since exit code 0 means success and exit code 1 or 2
indicate errors, change the counter name to avoid confusion.
This counter is incremented each time the DEFLATE instruction
completed regardless if an error was detected or not.
Fixes: d68d5d51dc89 ("s390/cpum_cf: Add new extended counters for IBM z15")
Fixes: e7950166e402 ("perf vendor events s390: Add new deflate counters for IBM z15")
Cc: <email address hidden> # v5.7
Signed-off-by: Thomas Richter <email address hidden>
Reviewed-by: Sumanth Korikkar <email address hidden>
Signed-off-by: Heiko Carstens <email address hidden>
(backported from commit 3d3af181d370069861a3be94608464e2ff3682e2)
Signed-off-by: Frank Heimes <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: Kleber Sacilotto de Souza <email address hidden>
Signed-off-by: Kelsey Skunberg <email address hidden>
In usual IPL or hot plug scenarios a zPCI function transitions directly
from reserved (invisible to Linux) to configured state or is configured
by Linux itself using an SCLP, however it can also first go from
reserved to standby and then from standby to configured without
Linux initiative.
In this scenario we first get a PEC event 0x302 and then 0x301. This may
happen for example when the device is deconfigured at another LPAR and
made available for this LPAR. It may also happen under z/VM when
a device is attached while in some inconsistent state.
However when we get the 0x301 the device is already known to zPCI
so calling zpci_create() will add it twice resulting in the below
BUG. Instead we should only enable the existing device and finally
scan it through the PCI subsystem.
Fixes: f606b3ef47c9 ("s390/pci: adapt events for zbus")
Reported-by: Alexander Egorenkov <email address hidden>
Tested-by: Alexander Egorenkov <email address hidden>
Signed-off-by: Niklas Schnelle <email address hidden>
Signed-off-by: Heiko Carstens <email address hidden>
(cherry picked from commit 3047766bc6ec9c6bc9ece85b45a41ff401e8d988)
Signed-off-by: Frank Heimes <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: Kleber Sacilotto de Souza <email address hidden>
Signed-off-by: Kelsey Skunberg <email address hidden>
425c11c...
by
Alex Williamson <email address hidden>
vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
Accessing the disabled memory space of a PCI device would typically
result in a master abort response on conventional PCI, or an
unsupported request on PCI express. The user would generally see
these as a -1 response for the read return data and the write would be
silently discarded, possibly with an uncorrected, non-fatal AER error
triggered on the host. Some systems however take it upon themselves
to bring down the entire system when they see something that might
indicate a loss of data, such as this discarded write to a disabled
memory space.
To avoid this, we want to try to block the user from accessing memory
spaces while they're disabled. We start with a semaphore around the
memory enable bit, where writers modify the memory enable state and
must be serialized, while readers make use of the memory region and
can access in parallel. Writers include both direct manipulation via
the command register, as well as any reset path where the internal
mechanics of the reset may both explicitly and implicitly disable
memory access, and manipulation of the MSI-X configuration, where the
MSI-X vector table resides in MMIO space of the device. Readers
include the read and write file ops to access the vfio device fd
offsets as well as memory mapped access. In the latter case, we make
use of our new vma list support to zap, or invalidate, those memory
mappings in order to force them to be faulted back in on access.
Our semaphore usage will stall user access to MMIO spaces across
internal operations like reset, but the user might experience new
behavior when trying to access the MMIO space while disabled via the
PCI command register. Access via read or write while disabled will
return -EIO and access via memory maps will result in a SIGBUS. This
is expected to be compatible with known use cases and potentially
provides better error handling capabilities than present in the
hardware, while avoiding the more readily accessible and severe
platform error responses that might otherwise occur.
Fixes: CVE-2020-12888
Reviewed-by: Peter Xu <email address hidden>
Signed-off-by: Alex Williamson <email address hidden>
(cherry picked from commit abafbc551fddede3e0a08dee1dcde08fc0eb8476)
CVE-2020-12888
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: William Breathitt Gray <email address hidden>
Signed-off-by: Kelsey Skunberg <email address hidden>
6a90834...
by
Alex Williamson <email address hidden>
vfio-pci: Fault mmaps to enable vma tracking
Rather than calling remap_pfn_range() when a region is mmap'd, setup
a vm_ops handler to support dynamic faulting of the range on access.
This allows us to manage a list of vmas actively mapping the area that
we can later use to invalidate those mappings. The open callback
invalidates the vma range so that all tracking is inserted in the
fault handler and removed in the close handler.
Reviewed-by: Peter Xu <email address hidden>
Signed-off-by: Alex Williamson <email address hidden>
(backported from commit 11c4cd07ba111a09f49625f9e4c851d83daf0a22)
[cascardo: adjusted context in header]
CVE-2020-12888
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: William Breathitt Gray <email address hidden>
Signed-off-by: Kelsey Skunberg <email address hidden>
441d359...
by
Alex Williamson <email address hidden>
vfio/type1: Support faulting PFNMAP vmas
With conversion to follow_pfn(), DMA mapping a PFNMAP range depends on
the range being faulted into the vma. Add support to manually provide
that, in the same way as done on KVM with hva_to_pfn_remapped().
Reviewed-by: Peter Xu <email address hidden>
Signed-off-by: Alex Williamson <email address hidden>
(cherry picked from commit 41311242221e3482b20bfed10fa4d9db98d87016)
CVE-2020-12888
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
Acked-by: Stefan Bader <email address hidden>
Acked-by: William Breathitt Gray <email address hidden>
Signed-off-by: Kelsey Skunberg <email address hidden>