~mreed8855/ubuntu/+source/linux/+git/jammy:test_dpc_1965241

Last commit made on 2022-05-17
Get this branch:
git clone -b test_dpc_1965241 https://git.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy

Branch information

Name:
test_dpc_1965241
Repository:
lp:~mreed8855/ubuntu/+source/linux/+git/jammy

Recent commits

32b85a2... by Michael Reed

Enable config option CONFIG_PCIE_EDR
BugLink: https://bugs.launchpad.net/bugs/1965241

3171687... by Lukas Wunner <email address hidden>

PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset

Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:

  RP -- UP -- DP -- UP -- DP (hotplug) -- EP

pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":

  For a Switch, the following must cause a hot reset to be sent on all
  Downstream Ports: [...]

  * The Data Link Layer of the Upstream Port reporting DL_Down status.
    In Switches that support Link speeds greater than 5.0 GT/s, the
    Upstream Port must direct the LTSSM of each Downstream Port to the
    Hot Reset state, but not hold the LTSSMs in that state. This permits
    each Downstream Port to begin Link training immediately after its
    hot reset completes. This behavior is recommended for all Switches.

  * Receiving a hot reset on the Upstream Port.

Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":

  If the Port is enabled for edge-triggered interrupt signaling using
  MSI or MSI-X, an interrupt message must be sent every time the logical
  AND of the following conditions transitions from FALSE to TRUE: [...]

  * The Hot-Plug Interrupt Enable bit in the Slot Control register is
    set to 1b.

  * At least one hot-plug event status bit in the Slot Status register
    and its associated enable bit in the Slot Control register are both
    set to 1b.
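
The interrupt condition quoted above can be illustrated with a small, self-contained C sketch. It is only a model under stated assumptions: the register bit values are the ones defined in the kernel's include/uapi/linux/pci_regs.h, while hp_irq_condition() and hp_irq_fires() are hypothetical helper names, not kernel functions. Only two of the hot-plug events are modelled, and the conditions elided by "[...]" in the quote are not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTCTL_PDCE   0x0008 /* Presence Detect Changed Enable */
#define PCI_EXP_SLTCTL_HPIE   0x0020 /* Hot-Plug Interrupt Enable */
#define PCI_EXP_SLTCTL_DLLSCE 0x1000 /* Data Link Layer State Changed Enable */
#define PCI_EXP_SLTSTA_PDC    0x0008 /* Presence Detect Changed */
#define PCI_EXP_SLTSTA_DLLSC  0x0100 /* Data Link Layer State Changed */

/*
 * Hypothetical helper: evaluates the two quoted conditions for the
 * PDC and DLLSC events only. Returns true when the Hot-Plug Interrupt
 * Enable bit is set and at least one event status bit is set together
 * with its enable bit.
 */
static bool hp_irq_condition(uint16_t slot_ctl, uint16_t slot_sta)
{
	bool pdc = (slot_ctl & PCI_EXP_SLTCTL_PDCE) &&
		   (slot_sta & PCI_EXP_SLTSTA_PDC);
	bool dllsc = (slot_ctl & PCI_EXP_SLTCTL_DLLSCE) &&
		     (slot_sta & PCI_EXP_SLTSTA_DLLSC);

	return (slot_ctl & PCI_EXP_SLTCTL_HPIE) && (pdc || dllsc);
}

/*
 * Hypothetical helper: with edge-triggered signaling, a message is
 * sent only when the condition goes from FALSE to TRUE.
 */
static bool hp_irq_fires(bool before, bool after)
{
	return !before && after;
}
```

In this model, a DLLSC status bit left set by the Hot Reset makes the condition TRUE as soon as restoring Slot Control re-enables HPIE and DLLSCE, which corresponds to the FALSE-to-TRUE transition described above.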

Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
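
As a rough illustration of the two steps in the preceding paragraph, here is a self-contained C model. It is not the code from this commit: struct model_port and model_slot_reset() are hypothetical names, the status register is modelled as a plain variable (on real hardware the bit is cleared by writing 1 to it), and only PCI_EXP_SLTSTA_DLLSC matches a real definition from include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_SLTSTA_DLLSC 0x0100 /* Data Link Layer State Changed */

/* Hypothetical stand-in for a hotplug port; not a kernel structure. */
struct model_port {
	uint16_t slot_status;    /* latched hot-plug event bits */
	bool link_active;        /* did the link retrain after the reset? */
	uint16_t pending_events; /* events queued for the hotplug handler */
};

/* Hypothetical model of the slot reset step described above. */
static void model_slot_reset(struct model_port *p)
{
	/* Drop the DLLSC event latched by the error-induced Hot Reset. */
	p->slot_status &= (uint16_t)~PCI_EXP_SLTSTA_DLLSC;

	/*
	 * If the link unexpectedly failed to retrain, synthesize a
	 * DLLSC event so the slot is still brought down.
	 */
	if (!p->link_active)
		p->pending_events |= PCI_EXP_SLTSTA_DLLSC;
}
```

When the link comes back up, no event is left pending and the slot stays up; when it does not, the synthesized event takes the place of the one that was cleared.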

Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
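
The callback-plus-iterator arrangement can be sketched in plain C as follows. This is a simplified model, not the kernel's implementation: every name here (struct model_service, model_for_each_service(), model_call_slot_reset(), model_hotplug_slot_reset()) is hypothetical, and an array stands in for the port's list of service devices.

```c
#include <stddef.h>

/* Hypothetical stand-in for a port service driver. */
struct model_service {
	const char *name;
	/* Optional callback; services that do not need it leave it NULL. */
	int (*slot_reset)(struct model_service *svc);
	int reset_calls; /* bookkeeping for this example only */
};

/* Example callback a hotplug-like service might provide. */
static int model_hotplug_slot_reset(struct model_service *svc)
{
	svc->reset_calls++;
	return 0;
}

/* Invoke the service's slot_reset callback if it defines one. */
static int model_call_slot_reset(struct model_service *svc)
{
	if (svc->slot_reset)
		return svc->slot_reset(svc);
	return 0;
}

/* Generic iterator: apply fn to each service, stop on first error. */
static int model_for_each_service(struct model_service *svcs, size_t n,
				  int (*fn)(struct model_service *))
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(&svcs[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

Calling model_for_each_service(svcs, n, model_call_slot_reset) reaches every service with a single loop, and services without a slot_reset callback are skipped without special-casing at the call site.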

Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.

Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.

The issue isn't DPC-specific; it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.

BugLink: https://bugs.launchpad.net/bugs/1965241

[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://<email address hidden>/
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org<email address hidden>
Reported-by: Stuart Hayes <email address hidden>
Tested-by: Stuart Hayes <email address hidden>
Signed-off-by: Lukas Wunner <email address hidden>
Signed-off-by: Bjorn Helgaas <email address hidden>
Cc: <email address hidden> # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <email address hidden>
(cherry picked from commit ea401499e943c307e6d44af6c2b4e068643e7884 linux-next)
Signed-off-by: Michael Reed <email address hidden>

68bdbfc... by Lukas Wunner <email address hidden>

PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()

Rename pm_iter() to pcie_port_device_iter() and make it visible outside
CONFIG_PM and portdrv_core.c so it can be used for pciehp slot reset
recovery.

BugLink: https://bugs.launchpad.net/bugs/1965241

[bhelgaas: split into its own patch]
Link: https://<email address hidden>/
Link: https://lore.kernel.org<email address hidden>
Signed-off-by: Lukas Wunner <email address hidden>
Signed-off-by: Bjorn Helgaas <email address hidden>
(cherry picked from commit 3134689f98f9e09004a4727370adc46e7635b4be linux-next)
Signed-off-by: Michael Reed <email address hidden>

f4a9abe... by Paolo Pisati

UBUNTU: Ubuntu-5.15.0-25.25

Signed-off-by: Paolo Pisati <email address hidden>

d28a002... by Paolo Pisati

UBUNTU: link-to-tracker: update tracking bug

BugLink: https://bugs.launchpad.net/bugs/1967146
Properties: no-test-build
Signed-off-by: Paolo Pisati <email address hidden>

c86ce85... by Paolo Pisati

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Paolo Pisati <email address hidden>

8239548... by Andrea Righi

UBUNTU: SAUCE: Revert "scsi: core: Reallocate device's budget map on queue depth change"

This reverts commit 813c6871f76bfaa1a61731aa2878b04540714f2d.

This commit seems to break certain SCSI drivers, causing I/O errors and
filesystem corruption. This problem can be systematically reproduced on
a power9 box with an aacraid controller.

Reverting this one seems the safest solution for now, until we find more
details about the real problem.

Signed-off-by: Andrea Righi <email address hidden>

25a5e77... by Paolo Pisati

UBUNTU: Ubuntu-5.15.0-24.24

Signed-off-by: Paolo Pisati <email address hidden>

7e61a04... by Paolo Pisati

UBUNTU: [Config] toolchain version update

Signed-off-by: Paolo Pisati <email address hidden>

5081adc... by Paolo Pisati

UBUNTU: link-to-tracker: update tracking bug

BugLink: https://bugs.launchpad.net/bugs/1966305
Properties: no-test-build
Signed-off-by: Paolo Pisati <email address hidden>