~timg-tpi/ubuntu/+source/linux-azure/+git/lunar:lunar-azure-fix-vm-add-remove-race-condition

Last commit made on 2023-06-12
Get this branch:
git clone -b lunar-azure-fix-vm-add-remove-race-condition https://git.launchpad.net/~timg-tpi/ubuntu/+source/linux-azure/+git/lunar

Branch information

Name:
lunar-azure-fix-vm-add-remove-race-condition
Repository:
lp:~timg-tpi/ubuntu/+source/linux-azure/+git/lunar

Recent commits

7e73419... by Dexuan Cui

PCI: hv: Use async probing to reduce boot time

BugLink: https://bugs.launchpad.net/bugs/2023071
BugLink: https://bugs.launchpad.net/bugs/2023594

Commit 414428c5da1c ("PCI: hv: Lock PCI bus on device eject") added
pci_lock_rescan_remove() and pci_unlock_rescan_remove() in
create_root_hv_pci_bus() and in hv_eject_device_work() to address the
race between create_root_hv_pci_bus() and hv_eject_device_work(), but it
turns out that grabbing the pci_rescan_remove_lock mutex is not enough:
refer to the earlier fix "PCI: hv: Add a per-bus mutex state_lock".

Now with hbus->state_lock and other fixes, the race is resolved, so
remove pci_{lock,unlock}_rescan_remove() in create_root_hv_pci_bus():
this removes the serialization in hv_pci_probe() and hence allows
async-probing (PROBE_PREFER_ASYNCHRONOUS) to work.

Add the async-probing flag to hv_pci_drv.
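
As a rough illustration of what "adding the async-probing flag" amounts to, the sketch below sets the probe type on a driver structure. The types here are cut-down userspace mocks that only mirror the shape of the kernel's enum probe_type, struct device_driver and struct hv_driver; the remaining fields of the real hv_pci_drv are omitted, and wants_async_probe() is a hypothetical helper that exists only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the kernel's enum probe_type; only the names matter here. */
enum probe_type {
	PROBE_DEFAULT_STRATEGY,
	PROBE_PREFER_ASYNCHRONOUS,
	PROBE_FORCE_SYNCHRONOUS,
};

/* Mock: the real struct device_driver has many more fields. */
struct device_driver {
	enum probe_type probe_type;
};

/* Mock: the real struct hv_driver also carries probe/remove callbacks. */
struct hv_driver {
	const char *name;
	struct device_driver driver;
};

/* The flag is set on the embedded generic driver structure. */
struct hv_driver hv_pci_drv = {
	.name = "hv_pci",	/* assumed name, for illustration only */
	.driver = {
		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
	},
};

/* Hypothetical helper: reports whether a driver asked for async probing. */
int wants_async_probe(const struct hv_driver *drv)
{
	assert(drv != NULL);
	return drv->driver.probe_type == PROBE_PREFER_ASYNCHRONOUS;
}
```

With the flag set, the driver core is allowed to run probe() for several devices in parallel, which only helps once probe() no longer serializes on a global lock.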

pci_{lock,unlock}_rescan_remove() in hv_eject_device_work() and in
hv_pci_remove() are still kept: according to the comment before
drivers/pci/probe.c: static DEFINE_MUTEX(pci_rescan_remove_lock),
"PCI device removal routines should always be executed under this mutex".

Signed-off-by: Dexuan Cui <email address hidden>
Reviewed-by: Michael Kelley <email address hidden>
Reviewed-by: Long Li <email address hidden>
Cc: <email address hidden>
(cherry picked from commit 08a9019ea35582f310946d193e5daab53931fd04 https://github.com/dcui/tdx.git)
Signed-off-by: Tim Gardner <email address hidden>

ef976c8... by Dexuan Cui

PCI: hv: Add a per-bus mutex state_lock

BugLink: https://bugs.launchpad.net/bugs/2023071
BugLink: https://bugs.launchpad.net/bugs/2023594

In the case of fast device addition/removal, it's possible that
hv_eject_device_work() can start to run before create_root_hv_pci_bus()
starts to run; as a result, the pci_get_domain_bus_and_slot() in
hv_eject_device_work() can return a 'pdev' of NULL, and
hv_eject_device_work() can remove the 'hpdev', and immediately send a
message PCI_EJECTION_COMPLETE to the host, and the host immediately
unassigns the PCI device from the guest; meanwhile,
create_root_hv_pci_bus() and the PCI device driver can be probing the
dead PCI device and reporting timeout errors.

Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
mutex before powering on the PCI bus in hv_pci_enter_d0(). When
hv_eject_device_work() starts to run, it is then able to find the 'pdev'
and call pci_stop_and_remove_bus_device(pdev).

If the PCI device driver has loaded, the PCI device driver's probe()
function is already called in create_root_hv_pci_bus() ->
pci_bus_add_devices(), and now hv_eject_device_work() ->
pci_stop_and_remove_bus_device() is able to call the PCI device driver's
remove() function and remove the device reliably.

If the PCI device driver hasn't loaded yet, the function call
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to
remove the PCI device reliably and the PCI device driver's probe()
function won't be called.

If the PCI device driver's probe() is already running (e.g., systemd-udev
is loading the PCI device driver), it must be holding the per-device lock;
after the probe() finishes and releases the lock, hv_eject_device_work()
-> pci_stop_and_remove_bus_device() is able to proceed and remove the
device reliably.
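
The ordering guarantee described above can be replayed with a small single-threaded toy model. This is only a sketch: struct toy_bus and every toy_* function are hypothetical stand-ins, the mutex is modelled as a boolean, and a blocked mutex_lock() is modelled by deferring the eject work until the lock is dropped. It replays one bad interleaving (eject work arriving right after the bus powers on) with and without the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-bus state; not the kernel's bus structure. */
struct toy_bus {
	bool state_locked;	/* models hbus->state_lock being held */
	bool pdev_visible;	/* set once the root PCI bus has been created */
	bool eject_pending;	/* eject work is waiting for the lock */
	bool eject_ran;
	bool eject_found_pdev;
};

/* Models hv_eject_device_work(): it needs the lock before looking up pdev. */
void toy_eject_work(struct toy_bus *bus)
{
	assert(bus != NULL);
	if (bus->state_locked) {
		/* A real mutex_lock() would sleep here; the model defers. */
		bus->eject_pending = true;
		return;
	}
	bus->eject_pending = false;
	bus->eject_ran = true;
	bus->eject_found_pdev = bus->pdev_visible;
}

/* Models the probe path with the eject arriving at the worst moment. */
void toy_probe(struct toy_bus *bus, bool use_state_lock)
{
	assert(bus != NULL);
	if (use_state_lock)
		bus->state_locked = true;

	/* Bus is powered on (hv_pci_enter_d0()); the eject work fires now. */
	toy_eject_work(bus);

	/* Root bus is created (create_root_hv_pci_bus()); pdev is findable. */
	bus->pdev_visible = true;

	if (use_state_lock) {
		bus->state_locked = false;
		if (bus->eject_pending)
			toy_eject_work(bus);	/* the blocked work resumes */
	}
}

/* Returns true if the eject work ran and found a non-NULL pdev. */
bool toy_eject_finds_pdev(bool use_state_lock)
{
	struct toy_bus bus = { .state_locked = false };

	toy_probe(&bus, use_state_lock);
	return bus.eject_ran && bus.eject_found_pdev;
}
```

Calling toy_eject_finds_pdev(false) replays the original bug, where the eject work sees no pdev; toy_eject_finds_pdev(true) shows the work being held back until the bus is fully created.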

Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <email address hidden>
Reviewed-by: Michael Kelley <email address hidden>
Cc: <email address hidden>
(cherry picked from commit 04e9e9f43aee08daeb6089b4cbccba0f70dc95f0 https://github.com/dcui/tdx.git)
Signed-off-by: Tim Gardner <email address hidden>

75f3b51... by Dexuan Cui

Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"

BugLink: https://bugs.launchpad.net/bugs/2023071
BugLink: https://bugs.launchpad.net/bugs/2023594

This reverts commit d6af2ed29c7c1c311b96dac989dcb991e90ee195.

The statement "the hv_pci_bus_exit() call releases structures of all its
child devices" in commit d6af2ed29c7c is not true: in the path
hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): the
parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* release the
child "struct hv_pci_dev *hpdev" that is created earlier in
pci_devices_present_work() -> new_pcichild_device().
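
The keep_devs argument can be illustrated with a minimal model. This is a sketch under the assumption, taken from the description above, that the child devices are released only when keep_devs is false; struct toy_hbus and the toy_* functions are hypothetical and track nothing but a child count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy bus that only counts its child devices. */
struct toy_hbus {
	int nr_children;
};

/* Models new_pcichild_device(): one more child hangs off the bus. */
void toy_new_child(struct toy_hbus *hbus)
{
	assert(hbus != NULL);
	hbus->nr_children++;
}

/* Models hv_pci_bus_exit(hdev, keep_devs) as the text describes it. */
void toy_bus_exit(struct toy_hbus *hbus, bool keep_devs)
{
	assert(hbus != NULL);
	if (!keep_devs)
		hbus->nr_children = 0;	/* children are released */
	/* Bus-level teardown would happen in both cases (omitted). */
}

/* Creates n children, exits the bus, and reports how many survive. */
int toy_children_after_exit(int n, bool keep_devs)
{
	struct toy_hbus hbus = { .nr_children = 0 };
	int i;

	for (i = 0; i < n; i++)
		toy_new_child(&hbus);
	toy_bus_exit(&hbus, keep_devs);
	return hbus.nr_children;
}
```

In the probe path discussed above, keep_devs is true, so the children created earlier are still there after the call, which is why the reverted commit's premise does not hold.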

The commit d6af2ed29c7c was originally made in July 2020 for RHEL 7.7,
where the old version of hv_pci_bus_exit() was used; when the commit was
rebased and merged into the upstream, people didn't notice that it's
not really necessary. The commit itself doesn't cause any issue, but it
makes hv_pci_probe() more complicated. Revert it to facilitate some
upcoming changes to hv_pci_probe().

Signed-off-by: Dexuan Cui <email address hidden>
Reviewed-by: Michael Kelley <email address hidden>
Acked-by: Wei Hu <email address hidden>
Cc: <email address hidden>
(cherry picked from commit f3224e09f4a6e490cfb4c04408fdaac5c6745f2e https://github.com/dcui/tdx.git)
Signed-off-by: Tim Gardner <email address hidden>

fcc802d... by Dexuan Cui

PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev

BugLink: https://bugs.launchpad.net/bugs/2023071
BugLink: https://bugs.launchpad.net/bugs/2023594

The hpdev->state is never really useful. The only use in
hv_pci_eject_device() and hv_eject_device_work() is not really necessary.

Signed-off-by: Dexuan Cui <email address hidden>
Reviewed-by: Michael Kelley <email address hidden>
Cc: <email address hidden>
(cherry picked from commit ae2fa5f9cdb4fc1be856c9001fa9095178b4e1b8 https://github.com/dcui/tdx.git)
Signed-off-by: Tim Gardner <email address hidden>

3bb44aa... by Dexuan Cui

PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic

BugLink: https://bugs.launchpad.net/bugs/2023071
BugLink: https://bugs.launchpad.net/bugs/2023594

When the host tries to remove a PCI device, the host first sends a
PCI_EJECT message to the guest, and the guest is supposed to gracefully
remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host.
The host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to
the guest (when the guest receives this message, the device is already
unassigned from the guest) and the guest can do some final cleanup work.
If the guest fails to respond to the PCI_EJECT message within one minute,
the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and
removes the PCI device forcibly.

In the case of fast device addition/removal, it's possible that the PCI
device driver is still configuring MSI-X interrupts when the guest receives
the PCI_EJECT message. The channel callback calls hv_pci_eject_device(),
which sets hpdev->state to hv_pcichild_ejecting and schedules the work
item hv_eject_device_work(). If the PCI device driver is calling
pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg() at that point, the
while loop in hv_compose_msi_msg() exits early because of the updated
hpdev->state, leaving data->chip_data at its default value of NULL. Later,
when the PCI device driver calls request_irq() -> ... -> hv_irq_unmask(),
the guest crashes in hv_arch_irq_unmask() because data->chip_data is NULL.

Fix the issue by not testing hpdev->state in the while loop: when the
guest receives PCI_EJECT, the device is still assigned to the guest, and
the guest has one minute to finish the device removal gracefully. We don't
really need to (and we should not) test hpdev->state in the loop.
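
The failure mode and the fix can be sketched with a toy version of the wait loop. Everything below is a hypothetical userspace model: the host reply is simulated by a countdown, toy_int_desc stands in for whatever descriptor the real code stores in chip_data, and the test_state_in_loop switch selects between the old and the new loop behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_PRESENT, TOY_EJECTING };

struct toy_hpdev {
	enum toy_state state;
	int polls_until_host_replies;	/* simulated host latency */
};

struct toy_irq_data {
	void *chip_data;
};

/* Placeholder for the interrupt descriptor returned by the host. */
int toy_int_desc;

/* Models the wait loop in hv_compose_msi_msg(). */
void toy_compose_msi_msg(struct toy_hpdev *hpdev, struct toy_irq_data *data,
			 bool test_state_in_loop)
{
	assert(hpdev != NULL && data != NULL);
	data->chip_data = NULL;		/* default value */

	while (hpdev->polls_until_host_replies > 0) {
		/* Old behaviour: bail out as soon as an eject is in flight. */
		if (test_state_in_loop && hpdev->state == TOY_EJECTING)
			return;
		hpdev->polls_until_host_replies--;
	}

	/* The host replied: record the descriptor for later unmasking. */
	data->chip_data = &toy_int_desc;
}

/* Returns true if a later unmask would see a non-NULL chip_data. */
bool toy_unmask_is_safe(bool test_state_in_loop)
{
	struct toy_hpdev hpdev = {
		.state = TOY_EJECTING,	/* PCI_EJECT already received */
		.polls_until_host_replies = 3,
	};
	struct toy_irq_data data;

	toy_compose_msi_msg(&hpdev, &data, test_state_in_loop);
	return data.chip_data != NULL;
}
```

With the state test in the loop, the model leaves chip_data NULL, which corresponds to the crash path; without it, the loop simply waits for the host, which is still able to answer during the one-minute grace period.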

Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <email address hidden>
Reviewed-by: Michael Kelley <email address hidden>
Cc: <email address hidden>
(cherry picked from commit c35a2450b29018144effb7859598dfe6e8ef0e1f https://github.com/dcui/tdx.git)
Signed-off-by: Tim Gardner <email address hidden>

3182c1c... by Dexuan Cui

PCI: hv: Fix a race condition bug in hv_pci_query_relations()

BugLink: https://bugs.launchpad.net/bugs/2023071
BugLink: https://bugs.launchpad.net/bugs/2023594

Fix the longstanding race between hv_pci_query_relations() and
survey_child_resources() by flushing the workqueue before we exit from
hv_pci_query_relations().
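
The effect of flushing before returning can be shown with a tiny single-threaded work queue model. This is a sketch only: struct toy_wq and the toy_* functions are hypothetical, the host reply is reduced to a single queued work item that builds the child list, and "running the work later" is modelled by simply leaving it in the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_WQ_DEPTH 8

typedef void (*toy_work_fn)(void *ctx);

/* Minimal deferred-work queue: items run only when flushed. */
struct toy_wq {
	toy_work_fn fn[TOY_WQ_DEPTH];
	void *ctx[TOY_WQ_DEPTH];
	int n;
};

struct toy_hbus {
	struct toy_wq wq;
	int nr_children;
};

void toy_queue_work(struct toy_wq *wq, toy_work_fn fn, void *ctx)
{
	assert(wq->n < TOY_WQ_DEPTH);
	wq->fn[wq->n] = fn;
	wq->ctx[wq->n] = ctx;
	wq->n++;
}

/* Runs every pending item, like flush_workqueue() waiting for the queue. */
void toy_flush_workqueue(struct toy_wq *wq)
{
	int i;

	for (i = 0; i < wq->n; i++)
		wq->fn[i](wq->ctx[i]);
	wq->n = 0;
}

/* Models the work that builds the child list from the host's reply. */
void toy_devices_present_work(void *ctx)
{
	struct toy_hbus *hbus = ctx;

	hbus->nr_children = 1;
}

/* Models hv_pci_query_relations(): the reply only queues the work. */
void toy_query_relations(struct toy_hbus *hbus, bool flush_before_return)
{
	toy_queue_work(&hbus->wq, toy_devices_present_work, hbus);
	if (flush_before_return)
		toy_flush_workqueue(&hbus->wq);
}

/* What a survey step running right afterwards would observe. */
int toy_children_seen_by_survey(bool flush_before_return)
{
	struct toy_hbus hbus = { .nr_children = 0 };

	toy_query_relations(&hbus, flush_before_return);
	return hbus.nr_children;
}
```

Without the flush, the survey step in this model sees an empty child list because the queued work has not run yet; with the flush, the list is complete by the time the query function returns.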

Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <email address hidden>
Reviewed-by: Michael Kelley <email address hidden>
Cc: <email address hidden>
(cherry picked from commit 70e1824c375403356308bb796fbc09f6e20f1af4 https://github.com/dcui/tdx.git)
Signed-off-by: Tim Gardner <email address hidden>

e4972fe... by Tim Gardner

UBUNTU: Ubuntu-azure-6.2.0-1005.5

Signed-off-by: Tim Gardner <email address hidden>

e5a3a6a... by Tim Gardner

UBUNTU: [Packaging] azure: introduce a separate linux-lib-rust package

BugLink: https://bugs.launchpad.net/bugs/2015867

    After enabling Rust in the kernel, the size of linux-headers increased
    considerably.

    Some work has been done to reduce the size, such as dropping the binary
    artifacts (*.o and *.cmd), but it would be nice to keep the size of
    linux-headers reasonably small to avoid wasting too much space in the
    cloud images.

    For this reason introduce a new package linux-lib-rust to ship all the
    Rust headers and libraries required to build out-of-tree kernel modules
    in Rust.

    Before this patch: 96M /usr/src/linux-headers-6.2.0-21-generic
     After this patch: 29M /usr/src/linux-headers-6.2.0-21-generic

    Signed-off-by: Andrea Righi <email address hidden>
    Acked-by: Luke Nowakowski-Krijger <email address hidden>
    Acked-by: Tim Gardner <email address hidden>
    Signed-off-by: Stefan Bader <email address hidden>

Copied from master

Signed-off-by: Tim Gardner <email address hidden>

0353ec1... by Tim Gardner

UBUNTU: link-to-tracker: update tracking bug

BugLink: https://bugs.launchpad.net/bugs/2019837
Properties: no-test-build
Signed-off-by: Tim Gardner <email address hidden>

bc7d9d6... by Tim Gardner

UBUNTU: debian/dkms-versions -- update from kernel-versions (main/master)

BugLink: https://bugs.launchpad.net/bugs/1786013
Signed-off-by: Tim Gardner <email address hidden>