Ubuntu 20.04.3 - ilzlnx3g1 - virtio-scsi devs on KVM guest having miscompares on disktests when there is a failed path.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Fix Released | High | Skipper Bug Screeners |
qemu (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Released | Undecided | Canonical Server |
Bug Description
[Impact]
* There are situations where disk I/O from guests to host scsi
devices can fail to return the right status. As a result,
the guest believes that I/O was successful when it was not,
silently leading to later data corruption.
* Upstream fixed this in later versions; backport these fixes.
[Test Plan]
* The IBM test lab can run tests with (virtual) cable pulls and
all that. This kind of testing revealed the issue initially (we
don't know what subset exactly). We'd rely on IBM to run those
tests against the builds in -proposed.
IBM already did that on the PPA which was quite helpful (see below)
* Any kind of SCSI-attached disk would be worth testing.
That can be (all details herehttps:
- scsi device via hostdevs
- scsi device using iscsi
- scsi adapter via scsi-host + vhost
But that can only test if there is no regression in formerly working
simple setups. For the original case we have to rely on IBM (see above)
* Due to the complexity I'd suggest keeping this a bit
longer than usual in -proposed
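For the simple regression setups listed above, a write-then-verify pass is one way to look for miscompares. The following is a minimal sketch only, not the disktest tooling IBM used; the function names `write_pattern` and `find_miscompares`, the block size, and the pattern scheme are all assumptions made for illustration.

```python
import hashlib
import os


def write_pattern(path: str, blocks: int, block_size: int = 4096) -> list:
    """Write deterministic blocks and return the digest expected for each."""
    digests = []
    with open(path, "wb") as f:
        for i in range(blocks):
            # Derive each block from its index so the content is reproducible.
            data = hashlib.sha256(str(i).encode()).digest() * (block_size // 32)
            f.write(data)
            digests.append(hashlib.sha256(data).hexdigest())
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to the device
    return digests


def find_miscompares(path: str, digests: list, block_size: int = 4096) -> list:
    """Return the indices of blocks that differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i, expected in enumerate(digests):
            data = f.read(block_size)
            if hashlib.sha256(data).hexdigest() != expected:
                bad.append(i)
    return bad
```

Note that a read-back through the guest page cache may never touch the device, so a real test would likely need to bypass or drop the cache and would have to run while a path is being failed.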
[Where problems could occur]
* Qemu does a lot of things, but problems caused by this change would
be limited to the handling of scsi disks.
- There is the usual kind of regression potential if our backports
missed anything or are bad. The code isn't easy, but we've now had
three developers look at it and it looks ok.
- But then there is also the "intended regression", which is that we
now deliver error codes correctly. If there was a setup with bad I/O
errors that relied on not seeing them, this will change. With this
upload these guests will get the error reported. We can't change
this as that is the main purpose of this fix. But one would assume
that people prefer that over silent corruption.
[Other Info]
* Per comment #22, this should stay in -proposed longer than usual, for up to 14 days, to ensure that it gets extra testing.
--- original report ---
== Comment: #63 - Halil Pasic <email address hidden> - 2022-03-28 17:33:34 ==
I'm pretty confident I've figured out what is going on.
From the guest side, the decision whether the SCSI command was completed successfully or not comes down to looking at the sense data. Prior to commit
a108557bbf ("scsi: inline sg_io_sense_
build sense data as a response to seeing a host status presented by the host SCSI stack (e.g. kernel).
Thus when the kernel tells us that a given SCSI command did not get completed via
SCSI_HOST_
The guest kernel, and especially virtio and multipath are at no fault (AFAIU). Given these facts, it isn't all that surprising, that we end up with corruptions.
All we have to do is do backports for QEMU (when necessary). I didn't investigate vhost-scsi -- my guess is, that it ain't affected.
How do we want to handle the back-ports?
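The failure mode described in this comment can be modelled in a few lines. This is an illustrative sketch only, not QEMU code: the host-status values, the `complete` helper, and the choice of ABORTED COMMAND as the reported sense key are assumptions. Only the SCSI status bytes GOOD (0x00) and CHECK CONDITION (0x02) and the sense key value 0x0B are standard.

```python
from dataclasses import dataclass
from typing import Optional

# Standard SCSI status bytes.
GOOD = 0x00
CHECK_CONDITION = 0x02

# Standard SCSI sense key.
SENSE_KEY_ABORTED_COMMAND = 0x0B

# Hypothetical host-status values, used only in this sketch.
HOST_OK = 0
HOST_TRANSPORT_FAILED = 1


@dataclass
class Completion:
    status: int
    sense_key: Optional[int] = None


def complete(device_status: int, host_status: int, map_host_status: bool) -> Completion:
    """Return what the guest would see for one passthrough command."""
    if host_status != HOST_OK and map_host_status:
        # The command never completed on the host, so report a failure.
        return Completion(CHECK_CONDITION, SENSE_KEY_ABORTED_COMMAND)
    # Without the mapping, only the device status is forwarded.
    return Completion(device_status)


def guest_thinks_ok(c: Completion) -> bool:
    """The guest treats a command as successful if there is no error status or sense."""
    return c.status == GOOD and c.sense_key is None
```

In this model, a command that failed in the host transport while the device status stayed at GOOD looks successful to the guest unless the host status is translated, which matches the silent-corruption symptom reported here.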
== Comment: #66 - Halil Pasic <email address hidden> - 2022-04-04 05:36:33 ==
This is a proposed backport containing 7 patches in mbox format. I tried to pick patches sanely, and all I had to do was basically resolving merge conflicts.
I have to admit I have no extensive experience in doing such invasive backports, and my knowledge of the QEMU SCSI stack is very limited. I would be happy if the Ubuntu folks would have a good look at this, and if possible improve on it.
Related branches
- Bryce Harrington (community): Approve
- Canonical Server: Pending requested
- git-ubuntu import: Pending requested
Diff: 1135 lines (+1071/-0), 10 files modified
debian/changelog (+9/-0)
debian/patches/series (+8/-0)
debian/patches/ubuntu/lp-1967814-scsi-Add-mapping-for-generic-SCSI_HOST-status-to-sen.patch (+126/-0)
debian/patches/ubuntu/lp-1967814-scsi-Rename-linux-specific-SG_ERR-codes-to-generic-S.patch (+90/-0)
debian/patches/ubuntu/lp-1967814-scsi-disk-convert-more-errno-values-back-to-SCSI-sta.patch (+54/-0)
debian/patches/ubuntu/lp-1967814-scsi-disk-move-scsi_handle_rw_error-earlier.patch (+213/-0)
debian/patches/ubuntu/lp-1967814-scsi-disk-pass-SCSI-status-to-scsi_handle_rw_error.patch (+97/-0)
debian/patches/ubuntu/lp-1967814-scsi-fix-sense-code-for-EREMOTEIO.patch (+51/-0)
debian/patches/ubuntu/lp-1967814-scsi-inline-sg_io_sense_from_errno-into-the-callers.patch (+251/-0)
debian/patches/ubuntu/lp-1967814-scsi-introduce-scsi_sense_from_errno.patch (+172/-0)
Changed in ubuntu-z-systems:
status: New → In Progress
description: updated
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
tags: added: targetmilestone-inin2004 removed: targetmilestone-inin---