Merge ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-groovy into ubuntu/+source/qemu:ubuntu/devel
Proposed by
Rafael David Tinoco
| Status | Merged |
|---|---|
| Approved by | Rafael David Tinoco |
| Approved revision | 0afd3d59ea4c1a9c93bda2fb3e49cb9584224b11 |
| Merge reported by | Christian Ehrhardt |
| Merged at revision | 0afd3d59ea4c1a9c93bda2fb3e49cb9584224b11 |
| Proposed branch | ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-groovy |
| Merge into | ubuntu/+source/qemu:ubuntu/devel |
Diff against target: 310 lines (+279/-0), 4 files modified

- debian/changelog (+8/-0)
- debian/patches/series (+2/-0)
- debian/patches/ubuntu/lp-1805256-aio-wait_delegate_poll_aioctx_bql.patch (+110/-0)
- debian/patches/ubuntu/lp-1805256-async_use_explicit_mem_barriers.patch (+159/-0)
Related bugs:
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Christian Ehrhardt (community) | Approve | | |
| Rafael David Tinoco (community) | Approve | | |
| dann frazier (community) | Approve | | |
| Canonical Server | Pending | | |
[Impact]
* QEMU locking primitives might face a race condition in QEMU Async I/O bottom halves scheduling. This leads to a deadlock, making either QEMU or one of its tools hang indefinitely.
-> Although only a single patch was mentioned by the HWE team, checking the upstream repository I saw that there were 3 aarch64-related fixes in the same merge.
One of those is unrelated to LP: #1805256:
3. aio-posix: signal-proof fdmon-io_uring
so I am not suggesting it. The other 2 patches are:
1. aio-wait: delegate polling of main AioContext if BQL not held
   - A NULL return from qemu_get_current_aio_context() makes AIO_WAIT_WHILE()
     invoke aio_poll() directly - when doing savevm/restorevm - and that
     is incorrect if the QEMU big lock (BQL) is not held: aio_poll() should
     never run concurrently with other I/O threads.
     This is a serious issue and was merged as an aarch64 fix together with
     the one below, which is the most important for our case. There is *not*
     a 1:1 relationship with the test case, but since we will stress the AIO
     logic checking for regressions, it could be a good time for a fix like
     this as well.
     @paelser: Feel free to drop it per SRU guidelines. It would only affect
     VMs using I/O threads, allowing a race window to exist between the
     threads and the AIO polling logic.
2. async: use explicit memory barriers
   - In the following situation:

     write ctx->notify_me         write bh->scheduled
     read bh->scheduled           read ctx->notify_me
     if !bh->scheduled, sleep     if ctx->notify_me, notify

     happening within the AIO bottom half scheduler, SEQCST operations would
     be needed for the reads and writes. That could be expensive, because
     many places outside the bottom half code poll this status.
     To make this smoother and perform better, a SEQCST memory barrier was
     added between the writes and the reads of ctx->notify_me (as
     bh->scheduled is never written concurrently).
This is the EXACT condition that brought us to all this discussion upstream.