Merge ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-groovy into ubuntu/+source/qemu:ubuntu/devel

Proposed by Rafael David Tinoco
Status: Merged
Approved by: Rafael David Tinoco
Approved revision: 0afd3d59ea4c1a9c93bda2fb3e49cb9584224b11
Merge reported by: Christian Ehrhardt 
Merged at revision: 0afd3d59ea4c1a9c93bda2fb3e49cb9584224b11
Proposed branch: ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-groovy
Merge into: ubuntu/+source/qemu:ubuntu/devel
Diff against target: 310 lines (+279/-0)
4 files modified
debian/changelog (+8/-0)
debian/patches/series (+2/-0)
debian/patches/ubuntu/lp-1805256-aio-wait_delegate_poll_aioctx_bql.patch (+110/-0)
debian/patches/ubuntu/lp-1805256-async_use_explicit_mem_barriers.patch (+159/-0)
Reviewer Review Type Date Requested Status
Christian Ehrhardt  (community) Approve
Rafael David Tinoco (community) Approve
dann frazier (community) Approve
Canonical Server Pending
Review via email: mp+383530@code.launchpad.net
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

[Impact]

* QEMU locking primitives might face a race condition in QEMU Async I/O bottom halves scheduling. This leads to a deadlock, making either QEMU or one of its tools hang indefinitely.

-> Although a single patch was mentioned by the HWE team, checking the upstream repository I saw that there were 3 aarch64-related fixes in the same merge.

One of those is unrelated to LP: #1805256:

3. aio-posix: signal-proof fdmon-io_uring

So I'm not suggesting it. The other 2 patches are:

1. aio-wait: delegate polling of main AioContext if BQL not held

 - a NULL from qemu_get_current_aio_context() makes AIO_WAIT_WHILE()
   invoke aio_poll() directly - e.g. during savevm/restorevm - and that
   is incorrect if the QEMU big lock (BQL) is not held: aio_poll() should
   never run concurrently with other I/O threads.

This is a serious issue and was merged as an aarch64 fix together with the
one below, which is the most important for our case. There is *not* a 1:1
relationship with the test case, but since we will stress the AIO logic
checking for regressions, it seemed like a good time to include a fix like
this as well.

@paelzer: Feel free to drop it per SRU guidelines. It would only affect VMs
using I/O threads, where it closes a race window between those threads and
the AIO polling logic.

2. async: use explicit memory barriers

 - In the following situation:

          (polling thread)                (scheduling thread)
          write ctx->notify_me            write bh->scheduled
          read bh->scheduled              read ctx->notify_me
          if !bh->scheduled, sleep        if ctx->notify_me, notify

   happening within the AIO bottom half scheduler needs SEQCST operations
   for both the writes and the reads. That would be expensive, because many
   places outside the bottom halves also poll this status.

   To keep this fast while staying correct, a SEQCST memory barrier was
   added between the write and the read on each side, and the writes to
   ctx->notify_me were relaxed (as ctx->notify_me is never written
   concurrently by two threads).

 This is the EXACT condition that brought us to all this discussion upstream.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

BTW, I mentioned the SRU guidelines because I'm assuming the patch "aio-wait: delegate polling of main AioContext if BQL not held" will also be portable to some (or all of) the other versions, and I'll suggest it together, knowing that you can just ignore that patch if you think it's not worth it or that it could jeopardize the AIO logic. IMO the risk will be mitigated by the testing done for the other fix.

Revision history for this message
dann frazier (dannf) wrote :

+1, since these are the same source changes as in the focal package I verified in
https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/qemu/+git/qemu/+merge/383545/comments/1006846

review: Approve
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This groovy MP needs a rebase due to CVE-2020-11869 fix in 1:4.2-3ubuntu7

It otherwise looks ok to me

review: Needs Fixing
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Can you update the PPA (https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1805256/) after the rebase? I'll then run a regression test set and, if ok, sponsor it into groovy.

0afd3d5... by Rafael David Tinoco

changelog

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Done!

review: Approve
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks for the rebase; non-case-specific regression testing is a few thousand tests in and still working.
Approving the MP now.
If no issue shows up late, I'll sponsor this to -unapproved later on.

review: Approve
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Regression tests completed - sponsored to groovy

Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading qemu_4.2-3ubuntu8.dsc: done.
  Uploading qemu_4.2-3ubuntu8.debian.tar.xz: done.
  Uploading qemu_4.2-3ubuntu8_source.buildinfo: done.
  Uploading qemu_4.2-3ubuntu8_source.changes: done.
Successfully uploaded packages.

Note: tagging for git ubuntu ran into problems, but it isn't the most complex tag structure so that is ok

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Blocked on a flaky systemd test atm, otherwise LGTM.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Migrating right now ...

Preview Diff

diff --git a/debian/changelog b/debian/changelog
index 88974cf..e074202 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,11 @@
+qemu (1:4.2-3ubuntu8) groovy; urgency=medium
+
+  * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts
+    - async: use explicit memory barriers (LP: #1805256)
+    - aio-wait: delegate polling of main AioContext if BQL not held
+
+ -- Rafael David Tinoco <rafaeldtinoco@ubuntu.com>  Wed, 27 May 2020 21:47:21 +0000
+
 qemu (1:4.2-3ubuntu7) groovy; urgency=medium
 
   * SECURITY UPDATE: DoS via integer overflow in ati_2d_blt()
diff --git a/debian/patches/series b/debian/patches/series
index 9e5edf2..a61ded8 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -16,6 +16,8 @@ lp-1859527-virtio-blk-fix-out-of-bounds-access-to-bitmap-in-not.patch
 ubuntu/vhost-user-gpu-Drop-trailing-json-comma.patch
 ubuntu/lp-1847361-modules-load-upgrade.patch
 ubuntu/lp-1847361-vhost-correctly-turn-on-VIRTIO_F_IOMMU_PLATFORM.patch
+ubuntu/lp-1805256-async_use_explicit_mem_barriers.patch
+ubuntu/lp-1805256-aio-wait_delegate_poll_aioctx_bql.patch
 
 # stabilize 4.2 with patches sent to qemu-stable since 4.2 released
 stable/lp-1867519-arm-arm-powerctl-set-NSACR.-CP11-CP10-bits-in-arm_se.patch
diff --git a/debian/patches/ubuntu/lp-1805256-aio-wait_delegate_poll_aioctx_bql.patch b/debian/patches/ubuntu/lp-1805256-aio-wait_delegate_poll_aioctx_bql.patch
new file mode 100644
index 0000000..ec5d9e1
--- /dev/null
+++ b/debian/patches/ubuntu/lp-1805256-aio-wait_delegate_poll_aioctx_bql.patch
@@ -0,0 +1,110 @@
+Description: aio-wait: delegate polling of main AioContext if BQL not held
+
+Any thread that is not a iothread returns NULL for qemu_get_current_aio_context().
+As a result, it would also return true for
+in_aio_context_home_thread(qemu_get_aio_context()), causing
+AIO_WAIT_WHILE to invoke aio_poll() directly. This is incorrect
+if the BQL is not held, because aio_poll() does not expect to
+run concurrently from multiple threads, and it can actually
+happen when savevm writes to the vmstate file from the
+migration thread.
+
+Therefore, restrict in_aio_context_home_thread to return true
+for the main AioContext only if the BQL is held.
+
+The function is moved to aio-wait.h because it is mostly used
+there and to avoid a circular reference between main-loop.h
+and block/aio.h.
+
+Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+Message-Id: <20200407140746.8041-5-pbonzini@redhat.com>
+Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+
+Author: Paolo Bonzini <pbonzini@redhat.com>
+Origin: upstream, https://github.com/qemu/qemu/commit/3c18a92dc
+Bug: https://launchpad.net/bugs/1805256
+Bug-Ubuntu: https://launchpad.net/bugs/1805256
+Reviewed-By: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
+Last-Update: 2020-05-06
+
+--- qemu-4.2.orig/include/block/aio-wait.h
++++ qemu-4.2/include/block/aio-wait.h
+@@ -26,6 +26,7 @@
+ #define QEMU_AIO_WAIT_H
+
+ #include "block/aio.h"
++#include "qemu/main-loop.h"
+
+ /**
+  * AioWait:
+@@ -124,4 +125,25 @@ void aio_wait_kick(void);
+  */
+ void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
+
++/**
++ * in_aio_context_home_thread:
++ * @ctx: the aio context
++ *
++ * Return whether we are running in the thread that normally runs @ctx. Note
++ * that acquiring/releasing ctx does not affect the outcome, each AioContext
++ * still only has one home thread that is responsible for running it.
++ */
++static inline bool in_aio_context_home_thread(AioContext *ctx)
++{
++    if (ctx == qemu_get_current_aio_context()) {
++        return true;
++    }
++
++    if (ctx == qemu_get_aio_context()) {
++        return qemu_mutex_iothread_locked();
++    } else {
++        return false;
++    }
++}
++
+ #endif /* QEMU_AIO_WAIT_H */
+--- qemu-4.2.orig/include/block/aio.h
++++ qemu-4.2/include/block/aio.h
+@@ -60,12 +60,16 @@ struct AioContext {
+     QLIST_HEAD(, AioHandler) aio_handlers;
+
+     /* Used to avoid unnecessary event_notifier_set calls in aio_notify;
+-     * accessed with atomic primitives. If this field is 0, everything
+-     * (file descriptors, bottom halves, timers) will be re-evaluated
+-     * before the next blocking poll(), thus the event_notifier_set call
+-     * can be skipped. If it is non-zero, you may need to wake up a
+-     * concurrent aio_poll or the glib main event loop, making
+-     * event_notifier_set necessary.
++     * only written from the AioContext home thread, or under the BQL in
++     * the case of the main AioContext. However, it is read from any
++     * thread so it is still accessed with atomic primitives.
++     *
++     * If this field is 0, everything (file descriptors, bottom halves,
++     * timers) will be re-evaluated before the next blocking poll() or
++     * io_uring wait; therefore, the event_notifier_set call can be
++     * skipped. If it is non-zero, you may need to wake up a concurrent
++     * aio_poll or the glib main event loop, making event_notifier_set
++     * necessary.
+      *
+      * Bit 0 is reserved for GSource usage of the AioContext, and is 1
+      * between a call to aio_ctx_prepare and the next call to aio_ctx_check.
+@@ -581,19 +585,6 @@ void aio_co_enter(AioContext *ctx, struc
+ AioContext *qemu_get_current_aio_context(void);
+
+ /**
+- * in_aio_context_home_thread:
+- * @ctx: the aio context
+- *
+- * Return whether we are running in the thread that normally runs @ctx. Note
+- * that acquiring/releasing ctx does not affect the outcome, each AioContext
+- * still only has one home thread that is responsible for running it.
+- */
+-static inline bool in_aio_context_home_thread(AioContext *ctx)
+-{
+-    return ctx == qemu_get_current_aio_context();
+-}
+-
+-/**
+  * aio_context_setup:
+  * @ctx: the aio context
+  *
diff --git a/debian/patches/ubuntu/lp-1805256-async_use_explicit_mem_barriers.patch b/debian/patches/ubuntu/lp-1805256-async_use_explicit_mem_barriers.patch
new file mode 100644
index 0000000..3ed09e1
--- /dev/null
+++ b/debian/patches/ubuntu/lp-1805256-async_use_explicit_mem_barriers.patch
@@ -0,0 +1,159 @@
+Description: async: use explicit memory barriers
+
+When using C11 atomics, non-seqcst reads and writes do not participate
+in the total order of seqcst operations. In util/async.c and util/aio-posix.c,
+in particular, the pattern that we use
+
+    write ctx->notify_me                 write bh->scheduled
+    read bh->scheduled                   read ctx->notify_me
+    if !bh->scheduled, sleep             if ctx->notify_me, notify
+
+needs to use seqcst operations for both the write and the read. In
+general this is something that we do not want, because there can be
+many sources that are polled in addition to bottom halves. The
+alternative is to place a seqcst memory barrier between the write
+and the read. This also comes with a disadvantage, in that the
+memory barrier is implicit on strongly-ordered architectures and
+it wastes a few dozen clock cycles.
+
+Fortunately, ctx->notify_me is never written concurrently by two
+threads, so we can assert that and relax the writes to ctx->notify_me.
+The resulting solution works and performs well on both aarch64 and x86.
+
+Note that the atomic_set/atomic_read combination is not an atomic
+read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED;
+on x86, ATOMIC_RELAXED compiles to a locked operation.
+
+Analyzed-by: Ying Fang <fangying1@huawei.com>
+Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+Tested-by: Ying Fang <fangying1@huawei.com>
+Message-Id: <20200407140746.8041-6-pbonzini@redhat.com>
+Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+
+Author: Paolo Bonzini <pbonzini@redhat.com>
+Origin: upstream, https://github.com/qemu/qemu/commit/5710a3e09f
+Bug: https://bugs.launchpad.net/bugs/1805256
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1805256
+Reviewed-By: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
+Last-Update: 2020-05-06
+
+--- qemu-4.2.orig/util/aio-posix.c
++++ qemu-4.2/util/aio-posix.c
+@@ -616,6 +616,11 @@ bool aio_poll(AioContext *ctx, bool bloc
+     int64_t timeout;
+     int64_t start = 0;
+
++    /*
++     * There cannot be two concurrent aio_poll calls for the same AioContext (or
++     * an aio_poll concurrent with a GSource prepare/check/dispatch callback).
++     * We rely on this below to avoid slow locked accesses to ctx->notify_me.
++     */
+     assert(in_aio_context_home_thread(ctx));
+
+     /* aio_notify can avoid the expensive event_notifier_set if
+@@ -626,7 +631,13 @@ bool aio_poll(AioContext *ctx, bool bloc
+      * so disable the optimization now.
+      */
+     if (blocking) {
+-        atomic_add(&ctx->notify_me, 2);
++        atomic_set(&ctx->notify_me, atomic_read(&ctx->notify_me) + 2);
++        /*
++         * Write ctx->notify_me before computing the timeout
++         * (reading bottom half flags, etc.). Pairs with
++         * smp_mb in aio_notify().
++         */
++        smp_mb();
+     }
+
+     qemu_lockcnt_inc(&ctx->list_lock);
+@@ -671,7 +682,8 @@ bool aio_poll(AioContext *ctx, bool bloc
+     }
+
+     if (blocking) {
+-        atomic_sub(&ctx->notify_me, 2);
++        /* Finish the poll before clearing the flag. */
++        atomic_store_release(&ctx->notify_me, atomic_read(&ctx->notify_me) - 2);
+         aio_notify_accept(ctx);
+     }
+
+--- qemu-4.2.orig/util/aio-win32.c
++++ qemu-4.2/util/aio-win32.c
+@@ -321,6 +321,12 @@ bool aio_poll(AioContext *ctx, bool bloc
+     int count;
+     int timeout;
+
++    /*
++     * There cannot be two concurrent aio_poll calls for the same AioContext (or
++     * an aio_poll concurrent with a GSource prepare/check/dispatch callback).
++     * We rely on this below to avoid slow locked accesses to ctx->notify_me.
++     */
++    assert(in_aio_context_home_thread(ctx));
+     progress = false;
+
+     /* aio_notify can avoid the expensive event_notifier_set if
+@@ -331,7 +337,13 @@ bool aio_poll(AioContext *ctx, bool bloc
+      * so disable the optimization now.
+      */
+     if (blocking) {
+-        atomic_add(&ctx->notify_me, 2);
++        atomic_set(&ctx->notify_me, atomic_read(&ctx->notify_me) + 2);
++        /*
++         * Write ctx->notify_me before computing the timeout
++         * (reading bottom half flags, etc.). Pairs with
++         * smp_mb in aio_notify().
++         */
++        smp_mb();
+     }
+
+     qemu_lockcnt_inc(&ctx->list_lock);
+@@ -364,8 +376,7 @@ bool aio_poll(AioContext *ctx, bool bloc
+         ret = WaitForMultipleObjects(count, events, FALSE, timeout);
+         if (blocking) {
+             assert(first);
+-            assert(in_aio_context_home_thread(ctx));
+-            atomic_sub(&ctx->notify_me, 2);
++            atomic_store_release(&ctx->notify_me, atomic_read(&ctx->notify_me) - 2);
+             aio_notify_accept(ctx);
+         }
+
+--- qemu-4.2.orig/util/async.c
++++ qemu-4.2/util/async.c
+@@ -220,7 +220,14 @@ aio_ctx_prepare(GSource *source, gint
+ {
+     AioContext *ctx = (AioContext *) source;
+
+-    atomic_or(&ctx->notify_me, 1);
++    atomic_set(&ctx->notify_me, atomic_read(&ctx->notify_me) | 1);
++
++    /*
++     * Write ctx->notify_me before computing the timeout
++     * (reading bottom half flags, etc.). Pairs with
++     * smp_mb in aio_notify().
++     */
++    smp_mb();
+
+     /* We assume there is no timeout already supplied */
+     *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
+@@ -238,7 +245,8 @@ aio_ctx_check(GSource *source)
+     AioContext *ctx = (AioContext *) source;
+     QEMUBH *bh;
+
+-    atomic_and(&ctx->notify_me, ~1);
++    /* Finish computing the timeout before clearing the flag. */
++    atomic_store_release(&ctx->notify_me, atomic_read(&ctx->notify_me) & ~1);
+     aio_notify_accept(ctx);
+
+     for (bh = ctx->first_bh; bh; bh = bh->next) {
+@@ -343,10 +351,10 @@ LinuxAioState *aio_get_linux_aio(AioCont
+ void aio_notify(AioContext *ctx)
+ {
+     /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
+-     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
++     * with smp_mb in aio_ctx_prepare or aio_poll.
+      */
+     smp_mb();
+-    if (ctx->notify_me) {
++    if (atomic_read(&ctx->notify_me)) {
+         event_notifier_set(&ctx->notifier);
+         atomic_mb_set(&ctx->notified, true);
+     }
