Merge ~sergiodj/ubuntu/+source/qemu:tcg-crash-mantic into ubuntu/+source/qemu:ubuntu/mantic-devel
Status: Merged
Approved by: git-ubuntu bot
Approved revision: not available
Merge reported by: Andreas Hasenack
Merged at revision: 289556433dc6174924c419625887e129e69b3f5f
Proposed branch: ~sergiodj/ubuntu/+source/qemu:tcg-crash-mantic
Merge into: ubuntu/+source/qemu:ubuntu/mantic-devel
Diff against target: 789 lines (+755/-0), 5 files modified:
  debian/changelog (+7/-0)
  debian/patches/series (+3/-0)
  debian/patches/ubuntu/lp-2051965-accel-tcg-Always-lock-pages-before-translation.patch (+621/-0)
  debian/patches/ubuntu/lp-2051965-accel-tcg-Clear-tcg_ctx-gen_tb-on-buffer-overflow.patch (+34/-0)
  debian/patches/ubuntu/lp-2051965-accel-tcg-Split-out-cpu_exec_longjmp_cleanup.patch (+90/-0)
Related bugs:

Reviewer | Review Type | Date Requested | Status
---|---|---|---
git-ubuntu bot | | | Approve
Christian Ehrhardt (community) | | | Approve
Canonical Server Reporter | | | Pending

Review via email:
Commit message
Description of the change
This MP fixes QEMU bug #2051965 on Mantic.
This is a tricky bug; take a look at the Test Plan to see how to reproduce it. There will be SRUs for both Jammy and Mantic; the former will be more complex to review, while the latter is more straightforward (although not without its own complexities). As of this writing, the Jammy SRU is not ready because I'm still working on backporting the needed patches.
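For a rough sense of the reproducer (the authoritative steps live in the bug's Test Plan): boot a guest whose kernel is >= 6.3 under TCG and let it run until QEMU crashes, e.g. something along these lines, where mantic-guest.qcow2 is a placeholder image name:

$ qemu-system-x86_64 -accel tcg -m 2048 -nographic \
    -drive file=mantic-guest.qcow2,format=qcow2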
I'm particularly interested in a double-check of the three patches being backported to fix the bug. Please let me know if you spot any problems or missing patches.
PPA: https:/
dep8 results will be posted soon.

git-ubuntu bot (git-ubuntu-bot) wrote:
Approvers: sergiodj, paelzer
Uploaders: sergiodj, paelzer
MP auto-approved

Sergio Durigan Junior (sergiodj) wrote:
On Wednesday, February 14 2024, Christian Ehrhardt wrote:
> Review: Approve
Thanks for the review, Christian.
> Some trivial findings first:
>
> - patches are in debian/
Ah, indeed. Thanks, I'll fix it.
> - some places mention kernel "6.13" while it is "6.3"
Doh. Fixed.
> - avoid some ambiguities by stating clearly that the kernel >=6.3 is meant to be the one in the guest, otherwise some might think "so what, 6.3 isn't supported in jammy/mantic anyway" or similar
Done.
> I can see why you said Jammy backport might be more complex, there are
> probably plenty of changes to that code. My gut says that you'd be ok
> to backport quite some more changes (instead of only adapting those
> three patches) if you need it, but that is up to you as you see fit.
Yeah, I've been collecting the "adjacent" patches that will need to be
backported on Jammy. I think it should be fine at the end of the day.
> +1 on this one with the minor things fixed.
Thanks. I'll wait until I have the Jammy MP ready before I upload this
one.
> For the "where issues occur" in the SRU statement, the most important bit is that this seems purely in tcg code, so we can assume that HW-virt will not be affected ever.
ACK. I'm putting off writing the rest of the SRU template until I have
a clear picture of what the Jammy MP will be.
Cheers,
--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

Sergio Durigan Junior (sergiodj) wrote:
On Wednesday, February 14 2024, Sergio Durigan Junior wrote:
>> For the "where issues occur" in the SRU statement, the most important bit is that this seems purely in tcg code, so we can assume that HW-virt will not be affected ever.
>
> ACK. I'm putting off writing the rest of the SRU template until I have
> a clear picture of what the Jammy MP will be.
The Jammy fix will take a bit longer, so I'm going ahead and uploading
this one. The reporter is affected by the Mantic bug, so hopefully this
will unblock him while I figure out the Jammy fix.
Uploaded:
$ dput qemu_8.
Trying to upload package to ubuntu
Checking signature on .changes
gpg: /home/sergio/
Checking signature on .dsc
gpg: /home/sergio/
Uploading to ubuntu (via ftp to upload.ubuntu.com):
Uploading qemu_8.
Uploading qemu_8.
Uploading qemu_8.
Uploading qemu_8.
Successfully uploaded packages.
--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

Andreas Hasenack (ahasenack) wrote:
This was released already, marking as merged.
Preview Diff
1 | diff --git a/debian/changelog b/debian/changelog |
2 | index fe1973e..e055d38 100644 |
3 | --- a/debian/changelog |
4 | +++ b/debian/changelog |
5 | @@ -1,3 +1,10 @@ |
6 | +qemu (1:8.0.4+dfsg-1ubuntu3.23.10.3) mantic; urgency=medium |
7 | + |
8 | + * d/p/u/lp-2051965-*.patch: Fix QEMU crash when using TCG acceleration |
9 | + with guest Linux kernel >= 6.3. (LP: #2051965) |
10 | + |
11 | + -- Sergio Durigan Junior <sergio.durigan@canonical.com> Tue, 13 Feb 2024 18:26:17 -0500 |
12 | + |
13 | qemu (1:8.0.4+dfsg-1ubuntu3.23.10.2) mantic-security; urgency=medium |
14 | |
15 | * SECURITY UPDATE: OOB read in RDMA device |
16 | diff --git a/debian/patches/series b/debian/patches/series |
17 | index 75c6b02..553651c 100644 |
18 | --- a/debian/patches/series |
19 | +++ b/debian/patches/series |
20 | @@ -36,3 +36,6 @@ CVE-2023-40360.patch |
21 | CVE-2023-4135.patch |
22 | CVE-2023-42467.patch |
23 | CVE-2023-5088.patch |
24 | +ubuntu/lp-2051965-accel-tcg-Split-out-cpu_exec_longjmp_cleanup.patch |
25 | +ubuntu/lp-2051965-accel-tcg-Always-lock-pages-before-translation.patch |
26 | +ubuntu/lp-2051965-accel-tcg-Clear-tcg_ctx-gen_tb-on-buffer-overflow.patch |
27 | diff --git a/debian/patches/ubuntu/lp-2051965-accel-tcg-Always-lock-pages-before-translation.patch b/debian/patches/ubuntu/lp-2051965-accel-tcg-Always-lock-pages-before-translation.patch |
28 | new file mode 100644 |
29 | index 0000000..2d567b0 |
30 | --- /dev/null |
31 | +++ b/debian/patches/ubuntu/lp-2051965-accel-tcg-Always-lock-pages-before-translation.patch |
32 | @@ -0,0 +1,621 @@ |
33 | +From deba78709ae8ce103e2248413857747f804cd1ef Mon Sep 17 00:00:00 2001 |
34 | +From: Richard Henderson <richard.henderson@linaro.org> |
35 | +Date: Thu, 6 Jul 2023 17:55:48 +0100 |
36 | +Subject: [PATCH] accel/tcg: Always lock pages before translation |
37 | + |
38 | +We had done this for user-mode by invoking page_protect |
39 | +within the translator loop. Extend this to handle system |
40 | +mode as well. Move page locking out of tb_link_page. |
41 | + |
42 | +Reported-by: Liren Wei <lrwei@bupt.edu.cn> |
43 | +Reported-by: Richard W.M. Jones <rjones@redhat.com> |
44 | +Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
45 | +Tested-by: Richard W.M. Jones <rjones@redhat.com> |
46 | + |
47 | +Origin: backport, https://gitlab.com/qemu-project/qemu/commit/deba78709a |
48 | +Bug-Ubuntu: https://bugs.launchpad.net/bugs/2051965 |
49 | +Last-Update: 2024-02-13 |
50 | + |
51 | +--- |
52 | + accel/tcg/cpu-exec.c | 20 ++++ |
53 | + accel/tcg/internal.h | 30 ++++- |
54 | + accel/tcg/tb-maint.c | 244 ++++++++++++++++++++------------------ |
55 | + accel/tcg/translate-all.c | 43 ++++++- |
56 | + accel/tcg/translator.c | 34 ++++-- |
57 | + 5 files changed, 237 insertions(+), 134 deletions(-) |
58 | + |
59 | +--- qemu-tcg-crash-mantic.orig/accel/tcg/cpu-exec.c 2024-02-13 18:18:45.276155938 -0500 |
60 | ++++ qemu-tcg-crash-mantic/accel/tcg/cpu-exec.c 2024-02-13 18:18:45.276155938 -0500 |
61 | +@@ -539,6 +539,26 @@ static void cpu_exec_longjmp_cleanup(CPU |
62 | + if (have_mmap_lock()) { |
63 | + mmap_unlock(); |
64 | + } |
65 | ++#else |
66 | ++ /* |
67 | ++ * For softmmu, a tlb_fill fault during translation will land here, |
68 | ++ * and we need to release any page locks held. In system mode we |
69 | ++ * have one tcg_ctx per thread, so we know it was this cpu doing |
70 | ++ * the translation. |
71 | ++ * |
72 | ++ * Alternative 1: Install a cleanup to be called via an exception |
73 | ++ * handling safe longjmp. It seems plausible that all our hosts |
74 | ++ * support such a thing. We'd have to properly register unwind info |
75 | ++ * for the JIT for EH, rather that just for GDB. |
76 | ++ * |
77 | ++ * Alternative 2: Set and restore cpu->jmp_env in tb_gen_code to |
78 | ++ * capture the cpu_loop_exit longjmp, perform the cleanup, and |
79 | ++ * jump again to arrive here. |
80 | ++ */ |
81 | ++ if (tcg_ctx->gen_tb) { |
82 | ++ tb_unlock_pages(tcg_ctx->gen_tb); |
83 | ++ tcg_ctx->gen_tb = NULL; |
84 | ++ } |
85 | + #endif |
86 | + if (qemu_mutex_iothread_locked()) { |
87 | + qemu_mutex_unlock_iothread(); |
88 | +--- qemu-tcg-crash-mantic.orig/accel/tcg/internal.h 2024-02-13 18:18:45.276155938 -0500 |
89 | ++++ qemu-tcg-crash-mantic/accel/tcg/internal.h 2024-02-13 18:18:45.276155938 -0500 |
90 | +@@ -10,6 +10,7 @@ |
91 | + #define ACCEL_TCG_INTERNAL_H |
92 | + |
93 | + #include "exec/exec-all.h" |
94 | ++#include "exec/translate-all.h" |
95 | + |
96 | + /* |
97 | + * Access to the various translations structures need to be serialised |
98 | +@@ -35,6 +36,32 @@ static inline void page_table_config_ini |
99 | + void page_table_config_init(void); |
100 | + #endif |
101 | + |
102 | ++#ifdef CONFIG_USER_ONLY |
103 | ++/* |
104 | ++ * For user-only, page_protect sets the page read-only. |
105 | ++ * Since most execution is already on read-only pages, and we'd need to |
106 | ++ * account for other TBs on the same page, defer undoing any page protection |
107 | ++ * until we receive the write fault. |
108 | ++ */ |
109 | ++static inline void tb_lock_page0(tb_page_addr_t p0) |
110 | ++{ |
111 | ++ page_protect(p0); |
112 | ++} |
113 | ++ |
114 | ++static inline void tb_lock_page1(tb_page_addr_t p0, tb_page_addr_t p1) |
115 | ++{ |
116 | ++ page_protect(p1); |
117 | ++} |
118 | ++ |
119 | ++static inline void tb_unlock_page1(tb_page_addr_t p0, tb_page_addr_t p1) { } |
120 | ++static inline void tb_unlock_pages(TranslationBlock *tb) { } |
121 | ++#else |
122 | ++void tb_lock_page0(tb_page_addr_t); |
123 | ++void tb_lock_page1(tb_page_addr_t, tb_page_addr_t); |
124 | ++void tb_unlock_page1(tb_page_addr_t, tb_page_addr_t); |
125 | ++void tb_unlock_pages(TranslationBlock *); |
126 | ++#endif |
127 | ++ |
128 | + #ifdef CONFIG_SOFTMMU |
129 | + void tb_invalidate_phys_range_fast(ram_addr_t ram_addr, |
130 | + unsigned size, |
131 | +@@ -48,8 +75,7 @@ TranslationBlock *tb_gen_code(CPUState * |
132 | + void page_init(void); |
133 | + void tb_htable_init(void); |
134 | + void tb_reset_jump(TranslationBlock *tb, int n); |
135 | +-TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, |
136 | +- tb_page_addr_t phys_page2); |
137 | ++TranslationBlock *tb_link_page(TranslationBlock *tb); |
138 | + bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc); |
139 | + void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb, |
140 | + uintptr_t host_pc); |
141 | +--- qemu-tcg-crash-mantic.orig/accel/tcg/tb-maint.c 2024-02-13 18:18:45.276155938 -0500 |
142 | ++++ qemu-tcg-crash-mantic/accel/tcg/tb-maint.c 2024-02-13 18:20:49.096346316 -0500 |
143 | +@@ -71,17 +71,7 @@ typedef struct PageDesc PageDesc; |
144 | + */ |
145 | + #define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock()) |
146 | + |
147 | +-static inline void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1, |
148 | +- PageDesc **ret_p2, tb_page_addr_t phys2, |
149 | +- bool alloc) |
150 | +-{ |
151 | +- *ret_p1 = NULL; |
152 | +- *ret_p2 = NULL; |
153 | +-} |
154 | +- |
155 | +-static inline void page_unlock(PageDesc *pd) { } |
156 | +-static inline void page_lock_tb(const TranslationBlock *tb) { } |
157 | +-static inline void page_unlock_tb(const TranslationBlock *tb) { } |
158 | ++static inline void tb_lock_pages(const TranslationBlock *tb) { } |
159 | + |
160 | + /* |
161 | + * For user-only, since we are protecting all of memory with a single lock, |
162 | +@@ -97,7 +87,7 @@ static void tb_remove_all(void) |
163 | + } |
164 | + |
165 | + /* Call with mmap_lock held. */ |
166 | +-static void tb_record(TranslationBlock *tb, PageDesc *p1, PageDesc *p2) |
167 | ++static void tb_record(TranslationBlock *tb) |
168 | + { |
169 | + target_ulong addr; |
170 | + int flags; |
171 | +@@ -392,12 +382,108 @@ static void page_lock(PageDesc *pd) |
172 | + qemu_spin_lock(&pd->lock); |
173 | + } |
174 | + |
175 | ++/* Like qemu_spin_trylock, returns false on success */ |
176 | ++static bool page_trylock(PageDesc *pd) |
177 | ++{ |
178 | ++ bool busy = qemu_spin_trylock(&pd->lock); |
179 | ++ if (!busy) { |
180 | ++ page_lock__debug(pd); |
181 | ++ } |
182 | ++ return busy; |
183 | ++} |
184 | ++ |
185 | + static void page_unlock(PageDesc *pd) |
186 | + { |
187 | + qemu_spin_unlock(&pd->lock); |
188 | + page_unlock__debug(pd); |
189 | + } |
190 | + |
191 | ++void tb_lock_page0(tb_page_addr_t paddr) |
192 | ++{ |
193 | ++ page_lock(page_find_alloc(paddr >> TARGET_PAGE_BITS, true)); |
194 | ++} |
195 | ++ |
196 | ++void tb_lock_page1(tb_page_addr_t paddr0, tb_page_addr_t paddr1) |
197 | ++{ |
198 | ++ tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS; |
199 | ++ tb_page_addr_t pindex1 = paddr1 >> TARGET_PAGE_BITS; |
200 | ++ PageDesc *pd0, *pd1; |
201 | ++ |
202 | ++ if (pindex0 == pindex1) { |
203 | ++ /* Identical pages, and the first page is already locked. */ |
204 | ++ return; |
205 | ++ } |
206 | ++ |
207 | ++ pd1 = page_find_alloc(pindex1, true); |
208 | ++ if (pindex0 < pindex1) { |
209 | ++ /* Correct locking order, we may block. */ |
210 | ++ page_lock(pd1); |
211 | ++ return; |
212 | ++ } |
213 | ++ |
214 | ++ /* Incorrect locking order, we cannot block lest we deadlock. */ |
215 | ++ if (!page_trylock(pd1)) { |
216 | ++ return; |
217 | ++ } |
218 | ++ |
219 | ++ /* |
220 | ++ * Drop the lock on page0 and get both page locks in the right order. |
221 | ++ * Restart translation via longjmp. |
222 | ++ */ |
223 | ++ pd0 = page_find_alloc(pindex0, false); |
224 | ++ page_unlock(pd0); |
225 | ++ page_lock(pd1); |
226 | ++ page_lock(pd0); |
227 | ++ siglongjmp(tcg_ctx->jmp_trans, -3); |
228 | ++} |
229 | ++ |
230 | ++void tb_unlock_page1(tb_page_addr_t paddr0, tb_page_addr_t paddr1) |
231 | ++{ |
232 | ++ tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS; |
233 | ++ tb_page_addr_t pindex1 = paddr1 >> TARGET_PAGE_BITS; |
234 | ++ |
235 | ++ if (pindex0 != pindex1) { |
236 | ++ page_unlock(page_find_alloc(pindex1, false)); |
237 | ++ } |
238 | ++} |
239 | ++ |
240 | ++static void tb_lock_pages(TranslationBlock *tb) |
241 | ++{ |
242 | ++ tb_page_addr_t paddr0 = tb_page_addr0(tb); |
243 | ++ tb_page_addr_t paddr1 = tb_page_addr1(tb); |
244 | ++ tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS; |
245 | ++ tb_page_addr_t pindex1 = paddr1 >> TARGET_PAGE_BITS; |
246 | ++ |
247 | ++ if (unlikely(paddr0 == -1)) { |
248 | ++ return; |
249 | ++ } |
250 | ++ if (unlikely(paddr1 != -1) && pindex0 != pindex1) { |
251 | ++ if (pindex0 < pindex1) { |
252 | ++ page_lock(page_find_alloc(pindex0, true)); |
253 | ++ page_lock(page_find_alloc(pindex1, true)); |
254 | ++ return; |
255 | ++ } |
256 | ++ page_lock(page_find_alloc(pindex1, true)); |
257 | ++ } |
258 | ++ page_lock(page_find_alloc(pindex0, true)); |
259 | ++} |
260 | ++ |
261 | ++void tb_unlock_pages(TranslationBlock *tb) |
262 | ++{ |
263 | ++ tb_page_addr_t paddr0 = tb_page_addr0(tb); |
264 | ++ tb_page_addr_t paddr1 = tb_page_addr1(tb); |
265 | ++ tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS; |
266 | ++ tb_page_addr_t pindex1 = paddr1 >> TARGET_PAGE_BITS; |
267 | ++ |
268 | ++ if (unlikely(paddr0 == -1)) { |
269 | ++ return; |
270 | ++ } |
271 | ++ if (unlikely(paddr1 != -1) && pindex0 != pindex1) { |
272 | ++ page_unlock(page_find_alloc(pindex1, false)); |
273 | ++ } |
274 | ++ page_unlock(page_find_alloc(pindex0, false)); |
275 | ++} |
276 | ++ |
277 | + static inline struct page_entry * |
278 | + page_entry_new(PageDesc *pd, tb_page_addr_t index) |
279 | + { |
280 | +@@ -421,13 +507,10 @@ static void page_entry_destroy(gpointer |
281 | + /* returns false on success */ |
282 | + static bool page_entry_trylock(struct page_entry *pe) |
283 | + { |
284 | +- bool busy; |
285 | +- |
286 | +- busy = qemu_spin_trylock(&pe->pd->lock); |
287 | ++ bool busy = page_trylock(pe->pd); |
288 | + if (!busy) { |
289 | + g_assert(!pe->locked); |
290 | + pe->locked = true; |
291 | +- page_lock__debug(pe->pd); |
292 | + } |
293 | + return busy; |
294 | + } |
295 | +@@ -605,8 +688,7 @@ static void tb_remove_all(void) |
296 | + * Add the tb in the target page and protect it if necessary. |
297 | + * Called with @p->lock held. |
298 | + */ |
299 | +-static inline void tb_page_add(PageDesc *p, TranslationBlock *tb, |
300 | +- unsigned int n) |
301 | ++static void tb_page_add(PageDesc *p, TranslationBlock *tb, unsigned int n) |
302 | + { |
303 | + bool page_already_protected; |
304 | + |
305 | +@@ -626,15 +708,21 @@ static inline void tb_page_add(PageDesc |
306 | + } |
307 | + } |
308 | + |
309 | +-static void tb_record(TranslationBlock *tb, PageDesc *p1, PageDesc *p2) |
310 | ++static void tb_record(TranslationBlock *tb) |
311 | + { |
312 | +- tb_page_add(p1, tb, 0); |
313 | +- if (unlikely(p2)) { |
314 | +- tb_page_add(p2, tb, 1); |
315 | ++ tb_page_addr_t paddr0 = tb_page_addr0(tb); |
316 | ++ tb_page_addr_t paddr1 = tb_page_addr1(tb); |
317 | ++ tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS; |
318 | ++ tb_page_addr_t pindex1 = paddr0 >> TARGET_PAGE_BITS; |
319 | ++ |
320 | ++ assert(paddr0 != -1); |
321 | ++ if (unlikely(paddr1 != -1) && pindex0 != pindex1) { |
322 | ++ tb_page_add(page_find_alloc(pindex1, false), tb, 1); |
323 | + } |
324 | ++ tb_page_add(page_find_alloc(pindex0, false), tb, 0); |
325 | + } |
326 | + |
327 | +-static inline void tb_page_remove(PageDesc *pd, TranslationBlock *tb) |
328 | ++static void tb_page_remove(PageDesc *pd, TranslationBlock *tb) |
329 | + { |
330 | + TranslationBlock *tb1; |
331 | + uintptr_t *pprev; |
332 | +@@ -654,74 +742,16 @@ static inline void tb_page_remove(PageDe |
333 | + |
334 | + static void tb_remove(TranslationBlock *tb) |
335 | + { |
336 | +- PageDesc *pd; |
337 | +- |
338 | +- pd = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS); |
339 | +- tb_page_remove(pd, tb); |
340 | +- if (unlikely(tb->page_addr[1] != -1)) { |
341 | +- pd = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS); |
342 | +- tb_page_remove(pd, tb); |
343 | +- } |
344 | +-} |
345 | +- |
346 | +-static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1, |
347 | +- PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc) |
348 | +-{ |
349 | +- PageDesc *p1, *p2; |
350 | +- tb_page_addr_t page1; |
351 | +- tb_page_addr_t page2; |
352 | +- |
353 | +- assert_memory_lock(); |
354 | +- g_assert(phys1 != -1); |
355 | +- |
356 | +- page1 = phys1 >> TARGET_PAGE_BITS; |
357 | +- page2 = phys2 >> TARGET_PAGE_BITS; |
358 | +- |
359 | +- p1 = page_find_alloc(page1, alloc); |
360 | +- if (ret_p1) { |
361 | +- *ret_p1 = p1; |
362 | +- } |
363 | +- if (likely(phys2 == -1)) { |
364 | +- page_lock(p1); |
365 | +- return; |
366 | +- } else if (page1 == page2) { |
367 | +- page_lock(p1); |
368 | +- if (ret_p2) { |
369 | +- *ret_p2 = p1; |
370 | +- } |
371 | +- return; |
372 | +- } |
373 | +- p2 = page_find_alloc(page2, alloc); |
374 | +- if (ret_p2) { |
375 | +- *ret_p2 = p2; |
376 | +- } |
377 | +- if (page1 < page2) { |
378 | +- page_lock(p1); |
379 | +- page_lock(p2); |
380 | +- } else { |
381 | +- page_lock(p2); |
382 | +- page_lock(p1); |
383 | +- } |
384 | +-} |
385 | +- |
386 | +-/* lock the page(s) of a TB in the correct acquisition order */ |
387 | +-static void page_lock_tb(const TranslationBlock *tb) |
388 | +-{ |
389 | +- page_lock_pair(NULL, tb_page_addr0(tb), NULL, tb_page_addr1(tb), false); |
390 | +-} |
391 | +- |
392 | +-static void page_unlock_tb(const TranslationBlock *tb) |
393 | +-{ |
394 | +- PageDesc *p1 = page_find(tb_page_addr0(tb) >> TARGET_PAGE_BITS); |
395 | +- |
396 | +- page_unlock(p1); |
397 | +- if (unlikely(tb_page_addr1(tb) != -1)) { |
398 | +- PageDesc *p2 = page_find(tb_page_addr1(tb) >> TARGET_PAGE_BITS); |
399 | +- |
400 | +- if (p2 != p1) { |
401 | +- page_unlock(p2); |
402 | +- } |
403 | ++ tb_page_addr_t paddr0 = tb_page_addr0(tb); |
404 | ++ tb_page_addr_t paddr1 = tb_page_addr1(tb); |
405 | ++ tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS; |
406 | ++ tb_page_addr_t pindex1 = paddr0 >> TARGET_PAGE_BITS; |
407 | ++ |
408 | ++ assert(paddr0 != -1); |
409 | ++ if (unlikely(paddr1 != -1) && pindex0 != pindex1) { |
410 | ++ tb_page_remove(page_find_alloc(pindex1, false), tb); |
411 | + } |
412 | ++ tb_page_remove(page_find_alloc(pindex0, false), tb); |
413 | + } |
414 | + #endif /* CONFIG_USER_ONLY */ |
415 | + |
416 | +@@ -926,18 +956,16 @@ static void tb_phys_invalidate__locked(T |
417 | + void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr) |
418 | + { |
419 | + if (page_addr == -1 && tb_page_addr0(tb) != -1) { |
420 | +- page_lock_tb(tb); |
421 | ++ tb_lock_pages(tb); |
422 | + do_tb_phys_invalidate(tb, true); |
423 | +- page_unlock_tb(tb); |
424 | ++ tb_unlock_pages(tb); |
425 | + } else { |
426 | + do_tb_phys_invalidate(tb, false); |
427 | + } |
428 | + } |
429 | + |
430 | + /* |
431 | +- * Add a new TB and link it to the physical page tables. phys_page2 is |
432 | +- * (-1) to indicate that only one page contains the TB. |
433 | +- * |
434 | ++ * Add a new TB and link it to the physical page tables. |
435 | + * Called with mmap_lock held for user-mode emulation. |
436 | + * |
437 | + * Returns a pointer @tb, or a pointer to an existing TB that matches @tb. |
438 | +@@ -945,43 +973,29 @@ void tb_phys_invalidate(TranslationBlock |
439 | + * for the same block of guest code that @tb corresponds to. In that case, |
440 | + * the caller should discard the original @tb, and use instead the returned TB. |
441 | + */ |
442 | +-TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, |
443 | +- tb_page_addr_t phys_page2) |
444 | ++TranslationBlock *tb_link_page(TranslationBlock *tb) |
445 | + { |
446 | +- PageDesc *p; |
447 | +- PageDesc *p2 = NULL; |
448 | + void *existing_tb = NULL; |
449 | + uint32_t h; |
450 | + |
451 | + assert_memory_lock(); |
452 | + tcg_debug_assert(!(tb->cflags & CF_INVALID)); |
453 | + |
454 | +- /* |
455 | +- * Add the TB to the page list, acquiring first the pages's locks. |
456 | +- * We keep the locks held until after inserting the TB in the hash table, |
457 | +- * so that if the insertion fails we know for sure that the TBs are still |
458 | +- * in the page descriptors. |
459 | +- * Note that inserting into the hash table first isn't an option, since |
460 | +- * we can only insert TBs that are fully initialized. |
461 | +- */ |
462 | +- page_lock_pair(&p, phys_pc, &p2, phys_page2, true); |
463 | +- tb_record(tb, p, p2); |
464 | ++ tb_record(tb); |
465 | + |
466 | + /* add in the hash table */ |
467 | +- h = tb_hash_func(phys_pc, (tb->cflags & CF_PCREL ? 0 : tb->pc), |
468 | ++ h = tb_hash_func(tb_page_addr0(tb), (tb->cflags & CF_PCREL ? 0 : tb->pc), |
469 | + tb->flags, tb->cflags, tb->trace_vcpu_dstate); |
470 | + qht_insert(&tb_ctx.htable, tb, h, &existing_tb); |
471 | + |
472 | + /* remove TB from the page(s) if we couldn't insert it */ |
473 | + if (unlikely(existing_tb)) { |
474 | + tb_remove(tb); |
475 | +- tb = existing_tb; |
476 | ++ tb_unlock_pages(tb); |
477 | ++ return existing_tb; |
478 | + } |
479 | + |
480 | +- if (p2 && p2 != p) { |
481 | +- page_unlock(p2); |
482 | +- } |
483 | +- page_unlock(p); |
484 | ++ tb_unlock_pages(tb); |
485 | + return tb; |
486 | + } |
487 | + |
488 | +--- qemu-tcg-crash-mantic.orig/accel/tcg/translate-all.c 2024-02-13 18:18:45.276155938 -0500 |
489 | ++++ qemu-tcg-crash-mantic/accel/tcg/translate-all.c 2024-02-13 18:22:38.708335591 -0500 |
490 | +@@ -304,7 +304,7 @@ TranslationBlock *tb_gen_code(CPUState * |
491 | + { |
492 | + CPUArchState *env = cpu->env_ptr; |
493 | + TranslationBlock *tb, *existing_tb; |
494 | +- tb_page_addr_t phys_pc; |
495 | ++ tb_page_addr_t phys_pc, phys_p2; |
496 | + tcg_insn_unit *gen_code_buf; |
497 | + int gen_code_size, search_size, max_insns; |
498 | + #ifdef CONFIG_PROFILER |
499 | +@@ -330,6 +330,7 @@ TranslationBlock *tb_gen_code(CPUState * |
500 | + QEMU_BUILD_BUG_ON(CF_COUNT_MASK + 1 != TCG_MAX_INSNS); |
501 | + |
502 | + buffer_overflow: |
503 | ++ assert_no_pages_locked(); |
504 | + tb = tcg_tb_alloc(tcg_ctx); |
505 | + if (unlikely(!tb)) { |
506 | + /* flush must be done */ |
507 | +@@ -351,8 +352,12 @@ TranslationBlock *tb_gen_code(CPUState * |
508 | + tb->trace_vcpu_dstate = *cpu->trace_dstate; |
509 | + tb_set_page_addr0(tb, phys_pc); |
510 | + tb_set_page_addr1(tb, -1); |
511 | ++ if (phys_pc != -1) { |
512 | ++ tb_lock_page0(phys_pc); |
513 | ++ } |
514 | ++ |
515 | + tcg_ctx->gen_tb = tb; |
516 | +- tb_overflow: |
517 | ++ restart_translate: |
518 | + |
519 | + #ifdef CONFIG_PROFILER |
520 | + /* includes aborted translations because of exceptions */ |
521 | +@@ -378,6 +383,7 @@ TranslationBlock *tb_gen_code(CPUState * |
522 | + qemu_log_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT, |
523 | + "Restarting code generation for " |
524 | + "code_gen_buffer overflow\n"); |
525 | ++ tb_unlock_pages(tb); |
526 | + goto buffer_overflow; |
527 | + |
528 | + case -2: |
529 | +@@ -396,14 +402,39 @@ TranslationBlock *tb_gen_code(CPUState * |
530 | + "Restarting code generation with " |
531 | + "smaller translation block (max %d insns)\n", |
532 | + max_insns); |
533 | +- goto tb_overflow; |
534 | ++ |
535 | ++ /* |
536 | ++ * The half-sized TB may not cross pages. |
537 | ++ * TODO: Fix all targets that cross pages except with |
538 | ++ * the first insn, at which point this can't be reached. |
539 | ++ */ |
540 | ++ phys_p2 = tb_page_addr1(tb); |
541 | ++ if (unlikely(phys_p2 != -1)) { |
542 | ++ tb_unlock_page1(phys_pc, phys_p2); |
543 | ++ tb_set_page_addr1(tb, -1); |
544 | ++ } |
545 | ++ goto restart_translate; |
546 | ++ |
547 | ++ case -3: |
548 | ++ /* |
549 | ++ * We had a page lock ordering problem. In order to avoid |
550 | ++ * deadlock we had to drop the lock on page0, which means |
551 | ++ * that everything we translated so far is compromised. |
552 | ++ * Restart with locks held on both pages. |
553 | ++ */ |
554 | ++ qemu_log_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT, |
555 | ++ "Restarting code generation with re-locked pages"); |
556 | ++ goto restart_translate; |
557 | + |
558 | + default: |
559 | + g_assert_not_reached(); |
560 | + } |
561 | + } |
562 | ++ tcg_ctx->gen_tb = NULL; |
563 | ++ |
564 | + search_size = encode_search(tb, (void *)gen_code_buf + gen_code_size); |
565 | + if (unlikely(search_size < 0)) { |
566 | ++ tb_unlock_pages(tb); |
567 | + goto buffer_overflow; |
568 | + } |
569 | + tb->tc.size = gen_code_size; |
570 | +@@ -522,6 +553,7 @@ TranslationBlock *tb_gen_code(CPUState * |
571 | + * before attempting to link to other TBs or add to the lookup table. |
572 | + */ |
573 | + if (tb_page_addr0(tb) == -1) { |
574 | ++ assert_no_pages_locked(); |
575 | + return tb; |
576 | + } |
577 | + |
578 | +@@ -536,7 +568,9 @@ TranslationBlock *tb_gen_code(CPUState * |
579 | + * No explicit memory barrier is required -- tb_link_page() makes the |
580 | + * TB visible in a consistent state. |
581 | + */ |
582 | +- existing_tb = tb_link_page(tb, tb_page_addr0(tb), tb_page_addr1(tb)); |
583 | ++ existing_tb = tb_link_page(tb); |
584 | ++ assert_no_pages_locked(); |
585 | ++ |
586 | + /* if the TB already exists, discard what we just translated */ |
587 | + if (unlikely(existing_tb != tb)) { |
588 | + uintptr_t orig_aligned = (uintptr_t)gen_code_buf; |
589 | +--- qemu-tcg-crash-mantic.orig/accel/tcg/translator.c 2024-02-13 18:18:45.276155938 -0500 |
590 | ++++ qemu-tcg-crash-mantic/accel/tcg/translator.c 2024-02-13 18:23:08.420308318 -0500 |
591 | +@@ -17,6 +17,7 @@ |
592 | + #include "exec/translator.h" |
593 | + #include "exec/plugin-gen.h" |
594 | + #include "exec/replay-core.h" |
595 | ++#include "internal.h" |
596 | + |
597 | + bool translator_use_goto_tb(DisasContextBase *db, target_ulong dest) |
598 | + { |
599 | +@@ -47,10 +48,6 @@ void translator_loop(CPUState *cpu, Tran |
600 | + db->host_addr[0] = host_pc; |
601 | + db->host_addr[1] = NULL; |
602 | + |
603 | +-#ifdef CONFIG_USER_ONLY |
604 | +- page_protect(pc); |
605 | +-#endif |
606 | +- |
607 | + ops->init_disas_context(db, cpu); |
608 | + tcg_debug_assert(db->is_jmp == DISAS_NEXT); /* no early exit */ |
609 | + |
610 | +@@ -158,22 +155,36 @@ static void *translator_access(CPUArchSt |
611 | + host = db->host_addr[1]; |
612 | + base = TARGET_PAGE_ALIGN(db->pc_first); |
613 | + if (host == NULL) { |
614 | +- tb_page_addr_t phys_page = |
615 | +- get_page_addr_code_hostp(env, base, &db->host_addr[1]); |
616 | ++ tb_page_addr_t page0, old_page1, new_page1; |
617 | ++ |
618 | ++ new_page1 = get_page_addr_code_hostp(env, base, &db->host_addr[1]); |
619 | + |
620 | + /* |
621 | + * If the second page is MMIO, treat as if the first page |
622 | + * was MMIO as well, so that we do not cache the TB. |
623 | + */ |
624 | +- if (unlikely(phys_page == -1)) { |
625 | ++ if (unlikely(new_page1 == -1)) { |
626 | ++ tb_unlock_pages(tb); |
627 | + tb_set_page_addr0(tb, -1); |
628 | + return NULL; |
629 | + } |
630 | + |
631 | +- tb_set_page_addr1(tb, phys_page); |
632 | +-#ifdef CONFIG_USER_ONLY |
633 | +- page_protect(end); |
634 | +-#endif |
635 | ++ /* |
636 | ++ * If this is not the first time around, and page1 matches, |
637 | ++ * then we already have the page locked. Alternately, we're |
638 | ++ * not doing anything to prevent the PTE from changing, so |
639 | ++ * we might wind up with a different page, requiring us to |
640 | ++ * re-do the locking. |
641 | ++ */ |
642 | ++ old_page1 = tb_page_addr1(tb); |
643 | ++ if (likely(new_page1 != old_page1)) { |
644 | ++ page0 = tb_page_addr0(tb); |
645 | ++ if (unlikely(old_page1 != -1)) { |
646 | ++ tb_unlock_page1(page0, old_page1); |
647 | ++ } |
648 | ++ tb_set_page_addr1(tb, new_page1); |
649 | ++ tb_lock_page1(page0, new_page1); |
650 | ++ } |
651 | + host = db->host_addr[1]; |
652 | + } |
653 | + |
654 | diff --git a/debian/patches/ubuntu/lp-2051965-accel-tcg-Clear-tcg_ctx-gen_tb-on-buffer-overflow.patch b/debian/patches/ubuntu/lp-2051965-accel-tcg-Clear-tcg_ctx-gen_tb-on-buffer-overflow.patch |
655 | new file mode 100644 |
656 | index 0000000..bf6b2a0 |
657 | --- /dev/null |
658 | +++ b/debian/patches/ubuntu/lp-2051965-accel-tcg-Clear-tcg_ctx-gen_tb-on-buffer-overflow.patch |
659 | @@ -0,0 +1,34 @@ |
660 | +From ad17868eb162a5466d8ad43e5ccb428776403308 Mon Sep 17 00:00:00 2001 |
661 | +From: Richard Henderson <richard.henderson@linaro.org> |
662 | +Date: Wed, 26 Jul 2023 12:58:08 -0700 |
663 | +Subject: [PATCH] accel/tcg: Clear tcg_ctx->gen_tb on buffer overflow |
664 | + |
665 | +On overflow of code_gen_buffer, we unlock the guest pages we had been |
666 | +translating, but failed to clear gen_tb. On restart, if we cannot |
667 | +allocate a TB, we exit to the main loop to perform the flush of all |
668 | +TBs as soon as possible. With garbage in gen_tb, we hit an assert: |
669 | + |
670 | +../src/accel/tcg/tb-maint.c:348:page_unlock__debug: \ |
671 | + assertion failed: (page_is_locked(pd)) |
672 | + |
673 | +Fixes: deba78709ae8 ("accel/tcg: Always lock pages before translation") |
674 | +Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
675 | + |
676 | +Origin: upstream, https://gitlab.com/qemu-project/qemu/commit/ad17868eb1 |
677 | +Bug-Ubuntu: https://bugs.launchpad.net/bugs/2051965 |
678 | +Last-Update: 2024-02-13 |
679 | + |
680 | +--- |
681 | + accel/tcg/translate-all.c | 1 + |
682 | + 1 file changed, 1 insertion(+) |
683 | + |
684 | +--- qemu-tcg-crash-mantic.orig/accel/tcg/translate-all.c 2024-02-13 18:24:31.688184016 -0500 |
685 | ++++ qemu-tcg-crash-mantic/accel/tcg/translate-all.c 2024-02-13 18:24:31.684184024 -0500 |
686 | +@@ -384,6 +384,7 @@ TranslationBlock *tb_gen_code(CPUState * |
687 | + "Restarting code generation for " |
688 | + "code_gen_buffer overflow\n"); |
689 | + tb_unlock_pages(tb); |
690 | ++ tcg_ctx->gen_tb = NULL; |
691 | + goto buffer_overflow; |
692 | + |
693 | + case -2: |
694 | diff --git a/debian/patches/ubuntu/lp-2051965-accel-tcg-Split-out-cpu_exec_longjmp_cleanup.patch b/debian/patches/ubuntu/lp-2051965-accel-tcg-Split-out-cpu_exec_longjmp_cleanup.patch |
695 | new file mode 100644 |
696 | index 0000000..9398854 |
697 | --- /dev/null |
698 | +++ b/debian/patches/ubuntu/lp-2051965-accel-tcg-Split-out-cpu_exec_longjmp_cleanup.patch |
699 | @@ -0,0 +1,90 @@ |
700 | +From cb62bd15e14e304617d250158b77d0deb032f032 Mon Sep 17 00:00:00 2001 |
701 | +From: Richard Henderson <richard.henderson@linaro.org> |
702 | +Date: Thu, 6 Jul 2023 08:45:13 +0100 |
703 | +Subject: [PATCH] accel/tcg: Split out cpu_exec_longjmp_cleanup |
704 | +MIME-Version: 1.0 |
705 | +Content-Type: text/plain; charset=UTF-8 |
706 | +Content-Transfer-Encoding: 8bit |
707 | + |
708 | +Share the setjmp cleanup between cpu_exec_step_atomic |
709 | +and cpu_exec_setjmp. |
710 | + |
711 | +Reviewed-by: Alex Bennée <alex.bennee@linaro.org> |
712 | +Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> |
713 | +Reviewed-by: Richard W.M. Jones <rjones@redhat.com> |
714 | +Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
715 | + |
716 | +Origin: backport, https://gitlab.com/qemu-project/qemu/commit/cb62bd15e1 |
717 | +Bug-Ubuntu: https://bugs.launchpad.net/bugs/2051965 |
718 | +Last-Update: 2024-02-13 |
719 | + |
720 | +--- |
721 | + accel/tcg/cpu-exec.c | 43 +++++++++++++++++++------------------------ |
722 | + 1 file changed, 19 insertions(+), 24 deletions(-) |
723 | + |
724 | +--- qemu-tcg-crash-mantic.orig/accel/tcg/cpu-exec.c 2024-02-13 19:51:45.182071910 -0500 |
725 | ++++ qemu-tcg-crash-mantic/accel/tcg/cpu-exec.c 2024-02-13 19:54:41.056353084 -0500 |
726 | +@@ -529,6 +529,23 @@ static void cpu_exec_exit(CPUState *cpu) |
727 | + } |
728 | + } |
729 | + |
730 | ++static void cpu_exec_longjmp_cleanup(CPUState *cpu) |
731 | ++{ |
732 | ++ /* Non-buggy compilers preserve this; assert the correct value. */ |
733 | ++ g_assert(cpu == current_cpu); |
734 | ++ |
735 | ++#ifndef CONFIG_SOFTMMU |
736 | ++ clear_helper_retaddr(); |
737 | ++ if (have_mmap_lock()) { |
738 | ++ mmap_unlock(); |
739 | ++ } |
740 | ++#endif |
741 | ++ if (qemu_mutex_iothread_locked()) { |
742 | ++ qemu_mutex_unlock_iothread(); |
743 | ++ } |
744 | ++ assert_no_pages_locked(); |
745 | ++} |
746 | ++ |
747 | + void cpu_exec_step_atomic(CPUState *cpu) |
748 | + { |
749 | + CPUArchState *env = cpu->env_ptr; |
750 | +@@ -570,16 +587,7 @@ void cpu_exec_step_atomic(CPUState *cpu) |
751 | + cpu_tb_exec(cpu, tb, &tb_exit); |
752 | + cpu_exec_exit(cpu); |
753 | + } else { |
754 | +-#ifndef CONFIG_SOFTMMU |
755 | +- clear_helper_retaddr(); |
756 | +- if (have_mmap_lock()) { |
757 | +- mmap_unlock(); |
758 | +- } |
759 | +-#endif |
760 | +- if (qemu_mutex_iothread_locked()) { |
761 | +- qemu_mutex_unlock_iothread(); |
762 | +- } |
763 | +- assert_no_pages_locked(); |
764 | ++ cpu_exec_longjmp_cleanup(cpu); |
765 | + } |
766 | + |
767 | + /* |
768 | +@@ -1024,20 +1032,7 @@ static int cpu_exec_setjmp(CPUState *cpu |
769 | + { |
770 | + /* Prepare setjmp context for exception handling. */ |
771 | + if (unlikely(sigsetjmp(cpu->jmp_env, 0) != 0)) { |
772 | +- /* Non-buggy compilers preserve this; assert the correct value. */ |
773 | +- g_assert(cpu == current_cpu); |
774 | +- |
775 | +-#ifndef CONFIG_SOFTMMU |
776 | +- clear_helper_retaddr(); |
777 | +- if (have_mmap_lock()) { |
778 | +- mmap_unlock(); |
779 | +- } |
780 | +-#endif |
781 | +- if (qemu_mutex_iothread_locked()) { |
782 | +- qemu_mutex_unlock_iothread(); |
783 | +- } |
784 | +- |
785 | +- assert_no_pages_locked(); |
786 | ++ cpu_exec_longjmp_cleanup(cpu); |
787 | + } |
788 | + |
789 | + return cpu_exec_loop(cpu, sc); |

Christian Ehrhardt (paelzer) wrote:
Review: Approve
Some trivial findings first:
- patches are in debian/patches/ubuntu/ but the CL says "d/p/lp..."; it should be "d/p/u/lp..."
- some places mention kernel "6.13" while it is "6.3"
- avoid some ambiguities by stating clearly that the kernel >=6.3 is meant to be the one in the guest, otherwise some might think "so what, 6.3 isn't supported in jammy/mantic anyway" or similar
On the patches:

- lp-2051965-accel-tcg-Split-out-cpu_exec_longjmp_cleanup.patch
  That one seems fine; it just moves identical code to a shared function.
  The assert should not matter, and in the actual binary this is most
  likely inlined and thereby behaves as before.
- lp-2051965-accel-tcg-Always-lock-pages-before-translation.patch
  Wow, this one is busier. Conceptually it is OK - at a high level of
  abstraction, more locking is slower but safer. It didn't help that the
  diff was hard to read because of so much code movement, but eventually
  the new code is easier to read, with just one kind of locking instead
  of four (see the sketch after this list). I spent some time in a
  rabbit hole reading this, but it is either OK or too complex for me to
  see the issue.
- lp-2051965-accel-tcg-Clear-tcg_ctx-gen_tb-on-buffer-overflow.patch
  That one is easy again: it clears a condition introduced by the former
  patch.
- I found no other fixups than the one you already spotted.
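The subtle part of that locking patch is the ordering discipline in
tb_lock_page1. Below is a self-contained sketch of the same pattern, with
simplified, hypothetical names: restart_translation stands in for the
patch's siglongjmp(tcg_ctx->jmp_trans, -3), and the toy page table
replaces QEMU's real PageDesc machinery.

#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for QEMU's PageDesc: one spinlock per page. */
typedef struct {
    atomic_flag lock;   /* zero-initialized static storage = unlocked */
} PageDesc;

static PageDesc pages[16];

static PageDesc *page_find_alloc(unsigned long pindex, bool alloc)
{
    (void)alloc;        /* the real code may allocate the descriptor */
    return &pages[pindex % 16];
}

static void page_lock(PageDesc *pd)
{
    while (atomic_flag_test_and_set_explicit(&pd->lock, memory_order_acquire)) {
        /* spin */
    }
}

static void page_unlock(PageDesc *pd)
{
    atomic_flag_clear_explicit(&pd->lock, memory_order_release);
}

/* Like qemu_spin_trylock: returns false on success. */
static bool page_trylock(PageDesc *pd)
{
    return atomic_flag_test_and_set_explicit(&pd->lock, memory_order_acquire);
}

/* In the patch this is siglongjmp(tcg_ctx->jmp_trans, -3); translation
 * then restarts with both pages locked. */
static void restart_translation(void) { }

/*
 * Lock the second page of a translation block; page pindex0 is already
 * locked.  Locks must be taken in ascending page order, so if pindex1
 * sorts before pindex0 we may only *try* to lock it: blocking there
 * could deadlock against a thread holding page1 and waiting for page0.
 */
static void lock_second_page(unsigned long pindex0, unsigned long pindex1)
{
    if (pindex0 == pindex1) {
        return;                    /* same page, already locked */
    }

    PageDesc *pd1 = page_find_alloc(pindex1, true);
    if (pindex0 < pindex1) {
        page_lock(pd1);            /* correct order, safe to block */
        return;
    }

    if (!page_trylock(pd1)) {
        return;                    /* wrong order, but uncontended */
    }

    /*
     * Contended in the wrong order: drop page0, take both locks in
     * ascending order, and restart translation from scratch, because
     * anything translated while page0 was briefly unlocked is suspect.
     */
    PageDesc *pd0 = page_find_alloc(pindex0, false);
    page_unlock(pd0);
    page_lock(pd1);
    page_lock(pd0);
    restart_translation();
}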
I can see why you said Jammy backport might be more complex, there are probably plenty of changes to that code. My gut says that you'd be ok to backport quite some more changes (instead of only adapting those three patches) if you need it, but that is up to you as you see fit.
+1 on this one with the minor things fixed.
For the "where issues occur" in the SRU statement, the most important bit is that this seems purely in tcg code, so we can assume that HW-virt will not be affected ever.