Merge ~paelzer/ubuntu/+source/qemu:bug-1847948-nvmeperf-1842774-z15name-bionic into ubuntu/+source/qemu:ubuntu/bionic-devel

Proposed by Christian Ehrhardt 
Status: Merged
Approved by: Christian Ehrhardt 
Approved revision: aa5a3760aedd1465ec9f93f9eb629fa23294b691
Merge reported by: Christian Ehrhardt 
Merged at revision: aa5a3760aedd1465ec9f93f9eb629fa23294b691
Proposed branch: ~paelzer/ubuntu/+source/qemu:bug-1847948-nvmeperf-1842774-z15name-bionic
Merge into: ubuntu/+source/qemu:ubuntu/bionic-devel
Diff against target: 486 lines (+440/-0)
7 files modified
debian/changelog (+10/-0)
debian/patches/lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch (+81/-0)
debian/patches/lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch (+137/-0)
debian/patches/series (+5/-0)
debian/patches/ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch (+30/-0)
debian/patches/ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch (+82/-0)
debian/patches/ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch (+95/-0)
Reviewer                 Review Type   Date Requested   Status
Rafael David Tinoco      community                      Approve
Canonical Server                                        Pending
git-ubuntu developers                                   Pending
Review via email: mp+374130@code.launchpad.net
Christian Ehrhardt  (paelzer) wrote :

I think we need some more patches, let me discuss that with the reporter before review.

Christian Ehrhardt  (paelzer) wrote :

OK, I'm happier with the patches now, and IBM confirmed that the result (perf) is just as good.
Ready for review (again) ...

Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Christian,

I have reviewed all the changes and I'm +1 on this merge with a minor clarification:

07:56 <rafaeldtinoco> IBM Z14 GA2 was a minor change for s390x-cpumodel patch.
07:56 <rafaeldtinoco> you changed it from upstream to backport
07:56 <rafaeldtinoco> i can't find any other reference to that change
07:56 <rafaeldtinoco> why is it z13s GA1 instead of z14 GA2?

All the rest lgtm. +1

review: Approve
Rafael David Tinoco (rafaeldtinoco) wrote :

Oh, and I also left this comment regarding the kernel SRU for the nvme performance issue:

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1847948/comments/18

(just to document)

Christian Ehrhardt  (paelzer) wrote :

$ git push pkg upload/1%2.11+dfsg-1ubuntu7.20
...
To ssh://git.launchpad.net/~usd-import-team/ubuntu/+source/qemu
 * [new tag] upload/1%2.11+dfsg-1ubuntu7.20 -> upload/1%2.11+dfsg-1ubuntu7.20

$ dput ubuntu ../qemu_2.11+dfsg-1ubuntu7.20_source.changes
Checking signature on .changes
gpg: ../qemu_2.11+dfsg-1ubuntu7.20_source.changes: Error checking signature from BA3E29338280B242: SignatureVerifyError: 0
Checking signature on .dsc
gpg: ../qemu_2.11+dfsg-1ubuntu7.20.dsc: Error checking signature from BA3E29338280B242: SignatureVerifyError: 0
Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading qemu_2.11+dfsg-1ubuntu7.20.dsc: done.
  Uploading qemu_2.11+dfsg-1ubuntu7.20.debian.tar.xz: done.
  Uploading qemu_2.11+dfsg-1ubuntu7.20_source.buildinfo: done.
  Uploading qemu_2.11+dfsg-1ubuntu7.20_source.changes: done.
Successfully uploaded packages.

Christian Ehrhardt  (paelzer) wrote :

This completed a while ago in all target releases -> merged

Preview Diff

diff --git a/debian/changelog b/debian/changelog
index eddf1b8..9e214e0 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,13 @@
+qemu (1:2.11+dfsg-1ubuntu7.20) bionic; urgency=medium
+
+ * d/p/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch:
+ update the z15 model name (LP: #1842774)
+ * d/p/u/lp-1847948-*: allow MSIX BAR mapping on VFIO in general and use that
+ instead of emulation on ppc64 increasing performance of e.g. NVME
+ passthrough (LP: #1847948)
+
+ -- Christian Ehrhardt <christian.ehrhardt@canonical.com> Tue, 15 Oct 2019 11:23:23 +0200
+
 qemu (1:2.11+dfsg-1ubuntu7.19) bionic; urgency=medium

 * d/p/ubuntu/lp-1837869-block-Fix-flags-in-reopen-queue.patch: avoid
diff --git a/debian/patches/lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch b/debian/patches/lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch
new file mode 100644
index 0000000..ee036ab
--- /dev/null
+++ b/debian/patches/lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch
@@ -0,0 +1,81 @@
+From 5c08600547c059e3fd072995f9f367cdaf3c7d9d Mon Sep 17 00:00:00 2001
+From: Eric Auger <eric.auger@redhat.com>
+Date: Wed, 4 Apr 2018 22:30:50 +0200
+Subject: [PATCH] vfio: Use a trace point when a RAM section cannot be DMA
+ mapped
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Commit 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions")
+added an error message if a passed memory section address or size
+is not aligned to the page size and thus cannot be DMA mapped.
+
+This patch fixes the trace by printing the region name and the
+memory region section offset within the address space (instead of
+offset_within_region).
+
+We also turn the error_report into a trace event. Indeed, In some
+cases, the traces can be confusing to non expert end-users and
+let think the use case does not work (whereas it works as before).
+
+This is the case where a BAR is successively mapped at different
+GPAs and its sections are not compatible with dma map. The listener
+is called several times and traces are issued for each intermediate
+mapping. The end-user cannot easily match those GPAs against the
+final GPA output by lscpi. So let's keep those information to
+informed users. In mid term, the plan is to advise the user about
+BAR relocation relevance.
+
+Fixes: 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions")
+Signed-off-by: Eric Auger <eric.auger@redhat.com>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
+Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
+
+Origin: upstream, https://git.qemu.org/?p=qemu.git;a=commit;h=5c08600547c05
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1847948
+Last-Update: 2019-10-16
+
+---
+ hw/vfio/common.c | 11 +++++------
+ hw/vfio/trace-events | 1 +
+ 2 files changed, 6 insertions(+), 6 deletions(-)
+
+diff --git a/hw/vfio/common.c b/hw/vfio/common.c
+index 5e84716218..07ffa0ba10 100644
+--- a/hw/vfio/common.c
++++ b/hw/vfio/common.c
+@@ -548,12 +548,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
+ hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+
+ if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
+- error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
+- " is not aligned to 0x%"HWADDR_PRIx
+- " and cannot be mapped for DMA",
+- section->offset_within_region,
+- int128_getlo(section->size),
+- pgmask + 1);
++ trace_vfio_listener_region_add_no_dma_map(
++ memory_region_name(section->mr),
++ section->offset_within_address_space,
++ int128_getlo(section->size),
++ pgmask + 1);
+ return;
+ }
+ }
+diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
+index 79f63a2ff6..20109cb758 100644
+--- a/hw/vfio/trace-events
++++ b/hw/vfio/trace-events
+@@ -90,6 +90,7 @@ vfio_iommu_map_notify(const char *op, uint64_t iova_start, uint64_t iova_end) "i
+ vfio_listener_region_add_skip(uint64_t start, uint64_t end) "SKIPPING region_add 0x%"PRIx64" - 0x%"PRIx64
+ vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add [iommu] 0x%"PRIx64" - 0x%"PRIx64
+ vfio_listener_region_add_ram(uint64_t iova_start, uint64_t iova_end, void *vaddr) "region_add [ram] 0x%"PRIx64" - 0x%"PRIx64" [%p]"
++vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
+ vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
+ vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
+ vfio_disconnect_container(int fd) "close container->fd=%d"
+--
+2.23.0
+
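Note on the new trace event: to see what a line produced by vfio_listener_region_add_no_dma_map would look like, the format string from hw/vfio/trace-events can be fed to snprintf directly. This is only a rough sketch; format_no_dma_map is a hypothetical helper, not a QEMU function, and real trace output depends on the configured trace backend.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of QEMU): renders the message defined for
 * the vfio_listener_region_add_no_dma_map trace event into a buffer.
 * Returns the snprintf() result, i.e. the length the full message needs.
 */
static int format_no_dma_map(char *out, size_t outlen, const char *name,
                             uint64_t iova, uint64_t size, uint64_t page_size)
{
    return snprintf(out, outlen,
                    "Region \"%s\" 0x%" PRIx64 " size=0x%" PRIx64
                    " is not aligned to 0x%" PRIx64
                    " and cannot be mapped for DMA",
                    name, iova, size, page_size);
}
```

Compared with the old error_report, the message now names the memory region and reports the offset within the address space rather than within the region.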
diff --git a/debian/patches/lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch b/debian/patches/lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch
new file mode 100644
index 0000000..7994041
--- /dev/null
+++ b/debian/patches/lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch
@@ -0,0 +1,137 @@
+From 567b5b309abe744b1098018a2eb157e7109c9f30 Mon Sep 17 00:00:00 2001
+From: Alexey Kardashevskiy <aik@ozlabs.ru>
+Date: Tue, 13 Mar 2018 11:17:30 -0600
+Subject: [PATCH] vfio/pci: Relax DMA map errors for MMIO regions
+
+At the moment if vfio_memory_listener is registered in the system memory
+address space, it maps/unmaps every RAM memory region for DMA.
+It expects system page size aligned memory sections so vfio_dma_map
+would not fail and so far this has been the case. A mapping failure
+would be fatal. A side effect of such behavior is that some MMIO pages
+would not be mapped silently.
+
+However we are going to change MSIX BAR handling so we will end having
+non-aligned sections in vfio_memory_listener (more details is in
+the next patch) and vfio_dma_map will exit QEMU.
+
+In order to avoid fatal failures on what previously was not a failure and
+was just silently ignored, this checks the section alignment to
+the smallest supported IOMMU page size and prints an error if not aligned;
+it also prints an error if vfio_dma_map failed despite the page size check.
+Both errors are not fatal; only MMIO RAM regions are checked
+(aka "RAM device" regions).
+
+If the amount of errors printed is overwhelming, the MSIX relocation
+could be used to avoid excessive error output.
+
+This is unlikely to cause any behavioral change.
+
+Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
+[aw: Fix Int128 bit ops]
+Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
+
+Origin: upstream, https://git.qemu.org/?p=qemu.git;a=commit;h=567b5b309abe7
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1847948
+Last-Update: 2019-10-16
+
+---
+ hw/vfio/common.c | 55 ++++++++++++++++++++++++++++++++++++++++++------
+ 1 file changed, 49 insertions(+), 6 deletions(-)
+
+diff --git a/hw/vfio/common.c b/hw/vfio/common.c
+index 6a8203a532..07c03d78b6 100644
+--- a/hw/vfio/common.c
++++ b/hw/vfio/common.c
+@@ -544,18 +544,40 @@ static void vfio_listener_region_add(MemoryListener *listener,
+
+ llsize = int128_sub(llend, int128_make64(iova));
+
++ if (memory_region_is_ram_device(section->mr)) {
++ hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
++
++ if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
++ error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
++ " is not aligned to 0x%"HWADDR_PRIx
++ " and cannot be mapped for DMA",
++ section->offset_within_region,
++ int128_getlo(section->size),
++ pgmask + 1);
++ return;
++ }
++ }
++
+ ret = vfio_dma_map(container, iova, int128_get64(llsize),
+ vaddr, section->readonly);
+ if (ret) {
+ error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iova, int128_get64(llsize), vaddr, ret);
++ if (memory_region_is_ram_device(section->mr)) {
++ /* Allow unexpected mappings not to be fatal for RAM devices */
++ return;
++ }
+ goto fail;
+ }
+
+ return;
+
+ fail:
++ if (memory_region_is_ram_device(section->mr)) {
++ error_report("failed to vfio_dma_map. pci p2p may not work");
++ return;
++ }
+ /*
+ * On the initfn path, store the first error in the container so we
+ * can gracefully fail. Runtime, there's not much we can do other
+@@ -577,6 +599,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
+ hwaddr iova, end;
+ Int128 llend, llsize;
+ int ret;
++ bool try_unmap = true;
+
+ if (vfio_listener_skipped_section(section)) {
+ trace_vfio_listener_region_del_skip(
+@@ -629,14 +652,34 @@ static void vfio_listener_region_del(MemoryListener *listener,
+
+ trace_vfio_listener_region_del(iova, end);
+
+- ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
+- memory_region_unref(section->mr);
+- if (ret) {
+- error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+- "0x%"HWADDR_PRIx") = %d (%m)",
+- container, iova, int128_get64(llsize), ret);
++ if (memory_region_is_ram_device(section->mr)) {
++ hwaddr pgmask;
++ VFIOHostDMAWindow *hostwin;
++ bool hostwin_found = false;
++
++ QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
++ if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
++ hostwin_found = true;
++ break;
++ }
++ }
++ assert(hostwin_found); /* or region_add() would have failed */
++
++ pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
++ try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
++ }
++
++ if (try_unmap) {
++ ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
++ if (ret) {
++ error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
++ "0x%"HWADDR_PRIx") = %d (%m)",
++ container, iova, int128_get64(llsize), ret);
++ }
+ }
+
++ memory_region_unref(section->mr);
++
+ if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+ vfio_spapr_remove_window(container,
+ section->offset_within_address_space);
+--
+2.23.0
+
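Note on the alignment check: the test that this patch adds to both region_add and region_del can be tried outside QEMU. A minimal sketch, assuming a hypothetical helper section_is_dma_mappable and a plain uint64_t bitmap in place of hostwin->iova_pgsizes; __builtin_ctzll (GCC/Clang) stands in for QEMU's ctz64.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone helper (not part of QEMU): mirrors the check
 * added to vfio_listener_region_add(). iova_pgsizes is a bitmap in which
 * bit N set means the IOMMU supports pages of size 1 << N. It must be
 * non-zero, because __builtin_ctzll(0) is undefined.
 */
static bool section_is_dma_mappable(uint64_t iova, uint64_t size,
                                    uint64_t iova_pgsizes)
{
    /* Mask covering the offset bits of the smallest supported page size. */
    uint64_t pgmask = (1ULL << __builtin_ctzll(iova_pgsizes)) - 1;

    /* Both the start address and the length must be page aligned. */
    return !((iova & pgmask) || (size & pgmask));
}
```

With a smallest IOMMU page of 64 KiB, for example, a section that is only 4 KiB aligned fails this check, so the patched listener skips it instead of treating the failed mapping as fatal.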
diff --git a/debian/patches/series b/debian/patches/series
index d6a690a..b2015aa 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -123,3 +123,8 @@ ubuntu/lp-1836154-09-s390-cpumodel-fix-description-for-the-new-vector-fac.patch
 ubuntu/lp-1836154-s390x-cpumodel-remove-esort-from-the-default-model.patch
 ubuntu/lp-1836154-s390x-cpumodel-also-change-name-of-vxbeh.patch
 ubuntu/lp-1837869-block-Fix-flags-in-reopen-queue.patch
+ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch
+ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch
+ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch
+lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch
+lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch
diff --git a/debian/patches/ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch b/debian/patches/ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch
new file mode 100644
index 0000000..e888e41
--- /dev/null
+++ b/debian/patches/ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch
@@ -0,0 +1,30 @@
+From 7505deca0bfa859136ec6419dbafc504f22fcac2 Mon Sep 17 00:00:00 2001
+From: Christian Borntraeger <borntraeger@de.ibm.com>
+Date: Wed, 18 Sep 2019 16:42:14 +0200
+Subject: [PATCH] s390x/cpumodel: Add the z15 name to the description of gen15a
+
+We now know that gen15a is called z15.
+
+Reviewed-by: David Hildenbrand <david@redhat.com>
+Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
+
+Backport-Note: context slightly changed for z14.2
+Origin: backport, https://git.qemu.org/?p=qemu.git;a=commit;h=7505deca
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1842774
+Last-Update: 2019-09-24
+
+---
+ target/s390x/cpu_models.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/target/s390x/cpu_models.c
++++ b/target/s390x/cpu_models.c
+@@ -79,7 +79,7 @@ static S390CPUDef s390_cpu_defs[] = {
+ CPUDEF_INIT(0x2965, 13, 2, 47, 0x08000000U, "z13s", "IBM z13s GA1"),
+ CPUDEF_INIT(0x3906, 14, 1, 47, 0x08000000U, "z14", "IBM z14 GA1"),
+ CPUDEF_INIT(0x3907, 14, 1, 47, 0x08000000U, "z14ZR1", "IBM z14 Model ZR1 GA1"),
+- CPUDEF_INIT(0x8561, 15, 1, 47, 0x08000000U, "gen15a", "IBM 8561 GA1"),
++ CPUDEF_INIT(0x8561, 15, 1, 47, 0x08000000U, "gen15a", "IBM z15 GA1"),
+ CPUDEF_INIT(0x8562, 15, 1, 47, 0x08000000U, "gen15b", "IBM 8562 GA1"),
+ };
+
diff --git a/debian/patches/ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch b/debian/patches/ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch
new file mode 100644
index 0000000..8d3f8b0
--- /dev/null
+++ b/debian/patches/ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch
@@ -0,0 +1,82 @@
+From fcad0d2121976df4b422b4007a5eb7fcaac01134 Mon Sep 17 00:00:00 2001
+From: Alexey Kardashevskiy <aik@ozlabs.ru>
+Date: Tue, 13 Mar 2018 11:17:31 -0600
+Subject: [PATCH] ppc/spapr, vfio: Turn off MSIX emulation for VFIO devices
+
+This adds a possibility for the platform to tell VFIO not to emulate MSIX
+so MMIO memory regions do not get split into chunks in flatview and
+the entire page can be registered as a KVM memory slot and make direct
+MMIO access possible for the guest.
+
+This enables the entire MSIX BAR mapping to the guest for the pseries
+platform in order to achieve the maximum MMIO preformance for certain
+devices.
+
+Tested on:
+LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
+
+Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
+Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
+Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
+
+Origin: upstream, https://git.qemu.org/?p=qemu.git;a=commit;h=fcad0d2121976df4b422b4007a5e
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1847948
+Last-Update: 2019-10-15
+
+---
+ hw/ppc/spapr.c | 7 +++++++
+ hw/vfio/pci.c | 13 +++++++++++++
+ 2 files changed, 20 insertions(+)
+
+diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
+index 7e1c858566..032d03423f 100644
+--- a/hw/ppc/spapr.c
++++ b/hw/ppc/spapr.c
+@@ -2855,6 +2855,11 @@ static void spapr_set_modern_hotplug_events(Object *obj, bool value,
+ spapr->use_hotplug_event_source = value;
+ }
+
++static bool spapr_get_msix_emulation(Object *obj, Error **errp)
++{
++ return true;
++}
++
+ static char *spapr_get_resize_hpt(Object *obj, Error **errp)
+ {
+ sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+@@ -2936,6 +2941,8 @@ static void spapr_instance_init(Object *obj)
+ object_property_set_description(obj, "vsmt",
+ "Virtual SMT: KVM behaves as if this were"
+ " the host's SMT mode", &error_abort);
++ object_property_add_bool(obj, "vfio-no-msix-emulation",
++ spapr_get_msix_emulation, NULL, NULL);
+ }
+
+ static void spapr_machine_finalizefn(Object *obj)
+diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
+index 02974f4eb9..b9bc6cd310 100644
+--- a/hw/vfio/pci.c
++++ b/hw/vfio/pci.c
+@@ -1581,6 +1581,19 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
+ */
+ memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
+
++ /*
++ * The emulated machine may provide a paravirt interface for MSIX setup
++ * so it is not strictly necessary to emulate MSIX here. This becomes
++ * helpful when frequently accessed MMIO registers are located in
++ * subpages adjacent to the MSIX table but the MSIX data containing page
++ * cannot be mapped because of a host page size bigger than the MSIX table
++ * alignment.
++ */
++ if (object_property_get_bool(OBJECT(qdev_get_machine()),
++ "vfio-no-msix-emulation", NULL)) {
++ memory_region_set_enabled(&vdev->pdev.msix_table_mmio, false);
++ }
++
+ return 0;
+ }
+
+--
+2.23.0
+
diff --git a/debian/patches/ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch b/debian/patches/ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch
new file mode 100644
index 0000000..b0c1550
--- /dev/null
+++ b/debian/patches/ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch
@@ -0,0 +1,95 @@
+From ae0215b2bb56a9d5321a185dde133bfdd306a4c0 Mon Sep 17 00:00:00 2001
+From: Alexey Kardashevskiy <aik@ozlabs.ru>
+Date: Tue, 13 Mar 2018 11:17:31 -0600
+Subject: [PATCH] vfio-pci: Allow mmap of MSIX BAR
+
+At the moment we unconditionally avoid mapping MSIX data of a BAR and
+emulate MSIX table in QEMU. However it is 1) not always necessary as
+a platform may provide a paravirt interface for MSIX configuration;
+2) can affect the speed of MMIO access by emulating them in QEMU when
+frequently accessed registers share same system page with MSIX data,
+this is particularly a problem for systems with the page size bigger
+than 4KB.
+
+A new capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - has been added
+to the kernel [1] which tells the userspace that mapping of the MSIX data
+is possible now. This makes use of it so from now on QEMU tries mapping
+the entire BAR as a whole and emulate MSIX on top of that.
+
+[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
+
+Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
+Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
+Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
+
+Origin: upstream, https://git.qemu.org/?p=qemu.git;a=commit;h=ae0215b2bb56a9d5321a185dde13
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1847948
+Last-Update: 2019-10-15
+
+---
+ hw/vfio/common.c | 15 +++++++++++++++
+ hw/vfio/pci.c | 9 +++++++++
+ include/hw/vfio/vfio-common.h | 1 +
+ 3 files changed, 25 insertions(+)
+
+diff --git a/hw/vfio/common.c b/hw/vfio/common.c
+index 07c03d78b6..5e84716218 100644
+--- a/hw/vfio/common.c
++++ b/hw/vfio/common.c
+@@ -1471,6 +1471,21 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
+ return -ENODEV;
+ }
+
++bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
++{
++ struct vfio_region_info *info = NULL;
++ bool ret = false;
++
++ if (!vfio_get_region_info(vbasedev, region, &info)) {
++ if (vfio_get_region_info_cap(info, cap_type)) {
++ ret = true;
++ }
++ g_free(info);
++ }
++
++ return ret;
++}
++
+ /*
+ * Interfaces for IBM EEH (Enhanced Error Handling)
+ */
+diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
+index b9d2c12b82..02974f4eb9 100644
+--- a/hw/vfio/pci.c
++++ b/hw/vfio/pci.c
+@@ -1294,6 +1294,15 @@ static void vfio_pci_fixup_msix_region(VFIOPCIDevice *vdev)
+ off_t start, end;
+ VFIORegion *region = &vdev->bars[vdev->msix->table_bar].region;
+
++ /*
++ * If the host driver allows mapping of a MSIX data, we are going to
++ * do map the entire BAR and emulate MSIX table on top of that.
++ */
++ if (vfio_has_region_cap(&vdev->vbasedev, region->nr,
++ VFIO_REGION_INFO_CAP_MSIX_MAPPABLE)) {
++ return;
++ }
++
+ /*
+ * We expect to find a single mmap covering the whole BAR, anything else
+ * means it's either unsupported or already setup.
+diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
+index c5efa32750..d9360148e6 100644
+--- a/include/hw/vfio/vfio-common.h
++++ b/include/hw/vfio/vfio-common.h
+@@ -193,6 +193,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
+ struct vfio_region_info **info);
+ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
+ uint32_t subtype, struct vfio_region_info **info);
++bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
+ #endif
+ extern const MemoryListener vfio_prereg_listener;
+
+--
+2.23.0
+
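Note on the capability lookup: vfio_has_region_cap() relies on vfio_get_region_info_cap() to find a capability in the region info returned by the kernel. The sketch below shows the general idea of such a lookup, assuming the capabilities form a chain of small headers linked by byte offsets from the start of the buffer. The struct cap_header layout, the helper names and the sample ids are simplified stand-ins chosen for illustration, not the kernel UAPI definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a capability header (assumed layout). */
struct cap_header {
    uint16_t id;      /* capability identifier */
    uint16_t version; /* capability version */
    uint32_t next;    /* offset of the next header from buffer start, 0 = end */
};

/*
 * Hypothetical helper: walk the capability chain stored in buf, starting at
 * first_off, and report whether a capability with the given id is present.
 * The walk stops at offset 0 or when a header would run past the buffer.
 */
static bool buffer_has_cap(const uint8_t *buf, size_t len,
                           uint32_t first_off, uint16_t id)
{
    uint32_t off = first_off;

    while (off != 0 && (size_t)off + sizeof(struct cap_header) <= len) {
        struct cap_header hdr;

        memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned access */
        if (hdr.id == id) {
            return true;
        }
        off = hdr.next;
    }
    return false;
}

/* Build a two-entry sample chain (arbitrary ids 1 and 3) and query it. */
static bool sample_chain_has_cap(uint16_t id)
{
    uint8_t buf[64] = {0};
    struct cap_header first = { .id = 1, .version = 1, .next = 48 };
    struct cap_header second = { .id = 3, .version = 1, .next = 0 };

    memcpy(buf + 32, &first, sizeof(first));
    memcpy(buf + 48, &second, sizeof(second));
    return buffer_has_cap(buf, sizeof(buf), 32, id);
}
```

In the patch, a positive lookup for VFIO_REGION_INFO_CAP_MSIX_MAPPABLE makes vfio_pci_fixup_msix_region() return early, so the BAR keeps its single mmap covering the whole region.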
