Merge ~paelzer/ubuntu/+source/qemu:bug-1847948-nvmeperf-1842774-z15name-bionic into ubuntu/+source/qemu:ubuntu/bionic-devel

Proposed by Christian Ehrhardt on 2019-10-15
Status: Approved
Approved by: Christian Ehrhardt on 2019-10-28
Approved revision: aa5a3760aedd1465ec9f93f9eb629fa23294b691
Proposed branch: ~paelzer/ubuntu/+source/qemu:bug-1847948-nvmeperf-1842774-z15name-bionic
Merge into: ubuntu/+source/qemu:ubuntu/bionic-devel
Diff against target: 486 lines (+440/-0)
7 files modified
debian/changelog (+10/-0)
debian/patches/lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch (+81/-0)
debian/patches/lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch (+137/-0)
debian/patches/series (+5/-0)
debian/patches/ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch (+30/-0)
debian/patches/ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch (+82/-0)
debian/patches/ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch (+95/-0)
Reviewer                          Review Type   Date Requested   Status
Rafael David Tinoco (community)                 2019-10-15       Approve on 2019-10-25
Canonical Server Team                           2019-10-15       Pending
Ubuntu Server Dev import team                   2019-10-15       Pending
Review via email: mp+374130@code.launchpad.net
Christian Ehrhardt  (paelzer) wrote :

I think we need some more patches, let me discuss that with the reporter before review.

Christian Ehrhardt  (paelzer) wrote :

Ok, I'm more happy with the patches now and the result was confirmed to be as good (perf) by IBM.
Ready for review (again) ...

Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Christian,

I have reviewed all the changes and I'm +1 on this merge with a minor clarification:

07:56 <rafaeldtinoco> IBM Z14 GA2 was a minor change for s390x-cpumodel patch.
07:56 <rafaeldtinoco> you changed it from upstream to backport
07:56 <rafaeldtinoco> i can't find any other reference to that change
07:56 <rafaeldtinoco> why is it z13s GA1 instead of z14 GA2?

All the rest lgtm. +1

review: Approve
Rafael David Tinoco (rafaeldtinoco) wrote :

Oh, and I also left this comment regarding the kernel SRU for the nvme performance issue:

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1847948/comments/18

(just to document)

Christian Ehrhardt  (paelzer) wrote :

$ git push pkg upload/1%2.11+dfsg-1ubuntu7.20
...
To ssh://git.launchpad.net/~usd-import-team/ubuntu/+source/qemu
 * [new tag] upload/1%2.11+dfsg-1ubuntu7.20 -> upload/1%2.11+dfsg-1ubuntu7.20

$ dput ubuntu ../qemu_2.11+dfsg-1ubuntu7.20_source.changes
Checking signature on .changes
gpg: ../qemu_2.11+dfsg-1ubuntu7.20_source.changes: Error checking signature from BA3E29338280B242: SignatureVerifyError: 0
Checking signature on .dsc
gpg: ../qemu_2.11+dfsg-1ubuntu7.20.dsc: Error checking signature from BA3E29338280B242: SignatureVerifyError: 0
Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading qemu_2.11+dfsg-1ubuntu7.20.dsc: done.
  Uploading qemu_2.11+dfsg-1ubuntu7.20.debian.tar.xz: done.
  Uploading qemu_2.11+dfsg-1ubuntu7.20_source.buildinfo: done.
  Uploading qemu_2.11+dfsg-1ubuntu7.20_source.changes: done.
Successfully uploaded packages.

Unmerged commits

aa5a376... by Christian Ehrhardt on 2019-10-16

d/p/u/lp-1847948-*: add patches to avoid fatal false positives on non aligned mappings

Signed-off-by: Christian Ehrhardt <email address hidden>
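
Note (illustrative sketch, not part of the commit): the false positives mentioned above come from sections whose address or size is not a multiple of the smallest IOMMU page size. The check in the backported patches has roughly the shape below. The function names are hypothetical, `__builtin_ctzll` (a GCC/Clang builtin) stands in for QEMU's `ctz64`, and the page-size bitmap in the usage note is a made-up example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask covering the smallest page size set in a bitmap of supported
 * IOMMU page sizes (hypothetical helper, mirrors the pgmask expression
 * used in the patch).
 */
static uint64_t smallest_page_mask(uint64_t iova_pgsizes)
{
    /* Counting trailing zeros of 0 is undefined, so reject it up front. */
    assert(iova_pgsizes != 0);
    return (1ULL << __builtin_ctzll(iova_pgsizes)) - 1;
}

/*
 * A section can only be DMA mapped if both its start address and its
 * size are multiples of the smallest supported page size.
 */
static bool section_is_dma_mappable(uint64_t iova, uint64_t size,
                                    uint64_t iova_pgsizes)
{
    uint64_t pgmask = smallest_page_mask(iova_pgsizes);

    return !((iova & pgmask) || (size & pgmask));
}
```

With a bitmap advertising 64 KiB and 16 MiB pages, `section_is_dma_mappable(0x20000, 0x10000, 0x10000 | 0x1000000)` is true, while a 4 KiB section at 0x21000 fails the check. With these patches such a section on a RAM device region is skipped with a trace event rather than being treated as a fatal mapping failure.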

f5dff1b... by Christian Ehrhardt on 2019-10-15

changelog: allow MSIX BAR mapping on VFIO (LP: #1847948)

Signed-off-by: Christian Ehrhardt <email address hidden>

da9f1f6... by Christian Ehrhardt on 2019-10-15

d/p/u/lp-1847948-*: allow MSIX BAR mapping on VFIO in general and use that instead of emulation on ppc64 increasing performance of e.g. NVME passthrough (LP: #1847948)

Signed-off-by: Christian Ehrhardt <email address hidden>
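
Note (illustrative sketch, not part of the commit): whether the MSIX BAR may be mapped is decided by looking for the VFIO_REGION_INFO_CAP_MSIX_MAPPABLE capability in the region info returned by the kernel. The walk below shows the general idea on stand-in structures. The `demo_*` names are hypothetical, the struct layouts and constant values are assumptions modelled on the Linux VFIO UAPI and should be checked against `<linux/vfio.h>`, and QEMU's own helper (`vfio_has_region_cap` in the diff) is implemented differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the kernel definitions; real code includes <linux/vfio.h>. */
#define DEMO_REGION_INFO_FLAG_CAPS (1u << 3) /* assumed flag value */
#define DEMO_CAP_MSIX_MAPPABLE     3         /* assumed capability id */

struct demo_region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t cap_offset; /* offset of the first capability in the buffer */
    uint64_t size;
    uint64_t offset;
};

struct demo_cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;       /* offset of the next capability, 0 ends the chain */
};

/* Return true if the capability chain in buf contains cap_id. */
static bool demo_has_region_cap(const void *buf, size_t len, uint16_t cap_id)
{
    struct demo_region_info info;
    struct demo_cap_header hdr;
    uint32_t off;

    assert(buf != NULL);
    if (len < sizeof(info)) {
        return false;
    }
    memcpy(&info, buf, sizeof(info));
    if (!(info.flags & DEMO_REGION_INFO_FLAG_CAPS)) {
        return false; /* no capability chain present */
    }

    for (off = info.cap_offset; off != 0; off = hdr.next) {
        if (off > len || len - off < sizeof(hdr)) {
            return false; /* header would lie outside the buffer */
        }
        memcpy(&hdr, (const uint8_t *)buf + off, sizeof(hdr));
        if (hdr.id == cap_id) {
            return true;
        }
        if (hdr.next != 0 && hdr.next <= off) {
            return false; /* this sketch only follows forward links */
        }
    }
    return false;
}
```

A caller would fill the buffer from the region-info ioctl and, if `demo_has_region_cap(buf, len, DEMO_CAP_MSIX_MAPPABLE)` returns true, leave the whole BAR mappable instead of carving the MSIX table out of it.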

498a516... by Christian Ehrhardt on 2019-10-15

d/p/u/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch: minor backport changes

Signed-off-by: Christian Ehrhardt <email address hidden>

c796c83... by Christian Ehrhardt on 2019-10-15

changelog: update the z15 model name (LP: #1842774)

Signed-off-by: Christian Ehrhardt <email address hidden>

489eca5... by Christian Ehrhardt on 2019-09-24

d/p/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch: update the z15 model name (LP: #1842774)

Signed-off-by: Christian Ehrhardt <email address hidden>

Preview Diff

diff --git a/debian/changelog b/debian/changelog
index eddf1b8..9e214e0 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,13 @@
+qemu (1:2.11+dfsg-1ubuntu7.20) bionic; urgency=medium
+
+ * d/p/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch:
+ update the z15 model name (LP: #1842774)
+ * d/p/u/lp-1847948-*: allow MSIX BAR mapping on VFIO in general and use that
+ instead of emulation on ppc64 increasing performance of e.g. NVME
+ passthrough (LP: #1847948)
+
+ -- Christian Ehrhardt <christian.ehrhardt@canonical.com> Tue, 15 Oct 2019 11:23:23 +0200
+
 qemu (1:2.11+dfsg-1ubuntu7.19) bionic; urgency=medium

 * d/p/ubuntu/lp-1837869-block-Fix-flags-in-reopen-queue.patch: avoid
diff --git a/debian/patches/lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch b/debian/patches/lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch
new file mode 100644
index 0000000..ee036ab
--- /dev/null
+++ b/debian/patches/lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch
@@ -0,0 +1,81 @@
+From 5c08600547c059e3fd072995f9f367cdaf3c7d9d Mon Sep 17 00:00:00 2001
+From: Eric Auger <eric.auger@redhat.com>
+Date: Wed, 4 Apr 2018 22:30:50 +0200
+Subject: [PATCH] vfio: Use a trace point when a RAM section cannot be DMA
+ mapped
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Commit 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions")
+added an error message if a passed memory section address or size
+is not aligned to the page size and thus cannot be DMA mapped.
+
+This patch fixes the trace by printing the region name and the
+memory region section offset within the address space (instead of
+offset_within_region).
+
+We also turn the error_report into a trace event. Indeed, In some
+cases, the traces can be confusing to non expert end-users and
+let think the use case does not work (whereas it works as before).
+
+This is the case where a BAR is successively mapped at different
+GPAs and its sections are not compatible with dma map. The listener
+is called several times and traces are issued for each intermediate
+mapping. The end-user cannot easily match those GPAs against the
+final GPA output by lscpi. So let's keep those information to
+informed users. In mid term, the plan is to advise the user about
+BAR relocation relevance.
+
+Fixes: 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions")
+Signed-off-by: Eric Auger <eric.auger@redhat.com>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
+Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
+
+Origin: upstream, https://git.qemu.org/?p=qemu.git;a=commit;h=5c08600547c05
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1847948
+Last-Update: 2019-10-16
+
+---
+ hw/vfio/common.c | 11 +++++------
+ hw/vfio/trace-events | 1 +
+ 2 files changed, 6 insertions(+), 6 deletions(-)
+
+diff --git a/hw/vfio/common.c b/hw/vfio/common.c
+index 5e84716218..07ffa0ba10 100644
+--- a/hw/vfio/common.c
++++ b/hw/vfio/common.c
+@@ -548,12 +548,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
+ hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+
+ if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
+- error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
+- " is not aligned to 0x%"HWADDR_PRIx
+- " and cannot be mapped for DMA",
+- section->offset_within_region,
+- int128_getlo(section->size),
+- pgmask + 1);
++ trace_vfio_listener_region_add_no_dma_map(
++ memory_region_name(section->mr),
++ section->offset_within_address_space,
++ int128_getlo(section->size),
++ pgmask + 1);
+ return;
+ }
+ }
+diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
+index 79f63a2ff6..20109cb758 100644
+--- a/hw/vfio/trace-events
++++ b/hw/vfio/trace-events
+@@ -90,6 +90,7 @@ vfio_iommu_map_notify(const char *op, uint64_t iova_start, uint64_t iova_end) "i
+ vfio_listener_region_add_skip(uint64_t start, uint64_t end) "SKIPPING region_add 0x%"PRIx64" - 0x%"PRIx64
+ vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add [iommu] 0x%"PRIx64" - 0x%"PRIx64
+ vfio_listener_region_add_ram(uint64_t iova_start, uint64_t iova_end, void *vaddr) "region_add [ram] 0x%"PRIx64" - 0x%"PRIx64" [%p]"
++vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
+ vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del 0x%"PRIx64" - 0x%"PRIx64
+ vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
+ vfio_disconnect_container(int fd) "close container->fd=%d"
+--
+2.23.0
+
diff --git a/debian/patches/lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch b/debian/patches/lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch
new file mode 100644
index 0000000..7994041
--- /dev/null
+++ b/debian/patches/lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch
@@ -0,0 +1,137 @@
+From 567b5b309abe744b1098018a2eb157e7109c9f30 Mon Sep 17 00:00:00 2001
+From: Alexey Kardashevskiy <aik@ozlabs.ru>
+Date: Tue, 13 Mar 2018 11:17:30 -0600
+Subject: [PATCH] vfio/pci: Relax DMA map errors for MMIO regions
+
+At the moment if vfio_memory_listener is registered in the system memory
+address space, it maps/unmaps every RAM memory region for DMA.
+It expects system page size aligned memory sections so vfio_dma_map
+would not fail and so far this has been the case. A mapping failure
+would be fatal. A side effect of such behavior is that some MMIO pages
+would not be mapped silently.
+
+However we are going to change MSIX BAR handling so we will end having
+non-aligned sections in vfio_memory_listener (more details is in
+the next patch) and vfio_dma_map will exit QEMU.
+
+In order to avoid fatal failures on what previously was not a failure and
+was just silently ignored, this checks the section alignment to
+the smallest supported IOMMU page size and prints an error if not aligned;
+it also prints an error if vfio_dma_map failed despite the page size check.
+Both errors are not fatal; only MMIO RAM regions are checked
+(aka "RAM device" regions).
+
+If the amount of errors printed is overwhelming, the MSIX relocation
+could be used to avoid excessive error output.
+
+This is unlikely to cause any behavioral change.
+
+Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
+[aw: Fix Int128 bit ops]
+Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
+
+Origin: upstream, https://git.qemu.org/?p=qemu.git;a=commit;h=567b5b309abe7
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1847948
+Last-Update: 2019-10-16
+
+---
+ hw/vfio/common.c | 55 ++++++++++++++++++++++++++++++++++++++++++------
+ 1 file changed, 49 insertions(+), 6 deletions(-)
+
+diff --git a/hw/vfio/common.c b/hw/vfio/common.c
+index 6a8203a532..07c03d78b6 100644
+--- a/hw/vfio/common.c
++++ b/hw/vfio/common.c
+@@ -544,18 +544,40 @@ static void vfio_listener_region_add(MemoryListener *listener,
+
+ llsize = int128_sub(llend, int128_make64(iova));
+
++ if (memory_region_is_ram_device(section->mr)) {
++ hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
++
++ if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
++ error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
++ " is not aligned to 0x%"HWADDR_PRIx
++ " and cannot be mapped for DMA",
++ section->offset_within_region,
++ int128_getlo(section->size),
++ pgmask + 1);
++ return;
++ }
++ }
++
+ ret = vfio_dma_map(container, iova, int128_get64(llsize),
+ vaddr, section->readonly);
+ if (ret) {
+ error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iova, int128_get64(llsize), vaddr, ret);
++ if (memory_region_is_ram_device(section->mr)) {
++ /* Allow unexpected mappings not to be fatal for RAM devices */
++ return;
++ }
+ goto fail;
+ }
+
+ return;
+
+ fail:
++ if (memory_region_is_ram_device(section->mr)) {
++ error_report("failed to vfio_dma_map. pci p2p may not work");
++ return;
++ }
+ /*
+ * On the initfn path, store the first error in the container so we
+ * can gracefully fail. Runtime, there's not much we can do other
+@@ -577,6 +599,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
+ hwaddr iova, end;
+ Int128 llend, llsize;
+ int ret;
++ bool try_unmap = true;
+
+ if (vfio_listener_skipped_section(section)) {
+ trace_vfio_listener_region_del_skip(
+@@ -629,14 +652,34 @@ static void vfio_listener_region_del(MemoryListener *listener,
+
+ trace_vfio_listener_region_del(iova, end);
+
+- ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
+- memory_region_unref(section->mr);
+- if (ret) {
+- error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+- "0x%"HWADDR_PRIx") = %d (%m)",
+- container, iova, int128_get64(llsize), ret);
++ if (memory_region_is_ram_device(section->mr)) {
++ hwaddr pgmask;
++ VFIOHostDMAWindow *hostwin;
++ bool hostwin_found = false;
++
++ QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
++ if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
++ hostwin_found = true;
++ break;
++ }
++ }
++ assert(hostwin_found); /* or region_add() would have failed */
++
++ pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
++ try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
++ }
++
++ if (try_unmap) {
++ ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
++ if (ret) {
++ error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
++ "0x%"HWADDR_PRIx") = %d (%m)",
++ container, iova, int128_get64(llsize), ret);
++ }
+ }
+
++ memory_region_unref(section->mr);
++
+ if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+ vfio_spapr_remove_window(container,
+ section->offset_within_address_space);
+--
+2.23.0
+
diff --git a/debian/patches/series b/debian/patches/series
index d6a690a..b2015aa 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -123,3 +123,8 @@ ubuntu/lp-1836154-09-s390-cpumodel-fix-description-for-the-new-vector-fac.patch
 ubuntu/lp-1836154-s390x-cpumodel-remove-esort-from-the-default-model.patch
 ubuntu/lp-1836154-s390x-cpumodel-also-change-name-of-vxbeh.patch
 ubuntu/lp-1837869-block-Fix-flags-in-reopen-queue.patch
+ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch
+ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch
+ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch
+lp-1847948-vfio-pci-Relax-DMA-map-errors-for-MMIO-regions.patch
+lp-1847948-vfio-Use-a-trace-point-when-a-RAM-section-cannot-be-.patch
diff --git a/debian/patches/ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch b/debian/patches/ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch
new file mode 100644
index 0000000..e888e41
--- /dev/null
+++ b/debian/patches/ubuntu/lp-1842774-s390x-cpumodel-Add-the-z15-name-to-the-description-o.patch
@@ -0,0 +1,30 @@
+From 7505deca0bfa859136ec6419dbafc504f22fcac2 Mon Sep 17 00:00:00 2001
+From: Christian Borntraeger <borntraeger@de.ibm.com>
+Date: Wed, 18 Sep 2019 16:42:14 +0200
+Subject: [PATCH] s390x/cpumodel: Add the z15 name to the description of gen15a
+
+We now know that gen15a is called z15.
+
+Reviewed-by: David Hildenbrand <david@redhat.com>
+Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
+
+Backport-Note: context slightly changed for z14.2
+Origin: backport, https://git.qemu.org/?p=qemu.git;a=commit;h=7505deca
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1842774
+Last-Update: 2019-09-24
+
+---
+ target/s390x/cpu_models.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/target/s390x/cpu_models.c
++++ b/target/s390x/cpu_models.c
+@@ -79,7 +79,7 @@ static S390CPUDef s390_cpu_defs[] = {
+ CPUDEF_INIT(0x2965, 13, 2, 47, 0x08000000U, "z13s", "IBM z13s GA1"),
+ CPUDEF_INIT(0x3906, 14, 1, 47, 0x08000000U, "z14", "IBM z14 GA1"),
+ CPUDEF_INIT(0x3907, 14, 1, 47, 0x08000000U, "z14ZR1", "IBM z14 Model ZR1 GA1"),
+- CPUDEF_INIT(0x8561, 15, 1, 47, 0x08000000U, "gen15a", "IBM 8561 GA1"),
++ CPUDEF_INIT(0x8561, 15, 1, 47, 0x08000000U, "gen15a", "IBM z15 GA1"),
+ CPUDEF_INIT(0x8562, 15, 1, 47, 0x08000000U, "gen15b", "IBM 8562 GA1"),
+ };
+
diff --git a/debian/patches/ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch b/debian/patches/ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch
new file mode 100644
index 0000000..8d3f8b0
--- /dev/null
+++ b/debian/patches/ubuntu/lp-1847948-ppc-spapr-vfio-Turn-off-MSIX-emulation-for-VFIO-devi.patch
@@ -0,0 +1,82 @@
+From fcad0d2121976df4b422b4007a5eb7fcaac01134 Mon Sep 17 00:00:00 2001
+From: Alexey Kardashevskiy <aik@ozlabs.ru>
+Date: Tue, 13 Mar 2018 11:17:31 -0600
+Subject: [PATCH] ppc/spapr, vfio: Turn off MSIX emulation for VFIO devices
+
+This adds a possibility for the platform to tell VFIO not to emulate MSIX
+so MMIO memory regions do not get split into chunks in flatview and
+the entire page can be registered as a KVM memory slot and make direct
+MMIO access possible for the guest.
+
+This enables the entire MSIX BAR mapping to the guest for the pseries
+platform in order to achieve the maximum MMIO preformance for certain
+devices.
+
+Tested on:
+LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
+
+Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
+Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
+Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
+
+Origin: upstream, https://git.qemu.org/?p=qemu.git;a=commit;h=fcad0d2121976df4b422b4007a5e
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1847948
+Last-Update: 2019-10-15
+
+---
+ hw/ppc/spapr.c | 7 +++++++
+ hw/vfio/pci.c | 13 +++++++++++++
+ 2 files changed, 20 insertions(+)
+
+diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
+index 7e1c858566..032d03423f 100644
+--- a/hw/ppc/spapr.c
++++ b/hw/ppc/spapr.c
+@@ -2855,6 +2855,11 @@ static void spapr_set_modern_hotplug_events(Object *obj, bool value,
+ spapr->use_hotplug_event_source = value;
+ }
+
++static bool spapr_get_msix_emulation(Object *obj, Error **errp)
++{
++ return true;
++}
++
+ static char *spapr_get_resize_hpt(Object *obj, Error **errp)
+ {
+ sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+@@ -2936,6 +2941,8 @@ static void spapr_instance_init(Object *obj)
+ object_property_set_description(obj, "vsmt",
+ "Virtual SMT: KVM behaves as if this were"
+ " the host's SMT mode", &error_abort);
++ object_property_add_bool(obj, "vfio-no-msix-emulation",
++ spapr_get_msix_emulation, NULL, NULL);
+ }
+
+ static void spapr_machine_finalizefn(Object *obj)
+diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
+index 02974f4eb9..b9bc6cd310 100644
+--- a/hw/vfio/pci.c
++++ b/hw/vfio/pci.c
+@@ -1581,6 +1581,19 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
+ */
+ memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
+
++ /*
++ * The emulated machine may provide a paravirt interface for MSIX setup
++ * so it is not strictly necessary to emulate MSIX here. This becomes
++ * helpful when frequently accessed MMIO registers are located in
++ * subpages adjacent to the MSIX table but the MSIX data containing page
++ * cannot be mapped because of a host page size bigger than the MSIX table
++ * alignment.
++ */
++ if (object_property_get_bool(OBJECT(qdev_get_machine()),
++ "vfio-no-msix-emulation", NULL)) {
++ memory_region_set_enabled(&vdev->pdev.msix_table_mmio, false);
++ }
++
+ return 0;
+ }
+
+--
+2.23.0
+
diff --git a/debian/patches/ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch b/debian/patches/ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch
new file mode 100644
index 0000000..b0c1550
--- /dev/null
+++ b/debian/patches/ubuntu/lp-1847948-vfio-pci-Allow-mmap-of-MSIX-BAR.patch
@@ -0,0 +1,95 @@
+From ae0215b2bb56a9d5321a185dde133bfdd306a4c0 Mon Sep 17 00:00:00 2001
+From: Alexey Kardashevskiy <aik@ozlabs.ru>
+Date: Tue, 13 Mar 2018 11:17:31 -0600
+Subject: [PATCH] vfio-pci: Allow mmap of MSIX BAR
+
+At the moment we unconditionally avoid mapping MSIX data of a BAR and
+emulate MSIX table in QEMU. However it is 1) not always necessary as
+a platform may provide a paravirt interface for MSIX configuration;
+2) can affect the speed of MMIO access by emulating them in QEMU when
+frequently accessed registers share same system page with MSIX data,
+this is particularly a problem for systems with the page size bigger
+than 4KB.
+
+A new capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - has been added
+to the kernel [1] which tells the userspace that mapping of the MSIX data
+is possible now. This makes use of it so from now on QEMU tries mapping
+the entire BAR as a whole and emulate MSIX on top of that.
+
+[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
+
+Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
+Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
+Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
+
+Origin: upstream, https://git.qemu.org/?p=qemu.git;a=commit;h=ae0215b2bb56a9d5321a185dde13
+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1847948
+Last-Update: 2019-10-15
+
+---
+ hw/vfio/common.c | 15 +++++++++++++++
+ hw/vfio/pci.c | 9 +++++++++
+ include/hw/vfio/vfio-common.h | 1 +
+ 3 files changed, 25 insertions(+)
+
+diff --git a/hw/vfio/common.c b/hw/vfio/common.c
+index 07c03d78b6..5e84716218 100644
+--- a/hw/vfio/common.c
++++ b/hw/vfio/common.c
+@@ -1471,6 +1471,21 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
+ return -ENODEV;
+ }
+
++bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
++{
++ struct vfio_region_info *info = NULL;
++ bool ret = false;
++
++ if (!vfio_get_region_info(vbasedev, region, &info)) {
++ if (vfio_get_region_info_cap(info, cap_type)) {
++ ret = true;
++ }
++ g_free(info);
++ }
++
++ return ret;
++}
++
+ /*
+ * Interfaces for IBM EEH (Enhanced Error Handling)
+ */
+diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
+index b9d2c12b82..02974f4eb9 100644
+--- a/hw/vfio/pci.c
++++ b/hw/vfio/pci.c
+@@ -1294,6 +1294,15 @@ static void vfio_pci_fixup_msix_region(VFIOPCIDevice *vdev)
+ off_t start, end;
+ VFIORegion *region = &vdev->bars[vdev->msix->table_bar].region;
+
++ /*
++ * If the host driver allows mapping of a MSIX data, we are going to
++ * do map the entire BAR and emulate MSIX table on top of that.
++ */
++ if (vfio_has_region_cap(&vdev->vbasedev, region->nr,
++ VFIO_REGION_INFO_CAP_MSIX_MAPPABLE)) {
++ return;
++ }
++
+ /*
+ * We expect to find a single mmap covering the whole BAR, anything else
+ * means it's either unsupported or already setup.
+diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
+index c5efa32750..d9360148e6 100644
+--- a/include/hw/vfio/vfio-common.h
++++ b/include/hw/vfio/vfio-common.h
+@@ -193,6 +193,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
+ struct vfio_region_info **info);
+ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
+ uint32_t subtype, struct vfio_region_info **info);
++bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
+ #endif
+ extern const MemoryListener vfio_prereg_listener;
+
+--
+2.23.0
+