Merge ~fheimes/ubuntu/+source/valgrind:valgrind-lp1825343-hirsute into ubuntu/+source/valgrind:ubuntu/hirsute-devel

Proposed by Frank Heimes
Status: Merged
Approved by: Christian Ehrhardt 
Approved revision: 9ce66c4fc97a353855c5cd5bf496d03e42867fda
Merged at revision: 9ce66c4fc97a353855c5cd5bf496d03e42867fda
Proposed branch: ~fheimes/ubuntu/+source/valgrind:valgrind-lp1825343-hirsute
Merge into: ubuntu/+source/valgrind:ubuntu/hirsute-devel
Diff against target: 3232 lines (+3198/-0)
5 files modified
debian/changelog (+9/-0)
debian/patches/lp-1825343-Bug-404076-s390x-Implement-z14-vector-instructions.patch (+2986/-0)
debian/patches/lp-1825343-Bug-428648-s390x-Force-12-bit-amode-for-vector-loads.patch (+45/-0)
debian/patches/lp-1825343-Bug-429864-s390-Use-Iop_CasCmp-to-fix-memcheck-false.patch (+155/-0)
debian/patches/series (+3/-0)
Reviewer                         Review Type   Date Requested   Status
Christian Ehrhardt (community)                                  Approve
Review via email: mp+397860@code.launchpad.net

Description of the change

valgrind-lp1825343-hirsute
  add support for IBM z14 instructions to Valgrind
  debian/patches/lp-1825343-Bug-404076-s390*.patches
  debian/changelog
  backported three commits from valgrind > v3.16.1
  Thanks to Andreas Arnez (LP: #1825343)

One patch needed to be modified to skip the following two files:
  - docs/internals/s390-opcodes.csv
  - auxprogs/s390-check-opcodes.pl
since these files are not included in the upstream 3.16.1 release tarball and are therefore also absent from the Ubuntu package '3.16.1-1ubuntu1'.

Test build is available here:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1825343

Christian Ehrhardt  (paelzer) wrote :

Changelog:
- [✓] changelog entry correct version and targeted codename
- [✓] changelog entries correct
- [✓] update-maintainer has been run

New Delta:
- [✓] new patches are good or match what was merged upstream
- [✓] new patches correctly included in debian/patches/series
- [✓] new patches have correct DEP3 metadata

Build/Test:
- [✓] build is ok (the one warning is not related to the changed code)
- [✓] verified PPA package installs/uninstalls
- [✓] sanity checks test fine

Some code changes are too complex to fully sign off on, e.g. lp-1825343-Bug-404076-s390x-Implement-z14-vector-instructions.patch. For that patch I have verified that it closely matches what went upstream, and I have to trust upstream as the subject matter experts.

Overall this looks like a good pre-FF (feature freeze) feature addition, pulled forward from the upcoming upstream version.
+1

review: Approve
Christian Ehrhardt  (paelzer) wrote :

To ssh://git.launchpad.net/ubuntu/+source/valgrind
 * [new tag] upload/1%3.16.1-1ubuntu2 -> upload/1%3.16.1-1ubuntu2

Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading valgrind_3.16.1-1ubuntu2.dsc: done.
  Uploading valgrind_3.16.1-1ubuntu2.debian.tar.xz: done.
  Uploading valgrind_3.16.1-1ubuntu2_source.buildinfo: done.
  Uploading valgrind_3.16.1-1ubuntu2_source.changes: done.
Successfully uploaded packages.

Frank Heimes (fheimes) wrote :

Many thx for reviewing, commenting, sponsoring, uploading and your overall support on this!

Preview Diff

1diff --git a/debian/changelog b/debian/changelog
2index c669e48..9b0d8a8 100644
3--- a/debian/changelog
4+++ b/debian/changelog
5@@ -1,3 +1,12 @@
6+valgrind (1:3.16.1-1ubuntu2) hirsute; urgency=medium
7+
8+ * debian/patches/lp-1825343-Bug-404076-s390*.patches
9+ adding support for IBM z14 instructions to Valgrind
10+ backported three commits from valgrind > v3.16.1
11+ Thanks to Andreas Arnez (LP: #1825343)
12+
13+ -- Frank Heimes <frank.heimes@canonical.com> Wed, 10 Feb 2021 20:10:24 +0100
14+
15 valgrind (1:3.16.1-1ubuntu1) groovy; urgency=low
16
17 * Merge from Debian unstable. Remaining changes:
18diff --git a/debian/patches/lp-1825343-Bug-404076-s390x-Implement-z14-vector-instructions.patch b/debian/patches/lp-1825343-Bug-404076-s390x-Implement-z14-vector-instructions.patch
19new file mode 100644
20index 0000000..fa985b9
21--- /dev/null
22+++ b/debian/patches/lp-1825343-Bug-404076-s390x-Implement-z14-vector-instructions.patch
23@@ -0,0 +1,2986 @@
24+From 159f132289160ab1a5a5cf4da14fb57ecdb248ca Mon Sep 17 00:00:00 2001
25+From: Andreas Arnez <arnez@linux.ibm.com>
26+Date: Mon, 7 Dec 2020 20:01:26 +0100
27+Subject: [PATCH] Bug 404076 - s390x: Implement z14 vector instructions
28+
29+Implement the new instructions/features that were added to z/Architecture
30+with the vector-enhancements facility 1. Also cover the instructions from
31+the vector-packed-decimal facility that are defined outside the chapter
32+"Vector Decimal Instructions", but not the ones from that chapter itself.
33+
34+For a detailed list of newly supported instructions see the updates to
35+`docs/internals/s390-opcodes.csv'.
36+
37+Since the miscellaneous instruction extensions facility 2 was already
38+addressed by Bug 404406, this completes the support necessary to run
39+general programs built with `--march=z14' under Valgrind. The
40+vector-packed-decimal facility is currently not exploited by the standard
41+toolchain and libraries.
42+
43+Author: Andreas Arnez <arnez@linux.ibm.com>
44+Origin: backport, https://sourceware.org/git/?p=valgrind.git;a=commit;h=159f13228
45+Bug-IBM: IBM Bugzilla 163660
46+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1825343
47+Applied-Upstream: > v3.16.1
48+Reviewed-by: Frank Heimes <frank.heimes@canonical.com>
49+Last-Update: 2021-02-10
50+
51+---
52+--- a/coregrind/m_initimg/initimg-linux.c
53++++ b/coregrind/m_initimg/initimg-linux.c
54+@@ -697,9 +697,13 @@
55+ }
56+ # elif defined(VGP_s390x_linux)
57+ {
58+- /* Advertise hardware features "below" TE and VXRS. TE itself
59+- and anything above VXRS is not supported by Valgrind. */
60+- auxv->u.a_val &= (VKI_HWCAP_S390_TE - 1) | VKI_HWCAP_S390_VXRS;
61++ /* Out of the hardware features available on the platform,
62++ advertise those "below" TE, as well as the ones explicitly
63++ ORed in the expression below. Anything else, such as TE
64++ itself, is not supported by Valgrind. */
65++ auxv->u.a_val &= ((VKI_HWCAP_S390_TE - 1)
66++ | VKI_HWCAP_S390_VXRS
67++ | VKI_HWCAP_S390_VXRS_EXT);
68+ }
69+ # elif defined(VGP_arm64_linux)
70+ {
71+--- a/coregrind/m_machine.c
72++++ b/coregrind/m_machine.c
73+@@ -1544,6 +1544,7 @@
74+ { False, S390_FAC_MSA5, VEX_HWCAPS_S390X_MSA5, "MSA5" },
75+ { False, S390_FAC_MI2, VEX_HWCAPS_S390X_MI2, "MI2" },
76+ { False, S390_FAC_LSC2, VEX_HWCAPS_S390X_LSC2, "LSC2" },
77++ { False, S390_FAC_VXE, VEX_HWCAPS_S390X_VXE, "VXE" },
78+ };
79+
80+ /* Set hwcaps according to the detected facilities */
81+--- a/include/vki/vki-s390x-linux.h
82++++ b/include/vki/vki-s390x-linux.h
83+@@ -806,6 +806,7 @@
84+
85+ #define VKI_HWCAP_S390_TE 1024
86+ #define VKI_HWCAP_S390_VXRS 2048
87++#define VKI_HWCAP_S390_VXRS_EXT 8192
88+
89+
90+ //----------------------------------------------------------------------
91+--- a/NEWS
92++++ b/NEWS
93+@@ -2,6 +2,7 @@
94+ 428648 s390_emit_load_mem panics due to 20-bit offset for vector load
95+ 429864 s390x: C++ atomic test_and_set yields false-positive memcheck
96+ diagnostics
97++404076 s390x: z14 vector instructions not implemented
98+
99+ Release 3.16.1 (22 June 2020)
100+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101+--- a/none/tests/s390x/vector_float.c
102++++ b/none/tests/s390x/vector_float.c
103+@@ -114,50 +114,59 @@
104+ test_with_selective_printing(vldeb, (V128_V_RES_AS_FLOAT64 |
105+ V128_V_ARG1_AS_FLOAT64));
106+ test_with_selective_printing(wldeb, (V128_V_RES_AS_FLOAT64 |
107+- V128_V_ARG1_AS_FLOAT64));
108++ V128_V_ARG1_AS_FLOAT64 |
109++ V128_V_RES_ZERO_ONLY));
110+
111+ test_with_selective_printing(vflcdb, (V128_V_RES_AS_FLOAT64 |
112+ V128_V_ARG1_AS_FLOAT64));
113+ test_with_selective_printing(wflcdb, (V128_V_RES_AS_FLOAT64 |
114+- V128_V_ARG1_AS_FLOAT64));
115++ V128_V_ARG1_AS_FLOAT64 |
116++ V128_V_RES_ZERO_ONLY));
117+ test_with_selective_printing(vflndb, (V128_V_RES_AS_FLOAT64 |
118+ V128_V_ARG1_AS_FLOAT64));
119+ test_with_selective_printing(wflndb, (V128_V_RES_AS_FLOAT64 |
120+- V128_V_ARG1_AS_FLOAT64));
121++ V128_V_ARG1_AS_FLOAT64 |
122++ V128_V_RES_ZERO_ONLY));
123+ test_with_selective_printing(vflpdb, (V128_V_RES_AS_FLOAT64 |
124+ V128_V_ARG1_AS_FLOAT64));
125+ test_with_selective_printing(wflpdb, (V128_V_RES_AS_FLOAT64 |
126+- V128_V_ARG1_AS_FLOAT64));
127++ V128_V_ARG1_AS_FLOAT64 |
128++ V128_V_RES_ZERO_ONLY));
129+
130+ test_with_selective_printing(vfadb, (V128_V_RES_AS_FLOAT64 |
131+ V128_V_ARG1_AS_FLOAT64 |
132+ V128_V_ARG2_AS_FLOAT64));
133+ test_with_selective_printing(wfadb, (V128_V_RES_AS_FLOAT64 |
134+ V128_V_ARG1_AS_FLOAT64 |
135+- V128_V_ARG2_AS_FLOAT64));
136++ V128_V_ARG2_AS_FLOAT64 |
137++ V128_V_RES_ZERO_ONLY));
138+ test_with_selective_printing(vfsdb, (V128_V_RES_AS_FLOAT64 |
139+ V128_V_ARG1_AS_FLOAT64 |
140+ V128_V_ARG2_AS_FLOAT64));
141+ test_with_selective_printing(wfsdb, (V128_V_RES_AS_FLOAT64 |
142+ V128_V_ARG1_AS_FLOAT64 |
143+- V128_V_ARG2_AS_FLOAT64));
144++ V128_V_ARG2_AS_FLOAT64 |
145++ V128_V_RES_ZERO_ONLY));
146+ test_with_selective_printing(vfmdb, (V128_V_RES_AS_FLOAT64 |
147+ V128_V_ARG1_AS_FLOAT64 |
148+ V128_V_ARG2_AS_FLOAT64));
149+ test_with_selective_printing(wfmdb, (V128_V_RES_AS_FLOAT64 |
150+ V128_V_ARG1_AS_FLOAT64 |
151+- V128_V_ARG2_AS_FLOAT64));
152++ V128_V_ARG2_AS_FLOAT64 |
153++ V128_V_RES_ZERO_ONLY));
154+ test_with_selective_printing(vfddb, (V128_V_RES_AS_FLOAT64 |
155+ V128_V_ARG1_AS_FLOAT64 |
156+ V128_V_ARG2_AS_FLOAT64));
157+ test_with_selective_printing(wfddb, (V128_V_RES_AS_FLOAT64 |
158+ V128_V_ARG1_AS_FLOAT64 |
159+- V128_V_ARG2_AS_FLOAT64));
160++ V128_V_ARG2_AS_FLOAT64 |
161++ V128_V_RES_ZERO_ONLY));
162+
163+ test_with_selective_printing(vfsqdb, (V128_V_RES_AS_FLOAT64 |
164+ V128_V_ARG1_AS_FLOAT64));
165+ test_with_selective_printing(wfsqdb, (V128_V_RES_AS_FLOAT64 |
166+- V128_V_ARG1_AS_FLOAT64));
167++ V128_V_ARG1_AS_FLOAT64 |
168++ V128_V_RES_ZERO_ONLY));
169+
170+ test_with_selective_printing(vfmadb, (V128_V_RES_AS_FLOAT64 |
171+ V128_V_ARG1_AS_FLOAT64 |
172+@@ -166,7 +175,8 @@
173+ test_with_selective_printing(wfmadb, (V128_V_RES_AS_FLOAT64 |
174+ V128_V_ARG1_AS_FLOAT64 |
175+ V128_V_ARG2_AS_FLOAT64 |
176+- V128_V_ARG3_AS_FLOAT64));
177++ V128_V_ARG3_AS_FLOAT64 |
178++ V128_V_RES_ZERO_ONLY));
179+ test_with_selective_printing(vfmsdb, (V128_V_RES_AS_FLOAT64 |
180+ V128_V_ARG1_AS_FLOAT64 |
181+ V128_V_ARG2_AS_FLOAT64 |
182+@@ -174,21 +184,25 @@
183+ test_with_selective_printing(wfmsdb, (V128_V_RES_AS_FLOAT64 |
184+ V128_V_ARG1_AS_FLOAT64 |
185+ V128_V_ARG2_AS_FLOAT64 |
186+- V128_V_ARG3_AS_FLOAT64));
187++ V128_V_ARG3_AS_FLOAT64 |
188++ V128_V_RES_ZERO_ONLY));
189+
190+ test_with_selective_printing(wfcdb, (V128_V_ARG1_AS_FLOAT64 |
191+ V128_V_ARG2_AS_FLOAT64 |
192+- V128_R_RES));
193++ V128_R_RES |
194++ V128_V_RES_ZERO_ONLY));
195+ test_with_selective_printing(wfkdb, (V128_V_ARG1_AS_FLOAT64 |
196+ V128_V_ARG2_AS_FLOAT64 |
197+- V128_R_RES));
198++ V128_R_RES |
199++ V128_V_RES_ZERO_ONLY));
200+
201+ test_with_selective_printing(vfcedb, (V128_V_RES_AS_INT |
202+ V128_V_ARG1_AS_FLOAT64 |
203+ V128_V_ARG2_AS_FLOAT64));
204+ test_with_selective_printing(wfcedb, (V128_V_RES_AS_INT |
205+ V128_V_ARG1_AS_FLOAT64 |
206+- V128_V_ARG2_AS_FLOAT64));
207++ V128_V_ARG2_AS_FLOAT64 |
208++ V128_V_RES_ZERO_ONLY));
209+ test_with_selective_printing(vfcedbs, (V128_V_RES_AS_INT |
210+ V128_V_ARG1_AS_FLOAT64 |
211+ V128_V_ARG2_AS_FLOAT64 |
212+@@ -196,14 +210,16 @@
213+ test_with_selective_printing(wfcedbs, (V128_V_RES_AS_INT |
214+ V128_V_ARG1_AS_FLOAT64 |
215+ V128_V_ARG2_AS_FLOAT64 |
216+- V128_R_RES));
217++ V128_R_RES |
218++ V128_V_RES_ZERO_ONLY));
219+
220+ test_with_selective_printing(vfchdb, (V128_V_RES_AS_INT |
221+ V128_V_ARG1_AS_FLOAT64 |
222+ V128_V_ARG2_AS_FLOAT64));
223+ test_with_selective_printing(wfchdb, (V128_V_RES_AS_INT |
224+ V128_V_ARG1_AS_FLOAT64 |
225+- V128_V_ARG2_AS_FLOAT64));
226++ V128_V_ARG2_AS_FLOAT64 |
227++ V128_V_RES_ZERO_ONLY));
228+ test_with_selective_printing(vfchdbs, (V128_V_RES_AS_INT |
229+ V128_V_ARG1_AS_FLOAT64 |
230+ V128_V_ARG2_AS_FLOAT64 |
231+@@ -211,14 +227,16 @@
232+ test_with_selective_printing(wfchdbs, (V128_V_RES_AS_INT |
233+ V128_V_ARG1_AS_FLOAT64 |
234+ V128_V_ARG2_AS_FLOAT64 |
235+- V128_R_RES));
236++ V128_R_RES |
237++ V128_V_RES_ZERO_ONLY));
238+
239+ test_with_selective_printing(vfchedb, (V128_V_RES_AS_INT |
240+ V128_V_ARG1_AS_FLOAT64 |
241+ V128_V_ARG2_AS_FLOAT64));
242+ test_with_selective_printing(wfchedb, (V128_V_RES_AS_INT |
243+ V128_V_ARG1_AS_FLOAT64 |
244+- V128_V_ARG2_AS_FLOAT64));
245++ V128_V_ARG2_AS_FLOAT64 |
246++ V128_V_RES_ZERO_ONLY));
247+ test_with_selective_printing(vfchedbs, (V128_V_RES_AS_INT |
248+ V128_V_ARG1_AS_FLOAT64 |
249+ V128_V_ARG2_AS_FLOAT64 |
250+@@ -226,7 +244,8 @@
251+ test_with_selective_printing(wfchedbs, (V128_V_RES_AS_INT |
252+ V128_V_ARG1_AS_FLOAT64 |
253+ V128_V_ARG2_AS_FLOAT64 |
254+- V128_R_RES));
255++ V128_R_RES |
256++ V128_V_RES_ZERO_ONLY));
257+
258+ test_with_selective_printing(vftcidb0, (V128_V_RES_AS_INT |
259+ V128_V_ARG1_AS_FLOAT64 |
260+--- a/none/tests/s390x/vector_float.stdout.exp
261++++ b/none/tests/s390x/vector_float.stdout.exp
262+@@ -419,88 +419,88 @@
263+ v_result = 7fffffffffffffff | 7fffffffffffffff
264+ v_arg1 = 0x1.fed2f087c21p+341 | 0x1.180e4c1d87fc4p+682
265+ insn wcgdb00:
266+- v_result = 7fffffffffffffff | 0000000000000000
267++ v_result = 7fffffffffffffff | --
268+ v_arg1 = 0x1.d7fd9222e8b86p+670 | 0x1.c272612672a3p+798
269+ insn wcgdb00:
270+- v_result = 0000000000000000 | 0000000000000000
271++ v_result = 0000000000000000 | --
272+ v_arg1 = 0x1.745cd360987e5p-496 | -0x1.f3b404919f358p-321
273+ insn wcgdb00:
274+- v_result = 8000000000000000 | 0000000000000000
275++ v_result = 8000000000000000 | --
276+ v_arg1 = -0x1.9523565cd92d5p+643 | 0x1.253677d6d3be2p-556
277+ insn wcgdb00:
278+- v_result = 7fffffffffffffff | 0000000000000000
279++ v_result = 7fffffffffffffff | --
280+ v_arg1 = 0x1.b6eb576ec3e6ap+845 | -0x1.c7e102c503d91p+266
281+ insn wcgdb01:
282+- v_result = 0000000000000000 | 0000000000000000
283++ v_result = 0000000000000000 | --
284+ v_arg1 = -0x1.3d4319841f4d6p-1011 | -0x1.2feabf7dfc506p-680
285+ insn wcgdb01:
286+- v_result = 0000000000000000 | 0000000000000000
287++ v_result = 0000000000000000 | --
288+ v_arg1 = -0x1.6fb8d1cd8b32cp-843 | -0x1.50f6a6922f97ep+33
289+ insn wcgdb01:
290+- v_result = 0000000000000000 | 0000000000000000
291++ v_result = 0000000000000000 | --
292+ v_arg1 = -0x1.64a673daccf1ap-566 | -0x1.69ef9b1d01499p+824
293+ insn wcgdb01:
294+- v_result = 8000000000000000 | 0000000000000000
295++ v_result = 8000000000000000 | --
296+ v_arg1 = -0x1.3e2ddd862b4adp+1005 | -0x1.312466410271p+184
297+ insn wcgdb03:
298+- v_result = 0000000000000001 | 0000000000000000
299++ v_result = 0000000000000001 | --
300+ v_arg1 = 0x1.d594c3412a11p-953 | -0x1.a07393d34d77cp-224
301+ insn wcgdb03:
302+- v_result = 8000000000000000 | 0000000000000000
303++ v_result = 8000000000000000 | --
304+ v_arg1 = -0x1.f7a0dbcfd6e4cp+104 | -0x1.40f7cde7f2214p-702
305+ insn wcgdb03:
306+- v_result = 8000000000000000 | 0000000000000000
307++ v_result = 8000000000000000 | --
308+ v_arg1 = -0x1.40739c1574808p+560 | -0x1.970328ddf1b6ep-374
309+ insn wcgdb03:
310+- v_result = 0000000000000001 | 0000000000000000
311++ v_result = 0000000000000001 | --
312+ v_arg1 = 0x1.477653afd7048p-38 | 0x1.1eac2f8b2a93cp-384
313+ insn wcgdb04:
314+- v_result = ffffffffe9479a7d | 0000000000000000
315++ v_result = ffffffffe9479a7d | --
316+ v_arg1 = -0x1.6b865833eff3p+28 | 0x1.06e8cf1834d0ep-722
317+ insn wcgdb04:
318+- v_result = 0000000000000000 | 0000000000000000
319++ v_result = 0000000000000000 | --
320+ v_arg1 = 0x1.eef0b2294a5cp-544 | -0x1.8e8b133ccda15p+752
321+ insn wcgdb04:
322+- v_result = 0000000000000000 | 0000000000000000
323++ v_result = 0000000000000000 | --
324+ v_arg1 = -0x1.f34e77e6b6698p-894 | -0x1.9f7ce1cb53bddp-896
325+ insn wcgdb04:
326+- v_result = 7fffffffffffffff | 0000000000000000
327++ v_result = 7fffffffffffffff | --
328+ v_arg1 = 0x1.95707a6d75db5p+1018 | -0x1.3b0c072d23011p-224
329+ insn wcgdb05:
330+- v_result = 0000000000000000 | 0000000000000000
331++ v_result = 0000000000000000 | --
332+ v_arg1 = -0x1.a9fb71160793p-968 | 0x1.05f601fe8123ap-986
333+ insn wcgdb05:
334+- v_result = 8000000000000000 | 0000000000000000
335++ v_result = 8000000000000000 | --
336+ v_arg1 = -0x1.0864159b94305p+451 | -0x1.d4647f5a78b7ep-599
337+ insn wcgdb05:
338+- v_result = 7fffffffffffffff | 0000000000000000
339++ v_result = 7fffffffffffffff | --
340+ v_arg1 = 0x1.37eadff8397c8p+432 | -0x1.15d896b6f6063p+464
341+ insn wcgdb05:
342+- v_result = 0000000000000000 | 0000000000000000
343++ v_result = 0000000000000000 | --
344+ v_arg1 = 0x1.eb0812b0d677p-781 | 0x1.3117c5e0e288cp-202
345+ insn wcgdb06:
346+- v_result = 0000000000000001 | 0000000000000000
347++ v_result = 0000000000000001 | --
348+ v_arg1 = 0x1.6b88069167c0fp-662 | -0x1.70571d27e1279p+254
349+ insn wcgdb06:
350+- v_result = 7fffffffffffffff | 0000000000000000
351++ v_result = 7fffffffffffffff | --
352+ v_arg1 = 0x1.f6a6d6e883596p+260 | 0x1.0d578afaaa34ap+604
353+ insn wcgdb06:
354+- v_result = 0000000000000001 | 0000000000000000
355++ v_result = 0000000000000001 | --
356+ v_arg1 = 0x1.d91c7d13c4694p-475 | -0x1.ecf1f8529767bp+830
357+ insn wcgdb06:
358+- v_result = 0000000000000001 | 0000000000000000
359++ v_result = 0000000000000001 | --
360+ v_arg1 = 0x1.fac8dd3bb7af6p-101 | 0x1.fb8324a00fba8p+959
361+ insn wcgdb07:
362+- v_result = 7fffffffffffffff | 0000000000000000
363++ v_result = 7fffffffffffffff | --
364+ v_arg1 = 0x1.4b0fa18fa73c7p+111 | -0x1.08e7b17633a49p+61
365+ insn wcgdb07:
366+- v_result = e636b693e39a1100 | 0000000000000000
367++ v_result = e636b693e39a1100 | --
368+ v_arg1 = -0x1.9c9496c1c65efp+60 | 0x1.c4182ee728d76p-572
369+ insn wcgdb07:
370+- v_result = ffffffffffffffff | 0000000000000000
371++ v_result = ffffffffffffffff | --
372+ v_arg1 = -0x1.819718032dff7p-303 | 0x1.a784c77ff6aa2p-622
373+ insn wcgdb07:
374+- v_result = 7fffffffffffffff | 0000000000000000
375++ v_result = 7fffffffffffffff | --
376+ v_arg1 = 0x1.978e8abfd83c2p+152 | 0x1.2531ebf451762p+315
377+ insn vclgdb00:
378+ v_result = 0000000000000000 | 0000000000000000
379+@@ -587,88 +587,88 @@
380+ v_result = 0000000000000000 | 0000000000000000
381+ v_arg1 = -0x1.137bbb51f08bdp+306 | 0x1.18d2a1063356p-795
382+ insn wclgdb00:
383+- v_result = 0000000000000000 | 0000000000000000
384++ v_result = 0000000000000000 | --
385+ v_arg1 = -0x1.e66f55dcc2639p-1013 | -0x1.733ee56929f3bp-304
386+ insn wclgdb00:
387+- v_result = 0000000000000000 | 0000000000000000
388++ v_result = 0000000000000000 | --
389+ v_arg1 = 0x1.8802fd9ab740cp-986 | -0x1.64d4d2c7c145fp-1015
390+ insn wclgdb00:
391+- v_result = 0000000000000000 | 0000000000000000
392++ v_result = 0000000000000000 | --
393+ v_arg1 = 0x1.a67209b8c407bp-645 | -0x1.6410ff9b1c801p+487
394+ insn wclgdb00:
395+- v_result = 0000000000000000 | 0000000000000000
396++ v_result = 0000000000000000 | --
397+ v_arg1 = -0x1.cb2febaefeb2dp+49 | 0x1.dee368b2ec375p-502
398+ insn wclgdb01:
399+- v_result = 0000000000000000 | 0000000000000000
400++ v_result = 0000000000000000 | --
401+ v_arg1 = 0x1.5703db3c1b0e2p-728 | 0x1.068c4d51ea4ebp+617
402+ insn wclgdb01:
403+- v_result = 0000000000000000 | 0000000000000000
404++ v_result = 0000000000000000 | --
405+ v_arg1 = -0x1.ae350291e5b3ep+291 | 0x1.1b87bb09b6032p+376
406+ insn wclgdb01:
407+- v_result = ffffffffffffffff | 0000000000000000
408++ v_result = ffffffffffffffff | --
409+ v_arg1 = 0x1.c4666a710127ep+424 | -0x1.19e969b6c0076p+491
410+ insn wclgdb01:
411+- v_result = ffffffffffffffff | 0000000000000000
412++ v_result = ffffffffffffffff | --
413+ v_arg1 = 0x1.c892c5a4d103fp+105 | -0x1.d4f937cc76704p+749
414+ insn wclgdb03:
415+- v_result = 0000000000000001 | 0000000000000000
416++ v_result = 0000000000000001 | --
417+ v_arg1 = 0x1.81090d8fc663dp-111 | 0x1.337ec5e0f0904p+1
418+ insn wclgdb03:
419+- v_result = 0000000000000000 | 0000000000000000
420++ v_result = 0000000000000000 | --
421+ v_arg1 = -0x1.e787adc70b91p-593 | 0x1.db8d83196b53cp-762
422+ insn wclgdb03:
423+- v_result = ffffffffffffffff | 0000000000000000
424++ v_result = ffffffffffffffff | --
425+ v_arg1 = 0x1.6529307e907efp+389 | -0x1.3ea0d8d5b4dd2p+589
426+ insn wclgdb03:
427+- v_result = 0000000000000000 | 0000000000000000
428++ v_result = 0000000000000000 | --
429+ v_arg1 = -0x1.be701a158637p-385 | 0x1.c5a7f70cb8a09p+107
430+ insn wclgdb04:
431+- v_result = 0000000000000000 | 0000000000000000
432++ v_result = 0000000000000000 | --
433+ v_arg1 = -0x1.2f328571ab445p+21 | -0x1.dcc21fc82ba01p-930
434+ insn wclgdb04:
435+- v_result = 0000000000000000 | 0000000000000000
436++ v_result = 0000000000000000 | --
437+ v_arg1 = -0x1.06b69fcbb7bffp-415 | 0x1.6f9a13a0a827ap+915
438+ insn wclgdb04:
439+- v_result = 0000000000000000 | 0000000000000000
440++ v_result = 0000000000000000 | --
441+ v_arg1 = -0x1.738e549b38bcdp+479 | 0x1.a522edb999c9p-45
442+ insn wclgdb04:
443+- v_result = 0000000000000000 | 0000000000000000
444++ v_result = 0000000000000000 | --
445+ v_arg1 = 0x1.7f9399d2bcf3bp-215 | -0x1.7bc35f2d69a7fp+818
446+ insn wclgdb05:
447+- v_result = ffffffffffffffff | 0000000000000000
448++ v_result = ffffffffffffffff | --
449+ v_arg1 = 0x1.fc542bdb707f6p+880 | -0x1.8521ebc93a25fp-969
450+ insn wclgdb05:
451+- v_result = 1ce8d9951b8c8600 | 0000000000000000
452++ v_result = 1ce8d9951b8c8600 | --
453+ v_arg1 = 0x1.ce8d9951b8c86p+60 | 0x1.92712589230e7p+475
454+ insn wclgdb05:
455+- v_result = 0000000000000000 | 0000000000000000
456++ v_result = 0000000000000000 | --
457+ v_arg1 = -0x1.8a297f60a0811p-156 | 0x1.102b79043d82cp-204
458+ insn wclgdb05:
459+- v_result = 0000000000000000 | 0000000000000000
460++ v_result = 0000000000000000 | --
461+ v_arg1 = 0x1.beb9057e1401dp-196 | -0x1.820f18f830262p+15
462+ insn wclgdb06:
463+- v_result = 0000000000000001 | 0000000000000000
464++ v_result = 0000000000000001 | --
465+ v_arg1 = 0x1.c321a966ecb4dp-430 | -0x1.2f6a1a95ead99p-943
466+ insn wclgdb06:
467+- v_result = 0000000000000000 | 0000000000000000
468++ v_result = 0000000000000000 | --
469+ v_arg1 = -0x1.f1a86b4aed821p-56 | -0x1.1ee6717cc2d7fp-899
470+ insn wclgdb06:
471+- v_result = 0000000000000000 | 0000000000000000
472++ v_result = 0000000000000000 | --
473+ v_arg1 = -0x1.73ce49d89ecb9p-302 | 0x1.52663b975ed23p-716
474+ insn wclgdb06:
475+- v_result = 0000000000000000 | 0000000000000000
476++ v_result = 0000000000000000 | --
477+ v_arg1 = -0x1.3e9c2de97a292p+879 | 0x1.d34eed36f2eafp+960
478+ insn wclgdb07:
479+- v_result = 0000000000000000 | 0000000000000000
480++ v_result = 0000000000000000 | --
481+ v_arg1 = -0x1.4e6ec6ddc6a45p-632 | -0x1.6e564d0fec72bp+369
482+ insn wclgdb07:
483+- v_result = ffffffffffffffff | 0000000000000000
484++ v_result = ffffffffffffffff | --
485+ v_arg1 = 0x1.42e2c658e4c4dp+459 | -0x1.9f9dc0252e44p+85
486+ insn wclgdb07:
487+- v_result = 0000000000000000 | 0000000000000000
488++ v_result = 0000000000000000 | --
489+ v_arg1 = -0x1.fb40ac8cda3c1p-762 | 0x1.0e9ed614bc8f1p-342
490+ insn wclgdb07:
491+- v_result = 0000000000000000 | 0000000000000000
492++ v_result = 0000000000000000 | --
493+ v_arg1 = -0x1.c1f8b3c68e214p+118 | -0x1.1a26a49368b61p+756
494+ insn vfidb00:
495+ v_arg1 = -0x1.38df4cf9d52dbp-545 | -0x1.049253d90dd92p+94
496+@@ -1020,16 +1020,16 @@
497+ v_result = -0x1.6f5fb2p+70 | -0x1.0d2df6p-107
498+ insn wldeb:
499+ v_arg1 = -0x1.d26169729db2ap-435 | 0x1.d6fd080793e8cp+767
500+- v_result = -0x1.9a4c2cp-54 | 0x0p+0
501++ v_result = -0x1.9a4c2cp-54 | --
502+ insn wldeb:
503+ v_arg1 = -0x1.f4b59107fce61p-930 | 0x1.cdf2816e253f4p-168
504+- v_result = -0x1.be96b2p-116 | 0x0p+0
505++ v_result = -0x1.be96b2p-116 | --
506+ insn wldeb:
507+ v_arg1 = -0x1.9603a2997928cp-441 | -0x1.aada85e355a11p-767
508+- v_result = -0x1.d2c074p-55 | 0x0p+0
509++ v_result = -0x1.d2c074p-55 | --
510+ insn wldeb:
511+ v_arg1 = 0x1.25ccf5bd0e83p+620 | 0x1.e1635864ebb17p-88
512+- v_result = 0x1.64b99ep+78 | 0x0p+0
513++ v_result = 0x1.64b99ep+78 | --
514+ insn vflcdb:
515+ v_arg1 = 0x1.0ae6d82f76afp-166 | -0x1.e8fb1e03a7415p-191
516+ v_result = -0x1.0ae6d82f76afp-166 | 0x1.e8fb1e03a7415p-191
517+@@ -1044,16 +1044,16 @@
518+ v_result = -0x1.19520153d35b4p-301 | -0x1.ac5325cd23253p+396
519+ insn wflcdb:
520+ v_arg1 = 0x1.ffd3eecfd54d7p-831 | -0x1.97854fa523a77p+146
521+- v_result = -0x1.ffd3eecfd54d7p-831 | 0x0p+0
522++ v_result = -0x1.ffd3eecfd54d7p-831 | --
523+ insn wflcdb:
524+ v_arg1 = -0x1.508ea45606447p-442 | 0x1.ae7f0e6cf9d2bp+583
525+- v_result = 0x1.508ea45606447p-442 | 0x0p+0
526++ v_result = 0x1.508ea45606447p-442 | --
527+ insn wflcdb:
528+ v_arg1 = 0x1.da8ab2188c21ap+94 | 0x1.78a9c152aa074p-808
529+- v_result = -0x1.da8ab2188c21ap+94 | 0x0p+0
530++ v_result = -0x1.da8ab2188c21ap+94 | --
531+ insn wflcdb:
532+ v_arg1 = -0x1.086882645e0c5p-1001 | -0x1.54e2de5af5a74p-262
533+- v_result = 0x1.086882645e0c5p-1001 | 0x0p+0
534++ v_result = 0x1.086882645e0c5p-1001 | --
535+ insn vflndb:
536+ v_arg1 = -0x1.5bec561d407dcp+819 | -0x1.a5773dadb7a2dp+935
537+ v_result = -0x1.5bec561d407dcp+819 | -0x1.a5773dadb7a2dp+935
538+@@ -1068,16 +1068,16 @@
539+ v_result = -0x1.c5bc39a06d4e2p-259 | -0x1.c5e61ad849e77p-833
540+ insn wflndb:
541+ v_arg1 = -0x1.e9f3e6d1beffap-117 | -0x1.d58cc8bf123b3p-714
542+- v_result = -0x1.e9f3e6d1beffap-117 | 0x0p+0
543++ v_result = -0x1.e9f3e6d1beffap-117 | --
544+ insn wflndb:
545+ v_arg1 = -0x1.3fc4ef2e7485ep-691 | 0x1.eb328986081efp-775
546+- v_result = -0x1.3fc4ef2e7485ep-691 | 0x0p+0
547++ v_result = -0x1.3fc4ef2e7485ep-691 | --
548+ insn wflndb:
549+ v_arg1 = -0x1.7146c5afdec16p+23 | -0x1.597fcfa1fab2p-708
550+- v_result = -0x1.7146c5afdec16p+23 | 0x0p+0
551++ v_result = -0x1.7146c5afdec16p+23 | --
552+ insn wflndb:
553+ v_arg1 = 0x1.03f8d7e9afe84p-947 | 0x1.9a10c3feb6b57p-118
554+- v_result = -0x1.03f8d7e9afe84p-947 | 0x0p+0
555++ v_result = -0x1.03f8d7e9afe84p-947 | --
556+ insn vflpdb:
557+ v_arg1 = 0x1.64ae59b6c762ep-407 | -0x1.fa7191ab21e86p+533
558+ v_result = 0x1.64ae59b6c762ep-407 | 0x1.fa7191ab21e86p+533
559+@@ -1092,16 +1092,16 @@
560+ v_result = 0x1.85fa2de1d492ap+170 | 0x1.ac36828822c11p-968
561+ insn wflpdb:
562+ v_arg1 = 0x1.a6cf677640a73p-871 | 0x1.b6f1792385922p-278
563+- v_result = 0x1.a6cf677640a73p-871 | 0x0p+0
564++ v_result = 0x1.a6cf677640a73p-871 | --
565+ insn wflpdb:
566+ v_arg1 = -0x1.b886774f6d888p-191 | -0x1.6a2b08d735d22p-643
567+- v_result = 0x1.b886774f6d888p-191 | 0x0p+0
568++ v_result = 0x1.b886774f6d888p-191 | --
569+ insn wflpdb:
570+ v_arg1 = 0x1.5045d37d46f5fp+943 | -0x1.333a86ef2dcf6p-1013
571+- v_result = 0x1.5045d37d46f5fp+943 | 0x0p+0
572++ v_result = 0x1.5045d37d46f5fp+943 | --
573+ insn wflpdb:
574+ v_arg1 = 0x1.1e7bec6ada14dp+252 | 0x1.a70b3f3e24dap-153
575+- v_result = 0x1.1e7bec6ada14dp+252 | 0x0p+0
576++ v_result = 0x1.1e7bec6ada14dp+252 | --
577+ insn vfadb:
578+ v_arg1 = 0x1.5b1ad8e9f17c6p-294 | -0x1.ddd8300a0bf02p+122
579+ v_arg2 = -0x1.9b49c31ca8ac6p+926 | 0x1.fdbc992926268p+677
580+@@ -1121,19 +1121,19 @@
581+ insn wfadb:
582+ v_arg1 = 0x1.3c5466cb80722p+489 | -0x1.11e1770053ca2p+924
583+ v_arg2 = 0x1.d876cd721a726p-946 | 0x1.5c04ceb79c9bcp+1001
584+- v_result = 0x1.3c5466cb80722p+489 | 0x0p+0
585++ v_result = 0x1.3c5466cb80722p+489 | --
586+ insn wfadb:
587+ v_arg1 = 0x1.b0b142d6b76a3p+577 | 0x1.3146824e993a2p+432
588+ v_arg2 = -0x1.f7f3b7582925fp-684 | -0x1.9700143c2b935p-837
589+- v_result = 0x1.b0b142d6b76a2p+577 | 0x0p+0
590++ v_result = 0x1.b0b142d6b76a2p+577 | --
591+ insn wfadb:
592+ v_arg1 = -0x1.8d65e15edabd6p+244 | 0x1.3be7fd08492d6p-141
593+ v_arg2 = -0x1.5eef86490fb0ap+481 | 0x1.7b26c897cb6dfp+810
594+- v_result = -0x1.5eef86490fb0ap+481 | 0x0p+0
595++ v_result = -0x1.5eef86490fb0ap+481 | --
596+ insn wfadb:
597+ v_arg1 = -0x1.2dffa5b5f29p+34 | 0x1.71a026274602fp-881
598+ v_arg2 = 0x1.4dad707287289p+756 | -0x1.1500d55807247p-616
599+- v_result = 0x1.4dad707287288p+756 | 0x0p+0
600++ v_result = 0x1.4dad707287288p+756 | --
601+ insn vfsdb:
602+ v_arg1 = 0x1.054fd9c4d4883p+644 | 0x1.45c90ed85bd7fp-780
603+ v_arg2 = 0x1.f3bc7a611dadap+494 | -0x1.7c9e1e858ba5bp-301
604+@@ -1153,19 +1153,19 @@
605+ insn wfsdb:
606+ v_arg1 = 0x1.9090dabf846e7p-648 | 0x1.1c4ab843a2d15p+329
607+ v_arg2 = -0x1.a7ceb293690dep+316 | 0x1.22245954a20cp+42
608+- v_result = 0x1.a7ceb293690dep+316 | 0x0p+0
609++ v_result = 0x1.a7ceb293690dep+316 | --
610+ insn wfsdb:
611+ v_arg1 = 0x1.4e5347c27819p-933 | -0x1.56a30bda28351p-64
612+ v_arg2 = -0x1.dedb9f3935b56p-155 | 0x1.8c5b6ed76816cp-522
613+- v_result = 0x1.dedb9f3935b56p-155 | 0x0p+0
614++ v_result = 0x1.dedb9f3935b56p-155 | --
615+ insn wfsdb:
616+ v_arg1 = 0x1.0ec4e562a015bp-491 | 0x1.3996381b52d9fp-686
617+ v_arg2 = 0x1.1dcce4e81819p+960 | -0x1.32fa425e8fc08p-263
618+- v_result = -0x1.1dcce4e81818fp+960 | 0x0p+0
619++ v_result = -0x1.1dcce4e81818fp+960 | --
620+ insn wfsdb:
621+ v_arg1 = -0x1.587229f90f77dp-19 | 0x1.100d8eb8105e4p-784
622+ v_arg2 = -0x1.afb4cce4c43ddp+530 | -0x1.6da7f05e7f512p-869
623+- v_result = 0x1.afb4cce4c43dcp+530 | 0x0p+0
624++ v_result = 0x1.afb4cce4c43dcp+530 | --
625+ insn vfmdb:
626+ v_arg1 = 0x1.892b425556c47p-124 | 0x1.38222404079dfp-656
627+ v_arg2 = 0x1.af612ed2c342dp-267 | -0x1.1f735fd6ce768p-877
628+@@ -1185,19 +1185,19 @@
629+ insn wfmdb:
630+ v_arg1 = -0x1.b992d950126a1p-683 | -0x1.9c1b22eb58c59p-497
631+ v_arg2 = 0x1.b557a7d8e32c3p-25 | -0x1.f746b2ddafccep+227
632+- v_result = -0x1.792f6fb13894ap-707 | 0x0p+0
633++ v_result = -0x1.792f6fb13894ap-707 | --
634+ insn wfmdb:
635+ v_arg1 = -0x1.677a8c20a5a2fp+876 | 0x1.c03e7b97e8c0dp-645
636+ v_arg2 = 0x1.dab44be430937p-1011 | -0x1.3f51352c67be9p-916
637+- v_result = -0x1.4d4b0a1827064p-134 | 0x0p+0
638++ v_result = -0x1.4d4b0a1827064p-134 | --
639+ insn wfmdb:
640+ v_arg1 = -0x1.da60f596ad0cep+254 | 0x1.52332e0650e33p+966
641+ v_arg2 = 0x1.a042c52ed993cp+215 | 0x1.8f380c84aa133p+204
642+- v_result = -0x1.81aca4bbcbd24p+470 | 0x0p+0
643++ v_result = -0x1.81aca4bbcbd24p+470 | --
644+ insn wfmdb:
645+ v_arg1 = -0x1.83d17f11f6aa3p-469 | -0x1.98117efe89b9ep-361
646+ v_arg2 = 0x1.8c445fd46d214p-701 | -0x1.f98118821821cp+596
647+- v_result = -0x0p+0 | 0x0p+0
648++ v_result = -0x0p+0 | --
649+ insn vfddb:
650+ v_arg1 = -0x1.ecbb48899e0f1p+969 | 0x1.caf175ab352p-20
651+ v_arg2 = -0x1.9455d67f9f79dp+208 | 0x1.bc4a431b04a6fp+482
652+@@ -1217,19 +1217,19 @@
653+ insn wfddb:
654+ v_arg1 = 0x1.bd48489b60731p-114 | 0x1.a760dcf57b74fp-51
655+ v_arg2 = -0x1.171f83409eeb6p-402 | -0x1.e159d1409bdc6p-972
656+- v_result = -0x1.9864f1511f8cp+288 | 0x0p+0
657++ v_result = -0x1.9864f1511f8cp+288 | --
658+ insn wfddb:
659+ v_arg1 = -0x1.120505ef4606p-637 | -0x1.83f6f775c0eb7p+272
660+ v_arg2 = -0x1.d18ba3872fde1p+298 | 0x1.c60f8d191068cp-454
661+- v_result = 0x1.2d5cdb15a686cp-936 | 0x0p+0
662++ v_result = 0x1.2d5cdb15a686cp-936 | --
663+ insn wfddb:
664+ v_arg1 = 0x1.f637f7f8c790fp-97 | -0x1.7bdce4d74947p+189
665+ v_arg2 = -0x1.1c8f2d1b3a2edp-218 | -0x1.55fdfd1840241p-350
666+- v_result = -0x1.c3d0799c1420fp+121 | 0x0p+0
667++ v_result = -0x1.c3d0799c1420fp+121 | --
668+ insn wfddb:
669+ v_arg1 = -0x1.c63b7b2eee253p+250 | 0x1.dfd9dcd8b823fp-125
670+ v_arg2 = 0x1.094a1f1f87e0cp+629 | 0x1.eeaa23c0d7843p-814
671+- v_result = -0x1.b653a10ebdeccp-379 | 0x0p+0
672++ v_result = -0x1.b653a10ebdeccp-379 | --
673+ insn vfsqdb:
674+ v_arg1 = 0x1.f60db25f7066p-703 | -0x1.d43509abca8c3p+631
675+ v_result = 0x1.fb009ab25ec11p-352 | nan
676+@@ -1244,16 +1244,16 @@
677+ v_result = 0x1.833dba0954bccp+249 | nan
678+ insn wfsqdb:
679+ v_arg1 = 0x1.71af4e7f64978p+481 | -0x1.3429dc60011d7p-879
680+- v_result = 0x1.b30fc65551133p+240 | 0x0p+0
681++ v_result = 0x1.b30fc65551133p+240 | --
682+ insn wfsqdb:
683+ v_arg1 = 0x1.5410db1c5f403p+173 | 0x1.97fa6581e692fp+108
684+- v_result = 0x1.a144f43a592c1p+86 | 0x0p+0
685++ v_result = 0x1.a144f43a592c1p+86 | --
686+ insn wfsqdb:
687+ v_arg1 = -0x1.5838027725afep+6 | 0x1.ac61529c11f38p+565
688+- v_result = nan | 0x0p+0
689++ v_result = nan | --
690+ insn wfsqdb:
691+ v_arg1 = -0x1.159e341dcc06ep-439 | 0x1.ed54ce5481ba5p-574
692+- v_result = nan | 0x0p+0
693++ v_result = nan | --
694+ insn vfmadb:
695+ v_arg1 = -0x1.eb00a5c503d75p+538 | 0x1.89fae603ddc07p+767
696+ v_arg2 = -0x1.71c72712c3957p+715 | 0x1.1bd5773442feap+762
697+@@ -1278,22 +1278,22 @@
698+ v_arg1 = 0x1.1cc5b10a14d54p+668 | -0x1.686407390f7d1p+616
699+ v_arg2 = -0x1.bf34549e73246p+676 | -0x1.dc5a34cc470f3p+595
700+ v_arg3 = -0x1.95e0fdcf13974p-811 | -0x1.79c7cc1a8ec83p-558
701+- v_result = -0x1.fffffffffffffp+1023 | 0x0p+0
702++ v_result = -0x1.fffffffffffffp+1023 | --
703+ insn wfmadb:
704+ v_arg1 = 0x1.138bc1a5d75f8p+713 | -0x1.e226ebba2fe54p+381
705+ v_arg2 = -0x1.081ebb7cc3414p-772 | 0x1.369d99e174fc3p+922
706+ v_arg3 = -0x1.0671c682a5d0cp-1016 | 0x1.03c9530dd0377p+378
707+- v_result = -0x1.1c4933e117d95p-59 | 0x0p+0
708++ v_result = -0x1.1c4933e117d95p-59 | --
709+ insn wfmadb:
710+ v_arg1 = -0x1.166f0b1fad67bp+64 | -0x1.e9ee8d32e1069p-452
711+ v_arg2 = -0x1.4a235bdd109e2p-65 | 0x1.bacaa96fc7e81p-403
712+ v_arg3 = -0x1.d2e19acf7c4bdp+99 | 0x1.f901130f685adp-963
713+- v_result = -0x1.d2e19acf7c4bcp+99 | 0x0p+0
714++ v_result = -0x1.d2e19acf7c4bcp+99 | --
715+ insn wfmadb:
716+ v_arg1 = -0x1.77d7bfec863d2p-988 | -0x1.b68029700c6b1p-206
717+ v_arg2 = -0x1.aca05ad00aec1p+737 | 0x1.ac746bd7e216bp+51
718+ v_arg3 = 0x1.17342292078b4p+188 | -0x1.49efaf9392301p+555
719+- v_result = 0x1.17342292078b4p+188 | 0x0p+0
720++ v_result = 0x1.17342292078b4p+188 | --
721+ insn vfmsdb:
722+ v_arg1 = -0x1.a1b218e84e61p+34 | 0x1.b220f0d144daep-111
723+ v_arg2 = 0x1.564fcc2527961p-265 | 0x1.ea85a4154721ep+733
724+@@ -1318,22 +1318,22 @@
725+ v_arg1 = -0x1.7499a639673a6p-100 | -0x1.2a0d737e6cb1cp-207
726+ v_arg2 = -0x1.01ad4670a7aa3p-911 | 0x1.f94385e1021e8p+317
727+ v_arg3 = 0x1.aa42b2bb17af9p+982 | 0x1.c550e471711p+786
728+- v_result = -0x1.aa42b2bb17af8p+982 | 0x0p+0
729++ v_result = -0x1.aa42b2bb17af8p+982 | --
730+ insn wfmsdb:
731+ v_arg1 = 0x1.76840f99b431ep+500 | -0x1.989a500c92c08p+594
732+ v_arg2 = 0x1.33c657cb8385cp-84 | -0x1.2c795ad92ce17p+807
733+ v_arg3 = -0x1.ee58a39f02d54p-351 | -0x1.18695ed9a280ap+48
734+- v_result = 0x1.c242894a0068p+416 | 0x0p+0
735++ v_result = 0x1.c242894a0068p+416 | --
736+ insn wfmsdb:
737+ v_arg1 = -0x1.16db07e054a65p-469 | -0x1.3a627ab99c6e4p+689
738+ v_arg2 = 0x1.17872eae826e5p-538 | 0x1.44ed513fb5873p-929
739+ v_arg3 = 0x1.5ca912008e077p-217 | -0x1.982a6f7359876p-23
740+- v_result = -0x1.5ca912008e077p-217 | 0x0p+0
741++ v_result = -0x1.5ca912008e077p-217 | --
742+ insn wfmsdb:
743+ v_arg1 = -0x1.d315f4a932c6p+122 | 0x1.616a04493e143p+513
744+ v_arg2 = -0x1.cf1cd3516f23fp+552 | 0x1.7121749c3932cp-750
745+ v_arg3 = 0x1.dc26d92304d7fp-192 | -0x1.1fc3cca9ec20ep+371
746+- v_result = 0x1.a67ca6ba395bcp+675 | 0x0p+0
747++ v_result = 0x1.a67ca6ba395bcp+675 | --
748+ insn wfcdb:
749+ v_arg1 = 0x1.302001b736011p-633 | -0x1.72d5300225c97p-468
750+ v_arg2 = -0x1.8c007c5aba108p-17 | -0x1.bb3f9ae136acdp+569
751+@@ -1383,19 +1383,19 @@
752+ v_arg1 = 0x1.d8e5c9930c19dp+623 | -0x1.cf1facff4e194p-605
753+ v_arg2 = -0x1.ed6ba02646d0dp+441 | -0x1.2d677e710620bp+810
754+ insn wfcedb:
755+- v_result = 0000000000000000 | 0000000000000000
756++ v_result = 0000000000000000 | --
757+ v_arg1 = -0x1.a252009e1a12cp-442 | 0x1.4dc608268bb29p-513
758+ v_arg2 = -0x1.81020aa1a36e6p-687 | -0x1.300e64ce414f1p-899
759+ insn wfcedb:
760+- v_result = 0000000000000000 | 0000000000000000
761++ v_result = 0000000000000000 | --
762+ v_arg1 = 0x1.cec439a8d4781p-175 | -0x1.d20e3b281d599p+893
763+ v_arg2 = 0x1.ca17cf16cf0aap-879 | 0x1.61506f8596092p+545
764+ insn wfcedb:
765+- v_result = 0000000000000000 | 0000000000000000
766++ v_result = 0000000000000000 | --
767+ v_arg1 = 0x1.0659f5f24a004p+877 | 0x1.fc46867ed0338p-680
768+ v_arg2 = -0x1.1d6849587155ep-1010 | -0x1.f68171edc235fp+575
769+ insn wfcedb:
770+- v_result = 0000000000000000 | 0000000000000000
771++ v_result = 0000000000000000 | --
772+ v_arg1 = 0x1.dc88a0d46ad79p-816 | 0x1.245140dcaed79p+851
773+ v_arg2 = 0x1.b33e977c7b3ep-818 | -0x1.04319d7c69367p+787
774+ insn vfcedbs:
775+@@ -1419,22 +1419,22 @@
776+ v_arg2 = 0x1.ae2c06ea88ff4p+332 | -0x1.f668ce4f8ef9ap+821
777+ r_result = 0000000000000003
778+ insn wfcedbs:
779+- v_result = 0000000000000000 | 0000000000000000
780++ v_result = 0000000000000000 | --
781+ v_arg1 = 0x1.645261bf86b1fp-996 | 0x1.abd13c95397aap+992
782+ v_arg2 = -0x1.ba09e8fc66a8cp+113 | 0x1.75dbfe92c16c4p-786
783+ r_result = 0000000000000003
784+ insn wfcedbs:
785+- v_result = 0000000000000000 | 0000000000000000
786++ v_result = 0000000000000000 | --
787+ v_arg1 = -0x1.d02831d003e7dp+415 | -0x1.611a9dfd10f36p-80
788+ v_arg2 = -0x1.10bda62f4647p+723 | 0x1.cc47af6653378p-614
789+ r_result = 0000000000000003
790+ insn wfcedbs:
791+- v_result = 0000000000000000 | 0000000000000000
792++ v_result = 0000000000000000 | --
793+ v_arg1 = 0x1.f168f32f84178p-321 | -0x1.79a2a0b9549d1p-136
794+ v_arg2 = 0x1.41e19d1cfa692p+11 | -0x1.2a0ed6e7fd517p-453
795+ r_result = 0000000000000003
796+ insn wfcedbs:
797+- v_result = 0000000000000000 | 0000000000000000
798++ v_result = 0000000000000000 | --
799+ v_arg1 = -0x1.76a9144ee26c5p+188 | -0x1.386aaea2d9cddp-542
800+ v_arg2 = 0x1.810fcf222efc4p-999 | -0x1.ce90a9a43e2a1p+80
801+ r_result = 0000000000000003
802+@@ -1455,19 +1455,19 @@
803+ v_arg1 = 0x1.82be31fb88a2dp+946 | -0x1.7ca9e9ff31953p-931
804+ v_arg2 = 0x1.fe75a1052beccp+490 | 0x1.179d18543d678p-255
805+ insn wfchdb:
806+- v_result = ffffffffffffffff | 0000000000000000
807++ v_result = ffffffffffffffff | --
808+ v_arg1 = 0x1.0af85d8d8d609p-464 | -0x1.9f639a686e0fep+203
809+ v_arg2 = -0x1.3142b77b55761p-673 | 0x1.ca9c474339da1p+472
810+ insn wfchdb:
811+- v_result = ffffffffffffffff | 0000000000000000
812++ v_result = ffffffffffffffff | --
813+ v_arg1 = -0x1.6cf16959a022bp+213 | 0x1.445606e4363e1p+942
814+ v_arg2 = -0x1.8c343201bbd2p+939 | -0x1.e5095ad0c37a4p-434
815+ insn wfchdb:
816+- v_result = ffffffffffffffff | 0000000000000000
817++ v_result = ffffffffffffffff | --
818+ v_arg1 = 0x1.36b4fc9cf5bdap-52 | -0x1.f1fd95cbcd533p+540
819+ v_arg2 = 0x1.5a2362891c9edp-175 | -0x1.e1f68c319e5d2p+58
820+ insn wfchdb:
821+- v_result = ffffffffffffffff | 0000000000000000
822++ v_result = ffffffffffffffff | --
823+ v_arg1 = 0x1.11c6489f544bbp+811 | 0x1.262a740ec3d47p+456
824+ v_arg2 = -0x1.d9394d354e989p-154 | 0x1.cc21b3094391ap-972
825+ insn vfchdbs:
826+@@ -1491,22 +1491,22 @@
827+ v_arg2 = 0x1.e426748435a76p+370 | 0x1.8702527d17783p-871
828+ r_result = 0000000000000003
829+ insn wfchdbs:
830+- v_result = ffffffffffffffff | 0000000000000000
831++ v_result = ffffffffffffffff | --
832+ v_arg1 = 0x1.6c51b9f6442c8p+639 | 0x1.1e6b37adff703p+702
833+ v_arg2 = 0x1.0cba9c1c75e43p+520 | -0x1.145d44ed90967p+346
834+ r_result = 0000000000000000
835+ insn wfchdbs:
836+- v_result = ffffffffffffffff | 0000000000000000
837++ v_result = ffffffffffffffff | --
838+ v_arg1 = 0x1.7b3dd643bf36bp+816 | -0x1.61ce7bfb9307ap-683
839+ v_arg2 = -0x1.f2c998dc15c9ap-776 | 0x1.e16397f2dcdf5p+571
840+ r_result = 0000000000000000
841+ insn wfchdbs:
842+- v_result = ffffffffffffffff | 0000000000000000
843++ v_result = ffffffffffffffff | --
844+ v_arg1 = 0x1.cc3be81884e0ap-865 | -0x1.8b353bd41064p+820
845+ v_arg2 = -0x1.2c1bafaafdd4ep-34 | -0x1.24666808ab16ep-435
846+ r_result = 0000000000000000
847+ insn wfchdbs:
848+- v_result = ffffffffffffffff | 0000000000000000
849++ v_result = ffffffffffffffff | --
850+ v_arg1 = 0x1.c3de33d3b673ap+554 | 0x1.d39ed71e53096p-798
851+ v_arg2 = -0x1.c1e8f7b3c001p-828 | 0x1.22e2cf797fabp-787
852+ r_result = 0000000000000000
853+@@ -1527,19 +1527,19 @@
854+ v_arg1 = -0x1.6c5599e7ba923p+829 | -0x1.5d1a1191ed6eap-994
855+ v_arg2 = -0x1.555c8775bc4d2p-478 | -0x1.4aa6a2c82319cp+493
856+ insn wfchedb:
857+- v_result = ffffffffffffffff | 0000000000000000
858++ v_result = ffffffffffffffff | --
859+ v_arg1 = 0x1.ae6cad07b0f3ep-232 | -0x1.2ed61a43f3b99p-74
860+ v_arg2 = -0x1.226f7cddbde13p-902 | -0x1.790d1d6febbf8p+336
861+ insn wfchedb:
862+- v_result = ffffffffffffffff | 0000000000000000
863++ v_result = ffffffffffffffff | --
864+ v_arg1 = 0x1.20eb8eac3711dp-385 | 0x1.ef71d3312d7e1p+739
865+ v_arg2 = 0x1.7a3ba08c5a0bdp-823 | -0x1.a7845ccaa544dp-129
866+ insn wfchedb:
867+- v_result = 0000000000000000 | 0000000000000000
868++ v_result = 0000000000000000 | --
869+ v_arg1 = -0x1.97ebdbc057be8p+824 | 0x1.2b7798b063cd6p+237
870+ v_arg2 = 0x1.cdb87a6074294p-81 | -0x1.074c902b19bccp-416
871+ insn wfchedb:
872+- v_result = 0000000000000000 | 0000000000000000
873++ v_result = 0000000000000000 | --
874+ v_arg1 = -0x1.82deebf9ff023p+937 | 0x1.56c5adcf9d4abp-672
875+ v_arg2 = -0x1.311ce49bc9439p+561 | 0x1.c8e1c512d8544p+103
876+ insn vfchedbs:
877+@@ -1563,22 +1563,22 @@
878+ v_arg2 = -0x1.47f5dfc7a5bcp-569 | 0x1.5877ef33664a3p-758
879+ r_result = 0000000000000003
880+ insn wfchedbs:
881+- v_result = 0000000000000000 | 0000000000000000
882++ v_result = 0000000000000000 | --
883+ v_arg1 = -0x1.a7370ccfd9e49p+505 | 0x1.c6b2385850ca2p-591
884+ v_arg2 = 0x1.984f4fcd338b1p+675 | -0x1.feb996c821232p-39
885+ r_result = 0000000000000003
886+ insn wfchedbs:
887+- v_result = ffffffffffffffff | 0000000000000000
888++ v_result = ffffffffffffffff | --
889+ v_arg1 = 0x1.641878612dd2p+207 | 0x1.b35e3292db7f6p+567
890+ v_arg2 = -0x1.18a87f209e96bp+299 | -0x1.3d598f3612d8ap+1016
891+ r_result = 0000000000000000
892+ insn wfchedbs:
893+- v_result = ffffffffffffffff | 0000000000000000
894++ v_result = ffffffffffffffff | --
895+ v_arg1 = 0x1.cfc2cda244153p+404 | 0x1.d8b2b28e9d8d7p+276
896+ v_arg2 = 0x1.3517b8c7a59a1p-828 | 0x1.6096fab7003ccp-415
897+ r_result = 0000000000000000
898+ insn wfchedbs:
899+- v_result = 0000000000000000 | 0000000000000000
900++ v_result = 0000000000000000 | --
901+ v_arg1 = -0x1.54d656f033e56p-603 | -0x1.95ad0e2088967p+254
902+ v_arg2 = 0x1.4cb319db206e4p-614 | 0x1.b41cd9e3739b6p-862
903+ r_result = 0000000000000003
904+--- a/none/tests/s390x/vector.h
905++++ b/none/tests/s390x/vector.h
906+@@ -86,6 +86,13 @@
907+ printf("%016lx | %016lx\n", value.u64[0], value.u64[1]);
908+ }
909+
910++void print_hex64(const V128 value, int zero_only) {
911++ if (zero_only)
912++ printf("%016lx | --\n", value.u64[0]);
913++ else
914++ printf("%016lx | %016lx\n", value.u64[0], value.u64[1]);
915++}
916++
917+ void print_f32(const V128 value, int even_only, int zero_only) {
918+ if (zero_only)
919+ printf("%a | -- | -- | --\n", value.f32[0]);
920+@@ -222,8 +229,10 @@
921+ {printf(" v_arg2 = "); print_hex(v_arg2);} \
922+ if (info & V128_V_ARG3_AS_INT) \
923+ {printf(" v_arg3 = "); print_hex(v_arg3);} \
924+- if (info & V128_V_RES_AS_INT) \
925+- {printf(" v_result = "); print_hex(v_result);} \
926++ if (info & V128_V_RES_AS_INT) { \
927++ printf(" v_result = "); \
928++ print_hex64(v_result, info & V128_V_RES_ZERO_ONLY); \
929++ } \
930+ \
931+ if (info & V128_V_ARG1_AS_FLOAT64) \
932+ {printf(" v_arg1 = "); print_f64(v_arg1, 0);} \
933+--- a/VEX/priv/guest_s390_defs.h
934++++ b/VEX/priv/guest_s390_defs.h
935+@@ -8,7 +8,7 @@
936+ This file is part of Valgrind, a dynamic binary instrumentation
937+ framework.
938+
939+- Copyright IBM Corp. 2010-2017
940++ Copyright IBM Corp. 2010-2020
941+
942+ This program is free software; you can redistribute it and/or
943+ modify it under the terms of the GNU General Public License as
944+@@ -263,26 +263,27 @@
945+ before S390_VEC_OP_LAST. */
946+ typedef enum {
947+ S390_VEC_OP_INVALID = 0,
948+- S390_VEC_OP_VPKS = 1,
949+- S390_VEC_OP_VPKLS = 2,
950+- S390_VEC_OP_VFAE = 3,
951+- S390_VEC_OP_VFEE = 4,
952+- S390_VEC_OP_VFENE = 5,
953+- S390_VEC_OP_VISTR = 6,
954+- S390_VEC_OP_VSTRC = 7,
955+- S390_VEC_OP_VCEQ = 8,
956+- S390_VEC_OP_VTM = 9,
957+- S390_VEC_OP_VGFM = 10,
958+- S390_VEC_OP_VGFMA = 11,
959+- S390_VEC_OP_VMAH = 12,
960+- S390_VEC_OP_VMALH = 13,
961+- S390_VEC_OP_VCH = 14,
962+- S390_VEC_OP_VCHL = 15,
963+- S390_VEC_OP_VFCE = 16,
964+- S390_VEC_OP_VFCH = 17,
965+- S390_VEC_OP_VFCHE = 18,
966+- S390_VEC_OP_VFTCI = 19,
967+- S390_VEC_OP_LAST = 20 // supposed to be the last element in enum
968++ S390_VEC_OP_VPKS,
969++ S390_VEC_OP_VPKLS,
970++ S390_VEC_OP_VFAE,
971++ S390_VEC_OP_VFEE,
972++ S390_VEC_OP_VFENE,
973++ S390_VEC_OP_VISTR,
974++ S390_VEC_OP_VSTRC,
975++ S390_VEC_OP_VCEQ,
976++ S390_VEC_OP_VTM,
977++ S390_VEC_OP_VGFM,
978++ S390_VEC_OP_VGFMA,
979++ S390_VEC_OP_VMAH,
980++ S390_VEC_OP_VMALH,
981++ S390_VEC_OP_VCH,
982++ S390_VEC_OP_VCHL,
983++ S390_VEC_OP_VFTCI,
984++ S390_VEC_OP_VFMIN,
985++ S390_VEC_OP_VFMAX,
986++ S390_VEC_OP_VBPERM,
987++ S390_VEC_OP_VMSL,
988++ S390_VEC_OP_LAST // supposed to be the last element in enum
989+ } s390x_vec_op_t;
990+
991+ /* Arguments of s390x_dirtyhelper_vec_op(...) which are packed into one
992+--- a/VEX/priv/guest_s390_helpers.c
993++++ b/VEX/priv/guest_s390_helpers.c
994+@@ -8,7 +8,7 @@
995+ This file is part of Valgrind, a dynamic binary instrumentation
996+ framework.
997+
998+- Copyright IBM Corp. 2010-2017
999++ Copyright IBM Corp. 2010-2020
1000+
1001+ This program is free software; you can redistribute it and/or
1002+ modify it under the terms of the GNU General Public License as
1003+@@ -314,20 +314,11 @@
1004+ /*--- Dirty helper for Store Facility instruction ---*/
1005+ /*------------------------------------------------------------*/
1006+ #if defined(VGA_s390x)
1007+-static void
1008+-s390_set_facility_bit(ULong *addr, UInt bitno, UInt value)
1009+-{
1010+- addr += bitno / 64;
1011+- bitno = bitno % 64;
1012+-
1013+- ULong mask = 1;
1014+- mask <<= (63 - bitno);
1015+
1016+- if (value == 1) {
1017+- *addr |= mask; // set
1018+- } else {
1019+- *addr &= ~mask; // clear
1020+- }
1021++static ULong
1022++s390_stfle_range(UInt lo, UInt hi)
1023++{
1024++ return ((1UL << (hi + 1 - lo)) - 1) << (63 - (hi % 64));
1025+ }
1026+
1027+ ULong
1028+@@ -336,6 +327,77 @@
1029+ ULong hoststfle[S390_NUM_FACILITY_DW], cc, num_dw, i;
1030+ register ULong reg0 asm("0") = guest_state->guest_r0 & 0xF; /* r0[56:63] */
1031+
1032++ /* Restrict to facilities that we know about and that we assume to be
1033++ compatible with Valgrind. Of course, in this way we may reject features
1034++ that Valgrind is not really involved in (and thus would be compatible
1035++ with), but querying for such features doesn't seem like a typical use
1036++ case. */
1037++ ULong accepted_facility[S390_NUM_FACILITY_DW] = {
1038++ /* === 0 .. 63 === */
1039++ (s390_stfle_range(0, 16)
1040++ /* 17: message-security-assist, not supported */
1041++ | s390_stfle_range(18, 19)
1042++ /* 20: HFP-multiply-and-add/subtract, not supported */
1043++ | s390_stfle_range(21, 22)
1044++ /* 23: HFP-unnormalized-extension, not supported */
1045++ | s390_stfle_range(24, 25)
1046++ /* 26: parsing-enhancement, not supported */
1047++ | s390_stfle_range(27, 28)
1048++ /* 29: unassigned */
1049++ | s390_stfle_range(30, 30)
1050++ /* 31: extract-CPU-time, not supported */
1051++ | s390_stfle_range(32, 41)
1052++ /* 42-43: DFP, not fully supported */
1053++ /* 44: PFPO, not fully supported */
1054++ | s390_stfle_range(45, 47)
1055++ /* 48: DFP zoned-conversion, not supported */
1056++ /* 49: includes PPA, not supported */
1057++ /* 50: constrained transactional-execution, not supported */
1058++ | s390_stfle_range(51, 55)
1059++ /* 56: unassigned */
1060++ /* 57: MSA5, not supported */
1061++ | s390_stfle_range(58, 60)
1062++ /* 61: miscellaneous-instruction 3, not supported */
1063++ | s390_stfle_range(62, 63)),
1064++
1065++ /* === 64 .. 127 === */
1066++ (s390_stfle_range(64, 72)
1067++ /* 73: transactional-execution, not supported */
1068++ | s390_stfle_range(74, 75)
1069++ /* 76: MSA3, not supported */
1070++ /* 77: MSA4, not supported */
1071++ | s390_stfle_range(78, 78)
1072++ /* 80: DFP packed-conversion, not supported */
1073++ /* 81: PPA-in-order, not supported */
1074++ | s390_stfle_range(82, 82)
1075++ /* 83-127: unassigned */ ),
1076++
1077++ /* === 128 .. 191 === */
1078++ (s390_stfle_range(128, 131)
1079++ /* 132: unassigned */
1080++ /* 133: guarded-storage, not supported */
1081++ /* 134: vector packed decimal, not supported */
1082++ | s390_stfle_range(135, 135)
1083++ /* 136: unassigned */
1084++ /* 137: unassigned */
1085++ | s390_stfle_range(138, 142)
1086++ /* 143: unassigned */
1087++ | s390_stfle_range(144, 145)
1088++ /* 146: MSA8, not supported */
1089++ | s390_stfle_range(147, 147)
1090++ /* 148: vector-enhancements 2, not supported */
1091++ | s390_stfle_range(149, 149)
1092++ /* 150: unassigned */
1093++ /* 151: DEFLATE-conversion, not supported */
1094++ /* 153: unassigned */
1095++ /* 154: unassigned */
1096++ /* 155: MSA9, not supported */
1097++ | s390_stfle_range(156, 156)
1098++ /* 157-167: unassigned */
1099++ | s390_stfle_range(168, 168)
1100++ /* 169-191: unassigned */ ),
1101++ };
1102++
1103+ /* We cannot store more than S390_NUM_FACILITY_DW
1104+ (and it makes not much sense to do so anyhow) */
1105+ if (reg0 > S390_NUM_FACILITY_DW - 1)
1106+@@ -351,35 +413,9 @@
1107+ /* Update guest register 0 with what STFLE set r0 to */
1108+ guest_state->guest_r0 = reg0;
1109+
1110+- /* Set default: VM facilities = host facilities */
1111++ /* VM facilities = host facilities, filtered by acceptance */
1112+ for (i = 0; i < num_dw; ++i)
1113+- addr[i] = hoststfle[i];
1114+-
1115+- /* Now adjust the VM facilities according to what the VM supports */
1116+- s390_set_facility_bit(addr, S390_FAC_LDISP, 1);
1117+- s390_set_facility_bit(addr, S390_FAC_EIMM, 1);
1118+- s390_set_facility_bit(addr, S390_FAC_ETF2, 1);
1119+- s390_set_facility_bit(addr, S390_FAC_ETF3, 1);
1120+- s390_set_facility_bit(addr, S390_FAC_GIE, 1);
1121+- s390_set_facility_bit(addr, S390_FAC_EXEXT, 1);
1122+- s390_set_facility_bit(addr, S390_FAC_HIGHW, 1);
1123+- s390_set_facility_bit(addr, S390_FAC_LSC2, 1);
1124+-
1125+- s390_set_facility_bit(addr, S390_FAC_HFPMAS, 0);
1126+- s390_set_facility_bit(addr, S390_FAC_HFPUNX, 0);
1127+- s390_set_facility_bit(addr, S390_FAC_XCPUT, 0);
1128+- s390_set_facility_bit(addr, S390_FAC_MSA, 0);
1129+- s390_set_facility_bit(addr, S390_FAC_PENH, 0);
1130+- s390_set_facility_bit(addr, S390_FAC_DFP, 0);
1131+- s390_set_facility_bit(addr, S390_FAC_PFPO, 0);
1132+- s390_set_facility_bit(addr, S390_FAC_DFPZC, 0);
1133+- s390_set_facility_bit(addr, S390_FAC_MISC, 0);
1134+- s390_set_facility_bit(addr, S390_FAC_CTREXE, 0);
1135+- s390_set_facility_bit(addr, S390_FAC_TREXE, 0);
1136+- s390_set_facility_bit(addr, S390_FAC_MSA4, 0);
1137+- s390_set_facility_bit(addr, S390_FAC_VXE, 0);
1138+- s390_set_facility_bit(addr, S390_FAC_VXE2, 0);
1139+- s390_set_facility_bit(addr, S390_FAC_DFLT, 0);
1140++ addr[i] = hoststfle[i] & accepted_facility[i];
1141+
1142+ return cc;
1143+ }
1144+@@ -2500,25 +2536,26 @@
1145+ vassert(d->op > S390_VEC_OP_INVALID && d->op < S390_VEC_OP_LAST);
1146+ static const UChar opcodes[][2] = {
1147+ {0x00, 0x00}, /* invalid */
1148+- {0xe7, 0x97}, /* VPKS */
1149+- {0xe7, 0x95}, /* VPKLS */
1150+- {0xe7, 0x82}, /* VFAE */
1151+- {0xe7, 0x80}, /* VFEE */
1152+- {0xe7, 0x81}, /* VFENE */
1153+- {0xe7, 0x5c}, /* VISTR */
1154+- {0xe7, 0x8a}, /* VSTRC */
1155+- {0xe7, 0xf8}, /* VCEQ */
1156+- {0xe7, 0xd8}, /* VTM */
1157+- {0xe7, 0xb4}, /* VGFM */
1158+- {0xe7, 0xbc}, /* VGFMA */
1159+- {0xe7, 0xab}, /* VMAH */
1160+- {0xe7, 0xa9}, /* VMALH */
1161+- {0xe7, 0xfb}, /* VCH */
1162+- {0xe7, 0xf9}, /* VCHL */
1163+- {0xe7, 0xe8}, /* VFCE */
1164+- {0xe7, 0xeb}, /* VFCH */
1165+- {0xe7, 0xea}, /* VFCHE */
1166+- {0xe7, 0x4a} /* VFTCI */
1167++ [S390_VEC_OP_VPKS] = {0xe7, 0x97},
1168++ [S390_VEC_OP_VPKLS] = {0xe7, 0x95},
1169++ [S390_VEC_OP_VFAE] = {0xe7, 0x82},
1170++ [S390_VEC_OP_VFEE] = {0xe7, 0x80},
1171++ [S390_VEC_OP_VFENE] = {0xe7, 0x81},
1172++ [S390_VEC_OP_VISTR] = {0xe7, 0x5c},
1173++ [S390_VEC_OP_VSTRC] = {0xe7, 0x8a},
1174++ [S390_VEC_OP_VCEQ] = {0xe7, 0xf8},
1175++ [S390_VEC_OP_VTM] = {0xe7, 0xd8},
1176++ [S390_VEC_OP_VGFM] = {0xe7, 0xb4},
1177++ [S390_VEC_OP_VGFMA] = {0xe7, 0xbc},
1178++ [S390_VEC_OP_VMAH] = {0xe7, 0xab},
1179++ [S390_VEC_OP_VMALH] = {0xe7, 0xa9},
1180++ [S390_VEC_OP_VCH] = {0xe7, 0xfb},
1181++ [S390_VEC_OP_VCHL] = {0xe7, 0xf9},
1182++ [S390_VEC_OP_VFTCI] = {0xe7, 0x4a},
1183++ [S390_VEC_OP_VFMIN] = {0xe7, 0xee},
1184++ [S390_VEC_OP_VFMAX] = {0xe7, 0xef},
1185++ [S390_VEC_OP_VBPERM]= {0xe7, 0x85},
1186++ [S390_VEC_OP_VMSL] = {0xe7, 0xb8},
1187+ };
1188+
1189+ union {
1190+@@ -2612,6 +2649,7 @@
1191+ case S390_VEC_OP_VGFMA:
1192+ case S390_VEC_OP_VMAH:
1193+ case S390_VEC_OP_VMALH:
1194++ case S390_VEC_OP_VMSL:
1195+ the_insn.VRRd.v1 = 1;
1196+ the_insn.VRRd.v2 = 2;
1197+ the_insn.VRRd.v3 = 3;
1198+@@ -2621,9 +2659,9 @@
1199+ the_insn.VRRd.m6 = d->m5;
1200+ break;
1201+
1202+- case S390_VEC_OP_VFCE:
1203+- case S390_VEC_OP_VFCH:
1204+- case S390_VEC_OP_VFCHE:
1205++ case S390_VEC_OP_VFMIN:
1206++ case S390_VEC_OP_VFMAX:
1207++ case S390_VEC_OP_VBPERM:
1208+ the_insn.VRRc.v1 = 1;
1209+ the_insn.VRRc.v2 = 2;
1210+ the_insn.VRRc.v3 = 3;
1211+--- a/VEX/priv/guest_s390_toIR.c
1212++++ b/VEX/priv/guest_s390_toIR.c
1213+@@ -8,7 +8,7 @@
1214+ This file is part of Valgrind, a dynamic binary instrumentation
1215+ framework.
1216+
1217+- Copyright IBM Corp. 2010-2017
1218++ Copyright IBM Corp. 2010-2020
1219+
1220+ This program is free software; you can redistribute it and/or
1221+ modify it under the terms of the GNU General Public License as
1222+@@ -248,6 +248,13 @@
1223+ #define VRS_d2(insn) (((insn) >> 32) & 0xfff)
1224+ #define VRS_m4(insn) (((insn) >> 28) & 0xf)
1225+ #define VRS_rxb(insn) (((insn) >> 24) & 0xf)
1226++#define VRSd_v1(insn) (((insn) >> 28) & 0xf)
1227++#define VRSd_r3(insn) (((insn) >> 48) & 0xf)
1228++#define VSI_i3(insn) (((insn) >> 48) & 0xff)
1229++#define VSI_b2(insn) (((insn) >> 44) & 0xf)
1230++#define VSI_d2(insn) (((insn) >> 32) & 0xfff)
1231++#define VSI_v1(insn) (((insn) >> 28) & 0xf)
1232++#define VSI_rxb(insn) (((insn) >> 24) & 0xf)
1233+
1234+
1235+ /*------------------------------------------------------------*/
1236+@@ -1937,6 +1944,26 @@
1237+ return results[m];
1238+ }
1239+
1240++/* Determine IRType from instruction's floating-point format field */
1241++static IRType
1242++s390_vr_get_ftype(const UChar m)
1243++{
1244++ static const IRType results[] = {Ity_F32, Ity_F64, Ity_F128};
1245++ if (m >= 2 && m <= 4)
1246++ return results[m - 2];
1247++ return Ity_INVALID;
1248++}
1249++
1250++/* Determine number of elements from instruction's floating-point format
1251++ field */
1252++static UChar
1253++s390_vr_get_n_elem(const UChar m)
1254++{
1255++ if (m >= 2 && m <= 4)
1256++ return 1 << (4 - m);
1257++ return 0;
1258++}
1259++
1260+ /* Determine if Condition Code Set (CS) flag is set in m field */
1261+ #define s390_vr_is_cs_set(m) (((m) & 0x1) != 0)
1262+
1263+@@ -2191,12 +2218,15 @@
1264+ goto invalidIndex;
1265+ }
1266+ return vr_offset(archreg) + sizeof(ULong) * index;
1267++
1268+ case Ity_V128:
1269++ case Ity_F128:
1270+ if(index == 0) {
1271+ return vr_qw_offset(archreg);
1272+ } else {
1273+ goto invalidIndex;
1274+ }
1275++
1276+ default:
1277+ vpanic("s390_vr_offset_by_index: unknown type");
1278+ }
1279+@@ -2214,7 +2244,14 @@
1280+ UInt offset = s390_vr_offset_by_index(archreg, type, index);
1281+ vassert(typeOfIRExpr(irsb->tyenv, expr) == type);
1282+
1283+- stmt(IRStmt_Put(offset, expr));
1284++ if (type == Ity_F128) {
1285++ IRTemp val = newTemp(Ity_F128);
1286++ assign(val, expr);
1287++ stmt(IRStmt_Put(offset, unop(Iop_F128HItoF64, mkexpr(val))));
1288++ stmt(IRStmt_Put(offset + 8, unop(Iop_F128LOtoF64, mkexpr(val))));
1289++ } else {
1290++ stmt(IRStmt_Put(offset, expr));
1291++ }
1292+ }
1293+
1294+ /* Read type sized part specified by index of a vr register. */
1295+@@ -2222,6 +2259,11 @@
1296+ get_vr(UInt archreg, IRType type, UChar index)
1297+ {
1298+ UInt offset = s390_vr_offset_by_index(archreg, type, index);
1299++ if (type == Ity_F128) {
1300++ return binop(Iop_F64HLtoF128,
1301++ IRExpr_Get(offset, Ity_F64),
1302++ IRExpr_Get(offset + 8, Ity_F64));
1303++ }
1304+ return IRExpr_Get(offset, type);
1305+ }
1306+
1307+@@ -2297,11 +2339,11 @@
1308+ return mkexpr(output);
1309+ }
1310+
1311+-/* Load bytes into v1.
1312+- maxIndex specifies max index to load and must be Ity_I32.
1313+- If maxIndex >= 15, all 16 bytes are loaded.
1314+- All bytes after maxIndex are zeroed. */
1315+-static void s390_vr_loadWithLength(UChar v1, IRTemp addr, IRExpr *maxIndex)
1316++/* Starting from addr, load at most maxIndex + 1 bytes into v1. Fill the
1317++ leftmost or rightmost bytes of v1, depending on whether `rightmost' is set.
1318++ If maxIndex >= 15, load all 16 bytes; otherwise clear the remaining bytes. */
1319++static void
1320++s390_vr_loadWithLength(UChar v1, IRTemp addr, IRExpr *maxIndex, Bool rightmost)
1321+ {
1322+ IRTemp maxIdx = newTemp(Ity_I32);
1323+ IRTemp cappedMax = newTemp(Ity_I64);
1324+@@ -2314,8 +2356,8 @@
1325+ crossed if and only if the real insn would have crossed it as well.
1326+ Thus, if the bytes to load are fully contained in an aligned 16-byte
1327+ chunk, load the whole 16-byte aligned chunk, and otherwise load 16 bytes
1328+- from the unaligned address. Then shift the loaded data left-aligned
1329+- into the target vector register. */
1330++ from the unaligned address. Then shift the loaded data left- or
1331++ right-aligned into the target vector register. */
1332+
1333+ assign(maxIdx, maxIndex);
1334+ assign(cappedMax, mkite(binop(Iop_CmpLT32U, mkexpr(maxIdx), mkU32(15)),
1335+@@ -2328,20 +2370,60 @@
1336+ assign(back, mkite(binop(Iop_CmpLE64U, mkexpr(offset), mkexpr(zeroed)),
1337+ mkexpr(offset), mkU64(0)));
1338+
1339+- /* How much to shift the loaded 16-byte vector to the right, and then to
1340+- the left. Since both 'zeroed' and 'back' range from 0 to 15, the shift
1341+- amounts range from 0 to 120. */
1342+- IRExpr *shrAmount = binop(Iop_Shl64,
1343+- binop(Iop_Sub64, mkexpr(zeroed), mkexpr(back)),
1344+- mkU8(3));
1345+- IRExpr *shlAmount = binop(Iop_Shl64, mkexpr(zeroed), mkU8(3));
1346+-
1347+- put_vr_qw(v1, binop(Iop_ShlV128,
1348+- binop(Iop_ShrV128,
1349+- load(Ity_V128,
1350+- binop(Iop_Sub64, mkexpr(addr), mkexpr(back))),
1351+- unop(Iop_64to8, shrAmount)),
1352+- unop(Iop_64to8, shlAmount)));
1353++ IRExpr* chunk = load(Ity_V128, binop(Iop_Sub64, mkexpr(addr), mkexpr(back)));
1354++
1355++ /* Shift the loaded 16-byte vector to the right, then to the left, or vice
1356++ versa, where each shift amount ranges from 0 to 120. */
1357++ IRExpr* shift1;
1358++ IRExpr* shift2 = unop(Iop_64to8, binop(Iop_Shl64, mkexpr(zeroed), mkU8(3)));
1359++
1360++ if (rightmost) {
1361++ shift1 = unop(Iop_64to8, binop(Iop_Shl64, mkexpr(back), mkU8(3)));
1362++ put_vr_qw(v1, binop(Iop_ShrV128,
1363++ binop(Iop_ShlV128, chunk, shift1),
1364++ shift2));
1365++ } else {
1366++ shift1 = unop(Iop_64to8,
1367++ binop(Iop_Shl64,
1368++ binop(Iop_Sub64, mkexpr(zeroed), mkexpr(back)),
1369++ mkU8(3)));
1370++ put_vr_qw(v1, binop(Iop_ShlV128,
1371++ binop(Iop_ShrV128, chunk, shift1),
1372++ shift2));
1373++ }
1374++}
1375++
1376++/* Store at most maxIndex + 1 bytes from v1 to addr. Store the leftmost or
1377++ rightmost bytes of v1, depending on whether `rightmost' is set. If maxIndex
1378++ >= 15, store all 16 bytes. */
1379++static void
1380++s390_vr_storeWithLength(UChar v1, IRTemp addr, IRExpr *maxIndex, Bool rightmost)
1381++{
1382++ IRTemp maxIdx = newTemp(Ity_I32);
1383++ IRTemp cappedMax = newTemp(Ity_I64);
1384++ IRTemp counter = newTemp(Ity_I64);
1385++ IRExpr* offset;
1386++
1387++ assign(maxIdx, maxIndex);
1388++ assign(cappedMax, mkite(binop(Iop_CmpLT32U, mkexpr(maxIdx), mkU32(15)),
1389++ unop(Iop_32Uto64, mkexpr(maxIdx)), mkU64(15)));
1390++
1391++ assign(counter, get_counter_dw0());
1392++
1393++ if (rightmost)
1394++ offset = binop(Iop_Add64,
1395++ binop(Iop_Sub64, mkU64(15), mkexpr(cappedMax)),
1396++ mkexpr(counter));
1397++ else
1398++ offset = mkexpr(counter);
1399++
1400++ store(binop(Iop_Add64, mkexpr(addr), mkexpr(counter)),
1401++ binop(Iop_GetElem8x16, get_vr_qw(v1), unop(Iop_64to8, offset)));
1402++
1403++ /* Check for end of field */
1404++ put_counter_dw0(binop(Iop_Add64, mkexpr(counter), mkU64(1)));
1405++ iterate_if(binop(Iop_CmpNE64, mkexpr(counter), mkexpr(cappedMax)));
1406++ put_counter_dw0(mkU64(0));
1407+ }
1408+
1409+ /* Bitwise vCond ? v1 : v2
1410+@@ -3752,6 +3834,28 @@
1411+ s390_disasm(ENC5(MNM, GPR, UDXB, VR, UINT), mnm, r1, d2, 0, b2, v3, m4);
1412+ }
1413+
1414++static void
1415++s390_format_VRS_RRDV(const HChar *(*irgen)(UChar v1, UChar r3, IRTemp op2addr),
1416++ UChar v1, UChar r3, UChar b2, UShort d2, UChar rxb)
1417++{
1418++ const HChar *mnm;
1419++ IRTemp op2addr = newTemp(Ity_I64);
1420++
1421++ if (! s390_host_has_vx) {
1422++ emulation_failure(EmFail_S390X_vx);
1423++ return;
1424++ }
1425++
1426++ assign(op2addr, binop(Iop_Add64, mkU64(d2), b2 != 0 ? get_gpr_dw0(b2) :
1427++ mkU64(0)));
1428++
1429++ v1 = s390_vr_getVRindex(v1, 4, rxb);
1430++ mnm = irgen(v1, r3, op2addr);
1431++
1432++ if (UNLIKELY(vex_traceflags & VEX_TRACE_FE))
1433++ s390_disasm(ENC4(MNM, VR, GPR, UDXB), mnm, v1, r3, d2, 0, b2);
1434++}
1435++
1436+
1437+ static void
1438+ s390_format_VRS_VRDVM(const HChar *(*irgen)(UChar v1, IRTemp op2addr, UChar v3,
1439+@@ -4084,6 +4188,29 @@
1440+ mnm, v1, v2, v3, m4, m5, m6);
1441+ }
1442+
1443++static void
1444++s390_format_VSI_URDV(const HChar *(*irgen)(UChar v1, IRTemp op2addr, UChar i3),
1445+ UChar v1, UChar b2, UShort d2, UChar i3, UChar rxb)
1446++{
1447++ const HChar *mnm;
1448++ IRTemp op2addr = newTemp(Ity_I64);
1449++
1450++ if (!s390_host_has_vx) {
1451++ emulation_failure(EmFail_S390X_vx);
1452++ return;
1453++ }
1454++
1455++ v1 = s390_vr_getVRindex(v1, 4, rxb);
1456++
1457++ assign(op2addr, binop(Iop_Add64, mkU64(d2), b2 != 0 ? get_gpr_dw0(b2) :
1458++ mkU64(0)));
1459++
1460++ mnm = irgen(v1, op2addr, i3);
1461++
1462++ if (vex_traceflags & VEX_TRACE_FE)
1463++ s390_disasm(ENC4(MNM, VR, UDXB, UINT), mnm, v1, d2, 0, b2, i3);
1464++}
1465++
1466+ /*------------------------------------------------------------*/
1467+ /*--- Build IR for opcodes ---*/
1468+ /*------------------------------------------------------------*/
1469+@@ -16183,7 +16310,9 @@
1470+ static const HChar *
1471+ s390_irgen_VLLEZ(UChar v1, IRTemp op2addr, UChar m3)
1472+ {
1473+- IRType type = s390_vr_get_type(m3);
1474++ s390_insn_assert("vllez", m3 <= 3 || m3 == 6);
1475++
1476++ IRType type = s390_vr_get_type(m3 & 3);
1477+ IRExpr* op2 = load(type, mkexpr(op2addr));
1478+ IRExpr* op2as64bit;
1479+ switch (type) {
1480+@@ -16203,7 +16332,13 @@
1481+ vpanic("s390_irgen_VLLEZ: unknown type");
1482+ }
1483+
1484+- put_vr_dw0(v1, op2as64bit);
1485++ if (m3 == 6) {
1486++ /* left-aligned */
1487++ put_vr_dw0(v1, binop(Iop_Shl64, op2as64bit, mkU8(32)));
1488++ } else {
1489++ /* right-aligned */
1490++ put_vr_dw0(v1, op2as64bit);
1491++ }
1492+ put_vr_dw1(v1, mkU64(0));
1493+ return "vllez";
1494+ }
1495+@@ -16612,7 +16747,7 @@
1496+ s390_getCountToBlockBoundary(addr, m3),
1497+ mkU32(1));
1498+
1499+- s390_vr_loadWithLength(v1, addr, maxIndex);
1500++ s390_vr_loadWithLength(v1, addr, maxIndex, False);
1501+
1502+ return "vlbb";
1503+ }
1504+@@ -16620,42 +16755,51 @@
1505+ static const HChar *
1506+ s390_irgen_VLL(UChar v1, IRTemp addr, UChar r3)
1507+ {
1508+- s390_vr_loadWithLength(v1, addr, get_gpr_w1(r3));
1509++ s390_vr_loadWithLength(v1, addr, get_gpr_w1(r3), False);
1510+
1511+ return "vll";
1512+ }
1513+
1514+ static const HChar *
1515+-s390_irgen_VSTL(UChar v1, IRTemp addr, UChar r3)
1516++s390_irgen_VLRL(UChar v1, IRTemp addr, UChar i3)
1517+ {
1518+- IRTemp counter = newTemp(Ity_I64);
1519+- IRTemp maxIndexToStore = newTemp(Ity_I64);
1520+- IRTemp gpr3 = newTemp(Ity_I64);
1521++ s390_insn_assert("vlrl", (i3 & 0xf0) == 0);
1522++ s390_vr_loadWithLength(v1, addr, mkU32((UInt) i3), True);
1523+
1524+- assign(gpr3, unop(Iop_32Uto64, get_gpr_w1(r3)));
1525+- assign(maxIndexToStore, mkite(binop(Iop_CmpLE64U,
1526+- mkexpr(gpr3),
1527+- mkU64(16)
1528+- ),
1529+- mkexpr(gpr3),
1530+- mkU64(16)
1531+- )
1532+- );
1533+-
1534+- assign(counter, get_counter_dw0());
1535++ return "vlrl";
1536++}
1537+
1538+- store(binop(Iop_Add64, mkexpr(addr), mkexpr(counter)),
1539+- binop(Iop_GetElem8x16, get_vr_qw(v1), unop(Iop_64to8, mkexpr(counter))));
1540++static const HChar *
1541++s390_irgen_VLRLR(UChar v1, UChar r3, IRTemp addr)
1542++{
1543++ s390_vr_loadWithLength(v1, addr, get_gpr_w1(r3), True);
1544+
1545+- /* Check for end of field */
1546+- put_counter_dw0(binop(Iop_Add64, mkexpr(counter), mkU64(1)));
1547+- iterate_if(binop(Iop_CmpNE64, mkexpr(counter), mkexpr(maxIndexToStore)));
1548+- put_counter_dw0(mkU64(0));
1549++ return "vlrlr";
1550++}
1551+
1552++static const HChar *
1553++s390_irgen_VSTL(UChar v1, IRTemp addr, UChar r3)
1554++{
1555++ s390_vr_storeWithLength(v1, addr, get_gpr_w1(r3), False);
1556+ return "vstl";
1557+ }
1558+
1559+ static const HChar *
1560++s390_irgen_VSTRL(UChar v1, IRTemp addr, UChar i3)
1561++{
1562++ s390_insn_assert("vstrl", (i3 & 0xf0) == 0);
1563++ s390_vr_storeWithLength(v1, addr, mkU32((UInt) i3), True);
1564++ return "vstrl";
1565++}
1566++
1567++static const HChar *
1568++s390_irgen_VSTRLR(UChar v1, UChar r3, IRTemp addr)
1569++{
1570++ s390_vr_storeWithLength(v1, addr, get_gpr_w1(r3), True);
1571++ return "vstrlr";
1572++}
1573++
1574++static const HChar *
1575+ s390_irgen_VX(UChar v1, UChar v2, UChar v3)
1576+ {
1577+ put_vr_qw(v1, binop(Iop_XorV128, get_vr_qw(v2), get_vr_qw(v3)));
1578+@@ -16680,6 +16824,24 @@
1579+ }
1580+
1581+ static const HChar *
1582++s390_irgen_VOC(UChar v1, UChar v2, UChar v3)
1583++{
1584++ put_vr_qw(v1, binop(Iop_OrV128, get_vr_qw(v2),
1585++ unop(Iop_NotV128, get_vr_qw(v3))));
1586++
1587++ return "voc";
1588++}
1589++
1590++static const HChar *
1591++s390_irgen_VNN(UChar v1, UChar v2, UChar v3)
1592++{
1593++ put_vr_qw(v1, unop(Iop_NotV128,
1594++ binop(Iop_AndV128, get_vr_qw(v2), get_vr_qw(v3))));
1595++
1596++ return "vnn";
1597++}
1598++
1599++static const HChar *
1600+ s390_irgen_VNO(UChar v1, UChar v2, UChar v3)
1601+ {
1602+ put_vr_qw(v1, unop(Iop_NotV128,
1603+@@ -16689,6 +16851,15 @@
1604+ }
1605+
1606+ static const HChar *
1607++s390_irgen_VNX(UChar v1, UChar v2, UChar v3)
1608++{
1609++ put_vr_qw(v1, unop(Iop_NotV128,
1610++ binop(Iop_XorV128, get_vr_qw(v2), get_vr_qw(v3))));
1611++
1612++ return "vnx";
1613++}
1614++
1615++static const HChar *
1616+ s390_irgen_LZRF(UChar r1, IRTemp op2addr)
1617+ {
1618+ IRTemp op2 = newTemp(Ity_I32);
1619+@@ -17496,9 +17667,19 @@
1620+ static const HChar *
1621+ s390_irgen_VPOPCT(UChar v1, UChar v2, UChar m3)
1622+ {
1623+- vassert(m3 == 0);
1624++ s390_insn_assert("vpopct", m3 <= 3);
1625+
1626+- put_vr_qw(v1, unop(Iop_Cnt8x16, get_vr_qw(v2)));
1627++ IRExpr* cnt = unop(Iop_Cnt8x16, get_vr_qw(v2));
1628++
1629++ if (m3 >= 1) {
1630++ cnt = unop(Iop_PwAddL8Ux16, cnt);
1631++ if (m3 >= 2) {
1632++ cnt = unop(Iop_PwAddL16Ux8, cnt);
1633++ if (m3 == 3)
1634++ cnt = unop(Iop_PwAddL32Ux4, cnt);
1635++ }
1636++ }
1637++ put_vr_qw(v1, cnt);
1638+
1639+ return "vpopct";
1640+ }
1641+@@ -18332,12 +18513,53 @@
1642+ return "vmalh";
1643+ }
1644+
1645++static const HChar *
1646++s390_irgen_VMSL(UChar v1, UChar v2, UChar v3, UChar v4, UChar m5, UChar m6)
1647++{
1648++ s390_insn_assert("vmsl", m5 == 3 && (m6 & 3) == 0);
1649++
1650++ IRDirty* d;
1651++ IRTemp cc = newTemp(Ity_I64);
1652++
1653++ s390x_vec_op_details_t details = { .serialized = 0ULL };
1654++ details.op = S390_VEC_OP_VMSL;
1655++ details.v1 = v1;
1656++ details.v2 = v2;
1657++ details.v3 = v3;
1658++ details.v4 = v4;
1659++ details.m4 = m5;
1660++ details.m5 = m6;
1661++
1662++ d = unsafeIRDirty_1_N(cc, 0, "s390x_dirtyhelper_vec_op",
1663++ &s390x_dirtyhelper_vec_op,
1664++ mkIRExprVec_2(IRExpr_GSPTR(),
1665++ mkU64(details.serialized)));
1666++
1667++ d->nFxState = 4;
1668++ vex_bzero(&d->fxState, sizeof(d->fxState));
1669++ d->fxState[0].fx = Ifx_Read;
1670++ d->fxState[0].offset = S390X_GUEST_OFFSET(guest_v0) + v2 * sizeof(V128);
1671++ d->fxState[0].size = sizeof(V128);
1672++ d->fxState[1].fx = Ifx_Read;
1673++ d->fxState[1].offset = S390X_GUEST_OFFSET(guest_v0) + v3 * sizeof(V128);
1674++ d->fxState[1].size = sizeof(V128);
1675++ d->fxState[2].fx = Ifx_Read;
1676++ d->fxState[2].offset = S390X_GUEST_OFFSET(guest_v0) + v4 * sizeof(V128);
1677++ d->fxState[2].size = sizeof(V128);
1678++ d->fxState[3].fx = Ifx_Write;
1679++ d->fxState[3].offset = S390X_GUEST_OFFSET(guest_v0) + v1 * sizeof(V128);
1680++ d->fxState[3].size = sizeof(V128);
1681++
1682++ stmt(IRStmt_Dirty(d));
1683++
1684++ return "vmsl";
1685++}
1686++
1687+ static void
1688+-s390_vector_fp_convert(IROp op, IRType fromType, IRType toType,
1689++s390_vector_fp_convert(IROp op, IRType fromType, IRType toType, Bool rounding,
1690+ UChar v1, UChar v2, UChar m3, UChar m4, UChar m5)
1691+ {
1692+ Bool isSingleElementOp = s390_vr_is_single_element_control_set(m4);
1693+- UChar maxIndex = isSingleElementOp ? 0 : 1;
1694+
1695+ /* For Iop_F32toF64 we do this:
1696+ f32[0] -> f64[0]
1697+@@ -18350,14 +18572,21 @@
1698+ The magic below with scaling factors is used to achieve the logic
1699+ described above.
1700+ */
1701+- const UChar sourceIndexScaleFactor = (op == Iop_F32toF64) ? 2 : 1;
1702+- const UChar destinationIndexScaleFactor = (op == Iop_F64toF32) ? 2 : 1;
1703++ Int size_diff = sizeofIRType(toType) - sizeofIRType(fromType);
1704++ const UChar sourceIndexScaleFactor = size_diff > 0 ? 2 : 1;
1705++ const UChar destinationIndexScaleFactor = size_diff < 0 ? 2 : 1;
1706++ UChar n_elem = (isSingleElementOp ? 1 :
1707++ 16 / (size_diff > 0 ?
1708++ sizeofIRType(toType) : sizeofIRType(fromType)));
1709+
1710+- const Bool isUnary = (op == Iop_F32toF64);
1711+- for (UChar i = 0; i <= maxIndex; i++) {
1712++ for (UChar i = 0; i < n_elem; i++) {
1713+ IRExpr* argument = get_vr(v2, fromType, i * sourceIndexScaleFactor);
1714+ IRExpr* result;
1715+- if (!isUnary) {
1716++ if (rounding) {
1717++ if (!s390_host_has_fpext && m5 != S390_BFP_ROUND_PER_FPC) {
1718++ emulation_warning(EmWarn_S390X_fpext_rounding);
1719++ m5 = S390_BFP_ROUND_PER_FPC;
1720++ }
1721+ result = binop(op,
1722+ mkexpr(encode_bfp_rounding_mode(m5)),
1723+ argument);
1724+@@ -18366,10 +18595,6 @@
1725+ }
1726+ put_vr(v1, toType, i * destinationIndexScaleFactor, result);
1727+ }
1728+-
1729+- if (isSingleElementOp) {
1730+- put_vr_dw1(v1, mkU64(0));
1731+- }
1732+ }
1733+
1734+ static const HChar *
1735+@@ -18377,12 +18602,8 @@
1736+ {
1737+ s390_insn_assert("vcdg", m3 == 3);
1738+
1739+- if (!s390_host_has_fpext && m5 != S390_BFP_ROUND_PER_FPC) {
1740+- emulation_warning(EmWarn_S390X_fpext_rounding);
1741+- m5 = S390_BFP_ROUND_PER_FPC;
1742+- }
1743+-
1744+- s390_vector_fp_convert(Iop_I64StoF64, Ity_I64, Ity_F64, v1, v2, m3, m4, m5);
1745++ s390_vector_fp_convert(Iop_I64StoF64, Ity_I64, Ity_F64, True,
1746++ v1, v2, m3, m4, m5);
1747+
1748+ return "vcdg";
1749+ }
1750+@@ -18392,12 +18613,8 @@
1751+ {
1752+ s390_insn_assert("vcdlg", m3 == 3);
1753+
1754+- if (!s390_host_has_fpext && m5 != S390_BFP_ROUND_PER_FPC) {
1755+- emulation_warning(EmWarn_S390X_fpext_rounding);
1756+- m5 = S390_BFP_ROUND_PER_FPC;
1757+- }
1758+-
1759+- s390_vector_fp_convert(Iop_I64UtoF64, Ity_I64, Ity_F64, v1, v2, m3, m4, m5);
1760++ s390_vector_fp_convert(Iop_I64UtoF64, Ity_I64, Ity_F64, True,
1761++ v1, v2, m3, m4, m5);
1762+
1763+ return "vcdlg";
1764+ }
1765+@@ -18407,12 +18624,8 @@
1766+ {
1767+ s390_insn_assert("vcgd", m3 == 3);
1768+
1769+- if (!s390_host_has_fpext && m5 != S390_BFP_ROUND_PER_FPC) {
1770+- emulation_warning(EmWarn_S390X_fpext_rounding);
1771+- m5 = S390_BFP_ROUND_PER_FPC;
1772+- }
1773+-
1774+- s390_vector_fp_convert(Iop_F64toI64S, Ity_F64, Ity_I64, v1, v2, m3, m4, m5);
1775++ s390_vector_fp_convert(Iop_F64toI64S, Ity_F64, Ity_I64, True,
1776++ v1, v2, m3, m4, m5);
1777+
1778+ return "vcgd";
1779+ }
1780+@@ -18422,12 +18635,8 @@
1781+ {
1782+ s390_insn_assert("vclgd", m3 == 3);
1783+
1784+- if (!s390_host_has_fpext && m5 != S390_BFP_ROUND_PER_FPC) {
1785+- emulation_warning(EmWarn_S390X_fpext_rounding);
1786+- m5 = S390_BFP_ROUND_PER_FPC;
1787+- }
1788+-
1789+- s390_vector_fp_convert(Iop_F64toI64U, Ity_F64, Ity_I64, v1, v2, m3, m4, m5);
1790++ s390_vector_fp_convert(Iop_F64toI64U, Ity_F64, Ity_I64, True,
1791++ v1, v2, m3, m4, m5);
1792+
1793+ return "vclgd";
1794+ }
1795+@@ -18435,246 +18644,262 @@
1796+ static const HChar *
1797+ s390_irgen_VFI(UChar v1, UChar v2, UChar m3, UChar m4, UChar m5)
1798+ {
1799+- s390_insn_assert("vfi", m3 == 3);
1800++ s390_insn_assert("vfi",
1801++ (m3 == 3 || (s390_host_has_vxe && m3 >= 2 && m3 <= 4)));
1802+
1803+- if (!s390_host_has_fpext && m5 != S390_BFP_ROUND_PER_FPC) {
1804+- emulation_warning(EmWarn_S390X_fpext_rounding);
1805+- m5 = S390_BFP_ROUND_PER_FPC;
1806++ switch (m3) {
1807++ case 2: s390_vector_fp_convert(Iop_RoundF32toInt, Ity_F32, Ity_F32, True,
1808++ v1, v2, m3, m4, m5); break;
1809++ case 3: s390_vector_fp_convert(Iop_RoundF64toInt, Ity_F64, Ity_F64, True,
1810++ v1, v2, m3, m4, m5); break;
1811++ case 4: s390_vector_fp_convert(Iop_RoundF128toInt, Ity_F128, Ity_F128, True,
1812++ v1, v2, m3, m4, m5); break;
1813+ }
1814+
1815+- s390_vector_fp_convert(Iop_RoundF64toInt, Ity_F64, Ity_F64,
1816+- v1, v2, m3, m4, m5);
1817+-
1818+- return "vcgld";
1819++ return "vfi";
1820+ }
1821+
1822+ static const HChar *
1823+-s390_irgen_VLDE(UChar v1, UChar v2, UChar m3, UChar m4, UChar m5)
1824++s390_irgen_VFLL(UChar v1, UChar v2, UChar m3, UChar m4, UChar m5)
1825+ {
1826+- s390_insn_assert("vlde", m3 == 2);
1827++ s390_insn_assert("vfll", m3 == 2 || (s390_host_has_vxe && m3 == 3));
1828+
1829+- s390_vector_fp_convert(Iop_F32toF64, Ity_F32, Ity_F64, v1, v2, m3, m4, m5);
1830++ if (m3 == 2)
1831++ s390_vector_fp_convert(Iop_F32toF64, Ity_F32, Ity_F64, False,
1832++ v1, v2, m3, m4, m5);
1833++ else
1834++ s390_vector_fp_convert(Iop_F64toF128, Ity_F64, Ity_F128, False,
1835++ v1, v2, m3, m4, m5);
1836+
1837+- return "vlde";
1838++ return "vfll";
1839+ }
1840+
1841+ static const HChar *
1842+-s390_irgen_VLED(UChar v1, UChar v2, UChar m3, UChar m4, UChar m5)
1843++s390_irgen_VFLR(UChar v1, UChar v2, UChar m3, UChar m4, UChar m5)
1844+ {
1845+- s390_insn_assert("vled", m3 == 3);
1846++ s390_insn_assert("vflr", m3 == 3 || (s390_host_has_vxe && m3 == 2));
1847+
1848+- if (!s390_host_has_fpext && m5 != S390_BFP_ROUND_PER_FPC) {
1849+- m5 = S390_BFP_ROUND_PER_FPC;
1850+- }
1851+-
1852+- s390_vector_fp_convert(Iop_F64toF32, Ity_F64, Ity_F32, v1, v2, m3, m4, m5);
1853++ if (m3 == 3)
1854++ s390_vector_fp_convert(Iop_F64toF32, Ity_F64, Ity_F32, True,
1855++ v1, v2, m3, m4, m5);
1856++ else
1857++ s390_vector_fp_convert(Iop_F128toF64, Ity_F128, Ity_F64, True,
1858++ v1, v2, m3, m4, m5);
1859+
1860+- return "vled";
1861++ return "vflr";
1862+ }
1863+
1864+ static const HChar *
1865+ s390_irgen_VFPSO(UChar v1, UChar v2, UChar m3, UChar m4, UChar m5)
1866+ {
1867+- s390_insn_assert("vfpso", m3 == 3);
1868++ s390_insn_assert("vfpso", m5 <= 2 &&
1869++ (m3 == 3 || (s390_host_has_vxe && m3 >= 2 && m3 <= 4)));
1870+
1871+- IRExpr* result;
1872+- switch (m5) {
1873+- case 0: {
1874+- /* Invert sign */
1875+- if (!s390_vr_is_single_element_control_set(m4)) {
1876+- result = unop(Iop_Neg64Fx2, get_vr_qw(v2));
1877+- }
1878+- else {
1879+- result = binop(Iop_64HLtoV128,
1880+- unop(Iop_ReinterpF64asI64,
1881+- unop(Iop_NegF64, get_vr(v2, Ity_F64, 0))),
1882+- mkU64(0));
1883+- }
1884+- break;
1885+- }
1886++ Bool single = s390_vr_is_single_element_control_set(m4) || m3 == 4;
1887++ IRType type = single ? s390_vr_get_ftype(m3) : Ity_V128;
1888++ int idx = 2 * (m3 - 2) + (single ? 0 : 1);
1889++
1890++ static const IROp negate_ops[] = {
1891++ Iop_NegF32, Iop_Neg32Fx4,
1892++ Iop_NegF64, Iop_Neg64Fx2,
1893++ Iop_NegF128
1894++ };
1895++ static const IROp abs_ops[] = {
1896++ Iop_AbsF32, Iop_Abs32Fx4,
1897++ Iop_AbsF64, Iop_Abs64Fx2,
1898++ Iop_AbsF128
1899++ };
1900+
1901+- case 1: {
1902++ if (m5 == 1) {
1903+ /* Set sign to negative */
1904+- IRExpr* highHalf = mkU64(0x8000000000000000ULL);
1905+- if (!s390_vr_is_single_element_control_set(m4)) {
1906+- IRExpr* lowHalf = highHalf;
1907+- IRExpr* mask = binop(Iop_64HLtoV128, highHalf, lowHalf);
1908+- result = binop(Iop_OrV128, get_vr_qw(v2), mask);
1909+- }
1910+- else {
1911+- result = binop(Iop_64HLtoV128,
1912+- binop(Iop_Or64, get_vr_dw0(v2), highHalf),
1913+- mkU64(0ULL));
1914+- }
1915+-
1916+- break;
1917++ put_vr(v1, type, 0,
1918++ unop(negate_ops[idx],
1919++ unop(abs_ops[idx], get_vr(v2, type, 0))));
1920++ } else {
1921++ /* m5 == 0: invert sign; m5 == 2: set sign to positive */
1922++ const IROp *ops = m5 == 2 ? abs_ops : negate_ops;
1923++ put_vr(v1, type, 0, unop(ops[idx], get_vr(v2, type, 0)));
1924+ }
1925+
1926+- case 2: {
1927+- /* Set sign to positive */
1928+- if (!s390_vr_is_single_element_control_set(m4)) {
1929+- result = unop(Iop_Abs64Fx2, get_vr_qw(v2));
1930+- }
1931+- else {
1932+- result = binop(Iop_64HLtoV128,
1933+- unop(Iop_ReinterpF64asI64,
1934+- unop(Iop_AbsF64, get_vr(v2, Ity_F64, 0))),
1935+- mkU64(0));
1936+- }
1937++ return "vfpso";
1938++}
1939+
1940+- break;
1941+- }
1942++static const HChar *
1943++s390x_vec_fp_binary_op(const HChar* mnm, const IROp ops[],
1944++ UChar v1, UChar v2, UChar v3,
1945++ UChar m4, UChar m5)
1946++{
1947++ s390_insn_assert(mnm, (m5 & 7) == 0 &&
1948++ (m4 == 3 || (s390_host_has_vxe && m4 >= 2 && m4 <= 4)));
1949+
1950+- default:
1951+- vpanic("s390_irgen_VFPSO: Invalid m5 value");
1952+- }
1953++ int idx = 2 * (m4 - 2);
1954+
1955+- put_vr_qw(v1, result);
1956+- if (s390_vr_is_single_element_control_set(m4)) {
1957+- put_vr_dw1(v1, mkU64(0ULL));
1958++ if (m4 == 4 || s390_vr_is_single_element_control_set(m5)) {
1959++ IRType type = s390_vr_get_ftype(m4);
1960++ put_vr(v1, type, 0,
1961++ triop(ops[idx], get_bfp_rounding_mode_from_fpc(),
1962++ get_vr(v2, type, 0), get_vr(v3, type, 0)));
1963++ } else {
1964++ put_vr_qw(v1, triop(ops[idx + 1], get_bfp_rounding_mode_from_fpc(),
1965++ get_vr_qw(v2), get_vr_qw(v3)));
1966+ }
1967+
1968+- return "vfpso";
1969++ return mnm;
1970+ }
1971+
1972+-static void s390x_vec_fp_binary_op(IROp generalOp, IROp singleElementOp,
1973+- UChar v1, UChar v2, UChar v3, UChar m4,
1974+- UChar m5)
1975++static const HChar *
1976++s390x_vec_fp_unary_op(const HChar* mnm, const IROp ops[],
1977++ UChar v1, UChar v2, UChar m3, UChar m4)
1978+ {
1979+- IRExpr* result;
1980+- if (!s390_vr_is_single_element_control_set(m5)) {
1981+- result = triop(generalOp, get_bfp_rounding_mode_from_fpc(),
1982+- get_vr_qw(v2), get_vr_qw(v3));
1983+- } else {
1984+- IRExpr* highHalf = triop(singleElementOp,
1985+- get_bfp_rounding_mode_from_fpc(),
1986+- get_vr(v2, Ity_F64, 0),
1987+- get_vr(v3, Ity_F64, 0));
1988+- result = binop(Iop_64HLtoV128, unop(Iop_ReinterpF64asI64, highHalf),
1989+- mkU64(0ULL));
1990+- }
1991++ s390_insn_assert(mnm, (m4 & 7) == 0 &&
1992++ (m3 == 3 || (s390_host_has_vxe && m3 >= 2 && m3 <= 4)));
1993+
1994+- put_vr_qw(v1, result);
1995+-}
1996++ int idx = 2 * (m3 - 2);
1997+
1998+-static void s390x_vec_fp_unary_op(IROp generalOp, IROp singleElementOp,
1999+- UChar v1, UChar v2, UChar m3, UChar m4)
2000+-{
2001+- IRExpr* result;
2002+- if (!s390_vr_is_single_element_control_set(m4)) {
2003+- result = binop(generalOp, get_bfp_rounding_mode_from_fpc(),
2004+- get_vr_qw(v2));
2005++ if (m3 == 4 || s390_vr_is_single_element_control_set(m4)) {
2006++ IRType type = s390_vr_get_ftype(m3);
2007++ put_vr(v1, type, 0,
2008++ binop(ops[idx], get_bfp_rounding_mode_from_fpc(),
2009++ get_vr(v2, type, 0)));
2010+ }
2011+ else {
2012+- IRExpr* highHalf = binop(singleElementOp,
2013+- get_bfp_rounding_mode_from_fpc(),
2014+- get_vr(v2, Ity_F64, 0));
2015+- result = binop(Iop_64HLtoV128, unop(Iop_ReinterpF64asI64, highHalf),
2016+- mkU64(0ULL));
2017++ put_vr_qw(v1, binop(ops[idx + 1], get_bfp_rounding_mode_from_fpc(),
2018++ get_vr_qw(v2)));
2019+ }
2020+
2021+- put_vr_qw(v1, result);
2022++ return mnm;
2023+ }
2024+
2025+
2026+-static void
2027+-s390_vector_fp_mulAddOrSub(IROp singleElementOp,
2028+- UChar v1, UChar v2, UChar v3, UChar v4,
2029+- UChar m5, UChar m6)
2030++static const HChar *
2031++s390_vector_fp_mulAddOrSub(UChar v1, UChar v2, UChar v3, UChar v4,
2032++ UChar m5, UChar m6,
2033++ const HChar* mnm, const IROp single_ops[],
2034++ Bool negate)
2035+ {
2036+- Bool isSingleElementOp = s390_vr_is_single_element_control_set(m5);
2037++ s390_insn_assert(mnm, m6 == 3 || (s390_host_has_vxe && m6 >= 2 && m6 <= 4));
2038++
2039++ static const IROp negate_ops[] = { Iop_NegF32, Iop_NegF64, Iop_NegF128 };
2040++ IRType type = s390_vr_get_ftype(m6);
2041++ Bool single = s390_vr_is_single_element_control_set(m5) || m6 == 4;
2042++ UChar n_elem = single ? 1 : s390_vr_get_n_elem(m6);
2043+ IRTemp irrm_temp = newTemp(Ity_I32);
2044+ assign(irrm_temp, get_bfp_rounding_mode_from_fpc());
2045+ IRExpr* irrm = mkexpr(irrm_temp);
2046+- IRExpr* result;
2047+- IRExpr* highHalf = qop(singleElementOp,
2048+- irrm,
2049+- get_vr(v2, Ity_F64, 0),
2050+- get_vr(v3, Ity_F64, 0),
2051+- get_vr(v4, Ity_F64, 0));
2052+-
2053+- if (isSingleElementOp) {
2054+- result = binop(Iop_64HLtoV128, unop(Iop_ReinterpF64asI64, highHalf),
2055+- mkU64(0ULL));
2056+- } else {
2057+- IRExpr* lowHalf = qop(singleElementOp,
2058+- irrm,
2059+- get_vr(v2, Ity_F64, 1),
2060+- get_vr(v3, Ity_F64, 1),
2061+- get_vr(v4, Ity_F64, 1));
2062+- result = binop(Iop_64HLtoV128, unop(Iop_ReinterpF64asI64, highHalf),
2063+- unop(Iop_ReinterpF64asI64, lowHalf));
2064+- }
2065+
2066+- put_vr_qw(v1, result);
2067++ for (UChar idx = 0; idx < n_elem; idx++) {
2068++ IRExpr* result = qop(single_ops[m6 - 2],
2069++ irrm,
2070++ get_vr(v2, type, idx),
2071++ get_vr(v3, type, idx),
2072++ get_vr(v4, type, idx));
2073++ put_vr(v1, type, idx, negate ? unop(negate_ops[m6 - 2], result) : result);
2074++ }
2075++ return mnm;
2076+ }
2077+
2078+ static const HChar *
2079+ s390_irgen_VFA(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5)
2080+ {
2081+- s390_insn_assert("vfa", m4 == 3);
2082+- s390x_vec_fp_binary_op(Iop_Add64Fx2, Iop_AddF64, v1, v2, v3, m4, m5);
2083+- return "vfa";
2084++ static const IROp vfa_ops[] = {
2085++ Iop_AddF32, Iop_Add32Fx4,
2086++ Iop_AddF64, Iop_Add64Fx2,
2087++ Iop_AddF128,
2088++ };
2089++ return s390x_vec_fp_binary_op("vfa", vfa_ops, v1, v2, v3, m4, m5);
2090+ }
2091+
2092+ static const HChar *
2093+ s390_irgen_VFS(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5)
2094+ {
2095+- s390_insn_assert("vfs", m4 == 3);
2096+- s390x_vec_fp_binary_op(Iop_Sub64Fx2, Iop_SubF64, v1, v2, v3, m4, m5);
2097+- return "vfs";
2098++ static const IROp vfs_ops[] = {
2099++ Iop_SubF32, Iop_Sub32Fx4,
2100++ Iop_SubF64, Iop_Sub64Fx2,
2101++ Iop_SubF128,
2102++ };
2103++ return s390x_vec_fp_binary_op("vfs", vfs_ops, v1, v2, v3, m4, m5);
2104+ }
2105+
2106+ static const HChar *
2107+ s390_irgen_VFM(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5)
2108+ {
2109+- s390_insn_assert("vfm", m4 == 3);
2110+- s390x_vec_fp_binary_op(Iop_Mul64Fx2, Iop_MulF64, v1, v2, v3, m4, m5);
2111+- return "vfm";
2112++ static const IROp vfm_ops[] = {
2113++ Iop_MulF32, Iop_Mul32Fx4,
2114++ Iop_MulF64, Iop_Mul64Fx2,
2115++ Iop_MulF128,
2116++ };
2117++ return s390x_vec_fp_binary_op("vfm", vfm_ops, v1, v2, v3, m4, m5);
2118+ }
2119+
2120+ static const HChar *
2121+ s390_irgen_VFD(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5)
2122+ {
2123+- s390_insn_assert("vfd", m4 == 3);
2124+- s390x_vec_fp_binary_op(Iop_Div64Fx2, Iop_DivF64, v1, v2, v3, m4, m5);
2125+- return "vfd";
2126++ static const IROp vfd_ops[] = {
2127++ Iop_DivF32, Iop_Div32Fx4,
2128++ Iop_DivF64, Iop_Div64Fx2,
2129++ Iop_DivF128,
2130++ };
2131++ return s390x_vec_fp_binary_op("vfd", vfd_ops, v1, v2, v3, m4, m5);
2132+ }
2133+
2134+ static const HChar *
2135+ s390_irgen_VFSQ(UChar v1, UChar v2, UChar m3, UChar m4)
2136+ {
2137+- s390_insn_assert("vfsq", m3 == 3);
2138+- s390x_vec_fp_unary_op(Iop_Sqrt64Fx2, Iop_SqrtF64, v1, v2, m3, m4);
2139+-
2140+- return "vfsq";
2141++ static const IROp vfsq_ops[] = {
2142++ Iop_SqrtF32, Iop_Sqrt32Fx4,
2143++ Iop_SqrtF64, Iop_Sqrt64Fx2,
2144++ Iop_SqrtF128
2145++ };
2146++ return s390x_vec_fp_unary_op("vfsq", vfsq_ops, v1, v2, m3, m4);
2147+ }
2148+
2149++static const IROp FMA_single_ops[] = {
2150++ Iop_MAddF32, Iop_MAddF64, Iop_MAddF128
2151++};
2152++
2153+ static const HChar *
2154+ s390_irgen_VFMA(UChar v1, UChar v2, UChar v3, UChar v4, UChar m5, UChar m6)
2155+ {
2156+- s390_insn_assert("vfma", m6 == 3);
2157+- s390_vector_fp_mulAddOrSub(Iop_MAddF64, v1, v2, v3, v4, m5, m6);
2158+- return "vfma";
2159++ return s390_vector_fp_mulAddOrSub(v1, v2, v3, v4, m5, m6,
2160++ "vfma", FMA_single_ops, False);
2161+ }
2162+
2163+ static const HChar *
2164++s390_irgen_VFNMA(UChar v1, UChar v2, UChar v3, UChar v4, UChar m5, UChar m6)
2165++{
2166++ return s390_vector_fp_mulAddOrSub(v1, v2, v3, v4, m5, m6,
2167++ "vfnma", FMA_single_ops, True);
2168++}
2169++
2170++static const IROp FMS_single_ops[] = {
2171++ Iop_MSubF32, Iop_MSubF64, Iop_MSubF128
2172++};
2173++
2174++static const HChar *
2175+ s390_irgen_VFMS(UChar v1, UChar v2, UChar v3, UChar v4, UChar m5, UChar m6)
2176+ {
2177+- s390_insn_assert("vfms", m6 == 3);
2178+- s390_vector_fp_mulAddOrSub(Iop_MSubF64, v1, v2, v3, v4, m5, m6);
2179+- return "vfms";
2180++ return s390_vector_fp_mulAddOrSub(v1, v2, v3, v4, m5, m6,
2181++ "vfms", FMS_single_ops, False);
2182++}
2183++
2184++static const HChar *
2185++s390_irgen_VFNMS(UChar v1, UChar v2, UChar v3, UChar v4, UChar m5, UChar m6)
2186++{
2187++ return s390_vector_fp_mulAddOrSub(v1, v2, v3, v4, m5, m6,
2188++ "vfnms", FMS_single_ops, True);
2189+ }
2190+
2191+ static const HChar *
2192+ s390_irgen_WFC(UChar v1, UChar v2, UChar m3, UChar m4)
2193+ {
2194+- s390_insn_assert("wfc", m3 == 3);
2195+- s390_insn_assert("wfc", m4 == 0);
2196++ s390_insn_assert("wfc", m4 == 0 &&
2197++ (m3 == 3 || (s390_host_has_vxe && m3 >= 2 && m3 <= 4)));
2198++
2199++ static const IROp ops[] = { Iop_CmpF32, Iop_CmpF64, Iop_CmpF128 };
2200++ IRType type = s390_vr_get_ftype(m3);
2201+
2202+ IRTemp cc_vex = newTemp(Ity_I32);
2203+- assign(cc_vex, binop(Iop_CmpF64,
2204+- get_vr(v1, Ity_F64, 0), get_vr(v2, Ity_F64, 0)));
2205++ assign(cc_vex, binop(ops[m3 - 2], get_vr(v1, type, 0), get_vr(v2, type, 0)));
2206+
2207+ IRTemp cc_s390 = newTemp(Ity_I32);
2208+ assign(cc_s390, convert_vex_bfpcc_to_s390(cc_vex));
2209+@@ -18692,213 +18917,253 @@
2210+ }
2211+
2212+ static const HChar *
2213+-s390_irgen_VFCE(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5, UChar m6)
2214+-{
2215+- s390_insn_assert("vfce", m4 == 3);
2216++s390_irgen_VFCx(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5, UChar m6,
2217++ const HChar *mnem, IRCmpFResult cmp, Bool equal_ok,
2218++ IROp cmp32, IROp cmp64)
2219++{
2220++ s390_insn_assert(mnem, (m5 & 3) == 0 && (m6 & 14) == 0 &&
2221++ (m4 == 3 || (s390_host_has_vxe && m4 >= 2 && m4 <= 4)));
2222++
2223++ Bool single = s390_vr_is_single_element_control_set(m5) || m4 == 4;
2224++
2225++ if (single) {
2226++ static const IROp ops[] = { Iop_CmpF32, Iop_CmpF64, Iop_CmpF128 };
2227++ IRType type = s390_vr_get_ftype(m4);
2228++ IRTemp result = newTemp(Ity_I32);
2229++ IRTemp cond = newTemp(Ity_I1);
2230+
2231+- Bool isSingleElementOp = s390_vr_is_single_element_control_set(m5);
2232+- if (!s390_vr_is_cs_set(m6)) {
2233+- if (!isSingleElementOp) {
2234+- put_vr_qw(v1, binop(Iop_CmpEQ64Fx2, get_vr_qw(v2), get_vr_qw(v3)));
2235++ assign(result, binop(ops[m4 - 2],
2236++ get_vr(v2, type, 0), get_vr(v3, type, 0)));
2237++ if (equal_ok) {
2238++ assign(cond,
2239++ binop(Iop_Or1,
2240++ binop(Iop_CmpEQ32, mkexpr(result), mkU32(cmp)),
2241++ binop(Iop_CmpEQ32, mkexpr(result), mkU32(Ircr_EQ))));
2242+ } else {
2243+- IRExpr* comparisonResult = binop(Iop_CmpF64, get_vr(v2, Ity_F64, 0),
2244+- get_vr(v3, Ity_F64, 0));
2245+- IRExpr* result = mkite(binop(Iop_CmpEQ32, comparisonResult,
2246+- mkU32(Ircr_EQ)),
2247+- mkU64(0xffffffffffffffffULL),
2248+- mkU64(0ULL));
2249+- put_vr_qw(v1, binop(Iop_64HLtoV128, result, mkU64(0ULL)));
2250++ assign(cond, binop(Iop_CmpEQ32, mkexpr(result), mkU32(cmp)));
2251++ }
2252++ put_vr_qw(v1, mkite(mkexpr(cond),
2253++ IRExpr_Const(IRConst_V128(0xffff)),
2254++ IRExpr_Const(IRConst_V128(0))));
2255++ if (s390_vr_is_cs_set(m6)) {
2256++ IRTemp cc = newTemp(Ity_I64);
2257++ assign(cc, mkite(mkexpr(cond), mkU64(0), mkU64(3)));
2258++ s390_cc_set(cc);
2259+ }
2260+ } else {
2261+- IRDirty* d;
2262+- IRTemp cc = newTemp(Ity_I64);
2263++ IRTemp result = newTemp(Ity_V128);
2264+
2265+- s390x_vec_op_details_t details = { .serialized = 0ULL };
2266+- details.op = S390_VEC_OP_VFCE;
2267+- details.v1 = v1;
2268+- details.v2 = v2;
2269+- details.v3 = v3;
2270+- details.m4 = m4;
2271+- details.m5 = m5;
2272+- details.m6 = m6;
2273++ assign(result, binop(m4 == 2 ? cmp32 : cmp64,
2274++ get_vr_qw(v2), get_vr_qw(v3)));
2275++ put_vr_qw(v1, mkexpr(result));
2276++ if (s390_vr_is_cs_set(m6)) {
2277++ IRTemp cc = newTemp(Ity_I64);
2278++ assign(cc,
2279++ mkite(binop(Iop_CmpEQ64,
2280++ binop(Iop_And64,
2281++ unop(Iop_V128to64, mkexpr(result)),
2282++ unop(Iop_V128HIto64, mkexpr(result))),
2283++ mkU64(-1ULL)),
2284++ mkU64(0), /* all comparison results are true */
2285++ mkite(binop(Iop_CmpEQ64,
2286++ binop(Iop_Or64,
2287++ unop(Iop_V128to64, mkexpr(result)),
2288++ unop(Iop_V128HIto64, mkexpr(result))),
2289++ mkU64(0)),
2290++ mkU64(3), /* all false */
2291++ mkU64(1)))); /* mixed true/false */
2292++ s390_cc_set(cc);
2293++ }
2294++ }
2295+
2296+- d = unsafeIRDirty_1_N(cc, 0, "s390x_dirtyhelper_vec_op",
2297+- &s390x_dirtyhelper_vec_op,
2298+- mkIRExprVec_2(IRExpr_GSPTR(),
2299+- mkU64(details.serialized)));
2300++ return mnem;
2301++}
2302+
2303+- const UChar elementSize = isSingleElementOp ? sizeof(ULong) : sizeof(V128);
2304+- d->nFxState = 3;
2305+- vex_bzero(&d->fxState, sizeof(d->fxState));
2306+- d->fxState[0].fx = Ifx_Read;
2307+- d->fxState[0].offset = S390X_GUEST_OFFSET(guest_v0) + v2 * sizeof(V128);
2308+- d->fxState[0].size = elementSize;
2309+- d->fxState[1].fx = Ifx_Read;
2310+- d->fxState[1].offset = S390X_GUEST_OFFSET(guest_v0) + v3 * sizeof(V128);
2311+- d->fxState[1].size = elementSize;
2312+- d->fxState[2].fx = Ifx_Write;
2313+- d->fxState[2].offset = S390X_GUEST_OFFSET(guest_v0) + v1 * sizeof(V128);
2314+- d->fxState[2].size = sizeof(V128);
2315++static const HChar *
2316++s390_irgen_VFCE(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5, UChar m6)
2317++{
2318++ return s390_irgen_VFCx(v1, v2, v3, m4, m5, m6, "vfce", Ircr_EQ,
2319++ False, Iop_CmpEQ32Fx4, Iop_CmpEQ64Fx2);
2320++}
2321+
2322+- stmt(IRStmt_Dirty(d));
2323+- s390_cc_set(cc);
2324+- }
2325++static const HChar *
2326++s390_irgen_VFCH(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5, UChar m6)
2327++{
2328++ /* Swap arguments and compare "low" instead. */
2329++ return s390_irgen_VFCx(v1, v3, v2, m4, m5, m6, "vfch", Ircr_LT,
2330++ False, Iop_CmpLT32Fx4, Iop_CmpLT64Fx2);
2331++}
2332+
2333+- return "vfce";
2334++static const HChar *
2335++s390_irgen_VFCHE(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5, UChar m6)
2336++{
2337++ /* Swap arguments and compare "low or equal" instead. */
2338++ return s390_irgen_VFCx(v1, v3, v2, m4, m5, m6, "vfche", Ircr_LT,
2339++ True, Iop_CmpLE32Fx4, Iop_CmpLE64Fx2);
2340+ }
2341+
2342+ static const HChar *
2343+-s390_irgen_VFCH(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5, UChar m6)
2344++s390_irgen_VFTCI(UChar v1, UChar v2, UShort i3, UChar m4, UChar m5)
2345+ {
2346+- vassert(m4 == 3);
2347++ s390_insn_assert("vftci",
2348++ (m4 == 3 || (s390_host_has_vxe && m4 >= 2 && m4 <= 4)));
2349+
2350+ Bool isSingleElementOp = s390_vr_is_single_element_control_set(m5);
2351+- if (!s390_vr_is_cs_set(m6)) {
2352+- if (!isSingleElementOp) {
2353+- put_vr_qw(v1, binop(Iop_CmpLE64Fx2, get_vr_qw(v3), get_vr_qw(v2)));
2354+- } else {
2355+- IRExpr* comparisonResult = binop(Iop_CmpF64, get_vr(v2, Ity_F64, 0),
2356+- get_vr(v3, Ity_F64, 0));
2357+- IRExpr* result = mkite(binop(Iop_CmpEQ32, comparisonResult,
2358+- mkU32(Ircr_GT)),
2359+- mkU64(0xffffffffffffffffULL),
2360+- mkU64(0ULL));
2361+- put_vr_qw(v1, binop(Iop_64HLtoV128, result, mkU64(0ULL)));
2362+- }
2363+- }
2364+- else {
2365+- IRDirty* d;
2366+- IRTemp cc = newTemp(Ity_I64);
2367+
2368+- s390x_vec_op_details_t details = { .serialized = 0ULL };
2369+- details.op = S390_VEC_OP_VFCH;
2370+- details.v1 = v1;
2371+- details.v2 = v2;
2372+- details.v3 = v3;
2373+- details.m4 = m4;
2374+- details.m5 = m5;
2375+- details.m6 = m6;
2376++ IRDirty* d;
2377++ IRTemp cc = newTemp(Ity_I64);
2378+
2379+- d = unsafeIRDirty_1_N(cc, 0, "s390x_dirtyhelper_vec_op",
2380+- &s390x_dirtyhelper_vec_op,
2381+- mkIRExprVec_2(IRExpr_GSPTR(),
2382+- mkU64(details.serialized)));
2383++ s390x_vec_op_details_t details = { .serialized = 0ULL };
2384++ details.op = S390_VEC_OP_VFTCI;
2385++ details.v1 = v1;
2386++ details.v2 = v2;
2387++ details.i3 = i3;
2388++ details.m4 = m4;
2389++ details.m5 = m5;
2390+
2391+- const UChar elementSize = isSingleElementOp ? sizeof(ULong) : sizeof(V128);
2392+- d->nFxState = 3;
2393+- vex_bzero(&d->fxState, sizeof(d->fxState));
2394+- d->fxState[0].fx = Ifx_Read;
2395+- d->fxState[0].offset = S390X_GUEST_OFFSET(guest_v0) + v2 * sizeof(V128);
2396+- d->fxState[0].size = elementSize;
2397+- d->fxState[1].fx = Ifx_Read;
2398+- d->fxState[1].offset = S390X_GUEST_OFFSET(guest_v0) + v3 * sizeof(V128);
2399+- d->fxState[1].size = elementSize;
2400+- d->fxState[2].fx = Ifx_Write;
2401+- d->fxState[2].offset = S390X_GUEST_OFFSET(guest_v0) + v1 * sizeof(V128);
2402+- d->fxState[2].size = sizeof(V128);
2403++ d = unsafeIRDirty_1_N(cc, 0, "s390x_dirtyhelper_vec_op",
2404++ &s390x_dirtyhelper_vec_op,
2405++ mkIRExprVec_2(IRExpr_GSPTR(),
2406++ mkU64(details.serialized)));
2407+
2408+- stmt(IRStmt_Dirty(d));
2409+- s390_cc_set(cc);
2410+- }
2411++ const UChar elementSize = isSingleElementOp ?
2412++ sizeofIRType(s390_vr_get_ftype(m4)) : sizeof(V128);
2413++ d->nFxState = 2;
2414++ vex_bzero(&d->fxState, sizeof(d->fxState));
2415++ d->fxState[0].fx = Ifx_Read;
2416++ d->fxState[0].offset = S390X_GUEST_OFFSET(guest_v0) + v2 * sizeof(V128);
2417++ d->fxState[0].size = elementSize;
2418++ d->fxState[1].fx = Ifx_Write;
2419++ d->fxState[1].offset = S390X_GUEST_OFFSET(guest_v0) + v1 * sizeof(V128);
2420++ d->fxState[1].size = sizeof(V128);
2421++
2422++ stmt(IRStmt_Dirty(d));
2423++ s390_cc_set(cc);
2424+
2425+- return "vfch";
2426++ return "vftci";
2427+ }
2428+
2429+ static const HChar *
2430+-s390_irgen_VFCHE(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5, UChar m6)
2431++s390_irgen_VFMIN(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5, UChar m6)
2432+ {
2433+- s390_insn_assert("vfche", m4 == 3);
2434++ s390_insn_assert("vfmin",
2435++ (m4 == 3 || (s390_host_has_vxe && m4 >= 2 && m4 <= 4)));
2436+
2437+ Bool isSingleElementOp = s390_vr_is_single_element_control_set(m5);
2438+- if (!s390_vr_is_cs_set(m6)) {
2439+- if (!isSingleElementOp) {
2440+- put_vr_qw(v1, binop(Iop_CmpLT64Fx2, get_vr_qw(v3), get_vr_qw(v2)));
2441+- }
2442+- else {
2443+- IRExpr* comparisonResult = binop(Iop_CmpF64, get_vr(v3, Ity_F64, 0),
2444+- get_vr(v2, Ity_F64, 0));
2445+- IRExpr* result = mkite(binop(Iop_CmpEQ32, comparisonResult,
2446+- mkU32(Ircr_LT)),
2447+- mkU64(0xffffffffffffffffULL),
2448+- mkU64(0ULL));
2449+- put_vr_qw(v1, binop(Iop_64HLtoV128, result, mkU64(0ULL)));
2450+- }
2451+- }
2452+- else {
2453+- IRDirty* d;
2454+- IRTemp cc = newTemp(Ity_I64);
2455+-
2456+- s390x_vec_op_details_t details = { .serialized = 0ULL };
2457+- details.op = S390_VEC_OP_VFCHE;
2458+- details.v1 = v1;
2459+- details.v2 = v2;
2460+- details.v3 = v3;
2461+- details.m4 = m4;
2462+- details.m5 = m5;
2463+- details.m6 = m6;
2464++ IRDirty* d;
2465++ IRTemp cc = newTemp(Ity_I64);
2466+
2467+- d = unsafeIRDirty_1_N(cc, 0, "s390x_dirtyhelper_vec_op",
2468+- &s390x_dirtyhelper_vec_op,
2469+- mkIRExprVec_2(IRExpr_GSPTR(),
2470+- mkU64(details.serialized)));
2471++ s390x_vec_op_details_t details = { .serialized = 0ULL };
2472++ details.op = S390_VEC_OP_VFMIN;
2473++ details.v1 = v1;
2474++ details.v2 = v2;
2475++ details.v3 = v3;
2476++ details.m4 = m4;
2477++ details.m5 = m5;
2478++ details.m6 = m6;
2479+
2480+- const UChar elementSize = isSingleElementOp ? sizeof(ULong) : sizeof(V128);
2481+- d->nFxState = 3;
2482+- vex_bzero(&d->fxState, sizeof(d->fxState));
2483+- d->fxState[0].fx = Ifx_Read;
2484+- d->fxState[0].offset = S390X_GUEST_OFFSET(guest_v0) + v2 * sizeof(V128);
2485+- d->fxState[0].size = elementSize;
2486+- d->fxState[1].fx = Ifx_Read;
2487+- d->fxState[1].offset = S390X_GUEST_OFFSET(guest_v0) + v3 * sizeof(V128);
2488+- d->fxState[1].size = elementSize;
2489+- d->fxState[2].fx = Ifx_Write;
2490+- d->fxState[2].offset = S390X_GUEST_OFFSET(guest_v0) + v1 * sizeof(V128);
2491+- d->fxState[2].size = sizeof(V128);
2492++ d = unsafeIRDirty_1_N(cc, 0, "s390x_dirtyhelper_vec_op",
2493++ &s390x_dirtyhelper_vec_op,
2494++ mkIRExprVec_2(IRExpr_GSPTR(),
2495++ mkU64(details.serialized)));
2496+
2497+- stmt(IRStmt_Dirty(d));
2498+- s390_cc_set(cc);
2499+- }
2500++ const UChar elementSize = isSingleElementOp ?
2501++ sizeofIRType(s390_vr_get_ftype(m4)) : sizeof(V128);
2502++ d->nFxState = 3;
2503++ vex_bzero(&d->fxState, sizeof(d->fxState));
2504++ d->fxState[0].fx = Ifx_Read;
2505++ d->fxState[0].offset = S390X_GUEST_OFFSET(guest_v0) + v2 * sizeof(V128);
2506++ d->fxState[0].size = elementSize;
2507++ d->fxState[1].fx = Ifx_Read;
2508++ d->fxState[1].offset = S390X_GUEST_OFFSET(guest_v0) + v3 * sizeof(V128);
2509++ d->fxState[1].size = elementSize;
2510++ d->fxState[2].fx = Ifx_Write;
2511++ d->fxState[2].offset = S390X_GUEST_OFFSET(guest_v0) + v1 * sizeof(V128);
2512++ d->fxState[2].size = sizeof(V128);
2513+
2514+- return "vfche";
2515++ stmt(IRStmt_Dirty(d));
2516++ s390_cc_set(cc);
2517++ return "vfmin";
2518+ }
2519+
2520+ static const HChar *
2521+-s390_irgen_VFTCI(UChar v1, UChar v2, UShort i3, UChar m4, UChar m5)
2522++s390_irgen_VFMAX(UChar v1, UChar v2, UChar v3, UChar m4, UChar m5, UChar m6)
2523+ {
2524+- s390_insn_assert("vftci", m4 == 3);
2525++ s390_insn_assert("vfmax",
2526++ (m4 == 3 || (s390_host_has_vxe && m4 >= 2 && m4 <= 4)));
2527+
2528+ Bool isSingleElementOp = s390_vr_is_single_element_control_set(m5);
2529+-
2530+ IRDirty* d;
2531+ IRTemp cc = newTemp(Ity_I64);
2532+
2533+ s390x_vec_op_details_t details = { .serialized = 0ULL };
2534+- details.op = S390_VEC_OP_VFTCI;
2535++ details.op = S390_VEC_OP_VFMAX;
2536+ details.v1 = v1;
2537+ details.v2 = v2;
2538+- details.i3 = i3;
2539++ details.v3 = v3;
2540+ details.m4 = m4;
2541+ details.m5 = m5;
2542++ details.m6 = m6;
2543+
2544+ d = unsafeIRDirty_1_N(cc, 0, "s390x_dirtyhelper_vec_op",
2545+ &s390x_dirtyhelper_vec_op,
2546+ mkIRExprVec_2(IRExpr_GSPTR(),
2547+ mkU64(details.serialized)));
2548+
2549+- const UChar elementSize = isSingleElementOp ? sizeof(ULong) : sizeof(V128);
2550+- d->nFxState = 2;
2551++ const UChar elementSize = isSingleElementOp ?
2552++ sizeofIRType(s390_vr_get_ftype(m4)) : sizeof(V128);
2553++ d->nFxState = 3;
2554+ vex_bzero(&d->fxState, sizeof(d->fxState));
2555+ d->fxState[0].fx = Ifx_Read;
2556+ d->fxState[0].offset = S390X_GUEST_OFFSET(guest_v0) + v2 * sizeof(V128);
2557+ d->fxState[0].size = elementSize;
2558+- d->fxState[1].fx = Ifx_Write;
2559+- d->fxState[1].offset = S390X_GUEST_OFFSET(guest_v0) + v1 * sizeof(V128);
2560+- d->fxState[1].size = sizeof(V128);
2561++ d->fxState[1].fx = Ifx_Read;
2562++ d->fxState[1].offset = S390X_GUEST_OFFSET(guest_v0) + v3 * sizeof(V128);
2563++ d->fxState[1].size = elementSize;
2564++ d->fxState[2].fx = Ifx_Write;
2565++ d->fxState[2].offset = S390X_GUEST_OFFSET(guest_v0) + v1 * sizeof(V128);
2566++ d->fxState[2].size = sizeof(V128);
2567+
2568+ stmt(IRStmt_Dirty(d));
2569+ s390_cc_set(cc);
2570++ return "vfmax";
2571++}
2572+
2573+- return "vftci";
2574++static const HChar *
2575++s390_irgen_VBPERM(UChar v1, UChar v2, UChar v3)
2576++{
2577++ IRDirty* d;
2578++ IRTemp cc = newTemp(Ity_I64);
2579++
2580++ s390x_vec_op_details_t details = { .serialized = 0ULL };
2581++ details.op = S390_VEC_OP_VBPERM;
2582++ details.v1 = v1;
2583++ details.v2 = v2;
2584++ details.v3 = v3;
2585++ details.m4 = 0;
2586++ details.m5 = 0;
2587++ details.m6 = 0;
2588++
2589++ d = unsafeIRDirty_1_N(cc, 0, "s390x_dirtyhelper_vec_op",
2590++ &s390x_dirtyhelper_vec_op,
2591++ mkIRExprVec_2(IRExpr_GSPTR(),
2592++ mkU64(details.serialized)));
2593++
2594++ d->nFxState = 3;
2595++ vex_bzero(&d->fxState, sizeof(d->fxState));
2596++ d->fxState[0].fx = Ifx_Read;
2597++ d->fxState[0].offset = S390X_GUEST_OFFSET(guest_v0) + v2 * sizeof(V128);
2598++ d->fxState[0].size = sizeof(V128);
2599++ d->fxState[1].fx = Ifx_Read;
2600++ d->fxState[1].offset = S390X_GUEST_OFFSET(guest_v0) + v3 * sizeof(V128);
2601++ d->fxState[1].size = sizeof(V128);
2602++ d->fxState[2].fx = Ifx_Write;
2603++ d->fxState[2].offset = S390X_GUEST_OFFSET(guest_v0) + v1 * sizeof(V128);
2604++ d->fxState[2].size = sizeof(V128);
2605++
2606++ stmt(IRStmt_Dirty(d));
2607++ s390_cc_set(cc);
2608++ return "vbperm";
2609+ }
2610+
2611+ /* New insns are added here.
2612+@@ -20486,11 +20751,23 @@
2613+ RXY_dl2(ovl),
2614+ RXY_dh2(ovl)); goto ok;
2615+ case 0xe60000000034ULL: /* VPKZ */ goto unimplemented;
2616+- case 0xe60000000035ULL: /* VLRL */ goto unimplemented;
2617+- case 0xe60000000037ULL: /* VLRLR */ goto unimplemented;
2618++ case 0xe60000000035ULL: s390_format_VSI_URDV(s390_irgen_VLRL, VSI_v1(ovl),
2619++ VSI_b2(ovl), VSI_d2(ovl),
2620++ VSI_i3(ovl),
2621++ VSI_rxb(ovl)); goto ok;
2622++ case 0xe60000000037ULL: s390_format_VRS_RRDV(s390_irgen_VLRLR, VRSd_v1(ovl),
2623++ VRSd_r3(ovl), VRS_b2(ovl),
2624++ VRS_d2(ovl),
2625++ VRS_rxb(ovl)); goto ok;
2626+ case 0xe6000000003cULL: /* VUPKZ */ goto unimplemented;
2627+- case 0xe6000000003dULL: /* VSTRL */ goto unimplemented;
2628+- case 0xe6000000003fULL: /* VSTRLR */ goto unimplemented;
2629++ case 0xe6000000003dULL: s390_format_VSI_URDV(s390_irgen_VSTRL, VSI_v1(ovl),
2630++ VSI_b2(ovl), VSI_d2(ovl),
2631++ VSI_i3(ovl),
2632++ VSI_rxb(ovl)); goto ok;
2633++ case 0xe6000000003fULL: s390_format_VRS_RRDV(s390_irgen_VSTRLR, VRSd_v1(ovl),
2634++ VRSd_r3(ovl), VRS_b2(ovl),
2635++ VRS_d2(ovl),
2636++ VRS_rxb(ovl)); goto ok;
2637+ case 0xe60000000049ULL: /* VLIP */ goto unimplemented;
2638+ case 0xe60000000050ULL: /* VCVB */ goto unimplemented;
2639+ case 0xe60000000052ULL: /* VCVBG */ goto unimplemented;
2640+@@ -20688,12 +20965,18 @@
2641+ case 0xe7000000006bULL: s390_format_VRR_VVV(s390_irgen_VNO, VRR_v1(ovl),
2642+ VRR_v2(ovl), VRR_r3(ovl),
2643+ VRR_rxb(ovl)); goto ok;
2644+- case 0xe7000000006cULL: /* VNX */ goto unimplemented;
2645++ case 0xe7000000006cULL: s390_format_VRR_VVV(s390_irgen_VNX, VRR_v1(ovl),
2646++ VRR_v2(ovl), VRR_r3(ovl),
2647++ VRR_rxb(ovl)); goto ok;
2648+ case 0xe7000000006dULL: s390_format_VRR_VVV(s390_irgen_VX, VRR_v1(ovl),
2649+ VRR_v2(ovl), VRR_r3(ovl),
2650+ VRR_rxb(ovl)); goto ok;
2651+- case 0xe7000000006eULL: /* VNN */ goto unimplemented;
2652+- case 0xe7000000006fULL: /* VOC */ goto unimplemented;
2653++ case 0xe7000000006eULL: s390_format_VRR_VVV(s390_irgen_VNN, VRR_v1(ovl),
2654++ VRR_v2(ovl), VRR_r3(ovl),
2655++ VRR_rxb(ovl)); goto ok;
2656++ case 0xe7000000006fULL: s390_format_VRR_VVV(s390_irgen_VOC, VRR_v1(ovl),
2657++ VRR_v2(ovl), VRR_r3(ovl),
2658++ VRR_rxb(ovl)); goto ok;
2659+ case 0xe70000000070ULL: s390_format_VRR_VVVM(s390_irgen_VESLV, VRR_v1(ovl),
2660+ VRR_v2(ovl), VRR_r3(ovl),
2661+ VRR_m4(ovl), VRR_rxb(ovl)); goto ok;
2662+@@ -20746,7 +21029,9 @@
2663+ case 0xe70000000084ULL: s390_format_VRR_VVVM(s390_irgen_VPDI, VRR_v1(ovl),
2664+ VRR_v2(ovl), VRR_r3(ovl),
2665+ VRR_m4(ovl), VRR_rxb(ovl)); goto ok;
2666+- case 0xe70000000085ULL: /* VBPERM */ goto unimplemented;
2667++ case 0xe70000000085ULL: s390_format_VRR_VVV(s390_irgen_VBPERM, VRR_v1(ovl),
2668++ VRR_v2(ovl), VRR_r3(ovl),
2669++ VRR_rxb(ovl)); goto ok;
2670+ case 0xe7000000008aULL: s390_format_VRR_VVVVMM(s390_irgen_VSTRC, VRRd_v1(ovl),
2671+ VRRd_v2(ovl), VRRd_v3(ovl),
2672+ VRRd_v4(ovl), VRRd_m5(ovl),
2673+@@ -20777,8 +21062,16 @@
2674+ case 0xe70000000097ULL: s390_format_VRR_VVVMM(s390_irgen_VPKS, VRR_v1(ovl),
2675+ VRR_v2(ovl), VRR_r3(ovl),
2676+ VRR_m4(ovl), VRR_m5(ovl), VRR_rxb(ovl)); goto ok;
2677+- case 0xe7000000009eULL: /* VFNMS */ goto unimplemented;
2678+- case 0xe7000000009fULL: /* VFNMA */ goto unimplemented;
2679++ case 0xe7000000009eULL: s390_format_VRR_VVVVMM(s390_irgen_VFNMS, VRRe_v1(ovl),
2680++ VRRe_v2(ovl), VRRe_v3(ovl),
2681++ VRRe_v4(ovl), VRRe_m5(ovl),
2682++ VRRe_m6(ovl),
2683++ VRRe_rxb(ovl)); goto ok;
2684++ case 0xe7000000009fULL: s390_format_VRR_VVVVMM(s390_irgen_VFNMA, VRRe_v1(ovl),
2685++ VRRe_v2(ovl), VRRe_v3(ovl),
2686++ VRRe_v4(ovl), VRRe_m5(ovl),
2687++ VRRe_m6(ovl),
2688++ VRRe_rxb(ovl)); goto ok;
2689+ case 0xe700000000a1ULL: s390_format_VRR_VVVM(s390_irgen_VMLH, VRR_v1(ovl),
2690+ VRR_v2(ovl), VRR_r3(ovl),
2691+ VRR_m4(ovl), VRR_rxb(ovl)); goto ok;
2692+@@ -20831,7 +21124,11 @@
2693+ case 0xe700000000b4ULL: s390_format_VRR_VVVM(s390_irgen_VGFM, VRR_v1(ovl),
2694+ VRR_v2(ovl), VRR_r3(ovl),
2695+ VRR_m4(ovl), VRR_rxb(ovl)); goto ok;
2696+- case 0xe700000000b8ULL: /* VMSL */ goto unimplemented;
2697++ case 0xe700000000b8ULL: s390_format_VRR_VVVVMM(s390_irgen_VMSL, VRRd_v1(ovl),
2698++ VRRd_v2(ovl), VRRd_v3(ovl),
2699++ VRRd_v4(ovl), VRRd_m5(ovl),
2700++ VRRd_m6(ovl),
2701++ VRRd_rxb(ovl)); goto ok;
2702+ case 0xe700000000b9ULL: s390_format_VRRd_VVVVM(s390_irgen_VACCC, VRRd_v1(ovl),
2703+ VRRd_v2(ovl), VRRd_v3(ovl),
2704+ VRRd_v4(ovl), VRRd_m5(ovl),
2705+@@ -20868,11 +21165,11 @@
2706+ VRRa_v2(ovl), VRRa_m3(ovl),
2707+ VRRa_m4(ovl), VRRa_m5(ovl),
2708+ VRRa_rxb(ovl)); goto ok;
2709+- case 0xe700000000c4ULL: s390_format_VRRa_VVMMM(s390_irgen_VLDE, VRRa_v1(ovl),
2710++ case 0xe700000000c4ULL: s390_format_VRRa_VVMMM(s390_irgen_VFLL, VRRa_v1(ovl),
2711+ VRRa_v2(ovl), VRRa_m3(ovl),
2712+ VRRa_m4(ovl), VRRa_m5(ovl),
2713+ VRRa_rxb(ovl)); goto ok;
2714+- case 0xe700000000c5ULL: s390_format_VRRa_VVMMM(s390_irgen_VLED, VRRa_v1(ovl),
2715++ case 0xe700000000c5ULL: s390_format_VRRa_VVMMM(s390_irgen_VFLR, VRRa_v1(ovl),
2716+ VRRa_v2(ovl), VRRa_m3(ovl),
2717+ VRRa_m4(ovl), VRRa_m5(ovl),
2718+ VRRa_rxb(ovl)); goto ok;
2719+@@ -20953,8 +21250,16 @@
2720+ VRRa_m3(ovl), VRRa_m4(ovl),
2721+ VRRa_m5(ovl),
2722+ VRRa_rxb(ovl)); goto ok;
2723+- case 0xe700000000eeULL: /* VFMIN */ goto unimplemented;
2724+- case 0xe700000000efULL: /* VFMAX */ goto unimplemented;
2725++ case 0xe700000000eeULL: s390_format_VRRa_VVVMMM(s390_irgen_VFMIN, VRRa_v1(ovl),
2726++ VRRa_v2(ovl), VRRa_v3(ovl),
2727++ VRRa_m3(ovl), VRRa_m4(ovl),
2728++ VRRa_m5(ovl),
2729++ VRRa_rxb(ovl)); goto ok;
2730++ case 0xe700000000efULL: s390_format_VRRa_VVVMMM(s390_irgen_VFMAX, VRRa_v1(ovl),
2731++ VRRa_v2(ovl), VRRa_v3(ovl),
2732++ VRRa_m3(ovl), VRRa_m4(ovl),
2733++ VRRa_m5(ovl),
2734++ VRRa_rxb(ovl)); goto ok;
2735+ case 0xe700000000f0ULL: s390_format_VRR_VVVM(s390_irgen_VAVGL, VRR_v1(ovl),
2736+ VRR_v2(ovl), VRR_r3(ovl),
2737+ VRR_m4(ovl), VRR_rxb(ovl)); goto ok;
2738+--- a/VEX/priv/host_s390_defs.c
2739++++ b/VEX/priv/host_s390_defs.c
2740+@@ -8,7 +8,7 @@
2741+ This file is part of Valgrind, a dynamic binary instrumentation
2742+ framework.
2743+
2744+- Copyright IBM Corp. 2010-2017
2745++ Copyright IBM Corp. 2010-2020
2746+ Copyright (C) 2012-2017 Florian Krohm (britzel@acm.org)
2747+
2748+ This program is free software; you can redistribute it and/or
2749+@@ -684,6 +684,8 @@
2750+ switch (hregClass(from)) {
2751+ case HRcInt64:
2752+ return s390_insn_move(sizeofIRType(Ity_I64), to, from);
2753++ case HRcFlt64:
2754++ return s390_insn_move(sizeofIRType(Ity_F64), to, from);
2755+ case HRcVec128:
2756+ return s390_insn_move(sizeofIRType(Ity_V128), to, from);
2757+ default:
2758+@@ -7870,6 +7872,10 @@
2759+ op = "v-vfloatabs";
2760+ break;
2761+
2762++ case S390_VEC_FLOAT_NABS:
2763++ op = "v-vfloatnabs";
2764++ break;
2765++
2766+ default:
2767+ goto fail;
2768+ }
2769+@@ -9439,21 +9445,28 @@
2770+
2771+ case S390_VEC_FLOAT_NEG: {
2772+ vassert(insn->variant.unop.src.tag == S390_OPND_REG);
2773+- vassert(insn->size == 8);
2774++ vassert(insn->size >= 4);
2775+ UChar v1 = hregNumber(insn->variant.unop.dst);
2776+ UChar v2 = hregNumber(insn->variant.unop.src.variant.reg);
2777+ return s390_emit_VFPSO(buf, v1, v2, s390_getM_from_size(insn->size), 0, 0);
2778+ }
2779+ case S390_VEC_FLOAT_ABS: {
2780+ vassert(insn->variant.unop.src.tag == S390_OPND_REG);
2781+- vassert(insn->size == 8);
2782++ vassert(insn->size >= 4);
2783+ UChar v1 = hregNumber(insn->variant.unop.dst);
2784+ UChar v2 = hregNumber(insn->variant.unop.src.variant.reg);
2785+ return s390_emit_VFPSO(buf, v1, v2, s390_getM_from_size(insn->size), 0, 2);
2786+ }
2787++ case S390_VEC_FLOAT_NABS: {
2788++ vassert(insn->variant.unop.src.tag == S390_OPND_REG);
2789++ vassert(insn->size >= 4);
2790++ UChar v1 = hregNumber(insn->variant.unop.dst);
2791++ UChar v2 = hregNumber(insn->variant.unop.src.variant.reg);
2792++ return s390_emit_VFPSO(buf, v1, v2, s390_getM_from_size(insn->size), 0, 1);
2793++ }
2794+ case S390_VEC_FLOAT_SQRT: {
2795+ vassert(insn->variant.unop.src.tag == S390_OPND_REG);
2796+- vassert(insn->size == 8);
2797++ vassert(insn->size >= 4);
2798+ UChar v1 = hregNumber(insn->variant.unop.dst);
2799+ UChar v2 = hregNumber(insn->variant.unop.src.variant.reg);
2800+ return s390_emit_VFSQ(buf, v1, v2, s390_getM_from_size(insn->size), 0);
2801+--- a/VEX/priv/host_s390_defs.h
2802++++ b/VEX/priv/host_s390_defs.h
2803+@@ -8,7 +8,7 @@
2804+ This file is part of Valgrind, a dynamic binary instrumentation
2805+ framework.
2806+
2807+- Copyright IBM Corp. 2010-2017
2808++ Copyright IBM Corp. 2010-2020
2809+
2810+ This program is free software; you can redistribute it and/or
2811+ modify it under the terms of the GNU General Public License as
2812+@@ -205,6 +205,7 @@
2813+ S390_VEC_COUNT_ONES,
2814+ S390_VEC_FLOAT_NEG,
2815+ S390_VEC_FLOAT_ABS,
2816++ S390_VEC_FLOAT_NABS,
2817+ S390_VEC_FLOAT_SQRT,
2818+ S390_UNOP_T_INVALID
2819+ } s390_unop_t;
2820+@@ -931,6 +932,8 @@
2821+ (s390_host_hwcaps & (VEX_HWCAPS_S390X_MSA5))
2822+ #define s390_host_has_lsc2 \
2823+ (s390_host_hwcaps & (VEX_HWCAPS_S390X_LSC2))
2824++#define s390_host_has_vxe \
2825++ (s390_host_hwcaps & (VEX_HWCAPS_S390X_VXE))
2826+ #endif /* ndef __VEX_HOST_S390_DEFS_H */
2827+
2828+ /*---------------------------------------------------------------*/
2829+--- a/VEX/priv/host_s390_isel.c
2830++++ b/VEX/priv/host_s390_isel.c
2831+@@ -8,7 +8,7 @@
2832+ This file is part of Valgrind, a dynamic binary instrumentation
2833+ framework.
2834+
2835+- Copyright IBM Corp. 2010-2017
2836++ Copyright IBM Corp. 2010-2020
2837+ Copyright (C) 2012-2017 Florian Krohm (britzel@acm.org)
2838+
2839+ This program is free software; you can redistribute it and/or
2840+@@ -2362,9 +2362,10 @@
2841+ case Iop_NegF128:
2842+ if (left->tag == Iex_Unop &&
2843+ (left->Iex.Unop.op == Iop_AbsF32 ||
2844+- left->Iex.Unop.op == Iop_AbsF64))
2845++ left->Iex.Unop.op == Iop_AbsF64)) {
2846+ bfpop = S390_BFP_NABS;
2847+- else
2848++ left = left->Iex.Unop.arg;
2849++ } else
2850+ bfpop = S390_BFP_NEG;
2851+ goto float128_opnd;
2852+ case Iop_AbsF128: bfpop = S390_BFP_ABS; goto float128_opnd;
2853+@@ -2726,9 +2727,10 @@
2854+ case Iop_NegF64:
2855+ if (left->tag == Iex_Unop &&
2856+ (left->Iex.Unop.op == Iop_AbsF32 ||
2857+- left->Iex.Unop.op == Iop_AbsF64))
2858++ left->Iex.Unop.op == Iop_AbsF64)) {
2859+ bfpop = S390_BFP_NABS;
2860+- else
2861++ left = left->Iex.Unop.arg;
2862++ } else
2863+ bfpop = S390_BFP_NEG;
2864+ break;
2865+
2866+@@ -3944,11 +3946,27 @@
2867+ vec_unop = S390_VEC_COUNT_ONES;
2868+ goto Iop_V_wrk;
2869+
2870++ case Iop_Neg32Fx4:
2871++ size = 4;
2872++ vec_unop = S390_VEC_FLOAT_NEG;
2873++ if (arg->tag == Iex_Unop && arg->Iex.Unop.op == Iop_Abs32Fx4) {
2874++ vec_unop = S390_VEC_FLOAT_NABS;
2875++ arg = arg->Iex.Unop.arg;
2876++ }
2877++ goto Iop_V_wrk;
2878+ case Iop_Neg64Fx2:
2879+ size = 8;
2880+ vec_unop = S390_VEC_FLOAT_NEG;
2881++ if (arg->tag == Iex_Unop && arg->Iex.Unop.op == Iop_Abs64Fx2) {
2882++ vec_unop = S390_VEC_FLOAT_NABS;
2883++ arg = arg->Iex.Unop.arg;
2884++ }
2885+ goto Iop_V_wrk;
2886+
2887++ case Iop_Abs32Fx4:
2888++ size = 4;
2889++ vec_unop = S390_VEC_FLOAT_ABS;
2890++ goto Iop_V_wrk;
2891+ case Iop_Abs64Fx2:
2892+ size = 8;
2893+ vec_unop = S390_VEC_FLOAT_ABS;
2894+@@ -4474,17 +4492,29 @@
2895+ vec_binop = S390_VEC_ELEM_ROLL_V;
2896+ goto Iop_VV_wrk;
2897+
2898++ case Iop_CmpEQ32Fx4:
2899++ size = 4;
2900++ vec_binop = S390_VEC_FLOAT_COMPARE_EQUAL;
2901++ goto Iop_VV_wrk;
2902+ case Iop_CmpEQ64Fx2:
2903+ size = 8;
2904+ vec_binop = S390_VEC_FLOAT_COMPARE_EQUAL;
2905+ goto Iop_VV_wrk;
2906+
2907++ case Iop_CmpLE32Fx4:
2908++ size = 4;
2909++ vec_binop = S390_VEC_FLOAT_COMPARE_LESS_OR_EQUAL;
2910++ goto Iop_VV_wrk;
2911+ case Iop_CmpLE64Fx2: {
2912+ size = 8;
2913+ vec_binop = S390_VEC_FLOAT_COMPARE_LESS_OR_EQUAL;
2914+ goto Iop_VV_wrk;
2915+ }
2916+
2917++ case Iop_CmpLT32Fx4:
2918++ size = 4;
2919++ vec_binop = S390_VEC_FLOAT_COMPARE_LESS;
2920++ goto Iop_VV_wrk;
2921+ case Iop_CmpLT64Fx2: {
2922+ size = 8;
2923+ vec_binop = S390_VEC_FLOAT_COMPARE_LESS;
2924+@@ -4671,20 +4701,41 @@
2925+ dst, reg1, reg2, reg3));
2926+ return dst;
2927+
2928++ case Iop_Add32Fx4:
2929++ size = 4;
2930++ vec_binop = S390_VEC_FLOAT_ADD;
2931++ goto Iop_irrm_VV_wrk;
2932++
2933+ case Iop_Add64Fx2:
2934+ size = 8;
2935+ vec_binop = S390_VEC_FLOAT_ADD;
2936+ goto Iop_irrm_VV_wrk;
2937+
2938++ case Iop_Sub32Fx4:
2939++ size = 4;
2940++ vec_binop = S390_VEC_FLOAT_SUB;
2941++ goto Iop_irrm_VV_wrk;
2942++
2943+ case Iop_Sub64Fx2:
2944+ size = 8;
2945+ vec_binop = S390_VEC_FLOAT_SUB;
2946+ goto Iop_irrm_VV_wrk;
2947+
2948++ case Iop_Mul32Fx4:
2949++ size = 4;
2950++ vec_binop = S390_VEC_FLOAT_MUL;
2951++ goto Iop_irrm_VV_wrk;
2952++
2953+ case Iop_Mul64Fx2:
2954+ size = 8;
2955+ vec_binop = S390_VEC_FLOAT_MUL;
2956+ goto Iop_irrm_VV_wrk;
2957++
2958++ case Iop_Div32Fx4:
2959++ size = 4;
2960++ vec_binop = S390_VEC_FLOAT_DIV;
2961++ goto Iop_irrm_VV_wrk;
2962++
2963+ case Iop_Div64Fx2:
2964+ size = 8;
2965+ vec_binop = S390_VEC_FLOAT_DIV;
2966+--- a/VEX/priv/main_main.c
2967++++ b/VEX/priv/main_main.c
2968+@@ -1792,6 +1792,7 @@
2969+ { VEX_HWCAPS_S390X_MSA5, "msa5" },
2970+ { VEX_HWCAPS_S390X_MI2, "mi2" },
2971+ { VEX_HWCAPS_S390X_LSC2, "lsc2" },
2972++ { VEX_HWCAPS_S390X_VXE, "vxe" },
2973+ };
2974+ /* Allocate a large enough buffer */
2975+ static HChar buf[sizeof prefix +
2976+--- a/VEX/pub/libvex_emnote.h
2977++++ b/VEX/pub/libvex_emnote.h
2978+@@ -124,6 +124,10 @@
2979+ /* ppno insn is not supported on this host */
2980+ EmFail_S390X_ppno,
2981+
2982++ /* insn needs vector-enhancements facility which is not available on this
2983++ host */
2984++ EmFail_S390X_vxe,
2985++
2986+ EmNote_NUMBER
2987+ }
2988+ VexEmNote;
2989+--- a/VEX/pub/libvex.h
2990++++ b/VEX/pub/libvex.h
2991+@@ -167,7 +167,7 @@
2992+ #define VEX_HWCAPS_S390X_MSA5 (1<<19) /* message security assistance facility */
2993+ #define VEX_HWCAPS_S390X_MI2 (1<<20) /* miscellaneous-instruction-extensions facility 2 */
2994+ #define VEX_HWCAPS_S390X_LSC2 (1<<21) /* Conditional load/store facility2 */
2995+-
2996++#define VEX_HWCAPS_S390X_VXE (1<<22) /* Vector-enhancements facility */
2997+
2998+ /* Special value representing all available s390x hwcaps */
2999+ #define VEX_HWCAPS_S390X_ALL (VEX_HWCAPS_S390X_LDISP | \
3000+@@ -185,7 +185,8 @@
3001+ VEX_HWCAPS_S390X_VX | \
3002+ VEX_HWCAPS_S390X_MSA5 | \
3003+ VEX_HWCAPS_S390X_MI2 | \
3004+- VEX_HWCAPS_S390X_LSC2)
3005++ VEX_HWCAPS_S390X_LSC2 | \
3006++ VEX_HWCAPS_S390X_VXE)
3007+
3008+ #define VEX_HWCAPS_S390X(x) ((x) & ~VEX_S390X_MODEL_MASK)
3009+ #define VEX_S390X_MODEL(x) ((x) & VEX_S390X_MODEL_MASK)
3010diff --git a/debian/patches/lp-1825343-Bug-428648-s390x-Force-12-bit-amode-for-vector-loads.patch b/debian/patches/lp-1825343-Bug-428648-s390x-Force-12-bit-amode-for-vector-loads.patch
3011new file mode 100644
3012index 0000000..a62098a
3013--- /dev/null
3014+++ b/debian/patches/lp-1825343-Bug-428648-s390x-Force-12-bit-amode-for-vector-loads.patch
3015@@ -0,0 +1,45 @@
3016+From ba73f8d2ebe4b5fe8163ee5ab806f0e50961ebdf Mon Sep 17 00:00:00 2001
3017+From: Andreas Arnez <arnez@linux.ibm.com>
3018+Date: Tue, 3 Nov 2020 18:17:30 +0100
3019+Subject: [PATCH] Bug 428648 - s390x: Force 12-bit amode for vector loads in isel
3020+
3021+Similar to Bug 417452, where the instruction selector sometimes attempted
3022+to generate vector stores with a 20-bit displacement, the same problem has
3023+now been reported with vector loads.
3024+
3025+The problem is caused in s390_isel_vec_expr_wrk(), where the addressing
3026+mode is generated with s390_isel_amode() instead of
3027+s390_isel_amode_short(). This is fixed.
3028+
3029+Author: Andreas Arnez <arnez@linux.ibm.com>
3030+Origin: backport, https://sourceware.org/git/?p=valgrind.git;a=commit;h=ba73f8d2e
3031+Bug-IBM: IBM Bugzilla 163660
3032+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1825343
3033+Applied-Upstream: > v3.16.1
3034+Reviewed-by: Frank Heimes <frank.heimes@canonical.com>
3035+Last-Update: 2021-02-10
3036+
3037+---
3038+ NEWS | 1 +
3039+ VEX/priv/host_s390_isel.c | 2 +-
3040+ 2 files changed, 3 insertions(+), 1 deletion(-)
3041+--- a/NEWS
3042++++ b/NEWS
3043+@@ -1,4 +1,6 @@
3044+
3045++428648 s390_emit_load_mem panics due to 20-bit offset for vector load
3046++
3047+ Release 3.16.1 (22 June 2020)
3048+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3049+
3050+--- a/VEX/priv/host_s390_isel.c
3051++++ b/VEX/priv/host_s390_isel.c
3052+@@ -3741,7 +3741,7 @@
3053+ /* --------- LOAD --------- */
3054+ case Iex_Load: {
3055+ HReg dst = newVRegV(env);
3056+- s390_amode *am = s390_isel_amode(env, expr->Iex.Load.addr);
3057++ s390_amode *am = s390_isel_amode_short(env, expr->Iex.Load.addr);
3058+
3059+ if (expr->Iex.Load.end != Iend_BE)
3060+ goto irreducible;
3061diff --git a/debian/patches/lp-1825343-Bug-429864-s390-Use-Iop_CasCmp-to-fix-memcheck-false.patch b/debian/patches/lp-1825343-Bug-429864-s390-Use-Iop_CasCmp-to-fix-memcheck-false.patch
3062new file mode 100644
3063index 0000000..94c81f8
3064--- /dev/null
3065+++ b/debian/patches/lp-1825343-Bug-429864-s390-Use-Iop_CasCmp-to-fix-memcheck-false.patch
3066@@ -0,0 +1,155 @@
3067+From 5adeafad7a60b63786d9545e6980de26c17cb0a6 Mon Sep 17 00:00:00 2001
3068+From: Andreas Arnez <arnez@linux.ibm.com>
3069+Date: Thu, 3 Dec 2020 18:32:45 +0100
3070+Subject: [PATCH] Bug 429864 - s390: Use Iop_CasCmp* to fix memcheck false
3071+ positives
3072+
3073+Compare-and-swap instructions can cause memcheck false positives when
3074+operating on partially uninitialized data. An example is where a 1-byte
3075+lock is allocated on the stack and then manipulated using CS on the
3076+surrounding word. This is correct, and the uninitialized data has no
3077+influence on the result, but memcheck still complains.
3078+
3079+This is caused by logic in the s390 backend, where the expected and actual
3080+memory values are compared using Iop_Sub32. Fix this by using
3081+Iop_CasCmpNE32 instead.
3082+
3083+Author: Andreas Arnez <arnez@linux.ibm.com>
3084+Origin: backport, https://sourceware.org/git/?p=valgrind.git;a=commit;h=5adeafad7
3085+Bug-IBM: IBM Bugzilla 163660
3086+Bug-Ubuntu: https://bugs.launchpad.net/bugs/1825343
3087+Applied-Upstream: > v3.16.1
3088+Reviewed-by: Frank Heimes <frank.heimes@canonical.com>
3089+Last-Update: 2021-02-10
3090+
3091+---
3092+ NEWS | 2 ++
3093+ VEX/priv/guest_s390_toIR.c | 31 ++++++++++++++-----------------
3094+ 2 files changed, 16 insertions(+), 17 deletions(-)
3095+
3096+--- a/NEWS
3097++++ b/NEWS
3098+@@ -1,5 +1,7 @@
3099+
3100+ 428648 s390_emit_load_mem panics due to 20-bit offset for vector load
3101++429864 s390x: C++ atomic test_and_set yields false-positive memcheck
3102++ diagnostics
3103+
3104+ Release 3.16.1 (22 June 2020)
3105+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3106+--- a/VEX/priv/guest_s390_toIR.c
3107++++ b/VEX/priv/guest_s390_toIR.c
3108+@@ -742,6 +742,9 @@
3109+ case Ity_I8:
3110+ expr = unop(sign_extend ? Iop_8Sto64 : Iop_8Uto64, expr);
3111+ break;
3112++ case Ity_I1:
3113++ expr = unop(sign_extend ? Iop_1Sto64 : Iop_1Uto64, expr);
3114++ break;
3115+ default:
3116+ vpanic("s390_cc_widen");
3117+ }
3118+@@ -7417,7 +7420,7 @@
3119+
3120+ /* If old_mem contains the expected value, then the CAS succeeded.
3121+ Otherwise, it did not */
3122+- yield_if(binop(Iop_CmpNE32, mkexpr(old_mem), mkexpr(op2)));
3123++ yield_if(binop(Iop_CasCmpNE32, mkexpr(old_mem), mkexpr(op2)));
3124+ put_gpr_w1(r1, mkexpr(old_mem));
3125+ }
3126+
3127+@@ -7451,7 +7454,7 @@
3128+
3129+ /* If old_mem contains the expected value, then the CAS succeeded.
3130+ Otherwise, it did not */
3131+- yield_if(binop(Iop_CmpNE64, mkexpr(old_mem), mkexpr(op2)));
3132++ yield_if(binop(Iop_CasCmpNE64, mkexpr(old_mem), mkexpr(op2)));
3133+ put_gpr_dw0(r1, mkexpr(old_mem));
3134+ }
3135+
3136+@@ -7481,7 +7484,7 @@
3137+
3138+ /* If old_mem contains the expected value, then the CAS succeeded.
3139+ Otherwise, it did not */
3140+- yield_if(binop(Iop_CmpNE32, mkexpr(old_mem), mkexpr(op2)));
3141++ yield_if(binop(Iop_CasCmpNE32, mkexpr(old_mem), mkexpr(op2)));
3142+ put_gpr_w1(r1, mkexpr(old_mem));
3143+ }
3144+
3145+@@ -13864,7 +13867,6 @@
3146+ IRTemp op1 = newTemp(Ity_I32);
3147+ IRTemp old_mem = newTemp(Ity_I32);
3148+ IRTemp op3 = newTemp(Ity_I32);
3149+- IRTemp result = newTemp(Ity_I32);
3150+ IRTemp nequal = newTemp(Ity_I1);
3151+
3152+ assign(op1, get_gpr_w1(r1));
3153+@@ -13879,12 +13881,11 @@
3154+ stmt(IRStmt_CAS(cas));
3155+
3156+ /* Set CC. Operands compared equal -> 0, else 1. */
3157+- assign(result, binop(Iop_Sub32, mkexpr(op1), mkexpr(old_mem)));
3158+- s390_cc_thunk_put1(S390_CC_OP_BITWISE, result, False);
3159++ assign(nequal, binop(Iop_CasCmpNE32, mkexpr(op1), mkexpr(old_mem)));
3160++ s390_cc_thunk_put1(S390_CC_OP_BITWISE, nequal, True);
3161+
3162+ /* If operands were equal (cc == 0) just store the old value op1 in r1.
3163+ Otherwise, store the old_value from memory in r1 and yield. */
3164+- assign(nequal, binop(Iop_CmpNE32, s390_call_calculate_cc(), mkU32(0)));
3165+ put_gpr_w1(r1, mkite(mkexpr(nequal), mkexpr(old_mem), mkexpr(op1)));
3166+ yield_if(mkexpr(nequal));
3167+ }
3168+@@ -13912,7 +13913,6 @@
3169+ IRTemp op1 = newTemp(Ity_I64);
3170+ IRTemp old_mem = newTemp(Ity_I64);
3171+ IRTemp op3 = newTemp(Ity_I64);
3172+- IRTemp result = newTemp(Ity_I64);
3173+ IRTemp nequal = newTemp(Ity_I1);
3174+
3175+ assign(op1, get_gpr_dw0(r1));
3176+@@ -13927,12 +13927,11 @@
3177+ stmt(IRStmt_CAS(cas));
3178+
3179+ /* Set CC. Operands compared equal -> 0, else 1. */
3180+- assign(result, binop(Iop_Sub64, mkexpr(op1), mkexpr(old_mem)));
3181+- s390_cc_thunk_put1(S390_CC_OP_BITWISE, result, False);
3182++ assign(nequal, binop(Iop_CasCmpNE64, mkexpr(op1), mkexpr(old_mem)));
3183++ s390_cc_thunk_put1(S390_CC_OP_BITWISE, nequal, True);
3184+
3185+ /* If operands were equal (cc == 0) just store the old value op1 in r1.
3186+ Otherwise, store the old_value from memory in r1 and yield. */
3187+- assign(nequal, binop(Iop_CmpNE32, s390_call_calculate_cc(), mkU32(0)));
3188+ put_gpr_dw0(r1, mkite(mkexpr(nequal), mkexpr(old_mem), mkexpr(op1)));
3189+ yield_if(mkexpr(nequal));
3190+
3191+@@ -13950,7 +13949,6 @@
3192+ IRTemp old_mem_low = newTemp(Ity_I32);
3193+ IRTemp op3_high = newTemp(Ity_I32);
3194+ IRTemp op3_low = newTemp(Ity_I32);
3195+- IRTemp result = newTemp(Ity_I32);
3196+ IRTemp nequal = newTemp(Ity_I1);
3197+
3198+ assign(op1_high, get_gpr_w1(r1));
3199+@@ -13967,18 +13965,17 @@
3200+ stmt(IRStmt_CAS(cas));
3201+
3202+ /* Set CC. Operands compared equal -> 0, else 1. */
3203+- assign(result, unop(Iop_1Uto32,
3204+- binop(Iop_CmpNE32,
3205++ assign(nequal,
3206++ binop(Iop_CasCmpNE32,
3207+ binop(Iop_Or32,
3208+ binop(Iop_Xor32, mkexpr(op1_high), mkexpr(old_mem_high)),
3209+ binop(Iop_Xor32, mkexpr(op1_low), mkexpr(old_mem_low))),
3210+- mkU32(0))));
3211++ mkU32(0)));
3212+
3213+- s390_cc_thunk_put1(S390_CC_OP_BITWISE, result, False);
3214++ s390_cc_thunk_put1(S390_CC_OP_BITWISE, nequal, True);
3215+
3216+ /* If operands were equal (cc == 0) just store the old value op1 in r1.
3217+ Otherwise, store the old_value from memory in r1 and yield. */
3218+- assign(nequal, binop(Iop_CmpNE32, s390_call_calculate_cc(), mkU32(0)));
3219+ put_gpr_w1(r1, mkite(mkexpr(nequal), mkexpr(old_mem_high), mkexpr(op1_high)));
3220+ put_gpr_w1(r1+1, mkite(mkexpr(nequal), mkexpr(old_mem_low), mkexpr(op1_low)));
3221+ yield_if(mkexpr(nequal));
3222diff --git a/debian/patches/series b/debian/patches/series
3223index bc89f83..36cbddd 100644
3224--- a/debian/patches/series
3225+++ b/debian/patches/series
3226@@ -9,3 +9,6 @@
3227 13_fix-path-to-vgdb.patch
3228 14_fix-debuginfo-section-duplicates-a-section-in-the-main-ELF-file.patch
3229 armv7-illegal-opcode.patch
3230+lp-1825343-Bug-428648-s390x-Force-12-bit-amode-for-vector-loads.patch
3231+lp-1825343-Bug-429864-s390-Use-Iop_CasCmp-to-fix-memcheck-false.patch
3232+lp-1825343-Bug-404076-s390x-Implement-z14-vector-instructions.patch
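
The Bug 428648 patch above swaps s390_isel_amode() for s390_isel_amode_short() so that vector loads only ever see a 12-bit displacement. The difference between the two displacement ranges can be sketched as follows. This is a minimal illustration, assuming the usual s390x encoding (unsigned 12-bit short displacement, signed 20-bit long displacement). Every function and type name in it is hypothetical and is not part of Valgrind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical illustrations, not Valgrind's API. */

/* Short displacement: unsigned 12-bit field, 0..4095. */
static bool fits_unsigned_12bit(int64_t d)
{
    return d >= 0 && d <= 0xFFF;
}

/* Long displacement: signed 20-bit field, -524288..524287. */
static bool fits_signed_20bit(int64_t d)
{
    return d >= -(1 << 19) && d <= (1 << 19) - 1;
}

typedef enum {
    DISP_DIRECT,       /* displacement can be encoded in the instruction */
    DISP_VIA_REGISTER  /* address must first be computed into a register */
} disp_plan;

/* A vector load only has a 12-bit field, so a displacement that would be
   fine for a 20-bit instruction still has to go through a register. */
static disp_plan plan_vector_load(int64_t d)
{
    return fits_unsigned_12bit(d) ? DISP_DIRECT : DISP_VIA_REGISTER;
}

/* Encoding a displacement that does not fit is a hard error; this plays
   the role of the panic mentioned in the NEWS entry for the bug. */
static uint16_t encode_disp12(int64_t d)
{
    assert(fits_unsigned_12bit(d));
    return (uint16_t)d;
}
```

With these helpers, plan_vector_load(4095) yields DISP_DIRECT, while plan_vector_load(4096) and plan_vector_load(-8) yield DISP_VIA_REGISTER even though both values pass fits_signed_20bit().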

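The Bug 429864 patch description mentions a 1-byte lock on the stack that is manipulated with CS on the surrounding word. A program of that shape might look like the sketch below. It uses only standard C11 atomics, and the function names are made up for illustration. Whether the compiler really emits a word-sized CS for the one-byte flag depends on the target and toolchain, so treat that as an assumption about s390x rather than something this code guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if this call acquired the lock. */
static bool try_lock(atomic_flag *lock)
{
    return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

/* Hypothetical helper: releases the lock. */
static void unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* The flag lives on the stack and is properly initialized, but the bytes
   around it are not. If test-and-set is implemented as compare-and-swap
   on the enclosing word, those neighbouring bytes take part in the
   comparison without affecting the outcome. */
static int demo_lock_sequence(void)
{
    atomic_flag lock = ATOMIC_FLAG_INIT;
    int acquired = 0;

    if (try_lock(&lock))   /* succeeds: the lock was free */
        acquired++;
    if (try_lock(&lock))   /* fails: the lock is already held */
        acquired++;
    unlock(&lock);
    if (try_lock(&lock))   /* succeeds again after the release */
        acquired++;
    unlock(&lock);

    return acquired;
}
```

Calling demo_lock_sequence() returns 2. The program is well defined, which is why a memcheck report on it would be a false positive of the kind the patch addresses.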