Merge ~mkukri/ubuntu/+source/zlib:merge into ubuntu/+source/zlib:debian/sid

Proposed by Mate Kukri
Status: Merged
Merge reported by: Mate Kukri
Merged at revision: 515581d841bd3732d669f9806966080208c840b8
Proposed branch: ~mkukri/ubuntu/+source/zlib:merge
Merge into: ubuntu/+source/zlib:debian/sid
Diff against target: 6023 lines (+5732/-19)
17 files modified
debian/changelog (+246/-0)
debian/control (+24/-1)
debian/libx32z1-dev.dirs (+1/-0)
debian/libx32z1-dev.install (+2/-0)
debian/libx32z1.dirs (+1/-0)
debian/libx32z1.install (+1/-0)
debian/libx32z1.symbols (+3/-0)
debian/patches/power/add-optimized-crc32.patch (+2539/-0)
debian/patches/power/fix-clang7-builtins.patch (+62/-0)
debian/patches/power/indirect-func-macros.patch (+295/-0)
debian/patches/s390x/add-accel-deflate.patch (+2043/-0)
debian/patches/s390x/add-vectorized-crc32.patch (+426/-0)
debian/patches/series (+5/-0)
debian/rules (+39/-5)
debian/upstream/signing-key.asc (+30/-0)
debian/watch (+2/-0)
debian/zlib-core.symbols (+13/-13)
Reviewer                     Review Type   Date Requested   Status
Lukas Märdian (community)                                   Approve
Frank Heimes (community)                                    Approve
Steve Langasek (community)                                  Abstain
Ubuntu Sponsors                                             Pending
git-ubuntu import                                           Pending
Review via email: mp+456176@code.launchpad.net

Commit message

Merge zlib with Debian unstable.

This needed some TLC:
- Split the previous diff with git ubuntu
- Replaced the POWER and s390x patches with the newest ones from IBM rebased on Debian
- Removed the superseded bugfix patches (now included in the above)

Steve Langasek (vorlon) wrote :

I'm off until end of year so I think you should grab a different reviewer for this

review: Abstain
Mate Kukri (mkukri) wrote :

> I'm off until end of year so I think you should grab a different reviewer for
> this

Understood. I saw your name and Frank Heimes's on the latest changelog entries; that's what I based this on.

Do you have anyone in mind who has touched this package before and might be willing to review this?

Steve Langasek (vorlon) wrote :

On Thu, Nov 23, 2023 at 01:30:54PM -0000, Mate Kukri wrote:
> > I'm off until end of year so I think you should grab a different reviewer for
> > this

> Understood, I saw your and Frank Heimes's name on the last changelog
> entries, that's what I based this on.
>
> Do you have any names in mind who has touched this package before and
> might be willing to review this?

I don't think "touched this package" is a relevant criterion; you should
ask around in Foundations (or just ask ~canonical-foundations as a reviewer).

~mkukri/ubuntu/+source/zlib:merge updated
b2a9df2... by Mate Kukri

merge-changelogs

87e1e2b... by Mate Kukri

reconstruct-changelog

Mate Kukri (mkukri) wrote :

Now based on 1:1.3.dfsg-3

~mkukri/ubuntu/+source/zlib:merge updated
515581d... by Mate Kukri

update-maintainer

Frank Heimes (fheimes) wrote :

I think this looks good, and is a nice clean-up.

Since this is merged to the noble development release quite early, there should be some time to ask the IBM s390x people to give it a try (I remember that Ilya Leoshkevich <email address hidden> had some test code).

Once I see that this has landed, I would like to ask Ilya to test it (no need for you to do anything, but that lets us make sure the changed s390x optimization patches work fine ...).

review: Approve
Mate Kukri (mkukri) wrote :

> I think this looks good, and is a nice clean-up.
>
> Since this is merged to the noble development release quite early, there
> should be some time to ask the IBM s390x people to give it a try (I remember
> that Ilya Leoshkevich <email address hidden> had some test code).
>
> Once I see that this landed, I would like to ask Ilya (no need for you to do
> anything, but that allows to ensure that the changing s390x optimization
> patches work fine ...).

Are you also able to upload this, or should I ask someone else?

Frank Heimes (fheimes) wrote :

Hi Mate,
I'm sorry, you would need a coredev for uploading, since it's a main
package - and I am only MOTU (working on coredev ;-).
IIRC schopin sponsored my zlib uploads in the past ...

Bye, Frank

Frank Heimes (fheimes) wrote :

Btw, I haven't seen an LP bug reference in the changelog. Are you doing this
merge based on an LP bug (which is what I assume)? If so, please don't forget
to reference that LP bug in d/changelog.
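
For illustration, a changelog can be scanned for bug references along the following lines. This is a minimal Python sketch that assumes the conventional `LP: #NNNNNN` form (colon included, with optional comma-separated follow-on numbers). The regular expression and the helper name `lp_bugs` are hypothetical and only approximate whatever pattern Launchpad itself applies.

```python
import re

# Assumed convention: "LP: #123456" or "LP: #1, #2, #3".
# This pattern is an illustration, not Launchpad's actual parser.
LP_REF = re.compile(r"\bLP:\s*(#\d+(?:\s*,\s*#\d+)*)", re.IGNORECASE)


def lp_bugs(text: str) -> list[int]:
    """Return every bug number that follows an 'LP:' marker in text."""
    bugs: list[int] = []
    for match in LP_REF.finditer(text):
        bugs.extend(int(n) for n in re.findall(r"#(\d+)", match.group(1)))
    return bugs


if __name__ == "__main__":
    entry = "* Update d/patches/410.patch to current state. LP: #1882494, #1889059, #1893170"
    print(lp_bugs(entry))
```

Under this assumed pattern, a reference written without the colon (for example `LP #2002511`) is not picked up, so it is worth checking the exact spelling in new changelog entries.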

Mate Kukri (mkukri) wrote :

I don't think there is an LP bug for this; maybe I should have created one, but this is tracked internally in the Foundations Jira.

> Btw. I haven't seen a LP bug reference in the changelog, are you doing this
> merge based on a LP bug ? (what I assume), then please don't forget to
> reference this LP bug in d/changelog.
Frank Heimes (fheimes) wrote :

I think the wiki page on merging recommends doing so:
https://wiki.ubuntu.com/UbuntuDevelopment/Merging
"FILE A MERGE BUG"

Lukas Märdian (slyon) wrote :

Thank you Mate, that's indeed a really nice cleanup!

The new patches are nicely structured and provide clean patch headers. I confirmed they match the patches from Ilya (iii-i/zlib/dfltcc) on GitHub. Besides the new patches the delta looks very similar to our previous delta, but this time as clean git-ubuntu commits. Kudos!

@Frank: you mention there might be some test code available, I wonder if we could somehow integrate that into the package? Because unfortunately there doesn't seem to be any dh_auto_test nor autopkgtest. :(
Either way, we should definitely ask IBM/Ilya to verify that the new patches work as intended.

@Mate: We should also consider upstreaming the d/watch delta to Debian, I think that could be useful and doesn't need to be part of the delta.

Test build passed in a PPA:
https://launchpad.net/~mkukri/+archive/ubuntu/dev/+packages?field.name_filter=&field.status_filter=published&field.series_filter=noble

LGTM. Sponsoring.

review: Approve
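
As a rough idea of what a generic, architecture-independent smoke test could look like, here is a minimal Python sketch. It assumes the system Python is linked against the packaged libz and uses only the standard `zlib` module. The helper name `roundtrip_ok` is hypothetical, and this is not the IBM test code mentioned above.

```python
import zlib


def roundtrip_ok(data: bytes, level: int, raw: bool = False) -> bool:
    """Compress and decompress data, returning True if it survives intact.

    raw=True uses a raw deflate stream (negative wbits), i.e. no zlib
    header or Adler-32 trailer.
    """
    wbits = -zlib.MAX_WBITS if raw else zlib.MAX_WBITS
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    packed = comp.compress(data) + comp.flush()
    decomp = zlib.decompressobj(wbits)
    unpacked = decomp.decompress(packed) + decomp.flush()
    return unpacked == data


if __name__ == "__main__":
    sample = bytes(range(256)) * 64 + b"zlib" * 1000
    for level in range(0, 10):
        for raw in (False, True):
            assert roundtrip_ok(sample, level, raw), (level, raw)
    print("all round trips passed")
```

A check like this only exercises the generic code paths on most machines; whether it reaches hardware-accelerated paths depends on the architecture and build it runs on.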
Frank Heimes (fheimes) wrote :

From what I remember, 'iii' just has a few roughly coded C programs that
test the s390x optimizations and verify some bugs (that popped up in the past).
(Unfortunately) I assume they are not in a shape to be integrated as a standard
test - and they are s390x-specific anyway ... :-/

I was thinking more of using these as a kind of regression test for the
s390x-specific bits and pieces.

But I'll ask - maybe there has been some more work on it that I am not aware of
...

Frank Heimes (fheimes) wrote :

So Ilya was pretty quick. He tested the package in a mantic environment
(which is still close to noble) and all his tests passed!

As assumed, his tests are s390x-specific, so not very useful for a more
generic autopkgtest.

Anyway, glad that he could give it a try and came back with a :thumbs up:

Mate Kukri (mkukri) wrote :

@fheimes That is good news.

If the test code is in a publishable state, it might still be worth a shot to integrate it as an s390x-specific autopkgtest.

That and the POWER crc32 are our only significant delta over Debian, so I think it would still help give more confidence in these merges.
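
A crc32 sanity check of the kind that could back such a test might be sketched as follows. This is only an illustration in Python using the standard `zlib` module: it compares against the well-known CRC-32 check value for the ASCII string "123456789" and verifies that feeding data in chunks gives the same result as a single call. The helper name `crc32_chunked` is hypothetical.

```python
import zlib

# Standard CRC-32 check value for the ASCII digits "123456789".
CHECK_INPUT = b"123456789"
CHECK_CRC32 = 0xCBF43926


def crc32_chunked(data: bytes, chunk_size: int) -> int:
    """Compute CRC-32 incrementally, feeding chunk_size bytes at a time."""
    crc = 0
    for start in range(0, len(data), chunk_size):
        crc = zlib.crc32(data[start:start + chunk_size], crc)
    return crc


if __name__ == "__main__":
    assert zlib.crc32(CHECK_INPUT) == CHECK_CRC32
    payload = bytes(range(256)) * 1024
    # Odd chunk sizes exercise unaligned starts and short tails.
    for size in (1, 7, 16, 255, 4096):
        assert crc32_chunked(payload, size) == zlib.crc32(payload)
    print("crc32 checks passed")
```

Varying the chunk size matters because optimized implementations typically handle aligned bulk data and leftover bytes through different code paths.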

Preview Diff

diff --git a/debian/changelog b/debian/changelog
index 92d84a0..d52ce34 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,25 @@
+zlib (1:1.3.dfsg-3ubuntu1) noble; urgency=medium
+
+ * Merge with Debian unstable. Remaining changes:
+ - Build x32 packages
+ - Add watch file, with GPG tarball checking, and version mangling
+ - d/rules: Compile with DFLTCC enabled on s390x and hardware
+ compression at level 6
+ - d/zlib-core.symbols: Drop dfsg suffix from version
+ * New patches rebased from iii-i/zlib/dfltcc on GitHub:
+ - d/p/power/*: Add optimized crc32 for POWER8+
+ - d/p/s390x/*: Add optimized crc32 and hardware deflate
+ * Patches superseded by the above:
+ - d/p/410.patch: Add support for IBM Z hardware-accelerated deflate
+ - d/p/478.patch: Add optimized crc32 for Power 8+ processors
+ - d/p/s390x-vectorize-crc32.patch: Add s390x vectorized crc32 support
+ - d/p/1390.patch: Don't update strm.adler for raw streams on s390x
+ (DFLTCC), otherwise libxml2 gets broken on s390x. LP #2002511
+ - d/p/lp-2018293-fix-crash-in-deflateBound-if-called-before-deflateInt
+ .patch: Avoid potential deflateBound() function crash on s390x
+
+ -- Mate Kukri <mate.kukri@canonical.com> Fri, 24 Nov 2023 08:22:52 +0000
+
 zlib (1:1.3.dfsg-3) unstable; urgency=low

 * Update the version of texlive-binaries we break since they still had
@@ -34,6 +56,74 @@ zlib (1:1.2.13.dfsg-2) unstable; urgency=low

 -- Mark Brown <broonie@debian.org> Tue, 15 Aug 2023 00:28:42 +0100

+zlib (1:1.2.13.dfsg-1ubuntu5) mantic; urgency=medium
+
+ * Add
+ d/p/lp-2018293-fix-crash-in-deflateBound-if-called-before-deflateInt.patch
+ to avoid potential deflateBound() function crash on s390x.
+ * Clean-up and remove
+ d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch since it was
+ replaced by d/p/s390x-vectorize-crc32.patch with 1.2.13.dfsg-1ubuntu3
+ but was still in d/p/ (but not in d/p/series).
+
+ -- Frank Heimes <frank.heimes@canonical.com> Wed, 02 Aug 2023 13:22:26 +0200
+
+zlib (1:1.2.13.dfsg-1ubuntu4) lunar; urgency=medium
+
+ * Add d/p/1390.patch to not update strm.adler for raw streams on s390x
+ (DFLTCC), otherwise libxml2 gets broken on s390x. LP: #2002511
+
+ -- Frank Heimes <frank.heimes@canonical.com> Wed, 11 Jan 2023 18:02:34 +0100
+
+zlib (1:1.2.13.dfsg-1ubuntu3) lunar; urgency=medium
+
+ * Re-add vectorized crc32 support for s390x by adding
+ d/p/s390x-vectorize-crc32.patch
+ (crc32vx-v4: s390x: vectorize crc32). (LP: #1998470)
+ This replaces the previously dropped patch:
+ lp1932010-ibm-z-add-vectorized-crc32-implementation.patch
+ * Remove option '--crc32-vx' for s390x in d/rules, that was previously just
+ commented out, since it's no longer needed with the new s390x crc32 code.
+ * Update d/p/410.patch to version 26f2c0a4e17e5558d779797d713aa37ebaeef390
+ due to unused "const char *endptr;".
+
+ -- Frank Heimes <frank.heimes@canonical.com> Mon, 21 Nov 2022 20:28:58 +0100
+
+zlib (1:1.2.13.dfsg-1ubuntu2) lunar; urgency=medium
+
+ * Comment out use of --crc32-vx on s390x, since this is currently not
+ implemented due to the dropped patch that needs porting.
+
+ -- Steve Langasek <steve.langasek@ubuntu.com> Tue, 15 Nov 2022 17:06:45 +0000
+
+zlib (1:1.2.13.dfsg-1ubuntu1) lunar; urgency=low
+
+ * Merge from Debian unstable. Remaining changes:
+ - Build x32 packages
+ - debian/zlib-core.symbols: Drop dfsg suffix from version
+ - Add watch file, with GPG tarball checking, and version mangling
+ - Cherrypick PR#410 to enable hardware-accelerated deflate.
+ - Copmile with DFLTCC enabled on s390x.
+ - Enable hardware compression on s390x at level 6.
+ - d/rules: use configure options for dfltcc instead of hardcoding
+ the CFLAGS
+ * Dropped changes, included upstream:
+ - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch
+ - debian/patches/CVE-2018-25032-2.patch: assure that the number of bits
+ for deflatePrime() is valid in deflate.c.
+ * Pull rebased 410.patch from https://github.com/madler/zlib/pull/410.
+ * Drop d/p/410-lp1961427.patch, included in the above rebase.
+ * Replace 335.patch for ppc64el (P8) crc32 performance with 478.patch which
+ supersedes it (https://github.com/madler/zlib/pull/478).
+ * Forward-port lp1932010-ibm-z-add-vectorized-crc32-implementation.patch.
+ * Dropped changes:
+ - d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch: this
+ patch depends on zlib upstream PR 335 which has been superseded by
+ upstream PR 478 with significant refactoring. Drop this patch,
+ pending a port from IBM.
+
+ -- Steve Langasek <steve.langasek@ubuntu.com> Mon, 07 Nov 2022 15:57:28 -0800
+
 zlib (1:1.2.13.dfsg-1) unstable; urgency=low

 * New upstream release.
@@ -42,6 +132,38 @@ zlib (1:1.2.13.dfsg-1) unstable; urgency=low

 -- Mark Brown <broonie@debian.org> Sat, 05 Nov 2022 12:24:46 +0000

+zlib (1:1.2.11.dfsg-4.1ubuntu1) kinetic; urgency=low
+
+ * Merge from Debian unstable. Remaining changes:
+ - Build x32 packages
+ - debian/zlib-core.symbols: Drop dfsg suffix from version
+ - Add watch file, with GPG tarball checking, and version mangling
+ - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
+ - Cherrypick PR#410 to enable hardware-accelerated deflate.
+ - Copmile with DFLTCC enabled on s390x.
+ - Improve crc32 performance on P8, proposed upstream patch.
+ - Enable hardware compression on s390x at level 6.
+ - Cherrypick update of s390x hw acceleration #410 pull request patch,
+ which corrects inflateSyncPoint() return value to always gracefully
+ fail when hw acceleration is in use.
+ - d/rules: use configure options for dfltcc instead of hardcoding
+ the CFLAGS
+ - d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch
+ ported from zlib-ng #912, adding a vectorized implementation
+ of CRC32 on s390x architectures based on kernel code.
+ - d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch: adjust
+ to not make a PLT call in an ifunc on s390/s390x.
+ - debian/patches/CVE-2018-25032-2.patch: assure that the number of bits
+ for deflatePrime() is valid in deflate.c.
+ - d/p/410-lp1961427.patch ported from zlib #410, fixing
+ compressBound() with hw acceleration.
+ * Dropped changes, included in Debian:
+ - debian/patches/CVE-2018-25032-1.patch: fix a bug that can crash
+ deflate on some input when using Z_FIXED in deflate.c, deflate.h.
+ * Refresh 410.patch for upstream changes.
+
+ -- Steve Langasek <steve.langasek@ubuntu.com> Thu, 18 Aug 2022 09:09:22 -0700
+
 zlib (1:1.2.11.dfsg-4.1) unstable; urgency=medium

 * Non-maintainer upload.
@@ -69,6 +191,89 @@ zlib (1:1.2.11.dfsg-3) unstable; urgency=low

 -- Mark Brown <broonie@debian.org> Fri, 18 Mar 2022 00:21:37 +0000

+zlib (1:1.2.11.dfsg-2ubuntu10) kinetic; urgency=medium
+
+ * d/p/410-lp1961427.patch ported from zlib #410, fixing
+ compressBound() with hw acceleration. LP: #1961427
+ Thanks to Ilya Leoshkevich <iii@linux.ibm.com>.
+ In addition a patch is needed for bedtools.
+
+ -- Frank Heimes <frank.heimes@canonical.com> Thu, 21 Jul 2022 09:30:05 +0100
+
+zlib (1:1.2.11.dfsg-2ubuntu9) jammy; urgency=medium
+
+ * SECURITY UPDATE: memory corruption when deflating
+ - debian/patches/CVE-2018-25032-1.patch: fix a bug that can crash
+ deflate on some input when using Z_FIXED in deflate.c, deflate.h.
+ - debian/patches/CVE-2018-25032-2.patch: assure that the number of bits
+ for deflatePrime() is valid in deflate.c.
+ - CVE-2018-25032
+
+ -- Marc Deslauriers <marc.deslauriers@ubuntu.com> Fri, 25 Mar 2022 08:06:31 -0400
+
+zlib (1:1.2.11.dfsg-2ubuntu7) impish; urgency=medium
+
+ [ Simon Chopin ]
+ * d/rules: use configure options for dfltcc instead of hardcoding
+ the CFLAGS
+ * d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch
+ ported from zlib-ng #912, adding a vectorized implementation
+ of CRC32 on s390x architectures based on kernel code. LP: #1932010
+
+ [ Michael Hudson-Doyle ]
+ * d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch: adjust to
+ not make a PLT call in an ifunc on s390/s390x.
+
+ -- Simon Chopin <simon.chopin@canonical.com> Thu, 12 Aug 2021 15:45:49 +1200
+
+zlib (1:1.2.11.dfsg-2ubuntu6) hirsute; urgency=medium
+
+ * No-change rebuild to build with lto.
+
+ -- Matthias Klose <doko@ubuntu.com> Sun, 28 Mar 2021 09:10:07 +0200
+
+zlib (1:1.2.11.dfsg-2ubuntu5) hirsute; urgency=medium
+
+ * No-change rebuild to drop the udeb package.
+
+ -- Matthias Klose <doko@ubuntu.com> Mon, 22 Feb 2021 10:36:58 +0100
+
+zlib (1:1.2.11.dfsg-2ubuntu4) groovy; urgency=medium
+
+ * Cherrypick update of s390x hw acceleration #410 pull request patch,
+ which corrects inflateSyncPoint() return value to always gracefully
+ fail when hw acceleration is in use. This fixes rsync failure with
+ zlib compression on hw accelerated s390x. LP: #1899621
+
+ -- Dimitri John Ledkov <xnox@ubuntu.com> Thu, 15 Oct 2020 11:01:38 +0100
+
+zlib (1:1.2.11.dfsg-2ubuntu3) groovy; urgency=medium
+
+ * Enable hardware compression on s390x at level 6. LP: #1884514
+
+ -- Michael Hudson-Doyle <michael.hudson@ubuntu.com> Thu, 24 Sep 2020 08:44:35 +1200
+
+zlib (1:1.2.11.dfsg-2ubuntu2) groovy; urgency=medium
+
+ * Update d/patches/410.patch to current state. LP: #1882494, #1889059, #1893170
+
+ -- Michael Hudson-Doyle <michael.hudson@ubuntu.com> Thu, 20 Aug 2020 11:52:59 +1200
+
+zlib (1:1.2.11.dfsg-2ubuntu1) focal; urgency=medium
+
+ * Merge with Debian; remaining changes:
+ - Build x32 packages
+ - debian/zlib-core.symbols: Drop dfsg suffix from version
+ - Add watch file, with GPG tarball checking, and version mangling
+ - Drop unused patches
+ - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
+ (LP: #1692870)
+ - Cherrypick PR#410 to enable hardware-accelerated deflate.
+ - Copmile with DFLTCC enabled on s390x. LP: #1823157
+ - Improve crc32 performance on P8, proposed upstream patch. LP: #1742941.
+
+ -- Matthias Klose <doko@ubuntu.com> Tue, 25 Feb 2020 16:59:52 +0100
+
 zlib (1:1.2.11.dfsg-2) unstable; urgency=low

 * Acknowledge previous NMUs (closes: #949388).
@@ -80,6 +285,21 @@ zlib (1:1.2.11.dfsg-2) unstable; urgency=low

 -- Mark Brown <broonie@debian.org> Mon, 24 Feb 2020 21:07:12 +0000

+zlib (1:1.2.11.dfsg-1.2ubuntu1) focal; urgency=medium
+
+ * Merge with Debian; remaining changes:
+ - Build x32 packages
+ - debian/zlib-core.symbols: Drop dfsg suffix from version
+ - Add watch file, with GPG tarball checking, and version mangling
+ - Drop unused patches
+ - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
+ (LP: #1692870)
+ - Cherrypick PR#410 to enable hardware-accelerated deflate.
+ - Copmile with DFLTCC enabled on s390x. LP: #1823157
+ * Improve crc32 performance on P8, proposed upstream patch. LP: #1742941.
+
+ -- Matthias Klose <doko@ubuntu.com> Mon, 24 Feb 2020 12:57:03 +0100
+
 zlib (1:1.2.11.dfsg-1.2) unstable; urgency=medium

 * Non-maintainer upload.
@@ -97,6 +317,31 @@ zlib (1:1.2.11.dfsg-1.1) unstable; urgency=medium

 -- YunQiang Su <syq@debian.org> Tue, 28 Jan 2020 19:55:38 +0800

+zlib (1:1.2.11.dfsg-1ubuntu3) eoan; urgency=medium
+
+ * Cherrypick PR#410 to enable hardware-accelerated deflate.
+ * Copmile with DFLTCC enabled on s390x. LP: #1823157
+
+ -- Dimitri John Ledkov <xnox@ubuntu.com> Mon, 19 Aug 2019 19:51:09 +0100
+
+zlib (1:1.2.11.dfsg-1ubuntu2) disco; urgency=medium
+
+ * debian/zlib-core.symbols: fix mistake introduced in the merge
+
+ -- Jeremy Bicha <jbicha@debian.org> Thu, 24 Jan 2019 12:56:53 -0500
+
+zlib (1:1.2.11.dfsg-1ubuntu1) disco; urgency=medium
+
+ * Sync with Debian. Remaining changes:
+ - Build x32 packages
+ - debian/zlib-core.symbols: Drop dfsg suffix from version
+ - Add watch file, with GPG tarball checking, and version mangling
+ - Drop unused patches
+ - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
+ (LP: #1692870)
+
+ -- Jeremy Bicha <jbicha@debian.org> Wed, 23 Jan 2019 17:22:17 -0500
+
 zlib (1:1.2.11.dfsg-1) unstable; urgency=low

 * New upstream release (closes: #883180).
@@ -1072,3 +1317,4 @@ zlib (1.0.4-1) unstable; urgency=low
 * Moved to new source packaging format.

 -- Michael Alan Dorman <mdorman@calder.med.miami.edu> Thu, 12 Sep 1996 15:19:35 -0400
+
diff --git a/debian/control b/debian/control
index 3b4ff22..f365460 100644
--- a/debian/control
+++ b/debian/control
@@ -1,7 +1,8 @@
 Source: zlib
 Section: libs
 Priority: optional
-Maintainer: Mark Brown <broonie@debian.org>
+Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
+XSBC-Original-Maintainer: Mark Brown <broonie@debian.org>
 Standards-Version: 4.6.1
 Homepage: http://zlib.net/
 Build-Depends: debhelper (>= 13), gcc-multilib [amd64 i386 kfreebsd-amd64 mips mipsel powerpc ppc64 s390 sparc s390x mipsn32 mipsn32el mipsr6 mipsr6el mipsn32r6 mipsn32r6el mips64 mips64el mips64r6 mips64r6el x32] <!nobiarch>, dpkg-dev (>= 1.16.1), autoconf
@@ -119,6 +120,28 @@ Description: compression library - n32 - DO NOT USE EXCEPT FOR PACKAGING
 not need to build packages should use multiarch to install the relevant
 runtime.

+Package: libx32z1
+Architecture: amd64 i386
+Depends: ${shlibs:Depends}, ${misc:Depends}
+Description: compression library - x32 runtime
+ zlib is a library implementing the deflate compression method found
+ in gzip and PKZIP. This package includes a n32 version of the shared
+ library.
+
+Package: libx32z1-dev
+Section: libdevel
+Architecture: amd64 i386
+Depends: libx32z1 (= ${binary:Version}), zlib1g-dev (= ${binary:Version}), libc6-dev-x32, ${misc:Depends}
+Provides: libx32z-dev
+Description: compression library - x32 - DO NOT USE EXCEPT FOR PACKAGING
+ zlib is a library implementing the deflate compression method found
+ in gzip and PKZIP. This package includes the development support
+ files for building n32 applications.
+ .
+ This package should ONLY be used for building packages, users who do
+ not need to build packages should use multiarch to install the relevant
+ runtime.
+
 Package: minizip
 Section: utils
 Architecture: any
diff --git a/debian/libx32z1-dev.dirs b/debian/libx32z1-dev.dirs
new file mode 100644
index 0000000..5447591
--- /dev/null
+++ b/debian/libx32z1-dev.dirs
@@ -0,0 +1 @@
+usr/libx32
diff --git a/debian/libx32z1-dev.install b/debian/libx32z1-dev.install
new file mode 100644
index 0000000..a865054
--- /dev/null
+++ b/debian/libx32z1-dev.install
@@ -0,0 +1,2 @@
+usr/libx32/libz.a
+usr/libx32/libz.so
diff --git a/debian/libx32z1.dirs b/debian/libx32z1.dirs
new file mode 100644
index 0000000..5447591
--- /dev/null
+++ b/debian/libx32z1.dirs
@@ -0,0 +1 @@
+usr/libx32
diff --git a/debian/libx32z1.install b/debian/libx32z1.install
new file mode 100644
index 0000000..3ff82f2
--- /dev/null
+++ b/debian/libx32z1.install
@@ -0,0 +1 @@
+usr/libx32/libz.so.*
diff --git a/debian/libx32z1.symbols b/debian/libx32z1.symbols
new file mode 100644
index 0000000..a87cfdc
--- /dev/null
+++ b/debian/libx32z1.symbols
@@ -0,0 +1,3 @@
+libz.so.1 libx32z1 #MINVER#
+#include "zlib-core.symbols"
+#include "zlib-64.symbols"
diff --git a/debian/patches/power/add-optimized-crc32.patch b/debian/patches/power/add-optimized-crc32.patch
new file mode 100644
index 0000000..b057b57
--- /dev/null
+++ b/debian/patches/power/add-optimized-crc32.patch
@@ -0,0 +1,2539 @@
+From: Manjunath S Matti <mmatti@linux.ibm.com>
+Date: Thu, 14 Sep 2023 06:43:11 -0500
+Subject: Add Power8+ optimized crc32
+
+This commit adds an optimized version for the crc32 function based
+on crc32-vpmsum from https://github.com/antonblanchard/crc32-vpmsum/
+
+This is the C implementation created by Rogerio Alves
+<rogealve@br.ibm.com>
+
+It makes use of vector instructions to speed up CRC32 algorithm.
+
+Author: Rogerio Alves <rcardoso@linux.ibm.com>
+Signed-off-by: Manjunath Matti <mmatti@linux.ibm.com>
+
+Origin: i-iii/zlib,https://github.com/iii-i/zlib/commit/6879bc81b111247939b4924b08c5993fd0482b1a
+---
+ .gitignore | 29 +
+ CMakeLists.txt | 7 +-
+ Makefile.in | 43 +-
+ configure | 7 +-
+ contrib/README.contrib | 3 +-
+ contrib/power/clang_workaround.h | 82 +++
+ contrib/power/crc32_constants.h | 1206 ++++++++++++++++++++++++++++++++++++++
+ contrib/power/crc32_z_power8.c | 679 +++++++++++++++++++++
+ contrib/power/crc32_z_resolver.c | 15 +
+ contrib/power/power.h | 4 +
+ crc32.c | 12 +
+ test/crc32_test.c | 205 +++++++
+ 12 files changed, 2278 insertions(+), 14 deletions(-)
+ create mode 100644 .gitignore
+ create mode 100644 contrib/power/clang_workaround.h
+ create mode 100644 contrib/power/crc32_constants.h
+ create mode 100644 contrib/power/crc32_z_power8.c
+ create mode 100644 contrib/power/crc32_z_resolver.c
+ create mode 100644 test/crc32_test.c
+
+diff --git a/.gitignore b/.gitignore
+new file mode 100644
+index 0000000..e324531
+--- /dev/null
++++ b/.gitignore
+@@ -0,0 +1,29 @@
++*.diff
++*.patch
++*.orig
++*.rej
++
++*~
++*.a
++*.lo
++*.o
++*.dylib
++
++*.gcda
++*.gcno
++*.gcov
++
++/crc32_test
++/crc32_test64
++/crc32_testsh
++/example
++/example64
++/examplesh
++/libz.so*
++/minigzip
++/minigzip64
++/minigzipsh
++/zlib.pc
++/configure.log
++
++.DS_Store
453+diff --git a/CMakeLists.txt b/CMakeLists.txt
454+index 4456cd7..0464ba3 100644
455+--- a/CMakeLists.txt
456++++ b/CMakeLists.txt
457+@@ -172,7 +172,8 @@ if(CMAKE_COMPILER_IS_GNUCC)
458+
459+ if(POWER8)
460+ add_definitions(-DZ_POWER8)
461+- set(ZLIB_POWER8 )
462++ set(ZLIB_POWER8
463++ contrib/power/crc32_z_power8.c)
464+
465+ set_source_files_properties(
466+ ${ZLIB_POWER8}
467+@@ -269,6 +270,10 @@ add_executable(example test/example.c)
468+ target_link_libraries(example zlib)
469+ add_test(example example)
470+
471++add_executable(crc32_test test/crc32_test.c)
472++target_link_libraries(crc32_test zlib)
473++add_test(crc32_test crc32_test)
474++
475+ add_executable(minigzip test/minigzip.c)
476+ target_link_libraries(minigzip zlib)
477+
478+diff --git a/Makefile.in b/Makefile.in
479+index 34d3cd7..2dbb20a 100644
480+--- a/Makefile.in
481++++ b/Makefile.in
482+@@ -71,11 +71,11 @@ PIC_OBJS = $(PIC_OBJC) $(PIC_OBJA)
483+
484+ all: static shared
485+
486+-static: example$(EXE) minigzip$(EXE)
487++static: crc32_test$(EXE) example$(EXE) minigzip$(EXE)
488+
489+-shared: examplesh$(EXE) minigzipsh$(EXE)
490++shared: crc32_testsh$(EXE) examplesh$(EXE) minigzipsh$(EXE)
491+
492+-all64: example64$(EXE) minigzip64$(EXE)
493++all64: crc32_test64$(EXE) example64$(EXE) minigzip64$(EXE)
494+
495+ check: test
496+
497+@@ -83,7 +83,7 @@ test: all teststatic testshared
498+
499+ teststatic: static
500+ @TMPST=tmpst_$$; \
501+- if echo hello world | ${QEMU_RUN} ./minigzip | ${QEMU_RUN} ./minigzip -d && ${QEMU_RUN} ./example $$TMPST ; then \
502++ if echo hello world | ${QEMU_RUN} ./minigzip | ${QEMU_RUN} ./minigzip -d && ${QEMU_RUN} ./example $$TMPST && ${QEMU_RUN} ./crc32_test; then \
503+ echo ' *** zlib test OK ***'; \
504+ else \
505+ echo ' *** zlib test FAILED ***'; false; \
506+@@ -96,7 +96,7 @@ testshared: shared
507+ DYLD_LIBRARY_PATH=`pwd`:$(DYLD_LIBRARY_PATH) ; export DYLD_LIBRARY_PATH; \
508+ SHLIB_PATH=`pwd`:$(SHLIB_PATH) ; export SHLIB_PATH; \
509+ TMPSH=tmpsh_$$; \
510+- if echo hello world | ${QEMU_RUN} ./minigzipsh | ${QEMU_RUN} ./minigzipsh -d && ${QEMU_RUN} ./examplesh $$TMPSH; then \
511++ if echo hello world | ${QEMU_RUN} ./minigzipsh | ${QEMU_RUN} ./minigzipsh -d && ${QEMU_RUN} ./examplesh $$TMPSH && ${QEMU_RUN} ./crc32_testsh; then \
512+ echo ' *** zlib shared test OK ***'; \
513+ else \
514+ echo ' *** zlib shared test FAILED ***'; false; \
515+@@ -105,7 +105,7 @@ testshared: shared
516+
517+ test64: all64
518+ @TMP64=tmp64_$$; \
519+- if echo hello world | ${QEMU_RUN} ./minigzip64 | ${QEMU_RUN} ./minigzip64 -d && ${QEMU_RUN} ./example64 $$TMP64; then \
520++ if echo hello world | ${QEMU_RUN} ./minigzip64 | ${QEMU_RUN} ./minigzip64 -d && ${QEMU_RUN} ./example64 $$TMP64 && ${QEMU_RUN} ./crc32_test64; then \
521+ echo ' *** zlib 64-bit test OK ***'; \
522+ else \
523+ echo ' *** zlib 64-bit test FAILED ***'; false; \
524+@@ -139,12 +139,18 @@ match.lo: match.S
525+ mv _match.o match.lo
526+ rm -f _match.s
527+
528++crc32_test.o: $(SRCDIR)test/crc32_test.c $(SRCDIR)zlib.h zconf.h
529++ $(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/crc32_test.c
530++
531+ example.o: $(SRCDIR)test/example.c $(SRCDIR)zlib.h zconf.h
532+ $(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/example.c
533+
534+ minigzip.o: $(SRCDIR)test/minigzip.c $(SRCDIR)zlib.h zconf.h
535+ $(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/minigzip.c
536+
537++crc32_test64.o: $(SRCDIR)test/crc32_test.c $(SRCDIR)zlib.h zconf.h
538++ $(CC) $(CFLAGS) $(ZINCOUT) -D_FILE_OFFSET_BITS=64 -c -o $@ $(SRCDIR)test/crc32_test.c
539++
540+ example64.o: $(SRCDIR)test/example.c $(SRCDIR)zlib.h zconf.h
541+ $(CC) $(CFLAGS) $(ZINCOUT) -D_FILE_OFFSET_BITS=64 -c -o $@ $(SRCDIR)test/example.c
542+
543+@@ -158,6 +164,9 @@ adler32.o: $(SRCDIR)adler32.c
544+ crc32.o: $(SRCDIR)crc32.c
545+ $(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)crc32.c
546+
547++crc32_z_power8.o: $(SRCDIR)contrib/power/crc32_z_power8.c
548++ $(CC) $(CFLAGS) -mcpu=power8 $(ZINC) -c -o $@ $(SRCDIR)contrib/power/crc32_z_power8.c
549++
550+ deflate.o: $(SRCDIR)deflate.c
551+ $(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)deflate.c
552+
553+@@ -208,6 +217,11 @@ crc32.lo: $(SRCDIR)crc32.c
554+ $(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/crc32.o $(SRCDIR)crc32.c
555+ -@mv objs/crc32.o $@
556+
557++crc32_z_power8.lo: $(SRCDIR)contrib/power/crc32_z_power8.c
558++ -@mkdir objs 2>/dev/null || test -d objs
559++ $(CC) $(SFLAGS) -mcpu=power8 $(ZINC) -DPIC -c -o objs/crc32_z_power8.o $(SRCDIR)contrib/power/crc32_z_power8.c
560++ -@mv objs/crc32_z_power8.o $@
561++
562+ deflate.lo: $(SRCDIR)deflate.c
563+ -@mkdir objs 2>/dev/null || test -d objs
564+ $(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/deflate.o $(SRCDIR)deflate.c
565+@@ -281,18 +295,27 @@ placebo $(SHAREDLIBV): $(PIC_OBJS) libz.a
566+ ln -s $@ $(SHAREDLIBM)
567+ -@rmdir objs
568+
569++crc32_test$(EXE): crc32_test.o $(STATICLIB)
570++ $(CC) $(CFLAGS) -o $@ crc32_test.o $(TEST_LDFLAGS)
571++
572+ example$(EXE): example.o $(STATICLIB)
573+ $(CC) $(CFLAGS) -o $@ example.o $(TEST_LDFLAGS)
574+
575+ minigzip$(EXE): minigzip.o $(STATICLIB)
576+ $(CC) $(CFLAGS) -o $@ minigzip.o $(TEST_LDFLAGS)
577+
578++crc32_testsh$(EXE): crc32_test.o $(SHAREDLIBV)
579++ $(CC) $(CFLAGS) -o $@ crc32_test.o -L. $(SHAREDLIBV)
580++
581+ examplesh$(EXE): example.o $(SHAREDLIBV)
582+ $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS) -L. $(SHAREDLIBV)
583+
584+ minigzipsh$(EXE): minigzip.o $(SHAREDLIBV)
585+ $(CC) $(CFLAGS) -o $@ minigzip.o $(LDFLAGS) -L. $(SHAREDLIBV)
586+
587++crc32_test64$(EXE): crc32_test64.o $(STATICLIB)
588++ $(CC) $(CFLAGS) -o $@ crc32_test64.o $(TEST_LDFLAGS)
589++
590+ example64$(EXE): example64.o $(STATICLIB)
591+ $(CC) $(CFLAGS) -o $@ example64.o $(TEST_LDFLAGS)
592+
593+@@ -368,8 +391,8 @@ minizip-clean:
594+ mostlyclean: clean
595+ clean: minizip-clean
596+ rm -f *.o *.lo *~ \
597+- example$(EXE) minigzip$(EXE) examplesh$(EXE) minigzipsh$(EXE) \
598+- example64$(EXE) minigzip64$(EXE) \
599++ crc32_test$(EXE) example$(EXE) minigzip$(EXE) crc32_testsh$(EXE) examplesh$(EXE) minigzipsh$(EXE) \
600++ crc32_test64$(EXE) example64$(EXE) minigzip64$(EXE) \
601+ infcover \
602+ libz.* foo.gz so_locations \
603+ _match.s maketree contrib/infback9/*.o
604+@@ -391,7 +414,7 @@ tags:
605+
606+ adler32.o zutil.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h
607+ gzclose.o gzlib.o gzread.o gzwrite.o: $(SRCDIR)zlib.h zconf.h $(SRCDIR)gzguts.h
608+-compress.o example.o minigzip.o uncompr.o: $(SRCDIR)zlib.h zconf.h
609++compress.o crc32_test.o example.o minigzip.o uncompr.o: $(SRCDIR)zlib.h zconf.h
610+ crc32.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)crc32.h
611+ deflate.o: $(SRCDIR)deflate.h $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h
612+ infback.o inflate.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)inftrees.h $(SRCDIR)inflate.h $(SRCDIR)inffast.h $(SRCDIR)inffixed.h
613+@@ -401,7 +424,7 @@ trees.o: $(SRCDIR)deflate.h $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)tr
614+
615+ adler32.lo zutil.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h
616+ gzclose.lo gzlib.lo gzread.lo gzwrite.lo: $(SRCDIR)zlib.h zconf.h $(SRCDIR)gzguts.h
617+-compress.lo example.lo minigzip.lo uncompr.lo: $(SRCDIR)zlib.h zconf.h
618++compress.lo crc32_test.lo example.lo minigzip.lo uncompr.lo: $(SRCDIR)zlib.h zconf.h
619+ crc32.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)crc32.h
620+ deflate.lo: $(SRCDIR)deflate.h $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h
621+ infback.lo inflate.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)inftrees.h $(SRCDIR)inflate.h $(SRCDIR)inffast.h $(SRCDIR)inffixed.h
622+diff --git a/configure b/configure
623+index e307a8d..b96ed4a 100755
624+--- a/configure
625++++ b/configure
626+@@ -864,6 +864,9 @@ cat > $test.c <<EOF
627+ #ifndef _ARCH_PPC
628+ #error "Target is not Power"
629+ #endif
630++#if !(defined(__PPC64__) || defined(__powerpc64__))
631++ #error "Target is not 64 bits"
632++#endif
633+ #ifndef HAVE_IFUNC
634+ #error "Target doesn't support ifunc"
635+ #endif
636+@@ -877,8 +880,8 @@ if tryboth $CC -c $CFLAGS $test.c; then
637+
638+ if tryboth $CC -c $CFLAGS -mcpu=power8 $test.c; then
639+ POWER8="-DZ_POWER8"
640+- PIC_OBJC="${PIC_OBJC}"
641+- OBJC="${OBJC}"
642++ PIC_OBJC="${PIC_OBJC} crc32_z_power8.lo"
643++ OBJC="${OBJC} crc32_z_power8.o"
644+ echo "Checking for -mcpu=power8 support... Yes." | tee -a configure.log
645+ else
646+ echo "Checking for -mcpu=power8 support... No." | tee -a configure.log
647+diff --git a/contrib/README.contrib b/contrib/README.contrib
648+index c57b520..90170df 100644
649+--- a/contrib/README.contrib
650++++ b/contrib/README.contrib
651+@@ -46,7 +46,8 @@ minizip/ by Gilles Vollant <info@winimage.com>
652+ pascal/ by Bob Dellaca <bobdl@xtra.co.nz> et al.
653+ Support for Pascal
654+
655+-power/ by Matheus Castanho <msc@linux.ibm.com>
656++power/ by Daniel Black <daniel@linux.ibm.com>
657++ Matheus Castanho <msc@linux.ibm.com>
658+ and Rogerio Alves <rcardoso@linux.ibm.com>
659+ Optimized functions for Power processors
660+
661+diff --git a/contrib/power/clang_workaround.h b/contrib/power/clang_workaround.h
662+new file mode 100644
663+index 0000000..b5e7dae
664+--- /dev/null
665++++ b/contrib/power/clang_workaround.h
666+@@ -0,0 +1,82 @@
667++#ifndef CLANG_WORKAROUNDS_H
668++#define CLANG_WORKAROUNDS_H
669++
670++/*
671++ * These stubs fix clang incompatibilities with GCC builtins.
672++ */
673++
674++#ifndef __builtin_crypto_vpmsumw
675++#define __builtin_crypto_vpmsumw __builtin_crypto_vpmsumb
676++#endif
677++#ifndef __builtin_crypto_vpmsumd
678++#define __builtin_crypto_vpmsumd __builtin_crypto_vpmsumb
679++#endif
680++
681++static inline
682++__vector unsigned long long __attribute__((overloadable))
683++vec_ld(int __a, const __vector unsigned long long* __b)
684++{
685++ return (__vector unsigned long long)__builtin_altivec_lvx(__a, __b);
686++}
687++
688++/*
689++ * GCC __builtin_pack_vector_int128 returns a vector __int128_t but Clang
690++ * does not recognize this type. On GCC this builtin is translated to a
691++ * xxpermdi instruction that only moves the registers __a, __b instead generates
692++ * a load.
693++ *
694++ * Clang has vec_xxpermdi intrinsics. It was implemented in 4.0.0.
695++ */
696++static inline
697++__vector unsigned long long __builtin_pack_vector (unsigned long __a,
698++ unsigned long __b)
699++{
700++ #if defined(__BIG_ENDIAN__)
701++ __vector unsigned long long __v = {__a, __b};
702++ #else
703++ __vector unsigned long long __v = {__b, __a};
704++ #endif
705++ return __v;
706++}
707++
708++#ifndef vec_xxpermdi
709++
710++static inline
711++unsigned long __builtin_unpack_vector (__vector unsigned long long __v,
712++ int __o)
713++{
714++ return __v[__o];
715++}
716++
717++#if defined(__BIG_ENDIAN__)
718++#define __builtin_unpack_vector_0(a) __builtin_unpack_vector ((a), 0)
719++#define __builtin_unpack_vector_1(a) __builtin_unpack_vector ((a), 1)
720++#else
721++#define __builtin_unpack_vector_0(a) __builtin_unpack_vector ((a), 1)
722++#define __builtin_unpack_vector_1(a) __builtin_unpack_vector ((a), 0)
723++#endif
724++
725++#else
726++
727++static inline
728++unsigned long __builtin_unpack_vector_0 (__vector unsigned long long __v)
729++{
730++ #if defined(__BIG_ENDIAN__)
731++ return vec_xxpermdi(__v, __v, 0x0)[1];
732++ #else
733++ return vec_xxpermdi(__v, __v, 0x0)[0];
734++ #endif
735++}
736++
737++static inline
738++unsigned long __builtin_unpack_vector_1 (__vector unsigned long long __v)
739++{
740++ #if defined(__BIG_ENDIAN__)
741++ return vec_xxpermdi(__v, __v, 0x3)[1];
742++ #else
743++ return vec_xxpermdi(__v, __v, 0x3)[0];
744++ #endif
745++}
746++#endif /* vec_xxpermdi */
747++
748++#endif
749+diff --git a/contrib/power/crc32_constants.h b/contrib/power/crc32_constants.h
750+new file mode 100644
751+index 0000000..3d01150
752+--- /dev/null
753++++ b/contrib/power/crc32_constants.h
754+@@ -0,0 +1,1206 @@
755++/*
756++*
757++* THIS FILE IS GENERATED WITH
758++./crc32_constants -c -r -x 0x04C11DB7
759++
760++* This is from https://github.com/antonblanchard/crc32-vpmsum/
761++* DO NOT MODIFY IT MANUALLY!
762++*
763++*/
764++
765++#define CRC 0x4c11db7
766++#define CRC_XOR
767++#define REFLECT
768++#define MAX_SIZE 32768
769++
770++#ifndef __ASSEMBLER__
771++#ifdef CRC_TABLE
772++static const unsigned int crc_table[] = {
773++ 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
774++ 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
775++ 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
776++ 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
777++ 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de,
778++ 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
779++ 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec,
780++ 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
781++ 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
782++ 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
783++ 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940,
784++ 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
785++ 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116,
786++ 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
787++ 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
788++ 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
789++ 0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a,
790++ 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
791++ 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818,
792++ 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,
793++ 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
794++ 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
795++ 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c,
796++ 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
797++ 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2,
798++ 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
799++ 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
800++ 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
801++ 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086,
802++ 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
803++ 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4,
804++ 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
805++ 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
806++ 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,
807++ 0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8,
808++ 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
809++ 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe,
810++ 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
811++ 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
812++ 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
813++ 0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252,
814++ 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
815++ 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60,
816++ 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
817++ 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
818++ 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
819++ 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04,
820++ 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
821++ 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a,
822++ 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,
823++ 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
824++ 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21,
825++ 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e,
826++ 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
827++ 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c,
828++ 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
829++ 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
830++ 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
831++ 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0,
832++ 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
833++ 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6,
834++ 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
835++ 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
836++ 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d,};
837++
838++#endif /* CRC_TABLE */
839++#ifdef POWER8_INTRINSICS
840++
841++/* Constants */
842++
843++/* Reduce 262144 kbits to 1024 bits */
844++static const __vector unsigned long long vcrc_const[255]
845++ __attribute__((aligned (16))) = {
846++#ifdef __LITTLE_ENDIAN__
847++ /* x^261120 mod p(x)` << 1, x^261184 mod p(x)` << 1 */
848++ { 0x0000000099ea94a8, 0x00000001651797d2 },
849++ /* x^260096 mod p(x)` << 1, x^260160 mod p(x)` << 1 */
850++ { 0x00000000945a8420, 0x0000000021e0d56c },
851++ /* x^259072 mod p(x)` << 1, x^259136 mod p(x)` << 1 */
852++ { 0x0000000030762706, 0x000000000f95ecaa },
853++ /* x^258048 mod p(x)` << 1, x^258112 mod p(x)` << 1 */
854++ { 0x00000001a52fc582, 0x00000001ebd224ac },
855++ /* x^257024 mod p(x)` << 1, x^257088 mod p(x)` << 1 */
856++ { 0x00000001a4a7167a, 0x000000000ccb97ca },
857++ /* x^256000 mod p(x)` << 1, x^256064 mod p(x)` << 1 */
858++ { 0x000000000c18249a, 0x00000001006ec8a8 },
859++ /* x^254976 mod p(x)` << 1, x^255040 mod p(x)` << 1 */
860++ { 0x00000000a924ae7c, 0x000000014f58f196 },
861++ /* x^253952 mod p(x)` << 1, x^254016 mod p(x)` << 1 */
862++ { 0x00000001e12ccc12, 0x00000001a7192ca6 },
863++ /* x^252928 mod p(x)` << 1, x^252992 mod p(x)` << 1 */
864++ { 0x00000000a0b9d4ac, 0x000000019a64bab2 },
865++ /* x^251904 mod p(x)` << 1, x^251968 mod p(x)` << 1 */
866++ { 0x0000000095e8ddfe, 0x0000000014f4ed2e },
867++ /* x^250880 mod p(x)` << 1, x^250944 mod p(x)` << 1 */
868++ { 0x00000000233fddc4, 0x000000011092b6a2 },
869++ /* x^249856 mod p(x)` << 1, x^249920 mod p(x)` << 1 */
870++ { 0x00000001b4529b62, 0x00000000c8a1629c },
871++ /* x^248832 mod p(x)` << 1, x^248896 mod p(x)` << 1 */
872++ { 0x00000001a7fa0e64, 0x000000017bf32e8e },
873++ /* x^247808 mod p(x)` << 1, x^247872 mod p(x)` << 1 */
874++ { 0x00000001b5334592, 0x00000001f8cc6582 },
875++ /* x^246784 mod p(x)` << 1, x^246848 mod p(x)` << 1 */
876++ { 0x000000011f8ee1b4, 0x000000008631ddf0 },
877++ /* x^245760 mod p(x)` << 1, x^245824 mod p(x)` << 1 */
878++ { 0x000000006252e632, 0x000000007e5a76d0 },
879++ /* x^244736 mod p(x)` << 1, x^244800 mod p(x)` << 1 */
880++ { 0x00000000ab973e84, 0x000000002b09b31c },
881++ /* x^243712 mod p(x)` << 1, x^243776 mod p(x)` << 1 */
882++ { 0x000000007734f5ec, 0x00000001b2df1f84 },
883++ /* x^242688 mod p(x)` << 1, x^242752 mod p(x)` << 1 */
884++ { 0x000000007c547798, 0x00000001d6f56afc },
885++ /* x^241664 mod p(x)` << 1, x^241728 mod p(x)` << 1 */
886++ { 0x000000007ec40210, 0x00000001b9b5e70c },
887++ /* x^240640 mod p(x)` << 1, x^240704 mod p(x)` << 1 */
888++ { 0x00000001ab1695a8, 0x0000000034b626d2 },
889++ /* x^239616 mod p(x)` << 1, x^239680 mod p(x)` << 1 */
890++ { 0x0000000090494bba, 0x000000014c53479a },
891++ /* x^238592 mod p(x)` << 1, x^238656 mod p(x)` << 1 */
892++ { 0x00000001123fb816, 0x00000001a6d179a4 },
893++ /* x^237568 mod p(x)` << 1, x^237632 mod p(x)` << 1 */
894++ { 0x00000001e188c74c, 0x000000015abd16b4 },
895++ /* x^236544 mod p(x)` << 1, x^236608 mod p(x)` << 1 */
896++ { 0x00000001c2d3451c, 0x00000000018f9852 },
897++ /* x^235520 mod p(x)` << 1, x^235584 mod p(x)` << 1 */
898++ { 0x00000000f55cf1ca, 0x000000001fb3084a },
899++ /* x^234496 mod p(x)` << 1, x^234560 mod p(x)` << 1 */
900++ { 0x00000001a0531540, 0x00000000c53dfb04 },
901++ /* x^233472 mod p(x)` << 1, x^233536 mod p(x)` << 1 */
902++ { 0x0000000132cd7ebc, 0x00000000e10c9ad6 },
903++ /* x^232448 mod p(x)` << 1, x^232512 mod p(x)` << 1 */
904++ { 0x0000000073ab7f36, 0x0000000025aa994a },
905++ /* x^231424 mod p(x)` << 1, x^231488 mod p(x)` << 1 */
906++ { 0x0000000041aed1c2, 0x00000000fa3a74c4 },
907++ /* x^230400 mod p(x)` << 1, x^230464 mod p(x)` << 1 */
908++ { 0x0000000136c53800, 0x0000000033eb3f40 },
909++ /* x^229376 mod p(x)` << 1, x^229440 mod p(x)` << 1 */
910++ { 0x0000000126835a30, 0x000000017193f296 },
911++ /* x^228352 mod p(x)` << 1, x^228416 mod p(x)` << 1 */
912++ { 0x000000006241b502, 0x0000000043f6c86a },
913++ /* x^227328 mod p(x)` << 1, x^227392 mod p(x)` << 1 */
914++ { 0x00000000d5196ad4, 0x000000016b513ec6 },
915++ /* x^226304 mod p(x)` << 1, x^226368 mod p(x)` << 1 */
916++ { 0x000000009cfa769a, 0x00000000c8f25b4e },
917++ /* x^225280 mod p(x)` << 1, x^225344 mod p(x)` << 1 */
918++ { 0x00000000920e5df4, 0x00000001a45048ec },
919++ /* x^224256 mod p(x)` << 1, x^224320 mod p(x)` << 1 */
920++ { 0x0000000169dc310e, 0x000000000c441004 },
921++ /* x^223232 mod p(x)` << 1, x^223296 mod p(x)` << 1 */
922++ { 0x0000000009fc331c, 0x000000000e17cad6 },
923++ /* x^222208 mod p(x)` << 1, x^222272 mod p(x)` << 1 */
924++ { 0x000000010d94a81e, 0x00000001253ae964 },
925++ /* x^221184 mod p(x)` << 1, x^221248 mod p(x)` << 1 */
926++ { 0x0000000027a20ab2, 0x00000001d7c88ebc },
927++ /* x^220160 mod p(x)` << 1, x^220224 mod p(x)` << 1 */
928++ { 0x0000000114f87504, 0x00000001e7ca913a },
929++ /* x^219136 mod p(x)` << 1, x^219200 mod p(x)` << 1 */
930++ { 0x000000004b076d96, 0x0000000033ed078a },
931++ /* x^218112 mod p(x)` << 1, x^218176 mod p(x)` << 1 */
932++ { 0x00000000da4d1e74, 0x00000000e1839c78 },
933++ /* x^217088 mod p(x)` << 1, x^217152 mod p(x)` << 1 */
934++ { 0x000000001b81f672, 0x00000001322b267e },
935++ /* x^216064 mod p(x)` << 1, x^216128 mod p(x)` << 1 */
936++ { 0x000000009367c988, 0x00000000638231b6 },
937++ /* x^215040 mod p(x)` << 1, x^215104 mod p(x)` << 1 */
938++ { 0x00000001717214ca, 0x00000001ee7f16f4 },
939++ /* x^214016 mod p(x)` << 1, x^214080 mod p(x)` << 1 */
940++ { 0x000000009f47d820, 0x0000000117d9924a },
941++ /* x^212992 mod p(x)` << 1, x^213056 mod p(x)` << 1 */
942++ { 0x000000010d9a47d2, 0x00000000e1a9e0c4 },
943++ /* x^211968 mod p(x)` << 1, x^212032 mod p(x)` << 1 */
944++ { 0x00000000a696c58c, 0x00000001403731dc },
945++ /* x^210944 mod p(x)` << 1, x^211008 mod p(x)` << 1 */
946++ { 0x000000002aa28ec6, 0x00000001a5ea9682 },
947++ /* x^209920 mod p(x)` << 1, x^209984 mod p(x)` << 1 */
948++ { 0x00000001fe18fd9a, 0x0000000101c5c578 },
949++ /* x^208896 mod p(x)` << 1, x^208960 mod p(x)` << 1 */
950++ { 0x000000019d4fc1ae, 0x00000000dddf6494 },
951++ /* x^207872 mod p(x)` << 1, x^207936 mod p(x)` << 1 */
952++ { 0x00000001ba0e3dea, 0x00000000f1c3db28 },
953++ /* x^206848 mod p(x)` << 1, x^206912 mod p(x)` << 1 */
954++ { 0x0000000074b59a5e, 0x000000013112fb9c },
955++ /* x^205824 mod p(x)` << 1, x^205888 mod p(x)` << 1 */
956++ { 0x00000000f2b5ea98, 0x00000000b680b906 },
957++ /* x^204800 mod p(x)` << 1, x^204864 mod p(x)` << 1 */
958++ { 0x0000000187132676, 0x000000001a282932 },
959++ /* x^203776 mod p(x)` << 1, x^203840 mod p(x)` << 1 */
960++ { 0x000000010a8c6ad4, 0x0000000089406e7e },
961++ /* x^202752 mod p(x)` << 1, x^202816 mod p(x)` << 1 */
962++ { 0x00000001e21dfe70, 0x00000001def6be8c },
963++ /* x^201728 mod p(x)` << 1, x^201792 mod p(x)` << 1 */
964++ { 0x00000001da0050e4, 0x0000000075258728 },
965++ /* x^200704 mod p(x)` << 1, x^200768 mod p(x)` << 1 */
966++ { 0x00000000772172ae, 0x000000019536090a },
967++ /* x^199680 mod p(x)` << 1, x^199744 mod p(x)` << 1 */
968++ { 0x00000000e47724aa, 0x00000000f2455bfc },
969++ /* x^198656 mod p(x)` << 1, x^198720 mod p(x)` << 1 */
970++ { 0x000000003cd63ac4, 0x000000018c40baf4 },
971++ /* x^197632 mod p(x)` << 1, x^197696 mod p(x)` << 1 */
972++ { 0x00000001bf47d352, 0x000000004cd390d4 },
973++ /* x^196608 mod p(x)` << 1, x^196672 mod p(x)` << 1 */
974++ { 0x000000018dc1d708, 0x00000001e4ece95a },
975++ /* x^195584 mod p(x)` << 1, x^195648 mod p(x)` << 1 */
976++ { 0x000000002d4620a4, 0x000000001a3ee918 },
977++ /* x^194560 mod p(x)` << 1, x^194624 mod p(x)` << 1 */
978++ { 0x0000000058fd1740, 0x000000007c652fb8 },
979++ /* x^193536 mod p(x)` << 1, x^193600 mod p(x)` << 1 */
980++ { 0x00000000dadd9bfc, 0x000000011c67842c },
981++ /* x^192512 mod p(x)` << 1, x^192576 mod p(x)` << 1 */
982++ { 0x00000001ea2140be, 0x00000000254f759c },
983++ /* x^191488 mod p(x)` << 1, x^191552 mod p(x)` << 1 */
984++ { 0x000000009de128ba, 0x000000007ece94ca },
985++ /* x^190464 mod p(x)` << 1, x^190528 mod p(x)` << 1 */
986++ { 0x000000013ac3aa8e, 0x0000000038f258c2 },
987++ /* x^189440 mod p(x)` << 1, x^189504 mod p(x)` << 1 */
988++ { 0x0000000099980562, 0x00000001cdf17b00 },
989++ /* x^188416 mod p(x)` << 1, x^188480 mod p(x)` << 1 */
990++ { 0x00000001c1579c86, 0x000000011f882c16 },
991++ /* x^187392 mod p(x)` << 1, x^187456 mod p(x)` << 1 */
992++ { 0x0000000068dbbf94, 0x0000000100093fc8 },
993++ /* x^186368 mod p(x)` << 1, x^186432 mod p(x)` << 1 */
994++ { 0x000000004509fb04, 0x00000001cd684f16 },
995++ /* x^185344 mod p(x)` << 1, x^185408 mod p(x)` << 1 */
996++ { 0x00000001202f6398, 0x000000004bc6a70a },
997++ /* x^184320 mod p(x)` << 1, x^184384 mod p(x)` << 1 */
998++ { 0x000000013aea243e, 0x000000004fc7e8e4 },
999++ /* x^183296 mod p(x)` << 1, x^183360 mod p(x)` << 1 */
1000++ { 0x00000001b4052ae6, 0x0000000130103f1c },
1001++ /* x^182272 mod p(x)` << 1, x^182336 mod p(x)` << 1 */
1002++ { 0x00000001cd2a0ae8, 0x0000000111b0024c },
1003++ /* x^181248 mod p(x)` << 1, x^181312 mod p(x)` << 1 */
1004++ { 0x00000001fe4aa8b4, 0x000000010b3079da },
1005++ /* x^180224 mod p(x)` << 1, x^180288 mod p(x)` << 1 */
1006++ { 0x00000001d1559a42, 0x000000010192bcc2 },
1007++ /* x^179200 mod p(x)` << 1, x^179264 mod p(x)` << 1 */
1008++ { 0x00000001f3e05ecc, 0x0000000074838d50 },
1009++ /* x^178176 mod p(x)` << 1, x^178240 mod p(x)` << 1 */
1010++ { 0x0000000104ddd2cc, 0x000000001b20f520 },
1011++ /* x^177152 mod p(x)` << 1, x^177216 mod p(x)` << 1 */
1012++ { 0x000000015393153c, 0x0000000050c3590a },
1013++ /* x^176128 mod p(x)` << 1, x^176192 mod p(x)` << 1 */
1014++ { 0x0000000057e942c6, 0x00000000b41cac8e },
1015++ /* x^175104 mod p(x)` << 1, x^175168 mod p(x)` << 1 */
1016++ { 0x000000012c633850, 0x000000000c72cc78 },
1017++ /* x^174080 mod p(x)` << 1, x^174144 mod p(x)` << 1 */
1018++ { 0x00000000ebcaae4c, 0x0000000030cdb032 },
1019++ /* x^173056 mod p(x)` << 1, x^173120 mod p(x)` << 1 */
1020++ { 0x000000013ee532a6, 0x000000013e09fc32 },
1021++ /* x^172032 mod p(x)` << 1, x^172096 mod p(x)` << 1 */
1022++ { 0x00000001bf0cbc7e, 0x000000001ed624d2 },
1023++ /* x^171008 mod p(x)` << 1, x^171072 mod p(x)` << 1 */
1024++ { 0x00000000d50b7a5a, 0x00000000781aee1a },
1025++ /* x^169984 mod p(x)` << 1, x^170048 mod p(x)` << 1 */
1026++ { 0x0000000002fca6e8, 0x00000001c4d8348c },
1027++ /* x^168960 mod p(x)` << 1, x^169024 mod p(x)` << 1 */
1028++ { 0x000000007af40044, 0x0000000057a40336 },
1029++ /* x^167936 mod p(x)` << 1, x^168000 mod p(x)` << 1 */
1030++ { 0x0000000016178744, 0x0000000085544940 },
1031++ /* x^166912 mod p(x)` << 1, x^166976 mod p(x)` << 1 */
1032++ { 0x000000014c177458, 0x000000019cd21e80 },
1033++ /* x^165888 mod p(x)` << 1, x^165952 mod p(x)` << 1 */
1034++ { 0x000000011b6ddf04, 0x000000013eb95bc0 },
1035++ /* x^164864 mod p(x)` << 1, x^164928 mod p(x)` << 1 */
1036++ { 0x00000001f3e29ccc, 0x00000001dfc9fdfc },
1037++ /* x^163840 mod p(x)` << 1, x^163904 mod p(x)` << 1 */
1038++ { 0x0000000135ae7562, 0x00000000cd028bc2 },
1039++ /* x^162816 mod p(x)` << 1, x^162880 mod p(x)` << 1 */
1040++ { 0x0000000190ef812c, 0x0000000090db8c44 },
1041++ /* x^161792 mod p(x)` << 1, x^161856 mod p(x)` << 1 */
1042++ { 0x0000000067a2c786, 0x000000010010a4ce },
1043++ /* x^160768 mod p(x)` << 1, x^160832 mod p(x)` << 1 */
1044++ { 0x0000000048b9496c, 0x00000001c8f4c72c },
1045++ /* x^159744 mod p(x)` << 1, x^159808 mod p(x)` << 1 */
1046++ { 0x000000015a422de6, 0x000000001c26170c },
1047++ /* x^158720 mod p(x)` << 1, x^158784 mod p(x)` << 1 */
1048++ { 0x00000001ef0e3640, 0x00000000e3fccf68 },
1049++ /* x^157696 mod p(x)` << 1, x^157760 mod p(x)` << 1 */
1050++ { 0x00000001006d2d26, 0x00000000d513ed24 },
1051++ /* x^156672 mod p(x)` << 1, x^156736 mod p(x)` << 1 */
1052++ { 0x00000001170d56d6, 0x00000000141beada },
1053++ /* x^155648 mod p(x)` << 1, x^155712 mod p(x)` << 1 */
1054++ { 0x00000000a5fb613c, 0x000000011071aea0 },
1055++ /* x^154624 mod p(x)` << 1, x^154688 mod p(x)` << 1 */
1056++ { 0x0000000040bbf7fc, 0x000000012e19080a },
1057++ /* x^153600 mod p(x)` << 1, x^153664 mod p(x)` << 1 */
1058++ { 0x000000016ac3a5b2, 0x0000000100ecf826 },
1059++ /* x^152576 mod p(x)` << 1, x^152640 mod p(x)` << 1 */
1060++ { 0x00000000abf16230, 0x0000000069b09412 },
1061++ /* x^151552 mod p(x)` << 1, x^151616 mod p(x)` << 1 */
1062++ { 0x00000001ebe23fac, 0x0000000122297bac },
1063++ /* x^150528 mod p(x)` << 1, x^150592 mod p(x)` << 1 */
1064++ { 0x000000008b6a0894, 0x00000000e9e4b068 },
1065++ /* x^149504 mod p(x)` << 1, x^149568 mod p(x)` << 1 */
1066++ { 0x00000001288ea478, 0x000000004b38651a },
1067++ /* x^148480 mod p(x)` << 1, x^148544 mod p(x)` << 1 */
1068++ { 0x000000016619c442, 0x00000001468360e2 },
1069++ /* x^147456 mod p(x)` << 1, x^147520 mod p(x)` << 1 */
1070++ { 0x0000000086230038, 0x00000000121c2408 },
1071++ /* x^146432 mod p(x)` << 1, x^146496 mod p(x)` << 1 */
1072++ { 0x000000017746a756, 0x00000000da7e7d08 },
1073++ /* x^145408 mod p(x)` << 1, x^145472 mod p(x)` << 1 */
1074++ { 0x0000000191b8f8f8, 0x00000001058d7652 },
1075++ /* x^144384 mod p(x)` << 1, x^144448 mod p(x)` << 1 */
1076++ { 0x000000008e167708, 0x000000014a098a90 },
1077++ /* x^143360 mod p(x)` << 1, x^143424 mod p(x)` << 1 */
1078++ { 0x0000000148b22d54, 0x0000000020dbe72e },
1079++ /* x^142336 mod p(x)` << 1, x^142400 mod p(x)` << 1 */
1080++ { 0x0000000044ba2c3c, 0x000000011e7323e8 },
1081++ /* x^141312 mod p(x)` << 1, x^141376 mod p(x)` << 1 */
1082++ { 0x00000000b54d2b52, 0x00000000d5d4bf94 },
1083++ /* x^140288 mod p(x)` << 1, x^140352 mod p(x)` << 1 */
1084++ { 0x0000000005a4fd8a, 0x0000000199d8746c },
1085++ /* x^139264 mod p(x)` << 1, x^139328 mod p(x)` << 1 */
1086++ { 0x0000000139f9fc46, 0x00000000ce9ca8a0 },
1087++ /* x^138240 mod p(x)` << 1, x^138304 mod p(x)` << 1 */
1088++ { 0x000000015a1fa824, 0x00000000136edece },
1089++ /* x^137216 mod p(x)` << 1, x^137280 mod p(x)` << 1 */
1090++ { 0x000000000a61ae4c, 0x000000019b92a068 },
1091++ /* x^136192 mod p(x)` << 1, x^136256 mod p(x)` << 1 */
1092++ { 0x0000000145e9113e, 0x0000000071d62206 },
1093++ /* x^135168 mod p(x)` << 1, x^135232 mod p(x)` << 1 */
1094++ { 0x000000006a348448, 0x00000000dfc50158 },
1095++ /* x^134144 mod p(x)` << 1, x^134208 mod p(x)` << 1 */
1096++ { 0x000000004d80a08c, 0x00000001517626bc },
1097++ /* x^133120 mod p(x)` << 1, x^133184 mod p(x)` << 1 */
1098++ { 0x000000014b6837a0, 0x0000000148d1e4fa },
1099++ /* x^132096 mod p(x)` << 1, x^132160 mod p(x)` << 1 */
1100++ { 0x000000016896a7fc, 0x0000000094d8266e },
1101++ /* x^131072 mod p(x)` << 1, x^131136 mod p(x)` << 1 */
1102++ { 0x000000014f187140, 0x00000000606c5e34 },
1103++ /* x^130048 mod p(x)` << 1, x^130112 mod p(x)` << 1 */
1104++ { 0x000000019581b9da, 0x000000019766beaa },
1105++ /* x^129024 mod p(x)` << 1, x^129088 mod p(x)` << 1 */
1106++ { 0x00000001091bc984, 0x00000001d80c506c },
1107++ /* x^128000 mod p(x)` << 1, x^128064 mod p(x)` << 1 */
1108++ { 0x000000001067223c, 0x000000001e73837c },
1109++ /* x^126976 mod p(x)` << 1, x^127040 mod p(x)` << 1 */
1110++ { 0x00000001ab16ea02, 0x0000000064d587de },
1111++ /* x^125952 mod p(x)` << 1, x^126016 mod p(x)` << 1 */
1112++ { 0x000000013c4598a8, 0x00000000f4a507b0 },
1113++ /* x^124928 mod p(x)` << 1, x^124992 mod p(x)` << 1 */
1114++ { 0x00000000b3735430, 0x0000000040e342fc },
1115++ /* x^123904 mod p(x)` << 1, x^123968 mod p(x)` << 1 */
1116++ { 0x00000001bb3fc0c0, 0x00000001d5ad9c3a },
1117++ /* x^122880 mod p(x)` << 1, x^122944 mod p(x)` << 1 */
1118++ { 0x00000001570ae19c, 0x0000000094a691a4 },
1119++ /* x^121856 mod p(x)` << 1, x^121920 mod p(x)` << 1 */
1120++ { 0x00000001ea910712, 0x00000001271ecdfa },
1121++ /* x^120832 mod p(x)` << 1, x^120896 mod p(x)` << 1 */
1122++ { 0x0000000167127128, 0x000000009e54475a },
1123++ /* x^119808 mod p(x)` << 1, x^119872 mod p(x)` << 1 */
1124++ { 0x0000000019e790a2, 0x00000000c9c099ee },
1125++ /* x^118784 mod p(x)` << 1, x^118848 mod p(x)` << 1 */
1126++ { 0x000000003788f710, 0x000000009a2f736c },
1127++ /* x^117760 mod p(x)` << 1, x^117824 mod p(x)` << 1 */
1128++ { 0x00000001682a160e, 0x00000000bb9f4996 },
1129++ /* x^116736 mod p(x)` << 1, x^116800 mod p(x)` << 1 */
1130++ { 0x000000007f0ebd2e, 0x00000001db688050 },
1131++ /* x^115712 mod p(x)` << 1, x^115776 mod p(x)` << 1 */
1132++ { 0x000000002b032080, 0x00000000e9b10af4 },
1133++ /* x^114688 mod p(x)` << 1, x^114752 mod p(x)` << 1 */
1134++ { 0x00000000cfd1664a, 0x000000012d4545e4 },
1135++ /* x^113664 mod p(x)` << 1, x^113728 mod p(x)` << 1 */
1136++ { 0x00000000aa1181c2, 0x000000000361139c },
1137++ /* x^112640 mod p(x)` << 1, x^112704 mod p(x)` << 1 */
1138++ { 0x00000000ddd08002, 0x00000001a5a1a3a8 },
1139++ /* x^111616 mod p(x)` << 1, x^111680 mod p(x)` << 1 */
1140++ { 0x00000000e8dd0446, 0x000000006844e0b0 },
1141++ /* x^110592 mod p(x)` << 1, x^110656 mod p(x)` << 1 */
1142++ { 0x00000001bbd94a00, 0x00000000c3762f28 },
1143++ /* x^109568 mod p(x)` << 1, x^109632 mod p(x)` << 1 */
1144++ { 0x00000000ab6cd180, 0x00000001d26287a2 },
1145++ /* x^108544 mod p(x)` << 1, x^108608 mod p(x)` << 1 */
1146++ { 0x0000000031803ce2, 0x00000001f6f0bba8 },
1147++ /* x^107520 mod p(x)` << 1, x^107584 mod p(x)` << 1 */
1148++ { 0x0000000024f40b0c, 0x000000002ffabd62 },
1149++ /* x^106496 mod p(x)` << 1, x^106560 mod p(x)` << 1 */
1150++ { 0x00000001ba1d9834, 0x00000000fb4516b8 },
1151++ /* x^105472 mod p(x)` << 1, x^105536 mod p(x)` << 1 */
1152++ { 0x0000000104de61aa, 0x000000018cfa961c },
1153++ /* x^104448 mod p(x)` << 1, x^104512 mod p(x)` << 1 */
1154++ { 0x0000000113e40d46, 0x000000019e588d52 },
1155++ /* x^103424 mod p(x)` << 1, x^103488 mod p(x)` << 1 */
1156++ { 0x00000001415598a0, 0x00000001180f0bbc },
1157++ /* x^102400 mod p(x)` << 1, x^102464 mod p(x)` << 1 */
1158++ { 0x00000000bf6c8c90, 0x00000000e1d9177a },
1159++ /* x^101376 mod p(x)` << 1, x^101440 mod p(x)` << 1 */
1160++ { 0x00000001788b0504, 0x0000000105abc27c },
1161++ /* x^100352 mod p(x)` << 1, x^100416 mod p(x)` << 1 */
1162++ { 0x0000000038385d02, 0x00000000972e4a58 },
1163++ /* x^99328 mod p(x)` << 1, x^99392 mod p(x)` << 1 */
1164++ { 0x00000001b6c83844, 0x0000000183499a5e },
1165++ /* x^98304 mod p(x)` << 1, x^98368 mod p(x)` << 1 */
1166++ { 0x0000000051061a8a, 0x00000001c96a8cca },
1167++ /* x^97280 mod p(x)` << 1, x^97344 mod p(x)` << 1 */
1168++ { 0x000000017351388a, 0x00000001a1a5b60c },
1169++ /* x^96256 mod p(x)` << 1, x^96320 mod p(x)` << 1 */
1170++ { 0x0000000132928f92, 0x00000000e4b6ac9c },
1171++ /* x^95232 mod p(x)` << 1, x^95296 mod p(x)` << 1 */
1172++ { 0x00000000e6b4f48a, 0x00000001807e7f5a },
1173++ /* x^94208 mod p(x)` << 1, x^94272 mod p(x)` << 1 */
1174++ { 0x0000000039d15e90, 0x000000017a7e3bc8 },
1175++ /* x^93184 mod p(x)` << 1, x^93248 mod p(x)` << 1 */
1176++ { 0x00000000312d6074, 0x00000000d73975da },
1177++ /* x^92160 mod p(x)` << 1, x^92224 mod p(x)` << 1 */
1178++ { 0x000000017bbb2cc4, 0x000000017375d038 },
1179++ /* x^91136 mod p(x)` << 1, x^91200 mod p(x)` << 1 */
1180++ { 0x000000016ded3e18, 0x00000000193680bc },
1181++ /* x^90112 mod p(x)` << 1, x^90176 mod p(x)` << 1 */
1182++ { 0x00000000f1638b16, 0x00000000999b06f6 },
1183++ /* x^89088 mod p(x)` << 1, x^89152 mod p(x)` << 1 */
1184++ { 0x00000001d38b9ecc, 0x00000001f685d2b8 },
1185++ /* x^88064 mod p(x)` << 1, x^88128 mod p(x)` << 1 */
1186++ { 0x000000018b8d09dc, 0x00000001f4ecbed2 },
1187++ /* x^87040 mod p(x)` << 1, x^87104 mod p(x)` << 1 */
1188++ { 0x00000000e7bc27d2, 0x00000000ba16f1a0 },
1189++ /* x^86016 mod p(x)` << 1, x^86080 mod p(x)` << 1 */
1190++ { 0x00000000275e1e96, 0x0000000115aceac4 },
1191++ /* x^84992 mod p(x)` << 1, x^85056 mod p(x)` << 1 */
1192++ { 0x00000000e2e3031e, 0x00000001aeff6292 },
1193++ /* x^83968 mod p(x)` << 1, x^84032 mod p(x)` << 1 */
1194++ { 0x00000001041c84d8, 0x000000009640124c },
1195++ /* x^82944 mod p(x)` << 1, x^83008 mod p(x)` << 1 */
1196++ { 0x00000000706ce672, 0x0000000114f41f02 },
1197++ /* x^81920 mod p(x)` << 1, x^81984 mod p(x)` << 1 */
1198++ { 0x000000015d5070da, 0x000000009c5f3586 },
1199++ /* x^80896 mod p(x)` << 1, x^80960 mod p(x)` << 1 */
1200++ { 0x0000000038f9493a, 0x00000001878275fa },
1201++ /* x^79872 mod p(x)` << 1, x^79936 mod p(x)` << 1 */
1202++ { 0x00000000a3348a76, 0x00000000ddc42ce8 },
1203++ /* x^78848 mod p(x)` << 1, x^78912 mod p(x)` << 1 */
1204++ { 0x00000001ad0aab92, 0x0000000181d2c73a },
1205++ /* x^77824 mod p(x)` << 1, x^77888 mod p(x)` << 1 */
1206++ { 0x000000019e85f712, 0x0000000141c9320a },
1207++ /* x^76800 mod p(x)` << 1, x^76864 mod p(x)` << 1 */
1208++ { 0x000000005a871e76, 0x000000015235719a },
1209++ /* x^75776 mod p(x)` << 1, x^75840 mod p(x)` << 1 */
1210++ { 0x000000017249c662, 0x00000000be27d804 },
1211++ /* x^74752 mod p(x)` << 1, x^74816 mod p(x)` << 1 */
1212++ { 0x000000003a084712, 0x000000006242d45a },
1213++ /* x^73728 mod p(x)` << 1, x^73792 mod p(x)` << 1 */
1214++ { 0x00000000ed438478, 0x000000009a53638e },
1215++ /* x^72704 mod p(x)` << 1, x^72768 mod p(x)` << 1 */
1216++ { 0x00000000abac34cc, 0x00000001001ecfb6 },
1217++ /* x^71680 mod p(x)` << 1, x^71744 mod p(x)` << 1 */
1218++ { 0x000000005f35ef3e, 0x000000016d7c2d64 },
1219++ /* x^70656 mod p(x)` << 1, x^70720 mod p(x)` << 1 */
1220++ { 0x0000000047d6608c, 0x00000001d0ce46c0 },
1221++ /* x^69632 mod p(x)` << 1, x^69696 mod p(x)` << 1 */
1222++ { 0x000000002d01470e, 0x0000000124c907b4 },
1223++ /* x^68608 mod p(x)` << 1, x^68672 mod p(x)` << 1 */
1224++ { 0x0000000158bbc7b0, 0x0000000018a555ca },
1225++ /* x^67584 mod p(x)` << 1, x^67648 mod p(x)` << 1 */
1226++ { 0x00000000c0a23e8e, 0x000000006b0980bc },
1227++ /* x^66560 mod p(x)` << 1, x^66624 mod p(x)` << 1 */
1228++ { 0x00000001ebd85c88, 0x000000008bbba964 },
1229++ /* x^65536 mod p(x)` << 1, x^65600 mod p(x)` << 1 */
1230++ { 0x000000019ee20bb2, 0x00000001070a5a1e },
1231++ /* x^64512 mod p(x)` << 1, x^64576 mod p(x)` << 1 */
1232++ { 0x00000001acabf2d6, 0x000000002204322a },
1233++ /* x^63488 mod p(x)` << 1, x^63552 mod p(x)` << 1 */
1234++ { 0x00000001b7963d56, 0x00000000a27524d0 },
1235++ /* x^62464 mod p(x)` << 1, x^62528 mod p(x)` << 1 */
1236++ { 0x000000017bffa1fe, 0x0000000020b1e4ba },
1237++ /* x^61440 mod p(x)` << 1, x^61504 mod p(x)` << 1 */
1238++ { 0x000000001f15333e, 0x0000000032cc27fc },
1239++ /* x^60416 mod p(x)` << 1, x^60480 mod p(x)` << 1 */
1240++ { 0x000000018593129e, 0x0000000044dd22b8 },
1241++ /* x^59392 mod p(x)` << 1, x^59456 mod p(x)` << 1 */
1242++ { 0x000000019cb32602, 0x00000000dffc9e0a },
1243++ /* x^58368 mod p(x)` << 1, x^58432 mod p(x)` << 1 */
1244++ { 0x0000000142b05cc8, 0x00000001b7a0ed14 },
1245++ /* x^57344 mod p(x)` << 1, x^57408 mod p(x)` << 1 */
1246++ { 0x00000001be49e7a4, 0x00000000c7842488 },
1247++ /* x^56320 mod p(x)` << 1, x^56384 mod p(x)` << 1 */
1248++ { 0x0000000108f69d6c, 0x00000001c02a4fee },
1249++ /* x^55296 mod p(x)` << 1, x^55360 mod p(x)` << 1 */
1250++ { 0x000000006c0971f0, 0x000000003c273778 },
1251++ /* x^54272 mod p(x)` << 1, x^54336 mod p(x)` << 1 */
1252++ { 0x000000005b16467a, 0x00000001d63f8894 },
1253++ /* x^53248 mod p(x)` << 1, x^53312 mod p(x)` << 1 */
1254++ { 0x00000001551a628e, 0x000000006be557d6 },
1255++ /* x^52224 mod p(x)` << 1, x^52288 mod p(x)` << 1 */
1256++ { 0x000000019e42ea92, 0x000000006a7806ea },
1257++ /* x^51200 mod p(x)` << 1, x^51264 mod p(x)` << 1 */
1258++ { 0x000000012fa83ff2, 0x000000016155aa0c },
1259++ /* x^50176 mod p(x)` << 1, x^50240 mod p(x)` << 1 */
1260++ { 0x000000011ca9cde0, 0x00000000908650ac },
1261++ /* x^49152 mod p(x)` << 1, x^49216 mod p(x)` << 1 */
1262++ { 0x00000000c8e5cd74, 0x00000000aa5a8084 },
1263++ /* x^48128 mod p(x)` << 1, x^48192 mod p(x)` << 1 */
1264++ { 0x0000000096c27f0c, 0x0000000191bb500a },
1265++ /* x^47104 mod p(x)` << 1, x^47168 mod p(x)` << 1 */
1266++ { 0x000000002baed926, 0x0000000064e9bed0 },
1267++ /* x^46080 mod p(x)` << 1, x^46144 mod p(x)` << 1 */
1268++ { 0x000000017c8de8d2, 0x000000009444f302 },
1269++ /* x^45056 mod p(x)` << 1, x^45120 mod p(x)` << 1 */
1270++ { 0x00000000d43d6068, 0x000000019db07d3c },
1271++ /* x^44032 mod p(x)` << 1, x^44096 mod p(x)` << 1 */
1272++ { 0x00000000cb2c4b26, 0x00000001359e3e6e },
1273++ /* x^43008 mod p(x)` << 1, x^43072 mod p(x)` << 1 */
1274++ { 0x0000000145b8da26, 0x00000001e4f10dd2 },
1275++ /* x^41984 mod p(x)` << 1, x^42048 mod p(x)` << 1 */
1276++ { 0x000000018fff4b08, 0x0000000124f5735e },
1277++ /* x^40960 mod p(x)` << 1, x^41024 mod p(x)` << 1 */
1278++ { 0x0000000150b58ed0, 0x0000000124760a4c },
1279++ /* x^39936 mod p(x)` << 1, x^40000 mod p(x)` << 1 */
1280++ { 0x00000001549f39bc, 0x000000000f1fc186 },
1281++ /* x^38912 mod p(x)` << 1, x^38976 mod p(x)` << 1 */
1282++ { 0x00000000ef4d2f42, 0x00000000150e4cc4 },
1283++ /* x^37888 mod p(x)` << 1, x^37952 mod p(x)` << 1 */
1284++ { 0x00000001b1468572, 0x000000002a6204e8 },
1285++ /* x^36864 mod p(x)` << 1, x^36928 mod p(x)` << 1 */
1286++ { 0x000000013d7403b2, 0x00000000beb1d432 },
1287++ /* x^35840 mod p(x)` << 1, x^35904 mod p(x)` << 1 */
1288++ { 0x00000001a4681842, 0x0000000135f3f1f0 },
1289++ /* x^34816 mod p(x)` << 1, x^34880 mod p(x)` << 1 */
1290++ { 0x0000000167714492, 0x0000000074fe2232 },
1291++ /* x^33792 mod p(x)` << 1, x^33856 mod p(x)` << 1 */
1292++ { 0x00000001e599099a, 0x000000001ac6e2ba },
1293++ /* x^32768 mod p(x)` << 1, x^32832 mod p(x)` << 1 */
1294++ { 0x00000000fe128194, 0x0000000013fca91e },
1295++ /* x^31744 mod p(x)` << 1, x^31808 mod p(x)` << 1 */
1296++ { 0x0000000077e8b990, 0x0000000183f4931e },
1297++ /* x^30720 mod p(x)` << 1, x^30784 mod p(x)` << 1 */
1298++ { 0x00000001a267f63a, 0x00000000b6d9b4e4 },
1299++ /* x^29696 mod p(x)` << 1, x^29760 mod p(x)` << 1 */
1300++ { 0x00000001945c245a, 0x00000000b5188656 },
1301++ /* x^28672 mod p(x)` << 1, x^28736 mod p(x)` << 1 */
1302++ { 0x0000000149002e76, 0x0000000027a81a84 },
1303++ /* x^27648 mod p(x)` << 1, x^27712 mod p(x)` << 1 */
1304++ { 0x00000001bb8310a4, 0x0000000125699258 },
1305++ /* x^26624 mod p(x)` << 1, x^26688 mod p(x)` << 1 */
1306++ { 0x000000019ec60bcc, 0x00000001b23de796 },
1307++ /* x^25600 mod p(x)` << 1, x^25664 mod p(x)` << 1 */
1308++ { 0x000000012d8590ae, 0x00000000fe4365dc },
1309++ /* x^24576 mod p(x)` << 1, x^24640 mod p(x)` << 1 */
1310++ { 0x0000000065b00684, 0x00000000c68f497a },
1311++ /* x^23552 mod p(x)` << 1, x^23616 mod p(x)` << 1 */
1312++ { 0x000000015e5aeadc, 0x00000000fbf521ee },
1313++ /* x^22528 mod p(x)` << 1, x^22592 mod p(x)` << 1 */
1314++ { 0x00000000b77ff2b0, 0x000000015eac3378 },
1315++ /* x^21504 mod p(x)` << 1, x^21568 mod p(x)` << 1 */
1316++ { 0x0000000188da2ff6, 0x0000000134914b90 },
1317++ /* x^20480 mod p(x)` << 1, x^20544 mod p(x)` << 1 */
1318++ { 0x0000000063da929a, 0x0000000016335cfe },
1319++ /* x^19456 mod p(x)` << 1, x^19520 mod p(x)` << 1 */
1320++ { 0x00000001389caa80, 0x000000010372d10c },
1321++ /* x^18432 mod p(x)` << 1, x^18496 mod p(x)` << 1 */
1322++ { 0x000000013db599d2, 0x000000015097b908 },
1323++ /* x^17408 mod p(x)` << 1, x^17472 mod p(x)` << 1 */
1324++ { 0x0000000122505a86, 0x00000001227a7572 },
1325++ /* x^16384 mod p(x)` << 1, x^16448 mod p(x)` << 1 */
1326++ { 0x000000016bd72746, 0x000000009a8f75c0 },
1327++ /* x^15360 mod p(x)` << 1, x^15424 mod p(x)` << 1 */
1328++ { 0x00000001c3faf1d4, 0x00000000682c77a2 },
1329++ /* x^14336 mod p(x)` << 1, x^14400 mod p(x)` << 1 */
1330++ { 0x00000001111c826c, 0x00000000231f091c },
1331++ /* x^13312 mod p(x)` << 1, x^13376 mod p(x)` << 1 */
1332++ { 0x00000000153e9fb2, 0x000000007d4439f2 },
1333++ /* x^12288 mod p(x)` << 1, x^12352 mod p(x)` << 1 */
1334++ { 0x000000002b1f7b60, 0x000000017e221efc },
1335++ /* x^11264 mod p(x)` << 1, x^11328 mod p(x)` << 1 */
1336++ { 0x00000000b1dba570, 0x0000000167457c38 },
1337++ /* x^10240 mod p(x)` << 1, x^10304 mod p(x)` << 1 */
1338++ { 0x00000001f6397b76, 0x00000000bdf081c4 },
1339++ /* x^9216 mod p(x)` << 1, x^9280 mod p(x)` << 1 */
1340++ { 0x0000000156335214, 0x000000016286d6b0 },
1341++ /* x^8192 mod p(x)` << 1, x^8256 mod p(x)` << 1 */
1342++ { 0x00000001d70e3986, 0x00000000c84f001c },
1343++ /* x^7168 mod p(x)` << 1, x^7232 mod p(x)` << 1 */
1344++ { 0x000000003701a774, 0x0000000064efe7c0 },
1345++ /* x^6144 mod p(x)` << 1, x^6208 mod p(x)` << 1 */
1346++ { 0x00000000ac81ef72, 0x000000000ac2d904 },
1347++ /* x^5120 mod p(x)` << 1, x^5184 mod p(x)` << 1 */
1348++ { 0x0000000133212464, 0x00000000fd226d14 },
1349++ /* x^4096 mod p(x)` << 1, x^4160 mod p(x)` << 1 */
1350++ { 0x00000000e4e45610, 0x000000011cfd42e0 },
1351++ /* x^3072 mod p(x)` << 1, x^3136 mod p(x)` << 1 */
1352++ { 0x000000000c1bd370, 0x000000016e5a5678 },
1353++ /* x^2048 mod p(x)` << 1, x^2112 mod p(x)` << 1 */
1354++ { 0x00000001a7b9e7a6, 0x00000001d888fe22 },
1355++ /* x^1024 mod p(x)` << 1, x^1088 mod p(x)` << 1 */
1356++ { 0x000000007d657a10, 0x00000001af77fcd4 }
1357++#else /* __LITTLE_ENDIAN__ */
1358++ /* x^261120 mod p(x)` << 1, x^261184 mod p(x)` << 1 */
1359++ { 0x00000001651797d2, 0x0000000099ea94a8 },
1360++ /* x^260096 mod p(x)` << 1, x^260160 mod p(x)` << 1 */
1361++ { 0x0000000021e0d56c, 0x00000000945a8420 },
1362++ /* x^259072 mod p(x)` << 1, x^259136 mod p(x)` << 1 */
1363++ { 0x000000000f95ecaa, 0x0000000030762706 },
1364++ /* x^258048 mod p(x)` << 1, x^258112 mod p(x)` << 1 */
1365++ { 0x00000001ebd224ac, 0x00000001a52fc582 },
1366++ /* x^257024 mod p(x)` << 1, x^257088 mod p(x)` << 1 */
1367++ { 0x000000000ccb97ca, 0x00000001a4a7167a },
1368++ /* x^256000 mod p(x)` << 1, x^256064 mod p(x)` << 1 */
1369++ { 0x00000001006ec8a8, 0x000000000c18249a },
1370++ /* x^254976 mod p(x)` << 1, x^255040 mod p(x)` << 1 */
1371++ { 0x000000014f58f196, 0x00000000a924ae7c },
1372++ /* x^253952 mod p(x)` << 1, x^254016 mod p(x)` << 1 */
1373++ { 0x00000001a7192ca6, 0x00000001e12ccc12 },
1374++ /* x^252928 mod p(x)` << 1, x^252992 mod p(x)` << 1 */
1375++ { 0x000000019a64bab2, 0x00000000a0b9d4ac },
1376++ /* x^251904 mod p(x)` << 1, x^251968 mod p(x)` << 1 */
1377++ { 0x0000000014f4ed2e, 0x0000000095e8ddfe },
1378++ /* x^250880 mod p(x)` << 1, x^250944 mod p(x)` << 1 */
1379++ { 0x000000011092b6a2, 0x00000000233fddc4 },
1380++ /* x^249856 mod p(x)` << 1, x^249920 mod p(x)` << 1 */
1381++ { 0x00000000c8a1629c, 0x00000001b4529b62 },
1382++ /* x^248832 mod p(x)` << 1, x^248896 mod p(x)` << 1 */
1383++ { 0x000000017bf32e8e, 0x00000001a7fa0e64 },
1384++ /* x^247808 mod p(x)` << 1, x^247872 mod p(x)` << 1 */
1385++ { 0x00000001f8cc6582, 0x00000001b5334592 },
1386++ /* x^246784 mod p(x)` << 1, x^246848 mod p(x)` << 1 */
1387++ { 0x000000008631ddf0, 0x000000011f8ee1b4 },
1388++ /* x^245760 mod p(x)` << 1, x^245824 mod p(x)` << 1 */
1389++ { 0x000000007e5a76d0, 0x000000006252e632 },
1390++ /* x^244736 mod p(x)` << 1, x^244800 mod p(x)` << 1 */
1391++ { 0x000000002b09b31c, 0x00000000ab973e84 },
1392++ /* x^243712 mod p(x)` << 1, x^243776 mod p(x)` << 1 */
1393++ { 0x00000001b2df1f84, 0x000000007734f5ec },
1394++ /* x^242688 mod p(x)` << 1, x^242752 mod p(x)` << 1 */
1395++ { 0x00000001d6f56afc, 0x000000007c547798 },
1396++ /* x^241664 mod p(x)` << 1, x^241728 mod p(x)` << 1 */
1397++ { 0x00000001b9b5e70c, 0x000000007ec40210 },
1398++ /* x^240640 mod p(x)` << 1, x^240704 mod p(x)` << 1 */
1399++ { 0x0000000034b626d2, 0x00000001ab1695a8 },
1400++ /* x^239616 mod p(x)` << 1, x^239680 mod p(x)` << 1 */
1401++ { 0x000000014c53479a, 0x0000000090494bba },
1402++ /* x^238592 mod p(x)` << 1, x^238656 mod p(x)` << 1 */
1403++ { 0x00000001a6d179a4, 0x00000001123fb816 },
1404++ /* x^237568 mod p(x)` << 1, x^237632 mod p(x)` << 1 */
1405++ { 0x000000015abd16b4, 0x00000001e188c74c },
1406++ /* x^236544 mod p(x)` << 1, x^236608 mod p(x)` << 1 */
1407++ { 0x00000000018f9852, 0x00000001c2d3451c },
1408++ /* x^235520 mod p(x)` << 1, x^235584 mod p(x)` << 1 */
1409++ { 0x000000001fb3084a, 0x00000000f55cf1ca },
1410++ /* x^234496 mod p(x)` << 1, x^234560 mod p(x)` << 1 */
1411++ { 0x00000000c53dfb04, 0x00000001a0531540 },
1412++ /* x^233472 mod p(x)` << 1, x^233536 mod p(x)` << 1 */
1413++ { 0x00000000e10c9ad6, 0x0000000132cd7ebc },
1414++ /* x^232448 mod p(x)` << 1, x^232512 mod p(x)` << 1 */
1415++ { 0x0000000025aa994a, 0x0000000073ab7f36 },
1416++ /* x^231424 mod p(x)` << 1, x^231488 mod p(x)` << 1 */
1417++ { 0x00000000fa3a74c4, 0x0000000041aed1c2 },
1418++ /* x^230400 mod p(x)` << 1, x^230464 mod p(x)` << 1 */
1419++ { 0x0000000033eb3f40, 0x0000000136c53800 },
1420++ /* x^229376 mod p(x)` << 1, x^229440 mod p(x)` << 1 */
1421++ { 0x000000017193f296, 0x0000000126835a30 },
1422++ /* x^228352 mod p(x)` << 1, x^228416 mod p(x)` << 1 */
1423++ { 0x0000000043f6c86a, 0x000000006241b502 },
1424++ /* x^227328 mod p(x)` << 1, x^227392 mod p(x)` << 1 */
1425++ { 0x000000016b513ec6, 0x00000000d5196ad4 },
1426++ /* x^226304 mod p(x)` << 1, x^226368 mod p(x)` << 1 */
1427++ { 0x00000000c8f25b4e, 0x000000009cfa769a },
1428++ /* x^225280 mod p(x)` << 1, x^225344 mod p(x)` << 1 */
1429++ { 0x00000001a45048ec, 0x00000000920e5df4 },
1430++ /* x^224256 mod p(x)` << 1, x^224320 mod p(x)` << 1 */
1431++ { 0x000000000c441004, 0x0000000169dc310e },
1432++ /* x^223232 mod p(x)` << 1, x^223296 mod p(x)` << 1 */
1433++ { 0x000000000e17cad6, 0x0000000009fc331c },
1434++ /* x^222208 mod p(x)` << 1, x^222272 mod p(x)` << 1 */
1435++ { 0x00000001253ae964, 0x000000010d94a81e },
1436++ /* x^221184 mod p(x)` << 1, x^221248 mod p(x)` << 1 */
1437++ { 0x00000001d7c88ebc, 0x0000000027a20ab2 },
1438++ /* x^220160 mod p(x)` << 1, x^220224 mod p(x)` << 1 */
1439++ { 0x00000001e7ca913a, 0x0000000114f87504 },
1440++ /* x^219136 mod p(x)` << 1, x^219200 mod p(x)` << 1 */
1441++ { 0x0000000033ed078a, 0x000000004b076d96 },
1442++ /* x^218112 mod p(x)` << 1, x^218176 mod p(x)` << 1 */
1443++ { 0x00000000e1839c78, 0x00000000da4d1e74 },
1444++ /* x^217088 mod p(x)` << 1, x^217152 mod p(x)` << 1 */
1445++ { 0x00000001322b267e, 0x000000001b81f672 },
1446++ /* x^216064 mod p(x)` << 1, x^216128 mod p(x)` << 1 */
1447++ { 0x00000000638231b6, 0x000000009367c988 },
1448++ /* x^215040 mod p(x)` << 1, x^215104 mod p(x)` << 1 */
1449++ { 0x00000001ee7f16f4, 0x00000001717214ca },
1450++ /* x^214016 mod p(x)` << 1, x^214080 mod p(x)` << 1 */
1451++ { 0x0000000117d9924a, 0x000000009f47d820 },
1452++ /* x^212992 mod p(x)` << 1, x^213056 mod p(x)` << 1 */
1453++ { 0x00000000e1a9e0c4, 0x000000010d9a47d2 },
1454++ /* x^211968 mod p(x)` << 1, x^212032 mod p(x)` << 1 */
1455++ { 0x00000001403731dc, 0x00000000a696c58c },
1456++ /* x^210944 mod p(x)` << 1, x^211008 mod p(x)` << 1 */
1457++ { 0x00000001a5ea9682, 0x000000002aa28ec6 },
1458++ /* x^209920 mod p(x)` << 1, x^209984 mod p(x)` << 1 */
1459++ { 0x0000000101c5c578, 0x00000001fe18fd9a },
1460++ /* x^208896 mod p(x)` << 1, x^208960 mod p(x)` << 1 */
1461++ { 0x00000000dddf6494, 0x000000019d4fc1ae },
1462++ /* x^207872 mod p(x)` << 1, x^207936 mod p(x)` << 1 */
1463++ { 0x00000000f1c3db28, 0x00000001ba0e3dea },
1464++ /* x^206848 mod p(x)` << 1, x^206912 mod p(x)` << 1 */
1465++ { 0x000000013112fb9c, 0x0000000074b59a5e },
1466++ /* x^205824 mod p(x)` << 1, x^205888 mod p(x)` << 1 */
1467++ { 0x00000000b680b906, 0x00000000f2b5ea98 },
1468++ /* x^204800 mod p(x)` << 1, x^204864 mod p(x)` << 1 */
1469++ { 0x000000001a282932, 0x0000000187132676 },
1470++ /* x^203776 mod p(x)` << 1, x^203840 mod p(x)` << 1 */
1471++ { 0x0000000089406e7e, 0x000000010a8c6ad4 },
1472++ /* x^202752 mod p(x)` << 1, x^202816 mod p(x)` << 1 */
1473++ { 0x00000001def6be8c, 0x00000001e21dfe70 },
1474++ /* x^201728 mod p(x)` << 1, x^201792 mod p(x)` << 1 */
1475++ { 0x0000000075258728, 0x00000001da0050e4 },
1476++ /* x^200704 mod p(x)` << 1, x^200768 mod p(x)` << 1 */
1477++ { 0x000000019536090a, 0x00000000772172ae },
1478++ /* x^199680 mod p(x)` << 1, x^199744 mod p(x)` << 1 */
1479++ { 0x00000000f2455bfc, 0x00000000e47724aa },
1480++ /* x^198656 mod p(x)` << 1, x^198720 mod p(x)` << 1 */
1481++ { 0x000000018c40baf4, 0x000000003cd63ac4 },
1482++ /* x^197632 mod p(x)` << 1, x^197696 mod p(x)` << 1 */
1483++ { 0x000000004cd390d4, 0x00000001bf47d352 },
1484++ /* x^196608 mod p(x)` << 1, x^196672 mod p(x)` << 1 */
1485++ { 0x00000001e4ece95a, 0x000000018dc1d708 },
1486++ /* x^195584 mod p(x)` << 1, x^195648 mod p(x)` << 1 */
1487++ { 0x000000001a3ee918, 0x000000002d4620a4 },
1488++ /* x^194560 mod p(x)` << 1, x^194624 mod p(x)` << 1 */
1489++ { 0x000000007c652fb8, 0x0000000058fd1740 },
1490++ /* x^193536 mod p(x)` << 1, x^193600 mod p(x)` << 1 */
1491++ { 0x000000011c67842c, 0x00000000dadd9bfc },
1492++ /* x^192512 mod p(x)` << 1, x^192576 mod p(x)` << 1 */
1493++ { 0x00000000254f759c, 0x00000001ea2140be },
1494++ /* x^191488 mod p(x)` << 1, x^191552 mod p(x)` << 1 */
1495++ { 0x000000007ece94ca, 0x000000009de128ba },
1496++ /* x^190464 mod p(x)` << 1, x^190528 mod p(x)` << 1 */
1497++ { 0x0000000038f258c2, 0x000000013ac3aa8e },
1498++ /* x^189440 mod p(x)` << 1, x^189504 mod p(x)` << 1 */
1499++ { 0x00000001cdf17b00, 0x0000000099980562 },
1500++ /* x^188416 mod p(x)` << 1, x^188480 mod p(x)` << 1 */
1501++ { 0x000000011f882c16, 0x00000001c1579c86 },
1502++ /* x^187392 mod p(x)` << 1, x^187456 mod p(x)` << 1 */
1503++ { 0x0000000100093fc8, 0x0000000068dbbf94 },
1504++ /* x^186368 mod p(x)` << 1, x^186432 mod p(x)` << 1 */
1505++ { 0x00000001cd684f16, 0x000000004509fb04 },
1506++ /* x^185344 mod p(x)` << 1, x^185408 mod p(x)` << 1 */
1507++ { 0x000000004bc6a70a, 0x00000001202f6398 },
1508++ /* x^184320 mod p(x)` << 1, x^184384 mod p(x)` << 1 */
1509++ { 0x000000004fc7e8e4, 0x000000013aea243e },
1510++ /* x^183296 mod p(x)` << 1, x^183360 mod p(x)` << 1 */
1511++ { 0x0000000130103f1c, 0x00000001b4052ae6 },
1512++ /* x^182272 mod p(x)` << 1, x^182336 mod p(x)` << 1 */
1513++ { 0x0000000111b0024c, 0x00000001cd2a0ae8 },
1514++ /* x^181248 mod p(x)` << 1, x^181312 mod p(x)` << 1 */
1515++ { 0x000000010b3079da, 0x00000001fe4aa8b4 },
1516++ /* x^180224 mod p(x)` << 1, x^180288 mod p(x)` << 1 */
1517++ { 0x000000010192bcc2, 0x00000001d1559a42 },
1518++ /* x^179200 mod p(x)` << 1, x^179264 mod p(x)` << 1 */
1519++ { 0x0000000074838d50, 0x00000001f3e05ecc },
1520++ /* x^178176 mod p(x)` << 1, x^178240 mod p(x)` << 1 */
1521++ { 0x000000001b20f520, 0x0000000104ddd2cc },
1522++ /* x^177152 mod p(x)` << 1, x^177216 mod p(x)` << 1 */
1523++ { 0x0000000050c3590a, 0x000000015393153c },
1524++ /* x^176128 mod p(x)` << 1, x^176192 mod p(x)` << 1 */
1525++ { 0x00000000b41cac8e, 0x0000000057e942c6 },
1526++ /* x^175104 mod p(x)` << 1, x^175168 mod p(x)` << 1 */
1527++ { 0x000000000c72cc78, 0x000000012c633850 },
1528++ /* x^174080 mod p(x)` << 1, x^174144 mod p(x)` << 1 */
1529++ { 0x0000000030cdb032, 0x00000000ebcaae4c },
1530++ /* x^173056 mod p(x)` << 1, x^173120 mod p(x)` << 1 */
1531++ { 0x000000013e09fc32, 0x000000013ee532a6 },
1532++ /* x^172032 mod p(x)` << 1, x^172096 mod p(x)` << 1 */
1533++ { 0x000000001ed624d2, 0x00000001bf0cbc7e },
1534++ /* x^171008 mod p(x)` << 1, x^171072 mod p(x)` << 1 */
1535++ { 0x00000000781aee1a, 0x00000000d50b7a5a },
1536++ /* x^169984 mod p(x)` << 1, x^170048 mod p(x)` << 1 */
1537++ { 0x00000001c4d8348c, 0x0000000002fca6e8 },
1538++ /* x^168960 mod p(x)` << 1, x^169024 mod p(x)` << 1 */
1539++ { 0x0000000057a40336, 0x000000007af40044 },
1540++ /* x^167936 mod p(x)` << 1, x^168000 mod p(x)` << 1 */
1541++ { 0x0000000085544940, 0x0000000016178744 },
1542++ /* x^166912 mod p(x)` << 1, x^166976 mod p(x)` << 1 */
1543++ { 0x000000019cd21e80, 0x000000014c177458 },
1544++ /* x^165888 mod p(x)` << 1, x^165952 mod p(x)` << 1 */
1545++ { 0x000000013eb95bc0, 0x000000011b6ddf04 },
1546++ /* x^164864 mod p(x)` << 1, x^164928 mod p(x)` << 1 */
1547++ { 0x00000001dfc9fdfc, 0x00000001f3e29ccc },
1548++ /* x^163840 mod p(x)` << 1, x^163904 mod p(x)` << 1 */
1549++ { 0x00000000cd028bc2, 0x0000000135ae7562 },
1550++ /* x^162816 mod p(x)` << 1, x^162880 mod p(x)` << 1 */
1551++ { 0x0000000090db8c44, 0x0000000190ef812c },
1552++ /* x^161792 mod p(x)` << 1, x^161856 mod p(x)` << 1 */
1553++ { 0x000000010010a4ce, 0x0000000067a2c786 },
1554++ /* x^160768 mod p(x)` << 1, x^160832 mod p(x)` << 1 */
1555++ { 0x00000001c8f4c72c, 0x0000000048b9496c },
1556++ /* x^159744 mod p(x)` << 1, x^159808 mod p(x)` << 1 */
1557++ { 0x000000001c26170c, 0x000000015a422de6 },
1558++ /* x^158720 mod p(x)` << 1, x^158784 mod p(x)` << 1 */
1559++ { 0x00000000e3fccf68, 0x00000001ef0e3640 },
1560++ /* x^157696 mod p(x)` << 1, x^157760 mod p(x)` << 1 */
1561++ { 0x00000000d513ed24, 0x00000001006d2d26 },
1562++ /* x^156672 mod p(x)` << 1, x^156736 mod p(x)` << 1 */
1563++ { 0x00000000141beada, 0x00000001170d56d6 },
1564++ /* x^155648 mod p(x)` << 1, x^155712 mod p(x)` << 1 */
1565++ { 0x000000011071aea0, 0x00000000a5fb613c },
1566++ /* x^154624 mod p(x)` << 1, x^154688 mod p(x)` << 1 */
1567++ { 0x000000012e19080a, 0x0000000040bbf7fc },
1568++ /* x^153600 mod p(x)` << 1, x^153664 mod p(x)` << 1 */
1569++ { 0x0000000100ecf826, 0x000000016ac3a5b2 },
1570++ /* x^152576 mod p(x)` << 1, x^152640 mod p(x)` << 1 */
1571++ { 0x0000000069b09412, 0x00000000abf16230 },
1572++ /* x^151552 mod p(x)` << 1, x^151616 mod p(x)` << 1 */
1573++ { 0x0000000122297bac, 0x00000001ebe23fac },
1574++ /* x^150528 mod p(x)` << 1, x^150592 mod p(x)` << 1 */
1575++ { 0x00000000e9e4b068, 0x000000008b6a0894 },
1576++ /* x^149504 mod p(x)` << 1, x^149568 mod p(x)` << 1 */
1577++ { 0x000000004b38651a, 0x00000001288ea478 },
1578++ /* x^148480 mod p(x)` << 1, x^148544 mod p(x)` << 1 */
1579++ { 0x00000001468360e2, 0x000000016619c442 },
1580++ /* x^147456 mod p(x)` << 1, x^147520 mod p(x)` << 1 */
1581++ { 0x00000000121c2408, 0x0000000086230038 },
1582++ /* x^146432 mod p(x)` << 1, x^146496 mod p(x)` << 1 */
1583++ { 0x00000000da7e7d08, 0x000000017746a756 },
1584++ /* x^145408 mod p(x)` << 1, x^145472 mod p(x)` << 1 */
1585++ { 0x00000001058d7652, 0x0000000191b8f8f8 },
1586++ /* x^144384 mod p(x)` << 1, x^144448 mod p(x)` << 1 */
1587++ { 0x000000014a098a90, 0x000000008e167708 },
1588++ /* x^143360 mod p(x)` << 1, x^143424 mod p(x)` << 1 */
1589++ { 0x0000000020dbe72e, 0x0000000148b22d54 },
1590++ /* x^142336 mod p(x)` << 1, x^142400 mod p(x)` << 1 */
1591++ { 0x000000011e7323e8, 0x0000000044ba2c3c },
1592++ /* x^141312 mod p(x)` << 1, x^141376 mod p(x)` << 1 */
1593++ { 0x00000000d5d4bf94, 0x00000000b54d2b52 },
1594++ /* x^140288 mod p(x)` << 1, x^140352 mod p(x)` << 1 */
1595++ { 0x0000000199d8746c, 0x0000000005a4fd8a },
1596++ /* x^139264 mod p(x)` << 1, x^139328 mod p(x)` << 1 */
1597++ { 0x00000000ce9ca8a0, 0x0000000139f9fc46 },
1598++ /* x^138240 mod p(x)` << 1, x^138304 mod p(x)` << 1 */
1599++ { 0x00000000136edece, 0x000000015a1fa824 },
1600++ /* x^137216 mod p(x)` << 1, x^137280 mod p(x)` << 1 */
1601++ { 0x000000019b92a068, 0x000000000a61ae4c },
1602++ /* x^136192 mod p(x)` << 1, x^136256 mod p(x)` << 1 */
1603++ { 0x0000000071d62206, 0x0000000145e9113e },
1604++ /* x^135168 mod p(x)` << 1, x^135232 mod p(x)` << 1 */
1605++ { 0x00000000dfc50158, 0x000000006a348448 },
1606++ /* x^134144 mod p(x)` << 1, x^134208 mod p(x)` << 1 */
1607++ { 0x00000001517626bc, 0x000000004d80a08c },
1608++ /* x^133120 mod p(x)` << 1, x^133184 mod p(x)` << 1 */
1609++ { 0x0000000148d1e4fa, 0x000000014b6837a0 },
1610++ /* x^132096 mod p(x)` << 1, x^132160 mod p(x)` << 1 */
1611++ { 0x0000000094d8266e, 0x000000016896a7fc },
1612++ /* x^131072 mod p(x)` << 1, x^131136 mod p(x)` << 1 */
1613++ { 0x00000000606c5e34, 0x000000014f187140 },
1614++ /* x^130048 mod p(x)` << 1, x^130112 mod p(x)` << 1 */
1615++ { 0x000000019766beaa, 0x000000019581b9da },
1616++ /* x^129024 mod p(x)` << 1, x^129088 mod p(x)` << 1 */
1617++ { 0x00000001d80c506c, 0x00000001091bc984 },
1618++ /* x^128000 mod p(x)` << 1, x^128064 mod p(x)` << 1 */
1619++ { 0x000000001e73837c, 0x000000001067223c },
1620++ /* x^126976 mod p(x)` << 1, x^127040 mod p(x)` << 1 */
1621++ { 0x0000000064d587de, 0x00000001ab16ea02 },
1622++ /* x^125952 mod p(x)` << 1, x^126016 mod p(x)` << 1 */
1623++ { 0x00000000f4a507b0, 0x000000013c4598a8 },
1624++ /* x^124928 mod p(x)` << 1, x^124992 mod p(x)` << 1 */
1625++ { 0x0000000040e342fc, 0x00000000b3735430 },
1626++ /* x^123904 mod p(x)` << 1, x^123968 mod p(x)` << 1 */
1627++ { 0x00000001d5ad9c3a, 0x00000001bb3fc0c0 },
1628++ /* x^122880 mod p(x)` << 1, x^122944 mod p(x)` << 1 */
1629++ { 0x0000000094a691a4, 0x00000001570ae19c },
1630++ /* x^121856 mod p(x)` << 1, x^121920 mod p(x)` << 1 */
1631++ { 0x00000001271ecdfa, 0x00000001ea910712 },
1632++ /* x^120832 mod p(x)` << 1, x^120896 mod p(x)` << 1 */
1633++ { 0x000000009e54475a, 0x0000000167127128 },
1634++ /* x^119808 mod p(x)` << 1, x^119872 mod p(x)` << 1 */
1635++ { 0x00000000c9c099ee, 0x0000000019e790a2 },
1636++ /* x^118784 mod p(x)` << 1, x^118848 mod p(x)` << 1 */
1637++ { 0x000000009a2f736c, 0x000000003788f710 },
1638++ /* x^117760 mod p(x)` << 1, x^117824 mod p(x)` << 1 */
1639++ { 0x00000000bb9f4996, 0x00000001682a160e },
1640++ /* x^116736 mod p(x)` << 1, x^116800 mod p(x)` << 1 */
1641++ { 0x00000001db688050, 0x000000007f0ebd2e },
1642++ /* x^115712 mod p(x)` << 1, x^115776 mod p(x)` << 1 */
1643++ { 0x00000000e9b10af4, 0x000000002b032080 },
1644++ /* x^114688 mod p(x)` << 1, x^114752 mod p(x)` << 1 */
1645++ { 0x000000012d4545e4, 0x00000000cfd1664a },
1646++ /* x^113664 mod p(x)` << 1, x^113728 mod p(x)` << 1 */
1647++ { 0x000000000361139c, 0x00000000aa1181c2 },
1648++ /* x^112640 mod p(x)` << 1, x^112704 mod p(x)` << 1 */
1649++ { 0x00000001a5a1a3a8, 0x00000000ddd08002 },
1650++ /* x^111616 mod p(x)` << 1, x^111680 mod p(x)` << 1 */
1651++ { 0x000000006844e0b0, 0x00000000e8dd0446 },
1652++ /* x^110592 mod p(x)` << 1, x^110656 mod p(x)` << 1 */
1653++ { 0x00000000c3762f28, 0x00000001bbd94a00 },
1654++ /* x^109568 mod p(x)` << 1, x^109632 mod p(x)` << 1 */
1655++ { 0x00000001d26287a2, 0x00000000ab6cd180 },
1656++ /* x^108544 mod p(x)` << 1, x^108608 mod p(x)` << 1 */
1657++ { 0x00000001f6f0bba8, 0x0000000031803ce2 },
1658++ /* x^107520 mod p(x)` << 1, x^107584 mod p(x)` << 1 */
1659++ { 0x000000002ffabd62, 0x0000000024f40b0c },
1660++ /* x^106496 mod p(x)` << 1, x^106560 mod p(x)` << 1 */
1661++ { 0x00000000fb4516b8, 0x00000001ba1d9834 },
1662++ /* x^105472 mod p(x)` << 1, x^105536 mod p(x)` << 1 */
1663++ { 0x000000018cfa961c, 0x0000000104de61aa },
1664++ /* x^104448 mod p(x)` << 1, x^104512 mod p(x)` << 1 */
1665++ { 0x000000019e588d52, 0x0000000113e40d46 },
1666++ /* x^103424 mod p(x)` << 1, x^103488 mod p(x)` << 1 */
1667++ { 0x00000001180f0bbc, 0x00000001415598a0 },
1668++ /* x^102400 mod p(x)` << 1, x^102464 mod p(x)` << 1 */
1669++ { 0x00000000e1d9177a, 0x00000000bf6c8c90 },
1670++ /* x^101376 mod p(x)` << 1, x^101440 mod p(x)` << 1 */
1671++ { 0x0000000105abc27c, 0x00000001788b0504 },
1672++ /* x^100352 mod p(x)` << 1, x^100416 mod p(x)` << 1 */
1673++ { 0x00000000972e4a58, 0x0000000038385d02 },
1674++ /* x^99328 mod p(x)` << 1, x^99392 mod p(x)` << 1 */
1675++ { 0x0000000183499a5e, 0x00000001b6c83844 },
1676++ /* x^98304 mod p(x)` << 1, x^98368 mod p(x)` << 1 */
1677++ { 0x00000001c96a8cca, 0x0000000051061a8a },
1678++ /* x^97280 mod p(x)` << 1, x^97344 mod p(x)` << 1 */
1679++ { 0x00000001a1a5b60c, 0x000000017351388a },
1680++ /* x^96256 mod p(x)` << 1, x^96320 mod p(x)` << 1 */
1681++ { 0x00000000e4b6ac9c, 0x0000000132928f92 },
1682++ /* x^95232 mod p(x)` << 1, x^95296 mod p(x)` << 1 */
1683++ { 0x00000001807e7f5a, 0x00000000e6b4f48a },
1684++ /* x^94208 mod p(x)` << 1, x^94272 mod p(x)` << 1 */
1685++ { 0x000000017a7e3bc8, 0x0000000039d15e90 },
1686++ /* x^93184 mod p(x)` << 1, x^93248 mod p(x)` << 1 */
1687++ { 0x00000000d73975da, 0x00000000312d6074 },
1688++ /* x^92160 mod p(x)` << 1, x^92224 mod p(x)` << 1 */
1689++ { 0x000000017375d038, 0x000000017bbb2cc4 },
1690++ /* x^91136 mod p(x)` << 1, x^91200 mod p(x)` << 1 */
1691++ { 0x00000000193680bc, 0x000000016ded3e18 },
1692++ /* x^90112 mod p(x)` << 1, x^90176 mod p(x)` << 1 */
1693++ { 0x00000000999b06f6, 0x00000000f1638b16 },
1694++ /* x^89088 mod p(x)` << 1, x^89152 mod p(x)` << 1 */
1695++ { 0x00000001f685d2b8, 0x00000001d38b9ecc },
1696++ /* x^88064 mod p(x)` << 1, x^88128 mod p(x)` << 1 */
1697++ { 0x00000001f4ecbed2, 0x000000018b8d09dc },
1698++ /* x^87040 mod p(x)` << 1, x^87104 mod p(x)` << 1 */
1699++ { 0x00000000ba16f1a0, 0x00000000e7bc27d2 },
1700++ /* x^86016 mod p(x)` << 1, x^86080 mod p(x)` << 1 */
1701++ { 0x0000000115aceac4, 0x00000000275e1e96 },
1702++ /* x^84992 mod p(x)` << 1, x^85056 mod p(x)` << 1 */
1703++ { 0x00000001aeff6292, 0x00000000e2e3031e },
1704++ /* x^83968 mod p(x)` << 1, x^84032 mod p(x)` << 1 */
1705++ { 0x000000009640124c, 0x00000001041c84d8 },
1706++ /* x^82944 mod p(x)` << 1, x^83008 mod p(x)` << 1 */
1707++ { 0x0000000114f41f02, 0x00000000706ce672 },
1708++ /* x^81920 mod p(x)` << 1, x^81984 mod p(x)` << 1 */
1709++ { 0x000000009c5f3586, 0x000000015d5070da },
1710++ /* x^80896 mod p(x)` << 1, x^80960 mod p(x)` << 1 */
1711++ { 0x00000001878275fa, 0x0000000038f9493a },
1712++ /* x^79872 mod p(x)` << 1, x^79936 mod p(x)` << 1 */
1713++ { 0x00000000ddc42ce8, 0x00000000a3348a76 },
1714++ /* x^78848 mod p(x)` << 1, x^78912 mod p(x)` << 1 */
1715++ { 0x0000000181d2c73a, 0x00000001ad0aab92 },
1716++ /* x^77824 mod p(x)` << 1, x^77888 mod p(x)` << 1 */
1717++ { 0x0000000141c9320a, 0x000000019e85f712 },
1718++ /* x^76800 mod p(x)` << 1, x^76864 mod p(x)` << 1 */
1719++ { 0x000000015235719a, 0x000000005a871e76 },
1720++ /* x^75776 mod p(x)` << 1, x^75840 mod p(x)` << 1 */
1721++ { 0x00000000be27d804, 0x000000017249c662 },
1722++ /* x^74752 mod p(x)` << 1, x^74816 mod p(x)` << 1 */
1723++ { 0x000000006242d45a, 0x000000003a084712 },
1724++ /* x^73728 mod p(x)` << 1, x^73792 mod p(x)` << 1 */
1725++ { 0x000000009a53638e, 0x00000000ed438478 },
1726++ /* x^72704 mod p(x)` << 1, x^72768 mod p(x)` << 1 */
1727++ { 0x00000001001ecfb6, 0x00000000abac34cc },
1728++ /* x^71680 mod p(x)` << 1, x^71744 mod p(x)` << 1 */
1729++ { 0x000000016d7c2d64, 0x000000005f35ef3e },
1730++ /* x^70656 mod p(x)` << 1, x^70720 mod p(x)` << 1 */
1731++ { 0x00000001d0ce46c0, 0x0000000047d6608c },
1732++ /* x^69632 mod p(x)` << 1, x^69696 mod p(x)` << 1 */
1733++ { 0x0000000124c907b4, 0x000000002d01470e },
1734++ /* x^68608 mod p(x)` << 1, x^68672 mod p(x)` << 1 */
1735++ { 0x0000000018a555ca, 0x0000000158bbc7b0 },
1736++ /* x^67584 mod p(x)` << 1, x^67648 mod p(x)` << 1 */
1737++ { 0x000000006b0980bc, 0x00000000c0a23e8e },
1738++ /* x^66560 mod p(x)` << 1, x^66624 mod p(x)` << 1 */
1739++ { 0x000000008bbba964, 0x00000001ebd85c88 },
1740++ /* x^65536 mod p(x)` << 1, x^65600 mod p(x)` << 1 */
1741++ { 0x00000001070a5a1e, 0x000000019ee20bb2 },
1742++ /* x^64512 mod p(x)` << 1, x^64576 mod p(x)` << 1 */
1743++ { 0x000000002204322a, 0x00000001acabf2d6 },
1744++ /* x^63488 mod p(x)` << 1, x^63552 mod p(x)` << 1 */
1745++ { 0x00000000a27524d0, 0x00000001b7963d56 },
1746++ /* x^62464 mod p(x)` << 1, x^62528 mod p(x)` << 1 */
1747++ { 0x0000000020b1e4ba, 0x000000017bffa1fe },
1748++ /* x^61440 mod p(x)` << 1, x^61504 mod p(x)` << 1 */
1749++ { 0x0000000032cc27fc, 0x000000001f15333e },
1750++ /* x^60416 mod p(x)` << 1, x^60480 mod p(x)` << 1 */
1751++ { 0x0000000044dd22b8, 0x000000018593129e },
1752++ /* x^59392 mod p(x)` << 1, x^59456 mod p(x)` << 1 */
1753++ { 0x00000000dffc9e0a, 0x000000019cb32602 },
1754++ /* x^58368 mod p(x)` << 1, x^58432 mod p(x)` << 1 */
1755++ { 0x00000001b7a0ed14, 0x0000000142b05cc8 },
1756++ /* x^57344 mod p(x)` << 1, x^57408 mod p(x)` << 1 */
1757++ { 0x00000000c7842488, 0x00000001be49e7a4 },
1758++ /* x^56320 mod p(x)` << 1, x^56384 mod p(x)` << 1 */
1759++ { 0x00000001c02a4fee, 0x0000000108f69d6c },
1760++ /* x^55296 mod p(x)` << 1, x^55360 mod p(x)` << 1 */
1761++ { 0x000000003c273778, 0x000000006c0971f0 },
1762++ /* x^54272 mod p(x)` << 1, x^54336 mod p(x)` << 1 */
1763++ { 0x00000001d63f8894, 0x000000005b16467a },
1764++ /* x^53248 mod p(x)` << 1, x^53312 mod p(x)` << 1 */
1765++ { 0x000000006be557d6, 0x00000001551a628e },
1766++ /* x^52224 mod p(x)` << 1, x^52288 mod p(x)` << 1 */
1767++ { 0x000000006a7806ea, 0x000000019e42ea92 },
1768++ /* x^51200 mod p(x)` << 1, x^51264 mod p(x)` << 1 */
1769++ { 0x000000016155aa0c, 0x000000012fa83ff2 },
1770++ /* x^50176 mod p(x)` << 1, x^50240 mod p(x)` << 1 */
1771++ { 0x00000000908650ac, 0x000000011ca9cde0 },
1772++ /* x^49152 mod p(x)` << 1, x^49216 mod p(x)` << 1 */
1773++ { 0x00000000aa5a8084, 0x00000000c8e5cd74 },
1774++ /* x^48128 mod p(x)` << 1, x^48192 mod p(x)` << 1 */
1775++ { 0x0000000191bb500a, 0x0000000096c27f0c },
1776++ /* x^47104 mod p(x)` << 1, x^47168 mod p(x)` << 1 */
1777++ { 0x0000000064e9bed0, 0x000000002baed926 },
1778++ /* x^46080 mod p(x)` << 1, x^46144 mod p(x)` << 1 */
1779++ { 0x000000009444f302, 0x000000017c8de8d2 },
1780++ /* x^45056 mod p(x)` << 1, x^45120 mod p(x)` << 1 */
1781++ { 0x000000019db07d3c, 0x00000000d43d6068 },
1782++ /* x^44032 mod p(x)` << 1, x^44096 mod p(x)` << 1 */
1783++ { 0x00000001359e3e6e, 0x00000000cb2c4b26 },
1784++ /* x^43008 mod p(x)` << 1, x^43072 mod p(x)` << 1 */
1785++ { 0x00000001e4f10dd2, 0x0000000145b8da26 },
1786++ /* x^41984 mod p(x)` << 1, x^42048 mod p(x)` << 1 */
1787++ { 0x0000000124f5735e, 0x000000018fff4b08 },
1788++ /* x^40960 mod p(x)` << 1, x^41024 mod p(x)` << 1 */
1789++ { 0x0000000124760a4c, 0x0000000150b58ed0 },
1790++ /* x^39936 mod p(x)` << 1, x^40000 mod p(x)` << 1 */
1791++ { 0x000000000f1fc186, 0x00000001549f39bc },
1792++ /* x^38912 mod p(x)` << 1, x^38976 mod p(x)` << 1 */
1793++ { 0x00000000150e4cc4, 0x00000000ef4d2f42 },
1794++ /* x^37888 mod p(x)` << 1, x^37952 mod p(x)` << 1 */
1795++ { 0x000000002a6204e8, 0x00000001b1468572 },
1796++ /* x^36864 mod p(x)` << 1, x^36928 mod p(x)` << 1 */
1797++ { 0x00000000beb1d432, 0x000000013d7403b2 },
1798++ /* x^35840 mod p(x)` << 1, x^35904 mod p(x)` << 1 */
1799++ { 0x0000000135f3f1f0, 0x00000001a4681842 },
1800++ /* x^34816 mod p(x)` << 1, x^34880 mod p(x)` << 1 */
1801++ { 0x0000000074fe2232, 0x0000000167714492 },
1802++ /* x^33792 mod p(x)` << 1, x^33856 mod p(x)` << 1 */
1803++ { 0x000000001ac6e2ba, 0x00000001e599099a },
1804++ /* x^32768 mod p(x)` << 1, x^32832 mod p(x)` << 1 */
1805++ { 0x0000000013fca91e, 0x00000000fe128194 },
1806++ /* x^31744 mod p(x)` << 1, x^31808 mod p(x)` << 1 */
1807++ { 0x0000000183f4931e, 0x0000000077e8b990 },
1808++ /* x^30720 mod p(x)` << 1, x^30784 mod p(x)` << 1 */
1809++ { 0x00000000b6d9b4e4, 0x00000001a267f63a },
1810++ /* x^29696 mod p(x)` << 1, x^29760 mod p(x)` << 1 */
1811++ { 0x00000000b5188656, 0x00000001945c245a },
1812++ /* x^28672 mod p(x)` << 1, x^28736 mod p(x)` << 1 */
1813++ { 0x0000000027a81a84, 0x0000000149002e76 },
1814++ /* x^27648 mod p(x)` << 1, x^27712 mod p(x)` << 1 */
1815++ { 0x0000000125699258, 0x00000001bb8310a4 },
1816++ /* x^26624 mod p(x)` << 1, x^26688 mod p(x)` << 1 */
1817++ { 0x00000001b23de796, 0x000000019ec60bcc },
1818++ /* x^25600 mod p(x)` << 1, x^25664 mod p(x)` << 1 */
1819++ { 0x00000000fe4365dc, 0x000000012d8590ae },
1820++ /* x^24576 mod p(x)` << 1, x^24640 mod p(x)` << 1 */
1821++ { 0x00000000c68f497a, 0x0000000065b00684 },
1822++ /* x^23552 mod p(x)` << 1, x^23616 mod p(x)` << 1 */
1823++ { 0x00000000fbf521ee, 0x000000015e5aeadc },
1824++ /* x^22528 mod p(x)` << 1, x^22592 mod p(x)` << 1 */
1825++ { 0x000000015eac3378, 0x00000000b77ff2b0 },
1826++ /* x^21504 mod p(x)` << 1, x^21568 mod p(x)` << 1 */
1827++ { 0x0000000134914b90, 0x0000000188da2ff6 },
1828++ /* x^20480 mod p(x)` << 1, x^20544 mod p(x)` << 1 */
1829++ { 0x0000000016335cfe, 0x0000000063da929a },
1830++ /* x^19456 mod p(x)` << 1, x^19520 mod p(x)` << 1 */
1831++ { 0x000000010372d10c, 0x00000001389caa80 },
1832++ /* x^18432 mod p(x)` << 1, x^18496 mod p(x)` << 1 */
1833++ { 0x000000015097b908, 0x000000013db599d2 },
1834++ /* x^17408 mod p(x)` << 1, x^17472 mod p(x)` << 1 */
1835++ { 0x00000001227a7572, 0x0000000122505a86 },
1836++ /* x^16384 mod p(x)` << 1, x^16448 mod p(x)` << 1 */
1837++ { 0x000000009a8f75c0, 0x000000016bd72746 },
1838++ /* x^15360 mod p(x)` << 1, x^15424 mod p(x)` << 1 */
1839++ { 0x00000000682c77a2, 0x00000001c3faf1d4 },
1840++ /* x^14336 mod p(x)` << 1, x^14400 mod p(x)` << 1 */
1841++ { 0x00000000231f091c, 0x00000001111c826c },
1842++ /* x^13312 mod p(x)` << 1, x^13376 mod p(x)` << 1 */
1843++ { 0x000000007d4439f2, 0x00000000153e9fb2 },
1844++ /* x^12288 mod p(x)` << 1, x^12352 mod p(x)` << 1 */
1845++ { 0x000000017e221efc, 0x000000002b1f7b60 },
1846++ /* x^11264 mod p(x)` << 1, x^11328 mod p(x)` << 1 */
1847++ { 0x0000000167457c38, 0x00000000b1dba570 },
1848++ /* x^10240 mod p(x)` << 1, x^10304 mod p(x)` << 1 */
1849++ { 0x00000000bdf081c4, 0x00000001f6397b76 },
1850++ /* x^9216 mod p(x)` << 1, x^9280 mod p(x)` << 1 */
1851++ { 0x000000016286d6b0, 0x0000000156335214 },
1852++ /* x^8192 mod p(x)` << 1, x^8256 mod p(x)` << 1 */
1853++ { 0x00000000c84f001c, 0x00000001d70e3986 },
1854++ /* x^7168 mod p(x)` << 1, x^7232 mod p(x)` << 1 */
1855++ { 0x0000000064efe7c0, 0x000000003701a774 },
1856++ /* x^6144 mod p(x)` << 1, x^6208 mod p(x)` << 1 */
1857++ { 0x000000000ac2d904, 0x00000000ac81ef72 },
1858++ /* x^5120 mod p(x)` << 1, x^5184 mod p(x)` << 1 */
1859++ { 0x00000000fd226d14, 0x0000000133212464 },
1860++ /* x^4096 mod p(x)` << 1, x^4160 mod p(x)` << 1 */
1861++ { 0x000000011cfd42e0, 0x00000000e4e45610 },
1862++ /* x^3072 mod p(x)` << 1, x^3136 mod p(x)` << 1 */
1863++ { 0x000000016e5a5678, 0x000000000c1bd370 },
1864++ /* x^2048 mod p(x)` << 1, x^2112 mod p(x)` << 1 */
1865++ { 0x00000001d888fe22, 0x00000001a7b9e7a6 },
1866++ /* x^1024 mod p(x)` << 1, x^1088 mod p(x)` << 1 */
1867++ { 0x00000001af77fcd4, 0x000000007d657a10 }
1868++#endif /* __LITTLE_ENDIAN__ */
1869++ };
1870++
1871++/* Reduce final 1024-2048 bits to 64 bits, shifting 32 bits to include the trailing 32 bits of zeros */
1872++
1873++static const __vector unsigned long long vcrc_short_const[16]
1874++ __attribute__((aligned (16))) = {
1875++#ifdef __LITTLE_ENDIAN__
1876++ /* x^1952 mod p(x) , x^1984 mod p(x) , x^2016 mod p(x) , x^2048 mod p(x) */
1877++ { 0x99168a18ec447f11, 0xed837b2613e8221e },
1878++ /* x^1824 mod p(x) , x^1856 mod p(x) , x^1888 mod p(x) , x^1920 mod p(x) */
1879++ { 0xe23e954e8fd2cd3c, 0xc8acdd8147b9ce5a },
1880++ /* x^1696 mod p(x) , x^1728 mod p(x) , x^1760 mod p(x) , x^1792 mod p(x) */
1881++ { 0x92f8befe6b1d2b53, 0xd9ad6d87d4277e25 },
1882++ /* x^1568 mod p(x) , x^1600 mod p(x) , x^1632 mod p(x) , x^1664 mod p(x) */
1883++ { 0xf38a3556291ea462, 0xc10ec5e033fbca3b },
1884++ /* x^1440 mod p(x) , x^1472 mod p(x) , x^1504 mod p(x) , x^1536 mod p(x) */
1885++ { 0x974ac56262b6ca4b, 0xc0b55b0e82e02e2f },
1886++ /* x^1312 mod p(x) , x^1344 mod p(x) , x^1376 mod p(x) , x^1408 mod p(x) */
1887++ { 0x855712b3784d2a56, 0x71aa1df0e172334d },
1888++ /* x^1184 mod p(x) , x^1216 mod p(x) , x^1248 mod p(x) , x^1280 mod p(x) */
1889++ { 0xa5abe9f80eaee722, 0xfee3053e3969324d },
1890++ /* x^1056 mod p(x) , x^1088 mod p(x) , x^1120 mod p(x) , x^1152 mod p(x) */
1891++ { 0x1fa0943ddb54814c, 0xf44779b93eb2bd08 },
1892++ /* x^928 mod p(x) , x^960 mod p(x) , x^992 mod p(x) , x^1024 mod p(x) */
1893++ { 0xa53ff440d7bbfe6a, 0xf5449b3f00cc3374 },
1894++ /* x^800 mod p(x) , x^832 mod p(x) , x^864 mod p(x) , x^896 mod p(x) */
1895++ { 0xebe7e3566325605c, 0x6f8346e1d777606e },
1896++ /* x^672 mod p(x) , x^704 mod p(x) , x^736 mod p(x) , x^768 mod p(x) */
1897++ { 0xc65a272ce5b592b8, 0xe3ab4f2ac0b95347 },
1898++ /* x^544 mod p(x) , x^576 mod p(x) , x^608 mod p(x) , x^640 mod p(x) */
1899++ { 0x5705a9ca4721589f, 0xaa2215ea329ecc11 },
1900++ /* x^416 mod p(x) , x^448 mod p(x) , x^480 mod p(x) , x^512 mod p(x) */
1901++ { 0xe3720acb88d14467, 0x1ed8f66ed95efd26 },
1902++ /* x^288 mod p(x) , x^320 mod p(x) , x^352 mod p(x) , x^384 mod p(x) */
1903++ { 0xba1aca0315141c31, 0x78ed02d5a700e96a },
1904++ /* x^160 mod p(x) , x^192 mod p(x) , x^224 mod p(x) , x^256 mod p(x) */
1905++ { 0xad2a31b3ed627dae, 0xba8ccbe832b39da3 },
1906++ /* x^32 mod p(x) , x^64 mod p(x) , x^96 mod p(x) , x^128 mod p(x) */
1907++ { 0x6655004fa06a2517, 0xedb88320b1e6b092 }
1908++#else /* __LITTLE_ENDIAN__ */
1909++ /* x^1952 mod p(x) , x^1984 mod p(x) , x^2016 mod p(x) , x^2048 mod p(x) */
1910++ { 0xed837b2613e8221e, 0x99168a18ec447f11 },
1911++ /* x^1824 mod p(x) , x^1856 mod p(x) , x^1888 mod p(x) , x^1920 mod p(x) */
1912++ { 0xc8acdd8147b9ce5a, 0xe23e954e8fd2cd3c },
1913++ /* x^1696 mod p(x) , x^1728 mod p(x) , x^1760 mod p(x) , x^1792 mod p(x) */
1914++ { 0xd9ad6d87d4277e25, 0x92f8befe6b1d2b53 },
1915++ /* x^1568 mod p(x) , x^1600 mod p(x) , x^1632 mod p(x) , x^1664 mod p(x) */
1916++ { 0xc10ec5e033fbca3b, 0xf38a3556291ea462 },
1917++ /* x^1440 mod p(x) , x^1472 mod p(x) , x^1504 mod p(x) , x^1536 mod p(x) */
1918++ { 0xc0b55b0e82e02e2f, 0x974ac56262b6ca4b },
1919++ /* x^1312 mod p(x) , x^1344 mod p(x) , x^1376 mod p(x) , x^1408 mod p(x) */
1920++ { 0x71aa1df0e172334d, 0x855712b3784d2a56 },
1921++ /* x^1184 mod p(x) , x^1216 mod p(x) , x^1248 mod p(x) , x^1280 mod p(x) */
1922++ { 0xfee3053e3969324d, 0xa5abe9f80eaee722 },
1923++ /* x^1056 mod p(x) , x^1088 mod p(x) , x^1120 mod p(x) , x^1152 mod p(x) */
1924++ { 0xf44779b93eb2bd08, 0x1fa0943ddb54814c },
1925++ /* x^928 mod p(x) , x^960 mod p(x) , x^992 mod p(x) , x^1024 mod p(x) */
1926++ { 0xf5449b3f00cc3374, 0xa53ff440d7bbfe6a },
1927++ /* x^800 mod p(x) , x^832 mod p(x) , x^864 mod p(x) , x^896 mod p(x) */
1928++ { 0x6f8346e1d777606e, 0xebe7e3566325605c },
1929++ /* x^672 mod p(x) , x^704 mod p(x) , x^736 mod p(x) , x^768 mod p(x) */
1930++ { 0xe3ab4f2ac0b95347, 0xc65a272ce5b592b8 },
1931++ /* x^544 mod p(x) , x^576 mod p(x) , x^608 mod p(x) , x^640 mod p(x) */
1932++ { 0xaa2215ea329ecc11, 0x5705a9ca4721589f },
1933++ /* x^416 mod p(x) , x^448 mod p(x) , x^480 mod p(x) , x^512 mod p(x) */
1934++ { 0x1ed8f66ed95efd26, 0xe3720acb88d14467 },
1935++ /* x^288 mod p(x) , x^320 mod p(x) , x^352 mod p(x) , x^384 mod p(x) */
1936++ { 0x78ed02d5a700e96a, 0xba1aca0315141c31 },
1937++ /* x^160 mod p(x) , x^192 mod p(x) , x^224 mod p(x) , x^256 mod p(x) */
1938++ { 0xba8ccbe832b39da3, 0xad2a31b3ed627dae },
1939++ /* x^32 mod p(x) , x^64 mod p(x) , x^96 mod p(x) , x^128 mod p(x) */
1940++ { 0xedb88320b1e6b092, 0x6655004fa06a2517 }
1941++#endif /* __LITTLE_ENDIAN__ */
1942++ };
1943++
1944++/* Barrett constants */
1945++/* 33 bit reflected Barrett constant m - (4^32)/n */
1946++
1947++static const __vector unsigned long long v_Barrett_const[2]
1948++ __attribute__((aligned (16))) = {
1949++ /* x^64 div p(x) */
1950++#ifdef __LITTLE_ENDIAN__
1951++ { 0x00000001f7011641, 0x0000000000000000 },
1952++ { 0x00000001db710641, 0x0000000000000000 }
1953++#else /* __LITTLE_ENDIAN__ */
1954++ { 0x0000000000000000, 0x00000001f7011641 },
1955++ { 0x0000000000000000, 0x00000001db710641 }
1956++#endif /* __LITTLE_ENDIAN__ */
1957++ };
1958++#endif /* POWER8_INTRINSICS */
1959++
1960++#endif /* __ASSEMBLER__ */
1961+diff --git a/contrib/power/crc32_z_power8.c b/contrib/power/crc32_z_power8.c
1962+new file mode 100644
1963+index 0000000..7858cfe
1964+--- /dev/null
1965++++ b/contrib/power/crc32_z_power8.c
1966+@@ -0,0 +1,679 @@
1967++/*
1968++ * Calculate the checksum of data that is 16 byte aligned and a multiple of
1969++ * 16 bytes.
1970++ *
1971++ * The first step is to reduce it to 1024 bits. We do this in 8 parallel
1972++ * chunks in order to mask the latency of the vpmsum instructions. If we
1973++ * have more than 32 kB of data to checksum we repeat this step multiple
1974++ * times, passing in the previous 1024 bits.
1975++ *
1976++ * The next step is to reduce the 1024 bits to 64 bits. This step adds
1977++ * 32 bits of 0s to the end - this matches what a CRC does. We just
1978++ * calculate constants that land the data in these 32 bits.
1979++ *
1980++ * We then use fixed point Barrett reduction to compute a mod n over GF(2)
1981++ * for n = CRC using POWER8 instructions. We use x = 32.
1982++ *
1983++ * http://en.wikipedia.org/wiki/Barrett_reduction
1984++ *
1985++ * This code uses gcc vector builtins instead of using assembly directly.
1986++ *
1987++ * Copyright (C) 2017 Rogerio Alves <rogealve@br.ibm.com>, IBM
1988++ *
1989++ * This program is free software; you can redistribute it and/or
1990++ * modify it under the terms of either:
1991++ *
1992++ * a) the GNU General Public License as published by the Free Software
1993++ * Foundation; either version 2 of the License, or (at your option)
1994++ * any later version, or
1995++ * b) the Apache License, Version 2.0
1996++ */
1997++
1998++#include <altivec.h>
1999++#include "../../zutil.h"
2000++#include "power.h"
2001++
2002++#define POWER8_INTRINSICS
2003++#define CRC_TABLE
2004++
2005++#ifdef CRC32_CONSTANTS_HEADER
2006++#include CRC32_CONSTANTS_HEADER
2007++#else
2008++#include "crc32_constants.h"
2009++#endif
2010++
2011++#define VMX_ALIGN 16
2012++#define VMX_ALIGN_MASK (VMX_ALIGN-1)
2013++
2014++#ifdef REFLECT
2015++static unsigned int crc32_align(unsigned int crc, const unsigned char *p,
2016++ unsigned long len)
2017++{
2018++ while (len--)
2019++ crc = crc_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
2020++ return crc;
2021++}
2022++#else
2023++static unsigned int crc32_align(unsigned int crc, const unsigned char *p,
2024++ unsigned long len)
2025++{
2026++ while (len--)
2027++ crc = crc_table[((crc >> 24) ^ *p++) & 0xff] ^ (crc << 8);
2028++ return crc;
2029++}
2030++#endif
2031++
2032++static unsigned int __attribute__ ((aligned (32)))
2033++__crc32_vpmsum(unsigned int crc, const void* p, unsigned long len);
2034++
2035++unsigned long ZLIB_INTERNAL _crc32_z_power8(uLong _crc, const Bytef *_p,
2036++ z_size_t _len)
2037++{
2038++ unsigned int prealign;
2039++ unsigned int tail;
2040++
2041++ /* Map zlib API to crc32_vpmsum API */
2042++ unsigned int crc = (unsigned int) (0xffffffff & _crc);
2043++ const unsigned char *p = _p;
2044++ unsigned long len = (unsigned long) _len;
2045++
2046++ if (p == (const unsigned char *) 0x0) return 0;
2047++#ifdef CRC_XOR
2048++ crc ^= 0xffffffff;
2049++#endif
2050++
2051++ if (len < VMX_ALIGN + VMX_ALIGN_MASK) {
2052++ crc = crc32_align(crc, p, len);
2053++ goto out;
2054++ }
2055++
2056++ if ((unsigned long)p & VMX_ALIGN_MASK) {
2057++ prealign = VMX_ALIGN - ((unsigned long)p & VMX_ALIGN_MASK);
2058++ crc = crc32_align(crc, p, prealign);
2059++ len -= prealign;
2060++ p += prealign;
2061++ }
2062++
2063++ crc = __crc32_vpmsum(crc, p, len & ~VMX_ALIGN_MASK);
2064++
2065++ tail = len & VMX_ALIGN_MASK;
2066++ if (tail) {
2067++ p += len & ~VMX_ALIGN_MASK;
2068++ crc = crc32_align(crc, p, tail);
2069++ }
2070++
2071++out:
2072++#ifdef CRC_XOR
2073++ crc ^= 0xffffffff;
2074++#endif
2075++
2076++ /* Convert to zlib API */
2077++ return (unsigned long) crc;
2078++}
2079++
2080++#if defined (__clang__)
2081++#include "clang_workaround.h"
2082++#else
2083++#define __builtin_pack_vector(a, b) __builtin_pack_vector_int128 ((a), (b))
2084++#define __builtin_unpack_vector_0(a) __builtin_unpack_vector_int128 ((vector __int128_t)(a), 0)
2085++#define __builtin_unpack_vector_1(a) __builtin_unpack_vector_int128 ((vector __int128_t)(a), 1)
2086++#endif
2087++
2088++/* When we have a load-store in a single-dispatch group and address overlap
2089++ * such that forwarding is not allowed (load-hit-store) the group must be flushed.
2090++ * A group ending NOP prevents the flush.
2091++ */
2092++#define GROUP_ENDING_NOP asm("ori 2,2,0" ::: "memory")
2093++
2094++#if defined(__BIG_ENDIAN__) && defined (REFLECT)
2095++#define BYTESWAP_DATA
2096++#elif defined(__LITTLE_ENDIAN__) && !defined(REFLECT)
2097++#define BYTESWAP_DATA
2098++#endif
2099++
2100++#ifdef BYTESWAP_DATA
2101++#define VEC_PERM(vr, va, vb, vc) vr = vec_perm(va, vb,\
2102++ (__vector unsigned char) vc)
2103++#if defined(__LITTLE_ENDIAN__)
2104++/* Byte reverse permute constant LE. */
2105++static const __vector unsigned long long vperm_const
2106++ __attribute__ ((aligned(16))) = { 0x08090A0B0C0D0E0FUL,
2107++ 0x0001020304050607UL };
2108++#else
2109++static const __vector unsigned long long vperm_const
2110++ __attribute__ ((aligned(16))) = { 0x0F0E0D0C0B0A0908UL,
2111++ 0x0706050403020100UL };
2112++#endif
2113++#else
2114++#define VEC_PERM(vr, va, vb, vc)
2115++#endif
2116++
2117++static unsigned int __attribute__ ((aligned (32)))
2118++__crc32_vpmsum(unsigned int crc, const void* p, unsigned long len) {
2119++
2120++ const __vector unsigned long long vzero = {0,0};
2121++ const __vector unsigned long long vones = {0xffffffffffffffffUL,
2122++ 0xffffffffffffffffUL};
2123++
2124++#ifdef REFLECT
2125++ const __vector unsigned long long vmask_32bit =
2126++ (__vector unsigned long long)vec_sld((__vector unsigned char)vzero,
2127++ (__vector unsigned char)vones, 4);
2128++#endif
2129++
2130++ const __vector unsigned long long vmask_64bit =
2131++ (__vector unsigned long long)vec_sld((__vector unsigned char)vzero,
2132++ (__vector unsigned char)vones, 8);
2133++
2134++ __vector unsigned long long vcrc;
2135++
2136++ __vector unsigned long long vconst1, vconst2;
2137++
2138++ /* vdata0-vdata7 will contain our data (p). */
2139++ __vector unsigned long long vdata0, vdata1, vdata2, vdata3, vdata4,
2140++ vdata5, vdata6, vdata7;
2141++
2142++ /* v0-v7 will contain our checksums */
2143++ __vector unsigned long long v0 = {0,0};
2144++ __vector unsigned long long v1 = {0,0};
2145++ __vector unsigned long long v2 = {0,0};
2146++ __vector unsigned long long v3 = {0,0};
2147++ __vector unsigned long long v4 = {0,0};
2148++ __vector unsigned long long v5 = {0,0};
2149++ __vector unsigned long long v6 = {0,0};
2150++ __vector unsigned long long v7 = {0,0};
2151++
2152++
2153++ /* Vector auxiliary variables. */
2154++ __vector unsigned long long va0, va1, va2, va3, va4, va5, va6, va7;
2155++
2156++ unsigned int result = 0;
2157++ unsigned int offset; /* Constant table offset. */
2158++
2159++ unsigned long i; /* Counter. */
2160++ unsigned long chunks;
2161++
2162++ unsigned long block_size;
2163++ int next_block = 0;
2164++
2165++ /* Round down to a multiple of 128 bytes. The last 128 byte block will be processed at end. */
2166++ unsigned long length = len & 0xFFFFFFFFFFFFFF80UL;
2167++
2168++#ifdef REFLECT
2169++ vcrc = (__vector unsigned long long)__builtin_pack_vector(0UL, crc);
2170++#else
2171++ vcrc = (__vector unsigned long long)__builtin_pack_vector(crc, 0UL);
2172++
2173++ /* Shift into top 32 bits */
2174++ vcrc = (__vector unsigned long long)vec_sld((__vector unsigned char)vcrc,
2175++ (__vector unsigned char)vzero, 4);
2176++#endif
2177++
2178++ /* Short version. */
2179++ if (len < 256) {
2180++ /* Calculate where in the constant table we need to start. */
2181++ offset = 256 - len;
2182++
2183++ vconst1 = vec_ld(offset, vcrc_short_const);
2184++ vdata0 = vec_ld(0, (__vector unsigned long long*) p);
2185++ VEC_PERM(vdata0, vdata0, vconst1, vperm_const);
2186++
2187++ /* xor in initial value */
2188++ vdata0 = vec_xor(vdata0, vcrc);
2189++
2190++ vdata0 = (__vector unsigned long long) __builtin_crypto_vpmsumw
2191++ ((__vector unsigned int)vdata0, (__vector unsigned int)vconst1);
2192++ v0 = vec_xor(v0, vdata0);
2193++
2194++ for (i = 16; i < len; i += 16) {
2195++ vconst1 = vec_ld(offset + i, vcrc_short_const);
2196++ vdata0 = vec_ld(i, (__vector unsigned long long*) p);
2197++ VEC_PERM(vdata0, vdata0, vconst1, vperm_const);
2198++ vdata0 = (__vector unsigned long long) __builtin_crypto_vpmsumw
2199++ ((__vector unsigned int)vdata0, (__vector unsigned int)vconst1);
2200++ v0 = vec_xor(v0, vdata0);
2201++ }
2202++ } else {
2203++
2204++ /* Load initial values. */
2205++ vdata0 = vec_ld(0, (__vector unsigned long long*) p);
2206++ vdata1 = vec_ld(16, (__vector unsigned long long*) p);
2207++
2208++ VEC_PERM(vdata0, vdata0, vdata0, vperm_const);
2209++ VEC_PERM(vdata1, vdata1, vdata1, vperm_const);
2210++
2211++ vdata2 = vec_ld(32, (__vector unsigned long long*) p);
2212++ vdata3 = vec_ld(48, (__vector unsigned long long*) p);
2213++
2214++ VEC_PERM(vdata2, vdata2, vdata2, vperm_const);
2215++ VEC_PERM(vdata3, vdata3, vdata3, vperm_const);
2216++
2217++ vdata4 = vec_ld(64, (__vector unsigned long long*) p);
2218++ vdata5 = vec_ld(80, (__vector unsigned long long*) p);
2219++
2220++ VEC_PERM(vdata4, vdata4, vdata4, vperm_const);
2221++ VEC_PERM(vdata5, vdata5, vdata5, vperm_const);
2222++
2223++ vdata6 = vec_ld(96, (__vector unsigned long long*) p);
2224++ vdata7 = vec_ld(112, (__vector unsigned long long*) p);
2225++
2226++ VEC_PERM(vdata6, vdata6, vdata6, vperm_const);
2227++ VEC_PERM(vdata7, vdata7, vdata7, vperm_const);
2228++
2229++ /* xor in initial value */
2230++ vdata0 = vec_xor(vdata0, vcrc);
2231++
2232++ p = (char *)p + 128;
2233++
2234++ do {
2235++ /* Checksum in blocks of MAX_SIZE. */
2236++ block_size = length;
2237++ if (block_size > MAX_SIZE) {
2238++ block_size = MAX_SIZE;
2239++ }
2240++
2241++ length = length - block_size;
2242++
2243++ /*
2244++ * Work out the offset into the constants table to start at. Each
2245++ * constant is 16 bytes, and it is used against 128 bytes of input
2246++ * data - 128 / 16 = 8
2247++ */
2248++ offset = (MAX_SIZE/8) - (block_size/8);
2249++ /* We reduce our final 128 bytes in a separate step */
2250++ chunks = (block_size/128)-1;
2251++
2252++ vconst1 = vec_ld(offset, vcrc_const);
2253++
2254++ va0 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata0,
2255++ (__vector unsigned long long)vconst1);
2256++ va1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata1,
2257++ (__vector unsigned long long)vconst1);
2258++ va2 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata2,
2259++ (__vector unsigned long long)vconst1);
2260++ va3 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata3,
2261++ (__vector unsigned long long)vconst1);
2262++ va4 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata4,
2263++ (__vector unsigned long long)vconst1);
2264++ va5 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata5,
2265++ (__vector unsigned long long)vconst1);
2266++ va6 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata6,
2267++ (__vector unsigned long long)vconst1);
2268++ va7 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata7,
2269++ (__vector unsigned long long)vconst1);
2270++
2271++ if (chunks > 1) {
2272++ offset += 16;
2273++ vconst2 = vec_ld(offset, vcrc_const);
2274++ GROUP_ENDING_NOP;
2275++
2276++ vdata0 = vec_ld(0, (__vector unsigned long long*) p);
2277++ VEC_PERM(vdata0, vdata0, vdata0, vperm_const);
2278++
2279++ vdata1 = vec_ld(16, (__vector unsigned long long*) p);
2280++ VEC_PERM(vdata1, vdata1, vdata1, vperm_const);
2281++
2282++ vdata2 = vec_ld(32, (__vector unsigned long long*) p);
2283++ VEC_PERM(vdata2, vdata2, vdata2, vperm_const);
2284++
2285++ vdata3 = vec_ld(48, (__vector unsigned long long*) p);
2286++ VEC_PERM(vdata3, vdata3, vdata3, vperm_const);
2287++
2288++ vdata4 = vec_ld(64, (__vector unsigned long long*) p);
2289++ VEC_PERM(vdata4, vdata4, vdata4, vperm_const);
2290++
2291++ vdata5 = vec_ld(80, (__vector unsigned long long*) p);
2292++ VEC_PERM(vdata5, vdata5, vdata5, vperm_const);
2293++
2294++ vdata6 = vec_ld(96, (__vector unsigned long long*) p);
2295++ VEC_PERM(vdata6, vdata6, vdata6, vperm_const);
2296++
2297++ vdata7 = vec_ld(112, (__vector unsigned long long*) p);
2298++ VEC_PERM(vdata7, vdata7, vdata7, vperm_const);
2299++
2300++ p = (char *)p + 128;
2301++
2302++ /*
2303++ * main loop. We modulo schedule it such that it takes three
2304++ * iterations to complete - first iteration load, second
2305++ * iteration vpmsum, third iteration xor.
2306++ */
2307++ for (i = 0; i < chunks-2; i++) {
2308++ vconst1 = vec_ld(offset, vcrc_const);
2309++ offset += 16;
2310++ GROUP_ENDING_NOP;
2311++
2312++ v0 = vec_xor(v0, va0);
2313++ va0 = __builtin_crypto_vpmsumd ((__vector unsigned long
2314++ long)vdata0, (__vector unsigned long long)vconst2);
2315++ vdata0 = vec_ld(0, (__vector unsigned long long*) p);
2316++ VEC_PERM(vdata0, vdata0, vdata0, vperm_const);
2317++ GROUP_ENDING_NOP;
2318++
2319++ v1 = vec_xor(v1, va1);
2320++ va1 = __builtin_crypto_vpmsumd ((__vector unsigned long
2321++ long)vdata1, (__vector unsigned long long)vconst2);
2322++ vdata1 = vec_ld(16, (__vector unsigned long long*) p);
2323++ VEC_PERM(vdata1, vdata1, vdata1, vperm_const);
2324++ GROUP_ENDING_NOP;
2325++
2326++ v2 = vec_xor(v2, va2);
2327++ va2 = __builtin_crypto_vpmsumd ((__vector unsigned long
2328++ long)vdata2, (__vector unsigned long long)vconst2);
2329++ vdata2 = vec_ld(32, (__vector unsigned long long*) p);
2330++ VEC_PERM(vdata2, vdata2, vdata2, vperm_const);
2331++ GROUP_ENDING_NOP;
2332++
2333++ v3 = vec_xor(v3, va3);
2334++ va3 = __builtin_crypto_vpmsumd ((__vector unsigned long
2335++ long)vdata3, (__vector unsigned long long)vconst2);
2336++ vdata3 = vec_ld(48, (__vector unsigned long long*) p);
2337++ VEC_PERM(vdata3, vdata3, vdata3, vperm_const);
2338++
2339++ vconst2 = vec_ld(offset, vcrc_const);
2340++ GROUP_ENDING_NOP;
2341++
2342++ v4 = vec_xor(v4, va4);
2343++ va4 = __builtin_crypto_vpmsumd ((__vector unsigned long
2344++ long)vdata4, (__vector unsigned long long)vconst1);
2345++ vdata4 = vec_ld(64, (__vector unsigned long long*) p);
2346++ VEC_PERM(vdata4, vdata4, vdata4, vperm_const);
2347++ GROUP_ENDING_NOP;
2348++
2349++ v5 = vec_xor(v5, va5);
2350++ va5 = __builtin_crypto_vpmsumd ((__vector unsigned long
2351++ long)vdata5, (__vector unsigned long long)vconst1);
2352++ vdata5 = vec_ld(80, (__vector unsigned long long*) p);
2353++ VEC_PERM(vdata5, vdata5, vdata5, vperm_const);
2354++ GROUP_ENDING_NOP;
2355++
2356++ v6 = vec_xor(v6, va6);
2357++ va6 = __builtin_crypto_vpmsumd ((__vector unsigned long
2358++ long)vdata6, (__vector unsigned long long)vconst1);
2359++ vdata6 = vec_ld(96, (__vector unsigned long long*) p);
2360++ VEC_PERM(vdata6, vdata6, vdata6, vperm_const);
2361++ GROUP_ENDING_NOP;
2362++
2363++ v7 = vec_xor(v7, va7);
2364++ va7 = __builtin_crypto_vpmsumd ((__vector unsigned long
2365++ long)vdata7, (__vector unsigned long long)vconst1);
2366++ vdata7 = vec_ld(112, (__vector unsigned long long*) p);
2367++ VEC_PERM(vdata7, vdata7, vdata7, vperm_const);
2368++
2369++ p = (char *)p + 128;
2370++ }
2371++
2372++ /* First cool down. */
2373++ vconst1 = vec_ld(offset, vcrc_const);
2374++ offset += 16;
2375++
2376++ v0 = vec_xor(v0, va0);
2377++ va0 = __builtin_crypto_vpmsumd ((__vector unsigned long
2378++ long)vdata0, (__vector unsigned long long)vconst1);
2379++ GROUP_ENDING_NOP;
2380++
2381++ v1 = vec_xor(v1, va1);
2382++ va1 = __builtin_crypto_vpmsumd ((__vector unsigned long
2383++ long)vdata1, (__vector unsigned long long)vconst1);
2384++ GROUP_ENDING_NOP;
2385++
2386++ v2 = vec_xor(v2, va2);
2387++ va2 = __builtin_crypto_vpmsumd ((__vector unsigned long
2388++ long)vdata2, (__vector unsigned long long)vconst1);
2389++ GROUP_ENDING_NOP;
2390++
2391++ v3 = vec_xor(v3, va3);
2392++ va3 = __builtin_crypto_vpmsumd ((__vector unsigned long
2393++ long)vdata3, (__vector unsigned long long)vconst1);
2394++ GROUP_ENDING_NOP;
2395++
2396++ v4 = vec_xor(v4, va4);
2397++ va4 = __builtin_crypto_vpmsumd ((__vector unsigned long
2398++ long)vdata4, (__vector unsigned long long)vconst1);
2399++ GROUP_ENDING_NOP;
2400++
2401++ v5 = vec_xor(v5, va5);
2402++ va5 = __builtin_crypto_vpmsumd ((__vector unsigned long
2403++ long)vdata5, (__vector unsigned long long)vconst1);
2404++ GROUP_ENDING_NOP;
2405++
2406++ v6 = vec_xor(v6, va6);
2407++ va6 = __builtin_crypto_vpmsumd ((__vector unsigned long
2408++ long)vdata6, (__vector unsigned long long)vconst1);
2409++ GROUP_ENDING_NOP;
2410++
2411++ v7 = vec_xor(v7, va7);
2412++ va7 = __builtin_crypto_vpmsumd ((__vector unsigned long
2413++ long)vdata7, (__vector unsigned long long)vconst1);
2414++ } /* if (chunks > 1) */
2415++
2416++ /* Second cool down. */
2417++ v0 = vec_xor(v0, va0);
2418++ v1 = vec_xor(v1, va1);
2419++ v2 = vec_xor(v2, va2);
2420++ v3 = vec_xor(v3, va3);
2421++ v4 = vec_xor(v4, va4);
2422++ v5 = vec_xor(v5, va5);
2423++ v6 = vec_xor(v6, va6);
2424++ v7 = vec_xor(v7, va7);
2425++
2426++#ifdef REFLECT
2427++ /*
2428++ * vpmsumd produces a 96 bit result in the least significant bits
2429++ * of the register. Since we are bit reflected we have to shift it
2430++ * left 32 bits so it occupies the least significant bits in the
2431++ * bit reflected domain.
2432++ */
2433++ v0 = (__vector unsigned long long)vec_sld((__vector unsigned char)v0,
2434++ (__vector unsigned char)vzero, 4);
2435++ v1 = (__vector unsigned long long)vec_sld((__vector unsigned char)v1,
2436++ (__vector unsigned char)vzero, 4);
2437++ v2 = (__vector unsigned long long)vec_sld((__vector unsigned char)v2,
2438++ (__vector unsigned char)vzero, 4);
2439++ v3 = (__vector unsigned long long)vec_sld((__vector unsigned char)v3,
2440++ (__vector unsigned char)vzero, 4);
2441++ v4 = (__vector unsigned long long)vec_sld((__vector unsigned char)v4,
2442++ (__vector unsigned char)vzero, 4);
2443++ v5 = (__vector unsigned long long)vec_sld((__vector unsigned char)v5,
2444++ (__vector unsigned char)vzero, 4);
2445++ v6 = (__vector unsigned long long)vec_sld((__vector unsigned char)v6,
2446++ (__vector unsigned char)vzero, 4);
2447++ v7 = (__vector unsigned long long)vec_sld((__vector unsigned char)v7,
2448++ (__vector unsigned char)vzero, 4);
2449++#endif
2450++
2451++ /* xor with the last 1024 bits. */
2452++ va0 = vec_ld(0, (__vector unsigned long long*) p);
2453++ VEC_PERM(va0, va0, va0, vperm_const);
2454++
2455++ va1 = vec_ld(16, (__vector unsigned long long*) p);
2456++ VEC_PERM(va1, va1, va1, vperm_const);
2457++
2458++ va2 = vec_ld(32, (__vector unsigned long long*) p);
2459++ VEC_PERM(va2, va2, va2, vperm_const);
2460++
2461++ va3 = vec_ld(48, (__vector unsigned long long*) p);
2462++ VEC_PERM(va3, va3, va3, vperm_const);
2463++
2464++ va4 = vec_ld(64, (__vector unsigned long long*) p);
2465++ VEC_PERM(va4, va4, va4, vperm_const);
2466++
2467++ va5 = vec_ld(80, (__vector unsigned long long*) p);
2468++ VEC_PERM(va5, va5, va5, vperm_const);
2469++
2470++ va6 = vec_ld(96, (__vector unsigned long long*) p);
2471++ VEC_PERM(va6, va6, va6, vperm_const);
2472++
2473++ va7 = vec_ld(112, (__vector unsigned long long*) p);
2474++ VEC_PERM(va7, va7, va7, vperm_const);
2475++
2476++ p = (char *)p + 128;
2477++
2478++ vdata0 = vec_xor(v0, va0);
2479++ vdata1 = vec_xor(v1, va1);
2480++ vdata2 = vec_xor(v2, va2);
2481++ vdata3 = vec_xor(v3, va3);
2482++ vdata4 = vec_xor(v4, va4);
2483++ vdata5 = vec_xor(v5, va5);
2484++ vdata6 = vec_xor(v6, va6);
2485++ vdata7 = vec_xor(v7, va7);
2486++
2487++ /* Check if we have more blocks to process */
2488++ next_block = 0;
2489++ if (length != 0) {
2490++ next_block = 1;
2491++
2492++ /* zero v0-v7 */
2493++ v0 = vec_xor(v0, v0);
2494++ v1 = vec_xor(v1, v1);
2495++ v2 = vec_xor(v2, v2);
2496++ v3 = vec_xor(v3, v3);
2497++ v4 = vec_xor(v4, v4);
2498++ v5 = vec_xor(v5, v5);
2499++ v6 = vec_xor(v6, v6);
2500++ v7 = vec_xor(v7, v7);
2501++ }
2502++ length = length + 128;
2503++
2504++ } while (next_block);
2505++
2506++ /* Calculate how many bytes we have left. */
2507++ length = (len & 127);
2508++
2509++ /* Calculate where in (short) constant table we need to start. */
2510++ offset = 128 - length;
2511++
2512++ v0 = vec_ld(offset, vcrc_short_const);
2513++ v1 = vec_ld(offset + 16, vcrc_short_const);
2514++ v2 = vec_ld(offset + 32, vcrc_short_const);
2515++ v3 = vec_ld(offset + 48, vcrc_short_const);
2516++ v4 = vec_ld(offset + 64, vcrc_short_const);
2517++ v5 = vec_ld(offset + 80, vcrc_short_const);
2518++ v6 = vec_ld(offset + 96, vcrc_short_const);
2519++ v7 = vec_ld(offset + 112, vcrc_short_const);
2520++
2521++ offset += 128;
2522++
2523++ v0 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2524++ (__vector unsigned int)vdata0,(__vector unsigned int)v0);
2525++ v1 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2526++ (__vector unsigned int)vdata1,(__vector unsigned int)v1);
2527++ v2 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2528++ (__vector unsigned int)vdata2,(__vector unsigned int)v2);
2529++ v3 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2530++ (__vector unsigned int)vdata3,(__vector unsigned int)v3);
2531++ v4 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2532++ (__vector unsigned int)vdata4,(__vector unsigned int)v4);
2533++ v5 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2534++ (__vector unsigned int)vdata5,(__vector unsigned int)v5);
2535++ v6 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2536++ (__vector unsigned int)vdata6,(__vector unsigned int)v6);
2537++ v7 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2538++ (__vector unsigned int)vdata7,(__vector unsigned int)v7);
2539++
2540++ /* Now reduce the tail (0-112 bytes). */
2541++ for (i = 0; i < length; i+=16) {
2542++ vdata0 = vec_ld(i,(__vector unsigned long long*)p);
2543++ VEC_PERM(vdata0, vdata0, vdata0, vperm_const);
2544++ va0 = vec_ld(offset + i,vcrc_short_const);
2545++ va0 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2546++ (__vector unsigned int)vdata0,(__vector unsigned int)va0);
2547++ v0 = vec_xor(v0, va0);
2548++ }
2549++
2550++ /* xor all parallel chunks together. */
2551++ v0 = vec_xor(v0, v1);
2552++ v2 = vec_xor(v2, v3);
2553++ v4 = vec_xor(v4, v5);
2554++ v6 = vec_xor(v6, v7);
2555++
2556++ v0 = vec_xor(v0, v2);
2557++ v4 = vec_xor(v4, v6);
2558++
2559++ v0 = vec_xor(v0, v4);
2560++ }
2561++
2562++ /* Barrett Reduction */
2563++ vconst1 = vec_ld(0, v_Barrett_const);
2564++ vconst2 = vec_ld(16, v_Barrett_const);
2565++
2566++ v1 = (__vector unsigned long long)vec_sld((__vector unsigned char)v0,
2567++ (__vector unsigned char)v0, 8);
2568++ v0 = vec_xor(v1,v0);
2569++
2570++#ifdef REFLECT
2571++ /* shift left one bit */
2572++ __vector unsigned char vsht_splat = vec_splat_u8 (1);
2573++ v0 = (__vector unsigned long long)vec_sll ((__vector unsigned char)v0,
2574++ vsht_splat);
2575++#endif
2576++
2577++ v0 = vec_and(v0, vmask_64bit);
2578++
2579++#ifndef REFLECT
2580++
2581++ /*
2582++ * Now for the actual algorithm. The idea is to calculate q,
2583++ * the multiple of our polynomial that we need to subtract. By
2584++ * doing the computation 2x bits higher (ie 64 bits) and shifting the
2585++ * result back down 2x bits, we round down to the nearest multiple.
2586++ */
2587++
2588++ /* ma */
2589++ v1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)v0,
2590++ (__vector unsigned long long)vconst1);
2591++ /* q = floor(ma/(2^64)) */
2592++ v1 = (__vector unsigned long long)vec_sld ((__vector unsigned char)vzero,
2593++ (__vector unsigned char)v1, 8);
2594++ /* qn */
2595++ v1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)v1,
2596++ (__vector unsigned long long)vconst2);
2597++ /* a - qn, subtraction is xor in GF(2) */
2598++ v0 = vec_xor (v0, v1);
2599++ /*
2600++ * Get the result into r3. We need to shift it left 8 bytes:
2601++ * V0 [ 0 1 2 X ]
2602++ * V0 [ 0 X 2 3 ]
2603++ */
2604++ result = __builtin_unpack_vector_1 (v0);
2605++#else
2606++
2607++ /*
2608++ * The reflected version of Barrett reduction. Instead of bit
2609++ * reflecting our data (which is expensive to do), we bit reflect our
2610++ * constants and our algorithm, which means the intermediate data in
2611++ * our vector registers goes from 0-63 instead of 63-0. We can reflect
2612++ * the algorithm because we don't carry in mod 2 arithmetic.
2613++ */
2614++
2615++ /* bottom 32 bits of a */
2616++ v1 = vec_and(v0, vmask_32bit);
2617++
2618++ /* ma */
2619++ v1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)v1,
2620++ (__vector unsigned long long)vconst1);
2621++
2622++ /* bottom 32 bits of ma */
2623++ v1 = vec_and(v1, vmask_32bit);
2624++ /* qn */
2625++ v1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)v1,
2626++ (__vector unsigned long long)vconst2);
2627++ /* a - qn, subtraction is xor in GF(2) */
2628++ v0 = vec_xor (v0, v1);
2629++
2630++ /*
2631++ * Since we are bit reflected, the result (ie the low 32 bits) is in
2632++ * the high 32 bits. We just need to shift it left 4 bytes
2633++ * V0 [ 0 1 X 3 ]
2634++ * V0 [ 0 X 2 3 ]
2635++ */
2636++
2637++ /* shift result into top 64 bits of v0 */
2638++ v0 = (__vector unsigned long long)vec_sld((__vector unsigned char)v0,
2639++ (__vector unsigned char)vzero, 4);
2640++
2641++ result = __builtin_unpack_vector_0 (v0);
2642++#endif
2643++
2644++ return result;
2645++}
2646+diff --git a/contrib/power/crc32_z_resolver.c b/contrib/power/crc32_z_resolver.c
2647+new file mode 100644
2648+index 0000000..f4e9aa4
2649+--- /dev/null
2650++++ b/contrib/power/crc32_z_resolver.c
2651+@@ -0,0 +1,15 @@
2652++/* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
2653++ * For conditions of distribution and use, see copyright notice in zlib.h
2654++ */
2655++
2656++#include "../gcc/zifunc.h"
2657++#include "power.h"
2658++
2659++Z_IFUNC(crc32_z) {
2660++#ifdef Z_POWER8
2661++ if (__builtin_cpu_supports("arch_2_07"))
2662++ return _crc32_z_power8;
2663++#endif
2664++
2665++ return crc32_z_default;
2666++}
2667+diff --git a/contrib/power/power.h b/contrib/power/power.h
2668+index b42c7d6..79123aa 100644
2669+--- a/contrib/power/power.h
2670++++ b/contrib/power/power.h
2671+@@ -2,3 +2,7 @@
2672+ * 2019 Rogerio Alves <rogerio.alves@ibm.com>, IBM
2673+ * For conditions of distribution and use, see copyright notice in zlib.h
2674+ */
2675++
2676++#include "../../zconf.h"
2677++
2678++unsigned long _crc32_z_power8(unsigned long, const Bytef *, z_size_t);
2679+diff --git a/crc32.c b/crc32.c
2680+index 6c38f5c..5589d54 100644
2681+--- a/crc32.c
2682++++ b/crc32.c
2683+@@ -691,6 +691,13 @@ local z_word_t crc_word_big(z_word_t data) {
2684+ #endif
2685+
2686+ /* ========================================================================= */
2687++#ifdef Z_POWER_OPT
2688++/* Rename function so resolver can use its symbol. The default version will be
2689++ * returned by the resolver if the host has no support for an optimized version.
2690++ */
2691++#define crc32_z crc32_z_default
2692++#endif /* Z_POWER_OPT */
2693++
2694+ unsigned long ZEXPORT crc32_z(unsigned long crc, const unsigned char FAR *buf,
2695+ z_size_t len) {
2696+ /* Return initial CRC, if requested. */
2697+@@ -1009,6 +1016,11 @@ unsigned long ZEXPORT crc32_z(unsigned long crc, const unsigned char FAR *buf,
2698+ return crc ^ 0xffffffff;
2699+ }
2700+
2701++#ifdef Z_POWER_OPT
2702++#undef crc32_z
2703++#include "contrib/power/crc32_z_resolver.c"
2704++#endif /* Z_POWER_OPT */
2705++
2706+ #endif
2707+
2708+ /* ========================================================================= */
2709+diff --git a/test/crc32_test.c b/test/crc32_test.c
2710+new file mode 100644
2711+index 0000000..3155553
2712+--- /dev/null
2713++++ b/test/crc32_test.c
2714+@@ -0,0 +1,205 @@
2715++/* crc32_test.c -- unit test for crc32 in the zlib compression library
2716++ * Copyright (C) 1995-2006, 2010, 2011, 2016, 2019 Rogerio Alves
2717++ * For conditions of distribution and use, see copyright notice in zlib.h
2718++ */
2719++
2720++#include "zlib.h"
2721++#include <stdio.h>
2722++
2723++#ifdef STDC
2724++# include <string.h>
2725++# include <stdlib.h>
2726++#endif
2727++
2728++void test_crc32 OF((uLong crc, Byte* buf, z_size_t len, uLong chk, int line));
2729++int main OF((void));
2730++
2731++typedef struct {
2732++ int line;
2733++ uLong crc;
2734++ char* buf;
2735++ int len;
2736++ uLong expect;
2737++} crc32_test;
2738++
2739++void test_crc32(crc, buf, len, chk, line)
2740++ uLong crc;
2741++ Byte *buf;
2742++ z_size_t len;
2743++ uLong chk;
2744++ int line;
2745++{
2746++ uLong res = crc32(crc, buf, len);
2747++ if (res != chk) {
2748++ fprintf(stderr, "FAIL [%d]: crc32 returned 0x%08X expected 0x%08X\n",
2749++ line, (unsigned int)res, (unsigned int)chk);
2750++ exit(1);
2751++ }
2752++}
2753++
2754++static const crc32_test tests[] = {
2755++ {__LINE__, 0x0, 0x0, 0, 0x0},
2756++ {__LINE__, 0xffffffff, 0x0, 0, 0x0},
2757++ {__LINE__, 0x0, 0x0, 255, 0x0}, /* BZ 174799. */
2758++ {__LINE__, 0x0, 0x0, 256, 0x0},
2759++ {__LINE__, 0x0, 0x0, 257, 0x0},
2760++ {__LINE__, 0x0, 0x0, 32767, 0x0},
2761++ {__LINE__, 0x0, 0x0, 32768, 0x0},
2762++ {__LINE__, 0x0, 0x0, 32769, 0x0},
2763++ {__LINE__, 0x0, "", 0, 0x0},
2764++ {__LINE__, 0xffffffff, "", 0, 0xffffffff},
2765++ {__LINE__, 0x0, "abacus", 6, 0xc3d7115b},
2766++ {__LINE__, 0x0, "backlog", 7, 0x269205},
2767++ {__LINE__, 0x0, "campfire", 8, 0x22a515f8},
2768++ {__LINE__, 0x0, "delta", 5, 0x9643fed9},
2769++ {__LINE__, 0x0, "executable", 10, 0xd68eda01},
2770++ {__LINE__, 0x0, "file", 4, 0x8c9f3610},
2771++ {__LINE__, 0x0, "greatest", 8, 0xc1abd6cd},
2772++ {__LINE__, 0x0, "hello", 5, 0x3610a686},
2773++ {__LINE__, 0x0, "inverter", 8, 0xc9e962c9},
2774++ {__LINE__, 0x0, "jigsaw", 6, 0xce4e3f69},
2775++ {__LINE__, 0x0, "karate", 6, 0x890be0e2},
2776++ {__LINE__, 0x0, "landscape", 9, 0xc4e0330b},
2777++ {__LINE__, 0x0, "machine", 7, 0x1505df84},
2778++ {__LINE__, 0x0, "nanometer", 9, 0xd4e19f39},
2779++ {__LINE__, 0x0, "oblivion", 8, 0xdae9de77},
2780++ {__LINE__, 0x0, "panama", 6, 0x66b8979c},
2781++ {__LINE__, 0x0, "quest", 5, 0x4317f817},
2782++ {__LINE__, 0x0, "resource", 8, 0xbc91f416},
2783++ {__LINE__, 0x0, "secret", 6, 0x5ca2e8e5},
2784++ {__LINE__, 0x0, "test", 4, 0xd87f7e0c},
2785++ {__LINE__, 0x0, "ultimate", 8, 0x3fc79b0b},
2786++ {__LINE__, 0x0, "vector", 6, 0x1b6e485b},
2787++ {__LINE__, 0x0, "walrus", 6, 0xbe769b97},
2788++ {__LINE__, 0x0, "xeno", 4, 0xe7a06444},
2789++ {__LINE__, 0x0, "yelling", 7, 0xfe3944e5},
2790++ {__LINE__, 0x0, "zlib", 4, 0x73887d3a},
2791++ {__LINE__, 0x0, "4BJD7PocN1VqX0jXVpWB", 20, 0xd487a5a1},
2792++ {__LINE__, 0x0, "F1rPWI7XvDs6nAIRx41l", 20, 0x61a0132e},
2793++ {__LINE__, 0x0, "ldhKlsVkPFOveXgkGtC2", 20, 0xdf02f76},
2794++ {__LINE__, 0x0, "5KKnGOOrs8BvJ35iKTOS", 20, 0x579b2b0a},
2795++ {__LINE__, 0x0, "0l1tw7GOcem06Ddu7yn4", 20, 0xf7d16e2d},
2796++ {__LINE__, 0x0, "MCr47CjPIn9R1IvE1Tm5", 20, 0x731788f5},
2797++ {__LINE__, 0x0, "UcixbzPKTIv0SvILHVdO", 20, 0x7112bb11},
2798++ {__LINE__, 0x0, "dGnAyAhRQDsWw0ESou24", 20, 0xf32a0dac},
2799++ {__LINE__, 0x0, "di0nvmY9UYMYDh0r45XT", 20, 0x625437bb},
2800++ {__LINE__, 0x0, "2XKDwHfAhFsV0RhbqtvH", 20, 0x896930f9},
2801++ {__LINE__, 0x0, "ZhrANFIiIvRnqClIVyeD", 20, 0x8579a37},
2802++ {__LINE__, 0x0, "v7Q9ehzioTOVeDIZioT1", 20, 0x632aa8e0},
2803++ {__LINE__, 0x0, "Yod5hEeKcYqyhfXbhxj2", 20, 0xc829af29},
2804++ {__LINE__, 0x0, "GehSWY2ay4uUKhehXYb0", 20, 0x1b08b7e8},
2805++ {__LINE__, 0x0, "kwytJmq6UqpflV8Y8GoE", 20, 0x4e33b192},
2806++ {__LINE__, 0x0, "70684206568419061514", 20, 0x59a179f0},
2807++ {__LINE__, 0x0, "42015093765128581010", 20, 0xcd1013d7},
2808++ {__LINE__, 0x0, "88214814356148806939", 20, 0xab927546},
2809++ {__LINE__, 0x0, "43472694284527343838", 20, 0x11f3b20c},
2810++ {__LINE__, 0x0, "49769333513942933689", 20, 0xd562d4ca},
2811++ {__LINE__, 0x0, "54979784887993251199", 20, 0x233395f7},
2812++ {__LINE__, 0x0, "58360544869206793220", 20, 0x2d167fd5},
2813++ {__LINE__, 0x0, "27347953487840714234", 20, 0x8b5108ba},
2814++ {__LINE__, 0x0, "07650690295365319082", 20, 0xc46b3cd8},
2815++ {__LINE__, 0x0, "42655507906821911703", 20, 0xc10b2662},
2816++ {__LINE__, 0x0, "29977409200786225655", 20, 0xc9a0f9d2},
2817++ {__LINE__, 0x0, "85181542907229116674", 20, 0x9341357b},
2818++ {__LINE__, 0x0, "87963594337989416799", 20, 0xf0424937},
2819++ {__LINE__, 0x0, "21395988329504168551", 20, 0xd7c4c31f},
2820++ {__LINE__, 0x0, "51991013580943379423", 20, 0xf11edcc4},
2821++ {__LINE__, 0x0, "*]+@!);({_$;}[_},?{?;(_?,=-][@", 30, 0x40795df4},
2822++ {__LINE__, 0x0, "_@:_).&(#.[:[{[:)$++-($_;@[)}+", 30, 0xdd61a631},
2823++ {__LINE__, 0x0, "&[!,[$_==}+.]@!;*(+},[;:)$;)-@", 30, 0xca907a99},
2824++ {__LINE__, 0x0, "]{.[.+?+[[=;[?}_#&;[=)__$$:+=_", 30, 0xf652deac},
2825++ {__LINE__, 0x0, "-%.)=/[@].:.(:,()$;=%@-$?]{%+%", 30, 0xaf39a5a9},
2826++ {__LINE__, 0x0, "+]#$(@&.=:,*];/.!]%/{:){:@(;)$", 30, 0x6bebb4cf},
2827++ {__LINE__, 0x0, ")-._.:?[&:.=+}(*$/=!.${;(=$@!}", 30, 0x76430bac},
2828++ {__LINE__, 0x0, ":(_*&%/[[}+,?#$&*+#[([*-/#;%(]", 30, 0x6c80c388},
2829++ {__LINE__, 0x0, "{[#-;:$/{)(+[}#]/{&!%(@)%:@-$:", 30, 0xd54d977d},
2830++ {__LINE__, 0x0, "_{$*,}(&,@.)):=!/%(&(,,-?$}}}!", 30, 0xe3966ad5},
2831++ {__LINE__, 0x0, "e$98KNzqaV)Y:2X?]77].{gKRD4G5{mHZk,Z)SpU%L3FSgv!Wb8MLAFdi{+fp)c,@8m6v)yXg@]HBDFk?.4&}g5_udE*JHCiH=aL", 100, 0xe7c71db9},
2832++ {__LINE__, 0x0, "r*Fd}ef+5RJQ;+W=4jTR9)R*p!B;]Ed7tkrLi;88U7g@3v!5pk2X6D)vt,.@N8c]@yyEcKi[vwUu@.Ppm@C6%Mv*3Nw}Y,58_aH)", 100, 0xeaa52777},
2833++ {__LINE__, 0x0, "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&", 100, 0xcd472048},
2834++ {__LINE__, 0x7a30360d, "abacus", 6, 0xf8655a84},
2835++ {__LINE__, 0x6fd767ee, "backlog", 7, 0x1ed834b1},
2836++ {__LINE__, 0xefeb7589, "campfire", 8, 0x686cfca},
2837++ {__LINE__, 0x61cf7e6b, "delta", 5, 0x1554e4b1},
2838++ {__LINE__, 0xdc712e2, "executable", 10, 0x761b4254},
2839++ {__LINE__, 0xad23c7fd, "file", 4, 0x7abdd09b},
2840++ {__LINE__, 0x85cb2317, "greatest", 8, 0x4ba91c6b},
2841++ {__LINE__, 0x9eed31b0, "inverter", 8, 0xd5e78ba5},
2842++ {__LINE__, 0xb94f34ca, "jigsaw", 6, 0x23649109},
2843++ {__LINE__, 0xab058a2, "karate", 6, 0xc5591f41},
2844++ {__LINE__, 0x5bff2b7a, "landscape", 9, 0xf10eb644},
2845++ {__LINE__, 0x605c9a5f, "machine", 7, 0xbaa0a636},
2846++ {__LINE__, 0x51bdeea5, "nanometer", 9, 0x6af89afb},
2847++ {__LINE__, 0x85c21c79, "oblivion", 8, 0xecae222b},
2848++ {__LINE__, 0x97216f56, "panama", 6, 0x47dffac4},
2849++ {__LINE__, 0x18444af2, "quest", 5, 0x70c2fe36},
2850++ {__LINE__, 0xbe6ce359, "resource", 8, 0x1471d925},
2851++ {__LINE__, 0x843071f1, "secret", 6, 0x50c9a0db},
2852++ {__LINE__, 0xf2480c60, "ultimate", 8, 0xf973daf8},
2853++ {__LINE__, 0x2d2feb3d, "vector", 6, 0x344ac03d},
2854++ {__LINE__, 0x7490310a, "walrus", 6, 0x6d1408ef},
2855++ {__LINE__, 0x97d247d4, "xeno", 4, 0xe62670b5},
2856++ {__LINE__, 0x93cf7599, "yelling", 7, 0x1b36da38},
2857++ {__LINE__, 0x73c84278, "zlib", 4, 0x6432d127},
2858++ {__LINE__, 0x228a87d1, "4BJD7PocN1VqX0jXVpWB", 20, 0x997107d0},
2859++ {__LINE__, 0xa7a048d0, "F1rPWI7XvDs6nAIRx41l", 20, 0xdc567274},
2860++ {__LINE__, 0x1f0ded40, "ldhKlsVkPFOveXgkGtC2", 20, 0xdcc63870},
2861++ {__LINE__, 0xa804a62f, "5KKnGOOrs8BvJ35iKTOS", 20, 0x6926cffd},
2862++ {__LINE__, 0x508fae6a, "0l1tw7GOcem06Ddu7yn4", 20, 0xb52b38bc},
2863++ {__LINE__, 0xe5adaf4f, "MCr47CjPIn9R1IvE1Tm5", 20, 0xf83b8178},
2864++ {__LINE__, 0x67136a40, "UcixbzPKTIv0SvILHVdO", 20, 0xc5213070},
2865++ {__LINE__, 0xb00c4a10, "dGnAyAhRQDsWw0ESou24", 20, 0xbc7648b0},
2866++ {__LINE__, 0x2e0c84b5, "di0nvmY9UYMYDh0r45XT", 20, 0xd8123a72},
2867++ {__LINE__, 0x81238d44, "2XKDwHfAhFsV0RhbqtvH", 20, 0xd5ac5620},
2868++ {__LINE__, 0xf853aa92, "ZhrANFIiIvRnqClIVyeD", 20, 0xceae099d},
2869++ {__LINE__, 0x5a692325, "v7Q9ehzioTOVeDIZioT1", 20, 0xb07d2b24},
2870++ {__LINE__, 0x3275b9f, "Yod5hEeKcYqyhfXbhxj2", 20, 0x24ce91df},
2871++ {__LINE__, 0x38371feb, "GehSWY2ay4uUKhehXYb0", 20, 0x707b3b30},
2872++ {__LINE__, 0xafc8bf62, "kwytJmq6UqpflV8Y8GoE", 20, 0x16abc6a9},
2873++ {__LINE__, 0x9b07db73, "70684206568419061514", 20, 0xae1fb7b7},
2874++ {__LINE__, 0xe75b214, "42015093765128581010", 20, 0xd4eecd2d},
2875++ {__LINE__, 0x72d0fe6f, "88214814356148806939", 20, 0x4660ec7},
2876++ {__LINE__, 0xf857a4b1, "43472694284527343838", 20, 0xfd8afdf7},
2877++ {__LINE__, 0x54b8e14, "49769333513942933689", 20, 0xc6d1b5f2},
2878++ {__LINE__, 0xd6aa5616, "54979784887993251199", 20, 0x32476461},
2879++ {__LINE__, 0x11e63098, "58360544869206793220", 20, 0xd917cf1a},
2880++ {__LINE__, 0xbe92385, "27347953487840714234", 20, 0x4ad14a12},
2881++ {__LINE__, 0x49511de0, "07650690295365319082", 20, 0xe37b5c6c},
2882++ {__LINE__, 0x3db13bc1, "42655507906821911703", 20, 0x7cc497f1},
2883++ {__LINE__, 0xbb899bea, "29977409200786225655", 20, 0x99781bb2},
2884++ {__LINE__, 0xf6cd9436, "85181542907229116674", 20, 0x132256a1},
2885++ {__LINE__, 0x9109e6c3, "87963594337989416799", 20, 0xbfdb2c83},
2886++ {__LINE__, 0x75770fc, "21395988329504168551", 20, 0x8d9d1e81},
2887++ {__LINE__, 0x69b1d19b, "51991013580943379423", 20, 0x7b6d4404},
2888++ {__LINE__, 0xc6132975, "*]+@!);({_$;}[_},?{?;(_?,=-][@", 30, 0x8619f010},
2889++ {__LINE__, 0xd58cb00c, "_@:_).&(#.[:[{[:)$++-($_;@[)}+", 30, 0x15746ac3},
2890++ {__LINE__, 0xb63b8caa, "&[!,[$_==}+.]@!;*(+},[;:)$;)-@", 30, 0xaccf812f},
2891++ {__LINE__, 0x8a45a2b8, "]{.[.+?+[[=;[?}_#&;[=)__$$:+=_", 30, 0x78af45de},
2892++ {__LINE__, 0xcbe95b78, "-%.)=/[@].:.(:,()$;=%@-$?]{%+%", 30, 0x25b06b59},
2893++ {__LINE__, 0x4ef8a54b, "+]#$(@&.=:,*];/.!]%/{:){:@(;)$", 30, 0x4ba0d08f},
2894++ {__LINE__, 0x76ad267a, ")-._.:?[&:.=+}(*$/=!.${;(=$@!}", 30, 0xe26b6aac},
2895++ {__LINE__, 0x569e613c, ":(_*&%/[[}+,?#$&*+#[([*-/#;%(]", 30, 0x7e2b0a66},
2896++ {__LINE__, 0x36aa61da, "{[#-;:$/{)(+[}#]/{&!%(@)%:@-$:", 30, 0xb3430dc7},
2897++ {__LINE__, 0xf67222df, "_{$*,}(&,@.)):=!/%(&(,,-?$}}}!", 30, 0x626c17a},
2898++ {__LINE__, 0x74b34fd3, "e$98KNzqaV)Y:2X?]77].{gKRD4G5{mHZk,Z)SpU%L3FSgv!Wb8MLAFdi{+fp)c,@8m6v)yXg@]HBDFk?.4&}g5_udE*JHCiH=aL", 100, 0xccf98060},
2899++ {__LINE__, 0x351fd770, "r*Fd}ef+5RJQ;+W=4jTR9)R*p!B;]Ed7tkrLi;88U7g@3v!5pk2X6D)vt,.@N8c]@yyEcKi[vwUu@.Ppm@C6%Mv*3Nw}Y,58_aH)", 100, 0xd8b95312},
2900++ {__LINE__, 0xc45aef77, "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&", 100, 0xbb1c9912},
2901++ {__LINE__, 0xc45aef77, "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2902++ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2903++ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2904++ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2905++ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2906++ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&", 600, 0x888AFA5B}
2907++};
2908++
2909++static const int test_size = sizeof(tests) / sizeof(tests[0]);
2910++
2911++int main(void)
2912++{
2913++ int i;
2914++ for (i = 0; i < test_size; i++) {
2915++ test_crc32(tests[i].crc, (Byte*) tests[i].buf, tests[i].len,
2916++ tests[i].expect, tests[i].line);
2917++ }
2918++ return 0;
2919++}
2920diff --git a/debian/patches/power/fix-clang7-builtins.patch b/debian/patches/power/fix-clang7-builtins.patch
2921new file mode 100644
2922index 0000000..0ed510f
2923--- /dev/null
2924+++ b/debian/patches/power/fix-clang7-builtins.patch
2925@@ -0,0 +1,62 @@
2926+From: Manjunath S Matti <mmatti@linux.ibm.com>
2927+Date: Thu, 14 Sep 2023 06:45:31 -0500
2928+Subject: Fix clang's behavior on versions >= 7
2929+
2930+Clang 7 changed the behavior of vec_xxpermdi in order to match GCC's
2931+behavior. After this change, code that used to work on Clang 6 stopped
2932+working on Clang >= 7.
2933+
2934+Tested on Clang 6, 7, 8 and 9.
2935+
2936+Reference: https://bugs.llvm.org/show_bug.cgi?id=38192
2937+
2938+Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2939+Signed-off-by: Manjunath Matti <mmatti@linux.ibm.com>
2940+
2941+Origin: iii-i/zlib, https://github.com/iii-i/zlib/commit/8aca10a8a5ddb397854eb9a443f29658d3e3e12e
2942+---
2943+ contrib/power/clang_workaround.h | 15 ++++++++++-----
2944+ 1 file changed, 10 insertions(+), 5 deletions(-)
2945+
2946+diff --git a/contrib/power/clang_workaround.h b/contrib/power/clang_workaround.h
2947+index b5e7dae..915f7e5 100644
2948+--- a/contrib/power/clang_workaround.h
2949++++ b/contrib/power/clang_workaround.h
2950+@@ -39,7 +39,12 @@ __vector unsigned long long __builtin_pack_vector (unsigned long __a,
2951+ return __v;
2952+ }
2953+
2954+-#ifndef vec_xxpermdi
2955++/*
2956++ * Clang 7 changed the behavior of vec_xxpermdi in order to provide the same
2957++ * behavior as GCC. That means code adapted to Clang >= 7 does not work on
2958++ * Clang <= 6. So, fall back to __builtin_unpack_vector() on Clang <= 6.
2959++ */
2960++#if !defined vec_xxpermdi || __clang_major__ <= 6
2961+
2962+ static inline
2963+ unsigned long __builtin_unpack_vector (__vector unsigned long long __v,
2964+@@ -62,9 +67,9 @@ static inline
2965+ unsigned long __builtin_unpack_vector_0 (__vector unsigned long long __v)
2966+ {
2967+ #if defined(__BIG_ENDIAN__)
2968+- return vec_xxpermdi(__v, __v, 0x0)[1];
2969+- #else
2970+ return vec_xxpermdi(__v, __v, 0x0)[0];
2971++ #else
2972++ return vec_xxpermdi(__v, __v, 0x3)[0];
2973+ #endif
2974+ }
2975+
2976+@@ -72,9 +77,9 @@ static inline
2977+ unsigned long __builtin_unpack_vector_1 (__vector unsigned long long __v)
2978+ {
2979+ #if defined(__BIG_ENDIAN__)
2980+- return vec_xxpermdi(__v, __v, 0x3)[1];
2981+- #else
2982+ return vec_xxpermdi(__v, __v, 0x3)[0];
2983++ #else
2984++ return vec_xxpermdi(__v, __v, 0x0)[0];
2985+ #endif
2986+ }
2987+ #endif /* vec_xxpermdi */
2988diff --git a/debian/patches/power/indirect-func-macros.patch b/debian/patches/power/indirect-func-macros.patch
2989new file mode 100644
2990index 0000000..c2976d8
2991--- /dev/null
2992+++ b/debian/patches/power/indirect-func-macros.patch
2993@@ -0,0 +1,295 @@
2994+From: Manjunath S Matti <mmatti@linux.ibm.com>
2995+Date: Thu, 14 Sep 2023 06:15:57 -0500
2996+Subject: Preparation for Power optimizations
2997+
2998+Optimized functions for Power will make use of GNU indirect functions,
2999+an extension to support different implementations of the same function,
3000+which can be selected during runtime. This will be used to provide
3001+optimized functions for different processor versions.
3002+
3003+Since this is a GNU extension, we placed the definition of the Z_IFUNC
3004+macro under `contrib/gcc`. This can be reused by other archs as well.
3005+
3006+Author: Matheus Castanho <msc@linux.ibm.com>
3007+Author: Rogerio Alves <rcardoso@linux.ibm.com>
3008+Signed-off-by: Manjunath Matti <mmatti@linux.ibm.com>
3009+
3010+Origin: iii-i/zlib, https://github.com/iii-i/zlib/commit/096441298ecd1c123f1d37c2b34d6b6bb3c42e93
3011+---
3012+ CMakeLists.txt | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++
3013+ configure | 66 ++++++++++++++++++++++++++++++++++++++++++++++
3014+ contrib/README.contrib | 8 ++++++
3015+ contrib/gcc/zifunc.h | 60 ++++++++++++++++++++++++++++++++++++++++++
3016+ contrib/power/power.h | 4 +++
3017+ 5 files changed, 209 insertions(+)
3018+ create mode 100644 contrib/gcc/zifunc.h
3019+ create mode 100644 contrib/power/power.h
3020+
3021+diff --git a/CMakeLists.txt b/CMakeLists.txt
3022+index 7f1b69f..4456cd7 100644
3023+--- a/CMakeLists.txt
3024++++ b/CMakeLists.txt
3025+@@ -5,6 +5,8 @@ project(zlib C)
3026+
3027+ set(VERSION "1.3")
3028+
3029++option(POWER "Enable building power implementation")
3030++
3031+ set(INSTALL_BIN_DIR "${CMAKE_INSTALL_PREFIX}/bin" CACHE PATH "Installation directory for executables")
3032+ set(INSTALL_LIB_DIR "${CMAKE_INSTALL_PREFIX}/lib" CACHE PATH "Installation directory for libraries")
3033+ set(INSTALL_INC_DIR "${CMAKE_INSTALL_PREFIX}/include" CACHE PATH "Installation directory for headers")
3034+@@ -126,6 +128,75 @@ if(NOT MINGW)
3035+ )
3036+ endif()
3037+
3038++if(CMAKE_COMPILER_IS_GNUCC)
3039++
3040++ # test to see if we can use a GNU indirect function to detect and load optimized code at runtime
3041++ CHECK_C_SOURCE_COMPILES("
3042++ static int test_ifunc_native(void)
3043++ {
3044++ return 1;
3045++ }
3046++ static int (*(check_ifunc_native(void)))(void)
3047++ {
3048++ return test_ifunc_native;
3049++ }
3050++ int test_ifunc(void) __attribute__ ((ifunc (\"check_ifunc_native\")));
3051++ int main(void)
3052++ {
3053++ return 0;
3054++ }
3055++ " HAS_C_ATTR_IFUNC)
3056++
3057++ if(HAS_C_ATTR_IFUNC)
3058++ add_definitions(-DHAVE_IFUNC)
3059++ set(ZLIB_PRIVATE_HDRS ${ZLIB_PRIVATE_HDRS} contrib/gcc/zifunc.h)
3060++ endif()
3061++
3062++ if(POWER)
3063++ # Test to see if we can use the optimizations for Power
3064++ CHECK_C_SOURCE_COMPILES("
3065++ #ifndef _ARCH_PPC
3066++ #error \"Target is not Power\"
3067++ #endif
3068++ #ifndef __BUILTIN_CPU_SUPPORTS__
3069++ #error \"Target doesn't support __builtin_cpu_supports()\"
3070++ #endif
3071++ int main() { return 0; }
3072++ " HAS_POWER_SUPPORT)
3073++
3074++ if(HAS_POWER_SUPPORT AND HAS_C_ATTR_IFUNC)
3075++ add_definitions(-DZ_POWER_OPT)
3076++
3077++ set(CMAKE_REQUIRED_FLAGS -mcpu=power8)
3078++ CHECK_C_SOURCE_COMPILES("int main(void){return 0;}" POWER8)
3079++
3080++ if(POWER8)
3081++ add_definitions(-DZ_POWER8)
3082++ set(ZLIB_POWER8 )
3083++
3084++ set_source_files_properties(
3085++ ${ZLIB_POWER8}
3086++ PROPERTIES COMPILE_FLAGS -mcpu=power8)
3087++ endif()
3088++
3089++ set(CMAKE_REQUIRED_FLAGS -mcpu=power9)
3090++ CHECK_C_SOURCE_COMPILES("int main(void){return 0;}" POWER9)
3091++
3092++ if(POWER9)
3093++ add_definitions(-DZ_POWER9)
3094++ set(ZLIB_POWER9 )
3095++
3096++ set_source_files_properties(
3097++ ${ZLIB_POWER9}
3098++ PROPERTIES COMPILE_FLAGS -mcpu=power9)
3099++ endif()
3100++
3101++ set(ZLIB_PRIVATE_HDRS ${ZLIB_PRIVATE_HDRS} contrib/power/power.h)
3102++ set(ZLIB_SRCS ${ZLIB_SRCS} ${ZLIB_POWER8} ${ZLIB_POWER9})
3103++ endif()
3104++ endif()
3105++endif()
3106++
3107+ # parse the full version number from zlib.h and include in ZLIB_FULL_VERSION
3108+ file(READ ${CMAKE_CURRENT_SOURCE_DIR}/zlib.h _zlib_h_contents)
3109+ string(REGEX REPLACE ".*#define[ \t]+ZLIB_VERSION[ \t]+\"([-0-9A-Za-z.]+)\".*"
3110+diff --git a/configure b/configure
3111+index cc867c9..e307a8d 100755
3112+--- a/configure
3113++++ b/configure
3114+@@ -834,6 +834,72 @@ EOF
3115+ fi
3116+ fi
3117+
3118++# test to see if we can use a GNU indirect function to detect and load optimized code at runtime
3119++echo >> configure.log
3120++cat > $test.c <<EOF
3121++static int test_ifunc_native(void)
3122++{
3123++ return 1;
3124++}
3125++
3126++static int (*(check_ifunc_native(void)))(void)
3127++{
3128++ return test_ifunc_native;
3129++}
3130++
3131++int test_ifunc(void) __attribute__ ((ifunc ("check_ifunc_native")));
3132++EOF
3133++
3134++if tryboth $CC -c $CFLAGS $test.c; then
3135++ SFLAGS="${SFLAGS} -DHAVE_IFUNC"
3136++ CFLAGS="${CFLAGS} -DHAVE_IFUNC"
3137++ echo "Checking for attribute(ifunc) support... Yes." | tee -a configure.log
3138++else
3139++ echo "Checking for attribute(ifunc) support... No." | tee -a configure.log
3140++fi
3141++
3142++# Test to see if we can use the optimizations for Power
3143++echo >> configure.log
3144++cat > $test.c <<EOF
3145++#ifndef _ARCH_PPC
3146++ #error "Target is not Power"
3147++#endif
3148++#ifndef HAVE_IFUNC
3149++ #error "Target doesn't support ifunc"
3150++#endif
3151++#ifndef __BUILTIN_CPU_SUPPORTS__
3152++ #error "Target doesn't support __builtin_cpu_supports()"
3153++#endif
3154++EOF
3155++
3156++if tryboth $CC -c $CFLAGS $test.c; then
3157++ echo "int main(void){return 0;}" > $test.c
3158++
3159++ if tryboth $CC -c $CFLAGS -mcpu=power8 $test.c; then
3160++ POWER8="-DZ_POWER8"
3161++ PIC_OBJC="${PIC_OBJC}"
3162++ OBJC="${OBJC}"
3163++ echo "Checking for -mcpu=power8 support... Yes." | tee -a configure.log
3164++ else
3165++ echo "Checking for -mcpu=power8 support... No." | tee -a configure.log
3166++ fi
3167++
3168++ if tryboth $CC -c $CFLAGS -mcpu=power9 $test.c; then
3169++ POWER9="-DZ_POWER9"
3170++ PIC_OBJC="${PIC_OBJC}"
3171++ OBJC="${OBJC}"
3172++ echo "Checking for -mcpu=power9 support... Yes." | tee -a configure.log
3173++ else
3174++ echo "Checking for -mcpu=power9 support... No." | tee -a configure.log
3175++ fi
3176++
3177++ SFLAGS="${SFLAGS} ${POWER8} ${POWER9} -DZ_POWER_OPT"
3178++ CFLAGS="${CFLAGS} ${POWER8} ${POWER9} -DZ_POWER_OPT"
3179++ echo "Checking for Power optimizations support... Yes." | tee -a configure.log
3180++else
3181++ echo "Checking for Power optimizations support... No." | tee -a configure.log
3182++fi
3183++
3184+ # show the results in the log
3185+ echo >> configure.log
3186+ echo ALL = $ALL >> configure.log
3187+diff --git a/contrib/README.contrib b/contrib/README.contrib
3188+index 5e5f950..c57b520 100644
3189+--- a/contrib/README.contrib
3190++++ b/contrib/README.contrib
3191+@@ -11,6 +11,10 @@ ada/ by Dmitriy Anisimkov <anisimkov@yahoo.com>
3192+ blast/ by Mark Adler <madler@alumni.caltech.edu>
3193+ Decompressor for output of PKWare Data Compression Library (DCL)
3194+
3195++gcc/ by Matheus Castanho <msc@linux.ibm.com>
3196++ and Rogerio Alves <rcardoso@linux.ibm.com>
3197++ Optimization helpers using GCC-specific extensions
3198++
3199+ delphi/ by Cosmin Truta <cosmint@cs.ubbcluj.ro>
3200+ Support for Delphi and C++ Builder
3201+
3202+@@ -42,6 +46,10 @@ minizip/ by Gilles Vollant <info@winimage.com>
3203+ pascal/ by Bob Dellaca <bobdl@xtra.co.nz> et al.
3204+ Support for Pascal
3205+
3206++power/ by Matheus Castanho <msc@linux.ibm.com>
3207++ and Rogerio Alves <rcardoso@linux.ibm.com>
3208++ Optimized functions for Power processors
3209++
3210+ puff/ by Mark Adler <madler@alumni.caltech.edu>
3211+ Small, low memory usage inflate. Also serves to provide an
3212+ unambiguous description of the deflate format.
3213+diff --git a/contrib/gcc/zifunc.h b/contrib/gcc/zifunc.h
3214+new file mode 100644
3215+index 0000000..daf4fe4
3216+--- /dev/null
3217++++ b/contrib/gcc/zifunc.h
3218+@@ -0,0 +1,60 @@
3219++/* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
3220++ * 2019 Rogerio Alves <rogerio.alves@ibm.com>, IBM
3221++ * For conditions of distribution and use, see copyright notice in zlib.h
3222++ */
3223++
3224++#ifndef Z_IFUNC_H_
3225++#define Z_IFUNC_H_
3226++
3227++/* Helpers for arch optimizations */
3228++
3229++#define Z_IFUNC(fname) \
3230++ typeof(fname) fname __attribute__ ((ifunc (#fname "_resolver"))); \
3231++ local typeof(fname) *fname##_resolver(void)
3232++/* This is a helper macro to declare a resolver for an indirect function
3233++ * (ifunc). Let's say you have function
3234++ *
3235++ * int foo (int a);
3236++ *
3237++ * for which you want to provide different implementations, for example:
3238++ *
3239++ * int foo_clever (int a) {
3240++ * ... clever things ...
3241++ * }
3242++ *
3243++ * int foo_smart (int a) {
3244++ * ... smart things ...
3245++ * }
3246++ *
3247++ * You will have to declare foo() as an indirect function and also provide a
3248++ * resolver for it, to choose between foo_clever() and foo_smart() based on
3249++ * some criteria you define (e.g. processor features).
3250++ *
3251++ * Since most likely foo() has a default implementation somewhere in zlib, you
3252++ * may have to rename it so the 'foo' symbol can be used by the ifunc without
3253++ * conflicts.
3254++ *
3255++ * #define foo foo_default
3256++ * int foo (int a) {
3257++ * ...
3258++ * }
3259++ * #undef foo
3260++ *
3261++ * Now you just have to provide a resolver function to choose which function
3262++ * should be used (decided at runtime on the first call to foo()):
3263++ *
3264++ * Z_IFUNC(foo) {
3265++ * if (... some condition ...)
3266++ * return foo_clever;
3267++ *
3268++ * if (... other condition ...)
3269++ * return foo_smart;
3270++ *
3271++ * return foo_default;
3272++ * }
3273++ *
3274++ * All calls to foo() throughout the code can remain untouched, all the magic
3275++ * will be done by the linker using the resolver function.
3276++ */
3277++
3278++#endif /* Z_IFUNC_H_ */
3279+diff --git a/contrib/power/power.h b/contrib/power/power.h
3280+new file mode 100644
3281+index 0000000..b42c7d6
3282+--- /dev/null
3283++++ b/contrib/power/power.h
3284+@@ -0,0 +1,4 @@
3285++/* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
3286++ * 2019 Rogerio Alves <rogerio.alves@ibm.com>, IBM
3287++ * For conditions of distribution and use, see copyright notice in zlib.h
3288++ */
3289diff --git a/debian/patches/s390x/add-accel-deflate.patch b/debian/patches/s390x/add-accel-deflate.patch
3290new file mode 100644
3291index 0000000..1ae9be6
3292--- /dev/null
3293+++ b/debian/patches/s390x/add-accel-deflate.patch
3294@@ -0,0 +1,2043 @@
3295+From: Ilya Leoshkevich <iii@linux.ibm.com>
3296+Date: Wed, 18 Jul 2018 13:14:07 +0200
3297+Subject: Add support for IBM Z hardware-accelerated deflate
3298+
3299+IBM Z mainframes starting from version z15 provide the DFLTCC
3300+instruction, which implements the deflate algorithm in hardware with
3301+estimated compression and decompression performance orders of magnitude
3302+faster than the current zlib and a ratio comparable to that of level 1.
3303+
3304+This patch adds DFLTCC support to zlib. It can be enabled using the
3305+following build commands:
3306+
3307+ $ ./configure --dfltcc
3308+ $ make
3309+
3310+When built like this, zlib would compress in hardware on level 1, and
3311+in software on all other levels. Decompression will always happen in
3312+hardware. In order to enable DFLTCC compression for levels 1-6 (i.e.,
3313+to make it used by default) one could either configure with
3314+`--dfltcc-level-mask=0x7e` or `export DFLTCC_LEVEL_MASK=0x7e` at run
3315+time.
3316+
3317+Two DFLTCC compression calls produce the same results only when they
3318+both are made on machines of the same generation, and when the
3319+respective buffers have the same offset relative to the start of the
3320+page. Therefore care should be taken when using hardware compression
3321+when reproducible results are desired. One such use case - reproducible
3322+software builds - is handled explicitly: when the `SOURCE_DATE_EPOCH`
3323+environment variable is set, the hardware compression is disabled.
3324+
3325+DFLTCC does not support every single zlib feature, in particular:
3326+
3327+ * `inflate(Z_BLOCK)` and `inflate(Z_TREES)`
3328+ * `inflateMark()`
3329+ * `inflatePrime()`
3330+ * `inflateSyncPoint()`
3331+
3332+When used, these functions will either switch to software, or, in case
3333+this is not possible, gracefully fail.
3334+
3335+This patch tries to add DFLTCC support in the least intrusive way.
3336+All SystemZ-specific code is placed into a separate file, but
3337+unfortunately there is still a noticeable amount of changes in the
3338+main zlib code. Below is the summary of these changes.
3339+
3340+DFLTCC takes as arguments a parameter block, an input buffer, an output
3341+buffer and a window. Since DFLTCC requires the parameter block to be
3342+doubleword-aligned, and it's reasonable to allocate it alongside the
3343+deflate and inflate states, the `ZALLOC_STATE()`, `ZFREE_STATE()` and
3344+`ZCOPY_STATE()` macros are introduced in order to encapsulate the
3345+allocation details. The same is true for the window, for which
3346+the `ZALLOC_WINDOW()` and `TRY_FREE_WINDOW()` macros are introduced.
3347+
3348+Software and hardware window formats do not match, therefore,
3349+`deflateSetDictionary()`, `deflateGetDictionary()`,
3350+`inflateSetDictionary()` and `inflateGetDictionary()` need special
3351+handling, which is triggered using the new
3352+`DEFLATE_SET_DICTIONARY_HOOK()`, `DEFLATE_GET_DICTIONARY_HOOK()`,
3353+`INFLATE_SET_DICTIONARY_HOOK()` and `INFLATE_GET_DICTIONARY_HOOK()`
3354+macros.
3355+
3356+`deflateResetKeep()` and `inflateResetKeep()` now update the DFLTCC
3357+parameter block, which is allocated alongside zlib state, using
3358+the new `DEFLATE_RESET_KEEP_HOOK()` and `INFLATE_RESET_KEEP_HOOK()`
3359+macros.
3360+
3361+The new `DEFLATE_PARAMS_HOOK()` macro switches between the hardware
3362+and the software deflate implementations when the `deflateParams()`
3363+arguments demand this.
3364+
3365+The new `INFLATE_PRIME_HOOK()`, `INFLATE_MARK_HOOK()` and
3366+`INFLATE_SYNC_POINT_HOOK()` macros make the respective unsupported
3367+calls gracefully fail.
3368+
3369+The algorithm implemented in hardware has a different compression
3370+ratio than the one implemented in software. In order for
3371+`deflateBound()` to return the correct results for the hardware
3372+implementation, the new `DEFLATE_BOUND_ADJUST_COMPLEN()` and
3373+`DEFLATE_NEED_CONSERVATIVE_BOUND()` macros are introduced.
3374+
3375+Actual compression and decompression are handled by the new
3376+`DEFLATE_HOOK()` and `INFLATE_TYPEDO_HOOK()` macros. Since inflation
3377+with DFLTCC manages the window on its own, calling `updatewindow()` is
3378+suppressed using the new `INFLATE_NEED_UPDATEWINDOW()` macro.
3379+
3380+In addition to the compression, DFLTCC computes the CRC-32 and Adler-32
3381+checksums, therefore, whenever it's used, the software checksumming is
3382+suppressed using the new `DEFLATE_NEED_CHECKSUM()` and
3383+`INFLATE_NEED_CHECKSUM()` macros.
3384+
3385+DFLTCC will refuse to write an End-of-block Symbol if there is no input
3386+data, thus in some cases it is necessary to do this manually. In order
3387+to achieve this, `send_bits()`, `bi_reverse()`, `bi_windup()` and
3388+`flush_pending()` are promoted from `local` to `ZLIB_INTERNAL`.
3389+Furthermore, since the block and the stream termination must be handled
3390+in software as well, `enum block_state` is moved to `deflate.h`.
3391+
3392+Since the first call to `dfltcc_inflate()` already needs the window,
3393+and it might not be allocated yet, `inflate_ensure_window()` is
3394+factored out of `updatewindow()` and made `ZLIB_INTERNAL`.
3395+
3396+Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
3397+Origin: i-iii/zlib,https://github.com/iii-i/zlib/commit/481ee63d5f8fa12b5c833d32d08a3c74bc62cb20
3398+---
3399+ Makefile.in | 8 +
3400+ compress.c | 14 +-
3401+ configure | 24 +
3402+ contrib/README.contrib | 4 +
3403+ contrib/s390/README.txt | 17 +
3404+ contrib/s390/dfltcc.c | 1004 +++++++++++++++++++++++++++++++++++++++++
3405+ contrib/s390/dfltcc.h | 97 ++++
3406+ contrib/s390/dfltcc_deflate.h | 53 +++
3407+ deflate.c | 76 +++-
3408+ deflate.h | 12 +
3409+ gzguts.h | 4 +
3410+ inflate.c | 98 ++--
3411+ inflate.h | 2 +
3412+ test/infcover.c | 3 +-
3413+ test/minigzip.c | 4 +
3414+ trees.c | 8 +-
3415+ zutil.h | 2 +
3416+ 17 files changed, 1371 insertions(+), 59 deletions(-)
3417+ create mode 100644 contrib/s390/README.txt
3418+ create mode 100644 contrib/s390/dfltcc.c
3419+ create mode 100644 contrib/s390/dfltcc.h
3420+ create mode 100644 contrib/s390/dfltcc_deflate.h
3421+
3422+diff --git a/Makefile.in b/Makefile.in
3423+index ede4db3..1710f63 100644
3424+--- a/Makefile.in
3425++++ b/Makefile.in
3426+@@ -140,6 +140,14 @@ match.lo: match.S
3427+ mv _match.o match.lo
3428+ rm -f _match.s
3429+
3430++dfltcc.o: $(SRCDIR)contrib/s390/dfltcc.c $(SRCDIR)zlib.h zconf.h
3431++ $(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)contrib/s390/dfltcc.c
3432++
3433++dfltcc.lo: $(SRCDIR)contrib/s390/dfltcc.c $(SRCDIR)zlib.h zconf.h
3434++ -@mkdir objs 2>/dev/null || test -d objs
3435++ $(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/dfltcc.o $(SRCDIR)contrib/s390/dfltcc.c
3436++ -@mv objs/dfltcc.o $@
3437++
3438+ crc32_test.o: $(SRCDIR)test/crc32_test.c $(SRCDIR)zlib.h zconf.h
3439+ $(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/crc32_test.c
3440+
3441+diff --git a/compress.c b/compress.c
3442+index f43bacf..08a0660 100644
3443+--- a/compress.c
3444++++ b/compress.c
3445+@@ -5,9 +5,15 @@
3446+
3447+ /* @(#) $Id$ */
3448+
3449+-#define ZLIB_INTERNAL
3450++#include "zutil.h"
3451+ #include "zlib.h"
3452+
3453++#ifdef DFLTCC
3454++# include "contrib/s390/dfltcc.h"
3455++#else
3456++#define DEFLATE_BOUND_COMPLEN(source_len) 0
3457++#endif
3458++
3459+ /* ===========================================================================
3460+ Compresses the source buffer into the destination buffer. The level
3461+ parameter has the same meaning as in deflateInit. sourceLen is the byte
3462+@@ -70,6 +76,12 @@ int ZEXPORT compress(Bytef *dest, uLongf *destLen, const Bytef *source,
3463+ this function needs to be updated.
3464+ */
3465+ uLong ZEXPORT compressBound(uLong sourceLen) {
3466++ uLong complen = DEFLATE_BOUND_COMPLEN(sourceLen);
3467++
3468++ if (complen > 0)
3469++ /* Architecture-specific code provided an upper bound. */
3470++ return complen + ZLIB_WRAPLEN;
3471++
3472+ return sourceLen + (sourceLen >> 12) + (sourceLen >> 14) +
3473+ (sourceLen >> 25) + 13;
3474+ }
3475+diff --git a/configure b/configure
3476+index 3372cbf..b99a348 100755
3477+--- a/configure
3478++++ b/configure
3479+@@ -117,6 +117,7 @@ case "$1" in
3480+ echo ' configure [--const] [--zprefix] [--prefix=PREFIX] [--eprefix=EXPREFIX]' | tee -a configure.log
3481+ echo ' [--static] [--64] [--libdir=LIBDIR] [--sharedlibdir=LIBDIR]' | tee -a configure.log
3482+ echo ' [--includedir=INCLUDEDIR] [--archs="-arch i386 -arch x86_64"]' | tee -a configure.log
3483++ echo ' [--dfltcc] [--dfltcc-level-mask=MASK]' | tee -a configure.log
3484+ exit 0 ;;
3485+ -p*=* | --prefix=*) prefix=`echo $1 | sed 's/.*=//'`; shift ;;
3486+ -e*=* | --eprefix=*) exec_prefix=`echo $1 | sed 's/.*=//'`; shift ;;
3487+@@ -143,6 +144,16 @@ case "$1" in
3488+ --sanitize) address=1; shift ;;
3489+ --address) address=1; shift ;;
3490+ --memory) memory=1; shift ;;
3491++ --dfltcc)
3492++ CFLAGS="$CFLAGS -DDFLTCC"
3493++ OBJC="$OBJC dfltcc.o"
3494++ PIC_OBJC="$PIC_OBJC dfltcc.lo"
3495++ shift
3496++ ;;
3497++ --dfltcc-level-mask=*)
3498++ CFLAGS="$CFLAGS -DDFLTCC_LEVEL_MASK=`echo $1 | sed 's/.*=//'`"
3499++ shift
3500++ ;;
3501+ *)
3502+ echo "unknown option: $1" | tee -a configure.log
3503+ echo "$0 --help for help" | tee -a configure.log
3504+@@ -834,6 +845,19 @@ EOF
3505+ fi
3506+ fi
3507+
3508++# Check whether sys/sdt.h is available
3509++cat > $test.c << EOF
3510++#include <sys/sdt.h>
3511++int main() { return 0; }
3512++EOF
3513++if try $CC -c $CFLAGS $test.c; then
3514++ echo "Checking for sys/sdt.h ... Yes." | tee -a configure.log
3515++ CFLAGS="$CFLAGS -DHAVE_SYS_SDT_H"
3516++ SFLAGS="$SFLAGS -DHAVE_SYS_SDT_H"
3517++else
3518++ echo "Checking for sys/sdt.h ... No." | tee -a configure.log
3519++fi
3520++
3521+ # test to see if we can use a gnu indirection function to detect and load optimized code at runtime
3522+ echo >> configure.log
3523+ cat > $test.c <<EOF
3524+diff --git a/contrib/README.contrib b/contrib/README.contrib
3525+index 90170df..a36d404 100644
3526+--- a/contrib/README.contrib
3527++++ b/contrib/README.contrib
3528+@@ -55,6 +55,10 @@ puff/ by Mark Adler <madler@alumni.caltech.edu>
3529+ Small, low memory usage inflate. Also serves to provide an
3530+ unambiguous description of the deflate format.
3531+
3532++s390/ by Ilya Leoshkevich <iii@linux.ibm.com>
3533++ Hardware-accelerated deflate on IBM Z with DEFLATE CONVERSION CALL
3534++ instruction.
3535++
3536+ testzlib/ by Gilles Vollant <info@winimage.com>
3537+ Example of the use of zlib
3538+
3539+diff --git a/contrib/s390/README.txt b/contrib/s390/README.txt
3540+new file mode 100644
3541+index 0000000..48be008
3542+--- /dev/null
3543++++ b/contrib/s390/README.txt
3544+@@ -0,0 +1,17 @@
3545++IBM Z mainframes starting from version z15 provide the DFLTCC
3546++instruction, which implements the deflate algorithm in hardware with
3547++estimated compression and decompression performance orders of magnitude
3548++faster than the current zlib and a ratio comparable to that of level 1.
3549++
3550++This directory adds DFLTCC support. In order to enable it, the following
3551++build commands should be used:
3552++
3553++ $ ./configure --dfltcc
3554++ $ make
3555++
3556++When built like this, zlib would compress in hardware on level 1, and in
3557++software on all other levels. Decompression will always happen in
3558++hardware. In order to enable DFLTCC compression for levels 1-6 (i.e. to
3559++make it used by default) one could either configure with
3560++--dfltcc-level-mask=0x7e or set the environment variable
3561++DFLTCC_LEVEL_MASK to 0x7e at run time.
3562+diff --git a/contrib/s390/dfltcc.c b/contrib/s390/dfltcc.c
3563+new file mode 100644
3564+index 0000000..f2b222d
3565+--- /dev/null
3566++++ b/contrib/s390/dfltcc.c
3567+@@ -0,0 +1,1004 @@
3568++/* dfltcc.c - SystemZ DEFLATE CONVERSION CALL support. */
3569++
3570++/*
3571++ Use the following commands to build zlib with DFLTCC support:
3572++
3573++ $ ./configure --dfltcc
3574++ $ make
3575++*/
3576++
3577++#define _GNU_SOURCE
3578++#include <ctype.h>
3579++#include <errno.h>
3580++#include <inttypes.h>
3581++#include <stddef.h>
3582++#include <stdio.h>
3583++#include <stdint.h>
3584++#include <stdlib.h>
3585++#include "../../zutil.h"
3586++#include "../../deflate.h"
3587++#include "../../inftrees.h"
3588++#include "../../inflate.h"
3589++#include "dfltcc.h"
3590++#include "dfltcc_deflate.h"
3591++#ifdef HAVE_SYS_SDT_H
3592++#include <sys/sdt.h>
3593++#endif
3594++
3595++/*
3596++ C wrapper for the DEFLATE CONVERSION CALL instruction.
3597++ */
3598++typedef enum {
3599++ DFLTCC_CC_OK = 0,
3600++ DFLTCC_CC_OP1_TOO_SHORT = 1,
3601++ DFLTCC_CC_OP2_TOO_SHORT = 2,
3602++ DFLTCC_CC_OP2_CORRUPT = 2,
3603++ DFLTCC_CC_AGAIN = 3,
3604++} dfltcc_cc;
3605++
3606++#define DFLTCC_QAF 0
3607++#define DFLTCC_GDHT 1
3608++#define DFLTCC_CMPR 2
3609++#define DFLTCC_XPND 4
3610++#define HBT_CIRCULAR (1 << 7)
3611++#define HB_BITS 15
3612++#define HB_SIZE (1 << HB_BITS)
3613++#define DFLTCC_FACILITY 151
3614++
3615++local inline dfltcc_cc dfltcc(int fn, void *param,
3616++ Bytef **op1, size_t *len1,
3617++ z_const Bytef **op2, size_t *len2,
3618++ void *hist)
3619++{
3620++ Bytef *t2 = op1 ? *op1 : NULL;
3621++ size_t t3 = len1 ? *len1 : 0;
3622++ z_const Bytef *t4 = op2 ? *op2 : NULL;
3623++ size_t t5 = len2 ? *len2 : 0;
3624++ register int r0 __asm__("r0") = fn;
3625++ register void *r1 __asm__("r1") = param;
3626++ register Bytef *r2 __asm__("r2") = t2;
3627++ register size_t r3 __asm__("r3") = t3;
3628++ register z_const Bytef *r4 __asm__("r4") = t4;
3629++ register size_t r5 __asm__("r5") = t5;
3630++ int cc;
3631++
3632++ __asm__ volatile(
3633++#ifdef HAVE_SYS_SDT_H
3634++ STAP_PROBE_ASM(zlib, dfltcc_entry,
3635++ STAP_PROBE_ASM_TEMPLATE(5))
3636++#endif
3637++ ".insn rrf,0xb9390000,%[r2],%[r4],%[hist],0\n"
3638++#ifdef HAVE_SYS_SDT_H
3639++ STAP_PROBE_ASM(zlib, dfltcc_exit,
3640++ STAP_PROBE_ASM_TEMPLATE(5))
3641++#endif
3642++ "ipm %[cc]\n"
3643++ : [r2] "+r" (r2)
3644++ , [r3] "+r" (r3)
3645++ , [r4] "+r" (r4)
3646++ , [r5] "+r" (r5)
3647++ , [cc] "=r" (cc)
3648++ : [r0] "r" (r0)
3649++ , [r1] "r" (r1)
3650++ , [hist] "r" (hist)
3651++#ifdef HAVE_SYS_SDT_H
3652++ , STAP_PROBE_ASM_OPERANDS(5, r2, r3, r4, r5, hist)
3653++#endif
3654++ : "cc", "memory");
3655++ t2 = r2; t3 = r3; t4 = r4; t5 = r5;
3656++
3657++ if (op1)
3658++ *op1 = t2;
3659++ if (len1)
3660++ *len1 = t3;
3661++ if (op2)
3662++ *op2 = t4;
3663++ if (len2)
3664++ *len2 = t5;
3665++ return (cc >> 28) & 3;
3666++}
3667++
3668++/*
3669++ Parameter Block for Query Available Functions.
3670++ */
3671++#define static_assert(c, msg) \
3672++ __attribute__((unused)) \
3673++ static char static_assert_failed_ ## msg[c ? 1 : -1]
3674++
3675++struct dfltcc_qaf_param {
3676++ char fns[16];
3677++ char reserved1[8];
3678++ char fmts[2];
3679++ char reserved2[6];
3680++};
3681++
3682++static_assert(sizeof(struct dfltcc_qaf_param) == 32,
3683++ sizeof_struct_dfltcc_qaf_param_is_32);
3684++
3685++local inline int is_bit_set(const char *bits, int n)
3686++{
3687++ return bits[n / 8] & (1 << (7 - (n % 8)));
3688++}
3689++
3690++local inline void clear_bit(char *bits, int n)
3691++{
3692++ bits[n / 8] &= ~(1 << (7 - (n % 8)));
3693++}
3694++
3695++#define DFLTCC_FMT0 0
3696++
3697++/*
3698++ Parameter Block for Generate Dynamic-Huffman Table, Compress and Expand.
3699++ */
3700++#define CVT_CRC32 0
3701++#define CVT_ADLER32 1
3702++#define HTT_FIXED 0
3703++#define HTT_DYNAMIC 1
3704++
3705++struct dfltcc_param_v0 {
3706++ uint16_t pbvn; /* Parameter-Block-Version Number */
3707++ uint8_t mvn; /* Model-Version Number */
3708++ uint8_t ribm; /* Reserved for IBM use */
3709++ unsigned reserved32 : 31;
3710++ unsigned cf : 1; /* Continuation Flag */
3711++ uint8_t reserved64[8];
3712++ unsigned nt : 1; /* New Task */
3713++ unsigned reserved129 : 1;
3714++ unsigned cvt : 1; /* Check Value Type */
3715++ unsigned reserved131 : 1;
3716++ unsigned htt : 1; /* Huffman-Table Type */
3717++ unsigned bcf : 1; /* Block-Continuation Flag */
3718++ unsigned bcc : 1; /* Block Closing Control */
3719++ unsigned bhf : 1; /* Block Header Final */
3720++ unsigned reserved136 : 1;
3721++ unsigned reserved137 : 1;
3722++ unsigned dhtgc : 1; /* DHT Generation Control */
3723++ unsigned reserved139 : 5;
3724++ unsigned reserved144 : 5;
3725++ unsigned sbb : 3; /* Sub-Byte Boundary */
3726++ uint8_t oesc; /* Operation-Ending-Supplemental Code */
3727++ unsigned reserved160 : 12;
3728++ unsigned ifs : 4; /* Incomplete-Function Status */
3729++ uint16_t ifl; /* Incomplete-Function Length */
3730++ uint8_t reserved192[8];
3731++ uint8_t reserved256[8];
3732++ uint8_t reserved320[4];
3733++ uint16_t hl; /* History Length */
3734++ unsigned reserved368 : 1;
3735++ uint16_t ho : 15; /* History Offset */
3736++ uint32_t cv; /* Check Value */
3737++ unsigned eobs : 15; /* End-of-block Symbol */
3738++ unsigned reserved431: 1;
3739++ uint8_t eobl : 4; /* End-of-block Length */
3740++ unsigned reserved436 : 12;
3741++ unsigned reserved448 : 4;
3742++ uint16_t cdhtl : 12; /* Compressed-Dynamic-Huffman Table
3743++ Length */
3744++ uint8_t reserved464[6];
3745++ uint8_t cdht[288];
3746++ uint8_t reserved[32];
3747++ uint8_t csb[1152];
3748++};
3749++
3750++static_assert(sizeof(struct dfltcc_param_v0) == 1536,
3751++ sizeof_struct_dfltcc_param_v0_is_1536);
3752++
3753++local z_const char *oesc_msg(char *buf, int oesc)
3754++{
3755++ if (oesc == 0x00)
3756++ return NULL; /* Successful completion */
3757++ else {
3758++ sprintf(buf, "Operation-Ending-Supplemental Code is 0x%.2X", oesc);
3759++ return buf;
3760++ }
3761++}
3762++
3763++/*
3764++ Extension of inflate_state and deflate_state. Must be doubleword-aligned.
3765++*/
3766++struct dfltcc_state {
3767++ struct dfltcc_param_v0 param; /* Parameter block. */
3768++ struct dfltcc_qaf_param af; /* Available functions. */
3769++ uLong level_mask; /* Levels on which to use DFLTCC */
3770++ uLong block_size; /* New block each X bytes */
3771++ uLong block_threshold; /* New block after total_in > X */
3772++ uLong dht_threshold; /* New block only if avail_in >= X */
3773++ char msg[64]; /* Buffer for strm->msg */
3774++};
3775++
3776++#define ALIGN_UP(p, size) \
3777++ (__typeof__(p))(((uintptr_t)(p) + ((size) - 1)) & ~((size) - 1))
3778++
3779++#define GET_DFLTCC_STATE(state) ((struct dfltcc_state *)( \
3780++ (char *)(state) + ALIGN_UP(sizeof(*state), 8)))
3781++
3782++/*
3783++ Compress.
3784++ */
3785++local inline int dfltcc_can_deflate_with_params(z_streamp strm,
3786++ int level,
3787++ uInt window_bits,
3788++ int strategy)
3789++{
3790++ deflate_state *state = (deflate_state *)strm->state;
3791++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
3792++
3793++ /* Unsupported compression settings */
3794++ if ((dfltcc_state->level_mask & (1 << level)) == 0)
3795++ return 0;
3796++ if (window_bits != HB_BITS)
3797++ return 0;
3798++ if (strategy != Z_FIXED && strategy != Z_DEFAULT_STRATEGY)
3799++ return 0;
3800++
3801++ /* Unsupported hardware */
3802++ if (!is_bit_set(dfltcc_state->af.fns, DFLTCC_GDHT) ||
3803++ !is_bit_set(dfltcc_state->af.fns, DFLTCC_CMPR) ||
3804++ !is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0))
3805++ return 0;
3806++
3807++ return 1;
3808++}
3809++
3810++int ZLIB_INTERNAL dfltcc_can_deflate(z_streamp strm)
3811++{
3812++ deflate_state *state = (deflate_state *)strm->state;
3813++
3814++ return dfltcc_can_deflate_with_params(strm,
3815++ state->level,
3816++ state->w_bits,
3817++ state->strategy);
3818++}
3819++
3820++local void dfltcc_gdht(z_streamp strm)
3821++{
3822++ deflate_state *state = (deflate_state *)strm->state;
3823++ struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
3824++ size_t avail_in = strm->avail_in;
3825++
3826++ dfltcc(DFLTCC_GDHT,
3827++ param, NULL, NULL,
3828++ &strm->next_in, &avail_in, NULL);
3829++}
3830++
3831++local dfltcc_cc dfltcc_cmpr(z_streamp strm)
3832++{
3833++ deflate_state *state = (deflate_state *)strm->state;
3834++ struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
3835++ size_t avail_in = strm->avail_in;
3836++ size_t avail_out = strm->avail_out;
3837++ dfltcc_cc cc;
3838++
3839++ cc = dfltcc(DFLTCC_CMPR | HBT_CIRCULAR,
3840++ param, &strm->next_out, &avail_out,
3841++ &strm->next_in, &avail_in, state->window);
3842++ strm->total_in += (strm->avail_in - avail_in);
3843++ strm->total_out += (strm->avail_out - avail_out);
3844++ strm->avail_in = avail_in;
3845++ strm->avail_out = avail_out;
3846++ return cc;
3847++}
3848++
3849++local void send_eobs(z_streamp strm,
3850++ z_const struct dfltcc_param_v0 *param)
3851++{
3852++ deflate_state *state = (deflate_state *)strm->state;
3853++
3854++ _tr_send_bits(
3855++ state,
3856++ bi_reverse(param->eobs >> (15 - param->eobl), param->eobl),
3857++ param->eobl);
3858++ flush_pending(strm);
3859++ if (state->pending != 0) {
3860++ /* The remaining data is located in pending_out[0:pending]. If someone
3861++ * calls put_byte() - this might happen in deflate() - the byte will be
3862++ * placed into pending_buf[pending], which is incorrect. Move the
3863++ * remaining data to the beginning of pending_buf so that put_byte() is
3864++ * usable again.
3865++ */
3866++ memmove(state->pending_buf, state->pending_out, state->pending);
3867++ state->pending_out = state->pending_buf;
3868++ }
3869++#ifdef ZLIB_DEBUG
3870++ state->compressed_len += param->eobl;
3871++#endif
3872++}
3873++
3874++int ZLIB_INTERNAL dfltcc_deflate(z_streamp strm, int flush,
3875++ block_state *result)
3876++{
3877++ deflate_state *state = (deflate_state *)strm->state;
3878++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
3879++ struct dfltcc_param_v0 *param = &dfltcc_state->param;
3880++ uInt masked_avail_in;
3881++ dfltcc_cc cc;
3882++ int need_empty_block;
3883++ int soft_bcc;
3884++ int no_flush;
3885++
3886++ if (!dfltcc_can_deflate(strm)) {
3887++ /* Clear history. */
3888++ if (flush == Z_FULL_FLUSH)
3889++ param->hl = 0;
3890++ return 0;
3891++ }
3892++
3893++again:
3894++ masked_avail_in = 0;
3895++ soft_bcc = 0;
3896++ no_flush = flush == Z_NO_FLUSH;
3897++
3898++ /* No input data. Return, except when Continuation Flag is set, which means
3899++ * that DFLTCC has buffered some output in the parameter block and needs to
3900++ * be called again in order to flush it.
3901++ */
3902++ if (strm->avail_in == 0 && !param->cf) {
3903++ /* A block is still open, and the hardware does not support closing
3904++ * blocks without adding data. Thus, close it manually.
3905++ */
3906++ if (!no_flush && param->bcf) {
3907++ send_eobs(strm, param);
3908++ param->bcf = 0;
3909++ }
3910++ /* Let one of deflate_* functions write a trailing empty block. */
3911++ if (flush == Z_FINISH)
3912++ return 0;
3913++ /* Clear history. */
3914++ if (flush == Z_FULL_FLUSH)
3915++ param->hl = 0;
3916++ /* Trigger block post-processing if necessary. */
3917++ *result = no_flush ? need_more : block_done;
3918++ return 1;
3919++ }
3920++
3921++ /* There is an open non-BFINAL block, we are not going to close it just
3922++ * yet, we have compressed more than DFLTCC_BLOCK_SIZE bytes and we see
3923++ * more than DFLTCC_DHT_MIN_SAMPLE_SIZE bytes. Open a new block with a new
3924++ * DHT in order to adapt to a possibly changed input data distribution.
3925++ */
3926++ if (param->bcf && no_flush &&
3927++ strm->total_in > dfltcc_state->block_threshold &&
3928++ strm->avail_in >= dfltcc_state->dht_threshold) {
3929++ if (param->cf) {
3930++ /* We need to flush the DFLTCC buffer before writing the
3931++ * End-of-block Symbol. Mask the input data and proceed as usual.
3932++ */
3933++ masked_avail_in += strm->avail_in;
3934++ strm->avail_in = 0;
3935++ no_flush = 0;
3936++ } else {
3937++ /* DFLTCC buffer is empty, so we can manually write the
3938++ * End-of-block Symbol right away.
3939++ */
3940++ send_eobs(strm, param);
3941++ param->bcf = 0;
3942++ dfltcc_state->block_threshold =
3943++ strm->total_in + dfltcc_state->block_size;
3944++ }
3945++ }
3946++
3947++ /* No space for compressed data. If we proceed, dfltcc_cmpr() will return
3948++ * DFLTCC_CC_OP1_TOO_SHORT without buffering header bits, but we will still
3949++ * set BCF=1, which is wrong. Avoid complications and return early.
3950++ */
3951++ if (strm->avail_out == 0) {
3952++ *result = need_more;
3953++ return 1;
3954++ }
3955++
3956++ /* The caller gave us too much data. Pass only one block worth of
3957++ * uncompressed data to DFLTCC and mask the rest, so that on the next
3958++ * iteration we start a new block.
3959++ */
3960++ if (no_flush && strm->avail_in > dfltcc_state->block_size) {
3961++ masked_avail_in += (strm->avail_in - dfltcc_state->block_size);
3962++ strm->avail_in = dfltcc_state->block_size;
3963++ }
3964++
3965++ /* When we have an open non-BFINAL deflate block and caller indicates that
3966++ * the stream is ending, we need to close an open deflate block and open a
3967++ * BFINAL one.
3968++ */
3969++ need_empty_block = flush == Z_FINISH && param->bcf && !param->bhf;
3970++
3971++ /* Translate stream to parameter block */
3972++ param->cvt = state->wrap == 2 ? CVT_CRC32 : CVT_ADLER32;
3973++ if (!no_flush)
3974++ /* We need to close a block. Always do this in software - when there is
3975++ * no input data, the hardware will not honor BCC. */
3976++ soft_bcc = 1;
3977++ if (flush == Z_FINISH && !param->bcf)
3978++ /* We are about to open a BFINAL block, set Block Header Final bit
3979++ * until the stream ends.
3980++ */
3981++ param->bhf = 1;
3982++ /* DFLTCC-CMPR will write to next_out, so make sure that buffers with
3983++ * higher precedence are empty.
3984++ */
3985++ Assert(state->pending == 0, "There must be no pending bytes");
3986++ Assert(state->bi_valid < 8, "There must be less than 8 pending bits");
3987++ param->sbb = (unsigned int)state->bi_valid;
3988++ if (param->sbb > 0)
3989++ *strm->next_out = (Bytef)state->bi_buf;
3990++ /* Honor history and check value */
3991++ param->nt = 0;
3992++ if (state->wrap == 1)
3993++ param->cv = strm->adler;
3994++ else if (state->wrap == 2)
3995++ param->cv = ZSWAP32(strm->adler);
3996++
3997++ /* When opening a block, choose a Huffman-Table Type */
3998++ if (!param->bcf) {
3999++ if (state->strategy == Z_FIXED ||
4000++ (strm->total_in == 0 && dfltcc_state->block_threshold > 0))
4001++ param->htt = HTT_FIXED;
4002++ else {
4003++ param->htt = HTT_DYNAMIC;
4004++ dfltcc_gdht(strm);
4005++ }
4006++ }
4007++
4008++ /* Deflate */
4009++ do {
4010++ cc = dfltcc_cmpr(strm);
4011++ if (strm->avail_in < 4096 && masked_avail_in > 0)
4012++ /* We are about to call DFLTCC with a small input buffer, which is
4013++ * inefficient. Since there is masked data, there will be at least
4014++ * one more DFLTCC call, so skip the current one and make the next
4015++ * one handle more data.
4016++ */
4017++ break;
4018++ } while (cc == DFLTCC_CC_AGAIN);
4019++
4020++ /* Translate parameter block to stream */
4021++ strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
4022++ state->bi_valid = param->sbb;
4023++ if (state->bi_valid == 0)
4024++ state->bi_buf = 0; /* Avoid accessing next_out */
4025++ else
4026++ state->bi_buf = *strm->next_out & ((1 << state->bi_valid) - 1);
4027++ if (state->wrap == 1)
4028++ strm->adler = param->cv;
4029++ else if (state->wrap == 2)
4030++ strm->adler = ZSWAP32(param->cv);
4031++
4032++ /* Unmask the input data */
4033++ strm->avail_in += masked_avail_in;
4034++ masked_avail_in = 0;
4035++
4036++ /* If we encounter an error, it means there is a bug in DFLTCC call */
4037++ Assert(cc != DFLTCC_CC_OP2_CORRUPT || param->oesc == 0, "BUG");
4038++
4039++ /* Update Block-Continuation Flag. It will be used to check whether to call
4040++ * GDHT the next time.
4041++ */
4042++ if (cc == DFLTCC_CC_OK) {
4043++ if (soft_bcc) {
4044++ send_eobs(strm, param);
4045++ param->bcf = 0;
4046++ dfltcc_state->block_threshold =
4047++ strm->total_in + dfltcc_state->block_size;
4048++ } else
4049++ param->bcf = 1;
4050++ if (flush == Z_FINISH) {
4051++ if (need_empty_block)
4052++ /* Make the current deflate() call also close the stream */
4053++ return 0;
4054++ else {
4055++ bi_windup(state);
4056++ *result = finish_done;
4057++ }
4058++ } else {
4059++ if (flush == Z_FULL_FLUSH)
4060++ param->hl = 0; /* Clear history */
4061++ *result = flush == Z_NO_FLUSH ? need_more : block_done;
4062++ }
4063++ } else {
4064++ param->bcf = 1;
4065++ *result = need_more;
4066++ }
4067++ if (strm->avail_in != 0 && strm->avail_out != 0)
4068++ goto again; /* deflate() must use all input or all output */
4069++ return 1;
4070++}
4071++
4072++/*
4073++ Expand.
4074++ */
4075++int ZLIB_INTERNAL dfltcc_can_inflate(z_streamp strm)
4076++{
4077++ struct inflate_state *state = (struct inflate_state *)strm->state;
4078++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
4079++
4080++ /* DFLTCC inflate requires hardware support for XPND and format 0 */
4081++ return is_bit_set(dfltcc_state->af.fns, DFLTCC_XPND) &&
4082++ is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0);
4083++}
4084++
4085++local dfltcc_cc dfltcc_xpnd(z_streamp strm)
4086++{
4087++ struct inflate_state *state = (struct inflate_state *)strm->state;
4088++ struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
4089++ size_t avail_in = strm->avail_in;
4090++ size_t avail_out = strm->avail_out;
4091++ dfltcc_cc cc;
4092++
4093++ cc = dfltcc(DFLTCC_XPND | HBT_CIRCULAR,
4094++ param, &strm->next_out, &avail_out,
4095++ &strm->next_in, &avail_in, state->window);
4096++ strm->avail_in = avail_in;
4097++ strm->avail_out = avail_out;
4098++ return cc;
4099++}
4100++
4101++dfltcc_inflate_action ZLIB_INTERNAL dfltcc_inflate(z_streamp strm, int flush,
4102++ int *ret)
4103++{
4104++ struct inflate_state *state = (struct inflate_state *)strm->state;
4105++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
4106++ struct dfltcc_param_v0 *param = &dfltcc_state->param;
4107++ dfltcc_cc cc;
4108++
4109++ if (flush == Z_BLOCK || flush == Z_TREES) {
4110++ /* DFLTCC does not support stopping on block boundaries */
4111++ if (dfltcc_inflate_disable(strm)) {
4112++ *ret = Z_STREAM_ERROR;
4113++ return DFLTCC_INFLATE_BREAK;
4114++ } else
4115++ return DFLTCC_INFLATE_SOFTWARE;
4116++ }
4117++
4118++ if (state->last) {
4119++ if (state->bits != 0) {
4120++ strm->next_in++;
4121++ strm->avail_in--;
4122++ state->bits = 0;
4123++ }
4124++ state->mode = CHECK;
4125++ return DFLTCC_INFLATE_CONTINUE;
4126++ }
4127++
4128++ if (strm->avail_in == 0 && !param->cf)
4129++ return DFLTCC_INFLATE_BREAK;
4130++
4131++ if (inflate_ensure_window(state)) {
4132++ state->mode = MEM;
4133++ return DFLTCC_INFLATE_CONTINUE;
4134++ }
4135++
4136++ /* Translate stream to parameter block */
4137++ param->cvt = ((state->wrap & 4) && state->flags) ? CVT_CRC32 : CVT_ADLER32;
4138++ param->sbb = state->bits;
4139++ if (param->hl)
4140++ param->nt = 0; /* Honor history for the first block */
4141++ if (state->wrap & 4)
4142++ param->cv = state->flags ? ZSWAP32(state->check) : state->check;
4143++
4144++ /* Inflate */
4145++ do {
4146++ cc = dfltcc_xpnd(strm);
4147++ } while (cc == DFLTCC_CC_AGAIN);
4148++
4149++ /* Translate parameter block to stream */
4150++ strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
4151++ state->last = cc == DFLTCC_CC_OK;
4152++ state->bits = param->sbb;
4153++ if (state->wrap & 4)
4154++ strm->adler = state->check = state->flags ?
4155++ ZSWAP32(param->cv) : param->cv;
4156++ if (cc == DFLTCC_CC_OP2_CORRUPT && param->oesc != 0) {
4157++ /* Report an error if stream is corrupted */
4158++ state->mode = BAD;
4159++ return DFLTCC_INFLATE_CONTINUE;
4160++ }
4161++ state->mode = TYPEDO;
4162++ /* Break if operands are exhausted, otherwise continue looping */
4163++ return (cc == DFLTCC_CC_OP1_TOO_SHORT || cc == DFLTCC_CC_OP2_TOO_SHORT) ?
4164++ DFLTCC_INFLATE_BREAK : DFLTCC_INFLATE_CONTINUE;
4165++}
4166++
4167++int ZLIB_INTERNAL dfltcc_was_inflate_used(z_streamp strm)
4168++{
4169++ struct inflate_state *state = (struct inflate_state *)strm->state;
4170++ struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
4171++
4172++ return !param->nt;
4173++}
4174++
4175++/*
4176++ Rotates a circular buffer.
4177++ The implementation is based on https://cplusplus.com/reference/algorithm/rotate/
4178++ */
4179++local void rotate(Bytef *start, Bytef *pivot, Bytef *end)
4180++{
4181++ Bytef *p = pivot;
4182++ Bytef tmp;
4183++
4184++ while (p != start) {
4185++ tmp = *start;
4186++ *start = *p;
4187++ *p = tmp;
4188++
4189++ start++;
4190++ p++;
4191++
4192++ if (p == end)
4193++ p = pivot;
4194++ else if (start == pivot)
4195++ pivot = p;
4196++ }
4197++}
4198++
4199++#define MIN(x, y) ({ \
4200++ typeof(x) _x = (x); \
4201++ typeof(y) _y = (y); \
4202++ _x < _y ? _x : _y; \
4203++})
4204++
4205++#define MAX(x, y) ({ \
4206++ typeof(x) _x = (x); \
4207++ typeof(y) _y = (y); \
4208++ _x > _y ? _x : _y; \
4209++})
4210++
4211++int ZLIB_INTERNAL dfltcc_inflate_disable(z_streamp strm)
4212++{
4213++ struct inflate_state *state = (struct inflate_state *)strm->state;
4214++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
4215++ struct dfltcc_param_v0 *param = &dfltcc_state->param;
4216++
4217++ if (!dfltcc_can_inflate(strm))
4218++ return 0;
4219++ if (dfltcc_was_inflate_used(strm))
4220++ /* DFLTCC has already decompressed some data. Since there is not
4221++ * enough information to resume decompression in software, the call
4222++ * must fail.
4223++ */
4224++ return 1;
4225++ /* DFLTCC was not used yet - decompress in software */
4226++ memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af));
4227++ /* Convert the window from the hardware to the software format */
4228++ rotate(state->window, state->window + param->ho, state->window + HB_SIZE);
4229++ state->whave = state->wnext = MIN(param->hl, state->wsize);
4230++ return 0;
4231++}
4232++
4233++local int env_dfltcc_disabled;
4234++local int env_source_date_epoch;
4235++local unsigned long env_level_mask;
4236++local unsigned long env_block_size;
4237++local unsigned long env_block_threshold;
4238++local unsigned long env_dht_threshold;
4239++local unsigned long env_ribm;
4240++local uint64_t cpu_facilities[(DFLTCC_FACILITY / 64) + 1];
4241++local struct dfltcc_qaf_param cpu_af __attribute__((aligned(8)));
4242++
4243++local inline int is_dfltcc_enabled(void)
4244++{
4245++ if (env_dfltcc_disabled)
4246++ /* User has explicitly disabled DFLTCC. */
4247++ return 0;
4248++
4249++ return is_bit_set((const char *)cpu_facilities, DFLTCC_FACILITY);
4250++}
4251++
4252++local unsigned long xstrtoul(const char *s, unsigned long _default)
4253++{
4254++ char *endptr;
4255++ unsigned long result;
4256++
4257++ if (!(s && *s))
4258++ return _default;
4259++ errno = 0;
4260++ result = strtoul(s, &endptr, 0);
4261++ return (errno || *endptr) ? _default : result;
4262++}
4263++
4264++__attribute__((constructor)) local void init_globals(void)
4265++{
4266++ const char *env;
4267++ register char r0 __asm__("r0");
4268++
4269++ env = secure_getenv("DFLTCC");
4270++ env_dfltcc_disabled = env && !strcmp(env, "0");
4271++
4272++ env = secure_getenv("SOURCE_DATE_EPOCH");
4273++ env_source_date_epoch = !!env;
4274++
4275++#ifndef DFLTCC_LEVEL_MASK
4276++#define DFLTCC_LEVEL_MASK 0x2
4277++#endif
4278++ env_level_mask = xstrtoul(secure_getenv("DFLTCC_LEVEL_MASK"),
4279++ DFLTCC_LEVEL_MASK);
4280++
4281++#ifndef DFLTCC_BLOCK_SIZE
4282++#define DFLTCC_BLOCK_SIZE 1048576
4283++#endif
4284++ env_block_size = xstrtoul(secure_getenv("DFLTCC_BLOCK_SIZE"),
4285++ DFLTCC_BLOCK_SIZE);
4286++
4287++#ifndef DFLTCC_FIRST_FHT_BLOCK_SIZE
4288++#define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096
4289++#endif
4290++ env_block_threshold = xstrtoul(secure_getenv("DFLTCC_FIRST_FHT_BLOCK_SIZE"),
4291++ DFLTCC_FIRST_FHT_BLOCK_SIZE);
4292++
4293++#ifndef DFLTCC_DHT_MIN_SAMPLE_SIZE
4294++#define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
4295++#endif
4296++ env_dht_threshold = xstrtoul(secure_getenv("DFLTCC_DHT_MIN_SAMPLE_SIZE"),
4297++ DFLTCC_DHT_MIN_SAMPLE_SIZE);
4298++
4299++#ifndef DFLTCC_RIBM
4300++#define DFLTCC_RIBM 0
4301++#endif
4302++ env_ribm = xstrtoul(secure_getenv("DFLTCC_RIBM"), DFLTCC_RIBM);
4303++
4304++ memset(cpu_facilities, 0, sizeof(cpu_facilities));
4305++ r0 = sizeof(cpu_facilities) / sizeof(cpu_facilities[0]) - 1;
4306++ /* STFLE is supported since z9-109 and only in z/Architecture mode. When
4307++ * compiling with -m31, gcc defaults to ESA mode; however, since the kernel
4308++ * is 64-bit, it's always z/Architecture mode at runtime.
4309++ */
4310++ __asm__ volatile(
4311++#ifndef __clang__
4312++ ".machinemode push\n"
4313++ ".machinemode zarch\n"
4314++#endif
4315++ "stfle %[facilities]\n"
4316++#ifndef __clang__
4317++ ".machinemode pop\n"
4318++#endif
4319++ : [facilities] "=Q" (cpu_facilities)
4320++ , [r0] "+r" (r0)
4321++ :
4322++ : "cc");
4323++
4324++ /* Initialize available functions */
4325++ if (is_dfltcc_enabled())
4326++ dfltcc(DFLTCC_QAF, &cpu_af, NULL, NULL, NULL, NULL, NULL);
4327++ else
4328++ memset(&cpu_af, 0, sizeof(cpu_af));
4329++}
4330++
4331++/*
4332++ Memory management.
4333++
4334++ DFLTCC requires parameter blocks and window to be aligned. zlib allows
4335++ users to specify their own allocation functions, so using e.g.
4336++ `posix_memalign' is not an option. Thus, we overallocate and take the
4337++ aligned portion of the buffer.
4338++*/
4339++void ZLIB_INTERNAL dfltcc_reset(z_streamp strm, uInt size)
4340++{
4341++ struct dfltcc_state *dfltcc_state =
4342++ (struct dfltcc_state *)((char *)strm->state + ALIGN_UP(size, 8));
4343++
4344++ memcpy(&dfltcc_state->af, &cpu_af, sizeof(dfltcc_state->af));
4345++
4346++ if (env_source_date_epoch)
4347++ /* User needs reproducible results, but the output of DFLTCC_CMPR
4348++ * depends on buffers' page offsets.
4349++ */
4350++ clear_bit(dfltcc_state->af.fns, DFLTCC_CMPR);
4351++
4352++ /* Initialize parameter block */
4353++ memset(&dfltcc_state->param, 0, sizeof(dfltcc_state->param));
4354++ dfltcc_state->param.nt = 1;
4355++
4356++ /* Initialize tuning parameters */
4357++ dfltcc_state->level_mask = env_level_mask;
4358++ dfltcc_state->block_size = env_block_size;
4359++ dfltcc_state->block_threshold = env_block_threshold;
4360++ dfltcc_state->dht_threshold = env_dht_threshold;
4361++ dfltcc_state->param.ribm = env_ribm;
4362++}
4363++
4364++voidpf ZLIB_INTERNAL dfltcc_alloc_state(z_streamp strm, uInt items, uInt size)
4365++{
4366++ return ZALLOC(strm,
4367++ ALIGN_UP(items * size, 8) + sizeof(struct dfltcc_state),
4368++ sizeof(unsigned char));
4369++}
4370++
4371++void ZLIB_INTERNAL dfltcc_copy_state(voidpf dst, const voidpf src, uInt size)
4372++{
4373++ zmemcpy(dst, src, ALIGN_UP(size, 8) + sizeof(struct dfltcc_state));
4374++}
4375++
4376++static const int PAGE_ALIGN = 0x1000;
4377++
4378++voidpf ZLIB_INTERNAL dfltcc_alloc_window(z_streamp strm, uInt items, uInt size)
4379++{
4380++ voidpf p, w;
4381++
4382++ /* To simplify freeing, we store the pointer to the allocated buffer right
4383++ * before the window. Note that DFLTCC always uses HB_SIZE bytes.
4384++ */
4385++ p = ZALLOC(strm, sizeof(voidpf) + MAX(items * size, HB_SIZE) + PAGE_ALIGN,
4386++ sizeof(unsigned char));
4387++ if (p == NULL)
4388++ return NULL;
4389++ w = ALIGN_UP((char *)p + sizeof(voidpf), PAGE_ALIGN);
4390++ *(voidpf *)((char *)w - sizeof(voidpf)) = p;
4391++ return w;
4392++}
4393++
4394++void ZLIB_INTERNAL dfltcc_copy_window(void *dest, const void *src, size_t n)
4395++{
4396++ memcpy(dest, src, MAX(n, HB_SIZE));
4397++}
4398++
4399++void ZLIB_INTERNAL dfltcc_free_window(z_streamp strm, voidpf w)
4400++{
4401++ if (w)
4402++ ZFREE(strm, *(voidpf *)((unsigned char *)w - sizeof(voidpf)));
4403++}
4404++
4405++/*
4406++ Switching between hardware and software compression.
4407++
4408++ DFLTCC does not support all zlib settings, e.g. generation of non-compressed
4409++ blocks or alternative window sizes. When such settings are applied on the
4410++ fly with deflateParams, we need to convert between hardware and software
4411++ window formats.
4412++*/
4413++int ZLIB_INTERNAL dfltcc_deflate_params(z_streamp strm, int level,
4414++ int strategy, int *flush)
4415++{
4416++ deflate_state *state = (deflate_state *)strm->state;
4417++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
4418++ struct dfltcc_param_v0 *param = &dfltcc_state->param;
4419++ int could_deflate = dfltcc_can_deflate(strm);
4420++ int can_deflate = dfltcc_can_deflate_with_params(strm,
4421++ level,
4422++ state->w_bits,
4423++ strategy);
4424++
4425++ if (can_deflate == could_deflate)
4426++ /* We continue to work in the same mode - no changes needed */
4427++ return Z_OK;
4428++
4429++ if (strm->total_in == 0 && param->nt == 1 && param->hl == 0)
4430++ /* DFLTCC was not used yet - no changes needed */
4431++ return Z_OK;
4432++
4433++ /* For now, do not convert between window formats - simply get rid of the
4434++ * old data instead.
4435++ */
4436++ *flush = Z_FULL_FLUSH;
4437++ return Z_OK;
4438++}
4439++
4440++int ZLIB_INTERNAL dfltcc_deflate_done(z_streamp strm, int flush)
4441++{
4442++ deflate_state *state = (deflate_state *)strm->state;
4443++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
4444++ struct dfltcc_param_v0 *param = &dfltcc_state->param;
4445++
4446++ /* When deflate(Z_FULL_FLUSH) is called with small avail_out, it might
4447++ * close the block without resetting the compression state. Detect this
4448++ * situation and return that deflation is not done.
4449++ */
4450++ if (flush == Z_FULL_FLUSH && strm->avail_out == 0)
4451++ return 0;
4452++
4453++ /* Return that deflation is not done if DFLTCC is used and either it
4454++ * buffered some data (Continuation Flag is set), or has not written EOBS
4455++ * yet (Block-Continuation Flag is set).
4456++ */
4457++ return !dfltcc_can_deflate(strm) || (!param->cf && !param->bcf);
4458++}
4459++
4460++/*
4461++ Preloading history.
4462++*/
4463++local void append_history(struct dfltcc_param_v0 *param,
4464++ Bytef *history,
4465++ const Bytef *buf,
4466++ uInt count)
4467++{
4468++ size_t offset;
4469++ size_t n;
4470++
4471++ /* Do not use more than 32K */
4472++ if (count > HB_SIZE) {
4473++ buf += count - HB_SIZE;
4474++ count = HB_SIZE;
4475++ }
4476++ offset = (param->ho + param->hl) % HB_SIZE;
4477++ if (offset + count <= HB_SIZE)
4478++ /* Circular history buffer does not wrap - copy one chunk */
4479++ zmemcpy(history + offset, buf, count);
4480++ else {
4481++ /* Circular history buffer wraps - copy two chunks */
4482++ n = HB_SIZE - offset;
4483++ zmemcpy(history + offset, buf, n);
4484++ zmemcpy(history, buf + n, count - n);
4485++ }
4486++ n = param->hl + count;
4487++ if (n <= HB_SIZE)
4488++ /* All history fits into buffer - no need to discard anything */
4489++ param->hl = n;
4490++ else {
4491++ /* History does not fit into buffer - discard extra bytes */
4492++ param->ho = (param->ho + (n - HB_SIZE)) % HB_SIZE;
4493++ param->hl = HB_SIZE;
4494++ }
4495++}
4496++
4497++local void get_history(struct dfltcc_param_v0 *param,
4498++ const Bytef *history,
4499++ Bytef *buf)
4500++{
4501++ if (param->ho + param->hl <= HB_SIZE)
4502++ /* Circular history buffer does not wrap - copy one chunk */
4503++ memcpy(buf, history + param->ho, param->hl);
4504++ else {
4505++ /* Circular history buffer wraps - copy two chunks */
4506++ memcpy(buf, history + param->ho, HB_SIZE - param->ho);
4507++ memcpy(buf + HB_SIZE - param->ho, history, param->ho + param->hl - HB_SIZE);
4508++ }
4509++}
4510++
4511++int ZLIB_INTERNAL dfltcc_deflate_set_dictionary(z_streamp strm,
4512++ const Bytef *dictionary,
4513++ uInt dict_length)
4514++{
4515++ deflate_state *state = (deflate_state *)strm->state;
4516++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
4517++ struct dfltcc_param_v0 *param = &dfltcc_state->param;
4518++
4519++ append_history(param, state->window, dictionary, dict_length);
4520++ state->strstart = 1; /* Add FDICT to zlib header */
4521++ state->block_start = state->strstart; /* Make deflate_stored happy */
4522++ return Z_OK;
4523++}
4524++
4525++int ZLIB_INTERNAL dfltcc_deflate_get_dictionary(z_streamp strm,
4526++ Bytef *dictionary,
4527++ uInt *dict_length)
4528++{
4529++ deflate_state *state = (deflate_state *)strm->state;
4530++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
4531++ struct dfltcc_param_v0 *param = &dfltcc_state->param;
4532++
4533++ if (dictionary)
4534++ get_history(param, state->window, dictionary);
4535++ if (dict_length)
4536++ *dict_length = param->hl;
4537++ return Z_OK;
4538++}
4539++
4540++int ZLIB_INTERNAL dfltcc_inflate_set_dictionary(z_streamp strm,
4541++ const Bytef *dictionary,
4542++ uInt dict_length)
4543++{
4544++ struct inflate_state *state = (struct inflate_state *)strm->state;
4545++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
4546++ struct dfltcc_param_v0 *param = &dfltcc_state->param;
4547++
4548++ if (inflate_ensure_window(state)) {
4549++ state->mode = MEM;
4550++ return Z_MEM_ERROR;
4551++ }
4552++
4553++ append_history(param, state->window, dictionary, dict_length);
4554++ state->havedict = 1;
4555++ return Z_OK;
4556++}
4557++
4558++int ZLIB_INTERNAL dfltcc_inflate_get_dictionary(z_streamp strm,
4559++ Bytef *dictionary,
4560++ uInt *dict_length)
4561++{
4562++ struct inflate_state *state = (struct inflate_state *)strm->state;
4563++ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
4564++ struct dfltcc_param_v0 *param = &dfltcc_state->param;
4565++
4566++ if (dictionary && state->window)
4567++ get_history(param, state->window, dictionary);
4568++ if (dict_length)
4569++ *dict_length = param->hl;
4570++ return Z_OK;
4571++}
4572+diff --git a/contrib/s390/dfltcc.h b/contrib/s390/dfltcc.h
4573+new file mode 100644
4574+index 0000000..c8491c4
4575+--- /dev/null
4576++++ b/contrib/s390/dfltcc.h
4577+@@ -0,0 +1,97 @@
4578++#ifndef DFLTCC_H
4579++#define DFLTCC_H
4580++
4581++#include "../../zlib.h"
4582++#include "../../zutil.h"
4583++
4584++voidpf ZLIB_INTERNAL dfltcc_alloc_state(z_streamp strm, uInt items, uInt size);
4585++void ZLIB_INTERNAL dfltcc_copy_state(voidpf dst, const voidpf src, uInt size);
4586++void ZLIB_INTERNAL dfltcc_reset(z_streamp strm, uInt size);
4587++voidpf ZLIB_INTERNAL dfltcc_alloc_window(z_streamp strm, uInt items,
4588++ uInt size);
4589++void ZLIB_INTERNAL dfltcc_copy_window(void *dest, const void *src, size_t n);
4590++void ZLIB_INTERNAL dfltcc_free_window(z_streamp strm, voidpf w);
4591++#define DFLTCC_BLOCK_HEADER_BITS 3
4592++#define DFLTCC_HLITS_COUNT_BITS 5
4593++#define DFLTCC_HDISTS_COUNT_BITS 5
4594++#define DFLTCC_HCLENS_COUNT_BITS 4
4595++#define DFLTCC_MAX_HCLENS 19
4596++#define DFLTCC_HCLEN_BITS 3
4597++#define DFLTCC_MAX_HLITS 286
4598++#define DFLTCC_MAX_HDISTS 30
4599++#define DFLTCC_MAX_HLIT_HDIST_BITS 7
4600++#define DFLTCC_MAX_SYMBOL_BITS 16
4601++#define DFLTCC_MAX_EOBS_BITS 15
4602++#define DFLTCC_MAX_PADDING_BITS 7
4603++#define DEFLATE_BOUND_COMPLEN(source_len) \
4604++ ((DFLTCC_BLOCK_HEADER_BITS + \
4605++ DFLTCC_HLITS_COUNT_BITS + \
4606++ DFLTCC_HDISTS_COUNT_BITS + \
4607++ DFLTCC_HCLENS_COUNT_BITS + \
4608++ DFLTCC_MAX_HCLENS * DFLTCC_HCLEN_BITS + \
4609++ (DFLTCC_MAX_HLITS + DFLTCC_MAX_HDISTS) * DFLTCC_MAX_HLIT_HDIST_BITS + \
4610++ (source_len) * DFLTCC_MAX_SYMBOL_BITS + \
4611++ DFLTCC_MAX_EOBS_BITS + \
4612++ DFLTCC_MAX_PADDING_BITS) >> 3)
4613++int ZLIB_INTERNAL dfltcc_can_inflate(z_streamp strm);
4614++typedef enum {
4615++ DFLTCC_INFLATE_CONTINUE,
4616++ DFLTCC_INFLATE_BREAK,
4617++ DFLTCC_INFLATE_SOFTWARE,
4618++} dfltcc_inflate_action;
4619++dfltcc_inflate_action ZLIB_INTERNAL dfltcc_inflate(z_streamp strm,
4620++ int flush, int *ret);
4621++int ZLIB_INTERNAL dfltcc_was_inflate_used(z_streamp strm);
4622++int ZLIB_INTERNAL dfltcc_inflate_disable(z_streamp strm);
4623++int ZLIB_INTERNAL dfltcc_inflate_set_dictionary(z_streamp strm,
4624++ const Bytef *dictionary,
4625++ uInt dict_length);
4626++int ZLIB_INTERNAL dfltcc_inflate_get_dictionary(z_streamp strm,
4627++ Bytef *dictionary,
4628++ uInt* dict_length);
4629++
4630++#define ZALLOC_STATE dfltcc_alloc_state
4631++#define ZFREE_STATE ZFREE
4632++#define ZCOPY_STATE dfltcc_copy_state
4633++#define ZALLOC_WINDOW dfltcc_alloc_window
4634++#define ZCOPY_WINDOW dfltcc_copy_window
4635++#define ZFREE_WINDOW dfltcc_free_window
4636++#define TRY_FREE_WINDOW dfltcc_free_window
4637++#define INFLATE_RESET_KEEP_HOOK(strm) \
4638++ dfltcc_reset((strm), sizeof(struct inflate_state))
4639++#define INFLATE_PRIME_HOOK(strm, bits, value) \
4640++ do { if (dfltcc_inflate_disable((strm))) return Z_STREAM_ERROR; } while (0)
4641++#define INFLATE_TYPEDO_HOOK(strm, flush) \
4642++ if (dfltcc_can_inflate((strm))) { \
4643++ dfltcc_inflate_action action; \
4644++\
4645++ RESTORE(); \
4646++ action = dfltcc_inflate((strm), (flush), &ret); \
4647++ LOAD(); \
4648++ if (action == DFLTCC_INFLATE_CONTINUE) \
4649++ break; \
4650++ else if (action == DFLTCC_INFLATE_BREAK) \
4651++ goto inf_leave; \
4652++ }
4653++#define INFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_inflate((strm)))
4654++#define INFLATE_NEED_UPDATEWINDOW(strm) (!dfltcc_can_inflate((strm)))
4655++#define INFLATE_MARK_HOOK(strm) \
4656++ do { \
4657++ if (dfltcc_was_inflate_used((strm))) return -(1L << 16); \
4658++ } while (0)
4659++#define INFLATE_SYNC_POINT_HOOK(strm) \
4660++ do { \
4661++ if (dfltcc_was_inflate_used((strm))) return Z_STREAM_ERROR; \
4662++ } while (0)
4663++#define INFLATE_SET_DICTIONARY_HOOK(strm, dict, dict_len) \
4664++ do { \
4665++ if (dfltcc_can_inflate(strm)) \
4666++ return dfltcc_inflate_set_dictionary(strm, dict, dict_len); \
4667++ } while (0)
4668++#define INFLATE_GET_DICTIONARY_HOOK(strm, dict, dict_len) \
4669++ do { \
4670++ if (dfltcc_can_inflate(strm)) \
4671++ return dfltcc_inflate_get_dictionary(strm, dict, dict_len); \
4672++ } while (0)
4673++
4674++#endif
4675+diff --git a/contrib/s390/dfltcc_deflate.h b/contrib/s390/dfltcc_deflate.h
4676+new file mode 100644
4677+index 0000000..2699d15
4678+--- /dev/null
4679++++ b/contrib/s390/dfltcc_deflate.h
4680+@@ -0,0 +1,53 @@
4681++#ifndef DFLTCC_DEFLATE_H
4682++#define DFLTCC_DEFLATE_H
4683++
4684++#include "dfltcc.h"
4685++
4686++int ZLIB_INTERNAL dfltcc_can_deflate(z_streamp strm);
4687++int ZLIB_INTERNAL dfltcc_deflate(z_streamp strm,
4688++ int flush,
4689++ block_state *result);
4690++int ZLIB_INTERNAL dfltcc_deflate_params(z_streamp strm, int level,
4691++ int strategy, int *flush);
4692++int ZLIB_INTERNAL dfltcc_deflate_done(z_streamp strm, int flush);
4693++int ZLIB_INTERNAL dfltcc_deflate_set_dictionary(z_streamp strm,
4694++ const Bytef *dictionary,
4695++ uInt dict_length);
4696++int ZLIB_INTERNAL dfltcc_deflate_get_dictionary(z_streamp strm,
4697++ Bytef *dictionary,
4698++ uInt* dict_length);
4699++
4700++#define DEFLATE_SET_DICTIONARY_HOOK(strm, dict, dict_len) \
4701++ do { \
4702++ if (dfltcc_can_deflate((strm))) \
4703++ return dfltcc_deflate_set_dictionary((strm), (dict), (dict_len)); \
4704++ } while (0)
4705++#define DEFLATE_GET_DICTIONARY_HOOK(strm, dict, dict_len) \
4706++ do { \
4707++ if (dfltcc_can_deflate((strm))) \
4708++ return dfltcc_deflate_get_dictionary((strm), (dict), (dict_len)); \
4709++ } while (0)
4710++#define DEFLATE_RESET_KEEP_HOOK(strm) \
4711++ dfltcc_reset((strm), sizeof(deflate_state))
4712++#define DEFLATE_PARAMS_HOOK(strm, level, strategy, hook_flush) \
4713++ do { \
4714++ int err; \
4715++\
4716++ err = dfltcc_deflate_params((strm), \
4717++ (level), \
4718++ (strategy), \
4719++ (hook_flush)); \
4720++ if (err == Z_STREAM_ERROR) \
4721++ return err; \
4722++ } while (0)
4723++#define DEFLATE_DONE dfltcc_deflate_done
4724++#define DEFLATE_BOUND_ADJUST_COMPLEN(strm, complen, source_len) \
4725++ do { \
4726++ if (deflateStateCheck((strm)) || dfltcc_can_deflate((strm))) \
4727++ (complen) = DEFLATE_BOUND_COMPLEN(source_len); \
4728++ } while (0)
4729++#define DEFLATE_NEED_CONSERVATIVE_BOUND(strm) (dfltcc_can_deflate((strm)))
4730++#define DEFLATE_HOOK dfltcc_deflate
4731++#define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))
4732++
4733++#endif
4734+diff --git a/deflate.c b/deflate.c
4735+index bd01175..9f5bc8b 100644
4736+--- a/deflate.c
4737++++ b/deflate.c
4738+@@ -60,12 +60,24 @@ const char deflate_copyright[] =
4739+ copyright string in the executable of your product.
4740+ */
4741+
4742+-typedef enum {
4743+- need_more, /* block not completed, need more input or more output */
4744+- block_done, /* block flush performed */
4745+- finish_started, /* finish started, need only more output at next deflate */
4746+- finish_done /* finish done, accept no more input or output */
4747+-} block_state;
4748++#ifdef DFLTCC
4749++#include "contrib/s390/dfltcc_deflate.h"
4750++#else
4751++#define ZALLOC_STATE ZALLOC
4752++#define ZFREE_STATE ZFREE
4753++#define ZCOPY_STATE zmemcpy
4754++#define ZALLOC_WINDOW ZALLOC
4755++#define TRY_FREE_WINDOW TRY_FREE
4756++#define DEFLATE_SET_DICTIONARY_HOOK(strm, dict, dict_len) do {} while (0)
4757++#define DEFLATE_GET_DICTIONARY_HOOK(strm, dict, dict_len) do {} while (0)
4758++#define DEFLATE_RESET_KEEP_HOOK(strm) do {} while (0)
4759++#define DEFLATE_PARAMS_HOOK(strm, level, strategy, hook_flush) do {} while (0)
4760++#define DEFLATE_DONE(strm, flush) 1
4761++#define DEFLATE_BOUND_ADJUST_COMPLEN(strm, complen, sourceLen) do {} while (0)
4762++#define DEFLATE_NEED_CONSERVATIVE_BOUND(strm) 0
4763++#define DEFLATE_HOOK(strm, flush, bstate) 0
4764++#define DEFLATE_NEED_CHECKSUM(strm) 1
4765++#endif
4766+
4767+ typedef block_state (*compress_func)(deflate_state *s, int flush);
4768+ /* Compression function. Returns the block state after the call. */
4769+@@ -224,7 +236,8 @@ local unsigned read_buf(z_streamp strm, Bytef *buf, unsigned size) {
4770+ strm->avail_in -= len;
4771+
4772+ zmemcpy(buf, strm->next_in, len);
4773+- if (strm->state->wrap == 1) {
4774++ if (!DEFLATE_NEED_CHECKSUM(strm)) {}
4775++ else if (strm->state->wrap == 1) {
4776+ strm->adler = adler32(strm->adler, buf, len);
4777+ }
4778+ #ifdef GZIP
4779+@@ -429,7 +442,7 @@ int ZEXPORT deflateInit2_(z_streamp strm, int level, int method,
4780+ return Z_STREAM_ERROR;
4781+ }
4782+ if (windowBits == 8) windowBits = 9; /* until 256-byte window bug fixed */
4783+- s = (deflate_state *) ZALLOC(strm, 1, sizeof(deflate_state));
4784++ s = (deflate_state *) ZALLOC_STATE(strm, 1, sizeof(deflate_state));
4785+ if (s == Z_NULL) return Z_MEM_ERROR;
4786+ strm->state = (struct internal_state FAR *)s;
4787+ s->strm = strm;
4788+@@ -446,7 +459,7 @@ int ZEXPORT deflateInit2_(z_streamp strm, int level, int method,
4789+ s->hash_mask = s->hash_size - 1;
4790+ s->hash_shift = ((s->hash_bits + MIN_MATCH-1) / MIN_MATCH);
4791+
4792+- s->window = (Bytef *) ZALLOC(strm, s->w_size, 2*sizeof(Byte));
4793++ s->window = (Bytef *) ZALLOC_WINDOW(strm, s->w_size, 2*sizeof(Byte));
4794+ s->prev = (Posf *) ZALLOC(strm, s->w_size, sizeof(Pos));
4795+ s->head = (Posf *) ZALLOC(strm, s->hash_size, sizeof(Pos));
4796+
4797+@@ -559,6 +572,7 @@ int ZEXPORT deflateSetDictionary(z_streamp strm, const Bytef *dictionary,
4798+ /* when using zlib wrappers, compute Adler-32 for provided dictionary */
4799+ if (wrap == 1)
4800+ strm->adler = adler32(strm->adler, dictionary, dictLength);
4801++ DEFLATE_SET_DICTIONARY_HOOK(strm, dictionary, dictLength);
4802+ s->wrap = 0; /* avoid computing Adler-32 in read_buf */
4803+
4804+ /* if dictionary would fill window, just replace the history */
4805+@@ -614,6 +628,7 @@ int ZEXPORT deflateGetDictionary(z_streamp strm, Bytef *dictionary,
4806+
4807+ if (deflateStateCheck(strm))
4808+ return Z_STREAM_ERROR;
4809++ DEFLATE_GET_DICTIONARY_HOOK(strm, dictionary, dictLength);
4810+ s = strm->state;
4811+ len = s->strstart + s->lookahead;
4812+ if (len > s->w_size)
4813+@@ -658,6 +673,8 @@ int ZEXPORT deflateResetKeep(z_streamp strm) {
4814+
4815+ _tr_init(s);
4816+
4817++ DEFLATE_RESET_KEEP_HOOK(strm);
4818++
4819+ return Z_OK;
4820+ }
4821+
4822+@@ -740,6 +757,7 @@ int ZEXPORT deflatePrime(z_streamp strm, int bits, int value) {
4823+ int ZEXPORT deflateParams(z_streamp strm, int level, int strategy) {
4824+ deflate_state *s;
4825+ compress_func func;
4826++ int hook_flush = Z_NO_FLUSH;
4827+
4828+ if (deflateStateCheck(strm)) return Z_STREAM_ERROR;
4829+ s = strm->state;
4830+@@ -752,15 +770,18 @@ int ZEXPORT deflateParams(z_streamp strm, int level, int strategy) {
4831+ if (level < 0 || level > 9 || strategy < 0 || strategy > Z_FIXED) {
4832+ return Z_STREAM_ERROR;
4833+ }
4834++ DEFLATE_PARAMS_HOOK(strm, level, strategy, &hook_flush);
4835+ func = configuration_table[s->level].func;
4836+
4837+- if ((strategy != s->strategy || func != configuration_table[level].func) &&
4838+- s->last_flush != -2) {
4839++ if (((strategy != s->strategy || func != configuration_table[level].func) &&
4840++ s->last_flush != -2) || hook_flush != Z_NO_FLUSH) {
4841+ /* Flush the last buffer: */
4842+- int err = deflate(strm, Z_BLOCK);
4843++ int flush = RANK(hook_flush) > RANK(Z_BLOCK) ? hook_flush : Z_BLOCK;
4844++ int err = deflate(strm, flush);
4845+ if (err == Z_STREAM_ERROR)
4846+ return err;
4847+- if (strm->avail_in || (s->strstart - s->block_start) + s->lookahead)
4848++ if (strm->avail_in || (s->strstart - s->block_start) + s->lookahead ||
4849++ !DEFLATE_DONE(strm, flush))
4850+ return Z_BUF_ERROR;
4851+ }
4852+ if (s->level != level) {
4853+@@ -828,11 +849,13 @@ uLong ZEXPORT deflateBound(z_streamp strm, uLong sourceLen) {
4854+ ~13% overhead plus a small constant */
4855+ fixedlen = sourceLen + (sourceLen >> 3) + (sourceLen >> 8) +
4856+ (sourceLen >> 9) + 4;
4857++ DEFLATE_BOUND_ADJUST_COMPLEN(strm, fixedlen, sourceLen);
4858+
4859+ /* upper bound for stored blocks with length 127 (memLevel == 1) --
4860+ ~4% overhead plus a small constant */
4861+ storelen = sourceLen + (sourceLen >> 5) + (sourceLen >> 7) +
4862+ (sourceLen >> 11) + 7;
4863++ DEFLATE_BOUND_ADJUST_COMPLEN(strm, storelen, sourceLen);
4864+
4865+ /* if can't get parameters, return larger bound plus a zlib wrapper */
4866+ if (deflateStateCheck(strm))
4867+@@ -874,7 +897,8 @@ uLong ZEXPORT deflateBound(z_streamp strm, uLong sourceLen) {
4868+ }
4869+
4870+ /* if not default parameters, return one of the conservative bounds */
4871+- if (s->w_bits != 15 || s->hash_bits != 8 + 7)
4872++ if (DEFLATE_NEED_CONSERVATIVE_BOUND(strm) ||
4873++ s->w_bits != 15 || s->hash_bits != 8 + 7)
4874+ return (s->w_bits <= s->hash_bits && s->level ? fixedlen : storelen) +
4875+ wraplen;
4876+
4877+@@ -900,7 +924,7 @@ local void putShortMSB(deflate_state *s, uInt b) {
4878+ * applications may wish to modify it to avoid allocating a large
4879+ * strm->next_out buffer and copying into it. (See also read_buf()).
4880+ */
4881+-local void flush_pending(z_streamp strm) {
4882++void ZLIB_INTERNAL flush_pending(z_streamp strm) {
4883+ unsigned len;
4884+ deflate_state *s = strm->state;
4885+
4886+@@ -1167,7 +1191,8 @@ int ZEXPORT deflate(z_streamp strm, int flush) {
4887+ (flush != Z_NO_FLUSH && s->status != FINISH_STATE)) {
4888+ block_state bstate;
4889+
4890+- bstate = s->level == 0 ? deflate_stored(s, flush) :
4891++ bstate = DEFLATE_HOOK(strm, flush, &bstate) ? bstate :
4892++ s->level == 0 ? deflate_stored(s, flush) :
4893+ s->strategy == Z_HUFFMAN_ONLY ? deflate_huff(s, flush) :
4894+ s->strategy == Z_RLE ? deflate_rle(s, flush) :
4895+ (*(configuration_table[s->level].func))(s, flush);
4896+@@ -1214,7 +1239,6 @@ int ZEXPORT deflate(z_streamp strm, int flush) {
4897+ }
4898+
4899+ if (flush != Z_FINISH) return Z_OK;
4900+- if (s->wrap <= 0) return Z_STREAM_END;
4901+
4902+ /* Write the trailer */
4903+ #ifdef GZIP
4904+@@ -1230,7 +1254,7 @@ int ZEXPORT deflate(z_streamp strm, int flush) {
4905+ }
4906+ else
4907+ #endif
4908+- {
4909++ if (s->wrap == 1) {
4910+ putShortMSB(s, (uInt)(strm->adler >> 16));
4911+ putShortMSB(s, (uInt)(strm->adler & 0xffff));
4912+ }
4913+@@ -1239,7 +1263,11 @@ int ZEXPORT deflate(z_streamp strm, int flush) {
4914+ * to flush the rest.
4915+ */
4916+ if (s->wrap > 0) s->wrap = -s->wrap; /* write the trailer only once! */
4917+- return s->pending != 0 ? Z_OK : Z_STREAM_END;
4918++ if (s->pending == 0) {
4919++ Assert(s->bi_valid == 0, "bi_buf not flushed");
4920++ return Z_STREAM_END;
4921++ }
4922++ return Z_OK;
4923+ }
4924+
4925+ /* ========================================================================= */
4926+@@ -1254,9 +1282,9 @@ int ZEXPORT deflateEnd(z_streamp strm) {
4927+ TRY_FREE(strm, strm->state->pending_buf);
4928+ TRY_FREE(strm, strm->state->head);
4929+ TRY_FREE(strm, strm->state->prev);
4930+- TRY_FREE(strm, strm->state->window);
4931++ TRY_FREE_WINDOW(strm, strm->state->window);
4932+
4933+- ZFREE(strm, strm->state);
4934++ ZFREE_STATE(strm, strm->state);
4935+ strm->state = Z_NULL;
4936+
4937+ return status == BUSY_STATE ? Z_DATA_ERROR : Z_OK;
4938+@@ -1285,13 +1313,13 @@ int ZEXPORT deflateCopy(z_streamp dest, z_streamp source) {
4939+
4940+ zmemcpy((voidpf)dest, (voidpf)source, sizeof(z_stream));
4941+
4942+- ds = (deflate_state *) ZALLOC(dest, 1, sizeof(deflate_state));
4943++ ds = (deflate_state *) ZALLOC_STATE(dest, 1, sizeof(deflate_state));
4944+ if (ds == Z_NULL) return Z_MEM_ERROR;
4945+ dest->state = (struct internal_state FAR *) ds;
4946+- zmemcpy((voidpf)ds, (voidpf)ss, sizeof(deflate_state));
4947++ ZCOPY_STATE((voidpf)ds, (voidpf)ss, sizeof(deflate_state));
4948+ ds->strm = dest;
4949+
4950+- ds->window = (Bytef *) ZALLOC(dest, ds->w_size, 2*sizeof(Byte));
4951++ ds->window = (Bytef *) ZALLOC_WINDOW(dest, ds->w_size, 2*sizeof(Byte));
4952+ ds->prev = (Posf *) ZALLOC(dest, ds->w_size, sizeof(Pos));
4953+ ds->head = (Posf *) ZALLOC(dest, ds->hash_size, sizeof(Pos));
4954+ ds->pending_buf = (uchf *) ZALLOC(dest, ds->lit_bufsize, 4);
4955+diff --git a/deflate.h b/deflate.h
4956+index 8696791..d49e698 100644
4957+--- a/deflate.h
4958++++ b/deflate.h
4959+@@ -299,6 +299,7 @@ void ZLIB_INTERNAL _tr_flush_bits(deflate_state *s);
4960+ void ZLIB_INTERNAL _tr_align(deflate_state *s);
4961+ void ZLIB_INTERNAL _tr_stored_block(deflate_state *s, charf *buf,
4962+ ulg stored_len, int last);
4963++void ZLIB_INTERNAL _tr_send_bits(deflate_state *s, int value, int length);
4964+
4965+ #define d_code(dist) \
4966+ ((dist) < 256 ? _dist_code[dist] : _dist_code[256+((dist)>>7)])
4967+@@ -343,4 +344,15 @@ void ZLIB_INTERNAL _tr_stored_block(deflate_state *s, charf *buf,
4968+ flush = _tr_tally(s, distance, length)
4969+ #endif
4970+
4971++typedef enum {
4972++ need_more, /* block not completed, need more input or more output */
4973++ block_done, /* block flush performed */
4974++ finish_started, /* finish started, need only more output at next deflate */
4975++ finish_done /* finish done, accept no more input or output */
4976++} block_state;
4977++
4978++unsigned ZLIB_INTERNAL bi_reverse(unsigned code, int len);
4979++void ZLIB_INTERNAL bi_windup(deflate_state *s);
4980++void ZLIB_INTERNAL flush_pending(z_streamp strm);
4981++
4982+ #endif /* DEFLATE_H */
4983+diff --git a/gzguts.h b/gzguts.h
4984+index f937504..5adfd1d 100644
4985+--- a/gzguts.h
4986++++ b/gzguts.h
4987+@@ -152,7 +152,11 @@
4988+
4989+ /* default i/o buffer size -- double this for output when reading (this and
4990+ twice this must be able to fit in an unsigned type) */
4991++#ifdef DFLTCC
4992++#define GZBUFSIZE 131072
4993++#else
4994+ #define GZBUFSIZE 8192
4995++#endif
4996+
4997+ /* gzip modes, also provide a little integrity check on the passed structure */
4998+ #define GZ_NONE 0
4999+diff --git a/inflate.c b/inflate.c
5000+index b0757a9..c0f808f 100644
The diff has been truncated for viewing.
