Merge ~mkukri/ubuntu/+source/zlib:merge into ubuntu/+source/zlib:debian/sid

Proposed by Mate Kukri
Status: Merged
Merge reported by: Mate Kukri
Merged at revision: 515581d841bd3732d669f9806966080208c840b8
Proposed branch: ~mkukri/ubuntu/+source/zlib:merge
Merge into: ubuntu/+source/zlib:debian/sid
Diff against target: 6023 lines (+5732/-19)
17 files modified
debian/changelog (+246/-0)
debian/control (+24/-1)
debian/libx32z1-dev.dirs (+1/-0)
debian/libx32z1-dev.install (+2/-0)
debian/libx32z1.dirs (+1/-0)
debian/libx32z1.install (+1/-0)
debian/libx32z1.symbols (+3/-0)
debian/patches/power/add-optimized-crc32.patch (+2539/-0)
debian/patches/power/fix-clang7-builtins.patch (+62/-0)
debian/patches/power/indirect-func-macros.patch (+295/-0)
debian/patches/s390x/add-accel-deflate.patch (+2043/-0)
debian/patches/s390x/add-vectorized-crc32.patch (+426/-0)
debian/patches/series (+5/-0)
debian/rules (+39/-5)
debian/upstream/signing-key.asc (+30/-0)
debian/watch (+2/-0)
debian/zlib-core.symbols (+13/-13)
Reviewer                     Review Type   Date Requested   Status
Lukas Märdian (community)                                   Approve
Frank Heimes (community)                                    Approve
Steve Langasek (community)                                  Abstain
Ubuntu Sponsors                                             Pending
git-ubuntu import                                           Pending
Review via email: mp+456176@code.launchpad.net

Commit message

Merge zlib with Debian unstable.

This needed some TLC:
- Split the previous diff with git ubuntu
- Replaced the POWER and s390x patches with the newest ones from IBM rebased on Debian
- Removed the superseded bugfix patches (now included in the above)

Revision history for this message
Steve Langasek (vorlon) wrote :

I'm off until end of year so I think you should grab a different reviewer for this

review: Abstain
Revision history for this message
Mate Kukri (mkukri) wrote :

> I'm off until end of year so I think you should grab a different reviewer for
> this

Understood. I saw your name and Frank Heimes's on the last changelog entries; that's what I based this on.

Do you have anyone in mind who has touched this package before and might be willing to review this?

Revision history for this message
Steve Langasek (vorlon) wrote :

On Thu, Nov 23, 2023 at 01:30:54PM -0000, Mate Kukri wrote:
> > I'm off until end of year so I think you should grab a different reviewer for
> > this

> Understood, I saw your and Frank Heimes's name on the last changelog
> entries, that's what I based this on.
>
> Do you have any names in mind who has touched this package before and
> might be willing to review this?

I don't think "touched this package" is a relevant criterion and you should
ask around in Foundations (or just ask ~canonical-foundations as a reviewer)

~mkukri/ubuntu/+source/zlib:merge updated
b2a9df2... by Mate Kukri

merge-changelogs

87e1e2b... by Mate Kukri

reconstruct-changelog

Revision history for this message
Mate Kukri (mkukri) wrote :

Now based on 1:1.3.dfsg-3

~mkukri/ubuntu/+source/zlib:merge updated
515581d... by Mate Kukri

update-maintainer

Revision history for this message
Frank Heimes (fheimes) wrote :

I think this looks good, and is a nice clean-up.

Since this is merged to the noble development release quite early, there should be some time to ask the IBM s390x people to give it a try (I remember that Ilya Leoshkevich <email address hidden> had some test code).

Once I see that this has landed, I would like to ask Ilya (no need for you to do anything, but it would let us make sure that the changed s390x optimization patches work fine ...).

review: Approve
Revision history for this message
Mate Kukri (mkukri) wrote :

> I think this looks good, and is a nice clean-up.
>
> Since this is merged to the noble development release quite early, there
> should be some time to ask the IBM s390x people to give it a try (I remember
> that Ilya Leoshkevich <email address hidden> had some test code).
>
> Once I see that this landed, I would like to ask Ilya (no need for you to do
> anything, but that allows to ensure that the changing s390x optimization
> patches work fine ...).

Are you also able to upload this, or should I ask someone else?

Revision history for this message
Frank Heimes (fheimes) wrote :

Hi Mate,
I'm sorry, you would need a coredev for uploading, since it's a main
package - and I am only MOTU (working on coredev ;-).
IIRC schopin sponsored my zlib uploads in the past ...

Bye, Frank

Ubuntu on s390x Blog -- ubuntu-on-big-iron.blogspot.com
<http://ubuntu-on-big-iron.blogspot.com/?view=sidebar>

On Mon, Nov 27, 2023 at 3:01 PM Mate Kukri <email address hidden>
wrote:

> > I think this looks good, and is a nice clean-up.
> >
> > Since this is merged to the noble development release quite early, there
> > should be some time to ask the IBM s390x people to give it a try (I
> > remember that Ilya Leoshkevich <email address hidden> had some test code).
> >
> > Once I see that this landed, I would like to ask Ilya (no need for you to
> > do anything, but that allows to ensure that the changing s390x optimization
> > patches work fine ...).
>
> Are you also able to upload this, or should I ask someone else?
> --
>
> https://code.launchpad.net/~mkukri/ubuntu/+source/zlib/+git/zlib/+merge/456176
> You are reviewing the proposed merge of ~mkukri/ubuntu/+source/zlib:merge
> into ubuntu/+source/zlib:debian/sid.
>
>

Revision history for this message
Frank Heimes (fheimes) wrote :

Btw. I haven't seen an LP bug reference in the changelog. Are you doing this
merge based on an LP bug (which I assume)? If so, please don't forget to
reference this LP bug in d/changelog.

On Thu, Nov 23, 2023 at 2:15 PM Mate Kukri <email address hidden>
wrote:

> You have been requested to review the proposed merge of
> ~mkukri/ubuntu/+source/zlib:merge into ubuntu/+source/zlib:debian/sid.
>
> For more details, see:
>
> https://code.launchpad.net/~mkukri/ubuntu/+source/zlib/+git/zlib/+merge/456176
>
>
>
> --
> You are requested to review the proposed merge of
> ~mkukri/ubuntu/+source/zlib:merge into ubuntu/+source/zlib:debian/sid.
>

Revision history for this message
Mate Kukri (mkukri) wrote :

I don't think there is an LP bug for this, maybe I should have created one, but this is tracked internally on the Foundations Jira.

> Btw. I haven't seen a LP bug reference in the changelog, are you doing this
> merge based on a LP bug ? (what I assume), then please don't forget to
> reference this LP bug in d/changelog.

Revision history for this message
Frank Heimes (fheimes) wrote :

I think the Wiki page for merging recommends doing so:
https://wiki.ubuntu.com/UbuntuDevelopment/Merging
"FILE A MERGE BUG"

On Tue, Nov 28, 2023 at 9:08 AM Mate Kukri <email address hidden>
wrote:

> I don't think there is an LP bug for this, maybe I should have created
> one, but this is tracked internally on the Foundations Jira.
>
> > Btw. I haven't seen a LP bug reference in the changelog, are you doing this
> > merge based on a LP bug ? (what I assume), then please don't forget to
> > reference this LP bug in d/changelog.

Revision history for this message
Lukas Märdian (slyon) wrote :

Thank you Mate, that's indeed a really nice cleanup!

The new patches are nicely structured and provide clean patch headers. I confirmed they match the patches from Ilya (iii-i/zlib/dfltcc) on GitHub. Besides the new patches the delta looks very similar to our previous delta, but this time as clean git-ubuntu commits. Kudos!

@Frank: you mentioned there might be some test code available; I wonder if we could somehow integrate that into the package, because unfortunately there doesn't seem to be any dh_auto_test or autopkgtest. :(
Either way, we should definitely ask IBM/Ilya to verify that the new patches work as intended.

@Mate: We should also consider upstreaming the d/watch delta to Debian. I think it could be useful there, and it doesn't need to be part of our delta.

Test build passed in a PPA:
https://launchpad.net/~mkukri/+archive/ubuntu/dev/+packages?field.name_filter=&field.status_filter=published&field.series_filter=noble

LGTM. Sponsoring.

review: Approve
Revision history for this message
Frank Heimes (fheimes) wrote :

From what I remember, 'iii' has just a few roughly coded C programs that
test s390x optimizations and verify some bugs that popped up in the past.
(Unfortunately) I assume they are not in a shape to be integrated as a
standard test - and they are s390x specific anyway ... :-/

I was thinking more of using them as a kind of regression test for the
s390x-specific bits and pieces.

But I'll ask - maybe there was some more work on it that I am not aware of
...

On Tue, Nov 28, 2023 at 4:31 PM Lukas Märdian <email address hidden>
wrote:

> Review: Approve
>
> Thank you Mate, that's indeed a really nice cleanup!
>
> The new patches are nicely structured and provide clean patch headers. I
> confirmed they match the patches from Ilya (iii-i/zlib/dfltcc) on GitHub.
> Besides the new patches the delta looks very similar to our previous delta,
> but this time as clean git-ubuntu commits. Kudos!
>
> @Frank: you mention there might be some test code available, I wonder if
> we could somehow integrate that into the package? Because unfortunately
> there doesn't seem to be any dh_auto_test nor autopkgtest. :(
> Either way, we should definitely ask IBM/Ilya to verify that the new
> patches work as intended.
>
> @Mate: We should also consider upstreaming the d/watch delta to Debian, I
> think that could be useful and doesn't need to be part of the delta.
>
> Test build passed in a PPA:
>
> https://launchpad.net/~mkukri/+archive/ubuntu/dev/+packages?field.name_filter=&field.status_filter=published&field.series_filter=noble
>
> LGTM. Sponsoring.

Revision history for this message
Frank Heimes (fheimes) wrote :

So Ilya was pretty quick. He tested the package in a mantic environment
(which is still close to noble) and all his tests passed!

As assumed, his tests are s390x specific - so not very useful for a more
generic autopkgtest.

Anyway, glad that he could give it a try and came back with a :thumbs up:

On Tue, Nov 28, 2023 at 5:14 PM Frank Heimes <email address hidden>
wrote:

> From what I remember 'iii' has just a few roughly coded C programs, that
> test s390x optimizations and verify some bugs (that popped up in the past).
> (Unfortunately) I assume is not in a shape to be integrated as standard
> test - and is s390x specific anyway ... :-/
>
> I more thought about using these as kind of regression testing for the
> s390x specific bits and pieces.
>
> But I'll ask - maybe there was some more work on it, that I am not aware of
> ...
>

Revision history for this message
Mate Kukri (mkukri) wrote :

@fheimes That is good news.

If the test code is in a publishable state, it might still be worth a shot to integrate it as an s390x-specific autopkgtest.

The s390x patches and the POWER crc32 patch are our only significant delta over Debian, so I think such a test would still help give more confidence in these merges.
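As a rough illustration of what a portable, architecture-independent check could look like, the following C sketch implements a bit-at-a-time CRC-32 using the same reflected polynomial (0xEDB88320) and pre/post inversion as zlib's `crc32()`. The name `ref_crc32` is hypothetical; it is not part of zlib or of Ilya's test programs. The assumption is that a test would compare its output with the optimized POWER8 or s390x `crc32()` over many buffers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference CRC-32 (not part of zlib): processes one bit at a
 * time with the reflected polynomial 0xEDB88320 and the same pre- and
 * post-inversion as zlib's crc32(), so calls can be chained by passing the
 * previous return value as `crc` (start with 0).
 */
uint32_t ref_crc32(uint32_t crc, const unsigned char *buf, size_t len)
{
    /* 0 becomes the 0xFFFFFFFF initial register; a previous result is un-inverted */
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc; /* final inversion */
}
```

In a test linked with `-lz`, one could assert that `crc32(0L, buf, len)` from `<zlib.h>` equals `ref_crc32(0, buf, len)` for a range of lengths and start offsets, since vectorized implementations typically handle unaligned heads and short tails separately. The standard CRC-32 check value for the ASCII string "123456789" is 0xCBF43926.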

Preview Diff

diff --git a/debian/changelog b/debian/changelog
index 92d84a0..d52ce34 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,25 @@
+zlib (1:1.3.dfsg-3ubuntu1) noble; urgency=medium
+
+  * Merge with Debian unstable. Remaining changes:
+    - Build x32 packages
+    - Add watch file, with GPG tarball checking, and version mangling
+    - d/rules: Compile with DFLTCC enabled on s390x and hardware
+      compression at level 6
+    - d/zlib-core.symbols: Drop dfsg suffix from version
+  * New patches rebased from iii-i/zlib/dfltcc on GitHub:
+    - d/p/power/*: Add optimized crc32 for POWER8+
+    - d/p/s390x/*: Add optimized crc32 and hardware deflate
+  * Patches superseded by the above:
+    - d/p/410.patch: Add support for IBM Z hardware-accelerated deflate
+    - d/p/478.patch: Add optimized crc32 for Power 8+ processors
+    - d/p/s390x-vectorize-crc32.patch: Add s390x vectorized crc32 support
+    - d/p/1390.patch: Don't update strm.adler for raw streams on s390x
+      (DFLTCC), otherwise libxml2 gets broken on s390x. LP #2002511
+    - d/p/lp-2018293-fix-crash-in-deflateBound-if-called-before-deflateInt
+      .patch: Avoid potential deflateBound() function crash on s390x
+
+ -- Mate Kukri <mate.kukri@canonical.com>  Fri, 24 Nov 2023 08:22:52 +0000
+
 zlib (1:1.3.dfsg-3) unstable; urgency=low
 
   * Update the version of texlive-binaries we break since they still had
@@ -34,6 +56,74 @@ zlib (1:1.2.13.dfsg-2) unstable; urgency=low
 
  -- Mark Brown <broonie@debian.org>  Tue, 15 Aug 2023 00:28:42 +0100
 
+zlib (1:1.2.13.dfsg-1ubuntu5) mantic; urgency=medium
+
+  * Add
+    d/p/lp-2018293-fix-crash-in-deflateBound-if-called-before-deflateInt.patch
+    to avoid potential deflateBound() function crash on s390x.
+  * Clean-up and remove
+    d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch since it was
+    replaced by d/p/s390x-vectorize-crc32.patch with 1.2.13.dfsg-1ubuntu3
+    but was still in d/p/ (but not in d/p/series).
+
+ -- Frank Heimes <frank.heimes@canonical.com>  Wed, 02 Aug 2023 13:22:26 +0200
+
+zlib (1:1.2.13.dfsg-1ubuntu4) lunar; urgency=medium
+
+  * Add d/p/1390.patch to not update strm.adler for raw streams on s390x
+    (DFLTCC), otherwise libxml2 gets broken on s390x. LP: #2002511
+
+ -- Frank Heimes <frank.heimes@canonical.com>  Wed, 11 Jan 2023 18:02:34 +0100
+
+zlib (1:1.2.13.dfsg-1ubuntu3) lunar; urgency=medium
+
+  * Re-add vectorized crc32 support for s390x by adding
+    d/p/s390x-vectorize-crc32.patch
+    (crc32vx-v4: s390x: vectorize crc32). (LP: #1998470)
+    This replaces the previously dropped patch:
+    lp1932010-ibm-z-add-vectorized-crc32-implementation.patch
+  * Remove option '--crc32-vx' for s390x in d/rules, that was previously just
+    commented out, since it's no longer needed with the new s390x crc32 code.
+  * Update d/p/410.patch to version 26f2c0a4e17e5558d779797d713aa37ebaeef390
+    due to unused "const char *endptr;".
+
+ -- Frank Heimes <frank.heimes@canonical.com>  Mon, 21 Nov 2022 20:28:58 +0100
+
+zlib (1:1.2.13.dfsg-1ubuntu2) lunar; urgency=medium
+
+  * Comment out use of --crc32-vx on s390x, since this is currently not
+    implemented due to the dropped patch that needs porting.
+
+ -- Steve Langasek <steve.langasek@ubuntu.com>  Tue, 15 Nov 2022 17:06:45 +0000
+
+zlib (1:1.2.13.dfsg-1ubuntu1) lunar; urgency=low
+
+  * Merge from Debian unstable. Remaining changes:
+    - Build x32 packages
+    - debian/zlib-core.symbols: Drop dfsg suffix from version
+    - Add watch file, with GPG tarball checking, and version mangling
+    - Cherrypick PR#410 to enable hardware-accelerated deflate.
+    - Copmile with DFLTCC enabled on s390x.
+    - Enable hardware compression on s390x at level 6.
+    - d/rules: use configure options for dfltcc instead of hardcoding
+      the CFLAGS
+  * Dropped changes, included upstream:
+    - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch
+    - debian/patches/CVE-2018-25032-2.patch: assure that the number of bits
+      for deflatePrime() is valid in deflate.c.
+  * Pull rebased 410.patch from https://github.com/madler/zlib/pull/410.
+  * Drop d/p/410-lp1961427.patch, included in the above rebase.
+  * Replace 335.patch for ppc64el (P8) crc32 performance with 478.patch which
+    supersedes it (https://github.com/madler/zlib/pull/478).
+  * Forward-port lp1932010-ibm-z-add-vectorized-crc32-implementation.patch.
+  * Dropped changes:
+    - d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch: this
+      patch depends on zlib upstream PR 335 which has been superseded by
+      upstream PR 478 with significant refactoring. Drop this patch,
+      pending a port from IBM.
+
+ -- Steve Langasek <steve.langasek@ubuntu.com>  Mon, 07 Nov 2022 15:57:28 -0800
+
 zlib (1:1.2.13.dfsg-1) unstable; urgency=low
 
   * New upstream release.
@@ -42,6 +132,38 @@ zlib (1:1.2.13.dfsg-1) unstable; urgency=low
 
  -- Mark Brown <broonie@debian.org>  Sat, 05 Nov 2022 12:24:46 +0000
 
+zlib (1:1.2.11.dfsg-4.1ubuntu1) kinetic; urgency=low
+
+  * Merge from Debian unstable. Remaining changes:
+    - Build x32 packages
+    - debian/zlib-core.symbols: Drop dfsg suffix from version
+    - Add watch file, with GPG tarball checking, and version mangling
+    - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
+    - Cherrypick PR#410 to enable hardware-accelerated deflate.
+    - Copmile with DFLTCC enabled on s390x.
+    - Improve crc32 performance on P8, proposed upstream patch.
+    - Enable hardware compression on s390x at level 6.
+    - Cherrypick update of s390x hw acceleration #410 pull request patch,
+      which corrects inflateSyncPoint() return value to always gracefully
+      fail when hw acceleration is in use.
+    - d/rules: use configure options for dfltcc instead of hardcoding
+      the CFLAGS
+    - d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch
+      ported from zlib-ng #912, adding a vectorized implementation
+      of CRC32 on s390x architectures based on kernel code.
+    - d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch: adjust
+      to not make a PLT call in an ifunc on s390/s390x.
+    - debian/patches/CVE-2018-25032-2.patch: assure that the number of bits
+      for deflatePrime() is valid in deflate.c.
+    - d/p/410-lp1961427.patch ported from zlib #410, fixing
+      compressBound() with hw acceleration.
+  * Dropped changes, included in Debian:
+    - debian/patches/CVE-2018-25032-1.patch: fix a bug that can crash
+      deflate on some input when using Z_FIXED in deflate.c, deflate.h.
+  * Refresh 410.patch for upstream changes.
+
+ -- Steve Langasek <steve.langasek@ubuntu.com>  Thu, 18 Aug 2022 09:09:22 -0700
+
 zlib (1:1.2.11.dfsg-4.1) unstable; urgency=medium
 
   * Non-maintainer upload.
@@ -69,6 +191,89 @@ zlib (1:1.2.11.dfsg-3) unstable; urgency=low
 
  -- Mark Brown <broonie@debian.org>  Fri, 18 Mar 2022 00:21:37 +0000
 
+zlib (1:1.2.11.dfsg-2ubuntu10) kinetic; urgency=medium
+
+  * d/p/410-lp1961427.patch ported from zlib #410, fixing
+    compressBound() with hw acceleration. LP: #1961427
+    Thanks to Ilya Leoshkevich <iii@linux.ibm.com>.
+    In addition a patch is needed for bedtools.
+
+ -- Frank Heimes <frank.heimes@canonical.com>  Thu, 21 Jul 2022 09:30:05 +0100
+
+zlib (1:1.2.11.dfsg-2ubuntu9) jammy; urgency=medium
+
+  * SECURITY UPDATE: memory corruption when deflating
+    - debian/patches/CVE-2018-25032-1.patch: fix a bug that can crash
+      deflate on some input when using Z_FIXED in deflate.c, deflate.h.
+    - debian/patches/CVE-2018-25032-2.patch: assure that the number of bits
+      for deflatePrime() is valid in deflate.c.
+    - CVE-2018-25032
+
+ -- Marc Deslauriers <marc.deslauriers@ubuntu.com>  Fri, 25 Mar 2022 08:06:31 -0400
+
+zlib (1:1.2.11.dfsg-2ubuntu7) impish; urgency=medium
+
+  [ Simon Chopin ]
+  * d/rules: use configure options for dfltcc instead of hardcoding
+    the CFLAGS
+  * d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch
+    ported from zlib-ng #912, adding a vectorized implementation
+    of CRC32 on s390x architectures based on kernel code. LP: #1932010
+
+  [ Michael Hudson-Doyle ]
+  * d/p/lp1932010-ibm-z-add-vectorized-crc32-implementation.patch: adjust to
+    not make a PLT call in an ifunc on s390/s390x.
+
+ -- Simon Chopin <simon.chopin@canonical.com>  Thu, 12 Aug 2021 15:45:49 +1200
+
+zlib (1:1.2.11.dfsg-2ubuntu6) hirsute; urgency=medium
+
+  * No-change rebuild to build with lto.
+
+ -- Matthias Klose <doko@ubuntu.com>  Sun, 28 Mar 2021 09:10:07 +0200
+
+zlib (1:1.2.11.dfsg-2ubuntu5) hirsute; urgency=medium
+
+  * No-change rebuild to drop the udeb package.
+
+ -- Matthias Klose <doko@ubuntu.com>  Mon, 22 Feb 2021 10:36:58 +0100
+
+zlib (1:1.2.11.dfsg-2ubuntu4) groovy; urgency=medium
+
+  * Cherrypick update of s390x hw acceleration #410 pull request patch,
+    which corrects inflateSyncPoint() return value to always gracefully
+    fail when hw acceleration is in use. This fixes rsync failure with
+    zlib compression on hw accelerated s390x. LP: #1899621
+
+ -- Dimitri John Ledkov <xnox@ubuntu.com>  Thu, 15 Oct 2020 11:01:38 +0100
+
+zlib (1:1.2.11.dfsg-2ubuntu3) groovy; urgency=medium
+
+  * Enable hardware compression on s390x at level 6. LP: #1884514
+
+ -- Michael Hudson-Doyle <michael.hudson@ubuntu.com>  Thu, 24 Sep 2020 08:44:35 +1200
+
+zlib (1:1.2.11.dfsg-2ubuntu2) groovy; urgency=medium
+
+  * Update d/patches/410.patch to current state. LP: #1882494, #1889059, #1893170
+
+ -- Michael Hudson-Doyle <michael.hudson@ubuntu.com>  Thu, 20 Aug 2020 11:52:59 +1200
+
+zlib (1:1.2.11.dfsg-2ubuntu1) focal; urgency=medium
+
+  * Merge with Debian; remaining changes:
+    - Build x32 packages
+    - debian/zlib-core.symbols: Drop dfsg suffix from version
+    - Add watch file, with GPG tarball checking, and version mangling
+    - Drop unused patches
+    - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
+      (LP: #1692870)
+    - Cherrypick PR#410 to enable hardware-accelerated deflate.
+    - Copmile with DFLTCC enabled on s390x. LP: #1823157
+    - Improve crc32 performance on P8, proposed upstream patch. LP: #1742941.
+
+ -- Matthias Klose <doko@ubuntu.com>  Tue, 25 Feb 2020 16:59:52 +0100
+
 zlib (1:1.2.11.dfsg-2) unstable; urgency=low
 
   * Acknowledge previous NMUs (closes: #949388).
@@ -80,6 +285,21 @@ zlib (1:1.2.11.dfsg-2) unstable; urgency=low
 
  -- Mark Brown <broonie@debian.org>  Mon, 24 Feb 2020 21:07:12 +0000
 
+zlib (1:1.2.11.dfsg-1.2ubuntu1) focal; urgency=medium
+
+  * Merge with Debian; remaining changes:
+    - Build x32 packages
+    - debian/zlib-core.symbols: Drop dfsg suffix from version
+    - Add watch file, with GPG tarball checking, and version mangling
+    - Drop unused patches
+    - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
+      (LP: #1692870)
+    - Cherrypick PR#410 to enable hardware-accelerated deflate.
+    - Copmile with DFLTCC enabled on s390x. LP: #1823157
+  * Improve crc32 performance on P8, proposed upstream patch. LP: #1742941.
+
+ -- Matthias Klose <doko@ubuntu.com>  Mon, 24 Feb 2020 12:57:03 +0100
+
 zlib (1:1.2.11.dfsg-1.2) unstable; urgency=medium
 
   * Non-maintainer upload.
@@ -97,6 +317,31 @@ zlib (1:1.2.11.dfsg-1.1) unstable; urgency=medium
 
  -- YunQiang Su <syq@debian.org>  Tue, 28 Jan 2020 19:55:38 +0800
 
+zlib (1:1.2.11.dfsg-1ubuntu3) eoan; urgency=medium
+
+  * Cherrypick PR#410 to enable hardware-accelerated deflate.
+  * Copmile with DFLTCC enabled on s390x. LP: #1823157
+
+ -- Dimitri John Ledkov <xnox@ubuntu.com>  Mon, 19 Aug 2019 19:51:09 +0100
+
+zlib (1:1.2.11.dfsg-1ubuntu2) disco; urgency=medium
+
+  * debian/zlib-core.symbols: fix mistake introduced in the merge
+
+ -- Jeremy Bicha <jbicha@debian.org>  Thu, 24 Jan 2019 12:56:53 -0500
+
+zlib (1:1.2.11.dfsg-1ubuntu1) disco; urgency=medium
+
+  * Sync with Debian. Remaining changes:
+    - Build x32 packages
+    - debian/zlib-core.symbols: Drop dfsg suffix from version
+    - Add watch file, with GPG tarball checking, and version mangling
+    - Drop unused patches
+    - Cherry-pick Permit-a-deflateParams-parameter-change-asap.patch:
+      (LP: #1692870)
+
+ -- Jeremy Bicha <jbicha@debian.org>  Wed, 23 Jan 2019 17:22:17 -0500
+
 zlib (1:1.2.11.dfsg-1) unstable; urgency=low
 
   * New upstream release (closes: #883180).
@@ -1072,3 +1317,4 @@ zlib (1.0.4-1) unstable; urgency=low
   * Moved to new source packaging format.
 
  -- Michael Alan Dorman <mdorman@calder.med.miami.edu>  Thu, 12 Sep 1996 15:19:35 -0400
+
diff --git a/debian/control b/debian/control
index 3b4ff22..f365460 100644
--- a/debian/control
+++ b/debian/control
@@ -1,7 +1,8 @@
 Source: zlib
 Section: libs
 Priority: optional
-Maintainer: Mark Brown <broonie@debian.org>
+Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
+XSBC-Original-Maintainer: Mark Brown <broonie@debian.org>
 Standards-Version: 4.6.1
 Homepage: http://zlib.net/
 Build-Depends: debhelper (>= 13), gcc-multilib [amd64 i386 kfreebsd-amd64 mips mipsel powerpc ppc64 s390 sparc s390x mipsn32 mipsn32el mipsr6 mipsr6el mipsn32r6 mipsn32r6el mips64 mips64el mips64r6 mips64r6el x32] <!nobiarch>, dpkg-dev (>= 1.16.1), autoconf
@@ -119,6 +120,28 @@ Description: compression library - n32 - DO NOT USE EXCEPT FOR PACKAGING
  not need to build packages should use multiarch to install the relevant
  runtime.
 
+Package: libx32z1
+Architecture: amd64 i386
+Depends: ${shlibs:Depends}, ${misc:Depends}
+Description: compression library - x32 runtime
+ zlib is a library implementing the deflate compression method found
+ in gzip and PKZIP. This package includes a n32 version of the shared
+ library.
+
+Package: libx32z1-dev
+Section: libdevel
+Architecture: amd64 i386
+Depends: libx32z1 (= ${binary:Version}), zlib1g-dev (= ${binary:Version}), libc6-dev-x32, ${misc:Depends}
+Provides: libx32z-dev
+Description: compression library - x32 - DO NOT USE EXCEPT FOR PACKAGING
+ zlib is a library implementing the deflate compression method found
+ in gzip and PKZIP. This package includes the development support
+ files for building n32 applications.
+ .
+ This package should ONLY be used for building packages, users who do
+ not need to build packages should use multiarch to install the relevant
+ runtime.
+
 Package: minizip
 Section: utils
 Architecture: any
diff --git a/debian/libx32z1-dev.dirs b/debian/libx32z1-dev.dirs
new file mode 100644
index 0000000..5447591
--- /dev/null
+++ b/debian/libx32z1-dev.dirs
@@ -0,0 +1 @@
+usr/libx32
diff --git a/debian/libx32z1-dev.install b/debian/libx32z1-dev.install
new file mode 100644
index 0000000..a865054
--- /dev/null
+++ b/debian/libx32z1-dev.install
@@ -0,0 +1,2 @@
+usr/libx32/libz.a
+usr/libx32/libz.so
diff --git a/debian/libx32z1.dirs b/debian/libx32z1.dirs
new file mode 100644
index 0000000..5447591
--- /dev/null
+++ b/debian/libx32z1.dirs
@@ -0,0 +1 @@
+usr/libx32
diff --git a/debian/libx32z1.install b/debian/libx32z1.install
new file mode 100644
index 0000000..3ff82f2
--- /dev/null
+++ b/debian/libx32z1.install
@@ -0,0 +1 @@
+usr/libx32/libz.so.*
diff --git a/debian/libx32z1.symbols b/debian/libx32z1.symbols
new file mode 100644
index 0000000..a87cfdc
--- /dev/null
+++ b/debian/libx32z1.symbols
@@ -0,0 +1,3 @@
+libz.so.1 libx32z1 #MINVER#
+#include "zlib-core.symbols"
+#include "zlib-64.symbols"
diff --git a/debian/patches/power/add-optimized-crc32.patch b/debian/patches/power/add-optimized-crc32.patch
0new file mode 1006444new file mode 100644
index 0000000..b057b57
--- /dev/null
+++ b/debian/patches/power/add-optimized-crc32.patch
@@ -0,0 +1,2539 @@
From: Manjunath S Matti <mmatti@linux.ibm.com>
Date: Thu, 14 Sep 2023 06:43:11 -0500
Subject: Add Power8+ optimized crc32

This commit adds an optimized version for the crc32 function based
on crc32-vpmsum from https://github.com/antonblanchard/crc32-vpmsum/

This is the C implementation created by Rogerio Alves
<rogealve@br.ibm.com>

It makes use of vector instructions to speed up CRC32 algorithm.

Author: Rogerio Alves <rcardoso@linux.ibm.com>
Signed-off-by: Manjunath Matti <mmatti@linux.ibm.com>

Origin: i-iii/zlib,https://github.com/iii-i/zlib/commit/6879bc81b111247939b4924b08c5993fd0482b1a
---
 .gitignore | 29 +
 CMakeLists.txt | 7 +-
 Makefile.in | 43 +-
 configure | 7 +-
 contrib/README.contrib | 3 +-
 contrib/power/clang_workaround.h | 82 +++
 contrib/power/crc32_constants.h | 1206 ++++++++++++++++++++++++++++++++++++++
 contrib/power/crc32_z_power8.c | 679 +++++++++++++++++++++
 contrib/power/crc32_z_resolver.c | 15 +
 contrib/power/power.h | 4 +
 crc32.c | 12 +
 test/crc32_test.c | 205 +++++++
 12 files changed, 2278 insertions(+), 14 deletions(-)
 create mode 100644 .gitignore
 create mode 100644 contrib/power/clang_workaround.h
 create mode 100644 contrib/power/crc32_constants.h
 create mode 100644 contrib/power/crc32_z_power8.c
 create mode 100644 contrib/power/crc32_z_resolver.c
 create mode 100644 test/crc32_test.c

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e324531
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,29 @@
+*.diff
+*.patch
+*.orig
+*.rej
+
+*~
+*.a
+*.lo
+*.o
+*.dylib
+
+*.gcda
+*.gcno
+*.gcov
+
+/crc32_test
+/crc32_test64
+/crc32_testsh
+/example
+/example64
+/examplesh
+/libz.so*
+/minigzip
+/minigzip64
+/minigzipsh
+/zlib.pc
+/configure.log
+
+.DS_Store
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 4456cd7..0464ba3 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -172,7 +172,8 @@ if(CMAKE_COMPILER_IS_GNUCC)

 if(POWER8)
 add_definitions(-DZ_POWER8)
- set(ZLIB_POWER8 )
+ set(ZLIB_POWER8
+ contrib/power/crc32_z_power8.c)

 set_source_files_properties(
 ${ZLIB_POWER8}
@@ -269,6 +270,10 @@ add_executable(example test/example.c)
 target_link_libraries(example zlib)
 add_test(example example)

+add_executable(crc32_test test/crc32_test.c)
+target_link_libraries(crc32_test zlib)
+add_test(crc32_test crc32_test)
+
 add_executable(minigzip test/minigzip.c)
 target_link_libraries(minigzip zlib)

diff --git a/Makefile.in b/Makefile.in
index 34d3cd7..2dbb20a 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -71,11 +71,11 @@ PIC_OBJS = $(PIC_OBJC) $(PIC_OBJA)

 all: static shared

-static: example$(EXE) minigzip$(EXE)
+static: crc32_test$(EXE) example$(EXE) minigzip$(EXE)

-shared: examplesh$(EXE) minigzipsh$(EXE)
+shared: crc32_testsh$(EXE) examplesh$(EXE) minigzipsh$(EXE)

-all64: example64$(EXE) minigzip64$(EXE)
+all64: crc32_test64$(EXE) example64$(EXE) minigzip64$(EXE)

 check: test

@@ -83,7 +83,7 @@ test: all teststatic testshared

 teststatic: static
 @TMPST=tmpst_$$; \
- if echo hello world | ${QEMU_RUN} ./minigzip | ${QEMU_RUN} ./minigzip -d && ${QEMU_RUN} ./example $$TMPST ; then \
+ if echo hello world | ${QEMU_RUN} ./minigzip | ${QEMU_RUN} ./minigzip -d && ${QEMU_RUN} ./example $$TMPST && ${QEMU_RUN} ./crc32_test; then \
 echo ' *** zlib test OK ***'; \
 else \
 echo ' *** zlib test FAILED ***'; false; \
@@ -96,7 +96,7 @@ testshared: shared
 DYLD_LIBRARY_PATH=`pwd`:$(DYLD_LIBRARY_PATH) ; export DYLD_LIBRARY_PATH; \
 SHLIB_PATH=`pwd`:$(SHLIB_PATH) ; export SHLIB_PATH; \
 TMPSH=tmpsh_$$; \
- if echo hello world | ${QEMU_RUN} ./minigzipsh | ${QEMU_RUN} ./minigzipsh -d && ${QEMU_RUN} ./examplesh $$TMPSH; then \
+ if echo hello world | ${QEMU_RUN} ./minigzipsh | ${QEMU_RUN} ./minigzipsh -d && ${QEMU_RUN} ./examplesh $$TMPSH && ${QEMU_RUN} ./crc32_testsh; then \
 echo ' *** zlib shared test OK ***'; \
 else \
 echo ' *** zlib shared test FAILED ***'; false; \
@@ -105,7 +105,7 @@ testshared: shared

 test64: all64
 @TMP64=tmp64_$$; \
- if echo hello world | ${QEMU_RUN} ./minigzip64 | ${QEMU_RUN} ./minigzip64 -d && ${QEMU_RUN} ./example64 $$TMP64; then \
+ if echo hello world | ${QEMU_RUN} ./minigzip64 | ${QEMU_RUN} ./minigzip64 -d && ${QEMU_RUN} ./example64 $$TMP64 && ${QEMU_RUN} ./crc32_test64; then \
 echo ' *** zlib 64-bit test OK ***'; \
 else \
 echo ' *** zlib 64-bit test FAILED ***'; false; \
@@ -139,12 +139,18 @@ match.lo: match.S
 mv _match.o match.lo
 rm -f _match.s

+crc32_test.o: $(SRCDIR)test/crc32_test.c $(SRCDIR)zlib.h zconf.h
+ $(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/crc32_test.c
+
 example.o: $(SRCDIR)test/example.c $(SRCDIR)zlib.h zconf.h
 $(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/example.c

 minigzip.o: $(SRCDIR)test/minigzip.c $(SRCDIR)zlib.h zconf.h
 $(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/minigzip.c

+crc32_test64.o: $(SRCDIR)test/crc32_test.c $(SRCDIR)zlib.h zconf.h
+ $(CC) $(CFLAGS) $(ZINCOUT) -D_FILE_OFFSET_BITS=64 -c -o $@ $(SRCDIR)test/crc32_test.c
+
 example64.o: $(SRCDIR)test/example.c $(SRCDIR)zlib.h zconf.h
 $(CC) $(CFLAGS) $(ZINCOUT) -D_FILE_OFFSET_BITS=64 -c -o $@ $(SRCDIR)test/example.c

@@ -158,6 +164,9 @@ adler32.o: $(SRCDIR)adler32.c
 crc32.o: $(SRCDIR)crc32.c
 $(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)crc32.c

+crc32_z_power8.o: $(SRCDIR)contrib/power/crc32_z_power8.c
+ $(CC) $(CFLAGS) -mcpu=power8 $(ZINC) -c -o $@ $(SRCDIR)contrib/power/crc32_z_power8.c
+
 deflate.o: $(SRCDIR)deflate.c
 $(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)deflate.c

@@ -208,6 +217,11 @@ crc32.lo: $(SRCDIR)crc32.c
 $(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/crc32.o $(SRCDIR)crc32.c
 -@mv objs/crc32.o $@

+crc32_z_power8.lo: $(SRCDIR)contrib/power/crc32_z_power8.c
+ -@mkdir objs 2>/dev/null || test -d objs
+ $(CC) $(SFLAGS) -mcpu=power8 $(ZINC) -DPIC -c -o objs/crc32_z_power8.o $(SRCDIR)contrib/power/crc32_z_power8.c
+ -@mv objs/crc32_z_power8.o $@
+
 deflate.lo: $(SRCDIR)deflate.c
 -@mkdir objs 2>/dev/null || test -d objs
 $(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/deflate.o $(SRCDIR)deflate.c
@@ -281,18 +295,27 @@ placebo $(SHAREDLIBV): $(PIC_OBJS) libz.a
 ln -s $@ $(SHAREDLIBM)
 -@rmdir objs

+crc32_test$(EXE): crc32_test.o $(STATICLIB)
+ $(CC) $(CFLAGS) -o $@ crc32_test.o $(TEST_LDFLAGS)
+
 example$(EXE): example.o $(STATICLIB)
 $(CC) $(CFLAGS) -o $@ example.o $(TEST_LDFLAGS)

 minigzip$(EXE): minigzip.o $(STATICLIB)
 $(CC) $(CFLAGS) -o $@ minigzip.o $(TEST_LDFLAGS)

+crc32_testsh$(EXE): crc32_test.o $(SHAREDLIBV)
+ $(CC) $(CFLAGS) -o $@ crc32_test.o -L. $(SHAREDLIBV)
+
 examplesh$(EXE): example.o $(SHAREDLIBV)
 $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS) -L. $(SHAREDLIBV)

 minigzipsh$(EXE): minigzip.o $(SHAREDLIBV)
 $(CC) $(CFLAGS) -o $@ minigzip.o $(LDFLAGS) -L. $(SHAREDLIBV)

+crc32_test64$(EXE): crc32_test64.o $(STATICLIB)
+ $(CC) $(CFLAGS) -o $@ crc32_test64.o $(TEST_LDFLAGS)
+
 example64$(EXE): example64.o $(STATICLIB)
 $(CC) $(CFLAGS) -o $@ example64.o $(TEST_LDFLAGS)

@@ -368,8 +391,8 @@ minizip-clean:
 mostlyclean: clean
 clean: minizip-clean
 rm -f *.o *.lo *~ \
- example$(EXE) minigzip$(EXE) examplesh$(EXE) minigzipsh$(EXE) \
- example64$(EXE) minigzip64$(EXE) \
+ crc32_test$(EXE) example$(EXE) minigzip$(EXE) crc32_testsh$(EXE) examplesh$(EXE) minigzipsh$(EXE) \
+ crc32_test64$(EXE) example64$(EXE) minigzip64$(EXE) \
 infcover \
 libz.* foo.gz so_locations \
 _match.s maketree contrib/infback9/*.o
@@ -391,7 +414,7 @@ tags:

 adler32.o zutil.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h
 gzclose.o gzlib.o gzread.o gzwrite.o: $(SRCDIR)zlib.h zconf.h $(SRCDIR)gzguts.h
-compress.o example.o minigzip.o uncompr.o: $(SRCDIR)zlib.h zconf.h
+compress.o crc32_test.o example.o minigzip.o uncompr.o: $(SRCDIR)zlib.h zconf.h
 crc32.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)crc32.h
 deflate.o: $(SRCDIR)deflate.h $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h
 infback.o inflate.o: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)inftrees.h $(SRCDIR)inflate.h $(SRCDIR)inffast.h $(SRCDIR)inffixed.h
@@ -401,7 +424,7 @@ trees.o: $(SRCDIR)deflate.h $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)tr

 adler32.lo zutil.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h
 gzclose.lo gzlib.lo gzread.lo gzwrite.lo: $(SRCDIR)zlib.h zconf.h $(SRCDIR)gzguts.h
-compress.lo example.lo minigzip.lo uncompr.lo: $(SRCDIR)zlib.h zconf.h
+compress.lo crc32_test.lo example.lo minigzip.lo uncompr.lo: $(SRCDIR)zlib.h zconf.h
 crc32.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)crc32.h
 deflate.lo: $(SRCDIR)deflate.h $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h
 infback.lo inflate.lo: $(SRCDIR)zutil.h $(SRCDIR)zlib.h zconf.h $(SRCDIR)inftrees.h $(SRCDIR)inflate.h $(SRCDIR)inffast.h $(SRCDIR)inffixed.h
diff --git a/configure b/configure
index e307a8d..b96ed4a 100755
--- a/configure
+++ b/configure
@@ -864,6 +864,9 @@ cat > $test.c <<EOF
 #ifndef _ARCH_PPC
 #error "Target is not Power"
 #endif
+#if !(defined(__PPC64__) || defined(__powerpc64__))
+ #error "Target is not 64 bits"
+#endif
 #ifndef HAVE_IFUNC
 #error "Target doesn't support ifunc"
 #endif
@@ -877,8 +880,8 @@ if tryboth $CC -c $CFLAGS $test.c; then

 if tryboth $CC -c $CFLAGS -mcpu=power8 $test.c; then
 POWER8="-DZ_POWER8"
- PIC_OBJC="${PIC_OBJC}"
- OBJC="${OBJC}"
+ PIC_OBJC="${PIC_OBJC} crc32_z_power8.lo"
+ OBJC="${OBJC} crc32_z_power8.o"
 echo "Checking for -mcpu=power8 support... Yes." | tee -a configure.log
 else
 echo "Checking for -mcpu=power8 support... No." | tee -a configure.log
diff --git a/contrib/README.contrib b/contrib/README.contrib
index c57b520..90170df 100644
--- a/contrib/README.contrib
+++ b/contrib/README.contrib
@@ -46,7 +46,8 @@ minizip/ by Gilles Vollant <info@winimage.com>
 pascal/ by Bob Dellaca <bobdl@xtra.co.nz> et al.
 Support for Pascal

-power/ by Matheus Castanho <msc@linux.ibm.com>
+power/ by Daniel Black <daniel@linux.ibm.com>
+ Matheus Castanho <msc@linux.ibm.com>
 and Rogerio Alves <rcardoso@linux.ibm.com>
 Optimized functions for Power processors

diff --git a/contrib/power/clang_workaround.h b/contrib/power/clang_workaround.h
new file mode 100644
index 0000000..b5e7dae
--- /dev/null
+++ b/contrib/power/clang_workaround.h
@@ -0,0 +1,82 @@
+#ifndef CLANG_WORKAROUNDS_H
+#define CLANG_WORKAROUNDS_H
+
+/*
+ * These stubs fix clang incompatibilities with GCC builtins.
+ */
+
+#ifndef __builtin_crypto_vpmsumw
+#define __builtin_crypto_vpmsumw __builtin_crypto_vpmsumb
+#endif
+#ifndef __builtin_crypto_vpmsumd
+#define __builtin_crypto_vpmsumd __builtin_crypto_vpmsumb
+#endif
+
+static inline
+__vector unsigned long long __attribute__((overloadable))
+vec_ld(int __a, const __vector unsigned long long* __b)
+{
+ return (__vector unsigned long long)__builtin_altivec_lvx(__a, __b);
+}
+
+/*
+ * GCC __builtin_pack_vector_int128 returns a vector __int128_t but Clang
+ * does not recognize this type. On GCC this builtin is translated to a
+ * xxpermdi instruction that only moves the registers __a, __b instead generates
+ * a load.
+ *
+ * Clang has vec_xxpermdi intrinsics. It was implemented in 4.0.0.
+ */
+static inline
+__vector unsigned long long __builtin_pack_vector (unsigned long __a,
+ unsigned long __b)
+{
+ #if defined(__BIG_ENDIAN__)
+ __vector unsigned long long __v = {__a, __b};
+ #else
+ __vector unsigned long long __v = {__b, __a};
+ #endif
+ return __v;
+}
+
+#ifndef vec_xxpermdi
+
+static inline
+unsigned long __builtin_unpack_vector (__vector unsigned long long __v,
+ int __o)
+{
+ return __v[__o];
+}
+
+#if defined(__BIG_ENDIAN__)
+#define __builtin_unpack_vector_0(a) __builtin_unpack_vector ((a), 0)
+#define __builtin_unpack_vector_1(a) __builtin_unpack_vector ((a), 1)
+#else
+#define __builtin_unpack_vector_0(a) __builtin_unpack_vector ((a), 1)
+#define __builtin_unpack_vector_1(a) __builtin_unpack_vector ((a), 0)
+#endif
+
+#else
+
+static inline
+unsigned long __builtin_unpack_vector_0 (__vector unsigned long long __v)
+{
+ #if defined(__BIG_ENDIAN__)
+ return vec_xxpermdi(__v, __v, 0x0)[1];
+ #else
+ return vec_xxpermdi(__v, __v, 0x0)[0];
+ #endif
+}
+
+static inline
+unsigned long __builtin_unpack_vector_1 (__vector unsigned long long __v)
+{
+ #if defined(__BIG_ENDIAN__)
+ return vec_xxpermdi(__v, __v, 0x3)[1];
+ #else
+ return vec_xxpermdi(__v, __v, 0x3)[0];
+ #endif
+}
+#endif /* vec_xxpermdi */
+
+#endif
369diff --git a/contrib/power/crc32_constants.h b/contrib/power/crc32_constants.h
370new file mode 100644
371index 0000000..3d01150
372--- /dev/null
373+++ b/contrib/power/crc32_constants.h
374@@ -0,0 +1,1206 @@
375+/*
376+*
377+* THIS FILE IS GENERATED WITH
378+./crc32_constants -c -r -x 0x04C11DB7
379+
380+* This is from https://github.com/antonblanchard/crc32-vpmsum/
381+* DO NOT MODIFY IT MANUALLY!
382+*
383+*/
384+
385+#define CRC 0x4c11db7
386+#define CRC_XOR
387+#define REFLECT
388+#define MAX_SIZE 32768
389+
390+#ifndef __ASSEMBLER__
391+#ifdef CRC_TABLE
392+static const unsigned int crc_table[] = {
393+ 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
394+ 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
395+ 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
396+ 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
397+ 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de,
398+ 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
399+ 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec,
400+ 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
401+ 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
402+ 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
403+ 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940,
404+ 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
405+ 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116,
406+ 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
407+ 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
408+ 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
409+ 0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a,
410+ 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
411+ 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818,
412+ 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,
413+ 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
414+ 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
415+ 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c,
416+ 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
417+ 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2,
418+ 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
419+ 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
420+ 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
421+ 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086,
422+ 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
423+ 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4,
424+ 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
425+ 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
426+ 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,
427+ 0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8,
428+ 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
429+ 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe,
430+ 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
431+ 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
432+ 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
433+ 0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252,
434+ 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
435+ 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60,
436+ 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
437+ 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
438+ 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
439+ 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04,
440+ 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
441+ 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a,
442+ 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,
443+ 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
444+ 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21,
445+ 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e,
446+ 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
447+ 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c,
448+ 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
449+ 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
450+ 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
451+ 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0,
452+ 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
453+ 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6,
454+ 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
455+ 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
456+ 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d,};
457+
458+#endif /* CRC_TABLE */
459+#ifdef POWER8_INTRINSICS
460+
461+/* Constants */
462+
463+/* Reduce 262144 kbits to 1024 bits */
464+static const __vector unsigned long long vcrc_const[255]
465+ __attribute__((aligned (16))) = {
466+#ifdef __LITTLE_ENDIAN__
467+ /* x^261120 mod p(x)` << 1, x^261184 mod p(x)` << 1 */
468+ { 0x0000000099ea94a8, 0x00000001651797d2 },
469+ /* x^260096 mod p(x)` << 1, x^260160 mod p(x)` << 1 */
470+ { 0x00000000945a8420, 0x0000000021e0d56c },
471+ /* x^259072 mod p(x)` << 1, x^259136 mod p(x)` << 1 */
472+ { 0x0000000030762706, 0x000000000f95ecaa },
473+ /* x^258048 mod p(x)` << 1, x^258112 mod p(x)` << 1 */
474+ { 0x00000001a52fc582, 0x00000001ebd224ac },
475+ /* x^257024 mod p(x)` << 1, x^257088 mod p(x)` << 1 */
476+ { 0x00000001a4a7167a, 0x000000000ccb97ca },
477+ /* x^256000 mod p(x)` << 1, x^256064 mod p(x)` << 1 */
478+ { 0x000000000c18249a, 0x00000001006ec8a8 },
479+ /* x^254976 mod p(x)` << 1, x^255040 mod p(x)` << 1 */
480+ { 0x00000000a924ae7c, 0x000000014f58f196 },
481+ /* x^253952 mod p(x)` << 1, x^254016 mod p(x)` << 1 */
482+ { 0x00000001e12ccc12, 0x00000001a7192ca6 },
483+ /* x^252928 mod p(x)` << 1, x^252992 mod p(x)` << 1 */
484+ { 0x00000000a0b9d4ac, 0x000000019a64bab2 },
485+ /* x^251904 mod p(x)` << 1, x^251968 mod p(x)` << 1 */
486+ { 0x0000000095e8ddfe, 0x0000000014f4ed2e },
487+ /* x^250880 mod p(x)` << 1, x^250944 mod p(x)` << 1 */
488+ { 0x00000000233fddc4, 0x000000011092b6a2 },
489+ /* x^249856 mod p(x)` << 1, x^249920 mod p(x)` << 1 */
490+ { 0x00000001b4529b62, 0x00000000c8a1629c },
491+ /* x^248832 mod p(x)` << 1, x^248896 mod p(x)` << 1 */
492+ { 0x00000001a7fa0e64, 0x000000017bf32e8e },
493+ /* x^247808 mod p(x)` << 1, x^247872 mod p(x)` << 1 */
494+ { 0x00000001b5334592, 0x00000001f8cc6582 },
495+ /* x^246784 mod p(x)` << 1, x^246848 mod p(x)` << 1 */
496+ { 0x000000011f8ee1b4, 0x000000008631ddf0 },
497+ /* x^245760 mod p(x)` << 1, x^245824 mod p(x)` << 1 */
498+ { 0x000000006252e632, 0x000000007e5a76d0 },
499+ /* x^244736 mod p(x)` << 1, x^244800 mod p(x)` << 1 */
500+ { 0x00000000ab973e84, 0x000000002b09b31c },
501+ /* x^243712 mod p(x)` << 1, x^243776 mod p(x)` << 1 */
502+ { 0x000000007734f5ec, 0x00000001b2df1f84 },
503+ /* x^242688 mod p(x)` << 1, x^242752 mod p(x)` << 1 */
504+ { 0x000000007c547798, 0x00000001d6f56afc },
505+ /* x^241664 mod p(x)` << 1, x^241728 mod p(x)` << 1 */
506+ { 0x000000007ec40210, 0x00000001b9b5e70c },
507+ /* x^240640 mod p(x)` << 1, x^240704 mod p(x)` << 1 */
508+ { 0x00000001ab1695a8, 0x0000000034b626d2 },
509+ /* x^239616 mod p(x)` << 1, x^239680 mod p(x)` << 1 */
510+ { 0x0000000090494bba, 0x000000014c53479a },
511+ /* x^238592 mod p(x)` << 1, x^238656 mod p(x)` << 1 */
512+ { 0x00000001123fb816, 0x00000001a6d179a4 },
513+ /* x^237568 mod p(x)` << 1, x^237632 mod p(x)` << 1 */
514+ { 0x00000001e188c74c, 0x000000015abd16b4 },
515+ /* x^236544 mod p(x)` << 1, x^236608 mod p(x)` << 1 */
516+ { 0x00000001c2d3451c, 0x00000000018f9852 },
517+ /* x^235520 mod p(x)` << 1, x^235584 mod p(x)` << 1 */
518+ { 0x00000000f55cf1ca, 0x000000001fb3084a },
519+ /* x^234496 mod p(x)` << 1, x^234560 mod p(x)` << 1 */
520+ { 0x00000001a0531540, 0x00000000c53dfb04 },
521+ /* x^233472 mod p(x)` << 1, x^233536 mod p(x)` << 1 */
522+ { 0x0000000132cd7ebc, 0x00000000e10c9ad6 },
523+ /* x^232448 mod p(x)` << 1, x^232512 mod p(x)` << 1 */
524+ { 0x0000000073ab7f36, 0x0000000025aa994a },
525+ /* x^231424 mod p(x)` << 1, x^231488 mod p(x)` << 1 */
526+ { 0x0000000041aed1c2, 0x00000000fa3a74c4 },
527+ /* x^230400 mod p(x)` << 1, x^230464 mod p(x)` << 1 */
528+ { 0x0000000136c53800, 0x0000000033eb3f40 },
529+ /* x^229376 mod p(x)` << 1, x^229440 mod p(x)` << 1 */
530+ { 0x0000000126835a30, 0x000000017193f296 },
531+ /* x^228352 mod p(x)` << 1, x^228416 mod p(x)` << 1 */
532+ { 0x000000006241b502, 0x0000000043f6c86a },
533+ /* x^227328 mod p(x)` << 1, x^227392 mod p(x)` << 1 */
534+ { 0x00000000d5196ad4, 0x000000016b513ec6 },
535+ /* x^226304 mod p(x)` << 1, x^226368 mod p(x)` << 1 */
536+ { 0x000000009cfa769a, 0x00000000c8f25b4e },
537+ /* x^225280 mod p(x)` << 1, x^225344 mod p(x)` << 1 */
538+ { 0x00000000920e5df4, 0x00000001a45048ec },
539+ /* x^224256 mod p(x)` << 1, x^224320 mod p(x)` << 1 */
540+ { 0x0000000169dc310e, 0x000000000c441004 },
541+ /* x^223232 mod p(x)` << 1, x^223296 mod p(x)` << 1 */
542+ { 0x0000000009fc331c, 0x000000000e17cad6 },
543+ /* x^222208 mod p(x)` << 1, x^222272 mod p(x)` << 1 */
544+ { 0x000000010d94a81e, 0x00000001253ae964 },
545+ /* x^221184 mod p(x)` << 1, x^221248 mod p(x)` << 1 */
546+ { 0x0000000027a20ab2, 0x00000001d7c88ebc },
547+ /* x^220160 mod p(x)` << 1, x^220224 mod p(x)` << 1 */
548+ { 0x0000000114f87504, 0x00000001e7ca913a },
549+ /* x^219136 mod p(x)` << 1, x^219200 mod p(x)` << 1 */
550+ { 0x000000004b076d96, 0x0000000033ed078a },
551+ /* x^218112 mod p(x)` << 1, x^218176 mod p(x)` << 1 */
552+ { 0x00000000da4d1e74, 0x00000000e1839c78 },
553+ /* x^217088 mod p(x)` << 1, x^217152 mod p(x)` << 1 */
554+ { 0x000000001b81f672, 0x00000001322b267e },
555+ /* x^216064 mod p(x)` << 1, x^216128 mod p(x)` << 1 */
556+ { 0x000000009367c988, 0x00000000638231b6 },
557+ /* x^215040 mod p(x)` << 1, x^215104 mod p(x)` << 1 */
558+ { 0x00000001717214ca, 0x00000001ee7f16f4 },
559+ /* x^214016 mod p(x)` << 1, x^214080 mod p(x)` << 1 */
560+ { 0x000000009f47d820, 0x0000000117d9924a },
561+ /* x^212992 mod p(x)` << 1, x^213056 mod p(x)` << 1 */
562+ { 0x000000010d9a47d2, 0x00000000e1a9e0c4 },
563+ /* x^211968 mod p(x)` << 1, x^212032 mod p(x)` << 1 */
564+ { 0x00000000a696c58c, 0x00000001403731dc },
565+ /* x^210944 mod p(x)` << 1, x^211008 mod p(x)` << 1 */
566+ { 0x000000002aa28ec6, 0x00000001a5ea9682 },
567+ /* x^209920 mod p(x)` << 1, x^209984 mod p(x)` << 1 */
568+ { 0x00000001fe18fd9a, 0x0000000101c5c578 },
569+ /* x^208896 mod p(x)` << 1, x^208960 mod p(x)` << 1 */
570+ { 0x000000019d4fc1ae, 0x00000000dddf6494 },
571+ /* x^207872 mod p(x)` << 1, x^207936 mod p(x)` << 1 */
572+ { 0x00000001ba0e3dea, 0x00000000f1c3db28 },
573+ /* x^206848 mod p(x)` << 1, x^206912 mod p(x)` << 1 */
574+ { 0x0000000074b59a5e, 0x000000013112fb9c },
575+ /* x^205824 mod p(x)` << 1, x^205888 mod p(x)` << 1 */
576+ { 0x00000000f2b5ea98, 0x00000000b680b906 },
577+ /* x^204800 mod p(x)` << 1, x^204864 mod p(x)` << 1 */
578+ { 0x0000000187132676, 0x000000001a282932 },
579+ /* x^203776 mod p(x)` << 1, x^203840 mod p(x)` << 1 */
580+ { 0x000000010a8c6ad4, 0x0000000089406e7e },
581+ /* x^202752 mod p(x)` << 1, x^202816 mod p(x)` << 1 */
582+ { 0x00000001e21dfe70, 0x00000001def6be8c },
583+ /* x^201728 mod p(x)` << 1, x^201792 mod p(x)` << 1 */
584+ { 0x00000001da0050e4, 0x0000000075258728 },
585+ /* x^200704 mod p(x)` << 1, x^200768 mod p(x)` << 1 */
586+ { 0x00000000772172ae, 0x000000019536090a },
587+ /* x^199680 mod p(x)` << 1, x^199744 mod p(x)` << 1 */
588+ { 0x00000000e47724aa, 0x00000000f2455bfc },
589+ /* x^198656 mod p(x)` << 1, x^198720 mod p(x)` << 1 */
590+ { 0x000000003cd63ac4, 0x000000018c40baf4 },
591+ /* x^197632 mod p(x)` << 1, x^197696 mod p(x)` << 1 */
592+ { 0x00000001bf47d352, 0x000000004cd390d4 },
593+ /* x^196608 mod p(x)` << 1, x^196672 mod p(x)` << 1 */
594+ { 0x000000018dc1d708, 0x00000001e4ece95a },
595+ /* x^195584 mod p(x)` << 1, x^195648 mod p(x)` << 1 */
596+ { 0x000000002d4620a4, 0x000000001a3ee918 },
597+ /* x^194560 mod p(x)` << 1, x^194624 mod p(x)` << 1 */
598+ { 0x0000000058fd1740, 0x000000007c652fb8 },
599+ /* x^193536 mod p(x)` << 1, x^193600 mod p(x)` << 1 */
600+ { 0x00000000dadd9bfc, 0x000000011c67842c },
601+ /* x^192512 mod p(x)` << 1, x^192576 mod p(x)` << 1 */
602+ { 0x00000001ea2140be, 0x00000000254f759c },
603+ /* x^191488 mod p(x)` << 1, x^191552 mod p(x)` << 1 */
604+ { 0x000000009de128ba, 0x000000007ece94ca },
605+ /* x^190464 mod p(x)` << 1, x^190528 mod p(x)` << 1 */
606+ { 0x000000013ac3aa8e, 0x0000000038f258c2 },
607+ /* x^189440 mod p(x)` << 1, x^189504 mod p(x)` << 1 */
608+ { 0x0000000099980562, 0x00000001cdf17b00 },
609+ /* x^188416 mod p(x)` << 1, x^188480 mod p(x)` << 1 */
610+ { 0x00000001c1579c86, 0x000000011f882c16 },
611+ /* x^187392 mod p(x)` << 1, x^187456 mod p(x)` << 1 */
612+ { 0x0000000068dbbf94, 0x0000000100093fc8 },
613+ /* x^186368 mod p(x)` << 1, x^186432 mod p(x)` << 1 */
614+ { 0x000000004509fb04, 0x00000001cd684f16 },
615+ /* x^185344 mod p(x)` << 1, x^185408 mod p(x)` << 1 */
616+ { 0x00000001202f6398, 0x000000004bc6a70a },
617+ /* x^184320 mod p(x)` << 1, x^184384 mod p(x)` << 1 */
618+ { 0x000000013aea243e, 0x000000004fc7e8e4 },
619+ /* x^183296 mod p(x)` << 1, x^183360 mod p(x)` << 1 */
620+ { 0x00000001b4052ae6, 0x0000000130103f1c },
621+ /* x^182272 mod p(x)` << 1, x^182336 mod p(x)` << 1 */
622+ { 0x00000001cd2a0ae8, 0x0000000111b0024c },
623+ /* x^181248 mod p(x)` << 1, x^181312 mod p(x)` << 1 */
624+ { 0x00000001fe4aa8b4, 0x000000010b3079da },
625+ /* x^180224 mod p(x)` << 1, x^180288 mod p(x)` << 1 */
626+ { 0x00000001d1559a42, 0x000000010192bcc2 },
627+ /* x^179200 mod p(x)` << 1, x^179264 mod p(x)` << 1 */
628+ { 0x00000001f3e05ecc, 0x0000000074838d50 },
629+ /* x^178176 mod p(x)` << 1, x^178240 mod p(x)` << 1 */
630+ { 0x0000000104ddd2cc, 0x000000001b20f520 },
631+ /* x^177152 mod p(x)` << 1, x^177216 mod p(x)` << 1 */
632+ { 0x000000015393153c, 0x0000000050c3590a },
633+ /* x^176128 mod p(x)` << 1, x^176192 mod p(x)` << 1 */
634+ { 0x0000000057e942c6, 0x00000000b41cac8e },
635+ /* x^175104 mod p(x)` << 1, x^175168 mod p(x)` << 1 */
636+ { 0x000000012c633850, 0x000000000c72cc78 },
637+ /* x^174080 mod p(x)` << 1, x^174144 mod p(x)` << 1 */
638+ { 0x00000000ebcaae4c, 0x0000000030cdb032 },
639+ /* x^173056 mod p(x)` << 1, x^173120 mod p(x)` << 1 */
640+ { 0x000000013ee532a6, 0x000000013e09fc32 },
641+ /* x^172032 mod p(x)` << 1, x^172096 mod p(x)` << 1 */
642+ { 0x00000001bf0cbc7e, 0x000000001ed624d2 },
643+ /* x^171008 mod p(x)` << 1, x^171072 mod p(x)` << 1 */
644+ { 0x00000000d50b7a5a, 0x00000000781aee1a },
645+ /* x^169984 mod p(x)` << 1, x^170048 mod p(x)` << 1 */
646+ { 0x0000000002fca6e8, 0x00000001c4d8348c },
647+ /* x^168960 mod p(x)` << 1, x^169024 mod p(x)` << 1 */
648+ { 0x000000007af40044, 0x0000000057a40336 },
649+ /* x^167936 mod p(x)` << 1, x^168000 mod p(x)` << 1 */
650+ { 0x0000000016178744, 0x0000000085544940 },
651+ /* x^166912 mod p(x)` << 1, x^166976 mod p(x)` << 1 */
652+ { 0x000000014c177458, 0x000000019cd21e80 },
653+ /* x^165888 mod p(x)` << 1, x^165952 mod p(x)` << 1 */
654+ { 0x000000011b6ddf04, 0x000000013eb95bc0 },
655+ /* x^164864 mod p(x)` << 1, x^164928 mod p(x)` << 1 */
656+ { 0x00000001f3e29ccc, 0x00000001dfc9fdfc },
657+ /* x^163840 mod p(x)` << 1, x^163904 mod p(x)` << 1 */
658+ { 0x0000000135ae7562, 0x00000000cd028bc2 },
659+ /* x^162816 mod p(x)` << 1, x^162880 mod p(x)` << 1 */
660+ { 0x0000000190ef812c, 0x0000000090db8c44 },
661+ /* x^161792 mod p(x)` << 1, x^161856 mod p(x)` << 1 */
662+ { 0x0000000067a2c786, 0x000000010010a4ce },
663+ /* x^160768 mod p(x)` << 1, x^160832 mod p(x)` << 1 */
664+ { 0x0000000048b9496c, 0x00000001c8f4c72c },
665+ /* x^159744 mod p(x)` << 1, x^159808 mod p(x)` << 1 */
666+ { 0x000000015a422de6, 0x000000001c26170c },
667+ /* x^158720 mod p(x)` << 1, x^158784 mod p(x)` << 1 */
668+ { 0x00000001ef0e3640, 0x00000000e3fccf68 },
669+ /* x^157696 mod p(x)` << 1, x^157760 mod p(x)` << 1 */
670+ { 0x00000001006d2d26, 0x00000000d513ed24 },
671+ /* x^156672 mod p(x)` << 1, x^156736 mod p(x)` << 1 */
672+ { 0x00000001170d56d6, 0x00000000141beada },
673+ /* x^155648 mod p(x)` << 1, x^155712 mod p(x)` << 1 */
674+ { 0x00000000a5fb613c, 0x000000011071aea0 },
675+ /* x^154624 mod p(x)` << 1, x^154688 mod p(x)` << 1 */
676+ { 0x0000000040bbf7fc, 0x000000012e19080a },
677+ /* x^153600 mod p(x)` << 1, x^153664 mod p(x)` << 1 */
678+ { 0x000000016ac3a5b2, 0x0000000100ecf826 },
679+ /* x^152576 mod p(x)` << 1, x^152640 mod p(x)` << 1 */
680+ { 0x00000000abf16230, 0x0000000069b09412 },
681+ /* x^151552 mod p(x)` << 1, x^151616 mod p(x)` << 1 */
682+ { 0x00000001ebe23fac, 0x0000000122297bac },
683+ /* x^150528 mod p(x)` << 1, x^150592 mod p(x)` << 1 */
684+ { 0x000000008b6a0894, 0x00000000e9e4b068 },
685+ /* x^149504 mod p(x)` << 1, x^149568 mod p(x)` << 1 */
686+ { 0x00000001288ea478, 0x000000004b38651a },
687+ /* x^148480 mod p(x)` << 1, x^148544 mod p(x)` << 1 */
688+ { 0x000000016619c442, 0x00000001468360e2 },
689+ /* x^147456 mod p(x)` << 1, x^147520 mod p(x)` << 1 */
690+ { 0x0000000086230038, 0x00000000121c2408 },
691+ /* x^146432 mod p(x)` << 1, x^146496 mod p(x)` << 1 */
692+ { 0x000000017746a756, 0x00000000da7e7d08 },
693+ /* x^145408 mod p(x)` << 1, x^145472 mod p(x)` << 1 */
694+ { 0x0000000191b8f8f8, 0x00000001058d7652 },
695+ /* x^144384 mod p(x)` << 1, x^144448 mod p(x)` << 1 */
696+ { 0x000000008e167708, 0x000000014a098a90 },
697+ /* x^143360 mod p(x)` << 1, x^143424 mod p(x)` << 1 */
698+ { 0x0000000148b22d54, 0x0000000020dbe72e },
699+ /* x^142336 mod p(x)` << 1, x^142400 mod p(x)` << 1 */
700+ { 0x0000000044ba2c3c, 0x000000011e7323e8 },
701+ /* x^141312 mod p(x)` << 1, x^141376 mod p(x)` << 1 */
702+ { 0x00000000b54d2b52, 0x00000000d5d4bf94 },
703+ /* x^140288 mod p(x)` << 1, x^140352 mod p(x)` << 1 */
704+ { 0x0000000005a4fd8a, 0x0000000199d8746c },
705+ /* x^139264 mod p(x)` << 1, x^139328 mod p(x)` << 1 */
706+ { 0x0000000139f9fc46, 0x00000000ce9ca8a0 },
707+ /* x^138240 mod p(x)` << 1, x^138304 mod p(x)` << 1 */
708+ { 0x000000015a1fa824, 0x00000000136edece },
709+ /* x^137216 mod p(x)` << 1, x^137280 mod p(x)` << 1 */
710+ { 0x000000000a61ae4c, 0x000000019b92a068 },
711+ /* x^136192 mod p(x)` << 1, x^136256 mod p(x)` << 1 */
712+ { 0x0000000145e9113e, 0x0000000071d62206 },
713+ /* x^135168 mod p(x)` << 1, x^135232 mod p(x)` << 1 */
714+ { 0x000000006a348448, 0x00000000dfc50158 },
715+ /* x^134144 mod p(x)` << 1, x^134208 mod p(x)` << 1 */
716+ { 0x000000004d80a08c, 0x00000001517626bc },
717+ /* x^133120 mod p(x)` << 1, x^133184 mod p(x)` << 1 */
718+ { 0x000000014b6837a0, 0x0000000148d1e4fa },
719+ /* x^132096 mod p(x)` << 1, x^132160 mod p(x)` << 1 */
720+ { 0x000000016896a7fc, 0x0000000094d8266e },
721+ /* x^131072 mod p(x)` << 1, x^131136 mod p(x)` << 1 */
722+ { 0x000000014f187140, 0x00000000606c5e34 },
723+ /* x^130048 mod p(x)` << 1, x^130112 mod p(x)` << 1 */
724+ { 0x000000019581b9da, 0x000000019766beaa },
725+ /* x^129024 mod p(x)` << 1, x^129088 mod p(x)` << 1 */
726+ { 0x00000001091bc984, 0x00000001d80c506c },
727+ /* x^128000 mod p(x)` << 1, x^128064 mod p(x)` << 1 */
728+ { 0x000000001067223c, 0x000000001e73837c },
729+ /* x^126976 mod p(x)` << 1, x^127040 mod p(x)` << 1 */
730+ { 0x00000001ab16ea02, 0x0000000064d587de },
731+ /* x^125952 mod p(x)` << 1, x^126016 mod p(x)` << 1 */
732+ { 0x000000013c4598a8, 0x00000000f4a507b0 },
733+ /* x^124928 mod p(x)` << 1, x^124992 mod p(x)` << 1 */
734+ { 0x00000000b3735430, 0x0000000040e342fc },
735+ /* x^123904 mod p(x)` << 1, x^123968 mod p(x)` << 1 */
736+ { 0x00000001bb3fc0c0, 0x00000001d5ad9c3a },
737+ /* x^122880 mod p(x)` << 1, x^122944 mod p(x)` << 1 */
738+ { 0x00000001570ae19c, 0x0000000094a691a4 },
739+ /* x^121856 mod p(x)` << 1, x^121920 mod p(x)` << 1 */
740+ { 0x00000001ea910712, 0x00000001271ecdfa },
741+ /* x^120832 mod p(x)` << 1, x^120896 mod p(x)` << 1 */
742+ { 0x0000000167127128, 0x000000009e54475a },
743+ /* x^119808 mod p(x)` << 1, x^119872 mod p(x)` << 1 */
744+ { 0x0000000019e790a2, 0x00000000c9c099ee },
745+ /* x^118784 mod p(x)` << 1, x^118848 mod p(x)` << 1 */
746+ { 0x000000003788f710, 0x000000009a2f736c },
747+ /* x^117760 mod p(x)` << 1, x^117824 mod p(x)` << 1 */
748+ { 0x00000001682a160e, 0x00000000bb9f4996 },
749+ /* x^116736 mod p(x)` << 1, x^116800 mod p(x)` << 1 */
750+ { 0x000000007f0ebd2e, 0x00000001db688050 },
751+ /* x^115712 mod p(x)` << 1, x^115776 mod p(x)` << 1 */
752+ { 0x000000002b032080, 0x00000000e9b10af4 },
753+ /* x^114688 mod p(x)` << 1, x^114752 mod p(x)` << 1 */
754+ { 0x00000000cfd1664a, 0x000000012d4545e4 },
755+ /* x^113664 mod p(x)` << 1, x^113728 mod p(x)` << 1 */
756+ { 0x00000000aa1181c2, 0x000000000361139c },
757+ /* x^112640 mod p(x)` << 1, x^112704 mod p(x)` << 1 */
758+ { 0x00000000ddd08002, 0x00000001a5a1a3a8 },
759+ /* x^111616 mod p(x)` << 1, x^111680 mod p(x)` << 1 */
760+ { 0x00000000e8dd0446, 0x000000006844e0b0 },
761+ /* x^110592 mod p(x)` << 1, x^110656 mod p(x)` << 1 */
762+ { 0x00000001bbd94a00, 0x00000000c3762f28 },
763+ /* x^109568 mod p(x)` << 1, x^109632 mod p(x)` << 1 */
764+ { 0x00000000ab6cd180, 0x00000001d26287a2 },
765+ /* x^108544 mod p(x)` << 1, x^108608 mod p(x)` << 1 */
766+ { 0x0000000031803ce2, 0x00000001f6f0bba8 },
767+ /* x^107520 mod p(x)` << 1, x^107584 mod p(x)` << 1 */
768+ { 0x0000000024f40b0c, 0x000000002ffabd62 },
769+ /* x^106496 mod p(x)` << 1, x^106560 mod p(x)` << 1 */
770+ { 0x00000001ba1d9834, 0x00000000fb4516b8 },
771+ /* x^105472 mod p(x)` << 1, x^105536 mod p(x)` << 1 */
772+ { 0x0000000104de61aa, 0x000000018cfa961c },
773+ /* x^104448 mod p(x)` << 1, x^104512 mod p(x)` << 1 */
774+ { 0x0000000113e40d46, 0x000000019e588d52 },
775+ /* x^103424 mod p(x)` << 1, x^103488 mod p(x)` << 1 */
776+ { 0x00000001415598a0, 0x00000001180f0bbc },
777+ /* x^102400 mod p(x)` << 1, x^102464 mod p(x)` << 1 */
778+ { 0x00000000bf6c8c90, 0x00000000e1d9177a },
779+ /* x^101376 mod p(x)` << 1, x^101440 mod p(x)` << 1 */
780+ { 0x00000001788b0504, 0x0000000105abc27c },
781+ /* x^100352 mod p(x)` << 1, x^100416 mod p(x)` << 1 */
782+ { 0x0000000038385d02, 0x00000000972e4a58 },
783+ /* x^99328 mod p(x)` << 1, x^99392 mod p(x)` << 1 */
784+ { 0x00000001b6c83844, 0x0000000183499a5e },
785+ /* x^98304 mod p(x)` << 1, x^98368 mod p(x)` << 1 */
786+ { 0x0000000051061a8a, 0x00000001c96a8cca },
787+ /* x^97280 mod p(x)` << 1, x^97344 mod p(x)` << 1 */
788+ { 0x000000017351388a, 0x00000001a1a5b60c },
789+ /* x^96256 mod p(x)` << 1, x^96320 mod p(x)` << 1 */
790+ { 0x0000000132928f92, 0x00000000e4b6ac9c },
791+ /* x^95232 mod p(x)` << 1, x^95296 mod p(x)` << 1 */
792+ { 0x00000000e6b4f48a, 0x00000001807e7f5a },
793+ /* x^94208 mod p(x)` << 1, x^94272 mod p(x)` << 1 */
794+ { 0x0000000039d15e90, 0x000000017a7e3bc8 },
795+ /* x^93184 mod p(x)` << 1, x^93248 mod p(x)` << 1 */
796+ { 0x00000000312d6074, 0x00000000d73975da },
797+ /* x^92160 mod p(x)` << 1, x^92224 mod p(x)` << 1 */
798+ { 0x000000017bbb2cc4, 0x000000017375d038 },
799+ /* x^91136 mod p(x)` << 1, x^91200 mod p(x)` << 1 */
800+ { 0x000000016ded3e18, 0x00000000193680bc },
801+ /* x^90112 mod p(x)` << 1, x^90176 mod p(x)` << 1 */
802+ { 0x00000000f1638b16, 0x00000000999b06f6 },
803+ /* x^89088 mod p(x)` << 1, x^89152 mod p(x)` << 1 */
804+ { 0x00000001d38b9ecc, 0x00000001f685d2b8 },
805+ /* x^88064 mod p(x)` << 1, x^88128 mod p(x)` << 1 */
806+ { 0x000000018b8d09dc, 0x00000001f4ecbed2 },
807+ /* x^87040 mod p(x)` << 1, x^87104 mod p(x)` << 1 */
808+ { 0x00000000e7bc27d2, 0x00000000ba16f1a0 },
809+ /* x^86016 mod p(x)` << 1, x^86080 mod p(x)` << 1 */
810+ { 0x00000000275e1e96, 0x0000000115aceac4 },
811+ /* x^84992 mod p(x)` << 1, x^85056 mod p(x)` << 1 */
812+ { 0x00000000e2e3031e, 0x00000001aeff6292 },
813+ /* x^83968 mod p(x)` << 1, x^84032 mod p(x)` << 1 */
814+ { 0x00000001041c84d8, 0x000000009640124c },
815+ /* x^82944 mod p(x)` << 1, x^83008 mod p(x)` << 1 */
816+ { 0x00000000706ce672, 0x0000000114f41f02 },
817+ /* x^81920 mod p(x)` << 1, x^81984 mod p(x)` << 1 */
818+ { 0x000000015d5070da, 0x000000009c5f3586 },
819+ /* x^80896 mod p(x)` << 1, x^80960 mod p(x)` << 1 */
820+ { 0x0000000038f9493a, 0x00000001878275fa },
821+ /* x^79872 mod p(x)` << 1, x^79936 mod p(x)` << 1 */
822+ { 0x00000000a3348a76, 0x00000000ddc42ce8 },
823+ /* x^78848 mod p(x)` << 1, x^78912 mod p(x)` << 1 */
824+ { 0x00000001ad0aab92, 0x0000000181d2c73a },
825+ /* x^77824 mod p(x)` << 1, x^77888 mod p(x)` << 1 */
826+ { 0x000000019e85f712, 0x0000000141c9320a },
827+ /* x^76800 mod p(x)` << 1, x^76864 mod p(x)` << 1 */
828+ { 0x000000005a871e76, 0x000000015235719a },
829+ /* x^75776 mod p(x)` << 1, x^75840 mod p(x)` << 1 */
830+ { 0x000000017249c662, 0x00000000be27d804 },
831+ /* x^74752 mod p(x)` << 1, x^74816 mod p(x)` << 1 */
832+ { 0x000000003a084712, 0x000000006242d45a },
833+ /* x^73728 mod p(x)` << 1, x^73792 mod p(x)` << 1 */
834+ { 0x00000000ed438478, 0x000000009a53638e },
835+ /* x^72704 mod p(x)` << 1, x^72768 mod p(x)` << 1 */
836+ { 0x00000000abac34cc, 0x00000001001ecfb6 },
837+ /* x^71680 mod p(x)` << 1, x^71744 mod p(x)` << 1 */
838+ { 0x000000005f35ef3e, 0x000000016d7c2d64 },
839+ /* x^70656 mod p(x)` << 1, x^70720 mod p(x)` << 1 */
840+ { 0x0000000047d6608c, 0x00000001d0ce46c0 },
841+ /* x^69632 mod p(x)` << 1, x^69696 mod p(x)` << 1 */
842+ { 0x000000002d01470e, 0x0000000124c907b4 },
843+ /* x^68608 mod p(x)` << 1, x^68672 mod p(x)` << 1 */
844+ { 0x0000000158bbc7b0, 0x0000000018a555ca },
845+ /* x^67584 mod p(x)` << 1, x^67648 mod p(x)` << 1 */
846+ { 0x00000000c0a23e8e, 0x000000006b0980bc },
847+ /* x^66560 mod p(x)` << 1, x^66624 mod p(x)` << 1 */
848+ { 0x00000001ebd85c88, 0x000000008bbba964 },
849+ /* x^65536 mod p(x)` << 1, x^65600 mod p(x)` << 1 */
850+ { 0x000000019ee20bb2, 0x00000001070a5a1e },
851+ /* x^64512 mod p(x)` << 1, x^64576 mod p(x)` << 1 */
852+ { 0x00000001acabf2d6, 0x000000002204322a },
853+ /* x^63488 mod p(x)` << 1, x^63552 mod p(x)` << 1 */
854+ { 0x00000001b7963d56, 0x00000000a27524d0 },
855+ /* x^62464 mod p(x)` << 1, x^62528 mod p(x)` << 1 */
856+ { 0x000000017bffa1fe, 0x0000000020b1e4ba },
857+ /* x^61440 mod p(x)` << 1, x^61504 mod p(x)` << 1 */
858+ { 0x000000001f15333e, 0x0000000032cc27fc },
859+ /* x^60416 mod p(x)` << 1, x^60480 mod p(x)` << 1 */
860+ { 0x000000018593129e, 0x0000000044dd22b8 },
861+ /* x^59392 mod p(x)` << 1, x^59456 mod p(x)` << 1 */
862+ { 0x000000019cb32602, 0x00000000dffc9e0a },
863+ /* x^58368 mod p(x)` << 1, x^58432 mod p(x)` << 1 */
864+ { 0x0000000142b05cc8, 0x00000001b7a0ed14 },
865+ /* x^57344 mod p(x)` << 1, x^57408 mod p(x)` << 1 */
866+ { 0x00000001be49e7a4, 0x00000000c7842488 },
867+ /* x^56320 mod p(x)` << 1, x^56384 mod p(x)` << 1 */
868+ { 0x0000000108f69d6c, 0x00000001c02a4fee },
869+ /* x^55296 mod p(x)` << 1, x^55360 mod p(x)` << 1 */
870+ { 0x000000006c0971f0, 0x000000003c273778 },
871+ /* x^54272 mod p(x)` << 1, x^54336 mod p(x)` << 1 */
872+ { 0x000000005b16467a, 0x00000001d63f8894 },
873+ /* x^53248 mod p(x)` << 1, x^53312 mod p(x)` << 1 */
874+ { 0x00000001551a628e, 0x000000006be557d6 },
875+ /* x^52224 mod p(x)` << 1, x^52288 mod p(x)` << 1 */
876+ { 0x000000019e42ea92, 0x000000006a7806ea },
877+ /* x^51200 mod p(x)` << 1, x^51264 mod p(x)` << 1 */
878+ { 0x000000012fa83ff2, 0x000000016155aa0c },
879+ /* x^50176 mod p(x)` << 1, x^50240 mod p(x)` << 1 */
880+ { 0x000000011ca9cde0, 0x00000000908650ac },
881+ /* x^49152 mod p(x)` << 1, x^49216 mod p(x)` << 1 */
882+ { 0x00000000c8e5cd74, 0x00000000aa5a8084 },
883+ /* x^48128 mod p(x)` << 1, x^48192 mod p(x)` << 1 */
884+ { 0x0000000096c27f0c, 0x0000000191bb500a },
885+ /* x^47104 mod p(x)` << 1, x^47168 mod p(x)` << 1 */
886+ { 0x000000002baed926, 0x0000000064e9bed0 },
887+ /* x^46080 mod p(x)` << 1, x^46144 mod p(x)` << 1 */
888+ { 0x000000017c8de8d2, 0x000000009444f302 },
889+ /* x^45056 mod p(x)` << 1, x^45120 mod p(x)` << 1 */
890+ { 0x00000000d43d6068, 0x000000019db07d3c },
891+ /* x^44032 mod p(x)` << 1, x^44096 mod p(x)` << 1 */
892+ { 0x00000000cb2c4b26, 0x00000001359e3e6e },
893+ /* x^43008 mod p(x)` << 1, x^43072 mod p(x)` << 1 */
894+ { 0x0000000145b8da26, 0x00000001e4f10dd2 },
895+ /* x^41984 mod p(x)` << 1, x^42048 mod p(x)` << 1 */
896+ { 0x000000018fff4b08, 0x0000000124f5735e },
897+ /* x^40960 mod p(x)` << 1, x^41024 mod p(x)` << 1 */
898+ { 0x0000000150b58ed0, 0x0000000124760a4c },
899+ /* x^39936 mod p(x)` << 1, x^40000 mod p(x)` << 1 */
900+ { 0x00000001549f39bc, 0x000000000f1fc186 },
901+ /* x^38912 mod p(x)` << 1, x^38976 mod p(x)` << 1 */
902+ { 0x00000000ef4d2f42, 0x00000000150e4cc4 },
903+ /* x^37888 mod p(x)` << 1, x^37952 mod p(x)` << 1 */
904+ { 0x00000001b1468572, 0x000000002a6204e8 },
905+ /* x^36864 mod p(x)` << 1, x^36928 mod p(x)` << 1 */
906+ { 0x000000013d7403b2, 0x00000000beb1d432 },
907+ /* x^35840 mod p(x)` << 1, x^35904 mod p(x)` << 1 */
908+ { 0x00000001a4681842, 0x0000000135f3f1f0 },
909+ /* x^34816 mod p(x)` << 1, x^34880 mod p(x)` << 1 */
910+ { 0x0000000167714492, 0x0000000074fe2232 },
911+ /* x^33792 mod p(x)` << 1, x^33856 mod p(x)` << 1 */
912+ { 0x00000001e599099a, 0x000000001ac6e2ba },
913+ /* x^32768 mod p(x)` << 1, x^32832 mod p(x)` << 1 */
914+ { 0x00000000fe128194, 0x0000000013fca91e },
915+ /* x^31744 mod p(x)` << 1, x^31808 mod p(x)` << 1 */
916+ { 0x0000000077e8b990, 0x0000000183f4931e },
917+ /* x^30720 mod p(x)` << 1, x^30784 mod p(x)` << 1 */
918+ { 0x00000001a267f63a, 0x00000000b6d9b4e4 },
919+ /* x^29696 mod p(x)` << 1, x^29760 mod p(x)` << 1 */
920+ { 0x00000001945c245a, 0x00000000b5188656 },
921+ /* x^28672 mod p(x)` << 1, x^28736 mod p(x)` << 1 */
922+ { 0x0000000149002e76, 0x0000000027a81a84 },
923+ /* x^27648 mod p(x)` << 1, x^27712 mod p(x)` << 1 */
924+ { 0x00000001bb8310a4, 0x0000000125699258 },
925+ /* x^26624 mod p(x)` << 1, x^26688 mod p(x)` << 1 */
926+ { 0x000000019ec60bcc, 0x00000001b23de796 },
927+ /* x^25600 mod p(x)` << 1, x^25664 mod p(x)` << 1 */
928+ { 0x000000012d8590ae, 0x00000000fe4365dc },
929+ /* x^24576 mod p(x)` << 1, x^24640 mod p(x)` << 1 */
930+ { 0x0000000065b00684, 0x00000000c68f497a },
931+ /* x^23552 mod p(x)` << 1, x^23616 mod p(x)` << 1 */
932+ { 0x000000015e5aeadc, 0x00000000fbf521ee },
933+ /* x^22528 mod p(x)` << 1, x^22592 mod p(x)` << 1 */
934+ { 0x00000000b77ff2b0, 0x000000015eac3378 },
935+ /* x^21504 mod p(x)` << 1, x^21568 mod p(x)` << 1 */
936+ { 0x0000000188da2ff6, 0x0000000134914b90 },
937+ /* x^20480 mod p(x)` << 1, x^20544 mod p(x)` << 1 */
938+ { 0x0000000063da929a, 0x0000000016335cfe },
939+ /* x^19456 mod p(x)` << 1, x^19520 mod p(x)` << 1 */
940+ { 0x00000001389caa80, 0x000000010372d10c },
941+ /* x^18432 mod p(x)` << 1, x^18496 mod p(x)` << 1 */
942+ { 0x000000013db599d2, 0x000000015097b908 },
943+ /* x^17408 mod p(x)` << 1, x^17472 mod p(x)` << 1 */
944+ { 0x0000000122505a86, 0x00000001227a7572 },
945+ /* x^16384 mod p(x)` << 1, x^16448 mod p(x)` << 1 */
946+ { 0x000000016bd72746, 0x000000009a8f75c0 },
947+ /* x^15360 mod p(x)` << 1, x^15424 mod p(x)` << 1 */
948+ { 0x00000001c3faf1d4, 0x00000000682c77a2 },
949+ /* x^14336 mod p(x)` << 1, x^14400 mod p(x)` << 1 */
950+ { 0x00000001111c826c, 0x00000000231f091c },
951+ /* x^13312 mod p(x)` << 1, x^13376 mod p(x)` << 1 */
952+ { 0x00000000153e9fb2, 0x000000007d4439f2 },
953+ /* x^12288 mod p(x)` << 1, x^12352 mod p(x)` << 1 */
954+ { 0x000000002b1f7b60, 0x000000017e221efc },
955+ /* x^11264 mod p(x)` << 1, x^11328 mod p(x)` << 1 */
956+ { 0x00000000b1dba570, 0x0000000167457c38 },
957+ /* x^10240 mod p(x)` << 1, x^10304 mod p(x)` << 1 */
958+ { 0x00000001f6397b76, 0x00000000bdf081c4 },
959+ /* x^9216 mod p(x)` << 1, x^9280 mod p(x)` << 1 */
960+ { 0x0000000156335214, 0x000000016286d6b0 },
961+ /* x^8192 mod p(x)` << 1, x^8256 mod p(x)` << 1 */
962+ { 0x00000001d70e3986, 0x00000000c84f001c },
963+ /* x^7168 mod p(x)` << 1, x^7232 mod p(x)` << 1 */
964+ { 0x000000003701a774, 0x0000000064efe7c0 },
965+ /* x^6144 mod p(x)` << 1, x^6208 mod p(x)` << 1 */
966+ { 0x00000000ac81ef72, 0x000000000ac2d904 },
967+ /* x^5120 mod p(x)` << 1, x^5184 mod p(x)` << 1 */
968+ { 0x0000000133212464, 0x00000000fd226d14 },
969+ /* x^4096 mod p(x)` << 1, x^4160 mod p(x)` << 1 */
970+ { 0x00000000e4e45610, 0x000000011cfd42e0 },
971+ /* x^3072 mod p(x)` << 1, x^3136 mod p(x)` << 1 */
972+ { 0x000000000c1bd370, 0x000000016e5a5678 },
973+ /* x^2048 mod p(x)` << 1, x^2112 mod p(x)` << 1 */
974+ { 0x00000001a7b9e7a6, 0x00000001d888fe22 },
975+ /* x^1024 mod p(x)` << 1, x^1088 mod p(x)` << 1 */
976+ { 0x000000007d657a10, 0x00000001af77fcd4 }
977+#else /* __LITTLE_ENDIAN__ */
978+ /* x^261120 mod p(x)` << 1, x^261184 mod p(x)` << 1 */
979+ { 0x00000001651797d2, 0x0000000099ea94a8 },
980+ /* x^260096 mod p(x)` << 1, x^260160 mod p(x)` << 1 */
981+ { 0x0000000021e0d56c, 0x00000000945a8420 },
982+ /* x^259072 mod p(x)` << 1, x^259136 mod p(x)` << 1 */
983+ { 0x000000000f95ecaa, 0x0000000030762706 },
984+ /* x^258048 mod p(x)` << 1, x^258112 mod p(x)` << 1 */
985+ { 0x00000001ebd224ac, 0x00000001a52fc582 },
986+ /* x^257024 mod p(x)` << 1, x^257088 mod p(x)` << 1 */
987+ { 0x000000000ccb97ca, 0x00000001a4a7167a },
988+ /* x^256000 mod p(x)` << 1, x^256064 mod p(x)` << 1 */
989+ { 0x00000001006ec8a8, 0x000000000c18249a },
990+ /* x^254976 mod p(x)` << 1, x^255040 mod p(x)` << 1 */
991+ { 0x000000014f58f196, 0x00000000a924ae7c },
992+ /* x^253952 mod p(x)` << 1, x^254016 mod p(x)` << 1 */
993+ { 0x00000001a7192ca6, 0x00000001e12ccc12 },
994+ /* x^252928 mod p(x)` << 1, x^252992 mod p(x)` << 1 */
995+ { 0x000000019a64bab2, 0x00000000a0b9d4ac },
996+ /* x^251904 mod p(x)` << 1, x^251968 mod p(x)` << 1 */
997+ { 0x0000000014f4ed2e, 0x0000000095e8ddfe },
998+ /* x^250880 mod p(x)` << 1, x^250944 mod p(x)` << 1 */
999+ { 0x000000011092b6a2, 0x00000000233fddc4 },
1000+ /* x^249856 mod p(x)` << 1, x^249920 mod p(x)` << 1 */
1001+ { 0x00000000c8a1629c, 0x00000001b4529b62 },
1002+ /* x^248832 mod p(x)` << 1, x^248896 mod p(x)` << 1 */
1003+ { 0x000000017bf32e8e, 0x00000001a7fa0e64 },
1004+ /* x^247808 mod p(x)` << 1, x^247872 mod p(x)` << 1 */
1005+ { 0x00000001f8cc6582, 0x00000001b5334592 },
1006+ /* x^246784 mod p(x)` << 1, x^246848 mod p(x)` << 1 */
1007+ { 0x000000008631ddf0, 0x000000011f8ee1b4 },
1008+ /* x^245760 mod p(x)` << 1, x^245824 mod p(x)` << 1 */
1009+ { 0x000000007e5a76d0, 0x000000006252e632 },
1010+ /* x^244736 mod p(x)` << 1, x^244800 mod p(x)` << 1 */
1011+ { 0x000000002b09b31c, 0x00000000ab973e84 },
1012+ /* x^243712 mod p(x)` << 1, x^243776 mod p(x)` << 1 */
1013+ { 0x00000001b2df1f84, 0x000000007734f5ec },
1014+ /* x^242688 mod p(x)` << 1, x^242752 mod p(x)` << 1 */
1015+ { 0x00000001d6f56afc, 0x000000007c547798 },
1016+ /* x^241664 mod p(x)` << 1, x^241728 mod p(x)` << 1 */
1017+ { 0x00000001b9b5e70c, 0x000000007ec40210 },
1018+ /* x^240640 mod p(x)` << 1, x^240704 mod p(x)` << 1 */
1019+ { 0x0000000034b626d2, 0x00000001ab1695a8 },
1020+ /* x^239616 mod p(x)` << 1, x^239680 mod p(x)` << 1 */
1021+ { 0x000000014c53479a, 0x0000000090494bba },
1022+ /* x^238592 mod p(x)` << 1, x^238656 mod p(x)` << 1 */
1023+ { 0x00000001a6d179a4, 0x00000001123fb816 },
1024+ /* x^237568 mod p(x)` << 1, x^237632 mod p(x)` << 1 */
1025+ { 0x000000015abd16b4, 0x00000001e188c74c },
1026+ /* x^236544 mod p(x)` << 1, x^236608 mod p(x)` << 1 */
1027+ { 0x00000000018f9852, 0x00000001c2d3451c },
1028+ /* x^235520 mod p(x)` << 1, x^235584 mod p(x)` << 1 */
1029+ { 0x000000001fb3084a, 0x00000000f55cf1ca },
1030+ /* x^234496 mod p(x)` << 1, x^234560 mod p(x)` << 1 */
1031+ { 0x00000000c53dfb04, 0x00000001a0531540 },
1032+ /* x^233472 mod p(x)` << 1, x^233536 mod p(x)` << 1 */
1033+ { 0x00000000e10c9ad6, 0x0000000132cd7ebc },
1034+ /* x^232448 mod p(x)` << 1, x^232512 mod p(x)` << 1 */
1035+ { 0x0000000025aa994a, 0x0000000073ab7f36 },
1036+ /* x^231424 mod p(x)` << 1, x^231488 mod p(x)` << 1 */
1037+ { 0x00000000fa3a74c4, 0x0000000041aed1c2 },
1038+ /* x^230400 mod p(x)` << 1, x^230464 mod p(x)` << 1 */
1039+ { 0x0000000033eb3f40, 0x0000000136c53800 },
1040+ /* x^229376 mod p(x)` << 1, x^229440 mod p(x)` << 1 */
1041+ { 0x000000017193f296, 0x0000000126835a30 },
1042+ /* x^228352 mod p(x)` << 1, x^228416 mod p(x)` << 1 */
1043+ { 0x0000000043f6c86a, 0x000000006241b502 },
1044+ /* x^227328 mod p(x)` << 1, x^227392 mod p(x)` << 1 */
1045+ { 0x000000016b513ec6, 0x00000000d5196ad4 },
1046+ /* x^226304 mod p(x)` << 1, x^226368 mod p(x)` << 1 */
1047+ { 0x00000000c8f25b4e, 0x000000009cfa769a },
1048+ /* x^225280 mod p(x)` << 1, x^225344 mod p(x)` << 1 */
1049+ { 0x00000001a45048ec, 0x00000000920e5df4 },
1050+ /* x^224256 mod p(x)` << 1, x^224320 mod p(x)` << 1 */
1051+ { 0x000000000c441004, 0x0000000169dc310e },
1052+ /* x^223232 mod p(x)` << 1, x^223296 mod p(x)` << 1 */
1053+ { 0x000000000e17cad6, 0x0000000009fc331c },
1054+ /* x^222208 mod p(x)` << 1, x^222272 mod p(x)` << 1 */
1055+ { 0x00000001253ae964, 0x000000010d94a81e },
1056+ /* x^221184 mod p(x)` << 1, x^221248 mod p(x)` << 1 */
1057+ { 0x00000001d7c88ebc, 0x0000000027a20ab2 },
1058+ /* x^220160 mod p(x)` << 1, x^220224 mod p(x)` << 1 */
1059+ { 0x00000001e7ca913a, 0x0000000114f87504 },
1060+ /* x^219136 mod p(x)` << 1, x^219200 mod p(x)` << 1 */
1061+ { 0x0000000033ed078a, 0x000000004b076d96 },
1062+ /* x^218112 mod p(x)` << 1, x^218176 mod p(x)` << 1 */
1063+ { 0x00000000e1839c78, 0x00000000da4d1e74 },
1064+ /* x^217088 mod p(x)` << 1, x^217152 mod p(x)` << 1 */
1065+ { 0x00000001322b267e, 0x000000001b81f672 },
1066+ /* x^216064 mod p(x)` << 1, x^216128 mod p(x)` << 1 */
1067+ { 0x00000000638231b6, 0x000000009367c988 },
1068+ /* x^215040 mod p(x)` << 1, x^215104 mod p(x)` << 1 */
1069+ { 0x00000001ee7f16f4, 0x00000001717214ca },
1070+ /* x^214016 mod p(x)` << 1, x^214080 mod p(x)` << 1 */
1071+ { 0x0000000117d9924a, 0x000000009f47d820 },
1072+ /* x^212992 mod p(x)` << 1, x^213056 mod p(x)` << 1 */
1073+ { 0x00000000e1a9e0c4, 0x000000010d9a47d2 },
1074+ /* x^211968 mod p(x)` << 1, x^212032 mod p(x)` << 1 */
1075+ { 0x00000001403731dc, 0x00000000a696c58c },
1076+ /* x^210944 mod p(x)` << 1, x^211008 mod p(x)` << 1 */
1077+ { 0x00000001a5ea9682, 0x000000002aa28ec6 },
1078+ /* x^209920 mod p(x)` << 1, x^209984 mod p(x)` << 1 */
1079+ { 0x0000000101c5c578, 0x00000001fe18fd9a },
1080+ /* x^208896 mod p(x)` << 1, x^208960 mod p(x)` << 1 */
1081+ { 0x00000000dddf6494, 0x000000019d4fc1ae },
1082+ /* x^207872 mod p(x)` << 1, x^207936 mod p(x)` << 1 */
1083+ { 0x00000000f1c3db28, 0x00000001ba0e3dea },
1084+ /* x^206848 mod p(x)` << 1, x^206912 mod p(x)` << 1 */
1085+ { 0x000000013112fb9c, 0x0000000074b59a5e },
1086+ /* x^205824 mod p(x)` << 1, x^205888 mod p(x)` << 1 */
1087+ { 0x00000000b680b906, 0x00000000f2b5ea98 },
1088+ /* x^204800 mod p(x)` << 1, x^204864 mod p(x)` << 1 */
1089+ { 0x000000001a282932, 0x0000000187132676 },
1090+ /* x^203776 mod p(x)` << 1, x^203840 mod p(x)` << 1 */
1091+ { 0x0000000089406e7e, 0x000000010a8c6ad4 },
1092+ /* x^202752 mod p(x)` << 1, x^202816 mod p(x)` << 1 */
1093+ { 0x00000001def6be8c, 0x00000001e21dfe70 },
1094+ /* x^201728 mod p(x)` << 1, x^201792 mod p(x)` << 1 */
1095+ { 0x0000000075258728, 0x00000001da0050e4 },
1096+ /* x^200704 mod p(x)` << 1, x^200768 mod p(x)` << 1 */
1097+ { 0x000000019536090a, 0x00000000772172ae },
1098+ /* x^199680 mod p(x)` << 1, x^199744 mod p(x)` << 1 */
1099+ { 0x00000000f2455bfc, 0x00000000e47724aa },
1100+ /* x^198656 mod p(x)` << 1, x^198720 mod p(x)` << 1 */
1101+ { 0x000000018c40baf4, 0x000000003cd63ac4 },
1102+ /* x^197632 mod p(x)` << 1, x^197696 mod p(x)` << 1 */
1103+ { 0x000000004cd390d4, 0x00000001bf47d352 },
1104+ /* x^196608 mod p(x)` << 1, x^196672 mod p(x)` << 1 */
1105+ { 0x00000001e4ece95a, 0x000000018dc1d708 },
1106+ /* x^195584 mod p(x)` << 1, x^195648 mod p(x)` << 1 */
1107+ { 0x000000001a3ee918, 0x000000002d4620a4 },
1108+ /* x^194560 mod p(x)` << 1, x^194624 mod p(x)` << 1 */
1109+ { 0x000000007c652fb8, 0x0000000058fd1740 },
1110+ /* x^193536 mod p(x)` << 1, x^193600 mod p(x)` << 1 */
1111+ { 0x000000011c67842c, 0x00000000dadd9bfc },
1112+ /* x^192512 mod p(x)` << 1, x^192576 mod p(x)` << 1 */
1113+ { 0x00000000254f759c, 0x00000001ea2140be },
1114+ /* x^191488 mod p(x)` << 1, x^191552 mod p(x)` << 1 */
1115+ { 0x000000007ece94ca, 0x000000009de128ba },
1116+ /* x^190464 mod p(x)` << 1, x^190528 mod p(x)` << 1 */
1117+ { 0x0000000038f258c2, 0x000000013ac3aa8e },
1118+ /* x^189440 mod p(x)` << 1, x^189504 mod p(x)` << 1 */
1119+ { 0x00000001cdf17b00, 0x0000000099980562 },
1120+ /* x^188416 mod p(x)` << 1, x^188480 mod p(x)` << 1 */
1121+ { 0x000000011f882c16, 0x00000001c1579c86 },
1122+ /* x^187392 mod p(x)` << 1, x^187456 mod p(x)` << 1 */
1123+ { 0x0000000100093fc8, 0x0000000068dbbf94 },
1124+ /* x^186368 mod p(x)` << 1, x^186432 mod p(x)` << 1 */
1125+ { 0x00000001cd684f16, 0x000000004509fb04 },
1126+ /* x^185344 mod p(x)` << 1, x^185408 mod p(x)` << 1 */
1127+ { 0x000000004bc6a70a, 0x00000001202f6398 },
1128+ /* x^184320 mod p(x)` << 1, x^184384 mod p(x)` << 1 */
1129+ { 0x000000004fc7e8e4, 0x000000013aea243e },
1130+ /* x^183296 mod p(x)` << 1, x^183360 mod p(x)` << 1 */
1131+ { 0x0000000130103f1c, 0x00000001b4052ae6 },
1132+ /* x^182272 mod p(x)` << 1, x^182336 mod p(x)` << 1 */
1133+ { 0x0000000111b0024c, 0x00000001cd2a0ae8 },
1134+ /* x^181248 mod p(x)` << 1, x^181312 mod p(x)` << 1 */
1135+ { 0x000000010b3079da, 0x00000001fe4aa8b4 },
1136+ /* x^180224 mod p(x)` << 1, x^180288 mod p(x)` << 1 */
1137+ { 0x000000010192bcc2, 0x00000001d1559a42 },
1138+ /* x^179200 mod p(x)` << 1, x^179264 mod p(x)` << 1 */
1139+ { 0x0000000074838d50, 0x00000001f3e05ecc },
1140+ /* x^178176 mod p(x)` << 1, x^178240 mod p(x)` << 1 */
1141+ { 0x000000001b20f520, 0x0000000104ddd2cc },
1142+ /* x^177152 mod p(x)` << 1, x^177216 mod p(x)` << 1 */
1143+ { 0x0000000050c3590a, 0x000000015393153c },
1144+ /* x^176128 mod p(x)` << 1, x^176192 mod p(x)` << 1 */
1145+ { 0x00000000b41cac8e, 0x0000000057e942c6 },
1146+ /* x^175104 mod p(x)` << 1, x^175168 mod p(x)` << 1 */
1147+ { 0x000000000c72cc78, 0x000000012c633850 },
1148+ /* x^174080 mod p(x)` << 1, x^174144 mod p(x)` << 1 */
1149+ { 0x0000000030cdb032, 0x00000000ebcaae4c },
1150+ /* x^173056 mod p(x)` << 1, x^173120 mod p(x)` << 1 */
1151+ { 0x000000013e09fc32, 0x000000013ee532a6 },
1152+ /* x^172032 mod p(x)` << 1, x^172096 mod p(x)` << 1 */
1153+ { 0x000000001ed624d2, 0x00000001bf0cbc7e },
1154+ /* x^171008 mod p(x)` << 1, x^171072 mod p(x)` << 1 */
1155+ { 0x00000000781aee1a, 0x00000000d50b7a5a },
1156+ /* x^169984 mod p(x)` << 1, x^170048 mod p(x)` << 1 */
1157+ { 0x00000001c4d8348c, 0x0000000002fca6e8 },
1158+ /* x^168960 mod p(x)` << 1, x^169024 mod p(x)` << 1 */
1159+ { 0x0000000057a40336, 0x000000007af40044 },
1160+ /* x^167936 mod p(x)` << 1, x^168000 mod p(x)` << 1 */
1161+ { 0x0000000085544940, 0x0000000016178744 },
1162+ /* x^166912 mod p(x)` << 1, x^166976 mod p(x)` << 1 */
1163+ { 0x000000019cd21e80, 0x000000014c177458 },
1164+ /* x^165888 mod p(x)` << 1, x^165952 mod p(x)` << 1 */
1165+ { 0x000000013eb95bc0, 0x000000011b6ddf04 },
1166+ /* x^164864 mod p(x)` << 1, x^164928 mod p(x)` << 1 */
1167+ { 0x00000001dfc9fdfc, 0x00000001f3e29ccc },
1168+ /* x^163840 mod p(x)` << 1, x^163904 mod p(x)` << 1 */
1169+ { 0x00000000cd028bc2, 0x0000000135ae7562 },
1170+ /* x^162816 mod p(x)` << 1, x^162880 mod p(x)` << 1 */
1171+ { 0x0000000090db8c44, 0x0000000190ef812c },
1172+ /* x^161792 mod p(x)` << 1, x^161856 mod p(x)` << 1 */
1173+ { 0x000000010010a4ce, 0x0000000067a2c786 },
1174+ /* x^160768 mod p(x)` << 1, x^160832 mod p(x)` << 1 */
1175+ { 0x00000001c8f4c72c, 0x0000000048b9496c },
1176+ /* x^159744 mod p(x)` << 1, x^159808 mod p(x)` << 1 */
1177+ { 0x000000001c26170c, 0x000000015a422de6 },
1178+ /* x^158720 mod p(x)` << 1, x^158784 mod p(x)` << 1 */
1179+ { 0x00000000e3fccf68, 0x00000001ef0e3640 },
1180+ /* x^157696 mod p(x)` << 1, x^157760 mod p(x)` << 1 */
1181+ { 0x00000000d513ed24, 0x00000001006d2d26 },
1182+ /* x^156672 mod p(x)` << 1, x^156736 mod p(x)` << 1 */
1183+ { 0x00000000141beada, 0x00000001170d56d6 },
1184+ /* x^155648 mod p(x)` << 1, x^155712 mod p(x)` << 1 */
1185+ { 0x000000011071aea0, 0x00000000a5fb613c },
1186+ /* x^154624 mod p(x)` << 1, x^154688 mod p(x)` << 1 */
1187+ { 0x000000012e19080a, 0x0000000040bbf7fc },
1188+ /* x^153600 mod p(x)` << 1, x^153664 mod p(x)` << 1 */
1189+ { 0x0000000100ecf826, 0x000000016ac3a5b2 },
1190+ /* x^152576 mod p(x)` << 1, x^152640 mod p(x)` << 1 */
1191+ { 0x0000000069b09412, 0x00000000abf16230 },
1192+ /* x^151552 mod p(x)` << 1, x^151616 mod p(x)` << 1 */
1193+ { 0x0000000122297bac, 0x00000001ebe23fac },
1194+ /* x^150528 mod p(x)` << 1, x^150592 mod p(x)` << 1 */
1195+ { 0x00000000e9e4b068, 0x000000008b6a0894 },
1196+ /* x^149504 mod p(x)` << 1, x^149568 mod p(x)` << 1 */
1197+ { 0x000000004b38651a, 0x00000001288ea478 },
1198+ /* x^148480 mod p(x)` << 1, x^148544 mod p(x)` << 1 */
1199+ { 0x00000001468360e2, 0x000000016619c442 },
1200+ /* x^147456 mod p(x)` << 1, x^147520 mod p(x)` << 1 */
1201+ { 0x00000000121c2408, 0x0000000086230038 },
1202+ /* x^146432 mod p(x)` << 1, x^146496 mod p(x)` << 1 */
1203+ { 0x00000000da7e7d08, 0x000000017746a756 },
1204+ /* x^145408 mod p(x)` << 1, x^145472 mod p(x)` << 1 */
1205+ { 0x00000001058d7652, 0x0000000191b8f8f8 },
1206+ /* x^144384 mod p(x)` << 1, x^144448 mod p(x)` << 1 */
1207+ { 0x000000014a098a90, 0x000000008e167708 },
1208+ /* x^143360 mod p(x)` << 1, x^143424 mod p(x)` << 1 */
1209+ { 0x0000000020dbe72e, 0x0000000148b22d54 },
1210+ /* x^142336 mod p(x)` << 1, x^142400 mod p(x)` << 1 */
1211+ { 0x000000011e7323e8, 0x0000000044ba2c3c },
1212+ /* x^141312 mod p(x)` << 1, x^141376 mod p(x)` << 1 */
1213+ { 0x00000000d5d4bf94, 0x00000000b54d2b52 },
1214+ /* x^140288 mod p(x)` << 1, x^140352 mod p(x)` << 1 */
1215+ { 0x0000000199d8746c, 0x0000000005a4fd8a },
1216+ /* x^139264 mod p(x)` << 1, x^139328 mod p(x)` << 1 */
1217+ { 0x00000000ce9ca8a0, 0x0000000139f9fc46 },
1218+ /* x^138240 mod p(x)` << 1, x^138304 mod p(x)` << 1 */
1219+ { 0x00000000136edece, 0x000000015a1fa824 },
1220+ /* x^137216 mod p(x)` << 1, x^137280 mod p(x)` << 1 */
1221+ { 0x000000019b92a068, 0x000000000a61ae4c },
1222+ /* x^136192 mod p(x)` << 1, x^136256 mod p(x)` << 1 */
1223+ { 0x0000000071d62206, 0x0000000145e9113e },
1224+ /* x^135168 mod p(x)` << 1, x^135232 mod p(x)` << 1 */
1225+ { 0x00000000dfc50158, 0x000000006a348448 },
1226+ /* x^134144 mod p(x)` << 1, x^134208 mod p(x)` << 1 */
1227+ { 0x00000001517626bc, 0x000000004d80a08c },
1228+ /* x^133120 mod p(x)` << 1, x^133184 mod p(x)` << 1 */
1229+ { 0x0000000148d1e4fa, 0x000000014b6837a0 },
1230+ /* x^132096 mod p(x)` << 1, x^132160 mod p(x)` << 1 */
1231+ { 0x0000000094d8266e, 0x000000016896a7fc },
1232+ /* x^131072 mod p(x)` << 1, x^131136 mod p(x)` << 1 */
1233+ { 0x00000000606c5e34, 0x000000014f187140 },
1234+ /* x^130048 mod p(x)` << 1, x^130112 mod p(x)` << 1 */
1235+ { 0x000000019766beaa, 0x000000019581b9da },
1236+ /* x^129024 mod p(x)` << 1, x^129088 mod p(x)` << 1 */
1237+ { 0x00000001d80c506c, 0x00000001091bc984 },
1238+ /* x^128000 mod p(x)` << 1, x^128064 mod p(x)` << 1 */
1239+ { 0x000000001e73837c, 0x000000001067223c },
1240+ /* x^126976 mod p(x)` << 1, x^127040 mod p(x)` << 1 */
1241+ { 0x0000000064d587de, 0x00000001ab16ea02 },
1242+ /* x^125952 mod p(x)` << 1, x^126016 mod p(x)` << 1 */
1243+ { 0x00000000f4a507b0, 0x000000013c4598a8 },
1244+ /* x^124928 mod p(x)` << 1, x^124992 mod p(x)` << 1 */
1245+ { 0x0000000040e342fc, 0x00000000b3735430 },
1246+ /* x^123904 mod p(x)` << 1, x^123968 mod p(x)` << 1 */
1247+ { 0x00000001d5ad9c3a, 0x00000001bb3fc0c0 },
1248+ /* x^122880 mod p(x)` << 1, x^122944 mod p(x)` << 1 */
1249+ { 0x0000000094a691a4, 0x00000001570ae19c },
1250+ /* x^121856 mod p(x)` << 1, x^121920 mod p(x)` << 1 */
1251+ { 0x00000001271ecdfa, 0x00000001ea910712 },
1252+ /* x^120832 mod p(x)` << 1, x^120896 mod p(x)` << 1 */
1253+ { 0x000000009e54475a, 0x0000000167127128 },
1254+ /* x^119808 mod p(x)` << 1, x^119872 mod p(x)` << 1 */
1255+ { 0x00000000c9c099ee, 0x0000000019e790a2 },
1256+ /* x^118784 mod p(x)` << 1, x^118848 mod p(x)` << 1 */
1257+ { 0x000000009a2f736c, 0x000000003788f710 },
1258+ /* x^117760 mod p(x)` << 1, x^117824 mod p(x)` << 1 */
1259+ { 0x00000000bb9f4996, 0x00000001682a160e },
1260+ /* x^116736 mod p(x)` << 1, x^116800 mod p(x)` << 1 */
1261+ { 0x00000001db688050, 0x000000007f0ebd2e },
1262+ /* x^115712 mod p(x)` << 1, x^115776 mod p(x)` << 1 */
1263+ { 0x00000000e9b10af4, 0x000000002b032080 },
1264+ /* x^114688 mod p(x)` << 1, x^114752 mod p(x)` << 1 */
1265+ { 0x000000012d4545e4, 0x00000000cfd1664a },
1266+ /* x^113664 mod p(x)` << 1, x^113728 mod p(x)` << 1 */
1267+ { 0x000000000361139c, 0x00000000aa1181c2 },
1268+ /* x^112640 mod p(x)` << 1, x^112704 mod p(x)` << 1 */
1269+ { 0x00000001a5a1a3a8, 0x00000000ddd08002 },
1270+ /* x^111616 mod p(x)` << 1, x^111680 mod p(x)` << 1 */
1271+ { 0x000000006844e0b0, 0x00000000e8dd0446 },
1272+ /* x^110592 mod p(x)` << 1, x^110656 mod p(x)` << 1 */
1273+ { 0x00000000c3762f28, 0x00000001bbd94a00 },
1274+ /* x^109568 mod p(x)` << 1, x^109632 mod p(x)` << 1 */
1275+ { 0x00000001d26287a2, 0x00000000ab6cd180 },
1276+ /* x^108544 mod p(x)` << 1, x^108608 mod p(x)` << 1 */
1277+ { 0x00000001f6f0bba8, 0x0000000031803ce2 },
1278+ /* x^107520 mod p(x)` << 1, x^107584 mod p(x)` << 1 */
1279+ { 0x000000002ffabd62, 0x0000000024f40b0c },
1280+ /* x^106496 mod p(x)` << 1, x^106560 mod p(x)` << 1 */
1281+ { 0x00000000fb4516b8, 0x00000001ba1d9834 },
1282+ /* x^105472 mod p(x)` << 1, x^105536 mod p(x)` << 1 */
1283+ { 0x000000018cfa961c, 0x0000000104de61aa },
1284+ /* x^104448 mod p(x)` << 1, x^104512 mod p(x)` << 1 */
1285+ { 0x000000019e588d52, 0x0000000113e40d46 },
1286+ /* x^103424 mod p(x)` << 1, x^103488 mod p(x)` << 1 */
1287+ { 0x00000001180f0bbc, 0x00000001415598a0 },
1288+ /* x^102400 mod p(x)` << 1, x^102464 mod p(x)` << 1 */
1289+ { 0x00000000e1d9177a, 0x00000000bf6c8c90 },
1290+ /* x^101376 mod p(x)` << 1, x^101440 mod p(x)` << 1 */
1291+ { 0x0000000105abc27c, 0x00000001788b0504 },
1292+ /* x^100352 mod p(x)` << 1, x^100416 mod p(x)` << 1 */
1293+ { 0x00000000972e4a58, 0x0000000038385d02 },
1294+ /* x^99328 mod p(x)` << 1, x^99392 mod p(x)` << 1 */
1295+ { 0x0000000183499a5e, 0x00000001b6c83844 },
1296+ /* x^98304 mod p(x)` << 1, x^98368 mod p(x)` << 1 */
1297+ { 0x00000001c96a8cca, 0x0000000051061a8a },
1298+ /* x^97280 mod p(x)` << 1, x^97344 mod p(x)` << 1 */
1299+ { 0x00000001a1a5b60c, 0x000000017351388a },
1300+ /* x^96256 mod p(x)` << 1, x^96320 mod p(x)` << 1 */
1301+ { 0x00000000e4b6ac9c, 0x0000000132928f92 },
1302+ /* x^95232 mod p(x)` << 1, x^95296 mod p(x)` << 1 */
1303+ { 0x00000001807e7f5a, 0x00000000e6b4f48a },
1304+ /* x^94208 mod p(x)` << 1, x^94272 mod p(x)` << 1 */
1305+ { 0x000000017a7e3bc8, 0x0000000039d15e90 },
1306+ /* x^93184 mod p(x)` << 1, x^93248 mod p(x)` << 1 */
1307+ { 0x00000000d73975da, 0x00000000312d6074 },
1308+ /* x^92160 mod p(x)` << 1, x^92224 mod p(x)` << 1 */
1309+ { 0x000000017375d038, 0x000000017bbb2cc4 },
1310+ /* x^91136 mod p(x)` << 1, x^91200 mod p(x)` << 1 */
1311+ { 0x00000000193680bc, 0x000000016ded3e18 },
1312+ /* x^90112 mod p(x)` << 1, x^90176 mod p(x)` << 1 */
1313+ { 0x00000000999b06f6, 0x00000000f1638b16 },
1314+ /* x^89088 mod p(x)` << 1, x^89152 mod p(x)` << 1 */
1315+ { 0x00000001f685d2b8, 0x00000001d38b9ecc },
1316+ /* x^88064 mod p(x)` << 1, x^88128 mod p(x)` << 1 */
1317+ { 0x00000001f4ecbed2, 0x000000018b8d09dc },
1318+ /* x^87040 mod p(x)` << 1, x^87104 mod p(x)` << 1 */
1319+ { 0x00000000ba16f1a0, 0x00000000e7bc27d2 },
1320+ /* x^86016 mod p(x)` << 1, x^86080 mod p(x)` << 1 */
1321+ { 0x0000000115aceac4, 0x00000000275e1e96 },
1322+ /* x^84992 mod p(x)` << 1, x^85056 mod p(x)` << 1 */
1323+ { 0x00000001aeff6292, 0x00000000e2e3031e },
1324+ /* x^83968 mod p(x)` << 1, x^84032 mod p(x)` << 1 */
1325+ { 0x000000009640124c, 0x00000001041c84d8 },
1326+ /* x^82944 mod p(x)` << 1, x^83008 mod p(x)` << 1 */
1327+ { 0x0000000114f41f02, 0x00000000706ce672 },
1328+ /* x^81920 mod p(x)` << 1, x^81984 mod p(x)` << 1 */
1329+ { 0x000000009c5f3586, 0x000000015d5070da },
1330+ /* x^80896 mod p(x)` << 1, x^80960 mod p(x)` << 1 */
1331+ { 0x00000001878275fa, 0x0000000038f9493a },
1332+ /* x^79872 mod p(x)` << 1, x^79936 mod p(x)` << 1 */
1333+ { 0x00000000ddc42ce8, 0x00000000a3348a76 },
1334+ /* x^78848 mod p(x)` << 1, x^78912 mod p(x)` << 1 */
1335+ { 0x0000000181d2c73a, 0x00000001ad0aab92 },
1336+ /* x^77824 mod p(x)` << 1, x^77888 mod p(x)` << 1 */
1337+ { 0x0000000141c9320a, 0x000000019e85f712 },
1338+ /* x^76800 mod p(x)` << 1, x^76864 mod p(x)` << 1 */
1339+ { 0x000000015235719a, 0x000000005a871e76 },
1340+ /* x^75776 mod p(x)` << 1, x^75840 mod p(x)` << 1 */
1341+ { 0x00000000be27d804, 0x000000017249c662 },
1342+ /* x^74752 mod p(x)` << 1, x^74816 mod p(x)` << 1 */
1343+ { 0x000000006242d45a, 0x000000003a084712 },
1344+ /* x^73728 mod p(x)` << 1, x^73792 mod p(x)` << 1 */
1345+ { 0x000000009a53638e, 0x00000000ed438478 },
1346+ /* x^72704 mod p(x)` << 1, x^72768 mod p(x)` << 1 */
1347+ { 0x00000001001ecfb6, 0x00000000abac34cc },
1348+ /* x^71680 mod p(x)` << 1, x^71744 mod p(x)` << 1 */
1349+ { 0x000000016d7c2d64, 0x000000005f35ef3e },
1350+ /* x^70656 mod p(x)` << 1, x^70720 mod p(x)` << 1 */
1351+ { 0x00000001d0ce46c0, 0x0000000047d6608c },
1352+ /* x^69632 mod p(x)` << 1, x^69696 mod p(x)` << 1 */
1353+ { 0x0000000124c907b4, 0x000000002d01470e },
1354+ /* x^68608 mod p(x)` << 1, x^68672 mod p(x)` << 1 */
1355+ { 0x0000000018a555ca, 0x0000000158bbc7b0 },
1356+ /* x^67584 mod p(x)` << 1, x^67648 mod p(x)` << 1 */
1357+ { 0x000000006b0980bc, 0x00000000c0a23e8e },
1358+ /* x^66560 mod p(x)` << 1, x^66624 mod p(x)` << 1 */
1359+ { 0x000000008bbba964, 0x00000001ebd85c88 },
1360+ /* x^65536 mod p(x)` << 1, x^65600 mod p(x)` << 1 */
1361+ { 0x00000001070a5a1e, 0x000000019ee20bb2 },
1362+ /* x^64512 mod p(x)` << 1, x^64576 mod p(x)` << 1 */
1363+ { 0x000000002204322a, 0x00000001acabf2d6 },
1364+ /* x^63488 mod p(x)` << 1, x^63552 mod p(x)` << 1 */
1365+ { 0x00000000a27524d0, 0x00000001b7963d56 },
1366+ /* x^62464 mod p(x)` << 1, x^62528 mod p(x)` << 1 */
1367+ { 0x0000000020b1e4ba, 0x000000017bffa1fe },
1368+ /* x^61440 mod p(x)` << 1, x^61504 mod p(x)` << 1 */
1369+ { 0x0000000032cc27fc, 0x000000001f15333e },
1370+ /* x^60416 mod p(x)` << 1, x^60480 mod p(x)` << 1 */
1371+ { 0x0000000044dd22b8, 0x000000018593129e },
1372+ /* x^59392 mod p(x)` << 1, x^59456 mod p(x)` << 1 */
1373+ { 0x00000000dffc9e0a, 0x000000019cb32602 },
1374+ /* x^58368 mod p(x)` << 1, x^58432 mod p(x)` << 1 */
1375+ { 0x00000001b7a0ed14, 0x0000000142b05cc8 },
1376+ /* x^57344 mod p(x)` << 1, x^57408 mod p(x)` << 1 */
1377+ { 0x00000000c7842488, 0x00000001be49e7a4 },
1378+ /* x^56320 mod p(x)` << 1, x^56384 mod p(x)` << 1 */
1379+ { 0x00000001c02a4fee, 0x0000000108f69d6c },
1380+ /* x^55296 mod p(x)` << 1, x^55360 mod p(x)` << 1 */
1381+ { 0x000000003c273778, 0x000000006c0971f0 },
1382+ /* x^54272 mod p(x)` << 1, x^54336 mod p(x)` << 1 */
1383+ { 0x00000001d63f8894, 0x000000005b16467a },
1384+ /* x^53248 mod p(x)` << 1, x^53312 mod p(x)` << 1 */
1385+ { 0x000000006be557d6, 0x00000001551a628e },
1386+ /* x^52224 mod p(x)` << 1, x^52288 mod p(x)` << 1 */
1387+ { 0x000000006a7806ea, 0x000000019e42ea92 },
1388+ /* x^51200 mod p(x)` << 1, x^51264 mod p(x)` << 1 */
1389+ { 0x000000016155aa0c, 0x000000012fa83ff2 },
1390+ /* x^50176 mod p(x)` << 1, x^50240 mod p(x)` << 1 */
1391+ { 0x00000000908650ac, 0x000000011ca9cde0 },
1392+ /* x^49152 mod p(x)` << 1, x^49216 mod p(x)` << 1 */
1393+ { 0x00000000aa5a8084, 0x00000000c8e5cd74 },
1394+ /* x^48128 mod p(x)` << 1, x^48192 mod p(x)` << 1 */
1395+ { 0x0000000191bb500a, 0x0000000096c27f0c },
1396+ /* x^47104 mod p(x)` << 1, x^47168 mod p(x)` << 1 */
1397+ { 0x0000000064e9bed0, 0x000000002baed926 },
1398+ /* x^46080 mod p(x)` << 1, x^46144 mod p(x)` << 1 */
1399+ { 0x000000009444f302, 0x000000017c8de8d2 },
1400+ /* x^45056 mod p(x)` << 1, x^45120 mod p(x)` << 1 */
1401+ { 0x000000019db07d3c, 0x00000000d43d6068 },
1402+ /* x^44032 mod p(x)` << 1, x^44096 mod p(x)` << 1 */
1403+ { 0x00000001359e3e6e, 0x00000000cb2c4b26 },
1404+ /* x^43008 mod p(x)` << 1, x^43072 mod p(x)` << 1 */
1405+ { 0x00000001e4f10dd2, 0x0000000145b8da26 },
1406+ /* x^41984 mod p(x)` << 1, x^42048 mod p(x)` << 1 */
1407+ { 0x0000000124f5735e, 0x000000018fff4b08 },
1408+ /* x^40960 mod p(x)` << 1, x^41024 mod p(x)` << 1 */
1409+ { 0x0000000124760a4c, 0x0000000150b58ed0 },
1410+ /* x^39936 mod p(x)` << 1, x^40000 mod p(x)` << 1 */
1411+ { 0x000000000f1fc186, 0x00000001549f39bc },
1412+ /* x^38912 mod p(x)` << 1, x^38976 mod p(x)` << 1 */
1413+ { 0x00000000150e4cc4, 0x00000000ef4d2f42 },
1414+ /* x^37888 mod p(x)` << 1, x^37952 mod p(x)` << 1 */
1415+ { 0x000000002a6204e8, 0x00000001b1468572 },
1416+ /* x^36864 mod p(x)` << 1, x^36928 mod p(x)` << 1 */
1417+ { 0x00000000beb1d432, 0x000000013d7403b2 },
1418+ /* x^35840 mod p(x)` << 1, x^35904 mod p(x)` << 1 */
1419+ { 0x0000000135f3f1f0, 0x00000001a4681842 },
1420+ /* x^34816 mod p(x)` << 1, x^34880 mod p(x)` << 1 */
1421+ { 0x0000000074fe2232, 0x0000000167714492 },
1422+ /* x^33792 mod p(x)` << 1, x^33856 mod p(x)` << 1 */
1423+ { 0x000000001ac6e2ba, 0x00000001e599099a },
1424+ /* x^32768 mod p(x)` << 1, x^32832 mod p(x)` << 1 */
1425+ { 0x0000000013fca91e, 0x00000000fe128194 },
1426+ /* x^31744 mod p(x)` << 1, x^31808 mod p(x)` << 1 */
1427+ { 0x0000000183f4931e, 0x0000000077e8b990 },
1428+ /* x^30720 mod p(x)` << 1, x^30784 mod p(x)` << 1 */
1429+ { 0x00000000b6d9b4e4, 0x00000001a267f63a },
1430+ /* x^29696 mod p(x)` << 1, x^29760 mod p(x)` << 1 */
1431+ { 0x00000000b5188656, 0x00000001945c245a },
1432+ /* x^28672 mod p(x)` << 1, x^28736 mod p(x)` << 1 */
1433+ { 0x0000000027a81a84, 0x0000000149002e76 },
1434+ /* x^27648 mod p(x)` << 1, x^27712 mod p(x)` << 1 */
1435+ { 0x0000000125699258, 0x00000001bb8310a4 },
1436+ /* x^26624 mod p(x)` << 1, x^26688 mod p(x)` << 1 */
1437+ { 0x00000001b23de796, 0x000000019ec60bcc },
1438+ /* x^25600 mod p(x)` << 1, x^25664 mod p(x)` << 1 */
1439+ { 0x00000000fe4365dc, 0x000000012d8590ae },
1440+ /* x^24576 mod p(x)` << 1, x^24640 mod p(x)` << 1 */
1441+ { 0x00000000c68f497a, 0x0000000065b00684 },
1442+ /* x^23552 mod p(x)` << 1, x^23616 mod p(x)` << 1 */
1443+ { 0x00000000fbf521ee, 0x000000015e5aeadc },
1444+ /* x^22528 mod p(x)` << 1, x^22592 mod p(x)` << 1 */
1445+ { 0x000000015eac3378, 0x00000000b77ff2b0 },
1446+ /* x^21504 mod p(x)` << 1, x^21568 mod p(x)` << 1 */
1447+ { 0x0000000134914b90, 0x0000000188da2ff6 },
1448+ /* x^20480 mod p(x)` << 1, x^20544 mod p(x)` << 1 */
1449+ { 0x0000000016335cfe, 0x0000000063da929a },
1450+ /* x^19456 mod p(x)` << 1, x^19520 mod p(x)` << 1 */
1451+ { 0x000000010372d10c, 0x00000001389caa80 },
1452+ /* x^18432 mod p(x)` << 1, x^18496 mod p(x)` << 1 */
1453+ { 0x000000015097b908, 0x000000013db599d2 },
1454+ /* x^17408 mod p(x)` << 1, x^17472 mod p(x)` << 1 */
1455+ { 0x00000001227a7572, 0x0000000122505a86 },
1456+ /* x^16384 mod p(x)` << 1, x^16448 mod p(x)` << 1 */
1457+ { 0x000000009a8f75c0, 0x000000016bd72746 },
1458+ /* x^15360 mod p(x)` << 1, x^15424 mod p(x)` << 1 */
1459+ { 0x00000000682c77a2, 0x00000001c3faf1d4 },
1460+ /* x^14336 mod p(x)` << 1, x^14400 mod p(x)` << 1 */
1461+ { 0x00000000231f091c, 0x00000001111c826c },
1462+ /* x^13312 mod p(x)` << 1, x^13376 mod p(x)` << 1 */
1463+ { 0x000000007d4439f2, 0x00000000153e9fb2 },
1464+ /* x^12288 mod p(x)` << 1, x^12352 mod p(x)` << 1 */
1465+ { 0x000000017e221efc, 0x000000002b1f7b60 },
1466+ /* x^11264 mod p(x)` << 1, x^11328 mod p(x)` << 1 */
1467+ { 0x0000000167457c38, 0x00000000b1dba570 },
1468+ /* x^10240 mod p(x)` << 1, x^10304 mod p(x)` << 1 */
1469+ { 0x00000000bdf081c4, 0x00000001f6397b76 },
1470+ /* x^9216 mod p(x)` << 1, x^9280 mod p(x)` << 1 */
1471+ { 0x000000016286d6b0, 0x0000000156335214 },
1472+ /* x^8192 mod p(x)` << 1, x^8256 mod p(x)` << 1 */
1473+ { 0x00000000c84f001c, 0x00000001d70e3986 },
1474+ /* x^7168 mod p(x)` << 1, x^7232 mod p(x)` << 1 */
1475+ { 0x0000000064efe7c0, 0x000000003701a774 },
1476+ /* x^6144 mod p(x)` << 1, x^6208 mod p(x)` << 1 */
1477+ { 0x000000000ac2d904, 0x00000000ac81ef72 },
1478+ /* x^5120 mod p(x)` << 1, x^5184 mod p(x)` << 1 */
1479+ { 0x00000000fd226d14, 0x0000000133212464 },
1480+ /* x^4096 mod p(x)` << 1, x^4160 mod p(x)` << 1 */
1481+ { 0x000000011cfd42e0, 0x00000000e4e45610 },
1482+ /* x^3072 mod p(x)` << 1, x^3136 mod p(x)` << 1 */
1483+ { 0x000000016e5a5678, 0x000000000c1bd370 },
1484+ /* x^2048 mod p(x)` << 1, x^2112 mod p(x)` << 1 */
1485+ { 0x00000001d888fe22, 0x00000001a7b9e7a6 },
1486+ /* x^1024 mod p(x)` << 1, x^1088 mod p(x)` << 1 */
1487+ { 0x00000001af77fcd4, 0x000000007d657a10 }
1488+#endif /* __LITTLE_ENDIAN__ */
1489+ };
1490+
1491+/* Reduce final 1024-2048 bits to 64 bits, shifting 32 bits to include the trailing 32 bits of zeros */
1492+
1493+static const __vector unsigned long long vcrc_short_const[16]
1494+ __attribute__((aligned (16))) = {
1495+#ifdef __LITTLE_ENDIAN__
1496+ /* x^1952 mod p(x) , x^1984 mod p(x) , x^2016 mod p(x) , x^2048 mod p(x) */
1497+ { 0x99168a18ec447f11, 0xed837b2613e8221e },
1498+ /* x^1824 mod p(x) , x^1856 mod p(x) , x^1888 mod p(x) , x^1920 mod p(x) */
1499+ { 0xe23e954e8fd2cd3c, 0xc8acdd8147b9ce5a },
1500+ /* x^1696 mod p(x) , x^1728 mod p(x) , x^1760 mod p(x) , x^1792 mod p(x) */
1501+ { 0x92f8befe6b1d2b53, 0xd9ad6d87d4277e25 },
1502+ /* x^1568 mod p(x) , x^1600 mod p(x) , x^1632 mod p(x) , x^1664 mod p(x) */
1503+ { 0xf38a3556291ea462, 0xc10ec5e033fbca3b },
1504+ /* x^1440 mod p(x) , x^1472 mod p(x) , x^1504 mod p(x) , x^1536 mod p(x) */
1505+ { 0x974ac56262b6ca4b, 0xc0b55b0e82e02e2f },
1506+ /* x^1312 mod p(x) , x^1344 mod p(x) , x^1376 mod p(x) , x^1408 mod p(x) */
1507+ { 0x855712b3784d2a56, 0x71aa1df0e172334d },
1508+ /* x^1184 mod p(x) , x^1216 mod p(x) , x^1248 mod p(x) , x^1280 mod p(x) */
1509+ { 0xa5abe9f80eaee722, 0xfee3053e3969324d },
1510+ /* x^1056 mod p(x) , x^1088 mod p(x) , x^1120 mod p(x) , x^1152 mod p(x) */
1511+ { 0x1fa0943ddb54814c, 0xf44779b93eb2bd08 },
1512+ /* x^928 mod p(x) , x^960 mod p(x) , x^992 mod p(x) , x^1024 mod p(x) */
1513+ { 0xa53ff440d7bbfe6a, 0xf5449b3f00cc3374 },
1514+ /* x^800 mod p(x) , x^832 mod p(x) , x^864 mod p(x) , x^896 mod p(x) */
1515+ { 0xebe7e3566325605c, 0x6f8346e1d777606e },
1516+ /* x^672 mod p(x) , x^704 mod p(x) , x^736 mod p(x) , x^768 mod p(x) */
1517+ { 0xc65a272ce5b592b8, 0xe3ab4f2ac0b95347 },
1518+ /* x^544 mod p(x) , x^576 mod p(x) , x^608 mod p(x) , x^640 mod p(x) */
1519+ { 0x5705a9ca4721589f, 0xaa2215ea329ecc11 },
1520+ /* x^416 mod p(x) , x^448 mod p(x) , x^480 mod p(x) , x^512 mod p(x) */
1521+ { 0xe3720acb88d14467, 0x1ed8f66ed95efd26 },
1522+ /* x^288 mod p(x) , x^320 mod p(x) , x^352 mod p(x) , x^384 mod p(x) */
1523+ { 0xba1aca0315141c31, 0x78ed02d5a700e96a },
1524+ /* x^160 mod p(x) , x^192 mod p(x) , x^224 mod p(x) , x^256 mod p(x) */
1525+ { 0xad2a31b3ed627dae, 0xba8ccbe832b39da3 },
1526+ /* x^32 mod p(x) , x^64 mod p(x) , x^96 mod p(x) , x^128 mod p(x) */
1527+ { 0x6655004fa06a2517, 0xedb88320b1e6b092 }
1528+#else /* __LITTLE_ENDIAN__ */
1529+ /* x^1952 mod p(x) , x^1984 mod p(x) , x^2016 mod p(x) , x^2048 mod p(x) */
1530+ { 0xed837b2613e8221e, 0x99168a18ec447f11 },
1531+ /* x^1824 mod p(x) , x^1856 mod p(x) , x^1888 mod p(x) , x^1920 mod p(x) */
1532+ { 0xc8acdd8147b9ce5a, 0xe23e954e8fd2cd3c },
1533+ /* x^1696 mod p(x) , x^1728 mod p(x) , x^1760 mod p(x) , x^1792 mod p(x) */
1534+ { 0xd9ad6d87d4277e25, 0x92f8befe6b1d2b53 },
1535+ /* x^1568 mod p(x) , x^1600 mod p(x) , x^1632 mod p(x) , x^1664 mod p(x) */
1536+ { 0xc10ec5e033fbca3b, 0xf38a3556291ea462 },
1537+ /* x^1440 mod p(x) , x^1472 mod p(x) , x^1504 mod p(x) , x^1536 mod p(x) */
1538+ { 0xc0b55b0e82e02e2f, 0x974ac56262b6ca4b },
1539+ /* x^1312 mod p(x) , x^1344 mod p(x) , x^1376 mod p(x) , x^1408 mod p(x) */
1540+ { 0x71aa1df0e172334d, 0x855712b3784d2a56 },
1541+ /* x^1184 mod p(x) , x^1216 mod p(x) , x^1248 mod p(x) , x^1280 mod p(x) */
1542+ { 0xfee3053e3969324d, 0xa5abe9f80eaee722 },
1543+ /* x^1056 mod p(x) , x^1088 mod p(x) , x^1120 mod p(x) , x^1152 mod p(x) */
1544+ { 0xf44779b93eb2bd08, 0x1fa0943ddb54814c },
1545+ /* x^928 mod p(x) , x^960 mod p(x) , x^992 mod p(x) , x^1024 mod p(x) */
1546+ { 0xf5449b3f00cc3374, 0xa53ff440d7bbfe6a },
1547+ /* x^800 mod p(x) , x^832 mod p(x) , x^864 mod p(x) , x^896 mod p(x) */
1548+ { 0x6f8346e1d777606e, 0xebe7e3566325605c },
1549+ /* x^672 mod p(x) , x^704 mod p(x) , x^736 mod p(x) , x^768 mod p(x) */
1550+ { 0xe3ab4f2ac0b95347, 0xc65a272ce5b592b8 },
1551+ /* x^544 mod p(x) , x^576 mod p(x) , x^608 mod p(x) , x^640 mod p(x) */
1552+ { 0xaa2215ea329ecc11, 0x5705a9ca4721589f },
1553+ /* x^416 mod p(x) , x^448 mod p(x) , x^480 mod p(x) , x^512 mod p(x) */
1554+ { 0x1ed8f66ed95efd26, 0xe3720acb88d14467 },
1555+ /* x^288 mod p(x) , x^320 mod p(x) , x^352 mod p(x) , x^384 mod p(x) */
1556+ { 0x78ed02d5a700e96a, 0xba1aca0315141c31 },
1557+ /* x^160 mod p(x) , x^192 mod p(x) , x^224 mod p(x) , x^256 mod p(x) */
1558+ { 0xba8ccbe832b39da3, 0xad2a31b3ed627dae },
1559+ /* x^32 mod p(x) , x^64 mod p(x) , x^96 mod p(x) , x^128 mod p(x) */
1560+ { 0xedb88320b1e6b092, 0x6655004fa06a2517 }
1561+#endif /* __LITTLE_ENDIAN__ */
1562+ };
1563+
1564+/* Barrett constants */
1565+/* 33 bit reflected Barrett constant m = (4^32)/n */
1566+
1567+static const __vector unsigned long long v_Barrett_const[2]
1568+ __attribute__((aligned (16))) = {
1569+ /* x^64 div p(x) */
1570+#ifdef __LITTLE_ENDIAN__
1571+ { 0x00000001f7011641, 0x0000000000000000 },
1572+ { 0x00000001db710641, 0x0000000000000000 }
1573+#else /* __LITTLE_ENDIAN__ */
1574+ { 0x0000000000000000, 0x00000001f7011641 },
1575+ { 0x0000000000000000, 0x00000001db710641 }
1576+#endif /* __LITTLE_ENDIAN__ */
1577+ };
1578+#endif /* POWER8_INTRINSICS */
1579+
1580+#endif /* __ASSEMBLER__ */
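The Barrett and folding constants in the header above are ordinary GF(2) arithmetic on the CRC-32 polynomial. The following is a minimal sketch that recomputes a few of them and performs the Barrett reduction the comments describe. It assumes the zlib CRC-32 polynomial p(x) = 0x104C11DB7 in normal form, including the x^32 term.

The helper names (`reflect_bits`, `gf2_div_xn`, `clmul`, `gf2_mod_direct`, `gf2_mod_barrett`, `check_power8_constants`) are hypothetical and not part of the patch. The code also uses bit-at-a-time loops instead of `vpmsum`, so it illustrates the math only, not the POWER8 code path.

```c
#include <assert.h>
#include <stdint.h>

/* CRC-32 generator p(x) used by zlib, in normal (MSB-first) form with the
 * x^32 term included: 33 significant bits. */
#define CRC32_P 0x104C11DB7ULL

/* Reverse the low nbits of v (bit reflection, as in the REFLECT tables). */
static uint64_t reflect_bits(uint64_t v, unsigned nbits)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < nbits; i++) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

/* Bitwise GF(2) long division of x^n by p(x), valid for n <= 95 so the
 * quotient fits in 64 bits. Returns floor(x^n / p(x)); *rem gets x^n mod p(x). */
static uint64_t gf2_div_xn(unsigned n, uint64_t *rem)
{
    uint64_t r = 0, q = 0;
    for (int k = (int)n; k >= 0; k--) {
        /* Bring down coefficient k of x^n (only the leading one is set). */
        r = (r << 1) | (uint64_t)(k == (int)n);
        q <<= 1;
        if (r & (1ULL << 32)) { /* degree 32 reached: subtract p(x) */
            r ^= CRC32_P;
            q |= 1;
        }
    }
    *rem = r;
    return q;
}

/* Carry-less (GF(2)) multiply; callers keep the product below 64 bits. */
static uint64_t clmul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

/* Reference: reduce a 64-bit polynomial mod p(x) one bit at a time. */
static uint32_t gf2_mod_direct(uint64_t a)
{
    for (int i = 63; i >= 32; i--)
        if ((a >> i) & 1)
            a ^= CRC32_P << (i - 32);
    return (uint32_t)a;
}

/* Barrett reduction (normal form): q = floor(floor(a / x^32) * mu / x^32),
 * then a mod p(x) = a - q * p(x). Exact over GF(2) when deg(a) < 64. */
static uint32_t gf2_mod_barrett(uint64_t a, uint64_t mu)
{
    uint64_t q = clmul(a >> 32, mu) >> 32;
    return (uint32_t)(a ^ clmul(q, CRC32_P));
}

/* Cross-check a few values printed in crc32_constants.h. */
static void check_power8_constants(void)
{
    uint64_t r32, r64;
    uint64_t mu = gf2_div_xn(64, &r64);
    (void)gf2_div_xn(32, &r32);

    assert(mu == 0x104D101DFULL);                        /* x^64 div p(x) */
    assert(reflect_bits(mu, 33) == 0x1f7011641ULL);      /* v_Barrett_const[0] */
    assert(reflect_bits(CRC32_P, 33) == 0x1db710641ULL); /* v_Barrett_const[1] */
    /* The 0xedb88320b1e6b092 doubleword of the last vcrc_short_const entry:
     * reflected x^32 mod p(x) followed by reflected x^64 mod p(x). */
    assert(((reflect_bits(r32, 32) << 32) | reflect_bits(r64, 32))
           == 0xedb88320b1e6b092ULL);
    assert(gf2_mod_barrett(1ULL << 63, mu) == gf2_mod_direct(1ULL << 63));
}
```

`check_power8_constants()` should pass silently. The first table entry matches the 33-bit reflection of x^64 div p(x). The second entry appears to be p(x) itself, reflected, which is what the Barrett step multiplies by after the quotient estimate.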
1581diff --git a/contrib/power/crc32_z_power8.c b/contrib/power/crc32_z_power8.c
1582new file mode 100644
1583index 0000000..7858cfe
1584--- /dev/null
1585+++ b/contrib/power/crc32_z_power8.c
1586@@ -0,0 +1,679 @@
1587+/*
1588+ * Calculate the checksum of data that is 16 byte aligned and a multiple of
1589+ * 16 bytes.
1590+ *
1591+ * The first step is to reduce it to 1024 bits. We do this in 8 parallel
1592+ * chunks in order to mask the latency of the vpmsum instructions. If we
1593+ * have more than 32 kB of data to checksum we repeat this step multiple
1594+ * times, passing in the previous 1024 bits.
1595+ *
1596+ * The next step is to reduce the 1024 bits to 64 bits. This step adds
1597+ * 32 bits of 0s to the end - this matches what a CRC does. We just
1598+ * calculate constants that land the data in this 32 bits.
1599+ *
1600+ * We then use fixed point Barrett reduction to compute a mod n over GF(2)
1601+ * for n = CRC using POWER8 instructions. We use x = 32.
1602+ *
1603+ * http://en.wikipedia.org/wiki/Barrett_reduction
1604+ *
1605+ * This code uses gcc vector builtins instead of using assembly directly.
1606+ *
1607+ * Copyright (C) 2017 Rogerio Alves <rogealve@br.ibm.com>, IBM
1608+ *
1609+ * This program is free software; you can redistribute it and/or
1610+ * modify it under the terms of either:
1611+ *
1612+ * a) the GNU General Public License as published by the Free Software
1613+ * Foundation; either version 2 of the License, or (at your option)
1614+ * any later version, or
1615+ * b) the Apache License, Version 2.0
1616+ */
1617+
1618+#include <altivec.h>
1619+#include "../../zutil.h"
1620+#include "power.h"
1621+
1622+#define POWER8_INTRINSICS
1623+#define CRC_TABLE
1624+
1625+#ifdef CRC32_CONSTANTS_HEADER
1626+#include CRC32_CONSTANTS_HEADER
1627+#else
1628+#include "crc32_constants.h"
1629+#endif
1630+
1631+#define VMX_ALIGN 16
1632+#define VMX_ALIGN_MASK (VMX_ALIGN-1)
1633+
1634+#ifdef REFLECT
1635+static unsigned int crc32_align(unsigned int crc, const unsigned char *p,
1636+ unsigned long len)
1637+{
1638+ while (len--)
1639+ crc = crc_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
1640+ return crc;
1641+}
1642+#else
1643+static unsigned int crc32_align(unsigned int crc, const unsigned char *p,
1644+ unsigned long len)
1645+{
1646+ while (len--)
1647+ crc = crc_table[((crc >> 24) ^ *p++) & 0xff] ^ (crc << 8);
1648+ return crc;
1649+}
1650+#endif
1651+
1652+static unsigned int __attribute__ ((aligned (32)))
1653+__crc32_vpmsum(unsigned int crc, const void* p, unsigned long len);
1654+
1655+unsigned long ZLIB_INTERNAL _crc32_z_power8(uLong _crc, const Bytef *_p,
1656+ z_size_t _len)
1657+{
1658+ unsigned int prealign;
1659+ unsigned int tail;
1660+
1661+ /* Map zlib API to crc32_vpmsum API */
1662+ unsigned int crc = (unsigned int) (0xffffffff & _crc);
1663+ const unsigned char *p = _p;
1664+ unsigned long len = (unsigned long) _len;
1665+
1666+ if (p == (const unsigned char *) 0x0) return 0;
1667+#ifdef CRC_XOR
1668+ crc ^= 0xffffffff;
1669+#endif
1670+
1671+ if (len < VMX_ALIGN + VMX_ALIGN_MASK) {
1672+ crc = crc32_align(crc, p, len);
1673+ goto out;
1674+ }
1675+
1676+ if ((unsigned long)p & VMX_ALIGN_MASK) {
1677+ prealign = VMX_ALIGN - ((unsigned long)p & VMX_ALIGN_MASK);
1678+ crc = crc32_align(crc, p, prealign);
1679+ len -= prealign;
1680+ p += prealign;
1681+ }
1682+
1683+ crc = __crc32_vpmsum(crc, p, len & ~VMX_ALIGN_MASK);
1684+
1685+ tail = len & VMX_ALIGN_MASK;
1686+ if (tail) {
1687+ p += len & ~VMX_ALIGN_MASK;
1688+ crc = crc32_align(crc, p, tail);
1689+ }
1690+
1691+out:
1692+#ifdef CRC_XOR
1693+ crc ^= 0xffffffff;
1694+#endif
1695+
1696+ /* Convert to zlib API */
1697+ return (unsigned long) crc;
1698+}
1699+
1700+#if defined (__clang__)
1701+#include "clang_workaround.h"
1702+#else
1703+#define __builtin_pack_vector(a, b) __builtin_pack_vector_int128 ((a), (b))
1704+#define __builtin_unpack_vector_0(a) __builtin_unpack_vector_int128 ((vector __int128_t)(a), 0)
1705+#define __builtin_unpack_vector_1(a) __builtin_unpack_vector_int128 ((vector __int128_t)(a), 1)
1706+#endif
1707+
1708+/* When we have a load-store in a single-dispatch group and address overlap
1709+ * such that forwarding is not allowed (load-hit-store) the group must be flushed.
1710+ * A group ending NOP prevents the flush.
1711+ */
1712+#define GROUP_ENDING_NOP asm("ori 2,2,0" ::: "memory")
1713+
1714+#if defined(__BIG_ENDIAN__) && defined (REFLECT)
1715+#define BYTESWAP_DATA
1716+#elif defined(__LITTLE_ENDIAN__) && !defined(REFLECT)
1717+#define BYTESWAP_DATA
1718+#endif
1719+
1720+#ifdef BYTESWAP_DATA
1721+#define VEC_PERM(vr, va, vb, vc) vr = vec_perm(va, vb,\
1722+ (__vector unsigned char) vc)
1723+#if defined(__LITTLE_ENDIAN__)
1724+/* Byte reverse permute constant LE. */
1725+static const __vector unsigned long long vperm_const
1726+ __attribute__ ((aligned(16))) = { 0x08090A0B0C0D0E0FUL,
1727+ 0x0001020304050607UL };
1728+#else
1729+static const __vector unsigned long long vperm_const
1730+ __attribute__ ((aligned(16))) = { 0x0F0E0D0C0B0A0908UL,
1731+ 0x0706050403020100UL };
1732+#endif
1733+#else
1734+#define VEC_PERM(vr, va, vb, vc)
1735+#endif
1736+
1737+static unsigned int __attribute__ ((aligned (32)))
1738+__crc32_vpmsum(unsigned int crc, const void* p, unsigned long len) {
1739+
1740+ const __vector unsigned long long vzero = {0,0};
1741+ const __vector unsigned long long vones = {0xffffffffffffffffUL,
1742+ 0xffffffffffffffffUL};
1743+
1744+#ifdef REFLECT
1745+ const __vector unsigned long long vmask_32bit =
1746+ (__vector unsigned long long)vec_sld((__vector unsigned char)vzero,
1747+ (__vector unsigned char)vones, 4);
1748+#endif
1749+
1750+ const __vector unsigned long long vmask_64bit =
1751+ (__vector unsigned long long)vec_sld((__vector unsigned char)vzero,
1752+ (__vector unsigned char)vones, 8);
1753+
1754+ __vector unsigned long long vcrc;
1755+
1756+ __vector unsigned long long vconst1, vconst2;
1757+
1758+ /* vdata0-vdata7 will contain our data (p). */
1759+ __vector unsigned long long vdata0, vdata1, vdata2, vdata3, vdata4,
1760+ vdata5, vdata6, vdata7;
1761+
1762+ /* v0-v7 will contain our checksums */
1763+ __vector unsigned long long v0 = {0,0};
1764+ __vector unsigned long long v1 = {0,0};
1765+ __vector unsigned long long v2 = {0,0};
1766+ __vector unsigned long long v3 = {0,0};
1767+ __vector unsigned long long v4 = {0,0};
1768+ __vector unsigned long long v5 = {0,0};
1769+ __vector unsigned long long v6 = {0,0};
1770+ __vector unsigned long long v7 = {0,0};
1771+
1772+
1773+ /* Vector auxiliary variables. */
1774+ __vector unsigned long long va0, va1, va2, va3, va4, va5, va6, va7;
1775+
1776+ unsigned int result = 0;
1777+ unsigned int offset; /* Constant table offset. */
1778+
1779+ unsigned long i; /* Counter. */
1780+ unsigned long chunks;
1781+
1782+ unsigned long block_size;
1783+ int next_block = 0;
1784+
1785+ /* Round down to a multiple of 128 bytes. The last 128-byte block is processed at the end. */
1786+ unsigned long length = len & 0xFFFFFFFFFFFFFF80UL;
1787+
1788+#ifdef REFLECT
1789+ vcrc = (__vector unsigned long long)__builtin_pack_vector(0UL, crc);
1790+#else
1791+ vcrc = (__vector unsigned long long)__builtin_pack_vector(crc, 0UL);
1792+
1793+ /* Shift into top 32 bits */
1794+ vcrc = (__vector unsigned long long)vec_sld((__vector unsigned char)vcrc,
1795+ (__vector unsigned char)vzero, 4);
1796+#endif
1797+
1798+ /* Short version. */
1799+ if (len < 256) {
1800+ /* Calculate where in the constant table we need to start. */
1801+ offset = 256 - len;
1802+
1803+ vconst1 = vec_ld(offset, vcrc_short_const);
1804+ vdata0 = vec_ld(0, (__vector unsigned long long*) p);
1805+ VEC_PERM(vdata0, vdata0, vconst1, vperm_const);
1806+
1807+ /* xor in initial value */
1808+ vdata0 = vec_xor(vdata0, vcrc);
1809+
1810+ vdata0 = (__vector unsigned long long) __builtin_crypto_vpmsumw
1811+ ((__vector unsigned int)vdata0, (__vector unsigned int)vconst1);
1812+ v0 = vec_xor(v0, vdata0);
1813+
1814+ for (i = 16; i < len; i += 16) {
1815+ vconst1 = vec_ld(offset + i, vcrc_short_const);
1816+ vdata0 = vec_ld(i, (__vector unsigned long long*) p);
1817+ VEC_PERM(vdata0, vdata0, vconst1, vperm_const);
1818+ vdata0 = (__vector unsigned long long) __builtin_crypto_vpmsumw
1819+ ((__vector unsigned int)vdata0, (__vector unsigned int)vconst1);
1820+ v0 = vec_xor(v0, vdata0);
1821+ }
1822+ } else {
1823+
1824+ /* Load initial values. */
1825+ vdata0 = vec_ld(0, (__vector unsigned long long*) p);
1826+ vdata1 = vec_ld(16, (__vector unsigned long long*) p);
1827+
1828+ VEC_PERM(vdata0, vdata0, vdata0, vperm_const);
1829+ VEC_PERM(vdata1, vdata1, vdata1, vperm_const);
1830+
1831+ vdata2 = vec_ld(32, (__vector unsigned long long*) p);
1832+ vdata3 = vec_ld(48, (__vector unsigned long long*) p);
1833+
1834+ VEC_PERM(vdata2, vdata2, vdata2, vperm_const);
1835+ VEC_PERM(vdata3, vdata3, vdata3, vperm_const);
1836+
1837+ vdata4 = vec_ld(64, (__vector unsigned long long*) p);
1838+ vdata5 = vec_ld(80, (__vector unsigned long long*) p);
1839+
1840+ VEC_PERM(vdata4, vdata4, vdata4, vperm_const);
1841+ VEC_PERM(vdata5, vdata5, vdata5, vperm_const);
1842+
1843+ vdata6 = vec_ld(96, (__vector unsigned long long*) p);
1844+ vdata7 = vec_ld(112, (__vector unsigned long long*) p);
1845+
1846+ VEC_PERM(vdata6, vdata6, vdata6, vperm_const);
1847+ VEC_PERM(vdata7, vdata7, vdata7, vperm_const);
1848+
1849+ /* xor in initial value */
1850+ vdata0 = vec_xor(vdata0, vcrc);
1851+
1852+ p = (char *)p + 128;
1853+
1854+ do {
1855+ /* Checksum in blocks of MAX_SIZE. */
1856+ block_size = length;
1857+ if (block_size > MAX_SIZE) {
1858+ block_size = MAX_SIZE;
1859+ }
1860+
1861+ length = length - block_size;
1862+
1863+ /*
1864+ * Work out the offset into the constants table to start at. Each
1865+ * constant is 16 bytes, and it is used against 128 bytes of input
1866+ * data - 128 / 16 = 8
1867+ */
1868+ offset = (MAX_SIZE/8) - (block_size/8);
1869+ /* We reduce our final 128 bytes in a separate step */
1870+ chunks = (block_size/128)-1;
1871+
1872+ vconst1 = vec_ld(offset, vcrc_const);
1873+
1874+ va0 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata0,
1875+ (__vector unsigned long long)vconst1);
1876+ va1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata1,
1877+ (__vector unsigned long long)vconst1);
1878+ va2 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata2,
1879+ (__vector unsigned long long)vconst1);
1880+ va3 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata3,
1881+ (__vector unsigned long long)vconst1);
1882+ va4 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata4,
1883+ (__vector unsigned long long)vconst1);
1884+ va5 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata5,
1885+ (__vector unsigned long long)vconst1);
1886+ va6 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata6,
1887+ (__vector unsigned long long)vconst1);
1888+ va7 = __builtin_crypto_vpmsumd ((__vector unsigned long long)vdata7,
1889+ (__vector unsigned long long)vconst1);
1890+
1891+ if (chunks > 1) {
1892+ offset += 16;
1893+ vconst2 = vec_ld(offset, vcrc_const);
1894+ GROUP_ENDING_NOP;
1895+
1896+ vdata0 = vec_ld(0, (__vector unsigned long long*) p);
1897+ VEC_PERM(vdata0, vdata0, vdata0, vperm_const);
1898+
1899+ vdata1 = vec_ld(16, (__vector unsigned long long*) p);
1900+ VEC_PERM(vdata1, vdata1, vdata1, vperm_const);
1901+
1902+ vdata2 = vec_ld(32, (__vector unsigned long long*) p);
1903+ VEC_PERM(vdata2, vdata2, vdata2, vperm_const);
1904+
1905+ vdata3 = vec_ld(48, (__vector unsigned long long*) p);
1906+ VEC_PERM(vdata3, vdata3, vdata3, vperm_const);
1907+
1908+ vdata4 = vec_ld(64, (__vector unsigned long long*) p);
1909+ VEC_PERM(vdata4, vdata4, vdata4, vperm_const);
1910+
1911+ vdata5 = vec_ld(80, (__vector unsigned long long*) p);
1912+ VEC_PERM(vdata5, vdata5, vdata5, vperm_const);
1913+
1914+ vdata6 = vec_ld(96, (__vector unsigned long long*) p);
1915+ VEC_PERM(vdata6, vdata6, vdata6, vperm_const);
1916+
1917+ vdata7 = vec_ld(112, (__vector unsigned long long*) p);
1918+ VEC_PERM(vdata7, vdata7, vdata7, vperm_const);
1919+
1920+ p = (char *)p + 128;
1921+
1922+ /*
1923+ * Main loop. We modulo-schedule it such that it takes three
1924+ * iterations to complete - first iteration load, second
1925+ * iteration vpmsum, third iteration xor.
1926+ */
1927+ for (i = 0; i < chunks-2; i++) {
1928+ vconst1 = vec_ld(offset, vcrc_const);
1929+ offset += 16;
1930+ GROUP_ENDING_NOP;
1931+
1932+ v0 = vec_xor(v0, va0);
1933+ va0 = __builtin_crypto_vpmsumd ((__vector unsigned long
1934+ long)vdata0, (__vector unsigned long long)vconst2);
1935+ vdata0 = vec_ld(0, (__vector unsigned long long*) p);
1936+ VEC_PERM(vdata0, vdata0, vdata0, vperm_const);
1937+ GROUP_ENDING_NOP;
1938+
1939+ v1 = vec_xor(v1, va1);
1940+ va1 = __builtin_crypto_vpmsumd ((__vector unsigned long
1941+ long)vdata1, (__vector unsigned long long)vconst2);
1942+ vdata1 = vec_ld(16, (__vector unsigned long long*) p);
1943+ VEC_PERM(vdata1, vdata1, vdata1, vperm_const);
1944+ GROUP_ENDING_NOP;
1945+
1946+ v2 = vec_xor(v2, va2);
1947+ va2 = __builtin_crypto_vpmsumd ((__vector unsigned long
1948+ long)vdata2, (__vector unsigned long long)vconst2);
1949+ vdata2 = vec_ld(32, (__vector unsigned long long*) p);
1950+ VEC_PERM(vdata2, vdata2, vdata2, vperm_const);
1951+ GROUP_ENDING_NOP;
1952+
1953+ v3 = vec_xor(v3, va3);
1954+ va3 = __builtin_crypto_vpmsumd ((__vector unsigned long
1955+ long)vdata3, (__vector unsigned long long)vconst2);
1956+ vdata3 = vec_ld(48, (__vector unsigned long long*) p);
1957+ VEC_PERM(vdata3, vdata3, vdata3, vperm_const);
1958+
1959+ vconst2 = vec_ld(offset, vcrc_const);
1960+ GROUP_ENDING_NOP;
1961+
1962+ v4 = vec_xor(v4, va4);
1963+ va4 = __builtin_crypto_vpmsumd ((__vector unsigned long
1964+ long)vdata4, (__vector unsigned long long)vconst1);
1965+ vdata4 = vec_ld(64, (__vector unsigned long long*) p);
1966+ VEC_PERM(vdata4, vdata4, vdata4, vperm_const);
1967+ GROUP_ENDING_NOP;
1968+
1969+ v5 = vec_xor(v5, va5);
1970+ va5 = __builtin_crypto_vpmsumd ((__vector unsigned long
1971+ long)vdata5, (__vector unsigned long long)vconst1);
1972+ vdata5 = vec_ld(80, (__vector unsigned long long*) p);
1973+ VEC_PERM(vdata5, vdata5, vdata5, vperm_const);
1974+ GROUP_ENDING_NOP;
1975+
1976+ v6 = vec_xor(v6, va6);
1977+ va6 = __builtin_crypto_vpmsumd ((__vector unsigned long
1978+ long)vdata6, (__vector unsigned long long)vconst1);
1979+ vdata6 = vec_ld(96, (__vector unsigned long long*) p);
1980+ VEC_PERM(vdata6, vdata6, vdata6, vperm_const);
1981+ GROUP_ENDING_NOP;
1982+
1983+ v7 = vec_xor(v7, va7);
1984+ va7 = __builtin_crypto_vpmsumd ((__vector unsigned long
1985+ long)vdata7, (__vector unsigned long long)vconst1);
1986+ vdata7 = vec_ld(112, (__vector unsigned long long*) p);
1987+ VEC_PERM(vdata7, vdata7, vdata7, vperm_const);
1988+
1989+ p = (char *)p + 128;
1990+ }
1991+
1992+ /* First cool down. */
1993+ vconst1 = vec_ld(offset, vcrc_const);
1994+ offset += 16;
1995+
1996+ v0 = vec_xor(v0, va0);
1997+ va0 = __builtin_crypto_vpmsumd ((__vector unsigned long
1998+ long)vdata0, (__vector unsigned long long)vconst1);
1999+ GROUP_ENDING_NOP;
2000+
2001+ v1 = vec_xor(v1, va1);
2002+ va1 = __builtin_crypto_vpmsumd ((__vector unsigned long
2003+ long)vdata1, (__vector unsigned long long)vconst1);
2004+ GROUP_ENDING_NOP;
2005+
2006+ v2 = vec_xor(v2, va2);
2007+ va2 = __builtin_crypto_vpmsumd ((__vector unsigned long
2008+ long)vdata2, (__vector unsigned long long)vconst1);
2009+ GROUP_ENDING_NOP;
2010+
2011+ v3 = vec_xor(v3, va3);
2012+ va3 = __builtin_crypto_vpmsumd ((__vector unsigned long
2013+ long)vdata3, (__vector unsigned long long)vconst1);
2014+ GROUP_ENDING_NOP;
2015+
2016+ v4 = vec_xor(v4, va4);
2017+ va4 = __builtin_crypto_vpmsumd ((__vector unsigned long
2018+ long)vdata4, (__vector unsigned long long)vconst1);
2019+ GROUP_ENDING_NOP;
2020+
2021+ v5 = vec_xor(v5, va5);
2022+ va5 = __builtin_crypto_vpmsumd ((__vector unsigned long
2023+ long)vdata5, (__vector unsigned long long)vconst1);
2024+ GROUP_ENDING_NOP;
2025+
2026+ v6 = vec_xor(v6, va6);
2027+ va6 = __builtin_crypto_vpmsumd ((__vector unsigned long
2028+ long)vdata6, (__vector unsigned long long)vconst1);
2029+ GROUP_ENDING_NOP;
2030+
2031+ v7 = vec_xor(v7, va7);
2032+ va7 = __builtin_crypto_vpmsumd ((__vector unsigned long
2033+ long)vdata7, (__vector unsigned long long)vconst1);
2034+ }/* else */
2035+
2036+ /* Second cool down. */
2037+ v0 = vec_xor(v0, va0);
2038+ v1 = vec_xor(v1, va1);
2039+ v2 = vec_xor(v2, va2);
2040+ v3 = vec_xor(v3, va3);
2041+ v4 = vec_xor(v4, va4);
2042+ v5 = vec_xor(v5, va5);
2043+ v6 = vec_xor(v6, va6);
2044+ v7 = vec_xor(v7, va7);
2045+
2046+#ifdef REFLECT
2047+ /*
2048+ * vpmsumd produces a 96 bit result in the least significant bits
2049+ * of the register. Since we are bit reflected we have to shift it
2050+ * left 32 bits so it occupies the least significant bits in the
2051+ * bit reflected domain.
2052+ */
2053+ v0 = (__vector unsigned long long)vec_sld((__vector unsigned char)v0,
2054+ (__vector unsigned char)vzero, 4);
2055+ v1 = (__vector unsigned long long)vec_sld((__vector unsigned char)v1,
2056+ (__vector unsigned char)vzero, 4);
2057+ v2 = (__vector unsigned long long)vec_sld((__vector unsigned char)v2,
2058+ (__vector unsigned char)vzero, 4);
2059+ v3 = (__vector unsigned long long)vec_sld((__vector unsigned char)v3,
2060+ (__vector unsigned char)vzero, 4);
2061+ v4 = (__vector unsigned long long)vec_sld((__vector unsigned char)v4,
2062+ (__vector unsigned char)vzero, 4);
2063+ v5 = (__vector unsigned long long)vec_sld((__vector unsigned char)v5,
2064+ (__vector unsigned char)vzero, 4);
2065+ v6 = (__vector unsigned long long)vec_sld((__vector unsigned char)v6,
2066+ (__vector unsigned char)vzero, 4);
2067+ v7 = (__vector unsigned long long)vec_sld((__vector unsigned char)v7,
2068+ (__vector unsigned char)vzero, 4);
2069+#endif
2070+
2071+ /* xor with the last 1024 bits. */
2072+ va0 = vec_ld(0, (__vector unsigned long long*) p);
2073+ VEC_PERM(va0, va0, va0, vperm_const);
2074+
2075+ va1 = vec_ld(16, (__vector unsigned long long*) p);
2076+ VEC_PERM(va1, va1, va1, vperm_const);
2077+
2078+ va2 = vec_ld(32, (__vector unsigned long long*) p);
2079+ VEC_PERM(va2, va2, va2, vperm_const);
2080+
2081+ va3 = vec_ld(48, (__vector unsigned long long*) p);
2082+ VEC_PERM(va3, va3, va3, vperm_const);
2083+
2084+ va4 = vec_ld(64, (__vector unsigned long long*) p);
2085+ VEC_PERM(va4, va4, va4, vperm_const);
2086+
2087+ va5 = vec_ld(80, (__vector unsigned long long*) p);
2088+ VEC_PERM(va5, va5, va5, vperm_const);
2089+
2090+ va6 = vec_ld(96, (__vector unsigned long long*) p);
2091+ VEC_PERM(va6, va6, va6, vperm_const);
2092+
2093+ va7 = vec_ld(112, (__vector unsigned long long*) p);
2094+ VEC_PERM(va7, va7, va7, vperm_const);
2095+
2096+ p = (char *)p + 128;
2097+
2098+ vdata0 = vec_xor(v0, va0);
2099+ vdata1 = vec_xor(v1, va1);
2100+ vdata2 = vec_xor(v2, va2);
2101+ vdata3 = vec_xor(v3, va3);
2102+ vdata4 = vec_xor(v4, va4);
2103+ vdata5 = vec_xor(v5, va5);
2104+ vdata6 = vec_xor(v6, va6);
2105+ vdata7 = vec_xor(v7, va7);
2106+
2107+ /* Check if we have more blocks to process */
2108+ next_block = 0;
2109+ if (length != 0) {
2110+ next_block = 1;
2111+
2112+ /* zero v0-v7 */
2113+ v0 = vec_xor(v0, v0);
2114+ v1 = vec_xor(v1, v1);
2115+ v2 = vec_xor(v2, v2);
2116+ v3 = vec_xor(v3, v3);
2117+ v4 = vec_xor(v4, v4);
2118+ v5 = vec_xor(v5, v5);
2119+ v6 = vec_xor(v6, v6);
2120+ v7 = vec_xor(v7, v7);
2121+ }
2122+ length = length + 128;
2123+
2124+ } while (next_block);
2125+
2126+ /* Calculate how many bytes we have left. */
2127+ length = (len & 127);
2128+
2129+ /* Calculate where in (short) constant table we need to start. */
2130+ offset = 128 - length;
2131+
2132+ v0 = vec_ld(offset, vcrc_short_const);
2133+ v1 = vec_ld(offset + 16, vcrc_short_const);
2134+ v2 = vec_ld(offset + 32, vcrc_short_const);
2135+ v3 = vec_ld(offset + 48, vcrc_short_const);
2136+ v4 = vec_ld(offset + 64, vcrc_short_const);
2137+ v5 = vec_ld(offset + 80, vcrc_short_const);
2138+ v6 = vec_ld(offset + 96, vcrc_short_const);
2139+ v7 = vec_ld(offset + 112, vcrc_short_const);
2140+
2141+ offset += 128;
2142+
2143+ v0 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2144+ (__vector unsigned int)vdata0,(__vector unsigned int)v0);
2145+ v1 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2146+ (__vector unsigned int)vdata1,(__vector unsigned int)v1);
2147+ v2 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2148+ (__vector unsigned int)vdata2,(__vector unsigned int)v2);
2149+ v3 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2150+ (__vector unsigned int)vdata3,(__vector unsigned int)v3);
2151+ v4 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2152+ (__vector unsigned int)vdata4,(__vector unsigned int)v4);
2153+ v5 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2154+ (__vector unsigned int)vdata5,(__vector unsigned int)v5);
2155+ v6 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2156+ (__vector unsigned int)vdata6,(__vector unsigned int)v6);
2157+ v7 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2158+ (__vector unsigned int)vdata7,(__vector unsigned int)v7);
2159+
2160+ /* Now reduce the tail (0-112 bytes). */
2161+ for (i = 0; i < length; i+=16) {
2162+ vdata0 = vec_ld(i,(__vector unsigned long long*)p);
2163+ VEC_PERM(vdata0, vdata0, vdata0, vperm_const);
2164+ va0 = vec_ld(offset + i,vcrc_short_const);
2165+ va0 = (__vector unsigned long long)__builtin_crypto_vpmsumw (
2166+ (__vector unsigned int)vdata0,(__vector unsigned int)va0);
2167+ v0 = vec_xor(v0, va0);
2168+ }
2169+
2170+ /* xor all parallel chunks together. */
2171+ v0 = vec_xor(v0, v1);
2172+ v2 = vec_xor(v2, v3);
2173+ v4 = vec_xor(v4, v5);
2174+ v6 = vec_xor(v6, v7);
2175+
2176+ v0 = vec_xor(v0, v2);
2177+ v4 = vec_xor(v4, v6);
2178+
2179+ v0 = vec_xor(v0, v4);
2180+ }
2181+
2182+ /* Barrett Reduction */
2183+ vconst1 = vec_ld(0, v_Barrett_const);
2184+ vconst2 = vec_ld(16, v_Barrett_const);
2185+
2186+ v1 = (__vector unsigned long long)vec_sld((__vector unsigned char)v0,
2187+ (__vector unsigned char)v0, 8);
2188+ v0 = vec_xor(v1,v0);
2189+
2190+#ifdef REFLECT
2191+ /* shift left one bit */
2192+ __vector unsigned char vsht_splat = vec_splat_u8 (1);
2193+ v0 = (__vector unsigned long long)vec_sll ((__vector unsigned char)v0,
2194+ vsht_splat);
2195+#endif
2196+
2197+ v0 = vec_and(v0, vmask_64bit);
2198+
2199+#ifndef REFLECT
2200+
2201+ /*
2202+ * Now for the actual algorithm. The idea is to calculate q,
2203+ * the multiple of our polynomial that we need to subtract. By
2204+ * doing the computation 2x bits higher (ie 64 bits) and shifting the
2205+ * result back down 2x bits, we round down to the nearest multiple.
2206+ */
2207+
2208+ /* ma */
2209+ v1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)v0,
2210+ (__vector unsigned long long)vconst1);
2211+ /* q = floor(ma/(2^64)) */
2212+ v1 = (__vector unsigned long long)vec_sld ((__vector unsigned char)vzero,
2213+ (__vector unsigned char)v1, 8);
2214+ /* qn */
2215+ v1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)v1,
2216+ (__vector unsigned long long)vconst2);
2217+ /* a - qn, subtraction is xor in GF(2) */
2218+ v0 = vec_xor (v0, v1);
2219+ /*
2220+ * Get the result into r3. We need to shift it left 8 bytes:
2221+ * V0 [ 0 1 2 X ]
2222+ * V0 [ 0 X 2 3 ]
2223+ */
2224+ result = __builtin_unpack_vector_1 (v0);
2225+#else
2226+
2227+ /*
2228+ * The reflected version of Barrett reduction. Instead of bit
2229+ * reflecting our data (which is expensive to do), we bit reflect our
2230+ * constants and our algorithm, which means the intermediate data in
2231+ * our vector registers goes from 0-63 instead of 63-0. We can reflect
2232+ * the algorithm because we don't carry in mod 2 arithmetic.
2233+ */
2234+
2235+ /* bottom 32 bits of a */
2236+ v1 = vec_and(v0, vmask_32bit);
2237+
2238+ /* ma */
2239+ v1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)v1,
2240+ (__vector unsigned long long)vconst1);
2241+
2242+ /* bottom 32bits of ma */
2243+ v1 = vec_and(v1, vmask_32bit);
2244+ /* qn */
2245+ v1 = __builtin_crypto_vpmsumd ((__vector unsigned long long)v1,
2246+ (__vector unsigned long long)vconst2);
2247+ /* a - qn, subtraction is xor in GF(2) */
2248+ v0 = vec_xor (v0, v1);
2249+
2250+ /*
2251+ * Since we are bit reflected, the result (ie the low 32 bits) is in
2252+ * the high 32 bits. We just need to shift it left 4 bytes
2253+ * V0 [ 0 1 X 3 ]
2254+ * V0 [ 0 X 2 3 ]
2255+ */
2256+
2257+ /* shift result into top 64 bits of */
2258+ v0 = (__vector unsigned long long)vec_sld((__vector unsigned char)v0,
2259+ (__vector unsigned char)vzero, 4);
2260+
2261+ result = __builtin_unpack_vector_0 (v0);
2262+#endif
2263+
2264+ return result;
2265+}
2266diff --git a/contrib/power/crc32_z_resolver.c b/contrib/power/crc32_z_resolver.c
2267new file mode 100644
2268index 0000000..f4e9aa4
2269--- /dev/null
2270+++ b/contrib/power/crc32_z_resolver.c
2271@@ -0,0 +1,15 @@
2272+/* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
2273+ * For conditions of distribution and use, see copyright notice in zlib.h
2274+ */
2275+
2276+#include "../gcc/zifunc.h"
2277+#include "power.h"
2278+
2279+Z_IFUNC(crc32_z) {
2280+#ifdef Z_POWER8
2281+ if (__builtin_cpu_supports("arch_2_07"))
2282+ return _crc32_z_power8;
2283+#endif
2284+
2285+ return crc32_z_default;
2286+}
2287diff --git a/contrib/power/power.h b/contrib/power/power.h
2288index b42c7d6..79123aa 100644
2289--- a/contrib/power/power.h
2290+++ b/contrib/power/power.h
2291@@ -2,3 +2,7 @@
2292 * 2019 Rogerio Alves <rogerio.alves@ibm.com>, IBM
2293 * For conditions of distribution and use, see copyright notice in zlib.h
2294 */
2295+
2296+#include "../../zconf.h"
2297+
2298+unsigned long _crc32_z_power8(unsigned long, const Bytef *, z_size_t);
2299diff --git a/crc32.c b/crc32.c
2300index 6c38f5c..5589d54 100644
2301--- a/crc32.c
2302+++ b/crc32.c
2303@@ -691,6 +691,13 @@ local z_word_t crc_word_big(z_word_t data) {
2304 #endif
2305
2306 /* ========================================================================= */
2307+#ifdef Z_POWER_OPT
2308+/* Rename function so resolver can use its symbol. The default version will be
2309+ * returned by the resolver if the host has no support for an optimized version.
2310+ */
2311+#define crc32_z crc32_z_default
2312+#endif /* Z_POWER_OPT */
2313+
2314 unsigned long ZEXPORT crc32_z(unsigned long crc, const unsigned char FAR *buf,
2315 z_size_t len) {
2316 /* Return initial CRC, if requested. */
2317@@ -1009,6 +1016,11 @@ unsigned long ZEXPORT crc32_z(unsigned long crc, const unsigned char FAR *buf,
2318 return crc ^ 0xffffffff;
2319 }
2320
2321+#ifdef Z_POWER_OPT
2322+#undef crc32_z
2323+#include "contrib/power/crc32_z_resolver.c"
2324+#endif /* Z_POWER_OPT */
2325+
2326 #endif
2327
2328 /* ========================================================================= */
2329diff --git a/test/crc32_test.c b/test/crc32_test.c
2330new file mode 100644
2331index 0000000..3155553
2332--- /dev/null
2333+++ b/test/crc32_test.c
2334@@ -0,0 +1,205 @@
2335+/* crc32_test.c -- unit test for crc32 in the zlib compression library
2336+ * Copyright (C) 1995-2006, 2010, 2011, 2016, 2019 Rogerio Alves
2337+ * For conditions of distribution and use, see copyright notice in zlib.h
2338+ */
2339+
2340+#include "zlib.h"
2341+#include <stdio.h>
2342+
2343+#ifdef STDC
2344+# include <string.h>
2345+# include <stdlib.h>
2346+#endif
2347+
2348+void test_crc32 OF((uLong crc, Byte* buf, z_size_t len, uLong chk, int line));
2349+int main OF((void));
2350+
2351+typedef struct {
2352+ int line;
2353+ uLong crc;
2354+ char* buf;
2355+ int len;
2356+ uLong expect;
2357+} crc32_test;
2358+
2359+void test_crc32(crc, buf, len, chk, line)
2360+ uLong crc;
2361+ Byte *buf;
2362+ z_size_t len;
2363+ uLong chk;
2364+ int line;
2365+{
2366+ uLong res = crc32(crc, buf, len);
2367+ if (res != chk) {
2368+ fprintf(stderr, "FAIL [%d]: crc32 returned 0x%08X expected 0x%08X\n",
2369+ line, (unsigned int)res, (unsigned int)chk);
2370+ exit(1);
2371+ }
2372+}
2373+
2374+static const crc32_test tests[] = {
2375+ {__LINE__, 0x0, 0x0, 0, 0x0},
2376+ {__LINE__, 0xffffffff, 0x0, 0, 0x0},
2377+ {__LINE__, 0x0, 0x0, 255, 0x0}, /* BZ 174799. */
2378+ {__LINE__, 0x0, 0x0, 256, 0x0},
2379+ {__LINE__, 0x0, 0x0, 257, 0x0},
2380+ {__LINE__, 0x0, 0x0, 32767, 0x0},
2381+ {__LINE__, 0x0, 0x0, 32768, 0x0},
2382+ {__LINE__, 0x0, 0x0, 32769, 0x0},
2383+ {__LINE__, 0x0, "", 0, 0x0},
2384+ {__LINE__, 0xffffffff, "", 0, 0xffffffff},
2385+ {__LINE__, 0x0, "abacus", 6, 0xc3d7115b},
2386+ {__LINE__, 0x0, "backlog", 7, 0x269205},
2387+ {__LINE__, 0x0, "campfire", 8, 0x22a515f8},
2388+ {__LINE__, 0x0, "delta", 5, 0x9643fed9},
2389+ {__LINE__, 0x0, "executable", 10, 0xd68eda01},
2390+ {__LINE__, 0x0, "file", 4, 0x8c9f3610},
2391+ {__LINE__, 0x0, "greatest", 8, 0xc1abd6cd},
2392+ {__LINE__, 0x0, "hello", 5, 0x3610a686},
2393+ {__LINE__, 0x0, "inverter", 8, 0xc9e962c9},
2394+ {__LINE__, 0x0, "jigsaw", 6, 0xce4e3f69},
2395+ {__LINE__, 0x0, "karate", 6, 0x890be0e2},
2396+ {__LINE__, 0x0, "landscape", 9, 0xc4e0330b},
2397+ {__LINE__, 0x0, "machine", 7, 0x1505df84},
2398+ {__LINE__, 0x0, "nanometer", 9, 0xd4e19f39},
2399+ {__LINE__, 0x0, "oblivion", 8, 0xdae9de77},
2400+ {__LINE__, 0x0, "panama", 6, 0x66b8979c},
2401+ {__LINE__, 0x0, "quest", 5, 0x4317f817},
2402+ {__LINE__, 0x0, "resource", 8, 0xbc91f416},
2403+ {__LINE__, 0x0, "secret", 6, 0x5ca2e8e5},
2404+ {__LINE__, 0x0, "test", 4, 0xd87f7e0c},
2405+ {__LINE__, 0x0, "ultimate", 8, 0x3fc79b0b},
2406+ {__LINE__, 0x0, "vector", 6, 0x1b6e485b},
2407+ {__LINE__, 0x0, "walrus", 6, 0xbe769b97},
2408+ {__LINE__, 0x0, "xeno", 4, 0xe7a06444},
2409+ {__LINE__, 0x0, "yelling", 7, 0xfe3944e5},
2410+ {__LINE__, 0x0, "zlib", 4, 0x73887d3a},
2411+ {__LINE__, 0x0, "4BJD7PocN1VqX0jXVpWB", 20, 0xd487a5a1},
2412+ {__LINE__, 0x0, "F1rPWI7XvDs6nAIRx41l", 20, 0x61a0132e},
2413+ {__LINE__, 0x0, "ldhKlsVkPFOveXgkGtC2", 20, 0xdf02f76},
2414+ {__LINE__, 0x0, "5KKnGOOrs8BvJ35iKTOS", 20, 0x579b2b0a},
2415+ {__LINE__, 0x0, "0l1tw7GOcem06Ddu7yn4", 20, 0xf7d16e2d},
2416+ {__LINE__, 0x0, "MCr47CjPIn9R1IvE1Tm5", 20, 0x731788f5},
2417+ {__LINE__, 0x0, "UcixbzPKTIv0SvILHVdO", 20, 0x7112bb11},
2418+ {__LINE__, 0x0, "dGnAyAhRQDsWw0ESou24", 20, 0xf32a0dac},
2419+ {__LINE__, 0x0, "di0nvmY9UYMYDh0r45XT", 20, 0x625437bb},
2420+ {__LINE__, 0x0, "2XKDwHfAhFsV0RhbqtvH", 20, 0x896930f9},
2421+ {__LINE__, 0x0, "ZhrANFIiIvRnqClIVyeD", 20, 0x8579a37},
2422+ {__LINE__, 0x0, "v7Q9ehzioTOVeDIZioT1", 20, 0x632aa8e0},
2423+ {__LINE__, 0x0, "Yod5hEeKcYqyhfXbhxj2", 20, 0xc829af29},
2424+ {__LINE__, 0x0, "GehSWY2ay4uUKhehXYb0", 20, 0x1b08b7e8},
2425+ {__LINE__, 0x0, "kwytJmq6UqpflV8Y8GoE", 20, 0x4e33b192},
2426+ {__LINE__, 0x0, "70684206568419061514", 20, 0x59a179f0},
2427+ {__LINE__, 0x0, "42015093765128581010", 20, 0xcd1013d7},
2428+ {__LINE__, 0x0, "88214814356148806939", 20, 0xab927546},
2429+ {__LINE__, 0x0, "43472694284527343838", 20, 0x11f3b20c},
2430+ {__LINE__, 0x0, "49769333513942933689", 20, 0xd562d4ca},
2431+ {__LINE__, 0x0, "54979784887993251199", 20, 0x233395f7},
2432+ {__LINE__, 0x0, "58360544869206793220", 20, 0x2d167fd5},
2433+ {__LINE__, 0x0, "27347953487840714234", 20, 0x8b5108ba},
2434+ {__LINE__, 0x0, "07650690295365319082", 20, 0xc46b3cd8},
2435+ {__LINE__, 0x0, "42655507906821911703", 20, 0xc10b2662},
2436+ {__LINE__, 0x0, "29977409200786225655", 20, 0xc9a0f9d2},
2437+ {__LINE__, 0x0, "85181542907229116674", 20, 0x9341357b},
2438+ {__LINE__, 0x0, "87963594337989416799", 20, 0xf0424937},
2439+ {__LINE__, 0x0, "21395988329504168551", 20, 0xd7c4c31f},
2440+ {__LINE__, 0x0, "51991013580943379423", 20, 0xf11edcc4},
2441+ {__LINE__, 0x0, "*]+@!);({_$;}[_},?{?;(_?,=-][@", 30, 0x40795df4},
2442+ {__LINE__, 0x0, "_@:_).&(#.[:[{[:)$++-($_;@[)}+", 30, 0xdd61a631},
2443+ {__LINE__, 0x0, "&[!,[$_==}+.]@!;*(+},[;:)$;)-@", 30, 0xca907a99},
2444+ {__LINE__, 0x0, "]{.[.+?+[[=;[?}_#&;[=)__$$:+=_", 30, 0xf652deac},
2445+ {__LINE__, 0x0, "-%.)=/[@].:.(:,()$;=%@-$?]{%+%", 30, 0xaf39a5a9},
2446+ {__LINE__, 0x0, "+]#$(@&.=:,*];/.!]%/{:){:@(;)$", 30, 0x6bebb4cf},
2447+ {__LINE__, 0x0, ")-._.:?[&:.=+}(*$/=!.${;(=$@!}", 30, 0x76430bac},
2448+ {__LINE__, 0x0, ":(_*&%/[[}+,?#$&*+#[([*-/#;%(]", 30, 0x6c80c388},
2449+ {__LINE__, 0x0, "{[#-;:$/{)(+[}#]/{&!%(@)%:@-$:", 30, 0xd54d977d},
2450+ {__LINE__, 0x0, "_{$*,}(&,@.)):=!/%(&(,,-?$}}}!", 30, 0xe3966ad5},
2451+ {__LINE__, 0x0, "e$98KNzqaV)Y:2X?]77].{gKRD4G5{mHZk,Z)SpU%L3FSgv!Wb8MLAFdi{+fp)c,@8m6v)yXg@]HBDFk?.4&}g5_udE*JHCiH=aL", 100, 0xe7c71db9},
2452+ {__LINE__, 0x0, "r*Fd}ef+5RJQ;+W=4jTR9)R*p!B;]Ed7tkrLi;88U7g@3v!5pk2X6D)vt,.@N8c]@yyEcKi[vwUu@.Ppm@C6%Mv*3Nw}Y,58_aH)", 100, 0xeaa52777},
2453+ {__LINE__, 0x0, "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&", 100, 0xcd472048},
2454+ {__LINE__, 0x7a30360d, "abacus", 6, 0xf8655a84},
2455+ {__LINE__, 0x6fd767ee, "backlog", 7, 0x1ed834b1},
2456+ {__LINE__, 0xefeb7589, "campfire", 8, 0x686cfca},
2457+ {__LINE__, 0x61cf7e6b, "delta", 5, 0x1554e4b1},
2458+ {__LINE__, 0xdc712e2, "executable", 10, 0x761b4254},
2459+ {__LINE__, 0xad23c7fd, "file", 4, 0x7abdd09b},
2460+ {__LINE__, 0x85cb2317, "greatest", 8, 0x4ba91c6b},
2461+ {__LINE__, 0x9eed31b0, "inverter", 8, 0xd5e78ba5},
2462+ {__LINE__, 0xb94f34ca, "jigsaw", 6, 0x23649109},
2463+ {__LINE__, 0xab058a2, "karate", 6, 0xc5591f41},
2464+ {__LINE__, 0x5bff2b7a, "landscape", 9, 0xf10eb644},
2465+ {__LINE__, 0x605c9a5f, "machine", 7, 0xbaa0a636},
2466+ {__LINE__, 0x51bdeea5, "nanometer", 9, 0x6af89afb},
2467+ {__LINE__, 0x85c21c79, "oblivion", 8, 0xecae222b},
2468+ {__LINE__, 0x97216f56, "panama", 6, 0x47dffac4},
2469+ {__LINE__, 0x18444af2, "quest", 5, 0x70c2fe36},
2470+ {__LINE__, 0xbe6ce359, "resource", 8, 0x1471d925},
2471+ {__LINE__, 0x843071f1, "secret", 6, 0x50c9a0db},
2472+ {__LINE__, 0xf2480c60, "ultimate", 8, 0xf973daf8},
2473+ {__LINE__, 0x2d2feb3d, "vector", 6, 0x344ac03d},
2474+ {__LINE__, 0x7490310a, "walrus", 6, 0x6d1408ef},
2475+ {__LINE__, 0x97d247d4, "xeno", 4, 0xe62670b5},
2476+ {__LINE__, 0x93cf7599, "yelling", 7, 0x1b36da38},
2477+ {__LINE__, 0x73c84278, "zlib", 4, 0x6432d127},
2478+ {__LINE__, 0x228a87d1, "4BJD7PocN1VqX0jXVpWB", 20, 0x997107d0},
2479+ {__LINE__, 0xa7a048d0, "F1rPWI7XvDs6nAIRx41l", 20, 0xdc567274},
2480+ {__LINE__, 0x1f0ded40, "ldhKlsVkPFOveXgkGtC2", 20, 0xdcc63870},
2481+ {__LINE__, 0xa804a62f, "5KKnGOOrs8BvJ35iKTOS", 20, 0x6926cffd},
2482+ {__LINE__, 0x508fae6a, "0l1tw7GOcem06Ddu7yn4", 20, 0xb52b38bc},
2483+ {__LINE__, 0xe5adaf4f, "MCr47CjPIn9R1IvE1Tm5", 20, 0xf83b8178},
2484+ {__LINE__, 0x67136a40, "UcixbzPKTIv0SvILHVdO", 20, 0xc5213070},
2485+ {__LINE__, 0xb00c4a10, "dGnAyAhRQDsWw0ESou24", 20, 0xbc7648b0},
2486+ {__LINE__, 0x2e0c84b5, "di0nvmY9UYMYDh0r45XT", 20, 0xd8123a72},
2487+ {__LINE__, 0x81238d44, "2XKDwHfAhFsV0RhbqtvH", 20, 0xd5ac5620},
2488+ {__LINE__, 0xf853aa92, "ZhrANFIiIvRnqClIVyeD", 20, 0xceae099d},
2489+ {__LINE__, 0x5a692325, "v7Q9ehzioTOVeDIZioT1", 20, 0xb07d2b24},
2490+ {__LINE__, 0x3275b9f, "Yod5hEeKcYqyhfXbhxj2", 20, 0x24ce91df},
2491+ {__LINE__, 0x38371feb, "GehSWY2ay4uUKhehXYb0", 20, 0x707b3b30},
2492+ {__LINE__, 0xafc8bf62, "kwytJmq6UqpflV8Y8GoE", 20, 0x16abc6a9},
2493+ {__LINE__, 0x9b07db73, "70684206568419061514", 20, 0xae1fb7b7},
2494+ {__LINE__, 0xe75b214, "42015093765128581010", 20, 0xd4eecd2d},
2495+ {__LINE__, 0x72d0fe6f, "88214814356148806939", 20, 0x4660ec7},
2496+ {__LINE__, 0xf857a4b1, "43472694284527343838", 20, 0xfd8afdf7},
2497+ {__LINE__, 0x54b8e14, "49769333513942933689", 20, 0xc6d1b5f2},
2498+ {__LINE__, 0xd6aa5616, "54979784887993251199", 20, 0x32476461},
2499+ {__LINE__, 0x11e63098, "58360544869206793220", 20, 0xd917cf1a},
2500+ {__LINE__, 0xbe92385, "27347953487840714234", 20, 0x4ad14a12},
2501+ {__LINE__, 0x49511de0, "07650690295365319082", 20, 0xe37b5c6c},
2502+ {__LINE__, 0x3db13bc1, "42655507906821911703", 20, 0x7cc497f1},
2503+ {__LINE__, 0xbb899bea, "29977409200786225655", 20, 0x99781bb2},
2504+ {__LINE__, 0xf6cd9436, "85181542907229116674", 20, 0x132256a1},
2505+ {__LINE__, 0x9109e6c3, "87963594337989416799", 20, 0xbfdb2c83},
2506+ {__LINE__, 0x75770fc, "21395988329504168551", 20, 0x8d9d1e81},
2507+ {__LINE__, 0x69b1d19b, "51991013580943379423", 20, 0x7b6d4404},
2508+ {__LINE__, 0xc6132975, "*]+@!);({_$;}[_},?{?;(_?,=-][@", 30, 0x8619f010},
2509+ {__LINE__, 0xd58cb00c, "_@:_).&(#.[:[{[:)$++-($_;@[)}+", 30, 0x15746ac3},
2510+ {__LINE__, 0xb63b8caa, "&[!,[$_==}+.]@!;*(+},[;:)$;)-@", 30, 0xaccf812f},
2511+ {__LINE__, 0x8a45a2b8, "]{.[.+?+[[=;[?}_#&;[=)__$$:+=_", 30, 0x78af45de},
2512+ {__LINE__, 0xcbe95b78, "-%.)=/[@].:.(:,()$;=%@-$?]{%+%", 30, 0x25b06b59},
2513+ {__LINE__, 0x4ef8a54b, "+]#$(@&.=:,*];/.!]%/{:){:@(;)$", 30, 0x4ba0d08f},
2514+ {__LINE__, 0x76ad267a, ")-._.:?[&:.=+}(*$/=!.${;(=$@!}", 30, 0xe26b6aac},
2515+ {__LINE__, 0x569e613c, ":(_*&%/[[}+,?#$&*+#[([*-/#;%(]", 30, 0x7e2b0a66},
2516+ {__LINE__, 0x36aa61da, "{[#-;:$/{)(+[}#]/{&!%(@)%:@-$:", 30, 0xb3430dc7},
2517+ {__LINE__, 0xf67222df, "_{$*,}(&,@.)):=!/%(&(,,-?$}}}!", 30, 0x626c17a},
2518+ {__LINE__, 0x74b34fd3, "e$98KNzqaV)Y:2X?]77].{gKRD4G5{mHZk,Z)SpU%L3FSgv!Wb8MLAFdi{+fp)c,@8m6v)yXg@]HBDFk?.4&}g5_udE*JHCiH=aL", 100, 0xccf98060},
2519+ {__LINE__, 0x351fd770, "r*Fd}ef+5RJQ;+W=4jTR9)R*p!B;]Ed7tkrLi;88U7g@3v!5pk2X6D)vt,.@N8c]@yyEcKi[vwUu@.Ppm@C6%Mv*3Nw}Y,58_aH)", 100, 0xd8b95312},
2520+ {__LINE__, 0xc45aef77, "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&", 100, 0xbb1c9912},
2521+ {__LINE__, 0xc45aef77, "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2522+ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2523+ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2524+ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2525+ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&"
2526+ "h{bcmdC+a;t+Cf{6Y_dFq-{X4Yu&7uNfVDh?q&_u.UWJU],-GiH7ADzb7-V.Q%4=+v!$L9W+T=bP]$_:]Vyg}A.ygD.r;h-D]m%&", 600, 0x888AFA5B}
2527+};
2528+
2529+static const int test_size = sizeof(tests) / sizeof(tests[0]);
2530+
2531+int main(void)
2532+{
2533+ int i;
2534+ for (i = 0; i < test_size; i++) {
2535+ test_crc32(tests[i].crc, (Byte*) tests[i].buf, tests[i].len,
2536+ tests[i].expect, tests[i].line);
2537+ }
2538+ return 0;
2539+}
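The POWER8 path (vpmsumd folding followed by Barrett reduction) must return exactly what the table-driven `crc32_z_default` returns. Rows of `crc32_test.c` can therefore also be cross-checked against an independent, deliberately naive implementation.

The sketch below assumes the reflected CRC-32 polynomial 0xEDB88320 with pre- and post-inversion, which is the CRC that zlib computes. `crc32_ref` is a hypothetical helper and is not part of the patch.

```c
#include <stddef.h>

/* Hypothetical bitwise reference for a zlib-compatible CRC-32
 * (reflected polynomial 0xEDB88320, inverted on entry and exit).
 * Slow, but independent of both the table-driven default and the
 * POWER8 vector code, so it can double-check test vectors. */
unsigned long crc32_ref(unsigned long crc, const unsigned char *buf, size_t len)
{
    crc = ~crc & 0xffffffffUL;              /* undo the previous final XOR */
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            /* shift right; XOR in the polynomial if the dropped bit was 1 */
            crc = (crc >> 1) ^ (0xedb88320UL & (0UL - (crc & 1UL)));
    }
    return ~crc & 0xffffffffUL;             /* final XOR, kept to 32 bits */
}
```

A non-zero `crc` argument is treated as the running CRC of earlier data. As a result, `crc32_ref(crc32_ref(0, a, n), b, m)` equals the CRC of `a` followed by `b`. The table rows with a non-zero second column exercise that same seeded path in `crc32()`.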
diff --git a/debian/patches/power/fix-clang7-builtins.patch b/debian/patches/power/fix-clang7-builtins.patch
new file mode 100644
index 0000000..0ed510f
--- /dev/null
+++ b/debian/patches/power/fix-clang7-builtins.patch
@@ -0,0 +1,62 @@
1From: Manjunath S Matti <mmatti@linux.ibm.com>
2Date: Thu, 14 Sep 2023 06:45:31 -0500
3Subject: Fix clang's behavior on versions >= 7
4
5Clang 7 changed the behavior of vec_xxpermdi in order to match GCC's
6behavior. After this change, code that used to work on Clang 6 stopped
7working on Clang >= 7.
8
9Tested on Clang 6, 7, 8 and 9.
10
11Reference: https://bugs.llvm.org/show_bug.cgi?id=38192
12
13Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
14Signed-off-by: Manjunath Matti <mmatti@linux.ibm.com>
15
16Origin: iii-i/zlib, https://github.com/iii-i/zlib/commit/8aca10a8a5ddb397854eb9a443f29658d3e3e12e
17---
18 contrib/power/clang_workaround.h | 15 ++++++++++-----
19 1 file changed, 10 insertions(+), 5 deletions(-)
20
21diff --git a/contrib/power/clang_workaround.h b/contrib/power/clang_workaround.h
22index b5e7dae..915f7e5 100644
23--- a/contrib/power/clang_workaround.h
24+++ b/contrib/power/clang_workaround.h
25@@ -39,7 +39,12 @@ __vector unsigned long long __builtin_pack_vector (unsigned long __a,
26 return __v;
27 }
28
29-#ifndef vec_xxpermdi
30+/*
31+ * Clang 7 changed the behavior of vec_xxpermdi in order to provide the same
32+ * behavior as GCC. That means code adapted to Clang >= 7 does not work on
33+ * Clang <= 6. So, fall back to __builtin_unpack_vector() on Clang <= 6.
34+ */
35+#if !defined vec_xxpermdi || __clang_major__ <= 6
36
37 static inline
38 unsigned long __builtin_unpack_vector (__vector unsigned long long __v,
39@@ -62,9 +67,9 @@ static inline
40 unsigned long __builtin_unpack_vector_0 (__vector unsigned long long __v)
41 {
42 #if defined(__BIG_ENDIAN__)
43- return vec_xxpermdi(__v, __v, 0x0)[1];
44- #else
45 return vec_xxpermdi(__v, __v, 0x0)[0];
46+ #else
47+ return vec_xxpermdi(__v, __v, 0x3)[0];
48 #endif
49 }
50
51@@ -72,9 +77,9 @@ static inline
52 unsigned long __builtin_unpack_vector_1 (__vector unsigned long long __v)
53 {
54 #if defined(__BIG_ENDIAN__)
55- return vec_xxpermdi(__v, __v, 0x3)[1];
56- #else
57 return vec_xxpermdi(__v, __v, 0x3)[0];
58+ #else
59+ return vec_xxpermdi(__v, __v, 0x0)[0];
60 #endif
61 }
62 #endif /* vec_xxpermdi */
diff --git a/debian/patches/power/indirect-func-macros.patch b/debian/patches/power/indirect-func-macros.patch
new file mode 100644
index 0000000..c2976d8
--- /dev/null
+++ b/debian/patches/power/indirect-func-macros.patch
@@ -0,0 +1,295 @@
1From: Manjunath S Matti <mmatti@linux.ibm.com>
2Date: Thu, 14 Sep 2023 06:15:57 -0500
3Subject: Preparation for Power optimizations
4
5Optimized functions for Power will make use of GNU indirect functions,
6an extension to support different implementations of the same function,
7which can be selected during runtime. This will be used to provide
8optimized functions for different processor versions.
9
10Since this is a GNU extension, we placed the definition of the Z_IFUNC
11macro under `contrib/gcc`. This can be reused by other archs as well.
12
13Author: Matheus Castanho <msc@linux.ibm.com>
14Author: Rogerio Alves <rcardoso@linux.ibm.com>
15Signed-off-by: Manjunath Matti <mmatti@linux.ibm.com>
16
17Origin: iii-i/zlib, https://github.com/iii-i/zlib/commit/096441298ecd1c123f1d37c2b34d6b6bb3c42e93
18---
19 CMakeLists.txt | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++
20 configure | 66 ++++++++++++++++++++++++++++++++++++++++++++++
21 contrib/README.contrib | 8 ++++++
22 contrib/gcc/zifunc.h | 60 ++++++++++++++++++++++++++++++++++++++++++
23 contrib/power/power.h | 4 +++
24 5 files changed, 209 insertions(+)
25 create mode 100644 contrib/gcc/zifunc.h
26 create mode 100644 contrib/power/power.h
27
28diff --git a/CMakeLists.txt b/CMakeLists.txt
29index 7f1b69f..4456cd7 100644
30--- a/CMakeLists.txt
31+++ b/CMakeLists.txt
32@@ -5,6 +5,8 @@ project(zlib C)
33
34 set(VERSION "1.3")
35
36+option(POWER "Enable building power implementation")
37+
38 set(INSTALL_BIN_DIR "${CMAKE_INSTALL_PREFIX}/bin" CACHE PATH "Installation directory for executables")
39 set(INSTALL_LIB_DIR "${CMAKE_INSTALL_PREFIX}/lib" CACHE PATH "Installation directory for libraries")
40 set(INSTALL_INC_DIR "${CMAKE_INSTALL_PREFIX}/include" CACHE PATH "Installation directory for headers")
41@@ -126,6 +128,75 @@ if(NOT MINGW)
42 )
43 endif()
44
45+if(CMAKE_COMPILER_IS_GNUCC)
46+
47+ # test to see if we can use a GNU indirect function to detect and load optimized code at runtime
48+ CHECK_C_SOURCE_COMPILES("
49+ static int test_ifunc_native(void)
50+ {
51+ return 1;
52+ }
53+ static int (*(check_ifunc_native(void)))(void)
54+ {
55+ return test_ifunc_native;
56+ }
57+ int test_ifunc(void) __attribute__ ((ifunc (\"check_ifunc_native\")));
58+ int main(void)
59+ {
60+ return 0;
61+ }
62+ " HAS_C_ATTR_IFUNC)
63+
64+ if(HAS_C_ATTR_IFUNC)
65+ add_definitions(-DHAVE_IFUNC)
66+ set(ZLIB_PRIVATE_HDRS ${ZLIB_PRIVATE_HDRS} contrib/gcc/zifunc.h)
67+ endif()
68+
69+ if(POWER)
70+ # Test to see if we can use the optimizations for Power
71+ CHECK_C_SOURCE_COMPILES("
72+ #ifndef _ARCH_PPC
73+ #error \"Target is not Power\"
74+ #endif
75+ #ifndef __BUILTIN_CPU_SUPPORTS__
76+ #error \"Target doesn't support __builtin_cpu_supports()\"
77+ #endif
78+ int main() { return 0; }
79+ " HAS_POWER_SUPPORT)
80+
81+ if(HAS_POWER_SUPPORT AND HAS_C_ATTR_IFUNC)
82+ add_definitions(-DZ_POWER_OPT)
83+
84+ set(CMAKE_REQUIRED_FLAGS -mcpu=power8)
85+ CHECK_C_SOURCE_COMPILES("int main(void){return 0;}" POWER8)
86+
87+ if(POWER8)
88+ add_definitions(-DZ_POWER8)
89+ set(ZLIB_POWER8 )
90+
91+ set_source_files_properties(
92+ ${ZLIB_POWER8}
93+ PROPERTIES COMPILE_FLAGS -mcpu=power8)
94+ endif()
95+
96+ set(CMAKE_REQUIRED_FLAGS -mcpu=power9)
97+ CHECK_C_SOURCE_COMPILES("int main(void){return 0;}" POWER9)
98+
99+ if(POWER9)
100+ add_definitions(-DZ_POWER9)
101+ set(ZLIB_POWER9 )
102+
103+ set_source_files_properties(
104+ ${ZLIB_POWER9}
105+ PROPERTIES COMPILE_FLAGS -mcpu=power9)
106+ endif()
107+
108+ set(ZLIB_PRIVATE_HDRS ${ZLIB_PRIVATE_HDRS} contrib/power/power.h)
109+ set(ZLIB_SRCS ${ZLIB_SRCS} ${ZLIB_POWER8} ${ZLIB_POWER9})
110+ endif()
111+ endif()
112+endif()
113+
114 # parse the full version number from zlib.h and include in ZLIB_FULL_VERSION
115 file(READ ${CMAKE_CURRENT_SOURCE_DIR}/zlib.h _zlib_h_contents)
116 string(REGEX REPLACE ".*#define[ \t]+ZLIB_VERSION[ \t]+\"([-0-9A-Za-z.]+)\".*"
117diff --git a/configure b/configure
118index cc867c9..e307a8d 100755
119--- a/configure
120+++ b/configure
121@@ -834,6 +834,72 @@ EOF
122 fi
123 fi
124
125+# test to see if we can use a GNU indirect function to detect and load optimized code at runtime
126+echo >> configure.log
127+cat > $test.c <<EOF
128+static int test_ifunc_native(void)
129+{
130+ return 1;
131+}
132+
133+static int (*(check_ifunc_native(void)))(void)
134+{
135+ return test_ifunc_native;
136+}
137+
138+int test_ifunc(void) __attribute__ ((ifunc ("check_ifunc_native")));
139+EOF
140+
141+if tryboth $CC -c $CFLAGS $test.c; then
142+ SFLAGS="${SFLAGS} -DHAVE_IFUNC"
143+ CFLAGS="${CFLAGS} -DHAVE_IFUNC"
144+ echo "Checking for attribute(ifunc) support... Yes." | tee -a configure.log
145+else
146+ echo "Checking for attribute(ifunc) support... No." | tee -a configure.log
147+fi
148+
149+# Test to see if we can use the optimizations for Power
150+echo >> configure.log
151+cat > $test.c <<EOF
152+#ifndef _ARCH_PPC
153+ #error "Target is not Power"
154+#endif
155+#ifndef HAVE_IFUNC
156+ #error "Target doesn't support ifunc"
157+#endif
158+#ifndef __BUILTIN_CPU_SUPPORTS__
159+ #error "Target doesn't support __builtin_cpu_supports()"
160+#endif
161+EOF
162+
163+if tryboth $CC -c $CFLAGS $test.c; then
164+ echo "int main(void){return 0;}" > $test.c
165+
166+ if tryboth $CC -c $CFLAGS -mcpu=power8 $test.c; then
167+ POWER8="-DZ_POWER8"
168+ PIC_OBJC="${PIC_OBJC}"
169+ OBJC="${OBJC}"
170+ echo "Checking for -mcpu=power8 support... Yes." | tee -a configure.log
171+ else
172+ echo "Checking for -mcpu=power8 support... No." | tee -a configure.log
173+ fi
174+
175+ if tryboth $CC -c $CFLAGS -mcpu=power9 $test.c; then
176+ POWER9="-DZ_POWER9"
177+ PIC_OBJC="${PIC_OBJC}"
178+ OBJC="${OBJC}"
179+ echo "Checking for -mcpu=power9 support... Yes." | tee -a configure.log
180+ else
181+ echo "Checking for -mcpu=power9 support... No." | tee -a configure.log
182+ fi
183+
184+ SFLAGS="${SFLAGS} ${POWER8} ${POWER9} -DZ_POWER_OPT"
185+ CFLAGS="${CFLAGS} ${POWER8} ${POWER9} -DZ_POWER_OPT"
186+ echo "Checking for Power optimizations support... Yes." | tee -a configure.log
187+else
188+ echo "Checking for Power optimizations support... No." | tee -a configure.log
189+fi
190+
191 # show the results in the log
192 echo >> configure.log
193 echo ALL = $ALL >> configure.log
194diff --git a/contrib/README.contrib b/contrib/README.contrib
195index 5e5f950..c57b520 100644
196--- a/contrib/README.contrib
197+++ b/contrib/README.contrib
198@@ -11,6 +11,10 @@ ada/ by Dmitriy Anisimkov <anisimkov@yahoo.com>
199 blast/ by Mark Adler <madler@alumni.caltech.edu>
200 Decompressor for output of PKWare Data Compression Library (DCL)
201
202+gcc/ by Matheus Castanho <msc@linux.ibm.com>
203+ and Rogerio Alves <rcardoso@linux.ibm.com>
204+ Optimization helpers using GCC-specific extensions
205+
206 delphi/ by Cosmin Truta <cosmint@cs.ubbcluj.ro>
207 Support for Delphi and C++ Builder
208
209@@ -42,6 +46,10 @@ minizip/ by Gilles Vollant <info@winimage.com>
210 pascal/ by Bob Dellaca <bobdl@xtra.co.nz> et al.
211 Support for Pascal
212
213+power/ by Matheus Castanho <msc@linux.ibm.com>
214+ and Rogerio Alves <rcardoso@linux.ibm.com>
215+ Optimized functions for Power processors
216+
217 puff/ by Mark Adler <madler@alumni.caltech.edu>
218 Small, low memory usage inflate. Also serves to provide an
219 unambiguous description of the deflate format.
220diff --git a/contrib/gcc/zifunc.h b/contrib/gcc/zifunc.h
221new file mode 100644
222index 0000000..daf4fe4
223--- /dev/null
224+++ b/contrib/gcc/zifunc.h
225@@ -0,0 +1,60 @@
226+/* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
227+ * 2019 Rogerio Alves <rogerio.alves@ibm.com>, IBM
228+ * For conditions of distribution and use, see copyright notice in zlib.h
229+ */
230+
231+#ifndef Z_IFUNC_H_
232+#define Z_IFUNC_H_
233+
234+/* Helpers for arch optimizations */
235+
236+#define Z_IFUNC(fname) \
237+ typeof(fname) fname __attribute__ ((ifunc (#fname "_resolver"))); \
238+ local typeof(fname) *fname##_resolver(void)
239+/* This is a helper macro to declare a resolver for an indirect function
240+ * (ifunc). Let's say you have function
241+ *
242+ * int foo (int a);
243+ *
244+ * for which you want to provide different implementations, for example:
245+ *
246+ * int foo_clever (int a) {
247+ * ... clever things ...
248+ * }
249+ *
250+ * int foo_smart (int a) {
251+ * ... smart things ...
252+ * }
253+ *
254+ * You will have to declare foo() as an indirect function and also provide a
255+ * resolver for it, to choose between foo_clever() and foo_smart() based on
256+ * some criteria you define (e.g. processor features).
257+ *
258+ * Since most likely foo() has a default implementation somewhere in zlib, you
259+ * may have to rename it so the 'foo' symbol can be used by the ifunc without
260+ * conflicts.
261+ *
262+ * #define foo foo_default
263+ * int foo (int a) {
264+ * ...
265+ * }
266+ * #undef foo
267+ *
268+ * Now you just have to provide a resolver function to choose which function
269+ * should be used (decided at runtime on the first call to foo()):
270+ *
271+ * Z_IFUNC(foo) {
272+ * if (... some condition ...)
273+ * return foo_clever;
274+ *
275+ * if (... other condition ...)
276+ * return foo_smart;
277+ *
278+ * return foo_default;
279+ * }
280+ *
281+ * All calls to foo() throughout the code can remain untouched; the dynamic
282+ * linker does the rest using the resolver function.
283+ */
284+
285+#endif /* Z_IFUNC_H_ */
286diff --git a/contrib/power/power.h b/contrib/power/power.h
287new file mode 100644
288index 0000000..b42c7d6
289--- /dev/null
290+++ b/contrib/power/power.h
291@@ -0,0 +1,4 @@
292+/* Copyright (C) 2019 Matheus Castanho <msc@linux.ibm.com>, IBM
293+ * 2019 Rogerio Alves <rogerio.alves@ibm.com>, IBM
294+ * For conditions of distribution and use, see copyright notice in zlib.h
295+ */
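The Z_IFUNC() comment in zifunc.h describes the pattern in prose. As a hedged illustration, the sketch below expands the same macro shape for a toy function. It assumes GCC or Clang on an ELF target whose C library supports ifunc (glibc does, musl does not), and it spells `typeof` as `__typeof__` so it also builds in strict ISO modes. `SKETCH_IFUNC`, `foo_default`, `foo_clever` and `cpu_has_clever_feature()` are hypothetical names, not zlib APIs.

```c
/* Sketch only: requires GCC/Clang, an ELF target, and a C library with
 * ifunc support (e.g. glibc). */

/* zlib's zutil.h defines `local` as `static`; repeated here for self-containment. */
#define local static

/* Same shape as Z_IFUNC() in contrib/gcc/zifunc.h, spelled with __typeof__. */
#define SKETCH_IFUNC(fname) \
    __typeof__(fname) fname __attribute__((ifunc(#fname "_resolver"))); \
    local __typeof__(fname) *fname##_resolver(void)

/* Public prototype: callers only ever see foo(). */
int foo(int a);

/* Interchangeable implementations; they must return identical results. */
local int foo_default(int a) { return a * 2; }
local int foo_clever(int a) { return a + a; }

/* Hypothetical capability probe; a real resolver would check CPU features. */
local int cpu_has_clever_feature(void) { return 1; }

/* Expands to the ifunc declaration of foo() plus the head of foo_resolver(),
 * which the dynamic loader calls to pick the implementation. */
SKETCH_IFUNC(foo) {
    if (cpu_has_clever_feature())
        return foo_clever;
    return foo_default;
}
```

Callers simply write `foo(21)`. Because the choice is made once at symbol resolution rather than on every call, this is the usual reason to prefer ifunc over a per-call feature check.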
diff --git a/debian/patches/s390x/add-accel-deflate.patch b/debian/patches/s390x/add-accel-deflate.patch
new file mode 100644
index 0000000..1ae9be6
--- /dev/null
+++ b/debian/patches/s390x/add-accel-deflate.patch
@@ -0,0 +1,2043 @@
1From: Ilya Leoshkevich <iii@linux.ibm.com>
2Date: Wed, 18 Jul 2018 13:14:07 +0200
3Subject: Add support for IBM Z hardware-accelerated deflate
4
5IBM Z mainframes starting from version z15 provide the DFLTCC instruction,
6which implements the deflate algorithm in hardware with estimated
7compression and decompression performance orders of magnitude faster
8than the current zlib and a ratio comparable with that of level 1.
9
10This patch adds DFLTCC support to zlib. It can be enabled using the
11following build commands:
12
13 $ ./configure --dfltcc
14 $ make
15
16When built like this, zlib compresses in hardware at level 1, and
17in software at all other levels. Decompression always happens in
18hardware. To enable DFLTCC compression for levels 1-6 (i.e., to
19make it apply to the default level), either configure with
20`--dfltcc-level-mask=0x7e` or `export DFLTCC_LEVEL_MASK=0x7e` at run
22
23Two DFLTCC compression calls produce the same results only when they
24both are made on machines of the same generation, and when the
25respective buffers have the same offset relative to the start of the
26page. Therefore care should be taken when using hardware compression
27when reproducible results are desired. One such use case - reproducible
28software builds - is handled explicitly: when the `SOURCE_DATE_EPOCH`
29environment variable is set, the hardware compression is disabled.
30
31DFLTCC does not support every single zlib feature, in particular:
32
33 * `inflate(Z_BLOCK)` and `inflate(Z_TREES)`
34 * `inflateMark()`
35 * `inflatePrime()`
36 * `inflateSyncPoint()`
37
38When used, these functions will either switch to software, or, in case
39this is not possible, gracefully fail.
40
41This patch tries to add DFLTCC support in the least intrusive way.
42All SystemZ-specific code is placed into a separate file, but
43unfortunately there is still a noticeable amount of changes in the
44main zlib code. Below is the summary of these changes.
45
46DFLTCC takes as arguments a parameter block, an input buffer, an output
47buffer and a window. Since DFLTCC requires parameter block to be
48doubleword-aligned, and it's reasonable to allocate it alongside
49deflate and inflate states, The `ZALLOC_STATE()`, `ZFREE_STATE()` and
50`ZCOPY_STATE()` macros are introduced in order to encapsulate the
51allocation details. The same is true for window, for which
52the `ZALLOC_WINDOW()` and `TRY_FREE_WINDOW()` macros are introduced.
53
54Software and hardware window formats do not match, therefore,
55`deflateSetDictionary()`, `deflateGetDictionary()`,
56`inflateSetDictionary()` and `inflateGetDictionary()` need special
57handling, which is triggered using the new
58`DEFLATE_SET_DICTIONARY_HOOK()`, `DEFLATE_GET_DICTIONARY_HOOK()`,
59`INFLATE_SET_DICTIONARY_HOOK()` and `INFLATE_GET_DICTIONARY_HOOK()`
60macros.
61
62`deflateResetKeep()` and `inflateResetKeep()` now update the DFLTCC
63parameter block, which is allocated alongside zlib state, using
64the new `DEFLATE_RESET_KEEP_HOOK()` and `INFLATE_RESET_KEEP_HOOK()`
65macros.
66
67The new `DEFLATE_PARAMS_HOOK()` macro switches between the hardware
68and the software deflate implementations when the `deflateParams()`
69arguments demand this.
70
71The new `INFLATE_PRIME_HOOK()`, `INFLATE_MARK_HOOK()` and
72`INFLATE_SYNC_POINT_HOOK()` macros make the respective unsupported
73calls gracefully fail.
74
75The algorithm implemented in the hardware has different compression
76ratio than the one implemented in software. In order for
77`deflateBound()` to return the correct results for the hardware
78implementation, the new `DEFLATE_BOUND_ADJUST_COMPLEN()` and
79`DEFLATE_NEED_CONSERVATIVE_BOUND()` macros are introduced.
80
81Actual compression and decompression are handled by the new
82`DEFLATE_HOOK()` and `INFLATE_TYPEDO_HOOK()` macros. Since inflation
83with DFLTCC manages the window on its own, calling `updatewindow()` is
84suppressed using the new `INFLATE_NEED_UPDATEWINDOW()` macro.
85
86In addition to the compression, DFLTCC computes the CRC-32 and Adler-32
87checksums, therefore, whenever it's used, the software checksumming is
88suppressed using the new `DEFLATE_NEED_CHECKSUM()` and
89`INFLATE_NEED_CHECKSUM()` macros.
90
91DFLTCC will refuse to write an End-of-block Symbol if there is no input
92data, thus in some cases it is necessary to do this manually. In order
93to achieve this, `send_bits()`, `bi_reverse()`, `bi_windup()` and
94`flush_pending()` are promoted from `local` to `ZLIB_INTERNAL`.
95Furthermore, since the block and the stream termination must be handled
96in software as well, `enum block_state` is moved to `deflate.h`.
97
98Since the first call to `dfltcc_inflate()` already needs the window,
99and it might not be allocated yet, `inflate_ensure_window()` is
100factored out of `updatewindow()` and made `ZLIB_INTERNAL`.
101
102Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
103Origin: iii-i/zlib, https://github.com/iii-i/zlib/commit/481ee63d5f8fa12b5c833d32d08a3c74bc62cb20
104---
105 Makefile.in | 8 +
106 compress.c | 14 +-
107 configure | 24 +
108 contrib/README.contrib | 4 +
109 contrib/s390/README.txt | 17 +
110 contrib/s390/dfltcc.c | 1004 +++++++++++++++++++++++++++++++++++++++++
111 contrib/s390/dfltcc.h | 97 ++++
112 contrib/s390/dfltcc_deflate.h | 53 +++
113 deflate.c | 76 +++-
114 deflate.h | 12 +
115 gzguts.h | 4 +
116 inflate.c | 98 ++--
117 inflate.h | 2 +
118 test/infcover.c | 3 +-
119 test/minigzip.c | 4 +
120 trees.c | 8 +-
121 zutil.h | 2 +
122 17 files changed, 1371 insertions(+), 59 deletions(-)
123 create mode 100644 contrib/s390/README.txt
124 create mode 100644 contrib/s390/dfltcc.c
125 create mode 100644 contrib/s390/dfltcc.h
126 create mode 100644 contrib/s390/dfltcc_deflate.h
127
128diff --git a/Makefile.in b/Makefile.in
129index ede4db3..1710f63 100644
130--- a/Makefile.in
131+++ b/Makefile.in
132@@ -140,6 +140,14 @@ match.lo: match.S
133 mv _match.o match.lo
134 rm -f _match.s
135
136+dfltcc.o: $(SRCDIR)contrib/s390/dfltcc.c $(SRCDIR)zlib.h zconf.h
137+ $(CC) $(CFLAGS) $(ZINC) -c -o $@ $(SRCDIR)contrib/s390/dfltcc.c
138+
139+dfltcc.lo: $(SRCDIR)contrib/s390/dfltcc.c $(SRCDIR)zlib.h zconf.h
140+ -@mkdir objs 2>/dev/null || test -d objs
141+ $(CC) $(SFLAGS) $(ZINC) -DPIC -c -o objs/dfltcc.o $(SRCDIR)contrib/s390/dfltcc.c
142+ -@mv objs/dfltcc.o $@
143+
144 crc32_test.o: $(SRCDIR)test/crc32_test.c $(SRCDIR)zlib.h zconf.h
145 $(CC) $(CFLAGS) $(ZINCOUT) -c -o $@ $(SRCDIR)test/crc32_test.c
146
147diff --git a/compress.c b/compress.c
148index f43bacf..08a0660 100644
149--- a/compress.c
150+++ b/compress.c
151@@ -5,9 +5,15 @@
152
153 /* @(#) $Id$ */
154
155-#define ZLIB_INTERNAL
156+#include "zutil.h"
157 #include "zlib.h"
158
159+#ifdef DFLTCC
160+# include "contrib/s390/dfltcc.h"
161+#else
162+#define DEFLATE_BOUND_COMPLEN(source_len) 0
163+#endif
164+
165 /* ===========================================================================
166 Compresses the source buffer into the destination buffer. The level
167 parameter has the same meaning as in deflateInit. sourceLen is the byte
168@@ -70,6 +76,12 @@ int ZEXPORT compress(Bytef *dest, uLongf *destLen, const Bytef *source,
169 this function needs to be updated.
170 */
171 uLong ZEXPORT compressBound(uLong sourceLen) {
172+ uLong complen = DEFLATE_BOUND_COMPLEN(sourceLen);
173+
174+ if (complen > 0)
175+ /* Architecture-specific code provided an upper bound. */
176+ return complen + ZLIB_WRAPLEN;
177+
178 return sourceLen + (sourceLen >> 12) + (sourceLen >> 14) +
179 (sourceLen >> 25) + 13;
180 }
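The compressBound() change reads as: if an architecture hook reports a bound on the raw deflate data, add the zlib wrapper size to it; otherwise fall back to the generic formula. A minimal sketch, assuming ZLIB_WRAPLEN is the 6-byte zlib wrapper (2-byte header plus 4-byte Adler-32 trailer). The hook parameter and all function names are hypothetical stand-ins for DEFLATE_BOUND_COMPLEN(), and the "+ 100" bound is a made-up number.

```c
typedef unsigned long uLong;

/* Assumed value of ZLIB_WRAPLEN: 2-byte zlib header + 4-byte Adler-32 trailer. */
#define SKETCH_ZLIB_WRAPLEN 6

/* Hypothetical stand-in for DEFLATE_BOUND_COMPLEN(): returns a bound on the
 * raw deflate data, or 0 when no architecture-specific bound is known. */
typedef uLong (*bound_hook)(uLong source_len);

/* Mirrors the non-DFLTCC build, where the macro expands to 0. */
static uLong no_arch_bound(uLong source_len) {
    (void)source_len;
    return 0;
}

/* Hypothetical hook with a made-up bound, to exercise the other branch. */
static uLong made_up_arch_bound(uLong source_len) {
    return source_len + 100;
}

static uLong sketch_compress_bound(uLong source_len, bound_hook hook) {
    uLong complen = hook(source_len);

    if (complen > 0)
        /* Architecture-specific code provided an upper bound. */
        return complen + SKETCH_ZLIB_WRAPLEN;

    /* Generic software bound, as in upstream compressBound(). */
    return source_len + (source_len >> 12) + (source_len >> 14) +
           (source_len >> 25) + 13;
}
```

For a 64 KiB input with no hook, the generic bound is 65536 + 16 + 4 + 0 + 13 = 65569 bytes.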
181diff --git a/configure b/configure
182index 3372cbf..b99a348 100755
183--- a/configure
184+++ b/configure
185@@ -117,6 +117,7 @@ case "$1" in
186 echo ' configure [--const] [--zprefix] [--prefix=PREFIX] [--eprefix=EXPREFIX]' | tee -a configure.log
187 echo ' [--static] [--64] [--libdir=LIBDIR] [--sharedlibdir=LIBDIR]' | tee -a configure.log
188 echo ' [--includedir=INCLUDEDIR] [--archs="-arch i386 -arch x86_64"]' | tee -a configure.log
189+ echo ' [--dfltcc] [--dfltcc-level-mask=MASK]' | tee -a configure.log
190 exit 0 ;;
191 -p*=* | --prefix=*) prefix=`echo $1 | sed 's/.*=//'`; shift ;;
192 -e*=* | --eprefix=*) exec_prefix=`echo $1 | sed 's/.*=//'`; shift ;;
193@@ -143,6 +144,16 @@ case "$1" in
194 --sanitize) address=1; shift ;;
195 --address) address=1; shift ;;
196 --memory) memory=1; shift ;;
197+ --dfltcc)
198+ CFLAGS="$CFLAGS -DDFLTCC"
199+ OBJC="$OBJC dfltcc.o"
200+ PIC_OBJC="$PIC_OBJC dfltcc.lo"
201+ shift
202+ ;;
203+ --dfltcc-level-mask=*)
204+ CFLAGS="$CFLAGS -DDFLTCC_LEVEL_MASK=`echo $1 | sed 's/.*=//'`"
205+ shift
206+ ;;
207 *)
208 echo "unknown option: $1" | tee -a configure.log
209 echo "$0 --help for help" | tee -a configure.log
210@@ -834,6 +845,19 @@ EOF
211 fi
212 fi
213
214+# Check whether sys/sdt.h is available
215+cat > $test.c << EOF
216+#include <sys/sdt.h>
217+int main() { return 0; }
218+EOF
219+if try $CC -c $CFLAGS $test.c; then
220+ echo "Checking for sys/sdt.h ... Yes." | tee -a configure.log
221+ CFLAGS="$CFLAGS -DHAVE_SYS_SDT_H"
222+ SFLAGS="$SFLAGS -DHAVE_SYS_SDT_H"
223+else
224+ echo "Checking for sys/sdt.h ... No." | tee -a configure.log
225+fi
226+
227 # test to see if we can use a gnu indirection function to detect and load optimized code at runtime
228 echo >> configure.log
229 cat > $test.c <<EOF
230diff --git a/contrib/README.contrib b/contrib/README.contrib
231index 90170df..a36d404 100644
232--- a/contrib/README.contrib
233+++ b/contrib/README.contrib
234@@ -55,6 +55,10 @@ puff/ by Mark Adler <madler@alumni.caltech.edu>
235 Small, low memory usage inflate. Also serves to provide an
236 unambiguous description of the deflate format.
237
238+s390/ by Ilya Leoshkevich <iii@linux.ibm.com>
239+ Hardware-accelerated deflate on IBM Z with the DEFLATE CONVERSION CALL
240+ instruction.
241+
242 testzlib/ by Gilles Vollant <info@winimage.com>
243 Example of the use of zlib
244
245diff --git a/contrib/s390/README.txt b/contrib/s390/README.txt
246new file mode 100644
247index 0000000..48be008
248--- /dev/null
249+++ b/contrib/s390/README.txt
250@@ -0,0 +1,17 @@
251+IBM Z mainframes starting from version z15 provide DFLTCC instruction,
252+which implements deflate algorithm in hardware with estimated
253+compression and decompression performance orders of magnitude faster
254+than the current zlib and ratio comparable with that of level 1.
255+
256+This directory adds DFLTCC support. In order to enable it, the following
257+build commands should be used:
258+
259+ $ ./configure --dfltcc
260+ $ make
261+
262+When built like this, zlib compresses in hardware at level 1, and in
263+software at all other levels. Decompression always happens in
264+hardware. To enable DFLTCC compression for levels 1-6 (i.e. to
265+make it apply to the default level), either configure with
266+--dfltcc-level-mask=0x7e or set the environment variable
267+DFLTCC_LEVEL_MASK to 0x7e at run time.
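The level mask is a bit set indexed by compression level: bit N enables DFLTCC at level N, so 0x7e (binary 1111110) selects levels 1 through 6, and 6 is the level zlib uses for Z_DEFAULT_COMPRESSION. A minimal sketch of the test that dfltcc_can_deflate_with_params() performs, assuming the level has already been normalized to 0..9; `level_uses_dfltcc()` is a hypothetical name.

```c
#include <stdbool.h>

/* Bit N of level_mask enables hardware compression at level N (0..9).
 * Z_DEFAULT_COMPRESSION (-1) must already have been mapped to 6. */
static bool level_uses_dfltcc(unsigned long level_mask, int level) {
    return (level_mask & (1UL << level)) != 0;
}
```

With a mask that enables only level 1 (0x2), a stream at the default level stays in software; with 0x7e it goes to hardware.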
268diff --git a/contrib/s390/dfltcc.c b/contrib/s390/dfltcc.c
269new file mode 100644
270index 0000000..f2b222d
271--- /dev/null
272+++ b/contrib/s390/dfltcc.c
273@@ -0,0 +1,1004 @@
274+/* dfltcc.c - SystemZ DEFLATE CONVERSION CALL support. */
275+
276+/*
277+ Use the following commands to build zlib with DFLTCC support:
278+
279+ $ ./configure --dfltcc
280+ $ make
281+*/
282+
283+#define _GNU_SOURCE
284+#include <ctype.h>
285+#include <errno.h>
286+#include <inttypes.h>
287+#include <stddef.h>
288+#include <stdio.h>
289+#include <stdint.h>
290+#include <stdlib.h>
291+#include "../../zutil.h"
292+#include "../../deflate.h"
293+#include "../../inftrees.h"
294+#include "../../inflate.h"
295+#include "dfltcc.h"
296+#include "dfltcc_deflate.h"
297+#ifdef HAVE_SYS_SDT_H
298+#include <sys/sdt.h>
299+#endif
300+
301+/*
302+ C wrapper for the DEFLATE CONVERSION CALL instruction.
303+ */
304+typedef enum {
305+ DFLTCC_CC_OK = 0,
306+ DFLTCC_CC_OP1_TOO_SHORT = 1,
307+ DFLTCC_CC_OP2_TOO_SHORT = 2,
308+ DFLTCC_CC_OP2_CORRUPT = 2,
309+ DFLTCC_CC_AGAIN = 3,
310+} dfltcc_cc;
311+
312+#define DFLTCC_QAF 0
313+#define DFLTCC_GDHT 1
314+#define DFLTCC_CMPR 2
315+#define DFLTCC_XPND 4
316+#define HBT_CIRCULAR (1 << 7)
317+#define HB_BITS 15
318+#define HB_SIZE (1 << HB_BITS)
319+#define DFLTCC_FACILITY 151
320+
321+local inline dfltcc_cc dfltcc(int fn, void *param,
322+ Bytef **op1, size_t *len1,
323+ z_const Bytef **op2, size_t *len2,
324+ void *hist)
325+{
326+ Bytef *t2 = op1 ? *op1 : NULL;
327+ size_t t3 = len1 ? *len1 : 0;
328+ z_const Bytef *t4 = op2 ? *op2 : NULL;
329+ size_t t5 = len2 ? *len2 : 0;
330+ register int r0 __asm__("r0") = fn;
331+ register void *r1 __asm__("r1") = param;
332+ register Bytef *r2 __asm__("r2") = t2;
333+ register size_t r3 __asm__("r3") = t3;
334+ register z_const Bytef *r4 __asm__("r4") = t4;
335+ register size_t r5 __asm__("r5") = t5;
336+ int cc;
337+
338+ __asm__ volatile(
339+#ifdef HAVE_SYS_SDT_H
340+ STAP_PROBE_ASM(zlib, dfltcc_entry,
341+ STAP_PROBE_ASM_TEMPLATE(5))
342+#endif
343+ ".insn rrf,0xb9390000,%[r2],%[r4],%[hist],0\n"
344+#ifdef HAVE_SYS_SDT_H
345+ STAP_PROBE_ASM(zlib, dfltcc_exit,
346+ STAP_PROBE_ASM_TEMPLATE(5))
347+#endif
348+ "ipm %[cc]\n"
349+ : [r2] "+r" (r2)
350+ , [r3] "+r" (r3)
351+ , [r4] "+r" (r4)
352+ , [r5] "+r" (r5)
353+ , [cc] "=r" (cc)
354+ : [r0] "r" (r0)
355+ , [r1] "r" (r1)
356+ , [hist] "r" (hist)
357+#ifdef HAVE_SYS_SDT_H
358+ , STAP_PROBE_ASM_OPERANDS(5, r2, r3, r4, r5, hist)
359+#endif
360+ : "cc", "memory");
361+ t2 = r2; t3 = r3; t4 = r4; t5 = r5;
362+
363+ if (op1)
364+ *op1 = t2;
365+ if (len1)
366+ *len1 = t3;
367+ if (op2)
368+ *op2 = t4;
369+ if (len2)
370+ *len2 = t5;
371+ return (cc >> 28) & 3;
372+}
373+
374+/*
375+ Parameter Block for Query Available Functions.
376+ */
377+#define static_assert(c, msg) \
378+ __attribute__((unused)) \
379+ static char static_assert_failed_ ## msg[c ? 1 : -1]
380+
381+struct dfltcc_qaf_param {
382+ char fns[16];
383+ char reserved1[8];
384+ char fmts[2];
385+ char reserved2[6];
386+};
387+
388+static_assert(sizeof(struct dfltcc_qaf_param) == 32,
389+ sizeof_struct_dfltcc_qaf_param_is_32);
390+
391+local inline int is_bit_set(const char *bits, int n)
392+{
393+ return bits[n / 8] & (1 << (7 - (n % 8)));
394+}
395+
396+local inline void clear_bit(char *bits, int n)
397+{
398+ bits[n / 8] &= ~(1 << (7 - (n % 8)));
399+}
400+
401+#define DFLTCC_FMT0 0
402+
403+/*
404+ Parameter Block for Generate Dynamic-Huffman Table, Compress and Expand.
405+ */
406+#define CVT_CRC32 0
407+#define CVT_ADLER32 1
408+#define HTT_FIXED 0
409+#define HTT_DYNAMIC 1
410+
411+struct dfltcc_param_v0 {
412+ uint16_t pbvn; /* Parameter-Block-Version Number */
413+ uint8_t mvn; /* Model-Version Number */
414+ uint8_t ribm; /* Reserved for IBM use */
415+ unsigned reserved32 : 31;
416+ unsigned cf : 1; /* Continuation Flag */
417+ uint8_t reserved64[8];
418+ unsigned nt : 1; /* New Task */
419+ unsigned reserved129 : 1;
420+ unsigned cvt : 1; /* Check Value Type */
421+ unsigned reserved131 : 1;
422+ unsigned htt : 1; /* Huffman-Table Type */
423+ unsigned bcf : 1; /* Block-Continuation Flag */
424+ unsigned bcc : 1; /* Block Closing Control */
425+ unsigned bhf : 1; /* Block Header Final */
426+ unsigned reserved136 : 1;
427+ unsigned reserved137 : 1;
428+ unsigned dhtgc : 1; /* DHT Generation Control */
429+ unsigned reserved139 : 5;
430+ unsigned reserved144 : 5;
431+ unsigned sbb : 3; /* Sub-Byte Boundary */
432+ uint8_t oesc; /* Operation-Ending-Supplemental Code */
433+ unsigned reserved160 : 12;
434+ unsigned ifs : 4; /* Incomplete-Function Status */
435+ uint16_t ifl; /* Incomplete-Function Length */
436+ uint8_t reserved192[8];
437+ uint8_t reserved256[8];
438+ uint8_t reserved320[4];
439+ uint16_t hl; /* History Length */
440+ unsigned reserved368 : 1;
441+ uint16_t ho : 15; /* History Offset */
442+ uint32_t cv; /* Check Value */
443+ unsigned eobs : 15; /* End-of-block Symbol */
444+ unsigned reserved431: 1;
445+ uint8_t eobl : 4; /* End-of-block Length */
446+ unsigned reserved436 : 12;
447+ unsigned reserved448 : 4;
448+ uint16_t cdhtl : 12; /* Compressed-Dynamic-Huffman Table
449+ Length */
450+ uint8_t reserved464[6];
451+ uint8_t cdht[288];
452+ uint8_t reserved[32];
453+ uint8_t csb[1152];
454+};
455+
456+static_assert(sizeof(struct dfltcc_param_v0) == 1536,
457+ sizeof_struct_dfltcc_param_v0_is_1536);
458+
459+local z_const char *oesc_msg(char *buf, int oesc)
460+{
461+ if (oesc == 0x00)
462+ return NULL; /* Successful completion */
463+ else {
464+ sprintf(buf, "Operation-Ending-Supplemental Code is 0x%.2X", oesc);
465+ return buf;
466+ }
467+}
468+
469+/*
470+ Extension of inflate_state and deflate_state. Must be doubleword-aligned.
471+*/
472+struct dfltcc_state {
473+ struct dfltcc_param_v0 param; /* Parameter block. */
474+ struct dfltcc_qaf_param af; /* Available functions. */
475+ uLong level_mask; /* Levels on which to use DFLTCC */
476+ uLong block_size; /* New block each X bytes */
477+ uLong block_threshold; /* New block after total_in > X */
478+ uLong dht_threshold; /* New block only if avail_in >= X */
479+ char msg[64]; /* Buffer for strm->msg */
480+};
481+
482+#define ALIGN_UP(p, size) \
483+ (__typeof__(p))(((uintptr_t)(p) + ((size) - 1)) & ~((size) - 1))
484+
485+#define GET_DFLTCC_STATE(state) ((struct dfltcc_state *)( \
486+ (char *)(state) + ALIGN_UP(sizeof(*state), 8)))
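GET_DFLTCC_STATE() assumes the DFLTCC extension is allocated right after the zlib state, at the next doubleword (8-byte) boundary, and ALIGN_UP() computes that offset with the usual power-of-two mask trick. A minimal sketch of the arithmetic; `struct sketch_state` is a hypothetical stand-in with an odd size. For the resulting pointer to be aligned, the base allocation itself must also be 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Round p up to a multiple of size; size must be a power of two. */
static uintptr_t align_up(uintptr_t p, uintptr_t size) {
    return (p + (size - 1)) & ~(size - 1);
}

/* Hypothetical zlib state with an awkward 13-byte size. */
struct sketch_state {
    char bytes[13];
};

/* Offset of the DFLTCC extension relative to the start of the state,
 * as computed by GET_DFLTCC_STATE(). */
static size_t dfltcc_extension_offset(void) {
    return (size_t)align_up(sizeof(struct sketch_state), 8);
}
```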
487+
488+/*
489+ Compress.
490+ */
491+local inline int dfltcc_can_deflate_with_params(z_streamp strm,
492+ int level,
493+ uInt window_bits,
494+ int strategy)
495+{
496+ deflate_state *state = (deflate_state *)strm->state;
497+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
498+
499+ /* Unsupported compression settings */
500+ if ((dfltcc_state->level_mask & (1 << level)) == 0)
501+ return 0;
502+ if (window_bits != HB_BITS)
503+ return 0;
504+ if (strategy != Z_FIXED && strategy != Z_DEFAULT_STRATEGY)
505+ return 0;
506+
507+ /* Unsupported hardware */
508+ if (!is_bit_set(dfltcc_state->af.fns, DFLTCC_GDHT) ||
509+ !is_bit_set(dfltcc_state->af.fns, DFLTCC_CMPR) ||
510+ !is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0))
511+ return 0;
512+
513+ return 1;
514+}
515+
516+int ZLIB_INTERNAL dfltcc_can_deflate(z_streamp strm)
517+{
518+ deflate_state *state = (deflate_state *)strm->state;
519+
520+ return dfltcc_can_deflate_with_params(strm,
521+ state->level,
522+ state->w_bits,
523+ state->strategy);
524+}
525+
526+local void dfltcc_gdht(z_streamp strm)
527+{
528+ deflate_state *state = (deflate_state *)strm->state;
529+ struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
530+ size_t avail_in = strm->avail_in;
531+
532+ dfltcc(DFLTCC_GDHT,
533+ param, NULL, NULL,
534+ &strm->next_in, &avail_in, NULL);
535+}
536+
537+local dfltcc_cc dfltcc_cmpr(z_streamp strm)
538+{
539+ deflate_state *state = (deflate_state *)strm->state;
540+ struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
541+ size_t avail_in = strm->avail_in;
542+ size_t avail_out = strm->avail_out;
543+ dfltcc_cc cc;
544+
545+ cc = dfltcc(DFLTCC_CMPR | HBT_CIRCULAR,
546+ param, &strm->next_out, &avail_out,
547+ &strm->next_in, &avail_in, state->window);
548+ strm->total_in += (strm->avail_in - avail_in);
549+ strm->total_out += (strm->avail_out - avail_out);
550+ strm->avail_in = avail_in;
551+ strm->avail_out = avail_out;
552+ return cc;
553+}
554+
555+local void send_eobs(z_streamp strm,
556+ z_const struct dfltcc_param_v0 *param)
557+{
558+ deflate_state *state = (deflate_state *)strm->state;
559+
560+ _tr_send_bits(
561+ state,
562+ bi_reverse(param->eobs >> (15 - param->eobl), param->eobl),
563+ param->eobl);
564+ flush_pending(strm);
565+ if (state->pending != 0) {
566+ /* The remaining data is located in pending_out[0:pending]. If someone
567+ * calls put_byte() - this might happen in deflate() - the byte will be
568+ * placed into pending_buf[pending], which is incorrect. Move the
569+ * remaining data to the beginning of pending_buf so that put_byte() is
570+ * usable again.
571+ */
572+ memmove(state->pending_buf, state->pending_out, state->pending);
573+ state->pending_out = state->pending_buf;
574+ }
575+#ifdef ZLIB_DEBUG
576+ state->compressed_len += param->eobl;
577+#endif
578+}
579+
580+int ZLIB_INTERNAL dfltcc_deflate(z_streamp strm, int flush,
581+ block_state *result)
582+{
583+ deflate_state *state = (deflate_state *)strm->state;
584+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
585+ struct dfltcc_param_v0 *param = &dfltcc_state->param;
586+ uInt masked_avail_in;
587+ dfltcc_cc cc;
588+ int need_empty_block;
589+ int soft_bcc;
590+ int no_flush;
591+
592+ if (!dfltcc_can_deflate(strm)) {
593+ /* Clear history. */
594+ if (flush == Z_FULL_FLUSH)
595+ param->hl = 0;
596+ return 0;
597+ }
598+
599+again:
600+ masked_avail_in = 0;
601+ soft_bcc = 0;
602+ no_flush = flush == Z_NO_FLUSH;
603+
604+ /* No input data. Return, except when Continuation Flag is set, which means
605+ * that DFLTCC has buffered some output in the parameter block and needs to
606+ * be called again in order to flush it.
607+ */
608+ if (strm->avail_in == 0 && !param->cf) {
609+ /* A block is still open, and the hardware does not support closing
610+ * blocks without adding data. Thus, close it manually.
611+ */
612+ if (!no_flush && param->bcf) {
613+ send_eobs(strm, param);
614+ param->bcf = 0;
615+ }
616+ /* Let one of deflate_* functions write a trailing empty block. */
617+ if (flush == Z_FINISH)
618+ return 0;
619+ /* Clear history. */
620+ if (flush == Z_FULL_FLUSH)
621+ param->hl = 0;
622+ /* Trigger block post-processing if necessary. */
623+ *result = no_flush ? need_more : block_done;
624+ return 1;
625+ }
626+
627+ /* There is an open non-BFINAL block, we are not going to close it just
628+ * yet, we have compressed more than DFLTCC_BLOCK_SIZE bytes and we see
629+ * more than DFLTCC_DHT_MIN_SAMPLE_SIZE bytes. Open a new block with a new
630+ * DHT in order to adapt to a possibly changed input data distribution.
631+ */
632+ if (param->bcf && no_flush &&
633+ strm->total_in > dfltcc_state->block_threshold &&
634+ strm->avail_in >= dfltcc_state->dht_threshold) {
635+ if (param->cf) {
636+ /* We need to flush the DFLTCC buffer before writing the
637+ * End-of-block Symbol. Mask the input data and proceed as usual.
638+ */
639+ masked_avail_in += strm->avail_in;
640+ strm->avail_in = 0;
641+ no_flush = 0;
642+ } else {
643+ /* DFLTCC buffer is empty, so we can manually write the
644+ * End-of-block Symbol right away.
645+ */
646+ send_eobs(strm, param);
647+ param->bcf = 0;
648+ dfltcc_state->block_threshold =
649+ strm->total_in + dfltcc_state->block_size;
650+ }
651+ }
652+
653+ /* No space for compressed data. If we proceed, dfltcc_cmpr() will return
654+ * DFLTCC_CC_OP1_TOO_SHORT without buffering header bits, but we will still
655+ * set BCF=1, which is wrong. Avoid complications and return early.
656+ */
657+ if (strm->avail_out == 0) {
658+ *result = need_more;
659+ return 1;
660+ }
661+
662+ /* The caller gave us too much data. Pass only one block worth of
663+ * uncompressed data to DFLTCC and mask the rest, so that on the next
664+ * iteration we start a new block.
665+ */
666+ if (no_flush && strm->avail_in > dfltcc_state->block_size) {
667+ masked_avail_in += (strm->avail_in - dfltcc_state->block_size);
668+ strm->avail_in = dfltcc_state->block_size;
669+ }
670+
671+ /* When we have an open non-BFINAL deflate block and caller indicates that
672+ * the stream is ending, we need to close an open deflate block and open a
673+ * BFINAL one.
674+ */
675+ need_empty_block = flush == Z_FINISH && param->bcf && !param->bhf;
676+
677+ /* Translate stream to parameter block */
678+ param->cvt = state->wrap == 2 ? CVT_CRC32 : CVT_ADLER32;
679+ if (!no_flush)
680+ /* We need to close a block. Always do this in software - when there is
681+ * no input data, the hardware will not honor BCC. */
682+ soft_bcc = 1;
683+ if (flush == Z_FINISH && !param->bcf)
684+ /* We are about to open a BFINAL block, set Block Header Final bit
685+ * until the stream ends.
686+ */
687+ param->bhf = 1;
688+ /* DFLTCC-CMPR will write to next_out, so make sure that buffers with
689+ * higher precedence are empty.
690+ */
691+ Assert(state->pending == 0, "There must be no pending bytes");
692+ Assert(state->bi_valid < 8, "There must be less than 8 pending bits");
693+ param->sbb = (unsigned int)state->bi_valid;
694+ if (param->sbb > 0)
695+ *strm->next_out = (Bytef)state->bi_buf;
696+ /* Honor history and check value */
697+ param->nt = 0;
698+ if (state->wrap == 1)
699+ param->cv = strm->adler;
700+ else if (state->wrap == 2)
701+ param->cv = ZSWAP32(strm->adler);
702+
703+ /* When opening a block, choose a Huffman-Table Type */
704+ if (!param->bcf) {
705+ if (state->strategy == Z_FIXED ||
706+ (strm->total_in == 0 && dfltcc_state->block_threshold > 0))
707+ param->htt = HTT_FIXED;
708+ else {
709+ param->htt = HTT_DYNAMIC;
710+ dfltcc_gdht(strm);
711+ }
712+ }
713+
714+ /* Deflate */
715+ do {
716+ cc = dfltcc_cmpr(strm);
717+ if (strm->avail_in < 4096 && masked_avail_in > 0)
718+ /* We are about to call DFLTCC with a small input buffer, which is
719+ * inefficient. Since there is masked data, there will be at least
720+ * one more DFLTCC call, so skip the current one and make the next
721+ * one handle more data.
722+ */
723+ break;
724+ } while (cc == DFLTCC_CC_AGAIN);
725+
726+ /* Translate parameter block to stream */
727+ strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
728+ state->bi_valid = param->sbb;
729+ if (state->bi_valid == 0)
730+ state->bi_buf = 0; /* Avoid accessing next_out */
731+ else
732+ state->bi_buf = *strm->next_out & ((1 << state->bi_valid) - 1);
733+ if (state->wrap == 1)
734+ strm->adler = param->cv;
735+ else if (state->wrap == 2)
736+ strm->adler = ZSWAP32(param->cv);
737+
738+ /* Unmask the input data */
739+ strm->avail_in += masked_avail_in;
740+ masked_avail_in = 0;
741+
742+ /* If we encounter an error, it means there is a bug in the DFLTCC call */
743+ Assert(cc != DFLTCC_CC_OP2_CORRUPT || param->oesc == 0, "BUG");
744+
745+ /* Update Block-Continuation Flag. It will be used to check whether to call
746+ * GDHT the next time.
747+ */
748+ if (cc == DFLTCC_CC_OK) {
749+ if (soft_bcc) {
750+ send_eobs(strm, param);
751+ param->bcf = 0;
752+ dfltcc_state->block_threshold =
753+ strm->total_in + dfltcc_state->block_size;
754+ } else
755+ param->bcf = 1;
756+ if (flush == Z_FINISH) {
757+ if (need_empty_block)
758+ /* Make the current deflate() call also close the stream */
759+ return 0;
760+ else {
761+ bi_windup(state);
762+ *result = finish_done;
763+ }
764+ } else {
765+ if (flush == Z_FULL_FLUSH)
766+ param->hl = 0; /* Clear history */
767+ *result = flush == Z_NO_FLUSH ? need_more : block_done;
768+ }
769+ } else {
770+ param->bcf = 1;
771+ *result = need_more;
772+ }
773+ if (strm->avail_in != 0 && strm->avail_out != 0)
774+ goto again; /* deflate() must use all input or all output */
775+ return 1;
776+}
777+
778+/*
779+ Expand.
780+ */
781+int ZLIB_INTERNAL dfltcc_can_inflate(z_streamp strm)
782+{
783+ struct inflate_state *state = (struct inflate_state *)strm->state;
784+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
785+
786+ /* Unsupported hardware */
787+ return is_bit_set(dfltcc_state->af.fns, DFLTCC_XPND) &&
788+ is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0);
789+}
790+
791+local dfltcc_cc dfltcc_xpnd(z_streamp strm)
792+{
793+ struct inflate_state *state = (struct inflate_state *)strm->state;
794+ struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
795+ size_t avail_in = strm->avail_in;
796+ size_t avail_out = strm->avail_out;
797+ dfltcc_cc cc;
798+
799+ cc = dfltcc(DFLTCC_XPND | HBT_CIRCULAR,
800+ param, &strm->next_out, &avail_out,
801+ &strm->next_in, &avail_in, state->window);
802+ strm->avail_in = avail_in;
803+ strm->avail_out = avail_out;
804+ return cc;
805+}
806+
807+dfltcc_inflate_action ZLIB_INTERNAL dfltcc_inflate(z_streamp strm, int flush,
808+ int *ret)
809+{
810+ struct inflate_state *state = (struct inflate_state *)strm->state;
811+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
812+ struct dfltcc_param_v0 *param = &dfltcc_state->param;
813+ dfltcc_cc cc;
814+
815+ if (flush == Z_BLOCK || flush == Z_TREES) {
816+ /* DFLTCC does not support stopping on block boundaries */
817+ if (dfltcc_inflate_disable(strm)) {
818+ *ret = Z_STREAM_ERROR;
819+ return DFLTCC_INFLATE_BREAK;
820+ } else
821+ return DFLTCC_INFLATE_SOFTWARE;
822+ }
823+
824+ if (state->last) {
825+ if (state->bits != 0) {
826+ strm->next_in++;
827+ strm->avail_in--;
828+ state->bits = 0;
829+ }
830+ state->mode = CHECK;
831+ return DFLTCC_INFLATE_CONTINUE;
832+ }
833+
834+ if (strm->avail_in == 0 && !param->cf)
835+ return DFLTCC_INFLATE_BREAK;
836+
837+ if (inflate_ensure_window(state)) {
838+ state->mode = MEM;
839+ return DFLTCC_INFLATE_CONTINUE;
840+ }
841+
842+ /* Translate stream to parameter block */
843+ param->cvt = ((state->wrap & 4) && state->flags) ? CVT_CRC32 : CVT_ADLER32;
844+ param->sbb = state->bits;
845+ if (param->hl)
846+ param->nt = 0; /* Honor history for the first block */
847+ if (state->wrap & 4)
848+ param->cv = state->flags ? ZSWAP32(state->check) : state->check;
849+
850+ /* Inflate */
851+ do {
852+ cc = dfltcc_xpnd(strm);
853+ } while (cc == DFLTCC_CC_AGAIN);
854+
855+ /* Translate parameter block to stream */
856+ strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
857+ state->last = cc == DFLTCC_CC_OK;
858+ state->bits = param->sbb;
859+ if (state->wrap & 4)
860+ strm->adler = state->check = state->flags ?
861+ ZSWAP32(param->cv) : param->cv;
862+ if (cc == DFLTCC_CC_OP2_CORRUPT && param->oesc != 0) {
863+ /* Report an error if stream is corrupted */
864+ state->mode = BAD;
865+ return DFLTCC_INFLATE_CONTINUE;
866+ }
867+ state->mode = TYPEDO;
868+ /* Break if operands are exhausted, otherwise continue looping */
869+ return (cc == DFLTCC_CC_OP1_TOO_SHORT || cc == DFLTCC_CC_OP2_TOO_SHORT) ?
870+ DFLTCC_INFLATE_BREAK : DFLTCC_INFLATE_CONTINUE;
871+}
872+
873+int ZLIB_INTERNAL dfltcc_was_inflate_used(z_streamp strm)
874+{
875+ struct inflate_state *state = (struct inflate_state *)strm->state;
876+ struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
877+
878+ return !param->nt;
879+}
880+
881+/*
882+ Rotates a circular buffer.
883+ The implementation is based on https://cplusplus.com/reference/algorithm/rotate/
884+ */
885+local void rotate(Bytef *start, Bytef *pivot, Bytef *end)
886+{
887+ Bytef *p = pivot;
888+ Bytef tmp;
889+
890+ while (p != start) {
891+ tmp = *start;
892+ *start = *p;
893+ *p = tmp;
894+
895+ start++;
896+ p++;
897+
898+ if (p == end)
899+ p = pivot;
900+ else if (start == pivot)
901+ pivot = p;
902+ }
903+}
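rotate() is an in-place left rotation (the same algorithm as std::rotate): afterwards, the byte at pivot sits at start. dfltcc_inflate_disable() uses it to turn the circular hardware window into zlib's linear one, which suggests that the history offset ho marks the oldest byte. The sketch below copies rotate() and wraps it in a hypothetical `linearize_window()` helper to show the effect. The precondition assert is an addition: a pivot equal to end on a non-empty range would make the loop overrun.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Bytef;

/* Copy of the patch's rotate(): left-rotates [start, end) so that the
 * element at pivot ends up at start. */
static void rotate(Bytef *start, Bytef *pivot, Bytef *end) {
    Bytef *p = pivot;
    Bytef tmp;

    /* Added precondition: pivot == end (with a non-empty range) would overrun. */
    assert(start <= pivot && (pivot < end || pivot == start));

    while (p != start) {
        tmp = *start;
        *start = *p;
        *p = tmp;

        start++;
        p++;

        if (p == end)
            p = pivot;
        else if (start == pivot)
            pivot = p;
    }
}

/* Hypothetical helper: make the byte at offset ho the first byte. */
static void linearize_window(Bytef *window, size_t size, size_t ho) {
    rotate(window, window + ho, window + size);
}
```

For example, a 6-byte window holding "DEFABC" with ho = 3 becomes "ABCDEF".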
904+
905+#define MIN(x, y) ({ \
906+ typeof(x) _x = (x); \
907+ typeof(y) _y = (y); \
908+ _x < _y ? _x : _y; \
909+})
910+
911+#define MAX(x, y) ({ \
912+ typeof(x) _x = (x); \
913+ typeof(y) _y = (y); \
914+ _x > _y ? _x : _y; \
915+})
916+
917+int ZLIB_INTERNAL dfltcc_inflate_disable(z_streamp strm)
918+{
919+ struct inflate_state *state = (struct inflate_state *)strm->state;
920+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
921+ struct dfltcc_param_v0 *param = &dfltcc_state->param;
922+
923+ if (!dfltcc_can_inflate(strm))
924+ return 0;
925+ if (dfltcc_was_inflate_used(strm))
926+ /* DFLTCC has already decompressed some data. Since there is not
927+ * enough information to resume decompression in software, the call
928+ * must fail.
929+ */
930+ return 1;
931+ /* DFLTCC was not used yet - decompress in software */
932+ memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af));
933+ /* Convert the window from the hardware to the software format */
934+ rotate(state->window, state->window + param->ho, state->window + HB_SIZE);
935+ state->whave = state->wnext = MIN(param->hl, state->wsize);
936+ return 0;
937+}
938+
939+local int env_dfltcc_disabled;
940+local int env_source_date_epoch;
941+local unsigned long env_level_mask;
942+local unsigned long env_block_size;
943+local unsigned long env_block_threshold;
944+local unsigned long env_dht_threshold;
945+local unsigned long env_ribm;
946+local uint64_t cpu_facilities[(DFLTCC_FACILITY / 64) + 1];
947+local struct dfltcc_qaf_param cpu_af __attribute__((aligned(8)));
948+
949+local inline int is_dfltcc_enabled(void)
950+{
951+ if (env_dfltcc_disabled)
952+ /* User has explicitly disabled DFLTCC. */
953+ return 0;
954+
955+ return is_bit_set((const char *)cpu_facilities, DFLTCC_FACILITY);
956+}
957+
958+local unsigned long xstrtoul(const char *s, unsigned long _default)
959+{
960+ char *endptr;
961+ unsigned long result;
962+
963+ if (!(s && *s))
964+ return _default;
965+ errno = 0;
966+ result = strtoul(s, &endptr, 0);
967+ return (errno || *endptr) ? _default : result;
968+}
969+
970+__attribute__((constructor)) local void init_globals(void)
971+{
972+ const char *env;
973+ register char r0 __asm__("r0");
974+
975+ env = secure_getenv("DFLTCC");
976+ env_dfltcc_disabled = env && !strcmp(env, "0");
977+
978+ env = secure_getenv("SOURCE_DATE_EPOCH");
979+ env_source_date_epoch = !!env;
980+
981+#ifndef DFLTCC_LEVEL_MASK
982+#define DFLTCC_LEVEL_MASK 0x2
983+#endif
984+ env_level_mask = xstrtoul(secure_getenv("DFLTCC_LEVEL_MASK"),
985+ DFLTCC_LEVEL_MASK);
986+
987+#ifndef DFLTCC_BLOCK_SIZE
988+#define DFLTCC_BLOCK_SIZE 1048576
989+#endif
990+ env_block_size = xstrtoul(secure_getenv("DFLTCC_BLOCK_SIZE"),
991+ DFLTCC_BLOCK_SIZE);
992+
993+#ifndef DFLTCC_FIRST_FHT_BLOCK_SIZE
994+#define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096
995+#endif
996+ env_block_threshold = xstrtoul(secure_getenv("DFLTCC_FIRST_FHT_BLOCK_SIZE"),
997+ DFLTCC_FIRST_FHT_BLOCK_SIZE);
998+
999+#ifndef DFLTCC_DHT_MIN_SAMPLE_SIZE
1000+#define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
1001+#endif
1002+ env_dht_threshold = xstrtoul(secure_getenv("DFLTCC_DHT_MIN_SAMPLE_SIZE"),
1003+ DFLTCC_DHT_MIN_SAMPLE_SIZE);
1004+
1005+#ifndef DFLTCC_RIBM
1006+#define DFLTCC_RIBM 0
1007+#endif
1008+ env_ribm = xstrtoul(secure_getenv("DFLTCC_RIBM"), DFLTCC_RIBM);
1009+
1010+ memset(cpu_facilities, 0, sizeof(cpu_facilities));
1011+ r0 = sizeof(cpu_facilities) / sizeof(cpu_facilities[0]) - 1;
1012+ /* STFLE is supported since z9-109 and only in z/Architecture mode. When
1013+ * compiling with -m31, gcc defaults to ESA mode, however, since the kernel
1014+ * is 64-bit, it's always z/Architecture mode at runtime.
1015+ */
1016+ __asm__ volatile(
1017+#ifndef __clang__
1018+ ".machinemode push\n"
1019+ ".machinemode zarch\n"
1020+#endif
1021+ "stfle %[facilities]\n"
1022+#ifndef __clang__
1023+ ".machinemode pop\n"
1024+#endif
1025+ : [facilities] "=Q" (cpu_facilities)
1026+ , [r0] "+r" (r0)
1027+ :
1028+ : "cc");
1029+
1030+ /* Initialize available functions */
1031+ if (is_dfltcc_enabled())
1032+ dfltcc(DFLTCC_QAF, &cpu_af, NULL, NULL, NULL, NULL, NULL);
1033+ else
1034+ memset(&cpu_af, 0, sizeof(cpu_af));
1035+}
1036+
1037+/*
1038+ Memory management.
1039+
1040+ DFLTCC requires parameter blocks and window to be aligned. zlib allows
1041+ users to specify their own allocation functions, so using e.g.
1042+ `posix_memalign' is not an option. Thus, we overallocate and take the
1043+ aligned portion of the buffer.
1044+*/
1045+void ZLIB_INTERNAL dfltcc_reset(z_streamp strm, uInt size)
1046+{
1047+ struct dfltcc_state *dfltcc_state =
1048+ (struct dfltcc_state *)((char *)strm->state + ALIGN_UP(size, 8));
1049+
1050+ memcpy(&dfltcc_state->af, &cpu_af, sizeof(dfltcc_state->af));
1051+
1052+ if (env_source_date_epoch)
1053+ /* User needs reproducible results, but the output of DFLTCC_CMPR
1054+ * depends on buffers' page offsets.
1055+ */
1056+ clear_bit(dfltcc_state->af.fns, DFLTCC_CMPR);
1057+
1058+ /* Initialize parameter block */
1059+ memset(&dfltcc_state->param, 0, sizeof(dfltcc_state->param));
1060+ dfltcc_state->param.nt = 1;
1061+
1062+ /* Initialize tuning parameters */
1063+ dfltcc_state->level_mask = env_level_mask;
1064+ dfltcc_state->block_size = env_block_size;
1065+ dfltcc_state->block_threshold = env_block_threshold;
1066+ dfltcc_state->dht_threshold = env_dht_threshold;
1067+ dfltcc_state->param.ribm = env_ribm;
1068+}
1069+
1070+voidpf ZLIB_INTERNAL dfltcc_alloc_state(z_streamp strm, uInt items, uInt size)
1071+{
1072+ return ZALLOC(strm,
1073+ ALIGN_UP(items * size, 8) + sizeof(struct dfltcc_state),
1074+ sizeof(unsigned char));
1075+}
1076+
1077+void ZLIB_INTERNAL dfltcc_copy_state(voidpf dst, const voidpf src, uInt size)
1078+{
1079+ zmemcpy(dst, src, ALIGN_UP(size, 8) + sizeof(struct dfltcc_state));
1080+}
1081+
1082+static const int PAGE_ALIGN = 0x1000;
1083+
1084+voidpf ZLIB_INTERNAL dfltcc_alloc_window(z_streamp strm, uInt items, uInt size)
1085+{
1086+ voidpf p, w;
1087+
1088+ /* To simplify freeing, we store the pointer to the allocated buffer right
1089+ * before the window. Note that DFLTCC always uses HB_SIZE bytes.
1090+ */
1091+ p = ZALLOC(strm, sizeof(voidpf) + MAX(items * size, HB_SIZE) + PAGE_ALIGN,
1092+ sizeof(unsigned char));
1093+ if (p == NULL)
1094+ return NULL;
1095+ w = ALIGN_UP((char *)p + sizeof(voidpf), PAGE_ALIGN);
1096+ *(voidpf *)((char *)w - sizeof(voidpf)) = p;
1097+ return w;
1098+}
1099+
1100+void ZLIB_INTERNAL dfltcc_copy_window(void *dest, const void *src, size_t n)
1101+{
1102+ memcpy(dest, src, MAX(n, HB_SIZE));
1103+}
1104+
1105+void ZLIB_INTERNAL dfltcc_free_window(z_streamp strm, voidpf w)
1106+{
1107+ if (w)
1108+ ZFREE(strm, *(voidpf *)((unsigned char *)w - sizeof(voidpf)));
1109+}
1110+
1111+/*
1112+ Switching between hardware and software compression.
1113+
1114+ DFLTCC does not support all zlib settings, e.g. generation of non-compressed
1115+ blocks or alternative window sizes. When such settings are applied on the
1116+ fly with deflateParams, we need to convert between hardware and software
1117+ window formats.
1118+*/
1119+int ZLIB_INTERNAL dfltcc_deflate_params(z_streamp strm, int level,
1120+ int strategy, int *flush)
1121+{
1122+ deflate_state *state = (deflate_state *)strm->state;
1123+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
1124+ struct dfltcc_param_v0 *param = &dfltcc_state->param;
1125+ int could_deflate = dfltcc_can_deflate(strm);
1126+ int can_deflate = dfltcc_can_deflate_with_params(strm,
1127+ level,
1128+ state->w_bits,
1129+ strategy);
1130+
1131+ if (can_deflate == could_deflate)
1132+ /* We continue to work in the same mode - no changes needed */
1133+ return Z_OK;
1134+
1135+ if (strm->total_in == 0 && param->nt == 1 && param->hl == 0)
1136+ /* DFLTCC was not used yet - no changes needed */
1137+ return Z_OK;
1138+
1139+ /* For now, do not convert between window formats - simply get rid of the
1140+ * old data instead.
1141+ */
1142+ *flush = Z_FULL_FLUSH;
1143+ return Z_OK;
1144+}
1145+
1146+int ZLIB_INTERNAL dfltcc_deflate_done(z_streamp strm, int flush)
1147+{
1148+ deflate_state *state = (deflate_state *)strm->state;
1149+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
1150+ struct dfltcc_param_v0 *param = &dfltcc_state->param;
1151+
1152+ /* When deflate(Z_FULL_FLUSH) is called with small avail_out, it might
1153+ * close the block without resetting the compression state. Detect this
1154+ * situation and return that deflation is not done.
1155+ */
1156+ if (flush == Z_FULL_FLUSH && strm->avail_out == 0)
1157+ return 0;
1158+
1159+ /* Return that deflation is not done if DFLTCC is used and either it
1160+ * buffered some data (Continuation Flag is set), or has not written EOBS
1161+ * yet (Block-Continuation Flag is set).
1162+ */
1163+ return !dfltcc_can_deflate(strm) || (!param->cf && !param->bcf);
1164+}
1165+
1166+/*
1167+ Preloading history.
1168+*/
1169+local void append_history(struct dfltcc_param_v0 *param,
1170+ Bytef *history,
1171+ const Bytef *buf,
1172+ uInt count)
1173+{
1174+ size_t offset;
1175+ size_t n;
1176+
1177+ /* Do not use more than 32K */
1178+ if (count > HB_SIZE) {
1179+ buf += count - HB_SIZE;
1180+ count = HB_SIZE;
1181+ }
1182+ offset = (param->ho + param->hl) % HB_SIZE;
1183+ if (offset + count <= HB_SIZE)
1184+ /* Circular history buffer does not wrap - copy one chunk */
1185+ zmemcpy(history + offset, buf, count);
1186+ else {
1187+ /* Circular history buffer wraps - copy two chunks */
1188+ n = HB_SIZE - offset;
1189+ zmemcpy(history + offset, buf, n);
1190+ zmemcpy(history, buf + n, count - n);
1191+ }
1192+ n = param->hl + count;
1193+ if (n <= HB_SIZE)
1194+ /* All history fits into buffer - no need to discard anything */
1195+ param->hl = n;
1196+ else {
1197+ /* History does not fit into buffer - discard extra bytes */
1198+ param->ho = (param->ho + (n - HB_SIZE)) % HB_SIZE;
1199+ param->hl = HB_SIZE;
1200+ }
1201+}
1202+
1203+local void get_history(struct dfltcc_param_v0 *param,
1204+ const Bytef *history,
1205+ Bytef *buf)
1206+{
1207+ if (param->ho + param->hl <= HB_SIZE)
1208+ /* Circular history buffer does not wrap - copy one chunk */
1209+ memcpy(buf, history + param->ho, param->hl);
1210+ else {
1211+ /* Circular history buffer wraps - copy two chunks */
1212+ memcpy(buf, history + param->ho, HB_SIZE - param->ho);
1213+ memcpy(buf + HB_SIZE - param->ho, history, param->ho + param->hl - HB_SIZE);
1214+ }
1215+}
1216+
1217+int ZLIB_INTERNAL dfltcc_deflate_set_dictionary(z_streamp strm,
1218+ const Bytef *dictionary,
1219+ uInt dict_length)
1220+{
1221+ deflate_state *state = (deflate_state *)strm->state;
1222+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
1223+ struct dfltcc_param_v0 *param = &dfltcc_state->param;
1224+
1225+ append_history(param, state->window, dictionary, dict_length);
1226+ state->strstart = 1; /* Add FDICT to zlib header */
1227+ state->block_start = state->strstart; /* Make deflate_stored happy */
1228+ return Z_OK;
1229+}
1230+
1231+int ZLIB_INTERNAL dfltcc_deflate_get_dictionary(z_streamp strm,
1232+ Bytef *dictionary,
1233+ uInt *dict_length)
1234+{
1235+ deflate_state *state = (deflate_state *)strm->state;
1236+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
1237+ struct dfltcc_param_v0 *param = &dfltcc_state->param;
1238+
1239+ if (dictionary)
1240+ get_history(param, state->window, dictionary);
1241+ if (dict_length)
1242+ *dict_length = param->hl;
1243+ return Z_OK;
1244+}
1245+
1246+int ZLIB_INTERNAL dfltcc_inflate_set_dictionary(z_streamp strm,
1247+ const Bytef *dictionary,
1248+ uInt dict_length)
1249+{
1250+ struct inflate_state *state = (struct inflate_state *)strm->state;
1251+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
1252+ struct dfltcc_param_v0 *param = &dfltcc_state->param;
1253+
1254+ if (inflate_ensure_window(state)) {
1255+ state->mode = MEM;
1256+ return Z_MEM_ERROR;
1257+ }
1258+
1259+ append_history(param, state->window, dictionary, dict_length);
1260+ state->havedict = 1;
1261+ return Z_OK;
1262+}
1263+
1264+int ZLIB_INTERNAL dfltcc_inflate_get_dictionary(z_streamp strm,
1265+ Bytef *dictionary,
1266+ uInt *dict_length)
1267+{
1268+ struct inflate_state *state = (struct inflate_state *)strm->state;
1269+ struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
1270+ struct dfltcc_param_v0 *param = &dfltcc_state->param;
1271+
1272+ if (dictionary && state->window)
1273+ get_history(param, state->window, dictionary);
1274+ if (dict_length)
1275+ *dict_length = param->hl;
1276+ return Z_OK;
1277+}
1278diff --git a/contrib/s390/dfltcc.h b/contrib/s390/dfltcc.h
1279new file mode 100644
1280index 0000000..c8491c4
1281--- /dev/null
1282+++ b/contrib/s390/dfltcc.h
1283@@ -0,0 +1,97 @@
1284+#ifndef DFLTCC_H
1285+#define DFLTCC_H
1286+
1287+#include "../../zlib.h"
1288+#include "../../zutil.h"
1289+
1290+voidpf ZLIB_INTERNAL dfltcc_alloc_state(z_streamp strm, uInt items, uInt size);
1291+void ZLIB_INTERNAL dfltcc_copy_state(voidpf dst, const voidpf src, uInt size);
1292+void ZLIB_INTERNAL dfltcc_reset(z_streamp strm, uInt size);
1293+voidpf ZLIB_INTERNAL dfltcc_alloc_window(z_streamp strm, uInt items,
1294+ uInt size);
1295+void ZLIB_INTERNAL dfltcc_copy_window(void *dest, const void *src, size_t n);
1296+void ZLIB_INTERNAL dfltcc_free_window(z_streamp strm, voidpf w);
1297+#define DFLTCC_BLOCK_HEADER_BITS 3
1298+#define DFLTCC_HLITS_COUNT_BITS 5
1299+#define DFLTCC_HDISTS_COUNT_BITS 5
1300+#define DFLTCC_HCLENS_COUNT_BITS 4
1301+#define DFLTCC_MAX_HCLENS 19
1302+#define DFLTCC_HCLEN_BITS 3
1303+#define DFLTCC_MAX_HLITS 286
1304+#define DFLTCC_MAX_HDISTS 30
1305+#define DFLTCC_MAX_HLIT_HDIST_BITS 7
1306+#define DFLTCC_MAX_SYMBOL_BITS 16
1307+#define DFLTCC_MAX_EOBS_BITS 15
1308+#define DFLTCC_MAX_PADDING_BITS 7
1309+#define DEFLATE_BOUND_COMPLEN(source_len) \
1310+ ((DFLTCC_BLOCK_HEADER_BITS + \
1311+ DFLTCC_HLITS_COUNT_BITS + \
1312+ DFLTCC_HDISTS_COUNT_BITS + \
1313+ DFLTCC_HCLENS_COUNT_BITS + \
1314+ DFLTCC_MAX_HCLENS * DFLTCC_HCLEN_BITS + \
1315+ (DFLTCC_MAX_HLITS + DFLTCC_MAX_HDISTS) * DFLTCC_MAX_HLIT_HDIST_BITS + \
1316+ (source_len) * DFLTCC_MAX_SYMBOL_BITS + \
1317+ DFLTCC_MAX_EOBS_BITS + \
1318+ DFLTCC_MAX_PADDING_BITS) >> 3)
1319+int ZLIB_INTERNAL dfltcc_can_inflate(z_streamp strm);
1320+typedef enum {
1321+ DFLTCC_INFLATE_CONTINUE,
1322+ DFLTCC_INFLATE_BREAK,
1323+ DFLTCC_INFLATE_SOFTWARE,
1324+} dfltcc_inflate_action;
1325+dfltcc_inflate_action ZLIB_INTERNAL dfltcc_inflate(z_streamp strm,
1326+ int flush, int *ret);
1327+int ZLIB_INTERNAL dfltcc_was_inflate_used(z_streamp strm);
1328+int ZLIB_INTERNAL dfltcc_inflate_disable(z_streamp strm);
1329+int ZLIB_INTERNAL dfltcc_inflate_set_dictionary(z_streamp strm,
1330+ const Bytef *dictionary,
1331+ uInt dict_length);
1332+int ZLIB_INTERNAL dfltcc_inflate_get_dictionary(z_streamp strm,
1333+ Bytef *dictionary,
1334+ uInt* dict_length);
1335+
1336+#define ZALLOC_STATE dfltcc_alloc_state
1337+#define ZFREE_STATE ZFREE
1338+#define ZCOPY_STATE dfltcc_copy_state
1339+#define ZALLOC_WINDOW dfltcc_alloc_window
1340+#define ZCOPY_WINDOW dfltcc_copy_window
1341+#define ZFREE_WINDOW dfltcc_free_window
1342+#define TRY_FREE_WINDOW dfltcc_free_window
1343+#define INFLATE_RESET_KEEP_HOOK(strm) \
1344+ dfltcc_reset((strm), sizeof(struct inflate_state))
1345+#define INFLATE_PRIME_HOOK(strm, bits, value) \
1346+ do { if (dfltcc_inflate_disable((strm))) return Z_STREAM_ERROR; } while (0)
1347+#define INFLATE_TYPEDO_HOOK(strm, flush) \
1348+ if (dfltcc_can_inflate((strm))) { \
1349+ dfltcc_inflate_action action; \
1350+\
1351+ RESTORE(); \
1352+ action = dfltcc_inflate((strm), (flush), &ret); \
1353+ LOAD(); \
1354+ if (action == DFLTCC_INFLATE_CONTINUE) \
1355+ break; \
1356+ else if (action == DFLTCC_INFLATE_BREAK) \
1357+ goto inf_leave; \
1358+ }
1359+#define INFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_inflate((strm)))
1360+#define INFLATE_NEED_UPDATEWINDOW(strm) (!dfltcc_can_inflate((strm)))
1361+#define INFLATE_MARK_HOOK(strm) \
1362+ do { \
1363+ if (dfltcc_was_inflate_used((strm))) return -(1L << 16); \
1364+ } while (0)
1365+#define INFLATE_SYNC_POINT_HOOK(strm) \
1366+ do { \
1367+ if (dfltcc_was_inflate_used((strm))) return Z_STREAM_ERROR; \
1368+ } while (0)
1369+#define INFLATE_SET_DICTIONARY_HOOK(strm, dict, dict_len) \
1370+ do { \
1371+ if (dfltcc_can_inflate(strm)) \
1372+ return dfltcc_inflate_set_dictionary(strm, dict, dict_len); \
1373+ } while (0)
1374+#define INFLATE_GET_DICTIONARY_HOOK(strm, dict, dict_len) \
1375+ do { \
1376+ if (dfltcc_can_inflate(strm)) \
1377+ return dfltcc_inflate_get_dictionary(strm, dict, dict_len); \
1378+ } while (0)
1379+
1380+#endif
1381diff --git a/contrib/s390/dfltcc_deflate.h b/contrib/s390/dfltcc_deflate.h
1382new file mode 100644
1383index 0000000..2699d15
1384--- /dev/null
1385+++ b/contrib/s390/dfltcc_deflate.h
1386@@ -0,0 +1,53 @@
1387+#ifndef DFLTCC_DEFLATE_H
1388+#define DFLTCC_DEFLATE_H
1389+
1390+#include "dfltcc.h"
1391+
1392+int ZLIB_INTERNAL dfltcc_can_deflate(z_streamp strm);
1393+int ZLIB_INTERNAL dfltcc_deflate(z_streamp strm,
1394+ int flush,
1395+ block_state *result);
1396+int ZLIB_INTERNAL dfltcc_deflate_params(z_streamp strm, int level,
1397+ int strategy, int *flush);
1398+int ZLIB_INTERNAL dfltcc_deflate_done(z_streamp strm, int flush);
1399+int ZLIB_INTERNAL dfltcc_deflate_set_dictionary(z_streamp strm,
1400+ const Bytef *dictionary,
1401+ uInt dict_length);
1402+int ZLIB_INTERNAL dfltcc_deflate_get_dictionary(z_streamp strm,
1403+ Bytef *dictionary,
1404+ uInt* dict_length);
1405+
1406+#define DEFLATE_SET_DICTIONARY_HOOK(strm, dict, dict_len) \
1407+ do { \
1408+ if (dfltcc_can_deflate((strm))) \
1409+ return dfltcc_deflate_set_dictionary((strm), (dict), (dict_len)); \
1410+ } while (0)
1411+#define DEFLATE_GET_DICTIONARY_HOOK(strm, dict, dict_len) \
1412+ do { \
1413+ if (dfltcc_can_deflate((strm))) \
1414+ return dfltcc_deflate_get_dictionary((strm), (dict), (dict_len)); \
1415+ } while (0)
1416+#define DEFLATE_RESET_KEEP_HOOK(strm) \
1417+ dfltcc_reset((strm), sizeof(deflate_state))
1418+#define DEFLATE_PARAMS_HOOK(strm, level, strategy, hook_flush) \
1419+ do { \
1420+ int err; \
1421+\
1422+ err = dfltcc_deflate_params((strm), \
1423+ (level), \
1424+ (strategy), \
1425+ (hook_flush)); \
1426+ if (err == Z_STREAM_ERROR) \
1427+ return err; \
1428+ } while (0)
1429+#define DEFLATE_DONE dfltcc_deflate_done
1430+#define DEFLATE_BOUND_ADJUST_COMPLEN(strm, complen, source_len) \
1431+ do { \
1432+ if (deflateStateCheck((strm)) || dfltcc_can_deflate((strm))) \
1433+ (complen) = DEFLATE_BOUND_COMPLEN(source_len); \
1434+ } while (0)
1435+#define DEFLATE_NEED_CONSERVATIVE_BOUND(strm) (dfltcc_can_deflate((strm)))
1436+#define DEFLATE_HOOK dfltcc_deflate
1437+#define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))
1438+
1439+#endif
1440diff --git a/deflate.c b/deflate.c
1441index bd01175..9f5bc8b 100644
1442--- a/deflate.c
1443+++ b/deflate.c
1444@@ -60,12 +60,24 @@ const char deflate_copyright[] =
1445 copyright string in the executable of your product.
1446 */
1447
1448-typedef enum {
1449- need_more, /* block not completed, need more input or more output */
1450- block_done, /* block flush performed */
1451- finish_started, /* finish started, need only more output at next deflate */
1452- finish_done /* finish done, accept no more input or output */
1453-} block_state;
1454+#ifdef DFLTCC
1455+#include "contrib/s390/dfltcc_deflate.h"
1456+#else
1457+#define ZALLOC_STATE ZALLOC
1458+#define ZFREE_STATE ZFREE
1459+#define ZCOPY_STATE zmemcpy
1460+#define ZALLOC_WINDOW ZALLOC
1461+#define TRY_FREE_WINDOW TRY_FREE
1462+#define DEFLATE_SET_DICTIONARY_HOOK(strm, dict, dict_len) do {} while (0)
1463+#define DEFLATE_GET_DICTIONARY_HOOK(strm, dict, dict_len) do {} while (0)
1464+#define DEFLATE_RESET_KEEP_HOOK(strm) do {} while (0)
1465+#define DEFLATE_PARAMS_HOOK(strm, level, strategy, hook_flush) do {} while (0)
1466+#define DEFLATE_DONE(strm, flush) 1
1467+#define DEFLATE_BOUND_ADJUST_COMPLEN(strm, complen, sourceLen) do {} while (0)
1468+#define DEFLATE_NEED_CONSERVATIVE_BOUND(strm) 0
1469+#define DEFLATE_HOOK(strm, flush, bstate) 0
1470+#define DEFLATE_NEED_CHECKSUM(strm) 1
1471+#endif
1472
1473 typedef block_state (*compress_func)(deflate_state *s, int flush);
1474 /* Compression function. Returns the block state after the call. */
1475@@ -224,7 +236,8 @@ local unsigned read_buf(z_streamp strm, Bytef *buf, unsigned size) {
1476 strm->avail_in -= len;
1477
1478 zmemcpy(buf, strm->next_in, len);
1479- if (strm->state->wrap == 1) {
1480+ if (!DEFLATE_NEED_CHECKSUM(strm)) {}
1481+ else if (strm->state->wrap == 1) {
1482 strm->adler = adler32(strm->adler, buf, len);
1483 }
1484 #ifdef GZIP
1485@@ -429,7 +442,7 @@ int ZEXPORT deflateInit2_(z_streamp strm, int level, int method,
1486 return Z_STREAM_ERROR;
1487 }
1488 if (windowBits == 8) windowBits = 9; /* until 256-byte window bug fixed */
1489- s = (deflate_state *) ZALLOC(strm, 1, sizeof(deflate_state));
1490+ s = (deflate_state *) ZALLOC_STATE(strm, 1, sizeof(deflate_state));
1491 if (s == Z_NULL) return Z_MEM_ERROR;
1492 strm->state = (struct internal_state FAR *)s;
1493 s->strm = strm;
1494@@ -446,7 +459,7 @@ int ZEXPORT deflateInit2_(z_streamp strm, int level, int method,
1495 s->hash_mask = s->hash_size - 1;
1496 s->hash_shift = ((s->hash_bits + MIN_MATCH-1) / MIN_MATCH);
1497
1498- s->window = (Bytef *) ZALLOC(strm, s->w_size, 2*sizeof(Byte));
1499+ s->window = (Bytef *) ZALLOC_WINDOW(strm, s->w_size, 2*sizeof(Byte));
1500 s->prev = (Posf *) ZALLOC(strm, s->w_size, sizeof(Pos));
1501 s->head = (Posf *) ZALLOC(strm, s->hash_size, sizeof(Pos));
1502
1503@@ -559,6 +572,7 @@ int ZEXPORT deflateSetDictionary(z_streamp strm, const Bytef *dictionary,
1504 /* when using zlib wrappers, compute Adler-32 for provided dictionary */
1505 if (wrap == 1)
1506 strm->adler = adler32(strm->adler, dictionary, dictLength);
1507+ DEFLATE_SET_DICTIONARY_HOOK(strm, dictionary, dictLength);
1508 s->wrap = 0; /* avoid computing Adler-32 in read_buf */
1509
1510 /* if dictionary would fill window, just replace the history */
1511@@ -614,6 +628,7 @@ int ZEXPORT deflateGetDictionary(z_streamp strm, Bytef *dictionary,
1512
1513 if (deflateStateCheck(strm))
1514 return Z_STREAM_ERROR;
1515+ DEFLATE_GET_DICTIONARY_HOOK(strm, dictionary, dictLength);
1516 s = strm->state;
1517 len = s->strstart + s->lookahead;
1518 if (len > s->w_size)
1519@@ -658,6 +673,8 @@ int ZEXPORT deflateResetKeep(z_streamp strm) {
1520
1521 _tr_init(s);
1522
1523+ DEFLATE_RESET_KEEP_HOOK(strm);
1524+
1525 return Z_OK;
1526 }
1527
1528@@ -740,6 +757,7 @@ int ZEXPORT deflatePrime(z_streamp strm, int bits, int value) {
1529 int ZEXPORT deflateParams(z_streamp strm, int level, int strategy) {
1530 deflate_state *s;
1531 compress_func func;
1532+ int hook_flush = Z_NO_FLUSH;
1533
1534 if (deflateStateCheck(strm)) return Z_STREAM_ERROR;
1535 s = strm->state;
1536@@ -752,15 +770,18 @@ int ZEXPORT deflateParams(z_streamp strm, int level, int strategy) {
1537 if (level < 0 || level > 9 || strategy < 0 || strategy > Z_FIXED) {
1538 return Z_STREAM_ERROR;
1539 }
1540+ DEFLATE_PARAMS_HOOK(strm, level, strategy, &hook_flush);
1541 func = configuration_table[s->level].func;
1542
1543- if ((strategy != s->strategy || func != configuration_table[level].func) &&
1544- s->last_flush != -2) {
1545+ if (((strategy != s->strategy || func != configuration_table[level].func) &&
1546+ s->last_flush != -2) || hook_flush != Z_NO_FLUSH) {
1547 /* Flush the last buffer: */
1548- int err = deflate(strm, Z_BLOCK);
1549+ int flush = RANK(hook_flush) > RANK(Z_BLOCK) ? hook_flush : Z_BLOCK;
1550+ int err = deflate(strm, flush);
1551 if (err == Z_STREAM_ERROR)
1552 return err;
1553- if (strm->avail_in || (s->strstart - s->block_start) + s->lookahead)
1554+ if (strm->avail_in || (s->strstart - s->block_start) + s->lookahead ||
1555+ !DEFLATE_DONE(strm, flush))
1556 return Z_BUF_ERROR;
1557 }
1558 if (s->level != level) {
1559@@ -828,11 +849,13 @@ uLong ZEXPORT deflateBound(z_streamp strm, uLong sourceLen) {
1560 ~13% overhead plus a small constant */
1561 fixedlen = sourceLen + (sourceLen >> 3) + (sourceLen >> 8) +
1562 (sourceLen >> 9) + 4;
1563+ DEFLATE_BOUND_ADJUST_COMPLEN(strm, fixedlen, sourceLen);
1564
1565 /* upper bound for stored blocks with length 127 (memLevel == 1) --
1566 ~4% overhead plus a small constant */
1567 storelen = sourceLen + (sourceLen >> 5) + (sourceLen >> 7) +
1568 (sourceLen >> 11) + 7;
1569+ DEFLATE_BOUND_ADJUST_COMPLEN(strm, storelen, sourceLen);
1570
1571 /* if can't get parameters, return larger bound plus a zlib wrapper */
1572 if (deflateStateCheck(strm))
1573@@ -874,7 +897,8 @@ uLong ZEXPORT deflateBound(z_streamp strm, uLong sourceLen) {
1574 }
1575
1576 /* if not default parameters, return one of the conservative bounds */
1577- if (s->w_bits != 15 || s->hash_bits != 8 + 7)
1578+ if (DEFLATE_NEED_CONSERVATIVE_BOUND(strm) ||
1579+ s->w_bits != 15 || s->hash_bits != 8 + 7)
1580 return (s->w_bits <= s->hash_bits && s->level ? fixedlen : storelen) +
1581 wraplen;
1582
1583@@ -900,7 +924,7 @@ local void putShortMSB(deflate_state *s, uInt b) {
1584 * applications may wish to modify it to avoid allocating a large
1585 * strm->next_out buffer and copying into it. (See also read_buf()).
1586 */
1587-local void flush_pending(z_streamp strm) {
1588+void ZLIB_INTERNAL flush_pending(z_streamp strm) {
1589 unsigned len;
1590 deflate_state *s = strm->state;
1591
1592@@ -1167,7 +1191,8 @@ int ZEXPORT deflate(z_streamp strm, int flush) {
1593 (flush != Z_NO_FLUSH && s->status != FINISH_STATE)) {
1594 block_state bstate;
1595
1596- bstate = s->level == 0 ? deflate_stored(s, flush) :
1597+ bstate = DEFLATE_HOOK(strm, flush, &bstate) ? bstate :
1598+ s->level == 0 ? deflate_stored(s, flush) :
1599 s->strategy == Z_HUFFMAN_ONLY ? deflate_huff(s, flush) :
1600 s->strategy == Z_RLE ? deflate_rle(s, flush) :
1601 (*(configuration_table[s->level].func))(s, flush);
1602@@ -1214,7 +1239,6 @@ int ZEXPORT deflate(z_streamp strm, int flush) {
1603 }
1604
1605 if (flush != Z_FINISH) return Z_OK;
1606- if (s->wrap <= 0) return Z_STREAM_END;
1607
1608 /* Write the trailer */
1609 #ifdef GZIP
1610@@ -1230,7 +1254,7 @@ int ZEXPORT deflate(z_streamp strm, int flush) {
1611 }
1612 else
1613 #endif
1614- {
1615+ if (s->wrap == 1) {
1616 putShortMSB(s, (uInt)(strm->adler >> 16));
1617 putShortMSB(s, (uInt)(strm->adler & 0xffff));
1618 }
1619@@ -1239,7 +1263,11 @@ int ZEXPORT deflate(z_streamp strm, int flush) {
1620 * to flush the rest.
1621 */
1622 if (s->wrap > 0) s->wrap = -s->wrap; /* write the trailer only once! */
1623- return s->pending != 0 ? Z_OK : Z_STREAM_END;
1624+ if (s->pending == 0) {
1625+ Assert(s->bi_valid == 0, "bi_buf not flushed");
1626+ return Z_STREAM_END;
1627+ }
1628+ return Z_OK;
1629 }
1630
1631 /* ========================================================================= */
1632@@ -1254,9 +1282,9 @@ int ZEXPORT deflateEnd(z_streamp strm) {
1633 TRY_FREE(strm, strm->state->pending_buf);
1634 TRY_FREE(strm, strm->state->head);
1635 TRY_FREE(strm, strm->state->prev);
1636- TRY_FREE(strm, strm->state->window);
1637+ TRY_FREE_WINDOW(strm, strm->state->window);
1638
1639- ZFREE(strm, strm->state);
1640+ ZFREE_STATE(strm, strm->state);
1641 strm->state = Z_NULL;
1642
1643 return status == BUSY_STATE ? Z_DATA_ERROR : Z_OK;
1644@@ -1285,13 +1313,13 @@ int ZEXPORT deflateCopy(z_streamp dest, z_streamp source) {
1645
1646 zmemcpy((voidpf)dest, (voidpf)source, sizeof(z_stream));
1647
1648- ds = (deflate_state *) ZALLOC(dest, 1, sizeof(deflate_state));
1649+ ds = (deflate_state *) ZALLOC_STATE(dest, 1, sizeof(deflate_state));
1650 if (ds == Z_NULL) return Z_MEM_ERROR;
1651 dest->state = (struct internal_state FAR *) ds;
1652- zmemcpy((voidpf)ds, (voidpf)ss, sizeof(deflate_state));
1653+ ZCOPY_STATE((voidpf)ds, (voidpf)ss, sizeof(deflate_state));
1654 ds->strm = dest;
1655
1656- ds->window = (Bytef *) ZALLOC(dest, ds->w_size, 2*sizeof(Byte));
1657+ ds->window = (Bytef *) ZALLOC_WINDOW(dest, ds->w_size, 2*sizeof(Byte));
1658 ds->prev = (Posf *) ZALLOC(dest, ds->w_size, sizeof(Pos));
1659 ds->head = (Posf *) ZALLOC(dest, ds->hash_size, sizeof(Pos));
1660 ds->pending_buf = (uchf *) ZALLOC(dest, ds->lit_bufsize, 4);
1661diff --git a/deflate.h b/deflate.h
1662index 8696791..d49e698 100644
1663--- a/deflate.h
1664+++ b/deflate.h
1665@@ -299,6 +299,7 @@ void ZLIB_INTERNAL _tr_flush_bits(deflate_state *s);
1666 void ZLIB_INTERNAL _tr_align(deflate_state *s);
1667 void ZLIB_INTERNAL _tr_stored_block(deflate_state *s, charf *buf,
1668 ulg stored_len, int last);
1669+void ZLIB_INTERNAL _tr_send_bits(deflate_state *s, int value, int length);
1670
1671 #define d_code(dist) \
1672 ((dist) < 256 ? _dist_code[dist] : _dist_code[256+((dist)>>7)])
1673@@ -343,4 +344,15 @@ void ZLIB_INTERNAL _tr_stored_block(deflate_state *s, charf *buf,
1674 flush = _tr_tally(s, distance, length)
1675 #endif
1676
1677+typedef enum {
1678+ need_more, /* block not completed, need more input or more output */
1679+ block_done, /* block flush performed */
1680+ finish_started, /* finish started, need only more output at next deflate */
1681+ finish_done /* finish done, accept no more input or output */
1682+} block_state;
1683+
1684+unsigned ZLIB_INTERNAL bi_reverse(unsigned code, int len);
1685+void ZLIB_INTERNAL bi_windup(deflate_state *s);
1686+void ZLIB_INTERNAL flush_pending(z_streamp strm);
1687+
1688 #endif /* DEFLATE_H */
1689diff --git a/gzguts.h b/gzguts.h
1690index f937504..5adfd1d 100644
1691--- a/gzguts.h
1692+++ b/gzguts.h
1693@@ -152,7 +152,11 @@
1694
1695 /* default i/o buffer size -- double this for output when reading (this and
1696 twice this must be able to fit in an unsigned type) */
1697+#ifdef DFLTCC
1698+#define GZBUFSIZE 131072
1699+#else
1700 #define GZBUFSIZE 8192
1701+#endif
1702
1703 /* gzip modes, also provide a little integrity check on the passed structure */
1704 #define GZ_NONE 0
1705diff --git a/inflate.c b/inflate.c
1706index b0757a9..c0f808f 100644
The diff has been truncated for viewing.
