Merge lp:~vila/bzr/1116079-gzip-compat into lp:bzr

Proposed by Vincent Ladeuil on 2013-07-09
Status: Merged
Approved by: Robert Collins on 2013-07-09
Approved revision: 6582
Merged at revision: 6580
Proposed branch: lp:~vila/bzr/1116079-gzip-compat
Merge into: lp:bzr
Diff against target: 268 lines (+122/-100)
3 files modified
bzrlib/tests/test_tuned_gzip.py (+6/-3)
bzrlib/tuned_gzip.py (+111/-97)
doc/en/release-notes/bzr-2.6.txt (+5/-0)
To merge this branch: bzr merge lp:~vila/bzr/1116079-gzip-compat
Reviewer          Review type  Date requested  Status
John A Meinel                                  Approve on 2013-07-09
Robert Collins    community    2013-07-09      Approve on 2013-07-09
Review via email: mp+173666@code.launchpad.net

Commit message

Fix test failure for tuned_gzip.

Description of the change

gzip.py has changed in 2.7; AFAIU we don't really need tuned_gzip.py
anymore, but the deprecation was never completed.

This fix mainly does two things:

- fix assertToGzip so the failing test_enormous_chunks doesn't flood the
  output with 256*1024 'a large string\n' twice, i.e. 7,864,320 bytes! I
  suspect the test writer never had this test fail...

- catch up with gzip.py internal design evolution.

That's the minimal effort to get the test suite passing.
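For context, one concrete piece of that internal evolution is how gzip.py keeps its running CRC: newer versions mask the value to an unsigned 32-bit integer. A minimal sketch of why (Python 3 syntax here, though the patch itself targets Python 2):

```python
import zlib

# zlib.crc32 historically returned a signed int on some Python 2
# builds; masking with 0xffffffff normalizes it to the unsigned
# 32-bit value that the gzip trailer actually stores.
crc = zlib.crc32(b"a large string\n")
masked = crc & 0xffffffff
assert 0 <= masked <= 0xffffffff
```

Without the mask, comparing a computed CRC against the trailer's stored value can fail spuriously across interpreter versions.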

Robert Collins (lifeless) wrote :

Looks good to me.

review: Approve
John A Meinel (jameinel) wrote :


On 2013-07-09 12:04, Vincent Ladeuil wrote:
> Vincent Ladeuil has proposed merging
> lp:~vila/bzr/1116079-gzip-compat into lp:bzr.
>
> Requested reviews: bzr-core (bzr-core) Related bugs: Bug #1116079
> in Bazaar: "Test
> bzrlib.tests.test_tuned_gzip.TestToGzip.test_enormous_chunk fails -
> potential regression in python2.7 2.7.3-15ubuntu1"
> https://bugs.launchpad.net/bzr/+bug/1116079
>
> For more details, see:
> https://code.launchpad.net/~vila/bzr/1116079-gzip-compat/+merge/173666
>
> [...]

+ lraw, ldecoded = len(raw_bytes), len(decoded)
+ self.assertEqual(lraw, ldecoded,
+                  'Expecting data length %d, got %d' % (lraw, ldecoded))
+ self.assertEqual(raw_bytes, decoded)

Why not turn that into:

if raw_bytes != decoded:
  self.fail("Raw bytes did not match (not outputting due to size)")

Someone who wants to investigate can do a debug print, but we won't
dump 7MB that nobody can actively use if the length happens to match.
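John's suggestion can be sketched as a plain helper (hypothetical name, standing in for the real TestCase method):

```python
def assert_bytes_match(raw_bytes, decoded):
    # Fail with lengths only: a 7MB payload mismatch should not be
    # dumped wholesale into the test output, where nobody can use it.
    if raw_bytes != decoded:
        raise AssertionError(
            'Raw bytes did not match (%d vs %d bytes; not outputting '
            'due to size)' % (len(raw_bytes), len(decoded)))

# Matching payloads pass silently.
assert_bytes_match(b'a' * 1024, b'a' * 1024)
```

The trade-off is deliberate: a short failure message in exchange for losing the automatic diff, which is worthless at this payload size anyway.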

I would personally be fine just dropping tuned_gzip altogether in
favor of just using upstream's gzip (since it is only deprecated
formats anyway).

But this change is ok, too.

 review: approve

John
=:->


review: Approve
Vincent Ladeuil (vila) wrote :

sent to pqm by email

lp:~vila/bzr/1116079-gzip-compat updated on 2013-07-13
6583. By Vincent Ladeuil on 2013-07-13

Urgh. pqm still runs python 2.6 so we have to maintain compatibility to land the fix 8-(

Vincent Ladeuil (vila) wrote :

sent to pqm by email

Preview Diff

=== modified file 'bzrlib/tests/test_tuned_gzip.py'
--- bzrlib/tests/test_tuned_gzip.py	2011-05-13 12:51:05 +0000
+++ bzrlib/tests/test_tuned_gzip.py	2013-07-13 19:08:24 +0000
@@ -106,14 +106,17 @@
 class TestToGzip(tests.TestCase):
 
     def assertToGzip(self, chunks):
-        bytes = ''.join(chunks)
+        raw_bytes = ''.join(chunks)
         gzfromchunks = tuned_gzip.chunks_to_gzip(chunks)
-        gzfrombytes = tuned_gzip.bytes_to_gzip(bytes)
+        gzfrombytes = tuned_gzip.bytes_to_gzip(raw_bytes)
         self.assertEqual(gzfrombytes, gzfromchunks)
         decoded = self.applyDeprecated(
             symbol_versioning.deprecated_in((2, 3, 0)),
             tuned_gzip.GzipFile, fileobj=StringIO(gzfromchunks)).read()
-        self.assertEqual(bytes, decoded)
+        lraw, ldecoded = len(raw_bytes), len(decoded)
+        self.assertEqual(lraw, ldecoded,
+            'Expecting data length %d, got %d' % (lraw, ldecoded))
+        self.assertEqual(raw_bytes, decoded)
 
     def test_single_chunk(self):
         self.assertToGzip(['a modest chunk\nwith some various\nbits\n'])
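The round trip assertToGzip performs can be approximated with the stdlib gzip module alone (a sketch only; the real test uses bzrlib's chunks_to_gzip/bytes_to_gzip helpers):

```python
import gzip
import io

# Compress some chunks, decompress, then compare lengths before the
# full payloads, mirroring the cheap-check-first order of the patch.
chunks = ['a modest chunk\nwith some various\nbits\n'] * 4
raw_bytes = ''.join(chunks).encode('ascii')

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as out:
    out.write(raw_bytes)

decoded = gzip.GzipFile(fileobj=io.BytesIO(buf.getvalue())).read()
assert len(decoded) == len(raw_bytes)  # cheap length check first
assert decoded == raw_bytes
```

Checking lengths first means a failure reports two integers rather than two multi-megabyte payloads.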
=== modified file 'bzrlib/tuned_gzip.py'
--- bzrlib/tuned_gzip.py	2011-12-19 13:23:58 +0000
+++ bzrlib/tuned_gzip.py	2013-07-13 19:08:24 +0000
@@ -127,15 +127,28 @@
         DeprecationWarning, stacklevel=2)
         gzip.GzipFile.__init__(self, *args, **kwargs)
 
-    def _add_read_data(self, data):
-        # 4169 calls in 183
-        # temp var for len(data) and switch to +='s.
-        # 4169 in 139
-        len_data = len(data)
-        self.crc = zlib.crc32(data, self.crc)
-        self.extrabuf += data
-        self.extrasize += len_data
-        self.size += len_data
+    if sys.version_info >= (2, 7, 4):
+        def _add_read_data(self, data):
+            # 4169 calls in 183
+            # temp var for len(data) and switch to +='s.
+            # 4169 in 139
+            len_data = len(data)
+            self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
+            offset = self.offset - self.extrastart
+            self.extrabuf = self.extrabuf[offset:] + data
+            self.extrasize = self.extrasize + len_data
+            self.extrastart = self.offset
+            self.size = self.size + len_data
+    else:
+        def _add_read_data(self, data):
+            # 4169 calls in 183
+            # temp var for len(data) and switch to +='s.
+            # 4169 in 139
+            len_data = len(data)
+            self.crc = zlib.crc32(data, self.crc)
+            self.extrabuf += data
+            self.extrasize += len_data
+            self.size += len_data
 
     def _write_gzip_header(self):
         """A tuned version of gzip._write_gzip_header
@@ -161,97 +174,98 @@
             '' # self.fileobj.write(fname + '\000')
             )
 
-    def _read(self, size=1024):
-        # various optimisations:
-        # reduces lsprof count from 2500 to
-        # 8337 calls in 1272, 365 internal
-        if self.fileobj is None:
-            raise EOFError, "Reached EOF"
-
-        if self._new_member:
-            # If the _new_member flag is set, we have to
-            # jump to the next member, if there is one.
-            #
-            # First, check if we're at the end of the file;
-            # if so, it's time to stop; no more members to read.
-            next_header_bytes = self.fileobj.read(10)
-            if next_header_bytes == '':
-                raise EOFError, "Reached EOF"
-
-            self._init_read()
-            self._read_gzip_header(next_header_bytes)
-            self.decompress = zlib.decompressobj(-zlib.MAX_WBITS)
-            self._new_member = False
-
-        # Read a chunk of data from the file
-        buf = self.fileobj.read(size)
-
-        # If the EOF has been reached, flush the decompression object
-        # and mark this object as finished.
-
-        if buf == "":
-            self._add_read_data(self.decompress.flush())
-            if len(self.decompress.unused_data) < 8:
-                raise AssertionError("what does flush do?")
-            self._gzip_tail = self.decompress.unused_data[0:8]
-            self._read_eof()
-            # tell the driving read() call we have stuffed all the data
-            # in self.extrabuf
-            raise EOFError, 'Reached EOF'
-
-        self._add_read_data(self.decompress.decompress(buf))
-
-        if self.decompress.unused_data != "":
-            # Ending case: we've come to the end of a member in the file,
-            # so seek back to the start of the data for the next member which
-            # is the length of the decompress objects unused data - the first
-            # 8 bytes for the end crc and size records.
-            #
-            # so seek back to the start of the unused data, finish up
-            # this member, and read a new gzip header.
-            # (The number of bytes to seek back is the length of the unused
-            # data, minus 8 because those 8 bytes are part of this member.
-            seek_length = len (self.decompress.unused_data) - 8
-            if seek_length > 0:
-                # we read too much data
-                self.fileobj.seek(-seek_length, 1)
-                self._gzip_tail = self.decompress.unused_data[0:8]
-            elif seek_length < 0:
-                # we haven't read enough to check the checksum.
-                if not (-8 < seek_length):
-                    raise AssertionError("too great a seek")
-                buf = self.fileobj.read(-seek_length)
-                self._gzip_tail = self.decompress.unused_data + buf
-            else:
-                self._gzip_tail = self.decompress.unused_data
-
-            # Check the CRC and file size, and set the flag so we read
-            # a new member on the next call
-            self._read_eof()
-            self._new_member = True
-
-    def _read_eof(self):
-        """tuned to reduce function calls and eliminate file seeking:
-        pass 1:
-        reduces lsprof count from 800 to 288
-        4168 in 296
-        avoid U32 call by using struct format L
-        4168 in 200
-        """
-        # We've read to the end of the file, so we should have 8 bytes of
-        # unused data in the decompressor. If we don't, there is a corrupt file.
-        # We use these 8 bytes to calculate the CRC and the recorded file size.
-        # We then check the that the computed CRC and size of the
-        # uncompressed data matches the stored values. Note that the size
-        # stored is the true file size mod 2**32.
-        if not (len(self._gzip_tail) == 8):
-            raise AssertionError("gzip trailer is incorrect length.")
-        crc32, isize = struct.unpack("<LL", self._gzip_tail)
-        # note that isize is unsigned - it can exceed 2GB
-        if crc32 != U32(self.crc):
-            raise IOError, "CRC check failed %d %d" % (crc32, U32(self.crc))
-        elif isize != LOWU32(self.size):
-            raise IOError, "Incorrect length of data produced"
+    if sys.version_info < (2, 7, 4):
+        def _read(self, size=1024):
+            # various optimisations:
+            # reduces lsprof count from 2500 to
+            # 8337 calls in 1272, 365 internal
+            if self.fileobj is None:
+                raise EOFError, "Reached EOF"
+
+            if self._new_member:
+                # If the _new_member flag is set, we have to
+                # jump to the next member, if there is one.
+                #
+                # First, check if we're at the end of the file;
+                # if so, it's time to stop; no more members to read.
+                next_header_bytes = self.fileobj.read(10)
+                if next_header_bytes == '':
+                    raise EOFError, "Reached EOF"
+
+                self._init_read()
+                self._read_gzip_header(next_header_bytes)
+                self.decompress = zlib.decompressobj(-zlib.MAX_WBITS)
+                self._new_member = False
+
+            # Read a chunk of data from the file
+            buf = self.fileobj.read(size)
+
+            # If the EOF has been reached, flush the decompression object
+            # and mark this object as finished.
+
+            if buf == "":
+                self._add_read_data(self.decompress.flush())
+                if len(self.decompress.unused_data) < 8:
+                    raise AssertionError("what does flush do?")
+                self._gzip_tail = self.decompress.unused_data[0:8]
+                self._read_eof()
+                # tell the driving read() call we have stuffed all the data
+                # in self.extrabuf
+                raise EOFError, 'Reached EOF'
+
+            self._add_read_data(self.decompress.decompress(buf))
+
+            if self.decompress.unused_data != "":
+                # Ending case: we've come to the end of a member in the file,
+                # so seek back to the start of the data for the next member
+                # which is the length of the decompress objects unused data -
+                # the first 8 bytes for the end crc and size records.
+                #
+                # so seek back to the start of the unused data, finish up
+                # this member, and read a new gzip header.
+                # (The number of bytes to seek back is the length of the unused
+                # data, minus 8 because those 8 bytes are part of this member.
+                seek_length = len (self.decompress.unused_data) - 8
+                if seek_length > 0:
+                    # we read too much data
+                    self.fileobj.seek(-seek_length, 1)
+                    self._gzip_tail = self.decompress.unused_data[0:8]
+                elif seek_length < 0:
+                    # we haven't read enough to check the checksum.
+                    if not (-8 < seek_length):
+                        raise AssertionError("too great a seek")
+                    buf = self.fileobj.read(-seek_length)
+                    self._gzip_tail = self.decompress.unused_data + buf
+                else:
+                    self._gzip_tail = self.decompress.unused_data
+
+                # Check the CRC and file size, and set the flag so we read
+                # a new member on the next call
+                self._read_eof()
+                self._new_member = True
+
+        def _read_eof(self):
+            """tuned to reduce function calls and eliminate file seeking:
+            pass 1:
+            reduces lsprof count from 800 to 288
+            4168 in 296
+            avoid U32 call by using struct format L
+            4168 in 200
+            """
+            # We've read to the end of the file, so we should have 8 bytes of
+            # unused data in the decompressor. If we don't, there is a corrupt
+            # file. We use these 8 bytes to calculate the CRC and the recorded
+            # file size. We then check the that the computed CRC and size of
+            # the uncompressed data matches the stored values. Note that the
+            # size stored is the true file size mod 2**32.
+            if not (len(self._gzip_tail) == 8):
+                raise AssertionError("gzip trailer is incorrect length.")
+            crc32, isize = struct.unpack("<LL", self._gzip_tail)
+            # note that isize is unsigned - it can exceed 2GB
+            if crc32 != U32(self.crc):
+                raise IOError, "CRC check failed %d %d" % (crc32, U32(self.crc))
+            elif isize != LOWU32(self.size):
+                raise IOError, "Incorrect length of data produced"
 
     def _read_gzip_header(self, bytes=None):
         """Supply bytes if the minimum header size is already read.
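The patch's central trick, selecting a method implementation at class-definition time from sys.version_info, can be sketched generically (hypothetical class, Python 3 syntax):

```python
import sys

class VersionedReader(object):
    # An `if` in a class body runs once, when the class is defined,
    # so only the branch matching the running interpreter ever binds
    # its method; there is no per-call version check.
    if sys.version_info >= (2, 7, 4):
        def internals(self):
            return 'post-2.7.4 layout'
    else:
        def internals(self):
            return 'pre-2.7.4 layout'
```

This keeps each interpreter paired with the gzip.py internals it actually ships, at zero runtime cost.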
=== modified file 'doc/en/release-notes/bzr-2.6.txt'
--- doc/en/release-notes/bzr-2.6.txt	2013-05-27 09:13:55 +0000
+++ doc/en/release-notes/bzr-2.6.txt	2013-07-13 19:08:24 +0000
@@ -103,6 +103,11 @@
 * The launchpad plugin now requires API 1.6.0 or later. This version shipped
   with Ubuntu 9.10. (Aaron Bentley)
 
+* Better align with upstream gzip.py in tuned_gzip.py. We may lose a bit of
+  performance but that's for knit and weave formats and already partly
+  deprecated, better keep compatibility than failing fast ;)
+  (Vincent Ladeuil, #1116079)
+
 Testing
 *******
 