Merge lp:~vila/bzr/1116079-gzip-compat into lp:bzr

Proposed by Vincent Ladeuil on 2013-07-09
Status: Merged
Approved by: Robert Collins on 2013-07-09
Approved revision: 6582
Merged at revision: 6580
Proposed branch: lp:~vila/bzr/1116079-gzip-compat
Merge into: lp:bzr
Diff against target: 268 lines (+122/-100)
3 files modified
bzrlib/tests/test_tuned_gzip.py (+6/-3)
bzrlib/tuned_gzip.py (+111/-97)
doc/en/release-notes/bzr-2.6.txt (+5/-0)
To merge this branch: bzr merge lp:~vila/bzr/1116079-gzip-compat
Reviewer Review Type Date Requested Status
John A Meinel Approve on 2013-07-09
Robert Collins (community) 2013-07-09 Approve on 2013-07-09
Review via email: mp+173666@code.launchpad.net

Commit message

Fix test failure for tuned_gzip.

Description of the change

gzip.py has changed in 2.7, AFAIU, we don't really need tuned_gzip.py
anymore but the deprecation has never been completed.

This fix does mainly two things:

- fix assertToGzip so the failing test_enormous_chunks doesn't flood the
  ouput with 256*1024 'a large string\n' twice, i.e. 7.864.320 bytes ! I
  suspect the test writer never had this test fail...

- catch up with gzip.py internal design evolution.

That's the minimal effort to get the test suite passing.

To post a comment you must log in.
Robert Collins (lifeless) wrote :

Looks good to me.

review: Approve
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-07-09 12:04, Vincent Ladeuil wrote:
> Vincent Ladeuil has proposed merging
> lp:~vila/bzr/1116079-gzip-compat into lp:bzr.
>
> Requested reviews: bzr-core (bzr-core) Related bugs: Bug #1116079
> in Bazaar: "Test
> bzrlib.tests.test_tuned_gzip.TestToGzip.test_enormous_chunk fails -
> potential regression in python2.7 2.7.3-15ubuntu1"
> https://bugs.launchpad.net/bzr/+bug/1116079
>
> For more details, see:
> https://code.launchpad.net/~vila/bzr/1116079-gzip-compat/+merge/173666
>
> gzip.py has changed in 2.7, AFAIU, we don't really need
> tuned_gzip.py anymore but the deprecation has never been
> completed.
>
> This fix does mainly two things:
>
>
> - fix assertToGzip so the failing test_enormous_chunks doesn't
> flood the ouput with 256*1024 'a large string\n' twice, i.e.
> 7.864.320 bytes ! I suspect the test writer never had this test
> fail...
>
> - catch up with gzip.py internal design evolution.
>
> That's the minimal effort to get the test suite passing.
>

+ lraw, ldecoded = len(raw_bytes), len(decoded)
+ self.assertEqual(lraw, ldecoded,
+ 'Expecting data length %d, got %d' % (lraw,
ldecoded))
+ self.assertEqual(raw_bytes, decoded)

Why not turn that into:

if raw_bytes != decoded:
  self.fail("Raw bytes did not match (not outputting due to size)")

Someone who wants to investigate can do a debug print, but we won't
dump 7MB that nobody can actively use if the length happens to match.

I would personally be fine just dropping tuned_gzip altogether in
favor of just using upstream's gzip (since it is only deprecated
formats anyway).

But this change is ok, too.

 review: approve

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (Cygwin)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlHbx4EACgkQJdeBCYSNAAORRgCcCmtTNU9Y+QO0KF3UPxrt7DbC
+dEAoM39VVlz6DVbinOPUODYepznv4Oy
=Tq6l
-----END PGP SIGNATURE-----

review: Approve
Vincent Ladeuil (vila) wrote :

sent to pqm by email

lp:~vila/bzr/1116079-gzip-compat updated on 2013-07-13
6583. By Vincent Ladeuil on 2013-07-13

Urgh. pqm still runs python 2.6 so we have to maintain compatibility to land the fix 8-(

Vincent Ladeuil (vila) wrote :

sent to pqm by email

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/tests/test_tuned_gzip.py'
2--- bzrlib/tests/test_tuned_gzip.py 2011-05-13 12:51:05 +0000
3+++ bzrlib/tests/test_tuned_gzip.py 2013-07-13 19:08:24 +0000
4@@ -106,14 +106,17 @@
5 class TestToGzip(tests.TestCase):
6
7 def assertToGzip(self, chunks):
8- bytes = ''.join(chunks)
9+ raw_bytes = ''.join(chunks)
10 gzfromchunks = tuned_gzip.chunks_to_gzip(chunks)
11- gzfrombytes = tuned_gzip.bytes_to_gzip(bytes)
12+ gzfrombytes = tuned_gzip.bytes_to_gzip(raw_bytes)
13 self.assertEqual(gzfrombytes, gzfromchunks)
14 decoded = self.applyDeprecated(
15 symbol_versioning.deprecated_in((2, 3, 0)),
16 tuned_gzip.GzipFile, fileobj=StringIO(gzfromchunks)).read()
17- self.assertEqual(bytes, decoded)
18+ lraw, ldecoded = len(raw_bytes), len(decoded)
19+ self.assertEqual(lraw, ldecoded,
20+ 'Expecting data length %d, got %d' % (lraw, ldecoded))
21+ self.assertEqual(raw_bytes, decoded)
22
23 def test_single_chunk(self):
24 self.assertToGzip(['a modest chunk\nwith some various\nbits\n'])
25
26=== modified file 'bzrlib/tuned_gzip.py'
27--- bzrlib/tuned_gzip.py 2011-12-19 13:23:58 +0000
28+++ bzrlib/tuned_gzip.py 2013-07-13 19:08:24 +0000
29@@ -127,15 +127,28 @@
30 DeprecationWarning, stacklevel=2)
31 gzip.GzipFile.__init__(self, *args, **kwargs)
32
33- def _add_read_data(self, data):
34- # 4169 calls in 183
35- # temp var for len(data) and switch to +='s.
36- # 4169 in 139
37- len_data = len(data)
38- self.crc = zlib.crc32(data, self.crc)
39- self.extrabuf += data
40- self.extrasize += len_data
41- self.size += len_data
42+ if sys.version_info >= (2, 7, 4):
43+ def _add_read_data(self, data):
44+ # 4169 calls in 183
45+ # temp var for len(data) and switch to +='s.
46+ # 4169 in 139
47+ len_data = len(data)
48+ self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
49+ offset = self.offset - self.extrastart
50+ self.extrabuf = self.extrabuf[offset:] + data
51+ self.extrasize = self.extrasize + len_data
52+ self.extrastart = self.offset
53+ self.size = self.size + len_data
54+ else:
55+ def _add_read_data(self, data):
56+ # 4169 calls in 183
57+ # temp var for len(data) and switch to +='s.
58+ # 4169 in 139
59+ len_data = len(data)
60+ self.crc = zlib.crc32(data, self.crc)
61+ self.extrabuf += data
62+ self.extrasize += len_data
63+ self.size += len_data
64
65 def _write_gzip_header(self):
66 """A tuned version of gzip._write_gzip_header
67@@ -161,97 +174,98 @@
68 '' # self.fileobj.write(fname + '\000')
69 )
70
71- def _read(self, size=1024):
72- # various optimisations:
73- # reduces lsprof count from 2500 to
74- # 8337 calls in 1272, 365 internal
75- if self.fileobj is None:
76- raise EOFError, "Reached EOF"
77-
78- if self._new_member:
79- # If the _new_member flag is set, we have to
80- # jump to the next member, if there is one.
81- #
82- # First, check if we're at the end of the file;
83- # if so, it's time to stop; no more members to read.
84- next_header_bytes = self.fileobj.read(10)
85- if next_header_bytes == '':
86+ if sys.version_info < (2, 7, 4):
87+ def _read(self, size=1024):
88+ # various optimisations:
89+ # reduces lsprof count from 2500 to
90+ # 8337 calls in 1272, 365 internal
91+ if self.fileobj is None:
92 raise EOFError, "Reached EOF"
93
94- self._init_read()
95- self._read_gzip_header(next_header_bytes)
96- self.decompress = zlib.decompressobj(-zlib.MAX_WBITS)
97- self._new_member = False
98-
99- # Read a chunk of data from the file
100- buf = self.fileobj.read(size)
101-
102- # If the EOF has been reached, flush the decompression object
103- # and mark this object as finished.
104-
105- if buf == "":
106- self._add_read_data(self.decompress.flush())
107- if len(self.decompress.unused_data) < 8:
108- raise AssertionError("what does flush do?")
109- self._gzip_tail = self.decompress.unused_data[0:8]
110- self._read_eof()
111- # tell the driving read() call we have stuffed all the data
112- # in self.extrabuf
113- raise EOFError, 'Reached EOF'
114-
115- self._add_read_data(self.decompress.decompress(buf))
116-
117- if self.decompress.unused_data != "":
118- # Ending case: we've come to the end of a member in the file,
119- # so seek back to the start of the data for the next member which
120- # is the length of the decompress objects unused data - the first
121- # 8 bytes for the end crc and size records.
122- #
123- # so seek back to the start of the unused data, finish up
124- # this member, and read a new gzip header.
125- # (The number of bytes to seek back is the length of the unused
126- # data, minus 8 because those 8 bytes are part of this member.
127- seek_length = len (self.decompress.unused_data) - 8
128- if seek_length > 0:
129- # we read too much data
130- self.fileobj.seek(-seek_length, 1)
131+ if self._new_member:
132+ # If the _new_member flag is set, we have to
133+ # jump to the next member, if there is one.
134+ #
135+ # First, check if we're at the end of the file;
136+ # if so, it's time to stop; no more members to read.
137+ next_header_bytes = self.fileobj.read(10)
138+ if next_header_bytes == '':
139+ raise EOFError, "Reached EOF"
140+
141+ self._init_read()
142+ self._read_gzip_header(next_header_bytes)
143+ self.decompress = zlib.decompressobj(-zlib.MAX_WBITS)
144+ self._new_member = False
145+
146+ # Read a chunk of data from the file
147+ buf = self.fileobj.read(size)
148+
149+ # If the EOF has been reached, flush the decompression object
150+ # and mark this object as finished.
151+
152+ if buf == "":
153+ self._add_read_data(self.decompress.flush())
154+ if len(self.decompress.unused_data) < 8:
155+ raise AssertionError("what does flush do?")
156 self._gzip_tail = self.decompress.unused_data[0:8]
157- elif seek_length < 0:
158- # we haven't read enough to check the checksum.
159- if not (-8 < seek_length):
160- raise AssertionError("too great a seek")
161- buf = self.fileobj.read(-seek_length)
162- self._gzip_tail = self.decompress.unused_data + buf
163- else:
164- self._gzip_tail = self.decompress.unused_data
165-
166- # Check the CRC and file size, and set the flag so we read
167- # a new member on the next call
168- self._read_eof()
169- self._new_member = True
170-
171- def _read_eof(self):
172- """tuned to reduce function calls and eliminate file seeking:
173- pass 1:
174- reduces lsprof count from 800 to 288
175- 4168 in 296
176- avoid U32 call by using struct format L
177- 4168 in 200
178- """
179- # We've read to the end of the file, so we should have 8 bytes of
180- # unused data in the decompressor. If we don't, there is a corrupt file.
181- # We use these 8 bytes to calculate the CRC and the recorded file size.
182- # We then check the that the computed CRC and size of the
183- # uncompressed data matches the stored values. Note that the size
184- # stored is the true file size mod 2**32.
185- if not (len(self._gzip_tail) == 8):
186- raise AssertionError("gzip trailer is incorrect length.")
187- crc32, isize = struct.unpack("<LL", self._gzip_tail)
188- # note that isize is unsigned - it can exceed 2GB
189- if crc32 != U32(self.crc):
190- raise IOError, "CRC check failed %d %d" % (crc32, U32(self.crc))
191- elif isize != LOWU32(self.size):
192- raise IOError, "Incorrect length of data produced"
193+ self._read_eof()
194+ # tell the driving read() call we have stuffed all the data
195+ # in self.extrabuf
196+ raise EOFError, 'Reached EOF'
197+
198+ self._add_read_data(self.decompress.decompress(buf))
199+
200+ if self.decompress.unused_data != "":
201+ # Ending case: we've come to the end of a member in the file,
202+ # so seek back to the start of the data for the next member
203+ # which is the length of the decompress objects unused data -
204+ # the first 8 bytes for the end crc and size records.
205+ #
206+ # so seek back to the start of the unused data, finish up
207+ # this member, and read a new gzip header.
208+ # (The number of bytes to seek back is the length of the unused
209+ # data, minus 8 because those 8 bytes are part of this member.
210+ seek_length = len (self.decompress.unused_data) - 8
211+ if seek_length > 0:
212+ # we read too much data
213+ self.fileobj.seek(-seek_length, 1)
214+ self._gzip_tail = self.decompress.unused_data[0:8]
215+ elif seek_length < 0:
216+ # we haven't read enough to check the checksum.
217+ if not (-8 < seek_length):
218+ raise AssertionError("too great a seek")
219+ buf = self.fileobj.read(-seek_length)
220+ self._gzip_tail = self.decompress.unused_data + buf
221+ else:
222+ self._gzip_tail = self.decompress.unused_data
223+
224+ # Check the CRC and file size, and set the flag so we read
225+ # a new member on the next call
226+ self._read_eof()
227+ self._new_member = True
228+
229+ def _read_eof(self):
230+ """tuned to reduce function calls and eliminate file seeking:
231+ pass 1:
232+ reduces lsprof count from 800 to 288
233+ 4168 in 296
234+ avoid U32 call by using struct format L
235+ 4168 in 200
236+ """
237+ # We've read to the end of the file, so we should have 8 bytes of
238+ # unused data in the decompressor. If we don't, there is a corrupt
239+ # file. We use these 8 bytes to calculate the CRC and the recorded
240+ # file size. We then check the that the computed CRC and size of
241+ # the uncompressed data matches the stored values. Note that the
242+ # size stored is the true file size mod 2**32.
243+ if not (len(self._gzip_tail) == 8):
244+ raise AssertionError("gzip trailer is incorrect length.")
245+ crc32, isize = struct.unpack("<LL", self._gzip_tail)
246+ # note that isize is unsigned - it can exceed 2GB
247+ if crc32 != U32(self.crc):
248+ raise IOError, "CRC check failed %d %d" % (crc32, U32(self.crc))
249+ elif isize != LOWU32(self.size):
250+ raise IOError, "Incorrect length of data produced"
251
252 def _read_gzip_header(self, bytes=None):
253 """Supply bytes if the minimum header size is already read.
254
255=== modified file 'doc/en/release-notes/bzr-2.6.txt'
256--- doc/en/release-notes/bzr-2.6.txt 2013-05-27 09:13:55 +0000
257+++ doc/en/release-notes/bzr-2.6.txt 2013-07-13 19:08:24 +0000
258@@ -103,6 +103,11 @@
259 * The launchpad plugin now requires API 1.6.0 or later. This version shipped
260 with Ubuntu 9.10. (Aaron Bentley)
261
262+* Better align with upstream gzip.py in tuned_gzip.py. We may lose a bit of
263+ performance but that's for knit and weave formats and already partly
264+ deprecated, better keep compatibility than failing fast ;)
265+ (Vincent Ladeuil, #1116079)
266+
267 Testing
268 *******
269