Bazaar

Merge lp:~vila/bzr/1116079-gzip-compat into lp:bzr

1116079-gzip-compat
Merge into bzr.dev

Proposed by Vincent Ladeuil on 2013-07-09

Status:

Merged

Approved by:

Robert Collins on 2013-07-09

Approved revision:

no longer in the source branch.

Merged at revision:

6580

Proposed branch:

lp:~vila/bzr/1116079-gzip-compat

Merge into:

lp:bzr

Diff against target:

268 lines (+122/-100)

3 files modified

bzrlib/tests/test_tuned_gzip.py (+6/-3)
bzrlib/tuned_gzip.py (+111/-97)
doc/en/release-notes/bzr-2.6.txt (+5/-0)

To merge this branch:

bzr merge lp:~vila/bzr/1116079-gzip-compat

High

Fix Released

Reviewer	Date Requested	Status
John A Meinel		Approve on 2013-07-09
Robert Collins (community)	2013-07-09	Approve on 2013-07-09
Review via email: mp+173666@code.launchpad.net

Commit message

Fix test failure for tuned_gzip.

Description of the change

gzip.py has changed in Python 2.7. AFAIU, we don't really need tuned_gzip.py
anymore, but the deprecation has never been completed.

This fix mainly does two things:

- fix assertToGzip so the failing test_enormous_chunks doesn't flood the
output with 256*1024 copies of 'a large string\n' twice, i.e. 7,864,320
bytes! I suspect the test writer never had this test fail...

- catch up with gzip.py internal design evolution.

That's the minimal effort to get the test suite passing.
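
The first point can be sketched as follows. This is a minimal illustration and not the bzrlib code: bytes_to_gzip here is a hypothetical stand-in built on the standard gzip module, and TestRoundTrip and assertRoundTrip are made-up names. Only the length-first assertion mirrors the change to assertToGzip.

```python
import gzip
import io
import unittest


def bytes_to_gzip(raw_bytes):
    # Hypothetical stand-in for tuned_gzip.bytes_to_gzip, using the stdlib.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as f:
        f.write(raw_bytes)
    return buf.getvalue()


class TestRoundTrip(unittest.TestCase):

    def assertRoundTrip(self, chunks):
        raw_bytes = b''.join(chunks)
        decoded = gzip.GzipFile(
            fileobj=io.BytesIO(bytes_to_gzip(raw_bytes))).read()
        lraw, ldecoded = len(raw_bytes), len(decoded)
        # Compare lengths first: a mismatch fails here with a short message
        # instead of a dump of both payloads.
        self.assertEqual(lraw, ldecoded,
                         'Expecting data length %d, got %d' % (lraw, ldecoded))
        self.assertEqual(raw_bytes, decoded)

    def test_enormous_chunks(self):
        self.assertRoundTrip([b'a large string\n'] * 1024 * 256)


# Size of the data a plain assertEqual may print twice when it fails.
payload_size = len(b'a large string\n') * 256 * 1024   # 15 * 262144
flood_size = 2 * payload_size
```

The arithmetic matches the figure quoted above: 15 bytes per chunk times 262144 chunks is 3,932,160 bytes, and printing both sides of the comparison doubles that to 7,864,320.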

Robert Collins (lifeless) wrote on 2013-07-09:

Looks good to me.

review: Approve

John A Meinel (jameinel) wrote on 2013-07-09:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-07-09 12:04, Vincent Ladeuil wrote:
> Vincent Ladeuil has proposed merging
> lp:~vila/bzr/1116079-gzip-compat into lp:bzr.
>
> Requested reviews: bzr-core (bzr-core) Related bugs: Bug #1116079
> in Bazaar: "Test
> bzrlib.tests.test_tuned_gzip.TestToGzip.test_enormous_chunk fails -
> potential regression in python2.7 2.7.3-15ubuntu1"
> https://bugs.launchpad.net/bzr/+bug/1116079
>
> For more details, see:
> https://code.launchpad.net/~vila/bzr/1116079-gzip-compat/+merge/173666
>
> gzip.py has changed in 2.7, AFAIU, we don't really need
> tuned_gzip.py anymore but the deprecation has never been
> completed.
>
> This fix does mainly two things:
>
>
> - fix assertToGzip so the failing test_enormous_chunks doesn't
> flood the output with 256*1024 'a large string\n' twice, i.e.
> 7.864.320 bytes ! I suspect the test writer never had this test
> fail...
>
> - catch up with gzip.py internal design evolution.
>
> That's the minimal effort to get the test suite passing.
>

+        lraw, ldecoded = len(raw_bytes), len(decoded)
+        self.assertEqual(lraw, ldecoded,
+                         'Expecting data length %d, got %d' % (lraw, ldecoded))
+        self.assertEqual(raw_bytes, decoded)

Why not turn that into:

if raw_bytes != decoded:
    self.fail("Raw bytes did not match (not outputting due to size)")

Someone who wants to investigate can do a debug print, but we won't
dump 7MB that nobody can actively use if the length happens to match.

I would personally be fine just dropping tuned_gzip altogether in
favor of just using upstream's gzip (since it is only deprecated
formats anyway).

But this change is ok, too.

review: approve

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (Cygwin)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlHbx4EACgkQJdeBCYSNAAORRgCcCmtTNU9Y+QO0KF3UPxrt7DbC
+dEAoM39VVlz6DVbinOPUODYepznv4Oy
=Tq6l
-----END PGP SIGNATURE-----

review: Approve

Vincent Ladeuil (vila) wrote on 2013-07-13:

sent to pqm by email

Preview Diff

 === modified file 'bzrlib/tests/test_tuned_gzip.py'
 --- bzrlib/tests/test_tuned_gzip.py	2011-05-13 12:51:05 +0000
 +++ bzrlib/tests/test_tuned_gzip.py	2013-07-13 19:08:24 +0000
@@ -106,14 +106,17 @@
  class TestToGzip(tests.TestCase):
      def assertToGzip(self, chunks):
--        bytes = ''.join(chunks)
++        raw_bytes = ''.join(chunks)
          gzfromchunks = tuned_gzip.chunks_to_gzip(chunks)
--        gzfrombytes = tuned_gzip.bytes_to_gzip(bytes)
++        gzfrombytes = tuned_gzip.bytes_to_gzip(raw_bytes)
          self.assertEqual(gzfrombytes, gzfromchunks)
          decoded = self.applyDeprecated(
              symbol_versioning.deprecated_in((2, 3, 0)),
              tuned_gzip.GzipFile, fileobj=StringIO(gzfromchunks)).read()
--        self.assertEqual(bytes, decoded)
++        lraw, ldecoded = len(raw_bytes), len(decoded)
++        self.assertEqual(lraw, ldecoded,
++                         'Expecting data length %d, got %d' % (lraw, ldecoded))
++        self.assertEqual(raw_bytes, decoded)
      def test_single_chunk(self):
          self.assertToGzip(['a modest chunk\nwith some various\nbits\n'])
 === modified file 'bzrlib/tuned_gzip.py'
 --- bzrlib/tuned_gzip.py	2011-12-19 13:23:58 +0000
 +++ bzrlib/tuned_gzip.py	2013-07-13 19:08:24 +0000
@@ -127,15 +127,28 @@
              DeprecationWarning, stacklevel=2)
          gzip.GzipFile.__init__(self, *args, **kwargs)
--    def _add_read_data(self, data):
--        # 4169 calls in 183
--        # temp var for len(data) and switch to +='s.
--        # 4169 in 139
--        len_data = len(data)
--        self.crc = zlib.crc32(data, self.crc)
--        self.extrabuf += data
--        self.extrasize += len_data
--        self.size += len_data
++    if sys.version_info >= (2, 7, 4):
++        def _add_read_data(self, data):
++            # 4169 calls in 183
++            # temp var for len(data) and switch to +='s.
++            # 4169 in 139
++            len_data = len(data)
++            self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
++            offset = self.offset - self.extrastart
++            self.extrabuf = self.extrabuf[offset:] + data
++            self.extrasize = self.extrasize + len_data
++            self.extrastart = self.offset
++            self.size = self.size + len_data
++    else:
++        def _add_read_data(self, data):
++            # 4169 calls in 183
++            # temp var for len(data) and switch to +='s.
++            # 4169 in 139
++            len_data = len(data)
++            self.crc = zlib.crc32(data, self.crc)
++            self.extrabuf += data
++            self.extrasize += len_data
++            self.size += len_data
      def _write_gzip_header(self):
          """A tuned version of gzip._write_gzip_header
@@ -161,97 +174,98 @@
              ''          #     self.fileobj.write(fname + '\000')
+             )
--    def _read(self, size=1024):
--        # various optimisations:
--        # reduces lsprof count from 2500 to
--        # 8337 calls in 1272, 365 internal
--        if self.fileobj is None:
--            raise EOFError, "Reached EOF"
--
--        if self._new_member:
--            # If the _new_member flag is set, we have to
--            # jump to the next member, if there is one.
--            #
--            # First, check if we're at the end of the file;
--            # if so, it's time to stop; no more members to read.
--            next_header_bytes = self.fileobj.read(10)
--            if next_header_bytes == '':
++    if sys.version_info < (2, 7, 4):
++        def _read(self, size=1024):
++            # various optimisations:
++            # reduces lsprof count from 2500 to
++            # 8337 calls in 1272, 365 internal
++            if self.fileobj is None:
                  raise EOFError, "Reached EOF"
--            self._init_read()
--            self._read_gzip_header(next_header_bytes)
--            self.decompress = zlib.decompressobj(-zlib.MAX_WBITS)
--            self._new_member = False
--
--        # Read a chunk of data from the file
--        buf = self.fileobj.read(size)
--
--        # If the EOF has been reached, flush the decompression object
--        # and mark this object as finished.
--
--        if buf == "":
--            self._add_read_data(self.decompress.flush())
--            if len(self.decompress.unused_data) < 8:
--                raise AssertionError("what does flush do?")
--            self._gzip_tail = self.decompress.unused_data[0:8]
--            self._read_eof()
--            # tell the driving read() call we have stuffed all the data
--            # in self.extrabuf
--            raise EOFError, 'Reached EOF'
--
--        self._add_read_data(self.decompress.decompress(buf))
--
--        if self.decompress.unused_data != "":
--            # Ending case: we've come to the end of a member in the file,
--            # so seek back to the start of the data for the next member which
--            # is the length of the decompress objects unused data - the first
--            # 8 bytes for the end crc and size records.
--            #
--            # so seek back to the start of the unused data, finish up
--            # this member, and read a new gzip header.
--            # (The number of bytes to seek back is the length of the unused
--            # data, minus 8 because those 8 bytes are part of this member.
--            seek_length = len (self.decompress.unused_data) - 8
--            if seek_length > 0:
--                # we read too much data
--                self.fileobj.seek(-seek_length, 1)
++            if self._new_member:
++                # If the _new_member flag is set, we have to
++                # jump to the next member, if there is one.
++                #
++                # First, check if we're at the end of the file;
++                # if so, it's time to stop; no more members to read.
++                next_header_bytes = self.fileobj.read(10)
++                if next_header_bytes == '':
++                    raise EOFError, "Reached EOF"
++
++                self._init_read()
++                self._read_gzip_header(next_header_bytes)
++                self.decompress = zlib.decompressobj(-zlib.MAX_WBITS)
++                self._new_member = False
++
++            # Read a chunk of data from the file
++            buf = self.fileobj.read(size)
++
++            # If the EOF has been reached, flush the decompression object
++            # and mark this object as finished.
++
++            if buf == "":
++                self._add_read_data(self.decompress.flush())
++                if len(self.decompress.unused_data) < 8:
++                    raise AssertionError("what does flush do?")
                  self._gzip_tail = self.decompress.unused_data[0:8]
--            elif seek_length < 0:
--                # we haven't read enough to check the checksum.
--                if not (-8 < seek_length):
--                    raise AssertionError("too great a seek")
--                buf = self.fileobj.read(-seek_length)
--                self._gzip_tail = self.decompress.unused_data + buf
--            else:
--                self._gzip_tail = self.decompress.unused_data
--
--            # Check the CRC and file size, and set the flag so we read
--            # a new member on the next call
--            self._read_eof()
--            self._new_member = True
--
--    def _read_eof(self):
--        """tuned to reduce function calls and eliminate file seeking:
--        pass 1:
--        reduces lsprof count from 800 to 288
--        4168 in 296
--        avoid U32 call by using struct format L
--        4168 in 200
--        """
--        # We've read to the end of the file, so we should have 8 bytes of
--        # unused data in the decompressor. If we don't, there is a corrupt file.
--        # We use these 8 bytes to calculate the CRC and the recorded file size.
--        # We then check the that the computed CRC and size of the
--        # uncompressed data matches the stored values.  Note that the size
--        # stored is the true file size mod 2**32.
--        if not (len(self._gzip_tail) == 8):
--            raise AssertionError("gzip trailer is incorrect length.")
--        crc32, isize = struct.unpack("<LL", self._gzip_tail)
--        # note that isize is unsigned - it can exceed 2GB
--        if crc32 != U32(self.crc):
--            raise IOError, "CRC check failed %d %d" % (crc32, U32(self.crc))
--        elif isize != LOWU32(self.size):
--            raise IOError, "Incorrect length of data produced"
++                self._read_eof()
++                # tell the driving read() call we have stuffed all the data
++                # in self.extrabuf
++                raise EOFError, 'Reached EOF'
++
++            self._add_read_data(self.decompress.decompress(buf))
++
++            if self.decompress.unused_data != "":
++                # Ending case: we've come to the end of a member in the file,
++                # so seek back to the start of the data for the next member
++                # which is the length of the decompress objects unused data -
++                # the first 8 bytes for the end crc and size records.
++                #
++                # so seek back to the start of the unused data, finish up
++                # this member, and read a new gzip header.
++                # (The number of bytes to seek back is the length of the unused
++                # data, minus 8 because those 8 bytes are part of this member.
++                seek_length = len (self.decompress.unused_data) - 8
++                if seek_length > 0:
++                    # we read too much data
++                    self.fileobj.seek(-seek_length, 1)
++                    self._gzip_tail = self.decompress.unused_data[0:8]
++                elif seek_length < 0:
++                    # we haven't read enough to check the checksum.
++                    if not (-8 < seek_length):
++                        raise AssertionError("too great a seek")
++                    buf = self.fileobj.read(-seek_length)
++                    self._gzip_tail = self.decompress.unused_data + buf
++                else:
++                    self._gzip_tail = self.decompress.unused_data
++
++                # Check the CRC and file size, and set the flag so we read
++                # a new member on the next call
++                self._read_eof()
++                self._new_member = True
++
++        def _read_eof(self):
++            """tuned to reduce function calls and eliminate file seeking:
++            pass 1:
++            reduces lsprof count from 800 to 288
++            4168 in 296
++            avoid U32 call by using struct format L
++            4168 in 200
++            """
++            # We've read to the end of the file, so we should have 8 bytes of
++            # unused data in the decompressor. If we don't, there is a corrupt
++            # file.  We use these 8 bytes to calculate the CRC and the recorded
++            # file size.  We then check the that the computed CRC and size of
++            # the uncompressed data matches the stored values.  Note that the
++            # size stored is the true file size mod 2**32.
++            if not (len(self._gzip_tail) == 8):
++                raise AssertionError("gzip trailer is incorrect length.")
++            crc32, isize = struct.unpack("<LL", self._gzip_tail)
++            # note that isize is unsigned - it can exceed 2GB
++            if crc32 != U32(self.crc):
++                raise IOError, "CRC check failed %d %d" % (crc32, U32(self.crc))
++            elif isize != LOWU32(self.size):
++                raise IOError, "Incorrect length of data produced"
      def _read_gzip_header(self, bytes=None):
          """Supply bytes if the minimum header size is already read.
 === modified file 'doc/en/release-notes/bzr-2.6.txt'
 --- doc/en/release-notes/bzr-2.6.txt	2013-05-27 09:13:55 +0000
 +++ doc/en/release-notes/bzr-2.6.txt	2013-07-13 19:08:24 +0000
@@ -103,6 +103,11 @@
  * The launchpad plugin now requires API 1.6.0 or later.  This version shipped
    with Ubuntu 9.10.  (Aaron Bentley)
++* Better align with upstream gzip.py in tuned_gzip.py. We may lose a bit of
++  performance but that's for knit and weave formats and already partly
++  deprecated, better keep compatibility than failing fast ;)
++  (Vincent Ladeuil, #1116079)
++
  Testing
  *******
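
The version switch used for _add_read_data in the diff above can be illustrated outside bzrlib. The sketch below is a much-reduced, hypothetical stand-in (ReadBuffer and consume are made-up names, and consume only approximates how a reader advances through the buffer); it assumes the attribute names shown in the diff and drops the Python 2 long suffix from the CRC mask so that it also runs on Python 3.

```python
import sys
import zlib


class ReadBuffer(object):
    """Hypothetical, reduced stand-in for the patched GzipFile."""

    def __init__(self):
        self.crc = zlib.crc32(b'') & 0xffffffff
        self.extrabuf = b''
        self.extrasize = 0
        self.extrastart = 0
        self.offset = 0
        self.size = 0

    # The condition is evaluated once, while the class body executes, so
    # only one of the two definitions ends up on the class.
    if sys.version_info >= (2, 7, 4):
        def _add_read_data(self, data):
            # Mask the CRC so it is always an unsigned 32-bit value.
            self.crc = zlib.crc32(data, self.crc) & 0xffffffff
            # Drop bytes that were already handed out, then append.
            offset = self.offset - self.extrastart
            self.extrabuf = self.extrabuf[offset:] + data
            self.extrasize = self.extrasize + len(data)
            self.extrastart = self.offset
            self.size = self.size + len(data)
    else:
        def _add_read_data(self, data):
            self.crc = zlib.crc32(data, self.crc)
            self.extrabuf += data
            self.extrasize += len(data)
            self.size += len(data)

    def consume(self, n):
        # Hypothetical helper: record that n buffered bytes were returned.
        self.offset += n
        self.extrasize -= n


buf = ReadBuffer()
buf._add_read_data(b'hello ')
buf.consume(6)
buf._add_read_data(b'world')
```

On an interpreter at or above 2.7.4 the first branch is selected, so after the second call the buffer keeps only the unread bytes while size and crc cover everything added so far.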