Code review comment for lp:~louis/apport/apport-unpack-extract

Revision history for this message
Martin Pitt (pitti) wrote :

Hey Louis,

thanks for working on this! The general approach looks fine to me, I
just have some small things to fix.

Louis Bouchard [2015-01-26 14:42 -0000]:
> Changes to apport-unpack are only meaningful for Ubuntu Precise, so
> they could be brought in as a debian patch, hence dropped from that
> MP, while keeping the enhancement to problem_report.

I think it's fine to keep that in trunk. We might have other large
attachments which are better handled that way.

> === modified file 'bin/apport-unpack'
> --- bin/apport-unpack 2012-05-04 06:54:56 +0000
> +++ bin/apport-unpack 2015-01-26 14:41:51 +0000
> @@ -49,18 +49,28 @@
> except OSError as e:
> fatal(str(e))
>
> +binaries = ()

Please make that a list or a set; using a tuple in this way is
confusing. It would also be nice if you could name this "bin_keys"?

> for k in pr:
> + if pr[k] == u'':

No u'' please, just ''; u'' isn't compatible with earlier Python 3.x.

> + binaries += (k,)

Following the above, bin_keys.append(k) (list) or .add(k) (set).

> + with open(report, 'rb') as f:
> + for element in binaries:

"key" instead of element, please? Let's consistently call the report
contents "key" and "value".

> + def extract(self, file, item=None, directory='/tmp'):
> + '''Extract only one element from the problem_report
> + directly without loading the report beforehand
> + This is required for Kernel Crash Dumps that can be
> + very big and saturate the RAM
> + '''

Short descriptions need to be one line. Please rename "item" to "key",
and drop the None default for it, as None will turn the whole thing
into a no-op. Also, don't specify a default for directory; such
defaults are for CLI programs, and libraries should *never* default to
/tmp unless they very carefully avoid overwriting existing files in a
race free manner (i. e. O_CREAT|O_EXCL).

Finally, to further point out that this is for a single key instead of
the whole report, could this be named extract_key()?

> + self._assert_bin_mode(file)
> + self.data.clear()
> + key = None
> + value = None
> + b64_block = False
> + bd = None
> + # Make sure the report is at the beginning
> + file.seek(0)

I'd like to avoid this. First, it's unexpected (and undocumented), you
might deliberately call this on an open file at an offset, and second
it won't work for non-seekable fds such as reading from stdin.

> + for line in file:
> + # continuation line
> + if line.startswith(b' '):
> + if not b64_block:
> + continue
> + assert (key is not None and value is not None)
> + if b64_block:
> + l = base64.b64decode(line)
> + if bd:
> + out.write(bd.decompress(l))
> + else:
> + # lazy initialization of bd
> + # skip gzip header, if present
> + if l.startswith(b'\037\213\010'):
> + bd = zlib.decompressobj(-zlib.MAX_WBITS)
> + out.write(bd.decompress(self._strip_gzip_header(l)))
> + else:
> + # legacy zlib-only format used default block
> + # size
> + bd = zlib.decompressobj()
> + out.write(bd.decompress(l))
> + else:
> + if len(value) > 0:
> + out.write('{}'.format(b'\n'))
> + if line.endswith(b'\n'):
> + out.write('{}'.format(line[1:-1]))
> + else:
> + out.write('{}'.format(line[1:]))
> + else:
> + if b64_block:
> + if bd:
> + value += bd.flush()
> + b64_block = False
> + bd = None
> + if key:
> + assert value is not None
> + self.data[key] = self._try_unicode(value)
> + (key, value) = line.split(b':', 1)
> + if not _python2:
> + key = key.decode('ASCII')
> + if key != item:
> + continue
> + value = value.strip()
> + if value == b'base64':
> + value = b''
> + b64_block = True
> + try:
> + out=open(os.path.join(directory, item), 'wb')
> + except IOError as e:
> + fatal(str(e))

This logic can become a simpler as you know precisely what to scan for
at first: a line equal to key + b':\n'. If it's not found, this should
throw a KeyError, otherwise you can start uncompressing/writing from
there and quit once you arrive at the next key without having to read
the remainder of the file.

> +
> + if key is not None:
> + self.data[key] = self._try_unicode(value)
> +
> + self.old_keys = set(self.data.keys())

These two shouldn't happen as you don't modify the report.

> --- test/test_problem_report.py 2012-10-11 05:56:35 +0000
> +++ test/test_problem_report.py 2015-01-26 14:41:51 +0000
> @@ -258,6 +266,52 @@
> pr.load(BytesIO(b'ProblemType: Crash'))
> self.assertEqual(list(pr.keys()), ['ProblemType'])
>
> +
> + def test_extract(self):
> + '''extract() with various binary elements.'''
> +
> + # create a test report with binary elements
> + large_val = b'A' * 5000000
> +
> + pr = problem_report.ProblemReport()
> + pr['Txt'] = 'some text'
> + pr['MoreTxt'] = 'some more text'
> + pr['Foo'] = problem_report.CompressedValue(b'FooFoo!')
> + pr['Uncompressed'] = bin_data
> + pr['Bin'] = problem_report.CompressedValue()
> + pr['Bin'].set_value(bin_data)
> + pr['Large'] = problem_report.CompressedValue(large_val)
> + pr['Multiline'] = problem_report.CompressedValue(b'\1\1\1\n\2\2\n\3\3\3')

This is really binary, so I would just drop this bit.

> + report = BytesIO()
> + pr.write(report)
> + report.seek(0)
> +
> + #Extracts nothing if non binary
> + pr.extract(report, 'Txt', self.workdir )
> + self.assertEqual(os.path.exists(os.path.join(self.workdir, 'Txt')), False)

This should raise a ValueError instead. Silently not doing anything on
wrong input (i. e. an ASCII key) is very un-Pythonic.

Also, don't use assertEqual(..., True|False) -- use assertTrue() and
assertFalse(). Verifying that it does not write a file is still a good
thing, of course (after the assertRaises).

> + #Check inexistant element
> + pr.extract(report, 'Bar', self.workdir )
> + self.assertEqual(os.path.exists(os.path.join(self.workdir, 'Bar')), False)

This should raise a KeyError.

> + #Check valid elements
> + pr.extract(report, 'Foo', self.workdir )
> + element = open(os.path.join(self.workdir, 'Foo'))
> + self.assertEqual(element.read(), b'FooFoo!' )

Please don't leave fd leaks. Use "with open(...) as f:". Same for the
tests below.

Finally, there are a lot of PEP-8 errors in this. Can you please
install "pep8" and fix the issues in

  pep8 -r --ignore=E401,E501,W291,W293 test/test_problem_report.py
  pep8 -r --ignore=E401,E501,E124 problem_report.py

That's what test/run does, BTW. You can just start that, and as soon
as it starts running the module tests you are good. It will fail very
quickly on pep8/pyflakes errors.

Thanks!

Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

« Back to merge proposal