thanks for working on this! The general approach looks fine to me, I
just have some small things to fix.
Louis Bouchard [2015-01-26 14:42 -0000]:
> Changes to apport-unpack are only meaningful for Ubuntu Precise, so
> they could be brought in as a debian patch, hence dropped from that
> MP, while keeping the enhancement to problem_report.
I think it's fine to keep that in trunk. We might have other large
attachments which are better handled that way.
Please make that a list or a set; using a tuple in this way is
confusing. It would also be nice if you could name this "bin_keys"?
> for k in pr:
> + if pr[k] == u'':
No u'' please, just ''; u'' isn't compatible with earlier Python 3.x.
> + binaries += (k,)
Following the above, bin_keys.append(k) (list) or .add(k) (set).
> + with open(report, 'rb') as f:
> + for element in binaries:
"key" instead of element, please? Let's consistently call the report
contents "key" and "value".
> + def extract(self, file, item=None, directory='/tmp'):
> + '''Extract only one element from the problem_report
> + directly without loading the report beforehand
> + This is required for Kernel Crash Dumps that can be
> + very big and saturate the RAM
> + '''
Short descriptions need to be one line. Please rename "item" to "key",
and drop the None default for it, as None will turn the whole thing
into a no-op. Also, don't specify a default for directory; such
defaults are for CLI programs, and libraries should *never* default to
/tmp unless they very carefully avoid overwriting existing files in a
race free manner (i. e. O_CREAT|O_EXCL).
Finally, to further point out that this is for a single key instead of
the whole report, could this be named extract_key()?
> + self._assert_bin_mode(file)
> + self.data.clear()
> + key = None
> + value = None
> + b64_block = False
> + bd = None
> + # Make sure the report is at the beginning
> + file.seek(0)
I'd like to avoid this. First, it's unexpected (and undocumented), you
might deliberately call this on an open file at an offset, and second
it won't work for non-seekable fds such as reading from stdin.
> + for line in file:
> + # continuation line
> + if line.startswith(b' '):
> + if not b64_block:
> + continue
> + assert (key is not None and value is not None)
> + if b64_block:
> + l = base64.b64decode(line)
> + if bd:
> + out.write(bd.decompress(l))
> + else:
> + # lazy initialization of bd
> + # skip gzip header, if present
> + if l.startswith(b'\037\213\010'):
> + bd = zlib.decompressobj(-zlib.MAX_WBITS)
> + out.write(bd.decompress(self._strip_gzip_header(l)))
> + else:
> + # legacy zlib-only format used default block
> + # size
> + bd = zlib.decompressobj()
> + out.write(bd.decompress(l))
> + else:
> + if len(value) > 0:
> + out.write('{}'.format(b'\n'))
> + if line.endswith(b'\n'):
> + out.write('{}'.format(line[1:-1]))
> + else:
> + out.write('{}'.format(line[1:]))
> + else:
> + if b64_block:
> + if bd:
> + value += bd.flush()
> + b64_block = False
> + bd = None
> + if key:
> + assert value is not None
> + self.data[key] = self._try_unicode(value)
> + (key, value) = line.split(b':', 1)
> + if not _python2:
> + key = key.decode('ASCII')
> + if key != item:
> + continue
> + value = value.strip()
> + if value == b'base64':
> + value = b''
> + b64_block = True
> + try:
> + out=open(os.path.join(directory, item), 'wb')
> + except IOError as e:
> + fatal(str(e))
This logic can become a simpler as you know precisely what to scan for
at first: a line equal to key + b':\n'. If it's not found, this should
throw a KeyError, otherwise you can start uncompressing/writing from
there and quit once you arrive at the next key without having to read
the remainder of the file.
> +
> + if key is not None:
> + self.data[key] = self._try_unicode(value)
> +
> + self.old_keys = set(self.data.keys())
These two shouldn't happen as you don't modify the report.
This should raise a ValueError instead. Silently not doing anything on
wrong input (i. e. an ASCII key) is very un-Pythonic.
Also, don't use assertEqual(..., True|False) -- use assertTrue() and
assertFalse(). Verifying that it does not write a file is still a good
thing, of course (after the assertRaises).
That's what test/run does, BTW. You can just start that, and as soon
as it starts running the module tests you are good. It will fail very
quickly on pep8/pyflakes errors.
Thanks!
Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Hey Louis,
thanks for working on this! The general approach looks fine to me, I
just have some small things to fix.
Louis Bouchard [2015-01-26 14:42 -0000]:
> Changes to apport-unpack are only meaningful for Ubuntu Precise, so
> they could be brought in as a debian patch, hence dropped from that
> MP, while keeping the enhancement to problem_report.
I think it's fine to keep that in trunk. We might have other large
attachments which are better handled that way.
> === modified file 'bin/apport-unpack'
> --- bin/apport-unpack 2012-05-04 06:54:56 +0000
> +++ bin/apport-unpack 2015-01-26 14:41:51 +0000
> @@ -49,18 +49,28 @@
> except OSError as e:
> fatal(str(e))
>
> +binaries = ()
Please make that a list or a set; using a tuple in this way is
confusing. It would also be nice if you could name this "bin_keys"?
> for k in pr:
> + if pr[k] == u'':
No u'' please, just ''; u'' isn't compatible with earlier Python 3.x.
> + binaries += (k,)
Following the above, bin_keys.append(k) (list) or .add(k) (set).
> + with open(report, 'rb') as f:
> + for element in binaries:
"key" instead of element, please? Let's consistently call the report
contents "key" and "value".
> + def extract(self, file, item=None, directory='/tmp'):
> + '''Extract only one element from the problem_report
> + directly without loading the report beforehand
> + This is required for Kernel Crash Dumps that can be
> + very big and saturate the RAM
> + '''
Short descriptions need to be one line. Please rename "item" to "key",
and drop the None default for it, as None will turn the whole thing
into a no-op. Also, don't specify a default for directory; such
defaults are for CLI programs, and libraries should *never* default to
/tmp unless they very carefully avoid overwriting existing files in a
race free manner (i. e. O_CREAT|O_EXCL).
Finally, to further point out that this is for a single key instead of
the whole report, could this be named extract_key()?
> + self._assert_ bin_mode( file)
> + self.data.clear()
> + key = None
> + value = None
> + b64_block = False
> + bd = None
> + # Make sure the report is at the beginning
> + file.seek(0)
I'd like to avoid this. First, it's unexpected (and undocumented), you
might deliberately call this on an open file at an offset, and second
it won't work for non-seekable fds such as reading from stdin.
> + for line in file: b64decode( line) bd.decompress( l)) b'\037\ 213\010' ): obj(-zlib. MAX_WBITS) bd.decompress( self._strip_ gzip_header( l))) obj() bd.decompress( l)) '{}'.format( b'\n')) b'\n'): '{}'.format( line[1: -1])) '{}'.format( line[1: ])) unicode( value) os.path. join(directory, item), 'wb')
> + # continuation line
> + if line.startswith(b' '):
> + if not b64_block:
> + continue
> + assert (key is not None and value is not None)
> + if b64_block:
> + l = base64.
> + if bd:
> + out.write(
> + else:
> + # lazy initialization of bd
> + # skip gzip header, if present
> + if l.startswith(
> + bd = zlib.decompress
> + out.write(
> + else:
> + # legacy zlib-only format used default block
> + # size
> + bd = zlib.decompress
> + out.write(
> + else:
> + if len(value) > 0:
> + out.write(
> + if line.endswith(
> + out.write(
> + else:
> + out.write(
> + else:
> + if b64_block:
> + if bd:
> + value += bd.flush()
> + b64_block = False
> + bd = None
> + if key:
> + assert value is not None
> + self.data[key] = self._try_
> + (key, value) = line.split(b':', 1)
> + if not _python2:
> + key = key.decode('ASCII')
> + if key != item:
> + continue
> + value = value.strip()
> + if value == b'base64':
> + value = b''
> + b64_block = True
> + try:
> + out=open(
> + except IOError as e:
> + fatal(str(e))
This logic can become a simpler as you know precisely what to scan for writing from
at first: a line equal to key + b':\n'. If it's not found, this should
throw a KeyError, otherwise you can start uncompressing/
there and quit once you arrive at the next key without having to read
the remainder of the file.
> + unicode( value) data.keys( ))
> + if key is not None:
> + self.data[key] = self._try_
> +
> + self.old_keys = set(self.
These two shouldn't happen as you don't modify the report.
> --- test/test_ problem_ report. py 2012-10-11 05:56:35 +0000 problem_ report. py 2015-01-26 14:41:51 +0000 BytesIO( b'ProblemType: Crash')) l(list( pr.keys( )), ['ProblemType']) report. ProblemReport( ) report. CompressedValue (b'FooFoo! ') report. CompressedValue () ].set_value( bin_data) report. CompressedValue (large_ val) report. CompressedValue (b'\1\1\ 1\n\2\2\ n\3\3\3' )
> +++ test/test_
> @@ -258,6 +266,52 @@
> pr.load(
> self.assertEqua
>
> +
> + def test_extract(self):
> + '''extract() with various binary elements.'''
> +
> + # create a test report with binary elements
> + large_val = b'A' * 5000000
> +
> + pr = problem_
> + pr['Txt'] = 'some text'
> + pr['MoreTxt'] = 'some more text'
> + pr['Foo'] = problem_
> + pr['Uncompressed'] = bin_data
> + pr['Bin'] = problem_
> + pr['Bin'
> + pr['Large'] = problem_
> + pr['Multiline'] = problem_
This is really binary, so I would just drop this bit.
> + report = BytesIO() l(os.path. exists( os.path. join(self. workdir, 'Txt')), False)
> + pr.write(report)
> + report.seek(0)
> +
> + #Extracts nothing if non binary
> + pr.extract(report, 'Txt', self.workdir )
> + self.assertEqua
This should raise a ValueError instead. Silently not doing anything on
wrong input (i. e. an ASCII key) is very un-Pythonic.
Also, don't use assertEqual(..., True|False) -- use assertTrue() and
assertFalse(). Verifying that it does not write a file is still a good
thing, of course (after the assertRaises).
> + #Check inexistant element l(os.path. exists( os.path. join(self. workdir, 'Bar')), False)
> + pr.extract(report, 'Bar', self.workdir )
> + self.assertEqua
This should raise a KeyError.
> + #Check valid elements path.join( self.workdir, 'Foo')) l(element. read(), b'FooFoo!' )
> + pr.extract(report, 'Foo', self.workdir )
> + element = open(os.
> + self.assertEqua
Please don't leave fd leaks. Use "with open(...) as f:". Same for the
tests below.
Finally, there are a lot of PEP-8 errors in this. Can you please
install "pep8" and fix the issues in
pep8 -r --ignore= E401,E501, W291,W293 test/test_ problem_ report. py E401,E501, E124 problem_report.py
pep8 -r --ignore=
That's what test/run does, BTW. You can just start that, and as soon
as it starts running the module tests you are good. It will fail very
quickly on pep8/pyflakes errors.
Thanks!
Martin www.piware. de
--
Martin Pitt | http://
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)