Merge lp:~louis/apport/apport-unpack-extract into lp:~apport-hackers/apport/trunk
Status: | Merged |
---|---|
Merged at revision: | 2896 |
Proposed branch: | lp:~louis/apport/apport-unpack-extract |
Merge into: | lp:~apport-hackers/apport/trunk |
Diff against target: |
188 lines (+132/-3) 3 files modified
bin/apport-unpack (+11/-2) problem_report.py (+66/-0) test/test_problem_report.py (+55/-1) |
To merge this branch: | bzr merge lp:~louis/apport/apport-unpack-extract |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Martin Pitt (community) | Approve | ||
Review via email: mp+247591@code.launchpad.net |
Description of the change
The modifications to problem_report.py are mandatory in order for the changes in apport-unpack to work.
Changes to apport-unpack are only meaningful for Ubuntu Precise, so they could be brought in as a debian patch, hence dropped from that MP, while keeping the enhancement to problem_report.
Explanation:
============
Kernel VmCore can be very big ( multi Gigabytes) files. ProblemReport.
apport-unpack is modified to extract the content of the report in two steps :
1) Load the report with binary=False and write all the non-binary element to their respective file, keeping a list of the binary element identified.
2) Use the extract() method on each element identified as binary to write directly to disk
Here is a comparison of the time taken before and after the modification :
root@crashdump:~# time apport-unpack linux-image-
Test on Precise:
Current version :
real 4m54.216s
user 1m16.777s
sys 0m47.919s
With extract method :
============
real 0m12.574s
user 0m5.560s
sys 0m7.004s
Hey Louis,
thanks for working on this! The general approach looks fine to me, I
just have some small things to fix.
Louis Bouchard [2015-01-26 14:42 -0000]:
> Changes to apport-unpack are only meaningful for Ubuntu Precise, so
> they could be brought in as a debian patch, hence dropped from that
> MP, while keeping the enhancement to problem_report.
I think it's fine to keep that in trunk. We might have other large
attachments which are better handled that way.
> === modified file 'bin/apport-unpack'
> --- bin/apport-unpack 2012-05-04 06:54:56 +0000
> +++ bin/apport-unpack 2015-01-26 14:41:51 +0000
> @@ -49,18 +49,28 @@
> except OSError as e:
> fatal(str(e))
>
> +binaries = ()
Please make that a list or a set; using a tuple in this way is
confusing. It would also be nice if you could name this "bin_keys"?
> for k in pr:
> + if pr[k] == u'':
No u'' please, just ''; u'' isn't compatible with earlier Python 3.x.
> + binaries += (k,)
Following the above, bin_keys.append(k) (list) or .add(k) (set).
> + with open(report, 'rb') as f:
> + for element in binaries:
"key" instead of element, please? Let's consistently call the report
contents "key" and "value".
> + def extract(self, file, item=None, directory='/tmp'):
> + '''Extract only one element from the problem_report
> + directly without loading the report beforehand
> + This is required for Kernel Crash Dumps that can be
> + very big and saturate the RAM
> + '''
Short descriptions need to be one line. Please rename "item" to "key",
and drop the None default for it, as None will turn the whole thing
into a no-op. Also, don't specify a default for directory; such
defaults are for CLI programs, and libraries should *never* default to
/tmp unless they very carefully avoid overwriting existing files in a
race free manner (i. e. O_CREAT|O_EXCL).
Finally, to further point out that this is for a single key instead of
the whole report, could this be named extract_key()?
> + self._assert_ bin_mode( file)
> + self.data.clear()
> + key = None
> + value = None
> + b64_block = False
> + bd = None
> + # Make sure the report is at the beginning
> + file.seek(0)
I'd like to avoid this. First, it's unexpected (and undocumented), you
might deliberately call this on an open file at an offset, and second
it won't work for non-seekable fds such as reading from stdin.
> + for line in file: b64decode( line) bd.decompress( l)) b'\037\ 213\010' ): obj(-zlib. MAX_WBITS)
> + # continuation line
> + if line.startswith(b' '):
> + if not b64_block:
> + continue
> + assert (key is not None and value is not None)
> + if b64_block:
> + l = base64.
> + if bd:
> + out.write(
> + else:
> + # lazy initialization of bd
> + # skip gzip header, if present
> + if l.startswith(
> + bd = zlib.decompress
> + out.write(bd.de...