Merge lp:~brian-murray/apport/zgrep-fallback into lp:~apport-hackers/apport/trunk

Proposed by Brian Murray
Status: Merged
Approved by: Martin Pitt
Approved revision: 3107
Merged at revision: 3106
Proposed branch: lp:~brian-murray/apport/zgrep-fallback
Merge into: lp:~apport-hackers/apport/trunk
Diff against target: 36 lines (+16/-3)
1 file modified
backends/packaging-apt-dpkg.py (+16/-3)
To merge this branch: bzr merge lp:~brian-murray/apport/zgrep-fallback
Reviewer Review Type Date Requested Status
Martin Pitt (community) Approve
Review via email: mp+310218@code.launchpad.net

Description of the change

The production retracers for the Error Tracker were OOM'ing regularly when trying to use zgrep to search Contents.gz for files found in the crash report. While zgrep is faster than using gzip and reading the file line by line this still seems like a good fallback option and is better than having the retrace process crash, I've implemented the proposed change in the production version of the Error Tracker and have encountered no issues with it.

As mentioned this could likely be better:

+ try:
+ line = line.decode('UTF-8').rstrip('\n')
+ # 2016-11-01 this should be better
+ except UnicodeDecodeError:
+ continue

I added because of the following lines in Contents.gz for yakkety:

 $ zgrep -a "lenska.alias" /mnt/storage/archive-mirror/dists/yakkety/Contents-amd64.gz
usr/lib/aspell/�slenska.alias universe/text/aspell-is

Thanks!

To post a comment you must log in.
Revision history for this message
Martin Pitt (pitti) wrote :

I can't say that I like having two code paths for the same thing, but I accept that it's necessary. (Still, if these machines are that tight on RAM, can we increase that a bit?)

I would like this to be a bit faster and simpler though, see inline comment. Thanks!

3107. By Brian Murray

improve unicode handling after pitti's feedback

Revision history for this message
Brian Murray (brian-murray) wrote :

I've made the changes you've suggested, thanks!

Revision history for this message
Martin Pitt (pitti) wrote :

LGTM now, thanks! Please merge.

review: Approve
Revision history for this message
Brian Murray (brian-murray) wrote :

I don't have permission to commit to the apport project.

Revision history for this message
Martin Pitt (pitti) wrote :

Uh, what? Time to change that ☺ You are a member now.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'backends/packaging-apt-dpkg.py'
2--- backends/packaging-apt-dpkg.py 2016-08-13 07:09:38 +0000
3+++ backends/packaging-apt-dpkg.py 2016-11-08 19:25:48 +0000
4@@ -13,6 +13,7 @@
5 # the full text of the license.
6
7 import subprocess, os, glob, stat, sys, tempfile, shutil, time
8+import errno
9 import hashlib
10 import json
11
12@@ -1221,9 +1222,21 @@
13
14 # zgrep is magnitudes faster than a 'gzip.open/split() loop'
15 package = None
16- zgrep = subprocess.Popen(['zgrep', '-m1', '^%s[[:space:]]' % file, map],
17- stdout=subprocess.PIPE, stderr=subprocess.PIPE)
18- out = zgrep.communicate()[0].decode('UTF-8')
19+ try:
20+ zgrep = subprocess.Popen(['zgrep', '-m1', '^%s[[:space:]]' % file, map],
21+ stdout=subprocess.PIPE, stderr=subprocess.PIPE)
22+ out = zgrep.communicate()[0].decode('UTF-8')
23+ except OSError as e:
24+ if e.errno != errno.ENOMEM:
25+ raise
26+ file_b = file.encode()
27+ import gzip
28+ with gzip.open('%s' % map, 'rb') as contents:
29+ out = ''
30+ for line in contents:
31+ if line.startswith(file_b):
32+ out = line
33+ break
34 # we do not check the return code, since zgrep -m1 often errors out
35 # with 'stdout: broken pipe'
36 if out:

Subscribers

People subscribed via source and target branches