Merge lp:~brian-murray/apport/bug-1016380 into lp:apport

Proposed by Brian Murray on 2012-09-07
Status: Rejected
Rejected by: Martin Pitt on 2012-09-10
Proposed branch: lp:~brian-murray/apport/bug-1016380
Merge into: lp:apport
Diff against target: 15 lines (+4/-1)
1 file modified
apport/ (+4/-1)
To merge this branch: bzr merge lp:~brian-murray/apport/bug-1016380
Reviewer       Review Type   Date Requested   Status
Martin Pitt                  2012-09-07       Needs Fixing on 2012-09-10

Description of the change

All the bug patterns are strings, so log file contents held as bytes should be decoded before matching; otherwise applying the patterns to them crashes.

Martin Pitt (pitti) wrote :

I added tests for this bug.

Why do you need to encode the regexp into UTF-8 when you already decode the UTF-8 string? I think the re and the text both have to have the same type.
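To illustrate the type constraint mentioned above, here is a minimal sketch in Python 3. The helper name `pattern_type_matches` is hypothetical and not part of apport; it only shows that `re` refuses to mix a str pattern with a bytes value (and vice versa).

```python
import re


def pattern_type_matches(pattern, text):
    """Return True if re accepts this pattern/text pair, False on a str/bytes mismatch.

    Hypothetical helper for illustration only; not part of apport.
    """
    try:
        re.search(pattern, text)
        return True
    except TypeError:
        # Python 3 raises TypeError when a str pattern meets a bytes value,
        # or a bytes pattern meets a str value.
        return False


same_type = pattern_type_matches('foo', 'some foo text')    # str pattern, str text
mixed_type = pattern_type_matches('foo', b'some foo text')  # str pattern, bytes text
```

Under these assumptions, only the pairs where both sides are str or both are bytes are accepted, which is why decoding the value and encoding the regexp at the same time cannot work.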

We also need to handle the case where the bytes object is not UTF-8, otherwise the .decode('UTF-8') will crash with a UnicodeDecodeError.

So I think it's better to use re in bytes mode when the value is bytes, as this is more efficient (avoids decoding), more robust (avoids UnicodeDecodeErrors), and also allows us to match against binary values.
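A minimal sketch of that approach, assuming the pattern arrives as a str (as it does when read from the pattern XML) and the report value may be either str or bytes. The function name `search_value` is hypothetical; this is not the code that eventually landed in apport.

```python
import re


def search_value(regexp, value):
    """Search value with regexp, switching to bytes mode when value is bytes.

    Hypothetical helper for illustration; regexp is assumed to be a str.
    """
    if isinstance(value, bytes):
        # Encode the pattern instead of decoding the value, so values that
        # are not valid UTF-8 never go through .decode().
        regexp = regexp.encode('UTF-8')
    return re.search(regexp, value)


# A value that is not valid UTF-8: decoding it would raise UnicodeDecodeError,
# but a bytes-mode match works without decoding.
binary_value = b'\xff\xfe kernel oops'
binary_match = search_value('kernel o+ps', binary_value)

# A str value is matched with the str pattern unchanged.
text_match = search_value('kernel o+ps', 'log: kernel oops')
```

One caveat of this design: in bytes mode `.` matches a single byte rather than a character, so patterns containing non-ASCII characters behave differently than in str mode.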

Thanks for the initial fix!

review: Needs Fixing

Unmerged revisions

2482. By Brian Murray on 2012-09-07

decode byte objects before trying to use a string regex with them

Preview Diff

=== modified file 'apport/'
--- apport/ 2012-08-31 10:42:24 +0000
+++ apport/ 2012-09-07 20:57:23 +0000
@@ -137,7 +137,10 @@
     regexp = c.childNodes[0].nodeValue
     v = report[key]
     if isinstance(v, problem_report.CompressedValue):
-        v = v.get_value()
+        v = v.get_value().decode('UTF-8')
+        regexp = regexp.encode('UTF-8')
+    elif isinstance(v, bytes):
+        v = v.get_value().decode('UTF-8')
         regexp = regexp.encode('UTF-8')
     try:
         re_c = re.compile(regexp)


