Merge lp:~brian-murray/apport/bug-1016380 into lp:~apport-hackers/apport/trunk

Proposed by Brian Murray
Status: Rejected
Rejected by: Martin Pitt
Proposed branch: lp:~brian-murray/apport/bug-1016380
Merge into: lp:~apport-hackers/apport/trunk
Diff against target: 15 lines (+4/-1)
1 file modified
apport/report.py (+4/-1)
To merge this branch: bzr merge lp:~brian-murray/apport/bug-1016380
Reviewer: Martin Pitt (community); Status: Needs Fixing
Review via email: mp+123365@code.launchpad.net

Description of the change

All bug patterns are (Unicode) strings, so byte-valued report fields such as log files should be decoded before matching; otherwise applying a string pattern to a bytes value raises a TypeError.
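A minimal sketch of the failure described above, using only the standard `re` module. The pattern and log content are made-up examples, not taken from apport's bug patterns.

```python
import re

pattern = r"segfault at [0-9a-f]+"                  # hypothetical bug pattern (str)
log_bytes = b"kernel: app[123]: segfault at 7f3a"   # hypothetical log value (bytes)

# Python 3 refuses to apply a str pattern to a bytes subject.
try:
    re.search(pattern, log_bytes)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# The approach proposed in this branch: decode the bytes so both sides are str.
match = re.search(pattern, log_bytes.decode("UTF-8"))
print(raised_type_error, match.group(0))  # True segfault at 7f3a
```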

Martin Pitt (pitti) wrote:

I added tests for this bug.

Why do you need to encode the regexp into UTF-8 when you already decode the UTF-8 string? The regexp and the text have to be of the same type: either both str or both bytes.
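The type rule can be checked with a short standalone sketch; `try_search` is a hypothetical helper and the values are illustrative. Mixing a bytes pattern with a str value (what the diff produces for compressed values) fails, while matching types on both sides works.

```python
import re

def try_search(regexp, value):
    """Return True/False for match/no match, or "TypeError" if the types clash."""
    try:
        return re.search(regexp, value) is not None
    except TypeError:
        return "TypeError"

text = "Traceback (most recent call last):"  # hypothetical, already-decoded value
pattern = "Traceback"                         # hypothetical bug pattern

print(try_search(pattern.encode("UTF-8"), text))                  # TypeError (bytes pattern, str value)
print(try_search(pattern, text))                                  # True (both str)
print(try_search(pattern.encode("UTF-8"), text.encode("UTF-8")))  # True (both bytes)
```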

We also need to handle the case where the bytes object is not valid UTF-8; otherwise the .decode('UTF-8') call will crash with a UnicodeDecodeError.
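A minimal illustration of that crash, assuming a log value that contains a byte which never occurs in valid UTF-8 (0xFF); the content is made up.

```python
# Hypothetical log value containing a byte that can never appear in valid UTF-8.
data = b"log line with a stray byte: \xff"

decode_failed = False
bad_offset = None
try:
    data.decode("UTF-8")
except UnicodeDecodeError as exc:
    decode_failed = True
    bad_offset = exc.start  # index of the first undecodable byte

print(decode_failed, bad_offset)  # True 28
```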

So I think it's better to use re in bytes mode when the value is bytes: this is more efficient (it avoids decoding), more robust (it avoids UnicodeDecodeErrors), and also allows us to match against binary values.
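The bytes-mode approach can be sketched as a small standalone helper. `pattern_matches` is a hypothetical name, not part of apport; in apport the value would first be unpacked from a `problem_report.CompressedValue` with `get_value()`, as in the diff, before being passed in.

```python
import re
from typing import Union

def pattern_matches(regexp: str, value: Union[str, bytes]) -> bool:
    """Hypothetical helper: search a str bug pattern in a str or bytes value.

    For bytes values the *pattern* is encoded to UTF-8 (re in bytes mode)
    instead of decoding the value, so binary or non-UTF-8 data is never decoded.
    """
    if isinstance(value, bytes):
        compiled = re.compile(regexp.encode("UTF-8"))
    else:
        compiled = re.compile(regexp)
    return compiled.search(value) is not None

# Usage with made-up values:
print(pattern_matches(r"segfault at [0-9a-f]+", b"\xff\xfe segfault at 7f3a"))  # True
print(pattern_matches(r"segfault", "no crash here"))                              # False
```

One caveat of bytes mode: `\w`, `\d` and case-insensitive matching are ASCII-only for bytes patterns, and a non-ASCII character inside a character class becomes several separate bytes once encoded, so such patterns can behave differently than in str mode.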

Thanks for the initial fix!

review: Needs Fixing
Martin Pitt (pitti) wrote:

Unmerged revisions

2482. By Brian Murray

decode byte objects before trying to use a string regex with them

Preview Diff

=== modified file 'apport/report.py'
--- apport/report.py 2012-08-31 10:42:24 +0000
+++ apport/report.py 2012-09-07 20:57:23 +0000
@@ -137,7 +137,10 @@
                 regexp = c.childNodes[0].nodeValue
                 v = report[key]
                 if isinstance(v, problem_report.CompressedValue):
-                    v = v.get_value()
+                    v = v.get_value().decode('UTF-8')
+                    regexp = regexp.encode('UTF-8')
+                elif isinstance(v, bytes):
+                    v = v.get_value().decode('UTF-8')
                     regexp = regexp.encode('UTF-8')
                 try:
                     re_c = re.compile(regexp)
