Merge lp:~cmiller/apport/attach_drm into lp:~apport-hackers/apport/trunk

Proposed by Chad Miller
Status: Merged
Merged at revision: 2918
Proposed branch: lp:~cmiller/apport/attach_drm
Merge into: lp:~apport-hackers/apport/trunk
Diff against target: 19 lines (+3/-3)
1 file modified
apport/hookutils.py (+3/-3)
To merge this branch: bzr merge lp:~cmiller/apport/attach_drm
Reviewer Review Type Date Requested Status
Martin Pitt (community) Approve
Review via email: mp+250831@code.launchpad.net

Commit message

Open DRM files as if they're bytes, not an encoded Unicode string. Process bytes, and decode (as "latin1") as last step, to get Unicode. Avoid at least two TypeErrors in Py3.

To post a comment you must log in.
Revision history for this message
Martin Pitt (pitti) wrote :

Thanks for this! But why do you decode them as latin-1? It was just encoded to base64, so ASCII should be fine. But "modes" and the others also just look like ASCII. So UTF-8 might still be a better default, or do you have a case where these attributes contain non-ASCII data which isn't UTF-8 either?

When presenting the data to the user, or downloading it from LP, apport will try and convert byte arrays to UTF-8 unicode strings; if that fails, the values are simply left as byte arrays. So either leave the entire info as a byte array, or try to convert it from ASCII or UTF-8 and if that fails with an UnicodeDecodeError, just leave it like it is?

review: Needs Information
Revision history for this message
Chad Miller (cmiller) wrote :

Martin, I have little confidence that the kernel will always return lower-7-bit values or valid UTF8 in those files. I *never want to guess encoding at all*, but if I have to pick, I would rather preserve bytes and get some mojibake that I can decode manually, than crash or discard.

I didn't dig into whether I could change the interface of the report hash to have byte array. If you say it's okay to make its values byte arrays, I'll happily drop the decode()ing.

Revision history for this message
Martin Pitt (pitti) wrote :

I modified the decoding to default to UTF-8 and be robust against wrong characters, and merged. Thanks!

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'apport/hookutils.py'
2--- apport/hookutils.py 2015-02-10 11:50:59 +0000
3+++ apport/hookutils.py 2015-02-24 19:33:57 +0000
4@@ -838,14 +838,14 @@
5 path = os.path.join(con, f)
6 if f == 'uevent' or not os.path.isfile(path):
7 continue
8- val = open(path).read().strip()
9+ val = open(path, "rb").read().strip()
10 # format some well-known attributes specially
11 if f == 'modes':
12- val = val.replace('\n', ' ')
13+ val = val.replace(b'\n', b' ')
14 if f == 'edid':
15 val = base64.b64encode(val)
16 f += '-base64'
17- info += '%s: %s\n' % (f, val)
18+ info += '%s: %s\n' % (f, val.decode("latin1"))
19 return info
20
21

Subscribers

People subscribed via source and target branches