Merge lp:~cmiller/apport/attach_drm into lp:apport

Proposed by Chad Miller on 2015-02-24
Status: Merged
Merged at revision: 2918
Proposed branch: lp:~cmiller/apport/attach_drm
Merge into: lp:apport
Diff against target: 19 lines (+3/-3)
1 file modified
apport/ (+3/-3)
To merge this branch: bzr merge lp:~cmiller/apport/attach_drm
Reviewer Review Type Date Requested Status
Martin Pitt 2015-02-24 Approve on 2015-02-27
Review via email:

Commit message

Open DRM files as if they're bytes, not an encoded Unicode string. Process bytes, and decode (as "latin1") as last step, to get Unicode. Avoid at least two TypeErrors in Py3.

To post a comment you must log in.
Martin Pitt (pitti) wrote :

Thanks for this! But why do you decode them as latin-1? It was just encoded to base64, so ASCII should be fine. But "modes" and the others also just look like ASCII. So UTF-8 might still be a better default, or do you have a case where these attributes contain non-ASCII data which isn't UTF-8 either?

When presenting the data to the user, or downloading it from LP, apport will try and convert byte arrays to UTF-8 unicode strings; if that fails, the values are simply left as byte arrays. So either leave the entire info as a byte array, or try to convert it from ASCII or UTF-8 and if that fails with an UnicodeDecodeError, just leave it like it is?

review: Needs Information
Chad Miller (cmiller) wrote :

Martin, I have little confidence that the kernel will always return lower-7-bit values or valid UTF8 in those files. I *never want to guess encoding at all*, but if I have to pick, I would rather preserve bytes and get some mojibake that I can decode manually, than crash or discard.

I didn't dig into whether I could change the interface of the report hash to have byte array. If you say it's okay to make its values byte arrays, I'll happily drop the decode()ing.

Martin Pitt (pitti) wrote :

I modified the decoding to default to UTF-8 and be robust against wrong characters, and merged. Thanks!

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'apport/'
2--- apport/ 2015-02-10 11:50:59 +0000
3+++ apport/ 2015-02-24 19:33:57 +0000
4@@ -838,14 +838,14 @@
5 path = os.path.join(con, f)
6 if f == 'uevent' or not os.path.isfile(path):
7 continue
8- val = open(path).read().strip()
9+ val = open(path, "rb").read().strip()
10 # format some well-known attributes specially
11 if f == 'modes':
12- val = val.replace('\n', ' ')
13+ val = val.replace(b'\n', b' ')
14 if f == 'edid':
15 val = base64.b64encode(val)
16 f += '-base64'
17- info += '%s: %s\n' % (f, val)
18+ info += '%s: %s\n' % (f, val.decode("latin1"))
19 return info


People subscribed via source and target branches