Merge lp:~diane-trout/dkimpy/python3 into lp:dkimpy

Proposed by Diane Trout
Status: Merged
Merge reported by: Stuart Gathman
Merged at revision: not available
Proposed branch: lp:~diane-trout/dkimpy/python3
Merge into: lp:dkimpy
Diff against target: 185 lines (+46/-19)
4 files modified
dkim/__init__.py (+8/-8)
dkim/dnsplug.py (+2/-2)
dkim/tests/test_dkim.py (+34/-7)
dkim/tests/test_util.py (+2/-2)
To merge this branch: bzr merge lp:~diane-trout/dkimpy/python3
Reviewer Review Type Date Requested Status
Stuart Gathman Pending
Review via email: mp+154488@code.launchpad.net

Description of the change

Several of the unit tests failed under python3.3

Under python3 there were some inconsistencies in whether strings were bytes or strings. I tried to make it more consistent. With revno 103 the only unit tests that were failing under python 3.3 were a docstring test.

To post a comment you must log in.
Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

Most of the proposed changes add b to a string constant. Does re.compile really take byte strings in python3? Or does it depend on whether you are parsing bytes or strings?

Revision history for this message
Diane Trout (diane-trout) wrote :

On Wednesday, March 20, 2013 23:27:28 Stuart Gathman wrote:
> Most of the proposed changes add b to a string constant. Does re.compile
> really take byte strings in python3? Or does it depend on whether you are
> parsing bytes or strings?

Yep. assuming the order is right. I did run test.py with python 2.7.3 and
python 3.3.0. In theory it passed with both. (I'm not sure what would happen
with 2.6)

Python 3.2.3 (default, Feb 20 2013, 14:44:27)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile(br'foo')
<_sre.SRE_Pattern object at 0x1bcc5a0>
>>>

The order of the r & b modifiers matters:
rb'foo' gives a syntax error.

The reason I needed to go tag things is I got error messages like the
following.

Python 3.3.0 (default, Mar 6 2013, 13:45:29)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> a = b'123'
>>> re.search('2', a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/re.py", line 161, in search
    return _compile(pattern, flags).search(string)
TypeError: can't use a string pattern on a bytes-like object

Also I found another issue in dkimpy/__init__ where I had a message that had a
x tag but not the t tag. (rfc4871 says "if both are present" implying t= is
optional.)

I hadn't committed the following patch as I thought I should make a test case
first.

=== modified file 'dkim/__init__.py'
--- dkim/__init__.py 2013-03-20 16:01:55 +0000
+++ dkim/__init__.py 2013-03-20 16:12:41 +0000
@@ -177,7 +177,7 @@
         if re.match(br"\d+$", sig[b'x']) is None:
             raise ValidationError(
                 "x= value is not a decimal integer (%s)" % sig[b'x'])
- if int(sig[b'x']) < int(sig[b't']):
+ if b't' in sig and int(sig[b'x']) < int(sig[b't']):

Revision history for this message
Diane Trout (diane-trout) wrote :

On Wednesday, March 20, 2013 23:27:28 you wrote:
> Most of the proposed changes add b to a string constant. Does re.compile
> really take byte strings in python3? Or does it depend on whether you are
> parsing bytes or strings?

One other thing I found was the dns library I was using was returning python 3
strings, and most of dkim had been moved to using byte strings, which caused
indexing issues.

I'm not really sure if dns get_txt records should be returning bytes or
strings, and wanted to ask one of the python3 porting people but it looks like
they're still travelling back from pycon.

I did discover that looking for b'data' and 'data' in a python 3 dictionary is
different.

diane

Revision history for this message
Scott Kitterman (kitterman) wrote :

On Thursday, March 21, 2013 12:31:22 AM Diane Trout wrote:
> On Wednesday, March 20, 2013 23:27:28 you wrote:
> > Most of the proposed changes add b to a string constant. Does re.compile
> > really take byte strings in python3? Or does it depend on whether you are
> > parsing bytes or strings?
>
> One other thing I found was the dns library I was using was returning python
> 3 strings, and most of dkim had been moved to using byte strings, which
> caused indexing issues.
>
> I'm not really sure if dns get_txt records should be returning bytes or
> strings, and wanted to ask one of the python3 porting people but it looks
> like they're still travelling back from pycon.
>
> I did discover that looking for b'data' and 'data' in a python 3 dictionary
> is different.

I did the python3 port of pydns. It's quite possible I got it wrong. At the
time, there was no other python3 DNS module available, so I was sort of
feeling my way along in the dark. It wouldn't surprise me if I got stuff
wrong. Were you using pydns (module DNS) or dnspython (module dns)? Pydns
in particular is fun to deal with since it's very old. It was originally
written for python1.5 or 1.6 and has not had a lot more than bug fixing done to
it for a long time before I ported it.

Revision history for this message
Diane Trout (diane-trout) wrote :

Looks like the one I used was imported with:

import DNS

It appears the debian package is python3-dns with a homepage of
http://launchpad.net/py3dns

I'm guessing dns should probably return bytes by default?

I read through https://en.wikipedia.org/wiki/Internationalized_domain_name and
RFC 2181 which seem to indicate DNS can hold arbitrary binary strings and that
individual records can specify their own encoding -- like punycode
representation for non-ASCII domain names.

So the "ideal" dns library might have some helper functions to decode specific
resource records to unicode.

Diane

On Thursday, March 21, 2013 02:37:24 Scott Kitterman wrote:
> On Thursday, March 21, 2013 12:31:22 AM Diane Trout wrote:
> > On Wednesday, March 20, 2013 23:27:28 you wrote:
> > > Most of the proposed changes add b to a string constant. Does
> > > re.compile
> > > really take byte strings in python3? Or does it depend on whether you
> > > are
> > > parsing bytes or strings?
> >
> > One other thing I found was the dns library I was using was returning
> > python 3 strings, and most of dkim had been moved to using byte strings,
> > which caused indexing issues.
> >
> > I'm not really sure if dns get_txt records should be returning bytes or
> > strings, and wanted to ask one of the python3 porting people but it looks
> > like they're still travelling back from pycon.
> >
> > I did discover that looking for b'data' and 'data' in a python 3
> > dictionary
> > is different.
>
> I did the python3 port of pydns. It's quite possible I got it wrong. At
> the time, there was no other python3 DNS module available, so I was sort of
> feeling my way along in the dark. It wouldn't surprise me if I got stuff
> wrong. Were you using pydns (module DNS) or dnspython (module dns)?
> Pydns in particular is fun to deal with since it's very old. It was
> originally written for python1.5 or 1.6 and has not had a lot more than bug
> fixing done to it for a long time before I ported it.

lp:~diane-trout/dkimpy/python3 updated
104. By Diane Trout <email address hidden>

The t tag may not be present when x is specified.

Also add in some tests for checking validate_signature_fields

Revision history for this message
Scott Kitterman (kitterman) wrote :

On Thursday, March 21, 2013 03:16:25 AM you wrote:
> Looks like the one I used was imported with:
>
> import DNS
>
> It appears the debian package is python3-dns with a homepage of
> http://launchpad.net/py3dns
>
> I'm guessing dns should probably return bytes by default?
>
> I read through https://en.wikipedia.org/wiki/Internationalized_domain_name
> and RFC 2181 which seem to indicate DNS can hold arbitrary binary strings
> and that individual records can specify their own encoding -- like punycode
> representation for non-ASCII domain names.
>
> So the "ideal" dns library might have some helper functions to decode
> specific resource records to unicode.

That's the one I ported. If you have some interest in it, I'm glad for help
with getting it straightened out (even just correcting/contributing tests
would help a lot).

Revision history for this message
Diane Trout (diane-trout) wrote :

I'll make a stab at it in the next day or two, as It'd be cleaner than the
function i hid in dkimpy to convert return values to byte strings.

Also in my python3 dkimpy branch I also committed some tests and the fix for a
missing t=, should I open a new pull request and/or file a bug?

Diane

On Thursday, March 21, 2013 04:25:23 you wrote:
> On Thursday, March 21, 2013 03:16:25 AM you wrote:
> > Looks like the one I used was imported with:
> >
> > import DNS
> >
> > It appears the debian package is python3-dns with a homepage of
> > http://launchpad.net/py3dns
> >
> > I'm guessing dns should probably return bytes by default?
> >
> > I read through https://en.wikipedia.org/wiki/Internationalized_domain_name
> > and RFC 2181 which seem to indicate DNS can hold arbitrary binary strings
> > and that individual records can specify their own encoding -- like
> > punycode
> > representation for non-ASCII domain names.
> >
> > So the "ideal" dns library might have some helper functions to decode
> > specific resource records to unicode.
>
> That's the one I ported. If you have some interest in it, I'm glad for help
> with getting it straightened out (even just correcting/contributing tests
> would help a lot).

Revision history for this message
Scott Kitterman (kitterman) wrote :

A new merge request is probably best.

Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

Long ago, Nostradamus foresaw that on 03/20/2013 11:16 PM, Diane Trout
would write:
> Looks like the one I used was imported with:
>
> import DNS
>
> It appears the debian package is python3-dns with a homepage of
> http://launchpad.net/py3dns
>
> I'm guessing dns should probably return bytes by default?
>
> I read through https://en.wikipedia.org/wiki/Internationalized_domain_name and
> RFC 2181 which seem to indicate DNS can hold arbitrary binary strings and that
> individual records can specify their own encoding -- like punycode
> representation for non-ASCII domain names.
>
> So the "ideal" dns library might have some helper functions to decode specific
> resource records to unicode.
>
Yes, the unfinished goal for pydns is to have a "raw" value of bytes for
TXT, and "cooked" value of unicode. The default should be bytes (and I
screwed it up by adding some global options for decoding to unicode),
and we will be defining extensions for cooked values. Another screwy
thing is that A RRs are returned as dotted quad strings, whereas AAAA
RRs are returned as bytes. That is more difficult to rationalize
without breaking existing code. So that led to the idea of 3 modes:
raw, cooked, compat.

Revision history for this message
Diane Trout (diane-trout) wrote :

What do you think of the test layout in dkimpy? (test.py in root which is importing unittest style tests in <module>.tests?)

Once upon a time I tried to use unit2 & nose2 and had some state issues... (although that was a django app I was trying to test...)

The DNS.Request(...).req(resulttype='binary') from py3dns "3.0.3" worked 'better' than without the resulttype tag, however my program still crashed at some point in dkim because one of the data results wasn't a list.

I'm thinking a reasonable next step is to write some more test cases, I'm just not sure if you have a preference for a test runner for python3? Unittest is more verbose than some of the other choices, but at least it is in the standard library.

Diane

> On Thursday, March 21, 2013 03:16:25 AM you wrote:
> > Looks like the one I used was imported with:
> >
> > import DNS
> >
> > It appears the debian package is python3-dns with a homepage of
> > http://launchpad.net/py3dns
> >
> > I'm guessing dns should probably return bytes by default?
> >
> > I read through https://en.wikipedia.org/wiki/Internationalized_domain_name
> > and RFC 2181 which seem to indicate DNS can hold arbitrary binary strings
> > and that individual records can specify their own encoding -- like punycode
> > representation for non-ASCII domain names.
> >
> > So the "ideal" dns library might have some helper functions to decode
> > specific resource records to unicode.
>
> That's the one I ported. If you have some interest in it, I'm glad for help
> with getting it straightened out (even just correcting/contributing tests
> would help a lot).

Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

Long ago, Nostradamus foresaw that on 03/22/2013 01:47 AM, Diane Trout
would write:
> What do you think of the test layout in dkimpy? (test.py in root which is importing unittest style tests in <module>.tests?)
>
> Once upon a time I tried to use unit2 & nose2 and had some state issues... (although that was a django app I was trying to test...)
The tests should be nose compatible, and include test.py in root so that
you can also run them without nose. At a simple level, nose is simply a
way of finding all your tests by naming convention (class names start
with Test... and methods start with test_...) so that you don't have to
write a test.py. Any deviations from a nose compatible class name
should be fixed.

lp:~diane-trout/dkimpy/python3 updated
105. By Diane Trout <email address hidden>

py3dns returns strings, not bytes. so the join didn't work.

Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

Revision 110 adds the changes from Diane that are still applicable. Both python2.7 and python3.4 pass unit tests now.

I am not happy that *all* of the API is bytes. For instance, the canonicalization options should be str, not bytes. The header names for frozen_sign, should_not_sign, etc, should be str. But, of course, email messages must be bytes.

I introduced a text(s) function that converts the bytes/str/unicode argument to str (as defined by python version). If text is a little more efficient (and I just remembered I can define one of 2 versions dynamically), we can use that to lookup header names in those config fields, and change them back to str instead of bytes.

Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

I guess I was supposed to create another branch and resubmit the merge proposal, and not just do it on the trunk. But except for frozen_headers, etc, the API is no worse. And we can fix those.

Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

So what is the proper way to prevent auto-merging this year old branch from Diane? I've already manually merged the changes that were still relevant.

Revision history for this message
Diane Trout (diane-trout) wrote :

On Sunday, February 22, 2015 01:37:56 Stuart Gathman wrote:
> So what is the proper way to prevent auto-merging this year old branch from
> Diane? I've already manually merged the changes that were still relevant.

I just got a notification from launchpad, would it help for me to try and pull
and update my branch?

Diane

Revision history for this message
William Grant (wgrant) wrote :

I'd just set the merge proposal status at the top to "Merged" manually. It won't automerge it.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'dkim/__init__.py'
2--- dkim/__init__.py 2012-10-27 15:23:05 +0000
3+++ dkim/__init__.py 2013-03-22 22:58:34 +0000
4@@ -118,8 +118,8 @@
5 lastindex[h] = i
6 return sign_headers
7
8-FWS = r'(?:\r?\n\s+)?'
9-RE_BTAG = re.compile(r'([;\s]b'+FWS+r'=)(?:'+FWS+r'[a-zA-Z0-9+/=])*(?:\r?\n\Z)?')
10+FWS = br'(?:\r?\n\s+)?'
11+RE_BTAG = re.compile(br'([;\s]b'+FWS+br'=)(?:'+FWS+br'[a-zA-Z0-9+/=])*(?:\r?\n\Z)?')
12
13 def hash_headers(hasher, canonicalize_headers, headers, include_headers,
14 sigheader, sig):
15@@ -165,7 +165,7 @@
16 raise ValidationError(
17 "i= domain is not a subdomain of d= (i=%s d=%s)" %
18 (sig[b'i'], sig[b'd']))
19- if b'l' in sig and re.match(br"\d{,76}$", sig['l']) is None:
20+ if b'l' in sig and re.match(br"\d{,76}$", sig[b'l']) is None:
21 raise ValidationError(
22 "l= value is not a decimal integer (%s)" % sig[b'l'])
23 if b'q' in sig and sig[b'q'] != b"dns/txt":
24@@ -177,7 +177,7 @@
25 if re.match(br"\d+$", sig[b'x']) is None:
26 raise ValidationError(
27 "x= value is not a decimal integer (%s)" % sig[b'x'])
28- if int(sig[b'x']) < int(sig[b't']):
29+ if b't' in sig and int(sig[b'x']) < int(sig[b't']):
30 raise ValidationError(
31 "x= value is less than t= value (x=%s t=%s)" %
32 (sig[b'x'], sig[b't']))
33@@ -417,7 +417,7 @@
34 include_headers = self.default_sign_headers()
35
36 # rfc4871 says FROM is required
37- if 'from' not in ( x.lower() for x in include_headers ):
38+ if b'from' not in ( x.lower() for x in include_headers ):
39 raise ParameterError("The From header field MUST be signed")
40
41 # raise exception for any SHOULD_NOT headers, call can modify
42@@ -540,7 +540,7 @@
43 raise KeyFormatError("missing public key: %s"%name)
44 try:
45 pub = parse_tag_value(s)
46- except InvalidTagValueList:
47+ except InvalidTagValueList as e:
48 raise KeyFormatError(e)
49 try:
50 pk = parse_public_key(base64.b64decode(pub[b'p']))
51@@ -555,8 +555,8 @@
52 # fields when verifying. Since there should be only one From header,
53 # this shouldn't break any legitimate messages. This could be
54 # generalized to check for extras of other singleton headers.
55- if 'from' in include_headers:
56- include_headers.append('from')
57+ if b'from' in include_headers:
58+ include_headers.append(b'from')
59 h = hasher()
60 self.signed_headers = hash_headers(
61 h, canon_policy, headers, include_headers, sigheaders[idx], sig)
62
63=== modified file 'dkim/dnsplug.py'
64--- dkim/dnsplug.py 2012-06-13 03:09:15 +0000
65+++ dkim/dnsplug.py 2013-03-22 22:58:34 +0000
66@@ -44,7 +44,7 @@
67 response = DNS.DnsRequest(name, qtype='txt').req()
68 if not response.answers:
69 return None
70- return b''.join(response.answers[0]['data'])
71+ return ''.join(response.answers[0]['data'])
72
73 def get_txt_Milter_dns(name):
74 """Return a TXT record associated with a DNS name."""
75@@ -80,6 +80,6 @@
76 except UnicodeDecodeError:
77 return None
78 txt = _get_txt(unicode_name)
79- if txt:
80+ if txt and isinstance(txt, str):
81 txt = txt.encode('utf-8')
82 return txt
83
84=== modified file 'dkim/tests/test_dkim.py'
85--- dkim/tests/test_dkim.py 2012-10-27 14:48:29 +0000
86+++ dkim/tests/test_dkim.py 2013-03-22 22:58:34 +0000
87@@ -92,14 +92,14 @@
88 # <https://bugs.launchpad.net/dkimpy/+bug/939128>
89 # Simple-mode signature header verification is wrong
90 # (should ignore FWS anywhere in signature tag: b=)
91- sample_msg = """\
92+ sample_msg = b"""\
93 From: mbp@canonical.com
94 To: scottk@example.com
95 Subject: this is my
96 test message
97-""".replace('\n', '\r\n')
98+""".replace(b'\n', b'\r\n')
99
100- sample_privkey = """\
101+ sample_privkey = b"""\
102 -----BEGIN RSA PRIVATE KEY-----
103 MIIBOwIBAAJBANmBe10IgY+u7h3enWTukkqtUD5PR52Tb/mPfjC0QJTocVBq6Za/
104 PlzfV+Py92VaCak19F4WrbVTK5Gg5tW220MCAwEAAQJAYFUKsD+uMlcFu1D3YNaR
105@@ -111,22 +111,22 @@
106 -----END RSA PRIVATE KEY-----
107 """
108
109- sample_pubkey = """\
110+ sample_pubkey = b"""\
111 -----BEGIN PUBLIC KEY-----
112 MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBANmBe10IgY+u7h3enWTukkqtUD5PR52T
113 b/mPfjC0QJTocVBq6Za/PlzfV+Py92VaCak19F4WrbVTK5Gg5tW220MCAwEAAQ==
114 -----END PUBLIC KEY-----
115 """
116
117- sample_dns = """\
118+ sample_dns = b"""\
119 k=rsa; \
120 p=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBANmBe10IgY+u7h3enWTukkqtUD5PR52T\
121 b/mPfjC0QJTocVBq6Za/PlzfV+Py92VaCak19F4WrbVTK5Gg5tW220MCAwEAAQ=="""
122
123- _dns_responses = {'example._domainkey.canonical.com.': sample_dns}
124+ _dns_responses = {b'example._domainkey.canonical.com.': sample_dns}
125 for header_mode in [dkim.Relaxed, dkim.Simple]:
126
127- dkim_header = dkim.sign(sample_msg, 'example', 'canonical.com',
128+ dkim_header = dkim.sign(sample_msg, b'example', b'canonical.com',
129 sample_privkey, canonicalize=(header_mode, dkim.Relaxed))
130 # Folding dkim_header affects b= tag only, since dkim.sign folds
131 # sig_value with empty b= before hashing, and then appends the
132@@ -192,6 +192,33 @@
133 res = dkim.verify(sig+message, dnsfunc=self.dnsfunc)
134 self.assertFalse(res)
135
136+ def test_validate_signature_fields(self):
137+ sig = {b'v': b'1',
138+ b'a': b'rsa-sha256',
139+ b'b': b'K/UUOt8lCtgjp3kSTogqBm9lY1Yax/NwZ+bKm39/WKzo5KYe3L/6RoIA/0oiDX4kO\n \t Qut49HCV6ZUe6dY9V5qWBwLanRs1sCnObaOGMpFfs8tU4TWpDSVXaNZAqn15XVW0WH\n \t EzOzUfVuatpa1kF4voIgSbmZHR1vN3WpRtcTBe/I=',
140+ b'bh': b'n0HUwGCP28PkesXBPH82Kboy8LhNFWU9zUISIpAez7M=',
141+ b'c': b'simple/simple',
142+ b'd': b'kitterman.com',
143+ b'h': b'From:To:Subject:Date:Cc:MIME-Version:Content-Type:\n \t Content-Transfer-Encoding:Message-Id',
144+ b's': b'2007-00',
145+ b't': b'1299525798'}
146+ dkim.validate_signature_fields(sig)
147+ # try new version
148+ sigVer = sig.copy()
149+ sigVer[b'v'] = 2
150+ self.assertRaises(dkim.ValidationError, dkim.validate_signature_fields, sigVer)
151+ # try with x
152+ sigX = sig.copy()
153+ sigX[b'x'] = b'1399525798'
154+ dkim.validate_signature_fields(sig)
155+ # try with late t
156+ sigX[b't'] = b'1400000000'
157+ self.assertRaises(dkim.ValidationError, dkim.validate_signature_fields, sigX)
158+ # try without t
159+ del sigX[b't']
160+ dkim.validate_signature_fields(sigX)
161+
162+
163 def test_suite():
164 from unittest import TestLoader
165 return TestLoader().loadTestsFromName(__name__)
166
167=== modified file 'dkim/tests/test_util.py'
168--- dkim/tests/test_util.py 2011-06-17 03:38:14 +0000
169+++ dkim/tests/test_util.py 2013-03-22 22:58:34 +0000
170@@ -62,7 +62,7 @@
171 DuplicateTag, parse_tag_value, b'foo=bar;foo=baz')
172
173 def test_trailing_whitespace(self):
174- hval = '''v=1; a=rsa-sha256; d=facebookmail.com; s=s1024-2011-q2; c=relaxed/simple;
175+ hval = b'''v=1; a=rsa-sha256; d=facebookmail.com; s=s1024-2011-q2; c=relaxed/simple;
176 q=dns/txt; i=@facebookmail.com; t=1308078492;
177 h=From:Subject:Date:To:MIME-Version:Content-Type;
178 bh=+qPyCOiDQkusTPstCoGjimgDgeZbUaJWIr1mdE6RFxk=;
179@@ -71,7 +71,7 @@
180 3KzW0yB9JHwiDCw1EioVkv+OMHhAYzoIypA0bQyi2bc=;
181 '''
182 sig = parse_tag_value(hval)
183- self.assertEquals(sig[b't'],'1308078492')
184+ self.assertEquals(sig[b't'],b'1308078492')
185 self.assertEquals(len(sig),11)
186
187

Subscribers

People subscribed via source and target branches