Merge lp:~stevanr/linaro-license-protection/automate-integration-tests into lp:~linaro-automation/linaro-license-protection/trunk

Proposed by Stevan Radaković
Status: Merged
Approved by: James Tunnicliffe
Approved revision: 75
Merged at revision: 71
Proposed branch: lp:~stevanr/linaro-license-protection/automate-integration-tests
Merge into: lp:~linaro-automation/linaro-license-protection/trunk
Diff against target: 517 lines (+305/-143)
5 files modified
.htaccess (+2/-2)
README (+7/-0)
testing/filefetcher.py (+0/-129)
testing/license_protected_file_downloader.py (+284/-0)
testing/test_click_through_license.py (+12/-12)
To merge this branch: bzr merge lp:~stevanr/linaro-license-protection/automate-integration-tests
Reviewer                        Review Type   Date Requested   Status
James Tunnicliffe (community)                                  Approve
Данило Шеган                    code                           Pending
Review via email: mp+105209@code.launchpad.net

Description of the change

Update filefetcher to the newest version from James' branch.
Fix https://bugs.launchpad.net/linaro-license-protection/+bug/996002
Automate integration tests after deployment to production.

Stevan Radaković (stevanr) wrote :

Sorry guys, I accidentally did everything in one commit. I reverted the file to James' version and then pushed my new version again, so the changes show up in the diff.

71. By Stevan Radaković

Reverting file so changes can be seen.

72. By Stevan Radaković

Done reverting file so changes can be seen.

Данило Шеган (danilo) wrote :

This would be a good opportunity to add a dependencies section to the 'Setup' section in the README (or if you have a better idea of where it should go, just go for it). Something along the following lines:

Dependencies
............

libapache2-mod-php5

Testing: phpunit, testrepository, python-html2text

Данило Шеган (danilo) wrote :

Also, tests are still not passing with these changes. Have you had a chance to investigate that?

73. By Stevan Radaković

Fix wrong indentation in _get_license method

74. By Stevan Radaković

Incorrect parsing of the domain fixed

75. By Stevan Radaković

Tests updated to use new filefetcher

James Tunnicliffe (dooferlad) wrote :

I am working on the assumption that the tests pass now :-)

This looks fine. Please add the lines to README that Danilo suggested as well, but I don't think there is any reason to re-review with that change, so I will approve this. Of course, if you have other ideas for the set up instructions, then please just check in without that change.

review: Approve
76. By Stevan Radaković

Add Dependencies section to README file

77. By Stevan Radaković

Revert accidental commit of __init__.py

Preview Diff

=== modified file '.htaccess'
--- .htaccess 2012-05-02 11:33:12 +0000
+++ .htaccess 2012-05-11 12:35:21 +0000
@@ -13,12 +13,12 @@
 ## without port number for use in cookie domain
 RewriteCond %{SERVER_PORT} !^80$ [OR]
 RewriteCond %{SERVER_PORT} !^443$
-RewriteCond %{HTTP_HOST} (.*)(\:.*)
+RewriteCond %{HTTP_HOST} ^([^:]*)$
 RewriteRule .* - [E=CO_DOMAIN:%1]
 
 RewriteCond %{SERVER_PORT} !^80$ [OR]
 RewriteCond %{SERVER_PORT} !^443$
-RewriteCond %{HTTP_HOST} (^.*$)
+RewriteCond %{HTTP_HOST} ^([^:]*):(.*)$
 RewriteRule .* - [E=CO_DOMAIN:%1]
 
 ## Let internal hosts through always.
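
The two replacement HTTP_HOST patterns in this hunk can be tried outside Apache. What follows is a minimal Python 3 sketch, not part of the branch. It assumes Python's re module treats these two expressions the same way Apache's regex engine does, and the helper name cookie_domain is hypothetical. It suggests that the first pattern matches a Host value without a port, the second matches a value with a port, and in both cases group 1 (the value copied into CO_DOMAIN through %1) is the bare host name.

```python
import re

# Patterns copied from the new .htaccess lines in the hunk above.
HOST_ONLY = re.compile(r"^([^:]*)$")
HOST_WITH_PORT = re.compile(r"^([^:]*):(.*)$")


def cookie_domain(http_host):
    """Return the host part of a Host header value, dropping any port."""
    for pattern in (HOST_ONLY, HOST_WITH_PORT):
        match = pattern.match(http_host)
        if match:
            # Group 1 corresponds to %1 in the RewriteRule.
            return match.group(1)
    return None


print(cookie_domain("snapshots.linaro.org"))  # snapshots.linaro.org
print(cookie_domain("localhost:8080"))        # localhost
```

By contrast, the removed pattern `(^.*$)` captures the whole Host value, so a port would have ended up in group 1.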
=== modified file 'README'
--- README 2012-05-08 19:51:41 +0000
+++ README 2012-05-11 12:35:21 +0000
@@ -15,6 +15,13 @@
 
 Currently, all directories/files containing either 'origen' or 'snowball' in the URL path are protected with appropriate license (Samsung or ST-E) click-through.
 
+Dependencies
+............
+
+libapache2-mod-php5
+
+Testing: phpunit, testrepository, python-html2text
+
 
 Technical details
 -----------------
=== removed file 'testing/filefetcher.py'
--- testing/filefetcher.py 2012-01-13 11:48:16 +0000
+++ testing/filefetcher.py 1970-01-01 00:00:00 +0000
@@ -1,129 +0,0 @@
-#!/usr/bin/env python
-
-# Changes required to address EULA for the origen hwpacks
-
-import argparse
-import os
-import pycurl
-import re
-import urlparse
-
-
-class LicenseProtectedFileFetcher:
-    """Fetch a file from the web that may be protected by a license redirect
-
-    This is designed to run on snapshots.linaro.org. License HTML file are in
-    the form:
-
-    <vendor>.html has a link to <vendor>-accept.html
-
-    If self.get is pointed at a file that has to go through one of these
-    licenses, it should be able to automatically accept the license and
-    download the file.
-
-    Once a license has been accepted, it will be used for all following
-    downloads.
-
-    If self.close() is called before the object is deleted, cURL will store
-    the license accept cookie to cookies.txt, so it can be used for later
-    downloads.
-
-    """
-    def __init__(self):
-        """Set up cURL"""
-        self.curl = pycurl.Curl()
-        self.curl.setopt(pycurl.FOLLOWLOCATION, 1)
-        self.curl.setopt(pycurl.WRITEFUNCTION, self._write_body)
-        self.curl.setopt(pycurl.HEADERFUNCTION, self._write_header)
-        self.curl.setopt(pycurl.COOKIEFILE, "cookies.txt")
-        self.curl.setopt(pycurl.COOKIEJAR, "cookies.txt")
-
-    def _get(self, url):
-        """Clear out header and body storage, fetch URL, filling them in."""
-        self.curl.setopt(pycurl.URL, url)
-
-        self.body = ""
-        self.header = ""
-
-        self.curl.perform()
-
-    def get(self, url, ignore_license=False, accept_license=True):
-        """Fetch the requested URL, ignoring license at all or
-        accepting or declining licenses, returns file body.
-
-        Fetches the file at url. If a redirect is encountered, it is
-        expected to be to a license that has an accept or decline link.
-        Follow that link, then download original file or nolicense notice.
-
-        """
-        self._get(url)
-
-        if ignore_license:
-            return self.body
-
-        location = self._get_location()
-        if location:
-            # Off to the races - we have been redirected.
-            # Expect to find a link to self.location with -accepted or
-            # -declined inserted before the .html,
-            # i.e. ste.html -> ste-accepted.html
-
-            # Get the file from the URL (full path)
-            file = urlparse.urlparse(location).path
-
-            # Get the file without the rest of the path
-            file = os.path.split(file)[-1]
-
-            # Look for a link with accepted.html or declined.html
-            # in the page name. Follow it.
-            new_file = None
-            for line in self.body.splitlines():
-                if accept_license:
-                    link_search = re.search("""href=.*?["'](.*?-accepted.html)""",
-                                            line)
-                else:
-                    link_search = re.search("""href=.*?["'](.*?-declined.html)""",
-                                            line)
-                if link_search:
-                    # Have found license decline URL!
-                    new_file = link_search.group(1)
-
-            if new_file:
-                # accept or decline the license...
-                next_url = re.sub(file, new_file, location)
-                self._get(next_url)
-
-                # The above get *should* take us to the file requested via
-                # a redirect. If we manually need to follow that redirect,
-                # do that now.
-
-                if accept_license and self._get_location():
-                    # If we haven't been redirected to our original file,
-                    # we should be able to just download it now.
-                    self._get(url)
-
-        return self.body
-
-    def _search_header(self, field):
-        """Search header for the supplied field, return field / None"""
-        for line in self.header.splitlines():
-            search = re.search(field + ":\s+(.*?)$", line)
-            if search:
-                return search.group(1)
-        return None
-
-    def _get_location(self):
-        """Return content of Location field in header / None"""
-        return self._search_header("Location")
-
-    def _write_body(self, buf):
-        """Used by curl as a sink for body content"""
-        self.body += buf
-
-    def _write_header(self, buf):
-        """Used by curl as a sink for header content"""
-        self.header += buf
-
-    def close(self):
-        """Wrapper to close curl - this will allow curl to write out cookies"""
-        self.curl.close()
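
The accept/decline link lookup in the removed get method above can be exercised on its own. Below is a minimal Python 3 sketch rather than the code from the branch: the function name find_license_links and the sample page are hypothetical, the dot before html is escaped (the original patterns leave it unescaped), and the first match on the page is kept, whereas the removed loop kept the last one.

```python
import re

# Same shape as the patterns in the file above, with the "." escaped.
ACCEPT_LINK = re.compile(r"""href=.*?["'](.*?-accepted\.html)""")
DECLINE_LINK = re.compile(r"""href=.*?["'](.*?-declined\.html)""")


def find_license_links(html):
    """Return (accept_href, decline_href); either is None if not found."""
    accept = decline = None
    for line in html.splitlines():
        if accept is None:
            found = ACCEPT_LINK.search(line)
            if found:
                accept = found.group(1)
        if decline is None:
            found = DECLINE_LINK.search(line)
            if found:
                decline = found.group(1)
    return accept, decline


# Hypothetical license page, for illustration only.
SAMPLE_PAGE = """<html><body>
<div id="license-text">Example license text.</div>
<a href="ste-accepted.html">Accept</a>
<a href="ste-declined.html">Decline</a>
</body></html>"""

print(find_license_links(SAMPLE_PAGE))
# ('ste-accepted.html', 'ste-declined.html')
```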
=== added file 'testing/license_protected_file_downloader.py'
--- testing/license_protected_file_downloader.py 1970-01-01 00:00:00 +0000
+++ testing/license_protected_file_downloader.py 2012-05-11 12:35:21 +0000
@@ -0,0 +1,284 @@
+#!/usr/bin/env python
+
+import argparse
+import os
+import pycurl
+import re
+import urlparse
+import html2text
+from BeautifulSoup import BeautifulSoup
+
+class LicenseProtectedFileFetcher:
+    """Fetch a file from the web that may be protected by a license redirect
+
+    This is designed to run on snapshots.linaro.org. License HTML file are in
+    the form:
+
+    <vendor>.html has a link to <vendor>-accept.html
+
+    If self.get is pointed at a file that has to go through one of these
+    licenses, it should be able to automatically accept the license and
+    download the file.
+
+    Once a license has been accepted, it will be used for all following
+    downloads.
+
+    If self.close() is called before the object is deleted, cURL will store
+    the license accept cookie to cookies.txt, so it can be used for later
+    downloads.
+
+    """
+    def __init__(self, cookie_file="cookies.txt"):
+        """Set up cURL"""
+        self.curl = pycurl.Curl()
+        self.curl.setopt(pycurl.WRITEFUNCTION, self._write_body)
+        self.curl.setopt(pycurl.HEADERFUNCTION, self._write_header)
+        self.curl.setopt(pycurl.FOLLOWLOCATION, 1)
+        self.curl.setopt(pycurl.COOKIEFILE, cookie_file)
+        self.curl.setopt(pycurl.COOKIEJAR, cookie_file)
+        self.file_out = None
+
+    def _get(self, url):
+        """Clear out header and body storage, fetch URL, filling them in."""
+        url = url.encode("ascii")
+        self.curl.setopt(pycurl.URL, url)
+
+        self.body = ""
+        self.header = ""
+
+        if self.file_name:
+            self.file_out = open(self.file_name, 'w')
+        else:
+            self.file_out = None
+
+        self.curl.perform()
+        self._parse_headers(url)
+
+        if self.file_out:
+            self.file_out.close()
+
+    def _parse_headers(self, url):
+        header = {}
+        for line in self.header.splitlines():
+            # Header lines typically are of the form thing: value...
+            test_line = re.search("^(.*?)\s*:\s*(.*)$", line)
+
+            if test_line:
+                header[test_line.group(1)] = test_line.group(2)
+
+        # The location attribute is sometimes relative, but we would
+        # like to have it as always absolute...
+        if 'Location' in header:
+            parsed_location = urlparse.urlparse(header["Location"])
+
+            # If not an absolute location...
+            if not parsed_location.netloc:
+                parsed_source_url = urlparse.urlparse(url)
+                new_location = ["", "", "", "", ""]
+
+                new_location[0] = parsed_source_url.scheme
+                new_location[1] = parsed_source_url.netloc
+                new_location[2] = header["Location"]
+
+                # Update location with absolute URL
+                header["Location"] = urlparse.urlunsplit(new_location)
+
+        self.header_text = self.header
+        self.header = header
+
+    def get_headers(self, url):
+        url = url.encode("ascii")
+        self.curl.setopt(pycurl.URL, url)
+
+        self.body = ""
+        self.header = ""
+
+        # Setting NOBODY causes CURL to just fetch the header.
+        self.curl.setopt(pycurl.NOBODY, True)
+        self.curl.perform()
+        self.curl.setopt(pycurl.NOBODY, False)
+
+        self._parse_headers(url)
+
+        return self.header
+
+    def get_or_return_license(self, url, file_name=None):
+        """Get file at the requested URL or, if behind a license, return that.
+
+        If the URL provided does not redirect us to a license, then return the
+        body of that file. If we are redirected to a license click through
+        then return (the license as plain text, url to accept the license).
+
+        If the user of this function accepts the license, then they should
+        call get_protected_file."""
+
+        self.file_name = file_name
+
+        # Get the license details. If this returns None, the file isn't license
+        # protected and we can just return the file we started to get in the
+        # function (self.body).
+        license_details = self._get_license(url)
+
+        if license_details:
+            return license_details
+
+        return self.body
+
+    def get(self, url, file_name=None, ignore_license=False, accept_license=True):
+        """Fetch the requested URL, accepting licenses
+
+        Fetches the file at url. If a redirect is encountered, it is
+        expected to be to a license that has an accept link. Follow that link,
+        then download the original file. Returns the fist 1MB of the file
+        (see _write_body).
+
+        """
+
+        self.file_name = file_name
+        if ignore_license:
+            self._get(url)
+            return self.body
+
+        license_details = self._get_license(url)
+
+        if license_details:
+            # Found a license.
+            if accept_license:
+                # Accept the license without looking at it and
+                # start fetching the file we originally wanted.
+                accept_url = license_details[1]
+                self.get_protected_file(accept_url, url)
+            else:
+                # We want to decline the license and return the notice.
+                decline_url = license_details[2]
+                self._get(decline_url)
+
+        else:
+            # If we got here, there wasn't a license protecting the file
+            # so we just fetch it.
+            self._get(url)
+
+        return self.body
+
+    def _get_license(self, url):
+        """Return (license, accept URL, decline URL) if found,
+        else return None.
+
+        """
+
+        self.get_headers(url)
+
+        if "Location" in self.header and self.header["Location"] != url:
+            # We have been redirected to a new location - the license file
+            location = self.header["Location"]
+
+            # Fetch the license HTML
+            self._get(location)
+
+            # Get the file from the URL (full path)
+            file = urlparse.urlparse(location).path
+
+            # Get the file without the rest of the path
+            file = os.path.split(file)[-1]
+
+            # Look for a link with accepted.html in the page name. Follow it.
+            accept_search, decline_search = None, None
+            for line in self.body.splitlines():
+                if not accept_search:
+                    accept_search = re.search(
+                        """href=.*?["'](.*?-accepted.html)""",
+                        line)
+                if not decline_search:
+                    decline_search = re.search(
+                        """href=.*?["'](.*?-declined.html)""",
+                        line)
+
+            if accept_search and decline_search:
+                # Have found license accept URL!
+                new_file = accept_search.group(1)
+                accept_url = re.sub(file, new_file, location)
+
+                # Found decline URL as well.
+                new_file_decline = decline_search.group(1)
+                decline_url = re.sub(file, new_file_decline, location)
+
+                # Parse the HTML using BeautifulSoup
+                soup = BeautifulSoup(self.body)
+
+                # The license is in a div with the ID license-text, so we
+                # use this to pull just the license out of the HTML.
+                html_license = u""
+                for chunk in soup.findAll(id="license-text"):
+                    # Output of chunk.prettify is UTF8, but comes back
+                    # as a str, so convert it here.
+                    html_license += chunk.prettify().decode("utf-8")
+
+                text_license = html2text.html2text(html_license)
+
+                return text_license, accept_url, decline_url
+
+        return None
+
+    def get_protected_file(self, accept_url, url):
+        """Gets the file redirected to by the accept_url"""
+
+        self._get(accept_url)  # Accept the license
+
+        if not("Location" in self.header and self.header["Location"] == url):
+            # If we got here, we don't have the file yet (weren't redirected
+            # to it). Fetch our target file. This should work now that we have
+            # the right cookie.
+            self._get(url)  # Download the target file
+
+        return self.body
+
+    def _write_body(self, buf):
+        """Used by curl as a sink for body content"""
+
+        # If we have a target file to write to, write to it
+        if self.file_out:
+            self.file_out.write(buf)
+
+        # Only buffer first 1MB of body. This should be plenty for anything
+        # we wish to parse internally.
+        if len(self.body) < 1024*1024*1024:
+            # XXX Would be nice to stop keeping the file in RAM at all and
+            # passing large buffers around. Perhaps only keep in RAM if
+            # file_name == None? (used for getting directory listings
+            # normally).
+            self.body += buf
+
+    def _write_header(self, buf):
+        """Used by curl as a sink for header content"""
+        self.header += buf
+
+    def register_progress_callback(self, callback):
+        self.curl.setopt(pycurl.NOPROGRESS, 0)
+        self.curl.setopt(pycurl.PROGRESSFUNCTION, callback)
+
+    def close(self):
+        """Wrapper to close curl - this will allow curl to write out cookies"""
+        self.curl.close()
+
+def main():
+    """Download file specified on command line"""
+    parser = argparse.ArgumentParser(description="Download a file, accepting "
+                                     "any licenses required to do so.")
+
+    parser.add_argument('url', metavar="URL", type=str, nargs=1,
+                        help="URL of file to download.")
+
+    args = parser.parse_args()
+
+    fetcher = LicenseProtectedFileFetcher()
+
+    # Get file name from URL
+    file_name = os.path.basename(urlparse.urlparse(args.url[0]).path)
+    if not file_name:
+        file_name = "downloaded"
+    fetcher.get(args.url[0], file_name)
+
+    fetcher.close()
+
+if __name__ == "__main__":
+    main()
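
One detail of _write_body in the file above is worth illustrating: the comment speaks of buffering the first 1MB, while the expression 1024*1024*1024 evaluates to 1 GiB, and because the length check runs before the append, the buffer can overshoot the limit by one chunk. The following Python 3 sketch shows one way a hard cap could look. It is not part of the branch; the class name BoundedBody and its limit parameter are hypothetical.

```python
class BoundedBody:
    """Collect streamed chunks, keeping at most `limit` bytes in memory."""

    def __init__(self, limit=1024 * 1024):  # 1 MiB by default
        self.limit = limit
        self.data = b""

    def write(self, chunk):
        # Keep only as much of the chunk as still fits under the limit.
        remaining = self.limit - len(self.data)
        if remaining > 0:
            self.data += chunk[:remaining]
        # Report the whole chunk as handled so the caller keeps streaming.
        return len(chunk)


body = BoundedBody(limit=10)
body.write(b"hello ")
body.write(b"world!")
print(body.data)  # b'hello worl'
```

A sink of this shape could also write each chunk to a file before truncating, which is what the XXX note in _write_body hints at.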
=== modified file 'testing/test_click_through_license.py'
--- testing/test_click_through_license.py 2012-05-07 08:48:51 +0000
+++ testing/test_click_through_license.py 2012-05-11 12:35:21 +0000
@@ -9,7 +9,7 @@
 
 from testtools import TestCase
 from testtools.matchers import Mismatch
-from filefetcher import LicenseProtectedFileFetcher
+from license_protected_file_downloader import LicenseProtectedFileFetcher
 
 fetcher = LicenseProtectedFileFetcher()
 cwd = os.getcwd()
@@ -145,19 +145,19 @@
         self.assertThat(testfile, Contains(search))
 
     def test_redirect_to_license_samsung(self):
-        search = "LICENSE AGREEMENT"
-        testfile = fetcher.get(host + samsung_test_file, ignore_license=True)
-        self.assertThat(testfile, Contains(search))
+        search = "PLEASE READ THE FOLLOWING AGREEMENT CAREFULLY"
+        testfile = fetcher.get_or_return_license(host + samsung_test_file)
+        self.assertThat(testfile[0], Contains(search))
 
     def test_redirect_to_license_ste(self):
-        search = "LICENSE AGREEMENT"
-        testfile = fetcher.get(host + ste_test_file, ignore_license=True)
-        self.assertThat(testfile, Contains(search))
+        search = "PLEASE READ THE FOLLOWING AGREEMENT CAREFULLY"
+        testfile = fetcher.get_or_return_license(host + ste_test_file)
+        self.assertThat(testfile[0], Contains(search))
 
     def test_redirect_to_license_linaro(self):
-        search = "LICENSE AGREEMENT"
-        testfile = fetcher.get(host + linaro_test_file, ignore_license=True)
-        self.assertThat(testfile, Contains(search))
+        search = "Linaro license."
+        testfile = fetcher.get_or_return_license(host + linaro_test_file)
+        self.assertThat(testfile[0], Contains(search))
 
     def test_decline_license_samsung(self):
         search = "License has not been accepted"
@@ -214,13 +214,13 @@
     def test_license_accepted_samsung(self):
         search = "This is protected with click-through Samsung license."
         os.rename("%s/cookies.samsung" % docroot, "%s/cookies.txt" % docroot)
-        testfile = fetcher.get(host + samsung_test_file, ignore_license=True)
+        testfile = fetcher.get(host + samsung_test_file)
         self.assertThat(testfile, Contains(search))
 
     def test_license_accepted_ste(self):
         search = "This is protected with click-through ST-E license."
         os.rename("%s/cookies.ste" % docroot, "%s/cookies.txt" % docroot)
-        testfile = fetcher.get(host + ste_test_file, ignore_license=True)
+        testfile = fetcher.get(host + ste_test_file)
         self.assertThat(testfile, Contains(search))
 
     def test_internal_host_samsung(self):
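
The updated tests index the result with [0] because, in the diff above, get_or_return_license returns the tuple from _get_license (license text, accept URL, decline URL) when a license page is found, and the plain body otherwise. A caller that does not know in advance which case applies might normalise the result roughly as below. This is a Python 3 sketch under that reading of the diff; the helper name license_text_or_body is hypothetical and not part of the branch.

```python
def license_text_or_body(result):
    """Split a get_or_return_license() result into (license_text, body).

    Exactly one of the two returned values is None.
    """
    if isinstance(result, tuple):
        # License found: (license text, accept URL, decline URL).
        return result[0], None
    # No license redirect: the result is the file body itself.
    return None, result


# Stand-in values, for illustration only.
print(license_text_or_body(("Linaro license.", "accept-url", "decline-url")))
print(license_text_or_body("plain file body"))
```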
