Merge lp:~jerith/ibid/split_utils into lp:~ibid-core/ibid/old-trunk-1.6

Proposed by Jeremy Thurgood
Status: Merged
Approved by: Stefano Rivera
Approved revision: not available
Merged at revision: 808
Proposed branch: lp:~jerith/ibid/split_utils
Merge into: lp:~ibid-core/ibid/old-trunk-1.6
Diff against target: 191 lines (+64/-54)
6 files modified
ibid/plugins/feeds.py (+2/-1)
ibid/plugins/icecast.py (+2/-1)
ibid/plugins/lookup.py (+3/-2)
ibid/plugins/url.py (+1/-1)
ibid/utils/__init__.py (+1/-49)
ibid/utils/html.py (+55/-0)
To merge this branch: bzr merge lp:~jerith/ibid/split_utils
Reviewer Review Type Date Requested Status
Jonathan Hitchcock Abstain
Michael Gorven Approve
Stefano Rivera Approve
Review via email: mp+16619@code.launchpad.net

This proposal supersedes a proposal from 2009-12-28.

Revision history for this message
Jeremy Thurgood (jerith) wrote : Posted in a previous version of this proposal

A small amount of reorganisation makes html5lib and BeautifulSoup optional dependencies.

Revision history for this message
Jonathan Hitchcock (vhata) wrote : Posted in a previous version of this proposal

ImportError: No module named ibid.utils.html

review: Needs Fixing
Revision history for this message
Michael Gorven (mgorven) wrote : Posted in a previous version of this proposal

You forgot to add ibid/utils/html.py.
 review needs_fixing

review: Needs Fixing
Revision history for this message
Michael Gorven (mgorven) wrote : Posted in a previous version of this proposal

  File "/home/mgorven/bzr/ibid/jerith/split_utils/ibid/utils/html.py", line 7,
in get_html_parse_tree
    req = urllib2.Request(url, data, headers)
NameError: global name 'urllib2' is not defined

 review needs_fixing

review: Needs Fixing
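This class of bug is easy to ship because Python resolves globals at call time, not at definition time: the module imports cleanly and the missing `urllib2` import only surfaces when the function first runs. A minimal reproduction (hypothetical `fetch` function, not ibid code):

```python
def fetch(url):
    # urllib2 is deliberately not imported, mirroring the missing
    # import in the first version of ibid/utils/html.py.
    req = urllib2.Request(url)
    return req

# Defining the function succeeds; only calling it raises NameError.
try:
    fetch("http://example.com")
    raised = None
except NameError as e:
    raised = e
```

Which is also why the "we need automated tests" comment below is on point: even a trivial smoke test that calls the function would have caught this.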
Revision history for this message
Jeremy Thurgood (jerith) wrote :

Actually tested it this time. We *really* need automated tests.

Revision history for this message
Stefano Rivera (stefanor) wrote :

I'm happy. But can we add a vim-line to html.py?

review: Approve
Revision history for this message
Michael Gorven (mgorven) wrote :

Looks fine.
 review approve

review: Approve
Revision history for this message
Jonathan Hitchcock (vhata) wrote :

10:42 <Vhata> what you've actually created is ibid.utils.needs_beautiful_soup
10:42 <Vhata> not ibid.utils.html
10:44 <jerith> Vhata: Correct. I was going to put more things in there, but it turned out that only the one function used the two libraries and other plugins used the other html-related functions.
10:44 <jerith> And I couldn't come up with a better name.
10:46 <Vhata> so now people are going to have to dig around to work out whether to import from ibid.utils or ibid.utils.html (and why not ibid.utils.json or ibid.utils.datetime?)
10:46 <Vhata> if you're going to create ibid.utils.html, then put all the html stuff in there

review: Abstain
Revision history for this message
Jeremy Thurgood (jerith) wrote :

To paraphrase the next bit of IRC conversation, I believe that the following two statements are true:

1. A general dependency issue affecting all bot owners (those who care about the mass of extra libraries they install, anyway) outweighs a specific utility function location issue that affects only a subset of plugin developers.

2. Since the only way for plugin developers to find said utility functions is by reading documentation and code, and both of those will show them where the relevant functions are, the location of the utility functions isn't a big issue.

Perhaps renaming utils.html to utils.htmlextras will make the situation more explicit.

In any case, the dependency issue is a bigger deal for non-Debian people and those, like me, who prefer not to fill their system Python library with random stuff that isn't needed by package-managed applications.
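This branch handles the optional dependency by moving the imports into a submodule, but the general pattern Jeremy is arguing for can also be expressed as a guarded import. A hedged sketch (the helper name is illustrative, not ibid API):

```python
def optional_import(name):
    """Return the named module if it is installed, else None."""
    try:
        return __import__(name)
    except ImportError:
        return None

# Stdlib modules are always available...
stdlib_json = optional_import("json")

# ...while a heavy extra (BeautifulSoup stands in for the optional
# parsing stack) may simply be absent without breaking startup.
soup = optional_import("BeautifulSoup")
```

A bot owner who never loads the url/feeds/lookup plugins then never needs the extra libraries installed at all.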

Preview Diff

=== modified file 'ibid/plugins/feeds.py'
--- ibid/plugins/feeds.py	2009-12-10 09:56:44 +0000
+++ ibid/plugins/feeds.py	2009-12-28 14:06:13 +0000
@@ -12,7 +12,8 @@
 from ibid.config import IntOption
 from ibid.plugins import Processor, match, authorise, run_every
 from ibid.models import Base, VersionedSchema
-from ibid.utils import cacheable_download, get_html_parse_tree, human_join
+from ibid.utils import cacheable_download, human_join
+from ibid.utils.html import get_html_parse_tree

 help = {'feeds': u'Displays articles from RSS and Atom feeds'}


=== modified file 'ibid/plugins/icecast.py'
--- ibid/plugins/icecast.py	2009-11-30 15:29:43 +0000
+++ ibid/plugins/icecast.py	2009-12-28 14:06:13 +0000
@@ -3,7 +3,8 @@

 from ibid.config import DictOption, IntOption
 from ibid.plugins import Processor, match, run_every
-from ibid.utils import get_html_parse_tree, human_join
+from ibid.utils import human_join
+from ibid.utils.html import get_html_parse_tree

 log = logging.getLogger('plugins.icecast')


=== modified file 'ibid/plugins/lookup.py'
--- ibid/plugins/lookup.py	2009-12-17 16:22:53 +0000
+++ ibid/plugins/lookup.py	2009-12-28 14:06:13 +0000
@@ -15,8 +15,9 @@
 from ibid.compat import defaultdict, dt_strptime, ElementTree
 from ibid.config import Option, BoolOption, DictOption
 from ibid.plugins import Processor, match, handler
-from ibid.utils import ago, decode_htmlentities, get_html_parse_tree, \
-    cacheable_download, json_webservice, human_join, plural
+from ibid.utils import ago, decode_htmlentities, cacheable_download, \
+    json_webservice, human_join, plural
+from ibid.utils.html import get_html_parse_tree

 log = logging.getLogger('plugins.lookup')


=== modified file 'ibid/plugins/url.py'
--- ibid/plugins/url.py	2009-10-18 16:49:32 +0000
+++ ibid/plugins/url.py	2009-12-28 14:06:13 +0000
@@ -13,7 +13,7 @@
 from ibid.plugins import Processor, match, handler
 from ibid.config import Option, ListOption
 from ibid.models import Base, VersionedSchema
-from ibid.utils import get_html_parse_tree
+from ibid.utils.html import get_html_parse_tree

 help = {'url': u'Captures URLs seen in channel to database and/or to delicious, and shortens and lengthens URLs'}


=== added directory 'ibid/utils'
=== renamed file 'ibid/utils.py' => 'ibid/utils/__init__.py'
--- ibid/utils.py	2009-12-20 22:09:29 +0000
+++ ibid/utils/__init__.py	2009-12-28 14:06:13 +0000
@@ -1,4 +1,3 @@
-import cgi
 from gzip import GzipFile
 from htmlentitydefs import name2codepoint
 import os
@@ -12,11 +11,9 @@
 import zlib

 from dateutil.tz import tzlocal, tzutc
-from html5lib import HTMLParser, treebuilders
-from BeautifulSoup import BeautifulSoup

 import ibid
-from ibid.compat import defaultdict, ElementTree, json
+from ibid.compat import defaultdict, json

 def ago(delta, units=None):
     parts = []
@@ -158,51 +155,6 @@

     return unicode(timestamp.strftime(format.encode('utf8')), 'utf8')

-class ContentTypeException(Exception):
-    pass
-
-def get_html_parse_tree(url, data=None, headers={}, treetype='beautifulsoup'):
-    "Request a URL, parse with html5lib, and return a parse tree from it"
-
-    req = urllib2.Request(url, data, headers)
-    f = urllib2.urlopen(req)
-
-    if f.info().gettype() not in ('text/html', 'application/xhtml+xml'):
-        f.close()
-        raise ContentTypeException("Content type isn't HTML, but " + f.info().gettype())
-
-    data = f.read()
-    f.close()
-
-    encoding = None
-    contentType = f.headers.get('content-type')
-    if contentType:
-        (mediaType, params) = cgi.parse_header(contentType)
-        encoding = params.get('charset')
-
-    compression = f.headers.get('content-encoding')
-    if compression:
-        if compression.lower() == "deflate":
-            try:
-                data = zlib.decompress(data)
-            except zlib.error:
-                data = zlib.decompress(data, -zlib.MAX_WBITS)
-        elif compression.lower() == "gzip":
-            compressedstream = StringIO(data)
-            gzipper = GzipFile(fileobj=compressedstream)
-            data = gzipper.read()
-
-    if treetype == "beautifulsoup":
-        return BeautifulSoup(data, convertEntities=BeautifulSoup.HTML_ENTITIES)
-    elif treetype == "etree":
-        treebuilder = treebuilders.getTreeBuilder("etree", ElementTree)
-    else:
-        treebuilder = treebuilders.getTreeBuilder(treetype)
-
-    parser = HTMLParser(tree=treebuilder)
-
-    return parser.parse(data, encoding = encoding)
-
 class JSONException(Exception):
     pass


=== added file 'ibid/utils/html.py'
--- ibid/utils/html.py	1970-01-01 00:00:00 +0000
+++ ibid/utils/html.py	2009-12-28 14:06:13 +0000
@@ -0,0 +1,55 @@
+import cgi
+import zlib
+import urllib2
+from gzip import GzipFile
+from StringIO import StringIO
+
+from html5lib import HTMLParser, treebuilders
+from BeautifulSoup import BeautifulSoup
+
+from ibid.compat import ElementTree
+
+class ContentTypeException(Exception):
+    pass
+
+def get_html_parse_tree(url, data=None, headers={}, treetype='beautifulsoup'):
+    "Request a URL, parse with html5lib, and return a parse tree from it"
+
+    req = urllib2.Request(url, data, headers)
+    f = urllib2.urlopen(req)
+
+    if f.info().gettype() not in ('text/html', 'application/xhtml+xml'):
+        f.close()
+        raise ContentTypeException("Content type isn't HTML, but " + f.info().gettype())
+
+    data = f.read()
+    f.close()
+
+    encoding = None
+    contentType = f.headers.get('content-type')
+    if contentType:
+        (mediaType, params) = cgi.parse_header(contentType)
+        encoding = params.get('charset')
+
+    compression = f.headers.get('content-encoding')
+    if compression:
+        if compression.lower() == "deflate":
+            try:
+                data = zlib.decompress(data)
+            except zlib.error:
+                data = zlib.decompress(data, -zlib.MAX_WBITS)
+        elif compression.lower() == "gzip":
+            compressedstream = StringIO(data)
+            gzipper = GzipFile(fileobj=compressedstream)
+            data = gzipper.read()
+
+    if treetype == "beautifulsoup":
+        return BeautifulSoup(data, convertEntities=BeautifulSoup.HTML_ENTITIES)
+    elif treetype == "etree":
+        treebuilder = treebuilders.getTreeBuilder("etree", ElementTree)
+    else:
+        treebuilder = treebuilders.getTreeBuilder(treetype)
+
+    parser = HTMLParser(tree=treebuilder)
+
+    return parser.parse(data, encoding = encoding)
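The Content-Encoding handling in `get_html_parse_tree` is worth a note: `zlib.decompress` expects a zlib header, but some servers send raw deflate streams, so the function retries with a negative window size to skip header parsing. A stdlib-only Python 3 sketch of just that logic (`io.BytesIO` replaces the Py2 `StringIO`; the real function goes on to parse the HTML):

```python
import zlib
from gzip import GzipFile
from io import BytesIO

def decompress_body(data, content_encoding):
    """Mirror of the Content-Encoding branch in get_html_parse_tree."""
    if content_encoding == "deflate":
        try:
            return zlib.decompress(data)
        except zlib.error:
            # Raw deflate stream without a zlib header: a negative
            # window size tells zlib not to expect one.
            return zlib.decompress(data, -zlib.MAX_WBITS)
    elif content_encoding == "gzip":
        return GzipFile(fileobj=BytesIO(data)).read()
    return data

body = b"<html><body>hello</body></html>"

# Standard zlib-wrapped deflate takes the first path...
assert decompress_body(zlib.compress(body), "deflate") == body

# ...while a header-less stream exercises the fallback.
compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = compressor.compress(body) + compressor.flush()
assert decompress_body(raw, "deflate") == body
```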
