Merge lp:~wgrant/loggerhead/bug-740142 into lp:loggerhead

Proposed by William Grant
Status: Merged
Approved by: Robert Collins
Approved revision: 448
Merged at revision: 442
Proposed branch: lp:~wgrant/loggerhead/bug-740142
Merge into: lp:loggerhead
Diff against target: 247 lines (+96/-21)
6 files modified
loggerhead/controllers/view_ui.py (+1/-2)
loggerhead/templatefunctions.py (+20/-12)
loggerhead/tests/__init__.py (+1/-0)
loggerhead/tests/test_simple.py (+7/-3)
loggerhead/tests/test_util.py (+33/-0)
loggerhead/util.py (+34/-4)
To merge this branch: bzr merge lp:~wgrant/loggerhead/bug-740142
Reviewer Review Type Date Requested Status
Robert Collins Approve
Review via email: mp+54463@code.launchpad.net

Commit message

Properly escape filenames throughout loggerhead.templatefunctions.

Description of the change

loggerhead.templatefunctions uses cgi.escape to do all its escaping: for element values, attribute values, and URLs. But cgi.escape(foo) is only OK for element values; it fails to escape double or single quotes, allowing content to break out of quoted attributes. It also doesn't do much for URLs at all.

This branch fixes all the holes I could find after a quickish examination. I've introduced a new html_format function which does safe HTML template formatting. All of loggerhead.templatefunction's HTML generation now uses html_format. SimpleTAL will only let content through unescaped when "structure" is used in the template, and all referenced functions seem to be safe now.

templatefunctions also failed to URL-encode some URL fragments. I can't think of any significant damage that could be done here besides breaking the page, but it was an easy and relevant fix necessary for testing.

To post a comment you must log in.
lp:~wgrant/loggerhead/bug-740142 updated
447. By William Grant

Add forgotten test_util.

Revision history for this message
Robert Collins (lifeless) wrote :

Two things...
firstly, you probably want to copy the xml serialiser regex bzrlib has - its perf tested (we may render 10K filenames on a single page...).

And have you checked for performance impacts?

lp:~wgrant/loggerhead/bug-740142 updated
448. By William Grant

Use str.replace instead of lots of dict lookups.

Revision history for this message
Robert Collins (lifeless) wrote :

cool

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

I did some testing of this branch with non-ascii filenames and all the URLs still worked. (at least of what I tested.)

I think it would be good to have more tests that assert we don't include the raw content of filenames and file content in all the pages we serve, but this is certainly a good start.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'loggerhead/controllers/view_ui.py'
2--- loggerhead/controllers/view_ui.py 2011-03-12 17:15:08 +0000
3+++ loggerhead/controllers/view_ui.py 2011-03-23 05:23:24 +0000
4@@ -17,7 +17,6 @@
5 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
6 #
7
8-import cgi
9 import os
10 import time
11
12@@ -65,7 +64,7 @@
13 extra_lines = len(file_lines) - len(hl_lines)
14 hl_lines.extend([u''] * extra_lines)
15 else:
16- hl_lines = map(cgi.escape, file_lines)
17+ hl_lines = map(util.html_escape, file_lines)
18
19 return hl_lines;
20
21
22=== modified file 'loggerhead/templatefunctions.py'
23--- loggerhead/templatefunctions.py 2011-03-02 14:07:21 +0000
24+++ loggerhead/templatefunctions.py 2011-03-23 05:23:24 +0000
25@@ -14,8 +14,8 @@
26 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
27 #
28
29-import cgi
30 import os
31+import urllib
32
33 import pkg_resources
34
35@@ -23,6 +23,7 @@
36
37 import loggerhead
38 from loggerhead.zptsupport import zpt
39+from loggerhead.util import html_format
40
41
42 templatefunctions = {}
43@@ -49,16 +50,21 @@
44 if style == 'fragment':
45 def file_link(filename):
46 if currently_showing and filename == currently_showing:
47- return '<b><a href="#%s">%s</a></b>' % (
48- cgi.escape(filename), cgi.escape(filename))
49+ return html_format(
50+ '<b><a href="#%s">%s</a></b>',
51+ urllib.quote(filename.encode('utf-8')), filename)
52 else:
53 return revision_link(
54- url, entry.revno, filename, '#' + filename)
55+ url, entry.revno, filename,
56+ '#' + urllib.quote(filename.encode('utf-8')))
57 else:
58 def file_link(filename):
59- return '<a href="%s%s" title="View changes to %s in revision %s">%s</a>' % (
60- url(['/revision', entry.revno]), '#' + filename, cgi.escape(filename),
61- cgi.escape(entry.revno), cgi.escape(filename))
62+ return html_format(
63+ '<a href="%s%s" title="View changes to %s in revision %s">'
64+ '%s</a>',
65+ url(['/revision', entry.revno]),
66+ '#' + urllib.quote(filename.encode('utf-8')),
67+ filename, entry.revno, filename)
68 return _pt('revisionfilechanges').expand(
69 entry=entry, file_changes=file_changes, file_link=file_link, **templatefunctions)
70
71@@ -122,14 +128,16 @@
72
73 @templatefunc
74 def view_link(url, revno, path):
75- return '<a href="%s" title="Annotate %s">%s</a>' % (
76- url(['/view', revno, path]), cgi.escape(path), cgi.escape(path))
77+ return html_format(
78+ '<a href="%s" title="Annotate %s">%s</a>',
79+ url(['/view', revno, path]), path, path)
80+
81
82 @templatefunc
83 def revision_link(url, revno, path, frag=''):
84- return '<a href="%s%s" title="View changes to %s in revision %s">%s</a>' % (
85- url(['/revision', revno, path]), frag, cgi.escape(path),
86- cgi.escape(revno), cgi.escape(path))
87+ return html_format(
88+ '<a href="%s%s" title="View changes to %s in revision %s">%s</a>',
89+ url(['/revision', revno, path]), frag, path, revno, path)
90
91
92 @templatefunc
93
94=== modified file 'loggerhead/tests/__init__.py'
95--- loggerhead/tests/__init__.py 2011-03-19 08:35:57 +0000
96+++ loggerhead/tests/__init__.py 2011-03-23 05:23:24 +0000
97@@ -26,5 +26,6 @@
98 'test_simple',
99 'test_revision_ui',
100 'test_templating',
101+ 'test_util',
102 ]]))
103 return standard_tests
104
105=== modified file 'loggerhead/tests/test_simple.py'
106--- loggerhead/tests/test_simple.py 2011-03-19 08:35:57 +0000
107+++ loggerhead/tests/test_simple.py 2011-03-23 05:23:24 +0000
108@@ -56,9 +56,11 @@
109
110 self.filecontents = ('some\nmultiline\ndata\n'
111 'with<htmlspecialchars\n')
112+ filenames = ['myfilename', 'anotherfile<']
113 self.build_tree_contents(
114- [('myfilename', self.filecontents)])
115- self.tree.add('myfilename', 'myfile-id')
116+ (filename, self.filecontents) for filename in filenames)
117+ for filename in filenames:
118+ self.tree.add(filename, '%s-id' % filename)
119 self.fileid = self.tree.path2id('myfilename')
120 self.msg = 'a very exciting commit message <'
121 self.revid = self.tree.commit(message=self.msg)
122@@ -70,7 +72,7 @@
123
124 def test_changes_for_file(self):
125 app = self.setUpLoggerhead()
126- res = app.get('/changes?filter_file_id=myfile-id')
127+ res = app.get('/changes?filter_file_id=myfilename-id')
128 res.mustcontain(cgi.escape(self.msg))
129
130 def test_changes_branch_from(self):
131@@ -131,6 +133,8 @@
132 def test_revision(self):
133 app = self.setUpLoggerhead()
134 res = app.get('/revision/1')
135+ res.mustcontain(no=['anotherfile<'])
136+ res.mustcontain('anotherfile&lt;')
137 res.mustcontain('myfilename')
138
139
140
141=== added file 'loggerhead/tests/test_util.py'
142--- loggerhead/tests/test_util.py 1970-01-01 00:00:00 +0000
143+++ loggerhead/tests/test_util.py 2011-03-23 05:23:24 +0000
144@@ -0,0 +1,33 @@
145+# Copyright 2011 Canonical Ltd
146+#
147+# This program is free software; you can redistribute it and/or modify
148+# it under the terms of the GNU General Public License as published by
149+# the Free Software Foundation; either version 2 of the License, or
150+# (at your option) any later version.
151+#
152+# This program is distributed in the hope that it will be useful,
153+# but WITHOUT ANY WARRANTY; without even the implied warranty of
154+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
155+# GNU General Public License for more details.
156+#
157+# You should have received a copy of the GNU General Public License
158+# along with this program; if not, write to the Free Software
159+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
160+
161+from bzrlib import tests
162+
163+from loggerhead.util import html_escape, html_format
164+
165+
166+class TestHTMLEscaping(tests.TestCase):
167+
168+ def test_html_escape(self):
169+ self.assertEqual(
170+ "foo &quot;&#39;&lt;&gt;&amp;",
171+ html_escape("foo \"'<>&"))
172+
173+ def test_html_format(self):
174+ self.assertEqual(
175+ '<foo bar="baz&quot;&#39;">&lt;baz&gt;&amp;</foo>',
176+ html_format(
177+ '<foo bar="%s">%s</foo>', "baz\"'", "<baz>&"))
178
179=== modified file 'loggerhead/util.py'
180--- loggerhead/util.py 2010-04-28 21:41:32 +0000
181+++ loggerhead/util.py 2011-03-23 05:23:24 +0000
182@@ -20,7 +20,6 @@
183 #
184
185 import base64
186-import cgi
187 import datetime
188 import logging
189 import re
190@@ -214,16 +213,47 @@
191 # only do this if unicode turns out to be a problem
192 #_BADCHARS_RE = re.compile(ur'[\u007f-\uffff]')
193
194+# Can't be a dict; &amp; needs to be done first.
195+html_entity_subs = [
196+ ("&", "&amp;"),
197+ ('"', "&quot;"),
198+ ("'", "&#39;"), # &apos; is defined in XML, but not HTML.
199+ (">", "&gt;"),
200+ ("<", "&lt;"),
201+ ]
202+
203+
204+def html_escape(s):
205+ """Transform dangerous (X)HTML characters into entities.
206+
207+ Like cgi.escape, except also escaping " and '. This makes it safe to use
208+ in both attribute and element content.
209+
210+ If you want to safely fill a format string with escaped values, use
211+ html_format instead
212+ """
213+ for char, repl in html_entity_subs:
214+ s = s.replace(char, repl)
215+ return s
216+
217+
218+def html_format(template, *args):
219+ """Safely format an HTML template string, escaping the arguments.
220+
221+ The template string must not be user-controlled; it will not be escaped.
222+ """
223+ return template % tuple(html_escape(arg) for arg in args)
224+
225+
226 # FIXME: get rid of this method; use fixed_width() and avoid XML().
227
228-
229 def html_clean(s):
230 """
231 clean up a string for html display. expand any tabs, encode any html
232 entities, and replace spaces with '&nbsp;'. this is primarily for use
233 in displaying monospace text.
234 """
235- s = cgi.escape(s.expandtabs())
236+ s = html_escape(s.expandtabs())
237 s = s.replace(' ', '&nbsp;')
238 return s
239
240@@ -269,7 +299,7 @@
241 except UnicodeDecodeError:
242 s = s.decode('iso-8859-15')
243
244- s = cgi.escape(s).expandtabs().replace(' ', NONBREAKING_SPACE)
245+ s = html_escape(s).expandtabs().replace(' ', NONBREAKING_SPACE)
246
247 return HSC.clean(s).replace('\n', '<br/>')
248

Subscribers

People subscribed via source and target branches