Merge lp:~wgrant/loggerhead/bug-740142 into lp:loggerhead

Proposed by William Grant
Status: Merged
Approved by: Robert Collins
Approved revision: 448
Merged at revision: 442
Proposed branch: lp:~wgrant/loggerhead/bug-740142
Merge into: lp:loggerhead
Diff against target: 247 lines (+96/-21)
6 files modified
loggerhead/controllers/view_ui.py (+1/-2)
loggerhead/templatefunctions.py (+20/-12)
loggerhead/tests/__init__.py (+1/-0)
loggerhead/tests/test_simple.py (+7/-3)
loggerhead/tests/test_util.py (+33/-0)
loggerhead/util.py (+34/-4)
To merge this branch: bzr merge lp:~wgrant/loggerhead/bug-740142
Reviewer Review Type Date Requested Status
Robert Collins Approve
Review via email: mp+54463@code.launchpad.net

Commit message

Properly escape filenames throughout loggerhead.templatefunctions.

Description of the change

loggerhead.templatefunctions uses cgi.escape to do all its escaping: for element values, attribute values, and URLs. But cgi.escape(foo) is only OK for element values; it fails to escape double or single quotes, allowing content to break out of quoted attributes. It also doesn't do much for URLs at all.

This branch fixes all the holes I could find after a quickish examination. I've introduced a new html_format function which does safe HTML template formatting. All of loggerhead.templatefunction's HTML generation now uses html_format. SimpleTAL will only let content through unescaped when "structure" is used in the template, and all referenced functions seem to be safe now.

templatefunctions also failed to URL-encode some URL fragments. I can't think of any significant damage that could be done here besides breaking the page, but it was an easy and relevant fix necessary for testing.

To post a comment you must log in.
lp:~wgrant/loggerhead/bug-740142 updated
447. By William Grant

Add forgotten test_util.

Revision history for this message
Robert Collins (lifeless) wrote :

Two things...
firstly, you probably want to copy the xml serialiser regex bzrlib has - its perf tested (we may render 10K filenames on a single page...).

And have you checked for performance impacts?

lp:~wgrant/loggerhead/bug-740142 updated
448. By William Grant

Use str.replace instead of lots of dict lookups.

Revision history for this message
Robert Collins (lifeless) wrote :

cool

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

I did some testing of this branch with non-ascii filenames and all the URLs still worked. (at least of what I tested.)

I think it would be good to have more tests that assert we don't include the raw content of filenames and file content in all the pages we serve, but this is certainly a good start.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'loggerhead/controllers/view_ui.py'
--- loggerhead/controllers/view_ui.py 2011-03-12 17:15:08 +0000
+++ loggerhead/controllers/view_ui.py 2011-03-23 05:23:24 +0000
@@ -17,7 +17,6 @@
17# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA17# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
18#18#
1919
20import cgi
21import os20import os
22import time21import time
2322
@@ -65,7 +64,7 @@
65 extra_lines = len(file_lines) - len(hl_lines)64 extra_lines = len(file_lines) - len(hl_lines)
66 hl_lines.extend([u''] * extra_lines)65 hl_lines.extend([u''] * extra_lines)
67 else:66 else:
68 hl_lines = map(cgi.escape, file_lines)67 hl_lines = map(util.html_escape, file_lines)
69 68
70 return hl_lines;69 return hl_lines;
7170
7271
=== modified file 'loggerhead/templatefunctions.py'
--- loggerhead/templatefunctions.py 2011-03-02 14:07:21 +0000
+++ loggerhead/templatefunctions.py 2011-03-23 05:23:24 +0000
@@ -14,8 +14,8 @@
14# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA14# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
15#15#
1616
17import cgi
18import os17import os
18import urllib
1919
20import pkg_resources20import pkg_resources
2121
@@ -23,6 +23,7 @@
2323
24import loggerhead24import loggerhead
25from loggerhead.zptsupport import zpt25from loggerhead.zptsupport import zpt
26from loggerhead.util import html_format
2627
2728
28templatefunctions = {}29templatefunctions = {}
@@ -49,16 +50,21 @@
49 if style == 'fragment':50 if style == 'fragment':
50 def file_link(filename):51 def file_link(filename):
51 if currently_showing and filename == currently_showing:52 if currently_showing and filename == currently_showing:
52 return '<b><a href="#%s">%s</a></b>' % (53 return html_format(
53 cgi.escape(filename), cgi.escape(filename))54 '<b><a href="#%s">%s</a></b>',
55 urllib.quote(filename.encode('utf-8')), filename)
54 else:56 else:
55 return revision_link(57 return revision_link(
56 url, entry.revno, filename, '#' + filename)58 url, entry.revno, filename,
59 '#' + urllib.quote(filename.encode('utf-8')))
57 else:60 else:
58 def file_link(filename):61 def file_link(filename):
59 return '<a href="%s%s" title="View changes to %s in revision %s">%s</a>' % (62 return html_format(
60 url(['/revision', entry.revno]), '#' + filename, cgi.escape(filename),63 '<a href="%s%s" title="View changes to %s in revision %s">'
61 cgi.escape(entry.revno), cgi.escape(filename))64 '%s</a>',
65 url(['/revision', entry.revno]),
66 '#' + urllib.quote(filename.encode('utf-8')),
67 filename, entry.revno, filename)
62 return _pt('revisionfilechanges').expand(68 return _pt('revisionfilechanges').expand(
63 entry=entry, file_changes=file_changes, file_link=file_link, **templatefunctions)69 entry=entry, file_changes=file_changes, file_link=file_link, **templatefunctions)
6470
@@ -122,14 +128,16 @@
122128
123@templatefunc129@templatefunc
124def view_link(url, revno, path):130def view_link(url, revno, path):
125 return '<a href="%s" title="Annotate %s">%s</a>' % (131 return html_format(
126 url(['/view', revno, path]), cgi.escape(path), cgi.escape(path))132 '<a href="%s" title="Annotate %s">%s</a>',
133 url(['/view', revno, path]), path, path)
134
127135
128@templatefunc136@templatefunc
129def revision_link(url, revno, path, frag=''):137def revision_link(url, revno, path, frag=''):
130 return '<a href="%s%s" title="View changes to %s in revision %s">%s</a>' % (138 return html_format(
131 url(['/revision', revno, path]), frag, cgi.escape(path),139 '<a href="%s%s" title="View changes to %s in revision %s">%s</a>',
132 cgi.escape(revno), cgi.escape(path))140 url(['/revision', revno, path]), frag, path, revno, path)
133141
134142
135@templatefunc143@templatefunc
136144
=== modified file 'loggerhead/tests/__init__.py'
--- loggerhead/tests/__init__.py 2011-03-19 08:35:57 +0000
+++ loggerhead/tests/__init__.py 2011-03-23 05:23:24 +0000
@@ -26,5 +26,6 @@
26 'test_simple',26 'test_simple',
27 'test_revision_ui',27 'test_revision_ui',
28 'test_templating',28 'test_templating',
29 'test_util',
29 ]]))30 ]]))
30 return standard_tests31 return standard_tests
3132
=== modified file 'loggerhead/tests/test_simple.py'
--- loggerhead/tests/test_simple.py 2011-03-19 08:35:57 +0000
+++ loggerhead/tests/test_simple.py 2011-03-23 05:23:24 +0000
@@ -56,9 +56,11 @@
5656
57 self.filecontents = ('some\nmultiline\ndata\n'57 self.filecontents = ('some\nmultiline\ndata\n'
58 'with<htmlspecialchars\n')58 'with<htmlspecialchars\n')
59 filenames = ['myfilename', 'anotherfile<']
59 self.build_tree_contents(60 self.build_tree_contents(
60 [('myfilename', self.filecontents)])61 (filename, self.filecontents) for filename in filenames)
61 self.tree.add('myfilename', 'myfile-id')62 for filename in filenames:
63 self.tree.add(filename, '%s-id' % filename)
62 self.fileid = self.tree.path2id('myfilename')64 self.fileid = self.tree.path2id('myfilename')
63 self.msg = 'a very exciting commit message <'65 self.msg = 'a very exciting commit message <'
64 self.revid = self.tree.commit(message=self.msg)66 self.revid = self.tree.commit(message=self.msg)
@@ -70,7 +72,7 @@
7072
71 def test_changes_for_file(self):73 def test_changes_for_file(self):
72 app = self.setUpLoggerhead()74 app = self.setUpLoggerhead()
73 res = app.get('/changes?filter_file_id=myfile-id')75 res = app.get('/changes?filter_file_id=myfilename-id')
74 res.mustcontain(cgi.escape(self.msg))76 res.mustcontain(cgi.escape(self.msg))
7577
76 def test_changes_branch_from(self):78 def test_changes_branch_from(self):
@@ -131,6 +133,8 @@
131 def test_revision(self):133 def test_revision(self):
132 app = self.setUpLoggerhead()134 app = self.setUpLoggerhead()
133 res = app.get('/revision/1')135 res = app.get('/revision/1')
136 res.mustcontain(no=['anotherfile<'])
137 res.mustcontain('anotherfile&lt;')
134 res.mustcontain('myfilename')138 res.mustcontain('myfilename')
135139
136140
137141
=== added file 'loggerhead/tests/test_util.py'
--- loggerhead/tests/test_util.py 1970-01-01 00:00:00 +0000
+++ loggerhead/tests/test_util.py 2011-03-23 05:23:24 +0000
@@ -0,0 +1,33 @@
1# Copyright 2011 Canonical Ltd
2#
3# This program is free software; you can redistribute it and/or modify
4# it under the terms of the GNU General Public License as published by
5# the Free Software Foundation; either version 2 of the License, or
6# (at your option) any later version.
7#
8# This program is distributed in the hope that it will be useful,
9# but WITHOUT ANY WARRANTY; without even the implied warranty of
10# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11# GNU General Public License for more details.
12#
13# You should have received a copy of the GNU General Public License
14# along with this program; if not, write to the Free Software
15# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
16
17from bzrlib import tests
18
19from loggerhead.util import html_escape, html_format
20
21
22class TestHTMLEscaping(tests.TestCase):
23
24 def test_html_escape(self):
25 self.assertEqual(
26 "foo &quot;&#39;&lt;&gt;&amp;",
27 html_escape("foo \"'<>&"))
28
29 def test_html_format(self):
30 self.assertEqual(
31 '<foo bar="baz&quot;&#39;">&lt;baz&gt;&amp;</foo>',
32 html_format(
33 '<foo bar="%s">%s</foo>', "baz\"'", "<baz>&"))
034
=== modified file 'loggerhead/util.py'
--- loggerhead/util.py 2010-04-28 21:41:32 +0000
+++ loggerhead/util.py 2011-03-23 05:23:24 +0000
@@ -20,7 +20,6 @@
20#20#
2121
22import base6422import base64
23import cgi
24import datetime23import datetime
25import logging24import logging
26import re25import re
@@ -214,16 +213,47 @@
214# only do this if unicode turns out to be a problem213# only do this if unicode turns out to be a problem
215#_BADCHARS_RE = re.compile(ur'[\u007f-\uffff]')214#_BADCHARS_RE = re.compile(ur'[\u007f-\uffff]')
216215
216# Can't be a dict; &amp; needs to be done first.
217html_entity_subs = [
218 ("&", "&amp;"),
219 ('"', "&quot;"),
220 ("'", "&#39;"), # &apos; is defined in XML, but not HTML.
221 (">", "&gt;"),
222 ("<", "&lt;"),
223 ]
224
225
226def html_escape(s):
227 """Transform dangerous (X)HTML characters into entities.
228
229 Like cgi.escape, except also escaping " and '. This makes it safe to use
230 in both attribute and element content.
231
232 If you want to safely fill a format string with escaped values, use
233 html_format instead
234 """
235 for char, repl in html_entity_subs:
236 s = s.replace(char, repl)
237 return s
238
239
240def html_format(template, *args):
241 """Safely format an HTML template string, escaping the arguments.
242
243 The template string must not be user-controlled; it will not be escaped.
244 """
245 return template % tuple(html_escape(arg) for arg in args)
246
247
217# FIXME: get rid of this method; use fixed_width() and avoid XML().248# FIXME: get rid of this method; use fixed_width() and avoid XML().
218249
219
220def html_clean(s):250def html_clean(s):
221 """251 """
222 clean up a string for html display. expand any tabs, encode any html252 clean up a string for html display. expand any tabs, encode any html
223 entities, and replace spaces with '&nbsp;'. this is primarily for use253 entities, and replace spaces with '&nbsp;'. this is primarily for use
224 in displaying monospace text.254 in displaying monospace text.
225 """255 """
226 s = cgi.escape(s.expandtabs())256 s = html_escape(s.expandtabs())
227 s = s.replace(' ', '&nbsp;')257 s = s.replace(' ', '&nbsp;')
228 return s258 return s
229259
@@ -269,7 +299,7 @@
269 except UnicodeDecodeError:299 except UnicodeDecodeError:
270 s = s.decode('iso-8859-15')300 s = s.decode('iso-8859-15')
271301
272 s = cgi.escape(s).expandtabs().replace(' ', NONBREAKING_SPACE)302 s = html_escape(s).expandtabs().replace(' ', NONBREAKING_SPACE)
273303
274 return HSC.clean(s).replace('\n', '<br/>')304 return HSC.clean(s).replace('\n', '<br/>')
275305

Subscribers

People subscribed via source and target branches