loggerhead

Merge lp:~wgrant/loggerhead/bug-740142 into lp:loggerhead

bug-740142
Merge into trunk-rich

Proposed by William Grant on 2011-03-23

Status:

Merged

Approved by:

Robert Collins on 2011-03-23

Approved revision:

448

Merged at revision:

442

Proposed branch:

lp:~wgrant/loggerhead/bug-740142

Merge into:

lp:loggerhead

Diff against target:

247 lines (+96/-21)

6 files modified

loggerhead/controllers/view_ui.py (+1/-2)
loggerhead/templatefunctions.py (+20/-12)
loggerhead/tests/__init__.py (+1/-0)
loggerhead/tests/test_simple.py (+7/-3)
loggerhead/tests/test_util.py (+33/-0)
loggerhead/util.py (+34/-4)

To merge this branch:

bzr merge lp:~wgrant/loggerhead/bug-740142

Critical

Fix Released


Reviewer	Review Type	Date Requested	Status
Robert Collins		2011-03-23	Approve on 2011-03-23
Review via email: mp+54463@code.launchpad.net

Commit message

Properly escape filenames throughout loggerhead.templatefunctions.

Description of the change

loggerhead.templatefunctions uses cgi.escape to do all its escaping: for element values, attribute values, and URLs. But cgi.escape(foo) is only OK for element values; it fails to escape double or single quotes, allowing content to break out of quoted attributes. It also does no percent-encoding, so it is of little use for building URLs.
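To make the attribute problem concrete, here is a minimal Python 3 sketch. loggerhead at the time was Python 2 code calling cgi.escape, which no longer exists in current Python. The sketch therefore assumes html.escape(s, quote=False) as a stand-in, because it escapes the same three characters (&, < and >) that cgi.escape escaped by default. The filename is a made-up hostile example.

```python
from html import escape


def cgi_escape_like(s):
    # Stand-in for Python 2's cgi.escape(s): only &, < and > are escaped;
    # quote characters pass through untouched.
    return escape(s, quote=False)


# Hypothetical hostile filename: the double quote closes the title attribute.
filename = 'x" onmouseover="alert(1)'

link = '<a title="Annotate %s">%s</a>' % (
    cgi_escape_like(filename), cgi_escape_like(filename))

print(link)
# <a title="Annotate x" onmouseover="alert(1)">x" onmouseover="alert(1)</a>
```

The element text is harmless here. The title attribute, however, ends early, and the browser parses onmouseover as a new attribute of the link.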

This branch fixes all the holes I could find after a quick examination. I've introduced a new html_format function which does safe HTML template formatting. All of loggerhead.templatefunctions' HTML generation now uses html_format. SimpleTAL will only let content through unescaped when "structure" is used in the template, and all the functions referenced that way seem to be safe now.
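Here is a rough Python 3 transcription of the html_escape and html_format helpers that this branch adds to loggerhead/util.py. It uses the same replacement table; the original is Python 2. It shows the key property: the template is trusted and left alone, and every argument is escaped before interpolation.

```python
# Order matters: "&" must be replaced first, or the "&" inside entities
# produced by later replacements would be escaped a second time.
html_entity_subs = [
    ("&", "&amp;"),
    ('"', "&quot;"),
    ("'", "&#39;"),  # &apos; is defined in XML but not in HTML 4.
    (">", "&gt;"),
    ("<", "&lt;"),
]


def html_escape(s):
    """Escape s so it is safe in both element content and quoted attributes."""
    for char, repl in html_entity_subs:
        s = s.replace(char, repl)
    return s


def html_format(template, *args):
    """Fill a trusted %-style template, escaping every argument."""
    return template % tuple(html_escape(arg) for arg in args)


filename = 'x" onmouseover="alert(1)'
print(html_format('<a title="Annotate %s">%s</a>', filename, filename))
# <a title="Annotate x&quot; onmouseover=&quot;alert(1)">x&quot; onmouseover=&quot;alert(1)</a>
```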

templatefunctions also failed to URL-encode some URL fragments. I can't think of any significant damage that could be done here besides breaking the page, but it was an easy and relevant fix necessary for testing.
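The URL-encoding half of the fix wraps fragments in urllib.quote(filename.encode('utf-8')). In Python 3 the equivalent call is urllib.parse.quote, which UTF-8-encodes str input itself. The following minimal sketch uses a made-up filename and a hypothetical fragment_link helper that mirrors the 'fragment' branch of file_link. It shows that the percent-encoded fragment contains no HTML-special characters, so escaping it afterwards leaves it unchanged.

```python
from html import escape
from urllib.parse import quote, unquote


def fragment_link(filename):
    # Hypothetical helper: percent-encode for the URL fragment,
    # HTML-escape both the href value and the element text.
    frag = quote(filename)  # str input is encoded as UTF-8; "/" is left as-is
    return '<a href="#%s">%s</a>' % (escape(frag), escape(filename))


name = 'dir/a b#"é<.txt'
print(quote(name))  # dir/a%20b%23%22%C3%A9%3C.txt
print(fragment_link(name))
# <a href="#dir/a%20b%23%22%C3%A9%3C.txt">dir/a b#&quot;é&lt;.txt</a>
```

Without quoting, the "#" inside the filename would be read as part of the fragment, and the '"' would end the href attribute.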

lp:~wgrant/loggerhead/bug-740142 updated on 2011-03-23

447. By William Grant on 2011-03-23: Add forgotten test_util.

Revision history for this message

Robert Collins (lifeless) wrote on 2011-03-23:

Two things...
Firstly, you probably want to copy the XML serialiser regex bzrlib has; it's perf-tested (we may render 10K filenames on a single page...).

And have you checked for performance impacts?
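Robert's suggestion is to escape in a single regex pass instead of several string scans. The sketch below follows that style, but it is not bzrlib's actual serializer code, and the names are made up. Each matched character is replaced exactly once and the output is never rescanned, so the ordering concern around "&" does not arise.

```python
import re

# One character class covering everything that needs escaping.
_ESCAPE_RE = re.compile(r"""[&"'<>]""")
_ESCAPE_MAP = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&#39;",
    "<": "&lt;",
    ">": "&gt;",
}


def regex_html_escape(s):
    # Each match is looked up in the map; replacement text is not rescanned,
    # so the "&" at the start of an entity is never escaped again.
    return _ESCAPE_RE.sub(lambda m: _ESCAPE_MAP[m.group()], s)


print(regex_html_escape("foo \"'<>&"))  # foo &quot;&#39;&lt;&gt;&amp;
```

Which approach is faster depends on the input. Timing both with timeit on realistic filename lists answers the performance question more reliably than guessing.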

lp:~wgrant/loggerhead/bug-740142 updated on 2011-03-23

448. By William Grant on 2011-03-23: Use str.replace instead of lots of dict lookups.
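With chained str.replace calls, the order of the replacements matters. This is why the util.py hunk notes that the table "can't be a dict": Python 2 dicts were unordered, so iterating over one could not guarantee that "&" came first. A quick sketch of what goes wrong when "&" is replaced last:

```python
def escape_with(subs, s):
    # Apply the replacements in the given order, like html_escape does.
    for char, repl in subs:
        s = s.replace(char, repl)
    return s


AMP_FIRST = [("&", "&amp;"), ("<", "&lt;")]
AMP_LAST = [("<", "&lt;"), ("&", "&amp;")]

print(escape_with(AMP_FIRST, "a<b"))  # a&lt;b
print(escape_with(AMP_LAST, "a<b"))   # a&amp;lt;b  (double-escaped)
```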

Revision history for this message

Robert Collins (lifeless) wrote on 2011-03-23:

cool

review: Approve

Revision history for this message

John A Meinel (jameinel) wrote on 2011-03-23:

I did some testing of this branch with non-ASCII filenames, and all the URLs still worked (at least the ones I tested).

I think it would be good to have more tests that assert we don't include the raw content of filenames and file content in all the pages we serve, but this is certainly a good start.
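Here is one shape such tests could take, sketched with unittest. A hypothetical render_revision_page() stands in for a real loggerhead page. loggerhead's own tests make the same check through WebTest's res.mustcontain(no=[...]), as in the test_simple.py change in this branch.

```python
import unittest
from html import escape

# Hypothetical names that a test could try to smuggle into a page.
HOSTILE_NAMES = ['anotherfile<', 'quote"name', "apos'name", 'amp&name']


def render_revision_page(filenames):
    # Hypothetical stand-in for a loggerhead page: one escaped link per file.
    return "\n".join(
        '<a title="View %s">%s</a>' % (escape(f), escape(f)) for f in filenames)


class TestNoRawFilenames(unittest.TestCase):

    def test_filenames_escaped(self):
        body = render_revision_page(HOSTILE_NAMES)
        for name in HOSTILE_NAMES:
            # The raw name must never appear; its escaped form must.
            self.assertNotIn(name, body)
            self.assertIn(escape(name), body)
```

Run it with python -m unittest. Growing HOSTILE_NAMES and pointing the check at real pages is the kind of broader coverage suggested above.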

Preview Diff


Subscribers

People subscribed via source and target branches

to all changes:

Colin Watson

James Barlow

John A Meinel

Matt Nordhoff

Michael Hudson-Doyle

Paul Hummer

William Grant

jay ham

=== modified file 'loggerhead/controllers/view_ui.py'
--- loggerhead/controllers/view_ui.py	2011-03-12 17:15:08 +0000
+++ loggerhead/controllers/view_ui.py	2011-03-23 05:23:24 +0000
@@ -17,7 +17,6 @@
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 #
-import cgi
 import os
 import time
@@ -65,7 +64,7 @@
             extra_lines = len(file_lines) - len(hl_lines)
             hl_lines.extend([u''] * extra_lines)
         else:
-            hl_lines = map(cgi.escape, file_lines)
+            hl_lines = map(util.html_escape, file_lines)
         return hl_lines;
=== modified file 'loggerhead/templatefunctions.py'
--- loggerhead/templatefunctions.py	2011-03-02 14:07:21 +0000
+++ loggerhead/templatefunctions.py	2011-03-23 05:23:24 +0000
@@ -14,8 +14,8 @@
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 #
-import cgi
 import os
+import urllib
 import pkg_resources
@@ -23,6 +23,7 @@
 import loggerhead
 from loggerhead.zptsupport import zpt
+from loggerhead.util import html_format
 templatefunctions = {}
@@ -49,16 +50,21 @@
     if style == 'fragment':
         def file_link(filename):
             if currently_showing and filename == currently_showing:
-                return '<b><a href="#%s">%s</a></b>' % (
-                    cgi.escape(filename), cgi.escape(filename))
+                return html_format(
+                    '<b><a href="#%s">%s</a></b>',
+                    urllib.quote(filename.encode('utf-8')), filename)
             else:
                 return revision_link(
-                    url, entry.revno, filename, '#' + filename)
+                    url, entry.revno, filename,
+                    '#' + urllib.quote(filename.encode('utf-8')))
     else:
         def file_link(filename):
-            return '<a href="%s%s" title="View changes to %s in revision %s">%s</a>' % (
-                url(['/revision', entry.revno]), '#' + filename, cgi.escape(filename),
-                cgi.escape(entry.revno), cgi.escape(filename))
+            return html_format(
+                '<a href="%s%s" title="View changes to %s in revision %s">'
+                '%s</a>',
+                url(['/revision', entry.revno]),
+                '#' + urllib.quote(filename.encode('utf-8')),
+                filename, entry.revno, filename)
     return _pt('revisionfilechanges').expand(
         entry=entry, file_changes=file_changes, file_link=file_link, **templatefunctions)
@@ -122,14 +128,16 @@
 @templatefunc
 def view_link(url, revno, path):
-    return '<a href="%s" title="Annotate %s">%s</a>' % (
-        url(['/view', revno, path]), cgi.escape(path), cgi.escape(path))
+    return html_format(
+        '<a href="%s" title="Annotate %s">%s</a>',
+        url(['/view', revno, path]), path, path)
+
 @templatefunc
 def revision_link(url, revno, path, frag=''):
-    return '<a href="%s%s" title="View changes to %s in revision %s">%s</a>' % (
-        url(['/revision', revno, path]), frag, cgi.escape(path),
-        cgi.escape(revno), cgi.escape(path))
+    return html_format(
+        '<a href="%s%s" title="View changes to %s in revision %s">%s</a>',
+        url(['/revision', revno, path]), frag, path, revno, path)
 @templatefunc
=== modified file 'loggerhead/tests/__init__.py'
--- loggerhead/tests/__init__.py	2011-03-19 08:35:57 +0000
+++ loggerhead/tests/__init__.py	2011-03-23 05:23:24 +0000
@@ -26,5 +26,6 @@
             'test_simple',
             'test_revision_ui',
             'test_templating',
+            'test_util',
         ]]))
     return standard_tests
=== modified file 'loggerhead/tests/test_simple.py'
--- loggerhead/tests/test_simple.py	2011-03-19 08:35:57 +0000
+++ loggerhead/tests/test_simple.py	2011-03-23 05:23:24 +0000
@@ -56,9 +56,11 @@
         self.filecontents = ('some\nmultiline\ndata\n'
                              'with<htmlspecialchars\n')
+        filenames = ['myfilename', 'anotherfile<']
         self.build_tree_contents(
-            [('myfilename', self.filecontents)])
-        self.tree.add('myfilename', 'myfile-id')
+            (filename, self.filecontents) for filename in filenames)
+        for filename in filenames:
+            self.tree.add(filename, '%s-id' % filename)
         self.fileid = self.tree.path2id('myfilename')
         self.msg = 'a very exciting commit message <'
         self.revid = self.tree.commit(message=self.msg)
@@ -70,7 +72,7 @@
     def test_changes_for_file(self):
         app = self.setUpLoggerhead()
-        res = app.get('/changes?filter_file_id=myfile-id')
+        res = app.get('/changes?filter_file_id=myfilename-id')
         res.mustcontain(cgi.escape(self.msg))
     def test_changes_branch_from(self):
@@ -131,6 +133,8 @@
     def test_revision(self):
         app = self.setUpLoggerhead()
         res = app.get('/revision/1')
+        res.mustcontain(no=['anotherfile<'])
+        res.mustcontain('anotherfile&lt;')
         res.mustcontain('myfilename')
=== added file 'loggerhead/tests/test_util.py'
--- loggerhead/tests/test_util.py	1970-01-01 00:00:00 +0000
+++ loggerhead/tests/test_util.py	2011-03-23 05:23:24 +0000
@@ -0,0 +1,33 @@
+# Copyright 2011 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+from bzrlib import tests
+
+from loggerhead.util import html_escape, html_format
+
+
+class TestHTMLEscaping(tests.TestCase):
+
+    def test_html_escape(self):
+        self.assertEqual(
+            "foo &quot;&#39;&lt;&gt;&amp;",
+            html_escape("foo \"'<>&"))
+
+    def test_html_format(self):
+        self.assertEqual(
+            '<foo bar="baz&quot;&#39;">&lt;baz&gt;&amp;</foo>',
+            html_format(
+                '<foo bar="%s">%s</foo>', "baz\"'", "<baz>&"))
=== modified file 'loggerhead/util.py'
--- loggerhead/util.py	2010-04-28 21:41:32 +0000
+++ loggerhead/util.py	2011-03-23 05:23:24 +0000
@@ -20,7 +20,6 @@
 #
 import base64
-import cgi
 import datetime
 import logging
 import re
@@ -214,16 +213,47 @@
 # only do this if unicode turns out to be a problem
 #_BADCHARS_RE = re.compile(ur'[\u007f-\uffff]')
+# Can't be a dict; &amp; needs to be done first.
+html_entity_subs = [
+    ("&", "&amp;"),
+    ('"', "&quot;"),
+    ("'", "&#39;"), # &apos; is defined in XML, but not HTML.
+    (">", "&gt;"),
+    ("<", "&lt;"),
+    ]
+
+
+def html_escape(s):
+    """Transform dangerous (X)HTML characters into entities.
+
+    Like cgi.escape, except also escaping " and '. This makes it safe to use
+    in both attribute and element content.
+
+    If you want to safely fill a format string with escaped values, use
+    html_format instead
+    """
+    for char, repl in html_entity_subs:
+        s = s.replace(char, repl)
+    return s
+
+
+def html_format(template, *args):
+    """Safely format an HTML template string, escaping the arguments.
+
+    The template string must not be user-controlled; it will not be escaped.
+    """
+    return template % tuple(html_escape(arg) for arg in args)
+
+
 # FIXME: get rid of this method; use fixed_width() and avoid XML().
-
 def html_clean(s):
     """
     clean up a string for html display.  expand any tabs, encode any html
     entities, and replace spaces with '&nbsp;'.  this is primarily for use
     in displaying monospace text.
     """
-    s = cgi.escape(s.expandtabs())
+    s = html_escape(s.expandtabs())
     s = s.replace(' ', '&nbsp;')
     return s
@@ -269,7 +299,7 @@
         except UnicodeDecodeError:
             s = s.decode('iso-8859-15')
-    s = cgi.escape(s).expandtabs().replace(' ', NONBREAKING_SPACE)
+    s = html_escape(s).expandtabs().replace(' ', NONBREAKING_SPACE)
     return HSC.clean(s).replace('\n', '<br/>')