Merge lp:~wgrant/launchpad/bug-694001-apache-username-spaces into lp:launchpad

Proposed by William Grant
Status: Merged
Approved by: Curtis Hovey
Approved revision: no longer in the source branch.
Merged at revision: 12239
Proposed branch: lp:~wgrant/launchpad/bug-694001-apache-username-spaces
Merge into: lp:launchpad
Diff against target: 39 lines (+17/-1)
2 files modified
lib/contrib/apachelog.py (+1/-1)
lib/lp/services/apachelogparser/tests/test_apachelogparser.py (+16/-0)
To merge this branch: bzr merge lp:~wgrant/launchpad/bug-694001-apache-username-spaces
Reviewer                   Review Type   Date Requested   Status
Curtis Hovey (community)   code                           Approve
j.c.sackett (community)    code*                          Approve
Review via email: mp+46720@code.launchpad.net

Commit message

[r=jcsackett,sinzui][ui=none][bug=694001] Prevent the Apache log parser from choking on usernames with spaces.

Description of the change

Private PPA Apache logs contain unquoted usernames, which can be anything the user chooses, including spaces or other unusual characters. Because the default Apache combined log format delimits fields with spaces, the parser matches each unquoted field with \S+, so it chokes on lines whose username contains a space.

This branch fixes contrib.apachelog to match usernames with spaces. The username is the only unquoted field in the default log format that can contain spaces, so lines are still deterministically parsable.
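
The failure and the fix can be illustrated with a minimal sketch. It uses hand-written regular expressions rather than the real contrib.apachelog code; the names STRICT and LENIENT and the shortened sample line are assumptions made for illustration only.

```python
import re

# Hypothetical, simplified patterns covering only "host ident user [time]";
# the real parser builds its regex from the configured log format.
STRICT = re.compile(r'^(\S+) (\S+) (\S+) (\[[^\]]+\])')   # username as \S+
LENIENT = re.compile(r'^(\S+) (\S+) (.+?) (\[[^\]]+\])')  # username as .+?

line = '1.1.1.1 - Some User [25/Jan/2009:15:48:07 +0000] "GET / HTTP/1.0" 200 1'

strict_match = STRICT.match(line)
lenient_match = LENIENT.match(line)

# \S+ cannot span the space in "Some User", so the strict pattern fails.
print(strict_match)            # None
# The lazy .+? grows until the bracketed timestamp can match after it.
print(lenient_match.group(3))  # Some User
```

The lazy match stays unambiguous here because the next field is the bracketed timestamp, which gives the regex a distinctive anchor to stop at.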

j.c.sackett (jcsackett) wrote :

William--

This looks fine to me.

review: Approve (code*)
Curtis Hovey (sinzui) wrote :

This is good to land

review: Approve (code)

Preview Diff

=== modified file 'lib/contrib/apachelog.py'
--- lib/contrib/apachelog.py 2009-04-29 19:10:17 +0000
+++ lib/contrib/apachelog.py 2011-01-19 00:11:54 +0000
@@ -159,7 +159,7 @@
             elif findpercent.search(element):
                 subpattern = r'(\[[^\]]+\])'
 
-            elif element == '%U':
+            elif element in ('%U', '%u'):
                 subpattern = '(.+?)'
 
             subpatterns.append(subpattern)
=== modified file 'lib/lp/services/apachelogparser/tests/test_apachelogparser.py'
--- lib/lp/services/apachelogparser/tests/test_apachelogparser.py 2011-01-05 04:56:11 +0000
+++ lib/lp/services/apachelogparser/tests/test_apachelogparser.py 2011-01-19 00:11:54 +0000
@@ -68,6 +68,22 @@
         self.assertEqual(
             request, 'GET /10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0')
 
+    def test_parsing_line_with_spaces_in_username(self):
+        # Some lines have spaces in the username, left unquoted by
+        # Apache. They can still be parsed OK, since no other fields
+        # have similar issues.
+        line = (r'1.1.1.1 - Some User [25/Jan/2009:15:48:07 +0000] "GET '
+                r'/10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0" 200 12341 '
+                r'"http://foo.bar/?baz=\"bang\"" '
+                r'"\"Nokia2630/2.0 (05.20) Profile/MIDP-2.1 '
+                r'Configuration/CLDC-1.1\""')
+        host, date, status, request = get_host_date_status_and_request(line)
+        self.assertEqual(host, '1.1.1.1')
+        self.assertEqual(date, '[25/Jan/2009:15:48:07 +0000]')
+        self.assertEqual(status, '200')
+        self.assertEqual(
+            request, 'GET /10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0')
+
     def test_day_extraction(self):
         date = '[13/Jun/2008:18:38:57 +0100]'
         self.assertEqual(get_day(date), datetime(2008, 6, 13))
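
For readers without a Launchpad checkout, the following is a rough standalone sketch of how a format-driven parser of this kind might handle the test line above. It is loosely modeled on the hunk in lib/contrib/apachelog.py but is not that module's code: build_pattern and COMBINED are hypothetical names, and the handling of quoted fields is simplified.

```python
import re

def build_pattern(log_format):
    """Simplified sketch: turn a space-delimited log format into a regex."""
    subpatterns = []
    for element in log_format.split(' '):
        quoted = element.startswith('\\"')
        name = element.replace('\\"', '')
        if quoted:
            # Quoted field: allow backslash-escaped characters inside.
            subpattern = r'"([^"\\]*(?:\\.[^"\\]*)*)"'
        elif re.search(r'^%.*t$', name):
            # Timestamp, wrapped in square brackets.
            subpattern = r'(\[[^\]]+\])'
        elif name in ('%U', '%u'):
            # URL path or username: match lazily, spaces included.
            subpattern = r'(.+?)'
        else:
            subpattern = r'(\S*)'
        subpatterns.append(subpattern)
    return re.compile('^' + ' '.join(subpatterns) + '$')

# Assumed combined log format, written the way Apache configs quote it.
COMBINED = r'%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'

line = (r'1.1.1.1 - Some User [25/Jan/2009:15:48:07 +0000] "GET '
        r'/10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0" 200 12341 '
        r'"http://foo.bar/?baz=\"bang\"" '
        r'"\"Nokia2630/2.0 (05.20) Profile/MIDP-2.1 '
        r'Configuration/CLDC-1.1\""')

match = build_pattern(COMBINED).match(line)
print(match.group(3))  # Some User
print(match.group(6))  # 200
```

Every field after the username is either bracketed, quoted, or free of spaces, which is why a single lazy group for %u is enough to keep the match deterministic in this sketch.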