Merge lp:~wgrant/launchpad/bug-694001-apache-username-spaces into lp:launchpad

Proposed by William Grant on 2011-01-19
Status: Merged
Approved by: Curtis Hovey on 2011-01-19
Approved revision: no longer in the source branch.
Merged at revision: 12239
Proposed branch: lp:~wgrant/launchpad/bug-694001-apache-username-spaces
Merge into: lp:launchpad
Diff against target: 39 lines (+17/-1)
2 files modified
lib/contrib/apachelog.py (+1/-1)
lib/lp/services/apachelogparser/tests/test_apachelogparser.py (+16/-0)
To merge this branch: bzr merge lp:~wgrant/launchpad/bug-694001-apache-username-spaces
Reviewer                  Review Type  Date Requested  Status
Curtis Hovey (community)  code         2011-01-19      Approve on 2011-01-19
j.c.sackett (community)   code*        2011-01-19      Approve on 2011-01-19
Review via email: mp+46720@code.launchpad.net

Commit message

[r=jcsackett,sinzui][ui=none][bug=694001] Prevent the Apache log parser from choking on usernames with spaces.

Description of the change

Private PPA Apache logs contain unquoted usernames, which can be whatever the user wants -- even containing spaces or other strange characters. Since the default Apache combined log format uses spaces to delimit fields, the parser grabs fields with \S+ unless they are quoted. This makes it choke on lines with usernames containing spaces.

This branch fixes contrib.apachelog to match usernames that contain spaces. The username is the only unquoted, unbracketed field in the default log format that can contain spaces, so lines remain deterministically parsable.
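
As a minimal sketch of why a lazy username pattern stays unambiguous, the snippet below builds two regexes from a combined-style format and runs them on a line with a space in the username. It is not the contrib.apachelog code: build_pattern, COMBINED_FORMAT, and the sample line are hypothetical names chosen for illustration, and the real parser handles more cases.

```python
import re

# Assumption: a simplified combined-style format with plain quotes.
COMBINED_FORMAT = '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"'


def build_pattern(log_format, lazy_fields=()):
    """Turn a space-delimited log format into one anchored regex.

    Hypothetical, simplified stand-in for the parser's pattern builder.
    """
    subpatterns = []
    for element in log_format.split(' '):
        if element.startswith('"') and element.endswith('"'):
            # Quoted field: anything up to the closing quote,
            # allowing backslash-escaped characters inside.
            subpattern = r'"([^"\\]*(?:\\.[^"\\]*)*)"'
        elif element == '%t':
            # Timestamp: bracketed, so its inner space is harmless.
            subpattern = r'(\[[^\]]+\])'
        elif element in lazy_fields:
            # Lazy match: may span spaces, stops at the first point
            # where the rest of the pattern can match.
            subpattern = r'(.+?)'
        else:
            # Default: a run of non-space characters.
            subpattern = r'(\S*)'
        subpatterns.append(subpattern)
    return re.compile('^' + ' '.join(subpatterns) + '$')


old_pattern = build_pattern(COMBINED_FORMAT)
new_pattern = build_pattern(COMBINED_FORMAT, lazy_fields=('%U', '%u'))

line = ('1.1.1.1 - Some User [25/Jan/2009:15:48:07 +0000] '
        '"GET /10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0" 200 12341 '
        '"-" "curl/7.0"')
plain_line = ('1.1.1.1 - alice [25/Jan/2009:15:48:07 +0000] '
              '"GET / HTTP/1.0" 200 5 "-" "-"')

print(old_pattern.match(line))           # no match: the username is split
print(new_pattern.match(line).group(3))  # the full username
```

The lazy group can only end where the bracketed timestamp begins, which is why the rest of the line still lines up with its fields.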

j.c.sackett (jcsackett) wrote :

William--

This looks fine by me.

review: Approve (code*)
Curtis Hovey (sinzui) wrote :

This is good to land.

review: Approve (code)

Preview Diff

=== modified file 'lib/contrib/apachelog.py'
--- lib/contrib/apachelog.py 2009-04-29 19:10:17 +0000
+++ lib/contrib/apachelog.py 2011-01-19 00:11:54 +0000
@@ -159,7 +159,7 @@
             elif findpercent.search(element):
                 subpattern = r'(\[[^\]]+\])'

-            elif element == '%U':
+            elif element in ('%U', '%u'):
                 subpattern = '(.+?)'

             subpatterns.append(subpattern)

=== modified file 'lib/lp/services/apachelogparser/tests/test_apachelogparser.py'
--- lib/lp/services/apachelogparser/tests/test_apachelogparser.py 2011-01-05 04:56:11 +0000
+++ lib/lp/services/apachelogparser/tests/test_apachelogparser.py 2011-01-19 00:11:54 +0000
@@ -68,6 +68,22 @@
         self.assertEqual(
             request, 'GET /10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0')

+    def test_parsing_line_with_spaces_in_username(self):
+        # Some lines have spaces in the username, left unquoted by
+        # Apache. They can still be parsed OK, since no other fields
+        # have similar issues.
+        line = (r'1.1.1.1 - Some User [25/Jan/2009:15:48:07 +0000] "GET '
+                r'/10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0" 200 12341 '
+                r'"http://foo.bar/?baz=\"bang\"" '
+                r'"\"Nokia2630/2.0 (05.20) Profile/MIDP-2.1 '
+                r'Configuration/CLDC-1.1\""')
+        host, date, status, request = get_host_date_status_and_request(line)
+        self.assertEqual(host, '1.1.1.1')
+        self.assertEqual(date, '[25/Jan/2009:15:48:07 +0000]')
+        self.assertEqual(status, '200')
+        self.assertEqual(
+            request, 'GET /10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0')
+
     def test_day_extraction(self):
         date = '[13/Jun/2008:18:38:57 +0100]'
         self.assertEqual(get_day(date), datetime(2008, 6, 13))