Merge lp:~gary/launchpad/bug676489 into lp:launchpad

Proposed by Gary Poster
Status: Merged
Approved by: Gary Poster
Approved revision: no longer in the source branch.
Merged at revision: 11935
Proposed branch: lp:~gary/launchpad/bug676489
Merge into: lp:launchpad
Diff against target: 79 lines (+43/-7)
2 files modified
lib/lp/services/apachelogparser/base.py (+13/-7)
lib/lp/services/apachelogparser/tests/test_apachelogparser.py (+30/-0)
To merge this branch: bzr merge lp:~gary/launchpad/bug676489
Reviewer Review Type Date Requested Status
Edwin Grubbs (community) code Approve
Benji York (community) code* Approve
Review via email: mp+41069@code.launchpad.net

Commit message

[r=benji,edwin-grubbs][ui=none][bug=676489] fix bug 676489 by adding code in the apache log parsing to handle a space in the path.

Description of the change

This change fixes bug 676489 by adding code to handle a space in the path. I added tests for these changes, as well as for an existing code path pertinent to HTTP 1.0 possibly not including the protocol.

To run all the pertinent tests I know about, run ``./bin/test -vvt test_apachelogparser``.

``make lint`` is happy.

Thank you.

To post a comment you must log in.
Revision history for this message
Benji York (benji) wrote :

This branch looks good. Edwin, my mentor, will also be reviewing it
momentarily.

I especially appreciate the thorough test coverage.

A small comment about this bit of code:

path, ignore, protocol = rest.rpartition(' ')
if not path:
    # HTTP 1.0 requests might omit the HTTP version so we cope with them.
    path = protocol

On first reading I'd expect the body of the if to assign a new value to
"protocol" as well (perhaps None).

On second reading I realize that protocol isn't used (other than if it
isn't actually the protocol) and wonder if restraining from naming the
strings until they're known-good might be better:

# HTTP 1.0 requests might omit the HTTP version so we cope with them.
first, ignore, last = rest.partition(' ')
if first:
    path = first
else:
    path = last

Or even the shorter but perhaps obscure:

# HTTP 1.0 requests might omit the HTTP version so we cope with them.
first, ignore, last = rest.partition(' ')
path = first or last

review: Approve (code*)
Revision history for this message
Edwin Grubbs (edwin-grubbs) wrote :

Hi Gary,

I like benji's suggestion of using:
  first, ignore, last = rest.rpartition(' ')

I believe our style guide wants a more explicit if-condition though. For example:
  if first == "":

Otherwise, I think the branch is very nice.

-Edwin

review: Approve (code)
Revision history for this message
Benji York (benji) wrote :

On Wed, Nov 17, 2010 at 5:27 PM, Edwin Grubbs
<email address hidden> wrote:
> I believe our style guide wants a more explicit if-condition though. For example:
>  if first == "":

Correct: https://dev.launchpad.net/PythonStyleGuide#Truth%20conditionals
--
Benji York

Revision history for this message
Gary Poster (gary) wrote :

Thank you both. I changed the code per your suggestion.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/services/apachelogparser/base.py'
2--- lib/lp/services/apachelogparser/base.py 2010-09-11 19:25:13 +0000
3+++ lib/lp/services/apachelogparser/base.py 2010-11-17 23:22:15 +0000
4@@ -204,15 +204,21 @@
5
6 def get_method_and_path(request):
7 """Extract the method of the request and path of the requested file."""
8- L = request.split()
9- # HTTP 1.0 requests might omit the HTTP version so we must cope with them.
10- if len(L) == 2:
11- method, path = L
12+ method, ignore, rest = request.partition(' ')
13+ # In the below, the common case is that `first` is the path and `last` is
14+ # the protocol.
15+ first, ignore, last = rest.rpartition(' ')
16+ if first == '':
17+ # HTTP 1.0 requests might omit the HTTP version so we cope with them.
18+ path = last
19+ elif not last.startswith('HTTP'):
20+ # We cope with HTTP 1.0 protocol without HTTP version *and* a
21+ # space in the path (see bug 676489 for example).
22+ path = rest
23 else:
24- method, path, protocol = L
25-
26+ # This is the common case.
27+ path = first
28 if path.startswith('http://') or path.startswith('https://'):
29 uri = URI(path)
30 path = uri.path
31-
32 return method, path
33
34=== modified file 'lib/lp/services/apachelogparser/tests/test_apachelogparser.py'
35--- lib/lp/services/apachelogparser/tests/test_apachelogparser.py 2010-10-04 19:50:45 +0000
36+++ lib/lp/services/apachelogparser/tests/test_apachelogparser.py 2010-11-17 23:22:15 +0000
37@@ -29,6 +29,7 @@
38 get_fd_and_file_size,
39 get_files_to_parse,
40 get_host_date_status_and_request,
41+ get_method_and_path,
42 parse_file,
43 )
44 from lp.services.apachelogparser.model.parsedapachelog import ParsedApacheLog
45@@ -71,6 +72,35 @@
46 date = '[13/Jun/2008:18:38:57 +0100]'
47 self.assertEqual(get_day(date), datetime(2008, 6, 13))
48
49+ def test_parsing_path_with_missing_protocol(self):
50+ request = (r'GET /56222647/deluge-gtk_1.3.0-0ubuntu1_all.deb?'
51+ r'N\x1f\x9b')
52+ method, path = get_method_and_path(request)
53+ self.assertEqual(method, 'GET')
54+ self.assertEqual(
55+ path,
56+ r'/56222647/deluge-gtk_1.3.0-0ubuntu1_all.deb?N\x1f\x9b')
57+
58+ def test_parsing_path_with_space(self):
59+ # See bug 676489.
60+ request = (r'GET /56222647/deluge-gtk_1.3.0-0ubuntu1_all.deb?'
61+ r'N\x1f\x9b Z%7B... HTTP/1.0')
62+ method, path = get_method_and_path(request)
63+ self.assertEqual(method, 'GET')
64+ self.assertEqual(
65+ path,
66+ r'/56222647/deluge-gtk_1.3.0-0ubuntu1_all.deb?N\x1f\x9b Z%7B...')
67+
68+ def test_parsing_path_with_space_and_missing_protocol(self):
69+ # This is a variation of bug 676489.
70+ request = (r'GET /56222647/deluge-gtk_1.3.0-0ubuntu1_all.deb?'
71+ r'N\x1f\x9b Z%7B...')
72+ method, path = get_method_and_path(request)
73+ self.assertEqual(method, 'GET')
74+ self.assertEqual(
75+ path,
76+ r'/56222647/deluge-gtk_1.3.0-0ubuntu1_all.deb?N\x1f\x9b Z%7B...')
77+
78
79 class Test_get_fd_and_file_size(TestCase):
80