Merge lp:~michael.nelson/launchpad/588288-log-parser-dont-read-entire-file-qa into lp:launchpad
| Status: | Merged | ||||
|---|---|---|---|---|---|
| Approved by: | Michael Nelson on 2010-06-18 | ||||
| Approved revision: | no longer in the source branch. | ||||
| Merged at revision: | 11035 | ||||
| Proposed branch: | lp:~michael.nelson/launchpad/588288-log-parser-dont-read-entire-file-qa | ||||
| Merge into: | lp:launchpad | ||||
| Diff against target: |
205 lines (+54/-22) 5 files modified
lib/canonical/config/schema-lazr.conf (+2/-2) lib/canonical/launchpad/scripts/tests/test_librarian_apache_log_parser.py (+7/-3) lib/lp/services/apachelogparser/base.py (+9/-4) lib/lp/services/apachelogparser/tests/apache-log-files/launchpadlibrarian.net.access-log (+1/-0) lib/lp/services/apachelogparser/tests/test_apachelogparser.py (+35/-13) |
||||
| To merge this branch: | bzr merge lp:~michael.nelson/launchpad/588288-log-parser-dont-read-entire-file-qa | ||||
| Related bugs: |
|
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Abel Deuring (community) | code | 2010-06-17 | Approve on 2010-06-18 |
|
Review via email:
|
|||
Commit Message
Ensure the next line is incremented when log parser skips lines.
Description of the Change
This branch fixes bug 590766 (which was a result of a bad fix landed for bug 588288).
The apache log parser has always ignored lines that are not for an http status 200 OK, or are not GET requests etc. But the current code in devel doesn't increment to the next line when ignoring these lines.
The fix was just a small restructure to ensure that the line is always incremented.
To test:
bin/test -vvm test_apachelogp
To demo locally:
Apply the following patch so that only 2 lines are read at a time: http://
Then copy an apache access log from your /var/log/apache2 to /srv/launchpadl
and then run:
cronscripts/
multiple times. You can watch the data being updated with `psql launchpad_dev` running the query:
select * from parsedapachelog;
No lint.
| Abel Deuring (adeuring) wrote : | # |
...one more comment: Can't we use something like
line[-1] == '\n'
as an indicator if the current line is complete of if we have reached the end of the file and should not parse this line?
| Abel Deuring (adeuring) wrote : | # |
I think that would allow us to remove the "exception Exception" handling
| Michael Nelson (michael.nelson) wrote : | # |
Thanks Abel. Regarding the exception handling, it looks to me as if that is by design (but I didn't design it ;)) - that is, if there is an error the log parser will not continue past the line where the error occurred until it has been resolved (so that we don't end up skipping chunks of the file, but are rather alerted to it). I'm basing this on the test:
def test_unexpected
# When there's an unexpected error, we log it and return as if we had
# parsed up to the line before the one where the failure occurred.
I can create a bug with your comments for the foundations guys to look at if you think it's worthwhile.
Thanks!

Hi Michael,
your changes look good -- but I believe I found another issue in the "while next_line" loop (or I had not have yet enough coffee...):
If we get any exception in the second try/except block of parse_file(), parsed_bytes is set to the value for the start of the line causing the exception, and the loop is terminated via a "break".
So, when parse_file() is called the next time, the line causing the exception is read again. That's fine if the exception of the previous run was due to the line being truncated; in this case, it is fair to assume that a new run with more data will work better. But if the exception is caused for example by a bug in get_method_ and_path( ) or geoip.getRecord ByAddress( ), we will have an "non-terminating loop", in the sense that sucessive calls of parse_file() will never get past the line causing the exception.
I'm approving this branch but I think it is worth to rethink the exception handling in the loop.