Merge lp:~michael.nelson/launchpad/588288-log-parser-dont-read-entire-file-qa into lp:launchpad
Status: | Merged | ||||
---|---|---|---|---|---|
Approved by: | Michael Nelson | ||||
Approved revision: | no longer in the source branch. | ||||
Merged at revision: | 11035 | ||||
Proposed branch: | lp:~michael.nelson/launchpad/588288-log-parser-dont-read-entire-file-qa | ||||
Merge into: | lp:launchpad | ||||
Diff against target: |
205 lines (+54/-22) 5 files modified
lib/canonical/config/schema-lazr.conf (+2/-2) lib/canonical/launchpad/scripts/tests/test_librarian_apache_log_parser.py (+7/-3) lib/lp/services/apachelogparser/base.py (+9/-4) lib/lp/services/apachelogparser/tests/apache-log-files/launchpadlibrarian.net.access-log (+1/-0) lib/lp/services/apachelogparser/tests/test_apachelogparser.py (+35/-13) |
||||
To merge this branch: | bzr merge lp:~michael.nelson/launchpad/588288-log-parser-dont-read-entire-file-qa | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Abel Deuring (community) | code | Approve | |
Review via email: mp+27849@code.launchpad.net |
Commit message
Ensure the next line is incremented when log parser skips lines.
Description of the change
This branch fixes bug 590766 (which was a result of a bad fix landed for bug 588288).
The apache log parser has always ignored lines that are not for an http status 200 OK, or are not GET requests etc. But the current code in devel doesn't increment to the next line when ignoring these lines.
The fix was just a small restructure to ensure that the line is always incremented.
To test:
bin/test -vvm test_apachelogp
To demo locally:
Apply the following patch so that only 2 lines are read at a time: http://
Then copy an apache access log from your /var/log/apache2 to /srv/launchpadl
and then run:
cronscripts/
multiple times. You can watch the data being updated with `psql launchpad_dev` running the query:
select * from parsedapachelog;
No lint.
Hi Michael,
your changes look good -- but I believe I found another issue in the "while next_line" loop (or I had not have yet enough coffee...):
If we get any exception in the second try/except block of parse_file(), parsed_bytes is set to the value for the start of the line causing the exception, and the loop is terminated via a "break".
So, when parse_file() is called the next time, the line causing the exception is read again. That's fine if the exception of the previous run was due to the line being truncated; in this case, it is fair to assume that a new run with more data will work better. But if the exception is caused for example by a bug in get_method_ and_path( ) or geoip.getRecord ByAddress( ), we will have an "non-terminating loop", in the sense that sucessive calls of parse_file() will never get past the line causing the exception.
I'm approving this branch but I think it is worth to rethink the exception handling in the loop.