Merge lp:~julian-edwards/launchpad/log-parser-bug-680463 into lp:launchpad

log-parser-bug-680463
Merge into devel

Proposed by Julian Edwards on 2010-11-25

Status: Merged
Approved by: Gavin Panella on 2010-11-25
Approved revision: no longer in the source branch.
Merged at revision: 11988
Proposed branch: lp:~julian-edwards/launchpad/log-parser-bug-680463
Merge into: lp:launchpad
Diff against target: 24 lines (+7/-6), 1 file modified
lib/lp/services/apachelogparser/base.py (+7/-6)

To merge this branch:

bzr merge lp:~julian-edwards/launchpad/log-parser-bug-680463

High

Fix Released

Reviewer	Review Type	Date Requested	Status
Gavin Panella (community)		2010-11-25	Approve on 2010-11-25
Review via email: mp+41865@code.launchpad.net

Commit message

When parsing Apache logs, work out the uncompressed length of gzip log files without reading them into memory in their entirety, which can crash the process when it exhausts the available memory.

Description of the change

Figure out the length of gzip log files without having to read them into memory.

The existing code tries to read the uncompressed contents of a gzip file into memory in their entirety. This makes the PPA log parser blow up quite horribly as the log files are very large.

Use existing test with:
bin/test -cvv test_apachelogparser Test_get_fd_and_file_size

QA Plan
-------

I have got a copy of the production log files that cause the crash on dogfood. Running with the fix allows the processing to continue with no increased memory usage as observed in "top".

Gavin Panella (allenap) on 2010-11-25:

review: Approve

Preview Diff


=== modified file 'lib/lp/services/apachelogparser/base.py'
--- lib/lp/services/apachelogparser/base.py	2010-11-17 23:20:07 +0000
+++ lib/lp/services/apachelogparser/base.py	2010-11-26 17:47:08 +0000
@@ -64,13 +64,14 @@
     file_path points to a gzipped file.
     """
     if file_path.endswith('.gz'):
+        # The last 4 bytes of the file contains the uncompressed file's
+        # size, modulo 2**32.  This code is somewhat stolen from the gzip
+        # module in Python 2.6.
         fd = gzip.open(file_path)
-        # There doesn't seem to be a better way of figuring out the
-        # uncompressed size of a file, so we'll read the whole file here.
-        file_size = len(fd.read())
-        # Seek back to the beginning of the file as if we had just opened
-        # it.
-        fd.seek(0)
+        fd.fileobj.seek(-4, os.SEEK_END)
+        isize = gzip.read32(fd.fileobj)   # may exceed 2GB
+        file_size = isize & 0xffffffffL
+        fd.fileobj.seek(0)
     else:
         fd = open(file_path)
         file_size = os.path.getsize(file_path)