Merge lp:~jameinel/bzr/2.0.4-494406-serve-memory-leak into lp:bzr

Proposed by John A Meinel
Status: Merged
Merged at revision: not available
Proposed branch: lp:~jameinel/bzr/2.0.4-494406-serve-memory-leak
Merge into: lp:bzr
Diff against target: 107 lines (+40/-15)
3 files modified
NEWS (+12/-1)
bzrlib/_bencode_pyx.pyx (+1/-1)
bzrlib/export/dir_exporter.py (+27/-13)
To merge this branch: bzr merge lp:~jameinel/bzr/2.0.4-494406-serve-memory-leak
Reviewer        Review Type    Date Requested    Status
bzr-core                                         Pending
Review via email: mp+16368@code.launchpad.net
John A Meinel (jameinel) wrote:

I managed to track down the memory leak in server-side operations to bzrlib.bencode.Encoder.

It turns out that Encoder was using '__del__' as its cleanup function, when it should have been '__dealloc__'. I don't know if __del__ ever gets called for Pyrex objects, but I do know that __dealloc__ always gets called. (And doesn't cause problems with GC cycles, as it gets put into the tp_dealloc slot.)
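
To illustrate the difference, here is a minimal Pyrex/Cython-style sketch (a hypothetical Buffer class, not the actual Encoder code) of a cdef class that owns malloc()'d memory and frees it in __dealloc__:

    cdef extern from "stdlib.h":
        void *malloc(size_t size)
        void free(void *memblock)

    cdef class Buffer:
        cdef char *buffer
        cdef int maxsize

        def __init__(self, int maxsize):
            self.buffer = <char *>malloc(maxsize)
            self.maxsize = maxsize

        # __dealloc__ is wired into the type's tp_dealloc slot, so it runs
        # whenever the object is freed (even via the cycle collector); a
        # Python-level __del__ on a cdef class is not reliably called, which
        # is how the Encoder's buffer leaked.
        def __dealloc__(self):
            free(self.buffer)
            self.buffer = NULL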

Anyway, this one-line patch gets rid of the memory leak as near as I can tell. (At least cycling on 'bzr commit' no longer leaks memory.)
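
A quick way to observe this kind of leak without cycling full commits is to hammer the encoder and watch the process's max RSS. This is a hedged sketch (assuming bzrlib is importable and a Unix resource module), not the measurement described above:

    import resource
    from bzrlib import bencode

    # Each bencode() call builds a compiled Encoder with a malloc()'d buffer.
    # Before this patch the buffer was never freed, so max RSS keeps growing;
    # with __dealloc__ in place the numbers level off.
    for i in range(10000):
        bencode.bencode(['x' * 65536])
        if i % 1000 == 0:
            print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss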

It could also have a fair impact on codehosting's 'lp-serve', as that would be encoding a *lot* of bytes to send over the wire. This memory was also "dark" memory, in that it was allocated directly using 'malloc()' and thus not seen by Meliae. I'm not positive, but I think every byte that 'bzr lp-serve' sends over the wire gets wrapped by a 'bencode' operation, which would mean that every byte sent out was permanently leaked in memory. (Maybe we don't serve bulk data through a bencode wrapper.) At least, all of the response tuples are certainly encoded.
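
For context, the smart protocol's response tuples are bencoded roughly like this (a sketch assuming only the public bzrlib.bencode module, not the actual server call sites):

    from bzrlib import bencode

    # Encoding a response such as ['ok', '2']; each call goes through the
    # compiled Encoder whose buffer used to leak.
    data = bencode.bencode(['ok', '2'])    # -> 'l2:ok1:2e'
    assert bencode.bdecode(data) == ['ok', '2']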

It doesn't affect my 'memory reduction' tests that I used for 2.1.0b3, because those were operating locally.

John A Meinel (jameinel) wrote:

This was meant to be proposed for 2.0
https://code.edge.launchpad.net/~jameinel/bzr/2.0.4-494406-serve-memory-leak/+merge/16369

Please review it there.

Preview Diff

=== modified file 'NEWS'
--- NEWS 2009-12-18 09:05:13 +0000
+++ NEWS 2009-12-18 22:32:31 +0000
@@ -50,7 +50,7 @@
 bzr 2.0.4 (not released yet)
 ############################
 
-:Codename: template
+:Codename:
 :2.0.4: ???
 
 Compatibility Breaks
@@ -62,6 +62,17 @@
 Bug Fixes
 *********
 
+* ``bzr export dir`` now requests all file content as a record stream,
+  rather than requesting the file content one file at a time. This can make
+  exporting over the network significantly faster (54min => 9min in one
+  case). (John Arbash Meinel, #343218)
+
+* ``bzr serve`` no longer slowly leaks memory. The compiled
+  ``bzrlib.bencode.Encoder()`` class was using ``__del__`` to clean up and
+  free resources, and it should have been using ``__dealloc__``.
+  This will likely have an impact on any other process that is serving for
+  an extended period of time. (John Arbash Meinel, #494406)
+
 Improvements
 ************
 
=== modified file 'bzrlib/_bencode_pyx.pyx'
--- bzrlib/_bencode_pyx.pyx 2009-10-15 18:28:14 +0000
+++ bzrlib/_bencode_pyx.pyx 2009-12-18 22:32:31 +0000
@@ -268,7 +268,7 @@
         self.maxsize = maxsize
         self.tail = p
 
-    def __del__(self):
+    def __dealloc__(self):
         free(self.buffer)
         self.buffer = NULL
         self.maxsize = 0
 
=== modified file 'bzrlib/export/dir_exporter.py'
--- bzrlib/export/dir_exporter.py 2009-07-29 13:46:55 +0000
+++ bzrlib/export/dir_exporter.py 2009-12-18 22:32:31 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2005 Canonical Ltd
+# Copyright (C) 2005, 2009 Canonical Ltd
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -52,21 +52,17 @@
                 raise errors.BzrError("Can't export tree to non-empty directory.")
         else:
             raise
+    # Iterate everything, building up the files we will want to export, and
+    # creating the directories and symlinks that we need.
+    # This tracks (file_id, (destination_path, executable))
+    # This matches the api that tree.iter_files_bytes() wants
+    # Note in the case of revision trees, this does trigger a double inventory
+    # lookup, hopefully it isn't too expensive.
+    to_fetch = []
     for dp, ie in _export_iter_entries(tree, subdir):
         fullpath = osutils.pathjoin(dest, dp)
         if ie.kind == "file":
-            if filtered:
-                chunks = tree.get_file_lines(ie.file_id)
-                filters = tree._content_filter_stack(dp)
-                context = ContentFilterContext(dp, tree, ie)
-                contents = filtered_output_bytes(chunks, filters, context)
-                content = ''.join(contents)
-                fileobj = StringIO.StringIO(content)
-            else:
-                fileobj = tree.get_file(ie.file_id)
-            osutils.pumpfile(fileobj, file(fullpath, 'wb'))
-            if tree.is_executable(ie.file_id):
-                os.chmod(fullpath, 0755)
+            to_fetch.append((ie.file_id, (dp, tree.is_executable(ie.file_id))))
         elif ie.kind == "directory":
             os.mkdir(fullpath)
         elif ie.kind == "symlink":
@@ -80,3 +76,21 @@
     else:
         raise errors.BzrError("don't know how to export {%s} of kind %r" %
            (ie.file_id, ie.kind))
+    # The data returned here can be in any order, but we've already created all
+    # the directories
+    flags = os.O_CREAT | os.O_TRUNC | os.O_WRONLY | getattr(os, 'O_BINARY', 0)
+    for (relpath, executable), chunks in tree.iter_files_bytes(to_fetch):
+        if filtered:
+            filters = tree._content_filter_stack(relpath)
+            context = ContentFilterContext(relpath, tree, ie)
+            chunks = filtered_output_bytes(chunks, filters, context)
+        fullpath = osutils.pathjoin(dest, relpath)
+        # We set the mode and let the umask sort out the file info
+        mode = 0666
+        if executable:
+            mode = 0777
+        out = os.fdopen(os.open(fullpath, flags, mode), 'wb')
+        try:
+            out.writelines(chunks)
+        finally:
+            out.close()
 
107=== modified file 'doc/en/user-guide/introducing_bazaar.txt'