Merge lp:~jamesodhunt/apport/bug-1256268 into lp:~apport-hackers/apport/trunk

Proposed by James Hunt
Status: Needs review
Proposed branch: lp:~jamesodhunt/apport/bug-1256268
Merge into: lp:~apport-hackers/apport/trunk
Diff against target: 404 lines (+395/-0)
2 files modified
data/general-hooks/resource_hogs.py (+205/-0)
test/test_resource_hogs.py (+190/-0)
To merge this branch: bzr merge lp:~jamesodhunt/apport/bug-1256268
Reviewer Review Type Date Requested Status
Martin Pitt (community) Needs Fixing
Review via email: mp+198946@code.launchpad.net

Description of the change

I've tried to specify "reasonable" default values, but we may need to tweak the following (or maybe even allow them to be configured via /etc/apport/)?:

# Consider this number of the top cpu-hogging processes.
max_hogs = 3

# Percentage threshold for cpu and memory hogs
cpu_threshold = 80
mem_threshold = 50

To post a comment you must log in.
lp:~jamesodhunt/apport/bug-1256268 updated
2739. By James Hunt

Make check_for_hogs() add the bug tag for consistency with the way the hog
data itself is added to the bug.

Revision history for this message
James Hunt (jamesodhunt) wrote :

There are a couple of python test scripts here:

http://people.canonical.com/~jhunt/python/

... to trigger the conditions we are looking to detect.

lp:~jamesodhunt/apport/bug-1256268 updated
2740. By James Hunt

* data/general-hooks/resource_hogs.py: pep8 clean-up.
* test/test_resource_hogs.py: Test for resource_hogs.py.

Revision history for this message
James Hunt (jamesodhunt) wrote :

Hi Martin,

We now have a basic set of tests. However, there are a couple of issues:

1) test_resource_hogs.py: I had to copy TestSuiteUserInterface() from test_ui.py. Is there a way to avoid this?

2) When you run these tests, you will probably find that the memory hog test fails (this is because python fails to allocate the requisite 50% of system memory :-) Whilst consuming CPU is easy, consuming a fixed percentage of memory in a test is difficult for various reasons (amount of RAM, rlimits, etc). I'm in favour of having a way to set mem_threshold in resource_hogs.py to a very low value (probably zero in fact) such that we can be guaranteed that memory can be allocated and the hook will fire and we can check the report object. If you agree, is there a standard apport idiom I can use to do this?

Revision history for this message
Martin Pitt (pitti) wrote :

Hey James,

thanks for working on this!

1) It's better to move TestSuiteUserInterface into a separate test/ui.py and import it from test_ui.py and the new test.

2) The usual idiom for in-process changes is that the defaults are in a global variable and the test changes the global variable. That would work if the logic was in apport/report.py itself, but as it's in a separate hook the only way that comes to my mind is to check an environment variable in the hook.

I wonder if we actually want a percentage memory treshold. A process is "broken" if it unexpectedly takes 1 GB of memory regardless of whether you only have 1 or 64 GB of RAM. My gut feeling is that it should rather be flagged if the apported process is in the top 3 memory users, or perhaps also just if is using > 500 MB RAM.

Global: Can you please rename "hog" to perhaps "leaks", to avoid animal references and slang?

22 +def get_pid_path(pid):

That's fairly complicated; why isn't reading the /proc/pid/exe symlink sufficient? Apport does that in other places like add_proc_info(), so if you want to reuse any of its logic, call it on a temporary empty dictionary and use report['ExecutablePath'] or perhaps also InterpreterPath.

47 + path = [x for x in env if x.startswith('PATH=')]

part of get_pid_path(), so it might go away entirely, but if not let's rather call "which". shutils.which will make this easier in the future, but we still need to support python 2.x.

61 +def run_top(max_lines, sort_field):

I wonder if running "ps" instead of "top" might be easier to parse? You can precisely define the output fields, sort order, etc. But if "top" works for you, that's fine.

142 + @report: report,

Please don't put trailing commas after parameter descriptions.

379 + self.assertTrue(set(['MemoryHogs']).issubset(set(self.ui.report.keys())))

This is better written as

    self.assertIn('MemoryHogs', self.ui.report)

However, this ought to be more specific to actually ensure that you grabbed the test process, not any process.

This should also get some negative tests, to ensure that well-behaved processes do not turn up in those new keys.

Thanks!

review: Needs Fixing

Unmerged revisions

2740. By James Hunt

* data/general-hooks/resource_hogs.py: pep8 clean-up.
* test/test_resource_hogs.py: Test for resource_hogs.py.

2739. By James Hunt

Make check_for_hogs() add the bug tag for consistency with the way the hog
data itself is added to the bug.

2738. By James Hunt

data/general-hooks/resource_hogs.py: New general hook to capture details
of processes consuming large amounts of CPU and memory if they relate to
file in the package the user is reporting a bug against (LP: #1256268).

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'data/general-hooks/resource_hogs.py'
2--- data/general-hooks/resource_hogs.py 1970-01-01 00:00:00 +0000
3+++ data/general-hooks/resource_hogs.py 2013-12-13 22:41:36 +0000
4@@ -0,0 +1,205 @@
5+# Look for resource-hogging processes (currently CPU+Memory).
6+#
7+# Copyright (C) 2013 Canonical Ltd.
8+# Author: James Hunt <james.hunt@canonical.com>
9+#
10+# This program is free software; you can redistribute it and/or modify it
11+# under the terms of the GNU General Public License as published by the
12+# Free Software Foundation; either version 2 of the License, or (at your
13+# option) any later version. See http://www.gnu.org/copyleft/gpl.html for
14+# the full text of the license.
15+
16+import os
17+import re
18+import apport.packaging
19+import apport.hookutils
20+
21+
22+def get_pid_path(pid):
23+ '''
24+ Determine the full path to the command running as the specified pid.
25+
26+ Returns: Full path, or None on error.
27+ '''
28+ fields = apport.hookutils.read_file('/proc/' + pid + '/cmdline').split('\0')
29+
30+ cmd_path = fields[0]
31+
32+ if not cmd_path:
33+ # kernel thread?
34+ return None
35+
36+ if cmd_path[0] == '/':
37+ return cmd_path
38+
39+ # No absolute path so look up the command in the
40+ # processes PATH
41+ cmd = cmd_path
42+
43+ env = apport.hookutils.read_file('/proc/' + pid + '/environ').split('\0')
44+ if not env:
45+ return None
46+
47+ path = [x for x in env if x.startswith('PATH=')]
48+ if not path:
49+ return None
50+
51+ path = path[0]
52+
53+ for elem in path.split(':'):
54+ cmd_path = '{}/{}'.format(elem, cmd)
55+ if os.path.exists(cmd_path):
56+ return cmd_path
57+
58+ return None
59+
60+
61+def run_top(max_lines, sort_field):
62+ '''
63+ Run top(1).
64+
65+ @max_lines: number of output lines to keep,
66+ @sort_field: field to sort top outpu by.
67+
68+ Returns: List of (up to size @max_lines elements) with each element
69+ containing a list of top output fields.
70+ '''
71+ lines = []
72+ count = 0
73+ args = ['top', '-b', '-n1']
74+
75+ args += ['-o{}'.format(sort_field)]
76+
77+ output = apport.hookutils.command_output(args).split('\n')
78+
79+ for line in output:
80+ fields = line.split()
81+
82+ # ignore header lines
83+ if (len(fields) != 12):
84+ continue
85+
86+ # pid
87+ result = re.match('\d+', fields[0])
88+ if not result:
89+ continue
90+
91+ # user
92+ if not re.match('\w+', fields[1]):
93+ continue
94+
95+ # time
96+ if not re.match('\d+:\d+\.\d+', fields[10]):
97+ continue
98+
99+ # command
100+ cmd = fields[11]
101+ if not cmd:
102+ continue
103+
104+ if count == max_lines:
105+ break
106+
107+ count += 1
108+
109+ lines.append(fields)
110+
111+ return lines
112+
113+
114+def check_for_hogs(report, reported_package, max_hogs, cpu_threshold, mem_threshold):
115+ '''
116+ Check for cpu and memory hogs.
117+ '''
118+
119+ cpu_hogs = get_hogs(report, reported_package, max_hogs, '%CPU', 8, cpu_threshold)
120+ mem_hogs = get_hogs(report, reported_package, max_hogs, '%MEM', 9, mem_threshold)
121+
122+ # Add the hog details to the report, adding a tag if any of the hogs
123+ # matches the package the Bug is being reported against.
124+
125+ if len(cpu_hogs):
126+ report['CPUHogs'] = str(cpu_hogs)
127+ got = [i for i, v in enumerate(cpu_hogs) if v[0] == reported_package]
128+ if got:
129+ report['Tags'] += ' {}'.format('cpu-hog')
130+
131+ if len(mem_hogs):
132+ report['MemoryHogs'] = str(mem_hogs)
133+ got = [i for i, v in enumerate(mem_hogs) if v[0] == reported_package]
134+ if got:
135+ report['Tags'] += ' {}'.format('memory-hog')
136+
137+
138+def get_hogs(report, reported_package, max_hogs, top_sort_field, field_count, threshold):
139+ '''
140+ Determine a list of resource hogs.
141+
142+ @report: report,
143+ @reported_package: name of package problem is being reported against,
144+ @max_hogs: number of hogs to consider,
145+ @top_sort_field: name of top(1) field for a specified resource,
146+ @field_count: the field number of @top_sort_field (zero-indexed),
147+ @threshold: float threshold value - any process above this will be
148+ considered to be a hog.
149+
150+ Returns: List of tuples where each tuple is (full_command_path, package, percentage).
151+ '''
152+ hogs = []
153+ count = 0
154+
155+ lines = run_top(max_hogs, top_sort_field)
156+
157+ for fields in lines:
158+ pid = fields[0]
159+
160+ result = re.match('\d+\.\d+', fields[field_count])
161+ if not result:
162+ continue
163+ hog_field = float(result.group(0))
164+
165+ count += 1
166+
167+ # top(1) displays sorted output in descending order so if the top
168+ # offenders don't meet the threshold, there is no point
169+ # looking any further.
170+ if count == 1 and hog_field < threshold:
171+ break
172+
173+ cmd = apport.hookutils.read_file('/proc/' + pid + '/cmdline').split('\0')[0]
174+
175+ if not cmd:
176+ # kernel thread gone haywire?
177+ continue
178+
179+ if cmd[0] != '/':
180+ cmd = get_pid_path(pid)
181+ if not cmd:
182+ continue
183+
184+ pkg = apport.packaging.get_file_package(cmd)
185+
186+ hogs.append((cmd, pkg, hog_field))
187+
188+ return hogs
189+
190+
191+def add_info(report):
192+ # Consider this number of the top cpu-hogging processes.
193+ max_hogs = 3
194+
195+ # Percentage threshold for cpu and memory hogs
196+ cpu_threshold = 80
197+ mem_threshold = 50
198+
199+ # Only consider bugs reports for now
200+ if report.get('ProblemType', None) != 'Bug':
201+ return
202+
203+ # We need to know the package the user is reporting the bug against
204+ if not 'Package' in report:
205+ return
206+
207+ reported_package = report.get('Package').split()[0]
208+
209+ check_for_hogs(report, reported_package, max_hogs, cpu_threshold, mem_threshold)
210
211=== added file 'test/test_resource_hogs.py'
212--- test/test_resource_hogs.py 1970-01-01 00:00:00 +0000
213+++ test/test_resource_hogs.py 2013-12-13 22:41:36 +0000
214@@ -0,0 +1,190 @@
215+#!/usr/bin/python3
216+
217+import unittest
218+import sys
219+import time
220+import os
221+import tempfile
222+from multiprocessing import Process, Pipe
223+
224+datadir = os.environ.get('APPORT_DATA_DIR', '/usr/share/apport')
225+sys.path.insert(0, os.path.join(datadir, 'general-hooks'))
226+
227+import apport.ui
228+import apport.report
229+
230+
231+class TestSuiteUserInterface(apport.ui.UserInterface):
232+ def __init__(self):
233+ # use our dummy crashdb
234+ self.crashdb_conf = tempfile.NamedTemporaryFile()
235+ self.crashdb_conf.write(b'''default = 'testsuite'
236+databases = {
237+ 'testsuite': {
238+ 'impl': 'memory',
239+ 'bug_pattern_url': None,
240+ },
241+ 'debug': {
242+ 'impl': 'memory',
243+ 'distro': 'debug',
244+ },
245+}
246+''')
247+ self.crashdb_conf.flush()
248+
249+ os.environ['APPORT_CRASHDB_CONF'] = self.crashdb_conf.name
250+
251+ apport.ui.UserInterface.__init__(self)
252+
253+ self.crashdb = apport.crashdb_impl.memory.CrashDatabase(
254+ None, {'dummy_data': 1, 'dupdb_url': ''})
255+
256+ # state of progress dialogs
257+ self.ic_progress_active = False
258+ self.ic_progress_pulses = 0 # count the pulses
259+ self.upload_progress_active = False
260+ self.upload_progress_pulses = 0
261+
262+ # these store the choices the ui_present_* calls do
263+ self.present_package_error_response = None
264+ self.present_kernel_error_response = None
265+ self.present_details_response = None
266+ self.question_yesno_response = None
267+ self.question_choice_response = None
268+ self.question_file_response = None
269+
270+ self.opened_url = None
271+ self.present_details_shown = False
272+
273+ self.clear_msg()
274+
275+ def clear_msg(self):
276+ # last message box
277+ self.msg_title = None
278+ self.msg_text = None
279+ self.msg_severity = None # 'warning' or 'error'
280+ self.msg_choices = None
281+
282+ def ui_present_report_details(self, allowed_to_report=True, modal_for=None):
283+ self.present_details_shown = True
284+ return self.present_details_response
285+
286+ def ui_info_message(self, title, text):
287+ self.msg_title = title
288+ self.msg_text = text
289+ self.msg_severity = 'info'
290+
291+ def ui_error_message(self, title, text):
292+ self.msg_title = title
293+ self.msg_text = text
294+ self.msg_severity = 'error'
295+
296+ def ui_start_info_collection_progress(self):
297+ self.ic_progress_pulses = 0
298+ self.ic_progress_active = True
299+
300+ def ui_pulse_info_collection_progress(self):
301+ assert self.ic_progress_active
302+ self.ic_progress_pulses += 1
303+
304+ def ui_stop_info_collection_progress(self):
305+ self.ic_progress_active = False
306+
307+ def ui_start_upload_progress(self):
308+ self.upload_progress_pulses = 0
309+ self.upload_progress_active = True
310+
311+ def ui_set_upload_progress(self, progress):
312+ assert self.upload_progress_active
313+ self.upload_progress_pulses += 1
314+
315+ def ui_stop_upload_progress(self):
316+ self.upload_progress_active = False
317+
318+ def open_url(self, url):
319+ self.opened_url = url
320+
321+ def ui_question_yesno(self, text):
322+ self.msg_text = text
323+ return self.question_yesno_response
324+
325+ def ui_question_choice(self, text, options, multiple):
326+ self.msg_text = text
327+ self.msg_choices = options
328+ return self.question_choice_response
329+
330+ def ui_question_file(self, text):
331+ self.msg_text = text
332+ return self.question_file_response
333+
334+
335+class T(unittest.TestCase):
336+ def eat_memory(self, conn):
337+ '''
338+ Allocate lots of memory.
339+ '''
340+
341+ Gb = 2 ** 30
342+
343+ # Evil: Keep allocating until we fail, then block,
344+ # keeping the memory we already have allocated.
345+ l = []
346+ try:
347+ while True:
348+ l.append(' ' * Gb)
349+ except MemoryError:
350+ conn.recv()
351+
352+ def eat_cpu(self):
353+ '''
354+ Consume CPU by spinning in a tight loop.
355+ '''
356+ while True:
357+ True
358+
359+ def test_memory_hog(self):
360+ '''Ensure a memory hog is detected'''
361+
362+ parent_conn, child_conn = Pipe()
363+
364+ p = Process(target=self.eat_memory, args=(child_conn,))
365+ p.start()
366+
367+ # wait until it's had a chance to get going
368+ time.sleep(2)
369+
370+ self.ui = TestSuiteUserInterface()
371+ self.ui.report = apport.Report('Bug')
372+
373+ # need "a" package to trigger the hook
374+ self.ui.report['Package'] = 'foo'
375+
376+ self.ui.collect_info()
377+ self.ui.report.add_hooks_info(self.ui)
378+
379+ self.assertTrue(set(['MemoryHogs']).issubset(set(self.ui.report.keys())))
380+
381+ p.terminate()
382+
383+ def test_cpu_hog(self):
384+ p = Process(target=self.eat_cpu)
385+ p.start()
386+
387+ # wait until it's had a chance to get going
388+ time.sleep(2)
389+
390+ self.ui = TestSuiteUserInterface()
391+ self.ui.report = apport.Report('Bug')
392+
393+ # need "a" package to trigger the hook
394+ self.ui.report['Package'] = 'foo'
395+
396+ self.ui.collect_info()
397+ self.ui.report.add_hooks_info(self.ui)
398+
399+ self.assertTrue(set(['CPUHogs']).issubset(set(self.ui.report.keys())))
400+
401+ p.terminate()
402+
403+
404+unittest.main()

Subscribers

People subscribed via source and target branches