Apport

Merge lp:~jamesodhunt/apport/bug-1256268 into lp:~apport-hackers/apport/trunk

Proposed by James Hunt on 2013-12-13

Status:

Needs review

Proposed branch:

lp:~jamesodhunt/apport/bug-1256268

Merge into:

lp:~apport-hackers/apport/trunk

Diff against target:

404 lines (+395/-0)

2 files modified

data/general-hooks/resource_hogs.py (+205/-0)
test/test_resource_hogs.py (+190/-0)

To merge this branch:

bzr merge lp:~jamesodhunt/apport/bug-1256268

Reviewer	Review Type	Date Requested	Status
Martin Pitt (community)		2013-12-13	Needs Fixing on 2013-12-16
Review via email: mp+198946@code.launchpad.net

Description of the change

I've tried to specify "reasonable" default values, but we may need to tweak the following (or maybe even allow them to be configured via /etc/apport/):

# Consider this number of the top cpu-hogging processes.
max_hogs = 3

# Percentage threshold for cpu and memory hogs
cpu_threshold = 80
mem_threshold = 50

lp:~jamesodhunt/apport/bug-1256268 updated on 2013-12-13

2739. By James Hunt on 2013-12-13: Make check_for_hogs() add the bug tag for consistency with the way the hog
data itself is added to the bug.

James Hunt (jamesodhunt) wrote on 2013-12-13:

There are a couple of python test scripts here:

http://people.canonical.com/~jhunt/python/

... to trigger the conditions we are looking to detect.

lp:~jamesodhunt/apport/bug-1256268 updated on 2013-12-13

2740. By James Hunt on 2013-12-13: * data/general-hooks/resource_hogs.py: pep8 clean-up.
* test/test_resource_hogs.py: Test for resource_hogs.py.

James Hunt (jamesodhunt) wrote on 2013-12-13:

Hi Martin,

We now have a basic set of tests. However, there are a couple of issues:

1) test_resource_hogs.py: I had to copy TestSuiteUserInterface() from test_ui.py. Is there a way to avoid this?

2) When you run these tests, you will probably find that the memory hog test fails, because python fails to allocate the requisite 50% of system memory :-) Whilst consuming CPU is easy, consuming a fixed percentage of memory in a test is difficult for various reasons (amount of RAM, rlimits, etc). I'm in favour of having a way to set mem_threshold in resource_hogs.py to a very low value (probably zero, in fact), so that the allocation is guaranteed to succeed, the hook fires, and we can check the report object. If you agree, is there a standard apport idiom I can use to do this?

Martin Pitt (pitti) wrote on 2013-12-16:

Hey James,

thanks for working on this!

1) It's better to move TestSuiteUserInterface into a separate test/ui.py and import it from test_ui.py and the new test.

2) The usual idiom for in-process changes is that the defaults are in a global variable and the test changes the global variable. That would work if the logic was in apport/report.py itself, but as it's in a separate hook the only way that comes to my mind is to check an environment variable in the hook.
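A minimal sketch of the environment-variable approach, for illustration only. The helper name get_threshold and the variable name APPORT_TEST_MEM_THRESHOLD are assumptions made up for this example; neither is an existing apport interface.

```python
import os


def get_threshold(env_var, default, environ=None):
    '''Return the threshold from the environment, or the default.

    env_var is a hypothetical variable name chosen by the hook author.
    Unset or unparsable values fall back to the built-in default.
    '''
    environ = os.environ if environ is None else environ
    raw = environ.get(env_var)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default


# In the hook, the hard-coded default would become something like:
mem_threshold = get_threshold('APPORT_TEST_MEM_THRESHOLD', 50.0)
```

A test could then set the variable to 0 before calling add_hooks_info(), so that any running process crosses the memory threshold.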

I wonder if we actually want a percentage memory threshold. A process is "broken" if it unexpectedly takes 1 GB of memory, regardless of whether you have 1 or 64 GB of RAM. My gut feeling is that it should rather be flagged if the apported process is in the top 3 memory users, or perhaps also just if it is using > 500 MB RAM.
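The alternative rule could look roughly like this. It is only a sketch: the function name, the rss_by_pid mapping (pid to resident set size in kB) and the default limits are assumptions for illustration, not part of the proposed hook.

```python
def is_memory_suspect(pid, rss_by_pid, top_n=3, rss_limit_kb=500 * 1024):
    '''Flag pid if it uses more than rss_limit_kb, or if it is
    among the top_n memory users in rss_by_pid.'''
    rss = rss_by_pid.get(pid)
    if rss is None:
        # Unknown process: nothing to flag.
        return False
    if rss > rss_limit_kb:
        return True
    # Sort pids by resident size, largest first, and keep the top_n.
    top = sorted(rss_by_pid, key=rss_by_pid.get, reverse=True)[:top_n]
    return pid in top
```

Unlike a percentage threshold, this gives the same answer for a 1 GB process on a 1 GB machine and on a 64 GB machine.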

Global: Can you please rename "hog" to perhaps "leaks", to avoid animal references and slang?

22 +def get_pid_path(pid):

That's fairly complicated; why isn't reading the /proc/pid/exe symlink sufficient? Apport does that in other places like add_proc_info(), so if you want to reuse any of its logic, call it on a temporary empty dictionary and use report['ExecutablePath'] or perhaps also InterpreterPath.
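For comparison, reading the symlink directly is only a few lines. This is a sketch, not apport's own code; the function name get_pid_exe and the proc_root parameter (added so the function can be exercised against a fake directory) are assumptions.

```python
import os


def get_pid_exe(pid, proc_root='/proc'):
    '''Return the target of <proc_root>/<pid>/exe, or None on error.

    readlink fails for kernel threads, for processes that have
    already exited, and for processes we may not inspect.
    '''
    try:
        return os.readlink(os.path.join(proc_root, str(pid), 'exe'))
    except OSError:
        return None
```

This avoids parsing cmdline and environ altogether, at the cost of returning the interpreter path for scripts.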

47 + path = [x for x in env if x.startswith('PATH=')]

Part of get_pid_path(), so it might go away entirely, but if not let's rather call "which". shutil.which will make this easier in the future, but we still need to support python 2.x.
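One way to bridge both Python versions is sketched below. shutil.which exists from Python 3.3 onwards; the manual fallback loop and the name find_in_path are assumptions for this example. Note that the path argument must be the bare value, with any leading 'PATH=' already stripped.

```python
import os

try:
    from shutil import which  # available in Python >= 3.3
except ImportError:
    which = None  # Python 2.x


def find_in_path(cmd, path):
    '''Return the full path of cmd found in the given PATH value,
    or None if it is not found.'''
    if which is not None:
        return which(cmd, path=path)
    # Fallback for Python 2.x: walk the PATH entries by hand.
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```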

61 +def run_top(max_lines, sort_field):

I wonder if running "ps" instead of "top" might be easier to parse? You can precisely define the output fields, sort order, etc. But if "top" works for you, that's fine.
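A rough sketch of what the "ps" variant might look like, assuming the procps version of ps (which accepts --sort and header-suppressing "field=" names). The function names and the chosen field list are illustrative assumptions, not part of the branch.

```python
import os
import subprocess

# Trailing '=' on each field suppresses the header line.
PS_FIELDS = 'pid=,pcpu=,pmem=,rss=,comm='


def parse_ps(output, max_lines):
    '''Parse ps output into (pid, pcpu, pmem, rss_kb, comm) tuples.'''
    rows = []
    for line in output.splitlines():
        # comm is last, so a command name with spaces stays intact.
        parts = line.split(None, 4)
        if len(parts) != 5:
            continue
        pid, pcpu, pmem, rss, comm = parts
        try:
            row = (int(pid), float(pcpu), float(pmem), int(rss), comm.strip())
        except ValueError:
            continue
        rows.append(row)
        if len(rows) == max_lines:
            break
    return rows


def run_ps(max_lines, sort_key='pcpu'):
    '''Run ps sorted by sort_key (descending) and parse the result.'''
    env = dict(os.environ, LC_ALL='C')  # keep '.' as the decimal point
    out = subprocess.check_output(
        ['ps', '-eo', PS_FIELDS, '--sort=-' + sort_key], env=env)
    return parse_ps(out.decode('utf-8', 'replace'), max_lines)
```

Because the field list is fixed by the caller, the parser does not need to guess which lines are headers or validate column counts against top's layout.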

142 + @report: report,

Please don't put trailing commas after parameter descriptions.

379 + self.assertTrue(set(['MemoryHogs']).issubset(set(self.ui.report.keys())))

This is better written as

self.assertIn('MemoryHogs', self.ui.report)

However, this ought to be more specific to actually ensure that you grabbed the test process, not any process.

This should also get some negative tests, to ensure that well-behaved processes do not turn up in those new keys.
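To illustrate both points, here is a sketch using a plain dict as a stand-in for self.ui.report. The path /usr/bin/python3, the package name and the percentage are made-up sample values; in the real test the expected path would be that of the spawned test process.

```python
import unittest


class ReportKeyChecks(unittest.TestCase):
    def setUp(self):
        # Stand-in for self.ui.report after add_hooks_info(); the hook
        # stores str() of a list of (path, package, percentage) tuples.
        hogs = [('/usr/bin/python3', 'python3-minimal', 62.5)]
        self.report = {'Package': 'foo', 'MemoryHogs': str(hogs)}

    def test_expected_process_is_listed(self):
        self.assertIn('MemoryHogs', self.report)
        # More specific: the process we started must be the one listed.
        self.assertIn('/usr/bin/python3', self.report['MemoryHogs'])

    def test_well_behaved_process_not_flagged(self):
        # Negative test: no CPU-bound process ran, so no CPU key.
        self.assertNotIn('CPUHogs', self.report)
```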

Thanks!

review: Needs Fixing

Unmerged revisions

2740. By James Hunt on 2013-12-13: * data/general-hooks/resource_hogs.py: pep8 clean-up.
* test/test_resource_hogs.py: Test for resource_hogs.py.
2739. By James Hunt on 2013-12-13: Make check_for_hogs() add the bug tag for consistency with the way the hog
data itself is added to the bug.
2738. By James Hunt on 2013-12-13: data/general-hooks/resource_hogs.py: New general hook to capture details
of processes consuming large amounts of CPU and memory if they relate to
files in the package the user is reporting a bug against (LP: #1256268).

Preview Diff

Subscribers

People subscribed via source and target branches

to all changes:

Brian Murray

Bruno Maximilian Voss

James Hunt

Martin Pitt

Ritesh Raj Sarraf

 === added file 'data/general-hooks/resource_hogs.py'
 --- data/general-hooks/resource_hogs.py	1970-01-01 00:00:00 +0000
 +++ data/general-hooks/resource_hogs.py	2013-12-13 22:41:36 +0000
@@ -0,0 +1,205 @@
++# Look for resource-hogging processes (currently CPU+Memory).
++#
++# Copyright (C) 2013 Canonical Ltd.
++# Author: James Hunt <james.hunt@canonical.com>
++#
++# This program is free software; you can redistribute it and/or modify it
++# under the terms of the GNU General Public License as published by the
++# Free Software Foundation; either version 2 of the License, or (at your
++# option) any later version.  See http://www.gnu.org/copyleft/gpl.html for
++# the full text of the license.
++
++import os
++import re
++import apport.packaging
++import apport.hookutils
++
++
++def get_pid_path(pid):
++    '''
++    Determine the full path to the command running as the specified pid.
++
++    Returns: Full path, or None on error.
++    '''
++    fields = apport.hookutils.read_file('/proc/' + pid + '/cmdline').split('\0')
++
++    cmd_path = fields[0]
++
++    if not cmd_path:
++        # kernel thread?
++        return None
++
++    if cmd_path[0] == '/':
++        return cmd_path
++
++    # No absolute path so look up the command in the
++    # processes PATH
++    cmd = cmd_path
++
++    env = apport.hookutils.read_file('/proc/' + pid + '/environ').split('\0')
++    if not env:
++        return None
++
++    path = [x for x in env if x.startswith('PATH=')]
++    if not path:
++        return None
++
++    path = path[0][len('PATH='):]
++
++    for elem in path.split(':'):
++        cmd_path = '{}/{}'.format(elem, cmd)
++        if os.path.exists(cmd_path):
++            return cmd_path
++
++    return None
++
++
++def run_top(max_lines, sort_field):
++    '''
++    Run top(1).
++
++    @max_lines: number of output lines to keep,
++    @sort_field: field to sort top output by.
++
++    Returns: List of (up to size @max_lines elements) with each element
++    containing a list of top output fields.
++    '''
++    lines = []
++    count = 0
++    args = ['top', '-b', '-n1']
++
++    args += ['-o{}'.format(sort_field)]
++
++    output = apport.hookutils.command_output(args).split('\n')
++
++    for line in output:
++        fields = line.split()
++
++        # ignore header lines
++        if (len(fields) != 12):
++            continue
++
++        # pid
++        result = re.match('\d+', fields[0])
++        if not result:
++            continue
++
++        # user
++        if not re.match('\w+', fields[1]):
++            continue
++
++        # time
++        if not re.match('\d+:\d+\.\d+', fields[10]):
++            continue
++
++        # command
++        cmd = fields[11]
++        if not cmd:
++            continue
++
++        if count == max_lines:
++            break
++
++        count += 1
++
++        lines.append(fields)
++
++    return lines
++
++
++def check_for_hogs(report, reported_package, max_hogs, cpu_threshold, mem_threshold):
++    '''
++    Check for cpu and memory hogs.
++    '''
++
++    cpu_hogs = get_hogs(report, reported_package, max_hogs, '%CPU', 8, cpu_threshold)
++    mem_hogs = get_hogs(report, reported_package, max_hogs, '%MEM', 9, mem_threshold)
++
++    # Add the hog details to the report, adding a tag if any of the hogs
++    # matches the package the Bug is being reported against.
++
++    if len(cpu_hogs):
++        report['CPUHogs'] = str(cpu_hogs)
++        got = [i for i, v in enumerate(cpu_hogs) if v[0] == reported_package]
++        if got:
++            report['Tags'] += ' {}'.format('cpu-hog')
++
++    if len(mem_hogs):
++        report['MemoryHogs'] = str(mem_hogs)
++        got = [i for i, v in enumerate(mem_hogs) if v[0] == reported_package]
++        if got:
++            report['Tags'] += ' {}'.format('memory-hog')
++
++
++def get_hogs(report, reported_package, max_hogs, top_sort_field, field_count, threshold):
++    '''
++    Determine a list of resource hogs.
++
++    @report: report,
++    @reported_package: name of package problem is being reported against,
++    @max_hogs: number of hogs to consider,
++    @top_sort_field: name of top(1) field for a specified resource,
++    @field_count: the field number of @top_sort_field (zero-indexed),
++    @threshold: float threshold value - any process above this will be
++     considered to be a hog.
++
++    Returns: List of tuples where each tuple is (full_command_path, package, percentage).
++    '''
++    hogs = []
++    count = 0
++
++    lines = run_top(max_hogs, top_sort_field)
++
++    for fields in lines:
++        pid = fields[0]
++
++        result = re.match('\d+\.\d+', fields[field_count])
++        if not result:
++            continue
++        hog_field = float(result.group(0))
++
++        count += 1
++
++        # top(1) displays sorted output in descending order so once a
++        # process falls below the threshold, there is no point
++        # looking any further.
++        if hog_field < threshold:
++            break
++
++        cmd = apport.hookutils.read_file('/proc/' + pid + '/cmdline').split('\0')[0]
++
++        if not cmd:
++            # kernel thread gone haywire?
++            continue
++
++        if cmd[0] != '/':
++            cmd = get_pid_path(pid)
++            if not cmd:
++                continue
++
++        pkg = apport.packaging.get_file_package(cmd)
++
++        hogs.append((cmd, pkg, hog_field))
++
++    return hogs
++
++
++def add_info(report):
++    # Consider this number of the top cpu-hogging processes.
++    max_hogs = 3
++
++    # Percentage threshold for cpu and memory hogs
++    cpu_threshold = 80
++    mem_threshold = 50
++
++    # Only consider bugs reports for now
++    if report.get('ProblemType', None) != 'Bug':
++        return
++
++    # We need to know the package the user is reporting the bug against
++    if not 'Package' in report:
++        return
++
++    reported_package = report.get('Package').split()[0]
++
++    check_for_hogs(report, reported_package, max_hogs, cpu_threshold, mem_threshold)
 === added file 'test/test_resource_hogs.py'
 --- test/test_resource_hogs.py	1970-01-01 00:00:00 +0000
 +++ test/test_resource_hogs.py	2013-12-13 22:41:36 +0000
@@ -0,0 +1,190 @@
++#!/usr/bin/python3
++
++import unittest
++import sys
++import time
++import os
++import tempfile
++from multiprocessing import Process, Pipe
++
++datadir = os.environ.get('APPORT_DATA_DIR', '/usr/share/apport')
++sys.path.insert(0, os.path.join(datadir, 'general-hooks'))
++
++import apport.ui
++import apport.report
++
++
++class TestSuiteUserInterface(apport.ui.UserInterface):
++    def __init__(self):
++        # use our dummy crashdb
++        self.crashdb_conf = tempfile.NamedTemporaryFile()
++        self.crashdb_conf.write(b'''default = 'testsuite'
++databases = {
++    'testsuite': {
++        'impl': 'memory',
++        'bug_pattern_url': None,
++    },
++    'debug': {
++        'impl': 'memory',
++        'distro': 'debug',
++    },
++}
++''')
++        self.crashdb_conf.flush()
++
++        os.environ['APPORT_CRASHDB_CONF'] = self.crashdb_conf.name
++
++        apport.ui.UserInterface.__init__(self)
++
++        self.crashdb = apport.crashdb_impl.memory.CrashDatabase(
++            None, {'dummy_data': 1, 'dupdb_url': ''})
++
++        # state of progress dialogs
++        self.ic_progress_active = False
++        self.ic_progress_pulses = 0  # count the pulses
++        self.upload_progress_active = False
++        self.upload_progress_pulses = 0
++
++        # these store the choices the ui_present_* calls do
++        self.present_package_error_response = None
++        self.present_kernel_error_response = None
++        self.present_details_response = None
++        self.question_yesno_response = None
++        self.question_choice_response = None
++        self.question_file_response = None
++
++        self.opened_url = None
++        self.present_details_shown = False
++
++        self.clear_msg()
++
++    def clear_msg(self):
++        # last message box
++        self.msg_title = None
++        self.msg_text = None
++        self.msg_severity = None  # 'warning' or 'error'
++        self.msg_choices = None
++
++    def ui_present_report_details(self, allowed_to_report=True, modal_for=None):
++        self.present_details_shown = True
++        return self.present_details_response
++
++    def ui_info_message(self, title, text):
++        self.msg_title = title
++        self.msg_text = text
++        self.msg_severity = 'info'
++
++    def ui_error_message(self, title, text):
++        self.msg_title = title
++        self.msg_text = text
++        self.msg_severity = 'error'
++
++    def ui_start_info_collection_progress(self):
++        self.ic_progress_pulses = 0
++        self.ic_progress_active = True
++
++    def ui_pulse_info_collection_progress(self):
++        assert self.ic_progress_active
++        self.ic_progress_pulses += 1
++
++    def ui_stop_info_collection_progress(self):
++        self.ic_progress_active = False
++
++    def ui_start_upload_progress(self):
++        self.upload_progress_pulses = 0
++        self.upload_progress_active = True
++
++    def ui_set_upload_progress(self, progress):
++        assert self.upload_progress_active
++        self.upload_progress_pulses += 1
++
++    def ui_stop_upload_progress(self):
++        self.upload_progress_active = False
++
++    def open_url(self, url):
++        self.opened_url = url
++
++    def ui_question_yesno(self, text):
++        self.msg_text = text
++        return self.question_yesno_response
++
++    def ui_question_choice(self, text, options, multiple):
++        self.msg_text = text
++        self.msg_choices = options
++        return self.question_choice_response
++
++    def ui_question_file(self, text):
++        self.msg_text = text
++        return self.question_file_response
++
++
++class T(unittest.TestCase):
++    def eat_memory(self, conn):
++        '''
++        Allocate lots of memory.
++        '''
++
++        Gb = 2 ** 30
++
++        # Evil: Keep allocating until we fail, then block,
++        # keeping the memory we already have allocated.
++        l = []
++        try:
++            while True:
++                l.append(' ' * Gb)
++        except MemoryError:
++            conn.recv()
++
++    def eat_cpu(self):
++        '''
++        Consume CPU by spinning in a tight loop.
++        '''
++        while True:
++            True
++
++    def test_memory_hog(self):
++        '''Ensure a memory hog is detected'''
++
++        parent_conn, child_conn = Pipe()
++
++        p = Process(target=self.eat_memory, args=(child_conn,))
++        p.start()
++
++        # wait until it's had a chance to get going
++        time.sleep(2)
++
++        self.ui = TestSuiteUserInterface()
++        self.ui.report = apport.Report('Bug')
++
++        # need "a" package to trigger the hook
++        self.ui.report['Package'] = 'foo'
++
++        self.ui.collect_info()
++        self.ui.report.add_hooks_info(self.ui)
++
++        self.assertTrue(set(['MemoryHogs']).issubset(set(self.ui.report.keys())))
++
++        p.terminate()
++
++    def test_cpu_hog(self):
++        p = Process(target=self.eat_cpu)
++        p.start()
++
++        # wait until it's had a chance to get going
++        time.sleep(2)
++
++        self.ui = TestSuiteUserInterface()
++        self.ui.report = apport.Report('Bug')
++
++        # need "a" package to trigger the hook
++        self.ui.report['Package'] = 'foo'
++
++        self.ui.collect_info()
++        self.ui.report.add_hooks_info(self.ui)
++
++        self.assertTrue(set(['CPUHogs']).issubset(set(self.ui.report.keys())))
++
++        p.terminate()
++
++
++unittest.main()
