Sextant

Merge lp:~ben-hutchings/ensoft-sextant/rel-merge into lp:ensoft-sextant

rel-merge
Merge into whiteline

Proposed by Ben Hutchings on 2014-11-25

Status:	Merged
Approved by:	Robert on 2014-12-01
Approved revision:	63
Merged at revision:	34
Proposed branch:	lp:~ben-hutchings/ensoft-sextant/rel-merge
Merge into:	lp:ensoft-sextant
Diff against target:	1428 lines (+510/-441), 11 files modified
	resources/sextant/web/interface.html (+10/-3)
	resources/sextant/web/queryjavascript.js (+9/-4)
	src/sextant/db_api.py (+247/-102)
	src/sextant/export.py (+9/-5)
	src/sextant/objdump_parser.py (+46/-12)
	src/sextant/test_db.py (+97/-0)
	src/sextant/test_db_api.py (+0/-275)
	src/sextant/test_parser.py (+21/-17)
	src/sextant/test_resources/parser_test.dump (+25/-0)
	src/sextant/update_db.py (+3/-3)
	src/sextant/web/server.py (+43/-20)
To merge this branch:	bzr merge lp:~ben-hutchings/ensoft-sextant/rel-merge

Reviewer	Review Type	Date Requested	Status
Robert		2014-11-25	Approve on 2014-12-01
Review via email: mp+242762@code.launchpad.net

This proposal supersedes a proposal from 2014-11-21.

Commit message

Adds a feature to Sextant that lets the user filter query results so that only the call graph within a particular file is shown.

Description of the change

For merge

Markups done.
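
For orientation, the internal/external labelling this branch introduces (the `is_internal` argument of `add_call` in the diff) can be sketched roughly as follows. This is only an illustration of the rule stated in the docstring, that a call is internal when caller and callee share a source file and neither source is 'unknown'. The helper name `is_internal_call` and the sample data are assumptions made for the example and are not part of the branch.

```python
def is_internal_call(caller_source, callee_source):
    """Return True when both functions come from the same known source file."""
    # A call is never internal if either source file is 'unknown'.
    if 'unknown' in (caller_source, callee_source):
        return False
    return caller_source == callee_source


# Hypothetical function-to-source mapping and call list, for illustration only.
sources = {'main': 'main.c', 'helper': 'main.c',
           'parse': 'parse.c', 'printf': 'unknown'}
calls = [('main', 'helper'), ('main', 'parse'), ('main', 'printf')]

# Each call gains a third element saying whether it stays within one file.
labelled = [(caller, callee, is_internal_call(sources[caller], sources[callee]))
            for caller, callee in calls]
```

Here only `main -> helper` would be labelled internal; the call into `parse.c` and the call to a function with an unknown source would both be treated as external.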

Robert (rjwills) on 2014-12-01:

review: Approve

Robert (rjwills) wrote on 2014-12-02:

Adds a feature to Sextant that allows the user to filter the results so that you only show the callgraph within a particular file.

Robert (rjwills) wrote on 2014-12-02:

> Adds a feature to Sextant that allows the user to filter the results so that
> you only show the callgraph within a particular file.

This comment was added in error; it was supposed to be the commit comment.

Preview Diff

Subscribers

People subscribed via source and target branches

to all changes:

Ben Hutchings

Ensoft Patch Lander

Patrick Stevens

 === modified file 'resources/sextant/web/interface.html'
 --- resources/sextant/web/interface.html	2014-11-13 10:47:34 +0000
 +++ resources/sextant/web/interface.html	2014-11-25 11:40:53 +0000
@@ -27,8 +27,8 @@
                      All functions calling specific function</option>
                  <option value="functions_called_by">
                      All functions called by a specific function</option>
--                <!--option value="all_call_paths"> REMOVED AS THIS IS SLOW FOR IOS
--                    All function call paths between two functions</option-->
++                <option value="all_call_paths">
++                    All function call paths between two functions</option>
                  <option value="shortest_call_path">
                      Shortest path between two functions</option>
                  <option value="function_names">
@@ -39,7 +39,14 @@
                  <input type="checkbox" id="suppress_common" value="True"></input>
                  Suppress common functions?
              </label>
--
++            <label>
++                <input type="checkbox" id="limit_internal" value="True"></input>
++                Limit to internal calls?
++            </label>
++            <label>
++                <input type="number" id="max_depth" value="1" min="0" max="10"></input>
++                Maximum call depth.
++            </label>
              <button class="button" style="float:right; margin: 1px 20px -1px 0;" onclick="execute_query()">Run Query</button>
          </div>
          <div id="toolbar-row2" style="margin-left: 234px;">
 === modified file 'resources/sextant/web/queryjavascript.js'
 --- resources/sextant/web/queryjavascript.js	2014-11-21 12:50:51 +0000
 +++ resources/sextant/web/queryjavascript.js	2014-11-25 11:40:53 +0000
@@ -157,16 +157,21 @@
      } else {
          //If not function names we will want a graph as an output;
          //url returns svg file of graph.
--                // We use a random number argument to prevent caching.
++        // We use a random number argument to prevent caching.
          var string = "/output_graph.svg?stop_cache=" + String(Math.random()) + "&program_name=" +
              document.getElementById("program_name").value +
--            "&query=" + query_id + "&func1=";
++            "&query=" + query_id + "&function_calling=";
          string = string + document.getElementById("function_1").value +
--            "&func2=" + document.getElementById("function_2").value;
++            "&function_called=" + document.getElementById("function_2").value;
          string = string + "&suppress_common=" +
              document.getElementById('suppress_common').checked.toString();
++        string = string + "&limit_internal=" +
++            document.getElementById('limit_internal').checked.toString();
++        string = string + "&max_depth=" +
++            document.getElementById('max_depth').value.toString();
++    }
++
--    }
      var xmlhttp = new XMLHttpRequest();
          xmlhttp.open("GET", string, true);
          xmlhttp.send();
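
Note for anyone exercising the new parameters outside the browser: the request built above can be approximated from Python as sketched below. The parameter names and the `/output_graph.svg` path are taken from the JavaScript in this diff; the function name `build_graph_url` and the program name `myprog` are assumptions for illustration. Unlike the page script, this sketch URL-encodes the values and omits the random `stop_cache` argument.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_graph_url(program_name, query, function_calling, function_called,
                    suppress_common=False, limit_internal=False, max_depth=1):
    """Build a query URL mirroring the one assembled in queryjavascript.js."""
    params = [
        ('program_name', program_name),
        ('query', query),
        ('function_calling', function_calling),
        ('function_called', function_called),
        # The page sends checkbox state as the strings 'true' / 'false'.
        ('suppress_common', str(suppress_common).lower()),
        ('limit_internal', str(limit_internal).lower()),
        ('max_depth', str(max_depth)),
    ]
    return '/output_graph.svg?' + urlencode(params)


# 'myprog' is a hypothetical program name.
url = build_graph_url('myprog', 'functions_called_by', 'main', '',
                      limit_internal=True, max_depth=3)
params = parse_qs(urlsplit(url).query, keep_blank_values=True)
```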
 === modified file 'src/sextant/db_api.py'
 --- src/sextant/db_api.py	2014-11-19 10:40:24 +0000
 +++ src/sextant/db_api.py	2014-11-25 11:40:53 +0000
@@ -159,28 +159,43 @@
          tmp_path = os.path.join(self._tmp_dir, '{}_{{}}'.format(program_name))
          self.func_writer = CSVWriter(tmp_path.format('funcs'),
--                                     headers=['name', 'type', 'file'],
++                                     headers=('name', 'type', 'file'),
                                       max_rows=5000)
          self.call_writer = CSVWriter(tmp_path.format('calls'),
--                                     headers=['caller', 'callee'],
++                                     headers=('caller', 'callee', 'is_internal'),
                                       max_rows=5000)
          # Define the queries we use to upload the functions and calls.
--        self.add_func_query = (' USING PERIODIC COMMIT 250'
++        self.add_func_query = (
++                 ' USING PERIODIC COMMIT 250'
                   ' LOAD CSV WITH HEADERS FROM "file:{}" AS line'
                   ' WITH line, toInt(line.id) as lineid'
                   ' MATCH (n:program {{name: "{}"}})'
                   ' CREATE (n)-[:subject]->(m:func {{name: line.name,'
                   ' id: lineid, type: line.type, file: line.file}})')
--        self.add_call_query = (' USING PERIODIC COMMIT 250'
--                 ' LOAD CSV WITH HEADERS FROM "file:{}" AS line'
--                 ' MATCH (p:program {{name: "{}"}})'
--                 ' MATCH (p)-[:subject]->(n:func {{name: line.caller}})'
--                 ' USING INDEX n:func(name)'
--                 ' MATCH (p)-[:subject]->(m:func {{name: line.callee}})'
--                 ' USING INDEX m:func(name)'
--                 ' CREATE (n)-[r:calls]->(m)')
++        self.add_internal_call_query = (
++                 ' USING PERIODIC COMMIT 250'
++                 ' LOAD CSV WITH HEADERS FROM "file:{}" AS line'
++                 ' WITH line WHERE line.is_internal = "True"'
++                 ' MATCH (p:program {{name: "{}"}})'
++                 ' MATCH (p)-[:subject]->(n:func {{name: line.caller}})'
++                 ' USING INDEX n:func(name)'
++                 ' MATCH (p)-[:subject]->(m:func {{name: line.callee}})'
++                 ' USING INDEX m:func(name)'
++                 ' CREATE (n)-[r:internal]->(m)')
++
++        self.add_external_call_query = (
++                 ' USING PERIODIC COMMIT 250'
++                 ' LOAD CSV WITH HEADERS FROM "file:{}" AS line'
++                 ' WITH line WHERE line.is_internal <> "True"'
++                 ' MATCH (p:program {{name: "{}"}})'
++                 ' MATCH (p)-[:subject]->(n:func {{name: line.caller}})'
++                 ' USING INDEX n:func(name)'
++                 ' MATCH (p)-[:subject]->(m:func {{name: line.callee}})'
++                 ' USING INDEX m:func(name)'
++                 ' CREATE (n)-[r:external]->(m)')
++
          self.add_program_query = ('CREATE (p:program {{name: "{}", uploader: "{}", '
                  ' uploader_id: "{}", date: "{}",'
@@ -221,7 +236,7 @@
          """
          self.func_writer.write(name, typ, source)
--    def add_call(self, caller, callee):
++    def add_call(self, caller, callee, is_internal=False):
          """
          Add a function call.
@@ -230,8 +245,11 @@
                  The name of the function making the call.
              callee:
                  The name of the function called.
++            is_internal:
++                True if the caller's source file is the same as callee's,
++                unless either one is 'unknown'.
          """
--        self.call_writer.write(caller, callee)
++        self.call_writer.write(caller, callee, is_internal)
      def _copy_local_to_remote_tmp_dir(self):
@@ -257,7 +275,6 @@
              remote_paths:
                  A list of the paths of the remote files.
          """
--
          def try_rmdir(path):
              # Helper function to try and remove a directory, silently
              # fail if it contains files, otherwise raise the exception.
@@ -270,6 +287,7 @@
                  else:
                      raise e
++
          print('Cleaning temporary files...', end='')
          file_paths = list(itertools.chain(self.func_writer.file_iter(),
                                            self.call_writer.file_iter()))
@@ -279,7 +297,7 @@
          try_rmdir(self._tmp_dir)
          try_rmdir(TMP_DIR)
--
++
          self._ssh.remove_from_tmp_dir(remote_paths)
          print('done.')
@@ -335,10 +353,12 @@
                                                      func_count, call_count))
              tx.commit()
--            # Create the functions.
--            for files, query, descr in zip((remote_funcs, remote_calls),
--                                           (self.add_func_query, self.add_call_query),
--                                           ('funcs', 'calls')):
++            # Add the functions and internal and external calls to the database.
++            fqds = ((remote_funcs, self.add_func_query, 'functions'),
++                    (remote_calls, self.add_internal_call_query, 'internal calls'),
++                    (remote_calls, self.add_external_call_query, 'external calls'))
++
++            for files, query, descr in fqds:
                  start = time()
                  for i, path in enumerate(files):
                      completed = int(100*float(i+1)/len(files))
@@ -377,7 +397,7 @@
          Loop over all functions: increment the called-by count of their callees.
          """
          for func in self.functions:
--            for called in func.functions_i_call:
++            for called, is_internal in func.functions_i_call:
                  called.number_calling_me += 1
      def _rest_node_output_to_graph(self, rest_output):
@@ -442,8 +462,8 @@
                                                              for_query=True)
              for index in result:
                  q = ("START n=node({})"
--                     "MATCH n-[calls:calls]->(m)"
--                     "RETURN n.name, m.name").format(result[index][2])
++                     "MATCH n-[r]->(m)"
++                     "RETURN n.name, m.name, type(r) = 'internal'").format(result[index][2])
                  new_tx.append(q)
              logging.debug('exec')
@@ -454,13 +474,17 @@
              for call_list in results:
                  if call_list:
--                    # call_list has element 0 being an arbitrary call this
--                    # function makes; element 0 of that call is the name of the
--                    # function itself. Think {{'orig', 'b'}, {'orig', 'c'}}.
--                    orig = call_list[0][0]
--                    # result['orig'] is [<Function>, ('callee1','callee2')]
--                    result[orig][1] |= set(list(zip(*call_list.elements))[1])
--                    # recall: set union is denoted by |
++                    elements = call_list.elements
++                    # elements is a list of lists of form:
++                    # [[caller1, callee1, is_internal],
++                    #  [caller1, callee2, is_internal],
++                    #  ...]
++
++                    caller = elements[0][0]
++                    callees, is_internals = list(zip(*elements))[1:]
++
++                    # result[caller] is [<Function>, <set of callee, is_internal tuples>]
++                    result[caller][1] |= set(zip(callees, is_internals))
          else:
              # We don't have a parent database connection.
@@ -478,8 +502,8 @@
          named_function = lambda name: result[name][0] if name in result else None
          for function, calls, node_id in result.values():
--            what_i_call = [named_function(name)
--                           for name in calls
++            what_i_call = [(named_function(name), is_internal)
++                           for name, is_internal in calls
                             if named_function(name) is not None]
              function.functions_i_call = what_i_call
@@ -707,7 +731,7 @@
          func_count, call_count = tx.commit()[0].elements[0]
          del_call_query = ('OPTIONAL MATCH (p:program {{name: "{}"}})'
--                          '-[:subject]->(f:func)-[c:calls]->()'
++                          '-[:subject]->(f:func)-[c]->()'
                            ' WITH c LIMIT 5000 DELETE c RETURN count(distinct(c))'
                            .format(program_name))
@@ -817,40 +841,32 @@
          result = self._db.query(q, returns=neo4jrestclient.Node)
          return bool(result)
--    def check_function_exists(self, program_name, function_name):
--        """
--        Execute query to check whether a function with the given name exists.
--        We only check for functions which are children of a program with the
--          given program_name.
--        :param program_name: string name of the program within which to check
--        :param function_name: string name of the function to check for existence
--        :return: bool(names validate correctly, and function exists in program)
--        """
--        if not validate_query(program_name):
--            return False
--
--        pmatch = '(:program {{name: "{}"}})'.format(program_name)
--        fmatch = '(f:func {{name: "{}"}})'.format(function_name)
--        # be explicit about index usage
--        q = (' MATCH {}-[:subject]->{} USING INDEX f:func(name)'
--             ' RETURN f LIMIT 1'.format(pmatch, fmatch))
--
--        # result will be an empty list if the function was not found
--        result = self._db.query(q, returns=neo4jrestclient.Node)
--        return bool(result)
--
      def get_function_names(self, program_name, search=None, max_funcs=None):
          """
          Execute query to retrieve a list of all functions in the program.
          Any of the output names can be used verbatim in any SextantConnection
            method which requires a function-name input.
--        :param program_name: name of the program whose functions to retrieve
--        :return: None if program_name doesn't exist in the remote database,
--          a set of function-name strings otherwise.
++
++        Return:
++            None if program_name doesn't exist in the remote database,
++            a set of function-name strings otherwise.
++        Arguments:
++            program_name:
++                The name of the program to query.
++
++            search:
++                A string of the form <name_match>:<file_match>, where at least
++                one of name_match and file_match is provided, and each may be a
++                comma separated list of strings containing wildcard '.*'
++                sequences.
++
++            max_funcs:
++                An integer limiting the number of functions returned by this
++                method.
          """
          if not validate_query(program_name):
--            return set()
++            return None
          limit = "LIMIT {}".format(max_funcs) if max_funcs else ""
@@ -859,8 +875,8 @@
                   ' RETURN f.name {}').format(program_name, limit)
          else:
              q = (' MATCH (:program {{name: "{}"}})-[:subject]->(f:func)'
--                 ' WHERE f.name =~ ".*{}.*" RETURN f.name {}'
--                 .format(program_name, search, limit))
++                 ' {} RETURN f.name {}'
++                 .format(program_name, self.get_query('f', search), limit))
          return {func[0] for func in self._db.query(q)}
      @staticmethod
@@ -888,6 +904,7 @@
                  etc.
          """
++
          if ':' in search:
              func_subs, file_subs = search.split(':')
          else:
@@ -928,49 +945,134 @@
          return query_str
--    def get_all_functions_called(self, program_name, function_calling):
--        """
--        Execute query to find all functions called by a function (indirectly).
--        If the given function is not present in the program, returns None;
--          likewise if the program_name does not exist.
--        :param program_name: a string name of the program we wish to query under
--        :param function_calling: string name of a function whose children to find
--        :return: FunctionQueryResult, maximal subgraph rooted at function_calling
--        """
++    def get_all_functions_called(self, program_name, function_calling,
++                                 limit_internal=False, max_depth=0):
++        """
++        Return the subtrees of the callgraph rooted at the specified functions.
++
++        Optionally limit to a maximum call depth, or to only internal (same
++        source file) calls. In the case of the latter, also include a single
++        extra hop into external functions.
++
++        If the function has no calls, it will be returned alone.
++
++        Return:
++            None if program_name doesn't exist in the remote database,
++            a FunctionQueryResult containing the nodes and relationships
++            otherwise.
++        Arguments:
++            program_name:
++                The name of the program to query.
++
++            function_calling:
++                A string of the form <name_match>:<file_match>, where at least
++                one of name_match and file_match is provided, and each may be a
++                comma separated list of strings containing wildcard '.*'
++                sequences. Specifies the list of subtree roots.
++
++            limit_internal:
++                If true, only explore internal calls, but also add one extra
++                level (above max_depth) along external calls.
++
++            max_depth:
++                An integer which will limit the depth
++                of the subtrees. '0' corresponds to unlimited depth.
++        """
++
++        if not validate_query(program_name):
++            return None
++
          q = (' MATCH (p:program {{name: "{}"}})-[:subject]->(f:func) {}'
--             ' MATCH (f)-[:calls]->(g:func) RETURN distinct f, g'
--             .format(program_name, SextantConnection.get_query('f', function_calling)))
++             ' MATCH (f)-[{}*0..{}]->(g:func)'
++             .format(program_name, SextantConnection.get_query('f', function_calling),
++                     ':internal' if limit_internal else '',
++                     max_depth or ''))
++
++        if limit_internal:
++            q += (' WITH f, g MATCH (g)-[:external*0..1]->(h)'
++                  ' RETURN distinct f, h')
++        else:
++            q += ' RETURN distinct f, g'
          return self._execute_query(program_name, q)
--    def get_all_functions_calling(self, program_name, function_called):
--        """
--        Execute query to find all functions which call a function (indirectly).
--        If the given function is not present in the program, returns None;
--          likewise if the program_name does not exist.
--        :param program_name: a string name of the program we wish to query
--        :param function_called: string name of a function whose parents to find
--        :return: FunctionQueryResult, maximal connected subgraph with leaf function_called
--        """
++    def get_all_functions_calling(self, program_name, function_called,
++                                  limit_internal=False, max_depth=1):
++        """
++        Return functions calling the specified functions.
++
++        Optionally limit to a maximum call depth, or to only internal (same
++        source file) calls.
++
++        If the function is not called, return it alone.
++
++        Return:
++            None if program_name doesn't exist in the remote database,
++            a FunctionQueryResult containing the nodes and relationships
++            otherwise.
++        Arguments:
++            program_name:
++                The name of the program to query.
++
++            function_called:
++                A string of the form <name_match>:<file_match>, where at least
++                one of name_match and file_match is provided, and each may be a
++                comma separated list of strings containing wildcard '.*'
++                sequences. Specifies the list of functions to match.
++
++            limit_internal:
++                If true, only explore internal calls.
++
++            max_depth:
++                An integer which will limit the depth
++                of the subtrees. '0' corresponds to unlimited depth.
++        """
++
++        if not validate_query(program_name):
++            return None
          q = (' MATCH (p:program {{name: "{}"}})-[:subject]->(g:func) {}'
--             ' MATCH (f)-[:calls]->(g)'
++             ' MATCH (f)-[{}*0..{}]->(g)'
               ' RETURN distinct f, g')
--        q = q.format(program_name, SextantConnection.get_query('g', function_called), program_name)
++        q = q.format(program_name, SextantConnection.get_query('g', function_called),
++                     ':internal' if limit_internal else ':internal|external',
++                     max_depth or '')
          return self._execute_query(program_name, q)
--    def get_call_paths(self, program_name, function_calling, function_called):
--        """
--        Execute query to find all possible routes between two specific nodes.
--        If the given functions are not present in the program, returns None;
--          ditto if the program_name does not exist.
--        :param program_name: string program name
--        :param function_calling: string
--        :param function_called: string
--        :return: FunctionQueryResult, the union of all subgraphs reachable by
--          adding a source at function_calling and a sink at function_called.
--        """
++    def get_call_paths(self, program_name, function_calling, function_called,
++                       limit_internal=False, max_depth=1):
++        """
++        Return all call paths between the sets of functions specified by the
++        search strings function_calling and function_called.
++
++        Optionally limit to a maximum call depth, or to only internal (same
++        source file) calls.
++
++        Return:
++            None if program_name doesn't exist in the remote database,
++            a FunctionQueryResult object containing the nodes and relationships
++            otherwise.
++        Arguments:
++            program_name:
++                The name of the program to query.
++
++            function_calling, function_called:
++                A string of the form <name_match>:<file_match>, where at least
++                one of name_match and file_match is provided, and each may be a
++                comma separated list of strings containing wildcard '.*'
++                sequences. Specifies the list of functions to match.
++
++            limit_internal:
++                If true, only explore internal calls.
++
++            max_depth:
++                An integer which will limit the depth
++                of the subtrees. '0' corresponds to unlimited depth.
++        """
++
++        if not validate_query(program_name):
++            return None
          if not self.check_program_exists(program_name):
              return None
@@ -981,18 +1083,26 @@
          q = (' MATCH (p:program {{name: "{}"}})'
               ' MATCH (p)-[:subject]->(start:func) {} WITH start, p'
               ' MATCH (p)-[:subject]->(end:func) {} WITH start, end'
--             ' MATCH path=(start)-[:calls*]->(end)'
++             ' MATCH path=(start)-[{}*0..{}]->(end)'
               ' WITH DISTINCT nodes(path) AS result'
               ' UNWIND result AS answer'
               ' RETURN answer')
--        q = q.format(program_name, start_q, end_q)
++        q = q.format(program_name, start_q, end_q,
++                     ':internal' if limit_internal else '',
++                     max_depth or '')
          return self._execute_query(program_name, q)
      def get_whole_program(self, program_name):
--        """Execute query to find the entire program with a given name.
--        If the program is not present in the remote database, returns None.
--        :param: program_name: a string name of the program we wish to return.
--        :return: a FunctionQueryResult consisting of the program graph.
++        """
++        Return the full call graph of the program.
++
++        Return:
++            None if program_name doesn't exist in the remote database,
++            a FunctionQueryResult object containing all of the program nodes
++            and calls otherwise.
++        Arguments:
++            program_name:
++                The name of the program to query.
          """
          if not self.check_program_exists(program_name):
@@ -1002,7 +1112,9 @@
               ' RETURN (f)'.format(program_name))
          return self._execute_query(program_name, q)
--    def get_shortest_path_between_functions(self, program_name, function_calling, function_called):
++    def get_shortest_path_between_functions(self, program_name,
++                                            function_calling, function_called,
++                                            limit_internal=False, max_depth=0):
          """
          Execute query to get a single, shortest, path between two functions.
          :param program_name: string name of the program we wish to search under
@@ -1010,6 +1122,37 @@
          :param func2: the name of the function at which to terminate the path
          :return: FunctionQueryResult shortest path between func1 and func2.
          """
++        """
++        Return all shortest paths between the sets of functions specified.
++
++        Optionally limit to a maximum call depth, or to only internal (same
++        source file) calls.
++
++        Return:
++            None if program_name doesn't exist in the remote database,
++            a FunctionQueryResult object containing the nodes and relationships
++            otherwise.
++        Arguments:
++            program_name:
++                The name of the program to query.
++
++            function_calling, function_called:
++                A string of the form <name_match>:<file_match>, where at least
++                one of name_match and file_match is provided, and each may be a
++                comma separated list of strings containing wildcard '.*'
++                sequences. Specifies the list of functions to match.
++
++            limit_internal:
++                If true, only explore internal calls.
++
++            max_depth:
++                A STRING containing an integer which will limit the depth
++                of the subtrees. '0' corresponds to unlimited depth.
++        """
++
          if not self.check_program_exists(program_name):
              return None
@@ -1019,10 +1162,12 @@
          q = (' MATCH (p:program {{name: "{}"}})'
               ' MATCH (p)-[:subject]->(start:func) {} WITH start, p'
               ' MATCH (p)-[:subject]->(end:func) {} WITH start, end'
--             ' MATCH path=shortestPath((start)-[:calls*]->(end))'
++             ' MATCH path=allShortestPaths((start)-[{}*..{}]->(end))'
               ' UNWIND nodes(path) AS answer'
               ' RETURN answer')
--        q = q.format(program_name, start_q, end_q)
++        q = q.format(program_name, start_q, end_q,
++                     ':internal' if limit_internal else '',
++                     max_depth or '')
          return self._execute_query(program_name, q)
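
Inline note on the queries above: each one now fills a variable-length relationship pattern of the shape `[{}*0..{}]` from `limit_internal` and `max_depth`. The sketch below isolates that formatting step so the resulting Cypher fragment is easy to see. The helper name `relationship_pattern` is an assumption for illustration; the branch formats the pattern inline in each query.

```python
def relationship_pattern(limit_internal=False, max_depth=0):
    """Return the relationship part of a Cypher MATCH, as built in db_api.py."""
    # Restrict to :internal relationships only when asked to.
    rel_type = ':internal' if limit_internal else ''
    # A max_depth of 0 is falsy, so the upper bound is left open (unlimited).
    depth = max_depth or ''
    return '[{}*0..{}]'.format(rel_type, depth)


bounded = relationship_pattern(limit_internal=True, max_depth=3)
unbounded = relationship_pattern()
```

With these inputs `bounded` is `[:internal*0..3]` and `unbounded` is `[*0..]`, which is why '0' is documented as meaning unlimited depth.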
 === modified file 'src/sextant/export.py'
 --- src/sextant/export.py	2014-11-19 10:32:34 +0000
 +++ src/sextant/export.py	2014-11-25 11:40:53 +0000
@@ -71,9 +71,10 @@
                      if func_called == func:
                          functions_called.remove(func_called)
--            for func_called in func.functions_i_call:
++            for func_called, is_internal in func.functions_i_call:
                  if not (suppress_common_nodes and func_called.is_common):
--                    output_str += ' "{}" -> "{}"\n'.format(func.name, func_called.name)
++                    color = 'black' if is_internal else 'red'
++                    output_str += ' "{}" -> "{}" [color={}]\n'.format(func.name, func_called.name, color)
          output_str += '}'
          return output_str
@@ -140,18 +141,21 @@
                                  </node>\n""".format(func.name, 20, len(display_func)*8, colour, display_func)
              functions_called = func.functions_i_call
              if remove_self_calls is True:
                  #remove calls where a function calls itself
--                for func_called in functions_called:
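++                # Iterate over a copy so entries can be removed while looping.
++                for func_called, is_internal in list(functions_called):
                      if func_called == func:
--                        functions_called.remove(func_called)
++                        functions_called.remove((func_called, is_internal))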
--            for callee in functions_called:
++            for callee, is_internal in functions_called:
++                color = "#000000" if is_internal else "#ff0000"
                  if callee not in commonly_called:
                      if not(suppress_common_nodes and callee.is_common):
                              output_str += """<edge source="{}" target="{}"> <data key="d9">
                                          <y:PolyLineEdge>
--                                        <y:LineStyle color="#000000" type="line" width="1.0"/>
++                                        <y:LineStyle color="#ff0000" type="line" width="1.0"/>
                                          <y:Arrows source="none" target="standard"/>
                                          <y:BendStyle smoothed="false"/>
                                          </y:PolyLineEdge>
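
Inline note on the DOT export in the first hunk of this file: internal calls are drawn in black and external calls in red. A minimal standalone sketch of that edge formatting, assuming a hypothetical helper named `dot_edge` (the branch builds the string inline), would be:

```python
def dot_edge(caller, callee, is_internal):
    """Format one DOT edge, coloured by whether the call is internal."""
    color = 'black' if is_internal else 'red'
    return ' "{}" -> "{}" [color={}]\n'.format(caller, callee, color)


# Function names here are placeholders for illustration.
internal_edge = dot_edge('main', 'helper', True)
external_edge = dot_edge('main', 'printf', False)
```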
 === modified file 'src/sextant/objdump_parser.py'
 --- src/sextant/objdump_parser.py	2014-11-21 15:19:15 +0000
 +++ src/sextant/objdump_parser.py	2014-11-25 11:40:53 +0000
@@ -43,11 +43,15 @@
          function_ptr_count:
              The number of function pointers that have been detected.
          _known_functions:
--            A set of the names of functions that have been
--            parsed - used to avoid registering a function multiple times.
++            A dict mapping the names of parsed functions to their source
++            files - used to avoid registering a function multiple times
++            and to label calls as internal/external.
          _partial_functions:
              A set of functions whose names we have seen but whose source
              files we don't yet know.
++        _partial_calls:
++            A set of the (caller, callee) tuples representing calls between
++            a _partial_function and another function.
      """
      def __init__(self, file_path, file_object=None,
@@ -106,16 +110,18 @@
          self.function_ptr_count = 0
          # Avoid adding duplicate functions.
--        self._known_functions = set()
++        self._known_functions = dict()
++        self._known_calls = set()
          # Set of partially-parsed functions.
          self._partial_functions = set()
++        self._partial_calls = set()
          # By default print information to stdout.
          def print_func(name, typ, source='unknown'):
--            print('func {:25}{:15}{}'.format(name, typ, source))
++            print('func {:25} {:15}{}'.format(name, typ, source))
--        def print_call(caller, callee):
--            print('call {:25}{:25}'.format(caller, callee))
++        def print_call(caller, callee, is_internal):
++            print('call {:25} {:25}'.format(caller, callee))
          def print_started(parser):
              print('parse started: {}[{}]'.format(self.path, ', '.join(self.sections)))
@@ -148,6 +154,7 @@
                  self._partial_functions.add(name)
          elif source == 'unknown':
              # Manually adding a stub function.
++            self._known_functions[name] = source
              self.add_function(name, 'stub', source)
              self.function_count += 1
          elif name not in self._known_functions:
@@ -160,7 +167,7 @@
              except KeyError:
                  pass
--            self._known_functions.add(name)
++            self._known_functions[name] = source
              self.add_function(name, 'normal', source)
              self.function_count += 1
@@ -168,15 +175,33 @@
          """
          Add a function pointer.
          """
--        self.add_function(name, 'pointer')
++        self.add_function(name, 'pointer', 'unknown')
          self.function_count += 1
--    def _add_call(self, caller, callee):
++    def _add_call(self, caller, callee, force=False):
          """
          Add a function call from caller to callee.
          """
--        self.add_call(caller, callee)
--        self.call_count += 1
++        if (caller, callee) in self._known_calls:
++            return
++
++        try:
++            files = self._known_functions
++            is_internal = (callee.startswith('func_ptr_')
++                            or (files[caller] != 'unknown'
++                                and files[caller] == files[callee]))
++            self.add_call(caller, callee, is_internal)
++            self._known_calls.add((caller, callee))
++            self.call_count += 1
++        except KeyError:
++            if force:
++                self._add_function(callee, 'unknown')
++                self.add_call(caller, callee, False)
++                self._known_calls.add((caller, callee))
++                self.call_count += 1
++                print(caller, callee)
++            else:
++                self._partial_calls.add((caller, callee))
      def parse(self):
          """
@@ -193,6 +218,10 @@
                  if to_add:
                      file_line = line.startswith('/')
                      source = line.split(':')[0] if file_line else None
++                    if source:
++                        # Prune out the relative parts of the filepath.
++                        source = source.rsplit('./', 1)[-1]
++
                      self._add_function(current_function, source)
                      to_add = False
@@ -227,6 +256,10 @@
                              # Flag function - we look for source on the next line.
                              to_add = True
++                        # If we have come to a new current_function, then we can
++                        # forget about the calls we knew about from the last.
++                        self._known_calls = set()
++
                      elif 'call ' in line or 'callq ' in line:
                          # WHITESPACE to prevent picking up function names
                          # containing 'call'
@@ -268,7 +301,8 @@
              for name in self._partial_functions:
                  self._add_function(name, 'unknown')
--
++            for call in sorted(self._partial_calls, key=lambda el: el[0]):
++                self._add_call(*call, force=True)
              self.finished()
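
Review note: the internal/external labelling that `_add_call` performs can be looked at in isolation. The following is a minimal sketch, not the parser's actual code: `classify_call` and the sample `files` dictionary are hypothetical names used only for illustration. It assumes, as in the diff above, that the dictionary maps each function name to its source file and uses 'unknown' for stubs.

```python
def classify_call(files, caller, callee):
    """Return True when a call should be labelled internal.

    `files` maps function name -> source path ('unknown' if not known).
    A KeyError escapes when a function has not been registered yet,
    which is the case the parser defers through _partial_calls.
    """
    # Calls through function pointers are always treated as internal.
    if callee.startswith('func_ptr_'):
        return True
    # Otherwise both ends must share a known source file.
    return files[caller] != 'unknown' and files[caller] == files[callee]


# Hypothetical sample data, loosely modelled on the parser_test fixture.
files = {
    'main': 'parser_test.c',
    'normal': 'parser_test.c',
    'printf': 'unknown',
}
```

With this sample data, `classify_call(files, 'main', 'normal')` is internal, while `classify_call(files, 'normal', 'printf')` is external because the stub's source is 'unknown'.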
 === added file 'src/sextant/test_db.py'
 --- src/sextant/test_db.py	1970-01-01 00:00:00 +0000
 +++ src/sextant/test_db.py	2014-11-25 11:40:53 +0000
@@ -0,0 +1,97 @@
++#!/usr/bin/python
++# -----------------------------------------
++# Sextant
++# Copyright 2014, Ensoft Ltd.
++# Author: Patrick Stevens, James Harkin
++# -----------------------------------------
++#Testing module
++
++import unittest
++
++import db_api
++import update_db
++
++PNAME = 'tester-parser_test'
++NORMAL = {'main', 'normal', 'wierd$name', 'duplicates'}
++
++
++class TestFunctionQueryResults(unittest.TestCase):
++    @classmethod
++    def setUpClass(cls):
++        # we need to set up the remote database by using the neo4j_input_api
++        cls.remote_url = 'http://ensoft-sandbox:7474'
++        cls.connection = db_api.SextantConnection('ensoft-sandbox', 7474)
++
++        # Upload the test fixture so the queries below have data to run against.
++        update_db.upload_program(cls.connection, 'tester', 'test_resources/parser_test',
++                                 program_name=None, not_object_file=False, add_file_paths=True)
++
++    @classmethod
++    def tearDownClass(cls):
++        cls.connection.delete_program('tester-parser_test')
++        cls.connection.close()
++
++    def test_get_function_names(self):
++        get_names = self.connection.get_function_names
++        # Test getting all names
++        names = get_names(PNAME)
++        # Test file wildcard search
++        parser_names = get_names(PNAME, search=':.*parser_test.c')
++
++        self.assertTrue(names.issuperset(NORMAL))
++        self.assertEquals(len(names), 24)
++        self.assertEquals(parser_names, {u'main', u'normal', u'duplicates', u'wierd$name'})
++
++        # Test the wildcard matching
++        search = self.connection.get_function_names(PNAME, search='.*libc.*')
++        search_exp = {u'__libc_csu_init', u'__libc_csu_fini', u'__libc_start_main'}
++
++        self.assertEquals(search, search_exp)
++
++        # Test the limiting
++        too_many = self.connection.get_function_names(PNAME, max_funcs=3)
++        self.assertEquals(len(too_many), 3)
++
++        # Test empty for non-existent program
++        self.assertFalse(self.connection.get_function_names('blah blah blah'))
++
++    def test_get_all_functions_called(self):
++        get_fns = self.connection.get_all_functions_called
++
++        for depth, num in zip([0, 1, 2, 3], [8, 3, 8, 8]):
++            result = get_fns(PNAME, 'main', False, depth).functions
++            self.assertEquals(len(result), num, str(result))
++
++        for depth, num in zip([0, 1, 2, 3], [8, 4, 8, 8]):
++            # Limit to internal functions
++            # TODO this isn't a great test - need greater call depth
++            result = get_fns(PNAME, 'main', True, depth).functions
++            self.assertEquals(len(result), num)
++
++    def test_get_all_functions_calling(self):
++        get_fns = self.connection.get_all_functions_calling
++
++        for depth, num in zip(range(4), [4, 3, 4, 4]):
++            result = get_fns(PNAME, 'printf', limit_internal=False, max_depth=depth).functions
++            self.assertEquals(len(result), num)
++
++        for depth, num in zip(range(4), [1, 1, 1, 1]):
++            result = get_fns(PNAME, 'printf', limit_internal=True, max_depth=depth).functions
++            self.assertEquals(len(result), num)
++
++    def test_get_all_paths_between(self):
++        get_paths = self.connection.get_call_paths
++
++        result = {f.name for f in get_paths(PNAME, 'main', 'wierd$name', True, 0).functions}
++        exp = {'main', 'normal', 'duplicates', 'wierd$name'}
++        self.assertEquals(result, exp)
++
++    def test_get_shortest_paths_between(self):
++        get_paths = self.connection.get_shortest_path_between_functions
++
++        result = {f.name for f in get_paths(PNAME, 'main', 'wierd$name', True, 0).functions}
++        exp = {u'main', u'normal', u'wierd$name'}
++        self.assertEquals(result, exp)
++
++if __name__ == '__main__':
++    unittest.main()
 === removed file 'src/sextant/test_db_api.py'
 --- src/sextant/test_db_api.py	2014-10-16 15:26:47 +0000
 +++ src/sextant/test_db_api.py	1970-01-01 00:00:00 +0000
@@ -1,275 +0,0 @@
--#!/usr/bin/python
--# -----------------------------------------
--# Sextant
--# Copyright 2014, Ensoft Ltd.
--# Author: Patrick Stevens, James Harkin
--# -----------------------------------------
--#Testing module
--
--import unittest
--
--from db_api import Function
--from db_api import FunctionQueryResult
--from db_api import SextantConnection
--from db_api import validate_query
--
--
--class TestFunctionQueryResults(unittest.TestCase):
--    @classmethod
--    def setUpClass(cls):
--        # we need to set up the remote database by using the neo4j_input_api
--        cls.remote_url = 'http://ensoft-sandbox:7474'
--
--        cls.setter_connection = SextantConnection('ensoft-sandbox', 7474)
--
--        cls.program_1_name = 'testprogram'
--        cls.one_node_program_name = 'testprogram1'
--        cls.empty_program_name = 'testprogramblank'
--
--        # if anything failed before, delete programs now
--        cls.setter_connection.delete_program(cls.program_1_name)
--        cls.setter_connection.delete_program(cls.one_node_program_name)
--        cls.setter_connection.delete_program(cls.empty_program_name)
--
--
--        cls.upload_program = cls.setter_connection.new_program(cls.program_1_name)
--        cls.upload_program.add_function('func1')
--        cls.upload_program.add_function('func2')
--        cls.upload_program.add_function('func3')
--        cls.upload_program.add_function('func4')
--        cls.upload_program.add_function('func5')
--        cls.upload_program.add_function('func6')
--        cls.upload_program.add_function('func7')
--        cls.upload_program.add_call('func1', 'func2')
--        cls.upload_program.add_call('func1', 'func4')
--        cls.upload_program.add_call('func2', 'func1')
--        cls.upload_program.add_call('func2', 'func4')
--        cls.upload_program.add_call('func3', 'func5')
--        cls.upload_program.add_call('func4', 'func4')
--        cls.upload_program.add_call('func4', 'func5')
--        cls.upload_program.add_call('func5', 'func1')
--        cls.upload_program.add_call('func5', 'func2')
--        cls.upload_program.add_call('func5', 'func3')
--        cls.upload_program.add_call('func6', 'func7')
--
--        cls.upload_program.commit()
--
--        cls.upload_one_node_program = cls.setter_connection.new_program(cls.one_node_program_name)
--        cls.upload_one_node_program.add_function('lonefunc')
--
--        cls.upload_one_node_program.commit()
--
--        cls.upload_empty_program = cls.setter_connection.new_program(cls.empty_program_name)
--
--        cls.upload_empty_program.commit()
--
--        cls.getter_connection = cls.setter_connection
--
--
--    @classmethod
--    def tearDownClass(cls):
--        cls.setter_connection.delete_program(cls.upload_program.program_name)
--        cls.setter_connection.delete_program(cls.upload_one_node_program.program_name)
--        cls.setter_connection.delete_program(cls.upload_empty_program.program_name)
--
--        cls.setter_connection.close()
--        del(cls.setter_connection)
--
--    def test_17_get_call_paths(self):
--        reference1 = FunctionQueryResult(parent_db=None, program_name=self.program_1_name)
--        reference1.functions = [Function(self.program_1_name, 'func1'), Function(self.program_1_name, 'func2'),
--                                Function(self.program_1_name, 'func3'),
--                                Function(self.program_1_name, 'func4'), Function(self.program_1_name, 'func5')]
--        reference1.functions[0].functions_i_call = reference1.functions[1:4:2]  # func1 calls func2, func4
--        reference1.functions[1].functions_i_call = reference1.functions[0:4:3]  # func2 calls func1, func4
--        reference1.functions[2].functions_i_call = [reference1.functions[4]]  # func3 calls func5
--        reference1.functions[3].functions_i_call = reference1.functions[3:5]  # func4 calls func4, func5
--        reference1.functions[4].functions_i_call = reference1.functions[0:3]  # func5 calls func1, func2, func3
--        self.assertEquals(reference1, self.getter_connection.get_call_paths(self.program_1_name, 'func1', 'func2'))
--        self.assertIsNone(self.getter_connection.get_call_paths('not a prog', 'func1', 'func2')) # shouldn't validation
--        self.assertIsNone(self.getter_connection.get_call_paths('notaprogram', 'func1', 'func2'))
--        self.assertIsNone(self.getter_connection.get_call_paths(self.program_1_name, 'notafunc', 'func2'))
--        self.assertIsNone(self.getter_connection.get_call_paths(self.program_1_name, 'func1', 'notafunc'))
--
--    def test_02_get_whole_program(self):
--        reference = FunctionQueryResult(parent_db=None, program_name=self.program_1_name)
--        reference.functions = [Function(self.program_1_name, 'func1'), Function(self.program_1_name, 'func2'),
--                               Function(self.program_1_name, 'func3'),
--                               Function(self.program_1_name, 'func4'), Function(self.program_1_name, 'func5'),
--                               Function(self.program_1_name, 'func6'), Function(self.program_1_name, 'func7')]
--        reference.functions[0].functions_i_call = reference.functions[1:4:2]  # func1 calls func2, func4
--        reference.functions[1].functions_i_call = reference.functions[0:4:3]  # func2 calls func1, func4
--        reference.functions[2].functions_i_call = [reference.functions[4]]  # func3 calls func5
--        reference.functions[3].functions_i_call = reference.functions[3:5]  # func4 calls func4, func5
--        reference.functions[4].functions_i_call = reference.functions[0:3]  # func5 calls func1, func2, func3
--        reference.functions[5].functions_i_call = [reference.functions[6]]  # func6 calls func7
--
--
--        self.assertEqual(reference, self.getter_connection.get_whole_program(self.program_1_name))
--        self.assertIsNone(self.getter_connection.get_whole_program('nottherightprogramname'))
--
--    def test_03_get_whole_one_node_program(self):
--        reference = FunctionQueryResult(parent_db=None, program_name=self.one_node_program_name)
--        reference.functions = [Function(self.one_node_program_name, 'lonefunc')]
--
--        self.assertEqual(reference, self.getter_connection.get_whole_program(self.one_node_program_name))
--
--    def test_04_get_whole_empty_program(self):
--        reference = FunctionQueryResult(parent_db=None, program_name=self.empty_program_name)
--        reference.functions = []
--
--        self.assertEqual(reference, self.getter_connection.get_whole_program(self.empty_program_name))
--
--    def test_05_get_function_names(self):
--        reference = {'func1', 'func2', 'func3', 'func4', 'func5', 'func6', 'func7'}
--        self.assertEqual(reference, self.getter_connection.get_function_names(self.program_1_name))
--
--    def test_06_get_function_names_one_node_program(self):
--        reference = {'lonefunc'}
--        self.assertEqual(reference, self.getter_connection.get_function_names(self.one_node_program_name))
--
--    def test_07_get_function_names_empty_program(self):
--        reference = set()
--        self.assertEqual(reference, self.getter_connection.get_function_names(self.empty_program_name))
--
--    def test_09_validation_is_used(self):
--        self.assertFalse(self.getter_connection.get_function_names('not alphanumeric'))
--        self.assertFalse(self.getter_connection.get_whole_program('not alphanumeric'))
--        self.assertFalse(self.getter_connection.check_program_exists('not alphanumeric'))
--        self.assertFalse(self.getter_connection.check_function_exists('not alphanumeric', 'alpha'))
--        self.assertFalse(self.getter_connection.check_function_exists('alpha', 'not alpha'))
--        self.assertFalse(self.getter_connection.get_all_functions_called('alphaprogram', 'not alpha function'))
--        self.assertFalse(self.getter_connection.get_all_functions_called('not alpha program', 'alphafunction'))
--        self.assertFalse(self.getter_connection.get_all_functions_calling('not alpha program', 'alphafunction'))
--        self.assertFalse(self.getter_connection.get_all_functions_calling('alphaprogram', 'not alpha function'))
--        self.assertFalse(self.getter_connection.get_call_paths('not alpha program','alphafunc1', 'alphafunc2'))
--        self.assertFalse(self.getter_connection.get_call_paths('alphaprogram','not alpha func 1', 'alphafunc2'))
--        self.assertFalse(self.getter_connection.get_call_paths('alphaprogram','alphafunc1', 'not alpha func 2'))
--
--    def test_08_get_program_names(self):
--        reference = {self.program_1_name, self.one_node_program_name, self.empty_program_name}
--        self.assertTrue(reference.issubset(self.getter_connection.get_program_names()))
--
--
--    def test_11_get_all_functions_called(self):
--        reference1 = FunctionQueryResult(parent_db=None, program_name=self.program_1_name)  # this will be the 1,2,3,4,5 component
--        reference2 = FunctionQueryResult(parent_db=None, program_name=self.program_1_name)  # this will be the 6,7 component
--        reference3 = FunctionQueryResult(parent_db=None, program_name=self.program_1_name)  # this will be the 7 component
--        reference1.functions = [Function(self.program_1_name, 'func1'), Function(self.program_1_name, 'func2'),
--                                Function(self.program_1_name, 'func3'),
--                                Function(self.program_1_name, 'func4'), Function(self.program_1_name, 'func5')]
--        reference2.functions = [Function(self.program_1_name, 'func6'), Function(self.program_1_name, 'func7')]
--        reference3.functions = []
--
--        reference1.functions[0].functions_i_call = reference1.functions[1:4:2]  # func1 calls func2, func4
--        reference1.functions[1].functions_i_call = reference1.functions[0:4:3]  # func2 calls func1, func4
--        reference1.functions[2].functions_i_call = [reference1.functions[4]]  # func3 calls func5
--        reference1.functions[3].functions_i_call = reference1.functions[3:5]  # func4 calls func4, func5
--        reference1.functions[4].functions_i_call = reference1.functions[0:3]  # func5 calls func1, func2, func3
--
--        reference2.functions[0].functions_i_call = [reference2.functions[1]]
--
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_called(self.program_1_name, 'func1'))
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_called(self.program_1_name, 'func2'))
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_called(self.program_1_name, 'func3'))
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_called(self.program_1_name, 'func4'))
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_called(self.program_1_name, 'func5'))
--
--        self.assertEquals(reference2, self.getter_connection.get_all_functions_called(self.program_1_name, 'func6'))
--
--        self.assertEquals(reference3, self.getter_connection.get_all_functions_called(self.program_1_name, 'func7'))
--
--        self.assertIsNone(self.getter_connection.get_all_functions_called(self.program_1_name, 'nottherightfunction'))
--        self.assertIsNone(self.getter_connection.get_all_functions_called('nottherightprogram', 'func2'))
--
--    def test_12_get_all_functions_called_1(self):
--        reference1 = FunctionQueryResult(parent_db=None, program_name=self.one_node_program_name)
--        reference1.functions = []
--
--        d=self.getter_connection.get_all_functions_called(self.one_node_program_name, 'lonefunc')
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_called(self.one_node_program_name,
--                                                                                      'lonefunc'))
--        self.assertIsNone(self.getter_connection.get_all_functions_called(self.one_node_program_name,
--                                                                          'not the right function'))
--        self.assertIsNone(self.getter_connection.get_all_functions_called('not the right program', 'lonefunc'))
--
--    def test_13_get_all_functions_called_blank(self):
--        reference1 = FunctionQueryResult(parent_db=None, program_name=self.empty_program_name)
--        reference1.functions = []
--
--        self.assertIsNone(self.getter_connection.get_all_functions_called(self.empty_program_name,
--                                                                          'not the right function'))
--
--    def test_14_get_all_functions_calling(self):
--        reference1 = FunctionQueryResult(parent_db=None, program_name=self.program_1_name)  # this will be the 1,2,3,4,5 component
--        reference2 = FunctionQueryResult(parent_db=None, program_name=self.program_1_name)  # this will be the 6,7 component
--        reference3 = FunctionQueryResult(parent_db=None, program_name=self.program_1_name)  # this will be the 7 component
--        reference1.functions = [Function(self.program_1_name, 'func1'), Function(self.program_1_name, 'func2'),
--                                Function(self.program_1_name, 'func3'),
--                                Function(self.program_1_name, 'func4'), Function(self.program_1_name, 'func5')]
--
--        reference1.functions[0].functions_i_call = reference1.functions[1:4:2]  # func1 calls func2, func4
--        reference1.functions[1].functions_i_call = reference1.functions[0:4:3]  # func2 calls func1, func4
--        reference1.functions[2].functions_i_call = [reference1.functions[4]]  # func3 calls func5
--        reference1.functions[3].functions_i_call = reference1.functions[3:5]  # func4 calls func4, func5
--        reference1.functions[4].functions_i_call = reference1.functions[0:3]  # func5 calls func1, func2, func3
--
--        reference2.functions = [Function(self.program_1_name, 'func6'), Function(self.program_1_name, 'func7')]
--
--        reference2.functions[0].functions_i_call = [reference2.functions[1]]
--
--        reference3.functions = [Function(self.program_1_name, 'func6')]
--
--        reference3.functions = []
--
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_calling(self.program_1_name, 'func1'))
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_calling(self.program_1_name, 'func2'))
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_calling(self.program_1_name, 'func3'))
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_calling(self.program_1_name, 'func4'))
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_calling(self.program_1_name, 'func5'))
--
--        self.assertEquals(reference2, self.getter_connection.get_all_functions_calling(self.program_1_name,'func7'))
--
--        self.assertEquals(reference3, self.getter_connection.get_all_functions_calling(self.program_1_name, 'func6'))
--
--        self.assertIsNone(self.getter_connection.get_all_functions_calling(self.program_1_name, 'nottherightfunction'))
--        self.assertIsNone(self.getter_connection.get_all_functions_calling('nottherightprogram', 'func2'))
--
--    def test_15_get_all_functions_calling_one_node_prog(self):
--        reference1 = FunctionQueryResult(parent_db=None, program_name=self.one_node_program_name)
--        reference1.functions = []
--        self.assertEquals(reference1, self.getter_connection.get_all_functions_calling(self.one_node_program_name,
--                                                                                       'lonefunc'))
--        self.assertIsNone(self.getter_connection.get_all_functions_calling(self.one_node_program_name,
--                                                                           'not the right function'))
--        self.assertIsNone(self.getter_connection.get_all_functions_calling('not the right program', 'lonefunc'))
--
--    def test_16_get_all_functions_calling_blank_prog(self):
--        reference1 = FunctionQueryResult(parent_db=None, program_name=self.empty_program_name)
--        reference1.functions=[]
--
--        self.assertIsNone(self.getter_connection.get_all_functions_called(self.empty_program_name,
--                                                                          'not the right function'))
--
--
--
--    def test_18_get_call_paths_between_two_functions_one_node_prog(self):
--        reference = FunctionQueryResult(parent_db=None, program_name=self.one_node_program_name)
--        reference.functions = [] # that is, reference is the empty program with name self.one_node_program_name
--
--        self.assertEquals(self.getter_connection.get_call_paths(self.one_node_program_name, 'lonefunc', 'lonefunc'),
--                          reference)
--        self.assertIsNone(self.getter_connection.get_call_paths(self.one_node_program_name, 'lonefunc', 'notafunc'))
--        self.assertIsNone(self.getter_connection.get_call_paths(self.one_node_program_name, 'notafunc', 'notafunc'))
--
--    def test_10_validator(self):
--        self.assertFalse(validate_query(''))
--        self.assertTrue(validate_query('thisworks'))
--        self.assertTrue(validate_query('th1sw0rks'))
--        self.assertTrue(validate_query('12345'))
--        self.assertFalse(validate_query('this does not work'))
--        self.assertTrue(validate_query('this_does_work'))
--        self.assertFalse(validate_query("'")) # string consisting of a single quote mark
--
--if __name__ == '__main__':
--    unittest.main()
 === modified file 'src/sextant/test_parser.py'
 --- src/sextant/test_parser.py	2014-11-05 16:09:16 +0000
 +++ src/sextant/test_parser.py	2014-11-25 11:40:53 +0000
@@ -11,20 +11,20 @@
      def setUp(self):
          pass
--    def add_function(self, dct, name, typ):
++    def add_function(self, dct, name, typ, source):
          self.assertFalse(name in dct, "duplicate function added: {} into {}".format(name, dct.keys()))
--        dct[name] = typ
++        dct[name] = (typ, source)
--    def add_call(self, dct, caller, callee):
--        dct[caller].append(callee)
++    def add_call(self, dct, caller, callee, is_internal):
++        dct[caller].append((callee, is_internal))
      def do_parse(self, path=DUMP_FILE, sections=['.text'], ignore_ptrs=False):
          functions = {}
          calls = defaultdict(list)
          # set the Parser to put output in local dictionaries
--        add_function = lambda n, t, s='unknown': self.add_function(functions, n, t)
--        add_call = lambda a, b: self.add_call(calls, a, b)
++        add_function = lambda n, t, s: self.add_function(functions, n, t, s)
++        add_call = lambda a, b, i: self.add_call(calls, a, b, i)
          p = parser.Parser(path, sections=sections, ignore_ptrs=ignore_ptrs,
                            add_function=add_function, add_call=add_call)
@@ -43,12 +43,16 @@
          # ensure that the correct functions are listed with the correct types
          res, funcs, calls = self.do_parse()
--        for name, typ in zip(['normal', 'duplicates', 'wierd$name', 'printf', 'func_ptr_3'],
--                             ['normal', 'normal', 'normal', 'stub', 'pointer']):
++        known = 'parser_test.c'
++        unknown = 'unknown'
++
++        for name, typ, fle in zip(['normal', 'duplicates', 'wierd$name', 'printf', 'func_ptr_3'],
++                                  ['normal', 'normal', 'normal', 'stub', 'pointer'],
++                                  [known, known, known, unknown, unknown]):
              self.assertTrue(name in funcs, "'{}' not found in function dictionary".format(name))
--            self.assertEquals(funcs[name], typ)
++            self.assertEquals(funcs[name][0], typ)
++            self.assertTrue(funcs[name][1].endswith(fle))
--        self.assertFalse('__gmon_start__' in funcs, "don't see a function defined in .plt")
      def test_no_ptrs(self):
          # ensure that the ignore_ptrs flags is working
@@ -61,17 +65,17 @@
      def test_calls(self):
          res, funcs, calls = self.do_parse()
--        self.assertTrue('normal' in calls['main'])
--        self.assertTrue('duplicates' in calls['main'])
++        self.assertTrue(('normal', True) in calls['main'])
++        self.assertTrue(('duplicates', True) in calls['main'])
          normal_calls = sorted(['wierd$name', 'printf', 'func_ptr_3'])
--        self.assertEquals(sorted(calls['normal']), normal_calls)
++        self.assertEquals(sorted(zip(*calls['normal'])[0]), normal_calls)
--        self.assertEquals(calls['duplicates'].count('normal'), 2)
--        self.assertEquals(calls['duplicates'].count('printf'), 2,
++        self.assertEquals(calls['duplicates'].count(('normal', True)), 1)
++        self.assertEquals(calls['duplicates'].count(('printf', False)), 1,
                            "expected 1 printf call in {}".format(calls['duplicates']))
--        self.assertTrue('func_ptr_4' in calls['duplicates'])
--        self.assertTrue('func_ptr_5' in calls['duplicates'])
++        self.assertTrue(('func_ptr_4', True) in calls['duplicates'])
++        self.assertTrue(('func_ptr_5', True) in calls['duplicates'])
      def test_sections(self):
          res, funcs, calls = self.do_parse(sections=['.plt', '.text'])
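
Review note: `test_calls` extracts the callee names with `zip(*calls['normal'])[0]`, which depends on Python 2's `zip` returning a list. A version-independent alternative might look like the sketch below. The sample tuples are an assumption inferred from the assertions in this test, not output captured from the parser.

```python
# Sample (callee, is_internal) tuples, assumed from the test's assertions.
normal_calls = [('wierd$name', True), ('printf', False), ('func_ptr_3', True)]

# Take the first element of each tuple without indexing into zip().
callees = sorted(callee for callee, _is_internal in normal_calls)
```

The generator expression behaves the same on Python 2 and Python 3, so the comparison against the sorted list of expected names does not need to change.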
 === modified file 'src/sextant/test_resources/parser_test.dump'
 --- src/sextant/test_resources/parser_test.dump	2014-10-16 15:21:43 +0000
 +++ src/sextant/test_resources/parser_test.dump	2014-11-25 11:40:53 +0000
@@ -20,20 +20,45 @@
 f0 <frame_dummy>:
 f:	call   *%eax
  0804841d <normal>:
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:14
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:18
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:20
   8048430:	call   8048458 <wierd$name>
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:21
   8048443:	call   80482f0 <printf@plt>
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:22
   8048451:	call   *%eax
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:24
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:25
  08048458 <wierd$name>:
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:29
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:30
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:31
  08048460 <duplicates>:
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:35
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:36
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:39
 b:	call   80482f0 <printf@plt>
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:40
 e:	call   80482f0 <printf@plt>
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:42
   8048499:	call   804841d <normal>
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:43
 a4:	call   804841d <normal>
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:45
 b2:	call   *%eax
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:46
 bd:	call   *%eax
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:48
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:49
 c4 <main>:
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:53
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:54
 d4:	call   804841d <normal>
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:55
 e0:	call   8048460 <duplicates>
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:56
++/home/benhutc/filter-search/src/sextant/test_resources/parser_test.c:57
 f0 <__libc_csu_init>:
 f6:	call   8048350 <__x86.get_pc_thunk.bx>
 e:	call   80482b4 <_init>
 === modified file 'src/sextant/update_db.py'
 --- src/sextant/update_db.py	2014-11-13 14:01:55 +0000
 +++ src/sextant/update_db.py	2014-11-25 11:40:53 +0000
@@ -9,9 +9,9 @@
  __all__ = ("upload_program", "delete_program")
--from .db_api import SextantConnection
--from .sshmanager import SSHConnectionError
--from .objdump_parser import Parser, run_objdump
++from db_api import SextantConnection
++from sshmanager import SSHConnectionError
++from objdump_parser import Parser, run_objdump
  from os import path
  from time import time
  import subprocess
 === modified file 'src/sextant/web/server.py'
 --- src/sextant/web/server.py	2014-11-13 17:30:38 +0000
 +++ src/sextant/web/server.py	2014-11-25 11:40:53 +0000
@@ -92,18 +92,41 @@
          file_out.close()
          return output
++    @classmethod
++    def _proc_args(cls, arg_dict):
++        # Sanitize the arg_dict. For keys which we require as function
++        # arguments, convert to correct types if they are given (i.e. not '')
++        # and otherwise delete the entries. Note that all values in the
++        # dictionary are contained in a list.
++        for func in ('function_calling', 'function_called'):
++            value = arg_dict[func][0]
++            if not value:
++                del arg_dict[func]
++
++        limit_internal = arg_dict['limit_internal'][0]
++        if limit_internal != 'null':
++            # limit_internal may be 'null', 'true', or 'false'.
++            arg_dict['limit_internal'] = [(limit_internal == 'true')]
++        else:
++            del arg_dict['limit_internal']
++
++        max_depth = arg_dict['max_depth'][0]
++        if max_depth:
++            # max_depth is a string which may be empty or contain an integer.
++            arg_dict['max_depth'] = [int(max_depth)]
++        else:
++            del arg_dict['max_depth']
++
++        return arg_dict
++
      @defer.inlineCallbacks
      def _render_plot(self, request):
--        # the items in the args dict are lists - so use .get()[0] to retrieve
--        args = request.args
++        args = self._proc_args(request.args)
          res_code = RESPONSE_CODE_OK
          res_msg = None # set this in the logic
--        #
          # Check if provided program name exists
--        #
--
          name = args.get('program_name', [None])[0]
          if name is None:
@@ -117,9 +140,7 @@
                  res_fmt = "Program {} not found in database."
                  res_msg = res_fmt.format(escape(name))
--        #
          # We have a connection and a valid program - now setup the query
--        #
          # We store query info as:
          # <query_name>: (<function>, (<known args>), (<kwargs>)
@@ -132,19 +153,21 @@
              ),
              'functions_calling': (
                  CONNECTION.get_all_functions_calling,
--                ('func1',)
++                ('function_calling', 'limit_internal', 'max_depth')
              ),
              'functions_called_by': (
                  CONNECTION.get_all_functions_called,
--                ('func1',)
++                ('function_calling', 'limit_internal', 'max_depth')
              ),
              'all_call_paths': (
                  CONNECTION.get_call_paths,
--                ('func1', 'func2')
++                ('function_calling', 'function_called',
++                 'limit_internal', 'max_depth')
              ),
              'shortest_call_path': (
                  CONNECTION.get_shortest_path_between_functions,
--                ('func1', 'func2')
++                ('function_calling', 'function_called',
++                 'limit_internal', 'max_depth')
+             )
+         }
@@ -162,12 +185,11 @@
          # extract any required keyword arguments from request.args
          if res_code is RESPONSE_CODE_OK:
--            fn, kwargs = query
--
--            # all args will be strings - use None to indicate missing argument
--            req_args = tuple(args.get(kwarg, [None])[0] for kwarg in kwargs)
--            missing_args = [kwarg for (kwarg, req_arg) in zip(kwargs, req_args)
--                                if req_arg is None]
++            fn, keys = query
++
++            # None indicates a missing argument
++            req_kwargs = dict(((key, args.get(key, [None])[0]) for key in keys))
++            missing_args = [key for key in keys if req_kwargs[key] is None]
              if missing_args:
                  # missing a kwarg from request.args
@@ -180,13 +202,15 @@
              try:
                  print('running query {}'.format(datetime.now()))
                  program = yield defer_to_thread_with_timeout(render_timeout, fn,
--                                                             name, *req_args)
++                                                             name, **req_kwargs)
                  print('\tdone {}'.format(datetime.now()))
              except Exception as e:
                  # the timeout has fired and cancelled the request
                  res_code = RESPONSE_CODE_BAD_REQUEST
                  res_msg = "{}".format(e)
                  print('\tfailed {}'.format(datetime.now()))
++                raise
++
          if res_code is RESPONSE_CODE_OK:
              # we have received a response to our request
@@ -194,7 +218,7 @@
                  res_code = RESPONSE_CODE_NOT_FOUND
                  res_fmt = ("At least one of the input functions '{}' was not "
                             "found in program {}.")
--                res_msg = res_fmt.format(', '.join(req_args), escape(name))
++                res_msg = res_fmt.format(', '.join(args), escape(name))
              elif not program.functions:
                  res_code = RESPONSE_CODE_NO_CONTENT
                  res_fmt = ("The program {} is in the database but has no "
@@ -237,7 +261,6 @@
          max_funcs = AUTOCOMPLETE_NAMES_LIMIT + 1
          programs = CONNECTION.programs_with_metadata()
          result = CONNECTION.get_function_names(program_name, search, max_funcs)
--        print(search, len(result))
          return result if len(result) < max_funcs else set()
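
Note on the server.py change: the sanitizing step added in _proc_args can be tried in isolation. The following is a minimal standalone sketch, not the code from this branch. It assumes Twisted-style request args where every value is a list; the name sanitize_query_args is hypothetical, and unlike the diff it works on a copy and tolerates missing keys instead of raising KeyError.

```python
def sanitize_query_args(arg_dict):
    """Return a cleaned copy of a Twisted-style args dict (values are lists).

    Hypothetical helper mirroring the logic of _proc_args in the diff above.
    """
    cleaned = dict(arg_dict)  # shallow copy so the caller's dict is untouched

    # Drop function names that were submitted as empty strings.
    for key in ('function_calling', 'function_called'):
        if not cleaned.get(key, [''])[0]:
            cleaned.pop(key, None)

    # limit_internal arrives as 'null', 'true' or 'false'.
    limit_internal = cleaned.get('limit_internal', ['null'])[0]
    if limit_internal == 'null':
        cleaned.pop('limit_internal', None)
    else:
        cleaned['limit_internal'] = [limit_internal == 'true']

    # max_depth arrives as '' or a decimal integer string.
    max_depth = cleaned.get('max_depth', [''])[0]
    if max_depth:
        cleaned['max_depth'] = [int(max_depth)]
    else:
        cleaned.pop('max_depth', None)

    return cleaned


example = sanitize_query_args({
    'program_name': ['demo'],
    'function_calling': ['main'],
    'function_called': [''],
    'limit_internal': ['true'],
    'max_depth': ['3'],
})
print(example)
```

Entries that survive keep the one-element list shape, so the later lookup args.get(key, [None])[0] in _render_plot still yields None for anything that was removed.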