'apropos' maxes out CPU when run with '/bin/*' as argument

Bug #927028 reported by Ben Okopnik
This bug affects 1 person
Affects: man-db (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Colin Watson

Bug Description

When 'apropos' is executed with an argument containing a list of files in a directory (e.g., '/bin/*'), it maxes out the CPU for a good long while (~40 seconds for /bin/*, about 9.5 minutes for /usr/bin/[a-z]*).

--------------------------------------------------------------------------------------------------------------------------------------------
ben@feynman:~$ apropos ~/bin/* & sleep 2; top -b -p $! -d 1 -n 3
[2] 9671

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 9671 ben 20 0 7080 2224 928 R 100 0.2 0:02.46 apropos

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 9671 ben 20 0 7012 2160 928 R 101 0.2 0:03.45 apropos

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 9671 ben 20 0 7012 2164 928 R 100 0.2 0:04.44 apropos
--------------------------------------------------------------------------------------------------------------------------------------------

ben@feynman:~$ dpkg -S /usr/bin/apropos
man-db: /usr/bin/apropos

ben@feynman:~$ lsb_release -rd
Description: Ubuntu 11.10
Release: 11.10

ben@feynman:~$ apt-cache policy man-db
man-db:
  Installed: 2.6.0.2-2
  Candidate: 2.6.0.2-2
  Version table:
 *** 2.6.0.2-2 0
        500 http://us.archive.ubuntu.com/ubuntu/ oneiric/main i386 Packages
        100 /var/lib/dpkg/status

Tags: bot-comment

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/927028/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → man-db (Ubuntu)
Colin Watson (cjwatson) wrote :

I'm not quite sure why you'd do this, but I assume this is a reduced test case based on something else. :-) I agree that it's a bug, thanks; I'll have a look at it as upstream.

Changed in man-db (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Ben Okopnik (ben-okopnik) wrote :

Yep, what I sent was a demo of the problem. As to why you'd do this... um, newbie Linux users wanting to see what all the standard utilities do, for example? That makes for a rather useful functional reference. Given that part of what I do is teach such newbies, I personally would find it nice to be able to recommend it to them. :)

(Obviously, I can use a 'for' loop to do the same thing for now.)
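
A minimal sketch of that 'for' loop workaround, not taken from the report itself. The helper name describe_each is hypothetical, and it passes base names rather than full paths, on the assumption that the lookup is meant to be by command name. Note that apropos treats its arguments as regular expressions by default, so unusual file names may match more than expected.

```shell
# Hypothetical helper: run a lookup command once per file name in a directory.
# Usage: describe_each DIR [COMMAND...]   (COMMAND defaults to apropos)
describe_each() (
    dir=$1
    shift
    # Fall back to apropos when no command is given.
    [ "$#" -gt 0 ] || set -- apropos
    for path in "$dir"/*; do
        [ -e "$path" ] || continue   # skip the unexpanded pattern in an empty directory
        "$@" "${path##*/}"           # pass the base name, not the full path
    done
)

# Example calls (one process per file, so expect this to be slow with apropos):
#   describe_each /bin
#   describe_each /bin whatis
```

This starts one process per file, so it does not avoid the per-argument cost described in this report; it only makes that cost visible.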

Colin Watson (cjwatson) wrote :

Something like 'apropos /bin/bash' doesn't do anything useful anyway. The 'whatis' program sounds closer to what you want, but needs to be invoked as 'whatis bash' rather than 'whatis /bin/bash'.

So, it sounds like I have two bugs here:

 1) apropos reopens and rescans the database for every command-line argument, which is inefficient;
 2) whatis doesn't do the right thing when given the full path to an executable as an argument (compare the change to man in man-db 2.5.7 that made this work for that program).
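
Point 1 can be illustrated with a toy model. This is only a sketch under stated assumptions: a plain-text file stands in for the man-db database, grep stands in for the search, and the function names rescan_each and scan_once are hypothetical. It does not reproduce man-db's actual code.

```shell
# Toy model only: a plain-text file stands in for the man-db database.

# Reads the whole file once per keyword (the inefficient pattern in point 1).
rescan_each() (
    db=$1
    shift
    for kw in "$@"; do
        grep -i -e "$kw" "$db"
    done
)

# Reads the file a single time, testing every line against all keywords.
scan_once() (
    db=$1
    shift
    [ "$#" -gt 0 ] || exit 1         # refuse to run grep without a pattern
    # Rewrite the argument list "k1 k2 ..." as "-e k1 -e k2 ...".
    for kw in "$@"; do
        set -- "$@" -e "$kw"
        shift
    done
    grep -i "$@" "$db"
)
```

With N keywords, rescan_each makes N passes over the file and scan_once makes one. The outputs differ in shape: rescan_each groups results by keyword and may repeat a line, while scan_once prints each matching line once, in file order.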

Colin Watson (cjwatson) wrote :

Incidentally, you'll find that, besides giving what I think is probably more like the right answer, 'whatis <long list>' is much quicker. It still has a similar inefficiency where it reopens the database every time, but whatis has a much quicker per-item search than apropos does (it only has to look up a single database key rather than scanning them all), so it doesn't matter so much.

  $ bash -c 'cd /bin; time whatis * >/dev/null 2>&1'

  real 0m0.042s
  user 0m0.012s
  sys 0m0.008s

Ben Okopnik (ben-okopnik) wrote :

> Something like 'apropos /bin/bash' doesn't do anything useful anyway.

Right, but 'cd /bin; apropos *' does - and still has exactly the same problem as I originally reported. As to 'whatis', I had tested it before writing up this bug; it produces a totally different set of results, including throwing a dozen 'nothing appropriate' warnings when run in a similar manner (which also seems wrong, to me: e.g., 'apropos nc.openbsd' has no problem telling me what it is, but 'whatis' does):

ben@feynman:/bin$ apropos * > /tmp/apr.out
ben@feynman:/bin$ whatis * > /tmp/what.out
nc.openbsd: nothing appropriate.
ntfsck: nothing appropriate.
ntfsdecrypt: nothing appropriate.
ntfsdump_logfile: nothing appropriate.
ntfsmftalloc: nothing appropriate.
ntfsmove: nothing appropriate.
ntfstruncate: nothing appropriate.
ntfswipe: nothing appropriate.
plymouth: nothing appropriate.
plymouth-upstart-bridge: nothing appropriate.
static-sh: nothing appropriate.
ulockmgr_server: nothing appropriate.
ben@feynman:/bin$ wc -l /tmp/{apr,what}.out
  6214 /tmp/apr.out
   161 /tmp/what.out
  6375 total

One possible (brute-force) answer to all of this might be to have 'apropos' only accept a single argument. It wouldn't stop anyone from trying the above with, say, a 'for' loop, but at least they'd be aware that they're running a largish number of processes. Also, given that 'whatis' isn't SUID, having both it and apropos resolve paths shouldn't affect security.

Thank you for both your advice and your time and effort in resolving this.

Colin Watson (cjwatson) wrote :

If "nc.openbsd" only appears in the description rather than in the page name, then apropos will show it but whatis won't; that's the essence of the distinction between those two programs. It rather depends what you want, but mostly, apropos is more useful for queries where you can remember a keyword but not exactly what the page is called. It typically produces far too much noise when given a command name, particularly for some of the short ones.
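
The distinction can be sketched with a toy index. Everything here is an assumption made for illustration: the "NAME (SECTION) - DESCRIPTION" line format, the function names toy_whatis and toy_apropos, and any index entries used with them. The real programs query the man-db database rather than a text file.

```shell
# Toy model only: each line of the index file is "NAME (SECTION) - DESCRIPTION".

# Matches the page name exactly, like a single key lookup.
# Usage: toy_whatis NAME INDEX_FILE   (exit status 1 if nothing matches)
toy_whatis() {
    awk -v name="$1" '$1 == name { print; found = 1 } END { exit !found }' "$2"
}

# Searches both the name and the description for a keyword (a regular expression).
# Usage: toy_apropos KEYWORD INDEX_FILE
toy_apropos() {
    grep -i -e "$1" "$2"
}
```

In this model, a string that occurs only in a description is found by toy_apropos but not by toy_whatis, which mirrors the explanation above.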

There's actually a bug where apropos never prints "nothing appropriate" even when it couldn't find any matches for an argument, which I noticed and fixed while fixing this bug.

There should be no need to restrict apropos' input. I've fixed this properly for man-db 2.6.2:

Wed Feb 22 03:04:45 GMT 2012 Colin Watson <email address hidden>

        Optimise apropos when given many arguments (Ubuntu bug #927028).

        * src/whatis.c (use_grep, do_whatis, parse_name, parse_whatis,
          do_apropos, search): Operate on multiple pages.
          (use_grep, do_whatis, do_apropos): Update an output array rather
          than returning an int.
          (parse_name, parse_whatis): Update an output array as well as
          returning an int.
          (display, do_whatis_section): Constify page argument.
          (match): Constify lowpage and whatis arguments.
          (main): Process all arguments using a single call to search.
        * NEWS: Document this.

I've yet to work on path resolution in whatis, but that should be easy.

Changed in man-db (Ubuntu):
status: Triaged → Fix Committed
assignee: nobody → Colin Watson (cjwatson)
Colin Watson (cjwatson) wrote :

And here's the other part:

Mon Feb 27 13:26:47 GMT 2012 Colin Watson <email address hidden>

        * src/whatis.c (main): Move locale manpath expansion to ...
          (locale_manpath): ... here (new function).
          (suitable_manpath): New function.
          (do_whatis): If a page contains a slash and is a path to an
          executable on $PATH, then look up its base name only in
          appropriate manual hierarchies.
          (search): Pass current manpath entry to do_whatis.
        * src/tests/whatis-1: New file.
        * src/tests/Makefile.am (ALL_TESTS): Add whatis-1.
        * NEWS: Document this.

(I deliberately made this change only for whatis, and not for apropos. This is because whatis is documented as taking manual page names as arguments, while apropos is documented as taking keywords (which are regular expressions by default but may be in other formats). Extending the domain of exact manual page names to include executables that have associated manual pages seems fairly safe to me, but I don't feel comfortable with fiddling with the interpretation of the more general search strings that you give to apropos.)
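
The rule in the do_whatis changelog entry can be approximated in shell. This is a hedged sketch, not the code in src/whatis.c: the function name page_name_for is hypothetical, and it only decides which name to look up. It does not model the restriction to appropriate manual hierarchies.

```shell
# Hypothetical helper: map a command-line argument to the page name to look up.
# If ARG contains a slash, names an executable regular file, and its directory
# is listed in $PATH, print the base name; otherwise print ARG unchanged.
page_name_for() (
    arg=$1
    case $arg in
        */*)
            dir=${arg%/*}
            [ -n "$dir" ] || dir=/   # "/foo" lives in the root directory
            if [ -f "$arg" ] && [ -x "$arg" ]; then
                case ":$PATH:" in
                    *":$dir:"*)
                        printf '%s\n' "${arg##*/}"
                        exit 0       # leaves only this function's subshell
                        ;;
                esac
            fi
            ;;
    esac
    printf '%s\n' "$arg"
)

# Example call, assuming /bin is on $PATH and /bin/bash is executable:
#   whatis "$(page_name_for /bin/bash)"
```

Arguments without a slash pass through untouched, so plain page names keep their existing meaning.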

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package man-db - 2.6.2-1

---------------
man-db (2.6.2-1) unstable; urgency=low

  * New upstream release:
    - Optimise apropos when given many arguments (LP: #927028).
    - apropos prints an error message and returns non-zero when it finds no
      matches (closes: #672661).
    - Avoid fatal errors when opening a 64-bit GDBM database from a 32-bit
      process (LP: #1001189).
  * Configure with --with-xz=xz --with-lzip=lzip.
  * Adjust debian/watch to track .tar.xz releases.
  * Convert debian/copyright to copyright-format 1.0.
  * Override hardening-no-fortify-functions Lintian warning for
    /usr/bin/manpath, as a false positive.

 -- Colin Watson <email address hidden> Mon, 18 Jun 2012 22:56:56 +0100

Changed in man-db (Ubuntu):
status: Fix Committed → Fix Released