Merge lp:~jjed/archive-crawler/use-aptfile into lp:~mvo/archive-crawler/mvo

Proposed by Jjed
Status: Merged
Merged at revision: 124
Proposed branch: lp:~jjed/archive-crawler/use-aptfile
Merge into: lp:~mvo/archive-crawler/mvo
Diff against target: 1259 lines (+704/-202)
15 files modified
ArchiveCache/__init__.py (+313/-0)
ArchiveCrawler/__init__.py (+13/-6)
DesktopDataExtractor/__init__.py (+74/-127)
IconFinder/__init__.py (+90/-0)
data/icon_search.cfg (+0/-11)
getMenuData.py (+7/-4)
tests/archive/dists/testing/Contents_amd64 (+5/-0)
tests/archive/dists/testing/Contents_i386 (+3/-0)
tests/archive/pool/get_pkgs.sh (+10/-0)
tests/test_archive_cache.py (+96/-0)
tests/test_cmd_data_extractor.py (+5/-5)
tests/test_cralwer.py (+11/-11)
tests/test_deb_package.py (+1/-1)
tests/test_desktop_data_extractor.py (+32/-37)
tests/test_icon_finder.py (+44/-0)
To merge this branch: bzr merge lp:~jjed/archive-crawler/use-aptfile
Reviewer Review Type Date Requested Status
Michael Vogt Pending
Review via email: mp+62944@code.launchpad.net

Description of the change

Apologies for the huge patch. I abstracted IconFinder out, and that
ballooned the line count of this solution quite a bit.

SUMMARY

This branch adds a new component, `IconFinder`, which should solve the
issue of finding icons in an archive. It is a frontend for `ArchiveCache`,
a general solution for searching archive files via Contents-<ARCH>.gz.

In the future, `ArchiveCache` might replace `ArchiveCrawler` as a cleaner,
quicker, and more modular solution. As it stands, only `IconFinder` uses it.

DESCRIPTION

`ArchiveCache` loads all lines from a Contents-<ARCH>.gz file and either writes them
to temporary files or keeps them in RAM, depending on memory constraints. It provides
methods to search them for files and paths.
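
For orientation, here is a minimal usage sketch of that API (the Contents path and
prefixes below are only illustrative; the calls match the `ArchiveCache` methods
added in the diff):

```python
from ArchiveCache import ArchiveCache

# Index icon-related prefixes separately so they can be searched without
# scanning the whole Contents file (paths here are just examples).
cache = ArchiveCache("/srv/archive/dists/natty/Contents-i386.gz",
                     prefixes_to_index=["usr/share/icons", "usr/share/pixmaps"],
                     memory=500)  # cap in-memory caching at ~500MB of lines

# Exact lookup: returns the owning package name, or None.
pkg = cache.search_exact("usr/share/pixmaps/synaptic.png")

# Regex search, optionally restricted to an indexed prefix; returns a
# list of (package_name, path) pairs.
hits = cache.search(r".*/synaptic(.png|.xpm|.svg|)$",
                    prefix="usr/share/icons", first_only=True)
```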

Its major feature over a simple grep is the separate caching of index prefixes,
allowing a frontend to quickly search only the files in a certain directory
(eg /usr/share/icons). `IconFinder` is a simple wrapper for this functionality,
searching a sequence of prefixes for a match.
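
`DesktopDataExtractor` wires the two together roughly like this (a sketch mirroring
the new code in the diff; the Contents path is again hypothetical):

```python
import IconFinder
from ArchiveCache import ArchiveCache

# The cache should index the prefixes IconFinder searches
# (IconFinder logs a warning if it does not).
cache = ArchiveCache("/srv/archive/dists/natty/Contents-i386.gz",
                     IconFinder.IconFinder.prefixes)
finder = IconFinder.IconFinder(cache, policy=IconFinder.SMALL_ICON_POLICY)

# Returns ('package-name', 'path/to/icon') or None, checking the policy's
# search directories from most to least preferred.
result = finder.search("softwarecenter")
```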

`IconFinder` integration replaces most of my work in `DesktopDataExtractor`.
`ArchiveCrawler` now finds all deb paths before crawling, so as to allow
`IconFinder` access to any deb from the start.

`DesktopDataExtractor` now takes an `archive_dir` (plus a `dist` name) rather than
`pool_dir`; the pool directory is found at `archive_dir`/pool, and the contents
files at `archive_dir`/dists/<dist>/Contents-<ARCH>.gz.
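
Concretely, callers now construct the extractor like this (mirroring the updated
getMenuData.py; the argument values below are only examples):

```python
from DesktopDataExtractor import DesktopDataExtractor

# archive_dir is the archive root containing both pool/ and dists/<dist>/;
# the per-arch Contents-<ARCH>.gz files are read from dists/<dist>/.
extractor = DesktopDataExtractor("./aptroot",                       # apt root
                                 "/srv/archive.ubuntu.com/ubuntu",  # archive dir
                                 "data/",                           # data dir
                                 "natty")                           # dist
extractor.extract()
```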

TESTING

Of the 361 desktop entries I tested (an old main[a-k] + universe[2-d]), making
343 icon requests, 13 requests failed. Of these, 11 requested non-existent icons,
1 requested an icon that was too large ('arista'), and 1 stored its icon in
/usr/lib without a symlink in /usr/share or an absolute-path request
('batmon.app').

The unit tests that passed as of r123 still pass, along with new ones for
`IconFinder` and `ArchiveCache`.

FUTURE

Hopefully this branch -- together with a solution for extracting large icons
-- will solve #599535. With the exception of arista, all icons that fail to
extract should now be the fault of the package. Hopefully. We'll see if the
full archive has any interesting edge cases.

Revision history for this message
Michael Vogt (mvo) wrote :

On Tue, May 31, 2011 at 05:55:57AM -0000, Jacob Johan Edwards wrote:
> Jacob Johan Edwards has proposed merging lp:~j-johan-edwards/archive-crawler/use-aptfile into lp:~mvo/archive-crawler/mvo.
>
> Requested reviews:
> Michael Vogt (mvo)
>
> For more details, see:
> https://code.launchpad.net/~j-johan-edwards/archive-crawler/use-aptfile/+merge/62944
>
> Apologies for the huge patch. I abstracted IconFinder out, and that
> ballooned the line count of this solution quite a bit.

That is fine, the content is absolutely worth it!

> SUMMARY
>
> This branch adds a new component, `IconFinder`, which should solve the
> issue of finding icons in an archive. It is a frontend for `ArchiveCache`,
> a general solution for searching archive files via Contents-<ARCH>.gz.
>
> In the future, `ArchiveCache` might replace `ArchiveCrawler` as a cleaner,
> quicker, and more modularity-friendly solution. As it stands, only `IconFinder` uses it.

Awesome, this is great work already and has even more potential!

[..]
> TESTING
>
> Out of 361 desktop entries I tested (an old main[a-k] + universe[2-d]) with
> 343 icon requests, 13 failed. Of these, 11 failed because they requested
> non-existent icons, 1 failed because it requested an icon too large ('arista'),
> and 1 failed because it stored its icon in /usr/lib without a symlink in
> /usr/share or and absolute path request ('batmon.app').
>
> The unit tests that pass as of r123 pass now, as well as new ones for
> `IconFinder` and `ArchiveCache`.

Thanks for updating the tests as well (and adding more!), it looks
really well done.

> FUTURE
>
> Hopefully this branch -- together with a solution for extracting large icons
> -- will solve #599535. With the exception of arista, all icons that fail to
> extract should now be the fault of the package. Hopefully. We'll see if the
> full archive has an interesting edge cases.

I'm running a full oneiric extraction now; it will be interesting to see what
it outputs, and I will compare it with the previous runs to find regressions
(but I really doubt there will be any).

Thanks,
 Michael

Preview Diff

1=== added directory 'ArchiveCache'
2=== added file 'ArchiveCache/__init__.py'
3--- ArchiveCache/__init__.py 1970-01-01 00:00:00 +0000
4+++ ArchiveCache/__init__.py 2011-05-31 05:55:54 +0000
5@@ -0,0 +1,313 @@
6+import gzip
7+import hashlib
8+import logging
9+import os.path
10+import re
11+import tempfile
12+
13+
14+class ArchiveCache(object):
15+ """
16+ This class caches the file paths in a deb archive--as described by
17+ that archive's Contents-<ARCH>.gz--to make search of them efficient.
18+ It can also index subsets of the archives (eg all paths beginning
19+ with /usr/share/icons/) for quicker informed searches.
20+
21+ Indexes are created at instantiation via the prefixes_to_index
22+ argument. They can then be searched simply by using the `prefix`
23+ argument in search methods.
24+
25+ In the event of ArchiveCache runs out of memory to allocate, it
26+ will prioritize the indexes in order, and write all excess data
27+ to cache files.
28+ """
29+
30+ def __init__(self, contents_file, prefixes_to_index=[], memory=500):
31+ """
32+ @contents_file: The path to a Contents-<ARCH>.gz file.
33+ @prefixes_to_index: A list of path prefixes (eg /usr/share is a
34+ prefix of /usr/share/*) to separately cache
35+ and make searchable for better speeds.
36+ @memory: The maximum amount of data in the contents_file (in MB)
37+ that the cache will load into memory.
38+ """
39+ self._contents_file = contents_file
40+ self._memory, self._max_memory = 0, memory * 1024 * 1024
41+ self._num_lines = 0
42+ self._checksum = None
43+ self._create_indexes(prefixes_to_index)
44+ self._load_cache()
45+
46+ def search(self, regex, prefix="", first_only=False):
47+ """
48+ Returns a list of (package name, path) pairs, where "path" is a
49+ file location in the archive that the search term matches, and
50+ "package name" is the name of a package that contains it.
51+
52+ @regex: A string regular expression to filter paths with.
53+ @prefix: A string path from prefix_indexes. Only paths starting
54+ with the prefix will be searched. Speeds up queries.
55+ @first_only: Set True to search only until finding one hit.
56+ """
57+ prefix = prefix.strip("/")
58+ if prefix:
59+ index = self._indexes[prefix]
60+ logging.debug("Searching cache for all files '%s' with prefix %s" %
61+ (regex, prefix))
62+ else:
63+ index = self._contents_index
64+ logging.debug("Searching cache for all files '%s'" % regex)
65+ results = index.search(regex, first_only)
66+ if results and first_only:
67+ return results
68+ for p in self._prefixes:
69+ if p.startswith(prefix) and p != prefix:
70+ child_index = self._indexes[p]
71+ results.extend(child_index.search(regex, first_only))
72+ if results and first_only:
73+ return results
74+ return results
75+
76+ def search_exact(self, path):
77+ """
78+ Returns the name of a package that contains the exact `path`
79+ location, or None if `path` does not exist in the archive.
80+ """
81+ path = path.strip("/")
82+ logging.debug("Searching cache for '%s'" % path)
83+ prefix = self._find_index(path)
84+ if prefix:
85+ return self._indexes[prefix].search_exact(path)
86+ return self._contents_index.search_exact(path)
87+
88+ def _create_indexes(self, prefixes):
89+ self._indexes = {}
90+ self._prefixes = tuple([p.strip("/") for p in prefixes])
91+ # indexes for user-defined paths like '/usr/share/icons'
92+ for prefix in self._prefixes:
93+ self._indexes[prefix] = _Index()
94+ # an index for lines of all other paths
95+ self._contents_index = _Index()
96+
97+ def _load_cache(self):
98+ logging.info("Loading archive paths from %s into cache" %
99+ os.path.basename(self._contents_file))
100+ contents = gzip.open(self._contents_file)
101+ while not re.match("^FILE\W+LOCATION", contents.readline()):
102+ continue
103+ for line in contents:
104+ self._memory += self._place_in_index(line)
105+ if self._memory > self._max_memory:
106+ self._reduce_memory_use()
107+ self._num_lines += 1
108+ contents.close()
109+ logging.info("Done loading cache. %iMB in active memory." %
110+ (self._memory / 1048576))
111+
112+ def _place_in_index(self, line):
113+ # put a line in its right index and return the amount of memory it took
114+ path = line.split()[0]
115+ prefix = self._find_index(path)
116+ if prefix:
117+ return self._indexes[prefix].add_line(line)
118+ else:
119+ return self._contents_index.add_line(line)
120+
121+ def _find_index(self, path):
122+ # get the longest index path prefixing the given path
123+ candidate_index = ""
124+ for prefix in self._prefixes:
125+ if path.startswith(prefix) and len(prefix) > len(candidate_index):
126+ candidate_index = prefix
127+ return candidate_index
128+
129+ def _reduce_memory_use(self):
130+ if self._contents_index.bytes_loaded:
131+ logging.info("* Max cache load exceeded. Removing non-indexed"
132+ "paths from memory")
133+ self._contents_index.flush_memory()
134+ return
135+ for prefix in reversed(self._prefixes):
136+ if self._indexes[prefix].bytes_loaded:
137+ logging.info("Max cache load exceeded. Removing '%s' index "
138+ "from memory" % prefix)
139+ self._indexes[prefix].flush_memory()
140+ return
141+
142+ def __len__(self, prefix=""):
143+ """
144+ Returns the number of paths searchable in this cache. Assign
145+ `prefix` to count the paths in a particular index.
146+ """
147+ if not prefix:
148+ return self._num_lines
149+ else:
150+ # the length of the prefix and its children (eg /usr/ -> /usr/share)
151+ prefix = prefix.strip("/")
152+ num_lines = self._indexes[prefix].num_lines
153+ for p in self._prefixes:
154+ if p.startswith(prefix + "/"):
155+ num_lines += self._indexes[p].num_lines
156+ return num_lines
157+
158+ def __contains__(self, prefix):
159+ return prefix.strip("/") in self._prefixes
160+
161+ def _get_checksum(self):
162+ if not self._checksum:
163+ contents = open(self._contents_file, 'r')
164+ data = contents.read()
165+ contents.close()
166+ self._checksum = hashlib.md5(data).hexdigest()
167+ return self._checksum
168+
169+ checksum = property(_get_checksum, doc="""
170+ An MD5 hash of this cache's Contents.gz file""")
171+
172+ prefixes = property(lambda self: self._prefixes, doc="""
173+ The path prefixes that the cache has indexed for searching""")
174+
175+
176+class _Index:
177+ # a helper implementation class for indexes: cached lines all beginning
178+ # with the same prefix
179+ def __init__(self):
180+ self.bytes_loaded = 0
181+ self.num_lines = 0
182+ self._file = tempfile.TemporaryFile()
183+ self._lines = []
184+ self._using_memory = True
185+
186+ def add_line(self, line):
187+ # add a line to the cache file, and memory if possible
188+ self.num_lines += 1
189+ self._file.write(line)
190+ if self._using_memory:
191+ to_store = line.strip()
192+ self._lines.append(to_store)
193+ bytes_added = len(to_store)
194+ self.bytes_loaded += bytes_added
195+ return bytes_added
196+ return 0
197+
198+ def flush_memory(self):
199+ # stop caching lines in memory
200+ self._lines = None
201+ self.bytes_loaded = 0
202+ self._using_memory = False
203+
204+ def search_exact(self, path):
205+ if self._using_memory:
206+ line = _search_lines_exact(path, self._lines)
207+ else:
208+ line = _search_file_exact(path, self._file)
209+ if not line:
210+ return None
211+ # the right-most part of a Contents file is a package for the path
212+ parts = line.rsplit("/", 1)
213+ if len(parts) != 2:
214+ logging.warning("Invalid contents line: %s" % line)
215+ return None
216+ return parts[1].strip()
217+
218+ def search(self, regex, first_only=False):
219+ lines = _search_pattern(regex, self._file, self._lines, first_only)
220+ results = []
221+ for line in lines:
222+ parts = line.split()
223+ if len(parts) != 2:
224+ logging.warning("Invalid contents line: %s" % line)
225+ continue
226+ package_data = parts[1].rsplit("/", 1)
227+ if len(package_data) != 2:
228+ logging.warning("Invalid contents line: %s" % line)
229+ continue
230+ path, package_name = parts[0], package_data[1]
231+ results.append((package_name, path))
232+ return results
233+
234+
235+# Utility functions for quickly searching files according to the
236+# following line format used by Contents-<ARCH>:
237+#
238+# absolute/path/to/file section/cat/pkgname
239+
240+def _search_file_exact(search_term, file):
241+ # Searches the file at `file` for a line starting with
242+ # `search_term`. Assumes lines are in alphabetical order. Returns
243+ # None if there is no match.
244+ file.seek(0, 2)
245+ low, high = 0, file.tell()
246+
247+ # perform a binary search on the file
248+ while high > low:
249+ mid = (low + high) / 2
250+ file.seek(mid)
251+
252+ # read current line from the beginning
253+ while file.tell() != 0 and file.read(1) != '\n':
254+ file.seek(-2, os.SEEK_CUR)
255+ line_begin = file.tell()
256+ line = file.readline()
257+ line_end = file.tell()
258+
259+ first_term = line.split()[0]
260+ if search_term.startswith(first_term):
261+ # return if first term matches search term
262+ if len(search_term) == len(first_term):
263+ return line
264+ # or if the first term looks like a directory (symlink)
265+ elif (len(search_term) > len(first_term) and
266+ search_term[len(first_term)] == "/"):
267+ return line
268+ # otherwise continue the binary search
269+ match = cmp(search_term, first_term)
270+ if match < 0:
271+ high = line_begin
272+ elif match > 0:
273+ low = line_end
274+ else:
275+ return line
276+
277+
278+def _search_lines_exact(search_term, lines):
279+ # Searches `lines` for a line starting with `search_term`. Assumes
280+ # lines are in alphabetical order. Returns None if there is no match.
281+ low, high = 0, len(lines)
282+ while high > low:
283+ mid = (low + high) / 2
284+ first_term = lines[mid].split()[0]
285+ if search_term.startswith(first_term):
286+ # return if first term matches search term
287+ if len(search_term) == len(first_term):
288+ return lines[mid]
289+ # or if the first term looks like a directory (symlink)
290+ elif (len(search_term) > len(first_term) and
291+ search_term[len(first_term)] == "/"):
292+ return lines[mid]
293+ # otherwise continue the binary search
294+ match = cmp(search_term, first_term)
295+ if match < 0:
296+ high = mid
297+ elif match > 0:
298+ low = mid + 1
299+ else:
300+ return lines[mid]
301+
302+
303+def _search_pattern(search_pattern, file, lines=None, first_only=False):
304+ # Searches the file at `file` or the `line_list` for all lines where
305+ # the first term (whitespace delimited) matches the `search_pattern`
306+ # regex string. Use `lines` if they are already loaded.
307+ pattern = re.compile(search_pattern.replace('+', '\+'))
308+ results = []
309+ if lines == None:
310+ lines = file
311+ lines.seek(0)
312+ for line in lines:
313+ first_term = line.split()[0]
314+ if pattern.match(first_term):
315+ results.append(line)
316+ if first_only:
317+ return tuple(results)
318+ return tuple(results)
319
320=== modified file 'ArchiveCrawler/__init__.py'
321--- ArchiveCrawler/__init__.py 2011-04-13 13:15:17 +0000
322+++ ArchiveCrawler/__init__.py 2011-05-31 05:55:54 +0000
323@@ -48,6 +48,7 @@
324 self._loadDebFilesDone()
325 self.callbacks = set()
326 self.pkgs_to_debpath = {}
327+ self.pkgs_to_debinfo = {}
328
329 def registerCallback(self, c):
330 if not callable(c):
331@@ -93,8 +94,8 @@
332 logging.debug("adding '%s' to debfiles_done" % debfile)
333 self.debfiles_done.add(debfile)
334
335- def inspectDeb(self, debfile):
336- #logging.debug("inspectDeb %s" % debfile)
337+ def indexDeb(self, debfile):
338+ # add debfile to self.pkgs_to_debpath if it's valid for this crawler
339 m = re.match(".*/(.*)_(.*)_(.*).deb", debfile)
340 pkgname = m.group(1)
341 pkgver = urllib.unquote(m.group(2))
342@@ -130,16 +131,20 @@
343 logging.debug("skipping, compoent does not match (expected '%s' got '%s' "% (component, debfile))
344 return False
345 # add to mapping of name and deb
346- # it may be needed if it contains an application's icon
347 self.pkgs_to_debpath[pkgname] = debfile
348- # ... then filter if done already
349+ self.pkgs_to_debinfo[pkgname] = (candVer, pkgarch, component)
350+ return True
351+
352+ def inspectDeb(self, pkgname):
353+ debfile = self.pkgs_to_debpath[pkgname]
354+ candVer, pkgarch, component = self.pkgs_to_debinfo[pkgname]
355+ # ... filter if done already
356 if debfile in self.debfiles_done:
357 logging.debug("skipping, already in debfiles_done '%s'" % debfile)
358 return False
359
360 # looks like we have a valid ver
361 logging.debug("found valid deb: '%s'" % debfile)
362-
363
364 # pass the epoch here too, this information is not encoded in the
365 # filename
366@@ -152,7 +157,9 @@
367 for (root, dirs, files) in os.walk(self.pooldir):
368 for f in files:
369 if f.endswith(".deb"):
370- self.inspectDeb(os.path.join(root,f))
371+ self.indexDeb(os.path.join(root, f))
372+ for pkg in self.pkgs_to_debpath:
373+ self.inspectDeb(pkg)
374 self._saveDebFilesDone()
375
376 def findOrphanedFiles(self):
377
378=== modified file 'DesktopDataExtractor/__init__.py'
379--- DesktopDataExtractor/__init__.py 2011-04-19 13:54:07 +0000
380+++ DesktopDataExtractor/__init__.py 2011-05-31 05:55:54 +0000
381@@ -19,6 +19,8 @@
382 import time
383
384 import ArchiveCrawler
385+import ArchiveCache
386+import IconFinder
387
388 try:
389 # only available on 2.7 so we provide a very simple backport
390@@ -30,16 +32,19 @@
391
392 SUPPORTED_ARCHES = ("i386","amd64")
393
394- def __init__(self, aptroot, pooldir, datadir):
395+ def __init__(self, aptroot, archivedir, datadir, dist):
396 # init dirs
397 self.aptroot = aptroot
398- self.pooldir = pooldir
399+ self.archivedir = archivedir
400+ self.pooldir = os.path.join(archivedir, 'pool')
401 self.datadir = datadir
402+ self.dist = dist
403 self.tmpdir = tempfile.mkdtemp()
404 self.menu_data = os.path.join(self.datadir, "menu-data")
405 self.menu_data_codecs = os.path.join(self.datadir, "menu-data-codecs")
406 if not os.path.exists(self.menu_data_codecs):
407 raise Exception, "no menu-data-codecs/ dir in %s" % datadir
408+
409
410 self.codecs_foradditional = { }
411 # packages we have already seen
412@@ -62,12 +67,6 @@
413 # available in certain arches
414 self.pkgs_per_arch = {}
415 self.pkgs_per_arch["all"] = set()
416- # a mapping of package names to wanted application icons their
417- # packages don't contain
418- self.pkgs_to_missing_icons = {}
419- # regular expressions for finding packages that might contain
420- # wanted icons
421- self.iconsearch_regex = []
422 # now read the config
423 self._readConfig()
424
425@@ -89,7 +88,6 @@
426 blacklist_desktop = os.path.join(self.datadir,"blacklist_desktop.cfg")
427 renamecfg = os.path.join(self.datadir,"rename.cfg")
428 annotatecfg = os.path.join(self.datadir,"annotate.cfg")
429- iconsearchcfg = os.path.join(self.datadir,"icon_search.cfg")
430 if os.path.exists(blacklist):
431 logging.info("using blacklist: '%s'" % blacklist)
432 for line in open(blacklist).readlines():
433@@ -124,14 +122,6 @@
434 annotations = annotations_str.split(",")
435 logging.debug("annotations: '%s': %s" % (desktopfile,annotations))
436 self.desktop_annotate[desktopfile] = annotations
437- if os.path.exists(iconsearchcfg):
438- logging.info("using icon search: '%s'" % iconsearchcfg)
439- for line in open(iconsearchcfg):
440- line = line.strip()
441- if line != "" and not line.startswith("#"):
442- logging.debug("icon search regex: '%s'" % line)
443- self.iconsearch_regex.append(line)
444-
445
446
447 def extract(self):
448@@ -144,8 +134,19 @@
449 self._cleanOrphans()
450
451 logging.info("Starting extraction in %s" % self.pooldir)
452+ contents = {}
453+ for arch in self.SUPPORTED_ARCHES:
454+ contents_file = "Contents-%s.gz" % arch
455+ contents_path = os.path.join(self.archivedir, "dists",
456+ self.dist, contents_file)
457+ contents[arch] = contents_path
458+ assert os.path.exists(contents_path), "Cannot find Contents at %s" % contents_path
459+ prefixes = IconFinder.IconFinder.prefixes
460 for arch in self.SUPPORTED_ARCHES:
461 logging.debug("looking at '%s'" % arch)
462+ self.archiveCache = ArchiveCache.ArchiveCache(contents[arch],
463+ prefixes)
464+ self.iconFinder = IconFinder.IconFinder(self.archiveCache)
465 self.pkgs_per_arch[arch] = set()
466 self.crawler = ArchiveCrawler.ArchiveCrawler(self.aptroot,
467 self.pooldir,
468@@ -156,65 +157,10 @@
469 self.crawler.updateCache()
470 self.crawler.registerCallback(self.inspectDeb)
471 self.crawler.crawl()
472- self._findMissingIcons()
473 self._calcArchSpecific()
474 self._addCodecInformation()
475 pickle.dump(self.deb_to_files,open(self.deb_to_files_f,"w"))
476 logging.info("extract() finished")
477-
478- def _findMissingIcons(self):
479- """ search for missing desktop icons in using the crawl cache """
480- for (pkgname, icons) in self.pkgs_to_missing_icons.items():
481- logging.debug("Searching for missing '%s' icons" % pkgname)
482- # get an ordered set from most likely to least likely package
483- to_search = OrderedDict()
484- # add (in order of importance) all cached packages matching regex
485- for regex in self.iconsearch_regex:
486- # FIXME: use {0} here once the extraction host moves from
487- # py2.5 to 2.6
488- if r'%(first_term)s' in regex:
489- first_term = re.split('-|_', pkgname, 1)[0]
490- regex = regex % { 'first_term' : first_term }
491- try:
492- matches = filter(re.compile(regex).match, self.pkgs_seen)
493- for match in matches:
494- to_search[match] = None
495- except Exception as e:
496- print "ERROR: %s" % e
497- # queue all non-library dependencies of the package
498- deps = self.crawler.cache[pkgname].candidate.dependencies
499- for dep in deps:
500- for dep_candidate in dep.or_dependencies:
501- depname = dep_candidate.name
502- if not depname.startswith('lib'):
503- to_search[depname] = None
504-
505- # finally, search the set of likely packages
506- for name in to_search:
507- # get cached tarfile
508- logging.debug("* Looking in %s" % name)
509- if name not in self.crawler.pkgs_to_debpath:
510- logging.debug(" Deb for %s not found!" % name)
511- continue
512- try:
513- debPath = self.crawler.pkgs_to_debpath[name]
514- datafile = self._extractDebData(debPath)
515- tar = tarfile.open(datafile)
516- except:
517- logging.debug(" Deb for %s could not be opened!" % name)
518- continue
519- found = set()
520- for icon in icons:
521- (res, n) = self.search_icon(tar, icon, self.menu_data)
522- if res == True:
523- logging.debug(" Icon %s found!" % icon)
524- found.add(icon)
525- # stop searching for any icons we find
526- icons = icons.difference(found)
527- if len(icons) == 0:
528- del self.pkgs_to_missing_icons[pkgname]
529- break
530- logging.info("missing icons left: '%s'" % self.pkgs_to_missing_icons)
531
532 def _calcArchSpecific(self):
533 # now add the architecture information
534@@ -358,54 +304,44 @@
535 return False
536
537 def search_icon(self, tarfile, iconName, outputdir):
538- if iconName == None:
539- logging.warning("search_icon() called with no icon name")
540- return (False, None)
541-
542- # a iconName can be a single name or a full path
543- # if it is a single name, look into a icon-theme path (usr/share/icons/hicolor) and then into usr/share/pixmaps
544- # if it is a full path just look for this
545-
546- # this is the "full-path" case
547- # FIXME: there are (some) icons that are not full pathes like "/usr/.../"
548- # but "zapping/zapping.png"
549- if "/" in iconName:
550- newIconName = iconName.replace("/", "_")
551- outpath = os.path.join(outputdir,"icons",newIconName)
552- # prevent wasted disk read
553- if os.path.exists(outpath):
554- return (True, newIconName)
555- res = self.extract_icon(tarfile, iconName, outpath)
556- return (res, newIconName)
557-
558- # this is the "get-it-from-a-icontheme" case, look into icon-theme hicolor and usr/share/pixmaps
559-
560- # search path (ordered by importance)
561- search_dirs = [
562- "usr/share/icons/hicolor/64x64",
563- "usr/share/icons/hicolor/48x48",
564- "usr/share/icons/hicolor/128x128",
565- "usr/share/pixmaps",
566- "usr/share/icons/hicolor/32x32",
567- "usr/share/icons/hicolor/22x22",
568- "usr/share/icons/hicolor/16x16",
569- "usr/share/icons"
570- ]
571- # extensions (ordered by importance)
572- pixmaps_ext = ["", ".png",".xpm",".svg"]
573-
574- # prevent wasted disk read
575- if os.path.exists(os.path.join(outputdir,"icons",iconName)):
576- return (True, None)
577- for d in search_dirs:
578- for name in tarfile.getnames():
579- if d in name:
580- for ext in pixmaps_ext:
581- if name.endswith(iconName+ext):
582- res = self.extract_icon(tarfile, name, os.path.join(outputdir,"icons", os.path.basename(name)))
583- return (res, None)
584- logging.warning("no icon: '%s' could be found" % iconName)
585- return (False, None)
586+ # ask IconFinder for an icon and its package
587+ results = self.iconFinder.search(iconName)
588+ if not results:
589+ logging.warning("Could not find package with icon '%s'" % iconName)
590+ return None
591+ logging.debug("Found icon and package %s" % str(results))
592+ pkgname, iconPath = results
593+
594+ # retrieve the package deb from the archive crawler
595+ if pkgname not in self.crawler.pkgs_to_debpath:
596+ logging.warning("Could not find deb '%s' for '%s'" %
597+ (pkgname, iconName))
598+ return None
599+ try:
600+ debPath = self.crawler.pkgs_to_debpath[pkgname]
601+ datafile = self._extractDebData(debPath)
602+ tar = tarfile.open(datafile)
603+ except Exception as e:
604+ logging.warning("Could not open deb '%s' for '%s'" %
605+ (pkgname, iconName, str(e)))
606+ return None
607+
608+ # extract the icon
609+ newIconName = iconName.replace("/", "_")
610+ outpath = os.path.join(outputdir, "icons", newIconName)
611+ success = self.extract_icon(tar, os.path.join(".", iconPath), outpath)
612+ os.remove(datafile)
613+ if not success:
614+ logging.warning("Could not extract '%s' from '%s' tarfile" %
615+ (iconPath, pkgname))
616+ return None
617+
618+ # validate it is an icon
619+ filetype = subprocess.check_output(["file", "-b", outpath]).strip()
620+ if filetype == "ASCII text":
621+ logging.warning("'%s' is not an icon" % iconPath)
622+ return None
623+ return newIconName
624
625 def tarfile_extract_orlog(self, dataFile, path):
626 try:
627@@ -493,12 +429,16 @@
628 line = string.strip(line)
629 if line.startswith("Icon="):
630 iconName = line[line.index("=")+1:]
631- logging.debug("Package '%s' needs icon '%s'" % (pkgname, iconName))
632- (res, newIconName) = self.search_icon(dataFile, iconName, outputdir)
633- if res == False:
634- if not pkgname in self.pkgs_to_missing_icons:
635- self.pkgs_to_missing_icons[pkgname] = set()
636- self.pkgs_to_missing_icons[pkgname].add(iconName)
637+ if not iconName:
638+ logging.debug("No icon needed for '%s'" %
639+ os.path.basename(path))
640+ newIconName = None
641+ else:
642+ logging.debug("Package '%s' needs icon '%s'" % (pkgname, iconName))
643+ newIconName = self.search_icon(dataFile, iconName, outputdir)
644+ if newIconName == None:
645+ logging.warning("Could not retrieve icon for '%s'" %
646+ os.path.basename(path))
647
648 # now check for supicious pkgnames (FIXME: make this not hardcoded)
649 if "-common" in pkgname or "-data" in pkgname:
650@@ -580,7 +520,14 @@
651 # extract it here, python tarfile does not support lzma
652 subprocess.call(["lzma","-d",datafile])
653 datafile = os.path.splitext(datafile)[0]
654- return datafile
655+ # make name unique
656+ datafile_new = datafile.replace("data", repr(time.time()))
657+ try:
658+ os.rename(datafile, datafile_new)
659+ except:
660+ logging.warning("Renaming tarball to %s failed" % datafile_new)
661+ return
662+ return datafile_new
663
664 def inspectDeb(self, crawler, filename, pkgname, ver, pkgarch, component):
665 """ check if the deb is interessting for us (not blacklisted) """
666
667=== added directory 'IconFinder'
668=== added file 'IconFinder/__init__.py'
669--- IconFinder/__init__.py 1970-01-01 00:00:00 +0000
670+++ IconFinder/__init__.py 2011-05-31 05:55:54 +0000
671@@ -0,0 +1,90 @@
672+import logging
673+
674+# Policies for searching icon directories.
675+BEST_ICON_POLICY, SMALL_ICON_POLICY = range(2)
676+
677+_SEARCH_DIRS = {
678+ # Finds better, but often extremely large icons. Favors SVG.
679+ BEST_ICON_POLICY: ["usr/share/icons/hicolor/scalable",
680+ "usr/share/icons/hicolor/128x128",
681+ "usr/share/icons/hicolor/256x256",
682+ "usr/share/icons/hicolor/64x64",
683+ "usr/share/icons/hicolor/48x48",
684+ "usr/share/pixmaps",
685+ "usr/share/icons/hicolor/32x32",
686+ "usr/share/icons/hicolor/22x22",
687+ "usr/share/icons/hicolor/16x16",
688+ "usr/share/icons/Humanity",
689+ "usr/share/icons",
690+ "usr/share"],
691+
692+ # Provides smaller icons. Favors PNG.
693+ SMALL_ICON_POLICY: ["usr/share/icons/hicolor/64x64",
694+ "usr/share/icons/hicolor/48x48",
695+ "usr/share/icons/hicolor/128x128",
696+ "usr/share/pixmaps",
697+ "usr/share/icons/hicolor/32x32",
698+ "usr/share/icons/hicolor/22x22",
699+ "usr/share/icons/hicolor/16x16",
700+ "usr/share/icons/Humanity",
701+ "usr/share/icons",
702+ "usr/share"]}
703+
704+
705+class IconFinder:
706+ """Searches for application icons according to a given policy."""
707+
708+ # When called to search for an icon, IconFinder looks in indexed
709+ # path prefixes in an order decided by policy.
710+
711+ def __init__(self, cache, policy=SMALL_ICON_POLICY):
712+ """
713+ @cache: An ArchiveCache instance to search for icons with.
714+ @policy: the priority policy to use when searching for icons
715+ Either BEST_ICON_POLICY or SMALL_ICON_POLICY (default)
716+ """
717+ self._set_cache(cache)
718+ self._set_policy(policy)
719+
720+ def search(self, icon_name):
721+ """
722+ Finds, according to policy, the best package name and icon
723+ path for `icon_name`.
724+ @icon_name: A valid Icon string for an XDG Desktop Entry.
725+ @returns ('packagename', '/path/to/icon') or None
726+ """
727+ if icon_name.startswith("/"):
728+ result = self._cache.search_exact(icon_name)
729+ if result:
730+ return (result, icon_name.strip("/"))
731+ else:
732+ # match any path ending with the name and an extension
733+ pattern = ".*/%s(.png|.xpm|.svg|)$" % icon_name
734+ for prefix in self._search_dirs:
735+ results = self._cache.search(pattern, prefix, first_only=True)
736+ if results:
737+ return results[0]
738+
739+ def _set_cache(self, cache):
740+ if not self.prefixes.issubset(set(cache.prefixes)):
741+ logging.warning("Using an ArchiveCache that doesn't index "
742+ "IconFinder's search prefixes")
743+ self._cache = cache
744+
745+ def _set_policy(self, policy):
746+ # Ensure the policy exists before assigning
747+ assert policy in (BEST_ICON_POLICY, SMALL_ICON_POLICY)
748+ self._policy = policy
749+ self._search_dirs = _SEARCH_DIRS[policy]
750+
751+ # A set of every path in the search directories
752+ prefixes = set(sum(_SEARCH_DIRS.values(), []))
753+
754+ cache = property(lambda self: self._cache, _set_cache, doc="""
755+ An ArchiveCache to search for icons with. It should index all
756+ file paths in `IconFinder.prefixes` for good performance.""")
757+
758+ policy = property(lambda self: self._policy, _set_policy, doc="""
759+ Determines what type and size of icon the IconFinder will
760+ search for first. Uses SMALL_ICON_POLICY by default. Switch to
761+ BEST_ICON_POLICY to favor large and scalable icons.""")
762
763=== removed file 'data/icon_search.cfg'
764--- data/icon_search.cfg 2011-04-14 09:16:47 +0000
765+++ data/icon_search.cfg 1970-01-01 00:00:00 +0000
766@@ -1,11 +0,0 @@
767-# these are regular expressions for finding packages with icons for a
768-# *.desktop file, when icons are not in the application package. Any
769-# string "{0}" will be formatted to the first hyphen-delimited term of
770-# an application package (eg "foo-bar-baz" -> "foo")
771-
772-# for cases like wesnoth/wesnoth-data
773-^%(first_term)s.+(data|common)$
774-
775-# for when environment applications use environmental icons
776-gnome-icon-theme
777-oxygen-icon-theme
778
779=== modified file 'getMenuData.py'
780--- getMenuData.py 2011-04-11 13:12:12 +0000
781+++ getMenuData.py 2011-05-31 05:55:54 +0000
782@@ -39,8 +39,10 @@
783 parser.add_option("--actionsdir", "--actionsdir", dest="actionsdir",
784 help="actionsdir",
785 default=apt_pkg.Config.Find("APT::Architecture"))
786- parser.add_option("-p", "--pooldir", dest="pooldir",
787- help="pooldir", default="/srv/archive.ubuntu.com/ubuntu/pool")
788+ parser.add_option("-a", "--archivedir", dest="archivedir",
789+ help="archivedir", default="/srv/archive.ubuntu.com/ubuntu")
790+ parser.add_option("--dist", dest="dist",
791+ help="dist", default="natty")
792 parser.add_option("-d", "--datadir", dest="datadir",
793 help="datadir", default="data/")
794 (options, args) = parser.parse_args()
795@@ -48,8 +50,9 @@
796
797 # now run it
798 desktop_extractor = DesktopDataExtractor(options.aptroot,
799- options.pooldir,
800- options.datadir)
801+ options.archivedir,
802+ options.datadir,
803+ options.dist)
804 desktop_extractor.extract()
805
806 logging.info("extraction finished")
807
808=== added directory 'tests/archive'
809=== added directory 'tests/archive/dists'
810=== added directory 'tests/archive/dists/testing'
811=== added file 'tests/archive/dists/testing/Contents_amd64'
812--- tests/archive/dists/testing/Contents_amd64 1970-01-01 00:00:00 +0000
813+++ tests/archive/dists/testing/Contents_amd64 2011-05-31 05:55:54 +0000
814@@ -0,0 +1,5 @@
815+
816+FILE LOCATION
817+usr/share/icons/hicolor/48x48/apps/audacious.png foo/audacious
818+usr/share/pixmaps/synaptic.png foo/synaptic
819+usr/lib/GNUstep/Applications/Cynthiune.app/Resources/Cynthiune.tiff foo/cynthiune.app
820
821=== added file 'tests/archive/dists/testing/Contents_i386'
822--- tests/archive/dists/testing/Contents_i386 1970-01-01 00:00:00 +0000
823+++ tests/archive/dists/testing/Contents_i386 2011-05-31 05:55:54 +0000
824@@ -0,0 +1,3 @@
825+FILE LOCATION
826+usr/share/icons/hicolor/48x48/apps/cheese.png foo/cheese-common
827+usr/share/pixmaps/python2.5.xpm foo/python2.5
828
829=== renamed directory 'tests/pool' => 'tests/archive/pool'
830=== modified file 'tests/archive/pool/get_pkgs.sh'
831--- tests/pool/get_pkgs.sh 2011-04-16 18:10:30 +0000
832+++ tests/archive/pool/get_pkgs.sh 2011-05-31 05:55:54 +0000
833@@ -57,3 +57,13 @@
834 cd universe/c/cynthiune.app
835 wget -c https://launchpad.net/ubuntu/+source/cynthiune.app/0.9.5-11ubuntu1/+buildjob/1962087/+files/cynthiune.app_0.9.5-11ubuntu1_amd64.deb
836 cd ../../..
837+
838+# zip contents files
839+cd ../dists/testing
840+if [ ! -e Contents-i386.gz ] || [ ! -e Contents-amd64.gz ]; then
841+ rm -f Contents-*
842+ cp Contents_i386 Contents-i386
843+ cp Contents_amd64 Contents-amd64
844+ gzip Contents-*
845+fi
846+cd ../../pool
847
848=== added file 'tests/data/Contents-i386.gz'
849Binary files tests/data/Contents-i386.gz 1970-01-01 00:00:00 +0000 and tests/data/Contents-i386.gz 2011-05-31 05:55:54 +0000 differ
850=== removed file 'tests/pool/main/h/hello/hello_2.1.1-4_i386.deb'
851Binary files tests/pool/main/h/hello/hello_2.1.1-4_i386.deb 2007-07-24 10:17:39 +0000 and tests/pool/main/h/hello/hello_2.1.1-4_i386.deb 1970-01-01 00:00:00 +0000 differ
852=== added file 'tests/test_archive_cache.py'
853--- tests/test_archive_cache.py 1970-01-01 00:00:00 +0000
854+++ tests/test_archive_cache.py 2011-05-31 05:55:54 +0000
855@@ -0,0 +1,96 @@
856+#!/usr/bin/env python
857+
858+import sys
859+import unittest
860+import subprocess
861+sys.path.insert(0, "../")
862+
863+from ArchiveCache import ArchiveCache
864+
865+# This data will need to be refreshed if Contents-i386.gz changes
866+CONTENTS_PATH = "data/Contents-i386.gz"
867+MD5_CHECKSUM = subprocess.check_output(["md5sum", CONTENTS_PATH]).split()[0]
868+INDEX_PATHS = ("usr/share/applications", "/usr/share/icons", "/usr/share/")
869+CONTENTS_LINES, INDEX_LINES = 258, (1, 12, 255)
870+
871+SEARCH_EXACT_PAIRS = {
872+ "/etc/dbus-1/system.d/com.ubuntu.SoftwareCenter.conf": "software-center",
873+ "/usr/bin/software-center": "software-center",
874+ "usr/share/icons/hicolor/32x32/apps/softwarecenter.png": "software-center",
875+ "/usr/share/xubuntu-docs/ubuntu-software-center.html": "xubuntu-docs",
876+ "/red/herring/path": None}
877+
878+SEARCH_PAIRS = {
879+ (".*SoftwareCenter.conf", ""): [("software-center",
880+ "etc/dbus-1/system.d/com.ubuntu.SoftwareCenter.conf")],
881+ (".*softwarecenter.svg", "usr/share"): [("software-center",
882+ "usr/share/icons/hicolor/scalable/apps/softwarecenter.svg")],
883+ (".*\.desktop$", "/usr/share/applications"): [("software-center",
884+ "usr/share/applications/ubuntu-software-center.desktop")],
885+ (".*software-center.html", "usr/share/"): [("xubuntu-docs",
886+ "usr/share/xubuntu-docs/ubuntu-software-center.html")]}
887+
888+
889+class ArchiveCacheTest(unittest.TestCase):
890+ """
891+ Tests the proper operation of ArchiveCache.
892+ """
893+ @classmethod
894+ def setUp(self):
895+ self.cache = ArchiveCache(CONTENTS_PATH, INDEX_PATHS)
896+
897+ def test_contents_checksum(self):
898+ """archive cache loads contents.gz files correctly"""
899+ self.assertEqual(self.cache.checksum, MD5_CHECKSUM)
900+
901+ def test_contents_length(self):
902+ """archive cache stores its length in lines"""
903+ self.assertTrue(hasattr(self.cache, "__len__"))
904+ self.assertEqual(len(self.cache), CONTENTS_LINES)
905+
906+ def test_index_length(self):
907+ """archive cache indexes store their length in lines"""
908+ for i, path in enumerate(INDEX_PATHS):
909+ self.assertEqual(self.cache.__len__(path), INDEX_LINES[i])
910+
911+ def test_index_checking(self):
912+ """archive cache allows idiomatic checking of index existance"""
913+ self.assertTrue(hasattr(self.cache, "__contains__"))
914+ self.assertTrue(hasattr(self.cache, "prefixes"))
915+ for path in INDEX_PATHS:
916+ self.assertTrue(path.strip("/") in self.cache.prefixes)
917+
918+ def test_search_exact(self):
919+ """archive cache can find an exact path from contents"""
920+ for arg, result in SEARCH_EXACT_PAIRS.items():
921+ self.assertEqual(self.cache.search_exact(arg), result)
922+
923+ def test_search_regex(self):
924+ """archive cache can find a regex result from contents"""
925+ for args, results in SEARCH_PAIRS.items():
926+ self.assertEqual(set(self.cache.search(*args)), set(results))
927+
928+ def test_first_only_search(self):
929+ """archive cache can stop at first result in search"""
930+ results = self.cache.search(".*softwarecenter(.png|.svg)$",
931+ "/usr/share/icons", first_only=True)
932+ self.assertEqual(len(results), 1)
933+
934+ def test_search_exact_symlinked(self):
935+ """archive cache should return partial path matches for symlinks"""
936+ # Contents.gz interprets symlinks as files. If we search for an exact
937+ # file via a symlinked directory, it should still work.
938+ result = self.cache.search_exact("usr/share/symlinked_dir/linked_file")
939+ self.assertEqual(result, "software-center")
940+
941+ def test_search_no_memory(self):
942+ """archive cache search should work without memory caching"""
943+ old_cache = self.cache
944+ self.cache = ArchiveCache(CONTENTS_PATH, INDEX_PATHS, memory=0)
945+ self.test_search_exact()
946+ self.test_search_regex()
947+ self.test_search_exact_symlinked()
948+ self.cache = old_cache
949+
950+if __name__ == "__main__":
951+ unittest.main()
952
953=== modified file 'tests/test_cmd_data_extractor.py'
954--- tests/test_cmd_data_extractor.py 2008-06-30 10:47:06 +0000
955+++ tests/test_cmd_data_extractor.py 2011-05-31 05:55:54 +0000
956@@ -26,7 +26,7 @@
957
958 def testExtractorSimple(self):
959 extractor = CommandDataExtractor("./aptroot",
960- "./pool",
961+ "./archive/pool",
962 "./data")
963 extractor.extract()
964 self.assert_(os.path.exists("./data/scan.data"))
965@@ -34,13 +34,13 @@
966
967 def testExtractorOrphan(self):
968 extractor = CommandDataExtractor("./aptroot",
969- "./pool",
970+ "./archive/pool",
971 "./data")
972 extractor.extract()
973 self.assert_(len(open("./data/scan.data").readlines()) > 2)
974 self.assert_("gnome-session-remove" in open(extractor.cmd_data_f).read())
975 # now simulate archive removal of a pkg
976- b = "./pool/main/g/gnome-session/gnome-session_2.14.1-0ubuntu11_i386.deb"
977+ b = "./archive/pool/main/g/gnome-session/gnome-session_2.22.1.1-0ubuntu2_i386.deb"
978 os.rename(b, b+".xxx")
979 # clean orphaned desktop files and make sure that the file really
980 # got removed
981@@ -51,7 +51,7 @@
982
983 def testExtractorCleanPkg(self):
984 extractor = CommandDataExtractor("./aptroot",
985- "./pool",
986+ "./archive/pool",
987 "./data")
988 # simulate superseeding
989 open("./data/scan.data","w").write("i386|main|gnome-session|gnome-session-remove,gnome-session-save,gnome-session-properties,x-session-manager\n")
990@@ -60,7 +60,7 @@
991 self.assert_("gnome-wm" in open(extractor.cmd_data_f).read())
992
993 if __name__ == "__main__":
994- subprocess.call(["(cd pool; ./get_pkgs.sh)"],shell=True)
995+ subprocess.call(["(cd archive/pool; ./get_pkgs.sh)"],shell=True)
996 logging.basicConfig(level=logging.DEBUG)
997 apt_pkg.init()
998 #unittest.main(defaultTest="testCommandDataExtractor.testExtractorOrphan")
999
1000=== modified file 'tests/test_cralwer.py'
1001--- tests/test_cralwer.py 2007-11-28 09:47:20 +0000
1002+++ tests/test_cralwer.py 2011-05-31 05:55:54 +0000
1003@@ -21,7 +21,7 @@
1004
1005 def setUp(self):
1006 # update the cache only once
1007- crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386")
1008+ crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386")
1009 crawler.updateCache()
1010 # remove pkgs-found file to make tests meaningful
1011 self.rm(self.pkgs_found_file)
1012@@ -31,7 +31,7 @@
1013 def callback_helper(crawler, debfile, pkg, ver, arch, component):
1014 self._i += 1
1015 return True
1016- crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386")
1017+ crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386")
1018 # test if we allow only callables
1019 try:
1020 crawler.registerCallback(self._i)
1021@@ -49,7 +49,7 @@
1022 del self._i
1023
1024 def testCrawlerSkipDone(self):
1025- crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386")
1026+ crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386")
1027 crawler.crawl()
1028 self.assert_(len(crawler.debfiles_done) > 0)
1029 # delete the found-pkgs marker file, crawl again
1030@@ -59,41 +59,41 @@
1031 self.rm(self.debfiles_done)
1032
1033 def testCrawlerSkipPersistant(self):
1034- crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386")
1035+ crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386")
1036 crawler.crawl()
1037 # delete the found-pkgs marker file, create new crawler, crawl again,
1038 # see if we skip the done files
1039 self.rm(self.pkgs_found_file)
1040- crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386")
1041+ crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386")
1042 crawler.crawl()
1043 self.assert_(not os.path.exists(self.pkgs_found_file))
1044
1045 def testFindOrphanedFiles(self):
1046- crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386")
1047+ crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386")
1048 crawler.crawl()
1049- crawler.debfiles_done.add("./pool/main/h/hello/hello_2.1.1-3_i386.deb")
1050+ crawler.debfiles_done.add("./archive/pool/main/h/hello/hello_2.1.1-3_i386.deb")
1051 print crawler.findOrphanedFiles()
1052 self.assert_(len(crawler.findOrphanedFiles()) == 1)
1053
1054 def testFindOrphanedPackages(self):
1055- crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386")
1056+ crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386")
1057 crawler.crawl()
1058 print crawler.findOrphanedPackages()
1059 self.assert_(len(crawler.findOrphanedPackages()) == 0)
1060- crawler.debfiles_done.add("./pool/main/h/hello/hello_2.1.1-3_i386.deb")
1061+ crawler.debfiles_done.add("./archive/pool/main/h/hello/hello_2.1.1-3_i386.deb")
1062 print crawler.findOrphanedPackages()
1063 self.assert_(len(crawler.findOrphanedPackages()) == 1)
1064
1065 def testFindObsoletedPackages(self):
1066 # FIXME: this test is pretty useless
1067- crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386")
1068+ crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386")
1069 crawler.crawl()
1070 #print "obsolete: ", crawler.findObsoletedPackages()
1071 self.assert_(len(crawler.findObsoletedPackages()) > 0)
1072
1073 def testArches(self):
1074 for arch in ('amd64', 'i386'):
1075- crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", arch)
1076+ crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", arch)
1077 crawler.updateCache()
1078 self.assert_(len(glob.glob(os.path.join(crawler.aptroot,"var/lib/apt/lists")+"/*%s*" % arch)) > 0)
1079
1080
1081=== modified file 'tests/test_deb_package.py'
1082--- tests/test_deb_package.py 2008-03-04 12:45:12 +0000
1083+++ tests/test_deb_package.py 2011-05-31 05:55:54 +0000
1084@@ -11,7 +11,7 @@
1085
1086 class testDebPackage(unittest.TestCase):
1087 def testDebPackage(self):
1088- name = "pool/main/g/git-core/git-core_1.5.4.3-1ubuntu1_i386.deb"
1089+ name = "archive/pool/main/g/git-core/git-core_1.5.4.3-1ubuntu2_i386.deb"
1090 pkg = CommandDataExtractor.data_extractor.load(name)
1091 print pkg
1092 print pkg.name
1093
1094=== modified file 'tests/test_desktop_data_extractor.py'
1095--- tests/test_desktop_data_extractor.py 2011-04-16 18:10:30 +0000
1096+++ tests/test_desktop_data_extractor.py 2011-05-31 05:55:54 +0000
1097@@ -13,74 +13,69 @@
1098
1099 from DesktopDataExtractor import DesktopDataExtractor
1100
1101+
1102 class TestDesktopDataExtractor(unittest.TestCase):
1103
1104 debfiles_done = "./data/desktop.p"
1105
1106- def rm(self, f):
1107- if os.path.exists(f):
1108- os.unlink(f)
1109- def setUp(self):
1110- self.rm(self.debfiles_done)
1111+ @classmethod
1112+ def setUpClass(self):
1113+ if os.path.exists(self.debfiles_done):
1114+ os.unlink(self.debfiles_done)
1115 try:
1116 shutil.rmtree("./data/menu-data")
1117 except:
1118 pass
1119+ self.extractor = DesktopDataExtractor("./aptroot",
1120+ "./archive",
1121+ "./data",
1122+ "testing")
1123+ self.extractor.extract()
1124
1125 def test_extractor_cheese_common(self):
1126- extractor = DesktopDataExtractor("./aptroot-lucid",
1127- "./pool",
1128- "./data")
1129- extractor.extract()
1130 # check if icon extraction works
1131- self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"icons", "cheese.png")))
1132-
1133+ self.extractor = DesktopDataExtractor("./aptroot-lucid",
1134+ "./archive",
1135+ "./data",
1136+ "testing")
1137+ self.extractor.extract()
1138+ self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"icons", "cheese")))
1139+
1140 def test_symlinked_icons(self):
1141- extractor = DesktopDataExtractor("./aptroot-maverick",
1142- "./pool",
1143- "./data")
1144- extractor.extract()
1145 # check if symlinked (a) icon (b) directory extraction work
1146- self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"icons", "audacious.png")))
1147- self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"icons", "_usr_lib_GNUstep_Applications_Cynthiune.app_Resources_Cynthiune.tiff")))
1148+ self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"icons", "audacious")))
1149+ self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"icons", "_usr_lib_GNUstep_Applications_Cynthiune.app_Resources_Cynthiune.tiff")))
1150
1151 def test_extractor_simple(self):
1152- extractor = DesktopDataExtractor("./aptroot",
1153- "./pool",
1154- "./data")
1155- extractor.extract()
1156 # see if extraction works
1157- self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"synaptic.desktop")))
1158+ self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"synaptic.desktop")))
1159 # see if lzma works
1160- self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"ooo-math.desktop")))
1161+ self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"ooo-math.desktop")))
1162 # gnome-about is blacklisted
1163- self.assertTrue(not os.path.exists(os.path.join(extractor.menu_data,"gnome-about.desktop")))
1164+ self.assertTrue(not os.path.exists(os.path.join(self.extractor.menu_data,"gnome-about.desktop")))
1165 # check if icon extraction works
1166- self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"icons", "_usr_share_pixmaps_python2.5.xpm")))
1167+ self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"icons", "_usr_share_pixmaps_python2.5.xpm")))
1168
1169 def test_extractor_simple_maverick(self):
1170 extractor = DesktopDataExtractor("./aptroot-maverick",
1171- "./pool",
1172- "./data")
1173+ "./archive",
1174+ "./data",
1175+ "testing")
1176 extractor.extract()
1177 # check if we have baobab
1178- self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"baobab.desktop")))
1179+ self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"baobab.desktop")))
1180
1181
1182 def test_extractor_orphan(self):
1183- extractor = DesktopDataExtractor("./aptroot",
1184- "./pool",
1185- "./data")
1186- extractor.extract()
1187- self.assert_(os.path.exists(os.path.join(extractor.menu_data,"session-properties.desktop")))
1188+ self.assert_(os.path.exists(os.path.join(self.extractor.menu_data,"session-properties.desktop")))
1189
1190 # now simulate archive removal of a pkg
1191- b = "./pool/main/g/gnome-session/gnome-session_2.22.1.1-0ubuntu2_i386.deb"
1192+ b = "./archive/pool/main/g/gnome-session/gnome-session_2.22.1.1-0ubuntu2_i386.deb"
1193 os.rename(b, b+".xxx")
1194 # clean orphaned desktop files and make sure that the file really
1195 # got removed
1196- extractor.extract()
1197- self.assert_(not os.path.exists(os.path.join(extractor.menu_data,"session-properties.desktop")))
1198+ self.extractor.extract()
1199+ self.assert_(not os.path.exists(os.path.join(self.extractor.menu_data,"session-properties.desktop")))
1200 os.rename(b+".xxx", b)
1201
1202 if __name__ == "__main__":
1203@@ -88,6 +83,6 @@
1204 logging.basicConfig(level=logging.DEBUG)
1205 else:
1206 logging.basicConfig(level=logging.INFO)
1207- subprocess.call(["(cd pool; ./get_pkgs.sh)"],shell=True)
1208+ subprocess.call(["(cd archive/pool; ./get_pkgs.sh)"],shell=True)
1209 apt_pkg.init()
1210 unittest.main()
1211
1212=== added file 'tests/test_icon_finder.py'
1213--- tests/test_icon_finder.py 1970-01-01 00:00:00 +0000
1214+++ tests/test_icon_finder.py 2011-05-31 05:55:54 +0000
1215@@ -0,0 +1,44 @@
1216+#!/usr/bin/env python
1217+
1218+import sys
1219+import unittest
1220+sys.path.insert(0, "../")
1221+
1222+from ArchiveCache import ArchiveCache
1223+from IconFinder import IconFinder
1224+
1225+# This data will need to be refreshed as Contents-i386.gz changes
1226+CONTENTS_PATH = "data/Contents-i386.gz"
1227+ICONS = (
1228+ "usr/share/icons/hicolor/128x128/apps/softwarecenter.png",
1229+ "usr/share/icons/hicolor/16x16/apps/softwarecenter.png",
1230+ "usr/share/icons/hicolor/22x22/apps/softwarecenter.png",
1231+ "usr/share/icons/hicolor/24x24/apps/ppa.svg",
1232+ "usr/share/icons/hicolor/24x24/apps/softwarecenter.png",
1233+ "usr/share/icons/hicolor/24x24/apps/unknown-channel.svg",
1234+ "usr/share/icons/hicolor/32x32/apps/softwarecenter.png",
1235+ "usr/share/icons/hicolor/48x48/apps/softwarecenter.png",
1236+ "usr/share/icons/hicolor/64x64/apps/softwarecenter.png",
1237+ "usr/share/icons/hicolor/scalable/apps/category-show-all.svg",
1238+ "usr/share/icons/hicolor/scalable/apps/partner.svg",
1239+ "usr/share/icons/hicolor/scalable/apps/softwarecenter.svg")
1240+
1241+
1242+class IconFinderTest(unittest.TestCase):
1243+ """
1244+ Tests the proper operation of IconFinder.
1245+ """
1246+ def setUp(self):
1247+ self.cache = ArchiveCache(CONTENTS_PATH, IconFinder.prefixes)
1248+
1249+ def test_search(self):
1250+ finder = IconFinder(self.cache)
1251+ # results for faulty search
1252+ self.assertEqual(finder.search("foobarbaz"), None)
1253+ # results for good search
1254+ package, path = finder.search("softwarecenter")
1255+ self.assertEqual(package, "software-center")
1256+ self.assertTrue(path in ICONS)
1257+
1258+if __name__ == "__main__":
1259+ unittest.main()
