Merge lp:~jjed/archive-crawler/use-aptfile into lp:~mvo/archive-crawler/mvo
| Status | Merged |
|---|---|
| Merged at revision | 124 |
| Proposed branch | lp:~jjed/archive-crawler/use-aptfile |
| Merge into | lp:~mvo/archive-crawler/mvo |
| To merge this branch | bzr merge lp:~jjed/archive-crawler/use-aptfile |

Diff against target: 1259 lines (+704/-202), 15 files modified:

- ArchiveCache/__init__.py (+313/-0)
- ArchiveCrawler/__init__.py (+13/-6)
- DesktopDataExtractor/__init__.py (+74/-127)
- IconFinder/__init__.py (+90/-0)
- data/icon_search.cfg (+0/-11)
- getMenuData.py (+7/-4)
- tests/archive/dists/testing/Contents_amd64 (+5/-0)
- tests/archive/dists/testing/Contents_i386 (+3/-0)
- tests/archive/pool/get_pkgs.sh (+10/-0)
- tests/test_archive_cache.py (+96/-0)
- tests/test_cmd_data_extractor.py (+5/-5)
- tests/test_cralwer.py (+11/-11)
- tests/test_deb_package.py (+1/-1)
- tests/test_desktop_data_extractor.py (+32/-37)
- tests/test_icon_finder.py (+44/-0)

Related bugs:
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Michael Vogt | | | Pending |
Review via email: mp+62944@code.launchpad.net |
Commit message
Description of the change
Apologies for the huge patch. I abstracted IconFinder out, and that
ballooned the line count of this solution quite a bit.
SUMMARY
This branch adds a new component, `IconFinder`, which should solve the
issue of finding icons in an archive. It is a frontend for `ArchiveCache`,
a general solution for searching archive files via Contents-<ARCH>.gz.
In the future, `ArchiveCache` might replace `ArchiveCrawler` as a cleaner,
quicker, and more modularity-friendly solution. As it stands, only `IconFinder` uses it.
DESCRIPTION
`ArchiveCache` loads all lines from a Contents.gz file and, depending on
memory constraints, either stores them in RAM or writes them out to cache
files. It provides methods for searching those lines for files and paths.
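Because Contents lines are sorted by path, exact lookups can use a plain binary search over the loaded lines. A minimal sketch of that idea (the function name and sample data are illustrative; the branch's `_search_lines_exact` is the Python 2 original, this version just drops `cmp()`):

```python
def search_lines_exact(search_term, lines):
    # Binary search over Contents lines of the form
    # "path/to/file  section/package", sorted by path.
    # Returns the matching line, or None if the path is absent.
    low, high = 0, len(lines)
    while high > low:
        mid = (low + high) // 2
        first_term = lines[mid].split()[0]
        if first_term == search_term:
            return lines[mid]
        if search_term < first_term:
            high = mid
        else:
            low = mid + 1
    return None
```

This keeps exact lookups at O(log n) even for a full-archive Contents file, instead of a linear grep.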
Its major feature over a simple grep is the separate caching of index prefixes,
allowing a frontend to quickly search only the files in a certain directory
(eg /usr/share/icons). `IconFinder` is a simple wrapper for this functionality,
searching a sequence of prefixes for a match.
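The prefix-index idea can be sketched in a few lines: bucket each Contents line under the first matching prefix, with a catch-all for everything else, so a later search only scans one small bucket. (This is a hypothetical stand-alone sketch of the concept; `ArchiveCache` additionally handles memory limits and cache files.)

```python
def bucket_by_prefix(lines, prefixes):
    # Split Contents lines ("path  section/package") into one index
    # per prefix, plus a catch-all list for unmatched paths.
    indexes = {p: [] for p in prefixes}
    rest = []
    for line in lines:
        path = line.split()[0]
        for p in prefixes:
            if path.startswith(p):
                indexes[p].append(line)
                break
        else:
            rest.append(line)
    return indexes, rest
```

A query with `prefix="usr/share/icons"` then touches only the lines in that bucket rather than the whole archive listing.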
`IconFinder` integration replaces most of my earlier icon-search work in
`DesktopDataExtractor`.
`ArchiveCrawler` now finds all deb paths before crawling, so that
`IconFinder` has access to any deb from the start.
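The index-then-inspect split amounts to a cheap first pass over the pool that maps package names to deb paths before anything is opened. A sketch of the idea (names here are illustrative, not the branch's exact API):

```python
import os
import re

def index_debs(pooldir):
    # First pass: walk the pool and record package name -> deb path
    # without opening any deb, so every deb is reachable before the
    # per-package inspection pass starts.
    pkgs_to_debpath = {}
    for root, dirs, files in os.walk(pooldir):
        for f in files:
            m = re.match(r"(.+)_(.+)_(.+)\.deb$", f)
            if m:
                pkgs_to_debpath[m.group(1)] = os.path.join(root, f)
    return pkgs_to_debpath
```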
`DesktopDataExtractor` now takes an archive directory instead of a pool
directory, with the pool directory found instead at `archive_dir`/pool and the
contents file at `archive_dir`/dists/<dist>/Contents-<ARCH>.gz.
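Assuming the standard Debian archive layout, locating the per-arch Contents file from the archive directory is a one-liner (a sketch mirroring the path construction in the diff; argument names are illustrative):

```python
import os.path

def contents_path(archive_dir, dist, arch):
    # e.g. <archive_dir>/dists/testing/Contents-i386.gz
    return os.path.join(archive_dir, "dists", dist, "Contents-%s.gz" % arch)
```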
TESTING
Out of 361 desktop entries I tested (an old main[a-k] + universe[2-d]) with
343 icon requests, 13 failed. Of these, 11 failed because they requested
non-existent icons, 1 failed because it requested an icon that was too large
('arista'), and 1 failed because it stored its icon in /usr/lib without a
symlink in /usr/share or an absolute-path request ('batmon.app').
All unit tests that passed as of r123 still pass, along with new ones for
`IconFinder` and `ArchiveCache`.
FUTURE
Hopefully this branch -- together with a solution for extracting large icons
-- will solve #599535. With the exception of arista, all icons that fail to
extract should now be the fault of the package. Hopefully. We'll see if the
full archive has any interesting edge cases.
Preview Diff
1 | === added directory 'ArchiveCache' |
2 | === added file 'ArchiveCache/__init__.py' |
3 | --- ArchiveCache/__init__.py 1970-01-01 00:00:00 +0000 |
4 | +++ ArchiveCache/__init__.py 2011-05-31 05:55:54 +0000 |
5 | @@ -0,0 +1,313 @@ |
6 | +import gzip |
7 | +import hashlib |
8 | +import logging |
9 | +import os.path |
10 | +import re |
11 | +import tempfile |
12 | + |
13 | + |
14 | +class ArchiveCache(object): |
15 | + """ |
16 | + This class caches the file paths in a deb archive--as described by |
17 | + that archive's Contents-<ARCH>.gz--to make search of them efficient. |
18 | + It can also index subsets of the archives (eg all paths beginning |
19 | + with /usr/share/icons/) for quicker informed searches. |
20 | + |
21 | + Indexes are created at instantiation via the prefixes_to_index |
22 | + argument. They can then be searched simply by using the `prefix` |
23 | + argument in search methods. |
24 | + |
25 | + In the event that ArchiveCache runs out of memory to allocate, it |
26 | + will prioritize the indexes in order, and write all excess data |
27 | + to cache files. |
28 | + """ |
29 | + |
30 | + def __init__(self, contents_file, prefixes_to_index=[], memory=500): |
31 | + """ |
32 | + @contents_file: The path to a Contents-<ARCH>.gz file. |
33 | + @prefixes_to_index: A list of path prefixes (eg /usr/share is a |
34 | + prefix of /usr/share/*) to separately cache |
35 | + and make searchable for better speeds. |
36 | + @memory: The maximum amount of data in the contents_file (in MB) |
37 | + that the cache will load into memory. |
38 | + """ |
39 | + self._contents_file = contents_file |
40 | + self._memory, self._max_memory = 0, memory * 1024 * 1024 |
41 | + self._num_lines = 0 |
42 | + self._checksum = None |
43 | + self._create_indexes(prefixes_to_index) |
44 | + self._load_cache() |
45 | + |
46 | + def search(self, regex, prefix="", first_only=False): |
47 | + """ |
48 | + Returns a list of (package name, path) pairs, where "path" is a |
49 | + file location in the archive that the search term matches, and |
50 | + "package name" is the name of a package that contains it. |
51 | + |
52 | + @regex: A string regular expression to filter paths with. |
53 | + @prefix: A string path from prefix_indexes. Only paths starting |
54 | + with the prefix will be searched. Speeds up queries. |
55 | + @first_only: Set True to search only until finding one hit. |
56 | + """ |
57 | + prefix = prefix.strip("/") |
58 | + if prefix: |
59 | + index = self._indexes[prefix] |
60 | + logging.debug("Searching cache for all files '%s' with prefix %s" % |
61 | + (regex, prefix)) |
62 | + else: |
63 | + index = self._contents_index |
64 | + logging.debug("Searching cache for all files '%s'" % regex) |
65 | + results = index.search(regex, first_only) |
66 | + if results and first_only: |
67 | + return results |
68 | + for p in self._prefixes: |
69 | + if p.startswith(prefix) and p != prefix: |
70 | + child_index = self._indexes[p] |
71 | + results.extend(child_index.search(regex, first_only)) |
72 | + if results and first_only: |
73 | + return results |
74 | + return results |
75 | + |
76 | + def search_exact(self, path): |
77 | + """ |
78 | + Returns the name of a package that contains the exact `path` |
79 | + location, or None if `path` does not exist in the archive. |
80 | + """ |
81 | + path = path.strip("/") |
82 | + logging.debug("Searching cache for '%s'" % path) |
83 | + prefix = self._find_index(path) |
84 | + if prefix: |
85 | + return self._indexes[prefix].search_exact(path) |
86 | + return self._contents_index.search_exact(path) |
87 | + |
88 | + def _create_indexes(self, prefixes): |
89 | + self._indexes = {} |
90 | + self._prefixes = tuple([p.strip("/") for p in prefixes]) |
91 | + # indexes for user-defined paths like '/usr/share/icons' |
92 | + for prefix in self._prefixes: |
93 | + self._indexes[prefix] = _Index() |
94 | + # an index for lines of all other paths |
95 | + self._contents_index = _Index() |
96 | + |
97 | + def _load_cache(self): |
98 | + logging.info("Loading archive paths from %s into cache" % |
99 | + os.path.basename(self._contents_file)) |
100 | + contents = gzip.open(self._contents_file) |
101 | + while not re.match("^FILE\W+LOCATION", contents.readline()): |
102 | + continue |
103 | + for line in contents: |
104 | + self._memory += self._place_in_index(line) |
105 | + if self._memory > self._max_memory: |
106 | + self._reduce_memory_use() |
107 | + self._num_lines += 1 |
108 | + contents.close() |
109 | + logging.info("Done loading cache. %iMB in active memory." % |
110 | + (self._memory / 1048576)) |
111 | + |
112 | + def _place_in_index(self, line): |
113 | + # put a line in its right index and return the amount of memory it took |
114 | + path = line.split()[0] |
115 | + prefix = self._find_index(path) |
116 | + if prefix: |
117 | + return self._indexes[prefix].add_line(line) |
118 | + else: |
119 | + return self._contents_index.add_line(line) |
120 | + |
121 | + def _find_index(self, path): |
122 | + # get the longest index path prefixing the given path |
123 | + candidate_index = "" |
124 | + for prefix in self._prefixes: |
125 | + if path.startswith(prefix) and len(prefix) > len(candidate_index): |
126 | + candidate_index = prefix |
127 | + return candidate_index |
128 | + |
129 | + def _reduce_memory_use(self): |
130 | + if self._contents_index.bytes_loaded: |
131 | + logging.info("* Max cache load exceeded. Removing non-indexed" |
132 | + " paths from memory") |
133 | + self._contents_index.flush_memory() |
134 | + return |
135 | + for prefix in reversed(self._prefixes): |
136 | + if self._indexes[prefix].bytes_loaded: |
137 | + logging.info("Max cache load exceeded. Removing '%s' index " |
138 | + "from memory" % prefix) |
139 | + self._indexes[prefix].flush_memory() |
140 | + return |
141 | + |
142 | + def __len__(self, prefix=""): |
143 | + """ |
144 | + Returns the number of paths searchable in this cache. Assign |
145 | + `prefix` to count the paths in a particular index. |
146 | + """ |
147 | + if not prefix: |
148 | + return self._num_lines |
149 | + else: |
150 | + # the length of the prefix and its children (eg /usr/ -> /usr/share) |
151 | + prefix = prefix.strip("/") |
152 | + num_lines = self._indexes[prefix].num_lines |
153 | + for p in self._prefixes: |
154 | + if p.startswith(prefix + "/"): |
155 | + num_lines += self._indexes[p].num_lines |
156 | + return num_lines |
157 | + |
158 | + def __contains__(self, prefix): |
159 | + return prefix.strip("/") in self._prefixes |
160 | + |
161 | + def _get_checksum(self): |
162 | + if not self._checksum: |
163 | + contents = open(self._contents_file, 'r') |
164 | + data = contents.read() |
165 | + contents.close() |
166 | + self._checksum = hashlib.md5(data).hexdigest() |
167 | + return self._checksum |
168 | + |
169 | + checksum = property(_get_checksum, doc=""" |
170 | + An MD5 hash of this cache's Contents.gz file""") |
171 | + |
172 | + prefixes = property(lambda self: self._prefixes, doc=""" |
173 | + The path prefixes that the cache has indexed for searching""") |
174 | + |
175 | + |
176 | +class _Index: |
177 | + # a helper implementation class for indexes: cached lines all beginning |
178 | + # with the same prefix |
179 | + def __init__(self): |
180 | + self.bytes_loaded = 0 |
181 | + self.num_lines = 0 |
182 | + self._file = tempfile.TemporaryFile() |
183 | + self._lines = [] |
184 | + self._using_memory = True |
185 | + |
186 | + def add_line(self, line): |
187 | + # add a line to the cache file, and memory if possible |
188 | + self.num_lines += 1 |
189 | + self._file.write(line) |
190 | + if self._using_memory: |
191 | + to_store = line.strip() |
192 | + self._lines.append(to_store) |
193 | + bytes_added = len(to_store) |
194 | + self.bytes_loaded += bytes_added |
195 | + return bytes_added |
196 | + return 0 |
197 | + |
198 | + def flush_memory(self): |
199 | + # stop caching lines in memory |
200 | + self._lines = None |
201 | + self.bytes_loaded = 0 |
202 | + self._using_memory = False |
203 | + |
204 | + def search_exact(self, path): |
205 | + if self._using_memory: |
206 | + line = _search_lines_exact(path, self._lines) |
207 | + else: |
208 | + line = _search_file_exact(path, self._file) |
209 | + if not line: |
210 | + return None |
211 | + # the right-most part of a Contents file is a package for the path |
212 | + parts = line.rsplit("/", 1) |
213 | + if len(parts) != 2: |
214 | + logging.warning("Invalid contents line: %s" % line) |
215 | + return None |
216 | + return parts[1].strip() |
217 | + |
218 | + def search(self, regex, first_only=False): |
219 | + lines = _search_pattern(regex, self._file, self._lines, first_only) |
220 | + results = [] |
221 | + for line in lines: |
222 | + parts = line.split() |
223 | + if len(parts) != 2: |
224 | + logging.warning("Invalid contents line: %s" % line) |
225 | + continue |
226 | + package_data = parts[1].rsplit("/", 1) |
227 | + if len(package_data) != 2: |
228 | + logging.warning("Invalid contents line: %s" % line) |
229 | + continue |
230 | + path, package_name = parts[0], package_data[1] |
231 | + results.append((package_name, path)) |
232 | + return results |
233 | + |
234 | + |
235 | +# Utility functions for quickly searching files according to the |
236 | +# following line format used by Contents-<ARCH>: |
237 | +# |
238 | +# absolute/path/to/file section/cat/pkgname |
239 | + |
240 | +def _search_file_exact(search_term, file): |
241 | + # Searches the file at `file` for a line starting with |
242 | + # `search_term`. Assumes lines are in alphabetical order. Returns |
243 | + # None if there is no match. |
244 | + file.seek(0, 2) |
245 | + low, high = 0, file.tell() |
246 | + |
247 | + # perform a binary search on the file |
248 | + while high > low: |
249 | + mid = (low + high) / 2 |
250 | + file.seek(mid) |
251 | + |
252 | + # read current line from the beginning |
253 | + while file.tell() != 0 and file.read(1) != '\n': |
254 | + file.seek(-2, os.SEEK_CUR) |
255 | + line_begin = file.tell() |
256 | + line = file.readline() |
257 | + line_end = file.tell() |
258 | + |
259 | + first_term = line.split()[0] |
260 | + if search_term.startswith(first_term): |
261 | + # return if first term matches search term |
262 | + if len(search_term) == len(first_term): |
263 | + return line |
264 | + # or if the first term looks like a directory (symlink) |
265 | + elif (len(search_term) > len(first_term) and |
266 | + search_term[len(first_term)] == "/"): |
267 | + return line |
268 | + # otherwise continue the binary search |
269 | + match = cmp(search_term, first_term) |
270 | + if match < 0: |
271 | + high = line_begin |
272 | + elif match > 0: |
273 | + low = line_end |
274 | + else: |
275 | + return line |
276 | + |
277 | + |
278 | +def _search_lines_exact(search_term, lines): |
279 | + # Searches `lines` for a line starting with `search_term`. Assumes |
280 | + # lines are in alphabetical order. Returns None if there is no match. |
281 | + low, high = 0, len(lines) |
282 | + while high > low: |
283 | + mid = (low + high) / 2 |
284 | + first_term = lines[mid].split()[0] |
285 | + if search_term.startswith(first_term): |
286 | + # return if first term matches search term |
287 | + if len(search_term) == len(first_term): |
288 | + return lines[mid] |
289 | + # or if the first term looks like a directory (symlink) |
290 | + elif (len(search_term) > len(first_term) and |
291 | + search_term[len(first_term)] == "/"): |
292 | + return lines[mid] |
293 | + # otherwise continue the binary search |
294 | + match = cmp(search_term, first_term) |
295 | + if match < 0: |
296 | + high = mid |
297 | + elif match > 0: |
298 | + low = mid + 1 |
299 | + else: |
300 | + return lines[mid] |
301 | + |
302 | + |
303 | +def _search_pattern(search_pattern, file, lines=None, first_only=False): |
304 | + # Searches the file at `file` or the `line_list` for all lines where |
305 | + # the first term (whitespace delimited) matches the `search_pattern` |
306 | + # regex string. Use `lines` if they are already loaded. |
307 | + pattern = re.compile(search_pattern.replace('+', '\+')) |
308 | + results = [] |
309 | + if lines == None: |
310 | + lines = file |
311 | + lines.seek(0) |
312 | + for line in lines: |
313 | + first_term = line.split()[0] |
314 | + if pattern.match(first_term): |
315 | + results.append(line) |
316 | + if first_only: |
317 | + return tuple(results) |
318 | + return tuple(results) |
319 | |
320 | === modified file 'ArchiveCrawler/__init__.py' |
321 | --- ArchiveCrawler/__init__.py 2011-04-13 13:15:17 +0000 |
322 | +++ ArchiveCrawler/__init__.py 2011-05-31 05:55:54 +0000 |
323 | @@ -48,6 +48,7 @@ |
324 | self._loadDebFilesDone() |
325 | self.callbacks = set() |
326 | self.pkgs_to_debpath = {} |
327 | + self.pkgs_to_debinfo = {} |
328 | |
329 | def registerCallback(self, c): |
330 | if not callable(c): |
331 | @@ -93,8 +94,8 @@ |
332 | logging.debug("adding '%s' to debfiles_done" % debfile) |
333 | self.debfiles_done.add(debfile) |
334 | |
335 | - def inspectDeb(self, debfile): |
336 | - #logging.debug("inspectDeb %s" % debfile) |
337 | + def indexDeb(self, debfile): |
338 | + # add debfile to self.pkgs_to_debpath if it's valid for this crawler |
339 | m = re.match(".*/(.*)_(.*)_(.*).deb", debfile) |
340 | pkgname = m.group(1) |
341 | pkgver = urllib.unquote(m.group(2)) |
342 | @@ -130,16 +131,20 @@ |
343 | logging.debug("skipping, compoent does not match (expected '%s' got '%s' "% (component, debfile)) |
344 | return False |
345 | # add to mapping of name and deb |
346 | - # it may be needed if it contains an application's icon |
347 | self.pkgs_to_debpath[pkgname] = debfile |
348 | - # ... then filter if done already |
349 | + self.pkgs_to_debinfo[pkgname] = (candVer, pkgarch, component) |
350 | + return True |
351 | + |
352 | + def inspectDeb(self, pkgname): |
353 | + debfile = self.pkgs_to_debpath[pkgname] |
354 | + candVer, pkgarch, component = self.pkgs_to_debinfo[pkgname] |
355 | + # ... filter if done already |
356 | if debfile in self.debfiles_done: |
357 | logging.debug("skipping, already in debfiles_done '%s'" % debfile) |
358 | return False |
359 | |
360 | # looks like we have a valid ver |
361 | logging.debug("found valid deb: '%s'" % debfile) |
362 | - |
363 | |
364 | # pass the epoch here too, this information is not encoded in the |
365 | # filename |
366 | @@ -152,7 +157,9 @@ |
367 | for (root, dirs, files) in os.walk(self.pooldir): |
368 | for f in files: |
369 | if f.endswith(".deb"): |
370 | - self.inspectDeb(os.path.join(root,f)) |
371 | + self.indexDeb(os.path.join(root, f)) |
372 | + for pkg in self.pkgs_to_debpath: |
373 | + self.inspectDeb(pkg) |
374 | self._saveDebFilesDone() |
375 | |
376 | def findOrphanedFiles(self): |
377 | |
378 | === modified file 'DesktopDataExtractor/__init__.py' |
379 | --- DesktopDataExtractor/__init__.py 2011-04-19 13:54:07 +0000 |
380 | +++ DesktopDataExtractor/__init__.py 2011-05-31 05:55:54 +0000 |
381 | @@ -19,6 +19,8 @@ |
382 | import time |
383 | |
384 | import ArchiveCrawler |
385 | +import ArchiveCache |
386 | +import IconFinder |
387 | |
388 | try: |
389 | # only available on 2.7 so we provide a very simple backport |
390 | @@ -30,16 +32,19 @@ |
391 | |
392 | SUPPORTED_ARCHES = ("i386","amd64") |
393 | |
394 | - def __init__(self, aptroot, pooldir, datadir): |
395 | + def __init__(self, aptroot, archivedir, datadir, dist): |
396 | # init dirs |
397 | self.aptroot = aptroot |
398 | - self.pooldir = pooldir |
399 | + self.archivedir = archivedir |
400 | + self.pooldir = os.path.join(archivedir, 'pool') |
401 | self.datadir = datadir |
402 | + self.dist = dist |
403 | self.tmpdir = tempfile.mkdtemp() |
404 | self.menu_data = os.path.join(self.datadir, "menu-data") |
405 | self.menu_data_codecs = os.path.join(self.datadir, "menu-data-codecs") |
406 | if not os.path.exists(self.menu_data_codecs): |
407 | raise Exception, "no menu-data-codecs/ dir in %s" % datadir |
408 | + |
409 | |
410 | self.codecs_foradditional = { } |
411 | # packages we have already seen |
412 | @@ -62,12 +67,6 @@ |
413 | # available in certain arches |
414 | self.pkgs_per_arch = {} |
415 | self.pkgs_per_arch["all"] = set() |
416 | - # a mapping of package names to wanted application icons their |
417 | - # packages don't contain |
418 | - self.pkgs_to_missing_icons = {} |
419 | - # regular expressions for finding packages that might contain |
420 | - # wanted icons |
421 | - self.iconsearch_regex = [] |
422 | # now read the config |
423 | self._readConfig() |
424 | |
425 | @@ -89,7 +88,6 @@ |
426 | blacklist_desktop = os.path.join(self.datadir,"blacklist_desktop.cfg") |
427 | renamecfg = os.path.join(self.datadir,"rename.cfg") |
428 | annotatecfg = os.path.join(self.datadir,"annotate.cfg") |
429 | - iconsearchcfg = os.path.join(self.datadir,"icon_search.cfg") |
430 | if os.path.exists(blacklist): |
431 | logging.info("using blacklist: '%s'" % blacklist) |
432 | for line in open(blacklist).readlines(): |
433 | @@ -124,14 +122,6 @@ |
434 | annotations = annotations_str.split(",") |
435 | logging.debug("annotations: '%s': %s" % (desktopfile,annotations)) |
436 | self.desktop_annotate[desktopfile] = annotations |
437 | - if os.path.exists(iconsearchcfg): |
438 | - logging.info("using icon search: '%s'" % iconsearchcfg) |
439 | - for line in open(iconsearchcfg): |
440 | - line = line.strip() |
441 | - if line != "" and not line.startswith("#"): |
442 | - logging.debug("icon search regex: '%s'" % line) |
443 | - self.iconsearch_regex.append(line) |
444 | - |
445 | |
446 | |
447 | def extract(self): |
448 | @@ -144,8 +134,19 @@ |
449 | self._cleanOrphans() |
450 | |
451 | logging.info("Starting extraction in %s" % self.pooldir) |
452 | + contents = {} |
453 | + for arch in self.SUPPORTED_ARCHES: |
454 | + contents_file = "Contents-%s.gz" % arch |
455 | + contents_path = os.path.join(self.archivedir, "dists", |
456 | + self.dist, contents_file) |
457 | + contents[arch] = contents_path |
458 | + assert os.path.exists(contents_path), "Cannot find Contents at %s" % contents_path |
459 | + prefixes = IconFinder.IconFinder.prefixes |
460 | for arch in self.SUPPORTED_ARCHES: |
461 | logging.debug("looking at '%s'" % arch) |
462 | + self.archiveCache = ArchiveCache.ArchiveCache(contents[arch], |
463 | + prefixes) |
464 | + self.iconFinder = IconFinder.IconFinder(self.archiveCache) |
465 | self.pkgs_per_arch[arch] = set() |
466 | self.crawler = ArchiveCrawler.ArchiveCrawler(self.aptroot, |
467 | self.pooldir, |
468 | @@ -156,65 +157,10 @@ |
469 | self.crawler.updateCache() |
470 | self.crawler.registerCallback(self.inspectDeb) |
471 | self.crawler.crawl() |
472 | - self._findMissingIcons() |
473 | self._calcArchSpecific() |
474 | self._addCodecInformation() |
475 | pickle.dump(self.deb_to_files,open(self.deb_to_files_f,"w")) |
476 | logging.info("extract() finished") |
477 | - |
478 | - def _findMissingIcons(self): |
479 | - """ search for missing desktop icons in using the crawl cache """ |
480 | - for (pkgname, icons) in self.pkgs_to_missing_icons.items(): |
481 | - logging.debug("Searching for missing '%s' icons" % pkgname) |
482 | - # get an ordered set from most likely to least likely package |
483 | - to_search = OrderedDict() |
484 | - # add (in order of importance) all cached packages matching regex |
485 | - for regex in self.iconsearch_regex: |
486 | - # FIXME: use {0} here once the extraction host moves from |
487 | - # py2.5 to 2.6 |
488 | - if r'%(first_term)s' in regex: |
489 | - first_term = re.split('-|_', pkgname, 1)[0] |
490 | - regex = regex % { 'first_term' : first_term } |
491 | - try: |
492 | - matches = filter(re.compile(regex).match, self.pkgs_seen) |
493 | - for match in matches: |
494 | - to_search[match] = None |
495 | - except Exception as e: |
496 | - print "ERROR: %s" % e |
497 | - # queue all non-library dependencies of the package |
498 | - deps = self.crawler.cache[pkgname].candidate.dependencies |
499 | - for dep in deps: |
500 | - for dep_candidate in dep.or_dependencies: |
501 | - depname = dep_candidate.name |
502 | - if not depname.startswith('lib'): |
503 | - to_search[depname] = None |
504 | - |
505 | - # finally, search the set of likely packages |
506 | - for name in to_search: |
507 | - # get cached tarfile |
508 | - logging.debug("* Looking in %s" % name) |
509 | - if name not in self.crawler.pkgs_to_debpath: |
510 | - logging.debug(" Deb for %s not found!" % name) |
511 | - continue |
512 | - try: |
513 | - debPath = self.crawler.pkgs_to_debpath[name] |
514 | - datafile = self._extractDebData(debPath) |
515 | - tar = tarfile.open(datafile) |
516 | - except: |
517 | - logging.debug(" Deb for %s could not be opened!" % name) |
518 | - continue |
519 | - found = set() |
520 | - for icon in icons: |
521 | - (res, n) = self.search_icon(tar, icon, self.menu_data) |
522 | - if res == True: |
523 | - logging.debug(" Icon %s found!" % icon) |
524 | - found.add(icon) |
525 | - # stop searching for any icons we find |
526 | - icons = icons.difference(found) |
527 | - if len(icons) == 0: |
528 | - del self.pkgs_to_missing_icons[pkgname] |
529 | - break |
530 | - logging.info("missing icons left: '%s'" % self.pkgs_to_missing_icons) |
531 | |
532 | def _calcArchSpecific(self): |
533 | # now add the architecture information |
534 | @@ -358,54 +304,44 @@ |
535 | return False |
536 | |
537 | def search_icon(self, tarfile, iconName, outputdir): |
538 | - if iconName == None: |
539 | - logging.warning("search_icon() called with no icon name") |
540 | - return (False, None) |
541 | - |
542 | - # a iconName can be a single name or a full path |
543 | - # if it is a single name, look into a icon-theme path (usr/share/icons/hicolor) and then into usr/share/pixmaps |
544 | - # if it is a full path just look for this |
545 | - |
546 | - # this is the "full-path" case |
547 | - # FIXME: there are (some) icons that are not full pathes like "/usr/.../" |
548 | - # but "zapping/zapping.png" |
549 | - if "/" in iconName: |
550 | - newIconName = iconName.replace("/", "_") |
551 | - outpath = os.path.join(outputdir,"icons",newIconName) |
552 | - # prevent wasted disk read |
553 | - if os.path.exists(outpath): |
554 | - return (True, newIconName) |
555 | - res = self.extract_icon(tarfile, iconName, outpath) |
556 | - return (res, newIconName) |
557 | - |
558 | - # this is the "get-it-from-a-icontheme" case, look into icon-theme hicolor and usr/share/pixmaps |
559 | - |
560 | - # search path (ordered by importance) |
561 | - search_dirs = [ |
562 | - "usr/share/icons/hicolor/64x64", |
563 | - "usr/share/icons/hicolor/48x48", |
564 | - "usr/share/icons/hicolor/128x128", |
565 | - "usr/share/pixmaps", |
566 | - "usr/share/icons/hicolor/32x32", |
567 | - "usr/share/icons/hicolor/22x22", |
568 | - "usr/share/icons/hicolor/16x16", |
569 | - "usr/share/icons" |
570 | - ] |
571 | - # extensions (ordered by importance) |
572 | - pixmaps_ext = ["", ".png",".xpm",".svg"] |
573 | - |
574 | - # prevent wasted disk read |
575 | - if os.path.exists(os.path.join(outputdir,"icons",iconName)): |
576 | - return (True, None) |
577 | - for d in search_dirs: |
578 | - for name in tarfile.getnames(): |
579 | - if d in name: |
580 | - for ext in pixmaps_ext: |
581 | - if name.endswith(iconName+ext): |
582 | - res = self.extract_icon(tarfile, name, os.path.join(outputdir,"icons", os.path.basename(name))) |
583 | - return (res, None) |
584 | - logging.warning("no icon: '%s' could be found" % iconName) |
585 | - return (False, None) |
586 | + # ask IconFinder for an icon and its package |
587 | + results = self.iconFinder.search(iconName) |
588 | + if not results: |
589 | + logging.warning("Could not find package with icon '%s'" % iconName) |
590 | + return None |
591 | + logging.debug("Found icon and package %s" % str(results)) |
592 | + pkgname, iconPath = results |
593 | + |
594 | + # retrieve the package deb from the archive crawler |
595 | + if pkgname not in self.crawler.pkgs_to_debpath: |
596 | + logging.warning("Could not find deb '%s' for '%s'" % |
597 | + (pkgname, iconName)) |
598 | + return None |
599 | + try: |
600 | + debPath = self.crawler.pkgs_to_debpath[pkgname] |
601 | + datafile = self._extractDebData(debPath) |
602 | + tar = tarfile.open(datafile) |
603 | + except Exception as e: |
604 | + logging.warning("Could not open deb '%s' for '%s': %s" % |
605 | + (pkgname, iconName, str(e))) |
606 | + return None |
607 | + |
608 | + # extract the icon |
609 | + newIconName = iconName.replace("/", "_") |
610 | + outpath = os.path.join(outputdir, "icons", newIconName) |
611 | + success = self.extract_icon(tar, os.path.join(".", iconPath), outpath) |
612 | + os.remove(datafile) |
613 | + if not success: |
614 | + logging.warning("Could not extract '%s' from '%s' tarfile" % |
615 | + (iconPath, pkgname)) |
616 | + return None |
617 | + |
618 | + # validate it is an icon |
619 | + filetype = subprocess.check_output(["file", "-b", outpath]).strip() |
620 | + if filetype == "ASCII text": |
621 | + logging.warning("'%s' is not an icon" % iconPath) |
622 | + return None |
623 | + return newIconName |
624 | |
625 | def tarfile_extract_orlog(self, dataFile, path): |
626 | try: |
627 | @@ -493,12 +429,16 @@ |
628 | line = string.strip(line) |
629 | if line.startswith("Icon="): |
630 | iconName = line[line.index("=")+1:] |
631 | - logging.debug("Package '%s' needs icon '%s'" % (pkgname, iconName)) |
632 | - (res, newIconName) = self.search_icon(dataFile, iconName, outputdir) |
633 | - if res == False: |
634 | - if not pkgname in self.pkgs_to_missing_icons: |
635 | - self.pkgs_to_missing_icons[pkgname] = set() |
636 | - self.pkgs_to_missing_icons[pkgname].add(iconName) |
637 | + if not iconName: |
638 | + logging.debug("No icon needed for '%s'" % |
639 | + os.path.basename(path)) |
640 | + newIconName = None |
641 | + else: |
642 | + logging.debug("Package '%s' needs icon '%s'" % (pkgname, iconName)) |
643 | + newIconName = self.search_icon(dataFile, iconName, outputdir) |
644 | + if newIconName == None: |
645 | + logging.warning("Could not retrieve icon for '%s'" % |
646 | + os.path.basename(path)) |
647 | |
648 | # now check for supicious pkgnames (FIXME: make this not hardcoded) |
649 | if "-common" in pkgname or "-data" in pkgname: |
650 | @@ -580,7 +520,14 @@ |
651 | # extract it here, python tarfile does not support lzma |
652 | subprocess.call(["lzma","-d",datafile]) |
653 | datafile = os.path.splitext(datafile)[0] |
654 | - return datafile |
655 | + # make name unique |
656 | + datafile_new = datafile.replace("data", repr(time.time())) |
657 | + try: |
658 | + os.rename(datafile, datafile_new) |
659 | + except: |
660 | + logging.warning("Renaming tarball to %s failed" % datafile_new) |
661 | + return |
662 | + return datafile_new |
663 | |
664 | def inspectDeb(self, crawler, filename, pkgname, ver, pkgarch, component): |
665 | """ check if the deb is interessting for us (not blacklisted) """ |
666 | |
667 | === added directory 'IconFinder' |
668 | === added file 'IconFinder/__init__.py' |
669 | --- IconFinder/__init__.py 1970-01-01 00:00:00 +0000 |
670 | +++ IconFinder/__init__.py 2011-05-31 05:55:54 +0000 |
671 | @@ -0,0 +1,90 @@ |
672 | +import logging |
673 | + |
674 | +# Policies for searching icon directories. |
675 | +BEST_ICON_POLICY, SMALL_ICON_POLICY = range(2) |
676 | + |
677 | +_SEARCH_DIRS = { |
678 | + # Finds better, but often extremely large icons. Favors SVG. |
679 | + BEST_ICON_POLICY: ["usr/share/icons/hicolor/scalable", |
680 | + "usr/share/icons/hicolor/128x128", |
681 | + "usr/share/icons/hicolor/256x256", |
682 | + "usr/share/icons/hicolor/64x64", |
683 | + "usr/share/icons/hicolor/48x48", |
684 | + "usr/share/pixmaps", |
685 | + "usr/share/icons/hicolor/32x32", |
686 | + "usr/share/icons/hicolor/22x22", |
687 | + "usr/share/icons/hicolor/16x16", |
688 | + "usr/share/icons/Humanity", |
689 | + "usr/share/icons", |
690 | + "usr/share"], |
691 | + |
692 | + # Provides smaller icons. Favors PNG. |
693 | + SMALL_ICON_POLICY: ["usr/share/icons/hicolor/64x64", |
694 | + "usr/share/icons/hicolor/48x48", |
695 | + "usr/share/icons/hicolor/128x128", |
696 | + "usr/share/pixmaps", |
697 | + "usr/share/icons/hicolor/32x32", |
698 | + "usr/share/icons/hicolor/22x22", |
699 | + "usr/share/icons/hicolor/16x16", |
700 | + "usr/share/icons/Humanity", |
701 | + "usr/share/icons", |
702 | + "usr/share"]} |
703 | + |
704 | + |
705 | +class IconFinder: |
706 | + """Searches for application icons according to a given policy.""" |
707 | + |
708 | + # When called to search for an icon, IconFinder looks in indexed |
709 | + # path prefixes in an order decided by policy. |
710 | + |
711 | + def __init__(self, cache, policy=SMALL_ICON_POLICY): |
712 | + """ |
713 | + @cache: An ArchiveCache instance to search for icons with. |
714 | + @policy: the priority policy to use when searching for icons |
715 | + Either BEST_ICON_POLICY or SMALL_ICON_POLICY (default) |
716 | + """ |
717 | + self._set_cache(cache) |
718 | + self._set_policy(policy) |
719 | + |
720 | + def search(self, icon_name): |
721 | + """ |
722 | + Finds, according to policy, the best package name and icon |
723 | + path for `icon_name`. |
724 | + @icon_name: A valid Icon string for an XDG Desktop Entry. |
725 | + @returns ('packagename', '/path/to/icon') or None |
726 | + """ |
727 | + if icon_name.startswith("/"): |
728 | + result = self._cache.search_exact(icon_name) |
729 | + if result: |
730 | + return (result, icon_name.strip("/")) |
731 | + else: |
732 | + # match any path ending with the name and an extension |
733 | + pattern = ".*/%s(.png|.xpm|.svg|)$" % icon_name |
734 | + for prefix in self._search_dirs: |
735 | + results = self._cache.search(pattern, prefix, first_only=True) |
736 | + if results: |
737 | + return results[0] |
738 | + |
739 | + def _set_cache(self, cache): |
740 | + if not self.prefixes.issubset(set(cache.prefixes)): |
741 | + logging.warning("Using an ArchiveCache that doesn't index " |
742 | + "IconFinder's search prefixes") |
743 | + self._cache = cache |
744 | + |
745 | + def _set_policy(self, policy): |
746 | + # Ensure the policy exists before assigning |
747 | + assert policy in (BEST_ICON_POLICY, SMALL_ICON_POLICY) |
748 | + self._policy = policy |
749 | + self._search_dirs = _SEARCH_DIRS[policy] |
750 | + |
751 | + # A set of every path in the search directories |
752 | + prefixes = set(sum(_SEARCH_DIRS.values(), [])) |
753 | + |
754 | + cache = property(lambda self: self._cache, _set_cache, doc=""" |
755 | + An ArchiveCache to search for icons with. It should index all |
756 | + file paths in `IconFinder.prefixes` for good performance.""") |
757 | + |
758 | + policy = property(lambda self: self._policy, _set_policy, doc=""" |
759 | + Determines what type and size of icon the IconFinder will |
760 | + search for first. Uses SMALL_ICON_POLICY by default. Switch to |
761 | + BEST_ICON_POLICY to favor large and scalable icons.""") |
762 | |
763 | === removed file 'data/icon_search.cfg' |
764 | --- data/icon_search.cfg 2011-04-14 09:16:47 +0000 |
765 | +++ data/icon_search.cfg 1970-01-01 00:00:00 +0000 |
766 | @@ -1,11 +0,0 @@ |
767 | -# these are regular expressions for finding packages with icons for a |
768 | -# *.desktop file, when icons are not in the application package. Any |
769 | -# string "{0}" will be formatted to the first hyphen-delimited term of |
770 | -# an application package (eg "foo-bar-baz" -> "foo") |
771 | - |
772 | -# for cases like wesnoth/wesnoth-data |
773 | -^%(first_term)s.+(data|common)$ |
774 | - |
775 | -# for when environment applications use environmental icons |
776 | -gnome-icon-theme |
777 | -oxygen-icon-theme |
778 | |
779 | === modified file 'getMenuData.py' |
780 | --- getMenuData.py 2011-04-11 13:12:12 +0000 |
781 | +++ getMenuData.py 2011-05-31 05:55:54 +0000 |
782 | @@ -39,8 +39,10 @@ |
783 | parser.add_option("--actionsdir", "--actionsdir", dest="actionsdir", |
784 | help="actionsdir", |
785 | default=apt_pkg.Config.Find("APT::Architecture")) |
786 | - parser.add_option("-p", "--pooldir", dest="pooldir", |
787 | - help="pooldir", default="/srv/archive.ubuntu.com/ubuntu/pool") |
788 | + parser.add_option("-a", "--archivedir", dest="archivedir", |
789 | + help="archivedir", default="/srv/archive.ubuntu.com/ubuntu") |
790 | + parser.add_option("--dist", dest="dist", |
791 | + help="dist", default="natty") |
792 | parser.add_option("-d", "--datadir", dest="datadir", |
793 | help="datadir", default="data/") |
794 | (options, args) = parser.parse_args() |
795 | @@ -48,8 +50,9 @@ |
796 | |
797 | # now run it |
798 | desktop_extractor = DesktopDataExtractor(options.aptroot, |
799 | - options.pooldir, |
800 | - options.datadir) |
801 | + options.archivedir, |
802 | + options.datadir, |
803 | + options.dist) |
804 | desktop_extractor.extract() |
805 | |
806 | logging.info("extraction finished") |
807 | |
808 | === added directory 'tests/archive' |
809 | === added directory 'tests/archive/dists' |
810 | === added directory 'tests/archive/dists/testing' |
811 | === added file 'tests/archive/dists/testing/Contents_amd64' |
812 | --- tests/archive/dists/testing/Contents_amd64 1970-01-01 00:00:00 +0000 |
813 | +++ tests/archive/dists/testing/Contents_amd64 2011-05-31 05:55:54 +0000 |
814 | @@ -0,0 +1,5 @@ |
815 | + |
816 | +FILE LOCATION |
817 | +usr/share/icons/hicolor/48x48/apps/audacious.png foo/audacious |
818 | +usr/share/pixmaps/synaptic.png foo/synaptic |
819 | +usr/lib/GNUstep/Applications/Cynthiune.app/Resources/Cynthiune.tiff foo/cynthiune.app |
820 | |
821 | === added file 'tests/archive/dists/testing/Contents_i386' |
822 | --- tests/archive/dists/testing/Contents_i386 1970-01-01 00:00:00 +0000 |
823 | +++ tests/archive/dists/testing/Contents_i386 2011-05-31 05:55:54 +0000 |
824 | @@ -0,0 +1,3 @@ |
825 | +FILE LOCATION |
826 | +usr/share/icons/hicolor/48x48/apps/cheese.png foo/cheese-common |
827 | +usr/share/pixmaps/python2.5.xpm foo/python2.5 |
828 | |
829 | === renamed directory 'tests/pool' => 'tests/archive/pool' |
830 | === modified file 'tests/archive/pool/get_pkgs.sh' |
831 | --- tests/pool/get_pkgs.sh 2011-04-16 18:10:30 +0000 |
832 | +++ tests/archive/pool/get_pkgs.sh 2011-05-31 05:55:54 +0000 |
833 | @@ -57,3 +57,13 @@ |
834 | cd universe/c/cynthiune.app |
835 | wget -c https://launchpad.net/ubuntu/+source/cynthiune.app/0.9.5-11ubuntu1/+buildjob/1962087/+files/cynthiune.app_0.9.5-11ubuntu1_amd64.deb |
836 | cd ../../.. |
837 | + |
838 | +# zip contents files |
839 | +cd ../dists/testing |
840 | +if [ ! -e Contents-i386.gz ] || [ ! -e Contents-amd64.gz ]; then |
841 | + rm -f Contents-* |
842 | + cp Contents_i386 Contents-i386 |
843 | + cp Contents_amd64 Contents-amd64 |
844 | + gzip Contents-* |
845 | +fi |
846 | +cd ../../pool |
847 | |
848 | === added file 'tests/data/Contents-i386.gz' |
849 | Binary files tests/data/Contents-i386.gz 1970-01-01 00:00:00 +0000 and tests/data/Contents-i386.gz 2011-05-31 05:55:54 +0000 differ |
850 | === removed file 'tests/pool/main/h/hello/hello_2.1.1-4_i386.deb' |
851 | Binary files tests/pool/main/h/hello/hello_2.1.1-4_i386.deb 2007-07-24 10:17:39 +0000 and tests/pool/main/h/hello/hello_2.1.1-4_i386.deb 1970-01-01 00:00:00 +0000 differ |
852 | === added file 'tests/test_archive_cache.py' |
853 | --- tests/test_archive_cache.py 1970-01-01 00:00:00 +0000 |
854 | +++ tests/test_archive_cache.py 2011-05-31 05:55:54 +0000 |
855 | @@ -0,0 +1,96 @@ |
856 | +#!/usr/bin/env python |
857 | + |
858 | +import sys |
859 | +import unittest |
860 | +import subprocess |
861 | +sys.path.insert(0, "../") |
862 | + |
863 | +from ArchiveCache import ArchiveCache |
864 | + |
865 | +# This data will need to be refreshed if Contents-i386.gz changes |
866 | +CONTENTS_PATH = "data/Contents-i386.gz" |
867 | +MD5_CHECKSUM = subprocess.check_output(["md5sum", CONTENTS_PATH]).split()[0] |
868 | +INDEX_PATHS = ("usr/share/applications", "/usr/share/icons", "/usr/share/") |
869 | +CONTENTS_LINES, INDEX_LINES = 258, (1, 12, 255) |
870 | + |
871 | +SEARCH_EXACT_PAIRS = { |
872 | + "/etc/dbus-1/system.d/com.ubuntu.SoftwareCenter.conf": "software-center", |
873 | + "/usr/bin/software-center": "software-center", |
874 | + "usr/share/icons/hicolor/32x32/apps/softwarecenter.png": "software-center", |
875 | + "/usr/share/xubuntu-docs/ubuntu-software-center.html": "xubuntu-docs", |
876 | + "/red/herring/path": None} |
877 | + |
878 | +SEARCH_PAIRS = { |
879 | + (".*SoftwareCenter.conf", ""): [("software-center", |
880 | + "etc/dbus-1/system.d/com.ubuntu.SoftwareCenter.conf")], |
881 | + (".*softwarecenter.svg", "usr/share"): [("software-center", |
882 | + "usr/share/icons/hicolor/scalable/apps/softwarecenter.svg")], |
883 | + (".*\.desktop$", "/usr/share/applications"): [("software-center", |
884 | + "usr/share/applications/ubuntu-software-center.desktop")], |
885 | + (".*software-center.html", "usr/share/"): [("xubuntu-docs", |
886 | + "usr/share/xubuntu-docs/ubuntu-software-center.html")]} |
887 | + |
888 | + |
889 | +class ArchiveCacheTest(unittest.TestCase): |
890 | + """ |
891 | + Tests the proper operation of ArchiveCache. |
892 | + """ |
893 | + @classmethod |
894 | + def setUp(self): |
895 | + self.cache = ArchiveCache(CONTENTS_PATH, INDEX_PATHS) |
896 | + |
897 | + def test_contents_checksum(self): |
898 | + """archive cache loads contents.gz files correctly""" |
899 | + self.assertEqual(self.cache.checksum, MD5_CHECKSUM) |
900 | + |
901 | + def test_contents_length(self): |
902 | + """archive cache stores its length in lines""" |
903 | + self.assertTrue(hasattr(self.cache, "__len__")) |
904 | + self.assertEqual(len(self.cache), CONTENTS_LINES) |
905 | + |
906 | + def test_index_length(self): |
907 | + """archive cache indexes store their length in lines""" |
908 | + for i, path in enumerate(INDEX_PATHS): |
909 | + self.assertEqual(self.cache.__len__(path), INDEX_LINES[i]) |
910 | + |
911 | + def test_index_checking(self): |
912 | + """archive cache allows idiomatic checking of index existence"""
913 | + self.assertTrue(hasattr(self.cache, "__contains__")) |
914 | + self.assertTrue(hasattr(self.cache, "prefixes")) |
915 | + for path in INDEX_PATHS: |
916 | + self.assertTrue(path.strip("/") in self.cache.prefixes) |
917 | + |
918 | + def test_search_exact(self): |
919 | + """archive cache can find an exact path from contents""" |
920 | + for arg, result in SEARCH_EXACT_PAIRS.items(): |
921 | + self.assertEqual(self.cache.search_exact(arg), result) |
922 | + |
923 | + def test_search_regex(self): |
924 | + """archive cache can find a regex result from contents""" |
925 | + for args, results in SEARCH_PAIRS.items(): |
926 | + self.assertEqual(set(self.cache.search(*args)), set(results)) |
927 | + |
928 | + def test_first_only_search(self): |
929 | + """archive cache can stop at first result in search""" |
930 | + results = self.cache.search(".*softwarecenter(.png|.svg)$", |
931 | + "/usr/share/icons", first_only=True) |
932 | + self.assertEqual(len(results), 1) |
933 | + |
934 | + def test_search_exact_symlinked(self): |
935 | + """archive cache should return partial path matches for symlinks""" |
936 | + # Contents.gz interprets symlinks as files. If we search for an exact |
937 | + # file via a symlinked directory, it should still work. |
938 | + result = self.cache.search_exact("usr/share/symlinked_dir/linked_file") |
939 | + self.assertEqual(result, "software-center") |
940 | + |
941 | + def test_search_no_memory(self): |
942 | + """archive cache search should work without memory caching""" |
943 | + old_cache = self.cache |
944 | + self.cache = ArchiveCache(CONTENTS_PATH, INDEX_PATHS, memory=0) |
945 | + self.test_search_exact() |
946 | + self.test_search_regex() |
947 | + self.test_search_exact_symlinked() |
948 | + self.cache = old_cache |
949 | + |
950 | +if __name__ == "__main__": |
951 | + unittest.main() |
952 | |
953 | === modified file 'tests/test_cmd_data_extractor.py' |
954 | --- tests/test_cmd_data_extractor.py 2008-06-30 10:47:06 +0000 |
955 | +++ tests/test_cmd_data_extractor.py 2011-05-31 05:55:54 +0000 |
956 | @@ -26,7 +26,7 @@ |
957 | |
958 | def testExtractorSimple(self): |
959 | extractor = CommandDataExtractor("./aptroot", |
960 | - "./pool", |
961 | + "./archive/pool", |
962 | "./data") |
963 | extractor.extract() |
964 | self.assert_(os.path.exists("./data/scan.data")) |
965 | @@ -34,13 +34,13 @@ |
966 | |
967 | def testExtractorOrphan(self): |
968 | extractor = CommandDataExtractor("./aptroot", |
969 | - "./pool", |
970 | + "./archive/pool", |
971 | "./data") |
972 | extractor.extract() |
973 | self.assert_(len(open("./data/scan.data").readlines()) > 2) |
974 | self.assert_("gnome-session-remove" in open(extractor.cmd_data_f).read()) |
975 | # now simulate archive removal of a pkg |
976 | - b = "./pool/main/g/gnome-session/gnome-session_2.14.1-0ubuntu11_i386.deb" |
977 | + b = "./archive/pool/main/g/gnome-session/gnome-session_2.22.1.1-0ubuntu2_i386.deb" |
978 | os.rename(b, b+".xxx") |
979 | # clean orphaned desktop files and make sure that the file really |
980 | # got removed |
981 | @@ -51,7 +51,7 @@ |
982 | |
983 | def testExtractorCleanPkg(self): |
984 | extractor = CommandDataExtractor("./aptroot", |
985 | - "./pool", |
986 | + "./archive/pool", |
987 | "./data") |
988 | # simulate superseeding |
989 | open("./data/scan.data","w").write("i386|main|gnome-session|gnome-session-remove,gnome-session-save,gnome-session-properties,x-session-manager\n") |
990 | @@ -60,7 +60,7 @@ |
991 | self.assert_("gnome-wm" in open(extractor.cmd_data_f).read()) |
992 | |
993 | if __name__ == "__main__": |
994 | - subprocess.call(["(cd pool; ./get_pkgs.sh)"],shell=True) |
995 | + subprocess.call(["(cd archive/pool; ./get_pkgs.sh)"],shell=True) |
996 | logging.basicConfig(level=logging.DEBUG) |
997 | apt_pkg.init() |
998 | #unittest.main(defaultTest="testCommandDataExtractor.testExtractorOrphan") |
999 | |
1000 | === modified file 'tests/test_cralwer.py' |
1001 | --- tests/test_cralwer.py 2007-11-28 09:47:20 +0000 |
1002 | +++ tests/test_cralwer.py 2011-05-31 05:55:54 +0000 |
1003 | @@ -21,7 +21,7 @@ |
1004 | |
1005 | def setUp(self): |
1006 | # update the cache only once |
1007 | - crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386") |
1008 | + crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386") |
1009 | crawler.updateCache() |
1010 | # remove pkgs-found file to make tests meaningful |
1011 | self.rm(self.pkgs_found_file) |
1012 | @@ -31,7 +31,7 @@ |
1013 | def callback_helper(crawler, debfile, pkg, ver, arch, component): |
1014 | self._i += 1 |
1015 | return True |
1016 | - crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386") |
1017 | + crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386") |
1018 | # test if we allow only callables |
1019 | try: |
1020 | crawler.registerCallback(self._i) |
1021 | @@ -49,7 +49,7 @@ |
1022 | del self._i |
1023 | |
1024 | def testCrawlerSkipDone(self): |
1025 | - crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386") |
1026 | + crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386") |
1027 | crawler.crawl() |
1028 | self.assert_(len(crawler.debfiles_done) > 0) |
1029 | # delete the found-pkgs marker file, crawl again |
1030 | @@ -59,41 +59,41 @@ |
1031 | self.rm(self.debfiles_done) |
1032 | |
1033 | def testCrawlerSkipPersistant(self): |
1034 | - crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386") |
1035 | + crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386") |
1036 | crawler.crawl() |
1037 | # delete the found-pkgs marker file, create new crawler, crawl again, |
1038 | # see if we skip the done files |
1039 | self.rm(self.pkgs_found_file) |
1040 | - crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386") |
1041 | + crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386") |
1042 | crawler.crawl() |
1043 | self.assert_(not os.path.exists(self.pkgs_found_file)) |
1044 | |
1045 | def testFindOrphanedFiles(self): |
1046 | - crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386") |
1047 | + crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386") |
1048 | crawler.crawl() |
1049 | - crawler.debfiles_done.add("./pool/main/h/hello/hello_2.1.1-3_i386.deb") |
1050 | + crawler.debfiles_done.add("./archive/pool/main/h/hello/hello_2.1.1-3_i386.deb") |
1051 | print crawler.findOrphanedFiles() |
1052 | self.assert_(len(crawler.findOrphanedFiles()) == 1) |
1053 | |
1054 | def testFindOrphanedPackages(self): |
1055 | - crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386") |
1056 | + crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386") |
1057 | crawler.crawl() |
1058 | print crawler.findOrphanedPackages() |
1059 | self.assert_(len(crawler.findOrphanedPackages()) == 0) |
1060 | - crawler.debfiles_done.add("./pool/main/h/hello/hello_2.1.1-3_i386.deb") |
1061 | + crawler.debfiles_done.add("./archive/pool/main/h/hello/hello_2.1.1-3_i386.deb") |
1062 | print crawler.findOrphanedPackages() |
1063 | self.assert_(len(crawler.findOrphanedPackages()) == 1) |
1064 | |
1065 | def testFindObsoletedPackages(self): |
1066 | # FIXME: this test is pretty useless |
1067 | - crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", "i386") |
1068 | + crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", "i386") |
1069 | crawler.crawl() |
1070 | #print "obsolete: ", crawler.findObsoletedPackages() |
1071 | self.assert_(len(crawler.findObsoletedPackages()) > 0) |
1072 | |
1073 | def testArches(self): |
1074 | for arch in ('amd64', 'i386'): |
1075 | - crawler = ArchiveCrawler("./aptroot", "./pool", "./actions", "./data", arch) |
1076 | + crawler = ArchiveCrawler("./aptroot", "./archive/pool", "./actions", "./data", arch) |
1077 | crawler.updateCache() |
1078 | self.assert_(len(glob.glob(os.path.join(crawler.aptroot,"var/lib/apt/lists")+"/*%s*" % arch)) > 0) |
1079 | |
1080 | |
1081 | === modified file 'tests/test_deb_package.py' |
1082 | --- tests/test_deb_package.py 2008-03-04 12:45:12 +0000 |
1083 | +++ tests/test_deb_package.py 2011-05-31 05:55:54 +0000 |
1084 | @@ -11,7 +11,7 @@ |
1085 | |
1086 | class testDebPackage(unittest.TestCase): |
1087 | def testDebPackage(self): |
1088 | - name = "pool/main/g/git-core/git-core_1.5.4.3-1ubuntu1_i386.deb" |
1089 | + name = "archive/pool/main/g/git-core/git-core_1.5.4.3-1ubuntu2_i386.deb" |
1090 | pkg = CommandDataExtractor.data_extractor.load(name) |
1091 | print pkg |
1092 | print pkg.name |
1093 | |
1094 | === modified file 'tests/test_desktop_data_extractor.py' |
1095 | --- tests/test_desktop_data_extractor.py 2011-04-16 18:10:30 +0000 |
1096 | +++ tests/test_desktop_data_extractor.py 2011-05-31 05:55:54 +0000 |
1097 | @@ -13,74 +13,69 @@ |
1098 | |
1099 | from DesktopDataExtractor import DesktopDataExtractor |
1100 | |
1101 | + |
1102 | class TestDesktopDataExtractor(unittest.TestCase): |
1103 | |
1104 | debfiles_done = "./data/desktop.p" |
1105 | |
1106 | - def rm(self, f): |
1107 | - if os.path.exists(f): |
1108 | - os.unlink(f) |
1109 | - def setUp(self): |
1110 | - self.rm(self.debfiles_done) |
1111 | + @classmethod |
1112 | + def setUpClass(self): |
1113 | + if os.path.exists(self.debfiles_done): |
1114 | + os.unlink(self.debfiles_done) |
1115 | try: |
1116 | shutil.rmtree("./data/menu-data") |
1117 | except: |
1118 | pass |
1119 | + self.extractor = DesktopDataExtractor("./aptroot", |
1120 | + "./archive", |
1121 | + "./data", |
1122 | + "testing") |
1123 | + self.extractor.extract() |
1124 | |
1125 | def test_extractor_cheese_common(self): |
1126 | - extractor = DesktopDataExtractor("./aptroot-lucid", |
1127 | - "./pool", |
1128 | - "./data") |
1129 | - extractor.extract() |
1130 | # check if icon extraction works |
1131 | - self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"icons", "cheese.png"))) |
1132 | - |
1133 | + self.extractor = DesktopDataExtractor("./aptroot-lucid", |
1134 | + "./archive", |
1135 | + "./data", |
1136 | + "testing") |
1137 | + self.extractor.extract() |
1138 | + self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"icons", "cheese"))) |
1139 | + |
1140 | def test_symlinked_icons(self): |
1141 | - extractor = DesktopDataExtractor("./aptroot-maverick", |
1142 | - "./pool", |
1143 | - "./data") |
1144 | - extractor.extract() |
1145 | # check if symlinked (a) icon (b) directory extraction work |
1146 | - self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"icons", "audacious.png"))) |
1147 | - self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"icons", "_usr_lib_GNUstep_Applications_Cynthiune.app_Resources_Cynthiune.tiff"))) |
1148 | + self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"icons", "audacious"))) |
1149 | + self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"icons", "_usr_lib_GNUstep_Applications_Cynthiune.app_Resources_Cynthiune.tiff"))) |
1150 | |
1151 | def test_extractor_simple(self): |
1152 | - extractor = DesktopDataExtractor("./aptroot", |
1153 | - "./pool", |
1154 | - "./data") |
1155 | - extractor.extract() |
1156 | # see if extraction works |
1157 | - self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"synaptic.desktop"))) |
1158 | + self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"synaptic.desktop"))) |
1159 | # see if lzma works |
1160 | - self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"ooo-math.desktop"))) |
1161 | + self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"ooo-math.desktop"))) |
1162 | # gnome-about is blacklisted |
1163 | - self.assertTrue(not os.path.exists(os.path.join(extractor.menu_data,"gnome-about.desktop"))) |
1164 | + self.assertTrue(not os.path.exists(os.path.join(self.extractor.menu_data,"gnome-about.desktop"))) |
1165 | # check if icon extraction works |
1166 | - self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"icons", "_usr_share_pixmaps_python2.5.xpm"))) |
1167 | + self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"icons", "_usr_share_pixmaps_python2.5.xpm"))) |
1168 | |
1169 | def test_extractor_simple_maverick(self): |
1170 | extractor = DesktopDataExtractor("./aptroot-maverick", |
1171 | - "./pool", |
1172 | - "./data") |
1173 | + "./archive", |
1174 | + "./data", |
1175 | + "testing") |
1176 | extractor.extract() |
1177 | # check if we have baobab |
1178 | - self.assertTrue(os.path.exists(os.path.join(extractor.menu_data,"baobab.desktop"))) |
1179 | + self.assertTrue(os.path.exists(os.path.join(self.extractor.menu_data,"baobab.desktop"))) |
1180 | |
1181 | |
1182 | def test_extractor_orphan(self): |
1183 | - extractor = DesktopDataExtractor("./aptroot", |
1184 | - "./pool", |
1185 | - "./data") |
1186 | - extractor.extract() |
1187 | - self.assert_(os.path.exists(os.path.join(extractor.menu_data,"session-properties.desktop"))) |
1188 | + self.assert_(os.path.exists(os.path.join(self.extractor.menu_data,"session-properties.desktop"))) |
1189 | |
1190 | # now simulate archive removal of a pkg |
1191 | - b = "./pool/main/g/gnome-session/gnome-session_2.22.1.1-0ubuntu2_i386.deb" |
1192 | + b = "./archive/pool/main/g/gnome-session/gnome-session_2.22.1.1-0ubuntu2_i386.deb" |
1193 | os.rename(b, b+".xxx") |
1194 | # clean orphaned desktop files and make sure that the file really |
1195 | # got removed |
1196 | - extractor.extract() |
1197 | - self.assert_(not os.path.exists(os.path.join(extractor.menu_data,"session-properties.desktop"))) |
1198 | + self.extractor.extract() |
1199 | + self.assert_(not os.path.exists(os.path.join(self.extractor.menu_data,"session-properties.desktop"))) |
1200 | os.rename(b+".xxx", b) |
1201 | |
1202 | if __name__ == "__main__": |
1203 | @@ -88,6 +83,6 @@ |
1204 | logging.basicConfig(level=logging.DEBUG) |
1205 | else: |
1206 | logging.basicConfig(level=logging.INFO) |
1207 | - subprocess.call(["(cd pool; ./get_pkgs.sh)"],shell=True) |
1208 | + subprocess.call(["(cd archive/pool; ./get_pkgs.sh)"],shell=True) |
1209 | apt_pkg.init() |
1210 | unittest.main() |
1211 | |
1212 | === added file 'tests/test_icon_finder.py' |
1213 | --- tests/test_icon_finder.py 1970-01-01 00:00:00 +0000 |
1214 | +++ tests/test_icon_finder.py 2011-05-31 05:55:54 +0000 |
1215 | @@ -0,0 +1,44 @@ |
1216 | +#!/usr/bin/env python |
1217 | + |
1218 | +import sys |
1219 | +import unittest |
1220 | +sys.path.insert(0, "../") |
1221 | + |
1222 | +from ArchiveCache import ArchiveCache |
1223 | +from IconFinder import IconFinder |
1224 | + |
1225 | +# This data will need to be refreshed as Contents-i386.gz changes |
1226 | +CONTENTS_PATH = "data/Contents-i386.gz" |
1227 | +ICONS = ( |
1228 | + "usr/share/icons/hicolor/128x128/apps/softwarecenter.png", |
1229 | + "usr/share/icons/hicolor/16x16/apps/softwarecenter.png", |
1230 | + "usr/share/icons/hicolor/22x22/apps/softwarecenter.png", |
1231 | + "usr/share/icons/hicolor/24x24/apps/ppa.svg", |
1232 | + "usr/share/icons/hicolor/24x24/apps/softwarecenter.png", |
1233 | + "usr/share/icons/hicolor/24x24/apps/unknown-channel.svg", |
1234 | + "usr/share/icons/hicolor/32x32/apps/softwarecenter.png", |
1235 | + "usr/share/icons/hicolor/48x48/apps/softwarecenter.png", |
1236 | + "usr/share/icons/hicolor/64x64/apps/softwarecenter.png", |
1237 | + "usr/share/icons/hicolor/scalable/apps/category-show-all.svg", |
1238 | + "usr/share/icons/hicolor/scalable/apps/partner.svg", |
1239 | + "usr/share/icons/hicolor/scalable/apps/softwarecenter.svg") |
1240 | + |
1241 | + |
1242 | +class IconFinderTest(unittest.TestCase): |
1243 | + """ |
1244 | + Tests the proper operation of IconFinder. |
1245 | + """ |
1246 | + def setUp(self): |
1247 | + self.cache = ArchiveCache(CONTENTS_PATH, IconFinder.prefixes) |
1248 | + |
1249 | + def test_search(self): |
1250 | + finder = IconFinder(self.cache) |
1251 | + # results for faulty search |
1252 | + self.assertEqual(finder.search("foobarbaz"), None) |
1253 | + # results for good search |
1254 | + package, path = finder.search("softwarecenter") |
1255 | + self.assertEqual(package, "software-center") |
1256 | + self.assertTrue(path in ICONS) |
1257 | + |
1258 | +if __name__ == "__main__": |
1259 | + unittest.main() |
On Tue, May 31, 2011 at 05:55:57AM -0000, Jacob Johan Edwards wrote:
> Jacob Johan Edwards has proposed merging lp:~j-johan-edwards/archive-crawler/use-aptfile into lp:~mvo/archive-crawler/mvo.
>
> Requested reviews:
>   Michael Vogt (mvo)
>
> For more details, see:
> https://code.launchpad.net/~j-johan-edwards/archive-crawler/use-aptfile/+merge/62944
>
> Apologies for the huge patch. I abstracted IconFinder out, and that
> ballooned the line count of this solution quite a bit.
That is fine, the content is absolutely worth it!
> SUMMARY
>
> This branch adds a new component, `IconFinder`, which should solve the
> issue of finding icons in an archive. It is a frontend for `ArchiveCache`,
> a general solution for searching archive files via Contents-<ARCH>.gz.
>
> In the future, `ArchiveCache` might replace `ArchiveCrawler` as a cleaner,
> quicker, and more modularity-friendly solution. As it stands, only `IconFinder` uses it.
Awesome, this is great work already and has even more potential!
[..]
> TESTING
>
> Out of 361 desktop entries I tested (an old main[a-k] + universe[2-d]) with
> 343 icon requests, 13 failed. Of these, 11 failed because they requested
> non-existent icons, 1 failed because it requested an icon too large ('arista'),
> and 1 failed because it stored its icon in /usr/lib without a symlink in
> /usr/share or an absolute path request ('batmon.app').
>
> The unit tests that pass as of r123 pass now, as well as new ones for
> `IconFinder` and `ArchiveCache`.
Thanks for updating the tests as well (and adding more!), it looks
really well done.
> FUTURE
>
> Hopefully this branch -- together with a solution for extracting large icons
> -- will solve #599535. With the exception of arista, all icons that fail to
> extract should now be the fault of the package. Hopefully. We'll see if the
> full archive has any interesting edge cases.
I'm running a full oneiric extraction now, it will be interesting to
see what it outputs and I will compare to the previous runs to find
regressions (but I really doubt there will be any).
Thanks,
Michael