lp:~jjed/archive-crawler/near-rewrite

Created by Jjed and last modified

An as-yet experimental, complete overhaul of archive-crawler. Its scope includes:

* A new test suite based on a stable, artificial package archive
* Removal of the ArchiveCrawler and DesktopDataExtractor workflow
* Use of ArchiveCache to allow quick metadata extraction without a full archive copy
* A general cleanup and modularization of the codebase

It reuses existing code, but is rather close to a rewrite. The rationale for these changes is:

a) Maintainability. The scope of ArchiveCrawler has expanded well beyond its initial design. Also, its test suite is based on an ever-changing archive, which makes the tests unstable.
b) Performance. ArchiveCrawler currently has to read every package in the archive to complete; ArchiveCache instead creates and searches a cache in memory, shifting more of the work onto the CPU.
c) Remote extraction. ArchiveCache requires only a tiny fraction of the archive to be present locally. The ability to run ArchiveCache without downloading and storing terabytes of data will ease development.
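
To illustrate the idea behind point b), here is a minimal sketch of an in-memory cache that is built once from small metadata records and then searched without touching any package contents. It is not ArchiveCache's actual interface: the names PackageRecord, InMemoryIndex, lookup, and in_section, and the fields name, version, and section, are all hypothetical and chosen only for this example.

```python
# Hypothetical sketch: none of these names come from archive-crawler itself.
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageRecord:
    """A small metadata record; stands in for whatever ArchiveCache stores."""
    name: str
    version: str
    section: str


class InMemoryIndex:
    """Toy in-memory cache: built once from metadata, then searched by key."""

    def __init__(self, records):
        # One pass over the metadata; no package contents are read.
        self._by_name = {record.name: record for record in records}

    def lookup(self, name):
        """Return the record for `name`, or None if it is not cached."""
        return self._by_name.get(name)

    def in_section(self, section):
        """Return the sorted names of all cached packages in `section`."""
        return sorted(
            record.name
            for record in self._by_name.values()
            if record.section == section
        )


# Made-up sample data, used only to exercise the sketch.
index = InMemoryIndex([
    PackageRecord("alpha", "1.0", "utils"),
    PackageRecord("beta", "2.3", "games"),
    PackageRecord("gamma", "0.9", "utils"),
])
```

Once the index is built, a query such as index.lookup("beta") is a dictionary access in memory, which is the kind of trade the rationale describes: a little more CPU and RAM in exchange for not reading every package.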

Get this branch:
bzr branch lp:~jjed/archive-crawler/near-rewrite

Branch information

Owner:
Jjed
Project:
archive-crawler
Status:
Mature

Recent revisions

184. By Jacob Edwards

More documentation for what's going on in the extract process.

183. By Jacob Edwards

Ensure run_suite only makes the output directory if needed.

182. By Jacob Edwards

Further improved the documentation of unit tests.

181. By Jacob Edwards

run_suite should create the output directory before calling unittest.main

180. By Jacob Edwards

Clean up documentation and variable names in unit tests.

179. By Jacob Edwards

TODO item: licensing concerns.

178. By Jacob Edwards

Remove unused __init__ method from IconFinder.

177. By Jacob Johan Edwards

Documented the python pixmaps bug, and cleared off old bugs.

176. By Jacob Johan Edwards

Updated the TODO file with new item: intelligent cache loading.

175. By Jacob Johan Edwards

Updated the README file.

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)