Merge lp:~adeuring/charmworld/1180118-check-if-charm-is-deleted into lp:~juju-jitsu/charmworld/trunk
Status: | Merged |
---|---|
Approved by: | Abel Deuring |
Approved revision: | 253 |
Merged at revision: | 256 |
Proposed branch: | lp:~adeuring/charmworld/1180118-check-if-charm-is-deleted |
Merge into: | lp:~juju-jitsu/charmworld/trunk |
Diff against target: |
475 lines (+220/-37) 8 files modified
charmworld/jobs/ingest.py (+6/-0)
charmworld/jobs/lp.py (+43/-17)
charmworld/jobs/tests/test_bzr.py (+33/-0)
charmworld/jobs/tests/test_ingest.py (+1/-0)
charmworld/jobs/tests/test_lp.py (+114/-19)
charmworld/jobs/tests/test_proof.py (+13/-0)
charmworld/models.py (+7/-0)
charmworld/testing/factory.py (+3/-1)
To merge this branch: | bzr merge lp:~adeuring/charmworld/1180118-check-if-charm-is-deleted |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Abel Deuring (community) | Approve | ||
Aaron Bentley (community) | Approve | ||
Review via email: mp+168050@code.launchpad.net |
Commit message
jobs.lp: also queue charms that no longer have a Launchpad branch.
Description of the change
This branch fixes the "deleted branches" part of bug 1180118:
Charmworld does not know about deleted/merged charms
It adds a new attribute "branch_deleted" to the charm data; the parts
of the ingest job that require access to a charm's files are skipped
for such charms.
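The skip described above might look roughly like the following sketch. This is a hypothetical reconstruction: `ingest_charm` and its helper parameters are invented for illustration and are not the branch's actual code.

```python
def ingest_charm(charm, fetch_branch, extract_files):
    """Hypothetical sketch: bypass the file-dependent ingest steps
    when the charm's Launchpad branch no longer exists."""
    if charm.get('branch_deleted'):
        # No branch to pull; keep the stored data and skip any step
        # that would need the charm's files on disk.
        return charm
    branch_dir = fetch_branch(charm['branch_spec'])
    charm['files'] = extract_files(branch_dir)
    return charm
```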
The queue/lp job now also uses the set of all charms stored in
the Mongo DB as a second source of charms that will be checked
by the ingest job.
This second source is implemented in the new function jobs.lp.db_charms().
The sequences db_charms() and available_charms() are merged by a
new function, merge_sequences(), in jobs.utils. It compares
elements from each source; if they are considered to be equal,
the value from sequence a is returned, otherwise the "smaller" value
is returned.
The sequences returned by db_charms() and available_charms() are ordered
by branch_spec, hence cmp(x['branch_spec'], y['branch_spec']) is used
as the comparison function in merge_sequences.
Since the data from db_charms() is only used when available_charms()
does not have a corresponding element, db_charms() can already set
values such as charm['branch_deleted'].
Implementation note for merge_sequences(): I first experimented with
try: ... except StopIteration: ... in this function, but I found it
hard to produce a halfway readable implementation that works correctly
in the "elif compared == 0" branch when one or both of the "source
sequences" run out of elements. Appending a "marker element" to these
sequences made the implementation much easier to read.
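The marker-element approach described above might be sketched as follows. This is a hypothetical reconstruction, not the branch's actual code: the signature, the default comparison, and the `branch_spec` key are assumptions based on the description.

```python
import itertools


def merge_sequences(a, b, compare=None):
    """Merge two sequences sorted by the same key, yielding the
    element from `a` when both sides compare equal."""
    if compare is None:
        # cmp()-style comparison on branch_spec (assumed key).
        compare = lambda x, y: (
            (x['branch_spec'] > y['branch_spec'])
            - (x['branch_spec'] < y['branch_spec']))
    marker = object()  # sentinel so neither side simply "runs out"
    it_a = itertools.chain(a, [marker])
    it_b = itertools.chain(b, [marker])
    x = next(it_a)
    y = next(it_b)
    while x is not marker or y is not marker:
        if y is marker or (x is not marker and compare(x, y) < 0):
            yield x
            x = next(it_a)
        elif x is marker or compare(x, y) > 0:
            yield y
            y = next(it_b)
        else:
            # Equal: prefer the element from sequence a, advance both.
            yield x
            x = next(it_a)
            y = next(it_b)
```

The sentinel removes all StopIteration handling from the `compared == 0` branch: a marker on either side simply routes every remaining element of the other side through the first two conditions.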
This code needs further work. The merge_sequences work is supposed to prevent memory exhaustion, but the possibility of memory exhaustion is vanishingly remote. The size of the charm dicts is measured in kilobytes, while the memory of current machines is measured in gigabytes; we would need hundreds of millions of charms to run any risk of memory exhaustion. And even if this code prevented memory exhaustion from the db lookups, it doesn't address the memory impact of the LP lookup in get_branch_tips. So even with merge_sequences, memory use would only be halved, which is the same order of magnitude.
This is premature optimization. If we actually start to encounter memory issues, we should solve it then. I expect the branch lookups are one of the last places that will give us issues. And we will likely have changed that code before we experience such issues, anyhow.
We don't have a LOC limit in this project, and I don't think we should, but I don't think the 90+ lines required by merge_sequences are worth the maintenance burden.
Here is a patch that provides the obvious implementation: http://pastebin.ubuntu.com/5742373/
Please apply the patch or take a similar approach.
Note that the patch uses basic data structures like dict and list. That kind of code is always easier to maintain than code that uses project-specific functions.
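A dict-based approach of the kind the reviewer describes might look like the sketch below. This is an illustration of the idea only, not the pastebin patch itself; the function name and fields are assumptions.

```python
def all_charms(lp_charms, db_charms):
    """Combine LP and DB charm listings with a plain dict keyed by
    branch_spec; DB-only charms are flagged as branch_deleted."""
    by_spec = {charm['branch_spec']: charm for charm in lp_charms}
    for charm in db_charms:
        if charm['branch_spec'] not in by_spec:
            # Present in the DB but gone from Launchpad.
            by_spec[charm['branch_spec']] = dict(
                charm, branch_deleted=True)
    # Keep the output ordered by branch_spec, like the inputs.
    return [by_spec[spec] for spec in sorted(by_spec)]
```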
There are also bugs in the handling of the all_charms limit and default parameters. The import_filter is ignored entirely, and import_filter=default is supplied to available_charms.
The limit is applied only to the LP data, not the db data. This means that too many results will be returned (assuming the number of charms in the db exceeds limit), and that any results excluded from the available_charms output because of the limit that are present in the DB will incorrectly be assigned branch_deleted = True.
Simply applying a limit to both sets is not going to work, because discrepancies between the db and LP's branch listings can cause a branch to be excluded from the LP data by the limit while being included in the DB data. For example:
LP: ['a', 'b', 'c']
DB: ['a', 'c']
With a limit of 2, the LP set will be ['a', 'b'] and DB set will be ['a', 'c'], falsely implying that 'c' was deleted from the DB.
I think the simplest option is to apply the limit after merging LP and DB data.
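Applying the limit only after the merge might look like this sketch (hypothetical names; it reuses the dict-keyed merge idea from the reviewer's suggestion):

```python
import itertools


def limited_charms(lp_charms, db_charms, limit=None):
    """Merge first, truncate afterwards, so the limit cannot falsely
    mark an LP charm excluded by the limit as branch_deleted."""
    by_spec = {charm['branch_spec']: charm for charm in lp_charms}
    for charm in db_charms:
        # Only DB-only charms are flagged as deleted.
        by_spec.setdefault(
            charm['branch_spec'], dict(charm, branch_deleted=True))
    merged = (by_spec[spec] for spec in sorted(by_spec))
    # islice handles limit=None as "no limit".
    return list(itertools.islice(merged, limit))
```

With the example above (LP: a, b, c; DB: a, c; limit 2) this yields ['a', 'b'] and flags nothing as deleted, since 'c' is still present in the LP listing.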
It's arguable that ProofIngestJob should not abort if the branch is deleted, since we would want the proof data to be updated as our proofer is updated. Of course, it would need to abort if the branch_dir does not exist. (Yes, once the number of worker nodes increases, relying on the branch cache means that you may randomly get different results depending on the worker.)