Code review comment for lp:~adeuring/charmworld/1180118-check-if-charm-is-deleted

Aaron Bentley (abentley) wrote:

This code needs further work. The merge_sequences work is supposed to prevent memory exhaustion, but the possibility of memory exhaustion is vanishingly remote. The size of the charm dicts is measured in kilobytes, while the memory of current machines is measured in gigabytes, so we would need millions of charms before we ran any real risk of exhausting memory. And even if this code prevented memory exhaustion from the db lookups, it does nothing about the memory impact of the LP lookup in get_branch_tips. So even with merge_sequences, the memory use would only be halved, which is the same order of magnitude.
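To put rough numbers on it (both figures below are illustrative assumptions, not measurements):

    # Back-of-envelope estimate; the sizes are assumptions for illustration.
    machine_memory = 8 * 1024 ** 3    # 8 GiB of RAM
    charm_dict_size = 4 * 1024        # ~4 KiB per charm dict
    print(machine_memory // charm_dict_size)  # 2097152, i.e. ~2 million charms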

This is premature optimization. If we actually start to encounter memory issues, we should solve them then. I expect the branch lookups are one of the last places that will give us issues, and we will likely have changed that code before we experience such issues anyhow.

We don't have a LOC limit in this project, and I don't think we should, but the 90+ lines required by merge_sequences are not worth the maintenance burden.

Here is a patch that provides the obvious implementation: http://pastebin.ubuntu.com/5742373/

Please apply the patch or take a similar approach.

Note that the patch uses basic data structures like dict and list. That kind of code is always easier to maintain than code that uses project-specific functions.
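For concreteness, here is a rough sketch of the shape such an implementation takes. The names get_branch_tips and available_charms come from this discussion, but the signature and the 'branch_spec' field here are assumptions, not the actual patch:

    def all_charms(db_charms, lp_tips):
        """Hypothetical sketch: merge DB charm records with LP branch
        tips using plain dicts, flagging charms whose branch is gone."""
        lp_branches = set(lp_tips)
        results = []
        for charm in db_charms:
            # A charm recorded in the DB whose branch no longer appears
            # in LP's listing has had its branch deleted.
            results.append(dict(
                charm,
                branch_deleted=charm['branch_spec'] not in lp_branches))
        return results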

There are also bugs in all_charms's handling of its limit and default parameters: the caller's import_filter is ignored entirely, and import_filter=default is always supplied to available_charms.
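The fix is simply to forward the caller's argument. A sketch, under the assumption that available_charms accepts the same parameter:

    def all_charms(import_filter=default, limit=None):
        # Forward the caller's import_filter rather than always
        # passing the default sentinel through.
        charms = available_charms(import_filter=import_filter)
        ...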

The limit is applied only to the LP data, not the db data. This means that too many results will be returned (assuming the number of charms in the db exceeds limit), and that any charm present in the DB but excluded from the available_charms output by the limit will incorrectly be assigned branch_deleted = True.

Simply applying a limit to both sets is not going to work either, because discrepancies between the db and LP's branch listings mean the limit can exclude a branch from the LP data while including it in the DB data. For example:

LP: ['a', 'b', 'c']
DB: ['a', 'c']

With a limit of 2, the LP set will be ['a', 'b'] and the DB set will be ['a', 'c'], falsely implying that the branch for 'c' was deleted.
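In code, with per-source truncation standing in for applying limit to each query separately:

    lp = ['a', 'b', 'c'][:2]   # limit applied to the LP listing -> ['a', 'b']
    db = ['a', 'c'][:2]        # limit applied to the DB query   -> ['a', 'c']
    # 'c' survives the DB cut but not the LP cut, so it looks deleted:
    print([name for name in db if name not in lp])  # ['c']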

I think the simplest option is to apply the limit after merging LP and DB data.
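A minimal sketch of that ordering, reusing the lists above (the merge shape is an assumption; the real code deals in charm dicts rather than bare names):

    def merged_with_limit(lp_branches, db_branches, limit):
        # Merge first, then truncate, so the limit cannot manufacture
        # a spurious branch_deleted=True.
        lp = set(lp_branches)
        merged = [{'name': name, 'branch_deleted': name not in lp}
                  for name in sorted(lp | set(db_branches))]
        return merged[:limit]

    # 'c' may be cut by the limit, but it is never falsely flagged:
    print(merged_with_limit(['a', 'b', 'c'], ['a', 'c'], 2))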

It's arguable that ProofIngestJob should not abort if the branch is deleted, since we would want the proof data to be updated as our proofer is updated. Of course, it would need to abort if the branch_dir does not exist. (Yes, once the number of worker nodes increases, relying on the branch cache means you may get different results depending on which worker runs the job.)

review: Needs Fixing
