Merge lp:~benji/charmworld/bundle-indexing into lp:~juju-jitsu/charmworld/trunk
- bundle-indexing
- Merge into trunk
Status: | Merged |
---|---|
Approved by: | Benji York |
Approved revision: | 328 |
Merged at revision: | 327 |
Proposed branch: | lp:~benji/charmworld/bundle-indexing |
Merge into: | lp:~juju-jitsu/charmworld/trunk |
Diff against target: |
386 lines (+135/-96) 5 files modified
charmworld/jobs/ingest.py (+25/-38) charmworld/jobs/tests/test_ingest.py (+65/-33) charmworld/jobs/tests/test_worker.py (+19/-22) charmworld/models.py (+7/-3) charmworld/tests/test_models.py (+19/-0) |
To merge this branch: | bzr merge lp:~benji/charmworld/bundle-indexing |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Benji York (community) | code | Approve | |
Aaron Bentley (community) | Approve | ||
Review via email: mp+176956@code.launchpad.net |
Commit message
Index bundles as part of ingest.
Description of the change
Benji York (benji) wrote : | # |
> Could you explain why quote_yaml was introduced in slurp_files? The keys in
> the input are supposed to be hex digits, so I don't see how they could contain
> '.' or '$'. I think it's preferable to raise a ValueError if inappropriate
> keys are supplied.
Good point. I guess that was a think-o on my part. I've removed the
quoting.
> quote_yaml lets us work around the fact that some of our keys are user-
> generated, but we want to change our internal representation so that no keys
> are user-generated. So ideally, we will not add any new quote_yaml call
> sites.
Since we store paths in hashed-files, we will still need to quote keys
(since the paths are keys and they can contain dots). I considered
writing a MongoDB-safe mapping that (un)quotes its keys as necessary,
but it felt a bit over-engineered for what we needed at this point in
time.
> Because the index is also used as a data store, it needs to be as tightly
> synchronized with the database as possible. So I'd like to see indexing and
> storing in the database presented as a single operation. In fact, this is one
> of the reasons CharmSource exists-- to present a unified CharmSource.save
> method.
I have addressed this by adding the indexing to the store_bundles function.
> At 143, consider making the side-effect a lambda. I think that might be more
> readable.
Done.
> Somehow, we've misspelled 'promulgated' as 'promultated'.
"Promultated" is the act of distributing potatoes.
Fixed.
Aaron Bentley (abentley) wrote : | # |
I was wrong about quote_yaml. The keys in dict from slurp_map are paths, not "hex digits", as I said. So they do need to be quoted, after all Please restore quote_yaml to decorate_basket and land. Sorry.
Charmworld Lander (charmworld-lander) wrote : | # |
There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.
Benji York (benji) wrote : | # |
Dear Charmworld Lander, I, being of sound mind and body, do hereby declare my latest revisions reviewed and approved and therefore designate them landable, by you, forthwith.
Charmworld Lander (charmworld-lander) wrote : | # |
Attempt to merge into lp:charmworld failed due to conflicts:
text conflict in charmworld/
Benji York (benji) wrote : | # |
Conflicts fixed.
Charmworld Lander (charmworld-lander) wrote : | # |
There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.
Benji York (benji) wrote : | # |
Conflicts fixed.
Preview Diff
1 | === modified file 'charmworld/jobs/ingest.py' |
2 | --- charmworld/jobs/ingest.py 2013-07-29 14:33:27 +0000 |
3 | +++ charmworld/jobs/ingest.py 2013-07-29 15:07:28 +0000 |
4 | @@ -49,9 +49,11 @@ |
5 | from charmworld.utils import ( |
6 | quote_key, |
7 | quote_yaml, |
8 | + unquote_yaml, |
9 | timestamp, |
10 | ) |
11 | |
12 | +# XXX Why not "from charmworld.jobs.config import..."? (Benji) |
13 | from config import CHARM_DIR |
14 | from config import CHARM_PROOF_PATH |
15 | from config import settings |
16 | @@ -274,32 +276,42 @@ |
17 | self.working_dir = working_dir |
18 | super(UpdateBundleJob, self).__init__() |
19 | |
20 | - def run(self, payload): |
21 | - self.log.info('Saving %s' % payload['branch_spec']) |
22 | - fs = getfs(self.db, collection='hashed-files') |
23 | - self.decorate_basket(payload, fs) |
24 | - self.db.baskets.save(payload) |
25 | - self.store_bundles(fs, payload, self.db.bundles) |
26 | + def store_bundles(self, deployer_config, basket_id): |
27 | + store_bundles(self.db.bundles, deployer_config, basket_id) |
28 | |
29 | @staticmethod |
30 | - def construct_id(basket_data, revno): |
31 | + def construct_basket_id(basket_data, revno): |
32 | owner_segment = '~%s/' % basket_data['owner'] |
33 | return '%s%s-%d' % (owner_segment, basket_data['name'], revno) |
34 | |
35 | + @staticmethod |
36 | + def decorate_bundles(basket_data): |
37 | + # Give each bundle a unique ID. |
38 | + for bundle_name, bundle in basket_data.items(): |
39 | + bundle['_id'] = '%s/%s' % (basket_data['_id'], bundle_name) |
40 | + |
41 | def decorate_basket(self, basket_data, fs): |
42 | branch_dir = fetch_branch(self.working_dir, basket_data, self.log) |
43 | branch = Branch.open(branch_dir) |
44 | revno = branch.revision_id_to_revno(basket_data['commit']) |
45 | tree = branch.repository.revision_tree(basket_data['commit']) |
46 | - basket_data['_id'] = self.construct_id(basket_data, revno) |
47 | - basket_data['file_hashes'] = slurp_files(fs, tree) |
48 | + basket_data['_id'] = self.construct_basket_id(basket_data, revno) |
49 | + basket_data['file_hashes'] = quote_yaml(slurp_files(fs, tree)) |
50 | |
51 | @staticmethod |
52 | - def store_bundles(fs, basket_data, collection): |
53 | - hashes = basket_data['file_hashes'] |
54 | + def get_deployer_config(fs, basket_data): |
55 | + hashes = unquote_yaml(basket_data['file_hashes']) |
56 | deployer_config_bytes = fs.get(hashes['bundles.yaml']) |
57 | - deployer_config = yaml.safe_load(deployer_config_bytes) |
58 | - store_bundles(collection, deployer_config, basket_data['_id']) |
59 | + return yaml.safe_load(deployer_config_bytes) |
60 | + |
61 | + def run(self, basket_data): |
62 | + self.log.info('Saving %s' % basket_data['branch_spec']) |
63 | + fs = getfs(self.db, collection='hashed-files') |
64 | + self.decorate_basket(basket_data, fs) |
65 | + self.db.baskets.save(basket_data) |
66 | + deployer_config = self.get_deployer_config(fs, basket_data) |
67 | + self.decorate_bundles(deployer_config) |
68 | + self.store_bundles(deployer_config, basket_data['_id']) |
69 | |
70 | |
71 | def _rev_info(r, branch): |
72 | @@ -690,28 +702,3 @@ |
73 | charm['address'] = address |
74 | charm['store_data'] = data |
75 | charm['store_url'] = make_store_url(data['revision'], address) |
76 | - |
77 | - |
78 | -# XXX j.c.sackett Jan 31 2013 Bug:1111708 scan_repo is swapped for |
79 | -# scan_charm to reindex what's on disk; we should probably just |
80 | -# have a different job for this that's not part of ingest. |
81 | -#def scan_repo(db, root_dir): |
82 | - #log = logging.getLogger("charm.scan") |
83 | - #charms = os.listdir(root_dir) |
84 | - #for c in charms: |
85 | - #charm_dir = os.path.join(root_dir, c) |
86 | - #if not os.path.isdir(charm_dir): |
87 | - #continue |
88 | - ##log.info("Processing %s", c) |
89 | - #try: |
90 | - #scan_charm( |
91 | - #db, c, charm_dir, repo="~charmers/charm/oneiric/%s" % c) |
92 | - #except: |
93 | - #log.exception("Unknown scan error") |
94 | - #raise |
95 | - #import pdb |
96 | - #import sys |
97 | - #import traceback |
98 | - #traceback.print_exc() |
99 | - #pdb.post_mortem(sys.exc_info()[-1]) |
100 | - #raise |
101 | |
102 | === modified file 'charmworld/jobs/tests/test_ingest.py' |
103 | --- charmworld/jobs/tests/test_ingest.py 2013-07-25 19:48:14 +0000 |
104 | +++ charmworld/jobs/tests/test_ingest.py 2013-07-29 15:07:28 +0000 |
105 | @@ -21,6 +21,7 @@ |
106 | ) |
107 | from mock import patch |
108 | from pyelasticsearch.exceptions import ElasticHttpNotFoundError |
109 | +import yaml |
110 | |
111 | from charmworld.charmstore import ( |
112 | CharmStore, |
113 | @@ -50,6 +51,9 @@ |
114 | response404, |
115 | TestCase, |
116 | ) |
117 | +from charmworld.utils import ( |
118 | + quote_key, |
119 | + ) |
120 | |
121 | |
122 | class TestUpdateDateCreated(TestCase): |
123 | @@ -288,6 +292,36 @@ |
124 | fs = getfs(self.db, 'hashed-files') |
125 | self.assertEqual(fs.get(expected_hash).read(), bytes) |
126 | |
127 | + def test_dots_in_file_paths_do_not_break_things(self): |
128 | + # Since mongodb doesn't like keys with dots in them and we want to |
129 | + # store a mapping from file name (potentially with dots) to file hash, |
130 | + # we need to quote the file patth. |
131 | + job = UpdateBundleJob() |
132 | + job.setup(self.db) |
133 | + with patch.object(job, 'get_deployer_config'): |
134 | + with bundle_update_environment(): |
135 | + with patch( |
136 | + 'charmworld.jobs.ingest.slurp_files', |
137 | + side_effect=lambda *_: {'a.b': 'X'}) as slurp_files: |
138 | + bundle_data = factory.get_payload_json() |
139 | + job.run(bundle_data) |
140 | + self.assertTrue(slurp_files.called) |
141 | + |
142 | + def test_file_paths_with_dots_can_be_retrieved(self): |
143 | + # Since we have to escape file paths for mongodb, it is important that |
144 | + # we check that file paths containing dots can be retrieved. |
145 | + job = UpdateBundleJob() |
146 | + basket_data = {'file_hashes': {quote_key('bundles.yaml'): 'HASH'}} |
147 | + class FauxFS: |
148 | + @staticmethod |
149 | + def get(hash): |
150 | + self.assertEqual(hash, 'HASH') |
151 | + self.get_called = True |
152 | + return '' |
153 | + job.get_deployer_config(FauxFS(), basket_data) |
154 | + self.assertTrue(self.get_called) |
155 | + |
156 | + |
157 | def test_run_pulls_in_all_branch_files(self): |
158 | # The ingest job puts all files in the branch into the database. |
159 | NAME = 'our-source-package' |
160 | @@ -303,32 +337,32 @@ |
161 | # database. |
162 | job = UpdateBundleJob() |
163 | job.setup(self.db) |
164 | - with bundle_update_environment(): |
165 | - def slurp_files_side_effect(*args): |
166 | - return {} |
167 | - with patch( |
168 | - 'charmworld.jobs.ingest.slurp_files', |
169 | - side_effect=slurp_files_side_effect) as slurp_files: |
170 | - bundle_data = factory.get_payload_json() |
171 | - job.run(bundle_data) |
172 | - self.assertTrue(slurp_files.called) |
173 | + with patch.object(job, 'get_deployer_config'): |
174 | + with bundle_update_environment(): |
175 | + with patch( |
176 | + 'charmworld.jobs.ingest.slurp_files', |
177 | + side_effect=lambda *_: {}) as slurp_files: |
178 | + bundle_data = factory.get_payload_json() |
179 | + job.run(bundle_data) |
180 | + self.assertTrue(slurp_files.called) |
181 | |
182 | def test_bundle_knows_where_its_files_are(self): |
183 | # The mapping of bundle file paths to hashes (keys in gridfs) is |
184 | # stored in the bundle document. |
185 | job = UpdateBundleJob() |
186 | job.setup(self.db) |
187 | - with bundle_update_environment(): |
188 | - def slurp_files_side_effect(*args): |
189 | - return {'path1': 'hash1', 'path2': 'hash2'} |
190 | - with patch( |
191 | - 'charmworld.jobs.ingest.slurp_files', |
192 | - side_effect=slurp_files_side_effect): |
193 | - bundle_data = factory.get_payload_json() |
194 | - job.run(bundle_data) |
195 | - self.assertEqual( |
196 | - bundle_data['file_hashes'], |
197 | - {'path1': 'hash1', 'path2': 'hash2'}) |
198 | + with patch.object(job, 'get_deployer_config'): |
199 | + with bundle_update_environment(): |
200 | + def slurp_files_side_effect(*args): |
201 | + return {'path1': 'hash1', 'path2': 'hash2'} |
202 | + with patch( |
203 | + 'charmworld.jobs.ingest.slurp_files', |
204 | + side_effect=slurp_files_side_effect): |
205 | + bundle_data = factory.get_payload_json() |
206 | + job.run(bundle_data) |
207 | + self.assertEqual( |
208 | + bundle_data['file_hashes'], |
209 | + {'path1': 'hash1', 'path2': 'hash2'}) |
210 | |
211 | def test_bundle_ids_include_their_revno(self): |
212 | job = UpdateBundleJob() |
213 | @@ -363,19 +397,19 @@ |
214 | with self.assertRaises(bzrlib.errors.NoSuchRevision): |
215 | job.decorate_basket(bundle_data, None) |
216 | |
217 | - def test_construct_id_includes_owner_if_promultated(self): |
218 | + def test_construct_basket_id_includes_owner_if_promulgated(self): |
219 | payload = factory.get_payload_json(name='mysql', promulgated=True) |
220 | - basket_id = UpdateBundleJob.construct_id(payload, 5) |
221 | + basket_id = UpdateBundleJob.construct_basket_id(payload, 5) |
222 | self.assertEqual('~charmers/mysql-5', basket_id) |
223 | |
224 | - def test_construct_id_includes_owner_if_not_promultated(self): |
225 | + def test_construct_basket_id_includes_owner_if_not_promulgated(self): |
226 | payload = factory.get_payload_json(name='mysql', promulgated=False) |
227 | - basket_id = UpdateBundleJob.construct_id(payload, 5) |
228 | + basket_id = UpdateBundleJob.construct_basket_id(payload, 5) |
229 | self.assertEqual('~charmers/mysql-5', basket_id) |
230 | |
231 | def test_job_can_store_bundles_in_the_db(self): |
232 | basket_data = factory.get_payload_json(name='wordpress') |
233 | - basket_id = UpdateBundleJob.construct_id(basket_data, 5) |
234 | + basket_id = UpdateBundleJob.construct_basket_id(basket_data, 5) |
235 | bundle_id = '%s/%s' % (basket_id, 'stage') |
236 | basket_data['_id'] = basket_id |
237 | job = UpdateBundleJob() |
238 | @@ -397,20 +431,18 @@ |
239 | engine: apache |
240 | """) |
241 | fs.put(deployer_config, _id=DEPLOYER_CONFIG_HASH) |
242 | - job.store_bundles(fs, basket_data, self.db.bundles) |
243 | + job.store_bundles(yaml.safe_load(deployer_config), basket_id) |
244 | self.assertIsNotNone(self.db.bundles.find_one(bundle_id)) |
245 | |
246 | def test_job_run_stores_bundles_in_the_db(self): |
247 | basket_data = factory.get_payload_json(name='wordpress') |
248 | job = UpdateBundleJob() |
249 | job.setup(self.db) |
250 | - with patch.object(job, 'decorate_basket'): |
251 | - with patch.object(job, 'store_bundles') as store_bundles: |
252 | - job.run(basket_data) |
253 | - self.assertTrue(store_bundles.called) |
254 | - |
255 | - |
256 | - # TODO add bundles to index |
257 | + with patch.object(job, 'get_deployer_config'): |
258 | + with patch.object(job, 'decorate_basket'): |
259 | + with patch.object(job, 'store_bundles') as store_bundles: |
260 | + job.run(basket_data) |
261 | + self.assertTrue(store_bundles.called) |
262 | |
263 | |
264 | class TestUpdateCharm(MongoTestBase): |
265 | |
266 | === modified file 'charmworld/jobs/tests/test_worker.py' |
267 | --- charmworld/jobs/tests/test_worker.py 2013-07-25 19:48:14 +0000 |
268 | +++ charmworld/jobs/tests/test_worker.py 2013-07-29 15:07:28 +0000 |
269 | @@ -67,6 +67,21 @@ |
270 | yield |
271 | |
272 | |
273 | +def FauxWorkerFactory(test_case): |
274 | + class FauxWorker: |
275 | + run_called = False |
276 | + |
277 | + def __init__(self, *args): |
278 | + pass |
279 | + |
280 | + @classmethod |
281 | + def run(cls, run_forever): |
282 | + test_case.assertFalse(run_forever) |
283 | + cls.run_called = True |
284 | + |
285 | + return FauxWorker |
286 | + |
287 | + |
288 | class WorkerTest(MongoTestBase): |
289 | |
290 | def setUp(self): |
291 | @@ -185,31 +200,13 @@ |
292 | self.assertIn(queue_entry, queue_entries_run) |
293 | |
294 | def test_worker_does_not_run_forever_by_default(self): |
295 | - run_called = [] |
296 | - class FauxWorker: |
297 | - def __init__(self, *args): |
298 | - pass |
299 | - |
300 | - @staticmethod |
301 | - def run(run_forever): |
302 | - self.assertFalse(run_forever) |
303 | - run_called.append(True) |
304 | - |
305 | + FauxWorker = FauxWorkerFactory(self) |
306 | with patch('charmworld.jobs.worker.QueueWorker', new=FauxWorker): |
307 | self.run_main_with_exit() |
308 | - self.assertTrue(run_called) |
309 | + self.assertTrue(FauxWorker.run_called) |
310 | |
311 | def test_worker_runs_forever_if_command_line_says_to(self): |
312 | - run_called = [] |
313 | - class FauxWorker: |
314 | - def __init__(self, *args): |
315 | - pass |
316 | - |
317 | - @staticmethod |
318 | - def run(run_forever): |
319 | - self.assertFalse(run_forever) |
320 | - run_called.append(True) |
321 | - |
322 | + FauxWorker = FauxWorkerFactory(self) |
323 | with patch('charmworld.jobs.worker.QueueWorker', new=FauxWorker): |
324 | self.run_main_with_exit() |
325 | - self.assertTrue(run_called) |
326 | + self.assertTrue(FauxWorker.run_called) |
327 | |
328 | === modified file 'charmworld/models.py' |
329 | --- charmworld/models.py 2013-07-29 11:29:28 +0000 |
330 | +++ charmworld/models.py 2013-07-29 15:07:28 +0000 |
331 | @@ -28,6 +28,7 @@ |
332 | quote_key, |
333 | ) |
334 | from charmworld.search import ElasticSearchClient |
335 | +from charmworld.jobs.config import settings |
336 | |
337 | OBSOLETE_SERIES = set(['oneiric']) |
338 | |
339 | @@ -1213,11 +1214,14 @@ |
340 | return options |
341 | |
342 | |
343 | -def store_bundles(collection, deployer_data, basket_id): |
344 | - for key in deployer_data: |
345 | +def store_bundles(collection, deployer_config, basket_id): |
346 | + """Store a basket of bundles into both MongoDB and ElasticSearch.""" |
347 | + index_client = ElasticSearchClient.from_settings(settings) |
348 | + index_client.index_bundles(deployer_config.values()) |
349 | + for key in deployer_config: |
350 | bundle_doc = { |
351 | '_id': '%s/%s' % (basket_id, key), |
352 | - 'data': get_flattened_deployment(deployer_data, key), |
353 | + 'data': get_flattened_deployment(deployer_config, key), |
354 | } |
355 | collection.save(bundle_doc) |
356 | |
357 | |
358 | === modified file 'charmworld/tests/test_models.py' |
359 | --- charmworld/tests/test_models.py 2013-07-29 11:29:28 +0000 |
360 | +++ charmworld/tests/test_models.py 2013-07-29 15:07:28 +0000 |
361 | @@ -1316,6 +1316,25 @@ |
362 | store_bundles(self.db.bundles, parsed, 'wordpress-5') |
363 | self.assertItemsEqual(['wordpress-stage', 'wordpress-prod'], keys) |
364 | |
365 | + def test_storing_a_bundle_includes_indexing_it(self): |
366 | + class FauxElasticSearchClient: |
367 | + index_bundles_called = False |
368 | + |
369 | + @classmethod |
370 | + def from_settings(cls, *args): |
371 | + return cls |
372 | + |
373 | + @classmethod |
374 | + def index_bundles(cls, *args): |
375 | + cls.index_bundles_called = True |
376 | + |
377 | + with patch( |
378 | + 'charmworld.models.ElasticSearchClient', |
379 | + FauxElasticSearchClient): |
380 | + store_bundles(self.db.bundles, {}, 'wordpress-5') |
381 | + |
382 | + self.assertTrue(FauxElasticSearchClient.index_bundles_called) |
383 | + |
384 | |
385 | class BundleTestCase(TestCase): |
386 | """Unit tests for the Bundle model that wraps bundle representations.""" |
Could you explain why quote_yaml was introduced in slurp_files? The keys in the input are supposed to be hex digits, so I don't see how they could contain '.' or '$'. I think it's preferable to raise a ValueError if inappropriate keys are supplied.
quote_yaml lets us work around the fact that some of our keys are user-generated, but we want to change our internal representation so that no keys are user-generated. So ideally, we will not add any new quote_yaml call sites.
Because the index is also used as a data store, it needs to be as tightly synchronized with the database as possible. So I'd like to see indexing and storing in the database presented as a single operation. In fact, this is one of the reasons CharmSource exists-- to present a unified CharmSource.save method. It's important that the same representation is stored in the index and the db, because we want to be able to use them interchangeably. Right now, models. store_bundles is storing a different representation from UpdateBundle. index_bundles. Both of these concerns could be addressed by having models. store_bundles accept an index client and update the index as well.
At 143, consider making the side-effect a lambda. I think that might be more readable.
Somehow, we've misspelled 'promulgated' as 'promultated'. Could you fix that (since you're renaming those tests anyway)?