Merge lp:~benji/charmworld/bundle-indexing into lp:~juju-jitsu/charmworld/trunk

Proposed by Benji York
Status: Merged
Approved by: Benji York
Approved revision: 328
Merged at revision: 327
Proposed branch: lp:~benji/charmworld/bundle-indexing
Merge into: lp:~juju-jitsu/charmworld/trunk
Diff against target: 386 lines (+135/-96)
5 files modified
charmworld/jobs/ingest.py (+25/-38)
charmworld/jobs/tests/test_ingest.py (+65/-33)
charmworld/jobs/tests/test_worker.py (+19/-22)
charmworld/models.py (+7/-3)
charmworld/tests/test_models.py (+19/-0)
To merge this branch: bzr merge lp:~benji/charmworld/bundle-indexing
Reviewer Review Type Date Requested Status
Benji York (community) code Approve
Aaron Bentley (community) Approve
Review via email: mp+176956@code.launchpad.net

Commit message

Index bundles as part of ingest.

To post a comment you must log in.
Revision history for this message
Aaron Bentley (abentley) wrote :

Could you explain why quote_yaml was introduced in slurp_files? The keys in the input are supposed to be hex digits, so I don't see how they could contain '.' or '$'. I think it's preferable to raise a ValueError if inappropriate keys are supplied.

quote_yaml lets us work around the fact that some of our keys are user-generated, but we want to change our internal representation so that no keys are user-generated. So ideally, we will not add any new quote_yaml call sites.

Because the index is also used as a data store, it needs to be as tightly synchronized with the database as possible. So I'd like to see indexing and storing in the database presented as a single operation. In fact, this is one of the reasons CharmSource exists-- to present a unified CharmSource.save method. It's important that the same representation is stored in the index and the db, because we want to be able to use them interchangeably. Right now, models.store_bundles is storing a different representation from UpdateBundle.index_bundles. Both of these concerns could be addressed by having models.store_bundles accept an index client and update the index as well.

At 143, consider making the side-effect a lambda. I think that might be more readable.

Somehow, we've misspelled 'promulgated' as 'promultated'. Could you fix that (since you're renaming those tests anyway)?

review: Needs Information
Revision history for this message
Benji York (benji) wrote :

> Could you explain why quote_yaml was introduced in slurp_files? The keys in
> the input are supposed to be hex digits, so I don't see how they could contain
> '.' or '$'. I think it's preferable to raise a ValueError if inappropriate
> keys are supplied.

Good point. I guess that was a think-o on my part. I've removed the
quoting.

> quote_yaml lets us work around the fact that some of our keys are user-
> generated, but we want to change our internal representation so that no keys
> are user-generated. So ideally, we will not add any new quote_yaml call
> sites.

Since we store paths in hashed-files, we will still need to quote keys
(since the paths are keys and they can contain dots). I considered
writing a MongoDB-safe mapping that (un)quotes its keys as necessary,
but it felt a bit over-engineered for what we needed at this point in
time.

> Because the index is also used as a data store, it needs to be as tightly
> synchronized with the database as possible. So I'd like to see indexing and
> storing in the database presented as a single operation. In fact, this is one
> of the reasons CharmSource exists-- to present a unified CharmSource.save
> method.

I have addressed this by adding the indexing to the store_bundles function.

> At 143, consider making the side-effect a lambda. I think that might be more
> readable.

Done.

> Somehow, we've misspelled 'promulgated' as 'promultated'.

"Promultated" is the act of distributing potatoes.

Fixed.

Revision history for this message
Aaron Bentley (abentley) wrote :

I was wrong about quote_yaml. The keys in dict from slurp_map are paths, not "hex digits", as I said. So they do need to be quoted, after all Please restore quote_yaml to decorate_basket and land. Sorry.

review: Approve
Revision history for this message
Charmworld Lander (charmworld-lander) wrote :

There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.

Revision history for this message
Benji York (benji) wrote :

Dear Charmworld Lander, I, being of sound mind and body, do hereby declare my latest revisions reviewed and approved and therefore designate them landable, by you, forthwith.

review: Approve (code)
Revision history for this message
Charmworld Lander (charmworld-lander) wrote :

Attempt to merge into lp:charmworld failed due to conflicts:

text conflict in charmworld/jobs/ingest.py

Revision history for this message
Benji York (benji) wrote :

Conflicts fixed.

review: Approve (code)
Revision history for this message
Charmworld Lander (charmworld-lander) wrote :

There are additional revisions which have not been approved in review. Please seek review and approval of these new revisions.

Revision history for this message
Benji York (benji) wrote :

Conflicts fixed.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'charmworld/jobs/ingest.py'
2--- charmworld/jobs/ingest.py 2013-07-29 14:33:27 +0000
3+++ charmworld/jobs/ingest.py 2013-07-29 15:07:28 +0000
4@@ -49,9 +49,11 @@
5 from charmworld.utils import (
6 quote_key,
7 quote_yaml,
8+ unquote_yaml,
9 timestamp,
10 )
11
12+# XXX Why not "from charmworld.jobs.config import..."? (Benji)
13 from config import CHARM_DIR
14 from config import CHARM_PROOF_PATH
15 from config import settings
16@@ -274,32 +276,42 @@
17 self.working_dir = working_dir
18 super(UpdateBundleJob, self).__init__()
19
20- def run(self, payload):
21- self.log.info('Saving %s' % payload['branch_spec'])
22- fs = getfs(self.db, collection='hashed-files')
23- self.decorate_basket(payload, fs)
24- self.db.baskets.save(payload)
25- self.store_bundles(fs, payload, self.db.bundles)
26+ def store_bundles(self, deployer_config, basket_id):
27+ store_bundles(self.db.bundles, deployer_config, basket_id)
28
29 @staticmethod
30- def construct_id(basket_data, revno):
31+ def construct_basket_id(basket_data, revno):
32 owner_segment = '~%s/' % basket_data['owner']
33 return '%s%s-%d' % (owner_segment, basket_data['name'], revno)
34
35+ @staticmethod
36+ def decorate_bundles(basket_data):
37+ # Give each bundle a unique ID.
38+ for bundle_name, bundle in basket_data.items():
39+ bundle['_id'] = '%s/%s' % (basket_data['_id'], bundle_name)
40+
41 def decorate_basket(self, basket_data, fs):
42 branch_dir = fetch_branch(self.working_dir, basket_data, self.log)
43 branch = Branch.open(branch_dir)
44 revno = branch.revision_id_to_revno(basket_data['commit'])
45 tree = branch.repository.revision_tree(basket_data['commit'])
46- basket_data['_id'] = self.construct_id(basket_data, revno)
47- basket_data['file_hashes'] = slurp_files(fs, tree)
48+ basket_data['_id'] = self.construct_basket_id(basket_data, revno)
49+ basket_data['file_hashes'] = quote_yaml(slurp_files(fs, tree))
50
51 @staticmethod
52- def store_bundles(fs, basket_data, collection):
53- hashes = basket_data['file_hashes']
54+ def get_deployer_config(fs, basket_data):
55+ hashes = unquote_yaml(basket_data['file_hashes'])
56 deployer_config_bytes = fs.get(hashes['bundles.yaml'])
57- deployer_config = yaml.safe_load(deployer_config_bytes)
58- store_bundles(collection, deployer_config, basket_data['_id'])
59+ return yaml.safe_load(deployer_config_bytes)
60+
61+ def run(self, basket_data):
62+ self.log.info('Saving %s' % basket_data['branch_spec'])
63+ fs = getfs(self.db, collection='hashed-files')
64+ self.decorate_basket(basket_data, fs)
65+ self.db.baskets.save(basket_data)
66+ deployer_config = self.get_deployer_config(fs, basket_data)
67+ self.decorate_bundles(deployer_config)
68+ self.store_bundles(deployer_config, basket_data['_id'])
69
70
71 def _rev_info(r, branch):
72@@ -690,28 +702,3 @@
73 charm['address'] = address
74 charm['store_data'] = data
75 charm['store_url'] = make_store_url(data['revision'], address)
76-
77-
78-# XXX j.c.sackett Jan 31 2013 Bug:1111708 scan_repo is swapped for
79-# scan_charm to reindex what's on disk; we should probably just
80-# have a different job for this that's not part of ingest.
81-#def scan_repo(db, root_dir):
82- #log = logging.getLogger("charm.scan")
83- #charms = os.listdir(root_dir)
84- #for c in charms:
85- #charm_dir = os.path.join(root_dir, c)
86- #if not os.path.isdir(charm_dir):
87- #continue
88- ##log.info("Processing %s", c)
89- #try:
90- #scan_charm(
91- #db, c, charm_dir, repo="~charmers/charm/oneiric/%s" % c)
92- #except:
93- #log.exception("Unknown scan error")
94- #raise
95- #import pdb
96- #import sys
97- #import traceback
98- #traceback.print_exc()
99- #pdb.post_mortem(sys.exc_info()[-1])
100- #raise
101
102=== modified file 'charmworld/jobs/tests/test_ingest.py'
103--- charmworld/jobs/tests/test_ingest.py 2013-07-25 19:48:14 +0000
104+++ charmworld/jobs/tests/test_ingest.py 2013-07-29 15:07:28 +0000
105@@ -21,6 +21,7 @@
106 )
107 from mock import patch
108 from pyelasticsearch.exceptions import ElasticHttpNotFoundError
109+import yaml
110
111 from charmworld.charmstore import (
112 CharmStore,
113@@ -50,6 +51,9 @@
114 response404,
115 TestCase,
116 )
117+from charmworld.utils import (
118+ quote_key,
119+ )
120
121
122 class TestUpdateDateCreated(TestCase):
123@@ -288,6 +292,36 @@
124 fs = getfs(self.db, 'hashed-files')
125 self.assertEqual(fs.get(expected_hash).read(), bytes)
126
127+ def test_dots_in_file_paths_do_not_break_things(self):
128+ # Since mongodb doesn't like keys with dots in them and we want to
129+ # store a mapping from file name (potentially with dots) to file hash,
130+ # we need to quote the file patth.
131+ job = UpdateBundleJob()
132+ job.setup(self.db)
133+ with patch.object(job, 'get_deployer_config'):
134+ with bundle_update_environment():
135+ with patch(
136+ 'charmworld.jobs.ingest.slurp_files',
137+ side_effect=lambda *_: {'a.b': 'X'}) as slurp_files:
138+ bundle_data = factory.get_payload_json()
139+ job.run(bundle_data)
140+ self.assertTrue(slurp_files.called)
141+
142+ def test_file_paths_with_dots_can_be_retrieved(self):
143+ # Since we have to escape file paths for mongodb, it is important that
144+ # we check that file paths containing dots can be retrieved.
145+ job = UpdateBundleJob()
146+ basket_data = {'file_hashes': {quote_key('bundles.yaml'): 'HASH'}}
147+ class FauxFS:
148+ @staticmethod
149+ def get(hash):
150+ self.assertEqual(hash, 'HASH')
151+ self.get_called = True
152+ return ''
153+ job.get_deployer_config(FauxFS(), basket_data)
154+ self.assertTrue(self.get_called)
155+
156+
157 def test_run_pulls_in_all_branch_files(self):
158 # The ingest job puts all files in the branch into the database.
159 NAME = 'our-source-package'
160@@ -303,32 +337,32 @@
161 # database.
162 job = UpdateBundleJob()
163 job.setup(self.db)
164- with bundle_update_environment():
165- def slurp_files_side_effect(*args):
166- return {}
167- with patch(
168- 'charmworld.jobs.ingest.slurp_files',
169- side_effect=slurp_files_side_effect) as slurp_files:
170- bundle_data = factory.get_payload_json()
171- job.run(bundle_data)
172- self.assertTrue(slurp_files.called)
173+ with patch.object(job, 'get_deployer_config'):
174+ with bundle_update_environment():
175+ with patch(
176+ 'charmworld.jobs.ingest.slurp_files',
177+ side_effect=lambda *_: {}) as slurp_files:
178+ bundle_data = factory.get_payload_json()
179+ job.run(bundle_data)
180+ self.assertTrue(slurp_files.called)
181
182 def test_bundle_knows_where_its_files_are(self):
183 # The mapping of bundle file paths to hashes (keys in gridfs) is
184 # stored in the bundle document.
185 job = UpdateBundleJob()
186 job.setup(self.db)
187- with bundle_update_environment():
188- def slurp_files_side_effect(*args):
189- return {'path1': 'hash1', 'path2': 'hash2'}
190- with patch(
191- 'charmworld.jobs.ingest.slurp_files',
192- side_effect=slurp_files_side_effect):
193- bundle_data = factory.get_payload_json()
194- job.run(bundle_data)
195- self.assertEqual(
196- bundle_data['file_hashes'],
197- {'path1': 'hash1', 'path2': 'hash2'})
198+ with patch.object(job, 'get_deployer_config'):
199+ with bundle_update_environment():
200+ def slurp_files_side_effect(*args):
201+ return {'path1': 'hash1', 'path2': 'hash2'}
202+ with patch(
203+ 'charmworld.jobs.ingest.slurp_files',
204+ side_effect=slurp_files_side_effect):
205+ bundle_data = factory.get_payload_json()
206+ job.run(bundle_data)
207+ self.assertEqual(
208+ bundle_data['file_hashes'],
209+ {'path1': 'hash1', 'path2': 'hash2'})
210
211 def test_bundle_ids_include_their_revno(self):
212 job = UpdateBundleJob()
213@@ -363,19 +397,19 @@
214 with self.assertRaises(bzrlib.errors.NoSuchRevision):
215 job.decorate_basket(bundle_data, None)
216
217- def test_construct_id_includes_owner_if_promultated(self):
218+ def test_construct_basket_id_includes_owner_if_promulgated(self):
219 payload = factory.get_payload_json(name='mysql', promulgated=True)
220- basket_id = UpdateBundleJob.construct_id(payload, 5)
221+ basket_id = UpdateBundleJob.construct_basket_id(payload, 5)
222 self.assertEqual('~charmers/mysql-5', basket_id)
223
224- def test_construct_id_includes_owner_if_not_promultated(self):
225+ def test_construct_basket_id_includes_owner_if_not_promulgated(self):
226 payload = factory.get_payload_json(name='mysql', promulgated=False)
227- basket_id = UpdateBundleJob.construct_id(payload, 5)
228+ basket_id = UpdateBundleJob.construct_basket_id(payload, 5)
229 self.assertEqual('~charmers/mysql-5', basket_id)
230
231 def test_job_can_store_bundles_in_the_db(self):
232 basket_data = factory.get_payload_json(name='wordpress')
233- basket_id = UpdateBundleJob.construct_id(basket_data, 5)
234+ basket_id = UpdateBundleJob.construct_basket_id(basket_data, 5)
235 bundle_id = '%s/%s' % (basket_id, 'stage')
236 basket_data['_id'] = basket_id
237 job = UpdateBundleJob()
238@@ -397,20 +431,18 @@
239 engine: apache
240 """)
241 fs.put(deployer_config, _id=DEPLOYER_CONFIG_HASH)
242- job.store_bundles(fs, basket_data, self.db.bundles)
243+ job.store_bundles(yaml.safe_load(deployer_config), basket_id)
244 self.assertIsNotNone(self.db.bundles.find_one(bundle_id))
245
246 def test_job_run_stores_bundles_in_the_db(self):
247 basket_data = factory.get_payload_json(name='wordpress')
248 job = UpdateBundleJob()
249 job.setup(self.db)
250- with patch.object(job, 'decorate_basket'):
251- with patch.object(job, 'store_bundles') as store_bundles:
252- job.run(basket_data)
253- self.assertTrue(store_bundles.called)
254-
255-
256- # TODO add bundles to index
257+ with patch.object(job, 'get_deployer_config'):
258+ with patch.object(job, 'decorate_basket'):
259+ with patch.object(job, 'store_bundles') as store_bundles:
260+ job.run(basket_data)
261+ self.assertTrue(store_bundles.called)
262
263
264 class TestUpdateCharm(MongoTestBase):
265
266=== modified file 'charmworld/jobs/tests/test_worker.py'
267--- charmworld/jobs/tests/test_worker.py 2013-07-25 19:48:14 +0000
268+++ charmworld/jobs/tests/test_worker.py 2013-07-29 15:07:28 +0000
269@@ -67,6 +67,21 @@
270 yield
271
272
273+def FauxWorkerFactory(test_case):
274+ class FauxWorker:
275+ run_called = False
276+
277+ def __init__(self, *args):
278+ pass
279+
280+ @classmethod
281+ def run(cls, run_forever):
282+ test_case.assertFalse(run_forever)
283+ cls.run_called = True
284+
285+ return FauxWorker
286+
287+
288 class WorkerTest(MongoTestBase):
289
290 def setUp(self):
291@@ -185,31 +200,13 @@
292 self.assertIn(queue_entry, queue_entries_run)
293
294 def test_worker_does_not_run_forever_by_default(self):
295- run_called = []
296- class FauxWorker:
297- def __init__(self, *args):
298- pass
299-
300- @staticmethod
301- def run(run_forever):
302- self.assertFalse(run_forever)
303- run_called.append(True)
304-
305+ FauxWorker = FauxWorkerFactory(self)
306 with patch('charmworld.jobs.worker.QueueWorker', new=FauxWorker):
307 self.run_main_with_exit()
308- self.assertTrue(run_called)
309+ self.assertTrue(FauxWorker.run_called)
310
311 def test_worker_runs_forever_if_command_line_says_to(self):
312- run_called = []
313- class FauxWorker:
314- def __init__(self, *args):
315- pass
316-
317- @staticmethod
318- def run(run_forever):
319- self.assertFalse(run_forever)
320- run_called.append(True)
321-
322+ FauxWorker = FauxWorkerFactory(self)
323 with patch('charmworld.jobs.worker.QueueWorker', new=FauxWorker):
324 self.run_main_with_exit()
325- self.assertTrue(run_called)
326+ self.assertTrue(FauxWorker.run_called)
327
328=== modified file 'charmworld/models.py'
329--- charmworld/models.py 2013-07-29 11:29:28 +0000
330+++ charmworld/models.py 2013-07-29 15:07:28 +0000
331@@ -28,6 +28,7 @@
332 quote_key,
333 )
334 from charmworld.search import ElasticSearchClient
335+from charmworld.jobs.config import settings
336
337 OBSOLETE_SERIES = set(['oneiric'])
338
339@@ -1213,11 +1214,14 @@
340 return options
341
342
343-def store_bundles(collection, deployer_data, basket_id):
344- for key in deployer_data:
345+def store_bundles(collection, deployer_config, basket_id):
346+ """Store a basket of bundles into both MongoDB and ElasticSearch."""
347+ index_client = ElasticSearchClient.from_settings(settings)
348+ index_client.index_bundles(deployer_config.values())
349+ for key in deployer_config:
350 bundle_doc = {
351 '_id': '%s/%s' % (basket_id, key),
352- 'data': get_flattened_deployment(deployer_data, key),
353+ 'data': get_flattened_deployment(deployer_config, key),
354 }
355 collection.save(bundle_doc)
356
357
358=== modified file 'charmworld/tests/test_models.py'
359--- charmworld/tests/test_models.py 2013-07-29 11:29:28 +0000
360+++ charmworld/tests/test_models.py 2013-07-29 15:07:28 +0000
361@@ -1316,6 +1316,25 @@
362 store_bundles(self.db.bundles, parsed, 'wordpress-5')
363 self.assertItemsEqual(['wordpress-stage', 'wordpress-prod'], keys)
364
365+ def test_storing_a_bundle_includes_indexing_it(self):
366+ class FauxElasticSearchClient:
367+ index_bundles_called = False
368+
369+ @classmethod
370+ def from_settings(cls, *args):
371+ return cls
372+
373+ @classmethod
374+ def index_bundles(cls, *args):
375+ cls.index_bundles_called = True
376+
377+ with patch(
378+ 'charmworld.models.ElasticSearchClient',
379+ FauxElasticSearchClient):
380+ store_bundles(self.db.bundles, {}, 'wordpress-5')
381+
382+ self.assertTrue(FauxElasticSearchClient.index_bundles_called)
383+
384
385 class BundleTestCase(TestCase):
386 """Unit tests for the Bundle model that wraps bundle representations."""

Subscribers

People subscribed via source and target branches