Merge lp:~adeuring/charmworld/1206659-simpler-es-mapping into lp:~juju-jitsu/charmworld/trunk

Proposed by Abel Deuring
Status: Merged
Approved by: Abel Deuring
Approved revision: 353
Merged at revision: 357
Proposed branch: lp:~adeuring/charmworld/1206659-simpler-es-mapping
Merge into: lp:~juju-jitsu/charmworld/trunk
Diff against target: 263 lines (+144/-10)
5 files modified
charmworld/migrations/versions/016_fixed_ES_mapping_for_charms.py (+8/-0)
charmworld/migrations/versions/tests/test_migrations.py (+20/-0)
charmworld/search.py (+51/-4)
charmworld/tests/test_models.py (+6/-5)
charmworld/tests/test_search.py (+59/-1)
To merge this branch: bzr merge lp:~adeuring/charmworld/1206659-simpler-es-mapping
Reviewer Review Type Date Requested Status
Abel Deuring (community) Approve
Curtis Hovey (community) code Approve
Review via email: mp+181100@code.launchpad.net

Commit message

Define a (mostly) static mapping for charms.

Description of the change

This branch is a first step to fix bug 1206659: Charmworld indexes far too
many fields due to Elasticsearch dynamic mapping.

Limitations:

  - It defines a static mapping only for charms, not for bundles
  - The attributes "requires" and "provides" of a charm still have
    a dynamic mapping.

Details:

The method ElasticSearchClient.put_mapping() now defines a static mapping
for charms. All fields needed for text searching, for filtering and ordering
are indexed, but no other fields.

charm['requires'] and charm['provides'] are dictionaries with user-defined
keys, hence it is impossible to define a static mapping. Fixing bug
1194907
will allow to define a static mapping for these fields too.

test_mapping_changes_during_indexing() shows that indexing a charm
changes only the mapping for requires/provides.

TestCharmSource.test_save_deletes_on_error() needed a change: Attempts
to index two charm with different types of a property named "b" now just
work because charm['b'] is now ignored by ES.

Similary, TestCharmSource.test_sync_index_script() needed a change
because the field "maintainer" is no longer indexed. (When I tried to
fix this test, I noticed bug 1214328.)

Testing the new migration showed that ES considers the new static mapping
to be compatible with the old dynamic mapping: If
ElasticSearchClient.put_mapping() is called when the index already contains
data with the previous dynamic mapping, the new index specifications are
simply accepted but the mapping stays dynamic. Hence I added the parameter
force_reindex to charmworld.search.update().

To post a comment you must log in.
Revision history for this message
Curtis Hovey (sinzui) wrote :

Hi Able.

I think our intent is to treat charm_text_index, charm_integer_index as constants. Can you capitalise these identifiers and change them from lists to tuples (to ensure nothing can mutate them)?

Please place an XXX in front of the comment on line 230 because we intend to fix this issue soon.

I think this is good to land.

review: Approve (code)
353. By Abel Deuring

implemented reviewer's comments.

Revision history for this message
Abel Deuring (adeuring) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'charmworld/migrations/versions/016_fixed_ES_mapping_for_charms.py'
2--- charmworld/migrations/versions/016_fixed_ES_mapping_for_charms.py 1970-01-01 00:00:00 +0000
3+++ charmworld/migrations/versions/016_fixed_ES_mapping_for_charms.py 2013-08-20 17:21:26 +0000
4@@ -0,0 +1,8 @@
5+# Copyright 2013 Canonical Ltd. This software is licensed under the
6+# GNU Affero General Public License version 3 (see the file LICENSE).
7+
8+# fixed ElasticSearch mapping for charms.
9+from charmworld.search import update as es_update
10+
11+def upgrade(db, index_client):
12+ es_update(index_client)
13
14=== modified file 'charmworld/migrations/versions/tests/test_migrations.py'
15--- charmworld/migrations/versions/tests/test_migrations.py 2013-08-14 15:28:02 +0000
16+++ charmworld/migrations/versions/tests/test_migrations.py 2013-08-20 17:21:26 +0000
17@@ -18,6 +18,7 @@
18 )
19 from charmworld.migrations import migrate
20 from charmworld.migrations.migrate import Versions
21+from charmworld.tests.test_search import put_mapping
22
23 import mock
24
25@@ -208,3 +209,22 @@
26 # No exception is raised.
27 self.versions.run_migration(self.db, self.index_client,
28 '015_drop_bundle_doctype.py')
29+
30+
31+class TestMigration016(MigrationTestBase):
32+
33+ def test_es_mapping_is_static(self):
34+ self.use_index_client(put_mapping=False)
35+ put_mapping(
36+ self.index_client,
37+ {'box': {'type': 'string', 'index': 'not_analyzed'}},
38+ dynamic=True)
39+ self.index_client.index_charm(factory.get_charm_json())
40+ old_mapping = self.index_client.get_mapping()
41+ self.assertIn(
42+ 'files', old_mapping['charm']['properties']['data']['properties'])
43+ self.versions.run_migration(self.db, self.index_client,
44+ '016_fixed_ES_mapping_for_charms.py')
45+ new_mapping = self.index_client.get_mapping()
46+ self.assertNotIn(
47+ 'files', new_mapping['charm']['properties']['data']['properties'])
48
49=== modified file 'charmworld/search.py'
50--- charmworld/search.py 2013-08-14 15:12:04 +0000
51+++ charmworld/search.py 2013-08-20 17:21:26 +0000
52@@ -190,6 +190,47 @@
53 charm_properties = dict(
54 (name, {'type': 'string', 'index': 'not_analyzed'})
55 for name in charm_exact_index)
56+
57+ CHARM_TEXT_INDEX = (
58+ 'summary',
59+ 'description',
60+ )
61+ charm_properties.update(dict(
62+ (name, {'type': 'string'}) for name in CHARM_TEXT_INDEX))
63+
64+ CHARM_INTEGER_INDEX = (
65+ 'downloads',
66+ 'downloads_in_past_30_days',
67+ 'downloads_in_past_7_days',
68+ 'downloads_in_past_half_year',
69+ )
70+ charm_properties.update(dict(
71+ (name, {'type': 'long'}) for name in CHARM_INTEGER_INDEX))
72+
73+ charm_properties['date_created'] = {
74+ 'type': 'date',
75+ }
76+ charm_properties['is_featured'] = {
77+ 'type': 'boolean',
78+ }
79+ charm_properties['store_data'] = {
80+ 'properties': {
81+ 'errors': {'type': 'string'},
82+ }
83+ }
84+ # XXX Abel Deuring 2013-08-19 bug=1194907: The fields requires
85+ # and provides are user defined dictionaries. We can't specify
86+ # all possible keys. This partly reintroduces a large mapping;
87+ # the proper way to fix this is described in bug 1194907.
88+ charm_properties['requires'] = {
89+ 'type': 'object',
90+ 'dynamic': True,
91+ }
92+ charm_properties['provides'] = {
93+ 'type': 'object',
94+ 'dynamic': True,
95+ }
96+
97 charm_properties['config'] = {
98 'properties': {
99 'options': {
100@@ -218,10 +259,13 @@
101 for name in bundle_exact_index)
102
103 with translate_error():
104- for (name, properties) in ((CHARM, charm_properties),
105- (BUNDLE, bundle_properties)):
106+ for (name, properties, dynamic) in (
107+ (CHARM, charm_properties, False),
108+ (BUNDLE, bundle_properties, True)):
109 self._client.put_mapping(self.index_name, name, {
110 name: {
111+ 'type': 'object',
112+ 'dynamic': dynamic,
113 'properties': properties,
114 }
115 })
116@@ -645,7 +689,7 @@
117 return new_index
118
119
120-def update(index_client):
121+def update(index_client, force_reindex=False):
122 """Update the index to use the current mapping.
123
124 If the index isn't present, a new index is created and aliased to it.
125@@ -654,7 +698,10 @@
126 """
127 actual_client = index_client
128 try:
129- index_client.put_mapping()
130+ if force_reindex:
131+ actual_client = reindex(index_client)
132+ else:
133+ index_client.put_mapping()
134 except IncompatibleMapping:
135 actual_client = reindex(index_client)
136 except IndexMissing:
137
138=== modified file 'charmworld/tests/test_models.py'
139--- charmworld/tests/test_models.py 2013-08-19 14:45:29 +0000
140+++ charmworld/tests/test_models.py 2013-08-20 17:21:26 +0000
141@@ -1420,10 +1420,10 @@
142 self.assertEqual(foo, bar)
143
144 def test_sync_index_script(self):
145- factory.makeCharm(self.db)[0]
146+ factory.makeCharm(self.db, owner='b')[0]
147 # Create a charm which is index-incompatible with the previous one.
148- factory.makeCharm(self.db,
149- maintainer={'first': 'J', 'last': 'Random'})[0]
150+ factory.makeCharm(self.db, owner='a',
151+ description={'text': 'whatever'})[0]
152 handler = self.get_handler('sync-index')
153 with patch('charmworld.models.configure_logging'):
154 with patch('charmworld.models.get_ini', lambda: INI):
155@@ -1438,18 +1438,19 @@
156 source = CharmSource.from_request(self)
157 source.save({
158 '_id': 'a',
159- 'b': {},
160 'owner': 'foo',
161 'series': 'bar',
162 'name': 'baz',
163+ 'description': 'whatever',
164 })
165 with self.assertRaises(ElasticHttpError):
166 source.save({
167 '_id': 'a',
168- 'b': 'c',
169 'owner': 'foo',
170 'series': 'bar',
171 'name': 'baz',
172+ # A dictionary as the value conflicts with the mapping.
173+ 'description': {'text': 'whatever'},
174 })
175 self.assertIs(None, self.index_client.get('a'))
176
177
178=== modified file 'charmworld/tests/test_search.py'
179--- charmworld/tests/test_search.py 2013-08-14 15:12:04 +0000
180+++ charmworld/tests/test_search.py 2013-08-20 17:21:26 +0000
181@@ -848,6 +848,23 @@
182 "Character '%s' at the end of a search term causes "
183 "ElasticHttpError" % char)
184
185+ def test_mapping_changes_during_indexing(self):
186+ # When a charm is indexed, only the mappings for the properties
187+ # "requires" and "provides" change.
188+ before = self.index_client.get_mapping()['charm']['properties']
189+ before = before['data']['properties']
190+ charm = factory.get_charm_json()
191+ self.index_client.index_charm(charm)
192+ after = self.index_client.get_mapping()['charm']['properties']
193+ after = after['data']['properties']
194+ before_provides = before.pop('provides')
195+ before_requires = before.pop('requires')
196+ after_provides = after.pop('provides')
197+ after_requires = after.pop('requires')
198+ self.assertEqual(before, after)
199+ self.assertNotEqual(before_provides, after_provides)
200+ self.assertNotEqual(before_requires, after_requires)
201+
202
203 class TestIndexingBundles(TestCase):
204
205@@ -877,10 +894,11 @@
206 self.index_client.index_bundles([])
207
208
209-def put_mapping(client, properties):
210+def put_mapping(client, properties, dynamic=True):
211 client._client.put_mapping(
212 client.index_name, CHARM, {
213 CHARM: {
214+ 'dynamic': dynamic,
215 'properties': {'data': {'properties': properties}}
216 }
217 }
218@@ -930,6 +948,46 @@
219 self.assertEqual('not_analyzed',
220 mapping['properties']['name']['index'])
221
222+ def update_to_static_mapping(self, force_reindex):
223+ index_client = self.use_index_client(put_mapping=False)
224+ put_mapping(
225+ index_client,
226+ {'box': {'type': 'string', 'index': 'not_analyzed'}},
227+ dynamic=True)
228+ update(index_client, force_reindex)
229+ updated_mapping = index_client.get_mapping()
230+ # A property 'files' is not defined in the current mapping.
231+ self.assertNotIn(
232+ 'files',
233+ updated_mapping['charm']['properties']['data']['properties'])
234+ index_client.index_charm(factory.get_charm_json())
235+ return index_client
236+
237+ def test_simple_change_dynamic_to_static_mapping(self):
238+ # If an existing mapping is dynamic (the default for ElasticSearch)
239+ # and if a new mapping is specified as static, the two mappings
240+ # are considered compatible, but the resulting mapping is
241+ # still dynamic.
242+ index_client = self.update_to_static_mapping(force_reindex=False)
243+ updated_mapping = index_client.get_mapping()
244+ # charm['files'] is not supposed to be included in the new
245+ # mapping, but it still exists if force_reindex is not used.
246+ self.assertIn(
247+ 'files',
248+ updated_mapping['charm']['properties']['data']['properties'])
249+
250+ def test_dynamic_to_static_mapping_forced_reindex(self):
251+ # If an existing mapping is dynamic (the defult) and if a
252+ # new mapping is specified as static, the two mappings
253+ # are considered compatible, but the resulting mapping is
254+ # still dynamic.
255+ index_client = self.update_to_static_mapping(force_reindex=True)
256+ updated_mapping = index_client.get_mapping()
257+ # The mapping is indeed static; charm['files'] is not indexed.
258+ self.assertNotIn(
259+ 'files',
260+ updated_mapping['charm']['properties']['data']['properties'])
261+
262
263 class TestReindex(TestCase):
264

Subscribers

People subscribed via source and target branches