Merge lp:~adeuring/charmworld/1206659-simpler-es-mapping into lp:~juju-jitsu/charmworld/trunk

Proposed by Abel Deuring
Status: Merged
Approved by: Abel Deuring
Approved revision: 353
Merged at revision: 357
Proposed branch: lp:~adeuring/charmworld/1206659-simpler-es-mapping
Merge into: lp:~juju-jitsu/charmworld/trunk
Diff against target: 263 lines (+144/-10)
5 files modified
charmworld/migrations/versions/016_fixed_ES_mapping_for_charms.py (+8/-0)
charmworld/migrations/versions/tests/test_migrations.py (+20/-0)
charmworld/search.py (+51/-4)
charmworld/tests/test_models.py (+6/-5)
charmworld/tests/test_search.py (+59/-1)
To merge this branch: bzr merge lp:~adeuring/charmworld/1206659-simpler-es-mapping
Reviewer Review Type Date Requested Status
Abel Deuring (community) Approve
Curtis Hovey (community) code Approve
Review via email: mp+181100@code.launchpad.net

Commit message

Define a (mostly) static mapping for charms.

Description of the change

This branch is a first step to fix bug 1206659: Charmworld indexes far too
many fields due to Elasticsearch dynamic mapping.

Limitations:

  - It defines a static mapping only for charms, not for bundles
  - The attributes "requires" and "provides" of a charm still have
    a dynamic mapping.

Details:

The method ElasticSearchClient.put_mapping() now defines a static mapping
for charms. All fields needed for text searching, for filtering and ordering
are indexed, but no other fields.

charm['requires'] and charm['provides'] are dictionaries with user-defined
keys, hence it is impossible to define a static mapping. Fixing bug
1194907
will allow to define a static mapping for these fields too.

test_mapping_changes_during_indexing() shows that indexing a charm
changes only the mapping for requires/provides.

TestCharmSource.test_save_deletes_on_error() needed a change: Attempts
to index two charm with different types of a property named "b" now just
work because charm['b'] is now ignored by ES.

Similary, TestCharmSource.test_sync_index_script() needed a change
because the field "maintainer" is no longer indexed. (When I tried to
fix this test, I noticed bug 1214328.)

Testing the new migration showed that ES considers the new static mapping
to be compatible with the old dynamic mapping: If
ElasticSearchClient.put_mapping() is called when the index already contains
data with the previous dynamic mapping, the new index specifications are
simply accepted but the mapping stays dynamic. Hence I added the parameter
force_reindex to charmworld.search.update().

To post a comment you must log in.
Revision history for this message
Curtis Hovey (sinzui) wrote :

Hi Able.

I think our intent is to treat charm_text_index, charm_integer_index as constants. Can you capitalise these identifiers and change them from lists to tuples (to ensure nothing can mutate them)?

Please place an XXX in front of the comment on line 230 because we intend to fix this issue soon.

I think this is good to land.

review: Approve (code)
353. By Abel Deuring

implemented reviewer's comments.

Revision history for this message
Abel Deuring (adeuring) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== added file 'charmworld/migrations/versions/016_fixed_ES_mapping_for_charms.py'
--- charmworld/migrations/versions/016_fixed_ES_mapping_for_charms.py 1970-01-01 00:00:00 +0000
+++ charmworld/migrations/versions/016_fixed_ES_mapping_for_charms.py 2013-08-20 17:21:26 +0000
@@ -0,0 +1,8 @@
1# Copyright 2013 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4# fixed ElasticSearch mapping for charms.
5from charmworld.search import update as es_update
6
7def upgrade(db, index_client):
8 es_update(index_client)
09
=== modified file 'charmworld/migrations/versions/tests/test_migrations.py'
--- charmworld/migrations/versions/tests/test_migrations.py 2013-08-14 15:28:02 +0000
+++ charmworld/migrations/versions/tests/test_migrations.py 2013-08-20 17:21:26 +0000
@@ -18,6 +18,7 @@
18)18)
19from charmworld.migrations import migrate19from charmworld.migrations import migrate
20from charmworld.migrations.migrate import Versions20from charmworld.migrations.migrate import Versions
21from charmworld.tests.test_search import put_mapping
2122
22import mock23import mock
2324
@@ -208,3 +209,22 @@
208 # No exception is raised.209 # No exception is raised.
209 self.versions.run_migration(self.db, self.index_client,210 self.versions.run_migration(self.db, self.index_client,
210 '015_drop_bundle_doctype.py')211 '015_drop_bundle_doctype.py')
212
213
214class TestMigration016(MigrationTestBase):
215
216 def test_es_mapping_is_static(self):
217 self.use_index_client(put_mapping=False)
218 put_mapping(
219 self.index_client,
220 {'box': {'type': 'string', 'index': 'not_analyzed'}},
221 dynamic=True)
222 self.index_client.index_charm(factory.get_charm_json())
223 old_mapping = self.index_client.get_mapping()
224 self.assertIn(
225 'files', old_mapping['charm']['properties']['data']['properties'])
226 self.versions.run_migration(self.db, self.index_client,
227 '016_fixed_ES_mapping_for_charms.py')
228 new_mapping = self.index_client.get_mapping()
229 self.assertNotIn(
230 'files', new_mapping['charm']['properties']['data']['properties'])
211231
=== modified file 'charmworld/search.py'
--- charmworld/search.py 2013-08-14 15:12:04 +0000
+++ charmworld/search.py 2013-08-20 17:21:26 +0000
@@ -190,6 +190,47 @@
190 charm_properties = dict(190 charm_properties = dict(
191 (name, {'type': 'string', 'index': 'not_analyzed'})191 (name, {'type': 'string', 'index': 'not_analyzed'})
192 for name in charm_exact_index)192 for name in charm_exact_index)
193
194 CHARM_TEXT_INDEX = (
195 'summary',
196 'description',
197 )
198 charm_properties.update(dict(
199 (name, {'type': 'string'}) for name in CHARM_TEXT_INDEX))
200
201 CHARM_INTEGER_INDEX = (
202 'downloads',
203 'downloads_in_past_30_days',
204 'downloads_in_past_7_days',
205 'downloads_in_past_half_year',
206 )
207 charm_properties.update(dict(
208 (name, {'type': 'long'}) for name in CHARM_INTEGER_INDEX))
209
210 charm_properties['date_created'] = {
211 'type': 'date',
212 }
213 charm_properties['is_featured'] = {
214 'type': 'boolean',
215 }
216 charm_properties['store_data'] = {
217 'properties': {
218 'errors': {'type': 'string'},
219 }
220 }
221 # XXX Abel Deuring 2013-08-19 bug=1194907: The fields requires
222 # and provides are user defined dictionaries. We can't specify
223 # all possible keys. This partly reintroduces a large mapping;
224 # the proper way to fix this is described in bug 1194907.
225 charm_properties['requires'] = {
226 'type': 'object',
227 'dynamic': True,
228 }
229 charm_properties['provides'] = {
230 'type': 'object',
231 'dynamic': True,
232 }
233
193 charm_properties['config'] = {234 charm_properties['config'] = {
194 'properties': {235 'properties': {
195 'options': {236 'options': {
@@ -218,10 +259,13 @@
218 for name in bundle_exact_index)259 for name in bundle_exact_index)
219260
220 with translate_error():261 with translate_error():
221 for (name, properties) in ((CHARM, charm_properties),262 for (name, properties, dynamic) in (
222 (BUNDLE, bundle_properties)):263 (CHARM, charm_properties, False),
264 (BUNDLE, bundle_properties, True)):
223 self._client.put_mapping(self.index_name, name, {265 self._client.put_mapping(self.index_name, name, {
224 name: {266 name: {
267 'type': 'object',
268 'dynamic': dynamic,
225 'properties': properties,269 'properties': properties,
226 }270 }
227 })271 })
@@ -645,7 +689,7 @@
645 return new_index689 return new_index
646690
647691
648def update(index_client):692def update(index_client, force_reindex=False):
649 """Update the index to use the current mapping.693 """Update the index to use the current mapping.
650694
651 If the index isn't present, a new index is created and aliased to it.695 If the index isn't present, a new index is created and aliased to it.
@@ -654,7 +698,10 @@
654 """698 """
655 actual_client = index_client699 actual_client = index_client
656 try:700 try:
657 index_client.put_mapping()701 if force_reindex:
702 actual_client = reindex(index_client)
703 else:
704 index_client.put_mapping()
658 except IncompatibleMapping:705 except IncompatibleMapping:
659 actual_client = reindex(index_client)706 actual_client = reindex(index_client)
660 except IndexMissing:707 except IndexMissing:
661708
=== modified file 'charmworld/tests/test_models.py'
--- charmworld/tests/test_models.py 2013-08-19 14:45:29 +0000
+++ charmworld/tests/test_models.py 2013-08-20 17:21:26 +0000
@@ -1420,10 +1420,10 @@
1420 self.assertEqual(foo, bar)1420 self.assertEqual(foo, bar)
14211421
1422 def test_sync_index_script(self):1422 def test_sync_index_script(self):
1423 factory.makeCharm(self.db)[0]1423 factory.makeCharm(self.db, owner='b')[0]
1424 # Create a charm which is index-incompatible with the previous one.1424 # Create a charm which is index-incompatible with the previous one.
1425 factory.makeCharm(self.db,1425 factory.makeCharm(self.db, owner='a',
1426 maintainer={'first': 'J', 'last': 'Random'})[0]1426 description={'text': 'whatever'})[0]
1427 handler = self.get_handler('sync-index')1427 handler = self.get_handler('sync-index')
1428 with patch('charmworld.models.configure_logging'):1428 with patch('charmworld.models.configure_logging'):
1429 with patch('charmworld.models.get_ini', lambda: INI):1429 with patch('charmworld.models.get_ini', lambda: INI):
@@ -1438,18 +1438,19 @@
1438 source = CharmSource.from_request(self)1438 source = CharmSource.from_request(self)
1439 source.save({1439 source.save({
1440 '_id': 'a',1440 '_id': 'a',
1441 'b': {},
1442 'owner': 'foo',1441 'owner': 'foo',
1443 'series': 'bar',1442 'series': 'bar',
1444 'name': 'baz',1443 'name': 'baz',
1444 'description': 'whatever',
1445 })1445 })
1446 with self.assertRaises(ElasticHttpError):1446 with self.assertRaises(ElasticHttpError):
1447 source.save({1447 source.save({
1448 '_id': 'a',1448 '_id': 'a',
1449 'b': 'c',
1450 'owner': 'foo',1449 'owner': 'foo',
1451 'series': 'bar',1450 'series': 'bar',
1452 'name': 'baz',1451 'name': 'baz',
1452 # A dictionary as the value conflicts with the mapping.
1453 'description': {'text': 'whatever'},
1453 })1454 })
1454 self.assertIs(None, self.index_client.get('a'))1455 self.assertIs(None, self.index_client.get('a'))
14551456
14561457
=== modified file 'charmworld/tests/test_search.py'
--- charmworld/tests/test_search.py 2013-08-14 15:12:04 +0000
+++ charmworld/tests/test_search.py 2013-08-20 17:21:26 +0000
@@ -848,6 +848,23 @@
848 "Character '%s' at the end of a search term causes "848 "Character '%s' at the end of a search term causes "
849 "ElasticHttpError" % char)849 "ElasticHttpError" % char)
850850
851 def test_mapping_changes_during_indexing(self):
852 # When a charm is indexed, only the mappings for the properties
853 # "requires" and "provides" change.
854 before = self.index_client.get_mapping()['charm']['properties']
855 before = before['data']['properties']
856 charm = factory.get_charm_json()
857 self.index_client.index_charm(charm)
858 after = self.index_client.get_mapping()['charm']['properties']
859 after = after['data']['properties']
860 before_provides = before.pop('provides')
861 before_requires = before.pop('requires')
862 after_provides = after.pop('provides')
863 after_requires = after.pop('requires')
864 self.assertEqual(before, after)
865 self.assertNotEqual(before_provides, after_provides)
866 self.assertNotEqual(before_requires, after_requires)
867
851868
852class TestIndexingBundles(TestCase):869class TestIndexingBundles(TestCase):
853870
@@ -877,10 +894,11 @@
877 self.index_client.index_bundles([])894 self.index_client.index_bundles([])
878895
879896
880def put_mapping(client, properties):897def put_mapping(client, properties, dynamic=True):
881 client._client.put_mapping(898 client._client.put_mapping(
882 client.index_name, CHARM, {899 client.index_name, CHARM, {
883 CHARM: {900 CHARM: {
901 'dynamic': dynamic,
884 'properties': {'data': {'properties': properties}}902 'properties': {'data': {'properties': properties}}
885 }903 }
886 }904 }
@@ -930,6 +948,46 @@
930 self.assertEqual('not_analyzed',948 self.assertEqual('not_analyzed',
931 mapping['properties']['name']['index'])949 mapping['properties']['name']['index'])
932950
951 def update_to_static_mapping(self, force_reindex):
952 index_client = self.use_index_client(put_mapping=False)
953 put_mapping(
954 index_client,
955 {'box': {'type': 'string', 'index': 'not_analyzed'}},
956 dynamic=True)
957 update(index_client, force_reindex)
958 updated_mapping = index_client.get_mapping()
959 # A property 'files' is not defined in the current mapping.
960 self.assertNotIn(
961 'files',
962 updated_mapping['charm']['properties']['data']['properties'])
963 index_client.index_charm(factory.get_charm_json())
964 return index_client
965
966 def test_simple_change_dynamic_to_static_mapping(self):
967 # If an existing mapping is dynamic (the default for ElasticSearch)
968 # and if a new mapping is specified as static, the two mappings
969 # are considered compatible, but the resulting mapping is
970 # still dynamic.
971 index_client = self.update_to_static_mapping(force_reindex=False)
972 updated_mapping = index_client.get_mapping()
973 # charm['files'] is not supposed to be included in the new
974 # mapping, but it still exists if force_reindex is not used.
975 self.assertIn(
976 'files',
977 updated_mapping['charm']['properties']['data']['properties'])
978
979 def test_dynamic_to_static_mapping_forced_reindex(self):
980 # If an existing mapping is dynamic (the defult) and if a
981 # new mapping is specified as static, the two mappings
982 # are considered compatible, but the resulting mapping is
983 # still dynamic.
984 index_client = self.update_to_static_mapping(force_reindex=True)
985 updated_mapping = index_client.get_mapping()
986 # The mapping is indeed static; charm['files'] is not indexed.
987 self.assertNotIn(
988 'files',
989 updated_mapping['charm']['properties']['data']['properties'])
990
933991
934class TestReindex(TestCase):992class TestReindex(TestCase):
935993

Subscribers

People subscribed via source and target branches