Merge lp:~jameinel/u1db/index-transformations into lp:u1db

Proposed by John A Meinel
Status: Merged
Merged at revision: 144
Proposed branch: lp:~jameinel/u1db/index-transformations
Merge into: lp:u1db
Diff against target: 922 lines (+679/-51)
12 files modified
.bzrignore (+1/-0)
.testr.conf (+4/-0)
doc/sqlite_schema.txt (+0/-1)
u1db/backends/inmemory.py (+29/-21)
u1db/backends/sqlite_backend.py (+51/-23)
u1db/errors.py (+4/-0)
u1db/query_parser.py (+235/-0)
u1db/tests/__init__.py (+0/-1)
u1db/tests/test_backends.py (+57/-0)
u1db/tests/test_inmemory.py (+4/-5)
u1db/tests/test_query_parser.py (+277/-0)
u1db/tests/test_sqlite_backend.py (+17/-0)
To merge this branch: bzr merge lp:~jameinel/u1db/index-transformations
Reviewer: Samuele Pedroni
Status: Approve
Review via email: mp+84250@code.launchpad.net

Description of the change

This takes James Westby's great work on index transformations, and updates it a bit. Some thoughts:

1) This changes the Getter API to always think in terms of lists, rather than sometimes direct values and sometimes lists. It cleans up some of the internal code that had to check whether a value was a list or not, and then apply an operation to the whole list versus a single item.

I think it will also match a C API better, since you don't have object types there, so you just end up with always-a-list. (Still not sure what to do about ints vs strings, but I'm not worrying too much about that yet.)
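The always-a-list convention can be sketched with a minimal getter, modeled on the branch's StaticGetter (the real code lives in u1db/query_parser.py):

```python
class StaticGetter(object):
    """Return a fixed value, normalized to the always-a-list convention."""

    def __init__(self, value):
        if value is None:
            self.value = []        # "no value" is the empty list, not None
        elif isinstance(value, list):
            self.value = value     # lists pass through unchanged
        else:
            self.value = [value]   # a single value gets wrapped

    def get(self, raw_doc):
        # Every Getter.get() returns a list, so callers never need an
        # is-this-a-list check before iterating.
        return self.value
```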

2) The only one I'm not very sure about is IsNull. At the moment doing something like:

   create_index('null_field', ["is_null(field)"])

 will always return a single-width list. So if your document is:
  '{"field": "value"}' => [False]
  '{}' => [True]
  '{"field": ["list", "values"]}' => [False]
  '{"field": null}' => [True]
  '{"field": []}' => [True]
  '{"field": [null]}' => ??? I think [True]
  '{"field": [1, null]}' => [False]

 In James's original implementation, the only case that differed was the empty list: his version claimed that an empty list is not null.

 I think this is what we want, though.
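A rough sketch of these semantics (hypothetical helper names; the real logic is split between ExtractField and IsNull, and only the un-dotted field case is handled here):

```python
def extract(raw_doc, field):
    # Mirror ExtractField for a single (un-dotted) field: always return a
    # list, dropping dicts and nested lists when the value is a list.
    val = raw_doc.get(field)
    if val is None:
        return []
    if isinstance(val, dict):
        return []
    if isinstance(val, list):
        return [v for v in val if not isinstance(v, (dict, list))]
    return [val]


def is_null(raw_doc, field):
    # is_null yields a single-width list: True iff extraction found nothing.
    return [len(extract(raw_doc, field)) == 0]
```

Note that the '{"field": [null]}' case left open above comes out as [False] under this sketch, since a bare null inside a list survives the filter.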

3) We should probably move how the indexing code is tested into being a backend permutation test. At least, we should add more than the small handful of tests we have now.

Revision history for this message
Samuele Pedroni (pedronis) wrote :

I suppose the parser is written in a style that is easy to translate to C, but then I would really use some kind of caching of parser->getter outcomes in SQLitePartialExpandDatabase._evaluate_index. Otherwise I would expect inserting data to be (quite?) slower than it was before these changes.

69 + if keys is None:
70 return None

that should just be if key: in the new world, the code doesn't break, but we don't bail out early anymore
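The point about the shortcut can be illustrated with a toy function (not the branch's actual code):

```python
def add_to_index(keys):
    # Old world: evaluate() returned None when nothing matched, so the
    # shortcut was `if keys is None`. New world: evaluate() always returns
    # a list, so the early bail-out must also catch the empty list.
    if not keys:
        return 'skipped'
    return 'indexed'
```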

271 + elif isinstance(raw_doc, list):
272 + # If anything in the list is not a simple type, the list is
273 + result = [val for val in raw_doc

I don't understand the comment there, is it truncated?

is it intentional to check in .testr.conf?

John A Meinel (jameinel) wrote :


On 12/2/2011 3:55 PM, Samuele Pedroni wrote:
> I suppose the parser is written in a style that is easy to
> translate to C but then I would really use some kind of caching of
> parser->getter outcomes in
> SQLitePartialExpandDatabase._evaluate_index. I would expect
> inserting data otherwise to be (quite?) slower than it was before
> these changes.
>

Yeah, I'm working on actually benchmarking it to make sure it matters.
It shouldn't be hard to just have a:

parsers = []
for field in fields:
  parsers.append(Parser(field))

for content in docs:
  raw_doc = loads(content)
  for parser in parsers:
    rows = parser.apply(raw_doc)

> 69 + if keys is None: 70 return None
>
> that should just be if key: in the new world, the code doesn't
> break, but we don't bail out early anymore

sure, good catch.

>
> 271 + elif isinstance(raw_doc, list): 272 + # If anything in the
> list is not a simple type, the list is 273 + result = [val for val
> in raw_doc
>
> I don't understand the comment there, is it truncated?
>

yeah, it got truncated. Originally there was a loop that if *any* item
in the list was not a simple value, then it would treat the whole list
as None. I changed it to just omit the non-simple types.

>
> is it intentional to check in .testr.conf?

This is something James did. But yes. It tells the 'testr' (test
repository?) program how to run your test suite. I'm fine including it.

John
=:->

144. By John A Meinel

Use a for loop instead of while + counters to do _take_word

145. By John A Meinel

Pre-parse the index definition before we evaluate them on documents.

146. By John A Meinel

Add some doc strings.

John A Meinel (jameinel) wrote :

This has now been updated to pre-parse the index definitions, and then apply the getters to the documents. One wrinkle: current trunk creates a PRIMARY KEY index on document_fields, and this patch removes it, so these benchmarks were run with that index removed.

Here are the benchmark results:

  2.530s create_index(title) trunk
  3.357s create_index(title) no-caching
  2.588s create_index(title) caching

The other thing to note is the effect of creating indexes with more complex definitions. With getter caching:
   2.588s create title
   2.859s create low_title
   3.879s create low_split_title
   3.540s create low_low_title

vs without caching:

   3.357s create title
   3.735s create low_title
   5.121s create low_split_title
   5.329s create low_low_title

Note that some of the slowdown is because the document_fields table is getting bigger. Also, we should note that the create_index time with *no* new fields inserted is:

   1.925s create low_title (trunk)

(that is because trunk doesn't support lower() as an operator yet, so it gives a good baseline of functionality.)

I was a bit surprised that it was that slow, given the speeds of _iter_all_docs and simplejson.loads.
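The pre-parse pattern being benchmarked here can be sketched as follows, with a toy getter standing in for the real Parser/Getter machinery:

```python
class FieldGetter(object):
    """Toy stand-in for a parsed index-definition Getter."""

    def __init__(self, field):
        self.field = field

    def get(self, raw_doc):
        val = raw_doc.get(self.field)
        return [] if val is None else [val]


def index_all(docs, fields):
    # Parse each index definition once...
    getters = [(field, FieldGetter(field)) for field in fields]
    rows = []
    # ...then apply the pre-built getters to every document, instead of
    # re-parsing the definitions per document.
    for doc_id, raw_doc in docs:
        for field, getter in getters:
            for value in getter.get(raw_doc):
                rows.append((doc_id, field, value))
    return rows
```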

147. By John A Meinel

Fix the shortcut code.

Samuele Pedroni (pedronis) wrote :

looks good, I would probably be even more aggressive and do this, to amortize parsing the getters across put_docs as well:

=== modified file 'u1db/backends/sqlite_backend.py'
--- u1db/backends/sqlite_backend.py 2011-12-06 14:18:56 +0000
+++ u1db/backends/sqlite_backend.py 2011-12-06 15:11:28 +0000
@@ -39,6 +39,8 @@
         self._db_handle = dbapi2.connect(sqlite_file)
         self._real_replica_uid = None
         self._ensure_schema()
+ self._parser = query_parser.Parser()
+ self._cached_getters = {}

     def get_sync_target(self):
         return SQLiteSyncTarget(self)
@@ -199,8 +201,11 @@

     def _parse_index_definition(self, index_field):
         """Parse a field definition for an index, returning a Getter."""
- parser = query_parser.Parser()
- getter = parser.parse(index_field)
+ try:
+ getter = self._cached_getters[index_field]
+ except KeyError:
+ getter = self._parser.parse(index_field)
+ self._cached_getters[index_field] = getter
         return getter

     def _update_indexes(self, doc_id, raw_doc, getters, db_cursor):

review: Approve
148. By John A Meinel

Fix up an outdated comment.

149. By John A Meinel

add a comment about longer-lived Getter caching.

Preview Diff

1=== modified file '.bzrignore'
2--- .bzrignore 2011-12-02 19:05:35 +0000
3+++ .bzrignore 2011-12-06 15:24:25 +0000
4@@ -2,3 +2,4 @@
5 ./dist
6 ./u1db.egg-info
7 doc/sqlite_schema.html
8+.testrepository
9
10=== added file '.testr.conf'
11--- .testr.conf 1970-01-01 00:00:00 +0000
12+++ .testr.conf 2011-12-06 15:24:25 +0000
13@@ -0,0 +1,4 @@
14+[DEFAULT]
15+test_command=${PYTHON:-python} -m subunit.run $LISTOPT $IDOPTION discover u1db
16+test_id_option=--load-list $IDFILE
17+test_list_option=--list
18
19=== modified file 'doc/sqlite_schema.txt'
20--- doc/sqlite_schema.txt 2011-11-16 08:03:06 +0000
21+++ doc/sqlite_schema.txt 2011-12-06 15:24:25 +0000
22@@ -38,7 +38,6 @@
23 doc_id TEXT,
24 field_name TEXT,
25 value TEXT,
26- CONSTRAINT document_fields_pkey PRIMARY KEY (doc_id, field_name)
27 );
28
29 So if you had two documents of the form::
30
31=== modified file 'u1db/backends/inmemory.py'
32--- u1db/backends/inmemory.py 2011-12-02 11:02:42 +0000
33+++ u1db/backends/inmemory.py 2011-12-06 15:24:25 +0000
34@@ -16,7 +16,11 @@
35
36 import simplejson
37
38-from u1db import Document, errors
39+from u1db import (
40+ Document,
41+ errors,
42+ query_parser,
43+ )
44 from u1db.backends import CommonBackend, CommonSyncTarget
45
46
47@@ -182,6 +186,8 @@
48 self._name = index_name
49 self._definition = index_definition
50 self._values = {}
51+ parser = query_parser.Parser()
52+ self._getters = parser.parse_all(self._definition)
53
54 def evaluate_json(self, doc):
55 """Determine the 'key' after applying this index to the doc."""
56@@ -190,32 +196,35 @@
57
58 def evaluate(self, obj):
59 """Evaluate a dict object, applying this definition."""
60- result = []
61- for field in self._definition:
62- val = obj
63- for subfield in field.split('.'):
64- val = val.get(subfield)
65- if val is None:
66- return None
67- result.append(val)
68- return '\x01'.join(result)
69+ all_rows = [[]]
70+ for getter in self._getters:
71+ new_rows = []
72+ keys = getter.get(obj)
73+ if not keys:
74+ return []
75+ for key in keys:
76+ new_rows.extend([row + [key] for row in all_rows])
77+ all_rows = new_rows
78+ all_rows = ['\x01'.join(row) for row in all_rows]
79+ return all_rows
80
81 def add_json(self, doc_id, doc):
82 """Add this json doc to the index."""
83- key = self.evaluate_json(doc)
84- if key is None:
85+ keys = self.evaluate_json(doc)
86+ if not keys:
87 return
88- self._values.setdefault(key, []).append(doc_id)
89+ for key in keys:
90+ self._values.setdefault(key, []).append(doc_id)
91
92 def remove_json(self, doc_id, doc):
93 """Remove this json doc from the index."""
94- key = self.evaluate_json(doc)
95- if key is None:
96- return
97- doc_ids = self._values[key]
98- doc_ids.remove(doc_id)
99- if not doc_ids:
100- del self._values[key]
101+ keys = self.evaluate_json(doc)
102+ if keys:
103+ for key in keys:
104+ doc_ids = self._values[key]
105+ doc_ids.remove(doc_id)
106+ if not doc_ids:
107+ del self._values[key]
108
109 def _find_non_wildcards(self, values):
110 """Check if this should be a wildcard match.
111@@ -288,4 +297,3 @@
112 def record_sync_info(self, other_replica_uid, other_replica_generation):
113 self._db.set_sync_generation(other_replica_uid,
114 other_replica_generation)
115-
116
117=== modified file 'u1db/backends/sqlite_backend.py'
118--- u1db/backends/sqlite_backend.py 2011-12-02 11:02:42 +0000
119+++ u1db/backends/sqlite_backend.py 2011-12-06 15:24:25 +0000
120@@ -21,7 +21,12 @@
121 import uuid
122
123 from u1db.backends import CommonBackend, CommonSyncTarget
124-from u1db import Document, errors
125+from u1db import (
126+ compat,
127+ Document,
128+ errors,
129+ query_parser,
130+ )
131
132
133 class SQLiteDatabase(CommonBackend):
134@@ -136,9 +141,7 @@
135 c.execute("CREATE TABLE document_fields ("
136 " doc_id TEXT,"
137 " field_name TEXT,"
138- " value TEXT,"
139- " CONSTRAINT document_fields_pkey"
140- " PRIMARY KEY (doc_id, field_name))")
141+ " value TEXT)")
142 # TODO: Should we include doc_id or not? By including it, the
143 # content can be returned directly from the index, and
144 # matched with the documents table, roughly saving 1 btree
145@@ -194,6 +197,36 @@
146 def _extra_schema_init(self, c):
147 """Add any extra fields, etc to the basic table definitions."""
148
149+ def _parse_index_definition(self, index_field):
150+ """Parse a field definition for an index, returning a Getter."""
151+ # Note: We may want to keep a Parser object around, and cache the
152+ # Getter objects for a greater length of time. Specifically, if
153+ # you create a bunch of indexes, and then insert 50k docs, you'll
154+ # re-parse the indexes between puts. The time to insert the docs
155+ # is still likely to dominate put_doc time, though.
156+ parser = query_parser.Parser()
157+ getter = parser.parse(index_field)
158+ return getter
159+
160+ def _update_indexes(self, doc_id, raw_doc, getters, db_cursor):
161+ """Update document_fields for a single document.
162+
163+ :param doc_id: Identifier for this document
164+ :param raw_doc: The python dict representation of the document.
165+ :param getters: A list of [(field_name, Getter)]. Getter.get will be
166+ called to evaluate the index definition for this document, and the
167+ results will be inserted into the db.
168+ :param db_cursor: An sqlite Cursor.
169+ :return: None
170+ """
171+ values = []
172+ for field_name, getter in getters:
173+ for idx_value in getter.get(raw_doc):
174+ values.append((doc_id, field_name, idx_value))
175+ if values:
176+ db_cursor.executemany(
177+ "INSERT INTO document_fields VALUES (?, ?, ?)", values)
178+
179 def _set_replica_uid(self, replica_uid):
180 """Force the replica_uid to be set."""
181 with self._db_handle:
182@@ -550,21 +583,9 @@
183
184 def _evaluate_index(self, raw_doc, field):
185 val = raw_doc
186- for subfield in field.split('.'):
187- if val is None:
188- return None
189- val = val.get(subfield, None)
190- return val
191-
192- def _update_indexes(self, doc_id, raw_doc, fields, db_cursor):
193- values = []
194- for field_name in fields:
195- idx_value = self._evaluate_index(raw_doc, field_name)
196- if idx_value is not None:
197- values.append((doc_id, field_name, idx_value))
198- if values:
199- db_cursor.executemany(
200- "INSERT INTO document_fields VALUES (?, ?, ?)", values)
201+ parser = query_parser.Parser()
202+ getter = parser.parse(field)
203+ return getter.get(raw_doc)
204
205 def _put_and_update_indexes(self, old_doc, doc):
206 c = self._db_handle.cursor()
207@@ -584,8 +605,9 @@
208 if indexed_fields:
209 # It is expected that len(indexed_fields) is shorter than
210 # len(raw_doc)
211- # TODO: Handle nested indexed fields.
212- self._update_indexes(doc.doc_id, raw_doc, indexed_fields, c)
213+ getters = [(field, self._parse_index_definition(field))
214+ for field in indexed_fields]
215+ self._update_indexes(doc.doc_id, raw_doc, getters, c)
216 c.execute("INSERT INTO transaction_log(doc_id) VALUES (?)",
217 (doc.doc_id,))
218
219@@ -613,9 +635,15 @@
220 yield row
221
222 def _update_all_indexes(self, new_fields):
223+ """Iterate all the documents, and add content to document_fields.
224+
225+ :param new_fields: The index definitions that need to be added.
226+ """
227+ getters = [(field, self._parse_index_definition(field))
228+ for field in new_fields]
229+ c = self._db_handle.cursor()
230 for doc_id, doc in self._iter_all_docs():
231 raw_doc = simplejson.loads(doc)
232- c = self._db_handle.cursor()
233- self._update_indexes(doc_id, raw_doc, new_fields, c)
234+ self._update_indexes(doc_id, raw_doc, getters, c)
235
236 SQLiteDatabase.register_implementation(SQLitePartialExpandDatabase)
237
238=== modified file 'u1db/errors.py'
239--- u1db/errors.py 2011-12-01 14:46:15 +0000
240+++ u1db/errors.py 2011-12-06 15:24:25 +0000
241@@ -65,6 +65,10 @@
242 wire_description = "database does not exist"
243
244
245+class IndexDefinitionParseError(U1DBError):
246+ """The index definition cannot be parsed."""
247+
248+
249 class HTTPError(U1DBError):
250 """Unspecific HTTP errror."""
251
252
253=== added file 'u1db/query_parser.py'
254--- u1db/query_parser.py 1970-01-01 00:00:00 +0000
255+++ u1db/query_parser.py 2011-12-06 15:24:25 +0000
256@@ -0,0 +1,235 @@
257+# Copyright 2011 Canonical Ltd.
258+#
259+# This program is free software: you can redistribute it and/or modify it
260+# under the terms of the GNU General Public License version 3, as published
261+# by the Free Software Foundation.
262+#
263+# This program is distributed in the hope that it will be useful, but
264+# WITHOUT ANY WARRANTY; without even the implied warranties of
265+# MERCHANTABILITY, SATISFACTORY QUALITY, or FITNESS FOR A PARTICULAR
266+# PURPOSE. See the GNU General Public License for more details.
267+#
268+# You should have received a copy of the GNU General Public License along
269+# with this program. If not, see <http://www.gnu.org/licenses/>.
270+
271+"""Code for parsing Index definitions."""
272+
273+import string
274+
275+from u1db import (
276+ errors,
277+ )
278+
279+
280+class Getter(object):
281+ """Get values from a document based on a specification."""
282+
283+ def get(self, raw_doc):
284+ """Get a value from the document.
285+
286+ :param raw_doc: a python dictionary to get the value from.
287+ :return: A list of values that match the description.
288+ """
289+ raise NotImplementedError(self.get)
290+
291+
292+class StaticGetter(Getter):
293+ """A getter that returns a defined value (independent of the doc)."""
294+
295+ def __init__(self, value):
296+ """Create a StaticGetter.
297+
298+ :param value: the value to return when get is called.
299+ """
300+ if value is None:
301+ self.value = []
302+ elif isinstance(value, list):
303+ self.value = value
304+ else:
305+ self.value = [value]
306+
307+ def get(self, raw_doc):
308+ return self.value
309+
310+
311+class ExtractField(Getter):
312+ """Extract a field from the document."""
313+
314+ def __init__(self, field):
315+ """Create an ExtractField object.
316+
317+ When a document is passed to get() this will return a value
318+ from the document based on the field specifier passed to
319+ the constructor.
320+
321+ None will be returned if the field is nonexistant, or refers to an
322+ object, rather than a simple type or list of simple types.
323+
324+ :param field: a specifier for the field to return.
325+ This is either a field name, or a dotted field name.
326+ """
327+ self.field = field
328+
329+ def get(self, raw_doc):
330+ for subfield in self.field.split('.'):
331+ if isinstance(raw_doc, dict):
332+ raw_doc = raw_doc.get(subfield)
333+ else:
334+ return []
335+ if isinstance(raw_doc, dict):
336+ return []
337+ if raw_doc is None:
338+ result = []
339+ elif isinstance(raw_doc, list):
340+ # Strip anything in the list that isn't a simple type
341+ result = [val for val in raw_doc
342+ if not isinstance(val, (dict, list))]
343+ else:
344+ result = [raw_doc]
345+ return result
346+
347+
348+class Transformation(Getter):
349+ """A transformation on a value from another Getter."""
350+
351+ name = None
352+ """The name that the transform has in a query string."""
353+
354+ def __init__(self, inner):
355+ """Create a transformation.
356+
357+ :param inner: the Getter to transform the value for.
358+ """
359+ self.inner = inner
360+
361+ def get(self, raw_doc):
362+ inner_values = self.inner.get(raw_doc)
363+ assert isinstance(inner_values, list), 'get() should always return a list'
364+ return self.transform(inner_values)
365+
366+ def transform(self, values):
367+ """Transform the values.
368+
369+ This should be implemented by subclasses to transform the
370+ value when get() is called.
371+
372+ :param values: the values from the other Getter
373+ :return: the transformed values.
374+ """
375+ raise NotImplementedError(self.transform)
376+
377+
378+class Lower(Transformation):
379+ """Lowercase a string.
380+
381+ This transformation will return None for non-string inputs. However,
382+ it will lowercase any strings in a list, dropping any elements
383+ that are not strings.
384+ """
385+
386+ name = "lower"
387+
388+ def _can_transform(self, val):
389+ return not isinstance(val, (int, bool, float, list, dict))
390+
391+ def transform(self, values):
392+ if not values:
393+ return []
394+ return [val.lower() for val in values if self._can_transform(val)]
395+
396+
397+class SplitWords(Transformation):
398+ """Split a string on whitespace.
399+
400+ This Getter will return [] for non-string inputs. It will however
401+ split any strings in an input list, discarding any elements that
402+ are not strings.
403+ """
404+
405+ name = "split_words"
406+
407+ def _can_transform(self, val):
408+ return not isinstance(val, (int, bool, float, list, dict))
409+
410+ def transform(self, values):
411+ if not values:
412+ return []
413+ result = []
414+ for value in values:
415+ if self._can_transform(value):
416+ # TODO: This is quadratic to search the list linearly while we
417+ # are appending to it. Consider using a set() instead.
418+ for word in value.split():
419+ if word not in result:
420+ result.append(word)
421+ return result
422+
423+
424+class IsNull(Transformation):
425+ """Indicate whether the input is None.
426+
427+ This Getter returns a bool indicating whether the input is nil.
428+ """
429+
430+ name = "is_null"
431+
432+ def transform(self, values):
433+ return [len(values) == 0]
434+
435+
436+class Parser(object):
437+ """Parse an index expression into a sequence of transformations."""
438+
439+ _transformations = {}
440+ _word_chars = string.lowercase + string.uppercase + "._" + string.digits
441+
442+ def _take_word(self, partial):
443+ word = ''
444+ for idx, char in enumerate(partial):
445+ if char not in self._word_chars:
446+ return partial[:idx], partial[idx:]
447+ return partial, ''
448+
449+ def parse(self, field):
450+ inner = self._inner_parse(field)
451+ return inner
452+
453+ def _inner_parse(self, field):
454+ word, field = self._take_word(field)
455+ if field.startswith("("):
456+ # We have an operation
457+ if not field.endswith(")"):
458+ raise errors.IndexDefinitionParseError(
459+ "Invalid transformation function: %s" % field)
460+ op = self._transformations.get(word, None)
461+ if op is None:
462+ raise errors.IndexDefinitionParseError(
463+ "Unknown operation: %s" % word)
464+ inner = self._inner_parse(field[1:-1])
465+ return op(inner)
466+ else:
467+ if len(field) != 0:
468+ raise errors.IndexDefinitionParseError(
469+ "Unhandled characters: %s" % (field,))
470+ if len(word) == 0:
471+ raise errors.IndexDefinitionParseError(
472+ "Missing field specifier")
473+ if word.endswith("."):
474+ raise errors.IndexDefinitionParseError(
475+ "Invalid field specifier: %s" % word)
476+ return ExtractField(word)
477+
478+ def parse_all(self, fields):
479+ return [self.parse(field) for field in fields]
480+
481+ @classmethod
482+ def register_transormation(cls, transform):
483+ assert transform.name not in cls._transformations, (
484+ "Transform %s already registered for %s"
485+ % (transform.name, cls._transformations[transform.name]))
486+ cls._transformations[transform.name] = transform
487+
488+
489+Parser.register_transormation(SplitWords)
490+Parser.register_transormation(Lower)
491+Parser.register_transormation(IsNull)
492
493=== modified file 'u1db/tests/__init__.py'
494--- u1db/tests/__init__.py 2011-11-28 19:51:23 +0000
495+++ u1db/tests/__init__.py 2011-12-06 15:24:25 +0000
496@@ -107,7 +107,6 @@
497
498 class DatabaseBaseTests(TestCase):
499
500- create_database = None
501 scenarios = LOCAL_DATABASES_SCENARIOS
502
503 def create_database(self, replica_uid):
504
505=== modified file 'u1db/tests/test_backends.py'
506--- u1db/tests/test_backends.py 2011-12-01 14:46:15 +0000
507+++ u1db/tests/test_backends.py 2011-12-06 15:24:25 +0000
508@@ -518,6 +518,63 @@
509 self.assertEqual([doc1],
510 self.db.get_from_index('test-idx', [('*',)]))
511
512+ def test_get_from_index_with_lower(self):
513+ self.db.create_index("index", ["lower(name)"])
514+ content = '{"name": "Foo"}'
515+ doc = self.db.create_doc(content)
516+ rows = self.db.get_from_index("index", [("foo", )])
517+ self.assertEqual([doc], rows)
518+
519+ def test_get_from_index_with_lower_matches_same_case(self):
520+ self.db.create_index("index", ["lower(name)"])
521+ content = '{"name": "foo"}'
522+ doc = self.db.create_doc(content)
523+ rows = self.db.get_from_index("index", [("foo", )])
524+ self.assertEqual([doc], rows)
525+
526+ def test_index_lower_doesnt_match_different_case(self):
527+ self.db.create_index("index", ["lower(name)"])
528+ content = '{"name": "Foo"}'
529+ doc = self.db.create_doc(content)
530+ rows = self.db.get_from_index("index", [("Foo", )])
531+ self.assertEqual([], rows)
532+
533+ def test_index_lower_doesnt_match_other_index(self):
534+ self.db.create_index("index", ["lower(name)"])
535+ self.db.create_index("other_index", ["name"])
536+ content = '{"name": "Foo"}'
537+ doc = self.db.create_doc(content)
538+ rows = self.db.get_from_index("index", [("Foo", )])
539+ self.assertEqual(0, len(rows))
540+
541+ def test_index_list(self):
542+ self.db.create_index("index", ["name"])
543+ content = '{"name": ["foo", "bar"]}'
544+ doc = self.db.create_doc(content)
545+ rows = self.db.get_from_index("index", [("bar", )])
546+ self.assertEqual([doc], rows)
547+
548+ def test_index_split_words_match_first(self):
549+ self.db.create_index("index", ["split_words(name)"])
550+ content = '{"name": "foo bar"}'
551+ doc = self.db.create_doc(content)
552+ rows = self.db.get_from_index("index", [("foo", )])
553+ self.assertEqual([doc], rows)
554+
555+ def test_index_split_words_match_second(self):
556+ self.db.create_index("index", ["split_words(name)"])
557+ content = '{"name": "foo bar"}'
558+ doc = self.db.create_doc(content)
559+ rows = self.db.get_from_index("index", [("bar", )])
560+ self.assertEqual([doc], rows)
561+
562+ def test_index_split_words_match_both(self):
563+ self.db.create_index("index", ["split_words(name)"])
564+ content = '{"name": "foo foo"}'
565+ doc = self.db.create_doc(content)
566+ rows = self.db.get_from_index("index", [("foo", )])
567+ self.assertEqual([doc], rows)
568+
569 def test_get_partial_from_index(self):
570 content1 = '{"k1": "v1", "k2": "v2"}'
571 content2 = '{"k1": "v1", "k2": "x2"}'
572
573=== modified file 'u1db/tests/test_inmemory.py'
574--- u1db/tests/test_inmemory.py 2011-12-02 11:02:42 +0000
575+++ u1db/tests/test_inmemory.py 2011-12-06 15:24:25 +0000
576@@ -53,20 +53,20 @@
577
578 def test_evaluate_json(self):
579 idx = inmemory.InMemoryIndex('idx-name', ['key'])
580- self.assertEqual('value', idx.evaluate_json(simple_doc))
581+ self.assertEqual(['value'], idx.evaluate_json(simple_doc))
582
583 def test_evaluate_json_field_None(self):
584 idx = inmemory.InMemoryIndex('idx-name', ['missing'])
585- self.assertEqual(None, idx.evaluate_json(simple_doc))
586+ self.assertEqual([], idx.evaluate_json(simple_doc))
587
588 def test_evaluate_json_subfield_None(self):
589 idx = inmemory.InMemoryIndex('idx-name', ['key', 'missing'])
590- self.assertEqual(None, idx.evaluate_json(simple_doc))
591+ self.assertEqual([], idx.evaluate_json(simple_doc))
592
593 def test_evaluate_multi_index(self):
594 doc = '{"key": "value", "key2": "value2"}'
595 idx = inmemory.InMemoryIndex('idx-name', ['key', 'key2'])
596- self.assertEqual('value\x01value2',
597+ self.assertEqual(['value\x01value2'],
598 idx.evaluate_json(doc))
599
600 def test_update_ignores_None(self):
601@@ -119,4 +119,3 @@
602 idx._find_non_wildcards, ('a', 'b', 'c', 'd'))
603 self.assertRaises(errors.InvalidValueForIndex,
604 idx._find_non_wildcards, ('*', 'b', 'c'))
605-
606
607=== added file 'u1db/tests/test_query_parser.py'
608--- u1db/tests/test_query_parser.py 1970-01-01 00:00:00 +0000
609+++ u1db/tests/test_query_parser.py 2011-12-06 15:24:25 +0000
610@@ -0,0 +1,277 @@
611+# Copyright 2011 Canonical Ltd.
612+#
613+# This program is free software: you can redistribute it and/or modify it
614+# under the terms of the GNU General Public License version 3, as published
615+# by the Free Software Foundation.
616+#
617+# This program is distributed in the hope that it will be useful, but
618+# WITHOUT ANY WARRANTY; without even the implied warranties of
619+# MERCHANTABILITY, SATISFACTORY QUALITY, or FITNESS FOR A PARTICULAR
620+# PURPOSE. See the GNU General Public License for more details.
621+#
622+# You should have received a copy of the GNU General Public License along
623+# with this program. If not, see <http://www.gnu.org/licenses/>.
624+
625+from u1db import (
626+ errors,
627+ query_parser,
628+ tests,
629+ )
630+
631+
632+trivial_raw_doc = {}
633+
634+class TestStaticGetter(tests.TestCase):
635+
636+ def test_returns_string(self):
637+ getter = query_parser.StaticGetter('foo')
638+ self.assertEqual(['foo'], getter.get(trivial_raw_doc))
639+
640+ def test_returns_int(self):
641+ getter = query_parser.StaticGetter(9)
642+ self.assertEqual([9], getter.get(trivial_raw_doc))
643+
644+ def test_returns_float(self):
645+ getter = query_parser.StaticGetter(9.2)
646+ self.assertEqual([9.2], getter.get(trivial_raw_doc))
647+
648+ def test_returns_None(self):
649+ getter = query_parser.StaticGetter(None)
650+ self.assertEqual([], getter.get(trivial_raw_doc))
651+
652+ def test_returns_list(self):
653+ getter = query_parser.StaticGetter(['a', 'b'])
654+ self.assertEqual(['a', 'b'], getter.get(trivial_raw_doc))
655+
656+
657+class TestExtractField(tests.TestCase):
658+
659+ def assertExtractField(self, expected, field_name, raw_doc):
660+ getter = query_parser.ExtractField(field_name)
661+ self.assertEqual(expected, getter.get(raw_doc))
662+
663+ def test_get_value(self):
664+ self.assertExtractField(['bar'], 'foo', {'foo': 'bar'})
665+
666+ def test_get_value_None(self):
667+ self.assertExtractField([], 'foo', {'foo': None})
668+
669+ def test_get_value_missing_key(self):
670+ self.assertExtractField([], 'foo', {})
671+
672+ def test_get_value_subfield(self):
673+ self.assertExtractField(['bar'], 'foo.baz', {'foo': {'baz': 'bar'}})
674+
675+ def test_get_value_subfield_missing(self):
676+ self.assertExtractField([], 'foo.baz', {'foo': 'bar'})
677+
678+ def test_get_value_dict(self):
679+ self.assertExtractField([], 'foo', {'foo': {'baz': 'bar'}})
680+
681+ def test_get_value_list(self):
682+ self.assertExtractField(['bar', 'zap'], 'foo', {'foo': ['bar', 'zap']})
683+
684+ def test_get_value_mixed_list(self):
685+ self.assertExtractField(['bar', 'zap'], 'foo',
686+ {'foo': ['bar', ['baa'], 'zap', {'bing': 9}]})
687+
688+ def test_get_value_list_of_dicts(self):
689+ self.assertExtractField([], 'foo', {'foo': [{'zap': 'bar'}]})
690+
691+ def test_get_value_int(self):
692+ self.assertExtractField([9], 'foo', {'foo': 9})
693+
694+ def test_get_value_float(self):
695+ self.assertExtractField([9.2], 'foo', {'foo': 9.2})
696+
697+ def test_get_value_bool(self):
698+ self.assertExtractField([True], 'foo', {'foo': True})
699+ self.assertExtractField([False], 'foo', {'foo': False})
700+
701+
702+class TestLower(tests.TestCase):
703+
704+ def assertLowerGets(self, expected, input_val):
705+ getter = query_parser.Lower(query_parser.StaticGetter(input_val))
706+ out_val = getter.get(trivial_raw_doc)
707+ self.assertEqual(expected, out_val)
708+
709+ def test_inner_returns_None(self):
710+ self.assertLowerGets([], None)
711+
712+ def test_inner_returns_string(self):
713+ self.assertLowerGets(['foo'], 'fOo')
714+
715+ def test_inner_returns_list(self):
716+ self.assertLowerGets(['foo', 'bar'], ['fOo', 'bAr'])
717+
718+ def test_inner_returns_int(self):
719+ self.assertLowerGets([], 9)
720+
721+ def test_inner_returns_float(self):
722+ self.assertLowerGets([], 9.0)
723+
724+ def test_inner_returns_bool(self):
725+ self.assertLowerGets([], True)
726+
727+ def test_inner_returns_list_containing_int(self):
728+ self.assertLowerGets(['foo', 'bar'], ['fOo', 9, 'bAr'])
729+
730+ def test_inner_returns_list_containing_float(self):
731+ self.assertLowerGets(['foo', 'bar'], ['fOo', 9.2, 'bAr'])
732+
733+ def test_inner_returns_list_containing_bool(self):
734+ self.assertLowerGets(['foo', 'bar'], ['fOo', True, 'bAr'])
735+
736+ def test_inner_returns_list_containing_list(self):
737+ # TODO: Should this be unfolding the inner list?
738+ self.assertLowerGets(['foo', 'bar'], ['fOo', ['bAa'], 'bAr'])
739+
740+ def test_inner_returns_list_containing_dict(self):
741+ self.assertLowerGets(['foo', 'bar'], ['fOo', {'baa': 'xam'}, 'bAr'])
742+
743+
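Under the list-based contract, the Lower tests above reduce to a simple rule: lowercase every string in the inner getter's result and silently drop everything else. A sketch, assuming the inner getter has already normalized its output to a list:

```python
def lower(values):
    """Sketch of the Lower transformation: keep and lowercase string
    entries, drop ints, floats, bools, sub-lists and dicts."""
    return [v.lower() for v in values if isinstance(v, str)]
```

Note that `isinstance(True, str)` is false, so booleans are dropped along with the other non-strings, as test_inner_returns_list_containing_bool expects.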
744+class TestSplitWords(tests.TestCase):
745+
746+ def assertSplitWords(self, expected, value):
747+ getter = query_parser.SplitWords(query_parser.StaticGetter(value))
748+ self.assertEqual(expected, getter.get(trivial_raw_doc))
749+
750+ def test_inner_returns_None(self):
751+ self.assertSplitWords([], None)
752+
753+ def test_inner_returns_string(self):
754+ self.assertSplitWords(['foo', 'bar'], 'foo bar')
755+
756+ def test_inner_returns_list(self):
757+ self.assertSplitWords(['foo', 'baz', 'bar', 'sux'],
758+ ['foo baz', 'bar sux'])
759+
760+ def test_deduplicates(self):
761+ self.assertSplitWords(['bar'], ['bar', 'bar', 'bar'])
762+
763+ def test_inner_returns_int(self):
764+ self.assertSplitWords([], 9)
765+
766+ def test_inner_returns_float(self):
767+ self.assertSplitWords([], 9.2)
768+
769+ def test_inner_returns_bool(self):
770+ self.assertSplitWords([], True)
771+
772+ def test_inner_returns_list_containing_int(self):
773+ self.assertSplitWords(['foo', 'baz', 'bar', 'sux'],
774+ ['foo baz', 9, 'bar sux'])
775+
776+ def test_inner_returns_list_containing_float(self):
777+ self.assertSplitWords(['foo', 'baz', 'bar', 'sux'],
778+ ['foo baz', 9.2, 'bar sux'])
779+
780+ def test_inner_returns_list_containing_bool(self):
781+ self.assertSplitWords(['foo', 'baz', 'bar', 'sux'],
782+ ['foo baz', True, 'bar sux'])
783+
784+ def test_inner_returns_list_containing_list(self):
785+ # TODO: Expand sub-lists?
786+ self.assertSplitWords(['foo', 'baz', 'bar', 'sux'],
787+ ['foo baz', ['baa'], 'bar sux'])
788+
789+ def test_inner_returns_list_containing_dict(self):
790+ self.assertSplitWords(['foo', 'baz', 'bar', 'sux'],
791+ ['foo baz', {'baa': 'xam'}, 'bar sux'])
792+
793+
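The SplitWords tests combine two behaviors: whitespace-splitting each string entry and deduplicating the resulting words. The tests compare against ordered lists, so this sketch preserves first-seen order (the real implementation need only satisfy the tests; the helper name is illustrative):

```python
def split_words(values):
    """Sketch of SplitWords: split string entries on whitespace and
    deduplicate, keeping first-seen order; non-strings are dropped."""
    seen = set()
    out = []
    for v in values:
        if not isinstance(v, str):
            continue
        for word in v.split():
            if word not in seen:
                seen.add(word)
                out.append(word)
    return out
```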
794+class TestIsNull(tests.TestCase):
795+
796+ def assertIsNull(self, value):
797+ getter = query_parser.IsNull(query_parser.StaticGetter(value))
798+ self.assertEqual([True], getter.get(trivial_raw_doc))
799+
800+ def assertIsNotNull(self, value):
801+ getter = query_parser.IsNull(query_parser.StaticGetter(value))
802+ self.assertEqual([False], getter.get(trivial_raw_doc))
803+
804+ def test_inner_returns_None(self):
805+ self.assertIsNull(None)
806+
807+ def test_inner_returns_string(self):
808+ self.assertIsNotNull('foo')
809+
810+ def test_inner_returns_list(self):
811+ self.assertIsNotNull(['foo', 'bar'])
812+
813+ def test_inner_returns_empty_list(self):
814+ # TODO: is this the behavior we want?
815+ self.assertIsNull([])
816+
817+ def test_inner_returns_int(self):
818+ self.assertIsNotNull(9)
819+
820+ def test_inner_returns_float(self):
821+ self.assertIsNotNull(9.2)
822+
823+ def test_inner_returns_bool(self):
824+ self.assertIsNotNull(True)
825+
826+ # TODO: What about a dict? Inner is likely to return None, even though the
827+ # attribute does exist...
828+
829+
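Given inner getters that already normalize to lists, the IsNull tests above collapse to "is the inner result empty?": `None` and `[]` both map to `[True]`, any non-empty result to `[False]`. A sketch of that reading (note the proposal description flags `{"field": [null]}` as still ambiguous, so this is one possible semantics, not a settled one):

```python
def is_null(values):
    """Sketch of IsNull over an already-list-normalized inner result:
    a single-element answer, [True] iff nothing was extracted."""
    return [len(values) == 0]
```

This is the point where this branch diverges from James Westby's original implementation, which treated an empty list as not null.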
830+class TestParser(tests.TestCase):
831+
832+ def parse(self, spec):
833+ parser = query_parser.Parser()
834+ return parser.parse(spec)
835+
836+ def parse_all(self, specs):
837+ parser = query_parser.Parser()
838+ return parser.parse_all(specs)
839+
840+ def assertParseError(self, definition):
841+ self.assertRaises(errors.IndexDefinitionParseError, self.parse,
842+ definition)
843+
844+ def test_parse_empty_string(self):
845+ self.assertRaises(errors.IndexDefinitionParseError, self.parse, "")
846+
847+ def test_parse_field(self):
848+ getter = self.parse("a")
849+ self.assertIsInstance(getter, query_parser.ExtractField)
850+ self.assertEqual("a", getter.field)
851+
852+ def test_parse_dotted_field(self):
853+ getter = self.parse("a.b")
854+ self.assertIsInstance(getter, query_parser.ExtractField)
855+ self.assertEqual("a.b", getter.field)
856+
857+ def test_parse_dotted_field_nothing_after_dot(self):
858+ self.assertParseError("a.")
859+
860+ def test_parse_missing_close_on_transformation(self):
861+ self.assertParseError("lower(a")
862+
863+ def test_parse_missing_field_in_transformation(self):
864+ self.assertParseError("lower()")
865+
866+ def test_parse_trailing_chars(self):
867+ self.assertParseError("lower(ab$)")
868+
869+ def test_parse_empty_op(self):
870+ self.assertParseError("(ab)")
871+
872+ def test_parse_unknown_op(self):
873+ self.assertParseError("no_such_operation(field)")
874+
875+ def test_parse_transformation(self):
876+ getter = self.parse("lower(a)")
877+ self.assertIsInstance(getter, query_parser.Lower)
878+ self.assertIsInstance(getter.inner, query_parser.ExtractField)
879+ self.assertEqual("a", getter.inner.field)
880+
881+ def test_parse_all(self):
882+ getters = self.parse_all(["a", "b"])
883+ self.assertEqual(2, len(getters))
884+ self.assertIsInstance(getters[0], query_parser.ExtractField)
885+ self.assertEqual("a", getters[0].field)
886+ self.assertIsInstance(getters[1], query_parser.ExtractField)
887+ self.assertEqual("b", getters[1].field)
888
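Taken together, the TestParser cases describe a small grammar: an index expression is either a dotted field name or a known transformation wrapping another expression, and anything else (empty spec, trailing junk, unbalanced parens, unknown op) is an IndexDefinitionParseError. A hypothetical regex-based sketch of that grammar, returning nested tuples instead of the real Getter objects:

```python
import re

class IndexDefinitionParseError(Exception):
    pass

FIELD_RE = re.compile(r'^\w+(\.\w+)*$')   # 'a' or 'a.b', no trailing dot
OP_RE = re.compile(r'^(\w+)\((.*)\)$')    # 'opname(inner-expression)'

def parse(spec, operations=('lower', 'split_words', 'is_null')):
    """Sketch of the index-expression parser exercised above."""
    if not spec:
        raise IndexDefinitionParseError('empty expression')
    m = OP_RE.match(spec)
    if m:
        op, inner = m.groups()
        if op not in operations:
            raise IndexDefinitionParseError('unknown operation %r' % op)
        return (op, parse(inner, operations))
    if '(' in spec or ')' in spec:
        # covers 'lower(a' (missing close) and '(ab)' (empty op name)
        raise IndexDefinitionParseError('malformed expression %r' % spec)
    if not FIELD_RE.match(spec):
        raise IndexDefinitionParseError('bad field name %r' % spec)
    return ('field', spec)
```

Because the inner expression is parsed recursively, nested transformations such as `lower(a.b)` fall out for free; `parse_all` is then just `parse` mapped over a list of specs.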
889=== modified file 'u1db/tests/test_sqlite_backend.py'
890--- u1db/tests/test_sqlite_backend.py 2011-12-02 11:02:42 +0000
891+++ u1db/tests/test_sqlite_backend.py 2011-12-06 15:24:25 +0000
892@@ -23,6 +23,7 @@
893 from u1db import (
894 errors,
895 tests,
896+ query_parser,
897 )
898 from u1db.backends import sqlite_backend
899
900@@ -115,6 +116,22 @@
901 c.execute("SELECT * FROM conflicts")
902 c.execute("SELECT * FROM index_definitions")
903
904+ def test__parse_index(self):
905+ self.db = sqlite_backend.SQLitePartialExpandDatabase(':memory:')
906+ g = self.db._parse_index_definition('fieldname')
907+ self.assertIsInstance(g, query_parser.ExtractField)
908+ self.assertEqual('fieldname', g.field)
909+
910+ def test__update_indexes(self):
911+ self.db = sqlite_backend.SQLitePartialExpandDatabase(':memory:')
912+ g = self.db._parse_index_definition('fieldname')
913+ c = self.db._get_sqlite_handle().cursor()
914+ self.db._update_indexes('doc-id', {'fieldname': 'val'},
915+ [('fieldname', g)], c)
916+ c.execute('SELECT doc_id, field_name, value FROM document_fields')
917+ self.assertEqual([('doc-id', 'fieldname', 'val')],
918+ c.fetchall())
919+
920 def test__set_replica_uid(self):
921 # Start from scratch, so that replica_uid isn't set.
922 self.db = sqlite_backend.SQLitePartialExpandDatabase(':memory:')
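test__update_indexes above shows the shape of the partial-expand step: each (index expression, getter) pair contributes one `document_fields` row per extracted value. A runnable sketch of that step using the standard sqlite3 module (table layout taken from the test; the `update_indexes` helper and the schema-creation line are illustrative, not the backend's actual code):

```python
import sqlite3

def update_indexes(db, doc_id, doc, getters):
    """Sketch of the partial-expand step: for each indexed expression,
    insert one row per value the getter extracts from the document."""
    for field_name, getter in getters:
        for value in getter(doc):
            db.execute(
                'INSERT INTO document_fields (doc_id, field_name, value)'
                ' VALUES (?, ?, ?)', (doc_id, field_name, value))

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE document_fields'
           ' (doc_id TEXT, field_name TEXT, value TEXT)')
# Stand-in getter: the real code would use the parsed ExtractField.
update_indexes(db, 'doc-id', {'fieldname': 'val'},
               [('fieldname', lambda doc: [doc['fieldname']])])
rows = db.execute(
    'SELECT doc_id, field_name, value FROM document_fields').fetchall()
```

Because getters always return lists, a multi-valued field simply produces several rows for the same doc_id, with no special casing in the SQL layer.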
