Merge lp:~thisfred/u1db/c-nested_index into lp:u1db

Proposed by Eric Casteleijn on 2012-03-06
Status: Merged
Approved by: John A Meinel on 2012-03-07
Approved revision: 224
Merged at revision: 226
Proposed branch: lp:~thisfred/u1db/c-nested_index
Merge into: lp:u1db
Diff against target: 189 lines (+48/-21)
4 files modified
src/u1db.c (+12/-12)
src/u1db_query.c (+17/-1)
u1db/tests/test_backends.py (+18/-3)
u1db/tests/test_c_backend.py (+1/-5)
To merge this branch: bzr merge lp:~thisfred/u1db/c-nested_index
Reviewer Review Type Date Requested Status
John A Meinel (community) 2012-03-06 Needs Fixing on 2012-03-07
Review via email: mp+96215@code.launchpad.net

Commit Message

Added support for nested indexes.

Description of the Change

Added support for nested indexes.

To post a comment you must log in.
Eric Casteleijn (thisfred) wrote :

This is a very rough attempt to get the tests to pass. Please turn review harshness up to 11, as there may be performance issues or even memory management bugs. I added some XXX comments where I had questions, which I'll remove once they're addressed.

lp:~thisfred/u1db/c-nested_index updated on 2012-03-06
222. By Eric Casteleijn on 2012-03-06

initialize

223. By Eric Casteleijn on 2012-03-06

strtok_r is threadsafe

John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 3/6/2012 7:45 PM, Eric Casteleijn wrote:
> Eric Casteleijn has proposed merging
> lp:~thisfred/u1db/c-nested_index into lp:u1db.
>
> Requested reviews: John A Meinel (jameinel)
>
> For more details, see:
> https://code.launchpad.net/~thisfred/u1db/c-nested_index/+merge/96215
>
> Added support for nested indexes.

+ val = int(self.db._replica_uid, 16) # XXX missing assert?

The assert is effectively that a ValueError is not raised because uid
is a valid hex string.

You can put a comment to that fact if you prefer. The uid is going to
be random, so we can't assert what value it has, but we are asserting
that it is a hex string.

- - val = json_object_object_get(ctx->obj, expression);
+ tmp_expression = strdup(expression); // XXX: need to test for out
of mem?
+ result = strtok(tmp_expression, DOT); // XXX: strtok thread safety?
+ val = ctx->obj;
+ while (result != NULL) {
+ val = json_object_object_get(val, result);
+ result = strtok(NULL, DOT);
+ }
+ free(tmp_expression);

I'm not a big fan of strtok, because the state is kept within the
function, rather than being exposed and passed back.
Plus it mutates the buffer you're working with, etc. Though we
probably don't have much choice on the latter, because libjson doesn't
have a get that you can pass the length of the string (it has to be
null-terminated).

I see that you switched to strtok_r in a follow up patch that doesn't
show up in the email diff. Which I thought was a good idea from "man
strtok", except strtok_r doesn't exist for MSVC.. :(

It isn't particularly hard to write our own strtok_r, or even do the
parsing manually, since we don't need to support a set of characters,
just one. So strchr is a pretty easy substitute.

// Not actually tested:
tmp_expression = strdup(expression);
result = tmp_expression;
while (result != NULL) {
  dot_chr = strchr(result, '.');
  if (dot_chr != NULL) {
    *dot_chr = '\0';
    dot_chr ++;
  }
  val = json_object_object_get(val, result);
  result = dot_chr;
}
free(tmp_expression);

Oh, we'll also want to trap for 'val' become non-existent, I believe
[while (result != NULL && val != NULL)]. I think we need another test
case for:

def tested_nested_nonexistent(self):
  doc = self.db.create_doc(nested_doc)
  # sub exists, but sub.foo does not:
  self.db.create_index('test-idx', ['sub.foo'])
  self.assertEqual([], self.db.get_from_index('test-idx', [('*',)]))

I'm not 100% sure how json_object_object_get works for keys that don't
exist, but that's why we test it :).

This is the sort of test that *could* be a specific implementation
test, but it works well enough to be tested at the API level, and all
implementations shouldn't crash if an index requests an attribute that
doesn't exist.

Otherwise, I think this idea is pretty good.

 review: needsfixing

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk9XHJQACgkQJdeBCYSNAAPQsQCgrjWv7cKhyw/Uu4DZmXd6p0OH
pdMAoKsheZPbqs+r1pfNKLpQAQ6y3FV4
=oLyS
-----END PGP SIGNATURE-----

review: Needs Fixing
Eric Casteleijn (thisfred) wrote :

Thanks, will fix, when mulling this over last night I realized we only tested for the happy path, and it probably breaks horribly when the subfield does not exist.

Eric Casteleijn (thisfred) wrote :

The above test passes with the current code, because the final val being NULL was already caught by the

if (val != NULL)

below, so I added another test which looks for a value that is nested deeper, and that indeed blew up the code, which your proposed extra check fixed.

lp:~thisfred/u1db/c-nested_index updated on 2012-03-07
224. By Eric Casteleijn on 2012-03-07

removed use of strtok, added a test for nonexistent nested fields, and fixed the code so it doesn't blow up in the face of that

John A Meinel (jameinel) wrote :

138 + def tested_nested_nonexistent(self):
139 + doc = self.db.create_doc(nested_doc)
140 + # sub exists, but sub.foo does not:
141 + self.db.create_index('test-idx', ['sub.foo'])
142 + self.assertEqual([], self.db.get_from_index('test-idx', [('*',)]))
143 +
144 + def tested_nested_nonexistent(self):
145 + doc = self.db.create_doc(nested_doc)
146 + # sub exists, but sub.foo does not:
147 + self.db.create_index('test-idx', ['sub.foo.bar.baz.qux.fnord'])
148 + self.assertEqual([], self.db.get_from_index('test-idx', [('*',)]))

You have the test name duplicated. I don't know if you just want the second test, or if you want both of them. Otherwise this is good.

review: Needs Fixing
Eric Casteleijn (thisfred) wrote :

Oops, fixed.

lp:~thisfred/u1db/c-nested_index updated on 2012-03-08
225. By Eric Casteleijn on 2012-03-08

fixed test name

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/u1db.c'
2--- src/u1db.c 2012-03-07 08:07:40 +0000
3+++ src/u1db.c 2012-03-08 13:47:22 +0000
4@@ -355,15 +355,15 @@
5 int status;
6
7 if (is_update) {
8- status = sqlite3_prepare_v2(db->sql_handle,
9+ status = sqlite3_prepare_v2(db->sql_handle,
10 "UPDATE document SET doc_rev = ?, content = ? WHERE doc_id = ?", -1,
11- &statement, NULL);
12+ &statement, NULL);
13 if (status != SQLITE_OK) { goto finish; }
14 status = delete_old_fields(db, doc_id);
15 } else {
16- status = sqlite3_prepare_v2(db->sql_handle,
17+ status = sqlite3_prepare_v2(db->sql_handle,
18 "INSERT INTO document (doc_rev, content, doc_id) VALUES (?, ?, ?)", -1,
19- &statement, NULL);
20+ &statement, NULL);
21 }
22 if (status != SQLITE_OK) {
23 return status;
24@@ -386,7 +386,7 @@
25 sqlite3_finalize(statement);
26 if (status != SQLITE_OK) { goto finish; }
27 status = u1db__update_indexes(db, doc_id, content);
28- status = sqlite3_prepare_v2(db->sql_handle,
29+ status = sqlite3_prepare_v2(db->sql_handle,
30 "INSERT INTO transaction_log(doc_id) VALUES (?)", -1,
31 &statement, NULL);
32 if (status != SQLITE_OK) {
33@@ -414,9 +414,9 @@
34 sqlite3_stmt *statement;
35 int status;
36
37- status = sqlite3_prepare_v2(db->sql_handle,
38+ status = sqlite3_prepare_v2(db->sql_handle,
39 "SELECT 1 FROM conflicts WHERE doc_id = ? LIMIT 1", -1,
40- &statement, NULL);
41+ &statement, NULL);
42 if (status != SQLITE_OK) {
43 return status;
44 }
45@@ -446,9 +446,9 @@
46 sqlite3_stmt *statement;
47 int status;
48
49- status = sqlite3_prepare_v2(db->sql_handle,
50+ status = sqlite3_prepare_v2(db->sql_handle,
51 "INSERT INTO conflicts VALUES (?, ?, ?)", -1,
52- &statement, NULL);
53+ &statement, NULL);
54 if (status != SQLITE_OK) {
55 return status;
56 }
57@@ -1223,7 +1223,7 @@
58 if (result == NULL) {
59 return NULL;
60 }
61- status = sqlite3_prepare_v2(db->sql_handle, sql, n, &statement, NULL);
62+ status = sqlite3_prepare_v2(db->sql_handle, sql, n, &statement, NULL);
63 if (status != SQLITE_OK) {
64 result->status = status;
65 return result;
66@@ -1592,9 +1592,9 @@
67 if (db == NULL || index_name == NULL) {
68 return U1DB_INVALID_PARAMETER;
69 }
70- status = sqlite3_prepare_v2(db->sql_handle,
71+ status = sqlite3_prepare_v2(db->sql_handle,
72 "DELETE FROM index_definitions WHERE name = ?", -1,
73- &statement, NULL);
74+ &statement, NULL);
75 if (status != SQLITE_OK) { goto finish; }
76 status = sqlite3_bind_text(statement, 1, index_name, -1, SQLITE_TRANSIENT);
77 if (status != SQLITE_OK) { goto finish; }
78
79=== modified file 'src/u1db_query.c'
80--- src/u1db_query.c 2012-03-08 08:08:27 +0000
81+++ src/u1db_query.c 2012-03-08 13:47:22 +0000
82@@ -360,12 +360,28 @@
83 json_object *val;
84 const char *str_val;
85 int status = U1DB_OK;
86+ char *result = NULL;
87+ char *tmp_expression = NULL;
88+ char *progress = NULL;
89+ char *dot_chr = NULL;
90
91 ctx = (struct evaluate_index_context *)context;
92 if (ctx->obj == NULL || !json_object_is_type(ctx->obj, json_type_object)) {
93 return U1DB_INVALID_JSON;
94 }
95- val = json_object_object_get(ctx->obj, expression);
96+ tmp_expression = strdup(expression);
97+ result = tmp_expression;
98+ val = ctx->obj;
99+ while (result != NULL && val != NULL) {
100+ dot_chr = strchr(result, '.');
101+ if (dot_chr != NULL) {
102+ *dot_chr = '\0';
103+ dot_chr++;
104+ }
105+ val = json_object_object_get(val, result);
106+ result = dot_chr;
107+ }
108+ free(tmp_expression);
109 if (val != NULL) {
110 str_val = json_object_get_string(val);
111 if (str_val != NULL) {
112
113=== modified file 'u1db/tests/test_backends.py'
114--- u1db/tests/test_backends.py 2012-03-06 12:51:24 +0000
115+++ u1db/tests/test_backends.py 2012-03-08 13:47:22 +0000
116@@ -348,6 +348,7 @@
117 self.assertEqual([no_conflict_doc, doc2],
118 self.db.get_docs([doc1.doc_id, doc2.doc_id],
119 check_for_conflicts=False))
120+
121 def test_get_doc_conflicts(self):
122 doc = self.db.create_doc(simple_doc)
123 alt_doc = self.make_document(doc.doc_id, 'alternate:1', nested_doc)
124@@ -713,9 +714,6 @@
125 self.assertEqual(sorted([doc1, doc2, doc3]),
126 sorted(self.db.get_from_index('test-idx', [("v1", "*")])))
127
128-
129-class PyDatabaseIndexTests(tests.DatabaseBaseTests):
130-
131 def test_nested_index(self):
132 doc = self.db.create_doc(nested_doc)
133 self.db.create_index('test-idx', ['sub.doc'])
134@@ -726,6 +724,21 @@
135 sorted([doc, doc2]),
136 sorted(self.db.get_from_index('test-idx', [('underneath',)])))
137
138+ def test_nested_nonexistent(self):
139+ doc = self.db.create_doc(nested_doc)
140+ # sub exists, but sub.foo does not:
141+ self.db.create_index('test-idx', ['sub.foo'])
142+ self.assertEqual([], self.db.get_from_index('test-idx', [('*',)]))
143+
144+ def test_nested_nonexistent2(self):
145+ doc = self.db.create_doc(nested_doc)
146+ # sub exists, but sub.foo does not:
147+ self.db.create_index('test-idx', ['sub.foo.bar.baz.qux.fnord'])
148+ self.assertEqual([], self.db.get_from_index('test-idx', [('*',)]))
149+
150+
151+class PyDatabaseIndexTests(tests.DatabaseBaseTests):
152+
153 def test_get_from_index_case_sensitive(self):
154 self.db.create_index('test-idx', ['key'])
155 doc1 = self.db.create_doc(simple_doc)
156@@ -837,8 +850,10 @@
157 new_content = '{"key": "altval"}'
158 other_rev = 'test:1|z:2'
159 st = self.db.get_sync_target()
160+
161 def ignore(doc_id, doc_rev, doc):
162 pass
163+
164 doc_other = self.make_document(doc.doc_id, other_rev, new_content)
165 docs_by_gen = [(doc_other, 10)]
166 result = st.sync_exchange(docs_by_gen, 'other-replica',
167
168=== modified file 'u1db/tests/test_c_backend.py'
169--- u1db/tests/test_c_backend.py 2012-03-07 13:09:34 +0000
170+++ u1db/tests/test_c_backend.py 2012-03-08 13:47:22 +0000
171@@ -79,6 +79,7 @@
172 self.db = c_backend_wrapper.CDatabase(':memory:')
173 self.assertIsNot(None, self.db._replica_uid)
174 self.assertEqual(32, len(self.db._replica_uid))
175+ # casting to an int from the uid *is* the check for correct behavior.
176 val = int(self.db._replica_uid, 16)
177
178 def test_get_conflicts_with_borked_data(self):
179@@ -102,11 +103,6 @@
180 self.db = c_backend_wrapper.CDatabase(':memory:')
181 doc = self.db.create_doc(tests.nested_doc)
182 self.db.create_index("multi-idx", ["key", "sub.doc"])
183- # TODO: The current backend doesn't support nested fields, so we push
184- # that data in manually.
185- self.db._run_sql("INSERT INTO document_fields"
186- " VALUES ('%s', 'sub.doc', 'underneath')"
187- % (doc.doc_id,))
188 docs = self.db.get_from_index('multi-idx', [('value', 'underneath')])
189 self.assertEqual([doc], docs)
190

Subscribers

People subscribed via source and target branches