Merge lp:~thisfred/u1db/indexing-deleted-docs into lp:u1db

Proposed by Eric Casteleijn on 2012-04-23
Status: Merged
Approved by: Eric Casteleijn on 2012-04-24
Approved revision: 253
Merged at revision: 253
Proposed branch: lp:~thisfred/u1db/indexing-deleted-docs
Merge into: lp:u1db
Diff against target: 96 lines (+25/-5)
4 files modified
src/u1db_query.c (+9/-1)
u1db/backends/inmemory.py (+2/-1)
u1db/backends/sqlite_backend.py (+2/-0)
u1db/tests/test_backends.py (+12/-3)
To merge this branch: bzr merge lp:~thisfred/u1db/indexing-deleted-docs
Reviewer: John A Meinel (community), review requested 2012-04-23, status: Approve on 2012-04-24
Review via email: mp+103165@code.launchpad.net

Commit Message

Adding a new index to a database that contains deleted documents no longer breaks.

Description of the Change

Adding a new index to a database that contains deleted documents no longer breaks.
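
The failure mode is easy to see in miniature: a deleted document's content is stored as NULL/None, and the index-building pass previously tried to parse that as JSON. The following is a toy Python sketch of the corrected behavior (the `build_index` helper is hypothetical, not u1db API; the real fix touches all three backends in the diff below):

```python
import json

def build_index(docs, field):
    """Map each value of `field` to the doc_ids that contain it.

    `docs` maps doc_id to a JSON string, with None marking a deleted
    document (mirroring how deleted docs are stored).
    """
    index = {}
    for doc_id, content in docs.items():
        if content is None:
            # Deleted document: nothing to index. The old code tried
            # to parse None as JSON here, which is what broke.
            continue
        value = json.loads(content).get(field)
        if value is not None:
            index.setdefault(value, []).append(doc_id)
    return index

docs = {'doc-1': '{"key": "value"}', 'doc-2': None}
print(build_index(docs, 'key'))  # {'value': ['doc-1']}
```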

John A Meinel (jameinel) wrote :


On 4/23/2012 10:09 PM, Eric Casteleijn wrote:
> Eric Casteleijn has proposed merging
> lp:~thisfred/u1db/indexing-deleted-docs into lp:u1db.
>
> Requested reviews: Ubuntu One hackers (ubuntuone-hackers)
>
> For more details, see:
> https://code.launchpad.net/~thisfred/u1db/indexing-deleted-docs/+merge/103165
>
> Adding a new index to a database that contains deleted documents
> no longer breaks.

Nice catch on this, though I'm not sure about your solution.

+        if (context.content == NULL)
+        {
+            // Invalid JSON in the database, for now we just continue?
+            status = sqlite3_step(statement);
+            continue;
+        }

I think this should instead be:

if (context.content == NULL) {
    // Deleted document, just skip it and onto the next
    continue;
}

And this one:

         if (context.obj == NULL
                 || !json_object_is_type(context.obj, json_type_object))
         {
             // Invalid JSON in the database, for now we just continue?
+            status = sqlite3_step(statement);
             continue;
         }

Should probably actually be an error, rather than just continuing. I
think I was just cutting corners because we didn't need it for
compatibility with the test suite.

So I think your change is ok here, but we need a test for this case. I
suppose it is arguable: should we skip (because creating an index isn't
a great time to fail just because you have bad data), or should we fail
(so that you know early that your content is incorrect)?

John
=:->


John A Meinel (jameinel) wrote :



 review: approve

With the small tweak of fixing the comment on why we are continuing.

John
=:->


review: Approve
Eric Casteleijn (thisfred) wrote :

> Nice catch on this. Though I'm not sure about your solution.
>
> +        if (context.content == NULL)
> +        {
> +            // Invalid JSON in the database, for now we just continue?
> +            status = sqlite3_step(statement);
> +            continue;
> +        }
>
> I think this should instead be:
>
> if (context.content == NULL) {
>     // Deleted document, just skip it and onto the next
>     continue;
> }

I'm not sure I understand:

The while loop will never advance to the next table row if we just do continue without the

status = sqlite3_step(statement);

Unless we advance the row in the while statement itself, wouldn't this result in an infinite loop? Maybe I'm reading the code wrong.
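
For illustration, here is a Python sqlite3 analogue of that loop (a sketch only, not the actual u1db code): Python's cursor iteration advances to the next row automatically, so a bare `continue` is safe, whereas the C loop only advances when sqlite3_step() is called explicitly.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE document (doc_id TEXT, content TEXT)')
conn.executemany('INSERT INTO document VALUES (?, ?)',
                 [('doc-1', '{"key": "value"}'), ('doc-2', None)])

indexed = []
for doc_id, content in conn.execute('SELECT doc_id, content FROM document'):
    if content is None:
        # Deleted document: skip it. Iteration still moves to the next
        # row; in the C version, `continue` without sqlite3_step() would
        # re-examine this same row forever.
        continue
    indexed.append(doc_id)
print(indexed)  # ['doc-1']
```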

> And this one:
>
>         if (context.obj == NULL
>                 || !json_object_is_type(context.obj, json_type_object))
>         {
>             // Invalid JSON in the database, for now we just continue?
> +           status = sqlite3_step(statement);
>             continue;
>         }
>
> Should probably actually be an error, rather than just continuing. I
> think I was just cutting corners because we didn't need it for
> compatibility with the test suite.

Yes, now that we catch the case of the deleted document, this should probably be an error.

> So I think your change is ok here, but we need a test for this case. I
> suppose it is arguable. Should we skip (because creating an index
> isn't a great time to just fail because you have bad data), or should
> we fail (so that you know early your content is incorrect.)

I'll file a bug and think about a good test case, and about how to handle corrupted content in general. Since it shouldn't be easy to get invalid JSON into the db through our API, I think we should make it break as early and as explicitly as possible.
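
A fail-early check along these lines could look like the following (a hypothetical helper, not current u1db code; the deleted-document None case is assumed to be filtered out before this runs):

```python
import json

def parse_doc_or_fail(doc_id, content):
    """Parse a document's content, raising on corrupt data.

    Instead of silently skipping invalid JSON during index creation,
    surface the corruption immediately so callers learn about bad
    content as early as possible.
    """
    try:
        obj = json.loads(content)
    except ValueError:
        raise ValueError('document %r contains invalid JSON' % doc_id)
    if not isinstance(obj, dict):
        raise ValueError('document %r content is not a JSON object' % doc_id)
    return obj

print(parse_doc_or_fail('doc-1', '{"key": "value"}'))  # {'key': 'value'}
```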

John A Meinel (jameinel) wrote :


...

>> I think this should instead be:
>>
>> if (context.content == NULL) {
>>     // Deleted document, just skip it and onto the next
>>     continue;
>> }
>
>
> I'm not sure I understand:
>
> The while loop will never advance to the next table row if we just
> do continue without the
>
> status = sqlite3_step(statement);
>
> unless we advance the row in the while statement itself, this would
> result in an infinite loop? Maybe I'm reading the code wrong.
>

Sorry, I'm not saying you shouldn't step, just fix the comment.

John
=:->

254. By Eric Casteleijn on 2012-04-24

comments improved.

Preview Diff

=== modified file 'src/u1db_query.c'
--- src/u1db_query.c	2012-04-17 14:26:03 +0000
+++ src/u1db_query.c	2012-04-24 13:05:24 +0000
@@ -796,15 +796,23 @@
     while (status == SQLITE_ROW) {
         context.doc_id = (const char*)sqlite3_column_text(statement, 0);
         context.content = (const char*)sqlite3_column_text(statement, 1);
+        if (context.content == NULL)
+        {
+            // This document is deleted so does not need to be indexed.
+            status = sqlite3_step(statement);
+            continue;
+        }
         context.obj = json_tokener_parse(context.content);
         if (context.obj == NULL
                 || !json_object_is_type(context.obj, json_type_object))
         {
             // Invalid JSON in the database, for now we just continue?
+            // TODO: Raise an error here.
+            status = sqlite3_step(statement);
             continue;
         }
         for (i = 0; i < n_expressions; ++i) {
-            status = evaluate_index_and_insert_into_db(&context, 
+            status = evaluate_index_and_insert_into_db(&context,
                 expressions[i]);
             if (status != U1DB_OK)
                 goto finish;

=== modified file 'u1db/backends/inmemory.py'
--- u1db/backends/inmemory.py	2012-02-01 13:36:52 +0000
+++ u1db/backends/inmemory.py	2012-04-24 13:05:24 +0000
@@ -168,7 +168,8 @@
     def create_index(self, index_name, index_expression):
         index = InMemoryIndex(index_name, index_expression)
         for doc_id, (doc_rev, doc) in self._docs.iteritems():
-            index.add_json(doc_id, doc)
+            if doc is not None:
+                index.add_json(doc_id, doc)
         self._indexes[index_name] = index
 
     def delete_index(self, index_name):

=== modified file 'u1db/backends/sqlite_backend.py'
--- u1db/backends/sqlite_backend.py	2012-02-23 09:46:58 +0000
+++ u1db/backends/sqlite_backend.py	2012-04-24 13:05:24 +0000
@@ -669,6 +669,8 @@
                    for field in new_fields]
         c = self._db_handle.cursor()
         for doc_id, doc in self._iter_all_docs():
+            if doc is None:
+                continue
             raw_doc = simplejson.loads(doc)
             self._update_indexes(doc_id, raw_doc, getters, c)
 
 
=== modified file 'u1db/tests/test_backends.py'
--- u1db/tests/test_backends.py	2012-04-16 16:15:56 +0000
+++ u1db/tests/test_backends.py	2012-04-24 13:05:24 +0000
@@ -42,7 +42,7 @@
 
 def http_create_database(test, replica_uid, path='test'):
     test.startServer()
-    db = test.request_state._create_database(replica_uid)
+    test.request_state._create_database(replica_uid)
     return http_database.HTTPDatabase(test.getURL(path))
 
 
@@ -593,6 +593,14 @@
         self.assertEqual([doc],
             self.db.get_from_index('test-idx', [('value',)]))
 
+    def test_create_index_after_deleting_document(self):
+        doc = self.db.create_doc(simple_doc)
+        doc2 = self.db.create_doc(simple_doc)
+        self.db.delete_doc(doc2)
+        self.db.create_index('test-idx', ['key'])
+        self.assertEqual([doc],
+            self.db.get_from_index('test-idx', [('value',)]))
+
     def test_delete_index(self):
         self.db.create_index('test-idx', ['key'])
         self.assertEqual([('test-idx', ['key'])], self.db.list_indexes())
@@ -606,9 +614,10 @@
             self.db.get_from_index('test-idx', [('value',)]))
 
     def test_get_from_index_unmatched(self):
-        doc = self.db.create_doc(simple_doc)
+        self.db.create_doc(simple_doc)
         self.db.create_index('test-idx', ['key'])
-        self.assertEqual([], self.db.get_from_index('test-idx', [('novalue',)]))
+        self.assertEqual(
+            [], self.db.get_from_index('test-idx', [('novalue',)]))
 
     def test_create_index_multiple_exact_matches(self):
         doc = self.db.create_doc(simple_doc)
