Merge lp:~thisfred/u1db/number-mapping into lp:u1db

Proposed by Eric Casteleijn on 2012-04-27
Status: Merged
Merged at revision: 262
Proposed branch: lp:~thisfred/u1db/number-mapping
Merge into: lp:u1db
Diff against target: 764 lines (+444/-151)
6 files modified
include/u1db/u1db.h (+5/-0)
setup.py (+1/-1)
src/u1db_query.c (+327/-145)
u1db/query_parser.py (+38/-3)
u1db/tests/test_backends.py (+16/-2)
u1db/tests/test_query_parser.py (+57/-0)
To merge this branch: bzr merge lp:~thisfred/u1db/number-mapping
Reviewer Review Type Date Requested Status
John A Meinel (community) 2012-04-27 Approve on 2012-05-02
Review via email: mp+103862@code.launchpad.net

Commit Message

This adds indexing of numeric values as zero padded strings, with the number(field, no_of_zeroes) mapping.

Description of the Change

This adds indexing of numeric values as zero padded strings, with the number(field, no_of_zeroes) mapping.

To post a comment you must log in.
John A Meinel (jameinel) wrote :
Download full text (4.9 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 4/27/2012 2:32 PM, Eric Casteleijn wrote:
> Eric Casteleijn has proposed merging
> lp:~thisfred/u1db/number-mapping into lp:u1db.
>
> Requested reviews: John A Meinel (jameinel) Related bugs: Bug
> #987412 in U1DB: "support indexes on non-string types"
> https://bugs.launchpad.net/u1db/+bug/987412
>
> For more details, see:
> https://code.launchpad.net/~thisfred/u1db/number-mapping/+merge/103862
>
> This adds indexing of numeric values as zero padded strings, with
> the number(field, no_of_zeroes) mapping.

I think this hints that operations should really be more of a
class-like structure. Rather than a function that you pass varargs,
you really want a bit of context along with your function.

I also don't fully understand why number has to be passed in as a
string for every application of 'op_number'. As that means for 50k
documents you call atoi on that number for each doc. It isn't
terrible, but it is surprising.

At one point, I thought you were looking to change the code so we
didn't have to parse the description every time. Did you get anywhere
with that?

+ value = item->data;
+ for (p = value; *p; p++) {
+ if (isdigit(*p) == 0) {
+ continue;
+ }
+ }
+ value_size = strlen(value);
+ if (zeroes > value_size)
+ value_size = zeroes;
+ value_size++;
+ new_value = (char *)calloc(value_size, 1);
+ if (new_value == NULL)
+ {
+ status = U1DB_NOMEM;
+ goto finish;
+ }
+ count = snprintf(new_value, value_size, "%0*d", zeroes,
atoi(value));

I thought originally you had the wrong buffer size, but I missed the
if(zeros > value_size) line. Would it be clearer as:

value_size = max(strlen(value), zeros) + 1;

We haven't clearly specified what should happen if you use number(x,
5) but the number in a document is >=100,000. And I don't see a test
case for a number too big for the width.

...

 trivial_raw_doc = {}
+PADDING = 5

I don't really prefer global constants like this. I would probably do
it as:

+ def assertNumber(self, expected, value, padding=5):

That will also let you easily test a number that is way to big to fit,
and a padding size that is way to big to put all the digits in.

+ for (p = value; *p; p++) {
+ if (isdigit(*p) == 0) {
+ continue;
+ }
+ }

I don't see the point of that loop. You don't do anything with 'p'
after the loop.

Maybe that is the 'p' that you check earlier? And you forgot and left
this loop in?

...

+static char *
+itoa(int integer_value)
+{

Because the maximum size of an integer is known in advance
(len(str(2**64)) == 20), I think we would be better off avoiding
counting the size of the integer and doing a calloc + free, and just
having a buffer of size 21, that gets allocated on the stack and
passed into something like 'snprintf(buf, 21, "%d", val)'.

So that would make this code:

+ } else if (json_object_is_type(val, json_type_int)) {
+ integer_value = json_object_get_int(val);
+ string_value = itoa(integer_value);
+ ...

Read more...

Eric Casteleijn (thisfred) wrote :

I addressed all of your comments, so please have another look, to see if you agree with the direction I'm taking if nothing else.

Known issues:

- there is some memory leaking, but make valgrind-leaks does not fail any tests... (I will chase this down tomorrow)
- the parsing of the field_mappings still happens once for every document. This is now no longer necessary, but lifting the transformations out of evaluate_index_and_insert_into_db is likely to make the branch quite a bit bigger. (I tried but it means changing the callback signature, and that affected quite a bit of the code.) Note that the python implementation *also* parses the expression once per document, so we may want to fix that as well.

lp:~thisfred/u1db/number-mapping updated on 2012-05-01
267. By Eric Casteleijn on 2012-05-01

added pyxdg

268. By Eric Casteleijn on 2012-05-01

merged trunk and fixed conflict

269. By Eric Casteleijn on 2012-05-01

added test

270. By Eric Casteleijn on 2012-05-01

fixed memory leak

Eric Casteleijn (thisfred) wrote :

Memory issue found and fixed.

John A Meinel (jameinel) wrote :

I'm thinking the max() code is defined elsewhere as well, we probably want it in a more common header, like maybe u1db_internal.h

111 + if (tr->args->head != NULL)
112 + {
113 + status = ((args_operation)tr->op)(result, tmp_values, tr->args);
114 + } else {
115 + status = ((operation)tr->op)(result, tmp_values);
116 + }

I think I'd rather just have all operations take a void * that could be the args list, rather than having this code special case. Just some operations would have a NULL args list.

I feel like I'd rather have the args as a void* in the transformation code, and have it be a struct that is defined by the type, rather than a generic linked-list of string values. For now this is fine, but it feels like a better place to end up at. (Maybe you have more of that in the next steps)

...
char string_value[21];

Thinking about this, we should pull the 21 out to be a defined constant with a comment about "max length of a decimal integer".

...

466 + status = init_transformation(&inner);
467 + if (status != U1DB_OK)
468 + goto finish;
469 + status = parse(new_ptr, inner);
470 + if (status != U1DB_OK)
471 + {
472 + destroy_transformation(inner);
473 + goto finish;
474 + }
475 + result->next = inner;

I only see a code path that calls "destroy_transformation" if the status is not U1DB_OK. Maybe I'm missing something because the transformation is a linked list?

...

91 +static void
92 +destroy_transformation(transformation *tr)
93 +{
94 + if (tr->next != NULL)

You don't have a check here:

if (tr == NULL) return;

Because if you go to parse a sequence, and you get an error with the first item, you'll never set a result, I think.
Actually, it looks like that only triggers if you fail to init the transformation in the first place, but it still seems to be a valid concern.

Otherwise, I'm happy enough.

Eric Casteleijn (thisfred) wrote :

> I'm thinking the max() code is defined elsewhere as well, we probably want it
> in a more common header, like maybe u1db_internal.h

I'll have a search. Mine might not be safe to use in all contexts.

> 111 + if (tr->args->head != NULL)
> 112 + {
> 113 + status = ((args_operation)tr->op)(result, tmp_values, tr->args);
> 114 + } else {
> 115 + status = ((operation)tr->op)(result, tmp_values);
> 116 + }
>
>
> I think I'd rather just have all operations take a void * that could be the
> args list, rather than having this code special case. Just some operations
> would have a NULL args list.
>
> I feel like I'd rather have the args as a void* in the transformation code,
> and have it be a struct that is defined by the type, rather than a generic
> linked-list of string values. For now this is fine, but it feels like a better
> place to end up at. (Maybe you have more of that in the next steps)

The arguments are parsed out of the index expression, but unless we start annotating the operations with how many and what types of arguments they expect, or hardcode lots of conditional logic about the different operators in the parser, I don't see how we can do much more than split on commas and let the operations deal with that list of strings. As we accumulate more operations, having this kind of type annotation may become worth the effort, but since we now have exactly one operation with a single extra argument, I don't think it's time to make that call yet. I will change the extra argument to a void which we can pass a NULL to in the usual case, because that is a definite improvement.

> ...
> char string_value[21];
>
> Thinking about this, we should pull the 21 out to be a defined constant with a
> comment about "max length of a decimal integer".

will do

> ...
>
> 466 + status = init_transformation(&inner);
> 467 + if (status != U1DB_OK)
> 468 + goto finish;
> 469 + status = parse(new_ptr, inner);
> 470 + if (status != U1DB_OK)
> 471 + {
> 472 + destroy_transformation(inner);
> 473 + goto finish;
> 474 + }
> 475 + result->next = inner;
>
> I only see a code path that calls "destroy_transformation" if the status is
> not U1DB_OK. Maybe I'm missing something because the transformation is a
> linked list?

I'll add a comment, but yeah: if the creation of the inner transformation succeeds, it will be cleaned up recursively when the outermost one is destroyed.

> ...
>
> 91 +static void
> 92 +destroy_transformation(transformation *tr)
> 93 +{
> 94 + if (tr->next != NULL)
>
> You don't have a check here:
>
> if (tr == NULL) return;
>
> Because if you go to parse a sequence, and you get an error with the first
> item, you'll never set a result, I think.
> Actually, it looks like that only triggers if you fail to init the
> transformation in the first place, but it still seems to be a valid concern.

Will fix.

lp:~thisfred/u1db/number-mapping updated on 2012-05-02
271. By Eric Casteleijn on 2012-05-02

simplification of operations

272. By Eric Casteleijn on 2012-05-02

added comment and fixed destroy transformation

Eric Casteleijn (thisfred) wrote :

> ... I will change the extra argument
> to a void which we can pass a NULL to in the usual case, because that is a
> definite improvement.

Actually, if it's always a string_list or null, we might as well keep the type. I did collapse the three different operation types back into one though, since that made no sense.

John A Meinel (jameinel) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'include/u1db/u1db.h'
2--- include/u1db/u1db.h 2012-04-30 11:19:18 +0000
3+++ include/u1db/u1db.h 2012-05-02 14:10:25 +0000
4@@ -58,6 +58,11 @@
5 #define U1DB_INVALID_VALUE_FOR_INDEX -9
6 #define U1DB_INVALID_HTTP_RESPONSE -10
7 #define U1DB_BROKEN_SYNC_STREAM -11
8+#define U1DB_INVALID_TRANSFORMATION_FUNCTION -12
9+#define U1DB_UNKNOWN_OPERATION -13
10+#define U1DB_UNHANDLED_CHARACTERS -14
11+#define U1DB_MISSING_FIELD_SPECIFIER -15
12+#define U1DB_INVALID_FIELD_SPECIFIER -16
13 #define U1DB_INTERNAL_ERROR -999
14
15 // Used by put_doc_if_newer
16
17=== modified file 'setup.py'
18--- setup.py 2012-04-17 14:51:19 +0000
19+++ setup.py 2012-05-02 14:10:25 +0000
20@@ -35,7 +35,7 @@
21 "package_data": {'': ["*.sql"]},
22 "scripts": ['u1db-client', 'u1db-serve'],
23 "ext_modules": ext,
24- "install_requires": ["paste", "simplejson", "routes"],
25+ "install_requires": ["paste", "simplejson", "routes", "pyxdg"],
26 # informational
27 "tests_require": ["testtools", "testscenarios", "Cython"],
28 "classifiers": [
29
30=== modified file 'src/u1db_query.c'
31--- src/u1db_query.c 2012-04-30 11:17:22 +0000
32+++ src/u1db_query.c 2012-05-02 14:10:25 +0000
33@@ -23,8 +23,11 @@
34 #include <ctype.h>
35 #include <json/json.h>
36
37-#define OPERATIONS 2
38-static const char *OPERATORS[OPERATIONS] = {"lower", "split_words"};
39+#define OPERATIONS 3
40+#ifndef max
41+ #define max(a, b) (((a) > (b)) ? (a) : (b))
42+#endif
43+static const char *OPERATORS[OPERATIONS] = {"lower", "number", "split_words"};
44
45 typedef struct string_list_item_
46 {
47@@ -38,11 +41,24 @@
48 string_list_item *tail;
49 } string_list;
50
51-typedef int(*operation)(string_list *, const string_list *);
52-static int op_lower(string_list *result, const string_list *value);
53-static int op_split_words(string_list *result, const string_list *value);
54-
55-static operation operations[OPERATIONS] = {op_lower, op_split_words};
56+typedef struct transformation_
57+{
58+ void *op;
59+ struct transformation_ *next;
60+ string_list *args;
61+} transformation;
62+
63+typedef int(*operation)(string_list *, const string_list *,
64+ const string_list *);
65+static int op_lower(string_list *result, const string_list *value,
66+ const string_list *args);
67+static int op_number(string_list *result, const string_list *value,
68+ const string_list *args);
69+static int op_split_words(string_list *result, const string_list *value,
70+ const string_list *args);
71+
72+static operation operations[OPERATIONS] = {
73+ op_lower, op_number, op_split_words};
74
75
76 static int
77@@ -97,6 +113,121 @@
78 }
79
80 static int
81+init_transformation(transformation **tr)
82+{
83+ int status = U1DB_OK;
84+ if ((*tr = (transformation *)malloc(sizeof(transformation))) == NULL)
85+ return U1DB_NOMEM;
86+ (*tr)->op = NULL;
87+ (*tr)->next = NULL;
88+ status = init_list(&((*tr)->args));
89+ if (status != U1DB_OK)
90+ return status;
91+ return status;
92+}
93+
94+static void
95+destroy_transformation(transformation *tr)
96+{
97+ if (tr == NULL)
98+ return;
99+ if (tr->next != NULL)
100+ destroy_transformation(tr->next);
101+ destroy_list(tr->args);
102+ free(tr);
103+}
104+
105+static int
106+extract_field_values(string_list *values, json_object *obj,
107+ const string_list *field_path)
108+{
109+ string_list_item *item = NULL;
110+ char string_value[21];
111+ struct array_list *list_val = NULL;
112+ json_object *val = NULL;
113+ int i, integer_value;
114+ int status = U1DB_OK;
115+ val = obj;
116+ if (val == NULL)
117+ goto finish;
118+ for (item = field_path->head; item != NULL; item = item->next)
119+ {
120+ val = json_object_object_get(val, item->data);
121+ if (val == NULL)
122+ goto finish;
123+ }
124+ if (json_object_is_type(val, json_type_string)) {
125+ if ((status = append(values, json_object_get_string(val))) != U1DB_OK)
126+ goto finish;
127+ } else if (json_object_is_type(val, json_type_int)) {
128+ integer_value = json_object_get_int(val);
129+ snprintf(string_value, 21, "%d", integer_value);
130+ if (status != U1DB_OK)
131+ goto finish;
132+ if ((status = append(values, string_value)) != U1DB_OK)
133+ goto finish;
134+ } else if (json_object_is_type(val, json_type_array)) {
135+ list_val = json_object_get_array(val);
136+ for (i = 0; i < list_val->length; i++)
137+ {
138+ if ((status = append(values, json_object_get_string(
139+ array_list_get_idx(
140+ list_val, i)))) != U1DB_OK)
141+ goto finish;
142+ }
143+ }
144+finish:
145+ return status;
146+}
147+
148+
149+static int
150+apply_transformation(transformation *tr, json_object *obj, string_list *result)
151+{
152+ int status = U1DB_OK;
153+ string_list *tmp_values = NULL;
154+ if (tr->next != NULL)
155+ {
156+ init_list(&tmp_values);
157+ status = apply_transformation(tr->next, obj, tmp_values);
158+ if (status != U1DB_OK)
159+ goto finish;
160+ if (tr->args->head != NULL)
161+ {
162+ status = ((operation)tr->op)(result, tmp_values, tr->args);
163+ } else {
164+ status = ((operation)tr->op)(result, tmp_values, NULL);
165+ }
166+ } else {
167+ status = extract_field_values(result, obj, tr->args);
168+ goto finish;
169+ }
170+finish:
171+ destroy_list(tmp_values);
172+ return status;
173+}
174+
175+static int
176+split(string_list *result, char *string, char splitter)
177+{
178+ int status = U1DB_OK;
179+ char *result_ptr, *split_point;
180+ result_ptr = string;
181+ while (result_ptr != NULL) {
182+ split_point = strchr(result_ptr, splitter);
183+ if (split_point != NULL) {
184+ *split_point = '\0';
185+ split_point++;
186+ }
187+ status = append(result, result_ptr);
188+ if (status != U1DB_OK)
189+ return status;
190+ result_ptr = split_point;
191+ }
192+ return status;
193+}
194+
195+static int
196 list_index(string_list *list, char *data)
197 {
198 int i = 0;
199@@ -113,20 +244,22 @@
200 }
201
202 static int
203-list_copy(string_list *copy, const string_list *original)
204+is_word_char(char c)
205 {
206- string_list_item *item = NULL;
207- int status = U1DB_OK;
208- for (item = original->head; item != NULL; item = item->next)
209- if ((status = append(copy, item->data)) != U1DB_OK)
210- {
211- return status;
212- }
213- return status;
214+ if (isalnum(c))
215+ {
216+ return 0;
217+ }
218+ if (c == '.')
219+ return 0;
220+ if (c == '_')
221+ return 0;
222+ return -1;
223 }
224
225 static int
226-op_lower(string_list *result, const string_list *values)
227+op_lower(string_list *result, const string_list *values,
228+ const string_list *args)
229 {
230 string_list_item *item = NULL;
231 char *new_value, *value = NULL;
232@@ -159,7 +292,63 @@
233 }
234
235 static int
236-op_split_words(string_list *result, const string_list *values)
237+op_number(string_list *result, const string_list *values,
238+ const string_list *args)
239+{
240+ string_list_item *item = NULL;
241+ char *p, *new_value, *value, *number = NULL;
242+ int count, zeroes, value_size, isnumber;
243+ int status = U1DB_OK;
244+
245+ number = args->head->data;
246+ for (p = number; *p; p++) {
247+ if (isdigit(*p) == 0) {
248+ status = U1DB_INVALID_VALUE_FOR_INDEX;
249+ goto finish;
250+ }
251+ }
252+ zeroes = atoi(number);
253+
254+ for (item = values->head; item != NULL; item = item->next)
255+ {
256+ value = item->data;
257+ isnumber = 1;
258+ for (p = value; *p; p++) {
259+ if (isdigit(*p) == 0) {
260+ isnumber = 0;
261+ break;
262+ }
263+ }
264+ if (isnumber == 0) {
265+ continue;
266+ }
267+ value_size = max(strlen(value), zeroes) + 1;
268+ new_value = (char *)calloc(value_size, 1);
269+ if (new_value == NULL)
270+ {
271+ status = U1DB_NOMEM;
272+ goto finish;
273+ }
274+ count = snprintf(new_value, value_size, "%0*d", zeroes, atoi(value));
275+ if (count != (value_size - 1)) {
276+ // Most likely encoding issues.
277+ status = U1DB_INVALID_PARAMETER;
278+ goto finish;
279+ }
280+ if ((status = append(result, new_value)) != U1DB_OK)
281+ {
282+ free(new_value);
283+ goto finish;
284+ }
285+ free(new_value);
286+ }
287+finish:
288+ return status;
289+}
290+
291+static int
292+op_split_words(string_list *result, const string_list *values,
293+ const string_list *args)
294 {
295 string_list_item *item = NULL;
296 char *intermediate, *intermediate_ptr = NULL;
297@@ -560,128 +749,124 @@
298 }
299
300 static int
301-extract_field_values(const char *expression, string_list *values,
302- json_object *obj)
303-{
304- char *lparen, *rparen, *sub = NULL;
305- char *result, *result_ptr, *dot_chr = NULL;
306- struct array_list *list_val = NULL;
307- json_object *val = NULL;
308- int path_size, i;
309- int status = U1DB_OK;
310- lparen = strchr(expression, '(');
311- if (lparen == NULL)
312- {
313- result = strdup(expression);
314- result_ptr = result;
315- val = obj;
316- while (result_ptr != NULL && val != NULL) {
317- dot_chr = strchr(result_ptr, '.');
318- if (dot_chr != NULL) {
319- *dot_chr = '\0';
320- dot_chr++;
321- }
322- // TODO: json_object uses ref-counting. Do we need to be
323- // json_object_put to the previous val so it gets cleaned up
324- // appropriately?
325- val = json_object_object_get(val, result_ptr);
326- result_ptr = dot_chr;
327- }
328- if (result != NULL)
329- {
330- free(result);
331- }
332- if (val == NULL)
333- {
334- return U1DB_OK;
335- }
336- if (json_object_is_type(val, json_type_string)) {
337- if ((status = append(values, json_object_get_string(val))) != U1DB_OK)
338- {
339- return status;
340- }
341- } else if (json_object_is_type(val, json_type_array)) {
342- list_val = json_object_get_array(val);
343- for (i = 0; i < list_val->length; i++)
344- {
345- if ((status = append(values, json_object_get_string(
346- array_list_get_idx(
347- list_val, i)))) != U1DB_OK)
348- {
349- return status;
350- }
351- }
352- } else {
353- }
354- // TODO: this segfaults unreliably:
355- // json_object_put(val);
356- return U1DB_OK;
357- }
358- rparen = strrchr(expression, ')');
359- if (rparen == NULL)
360- {
361- return U1DB_INVALID_VALUE_FOR_INDEX;
362- }
363- path_size = ((rparen - 1) - (lparen + 1)) + 1;
364- sub = (char *)calloc(path_size + 1, 1);
365- if (sub != NULL)
366- {
367- strncpy(sub, lparen + 1, path_size);
368- sub[path_size] = '\0';
369- status = extract_field_values(sub, values, obj);
370- free(sub);
371- }
372- return status;
373-}
374-
375-static int
376-apply_operations(const char *expression, string_list *result,
377- const string_list *values)
378-{
379- operation op = NULL;
380- char *lparen, *op_name = NULL;
381- int i, op_size;
382- int status = U1DB_OK;
383- string_list *tmp_values = NULL;
384- lparen = strchr(expression, '(');
385- if (lparen == NULL)
386- {
387- if ((status = list_copy(result, values)) != U1DB_OK)
388- return status;
389- return U1DB_OK;
390- }
391- op_size = ((lparen - 1) - expression) + 1;
392- op_name = (char *)calloc(op_size + 1, 1);
393- if (op_name != NULL)
394- {
395- strncpy(op_name, expression, op_size);
396- op_name[op_size] = '\0';
397+parse(const char *field, transformation *result)
398+{
399+ transformation *inner = NULL;
400+ char *new_field, *new_ptr, *argptr, *argend, *word, *first_comma = NULL;
401+ int status = U1DB_OK;
402+ int i, size;
403+ char *field_copy, *end = NULL;
404+ field_copy = strdup(field);
405+ end = field_copy;
406+ while (is_word_char(*end) == 0)
407+ {
408+ end++;
409+ }
410+ if (*end == '\0')
411+ {
412+ word = strdup(field_copy);
413+ if (word == NULL)
414+ {
415+ status = U1DB_NOMEM;
416+ goto finish;
417+ }
418+ }
419+ else {
420+ // TODO: unicode fieldnames ?
421+ size = (end - field_copy);
422+ word = (char *)calloc(size + 1, 1);
423+ strncpy(word, field_copy, size);
424+ }
425+ new_field = strdup(end);
426+ new_ptr = new_field;
427+ if (status != U1DB_OK)
428+ goto finish;
429+ if (*new_ptr == '(')
430+ {
431+ if (new_field[strlen(new_field) - 1] != ')')
432+ {
433+ status = U1DB_INVALID_TRANSFORMATION_FUNCTION;
434+ goto finish;
435+ }
436+ // step into parens
437+ new_ptr++;
438+ new_field[strlen(new_field) - 1] = '\0';
439 for (i = 0; i < OPERATIONS; i++)
440 {
441- if (strcmp(OPERATORS[i], op_name) == 0)
442+ if (strcmp(OPERATORS[i], word) == 0)
443 {
444- op = operations[i];
445+ result->op = operations[i];
446 break;
447 }
448 }
449- if (op == NULL)
450- {
451- status = U1DB_INVALID_VALUE_FOR_INDEX;
452- goto finish;
453- }
454- if ((status = init_list(&tmp_values)) != U1DB_OK)
455- goto finish;
456- if ((status = apply_operations(lparen + 1, tmp_values, values)) !=
457- U1DB_OK)
458- goto finish;
459- op(result, tmp_values);
460+ if (result == NULL)
461+ {
462+ status = U1DB_UNKNOWN_OPERATION;
463+ goto finish;
464+ }
465+ first_comma = strchr(new_field, ',');
466+ if (first_comma != NULL)
467+ {
468+ argptr = first_comma + 1;
469+ argend = argptr;
470+ *first_comma = '\0';
471+ while (*argend != '\0')
472+ {
473+ if (*argend == ',')
474+ {
475+ *argend = '\0';
476+ while (*argptr == ' ')
477+ argptr++;
478+ status = append(result->args, argptr);
479+ if (status != U1DB_OK)
480+ goto finish;
481+ argptr = argend + 1;
482+ }
483+ argend++;
484+ }
485+ if ((argend - argptr) > 0)
486+ {
487+ while (*argptr == ' ')
488+ argptr++;
489+ status = append(result->args, argptr);
490+ if (status != U1DB_OK)
491+ goto finish;
492+ }
493+ }
494+ status = init_transformation(&inner);
495+ if (status != U1DB_OK)
496+ goto finish;
497+ status = parse(new_ptr, inner);
498+ if (status != U1DB_OK)
499+ {
500+ // free the memory if the parsing fails. Otherwise, inner will be
501+ // cleaned up when the outer transformation (result) is destroyed.
502+ destroy_transformation(inner);
503+ goto finish;
504+ }
505+ result->next = inner;
506+ } else {
507+ if (*new_ptr != '\0')
508+ {
509+ status = U1DB_UNHANDLED_CHARACTERS;
510+ goto finish;
511+ }
512+ if (strlen(word) == 0)
513+ {
514+ status = U1DB_MISSING_FIELD_SPECIFIER;
515+ goto finish;
516+ }
517+ if (word[strlen(word) - 1] == '.')
518+ {
519+ status = U1DB_INVALID_FIELD_SPECIFIER;
520+ goto finish;
521+ }
522+ status = split(result->args, word, '.');
523 }
524 finish:
525- destroy_list(tmp_values);
526- if (op_name != NULL)
527- {
528- free(op_name);
529- }
530+ free(word);
531+ free(field_copy);
532+ free(new_field);
533 return status;
534 }
535
536@@ -689,7 +874,8 @@
537 evaluate_index_and_insert_into_db(void *context, const char *expression)
538 {
539 struct evaluate_index_context *ctx;
540- string_list *tmp_values, *values = NULL;
541+ transformation *tr = NULL;
542+ string_list *values = NULL;
543 string_list_item *item = NULL;
544 int status = U1DB_OK;
545
546@@ -697,19 +883,15 @@
547 if (ctx->obj == NULL || !json_object_is_type(ctx->obj, json_type_object)) {
548 return U1DB_INVALID_JSON;
549 }
550- if ((status = init_list(&tmp_values)) != U1DB_OK)
551- goto finish;
552- if ((status = extract_field_values(expression, tmp_values, ctx->obj)) !=
553- U1DB_OK)
554- {
555- goto finish;
556- }
557+ status = init_transformation(&tr);
558+ if (status != U1DB_OK)
559+ goto finish;
560+ status = parse(expression, tr);
561+ if (status != U1DB_OK)
562+ goto finish;
563 if ((status = init_list(&values)) != U1DB_OK)
564 goto finish;
565- if ((status = apply_operations(expression, values, tmp_values)) != U1DB_OK)
566- {
567- goto finish;
568- }
569+ status = apply_transformation(tr, ctx->obj, values);
570 for (item = values->head; item != NULL; item = item->next)
571 {
572 if ((status = add_to_document_fields(ctx->db, ctx->doc_id, expression,
573@@ -717,8 +899,8 @@
574 goto finish;
575 }
576 finish:
577- destroy_list(tmp_values);
578 destroy_list(values);
579+ destroy_transformation(tr);
580 return status;
581 }
582
583
584=== modified file 'u1db/query_parser.py'
585--- u1db/query_parser.py 2011-12-21 19:33:35 +0000
586+++ u1db/query_parser.py 2012-05-02 14:10:25 +0000
587@@ -141,6 +141,30 @@
588 return [val.lower() for val in values if self._can_transform(val)]
589
590
591+class Number(Transformation):
592+ """Convert an integer to a zero padded string.
593+
594+ This transformation will return None for non-integer inputs. However, it
595+ will transform any integers in a list, dropping any elements that are not
596+ integers.
597+ """
598+
599+ name = 'number'
600+
601+ def __init__(self, inner, number):
602+ super(Number, self).__init__(inner)
603+ self.padding = "%%0%sd" % number
604+
605+ def _can_transform(self, val):
606+ return not isinstance(val, (str, bool, float, list, dict))
607+
608+ def transform(self, values):
609+ """Transform any integers in values into zero padded strings."""
610+ if not values:
611+ return []
612+ return [self.padding % (v,) for v in values if self._can_transform(v)]
613+
614+
615 class SplitWords(Transformation):
616 """Split a string on whitespace.
617
618@@ -187,7 +211,6 @@
619 _word_chars = string.lowercase + string.uppercase + "._" + string.digits
620
621 def _take_word(self, partial):
622- word = ''
623 for idx, char in enumerate(partial):
624 if char not in self._word_chars:
625 return partial[:idx], partial[idx:]
626@@ -208,8 +231,19 @@
627 if op is None:
628 raise errors.IndexDefinitionParseError(
629 "Unknown operation: %s" % word)
630- inner = self._inner_parse(field[1:-1])
631- return op(inner)
632+ if ',' in field:
633+ # XXX: The arguments should probably be cast to whatever types
634+ # they represent, but short of evaling them, I don't see an
635+ # easy way to do that without adding a lot of complexity.
636+ # Since there is only one operation with an extra argument, I'm
637+ # punting on this until we grow some more.
638+ args = [a.strip() for a in field[1:-1].split(',')]
639+ extracted = args[0]
640+ else:
641+ args = []
642+ extracted = field[1:-1]
643+ inner = self._inner_parse(extracted)
644+ return op(inner, *args[1:])
645 else:
646 if len(field) != 0:
647 raise errors.IndexDefinitionParseError(
648@@ -235,4 +269,5 @@
649
650 Parser.register_transormation(SplitWords)
651 Parser.register_transormation(Lower)
652+Parser.register_transormation(Number)
653 Parser.register_transormation(IsNull)
654
655=== modified file 'u1db/tests/test_backends.py'
656--- u1db/tests/test_backends.py 2012-04-30 11:19:18 +0000
657+++ u1db/tests/test_backends.py 2012-05-02 14:10:25 +0000
658@@ -840,7 +840,7 @@
659 def test_index_lower_doesnt_match_different_case(self):
660 self.db.create_index("index", ["lower(name)"])
661 content = '{"name": "Foo"}'
662- doc = self.db.create_doc(content)
663+ self.db.create_doc(content)
664 rows = self.db.get_from_index("index", [("Foo", )])
665 self.assertEqual([], rows)
666
667@@ -848,7 +848,7 @@
668 self.db.create_index("index", ["lower(name)"])
669 self.db.create_index("other_index", ["name"])
670 content = '{"name": "Foo"}'
671- doc = self.db.create_doc(content)
672+ self.db.create_doc(content)
673 rows = self.db.get_from_index("index", [("Foo", )])
674 self.assertEqual(0, len(rows))
675
676@@ -894,6 +894,20 @@
677 rows = self.db.get_from_index("index", [("bar", )])
678 self.assertEqual([doc], rows)
679
680+ def test_get_from_index_with_number(self):
681+ self.db.create_index("index", ["number(foo, 5)"])
682+ content = '{"foo": 12}'
683+ doc = self.db.create_doc(content)
684+ rows = self.db.get_from_index("index", [("00012", )])
685+ self.assertEqual([doc], rows)
686+
687+ def test_get_from_index_with_number_bigger_than_padding(self):
688+ self.db.create_index("index", ["number(foo, 5)"])
689+ content = '{"foo": 123456}'
690+ doc = self.db.create_doc(content)
691+ rows = self.db.get_from_index("index", [("123456", )])
692+ self.assertEqual([doc], rows)
693+
694 def test_get_index_keys_from_index(self):
695 self.db.create_index('test-idx', ['key'])
696 content1 = '{"key": "value1"}'
697
698=== modified file 'u1db/tests/test_query_parser.py'
699--- u1db/tests/test_query_parser.py 2011-12-21 19:33:35 +0000
700+++ u1db/tests/test_query_parser.py 2012-05-02 14:10:25 +0000
701@@ -184,6 +184,63 @@
702 ['foo baz', {'baa': 'xam'}, 'bar sux'])
703
704
705+class TestNumber(tests.TestCase):
706+
707+ def assertNumber(self, expected, value, padding=5):
708+ """Assert number transformation produced expected values."""
709+ getter = query_parser.Number(query_parser.StaticGetter(value), padding)
710+ self.assertEqual(expected, getter.get(trivial_raw_doc))
711+
712+ def test_inner_returns_None(self):
713+ """None is thrown away."""
714+ self.assertNumber([], None)
715+
716+ def test_inner_returns_int(self):
717+ """A single integer is converted to zero padded strings."""
718+ self.assertNumber(['00009'], 9)
719+
720+ def test_inner_returns_list(self):
721+ """Integers are converted to zero padded strings."""
722+ self.assertNumber(['00009', '00235'], [9, 235])
723+
724+ def test_inner_returns_string(self):
725+ """A string is thrown away."""
726+ self.assertNumber([], 'foo bar')
727+
728+ def test_inner_returns_float(self):
729+ """A float is thrown away."""
730+ self.assertNumber([], 9.2)
731+
732+ def test_inner_returns_bool(self):
733+ """A boolean is thrown away."""
734+ self.assertNumber([], True)
735+
736+ def test_inner_returns_list_containing_strings(self):
737+ """Strings in a list are thrown away."""
738+ self.assertNumber(['00009'], ['foo baz', 9, 'bar sux'])
739+
740+ def test_inner_returns_list_containing_float(self):
741+ """Floats in a list are thrown away."""
742+ self.assertNumber(
743+ ['00083', '00073'], [83, 9.2, 73])
744+
745+ def test_inner_returns_list_containing_bool(self):
746+ """Booleans in a list are thrown away."""
747+ self.assertNumber(
748+ ['00083', '00073'], [83, True, 73])
749+
750+ def test_inner_returns_list_containing_list(self):
751+ """Lists in a list are thrown away."""
752+ # TODO: Expand sub-lists?
753+ self.assertNumber(
754+ ['00012', '03333'], [12, [29], 3333])
755+
756+ def test_inner_returns_list_containing_dict(self):
757+ """Dicts in a list are thrown away."""
758+ self.assertNumber(
759+ ['00012', '00001'], [12, {54: 89}, 1])
760+
761+
762 class TestIsNull(tests.TestCase):
763
764 def assertIsNull(self, value):

Subscribers

People subscribed via source and target branches