Merge lp:~thisfred/u1db/number-mapping into lp:u1db
| Status: | Merged | ||||
|---|---|---|---|---|---|
| Merged at revision: | 262 | ||||
| Proposed branch: | lp:~thisfred/u1db/number-mapping | ||||
| Merge into: | lp:u1db | ||||
| Diff against target: |
764 lines (+444/-151) 6 files modified
include/u1db/u1db.h (+5/-0) setup.py (+1/-1) src/u1db_query.c (+327/-145) u1db/query_parser.py (+38/-3) u1db/tests/test_backends.py (+16/-2) u1db/tests/test_query_parser.py (+57/-0) |
||||
| To merge this branch: | bzr merge lp:~thisfred/u1db/number-mapping | ||||
| Related bugs: |
|
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| John A Meinel (community) | 2012-04-27 | Approve on 2012-05-02 | |
|
Review via email:
|
|||
Commit Message
This adds indexing of numeric values as zero padded strings, with the number(field, no_of_zeroes) mapping.
Description of the Change
This adds indexing of numeric values as zero padded strings, with the number(field, no_of_zeroes) mapping.
| John A Meinel (jameinel) wrote : | # |
| Eric Casteleijn (thisfred) wrote : | # |
I addressed all of your comments, so please have another look, to see if you agree with the direction I'm taking if nothing else.
Known issues:
- there is some memory leaking, but make valgrind-leaks does not fail any tests... (I will chase this down tomorrow)
- the parsing of the field_mappings still happens once for every document. This is now no longer necessary, but lifting the transformations out of evaluate_
- 267. By Eric Casteleijn on 2012-05-01
-
added pyxdg
- 268. By Eric Casteleijn on 2012-05-01
-
merged trunk and fixed conflict
- 269. By Eric Casteleijn on 2012-05-01
-
added test
- 270. By Eric Casteleijn on 2012-05-01
-
fixed memory leak
| Eric Casteleijn (thisfred) wrote : | # |
Memory issue found and fixed.
| John A Meinel (jameinel) wrote : | # |
I'm thinking the max() code is defined elsewhere as well, we probably want it in a more common header, like maybe u1db_internal.h
111 + if (tr->args->head != NULL)
112 + {
113 + status = ((args_
114 + } else {
115 + status = ((operation)
116 + }
I think I'd rather just have all operations take a void * that could be the args list, rather than having this code special case. Just some operations would have a NULL args list.
I feel like I'd rather have the args as a void* in the transformation code, and have it be a struct that is defined by the type, rather than a generic linked-list of string values. For now this is fine, but it feels like a better place to end up at. (Maybe you have more of that in the next steps)
...
char string_value[21];
Thinking about this, we should pull the 21 out to be a defined constant with a comment about "max length of a decimal integer".
...
466 + status = init_transforma
467 + if (status != U1DB_OK)
468 + goto finish;
469 + status = parse(new_ptr, inner);
470 + if (status != U1DB_OK)
471 + {
472 + destroy_
473 + goto finish;
474 + }
475 + result->next = inner;
I only see a code path that calls "destroy_
...
91 +static void
92 +destroy_
93 +{
94 + if (tr->next != NULL)
You don't have a check here:
if (tr == NULL) return;
Because if you go to parse a sequence, and you get an error with the first item, you'll never set a result, I think.
Actually, it looks like that only triggers if you fail to init the transformation in the first place, but it still seems to be a valid concern.
Otherwise, I'm happy enough.
| Eric Casteleijn (thisfred) wrote : | # |
> I'm thinking the max() code is defined elsewhere as well, we probably want it
> in a more common header, like maybe u1db_internal.h
I'll have a search. Mine might not be safe to use in all contexts.
> 111 + if (tr->args->head != NULL)
> 112 + {
> 113 + status = ((args_
> 114 + } else {
> 115 + status = ((operation)
> 116 + }
>
>
> I think I'd rather just have all operations take a void * that could be the
> args list, rather than having this code special case. Just some operations
> would have a NULL args list.
>
> I feel like I'd rather have the args as a void* in the transformation code,
> and have it be a struct that is defined by the type, rather than a generic
> linked-list of string values. For now this is fine, but it feels like a better
> place to end up at. (Maybe you have more of that in the next steps)
The arguments are parsed out of the index expression, but unless we start annotating the operations with how many and what types of arguments they expect, or hardcode lots of conditional logic about the different operators in the parser, I don't see how we can do much more than split on commas and let the operations deal with that list of strings. As we accumulate more operations, having this kind of type annotation may become worth the effort, but since we now have exactly one operation with a single extra argument, I don't think it's time to make that call yet. I will change the extra argument to a void which we can pass a NULL to in the usual case, because that is a definite improvement.
> ...
> char string_value[21];
>
> Thinking about this, we should pull the 21 out to be a defined constant with a
> comment about "max length of a decimal integer".
will do
> ...
>
> 466 + status = init_transforma
> 467 + if (status != U1DB_OK)
> 468 + goto finish;
> 469 + status = parse(new_ptr, inner);
> 470 + if (status != U1DB_OK)
> 471 + {
> 472 + destroy_
> 473 + goto finish;
> 474 + }
> 475 + result->next = inner;
>
> I only see a code path that calls "destroy_
> not U1DB_OK. Maybe I'm missing something because the transformation is a
> linked list?
I'll add a comment, but yeah: if the creation of the inner transformation succeeds, it will be cleaned up recursively when the outermost one is destroyed.
> ...
>
> 91 +static void
> 92 +destroy_
> 93 +{
> 94 + if (tr->next != NULL)
>
> You don't have a check here:
>
> if (tr == NULL) return;
>
> Because if you go to parse a sequence, and you get an error with the first
> item, you'll never set a result, I think.
> Actually, it looks like that only triggers if you fail to init the
> transformation in the first place, but it still seems to be a valid concern.
Will fix.
- 271. By Eric Casteleijn on 2012-05-02
-
simplification of operations
- 272. By Eric Casteleijn on 2012-05-02
-
added comment and fixed destroy transformation
| Eric Casteleijn (thisfred) wrote : | # |
> ... I will change the extra argument
> to a void which we can pass a NULL to in the usual case, because that is a
> definite improvement.
Actually, if it's always a string_list or null, we might as well keep the type. I did collapse the three different operation types back into one though, since that made no sense.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 4/27/2012 2:32 PM, Eric Casteleijn wrote: /bugs.launchpad .net/u1db/ +bug/987412 /code.launchpad .net/~thisfred/ u1db/number- mapping/ +merge/ 103862
> Eric Casteleijn has proposed merging
> lp:~thisfred/u1db/number-mapping into lp:u1db.
>
> Requested reviews: John A Meinel (jameinel) Related bugs: Bug
> #987412 in U1DB: "support indexes on non-string types"
> https:/
>
> For more details, see:
> https:/
>
> This adds indexing of numeric values as zero padded strings, with
> the number(field, no_of_zeroes) mapping.
I think this hints that operations should really be more of a
class-like structure. Rather than a function that you pass varargs,
you really want a bit of context along with your function.
I also don't fully understand why number has to be passed in as a
string for every application of 'op_number'. As that means for 50k
documents you call atoi on that number for each doc. It isn't
terrible, but it is surprising.
At one point, I thought you were looking to change the code so we
didn't have to parse the description every time. Did you get anywhere
with that?
+ value = item->data; value_size, 1);
+ for (p = value; *p; p++) {
+ if (isdigit(*p) == 0) {
+ continue;
+ }
+ }
+ value_size = strlen(value);
+ if (zeroes > value_size)
+ value_size = zeroes;
+ value_size++;
+ new_value = (char *)calloc(
+ if (new_value == NULL)
+ {
+ status = U1DB_NOMEM;
+ goto finish;
+ }
+ count = snprintf(new_value, value_size, "%0*d", zeroes,
atoi(value));
I thought originally you had the wrong buffer size, but I missed the
if(zeros > value_size) line. Would it be clearer as:
value_size = max(strlen(value), zeros) + 1;
We haven't clearly specified what should happen if you use number(x,
5) but the number in a document is >=100,000. And I don't see a test
case for a number too big for the width.
...
trivial_raw_doc = {}
+PADDING = 5
I don't really prefer global constants like this. I would probably do
it as:
+ def assertNumber(self, expected, value, padding=5):
That will also let you easily test a number that is way to big to fit,
and a padding size that is way to big to put all the digits in.
+ for (p = value; *p; p++) {
+ if (isdigit(*p) == 0) {
+ continue;
+ }
+ }
I don't see the point of that loop. You don't do anything with 'p'
after the loop.
Maybe that is the 'p' that you check earlier? And you forgot and left
this loop in?
...
+static char *
+itoa(int integer_value)
+{
Because the maximum size of an integer is known in advance
(len(str(2**64)) == 20), I think we would be better off avoiding
counting the size of the integer and doing a calloc + free, and just
having a buffer of size 21, that gets allocated on the stack and
passed into something like 'snprintf(buf, 21, "%d", val)'.
So that would make this code:
+ } else if (json_object_ is_type( val, json_type_int)) { get_int( val); value);
+ integer_value = json_object_
+ string_value = itoa(integer_
+ ...