Merge lp:~allenap/storm/value-columns-by-name into lp:storm

Proposed by Gavin Panella
Status: Work in progress
Proposed branch: lp:~allenap/storm/value-columns-by-name
Merge into: lp:storm
Diff against target: 182 lines (+107/-9)
2 files modified
storm/store.py (+40/-8)
tests/store/base.py (+67/-1)
To merge this branch: bzr merge lp:~allenap/storm/value-columns-by-name
Reviewer Review Type Date Requested Status
Gustavo Niemeyer Disapprove
Jamu Kakar (community) Needs Fixing
James Henstridge Needs Fixing
Review via email: mp+23480@code.launchpad.net

Commit message

ResultSet.values() can now accept column names as well as columns themselves.

Description of the change

In an environment like Launchpad where a lot of the code only uses Storm via a Zope prophylactic, and there is also an importfascist that complains bitterly when model code is imported by non-model modules, it would be extremely handy to be able to pass names into ResultSet.values() rather than columns.

I might have done this all the wrong way, but it's a start and I'm happy to learn the right way to get this branch landed.

To post a comment you must log in.
Revision history for this message
Jamu Kakar (jkakar) wrote :

Can you please file a bug and link this branch to it? All Storm
branches should have an associated bug.

Revision history for this message
James Henstridge (jamesh) wrote :

If we want to support this feature, it should probably exclude all tuple finds. For example:

  result = store.find((Company, Employee), Company.id == Employee.company_id)
  ids = result.values('id')

Assuming both tables have an 'id' column, which one am I referring to? It seems better to limit the feature to the simple situations where we can handle it unambiguously.

review: Needs Fixing
Revision history for this message
James Henstridge (jamesh) wrote :

And a follow up: to be consistent with Store.find(), I'd suggest using getattr(find_spec.default_cls, name) to convert names to column objects.

In the tuple find case, default_cls will be None, so you should check for this and raise FeatureError.

Revision history for this message
Jamu Kakar (jkakar) wrote :

[1]

Can you please add docstrings to the tests. Also, the @param for
'columns' in the docstring for ResultSet.values needs to be updated.

[2]

+ def test_find_values_by_column_name(self):

There are several different cases being exercised in this test. I
recommend you break it up into several smaller tests.

[3]

+ # If more than one column in the result set has the same name,
+ # the first will be chosen.
+ result = self.store.find(
+ (Foo.id, FooValue.id), FooValue.foo_id == Foo.id)
+ result = result.config(distinct=True).order_by(Foo.id)
+ values = result.values('id')
+ self.assertEquals(list(values), [10, 20])

I agree with James that this case should raise FeatureError. We
shouldn't be making guesses about which column the user is
specifying if the choice is ambiguous.

[4]

+ def test_find_multiple_values_by_column_name(self):

This test could also be split into two, so that each test exercises
one specific behaviour (multiple columns, and mixed string and Column
values).

review: Needs Fixing
Revision history for this message
Gavin Panella (allenap) wrote :

Getting columns from names is now done by getting the columns from the
find set, using it to construct a dict of name -> [columns] and
mapping with that. When the result of the mapping is a list of length
2 or more then it raises FeatureError. In this way it will work
naturally with tuple finds, and allow values() to (not quite, there's
another issue) yield expression values.

I've also changed it to copy the select before modifying it. I don't
know how necessary this is, but the code before looked a bit suspect
to my untrained eye.

Also, when variable_factory is not found on the column, it defaults to
Variable. This means that it's possible to do things like
results.values(Sum(Table.price)).

Okay, having said all that, if this is too fancy, or liable to cause
issues, I'll go for the approach in get_where_for_args().

However, if this approach is okay, a couple of notes:

- Either or both of the new methods ResultSet._get_column_name_map()
  and ResultSet._map_column_names() could be moved to FindSpec. Would
  that be a more natural home for them?

- Should _map_column_names() be use in order_by() and group_by()?

1=== modified file 'storm/store.py'
2--- storm/store.py 2010-04-15 13:31:08 +0000
3+++ storm/store.py 2010-04-22 09:32:10 +0000
4@@ -540,7 +540,6 @@
5 else:
6
7 cached_primary_vars = obj_info["primary_vars"]
8- primary_key_idx = cls_info.primary_key_idx
9
10 changes = self._get_changes_map(obj_info)
11
12@@ -1238,34 +1237,63 @@
13 """Get the sum of all values in an expression."""
14 return self._aggregate(Sum, expr, expr)
15
16+ def _get_column_name_map(self):
17+ """Return a mapping of column name to lists of columns."""
18+ columns, tables = self._find_spec.get_columns_and_tables()
19+ column_map = {}
20+ for column in columns:
21+ if isinstance(column, (Alias, Column)):
22+ if column.name in column_map:
23+ column_map[column.name].append(column)
24+ else:
25+ column_map[column.name] = [column]
26+ return column_map
27+
28+ def _map_column_names(self, columns):
29+ """Attempt to map column names to actual columns."""
30+ column_map = self._get_column_name_map()
31+ for column in columns:
32+ if column in column_map:
33+ column_name, column_list = column, column_map[column]
34+ if len(column_list) != 1:
35+ raise FeatureError("Ambiguous column: %s" % column_name)
36+ [column] = column_list
37+ if isinstance(column, Alias):
38+ column = column.expr
39+ yield column
40+
41 def values(self, *columns):
42 """Retrieve only the specified columns.
43
44 This does not load full objects from the database into Python.
45
46- @param columns: One or more L{storm.expr.Column} objects whose
47- values will be fetched.
48+ @param columns: One or more L{storm.expr.Column} objects, or
49+ column names, whose values will be fetched. If column
50+ names are given, each must refer unambiguously to a named
51+ column or alias in the result set.
52 @return: An iterator of tuples of the values for each column
53 from each matching row in the database.
54 """
55 if not columns:
56 raise FeatureError("values() takes at least one column "
57 "as argument")
58- select = self._get_select()
59- column_map = dict(
60- (column.name, column)
61- for column in reversed(select.columns)
62- if isinstance(column, Column))
63- columns = [column_map.get(column, column) for column in columns]
64+ columns = list(self._map_column_names(columns))
65+ # replace_columns() can lose ordering so it's not good here.
66+ select = copy(self._get_select())
67 select.columns = columns
68 result = self._store._connection.execute(select)
69- if len(columns) == 1:
70- variable = columns[0].variable_factory()
71+ variable_factories = [
72+ getattr(column, 'variable_factory', Variable)
73+ for column in columns]
74+ variables = [
75+ variable_factory()
76+ for variable_factory in variable_factories]
77+ if len(variables) == 1:
78+ [variable] = variables
79 for values in result:
80 result.set_variable(variable, values[0])
81 yield variable.get()
82 else:
83- variables = [column.variable_factory() for column in columns]
84 for values in result:
85 for variable, value in zip(variables, values):
86 result.set_variable(variable, value)
87@@ -1370,7 +1398,6 @@
88 def get_column(column):
89 return obj_info.variables[column].get()
90 objects = []
91- cls = self._find_spec.default_cls_info.cls
92 for obj_info in self._store._iter_alive():
93 try:
94 if (obj_info.cls_info is self._find_spec.default_cls_info and
95
96=== modified file 'tests/store/base.py'
97--- tests/store/base.py 2010-04-15 13:31:08 +0000
98+++ tests/store/base.py 2010-04-22 10:30:27 +0000
99@@ -1002,25 +1002,51 @@
100 self.assertEquals([type(value) for value in values],
101 [unicode, unicode, unicode])
102
103+ def test_find_values_with_expression(self):
104+ """
105+ An expression can be passed to ResultSet.values().
106+ """
107+ values = self.store.find(Foo.id).values(Sum(Foo.id))
108+ self.assertEquals(list(values), [60])
109+
110 def test_find_values_by_column_name(self):
111+ """
112+ ResultSet.values() can accept column names which are mapped to
113+ columns in the find spec.
114+ """
115 result = self.store.find(Foo).order_by(Foo.id)
116 values = result.values('id')
117 self.assertEquals(list(values), [10, 20, 30])
118 values = result.values('title')
119 self.assertEquals(list(values), ["Title 30", "Title 20", "Title 10"])
120- # If more than one column in the result set has the same name,
121- # the first will be chosen.
122+
123+ def test_find_values_by_alias_name(self):
124+ """
125+ Alias names can also be passed to ResultSet.values().
126+ """
127+ result = self.store.find(Alias(Foo.id, 'foo')).order_by(Foo.id)
128+ values = result.values('foo')
129+ self.assertEquals(list(values), [10, 20, 30])
130+
131+ def test_find_values_by_alias_name_to_expression(self):
132+ """
133+ Alias names can be passed to ResultSet.values(), even if the
134+ aliased column is actually an expression.
135+ """
136+ result = self.store.find(Alias(Sum(Foo.id), 'foo')).order_by(Foo.id)
137+ values = result.values('foo')
138+ self.assertEquals(list(values), [60])
139+
140+ def test_find_values_by_ambiguous_column_name(self):
141+ """
142+ If more than one column in the find spec has the same name,
143+ FeatureError is raised.
144+ """
145 result = self.store.find(
146 (Foo.id, FooValue.id), FooValue.foo_id == Foo.id)
147 result = result.config(distinct=True).order_by(Foo.id)
148 values = result.values('id')
149- self.assertEquals(list(values), [10, 20])
150- # The name is only matched against columns, not aliases or
151- # other expressions.
152- result = self.store.find(
153- (Alias(SQL('1'), 'id'), Foo.id)).order_by(Foo.id)
154- values = result.values('id')
155- self.assertEquals(list(values), [10, 20, 30])
156+ self.assertRaises(FeatureError, list, values)
157
158 def test_find_multiple_values(self):
159 result = self.store.find(Foo).order_by(Foo.id)
160@@ -1031,12 +1057,22 @@
161 (30, "Title 10")])
162
163 def test_find_multiple_values_by_column_name(self):
164+ """
165+ More than one column name can be given to ResultSet.values();
166+ it will map them all to columns.
167+ """
168 result = self.store.find(Foo).order_by(Foo.id)
169 values = result.values('id', 'title')
170 expected = [(10, "Title 30"), (20, "Title 20"), (30, "Title 10")]
171 self.assertEquals(list(values), expected)
172- # Columns and column names can be mixed.
173+
174+ def test_find_multiple_values_by_column_and_column_name(self):
175+ """
176+ Columns and column names can be mixed.
177+ """
178+ result = self.store.find(Foo).order_by(Foo.id)
179 values = result.values(Foo.id, 'title')
180+ expected = [(10, "Title 30"), (20, "Title 20"), (30, "Title 10")]
181 self.assertEquals(list(values), expected)
182
183 def test_find_values_with_no_arguments(self):
Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Gavin,

I don't quite see the motivation for this feature. Using the actual class attributes is how it works pretty much everywhere else in Storm, and doesn't look like a big burden when compared to the column names.

If you have a huge class name, you can easily do something like this:

    C = MyHugeClassName

And then have exactly the same call length:

    resultset.values(C.id, C.name)

vs.

    resultset.values("id", "name")

With the advantage that the former, in case of errors, will blow up as syntax errors early, rather than SQL exceptions.

The import fascist isn't a great reason to add support like this in Storm either. Having a result set in the code means you have access to the model objects. What would be the reason of preventing access to the object's class in a situation where you have the object and a *result set* (which allows one to change column values without any checks).

Then, there's a significant additional cost being introduced in a place which used to be pretty lightweight.

For these reasons, I don't feel like it's an improvement over how it works today.

review: Disapprove
Revision history for this message
Gavin Panella (allenap) wrote :

> Gavin,
>
> I don't quite see the motivation for this feature. Using the actual class
> attributes is how it works pretty much everywhere else in Storm, and doesn't
> look like a big burden when compared to the column names.

This patch also lets you specify an aliased column, which could be an
expression rather than a plain column on a table. Perhaps there is a
better way of achieving this.

>
> If you have a huge class name, you can easily do something like this:
>
> C = MyHugeClassName
>
> And then have exactly the same call length:
>
> resultset.values(C.id, C.name)
>
> vs.
>
> resultset.values("id", "name")

Brevity wasn't my concern in this patch :)

> With the advantage that the former, in case of errors, will blow up as syntax
> errors early, rather than SQL exceptions.

Actually, somewhat amusingly, you'll get the string for the invalid
column back instead. So:

  list(store.find(Foo.id).values('frooty'))

returns:

  ['frooty', 'frooty', 'frooty']

But that's possible to remedy.

>
> The import fascist isn't a great reason to add support like this in Storm
> either. Having a result set in the code means you have access to the model
> objects. What would be the reason of preventing access to the object's class
> in a situation where you have the object and a *result set* (which allows one
> to change column values without any checks).

In Launchpad result sets are often returned from methods called on a
secured utility, so browser code and script code does not have access
to the model class.

Any code that uses the objects materialized from a result set must
know the name of the attributes its interested in, so it makes sense
that it could ask for the values of those attributes across the rows
defined by the result set.

My use-case is a script with lots of transactions. A result set
defining the interesting rows is obtained early on from a secured
utility and is used in many separate transactions. Currently there can
be as many as ~16000 interesting rows. I want to avoid materializing
these rows into model objects until they're absolutely needed, because
it's slow and the transaction killer is merciless, but sometimes the
code does need one or two attributes from the whole set.

Anyway, that was my reasoning, but, as earlier, there may be a better
way to do it. I could, for example, add more methods on the secured
utility to return different information from the result set for
me. It's not really the way I'd like to structure the code but it only
offends my taste a little bit ;)

>
> Then, there's a significant additional cost being introduced in a place which
> used to be pretty lightweight.

I haven't measured it, but I guess that the cost is still small
compared to compiling the query and doing a round-trip to the
database.

>
> For these reasons, I don't feel like it's an improvement over how it works
> today.

Fair enough, I'm not blocked on this. Thanks for looking at it! I've
learnt a lot about Storm from doing this.

Unmerged revisions

362. By Gavin Panella

Make ResultSet.values() work with expressions.

361. By Gavin Panella

Add docstrings to new private methods.

360. By Gavin Panella

Fix some lint.

359. By Gavin Panella

Make the implementation of _get_column_name_map() more readable and obvious.

358. By Gavin Panella

Update the docstring for ResultSet.values().

357. By Gavin Panella

Raise an error when the column choice is ambiguous, copy the select before mutating it, and break up tests.

356. By Gavin Panella

ResultSet.values() can now accept column names as well as columns themselves.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'storm/store.py'
2--- storm/store.py 2010-02-10 11:29:39 +0000
3+++ storm/store.py 2010-04-22 10:53:24 +0000
4@@ -540,7 +540,6 @@
5 else:
6
7 cached_primary_vars = obj_info["primary_vars"]
8- primary_key_idx = cls_info.primary_key_idx
9
10 changes = self._get_changes_map(obj_info)
11
12@@ -1246,29 +1245,63 @@
13 """Get the sum of all values in an expression."""
14 return self._aggregate(Sum, expr, expr)
15
16+ def _get_column_name_map(self):
17+ """Return a mapping of column name to lists of columns."""
18+ columns, tables = self._find_spec.get_columns_and_tables()
19+ column_map = {}
20+ for column in columns:
21+ if isinstance(column, (Alias, Column)):
22+ if column.name in column_map:
23+ column_map[column.name].append(column)
24+ else:
25+ column_map[column.name] = [column]
26+ return column_map
27+
28+ def _map_column_names(self, columns):
29+ """Attempt to map column names to actual columns."""
30+ column_map = self._get_column_name_map()
31+ for column in columns:
32+ if column in column_map:
33+ column_name, column_list = column, column_map[column]
34+ if len(column_list) != 1:
35+ raise FeatureError("Ambiguous column: %s" % column_name)
36+ [column] = column_list
37+ if isinstance(column, Alias):
38+ column = column.expr
39+ yield column
40+
41 def values(self, *columns):
42 """Retrieve only the specified columns.
43
44 This does not load full objects from the database into Python.
45
46- @param columns: One or more L{storm.expr.Column} objects whose
47- values will be fetched.
48+ @param columns: One or more L{storm.expr.Column} objects, or
49+ column names, whose values will be fetched. If column
50+ names are given, each must refer unambiguously to a named
51+ column or alias in the result set.
52 @return: An iterator of tuples of the values for each column
53 from each matching row in the database.
54 """
55 if not columns:
56 raise FeatureError("values() takes at least one column "
57 "as argument")
58- select = self._get_select()
59+ columns = list(self._map_column_names(columns))
60+ # replace_columns() can lose ordering so it's not good here.
61+ select = copy(self._get_select())
62 select.columns = columns
63 result = self._store._connection.execute(select)
64- if len(columns) == 1:
65- variable = columns[0].variable_factory()
66+ variable_factories = [
67+ getattr(column, 'variable_factory', Variable)
68+ for column in columns]
69+ variables = [
70+ variable_factory()
71+ for variable_factory in variable_factories]
72+ if len(variables) == 1:
73+ [variable] = variables
74 for values in result:
75 result.set_variable(variable, values[0])
76 yield variable.get()
77 else:
78- variables = [column.variable_factory() for column in columns]
79 for values in result:
80 for variable, value in zip(variables, values):
81 result.set_variable(variable, value)
82@@ -1373,7 +1406,6 @@
83 def get_column(column):
84 return obj_info.variables[column].get()
85 objects = []
86- cls = self._find_spec.default_cls_info.cls
87 for obj_info in self._store._iter_alive():
88 try:
89 if (obj_info.cls_info is self._find_spec.default_cls_info and
90
91=== modified file 'tests/store/base.py'
92--- tests/store/base.py 2010-04-16 07:12:13 +0000
93+++ tests/store/base.py 2010-04-22 10:53:24 +0000
94@@ -30,7 +30,8 @@
95 from storm.properties import PropertyPublisherMeta, Decimal
96 from storm.variables import PickleVariable
97 from storm.expr import (
98- Asc, Desc, Select, LeftJoin, SQL, Count, Sum, Avg, And, Or, Eq, Lower)
99+ Alias, And, Asc, Avg, Count, Desc, Eq, LeftJoin, Lower, Or, SQL, Select,
100+ Sum)
101 from storm.variables import Variable, UnicodeVariable, IntVariable
102 from storm.info import get_obj_info, ClassAlias
103 from storm.exceptions import *
104@@ -1001,6 +1002,52 @@
105 self.assertEquals([type(value) for value in values],
106 [unicode, unicode, unicode])
107
108+ def test_find_values_with_expression(self):
109+ """
110+ An expression can be passed to ResultSet.values().
111+ """
112+ values = self.store.find(Foo.id).values(Sum(Foo.id))
113+ self.assertEquals(list(values), [60])
114+
115+ def test_find_values_by_column_name(self):
116+ """
117+ ResultSet.values() can accept column names which are mapped to
118+ columns in the find spec.
119+ """
120+ result = self.store.find(Foo).order_by(Foo.id)
121+ values = result.values('id')
122+ self.assertEquals(list(values), [10, 20, 30])
123+ values = result.values('title')
124+ self.assertEquals(list(values), ["Title 30", "Title 20", "Title 10"])
125+
126+ def test_find_values_by_alias_name(self):
127+ """
128+ Alias names can also be passed to ResultSet.values().
129+ """
130+ result = self.store.find(Alias(Foo.id, 'foo')).order_by(Foo.id)
131+ values = result.values('foo')
132+ self.assertEquals(list(values), [10, 20, 30])
133+
134+ def test_find_values_by_alias_name_to_expression(self):
135+ """
136+ Alias names can be passed to ResultSet.values(), even if the
137+ aliased column is actually an expression.
138+ """
139+ result = self.store.find(Alias(Sum(Foo.id), 'foo')).order_by(Foo.id)
140+ values = result.values('foo')
141+ self.assertEquals(list(values), [60])
142+
143+ def test_find_values_by_ambiguous_column_name(self):
144+ """
145+ If more than one column in the find spec has the same name,
146+ FeatureError is raised.
147+ """
148+ result = self.store.find(
149+ (Foo.id, FooValue.id), FooValue.foo_id == Foo.id)
150+ result = result.config(distinct=True).order_by(Foo.id)
151+ values = result.values('id')
152+ self.assertRaises(FeatureError, list, values)
153+
154 def test_find_multiple_values(self):
155 result = self.store.find(Foo).order_by(Foo.id)
156 values = result.values(Foo.id, Foo.title)
157@@ -1009,6 +1056,25 @@
158 (20, "Title 20"),
159 (30, "Title 10")])
160
161+ def test_find_multiple_values_by_column_name(self):
162+ """
163+ More than one column name can be given to ResultSet.values();
164+ it will map them all to columns.
165+ """
166+ result = self.store.find(Foo).order_by(Foo.id)
167+ values = result.values('id', 'title')
168+ expected = [(10, "Title 30"), (20, "Title 20"), (30, "Title 10")]
169+ self.assertEquals(list(values), expected)
170+
171+ def test_find_multiple_values_by_column_and_column_name(self):
172+ """
173+ Columns and column names can be mixed.
174+ """
175+ result = self.store.find(Foo).order_by(Foo.id)
176+ values = result.values(Foo.id, 'title')
177+ expected = [(10, "Title 30"), (20, "Title 20"), (30, "Title 10")]
178+ self.assertEquals(list(values), expected)
179+
180 def test_find_values_with_no_arguments(self):
181 result = self.store.find(Foo).order_by(Foo.id)
182 self.assertRaises(FeatureError, result.values().next)

Subscribers

People subscribed via source and target branches

to status/vote changes: