Storm

Merge lp:~allenap/storm/value-columns-by-name into lp:storm

value-columns-by-name
Merge into trunk

Proposed by Gavin Panella on 2010-04-15

Status:

Work in progress

Proposed branch:

lp:~allenap/storm/value-columns-by-name

Merge into:

lp:storm

Diff against target:

182 lines (+107/-9)

2 files modified

storm/store.py (+40/-8)
tests/store/base.py (+67/-1)

To merge this branch:

bzr merge lp:~allenap/storm/value-columns-by-name

Medium

Won't Fix

Reviewer	Date Requested	Status
Gustavo Niemeyer		Disapprove on 2010-04-22
Jamu Kakar (community)		Needs Fixing on 2010-04-16
James Henstridge	2010-04-15	Needs Fixing on 2010-04-16
Review via email: mp+23480@code.launchpad.net

Commit message

ResultSet.values() can now accept column names as well as columns themselves.

Description of the change

In an environment like Launchpad where a lot of the code only uses Storm via a Zope prophylactic, and there is also an importfascist that complains bitterly when model code is imported by non-model modules, it would be extremely handy to be able to pass names into ResultSet.values() rather than columns.

I might have done this all the wrong way, but it's a start and I'm happy to learn the right way to get this branch landed.

Jamu Kakar (jkakar) wrote on 2010-04-15:

Can you please file a bug and link this branch to it? All Storm
branches should have an associated bug.

James Henstridge (jamesh) wrote on 2010-04-16:

If we want to support this feature, it should probably exclude all tuple finds. For example:

result = store.find((Company, Employee), Company.id == Employee.company_id)
ids = result.values('id')

Assuming both tables have an 'id' column, which one am I referring to? It seems better to limit the feature to the simple situations where we can handle it unambiguously.

review: Needs Fixing

James Henstridge (jamesh) wrote on 2010-04-16:

And a follow up: to be consistent with Store.find(), I'd suggest using getattr(find_spec.default_cls, name) to convert names to column objects.

In the tuple find case, default_cls will be None, so you should check for this and raise FeatureError.
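A minimal sketch of that suggestion, using stand-in classes rather than Storm's real ones. FindSpec, Column, FeatureError and resolve_column() below are hypothetical simplifications; only the getattr(default_cls, name) lookup and the None check are taken from the comment above.

```python
class FeatureError(Exception):
    """Stand-in for storm.exceptions.FeatureError."""


class Column(object):
    """Placeholder for a real Storm column/property."""

    def __init__(self, label):
        self.label = label


class FindSpec(object):
    """Simplified stand-in: default_cls is None for tuple finds."""

    def __init__(self, cls_spec):
        is_tuple = isinstance(cls_spec, tuple)
        self.default_cls = None if is_tuple else cls_spec


def resolve_column(find_spec, column):
    """Turn a column name into the attribute on the default class."""
    if not isinstance(column, str):
        return column  # Already a column object; pass it through.
    if find_spec.default_cls is None:
        # Tuple find: there is no single class to look the name up on.
        raise FeatureError(
            "Column names are not supported in tuple finds: %r" % column)
    return getattr(find_spec.default_cls, column)


class Company(object):
    id = Column("company.id")


class Employee(object):
    id = Column("employee.id")
```

With a single-class find the name resolves to Company.id; with a find on (Company, Employee) the same call raises FeatureError instead of guessing.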

Jamu Kakar (jkakar) wrote on 2010-04-16:

[1]

Can you please add docstrings to the tests? Also, the @param for
'columns' in the docstring for ResultSet.values needs to be updated.

[2]

+ def test_find_values_by_column_name(self):

There are several different cases being exercised in this test. I
recommend you break it up into several smaller tests.

[3]

+ # If more than one column in the result set has the same name,
+ # the first will be chosen.
+ result = self.store.find(
+ (Foo.id, FooValue.id), FooValue.foo_id == Foo.id)
+ result = result.config(distinct=True).order_by(Foo.id)
+ values = result.values('id')
+ self.assertEquals(list(values), [10, 20])

I agree with James that this case should raise FeatureError. We
shouldn't be making guesses about which column the user is
specifying if the choice is ambiguous.

[4]

+ def test_find_multiple_values_by_column_name(self):

This test could also be split into two, so that each test exercises
one specific behaviour (multiple columns, and mixed string and Column
values).

review: Needs Fixing

Gavin Panella (allenap) wrote on 2010-04-22:

Getting columns from names is now done by getting the columns from the
find spec, using them to construct a dict of name -> [columns] and
mapping with that. When a name maps to a list of two or more columns,
FeatureError is raised. This way it works naturally with tuple finds
and (not quite, there's another issue) allows values() to yield
expression values.
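The name -> [columns] idea can be sketched on its own, without Storm. The Column namedtuple, build_name_map() and map_column_names() here are hypothetical stand-ins, not the code in the branch:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a column: just a table and a name.
Column = namedtuple("Column", "table name")


class FeatureError(Exception):
    """Stand-in for storm.exceptions.FeatureError."""


def build_name_map(find_columns):
    """Group the columns of a find by name: name -> [columns]."""
    name_map = defaultdict(list)
    for column in find_columns:
        name_map[column.name].append(column)
    return name_map


def map_column_names(columns, find_columns):
    """Replace names with columns, refusing ambiguous names."""
    name_map = build_name_map(find_columns)
    resolved = []
    for column in columns:
        if isinstance(column, str) and column in name_map:
            candidates = name_map[column]
            if len(candidates) > 1:
                raise FeatureError("Ambiguous column: %s" % column)
            column = candidates[0]
        resolved.append(column)
    return resolved
```

Keeping a list per name, rather than a single column, is what makes the ambiguous case detectable: a tuple find with two 'id' columns produces a two-element list for 'id'.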

I've also changed it to copy the select before modifying it. I don't
know how necessary this is, but the code before looked a bit suspect
to my untrained eye.

Also, when variable_factory is not found on the column, it defaults to
Variable. This means that it's possible to do things like
results.values(Sum(Table.price)).
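As a rough illustration of that fallback, with simplified stand-ins for Storm's classes (the Variable, IntVariable, Column and Sum defined here are hypothetical and much smaller than the real ones):

```python
class Variable(object):
    """Stand-in for storm.variables.Variable: stores values as-is."""

    def __init__(self):
        self._value = None

    def set(self, value):
        self._value = value

    def get(self):
        return self._value


class IntVariable(Variable):
    """Stand-in for a typed variable that coerces to int."""

    def set(self, value):
        self._value = int(value)


class Column(object):
    """Stand-in column that knows which variable type it wants."""
    variable_factory = IntVariable


class Sum(object):
    """Stand-in expression with no variable_factory attribute."""


def make_variables(columns):
    # Fall back to the generic Variable when a column or expression
    # does not provide its own variable_factory.
    return [getattr(column, "variable_factory", Variable)()
            for column in columns]
```

A plain column still gets its typed variable, while an expression without variable_factory gets the generic one instead of raising AttributeError.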

Okay, having said all that, if this is too fancy, or liable to cause
issues, I'll go for the approach in get_where_for_args().

However, if this approach is okay, a couple of notes:

- Either or both of the new methods ResultSet._get_column_name_map()
and ResultSet._map_column_names() could be moved to FindSpec. Would
that be a more natural home for them?

- Should _map_column_names() be used in order_by() and group_by()?

post-review.diff

Gustavo Niemeyer (niemeyer) wrote on 2010-04-22:

Gavin,

I don't quite see the motivation for this feature. Using the actual class attributes is how it works pretty much everywhere else in Storm, and doesn't look like a big burden when compared to the column names.

If you have a huge class name, you can easily do something like this:

C = MyHugeClassName

And then have exactly the same call length:

resultset.values(C.id, C.name)

vs.

resultset.values("id", "name")

With the advantage that the former, in case of errors, will blow up as syntax errors early, rather than SQL exceptions.

The import fascist isn't a great reason to add support like this in Storm either. Having a result set in the code means you have access to the model objects. What would be the reason for preventing access to the object's class in a situation where you have the object and a *result set* (which allows one to change column values without any checks)?

Then, there's a significant additional cost being introduced in a place which used to be pretty lightweight.

For these reasons, I don't feel like it's an improvement over how it works today.

review: Disapprove

Gavin Panella (allenap) wrote on 2010-04-23:

> Gavin,
>
> I don't quite see the motivation for this feature. Using the actual class
> attributes is how it works pretty much everywhere else in Storm, and doesn't
> look like a big burden when compared to the column names.

This patch also lets you specify an aliased column, which could be an
expression rather than a plain column on a table. Perhaps there is a
better way of achieving this.

>
> If you have a huge class name, you can easily do something like this:
>
> C = MyHugeClassName
>
> And then have exactly the same call length:
>
> resultset.values(C.id, C.name)
>
> vs.
>
> resultset.values("id", "name")

Brevity wasn't my concern in this patch :)

> With the advantage that the former, in case of errors, will blow up as syntax
> errors early, rather than SQL exceptions.

Actually, somewhat amusingly, you'll get the string for the invalid
column back instead. So:

list(store.find(Foo.id).values('frooty'))

returns:

['frooty', 'frooty', 'frooty']

But that's possible to remedy.
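One plausible reading, given that the patch yields unmatched names unchanged, is that the string is selected as a literal. The sketch below is not Storm code; map_names_lenient(), map_names_strict() and the placeholder column map are hypothetical, and the strict variant is only one way the remedy might look:

```python
class FeatureError(Exception):
    """Stand-in for storm.exceptions.FeatureError."""


def map_names_lenient(names, column_map):
    # Unknown names fall through untouched, so a typo such as
    # 'frooty' survives as a plain string.
    return [column_map.get(name, name) for name in names]


def map_names_strict(names, column_map):
    # Possible remedy: any string that is not a known name is an error.
    resolved = []
    for name in names:
        if isinstance(name, str):
            if name not in column_map:
                raise FeatureError("Unknown column: %s" % name)
            name = column_map[name]
        resolved.append(name)
    return resolved
```

The strict variant fails before any query is built, which is closer to the early failure Gustavo describes for class attributes.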

>
> The import fascist isn't a great reason to add support like this in Storm
> either. Having a result set in the code means you have access to the model
> objects. What would be the reason of preventing access to the object's class
> in a situation where you have the object and a *result set* (which allows one
> to change column values without any checks).

In Launchpad, result sets are often returned from methods called on a
secured utility, so browser code and script code do not have access
to the model class.

Any code that uses the objects materialized from a result set must
know the names of the attributes it's interested in, so it makes sense
that it could ask for the values of those attributes across the rows
defined by the result set.

My use-case is a script with lots of transactions. A result set
defining the interesting rows is obtained early on from a secured
utility and is used in many separate transactions. Currently there can
be as many as ~16000 interesting rows. I want to avoid materializing
these rows into model objects until they're absolutely needed, because
it's slow and the transaction killer is merciless, but sometimes the
code does need one or two attributes from the whole set.

Anyway, that was my reasoning, but, as earlier, there may be a better
way to do it. I could, for example, add more methods on the secured
utility to return different information from the result set for
me. It's not really the way I'd like to structure the code but it only
offends my taste a little bit ;)

>
> Then, there's a significant additional cost being introduced in a place which
> used to be pretty lightweight.

I haven't measured it, but I guess that the cost is still small
compared to compiling the query and doing a round-trip to the
database.

>
> For these reasons, I don't feel like it's an improvement over how it works
> today.

Fair enough, I'm not blocked on this. Thanks for looking at it! I've
learnt a lot about Storm from doing this.

Unmerged revisions

362. By Gavin Panella on 2010-04-22: Make ResultSet.values() work with expressions.
361. By Gavin Panella on 2010-04-20: Add docstrings to new private methods.
360. By Gavin Panella on 2010-04-20: Fix some lint.
359. By Gavin Panella on 2010-04-20: Make the implementation of _get_column_name_map() more readable and obvious.
358. By Gavin Panella on 2010-04-20: Update the docstring for ResultSet.values().
357. By Gavin Panella on 2010-04-16: Raise an error when the column choice is ambiguous, copy the select before mutating it, and break up tests.
356. By Gavin Panella on 2010-04-15: ResultSet.values() can now accept column names as well as columns themselves.

Preview Diff

 === modified file 'storm/store.py'
 --- storm/store.py	2010-04-15 13:31:08 +0000
 +++ storm/store.py	2010-04-22 09:32:10 +0000
@@ -540,7 +540,6 @@
          else:
              cached_primary_vars = obj_info["primary_vars"]
--            primary_key_idx = cls_info.primary_key_idx
              changes = self._get_changes_map(obj_info)
@@ -1238,34 +1237,63 @@
          """Get the sum of all values in an expression."""
          return self._aggregate(Sum, expr, expr)
++    def _get_column_name_map(self):
++        """Return a mapping of column name to lists of columns."""
++        columns, tables = self._find_spec.get_columns_and_tables()
++        column_map = {}
++        for column in columns:
++            if isinstance(column, (Alias, Column)):
++                if column.name in column_map:
++                    column_map[column.name].append(column)
++                else:
++                    column_map[column.name] = [column]
++        return column_map
++
++    def _map_column_names(self, columns):
++        """Attempt to map column names to actual columns."""
++        column_map = self._get_column_name_map()
++        for column in columns:
++            if column in column_map:
++                column_name, column_list = column, column_map[column]
++                if len(column_list) != 1:
++                    raise FeatureError("Ambiguous column: %s" % column_name)
++                [column] = column_list
++            if isinstance(column, Alias):
++                column = column.expr
++            yield column
++
      def values(self, *columns):
          """Retrieve only the specified columns.
          This does not load full objects from the database into Python.
--        @param columns: One or more L{storm.expr.Column} objects whose
--            values will be fetched.
++        @param columns: One or more L{storm.expr.Column} objects, or
++            column names, whose values will be fetched. If column
++            names are given, each must refer unambiguously to a named
++            column or alias in the result set.
          @return: An iterator of tuples of the values for each column
              from each matching row in the database.
          """
          if not columns:
              raise FeatureError("values() takes at least one column "
                                 "as argument")
--        select = self._get_select()
--        column_map = dict(
--            (column.name, column)
--            for column in reversed(select.columns)
--            if isinstance(column, Column))
--        columns = [column_map.get(column, column) for column in columns]
++        columns = list(self._map_column_names(columns))
++        # replace_columns() can lose ordering so it's not good here.
++        select = copy(self._get_select())
          select.columns = columns
          result = self._store._connection.execute(select)
--        if len(columns) == 1:
--            variable = columns[0].variable_factory()
++        variable_factories = [
++            getattr(column, 'variable_factory', Variable)
++            for column in columns]
++        variables = [
++            variable_factory()
++            for variable_factory in variable_factories]
++        if len(variables) == 1:
++            [variable] = variables
              for values in result:
                  result.set_variable(variable, values[0])
                  yield variable.get()
          else:
--            variables = [column.variable_factory() for column in columns]
              for values in result:
                  for variable, value in zip(variables, values):
                      result.set_variable(variable, value)
@@ -1370,7 +1398,6 @@
              def get_column(column):
                  return obj_info.variables[column].get()
          objects = []
--        cls = self._find_spec.default_cls_info.cls
          for obj_info in self._store._iter_alive():
              try:
                  if (obj_info.cls_info is self._find_spec.default_cls_info and
 === modified file 'tests/store/base.py'
 --- tests/store/base.py	2010-04-15 13:31:08 +0000
 +++ tests/store/base.py	2010-04-22 10:30:27 +0000
@@ -1002,25 +1002,51 @@
          self.assertEquals([type(value) for value in values],
                            [unicode, unicode, unicode])
++    def test_find_values_with_expression(self):
++        """
++        An expression can be passed to ResultSet.values().
++        """
++        values = self.store.find(Foo.id).values(Sum(Foo.id))
++        self.assertEquals(list(values), [60])
++
      def test_find_values_by_column_name(self):
++        """
++        ResultSet.values() can accept column names which are mapped to
++        columns in the find spec.
++        """
          result = self.store.find(Foo).order_by(Foo.id)
          values = result.values('id')
          self.assertEquals(list(values), [10, 20, 30])
          values = result.values('title')
          self.assertEquals(list(values), ["Title 30", "Title 20", "Title 10"])
--        # If more than one column in the result set has the same name,
--        # the first will be chosen.
++
++    def test_find_values_by_alias_name(self):
++        """
++        Alias names can also be passed to ResultSet.values().
++        """
++        result = self.store.find(Alias(Foo.id, 'foo')).order_by(Foo.id)
++        values = result.values('foo')
++        self.assertEquals(list(values), [10, 20, 30])
++
++    def test_find_values_by_alias_name_to_expression(self):
++        """
++        Alias names can be passed to ResultSet.values(), even if the
++        aliased column is actually an expression.
++        """
++        result = self.store.find(Alias(Sum(Foo.id), 'foo')).order_by(Foo.id)
++        values = result.values('foo')
++        self.assertEquals(list(values), [60])
++
++    def test_find_values_by_ambiguous_column_name(self):
++        """
++        If more than one column in the find spec has the same name,
++        FeatureError is raised.
++        """
          result = self.store.find(
              (Foo.id, FooValue.id), FooValue.foo_id == Foo.id)
          result = result.config(distinct=True).order_by(Foo.id)
          values = result.values('id')
--        self.assertEquals(list(values), [10, 20])
--        # The name is only matched against columns, not aliases or
--        # other expressions.
--        result = self.store.find(
--            (Alias(SQL('1'), 'id'), Foo.id)).order_by(Foo.id)
--        values = result.values('id')
--        self.assertEquals(list(values), [10, 20, 30])
++        self.assertRaises(FeatureError, list, values)
      def test_find_multiple_values(self):
          result = self.store.find(Foo).order_by(Foo.id)
@@ -1031,12 +1057,22 @@
                             (30, "Title 10")])
      def test_find_multiple_values_by_column_name(self):
++        """
++        More than one column name can be given to ResultSet.values();
++        it will map them all to columns.
++        """
          result = self.store.find(Foo).order_by(Foo.id)
          values = result.values('id', 'title')
          expected = [(10, "Title 30"), (20, "Title 20"), (30, "Title 10")]
          self.assertEquals(list(values), expected)
--        # Columns and column names can be mixed.
++
++    def test_find_multiple_values_by_column_and_column_name(self):
++        """
++        Columns and column names can be mixed.
++        """
++        result = self.store.find(Foo).order_by(Foo.id)
          values = result.values(Foo.id, 'title')
++        expected = [(10, "Title 30"), (20, "Title 20"), (30, "Title 10")]
          self.assertEquals(list(values), expected)
      def test_find_values_with_no_arguments(self):

 === modified file 'storm/store.py'
 --- storm/store.py	2010-02-10 11:29:39 +0000
 +++ storm/store.py	2010-04-22 10:53:24 +0000
@@ -540,7 +540,6 @@
          else:
              cached_primary_vars = obj_info["primary_vars"]
--            primary_key_idx = cls_info.primary_key_idx
              changes = self._get_changes_map(obj_info)
@@ -1246,29 +1245,63 @@
          """Get the sum of all values in an expression."""
          return self._aggregate(Sum, expr, expr)
++    def _get_column_name_map(self):
++        """Return a mapping of column name to lists of columns."""
++        columns, tables = self._find_spec.get_columns_and_tables()
++        column_map = {}
++        for column in columns:
++            if isinstance(column, (Alias, Column)):
++                if column.name in column_map:
++                    column_map[column.name].append(column)
++                else:
++                    column_map[column.name] = [column]
++        return column_map
++
++    def _map_column_names(self, columns):
++        """Attempt to map column names to actual columns."""
++        column_map = self._get_column_name_map()
++        for column in columns:
++            if column in column_map:
++                column_name, column_list = column, column_map[column]
++                if len(column_list) != 1:
++                    raise FeatureError("Ambiguous column: %s" % column_name)
++                [column] = column_list
++            if isinstance(column, Alias):
++                column = column.expr
++            yield column
++
      def values(self, *columns):
          """Retrieve only the specified columns.
          This does not load full objects from the database into Python.
--        @param columns: One or more L{storm.expr.Column} objects whose
--            values will be fetched.
++        @param columns: One or more L{storm.expr.Column} objects, or
++            column names, whose values will be fetched. If column
++            names are given, each must refer unambiguously to a named
++            column or alias in the result set.
          @return: An iterator of tuples of the values for each column
              from each matching row in the database.
          """
          if not columns:
              raise FeatureError("values() takes at least one column "
                                 "as argument")
--        select = self._get_select()
++        columns = list(self._map_column_names(columns))
++        # replace_columns() can lose ordering so it's not good here.
++        select = copy(self._get_select())
          select.columns = columns
          result = self._store._connection.execute(select)
--        if len(columns) == 1:
--            variable = columns[0].variable_factory()
++        variable_factories = [
++            getattr(column, 'variable_factory', Variable)
++            for column in columns]
++        variables = [
++            variable_factory()
++            for variable_factory in variable_factories]
++        if len(variables) == 1:
++            [variable] = variables
              for values in result:
                  result.set_variable(variable, values[0])
                  yield variable.get()
          else:
--            variables = [column.variable_factory() for column in columns]
              for values in result:
                  for variable, value in zip(variables, values):
                      result.set_variable(variable, value)
@@ -1373,7 +1406,6 @@
              def get_column(column):
                  return obj_info.variables[column].get()
          objects = []
--        cls = self._find_spec.default_cls_info.cls
          for obj_info in self._store._iter_alive():
              try:
                  if (obj_info.cls_info is self._find_spec.default_cls_info and
 === modified file 'tests/store/base.py'
 --- tests/store/base.py	2010-04-16 07:12:13 +0000
 +++ tests/store/base.py	2010-04-22 10:53:24 +0000
@@ -30,7 +30,8 @@
  from storm.properties import PropertyPublisherMeta, Decimal
  from storm.variables import PickleVariable
  from storm.expr import (
--    Asc, Desc, Select, LeftJoin, SQL, Count, Sum, Avg, And, Or, Eq, Lower)
++    Alias, And, Asc, Avg, Count, Desc, Eq, LeftJoin, Lower, Or, SQL, Select,
++    Sum)
  from storm.variables import Variable, UnicodeVariable, IntVariable
  from storm.info import get_obj_info, ClassAlias
  from storm.exceptions import *
@@ -1001,6 +1002,52 @@
          self.assertEquals([type(value) for value in values],
                            [unicode, unicode, unicode])
++    def test_find_values_with_expression(self):
++        """
++        An expression can be passed to ResultSet.values().
++        """
++        values = self.store.find(Foo.id).values(Sum(Foo.id))
++        self.assertEquals(list(values), [60])
++
++    def test_find_values_by_column_name(self):
++        """
++        ResultSet.values() can accept column names which are mapped to
++        columns in the find spec.
++        """
++        result = self.store.find(Foo).order_by(Foo.id)
++        values = result.values('id')
++        self.assertEquals(list(values), [10, 20, 30])
++        values = result.values('title')
++        self.assertEquals(list(values), ["Title 30", "Title 20", "Title 10"])
++
++    def test_find_values_by_alias_name(self):
++        """
++        Alias names can also be passed to ResultSet.values().
++        """
++        result = self.store.find(Alias(Foo.id, 'foo')).order_by(Foo.id)
++        values = result.values('foo')
++        self.assertEquals(list(values), [10, 20, 30])
++
++    def test_find_values_by_alias_name_to_expression(self):
++        """
++        Alias names can be passed to ResultSet.values(), even if the
++        aliased column is actually an expression.
++        """
++        result = self.store.find(Alias(Sum(Foo.id), 'foo')).order_by(Foo.id)
++        values = result.values('foo')
++        self.assertEquals(list(values), [60])
++
++    def test_find_values_by_ambiguous_column_name(self):
++        """
++        If more than one column in the find spec has the same name,
++        FeatureError is raised.
++        """
++        result = self.store.find(
++            (Foo.id, FooValue.id), FooValue.foo_id == Foo.id)
++        result = result.config(distinct=True).order_by(Foo.id)
++        values = result.values('id')
++        self.assertRaises(FeatureError, list, values)
++
      def test_find_multiple_values(self):
          result = self.store.find(Foo).order_by(Foo.id)
          values = result.values(Foo.id, Foo.title)
@@ -1009,6 +1056,25 @@
                             (20, "Title 20"),
                             (30, "Title 10")])
++    def test_find_multiple_values_by_column_name(self):
++        """
++        More than one column name can be given to ResultSet.values();
++        it will map them all to columns.
++        """
++        result = self.store.find(Foo).order_by(Foo.id)
++        values = result.values('id', 'title')
++        expected = [(10, "Title 30"), (20, "Title 20"), (30, "Title 10")]
++        self.assertEquals(list(values), expected)
++
++    def test_find_multiple_values_by_column_and_column_name(self):
++        """
++        Columns and column names can be mixed.
++        """
++        result = self.store.find(Foo).order_by(Foo.id)
++        values = result.values(Foo.id, 'title')
++        expected = [(10, "Title 30"), (20, "Title 20"), (30, "Title 10")]
++        self.assertEquals(list(values), expected)
++
      def test_find_values_with_no_arguments(self):
          result = self.store.find(Foo).order_by(Foo.id)
          self.assertRaises(FeatureError, result.values().next)