Merge lp:~dylanmccall/harvest/harvest-dylan-m into lp:harvest
Status: | Merged |
---|---|
Merged at revision: | 186 |
Proposed branch: | lp:~dylanmccall/harvest/harvest-dylan-m |
Merge into: | lp:harvest |
Diff against target: |
806 lines (+706/-1) 11 files modified
INSTALL (+1/-1) harvest/common/url_tools.py (+32/-0) harvest/filters/__init__.py (+23/-0) harvest/filters/containers.py (+97/-0) harvest/filters/filters.py (+318/-0) harvest/opportunities/filters.py (+53/-0) harvest/opportunities/urls.py (+4/-0) harvest/opportunities/views.py (+34/-0) harvest/opportunities/wrappers.py (+90/-0) harvest/settings.py (+4/-0) harvest/templates/opportunities/opportunities_filter.html (+50/-0) |
To merge this branch: | bzr merge lp:~dylanmccall/harvest/harvest-dylan-m |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Daniel Holbach | Approve on 2010-07-05 | ||
James Westby | 2010-06-23 | Approve on 2010-06-23 | |
Review via email:
|
Description of the change
Introducing a new view at /opportunities/
This view presents all packages and all opportunities tracked by Harvest, where the user can filter them using a list of properties. First the list of source packages is filtered, then the list of opportunities related to those packages is filtered.
User interface is just a proof of concept so far, with very few avenues for interaction (beyond toggling some filters in a rudimentary way). The back-end lets us quickly define new properties to filter packages and opportunities by, and these are instantly reflected in the output.
Everything happens through static pages at the moment, with parameters passed in the query string. Any interaction that involves clicking a link, including expanding a package to show its related opportunities, is a full page load away.
Dylan McCall (dylanmccall) wrote : | # |
Daniel Holbach (dholbach) wrote : | # |
> Oh, I forgot. This also adds a dependency on debug-toolbar. That change can be
> reverted (it's just in settings.py and INSTALL), but it also doesn't hurt.
> It's very useful for tracking how changes affect performance.
>
> Note that the middleware adds a whole ton of data to the page as it is sent
> (when enabled), so it does have a performance hit of its own. It only turns on
> if it's being accessed through 127.0.0.1.
Maybe we can ask people to add that stuff to local_settings.py or we require YES_I_AM_
About the rest of the new code: HOLY COW! You put quite a bit of work into this! :)
FilterContainer:
~~~~~~~~~~~~~~~~
minor only, but the description of set_filters() says "Add a set of filters to be children of this one." although self.filters_dict is reset at the beginning. Would an add_*() function be useful or should the description just be modified a bit?
I think I'd rename get() to find(), not sure which is more common. Both work for me. :-)
113 + #the first bit is the one this instance stores
114 + if name_bits[0] in self.filters_dict:
115 + result = self.filters_
116 + else:
117 + result = None
Lines 116 and 117 can be removed.
119 + if isinstance(result, FilterContainer) and len(name_bits)>1:
120 + return result.
121 + else:
122 + #we have reached the end of the list. Return the result (which may be None)
123 + return result
121 can be removed and 123 unidented one level.
FilterSystem:
~~~~~~~~~~~~~
Does every call to get_parameters() produce a copy? Do we need a get()ter there?
Filter:
~~~~~~~
Do we need explicit getters and setters() for the very simple members of Filter?
Do you think it's useful to add a few very quick examples to some of descriptions? ie: to me it wasn't immediately obvious what the "toggling" is about.
Daniel Holbach (dholbach) wrote : | # |
(minor) Maybe FilterSystem.
Can FilterSystem and FilterContainer be merged? Are they used separately and in different ways? I must admit FilterSystem is not quite clear to me yet. Still digging through it. :)
Daniel Holbach (dholbach) wrote : | # |
Filter: I'm not sure what get_value() and get_value_string*() and get_parameters_
Daniel Holbach (dholbach) wrote : | # |
Can we get PackageListWrap
James Westby (james-w) wrote : | # |
Wow!
Thanks Dylan.
31 + url_params = list()
seems to be superfluous.
Would there ever be a desire for
36 +def current_
to remove parameters? That can be something that is added later if needed
though.
46 \ No newline at end of file
453 \ No newline at end of file
673 \ No newline at end of file
Fixing those would be nice.
149 + #note that this currently stores parameters that aren't relevant to the FilterSystem
150 + #we need to do that so we don't eat them in get_url_
151 + self.set_
is that still needed with the code to get the original params in get_url_
Where you mark a method as abstract and don't implement in then consider
raise NotImplementedE
which will give an error if the subclass doesn't implement it, instead of
silently doing nothing.
244 + @param choices_dict: Optional value to be used instead of internal value
that's not a parameter to that method, same in the next method.
Where you use mark_safe you should add a comment stating why that string is
known to be safe.
Do the default parameters make sense? I'm not sure we should be defaulting to
"ged" at least.
Overall this is some seriously great code, nice work!
Thanks,
James
Dylan McCall (dylanmccall) wrote : | # |
Yay! Thanks for the comments. I'll respond to them here.
There is a bit of superfluous initialization (by Python standards) going
on, indeed. Thanks for pointing them out! I was still in Vala mode when
I started this, desperately clinging to the comfort of strong typing ;)
So, in order from the top!
---FilterContai
under filters/filters.py, and FilterSystem under
filters go inside of. It should be The object that the rest of
the application uses to manipulate filters. The current HTTP
request is handed to that object and it sorts out the rest.
Thankfully, that means the thing can be poked at repeatedly
until the wackiness subsides, without affecting the rest of the
code.
FilterGroup is used to group filters that work with a specific
collection of objects (there's a pkg and opp group at the
moment). It is possible to enable and disable filters inside a
in the past. It will still only ever be called once, but I
needn't enforce that; it just adds complexity for no particular
reason. I renamed set_filters to add_filters and I'm
functions starting with set_ and get_ were making my head hurt,
anyway.
used anywhere any more. I'll strip that out. Similar
lot_ of copying going on for one request. It can probably be
sped up somewhere.
method of many other types, even though it is definitely not
like those. Good point. Changed it to find()!
Daniel, you're awesome at naming functions! :)
Thanks for spotting the leftover set_parameters stuff, James.
It's all gone now! (*phew*)
---Filters---
The default parameters are just for testing purposes, and indeed
make no sense. I'm using gedit as a consistent query to test
against, so when it looks exactly like the mockup I'll know! (It
also needs some kind of weird default, because it doesn't handle
big lists of packages very gracefully).
pkg:name is set as such because there is no interface to change
it yet, except editing the URL. Normally, of course, it would be
blank.
Javascript is easy, but doing it without needs a pretty
convoluted HTML form. I may just do the Javascript solution for
now).
- 201. By Dylan McCall on 2010-06-24
-
Cleanup, as discussed at <https:/
/code.edge. launchpad. net/~dylanmccal l/harvest/ harvest- dylan-m/ +merge/ 28262> Added newlines at end of files, where they were missing.
harvest/filters:
Finished docstrings
Removing "#abstract" comment where it makes no senseharvest/
filters/ containers. py:
Renamed some methods for clarityharvest/
filters/ filters. py:
raising exception for methods that need to be implemented
Rejigged get_values, get_value_string and get_parameters_for_value. Now a filter can be serialized and it's all a little bit simpler - 202. By Dylan McCall on 2010-06-27
-
Fixed a bug where serialize_value, given an empty set for value, used the current value instead.
Daniel Holbach (dholbach) wrote : | # |
Thanks a lot for your continued work on this and sorry for not replying earlier. I've been quite busy with other things. Sorry.
> The setters and getters found themselves in such quantity
> because the implementation of these things can fluctuate. A lot
> of that fluctuation has been reduced with the version you're
> seeing here, though. The other reason is I find it elegant to
> have the outside world only access an object through methods;
> never through properties. (That way the rules for accessing a
> given property are self-documenting). If that's silly, let me
> know!
In cases where setters and getters just do the minimal amount of work ("self.bla = bla" and "return bla"), I'd expect that accessing the object's property would just work. If additional work needs to be done, I'd try to do stuff in the constructor or in a function that does some kind of computation or other work.
I don't know if that's applicable here.
> The reason I have visible_packages and hidden_packages in
> wrappers.py is that there will be a bunch of other stuff there
> in the near future. For example, stuff that summarizes the two
> collections of packages. Some of this demands processing that
> may or may not happen, so it makes sense to call the appropriate
> functions from the template as appropriate (I think…).
Can you maybe explain what kind of summary you'd like to see there?
I personally feel it's a bit more work to get it right, but it might be beneficial to tune the queries in views.py right, which would also save us from loading too many objects into memory.
> PackageWrapper is similar; it has little at the moment, but in
> the future could be used to access what categories of
> opportunities lie within, including hidden ones. Throwing that
> logic in the template (even with template tags) feels like a
> horrible act, and probably wouldn't be as efficient.
> Having said that, PackageWrapper feels a bit more wrong because
> there is already a perfectly good SourcePackage object we can
> add data to. I just feel squeamish throwing that stuff directly
> at a SourcePackage model instance. I know it won't save it to
> the database or anything, but it feels wrong somehow.
>
> I kept an eye on performance when I put that together. It does
> grab the entire list of source packages from the database and
> turn them into new Python objects, but in the end we're just
> doing the same database hit that would happen later. Debug
> Toolbar says we hit the database once for each model; nothing
> blatantly redundant is happening. Always room for better
> performance, though!
I was under the impression that we wouldn't have to have all opportunities and source packages in memory if the query just asked for a specific subset. Maybe I'm wrong. I'll go and find out.
> Okay, that's a lot of writing, but this has been really helpful to get
> my thoughts straightened. It feels a lot smoother than it did this
> morning!
Y...
Dylan McCall (dylanmccall) wrote : | # |
I did some pondering and poking, and I'm further convinced to change my use of accessors. Turns out the most Pythonic way is to use plain instance variables and implement property() as appropriate, which lets us specify our own getters and setters for those variables (or just a getter). That Java course nearly corrupted me!
There is a bit of divergence in filters.py in the gsoc-client-stuff branch, so I'm a little reluctant to make that change in this branch. (Merge conflicts — even simple ones — invariably give me headaches). Are you okay if I do it in a new branch from gsoc-client-stuff?
> I was under the impression that we wouldn't have to have all
> opportunities and source packages in memory if the query just asked
> for a specific subset. Maybe I'm wrong. I'll go and find out.
I mean to say that all the source packages which have been met by the package filtergroup are examined in a similar way. It isn't an extensive thing, but we end up accessing the queryset, which gives us a list of all the references it has found, both hidden and visible packages. (The opportunities remain a queryset until the template asks for them). I should move that to be smarter about hidden packages, only storing their count via a query.
With gsoc-client-stuff, I'm working on asynchronously loading results, package info and hidden packages. This means implementing new views on the same query data as before. So, that's why I am a little picky about having one object to manage all that; all these views can just talk to the same thing in a somewhat balanced way. Having said that, it's mostly a stop-gap and if all goes well it should be really easy to yank it out for something quicker and more cool once the dust is settled. To be honest I'm not sure what the final result is going to look like in views.py, so I am reluctant to devote too much to a specific approach yet ;)
Daniel Holbach (dholbach) wrote : | # |
> That Java course nearly
> corrupted me!
BIG HUGS! I'm sure you'll survive, though! :-)
> There is a bit of divergence in filters.py in the gsoc-client-stuff branch, so
> I'm a little reluctant to make that change in this branch. (Merge conflicts —
> even simple ones — invariably give me headaches). Are you okay if I do it in a
> new branch from gsoc-client-stuff?
Sure. I'm also happy for small branches and small merge proposals to go in as it should be much quicker, easier to review, etc.
> > I was under the impression that we wouldn't have to have all
> > opportunities and source packages in memory if the query just asked
> > for a specific subset. Maybe I'm wrong. I'll go and find out.
>
> I mean to say that all the source packages which have been met by the package
> filtergroup are examined in a similar way. It isn't an extensive thing, but we
> end up accessing the queryset, which gives us a list of all the references it
> has found, both hidden and visible packages. (The opportunities remain a
> queryset until the template asks for them). I should move that to be smarter
> about hidden packages, only storing their count via a query.
As I told you in a separate discussion my main concern was merely the complexity of new classes we add. If we (and particularly new contributors) try to fix upcoming bugs, it'll be harder and harder to dive through various classes which sometimes have very similar names to figure out where something is done.
Having said that, we can surely merge this for now and simplify over time and as we get a better idea of what we want to get done.
> With gsoc-client-stuff, I'm working on asynchronously loading results, package
> info and hidden packages. This means implementing new views on the same query
> data as before. So, that's why I am a little picky about having one object to
> manage all that; all these views can just talk to the same thing in a somewhat
> balanced way. Having said that, it's mostly a stop-gap and if all goes well it
> should be really easy to yank it out for something quicker and more cool once
> the dust is settled. To be honest I'm not sure what the final result is going
> to look like in views.py, so I am reluctant to devote too much to a specific
> approach yet ;)
Gotcha. Thanks for your work on this and consideration.
Oh, I forgot. This also adds a dependency on debug-toolbar. That change can be reverted (it's just in settings.py and INSTALL), but it also doesn't hurt. It's very useful for tracking how changes affect performance.
Note that the middleware adds a whole ton of data to the page as it is sent (when enabled), so it does have a performance hit of its own. It only turns on if it's being accessed through 127.0.0.1.