On 12-09-10 03:55 PM, Barry Warsaw wrote:
> Thanks again for another fine refactoring and code deletion branch.
> Eventually this sweater will look lean and mean! :)

Thanks for reviewing so promptly. If it wasn't for your timely advice I 
wouldn't be able to move so swiftly ;-)

>> +        self.assertEqual(
>> +            '.cache/gwibber/avatars/4f37e5dc9d38391db1728048344c3ab5ff8cecb2',
>> +            path[-63:])
>
> There are a few problems with this test, which I am happy to fix so that you
> can continue with the more important work of working on the dispatcher.
>
> The first is that because this test uses the user's actual cache, it's success
> can be affected by previous runs, or even the operational environment of the
> user.  There are a couple of ways to fix this, but what I like best is to use
> a temporary directory to store the cache for the test run.  This can be
> accomplished by mocking the base path to the cache, setting up the tempdir in
> a setUp() and removing it in a tearDown().

Surely this critique refers to the *following* code block and not to 
what is immediately quoted above? The get_path test specifically avoids 
testing the specific user's home dir, and doesn't do anything but hash a 
string.

> Probably also a better way to test the expected path is to do an
> os.path.split() and check just the components you care about.

That would be cleaner, yeah.

>> +        # Confirm that the resulting cache image is actually a PNG.
>> +        with open(path, 'rb') as raw:
>> +            self.assertEqual(raw.read(8), bytes.fromhex('89504E470D0A1A0A'))
>
> I want to add another test, which is that if the cache is already filled,
> another download does not happen.

Yes, and I see you already mocked out the users actual cache dir, 
allowing the test to avoid stale results from previous test runs, which 
is a problem that hadn't occurred to me until after submitting the mp.

>> +class Avatar:
>> +    @staticmethod
>> +    def get_path(url):
>
> Of course, this could also be a module global function.  It's a tough call,
> with arguments on either side, but since the function is probably not needed
> outside this module (tests don't count), I'd personally make it a non-public
> module global.

I suspected you might have concerns about a class that contains only two 
staticmethods. There is still a del_avatar method left to port (was 
going to leave it for later since it didn't seem immediately critical), 
and I was thinking that Avatar class could grow into a more 
general-purpose 'AvatarManager' kind of thing, but for now I just wanted 
it to be a class for the sake of being an easy little bundle for 
importing (eg, import Avatar and you get both methods easily).

> The entire reason is as a useful, Pythonic, recognizable clue for future
> readers and developers.   Think about it this way: you have to port some
> unfamiliar code base (oh, I don't know, take a guess :) and you're trying to
> decide whether some functionality must be preserved or not.  You don't really
> know if there are any third party clients out there of the code, so you don't
> know whether a change of yours will break something or not.
>
> Even in a language like Python, marking non-public names with a single leading
> underscore tells clients "You can use this if you must, but the risk is
> yours.  If I change it in a future version, and it breaks your code, tough
> luck for you."  When you use public methods (e.g. no leading underscore),
> you're implicitly agreeing not to break the API.

Hmmmm, that point is slowly starting to sink in. I'll fully appreciate 
it one day soon I'm sure ;-)

>> +        if not os.path.isdir(CACHE_DIR):
>> +            os.makedirs(CACHE_DIR)
>> +        return os.path.join(CACHE_DIR, sha1(url.encode('utf-8')).hexdigest())
>
> There's one problem with this protocol.  The file is saved in a hash of the
> url used to download it, but without regard to the contents of the file.
>
> Imagine that an image changes but it is still available under the original
> url.  How would gwibber-service ever notice the new version of the image?

Well, it wouldn't, but this is a direct port of the previous behavior, 
so this bug you've identified was there from before. That is, unless 
there's perhaps something somewhere else in the codebase I haven't found 
that expires or deletes cache files. I did a grep -R and all I came up 
with was one spot where it deletes cache files that fail to load, which 
makes me shudder to think that there's some *other* code somewhere else 
that manages to write invalid image data into the cache files... 
(hopefully gdkpixbuf smooths over most of that mess for us)

Like I said, later on we can expand Avatar into AvatarManager and 
perhaps it can just do a quick scan of the cache dir and delete any 
files that are older than a month or something (some arbitrarily large 
time value such that it doesn't add a lot of overhead for downloading 
new avatars, but does make it possible for people to change their avatars).

Keep in mind, this would only be a bug for protocols that use a 
canonical URL for the user's avatar image. I'm not actually aware of any 
like this; if you look at the actual avatar URLs, you find that most of 
them contain some kind of hash of the image data, so that when somebody 
picks a new avatar, they get a new avatar URL, and thus the problem 
stops being "people can't change their avatar image ever" but reduces 
simply to "cache fills up with stale avatars, eventually filling the 
user's disk at some point after the heat death of the universe."

There is a log.debug statement there that triggers whenever it downloads 
a new URL, so perhaps we could monitor those logs and see what kinds of 
URLs go through it. That would necessarily come after integration 
testing, though.

> I'll put an XXX in the code when I merge it, but there are a couple of
> considerations to give.  First, the hash could be calculated from the file
> contents, which you do get from the Downloader.

This could never work because the whole point of this cache is to stop 
from having to re-download the avatar. If you have to download the 
avatar in order to hash it in order to check that it's already in the 
cache, then you've already lost.

> Second, there probably needs
> to be an API for cache eviction, e.g. after some amount of time, any existing
> cache entry would time out, and you'd re-download the image anyway.  Or, you
> can provide an API for explicitly clearing the cache.

Yes, this I definitely agree with, we'll need some cache expiry logic at 
some point.

>> +    @staticmethod
>> +    def get_image(url):
>> +        local_path = Avatar.get_path(url)
>> +        if not os.path.exists(local_path) or os.stat(local_path).st_size == 0:
>
> This is a typical "look before you leap" style, which we usually try to avoid
> in Python, in favor of an "ask forgiveness" type style.  Or in other words,
> just do the os.state() and catch the exception that might occur if the file is
> missing.

Darn, I had actually written it as try/except the first time around, but 
then I changed it. I see you changed it back, but did it better. The 
reason I didn't like it that way originally was that I wasn't quite 
familiar enough with that errno stuff to be confident that I wasn't 
catching too general of an exception, and also it felt a bit contorted 
the way I was attempting to conflate the case of the missing file with 
empty file.

I like the way you have it quite a bit though, the except: size = 0 
embodies the perfect semantics.

> === modified file 'gwibber/microblog/util/resources.py'
> --- gwibber/microblog/util/resources.py	2012-08-21 22:53:27 +0000
> +++ gwibber/microblog/util/resources.py	2012-09-10 20:08:21 +0000
>
> Yay for code deletion!

Easily my favorite part! ;-)