Merge lp:~brandon-jiang-p/variety/variety into lp:~variety/variety/trunk

Proposed by Brandon
Status: Rejected
Rejected by: Peter Levi
Proposed branch: lp:~brandon-jiang-p/variety/variety
Merge into: lp:~variety/variety/trunk
Diff against target: 42 lines (+15/-3)
1 file modified
variety/Downloader.py (+15/-3)
To merge this branch: bzr merge lp:~brandon-jiang-p/variety/variety
Reviewer: Peter Levi, Status: Disapprove
Review via email: mp+297839@code.launchpad.net

Commit message

Feature: avoid duplicate images when downloading.

Avoid downloading duplicate images that have already been downloaded before. Existing Variety users may also consider hashing all previously downloaded images.

Description of the change

Feature: avoid duplicate images when downloading.

Avoid downloading duplicate images that have already been downloaded before. Existing Variety users may also consider hashing all previously downloaded images.
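The "hash all previous images" step for existing users is not part of the patch itself. A minimal one-off backfill script could look like the sketch below; it assumes the same `image_hash_store` file name and md5 scheme as the proposed change, and the `backfill_hash_store` helper is purely illustrative, not part of Variety's code.

```python
import hashlib
import os

def backfill_hash_store(download_folder):
    """Hash every existing file in the download folder and append
    the digests to the image_hash_store file used by the patch."""
    store_path = os.path.join(download_folder, "image_hash_store")
    with open(store_path, "a") as store:
        for name in os.listdir(download_folder):
            path = os.path.join(download_folder, name)
            # skip subfolders and the hash store itself
            if not os.path.isfile(path) or name == "image_hash_store":
                continue
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            store.write(digest + "\n")
    return store_path
```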

Revision history for this message
James Lu (jlu5) wrote :

I left a few comments in the diff below.

Revision history for this message
Peter Levi (peterlevi) wrote :

I have bigger conceptual problems with this proposal. First, md5 is a bad way to compare images, as the slightest change in the metadata results in a different hash. It's as good as or worse than simply taking the download URL, so why not skip the heavy computation step of hashing the whole image and just use the URL? But much more importantly, total download size is already guarded by the download quota; there is no reason to introduce a second, stateful mechanism for the same purpose. Also, not all downloads end up as wallpapers; the user may have paused the changes. If the image sources are consumed in the meantime (some sources are limited), we don't want to stagnate on the last downloaded images and never restart the sources. And we would be piling up an ever-growing file of hashes that never shrinks, and searching through it gets slower over time.
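The reviewer's suggestion (remember URLs rather than hash file contents) could be sketched roughly as below. The `RecentUrls` class and its size limit are illustrative assumptions, not Variety code; the bounded size also addresses the "file of hashes that never shrinks" concern, since old entries expire and exhausted sources can eventually be retried.

```python
from collections import OrderedDict

class RecentUrls:
    """Fixed-size record of recently downloaded URLs.
    When the limit is exceeded, the oldest entry is dropped,
    so the structure never grows without bound."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self.urls = OrderedDict()

    def seen(self, url):
        """Return True if url was seen recently; otherwise record it."""
        if url in self.urls:
            self.urls.move_to_end(url)  # refresh its position
            return True
        self.urls[url] = True
        if len(self.urls) > self.max_size:
            self.urls.popitem(last=False)  # evict the oldest URL
        return False
```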

review: Disapprove

Unmerged revisions

555. By Brandon

Reject duplicate images when downloading

Avoid downloading duplicate images that have already been downloaded. Existing Variety users may also consider hashing all previously downloaded images.

Preview Diff

=== modified file 'variety/Downloader.py'
--- variety/Downloader.py	2016-06-04 09:11:18 +0000
+++ variety/Downloader.py	2016-06-19 08:44:59 +0000
@@ -82,9 +82,9 @@
             return None

         try:
-            r = Util.request(image_url, stream=True)
+            data = Util.fetch(image_url)
             with open(local_filename, 'wb') as f:
-                Util.request_write_to(r, f)
+                f.write(data)
         except Exception, e:
             logger.info(lambda: "Download failed from image URL: %s (source location: %s) " % (image_url, self.location))
             raise e
@@ -94,6 +94,19 @@
             os.unlink(local_filename)
             return None

+        # check hash modi_v
+        import hashlib
+        logger.info(lambda: self.parent.real_download_folder)
+        hash_store_location = os.path.join(self.parent.real_download_folder, "image_hash_store")
+        file_hash = hashlib.md5(open(local_filename, 'rb').read()).hexdigest()
+
+        if os.path.exists(hash_store_location) and file_hash in open(hash_store_location).read():
+            logger.info(lambda: "find same file hash. delete file")
+            os.unlink(local_filename)
+            return None
+        with open(hash_store_location, "a") as myfile:
+            myfile.write(file_hash + "\n")
+
         metadata = {
             "sourceType": source_type,
             "sourceName": source_name,
@@ -103,7 +116,6 @@
         }
         metadata.update(extra_metadata)
         Util.write_metadata(local_filename, metadata)
-
         logger.info(lambda: "Download complete")
         return local_filename
