Merge lp:~noskcaj/ubuntu/vivid/urlgrabber/3.10.1 into lp:ubuntu/vivid/urlgrabber
- Vivid (15.04)
- 3.10.1
- Merge into vivid
Proposed by: Jackson Doak
Status: Needs review
Proposed branch: lp:~noskcaj/ubuntu/vivid/urlgrabber/3.10.1
Merge into: lp:ubuntu/vivid/urlgrabber
Diff against target: 7325 lines (+1389/-4846), 26 files modified
  .pc/applied-patches (+0/-3)
  .pc/grabber_fix.diff/urlgrabber/grabber.py (+0/-1730)
  .pc/progress_fix.diff/urlgrabber/progress.py (+0/-755)
  .pc/progress_object_callback_fix.diff/urlgrabber/grabber.py (+0/-1802)
  ChangeLog (+8/-0)
  MANIFEST (+2/-0)
  PKG-INFO (+22/-22)
  README (+1/-1)
  debian/changelog (+7/-0)
  debian/patches/grabber_fix.diff (+0/-236)
  debian/patches/progress_fix.diff (+0/-11)
  debian/patches/progress_object_callback_fix.diff (+0/-21)
  debian/patches/series (+0/-3)
  scripts/urlgrabber (+14/-6)
  scripts/urlgrabber-ext-down (+75/-0)
  setup.py (+4/-2)
  test/base_test_code.py (+1/-1)
  test/munittest.py (+3/-3)
  test/test_byterange.py (+1/-13)
  test/test_grabber.py (+2/-1)
  test/test_mirror.py (+72/-1)
  urlgrabber/__init__.py (+5/-4)
  urlgrabber/byterange.py (+8/-8)
  urlgrabber/grabber.py (+901/-152)
  urlgrabber/mirror.py (+54/-11)
  urlgrabber/progress.py (+209/-60)
To merge this branch: bzr merge lp:~noskcaj/ubuntu/vivid/urlgrabber/3.10.1
Related bugs: (none listed)
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Daniel Holbach (community) | Needs Fixing | | |

Review via email: mp+244676@code.launchpad.net
Commit message
Description of the change
New upstream release; upstreams some patches.
Unmerged revisions
12. By Jackson Doak

    * New upstream release.
    * Drop all patches, fixed upstream
Preview Diff
1 | === removed file '.pc/applied-patches' |
2 | --- .pc/applied-patches 2011-08-09 17:45:08 +0000 |
3 | +++ .pc/applied-patches 1970-01-01 00:00:00 +0000 |
4 | @@ -1,3 +0,0 @@ |
5 | -grabber_fix.diff |
6 | -progress_fix.diff |
7 | -progress_object_callback_fix.diff |
8 | |
9 | === removed directory '.pc/grabber_fix.diff' |
10 | === removed directory '.pc/grabber_fix.diff/urlgrabber' |
11 | === removed file '.pc/grabber_fix.diff/urlgrabber/grabber.py' |
12 | --- .pc/grabber_fix.diff/urlgrabber/grabber.py 2010-07-08 17:40:08 +0000 |
13 | +++ .pc/grabber_fix.diff/urlgrabber/grabber.py 1970-01-01 00:00:00 +0000 |
14 | @@ -1,1730 +0,0 @@ |
15 | -# This library is free software; you can redistribute it and/or |
16 | -# modify it under the terms of the GNU Lesser General Public |
17 | -# License as published by the Free Software Foundation; either |
18 | -# version 2.1 of the License, or (at your option) any later version. |
19 | -# |
20 | -# This library is distributed in the hope that it will be useful, |
21 | -# but WITHOUT ANY WARRANTY; without even the implied warranty of |
22 | -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU |
23 | -# Lesser General Public License for more details. |
24 | -# |
25 | -# You should have received a copy of the GNU Lesser General Public |
26 | -# License along with this library; if not, write to the |
27 | -# Free Software Foundation, Inc., |
28 | -# 59 Temple Place, Suite 330, |
29 | -# Boston, MA 02111-1307 USA |
30 | - |
31 | -# This file is part of urlgrabber, a high-level cross-protocol url-grabber |
32 | -# Copyright 2002-2004 Michael D. Stenner, Ryan Tomayko |
33 | -# Copyright 2009 Red Hat inc, pycurl code written by Seth Vidal |
34 | - |
35 | -"""A high-level cross-protocol url-grabber. |
36 | - |
37 | -GENERAL ARGUMENTS (kwargs) |
38 | - |
39 | - Where possible, the module-level default is indicated, and legal |
40 | - values are provided. |
41 | - |
42 | - copy_local = 0 [0|1] |
43 | - |
44 | - ignored except for file:// urls, in which case it specifies |
45 | - whether urlgrab should still make a copy of the file, or simply |
46 | - point to the existing copy. The module level default for this |
47 | - option is 0. |
48 | - |
49 | - close_connection = 0 [0|1] |
50 | - |
51 | - tells URLGrabber to close the connection after a file has been |
52 | - transfered. This is ignored unless the download happens with the |
53 | - http keepalive handler (keepalive=1). Otherwise, the connection |
54 | - is left open for further use. The module level default for this |
55 | - option is 0 (keepalive connections will not be closed). |
56 | - |
57 | - keepalive = 1 [0|1] |
58 | - |
59 | - specifies whether keepalive should be used for HTTP/1.1 servers |
60 | - that support it. The module level default for this option is 1 |
61 | - (keepalive is enabled). |
62 | - |
63 | - progress_obj = None |
64 | - |
65 | - a class instance that supports the following methods: |
66 | - po.start(filename, url, basename, length, text) |
67 | - # length will be None if unknown |
68 | - po.update(read) # read == bytes read so far |
69 | - po.end() |
70 | - |
71 | - text = None |
72 | - |
73 | - specifies alternative text to be passed to the progress meter |
74 | - object. If not given, the default progress meter will use the |
75 | - basename of the file. |
76 | - |
77 | - throttle = 1.0 |
78 | - |
79 | - a number - if it's an int, it's the bytes/second throttle limit. |
80 | - If it's a float, it is first multiplied by bandwidth. If throttle |
81 | - == 0, throttling is disabled. If None, the module-level default |
82 | - (which can be set on default_grabber.throttle) is used. See |
83 | - BANDWIDTH THROTTLING for more information. |
84 | - |
85 | - timeout = None |
86 | - |
87 | - a positive float expressing the number of seconds to wait for socket |
88 | - operations. If the value is None or 0.0, socket operations will block |
89 | - forever. Setting this option causes urlgrabber to call the settimeout |
90 | - method on the Socket object used for the request. See the Python |
91 | - documentation on settimeout for more information. |
92 | - http://www.python.org/doc/current/lib/socket-objects.html |
93 | - |
94 | - bandwidth = 0 |
95 | - |
96 | - the nominal max bandwidth in bytes/second. If throttle is a float |
97 | - and bandwidth == 0, throttling is disabled. If None, the |
98 | - module-level default (which can be set on |
99 | - default_grabber.bandwidth) is used. See BANDWIDTH THROTTLING for |
100 | - more information. |
101 | - |
102 | - range = None |
103 | - |
104 | - a tuple of the form (first_byte, last_byte) describing a byte |
105 | - range to retrieve. Either or both of the values may set to |
106 | - None. If first_byte is None, byte offset 0 is assumed. If |
107 | - last_byte is None, the last byte available is assumed. Note that |
108 | - the range specification is python-like in that (0,10) will yeild |
109 | - the first 10 bytes of the file. |
110 | - |
111 | - If set to None, no range will be used. |
112 | - |
113 | - reget = None [None|'simple'|'check_timestamp'] |
114 | - |
115 | - whether to attempt to reget a partially-downloaded file. Reget |
116 | - only applies to .urlgrab and (obviously) only if there is a |
117 | - partially downloaded file. Reget has two modes: |
118 | - |
119 | - 'simple' -- the local file will always be trusted. If there |
120 | - are 100 bytes in the local file, then the download will always |
121 | - begin 100 bytes into the requested file. |
122 | - |
123 | - 'check_timestamp' -- the timestamp of the server file will be |
124 | - compared to the timestamp of the local file. ONLY if the |
125 | - local file is newer than or the same age as the server file |
126 | - will reget be used. If the server file is newer, or the |
127 | - timestamp is not returned, the entire file will be fetched. |
128 | - |
129 | - NOTE: urlgrabber can do very little to verify that the partial |
130 | - file on disk is identical to the beginning of the remote file. |
131 | - You may want to either employ a custom "checkfunc" or simply avoid |
132 | - using reget in situations where corruption is a concern. |
133 | - |
134 | - user_agent = 'urlgrabber/VERSION' |
135 | - |
136 | - a string, usually of the form 'AGENT/VERSION' that is provided to |
137 | - HTTP servers in the User-agent header. The module level default |
138 | - for this option is "urlgrabber/VERSION". |
139 | - |
140 | - http_headers = None |
141 | - |
142 | - a tuple of 2-tuples, each containing a header and value. These |
143 | - will be used for http and https requests only. For example, you |
144 | - can do |
145 | - http_headers = (('Pragma', 'no-cache'),) |
146 | - |
147 | - ftp_headers = None |
148 | - |
149 | - this is just like http_headers, but will be used for ftp requests. |
150 | - |
151 | - proxies = None |
152 | - |
153 | - a dictionary that maps protocol schemes to proxy hosts. For |
154 | - example, to use a proxy server on host "foo" port 3128 for http |
155 | - and https URLs: |
156 | - proxies={ 'http' : 'http://foo:3128', 'https' : 'http://foo:3128' } |
157 | - note that proxy authentication information may be provided using |
158 | - normal URL constructs: |
159 | - proxies={ 'http' : 'http://user:host@foo:3128' } |
160 | - Lastly, if proxies is None, the default environment settings will |
161 | - be used. |
162 | - |
163 | - prefix = None |
164 | - |
165 | - a url prefix that will be prepended to all requested urls. For |
166 | - example: |
167 | - g = URLGrabber(prefix='http://foo.com/mirror/') |
168 | - g.urlgrab('some/file.txt') |
169 | - ## this will fetch 'http://foo.com/mirror/some/file.txt' |
170 | - This option exists primarily to allow identical behavior to |
171 | - MirrorGroup (and derived) instances. Note: a '/' will be inserted |
172 | - if necessary, so you cannot specify a prefix that ends with a |
173 | - partial file or directory name. |
174 | - |
175 | - opener = None |
176 | - No-op when using the curl backend (default) |
177 | - |
178 | - cache_openers = True |
179 | - No-op when using the curl backend (default) |
180 | - |
181 | - data = None |
182 | - |
183 | - Only relevant for the HTTP family (and ignored for other |
184 | - protocols), this allows HTTP POSTs. When the data kwarg is |
185 | - present (and not None), an HTTP request will automatically become |
186 | - a POST rather than GET. This is done by direct passthrough to |
187 | - urllib2. If you use this, you may also want to set the |
188 | - 'Content-length' and 'Content-type' headers with the http_headers |
189 | - option. Note that python 2.2 handles the case of these |
190 | - badly and if you do not use the proper case (shown here), your |
191 | - values will be overridden with the defaults. |
192 | - |
193 | - urlparser = URLParser() |
194 | - |
195 | - The URLParser class handles pre-processing of URLs, including |
196 | - auth-handling for user/pass encoded in http urls, file handing |
197 | - (that is, filenames not sent as a URL), and URL quoting. If you |
198 | - want to override any of this behavior, you can pass in a |
199 | - replacement instance. See also the 'quote' option. |
200 | - |
201 | - quote = None |
202 | - |
203 | - Whether or not to quote the path portion of a url. |
204 | - quote = 1 -> quote the URLs (they're not quoted yet) |
205 | - quote = 0 -> do not quote them (they're already quoted) |
206 | - quote = None -> guess what to do |
207 | - |
208 | - This option only affects proper urls like 'file:///etc/passwd'; it |
209 | - does not affect 'raw' filenames like '/etc/passwd'. The latter |
210 | - will always be quoted as they are converted to URLs. Also, only |
211 | - the path part of a url is quoted. If you need more fine-grained |
212 | - control, you should probably subclass URLParser and pass it in via |
213 | - the 'urlparser' option. |
214 | - |
215 | - ssl_ca_cert = None |
216 | - |
217 | - this option can be used if M2Crypto is available and will be |
218 | - ignored otherwise. If provided, it will be used to create an SSL |
219 | - context. If both ssl_ca_cert and ssl_context are provided, then |
220 | - ssl_context will be ignored and a new context will be created from |
221 | - ssl_ca_cert. |
222 | - |
223 | - ssl_context = None |
224 | - |
225 | - No-op when using the curl backend (default) |
226 | - |
227 | - |
228 | - self.ssl_verify_peer = True |
229 | - |
230 | - Check the server's certificate to make sure it is valid with what our CA validates |
231 | - |
232 | - self.ssl_verify_host = True |
233 | - |
234 | - Check the server's hostname to make sure it matches the certificate DN |
235 | - |
236 | - self.ssl_key = None |
237 | - |
238 | - Path to the key the client should use to connect/authenticate with |
239 | - |
240 | - self.ssl_key_type = 'PEM' |
241 | - |
242 | - PEM or DER - format of key |
243 | - |
244 | - self.ssl_cert = None |
245 | - |
246 | - Path to the ssl certificate the client should use to to authenticate with |
247 | - |
248 | - self.ssl_cert_type = 'PEM' |
249 | - |
250 | - PEM or DER - format of certificate |
251 | - |
252 | - self.ssl_key_pass = None |
253 | - |
254 | - password to access the ssl_key |
255 | - |
256 | - self.size = None |
257 | - |
258 | - size (in bytes) or Maximum size of the thing being downloaded. |
259 | - This is mostly to keep us from exploding with an endless datastream |
260 | - |
261 | - self.max_header_size = 2097152 |
262 | - |
263 | - Maximum size (in bytes) of the headers. |
264 | - |
265 | - |
266 | -RETRY RELATED ARGUMENTS |
267 | - |
268 | - retry = None |
269 | - |
270 | - the number of times to retry the grab before bailing. If this is |
271 | - zero, it will retry forever. This was intentional... really, it |
272 | - was :). If this value is not supplied or is supplied but is None |
273 | - retrying does not occur. |
274 | - |
275 | - retrycodes = [-1,2,4,5,6,7] |
276 | - |
277 | - a sequence of errorcodes (values of e.errno) for which it should |
278 | - retry. See the doc on URLGrabError for more details on this. You |
279 | - might consider modifying a copy of the default codes rather than |
280 | - building yours from scratch so that if the list is extended in the |
281 | - future (or one code is split into two) you can still enjoy the |
282 | - benefits of the default list. You can do that with something like |
283 | - this: |
284 | - |
285 | - retrycodes = urlgrabber.grabber.URLGrabberOptions().retrycodes |
286 | - if 12 not in retrycodes: |
287 | - retrycodes.append(12) |
288 | - |
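The retrycodes advice in the docstring above (extend a copy of the default list rather than rebuilding it) can be sketched in plain Python. This is a standalone sketch with the default list copied from the docstring; real code would start from `urlgrabber.grabber.URLGrabberOptions().retrycodes` instead of hardcoding it.

```python
# Default retry codes as documented above (hardcoded here for illustration;
# in real use, copy URLGrabberOptions().retrycodes instead).
retrycodes = list([-1, 2, 4, 5, 6, 7])

# Extend the copied list rather than building a new one from scratch, so
# codes added to the default set in future releases are preserved.
if 12 not in retrycodes:
    retrycodes.append(12)  # 12 == socket timeout, per the URLGrabError codes
```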
289 | - checkfunc = None |
290 | - |
291 | - a function to do additional checks. This defaults to None, which |
292 | - means no additional checking. The function should simply return |
293 | - on a successful check. It should raise URLGrabError on an |
294 | - unsuccessful check. Raising of any other exception will be |
295 | - considered immediate failure and no retries will occur. |
296 | - |
297 | - If it raises URLGrabError, the error code will determine the retry |
298 | - behavior. Negative error numbers are reserved for use by these |
299 | - passed in functions, so you can use many negative numbers for |
300 | - different types of failure. By default, -1 results in a retry, |
301 | - but this can be customized with retrycodes. |
302 | - |
303 | - If you simply pass in a function, it will be given exactly one |
304 | - argument: a CallbackObject instance with the .url attribute |
305 | - defined and either .filename (for urlgrab) or .data (for urlread). |
306 | - For urlgrab, .filename is the name of the local file. For |
307 | - urlread, .data is the actual string data. If you need other |
308 | - arguments passed to the callback (program state of some sort), you |
309 | - can do so like this: |
310 | - |
311 | - checkfunc=(function, ('arg1', 2), {'kwarg': 3}) |
312 | - |
313 | - if the downloaded file has filename /tmp/stuff, then this will |
314 | - result in this call (for urlgrab): |
315 | - |
316 | - function(obj, 'arg1', 2, kwarg=3) |
317 | - # obj.filename = '/tmp/stuff' |
318 | - # obj.url = 'http://foo.com/stuff' |
319 | - |
320 | - NOTE: both the "args" tuple and "kwargs" dict must be present if |
321 | - you use this syntax, but either (or both) can be empty. |
322 | - |
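The `(function, args, kwargs)` callback syntax described above can be sketched in plain Python. The `run_checkfunc` helper and the `check` function are illustrative stand-ins, not urlgrabber internals; the `CallbackObject` here mirrors the simple attribute container defined later in this same file.

```python
class CallbackObject:
    # Simplified stand-in for urlgrabber's CallbackObject: a plain
    # attribute container populated from keyword arguments.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

def run_checkfunc(checkfunc, obj):
    # Accept either a bare callable or the (function, args, kwargs)
    # tuple form documented above; both forms receive obj first.
    if isinstance(checkfunc, tuple):
        func, args, kwargs = checkfunc
        return func(obj, *args, **kwargs)
    return checkfunc(obj)

def check(obj, tag, n, kwarg=None):
    # Hypothetical check function: just echoes what it was called with.
    return (obj.filename, tag, n, kwarg)

obj = CallbackObject(filename='/tmp/stuff', url='http://foo.com/stuff')
result = run_checkfunc((check, ('arg1', 2), {'kwarg': 3}), obj)
# equivalent to: check(obj, 'arg1', 2, kwarg=3)
```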
323 | - failure_callback = None |
324 | - |
325 | - The callback that gets called during retries when an attempt to |
326 | - fetch a file fails. The syntax for specifying the callback is |
327 | - identical to checkfunc, except for the attributes defined in the |
328 | - CallbackObject instance. The attributes for failure_callback are: |
329 | - |
330 | - exception = the raised exception |
331 | - url = the url we're trying to fetch |
332 | - tries = the number of tries so far (including this one) |
333 | - retry = the value of the retry option |
334 | - |
335 | - The callback is present primarily to inform the calling program of |
336 | - the failure, but if it raises an exception (including the one it's |
337 | - passed) that exception will NOT be caught and will therefore cause |
338 | - future retries to be aborted. |
339 | - |
340 | - The callback is called for EVERY failure, including the last one. |
341 | - On the last try, the callback can raise an alternate exception, |
342 | - but it cannot (without severe trickiness) prevent the exception |
343 | - from being raised. |
344 | - |
345 | - interrupt_callback = None |
346 | - |
347 | - This callback is called if KeyboardInterrupt is received at any |
348 | - point in the transfer. Basically, this callback can have three |
349 | - impacts on the fetch process based on the way it exits: |
350 | - |
351 | - 1) raise no exception: the current fetch will be aborted, but |
352 | - any further retries will still take place |
353 | - |
354 | - 2) raise a URLGrabError: if you're using a MirrorGroup, then |
355 | - this will prompt a failover to the next mirror according to |
356 | - the behavior of the MirrorGroup subclass. It is recommended |
357 | - that you raise URLGrabError with code 15, 'user abort'. If |
358 | - you are NOT using a MirrorGroup subclass, then this is the |
359 | - same as (3). |
360 | - |
361 | - 3) raise some other exception (such as KeyboardInterrupt), which |
362 | - will not be caught at either the grabber or mirror levels. |
363 | - That is, it will be raised up all the way to the caller. |
364 | - |
365 | - This callback is very similar to failure_callback. They are |
366 | - passed the same arguments, so you could use the same function for |
367 | - both. |
368 | - |
369 | -BANDWIDTH THROTTLING |
370 | - |
371 | - urlgrabber supports throttling via two values: throttle and |
372 | - bandwidth Between the two, you can either specify and absolute |
373 | - throttle threshold or specify a theshold as a fraction of maximum |
374 | - available bandwidth. |
375 | - |
376 | - throttle is a number - if it's an int, it's the bytes/second |
377 | - throttle limit. If it's a float, it is first multiplied by |
378 | - bandwidth. If throttle == 0, throttling is disabled. If None, the |
379 | - module-level default (which can be set with set_throttle) is used. |
380 | - |
381 | - bandwidth is the nominal max bandwidth in bytes/second. If throttle |
382 | - is a float and bandwidth == 0, throttling is disabled. If None, the |
383 | - module-level default (which can be set with set_bandwidth) is used. |
384 | - |
385 | - THROTTLING EXAMPLES: |
386 | - |
387 | - Lets say you have a 100 Mbps connection. This is (about) 10^8 bits |
388 | - per second, or 12,500,000 Bytes per second. You have a number of |
389 | - throttling options: |
390 | - |
391 | - *) set_bandwidth(12500000); set_throttle(0.5) # throttle is a float |
392 | - |
393 | - This will limit urlgrab to use half of your available bandwidth. |
394 | - |
395 | - *) set_throttle(6250000) # throttle is an int |
396 | - |
397 | - This will also limit urlgrab to use half of your available |
398 | - bandwidth, regardless of what bandwidth is set to. |
399 | - |
400 | - *) set_throttle(6250000); set_throttle(1.0) # float |
401 | - |
402 | - Use half your bandwidth |
403 | - |
404 | - *) set_throttle(6250000); set_throttle(2.0) # float |
405 | - |
406 | - Use up to 12,500,000 Bytes per second (your nominal max bandwidth) |
407 | - |
408 | - *) set_throttle(6250000); set_throttle(0) # throttle = 0 |
409 | - |
410 | - Disable throttling - this is more efficient than a very large |
411 | - throttle setting. |
412 | - |
413 | - *) set_throttle(0); set_throttle(1.0) # throttle is float, bandwidth = 0 |
414 | - |
415 | - Disable throttling - this is the default when the module is loaded. |
416 | - |
417 | - SUGGESTED AUTHOR IMPLEMENTATION (THROTTLING) |
418 | - |
419 | - While this is flexible, it's not extremely obvious to the user. I |
420 | - suggest you implement a float throttle as a percent to make the |
421 | - distinction between absolute and relative throttling very explicit. |
422 | - |
423 | - Also, you may want to convert the units to something more convenient |
424 | - than bytes/second, such as kbps or kB/s, etc. |
425 | - |
426 | -""" |
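The throttle/bandwidth interaction documented above reduces to a small computation. A minimal sketch of those semantics (the `effective_limit` name is mine; the `None`-means-module-default case is not modeled):

```python
def effective_limit(throttle, bandwidth):
    # Per the docstring: an int throttle is an absolute bytes/second limit;
    # a float throttle is a fraction of the nominal bandwidth; a result of
    # 0 means throttling is disabled.
    if isinstance(throttle, float):
        return throttle * bandwidth  # 0 if bandwidth == 0 -> disabled
    return throttle

# 100 Mbps is roughly 12,500,000 bytes/second nominal bandwidth.
half_by_fraction = effective_limit(0.5, 12500000)   # half of bandwidth
half_absolute = effective_limit(6250000, 0)         # absolute int limit
disabled = effective_limit(1.0, 0)                  # module-load default
```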
427 | - |
428 | - |
429 | - |
430 | -import os |
431 | -import sys |
432 | -import urlparse |
433 | -import time |
434 | -import string |
435 | -import urllib |
436 | -import urllib2 |
437 | -import mimetools |
438 | -import thread |
439 | -import types |
440 | -import stat |
441 | -import pycurl |
442 | -from ftplib import parse150 |
443 | -from StringIO import StringIO |
444 | -from httplib import HTTPException |
445 | -import socket |
446 | -from byterange import range_tuple_normalize, range_tuple_to_header, RangeError |
447 | - |
448 | -######################################################################## |
449 | -# MODULE INITIALIZATION |
450 | -######################################################################## |
451 | -try: |
452 | - exec('from ' + (__name__.split('.'))[0] + ' import __version__') |
453 | -except: |
454 | - __version__ = '???' |
455 | - |
456 | -######################################################################## |
457 | -# functions for debugging output. These functions are here because they |
458 | -# are also part of the module initialization. |
459 | -DEBUG = None |
460 | -def set_logger(DBOBJ): |
461 | - """Set the DEBUG object. This is called by _init_default_logger when |
462 | - the environment variable URLGRABBER_DEBUG is set, but can also be |
463 | - called by a calling program. Basically, if the calling program uses |
464 | - the logging module and would like to incorporate urlgrabber logging, |
465 | - then it can do so this way. It's probably not necessary as most |
466 | - internal logging is only for debugging purposes. |
467 | - |
468 | - The passed-in object should be a logging.Logger instance. It will |
469 | - be pushed into the keepalive and byterange modules if they're |
470 | - being used. The mirror module pulls this object in on import, so |
471 | - you will need to manually push into it. In fact, you may find it |
472 | - tidier to simply push your logging object (or objects) into each |
473 | - of these modules independently. |
474 | - """ |
475 | - |
476 | - global DEBUG |
477 | - DEBUG = DBOBJ |
478 | - |
479 | -def _init_default_logger(logspec=None): |
480 | - '''Examines the environment variable URLGRABBER_DEBUG and creates |
481 | - a logging object (logging.logger) based on the contents. It takes |
482 | - the form |
483 | - |
484 | - URLGRABBER_DEBUG=level,filename |
485 | - |
486 | - where "level" can be either an integer or a log level from the |
487 | - logging module (DEBUG, INFO, etc). If the integer is zero or |
488 | - less, logging will be disabled. Filename is the filename where |
489 | - logs will be sent. If it is "-", then stdout will be used. If |
490 | - the filename is empty or missing, stderr will be used. If the |
491 | - variable cannot be processed or the logging module cannot be |
492 | - imported (python < 2.3) then logging will be disabled. Here are |
493 | - some examples: |
494 | - |
495 | - URLGRABBER_DEBUG=1,debug.txt # log everything to debug.txt |
496 | - URLGRABBER_DEBUG=WARNING,- # log warning and higher to stdout |
497 | - URLGRABBER_DEBUG=INFO # log info and higher to stderr |
498 | - |
499 | - This funtion is called during module initialization. It is not |
500 | - intended to be called from outside. The only reason it is a |
501 | - function at all is to keep the module-level namespace tidy and to |
502 | - collect the code into a nice block.''' |
503 | - |
504 | - try: |
505 | - if logspec is None: |
506 | - logspec = os.environ['URLGRABBER_DEBUG'] |
507 | - dbinfo = logspec.split(',') |
508 | - import logging |
509 | - level = logging._levelNames.get(dbinfo[0], None) |
510 | - if level is None: level = int(dbinfo[0]) |
511 | - if level < 1: raise ValueError() |
512 | - |
513 | - formatter = logging.Formatter('%(asctime)s %(message)s') |
514 | - if len(dbinfo) > 1: filename = dbinfo[1] |
515 | - else: filename = '' |
516 | - if filename == '': handler = logging.StreamHandler(sys.stderr) |
517 | - elif filename == '-': handler = logging.StreamHandler(sys.stdout) |
518 | - else: handler = logging.FileHandler(filename) |
519 | - handler.setFormatter(formatter) |
520 | - DBOBJ = logging.getLogger('urlgrabber') |
521 | - DBOBJ.addHandler(handler) |
522 | - DBOBJ.setLevel(level) |
523 | - except (KeyError, ImportError, ValueError): |
524 | - DBOBJ = None |
525 | - set_logger(DBOBJ) |
526 | - |
527 | -def _log_package_state(): |
528 | - if not DEBUG: return |
529 | - DEBUG.info('urlgrabber version = %s' % __version__) |
530 | - DEBUG.info('trans function "_" = %s' % _) |
531 | - |
532 | -_init_default_logger() |
533 | -_log_package_state() |
534 | - |
535 | - |
536 | -# normally this would be from i18n or something like it ... |
537 | -def _(st): |
538 | - return st |
539 | - |
540 | -######################################################################## |
541 | -# END MODULE INITIALIZATION |
542 | -######################################################################## |
543 | - |
544 | - |
545 | - |
546 | -class URLGrabError(IOError): |
547 | - """ |
548 | - URLGrabError error codes: |
549 | - |
550 | - URLGrabber error codes (0 -- 255) |
551 | - 0 - everything looks good (you should never see this) |
552 | - 1 - malformed url |
553 | - 2 - local file doesn't exist |
554 | - 3 - request for non-file local file (dir, etc) |
555 | - 4 - IOError on fetch |
556 | - 5 - OSError on fetch |
557 | - 6 - no content length header when we expected one |
558 | - 7 - HTTPException |
559 | - 8 - Exceeded read limit (for urlread) |
560 | - 9 - Requested byte range not satisfiable. |
561 | - 10 - Byte range requested, but range support unavailable |
562 | - 11 - Illegal reget mode |
563 | - 12 - Socket timeout |
564 | - 13 - malformed proxy url |
565 | - 14 - HTTPError (includes .code and .exception attributes) |
566 | - 15 - user abort |
567 | - 16 - error writing to local file |
568 | - |
569 | - MirrorGroup error codes (256 -- 511) |
570 | - 256 - No more mirrors left to try |
571 | - |
572 | - Custom (non-builtin) classes derived from MirrorGroup (512 -- 767) |
573 | - [ this range reserved for application-specific error codes ] |
574 | - |
575 | - Retry codes (< 0) |
576 | - -1 - retry the download, unknown reason |
577 | - |
578 | - Note: to test which group a code is in, you can simply do integer |
579 | - division by 256: e.errno / 256 |
580 | - |
581 | - Negative codes are reserved for use by functions passed in to |
582 | - retrygrab with checkfunc. The value -1 is built in as a generic |
583 | - retry code and is already included in the retrycodes list. |
584 | - Therefore, you can create a custom check function that simply |
585 | - returns -1 and the fetch will be re-tried. For more customized |
586 | - retries, you can use other negative number and include them in |
587 | - retry-codes. This is nice for outputting useful messages about |
588 | - what failed. |
589 | - |
590 | - You can use these error codes like so: |
591 | - try: urlgrab(url) |
592 | - except URLGrabError, e: |
593 | - if e.errno == 3: ... |
594 | - # or |
595 | - print e.strerror |
596 | - # or simply |
597 | - print e #### print '[Errno %i] %s' % (e.errno, e.strerror) |
598 | - """ |
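The error-code grouping described in the docstring above ("integer division by 256") can be sketched in plain Python; the `error_group` helper and its group names are mine, the ranges come from the docstring.

```python
def error_group(errno):
    # Negative codes are reserved for retry logic, so handle them first
    # (Python floor division would map -1 to group -1 otherwise).
    if errno < 0:
        return 'retry'
    groups = {
        0: 'urlgrabber',    # codes 0 -- 255
        1: 'mirrorgroup',   # codes 256 -- 511
        2: 'custom',        # codes 512 -- 767, application-specific
    }
    return groups[errno // 256]
```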
599 | - def __init__(self, *args): |
600 | - IOError.__init__(self, *args) |
601 | - self.url = "No url specified" |
602 | - |
603 | -class CallbackObject: |
604 | - """Container for returned callback data. |
605 | - |
606 | - This is currently a dummy class into which urlgrabber can stuff |
607 | - information for passing to callbacks. This way, the prototype for |
608 | - all callbacks is the same, regardless of the data that will be |
609 | - passed back. Any function that accepts a callback function as an |
610 | - argument SHOULD document what it will define in this object. |
611 | - |
612 | - It is possible that this class will have some greater |
613 | - functionality in the future. |
614 | - """ |
615 | - def __init__(self, **kwargs): |
616 | - self.__dict__.update(kwargs) |
617 | - |
618 | -def urlgrab(url, filename=None, **kwargs): |
619 | - """grab the file at <url> and make a local copy at <filename> |
620 | - If filename is none, the basename of the url is used. |
621 | - urlgrab returns the filename of the local file, which may be different |
622 | - from the passed-in filename if the copy_local kwarg == 0. |
623 | - |
624 | - See module documentation for a description of possible kwargs. |
625 | - """ |
626 | - return default_grabber.urlgrab(url, filename, **kwargs) |
627 | - |
628 | -def urlopen(url, **kwargs): |
629 | - """open the url and return a file object |
630 | - If a progress object or throttle specifications exist, then |
631 | - a special file object will be returned that supports them. |
632 | - The file object can be treated like any other file object. |
633 | - |
634 | - See module documentation for a description of possible kwargs. |
635 | - """ |
636 | - return default_grabber.urlopen(url, **kwargs) |
637 | - |
638 | -def urlread(url, limit=None, **kwargs): |
639 | - """read the url into a string, up to 'limit' bytes |
640 | - If the limit is exceeded, an exception will be thrown. Note that urlread |
641 | - is NOT intended to be used as a way of saying "I want the first N bytes" |
642 | - but rather 'read the whole file into memory, but don't use too much' |
643 | - |
644 | - See module documentation for a description of possible kwargs. |
645 | - """ |
646 | - return default_grabber.urlread(url, limit, **kwargs) |
647 | - |
648 | - |
649 | -class URLParser: |
650 | - """Process the URLs before passing them to urllib2. |
651 | - |
652 | - This class does several things: |
653 | - |
654 | - * add any prefix |
655 | - * translate a "raw" file to a proper file: url |
656 | - * handle any http or https auth that's encoded within the url |
657 | - * quote the url |
658 | - |
659 | - Only the "parse" method is called directly, and it calls sub-methods. |
660 | - |
661 | - An instance of this class is held in the options object, which |
662 | - means that it's easy to change the behavior by sub-classing and |
663 | - passing the replacement in. It need only have a method like: |
664 | - |
665 | - url, parts = urlparser.parse(url, opts) |
666 | - """ |
667 | - |
668 | - def parse(self, url, opts): |
669 | - """parse the url and return the (modified) url and its parts |
670 | - |
671 | - Note: a raw file WILL be quoted when it's converted to a URL. |
672 | - However, other urls (ones which come with a proper scheme) may |
673 | - or may not be quoted according to opts.quote |
674 | - |
675 | - opts.quote = 1 --> quote it |
676 | - opts.quote = 0 --> do not quote it |
677 | - opts.quote = None --> guess |
678 | - """ |
679 | - quote = opts.quote |
680 | - |
681 | - if opts.prefix: |
682 | - url = self.add_prefix(url, opts.prefix) |
683 | - |
684 | - parts = urlparse.urlparse(url) |
685 | - (scheme, host, path, parm, query, frag) = parts |
686 | - |
687 | - if not scheme or (len(scheme) == 1 and scheme in string.letters): |
688 | - # if a scheme isn't specified, we guess that it's "file:" |
689 | - if url[0] not in '/\\': url = os.path.abspath(url) |
690 | - url = 'file:' + urllib.pathname2url(url) |
691 | - parts = urlparse.urlparse(url) |
692 | - quote = 0 # pathname2url quotes, so we won't do it again |
693 | - |
694 | - if scheme in ['http', 'https']: |
695 | - parts = self.process_http(parts, url) |
696 | - |
697 | - if quote is None: |
698 | - quote = self.guess_should_quote(parts) |
699 | - if quote: |
700 | - parts = self.quote(parts) |
701 | - |
702 | - url = urlparse.urlunparse(parts) |
703 | - return url, parts |
704 | - |
705 | - def add_prefix(self, url, prefix): |
706 | - if prefix[-1] == '/' or url[0] == '/': |
707 | - url = prefix + url |
708 | - else: |
709 | - url = prefix + '/' + url |
710 | - return url |
711 | - |
712 | - def process_http(self, parts, url): |
713 | - (scheme, host, path, parm, query, frag) = parts |
714 | - # TODO: auth-parsing here, maybe? pycurl doesn't really need it |
715 | - return (scheme, host, path, parm, query, frag) |
716 | - |
717 | - def quote(self, parts): |
718 | - """quote the URL |
719 | - |
720 | - This method quotes ONLY the path part. If you need to quote |
721 | - other parts, you should override this and pass in your derived |
722 | - class. The other alternative is to quote other parts before |
723 | - passing into urlgrabber. |
724 | - """ |
725 | - (scheme, host, path, parm, query, frag) = parts |
726 | - path = urllib.quote(path) |
727 | - return (scheme, host, path, parm, query, frag) |
728 | - |
729 | - hexvals = '0123456789ABCDEF' |
730 | - def guess_should_quote(self, parts): |
731 | - """ |
732 | - Guess whether we should quote a path. This amounts to |
733 | - guessing whether it's already quoted. |
734 | - |
735 | - find ' ' -> 1 |
736 | - find '%' -> 1 |
737 | - find '%XX' -> 0 |
738 | - else -> 1 |
739 | - """ |
740 | - (scheme, host, path, parm, query, frag) = parts |
741 | - if ' ' in path: |
742 | - return 1 |
743 | - ind = string.find(path, '%') |
744 | - if ind > -1: |
745 | - while ind > -1: |
746 | - if len(path) < ind+3: |
747 | - return 1 |
748 | - code = path[ind+1:ind+3].upper() |
749 | - if code[0] not in self.hexvals or \ |
750 | - code[1] not in self.hexvals: |
751 | - return 1 |
752 | - ind = string.find(path, '%', ind+1) |
753 | - return 0 |
754 | - return 1 |
755 | - |
756 | -class URLGrabberOptions: |
757 | - """Class to ease kwargs handling.""" |
758 | - |
759 | - def __init__(self, delegate=None, **kwargs): |
760 | - """Initialize URLGrabberOptions object. |
761 | - Set default values for all options and then update options specified |
762 | - in kwargs. |
763 | - """ |
764 | - self.delegate = delegate |
765 | - if delegate is None: |
766 | - self._set_defaults() |
767 | - self._set_attributes(**kwargs) |
768 | - |
769 | - def __getattr__(self, name): |
770 | - if self.delegate and hasattr(self.delegate, name): |
771 | - return getattr(self.delegate, name) |
772 | - raise AttributeError, name |
773 | - |
774 | - def raw_throttle(self): |
775 | - """Calculate raw throttle value from throttle and bandwidth |
776 | - values. |
777 | - """ |
778 | - if self.throttle <= 0: |
779 | - return 0 |
780 | - elif type(self.throttle) == type(0): |
781 | - return float(self.throttle) |
782 | - else: # throttle is a float |
783 | - return self.bandwidth * self.throttle |
784 | - |
785 | - def derive(self, **kwargs): |
786 | - """Create a derived URLGrabberOptions instance. |
787 | - This method creates a new instance and overrides the |
788 | - options specified in kwargs. |
789 | - """ |
790 | - return URLGrabberOptions(delegate=self, **kwargs) |
791 | - |
792 | - def _set_attributes(self, **kwargs): |
793 | - """Update object attributes with those provided in kwargs.""" |
794 | - self.__dict__.update(kwargs) |
795 | - if kwargs.has_key('range'): |
796 | - # normalize the supplied range value |
797 | - self.range = range_tuple_normalize(self.range) |
798 | - if not self.reget in [None, 'simple', 'check_timestamp']: |
799 | - raise URLGrabError(11, _('Illegal reget mode: %s') \ |
800 | - % (self.reget, )) |
801 | - |
802 | - def _set_defaults(self): |
803 | - """Set all options to their default values. |
804 | - When adding new options, make sure a default is |
805 | - provided here. |
806 | - """ |
807 | - self.progress_obj = None |
808 | - self.throttle = 1.0 |
809 | - self.bandwidth = 0 |
810 | - self.retry = None |
811 | - self.retrycodes = [-1,2,4,5,6,7] |
812 | - self.checkfunc = None |
813 | - self.copy_local = 0 |
814 | - self.close_connection = 0 |
815 | - self.range = None |
816 | - self.user_agent = 'urlgrabber/%s' % __version__ |
817 | - self.keepalive = 1 |
818 | - self.proxies = None |
819 | - self.reget = None |
820 | - self.failure_callback = None |
821 | - self.interrupt_callback = None |
822 | - self.prefix = None |
823 | - self.opener = None |
824 | - self.cache_openers = True |
825 | - self.timeout = None |
826 | - self.text = None |
827 | - self.http_headers = None |
828 | - self.ftp_headers = None |
829 | - self.data = None |
830 | - self.urlparser = URLParser() |
831 | - self.quote = None |
832 | - self.ssl_ca_cert = None # sets SSL_CAINFO - path to certdb |
833 | - self.ssl_context = None # no-op in pycurl |
834 | -        self.ssl_verify_peer = True # check peer's cert for authenticity
835 | - self.ssl_verify_host = True # make sure who they are and who the cert is for matches |
836 | - self.ssl_key = None # client key |
837 | - self.ssl_key_type = 'PEM' #(or DER) |
838 | - self.ssl_cert = None # client cert |
839 | - self.ssl_cert_type = 'PEM' # (or DER) |
840 | - self.ssl_key_pass = None # password to access the key |
841 | - self.size = None # if we know how big the thing we're getting is going |
842 | - # to be. this is ultimately a MAXIMUM size for the file |
843 | - self.max_header_size = 2097152 #2mb seems reasonable for maximum header size |
844 | - |
845 | - def __repr__(self): |
846 | - return self.format() |
847 | - |
848 | - def format(self, indent=' '): |
849 | - keys = self.__dict__.keys() |
850 | - if self.delegate is not None: |
851 | - keys.remove('delegate') |
852 | - keys.sort() |
853 | - s = '{\n' |
854 | - for k in keys: |
855 | - s = s + indent + '%-15s: %s,\n' % \ |
856 | - (repr(k), repr(self.__dict__[k])) |
857 | - if self.delegate: |
858 | - df = self.delegate.format(indent + ' ') |
859 | - s = s + indent + '%-15s: %s\n' % ("'delegate'", df) |
860 | - s = s + indent + '}' |
861 | - return s |
862 | - |
863 | -class URLGrabber: |
864 | - """Provides easy opening of URLs with a variety of options. |
865 | - |
866 | - All options are specified as kwargs. Options may be specified when |
867 | - the class is created and may be overridden on a per request basis. |
868 | - |
869 | - New objects inherit default values from default_grabber. |
870 | - """ |
871 | - |
872 | - def __init__(self, **kwargs): |
873 | - self.opts = URLGrabberOptions(**kwargs) |
874 | - |
875 | - def _retry(self, opts, func, *args): |
876 | - tries = 0 |
877 | - while 1: |
878 | - # there are only two ways out of this loop. The second has |
879 | - # several "sub-ways" |
880 | - # 1) via the return in the "try" block |
881 | - # 2) by some exception being raised |
882 | -            #      a) an exception is raised that we don't "except"
883 | - # b) a callback raises ANY exception |
884 | - # c) we're not retry-ing or have run out of retries |
885 | - # d) the URLGrabError code is not in retrycodes |
886 | - # beware of infinite loops :) |
887 | - tries = tries + 1 |
888 | - exception = None |
889 | - retrycode = None |
890 | - callback = None |
891 | - if DEBUG: DEBUG.info('attempt %i/%s: %s', |
892 | - tries, opts.retry, args[0]) |
893 | - try: |
894 | - r = apply(func, (opts,) + args, {}) |
895 | - if DEBUG: DEBUG.info('success') |
896 | - return r |
897 | - except URLGrabError, e: |
898 | - exception = e |
899 | - callback = opts.failure_callback |
900 | - retrycode = e.errno |
901 | - except KeyboardInterrupt, e: |
902 | - exception = e |
903 | - callback = opts.interrupt_callback |
904 | - |
905 | - if DEBUG: DEBUG.info('exception: %s', exception) |
906 | - if callback: |
907 | - if DEBUG: DEBUG.info('calling callback: %s', callback) |
908 | - cb_func, cb_args, cb_kwargs = self._make_callback(callback) |
909 | - obj = CallbackObject(exception=exception, url=args[0], |
910 | - tries=tries, retry=opts.retry) |
911 | - cb_func(obj, *cb_args, **cb_kwargs) |
912 | - |
913 | - if (opts.retry is None) or (tries == opts.retry): |
914 | - if DEBUG: DEBUG.info('retries exceeded, re-raising') |
915 | - raise |
916 | - |
917 | - if (retrycode is not None) and (retrycode not in opts.retrycodes): |
918 | - if DEBUG: DEBUG.info('retrycode (%i) not in list %s, re-raising', |
919 | - retrycode, opts.retrycodes) |
920 | - raise |
921 | - |
922 | - def urlopen(self, url, **kwargs): |
923 | - """open the url and return a file object |
924 | - If a progress object or throttle value specified when this |
925 | - object was created, then a special file object will be |
926 | - returned that supports them. The file object can be treated |
927 | - like any other file object. |
928 | - """ |
929 | - opts = self.opts.derive(**kwargs) |
930 | - if DEBUG: DEBUG.debug('combined options: %s' % repr(opts)) |
931 | - (url,parts) = opts.urlparser.parse(url, opts) |
932 | - def retryfunc(opts, url): |
933 | - return PyCurlFileObject(url, filename=None, opts=opts) |
934 | - return self._retry(opts, retryfunc, url) |
935 | - |
936 | - def urlgrab(self, url, filename=None, **kwargs): |
937 | - """grab the file at <url> and make a local copy at <filename> |
938 | - If filename is none, the basename of the url is used. |
939 | - urlgrab returns the filename of the local file, which may be |
940 | - different from the passed-in filename if copy_local == 0. |
941 | - """ |
942 | - opts = self.opts.derive(**kwargs) |
943 | - if DEBUG: DEBUG.debug('combined options: %s' % repr(opts)) |
944 | - (url,parts) = opts.urlparser.parse(url, opts) |
945 | - (scheme, host, path, parm, query, frag) = parts |
946 | - if filename is None: |
947 | - filename = os.path.basename( urllib.unquote(path) ) |
948 | - if scheme == 'file' and not opts.copy_local: |
949 | - # just return the name of the local file - don't make a |
950 | - # copy currently |
951 | - path = urllib.url2pathname(path) |
952 | - if host: |
953 | - path = os.path.normpath('//' + host + path) |
954 | - if not os.path.exists(path): |
955 | - err = URLGrabError(2, |
956 | - _('Local file does not exist: %s') % (path, )) |
957 | - err.url = url |
958 | - raise err |
959 | - elif not os.path.isfile(path): |
960 | - err = URLGrabError(3, |
961 | - _('Not a normal file: %s') % (path, )) |
962 | - err.url = url |
963 | - raise err |
964 | - |
965 | - elif not opts.range: |
966 | - if not opts.checkfunc is None: |
967 | - cb_func, cb_args, cb_kwargs = \ |
968 | - self._make_callback(opts.checkfunc) |
969 | - obj = CallbackObject() |
970 | - obj.filename = path |
971 | - obj.url = url |
972 | - apply(cb_func, (obj, )+cb_args, cb_kwargs) |
973 | - return path |
974 | - |
975 | - def retryfunc(opts, url, filename): |
976 | - fo = PyCurlFileObject(url, filename, opts) |
977 | - try: |
978 | - fo._do_grab() |
979 | - if not opts.checkfunc is None: |
980 | - cb_func, cb_args, cb_kwargs = \ |
981 | - self._make_callback(opts.checkfunc) |
982 | - obj = CallbackObject() |
983 | - obj.filename = filename |
984 | - obj.url = url |
985 | - apply(cb_func, (obj, )+cb_args, cb_kwargs) |
986 | - finally: |
987 | - fo.close() |
988 | - return filename |
989 | - |
990 | - return self._retry(opts, retryfunc, url, filename) |
991 | - |
992 | - def urlread(self, url, limit=None, **kwargs): |
993 | - """read the url into a string, up to 'limit' bytes |
994 | - If the limit is exceeded, an exception will be thrown. Note |
995 | - that urlread is NOT intended to be used as a way of saying |
996 | - "I want the first N bytes" but rather 'read the whole file |
997 | - into memory, but don't use too much' |
998 | - """ |
999 | - opts = self.opts.derive(**kwargs) |
1000 | - if DEBUG: DEBUG.debug('combined options: %s' % repr(opts)) |
1001 | - (url,parts) = opts.urlparser.parse(url, opts) |
1002 | - if limit is not None: |
1003 | - limit = limit + 1 |
1004 | - |
1005 | - def retryfunc(opts, url, limit): |
1006 | - fo = PyCurlFileObject(url, filename=None, opts=opts) |
1007 | - s = '' |
1008 | - try: |
1009 | - # this is an unfortunate thing. Some file-like objects |
1010 | - # have a default "limit" of None, while the built-in (real) |
1011 | - # file objects have -1. They each break the other, so for |
1012 | - # now, we just force the default if necessary. |
1013 | - if limit is None: s = fo.read() |
1014 | - else: s = fo.read(limit) |
1015 | - |
1016 | - if not opts.checkfunc is None: |
1017 | - cb_func, cb_args, cb_kwargs = \ |
1018 | - self._make_callback(opts.checkfunc) |
1019 | - obj = CallbackObject() |
1020 | - obj.data = s |
1021 | - obj.url = url |
1022 | - apply(cb_func, (obj, )+cb_args, cb_kwargs) |
1023 | - finally: |
1024 | - fo.close() |
1025 | - return s |
1026 | - |
1027 | - s = self._retry(opts, retryfunc, url, limit) |
1028 | - if limit and len(s) > limit: |
1029 | - err = URLGrabError(8, |
1030 | - _('Exceeded limit (%i): %s') % (limit, url)) |
1031 | - err.url = url |
1032 | - raise err |
1033 | - |
1034 | - return s |
1035 | - |
1036 | - def _make_callback(self, callback_obj): |
1037 | - if callable(callback_obj): |
1038 | - return callback_obj, (), {} |
1039 | - else: |
1040 | - return callback_obj |
1041 | - |
1042 | -# create the default URLGrabber used by urlXXX functions. |
1043 | -# NOTE: actual defaults are set in URLGrabberOptions |
1044 | -default_grabber = URLGrabber() |
1045 | - |
1046 | - |
1047 | -class PyCurlFileObject(): |
1048 | - def __init__(self, url, filename, opts): |
1049 | - self.fo = None |
1050 | - self._hdr_dump = '' |
1051 | - self._parsed_hdr = None |
1052 | - self.url = url |
1053 | - self.scheme = urlparse.urlsplit(self.url)[0] |
1054 | - self.filename = filename |
1055 | - self.append = False |
1056 | - self.reget_time = None |
1057 | - self.opts = opts |
1058 | - if self.opts.reget == 'check_timestamp': |
1059 | - raise NotImplementedError, "check_timestamp regets are not implemented in this ver of urlgrabber. Please report this." |
1060 | - self._complete = False |
1061 | - self._rbuf = '' |
1062 | - self._rbufsize = 1024*8 |
1063 | - self._ttime = time.time() |
1064 | - self._tsize = 0 |
1065 | - self._amount_read = 0 |
1066 | - self._reget_length = 0 |
1067 | - self._prog_running = False |
1068 | - self._error = (None, None) |
1069 | - self.size = None |
1070 | - self._do_open() |
1071 | - |
1072 | - |
1073 | - def __getattr__(self, name): |
1074 | - """This effectively allows us to wrap at the instance level. |
1075 | - Any attribute not found in _this_ object will be searched for |
1076 | - in self.fo. This includes methods.""" |
1077 | - |
1078 | - if hasattr(self.fo, name): |
1079 | - return getattr(self.fo, name) |
1080 | - raise AttributeError, name |
1081 | - |
1082 | - def _retrieve(self, buf): |
1083 | - try: |
1084 | - if not self._prog_running: |
1085 | - if self.opts.progress_obj: |
1086 | - size = self.size + self._reget_length |
1087 | - self.opts.progress_obj.start(self._prog_reportname, |
1088 | - urllib.unquote(self.url), |
1089 | - self._prog_basename, |
1090 | - size=size, |
1091 | - text=self.opts.text) |
1092 | - self._prog_running = True |
1093 | - self.opts.progress_obj.update(self._amount_read) |
1094 | - |
1095 | - self._amount_read += len(buf) |
1096 | - self.fo.write(buf) |
1097 | - return len(buf) |
1098 | - except KeyboardInterrupt: |
1099 | - return -1 |
1100 | - |
1101 | - def _hdr_retrieve(self, buf): |
1102 | - if self._over_max_size(cur=len(self._hdr_dump), |
1103 | - max_size=self.opts.max_header_size): |
1104 | - return -1 |
1105 | - try: |
1106 | - self._hdr_dump += buf |
1107 | - # we have to get the size before we do the progress obj start |
1108 | - # but we can't do that w/o making it do 2 connects, which sucks |
1109 | - # so we cheat and stuff it in here in the hdr_retrieve |
1110 | - if self.scheme in ['http','https'] and buf.lower().find('content-length') != -1: |
1111 | - length = buf.split(':')[1] |
1112 | - self.size = int(length) |
1113 | - elif self.scheme in ['ftp']: |
1114 | - s = None |
1115 | - if buf.startswith('213 '): |
1116 | - s = buf[3:].strip() |
1117 | - elif buf.startswith('150 '): |
1118 | - s = parse150(buf) |
1119 | - if s: |
1120 | - self.size = int(s) |
1121 | - |
1122 | - return len(buf) |
1123 | - except KeyboardInterrupt: |
1124 | - return pycurl.READFUNC_ABORT |
1125 | - |
1126 | - def _return_hdr_obj(self): |
1127 | - if self._parsed_hdr: |
1128 | - return self._parsed_hdr |
1129 | - statusend = self._hdr_dump.find('\n') |
1130 | - hdrfp = StringIO() |
1131 | - hdrfp.write(self._hdr_dump[statusend:]) |
1132 | - self._parsed_hdr = mimetools.Message(hdrfp) |
1133 | - return self._parsed_hdr |
1134 | - |
1135 | - hdr = property(_return_hdr_obj) |
1136 | - http_code = property(fget= |
1137 | - lambda self: self.curl_obj.getinfo(pycurl.RESPONSE_CODE)) |
1138 | - |
1139 | - def _set_opts(self, opts={}): |
1140 | - # XXX |
1141 | - if not opts: |
1142 | - opts = self.opts |
1143 | - |
1144 | - |
1145 | - # defaults we're always going to set |
1146 | - self.curl_obj.setopt(pycurl.NOPROGRESS, False) |
1147 | - self.curl_obj.setopt(pycurl.NOSIGNAL, True) |
1148 | - self.curl_obj.setopt(pycurl.WRITEFUNCTION, self._retrieve) |
1149 | - self.curl_obj.setopt(pycurl.HEADERFUNCTION, self._hdr_retrieve) |
1150 | - self.curl_obj.setopt(pycurl.PROGRESSFUNCTION, self._progress_update) |
1151 | - self.curl_obj.setopt(pycurl.FAILONERROR, True) |
1152 | - self.curl_obj.setopt(pycurl.OPT_FILETIME, True) |
1153 | - |
1154 | - if DEBUG: |
1155 | - self.curl_obj.setopt(pycurl.VERBOSE, True) |
1156 | - if opts.user_agent: |
1157 | - self.curl_obj.setopt(pycurl.USERAGENT, opts.user_agent) |
1158 | - |
1159 | - # maybe to be options later |
1160 | - self.curl_obj.setopt(pycurl.FOLLOWLOCATION, True) |
1161 | - self.curl_obj.setopt(pycurl.MAXREDIRS, 5) |
1162 | - |
1163 | - # timeouts |
1164 | - timeout = 300 |
1165 | - if opts.timeout: |
1166 | - timeout = int(opts.timeout) |
1167 | - self.curl_obj.setopt(pycurl.CONNECTTIMEOUT, timeout) |
1168 | - |
1169 | - # ssl options |
1170 | - if self.scheme == 'https': |
1171 | - if opts.ssl_ca_cert: # this may do ZERO with nss according to curl docs |
1172 | - self.curl_obj.setopt(pycurl.CAPATH, opts.ssl_ca_cert) |
1173 | - self.curl_obj.setopt(pycurl.CAINFO, opts.ssl_ca_cert) |
1174 | - self.curl_obj.setopt(pycurl.SSL_VERIFYPEER, opts.ssl_verify_peer) |
1175 | - self.curl_obj.setopt(pycurl.SSL_VERIFYHOST, opts.ssl_verify_host) |
1176 | - if opts.ssl_key: |
1177 | - self.curl_obj.setopt(pycurl.SSLKEY, opts.ssl_key) |
1178 | - if opts.ssl_key_type: |
1179 | - self.curl_obj.setopt(pycurl.SSLKEYTYPE, opts.ssl_key_type) |
1180 | - if opts.ssl_cert: |
1181 | - self.curl_obj.setopt(pycurl.SSLCERT, opts.ssl_cert) |
1182 | - if opts.ssl_cert_type: |
1183 | - self.curl_obj.setopt(pycurl.SSLCERTTYPE, opts.ssl_cert_type) |
1184 | - if opts.ssl_key_pass: |
1185 | - self.curl_obj.setopt(pycurl.SSLKEYPASSWD, opts.ssl_key_pass) |
1186 | - |
1187 | - #headers: |
1188 | - if opts.http_headers and self.scheme in ('http', 'https'): |
1189 | - headers = [] |
1190 | - for (tag, content) in opts.http_headers: |
1191 | - headers.append('%s:%s' % (tag, content)) |
1192 | - self.curl_obj.setopt(pycurl.HTTPHEADER, headers) |
1193 | - |
1194 | - # ranges: |
1195 | - if opts.range or opts.reget: |
1196 | - range_str = self._build_range() |
1197 | - if range_str: |
1198 | - self.curl_obj.setopt(pycurl.RANGE, range_str) |
1199 | - |
1200 | - # throttle/bandwidth |
1201 | - if hasattr(opts, 'raw_throttle') and opts.raw_throttle(): |
1202 | - self.curl_obj.setopt(pycurl.MAX_RECV_SPEED_LARGE, int(opts.raw_throttle())) |
1203 | - |
1204 | - # proxy settings |
1205 | - if opts.proxies: |
1206 | - for (scheme, proxy) in opts.proxies.items(): |
1207 | -                if self.scheme in ('ftp',): # only set the ftp proxy for ftp items
1208 | -                    if scheme not in ('ftp',):
1209 | - continue |
1210 | - else: |
1211 | - if proxy == '_none_': proxy = "" |
1212 | - self.curl_obj.setopt(pycurl.PROXY, proxy) |
1213 | - elif self.scheme in ('http', 'https'): |
1214 | - if scheme not in ('http', 'https'): |
1215 | - continue |
1216 | - else: |
1217 | - if proxy == '_none_': proxy = "" |
1218 | - self.curl_obj.setopt(pycurl.PROXY, proxy) |
1219 | - |
1220 | - # FIXME username/password/auth settings |
1221 | - |
1222 | - #posts - simple - expects the fields as they are |
1223 | - if opts.data: |
1224 | - self.curl_obj.setopt(pycurl.POST, True) |
1225 | - self.curl_obj.setopt(pycurl.POSTFIELDS, self._to_utf8(opts.data)) |
1226 | - |
1227 | - # our url |
1228 | - self.curl_obj.setopt(pycurl.URL, self.url) |
1229 | - |
1230 | - |
1231 | - def _do_perform(self): |
1232 | - if self._complete: |
1233 | - return |
1234 | - |
1235 | - try: |
1236 | - self.curl_obj.perform() |
1237 | - except pycurl.error, e: |
1238 | - # XXX - break some of these out a bit more clearly |
1239 | - # to other URLGrabErrors from |
1240 | - # http://curl.haxx.se/libcurl/c/libcurl-errors.html |
1241 | - # this covers e.args[0] == 22 pretty well - which will be common |
1242 | - |
1243 | - code = self.http_code |
1244 | - errcode = e.args[0] |
1245 | - if self._error[0]: |
1246 | - errcode = self._error[0] |
1247 | - |
1248 | - if errcode == 23 and code >= 200 and code < 299: |
1249 | - err = URLGrabError(15, _('User (or something) called abort %s: %s') % (self.url, e)) |
1250 | - err.url = self.url |
1251 | - |
1252 | - # this is probably wrong but ultimately this is what happens |
1253 | - # we have a legit http code and a pycurl 'writer failed' code |
1254 | - # which almost always means something aborted it from outside |
1255 | - # since we cannot know what it is -I'm banking on it being |
1256 | - # a ctrl-c. XXXX - if there's a way of going back two raises to |
1257 | - # figure out what aborted the pycurl process FIXME |
1258 | - raise KeyboardInterrupt |
1259 | - |
1260 | - elif errcode == 28: |
1261 | - err = URLGrabError(12, _('Timeout on %s: %s') % (self.url, e)) |
1262 | - err.url = self.url |
1263 | - raise err |
1264 | - elif errcode == 35: |
1265 | - msg = _("problem making ssl connection") |
1266 | - err = URLGrabError(14, msg) |
1267 | - err.url = self.url |
1268 | - raise err |
1269 | - elif errcode == 37: |
1270 | - msg = _("Could not open/read %s") % (self.url) |
1271 | - err = URLGrabError(14, msg) |
1272 | - err.url = self.url |
1273 | - raise err |
1274 | - |
1275 | - elif errcode == 42: |
1276 | - err = URLGrabError(15, _('User (or something) called abort %s: %s') % (self.url, e)) |
1277 | - err.url = self.url |
1278 | - # this is probably wrong but ultimately this is what happens |
1279 | - # we have a legit http code and a pycurl 'writer failed' code |
1280 | - # which almost always means something aborted it from outside |
1281 | - # since we cannot know what it is -I'm banking on it being |
1282 | - # a ctrl-c. XXXX - if there's a way of going back two raises to |
1283 | - # figure out what aborted the pycurl process FIXME |
1284 | - raise KeyboardInterrupt |
1285 | - |
1286 | - elif errcode == 58: |
1287 | - msg = _("problem with the local client certificate") |
1288 | - err = URLGrabError(14, msg) |
1289 | - err.url = self.url |
1290 | - raise err |
1291 | - |
1292 | - elif errcode == 60: |
1293 | - msg = _("client cert cannot be verified or client cert incorrect") |
1294 | - err = URLGrabError(14, msg) |
1295 | - err.url = self.url |
1296 | - raise err |
1297 | - |
1298 | - elif errcode == 63: |
1299 | - if self._error[1]: |
1300 | - msg = self._error[1] |
1301 | - else: |
1302 | - msg = _("Max download size exceeded on %s") % (self.url) |
1303 | - err = URLGrabError(14, msg) |
1304 | - err.url = self.url |
1305 | - raise err |
1306 | - |
1307 | - elif str(e.args[1]) == '' and self.http_code != 0: # fake it until you make it |
1308 | - msg = 'HTTP Error %s : %s ' % (self.http_code, self.url) |
1309 | - else: |
1310 | - msg = 'PYCURL ERROR %s - "%s"' % (errcode, str(e.args[1])) |
1311 | - code = errcode |
1312 | - err = URLGrabError(14, msg) |
1313 | - err.code = code |
1314 | - err.exception = e |
1315 | - raise err |
1316 | - |
1317 | - def _do_open(self): |
1318 | - self.curl_obj = _curl_cache |
1319 | - self.curl_obj.reset() # reset all old settings away, just in case |
1320 | - # setup any ranges |
1321 | - self._set_opts() |
1322 | - self._do_grab() |
1323 | - return self.fo |
1324 | - |
1325 | - def _add_headers(self): |
1326 | - pass |
1327 | - |
1328 | - def _build_range(self): |
1329 | - reget_length = 0 |
1330 | - rt = None |
1331 | - if self.opts.reget and type(self.filename) in types.StringTypes: |
1332 | - # we have reget turned on and we're dumping to a file |
1333 | - try: |
1334 | - s = os.stat(self.filename) |
1335 | - except OSError: |
1336 | - pass |
1337 | - else: |
1338 | - self.reget_time = s[stat.ST_MTIME] |
1339 | - reget_length = s[stat.ST_SIZE] |
1340 | - |
1341 | - # Set initial length when regetting |
1342 | - self._amount_read = reget_length |
1343 | - self._reget_length = reget_length # set where we started from, too |
1344 | - |
1345 | - rt = reget_length, '' |
1346 | - self.append = 1 |
1347 | - |
1348 | - if self.opts.range: |
1349 | - rt = self.opts.range |
1350 | - if rt[0]: rt = (rt[0] + reget_length, rt[1]) |
1351 | - |
1352 | - if rt: |
1353 | - header = range_tuple_to_header(rt) |
1354 | - if header: |
1355 | - return header.split('=')[1] |
1356 | - |
1357 | - |
1358 | - |
1359 | - def _make_request(self, req, opener): |
1360 | - #XXXX |
1361 | - # This doesn't do anything really, but we could use this |
1362 | - # instead of do_open() to catch a lot of crap errors as |
1363 | - # mstenner did before here |
1364 | - return (self.fo, self.hdr) |
1365 | - |
1366 | - try: |
1367 | - if self.opts.timeout: |
1368 | - old_to = socket.getdefaulttimeout() |
1369 | - socket.setdefaulttimeout(self.opts.timeout) |
1370 | - try: |
1371 | - fo = opener.open(req) |
1372 | - finally: |
1373 | - socket.setdefaulttimeout(old_to) |
1374 | - else: |
1375 | - fo = opener.open(req) |
1376 | - hdr = fo.info() |
1377 | - except ValueError, e: |
1378 | - err = URLGrabError(1, _('Bad URL: %s : %s') % (self.url, e, )) |
1379 | - err.url = self.url |
1380 | - raise err |
1381 | - |
1382 | - except RangeError, e: |
1383 | - err = URLGrabError(9, _('%s on %s') % (e, self.url)) |
1384 | - err.url = self.url |
1385 | - raise err |
1386 | - except urllib2.HTTPError, e: |
1387 | - new_e = URLGrabError(14, _('%s on %s') % (e, self.url)) |
1388 | - new_e.code = e.code |
1389 | - new_e.exception = e |
1390 | - new_e.url = self.url |
1391 | - raise new_e |
1392 | - except IOError, e: |
1393 | - if hasattr(e, 'reason') and isinstance(e.reason, socket.timeout): |
1394 | - err = URLGrabError(12, _('Timeout on %s: %s') % (self.url, e)) |
1395 | - err.url = self.url |
1396 | - raise err |
1397 | - else: |
1398 | - err = URLGrabError(4, _('IOError on %s: %s') % (self.url, e)) |
1399 | - err.url = self.url |
1400 | - raise err |
1401 | - |
1402 | - except OSError, e: |
1403 | - err = URLGrabError(5, _('%s on %s') % (e, self.url)) |
1404 | - err.url = self.url |
1405 | - raise err |
1406 | - |
1407 | - except HTTPException, e: |
1408 | - err = URLGrabError(7, _('HTTP Exception (%s) on %s: %s') % \ |
1409 | - (e.__class__.__name__, self.url, e)) |
1410 | - err.url = self.url |
1411 | - raise err |
1412 | - |
1413 | - else: |
1414 | - return (fo, hdr) |
1415 | - |
1416 | - def _do_grab(self): |
1417 | - """dump the file to a filename or StringIO buffer""" |
1418 | - |
1419 | - if self._complete: |
1420 | - return |
1421 | - _was_filename = False |
1422 | - if type(self.filename) in types.StringTypes and self.filename: |
1423 | - _was_filename = True |
1424 | - self._prog_reportname = str(self.filename) |
1425 | - self._prog_basename = os.path.basename(self.filename) |
1426 | - |
1427 | - if self.append: mode = 'ab' |
1428 | - else: mode = 'wb' |
1429 | - |
1430 | - if DEBUG: DEBUG.info('opening local file "%s" with mode %s' % \ |
1431 | - (self.filename, mode)) |
1432 | - try: |
1433 | - self.fo = open(self.filename, mode) |
1434 | - except IOError, e: |
1435 | - err = URLGrabError(16, _(\ |
1436 | - 'error opening local file from %s, IOError: %s') % (self.url, e)) |
1437 | - err.url = self.url |
1438 | - raise err |
1439 | - |
1440 | - else: |
1441 | - self._prog_reportname = 'MEMORY' |
1442 | - self._prog_basename = 'MEMORY' |
1443 | - |
1444 | - |
1445 | - self.fo = StringIO() |
1446 | - # if this is to be a tempfile instead.... |
1447 | - # it just makes crap in the tempdir |
1448 | - #fh, self._temp_name = mkstemp() |
1449 | - #self.fo = open(self._temp_name, 'wb') |
1450 | - |
1451 | - |
1452 | - self._do_perform() |
1453 | - |
1454 | - |
1455 | - |
1456 | - if _was_filename: |
1457 | - # close it up |
1458 | - self.fo.flush() |
1459 | - self.fo.close() |
1460 | - # set the time |
1461 | - mod_time = self.curl_obj.getinfo(pycurl.INFO_FILETIME) |
1462 | - if mod_time != -1: |
1463 | - os.utime(self.filename, (mod_time, mod_time)) |
1464 | - # re open it |
1465 | - self.fo = open(self.filename, 'r') |
1466 | - else: |
1467 | - #self.fo = open(self._temp_name, 'r') |
1468 | - self.fo.seek(0) |
1469 | - |
1470 | - self._complete = True |
1471 | - |
1472 | - def _fill_buffer(self, amt=None): |
1473 | - """fill the buffer to contain at least 'amt' bytes by reading |
1474 | - from the underlying file object. If amt is None, then it will |
1475 | - read until it gets nothing more. It updates the progress meter |
1476 | - and throttles after every self._rbufsize bytes.""" |
1477 | - # the _rbuf test is only in this first 'if' for speed. It's not |
1478 | - # logically necessary |
1479 | - if self._rbuf and not amt is None: |
1480 | - L = len(self._rbuf) |
1481 | - if amt > L: |
1482 | - amt = amt - L |
1483 | - else: |
1484 | - return |
1485 | - |
1486 | - # if we've made it here, then we don't have enough in the buffer |
1487 | - # and we need to read more. |
1488 | - |
1489 | - if not self._complete: self._do_grab() #XXX cheater - change on ranges |
1490 | - |
1491 | - buf = [self._rbuf] |
1492 | - bufsize = len(self._rbuf) |
1493 | - while amt is None or amt: |
1494 | - # first, delay if necessary for throttling reasons |
1495 | - if self.opts.raw_throttle(): |
1496 | - diff = self._tsize/self.opts.raw_throttle() - \ |
1497 | - (time.time() - self._ttime) |
1498 | - if diff > 0: time.sleep(diff) |
1499 | - self._ttime = time.time() |
1500 | - |
1501 | - # now read some data, up to self._rbufsize |
1502 | - if amt is None: readamount = self._rbufsize |
1503 | - else: readamount = min(amt, self._rbufsize) |
1504 | - try: |
1505 | - new = self.fo.read(readamount) |
1506 | - except socket.error, e: |
1507 | - err = URLGrabError(4, _('Socket Error on %s: %s') % (self.url, e)) |
1508 | - err.url = self.url |
1509 | - raise err |
1510 | - |
1511 | - except socket.timeout, e: |
1512 | -                err = URLGrabError(12, _('Timeout on %s: %s') % (self.url, e))
1513 | - err.url = self.url |
1514 | - raise err |
1515 | - |
1516 | - except IOError, e: |
1517 | -                err = URLGrabError(4, _('IOError on %s: %s') % (self.url, e))
1518 | - err.url = self.url |
1519 | - raise err |
1520 | - |
1521 | - newsize = len(new) |
1522 | - if not newsize: break # no more to read |
1523 | - |
1524 | - if amt: amt = amt - newsize |
1525 | - buf.append(new) |
1526 | - bufsize = bufsize + newsize |
1527 | - self._tsize = newsize |
1528 | - self._amount_read = self._amount_read + newsize |
1529 | - #if self.opts.progress_obj: |
1530 | - # self.opts.progress_obj.update(self._amount_read) |
1531 | - |
1532 | - self._rbuf = string.join(buf, '') |
1533 | - return |
1534 | - |
1535 | - def _progress_update(self, download_total, downloaded, upload_total, uploaded): |
1536 | - if self._over_max_size(cur=self._amount_read-self._reget_length): |
1537 | - return -1 |
1538 | - |
1539 | - try: |
1540 | - if self._prog_running: |
1541 | - downloaded += self._reget_length |
1542 | - self.opts.progress_obj.update(downloaded) |
1543 | - except KeyboardInterrupt: |
1544 | - return -1 |
1545 | - |
1546 | - def _over_max_size(self, cur, max_size=None): |
1547 | - |
1548 | - if not max_size: |
1549 | - max_size = self.size |
1550 | - if self.opts.size: # if we set an opts size use that, no matter what |
1551 | - max_size = self.opts.size |
1552 | - if not max_size: return False # if we have None for all of the Max then this is dumb |
1553 | - if cur > max_size + max_size*.10: |
1554 | - |
1555 | - msg = _("Downloaded more than max size for %s: %s > %s") \ |
1556 | - % (self.url, cur, max_size) |
1557 | - self._error = (pycurl.E_FILESIZE_EXCEEDED, msg) |
1558 | - return True |
1559 | - return False |
1560 | - |
1561 | - def _to_utf8(self, obj, errors='replace'): |
1562 | - '''convert 'unicode' to an encoded utf-8 byte string ''' |
1563 | - # stolen from yum.i18n |
1564 | - if isinstance(obj, unicode): |
1565 | - obj = obj.encode('utf-8', errors) |
1566 | - return obj |
1567 | - |
1568 | - def read(self, amt=None): |
1569 | - self._fill_buffer(amt) |
1570 | - if amt is None: |
1571 | - s, self._rbuf = self._rbuf, '' |
1572 | - else: |
1573 | - s, self._rbuf = self._rbuf[:amt], self._rbuf[amt:] |
1574 | - return s |
1575 | - |
1576 | - def readline(self, limit=-1): |
1577 | - if not self._complete: self._do_grab() |
1578 | - return self.fo.readline() |
1579 | - |
1580 | - i = string.find(self._rbuf, '\n') |
1581 | - while i < 0 and not (0 < limit <= len(self._rbuf)): |
1582 | - L = len(self._rbuf) |
1583 | - self._fill_buffer(L + self._rbufsize) |
1584 | - if not len(self._rbuf) > L: break |
1585 | - i = string.find(self._rbuf, '\n', L) |
1586 | - |
1587 | - if i < 0: i = len(self._rbuf) |
1588 | - else: i = i+1 |
1589 | - if 0 <= limit < len(self._rbuf): i = limit |
1590 | - |
1591 | - s, self._rbuf = self._rbuf[:i], self._rbuf[i:] |
1592 | - return s |
1593 | - |
1594 | - def close(self): |
1595 | - if self._prog_running: |
1596 | - self.opts.progress_obj.end(self._amount_read) |
1597 | - self.fo.close() |
1598 | - |
1599 | - |
1600 | -_curl_cache = pycurl.Curl() # make one and reuse it over and over and over |
1601 | - |
1602 | - |
1603 | -##################################################################### |
1604 | -# DEPRECATED FUNCTIONS |
1605 | -def set_throttle(new_throttle): |
1606 | - """Deprecated. Use: default_grabber.throttle = new_throttle""" |
1607 | - default_grabber.throttle = new_throttle |
1608 | - |
1609 | -def set_bandwidth(new_bandwidth): |
1610 | - """Deprecated. Use: default_grabber.bandwidth = new_bandwidth""" |
1611 | - default_grabber.bandwidth = new_bandwidth |
1612 | - |
1613 | -def set_progress_obj(new_progress_obj): |
1614 | - """Deprecated. Use: default_grabber.progress_obj = new_progress_obj""" |
1615 | - default_grabber.progress_obj = new_progress_obj |
1616 | - |
1617 | -def set_user_agent(new_user_agent): |
1618 | - """Deprecated. Use: default_grabber.user_agent = new_user_agent""" |
1619 | - default_grabber.user_agent = new_user_agent |
1620 | - |
1621 | -def retrygrab(url, filename=None, copy_local=0, close_connection=0, |
1622 | - progress_obj=None, throttle=None, bandwidth=None, |
1623 | - numtries=3, retrycodes=[-1,2,4,5,6,7], checkfunc=None): |
1624 | - """Deprecated. Use: urlgrab() with the retry arg instead""" |
1625 | - kwargs = {'copy_local' : copy_local, |
1626 | - 'close_connection' : close_connection, |
1627 | - 'progress_obj' : progress_obj, |
1628 | - 'throttle' : throttle, |
1629 | - 'bandwidth' : bandwidth, |
1630 | - 'retry' : numtries, |
1631 | - 'retrycodes' : retrycodes, |
1632 | - 'checkfunc' : checkfunc |
1633 | - } |
1634 | - return urlgrab(url, filename, **kwargs) |
1635 | - |
1636 | - |
1637 | -##################################################################### |
1638 | -# TESTING |
1639 | -def _main_test(): |
1640 | - try: url, filename = sys.argv[1:3] |
1641 | - except ValueError: |
1642 | - print 'usage:', sys.argv[0], \ |
1643 | - '<url> <filename> [copy_local=0|1] [close_connection=0|1]' |
1644 | - sys.exit() |
1645 | - |
1646 | - kwargs = {} |
1647 | - for a in sys.argv[3:]: |
1648 | - k, v = string.split(a, '=', 1) |
1649 | - kwargs[k] = int(v) |
1650 | - |
1651 | - set_throttle(1.0) |
1652 | - set_bandwidth(32 * 1024) |
1653 | - print "throttle: %s, throttle bandwidth: %s B/s" % (default_grabber.throttle, |
1654 | - default_grabber.bandwidth) |
1655 | - |
1656 | - try: from progress import text_progress_meter |
1657 | - except ImportError, e: pass |
1658 | - else: kwargs['progress_obj'] = text_progress_meter() |
1659 | - |
1660 | - try: name = apply(urlgrab, (url, filename), kwargs) |
1661 | - except URLGrabError, e: print e |
1662 | - else: print 'LOCAL FILE:', name |
1663 | - |
1664 | - |
1665 | -def _retry_test(): |
1666 | - try: url, filename = sys.argv[1:3] |
1667 | - except ValueError: |
1668 | - print 'usage:', sys.argv[0], \ |
1669 | - '<url> <filename> [copy_local=0|1] [close_connection=0|1]' |
1670 | - sys.exit() |
1671 | - |
1672 | - kwargs = {} |
1673 | - for a in sys.argv[3:]: |
1674 | - k, v = string.split(a, '=', 1) |
1675 | - kwargs[k] = int(v) |
1676 | - |
1677 | - try: from progress import text_progress_meter |
1678 | - except ImportError, e: pass |
1679 | - else: kwargs['progress_obj'] = text_progress_meter() |
1680 | - |
1681 | - def cfunc(filename, hello, there='foo'): |
1682 | - print hello, there |
1683 | - import random |
1684 | - rnum = random.random() |
1685 | - if rnum < .5: |
1686 | - print 'forcing retry' |
1687 | - raise URLGrabError(-1, 'forcing retry') |
1688 | - if rnum < .75: |
1689 | - print 'forcing failure' |
1690 | - raise URLGrabError(-2, 'forcing immediate failure') |
1691 | - print 'success' |
1692 | - return |
1693 | - |
1694 | - kwargs['checkfunc'] = (cfunc, ('hello',), {'there':'there'}) |
1695 | - try: name = apply(retrygrab, (url, filename), kwargs) |
1696 | - except URLGrabError, e: print e |
1697 | - else: print 'LOCAL FILE:', name |
1698 | - |
1699 | -def _file_object_test(filename=None): |
1700 | - import cStringIO |
1701 | - if filename is None: |
1702 | - filename = __file__ |
1703 | - print 'using file "%s" for comparisons' % filename |
1704 | - fo = open(filename) |
1705 | - s_input = fo.read() |
1706 | - fo.close() |
1707 | - |
1708 | - for testfunc in [_test_file_object_smallread, |
1709 | - _test_file_object_readall, |
1710 | - _test_file_object_readline, |
1711 | - _test_file_object_readlines]: |
1712 | - fo_input = cStringIO.StringIO(s_input) |
1713 | - fo_output = cStringIO.StringIO() |
1714 | - wrapper = PyCurlFileObject(fo_input, None, 0) |
1715 | - print 'testing %-30s ' % testfunc.__name__, |
1716 | - testfunc(wrapper, fo_output) |
1717 | - s_output = fo_output.getvalue() |
1718 | - if s_output == s_input: print 'passed' |
1719 | - else: print 'FAILED' |
1720 | - |
1721 | -def _test_file_object_smallread(wrapper, fo_output): |
1722 | - while 1: |
1723 | - s = wrapper.read(23) |
1724 | - fo_output.write(s) |
1725 | - if not s: return |
1726 | - |
1727 | -def _test_file_object_readall(wrapper, fo_output): |
1728 | - s = wrapper.read() |
1729 | - fo_output.write(s) |
1730 | - |
1731 | -def _test_file_object_readline(wrapper, fo_output): |
1732 | - while 1: |
1733 | - s = wrapper.readline() |
1734 | - fo_output.write(s) |
1735 | - if not s: return |
1736 | - |
1737 | -def _test_file_object_readlines(wrapper, fo_output): |
1738 | - li = wrapper.readlines() |
1739 | - fo_output.write(string.join(li, '')) |
1740 | - |
1741 | -if __name__ == '__main__': |
1742 | - _main_test() |
1743 | - _retry_test() |
1744 | - _file_object_test('test') |
1745 | |
1746 | === removed directory '.pc/progress_fix.diff' |
1747 | === removed directory '.pc/progress_fix.diff/urlgrabber' |
1748 | === removed file '.pc/progress_fix.diff/urlgrabber/progress.py' |
1749 | --- .pc/progress_fix.diff/urlgrabber/progress.py 2010-07-08 17:40:08 +0000 |
1750 | +++ .pc/progress_fix.diff/urlgrabber/progress.py 1970-01-01 00:00:00 +0000 |
1751 | @@ -1,755 +0,0 @@ |
1752 | -# This library is free software; you can redistribute it and/or |
1753 | -# modify it under the terms of the GNU Lesser General Public |
1754 | -# License as published by the Free Software Foundation; either |
1755 | -# version 2.1 of the License, or (at your option) any later version. |
1756 | -# |
1757 | -# This library is distributed in the hope that it will be useful, |
1758 | -# but WITHOUT ANY WARRANTY; without even the implied warranty of |
1759 | -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU |
1760 | -# Lesser General Public License for more details. |
1761 | -# |
1762 | -# You should have received a copy of the GNU Lesser General Public |
1763 | -# License along with this library; if not, write to the |
1764 | -# Free Software Foundation, Inc., |
1765 | -# 59 Temple Place, Suite 330, |
1766 | -# Boston, MA 02111-1307 USA |
1767 | - |
1768 | -# This file is part of urlgrabber, a high-level cross-protocol url-grabber |
1769 | -# Copyright 2002-2004 Michael D. Stenner, Ryan Tomayko |
1770 | - |
1771 | - |
1772 | -import sys |
1773 | -import time |
1774 | -import math |
1775 | -import thread |
1776 | -import fcntl |
1777 | -import struct |
1778 | -import termios |
1779 | - |
1780 | -# Code from http://mail.python.org/pipermail/python-list/2000-May/033365.html |
1781 | -def terminal_width(fd=1): |
1782 | - """ Get the real terminal width """ |
1783 | - try: |
1784 | - buf = 'abcdefgh' |
1785 | - buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, buf) |
1786 | - ret = struct.unpack('hhhh', buf)[1] |
1787 | - if ret == 0: |
1788 | - return 80 |
1789 | - # Add minimum too? |
1790 | - return ret |
1791 | - except: # IOError |
1792 | - return 80 |
1793 | - |
1794 | -_term_width_val = None |
1795 | -_term_width_last = None |
1796 | -def terminal_width_cached(fd=1, cache_timeout=1.000): |
1797 | - """ Get the real terminal width, but cache it for a bit. """ |
1798 | - global _term_width_val |
1799 | - global _term_width_last |
1800 | - |
1801 | - now = time.time() |
1802 | - if _term_width_val is None or (now - _term_width_last) > cache_timeout: |
1803 | - _term_width_val = terminal_width(fd) |
1804 | - _term_width_last = now |
1805 | - return _term_width_val |
1806 | - |
1807 | -class TerminalLine: |
1808 | - """ Help create dynamic progress bars, uses terminal_width_cached(). """ |
1809 | - |
1810 | - def __init__(self, min_rest=0, beg_len=None, fd=1, cache_timeout=1.000): |
1811 | - if beg_len is None: |
1812 | - beg_len = min_rest |
1813 | - self._min_len = min_rest |
1814 | - self._llen = terminal_width_cached(fd, cache_timeout) |
1815 | - if self._llen < beg_len: |
1816 | - self._llen = beg_len |
1817 | - self._fin = False |
1818 | - |
1819 | - def __len__(self): |
1820 | - """ Usable length for elements. """ |
1821 | - return self._llen - self._min_len |
1822 | - |
1823 | - def rest_split(self, fixed, elements=2): |
1824 | - """ After a fixed length, split the rest of the line length among |
1825 | - a number of different elements (default=2). """ |
1826 | - if self._llen < fixed: |
1827 | - return 0 |
1828 | - return (self._llen - fixed) / elements |
1829 | - |
1830 | - def add(self, element, full_len=None): |
1831 | - """ If there is room left in the line, above min_len, add element. |
1832 | - Note that as soon as one add fails all the rest will fail too. """ |
1833 | - |
1834 | - if full_len is None: |
1835 | - full_len = len(element) |
1836 | - if len(self) < full_len: |
1837 | - self._fin = True |
1838 | - if self._fin: |
1839 | - return '' |
1840 | - |
1841 | - self._llen -= len(element) |
1842 | - return element |
1843 | - |
1844 | - def rest(self): |
1845 | - """ Current rest of line, same as .rest_split(fixed=0, elements=1). """ |
1846 | - return self._llen |
1847 | - |
1848 | -class BaseMeter: |
1849 | - def __init__(self): |
1850 | - self.update_period = 0.3 # seconds |
1851 | - |
1852 | - self.filename = None |
1853 | - self.url = None |
1854 | - self.basename = None |
1855 | - self.text = None |
1856 | - self.size = None |
1857 | - self.start_time = None |
1858 | - self.last_amount_read = 0 |
1859 | - self.last_update_time = None |
1860 | - self.re = RateEstimator() |
1861 | - |
1862 | - def start(self, filename=None, url=None, basename=None, |
1863 | - size=None, now=None, text=None): |
1864 | - self.filename = filename |
1865 | - self.url = url |
1866 | - self.basename = basename |
1867 | - self.text = text |
1868 | - |
1869 | - #size = None ######### TESTING |
1870 | - self.size = size |
1871 | - if not size is None: self.fsize = format_number(size) + 'B' |
1872 | - |
1873 | - if now is None: now = time.time() |
1874 | - self.start_time = now |
1875 | - self.re.start(size, now) |
1876 | - self.last_amount_read = 0 |
1877 | - self.last_update_time = now |
1878 | - self._do_start(now) |
1879 | - |
1880 | - def _do_start(self, now=None): |
1881 | - pass |
1882 | - |
1883 | - def update(self, amount_read, now=None): |
1884 | - # for a real gui, you probably want to override and put a call |
1885 | - # to your mainloop iteration function here |
1886 | - if now is None: now = time.time() |
1887 | - if (now >= self.last_update_time + self.update_period) or \ |
1888 | - not self.last_update_time: |
1889 | - self.re.update(amount_read, now) |
1890 | - self.last_amount_read = amount_read |
1891 | - self.last_update_time = now |
1892 | - self._do_update(amount_read, now) |
1893 | - |
1894 | - def _do_update(self, amount_read, now=None): |
1895 | - pass |
1896 | - |
1897 | - def end(self, amount_read, now=None): |
1898 | - if now is None: now = time.time() |
1899 | - self.re.update(amount_read, now) |
1900 | - self.last_amount_read = amount_read |
1901 | - self.last_update_time = now |
1902 | - self._do_end(amount_read, now) |
1903 | - |
1904 | - def _do_end(self, amount_read, now=None): |
1905 | - pass |
1906 | - |
1907 | -# This is kind of a hack, but progress is gotten from grabber which doesn't |
1908 | -# know about the total size to download. So we do this so we can get the data |
1909 | -# out of band here. This will be "fixed" one way or anther soon. |
1910 | -_text_meter_total_size = 0 |
1911 | -_text_meter_sofar_size = 0 |
1912 | -def text_meter_total_size(size, downloaded=0): |
1913 | - global _text_meter_total_size |
1914 | - global _text_meter_sofar_size |
1915 | - _text_meter_total_size = size |
1916 | - _text_meter_sofar_size = downloaded |
1917 | - |
1918 | -# |
1919 | -# update: No size (minimal: 17 chars) |
1920 | -# ----------------------------------- |
1921 | -# <text> <rate> | <current size> <elapsed time> |
1922 | -# 8-48 1 8 3 6 1 9 5 |
1923 | -# |
1924 | -# Order: 1. <text>+<current size> (17) |
1925 | -# 2. +<elapsed time> (10, total: 27) |
1926 | -# 3. + ( 5, total: 32) |
1927 | -# 4. +<rate> ( 9, total: 41) |
1928 | -# |
1929 | -# update: Size, Single file |
1930 | -# ------------------------- |
1931 | -# <text> <pc> <bar> <rate> | <current size> <eta time> ETA |
1932 | -# 8-25 1 3-4 1 6-16 1 8 3 6 1 9 1 3 1 |
1933 | -# |
1934 | -# Order: 1. <text>+<current size> (17) |
1935 | -# 2. +<eta time> (10, total: 27) |
1936 | -# 3. +ETA ( 5, total: 32) |
1937 | -# 4. +<pc> ( 4, total: 36) |
1938 | -# 5. +<rate> ( 9, total: 45) |
1939 | -# 6. +<bar> ( 7, total: 52) |
1940 | -# |
1941 | -# update: Size, All files |
1942 | -# ----------------------- |
1943 | -# <text> <total pc> <pc> <bar> <rate> | <current size> <eta time> ETA |
1944 | -# 8-22 1 5-7 1 3-4 1 6-12 1 8 3 6 1 9 1 3 1 |
1945 | -# |
1946 | -# Order: 1. <text>+<current size> (17) |
1947 | -# 2. +<eta time> (10, total: 27) |
1948 | -# 3. +ETA ( 5, total: 32) |
1949 | -# 4. +<total pc> ( 5, total: 37) |
1950 | -# 4. +<pc> ( 4, total: 41) |
1951 | -# 5. +<rate> ( 9, total: 50) |
1952 | -# 6. +<bar> ( 7, total: 57) |
1953 | -# |
1954 | -# end |
1955 | -# --- |
1956 | -# <text> | <current size> <elapsed time> |
1957 | -# 8-56 3 6 1 9 5 |
1958 | -# |
1959 | -# Order: 1. <text> ( 8) |
1960 | -# 2. +<current size> ( 9, total: 17) |
1961 | -# 3. +<elapsed time> (10, total: 27) |
1962 | -# 4. + ( 5, total: 32) |
1963 | -# |
1964 | - |
1965 | -class TextMeter(BaseMeter): |
1966 | - def __init__(self, fo=sys.stderr): |
1967 | - BaseMeter.__init__(self) |
1968 | - self.fo = fo |
1969 | - |
1970 | - def _do_update(self, amount_read, now=None): |
1971 | - etime = self.re.elapsed_time() |
1972 | - fetime = format_time(etime) |
1973 | - fread = format_number(amount_read) |
1974 | - #self.size = None |
1975 | - if self.text is not None: |
1976 | - text = self.text |
1977 | - else: |
1978 | - text = self.basename |
1979 | - |
1980 | - ave_dl = format_number(self.re.average_rate()) |
1981 | - sofar_size = None |
1982 | - if _text_meter_total_size: |
1983 | - sofar_size = _text_meter_sofar_size + amount_read |
1984 | - sofar_pc = (sofar_size * 100) / _text_meter_total_size |
1985 | - |
1986 | - # Include text + ui_rate in minimal |
1987 | - tl = TerminalLine(8, 8+1+8) |
1988 | - ui_size = tl.add(' | %5sB' % fread) |
1989 | - if self.size is None: |
1990 | - ui_time = tl.add(' %9s' % fetime) |
1991 | - ui_end = tl.add(' ' * 5) |
1992 | - ui_rate = tl.add(' %5sB/s' % ave_dl) |
1993 | - out = '%-*.*s%s%s%s%s\r' % (tl.rest(), tl.rest(), text, |
1994 | - ui_rate, ui_size, ui_time, ui_end) |
1995 | - else: |
1996 | - rtime = self.re.remaining_time() |
1997 | - frtime = format_time(rtime) |
1998 | - frac = self.re.fraction_read() |
1999 | - |
2000 | - ui_time = tl.add(' %9s' % frtime) |
2001 | - ui_end = tl.add(' ETA ') |
2002 | - |
2003 | - if sofar_size is None: |
2004 | - ui_sofar_pc = '' |
2005 | - else: |
2006 | - ui_sofar_pc = tl.add(' (%i%%)' % sofar_pc, |
2007 | - full_len=len(" (100%)")) |
2008 | - |
2009 | - ui_pc = tl.add(' %2i%%' % (frac*100)) |
2010 | - ui_rate = tl.add(' %5sB/s' % ave_dl) |
2011 | - # Make text grow a bit before we start growing the bar too |
2012 | - blen = 4 + tl.rest_split(8 + 8 + 4) |
2013 | - bar = '='*int(blen * frac) |
2014 | - if (blen * frac) - int(blen * frac) >= 0.5: |
2015 | - bar += '-' |
2016 | - ui_bar = tl.add(' [%-*.*s]' % (blen, blen, bar)) |
2017 | - out = '%-*.*s%s%s%s%s%s%s%s\r' % (tl.rest(), tl.rest(), text, |
2018 | - ui_sofar_pc, ui_pc, ui_bar, |
2019 | - ui_rate, ui_size, ui_time, ui_end) |
2020 | - |
2021 | - self.fo.write(out) |
2022 | - self.fo.flush() |
2023 | - |
2024 | - def _do_end(self, amount_read, now=None): |
2025 | - global _text_meter_total_size |
2026 | - global _text_meter_sofar_size |
2027 | - |
2028 | - total_time = format_time(self.re.elapsed_time()) |
2029 | - total_size = format_number(amount_read) |
2030 | - if self.text is not None: |
2031 | - text = self.text |
2032 | - else: |
2033 | - text = self.basename |
2034 | - |
2035 | - tl = TerminalLine(8) |
2036 | - ui_size = tl.add(' | %5sB' % total_size) |
2037 | - ui_time = tl.add(' %9s' % total_time) |
2038 | - not_done = self.size is not None and amount_read != self.size |
2039 | - if not_done: |
2040 | - ui_end = tl.add(' ... ') |
2041 | - else: |
2042 | - ui_end = tl.add(' ' * 5) |
2043 | - |
2044 | - out = '\r%-*.*s%s%s%s\n' % (tl.rest(), tl.rest(), text, |
2045 | - ui_size, ui_time, ui_end) |
2046 | - self.fo.write(out) |
2047 | - self.fo.flush() |
2048 | - |
2049 | - # Don't add size to the sofar size until we have all of it. |
2050 | - # If we don't have a size, then just pretend/hope we got all of it. |
2051 | - if not_done: |
2052 | - return |
2053 | - |
2054 | - if _text_meter_total_size: |
2055 | - _text_meter_sofar_size += amount_read |
2056 | - if _text_meter_total_size <= _text_meter_sofar_size: |
2057 | - _text_meter_total_size = 0 |
2058 | - _text_meter_sofar_size = 0 |
2059 | - |
2060 | -text_progress_meter = TextMeter |
2061 | - |
2062 | -class MultiFileHelper(BaseMeter): |
2063 | - def __init__(self, master): |
2064 | - BaseMeter.__init__(self) |
2065 | - self.master = master |
2066 | - |
2067 | - def _do_start(self, now): |
2068 | - self.master.start_meter(self, now) |
2069 | - |
2070 | - def _do_update(self, amount_read, now): |
2071 | - # elapsed time since last update |
2072 | - self.master.update_meter(self, now) |
2073 | - |
2074 | - def _do_end(self, amount_read, now): |
2075 | - self.ftotal_time = format_time(now - self.start_time) |
2076 | - self.ftotal_size = format_number(self.last_amount_read) |
2077 | - self.master.end_meter(self, now) |
2078 | - |
2079 | - def failure(self, message, now=None): |
2080 | - self.master.failure_meter(self, message, now) |
2081 | - |
2082 | - def message(self, message): |
2083 | - self.master.message_meter(self, message) |
2084 | - |
2085 | -class MultiFileMeter: |
2086 | - helperclass = MultiFileHelper |
2087 | - def __init__(self): |
2088 | - self.meters = [] |
2089 | - self.in_progress_meters = [] |
2090 | - self._lock = thread.allocate_lock() |
2091 | - self.update_period = 0.3 # seconds |
2092 | - |
2093 | - self.numfiles = None |
2094 | - self.finished_files = 0 |
2095 | - self.failed_files = 0 |
2096 | - self.open_files = 0 |
2097 | - self.total_size = None |
2098 | - self.failed_size = 0 |
2099 | - self.start_time = None |
2100 | - self.finished_file_size = 0 |
2101 | - self.last_update_time = None |
2102 | - self.re = RateEstimator() |
2103 | - |
2104 | - def start(self, numfiles=None, total_size=None, now=None): |
2105 | - if now is None: now = time.time() |
2106 | - self.numfiles = numfiles |
2107 | - self.finished_files = 0 |
2108 | - self.failed_files = 0 |
2109 | - self.open_files = 0 |
2110 | - self.total_size = total_size |
2111 | - self.failed_size = 0 |
2112 | - self.start_time = now |
2113 | - self.finished_file_size = 0 |
2114 | - self.last_update_time = now |
2115 | - self.re.start(total_size, now) |
2116 | - self._do_start(now) |
2117 | - |
2118 | - def _do_start(self, now): |
2119 | - pass |
2120 | - |
2121 | - def end(self, now=None): |
2122 | - if now is None: now = time.time() |
2123 | - self._do_end(now) |
2124 | - |
2125 | - def _do_end(self, now): |
2126 | - pass |
2127 | - |
2128 | - def lock(self): self._lock.acquire() |
2129 | - def unlock(self): self._lock.release() |
2130 | - |
2131 | - ########################################################### |
2132 | - # child meter creation and destruction |
2133 | - def newMeter(self): |
2134 | - newmeter = self.helperclass(self) |
2135 | - self.meters.append(newmeter) |
2136 | - return newmeter |
2137 | - |
2138 | - def removeMeter(self, meter): |
2139 | - self.meters.remove(meter) |
2140 | - |
2141 | - ########################################################### |
2142 | - # child functions - these should only be called by helpers |
2143 | - def start_meter(self, meter, now): |
2144 | - if not meter in self.meters: |
2145 | - raise ValueError('attempt to use orphaned meter') |
2146 | - self._lock.acquire() |
2147 | - try: |
2148 | - if not meter in self.in_progress_meters: |
2149 | - self.in_progress_meters.append(meter) |
2150 | - self.open_files += 1 |
2151 | - finally: |
2152 | - self._lock.release() |
2153 | - self._do_start_meter(meter, now) |
2154 | - |
2155 | - def _do_start_meter(self, meter, now): |
2156 | - pass |
2157 | - |
2158 | - def update_meter(self, meter, now): |
2159 | - if not meter in self.meters: |
2160 | - raise ValueError('attempt to use orphaned meter') |
2161 | - if (now >= self.last_update_time + self.update_period) or \ |
2162 | - not self.last_update_time: |
2163 | - self.re.update(self._amount_read(), now) |
2164 | - self.last_update_time = now |
2165 | - self._do_update_meter(meter, now) |
2166 | - |
2167 | - def _do_update_meter(self, meter, now): |
2168 | - pass |
2169 | - |
2170 | - def end_meter(self, meter, now): |
2171 | - if not meter in self.meters: |
2172 | - raise ValueError('attempt to use orphaned meter') |
2173 | - self._lock.acquire() |
2174 | - try: |
2175 | - try: self.in_progress_meters.remove(meter) |
2176 | - except ValueError: pass |
2177 | - self.open_files -= 1 |
2178 | - self.finished_files += 1 |
2179 | - self.finished_file_size += meter.last_amount_read |
2180 | - finally: |
2181 | - self._lock.release() |
2182 | - self._do_end_meter(meter, now) |
2183 | - |
2184 | - def _do_end_meter(self, meter, now): |
2185 | - pass |
2186 | - |
2187 | - def failure_meter(self, meter, message, now): |
2188 | - if not meter in self.meters: |
2189 | - raise ValueError('attempt to use orphaned meter') |
2190 | - self._lock.acquire() |
2191 | - try: |
2192 | - try: self.in_progress_meters.remove(meter) |
2193 | - except ValueError: pass |
2194 | - self.open_files -= 1 |
2195 | - self.failed_files += 1 |
2196 | - if meter.size and self.failed_size is not None: |
2197 | - self.failed_size += meter.size |
2198 | - else: |
2199 | - self.failed_size = None |
2200 | - finally: |
2201 | - self._lock.release() |
2202 | - self._do_failure_meter(meter, message, now) |
2203 | - |
2204 | - def _do_failure_meter(self, meter, message, now): |
2205 | - pass |
2206 | - |
2207 | - def message_meter(self, meter, message): |
2208 | - pass |
2209 | - |
2210 | - ######################################################## |
2211 | - # internal functions |
2212 | - def _amount_read(self): |
2213 | - tot = self.finished_file_size |
2214 | - for m in self.in_progress_meters: |
2215 | - tot += m.last_amount_read |
2216 | - return tot |
2217 | - |
2218 | - |
2219 | -class TextMultiFileMeter(MultiFileMeter): |
2220 | - def __init__(self, fo=sys.stderr): |
2221 | - self.fo = fo |
2222 | - MultiFileMeter.__init__(self) |
2223 | - |
2224 | - # files: ###/### ###% data: ######/###### ###% time: ##:##:##/##:##:## |
2225 | - def _do_update_meter(self, meter, now): |
2226 | - self._lock.acquire() |
2227 | - try: |
2228 | - format = "files: %3i/%-3i %3i%% data: %6.6s/%-6.6s %3i%% " \ |
2229 | - "time: %8.8s/%8.8s" |
2230 | - df = self.finished_files |
2231 | - tf = self.numfiles or 1 |
2232 | - pf = 100 * float(df)/tf + 0.49 |
2233 | - dd = self.re.last_amount_read |
2234 | - td = self.total_size |
2235 | - pd = 100 * (self.re.fraction_read() or 0) + 0.49 |
2236 | - dt = self.re.elapsed_time() |
2237 | - rt = self.re.remaining_time() |
2238 | - if rt is None: tt = None |
2239 | - else: tt = dt + rt |
2240 | - |
2241 | - fdd = format_number(dd) + 'B' |
2242 | - ftd = format_number(td) + 'B' |
2243 | - fdt = format_time(dt, 1) |
2244 | - ftt = format_time(tt, 1) |
2245 | - |
2246 | - out = '%-79.79s' % (format % (df, tf, pf, fdd, ftd, pd, fdt, ftt)) |
2247 | - self.fo.write('\r' + out) |
2248 | - self.fo.flush() |
2249 | - finally: |
2250 | - self._lock.release() |
2251 | - |
2252 | - def _do_end_meter(self, meter, now): |
2253 | - self._lock.acquire() |
2254 | - try: |
2255 | - format = "%-30.30s %6.6s %8.8s %9.9s" |
2256 | - fn = meter.basename |
2257 | - size = meter.last_amount_read |
2258 | - fsize = format_number(size) + 'B' |
2259 | - et = meter.re.elapsed_time() |
2260 | - fet = format_time(et, 1) |
2261 | - frate = format_number(size / et) + 'B/s' |
2262 | - |
2263 | - out = '%-79.79s' % (format % (fn, fsize, fet, frate)) |
2264 | - self.fo.write('\r' + out + '\n') |
2265 | - finally: |
2266 | - self._lock.release() |
2267 | - self._do_update_meter(meter, now) |
2268 | - |
2269 | - def _do_failure_meter(self, meter, message, now): |
2270 | - self._lock.acquire() |
2271 | - try: |
2272 | - format = "%-30.30s %6.6s %s" |
2273 | - fn = meter.basename |
2274 | - if type(message) in (type(''), type(u'')): |
2275 | - message = message.splitlines() |
2276 | - if not message: message = [''] |
2277 | - out = '%-79s' % (format % (fn, 'FAILED', message[0] or '')) |
2278 | - self.fo.write('\r' + out + '\n') |
2279 | - for m in message[1:]: self.fo.write(' ' + m + '\n') |
2280 | - self._lock.release() |
2281 | - finally: |
2282 | - self._do_update_meter(meter, now) |
2283 | - |
2284 | - def message_meter(self, meter, message): |
2285 | - self._lock.acquire() |
2286 | - try: |
2287 | - pass |
2288 | - finally: |
2289 | - self._lock.release() |
2290 | - |
2291 | - def _do_end(self, now): |
2292 | - self._do_update_meter(None, now) |
2293 | - self._lock.acquire() |
2294 | - try: |
2295 | - self.fo.write('\n') |
2296 | - self.fo.flush() |
2297 | - finally: |
2298 | - self._lock.release() |
2299 | - |
2300 | -###################################################################### |
2301 | -# support classes and functions |
2302 | - |
2303 | -class RateEstimator: |
2304 | - def __init__(self, timescale=5.0): |
2305 | - self.timescale = timescale |
2306 | - |
2307 | - def start(self, total=None, now=None): |
2308 | - if now is None: now = time.time() |
2309 | - self.total = total |
2310 | - self.start_time = now |
2311 | - self.last_update_time = now |
2312 | - self.last_amount_read = 0 |
2313 | - self.ave_rate = None |
2314 | - |
2315 | - def update(self, amount_read, now=None): |
2316 | - if now is None: now = time.time() |
2317 | - if amount_read == 0: |
2318 | - # if we just started this file, all bets are off |
2319 | - self.last_update_time = now |
2320 | - self.last_amount_read = 0 |
2321 | - self.ave_rate = None |
2322 | - return |
2323 | - |
2324 | - #print 'times', now, self.last_update_time |
2325 | - time_diff = now - self.last_update_time |
2326 | - read_diff = amount_read - self.last_amount_read |
2327 | - # First update, on reget is the file size |
2328 | - if self.last_amount_read: |
2329 | - self.last_update_time = now |
2330 | - self.ave_rate = self._temporal_rolling_ave(\ |
2331 | - time_diff, read_diff, self.ave_rate, self.timescale) |
2332 | - self.last_amount_read = amount_read |
2333 | - #print 'results', time_diff, read_diff, self.ave_rate |
2334 | - |
2335 | - ##################################################################### |
2336 | - # result methods |
2337 | - def average_rate(self): |
2338 | - "get the average transfer rate (in bytes/second)" |
2339 | - return self.ave_rate |
2340 | - |
2341 | - def elapsed_time(self): |
2342 | - "the time between the start of the transfer and the most recent update" |
2343 | - return self.last_update_time - self.start_time |
2344 | - |
2345 | - def remaining_time(self): |
2346 | - "estimated time remaining" |
2347 | - if not self.ave_rate or not self.total: return None |
2348 | - return (self.total - self.last_amount_read) / self.ave_rate |
2349 | - |
2350 | - def fraction_read(self): |
2351 | - """the fraction of the data that has been read |
2352 | - (can be None for unknown transfer size)""" |
2353 | - if self.total is None: return None |
2354 | - elif self.total == 0: return 1.0 |
2355 | - else: return float(self.last_amount_read)/self.total |
2356 | - |
2357 | - ######################################################################### |
2358 | - # support methods |
2359 | - def _temporal_rolling_ave(self, time_diff, read_diff, last_ave, timescale): |
2360 | - """a temporal rolling average performs smooth averaging even when |
2361 | - updates come at irregular intervals. This is performed by scaling |
2362 | - the "epsilon" according to the time since the last update. |
2363 | - Specifically, epsilon = time_diff / timescale |
2364 | - |
2365 | - As a general rule, the average will take on a completely new value |
2366 | - after 'timescale' seconds.""" |
2367 | - epsilon = time_diff / timescale |
2368 | - if epsilon > 1: epsilon = 1.0 |
2369 | - return self._rolling_ave(time_diff, read_diff, last_ave, epsilon) |
2370 | - |
2371 | - def _rolling_ave(self, time_diff, read_diff, last_ave, epsilon): |
2372 | - """perform a "rolling average" iteration |
2373 | - a rolling average "folds" new data into an existing average with |
2374 | - some weight, epsilon. epsilon must be between 0.0 and 1.0 (inclusive) |
2375 | - a value of 0.0 means only the old value (initial value) counts, |
2376 | - and a value of 1.0 means only the newest value is considered.""" |
2377 | - |
2378 | - try: |
2379 | - recent_rate = read_diff / time_diff |
2380 | - except ZeroDivisionError: |
2381 | - recent_rate = None |
2382 | - if last_ave is None: return recent_rate |
2383 | - elif recent_rate is None: return last_ave |
2384 | - |
2385 | - # at this point, both last_ave and recent_rate are numbers |
2386 | - return epsilon * recent_rate + (1 - epsilon) * last_ave |
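The two averaging helpers above implement an exponential moving average whose blending weight grows with the time since the last update. A standalone Python 3 sketch of the same idea (function names are illustrative, not the module's API):

```python
def rolling_ave(time_diff, read_diff, last_ave, epsilon):
    # Fold the most recent rate sample into the running average with
    # weight epsilon (0.0 = keep the old value, 1.0 = take only the new one).
    try:
        recent_rate = read_diff / time_diff
    except ZeroDivisionError:
        recent_rate = None
    if last_ave is None:
        return recent_rate
    if recent_rate is None:
        return last_ave
    return epsilon * recent_rate + (1 - epsilon) * last_ave

def temporal_rolling_ave(time_diff, read_diff, last_ave, timescale):
    # Scale the weight by the time since the last update, so the average
    # is fully replaced after roughly 'timescale' seconds of new data.
    epsilon = min(time_diff / timescale, 1.0)
    return rolling_ave(time_diff, read_diff, last_ave, epsilon)

# 1000 bytes read in 1 s, folded into a 500 B/s average, 5 s timescale:
# epsilon = 0.2, giving 0.2 * 1000 + 0.8 * 500 = 600.0 B/s
```

Scaling epsilon by the update interval is what lets the meter stay smooth whether updates arrive every 10 ms or every 2 s.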
2387 | - |
2388 | - def _round_remaining_time(self, rt, start_time=15.0): |
2389 | - """round the remaining time, depending on its size |
2390 | - If rt is between (2**n)*start_time and (2**(n+1))*start_time, round |
2391 | - downward to the nearest multiple of 2**n. |
2392 | - If rt < start_time, round down to the nearest 1. |
2393 | - For example (for start_time = 15.0): |
2394 | - 2.7 -> 2.0 |
2395 | - 25.2 -> 25.0 |
2396 | - 26.4 -> 26.0 |
2397 | - 35.3 -> 34.0 |
2398 | - 63.6 -> 60.0 |
2399 | - """ |
2400 | - |
2401 | - if rt < 0: return 0.0 |
2402 | - shift = int(math.log(rt/start_time)/math.log(2)) |
2403 | - rt = int(rt) |
2404 | - if shift <= 0: return rt |
2405 | - return float(int(rt) >> shift << shift) |
2406 | - |
2407 | - |
2408 | -def format_time(seconds, use_hours=0): |
2409 | - if seconds is None or seconds < 0: |
2410 | - if use_hours: return '--:--:--' |
2411 | - else: return '--:--' |
2412 | - else: |
2413 | - seconds = int(seconds) |
2414 | - minutes = seconds / 60 |
2415 | - seconds = seconds % 60 |
2416 | - if use_hours: |
2417 | - hours = minutes / 60 |
2418 | - minutes = minutes % 60 |
2419 | - return '%02i:%02i:%02i' % (hours, minutes, seconds) |
2420 | - else: |
2421 | - return '%02i:%02i' % (minutes, seconds) |
2422 | - |
2423 | -def format_number(number, SI=0, space=' '): |
2424 | - """Turn numbers into human-readable metric-like numbers""" |
2425 | - symbols = ['', # (none) |
2426 | - 'k', # kilo |
2427 | - 'M', # mega |
2428 | - 'G', # giga |
2429 | - 'T', # tera |
2430 | - 'P', # peta |
2431 | - 'E', # exa |
2432 | - 'Z', # zetta |
2433 | - 'Y'] # yotta |
2434 | - |
2435 | - if SI: step = 1000.0 |
2436 | - else: step = 1024.0 |
2437 | - |
2438 | - thresh = 999 |
2439 | - depth = 0 |
2440 | - max_depth = len(symbols) - 1 |
2441 | - |
2442 | - # we want numbers between 0 and thresh, but don't exceed the length |
2443 | - # of our list. In that event, the formatting will be screwed up, |
2444 | - # but it'll still show the right number. |
2445 | - while number > thresh and depth < max_depth: |
2446 | - depth = depth + 1 |
2447 | - number = number / step |
2448 | - |
2449 | - if type(number) == type(1) or type(number) == type(1L): |
2450 | - # it's an int or a long, which means it didn't get divided, |
2451 | - # which means it's already short enough |
2452 | - format = '%i%s%s' |
2453 | - elif number < 9.95: |
2454 | - # must use 9.95 for proper sizing. For example, 9.99 will be |
2455 | - # rounded to 10.0 with the .1f format string (which is too long) |
2456 | - format = '%.1f%s%s' |
2457 | - else: |
2458 | - format = '%.0f%s%s' |
2459 | - |
2460 | - return(format % (float(number or 0), space, symbols[depth])) |
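format_number scales a count down until it fits in three digits, then chooses precision by magnitude; the 9.95 cutoff keeps 9.99 from rendering as the four-character '10.0'. A Python 3 re-sketch of the function above (Python 3 has no long type, so the type check becomes a single isinstance):

```python
def format_number(number, SI=0, space=' '):
    # Scale into at most three digits, then choose precision by size.
    symbols = ['', 'k', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']
    step = 1000.0 if SI else 1024.0
    depth = 0
    while number > 999 and depth < len(symbols) - 1:
        depth += 1
        number = number / step
    if isinstance(number, int):
        fmt = '%i%s%s'       # never divided, so already short enough
    elif number < 9.95:
        fmt = '%.1f%s%s'     # 9.95, not 10: 9.99 would round to '10.0'
    else:
        fmt = '%.0f%s%s'
    return fmt % (float(number or 0), space, symbols[depth])

format_number(512)                # -> '512 '
format_number(10240)              # -> '10 k'
format_number(2.5 * 1024 * 1024)  # -> '2.5 M'
```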
2461 | - |
2462 | -def _tst(fn, cur, tot, beg, size, *args): |
2463 | - tm = TextMeter() |
2464 | - text = "(%d/%d): %s" % (cur, tot, fn) |
2465 | - tm.start(fn, "http://www.example.com/path/to/fn/" + fn, fn, size, text=text) |
2466 | - num = beg |
2467 | - off = 0 |
2468 | - for (inc, delay) in args: |
2469 | - off += 1 |
2470 | - while num < ((size * off) / len(args)): |
2471 | - num += inc |
2472 | - tm.update(num) |
2473 | - time.sleep(delay) |
2474 | - tm.end(size) |
2475 | - |
2476 | -if __name__ == "__main__": |
2477 | - # (1/2): subversion-1.4.4-7.x86_64.rpm 2.4 MB / 85 kB/s 00:28 |
2478 | - # (2/2): mercurial-0.9.5-6.fc8.x86_64.rpm 924 kB / 106 kB/s 00:08 |
2479 | - if len(sys.argv) >= 2 and sys.argv[1] == 'total': |
2480 | - text_meter_total_size(1000 + 10000 + 10000 + 1000000 + 1000000 + |
2481 | - 1000000 + 10000 + 10000 + 10000 + 1000000) |
2482 | - _tst("sm-1.0.0-1.fc8.i386.rpm", 1, 10, 0, 1000, |
2483 | - (10, 0.2), (10, 0.1), (100, 0.25)) |
2484 | - _tst("s-1.0.1-1.fc8.i386.rpm", 2, 10, 0, 10000, |
2485 | - (10, 0.2), (100, 0.1), (100, 0.1), (100, 0.25)) |
2486 | - _tst("m-1.0.1-2.fc8.i386.rpm", 3, 10, 5000, 10000, |
2487 | - (10, 0.2), (100, 0.1), (100, 0.1), (100, 0.25)) |
2488 | - _tst("large-file-name-Foo-11.8.7-4.5.6.1.fc8.x86_64.rpm", 4, 10, 0, 1000000, |
2489 | - (1000, 0.2), (1000, 0.1), (10000, 0.1)) |
2490 | - _tst("large-file-name-Foo2-11.8.7-4.5.6.2.fc8.x86_64.rpm", 5, 10, |
2491 | - 500001, 1000000, (1000, 0.2), (1000, 0.1), (10000, 0.1)) |
2492 | - _tst("large-file-name-Foo3-11.8.7-4.5.6.3.fc8.x86_64.rpm", 6, 10, |
2493 | - 750002, 1000000, (1000, 0.2), (1000, 0.1), (10000, 0.1)) |
2494 | - _tst("large-file-name-Foo4-10.8.7-4.5.6.1.fc8.x86_64.rpm", 7, 10, 0, 10000, |
2495 | - (100, 0.1)) |
2496 | - _tst("large-file-name-Foo5-10.8.7-4.5.6.2.fc8.x86_64.rpm", 8, 10, |
2497 | - 5001, 10000, (100, 0.1)) |
2498 | - _tst("large-file-name-Foo6-10.8.7-4.5.6.3.fc8.x86_64.rpm", 9, 10, |
2499 | - 7502, 10000, (1, 0.1)) |
2500 | - _tst("large-file-name-Foox-9.8.7-4.5.6.1.fc8.x86_64.rpm", 10, 10, |
2501 | - 0, 1000000, (10, 0.5), |
2502 | - (100000, 0.1), (10000, 0.1), (10000, 0.1), (10000, 0.1), |
2503 | - (100000, 0.1), (10000, 0.1), (10000, 0.1), (10000, 0.1), |
2504 | - (100000, 0.1), (10000, 0.1), (10000, 0.1), (10000, 0.1), |
2505 | - (100000, 0.1), (10000, 0.1), (10000, 0.1), (10000, 0.1), |
2506 | - (100000, 0.1), (1, 0.1)) |
2507 | |
2508 | === removed directory '.pc/progress_object_callback_fix.diff' |
2509 | === removed directory '.pc/progress_object_callback_fix.diff/urlgrabber' |
2510 | === removed file '.pc/progress_object_callback_fix.diff/urlgrabber/grabber.py' |
2511 | --- .pc/progress_object_callback_fix.diff/urlgrabber/grabber.py 2011-08-09 17:45:08 +0000 |
2512 | +++ .pc/progress_object_callback_fix.diff/urlgrabber/grabber.py 1970-01-01 00:00:00 +0000 |
2513 | @@ -1,1802 +0,0 @@ |
2514 | -# This library is free software; you can redistribute it and/or |
2515 | -# modify it under the terms of the GNU Lesser General Public |
2516 | -# License as published by the Free Software Foundation; either |
2517 | -# version 2.1 of the License, or (at your option) any later version. |
2518 | -# |
2519 | -# This library is distributed in the hope that it will be useful, |
2520 | -# but WITHOUT ANY WARRANTY; without even the implied warranty of |
2521 | -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU |
2522 | -# Lesser General Public License for more details. |
2523 | -# |
2524 | -# You should have received a copy of the GNU Lesser General Public |
2525 | -# License along with this library; if not, write to the |
2526 | -# Free Software Foundation, Inc., |
2527 | -# 59 Temple Place, Suite 330, |
2528 | -# Boston, MA 02111-1307 USA |
2529 | - |
2530 | -# This file is part of urlgrabber, a high-level cross-protocol url-grabber |
2531 | -# Copyright 2002-2004 Michael D. Stenner, Ryan Tomayko |
2532 | -# Copyright 2009 Red Hat inc, pycurl code written by Seth Vidal |
2533 | - |
2534 | -"""A high-level cross-protocol url-grabber. |
2535 | - |
2536 | -GENERAL ARGUMENTS (kwargs) |
2537 | - |
2538 | - Where possible, the module-level default is indicated, and legal |
2539 | - values are provided. |
2540 | - |
2541 | - copy_local = 0 [0|1] |
2542 | - |
2543 | - ignored except for file:// urls, in which case it specifies |
2544 | - whether urlgrab should still make a copy of the file, or simply |
2545 | - point to the existing copy. The module level default for this |
2546 | - option is 0. |
2547 | - |
2548 | - close_connection = 0 [0|1] |
2549 | - |
2550 | - tells URLGrabber to close the connection after a file has been |
2551 | - transferred. This is ignored unless the download happens with the |
2552 | - http keepalive handler (keepalive=1). Otherwise, the connection |
2553 | - is left open for further use. The module level default for this |
2554 | - option is 0 (keepalive connections will not be closed). |
2555 | - |
2556 | - keepalive = 1 [0|1] |
2557 | - |
2558 | - specifies whether keepalive should be used for HTTP/1.1 servers |
2559 | - that support it. The module level default for this option is 1 |
2560 | - (keepalive is enabled). |
2561 | - |
2562 | - progress_obj = None |
2563 | - |
2564 | - a class instance that supports the following methods: |
2565 | - po.start(filename, url, basename, length, text) |
2566 | - # length will be None if unknown |
2567 | - po.update(read) # read == bytes read so far |
2568 | - po.end() |
2569 | - |
2570 | - text = None |
2571 | - |
2572 | - specifies alternative text to be passed to the progress meter |
2573 | - object. If not given, the default progress meter will use the |
2574 | - basename of the file. |
2575 | - |
2576 | - throttle = 1.0 |
2577 | - |
2578 | - a number - if it's an int, it's the bytes/second throttle limit. |
2579 | - If it's a float, it is first multiplied by bandwidth. If throttle |
2580 | - == 0, throttling is disabled. If None, the module-level default |
2581 | - (which can be set on default_grabber.throttle) is used. See |
2582 | - BANDWIDTH THROTTLING for more information. |
2583 | - |
2584 | - timeout = 300 |
2585 | - |
2586 | - a positive integer expressing the number of seconds to wait before |
2587 | - timing out attempts to connect to a server. If the value is None |
2588 | - or 0, connection attempts will not time out. The timeout is passed |
2589 | - to the underlying pycurl object as its CONNECTTIMEOUT option, see |
2590 | - the curl documentation on CURLOPT_CONNECTTIMEOUT for more information. |
2591 | - http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTCONNECTTIMEOUT |
2592 | - |
2593 | - bandwidth = 0 |
2594 | - |
2595 | - the nominal max bandwidth in bytes/second. If throttle is a float |
2596 | - and bandwidth == 0, throttling is disabled. If None, the |
2597 | - module-level default (which can be set on |
2598 | - default_grabber.bandwidth) is used. See BANDWIDTH THROTTLING for |
2599 | - more information. |
2600 | - |
2601 | - range = None |
2602 | - |
2603 | - a tuple of the form (first_byte, last_byte) describing a byte |
2604 | - range to retrieve. Either or both of the values may be set to |
2605 | - None. If first_byte is None, byte offset 0 is assumed. If |
2606 | - last_byte is None, the last byte available is assumed. Note that |
2607 | - the range specification is python-like in that (0,10) will yield |
2608 | - the first 10 bytes of the file. |
2609 | - |
2610 | - If set to None, no range will be used. |
2611 | - |
2612 | - reget = None [None|'simple'|'check_timestamp'] |
2613 | - |
2614 | - whether to attempt to reget a partially-downloaded file. Reget |
2615 | - only applies to .urlgrab and (obviously) only if there is a |
2616 | - partially downloaded file. Reget has two modes: |
2617 | - |
2618 | - 'simple' -- the local file will always be trusted. If there |
2619 | - are 100 bytes in the local file, then the download will always |
2620 | - begin 100 bytes into the requested file. |
2621 | - |
2622 | - 'check_timestamp' -- the timestamp of the server file will be |
2623 | - compared to the timestamp of the local file. ONLY if the |
2624 | - local file is newer than or the same age as the server file |
2625 | - will reget be used. If the server file is newer, or the |
2626 | - timestamp is not returned, the entire file will be fetched. |
2627 | - |
2628 | - NOTE: urlgrabber can do very little to verify that the partial |
2629 | - file on disk is identical to the beginning of the remote file. |
2630 | - You may want to either employ a custom "checkfunc" or simply avoid |
2631 | - using reget in situations where corruption is a concern. |
2632 | - |
2633 | - user_agent = 'urlgrabber/VERSION' |
2634 | - |
2635 | - a string, usually of the form 'AGENT/VERSION' that is provided to |
2636 | - HTTP servers in the User-agent header. The module level default |
2637 | - for this option is "urlgrabber/VERSION". |
2638 | - |
2639 | - http_headers = None |
2640 | - |
2641 | - a tuple of 2-tuples, each containing a header and value. These |
2642 | - will be used for http and https requests only. For example, you |
2643 | - can do |
2644 | - http_headers = (('Pragma', 'no-cache'),) |
2645 | - |
2646 | - ftp_headers = None |
2647 | - |
2648 | - this is just like http_headers, but will be used for ftp requests. |
2649 | - |
2650 | - proxies = None |
2651 | - |
2652 | - a dictionary that maps protocol schemes to proxy hosts. For |
2653 | - example, to use a proxy server on host "foo" port 3128 for http |
2654 | - and https URLs: |
2655 | - proxies={ 'http' : 'http://foo:3128', 'https' : 'http://foo:3128' } |
2656 | - note that proxy authentication information may be provided using |
2657 | - normal URL constructs: |
2658 | - proxies={ 'http' : 'http://user:host@foo:3128' } |
2659 | - Lastly, if proxies is None, the default environment settings will |
2660 | - be used. |
2661 | - |
2662 | - prefix = None |
2663 | - |
2664 | - a url prefix that will be prepended to all requested urls. For |
2665 | - example: |
2666 | - g = URLGrabber(prefix='http://foo.com/mirror/') |
2667 | - g.urlgrab('some/file.txt') |
2668 | - ## this will fetch 'http://foo.com/mirror/some/file.txt' |
2669 | - This option exists primarily to allow identical behavior to |
2670 | - MirrorGroup (and derived) instances. Note: a '/' will be inserted |
2671 | - if necessary, so you cannot specify a prefix that ends with a |
2672 | - partial file or directory name. |
2673 | - |
2674 | - opener = None |
2675 | - No-op when using the curl backend (default) |
2676 | - |
2677 | - cache_openers = True |
2678 | - No-op when using the curl backend (default) |
2679 | - |
2680 | - data = None |
2681 | - |
2682 | - Only relevant for the HTTP family (and ignored for other |
2683 | - protocols), this allows HTTP POSTs. When the data kwarg is |
2684 | - present (and not None), an HTTP request will automatically become |
2685 | - a POST rather than GET. This is done by direct passthrough to |
2686 | - urllib2. If you use this, you may also want to set the |
2687 | - 'Content-length' and 'Content-type' headers with the http_headers |
2688 | - option. Note that python 2.2 handles the case of these headers |
2689 | - badly and if you do not use the proper case (shown here), your |
2690 | - values will be overridden with the defaults. |
2691 | - |
2692 | - urlparser = URLParser() |
2693 | - |
2694 | - The URLParser class handles pre-processing of URLs, including |
2695 | - auth-handling for user/pass encoded in http urls, file handing |
2696 | - (that is, filenames not sent as a URL), and URL quoting. If you |
2697 | - want to override any of this behavior, you can pass in a |
2698 | - replacement instance. See also the 'quote' option. |
2699 | - |
2700 | - quote = None |
2701 | - |
2702 | - Whether or not to quote the path portion of a url. |
2703 | - quote = 1 -> quote the URLs (they're not quoted yet) |
2704 | - quote = 0 -> do not quote them (they're already quoted) |
2705 | - quote = None -> guess what to do |
2706 | - |
2707 | - This option only affects proper urls like 'file:///etc/passwd'; it |
2708 | - does not affect 'raw' filenames like '/etc/passwd'. The latter |
2709 | - will always be quoted as they are converted to URLs. Also, only |
2710 | - the path part of a url is quoted. If you need more fine-grained |
2711 | - control, you should probably subclass URLParser and pass it in via |
2712 | - the 'urlparser' option. |
2713 | - |
2714 | - ssl_ca_cert = None |
2715 | - |
2716 | - this option can be used if M2Crypto is available and will be |
2717 | - ignored otherwise. If provided, it will be used to create an SSL |
2718 | - context. If both ssl_ca_cert and ssl_context are provided, then |
2719 | - ssl_context will be ignored and a new context will be created from |
2720 | - ssl_ca_cert. |
2721 | - |
2722 | - ssl_context = None |
2723 | - |
2724 | - No-op when using the curl backend (default) |
2725 | - |
2726 | - |
2727 | - self.ssl_verify_peer = True |
2728 | - |
2729 | - Check the server's certificate to make sure it validates against our CA |
2730 | - |
2731 | - self.ssl_verify_host = True |
2732 | - |
2733 | - Check the server's hostname to make sure it matches the certificate DN |
2734 | - |
2735 | - self.ssl_key = None |
2736 | - |
2737 | - Path to the key the client should use to connect/authenticate with |
2738 | - |
2739 | - self.ssl_key_type = 'PEM' |
2740 | - |
2741 | - PEM or DER - format of key |
2742 | - |
2743 | - self.ssl_cert = None |
2744 | - |
2745 | - Path to the ssl certificate the client should use to authenticate with |
2746 | - |
2747 | - self.ssl_cert_type = 'PEM' |
2748 | - |
2749 | - PEM or DER - format of certificate |
2750 | - |
2751 | - self.ssl_key_pass = None |
2752 | - |
2753 | - password to access the ssl_key |
2754 | - |
2755 | - self.size = None |
2756 | - |
2757 | - maximum size (in bytes) of the thing being downloaded. |
2758 | - This is mostly to keep us from exploding with an endless datastream |
2759 | - |
2760 | - self.max_header_size = 2097152 |
2761 | - |
2762 | - Maximum size (in bytes) of the headers. |
2763 | - |
2764 | - |
2765 | -RETRY RELATED ARGUMENTS |
2766 | - |
2767 | - retry = None |
2768 | - |
2769 | - the number of times to retry the grab before bailing. If this is |
2770 | - zero, it will retry forever. This was intentional... really, it |
2771 | - was :). If this value is not supplied or is supplied but is None |
2772 | - retrying does not occur. |
2773 | - |
2774 | - retrycodes = [-1,2,4,5,6,7] |
2775 | - |
2776 | - a sequence of errorcodes (values of e.errno) for which it should |
2777 | - retry. See the doc on URLGrabError for more details on this. You |
2778 | - might consider modifying a copy of the default codes rather than |
2779 | - building yours from scratch so that if the list is extended in the |
2780 | - future (or one code is split into two) you can still enjoy the |
2781 | - benefits of the default list. You can do that with something like |
2782 | - this: |
2783 | - |
2784 | - retrycodes = urlgrabber.grabber.URLGrabberOptions().retrycodes |
2785 | - if 12 not in retrycodes: |
2786 | - retrycodes.append(12) |
2787 | - |
2788 | - checkfunc = None |
2789 | - |
2790 | - a function to do additional checks. This defaults to None, which |
2791 | - means no additional checking. The function should simply return |
2792 | - on a successful check. It should raise URLGrabError on an |
2793 | - unsuccessful check. Raising of any other exception will be |
2794 | - considered immediate failure and no retries will occur. |
2795 | - |
2796 | - If it raises URLGrabError, the error code will determine the retry |
2797 | - behavior. Negative error numbers are reserved for use by these |
2798 | - passed in functions, so you can use many negative numbers for |
2799 | - different types of failure. By default, -1 results in a retry, |
2800 | - but this can be customized with retrycodes. |
2801 | - |
2802 | - If you simply pass in a function, it will be given exactly one |
2803 | - argument: a CallbackObject instance with the .url attribute |
2804 | - defined and either .filename (for urlgrab) or .data (for urlread). |
2805 | - For urlgrab, .filename is the name of the local file. For |
2806 | - urlread, .data is the actual string data. If you need other |
2807 | - arguments passed to the callback (program state of some sort), you |
2808 | - can do so like this: |
2809 | - |
2810 | - checkfunc=(function, ('arg1', 2), {'kwarg': 3}) |
2811 | - |
2812 | - if the downloaded file has filename /tmp/stuff, then this will |
2813 | - result in this call (for urlgrab): |
2814 | - |
2815 | - function(obj, 'arg1', 2, kwarg=3) |
2816 | - # obj.filename = '/tmp/stuff' |
2817 | - # obj.url = 'http://foo.com/stuff' |
2818 | - |
2819 | - NOTE: both the "args" tuple and "kwargs" dict must be present if |
2820 | - you use this syntax, but either (or both) can be empty. |
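The (function, args, kwargs) convention above is plain argument splicing. A minimal sketch of how such a spec could be dispatched; CallbackObject and the attribute names follow the docstring, while call_checkfunc itself is a hypothetical helper, not urlgrabber API:

```python
class CallbackObject:
    # A plain attribute bag, passed as the first callback argument.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

def call_checkfunc(spec, obj):
    # Accept either a bare callable or a (function, args, kwargs) triple
    # and splice the extra arguments in after the callback object.
    if callable(spec):
        return spec(obj)
    func, args, kwargs = spec
    return func(obj, *args, **kwargs)

seen = []
def check(obj, tag, n, kwarg=None):
    seen.append((obj.filename, obj.url, tag, n, kwarg))

obj = CallbackObject(filename='/tmp/stuff', url='http://foo.com/stuff')
call_checkfunc((check, ('arg1', 2), {'kwarg': 3}), obj)
# seen[0] == ('/tmp/stuff', 'http://foo.com/stuff', 'arg1', 2, 3)
```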
2821 | - |
2822 | - failure_callback = None |
2823 | - |
2824 | - The callback that gets called during retries when an attempt to |
2825 | - fetch a file fails. The syntax for specifying the callback is |
2826 | - identical to checkfunc, except for the attributes defined in the |
2827 | - CallbackObject instance. The attributes for failure_callback are: |
2828 | - |
2829 | - exception = the raised exception |
2830 | - url = the url we're trying to fetch |
2831 | - tries = the number of tries so far (including this one) |
2832 | - retry = the value of the retry option |
2833 | - |
2834 | - The callback is present primarily to inform the calling program of |
2835 | - the failure, but if it raises an exception (including the one it's |
2836 | - passed) that exception will NOT be caught and will therefore cause |
2837 | - future retries to be aborted. |
2838 | - |
2839 | - The callback is called for EVERY failure, including the last one. |
2840 | - On the last try, the callback can raise an alternate exception, |
2841 | - but it cannot (without severe trickiness) prevent the exception |
2842 | - from being raised. |
2843 | - |
2844 | - interrupt_callback = None |
2845 | - |
2846 | - This callback is called if KeyboardInterrupt is received at any |
2847 | - point in the transfer. Basically, this callback can have three |
2848 | - impacts on the fetch process based on the way it exits: |
2849 | - |
2850 | - 1) raise no exception: the current fetch will be aborted, but |
2851 | - any further retries will still take place |
2852 | - |
2853 | - 2) raise a URLGrabError: if you're using a MirrorGroup, then |
2854 | - this will prompt a failover to the next mirror according to |
2855 | - the behavior of the MirrorGroup subclass. It is recommended |
2856 | - that you raise URLGrabError with code 15, 'user abort'. If |
2857 | - you are NOT using a MirrorGroup subclass, then this is the |
2858 | - same as (3). |
2859 | - |
2860 | - 3) raise some other exception (such as KeyboardInterrupt), which |
2861 | - will not be caught at either the grabber or mirror levels. |
2862 | - That is, it will be raised up all the way to the caller. |
2863 | - |
2864 | - This callback is very similar to failure_callback. They are |
2865 | - passed the same arguments, so you could use the same function for |
2866 | - both. |
2867 | - |
2868 | -BANDWIDTH THROTTLING |
2869 | - |
2870 | - urlgrabber supports throttling via two values: throttle and |
2871 | - bandwidth. Between the two, you can either specify an absolute |
2872 | - throttle threshold or specify a threshold as a fraction of maximum |
2873 | - available bandwidth. |
2874 | - |
2875 | - throttle is a number - if it's an int, it's the bytes/second |
2876 | - throttle limit. If it's a float, it is first multiplied by |
2877 | - bandwidth. If throttle == 0, throttling is disabled. If None, the |
2878 | - module-level default (which can be set with set_throttle) is used. |
2879 | - |
2880 | - bandwidth is the nominal max bandwidth in bytes/second. If throttle |
2881 | - is a float and bandwidth == 0, throttling is disabled. If None, the |
2882 | - module-level default (which can be set with set_bandwidth) is used. |
2883 | - |
2884 | - THROTTLING EXAMPLES: |
2885 | - |
2886 | - Let's say you have a 100 Mbps connection. This is (about) 10^8 bits |
2887 | - per second, or 12,500,000 Bytes per second. You have a number of |
2888 | - throttling options: |
2889 | - |
2890 | - *) set_bandwidth(12500000); set_throttle(0.5) # throttle is a float |
2891 | - |
2892 | - This will limit urlgrab to use half of your available bandwidth. |
2893 | - |
2894 | - *) set_throttle(6250000) # throttle is an int |
2895 | - |
2896 | - This will also limit urlgrab to use half of your available |
2897 | - bandwidth, regardless of what bandwidth is set to. |
2898 | - |
2899 | - *) set_bandwidth(6250000); set_throttle(1.0) # float |
2900 | - |
2901 | - Use half your bandwidth |
2902 | - |
2903 | - *) set_bandwidth(6250000); set_throttle(2.0) # float |
2904 | - |
2905 | - Use up to 12,500,000 Bytes per second (your nominal max bandwidth) |
2906 | - |
2907 | - *) set_bandwidth(6250000); set_throttle(0) # throttle = 0 |
2908 | - |
2909 | - Disable throttling - this is more efficient than a very large |
2910 | - throttle setting. |
2911 | - |
2912 | - *) set_bandwidth(0); set_throttle(1.0) # throttle is float, bandwidth = 0 |
2913 | - |
2914 | - Disable throttling - this is the default when the module is loaded. |
2915 | - |
2916 | - SUGGESTED AUTHOR IMPLEMENTATION (THROTTLING) |
2917 | - |
2918 | - While this is flexible, it's not extremely obvious to the user. I |
2919 | - suggest you implement a float throttle as a percent to make the |
2920 | - distinction between absolute and relative throttling very explicit. |
2921 | - |
2922 | - Also, you may want to convert the units to something more convenient |
2923 | - than bytes/second, such as kbps or kB/s, etc. |
2924 | - |
2925 | -""" |
2926 | - |
2927 | - |
2928 | - |
2929 | -import os |
2930 | -import sys |
2931 | -import urlparse |
2932 | -import time |
2933 | -import string |
2934 | -import urllib |
2935 | -import urllib2 |
2936 | -import mimetools |
2937 | -import thread |
2938 | -import types |
2939 | -import stat |
2940 | -import pycurl |
2941 | -from ftplib import parse150 |
2942 | -from StringIO import StringIO |
2943 | -from httplib import HTTPException |
2944 | -import socket |
2945 | -from byterange import range_tuple_normalize, range_tuple_to_header, RangeError |
2946 | - |
2947 | -######################################################################## |
2948 | -# MODULE INITIALIZATION |
2949 | -######################################################################## |
2950 | -try: |
2951 | - exec('from ' + (__name__.split('.'))[0] + ' import __version__') |
2952 | -except: |
2953 | - __version__ = '???' |
2954 | - |
2955 | -try: |
2956 | - # this part isn't going to do much - need to talk to gettext |
2957 | - from i18n import _ |
2958 | -except ImportError, msg: |
2959 | - def _(st): return st |
2960 | - |
2961 | -######################################################################## |
2962 | -# functions for debugging output. These functions are here because they |
2963 | -# are also part of the module initialization. |
2964 | -DEBUG = None |
2965 | -def set_logger(DBOBJ): |
2966 | - """Set the DEBUG object. This is called by _init_default_logger when |
2967 | - the environment variable URLGRABBER_DEBUG is set, but can also be |
2968 | - called by a calling program. Basically, if the calling program uses |
2969 | - the logging module and would like to incorporate urlgrabber logging, |
2970 | - then it can do so this way. It's probably not necessary as most |
2971 | - internal logging is only for debugging purposes. |
2972 | - |
2973 | - The passed-in object should be a logging.Logger instance. It will |
2974 | - be pushed into the keepalive and byterange modules if they're |
2975 | - being used. The mirror module pulls this object in on import, so |
2976 | - you will need to manually push into it. In fact, you may find it |
2977 | - tidier to simply push your logging object (or objects) into each |
2978 | - of these modules independently. |
2979 | - """ |
2980 | - |
2981 | - global DEBUG |
2982 | - DEBUG = DBOBJ |
2983 | - |
2984 | -def _init_default_logger(logspec=None): |
2985 | - '''Examines the environment variable URLGRABBER_DEBUG and creates |
2986 | - a logging object (logging.logger) based on the contents. It takes |
2987 | - the form |
2988 | - |
2989 | - URLGRABBER_DEBUG=level,filename |
2990 | - |
2991 | - where "level" can be either an integer or a log level from the |
2992 | - logging module (DEBUG, INFO, etc). If the integer is zero or |
2993 | - less, logging will be disabled. Filename is the filename where |
2994 | - logs will be sent. If it is "-", then stdout will be used. If |
2995 | - the filename is empty or missing, stderr will be used. If the |
2996 | - variable cannot be processed or the logging module cannot be |
2997 | - imported (python < 2.3) then logging will be disabled. Here are |
2998 | - some examples: |
2999 | - |
3000 | - URLGRABBER_DEBUG=1,debug.txt # log everything to debug.txt |
3001 | - URLGRABBER_DEBUG=WARNING,- # log warning and higher to stdout |
3002 | - URLGRABBER_DEBUG=INFO # log info and higher to stderr |
3003 | - |
3004 | - This function is called during module initialization. It is not |
3005 | - intended to be called from outside. The only reason it is a |
3006 | - function at all is to keep the module-level namespace tidy and to |
3007 | - collect the code into a nice block.''' |
3008 | - |
3009 | - try: |
3010 | - if logspec is None: |
3011 | - logspec = os.environ['URLGRABBER_DEBUG'] |
3012 | - dbinfo = logspec.split(',') |
3013 | - import logging |
3014 | - level = logging._levelNames.get(dbinfo[0], None) |
3015 | - if level is None: level = int(dbinfo[0]) |
3016 | - if level < 1: raise ValueError() |
3017 | - |
3018 | - formatter = logging.Formatter('%(asctime)s %(message)s') |
3019 | - if len(dbinfo) > 1: filename = dbinfo[1] |
3020 | - else: filename = '' |
3021 | - if filename == '': handler = logging.StreamHandler(sys.stderr) |
3022 | - elif filename == '-': handler = logging.StreamHandler(sys.stdout) |
3023 | - else: handler = logging.FileHandler(filename) |
3024 | - handler.setFormatter(formatter) |
3025 | - DBOBJ = logging.getLogger('urlgrabber') |
3026 | - DBOBJ.addHandler(handler) |
3027 | - DBOBJ.setLevel(level) |
3028 | - except (KeyError, ImportError, ValueError): |
3029 | - DBOBJ = None |
3030 | - set_logger(DBOBJ) |
3031 | - |
3032 | -def _log_package_state(): |
3033 | - if not DEBUG: return |
3034 | - DEBUG.info('urlgrabber version = %s' % __version__) |
3035 | - DEBUG.info('trans function "_" = %s' % _) |
3036 | - |
3037 | -_init_default_logger() |
3038 | -_log_package_state() |
3039 | - |
3040 | - |
3041 | -# normally this would be from i18n or something like it ... |
3042 | -def _(st): |
3043 | - return st |
3044 | - |
3045 | -######################################################################## |
3046 | -# END MODULE INITIALIZATION |
3047 | -######################################################################## |
3048 | - |
3049 | - |
3050 | - |
3051 | -class URLGrabError(IOError): |
3052 | - """ |
3053 | - URLGrabError error codes: |
3054 | - |
3055 | - URLGrabber error codes (0 -- 255) |
3056 | - 0 - everything looks good (you should never see this) |
3057 | - 1 - malformed url |
3058 | - 2 - local file doesn't exist |
3059 | - 3 - request for non-file local file (dir, etc) |
3060 | - 4 - IOError on fetch |
3061 | - 5 - OSError on fetch |
3062 | - 6 - no content length header when we expected one |
3063 | - 7 - HTTPException |
3064 | - 8 - Exceeded read limit (for urlread) |
3065 | - 9 - Requested byte range not satisfiable. |
3066 | - 10 - Byte range requested, but range support unavailable |
3067 | - 11 - Illegal reget mode |
3068 | - 12 - Socket timeout |
3069 | - 13 - malformed proxy url |
3070 | - 14 - HTTPError (includes .code and .exception attributes) |
3071 | - 15 - user abort |
3072 | - 16 - error writing to local file |
3073 | - |
3074 | - MirrorGroup error codes (256 -- 511) |
3075 | - 256 - No more mirrors left to try |
3076 | - |
3077 | - Custom (non-builtin) classes derived from MirrorGroup (512 -- 767) |
3078 | - [ this range reserved for application-specific error codes ] |
3079 | - |
3080 | - Retry codes (< 0) |
3081 | - -1 - retry the download, unknown reason |
3082 | - |
3083 | - Note: to test which group a code is in, you can simply do integer |
3084 | - division by 256: e.errno / 256 |
3085 | - |
3086 | - Negative codes are reserved for use by functions passed in to |
3087 | - retrygrab with checkfunc. The value -1 is built in as a generic |
3088 | - retry code and is already included in the retrycodes list. |
3089 | - Therefore, you can create a custom check function that simply |
3090 | - returns -1 and the fetch will be re-tried. For more customized |
3091 | - retries, you can use other negative number and include them in |
3092 | - retry-codes. This is nice for outputting useful messages about |
3093 | - what failed. |
3094 | - |
3095 | - You can use these error codes like so: |
3096 | - try: urlgrab(url) |
3097 | - except URLGrabError, e: |
3098 | - if e.errno == 3: ... |
3099 | - # or |
3100 | - print e.strerror |
3101 | - # or simply |
3102 | - print e #### print '[Errno %i] %s' % (e.errno, e.strerror) |
3103 | - """ |
3104 | - def __init__(self, *args): |
3105 | - IOError.__init__(self, *args) |
3106 | - self.url = "No url specified" |
3107 | - |
3108 | -class CallbackObject: |
3109 | - """Container for returned callback data. |
3110 | - |
3111 | - This is currently a dummy class into which urlgrabber can stuff |
3112 | - information for passing to callbacks. This way, the prototype for |
3113 | - all callbacks is the same, regardless of the data that will be |
3114 | - passed back. Any function that accepts a callback function as an |
3115 | - argument SHOULD document what it will define in this object. |
3116 | - |
3117 | - It is possible that this class will have some greater |
3118 | - functionality in the future. |
3119 | - """ |
3120 | - def __init__(self, **kwargs): |
3121 | - self.__dict__.update(kwargs) |
3122 | - |
3123 | -def urlgrab(url, filename=None, **kwargs): |
3124 | - """grab the file at <url> and make a local copy at <filename> |
3125 | - If filename is none, the basename of the url is used. |
3126 | - urlgrab returns the filename of the local file, which may be different |
3127 | - from the passed-in filename if the copy_local kwarg == 0. |
3128 | - |
3129 | - See module documentation for a description of possible kwargs. |
3130 | - """ |
3131 | - return default_grabber.urlgrab(url, filename, **kwargs) |
3132 | - |
3133 | -def urlopen(url, **kwargs): |
3134 | - """open the url and return a file object |
3135 | - If a progress object or throttle specifications exist, then |
3136 | - a special file object will be returned that supports them. |
3137 | - The file object can be treated like any other file object. |
3138 | - |
3139 | - See module documentation for a description of possible kwargs. |
3140 | - """ |
3141 | - return default_grabber.urlopen(url, **kwargs) |
3142 | - |
3143 | -def urlread(url, limit=None, **kwargs): |
3144 | - """read the url into a string, up to 'limit' bytes |
3145 | - If the limit is exceeded, an exception will be thrown. Note that urlread |
3146 | - is NOT intended to be used as a way of saying "I want the first N bytes" |
3147 | - but rather 'read the whole file into memory, but don't use too much' |
3148 | - |
3149 | - See module documentation for a description of possible kwargs. |
3150 | - """ |
3151 | - return default_grabber.urlread(url, limit, **kwargs) |
3152 | - |
3153 | - |
3154 | -class URLParser: |
3155 | - """Process the URLs before passing them to urllib2. |
3156 | - |
3157 | - This class does several things: |
3158 | - |
3159 | - * add any prefix |
3160 | - * translate a "raw" file to a proper file: url |
3161 | - * handle any http or https auth that's encoded within the url |
3162 | - * quote the url |
3163 | - |
3164 | - Only the "parse" method is called directly, and it calls sub-methods. |
3165 | - |
3166 | - An instance of this class is held in the options object, which |
3167 | - means that it's easy to change the behavior by sub-classing and |
3168 | - passing the replacement in. It need only have a method like: |
3169 | - |
3170 | - url, parts = urlparser.parse(url, opts) |
3171 | - """ |
3172 | - |
3173 | - def parse(self, url, opts): |
3174 | - """parse the url and return the (modified) url and its parts |
3175 | - |
3176 | - Note: a raw file WILL be quoted when it's converted to a URL. |
3177 | - However, other urls (ones which come with a proper scheme) may |
3178 | - or may not be quoted according to opts.quote |
3179 | - |
3180 | - opts.quote = 1 --> quote it |
3181 | - opts.quote = 0 --> do not quote it |
3182 | - opts.quote = None --> guess |
3183 | - """ |
3184 | - quote = opts.quote |
3185 | - |
3186 | - if opts.prefix: |
3187 | - url = self.add_prefix(url, opts.prefix) |
3188 | - |
3189 | - parts = urlparse.urlparse(url) |
3190 | - (scheme, host, path, parm, query, frag) = parts |
3191 | - |
3192 | - if not scheme or (len(scheme) == 1 and scheme in string.letters): |
3193 | - # if a scheme isn't specified, we guess that it's "file:" |
3194 | - if url[0] not in '/\\': url = os.path.abspath(url) |
3195 | - url = 'file:' + urllib.pathname2url(url) |
3196 | - parts = urlparse.urlparse(url) |
3197 | - quote = 0 # pathname2url quotes, so we won't do it again |
3198 | - |
3199 | - if scheme in ['http', 'https']: |
3200 | - parts = self.process_http(parts, url) |
3201 | - |
3202 | - if quote is None: |
3203 | - quote = self.guess_should_quote(parts) |
3204 | - if quote: |
3205 | - parts = self.quote(parts) |
3206 | - |
3207 | - url = urlparse.urlunparse(parts) |
3208 | - return url, parts |
3209 | - |
3210 | - def add_prefix(self, url, prefix): |
3211 | - if prefix[-1] == '/' or url[0] == '/': |
3212 | - url = prefix + url |
3213 | - else: |
3214 | - url = prefix + '/' + url |
3215 | - return url |
3216 | - |
3217 | - def process_http(self, parts, url): |
3218 | - (scheme, host, path, parm, query, frag) = parts |
3219 | - # TODO: auth-parsing here, maybe? pycurl doesn't really need it |
3220 | - return (scheme, host, path, parm, query, frag) |
3221 | - |
3222 | - def quote(self, parts): |
3223 | - """quote the URL |
3224 | - |
3225 | - This method quotes ONLY the path part. If you need to quote |
3226 | - other parts, you should override this and pass in your derived |
3227 | - class. The other alternative is to quote other parts before |
3228 | - passing into urlgrabber. |
3229 | - """ |
3230 | - (scheme, host, path, parm, query, frag) = parts |
3231 | - path = urllib.quote(path) |
3232 | - return (scheme, host, path, parm, query, frag) |
3233 | - |
3234 | - hexvals = '0123456789ABCDEF' |
3235 | - def guess_should_quote(self, parts): |
3236 | - """ |
3237 | - Guess whether we should quote a path. This amounts to |
3238 | - guessing whether it's already quoted. |
3239 | - |
3240 | - find ' ' -> 1 |
3241 | - find '%' -> 1 |
3242 | - find '%XX' -> 0 |
3243 | - else -> 1 |
3244 | - """ |
3245 | - (scheme, host, path, parm, query, frag) = parts |
3246 | - if ' ' in path: |
3247 | - return 1 |
3248 | - ind = string.find(path, '%') |
3249 | - if ind > -1: |
3250 | - while ind > -1: |
3251 | - if len(path) < ind+3: |
3252 | - return 1 |
3253 | - code = path[ind+1:ind+3].upper() |
3254 | - if code[0] not in self.hexvals or \ |
3255 | - code[1] not in self.hexvals: |
3256 | - return 1 |
3257 | - ind = string.find(path, '%', ind+1) |
3258 | - return 0 |
3259 | - return 1 |
3260 | - |
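A sketch of the heuristic `guess_should_quote` applies, inverted to answer "does this path already look percent-encoded?" (a space or a malformed `%XX` escape means it does not):

```python
HEXVALS = '0123456789ABCDEF'

def already_quoted(path):
    """True when every '%' begins a valid %XX escape and no spaces remain."""
    if ' ' in path:
        return False
    i = path.find('%')
    if i == -1:
        return False              # no escapes at all: treat as unquoted
    while i != -1:
        if len(path) < i + 3:
            return False          # '%' too close to the end
        code = path[i + 1:i + 3].upper()
        if code[0] not in HEXVALS or code[1] not in HEXVALS:
            return False
        i = path.find('%', i + 1)
    return True
```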
3261 | -class URLGrabberOptions: |
3262 | - """Class to ease kwargs handling.""" |
3263 | - |
3264 | - def __init__(self, delegate=None, **kwargs): |
3265 | - """Initialize URLGrabberOptions object. |
3266 | - Set default values for all options and then update options specified |
3267 | - in kwargs. |
3268 | - """ |
3269 | - self.delegate = delegate |
3270 | - if delegate is None: |
3271 | - self._set_defaults() |
3272 | - self._set_attributes(**kwargs) |
3273 | - |
3274 | - def __getattr__(self, name): |
3275 | - if self.delegate and hasattr(self.delegate, name): |
3276 | - return getattr(self.delegate, name) |
3277 | - raise AttributeError, name |
3278 | - |
3279 | - def raw_throttle(self): |
3280 | - """Calculate raw throttle value from throttle and bandwidth |
3281 | - values. |
3282 | - """ |
3283 | - if self.throttle <= 0: |
3284 | - return 0 |
3285 | - elif type(self.throttle) == type(0): |
3286 | - return float(self.throttle) |
3287 | - else: # throttle is a float |
3288 | - return self.bandwidth * self.throttle |
3289 | - |
3290 | - def derive(self, **kwargs): |
3291 | - """Create a derived URLGrabberOptions instance. |
3292 | - This method creates a new instance and overrides the |
3293 | - options specified in kwargs. |
3294 | - """ |
3295 | - return URLGrabberOptions(delegate=self, **kwargs) |
3296 | - |
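The delegate chain built by `derive` and `__getattr__` can be sketched like this: a derived options object holds only its overrides and falls back to its parent for everything else (class and attribute names here are illustrative):

```python
class Opts:
    """Minimal delegate-chained options object."""
    def __init__(self, delegate=None, **kwargs):
        self.delegate = delegate
        self.__dict__.update(kwargs)

    def __getattr__(self, name):
        # only called for attributes missing from __dict__
        if self.delegate is not None:
            return getattr(self.delegate, name)
        raise AttributeError(name)

    def derive(self, **kwargs):
        return Opts(delegate=self, **kwargs)

base = Opts(timeout=300, retry=None)
per_call = base.derive(timeout=10)   # overrides timeout, inherits retry
```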
3297 | - def _set_attributes(self, **kwargs): |
3298 | - """Update object attributes with those provided in kwargs.""" |
3299 | - self.__dict__.update(kwargs) |
3300 | - if kwargs.has_key('range'): |
3301 | - # normalize the supplied range value |
3302 | - self.range = range_tuple_normalize(self.range) |
3303 | - if not self.reget in [None, 'simple', 'check_timestamp']: |
3304 | - raise URLGrabError(11, _('Illegal reget mode: %s') \ |
3305 | - % (self.reget, )) |
3306 | - |
3307 | - def _set_defaults(self): |
3308 | - """Set all options to their default values. |
3309 | - When adding new options, make sure a default is |
3310 | - provided here. |
3311 | - """ |
3312 | - self.progress_obj = None |
3313 | - self.throttle = 1.0 |
3314 | - self.bandwidth = 0 |
3315 | - self.retry = None |
3316 | - self.retrycodes = [-1,2,4,5,6,7] |
3317 | - self.checkfunc = None |
3318 | - self.copy_local = 0 |
3319 | - self.close_connection = 0 |
3320 | - self.range = None |
3321 | - self.user_agent = 'urlgrabber/%s' % __version__ |
3322 | - self.keepalive = 1 |
3323 | - self.proxies = None |
3324 | - self.reget = None |
3325 | - self.failure_callback = None |
3326 | - self.interrupt_callback = None |
3327 | - self.prefix = None |
3328 | - self.opener = None |
3329 | - self.cache_openers = True |
3330 | - self.timeout = 300 |
3331 | - self.text = None |
3332 | - self.http_headers = None |
3333 | - self.ftp_headers = None |
3334 | - self.data = None |
3335 | - self.urlparser = URLParser() |
3336 | - self.quote = None |
3337 | - self.ssl_ca_cert = None # sets SSL_CAINFO - path to certdb |
3338 | - self.ssl_context = None # no-op in pycurl |
3339 | - self.ssl_verify_peer = True # check peer's cert for authenticity
3340 | - self.ssl_verify_host = True # verify the host name matches the one in the peer's cert
3341 | - self.ssl_key = None # client key |
3342 | - self.ssl_key_type = 'PEM' #(or DER) |
3343 | - self.ssl_cert = None # client cert |
3344 | - self.ssl_cert_type = 'PEM' # (or DER) |
3345 | - self.ssl_key_pass = None # password to access the key |
3346 | - self.size = None # if we know how big the thing we're getting is going |
3347 | - # to be. this is ultimately a MAXIMUM size for the file |
3348 | - self.max_header_size = 2097152 #2mb seems reasonable for maximum header size |
3349 | - |
3350 | - def __repr__(self): |
3351 | - return self.format() |
3352 | - |
3353 | - def format(self, indent=' '): |
3354 | - keys = self.__dict__.keys() |
3355 | - if self.delegate is not None: |
3356 | - keys.remove('delegate') |
3357 | - keys.sort() |
3358 | - s = '{\n' |
3359 | - for k in keys: |
3360 | - s = s + indent + '%-15s: %s,\n' % \ |
3361 | - (repr(k), repr(self.__dict__[k])) |
3362 | - if self.delegate: |
3363 | - df = self.delegate.format(indent + ' ') |
3364 | - s = s + indent + '%-15s: %s\n' % ("'delegate'", df) |
3365 | - s = s + indent + '}' |
3366 | - return s |
3367 | - |
3368 | -class URLGrabber: |
3369 | - """Provides easy opening of URLs with a variety of options. |
3370 | - |
3371 | - All options are specified as kwargs. Options may be specified when |
3372 | - the class is created and may be overridden on a per request basis. |
3373 | - |
3374 | - New objects inherit default values from default_grabber. |
3375 | - """ |
3376 | - |
3377 | - def __init__(self, **kwargs): |
3378 | - self.opts = URLGrabberOptions(**kwargs) |
3379 | - |
3380 | - def _retry(self, opts, func, *args): |
3381 | - tries = 0 |
3382 | - while 1: |
3383 | - # there are only two ways out of this loop. The second has |
3384 | - # several "sub-ways" |
3385 | - # 1) via the return in the "try" block |
3386 | - # 2) by some exception being raised |
3387 | - # a) an exception is raised that we don't "except"
3388 | - # b) a callback raises ANY exception |
3389 | - # c) we're not retry-ing or have run out of retries |
3390 | - # d) the URLGrabError code is not in retrycodes |
3391 | - # beware of infinite loops :) |
3392 | - tries = tries + 1 |
3393 | - exception = None |
3394 | - retrycode = None |
3395 | - callback = None |
3396 | - if DEBUG: DEBUG.info('attempt %i/%s: %s', |
3397 | - tries, opts.retry, args[0]) |
3398 | - try: |
3399 | - r = apply(func, (opts,) + args, {}) |
3400 | - if DEBUG: DEBUG.info('success') |
3401 | - return r |
3402 | - except URLGrabError, e: |
3403 | - exception = e |
3404 | - callback = opts.failure_callback |
3405 | - retrycode = e.errno |
3406 | - except KeyboardInterrupt, e: |
3407 | - exception = e |
3408 | - callback = opts.interrupt_callback |
3409 | - |
3410 | - if DEBUG: DEBUG.info('exception: %s', exception) |
3411 | - if callback: |
3412 | - if DEBUG: DEBUG.info('calling callback: %s', callback) |
3413 | - cb_func, cb_args, cb_kwargs = self._make_callback(callback) |
3414 | - obj = CallbackObject(exception=exception, url=args[0], |
3415 | - tries=tries, retry=opts.retry) |
3416 | - cb_func(obj, *cb_args, **cb_kwargs) |
3417 | - |
3418 | - if (opts.retry is None) or (tries == opts.retry): |
3419 | - if DEBUG: DEBUG.info('retries exceeded, re-raising') |
3420 | - raise |
3421 | - |
3422 | - if (retrycode is not None) and (retrycode not in opts.retrycodes): |
3423 | - if DEBUG: DEBUG.info('retrycode (%i) not in list %s, re-raising', |
3424 | - retrycode, opts.retrycodes) |
3425 | - raise |
3426 | - |
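The exit conditions enumerated in the comment above can be condensed into a small sketch: keep calling until success, and re-raise when retries are exhausted or the error code is not in the retryable set (simplified; no callbacks):

```python
class GrabError(Exception):
    """Stand-in for URLGrabError carrying an errno."""
    def __init__(self, errno):
        super().__init__(errno)
        self.errno = errno

def retry(func, tries=3, retrycodes=(-1, 4, 5)):
    attempt = 0
    while True:
        attempt += 1
        try:
            return func()                      # exit 1: success
        except GrabError as e:
            if attempt == tries:
                raise                          # exit 2c: out of retries
            if e.errno not in retrycodes:
                raise                          # exit 2d: not retryable

calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise GrabError(4)
    return 'ok'
```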
3427 | - def urlopen(self, url, **kwargs): |
3428 | - """open the url and return a file object |
3429 | - If a progress object or throttle value specified when this |
3430 | - object was created, then a special file object will be |
3431 | - returned that supports them. The file object can be treated |
3432 | - like any other file object. |
3433 | - """ |
3434 | - opts = self.opts.derive(**kwargs) |
3435 | - if DEBUG: DEBUG.debug('combined options: %s' % repr(opts)) |
3436 | - (url,parts) = opts.urlparser.parse(url, opts) |
3437 | - def retryfunc(opts, url): |
3438 | - return PyCurlFileObject(url, filename=None, opts=opts) |
3439 | - return self._retry(opts, retryfunc, url) |
3440 | - |
3441 | - def urlgrab(self, url, filename=None, **kwargs): |
3442 | - """grab the file at <url> and make a local copy at <filename> |
3443 | - If filename is none, the basename of the url is used. |
3444 | - urlgrab returns the filename of the local file, which may be |
3445 | - different from the passed-in filename if copy_local == 0. |
3446 | - """ |
3447 | - opts = self.opts.derive(**kwargs) |
3448 | - if DEBUG: DEBUG.debug('combined options: %s' % repr(opts)) |
3449 | - (url,parts) = opts.urlparser.parse(url, opts) |
3450 | - (scheme, host, path, parm, query, frag) = parts |
3451 | - if filename is None: |
3452 | - filename = os.path.basename( urllib.unquote(path) ) |
3453 | - if scheme == 'file' and not opts.copy_local: |
3454 | - # just return the name of the local file - don't make a |
3455 | - # copy currently |
3456 | - path = urllib.url2pathname(path) |
3457 | - if host: |
3458 | - path = os.path.normpath('//' + host + path) |
3459 | - if not os.path.exists(path): |
3460 | - err = URLGrabError(2, |
3461 | - _('Local file does not exist: %s') % (path, )) |
3462 | - err.url = url |
3463 | - raise err |
3464 | - elif not os.path.isfile(path): |
3465 | - err = URLGrabError(3, |
3466 | - _('Not a normal file: %s') % (path, )) |
3467 | - err.url = url |
3468 | - raise err |
3469 | - |
3470 | - elif not opts.range: |
3471 | - if not opts.checkfunc is None: |
3472 | - cb_func, cb_args, cb_kwargs = \ |
3473 | - self._make_callback(opts.checkfunc) |
3474 | - obj = CallbackObject() |
3475 | - obj.filename = path |
3476 | - obj.url = url |
3477 | - apply(cb_func, (obj, )+cb_args, cb_kwargs) |
3478 | - return path |
3479 | - |
3480 | - def retryfunc(opts, url, filename): |
3481 | - fo = PyCurlFileObject(url, filename, opts) |
3482 | - try: |
3483 | - fo._do_grab() |
3484 | - if not opts.checkfunc is None: |
3485 | - cb_func, cb_args, cb_kwargs = \ |
3486 | - self._make_callback(opts.checkfunc) |
3487 | - obj = CallbackObject() |
3488 | - obj.filename = filename |
3489 | - obj.url = url |
3490 | - apply(cb_func, (obj, )+cb_args, cb_kwargs) |
3491 | - finally: |
3492 | - fo.close() |
3493 | - return filename |
3494 | - |
3495 | - return self._retry(opts, retryfunc, url, filename) |
3496 | - |
3497 | - def urlread(self, url, limit=None, **kwargs): |
3498 | - """read the url into a string, up to 'limit' bytes |
3499 | - If the limit is exceeded, an exception will be thrown. Note |
3500 | - that urlread is NOT intended to be used as a way of saying |
3501 | - "I want the first N bytes" but rather 'read the whole file |
3502 | - into memory, but don't use too much' |
3503 | - """ |
3504 | - opts = self.opts.derive(**kwargs) |
3505 | - if DEBUG: DEBUG.debug('combined options: %s' % repr(opts)) |
3506 | - (url,parts) = opts.urlparser.parse(url, opts) |
3507 | - if limit is not None: |
3508 | - limit = limit + 1 |
3509 | - |
3510 | - def retryfunc(opts, url, limit): |
3511 | - fo = PyCurlFileObject(url, filename=None, opts=opts) |
3512 | - s = '' |
3513 | - try: |
3514 | - # this is an unfortunate thing. Some file-like objects |
3515 | - # have a default "limit" of None, while the built-in (real) |
3516 | - # file objects have -1. They each break the other, so for |
3517 | - # now, we just force the default if necessary. |
3518 | - if limit is None: s = fo.read() |
3519 | - else: s = fo.read(limit) |
3520 | - |
3521 | - if not opts.checkfunc is None: |
3522 | - cb_func, cb_args, cb_kwargs = \ |
3523 | - self._make_callback(opts.checkfunc) |
3524 | - obj = CallbackObject() |
3525 | - obj.data = s |
3526 | - obj.url = url |
3527 | - apply(cb_func, (obj, )+cb_args, cb_kwargs) |
3528 | - finally: |
3529 | - fo.close() |
3530 | - return s |
3531 | - |
3532 | - s = self._retry(opts, retryfunc, url, limit) |
3533 | - if limit and len(s) > limit: |
3534 | - err = URLGrabError(8, |
3535 | - _('Exceeded limit (%i): %s') % (limit, url)) |
3536 | - err.url = url |
3537 | - raise err |
3538 | - |
3539 | - return s |
3540 | - |
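The `limit + 1` adjustment above is the whole trick: read one byte more than the caller allows, so a result longer than the limit proves the source exceeded it. A minimal sketch over any file-like object:

```python
import io

def read_limited(fo, limit):
    """Read up to limit bytes; raise if the source holds more than that."""
    data = fo.read(limit + 1)        # one extra byte detects overflow
    if len(data) > limit:
        raise ValueError('exceeded limit (%d)' % limit)
    return data
```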
3541 | - def _make_callback(self, callback_obj): |
3542 | - if callable(callback_obj): |
3543 | - return callback_obj, (), {} |
3544 | - else: |
3545 | - return callback_obj |
3546 | - |
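The convention `_make_callback` encodes: a callback may be a bare callable or a `(func, args, kwargs)` triple; normalizing to the triple form lets every call site invoke it the same way. A sketch:

```python
def make_callback(cb):
    """Normalize a callable or (func, args, kwargs) triple to the triple."""
    if callable(cb):
        return cb, (), {}
    return cb

def fire(cb, obj):
    func, args, kwargs = make_callback(cb)
    return func(obj, *args, **kwargs)
```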
3547 | -# create the default URLGrabber used by urlXXX functions. |
3548 | -# NOTE: actual defaults are set in URLGrabberOptions |
3549 | -default_grabber = URLGrabber() |
3550 | - |
3551 | - |
3552 | -class PyCurlFileObject(): |
3553 | - def __init__(self, url, filename, opts): |
3554 | - self.fo = None |
3555 | - self._hdr_dump = '' |
3556 | - self._parsed_hdr = None |
3557 | - self.url = url |
3558 | - self.scheme = urlparse.urlsplit(self.url)[0] |
3559 | - self.filename = filename |
3560 | - self.append = False |
3561 | - self.reget_time = None |
3562 | - self.opts = opts |
3563 | - if self.opts.reget == 'check_timestamp': |
3564 | - raise NotImplementedError, "check_timestamp regets are not implemented in this version of urlgrabber. Please report this."
3565 | - self._complete = False |
3566 | - self._rbuf = '' |
3567 | - self._rbufsize = 1024*8 |
3568 | - self._ttime = time.time() |
3569 | - self._tsize = 0 |
3570 | - self._amount_read = 0 |
3571 | - self._reget_length = 0 |
3572 | - self._prog_running = False |
3573 | - self._error = (None, None) |
3574 | - self.size = 0 |
3575 | - self._hdr_ended = False |
3576 | - self._do_open() |
3577 | - |
3578 | - |
3579 | - def geturl(self): |
3580 | - """ Provide the geturl() method, used to be got from |
3581 | - urllib.addinfourl, via. urllib.URLopener.* """ |
3582 | - return self.url |
3583 | - |
3584 | - def __getattr__(self, name): |
3585 | - """This effectively allows us to wrap at the instance level. |
3586 | - Any attribute not found in _this_ object will be searched for |
3587 | - in self.fo. This includes methods.""" |
3588 | - |
3589 | - if hasattr(self.fo, name): |
3590 | - return getattr(self.fo, name) |
3591 | - raise AttributeError, name |
3592 | - |
3593 | - def _retrieve(self, buf): |
3594 | - try: |
3595 | - if not self._prog_running: |
3596 | - if self.opts.progress_obj: |
3597 | - size = self.size + self._reget_length |
3598 | - self.opts.progress_obj.start(self._prog_reportname, |
3599 | - urllib.unquote(self.url), |
3600 | - self._prog_basename, |
3601 | - size=size, |
3602 | - text=self.opts.text) |
3603 | - self._prog_running = True |
3604 | - self.opts.progress_obj.update(self._amount_read) |
3605 | - |
3606 | - self._amount_read += len(buf) |
3607 | - self.fo.write(buf) |
3608 | - return len(buf) |
3609 | - except KeyboardInterrupt: |
3610 | - return -1 |
3611 | - |
3612 | - def _hdr_retrieve(self, buf): |
3613 | - if self._hdr_ended: |
3614 | - self._hdr_dump = '' |
3615 | - self.size = 0 |
3616 | - self._hdr_ended = False |
3617 | - |
3618 | - if self._over_max_size(cur=len(self._hdr_dump), |
3619 | - max_size=self.opts.max_header_size): |
3620 | - return -1 |
3621 | - try: |
3622 | - self._hdr_dump += buf |
3623 | - # we have to get the size before we do the progress obj start |
3624 | - # but we can't do that w/o making it do 2 connects, which sucks |
3625 | - # so we cheat and stuff it in here in the hdr_retrieve |
3626 | - if self.scheme in ['http','https'] and buf.lower().find('content-length') != -1: |
3627 | - length = buf.split(':')[1] |
3628 | - self.size = int(length) |
3629 | - elif self.scheme in ['ftp']: |
3630 | - s = None |
3631 | - if buf.startswith('213 '): |
3632 | - s = buf[3:].strip() |
3633 | - elif buf.startswith('150 '): |
3634 | - s = parse150(buf) |
3635 | - if s: |
3636 | - self.size = int(s) |
3637 | - |
3638 | - if buf.lower().find('location') != -1: |
3639 | - location = ':'.join(buf.split(':')[1:]) |
3640 | - location = location.strip() |
3641 | - self.scheme = urlparse.urlsplit(location)[0] |
3642 | - self.url = location |
3643 | - |
3644 | - if len(self._hdr_dump) != 0 and buf == '\r\n': |
3645 | - self._hdr_ended = True |
3646 | - if DEBUG: DEBUG.info('header ended:') |
3647 | - |
3648 | - return len(buf) |
3649 | - except KeyboardInterrupt: |
3650 | - return pycurl.READFUNC_ABORT |
3651 | - |
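The header sniffing in `_hdr_retrieve` (grab the size from `Content-Length`, follow a `Location` on redirect) can be sketched on a single header line; this simplified version matches line prefixes rather than substrings:

```python
def sniff_header(line):
    """Return ('size', n), ('url', location), or (None, None) for a header line."""
    lower = line.lower()
    if lower.startswith('content-length'):
        return ('size', int(line.split(':')[1]))        # int() tolerates \r\n
    if lower.startswith('location'):
        # rejoin on ':' so the scheme's colon in the URL survives the split
        return ('url', ':'.join(line.split(':')[1:]).strip())
    return (None, None)
```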
3652 | - def _return_hdr_obj(self): |
3653 | - if self._parsed_hdr: |
3654 | - return self._parsed_hdr |
3655 | - statusend = self._hdr_dump.find('\n') |
3656 | - statusend += 1 # ridiculous as it may seem. |
3657 | - hdrfp = StringIO() |
3658 | - hdrfp.write(self._hdr_dump[statusend:]) |
3659 | - hdrfp.seek(0) |
3660 | - self._parsed_hdr = mimetools.Message(hdrfp) |
3661 | - return self._parsed_hdr |
3662 | - |
3663 | - hdr = property(_return_hdr_obj) |
3664 | - http_code = property(fget= |
3665 | - lambda self: self.curl_obj.getinfo(pycurl.RESPONSE_CODE)) |
3666 | - |
3667 | - def _set_opts(self, opts={}): |
3668 | - # XXX |
3669 | - if not opts: |
3670 | - opts = self.opts |
3671 | - |
3672 | - |
3673 | - # defaults we're always going to set |
3674 | - self.curl_obj.setopt(pycurl.NOPROGRESS, False) |
3675 | - self.curl_obj.setopt(pycurl.NOSIGNAL, True) |
3676 | - self.curl_obj.setopt(pycurl.WRITEFUNCTION, self._retrieve) |
3677 | - self.curl_obj.setopt(pycurl.HEADERFUNCTION, self._hdr_retrieve) |
3678 | - self.curl_obj.setopt(pycurl.PROGRESSFUNCTION, self._progress_update) |
3679 | - self.curl_obj.setopt(pycurl.FAILONERROR, True) |
3680 | - self.curl_obj.setopt(pycurl.OPT_FILETIME, True) |
3681 | - self.curl_obj.setopt(pycurl.FOLLOWLOCATION, True) |
3682 | - |
3683 | - if DEBUG: |
3684 | - self.curl_obj.setopt(pycurl.VERBOSE, True) |
3685 | - if opts.user_agent: |
3686 | - self.curl_obj.setopt(pycurl.USERAGENT, opts.user_agent) |
3687 | - |
3688 | - # maybe to be options later |
3689 | - self.curl_obj.setopt(pycurl.FOLLOWLOCATION, True) |
3690 | - self.curl_obj.setopt(pycurl.MAXREDIRS, 5) |
3691 | - |
3692 | - # timeouts |
3693 | - timeout = 300 |
3694 | - if hasattr(opts, 'timeout'): |
3695 | - timeout = int(opts.timeout or 0) |
3696 | - self.curl_obj.setopt(pycurl.CONNECTTIMEOUT, timeout) |
3697 | - self.curl_obj.setopt(pycurl.LOW_SPEED_LIMIT, 1) |
3698 | - self.curl_obj.setopt(pycurl.LOW_SPEED_TIME, timeout) |
3699 | - |
3700 | - # ssl options |
3701 | - if self.scheme == 'https': |
3702 | - if opts.ssl_ca_cert: # this may do ZERO with nss according to curl docs |
3703 | - self.curl_obj.setopt(pycurl.CAPATH, opts.ssl_ca_cert) |
3704 | - self.curl_obj.setopt(pycurl.CAINFO, opts.ssl_ca_cert) |
3705 | - self.curl_obj.setopt(pycurl.SSL_VERIFYPEER, opts.ssl_verify_peer) |
3706 | - self.curl_obj.setopt(pycurl.SSL_VERIFYHOST, opts.ssl_verify_host) |
3707 | - if opts.ssl_key: |
3708 | - self.curl_obj.setopt(pycurl.SSLKEY, opts.ssl_key) |
3709 | - if opts.ssl_key_type: |
3710 | - self.curl_obj.setopt(pycurl.SSLKEYTYPE, opts.ssl_key_type) |
3711 | - if opts.ssl_cert: |
3712 | - self.curl_obj.setopt(pycurl.SSLCERT, opts.ssl_cert) |
3713 | - if opts.ssl_cert_type: |
3714 | - self.curl_obj.setopt(pycurl.SSLCERTTYPE, opts.ssl_cert_type) |
3715 | - if opts.ssl_key_pass: |
3716 | - self.curl_obj.setopt(pycurl.SSLKEYPASSWD, opts.ssl_key_pass) |
3717 | - |
3718 | - #headers: |
3719 | - if opts.http_headers and self.scheme in ('http', 'https'): |
3720 | - headers = [] |
3721 | - for (tag, content) in opts.http_headers: |
3722 | - headers.append('%s:%s' % (tag, content)) |
3723 | - self.curl_obj.setopt(pycurl.HTTPHEADER, headers) |
3724 | - |
3725 | - # ranges: |
3726 | - if opts.range or opts.reget: |
3727 | - range_str = self._build_range() |
3728 | - if range_str: |
3729 | - self.curl_obj.setopt(pycurl.RANGE, range_str) |
3730 | - |
3731 | - # throttle/bandwidth |
3732 | - if hasattr(opts, 'raw_throttle') and opts.raw_throttle(): |
3733 | - self.curl_obj.setopt(pycurl.MAX_RECV_SPEED_LARGE, int(opts.raw_throttle())) |
3734 | - |
3735 | - # proxy settings |
3736 | - if opts.proxies: |
3737 | - for (scheme, proxy) in opts.proxies.items(): |
3738 | - if self.scheme in ('ftp',): # only set the ftp proxy for ftp items
3739 | - if scheme not in ('ftp',):
3740 | - continue |
3741 | - else: |
3742 | - if proxy == '_none_': proxy = "" |
3743 | - self.curl_obj.setopt(pycurl.PROXY, proxy) |
3744 | - elif self.scheme in ('http', 'https'): |
3745 | - if scheme not in ('http', 'https'): |
3746 | - continue |
3747 | - else: |
3748 | - if proxy == '_none_': proxy = "" |
3749 | - self.curl_obj.setopt(pycurl.PROXY, proxy) |
3750 | - |
3751 | - # FIXME username/password/auth settings |
3752 | - |
3753 | - #posts - simple - expects the fields as they are |
3754 | - if opts.data: |
3755 | - self.curl_obj.setopt(pycurl.POST, True) |
3756 | - self.curl_obj.setopt(pycurl.POSTFIELDS, self._to_utf8(opts.data)) |
3757 | - |
3758 | - # our url |
3759 | - self.curl_obj.setopt(pycurl.URL, self.url) |
3760 | - |
3761 | - |
3762 | - def _do_perform(self): |
3763 | - if self._complete: |
3764 | - return |
3765 | - |
3766 | - try: |
3767 | - self.curl_obj.perform() |
3768 | - except pycurl.error, e: |
3769 | - # XXX - break some of these out a bit more clearly |
3770 | - # to other URLGrabErrors from |
3771 | - # http://curl.haxx.se/libcurl/c/libcurl-errors.html |
3772 | - # this covers e.args[0] == 22 pretty well - which will be common |
3773 | - |
3774 | - code = self.http_code |
3775 | - errcode = e.args[0] |
3776 | - if self._error[0]: |
3777 | - errcode = self._error[0] |
3778 | - |
3779 | - if errcode == 23 and code >= 200 and code < 299: |
3780 | - err = URLGrabError(15, _('User (or something) called abort %s: %s') % (self.url, e)) |
3781 | - err.url = self.url |
3782 | - |
3783 | - # this is probably wrong but ultimately this is what happens |
3784 | - # we have a legit http code and a pycurl 'writer failed' code |
3785 | - # which almost always means something aborted it from outside |
3786 | - # since we cannot know what it is -I'm banking on it being |
3787 | - # a ctrl-c. XXXX - if there's a way of going back two raises to |
3788 | - # figure out what aborted the pycurl process FIXME |
3789 | - raise KeyboardInterrupt |
3790 | - |
3791 | - elif errcode == 28: |
3792 | - err = URLGrabError(12, _('Timeout on %s: %s') % (self.url, e)) |
3793 | - err.url = self.url |
3794 | - raise err |
3795 | - elif errcode == 35: |
3796 | - msg = _("problem making ssl connection") |
3797 | - err = URLGrabError(14, msg) |
3798 | - err.url = self.url |
3799 | - raise err |
3800 | - elif errcode == 37: |
3801 | - msg = _("Could not open/read %s") % (self.url) |
3802 | - err = URLGrabError(14, msg) |
3803 | - err.url = self.url |
3804 | - raise err |
3805 | - |
3806 | - elif errcode == 42: |
3807 | - err = URLGrabError(15, _('User (or something) called abort %s: %s') % (self.url, e)) |
3808 | - err.url = self.url |
3809 | - # this is probably wrong but ultimately this is what happens |
3810 | - # we have a legit http code and a pycurl 'writer failed' code |
3811 | - # which almost always means something aborted it from outside |
3812 | - # since we cannot know what it is -I'm banking on it being |
3813 | - # a ctrl-c. XXXX - if there's a way of going back two raises to |
3814 | - # figure out what aborted the pycurl process FIXME |
3815 | - raise KeyboardInterrupt |
3816 | - |
3817 | - elif errcode == 58: |
3818 | - msg = _("problem with the local client certificate") |
3819 | - err = URLGrabError(14, msg) |
3820 | - err.url = self.url |
3821 | - raise err |
3822 | - |
3823 | - elif errcode == 60: |
3824 | - msg = _("Peer cert cannot be verified or peer cert invalid") |
3825 | - err = URLGrabError(14, msg) |
3826 | - err.url = self.url |
3827 | - raise err |
3828 | - |
3829 | - elif errcode == 63: |
3830 | - if self._error[1]: |
3831 | - msg = self._error[1] |
3832 | - else: |
3833 | - msg = _("Max download size exceeded on %s") % (self.url) |
3834 | - err = URLGrabError(14, msg) |
3835 | - err.url = self.url |
3836 | - raise err |
3837 | - |
3838 | - elif str(e.args[1]) == '' and self.http_code != 0: # fake it until you make it |
3839 | - if self.scheme in ['http', 'https']: |
3840 | - msg = 'HTTP Error %s : %s ' % (self.http_code, self.url) |
3841 | - elif self.scheme in ['ftp']: |
3842 | - msg = 'FTP Error %s : %s ' % (self.http_code, self.url) |
3843 | - else: |
3844 | - msg = "Unknown Error: URL=%s , scheme=%s" % (self.url, self.scheme) |
3845 | - else: |
3846 | - msg = 'PYCURL ERROR %s - "%s"' % (errcode, str(e.args[1])) |
3847 | - code = errcode |
3848 | - err = URLGrabError(14, msg) |
3849 | - err.code = code |
3850 | - err.exception = e |
3851 | - raise err |
3852 | - else: |
3853 | - if self._error[1]: |
3854 | - msg = self._error[1] |
3855 | - err = URLGrabError(14, msg)
3856 | - err.url = self.url |
3857 | - raise err |
3858 | - |
3859 | - def _do_open(self): |
3860 | - self.curl_obj = _curl_cache |
3861 | - self.curl_obj.reset() # reset all old settings away, just in case |
3862 | - # setup any ranges |
3863 | - self._set_opts() |
3864 | - self._do_grab() |
3865 | - return self.fo |
3866 | - |
3867 | - def _add_headers(self): |
3868 | - pass |
3869 | - |
3870 | - def _build_range(self): |
3871 | - reget_length = 0 |
3872 | - rt = None |
3873 | - if self.opts.reget and type(self.filename) in types.StringTypes: |
3874 | - # we have reget turned on and we're dumping to a file |
3875 | - try: |
3876 | - s = os.stat(self.filename) |
3877 | - except OSError: |
3878 | - pass |
3879 | - else: |
3880 | - self.reget_time = s[stat.ST_MTIME] |
3881 | - reget_length = s[stat.ST_SIZE] |
3882 | - |
3883 | - # Set initial length when regetting |
3884 | - self._amount_read = reget_length |
3885 | - self._reget_length = reget_length # set where we started from, too |
3886 | - |
3887 | - rt = reget_length, '' |
3888 | - self.append = 1 |
3889 | - |
3890 | - if self.opts.range: |
3891 | - rt = self.opts.range |
3892 | - if rt[0]: rt = (rt[0] + reget_length, rt[1]) |
3893 | - |
3894 | - if rt: |
3895 | - header = range_tuple_to_header(rt) |
3896 | - if header: |
3897 | - return header.split('=')[1] |
3898 | - |
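`_build_range` ultimately hands pycurl the part of the range header after `bytes=`. A hypothetical stand-in for that conversion, assuming urlgrabber's `(start, stop)` tuples treat `stop` as exclusive (HTTP byte ranges themselves are inclusive):

```python
def range_tuple_to_curl(rt):
    """Build the 'start-end' string pycurl.RANGE expects from a (start, stop) tuple."""
    start, stop = rt
    start = start or 0
    if stop:
        return '%d-%d' % (start, stop - 1)   # exclusive stop -> inclusive end
    return '%d-' % start                     # open-ended range
```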
3899 | - |
3900 | - |
3901 | - def _make_request(self, req, opener): |
3902 | - #XXXX |
3903 | - # This doesn't do anything really, but we could use this |
3904 | - # instead of do_open() to catch a lot of crap errors as |
3905 | - # mstenner did before here |
3906 | - return (self.fo, self.hdr) |
3907 | - |
3908 | - try: |
3909 | - if self.opts.timeout: |
3910 | - old_to = socket.getdefaulttimeout() |
3911 | - socket.setdefaulttimeout(self.opts.timeout) |
3912 | - try: |
3913 | - fo = opener.open(req) |
3914 | - finally: |
3915 | - socket.setdefaulttimeout(old_to) |
3916 | - else: |
3917 | - fo = opener.open(req) |
3918 | - hdr = fo.info() |
3919 | - except ValueError, e: |
3920 | - err = URLGrabError(1, _('Bad URL: %s : %s') % (self.url, e, )) |
3921 | - err.url = self.url |
3922 | - raise err |
3923 | - |
3924 | - except RangeError, e: |
3925 | - err = URLGrabError(9, _('%s on %s') % (e, self.url)) |
3926 | - err.url = self.url |
3927 | - raise err |
3928 | - except urllib2.HTTPError, e: |
3929 | - new_e = URLGrabError(14, _('%s on %s') % (e, self.url)) |
3930 | - new_e.code = e.code |
3931 | - new_e.exception = e |
3932 | - new_e.url = self.url |
3933 | - raise new_e |
3934 | - except IOError, e: |
3935 | - if hasattr(e, 'reason') and isinstance(e.reason, socket.timeout): |
3936 | - err = URLGrabError(12, _('Timeout on %s: %s') % (self.url, e)) |
3937 | - err.url = self.url |
3938 | - raise err |
3939 | - else: |
3940 | - err = URLGrabError(4, _('IOError on %s: %s') % (self.url, e)) |
3941 | - err.url = self.url |
3942 | - raise err |
3943 | - |
3944 | - except OSError, e: |
3945 | - err = URLGrabError(5, _('%s on %s') % (e, self.url)) |
3946 | - err.url = self.url |
3947 | - raise err |
3948 | - |
3949 | - except HTTPException, e: |
3950 | - err = URLGrabError(7, _('HTTP Exception (%s) on %s: %s') % \ |
3951 | - (e.__class__.__name__, self.url, e)) |
3952 | - err.url = self.url |
3953 | - raise err |
3954 | - |
3955 | - else: |
3956 | - return (fo, hdr) |
3957 | - |
3958 | - def _do_grab(self): |
3959 | - """dump the file to a filename or StringIO buffer""" |
3960 | - |
3961 | - if self._complete: |
3962 | - return |
3963 | - _was_filename = False |
3964 | - if type(self.filename) in types.StringTypes and self.filename: |
3965 | - _was_filename = True |
3966 | - self._prog_reportname = str(self.filename) |
3967 | - self._prog_basename = os.path.basename(self.filename) |
3968 | - |
3969 | - if self.append: mode = 'ab' |
3970 | - else: mode = 'wb' |
3971 | - |
3972 | - if DEBUG: DEBUG.info('opening local file "%s" with mode %s' % \ |
3973 | - (self.filename, mode)) |
3974 | - try: |
3975 | - self.fo = open(self.filename, mode) |
3976 | - except IOError, e: |
3977 | - err = URLGrabError(16, _(\ |
3978 | - 'error opening local file from %s, IOError: %s') % (self.url, e)) |
3979 | - err.url = self.url |
3980 | - raise err |
3981 | - |
3982 | - else: |
3983 | - self._prog_reportname = 'MEMORY' |
3984 | - self._prog_basename = 'MEMORY' |
3985 | - |
3986 | - |
3987 | - self.fo = StringIO() |
3988 | - # if this is to be a tempfile instead.... |
3989 | - # it just makes crap in the tempdir |
3990 | - #fh, self._temp_name = mkstemp() |
3991 | - #self.fo = open(self._temp_name, 'wb') |
3992 | - |
3993 | - |
3994 | - self._do_perform() |
3995 | - |
3996 | - |
3997 | - |
3998 | - if _was_filename: |
3999 | - # close it up |
4000 | - self.fo.flush() |
4001 | - self.fo.close() |
4002 | - # set the time |
4003 | - mod_time = self.curl_obj.getinfo(pycurl.INFO_FILETIME) |
4004 | - if mod_time != -1: |
4005 | - try: |
4006 | - os.utime(self.filename, (mod_time, mod_time)) |
4007 | - except OSError, e: |
4008 | - err = URLGrabError(16, _(\ |
4009 | - 'error setting timestamp on file %s from %s, OSError: %s') |
4010 | - % (self.filenameself.url, e)) |
4011 | - err.url = self.url |
4012 | - raise err |
4013 | - # re open it |
4014 | - try: |
4015 | - self.fo = open(self.filename, 'r') |
4016 | - except IOError, e: |
4017 | - err = URLGrabError(16, _(\ |
4018 | - 'error opening file from %s, IOError: %s') % (self.url, e)) |
4019 | - err.url = self.url |
4020 | - raise err |
4021 | - |
4022 | - else: |
4023 | - #self.fo = open(self._temp_name, 'r') |
4024 | - self.fo.seek(0) |
4025 | - |
4026 | - self._complete = True |
4027 | - |
4028 | - def _fill_buffer(self, amt=None): |
4029 | - """fill the buffer to contain at least 'amt' bytes by reading |
4030 | - from the underlying file object. If amt is None, then it will |
4031 | - read until it gets nothing more. It updates the progress meter |
4032 | - and throttles after every self._rbufsize bytes.""" |
4033 | - # the _rbuf test is only in this first 'if' for speed. It's not |
4034 | - # logically necessary |
4035 | - if self._rbuf and not amt is None: |
4036 | - L = len(self._rbuf) |
4037 | - if amt > L: |
4038 | - amt = amt - L |
4039 | - else: |
4040 | - return |
4041 | - |
4042 | - # if we've made it here, then we don't have enough in the buffer |
4043 | - # and we need to read more. |
4044 | - |
4045 | - if not self._complete: self._do_grab() #XXX cheater - change on ranges |
4046 | - |
4047 | - buf = [self._rbuf] |
4048 | - bufsize = len(self._rbuf) |
4049 | - while amt is None or amt: |
4050 | - # first, delay if necessary for throttling reasons |
4051 | - if self.opts.raw_throttle(): |
4052 | - diff = self._tsize/self.opts.raw_throttle() - \ |
4053 | - (time.time() - self._ttime) |
4054 | - if diff > 0: time.sleep(diff) |
4055 | - self._ttime = time.time() |
4056 | - |
4057 | - # now read some data, up to self._rbufsize |
4058 | - if amt is None: readamount = self._rbufsize |
4059 | - else: readamount = min(amt, self._rbufsize) |
4060 | - try: |
4061 | - new = self.fo.read(readamount) |
4062 | - except socket.error, e: |
4063 | - err = URLGrabError(4, _('Socket Error on %s: %s') % (self.url, e)) |
4064 | - err.url = self.url |
4065 | - raise err |
4066 | - |
4067 | - except socket.timeout, e: |
4068 | - raise URLGrabError(12, _('Timeout on %s: %s') % (self.url, e)) |
4069 | - err.url = self.url |
4070 | - raise err |
4071 | - |
4072 | - except IOError, e: |
4073 | - raise URLGrabError(4, _('IOError on %s: %s') %(self.url, e)) |
4074 | - err.url = self.url |
4075 | - raise err |
4076 | - |
4077 | - newsize = len(new) |
4078 | - if not newsize: break # no more to read |
4079 | - |
4080 | - if amt: amt = amt - newsize |
4081 | - buf.append(new) |
4082 | - bufsize = bufsize + newsize |
4083 | - self._tsize = newsize |
4084 | - self._amount_read = self._amount_read + newsize |
4085 | - #if self.opts.progress_obj: |
4086 | - # self.opts.progress_obj.update(self._amount_read) |
4087 | - |
4088 | - self._rbuf = string.join(buf, '') |
4089 | - return |
4090 | - |
4091 | - def _progress_update(self, download_total, downloaded, upload_total, uploaded): |
4092 | - if self._over_max_size(cur=self._amount_read-self._reget_length): |
4093 | - return -1 |
4094 | - |
4095 | - try: |
4096 | - if self._prog_running: |
4097 | - downloaded += self._reget_length |
4098 | - self.opts.progress_obj.update(downloaded) |
4099 | - except KeyboardInterrupt: |
4100 | - return -1 |
4101 | - |
4102 | - def _over_max_size(self, cur, max_size=None): |
4103 | - |
4104 | - if not max_size: |
4105 | - if not self.opts.size: |
4106 | - max_size = self.size |
4107 | - else: |
4108 | - max_size = self.opts.size |
4109 | - |
4110 | - if not max_size: return False # if we have None for all of the Max then this is dumb |
4111 | - |
4112 | - if cur > int(float(max_size) * 1.10): |
4113 | - |
4114 | - msg = _("Downloaded more than max size for %s: %s > %s") \ |
4115 | - % (self.url, cur, max_size) |
4116 | - self._error = (pycurl.E_FILESIZE_EXCEEDED, msg) |
4117 | - return True |
4118 | - return False |
4119 | - |
4120 | - def _to_utf8(self, obj, errors='replace'): |
4121 | - '''convert 'unicode' to an encoded utf-8 byte string ''' |
4122 | - # stolen from yum.i18n |
4123 | - if isinstance(obj, unicode): |
4124 | - obj = obj.encode('utf-8', errors) |
4125 | - return obj |
4126 | - |
4127 | - def read(self, amt=None): |
4128 | - self._fill_buffer(amt) |
4129 | - if amt is None: |
4130 | - s, self._rbuf = self._rbuf, '' |
4131 | - else: |
4132 | - s, self._rbuf = self._rbuf[:amt], self._rbuf[amt:] |
4133 | - return s |
4134 | - |
4135 | - def readline(self, limit=-1): |
4136 | - if not self._complete: self._do_grab() |
4137 | - return self.fo.readline() |
4138 | - |
4139 | - i = string.find(self._rbuf, '\n') |
4140 | - while i < 0 and not (0 < limit <= len(self._rbuf)): |
4141 | - L = len(self._rbuf) |
4142 | - self._fill_buffer(L + self._rbufsize) |
4143 | - if not len(self._rbuf) > L: break |
4144 | - i = string.find(self._rbuf, '\n', L) |
4145 | - |
4146 | - if i < 0: i = len(self._rbuf) |
4147 | - else: i = i+1 |
4148 | - if 0 <= limit < len(self._rbuf): i = limit |
4149 | - |
4150 | - s, self._rbuf = self._rbuf[:i], self._rbuf[i:] |
4151 | - return s |
4152 | - |
4153 | - def close(self): |
4154 | - if self._prog_running: |
4155 | - self.opts.progress_obj.end(self._amount_read) |
4156 | - self.fo.close() |
4157 | - |
4158 | - def geturl(self): |
4159 | - """ Provide the geturl() method, used to be got from |
4160 | - urllib.addinfourl, via. urllib.URLopener.* """ |
4161 | - return self.url |
4162 | - |
4163 | -_curl_cache = pycurl.Curl() # make one and reuse it over and over and over |
4164 | - |
4165 | -def reset_curl_obj(): |
4166 | - """To make sure curl has reread the network/dns info we force a reload""" |
4167 | - global _curl_cache |
4168 | - _curl_cache.close() |
4169 | - _curl_cache = pycurl.Curl() |
4170 | - |
4171 | - |
4172 | - |
4173 | - |
4174 | -##################################################################### |
4175 | -# DEPRECATED FUNCTIONS |
4176 | -def set_throttle(new_throttle): |
4177 | - """Deprecated. Use: default_grabber.throttle = new_throttle""" |
4178 | - default_grabber.throttle = new_throttle |
4179 | - |
4180 | -def set_bandwidth(new_bandwidth): |
4181 | - """Deprecated. Use: default_grabber.bandwidth = new_bandwidth""" |
4182 | - default_grabber.bandwidth = new_bandwidth |
4183 | - |
4184 | -def set_progress_obj(new_progress_obj): |
4185 | - """Deprecated. Use: default_grabber.progress_obj = new_progress_obj""" |
4186 | - default_grabber.progress_obj = new_progress_obj |
4187 | - |
4188 | -def set_user_agent(new_user_agent): |
4189 | - """Deprecated. Use: default_grabber.user_agent = new_user_agent""" |
4190 | - default_grabber.user_agent = new_user_agent |
4191 | - |
4192 | -def retrygrab(url, filename=None, copy_local=0, close_connection=0, |
4193 | - progress_obj=None, throttle=None, bandwidth=None, |
4194 | - numtries=3, retrycodes=[-1,2,4,5,6,7], checkfunc=None): |
4195 | - """Deprecated. Use: urlgrab() with the retry arg instead""" |
4196 | - kwargs = {'copy_local' : copy_local, |
4197 | - 'close_connection' : close_connection, |
4198 | - 'progress_obj' : progress_obj, |
4199 | - 'throttle' : throttle, |
4200 | - 'bandwidth' : bandwidth, |
4201 | - 'retry' : numtries, |
4202 | - 'retrycodes' : retrycodes, |
4203 | - 'checkfunc' : checkfunc |
4204 | - } |
4205 | - return urlgrab(url, filename, **kwargs) |
4206 | - |
4207 | - |
4208 | -##################################################################### |
4209 | -# TESTING |
4210 | -def _main_test(): |
4211 | - try: url, filename = sys.argv[1:3] |
4212 | - except ValueError: |
4213 | - print 'usage:', sys.argv[0], \ |
4214 | - '<url> <filename> [copy_local=0|1] [close_connection=0|1]' |
4215 | - sys.exit() |
4216 | - |
4217 | - kwargs = {} |
4218 | - for a in sys.argv[3:]: |
4219 | - k, v = string.split(a, '=', 1) |
4220 | - kwargs[k] = int(v) |
4221 | - |
4222 | - set_throttle(1.0) |
4223 | - set_bandwidth(32 * 1024) |
4224 | - print "throttle: %s, throttle bandwidth: %s B/s" % (default_grabber.throttle, |
4225 | - default_grabber.bandwidth) |
4226 | - |
4227 | - try: from progress import text_progress_meter |
4228 | - except ImportError, e: pass |
4229 | - else: kwargs['progress_obj'] = text_progress_meter() |
4230 | - |
4231 | - try: name = apply(urlgrab, (url, filename), kwargs) |
4232 | - except URLGrabError, e: print e |
4233 | - else: print 'LOCAL FILE:', name |
4234 | - |
4235 | - |
4236 | -def _retry_test(): |
4237 | - try: url, filename = sys.argv[1:3] |
4238 | - except ValueError: |
4239 | - print 'usage:', sys.argv[0], \ |
4240 | - '<url> <filename> [copy_local=0|1] [close_connection=0|1]' |
4241 | - sys.exit() |
4242 | - |
4243 | - kwargs = {} |
4244 | - for a in sys.argv[3:]: |
4245 | - k, v = string.split(a, '=', 1) |
4246 | - kwargs[k] = int(v) |
4247 | - |
4248 | - try: from progress import text_progress_meter |
4249 | - except ImportError, e: pass |
4250 | - else: kwargs['progress_obj'] = text_progress_meter() |
4251 | - |
4252 | - def cfunc(filename, hello, there='foo'): |
4253 | - print hello, there |
4254 | - import random |
4255 | - rnum = random.random() |
4256 | - if rnum < .5: |
4257 | - print 'forcing retry' |
4258 | - raise URLGrabError(-1, 'forcing retry') |
4259 | - if rnum < .75: |
4260 | - print 'forcing failure' |
4261 | - raise URLGrabError(-2, 'forcing immediate failure') |
4262 | - print 'success' |
4263 | - return |
4264 | - |
4265 | - kwargs['checkfunc'] = (cfunc, ('hello',), {'there':'there'}) |
4266 | - try: name = apply(retrygrab, (url, filename), kwargs) |
4267 | - except URLGrabError, e: print e |
4268 | - else: print 'LOCAL FILE:', name |
4269 | - |
4270 | -def _file_object_test(filename=None): |
4271 | - import cStringIO |
4272 | - if filename is None: |
4273 | - filename = __file__ |
4274 | - print 'using file "%s" for comparisons' % filename |
4275 | - fo = open(filename) |
4276 | - s_input = fo.read() |
4277 | - fo.close() |
4278 | - |
4279 | - for testfunc in [_test_file_object_smallread, |
4280 | - _test_file_object_readall, |
4281 | - _test_file_object_readline, |
4282 | - _test_file_object_readlines]: |
4283 | - fo_input = cStringIO.StringIO(s_input) |
4284 | - fo_output = cStringIO.StringIO() |
4285 | - wrapper = PyCurlFileObject(fo_input, None, 0) |
4286 | - print 'testing %-30s ' % testfunc.__name__, |
4287 | - testfunc(wrapper, fo_output) |
4288 | - s_output = fo_output.getvalue() |
4289 | - if s_output == s_input: print 'passed' |
4290 | - else: print 'FAILED' |
4291 | - |
4292 | -def _test_file_object_smallread(wrapper, fo_output): |
4293 | - while 1: |
4294 | - s = wrapper.read(23) |
4295 | - fo_output.write(s) |
4296 | - if not s: return |
4297 | - |
4298 | -def _test_file_object_readall(wrapper, fo_output): |
4299 | - s = wrapper.read() |
4300 | - fo_output.write(s) |
4301 | - |
4302 | -def _test_file_object_readline(wrapper, fo_output): |
4303 | - while 1: |
4304 | - s = wrapper.readline() |
4305 | - fo_output.write(s) |
4306 | - if not s: return |
4307 | - |
4308 | -def _test_file_object_readlines(wrapper, fo_output): |
4309 | - li = wrapper.readlines() |
4310 | - fo_output.write(string.join(li, '')) |
4311 | - |
4312 | -if __name__ == '__main__': |
4313 | - _main_test() |
4314 | - _retry_test() |
4315 | - _file_object_test('test') |
4316 | |
4317 | === modified file 'ChangeLog' |
4318 | --- ChangeLog 2010-06-21 20:36:19 +0000 |
4319 | +++ ChangeLog 2014-12-13 22:24:13 +0000 |
4320 | @@ -1,3 +1,11 @@ |
4321 | +2013-10-09 Zdenek Pavlas <zpavlas@redhat.com> |
4322 | + |
4323 | + * lots of enahncements and bugfixes |
4324 | + (parallel downloading, mirror profiling, new options) |
4325 | + * updated authors, url |
4326 | + * updated unit tests |
4327 | + * bump version to 3.10 |
4328 | + |
4329 | 2009-09-25 Seth Vidal <skvidal@fedoraproject.org> |
4330 | |
4331 | * urlgrabber/__init__.py: bump version to 3.9.1 |
4332 | |
4333 | === modified file 'MANIFEST' |
4334 | --- MANIFEST 2010-06-21 20:36:19 +0000 |
4335 | +++ MANIFEST 2014-12-13 22:24:13 +0000 |
4336 | @@ -1,3 +1,4 @@ |
4337 | +# file GENERATED by distutils, do NOT edit |
4338 | ChangeLog |
4339 | LICENSE |
4340 | MANIFEST |
4341 | @@ -6,6 +7,7 @@ |
4342 | makefile |
4343 | setup.py |
4344 | scripts/urlgrabber |
4345 | +scripts/urlgrabber-ext-down |
4346 | test/base_test_code.py |
4347 | test/grabberperf.py |
4348 | test/munittest.py |
4349 | |
4350 | === modified file 'PKG-INFO' |
4351 | --- PKG-INFO 2010-06-21 20:36:19 +0000 |
4352 | +++ PKG-INFO 2014-12-13 22:24:13 +0000 |
4353 | @@ -1,37 +1,37 @@ |
4354 | -Metadata-Version: 1.0 |
4355 | +Metadata-Version: 1.1 |
4356 | Name: urlgrabber |
4357 | -Version: 3.9.1 |
4358 | +Version: 3.10.1 |
4359 | Summary: A high-level cross-protocol url-grabber |
4360 | -Home-page: http://linux.duke.edu/projects/urlgrabber/ |
4361 | +Home-page: http://urlgrabber.baseurl.org/ |
4362 | Author: Michael D. Stenner, Ryan Tomayko |
4363 | -Author-email: mstenner@linux.duke.edu, skvidal@fedoraproject.org |
4364 | +Author-email: mstenner@linux.duke.edu, zpavlas@redhat.com |
4365 | License: LGPL |
4366 | Description: A high-level cross-protocol url-grabber. |
4367 | |
4368 | Using urlgrabber, data can be fetched in three basic ways: |
4369 | |
4370 | - urlgrab(url) copy the file to the local filesystem |
4371 | - urlopen(url) open the remote file and return a file object |
4372 | - (like urllib2.urlopen) |
4373 | - urlread(url) return the contents of the file as a string |
4374 | + urlgrab(url) copy the file to the local filesystem |
4375 | + urlopen(url) open the remote file and return a file object |
4376 | + (like urllib2.urlopen) |
4377 | + urlread(url) return the contents of the file as a string |
4378 | |
4379 | When using these functions (or methods), urlgrabber supports the |
4380 | following features: |
4381 | |
4382 | - * identical behavior for http://, ftp://, and file:// urls |
4383 | - * http keepalive - faster downloads of many files by using |
4384 | - only a single connection |
4385 | - * byte ranges - fetch only a portion of the file |
4386 | - * reget - for a urlgrab, resume a partial download |
4387 | - * progress meters - the ability to report download progress |
4388 | - automatically, even when using urlopen! |
4389 | - * throttling - restrict bandwidth usage |
4390 | - * retries - automatically retry a download if it fails. The |
4391 | - number of retries and failure types are configurable. |
4392 | - * authenticated server access for http and ftp |
4393 | - * proxy support - support for authenticated http and ftp proxies |
4394 | - * mirror groups - treat a list of mirrors as a single source, |
4395 | - automatically switching mirrors if there is a failure. |
4396 | + * identical behavior for http://, ftp://, and file:// urls |
4397 | + * http keepalive - faster downloads of many files by using |
4398 | + only a single connection |
4399 | + * byte ranges - fetch only a portion of the file |
4400 | + * reget - for a urlgrab, resume a partial download |
4401 | + * progress meters - the ability to report download progress |
4402 | + automatically, even when using urlopen! |
4403 | + * throttling - restrict bandwidth usage |
4404 | + * retries - automatically retry a download if it fails. The |
4405 | + number of retries and failure types are configurable. |
4406 | + * authenticated server access for http and ftp |
4407 | + * proxy support - support for authenticated http and ftp proxies |
4408 | + * mirror groups - treat a list of mirrors as a single source, |
4409 | + automatically switching mirrors if there is a failure. |
4410 | |
4411 | Platform: UNKNOWN |
4412 | Classifier: Development Status :: 4 - Beta |
4413 | |
4414 | === modified file 'README' |
4415 | --- README 2005-10-23 12:29:28 +0000 |
4416 | +++ README 2014-12-13 22:24:13 +0000 |
4417 | @@ -19,7 +19,7 @@ |
4418 | python setup.py bdist_rpm |
4419 | |
4420 | The rpms (both source and "binary") will be specific to the current |
4421 | -distrubution/version and may not be portable to others. This is |
4422 | +distribution/version and may not be portable to others. This is |
4423 | because they will be built for the currently installed python. |
4424 | |
4425 | keepalive.py and byterange.py are generic urllib2 extension modules and |
4426 | |
4427 | === modified file 'debian/changelog' |
4428 | --- debian/changelog 2014-02-23 13:54:39 +0000 |
4429 | +++ debian/changelog 2014-12-13 22:24:13 +0000 |
4430 | @@ -1,3 +1,10 @@ |
4431 | +urlgrabber (3.10.1-0ubuntu1) vivid; urgency=medium |
4432 | + |
4433 | + * New upstream release. |
4434 | + * Drop all patches, fixed upstream |
4435 | + |
4436 | + -- Jackson Doak <noskcaj@ubuntu.com> Sun, 14 Dec 2014 09:12:57 +1100 |
4437 | + |
4438 | urlgrabber (3.9.1-4ubuntu3) trusty; urgency=medium |
4439 | |
4440 | * Rebuild to drop files installed into /usr/share/pyshared. |
4441 | |
4442 | === removed file 'debian/patches/grabber_fix.diff' |
4443 | --- debian/patches/grabber_fix.diff 2010-07-08 17:40:08 +0000 |
4444 | +++ debian/patches/grabber_fix.diff 1970-01-01 00:00:00 +0000 |
4445 | @@ -1,236 +0,0 @@ |
4446 | ---- urlgrabber-3.9.1/urlgrabber/grabber.py.orig 2010-07-02 21:24:12.000000000 -0400 |
4447 | -+++ urlgrabber-3.9.1/urlgrabber/grabber.py 2010-07-02 20:30:25.000000000 -0400 |
4448 | -@@ -68,14 +68,14 @@ |
4449 | - (which can be set on default_grabber.throttle) is used. See |
4450 | - BANDWIDTH THROTTLING for more information. |
4451 | - |
4452 | -- timeout = None |
4453 | -+ timeout = 300 |
4454 | - |
4455 | -- a positive float expressing the number of seconds to wait for socket |
4456 | -- operations. If the value is None or 0.0, socket operations will block |
4457 | -- forever. Setting this option causes urlgrabber to call the settimeout |
4458 | -- method on the Socket object used for the request. See the Python |
4459 | -- documentation on settimeout for more information. |
4460 | -- http://www.python.org/doc/current/lib/socket-objects.html |
4461 | -+ a positive integer expressing the number of seconds to wait before |
4462 | -+ timing out attempts to connect to a server. If the value is None |
4463 | -+ or 0, connection attempts will not time out. The timeout is passed |
4464 | -+ to the underlying pycurl object as its CONNECTTIMEOUT option, see |
4465 | -+ the curl documentation on CURLOPT_CONNECTTIMEOUT for more information. |
4466 | -+ http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTCONNECTTIMEOUT |
4467 | - |
4468 | - bandwidth = 0 |
4469 | - |
4470 | -@@ -439,6 +439,12 @@ |
4471 | - except: |
4472 | - __version__ = '???' |
4473 | - |
4474 | -+try: |
4475 | -+ # this part isn't going to do much - need to talk to gettext |
4476 | -+ from i18n import _ |
4477 | -+except ImportError, msg: |
4478 | -+ def _(st): return st |
4479 | -+ |
4480 | - ######################################################################## |
4481 | - # functions for debugging output. These functions are here because they |
4482 | - # are also part of the module initialization. |
4483 | -@@ -808,7 +814,7 @@ |
4484 | - self.prefix = None |
4485 | - self.opener = None |
4486 | - self.cache_openers = True |
4487 | -- self.timeout = None |
4488 | -+ self.timeout = 300 |
4489 | - self.text = None |
4490 | - self.http_headers = None |
4491 | - self.ftp_headers = None |
4492 | -@@ -1052,9 +1058,15 @@ |
4493 | - self._reget_length = 0 |
4494 | - self._prog_running = False |
4495 | - self._error = (None, None) |
4496 | -- self.size = None |
4497 | -+ self.size = 0 |
4498 | -+ self._hdr_ended = False |
4499 | - self._do_open() |
4500 | - |
4501 | -+ |
4502 | -+ def geturl(self): |
4503 | -+ """ Provide the geturl() method, used to be got from |
4504 | -+ urllib.addinfourl, via. urllib.URLopener.* """ |
4505 | -+ return self.url |
4506 | - |
4507 | - def __getattr__(self, name): |
4508 | - """This effectively allows us to wrap at the instance level. |
4509 | -@@ -1085,9 +1097,14 @@ |
4510 | - return -1 |
4511 | - |
4512 | - def _hdr_retrieve(self, buf): |
4513 | -+ if self._hdr_ended: |
4514 | -+ self._hdr_dump = '' |
4515 | -+ self.size = 0 |
4516 | -+ self._hdr_ended = False |
4517 | -+ |
4518 | - if self._over_max_size(cur=len(self._hdr_dump), |
4519 | - max_size=self.opts.max_header_size): |
4520 | -- return -1 |
4521 | -+ return -1 |
4522 | - try: |
4523 | - self._hdr_dump += buf |
4524 | - # we have to get the size before we do the progress obj start |
4525 | -@@ -1104,7 +1121,17 @@ |
4526 | - s = parse150(buf) |
4527 | - if s: |
4528 | - self.size = int(s) |
4529 | -- |
4530 | -+ |
4531 | -+ if buf.lower().find('location') != -1: |
4532 | -+ location = ':'.join(buf.split(':')[1:]) |
4533 | -+ location = location.strip() |
4534 | -+ self.scheme = urlparse.urlsplit(location)[0] |
4535 | -+ self.url = location |
4536 | -+ |
4537 | -+ if len(self._hdr_dump) != 0 and buf == '\r\n': |
4538 | -+ self._hdr_ended = True |
4539 | -+ if DEBUG: DEBUG.info('header ended:') |
4540 | -+ |
4541 | - return len(buf) |
4542 | - except KeyboardInterrupt: |
4543 | - return pycurl.READFUNC_ABORT |
4544 | -@@ -1113,8 +1140,10 @@ |
4545 | - if self._parsed_hdr: |
4546 | - return self._parsed_hdr |
4547 | - statusend = self._hdr_dump.find('\n') |
4548 | -+ statusend += 1 # ridiculous as it may seem. |
4549 | - hdrfp = StringIO() |
4550 | - hdrfp.write(self._hdr_dump[statusend:]) |
4551 | -+ hdrfp.seek(0) |
4552 | - self._parsed_hdr = mimetools.Message(hdrfp) |
4553 | - return self._parsed_hdr |
4554 | - |
4555 | -@@ -1136,6 +1165,7 @@ |
4556 | - self.curl_obj.setopt(pycurl.PROGRESSFUNCTION, self._progress_update) |
4557 | - self.curl_obj.setopt(pycurl.FAILONERROR, True) |
4558 | - self.curl_obj.setopt(pycurl.OPT_FILETIME, True) |
4559 | -+ self.curl_obj.setopt(pycurl.FOLLOWLOCATION, True) |
4560 | - |
4561 | - if DEBUG: |
4562 | - self.curl_obj.setopt(pycurl.VERBOSE, True) |
4563 | -@@ -1148,9 +1178,11 @@ |
4564 | - |
4565 | - # timeouts |
4566 | - timeout = 300 |
4567 | -- if opts.timeout: |
4568 | -- timeout = int(opts.timeout) |
4569 | -- self.curl_obj.setopt(pycurl.CONNECTTIMEOUT, timeout) |
4570 | -+ if hasattr(opts, 'timeout'): |
4571 | -+ timeout = int(opts.timeout or 0) |
4572 | -+ self.curl_obj.setopt(pycurl.CONNECTTIMEOUT, timeout) |
4573 | -+ self.curl_obj.setopt(pycurl.LOW_SPEED_LIMIT, 1) |
4574 | -+ self.curl_obj.setopt(pycurl.LOW_SPEED_TIME, timeout) |
4575 | - |
4576 | - # ssl options |
4577 | - if self.scheme == 'https': |
4578 | -@@ -1276,7 +1308,7 @@ |
4579 | - raise err |
4580 | - |
4581 | - elif errcode == 60: |
4582 | -- msg = _("client cert cannot be verified or client cert incorrect") |
4583 | -+ msg = _("Peer cert cannot be verified or peer cert invalid") |
4584 | - err = URLGrabError(14, msg) |
4585 | - err.url = self.url |
4586 | - raise err |
4587 | -@@ -1291,7 +1323,12 @@ |
4588 | - raise err |
4589 | - |
4590 | - elif str(e.args[1]) == '' and self.http_code != 0: # fake it until you make it |
4591 | -- msg = 'HTTP Error %s : %s ' % (self.http_code, self.url) |
4592 | -+ if self.scheme in ['http', 'https']: |
4593 | -+ msg = 'HTTP Error %s : %s ' % (self.http_code, self.url) |
4594 | -+ elif self.scheme in ['ftp']: |
4595 | -+ msg = 'FTP Error %s : %s ' % (self.http_code, self.url) |
4596 | -+ else: |
4597 | -+ msg = "Unknown Error: URL=%s , scheme=%s" % (self.url, self.scheme) |
4598 | - else: |
4599 | - msg = 'PYCURL ERROR %s - "%s"' % (errcode, str(e.args[1])) |
4600 | - code = errcode |
4601 | -@@ -1299,6 +1336,12 @@ |
4602 | - err.code = code |
4603 | - err.exception = e |
4604 | - raise err |
4605 | -+ else: |
4606 | -+ if self._error[1]: |
4607 | -+ msg = self._error[1] |
4608 | -+ err = URLGRabError(14, msg) |
4609 | -+ err.url = self.url |
4610 | -+ raise err |
4611 | - |
4612 | - def _do_open(self): |
4613 | - self.curl_obj = _curl_cache |
4614 | -@@ -1446,9 +1489,23 @@ |
4615 | - # set the time |
4616 | - mod_time = self.curl_obj.getinfo(pycurl.INFO_FILETIME) |
4617 | - if mod_time != -1: |
4618 | -- os.utime(self.filename, (mod_time, mod_time)) |
4619 | -+ try: |
4620 | -+ os.utime(self.filename, (mod_time, mod_time)) |
4621 | -+ except OSError, e: |
4622 | -+ err = URLGrabError(16, _(\ |
4623 | -+ 'error setting timestamp on file %s from %s, OSError: %s') |
4624 | -+ % (self.filenameself.url, e)) |
4625 | -+ err.url = self.url |
4626 | -+ raise err |
4627 | - # re open it |
4628 | -- self.fo = open(self.filename, 'r') |
4629 | -+ try: |
4630 | -+ self.fo = open(self.filename, 'r') |
4631 | -+ except IOError, e: |
4632 | -+ err = URLGrabError(16, _(\ |
4633 | -+ 'error opening file from %s, IOError: %s') % (self.url, e)) |
4634 | -+ err.url = self.url |
4635 | -+ raise err |
4636 | -+ |
4637 | - else: |
4638 | - #self.fo = open(self._temp_name, 'r') |
4639 | - self.fo.seek(0) |
4640 | -@@ -1532,11 +1589,14 @@ |
4641 | - def _over_max_size(self, cur, max_size=None): |
4642 | - |
4643 | - if not max_size: |
4644 | -- max_size = self.size |
4645 | -- if self.opts.size: # if we set an opts size use that, no matter what |
4646 | -- max_size = self.opts.size |
4647 | -+ if not self.opts.size: |
4648 | -+ max_size = self.size |
4649 | -+ else: |
4650 | -+ max_size = self.opts.size |
4651 | -+ |
4652 | - if not max_size: return False # if we have None for all of the Max then this is dumb |
4653 | -- if cur > max_size + max_size*.10: |
4654 | -+ |
4655 | -+ if cur > int(float(max_size) * 1.10): |
4656 | - |
4657 | - msg = _("Downloaded more than max size for %s: %s > %s") \ |
4658 | - % (self.url, cur, max_size) |
4659 | -@@ -1582,9 +1642,21 @@ |
4660 | - self.opts.progress_obj.end(self._amount_read) |
4661 | - self.fo.close() |
4662 | - |
4663 | -- |
4664 | -+ def geturl(self): |
4665 | -+ """ Provide the geturl() method, used to be got from |
4666 | -+ urllib.addinfourl, via. urllib.URLopener.* """ |
4667 | -+ return self.url |
4668 | -+ |
4669 | - _curl_cache = pycurl.Curl() # make one and reuse it over and over and over |
4670 | - |
4671 | -+def reset_curl_obj(): |
4672 | -+ """To make sure curl has reread the network/dns info we force a reload""" |
4673 | -+ global _curl_cache |
4674 | -+ _curl_cache.close() |
4675 | -+ _curl_cache = pycurl.Curl() |
4676 | -+ |
4677 | -+ |
4678 | -+ |
4679 | - |
4680 | - ##################################################################### |
4681 | - # DEPRECATED FUNCTIONS |
4682 | |
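The dropped grabber_fix.diff above changed the `timeout` option from a socket-level setting to pycurl's `CONNECTTIMEOUT` plus a stall detector (`LOW_SPEED_LIMIT` = 1 byte/s over `LOW_SPEED_TIME` seconds). A plain-Python illustration of that abort rule — this is a sketch of libcurl's documented behavior, not the patch's actual code:

```python
# Illustrative sketch (not libcurl itself) of the low-speed abort rule
# the patch enabled: abort when throughput stays below `limit` bytes/sec
# for `window` consecutive seconds. `samples` holds bytes transferred
# during each one-second interval.

def stalled(samples, limit=1, window=300):
    """True if the last `window` one-second samples all fall below
    `limit` bytes/sec (the LOW_SPEED_LIMIT / LOW_SPEED_TIME rule)."""
    if len(samples) < window:
        return False
    return all(s < limit for s in samples[-window:])

print(stalled([0] * 300))           # True  - fully stalled for 300s
print(stalled([0] * 299 + [1024]))  # False - recovered in the last second
```

The point of the change is that a connection which opens fine but then hangs mid-transfer now times out, where a pure connect timeout would wait forever.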
4683 | === removed file 'debian/patches/progress_fix.diff' |
4684 | --- debian/patches/progress_fix.diff 2010-07-08 17:40:08 +0000 |
4685 | +++ debian/patches/progress_fix.diff 1970-01-01 00:00:00 +0000 |
4686 | @@ -1,11 +0,0 @@ |
4687 | ---- urlgrabber-3.9.1/urlgrabber/progress.py.orig 2010-07-02 21:25:51.000000000 -0400 |
4688 | -+++ urlgrabber-3.9.1/urlgrabber/progress.py 2010-07-02 20:30:25.000000000 -0400 |
4689 | -@@ -658,6 +658,8 @@ |
4690 | - if seconds is None or seconds < 0: |
4691 | - if use_hours: return '--:--:--' |
4692 | - else: return '--:--' |
4693 | -+ elif seconds == float('inf'): |
4694 | -+ return 'Infinite' |
4695 | - else: |
4696 | - seconds = int(seconds) |
4697 | - minutes = seconds / 60 |
4698 | |
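The dropped progress_fix.diff above guarded the progress meter's time formatter against an infinite ETA (which arises when measured speed is zero). A hedged sketch of the resulting behavior, reconstructed from the hunk rather than copied from upstream:

```python
# Sketch of the ETA formatter the patch hardened: unknown or negative
# times render as dashes; an infinite ETA renders as 'Infinite' instead
# of feeding float('inf') to int().

def format_time(seconds, use_hours=False):
    if seconds is None or seconds < 0:
        return '--:--:--' if use_hours else '--:--'
    if seconds == float('inf'):
        return 'Infinite'
    minutes, seconds = divmod(int(seconds), 60)
    if use_hours:
        hours, minutes = divmod(minutes, 60)
        return '%02i:%02i:%02i' % (hours, minutes, seconds)
    return '%02i:%02i' % (minutes, seconds)

print(format_time(75))                    # 01:15
print(format_time(float('inf')))          # Infinite
print(format_time(None, use_hours=True))  # --:--:--
```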
4699 | === removed file 'debian/patches/progress_object_callback_fix.diff' |
4700 | --- debian/patches/progress_object_callback_fix.diff 2011-08-09 17:45:08 +0000 |
4701 | +++ debian/patches/progress_object_callback_fix.diff 1970-01-01 00:00:00 +0000 |
4702 | @@ -1,21 +0,0 @@ |
4703 | -From: James Antill <james@and.org> |
4704 | -Date: Thu, 19 May 2011 20:17:14 +0000 (-0400) |
4705 | -Subject: Fix documentation for progress_object callback. |
4706 | -X-Git-Url: http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commitdiff_plain;h=674d545ee303aa99701ffb982536851572d8db77 |
4707 | - |
4708 | -Fix documentation for progress_object callback. |
4709 | ---- |
4710 | - |
4711 | -diff --git a/urlgrabber/grabber.py b/urlgrabber/grabber.py |
4712 | -index 36212cf..f6f57bd 100644 |
4713 | ---- a/urlgrabber/grabber.py |
4714 | -+++ b/urlgrabber/grabber.py |
4715 | -@@ -49,7 +49,7 @@ GENERAL ARGUMENTS (kwargs) |
4716 | - progress_obj = None |
4717 | - |
4718 | - a class instance that supports the following methods: |
4719 | -- po.start(filename, url, basename, length, text) |
4720 | -+ po.start(filename, url, basename, size, now, text) |
4721 | - # length will be None if unknown |
4722 | - po.update(read) # read == bytes read so far |
4723 | - po.end() |
4724 | |
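The dropped doc patch above corrected the `progress_obj` callback signature to `po.start(filename, url, basename, size, now, text)`. A minimal object satisfying that interface might look like the following — parameter handling here is an assumption drawn from that documentation fix, not from the current upstream API:

```python
# Minimal progress object matching the documented callback shape:
# start(filename, url, basename, size, now, text), update(read), end(read).

class QuietProgress:
    def start(self, filename=None, url=None, basename=None,
              size=None, now=None, text=None):
        self.text = text or basename or url
        self.size = size   # may be None when the length is unknown
        self.read = 0

    def update(self, read):
        self.read = read   # bytes received so far

    def end(self, read):
        self.read = read

p = QuietProgress()
p.start(url='http://example.com/f.iso', basename='f.iso', size=1000)
p.update(250)
p.update(1000)
p.end(1000)
print(p.read, p.size)  # 1000 1000
```

Such an object would be passed as `progress_obj=QuietProgress()` to a grabber call; the grabber drives the three methods itself.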
4725 | === modified file 'debian/patches/series' |
4726 | --- debian/patches/series 2011-08-09 17:45:08 +0000 |
4727 | +++ debian/patches/series 2014-12-13 22:24:13 +0000 |
4728 | @@ -1,3 +0,0 @@ |
4729 | -grabber_fix.diff |
4730 | -progress_fix.diff |
4731 | -progress_object_callback_fix.diff |
4732 | |
4733 | === modified file 'scripts/urlgrabber' |
4734 | --- scripts/urlgrabber 2010-06-21 20:36:19 +0000 |
4735 | +++ scripts/urlgrabber 2014-12-13 22:24:13 +0000 |
4736 | @@ -115,6 +115,7 @@ |
4737 | including quotes in the case of strings. |
4738 | e.g. --user_agent='"foobar/2.0"' |
4739 | |
4740 | + --output FILE |
4741 | -o FILE write output to FILE, otherwise the basename of the |
4742 | url will be used |
4743 | -O print the names of saved files to STDOUT |
4744 | @@ -170,12 +171,17 @@ |
4745 | return ug_options, ug_defaults |
4746 | |
4747 | def process_command_line(self): |
4748 | - short_options = 'vd:hoOpD' |
4749 | + short_options = 'vd:ho:OpD' |
4750 | long_options = ['profile', 'repeat=', 'verbose=', |
4751 | - 'debug=', 'help', 'progress'] |
4752 | + 'debug=', 'help', 'progress', 'output='] |
4753 | ug_long = [ o + '=' for o in self.ug_options ] |
4754 | - optlist, args = getopt.getopt(sys.argv[1:], short_options, |
4755 | - long_options + ug_long) |
4756 | + try: |
4757 | + optlist, args = getopt.getopt(sys.argv[1:], short_options, |
4758 | + long_options + ug_long) |
4759 | + except getopt.GetoptError, e: |
4760 | + print >>sys.stderr, "Error:", e |
4761 | + self.help([], ret=1) |
4762 | + |
4763 | self.verbose = 0 |
4764 | self.debug = None |
4765 | self.outputfile = None |
4766 | @@ -193,6 +199,7 @@ |
4767 | if o == '--verbose': self.verbose = v |
4768 | if o == '-v': self.verbose += 1 |
4769 | if o == '-o': self.outputfile = v |
4770 | + if o == '--output': self.outputfile = v |
4771 | if o == '-p' or o == '--progress': self.progress = 1 |
4772 | if o == '-d' or o == '--debug': self.debug = v |
4773 | if o == '--profile': self.profile = 1 |
4774 | @@ -222,7 +229,7 @@ |
4775 | print "ERROR: cannot use -o when grabbing multiple files" |
4776 | sys.exit(1) |
4777 | |
4778 | - def help(self, args): |
4779 | + def help(self, args, ret=0): |
4780 | if not args: |
4781 | print MAINHELP |
4782 | else: |
4783 | @@ -234,7 +241,7 @@ |
4784 | self.help_ug_option(a) |
4785 | else: |
4786 | print 'ERROR: no help on command "%s"' % a |
4787 | - sys.exit(0) |
4788 | + sys.exit(ret) |
4789 | |
4790 | def help_doc(self): |
4791 | print __doc__ |
4792 | @@ -294,6 +301,7 @@ |
4793 | if self.op.localfile: print f |
4794 | except URLGrabError, e: |
4795 | print e |
4796 | + sys.exit(1) |
4797 | |
4798 | def set_debug_logger(self, dbspec): |
4799 | try: |
4800 | |
4801 | === added file 'scripts/urlgrabber-ext-down' |
4802 | --- scripts/urlgrabber-ext-down 1970-01-01 00:00:00 +0000 |
4803 | +++ scripts/urlgrabber-ext-down 2014-12-13 22:24:13 +0000 |
4804 | @@ -0,0 +1,75 @@ |
4805 | +#! /usr/bin/python |
4806 | +# A very simple external downloader |
4807 | +# Copyright 2011-2012 Zdenek Pavlas |
4808 | + |
4809 | +# This library is free software; you can redistribute it and/or |
4810 | +# modify it under the terms of the GNU Lesser General Public |
4811 | +# License as published by the Free Software Foundation; either |
4812 | +# version 2.1 of the License, or (at your option) any later version. |
4813 | +# |
4814 | +# This library is distributed in the hope that it will be useful, |
4815 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
4816 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU |
4817 | +# Lesser General Public License for more details. |
4818 | +# |
4819 | +# You should have received a copy of the GNU Lesser General Public |
4820 | +# License along with this library; if not, write to the |
4821 | +# Free Software Foundation, Inc., |
4822 | +# 59 Temple Place, Suite 330, |
4823 | +# Boston, MA 02111-1307 USA |
4824 | + |
4825 | +import time, os, errno, sys |
4826 | +from urlgrabber.grabber import \ |
4827 | + _readlines, URLGrabberOptions, _loads, \ |
4828 | + PyCurlFileObject, URLGrabError |
4829 | + |
4830 | +def write(fmt, *arg): |
4831 | + try: os.write(1, fmt % arg) |
4832 | + except OSError, e: |
4833 | + if e.args[0] != errno.EPIPE: raise |
4834 | + sys.exit(1) |
4835 | + |
4836 | +class ProxyProgress: |
4837 | + def start(self, *d1, **d2): |
4838 | + self.next_update = 0 |
4839 | + def update(self, _amount_read): |
4840 | + t = time.time() |
4841 | + if t < self.next_update: return |
4842 | + self.next_update = t + 0.31 |
4843 | + write('%d %d\n', self._id, _amount_read) |
4844 | + |
4845 | +def main(): |
4846 | + import signal |
4847 | + signal.signal(signal.SIGINT, lambda n, f: sys.exit(1)) |
4848 | + cnt = 0 |
4849 | + while True: |
4850 | + lines = _readlines(0) |
4851 | + if not lines: break |
4852 | + for line in lines: |
4853 | + cnt += 1 |
4854 | + opts = URLGrabberOptions() |
4855 | + opts._id = cnt |
4856 | + for k in line.split(' '): |
4857 | + k, v = k.split('=', 1) |
4858 | + setattr(opts, k, _loads(v)) |
4859 | + if opts.progress_obj: |
4860 | + opts.progress_obj = ProxyProgress() |
4861 | + opts.progress_obj._id = cnt |
4862 | + |
4863 | + dlsz = dltm = 0 |
4864 | + try: |
4865 | + fo = PyCurlFileObject(opts.url, opts.filename, opts) |
4866 | + fo._do_grab() |
4867 | + fo.fo.close() |
4868 | + size = fo._amount_read |
4869 | + if fo._tm_last: |
4870 | + dlsz = fo._tm_last[0] - fo._tm_first[0] |
4871 | + dltm = fo._tm_last[1] - fo._tm_first[1] |
4872 | + ug_err = 'OK' |
4873 | + except URLGrabError, e: |
4874 | + size = 0 |
4875 | + ug_err = '%d %d %s' % (e.errno, getattr(e, 'code', 0), e.strerror) |
4876 | + write('%d %d %d %.3f %s\n', opts._id, size, dlsz, dltm, ug_err) |
4877 | + |
4878 | +if __name__ == '__main__': |
4879 | + main() |
4880 | |
4881 | === modified file 'setup.py' |
4882 | --- setup.py 2005-10-23 12:29:28 +0000 |
4883 | +++ setup.py 2014-12-13 22:24:13 +0000 |
4884 | @@ -15,8 +15,10 @@ |
4885 | packages = ['urlgrabber'] |
4886 | package_dir = {'urlgrabber':'urlgrabber'} |
4887 | scripts = ['scripts/urlgrabber'] |
4888 | -data_files = [('share/doc/' + name + '-' + version, |
4889 | - ['README','LICENSE', 'TODO', 'ChangeLog'])] |
4890 | +data_files = [ |
4891 | + ('share/doc/' + name + '-' + version, ['README','LICENSE', 'TODO', 'ChangeLog']), |
4892 | + ('libexec', ['scripts/urlgrabber-ext-down']), |
4893 | +] |
4894 | options = { 'clean' : { 'all' : 1 } } |
4895 | classifiers = [ |
4896 | 'Development Status :: 4 - Beta', |
4897 | |
4898 | === modified file 'test/base_test_code.py' |
4899 | --- test/base_test_code.py 2005-10-23 12:29:28 +0000 |
4900 | +++ test/base_test_code.py 2014-12-13 22:24:13 +0000 |
4901 | @@ -1,6 +1,6 @@ |
4902 | from munittest import * |
4903 | |
4904 | -base_http = 'http://www.linux.duke.edu/projects/urlgrabber/test/' |
4905 | +base_http = 'http://urlgrabber.baseurl.org/test/' |
4906 | base_ftp = 'ftp://localhost/test/' |
4907 | |
4908 | # set to a proftp server only. we're working around a couple of |
4909 | |
4910 | === modified file 'test/munittest.py' |
4911 | --- test/munittest.py 2005-10-23 12:29:28 +0000 |
4912 | +++ test/munittest.py 2014-12-13 22:24:13 +0000 |
4913 | @@ -113,7 +113,7 @@ |
4914 | __all__ = ['TestResult', 'TestCase', 'TestSuite', 'TextTestRunner', |
4915 | 'TestLoader', 'FunctionTestCase', 'main', 'defaultTestLoader'] |
4916 | |
4917 | -# Expose obsolete functions for backwards compatability |
4918 | +# Expose obsolete functions for backwards compatibility |
4919 | __all__.extend(['getTestCaseNames', 'makeSuite', 'findTestCases']) |
4920 | |
4921 | |
4922 | @@ -410,7 +410,7 @@ |
4923 | (default 7) and comparing to zero. |
4924 | |
4925 | Note that decimal places (from zero) is usually not the same |
4926 | - as significant digits (measured from the most signficant digit). |
4927 | + as significant digits (measured from the most significant digit). |
4928 | """ |
4929 | if round(second-first, places) != 0: |
4930 | raise self.failureException, \ |
4931 | @@ -422,7 +422,7 @@ |
4932 | (default 7) and comparing to zero. |
4933 | |
4934 | Note that decimal places (from zero) is usually not the same |
4935 | - as significant digits (measured from the most signficant digit). |
4936 | + as significant digits (measured from the most significant digit). |
4937 | """ |
4938 | if round(second-first, places) == 0: |
4939 | raise self.failureException, \ |
4940 | |
4941 | === modified file 'test/test_byterange.py' |
4942 | --- test/test_byterange.py 2005-10-23 12:29:28 +0000 |
4943 | +++ test/test_byterange.py 2014-12-13 22:24:13 +0000 |
4944 | @@ -25,7 +25,7 @@ |
4945 | |
4946 | import sys |
4947 | |
4948 | -from StringIO import StringIO |
4949 | +from cStringIO import StringIO |
4950 | from urlgrabber.byterange import RangeableFileObject |
4951 | |
4952 | from base_test_code import * |
4953 | @@ -52,18 +52,6 @@ |
4954 | self.rfo.seek(1,1) |
4955 | self.assertEquals('of', self.rfo.read(2)) |
4956 | |
4957 | - def test_poor_mans_seek(self): |
4958 | - """RangeableFileObject.seek() poor mans version.. |
4959 | - |
4960 | - We just delete the seek method from StringIO so we can |
4961 | - excercise RangeableFileObject when the file object supplied |
4962 | - doesn't support seek. |
4963 | - """ |
4964 | - seek = StringIO.seek |
4965 | - del(StringIO.seek) |
4966 | - self.test_seek() |
4967 | - StringIO.seek = seek |
4968 | - |
4969 | def test_read(self): |
4970 | """RangeableFileObject.read()""" |
4971 | self.assertEquals('the', self.rfo.read(3)) |
4972 | |
4973 | === modified file 'test/test_grabber.py' |
4974 | --- test/test_grabber.py 2010-06-21 20:36:19 +0000 |
4975 | +++ test/test_grabber.py 2014-12-13 22:24:13 +0000 |
4976 | @@ -86,7 +86,7 @@ |
4977 | |
4978 | class HTTPTests(TestCase): |
4979 | def test_reference_file(self): |
4980 | - "download refernce file via HTTP" |
4981 | + "download reference file via HTTP" |
4982 | filename = tempfile.mktemp() |
4983 | grabber.urlgrab(ref_http, filename) |
4984 | |
4985 | @@ -98,6 +98,7 @@ |
4986 | |
4987 | def test_post(self): |
4988 | "do an HTTP post" |
4989 | + self.skip() # disabled on server |
4990 | headers = (('Content-type', 'text/plain'),) |
4991 | ret = grabber.urlread(base_http + 'test_post.php', |
4992 | data=short_reference_data, |
4993 | |
4994 | === modified file 'test/test_mirror.py' |
4995 | --- test/test_mirror.py 2005-12-31 15:34:22 +0000 |
4996 | +++ test/test_mirror.py 2014-12-13 22:24:13 +0000 |
4997 | @@ -28,7 +28,7 @@ |
4998 | import string, tempfile, random, cStringIO, os |
4999 | |
5000 | import urlgrabber.grabber |
The diff has been truncated for viewing.
daniel@daydream:~/urlgrabber$ bzr merge lp:~noskcaj/ubuntu/vivid/urlgrabber/3.10.1
Unapplying quilt patches to prevent spurious conflicts
+N  scripts/urlgrabber-ext-down
 M  ChangeLog
 M  MANIFEST
 M  PKG-INFO
 M  README
 M  debian/changelog
-D  debian/patches/grabber_fix.diff
-D  debian/patches/progress_fix.diff
-D  debian/patches/progress_object_callback_fix.diff
 M  debian/patches/series
 M  scripts/urlgrabber
 M  setup.py
 M  test/base_test_code.py
 M  test/munittest.py
 M  test/test_byterange.py
 M  test/test_grabber.py
 M  test/test_mirror.py
 M  urlgrabber/__init__.py
 M  urlgrabber/byterange.py
 M  urlgrabber/grabber.py
 M  urlgrabber/mirror.py
 M  urlgrabber/progress.py
Text conflict in urlgrabber/grabber.py
1 conflicts encountered.
daniel@daydream:~/urlgrabber$
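The new scripts/urlgrabber-ext-down helper in the diff above reports each finished download on stdout as one line formatted with `'%d %d %d %.3f %s\n'`: request id, bytes read, downloaded size, download time, then an error field (`'OK'` on success, or `errno code strerror` on failure). As an illustration only, the parent-process side of that protocol could be parsed like this (the function name and dict layout are ours, not part of urlgrabber):

```python
def parse_status_line(line):
    """Parse one status line emitted by urlgrabber-ext-down.

    The field layout follows the write() format string in the
    script: '%d %d %d %.3f %s\n'. The error field may itself
    contain spaces, so we split at most four times.
    """
    parts = line.rstrip('\n').split(' ', 4)
    return {
        'id': int(parts[0]),        # request id assigned by the parent
        'size': int(parts[1]),      # bytes read (0 on error)
        'dl_size': int(parts[2]),   # downloaded size between timestamps
        'dl_time': float(parts[3]), # elapsed download time in seconds
        'error': parts[4],          # 'OK', or 'errno code strerror'
    }
```

A line such as `1 1024 1024 0.512 OK` would decode to id 1, 1024 bytes, 0.512 seconds, no error; keeping the error string opaque mirrors how the script joins errno, code, and strerror into one field.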