Merge lp:~mycompostpile/cairo-dock-plug-ins-extras/YoutubeDl into lp:~cairo-dock-team/cairo-dock-plug-ins-extras/third-party

Proposed by Brian
Status: Merged
Merged at revision: 274
Proposed branch: lp:~mycompostpile/cairo-dock-plug-ins-extras/YoutubeDl
Merge into: lp:~cairo-dock-team/cairo-dock-plug-ins-extras/third-party
Diff against target: 5934 lines (+5866/-1)
12 files modified
Translator/Translator (+1/-1)
YoutubeDl/Configuration.py (+53/-0)
YoutubeDl/README (+29/-0)
YoutubeDl/YoutubeDl (+454/-0)
YoutubeDl/YoutubeDl.conf (+209/-0)
YoutubeDl/auto-load.conf (+19/-0)
YoutubeDl/constantTypes.py (+16/-0)
YoutubeDl/fileDialogs.py (+49/-0)
YoutubeDl/helpInfo.py (+70/-0)
YoutubeDl/myYoutubeDownloader.py (+299/-0)
YoutubeDl/userAlerts.py (+20/-0)
YoutubeDl/youtubedl.py (+4647/-0)
To merge this branch: bzr merge lp:~mycompostpile/cairo-dock-plug-ins-extras/YoutubeDl
Reviewer: Matthieu Baerts
Review status: Approve
Review via email: mp+108080@code.launchpad.net

Description of the change

Here is the updated YoutubeDl applet. It has been changed to use CDApplet, and the pynotify initialization has been fixed. I still have some things to do, like allowing the youtube-dl update on right click and writing some documentation. It works pretty well, although the communication via the queue is a little slow, depending on the update frequency set in the YoutubeDl configuration. I use this quite regularly to grab short videos from Youtube, and I also create a URL list of longer videos that I leave to download overnight.
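The queue latency mentioned above comes from polling: the applet reads at most one message from the result queue per timer tick, so the displayed status can lag by up to one update interval. A minimal sketch of one possible mitigation, draining the queue on each tick and keeping only the newest message, is shown here. It is written for Python 3 with the standard `queue` module (the applet itself targets Python 2 and `multiprocessing.Queue`); `drain_latest` and the sample status strings are hypothetical and are not part of the proposed branch.

```python
import queue


def drain_latest(result_queue):
    """Return the newest message waiting on the queue, or None if it is empty.

    Older messages are discarded, so a slow timer never shows stale progress.
    """
    latest = None
    while True:
        try:
            latest = result_queue.get_nowait()
        except queue.Empty:
            return latest


if __name__ == "__main__":
    q = queue.Queue()
    # Made-up progress strings using the "percent;size;speed;eta" layout
    # that doUpdate() splits on ';'.
    for status in ("10.0%;5.00M;100k/s;00:40", "20.0%;5.00M;100k/s;00:35"):
        q.put(status)
    print(drain_latest(q))
```

Draining only changes which message is shown on each tick; the worst-case delay is still bounded by the configured interval.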

274. By Brian

Added a new applet YoutubeDl

Matthieu Baerts (matttbe) wrote :

Hello

Thank you for this new applet! :)

I just quickly tested it and it seems to be working fine!
I just have a few ideas/suggestions:
 * If the download directory doesn't exist (by default ~/Videos), there is a problem, but the applet doesn't explain what the problem is.
 * I also suggest using the translated name for the default folder (check this file ~/.config/user-dirs.dirs or use the output of this command: xdg-user-dir VIDEOS )
 * By default, it's maybe better to use Cairo-Dock's dialogues instead of pynotify (or at least having an option to use these dialogues)
 * Why do you use a list in the dialogue when we do a left click?
 * For the 'Applet Help' dialogue, you can use a 'ShowDialog' with a timeout of 0 (and no button)
 * About this dialogue, there is another entry in YoutubeDL / Applet's Handbook: what's the difference?
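The first two suggestions could be sketched roughly as follows. This is only an illustration in Python 3, assuming the usual `XDG_VIDEOS_DIR="$HOME/..."` line format of ~/.config/user-dirs.dirs; the helper names `videos_dir_from_user_dirs` and `describe_directory_problem` are hypothetical and do not exist in the applet.

```python
import os
import re

# Matches an uncommented line such as: XDG_VIDEOS_DIR="$HOME/Videos"
_XDG_LINE = re.compile(r'^\s*XDG_VIDEOS_DIR\s*=\s*"(.*)"\s*$')


def videos_dir_from_user_dirs(text, home):
    """Extract the videos folder from user-dirs.dirs content.

    Falls back to <home>/Videos when no usable entry is found.
    """
    for line in text.splitlines():
        match = _XDG_LINE.match(line)
        if not match:
            continue
        value = match.group(1)
        if value.startswith("$HOME"):
            return os.path.normpath(home + value[len("$HOME"):])
        if value.startswith("/"):
            return os.path.normpath(value)
    return os.path.join(home, "Videos")


def describe_directory_problem(path):
    """Return a message explaining why path is unusable, or None if it is fine."""
    if not os.path.isdir(path):
        return "Download directory does not exist: %s" % path
    if not os.access(path, os.W_OK):
        return "Download directory is not writable: %s" % path
    return None
```

The message returned by the second helper could then be shown to the user in a dialogue instead of failing silently.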

PS: now you can use this command to sync with the master branch: bzr pull lp:cairo-dock-plug-ins-extras

review: Approve
Fabounet (fabounet03) wrote :

yes please use Dialogs instead of notifications, to be consistent with
the rest of the dock.
If one day we decide to use notifications, it should be a global
option, not scattered across all applets :-)

same remark for the Help, to be consistent with the other applets. The
applet's manual should only be defined in its auto-load.conf file.

+1 for using XDG

and thanks a lot for sharing your applet and contributing to the project ! :-)


Brian (mycompostpile) wrote :

* By default, it's maybe better to use Cairo-Dock's dialogues instead of pynotify (or at least having an option to use these dialogues)

That option is there. I thought I had changed it to default to dialogues but I missed it. It currently is set to true for pynotify. I will change that.

* Why do you use a list in the dialogue when we do a left click?

Some of my lists are long so the ability to scroll through them was important for me. I wanted to use a method that allows editing but haven't got that done yet (currently I pause downloading, save the list, edit with vi then load the list back in to accomplish the edit) so I just went for a quick "display the list method". I would appreciate any suggestions.
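For reference, the save/edit/load workaround relies on the plain-text format used by saveURLs and loadURLs in the diff: one `url::filename` entry per line. A small Python 3 sketch of that round trip is given below; `dump_url_list` and `parse_url_list` are hypothetical helpers, and skipping blank lines (which a hand edit in vi can easily introduce) is an added assumption rather than the applet's current behaviour.

```python
def dump_url_list(entries):
    """Serialize [url, filename] pairs, one 'url::filename' entry per line."""
    return "".join("%s::%s\n" % (url, name) for url, name in entries)


def parse_url_list(text):
    """Parse 'url::filename' lines back into [url, filename] pairs.

    Blank lines are skipped; a line without '::' gets an empty filename.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        url, _, name = line.partition("::")
        entries.append([url, name])
    return entries
```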

* If the download directory doesn't exist (by default ~/Videos), there is a problem, but the applet doesn't explain what the problem is.
 * I also suggest using the translated name for the default folder (check this file ~/.config/user-dirs.dirs or use the output of this command: xdg-user-dir VIDEOS )

Will look into this. I had thought about it before but it got pushed to the back of my mind and didn't get looked into.

About the help: there is a great deal of information about the youtube-dl script, and I wanted the user to be able to use many of its options. Putting all that information in the applet handbook would make it "very busy", so I thought a help menu entry would be better. The only other option for the youtube-dl help is to have the user install youtube-dl and read its man page, which I didn't see as practical. Maybe I can instead put most of the information in the .conf file as tooltips shown when hovering over the options (most people who use this will probably already be familiar with youtube-dl anyway).

Thanks for the comments and I will try to implement them as time permits.

This applet would be much better implemented in C but the original youtube-dl script is in python so I did not want to reinvent the wheel.

I know there can be many improvements so any suggestion is appreciated.

Matthieu Baerts (matttbe) wrote :

>> By default, it's maybe better to use Cairo-Dock's dialogues instead of pynotify (or at least having an option to use these dialogues)
> That option is there.

Oh sorry, I didn't see/understand the option. Yes, it's better now if it's set to use dialogues by default.

>> Why do you use a list in the dialogue when we do a left click?
> Some of my lists are long so the ability to scroll through them was important for me.

Ok I understand. Maybe you can use another widget (or 2 lists in the menu of the applet)?
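As a rough illustration of the "lists in the menu" idea, the queued URLs could be turned into a sub-menu using the same item dictionaries that on_build_menu passes to AddMenuItems in the diff ("type" 1 for a sub-menu, "type" 0 for a normal entry). This is only a sketch in Python 3: `build_url_submenu`, the "Queued URLs" label and the id values are hypothetical, and how the dock renders a long sub-menu has not been checked.

```python
def build_url_submenu(url_list, submenu_id, first_entry_id):
    """Build menu item dicts: one sub-menu plus one entry per queued URL.

    url_list holds [url, filename] pairs, as in the applet's urlList.
    """
    items = [{
        "type": 1,            # sub-menu, as used for the Help Menu in the diff
        "label": "Queued URLs",
        "menu": 0,            # attach to the main menu
        "icon": "gtk-open",
        "id": submenu_id,
    }]
    for offset, (url, file_name) in enumerate(url_list):
        items.append({
            "type": 0,                    # normal entry
            "label": file_name or url,    # fall back to the URL if no filename
            "menu": submenu_id,           # place the entry inside the sub-menu
            "id": first_entry_id + offset,
        })
    return items
```

The returned list could be appended to the items built in on_build_menu before the AddMenuItems call.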

Preview Diff

1=== modified file 'Translator/Translator'
2--- Translator/Translator 2012-03-22 22:23:44 +0000
3+++ Translator/Translator 2012-05-31 00:10:24 +0000
4@@ -490,4 +490,4 @@
5 self.translate(text, self.source.abbrv, self.destination.abbrv)
6
7 if __name__ == '__main__':
8- Applet().run()
9+ CDApplet.run()
10
11=== added directory 'YoutubeDl'
12=== added file 'YoutubeDl/Configuration.py'
13--- YoutubeDl/Configuration.py 1970-01-01 00:00:00 +0000
14+++ YoutubeDl/Configuration.py 2012-05-31 00:10:24 +0000
15@@ -0,0 +1,53 @@
16+# -*- coding: utf-8 -*-
17+
18+# YoutubeDl, plugin for Cairo-Dock. View the available disk space.
19+# Copyright 2010 Xavier Nayrac
20+#
21+# This program is free software: you can redistribute it and/or modify
22+# it under the terms of the GNU General Public License as published by
23+# the Free Software Foundation, either version 3 of the License, or
24+# (at your option) any later version.
25+#
26+# This program is distributed in the hope that it will be useful,
27+# but WITHOUT ANY WARRANTY; without even the implied warranty of
28+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
29+# GNU General Public License for more details.
30+#
31+# You should have received a copy of the GNU General Public License
32+# along with this program. If not, see <http://www.gnu.org/licenses/>.
33+
34+from ConfigParser import RawConfigParser
35+from os.path import isfile
36+from os.path import expanduser
37+import sys
38+
39+CAIRO_CONF_PATH = "~/.config/cairo-dock/current_theme/plug-ins"
40+
41+class Configuration(RawConfigParser):
42+ """
43+ I manage the configuration's file of a Cairo-Dock plugin.
44+ """
45+ def __init__(self, nameOfPlugin):
46+ RawConfigParser.__init__(self)
47+ self.nameOfPlugin = ''
48+ self.__setFile(nameOfPlugin)
49+
50+ def __setFile(self, nameOfPlugin):
51+ """
52+ I set the name of the configuration's file, with the help of plugin's name.
53+ Then I read the configuration.
54+ """
55+ nameOfPlugin = "%s/%s/%s.conf" % (CAIRO_CONF_PATH, nameOfPlugin, nameOfPlugin)
56+ nameOfPlugin = expanduser(nameOfPlugin)
57+ if isfile(nameOfPlugin):
58+ self.nameOfPlugin = nameOfPlugin
59+ self.refresh()
60+ else:
61+ print "%s does not exist! Exiting." % (nameOfPlugin)
62+ sys.exit(1)
63+
64+ def refresh(self):
65+ """
66+ I read the configuration's file.
67+ """
68+ self.readfp(open(self.nameOfPlugin))
69
70=== added file 'YoutubeDl/README'
71--- YoutubeDl/README 1970-01-01 00:00:00 +0000
72+++ YoutubeDl/README 2012-05-31 00:10:24 +0000
73@@ -0,0 +1,29 @@
74+This is an applet to Download Youtube videos
75+
76+Summary:
77+This Youtube video downloader allows users to drag and drop video URLs onto the YoutubeDl icon and have the videos downloaded.
78+
79+This is a work in progress so feel free to report any bugs and/or suggestions.
80+
81+License:
82+GPL, see COPYING.
83+
84+Details:
85+This applet allows a user to drag Youtube links from the Youtube website and
86+drop them on the icon to download. The backend downloader is based on
87+youtube-dl.py. If you have python-tk installed on your system saving and
88+loading url lists will be done using a graphical dialog box, otherwise you
89+need to set the directory path in the configuration and just enter the filename
90+in the popup box. If you have pynotify installed on your system you have the
91+option of letting the notification area of your desktop handle the alerts,
92+otherwise alerts will be done using the cairo-dock popup messages. You can
93+turn this feature on or off under the configuration. Currently the url list
94+is not editable from the dock but if you save the list to a file you can edit
95+with a text editor then load the list back into the plugin. Downloading can
96+be paused from the context menu of the icon and you can choose to have
97+downloads start automatically in the configuration area. The left click button
98+on the mouse brings up the current url list and the middle button can be
99+configured for different actions.
100+
101+Contact:
102+Brian @ mycompostpile@yahoo.ca
103
104=== added file 'YoutubeDl/YoutubeDl'
105--- YoutubeDl/YoutubeDl 1970-01-01 00:00:00 +0000
106+++ YoutubeDl/YoutubeDl 2012-05-31 00:10:24 +0000
107@@ -0,0 +1,454 @@
108+#!/usr/bin/python
109+# -*- coding: utf-8 -*-
110+
111+# YoutubeDl, Download videos from Youtube.
112+# This is a part of the external applets for Cairo-Dock
113+# Copyright: (C) 2012 Brian Whitelock
114+#
115+# This program is free software; you can redistribute it and/or
116+# modify it under the terms of the GNU General Public License
117+# as published by the Free Software Foundation; either version 2
118+# of the License, or (at your option) any later version.
119+#
120+# This program is distributed in the hope that it will be useful,
121+# but WITHOUT ANY WARRANTY; without even the implied warranty of
122+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
123+# GNU General Public License for more details.
124+# http://www.gnu.org/licenses/licenses.html#GPL
125+#
126+
127+import gobject
128+#from gobject import timeout_add
129+from Configuration import Configuration
130+from CDApplet import CDApplet, _
131+import os, subprocess
132+import multiprocessing, Queue, random
133+from myYoutubeDownloader import YoutubeDownloader
134+
135+# if pynotify is available use it otherwise use popup messages.
136+import userAlerts as alerts
137+
138+# if tkinter is available use it otherwise use popup messages.
139+import fileDialogs as dialogs
140+
141+# all constant types are placed in one file and used as needed.
142+from constantTypes import PopupTypes
143+from constantTypes import menuEntries
144+
145+# import the help messages used in the right click context menu
146+from helpInfo import helpMessages
147+
148+class Applet(CDApplet):
149+
150+ def __init__(self):
151+ #super(YoutubeDlPlugin, self).__init__()
152+ self.__interval = 60000 # 1 minute (in milliseconds)
153+ #self.__config = Configuration(os.path.basename(os.path.abspath(".")))
154+ self.__timerId = None
155+ self.work_queue = multiprocessing.Queue(1)
156+ self.result_queue = multiprocessing.Queue(2)
157+ self.result = ['Idle','Idle','Idle','Idle']
158+ self.activeDownload = False
159+ self.urlList = list()
160+ self.currentDialog = PopupTypes.infoDialog
161+ self.resultSummary = 'No Data'
162+ self.listAltered = False
163+ self.__debugMode = False
164+ self.startDownloader()
165+ CDApplet.__init__(self)
166+
167+ def startDownloader(self):
168+ # create the background downloader
169+ self.downloadManager = YoutubeDownloader(self.work_queue, self.result_queue)
170+ # Uncomment the following line to trace debug actions in the console.
171+ #self.downloadManager.debug()
172+ # start the background downloader
173+ self.downloadManager.start()
174+
175+ def begin(self):
176+
177+ """
178+ First method ran by CairoDock when applet is launched.
179+ """
180+ if self.__showProgressOnIcon:
181+ self.icon.SetQuickInfo(str(self.result[0]))
182+ else:
183+ self.icon.SetQuickInfo('')
184+ if self.__showStatusOnIcon:
185+ #self.icon.SetLabel(self.resultSummary)
186+ self.icon.SetLabel("YoutubeDl")
187+ self.__setTimer()
188+
189+ def end(self):
190+ # try to end the background downloader
191+ self.downloadManager.join(1)
192+ #If the downloader is still active then terminate it
193+ if self.downloadManager.is_alive():
194+ self.downloadManager.terminate()
195+ self.downloadManager.join(1)
196+
197+ def on_click(self, iState):
198+
199+ #super(YoutubeDlPlugin, self).onClick(iState)
200+ tempString = ' Current Download;'
201+ if self.activeDownload:
202+ tempString = tempString + '\n -> '.join(self.urlList[0]) + ';'
203+ rangeStart = 1
204+ else:
205+ tempString = tempString + ' -> None;'
206+ rangeStart = 0
207+ tempString = tempString + '\n Current URL List;'
208+ if len(self.urlList) > rangeStart:
209+ tempList = list()
210+ for item in range(rangeStart,len(self.urlList)):
211+ #tempList.append('\n -> '.join(self.urlList[item]))
212+ tempList.append('\n -> '.join(self.urlList[item]))
213+ self.messageDebug(tempString)
214+ #tempString = tempString + '\n'.join(tempList)
215+ tempString = tempString + ';'.join(tempList)
216+ self.messageDebug(tempString)
217+ self.messageDebug(tempString)
218+ else:
219+ tempString = tempString + ' -> Empty'
220+ self.messageDebug(tempString)
221+ self.icon.PopupDialog( {"message" : "Youtube Download URL List",
222+ "buttons" : "ok",
223+ "icon" : "gtk-stock-edit"},
224+ {"visible" : True,
225+ "widget-type" : "list",
226+ "multi-lines" : True,
227+ "editable" : False,
228+ "values" : tempString})
229+ self.currentDialog = PopupTypes.infoDialog
230+ return True
231+
232+ def on_middle_click(self):
233+ """
234+ I set my icon always visible flag
235+ """
236+ #super(YoutubeDlPlugin, self).onMiddleClick()
237+ if self.__actionOnMiddleClick == 'Open Video Folder':
238+ subprocess.call(['xdg-open', self.__videos_directory], shell=False)
239+ else:
240+ alerts.doUserAlert(self,self.resultSummary,4)
241+
242+ return True
243+
244+ def reload(self):
245+ #"""
246+ #I reload the configuration if needed.
247+ #"""
248+ #super(YoutubeDlPlugin, self).onReload(bConfigHasChanged)
249+ #if bConfigHasChanged:
250+ #self.__setConfiguration()
251+ #if self.__showStatusOnIcon:
252+ #self.icon.SetLabel(self.resultSummary)
253+ #if self.__showProgressOnIcon:
254+ #self.icon.SetQuickInfo(str(self.result[0]))
255+ #else:
256+ #self.icon.SetQuickInfo('')
257+ if not self.__videos_directory:
258+ self.__videos_directory = os.path.abspath(os.path.expanduser("~")+"/Videos")
259+ if not self.__urlList_directory:
260+ self.__urlList_directory = os.path.abspath('.')
261+ if self.__showProgressOnIcon:
262+ self.icon.SetQuickInfo(str(self.result[0]))
263+
264+ def doUpdate(self):
265+ """
266+ Update the current status for downloads.
267+ """
268+ if self.activeDownload:
269+ self.messageDebug("doUpdate: active downloads is true")
270+ try:
271+ queueContents = self.result_queue.get_nowait()
272+ if queueContents == 'DownloadComplete':
273+ self.result = ['Idle','Idle','Idle','Idle']
274+ self.activeDownload = False
275+ if self.__showAlertDownloadComplete:
276+ alerts.doUserAlert(self,"Download " + self.urlList[0][1] + " is Complete",4)
277+ del self.urlList[0]
278+ self.messageDebug("doUpdate: result_queue reports DownloadComplete")
279+ elif queueContents == 'DownloadAborted':
280+ self.result = ['Idle','Idle','Idle','Idle']
281+ self.activeDownload = False
282+ if self.__showAlertDownloadAbort:
283+ alerts.doUserAlert(self,"Download " + self.urlList[0][1] + " has been aborted",4)
284+ del self.urlList[0]
285+ self.messageDebug("the length of url list is: " + str(len(self.urlList)))
286+ self.messageDebug("doUpdate: result_queue reports DownloadAborted")
287+ else:
288+ self.result = queueContents.split(';')
289+ self.resultSummary = "%s\n%s of %s @ %s eta: %s" % (self.urlList[0][1],self.result[0],self.result[1],self.result[2],self.result[3])
290+ self.messageDebug("doUpdate: result summary:\n"+self.resultSummary)
291+ except Queue.Empty:
292+ self.result = ['Empty','Empty','Empty','Empty']
293+ self.messageDebug("doUpdate: Queue is empty")
294+ else:
295+ self.messageDebug("doUpdate: Active Downloads is false")
296+ self.resultSummary = "No Active Downloads"
297+ if (len(self.urlList) > 0):
298+ self.messageDebug("doUpdate: "+str(len(self.urlList))+" items in url list")
299+ if self.__startDownloads:
300+ self.messageDebug("doUpdate: start Downloads is true")
301+ self.messageDebug("doUpdate: Start Download:\n"+self.urlList[0][0])
302+ self.startDownload(self.urlList[0][0])
303+ else:
304+ self.messageDebug("doUpdate: start Downloads is false")
305+ else:
306+ self.listAltered = False
307+ if self.__showStatusOnIcon:
308+ self.icon.SetLabel(self.resultSummary)
309+ #update the quickinfo on Icon
310+ if self.__showProgressOnIcon:
311+ self.icon.SetQuickInfo(str(self.result[0]))
312+ #Reset timer after doing update
313+ self.__setTimer()
314+
315+ def on_drop_data(self,cReceivedData):
316+ if self.__showAlertAddURL:
317+ alerts.doUserAlert(self,"Added to queue list: "+cReceivedData,4)
318+ if (not self.activeDownload) and self.__startDownloads:
319+ self.messageDebug("onDropData: download immediately:\n"+str(cReceivedData))
320+ self.startDownload(cReceivedData)
321+ if cReceivedData.find('watch?v=') == (-1):
322+ fileName = "no filename maybe it is a playlist"
323+ self.messageDebug("onDropData: didn't find watch?v= in url")
324+ else:
325+ self.messageDebug("onDropData: Found watch?v= in url")
326+ p = subprocess.Popen(["./youtubedl.py","--get-filename","-itf","18",cReceivedData],stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=False)
327+ fileName, errors = p.communicate()
328+ fileName=fileName.rstrip()
329+ self.urlList.append([str(cReceivedData),fileName])
330+ #self.filenameList.append(fileName)
331+ self.messageDebug("onDropData: New URL List:\n"+str(self.urlList))
332+ self.listAltered = True
333+
334+ def startDownload(self, url):
335+ try:
336+ self.work_queue.put(url)
337+ self.messageDebug("startDownload: put url on work queue:\n"+url)
338+ if self.__showAlertStartDownloads:
339+ alerts.doUserAlert(self,"Starting Download: "+url,4)
340+ self.activeDownload = True
341+ self.result = ['Starting','Starting','Starting','Starting']
342+ self.icon.SetQuickInfo(str(self.result[0]))
343+ except Queue.Full:
344+ self.messageDebug("startDownload: work queue is full:\n"+url)
345+ alerts.doUserAlert(self,"Can't Download: Queue is Full",4)
346+
347+ def on_build_menu(self):
348+ self.messageDebug("onBuildMenu: context menu called")
349+ items = []
350+ if self.activeDownload:
351+ items.append(
352+ {"label": "Abort current download",
353+ "icon" : "gtk-cancel",
354+ "id" : menuEntries.abortDownload })
355+ if len(self.urlList) > 0:
356+ items.append(
357+ {"label": "Save current URL list",
358+ "icon" : "gtk-save",
359+ "id" : menuEntries.saveURLs })
360+ items.append(
361+ {"label": "Clear current URL list",
362+ "icon" : "gtk-delete",
363+ "id" : menuEntries.clearURLs })
364+ if len(self.urlList) == 0:
365+ items.append(
366+ {"label": "Load URL list from file",
367+ "icon" : "gtk-open",
368+ "id" : menuEntries.loadURLs })
369+ if self.__startDownloads:
370+ items.append(
371+ {"label": "Pause Downloading",
372+ "icon" : "gtk-media-pause",
373+ "id" : menuEntries.pauseDownload })
374+ else:
375+ items.append(
376+ {"label": "Enable Downloading",
377+ "icon" : "gtk-media-play",
378+ "id" : menuEntries.enableDownload })
379+ items.append(
380+ {"type":1,
381+ "label": "Help Menu",
382+ "menu":0,
383+ "icon" : "gtk-help",
384+ "id" : menuEntries.helpSubMenu })
385+ items.append(
386+ {"type":0,
387+ "label": "youtube-dl Help",
388+ "menu":menuEntries.helpSubMenu,
389+ "icon" : "gtk-help",
390+ "id" : menuEntries.downloaderHelp })
391+ items.append(
392+ {"type":0,
393+ "label": "Applet Help",
394+ "menu": menuEntries.helpSubMenu,
395+ "icon" : "gtk-help",
396+ "id" : menuEntries.pluginHelp })
397+ self.icon.AddMenuItems(items)
398+
399+ def on_menu_select(self,iNumEntry):
400+ self.messageDebug("onSelectMenu: "+str(iNumEntry)+" selected")
401+ if iNumEntry == menuEntries.abortDownload:
402+ self.icon.PopupDialog( {"message" : "Are you sure you want to cancel the current download?",
403+ "buttons" : "ok;cancel",
404+ "icon" : "gtk-cancel"},
405+ {"visible" : True } )
406+ self.currentDialog = PopupTypes.confirmAbort
407+ elif iNumEntry == menuEntries.clearURLs:
408+ self.icon.PopupDialog( {"message" : "Are you sure you want to clear the current URL list?",
409+ "buttons" : "ok;cancel",
410+ "icon" : "gtk-delete"},
411+ {"visible" : True } )
412+ self.currentDialog = PopupTypes.delList
413+ elif iNumEntry == menuEntries.saveURLs:
414+ self.saveURLs()
415+ elif iNumEntry == menuEntries.loadURLs:
416+ self.loadURLs()
417+ elif iNumEntry == menuEntries.pauseDownload:
418+ self.__startDownloads = False
419+ message = "Downloading Paused"
420+ if self.activeDownload:
421+ message = message+": Current download will complete. To stop it use the Download Abort"
422+ alerts.doUserAlert(self,message,5)
423+ elif iNumEntry == menuEntries.enableDownload:
424+ self.__startDownloads = True
425+ alerts.doUserAlert(self,"Downloading Enabled",5)
426+ if not (self.activeDownload):
427+ if len(self.urlList) > 0:
428+ self.startDownload(self.urlList[0][0])
429+ else:
430+ self.result = ['Enabling','Enabling','Enabling','Enabling']
431+ self.icon.SetQuickInfo(str(self.result[0]))
432+ elif iNumEntry == menuEntries.downloaderHelp:
433+ helpMessage = helpMessages.downloaderHelp
434+ self.icon.PopupDialog( {"message" :helpMessage,
435+ "buttons" : "ok",
436+ "icon" : "gtk-help"},
437+ {"visible" : True } )
438+ self.currentDialog = PopupTypes.infoDialog
439+ elif iNumEntry == menuEntries.pluginHelp:
440+ helpMessage = helpMessages.pluginHelp
441+ self.icon.PopupDialog( {"message" :helpMessage,
442+ "buttons" : "ok",
443+ "icon" : "gtk-help"},
444+ {"visible" : True } )
445+ self.currentDialog = PopupTypes.infoDialog
446+ else:
447+ self.messageDebug("An unknown menu entry was received")
448+
449+ def on_answer_dialog(self,button, userResponse):
450+ if self.currentDialog == PopupTypes.confirmAbort:
451+ self.messageDebug("onAnswerDialog: confirm abort: "+str(button)+" "+str(userResponse))
452+ if button == 0:
453+ self.work_queue.put('Abort')
454+ self.result = ['Aborting','Aborting','Aborting','Aborting']
455+ self.icon.SetQuickInfo(str(self.result[0]))
456+ self.__startDownloads = False
457+ elif self.currentDialog == PopupTypes.delList:
458+ self.messageDebug("onAnswerDialog: confirm delete: "+str(button)+" "+str(userResponse))
459+ if button == 0:
460+ del self.urlList[:]
461+ elif self.currentDialog == PopupTypes.saveListFilename:
462+ self.messageDebug("onAnswerDialog: save list filename: "+str(button)+" "+str(userResponse))
463+ elif self.currentDialog == PopupTypes.getListFilename:
464+ self.messageDebug("onAnswerDialog: get list filename: "+str(button)+" "+str(userResponse))
465+ elif self.currentDialog == PopupTypes.infoDialog:
466+ self.messageDebug("onAnswerDialog: info dialog : "+str(button)+" "+str(userResponse))
467+ elif self.currentDialog == PopupTypes.showUrlList:
468+ self.messageDebug("onAnswerDialog: showUrlList : "+str(button)+" "+str(userResponse))
469+ else:
470+ self.messageDebug("onAnswerDialog: Unknown dialog : "+str(button)+" "+str(userResponse))
471+
472+ self.currentDialog = PopupTypes.infoDialog
473+
474+ def saveURLs(self):
475+ fileName=dialogs.saveUrlFilename()
476+ if fileName == None:
477+ self.messageDebug("returned filename is None")
478+ elif len(fileName) > 0:
479+ self.messageDebug("returned filename is: "+fileName)
480+ else:
481+ self.messageDebug("returned filename is 0 ")
482+ if fileName:
483+ self.icon.ShowDialog("Saving list",4)
484+ saveFile = open(fileName, 'w')
485+ for item in range(len(self.urlList)):
486+ saveFile.write("{0}::{1}\n".format(self.urlList[item][0],self.urlList[item][1]))
487+ saveFile.close()
488+ self.listAltered = False
489+
490+ def loadURLs(self):
491+ fileName=dialogs.openUrlFilename(self.__urlList_directory)
492+ if fileName == None:
493+ self.messageDebug("returned filename is None")
494+ elif len(fileName) > 0:
495+ del self.urlList[:]
496+ self.urlList = [line.strip().split('::') for line in open(fileName)]
497+ self.listAltered = False
498+ self.messageDebug("new list is: ")
499+ self.messageDebug(self.urlList)
500+ else:
501+ self.messageDebug("returned filename is 0 ")
502+
503+ def messageDebug(self, message):
504+ """
505+ I write message to console if I have permission to do this.
506+ """
507+ if self.__debugMode:
508+ print '<%s : %s>' % (self.__name, message)
509+
510+ def debug(self):
511+ """
512+ Call me one time in the beginning of your script. If you are running Cairo-Dock
513+ from a console window, you'll be able to see what I'm doing.
514+ """
515+ self.__debugMode = True
516+
517+
518+ #def __setConfiguration(self):
519+ def get_config(self, keyfile):
520+ """
521+ I reload the configuration.
522+ """
523+ #self.__config.refresh()
524+ #interval = int(self.__config.get('User Interface', 'interval'))
525+ interval = keyfile.getint('User Interface', 'interval')
526+ self.__interval = interval * 1000 # convert to milliseconds.
527+ self.__startDownloads = keyfile.getboolean('User Interface', 'startDownloads')
528+ self.__showAlertStartDownloads = keyfile.getboolean('User Interface', 'showAlertStartDownloads')
529+ self.__showAlertDownloadComplete = keyfile.getboolean('User Interface', 'showAlertDownloadComplete')
530+ self.__showAlertDownloadAbort = keyfile.getboolean('User Interface', 'showAlertDownloadAbort')
531+ self.__showAlertAddURL = keyfile.getboolean('User Interface', 'showAlertAddURL')
532+ self.usePynotify = keyfile.getboolean('User Interface', 'usePynotify')
533+ self.__actionOnMiddleClick = keyfile.get('User Interface', 'actionOnMiddleClick')
534+ self.__showProgressOnIcon = keyfile.getboolean('User Interface', 'showProgressOnIcon')
535+ self.__showStatusOnIcon = keyfile.getboolean('User Interface', 'showStatusOnIcon')
536+ self.__videos_directory = keyfile.get('User Interface', 'videos_directory')
537+ self.__urlList_directory = keyfile.get('User Interface', 'urlList_directory')
538+ self.__setTimer()
539+
540+ def __setTimer(self):
541+ """
542+ I set the time between two checks.
543+ """
544+ self.__removeTimer()
545+ #self.__timerId = timeout_add(self.__interval, self.doUpdate)
546+ self.__timerId = gobject.timeout_add(self.__interval, self.doUpdate)
547+
548+ def __removeTimer(self):
549+ """
550+ I properly remove the timer.
551+ """
552+ if self.__timerId != None:
553+ gobject.source_remove(self.__timerId)
554+ self.__timerId = None
555+
556+############
557+### main ###
558+############
559+if __name__ == '__main__':
560+ Applet().run()
561+
562
563=== added file 'YoutubeDl/YoutubeDl.conf'
564--- YoutubeDl/YoutubeDl.conf 1970-01-01 00:00:00 +0000
565+++ YoutubeDl/YoutubeDl.conf 2012-05-31 00:10:24 +0000
566@@ -0,0 +1,209 @@
567+#!en;1.0.1
568+
569+#[gtk-about]
570+[Icon]
571+#j+[0;128] Desired icon size for this applet
572+#{Set to 0 to use the default applet size}
573+icon size = 0;0
574+
575+#s Name of the icon as it will appear in its label in the dock :
576+name = YoutubeDl
577+
578+#S+ Image's filename :
579+#{Leave empty to use the default one.}
580+icon =
581+
582+#d Name of the dock it belongs to:
583+dock name =
584+
585+order=
586+
587+#b Always display the icon, even when the dock is hidden?
588+always visi = false
589+
590+#F[Applet's Handbook]
591+frame_hand=
592+#A
593+handbook=YoutubeDl
594+
595+#[gtk-convert]
596+[Desklet]
597+
598+#j+[48;512] Desklet's dimension (width x height) :
599+#{Depending on your WindowManager, you can resize it with ALT + middle_click or ALT + left_click for example.}
600+size = 96;96
601+
602+#i[-2048;2048] Desklet's position (x ; y) :
603+#{Depending on your WindowManager, you can move it with ALT + left_click}
604+x position=0
605+#i[-2048;2048] ...
606+y position=0
607+
608+#b Is detached from the dock ?
609+initially detached=false
610+#l[Normal;Keep above;Keep below;On Widget Layer;Reserve space] Accessibility :
611+#{for CompizFusion's "widget layer", set behaviour in Compiz to: (class=Cairo-dock & type=utility)}
612+accessibility=0
613+#b Should be visible on all desktops ?
614+sticky=true
615+
616+#b Lock position ?
617+#{If locked, the desklet can't be moved by simply dragging it with the left mouse button. Of course you can still move it with ALT + left_click.}
618+locked = false
619+
620+#I[-180;180] Rotation :
621+#{in degrees.}
622+rotation = 0
623+
624+use size=
625+
626+#F[Decorations;gtk-orientation-portrait]
627+frame_deco=
628+
629+#o+ Choose a decoration theme for this desklet :
630+#{Choose the 'personal' one to define your own decorations below.}
631+decorations = default
632+
633+#v
634+sep_deco =
635+
636+#S+ Background image :
637+#{It's an image that will be displayed below the drawings, like a frame for example. Leave empty to not use any.}
638+bg desklet =
639+#e+[0;1] Background transparency :
640+bg alpha = 1
641+#i+[0;256] Left offset :
642+#{in pixels. Use this to adjust the left position of the drawings.}
643+left offset = 0
644+#i+[0;256] Top offset :
645+#{in pixels. Use this to adjust the top position of the drawings.}
646+top offset = 0
647+#i+[0;256] Right offset :
648+#{in pixels. Use this to adjust the right position of the drawings.}
649+right offset = 0
650+#i+[0;256] Bottom offset :
651+#{in pixels. Use this to adjust the bottom position of the drawings.}
652+bottom offset = 0
653+#S+ Foreground image :
654+#{It's an image that will be displayed above the drawings, like a reflection for example. Leave empty to not use any.}
655+fg desklet =
656+#e+[0;1] Foreground transparency :
657+fg alpha = 1
658+
659+
660+#User interface options
661+[User Interface]
662+
663+#b Start downloading videos immediately? :
664+startDownloads = true
665+
666+#v
667+mySep =
668+
669+#i[1;300] Time between updates:
670+#{in seconds}
671+interval = 30
672+
673+#v
674+mySep =
675+
676+#L[Open Video Folder;Show Status] Action on middle click:
677+actionOnMiddleClick = 0
678+
679+#v
680+mySep =
681+
682+#b Show download progress on icon? :
683+showProgressOnIcon = true
684+
685+#b Show download status in icon label when hovering over icon? :
686+showStatusOnIcon = true
687+
688+#v
689+mySep =
690+
691+#X[Notification selections]
692+frame_notify=
693+
694+#b Show a pop-up message when starting downloads? :
695+showAlertStartDownloads = true
696+
697+#b Show a pop-up message when download is complete? :
698+showAlertDownloadComplete = true
699+
700+#b Show a pop-up message when download is aborted? :
701+showAlertDownloadAbort = true
702+
703+#b Show a pop-up message when adding url to queue list? :
704+showAlertAddURL = false
705+
706+#b Attempt to use desktop notifications instead of dock messages? :
707+usePynotify = true
708+
709+#X
710+myDummyFrame =
711+#v
712+mySep =
713+
714+#X[Directory Setup]
715+frame_folders=
716+
717+#D Save videos to this directory:
718+#{The path to save the videos to. Leave it empty to use the default one (currently the user's Videos directory).}
719+videos_directory =
720+
721+#D Save URL List to this directory:
722+#{The path to save the URL list to. Leave it empty to use the default one (currently the plugin directory).}
723+urlList_directory =
724+
725+[Download Options]
726+#> Set the download options that will be used for the youtubedl backend
727+thisLabel3=
728+
729+#b Resume partially downloaded files (--continue)?
730+resumeDownload = true
731+
732+#b Ignore errors during download (--ignore-errors)?
733+ignoreErrors = true
734+
735+#b Do not overwrite already existing files (--no-overwrites)?
736+noOverwrites = true
737+
738+#b Use the title of the video in the file name used to download the video (--title)?
739+useTitle = true
740+
741+#v
742+mySep=
743+
744+#y[Specify the video format;Limit the maximum quality;Download default format] Which download format to use:
745+useFormat=0
746+
747+#L[H264 - MP4 at 480p;H264 - MP4 at 720p;H264 - MP4 at 1080p;H264 - FLV at 360p;H264 - FLV at 480p;H263 - FLV at 240p;Webm at 480p;Webm at 720p;3GP video] Specify the video format (quality) in which to download the video (--format):
748+videoFormat=
749+
750+#L[H264 - MP4 at 480p;H264 - MP4 at 720p;H264 - MP4 at 1080p;H264 - FLV at 360p;H264 - FLV at 480p;H263 - FLV at 240p;Webm at 480p;Webm at 720p;3GP video] Limit the maximum quality of the videos to download to (--max-quality):
751+maxVideoFormat=
752+
753+#>Download the default format.
754+defaultFormat=
755+
756+#v
757+mySep =
758+
759+#B- Use the Username and Password authentication process.
760+useAuthentication=false
761+
762+#F[Username/Password]
763+authFrame=
764+
765+#s Username:
766+userName=
767+
768+#p Password:
769+userPassword=
770+
771+#F
772+emptyFrame=
773+
774+#v
775+mySep=
776
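The configuration above falls back to the user's Videos directory when `videos_directory` is empty, and the review suggests reading the translated folder name from `~/.config/user-dirs.dirs`. A minimal sketch of that lookup follows, written in Python 3 syntax (the applet itself targets Python 2). The helper name `videos_dir_from_user_dirs` is hypothetical and not part of the applet; it assumes the usual `XDG_VIDEOS_DIR="$HOME/..."` line format.

```python
import os
import re


def videos_dir_from_user_dirs(text, home):
    """Return the videos directory named in user-dirs.dirs content.

    `text` is the content of ~/.config/user-dirs.dirs and `home` is the
    user's home directory. Falls back to <home>/Videos when the file has
    no XDG_VIDEOS_DIR entry (hypothetical helper, not part of the applet).
    """
    match = re.search(r'^XDG_VIDEOS_DIR="([^"]*)"', text, re.MULTILINE)
    if match is None:
        return os.path.join(home, "Videos")
    # The file conventionally stores paths relative to $HOME.
    return match.group(1).replace("$HOME", home)
```

A caller would read the file if it exists, pass its content together with `os.path.expanduser("~")`, and could create the resulting directory before changing into it.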
777=== added file 'YoutubeDl/auto-load.conf'
778--- YoutubeDl/auto-load.conf 1970-01-01 00:00:00 +0000
779+++ YoutubeDl/auto-load.conf 2012-05-31 00:10:24 +0000
780@@ -0,0 +1,19 @@
781+[Register]
782+
783+# Author of the applet
784+author = Brian
785+
786+# A short description of the applet and how to use it.
787+description = This applet allows a user to drag Youtube links from the Youtube website and drop them on the icon to download. The backend downloader is based on youtube-dl.py.\n\n If you have python-tk installed on your system saving and loading url lists will be done using a graphical dialog box, otherwise you need to set the directory path in the configuration and just enter the filename in the popup box.\n\n If you have pynotify installed on your system you have the option of letting the notification area of your desktop handle the alerts, otherwise alerts will be done using the cairo-dock popup messages. You can turn this feature on or off under the configuration.\n\nCurrently the url list is not editable from the dock but if you save the list to a file you can edit it with a text editor and then load the list back into the plugin. Downloading can be paused from the context menu of the icon and you can choose to have downloads start automatically in the configuration area.\n\nThe left click button on the mouse brings up the current url list and the middle button can be configured for different actions.\n\nPlease enjoy.
788+
789+# Category of the applet : 2 = files, 3 = internet, 4 = Desktop, 5 = accessory, 6 = fun
790+category = 3
791+
792+# Version of the applet; change it every time you change something in the config file. Don't forget to update the version both in this file and in the config file.
793+version = 0.0.5
794+
795+# Default icon to use if no icon has been defined by the user. If not specified, or if the file is not found, the "icon" file will be used.
796+icon =
797+
798+# Whether the applet can be instantiated several times or not.
799+multi-instance = false
800
801=== added file 'YoutubeDl/constantTypes.py'
802--- YoutubeDl/constantTypes.py 1970-01-01 00:00:00 +0000
803+++ YoutubeDl/constantTypes.py 2012-05-31 00:10:24 +0000
804@@ -0,0 +1,16 @@
805+class PopupTypes:
806+ (infoDialog, confirmAbort, saveListFilename, getListFilename, delList, showUrlList) = range(0, 6)
807+
808+class menuEntries:
809+ (abortDownload, saveURLs, loadURLs, pauseDownload, enableDownload, clearURLs,helpSubMenu,downloaderHelp,pluginHelp) = range(1,10)
810+
811+class youtube:
812+ videoFormats = {"H264 - MP4 at 480p":'18',
813+ "H264 - MP4 at 720p":'22',
814+ "H264 - MP4 at 1080p":'37',
815+ "H264 - FLV at 360p":'34',
816+ "H264 - FLV at 480p":'35',
817+ "H263 - FLV at 240p":'5',
818+ "Webm at 480p":'43',
819+ "Webm at 720p":'45',
820+ "3GP video":'17'}
821
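The `youtube.videoFormats` table in constantTypes.py maps the labels offered in the configuration dialog to youtube-dl format codes. A minimal sketch of resolving a label, in Python 3 syntax; `format_code_for` is a hypothetical helper, not something the applet defines, and the table is copied from the file above.

```python
# Label -> youtube-dl format code, copied from constantTypes.py.
VIDEO_FORMATS = {
    "H264 - MP4 at 480p": "18",
    "H264 - MP4 at 720p": "22",
    "H264 - MP4 at 1080p": "37",
    "H264 - FLV at 360p": "34",
    "H264 - FLV at 480p": "35",
    "H263 - FLV at 240p": "5",
    "Webm at 480p": "43",
    "Webm at 720p": "45",
    "3GP video": "17",
}


def format_code_for(label, default=None):
    """Return the format code for a configuration label.

    Hypothetical helper: unknown or empty labels yield `default`
    instead of raising KeyError.
    """
    return VIDEO_FORMATS.get(label, default)
```

Because `dict.get` returns `None` for a missing key, an empty `videoFormat` setting would leave the format option unset unless a default is supplied, which may or may not be what the applet intends.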
822=== added file 'YoutubeDl/fileDialogs.py'
823--- YoutubeDl/fileDialogs.py 1970-01-01 00:00:00 +0000
824+++ YoutubeDl/fileDialogs.py 2012-05-31 00:10:24 +0000
825@@ -0,0 +1,49 @@
826+try:
827+ import Tkinter, tkFileDialog
828+ def openUrlFilename(initialDirectory):
829+
830+ rootDialog = Tkinter.Tk()
831+
832+ # define options for opening or saving a file
833+ file_opt = options = {}
834+ options['defaultextension'] = 'txt' # couldn't figure out how this works
835+ options['filetypes'] = [('all files', '.*'), ('text files', '.txt')]
836+ options['initialdir'] = initialDirectory
837+ #options['initialfile'] = initialDirectory
838+ options['title'] = 'Please select a url list file to open'
839+ options['parent'] = rootDialog
840+ #fileName = tkFileDialog.askopenfilename(parent=rootDialog,title='Please select file to open')
841+ fileName = tkFileDialog.askopenfilename(**file_opt)
842+ rootDialog.destroy()
843+ return fileName
844+
845+ def saveUrlFilename():
846+ rootDialog = Tkinter.Tk()
847+ fileName = tkFileDialog.asksaveasfilename(parent=rootDialog,title='Please select file to save')
848+ rootDialog.destroy()
849+ return fileName
850+
851+except ImportError:
852+ def openUrlFilename(initialDirectory=None):
853+ return None
854+ def saveUrlFilename():
855+ return None
856+ #from PopupDialogTypes import *
857+
858+ #def openUrlFilename(master):
859+ #master.PopupDialog( {"message" : "Enter filename to open:",
860+ #"buttons" : "ok;cancel",
861+ #"icon" : "gtk-stock-edit"},
862+ #{"widget-type" : "text-entry",
863+ #"visible" : False} )
864+ #master.currentDialog = PopupDialogTypes.infoDialog
865+ #return None
866+
867+ #def saveUrlFilename(master):
868+ #master.PopupDialog( {"message" : "Enter filename to open:",
869+ #"buttons" : "ok;cancel",
870+ #"icon" : "gtk-stock-edit"},
871+ #{"widget-type" : "text-entry",
872+ #"visible" : False} )
873+ #master.currentDialog = PopupDialogTypes.infoDialog
874+ #return None
875
876=== added file 'YoutubeDl/helpInfo.py'
877--- YoutubeDl/helpInfo.py 1970-01-01 00:00:00 +0000
878+++ YoutubeDl/helpInfo.py 2012-05-31 00:10:24 +0000
879@@ -0,0 +1,70 @@
880+class helpMessages:
881+ downloaderHelp ="""Download videos from youtube.com or other video platforms
882+
883+DESCRIPTION
884+ Download videos from youtube.com or any other of the supported video platforms.
885+
886+Currently supported sites are: CollegeHumor, Comedy Central, Dailymotion,
887+Facebook, Metacafe, MyVideo, Photobucket, The Escapist, Vimeo, Yahoo!, YouTube,
888+blip.tv, depositfiles.com, video.google.com, xvideos, Soundcloud, InfoQ,
889+Mixcloud, OpenClassRoom.
890+
891+Many YouTube.com videos are in Flash Video format and their extension would
892+be "flv". Other videos are encoded in H.264 and these usually have the
893+extension "mp4". In Linux and other unices, video players using a
894+recent version of ffmpeg can play them. That includes MPlayer, VLC,
895+xine, among others.
896+
897+OPTIONS
898+ Update: Update this program to the latest stable version.
899+ Ignore Errors: Ignore errors during download and continue processing.
900+ Username: Specify the youtube account username. Some videos require an
901+ account to be downloaded, mostly because they're flagged as mature
902+ content.
903+ Password: Like the username, specifies the account password.
904+ Format: Specify the video format (quality) in which to download the video.
905+ For youtube.com, in particular, the meaning of the format codes is
906+ given as:
907+ WebM video at 480p: 43
908+ WebM video at 720p: 45
909+ H264 video in MP4 container at 480p: 18
910+ H264 video in MP4 container at 720p: 22
911+ H264 video in MP4 container at 1080p: 37
912+ H264 video in FLV container at 360p: 34
913+ H264 video in FLV container at 480p: 35
914+ H263 video at 240p: 5
915+ 3GP video: 17
916+
917+ Note that not all videos are available in all formats and that
918+ other sites supported by youtube-dl may have different conventions
919+ for their video formats.
920+ Max Quality: Limit the maximum quality of the videos to download.
921+ Title: Use the title of the video in the file name used to download the
922+ video.
923+ No Overwrites: Do not overwrite already existing files.
924+
925+ Continue: Resume partially downloaded files.
926+
927+AUTHOR
928+ youtube-dl was written by Ricardo Garcia Gonzalez and many contributors
929+ from all around the internet.
930+"""
931+
932+ pluginHelp = """ Youtube Download Applet
933+
934+This applet allows a user to drag Youtube links from the Youtube website and
935+drop them on the icon to download. The backend downloader is based on
936+youtube-dl.py. If you have python-tk installed on your system saving and
937+loading url lists will be done using a graphical dialog box, otherwise you
938+need to set the directory path in the configuration and just enter the filename
939+in the popup box. If you have pynotify installed on your system you have the
940+option of letting the notification area of your desktop handle the alerts,
941+otherwise alerts will be done using the cairo-dock popup messages. You can
942+turn this feature on or off under the configuration. Currently the url list
943+is not editable from the dock but if you save the list to a file you can edit
944+it with a text editor and then load the list back into the plugin. Downloading can
945+be paused from the context menu of the icon and you can choose to have
946+downloads start automatically in the configuration area. The left click button
947+on the mouse brings up the current url list and the middle button can be
948+configured for different actions.
949+"""
950
951=== added file 'YoutubeDl/icon'
952Binary files YoutubeDl/icon 1970-01-01 00:00:00 +0000 and YoutubeDl/icon 2012-05-31 00:10:24 +0000 differ
953=== added file 'YoutubeDl/myYoutubeDownloader.py'
954--- YoutubeDl/myYoutubeDownloader.py 1970-01-01 00:00:00 +0000
955+++ YoutubeDl/myYoutubeDownloader.py 2012-05-31 00:10:24 +0000
956@@ -0,0 +1,299 @@
957+#! /usr/bin/env python
958+import random
959+import multiprocessing, Queue
960+import time, sys, os
961+from youtubedl import *
962+from Configuration import Configuration
963+#from CairoDockPlugin import CairoDockPlugin
964+from constantTypes import youtube
965+
966+#global debug option
967+doDebug = False
968+
969+class myFileDownloader(FileDownloader):
970+ def __init__(self, params, result_queue, work_queue):
971+ super( myFileDownloader, self ).__init__(params)
972+ self.result_queue = result_queue
973+ self.work_queue = work_queue
974+ self.status = "Idle"
975+
976+ def report_progress(self, percent_str, data_len_str, speed_str, eta_str):
977+ global doDebug
978+ reportList = percent_str+";"+data_len_str+";"+speed_str+";"+eta_str
979+ if not(self.status == "Aborting"):
980+ try:
981+ self.result_queue.put_nowait(reportList)
982+ if doDebug:
983+ print "putting report list on the result queue:\n "+reportList
984+ except Queue.Full:
985+ #clear the queue if the Plugin has not read it since the last put
986+ self.result_queue.get_nowait()
987+ if doDebug:
988+ print "Clearing result queue"
989+ #now go ahead and put information on it
990+ self.result_queue.put_nowait(reportList)
991+ if doDebug:
992+ print "report list placed on the queue: "+reportList
993+ try:
994+ command = self.work_queue.get_nowait()
995+ if command == 'Abort':
996+ if doDebug:
997+ print "Download Abort sent to sys.exit : report_progress"
998+ print "Download Abort sent to sys.exit : report_progress"
999+ sys.exit('Abort')
1000+ self.status = "Aborting"
1001+ #clear the queue if the Plugin has not read it since the last put
1002+ self.result_queue.get_nowait()
1003+ self.result_queue.get_nowait()
1004+ except Queue.Empty:
1005+ time.sleep(1)
1006+
1007+class YoutubeDownloader(multiprocessing.Process):
1008+
1009+ def __init__(self, work_queue, result_queue):
1010+ global doDebug
1011+
1012+ # base class initialization
1013+ multiprocessing.Process.__init__(self)
1014+
1015+ self.name = os.path.basename(os.path.abspath("."))
1016+ self.__config = Configuration(self.name)
1017+ # job management stuff
1018+ self.work_queue = work_queue
1019+ self.result_queue = result_queue
1020+ self.kill_received = False
1021+
1022+ def run(self):
1023+ global doDebug
1024+ while not self.kill_received:
1025+ if doDebug:
1026+ print "check for next url"
1027+ try:
1028+ url = self.work_queue.get()
1029+ self.__config.refresh()
1030+ videos_directory = self.__config.get('User Interface', 'videos_directory')
1031+ self.useFormat = self.__config.get('Download Options', 'useFormat')
1032+ self.videoFormat = self.__config.get('Download Options', 'videoFormat')
1033+ self.maxVideoFormat = self.__config.get('Download Options', 'maxVideoFormat')
1034+ self.useAuthentication = self.__config.get('Download Options', 'useAuthentication')
1035+ self.userName = self.__config.get('Download Options', 'userName')
1036+ self.userPassword = self.__config.get('Download Options', 'userPassword')
1037+ self.ignoreErrors = self.__config.get('Download Options', 'ignoreErrors')
1038+ self.resumeDownload = self.__config.get('Download Options', 'resumeDownload')
1039+ self.noOverwrites = self.__config.get('Download Options', 'noOverwrites')
1040+ self.useTitle = self.__config.get('Download Options', 'useTitle')
1041+
1042+ if not videos_directory:
1043+ videos_directory = os.path.abspath(os.path.expanduser("~")+"/Videos")
1044+ if not (videos_directory == os.path.abspath('.')):
1045+ os.chdir(videos_directory)
1046+ if doDebug:
1047+ print "current video directory is: "+os.path.abspath('.')
1048+ retcode = self.main(url)
1049+ except Queue.Empty:
1050+ time.sleep(1)
1051+
1052+ def debug():
1053+ global doDebug
1054+ doDebug = True
1055+
1056+ def main(self,url):
1057+ global doDebug
1058+ if doDebug:
1059+ print "In main URL is: "+url
1060+ parser, opts, args = parseOpts()
1061+ #Need to set user options from config file if we can
1062+ if self.useFormat == '0':
1063+ opts.format = youtube.videoFormats.get(self.videoFormat)
1064+ elif self.useFormat == '1':
1065+ opts.format_limit = youtube.videoFormats.get(self.maxVideoFormat)
1066+ if self.useAuthentication == '1':
1067+ opts.username = self.userName
1068+ opts.password = self.userPassword
1069+ opts.continue_dl = self.resumeDownload
1070+ opts.ignoreerrors = self.ignoreErrors
1071+ opts.nooverwrites = self.noOverwrites
1072+ opts.usetitle = self.useTitle
1073+
1074+ # Open appropriate CookieJar
1075+ if opts.cookiefile is None:
1076+ jar = cookielib.CookieJar()
1077+ else:
1078+ try:
1079+ jar = cookielib.MozillaCookieJar(opts.cookiefile)
1080+ if os.path.isfile(opts.cookiefile) and os.access(opts.cookiefile, os.R_OK):
1081+ jar.load()
1082+ except (IOError, OSError), err:
1083+ time.sleep(1)
1084+
1085+ # Dump user agent
1086+ if opts.dump_user_agent:
1087+ print std_headers['User-Agent']
1088+
1089+ # Batch file verification
1090+ batchurls = []
1091+ if opts.batchfile is not None:
1092+ try:
1093+ if opts.batchfile == '-':
1094+ batchfd = sys.stdin
1095+ else:
1096+ batchfd = open(opts.batchfile, 'r')
1097+ batchurls = batchfd.readlines()
1098+ batchurls = [x.strip() for x in batchurls]
1099+ batchurls = [x for x in batchurls if len(x) > 0 and not re.search(r'^[#/;]', x)]
1100+ except IOError:
1101+ time.sleep(1)
1102+ all_urls = batchurls
1103+ all_urls.append(url)
1104+
1105+ # General configuration
1106+ cookie_processor = urllib2.HTTPCookieProcessor(jar)
1107+ proxy_handler = urllib2.ProxyHandler()
1108+ opener = urllib2.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
1109+ urllib2.install_opener(opener)
1110+ socket.setdefaulttimeout(300) # 5 minutes should be enough (famous last words)
1111+
1112+ if opts.verbose:
1113+ print(u'[debug] Proxy map: ' + str(proxy_handler.proxies))
1114+
1115+ extractors = gen_extractors()
1116+
1117+ if opts.list_extractors:
1118+ for ie in extractors:
1119+ print(ie.IE_NAME)
1120+ matchedUrls = filter(lambda url: ie.suitable(url), all_urls)
1121+ all_urls = filter(lambda url: url not in matchedUrls, all_urls)
1122+ for mu in matchedUrls:
1123+ print(u' ' + mu)
1124+
1125+ # Conflicting, missing and erroneous options
1126+ if opts.usenetrc and (opts.username is not None or opts.password is not None):
1127+ parser.error(u'using .netrc conflicts with giving username/password')
1128+ if opts.password is not None and opts.username is None:
1129+ parser.error(u'account username missing')
1130+ if opts.outtmpl is not None and (opts.useliteral or opts.usetitle or opts.autonumber):
1131+ parser.error(u'using output template conflicts with using title, literal title or auto number')
1132+ if opts.usetitle and opts.useliteral:
1133+ parser.error(u'using title conflicts with using literal title')
1134+ if opts.username is not None and opts.password is None:
1135+ opts.password = getpass.getpass(u'Type account password and press return:')
1136+ if opts.ratelimit is not None:
1137+ numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
1138+ if numeric_limit is None:
1139+ parser.error(u'invalid rate limit specified')
1140+ opts.ratelimit = numeric_limit
1141+ if opts.retries is not None:
1142+ try:
1143+ opts.retries = long(opts.retries)
1144+ except (TypeError, ValueError), err:
1145+ parser.error(u'invalid retry count specified')
1146+ try:
1147+ opts.playliststart = int(opts.playliststart)
1148+ if opts.playliststart <= 0:
1149+ raise ValueError(u'Playlist start must be positive')
1150+ except (TypeError, ValueError), err:
1151+ parser.error(u'invalid playlist start number specified')
1152+ try:
1153+ opts.playlistend = int(opts.playlistend)
1154+ if opts.playlistend != -1 and (opts.playlistend <= 0 or opts.playlistend < opts.playliststart):
1155+ raise ValueError(u'Playlist end must be greater than playlist start')
1156+ except (TypeError, ValueError), err:
1157+ parser.error(u'invalid playlist end number specified')
1158+ if opts.extractaudio:
1159+ if opts.audioformat not in ['best', 'aac', 'mp3', 'vorbis', 'm4a', 'wav']:
1160+ parser.error(u'invalid audio format specified')
1161+
1162+ # File downloader
1163+ fd = myFileDownloader({
1164+ 'usenetrc': opts.usenetrc,
1165+ 'username': opts.username,
1166+ 'password': opts.password,
1167+ 'quiet': (opts.quiet or opts.geturl or opts.gettitle or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat),
1168+ 'forceurl': opts.geturl,
1169+ 'forcetitle': opts.gettitle,
1170+ 'forcethumbnail': opts.getthumbnail,
1171+ 'forcedescription': opts.getdescription,
1172+ 'forcefilename': opts.getfilename,
1173+ 'forceformat': opts.getformat,
1174+ 'simulate': opts.simulate,
1175+ 'skip_download': (opts.skip_download or opts.simulate or opts.geturl or opts.gettitle or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat),
1176+ 'format': opts.format,
1177+ 'format_limit': opts.format_limit,
1178+ 'listformats': opts.listformats,
1179+ 'outtmpl': ((opts.outtmpl is not None and opts.outtmpl.decode(preferredencoding()))
1180+ or (opts.format == '-1' and opts.usetitle and u'%(stitle)s-%(id)s-%(format)s.%(ext)s')
1181+ or (opts.format == '-1' and opts.useliteral and u'%(title)s-%(id)s-%(format)s.%(ext)s')
1182+ or (opts.format == '-1' and u'%(id)s-%(format)s.%(ext)s')
1183+ or (opts.usetitle and opts.autonumber and u'%(autonumber)s-%(stitle)s-%(id)s.%(ext)s')
1184+ or (opts.useliteral and opts.autonumber and u'%(autonumber)s-%(title)s-%(id)s.%(ext)s')
1185+ or (opts.usetitle and u'%(stitle)s-%(id)s.%(ext)s')
1186+ or (opts.useliteral and u'%(title)s-%(id)s.%(ext)s')
1187+ or (opts.autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
1188+ or u'%(id)s.%(ext)s'),
1189+ 'ignoreerrors': opts.ignoreerrors,
1190+ 'ratelimit': opts.ratelimit,
1191+ 'nooverwrites': opts.nooverwrites,
1192+ 'retries': opts.retries,
1193+ 'continuedl': opts.continue_dl,
1194+ 'noprogress': opts.noprogress,
1195+ 'playliststart': opts.playliststart,
1196+ 'playlistend': opts.playlistend,
1197+ 'logtostderr': opts.outtmpl == '-',
1198+ 'consoletitle': opts.consoletitle,
1199+ 'nopart': opts.nopart,
1200+ 'updatetime': opts.updatetime,
1201+ 'writedescription': opts.writedescription,
1202+ 'writeinfojson': opts.writeinfojson,
1203+ 'matchtitle': opts.matchtitle,
1204+ 'rejecttitle': opts.rejecttitle,
1205+ 'max_downloads': opts.max_downloads,
1206+ 'prefer_free_formats': opts.prefer_free_formats,
1207+ 'verbose': opts.verbose,
1208+ }, self.result_queue, self.work_queue)
1209+ for extractor in extractors:
1210+ fd.add_info_extractor(extractor)
1211+
1212+ # PostProcessors
1213+ if opts.extractaudio:
1214+ fd.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, keepvideo=opts.keepvideo))
1215+
1216+ # Update version
1217+ if opts.update_self:
1218+ updateSelf(fd, sys.argv[0])
1219+
1220+ # Maybe do nothing
1221+ if len(all_urls) < 1:
1222+ if not opts.update_self:
1223+ parser.error(u'you must provide at least one URL')
1224+ else:
1225+ time.sleep(1)
1226+
1227+ try:
1228+ retcode = fd.download(all_urls)
1229+ self.result_queue.put("DownloadComplete")
1230+ if doDebug:
1231+ print "put DownloadComplete on result queue: return from fd.download"
1232+ except MaxDownloadsReached:
1233+ fd.to_screen(u'--max-download limit reached, aborting.')
1234+ self.result_queue.put("DownloadAborted")
1235+ if doDebug:
1236+ print "put DownloadAborted on result queue: MaxDownloadReached"
1237+ retcode = 101
1238+ except SystemExit:
1239+ self.result_queue.get_nowait()
1240+ self.result_queue.get_nowait()
1241+ self.result_queue.put("DownloadAborted")
1242+ print "put DownloadAborted on result queue: SystemExit"
1243+ if doDebug:
1244+ print "put DownloadAborted on result queue: SystemExit"
1245+
1246+ # Dump cookie jar if requested
1247+ if opts.cookiefile is not None:
1248+ try:
1249+ jar.save()
1250+ except (IOError, OSError), err:
1251+ time.sleep(1)
1252+
1253+ print "myDownloader has ended"
1254+ if doDebug:
1255+ print "myDownloader has ended"
1256
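`report_progress` in myYoutubeDownloader.py joins the four progress fields with ";" and, when the result queue is full, drops the unread report before putting the new one. The sketch below isolates that pattern in Python 3 syntax. It uses the standard `queue.Queue` rather than a multiprocessing queue to stay self-contained, assumes a single producer, and the names `put_latest` and `format_report` are hypothetical, not the applet's.

```python
import queue


def format_report(percent_str, data_len_str, speed_str, eta_str):
    """Join the progress fields with ';' as report_progress does."""
    return ";".join((percent_str, data_len_str, speed_str, eta_str))


def put_latest(result_queue, item):
    """Put `item` on a bounded queue, discarding the oldest entry if full.

    Assumes a single producer, so the second put cannot find the
    queue full again.
    """
    try:
        result_queue.put_nowait(item)
    except queue.Full:
        try:
            result_queue.get_nowait()  # drop the stale, unread report
        except queue.Empty:
            pass  # the consumer emptied the queue in the meantime
        result_queue.put_nowait(item)
```

With a queue of size one, the consumer always sees the most recent report, which keeps slow polling from building up a backlog. Guarding the inner `get_nowait()` against `queue.Empty` avoids an unhandled exception if the reader drains the queue between the two calls.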
1257=== added file 'YoutubeDl/preview'
1258Binary files YoutubeDl/preview 1970-01-01 00:00:00 +0000 and YoutubeDl/preview 2012-05-31 00:10:24 +0000 differ
1259=== added file 'YoutubeDl/userAlerts.py'
1260--- YoutubeDl/userAlerts.py 1970-01-01 00:00:00 +0000
1261+++ YoutubeDl/userAlerts.py 2012-05-31 00:10:24 +0000
1262@@ -0,0 +1,20 @@
1263+try:
1264+ import pynotify
1265+
1266+ def doUserAlert(master, message, time):
1267+ """
1268+ Notify user of alerts
1269+ """
1270+ if master.usePynotify and pynotify.init('YoutubeDl'):
1271+ n = pynotify.Notification(message)
1272+ n.show()
1273+ else:
1274+ master.icon.ShowDialog(message,time)
1275+except ImportError:
1276+
1277+ def doUserAlert(master, message, time):
1278+ """
1279+ Notify user of alerts
1280+ """
1281+ master.icon.ShowDialog(message,time)
1282+
1283
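userAlerts.py prefers desktop notifications and falls back to the dock's dialog when pynotify is missing or disabled. The same fallback can be sketched without either dependency, in Python 3 syntax. Both backends here are hypothetical callables that take a message and a duration; they stand in for pynotify and `master.icon.ShowDialog` and are not the applet's actual API.

```python
def make_alert(notify_backend, dialog_backend):
    """Build an alert function that prefers notifications over dialogs.

    `notify_backend` may be None (for example when its import failed);
    `dialog_backend` must always be callable. Both are hypothetical
    stand-ins taking (message, seconds).
    """
    def alert(message, seconds, use_notifications=True):
        if use_notifications and notify_backend is not None:
            try:
                notify_backend(message, seconds)
                return "notification"
            except Exception:
                # Any notification failure falls through to the dialog.
                pass
        dialog_backend(message, seconds)
        return "dialog"

    return alert
```

Returning the name of the path taken is only there to make the behaviour easy to check; a real applet would not need it.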
1284=== added file 'YoutubeDl/youtubedl.py'
1285--- YoutubeDl/youtubedl.py 1970-01-01 00:00:00 +0000
1286+++ YoutubeDl/youtubedl.py 2012-05-31 00:10:24 +0000
1287@@ -0,0 +1,4647 @@
1288+#!/usr/bin/env python
1289+# -*- coding: utf-8 -*-
1290+
1291+__authors__ = (
1292+ 'Ricardo Garcia Gonzalez',
1293+ 'Danny Colligan',
1294+ 'Benjamin Johnson',
1295+ 'Vasyl\' Vavrychuk',
1296+ 'Witold Baryluk',
1297+ 'Paweł Paprota',
1298+ 'Gergely Imreh',
1299+ 'Rogério Brito',
1300+ 'Philipp Hagemeister',
1301+ 'Sören Schulze',
1302+ 'Kevin Ngo',
1303+ 'Ori Avtalion',
1304+ 'shizeeg',
1305+ )
1306+
1307+__license__ = 'Public Domain'
1308+__version__ = '2012.02.27'
1309+
1310+UPDATE_URL = 'https://raw.github.com/rg3/youtube-dl/master/youtube-dl'
1311+
1312+
1313+import cookielib
1314+import datetime
1315+import getpass
1316+import gzip
1317+import htmlentitydefs
1318+import HTMLParser
1319+import httplib
1320+import locale
1321+import math
1322+import netrc
1323+import optparse
1324+import os
1325+import os.path
1326+import re
1327+import shlex
1328+import socket
1329+import string
1330+import subprocess
1331+import sys
1332+import time
1333+import urllib
1334+import urllib2
1335+import warnings
1336+import zlib
1337+
1338+if os.name == 'nt':
1339+ import ctypes
1340+
1341+try:
1342+ import email.utils
1343+except ImportError: # Python 2.4
1344+ import email.Utils
1345+try:
1346+ import cStringIO as StringIO
1347+except ImportError:
1348+ import StringIO
1349+
1350+# parse_qs was moved from the cgi module to the urlparse module recently.
1351+try:
1352+ from urlparse import parse_qs
1353+except ImportError:
1354+ from cgi import parse_qs
1355+
1356+try:
1357+ import lxml.etree
1358+except ImportError:
1359+ pass # Handled below
1360+
1361+try:
1362+ import xml.etree.ElementTree
1363+except ImportError: # Python<2.5: Not officially supported, but let it slip
1364+ warnings.warn('xml.etree.ElementTree support is missing. Consider upgrading to Python >= 2.5 if you get related errors.')
1365+
1366+std_headers = {
1367+ 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:5.0.1) Gecko/20100101 Firefox/5.0.1',
1368+ 'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
1369+ 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
1370+ 'Accept-Encoding': 'gzip, deflate',
1371+ 'Accept-Language': 'en-us,en;q=0.5',
1372+}
1373+
1374+try:
1375+ import json
1376+except ImportError: # Python <2.6, use trivialjson (https://github.com/phihag/trivialjson):
1377+ import re
1378+ class json(object):
1379+ @staticmethod
1380+ def loads(s):
1381+ s = s.decode('UTF-8')
1382+ def raiseError(msg, i):
1383+ raise ValueError(msg + ' at position ' + str(i) + ' of ' + repr(s) + ': ' + repr(s[i:]))
1384+ def skipSpace(i, expectMore=True):
1385+ while i < len(s) and s[i] in ' \t\r\n':
1386+ i += 1
1387+ if expectMore:
1388+ if i >= len(s):
1389+ raiseError('Premature end', i)
1390+ return i
1391+ def decodeEscape(match):
1392+ esc = match.group(1)
1393+ _STATIC = {
1394+ '"': '"',
1395+ '\\': '\\',
1396+ '/': '/',
1397+ 'b': unichr(0x8),
1398+ 'f': unichr(0xc),
1399+ 'n': '\n',
1400+ 'r': '\r',
1401+ 't': '\t',
1402+ }
1403+ if esc in _STATIC:
1404+ return _STATIC[esc]
1405+ if esc[0] == 'u':
1406+ if len(esc) == 1+4:
1407+ return unichr(int(esc[1:5], 16))
1408+ if len(esc) == 5+6 and esc[5:7] == '\\u':
1409+ hi = int(esc[1:5], 16)
1410+ low = int(esc[7:11], 16)
1411+ return unichr((hi - 0xd800) * 0x400 + low - 0xdc00 + 0x10000)
1412+ raise ValueError('Unknown escape ' + str(esc))
1413+ def parseString(i):
1414+ i += 1
1415+ e = i
1416+ while True:
1417+ e = s.index('"', e)
1418+ bslashes = 0
1419+ while s[e-bslashes-1] == '\\':
1420+ bslashes += 1
1421+ if bslashes % 2 == 1:
1422+ e += 1
1423+ continue
1424+ break
1425+ rexp = re.compile(r'\\(u[dD][89aAbB][0-9a-fA-F]{2}\\u[0-9a-fA-F]{4}|u[0-9a-fA-F]{4}|.|$)')
1426+ stri = rexp.sub(decodeEscape, s[i:e])
1427+ return (e+1,stri)
1428+ def parseObj(i):
1429+ i += 1
1430+ res = {}
1431+ i = skipSpace(i)
1432+ if s[i] == '}': # Empty dictionary
1433+ return (i+1,res)
1434+ while True:
1435+ if s[i] != '"':
1436+ raiseError('Expected a string object key', i)
1437+ i,key = parseString(i)
1438+ i = skipSpace(i)
1439+ if i >= len(s) or s[i] != ':':
1440+ raiseError('Expected a colon', i)
1441+ i,val = parse(i+1)
1442+ res[key] = val
1443+ i = skipSpace(i)
1444+ if s[i] == '}':
1445+ return (i+1, res)
1446+ if s[i] != ',':
1447+ raiseError('Expected comma or closing curly brace', i)
1448+ i = skipSpace(i+1)
1449+ def parseArray(i):
1450+ res = []
1451+ i = skipSpace(i+1)
1452+ if s[i] == ']': # Empty array
1453+ return (i+1,res)
1454+ while True:
1455+ i,val = parse(i)
1456+ res.append(val)
1457+ i = skipSpace(i) # Raise exception if premature end
1458+ if s[i] == ']':
1459+ return (i+1, res)
1460+ if s[i] != ',':
1461+ raiseError('Expected a comma or closing bracket', i)
1462+ i = skipSpace(i+1)
1463+ def parseDiscrete(i):
1464+ for k,v in {'true': True, 'false': False, 'null': None}.items():
1465+ if s.startswith(k, i):
1466+ return (i+len(k), v)
1467+ raiseError('Not a boolean (or null)', i)
1468+ def parseNumber(i):
1469+ mobj = re.match('^(-?(0|[1-9][0-9]*)(\.[0-9]*)?([eE][+-]?[0-9]+)?)', s[i:])
1470+ if mobj is None:
1471+ raiseError('Not a number', i)
1472+ nums = mobj.group(1)
1473+ if '.' in nums or 'e' in nums or 'E' in nums:
1474+ return (i+len(nums), float(nums))
1475+ return (i+len(nums), int(nums))
1476+ CHARMAP = {'{': parseObj, '[': parseArray, '"': parseString, 't': parseDiscrete, 'f': parseDiscrete, 'n': parseDiscrete}
1477+ def parse(i):
1478+ i = skipSpace(i)
1479+ i,res = CHARMAP.get(s[i], parseNumber)(i)
1480+ i = skipSpace(i, False)
1481+ return (i,res)
1482+ i,res = parse(0)
1483+ if i < len(s):
1484+ raise ValueError('Extra data at end of input (index ' + str(i) + ' of ' + repr(s) + ': ' + repr(s[i:]) + ')')
1485+ return res
1486+
1487+def preferredencoding():
1488+ """Get preferred encoding.
1489+
1490+ Returns the best encoding scheme for the system, based on
1491+ locale.getpreferredencoding() and some further tweaks.
1492+ """
1493+ def yield_preferredencoding():
1494+ try:
1495+ pref = locale.getpreferredencoding()
1496+ u'TEST'.encode(pref)
1497+ except:
1498+ pref = 'UTF-8'
1499+ while True:
1500+ yield pref
1501+ return yield_preferredencoding().next()
1502+
1503+
1504+def htmlentity_transform(matchobj):
1505+ """Transforms an HTML entity to a Unicode character.
1506+
1507+ This function receives a match object and is intended to be used with
1508+ the re.sub() function.
1509+ """
1510+ entity = matchobj.group(1)
1511+
1512+ # Known non-numeric HTML entity
1513+ if entity in htmlentitydefs.name2codepoint:
1514+ return unichr(htmlentitydefs.name2codepoint[entity])
1515+
1516+ # Unicode character
1517+ mobj = re.match(ur'(?u)#(x?\d+)', entity)
1518+ if mobj is not None:
1519+ numstr = mobj.group(1)
1520+ if numstr.startswith(u'x'):
1521+ base = 16
1522+ numstr = u'0%s' % numstr
1523+ else:
1524+ base = 10
1525+ return unichr(long(numstr, base))
1526+
1527+ # Unknown entity in name, return its literal representation
1528+ return (u'&%s;' % entity)
1529+
1530+
1531+def sanitize_title(utitle):
1532+ """Sanitizes a video title so it can be used as part of a filename."""
1533+ utitle = re.sub(ur'(?u)&(.+?);', htmlentity_transform, utitle)
1534+ return utitle.replace(unicode(os.sep), u'%')
1535+
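Reviewer sketch of the entity handling that htmlentity_transform() and sanitize_title() perform together. This is a Python 3 rewrite for illustration only (the diff itself is Python 2); entity_to_char and unescape_entities are hypothetical names, and the numeric pattern here accepts hexadecimal digits, which is an assumption about the intended behaviour.

```python
import re
from html.entities import name2codepoint  # Python 3 home of htmlentitydefs


def entity_to_char(match):
    """Turn one entity body (the text between '&' and ';') into a character."""
    entity = match.group(1)

    # Named entity such as 'amp' or 'eacute'
    if entity in name2codepoint:
        return chr(name2codepoint[entity])

    # Numeric entity: decimal ('#65') or hexadecimal ('#x41')
    numeric = re.match(r'#(x[0-9a-fA-F]+|[0-9]+)$', entity)
    if numeric is not None:
        digits = numeric.group(1)
        if digits.startswith('x'):
            return chr(int(digits[1:], 16))
        return chr(int(digits, 10))

    # Unknown entity: keep its literal representation
    return '&%s;' % entity


def unescape_entities(text):
    """Replace every '&...;' run in text using entity_to_char."""
    return re.sub(r'&(.+?);', entity_to_char, text)
```

For example, unescape_entities('Tom &amp; Jerry') gives 'Tom & Jerry', and an unknown name such as '&bogus;' is passed through unchanged.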
1536+
1537+def sanitize_open(filename, open_mode):
1538+ """Try to open the given filename, and slightly tweak it if this fails.
1539+
1540+ Attempts to open the given filename. If this fails, it tries to change
1541+ the filename slightly, step by step, until it's either able to open it
1542+ or it fails and raises a final exception, like the standard open()
1543+ function.
1544+
1545+ It returns the tuple (stream, definitive_file_name).
1546+ """
1547+ try:
1548+ if filename == u'-':
1549+ if sys.platform == 'win32':
1550+ import msvcrt
1551+ msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
1552+ return (sys.stdout, filename)
1553+ stream = open(_encodeFilename(filename), open_mode)
1554+ return (stream, filename)
1555+ except (IOError, OSError), err:
1556+ # In case of error, try to remove win32 forbidden chars
1557+ filename = re.sub(ur'[/<>:"\|\?\*]', u'#', filename)
1558+
1559+ # An exception here should be caught in the caller
1560+ stream = open(_encodeFilename(filename), open_mode)
1561+ return (stream, filename)
1562+
1563+
1564+def timeconvert(timestr):
1565+ """Convert RFC 2822 defined time string into system timestamp"""
1566+ timestamp = None
1567+ timetuple = email.utils.parsedate_tz(timestr)
1568+ if timetuple is not None:
1569+ timestamp = email.utils.mktime_tz(timetuple)
1570+ return timestamp
1571+
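Reviewer sketch of what timeconvert() does with a Last-modified style header value, written for Python 3. It uses the same two email.utils calls as the diff; the function name to_timestamp is hypothetical.

```python
import email.utils


def to_timestamp(timestr):
    """Convert an RFC 2822 date string to seconds since the epoch, or None."""
    timetuple = email.utils.parsedate_tz(timestr)
    if timetuple is None:
        # The string could not be parsed as a date
        return None
    # mktime_tz honours the timezone offset carried in the tuple
    return email.utils.mktime_tz(timetuple)
```

Two spellings of the same instant give the same result: 'Thu, 01 Jan 1970 00:00:00 GMT' and 'Thu, 01 Jan 1970 01:00:00 +0100' both map to 0.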
1572+def _simplify_title(title):
1573+ expr = re.compile(ur'[^\w\d_\-]+', flags=re.UNICODE)
1574+ return expr.sub(u'_', title).strip(u'_')
1575+
1576+def _orderedSet(iterable):
1577+ """ Remove all duplicates from the input iterable """
1578+ res = []
1579+ for el in iterable:
1580+ if el not in res:
1581+ res.append(el)
1582+ return res
1583+
1584+def _unescapeHTML(s):
1585+ """
1586+ @param s a string (of type unicode)
1587+ """
1588+ assert type(s) == type(u'')
1589+
1590+ htmlParser = HTMLParser.HTMLParser()
1591+ return htmlParser.unescape(s)
1592+
1593+def _encodeFilename(s):
1594+ """
1595+ @param s The name of the file (of type unicode)
1596+ """
1597+
1598+ assert type(s) == type(u'')
1599+
1600+ if sys.platform == 'win32' and sys.getwindowsversion().major >= 5:
1601+ # Pass u'' directly to use Unicode APIs on Windows 2000 and up
1602+ # (Detecting Windows NT 4 is tricky because 'major >= 4' would
1603+ # match Windows 9x series as well. Besides, NT 4 is obsolete.)
1604+ return s
1605+ else:
1606+ return s.encode(sys.getfilesystemencoding(), 'ignore')
1607+
1608+class DownloadError(Exception):
1609+ """Download Error exception.
1610+
1611+ This exception may be thrown by FileDownloader objects if they are not
1612+ configured to continue on errors. They will contain the appropriate
1613+ error message.
1614+ """
1615+ pass
1616+
1617+
1618+class SameFileError(Exception):
1619+ """Same File exception.
1620+
1621+ This exception will be thrown by FileDownloader objects if they detect
1622+ multiple files would have to be downloaded to the same file on disk.
1623+ """
1624+ pass
1625+
1626+
1627+class PostProcessingError(Exception):
1628+ """Post Processing exception.
1629+
1630+ This exception may be raised by PostProcessor's .run() method to
1631+ indicate an error in the postprocessing task.
1632+ """
1633+ pass
1634+
1635+class MaxDownloadsReached(Exception):
1636+ """ --max-downloads limit has been reached. """
1637+ pass
1638+
1639+
1640+class UnavailableVideoError(Exception):
1641+ """Unavailable Format exception.
1642+
1643+ This exception will be thrown when a video is requested
1644+ in a format that is not available for that video.
1645+ """
1646+ pass
1647+
1648+
1649+class ContentTooShortError(Exception):
1650+ """Content Too Short exception.
1651+
1652+ This exception may be raised by FileDownloader objects when a file they
1653+ download is too small for what the server announced first, indicating
1654+ the connection was probably interrupted.
1655+ """
1656+ # Both in bytes
1657+ downloaded = None
1658+ expected = None
1659+
1660+ def __init__(self, downloaded, expected):
1661+ self.downloaded = downloaded
1662+ self.expected = expected
1663+
1664+
1665+class YoutubeDLHandler(urllib2.HTTPHandler):
1666+ """Handler for HTTP requests and responses.
1667+
1668+ This class, when installed with an OpenerDirector, automatically adds
1669+ the standard headers to every HTTP request and handles gzipped and
1670+ deflated responses from web servers. If compression is to be avoided in
1671+ a particular request, the original request in the program code only has
1672+ to include the HTTP header "Youtubedl-No-Compression", which will be
1673+ removed before making the real request.
1674+
1675+ Part of this code was copied from:
1676+
1677+ http://techknack.net/python-urllib2-handlers/
1678+
1679+ Andrew Rowls, the author of that code, agreed to release it to the
1680+ public domain.
1681+ """
1682+
1683+ @staticmethod
1684+ def deflate(data):
1685+ try:
1686+ return zlib.decompress(data, -zlib.MAX_WBITS)
1687+ except zlib.error:
1688+ return zlib.decompress(data)
1689+
1690+ @staticmethod
1691+ def addinfourl_wrapper(stream, headers, url, code):
1692+ if hasattr(urllib2.addinfourl, 'getcode'):
1693+ return urllib2.addinfourl(stream, headers, url, code)
1694+ ret = urllib2.addinfourl(stream, headers, url)
1695+ ret.code = code
1696+ return ret
1697+
1698+ def http_request(self, req):
1699+ for h in std_headers:
1700+ if h in req.headers:
1701+ del req.headers[h]
1702+ req.add_header(h, std_headers[h])
1703+ if 'Youtubedl-no-compression' in req.headers:
1704+ if 'Accept-encoding' in req.headers:
1705+ del req.headers['Accept-encoding']
1706+ del req.headers['Youtubedl-no-compression']
1707+ return req
1708+
1709+ def http_response(self, req, resp):
1710+ old_resp = resp
1711+ # gzip
1712+ if resp.headers.get('Content-encoding', '') == 'gzip':
1713+ gz = gzip.GzipFile(fileobj=StringIO.StringIO(resp.read()), mode='r')
1714+ resp = self.addinfourl_wrapper(gz, old_resp.headers, old_resp.url, old_resp.code)
1715+ resp.msg = old_resp.msg
1716+ # deflate
1717+ if resp.headers.get('Content-encoding', '') == 'deflate':
1718+ gz = StringIO.StringIO(self.deflate(resp.read()))
1719+ resp = self.addinfourl_wrapper(gz, old_resp.headers, old_resp.url, old_resp.code)
1720+ resp.msg = old_resp.msg
1721+ return resp
1722+
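Reviewer sketch of the fallback in YoutubeDLHandler.deflate(): raw deflate is tried first, and the zlib-wrapped form is tried only if that fails, presumably because servers differ in which one they send for "Content-encoding: deflate". Python 3, illustration only; inflate, payload, wrapped and raw are hypothetical names.

```python
import zlib


def inflate(data):
    """Decompress data that is either raw deflate or zlib-wrapped deflate."""
    try:
        # Negative wbits means: raw deflate stream, no zlib header or checksum
        return zlib.decompress(data, -zlib.MAX_WBITS)
    except zlib.error:
        # Fall back to the zlib container format
        return zlib.decompress(data)


payload = b'hello hello hello'

# zlib-wrapped stream (header + deflate data + checksum)
wrapped = zlib.compress(payload)

# raw deflate stream, produced with negative wbits
compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = compressor.compress(payload) + compressor.flush()
```

Both inflate(wrapped) and inflate(raw) return the original payload.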
1723+
1724+class FileDownloader(object):
1725+ """File Downloader class.
1726+
1727+ File downloader objects are responsible for downloading the
1728+ actual video file and writing it to disk if the user has requested
1729+ it, among some other tasks. In most cases there should be one per
1730+ program. Given only a video URL, the downloader does not know how to
1731+ extract all the needed information; that is the job of the
1732+ InfoExtractors, so it has to pass the URL to one of them.
1733+
1734+ For this, file downloader objects have a method that allows
1735+ InfoExtractors to be registered in a given order. When it is passed
1736+ a URL, the file downloader hands it to the first InfoExtractor it
1737+ finds that reports being able to handle it. The InfoExtractor extracts
1738+ all the information about the video or videos the URL refers to, and
1739+ asks the FileDownloader to process the video information, possibly
1740+ downloading the video.
1741+
1742+ File downloaders accept a lot of parameters. In order not to saturate
1743+ the object constructor with arguments, it receives a dictionary of
1744+ options instead. These options are available through the params
1745+ attribute for the InfoExtractors to use. The FileDownloader also
1746+ registers itself as the downloader in charge for the InfoExtractors
1747+ that are added to it, so this is a "mutual registration".
1748+
1749+ Available options:
1750+
1751+ username: Username for authentication purposes.
1752+ password: Password for authentication purposes.
1753+ usenetrc: Use netrc for authentication instead.
1754+ quiet: Do not print messages to stdout.
1755+ forceurl: Force printing final URL.
1756+ forcetitle: Force printing title.
1757+ forcethumbnail: Force printing thumbnail URL.
1758+ forcedescription: Force printing description.
1759+ forcefilename: Force printing final filename.
1760+ simulate: Do not download the video files.
1761+ format: Video format code.
1762+ format_limit: Highest quality format to try.
1763+ outtmpl: Template for output names.
1764+ ignoreerrors: Do not stop on download errors.
1765+ ratelimit: Download speed limit, in bytes/sec.
1766+ nooverwrites: Prevent overwriting files.
1767+ retries: Number of times to retry for HTTP error 5xx
1768+ continuedl: Try to continue downloads if possible.
1769+ noprogress: Do not print the progress bar.
1770+ playliststart: Playlist item to start at.
1771+ playlistend: Playlist item to end at.
1772+ matchtitle: Download only matching titles.
1773+ rejecttitle: Reject downloads for matching titles.
1774+ logtostderr: Log messages to stderr instead of stdout.
1775+ consoletitle: Display progress in console window's titlebar.
1776+ nopart: Do not use temporary .part files.
1777+ updatetime: Use the Last-modified header to set output file timestamps.
1778+ writedescription: Write the video description to a .description file
1779+ writeinfojson: Write the video metadata to a .info.json file
1780+ """
1781+
1782+ params = None
1783+ _ies = []
1784+ _pps = []
1785+ _download_retcode = None
1786+ _num_downloads = None
1787+ _screen_file = None
1788+
1789+ def __init__(self, params):
1790+ """Create a FileDownloader object with the given options."""
1791+ self._ies = []
1792+ self._pps = []
1793+ self._download_retcode = 0
1794+ self._num_downloads = 0
1795+ self._screen_file = [sys.stdout, sys.stderr][params.get('logtostderr', False)]
1796+ self.params = params
1797+
1798+ @staticmethod
1799+ def format_bytes(bytes):
1800+ if bytes is None:
1801+ return 'N/A'
1802+ if type(bytes) is str:
1803+ bytes = float(bytes)
1804+ if bytes == 0.0:
1805+ exponent = 0
1806+ else:
1807+ exponent = long(math.log(bytes, 1024.0))
1808+ suffix = 'bkMGTPEZY'[exponent]
1809+ converted = float(bytes) / float(1024 ** exponent)
1810+ return '%.2f%s' % (converted, suffix)
1811+
1812+ @staticmethod
1813+ def calc_percent(byte_counter, data_len):
1814+ if data_len is None:
1815+ return '---.-%'
1816+ return '%6s' % ('%3.1f%%' % (float(byte_counter) / float(data_len) * 100.0))
1817+
1818+ @staticmethod
1819+ def calc_eta(start, now, total, current):
1820+ if total is None:
1821+ return '--:--'
1822+ dif = now - start
1823+ if current == 0 or dif < 0.001: # One millisecond
1824+ return '--:--'
1825+ rate = float(current) / dif
1826+ eta = long((float(total) - float(current)) / rate)
1827+ (eta_mins, eta_secs) = divmod(eta, 60)
1828+ if eta_mins > 99:
1829+ return '--:--'
1830+ return '%02d:%02d' % (eta_mins, eta_secs)
1831+
1832+ @staticmethod
1833+ def calc_speed(start, now, bytes):
1834+ dif = now - start
1835+ if bytes == 0 or dif < 0.001: # One millisecond
1836+ return '%10s' % '---b/s'
1837+ return '%10s' % ('%s/s' % FileDownloader.format_bytes(float(bytes) / dif))
1838+
1839+ @staticmethod
1840+ def best_block_size(elapsed_time, bytes):
1841+ new_min = max(bytes / 2.0, 1.0)
1842+ new_max = min(max(bytes * 2.0, 1.0), 4194304) # Do not surpass 4 MB
1843+ if elapsed_time < 0.001:
1844+ return long(new_max)
1845+ rate = bytes / elapsed_time
1846+ if rate > new_max:
1847+ return long(new_max)
1848+ if rate < new_min:
1849+ return long(new_min)
1850+ return long(rate)
1851+
1852+ @staticmethod
1853+ def parse_bytes(bytestr):
1854+ """Parse a string indicating a byte quantity into a long integer."""
1855+ matchobj = re.match(r'(?i)^(\d+(?:\.\d+)?)([kMGTPEZY]?)$', bytestr)
1856+ if matchobj is None:
1857+ return None
1858+ number = float(matchobj.group(1))
1859+ multiplier = 1024.0 ** 'bkmgtpezy'.index(matchobj.group(2).lower())
1860+ return long(round(number * multiplier))
1861+
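Reviewer sketch of how format_bytes() and parse_bytes() relate: both index into the same suffix sequence, where each step is a factor of 1024. Python 3 version for illustration; SUFFIXES is a hypothetical name and the functions are simplified copies, not the code under review.

```python
import math
import re

SUFFIXES = 'bkMGTPEZY'  # position n stands for 1024 ** n


def format_bytes(count):
    """Render a byte count with two decimals and a size suffix."""
    if count is None:
        return 'N/A'
    count = float(count)
    exponent = 0 if count == 0.0 else int(math.log(count, 1024.0))
    return '%.2f%s' % (count / 1024 ** exponent, SUFFIXES[exponent])


def parse_bytes(text):
    """Parse strings such as '50k' or '1.5M' into an integer byte count."""
    match = re.match(r'(?i)^(\d+(?:\.\d+)?)([kMGTPEZY]?)$', text)
    if match is None:
        return None
    # An empty suffix has index 0, so the multiplier is 1024 ** 0 == 1
    multiplier = 1024.0 ** SUFFIXES.lower().index(match.group(2).lower())
    return int(round(float(match.group(1)) * multiplier))
```

With these definitions parse_bytes('50k') is 51200 and format_bytes(1536) is '1.50k'; a string that does not match the pattern yields None.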
1862+ def add_info_extractor(self, ie):
1863+ """Add an InfoExtractor object to the end of the list."""
1864+ self._ies.append(ie)
1865+ ie.set_downloader(self)
1866+
1867+ def add_post_processor(self, pp):
1868+ """Add a PostProcessor object to the end of the chain."""
1869+ self._pps.append(pp)
1870+ pp.set_downloader(self)
1871+
1872+ def to_screen(self, message, skip_eol=False):
1873+ """Print message to stdout if not in quiet mode."""
1874+ assert type(message) == type(u'')
1875+ if not self.params.get('quiet', False):
1876+ terminator = [u'\n', u''][skip_eol]
1877+ output = message + terminator
1878+
1879+ if 'b' not in self._screen_file.mode or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
1880+ output = output.encode(preferredencoding(), 'ignore')
1881+ self._screen_file.write(output)
1882+ self._screen_file.flush()
1883+
1884+ def to_stderr(self, message):
1885+ """Print message to stderr."""
1886+ print >>sys.stderr, message.encode(preferredencoding())
1887+
1888+ def to_cons_title(self, message):
1889+ """Set console/terminal window title to message."""
1890+ if not self.params.get('consoletitle', False):
1891+ return
1892+ if os.name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
1893+ # c_wchar_p() might not be necessary if `message` is
1894+ # already of type unicode()
1895+ ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
1896+ elif 'TERM' in os.environ:
1897+ sys.stderr.write('\033]0;%s\007' % message.encode(preferredencoding()))
1898+
1899+ def fixed_template(self):
1900+ """Checks if the output template is fixed."""
1901+ return (re.search(ur'(?u)%\(.+?\)s', self.params['outtmpl']) is None)
1902+
1903+ def trouble(self, message=None):
1904+ """Determine action to take when a download problem appears.
1905+
1906+ Depending on whether the downloader has been configured to ignore
1907+ download errors or not, this method may throw an exception or
1908+ not when errors are found, after printing the message.
1909+ """
1910+ if message is not None:
1911+ self.to_stderr(message)
1912+ if not self.params.get('ignoreerrors', False):
1913+ raise DownloadError(message)
1914+ self._download_retcode = 1
1915+
1916+ def slow_down(self, start_time, byte_counter):
1917+ """Sleep if the download speed is over the rate limit."""
1918+ rate_limit = self.params.get('ratelimit', None)
1919+ if rate_limit is None or byte_counter == 0:
1920+ return
1921+ now = time.time()
1922+ elapsed = now - start_time
1923+ if elapsed <= 0.0:
1924+ return
1925+ speed = float(byte_counter) / elapsed
1926+ if speed > rate_limit:
1927+ time.sleep((byte_counter - rate_limit * (now - start_time)) / rate_limit)
1928+
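Reviewer sketch of the arithmetic behind slow_down(): if byte_counter bytes arrived in elapsed seconds, sleeping (byte_counter - rate_limit * elapsed) / rate_limit seconds brings the average speed back down to rate_limit. The helper throttle_delay is hypothetical and only computes the delay instead of sleeping.

```python
def throttle_delay(byte_counter, elapsed, rate_limit):
    """Seconds to sleep so the average speed drops back to rate_limit."""
    if rate_limit is None or byte_counter == 0 or elapsed <= 0.0:
        return 0.0
    speed = float(byte_counter) / elapsed
    if speed <= rate_limit:
        # Already at or under the limit: no delay needed
        return 0.0
    # Time the transfer *should* have taken, minus the time it did take
    return (byte_counter - rate_limit * elapsed) / float(rate_limit)
```

For instance, 2000 bytes in 1 second against a limit of 1000 bytes/sec gives a delay of 1 second, after which the average is 2000 bytes over 2 seconds.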
1929+ def temp_name(self, filename):
1930+ """Returns a temporary filename for the given filename."""
1931+ if self.params.get('nopart', False) or filename == u'-' or \
1932+ (os.path.exists(_encodeFilename(filename)) and not os.path.isfile(_encodeFilename(filename))):
1933+ return filename
1934+ return filename + u'.part'
1935+
1936+ def undo_temp_name(self, filename):
1937+ if filename.endswith(u'.part'):
1938+ return filename[:-len(u'.part')]
1939+ return filename
1940+
1941+ def try_rename(self, old_filename, new_filename):
1942+ try:
1943+ if old_filename == new_filename:
1944+ return
1945+ os.rename(_encodeFilename(old_filename), _encodeFilename(new_filename))
1946+ except (IOError, OSError), err:
1947+ self.trouble(u'ERROR: unable to rename file')
1948+
1949+ def try_utime(self, filename, last_modified_hdr):
1950+ """Try to set the last-modified time of the given file."""
1951+ if last_modified_hdr is None:
1952+ return
1953+ if not os.path.isfile(_encodeFilename(filename)):
1954+ return
1955+ timestr = last_modified_hdr
1956+ if timestr is None:
1957+ return
1958+ filetime = timeconvert(timestr)
1959+ if filetime is None:
1960+ return filetime
1961+ try:
1962+ os.utime(filename, (time.time(), filetime))
1963+ except:
1964+ pass
1965+ return filetime
1966+
1967+ def report_writedescription(self, descfn):
1968+ """ Report that the description file is being written """
1969+ self.to_screen(u'[info] Writing video description to: ' + descfn)
1970+
1971+ def report_writeinfojson(self, infofn):
1972+ """ Report that the metadata file is being written """
1973+ self.to_screen(u'[info] Writing video description metadata as JSON to: ' + infofn)
1974+
1975+ def report_destination(self, filename):
1976+ """Report destination filename."""
1977+ self.to_screen(u'[download] Destination: ' + filename)
1978+
1979+ def report_progress(self, percent_str, data_len_str, speed_str, eta_str):
1980+ """Report download progress."""
1981+ if self.params.get('noprogress', False):
1982+ return
1983+ self.to_screen(u'\r[download] %s of %s at %s ETA %s' %
1984+ (percent_str, data_len_str, speed_str, eta_str), skip_eol=True)
1985+ self.to_cons_title(u'youtube-dl - %s of %s at %s ETA %s' %
1986+ (percent_str.strip(), data_len_str.strip(), speed_str.strip(), eta_str.strip()))
1987+
1988+ def report_resuming_byte(self, resume_len):
1989+ """Report attempt to resume at given byte."""
1990+ self.to_screen(u'[download] Resuming download at byte %s' % resume_len)
1991+
1992+ def report_retry(self, count, retries):
1993+ """Report retry in case of HTTP error 5xx"""
1994+ self.to_screen(u'[download] Got server HTTP error. Retrying (attempt %d of %d)...' % (count, retries))
1995+
1996+ def report_file_already_downloaded(self, file_name):
1997+ """Report file has already been fully downloaded."""
1998+ try:
1999+ self.to_screen(u'[download] %s has already been downloaded' % file_name)
2000+ except (UnicodeEncodeError), err:
2001+ self.to_screen(u'[download] The file has already been downloaded')
2002+
2003+ def report_unable_to_resume(self):
2004+ """Report it was impossible to resume download."""
2005+ self.to_screen(u'[download] Unable to resume')
2006+
2007+ def report_finish(self):
2008+ """Report download finished."""
2009+ if self.params.get('noprogress', False):
2010+ self.to_screen(u'[download] Download completed')
2011+ else:
2012+ self.to_screen(u'')
2013+
2014+ def increment_downloads(self):
2015+ """Increment the ordinal that assigns a number to each file."""
2016+ self._num_downloads += 1
2017+
2018+ def prepare_filename(self, info_dict):
2019+ """Generate the output filename."""
2020+ try:
2021+ template_dict = dict(info_dict)
2022+ template_dict['epoch'] = unicode(long(time.time()))
2023+ template_dict['autonumber'] = unicode('%05d' % self._num_downloads)
2024+ filename = self.params['outtmpl'] % template_dict
2025+ return filename
2026+ except (ValueError, KeyError), err:
2027+ self.trouble(u'ERROR: invalid system charset or erroneous output template')
2028+ return None
2029+
2030+ def _match_entry(self, info_dict):
2031+ """ Returns None iff the file should be downloaded """
2032+
2033+ title = info_dict['title']
2034+ matchtitle = self.params.get('matchtitle', False)
2035+ if matchtitle and not re.search(matchtitle, title, re.IGNORECASE):
2036+ return u'"' + title + '" title did not match pattern "' + matchtitle + '"'
2037+ rejecttitle = self.params.get('rejecttitle', False)
2038+ if rejecttitle and re.search(rejecttitle, title, re.IGNORECASE):
2039+ return u'"' + title + '" title matched reject pattern "' + rejecttitle + '"'
2040+ return None
2041+
2042+ def process_info(self, info_dict):
2043+ """Process a single dictionary returned by an InfoExtractor."""
2044+
2045+ reason = self._match_entry(info_dict)
2046+ if reason is not None:
2047+ self.to_screen(u'[download] ' + reason)
2048+ return
2049+
2050+ max_downloads = self.params.get('max_downloads')
2051+ if max_downloads is not None:
2052+ if self._num_downloads > int(max_downloads):
2053+ raise MaxDownloadsReached()
2054+
2055+ filename = self.prepare_filename(info_dict)
2056+
2057+ # Forced printings
2058+ if self.params.get('forcetitle', False):
2059+ print info_dict['title'].encode(preferredencoding(), 'xmlcharrefreplace')
2060+ if self.params.get('forceurl', False):
2061+ print info_dict['url'].encode(preferredencoding(), 'xmlcharrefreplace')
2062+ if self.params.get('forcethumbnail', False) and 'thumbnail' in info_dict:
2063+ print info_dict['thumbnail'].encode(preferredencoding(), 'xmlcharrefreplace')
2064+ if self.params.get('forcedescription', False) and 'description' in info_dict:
2065+ print info_dict['description'].encode(preferredencoding(), 'xmlcharrefreplace')
2066+ if self.params.get('forcefilename', False) and filename is not None:
2067+ print filename.encode(preferredencoding(), 'xmlcharrefreplace')
2068+ if self.params.get('forceformat', False):
2069+ print info_dict['format'].encode(preferredencoding(), 'xmlcharrefreplace')
2070+
2071+ # Do nothing else if in simulate mode
2072+ if self.params.get('simulate', False):
2073+ return
2074+
2075+ if filename is None:
2076+ return
2077+
2078+ try:
2079+ dn = os.path.dirname(_encodeFilename(filename))
2080+ if dn != '' and not os.path.exists(dn): # dn is already encoded
2081+ os.makedirs(dn)
2082+ except (OSError, IOError), err:
2083+ self.trouble(u'ERROR: unable to create directory ' + unicode(err))
2084+ return
2085+
2086+ if self.params.get('writedescription', False):
2087+ try:
2088+ descfn = filename + u'.description'
2089+ self.report_writedescription(descfn)
2090+ descfile = open(_encodeFilename(descfn), 'wb')
2091+ try:
2092+ descfile.write(info_dict['description'].encode('utf-8'))
2093+ finally:
2094+ descfile.close()
2095+ except (OSError, IOError):
2096+ self.trouble(u'ERROR: Cannot write description file ' + descfn)
2097+ return
2098+
2099+ if self.params.get('writeinfojson', False):
2100+ infofn = filename + u'.info.json'
2101+ self.report_writeinfojson(infofn)
2102+ try:
2103+ json.dump
2104+ except (NameError,AttributeError):
2105+ self.trouble(u'ERROR: No JSON encoder found. Update to Python 2.6+, setup a json module, or leave out --write-info-json.')
2106+ return
2107+ try:
2108+ infof = open(_encodeFilename(infofn), 'wb')
2109+ try:
2110+ json_info_dict = dict((k,v) for k,v in info_dict.iteritems() if not k in ('urlhandle',))
2111+ json.dump(json_info_dict, infof)
2112+ finally:
2113+ infof.close()
2114+ except (OSError, IOError):
2115+ self.trouble(u'ERROR: Cannot write metadata to JSON file ' + infofn)
2116+ return
2117+
2118+ if not self.params.get('skip_download', False):
2119+ if self.params.get('nooverwrites', False) and os.path.exists(_encodeFilename(filename)):
2120+ success = True
2121+ else:
2122+ try:
2123+ success = self._do_download(filename, info_dict)
2124+ except (OSError, IOError), err:
2125+ raise UnavailableVideoError
2126+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2127+ self.trouble(u'ERROR: unable to download video data: %s' % str(err))
2128+ return
2129+ except (ContentTooShortError, ), err:
2130+ self.trouble(u'ERROR: content too short (expected %s bytes and served %s)' % (err.expected, err.downloaded))
2131+ return
2132+
2133+ if success:
2134+ try:
2135+ self.post_process(filename, info_dict)
2136+ except (PostProcessingError), err:
2137+ self.trouble(u'ERROR: postprocessing: %s' % str(err))
2138+ return
2139+
2140+ def download(self, url_list):
2141+ """Download a given list of URLs."""
2142+ if len(url_list) > 1 and self.fixed_template():
2143+ raise SameFileError(self.params['outtmpl'])
2144+
2145+ for url in url_list:
2146+ suitable_found = False
2147+ for ie in self._ies:
2148+ # Go to next InfoExtractor if not suitable
2149+ if not ie.suitable(url):
2150+ continue
2151+
2152+ # Suitable InfoExtractor found
2153+ suitable_found = True
2154+
2155+ # Extract information from URL and process it
2156+ ie.extract(url)
2157+
2158+ # Suitable InfoExtractor had been found; go to next URL
2159+ break
2160+
2161+ if not suitable_found:
2162+ self.trouble(u'ERROR: no suitable InfoExtractor: %s' % url)
2163+
2164+ return self._download_retcode
2165+
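Reviewer sketch of the "mutual registration" and dispatch described in the FileDownloader docstring: the downloader registers each extractor, the extractor keeps a reference back to the downloader, and download() hands each URL to the first extractor that reports it as suitable. All class names, the example.invalid URL pattern and the processed/errors lists are hypothetical; this is not the code under review.

```python
import re


class SketchExtractor(object):
    """Stand-in for an InfoExtractor subclass."""
    _VALID_URL = r'^https?://example\.invalid/watch/(\w+)$'

    def __init__(self):
        self.downloader = None

    def set_downloader(self, downloader):
        self.downloader = downloader

    def suitable(self, url):
        return re.match(self._VALID_URL, url) is not None

    def extract(self, url):
        video_id = re.match(self._VALID_URL, url).group(1)
        # Ask the downloader to process what was extracted
        self.downloader.process_info({'id': video_id, 'url': url})


class SketchDownloader(object):
    """Stand-in for FileDownloader that records instead of downloading."""

    def __init__(self):
        self._ies = []
        self.processed = []
        self.errors = []

    def add_info_extractor(self, ie):
        self._ies.append(ie)
        ie.set_downloader(self)  # the "mutual" half of the registration

    def process_info(self, info_dict):
        self.processed.append(info_dict)

    def download(self, url_list):
        for url in url_list:
            for ie in self._ies:
                if ie.suitable(url):
                    ie.extract(url)
                    break  # first suitable extractor wins
            else:
                self.errors.append(url)  # no extractor accepted the URL
        return 1 if self.errors else 0
```

Usage: after fd = SketchDownloader() and fd.add_info_extractor(SketchExtractor()), calling fd.download() with one matching and one non-matching URL records one processed entry and one error, and returns 1.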
2166+ def post_process(self, filename, ie_info):
2167+ """Run the postprocessing chain on the given file."""
2168+ info = dict(ie_info)
2169+ info['filepath'] = filename
2170+ for pp in self._pps:
2171+ info = pp.run(info)
2172+ if info is None:
2173+ break
2174+
2175+ def _download_with_rtmpdump(self, filename, url, player_url):
2176+ self.report_destination(filename)
2177+ tmpfilename = self.temp_name(filename)
2178+
2179+ # Check for rtmpdump first
2180+ try:
2181+ subprocess.call(['rtmpdump', '-h'], stdout=(file(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
2182+ except (OSError, IOError):
2183+ self.trouble(u'ERROR: RTMP download detected but "rtmpdump" could not be run')
2184+ return False
2185+
2186+ # Download using rtmpdump. rtmpdump returns exit code 2 when
2187+ # the connection was interrupted and resuming appears to be
2188+ # possible. This is part of rtmpdump's normal usage, AFAIK.
2189+ basic_args = ['rtmpdump', '-q'] + [[], ['-W', player_url]][player_url is not None] + ['-r', url, '-o', tmpfilename]
2190+ args = basic_args + [[], ['-e', '-k', '1']][self.params.get('continuedl', False)]
2191+ if self.params.get('verbose', False):
2192+ try:
2193+ import pipes
2194+ shell_quote = lambda args: ' '.join(map(pipes.quote, args))
2195+ except ImportError:
2196+ shell_quote = repr
2197+ self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(args))
2198+ retval = subprocess.call(args)
2199+ while retval == 2 or retval == 1:
2200+ prevsize = os.path.getsize(_encodeFilename(tmpfilename))
2201+ self.to_screen(u'\r[rtmpdump] %s bytes' % prevsize, skip_eol=True)
2202+ time.sleep(5.0) # This seems to be needed
2203+ retval = subprocess.call(basic_args + ['-e'] + [[], ['-k', '1']][retval == 1])
2204+ cursize = os.path.getsize(_encodeFilename(tmpfilename))
2205+ if prevsize == cursize and retval == 1:
2206+ break
2207+ # Some rtmp streams seem to abort after ~99.8%. Don't complain about those
2208+ if prevsize == cursize and retval == 2 and cursize > 1024:
2209+ self.to_screen(u'\r[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
2210+ retval = 0
2211+ break
2212+ if retval == 0:
2213+ self.to_screen(u'\r[rtmpdump] %s bytes' % os.path.getsize(_encodeFilename(tmpfilename)))
2214+ self.try_rename(tmpfilename, filename)
2215+ return True
2216+ else:
2217+ self.trouble(u'\nERROR: rtmpdump exited with code %d' % retval)
2218+ return False
2219+
2220+ def _do_download(self, filename, info_dict):
2221+ url = info_dict['url']
2222+ player_url = info_dict.get('player_url', None)
2223+
2224+ # Check file already present
2225+ if self.params.get('continuedl', False) and os.path.isfile(_encodeFilename(filename)) and not self.params.get('nopart', False):
2226+ self.report_file_already_downloaded(filename)
2227+ return True
2228+
2229+ # Attempt to download using rtmpdump
2230+ if url.startswith('rtmp'):
2231+ return self._download_with_rtmpdump(filename, url, player_url)
2232+
2233+ tmpfilename = self.temp_name(filename)
2234+ stream = None
2235+
2236+ # Do not include the Accept-Encoding header
2237+ headers = {'Youtubedl-no-compression': 'True'}
2238+ basic_request = urllib2.Request(url, None, headers)
2239+ request = urllib2.Request(url, None, headers)
2240+
2241+ # Establish possible resume length
2242+ if os.path.isfile(_encodeFilename(tmpfilename)):
2243+ resume_len = os.path.getsize(_encodeFilename(tmpfilename))
2244+ else:
2245+ resume_len = 0
2246+
2247+ open_mode = 'wb'
2248+ if resume_len != 0:
2249+ if self.params.get('continuedl', False):
2250+ self.report_resuming_byte(resume_len)
2251+ request.add_header('Range','bytes=%d-' % resume_len)
2252+ open_mode = 'ab'
2253+ else:
2254+ resume_len = 0
2255+
2256+ count = 0
2257+ retries = self.params.get('retries', 0)
2258+ while count <= retries:
2259+ # Establish connection
2260+ try:
2261+ if count == 0 and 'urlhandle' in info_dict:
2262+ data = info_dict['urlhandle']
2263+ data = urllib2.urlopen(request)
2264+ break
2265+ except (urllib2.HTTPError, ), err:
2266+ if (err.code < 500 or err.code >= 600) and err.code != 416:
2267+ # Unexpected HTTP error
2268+ raise
2269+ elif err.code == 416:
2270+ # Unable to resume (requested range not satisfiable)
2271+ try:
2272+ # Open the connection again without the range header
2273+ data = urllib2.urlopen(basic_request)
2274+ content_length = data.info()['Content-Length']
2275+ except (urllib2.HTTPError, ), err:
2276+ if err.code < 500 or err.code >= 600:
2277+ raise
2278+ else:
2279+ # Examine the reported length
2280+ if (content_length is not None and
2281+ (resume_len - 100 < long(content_length) < resume_len + 100)):
2282+ # The file had already been fully downloaded.
2283+ # Explanation to the above condition: in issue #175 it was revealed that
2284+ # YouTube sometimes adds or removes a few bytes from the end of the file,
2285+ # changing the file size slightly and causing problems for some users. So
2286+ # I decided to implement a suggested change and consider the file
2287+ # completely downloaded if the file size differs by less than 100 bytes
2288+ # from the one on the hard drive.
2289+ self.report_file_already_downloaded(filename)
2290+ self.try_rename(tmpfilename, filename)
2291+ return True
2292+ else:
2293+ # The length does not match, we start the download over
2294+ self.report_unable_to_resume()
2295+ open_mode = 'wb'
2296+ break
2297+ # Retry
2298+ count += 1
2299+ if count <= retries:
2300+ self.report_retry(count, retries)
2301+
2302+ if count > retries:
2303+ self.trouble(u'ERROR: giving up after %s retries' % retries)
2304+ return False
2305+
2306+ data_len = data.info().get('Content-length', None)
2307+ if data_len is not None:
2308+ data_len = long(data_len) + resume_len
2309+ data_len_str = self.format_bytes(data_len)
2310+ byte_counter = 0 + resume_len
2311+ block_size = 1024
2312+ start = time.time()
2313+ while True:
2314+ # Download and write
2315+ before = time.time()
2316+ data_block = data.read(block_size)
2317+ after = time.time()
2318+ if len(data_block) == 0:
2319+ break
2320+ byte_counter += len(data_block)
2321+
2322+ # Open file just in time
2323+ if stream is None:
2324+ try:
2325+ (stream, tmpfilename) = sanitize_open(tmpfilename, open_mode)
2326+ assert stream is not None
2327+ filename = self.undo_temp_name(tmpfilename)
2328+ self.report_destination(filename)
2329+ except (OSError, IOError), err:
2330+ self.trouble(u'ERROR: unable to open for writing: %s' % str(err))
2331+ return False
2332+ try:
2333+ stream.write(data_block)
2334+ except (IOError, OSError), err:
2335+ self.trouble(u'\nERROR: unable to write data: %s' % str(err))
2336+ return False
2337+ block_size = self.best_block_size(after - before, len(data_block))
2338+
2339+ # Progress message
2340+ speed_str = self.calc_speed(start, time.time(), byte_counter - resume_len)
2341+ if data_len is None:
2342+ self.report_progress('Unknown %', data_len_str, speed_str, 'Unknown ETA')
2343+ else:
2344+ percent_str = self.calc_percent(byte_counter, data_len)
2345+ eta_str = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
2346+ self.report_progress(percent_str, data_len_str, speed_str, eta_str)
2347+
2348+ # Apply rate limit
2349+ self.slow_down(start, byte_counter - resume_len)
2350+
2351+ if stream is None:
2352+ self.trouble(u'\nERROR: Did not get any data blocks')
2353+ return False
2354+ stream.close()
2355+ self.report_finish()
2356+ if data_len is not None and byte_counter != data_len:
2357+ raise ContentTooShortError(byte_counter, long(data_len))
2358+ self.try_rename(tmpfilename, filename)
2359+
2360+ # Update file modification time
2361+ if self.params.get('updatetime', True):
2362+ info_dict['filetime'] = self.try_utime(filename, data.info().get('last-modified', None))
2363+
2364+ return True
2365+
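Reviewer sketch of the two resume decisions made in _do_download(): asking the server for the remainder with a Range header, and, after an HTTP 416 reply, treating the partial file as complete when its size is within 100 bytes of the reported Content-Length. Python 3 (urllib.request instead of urllib2), illustration only; both function names and the tolerance parameter are hypothetical, and no network request is made.

```python
import urllib.request


def build_resume_request(url, resume_len):
    """Build a request that asks only for the bytes not yet on disk."""
    request = urllib.request.Request(url)
    if resume_len > 0:
        request.add_header('Range', 'bytes=%d-' % resume_len)
    return request


def already_complete(resume_len, content_length, tolerance=100):
    """True if the partial file is close enough to the full length."""
    if content_length is None:
        return False
    return resume_len - tolerance < int(content_length) < resume_len + tolerance
```

For a 2048-byte partial file the request carries 'Range: bytes=2048-'. If the server then reports a Content-Length of 2050, already_complete(2048, '2050') is True and the file would simply be renamed rather than downloaded again.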
2366+
2367+class InfoExtractor(object):
2368+ """Information Extractor class.
2369+
2370+ Information extractors are the classes that, given a URL, extract
2371+ information from the video (or videos) the URL refers to. This
2372+ information includes the real video URL, the video title and simplified
2373+ title, author and others. The information is stored in a dictionary
2374+ which is then passed to the FileDownloader. The FileDownloader
2375+ processes this information possibly downloading the video to the file
2376+ system, among other possible outcomes. The dictionaries must include
2377+ the following fields:
2378+
2379+ id: Video identifier.
2380+ url: Final video URL.
2381+ uploader: Nickname of the video uploader.
2382+ title: Literal title.
2383+ stitle: Simplified title.
2384+ ext: Video filename extension.
2385+ format: Video format.
2386+ player_url: SWF Player URL (may be None).
2387+
2388+ The following fields are optional. Their primary purpose is to allow
2389+ youtube-dl to serve as the backend for a video search function, such
2390+ as the one in youtube2mp3. They are only used when their respective
2391+ forced printing functions are called:
2392+
2393+ thumbnail: Full URL to a video thumbnail image.
2394+ description: One-line video description.
2395+
2396+ Subclasses of this one should re-define the _real_initialize() and
2397+ _real_extract() methods and define a _VALID_URL regexp.
2398+ They should probably also be added to the list of extractors.
2399+ """
2400+
2401+ _ready = False
2402+ _downloader = None
2403+
2404+ def __init__(self, downloader=None):
2405+ """Constructor. Receives an optional downloader."""
2406+ self._ready = False
2407+ self.set_downloader(downloader)
2408+
2409+ def suitable(self, url):
2410+ """Receives a URL and returns True if suitable for this IE."""
2411+ return re.match(self._VALID_URL, url) is not None
2412+
2413+ def initialize(self):
2414+ """Initializes an instance (authentication, etc)."""
2415+ if not self._ready:
2416+ self._real_initialize()
2417+ self._ready = True
2418+
2419+ def extract(self, url):
2420+ """Extracts URL information and returns it in list of dicts."""
2421+ self.initialize()
2422+ return self._real_extract(url)
2423+
2424+ def set_downloader(self, downloader):
2425+ """Sets the downloader for this IE."""
2426+ self._downloader = downloader
2427+
2428+ def _real_initialize(self):
2429+ """Real initialization process. Redefine in subclasses."""
2430+ pass
2431+
2432+ def _real_extract(self, url):
2433+ """Real extraction process. Redefine in subclasses."""
2434+ pass
2435+
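Note on the InfoExtractor contract above: the docstring asks subclasses to define `_VALID_URL` and to override `_real_extract()` (and `_real_initialize()` if needed). Here is a minimal sketch of such a subclass, assuming Python 3 for a standalone run. The base class is a trimmed copy of the one in the diff, while `ExampleIE`, `RecordingDownloader` and the `example.invalid` URLs are hypothetical and exist only for illustration.

```python
import re


class InfoExtractor(object):
    """Trimmed copy of the base class from the diff, for a standalone run."""

    _VALID_URL = None

    def __init__(self, downloader=None):
        self._ready = False
        self._downloader = downloader

    def suitable(self, url):
        return re.match(self._VALID_URL, url) is not None

    def initialize(self):
        if not self._ready:
            self._real_initialize()
            self._ready = True

    def extract(self, url):
        self.initialize()
        return self._real_extract(url)

    def _real_initialize(self):
        pass

    def _real_extract(self, url):
        pass


class RecordingDownloader(object):
    """Hypothetical downloader stub that only records what it is given."""

    def __init__(self):
        self.infos = []

    def process_info(self, info_dict):
        self.infos.append(info_dict)


class ExampleIE(InfoExtractor):
    """Hypothetical extractor for a made-up site, example.invalid."""

    _VALID_URL = r'(?:http://)?(?:www\.)?example\.invalid/video/([0-9]+)'
    IE_NAME = u'example'

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        if mobj is None:
            return
        video_id = mobj.group(1)
        # The required fields listed in the InfoExtractor docstring.
        self._downloader.process_info({
            'id': video_id,
            'url': u'http://example.invalid/media/%s.flv' % video_id,
            'uploader': u'NA',
            'title': u'Example video',
            'stitle': u'Example_video',
            'ext': u'flv',
            'format': u'NA',
            'player_url': None,
        })


downloader = RecordingDownloader()
ie = ExampleIE(downloader)
if ie.suitable('http://example.invalid/video/42'):
    ie.extract('http://example.invalid/video/42')
```

The stub downloader stands in for FileDownloader, which in the diff is the object that receives the dictionary through `process_info()`.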
2436+
2437+class YoutubeIE(InfoExtractor):
2438+ """Information extractor for youtube.com."""
2439+
2440+ _VALID_URL = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)(?!view_play_list|my_playlists|artist|playlist)(?:(?:(?:v|embed|e)/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))?)?([0-9A-Za-z_-]+)(?(1).+)?$'
2441+ _LANG_URL = r'http://www.youtube.com/?hl=en&persist_hl=1&gl=US&persist_gl=1&opt_out_ackd=1'
2442+ _LOGIN_URL = 'https://www.youtube.com/signup?next=/&gl=US&hl=en'
2443+ _AGE_URL = 'http://www.youtube.com/verify_age?next_url=/&gl=US&hl=en'
2444+ _NETRC_MACHINE = 'youtube'
2445+ # Listed in order of quality
2446+ _available_formats = ['38', '37', '22', '45', '35', '44', '34', '18', '43', '6', '5', '17', '13']
2447+ _available_formats_prefer_free = ['38', '37', '45', '22', '44', '35', '43', '34', '18', '6', '5', '17', '13']
2448+ _video_extensions = {
2449+ '13': '3gp',
2450+ '17': 'mp4',
2451+ '18': 'mp4',
2452+ '22': 'mp4',
2453+ '37': 'mp4',
2454+ '38': 'video', # You actually don't know if this will be MOV, AVI or whatever
2455+ '43': 'webm',
2456+ '44': 'webm',
2457+ '45': 'webm',
2458+ }
2459+ _video_dimensions = {
2460+ '5': '240x400',
2461+ '6': '???',
2462+ '13': '???',
2463+ '17': '144x176',
2464+ '18': '360x640',
2465+ '22': '720x1280',
2466+ '34': '360x640',
2467+ '35': '480x854',
2468+ '37': '1080x1920',
2469+ '38': '3072x4096',
2470+ '43': '360x640',
2471+ '44': '480x854',
2472+ '45': '720x1280',
2473+ }
2474+ IE_NAME = u'youtube'
2475+
2476+ def report_lang(self):
2477+ """Report attempt to set language."""
2478+ self._downloader.to_screen(u'[youtube] Setting language')
2479+
2480+ def report_login(self):
2481+ """Report attempt to log in."""
2482+ self._downloader.to_screen(u'[youtube] Logging in')
2483+
2484+ def report_age_confirmation(self):
2485+ """Report attempt to confirm age."""
2486+ self._downloader.to_screen(u'[youtube] Confirming age')
2487+
2488+ def report_video_webpage_download(self, video_id):
2489+ """Report attempt to download video webpage."""
2490+ self._downloader.to_screen(u'[youtube] %s: Downloading video webpage' % video_id)
2491+
2492+ def report_video_info_webpage_download(self, video_id):
2493+ """Report attempt to download video info webpage."""
2494+ self._downloader.to_screen(u'[youtube] %s: Downloading video info webpage' % video_id)
2495+
2496+ def report_information_extraction(self, video_id):
2497+ """Report attempt to extract video information."""
2498+ self._downloader.to_screen(u'[youtube] %s: Extracting video information' % video_id)
2499+
2500+ def report_unavailable_format(self, video_id, format):
2501+ """Report that the requested format is not available."""
2502+ self._downloader.to_screen(u'[youtube] %s: Format %s not available' % (video_id, format))
2503+
2504+ def report_rtmp_download(self):
2505+ """Indicate the download will use the RTMP protocol."""
2506+ self._downloader.to_screen(u'[youtube] RTMP download detected')
2507+
2508+ def _print_formats(self, formats):
2509+ print 'Available formats:'
2510+ for x in formats:
2511+ print '%s\t:\t%s\t[%s]' %(x, self._video_extensions.get(x, 'flv'), self._video_dimensions.get(x, '???'))
2512+
2513+ def _real_initialize(self):
2514+ if self._downloader is None:
2515+ return
2516+
2517+ username = None
2518+ password = None
2519+ downloader_params = self._downloader.params
2520+
2521+ # Attempt to use provided username and password or .netrc data
2522+ if downloader_params.get('username', None) is not None:
2523+ username = downloader_params['username']
2524+ password = downloader_params['password']
2525+ elif downloader_params.get('usenetrc', False):
2526+ try:
2527+ info = netrc.netrc().authenticators(self._NETRC_MACHINE)
2528+ if info is not None:
2529+ username = info[0]
2530+ password = info[2]
2531+ else:
2532+ raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
2533+ except (IOError, netrc.NetrcParseError), err:
2534+ self._downloader.to_stderr(u'WARNING: parsing .netrc: %s' % str(err))
2535+ return
2536+
2537+ # Set language
2538+ request = urllib2.Request(self._LANG_URL)
2539+ try:
2540+ self.report_lang()
2541+ urllib2.urlopen(request).read()
2542+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2543+ self._downloader.to_stderr(u'WARNING: unable to set language: %s' % str(err))
2544+ return
2545+
2546+ # No authentication to be performed
2547+ if username is None:
2548+ return
2549+
2550+ # Log in
2551+ login_form = {
2552+ 'current_form': 'loginForm',
2553+ 'next': '/',
2554+ 'action_login': 'Log In',
2555+ 'username': username,
2556+ 'password': password,
2557+ }
2558+ request = urllib2.Request(self._LOGIN_URL, urllib.urlencode(login_form))
2559+ try:
2560+ self.report_login()
2561+ login_results = urllib2.urlopen(request).read()
2562+ if re.search(r'(?i)<form[^>]* name="loginForm"', login_results) is not None:
2563+ self._downloader.to_stderr(u'WARNING: unable to log in: bad username or password')
2564+ return
2565+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2566+ self._downloader.to_stderr(u'WARNING: unable to log in: %s' % str(err))
2567+ return
2568+
2569+ # Confirm age
2570+ age_form = {
2571+ 'next_url': '/',
2572+ 'action_confirm': 'Confirm',
2573+ }
2574+ request = urllib2.Request(self._AGE_URL, urllib.urlencode(age_form))
2575+ try:
2576+ self.report_age_confirmation()
2577+ age_results = urllib2.urlopen(request).read()
2578+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2579+ self._downloader.trouble(u'ERROR: unable to confirm age: %s' % str(err))
2580+ return
2581+
2582+ def _real_extract(self, url):
2583+ # Extract video id from URL
2584+ mobj = re.match(self._VALID_URL, url)
2585+ if mobj is None:
2586+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
2587+ return
2588+ video_id = mobj.group(2)
2589+
2590+ # Get video webpage
2591+ self.report_video_webpage_download(video_id)
2592+ request = urllib2.Request('http://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1' % video_id)
2593+ try:
2594+ video_webpage = urllib2.urlopen(request).read()
2595+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2596+ self._downloader.trouble(u'ERROR: unable to download video webpage: %s' % str(err))
2597+ return
2598+
2599+ # Attempt to extract SWF player URL
2600+ mobj = re.search(r'swfConfig.*?"(http:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
2601+ if mobj is not None:
2602+ player_url = re.sub(r'\\(.)', r'\1', mobj.group(1))
2603+ else:
2604+ player_url = None
2605+
2606+ # Get video info
2607+ self.report_video_info_webpage_download(video_id)
2608+ for el_type in ['&el=embedded', '&el=detailpage', '&el=vevo', '']:
2609+ video_info_url = ('http://www.youtube.com/get_video_info?&video_id=%s%s&ps=default&eurl=&gl=US&hl=en'
2610+ % (video_id, el_type))
2611+ request = urllib2.Request(video_info_url)
2612+ try:
2613+ video_info_webpage = urllib2.urlopen(request).read()
2614+ video_info = parse_qs(video_info_webpage)
2615+ if 'token' in video_info:
2616+ break
2617+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2618+ self._downloader.trouble(u'ERROR: unable to download video info webpage: %s' % str(err))
2619+ return
2620+ if 'token' not in video_info:
2621+ if 'reason' in video_info:
2622+ self._downloader.trouble(u'ERROR: YouTube said: %s' % video_info['reason'][0].decode('utf-8'))
2623+ else:
2624+ self._downloader.trouble(u'ERROR: "token" parameter not in video info for unknown reason')
2625+ return
2626+
2627+ # Start extracting information
2628+ self.report_information_extraction(video_id)
2629+
2630+ # uploader
2631+ if 'author' not in video_info:
2632+ self._downloader.trouble(u'ERROR: unable to extract uploader nickname')
2633+ return
2634+ video_uploader = urllib.unquote_plus(video_info['author'][0])
2635+
2636+ # title
2637+ if 'title' not in video_info:
2638+ self._downloader.trouble(u'ERROR: unable to extract video title')
2639+ return
2640+ video_title = urllib.unquote_plus(video_info['title'][0])
2641+ video_title = video_title.decode('utf-8')
2642+ video_title = sanitize_title(video_title)
2643+
2644+ # simplified title
2645+ simple_title = _simplify_title(video_title)
2646+
2647+ # thumbnail image
2648+ if 'thumbnail_url' not in video_info:
2649+ self._downloader.trouble(u'WARNING: unable to extract video thumbnail')
2650+ video_thumbnail = ''
2651+ else: # don't panic if we can't find it
2652+ video_thumbnail = urllib.unquote_plus(video_info['thumbnail_url'][0])
2653+
2654+ # upload date
2655+ upload_date = u'NA'
2656+ mobj = re.search(r'id="eow-date.*?>(.*?)</span>', video_webpage, re.DOTALL)
2657+ if mobj is not None:
2658+ upload_date = ' '.join(re.sub(r'[/,-]', r' ', mobj.group(1)).split())
2659+ format_expressions = ['%d %B %Y', '%B %d %Y', '%b %d %Y']
2660+ for expression in format_expressions:
2661+ try:
2662+ upload_date = datetime.datetime.strptime(upload_date, expression).strftime('%Y%m%d')
2663+ except ValueError:
2664+ pass
2665+
2666+ # description
2667+ try:
2668+ lxml.etree
2669+ except NameError:
2670+ video_description = u'No description available.'
2671+ mobj = re.search(r'<meta name="description" content="(.*?)">', video_webpage)
2672+ if mobj is not None:
2673+ video_description = mobj.group(1).decode('utf-8')
2674+ else:
2675+ html_parser = lxml.etree.HTMLParser(encoding='utf-8')
2676+ vwebpage_doc = lxml.etree.parse(StringIO.StringIO(video_webpage), html_parser)
2677+ video_description = u''.join(vwebpage_doc.xpath('id("eow-description")//text()'))
2678+ # TODO use another parser
2679+
2680+ # token
2681+ video_token = urllib.unquote_plus(video_info['token'][0])
2682+
2683+ # Decide which formats to download
2684+ req_format = self._downloader.params.get('format', None)
2685+
2686+ if 'conn' in video_info and video_info['conn'][0].startswith('rtmp'):
2687+ self.report_rtmp_download()
2688+ video_url_list = [(None, video_info['conn'][0])]
2689+ elif 'url_encoded_fmt_stream_map' in video_info and len(video_info['url_encoded_fmt_stream_map']) >= 1:
2690+ url_data_strs = video_info['url_encoded_fmt_stream_map'][0].split(',')
2691+ url_data = [parse_qs(uds) for uds in url_data_strs]
2692+ url_data = filter(lambda ud: 'itag' in ud and 'url' in ud, url_data)
2693+ url_map = dict((ud['itag'][0], ud['url'][0]) for ud in url_data)
2694+
2695+ format_limit = self._downloader.params.get('format_limit', None)
2696+ available_formats = self._available_formats_prefer_free if self._downloader.params.get('prefer_free_formats', False) else self._available_formats
2697+ if format_limit is not None and format_limit in available_formats:
2698+ format_list = available_formats[available_formats.index(format_limit):]
2699+ else:
2700+ format_list = available_formats
2701+ existing_formats = [x for x in format_list if x in url_map]
2702+ if len(existing_formats) == 0:
2703+ self._downloader.trouble(u'ERROR: no known formats available for video')
2704+ return
2705+ if self._downloader.params.get('listformats', None):
2706+ self._print_formats(existing_formats)
2707+ return
2708+ if req_format is None or req_format == 'best':
2709+ video_url_list = [(existing_formats[0], url_map[existing_formats[0]])] # Best quality
2710+ elif req_format == 'worst':
2711+ video_url_list = [(existing_formats[-1], url_map[existing_formats[-1]])] # worst quality
2712+ elif req_format in ('-1', 'all'):
2713+ video_url_list = [(f, url_map[f]) for f in existing_formats] # All formats
2714+ else:
2715+ # Specific formats. We pick the first in a slash-delimited sequence.
2716+ # For example, if '1/2/3/4' is requested and '2' and '4' are available, we pick '2'.
2717+ req_formats = req_format.split('/')
2718+ video_url_list = None
2719+ for rf in req_formats:
2720+ if rf in url_map:
2721+ video_url_list = [(rf, url_map[rf])]
2722+ break
2723+ if video_url_list is None:
2724+ self._downloader.trouble(u'ERROR: requested format not available')
2725+ return
2726+ else:
2727+ self._downloader.trouble(u'ERROR: no conn or url_encoded_fmt_stream_map information found in video info')
2728+ return
2729+
2730+ for format_param, video_real_url in video_url_list:
2731+ # At this point we have a new video
2732+ self._downloader.increment_downloads()
2733+
2734+ # Extension
2735+ video_extension = self._video_extensions.get(format_param, 'flv')
2736+
2737+ try:
2738+ # Process video information
2739+ self._downloader.process_info({
2740+ 'id': video_id.decode('utf-8'),
2741+ 'url': video_real_url.decode('utf-8'),
2742+ 'uploader': video_uploader.decode('utf-8'),
2743+ 'upload_date': upload_date,
2744+ 'title': video_title,
2745+ 'stitle': simple_title,
2746+ 'ext': video_extension.decode('utf-8'),
2747+ 'format': (format_param is None and u'NA' or format_param.decode('utf-8')),
2748+ 'thumbnail': video_thumbnail.decode('utf-8'),
2749+ 'description': video_description,
2750+ 'player_url': player_url,
2751+ })
2752+ except UnavailableVideoError, err:
2753+ self._downloader.trouble(u'\nERROR: unable to download video')
2754+
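Note on the format selection in YoutubeIE._real_extract: the rules ('best', 'worst', 'all' or '-1', or a slash-delimited preference list) can be isolated into a small pure function. This is a sketch under the assumption that `existing_formats` is already ordered best-first and filtered against `url_map`, as in the diff. `select_formats` and the `example.invalid` URLs are hypothetical names and data, not part of the branch.

```python
def select_formats(req_format, existing_formats, url_map):
    """Return a list of (format, url) pairs, or None if nothing matches.

    existing_formats must be ordered best-first and contain only keys
    that are present in url_map.
    """
    if req_format is None or req_format == 'best':
        best = existing_formats[0]
        return [(best, url_map[best])]
    if req_format == 'worst':
        worst = existing_formats[-1]
        return [(worst, url_map[worst])]
    if req_format in ('-1', 'all'):
        return [(f, url_map[f]) for f in existing_formats]
    # Slash-delimited preference list: the first available entry wins.
    for rf in req_format.split('/'):
        if rf in url_map:
            return [(rf, url_map[rf])]
    return None


# Hypothetical data: only itags '22', '18' and '5' are on offer.
url_map = {
    '22': 'http://example.invalid/22',
    '18': 'http://example.invalid/18',
    '5': 'http://example.invalid/5',
}
quality_order = ['38', '37', '22', '45', '35', '44', '34', '18', '43', '6', '5', '17', '13']
existing = [f for f in quality_order if f in url_map]

first_choice = select_formats('37/18/5', existing, url_map)
```

With this data, a request of '37/18/5' skips the unavailable '37' and settles on '18', which mirrors the '1/2/3/4' example in the code comment.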
2755+
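Note on the upload-date handling in YoutubeIE._real_extract: the page text is stripped of '/', ',' and '-' separators and then tried against three `strptime` patterns in turn. Here is a minimal sketch of that idea, assuming Python 3 and an English (C) locale for the month names. `normalize_upload_date` is a hypothetical helper. Unlike the diff, it returns u'NA' when no pattern matches, instead of keeping the cleaned-up text.

```python
import datetime
import re


def normalize_upload_date(raw):
    """Return raw as YYYYMMDD, or u'NA' if no known pattern matches."""
    # Turn separators into spaces and collapse runs of whitespace.
    cleaned = ' '.join(re.sub(r'[/,-]', ' ', raw).split())
    for expression in ('%d %B %Y', '%B %d %Y', '%b %d %Y'):
        try:
            parsed = datetime.datetime.strptime(cleaned, expression)
        except ValueError:
            # This pattern does not fit; try the next one.
            continue
        return parsed.strftime('%Y%m%d')
    return u'NA'


us_style = normalize_upload_date('May 21, 2012')
eu_style = normalize_upload_date('21 May 2012')
```

Catching only `ValueError` keeps unrelated errors visible, since that is the exception `strptime` raises for a non-matching string.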
2756+class MetacafeIE(InfoExtractor):
2757+ """Information Extractor for metacafe.com."""
2758+
2759+ _VALID_URL = r'(?:http://)?(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
2760+ _DISCLAIMER = 'http://www.metacafe.com/family_filter/'
2761+ _FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user'
2762+ _youtube_ie = None
2763+ IE_NAME = u'metacafe'
2764+
2765+ def __init__(self, youtube_ie, downloader=None):
2766+ InfoExtractor.__init__(self, downloader)
2767+ self._youtube_ie = youtube_ie
2768+
2769+ def report_disclaimer(self):
2770+ """Report disclaimer retrieval."""
2771+ self._downloader.to_screen(u'[metacafe] Retrieving disclaimer')
2772+
2773+ def report_age_confirmation(self):
2774+ """Report attempt to confirm age."""
2775+ self._downloader.to_screen(u'[metacafe] Confirming age')
2776+
2777+ def report_download_webpage(self, video_id):
2778+ """Report webpage download."""
2779+ self._downloader.to_screen(u'[metacafe] %s: Downloading webpage' % video_id)
2780+
2781+ def report_extraction(self, video_id):
2782+ """Report information extraction."""
2783+ self._downloader.to_screen(u'[metacafe] %s: Extracting information' % video_id)
2784+
2785+ def _real_initialize(self):
2786+ # Retrieve disclaimer
2787+ request = urllib2.Request(self._DISCLAIMER)
2788+ try:
2789+ self.report_disclaimer()
2790+ disclaimer = urllib2.urlopen(request).read()
2791+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2792+ self._downloader.trouble(u'ERROR: unable to retrieve disclaimer: %s' % str(err))
2793+ return
2794+
2795+ # Confirm age
2796+ disclaimer_form = {
2797+ 'filters': '0',
2798+ 'submit': "Continue - I'm over 18",
2799+ }
2800+ request = urllib2.Request(self._FILTER_POST, urllib.urlencode(disclaimer_form))
2801+ try:
2802+ self.report_age_confirmation()
2803+ disclaimer = urllib2.urlopen(request).read()
2804+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2805+ self._downloader.trouble(u'ERROR: unable to confirm age: %s' % str(err))
2806+ return
2807+
2808+ def _real_extract(self, url):
2809+ # Extract id and simplified title from URL
2810+ mobj = re.match(self._VALID_URL, url)
2811+ if mobj is None:
2812+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
2813+ return
2814+
2815+ video_id = mobj.group(1)
2816+
2817+ # Check if video comes from YouTube
2818+ mobj2 = re.match(r'^yt-(.*)$', video_id)
2819+ if mobj2 is not None:
2820+ self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % mobj2.group(1))
2821+ return
2822+
2823+ # At this point we have a new video
2824+ self._downloader.increment_downloads()
2825+
2826+ simple_title = mobj.group(2).decode('utf-8')
2827+
2828+ # Retrieve video webpage to extract further information
2829+ request = urllib2.Request('http://www.metacafe.com/watch/%s/' % video_id)
2830+ try:
2831+ self.report_download_webpage(video_id)
2832+ webpage = urllib2.urlopen(request).read()
2833+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2834+ self._downloader.trouble(u'ERROR: unable to retrieve video webpage: %s' % str(err))
2835+ return
2836+
2837+ # Extract URL, uploader and title from webpage
2838+ self.report_extraction(video_id)
2839+ mobj = re.search(r'(?m)&mediaURL=([^&]+)', webpage)
2840+ if mobj is not None:
2841+ mediaURL = urllib.unquote(mobj.group(1))
2842+ video_extension = mediaURL[-3:]
2843+
2844+ # Extract gdaKey if available
2845+ mobj = re.search(r'(?m)&gdaKey=(.*?)&', webpage)
2846+ if mobj is None:
2847+ video_url = mediaURL
2848+ else:
2849+ gdaKey = mobj.group(1)
2850+ video_url = '%s?__gda__=%s' % (mediaURL, gdaKey)
2851+ else:
2852+ mobj = re.search(r' name="flashvars" value="(.*?)"', webpage)
2853+ if mobj is None:
2854+ self._downloader.trouble(u'ERROR: unable to extract media URL')
2855+ return
2856+ vardict = parse_qs(mobj.group(1))
2857+ if 'mediaData' not in vardict:
2858+ self._downloader.trouble(u'ERROR: unable to extract media URL')
2859+ return
2860+ mobj = re.search(r'"mediaURL":"(http.*?)","key":"(.*?)"', vardict['mediaData'][0])
2861+ if mobj is None:
2862+ self._downloader.trouble(u'ERROR: unable to extract media URL')
2863+ return
2864+ mediaURL = mobj.group(1).replace('\\/', '/')
2865+ video_extension = mediaURL[-3:]
2866+ video_url = '%s?__gda__=%s' % (mediaURL, mobj.group(2))
2867+
2868+ mobj = re.search(r'(?im)<title>(.*) - Video</title>', webpage)
2869+ if mobj is None:
2870+ self._downloader.trouble(u'ERROR: unable to extract title')
2871+ return
2872+ video_title = mobj.group(1).decode('utf-8')
2873+ video_title = sanitize_title(video_title)
2874+
2875+ mobj = re.search(r'(?ms)By:\s*<a .*?>(.+?)<', webpage)
2876+ if mobj is None:
2877+ self._downloader.trouble(u'ERROR: unable to extract uploader nickname')
2878+ return
2879+ video_uploader = mobj.group(1)
2880+
2881+ try:
2882+ # Process video information
2883+ self._downloader.process_info({
2884+ 'id': video_id.decode('utf-8'),
2885+ 'url': video_url.decode('utf-8'),
2886+ 'uploader': video_uploader.decode('utf-8'),
2887+ 'upload_date': u'NA',
2888+ 'title': video_title,
2889+ 'stitle': simple_title,
2890+ 'ext': video_extension.decode('utf-8'),
2891+ 'format': u'NA',
2892+ 'player_url': None,
2893+ })
2894+ except UnavailableVideoError:
2895+ self._downloader.trouble(u'\nERROR: unable to download video')
2896+
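Note on the Metacafe media URL: in the first branch of MetacafeIE._real_extract, the URL is taken from a `&mediaURL=` parameter and, when a `&gdaKey=` parameter is present, the key is appended as `__gda__`. The sketch below isolates just that branch, assuming Python 3 (`urllib.parse.unquote` instead of the Python 2 `urllib.unquote` used in the diff). `metacafe_media_url` is a hypothetical helper and the page fragment is made up. The `flashvars` fallback from the diff is not covered.

```python
import re
from urllib.parse import unquote  # Python 3; the diff uses urllib.unquote


def metacafe_media_url(webpage):
    """Return (video_url, extension), or None if no mediaURL is present."""
    mobj = re.search(r'(?m)&mediaURL=([^&]+)', webpage)
    if mobj is None:
        return None
    media_url = unquote(mobj.group(1))
    # As in the diff, the extension is simply the last three characters.
    extension = media_url[-3:]
    mobj = re.search(r'(?m)&gdaKey=(.*?)&', webpage)
    if mobj is None:
        return media_url, extension
    return '%s?__gda__=%s' % (media_url, mobj.group(1)), extension


# Made-up page fragment; real Metacafe markup is not reproduced here.
page = 'x=1&mediaURL=http%3A%2F%2Fexample.invalid%2Fclip.flv&gdaKey=abc123&y=2'
result = metacafe_media_url(page)
```

Taking the last three characters as the extension is a shortcut that only holds for three-letter extensions, which is worth keeping in mind if the site starts serving other file types.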
2897+
2898+class DailymotionIE(InfoExtractor):
2899+ """Information Extractor for Dailymotion"""
2900+
2901+ _VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/video/([^_/]+)_([^/]+)'
2902+ IE_NAME = u'dailymotion'
2903+
2904+ def __init__(self, downloader=None):
2905+ InfoExtractor.__init__(self, downloader)
2906+
2907+ def report_download_webpage(self, video_id):
2908+ """Report webpage download."""
2909+ self._downloader.to_screen(u'[dailymotion] %s: Downloading webpage' % video_id)
2910+
2911+ def report_extraction(self, video_id):
2912+ """Report information extraction."""
2913+ self._downloader.to_screen(u'[dailymotion] %s: Extracting information' % video_id)
2914+
2915+ def _real_extract(self, url):
2916+ # Extract id and simplified title from URL
2917+ mobj = re.match(self._VALID_URL, url)
2918+ if mobj is None:
2919+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
2920+ return
2921+
2922+ # At this point we have a new video
2923+ self._downloader.increment_downloads()
2924+ video_id = mobj.group(1)
2925+
2926+ video_extension = 'flv'
2927+
2928+ # Retrieve video webpage to extract further information
2929+ request = urllib2.Request(url)
2930+ request.add_header('Cookie', 'family_filter=off')
2931+ try:
2932+ self.report_download_webpage(video_id)
2933+ webpage = urllib2.urlopen(request).read()
2934+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
2935+ self._downloader.trouble(u'ERROR: unable to retrieve video webpage: %s' % str(err))
2936+ return
2937+
2938+ # Extract URL, uploader and title from webpage
2939+ self.report_extraction(video_id)
2940+ mobj = re.search(r'(?i)addVariable\(\"sequence\"\s*,\s*\"([^\"]+?)\"\)', webpage)
2941+ if mobj is None:
2942+ self._downloader.trouble(u'ERROR: unable to extract media URL')
2943+ return
2944+ sequence = urllib.unquote(mobj.group(1))
2945+ mobj = re.search(r',\"sdURL\"\:\"([^\"]+?)\",', sequence)
2946+ if mobj is None:
2947+ self._downloader.trouble(u'ERROR: unable to extract media URL')
2948+ return
2949+ mediaURL = urllib.unquote(mobj.group(1)).replace('\\', '')
2950+
2951+ # If mediaURL is relative, http://www.dailymotion.com/ still needs to be prepended (not done here)
2952+
2953+ video_url = mediaURL
2954+
2955+ mobj = re.search(r'<meta property="og:title" content="(?P<title>[^"]*)" />', webpage)
2956+ if mobj is None:
2957+ self._downloader.trouble(u'ERROR: unable to extract title')
2958+ return
2959+ video_title = _unescapeHTML(mobj.group('title').decode('utf-8'))
2960+ video_title = sanitize_title(video_title)
2961+ simple_title = _simplify_title(video_title)
2962+
2963+ mobj = re.search(r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a></span>', webpage)
2964+ if mobj is None:
2965+ self._downloader.trouble(u'ERROR: unable to extract uploader nickname')
2966+ return
2967+ video_uploader = mobj.group(1)
2968+
2969+ try:
2970+ # Process video information
2971+ self._downloader.process_info({
2972+ 'id': video_id.decode('utf-8'),
2973+ 'url': video_url.decode('utf-8'),
2974+ 'uploader': video_uploader.decode('utf-8'),
2975+ 'upload_date': u'NA',
2976+ 'title': video_title,
2977+ 'stitle': simple_title,
2978+ 'ext': video_extension.decode('utf-8'),
2979+ 'format': u'NA',
2980+ 'player_url': None,
2981+ })
2982+ except UnavailableVideoError:
2983+ self._downloader.trouble(u'\nERROR: unable to download video')
2984+
2985+
2986+class GoogleIE(InfoExtractor):
2987+ """Information extractor for video.google.com."""
2988+
2989+ _VALID_URL = r'(?:http://)?video\.google\.(?:com(?:\.au)?|co\.(?:uk|jp|kr|cr)|ca|de|es|fr|it|nl|pl)/videoplay\?docid=([^\&]+).*'
2990+ IE_NAME = u'video.google'
2991+
2992+ def __init__(self, downloader=None):
2993+ InfoExtractor.__init__(self, downloader)
2994+
2995+ def report_download_webpage(self, video_id):
2996+ """Report webpage download."""
2997+ self._downloader.to_screen(u'[video.google] %s: Downloading webpage' % video_id)
2998+
2999+ def report_extraction(self, video_id):
3000+ """Report information extraction."""
3001+ self._downloader.to_screen(u'[video.google] %s: Extracting information' % video_id)
3002+
3003+ def _real_extract(self, url):
3004+ # Extract id from URL
3005+ mobj = re.match(self._VALID_URL, url)
3006+ if mobj is None:
3007+ self._downloader.trouble(u'ERROR: Invalid URL: %s' % url)
3008+ return
3009+
3010+ # At this point we have a new video
3011+ self._downloader.increment_downloads()
3012+ video_id = mobj.group(1)
3013+
3014+ video_extension = 'mp4'
3015+
3016+ # Retrieve video webpage to extract further information
3017+ request = urllib2.Request('http://video.google.com/videoplay?docid=%s&hl=en&oe=utf-8' % video_id)
3018+ try:
3019+ self.report_download_webpage(video_id)
3020+ webpage = urllib2.urlopen(request).read()
3021+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3022+ self._downloader.trouble(u'ERROR: Unable to retrieve video webpage: %s' % str(err))
3023+ return
3024+
3025+ # Extract URL, uploader, and title from webpage
3026+ self.report_extraction(video_id)
3027+ mobj = re.search(r"download_url:'([^']+)'", webpage)
3028+ if mobj is None:
3029+ video_extension = 'flv'
3030+ mobj = re.search(r"(?i)videoUrl\\x3d(.+?)\\x26", webpage)
3031+ if mobj is None:
3032+ self._downloader.trouble(u'ERROR: unable to extract media URL')
3033+ return
3034+ mediaURL = urllib.unquote(mobj.group(1))
3035+ mediaURL = mediaURL.replace('\\x3d', '\x3d')
3036+ mediaURL = mediaURL.replace('\\x26', '\x26')
3037+
3038+ video_url = mediaURL
3039+
3040+ mobj = re.search(r'<title>(.*)</title>', webpage)
3041+ if mobj is None:
3042+ self._downloader.trouble(u'ERROR: unable to extract title')
3043+ return
3044+ video_title = mobj.group(1).decode('utf-8')
3045+ video_title = sanitize_title(video_title)
3046+ simple_title = _simplify_title(video_title)
3047+
3048+ # Extract video description
3049+ mobj = re.search(r'<span id=short-desc-content>([^<]*)</span>', webpage)
3050+ if mobj is None:
3051+ self._downloader.trouble(u'ERROR: unable to extract video description')
3052+ return
3053+ video_description = mobj.group(1).decode('utf-8')
3054+ if not video_description:
3055+ video_description = 'No description available.'
3056+
3057+ # Extract video thumbnail
3058+ if self._downloader.params.get('forcethumbnail', False):
3059+ request = urllib2.Request('http://video.google.com/videosearch?q=%s+site:video.google.com&hl=en' % abs(int(video_id)))
3060+ try:
3061+ webpage = urllib2.urlopen(request).read()
3062+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3063+ self._downloader.trouble(u'ERROR: Unable to retrieve video webpage: %s' % str(err))
3064+ return
3065+ mobj = re.search(r'<img class=thumbnail-img (?:.* )?src=(http.*)>', webpage)
3066+ if mobj is None:
3067+ self._downloader.trouble(u'ERROR: unable to extract video thumbnail')
3068+ return
3069+ video_thumbnail = mobj.group(1)
3070+ else: # we need something to pass to process_info
3071+ video_thumbnail = ''
3072+
3073+ try:
3074+ # Process video information
3075+ self._downloader.process_info({
3076+ 'id': video_id.decode('utf-8'),
3077+ 'url': video_url.decode('utf-8'),
3078+ 'uploader': u'NA',
3079+ 'upload_date': u'NA',
3080+ 'title': video_title,
3081+ 'stitle': simple_title,
3082+ 'ext': video_extension.decode('utf-8'),
3083+ 'format': u'NA',
3084+ 'player_url': None,
3085+ })
3086+ except UnavailableVideoError:
3087+ self._downloader.trouble(u'\nERROR: unable to download video')
3088+
3089+
3090+class PhotobucketIE(InfoExtractor):
3091+ """Information extractor for photobucket.com."""
3092+
3093+ _VALID_URL = r'(?:http://)?(?:[a-z0-9]+\.)?photobucket\.com/.*[\?\&]current=(.*\.flv)'
3094+ IE_NAME = u'photobucket'
3095+
3096+ def __init__(self, downloader=None):
3097+ InfoExtractor.__init__(self, downloader)
3098+
3099+ def report_download_webpage(self, video_id):
3100+ """Report webpage download."""
3101+ self._downloader.to_screen(u'[photobucket] %s: Downloading webpage' % video_id)
3102+
3103+ def report_extraction(self, video_id):
3104+ """Report information extraction."""
3105+ self._downloader.to_screen(u'[photobucket] %s: Extracting information' % video_id)
3106+
3107+ def _real_extract(self, url):
3108+ # Extract id from URL
3109+ mobj = re.match(self._VALID_URL, url)
3110+ if mobj is None:
3111+ self._downloader.trouble(u'ERROR: Invalid URL: %s' % url)
3112+ return
3113+
3114+ # At this point we have a new video
3115+ self._downloader.increment_downloads()
3116+ video_id = mobj.group(1)
3117+
3118+ video_extension = 'flv'
3119+
3120+ # Retrieve video webpage to extract further information
3121+ request = urllib2.Request(url)
3122+ try:
3123+ self.report_download_webpage(video_id)
3124+ webpage = urllib2.urlopen(request).read()
3125+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3126+ self._downloader.trouble(u'ERROR: Unable to retrieve video webpage: %s' % str(err))
3127+ return
3128+
3129+ # Extract URL, uploader, and title from webpage
3130+ self.report_extraction(video_id)
3131+ mobj = re.search(r'<link rel="video_src" href=".*\?file=([^"]+)" />', webpage)
3132+ if mobj is None:
3133+ self._downloader.trouble(u'ERROR: unable to extract media URL')
3134+ return
3135+ mediaURL = urllib.unquote(mobj.group(1))
3136+
3137+ video_url = mediaURL
3138+
3139+ mobj = re.search(r'<title>(.*) video by (.*) - Photobucket</title>', webpage)
3140+ if mobj is None:
3141+ self._downloader.trouble(u'ERROR: unable to extract title')
3142+ return
3143+ video_title = mobj.group(1).decode('utf-8')
3144+ video_title = sanitize_title(video_title)
3145+ simple_title = _simplify_title(video_title)
3146+
3147+ video_uploader = mobj.group(2).decode('utf-8')
3148+
3149+ try:
3150+ # Process video information
3151+ self._downloader.process_info({
3152+ 'id': video_id.decode('utf-8'),
3153+ 'url': video_url.decode('utf-8'),
3154+ 'uploader': video_uploader,
3155+ 'upload_date': u'NA',
3156+ 'title': video_title,
3157+ 'stitle': simple_title,
3158+ 'ext': video_extension.decode('utf-8'),
3159+ 'format': u'NA',
3160+ 'player_url': None,
3161+ })
3162+ except UnavailableVideoError:
3163+ self._downloader.trouble(u'\nERROR: unable to download video')
3164+
3165+
3166+class YahooIE(InfoExtractor):
3167+ """Information extractor for video.yahoo.com."""
3168+
3169+ # _VALID_URL matches all Yahoo! Video URLs
3170+ # _VPAGE_URL matches only the extractable '/watch/' URLs
3171+ _VALID_URL = r'(?:http://)?(?:[a-z]+\.)?video\.yahoo\.com/(?:watch|network)/([0-9]+)(?:/|\?v=)([0-9]+)(?:[#\?].*)?'
3172+ _VPAGE_URL = r'(?:http://)?video\.yahoo\.com/watch/([0-9]+)/([0-9]+)(?:[#\?].*)?'
3173+ IE_NAME = u'video.yahoo'
3174+
3175+ def __init__(self, downloader=None):
3176+ InfoExtractor.__init__(self, downloader)
3177+
3178+ def report_download_webpage(self, video_id):
3179+ """Report webpage download."""
3180+ self._downloader.to_screen(u'[video.yahoo] %s: Downloading webpage' % video_id)
3181+
3182+ def report_extraction(self, video_id):
3183+ """Report information extraction."""
3184+ self._downloader.to_screen(u'[video.yahoo] %s: Extracting information' % video_id)
3185+
3186+ def _real_extract(self, url, new_video=True):
3187+ # Extract ID from URL
3188+ mobj = re.match(self._VALID_URL, url)
3189+ if mobj is None:
3190+ self._downloader.trouble(u'ERROR: Invalid URL: %s' % url)
3191+ return
3192+
3193+ # At this point we have a new video
3194+ if new_video: self._downloader.increment_downloads()
3195+ video_id = mobj.group(2)
3196+ video_extension = 'flv'
3197+
3198+ # Rewrite valid but non-extractable URLs as
3199+ # extractable English language /watch/ URLs
3200+ if re.match(self._VPAGE_URL, url) is None:
3201+ request = urllib2.Request(url)
3202+ try:
3203+ webpage = urllib2.urlopen(request).read()
3204+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3205+ self._downloader.trouble(u'ERROR: Unable to retrieve video webpage: %s' % str(err))
3206+ return
3207+
3208+ mobj = re.search(r'\("id", "([0-9]+)"\);', webpage)
3209+ if mobj is None:
3210+ self._downloader.trouble(u'ERROR: Unable to extract id field')
3211+ return
3212+ yahoo_id = mobj.group(1)
3213+
3214+ mobj = re.search(r'\("vid", "([0-9]+)"\);', webpage)
3215+ if mobj is None:
3216+ self._downloader.trouble(u'ERROR: Unable to extract vid field')
3217+ return
3218+ yahoo_vid = mobj.group(1)
3219+
3220+ url = 'http://video.yahoo.com/watch/%s/%s' % (yahoo_vid, yahoo_id)
3221+ return self._real_extract(url, new_video=False)
3222+
3223+ # Retrieve video webpage to extract further information
3224+ request = urllib2.Request(url)
3225+ try:
3226+ self.report_download_webpage(video_id)
3227+ webpage = urllib2.urlopen(request).read()
3228+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3229+ self._downloader.trouble(u'ERROR: Unable to retrieve video webpage: %s' % str(err))
3230+ return
3231+
3232+ # Extract uploader and title from webpage
3233+ self.report_extraction(video_id)
3234+ mobj = re.search(r'<meta name="title" content="(.*)" />', webpage)
3235+ if mobj is None:
3236+ self._downloader.trouble(u'ERROR: unable to extract video title')
3237+ return
3238+ video_title = mobj.group(1).decode('utf-8')
3239+ simple_title = _simplify_title(video_title)
3240+
3241+ mobj = re.search(r'<h2 class="ti-5"><a href="http://video\.yahoo\.com/(people|profile)/[0-9]+" beacon=".*">(.*)</a></h2>', webpage)
3242+ if mobj is None:
3243+ self._downloader.trouble(u'ERROR: unable to extract video uploader')
3244+ return
3245+ video_uploader = mobj.group(2).decode('utf-8')
3246+
3247+ # Extract video thumbnail
3248+ mobj = re.search(r'<link rel="image_src" href="(.*)" />', webpage)
3249+ if mobj is None:
3250+ self._downloader.trouble(u'ERROR: unable to extract video thumbnail')
3251+ return
3252+ video_thumbnail = mobj.group(1).decode('utf-8')
3253+
3254+ # Extract video description
3255+ mobj = re.search(r'<meta name="description" content="(.*)" />', webpage)
3256+ if mobj is None:
3257+ self._downloader.trouble(u'ERROR: unable to extract video description')
3258+ return
3259+ video_description = mobj.group(1).decode('utf-8')
3260+ if not video_description:
3261+ video_description = 'No description available.'
3262+
3263+ # Extract video height and width
3264+ mobj = re.search(r'<meta name="video_height" content="([0-9]+)" />', webpage)
3265+ if mobj is None:
3266+ self._downloader.trouble(u'ERROR: unable to extract video height')
3267+ return
3268+ yv_video_height = mobj.group(1)
3269+
3270+ mobj = re.search(r'<meta name="video_width" content="([0-9]+)" />', webpage)
3271+ if mobj is None:
3272+ self._downloader.trouble(u'ERROR: unable to extract video width')
3273+ return
3274+ yv_video_width = mobj.group(1)
3275+
3276+ # Retrieve video playlist to extract media URL
3277+ # I'm not completely sure what all these options are, but we
3278+ # seem to need most of them, otherwise the server sends a 401.
3279+ yv_lg = 'R0xx6idZnW2zlrKP8xxAIR' # not sure what this represents
3280+ yv_bitrate = '700' # according to Wikipedia this is hard-coded
3281+ request = urllib2.Request('http://cosmos.bcst.yahoo.com/up/yep/process/getPlaylistFOP.php?node_id=' + video_id +
3282+ '&tech=flash&mode=playlist&lg=' + yv_lg + '&bitrate=' + yv_bitrate + '&vidH=' + yv_video_height +
3283+ '&vidW=' + yv_video_width + '&swf=as3&rd=video.yahoo.com&tk=null&adsupported=v1,v2,&eventid=1301797')
3284+ try:
3285+ self.report_download_webpage(video_id)
3286+ webpage = urllib2.urlopen(request).read()
3287+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3288+ self._downloader.trouble(u'ERROR: Unable to retrieve video webpage: %s' % str(err))
3289+ return
3290+
3291+ # Extract media URL from playlist XML
3292+ mobj = re.search(r'<STREAM APP="(http://.*)" FULLPATH="/?(/.*\.flv\?[^"]*)"', webpage)
3293+ if mobj is None:
3294+ self._downloader.trouble(u'ERROR: Unable to extract media URL')
3295+ return
3296+ video_url = urllib.unquote(mobj.group(1) + mobj.group(2)).decode('utf-8')
3297+ video_url = re.sub(r'(?u)&(.+?);', htmlentity_transform, video_url)
3298+
3299+ try:
3300+ # Process video information
3301+ self._downloader.process_info({
3302+ 'id': video_id.decode('utf-8'),
3303+ 'url': video_url,
3304+ 'uploader': video_uploader,
3305+ 'upload_date': u'NA',
3306+ 'title': video_title,
3307+ 'stitle': simple_title,
3308+ 'ext': video_extension.decode('utf-8'),
3310+ 'description': video_description,
3311+ 'thumbnail': video_thumbnail,
3312+ 'player_url': None,
3313+ })
3314+ except UnavailableVideoError:
3315+ self._downloader.trouble(u'\nERROR: unable to download video')
3316+
3317+
3318+class VimeoIE(InfoExtractor):
3319+ """Information extractor for vimeo.com."""
3320+
3321+ # _VALID_URL matches Vimeo URLs
3322+ _VALID_URL = r'(?:https?://)?(?:(?:www|player)\.)?vimeo\.com/(?:groups/[^/]+/)?(?:videos?/)?([0-9]+)'
3323+ IE_NAME = u'vimeo'
3324+
3325+ def __init__(self, downloader=None):
3326+ InfoExtractor.__init__(self, downloader)
3327+
3328+ def report_download_webpage(self, video_id):
3329+ """Report webpage download."""
3330+ self._downloader.to_screen(u'[vimeo] %s: Downloading webpage' % video_id)
3331+
3332+ def report_extraction(self, video_id):
3333+ """Report information extraction."""
3334+ self._downloader.to_screen(u'[vimeo] %s: Extracting information' % video_id)
3335+
3336+ def _real_extract(self, url, new_video=True):
3337+ # Extract ID from URL
3338+ mobj = re.match(self._VALID_URL, url)
3339+ if mobj is None:
3340+ self._downloader.trouble(u'ERROR: Invalid URL: %s' % url)
3341+ return
3342+
3343+ # At this point we have a new video
3344+ self._downloader.increment_downloads()
3345+ video_id = mobj.group(1)
3346+
3347+ # Retrieve video webpage to extract further information
3348+ request = urllib2.Request("http://vimeo.com/moogaloop/load/clip:%s" % video_id, None, std_headers)
3349+ try:
3350+ self.report_download_webpage(video_id)
3351+ webpage = urllib2.urlopen(request).read()
3352+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3353+ self._downloader.trouble(u'ERROR: Unable to retrieve video webpage: %s' % str(err))
3354+ return
3355+
3356+ # Now we begin extracting as much information as we can from what we
3357+ # retrieved. First we extract the information common to all extractors,
3358+ # and later we extract those that are Vimeo specific.
3359+ self.report_extraction(video_id)
3360+
3361+ # Extract title
3362+ mobj = re.search(r'<caption>(.*?)</caption>', webpage)
3363+ if mobj is None:
3364+ self._downloader.trouble(u'ERROR: unable to extract video title')
3365+ return
3366+ video_title = mobj.group(1).decode('utf-8')
3367+ simple_title = _simplify_title(video_title)
3368+
3369+ # Extract uploader
3370+ mobj = re.search(r'<uploader_url>http://vimeo\.com/(.*?)</uploader_url>', webpage)
3371+ if mobj is None:
3372+ self._downloader.trouble(u'ERROR: unable to extract video uploader')
3373+ return
3374+ video_uploader = mobj.group(1).decode('utf-8')
3375+
3376+ # Extract video thumbnail
3377+ mobj = re.search(r'<thumbnail>(.*?)</thumbnail>', webpage)
3378+ if mobj is None:
3379+ self._downloader.trouble(u'ERROR: unable to extract video thumbnail')
3380+ return
3381+ video_thumbnail = mobj.group(1).decode('utf-8')
3382+
3383+ # # Extract video description
3384+ # mobj = re.search(r'<meta property="og:description" content="(.*)" />', webpage)
3385+ # if mobj is None:
3386+ # self._downloader.trouble(u'ERROR: unable to extract video description')
3387+ # return
3388+ # video_description = mobj.group(1).decode('utf-8')
3389+ # if not video_description: video_description = 'No description available.'
3390+ video_description = 'No description available.'
3391+
3392+ # Vimeo specific: extract request signature
3393+ mobj = re.search(r'<request_signature>(.*?)</request_signature>', webpage)
3394+ if mobj is None:
3395+ self._downloader.trouble(u'ERROR: unable to extract request signature')
3396+ return
3397+ sig = mobj.group(1).decode('utf-8')
3398+
3399+ # Vimeo specific: extract video quality information
3400+ mobj = re.search(r'<isHD>(\d+)</isHD>', webpage)
3401+ if mobj is None:
3402+ self._downloader.trouble(u'ERROR: unable to extract video quality information')
3403+ return
3404+ quality = mobj.group(1).decode('utf-8')
3405+
3406+ if int(quality) == 1:
3407+ quality = 'hd'
3408+ else:
3409+ quality = 'sd'
3410+
3411+ # Vimeo specific: Extract request signature expiration
3412+ mobj = re.search(r'<request_signature_expires>(.*?)</request_signature_expires>', webpage)
3413+ if mobj is None:
3414+ self._downloader.trouble(u'ERROR: unable to extract request signature expiration')
3415+ return
3416+ sig_exp = mobj.group(1).decode('utf-8')
3417+
3418+ video_url = "http://vimeo.com/moogaloop/play/clip:%s/%s/%s/?q=%s" % (video_id, sig, sig_exp, quality)
3419+
3420+ try:
3421+ # Process video information
3422+ self._downloader.process_info({
3423+ 'id': video_id.decode('utf-8'),
3424+ 'url': video_url,
3425+ 'uploader': video_uploader,
3426+ 'upload_date': u'NA',
3427+ 'title': video_title,
3428+ 'stitle': simple_title,
3429+ 'ext': u'mp4',
3432+ 'thumbnail': video_thumbnail,
3433+ 'description': video_description,
3434+ 'player_url': None,
3435+ })
3436+ except UnavailableVideoError:
3437+ self._downloader.trouble(u'ERROR: unable to download video')
3438+
3439+
3440+class GenericIE(InfoExtractor):
3441+ """Generic last-resort information extractor."""
3442+
3443+ _VALID_URL = r'.*'
3444+ IE_NAME = u'generic'
3445+
3446+ def __init__(self, downloader=None):
3447+ InfoExtractor.__init__(self, downloader)
3448+
3449+ def report_download_webpage(self, video_id):
3450+ """Report webpage download."""
3451+ self._downloader.to_screen(u'WARNING: Falling back on generic information extractor.')
3452+ self._downloader.to_screen(u'[generic] %s: Downloading webpage' % video_id)
3453+
3454+ def report_extraction(self, video_id):
3455+ """Report information extraction."""
3456+ self._downloader.to_screen(u'[generic] %s: Extracting information' % video_id)
3457+
3458+ def _real_extract(self, url):
3459+ # At this point we have a new video
3460+ self._downloader.increment_downloads()
3461+
3462+ video_id = url.split('/')[-1]
3463+ request = urllib2.Request(url)
3464+ try:
3465+ self.report_download_webpage(video_id)
3466+ webpage = urllib2.urlopen(request).read()
3467+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3468+ self._downloader.trouble(u'ERROR: Unable to retrieve video webpage: %s' % str(err))
3469+ return
3470+ except ValueError, err:
3471+ # since this is the last-resort InfoExtractor, if
3472+ # this error is thrown, it'll be thrown here
3473+ self._downloader.trouble(u'ERROR: Invalid URL: %s' % url)
3474+ return
3475+
3476+ self.report_extraction(video_id)
3477+ # Start with something easy: JW Player in SWFObject
3478+ mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
3479+ if mobj is None:
3480+ # Broaden the search a little bit
3481+ mobj = re.search(r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', webpage)
3482+ if mobj is None:
3483+ self._downloader.trouble(u'ERROR: Invalid URL: %s' % url)
3484+ return
3485+
3486+ # It's possible that one of the regexes
3487+ # matched, but returned an empty group:
3488+ if mobj.group(1) is None:
3489+ self._downloader.trouble(u'ERROR: Invalid URL: %s' % url)
3490+ return
3491+
3492+ video_url = urllib.unquote(mobj.group(1))
3493+ video_id = os.path.basename(video_url)
3494+
3495+ # here's a fun little line of code for you:
3496+ video_extension = os.path.splitext(video_id)[1][1:]
3497+ video_id = os.path.splitext(video_id)[0]
3498+
3499+ # it's tempting to parse this further, but you would
3500+ # have to take into account all the variations like
3501+ # Video Title - Site Name
3502+ # Site Name | Video Title
3503+ # Video Title - Tagline | Site Name
3504+ # and so on and so forth; it's just not practical
3505+ mobj = re.search(r'<title>(.*)</title>', webpage)
3506+ if mobj is None:
3507+ self._downloader.trouble(u'ERROR: unable to extract title')
3508+ return
3509+ video_title = mobj.group(1).decode('utf-8')
3510+ video_title = sanitize_title(video_title)
3511+ simple_title = _simplify_title(video_title)
3512+
3513+ # video uploader is domain name
3514+ mobj = re.match(r'(?:https?://)?([^/]*)/.*', url)
3515+ if mobj is None:
3516+ self._downloader.trouble(u'ERROR: unable to extract uploader')
3517+ return
3518+ video_uploader = mobj.group(1).decode('utf-8')
3519+
3520+ try:
3521+ # Process video information
3522+ self._downloader.process_info({
3523+ 'id': video_id.decode('utf-8'),
3524+ 'url': video_url.decode('utf-8'),
3525+ 'uploader': video_uploader,
3526+ 'upload_date': u'NA',
3527+ 'title': video_title,
3528+ 'stitle': simple_title,
3529+ 'ext': video_extension.decode('utf-8'),
3530+ 'format': u'NA',
3531+ 'player_url': None,
3532+ })
3533+ except UnavailableVideoError, err:
3534+ self._downloader.trouble(u'\nERROR: unable to download video')
3535+
3536+
3537+class YoutubeSearchIE(InfoExtractor):
3538+ """Information Extractor for YouTube search queries."""
3539+ _VALID_URL = r'ytsearch(\d+|all)?:[\s\S]+'
3540+ _TEMPLATE_URL = 'http://www.youtube.com/results?search_query=%s&page=%s&gl=US&hl=en'
3541+ _VIDEO_INDICATOR = r'href="/watch\?v=.+?"'
3542+ _MORE_PAGES_INDICATOR = r'(?m)>\s*Next\s*</a>'
3543+ _youtube_ie = None
3544+ _max_youtube_results = 1000
3545+ IE_NAME = u'youtube:search'
3546+
3547+ def __init__(self, youtube_ie, downloader=None):
3548+ InfoExtractor.__init__(self, downloader)
3549+ self._youtube_ie = youtube_ie
3550+
3551+ def report_download_page(self, query, pagenum):
3552+ """Report attempt to download search results page with given number."""
3553+ query = query.decode(preferredencoding())
3554+ self._downloader.to_screen(u'[youtube] query "%s": Downloading page %s' % (query, pagenum))
3555+
3556+ def _real_initialize(self):
3557+ self._youtube_ie.initialize()
3558+
3559+ def _real_extract(self, query):
3560+ mobj = re.match(self._VALID_URL, query)
3561+ if mobj is None:
3562+ self._downloader.trouble(u'ERROR: invalid search query "%s"' % query)
3563+ return
3564+
3565+ prefix, query = query.split(':', 1)
3566+ prefix = prefix[8:]
3567+ query = query.encode('utf-8')
3568+ if prefix == '':
3569+ self._download_n_results(query, 1)
3570+ return
3571+ elif prefix == 'all':
3572+ self._download_n_results(query, self._max_youtube_results)
3573+ return
3574+ else:
3575+ try:
3576+ n = long(prefix)
3577+ if n <= 0:
3578+ self._downloader.trouble(u'ERROR: invalid download number %s for query "%s"' % (n, query))
3579+ return
3580+ elif n > self._max_youtube_results:
3581+ self._downloader.to_stderr(u'WARNING: ytsearch returns max %i results (you requested %i)' % (self._max_youtube_results, n))
3582+ n = self._max_youtube_results
3583+ self._download_n_results(query, n)
3584+ return
3585+ except ValueError: # parsing prefix as integer fails
3586+ self._download_n_results(query, 1)
3587+ return
3588+
3589+ def _download_n_results(self, query, n):
3590+ """Downloads a specified number of results for a query"""
3591+
3592+ video_ids = []
3593+ already_seen = set()
3594+ pagenum = 1
3595+
3596+ while True:
3597+ self.report_download_page(query, pagenum)
3598+ result_url = self._TEMPLATE_URL % (urllib.quote_plus(query), pagenum)
3599+ request = urllib2.Request(result_url)
3600+ try:
3601+ page = urllib2.urlopen(request).read()
3602+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3603+ self._downloader.trouble(u'ERROR: unable to download webpage: %s' % str(err))
3604+ return
3605+
3606+ # Extract video identifiers
3607+ for mobj in re.finditer(self._VIDEO_INDICATOR, page):
3608+ video_id = page[mobj.span()[0]:mobj.span()[1]].split('=')[2][:-1]
3609+ if video_id not in already_seen:
3610+ video_ids.append(video_id)
3611+ already_seen.add(video_id)
3612+ if len(video_ids) == n:
3613+ # Specified n videos reached
3614+ for id in video_ids:
3615+ self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id)
3616+ return
3617+
3618+ if re.search(self._MORE_PAGES_INDICATOR, page) is None:
3619+ for id in video_ids:
3620+ self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id)
3621+ return
3622+
3623+ pagenum = pagenum + 1
3624+
3625+
3626+class GoogleSearchIE(InfoExtractor):
3627+ """Information Extractor for Google Video search queries."""
3628+ _VALID_URL = r'gvsearch(\d+|all)?:[\s\S]+'
3629+ _TEMPLATE_URL = 'http://video.google.com/videosearch?q=%s+site:video.google.com&start=%s&hl=en'
3630+ _VIDEO_INDICATOR = r'videoplay\?docid=([^\&>]+)\&'
3631+ _MORE_PAGES_INDICATOR = r'<span>Next</span>'
3632+ _google_ie = None
3633+ _max_google_results = 1000
3634+ IE_NAME = u'video.google:search'
3635+
3636+ def __init__(self, google_ie, downloader=None):
3637+ InfoExtractor.__init__(self, downloader)
3638+ self._google_ie = google_ie
3639+
3640+ def report_download_page(self, query, pagenum):
3641+ """Report attempt to download search results page with given number."""
3642+ query = query.decode(preferredencoding())
3643+ self._downloader.to_screen(u'[video.google] query "%s": Downloading page %s' % (query, pagenum))
3644+
3645+ def _real_initialize(self):
3646+ self._google_ie.initialize()
3647+
3648+ def _real_extract(self, query):
3649+ mobj = re.match(self._VALID_URL, query)
3650+ if mobj is None:
3651+ self._downloader.trouble(u'ERROR: invalid search query "%s"' % query)
3652+ return
3653+
3654+ prefix, query = query.split(':', 1)
3655+ prefix = prefix[8:]
3656+ query = query.encode('utf-8')
3657+ if prefix == '':
3658+ self._download_n_results(query, 1)
3659+ return
3660+ elif prefix == 'all':
3661+ self._download_n_results(query, self._max_google_results)
3662+ return
3663+ else:
3664+ try:
3665+ n = long(prefix)
3666+ if n <= 0:
3667+ self._downloader.trouble(u'ERROR: invalid download number %s for query "%s"' % (n, query))
3668+ return
3669+ elif n > self._max_google_results:
3670+ self._downloader.to_stderr(u'WARNING: gvsearch returns max %i results (you requested %i)' % (self._max_google_results, n))
3671+ n = self._max_google_results
3672+ self._download_n_results(query, n)
3673+ return
3674+ except ValueError: # parsing prefix as integer fails
3675+ self._download_n_results(query, 1)
3676+ return
3677+
3678+ def _download_n_results(self, query, n):
3679+ """Downloads a specified number of results for a query"""
3680+
3681+ video_ids = []
3682+ already_seen = set()
3683+ pagenum = 1
3684+
3685+ while True:
3686+ self.report_download_page(query, pagenum)
3687+ result_url = self._TEMPLATE_URL % (urllib.quote_plus(query), pagenum)
3688+ request = urllib2.Request(result_url)
3689+ try:
3690+ page = urllib2.urlopen(request).read()
3691+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3692+ self._downloader.trouble(u'ERROR: unable to download webpage: %s' % str(err))
3693+ return
3694+
3695+ # Extract video identifiers
3696+ for mobj in re.finditer(self._VIDEO_INDICATOR, page):
3697+ video_id = mobj.group(1)
3698+ if video_id not in already_seen:
3699+ video_ids.append(video_id)
3700+ already_seen.add(video_id)
3701+ if len(video_ids) == n:
3702+ # Specified n videos reached
3703+ for id in video_ids:
3704+ self._google_ie.extract('http://video.google.com/videoplay?docid=%s' % id)
3705+ return
3706+
3707+ if re.search(self._MORE_PAGES_INDICATOR, page) is None:
3708+ for id in video_ids:
3709+ self._google_ie.extract('http://video.google.com/videoplay?docid=%s' % id)
3710+ return
3711+
3712+ pagenum = pagenum + 1
3713+
3714+
3715+class YahooSearchIE(InfoExtractor):
3716+ """Information Extractor for Yahoo! Video search queries."""
3717+ _VALID_URL = r'yvsearch(\d+|all)?:[\s\S]+'
3718+ _TEMPLATE_URL = 'http://video.yahoo.com/search/?p=%s&o=%s'
3719+ _VIDEO_INDICATOR = r'href="http://video\.yahoo\.com/watch/([0-9]+/[0-9]+)"'
3720+ _MORE_PAGES_INDICATOR = r'\s*Next'
3721+ _yahoo_ie = None
3722+ _max_yahoo_results = 1000
3723+ IE_NAME = u'video.yahoo:search'
3724+
3725+ def __init__(self, yahoo_ie, downloader=None):
3726+ InfoExtractor.__init__(self, downloader)
3727+ self._yahoo_ie = yahoo_ie
3728+
3729+ def report_download_page(self, query, pagenum):
3730+ """Report attempt to download search results page with given number."""
3731+ query = query.decode(preferredencoding())
3732+ self._downloader.to_screen(u'[video.yahoo] query "%s": Downloading page %s' % (query, pagenum))
3733+
3734+ def _real_initialize(self):
3735+ self._yahoo_ie.initialize()
3736+
3737+ def _real_extract(self, query):
3738+ mobj = re.match(self._VALID_URL, query)
3739+ if mobj is None:
3740+ self._downloader.trouble(u'ERROR: invalid search query "%s"' % query)
3741+ return
3742+
3743+ prefix, query = query.split(':', 1)
3744+ prefix = prefix[8:]
3745+ query = query.encode('utf-8')
3746+ if prefix == '':
3747+ self._download_n_results(query, 1)
3748+ return
3749+ elif prefix == 'all':
3750+ self._download_n_results(query, self._max_yahoo_results)
3751+ return
3752+ else:
3753+ try:
3754+ n = long(prefix)
3755+ if n <= 0:
3756+ self._downloader.trouble(u'ERROR: invalid download number %s for query "%s"' % (n, query))
3757+ return
3758+ elif n > self._max_yahoo_results:
3759+ self._downloader.to_stderr(u'WARNING: yvsearch returns max %i results (you requested %i)' % (self._max_yahoo_results, n))
3760+ n = self._max_yahoo_results
3761+ self._download_n_results(query, n)
3762+ return
3763+ except ValueError: # parsing prefix as integer fails
3764+ self._download_n_results(query, 1)
3765+ return
3766+
3767+ def _download_n_results(self, query, n):
3768+ """Downloads a specified number of results for a query"""
3769+
3770+ video_ids = []
3771+ already_seen = set()
3772+ pagenum = 1
3773+
3774+ while True:
3775+ self.report_download_page(query, pagenum)
3776+ result_url = self._TEMPLATE_URL % (urllib.quote_plus(query), pagenum)
3777+ request = urllib2.Request(result_url)
3778+ try:
3779+ page = urllib2.urlopen(request).read()
3780+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3781+ self._downloader.trouble(u'ERROR: unable to download webpage: %s' % str(err))
3782+ return
3783+
3784+ # Extract video identifiers
3785+ for mobj in re.finditer(self._VIDEO_INDICATOR, page):
3786+ video_id = mobj.group(1)
3787+ if video_id not in already_seen:
3788+ video_ids.append(video_id)
3789+ already_seen.add(video_id)
3790+ if len(video_ids) == n:
3791+ # Specified n videos reached
3792+ for id in video_ids:
3793+ self._yahoo_ie.extract('http://video.yahoo.com/watch/%s' % id)
3794+ return
3795+
3796+ if re.search(self._MORE_PAGES_INDICATOR, page) is None:
3797+ for id in video_ids:
3798+ self._yahoo_ie.extract('http://video.yahoo.com/watch/%s' % id)
3799+ return
3800+
3801+ pagenum = pagenum + 1
3802+
3803+
3804+class YoutubePlaylistIE(InfoExtractor):
3805+ """Information Extractor for YouTube playlists."""
3806+
3807+ _VALID_URL = r'(?:https?://)?(?:\w+\.)?youtube\.com/(?:(?:course|view_play_list|my_playlists|artist|playlist)\?.*?(p|a|list)=|user/.*?/user/|p/|user/.*?#[pg]/c/)(?:PL)?([0-9A-Za-z-_]+)(?:/.*?/([0-9A-Za-z_-]+))?.*'
3808+ _TEMPLATE_URL = 'http://www.youtube.com/%s?%s=%s&page=%s&gl=US&hl=en'
3809+ _VIDEO_INDICATOR = r'/watch\?v=(.+?)&'
3810+ _MORE_PAGES_INDICATOR = r'(?m)>\s*Next\s*</a>'
3811+ _youtube_ie = None
3812+ IE_NAME = u'youtube:playlist'
3813+
3814+ def __init__(self, youtube_ie, downloader=None):
3815+ InfoExtractor.__init__(self, downloader)
3816+ self._youtube_ie = youtube_ie
3817+
3818+ def report_download_page(self, playlist_id, pagenum):
3819+ """Report attempt to download playlist page with given number."""
3820+ self._downloader.to_screen(u'[youtube] PL %s: Downloading page #%s' % (playlist_id, pagenum))
3821+
3822+ def _real_initialize(self):
3823+ self._youtube_ie.initialize()
3824+
3825+ def _real_extract(self, url):
3826+ # Extract playlist id
3827+ mobj = re.match(self._VALID_URL, url)
3828+ if mobj is None:
3829+ self._downloader.trouble(u'ERROR: invalid url: %s' % url)
3830+ return
3831+
3832+ # Single video case
3833+ if mobj.group(3) is not None:
3834+ self._youtube_ie.extract(mobj.group(3))
3835+ return
3836+
3837+ # Download playlist pages
3838+ # prefix is 'p' as default for playlists but there are other types that need extra care
3839+ playlist_prefix = mobj.group(1)
3840+ if playlist_prefix == 'a':
3841+ playlist_access = 'artist'
3842+ else:
3843+ playlist_prefix = 'p'
3844+ playlist_access = 'view_play_list'
3845+ playlist_id = mobj.group(2)
3846+ video_ids = []
3847+ pagenum = 1
3848+
3849+ while True:
3850+ self.report_download_page(playlist_id, pagenum)
3851+ url = self._TEMPLATE_URL % (playlist_access, playlist_prefix, playlist_id, pagenum)
3852+ request = urllib2.Request(url)
3853+ try:
3854+ page = urllib2.urlopen(request).read()
3855+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3856+ self._downloader.trouble(u'ERROR: unable to download webpage: %s' % str(err))
3857+ return
3858+
3859+ # Extract video identifiers
3860+ ids_in_page = []
3861+ for mobj in re.finditer(self._VIDEO_INDICATOR, page):
3862+ if mobj.group(1) not in ids_in_page:
3863+ ids_in_page.append(mobj.group(1))
3864+ video_ids.extend(ids_in_page)
3865+
3866+ if re.search(self._MORE_PAGES_INDICATOR, page) is None:
3867+ break
3868+ pagenum = pagenum + 1
3869+
3870+ playliststart = self._downloader.params.get('playliststart', 1) - 1
3871+ playlistend = self._downloader.params.get('playlistend', -1)
3872+ video_ids = video_ids[playliststart:] if playlistend == -1 else video_ids[playliststart:playlistend]
3873+
3874+ for id in video_ids:
3875+ self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id)
3876+ return
3877+
3878+
3879+class YoutubeUserIE(InfoExtractor):
3880+ """Information Extractor for YouTube users."""
3881+
3882+ _VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?youtube\.com/user/)|ytuser:)([A-Za-z0-9_-]+)'
3883+ _TEMPLATE_URL = 'http://gdata.youtube.com/feeds/api/users/%s'
3884+ _GDATA_PAGE_SIZE = 50
3885+ _GDATA_URL = 'http://gdata.youtube.com/feeds/api/users/%s/uploads?max-results=%d&start-index=%d'
3886+ _VIDEO_INDICATOR = r'/watch\?v=(.+?)[\<&]'
3887+ _youtube_ie = None
3888+ IE_NAME = u'youtube:user'
3889+
3890+ def __init__(self, youtube_ie, downloader=None):
3891+ InfoExtractor.__init__(self, downloader)
3892+ self._youtube_ie = youtube_ie
3893+
3894+ def report_download_page(self, username, start_index):
3895+ """Report attempt to download user page."""
3896+ self._downloader.to_screen(u'[youtube] user %s: Downloading video ids from %d to %d' %
3897+ (username, start_index, start_index + self._GDATA_PAGE_SIZE - 1))
3898+
3899+ def _real_initialize(self):
3900+ self._youtube_ie.initialize()
3901+
3902+ def _real_extract(self, url):
3903+ # Extract username
3904+ mobj = re.match(self._VALID_URL, url)
3905+ if mobj is None:
3906+ self._downloader.trouble(u'ERROR: invalid url: %s' % url)
3907+ return
3908+
3909+ username = mobj.group(1)
3910+
3911+ # Download video ids using YouTube Data API. Result size per
3912+ # query is limited (currently to 50 videos) so we need to query
3913+ # page by page until there are no video ids - it means we got
3914+ # all of them.
3915+
3916+ video_ids = []
3917+ pagenum = 0
3918+
3919+ while True:
3920+ start_index = pagenum * self._GDATA_PAGE_SIZE + 1
3921+ self.report_download_page(username, start_index)
3922+
3923+ request = urllib2.Request(self._GDATA_URL % (username, self._GDATA_PAGE_SIZE, start_index))
3924+
3925+ try:
3926+ page = urllib2.urlopen(request).read()
3927+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3928+ self._downloader.trouble(u'ERROR: unable to download webpage: %s' % str(err))
3929+ return
3930+
3931+ # Extract video identifiers
3932+ ids_in_page = []
3933+
3934+ for mobj in re.finditer(self._VIDEO_INDICATOR, page):
3935+ if mobj.group(1) not in ids_in_page:
3936+ ids_in_page.append(mobj.group(1))
3937+
3938+ video_ids.extend(ids_in_page)
3939+
3940+ # A little optimization - if current page is not
3941+ # "full", i.e. does not contain PAGE_SIZE video ids, then
3942+ # we can assume that this page is the last one - there
3943+ # are no more ids on further pages - no need to query
3944+ # again.
3945+
3946+ if len(ids_in_page) < self._GDATA_PAGE_SIZE:
3947+ break
3948+
3949+ pagenum += 1
3950+
3951+ all_ids_count = len(video_ids)
3952+ playliststart = self._downloader.params.get('playliststart', 1) - 1
3953+ playlistend = self._downloader.params.get('playlistend', -1)
3954+
3955+ if playlistend == -1:
3956+ video_ids = video_ids[playliststart:]
3957+ else:
3958+ video_ids = video_ids[playliststart:playlistend]
3959+
3960+ self._downloader.to_screen(u"[youtube] user %s: Collected %d video ids (downloading %d of them)" %
3961+ (username, all_ids_count, len(video_ids)))
3962+
3963+ for video_id in video_ids:
3964+ self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % video_id)
3965+
3966+
3967+class DepositFilesIE(InfoExtractor):
3968+ """Information extractor for depositfiles.com"""
3969+
3970+ _VALID_URL = r'(?:http://)?(?:\w+\.)?depositfiles\.com/(?:../(?#locale))?files/(.+)'
3971+ IE_NAME = u'DepositFiles'
3972+
3973+ def __init__(self, downloader=None):
3974+ InfoExtractor.__init__(self, downloader)
3975+
3976+ def report_download_webpage(self, file_id):
3977+ """Report webpage download."""
3978+ self._downloader.to_screen(u'[DepositFiles] %s: Downloading webpage' % file_id)
3979+
3980+ def report_extraction(self, file_id):
3981+ """Report information extraction."""
3982+ self._downloader.to_screen(u'[DepositFiles] %s: Extracting information' % file_id)
3983+
3984+ def _real_extract(self, url):
3985+ # At this point we have a new file
3986+ self._downloader.increment_downloads()
3987+
3988+ file_id = url.split('/')[-1]
3989+ # Rebuild URL in English locale
3990+ url = 'http://depositfiles.com/en/files/' + file_id
3991+
3992+ # Retrieve file webpage with 'Free download' button pressed
3993+ free_download_indication = { 'gateway_result' : '1' }
3994+ request = urllib2.Request(url, urllib.urlencode(free_download_indication))
3995+ try:
3996+ self.report_download_webpage(file_id)
3997+ webpage = urllib2.urlopen(request).read()
3998+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
3999+ self._downloader.trouble(u'ERROR: Unable to retrieve file webpage: %s' % str(err))
4000+ return
4001+
4002+ # Search for the real file URL
4003+ mobj = re.search(r'<form action="(http://fileshare.+?)"', webpage)
4004+ if (mobj is None) or (mobj.group(1) is None):
4005+ # Try to figure out reason of the error.
4006+ mobj = re.search(r'<strong>(Attention.*?)</strong>', webpage, re.DOTALL)
4007+ if (mobj is not None) and (mobj.group(1) is not None):
4008+ restriction_message = re.sub(r'\s+', ' ', mobj.group(1)).strip()
4009+ self._downloader.trouble(u'ERROR: %s' % restriction_message)
4010+ else:
4011+ self._downloader.trouble(u'ERROR: unable to extract download URL from: %s' % url)
4012+ return
4013+
4014+ file_url = mobj.group(1)
4015+ file_extension = os.path.splitext(file_url)[1][1:]
4016+
4017+ # Search for file title
4018+ mobj = re.search(r'<b title="(.*?)">', webpage)
4019+ if mobj is None:
4020+ self._downloader.trouble(u'ERROR: unable to extract title')
4021+ return
4022+ file_title = mobj.group(1).decode('utf-8')
4023+
4024+ try:
4025+ # Process file information
4026+ self._downloader.process_info({
4027+ 'id': file_id.decode('utf-8'),
4028+ 'url': file_url.decode('utf-8'),
4029+ 'uploader': u'NA',
4030+ 'upload_date': u'NA',
4031+ 'title': file_title,
4032+ 'stitle': file_title,
4033+ 'ext': file_extension.decode('utf-8'),
4034+ 'format': u'NA',
4035+ 'player_url': None,
4036+ })
4037+ except UnavailableVideoError, err:
4038+ self._downloader.trouble(u'ERROR: unable to download file')
4039+
4040+
4041+class FacebookIE(InfoExtractor):
4042+ """Information Extractor for Facebook"""
4043+
4044+ _VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)'
4045+ _LOGIN_URL = 'https://login.facebook.com/login.php?m&next=http%3A%2F%2Fm.facebook.com%2Fhome.php&'
4046+ _NETRC_MACHINE = 'facebook'
4047+ _available_formats = ['video', 'highqual', 'lowqual']
4048+ _video_extensions = {
4049+ 'video': 'mp4',
4050+ 'highqual': 'mp4',
4051+ 'lowqual': 'mp4',
4052+ }
4053+ IE_NAME = u'facebook'
4054+
4055+ def __init__(self, downloader=None):
4056+ InfoExtractor.__init__(self, downloader)
4057+
4058+ def _reporter(self, message):
4059+ """Add header and report message."""
4060+ self._downloader.to_screen(u'[facebook] %s' % message)
4061+
4062+ def report_login(self):
4063+ """Report attempt to log in."""
4064+ self._reporter(u'Logging in')
4065+
4066+ def report_video_webpage_download(self, video_id):
4067+ """Report attempt to download video webpage."""
4068+ self._reporter(u'%s: Downloading video webpage' % video_id)
4069+
4070+ def report_information_extraction(self, video_id):
4071+ """Report attempt to extract video information."""
4072+ self._reporter(u'%s: Extracting video information' % video_id)
4073+
4074+ def _parse_page(self, video_webpage):
4075+ """Extract video information from page"""
4076+ # General data
4077+ data = {'title': r'\("video_title", "(.*?)"\)',
4078+ 'description': r'<div class="datawrap">(.*?)</div>',
4079+ 'owner': r'\("video_owner_name", "(.*?)"\)',
4080+ 'thumbnail': r'\("thumb_url", "(?P<THUMB>.*?)"\)',
4081+ }
4082+ video_info = {}
4083+ for piece in data.keys():
4084+ mobj = re.search(data[piece], video_webpage)
4085+ if mobj is not None:
4086+ video_info[piece] = urllib.unquote_plus(mobj.group(1).decode("unicode_escape"))
4087+
4088+ # Video urls
4089+ video_urls = {}
4090+ for fmt in self._available_formats:
4091+ mobj = re.search(r'\("%s_src\", "(.+?)"\)' % fmt, video_webpage)
4092+ if mobj is not None:
4093+ # URL is in a Javascript segment inside an escaped Unicode format within
4094+ # the generally utf-8 page
4095+ video_urls[fmt] = urllib.unquote_plus(mobj.group(1).decode("unicode_escape"))
4096+ video_info['video_urls'] = video_urls
4097+
4098+ return video_info
4099+
4100+ def _real_initialize(self):
4101+ if self._downloader is None:
4102+ return
4103+
4104+ useremail = None
4105+ password = None
4106+ downloader_params = self._downloader.params
4107+
4108+ # Attempt to use provided username and password or .netrc data
4109+ if downloader_params.get('username', None) is not None:
4110+ useremail = downloader_params['username']
4111+ password = downloader_params['password']
4112+ elif downloader_params.get('usenetrc', False):
4113+ try:
4114+ info = netrc.netrc().authenticators(self._NETRC_MACHINE)
4115+ if info is not None:
4116+ useremail = info[0]
4117+ password = info[2]
4118+ else:
4119+ raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
4120+ except (IOError, netrc.NetrcParseError), err:
4121+ self._downloader.to_stderr(u'WARNING: parsing .netrc: %s' % str(err))
4122+ return
4123+
4124+ if useremail is None:
4125+ return
4126+
4127+ # Log in
4128+ login_form = {
4129+ 'email': useremail,
4130+ 'pass': password,
4131+ 'login': 'Log+In'
4132+ }
4133+ request = urllib2.Request(self._LOGIN_URL, urllib.urlencode(login_form))
4134+ try:
4135+ self.report_login()
4136+ login_results = urllib2.urlopen(request).read()
4137+ if re.search(r'<form(.*)name="login"(.*)</form>', login_results) is not None:
4138+ self._downloader.to_stderr(u'WARNING: unable to log in: bad username/password, or exceeded login rate limit (~3/min). Check credentials or wait.')
4139+ return
4140+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4141+ self._downloader.to_stderr(u'WARNING: unable to log in: %s' % str(err))
4142+ return
4143+
4144+ def _real_extract(self, url):
4145+ mobj = re.match(self._VALID_URL, url)
4146+ if mobj is None:
4147+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
4148+ return
4149+ video_id = mobj.group('ID')
4150+
4151+ # Get video webpage
4152+ self.report_video_webpage_download(video_id)
4153+ request = urllib2.Request('https://www.facebook.com/video/video.php?v=%s' % video_id)
4154+ try:
4155+ page = urllib2.urlopen(request)
4156+ video_webpage = page.read()
4157+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4158+ self._downloader.trouble(u'ERROR: unable to download video webpage: %s' % str(err))
4159+ return
4160+
4161+ # Start extracting information
4162+ self.report_information_extraction(video_id)
4163+
4164+ # Extract information
4165+ video_info = self._parse_page(video_webpage)
4166+
4167+ # uploader
4168+ if 'owner' not in video_info:
4169+ self._downloader.trouble(u'ERROR: unable to extract uploader nickname')
4170+ return
4171+ video_uploader = video_info['owner']
4172+
4173+ # title
4174+ if 'title' not in video_info:
4175+ self._downloader.trouble(u'ERROR: unable to extract video title')
4176+ return
4177+ video_title = video_info['title']
4178+ video_title = video_title.decode('utf-8')
4179+ video_title = sanitize_title(video_title)
4180+
4181+ simple_title = _simplify_title(video_title)
4182+
4183+ # thumbnail image
4184+ if 'thumbnail' not in video_info:
4185+ self._downloader.trouble(u'WARNING: unable to extract video thumbnail')
4186+ video_thumbnail = ''
4187+ else:
4188+ video_thumbnail = video_info['thumbnail']
4189+
4190+ # upload date
4191+ upload_date = u'NA'
4192+ if 'upload_date' in video_info:
4193+ upload_time = video_info['upload_date']
4194+ timetuple = email.utils.parsedate_tz(upload_time)
4195+ if timetuple is not None:
4196+ try:
4197+ upload_date = time.strftime('%Y%m%d', timetuple[0:9])
4198+ except:
4199+ pass
4200+
4201+ # description
4202+ video_description = video_info.get('description', 'No description available.')
4203+
4204+ url_map = video_info['video_urls']
4205+ if len(url_map.keys()) > 0:
4206+ # Decide which formats to download
4207+ req_format = self._downloader.params.get('format', None)
4208+ format_limit = self._downloader.params.get('format_limit', None)
4209+
4210+ if format_limit is not None and format_limit in self._available_formats:
4211+ format_list = self._available_formats[self._available_formats.index(format_limit):]
4212+ else:
4213+ format_list = self._available_formats
4214+ existing_formats = [x for x in format_list if x in url_map]
4215+ if len(existing_formats) == 0:
4216+ self._downloader.trouble(u'ERROR: no known formats available for video')
4217+ return
4218+ if req_format is None:
4219+ video_url_list = [(existing_formats[0], url_map[existing_formats[0]])] # Best quality
4220+ elif req_format == 'worst':
4221+ video_url_list = [(existing_formats[-1], url_map[existing_formats[-1]])] # worst quality
4222+ elif req_format == '-1':
4223+ video_url_list = [(f, url_map[f]) for f in existing_formats] # All formats
4224+ else:
4225+ # Specific format
4226+ if req_format not in url_map:
4227+ self._downloader.trouble(u'ERROR: requested format not available')
4228+ return
4229+ video_url_list = [(req_format, url_map[req_format])] # Specific format
4230+
4231+ for format_param, video_real_url in video_url_list:
4232+
4233+ # At this point we have a new video
4234+ self._downloader.increment_downloads()
4235+
4236+ # Extension
4237+ video_extension = self._video_extensions.get(format_param, 'mp4')
4238+
4239+ try:
4240+ # Process video information
4241+ self._downloader.process_info({
4242+ 'id': video_id.decode('utf-8'),
4243+ 'url': video_real_url.decode('utf-8'),
4244+ 'uploader': video_uploader.decode('utf-8'),
4245+ 'upload_date': upload_date,
4246+ 'title': video_title,
4247+ 'stitle': simple_title,
4248+ 'ext': video_extension.decode('utf-8'),
4249+ 'format': (format_param is None and u'NA' or format_param.decode('utf-8')),
4250+ 'thumbnail': video_thumbnail.decode('utf-8'),
4251+ 'description': video_description.decode('utf-8'),
4252+ 'player_url': None,
4253+ })
4254+ except UnavailableVideoError, err:
4255+ self._downloader.trouble(u'\nERROR: unable to download video')
4256+
4257+class BlipTVIE(InfoExtractor):
4258+ """Information extractor for blip.tv"""
4259+
4260+ _VALID_URL = r'^(?:https?://)?(?:\w+\.)?blip\.tv(/.+)$'
4261+ _URL_EXT = r'^.*\.([a-z0-9]+)$'
4262+ IE_NAME = u'blip.tv'
4263+
4264+ def report_extraction(self, file_id):
4265+ """Report information extraction."""
4266+ self._downloader.to_screen(u'[%s] %s: Extracting information' % (self.IE_NAME, file_id))
4267+
4268+ def report_direct_download(self, title):
4269+ """Report direct download detection."""
4270+ self._downloader.to_screen(u'[%s] %s: Direct download detected' % (self.IE_NAME, title))
4271+
4272+ def _real_extract(self, url):
4273+ mobj = re.match(self._VALID_URL, url)
4274+ if mobj is None:
4275+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
4276+ return
4277+
4278+ if '?' in url:
4279+ cchar = '&'
4280+ else:
4281+ cchar = '?'
4282+ json_url = url + cchar + 'skin=json&version=2&no_wrap=1'
4283+ request = urllib2.Request(json_url)
4284+ self.report_extraction(mobj.group(1))
4285+ info = None
4286+ try:
4287+ urlh = urllib2.urlopen(request)
4288+ if urlh.headers.get('Content-Type', '').startswith('video/'): # Direct download
4289+ basename = url.split('/')[-1]
4290+ title,ext = os.path.splitext(basename)
4291+ title = title.decode('UTF-8')
4292+ ext = ext.replace('.', '')
4293+ self.report_direct_download(title)
4294+ info = {
4295+ 'id': title,
4296+ 'url': url,
4297+ 'title': title,
4298+ 'stitle': _simplify_title(title),
4299+ 'ext': ext,
4300+ 'urlhandle': urlh
4301+ }
4302+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4303+ self._downloader.trouble(u'ERROR: unable to download video info webpage: %s' % str(err))
4304+ return
4305+ if info is None: # Regular URL
4306+ try:
4307+ json_code = urlh.read()
4308+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4309+ self._downloader.trouble(u'ERROR: unable to read video info webpage: %s' % str(err))
4310+ return
4311+
4312+ try:
4313+ json_data = json.loads(json_code)
4314+ if 'Post' in json_data:
4315+ data = json_data['Post']
4316+ else:
4317+ data = json_data
4318+
4319+ upload_date = datetime.datetime.strptime(data['datestamp'], '%m-%d-%y %H:%M%p').strftime('%Y%m%d')
4320+ video_url = data['media']['url']
4321+ umobj = re.match(self._URL_EXT, video_url)
4322+ if umobj is None:
4323+ raise ValueError('Can not determine filename extension')
4324+ ext = umobj.group(1)
4325+
4326+ info = {
4327+ 'id': data['item_id'],
4328+ 'url': video_url,
4329+ 'uploader': data['display_name'],
4330+ 'upload_date': upload_date,
4331+ 'title': data['title'],
4332+ 'stitle': _simplify_title(data['title']),
4333+ 'ext': ext,
4334+ 'format': data['media']['mimeType'],
4335+ 'thumbnail': data['thumbnailUrl'],
4336+ 'description': data['description'],
4337+ 'player_url': data['embedUrl']
4338+ }
4339+ except (ValueError,KeyError), err:
4340+ self._downloader.trouble(u'ERROR: unable to parse video information: %s' % repr(err))
4341+ return
4342+
4343+ self._downloader.increment_downloads()
4344+
4345+ try:
4346+ self._downloader.process_info(info)
4347+ except UnavailableVideoError, err:
4348+ self._downloader.trouble(u'\nERROR: unable to download video')
4349+
4350+
4351+class MyVideoIE(InfoExtractor):
4352+ """Information Extractor for myvideo.de."""
4353+
4354+ _VALID_URL = r'(?:http://)?(?:www\.)?myvideo\.de/watch/([0-9]+)/([^?/]+).*'
4355+ IE_NAME = u'myvideo'
4356+
4357+ def __init__(self, downloader=None):
4358+ InfoExtractor.__init__(self, downloader)
4359+
4360+ def report_download_webpage(self, video_id):
4361+ """Report webpage download."""
4362+ self._downloader.to_screen(u'[myvideo] %s: Downloading webpage' % video_id)
4363+
4364+ def report_extraction(self, video_id):
4365+ """Report information extraction."""
4366+ self._downloader.to_screen(u'[myvideo] %s: Extracting information' % video_id)
4367+
4368+ def _real_extract(self,url):
4369+ mobj = re.match(self._VALID_URL, url)
4370+ if mobj is None:
4371+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
4372+ return
4373+
4374+ video_id = mobj.group(1)
4375+
4376+ # Get video webpage
4377+ request = urllib2.Request('http://www.myvideo.de/watch/%s' % video_id)
4378+ try:
4379+ self.report_download_webpage(video_id)
4380+ webpage = urllib2.urlopen(request).read()
4381+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4382+ self._downloader.trouble(u'ERROR: Unable to retrieve video webpage: %s' % str(err))
4383+ return
4384+
4385+ self.report_extraction(video_id)
4386+ mobj = re.search(r'<link rel=\'image_src\' href=\'(http://is[0-9]\.myvideo\.de/de/movie[0-9]+/[a-f0-9]+)/thumbs/[^.]+\.jpg\' />',
4387+ webpage)
4388+ if mobj is None:
4389+ self._downloader.trouble(u'ERROR: unable to extract media URL')
4390+ return
4391+ video_url = mobj.group(1) + ('/%s.flv' % video_id)
4392+
4393+ mobj = re.search('<title>([^<]+)</title>', webpage)
4394+ if mobj is None:
4395+ self._downloader.trouble(u'ERROR: unable to extract title')
4396+ return
4397+
4398+ video_title = mobj.group(1)
4399+ video_title = sanitize_title(video_title)
4400+
4401+ simple_title = _simplify_title(video_title)
4402+
4403+ try:
4404+ self._downloader.process_info({
4405+ 'id': video_id,
4406+ 'url': video_url,
4407+ 'uploader': u'NA',
4408+ 'upload_date': u'NA',
4409+ 'title': video_title,
4410+ 'stitle': simple_title,
4411+ 'ext': u'flv',
4412+ 'format': u'NA',
4413+ 'player_url': None,
4414+ })
4415+ except UnavailableVideoError:
4416+ self._downloader.trouble(u'\nERROR: Unable to download video')
4417+
4418+class ComedyCentralIE(InfoExtractor):
4419+ """Information extractor for The Daily Show and Colbert Report """
4420+
4421+ _VALID_URL = r'^(:(?P<shortname>tds|thedailyshow|cr|colbert|colbertnation|colbertreport))|(https?://)?(www\.)?(?P<showname>thedailyshow|colbertnation)\.com/full-episodes/(?P<episode>.*)$'
4422+ IE_NAME = u'comedycentral'
4423+
4424+ def report_extraction(self, episode_id):
4425+ self._downloader.to_screen(u'[comedycentral] %s: Extracting information' % episode_id)
4426+
4427+ def report_config_download(self, episode_id):
4428+ self._downloader.to_screen(u'[comedycentral] %s: Downloading configuration' % episode_id)
4429+
4430+ def report_index_download(self, episode_id):
4431+ self._downloader.to_screen(u'[comedycentral] %s: Downloading show index' % episode_id)
4432+
4433+ def report_player_url(self, episode_id):
4434+ self._downloader.to_screen(u'[comedycentral] %s: Determining player URL' % episode_id)
4435+
4436+ def _real_extract(self, url):
4437+ mobj = re.match(self._VALID_URL, url)
4438+ if mobj is None:
4439+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
4440+ return
4441+
4442+ if mobj.group('shortname'):
4443+ if mobj.group('shortname') in ('tds', 'thedailyshow'):
4444+ url = u'http://www.thedailyshow.com/full-episodes/'
4445+ else:
4446+ url = u'http://www.colbertnation.com/full-episodes/'
4447+ mobj = re.match(self._VALID_URL, url)
4448+ assert mobj is not None
4449+
4450+ dlNewest = not mobj.group('episode')
4451+ if dlNewest:
4452+ epTitle = mobj.group('showname')
4453+ else:
4454+ epTitle = mobj.group('episode')
4455+
4456+ req = urllib2.Request(url)
4457+ self.report_extraction(epTitle)
4458+ try:
4459+ htmlHandle = urllib2.urlopen(req)
4460+ html = htmlHandle.read()
4461+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4462+ self._downloader.trouble(u'ERROR: unable to download webpage: %s' % unicode(err))
4463+ return
4464+ if dlNewest:
4465+ url = htmlHandle.geturl()
4466+ mobj = re.match(self._VALID_URL, url)
4467+ if mobj is None:
4468+ self._downloader.trouble(u'ERROR: Invalid redirected URL: ' + url)
4469+ return
4470+ if mobj.group('episode') == '':
4471+ self._downloader.trouble(u'ERROR: Redirected URL is still not specific: ' + url)
4472+ return
4473+ epTitle = mobj.group('episode')
4474+
4475+ mMovieParams = re.findall('(?:<param name="movie" value="|var url = ")(http://media.mtvnservices.com/([^"]*episode.*?:.*?))"', html)
4476+ if len(mMovieParams) == 0:
4477+ self._downloader.trouble(u'ERROR: unable to find Flash URL in webpage ' + url)
4478+ return
4479+
4480+ playerUrl_raw = mMovieParams[0][0]
4481+ self.report_player_url(epTitle)
4482+ try:
4483+ urlHandle = urllib2.urlopen(playerUrl_raw)
4484+ playerUrl = urlHandle.geturl()
4485+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4486+ self._downloader.trouble(u'ERROR: unable to find out player URL: ' + unicode(err))
4487+ return
4488+
4489+ uri = mMovieParams[0][1]
4490+ indexUrl = 'http://shadow.comedycentral.com/feeds/video_player/mrss/?' + urllib.urlencode({'uri': uri})
4491+ self.report_index_download(epTitle)
4492+ try:
4493+ indexXml = urllib2.urlopen(indexUrl).read()
4494+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4495+ self._downloader.trouble(u'ERROR: unable to download episode index: ' + unicode(err))
4496+ return
4497+
4498+ idoc = xml.etree.ElementTree.fromstring(indexXml)
4499+ itemEls = idoc.findall('.//item')
4500+ for itemEl in itemEls:
4501+ mediaId = itemEl.findall('./guid')[0].text
4502+ shortMediaId = mediaId.split(':')[-1]
4503+ showId = mediaId.split(':')[-2].replace('.com', '')
4504+ officialTitle = itemEl.findall('./title')[0].text
4505+ officialDate = itemEl.findall('./pubDate')[0].text
4506+
4507+ configUrl = ('http://www.comedycentral.com/global/feeds/entertainment/media/mediaGenEntertainment.jhtml?' +
4508+ urllib.urlencode({'uri': mediaId}))
4509+ configReq = urllib2.Request(configUrl)
4510+ self.report_config_download(epTitle)
4511+ try:
4512+ configXml = urllib2.urlopen(configReq).read()
4513+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4514+ self._downloader.trouble(u'ERROR: unable to download webpage: %s' % unicode(err))
4515+ return
4516+
4517+ cdoc = xml.etree.ElementTree.fromstring(configXml)
4518+ turls = []
4519+ for rendition in cdoc.findall('.//rendition'):
4520+ finfo = (rendition.attrib['bitrate'], rendition.findall('./src')[0].text)
4521+ turls.append(finfo)
4522+
4523+ if len(turls) == 0:
4524+ self._downloader.trouble(u'\nERROR: unable to download ' + mediaId + ': No videos found')
4525+ continue
4526+
4527+ # For now, just pick the highest bitrate
4528+ format,video_url = turls[-1]
4529+
4530+ self._downloader.increment_downloads()
4531+
4532+ effTitle = showId + u'-' + epTitle
4533+ info = {
4534+ 'id': shortMediaId,
4535+ 'url': video_url,
4536+ 'uploader': showId,
4537+ 'upload_date': officialDate,
4538+ 'title': effTitle,
4539+ 'stitle': _simplify_title(effTitle),
4540+ 'ext': 'mp4',
4541+ 'format': format,
4542+ 'thumbnail': None,
4543+ 'description': officialTitle,
4544+ 'player_url': playerUrl
4545+ }
4546+
4547+ try:
4548+ self._downloader.process_info(info)
4549+ except UnavailableVideoError, err:
4550+ self._downloader.trouble(u'\nERROR: unable to download ' + mediaId)
4551+ continue
4552+
4553+
4554+class EscapistIE(InfoExtractor):
4555+ """Information extractor for The Escapist """
4556+
4557+ _VALID_URL = r'^(https?://)?(www\.)?escapistmagazine\.com/videos/view/(?P<showname>[^/]+)/(?P<episode>[^/?]+)[/?]?.*$'
4558+ IE_NAME = u'escapist'
4559+
4560+ def report_extraction(self, showName):
4561+ self._downloader.to_screen(u'[escapist] %s: Extracting information' % showName)
4562+
4563+ def report_config_download(self, showName):
4564+ self._downloader.to_screen(u'[escapist] %s: Downloading configuration' % showName)
4565+
4566+ def _real_extract(self, url):
4567+ htmlParser = HTMLParser.HTMLParser()
4568+
4569+ mobj = re.match(self._VALID_URL, url)
4570+ if mobj is None:
4571+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
4572+ return
4573+ showName = mobj.group('showname')
4574+ videoId = mobj.group('episode')
4575+
4576+ self.report_extraction(showName)
4577+ try:
4578+ webPage = urllib2.urlopen(url).read()
4579+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4580+ self._downloader.trouble(u'ERROR: unable to download webpage: ' + unicode(err))
4581+ return
4582+
4583+ descMatch = re.search('<meta name="description" content="([^"]*)"', webPage)
4584+ description = htmlParser.unescape(descMatch.group(1))
4585+ imgMatch = re.search('<meta property="og:image" content="([^"]*)"', webPage)
4586+ imgUrl = htmlParser.unescape(imgMatch.group(1))
4587+ playerUrlMatch = re.search('<meta property="og:video" content="([^"]*)"', webPage)
4588+ playerUrl = htmlParser.unescape(playerUrlMatch.group(1))
4589+ configUrlMatch = re.search('config=(.*)$', playerUrl)
4590+ configUrl = urllib2.unquote(configUrlMatch.group(1))
4591+
4592+ self.report_config_download(showName)
4593+ try:
4594+ configJSON = urllib2.urlopen(configUrl).read()
4595+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4596+ self._downloader.trouble(u'ERROR: unable to download configuration: ' + unicode(err))
4597+ return
4598+
4599+ # Technically, it's JavaScript, not JSON
4600+ configJSON = configJSON.replace("'", '"')
4601+
4602+ try:
4603+ config = json.loads(configJSON)
4604+ except (ValueError,), err:
4605+ self._downloader.trouble(u'ERROR: Invalid JSON in configuration file: ' + unicode(err))
4606+ return
4607+
4608+ playlist = config['playlist']
4609+ videoUrl = playlist[1]['url']
4610+
4611+ self._downloader.increment_downloads()
4612+ info = {
4613+ 'id': videoId,
4614+ 'url': videoUrl,
4615+ 'uploader': showName,
4616+ 'upload_date': None,
4617+ 'title': showName,
4618+ 'stitle': _simplify_title(showName),
4619+ 'ext': 'flv',
4620+ 'format': 'flv',
4621+ 'thumbnail': imgUrl,
4622+ 'description': description,
4623+ 'player_url': playerUrl,
4624+ }
4625+
4626+ try:
4627+ self._downloader.process_info(info)
4628+ except UnavailableVideoError, err:
4629+ self._downloader.trouble(u'\nERROR: unable to download ' + videoId)
4630+
4631+
4632+class CollegeHumorIE(InfoExtractor):
4633+ """Information extractor for collegehumor.com"""
4634+
4635+ _VALID_URL = r'^(?:https?://)?(?:www\.)?collegehumor\.com/video/(?P<videoid>[0-9]+)/(?P<shorttitle>.*)$'
4636+ IE_NAME = u'collegehumor'
4637+
4638+ def report_webpage(self, video_id):
4639+ """Report webpage download."""
4640+ self._downloader.to_screen(u'[%s] %s: Downloading webpage' % (self.IE_NAME, video_id))
4641+
4642+ def report_extraction(self, video_id):
4643+ """Report information extraction."""
4644+ self._downloader.to_screen(u'[%s] %s: Extracting information' % (self.IE_NAME, video_id))
4645+
4646+ def _real_extract(self, url):
4647+ htmlParser = HTMLParser.HTMLParser()
4648+
4649+ mobj = re.match(self._VALID_URL, url)
4650+ if mobj is None:
4651+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
4652+ return
4653+ video_id = mobj.group('videoid')
4654+
4655+ self.report_webpage(video_id)
4656+ request = urllib2.Request(url)
4657+ try:
4658+ webpage = urllib2.urlopen(request).read()
4659+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4660+ self._downloader.trouble(u'ERROR: unable to download video webpage: %s' % str(err))
4661+ return
4662+
4663+ m = re.search(r'id="video:(?P<internalvideoid>[0-9]+)"', webpage)
4664+ if m is None:
4665+ self._downloader.trouble(u'ERROR: Cannot extract internal video ID')
4666+ return
4667+ internal_video_id = m.group('internalvideoid')
4668+
4669+ info = {
4670+ 'id': video_id,
4671+ 'internal_id': internal_video_id,
4672+ }
4673+
4674+ self.report_extraction(video_id)
4675+ xmlUrl = 'http://www.collegehumor.com/moogaloop/video:' + internal_video_id
4676+ try:
4677+ metaXml = urllib2.urlopen(xmlUrl).read()
4678+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4679+ self._downloader.trouble(u'ERROR: unable to download video info XML: %s' % str(err))
4680+ return
4681+
4682+ mdoc = xml.etree.ElementTree.fromstring(metaXml)
4683+ try:
4684+ videoNode = mdoc.findall('./video')[0]
4685+ info['description'] = videoNode.findall('./description')[0].text
4686+ info['title'] = videoNode.findall('./caption')[0].text
4687+ info['stitle'] = _simplify_title(info['title'])
4688+ info['url'] = videoNode.findall('./file')[0].text
4689+ info['thumbnail'] = videoNode.findall('./thumbnail')[0].text
4690+ info['ext'] = info['url'].rpartition('.')[2]
4691+ info['format'] = info['ext']
4692+ except IndexError:
4693+ self._downloader.trouble(u'\nERROR: Invalid metadata XML file')
4694+ return
4695+
4696+ self._downloader.increment_downloads()
4697+
4698+ try:
4699+ self._downloader.process_info(info)
4700+ except UnavailableVideoError, err:
4701+ self._downloader.trouble(u'\nERROR: unable to download video')
4702+
4703+
4704+class XVideosIE(InfoExtractor):
4705+ """Information extractor for xvideos.com"""
4706+
4707+ _VALID_URL = r'^(?:https?://)?(?:www\.)?xvideos\.com/video([0-9]+)(?:.*)'
4708+ IE_NAME = u'xvideos'
4709+
4710+ def report_webpage(self, video_id):
4711+ """Report webpage download."""
4712+ self._downloader.to_screen(u'[%s] %s: Downloading webpage' % (self.IE_NAME, video_id))
4713+
4714+ def report_extraction(self, video_id):
4715+ """Report information extraction."""
4716+ self._downloader.to_screen(u'[%s] %s: Extracting information' % (self.IE_NAME, video_id))
4717+
4718+ def _real_extract(self, url):
4719+ htmlParser = HTMLParser.HTMLParser()
4720+
4721+ mobj = re.match(self._VALID_URL, url)
4722+ if mobj is None:
4723+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
4724+ return
4725+ video_id = mobj.group(1).decode('utf-8')
4726+
4727+ self.report_webpage(video_id)
4728+
4729+ request = urllib2.Request(r'http://www.xvideos.com/video' + video_id)
4730+ try:
4731+ webpage = urllib2.urlopen(request).read()
4732+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4733+ self._downloader.trouble(u'ERROR: unable to download video webpage: %s' % str(err))
4734+ return
4735+
4736+ self.report_extraction(video_id)
4737+
4738+
4739+ # Extract video URL
4740+ mobj = re.search(r'flv_url=(.+?)&', webpage)
4741+ if mobj is None:
4742+ self._downloader.trouble(u'ERROR: unable to extract video url')
4743+ return
4744+ video_url = urllib2.unquote(mobj.group(1).decode('utf-8'))
4745+
4746+
4747+ # Extract title
4748+ mobj = re.search(r'<title>(.*?)\s+-\s+XVID', webpage)
4749+ if mobj is None:
4750+ self._downloader.trouble(u'ERROR: unable to extract video title')
4751+ return
4752+ video_title = mobj.group(1).decode('utf-8')
4753+
4754+
4755+ # Extract video thumbnail
4756+ mobj = re.search(r'http://(?:img.*?\.)xvideos.com/videos/thumbs/[a-fA-F0-9]/[a-fA-F0-9]/[a-fA-F0-9]/([a-fA-F0-9.]+jpg)', webpage)
4757+ if mobj is None:
4758+ self._downloader.trouble(u'ERROR: unable to extract video thumbnail')
4759+ return
4760+ video_thumbnail = mobj.group(1).decode('utf-8')
4761+
4762+
4763+
4764+ self._downloader.increment_downloads()
4765+ info = {
4766+ 'id': video_id,
4767+ 'url': video_url,
4768+ 'uploader': None,
4769+ 'upload_date': None,
4770+ 'title': video_title,
4771+ 'stitle': _simplify_title(video_title),
4772+ 'ext': 'flv',
4773+ 'format': 'flv',
4774+ 'thumbnail': video_thumbnail,
4775+ 'description': None,
4776+ 'player_url': None,
4777+ }
4778+
4779+ try:
4780+ self._downloader.process_info(info)
4781+ except UnavailableVideoError, err:
4782+ self._downloader.trouble(u'\nERROR: unable to download ' + video_id)
4783+
4784+
4785+class SoundcloudIE(InfoExtractor):
4786+ """Information extractor for soundcloud.com
4787+ To access the media, the uid of the song and a stream token
4788+ must be extracted from the page source and the script must make
4789+ a request to media.soundcloud.com/crossdomain.xml. Then
4790+ the media can be grabbed by requesting from an url composed
4791+ of the stream token and uid
4792+ """
4793+
4794+ _VALID_URL = r'^(?:https?://)?(?:www\.)?soundcloud\.com/([\w\d-]+)/([\w\d-]+)'
4795+ IE_NAME = u'soundcloud'
4796+
4797+ def __init__(self, downloader=None):
4798+ InfoExtractor.__init__(self, downloader)
4799+
4800+ def report_webpage(self, video_id):
4801+ """Report webpage download."""
4802+ self._downloader.to_screen(u'[%s] %s: Downloading webpage' % (self.IE_NAME, video_id))
4803+
4804+ def report_extraction(self, video_id):
4805+ """Report information extraction."""
4806+ self._downloader.to_screen(u'[%s] %s: Extracting information' % (self.IE_NAME, video_id))
4807+
4808+ def _real_extract(self, url):
4809+ htmlParser = HTMLParser.HTMLParser()
4810+
4811+ mobj = re.match(self._VALID_URL, url)
4812+ if mobj is None:
4813+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
4814+ return
4815+
4816+ # extract uploader (which is in the url)
4817+ uploader = mobj.group(1).decode('utf-8')
4818+ # extract simple title (uploader + slug of song title)
4819+ slug_title = mobj.group(2).decode('utf-8')
4820+ simple_title = uploader + '-' + slug_title
4821+
4822+ self.report_webpage('%s/%s' % (uploader, slug_title))
4823+
4824+ request = urllib2.Request('http://soundcloud.com/%s/%s' % (uploader, slug_title))
4825+ try:
4826+ webpage = urllib2.urlopen(request).read()
4827+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4828+ self._downloader.trouble(u'ERROR: unable to download video webpage: %s' % str(err))
4829+ return
4830+
4831+ self.report_extraction('%s/%s' % (uploader, slug_title))
4832+
4833+ # extract uid and stream token that soundcloud hands out for access
4834+ mobj = re.search('"uid":"([\w\d]+?)".*?stream_token=([\w\d]+)', webpage)
4835+ if mobj:
4836+ video_id = mobj.group(1)
4837+ stream_token = mobj.group(2)
4838+
4839+ # extract unsimplified title
4840+ mobj = re.search('"title":"(.*?)",', webpage)
4841+ if mobj:
4842+ title = mobj.group(1)
4843+
4844+ # construct media url (with uid/token)
4845+ mediaURL = "http://media.soundcloud.com/stream/%s?stream_token=%s"
4846+ mediaURL = mediaURL % (video_id, stream_token)
4847+
4848+ # description
4849+ description = u'No description available'
4850+ mobj = re.search('track-description-value"><p>(.*?)</p>', webpage)
4851+ if mobj:
4852+ description = mobj.group(1)
4853+
4854+ # upload date
4855+ upload_date = None
4856+ mobj = re.search("pretty-date'>on ([\w]+ [\d]+, [\d]+ \d+:\d+)</abbr></h2>", webpage)
4857+ if mobj:
4858+ try:
4859+ upload_date = datetime.datetime.strptime(mobj.group(1), '%B %d, %Y %H:%M').strftime('%Y%m%d')
4860+ except Exception, e:
4861+ self._downloader.to_stderr(str(e))
4862+
4863+ # for soundcloud, a request to a cross domain is required for cookies
4864+ request = urllib2.Request('http://media.soundcloud.com/crossdomain.xml', None, std_headers)
4865+
4866+ try:
4867+ self._downloader.process_info({
4868+ 'id': video_id.decode('utf-8'),
4869+ 'url': mediaURL,
4870+ 'uploader': uploader.decode('utf-8'),
4871+ 'upload_date': upload_date,
4872+ 'title': simple_title.decode('utf-8'),
4873+ 'stitle': simple_title.decode('utf-8'),
4874+ 'ext': u'mp3',
4875+ 'format': u'NA',
4876+ 'player_url': None,
4877+ 'description': description.decode('utf-8')
4878+ })
4879+ except UnavailableVideoError:
4880+ self._downloader.trouble(u'\nERROR: unable to download video')
4881+
4882+
4883+class InfoQIE(InfoExtractor):
4884+ """Information extractor for infoq.com"""
4885+
4886+ _VALID_URL = r'^(?:https?://)?(?:www\.)?infoq\.com/[^/]+/[^/]+$'
4887+ IE_NAME = u'infoq'
4888+
4889+ def report_webpage(self, video_id):
4890+ """Report webpage download."""
4891+ self._downloader.to_screen(u'[%s] %s: Downloading webpage' % (self.IE_NAME, video_id))
4892+
4893+ def report_extraction(self, video_id):
4894+ """Report information extraction."""
4895+ self._downloader.to_screen(u'[%s] %s: Extracting information' % (self.IE_NAME, video_id))
4896+
4897+ def _real_extract(self, url):
4898+ htmlParser = HTMLParser.HTMLParser()
4899+
4900+ mobj = re.match(self._VALID_URL, url)
4901+ if mobj is None:
4902+ self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
4903+ return
4904+
4905+ self.report_webpage(url)
4906+
4907+ request = urllib2.Request(url)
4908+ try:
4909+ webpage = urllib2.urlopen(request).read()
4910+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4911+ self._downloader.trouble(u'ERROR: unable to download video webpage: %s' % str(err))
4912+ return
4913+
4914+ self.report_extraction(url)
4915+
4916+
4917+ # Extract video URL
4918+ mobj = re.search(r"jsclassref='([^']*)'", webpage)
4919+ if mobj is None:
4920+ self._downloader.trouble(u'ERROR: unable to extract video url')
4921+ return
4922+ video_url = 'rtmpe://video.infoq.com/cfx/st/' + urllib2.unquote(mobj.group(1).decode('base64'))
4923+
4924+
4925+ # Extract title
4926+ mobj = re.search(r'contentTitle = "(.*?)";', webpage)
4927+ if mobj is None:
4928+ self._downloader.trouble(u'ERROR: unable to extract video title')
4929+ return
4930+ video_title = mobj.group(1).decode('utf-8')
4931+
4932+ # Extract description
4933+ video_description = u'No description available.'
4934+ mobj = re.search(r'<meta name="description" content="(.*)"(?:\s*/)?>', webpage)
4935+ if mobj is not None:
4936+ video_description = mobj.group(1).decode('utf-8')
4937+
4938+ video_filename = video_url.split('/')[-1]
4939+ video_id, extension = video_filename.rsplit('.', 1)
4940+
4941+ self._downloader.increment_downloads()
4942+ info = {
4943+ 'id': video_id,
4944+ 'url': video_url,
4945+ 'uploader': None,
4946+ 'upload_date': None,
4947+ 'title': video_title,
4948+ 'stitle': _simplify_title(video_title),
4949+ 'ext': extension,
4950+ 'format': extension, # Extension is always(?) mp4, but seems to be flv
4951+ 'thumbnail': None,
4952+ 'description': video_description,
4953+ 'player_url': None,
4954+ }
4955+
4956+ try:
4957+ self._downloader.process_info(info)
4958+ except UnavailableVideoError, err:
4959+ self._downloader.trouble(u'\nERROR: unable to download ' + video_url)
4960+
4961+class MixcloudIE(InfoExtractor):
4962+ """Information extractor for www.mixcloud.com"""
4963+ _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([\w\d-]+)/([\w\d-]+)'
4964+ IE_NAME = u'mixcloud'
4965+
4966+ def __init__(self, downloader=None):
4967+ InfoExtractor.__init__(self, downloader)
4968+
4969+ def report_download_json(self, file_id):
4970+ """Report JSON download."""
4971+ self._downloader.to_screen(u'[%s] Downloading json' % self.IE_NAME)
4972+
4973+ def report_extraction(self, file_id):
4974+ """Report information extraction."""
4975+ self._downloader.to_screen(u'[%s] %s: Extracting information' % (self.IE_NAME, file_id))
4976+
4977+ def get_urls(self, jsonData, fmt, bitrate='best'):
4978+ """Get urls from 'audio_formats' section in json"""
4979+ file_url = None
4980+ try:
4981+ bitrate_list = jsonData[fmt]
4982+ if bitrate is None or bitrate == 'best' or bitrate not in bitrate_list:
4983+ bitrate = max(bitrate_list) # select highest
4984+
4985+ url_list = jsonData[fmt][bitrate]
4986+ except TypeError: # we have no bitrate info.
4987+ url_list = jsonData[fmt]
4988+
4989+ return url_list
4990+
4991+ def check_urls(self, url_list):
4992+ """Return the first reachable URL from the list, or None."""
4993+ for url in url_list:
4994+ try:
4995+ urllib2.urlopen(url)
4996+ return url
4997+ except (urllib2.URLError, httplib.HTTPException, socket.error), err:
4998+ url = None
4999+
5000+ return None
The diff has been truncated for viewing.
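
A note on the bitrate selection in MixcloudIE.get_urls shown in the diff: it relies on max() over the bitrate keys and on catching TypeError when no bitrate information is present. The sketch below shows the same selection with the two cases made explicit. It is only an illustration, not the code in the branch: the function name select_urls, the sample data, and the assumption that bitrate keys are numeric strings (so they are compared with key=int rather than lexicographically) are all hypothetical.

```python
def select_urls(audio_formats, fmt, bitrate='best'):
    """Return the URL list for fmt, preferring the requested bitrate.

    Hypothetical helper: assumes audio_formats[fmt] is either a dict
    mapping bitrate strings to URL lists, or a plain list of URLs.
    """
    entry = audio_formats[fmt]
    if not isinstance(entry, dict):
        # No bitrate information: the entry is already a list of URLs.
        return entry
    if bitrate is None or bitrate == 'best' or bitrate not in entry:
        # Assumption: keys are numeric strings, so compare them as integers.
        # A plain max() on strings would rank '64' above '320'.
        bitrate = max(entry, key=int)
    return entry[bitrate]


# Made-up sample data, for illustration only.
sample = {
    'mp3': {'64': ['http://example.invalid/a'], '320': ['http://example.invalid/b']},
    'ogg': ['http://example.invalid/c', 'http://example.invalid/d'],
}
best_mp3 = select_urls(sample, 'mp3')
low_mp3 = select_urls(sample, 'mp3', '64')
ogg_urls = select_urls(sample, 'ogg')
```

With this shape, an unknown bitrate falls back to the highest available one, and a format without bitrate keys returns its URL list unchanged.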
