Merge lp:~steffensiebert/calibre/trunk into lp:calibre

Proposed by Steffen Siebert
Status: Needs review
Proposed branch: lp:~steffensiebert/calibre/trunk
Merge into: lp:calibre
Diff against target: 302 lines (+184/-31)
6 files modified
resources/recipes/navigationtest.recipe (+97/-0)
src/calibre/ebooks/conversion/plumber.py (+9/-0)
src/calibre/ebooks/epub/output.py (+2/-1)
src/calibre/ebooks/oeb/transforms/filenames.py (+55/-12)
src/calibre/web/feeds/input.py (+4/-1)
src/calibre/web/feeds/news.py (+17/-17)
To merge this branch: bzr merge lp:~steffensiebert/calibre/trunk
Reviewer | Review Type | Date Requested | Status
Kovid Goyal | | | Needs Resubmitting
Review via email: mp+42727@code.launchpad.net

Description of the change

Implements http://bugs.calibre-ebook.com/ticket/7788 (Flatten content of EPUB created by recipes to make them more compatible) and http://bugs.calibre-ebook.com/ticket/7789 (Enable Recipes to download EPUBs unmodified by calibre)
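The flattening idea behind ticket 7788 can be sketched in isolation. The following is only an illustrative approximation, not the code from this branch: `flatten_hrefs` is a hypothetical helper that works on plain href strings, and the `_1`, `_2` collision suffix is an assumption made here (the branch itself delegates uniqueness to `oeb.manifest.generate`).

```python
import posixpath


def flatten_hrefs(hrefs):
    """Return a map from hrefs with directories to unique flat filenames.

    Hypothetical stand-alone sketch; operates on strings, not manifest items.
    """
    # Names that are already flat are reserved so nothing is renamed onto them.
    taken = set(h for h in hrefs if "/" not in h)
    rename_map = {}
    for href in hrefs:
        if "/" not in href:
            continue  # already flat, nothing to do
        flat = href.replace("/", "_")
        stem, ext = posixpath.splitext(flat)
        candidate, n = flat, 0
        # Assumed collision scheme: append _1, _2, ... before the extension.
        while candidate in taken:
            n += 1
            candidate = "%s_%d%s" % (stem, n, ext)
        taken.add(candidate)
        rename_map[href] = candidate
    return rename_map


if __name__ == "__main__":
    print(flatten_hrefs(["a/b/c/index.html", "img/x.jpg", "cover.jpg"]))
```

Every link, TOC entry and spine reference that points at a renamed file then has to be rewritten through the same map, which is the job `RenameFiles` does in the diff.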

lp:~steffensiebert/calibre/trunk updated
7037. By steffen.siebert <steffen.siebert@X201>

CHANGED: Undo commit 7035.

7038. By steffen.siebert <steffen.siebert@X201>

Merge from trunk

7039. By steffen.siebert <steffen.siebert@X201>

NEW: Recipe navigationtest added to create dummy ebook for navigation testing.

7040. By Steffen Siebert

FIXED: Wrong indentation.

7041. By Steffen Siebert

CHANGED: Flatten EPUB content to fix issues with some EPUB readers like FBReaderJ.

7042. By Steffen Siebert

CHANGED: Class which creates flat EPUB output renamed to FlatFilenames. Make original class UniqueFilenames available again as an alternative.

Revision history for this message
Kovid Goyal (kovid) wrote :

I've merged the flatten file transform. The rest of this branch is not going to be merged.

review: Needs Resubmitting

Unmerged revisions

7042. By Steffen Siebert

CHANGED: Class which creates flat EPUB output renamed to FlatFilenames. Make original class UniqueFilenames available again as an alternative.

7041. By Steffen Siebert

CHANGED: Flatten EPUB content to fix issues with some EPUB readers like FBReaderJ.

7040. By Steffen Siebert

FIXED: Wrong indentation.

7039. By steffen.siebert <steffen.siebert@X201>

NEW: Recipe navigationtest added to create dummy ebook for navigation testing.

7038. By steffen.siebert <steffen.siebert@X201>

Merge from trunk

7037. By steffen.siebert <steffen.siebert@X201>

CHANGED: Undo commit 7035.

7036. By Steffen Siebert

NEW: Recipes can download EPUB files without conversion from calibre by returning the path to the EPUB files from the build_index() method.

7035. By Steffen Siebert

CHANGED: Use unique filenames without subdirectories for web feed content to make the generated EPUB navigation work with more ebook readers.
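
Revision 7036 lets a recipe hand back a finished EPUB, and the conversion is then skipped when the downloaded file already has the requested output extension. A minimal sketch of that short-circuit, assuming both sides are plain file paths; `copy_if_same_format` is a hypothetical name used only for illustration and does not exist in calibre:

```python
import os
import shutil


def copy_if_same_format(downloaded_path, output_path):
    """Copy the download straight to the output if the extensions match.

    Returns True when the file was copied and no conversion is needed,
    False when the caller should run the normal conversion pipeline.
    """
    in_ext = os.path.splitext(downloaded_path)[1].lower()
    out_ext = os.path.splitext(output_path)[1].lower()
    # Compare case-insensitively; files without an extension never match.
    if in_ext and in_ext == out_ext:
        shutil.copyfile(downloaded_path, output_path)
        return True
    return False
```

Comparing extensions is a cheap heuristic; it does not verify that the file really is a valid EPUB.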

Preview Diff

=== added file 'resources/recipes/navigationtest.recipe'
--- resources/recipes/navigationtest.recipe 1970-01-01 00:00:00 +0000
+++ resources/recipes/navigationtest.recipe 2010-12-09 23:03:49 +0000
@@ -0,0 +1,97 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 mode: python -*-
+
+__license__ = 'GPL v3'
+__copyright__ = 'Steffen Siebert <calibre at steffensiebert.de>'
+__version__ = '1.0'
+
+""" Create dummy ebook to test navigation elements. """
+
+import re
+import string
+from calibre.web.feeds.recipes import BasicNewsRecipe
+from calibre.ptempfile import PersistentTemporaryFile
+
+class NavigationTest(BasicNewsRecipe):
+    __author__ = 'Steffen Siebert'
+    title = 'Navigation Test'
+    description = 'Navigation Test'
+    publisher ='Steffen Siebert'
+    lang = 'de-DE'
+    language = 'de'
+    publication_type = 'magazine'
+    articles_are_obfuscated = True
+    use_embedded_content = False
+    no_stylesheets = True
+    conversion_options = {'comments': description, 'language': language, 'publisher': publisher}
+
+    feeds = 3
+    """ The number of feeds to generate. """
+    articles_per_feed = 3
+    """ The number of articles to generate for each feed. """
+
+    LOREM_IPSUM = """<p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et
+    dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi
+    consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
+    Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
+    <p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu
+    feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit
+    augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy
+    nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>"""
+    """ Dummy text. """
+
+    """
+    Calibre recipe to create dummy ebook for testing navigation elements.
+    """
+
+    def generate_image(self, feed, article):
+        try:
+            from PIL import Image, ImageDraw, ImageFont
+            Image, ImageDraw, ImageFont
+        except ImportError:
+            import Image, ImageDraw, ImageFont
+
+        font_path = P('fonts/liberation/LiberationSerif-Bold.ttf')
+        img = Image.new('RGB', (self.MI_WIDTH, self.MI_HEIGHT), 'white')
+        draw = ImageDraw.Draw(img)
+        font = ImageFont.truetype(font_path, 22)
+        text = "Image of feed %s article %s" % (feed, article)
+        width, height = draw.textsize(text, font=font)
+        left = max(int((self.MI_WIDTH - width)/2.), 0)
+        top = max(int((self.MI_HEIGHT - height)/2.), 0)
+        draw.text((left, top), text, fill=(255,0,0), font=font)
+        output = PersistentTemporaryFile('_fa.jpg')
+        img.save(output, 'JPEG')
+        output.close()
+        return output.name
+
+    def get_obfuscated_article(self, url):
+        result = re.match("^http://dummy/feed_([0-9]+)/article_([0-9]+).html$", url)
+        feed = result.group(1)
+        article = result.group(2)
+        imageUrl = "file:///%s" % self.generate_image(feed, article)
+
+        # Generate content into new temporary html file.
+        html = PersistentTemporaryFile('_fa.html')
+        html.write('<html>\n<head>\n<title>Feed %s Article %s</title>\n</head>\n' % (feed, article))
+        html.write("<body>\n<h1>Feed %s Article %s</h1>\n" % (feed, article))
+        html.write('<p><img src="%s" alt="Image of feed %s article %s"></p>' % (imageUrl, feed, article))
+        html.write(self.LOREM_IPSUM)
+        html.write("</body>\n</html>\n")
+        html.close()
+
+        return html.name
+
+    def parse_index(self):
+        feeds = []
+
+        for feed in range(1, self.feeds + 1):
+            feedName = "Feed %i" % feed
+            articles = []
+            for article in range(1, self.articles_per_feed + 1):
+                url = "http://dummy/feed_%i/article_%i.html" % (feed, article)
+                title = "Feed %i Article %i" % (feed, article)
+                articles.append({'title': title, 'url': url, 'date': ''})
+            feeds.append((feedName, articles))
+
+        return feeds

=== modified file 'src/calibre/ebooks/conversion/plumber.py'
--- src/calibre/ebooks/conversion/plumber.py 2010-11-20 04:26:57 +0000
+++ src/calibre/ebooks/conversion/plumber.py 2010-12-09 23:03:49 +0000
@@ -838,6 +838,15 @@
             self.dump_input(self.oeb, tdir)
             if self.abort_after_input_dump:
                 return
+        oebExt = os.path.splitext(self.oeb)[1]
+        outExt = os.path.splitext(self.output)[1]
+        if outExt.lower() == oebExt.lower():
+            self.log("Result is already in the correct format, no further processing necessary.")
+            shutil.copyfile(self.oeb, self.output)
+            self.log(self.output_fmt.upper(), 'output written to', self.output)
+            self.flush()
+            return
+
         if self.input_fmt in ('recipe', 'downloaded_recipe'):
             self.opts_to_mi(self.user_metadata)
         if not hasattr(self.oeb, 'manifest'):

=== modified file 'src/calibre/ebooks/epub/output.py'
--- src/calibre/ebooks/epub/output.py 2010-12-05 03:20:20 +0000
+++ src/calibre/ebooks/epub/output.py 2010-12-09 23:03:49 +0000
@@ -13,6 +13,7 @@
 from calibre import CurrentDir
 from calibre.customize.conversion import OptionRecommendation
 from calibre.constants import filesystem_encoding
+from calibre.ebooks.oeb.transforms.filenames import UniqueFilenames, FlatFilenames

 from lxml import etree

@@ -142,7 +143,7 @@
     def convert(self, oeb, output_path, input_plugin, opts, log):
         self.log, self.opts, self.oeb = log, opts, oeb

-        #from calibre.ebooks.oeb.transforms.filenames import UniqueFilenames
+        FlatFilenames()(oeb, opts)
         #UniqueFilenames()(oeb, opts)

         self.workaround_ade_quirks()

=== modified file 'src/calibre/ebooks/oeb/transforms/filenames.py'
--- src/calibre/ebooks/oeb/transforms/filenames.py 2010-12-05 03:20:20 +0000
+++ src/calibre/ebooks/oeb/transforms/filenames.py 2010-12-09 23:03:49 +0000
@@ -20,8 +20,9 @@
     and manifest are not touched by this transform.
     '''

-    def __init__(self, rename_map):
+    def __init__(self, rename_map, renamed_items_map = None):
         self.rename_map = rename_map
+        self.renamed_items_map = renamed_items_map

     def __call__(self, oeb, opts):
         self.log = oeb.logger
@@ -49,7 +50,6 @@
         if self.oeb.toc:
             self.fix_toc_entry(self.oeb.toc)

-
     def fix_toc_entry(self, toc):
         if toc.href:
             href = urlnormalize(toc.href)
@@ -66,16 +66,18 @@
             self.fix_toc_entry(x)

     def url_replacer(self, orig_url):
-        url = urlnormalize(orig_url)
-        path, frag = urldefrag(url)
-        href = self.current_item.abshref(path)
-        replacement = self.rename_map.get(href, None)
-        if replacement is None:
-            return orig_url
-        replacement = self.current_item.relhref(replacement)
-        if frag:
-            replacement += '#' + frag
-        return replacement
+        url = urlnormalize(orig_url)
+        path, frag = urldefrag(url)
+        if self.renamed_items_map:
+            orig_item = self.renamed_items_map.get(self.current_item.href, self.current_item)
+        else:
+            orig_item = self.current_item
+
+        href = orig_item.abshref(path)
+        replacement = self.current_item.relhref(self.rename_map.get(href, href))
+        if frag:
+            replacement += '#' + frag
+        return replacement

 class UniqueFilenames(object):

@@ -128,3 +130,44 @@
             if candidate not in self.seen_filenames:
                 return suffix

+class FlatFilenames(object):
+
+    'Ensure that every item in the manifest has a unique filename without subdirectories.'
+
+    def __call__(self, oeb, opts):
+        self.log = oeb.logger
+        self.opts = opts
+        self.oeb = oeb
+
+        self.rename_map = {}
+        self.renamed_items_map = {}
+
+        for item in list(oeb.manifest.items):
+            # Flatten URL by removing directories.
+            # Example: a/b/c/index.html -> a_b_c_index.html
+            nhref = item.href.replace("/", "_")
+
+            if item.href == nhref:
+                # URL hasn't changed, skip item.
+                continue
+
+            data = item.data
+            nhref = oeb.manifest.generate(href=nhref)[1]
+            nitem = oeb.manifest.add(item.id, nhref, item.media_type, data=data,
+                    fallback=item.fallback)
+            self.rename_map[item.href] = nhref
+            self.renamed_items_map[nhref] = item
+            if item.spine_position is not None:
+                oeb.spine.insert(item.spine_position, nitem, item.linear)
+                oeb.spine.remove(item)
+            oeb.manifest.remove(item)
+
+        if self.rename_map:
+            self.log('Found non-flat filenames, renaming to support broken'
+                    ' EPUB readers like FBReader, Aldiko and Stanza...')
+            from pprint import pformat
+            self.log.debug(pformat(self.rename_map))
+            self.log.debug(pformat(self.renamed_items_map))
+
+            renamer = RenameFiles(self.rename_map, self.renamed_items_map)
+            renamer(oeb, opts)

=== modified file 'src/calibre/web/feeds/input.py'
--- src/calibre/web/feeds/input.py 2010-09-17 18:02:43 +0000
+++ src/calibre/web/feeds/input.py 2010-12-09 23:03:49 +0000
@@ -102,8 +102,11 @@
             disabled = getattr(ro, 'recipe_disabled', None)
             if disabled is not None:
                 raise RecipeDisabled(disabled)
-            ro.download()
+            index = ro.download()
             self.recipe_object = ro
+            if index.endswith('.epub'):
+                # The result is already in EPUB format, no need to search for .opf file.
+                return os.path.abspath(index)

         for key, val in self.recipe_object.conversion_options.items():
             setattr(opts, key, val)

=== modified file 'src/calibre/web/feeds/news.py'
--- src/calibre/web/feeds/news.py 2010-11-04 22:26:10 +0000
+++ src/calibre/web/feeds/news.py 2010-12-09 23:03:49 +0000
@@ -1364,24 +1364,24 @@

     @classmethod
     def adeify_images(cls, soup):
-        '''
-        If your recipe when converted to EPUB has problems with images when
-        viewed in Adobe Digital Editions, call this method from within
-        :meth:`postprocess_html`.
-        '''
-        for item in soup.findAll('img'):
-            for attrib in ['height','width','border','align','style']:
-                if item.has_key(attrib):
+        '''
+        If your recipe when converted to EPUB has problems with images when
+        viewed in Adobe Digital Editions, call this method from within
+        :meth:`postprocess_html`.
+        '''
+        for item in soup.findAll('img'):
+            for attrib in ['height','width','border','align','style']:
+                if item.has_key(attrib):
                     del item[attrib]
-            oldParent = item.parent
-            myIndex = oldParent.contents.index(item)
-            item.extract()
-            divtag = Tag(soup,'div')
-            brtag = Tag(soup,'br')
-            oldParent.insert(myIndex,divtag)
-            divtag.append(item)
-            divtag.append(brtag)
-        return soup
+            oldParent = item.parent
+            myIndex = oldParent.contents.index(item)
+            item.extract()
+            divtag = Tag(soup,'div')
+            brtag = Tag(soup,'br')
+            oldParent.insert(myIndex,divtag)
+            divtag.append(item)
+            divtag.append(brtag)
+        return soup


 class CustomIndexRecipe(BasicNewsRecipe):
