calibre

Merge lp:~steffensiebert/calibre/trunk into lp:calibre


Proposed by Steffen Siebert on 2010-12-04

Status:	Needs review
Proposed branch:	lp:~steffensiebert/calibre/trunk
Merge into:	lp:calibre
Diff against target:	302 lines (+184/-31), 6 files modified:
	resources/recipes/navigationtest.recipe (+97/-0)
	src/calibre/ebooks/conversion/plumber.py (+9/-0)
	src/calibre/ebooks/epub/output.py (+2/-1)
	src/calibre/ebooks/oeb/transforms/filenames.py (+55/-12)
	src/calibre/web/feeds/input.py (+4/-1)
	src/calibre/web/feeds/news.py (+17/-17)
To merge this branch:	bzr merge lp:~steffensiebert/calibre/trunk

Reviewer	Review Type	Date Requested	Status
Kovid Goyal		2010-12-04	Needs Resubmitting on 2010-12-14
Review via email: mp+42727@code.launchpad.net

Description of the change

Implements http://bugs.calibre-ebook.com/ticket/7788 (Flatten the content of EPUBs created by recipes to make them more compatible) and http://bugs.calibre-ebook.com/ticket/7789 (Enable recipes to download EPUBs unmodified by calibre).
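Ticket 7788 comes down to renaming every nested manifest entry to a flat name, which the FlatFilenames transform in the diff does by replacing each "/" with "_" (a/b/c/index.html -> a_b_c_index.html). A minimal standalone sketch of that renaming step, not calibre's code: the helper name `flatten_hrefs` is hypothetical, hrefs are assumed to be plain manifest-relative strings, and name clashes are resolved with a numeric suffix instead of calibre's `oeb.manifest.generate()`.

```python
import posixpath


def flatten_hrefs(hrefs):
    """Return {old_href: new_flat_href} for every href containing a directory.

    Hypothetical helper, not calibre's API. Uses the same rule as the
    FlatFilenames transform (replace '/' with '_') but resolves name clashes
    with a numeric suffix instead of oeb.manifest.generate().
    """
    # Names already occupied by items that are flat and will not move.
    taken = {h for h in hrefs if '/' not in h}
    rename_map = {}
    for href in hrefs:
        if '/' not in href:
            continue  # already flat: nothing to rename
        candidate = href.replace('/', '_')
        base, ext = posixpath.splitext(candidate)
        counter = 1
        while candidate in taken:
            # e.g. feed_0_index.html -> feed_0_index1.html
            candidate = '%s%d%s' % (base, counter, ext)
            counter += 1
        taken.add(candidate)
        rename_map[href] = candidate
    return rename_map
```

Seeding `taken` with the names of files that stay put keeps a renamed file from overwriting an unrelated file that already carried the flattened name.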

lp:~steffensiebert/calibre/trunk updated on 2010-12-09

7037. By steffen.siebert <steffen.siebert@X201> on 2010-12-08: CHANGED: Undo commit 7035.
7038. By steffen.siebert <steffen.siebert@X201> on 2010-12-08: Merge from trunk
7039. By steffen.siebert <steffen.siebert@X201> on 2010-12-08: NEW: Recipe navigationtest added to create dummy ebook for navigation testing.
7040. By Steffen Siebert on 2010-12-08: FIXED: Wrong indentation.
7041. By Steffen Siebert on 2010-12-08: CHANGED: Flatten EPUB content to fix issues with some EPUB readers like FBReaderJ.
7042. By Steffen Siebert on 2010-12-09: CHANGED: Class which creates flat EPUB output renamed to FlatFilenames. Make original class UniqueFilenames available again as an alternative.

Revision history for this message

Kovid Goyal (kovid) wrote on 2010-12-14:

I've merged the flatten file transform. The rest of this branch is not going to be merged.

review: Needs Resubmitting

Unmerged revisions

7042. By Steffen Siebert on 2010-12-09: CHANGED: Class which creates flat EPUB output renamed to FlatFilenames. Make original class UniqueFilenames available again as an alternative.
7041. By Steffen Siebert on 2010-12-08: CHANGED: Flatten EPUB content to fix issues with some EPUB readers like FBReaderJ.
7040. By Steffen Siebert on 2010-12-08: FIXED: Wrong indentation.
7039. By steffen.siebert <steffen.siebert@X201> on 2010-12-08: NEW: Recipe navigationtest added to create dummy ebook for navigation testing.
7038. By steffen.siebert <steffen.siebert@X201> on 2010-12-08: Merge from trunk
7037. By steffen.siebert <steffen.siebert@X201> on 2010-12-08: CHANGED: Undo commit 7035.
7036. By Steffen Siebert on 2010-11-28: NEW: Recipes can download EPUB files without conversion from calibre by returning the path to the EPUB files from the build_index() method.
7035. By Steffen Siebert on 2010-11-28: CHANGED: Use unique filenames without subdirectories for web feed content to make the generated EPUB navigation work with more ebook readers.
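Revision 7036 (ticket 7789) relies on short-circuiting the conversion pipeline: when the recipe already produced a file in the output format, the plumber.py change copies it to the destination instead of converting it. A minimal sketch of that check, assuming plain filesystem paths; `passthrough_if_same_format` is a hypothetical name, and the guard against an empty extension is an addition not present in the diff.

```python
import os
import shutil


def passthrough_if_same_format(src, dst):
    """Copy src to dst unchanged when both have the same file extension.

    Hypothetical helper mirroring the plumber.py check: extensions are
    compared case-insensitively. Returns True if the file was copied,
    False if the normal conversion pipeline still has to run.
    """
    src_ext = os.path.splitext(src)[1].lower()
    dst_ext = os.path.splitext(dst)[1].lower()
    # Extra guard (not in the diff): never treat two extension-less paths as a match.
    if src_ext and src_ext == dst_ext:
        shutil.copyfile(src, dst)
        return True
    return False
```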

Preview Diff


Subscribers

People subscribed via source and target branches

to all changes:

Ali Baba

Kovid Goyal

Pankaj

Steffen Siebert

Timothy Legge

gstoychev

 === added file 'resources/recipes/navigationtest.recipe'
 --- resources/recipes/navigationtest.recipe	1970-01-01 00:00:00 +0000
 +++ resources/recipes/navigationtest.recipe	2010-12-09 23:03:49 +0000
@@ -0,0 +1,97 @@
++#!/usr/bin/env  python
++# -*- coding: utf-8 mode: python -*-
++
++__license__   = 'GPL v3'
++__copyright__ = 'Steffen Siebert <calibre at steffensiebert.de>'
++__version__   = '1.0'
++
++""" Create dummy ebook to test navigation elements. """
++
++import re
++import string
++from calibre.web.feeds.recipes import BasicNewsRecipe
++from calibre.ptempfile import PersistentTemporaryFile
++
++class NavigationTest(BasicNewsRecipe):
++    __author__ = 'Steffen Siebert'
++    title = 'Navigation Test'
++    description = 'Navigation Test'
++    publisher ='Steffen Siebert'
++    lang = 'de-DE'
++    language = 'de'
++    publication_type = 'magazine'
++    articles_are_obfuscated = True
++    use_embedded_content = False
++    no_stylesheets = True
++    conversion_options = {'comments': description, 'language': language, 'publisher': publisher}
++
++    feeds = 3
++    """ The number of feeds to generate. """
++    articles_per_feed = 3
++    """ The number of articles to generate for each feed. """
++
++    LOREM_IPSUM = """<p>Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et
++    dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi
++    consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
++    Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
++    <p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu
++    feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit
++    augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy
++    nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>"""
++    """ Dummy text. """
++
++    """
++    Calibre recipe to create dummy ebook for testing navigation elements.
++    """
++
++    def generate_image(self, feed, article):
++        try:
++            from PIL import Image, ImageDraw, ImageFont
++            Image, ImageDraw, ImageFont
++        except ImportError:
++            import Image, ImageDraw, ImageFont
++
++        font_path = P('fonts/liberation/LiberationSerif-Bold.ttf')
++        img = Image.new('RGB', (self.MI_WIDTH, self.MI_HEIGHT), 'white')
++        draw = ImageDraw.Draw(img)
++        font = ImageFont.truetype(font_path, 22)
++        text = "Image of feed %s article %s" % (feed, article)
++        width, height = draw.textsize(text, font=font)
++        left = max(int((self.MI_WIDTH - width)/2.), 0)
++        top = max(int((self.MI_HEIGHT - height)/2.), 0)
++        draw.text((left, top), text, fill=(255,0,0), font=font)
++        output = PersistentTemporaryFile('_fa.jpg')
++        img.save(output, 'JPEG')
++        output.close()
++        return output.name
++
++    def get_obfuscated_article(self, url):
++        result = re.match("^http://dummy/feed_([0-9]+)/article_([0-9]+).html$", url)
++        feed = result.group(1)
++        article = result.group(2)
++        imageUrl = "file:///%s" % self.generate_image(feed, article)
++
++        # Generate content into new temporary html file.
++        html = PersistentTemporaryFile('_fa.html')
++        html.write('<html>\n<head>\n<title>Feed %s Article %s</title>\n</head>\n' % (feed, article))
++        html.write("<body>\n<h1>Feed %s Article %s</h1>\n" % (feed, article))
++        html.write('<p><img src="%s" alt="Image of feed %s article %s"></p>' % (imageUrl, feed, article))
++        html.write(self.LOREM_IPSUM)
++        html.write("</body>\n</html>\n")
++        html.close()
++
++        return html.name
++
++    def parse_index(self):
++        feeds = []
++
++        for feed in range(1, self.feeds + 1):
++            feedName = "Feed %i" % feed
++            articles = []
++            for article in range(1, self.articles_per_feed + 1):
++                url = "http://dummy/feed_%i/article_%i.html" % (feed, article)
++                title = "Feed %i Article %i" % (feed, article)
++                articles.append({'title': title, 'url': url, 'date': ''})
++            feeds.append((feedName, articles))
++
++        return feeds
 === modified file 'src/calibre/ebooks/conversion/plumber.py'
 --- src/calibre/ebooks/conversion/plumber.py	2010-11-20 04:26:57 +0000
 +++ src/calibre/ebooks/conversion/plumber.py	2010-12-09 23:03:49 +0000
@@ -838,6 +838,15 @@
                  self.dump_input(self.oeb, tdir)
                  if self.abort_after_input_dump:
                      return
++            oebExt = os.path.splitext(self.oeb)[1]
++            outExt = os.path.splitext(self.output)[1]
++            if outExt.lower() == oebExt.lower():
++                self.log("Result is already in the correct format, no further processing necessary.")
++                shutil.copyfile(self.oeb, self.output)
++                self.log(self.output_fmt.upper(), 'output written to', self.output)
++                self.flush()
++                return
++
              if self.input_fmt in ('recipe', 'downloaded_recipe'):
                  self.opts_to_mi(self.user_metadata)
              if not hasattr(self.oeb, 'manifest'):
 === modified file 'src/calibre/ebooks/epub/output.py'
 --- src/calibre/ebooks/epub/output.py	2010-12-05 03:20:20 +0000
 +++ src/calibre/ebooks/epub/output.py	2010-12-09 23:03:49 +0000
@@ -13,6 +13,7 @@
  from calibre import CurrentDir
  from calibre.customize.conversion import OptionRecommendation
  from calibre.constants import filesystem_encoding
++from calibre.ebooks.oeb.transforms.filenames import UniqueFilenames, FlatFilenames
  from lxml import etree
@@ -142,7 +143,7 @@
      def convert(self, oeb, output_path, input_plugin, opts, log):
          self.log, self.opts, self.oeb = log, opts, oeb
--        #from calibre.ebooks.oeb.transforms.filenames import UniqueFilenames
++        FlatFilenames()(oeb, opts)
          #UniqueFilenames()(oeb, opts)
          self.workaround_ade_quirks()
 === modified file 'src/calibre/ebooks/oeb/transforms/filenames.py'
 --- src/calibre/ebooks/oeb/transforms/filenames.py	2010-12-05 03:20:20 +0000
 +++ src/calibre/ebooks/oeb/transforms/filenames.py	2010-12-09 23:03:49 +0000
@@ -20,8 +20,9 @@
      and manifest are not touched by this transform.
      '''
--    def __init__(self, rename_map):
++    def __init__(self, rename_map, renamed_items_map = None):
          self.rename_map = rename_map
++        self.renamed_items_map = renamed_items_map
      def __call__(self, oeb, opts):
          self.log = oeb.logger
@@ -49,7 +50,6 @@
          if self.oeb.toc:
              self.fix_toc_entry(self.oeb.toc)
--
      def fix_toc_entry(self, toc):
          if toc.href:
              href = urlnormalize(toc.href)
@@ -66,16 +66,18 @@
              self.fix_toc_entry(x)
      def url_replacer(self, orig_url):
--         url = urlnormalize(orig_url)
--         path, frag = urldefrag(url)
--         href = self.current_item.abshref(path)
--         replacement = self.rename_map.get(href, None)
--         if replacement is None:
--             return orig_url
--         replacement = self.current_item.relhref(replacement)
--         if frag:
--             replacement += '#' + frag
--         return replacement
++        url = urlnormalize(orig_url)
++        path, frag = urldefrag(url)
++        if self.renamed_items_map:
++            orig_item = self.renamed_items_map.get(self.current_item.href, self.current_item)
++        else:
++            orig_item = self.current_item
++
++        href = orig_item.abshref(path)
++        replacement = self.current_item.relhref(self.rename_map.get(href, href))
++        if frag:
++            replacement += '#' + frag
++        return replacement
  class UniqueFilenames(object):
@@ -128,3 +130,44 @@
              if candidate not in self.seen_filenames:
                  return suffix
++class FlatFilenames(object):
++
++    'Ensure that every item in the manifest has a unique filename without subdirectories.'
++
++    def __call__(self, oeb, opts):
++        self.log = oeb.logger
++        self.opts = opts
++        self.oeb = oeb
++
++        self.rename_map = {}
++        self.renamed_items_map = {}
++
++        for item in list(oeb.manifest.items):
++            # Flatten URL by removing directories.
++            # Example: a/b/c/index.html -> a_b_c_index.html
++            nhref = item.href.replace("/", "_")
++
++            if item.href == nhref:
++                # URL hasn't changed, skip item.
++                continue
++
++            data = item.data
++            nhref = oeb.manifest.generate(href=nhref)[1]
++            nitem = oeb.manifest.add(item.id, nhref, item.media_type, data=data,
++                                     fallback=item.fallback)
++            self.rename_map[item.href] = nhref
++            self.renamed_items_map[nhref] = item
++            if item.spine_position is not None:
++                oeb.spine.insert(item.spine_position, nitem, item.linear)
++                oeb.spine.remove(item)
++            oeb.manifest.remove(item)
++
++        if self.rename_map:
++            self.log('Found non-flat filenames, renaming to support broken'
++                    ' EPUB readers like FBReader, Aldiko and Stanza...')
++            from pprint import pformat
++            self.log.debug(pformat(self.rename_map))
++            self.log.debug(pformat(self.renamed_items_map))
++
++            renamer = RenameFiles(self.rename_map, self.renamed_items_map)
++            renamer(oeb, opts)
 === modified file 'src/calibre/web/feeds/input.py'
 --- src/calibre/web/feeds/input.py	2010-09-17 18:02:43 +0000
 +++ src/calibre/web/feeds/input.py	2010-12-09 23:03:49 +0000
@@ -102,8 +102,11 @@
              disabled = getattr(ro, 'recipe_disabled', None)
              if disabled is not None:
                  raise RecipeDisabled(disabled)
--            ro.download()
++            index = ro.download()
              self.recipe_object = ro
++            if index.endswith('.epub'):
++                # The result is already in EPUB format, no need to search for .opf file.
++                return os.path.abspath(index)
          for key, val in self.recipe_object.conversion_options.items():
              setattr(opts, key, val)
 === modified file 'src/calibre/web/feeds/news.py'
 --- src/calibre/web/feeds/news.py	2010-11-04 22:26:10 +0000
 +++ src/calibre/web/feeds/news.py	2010-12-09 23:03:49 +0000
@@ -1364,24 +1364,24 @@
      @classmethod
      def adeify_images(cls, soup):
--         '''
--         If your recipe when converted to EPUB has problems with images when
--         viewed in Adobe Digital Editions, call this method from within
--         :meth:`postprocess_html`.
--         '''
--         for item in soup.findAll('img'):
--             for attrib in ['height','width','border','align','style']:
--                 if item.has_key(attrib):
++        '''
++        If your recipe when converted to EPUB has problems with images when
++        viewed in Adobe Digital Editions, call this method from within
++        :meth:`postprocess_html`.
++        '''
++        for item in soup.findAll('img'):
++            for attrib in ['height','width','border','align','style']:
++                if item.has_key(attrib):
                      del item[attrib]
--             oldParent = item.parent
--             myIndex = oldParent.contents.index(item)
--             item.extract()
--             divtag = Tag(soup,'div')
--             brtag  = Tag(soup,'br')
--             oldParent.insert(myIndex,divtag)
--             divtag.append(item)
--             divtag.append(brtag)
--         return soup
++            oldParent = item.parent
++            myIndex = oldParent.contents.index(item)
++            item.extract()
++            divtag = Tag(soup,'div')
++            brtag  = Tag(soup,'br')
++            oldParent.insert(myIndex,divtag)
++            divtag.append(item)
++            divtag.append(brtag)
++        return soup
  class CustomIndexRecipe(BasicNewsRecipe):
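Flattening moves documents, so the relative links inside them must be recomputed. The modified url_replacer in filenames.py looks up the pre-rename item through renamed_items_map, resolves each link from the document's original location, and then expresses the (possibly renamed) target relative to the document's new location. A sketch of the same idea using posixpath on plain strings; `rewrite_link` is a hypothetical helper, it assumes manifest-relative POSIX hrefs, and it crudely treats anything containing "://" as an external URL.

```python
import posixpath


def rewrite_link(link, old_doc_href, new_doc_href, rename_map):
    """Recompute a relative link after its document (and maybe its target) moved.

    Hypothetical helper, not calibre's API. Hrefs are manifest-relative
    POSIX paths such as 'feed_0/article_0/index.html'.
    """
    if '://' in link:
        return link  # crude check: leave external URLs untouched
    path, sep, frag = link.partition('#')
    if not path:
        return link  # fragment-only link into the same document
    # Resolve against where the document USED to live (the diff does this
    # by looking up the original item in renamed_items_map).
    target = posixpath.normpath(
        posixpath.join(posixpath.dirname(old_doc_href), path))
    # Follow the rename if the target itself was flattened.
    target = rename_map.get(target, target)
    # Express the target relative to the document's NEW directory.
    new_dir = posixpath.dirname(new_doc_href) or '.'
    return posixpath.relpath(target, new_dir) + sep + frag
```

For example, the link ../../images/pic.jpg inside feed_0/article_0/index.html becomes images_pic.jpg once both files have been flattened; resolving it from the new, flat location instead would point outside the book.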