Merge lp:~mbp/launchpad/391780-markdown into lp:launchpad

Proposed by Martin Pool on 2011-11-20

Status:

Merged

Approved by:

Martin Pool on 2011-11-25

Approved revision:

no longer in the source branch.

Merged at revision:

14384

Proposed branch:

lp:~mbp/launchpad/391780-markdown

Merge into:

lp:launchpad

Diff against target:

225 lines (+90/-4)

7 files modified

lib/lp/app/browser/stringformatter.py (+26/-1)
lib/lp/app/browser/tests/test_stringformatter.py (+50/-0)
lib/lp/registry/browser/person.py (+3/-1)
lib/lp/registry/templates/product-index.pt (+3/-2)
lib/lp/services/features/flags.py (+6/-0)
setup.py (+1/-0)
versions.cfg (+1/-0)

To merge this branch:

bzr merge lp:~mbp/launchpad/391780-markdown

Low

Triaged

Reviewer	Review Type	Date Requested	Status
Raphaël Badin (community)		2011-11-20	Approve on 2011-11-21
Review via email: mp+82832@code.launchpad.net

Commit message

[r=rvb][bug=391780][incr] Markdown markup in project and user home pages

Description of the change

Support Markdown markup in project and user home pages, per bug 391780.

Revision history for this message

Raphaël Badin (rvb) wrote on 2011-11-21:

Looks good.

A few remarks:

[0]

+def format_markdown(text):
+    """Return html form of marked-up text."""
+    # This returns whole paragraphs (in p tags), similarly to text_to_html.
+    md = markdown.Markdown(
+        safe_mode='escape',
+        extensions=[
+            'tables',
+            ])
+    return md.convert(text)  # How easy was that?

Looks like there is a big overhead in creating the Markdown class each time we have a string to convert:

python -m timeit -s 'import markdown; md = markdown.Markdown(safe_mode="escape",extensions=[ "tables",])' 'for x in range(1000): md.convert("a *b* a");'
10 loops, best of 3: 246 msec per loop

python -m timeit -s 'import markdown;' 'for x in range(1000): markdown.Markdown(safe_mode="escape",extensions=[ "tables",]).convert("a *b* a");'
10 loops, best of 3: 471 msec per loop

You might want to create and reuse instances stored in threading.local.

[1]

+    def test_plain_text(self):
+        self.assertThat(
+            'hello world',
+            MarksDownAs('<p>hello world</p>'))

Don't you want to include 'real' markdown markup here, just as an illustration for the reader?

[2]

-        <div class="summary" tal:content="context/summary">
+	<div class="summary"
+	  tal:content="structure context/summary/fmt:markdown">
           $Product.summary goes here. This should be quite short,

Small indentation inconsistency here ;)

review: Approve
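
Remark [0] suggests keeping one converter per thread instead of building one per call. A minimal sketch of that idea follows, using only the standard library. The names `get_converter` and `CountingConverter` are hypothetical and are not part of the branch; the stand-in class takes the place of `markdown.Markdown` so the sketch runs without the library installed.

```python
import threading

# One storage slot per thread; attributes set here are invisible to other threads.
_local = threading.local()


def get_converter(factory):
    """Return this thread's converter, building it on first use.

    `factory` is a zero-argument callable. With python-markdown 2.x it could
    be (assumption, mirroring the branch):
        lambda: markdown.Markdown(safe_mode='escape', extensions=['tables'])
    """
    converter = getattr(_local, 'converter', None)
    if converter is None:
        converter = factory()
        _local.converter = converter
    return converter


class CountingConverter(object):
    """Hypothetical stand-in for an expensive-to-build converter."""

    built = 0  # counts how many instances were constructed

    def __init__(self):
        CountingConverter.built += 1

    def convert(self, text):
        return '<p>%s</p>' % text


first = get_converter(CountingConverter)
second = get_converter(CountingConverter)  # reuses the instance built above
```

With a real `markdown.Markdown` instance, state carries over between conversions, so a reused instance would likely need `md.reset()` before each `md.convert(...)`, as in the later timing run in this thread.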

Revision history for this message

Thanks for all these reviews.

This is not quite ready to land, with some issues still being discussed on the bug.

Thank you for timing this - of course we have to do that first.
That's a good point that constructing the instance seems expensive. I
was worried by your timings until I realized you're actually measuring
1000 iterations. On my machine:

"a *b* a": 0.246 ms
10 plain text lines: 0.34 ms
100 indented code lines: 0.4 ms
100 lines with inline emphasis and links: 29 ms
1000 lines with markup: 306 ms

So for really large amounts of text it is starting to eat up a fair
amount of our render budget, but for more realistic amounts it will be
ok. Perhaps in a future release we can render on the client (if the
client is smart) and then actually take net load off the server.
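
The point about "1000 iterations" is that timeit reports the time for a whole loop, so the per-conversion cost is that figure divided by the number of conversions inside the loop. A small sketch of that arithmetic follows; the helper `per_call_ms` is hypothetical, and the statement it times is a stand-in, not a Markdown conversion.

```python
import timeit


def per_call_ms(stmt, setup='pass', number=1000, repeat=3):
    """Best-of-`repeat` time for one execution of stmt, in milliseconds."""
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    # Each total covers `number` executions, so divide to get one execution.
    return min(totals) / number * 1000.0


# Figure from the review: 246 msec per timeit loop, each loop doing
# 1000 conversions inside the timed statement.
reported_ms_per_conversion = 246.0 / 1000

# Stand-in statement (not Markdown), only to show the helper running.
measured = per_call_ms('"a *b* a".upper()')
```

Here `reported_ms_per_conversion` works out to the 0.246 ms quoted above.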

Alexander Regueiro (alexreg) wrote on 2011-11-22:

I'm not sure the efficiency here is a big problem, though it does intrigue me. Which Python library are you using for Markdown rendering? There exist efficient C libraries, if it would make a difference.

Huw Wilkins (huwshimi) wrote on 2011-11-22:

@mbp one thing we'll need to do is style this content (for all the different types of elements Markdown generates). The consideration here is that this is a user controlled content block. We'll want to style the elements here slightly differently to how we style the same elements elsewhere (or at least have the ability to do so).

I guess to make that happen it would be easiest if the element wrapping the markdown content had the class "markdown".
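
One way to read this suggestion is to wrap the rendered output in a container carrying the class. The sketch below assumes the HTML has already been rendered and sanitised; `wrap_markdown_html` is a hypothetical helper, not something in the branch.

```python
def wrap_markdown_html(rendered_html, css_class='markdown'):
    """Wrap already-rendered, already-sanitised HTML in a classed div."""
    return '<div class="%s">%s</div>' % (css_class, rendered_html)


wrapped = wrap_markdown_html('<p>hello world</p>')
```

Stylesheets could then target `.markdown p`, `.markdown table` and so on without affecting the same elements elsewhere on the page.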

Martin Pool (mbp) wrote on 2011-11-22:

alexreg: python-markdown 2.0.3.

I think we can safely keep the option of a C rewrite in our pocket until later.

Martin Pool (mbp) wrote on 2011-11-22:

I think what still needs to be done before the first tranche can land is:

* make links in the output be rel=nofollow

and it would be good to have these in the first landing or soon after:

* put some kind of css marker on markdown formatted content
* update to Markdown 2.1beta so we can have nl2br
* add patterns for things that are currently linkified by launchpad (bug N etc)

Alexander Regueiro (alexreg) wrote on 2011-11-22:

Fair enough. Let's wait until efficiency proves itself as a problem in that case.

All four points seem quite sensible, though I think the last one won't prove too useful until we get wiki support in Launchpad (which I hope to have a go at soon). On project summary pages -- the only current application afaik -- there's no rush probably.

Martin Pool (mbp) wrote on 2011-11-23:

Linkifying text like 'bug 391780' is important; otherwise this will be a regression compared to the way text is currently displayed. Though, it is probably fairly unlikely that kind of text occurs in project or person home pages, so perhaps it's not crucial. It would be important for bug comments.
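
As a rough illustration of what linkifying 'bug N' involves, here is a deliberately simplified sketch. The pattern and the function name `linkify_bugs_sketch` are hypothetical; this does not reproduce Launchpad's existing `linkify_bug_numbers` or its much more thorough regular expression, and it assumes the input has already been HTML-escaped.

```python
import re

# Simplified, hypothetical pattern: "bug 123" or "bug #123", any case.
BUG_RE = re.compile(r'\bbug\s+(?:#\s*)?(\d+)', re.IGNORECASE)


def linkify_bugs_sketch(escaped_text):
    """Turn 'bug N' into a link. Input must already be HTML-escaped."""
    def replace(match):
        # group(1) is the bug number, group(0) the full matched text.
        return '<a href="https://launchpad.net/bugs/%s">%s</a>' % (
            match.group(1), match.group(0))
    return BUG_RE.sub(replace, escaped_text)


linked = linkify_bugs_sketch('per bug 391780, see also Bug #1')
```

In a Markdown pipeline this kind of rule would more plausibly be an inline pattern inside the renderer than a pass over its output, so that text inside code spans is left alone.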

Martin Pool (mbp) wrote on 2011-11-25:

When I tested again with Markdown 2.1.0 (now in lp-sourcedeps), I see a bit of difference in creating a formatter object every time vs reusing it, but not such a drastic difference that I think we have to worry about it right now::

mbp@joy% ./bin/py -m timeit -r10 -s 'import markdown' 'markdown.Markdown(safe_mode="escape", extensions=["tables", "nl2br"]).convert("foo foo\n" * 20)'
1000 loops, best of 10: 1.31 msec per loop

mbp@joy% ./bin/py -m timeit -r10 -s 'import markdown;md =markdown.Markdown(safe_mode="escape", extensions=["tables", "nl2br"])' 'md.reset();md.convert("foo foo\n" * 20)'
1000 loops, best of 10: 1.07 msec per loop

so about .24ms overhead per markdown block that we format.

Martin Pool (mbp) wrote on 2011-11-25:

It looks like <a rel=nofollow> is going to require development of a new extension, which could probably usefully be sent upstream. So I think it's ok to deploy this as a limited beta, without that - it only makes a difference to search engines, they won't be able to see it while it's hidden, and we can see if there are any other issues.
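
The core of such an extension would probably be a pass over the parsed element tree that marks every link. The sketch below shows only that pass, on a standard-library ElementTree. It assumes (not confirmed in this thread) that a python-markdown tree processor is handed an ElementTree root it can modify in place; `add_nofollow` is a hypothetical name and no extension registration is shown.

```python
import xml.etree.ElementTree as ET


def add_nofollow(root):
    """Set rel="nofollow" on every <a> element under root, in place."""
    for anchor in root.iter('a'):
        anchor.set('rel', 'nofollow')
    return root


# Stand-in for the tree a Markdown parser would produce.
root = ET.fromstring(
    '<div><p>see <a href="http://example.com/">this</a></p></div>')
add_nofollow(root)
serialized = ET.tostring(root, encoding='unicode')
```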

Raphaël Badin (rvb) wrote on 2011-11-25:

> It looks like <a rel=nofollow> is going to require development of a new
> extension, which could probably usefully be sent upstream. So I think it's ok
> to deploy this as a limited beta, without that - it only makes a difference to
> search engines, they won't be able to see it while it's hidden, and we can see
> if there are any other issues.

I understood your goal was to test this on real data… since it's all behind a feature flag (FF) I don't see why you should not land this and see what happens.

Raphaël Badin (rvb) wrote on 2011-11-25:

> When I tested again with Markdown 2.1.0 (now in lp-sourcedeps), I see a bit of
> so about .24ms overhead per markdown block that we format.

Right, the absolute number is not so great but it is still a 30% difference. On the other hand, maybe the difference will be even smaller when rendering more complex markup.

Martin Pool (mbp) wrote on 2011-11-25:

2011/11/25 Raphaël Badin <email address hidden>:
>> When I tested again with Markdown 2.1.0 (now in lp-sourcedeps), I see a bit of
>> so about .24ms overhead per markdown block that we format.
>
> Right, the absolute number is not so great but it is still a 30% difference. On the other hand, maybe the difference will be even smaller when rendering more complex markup.

I think the relevant calculation is not to divide by the render time,
but rather to multiply by the number of blocks on the page. With this
branch, it's at most one, and 0.24ms is negligible.

I think the most we would foreseeably get to is a heavily commented bug
or mp with say 100 comments, all using md, and then it would be 24ms
which is probably still not too bad, considering such a page probably
has a >2000ms render time today.

If eventually we want md even for very fine grained items we might
have to work out something else.

Anyhow, thanks for not blocking initial landing.

--
Martin
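
The calculation described in this comment can be written out as follows, using the figures quoted in the thread (1.31 ms with a fresh instance, 1.07 ms with a reused one, 100 comments, a 2000 ms page). The helper name `markdown_overhead` is hypothetical.

```python
def markdown_overhead(per_block_ms, blocks, page_render_ms):
    """Return (total overhead in ms, overhead as a fraction of page time)."""
    total_ms = per_block_ms * blocks
    return total_ms, total_ms / float(page_render_ms)


# Cost of constructing a formatter per block, from the two timings above.
per_block_ms = 1.31 - 1.07

# Worst case discussed: 100 Markdown comments on a page that takes 2000 ms.
total_ms, fraction = markdown_overhead(
    per_block_ms, blocks=100, page_render_ms=2000)
```

That gives roughly 24 ms in total, a little over 1% of the page's render time, which is the sense in which the per-block overhead is called negligible here.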

Raphaël Badin (rvb) wrote on 2011-11-25:

> I think the relevant calculation is not to divide by the render time,
> but rather to multiply by the number of blocks on the page. With this
> branch, it's at most one, and 0.24ms is negligible.
>
> I think the most we would foreseeably get to is a heavily commented bug
> or mp with say 100 comments, all using md, and then it would be 24ms
> which is probably still not too bad, considering such a page probably
> has a >2000ms render time today.

Good point. I've been working on optimizing queries and stuff for the past 3 weeks and this is fu***** with my brain a little I must say :).

> Anyhow, thanks for not blocking initial landing.

That's what FF are for: landing semi finished stuff to see how it works in production ;)

Anyway, thanks for pushing this forward Martin.
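
The feature-flag pattern referred to here, as the branch applies it, is: render as Markdown only when the flag is set, otherwise fall back to the existing plain-text formatting. A self-contained sketch follows. The `FLAGS` dict, `render`, `plain_text_to_html` and `fake_markdown` are all hypothetical stand-ins; Launchpad's `getFeatureFlag` and `text_to_html` do considerably more than this.

```python
import html

# Hypothetical in-memory flag store standing in for Launchpad's feature rules.
FLAGS = {'markdown.enabled': None}


def plain_text_to_html(text):
    """Fallback path: escape the text and wrap it in a paragraph."""
    return '<p>%s</p>' % html.escape(text)


def render(text, markdown_renderer):
    """Use markdown_renderer only when the flag is set, else fall back."""
    if FLAGS.get('markdown.enabled'):
        return markdown_renderer(text)
    return plain_text_to_html(text)


def fake_markdown(text):
    """Stand-in renderer (not python-markdown) so the sketch runs anywhere."""
    return '<p><em>rendered</em></p>'


off = render('hello **simple** world', fake_markdown)  # flag unset: fallback
FLAGS['markdown.enabled'] = 'on'
on = render('hello **simple** world', fake_markdown)   # flag set: renderer used
```

With the flag unset the markup characters pass through literally, which is the behaviour the `TestMarkdownDisabled` case in the diff checks for.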

Preview Diff

 === modified file 'lib/lp/app/browser/stringformatter.py'
 --- lib/lp/app/browser/stringformatter.py	2011-11-22 06:54:05 +0000
 +++ lib/lp/app/browser/stringformatter.py	2011-11-25 08:27:29 +0000
@@ -22,6 +22,7 @@
  import re
  from lxml import html
  from xml.sax.saxutils import unescape as xml_unescape
++import markdown
  from zope.component import getUtility
  from zope.interface import implements
@@ -39,6 +40,9 @@
      re_email_address,
      obfuscate_email,
      )
++from lp.services.features import (
++    getFeatureFlag,
++    )
  def escape(text, quote=True):
@@ -556,7 +560,8 @@
 *(?P<bugnum>\d+)
        ) |
        (?P<faq>
--        \bfaq(?:[\s=-]|<br\s*/>)*(?:\#|item|number?|num\.?|no\.?)?(?:[\s=-]|<br\s*/>)*
++        \bfaq(?:[\s=-]|<br\s*/>)*(?:\#|item|number?|num\.?|no\.?)?
++        (?:[\s=-]|<br\s*/>)*
 *(?P<faqnum>\d+)
        ) |
        (?P<oops>
@@ -982,6 +987,12 @@
          url = root_url + self._stringtoformat
          return '<a href="%s">%s</a>' % (url, self._stringtoformat)
++    def markdown(self):
++        if getFeatureFlag('markdown.enabled'):
++            return format_markdown(self._stringtoformat)
++        else:
++            return self.text_to_html()
++
      def traverse(self, name, furtherPath):
          if name == 'nl_to_br':
              return self.nl_to_br()
@@ -991,6 +1002,8 @@
              return self.lower()
          elif name == 'break-long-words':
              return self.break_long_words()
++        elif name == 'markdown':
++            return self.markdown()
          elif name == 'text-to-html':
              return self.text_to_html()
          elif name == 'text-to-html-with-target':
@@ -1033,3 +1046,15 @@
              return self.oops_id()
          else:
              raise TraversalError(name)
++
++
++def format_markdown(text):
++    """Return html form of marked-up text."""
++    # This returns whole paragraphs (in p tags), similarly to text_to_html.
++    md = markdown.Markdown(
++        safe_mode='escape',
++        extensions=[
++            'tables',
++            'nl2br',
++            ])
++    return md.convert(text)  # How easy was that?
 === modified file 'lib/lp/app/browser/tests/test_stringformatter.py'
 --- lib/lp/app/browser/tests/test_stringformatter.py	2011-08-28 07:29:11 +0000
 +++ lib/lp/app/browser/tests/test_stringformatter.py	2011-11-25 08:27:29 +0000
@@ -9,6 +9,11 @@
  from textwrap import dedent
  import unittest
++from testtools.matchers import (
++    Equals,
++    Matcher,
++    )
++
  from zope.component import getUtility
  from canonical.config import config
@@ -20,6 +25,7 @@
      linkify_bug_numbers,
      )
  from lp.testing import TestCase
++from lp.services.features.testing import FeatureFixture
  def test_split_paragraphs():
@@ -401,6 +407,50 @@
                  expected_string, formatted_string))
++class MarksDownAs(Matcher):
++
++    def __init__(self, expected_html):
++        self.expected_html = expected_html
++
++    def match(self, input_string):
++        return Equals(self.expected_html).match(
++            FormattersAPI(input_string).markdown())
++
++
++class TestMarkdownDisabled(TestCase):
++    """Feature flag can turn Markdown stuff off.
++    """
++
++    layer = DatabaseFunctionalLayer  # Fixtures need the database for now
++
++    def setUp(self):
++        super(TestMarkdownDisabled, self).setUp()
++        self.useFixture(FeatureFixture({'markdown.enabled': None}))
++
++    def test_plain_text(self):
++        self.assertThat(
++            'hello **simple** world',
++            MarksDownAs('<p>hello **simple** world</p>'))
++
++
++class TestMarkdown(TestCase):
++    """Test for Markdown integration within Launchpad.
++
++    Not an exhaustive test, more of a check for our integration and configuration.
++    """
++
++    layer = DatabaseFunctionalLayer  # Fixtures need the database for now
++
++    def setUp(self):
++        super(TestMarkdown, self).setUp()
++        self.useFixture(FeatureFixture({'markdown.enabled': 'on'}))
++
++    def test_plain_text(self):
++        self.assertThat(
++            'hello world',
++            MarksDownAs('<p>hello world</p>'))
++
++
  def test_suite():
      suite = unittest.TestSuite()
      suite.addTests(DocTestSuite())
 === modified file 'lib/lp/registry/browser/person.py'
 --- lib/lp/registry/browser/person.py	2011-11-20 07:23:45 +0000
 +++ lib/lp/registry/browser/person.py	2011-11-25 08:27:29 +0000
@@ -2381,10 +2381,12 @@
          if content is None:
              return None
          elif self.is_probationary_or_invalid_user:
++            # XXX: Is this really useful?  They can post links in many other
++            # places. -- mbp 2011-11-20.
              return cgi.escape(content)
          else:
              formatter = FormattersAPI
--            return formatter(content).text_to_html()
++            return formatter(content).markdown()
      @cachedproperty
      def recently_approved_members(self):
 === modified file 'lib/lp/registry/templates/product-index.pt'
 --- lib/lp/registry/templates/product-index.pt	2011-06-16 13:50:58 +0000
 +++ lib/lp/registry/templates/product-index.pt	2011-11-25 08:27:29 +0000
@@ -55,14 +55,15 @@
            <a tal:replace="structure overview_menu/review_license/fmt:icon"/>
          </p>
--        <div class="summary" tal:content="context/summary">
++	<div class="summary"
++	  tal:content="structure context/summary/fmt:markdown">
            $Product.summary goes here. This should be quite short,
            just a single paragraph of text really, giving the project
            highlights.
          </div>
          <div class="description"
--          tal:content="structure context/description/fmt:text-to-html"
++          tal:content="structure context/description/fmt:markdown"
            tal:condition="context/description">
            $Product.description goes here. This should be a longer piece of
            text, up to three paragraphs in length, which gives much more
 === modified file 'lib/lp/services/features/flags.py'
 --- lib/lp/services/features/flags.py	2011-11-24 15:58:51 +0000
 +++ lib/lp/services/features/flags.py	2011-11-25 08:27:29 +0000
@@ -113,6 +113,12 @@
       '',
       '',
       ''),
++    ('markdown.enabled',
++     'boolean',
++     'Interpret selected user content as Markdown.',
++     'disabled',
++     'Markdown',
++     'https://launchpad.net/bugs/391780'),
      ('memcache',
       'boolean',
       'Enables use of memcached where it is supported.',
 === modified file 'setup.py'
 --- setup.py	2011-11-07 12:47:36 +0000
 +++ setup.py	2011-11-25 08:27:29 +0000
@@ -52,6 +52,7 @@
          # Required for launchpadlib
          'keyring',
          'manuel',
++        'Markdown',
          'mechanize',
          'meliae',
          'mercurial',
 === modified file 'versions.cfg'
 --- versions.cfg	2011-11-21 15:52:36 +0000
 +++ versions.cfg	2011-11-25 08:27:29 +0000
@@ -43,6 +43,7 @@
  lazr.testing = 0.1.1
  lazr.uri = 1.0.2
  manuel = 1.1.1
++Markdown = 2.1.0
  martian = 0.11
  mechanize = 0.1.11
  meliae = 0.2.0.final.0