Merge lp:~mbp/launchpad/391780-markdown into lp:launchpad

Proposed by Martin Pool
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: 14384
Proposed branch: lp:~mbp/launchpad/391780-markdown
Merge into: lp:launchpad
Diff against target: 225 lines (+90/-4)
7 files modified
lib/lp/app/browser/stringformatter.py (+26/-1)
lib/lp/app/browser/tests/test_stringformatter.py (+50/-0)
lib/lp/registry/browser/person.py (+3/-1)
lib/lp/registry/templates/product-index.pt (+3/-2)
lib/lp/services/features/flags.py (+6/-0)
setup.py (+1/-0)
versions.cfg (+1/-0)
To merge this branch: bzr merge lp:~mbp/launchpad/391780-markdown
Reviewer Review Type Date Requested Status
Raphaël Badin (community) Approve
Review via email: mp+82832@code.launchpad.net

Commit message

[r=rvb][bug=391780][incr] Markdown markup in project and user home pages

Description of the change

Support Markdown markup in project and user home pages, per bug 391780.

To post a comment you must log in.
Revision history for this message
Raphaël Badin (rvb) wrote :

Looks good.

A few remarks:

[0]

50 +def format_markdown(text):
51 + """Return html form of marked-up text."""
52 + # This returns whole paragraphs (in p tags), similarly to text_to_html.
53 + md = markdown.Markdown(
54 + safe_mode='escape',
55 + extensions=[
56 + 'tables',
57 + ])
58 + return md.convert(text) # How easy was that?

Looks like there is a big overhead in creating the Mardown class each time we have a string to convert:

python -m timeit -s 'import markdown; md = markdown.Markdown(safe_mode="escape",extensions=[ "tables",])' 'for x in range(1000): md.convert("a *b* a");'
10 loops, best of 3: 246 msec per loop

python -m timeit -s 'import markdown;' 'for x in range(1000): markdown.Markdown(safe_mode="escape",extensions=[ "tables",]).convert("a *b* a");'
10 loops, best of 3: 471 msec per loop

You might want to create and reuse instances stored in threading.local.

[1]

125 + def test_plain_text(self):
126 + self.assertThat(
127 + 'hello world',
128 + MarksDownAs('<p>hello world</p>'))

Don't you want to include 'real' markdown markup here, just as an illustration for the reader?

[2]

160 - <div class="summary" tal:content="context/summary">
161 + <div class="summary"
162 + tal:content="structure context/summary/fmt:markdown">
163 $Product.summary goes here. This should be quite short,

Small indentation inconsistency here ;)

review: Approve
Revision history for this message
Martin Pool (mbp) wrote :

thanks for all these reviews.

this is not quite ready to land, with some issues still discussed on the bug.

Thank you for timing this - of course we have to do that first.
That's a good point that constructing the instance seems expensive. I
was worried by your timings until I realized you're actually measuring
1000 iterations. On my machine:

"a *b* a" is 0.246ms
10 plain text lines - 0.34ms
100 indented code lines - 0.4ms
100 lines with inline emphasis and links - 29ms
1000 lines with markup - 306ms

So for really large amounts of text it is starting to eat up a fair
amount of our render budget, but for more realistic amounts it will be
ok. Perhaps in a future release we can render on the client (if the
client is smart) and then actually take net load off the server.

Revision history for this message
Alexander Regueiro (alexreg) wrote :

I'm not sure the efficiency here is a big problem, though it does intrigue me. Which Python library are you using for Markdown rendering? There exist efficient C libraries, if it would make a difference.

Revision history for this message
Huw Wilkins (huwshimi) wrote :

@mbp one thing we'll need to do is style this content (for all the different types of elements Markdown generates). The consideration here is that this is a user controlled content block. We'll want to style the elements here slightly differently to how we style the same elements elsewhere (or at least have the ability to do so).

I guess to make that happen it would be easiest if the element wrapping the markdown content had the class "markdown".

Revision history for this message
Martin Pool (mbp) wrote :

alexreg python-markdown 2.0.3.

i think we can safely keep the option of a C rewrite in our pocket until later.

Revision history for this message
Martin Pool (mbp) wrote :

I think what still needs to be done before the first tranche can land is:

 * make links in the output be rel=nofollow

and good to in the first landing or soon after

 * put some kind of css marker on markdown formatted content
 * update to Markdown 2.1beta so we can have nl2br
 * add patterns for things that are currently linkified by launchpad (bug N etc)

Revision history for this message
Alexander Regueiro (alexreg) wrote :

Fair enough. Let's wait until efficiency proves itself as a problem in that case.

All four points seem quite sensible, though I think the last one won't prove too useful until we get wiki support in Launchpad (which I hope to have a go at soon). On project summary pages -- the only current application afaik -- there's no rush probably.

Revision history for this message
Martin Pool (mbp) wrote :

Linkifying text like 'bug 391780' is important otherwise this will be a regression compared to the way text is currently displayed. Though, it is probably fairly unlikely that kind of text occurs in project or person home pages, so perhaps it's not crucial. It would be important for bug comments.

Revision history for this message
Martin Pool (mbp) wrote :

When I tested again with Markdown 2.1.0 (now in lp-sourcedeps), I see a bit of difference in creating a formatter object every time vs reusing it, but not such a drastic difference that I think we have to worry about it right now::

mbp@joy% ./bin/py -m timeit -r10 -s 'import markdown' 'markdown.Markdown(safe_mode="escape", extensions=["tables", "nl2br"]).convert("foo foo\n" * 20)'
1000 loops, best of 10: 1.31 msec per loop

mbp@joy% ./bin/py -m timeit -r10 -s 'import markdown;md =markdown.Markdown(safe_mode="escape", extensions=["tables", "nl2br"])' 'md.reset();md.convert("foo foo\n" * 20)'
1000 loops, best of 10: 1.07 msec per loop

so about .24ms overhead per markdown block that we format.

Revision history for this message
Martin Pool (mbp) wrote :

It looks like <a rel=nofollow> is going to require development of a new extension, which could probably usefully be sent upstream. So I think it's ok to deploy this as a limited beta, without that - it only makes a difference to search engines, they won't be able to see it while it's hidden, and we can see if there are any other issues.

Revision history for this message
Raphaël Badin (rvb) wrote :

> It looks like <a rel=nofollow> is going to require development of a new
> extension, which could probably usefully be sent upstream. So I think it's ok
> to deploy this as a limited beta, without that - it only makes a difference to
> search engines, they won't be able to see it while it's hidden, and we can see
> if there are any other issues.

I understood your goal was to test this on real data… since it's all behind a FF I don't see why you should not land this and see what happens.

Revision history for this message
Raphaël Badin (rvb) wrote :

> When I tested again with Markdown 2.1.0 (now in lp-sourcedeps), I see a bit of
> so about .24ms overhead per markdown block that we format.

Right, the absolute number is not so great but it is still a 30% difference. On the other hand, maybe the difference will be even smaller when rendering more complex markup.

Revision history for this message
Martin Pool (mbp) wrote :

2011/11/25 Raphaël Badin <email address hidden>:
>> When I tested again with Markdown 2.1.0 (now in lp-sourcedeps), I see a bit of
>> so about .24ms overhead per markdown block that we format.
>
> Right, the absolute number is not so great but it is still a 30% difference.  On the other hand, maybe the difference will be even smaller when rendering more complex markup.

I think the relevant calculation is not to divide by the render time,
but rather to multiply by the number of blocks on the page. With this
branch, it's at most one, and 0.24ms is negligible.

I think the most we would forseeably get to is a heavily commented bug
or mp with say 100 comments, all using md, and then it would be 24ms
which is probably still not too bad, considering such a page probably
has a >2000ms render time today.

If eventually we want md even for very fine grained items we might
have to work out something else.

Anyhow, thanks for not blocking initial landing.

--
Martin

Revision history for this message
Raphaël Badin (rvb) wrote :

> I think the relevant calculation is not to divide by the render time,
> but rather to multiply by the number of blocks on the page. With this
> branch, it's at most one, and 0.24ms is negligible.
>
> I think the most we would forseeably get to is a heavily commented bug
> or mp with say 100 comments, all using md, and then it would be 24ms
> which is probably still not too bad, considering such a page probably
> has a >2000ms render time today.

Good point. I've been working on optimizing queries and stuff for the past 3 weeks and this is fu***** with my brain a little I must say :).

> Anyhow, thanks for not blocking initial landing.

That's what FF are for: landing semi finished stuff to see how it works in production ;)

Anyway, thanks for pushing this forward Martin.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/app/browser/stringformatter.py'
--- lib/lp/app/browser/stringformatter.py 2011-11-22 06:54:05 +0000
+++ lib/lp/app/browser/stringformatter.py 2011-11-25 08:27:29 +0000
@@ -22,6 +22,7 @@
22import re22import re
23from lxml import html23from lxml import html
24from xml.sax.saxutils import unescape as xml_unescape24from xml.sax.saxutils import unescape as xml_unescape
25import markdown
2526
26from zope.component import getUtility27from zope.component import getUtility
27from zope.interface import implements28from zope.interface import implements
@@ -39,6 +40,9 @@
39 re_email_address,40 re_email_address,
40 obfuscate_email,41 obfuscate_email,
41 )42 )
43from lp.services.features import (
44 getFeatureFlag,
45 )
4246
4347
44def escape(text, quote=True):48def escape(text, quote=True):
@@ -556,7 +560,8 @@
556 0*(?P<bugnum>\d+)560 0*(?P<bugnum>\d+)
557 ) |561 ) |
558 (?P<faq>562 (?P<faq>
559 \bfaq(?:[\s=-]|<br\s*/>)*(?:\#|item|number?|num\.?|no\.?)?(?:[\s=-]|<br\s*/>)*563 \bfaq(?:[\s=-]|<br\s*/>)*(?:\#|item|number?|num\.?|no\.?)?
564 (?:[\s=-]|<br\s*/>)*
560 0*(?P<faqnum>\d+)565 0*(?P<faqnum>\d+)
561 ) |566 ) |
562 (?P<oops>567 (?P<oops>
@@ -982,6 +987,12 @@
982 url = root_url + self._stringtoformat987 url = root_url + self._stringtoformat
983 return '<a href="%s">%s</a>' % (url, self._stringtoformat)988 return '<a href="%s">%s</a>' % (url, self._stringtoformat)
984989
990 def markdown(self):
991 if getFeatureFlag('markdown.enabled'):
992 return format_markdown(self._stringtoformat)
993 else:
994 return self.text_to_html()
995
985 def traverse(self, name, furtherPath):996 def traverse(self, name, furtherPath):
986 if name == 'nl_to_br':997 if name == 'nl_to_br':
987 return self.nl_to_br()998 return self.nl_to_br()
@@ -991,6 +1002,8 @@
991 return self.lower()1002 return self.lower()
992 elif name == 'break-long-words':1003 elif name == 'break-long-words':
993 return self.break_long_words()1004 return self.break_long_words()
1005 elif name == 'markdown':
1006 return self.markdown()
994 elif name == 'text-to-html':1007 elif name == 'text-to-html':
995 return self.text_to_html()1008 return self.text_to_html()
996 elif name == 'text-to-html-with-target':1009 elif name == 'text-to-html-with-target':
@@ -1033,3 +1046,15 @@
1033 return self.oops_id()1046 return self.oops_id()
1034 else:1047 else:
1035 raise TraversalError(name)1048 raise TraversalError(name)
1049
1050
1051def format_markdown(text):
1052 """Return html form of marked-up text."""
1053 # This returns whole paragraphs (in p tags), similarly to text_to_html.
1054 md = markdown.Markdown(
1055 safe_mode='escape',
1056 extensions=[
1057 'tables',
1058 'nl2br',
1059 ])
1060 return md.convert(text) # How easy was that?
10361061
=== modified file 'lib/lp/app/browser/tests/test_stringformatter.py'
--- lib/lp/app/browser/tests/test_stringformatter.py 2011-08-28 07:29:11 +0000
+++ lib/lp/app/browser/tests/test_stringformatter.py 2011-11-25 08:27:29 +0000
@@ -9,6 +9,11 @@
9from textwrap import dedent9from textwrap import dedent
10import unittest10import unittest
1111
12from testtools.matchers import (
13 Equals,
14 Matcher,
15 )
16
12from zope.component import getUtility17from zope.component import getUtility
1318
14from canonical.config import config19from canonical.config import config
@@ -20,6 +25,7 @@
20 linkify_bug_numbers,25 linkify_bug_numbers,
21 )26 )
22from lp.testing import TestCase27from lp.testing import TestCase
28from lp.services.features.testing import FeatureFixture
2329
2430
25def test_split_paragraphs():31def test_split_paragraphs():
@@ -401,6 +407,50 @@
401 expected_string, formatted_string))407 expected_string, formatted_string))
402408
403409
410class MarksDownAs(Matcher):
411
412 def __init__(self, expected_html):
413 self.expected_html = expected_html
414
415 def match(self, input_string):
416 return Equals(self.expected_html).match(
417 FormattersAPI(input_string).markdown())
418
419
420class TestMarkdownDisabled(TestCase):
421 """Feature flag can turn Markdown stuff off.
422 """
423
424 layer = DatabaseFunctionalLayer # Fixtures need the database for now
425
426 def setUp(self):
427 super(TestMarkdownDisabled, self).setUp()
428 self.useFixture(FeatureFixture({'markdown.enabled': None}))
429
430 def test_plain_text(self):
431 self.assertThat(
432 'hello **simple** world',
433 MarksDownAs('<p>hello **simple** world</p>'))
434
435
436class TestMarkdown(TestCase):
437 """Test for Markdown integration within Launchpad.
438
439 Not an exhaustive test, more of a check for our integration and configuration.
440 """
441
442 layer = DatabaseFunctionalLayer # Fixtures need the database for now
443
444 def setUp(self):
445 super(TestMarkdown, self).setUp()
446 self.useFixture(FeatureFixture({'markdown.enabled': 'on'}))
447
448 def test_plain_text(self):
449 self.assertThat(
450 'hello world',
451 MarksDownAs('<p>hello world</p>'))
452
453
404def test_suite():454def test_suite():
405 suite = unittest.TestSuite()455 suite = unittest.TestSuite()
406 suite.addTests(DocTestSuite())456 suite.addTests(DocTestSuite())
407457
=== modified file 'lib/lp/registry/browser/person.py'
--- lib/lp/registry/browser/person.py 2011-11-20 07:23:45 +0000
+++ lib/lp/registry/browser/person.py 2011-11-25 08:27:29 +0000
@@ -2381,10 +2381,12 @@
2381 if content is None:2381 if content is None:
2382 return None2382 return None
2383 elif self.is_probationary_or_invalid_user:2383 elif self.is_probationary_or_invalid_user:
2384 # XXX: Is this really useful? They can post links in many other
2385 # places. -- mbp 2011-11-20.
2384 return cgi.escape(content)2386 return cgi.escape(content)
2385 else:2387 else:
2386 formatter = FormattersAPI2388 formatter = FormattersAPI
2387 return formatter(content).text_to_html()2389 return formatter(content).markdown()
23882390
2389 @cachedproperty2391 @cachedproperty
2390 def recently_approved_members(self):2392 def recently_approved_members(self):
23912393
=== modified file 'lib/lp/registry/templates/product-index.pt'
--- lib/lp/registry/templates/product-index.pt 2011-06-16 13:50:58 +0000
+++ lib/lp/registry/templates/product-index.pt 2011-11-25 08:27:29 +0000
@@ -55,14 +55,15 @@
55 <a tal:replace="structure overview_menu/review_license/fmt:icon"/>55 <a tal:replace="structure overview_menu/review_license/fmt:icon"/>
56 </p>56 </p>
5757
58 <div class="summary" tal:content="context/summary">58 <div class="summary"
59 tal:content="structure context/summary/fmt:markdown">
59 $Product.summary goes here. This should be quite short,60 $Product.summary goes here. This should be quite short,
60 just a single paragraph of text really, giving the project61 just a single paragraph of text really, giving the project
61 highlights.62 highlights.
62 </div>63 </div>
6364
64 <div class="description"65 <div class="description"
65 tal:content="structure context/description/fmt:text-to-html"66 tal:content="structure context/description/fmt:markdown"
66 tal:condition="context/description">67 tal:condition="context/description">
67 $Product.description goes here. This should be a longer piece of68 $Product.description goes here. This should be a longer piece of
68 text, up to three paragraphs in length, which gives much more69 text, up to three paragraphs in length, which gives much more
6970
=== modified file 'lib/lp/services/features/flags.py'
--- lib/lp/services/features/flags.py 2011-11-24 15:58:51 +0000
+++ lib/lp/services/features/flags.py 2011-11-25 08:27:29 +0000
@@ -113,6 +113,12 @@
113 '',113 '',
114 '',114 '',
115 ''),115 ''),
116 ('markdown.enabled',
117 'boolean',
118 'Interpret selected user content as Markdown.',
119 'disabled',
120 'Markdown',
121 'https://launchpad.net/bugs/391780'),
116 ('memcache',122 ('memcache',
117 'boolean',123 'boolean',
118 'Enables use of memcached where it is supported.',124 'Enables use of memcached where it is supported.',
119125
=== modified file 'setup.py'
--- setup.py 2011-11-07 12:47:36 +0000
+++ setup.py 2011-11-25 08:27:29 +0000
@@ -52,6 +52,7 @@
52 # Required for launchpadlib52 # Required for launchpadlib
53 'keyring',53 'keyring',
54 'manuel',54 'manuel',
55 'Markdown',
55 'mechanize',56 'mechanize',
56 'meliae',57 'meliae',
57 'mercurial',58 'mercurial',
5859
=== modified file 'versions.cfg'
--- versions.cfg 2011-11-21 15:52:36 +0000
+++ versions.cfg 2011-11-25 08:27:29 +0000
@@ -43,6 +43,7 @@
43lazr.testing = 0.1.143lazr.testing = 0.1.1
44lazr.uri = 1.0.244lazr.uri = 1.0.2
45manuel = 1.1.145manuel = 1.1.1
46Markdown = 2.1.0
46martian = 0.1147martian = 0.11
47mechanize = 0.1.1148mechanize = 0.1.11
48meliae = 0.2.0.final.049meliae = 0.2.0.final.0