Merge lp:~mbp/launchpad/391780-markdown into lp:launchpad

Proposed by Martin Pool
Status: Merged
Approved by: Martin Pool
Approved revision: no longer in the source branch.
Merged at revision: 14384
Proposed branch: lp:~mbp/launchpad/391780-markdown
Merge into: lp:launchpad
Diff against target: 225 lines (+90/-4)
7 files modified
lib/lp/app/browser/stringformatter.py (+26/-1)
lib/lp/app/browser/tests/test_stringformatter.py (+50/-0)
lib/lp/registry/browser/person.py (+3/-1)
lib/lp/registry/templates/product-index.pt (+3/-2)
lib/lp/services/features/flags.py (+6/-0)
setup.py (+1/-0)
versions.cfg (+1/-0)
To merge this branch: bzr merge lp:~mbp/launchpad/391780-markdown
Reviewer Review Type Date Requested Status
Raphaël Badin (community) Approve
Review via email: mp+82832@code.launchpad.net

Commit message

[r=rvb][bug=391780][incr] Markdown markup in project and user home pages

Description of the change

Support Markdown markup in project and user home pages, per bug 391780.

To post a comment you must log in.
Revision history for this message
Raphaël Badin (rvb) wrote :

Looks good.

A few remarks:

[0]

50 +def format_markdown(text):
51 + """Return html form of marked-up text."""
52 + # This returns whole paragraphs (in p tags), similarly to text_to_html.
53 + md = markdown.Markdown(
54 + safe_mode='escape',
55 + extensions=[
56 + 'tables',
57 + ])
58 + return md.convert(text) # How easy was that?

Looks like there is a big overhead in creating the Mardown class each time we have a string to convert:

python -m timeit -s 'import markdown; md = markdown.Markdown(safe_mode="escape",extensions=[ "tables",])' 'for x in range(1000): md.convert("a *b* a");'
10 loops, best of 3: 246 msec per loop

python -m timeit -s 'import markdown;' 'for x in range(1000): markdown.Markdown(safe_mode="escape",extensions=[ "tables",]).convert("a *b* a");'
10 loops, best of 3: 471 msec per loop

You might want to create and reuse instances stored in threading.local.

[1]

125 + def test_plain_text(self):
126 + self.assertThat(
127 + 'hello world',
128 + MarksDownAs('<p>hello world</p>'))

Don't you want to include 'real' markdown markup here, just as an illustration for the reader?

[2]

160 - <div class="summary" tal:content="context/summary">
161 + <div class="summary"
162 + tal:content="structure context/summary/fmt:markdown">
163 $Product.summary goes here. This should be quite short,

Small indentation inconsistency here ;)

review: Approve
Revision history for this message
Martin Pool (mbp) wrote :

thanks for all these reviews.

this is not quite ready to land, with some issues still discussed on the bug.

Thank you for timing this - of course we have to do that first.
That's a good point that constructing the instance seems expensive. I
was worried by your timings until I realized you're actually measuring
1000 iterations. On my machine:

"a *b* a" is 0.246ms
10 plain text lines - 0.34ms
100 indented code lines - 0.4ms
100 lines with inline emphasis and links - 29ms
1000 lines with markup - 306ms

So for really large amounts of text it is starting to eat up a fair
amount of our render budget, but for more realistic amounts it will be
ok. Perhaps in a future release we can render on the client (if the
client is smart) and then actually take net load off the server.

Revision history for this message
Alexander Regueiro (alexreg) wrote :

I'm not sure the efficiency here is a big problem, though it does intrigue me. Which Python library are you using for Markdown rendering? There exist efficient C libraries, if it would make a difference.

Revision history for this message
Huw Wilkins (huwshimi) wrote :

@mbp one thing we'll need to do is style this content (for all the different types of elements Markdown generates). The consideration here is that this is a user controlled content block. We'll want to style the elements here slightly differently to how we style the same elements elsewhere (or at least have the ability to do so).

I guess to make that happen it would be easiest if the element wrapping the markdown content had the class "markdown".

Revision history for this message
Martin Pool (mbp) wrote :

alexreg python-markdown 2.0.3.

i think we can safely keep the option of a C rewrite in our pocket until later.

Revision history for this message
Martin Pool (mbp) wrote :

I think what still needs to be done before the first tranche can land is:

 * make links in the output be rel=nofollow

and good to in the first landing or soon after

 * put some kind of css marker on markdown formatted content
 * update to Markdown 2.1beta so we can have nl2br
 * add patterns for things that are currently linkified by launchpad (bug N etc)

Revision history for this message
Alexander Regueiro (alexreg) wrote :

Fair enough. Let's wait until efficiency proves itself as a problem in that case.

All four points seem quite sensible, though I think the last one won't prove too useful until we get wiki support in Launchpad (which I hope to have a go at soon). On project summary pages -- the only current application afaik -- there's no rush probably.

Revision history for this message
Martin Pool (mbp) wrote :

Linkifying text like 'bug 391780' is important otherwise this will be a regression compared to the way text is currently displayed. Though, it is probably fairly unlikely that kind of text occurs in project or person home pages, so perhaps it's not crucial. It would be important for bug comments.

Revision history for this message
Martin Pool (mbp) wrote :

When I tested again with Markdown 2.1.0 (now in lp-sourcedeps), I see a bit of difference in creating a formatter object every time vs reusing it, but not such a drastic difference that I think we have to worry about it right now::

mbp@joy% ./bin/py -m timeit -r10 -s 'import markdown' 'markdown.Markdown(safe_mode="escape", extensions=["tables", "nl2br"]).convert("foo foo\n" * 20)'
1000 loops, best of 10: 1.31 msec per loop

mbp@joy% ./bin/py -m timeit -r10 -s 'import markdown;md =markdown.Markdown(safe_mode="escape", extensions=["tables", "nl2br"])' 'md.reset();md.convert("foo foo\n" * 20)'
1000 loops, best of 10: 1.07 msec per loop

so about .24ms overhead per markdown block that we format.

Revision history for this message
Martin Pool (mbp) wrote :

It looks like <a rel=nofollow> is going to require development of a new extension, which could probably usefully be sent upstream. So I think it's ok to deploy this as a limited beta, without that - it only makes a difference to search engines, they won't be able to see it while it's hidden, and we can see if there are any other issues.

Revision history for this message
Raphaël Badin (rvb) wrote :

> It looks like <a rel=nofollow> is going to require development of a new
> extension, which could probably usefully be sent upstream. So I think it's ok
> to deploy this as a limited beta, without that - it only makes a difference to
> search engines, they won't be able to see it while it's hidden, and we can see
> if there are any other issues.

I understood your goal was to test this on real data… since it's all behind a FF I don't see why you should not land this and see what happens.

Revision history for this message
Raphaël Badin (rvb) wrote :

> When I tested again with Markdown 2.1.0 (now in lp-sourcedeps), I see a bit of
> so about .24ms overhead per markdown block that we format.

Right, the absolute number is not so great but it is still a 30% difference. On the other hand, maybe the difference will be even smaller when rendering more complex markup.

Revision history for this message
Martin Pool (mbp) wrote :

2011/11/25 Raphaël Badin <email address hidden>:
>> When I tested again with Markdown 2.1.0 (now in lp-sourcedeps), I see a bit of
>> so about .24ms overhead per markdown block that we format.
>
> Right, the absolute number is not so great but it is still a 30% difference.  On the other hand, maybe the difference will be even smaller when rendering more complex markup.

I think the relevant calculation is not to divide by the render time,
but rather to multiply by the number of blocks on the page. With this
branch, it's at most one, and 0.24ms is negligible.

I think the most we would forseeably get to is a heavily commented bug
or mp with say 100 comments, all using md, and then it would be 24ms
which is probably still not too bad, considering such a page probably
has a >2000ms render time today.

If eventually we want md even for very fine grained items we might
have to work out something else.

Anyhow, thanks for not blocking initial landing.

--
Martin

Revision history for this message
Raphaël Badin (rvb) wrote :

> I think the relevant calculation is not to divide by the render time,
> but rather to multiply by the number of blocks on the page. With this
> branch, it's at most one, and 0.24ms is negligible.
>
> I think the most we would forseeably get to is a heavily commented bug
> or mp with say 100 comments, all using md, and then it would be 24ms
> which is probably still not too bad, considering such a page probably
> has a >2000ms render time today.

Good point. I've been working on optimizing queries and stuff for the past 3 weeks and this is fu***** with my brain a little I must say :).

> Anyhow, thanks for not blocking initial landing.

That's what FF are for: landing semi finished stuff to see how it works in production ;)

Anyway, thanks for pushing this forward Martin.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/app/browser/stringformatter.py'
2--- lib/lp/app/browser/stringformatter.py 2011-11-22 06:54:05 +0000
3+++ lib/lp/app/browser/stringformatter.py 2011-11-25 08:27:29 +0000
4@@ -22,6 +22,7 @@
5 import re
6 from lxml import html
7 from xml.sax.saxutils import unescape as xml_unescape
8+import markdown
9
10 from zope.component import getUtility
11 from zope.interface import implements
12@@ -39,6 +40,9 @@
13 re_email_address,
14 obfuscate_email,
15 )
16+from lp.services.features import (
17+ getFeatureFlag,
18+ )
19
20
21 def escape(text, quote=True):
22@@ -556,7 +560,8 @@
23 0*(?P<bugnum>\d+)
24 ) |
25 (?P<faq>
26- \bfaq(?:[\s=-]|<br\s*/>)*(?:\#|item|number?|num\.?|no\.?)?(?:[\s=-]|<br\s*/>)*
27+ \bfaq(?:[\s=-]|<br\s*/>)*(?:\#|item|number?|num\.?|no\.?)?
28+ (?:[\s=-]|<br\s*/>)*
29 0*(?P<faqnum>\d+)
30 ) |
31 (?P<oops>
32@@ -982,6 +987,12 @@
33 url = root_url + self._stringtoformat
34 return '<a href="%s">%s</a>' % (url, self._stringtoformat)
35
36+ def markdown(self):
37+ if getFeatureFlag('markdown.enabled'):
38+ return format_markdown(self._stringtoformat)
39+ else:
40+ return self.text_to_html()
41+
42 def traverse(self, name, furtherPath):
43 if name == 'nl_to_br':
44 return self.nl_to_br()
45@@ -991,6 +1002,8 @@
46 return self.lower()
47 elif name == 'break-long-words':
48 return self.break_long_words()
49+ elif name == 'markdown':
50+ return self.markdown()
51 elif name == 'text-to-html':
52 return self.text_to_html()
53 elif name == 'text-to-html-with-target':
54@@ -1033,3 +1046,15 @@
55 return self.oops_id()
56 else:
57 raise TraversalError(name)
58+
59+
60+def format_markdown(text):
61+ """Return html form of marked-up text."""
62+ # This returns whole paragraphs (in p tags), similarly to text_to_html.
63+ md = markdown.Markdown(
64+ safe_mode='escape',
65+ extensions=[
66+ 'tables',
67+ 'nl2br',
68+ ])
69+ return md.convert(text) # How easy was that?
70
71=== modified file 'lib/lp/app/browser/tests/test_stringformatter.py'
72--- lib/lp/app/browser/tests/test_stringformatter.py 2011-08-28 07:29:11 +0000
73+++ lib/lp/app/browser/tests/test_stringformatter.py 2011-11-25 08:27:29 +0000
74@@ -9,6 +9,11 @@
75 from textwrap import dedent
76 import unittest
77
78+from testtools.matchers import (
79+ Equals,
80+ Matcher,
81+ )
82+
83 from zope.component import getUtility
84
85 from canonical.config import config
86@@ -20,6 +25,7 @@
87 linkify_bug_numbers,
88 )
89 from lp.testing import TestCase
90+from lp.services.features.testing import FeatureFixture
91
92
93 def test_split_paragraphs():
94@@ -401,6 +407,50 @@
95 expected_string, formatted_string))
96
97
98+class MarksDownAs(Matcher):
99+
100+ def __init__(self, expected_html):
101+ self.expected_html = expected_html
102+
103+ def match(self, input_string):
104+ return Equals(self.expected_html).match(
105+ FormattersAPI(input_string).markdown())
106+
107+
108+class TestMarkdownDisabled(TestCase):
109+ """Feature flag can turn Markdown stuff off.
110+ """
111+
112+ layer = DatabaseFunctionalLayer # Fixtures need the database for now
113+
114+ def setUp(self):
115+ super(TestMarkdownDisabled, self).setUp()
116+ self.useFixture(FeatureFixture({'markdown.enabled': None}))
117+
118+ def test_plain_text(self):
119+ self.assertThat(
120+ 'hello **simple** world',
121+ MarksDownAs('<p>hello **simple** world</p>'))
122+
123+
124+class TestMarkdown(TestCase):
125+ """Test for Markdown integration within Launchpad.
126+
127+ Not an exhaustive test, more of a check for our integration and configuration.
128+ """
129+
130+ layer = DatabaseFunctionalLayer # Fixtures need the database for now
131+
132+ def setUp(self):
133+ super(TestMarkdown, self).setUp()
134+ self.useFixture(FeatureFixture({'markdown.enabled': 'on'}))
135+
136+ def test_plain_text(self):
137+ self.assertThat(
138+ 'hello world',
139+ MarksDownAs('<p>hello world</p>'))
140+
141+
142 def test_suite():
143 suite = unittest.TestSuite()
144 suite.addTests(DocTestSuite())
145
146=== modified file 'lib/lp/registry/browser/person.py'
147--- lib/lp/registry/browser/person.py 2011-11-20 07:23:45 +0000
148+++ lib/lp/registry/browser/person.py 2011-11-25 08:27:29 +0000
149@@ -2381,10 +2381,12 @@
150 if content is None:
151 return None
152 elif self.is_probationary_or_invalid_user:
153+ # XXX: Is this really useful? They can post links in many other
154+ # places. -- mbp 2011-11-20.
155 return cgi.escape(content)
156 else:
157 formatter = FormattersAPI
158- return formatter(content).text_to_html()
159+ return formatter(content).markdown()
160
161 @cachedproperty
162 def recently_approved_members(self):
163
164=== modified file 'lib/lp/registry/templates/product-index.pt'
165--- lib/lp/registry/templates/product-index.pt 2011-06-16 13:50:58 +0000
166+++ lib/lp/registry/templates/product-index.pt 2011-11-25 08:27:29 +0000
167@@ -55,14 +55,15 @@
168 <a tal:replace="structure overview_menu/review_license/fmt:icon"/>
169 </p>
170
171- <div class="summary" tal:content="context/summary">
172+ <div class="summary"
173+ tal:content="structure context/summary/fmt:markdown">
174 $Product.summary goes here. This should be quite short,
175 just a single paragraph of text really, giving the project
176 highlights.
177 </div>
178
179 <div class="description"
180- tal:content="structure context/description/fmt:text-to-html"
181+ tal:content="structure context/description/fmt:markdown"
182 tal:condition="context/description">
183 $Product.description goes here. This should be a longer piece of
184 text, up to three paragraphs in length, which gives much more
185
186=== modified file 'lib/lp/services/features/flags.py'
187--- lib/lp/services/features/flags.py 2011-11-24 15:58:51 +0000
188+++ lib/lp/services/features/flags.py 2011-11-25 08:27:29 +0000
189@@ -113,6 +113,12 @@
190 '',
191 '',
192 ''),
193+ ('markdown.enabled',
194+ 'boolean',
195+ 'Interpret selected user content as Markdown.',
196+ 'disabled',
197+ 'Markdown',
198+ 'https://launchpad.net/bugs/391780'),
199 ('memcache',
200 'boolean',
201 'Enables use of memcached where it is supported.',
202
203=== modified file 'setup.py'
204--- setup.py 2011-11-07 12:47:36 +0000
205+++ setup.py 2011-11-25 08:27:29 +0000
206@@ -52,6 +52,7 @@
207 # Required for launchpadlib
208 'keyring',
209 'manuel',
210+ 'Markdown',
211 'mechanize',
212 'meliae',
213 'mercurial',
214
215=== modified file 'versions.cfg'
216--- versions.cfg 2011-11-21 15:52:36 +0000
217+++ versions.cfg 2011-11-25 08:27:29 +0000
218@@ -43,6 +43,7 @@
219 lazr.testing = 0.1.1
220 lazr.uri = 1.0.2
221 manuel = 1.1.1
222+Markdown = 2.1.0
223 martian = 0.11
224 mechanize = 0.1.11
225 meliae = 0.2.0.final.0