Merge lp:~mitya57/ubuntu/raring/python-docutils/updated-aliases-patch into lp:ubuntu/raring/python-docutils

Proposed by Dmitry Shachnev
Status: Merged
Merged at revision: 33
Proposed branch: lp:~mitya57/ubuntu/raring/python-docutils/updated-aliases-patch
Merge into: lp:ubuntu/raring/python-docutils
Diff against target: 10178 lines (+9862/-50)
14 files modified
.pc/applied-patches (+1/-0)
.pc/support-aliases-in-references.diff/docs/ref/rst/restructuredtext.txt (+2979/-0)
.pc/support-aliases-in-references.diff/docutils/parsers/rst/states.py (+3052/-0)
.pc/support-aliases-in-references.diff/docutils/transforms/references.py (+903/-0)
.pc/support-aliases-in-references.diff/test/test_parsers/test_rst/test_inline_markup.py (+1682/-0)
.pc/support-aliases-in-references.diff/test/test_transforms/test_hyperlinks.py (+843/-0)
debian/changelog (+7/-0)
debian/patches/series (+1/-0)
debian/patches/support-aliases-in-references.diff (+130/-18)
docs/ref/rst/restructuredtext.txt (+39/-12)
docutils/parsers/rst/states.py (+37/-16)
docutils/transforms/references.py (+6/-4)
test/test_parsers/test_rst/test_inline_markup.py (+85/-0)
test/test_transforms/test_hyperlinks.py (+97/-0)
To merge this branch: bzr merge lp:~mitya57/ubuntu/raring/python-docutils/updated-aliases-patch
Reviewer          Review Type   Date Requested   Status
Ubuntu branches                                  Pending
Review via email: mp+154004@code.launchpad.net

Description of the change

Note: I don't think this needs an FFe, because this patch was present in the previous version of the package before the FF, and disabling it was a quick work-around for the crash (please correct me if I'm wrong).

The patch in question makes it possible to always use links like

    `Example link <1_>`_

    .. _1: http://example.com

which is a useful feature for localized projects like the Ubuntu Packaging Guide (it has been possible to *sometimes* use such links before, but now the support is consistent).

This patch was disabled in 0.10-1ubuntu2 because it caused the python3.3 docs to fail to build. The crash could also occur without this patch, but the patch made it easier to trigger. The crash is now fixed in the new version. In addition, references that start with a URI scheme are now ignored by the alias handling and processed as in previous versions, which ensures we don't break any existing projects.
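
To make the rule above concrete, here is a rough sketch of how the text inside the angle brackets could be classified. This is not the code from the patch: the function name `classify_embedded_target`, both regular expressions, and the returned labels are assumptions made purely for illustration, and the real logic in docutils/parsers/rst/states.py may differ.

```python
import re

# Hypothetical helper, NOT the code from the patch. It only illustrates the
# rule described above for text embedded in angle brackets.
EMBEDDED = re.compile(r"^(?P<text>.*?)\s*<(?P<target>[^<>]+)>$", re.DOTALL)
URI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def classify_embedded_target(phrase):
    """Classify the contents of a phrase reference such as 'Example link <1_>'.

    Returns a (kind, value) pair, where kind is 'uri', 'alias', or 'plain'.
    """
    match = EMBEDDED.match(phrase)
    if match is None:
        # No embedded target: the whole phrase is the reference name.
        return ("plain", phrase)
    target = match.group("target")
    # Targets that start with a URI scheme keep the old behaviour,
    # even if they happen to end with an underscore.
    if URI_SCHEME.match(target):
        return ("uri", target)
    # A trailing underscore marks an alias for a named target.
    if target.endswith("_"):
        return ("alias", target[:-1])
    # Anything else (for example a relative path) is treated as a URI.
    return ("uri", target)
```

With this sketch, `Example link <1_>` is classified as an alias for the target `1`, while `Odd <http://example.com/page_>` is still treated as a URI because it starts with a scheme.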

All changes are covered by regression tests, which are run during build and on jenkins.ubuntu.com. I've verified that both Docutils and Sphinx test suites succeed, and that python3.3 docs build correctly.


Preview Diff

=== modified file '.pc/applied-patches'
--- .pc/applied-patches 2013-03-06 14:34:02 +0000
+++ .pc/applied-patches 2013-03-19 07:30:29 +0000
@@ -8,3 +8,4 @@
 test-sys-path.diff
 move-data-to-usr-share.diff
 disable_py33_failing_tests.diff
+support-aliases-in-references.diff
=== added directory '.pc/support-aliases-in-references.diff'
=== added directory '.pc/support-aliases-in-references.diff/docs'
=== added directory '.pc/support-aliases-in-references.diff/docs/ref'
=== added directory '.pc/support-aliases-in-references.diff/docs/ref/rst'
=== added file '.pc/support-aliases-in-references.diff/docs/ref/rst/restructuredtext.txt'
--- .pc/support-aliases-in-references.diff/docs/ref/rst/restructuredtext.txt 1970-01-01 00:00:00 +0000
+++ .pc/support-aliases-in-references.diff/docs/ref/rst/restructuredtext.txt 2013-03-19 07:30:29 +0000
@@ -0,0 +1,2979 @@
1.. -*- coding: utf-8 -*-
2
3=======================================
4 reStructuredText Markup Specification
5=======================================
6
7:Author: David Goodger
8:Contact: docutils-develop@lists.sourceforge.net
9:Revision: $Revision: 7302 $
10:Date: $Date: 2012-01-03 20:23:53 +0100 (Di, 03. Jan 2012) $
11:Copyright: This document has been placed in the public domain.
12
13.. Note::
14
15 This document is a detailed technical specification; it is not a
16 tutorial or a primer. If this is your first exposure to
17 reStructuredText, please read `A ReStructuredText Primer`_ and the
18 `Quick reStructuredText`_ user reference first.
19
20.. _A ReStructuredText Primer: ../../user/rst/quickstart.html
21.. _Quick reStructuredText: ../../user/rst/quickref.html
22
23
24reStructuredText_ is plaintext that uses simple and intuitive
25constructs to indicate the structure of a document. These constructs
26are equally easy to read in raw and processed forms. This document is
27itself an example of reStructuredText (raw, if you are reading the
28text file, or processed, if you are reading an HTML document, for
29example). The reStructuredText parser is a component of Docutils_.
30
31Simple, implicit markup is used to indicate special constructs, such
32as section headings, bullet lists, and emphasis. The markup used is
33as minimal and unobtrusive as possible. Less often-used constructs
34and extensions to the basic reStructuredText syntax may have more
35elaborate or explicit markup.
36
37reStructuredText is applicable to documents of any length, from the
38very small (such as inline program documentation fragments, e.g.
39Python docstrings) to the quite large (this document).
40
41The first section gives a quick overview of the syntax of the
42reStructuredText markup by example. A complete specification is given
43in the `Syntax Details`_ section.
44
45`Literal blocks`_ (in which no markup processing is done) are used for
46examples throughout this document, to illustrate the plaintext markup.
47
48
49.. contents::
50
51
52-----------------------
53 Quick Syntax Overview
54-----------------------
55
56A reStructuredText document is made up of body or block-level
57elements, and may be structured into sections. Sections_ are
58indicated through title style (underlines & optional overlines).
59Sections contain body elements and/or subsections. Some body elements
60contain further elements, such as lists containing list items, which
61in turn may contain paragraphs and other body elements. Others, such
62as paragraphs, contain text and `inline markup`_ elements.
63
64Here are examples of `body elements`_:
65
66- Paragraphs_ (and `inline markup`_)::
67
68 Paragraphs contain text and may contain inline markup:
69 *emphasis*, **strong emphasis**, `interpreted text`, ``inline
70 literals``, standalone hyperlinks (http://www.python.org),
71 external hyperlinks (Python_), internal cross-references
72 (example_), footnote references ([1]_), citation references
73 ([CIT2002]_), substitution references (|example|), and _`inline
74 internal targets`.
75
76 Paragraphs are separated by blank lines and are left-aligned.
77
78- Five types of lists:
79
80 1. `Bullet lists`_::
81
82 - This is a bullet list.
83
84 - Bullets can be "*", "+", or "-".
85
86 2. `Enumerated lists`_::
87
88 1. This is an enumerated list.
89
90 2. Enumerators may be arabic numbers, letters, or roman
91 numerals.
92
93 3. `Definition lists`_::
94
95 what
96 Definition lists associate a term with a definition.
97
98 how
99 The term is a one-line phrase, and the definition is one
100 or more paragraphs or body elements, indented relative to
101 the term.
102
103 4. `Field lists`_::
104
105 :what: Field lists map field names to field bodies, like
106 database records. They are often part of an extension
107 syntax.
108
109 :how: The field marker is a colon, the field name, and a
110 colon.
111
112 The field body may contain one or more body elements,
113 indented relative to the field marker.
114
115 5. `Option lists`_, for listing command-line options::
116
117 -a command-line option "a"
118 -b file options can have arguments
119 and long descriptions
120 --long options can be long also
121 --input=file long options can also have
122 arguments
123 /V DOS/VMS-style options too
124
125 There must be at least two spaces between the option and the
126 description.
127
128- `Literal blocks`_::
129
130 Literal blocks are either indented or line-prefix-quoted blocks,
131 and indicated with a double-colon ("::") at the end of the
132 preceding paragraph (right here -->)::
133
134 if literal_block:
135 text = 'is left as-is'
136 spaces_and_linebreaks = 'are preserved'
137 markup_processing = None
138
139- `Block quotes`_::
140
141 Block quotes consist of indented body elements:
142
143 This theory, that is mine, is mine.
144
145 -- Anne Elk (Miss)
146
147- `Doctest blocks`_::
148
149 >>> print 'Python-specific usage examples; begun with ">>>"'
150 Python-specific usage examples; begun with ">>>"
151 >>> print '(cut and pasted from interactive Python sessions)'
152 (cut and pasted from interactive Python sessions)
153
154- Two syntaxes for tables_:
155
156 1. `Grid tables`_; complete, but complex and verbose::
157
158 +------------------------+------------+----------+
159 | Header row, column 1 | Header 2 | Header 3 |
160 +========================+============+==========+
161 | body row 1, column 1 | column 2 | column 3 |
162 +------------------------+------------+----------+
163 | body row 2 | Cells may span |
164 +------------------------+-----------------------+
165
166 2. `Simple tables`_; easy and compact, but limited::
167
168 ==================== ========== ==========
169 Header row, column 1 Header 2 Header 3
170 ==================== ========== ==========
171 body row 1, column 1 column 2 column 3
172 body row 2 Cells may span columns
173 ==================== ======================
174
175- `Explicit markup blocks`_ all begin with an explicit block marker,
176 two periods and a space:
177
178 - Footnotes_::
179
180 .. [1] A footnote contains body elements, consistently
181 indented by at least 3 spaces.
182
183 - Citations_::
184
185 .. [CIT2002] Just like a footnote, except the label is
186 textual.
187
188 - `Hyperlink targets`_::
189
190 .. _Python: http://www.python.org
191
192 .. _example:
193
194 The "_example" target above points to this paragraph.
195
196 - Directives_::
197
198 .. image:: mylogo.png
199
200 - `Substitution definitions`_::
201
202 .. |symbol here| image:: symbol.png
203
204 - Comments_::
205
206 .. Comments begin with two dots and a space. Anything may
207 follow, except for the syntax of footnotes/citations,
208 hyperlink targets, directives, or substitution definitions.
209
210
211----------------
212 Syntax Details
213----------------
214
215Descriptions below list "doctree elements" (document tree element
216names; XML DTD generic identifiers) corresponding to syntax
217constructs. For details on the hierarchy of elements, please see `The
218Docutils Document Tree`_ and the `Docutils Generic DTD`_ XML document
219type definition.
220
221
222Whitespace
223==========
224
225Spaces are recommended for indentation_, but tabs may also be used.
226Tabs will be converted to spaces. Tab stops are at every 8th column.
227
228Other whitespace characters (form feeds [chr(12)] and vertical tabs
229[chr(11)]) are converted to single spaces before processing.
230
231
232Blank Lines
233-----------
234
235Blank lines are used to separate paragraphs and other elements.
236Multiple successive blank lines are equivalent to a single blank line,
237except within literal blocks (where all whitespace is preserved).
238Blank lines may be omitted when the markup makes element separation
239unambiguous, in conjunction with indentation. The first line of a
240document is treated as if it is preceded by a blank line, and the last
241line of a document is treated as if it is followed by a blank line.
242
243
244Indentation
245-----------
246
247Indentation is used to indicate -- and is only significant in
248indicating -- block quotes, definitions (in definition list items),
249and local nested content:
250
251- list item content (multi-line contents of list items, and multiple
252 body elements within a list item, including nested lists),
253- the content of literal blocks, and
254- the content of explicit markup blocks.
255
256Any text whose indentation is less than that of the current level
257(i.e., unindented text or "dedents") ends the current level of
258indentation.
259
260Since all indentation is significant, the level of indentation must be
261consistent. For example, indentation is the sole markup indicator for
262`block quotes`_::
263
264 This is a top-level paragraph.
265
266 This paragraph belongs to a first-level block quote.
267
268 Paragraph 2 of the first-level block quote.
269
270Multiple levels of indentation within a block quote will result in
271more complex structures::
272
273 This is a top-level paragraph.
274
275 This paragraph belongs to a first-level block quote.
276
277 This paragraph belongs to a second-level block quote.
278
279 Another top-level paragraph.
280
281 This paragraph belongs to a second-level block quote.
282
283 This paragraph belongs to a first-level block quote. The
284 second-level block quote above is inside this first-level
285 block quote.
286
287When a paragraph or other construct consists of more than one line of
288text, the lines must be left-aligned::
289
290 This is a paragraph. The lines of
291 this paragraph are aligned at the left.
292
293 This paragraph has problems. The
294 lines are not left-aligned. In addition
295 to potential misinterpretation, warning
296 and/or error messages will be generated
297 by the parser.
298
299Several constructs begin with a marker, and the body of the construct
300must be indented relative to the marker. For constructs using simple
301markers (`bullet lists`_, `enumerated lists`_, footnotes_, citations_,
302`hyperlink targets`_, directives_, and comments_), the level of
303indentation of the body is determined by the position of the first
304line of text, which begins on the same line as the marker. For
305example, bullet list bodies must be indented by at least two columns
306relative to the left edge of the bullet::
307
308 - This is the first line of a bullet list
309 item's paragraph. All lines must align
310 relative to the first line. [1]_
311
312 This indented paragraph is interpreted
313 as a block quote.
314
315 Because it is not sufficiently indented,
316 this paragraph does not belong to the list
317 item.
318
319 .. [1] Here's a footnote. The second line is aligned
320 with the beginning of the footnote label. The ".."
321 marker is what determines the indentation.
322
323For constructs using complex markers (`field lists`_ and `option
324lists`_), where the marker may contain arbitrary text, the indentation
325of the first line *after* the marker determines the left edge of the
326body. For example, field lists may have very long markers (containing
327the field names)::
328
329 :Hello: This field has a short field name, so aligning the field
330 body with the first line is feasible.
331
332 :Number-of-African-swallows-required-to-carry-a-coconut: It would
333 be very difficult to align the field body with the left edge
334 of the first line. It may even be preferable not to begin the
335 body on the same line as the marker.
336
337
338Escaping Mechanism
339==================
340
341The character set universally available to plaintext documents, 7-bit
342ASCII, is limited. No matter what characters are used for markup,
343they will already have multiple meanings in written text. Therefore
344markup characters *will* sometimes appear in text **without being
345intended as markup**. Any serious markup system requires an escaping
346mechanism to override the default meaning of the characters used for
347the markup. In reStructuredText we use the backslash, commonly used
348as an escaping character in other domains.
349
350A backslash followed by any character (except whitespace characters)
351escapes that character. The escaped character represents the
352character itself, and is prevented from playing a role in any markup
353interpretation. The backslash is removed from the output. A literal
354backslash is represented by two backslashes in a row (the first
355backslash "escapes" the second, preventing it being interpreted in an
356"escaping" role).
357
358Backslash-escaped whitespace characters are removed from the document.
359This allows for character-level `inline markup`_.
360
361There are two contexts in which backslashes have no special meaning:
362literal blocks and inline literals. In these contexts, a single
363backslash represents a literal backslash, without having to double up.
364
365Please note that the reStructuredText specification and parser do not
366address the issue of the representation or extraction of text input
367(how and in what form the text actually *reaches* the parser).
368Backslashes and other characters may serve a character-escaping
369purpose in certain contexts and must be dealt with appropriately. For
370example, Python uses backslashes in strings to escape certain
371characters, but not others. The simplest solution when backslashes
372appear in Python docstrings is to use raw docstrings::
373
374 r"""This is a raw docstring. Backslashes (\) are not touched."""
375
376
377Reference Names
378===============
379
380Simple reference names are single words consisting of alphanumerics
381plus isolated (no two adjacent) internal hyphens, underscores,
382periods, colons and plus signs; no whitespace or other characters are
383allowed. Footnote labels (Footnotes_ & `Footnote References`_), citation
384labels (Citations_ & `Citation References`_), `interpreted text`_ roles,
385and some `hyperlink references`_ use the simple reference name syntax.
386
387Reference names using punctuation or whose names are phrases (two or
388more space-separated words) are called "phrase-references".
389Phrase-references are expressed by enclosing the phrase in backquotes
390and treating the backquoted text as a reference name::
391
392 Want to learn about `my favorite programming language`_?
393
394 .. _my favorite programming language: http://www.python.org
395
396Simple reference names may also optionally use backquotes.
397
398Reference names are whitespace-neutral and case-insensitive. When
399resolving reference names internally:
400
401- whitespace is normalized (one or more spaces, horizontal or vertical
402 tabs, newlines, carriage returns, or form feeds, are interpreted as
403 a single space), and
404
405- case is normalized (all alphabetic characters are converted to
406 lowercase).
407
408For example, the following `hyperlink references`_ are equivalent::
409
410 - `A HYPERLINK`_
411 - `a hyperlink`_
412 - `A
413 Hyperlink`_
414
415Hyperlinks_, footnotes_, and citations_ all share the same namespace
416for reference names. The labels of citations (simple reference names)
417and manually-numbered footnotes (numbers) are entered into the same
418database as other hyperlink names. This means that a footnote
419(defined as "``.. [1]``") which can be referred to by a footnote
420reference (``[1]_``), can also be referred to by a plain hyperlink
421reference (1_). Of course, each type of reference (hyperlink,
422footnote, citation) may be processed and rendered differently. Some
423care should be taken to avoid reference name conflicts.
424
425
426Document Structure
427==================
428
429Document
430--------
431
432Doctree element: document.
433
434The top-level element of a parsed reStructuredText document is the
435"document" element. After initial parsing, the document element is a
436simple container for a document fragment, consisting of `body
437elements`_, transitions_, and sections_, but lacking a document title
438or other bibliographic elements. The code that calls the parser may
439choose to run one or more optional post-parse transforms_,
440rearranging the document fragment into a complete document with a
441title and possibly other metadata elements (author, date, etc.; see
442`Bibliographic Fields`_).
443
444Specifically, there is no way to indicate a document title and
445subtitle explicitly in reStructuredText. Instead, a lone top-level
446section title (see Sections_ below) can be treated as the document
447title. Similarly, a lone second-level section title immediately after
448the "document title" can become the document subtitle. The rest of
449the sections are then lifted up a level or two. See the `DocTitle
450transform`_ for details.
451
452
453Sections
454--------
455
456Doctree elements: section, title.
457
458Sections are identified through their titles, which are marked up with
459adornment: "underlines" below the title text, or underlines and
460matching "overlines" above the title. An underline/overline is a
461single repeated punctuation character that begins in column 1 and
462forms a line extending at least as far as the right edge of the title
463text. Specifically, an underline/overline character may be any
464non-alphanumeric printable 7-bit ASCII character [#]_. When an
465overline is used, the length and character used must match the
466underline. Underline-only adornment styles are distinct from
467overline-and-underline styles that use the same character. There may
468be any number of levels of section titles, although some output
469formats may have limits (HTML has 6 levels).
470
471.. [#] The following are all valid section title adornment
472 characters::
473
474 ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
475
476 Some characters are more suitable than others. The following are
477 recommended::
478
479 = - ` : . ' " ~ ^ _ * + #
480
481Rather than imposing a fixed number and order of section title
482adornment styles, the order enforced will be the order as encountered.
483The first style encountered will be an outermost title (like HTML H1),
484the second style will be a subtitle, the third will be a subsubtitle,
485and so on.
486
487Below are examples of section title styles::
488
489 ===============
490 Section Title
491 ===============
492
493 ---------------
494 Section Title
495 ---------------
496
497 Section Title
498 =============
499
500 Section Title
501 -------------
502
503 Section Title
504 `````````````
505
506 Section Title
507 '''''''''''''
508
509 Section Title
510 .............
511
512 Section Title
513 ~~~~~~~~~~~~~
514
515 Section Title
516 *************
517
518 Section Title
519 +++++++++++++
520
521 Section Title
522 ^^^^^^^^^^^^^
523
524When a title has both an underline and an overline, the title text may
525be inset, as in the first two examples above. This is merely
526aesthetic and not significant. Underline-only title text may *not* be
527inset.
528
529A blank line after a title is optional. All text blocks up to the
530next title of the same or higher level are included in a section (or
531subsection, etc.).
532
533All section title styles need not be used, nor need any specific
534section title style be used. However, a document must be consistent
535in its use of section titles: once a hierarchy of title styles is
536established, sections must use that hierarchy.
537
538Each section title automatically generates a hyperlink target pointing
539to the section. The text of the hyperlink target (the "reference
540name") is the same as that of the section title. See `Implicit
541Hyperlink Targets`_ for a complete description.
542
543Sections may contain `body elements`_, transitions_, and nested
544sections.
545
546
547Transitions
548-----------
549
550Doctree element: transition.
551
552 Instead of subheads, extra space or a type ornament between
553 paragraphs may be used to mark text divisions or to signal
554 changes in subject or emphasis.
555
556 (The Chicago Manual of Style, 14th edition, section 1.80)
557
558Transitions are commonly seen in novels and short fiction, as a gap
559spanning one or more lines, with or without a type ornament such as a
560row of asterisks. Transitions separate other body elements. A
561transition should not begin or end a section or document, nor should
562two transitions be immediately adjacent.
563
564The syntax for a transition marker is a horizontal line of 4 or more
565repeated punctuation characters. The syntax is the same as section
566title underlines without title text. Transition markers require blank
567lines before and after::
568
569 Para.
570
571 ----------
572
573 Para.
574
575Unlike section title underlines, no hierarchy of transition markers is
576enforced, nor do differences in transition markers accomplish
577anything. It is recommended that a single consistent style be used.
578
579The processing system is free to render transitions in output in any
580way it likes. For example, horizontal rules (``<hr>``) in HTML output
581would be an obvious choice.
582
583
584Body Elements
585=============
586
587Paragraphs
588----------
589
590Doctree element: paragraph.
591
592Paragraphs consist of blocks of left-aligned text with no markup
593indicating any other body element. Blank lines separate paragraphs
594from each other and from other body elements. Paragraphs may contain
595`inline markup`_.
596
597Syntax diagram::
598
599 +------------------------------+
600 | paragraph |
601 | |
602 +------------------------------+
603
604 +------------------------------+
605 | paragraph |
606 | |
607 +------------------------------+
608
609
610Bullet Lists
611------------
612
613Doctree elements: bullet_list, list_item.
614
615A text block which begins with a "*", "+", "-", "•", "‣", or "⁃",
616followed by whitespace, is a bullet list item (a.k.a. "unordered" list
617item). List item bodies must be left-aligned and indented relative to
618the bullet; the text immediately after the bullet determines the
619indentation. For example::
620
621 - This is the first bullet list item. The blank line above the
622 first list item is required; blank lines between list items
623 (such as below this paragraph) are optional.
624
625 - This is the first paragraph in the second item in the list.
626
627 This is the second paragraph in the second item in the list.
628 The blank line above this paragraph is required. The left edge
629 of this paragraph lines up with the paragraph above, both
630 indented relative to the bullet.
631
632 - This is a sublist. The bullet lines up with the left edge of
633 the text blocks above. A sublist is a new list so requires a
634 blank line above and below.
635
636 - This is the third item of the main list.
637
638 This paragraph is not part of the list.
639
640Here are examples of **incorrectly** formatted bullet lists::
641
642 - This first line is fine.
643 A blank line is required between list items and paragraphs.
644 (Warning)
645
646 - The following line appears to be a new sublist, but it is not:
647 - This is a paragraph continuation, not a sublist (since there's
648 no blank line). This line is also incorrectly indented.
649 - Warnings may be issued by the implementation.
650
651Syntax diagram::
652
653 +------+-----------------------+
654 | "- " | list item |
655 +------| (body elements)+ |
656 +-----------------------+
657
658
659Enumerated Lists
660----------------
661
662Doctree elements: enumerated_list, list_item.
663
664Enumerated lists (a.k.a. "ordered" lists) are similar to bullet lists,
665but use enumerators instead of bullets. An enumerator consists of an
666enumeration sequence member and formatting, followed by whitespace.
667The following enumeration sequences are recognized:
668
669- arabic numerals: 1, 2, 3, ... (no upper limit).
670- uppercase alphabet characters: A, B, C, ..., Z.
671- lower-case alphabet characters: a, b, c, ..., z.
672- uppercase Roman numerals: I, II, III, IV, ..., MMMMCMXCIX (4999).
673- lowercase Roman numerals: i, ii, iii, iv, ..., mmmmcmxcix (4999).
674
675In addition, the auto-enumerator, "#", may be used to automatically
676enumerate a list. Auto-enumerated lists may begin with explicit
677enumeration, which sets the sequence. Fully auto-enumerated lists use
678arabic numerals and begin with 1. (Auto-enumerated lists are new in
679Docutils 0.3.8.)
680
681The following formatting types are recognized:
682
683- suffixed with a period: "1.", "A.", "a.", "I.", "i.".
684- surrounded by parentheses: "(1)", "(A)", "(a)", "(I)", "(i)".
685- suffixed with a right-parenthesis: "1)", "A)", "a)", "I)", "i)".
686
687While parsing an enumerated list, a new list will be started whenever:
688
689- An enumerator is encountered which does not have the same format and
690 sequence type as the current list (e.g. "1.", "(a)" produces two
691 separate lists).
692
693- The enumerators are not in sequence (e.g., "1.", "3." produces two
694 separate lists).
695
696It is recommended that the enumerator of the first list item be
697ordinal-1 ("1", "A", "a", "I", or "i"). Although other start-values
698will be recognized, they may not be supported by the output format. A
699level-1 [info] system message will be generated for any list beginning
700with a non-ordinal-1 enumerator.
701
702Lists using Roman numerals must begin with "I"/"i" or a
703multi-character value, such as "II" or "XV". Any other
704single-character Roman numeral ("V", "X", "L", "C", "D", "M") will be
705interpreted as a letter of the alphabet, not as a Roman numeral.
706Likewise, lists using letters of the alphabet may not begin with
707"I"/"i", since these are recognized as Roman numeral 1.
708
709The second line of each enumerated list item is checked for validity.
710This is to prevent ordinary paragraphs from being mistakenly
711interpreted as list items, when they happen to begin with text
712identical to enumerators. For example, this text is parsed as an
713ordinary paragraph::
714
715 A. Einstein was a really
716 smart dude.
717
718However, ambiguity cannot be avoided if the paragraph consists of only
719one line. This text is parsed as an enumerated list item::
720
721 A. Einstein was a really smart dude.
722
723If a single-line paragraph begins with text identical to an enumerator
724("A.", "1.", "(b)", "I)", etc.), the first character will have to be
725escaped in order to have the line parsed as an ordinary paragraph::
726
727 \A. Einstein was a really smart dude.
728
729Examples of nested enumerated lists::
730
731 1. Item 1 initial text.
732
733 a) Item 1a.
734 b) Item 1b.
735
736 2. a) Item 2a.
737 b) Item 2b.
738
739Example syntax diagram::
740
741 +-------+----------------------+
742 | "1. " | list item |
743 +-------| (body elements)+ |
744 +----------------------+
745
746
747Definition Lists
748----------------
749
750Doctree elements: definition_list, definition_list_item, term,
751classifier, definition.
752
753Each definition list item contains a term, optional classifiers, and a
754definition. A term is a simple one-line word or phrase. Optional
755classifiers may follow the term on the same line, each after an inline
756" : " (space, colon, space). A definition is a block indented
757relative to the term, and may contain multiple paragraphs and other
758body elements. There may be no blank line between a term line and a
759definition block (this distinguishes definition lists from `block
760quotes`_). Blank lines are required before the first and after the
761last definition list item, but are optional in-between. For example::
762
763 term 1
764 Definition 1.
765
766 term 2
767 Definition 2, paragraph 1.
768
769 Definition 2, paragraph 2.
770
771 term 3 : classifier
772 Definition 3.
773
774 term 4 : classifier one : classifier two
775 Definition 4.
776
777Inline markup is parsed in the term line before the classifier
778delimiter (" : ") is recognized. The delimiter will only be
779recognized if it appears outside of any inline markup.
780
781A definition list may be used in various ways, including:
782
783- As a dictionary or glossary. The term is the word itself, a
784 classifier may be used to indicate the usage of the term (noun,
785 verb, etc.), and the definition follows.
786
787- To describe program variables. The term is the variable name, a
788 classifier may be used to indicate the type of the variable (string,
789 integer, etc.), and the definition describes the variable's use in
790 the program. This usage of definition lists supports the classifier
791 syntax of Grouch_, a system for describing and enforcing a Python
792 object schema.
793
794Syntax diagram::
795
796 +----------------------------+
797 | term [ " : " classifier ]* |
798 +--+-------------------------+--+
799    | definition                 |
800    | (body elements)+           |
801    +----------------------------+
802
803
804Field Lists
805-----------
806
807Doctree elements: field_list, field, field_name, field_body.
808
809Field lists are used as part of an extension syntax, such as options
810for directives_, or database-like records meant for further
811processing. They may also be used for two-column table-like
812structures resembling database records (label & data pairs).
813Applications of reStructuredText may recognize field names and
814transform fields or field bodies in certain contexts. For examples,
815see `Bibliographic Fields`_ below, or the "image_" and "meta_"
816directives in `reStructuredText Directives`_.
817
818Field lists are mappings from field names to field bodies, modeled on
819RFC822_ headers. A field name may consist of any characters, but
820colons (":") inside of field names must be escaped with a backslash.
821Inline markup is parsed in field names. Field names are
822case-insensitive when further processed or transformed. The field
823name, along with a single colon prefix and suffix, together form the
824field marker. The field marker is followed by whitespace and the
825field body. The field body may contain multiple body elements,
826indented relative to the field marker. The first line after the field
827name marker determines the indentation of the field body. For
828example::
829
830 :Date: 2001-08-16
831 :Version: 1
832 :Authors: - Me
833           - Myself
834           - I
835 :Indentation: Since the field marker may be quite long, the second
836    and subsequent lines of the field body do not have to line up
837    with the first line, but they must be indented relative to the
838    field name marker, and they must line up with each other.
839 :Parameter i: integer
840
841The interpretation of individual words in a multi-word field name is
842up to the application. The application may specify a syntax for the
843field name. For example, second and subsequent words may be treated
844as "arguments", quoted phrases may be treated as a single argument,
845and direct support for the "name=value" syntax may be added.
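
The field marker rule described above can be sketched in a few lines of
Python. This is an illustration only, not the Docutils implementation:
the pattern ``FIELD_MARKER`` and the helper ``parse_field`` are
hypothetical names, and the sketch handles single-line fields only.

```python
import re

# Hypothetical pattern: ":" + field name + ":" followed by whitespace or
# end of line. Colons inside the name must be escaped with a backslash.
FIELD_MARKER = re.compile(r"^:((?:\\.|[^:\\])+):(?:\s+|$)")


def parse_field(line):
    """Return (field name, start of field body), or None if no marker."""
    match = FIELD_MARKER.match(line)
    if match is None:
        return None
    # Drop the backslashes that escape characters inside the field name.
    name = re.sub(r"\\(.)", r"\1", match.group(1))
    return name, line[match.end():]


print(parse_field(":Parameter i: integer"))
print(parse_field(r":a\:b: value"))
print(parse_field("Note: ordinary text"))
```

How the words of a multi-word name such as "Parameter i" are interpreted
is left to the application, so the sketch returns the name unsplit.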
846
847Standard RFC822_ headers cannot be used for this construct because
848they are ambiguous. A word followed by a colon at the beginning of a
849line is common in written text. However, in well-defined contexts
850such as when a field list invariably occurs at the beginning of a
851document (PEPs and email messages), standard RFC822 headers could be
852used.
853
854Syntax diagram (simplified)::
855
856 +--------------------+----------------------+
857 | ":" field name ":" | field body           |
858 +-------+------------+                      |
859         | (body elements)+                  |
860         +-----------------------------------+
861
862
863Bibliographic Fields
864````````````````````
865
866Doctree elements: docinfo, author, authors, organization, contact,
867version, status, date, copyright, field, topic.
868
869When a field list is the first non-comment element in a document
870(after the document title, if there is one), it may have its fields
871transformed to document bibliographic data. This bibliographic data
872corresponds to the front matter of a book, such as the title page and
873copyright page.
874
875Certain registered field names (listed below) are recognized and
876transformed to the corresponding doctree elements, most becoming child
877elements of the "docinfo" element. No ordering is required of these
878fields, although they may be rearranged to fit the document structure,
879as noted. Unless otherwise indicated below, each of the bibliographic
880elements' field bodies may contain a single paragraph only. Field
881bodies may be checked for `RCS keywords`_ and cleaned up. Any
882unrecognized fields will remain as generic fields in the docinfo
883element.
884
885The registered bibliographic field names and their corresponding
886doctree elements are as follows:
887
888- Field name "Author": author element.
889- "Authors": authors.
890- "Organization": organization.
891- "Contact": contact.
892- "Address": address.
893- "Version": version.
894- "Status": status.
895- "Date": date.
896- "Copyright": copyright.
897- "Dedication": topic.
898- "Abstract": topic.
899
900The "Authors" field may contain either: a single paragraph consisting
901of a list of authors, separated by ";" or ","; or a bullet list whose
902elements each contain a single paragraph per author. ";" is checked
903first, so "Doe, Jane; Doe, John" will work. In some languages
904(e.g. Swedish), there is no singular/plural distinction between
905"Author" and "Authors", so only an "Authors" field is provided, and a
906single name is interpreted as an "Author". If a single name contains
907a comma, end it with a semicolon to disambiguate: ":Authors: Doe,
908Jane;".
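
The separator precedence for a single-paragraph "Authors" field can be
sketched as follows. The function ``split_authors`` is a hypothetical
helper written for this illustration; it is not taken from Docutils.

```python
def split_authors(text):
    """Split an "Authors" paragraph on ";" if present, otherwise on ","."""
    for separator in (";", ","):  # ";" is checked first
        if separator in text:
            parts = text.split(separator)
            return [part.strip() for part in parts if part.strip()]
    return [text.strip()]  # a single name without separators


print(split_authors("Doe, Jane; Doe, John"))
print(split_authors("Doe, Jane;"))
```

Because ";" wins over ",", a trailing semicolon keeps a name such as
"Doe, Jane" from being split into two authors.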
909
910The "Address" field is for a multi-line surface mailing address.
911Newlines and whitespace will be preserved.
912
913The "Dedication" and "Abstract" fields may contain arbitrary body
914elements. Only one of each is allowed. They become topic elements
915with "Dedication" or "Abstract" titles (or language equivalents)
916immediately following the docinfo element.
917
918This field-name-to-element mapping can be replaced for other
919languages. See the `DocInfo transform`_ implementation documentation
920for details.
921
922Unregistered/generic fields may contain one or more paragraphs or
923arbitrary body elements.
924
925
926RCS Keywords
927````````````
928
929`Bibliographic fields`_ recognized by the parser are normally checked
930for RCS [#]_ keywords and cleaned up [#]_. RCS keywords may be
931entered into source files as "$keyword$", and once stored under RCS or
932CVS [#]_, they are expanded to "$keyword: expansion text $". For
933example, a "Status" field will be transformed to a "status" element::
934
935 :Status: $keyword: expansion text $
936
937.. [#] Revision Control System.
938.. [#] RCS keyword processing can be turned off (unimplemented).
939.. [#] Concurrent Versions System. CVS uses the same keywords as RCS.
940
941Processed, the "status" element's text will become simply "expansion
942text". The dollar sign delimiters and leading RCS keyword name are
943removed.
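
As a rough illustration of this cleanup, the following Python sketch
strips the delimiters from an expanded keyword. The pattern
``RCS_KEYWORD`` and the function ``clean_rcs_keyword`` are assumptions
made for this example and may differ from what the parser actually does.

```python
import re

# Assumed shape of an expanded keyword: "$keyword: expansion text $".
RCS_KEYWORD = re.compile(r"^\$\w+: (.+) \$$")


def clean_rcs_keyword(field_body):
    """Return the expansion text, or the body unchanged if no keyword."""
    match = RCS_KEYWORD.match(field_body.strip())
    return match.group(1) if match else field_body


print(clean_rcs_keyword("$keyword: expansion text $"))
print(clean_rcs_keyword("Final"))
```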
944
945The RCS keyword processing takes effect only when the field list is in
946bibliographic context (first non-comment construct in the document,
947after a document title if there is one).
948
949
950Option Lists
951------------
952
953Doctree elements: option_list, option_list_item, option_group, option,
954option_string, option_argument, description.
955
956Option lists are two-column lists of command-line options and
957descriptions, documenting a program's options. For example::
958
959 -a         Output all.
960 -b         Output both (this description is
961            quite long).
962 -c arg     Output just arg.
963 --long     Output all day long.
964
965 -p         This option has two paragraphs in the description.
966            This is the first.
967
968            This is the second. Blank lines may be omitted between
969            options (as above) or left in (as here and below).
970
971 --very-long-option  A VMS-style option. Note the adjustment for
972                     the required two spaces.
973
974 --an-even-longer-option
975            The description can also start on the next line.
976
977 -2, --two  This option has two variants.
978
979 -f FILE, --file=FILE  These two options are synonyms; both have
980                       arguments.
981
982 /V         A VMS/DOS-style option.
983
984There are several types of options recognized by reStructuredText:
985
986- Short POSIX options consist of one dash and an option letter.
987- Long POSIX options consist of two dashes and an option word; some
988 systems use a single dash.
989- Old GNU-style "plus" options consist of one plus and an option
990 letter ("plus" options are deprecated now, their use discouraged).
991- DOS/VMS options consist of a slash and an option letter or word.
992
993Please note that both POSIX-style and DOS/VMS-style options may be
994used by DOS or Windows software. These and other variations are
995sometimes mixed together. The names above have been chosen for
996convenience only.
997
998The syntax for short and long POSIX options is based on the syntax
999supported by Python's getopt.py_ module, which implements an option
1000parser similar to the `GNU libc getopt_long()`_ function but with some
1001restrictions. There are many variant option systems, and
1002reStructuredText option lists do not support all of them.
1003
1004Although long POSIX and DOS/VMS option words may be allowed to be
1005truncated by the operating system or the application when used on the
1006command line, reStructuredText option lists do not show or support
1007this with any special syntax. The complete option word should be
1008given, supported by notes about truncation if and when applicable.
1009
1010Options may be followed by an argument placeholder, whose role and
1011syntax should be explained in the description text. Either a space or
1012an equals sign may be used as a delimiter between options and option
1013argument placeholders; short options ("-" or "+" prefix only) may omit
1014the delimiter. Option arguments may take one of two forms:
1015
1016- Begins with a letter (``[a-zA-Z]``) and subsequently consists of
1017 letters, numbers, underscores and hyphens (``[a-zA-Z0-9_-]``).
1018- Begins with an open-angle-bracket (``<``) and ends with a
1019 close-angle-bracket (``>``); any characters except angle brackets
1020 are allowed internally.
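
The two argument forms listed above translate directly into a regular
expression. The sketch below is illustrative: ``OPTION_ARGUMENT`` and
``is_option_argument`` are hypothetical names, and requiring at least one
character between the angle brackets is an assumption of this example.

```python
import re

# Form 1: a letter followed by letters, digits, underscores, or hyphens.
# Form 2: "<" ... ">" with no angle brackets inside.
OPTION_ARGUMENT = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9_-]*|<[^<>]+>)$")


def is_option_argument(text):
    """Return True if text looks like a valid argument placeholder."""
    return OPTION_ARGUMENT.match(text) is not None


for candidate in ("FILE", "<file name>", "9lives", "<a<b>"):
    print(candidate, is_option_argument(candidate))
```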
1021
1022Multiple option "synonyms" may be listed, sharing a single
1023description. They must be separated by comma-space.
1024
1025There must be at least two spaces between the option(s) and the
1026description. The description may contain multiple body elements. The
1027first line after the option marker determines the indentation of the
1028description. As with other types of lists, blank lines are required
1029before the first option list item and after the last, but are optional
1030between option entries.
1031
1032Syntax diagram (simplified)::
1033
1034 +----------------------------+-------------+
1035 | option [" " argument] "  " | description |
1036 +-------+--------------------+             |
1037         | (body elements)+                 |
1038         +----------------------------------+
1039
1040
1041Literal Blocks
1042--------------
1043
1044Doctree element: literal_block.
1045
1046A paragraph consisting of two colons ("::") signifies that the
1047following text block(s) comprise a literal block. The literal block
1048must either be indented or quoted (see below). No markup processing
1049is done within a literal block. It is left as-is, and is typically
1050rendered in a monospaced typeface::
1051
1052 This is a typical paragraph. An indented literal block follows.
1053
1054 ::
1055
1056     for a in [5,4,3,2,1]: # this is program code, shown as-is
1057         print(a)
1058     print("it's...")
1059     # a literal block continues until the indentation ends
1060
1061 This text has returned to the indentation of the first paragraph,
1062 is outside of the literal block, and is therefore treated as an
1063 ordinary paragraph.
1064
1065The paragraph containing only "::" will be completely removed from the
1066output; no empty paragraph will remain.
1067
1068As a convenience, the "::" is recognized at the end of any paragraph.
1069If immediately preceded by whitespace, both colons will be removed
1070from the output (this is the "partially minimized" form). When text
1071immediately precedes the "::", *one* colon will be removed from the
1072output, leaving only one colon visible (i.e., "::" will be replaced by
1073":"; this is the "fully minimized" form).
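
The three cases can be summarized in a small Python sketch. The function
``strip_literal_marker`` is a hypothetical helper that models only the
rule stated above; it is not the parser's own code.

```python
def strip_literal_marker(paragraph):
    """Return the paragraph text as rendered, or None if it is removed."""
    if not paragraph.endswith("::"):
        return paragraph          # no literal block follows
    head = paragraph[:-2]
    if not head.strip():
        return None               # expanded form: the paragraph is only "::"
    if head[-1].isspace():
        return head.rstrip()      # partially minimized: both colons removed
    return head + ":"             # fully minimized: one colon remains


for source in ("::", "Paragraph: ::", "Paragraph::"):
    print(repr(source), "->", repr(strip_literal_marker(source)))
```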
1074
1075In other words, these are all equivalent (please pay attention to the
1076colons after "Paragraph"):
1077
10781. Expanded form::
1079
1080 Paragraph:
1081
1082 ::
1083
1084     Literal block
1085
10862. Partially minimized form::
1087
1088 Paragraph: ::
1089
1090     Literal block
1091
10923. Fully minimized form::
1093
1094 Paragraph::
1095
1096     Literal block
1097
1098All whitespace (including line breaks, but excluding minimum
1099indentation for indented literal blocks) is preserved. Blank lines
1100are required before and after a literal block, but these blank lines
1101are not included as part of the literal block.
1102
1103
1104Indented Literal Blocks
1105```````````````````````
1106
1107Indented literal blocks are indicated by indentation relative to the
1108surrounding text (leading whitespace on each line). The minimum
1109indentation will be removed from each line of an indented literal
1110block. The literal block need not be contiguous; blank lines are
1111allowed between sections of indented text. The literal block ends
1112with the end of the indentation.
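
Removing the minimum indentation behaves much like ``textwrap.dedent``
from the Python standard library. The sketch below uses that function as
an analogy; it is not a claim about how the parser is implemented.

```python
import textwrap

# An indented literal block with a nested line and a blank line.
block = "\n".join([
    "    for a in [5, 4, 3, 2, 1]:",
    "        print(a)",
    "",
    "    print(\"it's...\")",
])

# Only the common (minimum) indentation is removed; the extra
# indentation of the nested line is preserved.
print(textwrap.dedent(block))
```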
1113
1114Syntax diagram::
1115
1116 +------------------------------+
1117 | paragraph                    |
1118 | (ends with "::")             |
1119 +------------------------------+
1120    +---------------------------+
1121    | indented literal block    |
1122    +---------------------------+
1123
1124
1125Quoted Literal Blocks
1126`````````````````````
1127
1128Quoted literal blocks are unindented contiguous blocks of text where
1129each line begins with the same non-alphanumeric printable 7-bit ASCII
1130character [#]_. A blank line ends a quoted literal block. The
1131quoting characters are preserved in the processed document.
1132
1133.. [#]
1134 The following are all valid quoting characters::
1135
1136 ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
1137
1138 Note that these are the same characters as are valid for title
1139 adornment of sections_.
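
A quick way to check the quoting rule is sketched below in Python. The
helper ``is_quoted_literal_block`` is hypothetical; it relies on the fact
that ``string.punctuation`` contains exactly the 32 quoting characters
listed in the footnote above.

```python
import string


def is_quoted_literal_block(lines):
    """Return True if every line starts with the same quoting character."""
    if not lines or any(not line for line in lines):
        return False              # a blank line ends the block
    quote = lines[0][0]
    if quote not in string.punctuation:
        return False              # not a non-alphanumeric ASCII character
    return all(line.startswith(quote) for line in lines)


print(is_quoted_literal_block([">> Great idea!", ">", "> Why not?"]))
print(is_quoted_literal_block(["> quoted", "| not the same character"]))
```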
1140
1141Possible uses include literate programming in Haskell and email
1142quoting::
1143
1144 John Doe wrote::
1145
1146 >> Great idea!
1147 >
1148 > Why didn't I think of that?
1149
1150 You just did! ;-)
1151
1152Syntax diagram::
1153
1154 +------------------------------+
1155 | paragraph                    |
1156 | (ends with "::")             |
1157 +------------------------------+
1158 +------------------------------+
1159 | ">" per-line-quoted          |
1160 | ">" contiguous literal block |
1161 +------------------------------+
1162
1163
1164Line Blocks
1165-----------
1166
1167Doctree elements: line_block, line. (New in Docutils 0.3.5.)
1168
1169Line blocks are useful for address blocks, verse (poetry, song
1170lyrics), and unadorned lists, where the structure of lines is
1171significant. Line blocks are groups of lines beginning with vertical
1172bar ("|") prefixes. Each vertical bar prefix indicates a new line, so
1173line breaks are preserved. Initial indents are also significant,
1174resulting in a nested structure. Inline markup is supported.
1175Continuation lines are wrapped portions of long lines; they begin with
1176a space in place of the vertical bar. The left edge of a continuation
1177line must be indented, but need not be aligned with the left edge of
1178the text above it. A line block ends with a blank line.
1179
1180This example illustrates continuation lines::
1181
1182 | Lend us a couple of bob till Thursday.
1183 | I'm absolutely skint.
1184 | But I'm expecting a postal order and I can pay you back
1185   as soon as it comes.
1186 | Love, Ewan.
1187
1188This example illustrates the nesting of line blocks, indicated by the
1189initial indentation of new lines::
1190
1191 Take it away, Eric the Orchestra Leader!
1192
1193     | A one, two, a one two three four
1194     |
1195     | Half a bee, philosophically,
1196     |     must, *ipso facto*, half not be.
1197     | But half the bee has got to be,
1198     |     *vis a vis* its entity. D'you see?
1199     |
1200     | But can a bee be said to be
1201     |     or not to be an entire bee,
1202     |     when half the bee is not a bee,
1203     |     due to some ancient injury?
1204     |
1205     | Singing...
1206
1207Syntax diagram::
1208
1209 +------+-----------------------+
1210 | "| " | line                  |
1211 +------| continuation line     |
1212        +-----------------------+
1213
1214
1215Block Quotes
1216------------
1217
1218Doctree elements: block_quote, attribution.
1219
1220A text block that is indented relative to the preceding text, without
1221preceding markup indicating it to be a literal block or other content,
1222is a block quote. All markup processing (for body elements and inline
1223markup) continues within the block quote::
1224
1225 This is an ordinary paragraph, introducing a block quote.
1226
1227     "It is my business to know things. That is my trade."
1228
1229     -- Sherlock Holmes
1230
1231A block quote may end with an attribution: a text block beginning with
1232"--", "---", or a true em-dash, flush left within the block quote. If
1233the attribution consists of multiple lines, the left edges of the
1234second and subsequent lines must align.
1235
1236Multiple block quotes may occur consecutively if terminated with
1237attributions.
1238
1239 Unindented paragraph.
1240
1241     Block quote 1.
1242
1243     -- Attribution 1
1244
1245     Block quote 2.
1246
1247`Empty comments`_ may be used to explicitly terminate preceding
1248constructs that would otherwise consume a block quote::
1249
1250 * List item.
1251
1252 ..
1253
1254     Block quote 3.
1255
1256Empty comments may also be used to separate block quotes::
1257
1258     Block quote 4.
1259
1260 ..
1261
1262     Block quote 5.
1263
1264Blank lines are required before and after a block quote, but these
1265blank lines are not included as part of the block quote.
1266
1267Syntax diagram::
1268
1269 +------------------------------+
1270 | (current level of            |
1271 | indentation)                 |
1272 +------------------------------+
1273    +---------------------------+
1274    | block quote               |
1275    | (body elements)+          |
1276    |                           |
1277    | -- attribution text       |
1278    |    (optional)             |
1279    +---------------------------+
1280
1281
1282Doctest Blocks
1283--------------
1284
1285Doctree element: doctest_block.
1286
1287Doctest blocks are interactive Python sessions cut-and-pasted into
1288docstrings. They are meant to illustrate usage by example, and
1289provide an elegant and powerful testing environment via the `doctest
1290module`_ in the Python standard library.
1291
1292Doctest blocks are text blocks which begin with ``">>> "``, the Python
1293interactive interpreter main prompt, and end with a blank line.
1294Doctest blocks are treated as a special case of literal blocks,
1295without requiring the literal block syntax. If both are present, the
1296literal block syntax takes priority over Doctest block syntax::
1297
1298 This is an ordinary paragraph.
1299
1300 >>> print('this is a Doctest block')
1301 this is a Doctest block
1302
1303 The following is a literal block::
1304
1305     >>> This is not recognized as a doctest block by
1306     reStructuredText. It *will* be recognized by the doctest
1307     module, though!
1308
1309Indentation is not required for doctest blocks.
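
To see what the doctest module itself recognizes in such text, the
standard library's ``doctest.DocTestParser`` can be used. The sample text
below is made up for this illustration and uses Python 3 syntax.

```python
import doctest

text = """\
This is an ordinary paragraph.

>>> print('this is a Doctest block')
this is a Doctest block
"""

# get_examples() returns one Example per ">>> " prompt found in the text;
# each carries the source code and the expected output.
examples = doctest.DocTestParser().get_examples(text)
for example in examples:
    print(repr(example.source), repr(example.want))
```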
1310
1311
1312Tables
1313------
1314
1315Doctree elements: table, tgroup, colspec, thead, tbody, row, entry.
1316
1317ReStructuredText provides two syntaxes for delineating table cells:
1318`Grid Tables`_ and `Simple Tables`_.
1319
1320As with other body elements, blank lines are required before and after
1321tables. Tables' left edges should align with the left edge of
1322preceding text blocks; if indented, the table is considered to be part
1323of a block quote.
1324
1325Once isolated, each table cell is treated as a miniature document; the
1326top and bottom cell boundaries act as delimiting blank lines. Each
1327cell contains zero or more body elements. Cell contents may include
1328left and/or right margins, which are removed before processing.
1329
1330
1331Grid Tables
1332```````````
1333
1334Grid tables provide a complete table representation via grid-like
1335"ASCII art". Grid tables allow arbitrary cell contents (body
1336elements), and both row and column spans. However, grid tables can be
1337cumbersome to produce, especially for simple data sets. The `Emacs
1338table mode`_ is a tool that allows easy editing of grid tables, in
1339Emacs. See `Simple Tables`_ for a simpler (but limited)
1340representation.
1341
1342Grid tables are described with a visual grid made up of the characters
1343"-", "=", "|", and "+". The hyphen ("-") is used for horizontal lines
1344(row separators). The equals sign ("=") may be used to separate
1345optional header rows from the table body (not supported by the `Emacs
1346table mode`_). The vertical bar ("|") is used for vertical lines
1347(column separators). The plus sign ("+") is used for intersections of
1348horizontal and vertical lines. Example::
1349
1350 +------------------------+------------+----------+----------+
1351 | Header row, column 1   | Header 2   | Header 3 | Header 4 |
1352 | (header rows optional) |            |          |          |
1353 +========================+============+==========+==========+
1354 | body row 1, column 1   | column 2   | column 3 | column 4 |
1355 +------------------------+------------+----------+----------+
1356 | body row 2             | Cells may span columns.          |
1357 +------------------------+------------+---------------------+
1358 | body row 3             | Cells may  | - Table cells       |
1359 +------------------------+ span rows. | - contain           |
1360 | body row 4             |            | - body elements.    |
1361 +------------------------+------------+---------------------+
1362
1363Some care must be taken with grid tables to avoid undesired
1364interactions with cell text in rare cases. For example, the following
1365table contains a cell in row 2 spanning from column 2 to column 4::
1366
1367 +--------------+----------+-----------+-----------+
1368 | row 1, col 1 | column 2 | column 3  | column 4  |
1369 +--------------+----------+-----------+-----------+
1370 | row 2        |                                  |
1371 +--------------+----------+-----------+-----------+
1372 | row 3        |          |           |           |
1373 +--------------+----------+-----------+-----------+
1374
1375If a vertical bar is used in the text of that cell, it could have
1376unintended effects if accidentally aligned with column boundaries::
1377
1378 +--------------+----------+-----------+-----------+
1379 | row 1, col 1 | column 2 | column 3  | column 4  |
1380 +--------------+----------+-----------+-----------+
1381 | row 2        | Use the command ``ls | more``.   |
1382 +--------------+----------+-----------+-----------+
1383 | row 3        |          |           |           |
1384 +--------------+----------+-----------+-----------+
1385
1386Several solutions are possible. All that is needed is to break the
1387continuity of the cell outline rectangle. One possibility is to shift
1388the text by adding an extra space before::
1389
1390 +--------------+----------+-----------+-----------+
1391 | row 1, col 1 | column 2 | column 3  | column 4  |
1392 +--------------+----------+-----------+-----------+
1393 | row 2        |  Use the command ``ls | more``.  |
1394 +--------------+----------+-----------+-----------+
1395 | row 3        |          |           |           |
1396 +--------------+----------+-----------+-----------+
1397
1398Another possibility is to add an extra line to row 2::
1399
1400 +--------------+----------+-----------+-----------+
1401 | row 1, col 1 | column 2 | column 3  | column 4  |
1402 +--------------+----------+-----------+-----------+
1403 | row 2        | Use the command ``ls | more``.   |
1404 |              |                                  |
1405 +--------------+----------+-----------+-----------+
1406 | row 3        |          |           |           |
1407 +--------------+----------+-----------+-----------+
1408
1409
1410Simple Tables
1411`````````````
1412
1413Simple tables provide a compact and easy to type but limited
1414row-oriented table representation for simple data sets. Cell contents
1415are typically single paragraphs, although arbitrary body elements may
1416be represented in most cells. Simple tables allow multi-line rows (in
1417all but the first column) and column spans, but not row spans. See
1418`Grid Tables`_ above for a complete table representation.
1419
1420Simple tables are described with horizontal borders made up of "=" and
1421"-" characters. The equals sign ("=") is used for top and bottom
1422table borders, and to separate optional header rows from the table
1423body. The hyphen ("-") is used to indicate column spans in a single
1424row by underlining the joined columns, and may optionally be used to
1425explicitly and/or visually separate rows.
1426
1427A simple table begins with a top border of equals signs with one or
1428more spaces at each column boundary (two or more spaces recommended).
1429Regardless of spans, the top border *must* fully describe all table
1430columns. There must be at least two columns in the table (to
1431differentiate it from section headers). The top border may be
1432followed by header rows, and the last of the optional header rows is
1433underlined with '=', again with spaces at column boundaries. There
1434may not be a blank line below the header row separator; it would be
1435interpreted as the bottom border of the table. The bottom boundary of
1436the table consists of '=' underlines, also with spaces at column
1437boundaries. For example, here is a truth table, a three-column table
1438with one header row and four body rows::
1439
1440 =====  =====  =======
1441   A      B    A and B
1442 =====  =====  =======
1443 False  False  False
1444 True   False  False
1445 False  True   False
1446 True   True   True
1447 =====  =====  =======
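
Since the top border fully describes the columns, column boundaries can
be read straight from it. The Python sketch below illustrates the idea
for rows without column spans; ``column_spans`` and ``split_row`` are
hypothetical helpers, not part of Docutils.

```python
import re


def column_spans(border):
    """Return (start, end) offsets for each run of "=" in the top border."""
    return [match.span() for match in re.finditer(r"=+", border)]


def split_row(line, spans):
    """Cut a text line into cells at the column boundaries."""
    cells = []
    for index, (start, end) in enumerate(spans):
        if index == len(spans) - 1:
            end = None            # the rightmost column is unbounded
        cells.append(line[start:end].strip())
    return cells


spans = column_spans("=====  =====  =======")
print(spans)
print(split_row("True   False  False", spans))
```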
1448
1449Underlines of '-' may be used to indicate column spans by "filling in"
1450column margins to join adjacent columns. Column span underlines must
1451be complete (they must cover all columns) and align with established
1452column boundaries. Text lines containing column span underlines may
1453not contain any other text. A column span underline applies only to
1454one row immediately above it. For example, here is a table with a
1455column span in the header::
1456
1457 =====  =====  ======
1458    Inputs     Output
1459 ------------  ------
1460   A      B    A or B
1461 =====  =====  ======
1462 False  False  False
1463 True   False  True
1464 False  True   True
1465 True   True   True
1466 =====  =====  ======
1467
1468Each line of text must contain spaces at column boundaries, except
1469where cells have been joined by column spans. Each line of text
1470starts a new row, except when there is a blank cell in the first
1471column. In that case, that line of text is parsed as a continuation
1472line. For this reason, cells in the first column of new rows (*not*
1473continuation lines) *must* contain some text; blank cells would lead
1474to a misinterpretation (but see the tip below). Also, this mechanism
1475limits cells in the first column to only one line of text. Use `grid
1476tables`_ if this limitation is unacceptable.
1477
1478.. Tip::
1479
1480 To start a new row in a simple table without text in the first
1481 column in the processed output, use one of these:
1482
1483 * an empty comment (".."), which may be omitted from the processed
1484   output (see Comments_ below)
1485
1486 * a backslash escape ("``\``") followed by a space (see `Escaping
1487   Mechanism`_ above)
1488
1489Underlines of '-' may also be used to visually separate rows, even if
1490there are no column spans. This is especially useful in long tables,
1491where rows are many lines long.
1492
1493Blank lines are permitted within simple tables. Their interpretation
1494depends on the context. Blank lines *between* rows are ignored.
1495Blank lines *within* multi-line rows may separate paragraphs or other
1496body elements within cells.
1497
1498The rightmost column is unbounded; text may continue past the edge of
1499the table (as indicated by the table borders). However, it is
1500recommended that borders be made long enough to contain the entire
1501text.
1502
1503The following example illustrates continuation lines (row 2 consists
1504of two lines of text, and four lines for row 3), a blank line
1505separating paragraphs (row 3, column 2), text extending past the right
1506edge of the table, and a new row which will have no text in the first
1507column in the processed output (row 4)::
1508
1509 =====  =====
1510 col 1  col 2
1511 =====  =====
1512 1      Second column of row 1.
1513 2      Second column of row 2.
1514        Second line of paragraph.
1515 3      - Second column of row 3.
1516
1517        - Second item in bullet
1518          list (row 3, column 2).
1519 \      Row 4; column 1 will be empty.
1520 =====  =====
1521
1522
1523Explicit Markup Blocks
1524----------------------
1525
1526An explicit markup block is a text block:
1527
1528- whose first line begins with ".." followed by whitespace (the
1529 "explicit markup start"),
1530- whose second and subsequent lines (if any) are indented relative to
1531 the first, and
1532- which ends before an unindented line.
1533
1534Explicit markup blocks are analogous to bullet list items, with ".."
1535as the bullet. The text on the lines immediately after the explicit
1536markup start determines the indentation of the block body. The
1537maximum common indentation is always removed from the second and
1538subsequent lines of the block body. Therefore if the first construct
1539fits in one line, and the indentation of the first and second
1540constructs should differ, the first construct should not begin on the
1541same line as the explicit markup start.
1542
1543Blank lines are required between explicit markup blocks and other
1544elements, but are optional between explicit markup blocks where
1545unambiguous.
1546
1547The explicit markup syntax is used for footnotes, citations, hyperlink
1548targets, directives, substitution definitions, and comments.
1549
1550
1551Footnotes
1552`````````
1553
1554Doctree elements: footnote, label.
1555
1556Each footnote consists of an explicit markup start (".. "), a left
1557square bracket, the footnote label, a right square bracket, and
1558whitespace, followed by indented body elements. A footnote label can
1559be:
1560
1561- a whole decimal number consisting of one or more digits,
1562
1563- a single "#" (denoting `auto-numbered footnotes`_),
1564
1565- a "#" followed by a simple reference name (an `autonumber label`_),
1566 or
1567
1568- a single "*" (denoting `auto-symbol footnotes`_).
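
The four label forms can be told apart with a short Python sketch. The
function ``classify_footnote_label`` is hypothetical, and the
``SIMPLE_NAME`` pattern is only an approximation of a simple reference
name assumed for this example.

```python
import re

# Assumed approximation of a simple reference name: alphanumerics with
# isolated internal hyphens, underscores, periods, colons, or plus signs.
SIMPLE_NAME = r"[a-zA-Z0-9]+(?:[-_.:+][a-zA-Z0-9]+)*"


def classify_footnote_label(label):
    """Return the kind of footnote label, or None if it is not valid."""
    if re.fullmatch(r"[0-9]+", label):
        return "manually numbered"
    if label == "#":
        return "auto-numbered"
    if label == "*":
        return "auto-symbol"
    if re.fullmatch("#" + SIMPLE_NAME, label):
        return "autonumber label"
    return None


for label in ("1", "#", "#note", "*", "note"):
    print(label, "->", classify_footnote_label(label))
```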
1569
1570The footnote content (body elements) must be consistently indented (by
1571at least 3 spaces) and left-aligned. The first body element within a
1572footnote may often begin on the same line as the footnote label.
1573However, if the first element fits on one line and the indentation of
1574the remaining elements differs, the first element must begin on the
1575line after the footnote label. Otherwise, the difference in
1576indentation will not be detected.
1577
1578Footnotes may occur anywhere in the document, not only at the end.
1579Where and how they appear in the processed output depends on the
1580processing system.
1581
1582Here is a manually numbered footnote::
1583
1584 .. [1] Body elements go here.
1585
1586Each footnote automatically generates a hyperlink target pointing to
1587itself. The text of the hyperlink target name is the same as that of
1588the footnote label. `Auto-numbered footnotes`_ generate a number as
1589their footnote label and reference name. See `Implicit Hyperlink
1590Targets`_ for a complete description of the mechanism.
1591
1592Syntax diagram::
1593
1594 +-------+-------------------------+
1595 | ".. " | "[" label "]" footnote  |
1596 +-------+                         |
1597         | (body elements)+        |
1598         +-------------------------+
1599
1600
1601Auto-Numbered Footnotes
1602.......................
1603
1604A number sign ("#") may be used as the first character of a footnote
1605label to request automatic numbering of the footnote or footnote
1606reference.
1607
1608The first footnote to request automatic numbering is assigned the
1609label "1", the second is assigned the label "2", and so on (assuming
1610there are no manually numbered footnotes present; see `Mixed Manual
1611and Auto-Numbered Footnotes`_ below). A footnote which has
1612automatically received a label "1" generates an implicit hyperlink
1613target with name "1", just as if the label had been explicitly specified.
1614
1615.. _autonumber label: `autonumber labels`_
1616
1617A footnote may specify a label explicitly while at the same time
1618requesting automatic numbering: ``[#label]``. These labels are called
1619_`autonumber labels`. Autonumber labels do two things:
1620
1621- On the footnote itself, they generate a hyperlink target whose name
1622 is the autonumber label (doesn't include the "#").
1623
1624- They allow an automatically numbered footnote to be referred to more
1625 than once, as a footnote reference or hyperlink reference. For
1626 example::
1627
1628 If [#note]_ is the first footnote reference, it will show up as
1629 "[1]". We can refer to it again as [#note]_ and again see
1630 "[1]". We can also refer to it as note_ (an ordinary internal
1631 hyperlink reference).
1632
1633 .. [#note] This is the footnote labeled "note".
1634
1635The numbering is determined by the order of the footnotes, not by the
1636order of the references. For footnote references without autonumber
1637labels (``[#]_``), the footnotes and footnote references must be in
1638the same relative order but need not alternate in lock-step. For
1639example::
1640
1641 [#]_ is a reference to footnote 1, and [#]_ is a reference to
1642 footnote 2.
1643
1644 .. [#] This is footnote 1.
1645 .. [#] This is footnote 2.
1646 .. [#] This is footnote 3.
1647
1648 [#]_ is a reference to footnote 3.
1649
1650Special care must be taken if footnotes themselves contain
1651auto-numbered footnote references, or if multiple references are made
1652in close proximity. Footnotes and references are noted in the order
1653they are encountered in the document, which is not necessarily the
1654same as the order in which a person would read them.
1655
1656
1657Auto-Symbol Footnotes
1658.....................
1659
1660An asterisk ("*") may be used for footnote labels to request automatic
1661symbol generation for footnotes and footnote references. The asterisk
1662may be the only character in the label. For example::
1663
1664 Here is a symbolic footnote reference: [*]_.
1665
1666 .. [*] This is the footnote.
1667
1668A transform will insert symbols as labels into corresponding footnotes
1669and footnote references. The number of references must be equal to
1670the number of footnotes. One symbol footnote cannot have multiple
1671references.
1672
1673The standard Docutils system uses the following symbols for footnote
1674marks [#]_:
1675
1676- asterisk/star ("*")
1677- dagger (HTML character entity "&dagger;", Unicode U+02020)
1678- double dagger ("&Dagger;"/U+02021)
1679- section mark ("&sect;"/U+000A7)
1680- pilcrow or paragraph mark ("&para;"/U+000B6)
1681- number sign ("#")
1682- spade suit ("&spades;"/U+02660)
1683- heart suit ("&hearts;"/U+02665)
1684- diamond suit ("&diams;"/U+02666)
1685- club suit ("&clubs;"/U+02663)
1686
1687.. [#] This list was inspired by the list of symbols for "Note
1688 Reference Marks" in The Chicago Manual of Style, 14th edition,
1689 section 12.51. "Parallels" ("||") were given in CMoS instead of
1690 the pilcrow. The last four symbols (the card suits) were added
1691 arbitrarily.
1692
1693If more than ten symbols are required, the same sequence will be
1694reused, doubled and then tripled, and so on ("**" etc.).
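
The reuse rule can be sketched in Python as follows. This is only an
illustration, assuming symbols are handed out in the order of the list
above; the names ``SYMBOLS`` and ``footnote_symbol`` are hypothetical and
are not part of the Docutils API.

```python
# Hypothetical sketch of the auto-symbol sequence; not the Docutils code.
SYMBOLS = [
    "*",       # asterisk/star
    "\u2020",  # dagger
    "\u2021",  # double dagger
    "\u00a7",  # section mark
    "\u00b6",  # pilcrow
    "#",       # number sign
    "\u2660",  # spade suit
    "\u2665",  # heart suit
    "\u2666",  # diamond suit
    "\u2663",  # club suit
]


def footnote_symbol(index):
    """Return the label for the auto-symbol footnote at 0-based `index`."""
    # After each full pass through the list, the symbol is repeated once more.
    repeat, position = divmod(index, len(SYMBOLS))
    return SYMBOLS[position] * (repeat + 1)
```

Under these assumptions the eleventh footnote (index 10) is labeled
"**", and the twenty-first (index 20) is labeled "***".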
1695
1696.. Note:: When using auto-symbol footnotes, the choice of output
1697 encoding is important. Many of the symbols used are not encodable
1698 in certain common text encodings such as Latin-1 (ISO 8859-1). The
1699 use of UTF-8 for the output encoding is recommended. An
1700 alternative for HTML and XML output is to use the
1701 "xmlcharrefreplace" `output encoding error handler`__.
1702
1703__ ../../user/config.html#output-encoding-error-handler
1704
1705
1706Mixed Manual and Auto-Numbered Footnotes
1707........................................
1708
1709Manual and automatic footnote numbering may both be used within a
1710single document, although the results may be unexpected. Manual
1711numbering takes priority. Only unused footnote numbers are assigned
1712to auto-numbered footnotes. The following example should be
1713illustrative::
1714
1715 [2]_ will be "2" (manually numbered),
1716 [#]_ will be "3" (anonymous auto-numbered), and
1717 [#label]_ will be "1" (labeled auto-numbered).
1718
1719 .. [2] This footnote is labeled manually, so its number is fixed.
1720
1721 .. [#label] This autonumber-labeled footnote will be labeled "1".
1722 It is the first auto-numbered footnote and no other footnote
1723 with label "1" exists. The order of the footnotes is used to
1724 determine numbering, not the order of the footnote references.
1725
1726 .. [#] This footnote will be labeled "3". It is the second
1727 auto-numbered footnote, but footnote label "2" is already used.
1728
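The numbering rule in the example above can be sketched in Python. This
is a minimal illustration, not the Docutils transform: it assumes that
footnote labels are given in document order and that any label made only
of digits is a manual number. The function name
``assign_footnote_numbers`` is hypothetical.

```python
def assign_footnote_numbers(labels):
    """Assign a number to each footnote label, given in document order.

    Manual labels such as "2" keep their number. Auto-numbered labels
    ("#" or "#name") receive the lowest number that is still unused.
    """
    # Manual numbers are reserved before any automatic numbering happens.
    used = {int(label) for label in labels if label.isdigit()}
    numbers = []
    candidate = 1
    for label in labels:
        if label.isdigit():
            numbers.append(int(label))
            continue
        while candidate in used:
            candidate += 1
        used.add(candidate)
        numbers.append(candidate)
    return numbers
```

For the footnotes in the example, ``["2", "#label", "#"]``, this sketch
yields 2, 1, and 3, matching the labels described above.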
1729
1730Citations
1731`````````
1732
1733Citations are identical to footnotes except that they use only
1734non-numeric labels such as ``[note]`` or ``[GVR2001]``. Citation
1735labels are simple `reference names`_ (case-insensitive single words
1736consisting of alphanumerics plus internal hyphens, underscores, and
1737periods; no whitespace). Citations may be rendered separately and
1738differently from footnotes. For example::
1739
1740 Here is a citation reference: [CIT2002]_.
1741
1742 .. [CIT2002] This is the citation. It's just like a footnote,
1743 except the label is textual.
1744
1745
1746.. _hyperlinks:
1747
1748Hyperlink Targets
1749`````````````````
1750
1751Doctree element: target.
1752
1753These are also called _`explicit hyperlink targets`, to differentiate
1754them from `implicit hyperlink targets`_ defined below.
1755
1756Hyperlink targets identify a location within or outside of a document,
1757which may be linked to by `hyperlink references`_.
1758
1759Hyperlink targets may be named or anonymous. Named hyperlink targets
1760consist of an explicit markup start (".. "), an underscore, the
1761reference name (no trailing underscore), a colon, whitespace, and a
1762link block::
1763
1764 .. _hyperlink-name: link-block
1765
1766Reference names are whitespace-neutral and case-insensitive. See
1767`Reference Names`_ for details and examples.
1768
1769Anonymous hyperlink targets consist of an explicit markup start
1770(".. "), two underscores, a colon, whitespace, and a link block; there
1771is no reference name::
1772
1773 .. __: anonymous-hyperlink-target-link-block
1774
1775An alternate syntax for anonymous hyperlinks consists of two
1776underscores, a space, and a link block::
1777
1778 __ anonymous-hyperlink-target-link-block
1779
1780See `Anonymous Hyperlinks`_ below.
1781
1782There are three types of hyperlink targets: internal, external, and
1783indirect.
1784
17851. _`Internal hyperlink targets` have empty link blocks. They provide
1786 an end point allowing a hyperlink to connect one place to another
1787 within a document. An internal hyperlink target points to the
1788 element following the target. For example::
1789
1790 Clicking on this internal hyperlink will take us to the target_
1791 below.
1792
1793 .. _target:
1794
1795 The hyperlink target above points to this paragraph.
1796
1797 Internal hyperlink targets may be "chained". Multiple adjacent
1798 internal hyperlink targets all point to the same element::
1799
1800 .. _target1:
1801 .. _target2:
1802
1803 The targets "target1" and "target2" are synonyms; they both
1804 point to this paragraph.
1805
1806 If the element "pointed to" is an external hyperlink target (with a
1807 URI in its link block; see #2 below) the URI from the external
1808 hyperlink target is propagated to the internal hyperlink targets;
1809 they will all "point to" the same URI. There is no need to
1810 duplicate a URI. For example, all three of the following hyperlink
1811 targets refer to the same URI::
1812
1813 .. _Python DOC-SIG mailing list archive:
1814 .. _archive:
1815 .. _Doc-SIG: http://mail.python.org/pipermail/doc-sig/
1816
1817 An inline form of internal hyperlink target is available; see
1818 `Inline Internal Targets`_.
1819
18202. _`External hyperlink targets` have an absolute or relative URI or
1821 email address in their link blocks. For example, take the
1822 following input::
1823
1824 See the Python_ home page for info.
1825
1826 `Write to me`_ with your questions.
1827
1828 .. _Python: http://www.python.org
1829 .. _Write to me: jdoe@example.com
1830
1831 After processing into HTML, the hyperlinks might be expressed as::
1832
1833 See the <a href="http://www.python.org">Python</a> home page
1834 for info.
1835
1836 <a href="mailto:jdoe@example.com">Write to me</a> with your
1837 questions.
1838
1839 An external hyperlink's URI may begin on the same line as the
1840 explicit markup start and target name, or it may begin in an
1841 indented text block immediately following, with no intervening
1842 blank lines. If there are multiple lines in the link block, they
1843 are concatenated. Any whitespace is removed (whitespace is
1844 permitted to allow for line wrapping). The following external
1845 hyperlink targets are equivalent::
1846
1847 .. _one-liner: http://docutils.sourceforge.net/rst.html
1848
1849 .. _starts-on-this-line: http://
1850 docutils.sourceforge.net/rst.html
1851
1852 .. _entirely-below:
1853 http://docutils.
1854 sourceforge.net/rst.html
1855
1856 If an external hyperlink target's URI contains an underscore as its
1857 last character, it must be escaped to avoid being mistaken for an
1858 indirect hyperlink target::
1859
1860 This link_ refers to a file called ``underscore_``.
1861
1862 .. _link: underscore\_
1863
1864 It is possible (although not generally recommended) to include URIs
1865 directly within hyperlink references. See `Embedded URIs`_ below.
1866
18673. _`Indirect hyperlink targets` have a hyperlink reference in their
1868 link blocks. In the following example, target "one" indirectly
1869 references whatever target "two" references, and target "two"
1870 references target "three", an internal hyperlink target. In
1871 effect, all three reference the same thing::
1872
1873 .. _one: two_
1874 .. _two: three_
1875 .. _three:
1876
1877 Just as with `hyperlink references`_ anywhere else in a document,
1878 if a phrase-reference is used in the link block it must be enclosed
1879 in backquotes. As with `external hyperlink targets`_, the link
1880 block of an indirect hyperlink target may begin on the same line as
1881 the explicit markup start or the next line. It may also be split
1882 over multiple lines, in which case the lines are joined with
1883 whitespace before being normalized.
1884
1885 For example, the following indirect hyperlink targets are
1886 equivalent::
1887
1888 .. _one-liner: `A HYPERLINK`_
1889 .. _entirely-below:
1890 `a hyperlink`_
1891 .. _split: `A
1892 Hyperlink`_
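
The two joining rules above (whitespace removed from external URIs,
whitespace normalized in indirect reference names) can be sketched in
Python. The helper names ``join_external_uri`` and
``normalize_reference_name`` are hypothetical and serve only to
illustrate the rules; they are not the parser's own functions.

```python
def join_external_uri(link_block_lines):
    """Concatenate the lines of an external link block, dropping all whitespace."""
    return "".join("".join(line.split()) for line in link_block_lines)


def normalize_reference_name(text):
    """Collapse each run of whitespace to one space and lowercase the result."""
    return " ".join(text.split()).lower()
```

With these helpers, the link block lines ``"http://docutils."`` and
``"sourceforge.net/rst.html"`` join to the same URI as the one-line
form, and "A HYPERLINK", "a hyperlink", and a name split over two lines
all normalize to the same reference name.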
1893
1894If the reference name contains any colons, either:
1895
1896- the phrase must be enclosed in backquotes::
1897
1898 .. _`FAQTS: Computers: Programming: Languages: Python`:
1899 http://python.faqts.com/
1900
1901- or the colon(s) must be backslash-escaped in the link target::
1902
1903 .. _Chapter One\: "Tadpole Days":
1904
1905 It's not easy being green...
1906
1907See `Implicit Hyperlink Targets`_ below for the resolution of
1908duplicate reference names.
1909
1910Syntax diagram::
1911
1912 +-------+----------------------+
1913 | ".. " | "_" name ":" link |
1914 +-------+ block |
1915 | |
1916 +----------------------+
1917
1918
1919Anonymous Hyperlinks
1920....................
1921
1922The `World Wide Web Consortium`_ recommends in its `HTML Techniques
1923for Web Content Accessibility Guidelines`_ that authors should
1924"clearly identify the target of each link." Hyperlink references
1925should be as verbose as possible, but duplicating a verbose hyperlink
1926name in the target is onerous and error-prone. Anonymous hyperlinks
1927are designed to allow convenient verbose hyperlink references, and are
1928analogous to `Auto-Numbered Footnotes`_. They are particularly useful
1929in short or one-off documents. However, this feature is easily abused
1930and can result in unreadable plaintext and/or unmaintainable
1931documents. Caution is advised.
1932
1933Anonymous `hyperlink references`_ are specified with two underscores
1934instead of one::
1935
1936 See `the web site of my favorite programming language`__.
1937
1938Anonymous targets begin with ".. __:"; no reference name is required
1939or allowed::
1940
1941 .. __: http://www.python.org
1942
1943As a convenient alternative, anonymous targets may begin with "__"
1944only::
1945
1946 __ http://www.python.org
1947
1948The reference name of the reference is not used to match the reference
1949to its target. Instead, the order of anonymous hyperlink references
1950and targets within the document is significant: the first anonymous
1951reference will link to the first anonymous target. The number of
1952anonymous hyperlink references in a document must match the number of
1953anonymous targets. For readability, it is recommended that targets be
1954kept close to references. Take care when editing text containing
1955anonymous references; adding, removing, and rearranging references
1956require attention to the order of corresponding targets.
1957
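The order-based matching can be sketched in Python. This is an
illustration only; the function name ``pair_anonymous`` is hypothetical,
and the sketch assumes that references and targets have already been
collected in document order.

```python
def pair_anonymous(references, targets):
    """Pair anonymous references with anonymous targets by position.

    Both arguments are lists in document order. A mismatch in the
    counts is treated as an error, since no pairing is defined then.
    """
    if len(references) != len(targets):
        raise ValueError(
            "%d anonymous references but %d anonymous targets"
            % (len(references), len(targets))
        )
    return list(zip(references, targets))
```

Because only position matters, swapping two anonymous references in the
text without swapping their targets silently changes which URI each one
points to.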
1958
1959Directives
1960``````````
1961
1962Doctree elements: depend on the directive.
1963
1964Directives are an extension mechanism for reStructuredText, a way of
1965adding support for new constructs without adding new primary syntax
1966(directives may support additional syntax locally). All standard
1967directives (those implemented and registered in the reference
1968reStructuredText parser) are described in the `reStructuredText
1969Directives`_ document, and are always available. Any other directives
1970are domain-specific, and may require special action to make them
1971available when processing the document.
1972
1973For example, here's how an image_ may be placed::
1974
1975 .. image:: mylogo.jpeg
1976
1977A figure_ (a graphic with a caption) may be placed like this::
1978
1979 .. figure:: larch.png
1980
1981 The larch.
1982
1983An admonition_ (note, caution, etc.) contains other body elements::
1984
1985 .. note:: This is a paragraph
1986
1987 - Here is a bullet list.
1988
1989Directives are indicated by an explicit markup start (".. ") followed
1990by the directive type, two colons, and whitespace (together called the
1991"directive marker"). Directive types are case-insensitive single
1992words (alphanumerics plus isolated internal hyphens, underscores,
1993plus signs, colons, and periods; no whitespace). Two colons are used
1994after the directive type for these reasons:
1995
1996- Two colons are distinctive, and unlikely to be used in common text.
1997
1998- Two colons avoid clashes with common comment text like::
1999
2000 .. Danger: modify at your own risk!
2001
2002- If an implementation of reStructuredText does not recognize a
2003 directive (i.e., the directive-handler is not installed), a level-3
2004 (error) system message is generated, and the entire directive block
2005 (including the directive itself) will be included as a literal
2006 block. Thus "::" is a natural choice.
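
The directive marker described above can be approximated with a regular
expression. The following Python sketch is an assumption-laden
illustration, not the pattern used by the reference parser; the names
``DIRECTIVE_MARKER`` and ``parse_directive_marker`` are hypothetical.

```python
import re

# ".. " + directive type + "::" + whitespace (or end of line).
# The type is alphanumeric words joined by single internal
# hyphens, underscores, plus signs, colons, or periods.
DIRECTIVE_MARKER = re.compile(
    r"^\.\. +(?P<type>[A-Za-z0-9]+(?:[-_+:.][A-Za-z0-9]+)*)::(?:\s+(?P<rest>.*))?$"
)


def parse_directive_marker(line):
    """Return (directive type, rest of first line), or None if no marker."""
    match = DIRECTIVE_MARKER.match(line)
    if match is None:
        return None
    # Directive types are case-insensitive, so compare them lowercased.
    return match.group("type").lower(), match.group("rest") or ""
```

In this sketch ``.. image:: mylogo.jpeg`` is recognized as an "image"
directive, while ``.. Danger: modify at your own risk!`` is not a
directive at all because it has only a single colon.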
2007
2008The directive block consists of any text on the first line of the
2009directive after the directive marker, and any subsequent indented
2010text. The interpretation of the directive block is up to the
2011directive code. There are three logical parts to the directive block:
2012
20131. Directive arguments.
20142. Directive options.
20153. Directive content.
2016
2017Individual directives can employ any combination of these parts.
2018Directive arguments can be filesystem paths, URLs, title text, etc.
2019Directive options are indicated using `field lists`_; the field names
2020and contents are directive-specific. Arguments and options must form
2021a contiguous block beginning on the first or second line of the
2022directive; a blank line indicates the beginning of the directive
2023content block. If arguments and/or options are employed by the
2024directive, a blank line must separate them from the directive content.
2025The "figure" directive employs all three parts::
2026
2027 .. figure:: larch.png
2028 :scale: 50
2029
2030 The larch.
2031
2032Simple directives may not require any content. If a directive that
2033does not employ a content block is followed by indented text anyway,
2034it is an error. If a block quote should immediately follow a
2035directive, use an empty comment in-between (see Comments_ below).
2036
2037Actions taken in response to directives and the interpretation of text
2038in the directive content block or subsequent text block(s) are
2039directive-dependent. See `reStructuredText Directives`_ for details.
2040
2041Directives are meant for the arbitrary processing of their contents,
2042which can be transformed into something possibly unrelated to the
2043original text. It may also be possible for directives to be used as
2044pragmas, to modify the behavior of the parser, such as to experiment
2045with alternate syntax. There is no parser support for this
2046functionality at present; if a reasonable need for pragma directives
2047is found, they may be supported.
2048
2049Directives do not generate "directive" elements; they are a *parser
2050construct* only, and have no intrinsic meaning outside of
2051reStructuredText. Instead, the parser will transform recognized
2052directives into (possibly specialized) document elements. Unknown
2053directives will trigger level-3 (error) system messages.
2054
2055Syntax diagram::
2056
2057 +-------+-------------------------------+
2058 | ".. " | directive type "::" directive |
2059 +-------+ block |
2060 | |
2061 +-------------------------------+
2062
2063
2064Substitution Definitions
2065````````````````````````
2066
2067Doctree element: substitution_definition.
2068
2069Substitution definitions are indicated by an explicit markup start
2070(".. ") followed by a vertical bar, the substitution text, another
2071vertical bar, whitespace, and the definition block. Substitution text
2072may not begin or end with whitespace. A substitution definition block
2073contains an embedded inline-compatible directive (without the leading
2074".. "), such as "image_" or "replace_". For example::
2075
2076 The |biohazard| symbol must be used on containers used to
2077 dispose of medical waste.
2078
2079 .. |biohazard| image:: biohazard.png
2080
2081It is an error for a substitution definition block to directly or
2082indirectly contain a circular substitution reference.
2083
2084`Substitution references`_ are replaced in-line by the processed
2085contents of the corresponding definition (linked by matching
2086substitution text). Matches are case-sensitive but forgiving; if no
2087exact match is found, a case-insensitive comparison is attempted.
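
The "case-sensitive but forgiving" lookup can be sketched in Python.
This is an illustration under the assumption that definitions are held
in a mapping from substitution text to processed contents; the name
``find_substitution`` is hypothetical.

```python
def find_substitution(name, definitions):
    """Look up substitution text, preferring an exact match.

    `definitions` maps substitution text to its definition. If no key
    matches exactly, a case-insensitive comparison is attempted.
    Returns None when nothing matches.
    """
    if name in definitions:
        return definitions[name]
    lowered = name.lower()
    for key, value in definitions.items():
        if key.lower() == lowered:
            return value
    return None
```

With definitions for both ``RST`` and ``rst``, a reference to ``|rst|``
picks the exact match; with only ``RST`` defined, ``|rst|`` still
resolves through the case-insensitive fallback.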
2088
2089Substitution definitions allow the power and flexibility of
2090block-level directives_ to be shared by inline text. They are a way
2091to include arbitrarily complex inline structures within text, while
2092keeping the details out of the flow of text. They are the equivalent
2093of SGML/XML's named entities or programming language macros.
2094
2095Without the substitution mechanism, every time someone wants an
2096application-specific new inline structure, they would have to petition
2097for a syntax change. In combination with existing directive syntax,
2098any inline structure can be coded without new syntax (except possibly
2099a new directive).
2100
2101Syntax diagram::
2102
2103 +-------+-----------------------------------------------------+
2104 | ".. " | "|" substitution text "| " directive type "::" data |
2105 +-------+ directive block |
2106 | |
2107 +-----------------------------------------------------+
2108
2109Following are some use cases for the substitution mechanism. Please
2110note that most of the embedded directives shown are examples only and
2111have not been implemented.
2112
2113Objects
2114 Substitution references may be used to associate ambiguous text
2115 with a unique object identifier.
2116
2117 For example, many sites may wish to implement an inline "user"
2118 directive::
2119
2120 |Michael| and |Jon| are our widget-wranglers.
2121
2122 .. |Michael| user:: mjones
2123 .. |Jon| user:: jhl
2124
2125 Depending on the needs of the site, this may be used to index the
2126 document for later searching, to hyperlink the inline text in
2127 various ways (mailto, homepage, mouseover Javascript with profile
2128 and contact information, etc.), or to customize presentation of
2129 the text (include username in the inline text, include an icon
2130 image with a link next to the text, make the text bold or a
2131 different color, etc.).
2132
2133 The same approach can be used in documents which frequently refer
2134 to a particular type of object with unique identifiers but
2135 ambiguous common names. Movies, albums, books, photos, court
2136 cases, and laws are possible. For example::
2137
2138 |The Transparent Society| offers a fascinating alternate view
2139 on privacy issues.
2140
2141 .. |The Transparent Society| book:: isbn=0738201448
2142
2143 Classes or functions, in contexts where the module or class names
2144 are unclear and/or interpreted text cannot be used, are another
2145 possibility::
2146
2147 4XSLT has the convenience method |runString|, so you don't
2148 have to mess with DOM objects if all you want is the
2149 transformed output.
2150
2151 .. |runString| function:: module=xml.xslt class=Processor
2152
2153Images
2154 Images are a common use for substitution references::
2155
2156 West led the |H| 3, covered by dummy's |H| Q, East's |H| K,
2157 and trumped in hand with the |S| 2.
2158
2159 .. |H| image:: /images/heart.png
2160 :height: 11
2161 :width: 11
2162 .. |S| image:: /images/spade.png
2163 :height: 11
2164 :width: 11
2165
2166 * |Red light| means stop.
2167 * |Green light| means go.
2168 * |Yellow light| means go really fast.
2169
2170 .. |Red light| image:: red_light.png
2171 .. |Green light| image:: green_light.png
2172 .. |Yellow light| image:: yellow_light.png
2173
2174 |-><-| is the official symbol of POEE_.
2175
2176 .. |-><-| image:: discord.png
2177 .. _POEE: http://www.poee.org/
2178
2179 The "image_" directive has been implemented.
2180
2181Styles [#]_
2182 Substitution references may be used to associate inline text with
2183 an externally defined presentation style::
2184
2185 Even |the text in Texas| is big.
2186
2187 .. |the text in Texas| style:: big
2188
2189 The style name may be meaningful in the context of some particular
2190 output format (CSS class name for HTML output, LaTeX style name
2191 for LaTeX, etc), or may be ignored for other output formats (such
2192 as plaintext).
2193
2194 .. @@@ This needs to be rethought & rewritten or removed:
2195
2196 Interpreted text is unsuitable for this purpose because the set
2197 of style names cannot be predefined - it is the domain of the
2198 content author, not the author of the parser and output
2199 formatter - and there is no way to associate a style name
2200 argument with an interpreted text style role. Also, it may be
2201 desirable to use the same mechanism for styling blocks::
2202
2203 .. style:: motto
2204 At Bob's Underwear Shop, we'll do anything to get in
2205 your pants.
2206
2207 .. style:: disclaimer
2208 All rights reversed. Reprint what you like.
2209
2210 .. [#] There may be sufficient need for a "style" mechanism to
2211 warrant simpler syntax such as an extension to the interpreted
2212 text role syntax. The substitution mechanism is cumbersome for
2213 simple text styling.
2214
2215Templates
2216 Inline markup may be used for later processing by a template
2217 engine. For example, a Zope_ author might write::
2218
2219 Welcome back, |name|!
2220
2221 .. |name| tal:: replace user/getUserName
2222
2223 After processing, this ZPT output would result::
2224
2225 Welcome back,
2226 <span tal:replace="user/getUserName">name</span>!
2227
2228 Zope would then transform this to something like "Welcome back,
2229 David!" during a session with an actual user.
2230
2231Replacement text
2232 The substitution mechanism may be used for simple macro
2233 substitution. This may be appropriate when the replacement text
2234 is repeated many times throughout one or more documents,
2235 especially if it may need to change later. A short example is
2236 unavoidably contrived::
2237
2238 |RST|_ is a little annoying to type over and over, especially
2239 when writing about |RST| itself, and spelling out the
2240 bicapitalized word |RST| every time isn't really necessary for
2241 |RST| source readability.
2242
2243 .. |RST| replace:: reStructuredText
2244 .. _RST: http://docutils.sourceforge.net/rst.html
2245
2246 Note the trailing underscore in the first use of a substitution
2247 reference. This indicates a reference to the corresponding
2248 hyperlink target.
2249
2250 Substitution is also appropriate when the replacement text cannot
2251 be represented using other inline constructs, or is obtrusively
2252 long::
2253
2254 But still, that's nothing compared to a name like
2255 |j2ee-cas|__.
2256
2257 .. |j2ee-cas| replace::
2258 the Java `TM`:super: 2 Platform, Enterprise Edition Client
2259 Access Services
2260 __ http://developer.java.sun.com/developer/earlyAccess/
2261 j2eecas/
2262
2263 The "replace_" directive has been implemented.
2264
2265
2266Comments
2267````````
2268
2269Doctree element: comment.
2270
2271Arbitrary indented text may follow the explicit markup start and will
2272be processed as a comment element. No further processing is done on
2273the comment block text; a comment contains a single "text blob".
2274Depending on the output formatter, comments may be removed from the
2275processed output. The only restriction on comments is that they not
2276use the same syntax as any of the other explicit markup constructs:
2277substitution definitions, directives, footnotes, citations, or
2278hyperlink targets. To ensure that none of the other explicit markup
2279constructs is recognized, leave the ".." on a line by itself::
2280
2281 .. This is a comment
2282 ..
2283 _so: is this!
2284 ..
2285 [and] this!
2286 ..
2287 this:: too!
2288 ..
2289 |even| this:: !
2290
2291.. _empty comments:
2292
2293An explicit markup start followed by a blank line and nothing else
2294(apart from whitespace) is an "_`empty comment`". It serves to
2295terminate a preceding construct, and does **not** consume any indented
2296text following. To have a block quote follow a list or any indented
2297construct, insert an unindented empty comment in-between.
2298
2299Syntax diagram::
2300
2301 +-------+----------------------+
2302 | ".. " | comment |
2303 +-------+ block |
2304 | |
2305 +----------------------+
2306
2307
2308Implicit Hyperlink Targets
2309==========================
2310
2311Implicit hyperlink targets are generated by section titles, footnotes,
2312and citations, and may also be generated by extension constructs.
2313Implicit hyperlink targets otherwise behave identically to explicit
2314`hyperlink targets`_.
2315
2316Problems of ambiguity due to conflicting duplicate implicit and
2317explicit reference names are avoided by following this procedure:
2318
23191. `Explicit hyperlink targets`_ override any implicit targets having
2320 the same reference name. The implicit hyperlink targets are
2321 removed, and level-1 (info) system messages are inserted.
2322
23232. Duplicate implicit hyperlink targets are removed, and level-1
2324 (info) system messages inserted. For example, if two or more
2325 sections have the same title (such as "Introduction" subsections of
2326 a rigidly-structured document), there will be duplicate implicit
2327 hyperlink targets.
2328
23293. Duplicate explicit hyperlink targets are removed, and level-2
2330 (warning) system messages are inserted. Exception: duplicate
2331 `external hyperlink targets`_ (identical hyperlink names and
2332 referenced URIs) do not conflict, and are not removed.
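
The three-step procedure can be sketched in Python. This is a
simplified illustration, not the Docutils implementation: it assumes
each target is a ``(name, kind, refuri)`` tuple with ``kind`` either
"explicit" or "implicit", and it reports system messages as
``(level, name)`` pairs. The function name ``resolve_targets`` is
hypothetical.

```python
from collections import defaultdict


def resolve_targets(targets):
    """Apply the duplicate-name rules to (name, kind, refuri) tuples.

    Returns (resolved, messages): `resolved` maps each usable name to
    its (kind, refuri); `messages` lists (level, name) system messages.
    """
    by_name = defaultdict(list)
    for name, kind, refuri in targets:
        by_name[name].append((kind, refuri))

    resolved, messages = {}, []
    for name, entries in by_name.items():
        explicit = [entry for entry in entries if entry[0] == "explicit"]
        implicit = [entry for entry in entries if entry[0] == "implicit"]
        if explicit:
            if implicit:
                messages.append((1, name))  # step 1: explicit overrides implicit
            uris = {refuri for _, refuri in explicit}
            same_external = len(uris) == 1 and None not in uris
            if len(explicit) == 1 or same_external:
                resolved[name] = explicit[0]
            else:
                messages.append((2, name))  # step 3: duplicate explicit targets
        elif len(implicit) == 1:
            resolved[name] = implicit[0]
        else:
            messages.append((1, name))  # step 2: duplicate implicit targets
    return resolved, messages
```

In this sketch two sections titled "Introduction" leave no usable
target of that name, while two explicit targets that share both name
and URI resolve without conflict.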
2333
2334System messages are inserted where target links have been removed.
2335See "Error Handling" in `PEP 258`_.
2336
2337The parser must return a set of *unique* hyperlink targets. The
2338calling software (such as the Docutils_) can warn of unresolvable
2339links, giving reasons for the messages.
2340
2341
2342Inline Markup
2343=============
2344
2345In reStructuredText, inline markup applies to words or phrases within
2346a text block. The same whitespace and punctuation that serves to
2347delimit words in written text is used to delimit the inline markup
2348syntax constructs. The text within inline markup may not begin or end
2349with whitespace. Arbitrary `character-level inline markup`_ is
2350supported although not encouraged. Inline markup cannot be nested.
2351
2352There are nine inline markup constructs. Five of the constructs use
2353identical start-strings and end-strings to indicate the markup:
2354
2355- emphasis_: "*"
2356- `strong emphasis`_: "**"
2357- `interpreted text`_: "`"
2358- `inline literals`_: "``"
2359- `substitution references`_: "|"
2360
2361Three constructs use different start-strings and end-strings:
2362
2363- `inline internal targets`_: "_`" and "`"
2364- `footnote references`_: "[" and "]_"
2365- `hyperlink references`_: "`" and "\`_" (phrases), or just a
2366 trailing "_" (single words)
2367
2368`Standalone hyperlinks`_ are recognized implicitly, and use no extra
2369markup.
2370
2371Inline markup recognition rules
2372-------------------------------
2373
2374Inline markup start-strings and end-strings are only recognized if all of
2375the following conditions are met:
2376
23771. Inline markup start-strings must start a text block or be
2378 immediately preceded by
2379
2380 * whitespace,
2381 * one of the ASCII characters ``- : / ' " < ( [ {`` or
2382 * a non-ASCII punctuation character with `Unicode category`_
2383 `Pd` (Dash),
2384 `Po` (Other),
2385 `Ps` (Open),
2386 `Pi` (Initial quote), or
2387 `Pf` (Final quote) [#PiPf]_.
2388
23892. Inline markup start-strings must be immediately followed by
2390 non-whitespace.
2391
23923. Inline markup end-strings must be immediately preceded by
2393 non-whitespace.
2394
23954. Inline markup end-strings must end a text block or be immediately
2396 followed by
2397
2398 * whitespace,
2399 * one of the ASCII characters ``- . , : ; ! ? \ / ' " ) ] } >`` or
2400 * a non-ASCII punctuation character with `Unicode category`_
2401 `Pd` (Dash),
2402 `Po` (Other),
2403 `Pe` (Close),
2404 `Pf` (Final quote), or
2405 `Pi` (Initial quote) [#PiPf]_.
2406
24075. If an inline markup start-string is immediately preceded by one of the
2408 ASCII characters ``' " < ( [ {``, or a character with Unicode character
2409 category `Ps`, `Pi`, or `Pf`, it must not be followed by the
2410 corresponding [#corresponding-quotes]_ closing character from
2411 ``' " ) ] } >`` or the categories `Pe`, `Pf`, or `Pi`.
2412
24136. An inline markup end-string must be separated by at least one
2414 character from the start-string.
2415
24167. An unescaped backslash preceding a start-string or end-string will
2417 disable markup recognition, except for the end-string of `inline
2418 literals`_. See `Escaping Mechanism`_ above for details.
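
Rules 1 and 2 for start-strings can be sketched in Python. This is a
partial illustration only: it does not check rules 3 to 7, and the
names ``OPENERS``, ``OPENING_CATEGORIES``, and ``start_string_allowed``
are hypothetical rather than taken from the parser.

```python
import unicodedata

# ASCII characters that may immediately precede a start-string (rule 1).
OPENERS = "-:/'\"<([{"
# Unicode categories accepted for non-ASCII punctuation (rule 1).
OPENING_CATEGORIES = {"Pd", "Po", "Ps", "Pi", "Pf"}


def start_string_allowed(text, index, length):
    """Check rules 1 and 2 for the start-string text[index:index + length]."""
    following = text[index + length:index + length + 1]
    if not following or following.isspace():
        return False  # rule 2: must be followed by non-whitespace
    if index == 0:
        return True  # rule 1: starts the text block
    before = text[index - 1]
    if before.isspace() or before in OPENERS:
        return True
    return ord(before) > 127 and unicodedata.category(before) in OPENING_CATEGORIES
```

Under these two rules alone, the asterisk in ``2*x`` is rejected because
it follows a digit, and the asterisk in ``2 * x`` is rejected because it
is followed by whitespace.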
2419
2420.. [#PiPf] `Pi` (Punctuation, Initial quote) characters are "usually
2421 opening, sometimes closing". `Pf` (Punctuation, Final quote)
2422 characters are "usually closing, sometimes opening".
2423
2424.. [#corresponding-quotes] For quotes, corresponding characters can be
2425 any of the `quotation marks in international usage`_
2426
2427.. _Unicode category:
2428 http://www.unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values
2429
2430.. _quotation marks in international usage:
2431 http://en.wikipedia.org/wiki/Quotation_mark,_non-English_usage
2432
2433The inline markup recognition rules were devised to allow 90% of non-markup
2434uses of "*", "`", "_", and "|" without escaping. For example, none of the
2435following terms are recognized as containing inline markup strings:
2436
2437- 2*x a**b O(N**2) e**(x*y) f(x)*f(y) a|b file*.* (breaks 1)
2438- 2 * x a ** b (* BOM32_* ` `` _ __ | (breaks 2)
2439- "*" '|' (*) [*] {*} <*>
2440 ‘*’ ‚*‘ ‘*‚ ’*’ ‚*’
2441 “*” „*“ “*„ ”*” „*”
2442 »*« ›*‹ «*» »*» ›*› (breaks 5)
2443- || (breaks 6)
2444- __init__ __init__()
2445
2446No escaping is required inside the following inline markup examples:
2447
2448- *2 * x *a **b *.txt* (breaks 3)
2449- *2*x a**b O(N**2) e**(x*y) f(x)*f(y) a*(1+2)* (breaks 4)
2450
2451It may be desirable to use `inline literals`_ for some of these anyhow,
2452especially if they represent code snippets. It's a judgment call.
2453
2454These cases *do* require either literal-quoting or escaping to avoid
2455misinterpretation:
2456
2457 \*4, class\_, \*args, \**kwargs, \`TeX-quoted', \*ML, \*.txt
2458
2459In most use cases, `inline literals`_ or `literal blocks`_ are the best
2460choice (by default, this also selects a monospaced font)::
2461
2462 *4, class_, *args, **kwargs, `TeX-quoted', *ML, *.txt
2463
2464Recognition order
2465-----------------
2466
2467Inline markup delimiter characters are used for multiple constructs,
2468so to avoid ambiguity there must be a specific recognition order for
2469each character. The inline markup recognition order is as follows:
2470
2471- Asterisks: `Strong emphasis`_ ("**") is recognized before emphasis_
2472 ("*").
2473
2474- Backquotes: `Inline literals`_ ("``") and `inline internal targets`_
2475 (leading "_`", trailing "`") are mutually independent, and are
2476 recognized before phrase `hyperlink references`_ (leading "`",
2477 trailing "\`_") and `interpreted text`_ ("`").
2478
2479- Trailing underscores: Footnote references ("[" + label + "]_") and
2480 simple `hyperlink references`_ (name + trailing "_") are mutually
2481 independent.
2482
2483- Vertical bars: `Substitution references`_ ("|") are independently
2484 recognized.
2485
2486- `Standalone hyperlinks`_ are the last to be recognized.
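
The effect of recognition order on asterisks can be sketched with a regular expression. This is a simplified illustration rather than the Docutils implementation: the names ``INLINE`` and ``classify`` are hypothetical, and only recognition rules 2, 3 and 6 are approximated.

```python
import re

# Hypothetical, simplified pattern. "**" is listed before "*" in the
# alternation, so strong emphasis is tried first at every position.
# \S(?:.*?\S)? means: at least one character, with no whitespace directly
# after the start-string or directly before the end-string.
INLINE = re.compile(
    r"\*\*(?P<strong>\S(?:.*?\S)?)\*\*"   # strong emphasis first
    r"|\*(?P<emphasis>\S(?:.*?\S)?)\*"    # then plain emphasis
)


def classify(text):
    """Return (kind, content) pairs for each construct found in `text`."""
    return [(m.lastgroup, m.group(m.lastgroup)) for m in INLINE.finditer(text)]


if __name__ == "__main__":
    print(classify("This is **strong text** and *emphasized text*."))
```

If the two alternatives were swapped, the single-asterisk branch would get the first chance to match at the start of ``**strong text**``, which is the ambiguity the fixed order avoids.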
2487
2488
2489Character-Level Inline Markup
2490-----------------------------
2491
2492It is possible to mark up individual characters within a word with
2493backslash escapes (see `Escaping Mechanism`_ above). Backslash
2494escapes can be used to allow arbitrary text to immediately follow
2495inline markup::
2496
2497 Python ``list``\s use square bracket syntax.
2498
2499The backslash will disappear from the processed document. The word
2500"list" will appear as inline literal text, and the letter "s" will
2501immediately follow it as normal text, with no space in-between.
2502
2503Arbitrary text may immediately precede inline markup using
2504backslash-escaped whitespace::
2505
2506 Possible in *re*\ ``Structured``\ *Text*, though not encouraged.
2507
2508The backslashes and spaces separating "re", "Structured", and "Text"
2509above will disappear from the processed document.
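
A minimal sketch of how the backslashes vanish, assuming the usual escaping behaviour (a backslash-escaped whitespace character is removed, any other escaped character stands for itself). The function name ``remove_escapes`` is hypothetical, and the sketch deliberately ignores the fact that backslashes inside inline literals are left untouched.

```python
import re


def remove_escapes(text):
    """Drop escaping backslashes; also drop backslash-escaped whitespace."""
    def replace(match):
        char = match.group(1)
        # Escaped whitespace disappears entirely; other characters remain.
        return "" if char.isspace() else char
    return re.sub(r"\\(.)", replace, text, flags=re.DOTALL)


if __name__ == "__main__":
    print(remove_escapes(r"Python ``list``\s use square bracket syntax."))
    print(remove_escapes(r"*re*\ ``Structured``\ *Text*"))
```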
2510
2511.. CAUTION::
2512
2513 The use of backslash-escapes for character-level inline markup is
2514 not encouraged. Such use is ugly and detrimental to the
2515 unprocessed document's readability. Please use this feature
2516 sparingly and only where absolutely necessary.
2517
2518
2519Emphasis
2520--------
2521
2522Doctree element: emphasis.
2523
2524Start-string = end-string = "*".
2525
2526Text enclosed by single asterisk characters is emphasized::
2527
2528 This is *emphasized text*.
2529
2530Emphasized text is typically displayed in italics.
2531
2532
2533Strong Emphasis
2534---------------
2535
2536Doctree element: strong.
2537
2538Start-string = end-string = "**".
2539
2540Text enclosed by double-asterisks is emphasized strongly::
2541
2542 This is **strong text**.
2543
2544Strongly emphasized text is typically displayed in boldface.
2545
2546
2547Interpreted Text
2548----------------
2549
2550Doctree element: depends on the explicit or implicit role and
2551processing.
2552
2553Start-string = end-string = "`".
2554
2555Interpreted text is text that is meant to be related, indexed, linked,
2556summarized, or otherwise processed, but the text itself is typically
2557left alone. Interpreted text is enclosed by single backquote
2558characters::
2559
2560 This is `interpreted text`.
2561
2562The "role" of the interpreted text determines how the text is
2563interpreted. The role may be inferred implicitly (as above; the
2564"default role" is used) or indicated explicitly, using a role marker.
2565A role marker consists of a colon, the role name, and another colon.
2566A role name is a single word consisting of alphanumerics plus isolated
2567internal hyphens, underscores, plus signs, colons, and periods;
2568no whitespace or other characters are allowed. A role marker is
2569either a prefix or a suffix to the interpreted text, whichever reads
2570better; it's up to the author::
2571
2572 :role:`interpreted text`
2573
2574 `interpreted text`:role:
2575
2576Interpreted text allows extensions to the available inline descriptive
2577markup constructs. To emphasis_, `strong emphasis`_, `inline
2578literals`_, and `hyperlink references`_, we can add "title reference",
2579"index entry", "acronym", "class", "red", "blinking" or anything else
2580we want. Only pre-determined roles are recognized; unknown roles will
2581generate errors. A core set of standard roles is implemented in the
2582reference parser; see `reStructuredText Interpreted Text Roles`_ for
2583individual descriptions. The role_ directive can be used to define
2584custom interpreted text roles. In addition, applications may support
2585specialized roles.
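
The role-marker syntax described above can be sketched as follows. This is an illustration only: ``ROLE_NAME``, ``PREFIX``, ``SUFFIX`` and ``split_role`` are hypothetical names, and "alphanumerics" is assumed to mean ASCII letters and digits, which is narrower than what a real parser may accept.

```python
import re

# Alphanumeric runs joined by single (isolated) "-", "_", "+", ":" or ".".
ROLE_NAME = r"[A-Za-z0-9]+(?:[-_+:.][A-Za-z0-9]+)*"
PREFIX = re.compile(r"^:(%s):`(.+)`$" % ROLE_NAME)   # :role:`text`
SUFFIX = re.compile(r"^`(.+)`:(%s):$" % ROLE_NAME)   # `text`:role:


def split_role(markup):
    """Return (role, text); role is None when the default role applies."""
    match = PREFIX.match(markup)
    if match:
        return match.group(1), match.group(2)
    match = SUFFIX.match(markup)
    if match:
        return match.group(2), match.group(1)
    if len(markup) > 2 and markup.startswith("`") and markup.endswith("`"):
        return None, markup[1:-1]
    raise ValueError("not interpreted text: %r" % markup)


if __name__ == "__main__":
    for sample in (":role:`interpreted text`",
                   "`interpreted text`:role:",
                   "`interpreted text`"):
        print(split_role(sample))
```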
2586
2587
2588Inline Literals
2589---------------
2590
2591Doctree element: literal.
2592
2593Start-string = end-string = "``".
2594
2595Text enclosed by double-backquotes is treated as inline literals::
2596
2597 This text is an example of ``inline literals``.
2598
2599Inline literals may contain any characters except two adjacent
2600backquotes in an end-string context (according to the recognition
2601rules above). No markup interpretation (including backslash-escape
2602interpretation) is done within inline literals.
2603
2604Line breaks are *not* preserved in inline literals. Although a
2605reStructuredText parser will preserve runs of spaces in its output,
2606the final representation of the processed document is dependent on the
2607output formatter, thus the preservation of whitespace cannot be
2608guaranteed. If the preservation of line breaks and/or other
2609whitespace is important, `literal blocks`_ should be used.
2610
2611Inline literals are useful for short code snippets. For example::
2612
2613 The regular expression ``[+-]?(\d+(\.\d*)?|\.\d+)`` matches
2614 floating-point numbers (without exponents).
2615
2616
2617Hyperlink References
2618--------------------
2619
2620Doctree element: reference.
2621
2622- Named hyperlink references:
2623
2624 - Start-string = "" (empty string), end-string = "_".
2625 - Start-string = "`", end-string = "\`_". (Phrase references.)
2626
2627- Anonymous hyperlink references:
2628
2629 - Start-string = "" (empty string), end-string = "__".
2630 - Start-string = "`", end-string = "\`__". (Phrase references.)
2631
2632Hyperlink references are indicated by a trailing underscore, "_",
2633except for `standalone hyperlinks`_ which are recognized
2634independently. The underscore can be thought of as a right-pointing
2635arrow. The trailing underscores point away from hyperlink references,
2636and the leading underscores point toward `hyperlink targets`_.
2637
2638Hyperlinks consist of two parts. In the text body, there is a source
2639link, a reference name with a trailing underscore (or two underscores
2640for `anonymous hyperlinks`_)::
2641
2642 See the Python_ home page for info.
2643
2644A target link with a matching reference name must exist somewhere else
2645in the document (see `Hyperlink Targets`_ for a full description).
2646
2647`Anonymous hyperlinks`_ (which see) do not use reference names to
2648match references to targets, but otherwise behave similarly to named
2649hyperlinks.
2650
2651
2652Embedded URIs
2653`````````````
2654
2655A hyperlink reference may directly embed a target URI inline, within
2656angle brackets ("<...>") as follows::
2657
2658 See the `Python home page <http://www.python.org>`_ for info.
2659
2660This is exactly equivalent to::
2661
2662 See the `Python home page`_ for info.
2663
2664 .. _Python home page: http://www.python.org
2665
2666The bracketed URI must be preceded by whitespace and be the last text
2667before the end string. With a single trailing underscore, the
2668reference is named and the same target URI may be referred to again.
2669
2670With two trailing underscores, the reference and target are both
2671anonymous, and the target cannot be referred to again. These are
2672"one-off" hyperlinks. For example::
2673
2674 `RFC 2396 <http://www.rfc-editor.org/rfc/rfc2396.txt>`__ and `RFC
2675 2732 <http://www.rfc-editor.org/rfc/rfc2732.txt>`__ together
2676 define the syntax of URIs.
2677
2678Equivalent to::
2679
2680 `RFC 2396`__ and `RFC 2732`__ together define the syntax of URIs.
2681
2682 __ http://www.rfc-editor.org/rfc/rfc2396.txt
2683 __ http://www.rfc-editor.org/rfc/rfc2732.txt
2684
2685If reference text happens to end with angle-bracketed text that is
2686*not* a URI, the open-angle-bracket needs to be backslash-escaped.
2687For example, here is a reference to a title describing a tag::
2688
2689 See `HTML Element: \<a>`_ below.
2690
2691The reference text may also be omitted, in which case the URI will be
2692duplicated for use as the reference text. This is useful for relative
2693URIs where the address or file name is also the desired reference
2694text::
2695
2696 See `<a_named_relative_link>`_ or `<an_anonymous_relative_link>`__
2697 for details.
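
The embedded-URI rules above can be sketched as a small function that splits the text between the backquotes into reference text and URI. The names ``EMBEDDED`` and ``split_embedded_uri`` are hypothetical, and the sketch does not check that the bracketed text is a well-formed URI.

```python
import re

# A trailing "<...>" that is either the whole text or preceded by
# whitespace. An escaped "\<" never matches, because its "<" is preceded
# by a backslash rather than by whitespace.
EMBEDDED = re.compile(r"^(?:(?P<label>.*?)\s+)?<(?P<uri>[^<>]+)>$", re.DOTALL)


def split_embedded_uri(text):
    """Return (reference_text, uri); uri is None if nothing is embedded."""
    match = EMBEDDED.match(text)
    if not match:
        # No embedded URI: just drop the escaping backslash, if any.
        return text.replace("\\<", "<"), None
    uri = match.group("uri")
    # With the reference text omitted, the URI doubles as the text.
    label = match.group("label") or uri
    return label, uri


if __name__ == "__main__":
    print(split_embedded_uri("Python home page <http://www.python.org>"))
    print(split_embedded_uri("HTML Element: \\<a>"))
    print(split_embedded_uri("<a_named_relative_link>"))
```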
2698
2699.. CAUTION::
2700
2701 This construct offers easy authoring and maintenance of hyperlinks
2702 at the expense of general readability. Inline URIs, especially
2703 long ones, inevitably interrupt the natural flow of text. For
2704 documents meant to be read in source form, the use of independent
2705 block-level `hyperlink targets`_ is **strongly recommended**. The
2706 embedded URI construct is most suited to documents intended *only*
2707 to be read in processed form.
2708
2709
2710Inline Internal Targets
2711------------------------
2712
2713Doctree element: target.
2714
2715Start-string = "_`", end-string = "`".
2716
2717Inline internal targets are the equivalent of explicit `internal
2718hyperlink targets`_, but may appear within running text. The syntax
2719begins with an underscore and a backquote, is followed by a hyperlink
2720name or phrase, and ends with a backquote. Inline internal targets
2721may not be anonymous.
2722
2723For example, the following paragraph contains a hyperlink target named
2724"Norwegian Blue"::
2725
2726 Oh yes, the _`Norwegian Blue`. What's, um, what's wrong with it?
2727
2728See `Implicit Hyperlink Targets`_ for the resolution of duplicate
2729reference names.
2730
2731
2732Footnote References
2733-------------------
2734
2735Doctree element: footnote_reference.
2736
2737Start-string = "[", end-string = "]_".
2738
2739Each footnote reference consists of a square-bracketed label followed
2740by a trailing underscore. Footnote labels are one of:
2741
2742- one or more digits (i.e., a number),
2743
2744- a single "#" (denoting `auto-numbered footnotes`_),
2745
2746- a "#" followed by a simple reference name (an `autonumber label`_),
2747 or
2748
2749- a single "*" (denoting `auto-symbol footnotes`_).
2750
2751For example::
2752
2753 Please RTFM [1]_.
2754
2755 .. [1] Read The Fine Manual
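
The four kinds of footnote label can be told apart with a few checks. This is a sketch under assumptions: ``SIMPLE_NAME`` and ``classify_label`` are hypothetical names, and a simple reference name is taken to be ASCII alphanumerics with internal hyphens, underscores and periods.

```python
import re

# Assumed shape of a simple reference name (single word, no whitespace).
SIMPLE_NAME = r"[A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)*"


def classify_label(label):
    """Return the footnote kind for `label`, or None if it is not one."""
    if re.fullmatch(r"[0-9]+", label):
        return "manually numbered"
    if label == "#":
        return "auto-numbered"
    if re.fullmatch("#" + SIMPLE_NAME, label):
        return "autonumber label"
    if label == "*":
        return "auto-symbol"
    return None


if __name__ == "__main__":
    for sample in ("1", "#", "#note", "*", "CIT2002"):
        print(sample, "->", classify_label(sample))
```

A label such as ``CIT2002`` falls through all four checks; with the same bracket syntax it would be a citation reference instead.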
2756
2757
2758Citation References
2759-------------------
2760
2761Doctree element: citation_reference.
2762
2763Start-string = "[", end-string = "]_".
2764
2765Each citation reference consists of a square-bracketed label followed
2766by a trailing underscore. Citation labels are simple `reference
2767names`_ (case-insensitive single words, consisting of alphanumerics
2768plus internal hyphens, underscores, and periods; no whitespace).
2769
2770For example::
2771
2772 Here is a citation reference: [CIT2002]_.
2773
2774See Citations_ for the citation itself.
2775
2776
2777Substitution References
2778-----------------------
2779
2780Doctree element: substitution_reference, reference.
2781
2782Start-string = "|", end-string = "|" (optionally followed by "_" or
2783"__").
2784
2785Vertical bars are used to bracket the substitution reference text. A
2786substitution reference may also be a hyperlink reference by appending
2787a "_" (named) or "__" (anonymous) suffix; the substitution text is
2788used for the reference text in the named case.
2789
2790The processing system replaces substitution references with the
2791processed contents of the corresponding `substitution definitions`_
2792(which see for the definition of "correspond"). Substitution
2793definitions produce inline-compatible elements.
2794
2795Examples::
2796
2797 This is a simple |substitution reference|. It will be replaced by
2798 the processing system.
2799
2800 This is a combination |substitution and hyperlink reference|_. In
2801 addition to being replaced, the replacement text or element will
2802 refer to the "substitution and hyperlink reference" target.
2803
2804
2805Standalone Hyperlinks
2806---------------------
2807
2808Doctree element: reference.
2809
2810Start-string = end-string = "" (empty string).
2811
2812A URI (absolute URI [#URI]_ or standalone email address) within a text
2813block is treated as a general external hyperlink with the URI itself
2814as the link's text. For example::
2815
2816 See http://www.python.org for info.
2817
2818would be marked up in HTML as::
2819
2820 See <a href="http://www.python.org">http://www.python.org</a> for
2821 info.
2822
2823Two forms of URI are recognized:
2824
28251. Absolute URIs. These consist of a scheme, a colon (":"), and a
2826 scheme-specific part whose interpretation depends on the scheme.
2827
2828 The scheme is the name of the protocol, such as "http", "ftp",
2829 "mailto", or "telnet". The scheme consists of an initial letter,
2830 followed by letters, numbers, and/or "+", "-", ".". Recognition is
2831 limited to known schemes, per the `Official IANA Registry of URI
2832 Schemes`_ and the W3C's `Retired Index of WWW Addressing Schemes`_.
2833
2834 The scheme-specific part of the resource identifier may be either
2835 hierarchical or opaque:
2836
2837 - Hierarchical identifiers begin with one or two slashes and may
2838 use slashes to separate hierarchical components of the path.
2839 Examples are web pages and FTP sites::
2840
2841 http://www.python.org
2842
2843 ftp://ftp.python.org/pub/python
2844
2845 - Opaque identifiers do not begin with slashes. Examples are
2846 email addresses and newsgroups::
2847
2848 mailto:someone@somewhere.com
2849
2850 news:comp.lang.python
2851
2852 With queries, fragments, and %-escape sequences, URIs can become
2853 quite complicated. A reStructuredText parser must be able to
2854 recognize any absolute URI, as defined in RFC2396_ and RFC2732_.
2855
28562. Standalone email addresses, which are treated as if they were
2857 absolute URIs with a "mailto:" scheme. Example::
2858
2859 someone@somewhere.com
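
The two recognized forms can be sketched as a classifier. The names ``SCHEME``, ``EMAIL`` and ``classify_uri`` are hypothetical; the email pattern is deliberately loose, and the sketch does not restrict schemes to the registered ones as a real parser would.

```python
import re

# Scheme: an initial letter, then letters, digits, "+", "-" or ".".
SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.+)$")
# Loose stand-in for an email address (an assumption, not a full grammar).
EMAIL = re.compile(r"^[^@\s:]+@[^@\s:]+$")


def classify_uri(text):
    """Return (scheme, kind) for a recognized URI, else None."""
    match = SCHEME.match(text)
    if match:
        scheme, rest = match.groups()
        # Hierarchical identifiers begin with one or two slashes.
        kind = "hierarchical" if rest.startswith("/") else "opaque"
        return scheme.lower(), kind
    if EMAIL.match(text):
        # Standalone addresses are treated as "mailto:" URIs.
        return "mailto", "opaque"
    return None


if __name__ == "__main__":
    for sample in ("http://www.python.org", "news:comp.lang.python",
                   "someone@somewhere.com"):
        print(sample, "->", classify_uri(sample))
```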
2860
2861Punctuation at the end of a URI is not considered part of the URI,
2862unless the URI is terminated by a closing angle bracket (">").
2863Backslashes may be used in URIs to escape markup characters,
2864specifically asterisks ("*") and underscores ("_") which are valid URI
2865characters (see `Escaping Mechanism`_ above).
2866
2867.. [#URI] Uniform Resource Identifier. URIs are a general form of
2868 URLs (Uniform Resource Locators). For the syntax of URIs see
2869 RFC2396_ and RFC2732_.
2870
2871
2872Units
2873=====
2874
2875(New in Docutils 0.3.10.)
2876
2877All measures consist of a positive floating point number in standard
2878(non-scientific) notation and a unit, possibly separated by one or
2879more spaces.
2880
2881Units are only supported where explicitly mentioned in the reference
2882manuals.
2883
2884
2885Length Units
2886------------
2887
2888The following length units are supported by the reStructuredText
2889parser:
2890
2891* em (ems, the height of the element's font)
2892* ex (x-height, the height of the letter "x")
2893* px (pixels, relative to the canvas resolution)
2894* in (inches; 1in=2.54cm)
2895* cm (centimeters; 1cm=10mm)
2896* mm (millimeters)
2897* pt (points; 1pt=1/72in)
2898* pc (picas; 1pc=12pt)
2899
2900This set corresponds to the `length units in CSS`_.
2901
2902(List and explanations taken from
2903http://www.htmlhelp.com/reference/css/units.html#length.)
2904
2905The following are all valid length values: "1.5em", "20 mm", ".5in".
2906
2907Length values without unit are completed with a writer-dependent
2908default (e.g. px with `html4css1`, pt with `latex2e`). See the
2909writer-specific documentation in the `user doc`__ for details.
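
A minimal sketch of parsing a length value and converting the absolute units to points, using only the factors listed above. The names ``MEASURE``, ``POINTS_PER_UNIT``, ``parse_length`` and ``to_points`` are hypothetical, and the pattern does not reject zero even though measures are described as positive.

```python
import re

# A number in non-scientific notation, optional spaces, optional unit.
MEASURE = re.compile(r"^(\d+\.?\d*|\.\d+)\s*(em|ex|px|in|cm|mm|pt|pc)?$")

# Absolute units in points: 1in = 72pt, 1pc = 12pt, 1in = 2.54cm = 25.4mm.
POINTS_PER_UNIT = {
    "pt": 1.0,
    "pc": 12.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}


def parse_length(value):
    """Return (number, unit); unit is None when the value has no unit."""
    match = MEASURE.match(value.strip())
    if not match:
        raise ValueError("not a length value: %r" % value)
    return float(match.group(1)), match.group(2)


def to_points(value):
    """Convert an absolute length to points; relative units are rejected."""
    number, unit = parse_length(value)
    if unit not in POINTS_PER_UNIT:
        raise ValueError("cannot convert %r without more context" % value)
    return number * POINTS_PER_UNIT[unit]


if __name__ == "__main__":
    for sample in ("1.5em", "20 mm", ".5in"):
        print(sample, "->", parse_length(sample))
    print(to_points("1pc"), to_points(".5in"))
```

Relative units (em, ex, px) and unitless values are left unconverted because their size depends on the font, the canvas or the writer.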
2910
2911.. _length units in CSS:
2912 http://www.w3.org/TR/CSS2/syndata.html#length-units
2913
2914__ ../../user/
2915
2916Percentage Units
2917----------------
2918
2919Percentage values have a percent sign ("%") as unit. Percentage
2920values are relative to other values, depending on the context in which
2921they occur.
2922
2923
2924----------------
2925 Error Handling
2926----------------
2927
2928Doctree element: system_message, problematic.
2929
2930Markup errors are handled according to the specification in `PEP
2931258`_.
2932
2933
2934.. _reStructuredText: http://docutils.sourceforge.net/rst.html
2935.. _Docutils: http://docutils.sourceforge.net/
2936.. _The Docutils Document Tree: ../doctree.html
2937.. _Docutils Generic DTD: ../docutils.dtd
2938.. _transforms:
2939 http://docutils.sourceforge.net/docutils/transforms/
2940.. _Grouch: http://www.mems-exchange.org/software/grouch/
2941.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt
2942.. _DocTitle transform:
2943.. _DocInfo transform:
2944 http://docutils.sourceforge.net/docutils/transforms/frontmatter.py
2945.. _getopt.py:
2946 http://www.python.org/doc/current/lib/module-getopt.html
2947.. _GNU libc getopt_long():
2948 http://www.gnu.org/software/libc/manual/html_node/Getopt-Long-Options.html
2949.. _doctest module:
2950 http://www.python.org/doc/current/lib/module-doctest.html
2951.. _Emacs table mode: http://table.sourceforge.net/
2952.. _Official IANA Registry of URI Schemes:
2953 http://www.iana.org/assignments/uri-schemes
2954.. _Retired Index of WWW Addressing Schemes:
2955 http://www.w3.org/Addressing/schemes.html
2956.. _World Wide Web Consortium: http://www.w3.org/
2957.. _HTML Techniques for Web Content Accessibility Guidelines:
2958 http://www.w3.org/TR/WCAG10-HTML-TECHS/#link-text
2959.. _image: directives.html#image
2960.. _replace: directives.html#replace
2961.. _meta: directives.html#meta
2962.. _figure: directives.html#figure
2963.. _admonition: directives.html#admonitions
2964.. _role: directives.html#custom-interpreted-text-roles
2965.. _reStructuredText Directives: directives.html
2966.. _reStructuredText Interpreted Text Roles: roles.html
2967.. _RFC2396: http://www.rfc-editor.org/rfc/rfc2396.txt
2968.. _RFC2732: http://www.rfc-editor.org/rfc/rfc2732.txt
2969.. _Zope: http://www.zope.com/
2970.. _PEP 258: ../../peps/pep-0258.html
2971
2972
2973
2974..
2975 Local Variables:
2976 mode: indented-text
2977 indent-tabs-mode: nil
2978 sentence-end-double-space: t
2979 fill-column: 70
2980 End:
2981
=== added directory '.pc/support-aliases-in-references.diff/docutils'
=== added directory '.pc/support-aliases-in-references.diff/docutils/parsers'
=== added directory '.pc/support-aliases-in-references.diff/docutils/parsers/rst'
=== added file '.pc/support-aliases-in-references.diff/docutils/parsers/rst/states.py'
--- .pc/support-aliases-in-references.diff/docutils/parsers/rst/states.py 1970-01-01 00:00:00 +0000
+++ .pc/support-aliases-in-references.diff/docutils/parsers/rst/states.py 2013-03-19 07:30:29 +0000
@@ -0,0 +1,3052 @@
1# $Id: states.py 7495 2012-08-16 14:50:57Z milde $
2# Author: David Goodger <goodger@python.org>
3# Copyright: This module has been placed in the public domain.
4
5"""
6This is the ``docutils.parsers.rst.states`` module, the core of
7the reStructuredText parser. It defines the following:
8
9:Classes:
10 - `RSTStateMachine`: reStructuredText parser's entry point.
11 - `NestedStateMachine`: recursive StateMachine.
12 - `RSTState`: reStructuredText State superclass.
13 - `Inliner`: For parsing inline markup.
14 - `Body`: Generic classifier of the first line of a block.
15 - `SpecializedBody`: Superclass for compound element members.
16 - `BulletList`: Second and subsequent bullet_list list_items
17 - `DefinitionList`: Second+ definition_list_items.
18 - `EnumeratedList`: Second+ enumerated_list list_items.
19 - `FieldList`: Second+ fields.
20 - `OptionList`: Second+ option_list_items.
21 - `RFC2822List`: Second+ RFC2822-style fields.
22 - `ExtensionOptions`: Parses directive option fields.
23 - `Explicit`: Second+ explicit markup constructs.
24 - `SubstitutionDef`: For embedded directives in substitution definitions.
25 - `Text`: Classifier of second line of a text block.
26 - `SpecializedText`: Superclass for continuation lines of Text-variants.
27 - `Definition`: Second line of potential definition_list_item.
28 - `Line`: Second line of overlined section title or transition marker.
29 - `Struct`: An auxiliary collection class.
30
31:Exception classes:
32 - `MarkupError`
33 - `ParserError`
34 - `MarkupMismatch`
35
36:Functions:
37 - `escape2null()`: Return a string, escape-backslashes converted to nulls.
38 - `unescape()`: Return a string, nulls removed or restored to backslashes.
39
40:Attributes:
41 - `state_classes`: set of State classes used with `RSTStateMachine`.
42
43Parser Overview
44===============
45
46The reStructuredText parser is implemented as a recursive state machine,
47examining its input one line at a time. To understand how the parser works,
48please first become familiar with the `docutils.statemachine` module. In the
49description below, references are made to classes defined in this module;
50please see the individual classes for details.
51
52Parsing proceeds as follows:
53
541. The state machine examines each line of input, checking each of the
55 transition patterns of the state `Body`, in order, looking for a match.
56 The implicit transitions (blank lines and indentation) are checked before
57 any others. The 'text' transition is a catch-all (matches anything).
58
592. The method associated with the matched transition pattern is called.
60
61 A. Some transition methods are self-contained, appending elements to the
62 document tree (`Body.doctest` parses a doctest block). The parser's
63 current line index is advanced to the end of the element, and parsing
64 continues with step 1.
65
66 B. Other transition methods trigger the creation of a nested state machine,
67 whose job is to parse a compound construct ('indent' does a block quote,
68 'bullet' does a bullet list, 'overline' does a section [first checking
69 for a valid section header], etc.).
70
71 - In the case of lists and explicit markup, a one-off state machine is
72 created and run to parse contents of the first item.
73
74 - A new state machine is created and its initial state is set to the
75 appropriate specialized state (`BulletList` in the case of the
76 'bullet' transition; see `SpecializedBody` for more detail). This
77 state machine is run to parse the compound element (or series of
78 explicit markup elements), and returns as soon as a non-member element
79 is encountered. For example, the `BulletList` state machine ends as
80 soon as it encounters an element which is not a list item of that
81 bullet list. The optional omission of inter-element blank lines is
82 enabled by this nested state machine.
83
84 - The current line index is advanced to the end of the elements parsed,
85 and parsing continues with step 1.
86
87 C. The result of the 'text' transition depends on the next line of text.
88 The current state is changed to `Text`, under which the second line is
89 examined. If the second line is:
90
91 - Indented: The element is a definition list item, and parsing proceeds
92 similarly to step 2.B, using the `DefinitionList` state.
93
94 - A line of uniform punctuation characters: The element is a section
95 header; again, parsing proceeds as in step 2.B, and `Body` is still
96 used.
97
98 - Anything else: The element is a paragraph, which is examined for
99 inline markup and appended to the parent element. Processing
100 continues with step 1.
101"""
102
103__docformat__ = 'reStructuredText'
104
105
106import sys
107import re
108try:
109 import roman
110except ImportError:
111 import docutils.utils.roman as roman
112from types import FunctionType, MethodType
113
114from docutils import nodes, statemachine, utils
115from docutils import ApplicationError, DataError
116from docutils.statemachine import StateMachineWS, StateWS
117from docutils.nodes import fully_normalize_name as normalize_name
118from docutils.nodes import whitespace_normalize_name
119import docutils.parsers.rst
120from docutils.parsers.rst import directives, languages, tableparser, roles
121from docutils.parsers.rst.languages import en as _fallback_language_module
122from docutils.utils import escape2null, unescape, column_width
123from docutils.utils import punctuation_chars, urischemes
124
125class MarkupError(DataError): pass
126class UnknownInterpretedRoleError(DataError): pass
127class InterpretedRoleNotImplementedError(DataError): pass
128class ParserError(ApplicationError): pass
129class MarkupMismatch(Exception): pass
130
131
132class Struct:
133
134 """Stores data attributes for dotted-attribute access."""
135
136 def __init__(self, **keywordargs):
137 self.__dict__.update(keywordargs)
138
139
140class RSTStateMachine(StateMachineWS):
141
142 """
143 reStructuredText's master StateMachine.
144
145 The entry point to reStructuredText parsing is the `run()` method.
146 """
147
148 def run(self, input_lines, document, input_offset=0, match_titles=True,
149 inliner=None):
150 """
151 Parse `input_lines` and modify the `document` node in place.
152
153 Extend `StateMachineWS.run()`: set up parse-global data and
154 run the StateMachine.
155 """
156 self.language = languages.get_language(
157 document.settings.language_code)
158 self.match_titles = match_titles
159 if inliner is None:
160 inliner = Inliner()
161 inliner.init_customizations(document.settings)
162 self.memo = Struct(document=document,
163 reporter=document.reporter,
164 language=self.language,
165 title_styles=[],
166 section_level=0,
167 section_bubble_up_kludge=False,
168 inliner=inliner)
169 self.document = document
170 self.attach_observer(document.note_source)
171 self.reporter = self.memo.reporter
172 self.node = document
173 results = StateMachineWS.run(self, input_lines, input_offset,
174 input_source=document['source'])
175 assert results == [], 'RSTStateMachine.run() results should be empty!'
176 self.node = self.memo = None # remove unneeded references
177
178
179class NestedStateMachine(StateMachineWS):
180
181 """
182 StateMachine run from within other StateMachine runs, to parse nested
183 document structures.
184 """
185
186 def run(self, input_lines, input_offset, memo, node, match_titles=True):
187 """
188 Parse `input_lines` and populate a `docutils.nodes.document` instance.
189
190 Extend `StateMachineWS.run()`: set up document-wide data.
191 """
192 self.match_titles = match_titles
193 self.memo = memo
194 self.document = memo.document
195 self.attach_observer(self.document.note_source)
196 self.reporter = memo.reporter
197 self.language = memo.language
198 self.node = node
199 results = StateMachineWS.run(self, input_lines, input_offset)
200 assert results == [], ('NestedStateMachine.run() results should be '
201 'empty!')
202 return results
203
204
205class RSTState(StateWS):
206
207 """
208 reStructuredText State superclass.
209
210 Contains methods used by all State subclasses.
211 """
212
213 nested_sm = NestedStateMachine
214 nested_sm_cache = []
215
216 def __init__(self, state_machine, debug=False):
217 self.nested_sm_kwargs = {'state_classes': state_classes,
218 'initial_state': 'Body'}
219 StateWS.__init__(self, state_machine, debug)
220
221 def runtime_init(self):
222 StateWS.runtime_init(self)
223 memo = self.state_machine.memo
224 self.memo = memo
225 self.reporter = memo.reporter
226 self.inliner = memo.inliner
227 self.document = memo.document
228 self.parent = self.state_machine.node
229 # enable the reporter to determine source and source-line
230 if not hasattr(self.reporter, 'get_source_and_line'):
231 self.reporter.get_source_and_line = self.state_machine.get_source_and_line
232 # print "adding get_source_and_line to reporter", self.state_machine.input_offset
233
234
235 def goto_line(self, abs_line_offset):
236 """
237 Jump to input line `abs_line_offset`, ignoring jumps past the end.
238 """
239 try:
240 self.state_machine.goto_line(abs_line_offset)
241 except EOFError:
242 pass
243
244 def no_match(self, context, transitions):
245 """
246 Override `StateWS.no_match` to generate a system message.
247
248 This code should never be run.
249 """
250 self.reporter.severe(
251 'Internal error: no transition pattern match. State: "%s"; '
252 'transitions: %s; context: %s; current line: %r.'
253 % (self.__class__.__name__, transitions, context,
254 self.state_machine.line))
255 return context, None, []
256
257 def bof(self, context):
258 """Called at beginning of file."""
259 return [], []
260
261 def nested_parse(self, block, input_offset, node, match_titles=False,
262 state_machine_class=None, state_machine_kwargs=None):
263 """
264 Create a new StateMachine rooted at `node` and run it over the input
265 `block`.
266 """
267 use_default = 0
268 if state_machine_class is None:
269 state_machine_class = self.nested_sm
270 use_default += 1
271 if state_machine_kwargs is None:
272 state_machine_kwargs = self.nested_sm_kwargs
273 use_default += 1
274 block_length = len(block)
275
276 state_machine = None
277 if use_default == 2:
278 try:
279 state_machine = self.nested_sm_cache.pop()
280 except IndexError:
281 pass
282 if not state_machine:
283 state_machine = state_machine_class(debug=self.debug,
284 **state_machine_kwargs)
285 state_machine.run(block, input_offset, memo=self.memo,
286 node=node, match_titles=match_titles)
287 if use_default == 2:
288 self.nested_sm_cache.append(state_machine)
289 else:
290 state_machine.unlink()
291 new_offset = state_machine.abs_line_offset()
292 # No `block.parent` implies disconnected -- lines aren't in sync:
293 if block.parent and (len(block) - block_length) != 0:
294 # Adjustment for block if modified in nested parse:
295 self.state_machine.next_line(len(block) - block_length)
296 return new_offset
297
298 def nested_list_parse(self, block, input_offset, node, initial_state,
299 blank_finish,
300 blank_finish_state=None,
301 extra_settings={},
302 match_titles=False,
303 state_machine_class=None,
304 state_machine_kwargs=None):
305 """
306 Create a new StateMachine rooted at `node` and run it over the input
307 `block`. Also keep track of optional intermediate blank lines and the
308 required final one.
309 """
310 if state_machine_class is None:
311 state_machine_class = self.nested_sm
312 if state_machine_kwargs is None:
313 state_machine_kwargs = self.nested_sm_kwargs.copy()
314 state_machine_kwargs['initial_state'] = initial_state
315 state_machine = state_machine_class(debug=self.debug,
316 **state_machine_kwargs)
317 if blank_finish_state is None:
318 blank_finish_state = initial_state
319 state_machine.states[blank_finish_state].blank_finish = blank_finish
320 for key, value in extra_settings.items():
321 setattr(state_machine.states[initial_state], key, value)
322 state_machine.run(block, input_offset, memo=self.memo,
323 node=node, match_titles=match_titles)
324 blank_finish = state_machine.states[blank_finish_state].blank_finish
325 state_machine.unlink()
326 return state_machine.abs_line_offset(), blank_finish
327
328 def section(self, title, source, style, lineno, messages):
329 """Check for a valid subsection and create one if it checks out."""
330 if self.check_subsection(source, style, lineno):
331 self.new_subsection(title, lineno, messages)
332
333 def check_subsection(self, source, style, lineno):
334 """
335 Check for a valid subsection header. Return 1 (true) or None (false).
336
337 When a new section is reached that isn't a subsection of the current
338 section, back up the line count (use ``previous_line(-x)``), then
339 ``raise EOFError``. The current StateMachine will finish, then the
340 calling StateMachine can re-examine the title. This will work its way
341 back up the calling chain until the correct section level is reached.
342
343 @@@ Alternative: Evaluate the title, store the title info & level, and
344 back up the chain until that level is reached. Store in memo? Or
345 return in results?
346
347 :Exception: `EOFError` when a sibling or supersection encountered.
348 """
349 memo = self.memo
350 title_styles = memo.title_styles
351 mylevel = memo.section_level
352 try: # check for existing title style
353 level = title_styles.index(style) + 1
354 except ValueError: # new title style
355 if len(title_styles) == memo.section_level: # new subsection
356 title_styles.append(style)
357 return 1
358 else: # not at lowest level
359 self.parent += self.title_inconsistent(source, lineno)
360 return None
361 if level <= mylevel: # sibling or supersection
362 memo.section_level = level # bubble up to parent section
363 if len(style) == 2:
364 memo.section_bubble_up_kludge = True
365 # back up 2 lines for underline title, 3 for overline title
366 self.state_machine.previous_line(len(style) + 1)
367 raise EOFError # let parent section re-evaluate
368 if level == mylevel + 1: # immediate subsection
369 return 1
370 else: # invalid subsection
371 self.parent += self.title_inconsistent(source, lineno)
372 return None
373
374 def title_inconsistent(self, sourcetext, lineno):
375 error = self.reporter.severe(
376 'Title level inconsistent:', nodes.literal_block('', sourcetext),
377 line=lineno)
378 return error
379
380 def new_subsection(self, title, lineno, messages):
381 """Append new subsection to document tree. On return, check level."""
382 memo = self.memo
383 mylevel = memo.section_level
384 memo.section_level += 1
385 section_node = nodes.section()
386 self.parent += section_node
387 textnodes, title_messages = self.inline_text(title, lineno)
388 titlenode = nodes.title(title, '', *textnodes)
389 name = normalize_name(titlenode.astext())
390 section_node['names'].append(name)
391 section_node += titlenode
392 section_node += messages
393 section_node += title_messages
394 self.document.note_implicit_target(section_node, section_node)
395 offset = self.state_machine.line_offset + 1
396 absoffset = self.state_machine.abs_line_offset() + 1
397 newabsoffset = self.nested_parse(
398 self.state_machine.input_lines[offset:], input_offset=absoffset,
399 node=section_node, match_titles=True)
400 self.goto_line(newabsoffset)
401 if memo.section_level <= mylevel: # can't handle next section?
402 raise EOFError # bubble up to supersection
403 # reset section_level; next pass will detect it properly
404 memo.section_level = mylevel
405
406 def paragraph(self, lines, lineno):
407 """
408 Return a list (paragraph & messages) & a boolean: literal_block next?
409 """
410 data = '\n'.join(lines).rstrip()
411 if re.search(r'(?<!\\)(\\\\)*::$', data):
412 if len(data) == 2:
413 return [], 1
414 elif data[-3] in ' \n':
415 text = data[:-3].rstrip()
416 else:
417 text = data[:-1]
418 literalnext = 1
419 else:
420 text = data
421 literalnext = 0
422 textnodes, messages = self.inline_text(text, lineno)
423 p = nodes.paragraph(data, '', *textnodes)
424 p.source, p.line = self.state_machine.get_source_and_line(lineno)
425 return [p] + messages, literalnext
426
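The trailing `::` handling in `paragraph()` above can be tried in isolation. The following is a minimal sketch, not the docutils implementation: `split_literal_marker` is a hypothetical helper name, and it returns plain text instead of paragraph nodes. It reuses the same regular expression and slicing rules.

```python
import re


def split_literal_marker(data):
    """Return (text, literal_next) for a paragraph's raw text.

    Hypothetical helper that mirrors the branch structure of
    ``paragraph()``; it returns a string instead of docutils nodes.
    """
    data = data.rstrip()
    # "::" at the end counts only if it is not backslash-escaped.
    if re.search(r'(?<!\\)(\\\\)*::$', data):
        if len(data) == 2:
            # The paragraph is just "::": no text, literal block follows.
            return '', True
        elif data[-3] in ' \n':
            # "text ::" drops the marker and the whitespace before it.
            return data[:-3].rstrip(), True
        else:
            # "text::" keeps a single colon.
            return data[:-1], True
    return data, False
```

Under these assumptions, `split_literal_marker('Example::')` keeps one colon, while `split_literal_marker('Example ::')` drops the marker entirely.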
427 def inline_text(self, text, lineno):
428 """
429 Return 2 lists: nodes (text and inline elements), and system_messages.
430 """
431 return self.inliner.parse(text, lineno, self.memo, self.parent)
432
433 def unindent_warning(self, node_name):
434 # the actual problem is one line below the current line
435 lineno = self.state_machine.abs_line_number()+1
436 return self.reporter.warning('%s ends without a blank line; '
437 'unexpected unindent.' % node_name,
438 line=lineno)
439
440
441def build_regexp(definition, compile=True):
442 """
443 Build, compile and return a regular expression based on `definition`.
444
445 :Parameter: `definition`: a 4-tuple (group name, prefix, suffix, parts),
446 where "parts" is a list of regular expressions and/or regular
447 expression definitions to be joined into an or-group.
448 """
449 name, prefix, suffix, parts = definition
450 part_strings = []
451 for part in parts:
452 if type(part) is tuple:
453 part_strings.append(build_regexp(part, None))
454 else:
455 part_strings.append(part)
456 or_group = '|'.join(part_strings)
457 regexp = '%(prefix)s(?P<%(name)s>%(or_group)s)%(suffix)s' % locals()
458 if compile:
459 return re.compile(regexp, re.UNICODE)
460 else:
461 return regexp
462
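To illustrate how `build_regexp()` turns a nested definition tuple into one pattern with named groups, here is a standalone sketch. The function body follows the code above but uses explicit formatting arguments instead of `locals()`. The `demo` definition and its group names (`marker`, `animal`) are invented for illustration and do not appear in docutils.

```python
import re


def build_regexp(definition, compile=True):
    """Build a regexp from a (name, prefix, suffix, parts) 4-tuple."""
    name, prefix, suffix, parts = definition
    part_strings = []
    for part in parts:
        if isinstance(part, tuple):
            # Nested definitions are expanded to strings, not compiled.
            part_strings.append(build_regexp(part, compile=False))
        else:
            part_strings.append(part)
    or_group = '|'.join(part_strings)
    regexp = '%s(?P<%s>%s)%s' % (prefix, name, or_group, suffix)
    if compile:
        return re.compile(regexp, re.UNICODE)
    return regexp


# Hypothetical definition: a plain alternative plus one nested group.
demo = ('marker', r'\b', r'\b',
        ['cat', ('animal', '', 's?', ['dog', 'bird'])])
pattern = build_regexp(demo)
match = pattern.search('two birds')
```

Here `match.group('marker')` covers the whole alternative and `match.group('animal')` covers only the nested part, which is the same mechanism `Inliner.parse()` relies on when it looks up `groups['start']`, `groups['backquote']` and so on.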
463
464class Inliner:
465
466 """
467 Parse inline markup; call the `parse()` method.
468 """
469
470 def __init__(self):
471 self.implicit_dispatch = [(self.patterns.uri, self.standalone_uri),]
472 """List of (pattern, bound method) tuples, used by
473 `self.implicit_inline`."""
474
475 def init_customizations(self, settings):
476 """Setting-based customizations; run when parsing begins."""
477 if settings.pep_references:
478 self.implicit_dispatch.append((self.patterns.pep,
479 self.pep_reference))
480 if settings.rfc_references:
481 self.implicit_dispatch.append((self.patterns.rfc,
482 self.rfc_reference))
483
484 def parse(self, text, lineno, memo, parent):
485 # Needs to be refactored for nested inline markup.
486 # Add nested_parse() method?
487 """
488 Return 2 lists: nodes (text and inline elements), and system_messages.
489
490 Using `self.patterns.initial`, a pattern which matches start-strings
491 (emphasis, strong, interpreted, phrase reference, literal,
492 substitution reference, and inline target) and complete constructs
493 (simple reference, footnote reference), search for a candidate. When
494 one is found, check for validity (e.g., not a quoted '*' character).
495 If valid, search for the corresponding end string if applicable, and
496 check it for validity. If not found or invalid, generate a warning
497 and ignore the start-string. Implicit inline markup (e.g. standalone
498 URIs) is found last.
499 """
500 self.reporter = memo.reporter
501 self.document = memo.document
502 self.language = memo.language
503 self.parent = parent
504 pattern_search = self.patterns.initial.search
505 dispatch = self.dispatch
506 remaining = escape2null(text)
507 processed = []
508 unprocessed = []
509 messages = []
510 while remaining:
511 match = pattern_search(remaining)
512 if match:
513 groups = match.groupdict()
514 method = dispatch[groups['start'] or groups['backquote']
515 or groups['refend'] or groups['fnend']]
516 before, inlines, remaining, sysmessages = method(self, match,
517 lineno)
518 unprocessed.append(before)
519 messages += sysmessages
520 if inlines:
521 processed += self.implicit_inline(''.join(unprocessed),
522 lineno)
523 processed += inlines
524 unprocessed = []
525 else:
526 break
527 remaining = ''.join(unprocessed) + remaining
528 if remaining:
529 processed += self.implicit_inline(remaining, lineno)
530 return processed, messages
531
532 # Inline object recognition
533 # -------------------------
534 # lookahead and look-behind expressions for inline markup rules
535 start_string_prefix = (u'(^|(?<=\\s|[%s%s]))' %
536 (punctuation_chars.openers,
537 punctuation_chars.delimiters))
538 end_string_suffix = (u'($|(?=\\s|[\x00%s%s%s]))' %
539 (punctuation_chars.closing_delimiters,
540 punctuation_chars.delimiters,
541 punctuation_chars.closers))
542 # print start_string_prefix.encode('utf8')
543 # TODO: support non-ASCII whitespace in the following 4 patterns?
544 non_whitespace_before = r'(?<![ \n])'
545 non_whitespace_escape_before = r'(?<![ \n\x00])'
546 non_unescaped_whitespace_escape_before = r'(?<!(?<!\x00)[ \n\x00])'
547 non_whitespace_after = r'(?![ \n])'
548 # Alphanumerics with isolated internal [-._+:] chars (i.e. not 2 together):
549 simplename = r'(?:(?!_)\w)+(?:[-._+:](?:(?!_)\w)+)*'
550 # Valid URI characters (see RFC 2396 & RFC 2732);
551 # final \x00 allows backslash escapes in URIs:
552 uric = r"""[-_.!~*'()[\];/:@&=+$,%a-zA-Z0-9\x00]"""
553 # Delimiter indicating the end of a URI (not part of the URI):
554 uri_end_delim = r"""[>]"""
555 # Last URI character; same as uric but no punctuation:
556 urilast = r"""[_~*/=+a-zA-Z0-9]"""
557 # End of a URI (either 'urilast' or 'uric followed by a
558 # uri_end_delim'):
559 uri_end = r"""(?:%(urilast)s|%(uric)s(?=%(uri_end_delim)s))""" % locals()
560 emailc = r"""[-_!~*'{|}/#?^`&=+$%a-zA-Z0-9\x00]"""
561 email_pattern = r"""
562 %(emailc)s+(?:\.%(emailc)s+)* # name
563 (?<!\x00)@ # at
564 %(emailc)s+(?:\.%(emailc)s*)* # host
565 %(uri_end)s # final URI char
566 """
567 parts = ('initial_inline', start_string_prefix, '',
568 [('start', '', non_whitespace_after, # simple start-strings
569 [r'\*\*', # strong
570 r'\*(?!\*)', # emphasis but not strong
571 r'``', # literal
572 r'_`', # inline internal target
573 r'\|(?!\|)'] # substitution reference
574 ),
575 ('whole', '', end_string_suffix, # whole constructs
576 [# reference name & end-string
577 r'(?P<refname>%s)(?P<refend>__?)' % simplename,
578 ('footnotelabel', r'\[', r'(?P<fnend>\]_)',
579 [r'[0-9]+', # manually numbered
580 r'\#(%s)?' % simplename, # auto-numbered (w/ label?)
581 r'\*', # auto-symbol
582 r'(?P<citationlabel>%s)' % simplename] # citation reference
583 )
584 ]
585 ),
586 ('backquote', # interpreted text or phrase reference
587 '(?P<role>(:%s:)?)' % simplename, # optional role
588 non_whitespace_after,
589 ['`(?!`)'] # but not literal
590 )
591 ]
592 )
593 patterns = Struct(
594 initial=build_regexp(parts),
595 emphasis=re.compile(non_whitespace_escape_before
596 + r'(\*)' + end_string_suffix, re.UNICODE),
597 strong=re.compile(non_whitespace_escape_before
598 + r'(\*\*)' + end_string_suffix, re.UNICODE),
599 interpreted_or_phrase_ref=re.compile(
600 r"""
601 %(non_unescaped_whitespace_escape_before)s
602 (
603 `
604 (?P<suffix>
605 (?P<role>:%(simplename)s:)?
606 (?P<refend>__?)?
607 )
608 )
609 %(end_string_suffix)s
610 """ % locals(), re.VERBOSE | re.UNICODE),
611 embedded_uri=re.compile(
612 r"""
613 (
614 (?:[ \n]+|^) # spaces or beginning of line/string
615 < # open bracket
616 %(non_whitespace_after)s
617 ([^<>\x00]+) # anything but angle brackets & nulls
618 %(non_whitespace_before)s
619 > # close bracket w/o whitespace before
620 )
621 $ # end of string
622 """ % locals(), re.VERBOSE | re.UNICODE),
623 literal=re.compile(non_whitespace_before + '(``)'
624 + end_string_suffix),
625 target=re.compile(non_whitespace_escape_before
626 + r'(`)' + end_string_suffix),
627 substitution_ref=re.compile(non_whitespace_escape_before
628 + r'(\|_{0,2})'
629 + end_string_suffix),
630 email=re.compile(email_pattern % locals() + '$',
631 re.VERBOSE | re.UNICODE),
632 uri=re.compile(
633 (r"""
634 %(start_string_prefix)s
635 (?P<whole>
636 (?P<absolute> # absolute URI
637 (?P<scheme> # scheme (http, ftp, mailto)
638 [a-zA-Z][a-zA-Z0-9.+-]*
639 )
640 :
641 (
642 ( # either:
643 (//?)? # hierarchical URI
644 %(uric)s* # URI characters
645 %(uri_end)s # final URI char
646 )
647 ( # optional query
648 \?%(uric)s*
649 %(uri_end)s
650 )?
651 ( # optional fragment
652 \#%(uric)s*
653 %(uri_end)s
654 )?
655 )
656 )
657 | # *OR*
658 (?P<email> # email address
659 """ + email_pattern + r"""
660 )
661 )
662 %(end_string_suffix)s
663 """) % locals(), re.VERBOSE | re.UNICODE),
664 pep=re.compile(
665 r"""
666 %(start_string_prefix)s
667 (
668 (pep-(?P<pepnum1>\d+)(.txt)?) # reference to source file
669 |
670 (PEP\s+(?P<pepnum2>\d+)) # reference by name
671 )
672 %(end_string_suffix)s""" % locals(), re.VERBOSE | re.UNICODE),
673 rfc=re.compile(
674 r"""
675 %(start_string_prefix)s
676 (RFC(-|\s+)?(?P<rfcnum>\d+))
677 %(end_string_suffix)s""" % locals(), re.VERBOSE | re.UNICODE))
678
679 def quoted_start(self, match):
680 """Test if inline markup start-string is 'quoted'.
681
682 'Quoted' in this context means the start-string is enclosed in a pair
683 of matching opening/closing delimiters (not necessarily quotes)
684 or at the end of the match.
685 """
686 string = match.string
687 start = match.start()
688 if start == 0: # start-string at beginning of text
689 return False
690 prestart = string[start - 1]
691 try:
692 poststart = string[match.end()]
693 except IndexError: # start-string at end of text
694 return True # not "quoted" but no markup start-string either
695 return punctuation_chars.match_chars(prestart, poststart)
696
697 def inline_obj(self, match, lineno, end_pattern, nodeclass,
698 restore_backslashes=False):
699 string = match.string
700 matchstart = match.start('start')
701 matchend = match.end('start')
702 if self.quoted_start(match):
703 return (string[:matchend], [], string[matchend:], [], '')
704 endmatch = end_pattern.search(string[matchend:])
705 if endmatch and endmatch.start(1): # 1 or more chars
706 text = unescape(endmatch.string[:endmatch.start(1)],
707 restore_backslashes)
708 textend = matchend + endmatch.end(1)
709 rawsource = unescape(string[matchstart:textend], 1)
710 return (string[:matchstart], [nodeclass(rawsource, text)],
711 string[textend:], [], endmatch.group(1))
712 msg = self.reporter.warning(
713 'Inline %s start-string without end-string.'
714 % nodeclass.__name__, line=lineno)
715 text = unescape(string[matchstart:matchend], 1)
716 rawsource = unescape(string[matchstart:matchend], 1)
717 prb = self.problematic(text, rawsource, msg)
718 return string[:matchstart], [prb], string[matchend:], [msg], ''
719
720 def problematic(self, text, rawsource, message):
721 msgid = self.document.set_id(message, self.parent)
722 problematic = nodes.problematic(rawsource, text, refid=msgid)
723 prbid = self.document.set_id(problematic)
724 message.add_backref(prbid)
725 return problematic
726
727 def emphasis(self, match, lineno):
728 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
729 match, lineno, self.patterns.emphasis, nodes.emphasis)
730 return before, inlines, remaining, sysmessages
731
732 def strong(self, match, lineno):
733 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
734 match, lineno, self.patterns.strong, nodes.strong)
735 return before, inlines, remaining, sysmessages
736
737 def interpreted_or_phrase_ref(self, match, lineno):
738 end_pattern = self.patterns.interpreted_or_phrase_ref
739 string = match.string
740 matchstart = match.start('backquote')
741 matchend = match.end('backquote')
742 rolestart = match.start('role')
743 role = match.group('role')
744 position = ''
745 if role:
746 role = role[1:-1]
747 position = 'prefix'
748 elif self.quoted_start(match):
749 return (string[:matchend], [], string[matchend:], [])
750 endmatch = end_pattern.search(string[matchend:])
751 if endmatch and endmatch.start(1): # 1 or more chars
752 textend = matchend + endmatch.end()
753 if endmatch.group('role'):
754 if role:
755 msg = self.reporter.warning(
756 'Multiple roles in interpreted text (both '
757 'prefix and suffix present; only one allowed).',
758 line=lineno)
759 text = unescape(string[rolestart:textend], 1)
760 prb = self.problematic(text, text, msg)
761 return string[:rolestart], [prb], string[textend:], [msg]
762 role = endmatch.group('suffix')[1:-1]
763 position = 'suffix'
764 escaped = endmatch.string[:endmatch.start(1)]
765 rawsource = unescape(string[matchstart:textend], 1)
766 if rawsource[-1:] == '_':
767 if role:
768 msg = self.reporter.warning(
769 'Mismatch: both interpreted text role %s and '
770 'reference suffix.' % position, line=lineno)
771 text = unescape(string[rolestart:textend], 1)
772 prb = self.problematic(text, text, msg)
773 return string[:rolestart], [prb], string[textend:], [msg]
774 return self.phrase_ref(string[:matchstart], string[textend:],
775 rawsource, escaped, unescape(escaped))
776 else:
777 rawsource = unescape(string[rolestart:textend], 1)
778 nodelist, messages = self.interpreted(rawsource, escaped, role,
779 lineno)
780 return (string[:rolestart], nodelist,
781 string[textend:], messages)
782 msg = self.reporter.warning(
783 'Inline interpreted text or phrase reference start-string '
784 'without end-string.', line=lineno)
785 text = unescape(string[matchstart:matchend], 1)
786 prb = self.problematic(text, text, msg)
787 return string[:matchstart], [prb], string[matchend:], [msg]
788
789 def phrase_ref(self, before, after, rawsource, escaped, text):
790 match = self.patterns.embedded_uri.search(escaped)
791 if match:
792 text = unescape(escaped[:match.start(0)])
793 uri_text = match.group(2)
794 uri = ''.join(uri_text.split())
795 uri = self.adjust_uri(uri)
796 if uri:
797 target = nodes.target(match.group(1), refuri=uri)
798 target.referenced = 1
799 else:
800 raise ApplicationError('problem with URI: %r' % uri_text)
801 if not text:
802 text = uri
803 else:
804 target = None
805 refname = normalize_name(text)
806 reference = nodes.reference(rawsource, text,
807 name=whitespace_normalize_name(text))
808 node_list = [reference]
809 if rawsource[-2:] == '__':
810 if target:
811 reference['refuri'] = uri
812 else:
813 reference['anonymous'] = 1
814 else:
815 if target:
816 reference['refuri'] = uri
817 target['names'].append(refname)
818 self.document.note_explicit_target(target, self.parent)
819 node_list.append(target)
820 else:
821 reference['refname'] = refname
822 self.document.note_refname(reference)
823 return before, node_list, after, []
824
825 def adjust_uri(self, uri):
826 match = self.patterns.email.match(uri)
827 if match:
828 return 'mailto:' + uri
829 else:
830 return uri
831
832 def interpreted(self, rawsource, text, role, lineno):
833 role_fn, messages = roles.role(role, self.language, lineno,
834 self.reporter)
835 if role_fn:
836 nodes, messages2 = role_fn(role, rawsource, text, lineno, self)
837 return nodes, messages + messages2
838 else:
839 msg = self.reporter.error(
840 'Unknown interpreted text role "%s".' % role,
841 line=lineno)
842 return ([self.problematic(rawsource, rawsource, msg)],
843 messages + [msg])
844
845 def literal(self, match, lineno):
846 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
847 match, lineno, self.patterns.literal, nodes.literal,
848 restore_backslashes=True)
849 return before, inlines, remaining, sysmessages
850
851 def inline_internal_target(self, match, lineno):
852 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
853 match, lineno, self.patterns.target, nodes.target)
854 if inlines and isinstance(inlines[0], nodes.target):
855 assert len(inlines) == 1
856 target = inlines[0]
857 name = normalize_name(target.astext())
858 target['names'].append(name)
859 self.document.note_explicit_target(target, self.parent)
860 return before, inlines, remaining, sysmessages
861
862 def substitution_reference(self, match, lineno):
863 before, inlines, remaining, sysmessages, endstring = self.inline_obj(
864 match, lineno, self.patterns.substitution_ref,
865 nodes.substitution_reference)
866 if len(inlines) == 1:
867 subref_node = inlines[0]
868 if isinstance(subref_node, nodes.substitution_reference):
869 subref_text = subref_node.astext()
870 self.document.note_substitution_ref(subref_node, subref_text)
871 if endstring[-1:] == '_':
872 reference_node = nodes.reference(
873 '|%s%s' % (subref_text, endstring), '')
874 if endstring[-2:] == '__':
875 reference_node['anonymous'] = 1
876 else:
877 reference_node['refname'] = normalize_name(subref_text)
878 self.document.note_refname(reference_node)
879 reference_node += subref_node
880 inlines = [reference_node]
881 return before, inlines, remaining, sysmessages
882
883 def footnote_reference(self, match, lineno):
884 """
885 Handles `nodes.footnote_reference` and `nodes.citation_reference`
886 elements.
887 """
888 label = match.group('footnotelabel')
889 refname = normalize_name(label)
890 string = match.string
891 before = string[:match.start('whole')]
892 remaining = string[match.end('whole'):]
893 if match.group('citationlabel'):
894 refnode = nodes.citation_reference('[%s]_' % label,
895 refname=refname)
896 refnode += nodes.Text(label)
897 self.document.note_citation_ref(refnode)
898 else:
899 refnode = nodes.footnote_reference('[%s]_' % label)
900 if refname[0] == '#':
901 refname = refname[1:]
902 refnode['auto'] = 1
903 self.document.note_autofootnote_ref(refnode)
904 elif refname == '*':
905 refname = ''
906 refnode['auto'] = '*'
907 self.document.note_symbol_footnote_ref(
908 refnode)
909 else:
910 refnode += nodes.Text(label)
911 if refname:
912 refnode['refname'] = refname
913 self.document.note_footnote_ref(refnode)
914 if utils.get_trim_footnote_ref_space(self.document.settings):
915 before = before.rstrip()
916 return (before, [refnode], remaining, [])
917
918 def reference(self, match, lineno, anonymous=False):
919 referencename = match.group('refname')
920 refname = normalize_name(referencename)
921 referencenode = nodes.reference(
922 referencename + match.group('refend'), referencename,
923 name=whitespace_normalize_name(referencename))
924 if anonymous:
925 referencenode['anonymous'] = 1
926 else:
927 referencenode['refname'] = refname
928 self.document.note_refname(referencenode)
929 string = match.string
930 matchstart = match.start('whole')
931 matchend = match.end('whole')
932 return (string[:matchstart], [referencenode], string[matchend:], [])
933
934 def anonymous_reference(self, match, lineno):
935 return self.reference(match, lineno, anonymous=1)
936
937 def standalone_uri(self, match, lineno):
938 if (not match.group('scheme')
939 or match.group('scheme').lower() in urischemes.schemes):
940 if match.group('email'):
941 addscheme = 'mailto:'
942 else:
943 addscheme = ''
944 text = match.group('whole')
945 unescaped = unescape(text, 0)
946 return [nodes.reference(unescape(text, 1), unescaped,
947 refuri=addscheme + unescaped)]
948 else: # not a valid scheme
949 raise MarkupMismatch
950
951 def pep_reference(self, match, lineno):
952 text = match.group(0)
953 if text.startswith('pep-'):
954 pepnum = int(match.group('pepnum1'))
955 elif text.startswith('PEP'):
956 pepnum = int(match.group('pepnum2'))
957 else:
958 raise MarkupMismatch
959 ref = (self.document.settings.pep_base_url
960 + self.document.settings.pep_file_url_template % pepnum)
961 unescaped = unescape(text, 0)
962 return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
963
964 rfc_url = 'rfc%d.html'
965
966 def rfc_reference(self, match, lineno):
967 text = match.group(0)
968 if text.startswith('RFC'):
969 rfcnum = int(match.group('rfcnum'))
970 ref = self.document.settings.rfc_base_url + self.rfc_url % rfcnum
971 else:
972 raise MarkupMismatch
973 unescaped = unescape(text, 0)
974 return [nodes.reference(unescape(text, 1), unescaped, refuri=ref)]
975
976 def implicit_inline(self, text, lineno):
977 """
978 Check each of the patterns in `self.implicit_dispatch` for a match,
979 and dispatch to the stored method for the pattern. Recursively check
980 the text before and after the match. Return a list of `nodes.Text`
981 and inline element nodes.
982 """
983 if not text:
984 return []
985 for pattern, method in self.implicit_dispatch:
986 match = pattern.search(text)
987 if match:
988 try:
989 # Must recurse on strings before *and* after the match;
990 # there may be multiple patterns.
991 return (self.implicit_inline(text[:match.start()], lineno)
992 + method(match, lineno) +
993 self.implicit_inline(text[match.end():], lineno))
994 except MarkupMismatch:
995 pass
996 return [nodes.Text(unescape(text), rawsource=unescape(text, 1))]
997
998 dispatch = {'*': emphasis,
999 '**': strong,
1000 '`': interpreted_or_phrase_ref,
1001 '``': literal,
1002 '_`': inline_internal_target,
1003 ']_': footnote_reference,
1004 '|': substitution_reference,
1005 '_': reference,
1006 '__': anonymous_reference}
1007
1008
1009def _loweralpha_to_int(s, _zero=(ord('a')-1)):
1010 return ord(s) - _zero
1011
1012def _upperalpha_to_int(s, _zero=(ord('A')-1)):
1013 return ord(s) - _zero
1014
1015def _lowerroman_to_int(s):
1016 return roman.fromRoman(s.upper())
1017
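The alphabetic converters above map a single enumerator letter to its ordinal by offsetting from the character just before `a` or `A`. A minimal standalone sketch follows; the names are copies without the leading underscore, and the roman-numeral converter is left out because it depends on the separate `roman` module.

```python
def loweralpha_to_int(s, _zero=(ord('a') - 1)):
    # 'a' -> 1, 'b' -> 2, ..., 'z' -> 26
    return ord(s) - _zero


def upperalpha_to_int(s, _zero=(ord('A') - 1)):
    # 'A' -> 1, 'B' -> 2, ..., 'Z' -> 26
    return ord(s) - _zero


ordinals = [loweralpha_to_int(c) for c in 'abc']
```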
1018
1019class Body(RSTState):
1020
1021 """
1022 Generic classifier of the first line of a block.
1023 """
1024
1025 double_width_pad_char = tableparser.TableParser.double_width_pad_char
1026 """Padding character for East Asian double-width text."""
1027
1028 enum = Struct()
1029 """Enumerated list parsing information."""
1030
1031 enum.formatinfo = {
1032 'parens': Struct(prefix='(', suffix=')', start=1, end=-1),
1033 'rparen': Struct(prefix='', suffix=')', start=0, end=-1),
1034 'period': Struct(prefix='', suffix='.', start=0, end=-1)}
1035 enum.formats = enum.formatinfo.keys()
1036 enum.sequences = ['arabic', 'loweralpha', 'upperalpha',
1037 'lowerroman', 'upperroman'] # ORDERED!
1038 enum.sequencepats = {'arabic': '[0-9]+',
1039 'loweralpha': '[a-z]',
1040 'upperalpha': '[A-Z]',
1041 'lowerroman': '[ivxlcdm]+',
1042 'upperroman': '[IVXLCDM]+',}
1043 enum.converters = {'arabic': int,
1044 'loweralpha': _loweralpha_to_int,
1045 'upperalpha': _upperalpha_to_int,
1046 'lowerroman': _lowerroman_to_int,
1047 'upperroman': roman.fromRoman}
1048
1049 enum.sequenceregexps = {}
1050 for sequence in enum.sequences:
1051 enum.sequenceregexps[sequence] = re.compile(
1052 enum.sequencepats[sequence] + '$', re.UNICODE)
1053
1054 grid_table_top_pat = re.compile(r'\+-[-+]+-\+ *$')
1055 """Matches the top (& bottom) of a full table."""
1056
1057 simple_table_top_pat = re.compile('=+( +=+)+ *$')
1058 """Matches the top of a simple table."""
1059
1060 simple_table_border_pat = re.compile('=+[ =]*$')
1061 """Matches the bottom & header bottom of a simple table."""
1062
1063 pats = {}
1064 """Fragments of patterns used by transitions."""
1065
1066 pats['nonalphanum7bit'] = '[!-/:-@[-`{-~]'
1067 pats['alpha'] = '[a-zA-Z]'
1068 pats['alphanum'] = '[a-zA-Z0-9]'
1069 pats['alphanumplus'] = '[a-zA-Z0-9_-]'
1070 pats['enum'] = ('(%(arabic)s|%(loweralpha)s|%(upperalpha)s|%(lowerroman)s'
1071 '|%(upperroman)s|#)' % enum.sequencepats)
1072 pats['optname'] = '%(alphanum)s%(alphanumplus)s*' % pats
1073 # @@@ Loosen up the pattern? Allow Unicode?
1074 pats['optarg'] = '(%(alpha)s%(alphanumplus)s*|<[^<>]+>)' % pats
1075 pats['shortopt'] = r'(-|\+)%(alphanum)s( ?%(optarg)s)?' % pats
1076 pats['longopt'] = r'(--|/)%(optname)s([ =]%(optarg)s)?' % pats
1077 pats['option'] = r'(%(shortopt)s|%(longopt)s)' % pats
1078
1079 for format in enum.formats:
1080 pats[format] = '(?P<%s>%s%s%s)' % (
1081 format, re.escape(enum.formatinfo[format].prefix),
1082 pats['enum'], re.escape(enum.formatinfo[format].suffix))
1083
1084 patterns = {
1085 'bullet': u'[-+*\u2022\u2023\u2043]( +|$)',
1086 'enumerator': r'(%(parens)s|%(rparen)s|%(period)s)( +|$)' % pats,
1087 'field_marker': r':(?![: ])([^:\\]|\\.)*(?<! ):( +|$)',
1088 'option_marker': r'%(option)s(, %(option)s)*( +| ?$)' % pats,
1089 'doctest': r'>>>( +|$)',
1090 'line_block': r'\|( +|$)',
1091 'grid_table_top': grid_table_top_pat,
1092 'simple_table_top': simple_table_top_pat,
1093 'explicit_markup': r'\.\.( +|$)',
1094 'anonymous': r'__( +|$)',
1095 'line': r'(%(nonalphanum7bit)s)\1* *$' % pats,
1096 'text': r''}
1097 initial_transitions = (
1098 'bullet',
1099 'enumerator',
1100 'field_marker',
1101 'option_marker',
1102 'doctest',
1103 'line_block',
1104 'grid_table_top',
1105 'simple_table_top',
1106 'explicit_markup',
1107 'anonymous',
1108 'line',
1109 'text')
1110
1111 def indent(self, match, context, next_state):
1112 """Block quote."""
1113 indented, indent, line_offset, blank_finish = \
1114 self.state_machine.get_indented()
1115 elements = self.block_quote(indented, line_offset)
1116 self.parent += elements
1117 if not blank_finish:
1118 self.parent += self.unindent_warning('Block quote')
1119 return context, next_state, []
1120
1121 def block_quote(self, indented, line_offset):
1122 elements = []
1123 while indented:
1124 (blockquote_lines,
1125 attribution_lines,
1126 attribution_offset,
1127 indented,
1128 new_line_offset) = self.split_attribution(indented, line_offset)
1129 blockquote = nodes.block_quote()
1130 self.nested_parse(blockquote_lines, line_offset, blockquote)
1131 elements.append(blockquote)
1132 if attribution_lines:
1133 attribution, messages = self.parse_attribution(
1134 attribution_lines, attribution_offset)
1135 blockquote += attribution
1136 elements += messages
1137 line_offset = new_line_offset
1138 while indented and not indented[0]:
1139 indented = indented[1:]
1140 line_offset += 1
1141 return elements
1142
1143 # U+2014 is an em-dash:
1144 attribution_pattern = re.compile(u'(---?(?!-)|\u2014) *(?=[^ \\n])',
1145 re.UNICODE)
1146
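A short sketch of what `attribution_pattern` accepts may help when reading `split_attribution()`. The pattern is copied from the line above; the sample strings and the helper name `attribution_text` are made up for illustration.

```python
import re

# Same pattern as above: "--", "---" or an em-dash (U+2014),
# optional spaces, then at least one non-space character must follow.
attribution_pattern = re.compile('(---?(?!-)|\u2014) *(?=[^ \\n])',
                                 re.UNICODE)


def attribution_text(line):
    """Return the text after the dash marker, or None if no match."""
    match = attribution_pattern.match(line)
    if match is None:
        return None
    return line[match.end():]
```

With this helper, `attribution_text('-- Author')` yields the name, while a line of four dashes or a bare `--` is rejected.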
1147 def split_attribution(self, indented, line_offset):
1148 """
1149 Check for a block quote attribution and split it off:
1150
1151 * First line after a blank line must begin with a dash ("--", "---",
1152 em-dash; matches `self.attribution_pattern`).
1153 * Every line after that must have consistent indentation.
1154 * Attributions must be preceded by block quote content.
1155
1156 Return a tuple of: (block quote content lines, attribution lines,
1157 attribution offset, remaining indented lines, remaining lines offset).
1158 """
1159 blank = None
1160 nonblank_seen = False
1161 for i in range(len(indented)):
1162 line = indented[i].rstrip()
1163 if line:
1164 if nonblank_seen and blank == i - 1: # last line blank
1165 match = self.attribution_pattern.match(line)
1166 if match:
1167 attribution_end, indent = self.check_attribution(
1168 indented, i)
1169 if attribution_end:
1170 a_lines = indented[i:attribution_end]
1171 a_lines.trim_left(match.end(), end=1)
1172 a_lines.trim_left(indent, start=1)
1173 return (indented[:i], a_lines,
1174 i, indented[attribution_end:],
1175 line_offset + attribution_end)
1176 nonblank_seen = True
1177 else:
1178 blank = i
1179 else:
1180 return (indented, None, None, None, None)
1181
1182 def check_attribution(self, indented, attribution_start):
1183 """
1184 Check attribution shape.
1185 Return the index past the end of the attribution, and the indent.
1186 """
1187 indent = None
1188 i = attribution_start + 1
1189 for i in range(attribution_start + 1, len(indented)):
1190 line = indented[i].rstrip()
1191 if not line:
1192 break
1193 if indent is None:
1194 indent = len(line) - len(line.lstrip())
1195 elif len(line) - len(line.lstrip()) != indent:
1196 return None, None # bad shape; not an attribution
1197 else:
1198 # return index of line after last attribution line:
1199 i += 1
1200 return i, (indent or 0)
1201
1202 def parse_attribution(self, indented, line_offset):
1203 text = '\n'.join(indented).rstrip()
1204 lineno = self.state_machine.abs_line_number() + line_offset
1205 textnodes, messages = self.inline_text(text, lineno)
1206 node = nodes.attribution(text, '', *textnodes)
1207 node.source, node.line = self.state_machine.get_source_and_line(lineno)
1208 return node, messages
1209
1210 def bullet(self, match, context, next_state):
1211 """Bullet list item."""
1212 bulletlist = nodes.bullet_list()
1213 self.parent += bulletlist
1214 bulletlist['bullet'] = match.string[0]
1215 i, blank_finish = self.list_item(match.end())
1216 bulletlist += i
1217 offset = self.state_machine.line_offset + 1 # next line
1218 new_line_offset, blank_finish = self.nested_list_parse(
1219 self.state_machine.input_lines[offset:],
1220 input_offset=self.state_machine.abs_line_offset() + 1,
1221 node=bulletlist, initial_state='BulletList',
1222 blank_finish=blank_finish)
1223 self.goto_line(new_line_offset)
1224 if not blank_finish:
1225 self.parent += self.unindent_warning('Bullet list')
1226 return [], next_state, []
1227
1228 def list_item(self, indent):
1229 if self.state_machine.line[indent:]:
1230 indented, line_offset, blank_finish = (
1231 self.state_machine.get_known_indented(indent))
1232 else:
1233 indented, indent, line_offset, blank_finish = (
1234 self.state_machine.get_first_known_indented(indent))
1235 listitem = nodes.list_item('\n'.join(indented))
1236 if indented:
1237 self.nested_parse(indented, input_offset=line_offset,
1238 node=listitem)
1239 return listitem, blank_finish
1240
1241 def enumerator(self, match, context, next_state):
1242 """Enumerated List Item"""
1243 format, sequence, text, ordinal = self.parse_enumerator(match)
1244 if not self.is_enumerated_list_item(ordinal, sequence, format):
1245 raise statemachine.TransitionCorrection('text')
1246 enumlist = nodes.enumerated_list()
1247 self.parent += enumlist
1248 if sequence == '#':
1249 enumlist['enumtype'] = 'arabic'
1250 else:
1251 enumlist['enumtype'] = sequence
1252 enumlist['prefix'] = self.enum.formatinfo[format].prefix
1253 enumlist['suffix'] = self.enum.formatinfo[format].suffix
1254 if ordinal != 1:
1255 enumlist['start'] = ordinal
1256 msg = self.reporter.info(
1257 'Enumerated list start value not ordinal-1: "%s" (ordinal %s)'
1258 % (text, ordinal))
1259 self.parent += msg
1260 listitem, blank_finish = self.list_item(match.end())
1261 enumlist += listitem
1262 offset = self.state_machine.line_offset + 1 # next line
1263 newline_offset, blank_finish = self.nested_list_parse(
1264 self.state_machine.input_lines[offset:],
1265 input_offset=self.state_machine.abs_line_offset() + 1,
1266 node=enumlist, initial_state='EnumeratedList',
1267 blank_finish=blank_finish,
1268 extra_settings={'lastordinal': ordinal,
1269 'format': format,
1270 'auto': sequence == '#'})
1271 self.goto_line(newline_offset)
1272 if not blank_finish:
1273 self.parent += self.unindent_warning('Enumerated list')
1274 return [], next_state, []
1275
1276 def parse_enumerator(self, match, expected_sequence=None):
1277 """
1278 Analyze an enumerator and return the results.
1279
1280 :Return:
1281 - the enumerator format ('period', 'parens', or 'rparen'),
1282 - the sequence used ('arabic', 'loweralpha', 'upperroman', etc.),
1283 - the text of the enumerator, stripped of formatting, and
1284 - the ordinal value of the enumerator ('a' -> 1, 'ii' -> 2, etc.;
1285 ``None`` is returned for invalid enumerator text).
1286
1287 The enumerator format has already been determined by the regular
1288 expression match. If `expected_sequence` is given, that sequence is
1289 tried first. If not, we check for Roman numeral 1. This way,
1290 single-character Roman numerals (which are also alphabetical) can be
1291 matched. If no sequence has been matched, all sequences are checked in
1292 order.
1293 """
1294 groupdict = match.groupdict()
1295 sequence = ''
1296 for format in self.enum.formats:
1297 if groupdict[format]: # was this the format matched?
1298 break # yes; keep `format`
1299 else: # shouldn't happen
1300 raise ParserError('enumerator format not matched')
1301 text = groupdict[format][self.enum.formatinfo[format].start
1302 :self.enum.formatinfo[format].end]
1303 if text == '#':
1304 sequence = '#'
1305 elif expected_sequence:
1306 try:
1307 if self.enum.sequenceregexps[expected_sequence].match(text):
1308 sequence = expected_sequence
1309 except KeyError: # shouldn't happen
1310 raise ParserError('unknown enumerator sequence: %s'
1311 % sequence)
1312 elif text == 'i':
1313 sequence = 'lowerroman'
1314 elif text == 'I':
1315 sequence = 'upperroman'
1316 if not sequence:
1317 for sequence in self.enum.sequences:
1318 if self.enum.sequenceregexps[sequence].match(text):
1319 break
1320 else: # shouldn't happen
1321 raise ParserError('enumerator sequence not matched')
1322 if sequence == '#':
1323 ordinal = 1
1324 else:
1325 try:
1326 ordinal = self.enum.converters[sequence](text)
1327 except roman.InvalidRomanNumeralError:
1328 ordinal = None
1329 return format, sequence, text, ordinal
1330
1331 def is_enumerated_list_item(self, ordinal, sequence, format):
1332 """
1333 Check validity based on the ordinal value and the second line.
1334
1335 Return true if the ordinal is valid and the second line is blank,
1336 indented, or starts with the next enumerator or an auto-enumerator.
1337 """
1338 if ordinal is None:
1339 return None
1340 try:
1341 next_line = self.state_machine.next_line()
1342 except EOFError: # end of input lines
1343 self.state_machine.previous_line()
1344 return 1
1345 else:
1346 self.state_machine.previous_line()
1347 if not next_line[:1].strip(): # blank or indented
1348 return 1
1349 result = self.make_enumerator(ordinal + 1, sequence, format)
1350 if result:
1351 next_enumerator, auto_enumerator = result
1352 try:
1353 if ( next_line.startswith(next_enumerator) or
1354 next_line.startswith(auto_enumerator) ):
1355 return 1
1356 except TypeError:
1357 pass
1358 return None
1359
1360 def make_enumerator(self, ordinal, sequence, format):
1361 """
1362 Construct and return the next enumerated list item marker, and an
1363 auto-enumerator ("#" instead of the regular enumerator).
1364
1365 Return ``None`` for invalid (out of range) ordinals.
1366 """ #"
1367 if sequence == '#':
1368 enumerator = '#'
1369 elif sequence == 'arabic':
1370 enumerator = str(ordinal)
1371 else:
1372 if sequence.endswith('alpha'):
1373 if ordinal > 26:
1374 return None
1375 enumerator = chr(ordinal + ord('a') - 1)
1376 elif sequence.endswith('roman'):
1377 try:
1378 enumerator = roman.toRoman(ordinal)
1379 except roman.RomanError:
1380 return None
1381 else: # shouldn't happen
1382 raise ParserError('unknown enumerator sequence: "%s"'
1383 % sequence)
1384 if sequence.startswith('lower'):
1385 enumerator = enumerator.lower()
1386 elif sequence.startswith('upper'):
1387 enumerator = enumerator.upper()
1388 else: # shouldn't happen
1389 raise ParserError('unknown enumerator sequence: "%s"'
1390 % sequence)
1391 formatinfo = self.enum.formatinfo[format]
1392 next_enumerator = (formatinfo.prefix + enumerator + formatinfo.suffix
1393 + ' ')
1394 auto_enumerator = formatinfo.prefix + '#' + formatinfo.suffix + ' '
1395 return next_enumerator, auto_enumerator
1396
1397 def field_marker(self, match, context, next_state):
1398 """Field list item."""
1399 field_list = nodes.field_list()
1400 self.parent += field_list
1401 field, blank_finish = self.field(match)
1402 field_list += field
1403 offset = self.state_machine.line_offset + 1 # next line
1404 newline_offset, blank_finish = self.nested_list_parse(
1405 self.state_machine.input_lines[offset:],
1406 input_offset=self.state_machine.abs_line_offset() + 1,
1407 node=field_list, initial_state='FieldList',
1408 blank_finish=blank_finish)
1409 self.goto_line(newline_offset)
1410 if not blank_finish:
1411 self.parent += self.unindent_warning('Field list')
1412 return [], next_state, []
1413
1414 def field(self, match):
1415 name = self.parse_field_marker(match)
1416 src, srcline = self.state_machine.get_source_and_line()
1417 lineno = self.state_machine.abs_line_number()
1418 indented, indent, line_offset, blank_finish = \
1419 self.state_machine.get_first_known_indented(match.end())
1420 field_node = nodes.field()
1421 field_node.source = src
1422 field_node.line = srcline
1423 name_nodes, name_messages = self.inline_text(name, lineno)
1424 field_node += nodes.field_name(name, '', *name_nodes)
1425 field_body = nodes.field_body('\n'.join(indented), *name_messages)
1426 field_node += field_body
1427 if indented:
1428 self.parse_field_body(indented, line_offset, field_body)
1429 return field_node, blank_finish
1430
1431 def parse_field_marker(self, match):
1432 """Extract & return field name from a field marker match."""
1433 field = match.group()[1:] # strip off leading ':'
1434 field = field[:field.rfind(':')] # strip off trailing ':' etc.
1435 return field
1436
1437 def parse_field_body(self, indented, offset, node):
1438 self.nested_parse(indented, input_offset=offset, node=node)
1439
1440 def option_marker(self, match, context, next_state):
1441 """Option list item."""
1442 optionlist = nodes.option_list()
1443 try:
1444 listitem, blank_finish = self.option_list_item(match)
1445 except MarkupError, error:
1446 # This shouldn't happen; pattern won't match.
1447 msg = self.reporter.error(u'Invalid option list marker: %s' %
1448 error)
1449 self.parent += msg
1450 indented, indent, line_offset, blank_finish = \
1451 self.state_machine.get_first_known_indented(match.end())
1452 elements = self.block_quote(indented, line_offset)
1453 self.parent += elements
1454 if not blank_finish:
1455 self.parent += self.unindent_warning('Option list')
1456 return [], next_state, []
1457 self.parent += optionlist
1458 optionlist += listitem
1459 offset = self.state_machine.line_offset + 1 # next line
1460 newline_offset, blank_finish = self.nested_list_parse(
1461 self.state_machine.input_lines[offset:],
1462 input_offset=self.state_machine.abs_line_offset() + 1,
1463 node=optionlist, initial_state='OptionList',
1464 blank_finish=blank_finish)
1465 self.goto_line(newline_offset)
1466 if not blank_finish:
1467 self.parent += self.unindent_warning('Option list')
1468 return [], next_state, []
1469
1470 def option_list_item(self, match):
1471 offset = self.state_machine.abs_line_offset()
1472 options = self.parse_option_marker(match)
1473 indented, indent, line_offset, blank_finish = \
1474 self.state_machine.get_first_known_indented(match.end())
1475 if not indented: # not an option list item
1476 self.goto_line(offset)
1477 raise statemachine.TransitionCorrection('text')
1478 option_group = nodes.option_group('', *options)
1479 description = nodes.description('\n'.join(indented))
1480 option_list_item = nodes.option_list_item('', option_group,
1481 description)
1482 if indented:
1483 self.nested_parse(indented, input_offset=line_offset,
1484 node=description)
1485 return option_list_item, blank_finish
1486
1487 def parse_option_marker(self, match):
1488 """
1489 Return a list of `node.option` and `node.option_argument` objects,
1490 parsed from an option marker match.
1491
1492 :Exception: `MarkupError` for invalid option markers.
1493 """
1494 optlist = []
1495 optionstrings = match.group().rstrip().split(', ')
1496 for optionstring in optionstrings:
1497 tokens = optionstring.split()
1498 delimiter = ' '
1499 firstopt = tokens[0].split('=', 1)
1500 if len(firstopt) > 1:
1501 # "--opt=value" form
1502 tokens[:1] = firstopt
1503 delimiter = '='
1504 elif (len(tokens[0]) > 2
1505 and ((tokens[0].startswith('-')
1506 and not tokens[0].startswith('--'))
1507 or tokens[0].startswith('+'))):
1508 # "-ovalue" form
1509 tokens[:1] = [tokens[0][:2], tokens[0][2:]]
1510 delimiter = ''
1511 if len(tokens) > 1 and (tokens[1].startswith('<')
1512 and tokens[-1].endswith('>')):
1513 # "-o <value1 value2>" form; join all values into one token
1514 tokens[1:] = [' '.join(tokens[1:])]
1515 if 0 < len(tokens) <= 2:
1516 option = nodes.option(optionstring)
1517 option += nodes.option_string(tokens[0], tokens[0])
1518 if len(tokens) > 1:
1519 option += nodes.option_argument(tokens[1], tokens[1],
1520 delimiter=delimiter)
1521 optlist.append(option)
1522 else:
1523 raise MarkupError(
1524 'wrong number of option tokens (=%s), should be 1 or 2: '
1525 '"%s"' % (len(tokens), optionstring))
1526 return optlist
1527
1528 def doctest(self, match, context, next_state):
1529 data = '\n'.join(self.state_machine.get_text_block())
1530 self.parent += nodes.doctest_block(data, data)
1531 return [], next_state, []
1532
1533 def line_block(self, match, context, next_state):
1534 """First line of a line block."""
1535 block = nodes.line_block()
1536 self.parent += block
1537 lineno = self.state_machine.abs_line_number()
1538 line, messages, blank_finish = self.line_block_line(match, lineno)
1539 block += line
1540 self.parent += messages
1541 if not blank_finish:
1542 offset = self.state_machine.line_offset + 1 # next line
1543 new_line_offset, blank_finish = self.nested_list_parse(
1544 self.state_machine.input_lines[offset:],
1545 input_offset=self.state_machine.abs_line_offset() + 1,
1546 node=block, initial_state='LineBlock',
1547 blank_finish=0)
1548 self.goto_line(new_line_offset)
1549 if not blank_finish:
1550 self.parent += self.reporter.warning(
1551 'Line block ends without a blank line.',
1552 line=lineno+1)
1553 if len(block):
1554 if block[0].indent is None:
1555 block[0].indent = 0
1556 self.nest_line_block_lines(block)
1557 return [], next_state, []
1558
1559 def line_block_line(self, match, lineno):
1560 """Return one line element of a line_block."""
1561 indented, indent, line_offset, blank_finish = \
1562 self.state_machine.get_first_known_indented(match.end(),
1563 until_blank=True)
1564 text = u'\n'.join(indented)
1565 text_nodes, messages = self.inline_text(text, lineno)
1566 line = nodes.line(text, '', *text_nodes)
1567 if match.string.rstrip() != '|': # not empty
1568 line.indent = len(match.group(1)) - 1
1569 return line, messages, blank_finish
1570
1571 def nest_line_block_lines(self, block):
1572 for index in range(1, len(block)):
1573 if block[index].indent is None:
1574 block[index].indent = block[index - 1].indent
1575 self.nest_line_block_segment(block)
1576
1577 def nest_line_block_segment(self, block):
1578 indents = [item.indent for item in block]
1579 least = min(indents)
1580 new_items = []
1581 new_block = nodes.line_block()
1582 for item in block:
1583 if item.indent > least:
1584 new_block.append(item)
1585 else:
1586 if len(new_block):
1587 self.nest_line_block_segment(new_block)
1588 new_items.append(new_block)
1589 new_block = nodes.line_block()
1590 new_items.append(item)
1591 if len(new_block):
1592 self.nest_line_block_segment(new_block)
1593 new_items.append(new_block)
1594 block[:] = new_items
1595
1596 def grid_table_top(self, match, context, next_state):
1597 """Top border of a full table."""
1598 return self.table_top(match, context, next_state,
1599 self.isolate_grid_table,
1600 tableparser.GridTableParser)
1601
1602 def simple_table_top(self, match, context, next_state):
1603 """Top border of a simple table."""
1604 return self.table_top(match, context, next_state,
1605 self.isolate_simple_table,
1606 tableparser.SimpleTableParser)
1607
1608 def table_top(self, match, context, next_state,
1609 isolate_function, parser_class):
1610 """Top border of a generic table."""
1611 nodelist, blank_finish = self.table(isolate_function, parser_class)
1612 self.parent += nodelist
1613 if not blank_finish:
1614 msg = self.reporter.warning(
1615 'Blank line required after table.',
1616 line=self.state_machine.abs_line_number()+1)
1617 self.parent += msg
1618 return [], next_state, []
1619
1620 def table(self, isolate_function, parser_class):
1621 """Parse a table."""
1622 block, messages, blank_finish = isolate_function()
1623 if block:
1624 try:
1625 parser = parser_class()
1626 tabledata = parser.parse(block)
1627 tableline = (self.state_machine.abs_line_number() - len(block)
1628 + 1)
1629 table = self.build_table(tabledata, tableline)
1630 nodelist = [table] + messages
1631 except tableparser.TableMarkupError, err:
1632 nodelist = self.malformed_table(block, ' '.join(err.args),
1633 offset=err.offset) + messages
1634 else:
1635 nodelist = messages
1636 return nodelist, blank_finish
1637
1638 def isolate_grid_table(self):
1639 messages = []
1640 blank_finish = 1
1641 try:
1642 block = self.state_machine.get_text_block(flush_left=True)
1643 except statemachine.UnexpectedIndentationError, err:
1644 block, src, srcline = err.args
1645 messages.append(self.reporter.error('Unexpected indentation.',
1646 source=src, line=srcline))
1647 blank_finish = 0
1648 block.disconnect()
1649 # for East Asian chars:
1650 block.pad_double_width(self.double_width_pad_char)
1651 width = len(block[0].strip())
1652 for i in range(len(block)):
1653 block[i] = block[i].strip()
1654 if block[i][0] not in '+|': # check left edge
1655 blank_finish = 0
1656 self.state_machine.previous_line(len(block) - i)
1657 del block[i:]
1658 break
1659 if not self.grid_table_top_pat.match(block[-1]): # find bottom
1660 blank_finish = 0
1661 # from second-last to third line of table:
1662 for i in range(len(block) - 2, 1, -1):
1663 if self.grid_table_top_pat.match(block[i]):
1664 self.state_machine.previous_line(len(block) - i + 1)
1665 del block[i+1:]
1666 break
1667 else:
1668 messages.extend(self.malformed_table(block))
1669 return [], messages, blank_finish
1670 for i in range(len(block)): # check right edge
1671 if len(block[i]) != width or block[i][-1] not in '+|':
1672 messages.extend(self.malformed_table(block))
1673 return [], messages, blank_finish
1674 return block, messages, blank_finish
1675
1676 def isolate_simple_table(self):
1677 start = self.state_machine.line_offset
1678 lines = self.state_machine.input_lines
1679 limit = len(lines) - 1
1680 toplen = len(lines[start].strip())
1681 pattern_match = self.simple_table_border_pat.match
1682 found = 0
1683 found_at = None
1684 i = start + 1
1685 while i <= limit:
1686 line = lines[i]
1687 match = pattern_match(line)
1688 if match:
1689 if len(line.strip()) != toplen:
1690 self.state_machine.next_line(i - start)
1691 messages = self.malformed_table(
1692 lines[start:i+1], 'Bottom/header table border does '
1693 'not match top border.')
1694 return [], messages, i == limit or not lines[i+1].strip()
1695 found += 1
1696 found_at = i
1697 if found == 2 or i == limit or not lines[i+1].strip():
1698 end = i
1699 break
1700 i += 1
1701 else: # reached end of input_lines
1702 if found:
1703 extra = ' or no blank line after table bottom'
1704 self.state_machine.next_line(found_at - start)
1705 block = lines[start:found_at+1]
1706 else:
1707 extra = ''
1708 self.state_machine.next_line(i - start - 1)
1709 block = lines[start:]
1710 messages = self.malformed_table(
1711 block, 'No bottom table border found%s.' % extra)
1712 return [], messages, not extra
1713 self.state_machine.next_line(end - start)
1714 block = lines[start:end+1]
1715 # for East Asian chars:
1716 block.pad_double_width(self.double_width_pad_char)
1717 return block, [], end == limit or not lines[end+1].strip()
1718
1719 def malformed_table(self, block, detail='', offset=0):
1720 block.replace(self.double_width_pad_char, '')
1721 data = '\n'.join(block)
1722 message = 'Malformed table.'
1723 startline = self.state_machine.abs_line_number() - len(block) + 1
1724 if detail:
1725 message += '\n' + detail
1726 error = self.reporter.error(message, nodes.literal_block(data, data),
1727 line=startline+offset)
1728 return [error]
1729
1730 def build_table(self, tabledata, tableline, stub_columns=0):
1731 colwidths, headrows, bodyrows = tabledata
1732 table = nodes.table()
1733 tgroup = nodes.tgroup(cols=len(colwidths))
1734 table += tgroup
1735 for colwidth in colwidths:
1736 colspec = nodes.colspec(colwidth=colwidth)
1737 if stub_columns:
1738 colspec.attributes['stub'] = 1
1739 stub_columns -= 1
1740 tgroup += colspec
1741 if headrows:
1742 thead = nodes.thead()
1743 tgroup += thead
1744 for row in headrows:
1745 thead += self.build_table_row(row, tableline)
1746 tbody = nodes.tbody()
1747 tgroup += tbody
1748 for row in bodyrows:
1749 tbody += self.build_table_row(row, tableline)
1750 return table
1751
1752 def build_table_row(self, rowdata, tableline):
1753 row = nodes.row()
1754 for cell in rowdata:
1755 if cell is None:
1756 continue
1757 morerows, morecols, offset, cellblock = cell
1758 attributes = {}
1759 if morerows:
1760 attributes['morerows'] = morerows
1761 if morecols:
1762 attributes['morecols'] = morecols
1763 entry = nodes.entry(**attributes)
1764 row += entry
1765 if ''.join(cellblock):
1766 self.nested_parse(cellblock, input_offset=tableline+offset,
1767 node=entry)
1768 return row
1769
1770
1771 explicit = Struct()
1772 """Patterns and constants used for explicit markup recognition."""
1773
1774 explicit.patterns = Struct(
1775 target=re.compile(r"""
1776 (
1777 _ # anonymous target
1778 | # *OR*
1779 (?!_) # no underscore at the beginning
1780 (?P<quote>`?) # optional open quote
1781 (?![ `]) # first char. not space or
1782 # backquote
1783 (?P<name> # reference name
1784 .+?
1785 )
1786 %(non_whitespace_escape_before)s
1787 (?P=quote) # close quote if open quote used
1788 )
1789 (?<!(?<!\x00):) # no unescaped colon at end
1790 %(non_whitespace_escape_before)s
1791 [ ]? # optional space
1792 : # end of reference name
1793 ([ ]+|$) # followed by whitespace
1794 """ % vars(Inliner), re.VERBOSE | re.UNICODE),
1795 reference=re.compile(r"""
1796 (
1797 (?P<simple>%(simplename)s)_
1798 | # *OR*
1799 ` # open backquote
1800 (?![ ]) # not space
1801 (?P<phrase>.+?) # hyperlink phrase
1802 %(non_whitespace_escape_before)s
1803 `_ # close backquote,
1804 # reference mark
1805 )
1806 $ # end of string
1807 """ % vars(Inliner), re.VERBOSE | re.UNICODE),
1808 substitution=re.compile(r"""
1809 (
1810 (?![ ]) # first char. not space
1811 (?P<name>.+?) # substitution text
1812 %(non_whitespace_escape_before)s
1813 \| # close delimiter
1814 )
1815 ([ ]+|$) # followed by whitespace
1816 """ % vars(Inliner),
1817 re.VERBOSE | re.UNICODE),)
1818
1819 def footnote(self, match):
1820 src, srcline = self.state_machine.get_source_and_line()
1821 indented, indent, offset, blank_finish = \
1822 self.state_machine.get_first_known_indented(match.end())
1823 label = match.group(1)
1824 name = normalize_name(label)
1825 footnote = nodes.footnote('\n'.join(indented))
1826 footnote.source = src
1827 footnote.line = srcline
1828 if name[0] == '#': # auto-numbered
1829 name = name[1:] # autonumber label
1830 footnote['auto'] = 1
1831 if name:
1832 footnote['names'].append(name)
1833 self.document.note_autofootnote(footnote)
1834 elif name == '*': # auto-symbol
1835 name = ''
1836 footnote['auto'] = '*'
1837 self.document.note_symbol_footnote(footnote)
1838 else: # manually numbered
1839 footnote += nodes.label('', label)
1840 footnote['names'].append(name)
1841 self.document.note_footnote(footnote)
1842 if name:
1843 self.document.note_explicit_target(footnote, footnote)
1844 else:
1845 self.document.set_id(footnote, footnote)
1846 if indented:
1847 self.nested_parse(indented, input_offset=offset, node=footnote)
1848 return [footnote], blank_finish
1849
1850 def citation(self, match):
1851 src, srcline = self.state_machine.get_source_and_line()
1852 indented, indent, offset, blank_finish = \
1853 self.state_machine.get_first_known_indented(match.end())
1854 label = match.group(1)
1855 name = normalize_name(label)
1856 citation = nodes.citation('\n'.join(indented))
1857 citation.source = src
1858 citation.line = srcline
1859 citation += nodes.label('', label)
1860 citation['names'].append(name)
1861 self.document.note_citation(citation)
1862 self.document.note_explicit_target(citation, citation)
1863 if indented:
1864 self.nested_parse(indented, input_offset=offset, node=citation)
1865 return [citation], blank_finish
1866
1867 def hyperlink_target(self, match):
1868 pattern = self.explicit.patterns.target
1869 lineno = self.state_machine.abs_line_number()
1870 block, indent, offset, blank_finish = \
1871 self.state_machine.get_first_known_indented(
1872 match.end(), until_blank=True, strip_indent=False)
1873 blocktext = match.string[:match.end()] + '\n'.join(block)
1874 block = [escape2null(line) for line in block]
1875 escaped = block[0]
1876 blockindex = 0
1877 while True:
1878 targetmatch = pattern.match(escaped)
1879 if targetmatch:
1880 break
1881 blockindex += 1
1882 try:
1883 escaped += block[blockindex]
1884 except IndexError:
1885 raise MarkupError('malformed hyperlink target.')
1886 del block[:blockindex]
1887 block[0] = (block[0] + ' ')[targetmatch.end()-len(escaped)-1:].strip()
1888 target = self.make_target(block, blocktext, lineno,
1889 targetmatch.group('name'))
1890 return [target], blank_finish
1891
1892 def make_target(self, block, block_text, lineno, target_name):
1893 target_type, data = self.parse_target(block, block_text, lineno)
1894 if target_type == 'refname':
1895 target = nodes.target(block_text, '', refname=normalize_name(data))
1896 target.indirect_reference_name = data
1897 self.add_target(target_name, '', target, lineno)
1898 self.document.note_indirect_target(target)
1899 return target
1900 elif target_type == 'refuri':
1901 target = nodes.target(block_text, '')
1902 self.add_target(target_name, data, target, lineno)
1903 return target
1904 else:
1905 return data
1906
1907 def parse_target(self, block, block_text, lineno):
1908 """
1909 Determine the type of reference of a target.
1910
1911 :Return: A 2-tuple, one of:
1912
1913 - 'refname' and the indirect reference name
1914 - 'refuri' and the URI
1915 - 'malformed' and a system_message node
1916 """
1917 if block and block[-1].strip()[-1:] == '_': # possible indirect target
1918 reference = ' '.join([line.strip() for line in block])
1919 refname = self.is_reference(reference)
1920 if refname:
1921 return 'refname', refname
1922 reference = ''.join([''.join(line.split()) for line in block])
1923 return 'refuri', unescape(reference)
1924
1925 def is_reference(self, reference):
1926 match = self.explicit.patterns.reference.match(
1927 whitespace_normalize_name(reference))
1928 if not match:
1929 return None
1930 return unescape(match.group('simple') or match.group('phrase'))
1931
1932 def add_target(self, targetname, refuri, target, lineno):
1933 target.line = lineno
1934 if targetname:
1935 name = normalize_name(unescape(targetname))
1936 target['names'].append(name)
1937 if refuri:
1938 uri = self.inliner.adjust_uri(refuri)
1939 if uri:
1940 target['refuri'] = uri
1941 else:
1942 raise ApplicationError('problem with URI: %r' % refuri)
1943 self.document.note_explicit_target(target, self.parent)
1944 else: # anonymous target
1945 if refuri:
1946 target['refuri'] = refuri
1947 target['anonymous'] = 1
1948 self.document.note_anonymous_target(target)
1949
1950 def substitution_def(self, match):
1951 pattern = self.explicit.patterns.substitution
1952 src, srcline = self.state_machine.get_source_and_line()
1953 block, indent, offset, blank_finish = \
1954 self.state_machine.get_first_known_indented(match.end(),
1955 strip_indent=False)
1956 blocktext = (match.string[:match.end()] + '\n'.join(block))
1957 block.disconnect()
1958 escaped = escape2null(block[0].rstrip())
1959 blockindex = 0
1960 while True:
1961 subdefmatch = pattern.match(escaped)
1962 if subdefmatch:
1963 break
1964 blockindex += 1
1965 try:
1966 escaped = escaped + ' ' + escape2null(block[blockindex].strip())
1967 except IndexError:
1968 raise MarkupError('malformed substitution definition.')
1969 del block[:blockindex] # strip out the substitution marker
1970 block[0] = (block[0].strip() + ' ')[subdefmatch.end()-len(escaped)-1:-1]
1971 if not block[0]:
1972 del block[0]
1973 offset += 1
1974 while block and not block[-1].strip():
1975 block.pop()
1976 subname = subdefmatch.group('name')
1977 substitution_node = nodes.substitution_definition(blocktext)
1978 substitution_node.source = src
1979 substitution_node.line = srcline
1980 if not block:
1981 msg = self.reporter.warning(
1982 'Substitution definition "%s" missing contents.' % subname,
1983 nodes.literal_block(blocktext, blocktext),
1984 source=src, line=srcline)
1985 return [msg], blank_finish
1986 block[0] = block[0].strip()
1987 substitution_node['names'].append(
1988 nodes.whitespace_normalize_name(subname))
1989 new_abs_offset, blank_finish = self.nested_list_parse(
1990 block, input_offset=offset, node=substitution_node,
1991 initial_state='SubstitutionDef', blank_finish=blank_finish)
1992 i = 0
1993 for node in substitution_node[:]:
1994 if not (isinstance(node, nodes.Inline) or
1995 isinstance(node, nodes.Text)):
The diff has been truncated for viewing.
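
For readers following the description of this proposal, the target classification done by parse_target() and is_reference() in the diff above can be sketched in isolation. This is only an illustration under stated assumptions: REFERENCE is a simplified stand-in for explicit.patterns.reference (the real pattern is built from Inliner attributes that are not visible in this excerpt), classify_target is a hypothetical name, and escape handling (escape2null/unescape) is omitted.

```python
import re

# Simplified, assumed stand-in for explicit.patterns.reference.
REFERENCE = re.compile(r"""
    (
      (?P<simple>\w+(?:[-._+:]\w+)*)_   # simple name plus trailing underscore
    |
      `(?![ ])(?P<phrase>.+?)`_         # backquoted phrase plus trailing underscore
    )
    $                                   # end of string
    """, re.VERBOSE | re.UNICODE)


def classify_target(block):
    """Return ('refname', name) or ('refuri', uri) for a list of lines.

    Hypothetical helper mirroring the shape of parse_target(): a block
    whose last line ends in "_" is tried as an indirect reference first;
    anything else is treated as a URI with all whitespace removed.
    """
    if block and block[-1].strip()[-1:] == '_':
        reference = ' '.join(line.strip() for line in block)
        # Collapse runs of whitespace before matching.
        match = REFERENCE.match(' '.join(reference.split()))
        if match:
            return 'refname', match.group('simple') or match.group('phrase')
    return 'refuri', ''.join(''.join(line.split()) for line in block)


if __name__ == '__main__':
    print(classify_target(['http://example.com']))
    print(classify_target(['other-target_']))
```

In this sketch a URI that happens to end in an underscore still falls through to the 'refuri' branch, because the simplified pattern does not match a string containing "://".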
