From d4711bb865a17dcefb3b0907c0d452ef49c33c16 Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Mon, 11 Nov 2019 12:38:43 -0800 Subject: Update spec.txt. --- test/spec.txt | 6854 +++++++++++++++++++++++++++++---------------------------- 1 file changed, 3432 insertions(+), 3422 deletions(-) diff --git a/test/spec.txt b/test/spec.txt index a09394e..1197d1b 100644 --- a/test/spec.txt +++ b/test/spec.txt @@ -326,6 +326,9 @@ A [space](@) is `U+0020`. A [non-whitespace character](@) is any character that is not a [whitespace character]. +An [ASCII control character](@) is a character between `U+0000–1F` (both +including) or `U+007F`. + An [ASCII punctuation character](@) is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, `*`, `+`, `,`, `-`, `.`, `/` (U+0021–2F), @@ -478,3903 +481,3653 @@ bar For security reasons, the Unicode character `U+0000` must be replaced with the REPLACEMENT CHARACTER (`U+FFFD`). -# Blocks and inlines - -We can think of a document as a sequence of -[blocks](@)---structural elements like paragraphs, block -quotations, lists, headings, rules, and code blocks. Some blocks (like -block quotes and list items) contain other blocks; others (like -headings and paragraphs) contain [inline](@) content---text, -links, emphasized text, images, code spans, and so on. -## Precedence +## Backslash escapes -Indicators of block structure always take precedence over indicators -of inline structure. So, for example, the following is a list with -two items, not a list with one item containing a code span: +Any ASCII punctuation character may be backslash-escaped: ```````````````````````````````` example -- `one -- two` +\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ . -

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

```````````````````````````````` -This means that parsing can proceed in two steps: first, the block -structure of the document can be discerned; second, text lines inside -paragraphs, headings, and other block constructs can be parsed for inline -structure. The second step requires information about link reference -definitions that will be available only at the end of the first -step. Note that the first step requires processing lines in sequence, -but the second can be parallelized, since the inline parsing of -one block element does not affect the inline parsing of any other. - -## Container blocks and leaf blocks - -We can divide blocks into two types: -[container blocks](@), -which can contain other blocks, and [leaf blocks](@), -which cannot. - -# Leaf blocks +Backslashes before other characters are treated as literal +backslashes: -This section describes the different kinds of leaf block that make up a -Markdown document. +```````````````````````````````` example +\→\A\a\ \3\φ\« +. +

\→\A\a\ \3\φ\«

+```````````````````````````````` -## Thematic breaks -A line consisting of 0-3 spaces of indentation, followed by a sequence -of three or more matching `-`, `_`, or `*` characters, each followed -optionally by any number of spaces or tabs, forms a -[thematic break](@). +Escaped characters are treated as regular characters and do +not have their usual Markdown meanings: ```````````````````````````````` example -*** ---- -___ +\*not emphasized* +\
not a tag +\[not a link](/foo) +\`not code` +1\. not a list +\* not a list +\# not a heading +\[foo]: /url "not a reference" +\ö not a character entity . -
-
-
+

*not emphasized* +<br/> not a tag +[not a link](/foo) +`not code` +1. not a list +* not a list +# not a heading +[foo]: /url "not a reference" +&ouml; not a character entity

```````````````````````````````` -Wrong characters: +If a backslash is itself escaped, the following character is not: ```````````````````````````````` example -+++ +\\*emphasis* . -

+++

+

\emphasis

```````````````````````````````` +A backslash at the end of the line is a [hard line break]: + ```````````````````````````````` example -=== +foo\ +bar . -

===

+

foo
+bar

```````````````````````````````` -Not enough characters: +Backslash escapes do not work in code blocks, code spans, autolinks, or +raw HTML: ```````````````````````````````` example --- -** -__ +`` \[\` `` . -

-- -** -__

+

\[\`

```````````````````````````````` -One to three spaces indent are allowed: - ```````````````````````````````` example - *** - *** - *** + \[\] . -
-
-
+
\[\]
+
```````````````````````````````` -Four spaces is too many: - ```````````````````````````````` example - *** +~~~ +\[\] +~~~ . -
***
+
\[\]
 
```````````````````````````````` ```````````````````````````````` example -Foo - *** + . -

Foo -***

+

http://example.com?find=\*

```````````````````````````````` -More than three characters may be used: - ```````````````````````````````` example -_____________________________________ + . -
+
```````````````````````````````` -Spaces are allowed between the characters: +But they work in all other contexts, including URLs and link titles, +link references, and [info strings] in [fenced code blocks]: ```````````````````````````````` example - - - - +[foo](/bar\* "ti\*tle") . -
+

foo

```````````````````````````````` ```````````````````````````````` example - ** * ** * ** * ** +[foo] + +[foo]: /bar\* "ti\*tle" . -
+

foo

```````````````````````````````` ```````````````````````````````` example -- - - - +``` foo\+bar +foo +``` . -
+
foo
+
```````````````````````````````` -Spaces are allowed at the end: +## Entity and numeric character references -```````````````````````````````` example -- - - - -. -
-```````````````````````````````` +Valid HTML entity references and numeric character references +can be used in place of the corresponding Unicode character, +with the following exceptions: +- Entity and character references are not recognized in code + blocks and code spans. -However, no other characters may occur in the line: +- Entity and character references cannot stand in place of + special characters that define structural elements in + CommonMark. For example, although `*` can be used + in place of a literal `*` character, `*` cannot replace + `*` in emphasis delimiters, bullet list markers, or thematic + breaks. -```````````````````````````````` example -_ _ _ _ a +Conforming CommonMark parsers need not store information about +whether a particular character was represented in the source +using a Unicode character or an entity reference. -a------ +[Entity references](@) consist of `&` + any of the valid +HTML5 entity names + `;`. The +document +is used as an authoritative source for the valid entity +references and their corresponding code points. ----a--- +```````````````````````````````` example +  & © Æ Ď +¾ ℋ ⅆ +∲ ≧̸ . -

_ _ _ _ a

-

a------

-

---a---

+

  & © Æ Ď +¾ ℋ ⅆ +∲ ≧̸

```````````````````````````````` -It is required that all of the [non-whitespace characters] be the same. -So, this is not a thematic break: +[Decimal numeric character +references](@) +consist of `&#` + a string of 1--7 arabic digits + `;`. A +numeric character reference is parsed as the corresponding +Unicode character. Invalid Unicode code points will be replaced by +the REPLACEMENT CHARACTER (`U+FFFD`). For security reasons, +the code point `U+0000` will also be replaced by `U+FFFD`. ```````````````````````````````` example - *-* +# Ӓ Ϡ � . -

-

+

# Ӓ Ϡ �

```````````````````````````````` -Thematic breaks do not need blank lines before or after: +[Hexadecimal numeric character +references](@) consist of `&#` + +either `X` or `x` + a string of 1-6 hexadecimal digits + `;`. +They too are parsed as the corresponding Unicode character (this +time specified with a hexadecimal numeral instead of decimal). ```````````````````````````````` example -- foo -*** -- bar +" ആ ಫ . -
    -
  • foo
  • -
-
-
    -
  • bar
  • -
+

" ആ ಫ

```````````````````````````````` -Thematic breaks can interrupt a paragraph: +Here are some nonentities: ```````````````````````````````` example -Foo -*** -bar +  &x; &#; &#x; +� +&#abcdef0; +&ThisIsNotDefined; &hi?; . -

Foo

-
-

bar

+

&nbsp &x; &#; &#x; +&#87654321; +&#abcdef0; +&ThisIsNotDefined; &hi?;

```````````````````````````````` -If a line of dashes that meets the above conditions for being a -thematic break could also be interpreted as the underline of a [setext -heading], the interpretation as a -[setext heading] takes precedence. Thus, for example, -this is a setext heading, not a paragraph followed by a thematic break: +Although HTML5 does accept some entity references +without a trailing semicolon (such as `©`), these are not +recognized here, because it makes the grammar too ambiguous: ```````````````````````````````` example -Foo ---- -bar +© . -

Foo

-

bar

+

&copy

```````````````````````````````` -When both a thematic break and a list item are possible -interpretations of a line, the thematic break takes precedence: +Strings that are not on the list of HTML5 named entities are not +recognized as entity references either: ```````````````````````````````` example -* Foo -* * * -* Bar +&MadeUpEntity; . -
    -
  • Foo
  • -
-
-
    -
  • Bar
  • -
+

&MadeUpEntity;

```````````````````````````````` -If you want a thematic break in a list item, use a different bullet: +Entity and numeric character references are recognized in any +context besides code spans or code blocks, including +URLs, [link titles], and [fenced code block][] [info strings]: ```````````````````````````````` example -- Foo -- * * * + . -
    -
  • Foo
  • -
  • -
    -
  • -
+
```````````````````````````````` -## ATX headings - -An [ATX heading](@) -consists of a string of characters, parsed as inline content, between an -opening sequence of 1--6 unescaped `#` characters and an optional -closing sequence of any number of unescaped `#` characters. -The opening sequence of `#` characters must be followed by a -[space] or by the end of line. The optional closing sequence of `#`s must be -preceded by a [space] and may be followed by spaces only. The opening -`#` character may be indented 0-3 spaces. The raw contents of the -heading are stripped of leading and trailing spaces before being parsed -as inline content. The heading level is equal to the number of `#` -characters in the opening sequence. - -Simple headings: - ```````````````````````````````` example -# foo -## foo -### foo -#### foo -##### foo -###### foo +[foo](/föö "föö") . -

foo

-

foo

-

foo

-

foo

-
foo
-
foo
+

foo

```````````````````````````````` -More than six `#` characters is not a heading: - ```````````````````````````````` example -####### foo +[foo] + +[foo]: /föö "föö" . -

####### foo

+

foo

```````````````````````````````` -At least one space is required between the `#` characters and the -heading's contents, unless the heading is empty. Note that many -implementations currently do not require the space. However, the -space was required by the -[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py), -and it helps prevent things like the following from being parsed as -headings: - ```````````````````````````````` example -#5 bolt - -#hashtag +``` föö +foo +``` . -

#5 bolt

-

#hashtag

+
foo
+
```````````````````````````````` -This is not a heading, because the first `#` is escaped: +Entity and numeric character references are treated as literal +text in code spans and code blocks: ```````````````````````````````` example -\## foo +`föö` . -

## foo

+

f&ouml;&ouml;

```````````````````````````````` -Contents are parsed as inlines: - ```````````````````````````````` example -# foo *bar* \*baz\* + föfö . -

foo bar *baz*

+
f&ouml;f&ouml;
+
```````````````````````````````` -Leading and trailing [whitespace] is ignored in parsing inline content: +Entity and numeric character references cannot be used +in place of symbols indicating structure in CommonMark +documents. ```````````````````````````````` example -# foo +*foo* +*foo* . -

foo

+

*foo* +foo

```````````````````````````````` - -One to three spaces indentation are allowed: - ```````````````````````````````` example - ### foo - ## foo - # foo +* foo + +* foo . -

foo

-

foo

-

foo

+

* foo

+
    +
  • foo
  • +
```````````````````````````````` +```````````````````````````````` example +foo bar +. +

foo -Four spaces are too much: +bar

+```````````````````````````````` ```````````````````````````````` example - # foo + foo . -
# foo
-
+

→foo

```````````````````````````````` ```````````````````````````````` example -foo - # bar +[a](url "tit") . -

foo -# bar

+

[a](url "tit")

```````````````````````````````` -A closing sequence of `#` characters is optional: -```````````````````````````````` example -## foo ## - ### bar ### -. -

foo

-

bar

-```````````````````````````````` +# Blocks and inlines +We can think of a document as a sequence of +[blocks](@)---structural elements like paragraphs, block +quotations, lists, headings, rules, and code blocks. Some blocks (like +block quotes and list items) contain other blocks; others (like +headings and paragraphs) contain [inline](@) content---text, +links, emphasized text, images, code spans, and so on. -It need not be the same length as the opening sequence: +## Precedence + +Indicators of block structure always take precedence over indicators +of inline structure. So, for example, the following is a list with +two items, not a list with one item containing a code span: ```````````````````````````````` example -# foo ################################## -##### foo ## +- `one +- two` . -

foo

-
foo
+
    +
  • `one
  • +
  • two`
  • +
```````````````````````````````` -Spaces are allowed after the closing sequence: +This means that parsing can proceed in two steps: first, the block +structure of the document can be discerned; second, text lines inside +paragraphs, headings, and other block constructs can be parsed for inline +structure. The second step requires information about link reference +definitions that will be available only at the end of the first +step. Note that the first step requires processing lines in sequence, +but the second can be parallelized, since the inline parsing of +one block element does not affect the inline parsing of any other. + +## Container blocks and leaf blocks + +We can divide blocks into two types: +[container blocks](@), +which can contain other blocks, and [leaf blocks](@), +which cannot. + +# Leaf blocks + +This section describes the different kinds of leaf block that make up a +Markdown document. + +## Thematic breaks + +A line consisting of 0-3 spaces of indentation, followed by a sequence +of three or more matching `-`, `_`, or `*` characters, each followed +optionally by any number of spaces or tabs, forms a +[thematic break](@). ```````````````````````````````` example -### foo ### +*** +--- +___ . -

foo

+
+
+
```````````````````````````````` -A sequence of `#` characters with anything but [spaces] following it -is not a closing sequence, but counts as part of the contents of the -heading: +Wrong characters: ```````````````````````````````` example -### foo ### b ++++ . -

foo ### b

+

+++

```````````````````````````````` -The closing sequence must be preceded by a space: - ```````````````````````````````` example -# foo# +=== . -

foo#

+

===

```````````````````````````````` -Backslash-escaped `#` characters do not count as part -of the closing sequence: +Not enough characters: ```````````````````````````````` example -### foo \### -## foo #\## -# foo \# +-- +** +__ . -

foo ###

-

foo ###

-

foo #

+

-- +** +__

```````````````````````````````` -ATX headings need not be separated from surrounding content by blank -lines, and they can interrupt paragraphs: +One to three spaces indent are allowed: ```````````````````````````````` example -**** -## foo -**** + *** + *** + *** .
-

foo

+

```````````````````````````````` +Four spaces is too many: + ```````````````````````````````` example -Foo bar -# baz -Bar foo + *** . -

Foo bar

-

baz

-

Bar foo

+
***
+
```````````````````````````````` -ATX headings can be empty: - ```````````````````````````````` example -## -# -### ### +Foo + *** . -

-

-

+

Foo +***

```````````````````````````````` -## Setext headings - -A [setext heading](@) consists of one or more -lines of text, each containing at least one [non-whitespace -character], with no more than 3 spaces indentation, followed by -a [setext heading underline]. The lines of text must be such -that, were they not followed by the setext heading underline, -they would be interpreted as a paragraph: they cannot be -interpretable as a [code fence], [ATX heading][ATX headings], -[block quote][block quotes], [thematic break][thematic breaks], -[list item][list items], or [HTML block][HTML blocks]. - -A [setext heading underline](@) is a sequence of -`=` characters or a sequence of `-` characters, with no more than 3 -spaces indentation and any number of trailing spaces. If a line -containing a single `-` can be interpreted as an -empty [list items], it should be interpreted this way -and not as a [setext heading underline]. - -The heading is a level 1 heading if `=` characters are used in -the [setext heading underline], and a level 2 heading if `-` -characters are used. The contents of the heading are the result -of parsing the preceding lines of text as CommonMark inline -content. - -In general, a setext heading need not be preceded or followed by a -blank line. However, it cannot interrupt a paragraph, so when a -setext heading comes after a paragraph, a blank line is needed between -them. - -Simple examples: +More than three characters may be used: ```````````````````````````````` example -Foo *bar* -========= - -Foo *bar* ---------- +_____________________________________ . -

Foo bar

-

Foo bar

+
```````````````````````````````` -The content of the header may span more than one line: +Spaces are allowed between the characters: ```````````````````````````````` example -Foo *bar -baz* -==== + - - - . -

Foo bar -baz

+
```````````````````````````````` -The contents are the result of parsing the headings's raw -content as inlines. The heading's raw content is formed by -concatenating the lines and removing initial and final -[whitespace]. ```````````````````````````````` example - Foo *bar -baz*→ -==== + ** * ** * ** * ** . -

Foo bar -baz

+
```````````````````````````````` -The underlining can be any length: - ```````````````````````````````` example -Foo -------------------------- - -Foo -= +- - - - . -

Foo

-

Foo

+
```````````````````````````````` -The heading content can be indented up to three spaces, and need -not line up with the underlining: +Spaces are allowed at the end: ```````````````````````````````` example - Foo ---- - - Foo ------ - - Foo - === +- - - - . -

Foo

-

Foo

-

Foo

+
```````````````````````````````` -Four spaces indent is too much: +However, no other characters may occur in the line: ```````````````````````````````` example - Foo - --- +_ _ _ _ a - Foo ---- -. -
Foo
----
+a------
 
-Foo
-
-
+---a--- +. +

_ _ _ _ a

+

a------

+

---a---

```````````````````````````````` -The setext heading underline can be indented up to three spaces, and -may have trailing spaces: +It is required that all of the [non-whitespace characters] be the same. +So, this is not a thematic break: ```````````````````````````````` example -Foo - ---- + *-* . -

Foo

+

-

```````````````````````````````` -Four spaces is too much: +Thematic breaks do not need blank lines before or after: ```````````````````````````````` example -Foo - --- +- foo +*** +- bar . -

Foo ----

+
    +
  • foo
  • +
+
+
    +
  • bar
  • +
```````````````````````````````` -The setext heading underline cannot contain internal spaces: +Thematic breaks can interrupt a paragraph: ```````````````````````````````` example Foo -= = - -Foo ---- - +*** +bar . -

Foo -= =

Foo


+

bar

```````````````````````````````` -Trailing spaces in the content line do not cause a line break: +If a line of dashes that meets the above conditions for being a +thematic break could also be interpreted as the underline of a [setext +heading], the interpretation as a +[setext heading] takes precedence. Thus, for example, +this is a setext heading, not a paragraph followed by a thematic break: ```````````````````````````````` example -Foo ------ +Foo +--- +bar .

Foo

+

bar

```````````````````````````````` -Nor does a backslash at the end: +When both a thematic break and a list item are possible +interpretations of a line, the thematic break takes precedence: ```````````````````````````````` example -Foo\ ----- +* Foo +* * * +* Bar . -

Foo\

+
    +
  • Foo
  • +
+
+
    +
  • Bar
  • +
```````````````````````````````` -Since indicators of block structure take precedence over -indicators of inline structure, the following are setext headings: +If you want a thematic break in a list item, use a different bullet: ```````````````````````````````` example -`Foo ----- -` - - +- Foo +- * * * . -

`Foo

-

`

-

<a title="a lot

-

of dashes"/>

+
    +
  • Foo
  • +
  • +
    +
  • +
```````````````````````````````` -The setext heading underline cannot be a [lazy continuation -line] in a list item or block quote: +## ATX headings -```````````````````````````````` example -> Foo ---- -. -
-

Foo

-
-
-```````````````````````````````` +An [ATX heading](@) +consists of a string of characters, parsed as inline content, between an +opening sequence of 1--6 unescaped `#` characters and an optional +closing sequence of any number of unescaped `#` characters. +The opening sequence of `#` characters must be followed by a +[space] or by the end of line. The optional closing sequence of `#`s must be +preceded by a [space] and may be followed by spaces only. The opening +`#` character may be indented 0-3 spaces. The raw contents of the +heading are stripped of leading and trailing spaces before being parsed +as inline content. The heading level is equal to the number of `#` +characters in the opening sequence. +Simple headings: ```````````````````````````````` example -> foo -bar -=== +# foo +## foo +### foo +#### foo +##### foo +###### foo . -
-

foo -bar -===

-
+

foo

+

foo

+

foo

+

foo

+
foo
+
foo
```````````````````````````````` -```````````````````````````````` example -- Foo ---- +More than six `#` characters is not a heading: + +```````````````````````````````` example +####### foo . -
    -
  • Foo
  • -
-
+

####### foo

```````````````````````````````` -A blank line is needed between a paragraph and a following -setext heading, since otherwise the paragraph becomes part -of the heading's content: +At least one space is required between the `#` characters and the +heading's contents, unless the heading is empty. Note that many +implementations currently do not require the space. However, the +space was required by the +[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py), +and it helps prevent things like the following from being parsed as +headings: ```````````````````````````````` example -Foo -Bar ---- +#5 bolt + +#hashtag . -

Foo -Bar

+

#5 bolt

+

#hashtag

```````````````````````````````` -But in general a blank line is not required before or after -setext headings: +This is not a heading, because the first `#` is escaped: ```````````````````````````````` example ---- -Foo ---- -Bar ---- -Baz +\## foo . -
-

Foo

-

Bar

-

Baz

+

## foo

```````````````````````````````` -Setext headings cannot be empty: +Contents are parsed as inlines: ```````````````````````````````` example - -==== +# foo *bar* \*baz\* . -

====

+

foo bar *baz*

```````````````````````````````` -Setext heading text lines must not be interpretable as block -constructs other than paragraphs. So, the line of dashes -in these examples gets interpreted as a thematic break: +Leading and trailing [whitespace] is ignored in parsing inline content: ```````````````````````````````` example ---- ---- +# foo . -
-
+

foo

```````````````````````````````` +One to three spaces indentation are allowed: + ```````````````````````````````` example -- foo ------ + ### foo + ## foo + # foo . -
    -
  • foo
  • -
-
+

foo

+

foo

+

foo

```````````````````````````````` +Four spaces are too much: + ```````````````````````````````` example - foo ---- + # foo . -
foo
+
# foo
 
-
```````````````````````````````` ```````````````````````````````` example -> foo ------ +foo + # bar . -
-

foo

-
-
+

foo +# bar

```````````````````````````````` -If you want a heading with `> foo` as its literal text, you can -use backslash escapes: +A closing sequence of `#` characters is optional: ```````````````````````````````` example -\> foo ------- +## foo ## + ### bar ### . -

> foo

+

foo

+

bar

```````````````````````````````` -**Compatibility note:** Most existing Markdown implementations -do not allow the text of setext headings to span multiple lines. -But there is no consensus about how to interpret - -``` markdown -Foo -bar ---- -baz -``` +It need not be the same length as the opening sequence: -One can find four different interpretations: +```````````````````````````````` example +# foo ################################## +##### foo ## +. +

foo

+
foo
+```````````````````````````````` -1. paragraph "Foo", heading "bar", paragraph "baz" -2. paragraph "Foo bar", thematic break, paragraph "baz" -3. paragraph "Foo bar --- baz" -4. heading "Foo bar", paragraph "baz" -We find interpretation 4 most natural, and interpretation 4 -increases the expressive power of CommonMark, by allowing -multiline headings. Authors who want interpretation 1 can -put a blank line after the first paragraph: +Spaces are allowed after the closing sequence: ```````````````````````````````` example -Foo - -bar ---- -baz +### foo ### . -

Foo

-

bar

-

baz

+

foo

```````````````````````````````` -Authors who want interpretation 2 can put blank lines around -the thematic break, +A sequence of `#` characters with anything but [spaces] following it +is not a closing sequence, but counts as part of the contents of the +heading: ```````````````````````````````` example -Foo -bar - ---- - -baz +### foo ### b . -

Foo -bar

-
-

baz

+

foo ### b

```````````````````````````````` -or use a thematic break that cannot count as a [setext heading -underline], such as +The closing sequence must be preceded by a space: ```````````````````````````````` example -Foo -bar -* * * -baz +# foo# . -

Foo -bar

-
-

baz

+

foo#

```````````````````````````````` -Authors who want interpretation 3 can use backslash escapes: +Backslash-escaped `#` characters do not count as part +of the closing sequence: ```````````````````````````````` example -Foo -bar -\--- -baz +### foo \### +## foo #\## +# foo \# . -

Foo -bar ---- -baz

+

foo ###

+

foo ###

+

foo #

```````````````````````````````` -## Indented code blocks - -An [indented code block](@) is composed of one or more -[indented chunks] separated by blank lines. -An [indented chunk](@) is a sequence of non-blank lines, -each indented four or more spaces. The contents of the code block are -the literal contents of the lines, including trailing -[line endings], minus four spaces of indentation. -An indented code block has no [info string]. - -An indented code block cannot interrupt a paragraph, so there must be -a blank line between a paragraph and a following indented code block. -(A blank line is not needed, however, between a code block and a following -paragraph.) +ATX headings need not be separated from surrounding content by blank +lines, and they can interrupt paragraphs: ```````````````````````````````` example - a simple - indented code block +**** +## foo +**** . -
a simple
-  indented code block
-
+
+

foo

+
```````````````````````````````` -If there is any ambiguity between an interpretation of indentation -as a code block and as indicating that material belongs to a [list -item][list items], the list item interpretation takes precedence: - ```````````````````````````````` example - - foo - - bar +Foo bar +# baz +Bar foo . -
    -
  • -

    foo

    -

    bar

    -
  • -
+

Foo bar

+

baz

+

Bar foo

```````````````````````````````` -```````````````````````````````` example -1. foo +ATX headings can be empty: - - bar +```````````````````````````````` example +## +# +### ### . -
    -
  1. -

    foo

    -
      -
    • bar
    • -
    -
  2. -
+

+

+

```````````````````````````````` +## Setext headings -The contents of a code block are literal text, and do not get parsed -as Markdown: - -```````````````````````````````` example -
- *hi* +A [setext heading](@) consists of one or more +lines of text, each containing at least one [non-whitespace +character], with no more than 3 spaces indentation, followed by +a [setext heading underline]. The lines of text must be such +that, were they not followed by the setext heading underline, +they would be interpreted as a paragraph: they cannot be +interpretable as a [code fence], [ATX heading][ATX headings], +[block quote][block quotes], [thematic break][thematic breaks], +[list item][list items], or [HTML block][HTML blocks]. - - one -. -
<a/>
-*hi*
+A [setext heading underline](@) is a sequence of
+`=` characters or a sequence of `-` characters, with no more than 3
+spaces indentation and any number of trailing spaces.  If a line
+containing a single `-` can be interpreted as an
+empty [list items], it should be interpreted this way
+and not as a [setext heading underline].
 
-- one
-
-```````````````````````````````` +The heading is a level 1 heading if `=` characters are used in +the [setext heading underline], and a level 2 heading if `-` +characters are used. The contents of the heading are the result +of parsing the preceding lines of text as CommonMark inline +content. +In general, a setext heading need not be preceded or followed by a +blank line. However, it cannot interrupt a paragraph, so when a +setext heading comes after a paragraph, a blank line is needed between +them. -Here we have three chunks separated by blank lines: +Simple examples: ```````````````````````````````` example - chunk1 +Foo *bar* +========= - chunk2 - - - - chunk3 +Foo *bar* +--------- . -
chunk1
-
-chunk2
+

Foo bar

+

Foo bar

+```````````````````````````````` +The content of the header may span more than one line: -chunk3 -
+```````````````````````````````` example +Foo *bar +baz* +==== +. +

Foo bar +baz

```````````````````````````````` - -Any initial spaces beyond four will be included in the content, even -in interior blank lines: +The contents are the result of parsing the headings's raw +content as inlines. The heading's raw content is formed by +concatenating the lines and removing initial and final +[whitespace]. ```````````````````````````````` example - chunk1 - - chunk2 + Foo *bar +baz*→ +==== . -
chunk1
-  
-  chunk2
-
+

Foo bar +baz

```````````````````````````````` -An indented code block cannot interrupt a paragraph. (This -allows hanging indents and the like.) +The underlining can be any length: ```````````````````````````````` example Foo - bar +------------------------- +Foo += . -

Foo -bar

+

Foo

+

Foo

```````````````````````````````` -However, any non-blank line with fewer than four leading spaces ends -the code block immediately. So a paragraph may occur immediately -after indented code: +The heading content can be indented up to three spaces, and need +not line up with the underlining: ```````````````````````````````` example - foo -bar + Foo +--- + + Foo +----- + + Foo + === . -
foo
-
-

bar

+

Foo

+

Foo

+

Foo

```````````````````````````````` -And indented code can occur immediately before and after other kinds of -blocks: +Four spaces indent is too much: ```````````````````````````````` example -# Heading - foo -Heading ------- - foo ----- + Foo + --- + + Foo +--- . -

Heading

-
foo
-
-

Heading

-
foo
+
Foo
+---
+
+Foo
 

```````````````````````````````` -The first line can be indented more than four spaces: +The setext heading underline can be indented up to three spaces, and +may have trailing spaces: ```````````````````````````````` example - foo - bar +Foo + ---- . -
    foo
-bar
-
+

Foo

```````````````````````````````` -Blank lines preceding or following an indented code block -are not included in it: +Four spaces is too much: ```````````````````````````````` example - - - foo - - +Foo + --- . -
foo
-
+

Foo +---

```````````````````````````````` -Trailing spaces are included in the code block's content: +The setext heading underline cannot contain internal spaces: ```````````````````````````````` example - foo +Foo += = + +Foo +--- - . -
foo  
-
+

Foo += =

+

Foo

+
```````````````````````````````` +Trailing spaces in the content line do not cause a line break: -## Fenced code blocks - -A [code fence](@) is a sequence -of at least three consecutive backtick characters (`` ` ``) or -tildes (`~`). (Tildes and backticks cannot be mixed.) -A [fenced code block](@) -begins with a code fence, indented no more than three spaces. - -The line with the opening code fence may optionally contain some text -following the code fence; this is trimmed of leading and trailing -whitespace and called the [info string](@). If the [info string] comes -after a backtick fence, it may not contain any backtick -characters. (The reason for this restriction is that otherwise -some inline code would be incorrectly interpreted as the -beginning of a fenced code block.) +```````````````````````````````` example +Foo +----- +. +

Foo

+```````````````````````````````` -The content of the code block consists of all subsequent lines, until -a closing [code fence] of the same type as the code block -began with (backticks or tildes), and with at least as many backticks -or tildes as the opening code fence. If the leading code fence is -indented N spaces, then up to N spaces of indentation are removed from -each line of the content (if present). (If a content line is not -indented, it is preserved unchanged. If it is indented less than N -spaces, all of the indentation is removed.) -The closing code fence may be indented up to three spaces, and may be -followed only by spaces, which are ignored. If the end of the -containing block (or document) is reached and no closing code fence -has been found, the code block contains all of the lines after the -opening code fence until the end of the containing block (or -document). (An alternative spec would require backtracking in the -event that a closing code fence is not found. But this makes parsing -much less efficient, and there seems to be no real down side to the -behavior described here.) +Nor does a backslash at the end: -A fenced code block may interrupt a paragraph, and does not require -a blank line either before or after. +```````````````````````````````` example +Foo\ +---- +. +

Foo\

+```````````````````````````````` -The content of a code fence is treated as literal text, not parsed -as inlines. The first word of the [info string] is typically used to -specify the language of the code sample, and rendered in the `class` -attribute of the `code` tag. However, this spec does not mandate any -particular treatment of the [info string]. -Here is a simple example with backticks: +Since indicators of block structure take precedence over +indicators of inline structure, the following are setext headings: ```````````````````````````````` example -``` -< - > -``` +`Foo +---- +` + +
. -
<
- >
-
+

`Foo

+

`

+

<a title="a lot

+

of dashes"/>

```````````````````````````````` -With tildes: +The setext heading underline cannot be a [lazy continuation +line] in a list item or block quote: ```````````````````````````````` example -~~~ -< - > -~~~ +> Foo +--- . -
<
- >
-
+
+

Foo

+
+
```````````````````````````````` -Fewer than three backticks is not enough: ```````````````````````````````` example -`` -foo -`` +> foo +bar +=== . -

foo

+
+

foo +bar +===

+
```````````````````````````````` -The closing code fence must use the same character as the opening -fence: ```````````````````````````````` example -``` -aaa -~~~ -``` +- Foo +--- . -
aaa
-~~~
-
+
    +
  • Foo
  • +
+
```````````````````````````````` +A blank line is needed between a paragraph and a following +setext heading, since otherwise the paragraph becomes part +of the heading's content: + ```````````````````````````````` example -~~~ -aaa -``` -~~~ +Foo +Bar +--- . -
aaa
-```
-
+

Foo +Bar

```````````````````````````````` -The closing code fence must be at least as long as the opening fence: +But in general a blank line is not required before or after +setext headings: ```````````````````````````````` example -```` -aaa -``` -`````` +--- +Foo +--- +Bar +--- +Baz . -
aaa
-```
-
+
+

Foo

+

Bar

+

Baz

```````````````````````````````` +Setext headings cannot be empty: + ```````````````````````````````` example -~~~~ -aaa -~~~ -~~~~ + +==== . -
aaa
-~~~
-
+

====

```````````````````````````````` -Unclosed code blocks are closed by the end of the document -(or the enclosing [block quote][block quotes] or [list item][list items]): +Setext heading text lines must not be interpretable as block +constructs other than paragraphs. So, the line of dashes +in these examples gets interpreted as a thematic break: ```````````````````````````````` example -``` +--- +--- . -
+
+
```````````````````````````````` ```````````````````````````````` example -````` +- foo +----- +. +
    +
  • foo
  • +
+
+```````````````````````````````` -``` -aaa + +```````````````````````````````` example + foo +--- . -

-```
-aaa
+
foo
 
+
```````````````````````````````` ```````````````````````````````` example -> ``` -> aaa - -bbb +> foo +----- .
-
aaa
-
+

foo

-

bbb

+
```````````````````````````````` -A code block can have all empty lines as its content: +If you want a heading with `> foo` as its literal text, you can +use backslash escapes: ```````````````````````````````` example -``` - - -``` +\> foo +------ . -

-  
-
+

> foo

```````````````````````````````` -A code block can be empty: +**Compatibility note:** Most existing Markdown implementations +do not allow the text of setext headings to span multiple lines. +But there is no consensus about how to interpret -```````````````````````````````` example -``` +``` markdown +Foo +bar +--- +baz ``` -. -
-```````````````````````````````` +One can find four different interpretations: -Fences can be indented. If the opening fence is indented, -content lines will have equivalent opening indentation removed, -if present: +1. paragraph "Foo", heading "bar", paragraph "baz" +2. paragraph "Foo bar", thematic break, paragraph "baz" +3. paragraph "Foo bar --- baz" +4. heading "Foo bar", paragraph "baz" + +We find interpretation 4 most natural, and interpretation 4 +increases the expressive power of CommonMark, by allowing +multiline headings. Authors who want interpretation 1 can +put a blank line after the first paragraph: ```````````````````````````````` example - ``` - aaa -aaa -``` +Foo + +bar +--- +baz . -
aaa
-aaa
-
+

Foo

+

bar

+

baz

```````````````````````````````` +Authors who want interpretation 2 can put blank lines around +the thematic break, + ```````````````````````````````` example - ``` -aaa - aaa -aaa - ``` -. -
aaa
-aaa
-aaa
-
-```````````````````````````````` +Foo +bar +--- -```````````````````````````````` example - ``` - aaa - aaa - aaa - ``` +baz . -
aaa
- aaa
-aaa
-
+

Foo +bar

+
+

baz

```````````````````````````````` -Four spaces indentation produces an indented code block: +or use a thematic break that cannot count as a [setext heading +underline], such as ```````````````````````````````` example - ``` - aaa - ``` +Foo +bar +* * * +baz . -
```
-aaa
-```
-
+

Foo +bar

+
+

baz

```````````````````````````````` -Closing fences may be indented by 0-3 spaces, and their indentation -need not match that of the opening fence: +Authors who want interpretation 3 can use backslash escapes: ```````````````````````````````` example -``` -aaa - ``` +Foo +bar +\--- +baz . -
aaa
-
+

Foo +bar +--- +baz

```````````````````````````````` -```````````````````````````````` example - ``` -aaa - ``` -. -
aaa
-
-```````````````````````````````` +## Indented code blocks +An [indented code block](@) is composed of one or more +[indented chunks] separated by blank lines. +An [indented chunk](@) is a sequence of non-blank lines, +each indented four or more spaces. The contents of the code block are +the literal contents of the lines, including trailing +[line endings], minus four spaces of indentation. +An indented code block has no [info string]. -This is not a closing fence, because it is indented 4 spaces: +An indented code block cannot interrupt a paragraph, so there must be +a blank line between a paragraph and a following indented code block. +(A blank line is not needed, however, between a code block and a following +paragraph.) ```````````````````````````````` example -``` -aaa - ``` + a simple + indented code block . -
aaa
-    ```
+
a simple
+  indented code block
 
```````````````````````````````` - -Code fences (opening and closing) cannot contain internal spaces: +If there is any ambiguity between an interpretation of indentation +as a code block and as indicating that material belongs to a [list +item][list items], the list item interpretation takes precedence: ```````````````````````````````` example -``` ``` -aaa + - foo + + bar . -

-aaa

+
    +
  • +

    foo

    +

    bar

    +
  • +
```````````````````````````````` ```````````````````````````````` example -~~~~~~ -aaa -~~~ ~~ +1. foo + + - bar . -
aaa
-~~~ ~~
-
+
    +
  1. +

    foo

    +
      +
    • bar
    • +
    +
  2. +
```````````````````````````````` -Fenced code blocks can interrupt paragraphs, and can be followed -directly by paragraphs, without a blank line between: + +The contents of a code block are literal text, and do not get parsed +as Markdown: ```````````````````````````````` example -foo -``` -bar -``` -baz +
+ *hi* + + - one . -

foo

-
bar
+
<a/>
+*hi*
+
+- one
 
-

baz

```````````````````````````````` -Other blocks can also occur before and after fenced code blocks -without an intervening blank line: +Here we have three chunks separated by blank lines: ```````````````````````````````` example -foo ---- -~~~ -bar -~~~ -# baz + chunk1 + + chunk2 + + + + chunk3 . -

foo

-
bar
+
chunk1
+
+chunk2
+
+
+
+chunk3
 
-

baz

```````````````````````````````` -An [info string] can be provided after the opening code fence. -Although this spec doesn't mandate any particular treatment of -the info string, the first word is typically used to specify -the language of the code block. In HTML output, the language is -normally indicated by adding a class to the `code` element consisting -of `language-` followed by the language name. +Any initial spaces beyond four will be included in the content, even +in interior blank lines: ```````````````````````````````` example -```ruby -def foo(x) - return 3 -end -``` + chunk1 + + chunk2 . -
def foo(x)
-  return 3
-end
+
chunk1
+  
+  chunk2
 
```````````````````````````````` +An indented code block cannot interrupt a paragraph. (This +allows hanging indents and the like.) + ```````````````````````````````` example -~~~~ ruby startline=3 $%@#$ -def foo(x) - return 3 -end -~~~~~~~ +Foo + bar + . -
def foo(x)
-  return 3
-end
-
+

Foo +bar

```````````````````````````````` +However, any non-blank line with fewer than four leading spaces ends +the code block immediately. So a paragraph may occur immediately +after indented code: + ```````````````````````````````` example -````; -```` + foo +bar . -
+
foo
+
+

bar

```````````````````````````````` -[Info strings] for backtick code blocks cannot contain backticks: +And indented code can occur immediately before and after other kinds of +blocks: ```````````````````````````````` example -``` aa ``` -foo +# Heading + foo +Heading +------ + foo +---- . -

aa -foo

+

Heading

+
foo
+
+

Heading

+
foo
+
+
```````````````````````````````` -[Info strings] for tilde code blocks can contain backticks and tildes: +The first line can be indented more than four spaces: ```````````````````````````````` example -~~~ aa ``` ~~~ -foo -~~~ + foo + bar . -
foo
+
    foo
+bar
 
```````````````````````````````` -Closing code fences cannot have [info strings]: +Blank lines preceding or following an indented code block +are not included in it: ```````````````````````````````` example -``` -``` aaa -``` + + + foo + + . -
``` aaa
+
foo
 
```````````````````````````````` +Trailing spaces are included in the code block's content: -## HTML blocks +```````````````````````````````` example + foo +. +
foo  
+
+```````````````````````````````` -An [HTML block](@) is a group of lines that is treated -as raw HTML (and will not be escaped in HTML output). -There are seven kinds of [HTML block], which can be defined by their -start and end conditions. The block begins with a line that meets a -[start condition](@) (after up to three spaces optional indentation). -It ends with the first subsequent line that meets a matching [end -condition](@), or the last line of the document, or the last line of -the [container block](#container-blocks) containing the current HTML -block, if no line is encountered that meets the [end condition]. If -the first line meets both the [start condition] and the [end -condition], the block will contain just that line. -1. **Start condition:** line begins with the string ``, or the end of the line.\ -**End condition:** line contains an end tag -``, `
`, or `` (case-insensitive; it -need not match the start tag). +## Fenced code blocks -2. **Start condition:** line begins with the string ``. +A [code fence](@) is a sequence +of at least three consecutive backtick characters (`` ` ``) or +tildes (`~`). (Tildes and backticks cannot be mixed.) +A [fenced code block](@) +begins with a code fence, indented no more than three spaces. -3. **Start condition:** line begins with the string ``. +The line with the opening code fence may optionally contain some text +following the code fence; this is trimmed of leading and trailing +whitespace and called the [info string](@). If the [info string] comes +after a backtick fence, it may not contain any backtick +characters. (The reason for this restriction is that otherwise +some inline code would be incorrectly interpreted as the +beginning of a fenced code block.) -4. **Start condition:** line begins with the string ``. +The content of the code block consists of all subsequent lines, until +a closing [code fence] of the same type as the code block +began with (backticks or tildes), and with at least as many backticks +or tildes as the opening code fence. If the leading code fence is +indented N spaces, then up to N spaces of indentation are removed from +each line of the content (if present). (If a content line is not +indented, it is preserved unchanged. If it is indented less than N +spaces, all of the indentation is removed.) -5. **Start condition:** line begins with the string -``. - -6. **Start condition:** line begins the string `<` or ``, or -the string `/>`.\ -**End condition:** line is followed by a [blank line]. +The closing code fence may be indented up to three spaces, and may be +followed only by spaces, which are ignored. If the end of the +containing block (or document) is reached and no closing code fence +has been found, the code block contains all of the lines after the +opening code fence until the end of the containing block (or +document). 
(An alternative spec would require backtracking in the +event that a closing code fence is not found. But this makes parsing +much less efficient, and there seems to be no real down side to the +behavior described here.) -7. **Start condition:** line begins with a complete [open tag] -(with any [tag name] other than `script`, -`style`, or `pre`) or a complete [closing tag], -followed only by [whitespace] or the end of the line.\ -**End condition:** line is followed by a [blank line]. +A fenced code block may interrupt a paragraph, and does not require +a blank line either before or after. -HTML blocks continue until they are closed by their appropriate -[end condition], or the last line of the document or other [container -block](#container-blocks). This means any HTML **within an HTML -block** that might otherwise be recognised as a start condition will -be ignored by the parser and passed through as-is, without changing -the parser's state. +The content of a code fence is treated as literal text, not parsed +as inlines. The first word of the [info string] is typically used to +specify the language of the code sample, and rendered in the `class` +attribute of the `code` tag. However, this spec does not mandate any +particular treatment of the [info string]. -For instance, `
<pre>` within a HTML block started by `<table>` will not affect
-the parser state; as the HTML block was started in by start condition 6, it
-will end at any blank line. This can be surprising:
+Here is a simple example with backticks:
 
 ```````````````````````````````` example
-
-
-**Hello**,
-
-_world_.
-
-
+``` +< + > +``` . -
-
-**Hello**,
-

world. -

-
+
<
+ >
+
```````````````````````````````` -In this case, the HTML block is terminated by the newline — the `**Hello**` -text remains verbatim — and regular parsing resumes, with a paragraph, -emphasised `world` and inline and block HTML following. - -All types of [HTML blocks] except type 7 may interrupt -a paragraph. Blocks of type 7 may not interrupt a paragraph. -(This restriction is intended to prevent unwanted interpretation -of long tags inside a wrapped paragraph as starting HTML blocks.) -Some simple examples follow. Here are some basic HTML blocks -of type 6: +With tildes: ```````````````````````````````` example - - - - -
- hi -
- -okay. +~~~ +< + > +~~~ . - - - - -
- hi -
-

okay.

+
<
+ >
+
```````````````````````````````` +Fewer than three backticks is not enough: ```````````````````````````````` example -
-*foo* +
aaa
+~~~
+
```````````````````````````````` -Here we have two HTML blocks with a Markdown paragraph between them: - ```````````````````````````````` example -
- -*Markdown* - -
+~~~ +aaa +``` +~~~ . -
-

Markdown

-
+
aaa
+```
+
```````````````````````````````` -The tag on the first line can be partial, as long -as it is split where there would be whitespace: +The closing code fence must be at least as long as the opening fence: ```````````````````````````````` example -
-
+```` +aaa +``` +`````` . -
-
+
aaa
+```
+
```````````````````````````````` ```````````````````````````````` example -
-
+~~~~ +aaa +~~~ +~~~~ . -
-
+
aaa
+~~~
+
```````````````````````````````` -An open tag need not be closed: -```````````````````````````````` example -
-*foo* +Unclosed code blocks are closed by the end of the document +(or the enclosing [block quote][block quotes] or [list item][list items]): -*bar* +```````````````````````````````` example +``` . -
-*foo* -

bar

+
```````````````````````````````` - -A partial tag need not even be completed (garbage -in, garbage out): - ```````````````````````````````` example -
+``` +aaa +
```````````````````````````````` ```````````````````````````````` example -
``` +> aaa + +bbb . -
+
aaa
+
+ +

bbb

```````````````````````````````` -The initial tag doesn't even need to be a valid -tag, as long as it starts like one: +A code block can have all empty lines as its content: ```````````````````````````````` example -
+ +
```````````````````````````````` -In type 6 blocks, the initial tag need not be on a line by -itself: +A code block can be empty: ```````````````````````````````` example - +``` +``` . - +
```````````````````````````````` +Fences can be indented. If the opening fence is indented, +content lines will have equivalent opening indentation removed, +if present: + ```````````````````````````````` example -
-foo -
+ ``` + aaa +aaa +``` . -
-foo -
+
aaa
+aaa
+
```````````````````````````````` -Everything until the next blank line or end of document -gets included in the HTML block. So, in the following -example, what looks like a Markdown code block -is actually part of the HTML block, which continues until a blank -line or the end of the document is reached: - ```````````````````````````````` example -
-``` c -int x = 33; -``` + ``` +aaa + aaa +aaa + ``` . -
-``` c -int x = 33; -``` +
aaa
+aaa
+aaa
+
```````````````````````````````` -To start an [HTML block] with a tag that is *not* in the -list of block-level tags in (6), you must put the tag by -itself on the first line (and it must be complete): - ```````````````````````````````` example - -*bar* - + ``` + aaa + aaa + aaa + ``` . - -*bar* - +
aaa
+ aaa
+aaa
+
```````````````````````````````` -In type 7 blocks, the [tag name] can be anything: +Four spaces indentation produces an indented code block: ```````````````````````````````` example - -*bar* - + ``` + aaa + ``` . - -*bar* - +
```
+aaa
+```
+
```````````````````````````````` +Closing fences may be indented by 0-3 spaces, and their indentation +need not match that of the opening fence: + ```````````````````````````````` example - -*bar* - +``` +aaa + ``` . - -*bar* - +
aaa
+
```````````````````````````````` ```````````````````````````````` example - -*bar* + ``` +aaa + ``` . - -*bar* +
aaa
+
```````````````````````````````` -These rules are designed to allow us to work with tags that -can function as either block-level or inline-level tags. -The `` tag is a nice example. We can surround content with -`` tags in three different ways. In this case, we get a raw -HTML block, because the `` tag is on a line by itself: +This is not a closing fence, because it is indented 4 spaces: ```````````````````````````````` example - -*foo* - +``` +aaa + ``` . - -*foo* - +
aaa
+    ```
+
```````````````````````````````` -In this case, we get a raw HTML block that just includes -the `` tag (because it ends with the following blank -line). So the contents get interpreted as CommonMark: + +Code fences (opening and closing) cannot contain internal spaces: ```````````````````````````````` example - +``` ``` +aaa +. +

+aaa

+```````````````````````````````` -*foo* -
+```````````````````````````````` example +~~~~~~ +aaa +~~~ ~~ . - -

foo

-
+
aaa
+~~~ ~~
+
```````````````````````````````` -Finally, in this case, the `` tags are interpreted -as [raw HTML] *inside* the CommonMark paragraph. (Because -the tag is not on a line by itself, we get inline HTML -rather than an [HTML block].) +Fenced code blocks can interrupt paragraphs, and can be followed +directly by paragraphs, without a blank line between: ```````````````````````````````` example -*foo* +foo +``` +bar +``` +baz . -

foo

+

foo

+
bar
+
+

baz

```````````````````````````````` -HTML tags designed to contain literal content -(`script`, `style`, `pre`), comments, processing instructions, -and declarations are treated somewhat differently. -Instead of ending at the first blank line, these blocks -end at the first line containing a corresponding end tag. -As a result, these blocks can contain blank lines: - -A pre tag (type 1): +Other blocks can also occur before and after fenced code blocks +without an intervening blank line: ```````````````````````````````` example -

-import Text.HTML.TagSoup
-
-main :: IO ()
-main = print $ parseTags tags
-
-okay +foo +--- +~~~ +bar +~~~ +# baz . -

-import Text.HTML.TagSoup
-
-main :: IO ()
-main = print $ parseTags tags
+

foo

+
bar
 
-

okay

+

baz

```````````````````````````````` -A script tag (type 1): +An [info string] can be provided after the opening code fence. +Although this spec doesn't mandate any particular treatment of +the info string, the first word is typically used to specify +the language of the code block. In HTML output, the language is +normally indicated by adding a class to the `code` element consisting +of `language-` followed by the language name. ```````````````````````````````` example - -okay +```ruby +def foo(x) + return 3 +end +``` . - -

okay

+
def foo(x)
+  return 3
+end
+
```````````````````````````````` -A style tag (type 1): - ```````````````````````````````` example - -okay +~~~~ ruby startline=3 $%@#$ +def foo(x) + return 3 +end +~~~~~~~ . - -

okay

+
def foo(x)
+  return 3
+end
+
```````````````````````````````` -If there is no matching end tag, the block will end at the -end of the document (or the enclosing [block quote][block quotes] -or [list item][list items]): - ```````````````````````````````` example - -*foo* +``` +``` aaa +``` . - -

foo

+
``` aaa
+
```````````````````````````````` -```````````````````````````````` example -*bar* -*baz* -. -*bar* -

baz

-```````````````````````````````` +## HTML blocks -Note that anything on the last line after the -end tag will be included in the [HTML block]: +An [HTML block](@) is a group of lines that is treated +as raw HTML (and will not be escaped in HTML output). -```````````````````````````````` example -1. *bar* -. -1. *bar* -```````````````````````````````` +There are seven kinds of [HTML block], which can be defined by their +start and end conditions. The block begins with a line that meets a +[start condition](@) (after up to three spaces optional indentation). +It ends with the first subsequent line that meets a matching [end +condition](@), or the last line of the document, or the last line of +the [container block](#container-blocks) containing the current HTML +block, if no line is encountered that meets the [end condition]. If +the first line meets both the [start condition] and the [end +condition], the block will contain just that line. +1. **Start condition:** line begins with the string ``, or the end of the line.\ +**End condition:** line contains an end tag +``, `
`, or `` (case-insensitive; it +need not match the start tag). -A comment (type 2): +2. **Start condition:** line begins with the string ``. -```````````````````````````````` example - -okay -. - -

okay

-```````````````````````````````` +5. **Start condition:** line begins with the string +``. + +6. **Start condition:** line begins with the string `<` or ``, or +the string `/>`.\ +**End condition:** line is followed by a [blank line]. +7. **Start condition:** line begins with a complete [open tag] +(with any [tag name] other than `script`, +`style`, or `pre`) or a complete [closing tag], +followed only by [whitespace] or the end of the line.\ +**End condition:** line is followed by a [blank line]. +HTML blocks continue until they are closed by their appropriate +[end condition], or the last line of the document or other [container +block](#container-blocks). This means any HTML **within an HTML +block** that might otherwise be recognised as a start condition will +be ignored by the parser and passed through as-is, without changing +the parser's state. -A processing instruction (type 3): +For instance, `
<pre>` within an HTML block started by `<table>` will not affect
+the parser state; as the HTML block was started by start condition 6, it
+will end at any blank line. This can be surprising:
 
 ```````````````````````````````` example
-';
+
+
+**Hello**,
 
-?>
-okay
+_world_.
+
+
. -'; - -?> -

okay

+
+
+**Hello**,
+

world. +

+
```````````````````````````````` +In this case, the HTML block is terminated by the newline — the `**Hello**` +text remains verbatim — and regular parsing resumes, with a paragraph, +emphasised `world` and inline and block HTML following. -A declaration (type 4): +All types of [HTML blocks] except type 7 may interrupt +a paragraph. Blocks of type 7 may not interrupt a paragraph. +(This restriction is intended to prevent unwanted interpretation +of long tags inside a wrapped paragraph as starting HTML blocks.) + +Some simple examples follow. Here are some basic HTML blocks +of type 6: ```````````````````````````````` example - + + + + +
+ hi +
+ +okay. . - + + + + +
+ hi +
+

okay.

```````````````````````````````` -CDATA (type 5): - ```````````````````````````````` example - -okay + +*foo* ```````````````````````````````` +Here we have two HTML blocks with a Markdown paragraph between them: + ```````````````````````````````` example -
+
-
+*Markdown* + +
. -
-
<div>
-
+
+

Markdown

+
```````````````````````````````` -An HTML block of types 1--6 can interrupt a paragraph, and need not be -preceded by a blank line. +The tag on the first line can be partial, as long +as it is split where there would be whitespace: ```````````````````````````````` example -Foo -
-bar +
. -

Foo

-
-bar +
```````````````````````````````` -However, a following blank line is needed, except at the end of -a document, and except for blocks of types 1--5, [above][HTML -block]: - ```````````````````````````````` example -
-bar +
-*foo* . -
-bar +
-*foo* ```````````````````````````````` -HTML blocks of type 7 cannot interrupt a paragraph: - +An open tag need not be closed: ```````````````````````````````` example -Foo -
-baz +
+*foo* + +*bar* . -

Foo - -baz

+
+*foo* +

bar

```````````````````````````````` -This rule differs from John Gruber's original Markdown syntax -specification, which says: - -> The only restrictions are that block-level HTML elements — -> e.g. `
`, ``, `
`, `

`, etc. — must be separated from -> surrounding content by blank lines, and the start and end tags of the -> block should not be indented with tabs or spaces. - -In some ways Gruber's rule is more restrictive than the one given -here: - -- It requires that an HTML block be preceded by a blank line. -- It does not allow the start tag to be indented. -- It requires a matching end tag, which it also does not allow to - be indented. - -Most Markdown implementations (including some of Gruber's own) do not -respect all of these restrictions. - -There is one respect, however, in which Gruber's rule is more liberal -than the one given here, since it allows blank lines to occur inside -an HTML block. There are two reasons for disallowing them here. -First, it removes the need to parse balanced tags, which is -expensive and can require backtracking from the end of the document -if no matching end tag is found. Second, it provides a very simple -and flexible way of including Markdown content inside HTML tags: -simply separate the Markdown from the HTML using blank lines: -Compare: +A partial tag need not even be completed (garbage +in, garbage out): ```````````````````````````````` example -

- -*Emphasized* text. - -
+
-

Emphasized text.

-
+
-*Emphasized* text. -
+
-*Emphasized* text. -
+
- -
- - - - - -
-Hi -
+
- - -Hi - - - +
- - - - - Hi - + +. + +```````````````````````````````` - - +```````````````````````````````` example +
+foo +
. - - -
<td>
-  Hi
-</td>
-
- -
+
+foo +
```````````````````````````````` -Fortunately, blank lines are usually not necessary and can be -deleted. The exception is inside `
` tags, but as described
-[above][HTML blocks], raw HTML blocks starting with `
`
-*can* contain blank lines.
+Everything until the next blank line or end of document
+gets included in the HTML block.  So, in the following
+example, what looks like a Markdown code block
+is actually part of the HTML block, which continues until a blank
+line or the end of the document is reached:
 
-## Link reference definitions
+```````````````````````````````` example
+
+``` c +int x = 33; +``` +. +
+``` c +int x = 33; +``` +```````````````````````````````` -A [link reference definition](@) -consists of a [link label], indented up to three spaces, followed -by a colon (`:`), optional [whitespace] (including up to one -[line ending]), a [link destination], -optional [whitespace] (including up to one -[line ending]), and an optional [link -title], which if it is present must be separated -from the [link destination] by [whitespace]. -No further [non-whitespace characters] may occur on the line. -A [link reference definition] -does not correspond to a structural element of a document. Instead, it -defines a label which can be used in [reference links] -and reference-style [images] elsewhere in the document. [Link -reference definitions] can come either before or after the links that use -them. +To start an [HTML block] with a tag that is *not* in the +list of block-level tags in (6), you must put the tag by +itself on the first line (and it must be complete): ```````````````````````````````` example -[foo]: /url "title" - -[foo] + +*bar* + . -

foo

+ +*bar* + ```````````````````````````````` -```````````````````````````````` example - [foo]: - /url - 'the title' +In type 7 blocks, the [tag name] can be anything: -[foo] +```````````````````````````````` example + +*bar* + . -

foo

+ +*bar* + ```````````````````````````````` ```````````````````````````````` example -[Foo*bar\]]:my_(url) 'title (with parens)' - -[Foo*bar\]] + +*bar* + . -

Foo*bar]

+ +*bar* + ```````````````````````````````` ```````````````````````````````` example -[Foo bar]: - -'title' - -[Foo bar] + +*bar* . -

Foo bar

+ +*bar* ```````````````````````````````` -The title may extend over multiple lines: +These rules are designed to allow us to work with tags that +can function as either block-level or inline-level tags. +The `` tag is a nice example. We can surround content with +`` tags in three different ways. In this case, we get a raw +HTML block, because the `` tag is on a line by itself: ```````````````````````````````` example -[foo]: /url ' -title -line1 -line2 -' - -[foo] + +*foo* + . -

foo

+ +*foo* + ```````````````````````````````` -However, it may not contain a [blank line]: +In this case, we get a raw HTML block that just includes +the `` tag (because it ends with the following blank +line). So the contents get interpreted as CommonMark: ```````````````````````````````` example -[foo]: /url 'title + -with blank line' +*foo* -[foo] + . -

[foo]: /url 'title

-

with blank line'

-

[foo]

+ +

foo

+
```````````````````````````````` -The title may be omitted: - -```````````````````````````````` example -[foo]: -/url +Finally, in this case, the `` tags are interpreted +as [raw HTML] *inside* the CommonMark paragraph. (Because +the tag is not on a line by itself, we get inline HTML +rather than an [HTML block].) -[foo] +```````````````````````````````` example +*foo* . -

foo

+

foo

```````````````````````````````` -The link destination may not be omitted: +HTML tags designed to contain literal content +(`script`, `style`, `pre`), comments, processing instructions, +and declarations are treated somewhat differently. +Instead of ending at the first blank line, these blocks +end at the first line containing a corresponding end tag. +As a result, these blocks can contain blank lines: + +A pre tag (type 1): ```````````````````````````````` example -[foo]: +

+import Text.HTML.TagSoup
 
-[foo]
+main :: IO ()
+main = print $ parseTags tags
+
+okay . -

[foo]:

-

[foo]

+

+import Text.HTML.TagSoup
+
+main :: IO ()
+main = print $ parseTags tags
+
+

okay

```````````````````````````````` - However, an empty link destination may be specified using - angle brackets: + +A script tag (type 1): ```````````````````````````````` example -[foo]: <> + +okay . -

foo

+ +

okay

```````````````````````````````` -The title must be separated from the link destination by -whitespace: + +A style tag (type 1): ```````````````````````````````` example -[foo]: (baz) + +okay . -

[foo]: (baz)

-

[foo]

+ +

okay

```````````````````````````````` -Both title and destination can contain backslash escapes -and literal backslashes: +If there is no matching end tag, the block will end at the +end of the document (or the enclosing [block quote][block quotes] +or [list item][list items]): ```````````````````````````````` example -[foo]: /url\bar\*baz "foo\"bar\baz" + +*foo* . -

Foo

+ +

foo

```````````````````````````````` ```````````````````````````````` example -[ΑΓΩ]: /φου - -[αγω] +*bar* +*baz* . -

αγω

+*bar* +

baz

```````````````````````````````` -Here is a link reference definition with no corresponding link. -It contributes nothing to the document. +Note that anything on the last line after the +end tag will be included in the [HTML block]: ```````````````````````````````` example -[foo]: /url +1. *bar* . +1. *bar* ```````````````````````````````` -Here is another one: +A comment (type 2): ```````````````````````````````` example -[ -foo -]: /url + +okay . -

bar

+ +

okay

```````````````````````````````` -This is not a link reference definition, because there are -[non-whitespace characters] after the title: + +A processing instruction (type 3): ```````````````````````````````` example -[foo]: /url "title" ok +'; + +?> +okay . -

[foo]: /url "title" ok

+'; + +?> +

okay

```````````````````````````````` -This is a link reference definition, but it has no title: +A declaration (type 4): ```````````````````````````````` example -[foo]: /url -"title" ok + . -

"title" ok

+ ```````````````````````````````` -This is not a link reference definition, because it is indented -four spaces: +CDATA (type 5): ```````````````````````````````` example - [foo]: /url "title" + +okay . -
[foo]: /url "title"
-
-

[foo]

+ +

okay

```````````````````````````````` -This is not a link reference definition, because it occurs inside -a code block: +The opening tag can be indented 1-3 spaces, but not 4: ```````````````````````````````` example -``` -[foo]: /url -``` + -[foo] + . -
[foo]: /url
+  
+
<!-- foo -->
 
-

[foo]

```````````````````````````````` -A [link reference definition] cannot interrupt a paragraph. - ```````````````````````````````` example -Foo -[bar]: /baz +
-[bar] +
. -

Foo -[bar]: /baz

-

[bar]

+
+
<div>
+
```````````````````````````````` -However, it can directly follow other block elements, such as headings -and thematic breaks, and it need not be followed by a blank line. +An HTML block of types 1--6 can interrupt a paragraph, and need not be +preceded by a blank line. ```````````````````````````````` example -# [Foo] -[foo]: /url -> bar -. -

Foo

-
-

bar

-
+Foo +
+bar +
+. +

Foo

+
+bar +
```````````````````````````````` + +However, a following blank line is needed, except at the end of +a document, and except for blocks of types 1--5, [above][HTML +block]: + ```````````````````````````````` example -[foo]: /url +
bar -=== -[foo] +
+*foo* . -

bar

-

foo

+
+bar +
+*foo* ```````````````````````````````` + +HTML blocks of type 7 cannot interrupt a paragraph: + ```````````````````````````````` example -[foo]: /url -=== -[foo] +Foo + +baz . -

=== -foo

+

Foo + +baz

```````````````````````````````` -Several [link reference definitions] -can occur one after another, without intervening blank lines. +This rule differs from John Gruber's original Markdown syntax +specification, which says: -```````````````````````````````` example -[foo]: /foo-url "foo" -[bar]: /bar-url - "bar" -[baz]: /baz-url +> The only restrictions are that block-level HTML elements — +> e.g. `
`, ``, `
`, `

`, etc. — must be separated from +> surrounding content by blank lines, and the start and end tags of the +> block should not be indented with tabs or spaces. -[foo], -[bar], -[baz] -. -

foo, -bar, -baz

-```````````````````````````````` +In some ways Gruber's rule is more restrictive than the one given +here: +- It requires that an HTML block be preceded by a blank line. +- It does not allow the start tag to be indented. +- It requires a matching end tag, which it also does not allow to + be indented. -[Link reference definitions] can occur -inside block containers, like lists and block quotations. They -affect the entire document, not just the container in which they -are defined: +Most Markdown implementations (including some of Gruber's own) do not +respect all of these restrictions. + +There is one respect, however, in which Gruber's rule is more liberal +than the one given here, since it allows blank lines to occur inside +an HTML block. There are two reasons for disallowing them here. +First, it removes the need to parse balanced tags, which is +expensive and can require backtracking from the end of the document +if no matching end tag is found. Second, it provides a very simple +and flexible way of including Markdown content inside HTML tags: +simply separate the Markdown from the HTML using blank lines: + +Compare: ```````````````````````````````` example -[foo] +
-> [foo]: /url +*Emphasized* text. + +
. -

foo

-
-
+
+

Emphasized text.

+
```````````````````````````````` -Whether something is a [link reference definition] is -independent of whether the link reference it defines is -used in the document. Thus, for example, the following -document contains just a link reference definition, and -no visible content: - ```````````````````````````````` example -[foo]: /url +
+*Emphasized* text. +
. +
+*Emphasized* text. +
```````````````````````````````` -## Paragraphs - -A sequence of non-blank lines that cannot be interpreted as other -kinds of blocks forms a [paragraph](@). -The contents of the paragraph are the result of parsing the -paragraph's raw content as inlines. The paragraph's raw content -is formed by concatenating the lines and removing initial and final -[whitespace]. +Some Markdown implementations have adopted a convention of +interpreting content inside tags as text if the open tag has +the attribute `markdown=1`. The rule given above seems a simpler and +more elegant way of achieving the same expressive power, which is also +much simpler to parse. -A simple example with two paragraphs: +The main potential drawback is that one can no longer paste HTML +blocks into Markdown documents with 100% reliability. However, +*in most cases* this will work fine, because the blank lines in +HTML are usually followed by HTML block tags. For example: ```````````````````````````````` example -aaa - -bbb -. -

aaa

-

bbb

-```````````````````````````````` +
+ -Paragraphs can contain multiple lines, but no blank lines: + -```````````````````````````````` example -aaa -bbb + -ccc -ddd +
+Hi +
. -

aaa -bbb

-

ccc -ddd

+ + + + +
+Hi +
```````````````````````````````` -Multiple blank lines between paragraph have no effect: +There are problems, however, if the inner tags are indented +*and* separated by spaces, as then they will be interpreted as +an indented code block: ```````````````````````````````` example -aaa + + -bbb + + + + +
+ Hi +
. -

aaa

-

bbb

+ + +
<td>
+  Hi
+</td>
+
+ +
```````````````````````````````` -Leading spaces are skipped: +Fortunately, blank lines are usually not necessary and can be +deleted. The exception is inside `
` tags, but as described
+[above][HTML blocks], raw HTML blocks starting with `
`
+*can* contain blank lines.
+
+## Link reference definitions
+
+A [link reference definition](@)
+consists of a [link label], indented up to three spaces, followed
+by a colon (`:`), optional [whitespace] (including up to one
+[line ending]), a [link destination],
+optional [whitespace] (including up to one
+[line ending]), and an optional [link
+title], which if it is present must be separated
+from the [link destination] by [whitespace].
+No further [non-whitespace characters] may occur on the line.
+
+A [link reference definition]
+does not correspond to a structural element of a document.  Instead, it
+defines a label which can be used in [reference links]
+and reference-style [images] elsewhere in the document.  [Link
+reference definitions] can come either before or after the links that use
+them.
 
 ```````````````````````````````` example
-  aaa
- bbb
+[foo]: /url "title"
+
+[foo]
 .
-

aaa -bbb

+

foo

```````````````````````````````` -Lines after the first may be indented any amount, since indented -code blocks cannot interrupt paragraphs. - ```````````````````````````````` example -aaa - bbb - ccc + [foo]: + /url + 'the title' + +[foo] . -

aaa -bbb -ccc

+

foo

```````````````````````````````` -However, the first line may be indented at most three spaces, -or an indented code block will be triggered: - ```````````````````````````````` example - aaa -bbb +[Foo*bar\]]:my_(url) 'title (with parens)' + +[Foo*bar\]] . -

aaa -bbb

+

Foo*bar]

```````````````````````````````` ```````````````````````````````` example - aaa -bbb +[Foo bar]: + +'title' + +[Foo bar] . -
aaa
-
-

bbb

+

Foo bar

```````````````````````````````` -Final spaces are stripped before inline parsing, so a paragraph -that ends with two or more spaces will not end with a [hard line -break]: +The title may extend over multiple lines: ```````````````````````````````` example -aaa -bbb +[foo]: /url ' +title +line1 +line2 +' + +[foo] . -

aaa
-bbb

+

foo

```````````````````````````````` -## Blank lines - -[Blank lines] between block-level elements are ignored, -except for the role they play in determining whether a [list] -is [tight] or [loose]. - -Blank lines at the beginning and end of the document are also ignored. +However, it may not contain a [blank line]: ```````````````````````````````` example - - -aaa - +[foo]: /url 'title -# aaa +with blank line' - +[foo] . -

aaa

-

aaa

+

[foo]: /url 'title

+

with blank line'

+

[foo]

```````````````````````````````` +The title may be omitted: -# Container blocks - -A [container block](#container-blocks) is a block that has other -blocks as its contents. There are two basic kinds of container blocks: -[block quotes] and [list items]. -[Lists] are meta-containers for [list items]. - -We define the syntax for container blocks recursively. The general -form of the definition is: - -> If X is a sequence of blocks, then the result of -> transforming X in such-and-such a way is a container of type Y -> with these blocks as its content. +```````````````````````````````` example +[foo]: +/url -So, we explain what counts as a block quote or list item by explaining -how these can be *generated* from their contents. This should suffice -to define the syntax, although it does not give a recipe for *parsing* -these constructions. (A recipe is provided below in the section entitled -[A parsing strategy](#appendix-a-parsing-strategy).) +[foo] +. +

foo

+```````````````````````````````` -## Block quotes -A [block quote marker](@) -consists of 0-3 spaces of initial indent, plus (a) the character `>` together -with a following space, or (b) a single character `>` not followed by a space. +The link destination may not be omitted: -The following rules define [block quotes]: +```````````````````````````````` example +[foo]: -1. **Basic case.** If a string of lines *Ls* constitute a sequence - of blocks *Bs*, then the result of prepending a [block quote - marker] to the beginning of each line in *Ls* - is a [block quote](#block-quotes) containing *Bs*. +[foo] +. +

[foo]:

+

[foo]

+```````````````````````````````` -2. **Laziness.** If a string of lines *Ls* constitute a [block - quote](#block-quotes) with contents *Bs*, then the result of deleting - the initial [block quote marker] from one or - more lines in which the next [non-whitespace character] after the [block - quote marker] is [paragraph continuation - text] is a block quote with *Bs* as its content. - [Paragraph continuation text](@) is text - that will be parsed as part of the content of a paragraph, but does - not occur at the beginning of the paragraph. + However, an empty link destination may be specified using + angle brackets: -3. **Consecutiveness.** A document cannot contain two [block - quotes] in a row unless there is a [blank line] between them. +```````````````````````````````` example +[foo]: <> -Nothing else counts as a [block quote](#block-quotes). +[foo] +. +

foo

+```````````````````````````````` -Here is a simple example: +The title must be separated from the link destination by +whitespace: ```````````````````````````````` example -> # Foo -> bar -> baz +[foo]: (baz) + +[foo] . -
-

Foo

-

bar -baz

-
+

[foo]: (baz)

+

[foo]

```````````````````````````````` -The spaces after the `>` characters can be omitted: +Both title and destination can contain backslash escapes +and literal backslashes: ```````````````````````````````` example -># Foo ->bar -> baz +[foo]: /url\bar\*baz "foo\"bar\baz" + +[foo] . -
-

Foo

-

bar -baz

-
+

foo

```````````````````````````````` -The `>` characters can be indented 1-3 spaces: +A link can come before its corresponding definition: ```````````````````````````````` example - > # Foo - > bar - > baz +[foo] + +[foo]: url . -
-

Foo

-

bar -baz

-
+

foo

```````````````````````````````` -Four spaces gives us a code block: +If there are several matching definitions, the first one takes +precedence: ```````````````````````````````` example - > # Foo - > bar - > baz +[foo] + +[foo]: first +[foo]: second . -
> # Foo
-> bar
-> baz
-
+

foo

```````````````````````````````` -The Laziness clause allows us to omit the `>` before -[paragraph continuation text]: +As noted in the section on [Links], matching of labels is +case-insensitive (see [matches]). ```````````````````````````````` example -> # Foo -> bar -baz +[FOO]: /url + +[Foo] . -
-

Foo

-

bar -baz

-
+

Foo

```````````````````````````````` -A block quote can contain some lazy and some non-lazy -continuation lines: - ```````````````````````````````` example -> bar -baz -> foo +[ΑΓΩ]: /φου + +[αγω] . -
-

bar -baz -foo

-
+

αγω

```````````````````````````````` -Laziness only applies to lines that would have been continuations of -paragraphs had they been prepended with [block quote markers]. -For example, the `> ` cannot be omitted in the second line of - -``` markdown -> foo -> --- -``` - -without changing the meaning: +Here is a link reference definition with no corresponding link. +It contributes nothing to the document. ```````````````````````````````` example -> foo ---- +[foo]: /url . -
-

foo

-
-
```````````````````````````````` -Similarly, if we omit the `> ` in the second line of - -``` markdown -> - foo -> - bar -``` - -then the block quote ends after the first line: +Here is another one: ```````````````````````````````` example -> - foo -- bar +[ +foo +]: /url +bar . -
-
    -
  • foo
  • -
-
-
    -
  • bar
  • -
+

bar

```````````````````````````````` -For the same reason, we can't omit the `> ` in front of -subsequent lines of an indented or fenced code block: +This is not a link reference definition, because there are +[non-whitespace characters] after the title: ```````````````````````````````` example -> foo - bar +[foo]: /url "title" ok . -
-
foo
-
-
-
bar
-
+

[foo]: /url "title" ok

```````````````````````````````` +This is a link reference definition, but it has no title: + ```````````````````````````````` example -> ``` -foo -``` +[foo]: /url +"title" ok . -
-
-
-

foo

-
+

"title" ok

```````````````````````````````` -Note that in the following case, we have a [lazy -continuation line]: +This is not a link reference definition, because it is indented +four spaces: ```````````````````````````````` example -> foo - - bar + [foo]: /url "title" + +[foo] . -
-

foo -- bar

-
+
[foo]: /url "title"
+
+

[foo]

```````````````````````````````` -To see why, note that in +This is not a link reference definition, because it occurs inside +a code block: -```markdown -> foo -> - bar +```````````````````````````````` example +``` +[foo]: /url ``` -the `- bar` is indented too far to start a list, and can't -be an indented code block because indented code blocks cannot -interrupt paragraphs, so it is [paragraph continuation text]. +[foo] +. +
[foo]: /url
+
+

[foo]

+```````````````````````````````` -A block quote can be empty: + +A [link reference definition] cannot interrupt a paragraph. ```````````````````````````````` example -> +Foo +[bar]: /baz + +[bar] . -
-
+

Foo +[bar]: /baz

+

[bar]

```````````````````````````````` +However, it can directly follow other block elements, such as headings +and thematic breaks, and it need not be followed by a blank line. + ```````````````````````````````` example -> -> -> +# [Foo] +[foo]: /url +> bar . +

Foo

+

bar

```````````````````````````````` - -A block quote can have initial or final blank lines: +```````````````````````````````` example +[foo]: /url +bar +=== +[foo] +. +

bar

+

foo

+```````````````````````````````` ```````````````````````````````` example -> -> foo -> +[foo]: /url +=== +[foo] . -
-

foo

-
+

=== +foo

```````````````````````````````` -A blank line always separates block quotes: +Several [link reference definitions] +can occur one after another, without intervening blank lines. ```````````````````````````````` example -> foo +[foo]: /foo-url "foo" +[bar]: /bar-url + "bar" +[baz]: /baz-url -> bar +[foo], +[bar], +[baz] . -
-

foo

-
-
-

bar

-
+

foo, +bar, +baz

```````````````````````````````` -(Most current Markdown implementations, including John Gruber's -original `Markdown.pl`, will parse this example as a single block quote -with two paragraphs. But it seems better to allow the author to decide -whether two block quotes or one are wanted.) - -Consecutiveness means that if we put these block quotes together, -we get a single block quote: +[Link reference definitions] can occur +inside block containers, like lists and block quotations. They +affect the entire document, not just the container in which they +are defined: ```````````````````````````````` example -> foo -> bar +[foo] + +> [foo]: /url . +

foo

-

foo -bar

```````````````````````````````` -To get a block quote with two paragraphs, use: +Whether something is a [link reference definition] is +independent of whether the link reference it defines is +used in the document. Thus, for example, the following +document contains just a link reference definition, and +no visible content: ```````````````````````````````` example -> foo -> -> bar +[foo]: /url . -
-

foo

-

bar

-
```````````````````````````````` -Block quotes can interrupt paragraphs: +## Paragraphs + +A sequence of non-blank lines that cannot be interpreted as other +kinds of blocks forms a [paragraph](@). +The contents of the paragraph are the result of parsing the +paragraph's raw content as inlines. The paragraph's raw content +is formed by concatenating the lines and removing initial and final +[whitespace]. + +A simple example with two paragraphs: ```````````````````````````````` example -foo -> bar +aaa + +bbb . -

foo

-
-

bar

-
+

aaa

+

bbb

```````````````````````````````` -In general, blank lines are not needed before or after block -quotes: +Paragraphs can contain multiple lines, but no blank lines: ```````````````````````````````` example -> aaa -*** -> bbb +aaa +bbb + +ccc +ddd . -
-

aaa

-
-
-
-

bbb

-
+

aaa +bbb

+

ccc +ddd

```````````````````````````````` -However, because of laziness, a blank line is needed between -a block quote and a following paragraph: +Multiple blank lines between paragraphs have no effect: ```````````````````````````````` example -> bar -baz +aaa + + +bbb . -
-

bar -baz

-
+

aaa

+

bbb

```````````````````````````````` -```````````````````````````````` example -> bar +Leading spaces are skipped: -baz +```````````````````````````````` example + aaa + bbb . -
-

bar

-
-

baz

+

aaa +bbb

```````````````````````````````` +Lines after the first may be indented any amount, since indented +code blocks cannot interrupt paragraphs. + ```````````````````````````````` example -> bar -> -baz +aaa + bbb + ccc . -
-

bar

-
-

baz

+

aaa +bbb +ccc

```````````````````````````````` -It is a consequence of the Laziness rule that any number -of initial `>`s may be omitted on a continuation line of a -nested block quote: +However, the first line may be indented at most three spaces, +or an indented code block will be triggered: ```````````````````````````````` example -> > > foo -bar + aaa +bbb . -
-
-
-

foo -bar

-
-
-
+

aaa +bbb

```````````````````````````````` ```````````````````````````````` example ->>> foo -> bar ->>baz + aaa +bbb . -
-
-
-

foo -bar -baz

-
-
-
+
aaa
+
+

bbb

```````````````````````````````` -When including an indented code block in a block quote, -remember that the [block quote marker] includes -both the `>` and a following space. So *five spaces* are needed after -the `>`: +Final spaces are stripped before inline parsing, so a paragraph +that ends with two or more spaces will not end with a [hard line +break]: ```````````````````````````````` example -> code - -> not code +aaa +bbb . -
-
code
-
-
-
-

not code

-
+

aaa
+bbb

```````````````````````````````` +## Blank lines -## List items +[Blank lines] between block-level elements are ignored, +except for the role they play in determining whether a [list] +is [tight] or [loose]. -A [list marker](@) is a -[bullet list marker] or an [ordered list marker]. +Blank lines at the beginning and end of the document are also ignored. -A [bullet list marker](@) -is a `-`, `+`, or `*` character. +```````````````````````````````` example + -An [ordered list marker](@) -is a sequence of 1--9 arabic digits (`0-9`), followed by either a -`.` character or a `)` character. (The reason for the length -limit is that with 10 digits we start seeing integer overflows -in some browsers.) +aaa + -The following rules define [list items]: +# aaa -1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of - blocks *Bs* starting with a [non-whitespace character], and *M* is a - list marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces, then the result - of prepending *M* and the following spaces to the first line of - *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a - list item with *Bs* as its contents. The type of the list item - (bullet or ordered) is determined by the type of its list marker. - If the list item is ordered, then it is also assigned a start - number, based on the ordered list marker. + +. +

aaa

+

aaa

+```````````````````````````````` - Exceptions: - 1. When the first list item in a [list] interrupts - a paragraph---that is, when it starts on a line that would - otherwise count as [paragraph continuation text]---then (a) - the lines *Ls* must not begin with a blank line, and (b) if - the list item is ordered, the start number must be 1. - 2. If any line is a [thematic break][thematic breaks] then - that line is not a list item. -For example, let *Ls* be the lines +# Container blocks -```````````````````````````````` example -A paragraph -with two lines. +A [container block](#container-blocks) is a block that has other +blocks as its contents. There are two basic kinds of container blocks: +[block quotes] and [list items]. +[Lists] are meta-containers for [list items]. - indented code +We define the syntax for container blocks recursively. The general +form of the definition is: -> A block quote. -. -

A paragraph -with two lines.

-
indented code
-
-
-

A block quote.

-
-```````````````````````````````` +> If X is a sequence of blocks, then the result of +> transforming X in such-and-such a way is a container of type Y +> with these blocks as its content. +So, we explain what counts as a block quote or list item by explaining +how these can be *generated* from their contents. This should suffice +to define the syntax, although it does not give a recipe for *parsing* +these constructions. (A recipe is provided below in the section entitled +[A parsing strategy](#appendix-a-parsing-strategy).) -And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says -that the following is an ordered list item with start number 1, -and the same contents as *Ls*: +## Block quotes -```````````````````````````````` example -1. A paragraph - with two lines. +A [block quote marker](@) +consists of 0-3 spaces of initial indent, plus (a) the character `>` together +with a following space, or (b) a single character `>` not followed by a space. - indented code +The following rules define [block quotes]: - > A block quote. -. -
    -
  1. -

    A paragraph -with two lines.

    -
    indented code
    -
    -
    -

    A block quote.

    -
    -
  2. -
-```````````````````````````````` +1. **Basic case.** If a string of lines *Ls* constitute a sequence + of blocks *Bs*, then the result of prepending a [block quote + marker] to the beginning of each line in *Ls* + is a [block quote](#block-quotes) containing *Bs*. +2. **Laziness.** If a string of lines *Ls* constitute a [block + quote](#block-quotes) with contents *Bs*, then the result of deleting + the initial [block quote marker] from one or + more lines in which the next [non-whitespace character] after the [block + quote marker] is [paragraph continuation + text] is a block quote with *Bs* as its content. + [Paragraph continuation text](@) is text + that will be parsed as part of the content of a paragraph, but does + not occur at the beginning of the paragraph. -The most important thing to notice is that the position of -the text after the list marker determines how much indentation -is needed in subsequent blocks in the list item. If the list -marker takes up two spaces, and there are three spaces between -the list marker and the next [non-whitespace character], then blocks -must be indented five spaces in order to fall under the list -item. +3. **Consecutiveness.** A document cannot contain two [block + quotes] in a row unless there is a [blank line] between them. -Here are some examples showing how far content must be indented to be -put under the list item: +Nothing else counts as a [block quote](#block-quotes). -```````````````````````````````` example -- one +Here is a simple example: - two +```````````````````````````````` example +> # Foo +> bar +> baz . -
    -
  • one
  • -
-

two

+
+

Foo

+

bar +baz

+
```````````````````````````````` -```````````````````````````````` example -- one +The spaces after the `>` characters can be omitted: - two +```````````````````````````````` example +># Foo +>bar +> baz . -
    -
  • -

    one

    -

    two

    -
  • -
+
+

Foo

+

bar +baz

+
```````````````````````````````` -```````````````````````````````` example - - one +The `>` characters can be indented 1-3 spaces: - two +```````````````````````````````` example + > # Foo + > bar + > baz . -
    -
  • one
  • -
-
 two
-
+
+

Foo

+

bar +baz

+
```````````````````````````````` -```````````````````````````````` example - - one +Four spaces gives us a code block: - two +```````````````````````````````` example + > # Foo + > bar + > baz . -
    -
  • -

    one

    -

    two

    -
  • -
+
> # Foo
+> bar
+> baz
+
```````````````````````````````` -It is tempting to think of this in terms of columns: the continuation -blocks must be indented at least to the column of the first -[non-whitespace character] after the list marker. However, that is not quite right. -The spaces after the list marker determine how much relative indentation -is needed. Which column this indentation reaches will depend on -how the list item is embedded in other constructions, as shown by -this example: +The Laziness clause allows us to omit the `>` before +[paragraph continuation text]: ```````````````````````````````` example - > > 1. one ->> ->> two +> # Foo +> bar +baz .
-
-
    -
  1. -

    one

    -

    two

    -
  2. -
-
+

Foo

+

bar +baz

```````````````````````````````` -Here `two` occurs in the same column as the list marker `1.`, -but is actually contained in the list item, because there is -sufficient indentation after the last containing blockquote marker. - -The converse is also possible. In the following example, the word `two` -occurs far to the right of the initial text of the list item, `one`, but -it is not considered part of the list item, because it is not indented -far enough past the blockquote marker: +A block quote can contain some lazy and some non-lazy +continuation lines: ```````````````````````````````` example ->>- one ->> - > > two +> bar +baz +> foo .
-
-
    -
  • one
  • -
-

two

-
+

bar +baz +foo

```````````````````````````````` -Note that at least one space is needed between the list marker and -any following content, so these are not list items: +Laziness only applies to lines that would have been continuations of +paragraphs had they been prepended with [block quote markers]. +For example, the `> ` cannot be omitted in the second line of -```````````````````````````````` example --one +``` markdown +> foo +> --- +``` -2.two +without changing the meaning: + +```````````````````````````````` example +> foo +--- . -

-one

-

2.two

+
+

foo

+
+
```````````````````````````````` -A list item may contain blocks that are separated by more than -one blank line. +Similarly, if we omit the `> ` in the second line of -```````````````````````````````` example -- foo +``` markdown +> - foo +> - bar +``` +then the block quote ends after the first line: - bar +```````````````````````````````` example +> - foo +- bar . +
    -
  • -

    foo

    -

    bar

    -
  • +
  • foo
  • +
+
+
    +
  • bar
```````````````````````````````` -A list item may contain any kind of block: +For the same reason, we can't omit the `> ` in front of +subsequent lines of an indented or fenced code block: ```````````````````````````````` example -1. foo - - ``` +> foo bar - ``` - - baz - - > bam . -
    -
  1. -

    foo

    +
    +
    foo
    +
    +
    bar
     
    -

    baz

    +```````````````````````````````` + + +```````````````````````````````` example +> ``` +foo +``` +.
    -

    bam

    +
    -
  2. -
+

foo

+
```````````````````````````````` -A list item that contains an indented code block will preserve -empty lines within the code block verbatim. +Note that in the following case, we have a [lazy +continuation line]: ```````````````````````````````` example -- Foo - - bar +> foo + - bar +. +
+

foo +- bar

+
+```````````````````````````````` - baz -. -
    -
  • -

    Foo

    -
    bar
    +To see why, note that in
     
    +```markdown
    +> foo
    +>     - bar
    +```
     
    -baz
    -
    -
  • -
-```````````````````````````````` +the `- bar` is indented too far to start a list, and can't +be an indented code block because indented code blocks cannot +interrupt paragraphs, so it is [paragraph continuation text]. -Note that ordered list start numbers must be nine digits or less: +A block quote can be empty: ```````````````````````````````` example -123456789. ok +> . -
    -
  1. ok
  2. -
+
+
```````````````````````````````` ```````````````````````````````` example -1234567890. not ok +> +> +> . -

1234567890. not ok

+
+
```````````````````````````````` -A start number may begin with 0s: +A block quote can have initial or final blank lines: ```````````````````````````````` example -0. ok +> +> foo +> . -
    -
  1. ok
  2. -
+
+

foo

+
```````````````````````````````` +A blank line always separates block quotes: + ```````````````````````````````` example -003. ok +> foo + +> bar . -
    -
  1. ok
  2. -
+
+

foo

+
+
+

bar

+
```````````````````````````````` -A start number may not be negative: +(Most current Markdown implementations, including John Gruber's +original `Markdown.pl`, will parse this example as a single block quote +with two paragraphs. But it seems better to allow the author to decide +whether two block quotes or one are wanted.) + +Consecutiveness means that if we put these block quotes together, +we get a single block quote: ```````````````````````````````` example --1. not ok +> foo +> bar . -

-1. not ok

+
+

foo +bar

+
```````````````````````````````` - -2. **Item starting with indented code.** If a sequence of lines *Ls* - constitute a sequence of blocks *Bs* starting with an indented code - block, and *M* is a list marker of width *W* followed by - one space, then the result of prepending *M* and the following - space to the first line of *Ls*, and indenting subsequent lines of - *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. - If a line is empty, then it need not be indented. The type of the - list item (bullet or ordered) is determined by the type of its list - marker. If the list item is ordered, then it is also assigned a - start number, based on the ordered list marker. - -An indented code block will have to be indented four spaces beyond -the edge of the region where text will be included in the list item. -In the following case that is 6 spaces: +To get a block quote with two paragraphs, use: ```````````````````````````````` example -- foo - - bar +> foo +> +> bar . -
    -
  • +

    foo

    -
    bar
    -
    -
  • -
+

bar

+ ```````````````````````````````` -And in this case it is 11 spaces: +Block quotes can interrupt paragraphs: ```````````````````````````````` example - 10. foo - - bar +foo +> bar . -
    -
  1. foo

    -
    bar
    -
    -
  2. -
+
+

bar

+
```````````````````````````````` -If the *first* block in the list item is an indented code block, -then by rule #2, the contents must be indented *one* space after the -list marker: +In general, blank lines are not needed before or after block +quotes: ```````````````````````````````` example - indented code - -paragraph - - more code -. -
indented code
-
-

paragraph

-
more code
-
-```````````````````````````````` - - -```````````````````````````````` example -1. indented code - - paragraph - - more code -. -
    -
  1. -
    indented code
    -
    -

    paragraph

    -
    more code
    -
    -
  2. -
-```````````````````````````````` - - -Note that an additional space indent is interpreted as space -inside the code block: - -```````````````````````````````` example -1. indented code - - paragraph - - more code +> aaa +*** +> bbb . -
    -
  1. -
     indented code
    -
    -

    paragraph

    -
    more code
    -
    -
  2. -
+
+

aaa

+
+
+
+

bbb

+
```````````````````````````````` -Note that rules #1 and #2 only apply to two cases: (a) cases -in which the lines to be included in a list item begin with a -[non-whitespace character], and (b) cases in which -they begin with an indented code -block. In a case like the following, where the first block begins with -a three-space indent, the rules do not allow us to form a list item by -indenting the whole thing and prepending a list marker: +However, because of laziness, a blank line is needed between +a block quote and a following paragraph: ```````````````````````````````` example - foo - -bar +> bar +baz . -

foo

-

bar

+
+

bar +baz

+
```````````````````````````````` ```````````````````````````````` example -- foo +> bar - bar +baz . -
    -
  • foo
  • -
+

bar

+
+

baz

```````````````````````````````` -This is not a significant restriction, because when a block begins -with 1-3 spaces indent, the indentation can always be removed without -a change in interpretation, allowing rule #1 to be applied. So, in -the above case: - ```````````````````````````````` example -- foo - - bar +> bar +> +baz . -
    -
  • -

    foo

    +

    bar

    -
  • -
+ +

baz

```````````````````````````````` -3. **Item starting with a blank line.** If a sequence of lines *Ls* - starting with a single [blank line] constitute a (possibly empty) - sequence of blocks *Bs*, not separated from each other by more than - one blank line, and *M* is a list marker of width *W*, - then the result of prepending *M* to the first line of *Ls*, and - indenting subsequent lines of *Ls* by *W + 1* spaces, is a list - item with *Bs* as its contents. - If a line is empty, then it need not be indented. The type of the - list item (bullet or ordered) is determined by the type of its list - marker. If the list item is ordered, then it is also assigned a - start number, based on the ordered list marker. - -Here are some list items that start with a blank line but are not empty: +It is a consequence of the Laziness rule that any number +of initial `>`s may be omitted on a continuation line of a +nested block quote: ```````````````````````````````` example -- - foo -- - ``` - bar - ``` -- - baz +> > > foo +bar . -
    -
  • foo
  • -
  • -
    bar
    -
    -
  • -
  • -
    baz
    -
    -
  • -
+
+
+
+

foo +bar

+
+
+
```````````````````````````````` -When the list item starts with a blank line, the number of spaces -following the list marker doesn't change the required indentation: ```````````````````````````````` example -- - foo +>>> foo +> bar +>>baz . -
    -
  • foo
  • -
+
+
+
+

foo +bar +baz

+
+
+
```````````````````````````````` -A list item can begin with at most one blank line. -In the following example, `foo` is not part of the list -item: +When including an indented code block in a block quote, +remember that the [block quote marker] includes +both the `>` and a following space. So *five spaces* are needed after +the `>`: ```````````````````````````````` example -- +> code - foo +> not code . -
    -
  • -
-

foo

+
+
code
+
+
+
+

not code

+
```````````````````````````````` -Here is an empty bullet list item: -```````````````````````````````` example -- foo -- -- bar -. -
    -
  • foo
  • -
  • -
  • bar
  • -
-```````````````````````````````` +## List items +A [list marker](@) is a +[bullet list marker] or an [ordered list marker]. -It does not matter whether there are spaces following the [list marker]: +A [bullet list marker](@) +is a `-`, `+`, or `*` character. -```````````````````````````````` example -- foo -- -- bar -. -
    -
  • foo
  • -
  • -
  • bar
  • -
-```````````````````````````````` +An [ordered list marker](@) +is a sequence of 1--9 arabic digits (`0-9`), followed by either a +`.` character or a `)` character. (The reason for the length +limit is that with 10 digits we start seeing integer overflows +in some browsers.) +The following rules define [list items]: -Here is an empty ordered list item: +1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of + blocks *Bs* starting with a [non-whitespace character], and *M* is a + list marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces, then the result + of prepending *M* and the following spaces to the first line of + *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a + list item with *Bs* as its contents. The type of the list item + (bullet or ordered) is determined by the type of its list marker. + If the list item is ordered, then it is also assigned a start + number, based on the ordered list marker. -```````````````````````````````` example -1. foo -2. -3. bar -. -
    -
  1. foo
  2. -
  3. -
  4. bar
  5. -
-```````````````````````````````` + Exceptions: + 1. When the first list item in a [list] interrupts + a paragraph---that is, when it starts on a line that would + otherwise count as [paragraph continuation text]---then (a) + the lines *Ls* must not begin with a blank line, and (b) if + the list item is ordered, the start number must be 1. + 2. If any line is a [thematic break][thematic breaks] then + that line is not a list item. -A list may start or end with an empty list item: +For example, let *Ls* be the lines ```````````````````````````````` example -* -. -
    -
  • -
-```````````````````````````````` - -However, an empty list item cannot interrupt a paragraph: - -```````````````````````````````` example -foo -* - -foo -1. -. -

foo -*

-

foo -1.

-```````````````````````````````` - - -4. **Indentation.** If a sequence of lines *Ls* constitutes a list item - according to rule #1, #2, or #3, then the result of indenting each line - of *Ls* by 1-3 spaces (the same for each line) also constitutes a - list item with the same contents and attributes. If a line is - empty, then it need not be indented. - -Indented one space: - -```````````````````````````````` example - 1. A paragraph - with two lines. +A paragraph +with two lines. - indented code + indented code - > A block quote. +> A block quote. . -
    -
  1. A paragraph with two lines.

    indented code
    @@ -4382,20 +4135,20 @@ with two lines.

    A block quote.

    -
  2. -
```````````````````````````````` -Indented two spaces: +And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says +that the following is an ordered list item with start number 1, +and the same contents as *Ls*: ```````````````````````````````` example - 1. A paragraph - with two lines. +1. A paragraph + with two lines. - indented code + indented code - > A block quote. + > A block quote. .
  1. @@ -4411,658 +4164,750 @@ with two lines.

    ```````````````````````````````` -Indented three spaces: +The most important thing to notice is that the position of +the text after the list marker determines how much indentation +is needed in subsequent blocks in the list item. If the list +marker takes up two spaces, and there are three spaces between +the list marker and the next [non-whitespace character], then blocks +must be indented five spaces in order to fall under the list +item. -```````````````````````````````` example - 1. A paragraph - with two lines. +Here are some examples showing how far content must be indented to be +put under the list item: - indented code +```````````````````````````````` example +- one - > A block quote. + two . -
      -
    1. -

      A paragraph -with two lines.

      -
      indented code
      -
      -
      -

      A block quote.

      -
      -
    2. -
    +
      +
    • one
    • +
    +

    two

    ```````````````````````````````` -Four spaces indent gives a code block: - ```````````````````````````````` example - 1. A paragraph - with two lines. - - indented code +- one - > A block quote. + two . -
    1.  A paragraph
    -    with two lines.
    -
    -        indented code
    -
    -    > A block quote.
    -
    +
      +
    • +

      one

      +

      two

      +
    • +
    ```````````````````````````````` - -5. **Laziness.** If a string of lines *Ls* constitute a [list - item](#list-items) with contents *Bs*, then the result of deleting - some or all of the indentation from one or more lines in which the - next [non-whitespace character] after the indentation is - [paragraph continuation text] is a - list item with the same contents and attributes. The unindented - lines are called - [lazy continuation line](@)s. - -Here is an example with [lazy continuation lines]: - ```````````````````````````````` example - 1. A paragraph -with two lines. - - indented code + - one - > A block quote. + two . -
      -
    1. -

      A paragraph -with two lines.

      -
      indented code
      +
        +
      • one
      • +
      +
       two
       
      -
      -

      A block quote.

      -
      -
    2. -
    ```````````````````````````````` -Indentation can be partially deleted: - ```````````````````````````````` example - 1. A paragraph - with two lines. + - one + + two . -
      -
    1. A paragraph -with two lines.
    2. -
    +
      +
    • +

      one

      +

      two

      +
    • +
    ```````````````````````````````` -These examples show how laziness can work in nested structures: +It is tempting to think of this in terms of columns: the continuation +blocks must be indented at least to the column of the first +[non-whitespace character] after the list marker. However, that is not quite right. +The spaces after the list marker determine how much relative indentation +is needed. Which column this indentation reaches will depend on +how the list item is embedded in other constructions, as shown by +this example: ```````````````````````````````` example -> 1. > Blockquote -continued here. + > > 1. one +>> +>> two .
    +
    1. -
      -

      Blockquote -continued here.

      -
      +

      one

      +

      two

    +
    ```````````````````````````````` +Here `two` occurs in the same column as the list marker `1.`, +but is actually contained in the list item, because there is +sufficient indentation after the last containing blockquote marker. + +The converse is also possible. In the following example, the word `two` +occurs far to the right of the initial text of the list item, `one`, but +it is not considered part of the list item, because it is not indented +far enough past the blockquote marker: + ```````````````````````````````` example -> 1. > Blockquote -> continued here. +>>- one +>> + > > two .
    -
      -
    1. -

      Blockquote -continued here.

      +
        +
      • one
      • +
      +

      two

      -
    2. -
    ```````````````````````````````` +Note that at least one space is needed between the list marker and +any following content, so these are not list items: + +```````````````````````````````` example +-one -6. **That's all.** Nothing that is not counted as a list item by rules - #1--5 counts as a [list item](#list-items). +2.two +. +

    -one

    +

    2.two

    +```````````````````````````````` -The rules for sublists follow from the general rules -[above][List items]. A sublist must be indented the same number -of spaces a paragraph would need to be in order to be included -in the list item. -So, in this case we need two spaces indent: +A list item may contain blocks that are separated by more than +one blank line. ```````````````````````````````` example - foo - - bar - - baz - - boo + + + bar .
      -
    • foo -
        -
      • bar -
          -
        • baz -
            -
          • boo
          • -
          -
        • -
        -
      • -
      +
    • +

      foo

      +

      bar

    ```````````````````````````````` -One is not enough: +A list item may contain any kind of block: ```````````````````````````````` example -- foo - - bar - - baz - - boo -. -
      -
    • foo
    • -
    • bar
    • -
    • baz
    • -
    • boo
    • -
    -```````````````````````````````` +1. foo + ``` + bar + ``` -Here we need four, because the list marker is wider: + baz -```````````````````````````````` example -10) foo - - bar + > bam . -
      -
    1. foo -
        -
      • bar
      • -
      +
        +
      1. +

        foo

        +
        bar
        +
        +

        baz

        +
        +

        bam

        +
      ```````````````````````````````` -Three is not enough: +A list item that contains an indented code block will preserve +empty lines within the code block verbatim. ```````````````````````````````` example -10) foo - - bar -. -
        -
      1. foo
      2. -
      -
        -
      • bar
      • -
      -```````````````````````````````` +- Foo + bar -A list may be the first block in a list item: -```````````````````````````````` example -- - foo + baz .
      • -
          -
        • foo
        • -
        +

        Foo

        +
        bar
        +
        +
        +baz
        +
      ```````````````````````````````` +Note that ordered list start numbers must be nine digits or less: ```````````````````````````````` example -1. - 2. foo +123456789. ok . -
        -
      1. -
          -
        • -
            -
          1. foo
          2. -
          -
        • -
        -
      2. +
          +
        1. ok
        ```````````````````````````````` -A list item can contain a heading: - ```````````````````````````````` example -- # Foo -- Bar - --- - baz +1234567890. not ok . -
          -
        • -

          Foo

          -
        • -
        • -

          Bar

          -baz
        • -
        +

        1234567890. not ok

        ```````````````````````````````` -### Motivation - -John Gruber's Markdown spec says the following about list items: - -1. "List markers typically start at the left margin, but may be indented - by up to three spaces. List markers must be followed by one or more - spaces or a tab." +A start number may begin with 0s: -2. "To make lists look nice, you can wrap items with hanging indents.... - But if you don't want to, you don't have to." +```````````````````````````````` example +0. ok +. +
          +
        1. ok
        2. +
        +```````````````````````````````` -3. "List items may consist of multiple paragraphs. Each subsequent - paragraph in a list item must be indented by either 4 spaces or one - tab." -4. "It looks nice if you indent every line of the subsequent paragraphs, - but here again, Markdown will allow you to be lazy." +```````````````````````````````` example +003. ok +. +
          +
        1. ok
        2. +
        +```````````````````````````````` -5. "To put a blockquote within a list item, the blockquote's `>` - delimiters need to be indented." -6. "To put a code block within a list item, the code block needs to be - indented twice — 8 spaces or two tabs." +A start number may not be negative: -These rules specify that a paragraph under a list item must be indented -four spaces (presumably, from the left margin, rather than the start of -the list marker, but this is not said), and that code under a list item -must be indented eight spaces instead of the usual four. They also say -that a block quote must be indented, but not by how much; however, the -example given has four spaces indentation. Although nothing is said -about other kinds of block-level content, it is certainly reasonable to -infer that *all* block elements under a list item, including other -lists, must be indented four spaces. This principle has been called the -*four-space rule*. +```````````````````````````````` example +-1. not ok +. +

        -1. not ok

        +```````````````````````````````` -The four-space rule is clear and principled, and if the reference -implementation `Markdown.pl` had followed it, it probably would have -become the standard. However, `Markdown.pl` allowed paragraphs and -sublists to start with only two spaces indentation, at least on the -outer level. Worse, its behavior was inconsistent: a sublist of an -outer-level list needed two spaces indentation, but a sublist of this -sublist needed three spaces. It is not surprising, then, that different -implementations of Markdown have developed very different rules for -determining what comes under a list item. (Pandoc and python-Markdown, -for example, stuck with Gruber's syntax description and the four-space -rule, while discount, redcarpet, marked, PHP Markdown, and others -followed `Markdown.pl`'s behavior more closely.) -Unfortunately, given the divergences between implementations, there -is no way to give a spec for list items that will be guaranteed not -to break any existing documents. However, the spec given here should -correctly handle lists formatted with either the four-space rule or -the more forgiving `Markdown.pl` behavior, provided they are laid out -in a way that is natural for a human to read. -The strategy here is to let the width and indentation of the list marker -determine the indentation necessary for blocks to fall under the list -item, rather than having a fixed and arbitrary number. The writer can -think of the body of the list item as a unit which gets indented to the -right enough to fit the list marker (and any indentation on the list -marker). (The laziness rule, #5, then allows continuation lines to be -unindented if needed.) +2. 
**Item starting with indented code.** If a sequence of lines *Ls* + constitute a sequence of blocks *Bs* starting with an indented code + block, and *M* is a list marker of width *W* followed by + one space, then the result of prepending *M* and the following + space to the first line of *Ls*, and indenting subsequent lines of + *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. + If a line is empty, then it need not be indented. The type of the + list item (bullet or ordered) is determined by the type of its list + marker. If the list item is ordered, then it is also assigned a + start number, based on the ordered list marker. -This rule is superior, we claim, to any rule requiring a fixed level of -indentation from the margin. The four-space rule is clear but -unnatural. It is quite unintuitive that +An indented code block will have to be indented four spaces beyond +the edge of the region where text will be included in the list item. +In the following case that is 6 spaces: -``` markdown +```````````````````````````````` example - foo - bar - - - baz -``` - -should be parsed as two lists with an intervening paragraph, - -``` html -
          -
        • foo
        • -
        -

        bar

        -
          -
        • baz
        • -
        -``` - -as the four-space rule demands, rather than a single list, - -``` html + bar +.
        • foo

          -

          bar

          -
            -
          • baz
          • -
          +
          bar
          +
        -``` - -The choice of four spaces is arbitrary. It can be learned, but it is -not likely to be guessed, and it trips up beginners regularly. - -Would it help to adopt a two-space rule? The problem is that such -a rule, together with the rule allowing 1--3 spaces indentation of the -initial list marker, allows text that is indented *less than* the -original list marker to be included in the list item. For example, -`Markdown.pl` parses +```````````````````````````````` -``` markdown - - one - two -``` +And in this case it is 11 spaces: -as a single list item, with `two` a continuation paragraph: +```````````````````````````````` example + 10. foo -``` html -
          + bar +. +
          1. -

            one

            -

            two

            +

            foo

            +
            bar
            +
          2. -
        -``` +
      +```````````````````````````````` -and similarly -``` markdown -> - one -> -> two -``` +If the *first* block in the list item is an indented code block, +then by rule #2, the contents must be indented *one* space after the +list marker: -as +```````````````````````````````` example + indented code -``` html -
      -
        -
      • -

        one

        -

        two

        +paragraph + + more code +. +
        indented code
        +
        +

        paragraph

        +
        more code
        +
        +```````````````````````````````` + + +```````````````````````````````` example +1. indented code + + paragraph + + more code +. +
          +
        1. +
          indented code
          +
          +

          paragraph

          +
          more code
          +
        2. +
        +```````````````````````````````` + + +Note that an additional space indent is interpreted as space +inside the code block: + +```````````````````````````````` example +1. indented code + + paragraph + + more code +. +
          +
        1. +
           indented code
          +
          +

          paragraph

          +
          more code
          +
          +
        2. +
        +```````````````````````````````` + + +Note that rules #1 and #2 only apply to two cases: (a) cases +in which the lines to be included in a list item begin with a +[non-whitespace character], and (b) cases in which +they begin with an indented code +block. In a case like the following, where the first block begins with +a three-space indent, the rules do not allow us to form a list item by +indenting the whole thing and prepending a list marker: + +```````````````````````````````` example + foo + +bar +. +

        foo

        +

        bar

        +```````````````````````````````` + + +```````````````````````````````` example +- foo + + bar +. +
          +
        • foo
        -
      -``` +

      bar

      +```````````````````````````````` -This is extremely unintuitive. -Rather than requiring a fixed indent from the margin, we could require -a fixed indent (say, two spaces, or even one space) from the list marker (which -may itself be indented). This proposal would remove the last anomaly -discussed. Unlike the spec presented above, it would count the following -as a list item with a subparagraph, even though the paragraph `bar` -is not indented as far as the first paragraph `foo`: +This is not a significant restriction, because when a block begins +with 1-3 spaces indent, the indentation can always be removed without +a change in interpretation, allowing rule #1 to be applied. So, in +the above case: -``` markdown - 10. foo +```````````````````````````````` example +- foo - bar -``` + bar +. +
        +
      • +

        foo

        +

        bar

        +
      • +
      +```````````````````````````````` -Arguably this text does read like a list item with `bar` as a subparagraph, -which may count in favor of the proposal. However, on this proposal indented -code would have to be indented six spaces after the list marker. And this -would break a lot of existing Markdown, which has the pattern: -``` markdown -1. foo +3. **Item starting with a blank line.** If a sequence of lines *Ls* + starting with a single [blank line] constitute a (possibly empty) + sequence of blocks *Bs*, not separated from each other by more than + one blank line, and *M* is a list marker of width *W*, + then the result of prepending *M* to the first line of *Ls*, and + indenting subsequent lines of *Ls* by *W + 1* spaces, is a list + item with *Bs* as its contents. + If a line is empty, then it need not be indented. The type of the + list item (bullet or ordered) is determined by the type of its list + marker. If the list item is ordered, then it is also assigned a + start number, based on the ordered list marker. - indented code -``` +Here are some list items that start with a blank line but are not empty: -where the code is indented eight spaces. The spec above, by contrast, will -parse this text as expected, since the code block's indentation is measured -from the beginning of `foo`. +```````````````````````````````` example +- + foo +- + ``` + bar + ``` +- + baz +. +
        +
      • foo
      • +
      • +
        bar
        +
        +
      • +
      • +
        baz
        +
        +
      • +
      +```````````````````````````````` -The one case that needs special treatment is a list item that *starts* -with indented code. How much indentation is required in that case, since -we don't have a "first paragraph" to measure from? Rule #2 simply stipulates -that in such cases, we require one space indentation from the list marker -(and then the normal four spaces for the indented code). This will match the -four-space rule in cases where the list marker plus its initial indentation -takes four spaces (a common case), but diverge in other cases. +When the list item starts with a blank line, the number of spaces +following the list marker doesn't change the required indentation: -## Lists +```````````````````````````````` example +- + foo +. +
        +
      • foo
      • +
      +```````````````````````````````` -A [list](@) is a sequence of one or more -list items [of the same type]. The list items -may be separated by any number of blank lines. -Two list items are [of the same type](@) -if they begin with a [list marker] of the same type. -Two list markers are of the -same type if (a) they are bullet list markers using the same character -(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same -delimiter (either `.` or `)`). +A list item can begin with at most one blank line. +In the following example, `foo` is not part of the list +item: -A list is an [ordered list](@) -if its constituent list items begin with -[ordered list markers], and a -[bullet list](@) if its constituent list -items begin with [bullet list markers]. +```````````````````````````````` example +- -The [start number](@) -of an [ordered list] is determined by the list number of -its initial list item. The numbers of subsequent list items are -disregarded. + foo +. +
        +
      • +
      +

      foo

      +```````````````````````````````` -A list is [loose](@) if any of its constituent -list items are separated by blank lines, or if any of its constituent -list items directly contain two block-level elements with a blank line -between them. Otherwise a list is [tight](@). -(The difference in HTML output is that paragraphs in a loose list are -wrapped in `

      ` tags, while paragraphs in a tight list are not.) -Changing the bullet or ordered list delimiter starts a new list: +Here is an empty bullet list item: ```````````````````````````````` example - foo +- - bar -+ baz .

      • foo
      • +
      • bar
      +```````````````````````````````` + + +It does not matter whether there are spaces following the [list marker]: + +```````````````````````````````` example +- foo +- +- bar +.
        -
      • baz
      • +
      • foo
      • +
      • +
      • bar
      ```````````````````````````````` +Here is an empty ordered list item: + ```````````````````````````````` example 1. foo -2. bar -3) baz +2. +3. bar .
      1. foo
      2. +
      3. bar
      -
        -
      1. baz
      2. -
      ```````````````````````````````` -In CommonMark, a list can interrupt a paragraph. That is, -no blank line is needed to separate a paragraph from a following -list: +A list may start or end with an empty list item: ```````````````````````````````` example -Foo -- bar -- baz +* . -

      Foo

        -
      • bar
      • -
      • baz
      • +
      ```````````````````````````````` -`Markdown.pl` does not allow this, through fear of triggering a list -via a numeral in a hard-wrapped line: +However, an empty list item cannot interrupt a paragraph: -``` markdown -The number of windows in my house is -14. The number of doors is 6. -``` +```````````````````````````````` example +foo +* -Oddly, though, `Markdown.pl` *does* allow a blockquote to -interrupt a paragraph, even though the same considerations might -apply. +foo +1. +. +

      foo +*

      +

      foo +1.

      +```````````````````````````````` -In CommonMark, we do allow lists to interrupt paragraphs, for -two reasons. First, it is natural and not uncommon for people -to start lists without blank lines: -``` markdown -I need to buy -- new shoes -- a coat -- a plane ticket -``` +4. **Indentation.** If a sequence of lines *Ls* constitutes a list item + according to rule #1, #2, or #3, then the result of indenting each line + of *Ls* by 1-3 spaces (the same for each line) also constitutes a + list item with the same contents and attributes. If a line is + empty, then it need not be indented. -Second, we are attracted to a +Indented one space: -> [principle of uniformity](@): -> if a chunk of text has a certain -> meaning, it will continue to have the same meaning when put into a -> container block (such as a list item or blockquote). +```````````````````````````````` example + 1. A paragraph + with two lines. -(Indeed, the spec for [list items] and [block quotes] presupposes -this principle.) This principle implies that if + indented code -``` markdown - * I need to buy - - new shoes - - a coat - - a plane ticket -``` + > A block quote. +. +
        +
      1. +

        A paragraph +with two lines.

        +
        indented code
        +
        +
        +

        A block quote.

        +
        +
      2. +
      +```````````````````````````````` -is a list item containing a paragraph followed by a nested sublist, -as all Markdown implementations agree it is (though the paragraph -may be rendered without `

      ` tags, since the list is "tight"), -then -``` markdown -I need to buy -- new shoes -- a coat -- a plane ticket -``` +Indented two spaces: -by itself should be a paragraph followed by a nested sublist. +```````````````````````````````` example + 1. A paragraph + with two lines. -Since it is well established Markdown practice to allow lists to -interrupt paragraphs inside list items, the [principle of -uniformity] requires us to allow this outside list items as -well. ([reStructuredText](http://docutils.sourceforge.net/rst.html) -takes a different approach, requiring blank lines before lists -even inside other list items.) + indented code + + > A block quote. +. +

        +
      1. +

        A paragraph +with two lines.

        +
        indented code
        +
        +
        +

        A block quote.

        +
        +
      2. +
      +```````````````````````````````` + + +Indented three spaces: + +```````````````````````````````` example + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +
        +
      1. +

        A paragraph +with two lines.

        +
        indented code
        +
        +
        +

        A block quote.

        +
        +
      2. +
      +```````````````````````````````` + + +Four spaces indent gives a code block: + +```````````````````````````````` example + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +
      1.  A paragraph
      +    with two lines.
      +
      +        indented code
      +
      +    > A block quote.
      +
      +```````````````````````````````` + + + +5. **Laziness.** If a string of lines *Ls* constitute a [list + item](#list-items) with contents *Bs*, then the result of deleting + some or all of the indentation from one or more lines in which the + next [non-whitespace character] after the indentation is + [paragraph continuation text] is a + list item with the same contents and attributes. The unindented + lines are called + [lazy continuation line](@)s. -In order to solve of unwanted lists in paragraphs with -hard-wrapped numerals, we allow only lists starting with `1` to -interrupt paragraphs. Thus, +Here is an example with [lazy continuation lines]: ```````````````````````````````` example -The number of windows in my house is -14. The number of doors is 6. -. -

      The number of windows in my house is -14. The number of doors is 6.

      -```````````````````````````````` + 1. A paragraph +with two lines. -We may still get an unintended result in cases like + indented code -```````````````````````````````` example -The number of windows in my house is -1. The number of doors is 6. + > A block quote. . -

      The number of windows in my house is

        -
      1. The number of doors is 6.
      2. +
      3. +

        A paragraph +with two lines.

        +
        indented code
        +
        +
        +

        A block quote.

        +
        +
      ```````````````````````````````` -but this rule should prevent most spurious list captures. -There can be any number of blank lines between items: +Indentation can be partially deleted: ```````````````````````````````` example -- foo + 1. A paragraph + with two lines. +. +
        +
      1. A paragraph +with two lines.
      2. +
      +```````````````````````````````` -- bar +These examples show how laziness can work in nested structures: -- baz +```````````````````````````````` example +> 1. > Blockquote +continued here. . -
        -
      • -

        foo

        -
      • +
        +
        1. -

          bar

          +
          +

          Blockquote +continued here.

          +
        2. +
        +
        +```````````````````````````````` + + +```````````````````````````````` example +> 1. > Blockquote +> continued here. +. +
        +
        1. -

          baz

          +
          +

          Blockquote +continued here.

          +
        2. -
      +
    + ```````````````````````````````` + + +6. **That's all.** Nothing that is not counted as a list item by rules + #1--5 counts as a [list item](#list-items). + +The rules for sublists follow from the general rules +[above][List items]. A sublist must be indented the same number +of spaces a paragraph would need to be in order to be included +in the list item. + +So, in this case we need two spaces indent: + ```````````````````````````````` example - foo - bar - baz - - - bim + - boo .
    • foo
      • bar
          -
        • -

          baz

          -

          bim

          +
        • baz +
            +
          • boo
          • +
      • @@ -5072,778 +4917,938 @@ There can be any number of blank lines between items: ```````````````````````````````` -To separate consecutive lists of the same type, or to separate a -list from an indented code block that would otherwise be parsed -as a subparagraph of the final list item, you can insert a blank HTML -comment: +One is not enough: ```````````````````````````````` example - foo -- bar - - - -- baz -- bim + - bar + - baz + - boo .
        • foo
        • bar
        • -
        - -
        • baz
        • -
        • bim
        • -
        -```````````````````````````````` - - -```````````````````````````````` example -- foo - - notcode - -- foo - - - - code -. -
          -
        • -

          foo

          -

          notcode

          -
        • -
        • -

          foo

          -
        • -
        - -
        code
        -
        -```````````````````````````````` - - -List items need not be indented to the same level. The following -list items will be treated as items at the same list level, -since none is indented enough to belong to the previous list -item: - -```````````````````````````````` example -- a - - b - - c - - d - - e - - f -- g -. -
          -
        • a
        • -
        • b
        • -
        • c
        • -
        • d
        • -
        • e
        • -
        • f
        • -
        • g
        • +
        • boo
        ```````````````````````````````` -```````````````````````````````` example -1. a - - 2. b - - 3. c -. -
          -
        1. -

          a

          -
        2. -
        3. -

          b

          -
        4. -
        5. -

          c

          -
        6. -
        -```````````````````````````````` - -Note, however, that list items may not be indented more than -three spaces. Here `- e` is treated as a paragraph continuation -line, because it is indented more than three spaces: +Here we need four, because the list marker is wider: ```````````````````````````````` example -- a - - b - - c - - d - - e +10) foo + - bar . +
          +
        1. foo
            -
          • a
          • -
          • b
          • -
          • c
          • -
          • d -- e
          • +
          • bar
          -```````````````````````````````` - -And here, `3. c` is treated as in indented code block, -because it is indented four spaces and preceded by a -blank line. - -```````````````````````````````` example -1. a - - 2. b - - 3. c -. -
            -
          1. -

            a

            -
          2. -
          3. -

            b

          -
          3. c
          -
          ```````````````````````````````` -This is a loose list, because there is a blank line between -two of the list items: +Three is not enough: ```````````````````````````````` example -- a -- b - -- c +10) foo + - bar . -
            -
          • -

            a

            -
          • -
          • -

            b

            -
          • -
          • -

            c

            -
          • +
              +
            1. foo
            2. +
            +
              +
            • bar
            ```````````````````````````````` -So is this, with a empty second item: +A list may be the first block in a list item: ```````````````````````````````` example -* a -* - -* c +- - foo .
            • -

              a

              -
            • -
            • -
            • -

              c

              +
                +
              • foo
              • +
            ```````````````````````````````` -These are loose lists, even though there is no space between the items, -because one of the items directly contains two block-level elements -with a blank line between them: - ```````````````````````````````` example -- a -- b - - c -- d +1. - 2. foo . -
              -
            • -

              a

              -
            • +
              1. -

                b

                -

                c

                -
              2. +
                • -

                  d

                  +
                    +
                  1. foo
                  2. +
                + +
              ```````````````````````````````` -```````````````````````````````` example -- a -- b +A list item can contain a heading: - [ref]: /url -- d +```````````````````````````````` example +- # Foo +- Bar + --- + baz .
              • -

                a

                -
              • -
              • -

                b

                +

                Foo

              • -

                d

                -
              • +

                Bar

                +baz
              ```````````````````````````````` -This is a tight list, because the blank lines are in a code block: +### Motivation -```````````````````````````````` example -- a -- ``` - b +John Gruber's Markdown spec says the following about list items: + +1. "List markers typically start at the left margin, but may be indented + by up to three spaces. List markers must be followed by one or more + spaces or a tab." +2. "To make lists look nice, you can wrap items with hanging indents.... + But if you don't want to, you don't have to." - ``` -- c -. +3. "List items may consist of multiple paragraphs. Each subsequent + paragraph in a list item must be indented by either 4 spaces or one + tab." + +4. "It looks nice if you indent every line of the subsequent paragraphs, + but here again, Markdown will allow you to be lazy." + +5. "To put a blockquote within a list item, the blockquote's `>` + delimiters need to be indented." + +6. "To put a code block within a list item, the code block needs to be + indented twice — 8 spaces or two tabs." + +These rules specify that a paragraph under a list item must be indented +four spaces (presumably, from the left margin, rather than the start of +the list marker, but this is not said), and that code under a list item +must be indented eight spaces instead of the usual four. They also say +that a block quote must be indented, but not by how much; however, the +example given has four spaces indentation. Although nothing is said +about other kinds of block-level content, it is certainly reasonable to +infer that *all* block elements under a list item, including other +lists, must be indented four spaces. This principle has been called the +*four-space rule*. + +The four-space rule is clear and principled, and if the reference +implementation `Markdown.pl` had followed it, it probably would have +become the standard. 
However, `Markdown.pl` allowed paragraphs and +sublists to start with only two spaces indentation, at least on the +outer level. Worse, its behavior was inconsistent: a sublist of an +outer-level list needed two spaces indentation, but a sublist of this +sublist needed three spaces. It is not surprising, then, that different +implementations of Markdown have developed very different rules for +determining what comes under a list item. (Pandoc and python-Markdown, +for example, stuck with Gruber's syntax description and the four-space +rule, while discount, redcarpet, marked, PHP Markdown, and others +followed `Markdown.pl`'s behavior more closely.) + +Unfortunately, given the divergences between implementations, there +is no way to give a spec for list items that will be guaranteed not +to break any existing documents. However, the spec given here should +correctly handle lists formatted with either the four-space rule or +the more forgiving `Markdown.pl` behavior, provided they are laid out +in a way that is natural for a human to read. + +The strategy here is to let the width and indentation of the list marker +determine the indentation necessary for blocks to fall under the list +item, rather than having a fixed and arbitrary number. The writer can +think of the body of the list item as a unit which gets indented to the +right enough to fit the list marker (and any indentation on the list +marker). (The laziness rule, #5, then allows continuation lines to be +unindented if needed.) + +This rule is superior, we claim, to any rule requiring a fixed level of +indentation from the margin. The four-space rule is clear but +unnatural. It is quite unintuitive that + +``` markdown +- foo + + bar + + - baz +``` + +should be parsed as two lists with an intervening paragraph, + +``` html
                -
              • a
              • -
              • -
                b
                +
              • foo
              • +
              +

              bar

              +
                +
              • baz
              • +
              +``` +as the four-space rule demands, rather than a single list, -
+``` html +
    +
  • +

    foo

    +

    bar

    +
      +
    • baz
    • +
  • -
  • c
-```````````````````````````````` +``` +The choice of four spaces is arbitrary. It can be learned, but it is +not likely to be guessed, and it trips up beginners regularly. -This is a tight list, because the blank line is between two -paragraphs of a sublist. So the sublist is loose while -the outer list is tight: +Would it help to adopt a two-space rule? The problem is that such +a rule, together with the rule allowing 1--3 spaces indentation of the +initial list marker, allows text that is indented *less than* the +original list marker to be included in the list item. For example, +`Markdown.pl` parses -```````````````````````````````` example -- a - - b +``` markdown + - one - c -- d -. -
    -
  • a + two +``` + +as a single list item, with `two` a continuation paragraph: + +``` html
    • -

      b

      -

      c

      +

      one

      +

      two

    +``` + +and similarly + +``` markdown +> - one +> +> two +``` + +as + +``` html +
    +
      +
    • +

      one

      +

      two

    • -
    • d
    -```````````````````````````````` +
    +``` + +This is extremely unintuitive. + +Rather than requiring a fixed indent from the margin, we could require +a fixed indent (say, two spaces, or even one space) from the list marker (which +may itself be indented). This proposal would remove the last anomaly +discussed. Unlike the spec presented above, it would count the following +as a list item with a subparagraph, even though the paragraph `bar` +is not indented as far as the first paragraph `foo`: + +``` markdown + 10. foo + + bar +``` + +Arguably this text does read like a list item with `bar` as a subparagraph, +which may count in favor of the proposal. However, on this proposal indented +code would have to be indented six spaces after the list marker. And this +would break a lot of existing Markdown, which has the pattern: + +``` markdown +1. foo + + indented code +``` + +where the code is indented eight spaces. The spec above, by contrast, will +parse this text as expected, since the code block's indentation is measured +from the beginning of `foo`. +The one case that needs special treatment is a list item that *starts* +with indented code. How much indentation is required in that case, since +we don't have a "first paragraph" to measure from? Rule #2 simply stipulates +that in such cases, we require one space indentation from the list marker +(and then the normal four spaces for the indented code). This will match the +four-space rule in cases where the list marker plus its initial indentation +takes four spaces (a common case), but diverge in other cases. -This is a tight list, because the blank line is inside the -block quote: +## Lists -```````````````````````````````` example -* a - > b - > -* c -. -
      -
    • a -
      -

      b

      -
      -
    • -
    • c
    • -
    -```````````````````````````````` +A [list](@) is a sequence of one or more +list items [of the same type]. The list items +may be separated by any number of blank lines. +Two list items are [of the same type](@) +if they begin with a [list marker] of the same type. +Two list markers are of the +same type if (a) they are bullet list markers using the same character +(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same +delimiter (either `.` or `)`). -This list is tight, because the consecutive block elements -are not separated by blank lines: +A list is an [ordered list](@) +if its constituent list items begin with +[ordered list markers], and a +[bullet list](@) if its constituent list +items begin with [bullet list markers]. -```````````````````````````````` example -- a - > b - ``` - c - ``` -- d -. -
      -
    • a -
      -

      b

      -
      -
      c
      -
      -
    • -
    • d
    • -
    -```````````````````````````````` +The [start number](@) +of an [ordered list] is determined by the list number of +its initial list item. The numbers of subsequent list items are +disregarded. +A list is [loose](@) if any of its constituent +list items are separated by blank lines, or if any of its constituent +list items directly contain two block-level elements with a blank line +between them. Otherwise a list is [tight](@). +(The difference in HTML output is that paragraphs in a loose list are +wrapped in `

    ` tags, while paragraphs in a tight list are not.) -A single-paragraph list is tight: +Changing the bullet or ordered list delimiter starts a new list: ```````````````````````````````` example -- a +- foo +- bar ++ baz .

      -
    • a
    • +
    • foo
    • +
    • bar
    -```````````````````````````````` - - -```````````````````````````````` example -- a - - b -. -
      -
    • a
        -
      • b
      • -
      -
    • +
    • baz
    ```````````````````````````````` -This list is loose, because of the blank line between the -two block elements in the list item: - ```````````````````````````````` example -1. ``` - foo - ``` - - bar +1. foo +2. bar +3) baz .
      -
    1. -
      foo
      -
      -

      bar

      -
    2. +
    3. foo
    4. +
    5. bar
    6. +
    +
      +
    1. baz
    ```````````````````````````````` -Here the outer list is loose, the inner list tight: +In CommonMark, a list can interrupt a paragraph. That is, +no blank line is needed to separate a paragraph from a following +list: ```````````````````````````````` example -* foo - * bar - - baz +Foo +- bar +- baz . -
      -
    • -

      foo

      +

      Foo

      • bar
      • -
      -

      baz

      -
    • +
    • baz
    ```````````````````````````````` +`Markdown.pl` does not allow this, through fear of triggering a list +via a numeral in a hard-wrapped line: -```````````````````````````````` example -- a - - b - - c +``` markdown +The number of windows in my house is +14. The number of doors is 6. +``` -- d - - e - - f -. -
      -
    • -

      a

      -
        -
      • b
      • -
      • c
      • -
      -
    • -
    • -

      d

      -
        -
      • e
      • -
      • f
      • -
      -
    • -
    -```````````````````````````````` +Oddly, though, `Markdown.pl` *does* allow a blockquote to +interrupt a paragraph, even though the same considerations might +apply. +In CommonMark, we do allow lists to interrupt paragraphs, for +two reasons. First, it is natural and not uncommon for people +to start lists without blank lines: -# Inlines +``` markdown +I need to buy +- new shoes +- a coat +- a plane ticket +``` -Inlines are parsed sequentially from the beginning of the character -stream to the end (left to right, in left-to-right languages). -Thus, for example, in +Second, we are attracted to a -```````````````````````````````` example -`hi`lo` -. -

    hilo`

    -```````````````````````````````` +> [principle of uniformity](@): +> if a chunk of text has a certain +> meaning, it will continue to have the same meaning when put into a +> container block (such as a list item or blockquote). -`hi` is parsed as code, leaving the backtick at the end as a literal -backtick. +(Indeed, the spec for [list items] and [block quotes] presupposes +this principle.) This principle implies that if +``` markdown + * I need to buy + - new shoes + - a coat + - a plane ticket +``` -## Backslash escapes +is a list item containing a paragraph followed by a nested sublist, +as all Markdown implementations agree it is (though the paragraph +may be rendered without `

    ` tags, since the list is "tight"), +then -Any ASCII punctuation character may be backslash-escaped: +``` markdown +I need to buy +- new shoes +- a coat +- a plane ticket +``` -```````````````````````````````` example -\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ -. -

    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

    -```````````````````````````````` +by itself should be a paragraph followed by a nested sublist. +Since it is well established Markdown practice to allow lists to +interrupt paragraphs inside list items, the [principle of +uniformity] requires us to allow this outside list items as +well. ([reStructuredText](http://docutils.sourceforge.net/rst.html) +takes a different approach, requiring blank lines before lists +even inside other list items.) -Backslashes before other characters are treated as literal -backslashes: +In order to solve the problem of unwanted lists in paragraphs with +hard-wrapped numerals, we allow only lists starting with `1` to +interrupt paragraphs. Thus, ```````````````````````````````` example -\→\A\a\ \3\φ\« +The number of windows in my house is +14. The number of doors is 6. . -

    \→\A\a\ \3\φ\«

    +

    The number of windows in my house is +14. The number of doors is 6.

    ```````````````````````````````` - -Escaped characters are treated as regular characters and do -not have their usual Markdown meanings: +We may still get an unintended result in cases like ```````````````````````````````` example -\*not emphasized* -\
    not a tag -\[not a link](/foo) -\`not code` -1\. not a list -\* not a list -\# not a heading -\[foo]: /url "not a reference" -\ö not a character entity +The number of windows in my house is +1. The number of doors is 6. . -

    *not emphasized* -<br/> not a tag -[not a link](/foo) -`not code` -1. not a list -* not a list -# not a heading -[foo]: /url "not a reference" -&ouml; not a character entity

    +

    The number of windows in my house is

    +
      +
    1. The number of doors is 6.
    2. +
    ```````````````````````````````` +but this rule should prevent most spurious list captures. -If a backslash is itself escaped, the following character is not: +There can be any number of blank lines between items: ```````````````````````````````` example -\\*emphasis* -. -

    \emphasis

    -```````````````````````````````` +- foo +- bar -A backslash at the end of the line is a [hard line break]: -```````````````````````````````` example -foo\ -bar +- baz . -

    foo
    -bar

    +
      +
    • +

      foo

      +
    • +
    • +

      bar

      +
    • +
    • +

      baz

      +
    • +
    ```````````````````````````````` +```````````````````````````````` example +- foo + - bar + - baz -Backslash escapes do not work in code blocks, code spans, autolinks, or -raw HTML: -```````````````````````````````` example -`` \[\` `` + bim . -

    \[\`

    +
      +
    • foo +
        +
      • bar +
          +
        • +

          baz

          +

          bim

          +
        • +
        +
      • +
      +
    • +
    ```````````````````````````````` +To separate consecutive lists of the same type, or to separate a +list from an indented code block that would otherwise be parsed +as a subparagraph of the final list item, you can insert a blank HTML +comment: + ```````````````````````````````` example - \[\] -. -
    \[\]
    -
    -```````````````````````````````` +- foo +- bar + -```````````````````````````````` example -~~~ -\[\] -~~~ +- baz +- bim . -
    \[\]
    -
    +
      +
    • foo
    • +
    • bar
    • +
    + +
      +
    • baz
    • +
    • bim
    • +
    ```````````````````````````````` ```````````````````````````````` example - -. -

    http://example.com?find=\*

    -```````````````````````````````` +- foo + + notcode +- foo -```````````````````````````````` example - + + + code . - +
      +
    • +

      foo

      +

      notcode

      +
    • +
    • +

      foo

      +
    • +
    + +
    code
    +
    ```````````````````````````````` -But they work in all other contexts, including URLs and link titles, -link references, and [info strings] in [fenced code blocks]: +List items need not be indented to the same level. The following +list items will be treated as items at the same list level, +since none is indented enough to belong to the previous list +item: ```````````````````````````````` example -[foo](/bar\* "ti\*tle") +- a + - b + - c + - d + - e + - f +- g . -

    foo

    +
      +
    • a
    • +
    • b
    • +
    • c
    • +
    • d
    • +
    • e
    • +
    • f
    • +
    • g
    • +
    ```````````````````````````````` ```````````````````````````````` example -[foo] +1. a -[foo]: /bar\* "ti\*tle" + 2. b + + 3. c . -

    foo

    +
      +
    1. +

      a

      +
    2. +
    3. +

      b

      +
    4. +
    5. +

      c

      +
    6. +
    ```````````````````````````````` +Note, however, that list items may not be indented more than +three spaces. Here `- e` is treated as a paragraph continuation +line, because it is indented more than three spaces: ```````````````````````````````` example -``` foo\+bar -foo -``` +- a + - b + - c + - d + - e . -
    foo
    -
    +
      +
    • a
    • +
    • b
    • +
    • c
    • +
    • d +- e
    • +
    ```````````````````````````````` +And here, `3. c` is treated as an indented code block, +because it is indented four spaces and preceded by a +blank line. +```````````````````````````````` example +1. a -## Entity and numeric character references - -Valid HTML entity references and numeric character references -can be used in place of the corresponding Unicode character, -with the following exceptions: - -- Entity and character references are not recognized in code - blocks and code spans. + 2. b -- Entity and character references cannot stand in place of - special characters that define structural elements in - CommonMark. For example, although `*` can be used - in place of a literal `*` character, `*` cannot replace - `*` in emphasis delimiters, bullet list markers, or thematic - breaks. + 3. c +. +
      +
    1. +

      a

      +
    2. +
    3. +

      b

      +
    4. +
    +
    3. c
    +
    +```````````````````````````````` -Conforming CommonMark parsers need not store information about -whether a particular character was represented in the source -using a Unicode character or an entity reference. -[Entity references](@) consist of `&` + any of the valid -HTML5 entity names + `;`. The -document -is used as an authoritative source for the valid entity -references and their corresponding code points. +This is a loose list, because there is a blank line between +two of the list items: ```````````````````````````````` example -  & © Æ Ď -¾ ℋ ⅆ -∲ ≧̸ +- a +- b + +- c . -

      & © Æ Ď -¾ ℋ ⅆ -∲ ≧̸

    +
      +
    • +

      a

      +
    • +
    • +

      b

      +
    • +
    • +

      c

      +
    • +
    ```````````````````````````````` -[Decimal numeric character -references](@) -consist of `&#` + a string of 1--7 arabic digits + `;`. A -numeric character reference is parsed as the corresponding -Unicode character. Invalid Unicode code points will be replaced by -the REPLACEMENT CHARACTER (`U+FFFD`). For security reasons, -the code point `U+0000` will also be replaced by `U+FFFD`. +So is this, with an empty second item: ```````````````````````````````` example -# Ӓ Ϡ � +* a +* + +* c . -

    # Ӓ Ϡ �

    +
      +
    • +

      a

      +
    • +
    • +
    • +

      c

      +
    • +
    ```````````````````````````````` -[Hexadecimal numeric character -references](@) consist of `&#` + -either `X` or `x` + a string of 1-6 hexadecimal digits + `;`. -They too are parsed as the corresponding Unicode character (this -time specified with a hexadecimal numeral instead of decimal). +These are loose lists, even though there is no space between the items, +because one of the items directly contains two block-level elements +with a blank line between them: ```````````````````````````````` example -" ആ ಫ +- a +- b + + c +- d . -

    " ആ ಫ

    +
      +
    • +

      a

      +
    • +
    • +

      b

      +

      c

      +
    • +
    • +

      d

      +
    • +
    ```````````````````````````````` -Here are some nonentities: - ```````````````````````````````` example -  &x; &#; &#x; -� -&#abcdef0; -&ThisIsNotDefined; &hi?; +- a +- b + + [ref]: /url +- d . -

    &nbsp &x; &#; &#x; -&#987654321; -&#abcdef0; -&ThisIsNotDefined; &hi?;

    +
      +
    • +

      a

      +
    • +
    • +

      b

      +
    • +
    • +

      d

      +
    • +
    ```````````````````````````````` -Although HTML5 does accept some entity references -without a trailing semicolon (such as `©`), these are not -recognized here, because it makes the grammar too ambiguous: +This is a tight list, because the blank lines are in a code block: ```````````````````````````````` example -© +- a +- ``` + b + + + ``` +- c . -

    &copy

    +
      +
    • a
    • +
    • +
      b
      +
      +
      +
      +
    • +
    • c
    • +
    ```````````````````````````````` -Strings that are not on the list of HTML5 named entities are not -recognized as entity references either: +This is a tight list, because the blank line is between two +paragraphs of a sublist. So the sublist is loose while +the outer list is tight: ```````````````````````````````` example -&MadeUpEntity; +- a + - b + + c +- d . -

    &MadeUpEntity;

    +
      +
    • a +
        +
      • +

        b

        +

        c

        +
      • +
      +
    • +
    • d
    • +
    ```````````````````````````````` -Entity and numeric character references are recognized in any -context besides code spans or code blocks, including -URLs, [link titles], and [fenced code block][] [info strings]: +This is a tight list, because the blank line is inside the +block quote: ```````````````````````````````` example - +* a + > b + > +* c . - +
      +
    • a +
      +

      b

      +
      +
    • +
    • c
    • +
    ```````````````````````````````` +This list is tight, because the consecutive block elements +are not separated by blank lines: + ```````````````````````````````` example -[foo](/föö "föö") +- a + > b + ``` + c + ``` +- d . -

    foo

    +
      +
    • a +
      +

      b

      +
      +
      c
      +
      +
    • +
    • d
    • +
    ```````````````````````````````` -```````````````````````````````` example -[foo] +A single-paragraph list is tight: -[foo]: /föö "föö" +```````````````````````````````` example +- a . -

    foo

    +
      +
    • a
    • +
    ```````````````````````````````` ```````````````````````````````` example -``` föö -foo -``` +- a + - b . -
    foo
    -
    +
      +
    • a +
        +
      • b
      • +
      +
    • +
    ```````````````````````````````` -Entity and numeric character references are treated as literal -text in code spans and code blocks: +This list is loose, because of the blank line between the +two block elements in the list item: ```````````````````````````````` example -`föö` -. -

    f&ouml;&ouml;

    -```````````````````````````````` - +1. ``` + foo + ``` -```````````````````````````````` example - föfö + bar . -
    f&ouml;f&ouml;
    +
      +
    1. +
      foo
       
      +

      bar

      +
    2. +
    ```````````````````````````````` -Entity and numeric character references cannot be used -in place of symbols indicating structure in CommonMark -documents. +Here the outer list is loose, the inner list tight: ```````````````````````````````` example -*foo* -*foo* +* foo + * bar + + baz . -

    *foo* -foo

    +
      +
    • +

      foo

      +
        +
      • bar
      • +
      +

      baz

      +
    • +
    ```````````````````````````````` + ```````````````````````````````` example -* foo +- a + - b + - c -* foo +- d + - e + - f . -

    * foo

      -
    • foo
    • +
    • +

      a

      +
        +
      • b
      • +
      • c
      • +
      +
    • +
    • +

      d

      +
        +
      • e
      • +
      • f
      • +
      +
    ```````````````````````````````` -```````````````````````````````` example -foo bar -. -

    foo -bar

    -```````````````````````````````` +# Inlines + +Inlines are parsed sequentially from the beginning of the character +stream to the end (left to right, in left-to-right languages). +Thus, for example, in ```````````````````````````````` example - foo +`hi`lo` . -

    →foo

    +

    hilo`

    ```````````````````````````````` +`hi` is parsed as code, leaving the backtick at the end as a literal +backtick. -```````````````````````````````` example -[a](url "tit") -. -

    [a](url "tit")

    -```````````````````````````````` ## Code spans @@ -7461,10 +7466,11 @@ A [link destination](@) consists of either closing `>` that contains no line breaks or unescaped `<` or `>` characters, or -- a nonempty sequence of characters that does not start with - `<`, does not include ASCII space or control characters, and - includes parentheses only if (a) they are backslash-escaped or - (b) they are part of a balanced pair of unescaped parentheses. +- a nonempty sequence of characters that does not start with `<`, + does not include [ASCII control characters][ASCII control character] + or [whitespace][], and includes parentheses only if (a) they are + backslash-escaped or (b) they are part of a balanced pair of + unescaped parentheses. (Implementations may impose limits on parentheses nesting to avoid performance issues, but at least three levels of nesting should be supported.) @@ -7615,6 +7621,13 @@ balanced: However, if you have unbalanced parentheses, you need to escape or use the `<...>` form: +```````````````````````````````` example +[link](foo(and(bar)) +. +

    [link](foo(and(bar))

    +```````````````````````````````` + + ```````````````````````````````` example [link](foo\(and\(bar\)) . @@ -7923,9 +7936,8 @@ perform the *Unicode case fold*, strip leading and trailing matching reference link definitions, the one that comes first in the document is used. (It is desirable in such cases to emit a warning.) -The contents of the first link label are parsed as inlines, which are -used as the link's text. The link's URI and title are provided by the -matching [link reference definition]. +The link's URI and title are provided by the matching [link +reference definition]. Here is a simple example: @@ -8018,11 +8030,11 @@ emphasis grouping: ```````````````````````````````` example -[foo *bar][ref] +[foo *bar][ref]* [ref]: /uri . -

    foo *bar

    +

    foo *bar*

    ```````````````````````````````` @@ -8070,11 +8082,11 @@ Matching is case-insensitive: Unicode case fold is used: ```````````````````````````````` example -[Толпой][Толпой] is a Russian word. +[ẞ] -[ТОЛПОЙ]: /url +[SS]: /url . -

    Толпой is a Russian word.

    +

    ```````````````````````````````` @@ -8707,9 +8719,9 @@ a link to the URI, with the URI as the link's label. An [absolute URI](@), for these purposes, consists of a [scheme] followed by a colon (`:`) -followed by zero or more characters other than ASCII -[whitespace] and control characters, `<`, and `>`. If -the URI includes these characters, they must be percent-encoded +followed by zero or more characters other than [ASCII control +characters][ASCII control character] or [whitespace][], `<`, and `>`. +If the URI includes these characters, they must be percent-encoded (e.g. `%20` for a space). For purposes of this spec, a [scheme](@) is any sequence @@ -8942,10 +8954,8 @@ consists of the string ``, and the string `?>`. -A [declaration](@) consists of the -string ``, and the character `>`. +A [declaration](@) consists of the string ``, and the character `>`. A [CDATA section](@) consists of the string `` for a block quote). If we encounter a new block start, we close any blocks unmatched in step 1 before creating the new block as a child of the last -matched block. +matched container block. 3. Finally, we look at the remainder of the line (after block markers like `>`, list markers, and indentation have been consumed). -- cgit v1.2.3
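
Two of the rules touched by this patch can be sketched in a few lines of Python: the case-insensitive label matching exercised by the `[ẞ]` / `[SS]` example, and the parentheses condition on a bare link destination exercised by `[link](foo(and(bar))`. This is a rough illustration, not the reference implementation. The function names `normalize_label`, `labels_match`, and `parens_ok` are hypothetical, Python's `str.split()` is assumed to be a close enough stand-in for the spec's notion of whitespace, and `parens_ok` checks only the parentheses condition (not the control-character or whitespace conditions, and not any nesting limit).

```python
import string


def normalize_label(label: str) -> str:
    """Normalize a link label given without its surrounding brackets.

    Assumption: str.split() approximates the spec's whitespace handling
    (trim both ends, collapse internal runs to a single space).
    """
    collapsed = " ".join(label.split())
    # str.casefold() applies full Unicode case folding, so U+1E9E folds to "ss".
    return collapsed.casefold()


def labels_match(a: str, b: str) -> bool:
    """Return True if two labels are equal after normalization."""
    return normalize_label(a) == normalize_label(b)


def parens_ok(dest: str) -> bool:
    """Check only the parentheses condition on a bare link destination.

    Parentheses must either be backslash-escaped or form balanced pairs.
    """
    depth = 0
    i = 0
    while i < len(dest):
        ch = dest[i]
        # A backslash before ASCII punctuation escapes that character.
        if ch == "\\" and i + 1 < len(dest) and dest[i + 1] in string.punctuation:
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing parenthesis with no matching opener
                return False
        i += 1
    return depth == 0
```

Under these assumptions, `labels_match("ẞ", "SS")` holds because both sides fold to `ss`, and `parens_ok("foo(and(bar)")` fails because one opening parenthesis is never closed, which is why the new example in the patch is not parsed as a link.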