From 1caeda5b537c5cd30f4fc2bf078a00265473894c Mon Sep 17 00:00:00 2001
From: John MacFarlane
Date: Sat, 24 Jan 2015 21:13:50 -0800
Subject: Moved spec.txt to test/ directory.

---
 Makefile            |    4 +-
 spec.txt            | 7321 ---------------------------------------------------
 test/CMakeLists.txt |    4 +-
 test/spec.txt       | 7321 +++++++++++++++++++++++++++++++++++++++++++++++++++
 test/spec_tests.py  |    2 +-
 5 files changed, 7326 insertions(+), 7326 deletions(-)
 delete mode 100644 spec.txt
 create mode 100644 test/spec.txt

diff --git a/Makefile b/Makefile
index 62b0023..f7a9335 100644
--- a/Makefile
+++ b/Makefile
@@ -4,7 +4,7 @@ BUILDDIR?=build
 GENERATOR?=Unix Makefiles
 MINGW_BUILDDIR?=build-mingw
 MINGW_INSTALLDIR?=windows
-SPEC=spec.txt
+SPEC=test/spec.txt
 SITE=_site
 SPECVERSION=$(shell perl -ne 'print $$1 if /^version: *([0-9.]+)/' $(SPEC))
 FUZZCHARS?=2000000 # for fuzztest
@@ -82,7 +82,7 @@ $(SRCDIR)/scanners.c: $(SRCDIR)/scanners.re
 test: $(SPEC) cmake_build
 	make -C $(BUILDDIR) test || (cat $(BUILDDIR)/Testing/Temporary/LastTest.log && exit 1)
 
-$(ALLTESTS): spec.txt
+$(ALLTESTS): $(SPEC)
 	python3 test/spec_tests.py --spec $< --dump-tests | python3 -c 'import json; import sys; tests = json.loads(sys.stdin.read()); print("\n".join([test["markdown"] for test in tests]))' > $@
 
 leakcheck: $(ALLTESTS)
diff --git a/spec.txt b/spec.txt
deleted file mode 100644
index e754810..0000000
--- a/spec.txt
+++ /dev/null
@@ -1,7321 +0,0 @@
---- -title: CommonMark Spec -author: John MacFarlane -version: 0.17 -date: 2015-01-24 -license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' -... - -# Introduction - -## What is Markdown? - -Markdown is a plain text format for writing structured documents, -based on conventions used for indicating formatting in email and -usenet posts. It was developed in 2004 by John Gruber, who wrote -the first Markdown-to-HTML converter in perl, and it soon became -widely used in websites. By 2014 there were dozens of -implementations in many languages.
Some of them extended basic -Markdown syntax with conventions for footnotes, definition lists, -tables, and other constructs, and some allowed output not just in -HTML but in LaTeX and many other formats. - -## Why is a spec needed? - -John Gruber's [canonical description of Markdown's -syntax](http://daringfireball.net/projects/markdown/syntax) -does not specify the syntax unambiguously. Here are some examples of -questions it does not answer: - -1. How much indentation is needed for a sublist? The spec says that - continuation paragraphs need to be indented four spaces, but is - not fully explicit about sublists. It is natural to think that - they, too, must be indented four spaces, but `Markdown.pl` does - not require that. This is hardly a "corner case," and divergences - between implementations on this issue often lead to surprises for - users in real documents. (See [this comment by John - Gruber](http://article.gmane.org/gmane.text.markdown.general/1997).) - -2. Is a blank line needed before a block quote or header? - Most implementations do not require the blank line. However, - this can lead to unexpected results in hard-wrapped text, and - also to ambiguities in parsing (note that some implementations - put the header inside the blockquote, while others do not). - (John Gruber has also spoken [in favor of requiring the blank - lines](http://article.gmane.org/gmane.text.markdown.general/2146).) - -3. Is a blank line needed before an indented code block? - (`Markdown.pl` requires it, but this is not mentioned in the - documentation, and some implementations do not require it.) - - ``` markdown - paragraph - code? - ``` - -4. What is the exact rule for determining when list items get - wrapped in `<p>` tags? Can a list be partially "loose" and partially - "tight"? What should we do with a list like this? - - ``` markdown - 1. one - - 2. two - 3. three - ``` - - Or this? - - ``` markdown - 1. one - - a - - - b - 2. two - ``` - - (There are some relevant comments by John Gruber - [here](http://article.gmane.org/gmane.text.markdown.general/2554).) - -5. Can list markers be indented? Can ordered list markers be right-aligned? - - ``` markdown - 8. item 1 - 9. item 2 - 10. item 2a - ``` - -6. Is this one list with a horizontal rule in its second item, - or two lists separated by a horizontal rule? - - ``` markdown - * a - * * * * * - * b - ``` - -7. When list markers change from numbers to bullets, do we have - two lists or one? (The Markdown syntax description suggests two, - but the perl scripts and many other implementations produce one.) - - ``` markdown - 1. fee - 2. fie - - foe - - fum - ``` - -8. What are the precedence rules for the markers of inline structure? - For example, is the following a valid link, or does the code span - take precedence? - - ``` markdown - [a backtick (`)](/url) and [another backtick (`)](/url). - ``` - -9. What are the precedence rules for markers of emphasis and strong - emphasis? For example, how should the following be parsed? - - ``` markdown - *foo *bar* baz* - ``` - -10. What are the precedence rules between block-level and inline-level - structure? For example, how should the following be parsed? - - ``` markdown - - `a long code span can contain a hyphen like this - - and it can screw things up` - ``` - -11. Can list items include section headers? (`Markdown.pl` does not - allow this, but does allow blockquotes to include headers.) - - ``` markdown - - # Heading - ``` - -12. Can list items be empty? - - ``` markdown - * a - * - * b - ``` - -13. Can link references be defined inside block quotes or list items? - - ``` markdown - > Blockquote [foo]. - > - > [foo]: /url - ``` - -14.
If there are multiple definitions for the same reference, which takes - precedence? - - ``` markdown - [foo]: /url1 - [foo]: /url2 - - [foo][] - ``` - -In the absence of a spec, early implementers consulted `Markdown.pl` -to resolve these ambiguities. But `Markdown.pl` was quite buggy, and -gave manifestly bad results in many cases, so it was not a -satisfactory replacement for a spec. - -Because there is no unambiguous spec, implementations have diverged -considerably. As a result, users are often surprised to find that -a document that renders one way on one system (say, a github wiki) -renders differently on another (say, converting to docbook using -pandoc). To make matters worse, because nothing in Markdown counts -as a "syntax error," the divergence often isn't discovered right away. - -## About this document - -This document attempts to specify Markdown syntax unambiguously. -It contains many examples with side-by-side Markdown and -HTML. These are intended to double as conformance tests. An -accompanying script `spec_tests.py` can be used to run the tests -against any Markdown program: - - python test/spec_tests.py --spec spec.txt --program PROGRAM - -Since this document describes how Markdown is to be parsed into -an abstract syntax tree, it would have made sense to use an abstract -representation of the syntax tree instead of HTML. But HTML is capable -of representing the structural distinctions we need to make, and the -choice of HTML for the tests makes it possible to run the tests against -an implementation without writing an abstract syntax tree renderer. - -This document is generated from a text file, `spec.txt`, written -in Markdown with a small extension for the side-by-side tests. -The script `spec2md.pl` can be used to turn `spec.txt` into pandoc -Markdown, which can then be converted into other formats. - -In the examples, the `→` character is used to represent tabs. 
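The side-by-side test format can be read mechanically. The Python sketch below assumes, as the examples in this file suggest, that each example consists of a line containing only `.`, the Markdown input, another `.` line, the expected HTML, and a closing `.` line, and that `→` stands for a tab in the input. `extract_examples` is a hypothetical helper written for illustration. It is not the code of `spec_tests.py`.

```python
def extract_examples(spec_text):
    """Return (markdown, html) pairs for the examples in spec_text.

    Assumes each example is delimited by lines containing only ".":
    one before the Markdown input, one between the input and the
    expected HTML, and one after the HTML.  The arrow character
    (U+2192) in the input stands for a tab.
    """
    examples = []
    state = "prose"  # one of "prose", "markdown", "html"
    markdown, html = [], []
    for line in spec_text.splitlines():
        if line == ".":
            if state == "prose":
                state, markdown, html = "markdown", [], []
            elif state == "markdown":
                state = "html"
            else:
                source = "".join(markdown).replace("\u2192", "\t")
                examples.append((source, "".join(html)))
                state = "prose"
        elif state == "markdown":
            markdown.append(line + "\n")
        elif state == "html":
            html.append(line + "\n")
    return examples
```

An example whose expected output is empty, such as an unused link reference definition, yields an empty HTML string.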
- -# Preliminaries - -## Characters and lines - -Any sequence of [character]s is a valid CommonMark -document. - -A [character](@character) is a unicode code point. -This spec does not specify an encoding; it thinks of lines as composed -of characters rather than bytes. A conforming parser may be limited -to a certain encoding. - -A [line](@line) is a sequence of zero or more [character]s -followed by a [line ending] or by the end of file. - -A [line ending](@line-ending) is, depending on the platform, a -newline (`U+000A`), carriage return (`U+000D`), or -carriage return + newline. - -For security reasons, a conforming parser must strip or replace the -Unicode character `U+0000`. - -A line containing no characters, or a line containing only spaces -(`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). - -The following definitions of character classes will be used in this spec: - -A [whitespace character](@whitespace-character) is a space -(`U+0020`), tab (`U+0009`), carriage return (`U+000D`), or -newline (`U+000A`). - -[Whitespace](@whitespace) is a sequence of one or more [whitespace -character]s. - -A [unicode whitespace character](@unicode-whitespace-character) is -any code point in the unicode `Zs` class, or a tab (`U+0009`), -carriage return (`U+000D`), newline (`U+000A`), or form feed -(`U+000C`). - -[Unicode whitespace](@unicode-whitespace) is a sequence of one -or more [unicode whitespace character]s. - -A [non-space character](@non-space-character) is anything but `U+0020`. - -An [ASCII punctuation character](@ascii-punctuation-character) -is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, -`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, -`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. - -A [punctuation character](@punctuation-character) is an [ASCII -punctuation character] or anything in -the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. 
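The character classes above translate directly into predicates. This minimal Python sketch assumes that the general categories reported by the standard `unicodedata` module correspond to the Unicode classes named here. The function names are illustrative only.

```python
import string
import unicodedata

# string.punctuation holds exactly the 32 ASCII punctuation characters listed above.
ASCII_PUNCTUATION = frozenset(string.punctuation)
PUNCTUATION_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_whitespace_char(ch):
    """Space, tab, carriage return, or newline."""
    return ch in " \t\r\n"


def is_unicode_whitespace_char(ch):
    """Any Zs code point, or tab, carriage return, newline, or form feed."""
    return unicodedata.category(ch) == "Zs" or ch in "\t\r\n\f"


def is_punctuation_char(ch):
    """ASCII punctuation, or any code point in one of the Unicode P* classes above."""
    return ch in ASCII_PUNCTUATION or unicodedata.category(ch) in PUNCTUATION_CATEGORIES


def is_blank_line(line):
    """True for a line (given without its line ending) of only spaces and tabs."""
    return all(ch in " \t" for ch in line)
```

The explicit ASCII set matters. Characters such as `$`, `+`, and `~` fall in symbol categories (`Sc`, `Sm`), not the `P*` classes.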
- -## Tab expansion - -Tabs in lines are expanded to spaces, with a tab stop of 4 characters: - -. -→foo→baz→→bim -. -

foo baz     bim
-
-. - -. - a→a - ὐ→a -. -
a   a
-ὐ   a
-
-. - -# Blocks and inlines - -We can think of a document as a sequence of -[blocks](@block)---structural -elements like paragraphs, block quotations, -lists, headers, rules, and code blocks. Blocks can contain other -blocks, or they can contain [inline](@inline) content: -words, spaces, links, emphasized text, images, and inline code. - -## Precedence - -Indicators of block structure always take precedence over indicators -of inline structure. So, for example, the following is a list with -two items, not a list with one item containing a code span: - -. -- `one -- two` -. - -. - -This means that parsing can proceed in two steps: first, the block -structure of the document can be discerned; second, text lines inside -paragraphs, headers, and other block constructs can be parsed for inline -structure. The second step requires information about link reference -definitions that will be available only at the end of the first -step. Note that the first step requires processing lines in sequence, -but the second can be parallelized, since the inline parsing of -one block element does not affect the inline parsing of any other. - -## Container blocks and leaf blocks - -We can divide blocks into two types: -[container block](@container-block)s, -which can contain other blocks, and [leaf block](@leaf-block)s, -which cannot. - -# Leaf blocks - -This section describes the different kinds of leaf block that make up a -Markdown document. - -## Horizontal rules - -A line consisting of 0-3 spaces of indentation, followed by a sequence -of three or more matching `-`, `_`, or `*` characters, each followed -optionally by any number of spaces, forms a -[horizontal rule](@horizontal-rule). - -. -*** ---- -___ -. -
-
-
-. - -Wrong characters: - -. -+++ -. -

+++

-. - -. -=== -. -

===

-. - -Not enough characters: - -. --- -** -__ -. -

-- -** -__

-. - -One to three spaces indent are allowed: - -. - *** - *** - *** -. -
-
-
-. - -Four spaces is too many: - -. - *** -. -
***
-
-. - -. -Foo - *** -. -

Foo -***

-. - -More than three characters may be used: - -. -_____________________________________ -. -
-. - -Spaces are allowed between the characters: - -. - - - - -. -
-. - -. - ** * ** * ** * ** -. -
-. - -. -- - - - -. -
-. - -Spaces are allowed at the end: - -. -- - - - -. -
-. - -However, no other characters may occur in the line: - -. -_ _ _ _ a - -a------ - ----a--- -. -

_ _ _ _ a

-

a------

-

---a---

-. - -It is required that all of the [non-space character]s be the same. -So, this is not a horizontal rule: - -. - *-* -. -

-

-. - -Horizontal rules do not need blank lines before or after: - -. -- foo -*** -- bar -. - -
- -. - -Horizontal rules can interrupt a paragraph: - -. -Foo -*** -bar -. -

Foo

-
-

bar

-. - -If a line of dashes that meets the above conditions for being a -horizontal rule could also be interpreted as the underline of a [setext -header], the interpretation as a -[setext header] takes precedence. Thus, for example, -this is a setext header, not a paragraph followed by a horizontal rule: - -. -Foo ---- -bar -. -

Foo

-

bar

-. - -When both a horizontal rule and a list item are possible -interpretations of a line, the horizontal rule takes precedence: - -. -* Foo -* * * -* Bar -. - -
- -. - -If you want a horizontal rule in a list item, use a different bullet: - -. -- Foo -- * * * -. - -. - -## ATX headers - -An [ATX header](@atx-header) -consists of a string of characters, parsed as inline content, between an -opening sequence of 1--6 unescaped `#` characters and an optional -closing sequence of any number of `#` characters. The opening sequence -of `#` characters cannot be followed directly by a -[non-space character]. -The optional closing sequence of `#`s must be preceded by a space and may be -followed by spaces only. The opening `#` character may be indented 0-3 -spaces. The raw contents of the header are stripped of leading and -trailing spaces before being parsed as inline content. The header level -is equal to the number of `#` characters in the opening sequence. - -Simple headers: - -. -# foo -## foo -### foo -#### foo -##### foo -###### foo -. -

foo

-

foo

-

foo

-

foo

-
foo
-
foo
-. - -More than six `#` characters is not a header: - -. -####### foo -. -

####### foo

-. - -A space is required between the `#` characters and the header's -contents. Note that many implementations currently do not require -the space. However, the space was required by the [original ATX -implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps -prevent things like the following from being parsed as headers: - -. -#5 bolt -. -

#5 bolt

-. - -This is not a header, because the first `#` is escaped: - -. -\## foo -. -

## foo

-. - -Contents are parsed as inlines: - -. -# foo *bar* \*baz\* -. -

foo bar *baz*

-. - -Leading and trailing blanks are ignored in parsing inline content: - -. -# foo -. -

foo

-. - -One to three spaces indentation are allowed: - -. - ### foo - ## foo - # foo -. -

foo

-

foo

-

foo

-. - -Four spaces are too much: - -. - # foo -. -
# foo
-
-. - -. -foo - # bar -. -

foo -# bar

-. - -A closing sequence of `#` characters is optional: - -. -## foo ## - ### bar ### -. -

foo

-

bar

-. - -It need not be the same length as the opening sequence: - -. -# foo ################################## -##### foo ## -. -

foo

-
foo
-. - -Spaces are allowed after the closing sequence: - -. -### foo ### -. -

foo

-. - -A sequence of `#` characters with a -[non-space character] following it -is not a closing sequence, but counts as part of the contents of the -header: - -. -### foo ### b -. -

foo ### b

-. - -The closing sequence must be preceded by a space: - -. -# foo# -. -

foo#

-. - -Backslash-escaped `#` characters do not count as part -of the closing sequence: - -. -### foo \### -## foo #\## -# foo \# -. -

foo ###

-

foo ###

-

foo #

-. - -ATX headers need not be separated from surrounding content by blank -lines, and they can interrupt paragraphs: - -. -**** -## foo -**** -. -
-

foo

-
-. - -. -Foo bar -# baz -Bar foo -. -

Foo bar

-

baz

-

Bar foo

-. - -ATX headers can be empty: - -. -## -# -### ### -. -

-

-

-. - -## Setext headers - -A [setext header](@setext-header) -consists of a line of text, containing at least one -[non-space character], -with no more than 3 spaces indentation, followed by a [setext header -underline]. The line of text must be -one that, were it not followed by the setext header underline, -would be interpreted as part of a paragraph: it cannot be a code -block, header, blockquote, horizontal rule, or list. - -A [setext header underline](@setext-header-underline) is a sequence of -`=` characters or a sequence of `-` characters, with no more than 3 -spaces indentation and any number of trailing spaces. If a line -containing a single `-` can be interpreted as an -empty [list item], it should be interpreted this way -and not as a [setext header underline]. - -The header is a level 1 header if `=` characters are used in the -[setext header underline], and a level 2 -header if `-` characters are used. The contents of the header are the -result of parsing the first line as Markdown inline content. - -In general, a setext header need not be preceded or followed by a -blank line. However, it cannot interrupt a paragraph, so when a -setext header comes after a paragraph, a blank line is needed between -them. - -Simple examples: - -. -Foo *bar* -========= - -Foo *bar* ---------- -. -

Foo bar

-

Foo bar

-. - -The underlining can be any length: - -. -Foo -------------------------- - -Foo -= -. -

Foo

-

Foo

-. - -The header content can be indented up to three spaces, and need -not line up with the underlining: - -. - Foo ---- - - Foo ------ - - Foo - === -. -

Foo

-

Foo

-

Foo

-. - -Four spaces indent is too much: - -. - Foo - --- - - Foo ---- -. -
Foo
----
-
-Foo
-
-
-. - -The setext header underline can be indented up to three spaces, and -may have trailing spaces: - -. -Foo - ---- -. -

Foo

-. - -Four spaces is too much: - -. -Foo - --- -. -

Foo ----

-. - -The setext header underline cannot contain internal spaces: - -. -Foo -= = - -Foo ---- - -. -

Foo -= =

-

Foo

-
-. - -Trailing spaces in the content line do not cause a line break: - -. -Foo ------ -. -

Foo

-. - -Nor does a backslash at the end: - -. -Foo\ ----- -. -

Foo\

-. - -Since indicators of block structure take precedence over -indicators of inline structure, the following are setext headers: - -. -`Foo ----- -` - - -. -

`Foo

-

`

-

<a title="a lot

-

of dashes"/>

-. - -The setext header underline cannot be a [lazy continuation -line] in a list item or block quote: - -. -> Foo ---- -. -
-

Foo

-
-
-. - -. -- Foo ---- -. - -
-. - -A setext header cannot interrupt a paragraph: - -. -Foo -Bar ---- - -Foo -Bar -=== -. -

Foo -Bar

-
-

Foo -Bar -===

-. - -But in general a blank line is not required before or after: - -. ---- -Foo ---- -Bar ---- -Baz -. -
-

Foo

-

Bar

-

Baz

-. - -Setext headers cannot be empty: - -. - -==== -. -

====

-. - -Setext header text lines must not be interpretable as block -constructs other than paragraphs. So, the line of dashes -in these examples gets interpreted as a horizontal rule: - -. ---- ---- -. -
-
-. - -. -- foo ------ -. - -
-. - -. - foo ---- -. -
foo
-
-
-. - -. -> foo ------ -. -
-

foo

-
-
-. - -If you want a header with `> foo` as its literal text, you can -use backslash escapes: - -. -\> foo ------- -. -

> foo

-. - -## Indented code blocks - -An [indented code block](@indented-code-block) is composed of one or more -[indented chunk]s separated by blank lines. -An [indented chunk](@indented-chunk) is a sequence of non-blank lines, -each indented four or more spaces. The contents of the code block are -the literal contents of the lines, including trailing -[line ending]s, minus four spaces of indentation. -An indented code block has no [info string]. - -An indented code block cannot interrupt a paragraph, so there must be -a blank line between a paragraph and a following indented code block. -(A blank line is not needed, however, between a code block and a following -paragraph.) - -. - a simple - indented code block -. -
a simple
-  indented code block
-
-. - -The contents are literal text, and do not get parsed as Markdown: - -. -
- *hi* - - - one -. -
<a/>
-*hi*
-
-- one
-
-. - -Here we have three chunks separated by blank lines: - -. - chunk1 - - chunk2 - - - - chunk3 -. -
chunk1
-
-chunk2
-
-
-
-chunk3
-
-. - -Any initial spaces beyond four will be included in the content, even -in interior blank lines: - -. - chunk1 - - chunk2 -. -
chunk1
-  
-  chunk2
-
-. - -An indented code block cannot interrupt a paragraph. (This -allows hanging indents and the like.) - -. -Foo - bar - -. -

Foo -bar

-. - -However, any non-blank line with fewer than four leading spaces ends -the code block immediately. So a paragraph may occur immediately -after indented code: - -. - foo -bar -. -
foo
-
-

bar

-. - -And indented code can occur immediately before and after other kinds of -blocks: - -. -# Header - foo -Header ------- - foo ----- -. -

Header

-
foo
-
-

Header

-
foo
-
-
-. - -The first line can be indented more than four spaces: - -. - foo - bar -. -
    foo
-bar
-
-. - -Blank lines preceding or following an indented code block -are not included in it: - -. - - - foo - - -. -
foo
-
-. - -Trailing spaces are included in the code block's content: - -. - foo -. -
foo  
-
-. - - -## Fenced code blocks - -A [code fence](@code-fence) is a sequence -of at least three consecutive backtick characters (`` ` ``) or -tildes (`~`). (Tildes and backticks cannot be mixed.) -A [fenced code block](@fenced-code-block) -begins with a code fence, indented no more than three spaces. - -The line with the opening code fence may optionally contain some text -following the code fence; this is trimmed of leading and trailing -spaces and called the [info string](@info-string). -The [info string] may not contain any backtick -characters. (The reason for this restriction is that otherwise -some inline code would be incorrectly interpreted as the -beginning of a fenced code block.) - -The content of the code block consists of all subsequent lines, until -a closing [code fence] of the same type as the code block -began with (backticks or tildes), and with at least as many backticks -or tildes as the opening code fence. If the leading code fence is -indented N spaces, then up to N spaces of indentation are removed from -each line of the content (if present). (If a content line is not -indented, it is preserved unchanged. If it is indented less than N -spaces, all of the indentation is removed.) - -The closing code fence may be indented up to three spaces, and may be -followed only by spaces, which are ignored. If the end of the -containing block (or document) is reached and no closing code fence -has been found, the code block contains all of the lines after the -opening code fence until the end of the containing block (or -document). (An alternative spec would require backtracking in the -event that a closing code fence is not found. But this makes parsing -much less efficient, and there seems to be no real down side to the -behavior described here.) - -A fenced code block may interrupt a paragraph, and does not require -a blank line either before or after. - -The content of a code fence is treated as literal text, not parsed -as inlines. 
The first word of the [info string] is typically used to -specify the language of the code sample, and rendered in the `class` -attribute of the `code` tag. However, this spec does not mandate any -particular treatment of the [info string]. - -Here is a simple example with backticks: - -. -``` -< - > -``` -. -
<
- >
-
-. - -With tildes: - -. -~~~ -< - > -~~~ -. -
<
- >
-
-. - -The closing code fence must use the same character as the opening -fence: - -. -``` -aaa -~~~ -``` -. -
aaa
-~~~
-
-. - -. -~~~ -aaa -``` -~~~ -. -
aaa
-```
-
-. - -The closing code fence must be at least as long as the opening fence: - -. -```` -aaa -``` -`````` -. -
aaa
-```
-
-. - -. -~~~~ -aaa -~~~ -~~~~ -. -
aaa
-~~~
-
-. - -Unclosed code blocks are closed by the end of the document: - -. -``` -. -
-. - -. -````` - -``` -aaa -. -

-```
-aaa
-
-. - -A code block can have all empty lines as its content: - -. -``` - - -``` -. -

-  
-
-. - -A code block can be empty: - -. -``` -``` -. -
-. - -Fences can be indented. If the opening fence is indented, -content lines will have equivalent opening indentation removed, -if present: - -. - ``` - aaa -aaa -``` -. -
aaa
-aaa
-
-. - -. - ``` -aaa - aaa -aaa - ``` -. -
aaa
-aaa
-aaa
-
-. - -. - ``` - aaa - aaa - aaa - ``` -. -
aaa
- aaa
-aaa
-
-. - -Four spaces indentation produces an indented code block: - -. - ``` - aaa - ``` -. -
```
-aaa
-```
-
-. - -Closing fences may be indented by 0-3 spaces, and their indentation -need not match that of the opening fence: - -. -``` -aaa - ``` -. -
aaa
-
-. - -. - ``` -aaa - ``` -. -
aaa
-
-. - -This is not a closing fence, because it is indented 4 spaces: - -. -``` -aaa - ``` -. -
aaa
-    ```
-
-. - - -Code fences (opening and closing) cannot contain internal spaces: - -. -``` ``` -aaa -. -

-aaa

-. - -. -~~~~~~ -aaa -~~~ ~~ -. -
aaa
-~~~ ~~
-
-. - -Fenced code blocks can interrupt paragraphs, and can be followed -directly by paragraphs, without a blank line between: - -. -foo -``` -bar -``` -baz -. -

foo

-
bar
-
-

baz

-. - -Other blocks can also occur before and after fenced code blocks -without an intervening blank line: - -. -foo ---- -~~~ -bar -~~~ -# baz -. -

foo

-
bar
-
-

baz

-. - -An [info string] can be provided after the opening code fence. -Leading and trailing spaces will be stripped, and the first word, prefixed -with `language-`, is used as the value for the `class` attribute of the -`code` element within the enclosing `pre` element. - -. -```ruby -def foo(x) - return 3 -end -``` -. -
def foo(x)
-  return 3
-end
-
-. - -. -~~~~ ruby startline=3 $%@#$ -def foo(x) - return 3 -end -~~~~~~~ -. -
def foo(x)
-  return 3
-end
-
-. - -. -````; -```` -. -
-. - -[Info string]s for backtick code blocks cannot contain backticks: - -. -``` aa ``` -foo -. -

aa -foo

-. - -Closing code fences cannot have [info string]s: - -. -``` -``` aaa -``` -. -
``` aaa
-
-. - - -## HTML blocks - -An [HTML block tag](@html-block-tag) is -an [open tag] or [closing tag] whose tag -name is one of the following (case-insensitive): -`article`, `header`, `aside`, `hgroup`, `blockquote`, `hr`, `iframe`, -`body`, `li`, `map`, `button`, `object`, `canvas`, `ol`, `caption`, -`output`, `col`, `p`, `colgroup`, `pre`, `dd`, `progress`, `div`, -`section`, `dl`, `table`, `td`, `dt`, `tbody`, `embed`, `textarea`, -`fieldset`, `tfoot`, `figcaption`, `th`, `figure`, `thead`, `footer`, -`tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `video`, -`script`, `style`. - -An [HTML block](@html-block) begins with an -[HTML block tag], [HTML comment], [processing instruction], -[declaration], or [CDATA section]. -It ends when a [blank line] or the end of the -input is encountered. The initial line may be indented up to three -spaces, and subsequent lines may have any indentation. The contents -of the HTML block are interpreted as raw HTML, and will not be escaped -in HTML output. - -Some simple examples: - -. - - - - -
- hi -
- -okay. -. - - - - -
- hi -
-

okay.

-. - -. -
- *hello* - -. -
- *hello* - -. - -Here we have two HTML blocks with a Markdown paragraph between them: - -. -
- -*Markdown* - -
-. -
-

Markdown

-
-. - -In the following example, what looks like a Markdown code block -is actually part of the HTML block, which continues until a blank -line or the end of the document is reached: - -. -
-``` c -int x = 33; -``` -. -
-``` c -int x = 33; -``` -. - -A comment: - -. - -. - -. - -A processing instruction: - -. -'; -?> -. -'; -?> -. - -CDATA: - -. - -. - -. - -The opening tag can be indented 1-3 spaces, but not 4: - -. - - - -. - -
<!-- foo -->
-
-. - -An HTML block can interrupt a paragraph, and need not be preceded -by a blank line. - -. -Foo -
-bar -
-. -

Foo

-
-bar -
-. - -However, a following blank line is always needed, except at the end of -a document: - -. -
-bar -
-*foo* -. -
-bar -
-*foo* -. - -An incomplete HTML block tag may also start an HTML block: - -. -
The only restrictions are that block-level HTML elements — -> e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from -> surrounding content by blank lines, and the start and end tags of the -> block should not be indented with tabs or spaces. - -In some ways Gruber's rule is more restrictive than the one given -here: - -- It requires that an HTML block be preceded by a blank line. -- It does not allow the start tag to be indented. -- It requires a matching end tag, which it also does not allow to - be indented. - -Indeed, most Markdown implementations, including some of Gruber's -own perl implementations, do not impose these restrictions. - -There is one respect, however, in which Gruber's rule is more liberal -than the one given here, since it allows blank lines to occur inside -an HTML block. There are two reasons for disallowing them here. -First, it removes the need to parse balanced tags, which is -expensive and can require backtracking from the end of the document -if no matching end tag is found. Second, it provides a very simple -and flexible way of including Markdown content inside HTML tags: -simply separate the Markdown from the HTML using blank lines: - -. -

- -*Emphasized* text. - -
-. -
-

Emphasized text.

-
-. - -Compare: - -. -
-*Emphasized* text. -
-. -
-*Emphasized* text. -
-. - -Some Markdown implementations have adopted a convention of -interpreting content inside tags as text if the open tag has -the attribute `markdown=1`. The rule given above seems a simpler and -more elegant way of achieving the same expressive power, which is also -much simpler to parse. - -The main potential drawback is that one can no longer paste HTML -blocks into Markdown documents with 100% reliability. However, -*in most cases* this will work fine, because the blank lines in -HTML are usually followed by HTML block tags. For example: - -. -
- - - - - - - -
-Hi -
-. - - - - -
-Hi -
-. - -Moreover, blank lines are usually not necessary and can be -deleted. The exception is inside `<pre>` tags; here, one can
-replace the blank lines with `&#10;` entities.
-
-So there is no important loss of expressive power with the new rule.
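Because an HTML block always ends at a blank line, detecting it reduces to a line-oriented scan. The sketch below is simplified and hypothetical. A block starts at a line indented at most three spaces that begins with any of the following:

- an open or closing tag from the HTML block tag list, possibly incomplete, as in `<div class`;
- an HTML comment;
- a processing instruction;
- a declaration;
- a CDATA section.

The block ends at the first blank line. The sketch ignores how HTML blocks interact with containers such as lists and block quotes.

```python
import re

# Tag names from the HTML block tag list; matched case-insensitively.
BLOCK_TAGS = (
    "article header aside hgroup blockquote hr iframe body li map button "
    "object canvas ol caption output col p colgroup pre dd progress div "
    "section dl table td dt tbody embed textarea fieldset tfoot figcaption "
    "th figure thead footer tr form ul h1 h2 h3 h4 h5 h6 video script style"
).split()

# Simplified start condition (the scoped (?i:...) flag needs Python 3.6+).
HTML_BLOCK_START = re.compile(
    r"^ {0,3}(?:"
    r"</?(?i:" + "|".join(BLOCK_TAGS) + r")(?=[\s/>]|$)"  # open/closing tag, maybe incomplete
    r"|<!--"          # HTML comment
    r"|<\?"           # processing instruction
    r"|<![A-Z]"       # declaration
    r"|<!\[CDATA\[)"  # CDATA section
)


def split_html_blocks(lines):
    """Group lines into ("html", [...]) blocks and single ("other", [line]) entries.

    An HTML block runs from a matching start line up to, but not
    including, the next blank line (or the end of input).
    """
    blocks, i = [], 0
    while i < len(lines):
        if HTML_BLOCK_START.match(lines[i]):
            j = i
            while j < len(lines) and lines[j].strip() != "":
                j += 1
            blocks.append(("html", lines[i:j]))
            i = j
        else:
            blocks.append(("other", [lines[i]]))
            i += 1
    return blocks
```

With blank lines around `*Emphasized* text.`, the `<div>` and `</div>` lines become two separate HTML blocks, leaving the middle line to be parsed as Markdown. Without the blank lines, all three lines form a single HTML block.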
-
-## Link reference definitions
-
-A [link reference definition](@link-reference-definition)
-consists of a [link label], indented up to three spaces, followed
-by a colon (`:`), optional [whitespace] (including up to one
-[line ending]), a [link destination],
-optional [whitespace] (including up to one
-[line ending]), and an optional [link
-title], which if it is present must be separated
-from the [link destination] by [whitespace].
-No further [non-space character]s may occur on the line.
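A regular expression can recognize the simplest case, a definition that fits on one line. This is a simplified sketch, not a complete recognizer:

- it assumes a destination with no spaces and no angle brackets;
- it does not handle definitions spread over several lines;
- it returns the label raw, without processing backslash escapes.

The names `LINK_REF_DEF` and `parse_link_ref_def` are hypothetical.

```python
import re

# Single-line form only: label, colon, destination without spaces,
# optional title in "...", '...', or (...), and nothing else on the line.
LINK_REF_DEF = re.compile(
    r"""
    ^\ {0,3}                                 # at most three spaces of indentation
    \[((?:[^\[\]\\]|\\.)+)\]:                # link label (escapes allowed), then colon
    [ \t]*(\S+)                              # link destination
    (?:[ \t]+("[^"]*"|'[^']*'|\([^)]*\)))?   # optional title, preceded by whitespace
    [ \t]*$                                  # only trailing spaces may follow
    """,
    re.VERBOSE,
)


def parse_link_ref_def(line):
    """Return (label, destination, title), or None if line is not a definition."""
    m = LINK_REF_DEF.match(line)
    if m is None:
        return None
    label, destination, title = m.groups()
    if title is not None:
        title = title[1:-1]  # drop the surrounding quotes or parentheses
    return label, destination, title
```

Trailing text after the title, or four spaces of indentation, makes the match fail. This mirrors the non-definition examples in this section.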
-
-A [link reference definition]
-does not correspond to a structural element of a document.  Instead, it
-defines a label which can be used in [reference link]s
-and reference-style [images] elsewhere in the document.  [Link
-reference definitions] can come either before or after the links that use
-them.
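Definitions are collected during the block-parsing pass, so a link can appear before its definition. The minimal sketch below shows the resulting lookup table. It assumes two things: labels are normalized by Unicode case folding and by collapsing internal whitespace, as in the spec's label-matching rule; and the first definition of a label wins, as in the examples in this section. The helper names are hypothetical.

```python
def normalize_label(label):
    """Case-fold and collapse runs of internal whitespace to one space."""
    return " ".join(label.casefold().split())


def build_reference_map(definitions):
    """Map normalized labels to (destination, title); the first definition wins.

    definitions: iterable of (label, destination, title) in document order.
    """
    refs = {}
    for label, destination, title in definitions:
        refs.setdefault(normalize_label(label), (destination, title))
    return refs


def resolve(refs, label):
    """Look up a reference link's label; return None if it has no definition."""
    return refs.get(normalize_label(label))
```

`dict.setdefault` never overwrites an existing key, so later duplicate definitions are silently ignored.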
-
-.
-[foo]: /url "title"
-
-[foo]
-.
-

foo

-. - -. - [foo]: - /url - 'the title' - -[foo] -. -

foo

-. - -. -[Foo*bar\]]:my_(url) 'title (with parens)' - -[Foo*bar\]] -. -

Foo*bar]

-. - -. -[Foo bar]: - -'title' - -[Foo bar] -. -

Foo bar

-. - -The title may be omitted: - -. -[foo]: -/url - -[foo] -. -

foo

-. - -The link destination may not be omitted: - -. -[foo]: - -[foo] -. -

[foo]:

-

[foo]

-. - -A link can come before its corresponding definition: - -. -[foo] - -[foo]: url -. -

foo

-. - -If there are several matching definitions, the first one takes -precedence: - -. -[foo] - -[foo]: first -[foo]: second -. -

foo

-. - -As noted in the section on [Links], matching of labels is -case-insensitive (see [matches]). - -. -[FOO]: /url - -[Foo] -. -

Foo

-. - -. -[ΑΓΩ]: /φου - -[αγω] -. -

αγω

-. - -Here is a link reference definition with no corresponding link. -It contributes nothing to the document. - -. -[foo]: /url -. -. - -This is not a link reference definition, because there are -[non-space character]s after the title: - -. -[foo]: /url "title" ok -. -

[foo]: /url "title" ok

-. - -This is not a link reference definition, because it is indented -four spaces: - -. - [foo]: /url "title" - -[foo] -. -
[foo]: /url "title"
-
-

[foo]

-. - -This is not a link reference definition, because it occurs inside -a code block: - -. -``` -[foo]: /url -``` - -[foo] -. -
[foo]: /url
-
-

[foo]

-. - -A [link reference definition] cannot interrupt a paragraph. - -. -Foo -[bar]: /baz - -[bar] -. -

Foo -[bar]: /baz

-

[bar]

-. - -However, it can directly follow other block elements, such as headers -and horizontal rules, and it need not be followed by a blank line. - -. -# [Foo] -[foo]: /url -> bar -. -

Foo

-
-

bar

-
-. - -Several [link reference definition]s -can occur one after another, without intervening blank lines. - -. -[foo]: /foo-url "foo" -[bar]: /bar-url - "bar" -[baz]: /baz-url - -[foo], -[bar], -[baz] -. -

foo, -bar, -baz

-. - -[Link reference definition]s can occur -inside block containers, like lists and block quotations. They -affect the entire document, not just the container in which they -are defined: - -. -[foo] - -> [foo]: /url -. -

foo

-
-
-. - - -## Paragraphs - -A sequence of non-blank lines that cannot be interpreted as other -kinds of blocks forms a [paragraph](@paragraph). -The contents of the paragraph are the result of parsing the -paragraph's raw content as inlines. The paragraph's raw content -is formed by concatenating the lines and removing initial and final -[whitespace]. - -A simple example with two paragraphs: - -. -aaa - -bbb -. -

aaa

-

bbb

-. - -Paragraphs can contain multiple lines, but no blank lines: - -. -aaa -bbb - -ccc -ddd -. -

aaa -bbb

-

ccc -ddd

-. - -Multiple blank lines between paragraphs have no effect: - -. -aaa - - -bbb -. -

aaa

-

bbb

-. - -Leading spaces are skipped: - -. - aaa - bbb -. -

aaa -bbb

-. - -Lines after the first may be indented any amount, since indented -code blocks cannot interrupt paragraphs. - -. -aaa - bbb - ccc -. -

aaa -bbb -ccc

-. - -However, the first line may be indented at most three spaces, -or an indented code block will be triggered: - -. - aaa -bbb -. -

aaa -bbb

-. - -. - aaa -bbb -. -
aaa
-
-

bbb

-. - -Final spaces are stripped before inline parsing, so a paragraph -that ends with two or more spaces will not end with a [hard line -break]: - -. -aaa -bbb -. -

aaa
-bbb

-. - -## Blank lines - -[Blank line]s between block-level elements are ignored, -except for the role they play in determining whether a [list] -is [tight] or [loose]. - -Blank lines at the beginning and end of the document are also ignored. - -. - - -aaa - - -# aaa - - -. -

aaa

-

aaa

-. - - -# Container blocks - -A [container block] is a block that has other -blocks as its contents. There are two basic kinds of container blocks: -[block quotes] and [list items]. -[Lists] are meta-containers for [list items]. - -We define the syntax for container blocks recursively. The general -form of the definition is: - -> If X is a sequence of blocks, then the result of -> transforming X in such-and-such a way is a container of type Y -> with these blocks as its content. - -So, we explain what counts as a block quote or list item by explaining -how these can be *generated* from their contents. This should suffice -to define the syntax, although it does not give a recipe for *parsing* -these constructions. (A recipe is provided below in the section entitled -[A parsing strategy](#appendix-a-a-parsing-strategy).) - -## Block quotes - -A [block quote marker](@block-quote-marker) -consists of 0-3 spaces of initial indent, plus (a) the character `>` together -with a following space, or (b) a single character `>` not followed by a space. - -The following rules define [block quotes]: - -1. **Basic case.** If a string of lines *Ls* constitute a sequence - of blocks *Bs*, then the result of prepending a [block quote - marker] to the beginning of each line in *Ls* - is a [block quote](#block-quotes) containing *Bs*. - -2. **Laziness.** If a string of lines *Ls* constitute a [block - quote](#block-quotes) with contents *Bs*, then the result of deleting - the initial [block quote marker] from one or - more lines in which the next [non-space character] after the [block - quote marker] is [paragraph continuation - text] is a block quote with *Bs* as its content. - [Paragraph continuation text](@paragraph-continuation-text) is text - that will be parsed as part of the content of a paragraph, but does - not occur at the beginning of the paragraph. - -3. **Consecutiveness.** A document cannot contain two [block - quotes] in a row unless there is a [blank line] between them. 
- -Nothing else counts as a [block quote](#block-quotes). - -Here is a simple example: - -. -> # Foo -> bar -> baz -. -
-

Foo

-

bar -baz

-
-. - -The spaces after the `>` characters can be omitted: - -. -># Foo ->bar -> baz -. -
-

Foo

-

bar -baz

-
-. - -The `>` characters can be indented 1-3 spaces: - -. - > # Foo - > bar - > baz -. -
-

Foo

-

bar -baz

-
-. - -Four spaces gives us a code block: - -. - > # Foo - > bar - > baz -. -
> # Foo
-> bar
-> baz
-
-. - -The Laziness clause allows us to omit the `>` before a -paragraph continuation line: - -. -> # Foo -> bar -baz -. -
-

Foo

-

bar -baz

-
-. - -A block quote can contain some lazy and some non-lazy -continuation lines: - -. -> bar -baz -> foo -. -
-

bar -baz -foo

-
-. - -Laziness only applies to lines that are continuations of -paragraphs. Lines containing characters or indentation that indicate -block structure cannot be lazy. - -. -> foo ---- -. -
-

foo

-
-
-. - -. -> - foo -- bar -. -
-
    -
  • foo
  • -
-
-
    -
  • bar
  • -
-. - -. -> foo - bar -. -
-
foo
-
-
-
bar
-
-. - -. -> ``` -foo -``` -. -
-
-
-

foo

-
-. - -A block quote can be empty: - -. -> -. -
-
-. - -. -> -> -> -. -
-
-. - -A block quote can have initial or final blank lines: - -. -> -> foo -> -. -
-

foo

-
-. - -A blank line always separates block quotes: - -. -> foo - -> bar -. -
-

foo

-
-
-

bar

-
-. - -(Most current Markdown implementations, including John Gruber's -original `Markdown.pl`, will parse this example as a single block quote -with two paragraphs. But it seems better to allow the author to decide -whether two block quotes or one are wanted.) - -Consecutiveness means that if we put these block quotes together, -we get a single block quote: - -. -> foo -> bar -. -
-

foo -bar

-
-. - -To get a block quote with two paragraphs, use: - -. -> foo -> -> bar -. -
-

foo

-

bar

-
-. - -Block quotes can interrupt paragraphs: - -. -foo -> bar -. -

foo

-
-

bar

-
-. - -In general, blank lines are not needed before or after block -quotes: - -. -> aaa -*** -> bbb -. -
-

aaa

-
-
-
-

bbb

-
-. - -However, because of laziness, a blank line is needed between -a block quote and a following paragraph: - -. -> bar -baz -. -
-

bar -baz

-
-. - -. -> bar - -baz -. -
-

bar

-
-

baz

-. - -. -> bar -> -baz -. -
-

bar

-
-

baz

-. - -It is a consequence of the Laziness rule that any number -of initial `>`s may be omitted on a continuation line of a -nested block quote: - -. -> > > foo -bar -. -
-
-
-

foo -bar

-
-
-
-. - -. ->>> foo -> bar ->>baz -. -
-
-
-

foo -bar -baz

-
-
-
-. - -When including an indented code block in a block quote, -remember that the [block quote marker] includes -both the `>` and a following space. So *five spaces* are needed after -the `>`: - -. -> code - -> not code -. -
-
code
-
-
-
-

not code

-
-. - - -## List items - -A [list marker](@list-marker) is a -[bullet list marker] or an [ordered list marker]. - -A [bullet list marker](@bullet-list-marker) -is a `-`, `+`, or `*` character. - -An [ordered list marker](@ordered-list-marker) -is a sequence of one or more digits (`0-9`), followed by either a -`.` character or a `)` character. - -The following rules define [list items]: - -1. **Basic case.** If a sequence of lines *Ls* constitutes a sequence of - blocks *Bs* starting with a [non-space character] and not separated - from each other by more than one blank line, and *M* is a list - marker of width *W* followed by 0 < *N* < 5 spaces, then the result - of prepending *M* and the following spaces to the first line of - *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a - list item with *Bs* as its contents. The type of the list item - (bullet or ordered) is determined by the type of its list marker. - If the list item is ordered, then it is also assigned a start - number, based on the ordered list marker. - -For example, let *Ls* be the lines - -. -A paragraph -with two lines. - - indented code - -> A block quote. -. -

A paragraph -with two lines.

-
indented code
-
-
-

A block quote.

-
-. - -And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says -that the following is an ordered list item with start number 1, -and the same contents as *Ls*: - -. -1. A paragraph - with two lines. - - indented code - - > A block quote. -. -
    -
  1. -

    A paragraph -with two lines.

    -
    indented code
    -
    -
    -

    A block quote.

    -
    -
  2. -
-. - -The most important thing to notice is that the position of -the text after the list marker determines how much indentation -is needed in subsequent blocks in the list item. If the list -marker takes up two spaces, and there are three spaces between -the list marker and the next [non-space character], then blocks -must be indented five spaces in order to fall under the list -item. - -Here are some examples showing how far content must be indented to be -put under the list item: - -. -- one - - two -. -
    -
  • one
  • -
-

two

-. - -. -- one - - two -. -
    -
  • -

    one

    -

    two

    -
  • -
-. - -. - - one - - two -. -
    -
  • one
  • -
-
 two
-
-. - -. - - one - - two -. -
    -
  • -

    one

    -

    two

    -
  • -
-. - -It is tempting to think of this in terms of columns: the continuation -blocks must be indented at least to the column of the first -[non-space character] after the list marker. However, that is not quite right. -The spaces after the list marker determine how much relative indentation -is needed. Which column this indentation reaches will depend on -how the list item is embedded in other constructions, as shown by -this example: - -. - > > 1. one ->> ->> two -. -
-
-
    -
  1. -

    one

    -

    two

    -
  2. -
-
-
-. - -Here `two` occurs in the same column as the list marker `1.`, -but is actually contained in the list item, because there is -sufficient indentation after the last containing blockquote marker. - -The converse is also possible. In the following example, the word `two` -occurs far to the right of the initial text of the list item, `one`, but -it is not considered part of the list item, because it is not indented -far enough past the blockquote marker: - -. ->>- one ->> - > > two -. -
-
-
    -
  • one
  • -
-

two

-
-
-. - -A list item may not contain blocks that are separated by more than -one blank line. Thus, two blank lines will end a list, unless the -two blanks are contained in a [fenced code block]. - -. -- foo - - bar - -- foo - - - bar - -- ``` - foo - - - bar - ``` - -- baz - - + ``` - foo - - - bar - ``` -. -
    -
  • -

    foo

    -

    bar

    -
  • -
  • -

    foo

    -
  • -
-

bar

-
    -
  • -
    foo
    -
    -
    -bar
    -
    -
  • -
  • -

    baz

    -
      -
    • -
      foo
      -
      -
      -bar
      -
      -
    • -
    -
  • -
-. - -A list item may contain any kind of block: - -. -1. foo - - ``` - bar - ``` - - baz - - > bam -. -
    -
  1. -

    foo

    -
    bar
    -
    -

    baz

    -
    -

    bam

    -
    -
  2. -
-. - -2. **Item starting with indented code.** If a sequence of lines *Ls* - constitutes a sequence of blocks *Bs* starting with an indented code - block and not separated from each other by more than one blank line, - and *M* is a list marker of width *W* followed by - one space, then the result of prepending *M* and the following - space to the first line of *Ls*, and indenting subsequent lines of - *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. - If a line is empty, then it need not be indented. The type of the - list item (bullet or ordered) is determined by the type of its list - marker. If the list item is ordered, then it is also assigned a - start number, based on the ordered list marker. - -An indented code block will have to be indented four spaces beyond -the edge of the region where text will be included in the list item. -In the following case that is 6 spaces: - -. -- foo - - bar -. -
    -
  • -

    foo

    -
    bar
    -
    -
  • -
-. - -And in this case it is 11 spaces: - -. - 10. foo - - bar -. -
    -
  1. -

    foo

    -
    bar
    -
    -
  2. -
-. - -If the *first* block in the list item is an indented code block, -then by rule #2, the contents must be indented *one* space after the -list marker: - -. - indented code - -paragraph - - more code -. -
indented code
-
-

paragraph

-
more code
-
-. - -. -1. indented code - - paragraph - - more code -. -
    -
  1. -
    indented code
    -
    -

    paragraph

    -
    more code
    -
    -
  2. -
-. - -Note that an additional space indent is interpreted as space -inside the code block: - -. -1. indented code - - paragraph - - more code -. -
    -
  1. -
     indented code
    -
    -

    paragraph

    -
    more code
    -
    -
  2. -
-. - -Note that rules #1 and #2 only apply to two cases: (a) cases -in which the lines to be included in a list item begin with a -[non-space character], and (b) cases in which -they begin with an indented code -block. In a case like the following, where the first block begins with -a three-space indent, the rules do not allow us to form a list item by -indenting the whole thing and prepending a list marker: - -. - foo - -bar -. -

foo

-

bar

-. - -. -- foo - - bar -. -
    -
  • foo
  • -
-

bar

-. - -This is not a significant restriction, because when a block begins -with 1-3 spaces indent, the indentation can always be removed without -a change in interpretation, allowing rule #1 to be applied. So, in -the above case: - -. -- foo - - bar -. -
    -
  • -

    foo

    -

    bar

    -
  • -
-. - -3. **Empty list item.** A [list marker] followed by a -line containing only [whitespace] is a list item with no contents. - -Here is an empty bullet list item: - -. -- foo -- -- bar -. -
    -
  • foo
  • -
  • -
  • bar
  • -
-. - -It does not matter whether there are spaces following the [list marker]: - -. -- foo -- -- bar -. -
    -
  • foo
  • -
  • -
  • bar
  • -
-. - -Here is an empty ordered list item: - -. -1. foo -2. -3. bar -. -
    -
  1. foo
  2. -
  3. -
  4. bar
  5. -
-. - -A list may start or end with an empty list item: - -. -* -. -
    -
  • -
-. - -4. **Indentation.** If a sequence of lines *Ls* constitutes a list item - according to rule #1, #2, or #3, then the result of indenting each line - of *Ls* by 1-3 spaces (the same for each line) also constitutes a - list item with the same contents and attributes. If a line is - empty, then it need not be indented. - -Indented one space: - -. - 1. A paragraph - with two lines. - - indented code - - > A block quote. -. -
    -
  1. -

    A paragraph -with two lines.

    -
    indented code
    -
    -
    -

    A block quote.

    -
    -
  2. -
-. - -Indented two spaces: - -. - 1. A paragraph - with two lines. - - indented code - - > A block quote. -. -
    -
  1. -

    A paragraph -with two lines.

    -
    indented code
    -
    -
    -

    A block quote.

    -
    -
  2. -
-. - -Indented three spaces: - -. - 1. A paragraph - with two lines. - - indented code - - > A block quote. -. -
    -
  1. -

    A paragraph -with two lines.

    -
    indented code
    -
    -
    -

    A block quote.

    -
    -
  2. -
-. - -Four spaces indent gives a code block: - -. - 1. A paragraph - with two lines. - - indented code - - > A block quote. -. -
1.  A paragraph
-    with two lines.
-
-        indented code
-
-    > A block quote.
-
-. - - -5. **Laziness.** If a string of lines *Ls* constitute a [list - item](#list-items) with contents *Bs*, then the result of deleting - some or all of the indentation from one or more lines in which the - next [non-space character] after the indentation is - [paragraph continuation text] is a - list item with the same contents and attributes. The unindented - lines are called - [lazy continuation line](@lazy-continuation-line)s. - -Here is an example with [lazy continuation line]s: - -. - 1. A paragraph -with two lines. - - indented code - - > A block quote. -. -
    -
  1. -

    A paragraph -with two lines.

    -
    indented code
    -
    -
    -

    A block quote.

    -
    -
  2. -
-. - -Indentation can be partially deleted: - -. - 1. A paragraph - with two lines. -. -
    -
  1. A paragraph -with two lines.
  2. -
-. - -These examples show how laziness can work in nested structures: - -. -> 1. > Blockquote -continued here. -. -
-
    -
  1. -
    -

    Blockquote -continued here.

    -
    -
  2. -
-
-. - -. -> 1. > Blockquote -> continued here. -. -
-
    -
  1. -
    -

    Blockquote -continued here.

    -
    -
  2. -
-
-. - - -6. **That's all.** Nothing that is not counted as a list item by rules - #1--5 counts as a [list item](#list-items). - -The rules for sublists follow from the general rules above. A sublist -must be indented the same number of spaces a paragraph would need to be -in order to be included in the list item. - -So, in this case we need two spaces indent: - -. -- foo - - bar - - baz -. -
    -
  • foo -
      -
    • bar -
        -
      • baz
      • -
      -
    • -
    -
  • -
-. - -One is not enough: - -. -- foo - - bar - - baz -. -
    -
  • foo
  • -
  • bar
  • -
  • baz
  • -
-. - -Here we need four, because the list marker is wider: - -. -10) foo - - bar -. -
    -
  1. foo -
      -
    • bar
    • -
    -
  2. -
-. - -Three is not enough: - -. -10) foo - - bar -. -
    -
  1. foo
  2. -
-
    -
  • bar
  • -
-. - -A list may be the first block in a list item: - -. -- - foo -. -
    -
  • -
      -
    • foo
    • -
    -
  • -
-. - -. -1. - 2. foo -. -
    -
  1. -
      -
    • -
        -
      1. foo
      2. -
      -
    • -
    -
  2. -
-. - -A list item can contain a header: - -. -- # Foo -- Bar - --- - baz -. -
    -
  • -

    Foo

    -
  • -
  • -

    Bar

    -baz
  • -
-. - -### Motivation - -John Gruber's Markdown spec says the following about list items: - -1. "List markers typically start at the left margin, but may be indented - by up to three spaces. List markers must be followed by one or more - spaces or a tab." - -2. "To make lists look nice, you can wrap items with hanging indents.... - But if you don't want to, you don't have to." - -3. "List items may consist of multiple paragraphs. Each subsequent - paragraph in a list item must be indented by either 4 spaces or one - tab." - -4. "It looks nice if you indent every line of the subsequent paragraphs, - but here again, Markdown will allow you to be lazy." - -5. "To put a blockquote within a list item, the blockquote's `>` - delimiters need to be indented." - -6. "To put a code block within a list item, the code block needs to be - indented twice — 8 spaces or two tabs." - -These rules specify that a paragraph under a list item must be indented -four spaces (presumably, from the left margin, rather than the start of -the list marker, but this is not said), and that code under a list item -must be indented eight spaces instead of the usual four. They also say -that a block quote must be indented, but not by how much; however, the -example given has four spaces indentation. Although nothing is said -about other kinds of block-level content, it is certainly reasonable to -infer that *all* block elements under a list item, including other -lists, must be indented four spaces. This principle has been called the -*four-space rule*. - -The four-space rule is clear and principled, and if the reference -implementation `Markdown.pl` had followed it, it probably would have -become the standard. However, `Markdown.pl` allowed paragraphs and -sublists to start with only two spaces indentation, at least on the -outer level. Worse, its behavior was inconsistent: a sublist of an -outer-level list needed two spaces indentation, but a sublist of this -sublist needed three spaces. 
It is not surprising, then, that different -implementations of Markdown have developed very different rules for -determining what comes under a list item. (Pandoc and python-Markdown, -for example, stuck with Gruber's syntax description and the four-space -rule, while discount, redcarpet, marked, PHP Markdown, and others -followed `Markdown.pl`'s behavior more closely.) - -Unfortunately, given the divergences between implementations, there -is no way to give a spec for list items that will be guaranteed not -to break any existing documents. However, the spec given here should -correctly handle lists formatted with either the four-space rule or -the more forgiving `Markdown.pl` behavior, provided they are laid out -in a way that is natural for a human to read. - -The strategy here is to let the width and indentation of the list marker -determine the indentation necessary for blocks to fall under the list -item, rather than having a fixed and arbitrary number. The writer can -think of the body of the list item as a unit which gets indented to the -right enough to fit the list marker (and any indentation on the list -marker). (The laziness rule, #5, then allows continuation lines to be -unindented if needed.) - -This rule is superior, we claim, to any rule requiring a fixed level of -indentation from the margin. The four-space rule is clear but -unnatural. It is quite unintuitive that - -``` markdown -- foo - - bar - - - baz -``` - -should be parsed as two lists with an intervening paragraph, - -``` html -
    -
  • foo
  • -
-

bar

-
    -
  • baz
  • -
-``` - -as the four-space rule demands, rather than a single list, - -``` html -
    -
  • -

    foo

    -

    bar

    -
      -
    • baz
    • -
    -
  • -
-``` - -The choice of four spaces is arbitrary. It can be learned, but it is -not likely to be guessed, and it trips up beginners regularly. - -Would it help to adopt a two-space rule? The problem is that such -a rule, together with the rule allowing 1--3 spaces indentation of the -initial list marker, allows text that is indented *less than* the -original list marker to be included in the list item. For example, -`Markdown.pl` parses - -``` markdown - - one - - two -``` - -as a single list item, with `two` a continuation paragraph: - -``` html -
    -
  • -

    one

    -

    two

    -
  • -
-``` - -and similarly - -``` markdown -> - one -> -> two -``` - -as - -``` html -
-
    -
  • -

    one

    -

    two

    -
  • -
-
-``` - -This is extremely unintuitive. - -Rather than requiring a fixed indent from the margin, we could require -a fixed indent (say, two spaces, or even one space) from the list marker (which -may itself be indented). This proposal would remove the last anomaly -discussed. Unlike the spec presented above, it would count the following -as a list item with a subparagraph, even though the paragraph `bar` -is not indented as far as the first paragraph `foo`: - -``` markdown - 10. foo - - bar -``` - -Arguably this text does read like a list item with `bar` as a subparagraph, -which may count in favor of the proposal. However, on this proposal indented -code would have to be indented six spaces after the list marker. And this -would break a lot of existing Markdown, which has the pattern: - -``` markdown -1. foo - - indented code -``` - -where the code is indented eight spaces. The spec above, by contrast, will -parse this text as expected, since the code block's indentation is measured -from the beginning of `foo`. - -The one case that needs special treatment is a list item that *starts* -with indented code. How much indentation is required in that case, since -we don't have a "first paragraph" to measure from? Rule #2 simply stipulates -that in such cases, we require one space indentation from the list marker -(and then the normal four spaces for the indented code). This will match the -four-space rule in cases where the list marker plus its initial indentation -takes four spaces (a common case), but diverge in other cases. - -## Lists - -A [list](@list) is a sequence of one or more -list items [of the same type]. The list items -may be separated by single [blank lines], but two -blank lines end all containing lists. - -Two list items are [of the same type](@of-the-same-type) -if they begin with a [list marker] of the same type. 
-Two list markers are of the -same type if (a) they are bullet list markers using the same character -(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same -delimiter (either `.` or `)`). - -A list is an [ordered list](@ordered-list) -if its constituent list items begin with -[ordered list marker]s, and a -[bullet list](@bullet-list) if its constituent list -items begin with [bullet list marker]s. - -The [start number](@start-number) -of an [ordered list] is determined by the list number of -its initial list item. The numbers of subsequent list items are -disregarded. - -A list is [loose](@loose) if any of its constituent -list items are separated by blank lines, or if any of its constituent -list items directly contain two block-level elements with a blank line -between them. Otherwise a list is [tight](@tight). -(The difference in HTML output is that paragraphs in a loose list are -wrapped in `<p>` tags, while paragraphs in a tight list are not.) - -Changing the bullet or ordered list delimiter starts a new list: - -. -- foo -- bar -+ baz -. -

    -
  • foo
  • -
  • bar
  • -
-
    -
  • baz
  • -
-. - -. -1. foo -2. bar -3) baz -. -
    -
  1. foo
  2. -
  3. bar
  4. -
-
    -
  1. baz
  2. -
-. - -In CommonMark, a list can interrupt a paragraph. That is, -no blank line is needed to separate a paragraph from a following -list: - -. -Foo -- bar -- baz -. -

Foo

-
    -
  • bar
  • -
  • baz
  • -
-. - -`Markdown.pl` does not allow this, through fear of triggering a list -via a numeral in a hard-wrapped line: - -. -The number of windows in my house is -14. The number of doors is 6. -. -

The number of windows in my house is

-
    -
  1. The number of doors is 6.
  2. -
-. - -Oddly, `Markdown.pl` *does* allow a blockquote to interrupt a paragraph, -even though the same considerations might apply. We think that the two -cases should be treated the same. Here are two reasons for allowing -lists to interrupt paragraphs: - -First, it is natural and not uncommon for people to start lists without -blank lines: - - I need to buy - - new shoes - - a coat - - a plane ticket - -Second, we are attracted to a - -> [principle of uniformity](@principle-of-uniformity): -> if a chunk of text has a certain -> meaning, it will continue to have the same meaning when put into a -> container block (such as a list item or blockquote). - -(Indeed, the spec for [list items] and [block quotes] presupposes -this principle.) This principle implies that if - - * I need to buy - - new shoes - - a coat - - a plane ticket - -is a list item containing a paragraph followed by a nested sublist, -as all Markdown implementations agree it is (though the paragraph -may be rendered without `<p>` tags, since the list is "tight"), -then - - I need to buy - - new shoes - - a coat - - a plane ticket - -by itself should be a paragraph followed by a nested sublist. - -Our adherence to the [principle of uniformity] -thus inclines us to think that there are two coherent packages: - -1. Require blank lines before *all* lists and blockquotes, - including lists that occur as sublists inside other list items. - -2. Require blank lines in none of these places. - -[reStructuredText](http://docutils.sourceforge.net/rst.html) takes -the first approach, for which there is much to be said. But the second -seems more consistent with established practice with Markdown. - -There can be blank lines between items, but two blank lines end -a list: - -. -- foo - -- bar - - -- baz -. -

    -
  • -

    foo

    -
  • -
  • -

    bar

    -
  • -
-
    -
  • baz
  • -
-. - -As illustrated above in the section on [list items], -two blank lines between blocks *within* a list item will also end a -list: - -. -- foo - - - bar -- baz -. -
    -
  • foo
  • -
-

bar

-
    -
  • baz
  • -
-. - -Indeed, two blank lines will end *all* containing lists: - -. -- foo - - bar - - baz - - - bim -. -
    -
  • foo -
      -
    • bar -
        -
      • baz
      • -
      -
    • -
    -
  • -
-
  bim
-
-. - -Thus, two blank lines can be used to separate consecutive lists of -the same type, or to separate a list from an indented code block -that would otherwise be parsed as a subparagraph of the final list -item: - -. -- foo -- bar - - -- baz -- bim -. -
    -
  • foo
  • -
  • bar
  • -
-
    -
  • baz
  • -
  • bim
  • -
-. - -. -- foo - - notcode - -- foo - - - code -. -
    -
  • -

    foo

    -

    notcode

    -
  • -
  • -

    foo

    -
  • -
-
code
-
-. - -List items need not be indented to the same level. The following -list items will be treated as items at the same list level, -since none is indented enough to belong to the previous list -item: - -. -- a - - b - - c - - d - - e - - f -- g -. -
    -
  • a
  • -
  • b
  • -
  • c
  • -
  • d
  • -
  • e
  • -
  • f
  • -
  • g
  • -
-. - -This is a loose list, because there is a blank line between -two of the list items: - -. -- a -- b - -- c -. -
    -
  • -

    a

    -
  • -
  • -

    b

    -
  • -
  • -

    c

    -
  • -
-. - -So is this, with an empty second item: - -. -* a -* - -* c -. -
    -
  • -

    a

    -
  • -
  • -
  • -

    c

    -
  • -
-. - -These are loose lists, even though there is no space between the items, -because one of the items directly contains two block-level elements -with a blank line between them: - -. -- a -- b - - c -- d -. -
    -
  • -

    a

    -
  • -
  • -

    b

    -

    c

    -
  • -
  • -

    d

    -
  • -
-. - -. -- a -- b - - [ref]: /url -- d -. -
    -
  • -

    a

    -
  • -
  • -

    b

    -
  • -
  • -

    d

    -
  • -
-. - -This is a tight list, because the blank lines are in a code block: - -. -- a -- ``` - b - - - ``` -- c -. -
    -
  • a
  • -
  • -
    b
    -
    -
    -
    -
  • -
  • c
  • -
-. - -This is a tight list, because the blank line is between two -paragraphs of a sublist. So the sublist is loose while -the outer list is tight: - -. -- a - - b - - c -- d -. -
    -
  • a -
      -
    • -

      b

      -

      c

      -
    • -
    -
  • -
  • d
  • -
-. - -This is a tight list, because the blank line is inside the -block quote: - -. -* a - > b - > -* c -. -
    -
  • a -
    -

    b

    -
    -
  • -
  • c
  • -
-. - -This list is tight, because the consecutive block elements -are not separated by blank lines: - -. -- a - > b - ``` - c - ``` -- d -. -
    -
  • a -
    -

    b

    -
    -
    c
    -
    -
  • -
  • d
  • -
-. - -A single-paragraph list is tight: - -. -- a -. -
    -
  • a
  • -
-. - -. -- a - - b -. -
    -
  • a -
      -
    • b
    • -
    -
  • -
-. - -This list is loose, because of the blank line between the -two block elements in the list item: - -. -1. ``` - foo - ``` - - bar -. -
    -
  1. -
    foo
    -
    -

    bar

    -
  2. -
-. - -Here the outer list is loose, the inner list tight: - -. -* foo - * bar - - baz -. -
    -
  • -

    foo

    -
      -
    • bar
    • -
    -

    baz

    -
  • -
-. - -. -- a - - b - - c - -- d - - e - - f -. -
    -
  • -

    a

    -
      -
    • b
    • -
    • c
    • -
    -
  • -
  • -

    d

    -
      -
    • e
    • -
    • f
    • -
    -
  • -
-. - -# Inlines - -Inlines are parsed sequentially from the beginning of the character -stream to the end (left to right, in left-to-right languages). -Thus, for example, in - -. -`hi`lo` -. -

hilo`

-. - -`hi` is parsed as code, leaving the backtick at the end as a literal -backtick. - -## Backslash escapes - -Any ASCII punctuation character may be backslash-escaped: - -. -\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ -. -

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

-. - -Backslashes before other characters are treated as literal -backslashes: - -. -\→\A\a\ \3\φ\« -. -

\ \A\a\ \3\φ\«

-. - -Escaped characters are treated as regular characters and do -not have their usual Markdown meanings: - -. -\*not emphasized* -\<br/> not a tag -\[not a link](/foo) -\`not code` -1\. not a list -\* not a list -\# not a header -\[foo]: /url "not a reference" -. -

*not emphasized* -<br/> not a tag -[not a link](/foo) -`not code` -1. not a list -* not a list -# not a header -[foo]: /url "not a reference"

-. - -If a backslash is itself escaped, the following character is not: - -. -\\*emphasis* -. -

\emphasis

-. - -A backslash at the end of the line is a [hard line break]: - -. -foo\ -bar -. -

foo
-bar

-. - -Backslash escapes do not work in code blocks, code spans, autolinks, or -raw HTML: - -. -`` \[\` `` -. -

\[\`

-. - -. - \[\] -. -
\[\]
-
-. - -. -~~~ -\[\] -~~~ -. -
\[\]
-
-. - -. -<http://example.com?find=\*> -. -

http://example.com?find=\*

-. - -. -<a href="/bar\/)"> -. -

-. - -But they work in all other contexts, including URLs and link titles, -link references, and [info string]s in [fenced code block]s: - -. -[foo](/bar\* "ti\*tle") -. -

foo

-. - -. -[foo] - -[foo]: /bar\* "ti\*tle" -. -

foo

-. - -. -``` foo\+bar -foo -``` -. -
foo
-
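The escaping rule itself fits in a few lines. This Python sketch assumes its input is ordinary inline text, since escapes are not processed in code spans, code blocks, autolinks, or raw HTML. It uses `string.punctuation`, which is exactly the 32 ASCII punctuation characters listed above. It leaves the hard line break produced by a trailing backslash to a separate step, and `unescape` is a hypothetical name.

```python
import string

# The ASCII punctuation characters that may be backslash-escaped.
ESCAPABLE = set(string.punctuation)


def unescape(text: str) -> str:
    """Resolve backslash escapes in ordinary inline text."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPABLE:
            out.append(text[i + 1])  # escaped punctuation becomes a literal character
            i += 2
        else:
            out.append(ch)  # a backslash before anything else stays literal
            i += 1
    return "".join(out)
```

A real parser must also remember that an escaped `*` or `_` can never act as an emphasis delimiter. This sketch only produces the resulting text.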
-. - - -## Entities - -With the goal of making this standard as HTML-agnostic as possible, all -valid HTML entities (except in code blocks and code spans) -are recognized as such and converted into unicode characters before -they are stored in the AST. This means that renderers to formats other -than HTML need not be HTML-entity aware. HTML renderers may either escape -unicode characters as entities or leave them as they are. (However, -`"`, `&`, `<`, and `>` must always be rendered as entities.) - -[Named entities](@name-entities) consist of `&` -+ any of the valid HTML5 entity names + `;`. The -[following document](https://html.spec.whatwg.org/multipage/entities.json) -is used as an authoritative source of the valid entity names and their -corresponding codepoints. - -. -&nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral; -. -

  & © Æ Ď ¾ ℋ ⅆ ∲

-. - -[Decimal entities](@decimal-entities) -consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these -entities need to be recognized and transformed into their corresponding -Unicode characters. Invalid Unicode code points will be replaced by the -replacement character (`U+FFFD`). - -. -&#35; &#1234; &#992; &#98765432; -. -

# Ӓ Ϡ �

-. - -[Hexadecimal entities](@hexadecimal-entities) -consist of `&#` + either `X` or `x` + a string of 1--8 hexadecimal digits -+ `;`. They will also be parsed and turned into their corresponding Unicode characters in the AST. - -. -&#X22; &#XD06; &#xcab; -. -

" ആ ಫ

-. - -Here are some nonentities: - -. -&nbsp &x; &#; &#x; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?; -. -

&nbsp &x; &#; &#x; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?;

-. - -Although HTML5 does accept some entities without a trailing semicolon -(such as `&copy`), these are not recognized as entities here, because it -makes the grammar too ambiguous: - -. -&copy -. -

&copy

-. - -Strings that are not on the list of HTML5 named entities are not -recognized as entities either: - -. -&MadeUpEntity; -. -

&MadeUpEntity;

-. - -Entities are recognized in any context besides code spans or -code blocks, including raw HTML, URLs, [link title]s, and -[fenced code block] [info string]s: - -. -<a href="&ouml;&ouml;.html"> -. -

-. - -. -[foo](/f&ouml;&ouml; "f&ouml;&ouml;") -. -

foo

-. - -. -[foo] - -[foo]: /f&ouml;&ouml; "f&ouml;&ouml;" -. -

foo

-. - -. -``` f&ouml;&ouml; -foo -``` -. -
foo
-
-. - -Entities are treated as literal text in code spans and code blocks: - -. -`f&ouml;&ouml;` -. -

f&ouml;&ouml;

-. - -. - f&ouml;f&ouml; -. -
f&ouml;f&ouml;
-
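The three entity forms can be decoded together with one regular expression and a lookup table. This Python sketch uses the standard library's `html.entities.html5` mapping as a stand-in for the WHATWG `entities.json` list. It treats zero, surrogates, and values above U+10FFFF as invalid code points; that is an assumption, since the text above does not define "invalid". It should be applied only outside code spans and code blocks, and `decode_entities` is a hypothetical name.

```python
import re
from html.entities import html5  # HTML5 named character references, keyed like "amp;"

# Hex and decimal forms allow 1 to 8 digits; every form needs a trailing ";".
_ENTITY = re.compile(
    r"&(?:#[xX]([0-9a-fA-F]{1,8})|#([0-9]{1,8})|([A-Za-z][A-Za-z0-9]*));"
)


def _from_codepoint(n: int) -> str:
    # Assumption: zero, UTF-16 surrogates and values above U+10FFFF are invalid.
    if n == 0 or 0xD800 <= n <= 0xDFFF or n > 0x10FFFF:
        return "\uFFFD"
    return chr(n)


def decode_entities(text: str) -> str:
    """Replace valid entities in ordinary text (not code spans or code blocks)."""

    def replace(m):
        hex_digits, dec_digits, name = m.groups()
        if hex_digits is not None:
            return _from_codepoint(int(hex_digits, 16))
        if dec_digits is not None:
            return _from_codepoint(int(dec_digits))
        # Names not on the HTML5 list (e.g. &MadeUpEntity;) stay literal.
        return html5.get(name + ";", m.group(0))

    return _ENTITY.sub(replace, text)
```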
-. - -## Code spans - -A [backtick string](@backtick-string) -is a string of one or more backtick characters (`` ` ``) that is neither -preceded nor followed by a backtick. - -A [code span](@code-span) begins with a backtick string and ends with -a backtick string of equal length. The contents of the code span are -the characters between the two backtick strings, with leading and -trailing spaces and [line ending]s removed, and -[whitespace] collapsed to single spaces. - -This is a simple code span: - -. -`foo` -. -

foo

-. - -Here two backticks are used, because the code contains a backtick. -This example also illustrates stripping of leading and trailing spaces: - -. -`` foo ` bar `` -. -

foo ` bar

-. - -This example shows the motivation for stripping leading and trailing -spaces: - -. -` `` ` -. -

``

-. - -[Line ending]s are treated like spaces: - -. -`` -foo -`` -. -

foo

-. - -Interior spaces and [line ending]s are collapsed into -single spaces, just as they would be by a browser: - -. -`foo bar - baz` -. -

foo bar baz

-. - -Q: Why not just leave the spaces, since browsers will collapse them -anyway? A: Because we might be targeting a non-HTML format, and we -shouldn't rely on HTML-specific rendering assumptions. - -(Existing implementations differ in their treatment of internal -spaces and [line ending]s. Some, including `Markdown.pl` and -`showdown`, convert an internal [line ending] into a -`<br />` tag. But this makes things difficult for those who like to -hard-wrap their paragraphs, since a line break in the midst of a code -span will cause an unintended line break in the output. Others just -leave internal spaces as they are, which is fine if only HTML is being -targeted.) - -. -`foo `` bar` -. -

foo `` bar

-. - -Note that backslash escapes do not work in code spans. All backslashes -are treated literally: - -. -`foo\`bar` -. -

foo\bar`

-. - -Backslash escapes are never needed, because one can always choose a -string of *n* backtick characters as delimiters, where the code does -not contain any strings of exactly *n* backtick characters. - -Code span backticks have higher precedence than any other inline -constructs except HTML tags and autolinks. Thus, for example, this is -not parsed as emphasized text, since the second `*` is part of a code -span: - -. -*foo`*` -. -

*foo*

-. - -And this is not parsed as a link: - -. -[not a `link](/foo`) -. -

[not a link](/foo)

-. - -Code spans, HTML tags, and autolinks have the same precedence. -Thus, this is code: - -. -`<a href="`">` -. -

<a href="">`

-. - -But this is an HTML tag: - -. -<a href="`">` -. -

`

-. - -And this is code: - -. -`<http://foo.bar.`baz>` -. -

<http://foo.bar.baz>`

-. - -But this is an autolink: - -. -<http://foo.bar.`baz>` -. -

http://foo.bar.`baz`

-. - -When a backtick string is not closed by a matching backtick string, -we just have literal backticks: - -. -```foo`` -. -

```foo``

-. - -. -`foo -. -

`foo
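The code span rules combine into a short procedure. Find an opening backtick string, look for the next backtick string of exactly the same length, then strip and collapse whitespace. This is a Python sketch with a hypothetical name. It assumes `start` points at the first backtick of a maximal run and treats only the ASCII [whitespace] characters as whitespace. It leaves backslashes untouched, since escapes do not apply inside code spans.

```python
import re

# [whitespace]: space, tab, newline, line tabulation, form feed, carriage return.
_WS_CHARS = " \t\n\x0b\x0c\r"
_WS_RUN = re.compile(r"[ \t\n\x0b\x0c\r]+")
_BACKTICKS = re.compile(r"`+")


def parse_code_span(text: str, start: int):
    """Parse a code span whose opening backtick string starts at text[start].

    Returns (content, end), where `end` is the index just past the closing
    backtick string, or None if no closer of the same length exists (the
    opening backticks are then literal text).
    """
    opener = _BACKTICKS.match(text, start)
    if opener is None:
        return None
    width = len(opener.group(0))
    # finditer yields maximal runs, i.e. backtick strings that are neither
    # preceded nor followed by another backtick.
    for closer in _BACKTICKS.finditer(text, opener.end()):
        if len(closer.group(0)) == width:
            raw = text[opener.end():closer.start()]
            content = _WS_RUN.sub(" ", raw.strip(_WS_CHARS))
            return content, closer.end()
    return None
```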

-. - -## Emphasis and strong emphasis - -John Gruber's original [Markdown syntax -description](http://daringfireball.net/projects/markdown/syntax#em) says: - -> Markdown treats asterisks (`*`) and underscores (`_`) as indicators of -> emphasis. Text wrapped with one `*` or `_` will be wrapped with an HTML -> `<em>` tag; double `*`'s or `_`'s will be wrapped with an HTML `<strong>` -> tag. - -This is enough for most users, but these rules leave much undecided, -especially when it comes to nested emphasis. The original -`Markdown.pl` test suite makes it clear that triple `***` and -`___` delimiters can be used for strong emphasis, and most -implementations have also allowed the following patterns: - -``` markdown -***strong emph*** -***strong** in emph* -***emph* in strong** -**in strong *emph*** -*in emph **strong*** -``` - -The following patterns are less widely supported, but the intent -is clear and they are useful (especially in contexts like bibliography -entries): - -``` markdown -*emph *with emph* in it* -**strong **with strong** in it** -``` - -Many implementations have also restricted intraword emphasis to -the `*` forms, to avoid unwanted emphasis in words containing -internal underscores. (It is best practice to put these in code -spans, but users often do not.) - -``` markdown -internal emphasis: foo*bar*baz -no emphasis: foo_bar_baz -``` - -The rules given below capture all of these patterns, while allowing -for efficient parsing strategies that do not backtrack. - -First, some definitions. A [delimiter run](@delimiter-run) is either -a sequence of one or more `*` characters that is not preceded or -followed by a `*` character, or a sequence of one or more `_` -characters that is not preceded or followed by a `_` character.
- -A [left-flanking delimiter run](@left-flanking-delimiter-run) is -a [delimiter run] that is (a) not followed by [unicode whitespace], -and (b) either not followed by a [punctuation character], or -preceded by [unicode whitespace] or a [punctuation character]. - -A [right-flanking delimiter run](@right-flanking-delimiter-run) is -a [delimiter run] that is (a) not preceded by [unicode whitespace], -and (b) either not preceded by a [punctuation character], or -followed by [unicode whitespace] or a [punctuation character]. - -Here are some examples of delimiter runs. - - - left-flanking but not right-flanking: - - ``` - ***abc - _abc - **"abc" - _"abc" - ``` - - - right-flanking but not left-flanking: - - ``` - abc*** - abc_ - "abc"** - "abc"_ - ``` - - - Both left and right-flanking: - - ``` - abc***def - "abc"_"def" - ``` - - - Neither left nor right-flanking: - - ``` - abc *** def - a _ b - ``` - -(The idea of distinguishing left-flanking and right-flanking -delimiter runs based on the character before and the character -after comes from Roopesh Chander's -[vfmd](http://www.vfmd.org/vfmd-spec/specification/#procedure-for-identifying-emphasis-tags). -vfmd uses the terminology "emphasis indicator string" instead of "delimiter -run," and its rules for distinguishing left- and right-flanking runs -are a bit more complex than the ones given here.) - -The following rules define emphasis and strong emphasis: - -1. A single `*` character [can open emphasis](@can-open-emphasis) - iff it is part of a [left-flanking delimiter run]. - -2. A single `_` character [can open emphasis] iff - it is part of a [left-flanking delimiter run] - and not part of a [right-flanking delimiter run]. - -3. A single `*` character [can close emphasis](@can-close-emphasis) - iff it is part of a [right-flanking delimiter run]. - -4. A single `_` character [can close emphasis] - iff it is part of a [right-flanking delimiter run] - and not part of a [left-flanking delimiter run]. - -5.
A double `**` [can open strong emphasis](@can-open-strong-emphasis) - iff it is part of a [left-flanking delimiter run]. - -6. A double `__` [can open strong emphasis] - iff it is part of a [left-flanking delimiter run] - and not part of a [right-flanking delimiter run]. - -7. A double `**` [can close strong emphasis](@can-close-strong-emphasis) - iff it is part of a [right-flanking delimiter run]. - -8. A double `__` [can close strong emphasis] - iff it is part of a [right-flanking delimiter run] - and not part of a [left-flanking delimiter run]. - -9. Emphasis begins with a delimiter that [can open emphasis] and ends - with a delimiter that [can close emphasis], and that uses the same - character (`_` or `*`) as the opening delimiter. There must - be a nonempty sequence of inlines between the open delimiter - and the closing delimiter; these form the contents of the emphasis - inline. - -10. Strong emphasis begins with a delimiter that - [can open strong emphasis] and ends with a delimiter that - [can close strong emphasis], and that uses the same character - (`_` or `*`) as the opening delimiter. - There must be a nonempty sequence of inlines between the open - delimiter and the closing delimiter; these form the contents of - the strong emphasis inline. - -11. A literal `*` character cannot occur at the beginning or end of - `*`-delimited emphasis or `**`-delimited strong emphasis, unless it - is backslash-escaped. - -12. A literal `_` character cannot occur at the beginning or end of - `_`-delimited emphasis or `__`-delimited strong emphasis, unless it - is backslash-escaped. - -Where rules 1--12 above are compatible with multiple parsings, -the following principles resolve ambiguity: - -13. The number of nestings should be minimized. Thus, for example, - an interpretation `<strong>...</strong>` is always preferred to - `<em><em>...</em></em>`. - -14. An interpretation `<strong><em>...</em></strong>` is always - preferred to `<em><strong>...</strong></em>`. - -15.
When two potential emphasis or strong emphasis spans overlap, - so that the second begins before the first ends and ends after - the first ends, the first takes precedence. Thus, for example, - `*foo _bar* baz_` is parsed as `<em>foo _bar</em> baz_` rather - than `*foo <em>bar* baz</em>`. For the same reason, - `**foo*bar**` is parsed as `<em><em>foo</em>bar</em>*` - rather than `<strong>foo*bar</strong>`. - -16. When there are two potential emphasis or strong emphasis spans - with the same closing delimiter, the shorter one (the one that - opens later) takes precedence. Thus, for example, - `**foo **bar baz**` is parsed as `**foo <strong>bar baz</strong>` - rather than `<strong>foo **bar baz</strong>`. - -17. Inline code spans, links, images, and HTML tags group more tightly - than emphasis. So, when there is a choice between an interpretation - that contains one of these elements and one that does not, the - former always wins. Thus, for example, `*[foo*](bar)` is - parsed as `*<a href="bar">foo*</a>` rather than as - `<em>[foo</em>](bar)`. - -These rules can be illustrated through a series of examples. - -Rule 1: - -. -*foo bar* -. -

foo bar

-. - -This is not emphasis, because the opening `*` is followed by -whitespace, and hence not part of a [left-flanking delimiter run]: - -. -a * foo bar* -. -

a * foo bar*

-. - -This is not emphasis, because the opening `*` is preceded -by an alphanumeric and followed by punctuation, and hence -not part of a [left-flanking delimiter run]: - -. -a*"foo"* -. -

a*"foo"*

-. - -Unicode nonbreaking spaces count as whitespace, too: - -. -* a * -. -

* a *

-. - -Intraword emphasis with `*` is permitted: - -. -foo*bar* -. -

foobar

-. - -. -5*6*78 -. -

5678

-. - -Rule 2: - -. -_foo bar_ -. -

foo bar

-. - -This is not emphasis, because the opening `_` is followed by -whitespace: - -. -_ foo bar_ -. -

_ foo bar_

-. - -This is not emphasis, because the opening `_` is preceded -by an alphanumeric and followed by punctuation: - -. -a_"foo"_ -. -

a_"foo"_

-. - -Emphasis with `_` is not allowed inside words: - -. -foo_bar_ -. -

foo_bar_

-. - -. -5_6_78 -. -

5_6_78

-. - -. -пристаням_стремятся_ -. -

пристаням_стремятся_

-. - -Here `_` does not generate emphasis, because the first delimiter run -is right-flanking and the second left-flanking: - -. -aa_"bb"_cc -. -

aa_"bb"_cc

-. - -Here there is no emphasis, because the delimiter runs are -both left- and right-flanking: - -. -"aa"_"bb"_"cc" -. -

"aa"_"bb"_"cc"

-. - -Rule 3: - -This is not emphasis, because the closing delimiter does -not match the opening delimiter: - -. -_foo* -. -

_foo*

-. - -This is not emphasis, because the closing `*` is preceded by -whitespace: - -. -*foo bar * -. -

*foo bar *

-. - -This is not emphasis, because the second `*` is -preceded by punctuation and followed by an alphanumeric -(hence it is not part of a [right-flanking delimiter run]): - -. -*(*foo) -. -

*(*foo)

-. - -The point of this restriction is more easily appreciated -with this example: - -. -*(*foo*)* -. -

(foo)

-. - -Intraword emphasis with `*` is allowed: - -. -*foo*bar -. -

foobar

-. - - -Rule 4: - -This is not emphasis, because the closing `_` is preceded by -whitespace: - -. -_foo bar _ -. -

_foo bar _

-. - -This is not emphasis, because the second `_` is -preceded by punctuation and followed by an alphanumeric: - -. -_(_foo) -. -

_(_foo)

-. - -This is emphasis within emphasis: - -. -_(_foo_)_ -. -

(foo)

-. - -Intraword emphasis is disallowed for `_`: - -. -_foo_bar -. -

_foo_bar

-. - -. -_пристаням_стремятся -. -

_пристаням_стремятся

-. - -. -_foo_bar_baz_ -. -

foo_bar_baz

-. - -Rule 5: - -. -**foo bar** -. -

foo bar

-. - -This is not strong emphasis, because the opening delimiter is -followed by whitespace: - -. -** foo bar** -. -

** foo bar**

-. - -This is not strong emphasis, because the opening `**` is preceded -by an alphanumeric and followed by punctuation, and hence -not part of a [left-flanking delimiter run]: - -. -a**"foo"** -. -

a**"foo"**

-. - -Intraword strong emphasis with `**` is permitted: - -. -foo**bar** -. -

foobar

-. - -Rule 6: - -. -__foo bar__ -. -

foo bar

-. - -This is not strong emphasis, because the opening delimiter is -followed by whitespace: - -. -__ foo bar__ -. -

__ foo bar__

-. - -This is not strong emphasis, because the opening `__` is preceded -by an alphanumeric and followed by punctuation: - -. -a__"foo"__ -. -

a__"foo"__

-. - -Intraword strong emphasis is forbidden with `__`: - -. -foo__bar__ -. -

foo__bar__

-. - -. -5__6__78 -. -

5__6__78

-. - -. -пристаням__стремятся__ -. -

пристаням__стремятся__

-. - -. -__foo, __bar__, baz__ -. -

foo, bar, baz

-. - -Rule 7: - -This is not strong emphasis, because the closing delimiter is preceded -by whitespace: - -. -**foo bar ** -. -

**foo bar **

-. - -(Nor can it be interpreted as an emphasized `*foo bar *`, because of -Rule 11.) - -This is not strong emphasis, because the second `**` is -preceded by punctuation and followed by an alphanumeric: - -. -**(**foo) -. -

**(**foo)

-. - -The point of this restriction is more easily appreciated -with these examples: - -. -*(**foo**)* -. -

(foo)

-. - -. -**Gomphocarpus (*Gomphocarpus physocarpus*, syn. -*Asclepias physocarpa*)** -. -

Gomphocarpus (Gomphocarpus physocarpus, syn. -Asclepias physocarpa)

-. - -. -**foo "*bar*" foo** -. -

foo "bar" foo

-. - -Intraword emphasis: - -. -**foo**bar -. -

foobar

-. - -Rule 8: - -This is not strong emphasis, because the closing delimiter is -preceded by whitespace: - -. -__foo bar __ -. -

__foo bar __

-. - -This is not strong emphasis, because the second `__` is -preceded by punctuation and followed by an alphanumeric: - -. -__(__foo) -. -

__(__foo)

-. - -The point of this restriction is more easily appreciated -with this example: - -. -_(__foo__)_ -. -

(foo)

-. - -Intraword strong emphasis is forbidden with `__`: - -. -__foo__bar -. -

__foo__bar

-. - -. -__пристаням__стремятся -. -

__пристаням__стремятся

-. - -. -__foo__bar__baz__ -. -

foo__bar__baz

-. - -Rule 9: - -Any nonempty sequence of inline elements can be the contents of an -emphasized span. - -. -*foo [bar](/url)* -. -

foo bar

-. - -. -*foo -bar* -. -

foo -bar

-. - -In particular, emphasis and strong emphasis can be nested -inside emphasis: - -. -_foo __bar__ baz_ -. -

foo bar baz

-. - -. -_foo _bar_ baz_ -. -

foo bar baz

-. - -. -__foo_ bar_ -. -

foo bar

-. - -. -*foo *bar** -. -

foo bar

-. - -. -*foo **bar** baz* -. -

foo bar baz

-. - -But note: - -. -*foo**bar**baz* -. -

foobarbaz

-. - -The difference is that in the preceding case, the internal delimiters -[can close emphasis], while in the cases with spaces, they cannot. - -. -***foo** bar* -. -

foo bar

-. - -. -*foo **bar*** -. -

foo bar

-. - -Note, however, that in the following case we get no strong -emphasis, because the opening delimiter is closed by the first -`*` before `bar`: - -. -*foo**bar*** -. -

foobar**

-. - - -Indefinite levels of nesting are possible: - -. -*foo **bar *baz* bim** bop* -. -

foo bar baz bim bop

-. - -. -*foo [*bar*](/url)* -. -

foo bar

-. - -There can be no empty emphasis or strong emphasis: - -. -** is not an empty emphasis -. -

** is not an empty emphasis

-. - -. -**** is not an empty strong emphasis -. -

**** is not an empty strong emphasis

-. - - -Rule 10: - -Any nonempty sequence of inline elements can be the contents of a -strongly emphasized span. - -. -**foo [bar](/url)** -. -

foo bar

-. - -. -**foo -bar** -. -

foo -bar

-. - -In particular, emphasis and strong emphasis can be nested -inside strong emphasis: - -. -__foo _bar_ baz__ -. -

foo bar baz

-. - -. -__foo __bar__ baz__ -. -

foo bar baz

-. - -. -____foo__ bar__ -. -

foo bar

-. - -. -**foo **bar**** -. -

foo bar

-. - -. -**foo *bar* baz** -. -

foo bar baz

-. - -But note: - -. -**foo*bar*baz** -. -

foobarbaz**

-. - -The difference is that in the preceding case, the internal delimiters -[can close emphasis], while in the cases with spaces, they cannot. - -. -***foo* bar** -. -

foo bar

-. - -. -**foo *bar*** -. -

foo bar

-. - -Indefinite levels of nesting are possible: - -. -**foo *bar **baz** -bim* bop** -. -

foo bar baz -bim bop

-. - -. -**foo [*bar*](/url)** -. -

foo bar

-. - -There can be no empty emphasis or strong emphasis: - -. -__ is not an empty emphasis -. -

__ is not an empty emphasis

-. - -. -____ is not an empty strong emphasis -. -

____ is not an empty strong emphasis

-. - - -Rule 11: - -. -foo *** -. -

foo ***

-. - -. -foo *\** -. -

foo *

-. - -. -foo *_* -. -

foo _

-. - -. -foo ***** -. -

foo *****

-. - -. -foo **\*** -. -

foo *

-. - -. -foo **_** -. -

foo _

-. - -Note that when delimiters do not match evenly, Rule 11 determines -that the excess literal `*` characters will appear outside of the -emphasis, rather than inside it: - -. -**foo* -. -

*foo

-. - -. -*foo** -. -

foo*

-. - -. -***foo** -. -

*foo

-. - -. -****foo* -. -

***foo

-. - -. -**foo*** -. -

foo*

-. - -. -*foo**** -. -

foo***

-. - - -Rule 12: - -. -foo ___ -. -

foo ___

-. - -. -foo _\__ -. -

foo _

-. - -. -foo _*_ -. -

foo *

-. - -. -foo _____ -. -

foo _____

-. - -. -foo __\___ -. -

foo _

-. - -. -foo __*__ -. -

foo *

-. - -. -__foo_ -. -

_foo

-. - -Note that when delimiters do not match evenly, Rule 12 determines -that the excess literal `_` characters will appear outside of the -emphasis, rather than inside it: - -. -_foo__ -. -

foo_

-. - -. -___foo__ -. -

_foo

-. - -. -____foo_ -. -

___foo

-. - -. -__foo___ -. -

foo_

-. - -. -_foo____ -. -

foo___

-. - -Rule 13 implies that if you want emphasis nested directly inside -emphasis, you must use different delimiters: - -. -**foo** -. -

foo

-. - -. -*_foo_* -. -

foo

-. - -. -__foo__ -. -

foo

-. - -. -_*foo*_ -. -

foo

-. - -However, strong emphasis within strong emphasis is possible without -switching delimiters: - -. -****foo**** -. -

foo

-. - -. -____foo____ -. -

foo

-. - - -Rule 13 can be applied to arbitrarily long sequences of -delimiters: - -. -******foo****** -. -

foo

-. - -Rule 14: - -. -***foo*** -. -

foo

-. - -. -_____foo_____ -. -

foo

-. - -Rule 15: - -. -*foo _bar* baz_ -. -

foo _bar baz_

-. - -. -**foo*bar** -. -

foobar*

-. - - -Rule 16: - -. -**foo **bar baz** -. -

**foo bar baz

-. - -. -*foo *bar baz* -. -

*foo bar baz

-. - -Rule 17: - -. -*[bar*](/url) -. -

*bar*

-. - -. -_foo [bar_](/url) -. -

_foo bar_

-. - -. -*<img src="foo" title="*"/> -. -

*

-. - -. -**<a href="**"> -. -

**

-. - -. -__<a href="__"> -. -

__

-. - -. -*a `*`* -. -

a *

-. - -. -_a `_`_ -. -

a _

-. - -. -**a<http://foo.bar?q=**> -. -

**ahttp://foo.bar?q=**

-. - -. -__a<http://foo.bar?q=__> -. -

__ahttp://foo.bar?q=__
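Rules 1 through 8 depend only on the characters immediately before and after a delimiter run, so they can be checked locally. The following Python sketch classifies a run and derives whether it can open or close emphasis. The function names are illustrative. It assumes the start and end of the input count as whitespace, and it leaves the actual matching of openers and closers (rules 9 through 17) out of scope.

```python
import string
import unicodedata


def is_unicode_whitespace(c: str) -> bool:
    # [unicode whitespace]: category Zs, or tab, carriage return, newline, form feed.
    return c in "\t\r\n\f" or unicodedata.category(c) == "Zs"


def is_punctuation(c: str) -> bool:
    # ASCII punctuation, or anything in a Unicode P* (punctuation) category.
    return c in string.punctuation or unicodedata.category(c).startswith("P")


def flanking(text: str, start: int, end: int):
    """Return (left_flanking, right_flanking) for the run text[start:end].

    Assumption: the start and end of `text` count as whitespace.
    """
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    left = not is_unicode_whitespace(after) and (
        not is_punctuation(after)
        or is_unicode_whitespace(before)
        or is_punctuation(before)
    )
    right = not is_unicode_whitespace(before) and (
        not is_punctuation(before)
        or is_unicode_whitespace(after)
        or is_punctuation(after)
    )
    return left, right


def can_open_close(text: str, start: int, end: int):
    """Apply rules 1 through 8: return (can_open, can_close) for the run."""
    left, right = flanking(text, start, end)
    if text[start] == "*":
        return left, right
    # A `_` run must not be both left- and right-flanking, which rules out
    # intraword emphasis such as foo_bar_.
    return left and not right, right and not left
```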

-. - - -## Links - -A link contains [link text] (the visible text), a [link destination] -(the URI that is the link destination), and optionally a [link title]. -There are two basic kinds of links in Markdown. In [inline link]s the -destination and title are given immediately after the link text. In -[reference link]s the destination and title are defined elsewhere in -the document. - -A [link text](@link-text) consists of a sequence of zero or more -inline elements enclosed by square brackets (`[` and `]`). The -following rules apply: - -- Links may not contain other links, at any level of nesting. - -- Brackets are allowed in the [link text] only if (a) they - are backslash-escaped or (b) they appear as a matched pair of brackets, - with an open bracket `[`, a sequence of zero or more inlines, and - a close bracket `]`. - -- Backtick [code span]s, [autolink]s, and raw [HTML tag]s bind more tightly - than the brackets in link text. Thus, for example, - `` [foo`]` `` could not be a link text, since the second `]` - is part of a code span. - -- The brackets in link text bind more tightly than markers for - [emphasis and strong emphasis]. Thus, for example, `*[foo*](url)` is a link. - -A [link destination](@link-destination) consists of either - -- a sequence of zero or more characters between an opening `<` and a - closing `>` that contains no line breaks or unescaped `<` or `>` - characters, or - -- a nonempty sequence of characters that does not include - ASCII space or control characters, and includes parentheses - only if (a) they are backslash-escaped or (b) they are part of - a balanced pair of unescaped parentheses that is not itself - inside a balanced pair of unescaped parentheses.
- -A [link title](@link-title) consists of either - -- a sequence of zero or more characters between straight double-quote - characters (`"`), including a `"` character only if it is - backslash-escaped, or - -- a sequence of zero or more characters between straight single-quote - characters (`'`), including a `'` character only if it is - backslash-escaped, or - -- a sequence of zero or more characters between matching parentheses - (`(...)`), including a `)` character only if it is backslash-escaped. - -An [inline link](@inline-link) consists of a [link text] followed immediately -by a left parenthesis `(`, optional [whitespace], an optional -[link destination], an optional [link title] separated from the link -destination by [whitespace], optional [whitespace], and a right -parenthesis `)`. The link's text consists of the inlines contained -in the [link text] (excluding the enclosing square brackets). -The link's URI consists of the link destination, excluding enclosing -`<...>` if present, with backslash-escapes in effect as described -above. The link's title consists of the link title, excluding its -enclosing delimiters, with backslash-escapes in effect as described -above. - -Here is a simple inline link: - -. -[link](/uri "title") -. -

link

-. - -The title may be omitted: - -. -[link](/uri) -. -

link

-. - -Both the title and the destination may be omitted: - -. -[link]() -. -

link

-. - -. -[link](<>) -. -

link

-. - -If the destination contains spaces, it must be enclosed in pointy -braces: - -. -[link](/my uri) -. -

[link](/my uri)

-. - -. -[link](</my uri>) -. -

link

-. - -The destination cannot contain line breaks, even with pointy braces: - -. -[link](foo -bar) -. -

[link](foo -bar)

-. - -. -[link](<foo -bar>) -. -

[link]()

-. - -One level of balanced parentheses is allowed without escaping: - -. -[link]((foo)and(bar)) -. -

link

-. - -However, if you have parentheses within parentheses, you need to escape -or use the `<...>` form: - -. -[link](foo(and(bar))) -. -

[link](foo(and(bar)))

-. - -. -[link](foo(and\(bar\))) -. -

link

-. - -. -[link](<foo(and(bar))>) -. -

link

-. - -Parentheses and other symbols can also be escaped, as usual -in Markdown: - -. -[link](foo\)\:) -. -

link
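The parenthesis rule for destinations without pointy braces can be implemented as a single left-to-right scan that tracks one level of nesting. The Python function below is a hypothetical helper. It is assumed to be called at the first character after the link's `(` and any [whitespace]. It returns the end of the destination and leaves parsing the optional title and the closing `)` to the caller.

```python
import string


def scan_link_destination(text: str, pos: int):
    """Scan an unbracketed link destination starting at text[pos].

    Returns the index just past the destination, or None if no valid
    destination starts here.
    """
    depth = 0
    i = pos
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text) and text[i + 1] in string.punctuation:
            i += 2  # backslash-escaped punctuation, including \( and \)
            continue
        if c == " " or ord(c) < 0x20 or ord(c) == 0x7F:
            break  # ASCII space and control characters end the destination
        if c == "(":
            if depth == 1:
                return None  # parentheses nested two deep must be escaped
            depth = 1
        elif c == ")":
            if depth == 0:
                break  # an unmatched ")" belongs to the enclosing inline link
            depth = 0
        i += 1
    if i == pos or depth != 0:
        return None  # empty, or an unbalanced "("
    return i
```

For `[link](/my uri)` the scan stops at the space after `/my`. Since `uri)` is neither a title nor `)`, the caller would reject the link, as in the example earlier in this section.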

-. - -URL-escaping should be left alone inside the destination, as all -URL-escaped characters are also valid URL characters. HTML entities in -the destination will be parsed into their UTF-8 codepoints, as usual, and -optionally URL-escaped when written as HTML. - -. -[link](foo%20b&auml;) -. -

link

-. - -Note that, because titles can often be parsed as destinations, -if you try to omit the destination and keep the title, you'll -get unexpected results: - -. -[link]("title") -. -

link

-. - -Titles may be in single quotes, double quotes, or parentheses: - -. -[link](/url "title") -[link](/url 'title') -[link](/url (title)) -. -

link -link -link

-. - -Backslash escapes and entities may be used in titles: - -. -[link](/url "title \"&quot;") -. -

link

-. - -Nested balanced quotes are not allowed without escaping: - -. -[link](/url "title "and" title") -. -

[link](/url "title "and" title")

-. - -But it is easy to work around this by using a different quote type: - -. -[link](/url 'title "and" title') -. -

link

-. - -(Note: `Markdown.pl` did allow double quotes inside a double-quoted -title, and its test suite included a test demonstrating this. -But it is hard to see a good rationale for the extra complexity this -brings, since there are already many ways---backslash escaping, -entities, or using a different quote type for the enclosing title---to -write titles containing double quotes. `Markdown.pl`'s handling of -titles has a number of other strange features. For example, it allows -single-quoted titles in inline links, but not reference links. And, in -reference links but not inline links, it allows a title to begin with -`"` and end with `)`. `Markdown.pl` 1.0.1 even allows titles with no closing -quotation mark, though 1.0.2b8 does not. It seems preferable to adopt -a simple, rational rule that works the same way in inline links and -link reference definitions.) - -[Whitespace] is allowed around the destination and title: - -. -[link]( /uri - "title" ) -. -

link

-. - -But it is not allowed between the link text and the -following parenthesis: - -. -[link] (/uri) -. -

[link] (/uri)

-. - -The link text may contain balanced brackets, but not unbalanced ones, -unless they are escaped: - -. -[link [foo [bar]]](/uri) -. -

link [foo [bar]]

-. - -. -[link] bar](/uri) -. -

[link] bar](/uri)

-. - -. -[link [bar](/uri) -. -

[link bar

-. - -. -[link \[bar](/uri) -. -

link [bar

-. - -The link text may contain inline content: - -. -[link *foo **bar** `#`*](/uri) -. -

link foo bar #

-. - -. -[![moon](moon.jpg)](/uri) -. -

moon

-. - -However, links may not contain other links, at any level of nesting. - -. -[foo [bar](/uri)](/uri) -. -

[foo bar](/uri)

-. - -. -[foo *[bar [baz](/uri)](/uri)*](/uri) -. -

[foo [bar baz](/uri)](/uri)

-. - -. -![[[foo](uri1)](uri2)](uri3) -. -

[foo](uri2)

-. - -These cases illustrate the precedence of link text grouping over -emphasis grouping: - -. -*[foo*](/uri) -. -

*foo*

-. - -. -[foo *bar](baz*) -. -

foo *bar

-. - -Note that brackets that *aren't* part of links do not take -precedence: - -. -*foo [bar* baz] -. -

foo [bar baz]

-. - -These cases illustrate the precedence of HTML tags, code spans, -and autolinks over link grouping: - -. -[foo <bar attr="](baz)"> -. -

[foo

-. - -. -[foo`](/uri)` -. -

[foo](/uri)

-. - -. -[foo<http://example.com?search=](uri)> -. -

[foohttp://example.com?search=](uri)

-. - -There are three kinds of [reference link](@reference-link)s: -[full](#full-reference-link), [collapsed](#collapsed-reference-link), -and [shortcut](#shortcut-reference-link). - -A [full reference link](@full-reference-link) -consists of a [link text], optional [whitespace], and a [link label] -that [matches] a [link reference definition] elsewhere in the document. - -A [link label](@link-label) begins with a left bracket (`[`) and ends -with the first right bracket (`]`) that is not backslash-escaped. -Unescaped square bracket characters are not allowed in -[link label]s. A link label can have at most 999 -characters inside the square brackets. - -One label [matches](@matches) -another just in case their normalized forms are equal. To normalize a -label, perform the *unicode case fold* and collapse consecutive internal -[whitespace] to a single space. If there are multiple -matching reference link definitions, the one that comes first in the -document is used. (It is desirable in such cases to emit a warning.) - -The contents of the first link label are parsed as inlines, which are -used as the link's text. The link's URI and title are provided by the -matching [link reference definition]. - -Here is a simple example: - -. -[foo][bar] - -[bar]: /url "title" -. -

foo

-. - -The rules for the [link text] are the same as with -[inline link]s. Thus: - -The link text may contain balanced brackets, but not unbalanced ones, -unless they are escaped: - -. -[link [foo [bar]]][ref] - -[ref]: /uri -. -

link [foo [bar]]

-. - -. -[link \[bar][ref] - -[ref]: /uri -. -

link [bar

-. - -The link text may contain inline content: - -. -[link *foo **bar** `#`*][ref] - -[ref]: /uri -. -

link foo bar #

-. - -. -[![moon](moon.jpg)][ref] - -[ref]: /uri -. -

moon

-. - -However, links may not contain other links, at any level of nesting. - -. -[foo [bar](/uri)][ref] - -[ref]: /uri -. -

[foo bar]ref

-. - -. -[foo *bar [baz][ref]*][ref] - -[ref]: /uri -. -

[foo bar baz]ref

-. - -(In the examples above, we have two [shortcut reference link]s -instead of one [full reference link].) - -The following cases illustrate the precedence of link text grouping over -emphasis grouping: - -. -*[foo*][ref] - -[ref]: /uri -. -

*foo*

-. - -. -[foo *bar][ref] - -[ref]: /uri -. -

foo *bar

-. - -These cases illustrate the precedence of HTML tags, code spans, -and autolinks over link grouping: - -. -[foo - -[ref]: /uri -. -

[foo

-. - -. -[foo`][ref]` - -[ref]: /uri -. -

[foo][ref]

-. - -. -[foo - -[ref]: /uri -. -

[foohttp://example.com?search=][ref]

-. - -Matching is case-insensitive: - -. -[foo][BaR] - -[bar]: /url "title" -. -

foo

-. - -Unicode case fold is used: - -. -[Толпой][Толпой] is a Russian word. - -[ТОЛПОЙ]: /url -. -

Толпой is a Russian word.

-. - -Consecutive internal [whitespace] is treated as one space for -purposes of determining matching: - -. -[Foo - bar]: /url - -[Baz][Foo bar] -. -

Baz

-. - -There can be [whitespace] between the [link text] and the [link label]: - -. -[foo] [bar] - -[bar]: /url "title" -. -

foo

-. - -. -[foo] -[bar] - -[bar]: /url "title" -. -

foo

-. - -When there are multiple matching [link reference definition]s, -the first is used: - -. -[foo]: /url1 - -[foo]: /url2 - -[bar][foo] -. -

bar

-. - -Note that matching is performed on normalized strings, not parsed -inline content. So the following does not match, even though the -labels define equivalent inline content: - -. -[bar][foo\!] - -[foo!]: /url -. -

[bar][foo!]

-. - -[Link label]s cannot contain brackets, unless they are -backslash-escaped: - -. -[foo][ref[] - -[ref[]: /uri -. -

[foo][ref[]

-

[ref[]: /uri

-. - -. -[foo][ref[bar]] - -[ref[bar]]: /uri -. -

[foo][ref[bar]]

-

[ref[bar]]: /uri

-. - -. -[[[foo]]] - -[[[foo]]]: /url -. -

[[[foo]]]

-

[[[foo]]]: /url

-. - -. -[foo][ref\[] - -[ref\[]: /uri -. -

foo

-. - -A [collapsed reference link](@collapsed-reference-link) -consists of a [link label] that [matches] a -[link reference definition] elsewhere in the -document, optional [whitespace], and the string `[]`. -The contents of the first link label are parsed as inlines, -which are used as the link's text. The link's URI and title are -provided by the matching reference link definition. Thus, -`[foo][]` is equivalent to `[foo][foo]`. - -. -[foo][] - -[foo]: /url "title" -. -

foo

-. - -. -[*foo* bar][] - -[*foo* bar]: /url "title" -. -

foo bar

-. - -The link labels are case-insensitive: - -. -[Foo][] - -[foo]: /url "title" -. -

Foo

-. - - -As with full reference links, [whitespace] is allowed -between the two sets of brackets: - -. -[foo] -[] - -[foo]: /url "title" -. -

foo

-. - -A [shortcut reference link](@shortcut-reference-link) -consists of a [link label] that [matches] a -[link reference definition] elsewhere in the -document and is not followed by `[]` or a link label. -The contents of the first link label are parsed as inlines, -which are used as the link's text. The link's URI and title -are provided by the matching link reference definition. -Thus, `[foo]` is equivalent to `[foo][]`. - -. -[foo] - -[foo]: /url "title" -. -

foo

-. - -. -[*foo* bar] - -[*foo* bar]: /url "title" -. -

foo bar

-. - -. -[[*foo* bar]] - -[*foo* bar]: /url "title" -. -

[foo bar]

-. - -The link labels are case-insensitive: - -. -[Foo] - -[foo]: /url "title" -. -

Foo

-. - -A space after the link text should be preserved: - -. -[foo] bar - -[foo]: /url -. -

foo bar

-. - -If you just want bracketed text, you can backslash-escape the -opening bracket to avoid links: - -. -\[foo] - -[foo]: /url "title" -. -

[foo]

-. - -Note that this is a link, because a link label ends with the first -following closing bracket: - -. -[foo*]: /url - -*[foo*] -. -

*foo*

-. - -Full references take precedence over shortcut references: - -. -[foo][bar] - -[foo]: /url1 -[bar]: /url2 -. -

foo

-. - -In the following case `[bar][baz]` is parsed as a reference, -`[foo]` as normal text: - -. -[foo][bar][baz] - -[baz]: /url -. -

[foo]bar

-. - -Here, though, `[foo][bar]` is parsed as a reference, since -`[bar]` is defined: - -. -[foo][bar][baz] - -[baz]: /url1 -[bar]: /url2 -. -

foobaz

-. - -Here `[foo]` is not parsed as a shortcut reference, because it -is followed by a link label (even though `[bar]` is not defined): - -. -[foo][bar][baz] - -[baz]: /url1 -[foo]: /url2 -. -

[foo]bar

-. - - -## Images - -Syntax for images is like the syntax for links, with one -difference. Instead of [link text], we have an -[image description](@image-description). The rules for this are the -same as for [link text], except that (a) an -image description starts with `![` rather than `[`, and -(b) an image description may contain links. -An image description has inline elements -as its contents. When an image is rendered to HTML, -this is standardly used as the image's `alt` attribute. - -. -![foo](/url "title") -. -

foo

-. - -. -![foo *bar*] - -[foo *bar*]: train.jpg "train & tracks" -. -

foo bar

-. - -. -![foo ![bar](/url)](/url2) -. -

foo bar

-. - -. -![foo [bar](/url)](/url2) -. -

foo bar

-. - -Though this spec is concerned with parsing, not rendering, it is -recommended that in rendering to HTML, only the plain string content -of the [image description] be used. Note that in -the above example, the alt attribute's value is `foo bar`, not `foo -[bar](/url)` or `foo <a href="/url">bar</a>`. Only the plain string -content is rendered, without formatting. - -. -![foo *bar*][] - -[foo *bar*]: train.jpg "train & tracks" -. -

foo bar

-. - -. -![foo *bar*][foobar] - -[FOOBAR]: train.jpg "train & tracks" -. -

foo bar

-. - -. -![foo](train.jpg) -. -

foo

-. - -. -My ![foo bar](/path/to/train.jpg "title" ) -. -

My foo bar

-. - -. -![foo]() -. -

foo

-. - -. -![](/url) -. -

-. - -Reference-style: - -. -![foo] [bar] - -[bar]: /url -. -

foo

-. - -. -![foo] [bar] - -[BAR]: /url -. -

foo

-. - -Collapsed: - -. -![foo][] - -[foo]: /url "title" -. -

foo

-. - -. -![*foo* bar][] - -[*foo* bar]: /url "title" -. -

foo bar

-. - -The labels are case-insensitive: - -. -![Foo][] - -[foo]: /url "title" -. -

Foo

-. - -As with full reference links, [whitespace] is allowed -between the two sets of brackets: - -. -![foo] -[] - -[foo]: /url "title" -. -

foo

-. - -Shortcut: - -. -![foo] - -[foo]: /url "title" -. -

foo

-. - -. -![*foo* bar] - -[*foo* bar]: /url "title" -. -

foo bar

-. - -Note that link labels cannot contain unescaped brackets: - -. -![[foo]] - -[[foo]]: /url "title" -. -

![[foo]]

-

[[foo]]: /url "title"

-. - -The link labels are case-insensitive: - -. -![Foo] - -[foo]: /url "title" -. -

Foo

-. - -If you just want bracketed text, you can backslash-escape the -opening `!` and `[`: - -. -\!\[foo] - -[foo]: /url "title" -. -

![foo]

-. - -If you want a link after a literal `!`, backslash-escape the -`!`: - -. -\![foo] - -[foo]: /url "title" -. -

!foo

-. - -## Autolinks - -[Autolink](@autolink)s are absolute URIs and email addresses inside -`<` and `>`. They are parsed as links, with the URL or email address -as the link label. - -A [URI autolink](@uri-autolink) consists of `<`, followed by an -[absolute URI] not containing `<`, followed by `>`. It is parsed as -a link to the URI, with the URI as the link's label. - -An [absolute URI](@absolute-uri), -for these purposes, consists of a [scheme] followed by a colon (`:`) -followed by zero or more characters other than ASCII -[whitespace] and control characters, `<`, and `>`. If -the URI includes these characters, you must use percent-encoding -(e.g. `%20` for a space). - -The following [schemes](@scheme) -are recognized (case-insensitive): -`coap`, `doi`, `javascript`, `aaa`, `aaas`, `about`, `acap`, `cap`, -`cid`, `crid`, `data`, `dav`, `dict`, `dns`, `file`, `ftp`, `geo`, `go`, -`gopher`, `h323`, `http`, `https`, `iax`, `icap`, `im`, `imap`, `info`, -`ipp`, `iris`, `iris.beep`, `iris.xpc`, `iris.xpcs`, `iris.lwz`, `ldap`, -`mailto`, `mid`, `msrp`, `msrps`, `mtqp`, `mupdate`, `news`, `nfs`, -`ni`, `nih`, `nntp`, `opaquelocktoken`, `pop`, `pres`, `rtsp`, -`service`, `session`, `shttp`, `sieve`, `sip`, `sips`, `sms`, `snmp`, -`soap.beep`, `soap.beeps`, `tag`, `tel`, `telnet`, `tftp`, `thismessage`, -`tn3270`, `tip`, `tv`, `urn`, `vemmi`, `ws`, `wss`, `xcon`, -`xcon-userid`, `xmlrpc.beep`, `xmlrpc.beeps`, `xmpp`, `z39.50r`, -`z39.50s`, `adiumxtra`, `afp`, `afs`, `aim`, `apt`, `attachment`, `aw`, -`beshare`, `bitcoin`, `bolo`, `callto`, `chrome`, `chrome-extension`, -`com-eventbrite-attendee`, `content`, `cvs`, `dlna-playsingle`, -`dlna-playcontainer`, `dtn`, `dvb`, `ed2k`, `facetime`, `feed`, -`finger`, `fish`, `gg`, `git`, `gizmoproject`, `gtalk`, `hcp`, `icon`, -`ipn`, `irc`, `irc6`, `ircs`, `itms`, `jar`, `jms`, `keyparc`, `lastfm`, -`ldaps`, `magnet`, `maps`, `market`, `message`, `mms`, `ms-help`, -`msnim`, `mumble`, `mvn`, `notes`, `oid`, `palm`, `paparazzi`, 
-`platform`, `proxy`, `psyc`, `query`, `res`, `resource`, `rmi`, `rsync`, -`rtmp`, `secondlife`, `sftp`, `sgn`, `skype`, `smb`, `soldat`, -`spotify`, `ssh`, `steam`, `svn`, `teamspeak`, `things`, `udp`, -`unreal`, `ut2004`, `ventrilo`, `view-source`, `webcal`, `wtai`, -`wyciwyg`, `xfire`, `xri`, `ymsgr`. - -Here are some valid autolinks: - -. - -. -

http://foo.bar.baz

-. - -. - -. -

http://foo.bar.baz?q=hello&id=22&boolean

-. - -. - -. -

irc://foo.bar:2233/baz

-. - -Uppercase is also fine: - -. - -. -

MAILTO:FOO@BAR.BAZ

-. - -Spaces are not allowed in autolinks: - -. - -. -

<http://foo.bar/baz bim>

-. - -An [email autolink](@email-autolink) -consists of `<`, followed by an [email address], -followed by `>`. The link's label is the email address, -and the URL is `mailto:` followed by the email address. - -An [email address](@email-address), -for these purposes, is anything that matches -the [non-normative regex from the HTML5 -spec](https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)): - - /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? - (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ - -Examples of email autolinks: - -. - -. -

foo@bar.example.com

-. - -. - -. -

foo+special@Bar.baz-bar0.com

-. - -These are not autolinks: - -. -<> -. -

<>

-. - -. - -. -

<heck://bing.bong>

-. - -. -< http://foo.bar > -. -

< http://foo.bar >

-. - -. - -. -

<foo.bar.baz>

-. - -. - -. -

<localhost:5001/foo>

-. - -. -http://example.com -. -

http://example.com

-. - -. -foo@bar.example.com -. -

foo@bar.example.com

-. - -## Raw HTML - -Text between `<` and `>` that looks like an HTML tag is parsed as a -raw HTML tag and will be rendered in HTML without escaping. -Tag and attribute names are not limited to current HTML tags, -so custom tags (and even, say, DocBook tags) may be used. - -Here is the grammar for tags: - -A [tag name](@tag-name) consists of an ASCII letter -followed by zero or more ASCII letters or digits. - -An [attribute](@attribute) consists of [whitespace], -an [attribute name], and an optional -[attribute value specification]. - -An [attribute name](@attribute-name) -consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII -letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML -specification restricted to ASCII. HTML5 is laxer.) - -An [attribute value specification](@attribute-value-specification) -consists of optional [whitespace], -a `=` character, optional [whitespace], and an [attribute -value]. - -An [attribute value](@attribute-value) -consists of an [unquoted attribute value], -a [single-quoted attribute value], or a [double-quoted attribute value]. - -An [unquoted attribute value](@unquoted-attribute-value) -is a nonempty string of characters not -including spaces, `"`, `'`, `=`, `<`, `>`, or `` ` ``. - -A [single-quoted attribute value](@single-quoted-attribute-value) -consists of `'`, zero or more -characters not including `'`, and a final `'`. - -A [double-quoted attribute value](@double-quoted-attribute-value) -consists of `"`, zero or more -characters not including `"`, and a final `"`. - -An [open tag](@open-tag) consists of a `<` character, a [tag name], -zero or more [attributes], optional [whitespace], an optional `/` -character, and a `>` character. - -A [closing tag](@closing-tag) consists of the string `</`, a -[tag name], optional [whitespace], and the character `>`. - -An [HTML comment](@html-comment) consists of `<!--` + *text* + `-->`, -where *text* does not start with `>` or `->`, does not end with `-`, -and does not contain `--`. 
(See the -[HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).) - -A [processing instruction](@processing-instruction) -consists of the string `<?`, a string -of characters not including the string `?>`, and the string -`?>`. - -A [declaration](@declaration) consists of the -string `<!`, a name consisting of one or more uppercase -ASCII letters, [whitespace], a string of characters not including the -character `>`, and the character `>`. - -A [CDATA section](@cdata-section) consists of -the string `<![CDATA[`, a string of characters not including -the string `]]>`, and the string `]]>`. - -An [HTML tag](@html-tag) consists of an [open tag], a [closing tag], -an [HTML comment], a [processing instruction], a [declaration], -or a [CDATA section]. - -Here are some simple open tags: - -. - -. -

-. - -Empty elements: - -. - -. -

-. - -[Whitespace] is allowed: - -. - -. -

-. - -With attributes: - -. - -. -

-. - -Illegal tag names, not parsed as HTML: - -. -<33> <__> -. -

<33> <__>

-. - -Illegal attribute names: - -. -
-. -

<a h*#ref="hi">

-. - -Illegal attribute values: - -. -
-. -

</a href="foo">

-. - -Comments: - -. -foo -. -

foo

-. - -. -foo -. -

foo <!-- not a comment -- two hyphens -->

-. - -Not comments: - -. -foo foo --> - -foo -. -

foo <!--> foo -->

-

foo <!-- foo--->

-. - -Processing instructions: - -. -foo -. -

foo

-. - -Declarations: - -. -foo -. -

foo

-. - -CDATA sections: - -. -foo &<]]> -. -

foo &<]]>

-. - -Entities are preserved in HTML attributes: - -. -
-. -

-. - -Backslash escapes do not work in HTML attributes: - -. - -. -

-. - -. - -. -

<a href=""">

-. - -## Hard line breaks - -A line break (not in a code span or HTML tag) that is preceded -by two or more spaces and does not occur at the end of a block -is parsed as a [hard line break](@hard-line-break) (rendered -in HTML as a `<br />` tag): - -. -foo -baz -. -

foo
-baz

-. - -For a more visible alternative, a backslash before the -[line ending] may be used instead of two spaces: - -. -foo\ -baz -. -

foo
-baz

-. - -More than two spaces can be used: - -. -foo -baz -. -

foo
-baz

-. - -Leading spaces at the beginning of the next line are ignored: - -. -foo - bar -. -

foo
-bar

-. - -. -foo\ - bar -. -

foo
-bar

-. - -Line breaks can occur inside emphasis, links, and other constructs -that allow inline content: - -. -*foo -bar* -. -

foo
-bar

-. - -. -*foo\ -bar* -. -

foo
-bar

-. - -Line breaks do not occur inside code spans - -. -`code -span` -. -

code span

-. - -. -`code\ -span` -. -

code\ span

-. - -or HTML tags: - -. -
-. -

-. - -. - -. -

-. - -Hard line breaks are for separating inline content within a block. -Neither syntax for hard line breaks works at the end of a paragraph or -other block element: - -. -foo\ -. -

foo\

-. - -. -foo -. -

foo

-. - -. -### foo\ -. -

foo\

-. - -. -### foo -. -

foo

-. - -## Soft line breaks - -A regular line break (not in a code span or HTML tag) that is not -preceded by two or more spaces is parsed as a softbreak. (A -softbreak may be rendered in HTML either as a -[line ending] or as a space. The result will be the same -in browsers. In the examples here, a [line ending] will be used.) - -. -foo -baz -. -

foo -baz

-. - -Spaces at the end of the line and beginning of the next line are -removed: - -. -foo - baz -. -

foo -baz

-. - -A conforming parser may render a soft line break in HTML either as a -line break or as a space. - -A renderer may also provide an option to render soft line breaks -as hard line breaks. - -## Textual content - -Any characters not given an interpretation by the above rules will -be parsed as plain textual content. - -. -hello $.;'there -. -

hello $.;'there

-. - -. -Foo χρῆν -. -

Foo χρῆν

-. - -Internal spaces are preserved verbatim: - -. -Multiple spaces -. -

Multiple spaces

-. - - - -# Appendix A: A parsing strategy {-} - -## Overview {-} - -Parsing has two phases: - -1. In the first phase, lines of input are consumed and the block -structure of the document---its division into paragraphs, block quotes, -list items, and so on---is constructed. Text is assigned to these -blocks but not parsed. Link reference definitions are parsed and a -map of links is constructed. - -2. In the second phase, the raw text contents of paragraphs and headers -are parsed into sequences of Markdown inline elements (strings, -code spans, links, emphasis, and so on), using the map of link -references constructed in phase 1. - -## The document tree {-} - -At each point in processing, the document is represented as a tree of -**blocks**. The root of the tree is a `document` block. The `document` -may have any number of other blocks as **children**. These children -may, in turn, have other blocks as children. The last child of a block -is normally considered **open**, meaning that subsequent lines of input -can alter its contents. (Blocks that are not open are **closed**.) -Here, for example, is a possible document tree, with the open blocks -marked by arrows: - -``` tree --> document - -> block_quote - paragraph - "Lorem ipsum dolor\nsit amet." - -> list (type=bullet tight=true bullet_char=-) - list_item - paragraph - "Qui *quodsi iracundia*" - -> list_item - -> paragraph - "aliquando id" -``` - -## How source lines alter the document tree {-} - -Each line that is processed has an effect on this tree. The line is -analyzed and, depending on its contents, the document may be altered -in one or more of the following ways: - -1. One or more open blocks may be closed. -2. One or more new blocks may be created as children of the - last open block. -3. Text may be added to the last (deepest) open block remaining - on the tree. - -Once a line has been incorporated into the tree in this way, -it can be discarded, so input can be read in a stream. 
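The three effects listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the reference implementation: the `Block` class, its fields, and the helpers `open_chain`, `close_below`, `add_child`, and `add_text` are hypothetical names. Deciding *which* effect a given line triggers (the real work of block parsing) is done by hand in `build_example`, which replays the four lines that produce the tree shown above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """One node of the document tree (a hypothetical representation)."""
    kind: str                                        # "document", "paragraph", ...
    children: List["Block"] = field(default_factory=list)
    is_open: bool = True                             # open blocks accept new lines
    lines: List[str] = field(default_factory=list)   # raw text, parsed as inlines later


def open_chain(root: Block) -> List[Block]:
    """Open blocks from the root down to the deepest open block.

    Only the last child of a block can be open, so we follow last
    children for as long as they are open.
    """
    chain = [root]
    node = root
    while node.children and node.children[-1].is_open:
        node = node.children[-1]
        chain.append(node)
    return chain


def close_below(root: Block, depth: int) -> None:
    """Effect 1: close every open block deeper than `depth` in the chain."""
    for block in open_chain(root)[depth + 1:]:
        block.is_open = False


def add_child(root: Block, kind: str) -> Block:
    """Effect 2: create a new block as a child of the last open block."""
    child = Block(kind)
    open_chain(root)[-1].children.append(child)
    return child


def add_text(root: Block, text: str) -> None:
    """Effect 3: append text to the deepest open block."""
    open_chain(root)[-1].lines.append(text)


def build_example() -> Block:
    """Replay the four example lines by hand; a real parser decides these steps."""
    doc = Block("document")
    # "> Lorem ipsum dolor"
    add_child(doc, "block_quote")
    add_child(doc, "paragraph")
    add_text(doc, "Lorem ipsum dolor")
    # "sit amet." (lazy continuation of the open paragraph)
    add_text(doc, "sit amet.")
    # "> - Qui *quodsi iracundia*": close the paragraph, keep the block quote
    close_below(doc, 1)
    for kind in ("list", "list_item", "paragraph"):
        add_child(doc, kind)
    add_text(doc, "Qui *quodsi iracundia*")
    # "> - aliquando id": close the first list_item, keep the list
    close_below(doc, 2)
    for kind in ("list_item", "paragraph"):
        add_child(doc, kind)
    add_text(doc, "aliquando id")
    return doc


if __name__ == "__main__":
    print([b.kind for b in open_chain(build_example())])
```

Because only the last child of a block can be open, the open blocks always form a single chain from the root, which is why `open_chain` can locate the deepest open block simply by following last children.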
- -We can see how this works by considering how the tree above is -generated by four lines of Markdown: - -``` markdown -> Lorem ipsum dolor -sit amet. -> - Qui *quodsi iracundia* -> - aliquando id -``` - -At the outset, our document model is just - -``` tree --> document -``` - -The first line of our text, - -``` markdown -> Lorem ipsum dolor -``` - -causes a `block_quote` block to be created as a child of our -open `document` block, and a `paragraph` block as a child of -the `block_quote`. Then the text is added to the last open -block, the `paragraph`: - -``` tree --> document - -> block_quote - -> paragraph - "Lorem ipsum dolor" -``` - -The next line, - -``` markdown -sit amet. -``` - -is a "lazy continuation" of the open `paragraph`, so it gets added -to the paragraph's text: - -``` tree --> document - -> block_quote - -> paragraph - "Lorem ipsum dolor\nsit amet." -``` - -The third line, - -``` markdown -> - Qui *quodsi iracundia* -``` - -causes the `paragraph` block to be closed, and a new `list` block -opened as a child of the `block_quote`. A `list_item` is also -added as a child of the `list`, and a `paragraph` as a child of -the `list_item`. The text is then added to the new `paragraph`: - -``` tree --> document - -> block_quote - paragraph - "Lorem ipsum dolor\nsit amet." - -> list (type=bullet tight=true bullet_char=-) - -> list_item - -> paragraph - "Qui *quodsi iracundia*" -``` - -The fourth line, - -``` markdown -> - aliquando id -``` - -causes the `list_item` (and its child the `paragraph`) to be closed, -and a new `list_item` opened up as child of the `list`. A `paragraph` -is added as a child of the new `list_item`, to contain the text. -We thus obtain the final tree: - -``` tree --> document - -> block_quote - paragraph - "Lorem ipsum dolor\nsit amet." 
- -> list (type=bullet tight=true bullet_char=-) - list_item - paragraph - "Qui *quodsi iracundia*" - -> list_item - -> paragraph - "aliquando id" -``` - -## From block structure to the final document {-} - -Once all of the input has been parsed, all open blocks are closed. - -We then "walk the tree," visiting every node, and parse raw -string contents of paragraphs and headers as inlines. At this -point we have seen all the link reference definitions, so we can -resolve reference links as we go. - -``` tree -document - block_quote - paragraph - str "Lorem ipsum dolor" - softbreak - str "sit amet." - list (type=bullet tight=true bullet_char=-) - list_item - paragraph - str "Qui " - emph - str "quodsi iracundia" - list_item - paragraph - str "aliquando id" -``` - -Notice how the [line ending] in the first paragraph has -been parsed as a `softbreak`, and the asterisks in the first list item -have become an `emph`. - -The document can be rendered as HTML, or in any other format, given -an appropriate renderer. 
diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt index 11a27c6..0fba1b3 100644 --- a/test/CMakeLists.txt +++ b/test/CMakeLists.txt @@ -27,7 +27,7 @@ IF (PYTHONINTERP_FOUND) add_test(spectest_library ${PYTHON_EXECUTABLE} "${CMAKE_CURRENT_SOURCE_DIR}/spec_tests.py" "--no-normalize" "--spec" - "${CMAKE_SOURCE_DIR}/spec.txt" "--library-dir" "${CMAKE_BINARY_DIR}/src" + "${CMAKE_SOURCE_DIR}/test/spec.txt" "--library-dir" "${CMAKE_BINARY_DIR}/src" ) add_test(pathological_tests_library @@ -36,7 +36,7 @@ IF (PYTHONINTERP_FOUND) ) add_test(spectest_executable - ${PYTHON_EXECUTABLE} "${CMAKE_CURRENT_SOURCE_DIR}/spec_tests.py" "--no-normalize" "--spec" "${CMAKE_SOURCE_DIR}/spec.txt" "--program" "${CMAKE_BINARY_DIR}/src/cmark" + ${PYTHON_EXECUTABLE} "${CMAKE_CURRENT_SOURCE_DIR}/spec_tests.py" "--no-normalize" "--spec" "${CMAKE_SOURCE_DIR}/test/spec.txt" "--program" "${CMAKE_BINARY_DIR}/src/cmark" ) ELSE(PYTHONINTERP_FOUND) diff --git a/test/spec.txt b/test/spec.txt new file mode 100644 index 0000000..e754810 --- /dev/null +++ b/test/spec.txt @@ -0,0 +1,7321 @@ +--- +title: CommonMark Spec +author: John MacFarlane +version: 0.17 +date: 2015-01-24 +license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' +... + +# Introduction + +## What is Markdown? + +Markdown is a plain text format for writing structured documents, +based on conventions used for indicating formatting in email and +usenet posts. It was developed in 2004 by John Gruber, who wrote +the first Markdown-to-HTML converter in perl, and it soon became +widely used in websites. By 2014 there were dozens of +implementations in many languages. Some of them extended basic +Markdown syntax with conventions for footnotes, definition lists, +tables, and other constructs, and some allowed output not just in +HTML but in LaTeX and many other formats. + +## Why is a spec needed? 
+ +John Gruber's [canonical description of Markdown's +syntax](http://daringfireball.net/projects/markdown/syntax) +does not specify the syntax unambiguously. Here are some examples of +questions it does not answer: + +1. How much indentation is needed for a sublist? The spec says that + continuation paragraphs need to be indented four spaces, but is + not fully explicit about sublists. It is natural to think that + they, too, must be indented four spaces, but `Markdown.pl` does + not require that. This is hardly a "corner case," and divergences + between implementations on this issue often lead to surprises for + users in real documents. (See [this comment by John + Gruber](http://article.gmane.org/gmane.text.markdown.general/1997).) + +2. Is a blank line needed before a block quote or header? + Most implementations do not require the blank line. However, + this can lead to unexpected results in hard-wrapped text, and + also to ambiguities in parsing (note that some implementations + put the header inside the blockquote, while others do not). + (John Gruber has also spoken [in favor of requiring the blank + lines](http://article.gmane.org/gmane.text.markdown.general/2146).) + +3. Is a blank line needed before an indented code block? + (`Markdown.pl` requires it, but this is not mentioned in the + documentation, and some implementations do not require it.) + + ``` markdown + paragraph + code? + ``` + +4. What is the exact rule for determining when list items get + wrapped in `<p>` tags? Can a list be partially "loose" and partially + "tight"? What should we do with a list like this? + + ``` markdown + 1. one + + 2. two + 3. three + ``` + + Or this? + + ``` markdown + 1. one + - a + + - b + 2. two + ``` + + (There are some relevant comments by John Gruber + [here](http://article.gmane.org/gmane.text.markdown.general/2554).) + +5. Can list markers be indented? Can ordered list markers be right-aligned? + + ``` markdown + 8. item 1 + 9. item 2 + 10. item 2a + ``` + +6. 
Is this one list with a horizontal rule in its second item, + or two lists separated by a horizontal rule? + + ``` markdown + * a + * * * * * + * b + ``` + +7. When list markers change from numbers to bullets, do we have + two lists or one? (The Markdown syntax description suggests two, + but the perl scripts and many other implementations produce one.) + + ``` markdown + 1. fee + 2. fie + - foe + - fum + ``` + +8. What are the precedence rules for the markers of inline structure? + For example, is the following a valid link, or does the code span + take precedence? + + ``` markdown + [a backtick (`)](/url) and [another backtick (`)](/url). + ``` + +9. What are the precedence rules for markers of emphasis and strong + emphasis? For example, how should the following be parsed? + + ``` markdown + *foo *bar* baz* + ``` + +10. What are the precedence rules between block-level and inline-level + structure? For example, how should the following be parsed? + + ``` markdown + - `a long code span can contain a hyphen like this + - and it can screw things up` + ``` + +11. Can list items include section headers? (`Markdown.pl` does not + allow this, but does allow blockquotes to include headers.) + + ``` markdown + - # Heading + ``` + +12. Can list items be empty? + + ``` markdown + * a + * + * b + ``` + +13. Can link references be defined inside block quotes or list items? + + ``` markdown + > Blockquote [foo]. + > + > [foo]: /url + ``` + +14. 
If there are multiple definitions for the same reference, which takes + precedence? + + ``` markdown + [foo]: /url1 + [foo]: /url2 + + [foo][] + ``` + +In the absence of a spec, early implementers consulted `Markdown.pl` +to resolve these ambiguities. But `Markdown.pl` was quite buggy, and +gave manifestly bad results in many cases, so it was not a +satisfactory replacement for a spec. + +Because there is no unambiguous spec, implementations have diverged +considerably. As a result, users are often surprised to find that +a document that renders one way on one system (say, a github wiki) +renders differently on another (say, converting to docbook using +pandoc). To make matters worse, because nothing in Markdown counts +as a "syntax error," the divergence often isn't discovered right away. + +## About this document + +This document attempts to specify Markdown syntax unambiguously. +It contains many examples with side-by-side Markdown and +HTML. These are intended to double as conformance tests. An +accompanying script `spec_tests.py` can be used to run the tests +against any Markdown program: + + python test/spec_tests.py --spec spec.txt --program PROGRAM + +Since this document describes how Markdown is to be parsed into +an abstract syntax tree, it would have made sense to use an abstract +representation of the syntax tree instead of HTML. But HTML is capable +of representing the structural distinctions we need to make, and the +choice of HTML for the tests makes it possible to run the tests against +an implementation without writing an abstract syntax tree renderer. + +This document is generated from a text file, `spec.txt`, written +in Markdown with a small extension for the side-by-side tests. +The script `spec2md.pl` can be used to turn `spec.txt` into pandoc +Markdown, which can then be converted into other formats. + +In the examples, the `→` character is used to represent tabs. 
+ +# Preliminaries + +## Characters and lines + +Any sequence of [character]s is a valid CommonMark +document. + +A [character](@character) is a unicode code point. +This spec does not specify an encoding; it thinks of lines as composed +of characters rather than bytes. A conforming parser may be limited +to a certain encoding. + +A [line](@line) is a sequence of zero or more [character]s +followed by a [line ending] or by the end of file. + +A [line ending](@line-ending) is, depending on the platform, a +newline (`U+000A`), carriage return (`U+000D`), or +carriage return + newline. + +For security reasons, a conforming parser must strip or replace the +Unicode character `U+0000`. + +A line containing no characters, or a line containing only spaces +(`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). + +The following definitions of character classes will be used in this spec: + +A [whitespace character](@whitespace-character) is a space +(`U+0020`), tab (`U+0009`), carriage return (`U+000D`), or +newline (`U+000A`). + +[Whitespace](@whitespace) is a sequence of one or more [whitespace +character]s. + +A [unicode whitespace character](@unicode-whitespace-character) is +any code point in the unicode `Zs` class, or a tab (`U+0009`), +carriage return (`U+000D`), newline (`U+000A`), or form feed +(`U+000C`). + +[Unicode whitespace](@unicode-whitespace) is a sequence of one +or more [unicode whitespace character]s. + +A [non-space character](@non-space-character) is anything but `U+0020`. + +An [ASCII punctuation character](@ascii-punctuation-character) +is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, +`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, +`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. + +A [punctuation character](@punctuation-character) is an [ASCII +punctuation character] or anything in +the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. 
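The character classes defined above translate directly into predicates. The following Python sketch is illustrative only: the function names are hypothetical, each `ch` is assumed to be a single code point, and the category lookups rely on the Unicode database bundled with Python's `unicodedata` module, whose version may differ from the one a given parser uses.

```python
import string
import unicodedata

# Whitespace characters as defined above: space, tab, CR, LF.
WHITESPACE_CHARS = frozenset(" \t\r\n")

# Characters outside the Zs class that also count as unicode whitespace.
EXTRA_UNICODE_WHITESPACE = frozenset("\t\r\n\f")

# Unicode general categories that count as punctuation.
PUNCTUATION_CATEGORIES = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})


def is_blank_line(line: str) -> bool:
    """A line with no characters, or only spaces (U+0020) and tabs (U+0009)."""
    return all(ch in " \t" for ch in line)


def is_whitespace_char(ch: str) -> bool:
    return ch in WHITESPACE_CHARS


def is_unicode_whitespace_char(ch: str) -> bool:
    return ch in EXTRA_UNICODE_WHITESPACE or unicodedata.category(ch) == "Zs"


def is_non_space_char(ch: str) -> bool:
    # Anything but U+0020; note that a tab counts as a non-space character.
    return ch != " "


def is_ascii_punctuation(ch: str) -> bool:
    # string.punctuation holds exactly the 32 ASCII characters listed above.
    # Expects a single character: the empty string would also match.
    return ch in string.punctuation


def is_punctuation(ch: str) -> bool:
    return is_ascii_punctuation(ch) or unicodedata.category(ch) in PUNCTUATION_CATEGORIES
```

Note that `$` belongs to the `Sc` (currency symbol) category, so under these definitions it counts as punctuation only because it appears on the ASCII list.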
+ +## Tab expansion + +Tabs in lines are expanded to spaces, with a tab stop of 4 characters: + +. +→foo→baz→→bim +. +

foo baz     bim
+
+. + +. + a→a + ὐ→a +. +
a   a
+ὐ   a
+
+. + +# Blocks and inlines + +We can think of a document as a sequence of +[blocks](@block)---structural +elements like paragraphs, block quotations, +lists, headers, rules, and code blocks. Blocks can contain other +blocks, or they can contain [inline](@inline) content: +words, spaces, links, emphasized text, images, and inline code. + +## Precedence + +Indicators of block structure always take precedence over indicators +of inline structure. So, for example, the following is a list with +two items, not a list with one item containing a code span: + +. +- `one +- two` +. +
    +
  • `one
  • +
  • two`
  • +
+. + +This means that parsing can proceed in two steps: first, the block +structure of the document can be discerned; second, text lines inside +paragraphs, headers, and other block constructs can be parsed for inline +structure. The second step requires information about link reference +definitions that will be available only at the end of the first +step. Note that the first step requires processing lines in sequence, +but the second can be parallelized, since the inline parsing of +one block element does not affect the inline parsing of any other. + +## Container blocks and leaf blocks + +We can divide blocks into two types: +[container block](@container-block)s, +which can contain other blocks, and [leaf block](@leaf-block)s, +which cannot. + +# Leaf blocks + +This section describes the different kinds of leaf block that make up a +Markdown document. + +## Horizontal rules + +A line consisting of 0-3 spaces of indentation, followed by a sequence +of three or more matching `-`, `_`, or `*` characters, each followed +optionally by any number of spaces, forms a +[horizontal rule](@horizontal-rule). + +. +*** +--- +___ +. +
+
+
+. + +Wrong characters: + +. ++++ +. +

+++

+. + +. +=== +. +

===

+. + +Not enough characters: + +. +-- +** +__ +. +

-- +** +__

+. + +One to three spaces of indentation are allowed: + +. + *** + *** + *** +. +
+
+
+. + +Four spaces is too many: + +. + *** +. +
***
+
+. + +. +Foo + *** +. +

Foo +***

+. + +More than three characters may be used: + +. +_____________________________________ +. +
+. + +Spaces are allowed between the characters: + +. + - - - +. +
+. + +. + ** * ** * ** * ** +. +
+. + +. +- - - - +. +
+. + +Spaces are allowed at the end: + +. +- - - - +. +
+. + +However, no other characters may occur in the line: + +. +_ _ _ _ a + +a------ + +---a--- +. +

_ _ _ _ a

+

a------

+

---a---

+. + +It is required that all of the [non-space character]s be the same. +So, this is not a horizontal rule: + +. + *-* +. +

-

+. + +Horizontal rules do not need blank lines before or after: + +. +- foo +*** +- bar +. +
    +
  • foo
  • +
+
+
    +
  • bar
  • +
+. + +Horizontal rules can interrupt a paragraph: + +. +Foo +*** +bar +. +

Foo

+
+

bar

+. + +If a line of dashes that meets the above conditions for being a +horizontal rule could also be interpreted as the underline of a [setext +header], the interpretation as a +[setext header] takes precedence. Thus, for example, +this is a setext header, not a paragraph followed by a horizontal rule: + +. +Foo +--- +bar +. +

Foo

+

bar

+. + +When both a horizontal rule and a list item are possible +interpretations of a line, the horizontal rule takes precedence: + +. +* Foo +* * * +* Bar +. +
    +
  • Foo
  • +
+
+
    +
  • Bar
  • +
+. + +If you want a horizontal rule in a list item, use a different bullet: + +. +- Foo +- * * * +. +
    +
  • Foo
  • +
  • +
    +
  • +
+. + +## ATX headers + +An [ATX header](@atx-header) +consists of a string of characters, parsed as inline content, between an +opening sequence of 1--6 unescaped `#` characters and an optional +closing sequence of any number of `#` characters. The opening sequence +of `#` characters cannot be followed directly by a +[non-space character]. +The optional closing sequence of `#`s must be preceded by a space and may be +followed by spaces only. The opening `#` character may be indented 0-3 +spaces. The raw contents of the header are stripped of leading and +trailing spaces before being parsed as inline content. The header level +is equal to the number of `#` characters in the opening sequence. + +Simple headers: + +. +# foo +## foo +### foo +#### foo +##### foo +###### foo +. +

foo

+

foo

+

foo

+

foo

+
foo
+
foo
+. + +More than six `#` characters is not a header: + +. +####### foo +. +

####### foo

+. + +A space is required between the `#` characters and the header's +contents. Note that many implementations currently do not require +the space. However, the space was required by the [original ATX +implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps +prevent things like the following from being parsed as headers: + +. +#5 bolt +. +

#5 bolt

+. + +This is not a header, because the first `#` is escaped: + +. +\## foo +. +

## foo

+. + +Contents are parsed as inlines: + +. +# foo *bar* \*baz\* +. +

foo bar *baz*

+. + +Leading and trailing blanks are ignored in parsing inline content: + +. +# foo +. +

foo

+. + +One to three spaces of indentation are allowed: + +. + ### foo + ## foo + # foo +. +

foo

+

foo

+

foo

+. + +Four spaces are too much: + +. + # foo +. +
# foo
+
+. + +. +foo + # bar +. +

foo +# bar

+. + +A closing sequence of `#` characters is optional: + +. +## foo ## + ### bar ### +. +

foo

+

bar

+. + +It need not be the same length as the opening sequence: + +. +# foo ################################## +##### foo ## +. +

foo

+
foo
+. + +Spaces are allowed after the closing sequence: + +. +### foo ### +. +

foo

+. + +A sequence of `#` characters with a +[non-space character] following it +is not a closing sequence, but counts as part of the contents of the +header: + +. +### foo ### b +. +

foo ### b

+. + +The closing sequence must be preceded by a space: + +. +# foo# +. +

foo#

+. + +Backslash-escaped `#` characters do not count as part +of the closing sequence: + +. +### foo \### +## foo #\## +# foo \# +. +

foo ###

+

foo ###

+

foo #

+. + +ATX headers need not be separated from surrounding content by blank +lines, and they can interrupt paragraphs: + +. +**** +## foo +**** +. +
+

foo

+
+. + +. +Foo bar +# baz +Bar foo +. +

Foo bar

+

baz

+

Bar foo

+. + +ATX headers can be empty: + +. +## +# +### ### +. +

+

+

+. + +## Setext headers + +A [setext header](@setext-header) +consists of a line of text, containing at least one +[non-space character], +with no more than 3 spaces of indentation, followed by a [setext header +underline]. The line of text must be +one that, were it not followed by the setext header underline, +would be interpreted as part of a paragraph: it cannot be a code +block, header, blockquote, horizontal rule, or list. + +A [setext header underline](@setext-header-underline) is a sequence of +`=` characters or a sequence of `-` characters, with no more than 3 +spaces of indentation and any number of trailing spaces. If a line +containing a single `-` can be interpreted as an +empty [list item](#list-items), it should be interpreted this way +and not as a [setext header underline]. + +The header is a level 1 header if `=` characters are used in the +[setext header underline], and a level 2 +header if `-` characters are used. The contents of the header are the +result of parsing the first line as Markdown inline content. + +In general, a setext header need not be preceded or followed by a +blank line. However, it cannot interrupt a paragraph, so when a +setext header comes after a paragraph, a blank line is needed between +them. + +Simple examples: + +. +Foo *bar* +========= + +Foo *bar* +--------- +. +

Foo bar

+

Foo bar

+. + +The underlining can be any length: + +. +Foo +------------------------- + +Foo += +. +

Foo

+

Foo

+. + +The header content can be indented up to three spaces, and need +not line up with the underlining: + +. + Foo +--- + + Foo +----- + + Foo + === +. +

Foo

+

Foo

+

Foo

+. + +Four spaces indent is too much: + +. + Foo + --- + + Foo +--- +. +
Foo
+---
+
+Foo
+
+
+. + +The setext header underline can be indented up to three spaces, and +may have trailing spaces: + +. +Foo + ---- +. +

Foo

+. + +Four spaces is too much: + +. +Foo + --- +. +

Foo +---

+. + +The setext header underline cannot contain internal spaces: + +. +Foo += = + +Foo +--- - +. +

Foo += =

+

Foo

+
+. + +Trailing spaces in the content line do not cause a line break: + +. +Foo +----- +. +

Foo

+. + +Nor does a backslash at the end: + +. +Foo\ +---- +. +

Foo\

+. + +Since indicators of block structure take precedence over +indicators of inline structure, the following are setext headers: + +. +`Foo +---- +` + +
+. +

`Foo

+

`

+

<a title="a lot

+

of dashes"/>

+. + +The setext header underline cannot be a [lazy continuation +line] in a list item or block quote: + +. +> Foo +--- +. +
+

Foo

+
+
+. + +. +- Foo +--- +. +
    +
  • Foo
  • +
+
+. + +A setext header cannot interrupt a paragraph: + +. +Foo +Bar +--- + +Foo +Bar +=== +. +

Foo +Bar

+
+

Foo +Bar +===

+. + +But in general a blank line is not required before or after: + +. +--- +Foo +--- +Bar +--- +Baz +. +
+

Foo

+

Bar

+

Baz

+. + +Setext headers cannot be empty: + +. + +==== +. +

====

+. + +Setext header text lines must not be interpretable as block +constructs other than paragraphs. So, the line of dashes +in these examples gets interpreted as a horizontal rule: + +. +--- +--- +. +
+
+. + +. +- foo +----- +. +
    +
  • foo
  • +
+
+. + +. + foo +--- +. +
foo
+
+
+. + +. +> foo +----- +. +
+

foo

+
+
+. + +If you want a header with `> foo` as its literal text, you can +use backslash escapes: + +. +\> foo +------ +. +

> foo

+. + +## Indented code blocks + +An [indented code block](@indented-code-block) is composed of one or more +[indented chunk]s separated by blank lines. +An [indented chunk](@indented-chunk) is a sequence of non-blank lines, +each indented four or more spaces. The contents of the code block are +the literal contents of the lines, including trailing +[line ending]s, minus four spaces of indentation. +An indented code block has no [info string]. + +An indented code block cannot interrupt a paragraph, so there must be +a blank line between a paragraph and a following indented code block. +(A blank line is not needed, however, between a code block and a following +paragraph.) + +. + a simple + indented code block +. +
a simple
+  indented code block
+
+. + +The contents are literal text, and do not get parsed as Markdown: + +. +
+ *hi* + + - one +. +
<a/>
+*hi*
+
+- one
+
+. + +Here we have three chunks separated by blank lines: + +. + chunk1 + + chunk2 + + + + chunk3 +. +
chunk1
+
+chunk2
+
+
+
+chunk3
+
+. + +Any initial spaces beyond four will be included in the content, even +in interior blank lines: + +. + chunk1 + + chunk2 +. +
chunk1
+  
+  chunk2
+
+. + +An indented code block cannot interrupt a paragraph. (This +allows hanging indents and the like.) + +. +Foo + bar + +. +

Foo +bar

+. + +However, any non-blank line with fewer than four leading spaces ends +the code block immediately. So a paragraph may occur immediately +after indented code: + +. + foo +bar +. +
foo
+
+

bar

+. + +And indented code can occur immediately before and after other kinds of +blocks: + +. +# Header + foo +Header +------ + foo +---- +. +

Header

+
foo
+
+

Header

+
foo
+
+
+. + +The first line can be indented more than four spaces: + +. + foo + bar +. +
    foo
+bar
+
+. + +Blank lines preceding or following an indented code block +are not included in it: + +. + + + foo + + +. +
foo
+
+. + +Trailing spaces are included in the code block's content: + +. + foo +. +
foo  
+
+. + + +## Fenced code blocks + +A [code fence](@code-fence) is a sequence +of at least three consecutive backtick characters (`` ` ``) or +tildes (`~`). (Tildes and backticks cannot be mixed.) +A [fenced code block](@fenced-code-block) +begins with a code fence, indented no more than three spaces. + +The line with the opening code fence may optionally contain some text +following the code fence; this is trimmed of leading and trailing +spaces and called the [info string](@info-string). +The [info string] may not contain any backtick +characters. (The reason for this restriction is that otherwise +some inline code would be incorrectly interpreted as the +beginning of a fenced code block.) + +The content of the code block consists of all subsequent lines, until +a closing [code fence] of the same type as the code block +began with (backticks or tildes), and with at least as many backticks +or tildes as the opening code fence. If the leading code fence is +indented N spaces, then up to N spaces of indentation are removed from +each line of the content (if present). (If a content line is not +indented, it is preserved unchanged. If it is indented less than N +spaces, all of the indentation is removed.) + +The closing code fence may be indented up to three spaces, and may be +followed only by spaces, which are ignored. If the end of the +containing block (or document) is reached and no closing code fence +has been found, the code block contains all of the lines after the +opening code fence until the end of the containing block (or +document). (An alternative spec would require backtracking in the +event that a closing code fence is not found. But this makes parsing +much less efficient, and there seems to be no real downside to the +behavior described here.) + +A fenced code block may interrupt a paragraph, and does not require +a blank line either before or after. + +The content of a code fence is treated as literal text, not parsed +as inlines.
The first word of the [info string] is typically used to +specify the language of the code sample, and rendered in the `class` +attribute of the `code` tag. However, this spec does not mandate any +particular treatment of the [info string]. + +Here is a simple example with backticks: + +. +``` +< + > +``` +. +
<
+ >
+
+. + +With tildes: + +. +~~~ +< + > +~~~ +. +
<
+ >
+
+. + +The closing code fence must use the same character as the opening +fence: + +. +``` +aaa +~~~ +``` +. +
aaa
+~~~
+
+. + +. +~~~ +aaa +``` +~~~ +. +
aaa
+```
+
+. + +The closing code fence must be at least as long as the opening fence: + +. +```` +aaa +``` +`````` +. +
aaa
+```
+
+. + +. +~~~~ +aaa +~~~ +~~~~ +. +
aaa
+~~~
+
+. + +Unclosed code blocks are closed by the end of the document: + +. +``` +. +
+. + +. +````` + +``` +aaa +. +

+```
+aaa
+
+. + +A code block can have all empty lines as its content: + +. +``` + + +``` +. +

+  
+
+. + +A code block can be empty: + +. +``` +``` +. +
+. + +Fences can be indented. If the opening fence is indented, +content lines will have equivalent opening indentation removed, +if present: + +. + ``` + aaa +aaa +``` +. +
aaa
+aaa
+
+. + +. + ``` +aaa + aaa +aaa + ``` +. +
aaa
+aaa
+aaa
+
+. + +. + ``` + aaa + aaa + aaa + ``` +. +
aaa
+ aaa
+aaa
+
+. + +Four spaces indentation produces an indented code block: + +. + ``` + aaa + ``` +. +
```
+aaa
+```
+
+. + +Closing fences may be indented by 0-3 spaces, and their indentation +need not match that of the opening fence: + +. +``` +aaa + ``` +. +
aaa
+
+. + +. + ``` +aaa + ``` +. +
aaa
+
+. + +This is not a closing fence, because it is indented 4 spaces: + +. +``` +aaa + ``` +. +
aaa
+    ```
+
+. + + +Code fences (opening and closing) cannot contain internal spaces: + +. +``` ``` +aaa +. +

+aaa

+. + +. +~~~~~~ +aaa +~~~ ~~ +. +
aaa
+~~~ ~~
+
+. + +Fenced code blocks can interrupt paragraphs, and can be followed +directly by paragraphs, without a blank line between: + +. +foo +``` +bar +``` +baz +. +

foo

+
bar
+
+

baz

+. + +Other blocks can also occur before and after fenced code blocks +without an intervening blank line: + +. +foo +--- +~~~ +bar +~~~ +# baz +. +

foo

+
bar
+
+

baz

+. + +An [info string] can be provided after the opening code fence. +Opening and closing spaces will be stripped, and the first word, prefixed +with `language-`, is used as the value for the `class` attribute of the +`code` element within the enclosing `pre` element. + +. +```ruby +def foo(x) + return 3 +end +``` +. +
def foo(x)
+  return 3
+end
+
+. + +. +~~~~ ruby startline=3 $%@#$ +def foo(x) + return 3 +end +~~~~~~~ +. +
def foo(x)
+  return 3
+end
+
+. + +. +````; +```` +. +
+. + +[Info string]s for backtick code blocks cannot contain backticks: + +. +``` aa ``` +foo +. +

aa +foo

+. + +Closing code fences cannot have [info string]s: + +. +``` +``` aaa +``` +. +
``` aaa
+
+. + + +## HTML blocks + +An [HTML block tag](@html-block-tag) is +an [open tag] or [closing tag] whose tag +name is one of the following (case-insensitive): +`article`, `header`, `aside`, `hgroup`, `blockquote`, `hr`, `iframe`, +`body`, `li`, `map`, `button`, `object`, `canvas`, `ol`, `caption`, +`output`, `col`, `p`, `colgroup`, `pre`, `dd`, `progress`, `div`, +`section`, `dl`, `table`, `td`, `dt`, `tbody`, `embed`, `textarea`, +`fieldset`, `tfoot`, `figcaption`, `th`, `figure`, `thead`, `footer`, +`tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `video`, +`script`, `style`. + +An [HTML block](@html-block) begins with an +[HTML block tag], [HTML comment], [processing instruction], +[declaration], or [CDATA section]. +It ends when a [blank line] or the end of the +input is encountered. The initial line may be indented up to three +spaces, and subsequent lines may have any indentation. The contents +of the HTML block are interpreted as raw HTML, and will not be escaped +in HTML output. + +Some simple examples: + +. + + + + +
+ hi +
+ +okay. +. + + + + +
+ hi +
+

okay.

+. + +. +
+ *hello* + +. +
+ *hello* + +. + +Here we have two HTML blocks with a Markdown paragraph between them: + +. +
+ +*Markdown* + +
+. +
+

Markdown

+
+. + +In the following example, what looks like a Markdown code block +is actually part of the HTML block, which continues until a blank +line or the end of the document is reached: + +. +
+``` c +int x = 33; +``` +. +
+``` c +int x = 33; +``` +. + +A comment: + +. + +. + +. + +A processing instruction: + +. +'; +?> +. +'; +?> +. + +CDATA: + +. + +. + +. + +The opening tag can be indented 1-3 spaces, but not 4: + +. + + + +. + +
<!-- foo -->
+
+. + +An HTML block can interrupt a paragraph, and need not be preceded +by a blank line. + +. +Foo +
+bar +
+. +

Foo

+
+bar +
+. + +However, a following blank line is always needed, except at the end of +a document: + +. +
+bar +
+*foo* +. +
+bar +
+*foo* +. + +An incomplete HTML block tag may also start an HTML block: + +. +
The only restrictions are that block-level HTML elements — +> e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from +> surrounding content by blank lines, and the start and end tags of the +> block should not be indented with tabs or spaces. + +In some ways Gruber's rule is more restrictive than the one given +here: + +- It requires that an HTML block be preceded by a blank line. +- It does not allow the start tag to be indented. +- It requires a matching end tag, which it also does not allow to + be indented. + +Indeed, most Markdown implementations, including some of Gruber's +own Perl implementations, do not impose these restrictions. + +There is one respect, however, in which Gruber's rule is more liberal +than the one given here, since it allows blank lines to occur inside +an HTML block. There are two reasons for disallowing them here. +First, it removes the need to parse balanced tags, which is +expensive and can require backtracking from the end of the document +if no matching end tag is found. Second, it provides a very simple +and flexible way of including Markdown content inside HTML tags: +simply separate the Markdown from the HTML using blank lines: + +. +

+ +*Emphasized* text. + +
+. +
+

Emphasized text.

+
+. + +Compare: + +. +
+*Emphasized* text. +
+. +
+*Emphasized* text. +
+. + +Some Markdown implementations have adopted a convention of +interpreting content inside tags as Markdown if the open tag has +the attribute `markdown=1`. The rule given above seems a simpler and +more elegant way of achieving the same expressive power, which is also +much easier to parse. + +The main potential drawback is that one can no longer paste HTML +blocks into Markdown documents with 100% reliability. However, +*in most cases* this will work fine, because the blank lines in +HTML are usually followed by HTML block tags. For example: + +. +
+ + + + + + + +
+Hi +
+. + + + + +
+Hi +
+. + +Moreover, blank lines are usually not necessary and can be +deleted. The exception is inside `<pre>` tags; here, one can
+replace the blank lines with `&#10;` entities.
+
+So there is no important loss of expressive power with the new rule.
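The start condition for an HTML block can be checked on the first line alone. The condition is an HTML block tag, comment, processing instruction, declaration, or CDATA section, indented at most three spaces. The Python sketch below is a simplification, not a conforming implementation, and `starts_html_block` is a hypothetical name. It accepts any line that begins like an open or closing tag with one of the listed names; this is also how an incomplete tag such as `<div class` can start a block. It does not validate attributes or the rest of the line. Once started, the block runs until a blank line or the end of input.

```python
import re

# Tag names from the list at the start of this section (matched case-insensitively).
BLOCK_TAG_NAMES = frozenset("""
    article header aside hgroup blockquote hr iframe body li map button
    object canvas ol caption output col p colgroup pre dd progress div
    section dl table td dt tbody embed textarea fieldset tfoot figcaption
    th figure thead footer tr form ul h1 h2 h3 h4 h5 h6 video script style
""".split())

# 0-3 spaces, "<" or "</", then a tag name ending at whitespace, "/", ">", or end of line.
TAG_OPENING = re.compile(r" {0,3}</?([A-Za-z][A-Za-z0-9]*)(?=[\s/>]|$)")

# 0-3 spaces, then the opening of a comment, processing instruction,
# declaration ("<!" plus an uppercase letter), or CDATA section.
OTHER_OPENING = re.compile(r" {0,3}(?:<!--|<\?|<![A-Z]|<!\[CDATA\[)")


def starts_html_block(line):
    """Return True if `line` can begin an HTML block (simplified check)."""
    m = TAG_OPENING.match(line)
    if m:
        return m.group(1).lower() in BLOCK_TAG_NAMES
    return OTHER_OPENING.match(line) is not None
```

With this check, `<div>` and `<!-- foo -->` start a block, but `<span>` does not. A line indented four spaces does not start a block either, so it falls through to the indented code block rule.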
+
+## Link reference definitions
+
+A [link reference definition](@link-reference-definition)
+consists of a [link label], indented up to three spaces, followed
+by a colon (`:`), optional [whitespace] (including up to one
+[line ending]), a [link destination],
+optional [whitespace] (including up to one
+[line ending]), and an optional [link
+title], which if it is present must be separated
+from the [link destination] by [whitespace].
+No further [non-space character]s may occur on the line.
+
+A [link reference definition]
+does not correspond to a structural element of a document.  Instead, it
+defines a label which can be used in [reference link]s
+and reference-style [images] elsewhere in the document.  [Link
+reference definitions] can come either before or after the links that use
+them.
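A single-line link reference definition can be recognized with one regular expression. The Python sketch below is deliberately narrower than the rule above, and its function names are hypothetical. It handles only definitions that fit on one line, so a destination or title moved to the next line is not recognized. It does not support destinations in angle brackets. It also does not check that the line begins a block, which is needed because a definition cannot interrupt a paragraph. Label matching is approximated with `str.casefold` plus whitespace collapsing.

```python
import re

# A link label: bracketed text whose inner brackets, if any, are backslash-escaped.
LABEL = r"\[((?:[^\\\[\]]|\\.)+)\]"
# A title in double quotes, single quotes, or parentheses.
TITLE = r"""(?:"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|\((?:[^()\\]|\\.)*\))"""
# 0-3 spaces, label, colon, destination, an optional title separated by
# whitespace, and nothing else on the line except trailing spaces or tabs.
LINK_REF_DEF = re.compile(
    r" {0,3}" + LABEL + r":[ \t]*(\S+)(?:[ \t]+(" + TITLE + r"))?[ \t]*$"
)


def normalize_label(label):
    """Case-fold and collapse internal whitespace so labels match case-insensitively."""
    return " ".join(label.split()).casefold()


def parse_link_ref_def(line):
    """Return (label, destination, title) for a one-line definition, else None."""
    m = LINK_REF_DEF.match(line)
    if m is None:
        return None
    label, destination, title = m.groups()
    if title is not None:
        title = title[1:-1]  # drop the surrounding quotes or parentheses
    return normalize_label(label), destination, title


def collect_definitions(lines):
    """Map normalized labels to (destination, title); the first definition wins."""
    defs = {}
    for line in lines:
        parsed = parse_link_ref_def(line)
        if parsed is not None:
            label, destination, title = parsed
            defs.setdefault(label, (destination, title))
    return defs
```

`collect_definitions` uses `dict.setdefault`, so the first of several matching definitions takes precedence. A definition with trailing text after the title, or with four spaces of indentation, is rejected.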
+
+.
+[foo]: /url "title"
+
+[foo]
+.
+

foo

+. + +. + [foo]: + /url + 'the title' + +[foo] +. +

foo

+. + +. +[Foo*bar\]]:my_(url) 'title (with parens)' + +[Foo*bar\]] +. +

Foo*bar]

+. + +. +[Foo bar]: + +'title' + +[Foo bar] +. +

Foo bar

+. + +The title may be omitted: + +. +[foo]: +/url + +[foo] +. +

foo

+. + +The link destination may not be omitted: + +. +[foo]: + +[foo] +. +

[foo]:

+

[foo]

+. + +A link can come before its corresponding definition: + +. +[foo] + +[foo]: url +. +

foo

+. + +If there are several matching definitions, the first one takes +precedence: + +. +[foo] + +[foo]: first +[foo]: second +. +

foo

+. + +As noted in the section on [Links], matching of labels is +case-insensitive (see [matches]). + +. +[FOO]: /url + +[Foo] +. +

Foo

+. + +. +[ΑΓΩ]: /φου + +[αγω] +. +

αγω

+. + +Here is a link reference definition with no corresponding link. +It contributes nothing to the document. + +. +[foo]: /url +. +. + +This is not a link reference definition, because there are +[non-space character]s after the title: + +. +[foo]: /url "title" ok +. +

[foo]: /url "title" ok

+. + +This is not a link reference definition, because it is indented +four spaces: + +. + [foo]: /url "title" + +[foo] +. +
[foo]: /url "title"
+
+

[foo]

+. + +This is not a link reference definition, because it occurs inside +a code block: + +. +``` +[foo]: /url +``` + +[foo] +. +
[foo]: /url
+
+

[foo]

+. + +A [link reference definition] cannot interrupt a paragraph. + +. +Foo +[bar]: /baz + +[bar] +. +

Foo +[bar]: /baz

+

[bar]

+. + +However, it can directly follow other block elements, such as headers +and horizontal rules, and it need not be followed by a blank line. + +. +# [Foo] +[foo]: /url +> bar +. +

Foo

+
+

bar

+
+. + +Several [link reference definition]s +can occur one after another, without intervening blank lines. + +. +[foo]: /foo-url "foo" +[bar]: /bar-url + "bar" +[baz]: /baz-url + +[foo], +[bar], +[baz] +. +

foo, +bar, +baz

+. + +[Link reference definition]s can occur +inside block containers, like lists and block quotations. They +affect the entire document, not just the container in which they +are defined: + +. +[foo] + +> [foo]: /url +. +

foo

+
+
+. + + +## Paragraphs + +A sequence of non-blank lines that cannot be interpreted as other +kinds of blocks forms a [paragraph](@paragraph). +The contents of the paragraph are the result of parsing the +paragraph's raw content as inlines. The paragraph's raw content +is formed by concatenating the lines and removing initial and final +[whitespace]. + +A simple example with two paragraphs: + +. +aaa + +bbb +. +

aaa

+

bbb

+. + +Paragraphs can contain multiple lines, but no blank lines: + +. +aaa +bbb + +ccc +ddd +. +

aaa +bbb

+

ccc +ddd

+. + +Multiple blank lines between paragraphs have no effect: + +. +aaa + + +bbb +. +

aaa

+

bbb

+. + +Leading spaces are skipped: + +. + aaa + bbb +. +

aaa +bbb

+. + +Lines after the first may be indented any amount, since indented +code blocks cannot interrupt paragraphs. + +. +aaa + bbb + ccc +. +

aaa +bbb +ccc

+. + +However, the first line may be indented at most three spaces, +or an indented code block will be triggered: + +. + aaa +bbb +. +

aaa +bbb

+. + +. + aaa +bbb +. +
aaa
+
+

bbb

+. + +Final spaces are stripped before inline parsing, so a paragraph +that ends with two or more spaces will not end with a [hard line +break]: + +. +aaa +bbb +. +

aaa
+bbb

+. + +## Blank lines + +[Blank line]s between block-level elements are ignored, +except for the role they play in determining whether a [list] +is [tight] or [loose]. + +Blank lines at the beginning and end of the document are also ignored. + +. + + +aaa + + +# aaa + + +. +

aaa

+

aaa

+. + + +# Container blocks + +A [container block] is a block that has other +blocks as its contents. There are two basic kinds of container blocks: +[block quotes] and [list items]. +[Lists] are meta-containers for [list items]. + +We define the syntax for container blocks recursively. The general +form of the definition is: + +> If X is a sequence of blocks, then the result of +> transforming X in such-and-such a way is a container of type Y +> with these blocks as its content. + +So, we explain what counts as a block quote or list item by explaining +how these can be *generated* from their contents. This should suffice +to define the syntax, although it does not give a recipe for *parsing* +these constructions. (A recipe is provided below in the section entitled +[A parsing strategy](#appendix-a-a-parsing-strategy).) + +## Block quotes + +A [block quote marker](@block-quote-marker) +consists of 0-3 spaces of initial indent, plus (a) the character `>` together +with a following space, or (b) a single character `>` not followed by a space. + +The following rules define [block quotes]: + +1. **Basic case.** If a string of lines *Ls* constitutes a sequence + of blocks *Bs*, then the result of prepending a [block quote + marker] to the beginning of each line in *Ls* + is a [block quote](#block-quotes) containing *Bs*. + +2. **Laziness.** If a string of lines *Ls* constitutes a [block + quote](#block-quotes) with contents *Bs*, then the result of deleting + the initial [block quote marker] from one or + more lines in which the next [non-space character] after the [block + quote marker] is [paragraph continuation + text] is a block quote with *Bs* as its content. + [Paragraph continuation text](@paragraph-continuation-text) is text + that will be parsed as part of the content of a paragraph, but does + not occur at the beginning of the paragraph. + +3. **Consecutiveness.** A document cannot contain two [block + quotes] in a row unless there is a [blank line] between them.
+ +Nothing else counts as a [block quote](#block-quotes). + +Here is a simple example: + +. +> # Foo +> bar +> baz +. +
+

Foo

+

bar +baz

+
+. + +The spaces after the `>` characters can be omitted: + +. +># Foo +>bar +> baz +. +
+

Foo

+

bar +baz

+
+. + +The `>` characters can be indented 1-3 spaces: + +. + > # Foo + > bar + > baz +. +
+

Foo

+

bar +baz

+
+. + +Four spaces gives us a code block: + +. + > # Foo + > bar + > baz +. +
> # Foo
+> bar
+> baz
+
+. + +The Laziness clause allows us to omit the `>` before a +paragraph continuation line: + +. +> # Foo +> bar +baz +. +
+

Foo

+

bar +baz

+
+. + +A block quote can contain some lazy and some non-lazy +continuation lines: + +. +> bar +baz +> foo +. +
+

bar +baz +foo

+
+. + +Laziness only applies to lines that are continuations of +paragraphs. Lines containing characters or indentation that indicate +block structure cannot be lazy. + +. +> foo +--- +. +
+

foo

+
+
+. + +. +> - foo +- bar +. +
+
    +
  • foo
  • +
+
+
    +
  • bar
  • +
+. + +. +> foo + bar +. +
+
foo
+
+
+
bar
+
+. + +. +> ``` +foo +``` +. +
+
+
+

foo

+
+. + +A block quote can be empty: + +. +> +. +
+
+. + +. +> +> +> +. +
+
+. + +A block quote can have initial or final blank lines: + +. +> +> foo +> +. +
+

foo

+
+. + +A blank line always separates block quotes: + +. +> foo + +> bar +. +
+

foo

+
+
+

bar

+
+. + +(Most current Markdown implementations, including John Gruber's +original `Markdown.pl`, will parse this example as a single block quote +with two paragraphs. But it seems better to allow the author to decide +whether two block quotes or one are wanted.) + +Consecutiveness means that if we put these block quotes together, +we get a single block quote: + +. +> foo +> bar +. +
+

foo +bar

+
+. + +To get a block quote with two paragraphs, use: + +. +> foo +> +> bar +. +
+

foo

+

bar

+
+. + +Block quotes can interrupt paragraphs: + +. +foo +> bar +. +

foo

+
+

bar

+
+. + +In general, blank lines are not needed before or after block +quotes: + +. +> aaa +*** +> bbb +. +
+

aaa

+
+
+
+

bbb

+
+. + +However, because of laziness, a blank line is needed between +a block quote and a following paragraph: + +. +> bar +baz +. +
+

bar +baz

+
+. + +. +> bar + +baz +. +
+

bar

+
+

baz

+. + +. +> bar +> +baz +. +
+

bar

+
+

baz

+. + +It is a consequence of the Laziness rule that any number +of initial `>`s may be omitted on a continuation line of a +nested block quote: + +. +> > > foo +bar +. +
+
+
+

foo +bar

+
+
+
+. + +. +>>> foo +> bar +>>baz +. +
+
+
+

foo +bar +baz

+
+
+
+. + +When including an indented code block in a block quote, +remember that the [block quote marker] includes +both the `>` and a following space. So *five spaces* are needed after +the `>`: + +. +> code + +> not code +. +
+
code
+
+
+
+

not code

+
+. + + +## List items + +A [list marker](@list-marker) is a +[bullet list marker] or an [ordered list marker]. + +A [bullet list marker](@bullet-list-marker) +is a `-`, `+`, or `*` character. + +An [ordered list marker](@ordered-list-marker) +is a sequence of one or more digits (`0-9`), followed by either a +`.` character or a `)` character. + +The following rules define [list items]: + +1. **Basic case.** If a sequence of lines *Ls* constitutes a sequence of + blocks *Bs* starting with a [non-space character] and not separated + from each other by more than one blank line, and *M* is a list + marker of width *W* followed by 0 < *N* < 5 spaces, then the result + of prepending *M* and the following spaces to the first line of + *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a + list item with *Bs* as its contents. The type of the list item + (bullet or ordered) is determined by the type of its list marker. + If the list item is ordered, then it is also assigned a start + number, based on the ordered list marker. + +For example, let *Ls* be the lines + +. +A paragraph +with two lines. + + indented code + +> A block quote. +. +

A paragraph +with two lines.

+
indented code
+
+
+

A block quote.

+
+. + +And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says +that the following is an ordered list item with start number 1, +and the same contents as *Ls*: + +. +1. A paragraph + with two lines. + + indented code + + > A block quote. +. +
    +
  1. +

    A paragraph +with two lines.

    +
    indented code
    +
    +
    +

    A block quote.

    +
    +
  2. +
+. + +The most important thing to notice is that the position of +the text after the list marker determines how much indentation +is needed in subsequent blocks in the list item. If the list +marker takes up two spaces, and there are three spaces between +the list marker and the next [non-space character], then blocks +must be indented five spaces in order to fall under the list +item. + +Here are some examples showing how far content must be indented to be +put under the list item: + +. +- one + + two +. +
    +
  • one
  • +
+

two

+. + +. +- one + + two +. +
    +
  • +

    one

    +

    two

    +
  • +
+. + +. + - one + + two +. +
    +
  • one
  • +
+
 two
+
+. + +. + - one + + two +. +
    +
  • +

    one

    +

    two

    +
  • +
+. + +It is tempting to think of this in terms of columns: the continuation +blocks must be indented at least to the column of the first +[non-space character] after the list marker. However, that is not quite right. +The spaces after the list marker determine how much relative indentation +is needed. Which column this indentation reaches will depend on +how the list item is embedded in other constructions, as shown by +this example: + +. + > > 1. one +>> +>> two +. +
+
+
    +
  1. +

    one

    +

    two

    +
  2. +
+
+
+. + +Here `two` occurs in the same column as the list marker `1.`, +but is actually contained in the list item, because there is +sufficient indentation after the last containing blockquote marker. + +The converse is also possible. In the following example, the word `two` +occurs far to the right of the initial text of the list item, `one`, but +it is not considered part of the list item, because it is not indented +far enough past the blockquote marker: + +. +>>- one +>> + > > two +. +
+
+
    +
  • one
  • +
+

two

+
+
+. + +A list item may not contain blocks that are separated by more than +one blank line. Thus, two blank lines will end a list, unless the +two blanks are contained in a [fenced code block]. + +. +- foo + + bar + +- foo + + + bar + +- ``` + foo + + + bar + ``` + +- baz + + + ``` + foo + + + bar + ``` +. +
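<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +<li> +<p>foo</p> +</li> +</ul> +<p>bar</p> +<ul> +<li> +<pre><code>foo + + +bar +</code></pre> +</li> +<li> +<p>baz</p> +<ul> +<li> +<pre><code>foo + + +bar +</code></pre> +</li> +</ul> +</li> +</ul>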
+. + +A list item may contain any kind of block: + +. +1. foo + + ``` + bar + ``` + + baz + + > bam +. +
<ol> +<li> +<p>foo</p> +<pre><code>bar +</code></pre> +<p>baz</p> +<blockquote> +<p>bam</p> +</blockquote> +</li> +</ol>
+. + +2. **Item starting with indented code.** If a sequence of lines *Ls* + constitutes a sequence of blocks *Bs* starting with an indented code + block and not separated from each other by more than one blank line, + and *M* is a list marker of width *W* followed by + one space, then the result of prepending *M* and the following + space to the first line of *Ls*, and indenting subsequent lines of + *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. + If a line is empty, then it need not be indented. The type of the + list item (bullet or ordered) is determined by the type of its list + marker. If the list item is ordered, then it is also assigned a + start number, based on the ordered list marker. + +An indented code block will have to be indented four spaces beyond +the edge of the region where text will be included in the list item. +In the following case, that is 6 spaces: + +. +- foo + + bar +. +
<ul> +<li> +<p>foo</p> +<pre><code>bar +</code></pre> +</li> +</ul>
+. + +And in this case it is 11 spaces: + +. + 10. foo + + bar +. +
<ol start="10"> +<li> +<p>foo</p> +<pre><code>bar +</code></pre> +</li> +</ol>
+. + +If the *first* block in the list item is an indented code block, +then by rule #2, the contents must be indented *one* space after the +list marker: + +. + indented code + +paragraph + + more code +. +
<pre><code>indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre>
+. + +. +1. indented code + + paragraph + + more code +. +
<ol> +<li> +<pre><code>indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre> +</li> +</ol>
+. + +Note that an additional space indent is interpreted as space +inside the code block: + +. +1. indented code + + paragraph + + more code +. +
<ol> +<li> +<pre><code> indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre> +</li> +</ol>
+. + +Note that rules #1 and #2 only apply to two cases: (a) cases +in which the lines to be included in a list item begin with a +[non-space character], and (b) cases in which +they begin with an indented code +block. In a case like the following, where the first block begins with +a three-space indent, the rules do not allow us to form a list item by +indenting the whole thing and prepending a list marker: + +. + foo + +bar +. +
<p>foo</p> +<p>bar</p>
+. + +. +- foo + + bar +. +
<ul> +<li>foo</li> +</ul> +<p>bar</p>
+. + +This is not a significant restriction, because when a block begins +with 1-3 spaces indent, the indentation can always be removed without +a change in interpretation, allowing rule #1 to be applied. So, in +the above case: + +. +- foo + + bar +. +
<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +</ul>
+. + +3. **Empty list item.** A [list marker] followed by a +line containing only [whitespace] is a list item with no contents. + +Here is an empty bullet list item: + +. +- foo +- +- bar +. +
<ul> +<li>foo</li> +<li></li> +<li>bar</li> +</ul>
+. + +It does not matter whether there are spaces following the [list marker]: + +. +- foo +- +- bar +. +
<ul> +<li>foo</li> +<li></li> +<li>bar</li> +</ul>
+. + +Here is an empty ordered list item: + +. +1. foo +2. +3. bar +. +
<ol> +<li>foo</li> +<li></li> +<li>bar</li> +</ol>
+. + +A list may start or end with an empty list item: + +. +* +. +
<ul> +<li></li> +</ul>
+. + +4. **Indentation.** If a sequence of lines *Ls* constitutes a list item + according to rule #1, #2, or #3, then the result of indenting each line + of *Ls* by 1-3 spaces (the same for each line) also constitutes a + list item with the same contents and attributes. If a line is + empty, then it need not be indented. + +Indented one space: + +. + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +
<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol>
+. + +Indented two spaces: + +. + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +
<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol>
+. + +Indented three spaces: + +. + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +
<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol>
+. + +Four spaces indent gives a code block: + +. + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +
<pre><code>1.  A paragraph +    with two lines. + +        indented code + +    &gt; A block quote. +</code></pre>
+. + + +5. **Laziness.** If a string of lines *Ls* constitutes a [list + item](#list-items) with contents *Bs*, then the result of deleting + some or all of the indentation from one or more lines in which the + next [non-space character] after the indentation is + [paragraph continuation text] is a + list item with the same contents and attributes. The unindented + lines are called + [lazy continuation line](@lazy-continuation-line)s. + +Here is an example with [lazy continuation line]s: + +. + 1. A paragraph +with two lines. + + indented code + + > A block quote. +. +
<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol>
+. + +Indentation can be partially deleted: + +. + 1. A paragraph + with two lines. +. +
<ol> +<li>A paragraph +with two lines.</li> +</ol>
+. + +These examples show how laziness can work in nested structures: + +. +> 1. > Blockquote +continued here. +. +
<blockquote> +<ol> +<li> +<blockquote> +<p>Blockquote +continued here.</p> +</blockquote> +</li> +</ol> +</blockquote>
+. + +. +> 1. > Blockquote +> continued here. +. +
<blockquote> +<ol> +<li> +<blockquote> +<p>Blockquote +continued here.</p> +</blockquote> +</li> +</ol> +</blockquote>
+. + + +6. **That's all.** Nothing that is not counted as a list item by rules + #1--5 counts as a [list item](#list-items). + +The rules for sublists follow from the general rules above. A sublist +must be indented the same number of spaces a paragraph would need to be +in order to be included in the list item. + +So, in this case we need two spaces indent: + +. +- foo + - bar + - baz +. +
<ul> +<li>foo +<ul> +<li>bar +<ul> +<li>baz</li> +</ul> +</li> +</ul> +</li> +</ul>
+. + +One is not enough: + +. +- foo + - bar + - baz +. +
<ul> +<li>foo</li> +<li>bar</li> +<li>baz</li> +</ul>
+. + +Here we need four, because the list marker is wider: + +. +10) foo + - bar +. +
<ol start="10"> +<li>foo +<ul> +<li>bar</li> +</ul> +</li> +</ol>
+. + +Three is not enough: + +. +10) foo + - bar +. +
<ol start="10"> +<li>foo</li> +</ol> +<ul> +<li>bar</li> +</ul>
+. + +A list may be the first block in a list item: + +. +- - foo +. +
<ul> +<li> +<ul> +<li>foo</li> +</ul> +</li> +</ul>
+. + +. +1. - 2. foo +. +
<ol> +<li> +<ul> +<li> +<ol start="2"> +<li>foo</li> +</ol> +</li> +</ul> +</li> +</ol>
+. + +A list item can contain a header: + +. +- # Foo +- Bar + --- + baz +. +
<ul> +<li> +<h1>Foo</h1> +</li> +<li> +<h2>Bar</h2> +baz</li> +</ul>
+. + +### Motivation + +John Gruber's Markdown spec says the following about list items: + +1. "List markers typically start at the left margin, but may be indented + by up to three spaces. List markers must be followed by one or more + spaces or a tab." + +2. "To make lists look nice, you can wrap items with hanging indents.... + But if you don't want to, you don't have to." + +3. "List items may consist of multiple paragraphs. Each subsequent + paragraph in a list item must be indented by either 4 spaces or one + tab." + +4. "It looks nice if you indent every line of the subsequent paragraphs, + but here again, Markdown will allow you to be lazy." + +5. "To put a blockquote within a list item, the blockquote's `>` + delimiters need to be indented." + +6. "To put a code block within a list item, the code block needs to be + indented twice — 8 spaces or two tabs." + +These rules specify that a paragraph under a list item must be indented +four spaces (presumably, from the left margin, rather than the start of +the list marker, but this is not said), and that code under a list item +must be indented eight spaces instead of the usual four. They also say +that a block quote must be indented, but not by how much; however, the +example given has four spaces indentation. Although nothing is said +about other kinds of block-level content, it is certainly reasonable to +infer that *all* block elements under a list item, including other +lists, must be indented four spaces. This principle has been called the +*four-space rule*. + +The four-space rule is clear and principled, and if the reference +implementation `Markdown.pl` had followed it, it probably would have +become the standard. However, `Markdown.pl` allowed paragraphs and +sublists to start with only two spaces indentation, at least on the +outer level. Worse, its behavior was inconsistent: a sublist of an +outer-level list needed two spaces indentation, but a sublist of this +sublist needed three spaces. 
It is not surprising, then, that different +implementations of Markdown have developed very different rules for +determining what comes under a list item. (Pandoc and python-Markdown, +for example, stuck with Gruber's syntax description and the four-space +rule, while discount, redcarpet, marked, PHP Markdown, and others +followed `Markdown.pl`'s behavior more closely.) + +Unfortunately, given the divergences between implementations, there +is no way to give a spec for list items that will be guaranteed not +to break any existing documents. However, the spec given here should +correctly handle lists formatted with either the four-space rule or +the more forgiving `Markdown.pl` behavior, provided they are laid out +in a way that is natural for a human to read. + +The strategy here is to let the width and indentation of the list marker +determine the indentation necessary for blocks to fall under the list +item, rather than having a fixed and arbitrary number. The writer can +think of the body of the list item as a unit which gets indented to the +right enough to fit the list marker (and any indentation on the list +marker). (The laziness rule, #5, then allows continuation lines to be +unindented if needed.) + +This rule is superior, we claim, to any rule requiring a fixed level of +indentation from the margin. The four-space rule is clear but +unnatural. It is quite unintuitive that + +``` markdown +- foo + + bar + + - baz +``` + +should be parsed as two lists with an intervening paragraph, + +``` html +
<ul> +<li>foo</li> +</ul> +<p>bar</p> +<ul> +<li>baz</li> +</ul>
+``` + +as the four-space rule demands, rather than a single list, + +``` html +
<ul> +<li> +<p>foo</p> +<p>bar</p> +<ul> +<li>baz</li> +</ul> +</li> +</ul>
+``` + +The choice of four spaces is arbitrary. It can be learned, but it is +not likely to be guessed, and it trips up beginners regularly. + +Would it help to adopt a two-space rule? The problem is that such +a rule, together with the rule allowing 1--3 spaces indentation of the +initial list marker, allows text that is indented *less than* the +original list marker to be included in the list item. For example, +`Markdown.pl` parses + +``` markdown + - one + + two +``` + +as a single list item, with `two` a continuation paragraph: + +``` html +
<ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul>
+``` + +and similarly + +``` markdown +> - one +> +> two +``` + +as + +``` html +
<blockquote> +<ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul> +</blockquote>
+``` + +This is extremely unintuitive. + +Rather than requiring a fixed indent from the margin, we could require +a fixed indent (say, two spaces, or even one space) from the list marker (which +may itself be indented). This proposal would remove the last anomaly +discussed. Unlike the spec presented above, it would count the following +as a list item with a subparagraph, even though the paragraph `bar` +is not indented as far as the first paragraph `foo`: + +``` markdown + 10. foo + + bar +``` + +Arguably this text does read like a list item with `bar` as a subparagraph, +which may count in favor of the proposal. However, on this proposal indented +code would have to be indented six spaces after the list marker. And this +would break a lot of existing Markdown, which has the pattern: + +``` markdown +1. foo + + indented code +``` + +where the code is indented eight spaces. The spec above, by contrast, will +parse this text as expected, since the code block's indentation is measured +from the beginning of `foo`. + +The one case that needs special treatment is a list item that *starts* +with indented code. How much indentation is required in that case, since +we don't have a "first paragraph" to measure from? Rule #2 simply stipulates +that in such cases, we require one space indentation from the list marker +(and then the normal four spaces for the indented code). This will match the +four-space rule in cases where the list marker plus its initial indentation +takes four spaces (a common case), but diverge in other cases. + +## Lists + +A [list](@list) is a sequence of one or more +list items [of the same type]. The list items +may be separated by single [blank lines], but two +blank lines end all containing lists. + +Two list items are [of the same type](@of-the-same-type) +if they begin with a [list marker] of the same type. 
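The "same type" test is easy to state as code. What follows is a minimal Python sketch, not taken from any CommonMark implementation. It assumes each list marker has already been isolated as a string such as `-`, `*`, `3.` or `10)`, and the helper names `marker_type` and `same_list_type` are hypothetical. Bullet markers are compared by character and ordered markers by delimiter alone, so list numbers never affect the result.

```python
import re

# Hypothetical helpers for illustration; this does not parse Markdown.
BULLET_MARKERS = {"-", "+", "*"}
ORDERED_MARKER = re.compile(r"([0-9]+)([.)])")


def marker_type(marker):
    """Return a key that is equal for markers of the same list type."""
    if marker in BULLET_MARKERS:
        # Bullets: the character itself determines the type.
        return ("bullet", marker)
    match = ORDERED_MARKER.fullmatch(marker)
    if match:
        # Ordered markers: only the delimiter matters, not the number.
        return ("ordered", match.group(2))
    raise ValueError(f"not a list marker: {marker!r}")


def same_list_type(a, b):
    """True if list items with these two markers can share one list."""
    return marker_type(a) == marker_type(b)


print(same_list_type("-", "-"))     # True
print(same_list_type("-", "+"))     # False: different bullet characters
print(same_list_type("1.", "10."))  # True: the numbers are ignored
print(same_list_type("2.", "3)"))   # False: different delimiters
```

Because the key discards the number, changing `-` to `+` or `.` to `)` starts a new list, while renumbering items does not.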
+Two list markers are of the +same type if (a) they are bullet list markers using the same character +(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same +delimiter (either `.` or `)`). + +A list is an [ordered list](@ordered-list) +if its constituent list items begin with +[ordered list marker]s, and a +[bullet list](@bullet-list) if its constituent list +items begin with [bullet list marker]s. + +The [start number](@start-number) +of an [ordered list] is determined by the list number of +its initial list item. The numbers of subsequent list items are +disregarded. + +A list is [loose](@loose) if any of its constituent +list items are separated by blank lines, or if any of its constituent +list items directly contain two block-level elements with a blank line +between them. Otherwise a list is [tight](@tight). +(The difference in HTML output is that paragraphs in a loose list are +wrapped in `<p>` tags, while paragraphs in a tight list are not.) + +Changing the bullet or ordered list delimiter starts a new list: + +. +- foo +- bar ++ baz +. +
<ul> +<li>foo</li> +<li>bar</li> +</ul> +<ul> +<li>baz</li> +</ul>
+. + +. +1. foo +2. bar +3) baz +. +
<ol> +<li>foo</li> +<li>bar</li> +</ol> +<ol start="3"> +<li>baz</li> +</ol>
+. + +In CommonMark, a list can interrupt a paragraph. That is, +no blank line is needed to separate a paragraph from a following +list: + +. +Foo +- bar +- baz +. +
<p>Foo</p> +<ul> +<li>bar</li> +<li>baz</li> +</ul>
+. + +`Markdown.pl` does not allow this, through fear of triggering a list +via a numeral in a hard-wrapped line: + +. +The number of windows in my house is +14. The number of doors is 6. +. +
<p>The number of windows in my house is</p> +<ol start="14"> +<li>The number of doors is 6.</li> +</ol>
+. + +Oddly, `Markdown.pl` *does* allow a blockquote to interrupt a paragraph, +even though the same considerations might apply. We think that the two +cases should be treated the same. Here are two reasons for allowing +lists to interrupt paragraphs: + +First, it is natural and not uncommon for people to start lists without +blank lines: + + I need to buy + - new shoes + - a coat + - a plane ticket + +Second, we are attracted to a + +> [principle of uniformity](@principle-of-uniformity): +> if a chunk of text has a certain +> meaning, it will continue to have the same meaning when put into a +> container block (such as a list item or blockquote). + +(Indeed, the spec for [list items] and [block quotes] presupposes +this principle.) This principle implies that if + + * I need to buy + - new shoes + - a coat + - a plane ticket + +is a list item containing a paragraph followed by a nested sublist, +as all Markdown implementations agree it is (though the paragraph +may be rendered without `
<p>` tags, since the list is "tight"), +then + + I need to buy + - new shoes + - a coat + - a plane ticket + +by itself should be a paragraph followed by a nested sublist. + +Our adherence to the [principle of uniformity] +thus inclines us to think that there are two coherent packages: + +1. Require blank lines before *all* lists and blockquotes, + including lists that occur as sublists inside other list items. + +2. Require blank lines in none of these places. + +[reStructuredText](http://docutils.sourceforge.net/rst.html) takes +the first approach, for which there is much to be said. But the second +seems more consistent with established practice with Markdown. + +There can be blank lines between items, but two blank lines end +a list: + +. +- foo + +- bar + + +- baz +. +
<ul> +<li> +<p>foo</p> +</li> +<li> +<p>bar</p> +</li> +</ul> +<ul> +<li>baz</li> +</ul>
+. + +As illustrated above in the section on [list items], +two blank lines between blocks *within* a list item will also end a +list: + +. +- foo + + + bar +- baz +. +
<ul> +<li>foo</li> +</ul> +<p>bar</p> +<ul> +<li>baz</li> +</ul>
+. + +Indeed, two blank lines will end *all* containing lists: + +. +- foo + - bar + - baz + + + bim +. +
<ul> +<li>foo +<ul> +<li>bar +<ul> +<li>baz</li> +</ul> +</li> +</ul> +</li> +</ul> +<pre><code>  bim +</code></pre>
+. + +Thus, two blank lines can be used to separate consecutive lists of +the same type, or to separate a list from an indented code block +that would otherwise be parsed as a subparagraph of the final list +item: + +. +- foo +- bar + + +- baz +- bim +. +
<ul> +<li>foo</li> +<li>bar</li> +</ul> +<ul> +<li>baz</li> +<li>bim</li> +</ul>
+. + +. +- foo + + notcode + +- foo + + + code +. +
<ul> +<li> +<p>foo</p> +<p>notcode</p> +</li> +<li> +<p>foo</p> +</li> +</ul> +<pre><code>code +</code></pre>
+. + +List items need not be indented to the same level. The following +list items will be treated as items at the same list level, +since none is indented enough to belong to the previous list +item: + +. +- a + - b + - c + - d + - e + - f +- g +. +
<ul> +<li>a</li> +<li>b</li> +<li>c</li> +<li>d</li> +<li>e</li> +<li>f</li> +<li>g</li> +</ul>
+. + +This is a loose list, because there is a blank line between +two of the list items: + +. +- a +- b + +- c +. +
<ul> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +<li> +<p>c</p> +</li> +</ul>
+. + +So is this, with an empty second item: + +. +* a +* + +* c +. +
<ul> +<li> +<p>a</p> +</li> +<li></li> +<li> +<p>c</p> +</li> +</ul>
+. + +These are loose lists, even though there is no blank line between the items, +because one of the items directly contains two block-level elements +with a blank line between them: + +. +- a +- b + + c +- d +. +
<ul> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +<p>c</p> +</li> +<li> +<p>d</p> +</li> +</ul>
+. + +. +- a +- b + + [ref]: /url +- d +. +
<ul> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +<li> +<p>d</p> +</li> +</ul>
+. + +This is a tight list, because the blank lines are in a code block: + +. +- a +- ``` + b + + + ``` +- c +. +
<ul> +<li>a</li> +<li> +<pre><code>b + + +</code></pre> +</li> +<li>c</li> +</ul>
+. + +This is a tight list, because the blank line is between two +paragraphs of a sublist. So the sublist is loose while +the outer list is tight: + +. +- a + - b + + c +- d +. +
<ul> +<li>a +<ul> +<li> +<p>b</p> +<p>c</p> +</li> +</ul> +</li> +<li>d</li> +</ul>
+. + +This is a tight list, because the blank line is inside the +block quote: + +. +* a + > b + > +* c +. +
<ul> +<li>a +<blockquote> +<p>b</p> +</blockquote> +</li> +<li>c</li> +</ul>
+. + +This list is tight, because the consecutive block elements +are not separated by blank lines: + +. +- a + > b + ``` + c + ``` +- d +. +
<ul> +<li>a +<blockquote> +<p>b</p> +</blockquote> +<pre><code>c +</code></pre> +</li> +<li>d</li> +</ul>
+. + +A single-paragraph list is tight: + +. +- a +. +
<ul> +<li>a</li> +</ul>
+. + +. +- a + - b +. +
<ul> +<li>a +<ul> +<li>b</li> +</ul> +</li> +</ul>
+. + +This list is loose, because of the blank line between the +two block elements in the list item: + +. +1. ``` + foo + ``` + + bar +. +
<ol> +<li> +<pre><code>foo +</code></pre> +<p>bar</p> +</li> +</ol>
+. + +Here the outer list is loose, the inner list tight: + +. +* foo + * bar + + baz +. +
<ul> +<li> +<p>foo</p> +<ul> +<li>bar</li> +</ul> +<p>baz</p> +</li> +</ul>
+. + +. +- a + - b + - c + +- d + - e + - f +. +
<ul> +<li> +<p>a</p> +<ul> +<li>b</li> +<li>c</li> +</ul> +</li> +<li> +<p>d</p> +<ul> +<li>e</li> +<li>f</li> +</ul> +</li> +</ul>
+. + +# Inlines + +Inlines are parsed sequentially from the beginning of the character +stream to the end (left to right, in left-to-right languages). +Thus, for example, in + +. +`hi`lo` +. +
<p><code>hi</code>lo`</p>
+. + +`hi` is parsed as code, leaving the backtick at the end as a literal +backtick. + +## Backslash escapes + +Any ASCII punctuation character may be backslash-escaped: + +. +\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ +. +
<p>!&quot;#$%&amp;'()*+,-./:;&lt;=&gt;?@[\]^_`{|}~</p>
+. + +Backslashes before other characters are treated as literal +backslashes: + +. +\→\A\a\ \3\φ\« +. +
<p>\→\A\a\ \3\φ\«</p>
+. + +Escaped characters are treated as regular characters and do +not have their usual Markdown meanings: + +. +\*not emphasized* +\<br/> not a tag +\[not a link](/foo) +\`not code` +1\. not a list +\* not a list +\# not a header +\[foo]: /url "not a reference" +. +<p>*not emphasized* +&lt;br/&gt; not a tag +[not a link](/foo) +`not code` +1. not a list +* not a list +# not a header +[foo]: /url &quot;not a reference&quot;</p>
+. + +If a backslash is itself escaped, the following character is not: + +. +\\*emphasis* +. +
<p>\<em>emphasis</em></p>
+. + +A backslash at the end of the line is a [hard line break]: + +. +foo\ +bar +. +
<p>foo<br /> +bar</p>
+. + +Backslash escapes do not work in code blocks, code spans, autolinks, or +raw HTML: + +. +`` \[\` `` +. +
<p><code>\[\`</code></p>
+. + +. +    \[\] +. +<pre><code>\[\] +</code></pre>
+. + +. +~~~ +\[\] +~~~ +. +
<pre><code>\[\] +</code></pre>
+. + +. +<http://example.com?find=\*> +. +<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p>
+. + +. +<a href="/bar\/)"> +. +<p><a href="/bar\/)"></p>
+. + +But they work in all other contexts, including URLs and link titles, +link references, and [info string]s in [fenced code block]s: + +. +[foo](/bar\* "ti\*tle") +. +
<p><a href="/bar*" title="ti*tle">foo</a></p>
+. + +. +[foo] + +[foo]: /bar\* "ti\*tle" +. +
<p><a href="/bar*" title="ti*tle">foo</a></p>
+. + +. +``` foo\+bar +foo +``` +. +
<pre><code class="language-foo+bar">foo +</code></pre>
+. + + +## Entities + +With the goal of making this standard as HTML-agnostic as possible, all +valid HTML entities (except in code blocks and code spans) +are recognized as such and converted into unicode characters before +they are stored in the AST. This means that renderers to formats other +than HTML need not be HTML-entity aware. HTML renderers may either escape +unicode characters as entities or leave them as they are. (However, +`"`, `&`, `<`, and `>` must always be rendered as entities.) + +[Named entities](@name-entities) consist of `&` ++ any of the valid HTML5 entity names + `;`. The +[following document](https://html.spec.whatwg.org/multipage/entities.json) +is used as an authoritative source of the valid entity names and their +corresponding codepoints. + +. +&nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral; +. +<p>  &amp; © Æ Ď ¾ ℋ ⅆ ∲</p>
+. + +[Decimal entities](@decimal-entities) +consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these +entities need to be recognized and transformed into their corresponding +UTF8 codepoints. Invalid Unicode codepoints will be written as the +"unknown codepoint" character (`0xFFFD`). + +. +&#35; &#1234; &#992; &#98765432; +. +<p># Ӓ Ϡ �</p>
+. + +[Hexadecimal entities](@hexadecimal-entities) +consist of `&#` + either `X` or `x` + a string of 1--8 hexadecimal digits ++ `;`. They will also be parsed and turned into their corresponding UTF8 values in the AST. + +. +&#X22; &#XD06; &#xcab; +. +<p>&quot; ആ ಫ</p>
+. + +Here are some nonentities: + +. +&nbsp &x; &#; &#x; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?; +. +<p>&amp;nbsp &amp;x; &amp;#; &amp;#x; &amp;ThisIsWayTooLongToBeAnEntityIsntIt; &amp;hi?;</p>
+. + +Although HTML5 does accept some entities without a trailing semicolon +(such as `&copy`), these are not recognized as entities here, because it +makes the grammar too ambiguous: + +. +&copy +. +<p>&amp;copy</p>
+. + +Strings that are not on the list of HTML5 named entities are not +recognized as entities either: + +. +&MadeUpEntity; +. +
<p>&amp;MadeUpEntity;</p>
+. + +Entities are recognized in any context besides code spans or +code blocks, including raw HTML, URLs, [link title]s, and +[fenced code block] [info string]s: + +. +<a href="&ouml;&ouml;.html"> +. +<p><a href="&ouml;&ouml;.html"></p>
+. + +. +[foo](/f&ouml;&ouml; "f&ouml;&ouml;") +. +<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
+. + +. +[foo] + +[foo]: /f&ouml;&ouml; "f&ouml;&ouml;" +. +<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
+. + +. +``` f&ouml;&ouml; +foo +``` +. +<pre><code class="language-föö">foo +</code></pre>
+. + +Entities are treated as literal text in code spans and code blocks: + +. +`f&ouml;&ouml;` +. +<p><code>f&amp;ouml;&amp;ouml;</code></p>
+. + +. +    f&ouml;f&ouml; +. +<pre><code>f&amp;ouml;f&amp;ouml; +</code></pre>
+. + +## Code spans + +A [backtick string](@backtick-string) +is a string of one or more backtick characters (`` ` ``) that is neither +preceded nor followed by a backtick. + +A [code span](@code-span) begins with a backtick string and ends with +a backtick string of equal length. The contents of the code span are +the characters between the two backtick strings, with leading and +trailing spaces and [line ending]s removed, and +[whitespace] collapsed to single spaces. + +This is a simple code span: + +. +`foo` +. +
<p><code>foo</code></p>
+. + +Here two backticks are used, because the code contains a backtick. +This example also illustrates stripping of leading and trailing spaces: + +. +`` foo ` bar `` +. +
<p><code>foo ` bar</code></p>
+. + +This example shows the motivation for stripping leading and trailing +spaces: + +. +` `` ` +. +
<p><code>``</code></p>
+. + +[Line ending]s are treated like spaces: + +. +`` +foo +`` +. +
<p><code>foo</code></p>
+. + +Interior spaces and [line ending]s are collapsed into +single spaces, just as they would be by a browser: + +. +`foo bar + baz` +. +
<p><code>foo bar baz</code></p>
+. + +Q: Why not just leave the spaces, since browsers will collapse them +anyway? A: Because we might be targeting a non-HTML format, and we +shouldn't rely on HTML-specific rendering assumptions. + +(Existing implementations differ in their treatment of internal +spaces and [line ending]s. Some, including `Markdown.pl` and +`showdown`, convert an internal [line ending] into a +`
<br>` tag. But this makes things difficult for those who like to +hard-wrap their paragraphs, since a line break in the midst of a code +span will cause an unintended line break in the output. Others just +leave internal spaces as they are, which is fine if only HTML is being +targeted.) + +. +`foo `` bar` +. +<p><code>foo `` bar</code></p>
+. + +Note that backslash escapes do not work in code spans. All backslashes +are treated literally: + +. +`foo\`bar` +. +
<p><code>foo\</code>bar`</p>
+. + +Backslash escapes are never needed, because one can always choose a +string of *n* backtick characters as delimiters, where the code does +not contain any strings of exactly *n* backtick characters. + +Code span backticks have higher precedence than any other inline +constructs except HTML tags and autolinks. Thus, for example, this is +not parsed as emphasized text, since the second `*` is part of a code +span: + +. +*foo`*` +. +
<p>*foo<code>*</code></p>
+. + +And this is not parsed as a link: + +. +[not a `link](/foo`) +. +
<p>[not a <code>link](/foo</code>)</p>
+. + +Code spans, HTML tags, and autolinks have the same precedence. +Thus, this is code: + +. +`<a href="`">` +. +<p><code>&lt;a href=&quot;</code>&quot;&gt;`</p>
+. + +But this is an HTML tag: + +. +
<a href="`">` +. +<p><a href="`">`</p>
+. + +And this is code: + +. +`<http://foo.bar.`baz>` +. +<p><code>&lt;http://foo.bar.</code>baz&gt;`</p>
+. + +But this is an autolink: + +. +<http://foo.bar.`baz>` +. +<p><a href="http://foo.bar.%60baz">http://foo.bar.`baz</a>`</p>
+. + +When a backtick string is not closed by a matching backtick string, +we just have literal backticks: + +. +```foo`` +. +
<p>```foo``</p>
+. + +. +`foo +. +
<p>`foo</p>
+. + +## Emphasis and strong emphasis + +John Gruber's original [Markdown syntax +description](http://daringfireball.net/projects/markdown/syntax#em) says: + +> Markdown treats asterisks (`*`) and underscores (`_`) as indicators of +> emphasis. Text wrapped with one `*` or `_` will be wrapped with an HTML +> `<em>` tag; double `*`'s or `_`'s will be wrapped with an HTML `<strong>` +> tag. + +This is enough for most users, but these rules leave much undecided, +especially when it comes to nested emphasis. The original +`Markdown.pl` test suite makes it clear that triple `***` and +`___` delimiters can be used for strong emphasis, and most +implementations have also allowed the following patterns: + +``` markdown +***strong emph*** +***strong** in emph* +***emph* in strong** +**in strong *emph*** +*in emph **strong*** +``` + +The following patterns are less widely supported, but the intent +is clear and they are useful (especially in contexts like bibliography +entries): + +``` markdown +*emph *with emph* in it* +**strong **with strong** in it** +``` + +Many implementations have also restricted intraword emphasis to +the `*` forms, to avoid unwanted emphasis in words containing +internal underscores. (It is best practice to put these in code +spans, but users often do not.) + +``` markdown +internal emphasis: foo*bar*baz +no emphasis: foo_bar_baz +``` + +The rules given below capture all of these patterns, while allowing +for efficient parsing strategies that do not backtrack. + +First, some definitions. A [delimiter run](@delimiter-run) is either +a sequence of one or more `*` characters that is not preceded or +followed by a `*` character, or a sequence of one or more `_` +characters that is not preceded or followed by a `_` character.
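To make the definition of a delimiter run concrete, here is a minimal Python sketch that lists every maximal run of `*` or `_` in a string as `(character, start offset, length)`. It assumes plain text only. Backslash escapes, code spans, and the other inline constructs that would hide a `*` or `_` are deliberately ignored, and the function name `delimiter_runs` is hypothetical.

```python
from itertools import groupby


def delimiter_runs(text):
    """Return (char, start, length) for each maximal run of '*' or '_'.

    Hypothetical illustration only: escapes, code spans, and other
    inline constructs are not taken into account.
    """
    runs = []
    pos = 0
    # groupby yields one group per stretch of identical adjacent chars,
    # so each '*' or '_' group is exactly one maximal delimiter run.
    for char, group in groupby(text):
        length = sum(1 for _ in group)
        if char in "*_":
            runs.append((char, pos, length))
        pos += length
    return runs


print(delimiter_runs("***strong** in emph*"))
# [('*', 0, 3), ('*', 9, 2), ('*', 19, 1)]
```

Because `itertools.groupby` groups only identical adjacent characters, a string such as `**__` yields two separate runs, one of `*` and one of `_`. This matches the requirement that a run is not preceded or followed by the same character.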
+ + +A [left-flanking delimiter run](@left-flanking-delimiter-run) is +a [delimiter run] that is (a) not followed by [unicode whitespace], +and (b) either not followed by a [punctuation character], or +preceded by [unicode whitespace] or a [punctuation character]. + +A [right-flanking delimiter run](@right-flanking-delimiter-run) is +a [delimiter run] that is (a) not preceded by [unicode whitespace], +and (b) either not preceded by a [punctuation character], or +followed by [unicode whitespace] or a [punctuation character]. + +Here are some examples of delimiter runs. + + - left-flanking but not right-flanking: + + ``` + ***abc + _abc + **"abc" + _"abc" + ``` + + - right-flanking but not left-flanking: + + ``` + abc*** + abc_ + "abc"** + "abc"_ + ``` + + - Both left and right-flanking: + + ``` + abc***def + "abc"_"def" + ``` + + - Neither left nor right-flanking: + + ``` + abc *** def + a _ b + ``` + +(The idea of distinguishing left-flanking and right-flanking +delimiter runs based on the character before and the character +after comes from Roopesh Chander's +[vfmd](http://www.vfmd.org/vfmd-spec/specification/#procedure-for-identifying-emphasis-tags). +vfmd uses the terminology "emphasis indicator string" instead of "delimiter +run," and its rules for distinguishing left- and right-flanking runs +are a bit more complex than the ones given here.) + +The following rules define emphasis and strong emphasis: + +1. A single `*` character [can open emphasis](@can-open-emphasis) + iff it is part of a [left-flanking delimiter run]. + +2. A single `_` character [can open emphasis] iff + it is part of a [left-flanking delimiter run] + and not part of a [right-flanking delimiter run]. + +3. A single `*` character [can close emphasis](@can-close-emphasis) + iff it is part of a [right-flanking delimiter run]. + +4. A single `_` character [can close emphasis] + iff it is part of a [right-flanking delimiter run] + and not part of a [left-flanking delimiter run]. + +5.
A double `**` [can open strong emphasis](@can-open-strong-emphasis) + iff it is part of a [left-flanking delimiter run]. + +6. A double `__` [can open strong emphasis] + iff it is part of a [left-flanking delimiter run] + and not part of a [right-flanking delimiter run]. + +7. A double `**` [can close strong emphasis](@can-close-strong-emphasis) + iff it is part of a [right-flanking delimiter run]. + +8. A double `__` [can close strong emphasis] + iff it is part of a [right-flanking delimiter run] + and not part of a [left-flanking delimiter run]. + +9. Emphasis begins with a delimiter that [can open emphasis] and ends + with a delimiter that [can close emphasis], and that uses the same + character (`_` or `*`) as the opening delimiter. There must + be a nonempty sequence of inlines between the open delimiter + and the closing delimiter; these form the contents of the emphasis + inline. + +10. Strong emphasis begins with a delimiter that + [can open strong emphasis] and ends with a delimiter that + [can close strong emphasis], and that uses the same character + (`_` or `*`) as the opening delimiter. + There must be a nonempty sequence of inlines between the open + delimiter and the closing delimiter; these form the contents of + the strong emphasis inline. + +11. A literal `*` character cannot occur at the beginning or end of + `*`-delimited emphasis or `**`-delimited strong emphasis, unless it + is backslash-escaped. + +12. A literal `_` character cannot occur at the beginning or end of + `_`-delimited emphasis or `__`-delimited strong emphasis, unless it + is backslash-escaped. + +Where rules 1--12 above are compatible with multiple parsings, +the following principles resolve ambiguity: + +13. The number of nestings should be minimized. Thus, for example, + an interpretation `<strong>...</strong>` is always preferred to + `<em><em>...</em></em>`. + +14. An interpretation `<strong><em>...</em></strong>` is always + preferred to `<em><strong>..</strong></em>`. + +15.
When two potential emphasis or strong emphasis spans overlap, + so that the second begins before the first ends and ends after + the first ends, the first takes precedence. Thus, for example, + `*foo _bar* baz_` is parsed as `<em>foo _bar</em> baz_` rather + than `*foo <em>bar* baz</em>`. For the same reason, + `**foo*bar**` is parsed as `<em><em>foo</em>bar</em>*` + rather than `<strong>foo*bar</strong>`. + +16. When there are two potential emphasis or strong emphasis spans + with the same closing delimiter, the shorter one (the one that + opens later) takes precedence. Thus, for example, + `**foo **bar baz**` is parsed as `**foo <strong>bar baz</strong>` + rather than `<strong>foo **bar baz</strong>`. + +17. Inline code spans, links, images, and HTML tags group more tightly + than emphasis. So, when there is a choice between an interpretation + that contains one of these elements and one that does not, the + former always wins. Thus, for example, `*[foo*](bar)` is + parsed as `*<a href="bar">foo*</a>` rather than as + `<em>[foo</em>](bar)`. + +These rules can be illustrated through a series of examples. + +Rule 1: + +. +*foo bar* +. +

foo bar

+. + +This is not emphasis, because the opening `*` is followed by +whitespace, and hence not part of a [left-flanking delimiter run]: + +. +a * foo bar* +. +

a * foo bar*

+. + +This is not emphasis, because the opening `*` is preceded +by an alphanumeric and followed by punctuation, and hence +not part of a [left-flanking delimiter run]: + +. +a*"foo"* +. +

a*"foo"*

+. + +Unicode nonbreaking spaces count as whitespace, too: + +. +* a * +. +

* a *

+. + +Intraword emphasis with `*` is permitted: + +. +foo*bar* +. +

foobar

+. + +. +5*6*78 +. +

5678

+. + +Rule 2: + +. +_foo bar_ +. +

foo bar

+. + +This is not emphasis, because the opening `_` is followed by +whitespace: + +. +_ foo bar_ +. +

_ foo bar_

+. + +This is not emphasis, because the opening `_` is preceded +by an alphanumeric and followed by punctuation: + +. +a_"foo"_ +. +

a_"foo"_

+. + +Emphasis with `_` is not allowed inside words: + +. +foo_bar_ +. +

foo_bar_

+. + +. +5_6_78 +. +

5_6_78

+. + +. +пристаням_стремятся_ +. +

пристаням_стремятся_

+. + +Here `_` does not generate emphasis, because the first delimiter run +is right-flanking and the second left-flanking: + +. +aa_"bb"_cc +. +

aa_"bb"_cc

+. + +Here there is no emphasis, because the delimiter runs are +both left- and right-flanking: + +. +"aa"_"bb"_"cc" +. +

"aa"_"bb"_"cc"

+. + +Rule 3: + +This is not emphasis, because the closing delimiter does +not match the opening delimiter: + +. +_foo* +. +

_foo*

+. + +This is not emphasis, because the closing `*` is preceded by +whitespace: + +. +*foo bar * +. +

*foo bar *

+. + +This is not emphasis, because the second `*` is +preceded by punctuation and followed by an alphanumeric +(hence it is not part of a [right-flanking delimiter run]): + +. +*(*foo) +. +

*(*foo)

+. + +The point of this restriction is more easily appreciated +with this example: + +. +*(*foo*)* +. +

(foo)

+. + +Intraword emphasis with `*` is allowed: + +. +*foo*bar +. +

foobar

+. + + +Rule 4: + +This is not emphasis, because the closing `_` is preceded by +whitespace: + +. +_foo bar _ +. +

_foo bar _

+. + +This is not emphasis, because the second `_` is +preceded by punctuation and followed by an alphanumeric: + +. +_(_foo) +. +

_(_foo)

+. + +This is emphasis within emphasis: + +. +_(_foo_)_ +. +

(foo)

+. + +Intraword emphasis is disallowed for `_`: + +. +_foo_bar +. +

_foo_bar

+. + +. +_пристаням_стремятся +. +

_пристаням_стремятся

+. + +. +_foo_bar_baz_ +. +

foo_bar_baz

+. + +Rule 5: + +. +**foo bar** +. +

foo bar

+. + +This is not strong emphasis, because the opening delimiter is +followed by whitespace: + +. +** foo bar** +. +

** foo bar**

+. + +This is not strong emphasis, because the opening `**` is preceded +by an alphanumeric and followed by punctuation, and hence +not part of a [left-flanking delimiter run]: + +. +a**"foo"** +. +

a**"foo"**

+. + +Intraword strong emphasis with `**` is permitted: + +. +foo**bar** +. +

foobar

+. + +Rule 6: + +. +__foo bar__ +. +

foo bar

+. + +This is not strong emphasis, because the opening delimiter is +followed by whitespace: + +. +__ foo bar__ +. +

__ foo bar__

+. + +This is not strong emphasis, because the opening `__` is preceded +by an alphanumeric and followed by punctuation: + +. +a__"foo"__ +. +

a__"foo"__

+. + +Intraword strong emphasis is forbidden with `__`: + +. +foo__bar__ +. +

foo__bar__

+. + +. +5__6__78 +. +

5__6__78

+. + +. +пристаням__стремятся__ +. +

пристаням__стремятся__

+. + +. +__foo, __bar__, baz__ +. +

foo, bar, baz

+. + +Rule 7: + +This is not strong emphasis, because the closing delimiter is preceded +by whitespace: + +. +**foo bar ** +. +

**foo bar **

+. + +(Nor can it be interpreted as an emphasized `*foo bar *`, because of +Rule 11.) + +This is not strong emphasis, because the second `**` is +preceded by punctuation and followed by an alphanumeric: + +. +**(**foo) +. +

**(**foo)

+. + +The point of this restriction is more easily appreciated +with these examples: + +. +*(**foo**)* +. +

(foo)

+. + +. +**Gomphocarpus (*Gomphocarpus physocarpus*, syn. +*Asclepias physocarpa*)** +. +

Gomphocarpus (Gomphocarpus physocarpus, syn. +Asclepias physocarpa)

+. + +. +**foo "*bar*" foo** +. +

foo "bar" foo

+. + +Intraword emphasis: + +. +**foo**bar +. +

foobar

+. + +Rule 8: + +This is not strong emphasis, because the closing delimiter is +preceded by whitespace: + +. +__foo bar __ +. +

__foo bar __

+. + +This is not strong emphasis, because the second `__` is +preceded by punctuation and followed by an alphanumeric: + +. +__(__foo) +. +

__(__foo)

+. + +The point of this restriction is more easily appreciated +with this example: + +. +_(__foo__)_ +. +

(foo)

+. + +Intraword strong emphasis is forbidden with `__`: + +. +__foo__bar +. +

__foo__bar

+. + +. +__пристаням__стремятся +. +

__пристаням__стремятся

+. + +. +__foo__bar__baz__ +. +

foo__bar__baz

+. + +Rule 9: + +Any nonempty sequence of inline elements can be the contents of an +emphasized span. + +. +*foo [bar](/url)* +. +

foo bar

+. + +. +*foo +bar* +. +

foo +bar

+. + +In particular, emphasis and strong emphasis can be nested +inside emphasis: + +. +_foo __bar__ baz_ +. +

foo bar baz

+. + +. +_foo _bar_ baz_ +. +

foo bar baz

+. + +. +__foo_ bar_ +. +

foo bar

+. + +. +*foo *bar** +. +

foo bar

+. + +. +*foo **bar** baz* +. +

foo bar baz

+. + +But note: + +. +*foo**bar**baz* +. +

foobarbaz

+. + +The difference is that in the preceding case, the internal delimiters +[can close emphasis], while in the cases with spaces, they cannot. + +. +***foo** bar* +. +

foo bar

+. + +. +*foo **bar*** +. +

foo bar

+. + +Note, however, that in the following case we get no strong +emphasis, because the opening delimiter is closed by the first +`*` before `bar`: + +. +*foo**bar*** +. +

foobar**

+. + + +Indefinite levels of nesting are possible: + +. +*foo **bar *baz* bim** bop* +. +

foo bar baz bim bop

+. + +. +*foo [*bar*](/url)* +. +

foo bar

+. + +There can be no empty emphasis or strong emphasis: + +. +** is not an empty emphasis +. +

** is not an empty emphasis

+. + +. +**** is not an empty strong emphasis +. +

**** is not an empty strong emphasis

+. + + +Rule 10: + +Any nonempty sequence of inline elements can be the contents of a +strongly emphasized span. + +. +**foo [bar](/url)** +. +

foo bar

+. + +. +**foo +bar** +. +

foo +bar

+. + +In particular, emphasis and strong emphasis can be nested +inside strong emphasis: + +. +__foo _bar_ baz__ +. +

foo bar baz

+. + +. +__foo __bar__ baz__ +. +

foo bar baz

+. + +. +____foo__ bar__ +. +

foo bar

+. + +. +**foo **bar**** +. +

foo bar

+. + +. +**foo *bar* baz** +. +

foo bar baz

+. + +But note: + +. +**foo*bar*baz** +. +

foobarbaz**

+. + +The difference is that in the preceding case, the internal delimiters +[can close emphasis], while in the cases with spaces, they cannot. + +. +***foo* bar** +. +

foo bar

+. + +. +**foo *bar*** +. +

foo bar

+. + +Indefinite levels of nesting are possible: + +. +**foo *bar **baz** +bim* bop** +. +

foo bar baz +bim bop

+. + +. +**foo [*bar*](/url)** +. +

foo bar

+. + +There can be no empty emphasis or strong emphasis: + +. +__ is not an empty emphasis +. +

__ is not an empty emphasis

+. + +. +____ is not an empty strong emphasis +. +

____ is not an empty strong emphasis

+. + + +Rule 11: + +. +foo *** +. +

foo ***

+. + +. +foo *\** +. +

foo *

+. + +. +foo *_* +. +

foo _

+. + +. +foo ***** +. +

foo *****

+. + +. +foo **\*** +. +

foo *

+. + +. +foo **_** +. +

foo _

+. + +Note that when delimiters do not match evenly, Rule 11 determines +that the excess literal `*` characters will appear outside of the +emphasis, rather than inside it: + +. +**foo* +. +

*foo

+. + +. +*foo** +. +

foo*

+. + +. +***foo** +. +

*foo

+. + +. +****foo* +. +

***foo

+. + +. +**foo*** +. +

foo*

+. + +. +*foo**** +. +

foo***

+. + + +Rule 12: + +. +foo ___ +. +

foo ___

+. + +. +foo _\__ +. +

foo _

+. + +. +foo _*_ +. +

foo *

+. + +. +foo _____ +. +

foo _____

+. + +. +foo __\___ +. +

foo _

+. + +. +foo __*__ +. +

foo *

+. + +. +__foo_ +. +

_foo

+. + +Note that when delimiters do not match evenly, Rule 12 determines +that the excess literal `_` characters will appear outside of the +emphasis, rather than inside it: + +. +_foo__ +. +

foo_

+. + +. +___foo__ +. +

_foo

+. + +. +____foo_ +. +

___foo

+. + +. +__foo___ +. +

foo_

+. + +. +_foo____ +. +

foo___

+. + +Rule 13 implies that if you want emphasis nested directly inside +emphasis, you must use different delimiters: + +. +**foo** +. +

foo

+. + +. +*_foo_* +. +

foo

+. + +. +__foo__ +. +

foo

+. + +. +_*foo*_ +. +

foo

+. + +However, strong emphasis within strong emphasis is possible without +switching delimiters: + +. +****foo**** +. +

foo

+. + +. +____foo____ +. +

foo

+. + + +Rule 13 can be applied to arbitrarily long sequences of +delimiters: + +. +******foo****** +. +

foo

+. + +Rule 14: + +. +***foo*** +. +

foo

+. + +. +_____foo_____ +. +

foo

+. + +Rule 15: + +. +*foo _bar* baz_ +. +

foo _bar baz_

+. + +. +**foo*bar** +. +

foobar*

+. + + +Rule 16: + +. +**foo **bar baz** +. +

**foo bar baz

+. + +. +*foo *bar baz* +. +

*foo bar baz

+. + +Rule 17: + +. +*[bar*](/url) +. +

*bar*

+. + +. +_foo [bar_](/url) +. +

_foo bar_

+. + +. +* +. +

*

+. + +. +** +. +

**

+. + +. +__ +. +

__

+. + +. +*a `*`* +. +

a *

+. + +. +_a `_`_ +. +

a _

+. + +. +**a +. +

**ahttp://foo.bar?q=**

+. + +. +__a +. +

__ahttp://foo.bar?q=__

+. + + +## Links + +A link contains [link text] (the visible text), a [link destination] +(the URI that is the link destination), and optionally a [link title]. +There are two basic kinds of links in Markdown. In [inline link]s the +destination and title are given immediately after the link text. In +[reference link]s the destination and title are defined elsewhere in +the document. + +A [link text](@link-text) consists of a sequence of zero or more +inline elements enclosed by square brackets (`[` and `]`). The +following rules apply: + +- Links may not contain other links, at any level of nesting. + +- Brackets are allowed in the [link text] only if (a) they + are backslash-escaped or (b) they appear as a matched pair of brackets, + with an open bracket `[`, a sequence of zero or more inlines, and + a close bracket `]`. + +- Backtick [code span]s, [autolink]s, and raw [HTML tag]s bind more tightly + than the brackets in link text. Thus, for example, + `` [foo`]` `` could not be a link text, since the second `]` + is part of a code span. + +- The brackets in link text bind more tightly than markers for + [emphasis and strong emphasis]. Thus, for example, `*[foo*](url)` is a link. + +A [link destination](@link-destination) consists of either + +- a sequence of zero or more characters between an opening `<` and a + closing `>` that contains no line breaks or unescaped `<` or `>` + characters, or + +- a nonempty sequence of characters that does not include + ASCII space or control characters, and includes parentheses + only if (a) they are backslash-escaped or (b) they are part of + a balanced pair of unescaped parentheses that is not itself + inside a balanced pair of unescaped parentheses.
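The second form of link destination (the one without `<...>`) can be checked with a single left-to-right scan that tracks parenthesis depth. The following Python sketch is illustrative only and is not part of the spec. The function name `is_bare_destination` is hypothetical. The sketch assumes that the destination text has already been separated from the enclosing `(...)` of the inline link, and that a backslash escapes only ASCII punctuation.

```python
import string


def is_bare_destination(s: str) -> bool:
    """Hypothetical sketch: does `s` satisfy the non-<...> destination rule?

    Assumes `s` is the raw destination text, already isolated from the
    surrounding inline-link parentheses, and that a backslash escapes
    only ASCII punctuation (string.punctuation).
    """
    if not s:
        return False  # the rule requires a nonempty sequence
    depth = 0  # current nesting depth of unescaped parentheses
    i = 0
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s) and s[i + 1] in string.punctuation:
            i += 2  # escaped punctuation, including \( and \), is literal
            continue
        if ord(c) <= 0x20 or ord(c) == 0x7F:
            return False  # ASCII space or control character
        if c == "(":
            if depth == 1:
                return False  # a balanced pair nested inside another pair
            depth += 1
        elif c == ")":
            if depth == 0:
                return False  # closing parenthesis with no opener
            depth -= 1
        i += 1
    return depth == 0  # every unescaped "(" must be closed
```

With these assumptions, `(foo)and(bar)` and `foo(and\(bar\))` are accepted, while `foo(and(bar))` and `/my uri` are rejected. This agrees with the inline link examples that follow.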
+ +A [link title](@link-title) consists of either + +- a sequence of zero or more characters between straight double-quote + characters (`"`), including a `"` character only if it is + backslash-escaped, or + +- a sequence of zero or more characters between straight single-quote + characters (`'`), including a `'` character only if it is + backslash-escaped, or + +- a sequence of zero or more characters between matching parentheses + (`(...)`), including a `)` character only if it is backslash-escaped. + +An [inline link](@inline-link) consists of a [link text] followed immediately +by a left parenthesis `(`, optional [whitespace], an optional +[link destination], an optional [link title] separated from the link +destination by [whitespace], optional [whitespace], and a right +parenthesis `)`. The link's text consists of the inlines contained +in the [link text] (excluding the enclosing square brackets). +The link's URI consists of the link destination, excluding enclosing +`<...>` if present, with backslash-escapes in effect as described +above. The link's title consists of the link title, excluding its +enclosing delimiters, with backslash-escapes in effect as described +above. + +Here is a simple inline link: + +. +[link](/uri "title") +. +

link

+. + +The title may be omitted: + +. +[link](/uri) +. +

link

+. + +Both the title and the destination may be omitted: + +. +[link]() +. +

link

+. + +. +[link](<>) +. +

link

+. + +If the destination contains spaces, it must be enclosed in pointy +braces: + +. +[link](/my uri) +. +

[link](/my uri)

+. + +. +[link](
) +. +

link

+. + +The destination cannot contain line breaks, even with pointy braces: + +. +[link](foo +bar) +. +

[link](foo +bar)

+. + +. +[link]() +. +

[link]()

+. + +One level of balanced parentheses is allowed without escaping: + +. +[link]((foo)and(bar)) +. +

link

+. + +However, if you have parentheses within parentheses, you need to escape +or use the `<...>` form: + +. +[link](foo(and(bar))) +. +

[link](foo(and(bar)))

+. + +. +[link](foo(and\(bar\))) +. +

link

+. + +. +[link]() +. +

link

+. + +Parentheses and other symbols can also be escaped, as usual +in Markdown: + +. +[link](foo\)\:) +. +

link

+. + +URL-escaping should be left alone inside the destination, as all +URL-escaped characters are also valid URL characters. HTML entities in +the destination will be parsed into their UTF-8 codepoints, as usual, and +optionally URL-escaped when written as HTML. + +. +[link](foo%20bä) +. +

link

+. + +Note that, because titles can often be parsed as destinations, +if you try to omit the destination and keep the title, you'll +get unexpected results: + +. +[link]("title") +. +

link

+. + +Titles may be in single quotes, double quotes, or parentheses: + +. +[link](/url "title") +[link](/url 'title') +[link](/url (title)) +. +

link +link +link

+. + +Backslash escapes and entities may be used in titles: + +. +[link](/url "title \""") +. +

link

+. + +Nested balanced quotes are not allowed without escaping: + +. +[link](/url "title "and" title") +. +

[link](/url "title "and" title")

+. + +But it is easy to work around this by using a different quote type: + +. +[link](/url 'title "and" title') +. +

link

+. + +(Note: `Markdown.pl` did allow double quotes inside a double-quoted +title, and its test suite included a test demonstrating this. +But it is hard to see a good rationale for the extra complexity this +brings, since there are already many ways---backslash escaping, +entities, or using a different quote type for the enclosing title---to +write titles containing double quotes. `Markdown.pl`'s handling of +titles has a number of other strange features. For example, it allows +single-quoted titles in inline links, but not reference links. And, in +reference links but not inline links, it allows a title to begin with +`"` and end with `)`. `Markdown.pl` 1.0.1 even allows titles with no closing +quotation mark, though 1.0.2b8 does not. It seems preferable to adopt +a simple, rational rule that works the same way in inline links and +link reference definitions.) + +[Whitespace] is allowed around the destination and title: + +. +[link]( /uri + "title" ) +. +

link

+. + +But it is not allowed between the link text and the +following parenthesis: + +. +[link] (/uri) +. +

[link] (/uri)

+. + +The link text may contain balanced brackets, but not unbalanced ones, +unless they are escaped: + +. +[link [foo [bar]]](/uri) +. +

link [foo [bar]]

+. + +. +[link] bar](/uri) +. +

[link] bar](/uri)

+. + +. +[link [bar](/uri) +. +

[link bar

+. + +. +[link \[bar](/uri) +. +

link [bar

+. + +The link text may contain inline content: + +. +[link *foo **bar** `#`*](/uri) +. +

link foo bar #

+. + +. +[![moon](moon.jpg)](/uri) +. +

moon

+. + +However, links may not contain other links, at any level of nesting. + +. +[foo [bar](/uri)](/uri) +. +

[foo bar](/uri)

+. + +. +[foo *[bar [baz](/uri)](/uri)*](/uri) +. +

[foo [bar baz](/uri)](/uri)

+. + +. +![[[foo](uri1)](uri2)](uri3) +. +

[foo](uri2)

+. + +These cases illustrate the precedence of link text grouping over +emphasis grouping: + +. +*[foo*](/uri) +. +

*foo*

+. + +. +[foo *bar](baz*) +. +

foo *bar

+. + +Note that brackets that *aren't* part of links do not take +precedence: + +. +*foo [bar* baz] +. +

foo [bar baz]

+. + +These cases illustrate the precedence of HTML tags, code spans, +and autolinks over link grouping: + +. +[foo +. +

[foo

+. + +. +[foo`](/uri)` +. +

[foo](/uri)

+. + +. +[foo +. +

[foohttp://example.com?search=](uri)

+. + +There are three kinds of [reference link](@reference-link)s: +[full](#full-reference-link), [collapsed](#collapsed-reference-link), +and [shortcut](#shortcut-reference-link). + +A [full reference link](@full-reference-link) +consists of a [link text], optional [whitespace], and a [link label] +that [matches] a [link reference definition] elsewhere in the document. + +A [link label](@link-label) begins with a left bracket (`[`) and ends +with the first right bracket (`]`) that is not backslash-escaped. +Unescaped square bracket characters are not allowed in +[link label]s. A link label can have at most 999 +characters inside the square brackets. + +One label [matches](@matches) +another just in case their normalized forms are equal. To normalize a +label, perform the *unicode case fold* and collapse consecutive internal +[whitespace] to a single space. If there are multiple +matching reference link definitions, the one that comes first in the +document is used. (It is desirable in such cases to emit a warning.) + +The contents of the first link label are parsed as inlines, which are +used as the link's text. The link's URI and title are provided by the +matching [link reference definition]. + +Here is a simple example: + +. +[foo][bar] + +[bar]: /url "title" +. +

foo

+. + +The rules for the [link text] are the same as with +[inline link]s. Thus: + +The link text may contain balanced brackets, but not unbalanced ones, +unless they are escaped: + +. +[link [foo [bar]]][ref] + +[ref]: /uri +. +

link [foo [bar]]

+. + +. +[link \[bar][ref] + +[ref]: /uri +. +

link [bar

+. + +The link text may contain inline content: + +. +[link *foo **bar** `#`*][ref] + +[ref]: /uri +. +

link foo bar #

+. + +. +[![moon](moon.jpg)][ref] + +[ref]: /uri +. +

moon

+. + +However, links may not contain other links, at any level of nesting. + +. +[foo [bar](/uri)][ref] + +[ref]: /uri +. +

[foo bar]ref

+. + +. +[foo *bar [baz][ref]*][ref] + +[ref]: /uri +. +

[foo bar baz]ref

+. + +(In the examples above, we have two [shortcut reference link]s +instead of one [full reference link].) + +The following cases illustrate the precedence of link text grouping over +emphasis grouping: + +. +*[foo*][ref] + +[ref]: /uri +. +

*foo*

+. + +. +[foo *bar][ref] + +[ref]: /uri +. +

foo *bar

+. + +These cases illustrate the precedence of HTML tags, code spans, +and autolinks over link grouping: + +. +[foo + +[ref]: /uri +. +

[foo

+. + +. +[foo`][ref]` + +[ref]: /uri +. +

[foo][ref]

+. + +. +[foo + +[ref]: /uri +. +

[foohttp://example.com?search=][ref]

+. + +Matching is case-insensitive: + +. +[foo][BaR] + +[bar]: /url "title" +. +

foo

+. + +Unicode case fold is used: + +. +[Толпой][Толпой] is a Russian word. + +[ТОЛПОЙ]: /url +. +

Толпой is a Russian word.

+. + +Consecutive internal [whitespace] is treated as one space for +purposes of determining matching: + +. +[Foo + bar]: /url + +[Baz][Foo bar] +. +

Baz

+. + +There can be [whitespace] between the [link text] and the [link label]: + +. +[foo] [bar] + +[bar]: /url "title" +. +

foo

+. + +. +[foo] +[bar] + +[bar]: /url "title" +. +

foo

+. + +When there are multiple matching [link reference definition]s, +the first is used: + +. +[foo]: /url1 + +[foo]: /url2 + +[bar][foo] +. +

bar

+. + +Note that matching is performed on normalized strings, not parsed +inline content. So the following does not match, even though the +labels define equivalent inline content: + +. +[bar][foo\!] + +[foo!]: /url +. +

[bar][foo!]

+. + +[Link label]s cannot contain brackets, unless they are +backslash-escaped: + +. +[foo][ref[] + +[ref[]: /uri +. +

[foo][ref[]

+

[ref[]: /uri

+. + +. +[foo][ref[bar]] + +[ref[bar]]: /uri +. +

[foo][ref[bar]]

+

[ref[bar]]: /uri

+. + +. +[[[foo]]] + +[[[foo]]]: /url +. +

[[[foo]]]

+

[[[foo]]]: /url

+. + +. +[foo][ref\[] + +[ref\[]: /uri +. +

foo

+. + +A [collapsed reference link](@collapsed-reference-link) +consists of a [link label] that [matches] a +[link reference definition] elsewhere in the +document, optional [whitespace], and the string `[]`. +The contents of the first link label are parsed as inlines, +which are used as the link's text. The link's URI and title are +provided by the matching reference link definition. Thus, +`[foo][]` is equivalent to `[foo][foo]`. + +. +[foo][] + +[foo]: /url "title" +. +

foo

+. + +. +[*foo* bar][] + +[*foo* bar]: /url "title" +. +

foo bar

+. + +The link labels are case-insensitive: + +. +[Foo][] + +[foo]: /url "title" +. +

Foo

+. + + +As with full reference links, [whitespace] is allowed +between the two sets of brackets: + +. +[foo] +[] + +[foo]: /url "title" +. +

foo

+. + +A [shortcut reference link](@shortcut-reference-link) +consists of a [link label] that [matches] a +[link reference definition] elsewhere in the +document and is not followed by `[]` or a link label. +The contents of the first link label are parsed as inlines, +which are used as the link's text. The link's URI and title +are provided by the matching link reference definition. +Thus, `[foo]` is equivalent to `[foo][]`. + +. +[foo] + +[foo]: /url "title" +. +

foo

+. + +. +[*foo* bar] + +[*foo* bar]: /url "title" +. +

foo bar

+. + +. +[[*foo* bar]] + +[*foo* bar]: /url "title" +. +

[foo bar]

+. + +The link labels are case-insensitive: + +. +[Foo] + +[foo]: /url "title" +. +

Foo

+. + +A space after the link text should be preserved: + +. +[foo] bar + +[foo]: /url +. +

foo bar

+. + +If you just want bracketed text, you can backslash-escape the +opening bracket to avoid links: + +. +\[foo] + +[foo]: /url "title" +. +

[foo]

+. + +Note that this is a link, because a link label ends with the first +following closing bracket: + +. +[foo*]: /url + +*[foo*] +. +

*foo*

+. + +Full references take precedence over shortcut references: + +. +[foo][bar] + +[foo]: /url1 +[bar]: /url2 +. +

foo

+. + +In the following case `[bar][baz]` is parsed as a reference, +`[foo]` as normal text: + +. +[foo][bar][baz] + +[baz]: /url +. +

[foo]bar

+. + +Here, though, `[foo][bar]` is parsed as a reference, since +`[bar]` is defined: + +. +[foo][bar][baz] + +[baz]: /url1 +[bar]: /url2 +. +

foobaz

+. + +Here `[foo]` is not parsed as a shortcut reference, because it +is followed by a link label (even though `[bar]` is not defined): + +. +[foo][bar][baz] + +[baz]: /url1 +[foo]: /url2 +. +

[foo]bar

+. + + +## Images + +Syntax for images is like the syntax for links, with one +difference. Instead of [link text], we have an +[image description](@image-description). The rules for this are the +same as for [link text], except that (a) an +image description starts with `![` rather than `[`, and +(b) an image description may contain links. +An image description has inline elements +as its contents. When an image is rendered to HTML, +this is standardly used as the image's `alt` attribute. + +. +![foo](/url "title") +. +

foo

+. + +. +![foo *bar*] + +[foo *bar*]: train.jpg "train & tracks" +. +

foo bar

+. + +. +![foo ![bar](/url)](/url2) +. +

foo bar

+. + +. +![foo [bar](/url)](/url2) +. +

foo bar

+. + +Though this spec is concerned with parsing, not rendering, it is +recommended that in rendering to HTML, only the plain string content +of the [image description] be used. Note that in +the above example, the alt attribute's value is `foo bar`, not `foo +[bar](/url)` or `foo <a href="/url">bar</a>`. Only the plain string +content is rendered, without formatting. + +. +![foo *bar*][] + +[foo *bar*]: train.jpg "train & tracks" +. +

foo bar

+. + +. +![foo *bar*][foobar] + +[FOOBAR]: train.jpg "train & tracks" +. +

foo bar

+. + +. +![foo](train.jpg) +. +

foo

+. + +. +My ![foo bar](/path/to/train.jpg "title" ) +. +

My foo bar

+. + +. +![foo]() +. +

foo

+. + +. +![](/url) +. +

+. + +Reference-style: + +. +![foo] [bar] + +[bar]: /url +. +

foo

+. + +. +![foo] [bar] + +[BAR]: /url +. +

foo

+. + +Collapsed: + +. +![foo][] + +[foo]: /url "title" +. +

foo

+. + +. +![*foo* bar][] + +[*foo* bar]: /url "title" +. +

foo bar

+. + +The labels are case-insensitive: + +. +![Foo][] + +[foo]: /url "title" +. +

Foo

+. + +As with full reference links, [whitespace] is allowed +between the two sets of brackets: + +. +![foo] +[] + +[foo]: /url "title" +. +

foo

+. + +Shortcut: + +. +![foo] + +[foo]: /url "title" +. +

foo

+. + +. +![*foo* bar] + +[*foo* bar]: /url "title" +. +

foo bar

+. + +Note that link labels cannot contain unescaped brackets: + +. +![[foo]] + +[[foo]]: /url "title" +. +

![[foo]]

+

[[foo]]: /url "title"

+. + +The link labels are case-insensitive: + +. +![Foo] + +[foo]: /url "title" +. +

Foo

+. + +If you just want bracketed text, you can backslash-escape the +opening `!` and `[`: + +. +\!\[foo] + +[foo]: /url "title" +. +

![foo]

+. + +If you want a link after a literal `!`, backslash-escape the +`!`: + +. +\![foo] + +[foo]: /url "title" +. +

!foo

+. + +## Autolinks + +[Autolink](@autolink)s are absolute URIs and email addresses inside +`<` and `>`. They are parsed as links, with the URL or email address +as the link label. + +A [URI autolink](@uri-autolink) consists of `<`, followed by an +[absolute URI] not containing `<`, followed by `>`. It is parsed as +a link to the URI, with the URI as the link's label. + +An [absolute URI](@absolute-uri), +for these purposes, consists of a [scheme] followed by a colon (`:`) +followed by zero or more characters other than ASCII +[whitespace] and control characters, `<`, and `>`. If +the URI includes these characters, you must use percent-encoding +(e.g. `%20` for a space). + +The following [schemes](@scheme) +are recognized (case-insensitive): +`coap`, `doi`, `javascript`, `aaa`, `aaas`, `about`, `acap`, `cap`, +`cid`, `crid`, `data`, `dav`, `dict`, `dns`, `file`, `ftp`, `geo`, `go`, +`gopher`, `h323`, `http`, `https`, `iax`, `icap`, `im`, `imap`, `info`, +`ipp`, `iris`, `iris.beep`, `iris.xpc`, `iris.xpcs`, `iris.lwz`, `ldap`, +`mailto`, `mid`, `msrp`, `msrps`, `mtqp`, `mupdate`, `news`, `nfs`, +`ni`, `nih`, `nntp`, `opaquelocktoken`, `pop`, `pres`, `rtsp`, +`service`, `session`, `shttp`, `sieve`, `sip`, `sips`, `sms`, `snmp`, +`soap.beep`, `soap.beeps`, `tag`, `tel`, `telnet`, `tftp`, `thismessage`, +`tn3270`, `tip`, `tv`, `urn`, `vemmi`, `ws`, `wss`, `xcon`, +`xcon-userid`, `xmlrpc.beep`, `xmlrpc.beeps`, `xmpp`, `z39.50r`, +`z39.50s`, `adiumxtra`, `afp`, `afs`, `aim`, `apt`, `attachment`, `aw`, +`beshare`, `bitcoin`, `bolo`, `callto`, `chrome`, `chrome-extension`, +`com-eventbrite-attendee`, `content`, `cvs`, `dlna-playsingle`, +`dlna-playcontainer`, `dtn`, `dvb`, `ed2k`, `facetime`, `feed`, +`finger`, `fish`, `gg`, `git`, `gizmoproject`, `gtalk`, `hcp`, `icon`, +`ipn`, `irc`, `irc6`, `ircs`, `itms`, `jar`, `jms`, `keyparc`, `lastfm`, +`ldaps`, `magnet`, `maps`, `market`, `message`, `mms`, `ms-help`, +`msnim`, `mumble`, `mvn`, `notes`, `oid`, `palm`, `paparazzi`,
+`platform`, `proxy`, `psyc`, `query`, `res`, `resource`, `rmi`, `rsync`, +`rtmp`, `secondlife`, `sftp`, `sgn`, `skype`, `smb`, `soldat`, +`spotify`, `ssh`, `steam`, `svn`, `teamspeak`, `things`, `udp`, +`unreal`, `ut2004`, `ventrilo`, `view-source`, `webcal`, `wtai`, +`wyciwyg`, `xfire`, `xri`, `ymsgr`. + +Here are some valid autolinks: + +. + +. +

http://foo.bar.baz

+. + +. + +. +

http://foo.bar.baz?q=hello&id=22&boolean

+. + +. + +. +

irc://foo.bar:2233/baz

+. + +Uppercase is also fine: + +. + +. +

MAILTO:FOO@BAR.BAZ

+. + +Spaces are not allowed in autolinks: + +. + +. +

<http://foo.bar/baz bim>

+. + +An [email autolink](@email-autolink) +consists of `<`, followed by an [email address], +followed by `>`. The link's label is the email address, +and the URL is `mailto:` followed by the email address. + +An [email address](@email-address), +for these purposes, is anything that matches +the [non-normative regex from the HTML5 +spec](https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)): + + /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? + (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ + +Examples of email autolinks: + +. + +. +

foo@bar.example.com

+. + +. + +. +

foo+special@Bar.baz-bar0.com

+. + +These are not autolinks: + +. +<> +. +

<>

+. + +. + +. +

<heck://bing.bong>

+. + +. +< http://foo.bar > +. +

< http://foo.bar >

+. + +. + +. +

<foo.bar.baz>

+. + +. + +. +

<localhost:5001/foo>

+. + +. +http://example.com +. +

http://example.com

+. + +. +foo@bar.example.com +. +

foo@bar.example.com

+. + +## Raw HTML + +Text between `<` and `>` that looks like an HTML tag is parsed as a +raw HTML tag and will be rendered in HTML without escaping. +Tag and attribute names are not limited to current HTML tags, +so custom tags (and even, say, DocBook tags) may be used. + +Here is the grammar for tags: + +A [tag name](@tag-name) consists of an ASCII letter +followed by zero or more ASCII letters or digits. + +An [attribute](@attribute) consists of [whitespace], +an [attribute name], and an optional +[attribute value specification]. + +An [attribute name](@attribute-name) +consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII +letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML +specification restricted to ASCII. HTML5 is laxer.) + +An [attribute value specification](@attribute-value-specification) +consists of optional [whitespace], +a `=` character, optional [whitespace], and an [attribute +value]. + +An [attribute value](@attribute-value) +consists of an [unquoted attribute value], +a [single-quoted attribute value], or a [double-quoted attribute value]. + +An [unquoted attribute value](@unquoted-attribute-value) +is a nonempty string of characters not +including spaces, `"`, `'`, `=`, `<`, `>`, or `` ` ``. + +A [single-quoted attribute value](@single-quoted-attribute-value) +consists of `'`, zero or more +characters not including `'`, and a final `'`. + +A [double-quoted attribute value](@double-quoted-attribute-value) +consists of `"`, zero or more +characters not including `"`, and a final `"`. + +An [open tag](@open-tag) consists of a `<` character, a [tag name], +zero or more [attributes], optional [whitespace], an optional `/` +character, and a `>` character. + +A [closing tag](@closing-tag) consists of the string `</`, a +[tag name], optional [whitespace], and the character `>`. + +An [HTML comment](@html-comment) consists of `<!--` + *text* + `-->`, +where *text* does not start with `>` or `->`, does not end with `-`, +and does not contain `--`.
(See the +[HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).) + +A [processing instruction](@processing-instruction) +consists of the string `<?`, a string +of characters not including the string `?>`, and the string +`?>`. + +A [declaration](@declaration) consists of the +string `<!`, a name consisting of one or more uppercase +ASCII letters, [whitespace], a string of characters not including +the character `>`, and the character `>`. + +A [CDATA section](@cdata-section) consists of +the string `<![CDATA[`, a string of characters not including the string +`]]>`, and the string `]]>`. + +An [HTML tag](@html-tag) consists of an [open tag], a [closing tag], +an [HTML comment], a [processing instruction], a [declaration], +or a [CDATA section]. + +Here are some simple open tags: + +. + +. +

+. + +Empty elements: + +. + +. +

+. + +[Whitespace] is allowed: + +. + +. +

+. + +With attributes: + +. + +. +

+. + +Illegal tag names, not parsed as HTML: + +. +<33> <__> +. +

<33> <__>

+. + +Illegal attribute names: + +. +
+. +

<a h*#ref="hi">

+. + +Illegal attribute values: + +. +
+. +

</a href="foo">

+. + +Comments: + +. +foo +. +

foo

+. + +. +foo +. +

foo <!-- not a comment -- two hyphens -->

+. + +Not comments: + +. +foo foo --> + +foo +. +

foo <!--> foo -->

+

foo <!-- foo--->

+. + +Processing instructions: + +. +foo +. +

foo

+. + +Declarations: + +. +foo +. +

foo

+. + +CDATA sections: + +. +foo &<]]> +. +

foo &<]]>

+. + +Entities are preserved in HTML attributes: + +. +
+. +

+. + +Backslash escapes do not work in HTML attributes: + +. + +. +

+. + +. + +. +

<a href=""">

+. + +## Hard line breaks + +A line break (not in a code span or HTML tag) that is preceded +by two or more spaces and does not occur at the end of a block +is parsed as a [hard line break](@hard-line-break) (rendered +in HTML as a `<br />` tag): + +. +foo +baz +. +

foo
+baz

+. + +For a more visible alternative, a backslash before the +[line ending] may be used instead of two spaces: + +. +foo\ +baz +. +

foo
+baz

+. + +More than two spaces can be used: + +. +foo +baz +. +

foo
+baz

+. + +Leading spaces at the beginning of the next line are ignored: + +. +foo + bar +. +

foo
+bar

+. + +. +foo\ + bar +. +

foo
+bar

+. + +Line breaks can occur inside emphasis, links, and other constructs +that allow inline content: + +. +*foo +bar* +. +

foo
+bar

+. + +. +*foo\ +bar* +. +

foo
+bar

+. + +Line breaks do not occur inside code spans + +. +`code +span` +. +

code span

+. + +. +`code\ +span` +. +

code\ span

+. + +or HTML tags: + +. +
+. +

+. + +. + +. +

+. + +Hard line breaks are for separating inline content within a block. +Neither syntax for hard line breaks works at the end of a paragraph or +other block element: + +. +foo\ +. +

foo\

+. + +. +foo +. +

foo

+. + +. +### foo\ +. +

foo\

+. + +. +### foo +. +

foo

+. + +## Soft line breaks + +A regular line break (not in a code span or HTML tag) that is not +preceded by two or more spaces is parsed as a softbreak. (A +softbreak may be rendered in HTML either as a +[line ending] or as a space. The result will be the same +in browsers. In the examples here, a [line ending] will be used.) + +. +foo +baz +. +

foo +baz

+. + +Spaces at the end of the line and beginning of the next line are +removed: + +. +foo + baz +. +

foo +baz

+. + +A conforming parser may render a soft line break in HTML either as a +line break or as a space. + +A renderer may also provide an option to render soft line breaks +as hard line breaks. + +## Textual content + +Any characters not given an interpretation by the above rules will +be parsed as plain textual content. + +. +hello $.;'there +. +

hello $.;'there

+. + +. +Foo χρῆν +. +

Foo χρῆν

+. + +Internal spaces are preserved verbatim: + +. +Multiple spaces +. +

Multiple spaces

+. + + + +# Appendix A: A parsing strategy {-} + +## Overview {-} + +Parsing has two phases: + +1. In the first phase, lines of input are consumed and the block +structure of the document---its division into paragraphs, block quotes, +list items, and so on---is constructed. Text is assigned to these +blocks but not parsed. Link reference definitions are parsed and a +map of links is constructed. + +2. In the second phase, the raw text contents of paragraphs and headers +are parsed into sequences of Markdown inline elements (strings, +code spans, links, emphasis, and so on), using the map of link +references constructed in phase 1. + +## The document tree {-} + +At each point in processing, the document is represented as a tree of +**blocks**. The root of the tree is a `document` block. The `document` +may have any number of other blocks as **children**. These children +may, in turn, have other blocks as children. The last child of a block +is normally considered **open**, meaning that subsequent lines of input +can alter its contents. (Blocks that are not open are **closed**.) +Here, for example, is a possible document tree, with the open blocks +marked by arrows: + +``` tree +-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + list_item + paragraph + "Qui *quodsi iracundia*" + -> list_item + -> paragraph + "aliquando id" +``` + +## How source lines alter the document tree {-} + +Each line that is processed has an effect on this tree. The line is +analyzed and, depending on its contents, the document may be altered +in one or more of the following ways: + +1. One or more open blocks may be closed. +2. One or more new blocks may be created as children of the + last open block. +3. Text may be added to the last (deepest) open block remaining + on the tree. + +Once a line has been incorporated into the tree in this way, +it can be discarded, so input can be read in a stream. 
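The line-by-line procedure above can be illustrated with a small Python sketch. It is a hypothetical toy, not cmark's code: the `Block` class and the `open_chain`, `strip_marker`, `add_line`, and `dump` functions are invented for illustration, and only block quotes, paragraphs, and blank lines are handled (no lists, headers, code blocks, or indentation before `>`). Each call to `add_line` finds the chain of open blocks, matches `>` markers against the open block quotes, lets an open paragraph absorb continuation text (lazily, if some quotes went unmatched), closes whatever the line did not match, and then opens new blocks or adds text to the deepest open block.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """A node in the block tree (hypothetical toy model, not cmark's API)."""
    kind: str                                        # "document", "block_quote", or "paragraph"
    children: List["Block"] = field(default_factory=list)
    lines: List[str] = field(default_factory=list)   # raw text; inlines are parsed in phase 2
    is_open: bool = True


def open_chain(doc: Block) -> List[Block]:
    """Return the open blocks from the root down to the deepest open block."""
    chain = [doc]
    while chain[-1].children and chain[-1].children[-1].is_open:
        chain.append(chain[-1].children[-1])
    return chain


def strip_marker(text: str) -> str:
    """Drop a leading '>' and at most one space after it."""
    text = text[1:]
    return text[1:] if text.startswith(" ") else text


def add_line(doc: Block, line: str) -> None:
    chain = open_chain(doc)
    tip = chain[-1]
    # Paragraphs are leaves, so the open containers are the chain minus a final paragraph.
    containers = chain[:-1] if tip.kind == "paragraph" else chain

    # 1. Match '>' markers against the open block quotes, outermost first.
    matched, rest = 1, line          # the document always matches
    while matched < len(containers) and rest.startswith(">"):
        rest = strip_marker(rest)
        matched += 1

    # 2. Non-blank text that does not start a block quote continues an open
    #    paragraph; if some block quotes went unmatched, it is a lazy continuation.
    if tip.kind == "paragraph" and rest.strip() and not rest.startswith(">"):
        tip.lines.append(rest.strip())
        return

    # 3. Close every open block the line did not match (including the paragraph).
    for block in chain[matched:]:
        block.is_open = False
    container = containers[matched - 1]

    # 4. Remaining '>' markers open new, nested block quotes.
    while rest.startswith(">"):
        quote = Block("block_quote")
        container.children.append(quote)
        container = quote
        rest = strip_marker(rest)

    # 5. Remaining text starts a new paragraph; a blank line adds nothing.
    if rest.strip():
        container.children.append(Block("paragraph", lines=[rest.strip()]))


def dump(block: Block, depth: int = 0) -> List[str]:
    """Render the tree as text, marking open blocks with '->'."""
    lines = ["  " * depth + ("-> " if block.is_open else "   ") + block.kind]
    if block.lines:
        lines.append("  " * (depth + 1) + "   " + repr("\n".join(block.lines)))
    for child in block.children:
        lines.extend(dump(child, depth + 1))
    return lines


if __name__ == "__main__":
    doc = Block("document")
    for text in ["> Lorem ipsum dolor", "sit amet.", ">", "> aliquando id"]:
        add_line(doc, text)
    print("\n".join(dump(doc)))
```

Feeding it the lines `> Lorem ipsum dolor`, `sit amet.`, `>`, and `> aliquando id` leaves one open `block_quote` whose first, closed paragraph holds `"Lorem ipsum dolor\nsit amet."` (the second line was added as a lazy continuation) and whose second paragraph, `"aliquando id"`, is still open. Because each line is fully processed before the next is read, the sketch, like the strategy it illustrates, can consume its input as a stream.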
+ +We can see how this works by considering how the tree above is +generated by four lines of Markdown: + +``` markdown +> Lorem ipsum dolor +sit amet. +> - Qui *quodsi iracundia* +> - aliquando id +``` + +At the outset, our document model is just + +``` tree +-> document +``` + +The first line of our text, + +``` markdown +> Lorem ipsum dolor +``` + +causes a `block_quote` block to be created as a child of our +open `document` block, and a `paragraph` block as a child of +the `block_quote`. Then the text is added to the last open +block, the `paragraph`: + +``` tree +-> document + -> block_quote + -> paragraph + "Lorem ipsum dolor" +``` + +The next line, + +``` markdown +sit amet. +``` + +is a "lazy continuation" of the open `paragraph`, so it gets added +to the paragraph's text: + +``` tree +-> document + -> block_quote + -> paragraph + "Lorem ipsum dolor\nsit amet." +``` + +The third line, + +``` markdown +> - Qui *quodsi iracundia* +``` + +causes the `paragraph` block to be closed, and a new `list` block +opened as a child of the `block_quote`. A `list_item` is also +added as a child of the `list`, and a `paragraph` as a child of +the `list_item`. The text is then added to the new `paragraph`: + +``` tree +-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + -> list_item + -> paragraph + "Qui *quodsi iracundia*" +``` + +The fourth line, + +``` markdown +> - aliquando id +``` + +causes the `list_item` (and its child the `paragraph`) to be closed, +and a new `list_item` opened up as child of the `list`. A `paragraph` +is added as a child of the new `list_item`, to contain the text. +We thus obtain the final tree: + +``` tree +-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." 
+ -> list (type=bullet tight=true bullet_char=-) + list_item + paragraph + "Qui *quodsi iracundia*" + -> list_item + -> paragraph + "aliquando id" +``` + +## From block structure to the final document {-} + +Once all of the input has been parsed, all open blocks are closed. + +We then "walk the tree," visiting every node, and parse raw +string contents of paragraphs and headers as inlines. At this +point we have seen all the link reference definitions, so we can +resolve reference links as we go. + +``` tree +document + block_quote + paragraph + str "Lorem ipsum dolor" + softbreak + str "sit amet." + list (type=bullet tight=true bullet_char=-) + list_item + paragraph + str "Qui " + emph + str "quodsi iracundia" + list_item + paragraph + str "aliquando id" +``` + +Notice how the [line ending] in the first paragraph has +been parsed as a `softbreak`, and the asterisks in the first list item +have become an `emph`. + +The document can be rendered as HTML, or in any other format, given +an appropriate renderer. diff --git a/test/spec_tests.py b/test/spec_tests.py index cc676be..b1b0373 100755 --- a/test/spec_tests.py +++ b/test/spec_tests.py @@ -13,7 +13,7 @@ if __name__ == "__main__": parser = argparse.ArgumentParser(description='Run cmark tests.') parser.add_argument('-p', '--program', dest='program', nargs='?', default=None, help='program to test') - parser.add_argument('-s', '--spec', dest='spec', nargs='?', default='spec.txt', + parser.add_argument('-s', '--spec', dest='spec', nargs='?', default='test/spec.txt', help='path to spec') parser.add_argument('-P', '--pattern', dest='pattern', nargs='?', default=None, help='limit to sections matching regex pattern') -- cgit v1.2.3