From 45c1d9fadb3e8aab4a01bb27a4e2ece379902d1a Mon Sep 17 00:00:00 2001 From: Vicent Marti Date: Thu, 4 Sep 2014 17:26:11 +0200 Subject: 426/15 --- spec.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 82ae0b6..d7e70f5 100644 --- a/spec.txt +++ b/spec.txt @@ -1682,7 +1682,7 @@ them. [Foo bar] . -

Foo bar

+

Foo bar

. The title may be omitted: @@ -1745,7 +1745,7 @@ case-insensitive (see [matches](#matches)). [αγω] . -

αγω

+

αγω

. Here is a link reference definition with no corresponding link. @@ -3688,7 +3688,7 @@ raw HTML: . . -

http://google.com?find=\*

+

http://google.com?find=\*

. . -- cgit v1.2.3 From d8f44f1e4f0bd944ab43e6434a1579d670ed66cf Mon Sep 17 00:00:00 2001 From: Vicent Marti Date: Thu, 4 Sep 2014 17:49:13 +0200 Subject: 433/8 --- spec.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index d7e70f5..cfda2a3 100644 --- a/spec.txt +++ b/spec.txt @@ -3946,7 +3946,7 @@ But this is a link: . ` . -

http://foo.bar.`baz`

+

http://foo.bar.`baz`

. And this is an HTML tag: -- cgit v1.2.3 From 38220c56c9a888a0c00ff22fb82ba156fec1f6a8 Mon Sep 17 00:00:00 2001 From: Vicent Marti Date: Thu, 4 Sep 2014 17:54:37 +0200 Subject: 5 failed --- spec.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index cfda2a3..a353d56 100644 --- a/spec.txt +++ b/spec.txt @@ -3688,7 +3688,7 @@ raw HTML: . . -

http://google.com?find=\*

+

http://google.com?find=\*

. . @@ -4755,7 +4755,7 @@ braces: . [link]() . -

link

+

link

. The destination cannot contain line breaks, even with pointy braces: @@ -4821,7 +4821,7 @@ get unexpected results: . [link]("title") . -

link

+

link

. Titles may be in single quotes, double quotes, or parentheses: -- cgit v1.2.3 From d260c800c90e024714a6d84e28ac2caea70866e7 Mon Sep 17 00:00:00 2001 From: Vicent Marti Date: Thu, 4 Sep 2014 20:04:12 +0200 Subject: This spec was correct --- spec.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index a353d56..616cb96 100644 --- a/spec.txt +++ b/spec.txt @@ -3688,7 +3688,7 @@ raw HTML: . . -

http://google.com?find=\*

+

http://google.com?find=\*

. . -- cgit v1.2.3 From 798f58a2b614280201141b398c8e498cecc8ab5e Mon Sep 17 00:00:00 2001 From: Vicent Marti Date: Sat, 6 Sep 2014 21:17:23 +0200 Subject: This is going well --- spec.txt | 35 +++++++++++++++++++++++------------ 1 file changed, 23 insertions(+), 12 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 616cb96..ebd6d98 100644 --- a/spec.txt +++ b/spec.txt @@ -3688,7 +3688,7 @@ raw HTML: . . -

http://google.com?find=\*

+

http://google.com?find=\*

. . @@ -3727,25 +3727,37 @@ foo ## Entities -Entities are parsed as entities, not as literal text, in all contexts -except code spans and code blocks. Three kinds of entities are recognized. +With the goal of making this standard as HTML-agnostic as possible, all valid HTML entities in any +context are recognized as such and converted into their actual values (i.e. the UTF8 characters representing +the entity itself) before they are stored in the AST. + +This allows implementations that target HTML output to trivially escape the entities when generating HTML, +and simplifies the job of implementations targeting other languages, as these will only need to handle the +UTF8 chars and need not be HTML-entity aware. [Named entities](#name-entities) consist of `&` -+ a string of 2-32 alphanumerics beginning with a letter + `;`. ++ any of the valid HTML5 entity names + `;`. The [following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json) +is used as an authoritative source of the valid entity names and their corresponding codepoints. + +Conforming implementations that target Markdown don't need to generate entities for all the valid +named entities that exist, with the exception of `"` (`&quot;`), `&` (`&amp;`), `<` (`&lt;`) and `>` (`&gt;`), +which always need to be written as entities for security reasons. .   & © Æ Ď ¾ ℋ ⅆ ∲ . -

  & © Æ Ď ¾ ℋ ⅆ ∲

+

  & © Æ Ď ¾ ℋ ⅆ ∲

. [Decimal entities](#decimal-entities) -consist of `&#` + a string of 1--8 arabic digits + `;`. +consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these entities need to be recognised +and transformed into their corresponding UTF8 codepoints. Invalid Unicode codepoints will be written +as the "unknown codepoint" character (`0xFFFD`) . - # Ӓ Ϡ � +# Ӓ Ϡ � . -

 # Ӓ Ϡ �

+

# Ӓ Ϡ �

. [Hexadecimal entities](#hexadecimal-entities) @@ -3767,7 +3779,7 @@ Here are some nonentities: . Although HTML5 does accept some entities without a trailing semicolon -(such as `&copy`), these are not recognized as entities here: +(such as `&copy`), these are not recognized as entities here, because it makes the grammar too ambiguous: . &copy @@ -3775,13 +3787,12 @@ Although HTML5 does accept some entities without a trailing semicolon

&copy

. -On the other hand, many strings that are not on the list of HTML5 -named entities are recognized as entities here: +Strings that are not on the list of HTML5 named entities are not recognized as entities either: . &MadeUpEntity; . -

&MadeUpEntity;

+

&MadeUpEntity;

. Entities are recognized in any context besides code spans or -- cgit v1.2.3 From 9d86d2f32303ae0048f6a5daa552bacceb9b12ea Mon Sep 17 00:00:00 2001 From: Vicent Marti Date: Tue, 9 Sep 2014 04:00:36 +0200 Subject: Update the spec with better entity handling --- spec.txt | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index ebd6d98..112dccc 100644 --- a/spec.txt +++ b/spec.txt @@ -3762,20 +3762,20 @@ as the "unknown codepoint" character (`0xFFFD`) [Hexadecimal entities](#hexadecimal-entities) consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits -+ `;`. ++ `;`. They will also be parsed and turned into their corresponding UTF8 values in the AST. . - " ആ ಫ +" ആ ಫ . -

 " ആ ಫ

+

" ആ ಫ

. Here are some nonentities: . -&nbsp &x; &#; &#x; &#123456789; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?; +&nbsp &x; &#; &#x; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?; . -

&nbsp &x; &#; &#x; &#123456789; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?;

+

&nbsp &x; &#; &#x; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?;

. Although HTML5 does accept some entities without a trailing semicolon @@ -3808,7 +3808,7 @@ code blocks, including raw HTML, URLs, [link titles](#link-title), and . [foo](/föö "föö") . -

foo

+

foo

. . @@ -3816,7 +3816,7 @@ code blocks, including raw HTML, URLs, [link titles](#link-title), and [foo]: /föö "föö" . -

foo

+

foo

. . @@ -3824,7 +3824,7 @@ code blocks, including raw HTML, URLs, [link titles](#link-title), and foo ``` . -
foo
+
foo
 
. @@ -4817,12 +4817,14 @@ in Markdown:

link

. -URL-escaping and entities should be left alone inside the destination: +URL-escaping should be left alone inside the destination, as all URL-escaped characters +are also valid URL characters. HTML entities in the destination will be parsed into their UTF8 +codepoints, as usual, and optionally URL-escaped when written as HTML. . [link](foo%20bä) . -

link

+

link

. Note that, because titles can often be parsed as destinations, -- cgit v1.2.3 From 9c08b31793f269e4b5902908282034618ee66eef Mon Sep 17 00:00:00 2001 From: Alex Kocharin Date: Tue, 16 Sep 2014 00:44:52 +0400 Subject: typo fix --- spec.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 4a9e9fd..40d04f2 100644 --- a/spec.txt +++ b/spec.txt @@ -4030,7 +4030,7 @@ for efficient parsing strategies that do not backtrack: (a) it is not part of a sequence of four or more unescaped `_`s, (b) it is not followed by whitespace, - (c) is is not preceded by an ASCII alphanumeric character, and + (c) it is not preceded by an ASCII alphanumeric character, and (d) either it is not followed by a `_` character or it is followed immediately by strong emphasis. -- cgit v1.2.3 From c4b76cf93c8c54b6a33bab82056dc542c6630d92 Mon Sep 17 00:00:00 2001 From: John MacFarlane Date: Fri, 19 Sep 2014 18:11:33 -0700 Subject: spec: Fixed date, version. Closes #133. --- spec.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'spec.txt') diff --git a/spec.txt b/spec.txt index 040c060..fce8792 100644 --- a/spec.txt +++ b/spec.txt @@ -2,8 +2,8 @@ title: CommonMark Spec author: - John MacFarlane -version: 1 -date: 2014-09-06 +version: 2 +date: 2014-09-19 ... # Introduction -- cgit v1.2.3
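The entity rules introduced by the patches above (named entities only from the HTML5 list and only with a trailing `;`, decimal and hexadecimal entities of 1 to 8 digits, invalid codepoints mapped to `0xFFFD`) can be sketched roughly as follows. This is an illustrative sketch, not the reference implementation: the names `decode_entities` and `render_html_text` are hypothetical, Python's `html.entities.html5` table stands in for the WHATWG `entities.json` document, and treating codepoint 0 and surrogates as invalid is an assumption the patch text does not spell out.

```python
import html
import re
from html.entities import html5  # HTML5 named entities, keyed like "copy;"

# "&" + (hex entity | decimal entity | candidate name) + mandatory ";"
ENTITY_RE = re.compile(
    r"&(?:#[xX]([0-9a-fA-F]{1,8})|#([0-9]{1,8})|([A-Za-z][A-Za-z0-9]*));"
)

REPLACEMENT = "\ufffd"  # the "unknown codepoint" character


def _codepoint_to_char(cp):
    # Assumption: 0, surrogates and values above U+10FFFF count as invalid.
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return REPLACEMENT
    return chr(cp)


def decode_entities(text):
    """Replace recognized entities with their characters (the AST form)."""
    def replace(match):
        hex_digits, dec_digits, name = match.groups()
        if hex_digits is not None:
            return _codepoint_to_char(int(hex_digits, 16))
        if dec_digits is not None:
            return _codepoint_to_char(int(dec_digits, 10))
        # Names missing from the HTML5 table are left as literal text.
        return html5.get(name + ";", match.group(0))
    return ENTITY_RE.sub(replace, text)


def render_html_text(text):
    """Escape decoded text for HTML output (&, <, > and quotes)."""
    return html.escape(text, quote=True)
```

Under these assumptions, `decode_entities("&copy; &#35; &#X22;")` yields the three literal characters, while `&copy` (no semicolon) and `&MadeUpEntity;` pass through unchanged; `render_html_text` then re-escapes only the characters that are unsafe in HTML. Note that `html.escape` with `quote=True` also escapes the single quote, which goes slightly beyond the four characters the patch names.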