From 158bbebe1a0eede2122feecd6f6b5aee9a53468d Mon Sep 17 00:00:00 2001
From: John MacFarlane
Date: Mon, 3 Nov 2014 17:36:01 -0800
Subject: Removed artificial rule for emph/strong markers.

Previously there was a rule that nothing in a string of more than 3
`*` or `_` characters could close or start emphasis. This was
artificial and led to strange asymmetries, e.g. you could have
`*a *b**` emph within emph but not `**a **b****` strong within strong.
The new parsing strategy makes it easy to remove this limitation.
Spec, js, and c implementations have been updated. Spec might need
some further grooming.
---
 spec.txt | 104 +++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 55 insertions(+), 49 deletions(-)

diff --git a/spec.txt b/spec.txt
index 1bbd287..3eabb31 100644
--- a/spec.txt
+++ b/spec.txt
@@ -4250,60 +4250,52 @@ for efficient parsing strategies that do not backtrack:
 
 1.  A single `*` character [can open emphasis](#can-open-emphasis) iff
-    (a) it is not part of a sequence of four or more unescaped `*`s,
-    (b) it is not followed by whitespace, and
-    (c) either it is not followed by a `*` character or it is
+    (a) it is not followed by whitespace, and
+    (b) either it is not followed by a `*` character or it is
     followed immediately by emphasis or strong emphasis.
 
 2.  A single `_` character [can open emphasis](#can-open-emphasis) iff
-    (a) it is not part of a sequence of four or more unescaped `_`s,
-    (b) it is not followed by whitespace,
-    (c) it is not preceded by an ASCII alphanumeric character, and
-    (d) either it is not followed by a `_` character or it is
+    (a) it is not followed by whitespace,
+    (b) it is not preceded by an ASCII alphanumeric character, and
+    (c) either it is not followed by a `_` character or it is
     followed immediately by emphasis or strong emphasis.
 
 3.  A single `*` character [can close emphasis](#can-close-emphasis) iff
-    (a) it is not part of a sequence of four or more unescaped `*`s, and
     (b) it is not preceded by whitespace.
 
 4.  A single `_` character [can close emphasis](#can-close-emphasis) iff
-    (a) it is not part of a sequence of four or more unescaped `_`s,
-    (b) it is not preceded by whitespace, and
-    (c) it is not followed by an ASCII alphanumeric character.
+    (a) it is not preceded by whitespace, and
+    (b) it is not followed by an ASCII alphanumeric character.
 
 5.  A double `**` [can open strong emphasis](#can-open-strong-emphasis) iff
-    (a) it is not part of a sequence of four or more unescaped `*`s,
-    (b) it is not followed by whitespace, and
-    (c) either it is not followed by a `*` character or it is
+    (a) it is not followed by whitespace, and
+    (b) either it is not followed by a `*` character or it is
     followed immediately by emphasis.
 
 6.  A double `__` [can open strong emphasis](#can-open-strong-emphasis) iff
-    (a) it is not part of a sequence of four or more unescaped `_`s,
-    (b) it is not followed by whitespace, and
-    (c) it is not preceded by an ASCII alphanumeric character, and
-    (d) either it is not followed by a `_` character or it is
+    (a) it is not followed by whitespace, and
+    (b) it is not preceded by an ASCII alphanumeric character, and
+    (c) either it is not followed by a `_` character or it is
     followed immediately by emphasis.
 
 7.  A double `**` [can close strong emphasis](#can-close-strong-emphasis) iff
-    (a) it is not part of a sequence of four or more unescaped `*`s, and
-    (b) it is not preceded by whitespace.
+    (a) it is not preceded by whitespace.
 
 8.  A double `__` [can close strong emphasis](#can-close-strong-emphasis) iff
-    (a) it is not part of a sequence of four or more unescaped `_`s,
-    (b) it is not preceded by whitespace, and
-    (c) it is not followed by an ASCII alphanumeric character.
+    (a) it is not preceded by whitespace, and
+    (b) it is not followed by an ASCII alphanumeric character.
 
 9.  Emphasis begins with a delimiter that [can open
     emphasis](#can-open-emphasis) and ends with a delimiter that [can close
@@ -4544,19 +4536,13 @@ and __foo bar __

and __foo bar __

. -The rules imply that a sequence of four or more unescaped `*` or -`_` characters will always be parsed as a literal string: - -. -****hi**** -. -

****hi****

-. +The rules imply that a sequence of `*` or `_` characters +surrounded by whitespace will be parsed as a literal string: . -_____hi_____ +foo ******** . -

_____hi_____

+

foo ********

. . @@ -4827,8 +4813,7 @@ the internal delimiters [can close emphasis](#can-close-emphasis), while in the cases with spaces, they cannot. Note that you cannot nest emphasis directly inside emphasis -using the same delimeter, or strong emphasis directly inside -strong emphasis: +using the same delimeter: . **foo** @@ -4836,22 +4821,25 @@ strong emphasis:

foo

. +For this, you need to switch delimiters: + . -****foo**** +*_foo_* . -

****foo****

+

foo

. -For these nestings, you need to switch delimiters: +Strong within strong is possible without switching +delimiters: . -*_foo_* +****foo**** . -

foo

+

foo

. . -**__foo__** +____foo____ .

foo

. @@ -4890,21 +4878,19 @@ similarly for `_` and `__`):

foo bar**

. -The following contains no strong emphasis, because the opening -delimiter is closed by the first `*` before `bar`: - . -*foo**bar*** +*foo**** . -

foobar**

+

foo***

. -However, a string of four or more `****` can never close emphasis: +The following contains no strong emphasis, because the opening +delimiter is closed by the first `*` before `bar`: . -*foo**** +*foo**bar*** . -

*foo****

+

foobar**

. We retain symmetry in these cases: @@ -4927,6 +4913,26 @@ We retain symmetry in these cases:

foo bar

. +. +**foo*** + +***foo** +. +

foo*

+

*foo

+. + +. +**foo **bar**** + +****foo** bar** +. +

foo bar

+

foo bar

+. + + + More cases with mismatched delimiters: . -- cgit v1.2.3
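
The whitespace and alphanumeric conditions in rules 1-8, as they read after this patch, could be sketched roughly as follows. This is not the js or c implementation mentioned in the commit message; the function name `scan_delimiter_run` is hypothetical, the whole run is classified at once (so the "followed by a `*` character" clauses are not modelled), and the start and end of the text are assumed to count as whitespace.

```python
import string

# ASCII alphanumerics, as referred to by the `_` rules.
ASCII_ALNUM = set(string.ascii_letters + string.digits)


def scan_delimiter_run(text, start):
    """Return (length, can_open, can_close) for the run of `*` or `_` at `start`.

    Hypothetical helper: it only checks the characters on either side of the run.
    """
    char = text[start]
    if char not in "*_":
        raise ValueError("not a delimiter character")
    end = start
    while end < len(text) and text[end] == char:
        end += 1
    # Assumption: the start and end of the text behave like whitespace.
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    can_open = not after.isspace()
    can_close = not before.isspace()
    if char == "_":
        # `_` additionally may not open after, or close before, an ASCII alphanumeric.
        can_open = can_open and before not in ASCII_ALNUM
        can_close = can_close and after not in ASCII_ALNUM
    # Note: the run length no longer affects the result (no "four or more" rule).
    return end - start, can_open, can_close
```

Under these assumptions, the run in `foo ********` can neither open nor close, which matches the literal-string example in the patch, while the two runs in `****foo****` can open and close respectively even though each is four characters long.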