summaryrefslogtreecommitdiff
path: root/spec.txt
diff options
context:
space:
mode:
authorJohn MacFarlane <jgm@berkeley.edu>2014-10-07 22:24:53 -0700
committerJohn MacFarlane <jgm@berkeley.edu>2014-10-07 22:24:53 -0700
commitd3c3e749f4f7b95a9604f751cf993fd488a15b19 (patch)
tree03e132839569e7b87edb75f2960867dda6e83b4f /spec.txt
parent3d99baba064091f74b9da78eaed38fcf4875af46 (diff)
Cleaned up entity section of spec.
We convert entities to unicode characters, not UTF-8 sequences. (Though they might ultimately be output that way.)
Diffstat (limited to 'spec.txt')
-rw-r--r--spec.txt41
1 files changed, 24 insertions, 17 deletions
diff --git a/spec.txt b/spec.txt
index db62f53..489b9c0 100644
--- a/spec.txt
+++ b/spec.txt
@@ -3727,21 +3727,25 @@ foo
## Entities
-With the goal of making this standard as HTML-agnostic as possible, all HTML valid HTML Entities in any
-context are recognized as such and converted into their actual values (i.e. the UTF8 characters representing
-the entity itself) before they are stored in the AST.
+With the goal of making this standard as HTML-agnostic as possible, all
+valid HTML entities in any context are recognized as such and
+converted into unicode characters before they are stored in the AST.
-This allows implementations that target HTML output to trivially escape the entities when generating HTML,
-and simplifies the job of implementations targetting other languages, as these will only need to handle the
-UTF8 chars and need not be HTML-entity aware.
+This allows implementations that target HTML output to trivially escape
+the entities when generating HTML, and simplifies the job of
+implementations targetting other languages, as these will only need to
+handle the unicode chars and need not be HTML-entity aware.
[Named entities](#name-entities) <a id="named-entities"></a> consist of `&`
-+ any of the valid HTML5 entity names + `;`. The [following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json)
-is used as an authoritative source of the valid entity names and their corresponding codepoints.
++ any of the valid HTML5 entity names + `;`. The
+[following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json)
+is used as an authoritative source of the valid entity names and their
+corresponding codepoints.
-Conforming implementations that target Markdown don't need to generate entities for all the valid
-named entities that exist, with the exception of `"` (`&quot;`), `&` (`&amp;`), `<` (`&lt;`) and `>` (`&gt;`),
-which always need to be written as entities for security reasons.
+Conforming implementations that target HTML don't need to generate
+entities for all the valid named entities that exist, with the exception
+of `"` (`&quot;`), `&` (`&amp;`), `<` (`&lt;`) and `>` (`&gt;`), which
+always need to be written as entities for security reasons.
.
&nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral;
@@ -3750,9 +3754,10 @@ which always need to be written as entities for security reasons.
.
[Decimal entities](#decimal-entities) <a id="decimal-entities"></a>
-consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these entities need to be recognised
-and tranformed into their corresponding UTF8 codepoints. Invalid Unicode codepoints will be written
-as the "unknown codepoint" character (`0xFFFD`)
+consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
+entities need to be recognised and tranformed into their corresponding
+UTF8 codepoints. Invalid Unicode codepoints will be written as the
+"unknown codepoint" character (`0xFFFD`)
.
&#35; &#1234; &#992; &#98765432;
@@ -3779,7 +3784,8 @@ Here are some nonentities:
.
Although HTML5 does accept some entities without a trailing semicolon
-(such as `&copy`), these are not recognized as entities here, because it makes the grammar too ambiguous:
+(such as `&copy`), these are not recognized as entities here, because it
+makes the grammar too ambiguous:
.
&copy
@@ -3787,7 +3793,8 @@ Although HTML5 does accept some entities without a trailing semicolon
<p>&amp;copy</p>
.
-Strings that are not on the list of HTML5 named entities are not recognized as entities either:
+Strings that are not on the list of HTML5 named entities are not
+recognized as entities either:
.
&MadeUpEntity;
@@ -4836,7 +4843,7 @@ in Markdown:
URL-escaping should be left alone inside the destination, as all
URL-escaped characters are also valid URL characters. HTML entities in
-the destination will be parsed into their UTF8 codepoints, as usual, and
+the destination will be parsed into their UTF-8 codepoints, as usual, and
optionally URL-escaped when written as HTML.
.