Diffstat (limited to 'test')
-rw-r--r-- | test/spec.txt | 22 |
1 files changed, 12 insertions, 10 deletions
diff --git a/test/spec.txt b/test/spec.txt
index 9b2b977..6c660bb 100644
--- a/test/spec.txt
+++ b/test/spec.txt
@@ -212,12 +212,8 @@ to a certain encoding.
 A [line](@line) is a sequence of zero or more [character]s
 followed by a [line ending] or by the end of file.
 
-A [line ending](@line-ending) is, depending on the platform, a
-newline (`U+000A`), carriage return (`U+000D`), or
-carriage return + newline.
-
-For security reasons, a conforming parser must strip or replace the
-Unicode character `U+0000`.
+A [line ending](@line-ending) is a newline (`U+000A`), carriage return
+(`U+000D`), or carriage return + newline.
 
 A line containing no characters, or a line containing only spaces
 (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line).
@@ -270,6 +266,11 @@ Tabs in lines are expanded to spaces, with a tab stop of 4 characters:
 </code></pre>
 .
 
+## Insecure characters
+
+For security reasons, the Unicode character `U+0000` must be replaced
+with the replacement character (`U+FFFD`).
+
 # Blocks and inlines
 
 We can think of a document as a sequence of
@@ -4284,13 +4285,14 @@ corresponding codepoints.
 [Decimal entities](@decimal-entities)
 consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
 entities need to be recognised and transformed into their corresponding
-unicode codepoints. Invalid unicode codepoints will be written as the
-"unknown codepoint" character (`0xFFFD`)
+unicode codepoints. Invalid unicode codepoints will be replaced by
+the "unknown codepoint" character (`U+FFFD`). For security reasons,
+the codepoint `U+0000` will also be replaced by `U+FFFD`.
 
 .
-&#35; &#1234; &#992; &#98765432;
+&#35; &#1234; &#992; &#98765432; &#0;
 .
-<p># Ӓ Ϡ �</p>
+<p># Ӓ Ϡ � �</p>
 .
 
 [Hexadecimal entities](@hexadecimal-entities)
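
The rules introduced by this change can be sketched in a few lines of Python. This is only an illustration, not code from any actual parser: the function names are hypothetical, and treating codepoints above `U+10FFFF` and surrogates (`U+D800` to `U+DFFF`) as "invalid" is an assumption, since the diff does not define the term.

```python
import re

REPLACEMENT = "\ufffd"  # U+FFFD, the replacement character

# A line ending is CR+LF, a lone CR, or a lone LF on every platform.
# CR+LF is listed first so that it is consumed as a single ending.
LINE_ENDING = re.compile(r"\r\n|\r|\n")

# `&#` + a string of 1--8 arabic digits + `;`
DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")


def replace_insecure(text):
    """Replace every U+0000 in the input with U+FFFD."""
    return text.replace("\x00", REPLACEMENT)


def split_lines(text):
    """Split text on any of the three line endings."""
    return LINE_ENDING.split(text)


def decode_decimal_entities(text):
    """Turn decimal entities into characters.

    U+0000 and invalid codepoints become U+FFFD. What counts as
    invalid here (out of range, or a surrogate) is an assumption.
    """
    def decode(match):
        codepoint = int(match.group(1))
        if (codepoint == 0
                or codepoint > 0x10FFFF
                or 0xD800 <= codepoint <= 0xDFFF):
            return REPLACEMENT
        return chr(codepoint)

    return DECIMAL_ENTITY.sub(decode, text)
```

For example, `split_lines("a\r\nb\rc\nd")` yields four lines regardless of platform, and `decode_decimal_entities("&#0;")` yields a single `U+FFFD` rather than a NUL.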