Age | Commit message | Author |
|
|
This will need corresponding spec changes.
The change is this: when considering matches between an interior
delimiter run (one that can open and can close) and another delimiter
run, we require that the sum of the lengths of the two delimiter
runs mod 3 is not 0.
Thus, for example, in
*a**b*
1 23 4
delimiter 1 cannot match 2, since the sum of the lengths of
the first delimiter run (1, length 1) and the second (2 and 3, length 2) is 3, and 3 mod 3 == 0.
Thus we get `<em>a**b</em>` instead of `<em>a</em><em>b</em>`.
This gives better behavior on things like
*a**b**c*
which previously got parsed as
<em>a</em><em>b</em><em>c</em>
and now would be parsed as
<em>a<strong>b</strong>c</em>
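The length check described above can be sketched in C as follows. This is a minimal illustration, not the code from this commit: the `delim_run` struct and the `runs_may_match` function are hypothetical names, and the sketch assumes the rule applies whenever either run is interior (can both open and close).

```c
#include <stdbool.h>

/* Hypothetical description of a delimiter run; not cmark's actual struct. */
typedef struct {
  int length;     /* number of delimiter characters in the run */
  bool can_open;  /* the run can open emphasis */
  bool can_close; /* the run can close emphasis */
} delim_run;

/* Returns true if `opener` may be matched with `closer`.
 * Assumption: when either run is interior (can open and can close),
 * the sum of the two lengths must not be a multiple of 3. */
static bool runs_may_match(const delim_run *opener, const delim_run *closer) {
  bool opener_interior = opener->can_open && opener->can_close;
  bool closer_interior = closer->can_open && closer->can_close;
  if (opener_interior || closer_interior) {
    return (opener->length + closer->length) % 3 != 0;
  }
  return true;
}
```

For `*a**b*`, the first run has length 1 and the interior `**` has length 2, so the sum is 3 and the pair is rejected. The first and last runs are not interior, so they may still match.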
With this change we get four spec test failures, but in each
case the output seems more "intuitive":
```
Example 386 (lines 6490-6494) Emphasis and strong emphasis
*foo**bar**baz*
--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><em>foo</em><em>bar</em><em>baz</em></p>
+<p><em>foo<strong>bar</strong>baz</em></p>
Example 389 (lines 6518-6522) Emphasis and strong emphasis
*foo**bar***
--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><em>foo</em><em>bar</em>**</p>
+<p><em>foo<strong>bar</strong></em></p>
Example 401 (lines 6620-6624) Emphasis and strong emphasis
**foo*bar*baz**
--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><em><em>foo</em>bar</em>baz**</p>
+<p><strong>foo<em>bar</em>baz</strong></p>
Example 442 (lines 6944-6948) Emphasis and strong emphasis
**foo*bar**
--- expected HTML
+++ actual HTML
@@ -1 +1 @@
-<p><em><em>foo</em>bar</em>*</p>
+<p><strong>foo*bar</strong></p>
```
|
|
It is no longer needed; only the brackets struct needs it.
Thanks to @robinst.
|
|
This is too strict, as it prevents the use of dynamically
loaded extensions: see
https://github.com/jgm/cmark/pull/123#discussion_r67231518.
Documented in man page and public header that one should use the same
memory allocator for every node in a tree.
|
|
See https://github.com/jgm/commonmark.js/pull/101
This uses a separate stack for brackets, instead of
putting them on the delimiter stack. This avoids the
need for looking through the delimiter stack for the next
bracket.
It also avoids a shortcut reference lookup when the reference
text contains brackets.
The change dramatically improved performance on the nested links
pathological test for commonmark.js. It has a smaller but measurable
effect here.
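A separate bracket stack might look roughly like the sketch below. The `bracket` struct, its fields, and the `push_bracket`/`pop_bracket` helpers are assumptions made for illustration and may not match the structures this commit actually adds.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stack entry for an opening "[" or "![". */
typedef struct bracket {
  struct bracket *previous; /* entry below this one on the stack */
  int position;             /* offset of the bracket in the input */
  bool image;               /* true for "![", false for "[" */
} bracket;

/* Pushes a new entry; returns false if allocation fails. */
static bool push_bracket(bracket **top, int position, bool image) {
  bracket *b = malloc(sizeof(*b));
  if (b == NULL) {
    return false;
  }
  b->previous = *top;
  b->position = position;
  b->image = image;
  *top = b;
  return true;
}

/* Pops the most recent entry, if any. */
static void pop_bracket(bracket **top) {
  bracket *b = *top;
  if (b == NULL) {
    return;
  }
  *top = b->previous;
  free(b);
}
```

With this layout the most recent opening bracket is always `*top`, so a closing `]` can find its candidate opener without walking past emphasis delimiters.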
|
|
This reverts commit c069cb55bcadfd0f45890d846ff412b3c892eb87.
|
|
We reuse the parser for reference labels, instead
of just assuming that a slice of the link text
will be a valid reference label. (It might contain
interior brackets, for example.)
|
|
Reduce the storage size for the `cmark_code` struct
|
|
Save node information in flags instead of using one boolean for each
property.
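As a rough sketch of the idea, several boolean properties can share one integer field, with one bit per property. The flag names, the `node` struct, and the helper functions below are hypothetical and only illustrate the technique.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; one bit per former boolean field. */
enum {
  NODE_FLAG_OPEN = 1 << 0,
  NODE_FLAG_LAST_LINE_BLANK = 1 << 1
};

/* Hypothetical node holding all boolean properties in one field. */
typedef struct {
  uint16_t flags;
} node;

/* Sets or clears a single flag bit. */
static void node_set_flag(node *n, uint16_t flag, bool on) {
  if (on) {
    n->flags |= flag;
  } else {
    n->flags &= (uint16_t)~flag;
  }
}

/* Tests whether a flag bit is set. */
static bool node_has_flag(const node *n, uint16_t flag) {
  return (n->flags & flag) != 0;
}
```

A 16-bit field leaves room for up to 16 properties in the space that two one-byte booleans would otherwise occupy.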
|
|
The previous work for unbounded memory usage and overflows on the buffer
API had several shortcomings:
1. The total size of the buffer was limited by arbitrarily small
precision on the storage type for buffer indexes (typedef'd as
`bufsize_t`). This is not a good design pattern in secure applications,
particularly since it requires the addition of helper functions to cast
to/from the native `size` types and the custom type for the buffer, and
check for overflows.
2. The library was calling `abort` on overflow and memory allocation
failures. This is not a good practice for production libraries, since it
turns a potential RCE into a trivial, guaranteed DoS to the whole
application that is linked against the library. It defeats the whole
point of performing overflow or allocation checks when the checks will
crash the library and the enclosing program anyway.
3. The default size limits for buffers were essentially unbounded
(capped to the precision of the storage type) and could lead to DoS
attacks by simple memory exhaustion (particularly critical on 32-bit
platforms). This is not a good practice for a library that handles
arbitrary user input.
Hence, this patchset provides slight (but in my opinion critical)
improvements in this area, copying some of the patterns we've used in
the past for high-throughput, security-sensitive Markdown parsers:
1. The storage type for buffer sizes is now platform native (`ssize_t`).
Ideally, this would be a `size_t`, but several parts of the code expect
buffer indexes to be possibly negative. Either way, switching to a
`size` type is a strict improvement, particularly on 64-bit platforms.
All the helpers that ensured that values cannot escape the `size` range
have been removed, since they are superfluous.
2. The overflow checks have been removed. Instead, the maximum size for
a buffer has been set to a safe value for production usage (32 MB) that
can be proven not to overflow in practice. Users who need to parse
particularly large Markdown documents can increase this value. A static,
compile-time check has been added to ensure that the maximum buffer size
cannot overflow on any growth operations.
3. The library no longer aborts on buffer overflow. The CMark library
now follows the convention of other Markdown implementations (such as
Hoedown and Sundown) and silently handles buffer overflows and
allocation failures by dropping data from the buffer. The result is
that pathological Markdown documents that try to exploit the library
will instead generate truncated (but valid, and safe) outputs.
All tests after these small refactorings have been verified to pass.
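The capped, non-aborting growth strategy described above could be sketched as follows. This is an illustration under assumptions, not the patch itself: the `sketch_buf` type, `sketch_buf_put`, and `SKETCH_BUF_MAX` are hypothetical names, the sketch uses `size_t` rather than `ssize_t` for simplicity, and it relies on C11 `_Static_assert` for the compile-time check.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap on buffer size: 32 MB, as in the message above. */
#define SKETCH_BUF_MAX ((size_t)32 * 1024 * 1024)

/* Compile-time check: doubling a capacity below the cap cannot wrap. */
_Static_assert(SKETCH_BUF_MAX <= SIZE_MAX / 2, "buffer growth must not overflow");

typedef struct {
  unsigned char *data;
  size_t size;     /* bytes in use, never more than SKETCH_BUF_MAX */
  size_t capacity; /* bytes allocated */
} sketch_buf;

/* Appends up to `len` bytes. Data that would exceed the cap, or that
 * cannot be stored because allocation fails, is silently dropped. */
static void sketch_buf_put(sketch_buf *buf, const void *src, size_t len) {
  size_t room = SKETCH_BUF_MAX - buf->size;
  if (len > room) {
    len = room; /* truncate instead of aborting */
  }
  if (len == 0) {
    return;
  }
  size_t needed = buf->size + len;
  if (needed > buf->capacity) {
    size_t new_cap = buf->capacity ? buf->capacity : 64;
    while (new_cap < needed) {
      new_cap *= 2; /* safe: new_cap < needed <= SKETCH_BUF_MAX here */
    }
    if (new_cap > SKETCH_BUF_MAX) {
      new_cap = SKETCH_BUF_MAX;
    }
    unsigned char *p = realloc(buf->data, new_cap);
    if (p == NULL) {
      return; /* allocation failure: drop the data, keep the old buffer */
    }
    buf->data = p;
    buf->capacity = new_cap;
  }
  memcpy(buf->data + buf->size, src, len);
  buf->size += len;
}
```

A caller starts from a zero-initialized `sketch_buf` and releases `data` with `free` when done. The trade-off is the one the message names: oversized input yields truncated output rather than a crash.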
---
NOTE: Regarding 32-bit overflows, generating test cases that crash the
library is trivial (any input document larger than 2 GB will crash
CMark), but most Python implementations have issues with large strings
to begin with, so a test case cannot be added to the pathological test
suite, since it's written in Python.
|
|
Fix ctypes in Python FFI calls
|
|
Fix character type detection in commonmark.c
|
|
This hasn't caused problems so far because
- all types are 32-bit on 32-bit systems and
- arguments are passed in registers on x86-64.
The wrong types could cause crashes on other platforms, though.
|
|
- Implement cmark_isalpha.
- Check for ASCII character before implicit cast to char.
- Use internal ctype functions in commonmark.c.
Fixes test failures on Windows and undefined behavior.
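The pattern in the list above can be sketched like this. Passing a value outside the `unsigned char` range (other than `EOF`) to the standard `<ctype.h>` functions is undefined behavior, which is why a range check comes first. The names `sketch_isalpha` and `sketch_is_ascii_alpha` are hypothetical stand-ins, not the functions from this commit.

```c
#include <stdint.h>

/* Locale-independent ASCII letter test (hypothetical stand-in for an
 * internal ctype helper). */
static int sketch_isalpha(int c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Checks that the code point is ASCII before narrowing it, so that
 * non-ASCII values are never reinterpreted as a (possibly negative) char. */
static int sketch_is_ascii_alpha(int32_t codepoint) {
  if (codepoint < 0 || codepoint > 0x7F) {
    return 0;
  }
  return sketch_isalpha((int)codepoint);
}
```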
|
|
We don't want a blank line before a code block when it's
the first thing in a list item.
|
|
In the commonmark writer we separate lists, and lists and
indented code, using a dummy HTML comment rather than two
blank lines (this is more portable).
So in evaluating the round-trip tests, we now strip out
these comments.
We also normalize HTML to avoid issues having to do with
line breaks.
|
|
This replaces the old use of simple shell scripts.
It is much faster, and more flexible. (We will be able
to do custom normalization and skip certain tests.)
|
|
We generally want this option to prohibit any breaking
in things like headers (not just wraps, but softbreaks).
|
|
Previously they actually ran cmark instead of the round-trip
version, since there was a bug in setting the ROUNDTRIP
variable.
Now round trip tests fail! This was unnoticed before.
See #131.
|
|
This is an alternate solution for pull request #132,
which introduced a new warning on the comparison:
latex.c:191:20: warning: comparison of integers of
different signs: 'size_t' (aka 'unsigned long') and 'bufsize_t'
(aka 'int') [-Wsign-compare]
if (realurllen == link_text->as.literal.len &&
~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
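One general way to compare an unsigned and a signed length without triggering `-Wsign-compare` is to rule out negative values first and then cast explicitly. The helper below is a hypothetical sketch of that technique, and is not necessarily the fix chosen in this commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Compares an unsigned length with a signed one. A negative signed
 * value never equals any size_t, so it is rejected before the cast. */
static bool len_equals(size_t a, int b) {
  return b >= 0 && a == (size_t)b;
}
```

Without the `b >= 0` guard, a value of `-1` would convert to the largest `size_t` and could compare equal by accident.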
|
|
inlines: Remove unused variable "link_text"
|
|
Changed type from int to size_t to fix implicit type conversion warning
|
|
Add 2016 to copyright
|
|
I thought I had an outdated version of the binary because it printed 2015 for
the version string.
|
|
Fix tests under MinGW
|
|
- Fix PATH for api_test, see:
https://cmake.org/pipermail/cmake/2009-May/029423.html
- DLL is named libcmark.dll under MinGW.
|
|
in cmark.h and its man page. Closes #124.
|