Age | Commit message (Collapse) | Author |
|
Introduced by a recent commit. Found by OSS-Fuzz.
|
|
Introduce multi-purpose data/len members in struct cmark_node. This
is mainly used to store literal text for inlines, code and HTML blocks.
Move the content strbuf for blocks from cmark_node to cmark_parser.
When finalizing nodes that allow inlines (paragraphs and headings),
detach the strbuf and store the block content in the node's data/len
members. Free the block content after processing inlines.
Reduces size of struct cmark_node by 8 bytes.
|
|
Use zero-terminated C strings and a separate length field instead of
cmark_chunks. Literal inline text will now be copied from the parent
block's content buffer, slowing the benchmark down by 10-15%.
The node struct never references memory of other nodes now, fixing #309.
Node accessors don't have to check for delayed creation of C strings,
so parsing and iterating all literals using the public API should
actually be faster than before.
|
|
Use zero-terminated C strings instead of cmark_chunks without storing
the length. This introduces a few additional strlen computations,
but overhead should be low.
Allows to reduce size of struct cmark_node later.
|
|
|
|
When CMARK_OPT_SMART is enabled, we escape literal `-`,
`.`, and quote characters when needed to avoid their
being "smartified."
See e.g. jgm/pandoc#6041 for an application.
|
|
|
|
The string literal being assigned is const, but the assignment looses
the constness of this string. This enables building with `/Zc:strictString`
with MSVC as well.
|
|
This solves problems with adjacent code blocks being
merged.
|
|
by inserting an HTML comment. Closes #317.
I think I'll follow up with a change to use fenced code
blocks, but this was the minimal fix.
|
|
Closes #316.
|
|
URL-escape special characters when escape mode is URL,
and not otherwise.
Entity-escape control characters (< 0x20) in non-literal
escape modes.
|
|
This is needed for round-trip tests.
|
|
These affect both parsing and writing commonmark.
|
|
A UBSAN warning can be triggered when handling a long sequence of backticks:
src/commonmark.c:98:20: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
which can be triggered by:
```
| a | b |
| --- | --** `c```````````````````````````````- |
| c | `|d` \| e |
```
|
|
Closes #211.
Found by google/oss-fuzz:
https://oss-fuzz.com/v2/testcase-detail/4686992824598528
|
|
|
|
The previous work for unbounded memory usage and overflows on the buffer
API had several shortcomings:
1. The total size of the buffer was limited by arbitrarily small
precision on the storage type for buffer indexes (typedef'd as
`bufsize_t`). This is not a good design pattern in secure applications,
particualarly since it requires the addition of helper functions to cast
to/from the native `size` types and the custom type for the buffer, and
check for overflows.
2. The library was calling `abort` on overflow and memory allocation
failures. This is not a good practice for production libraries, since it
turns a potential RCE into a trivial, guaranteed DoS to the whole
application that is linked against the library. It defeats the whole
point of performing overflow or allocation checks when the checks will
crash the library and the enclosing program anyway.
3. The default size limits for buffers were essentially unbounded
(capped to the precision of the storage type) and could lead to DoS
attacks by simple memory exhaustion (particularly critical in 32-bit
platforms). This is not a good practice for a library that handles
arbitrary user input.
Hence, this patchset provides slight (but in my opinion critical)
improvements on this area, copying some of the patterns we've used in
the past for high throughput, security sensitive Markdown parsers:
1. The storage type for buffer sizes is now platform native (`ssize_t`).
Ideally, this would be a `size_t`, but several parts of the code expect
buffer indexes to be possibly negative. Either way, switching to a
`size` type is an strict improvement, particularly in 64-bit platforms.
All the helpers that assured that values cannot escape the `size` range
have been removed, since they are superfluous.
2. The overflow checks have been removed. Instead, the maximum size for
a buffer has been set to a safe value for production usage (32mb) that
can be proven not to overflow in practice. Users that need to parse
particularly large Markdown documents can increase this value. A static,
compile-time check has been added to ensure that the maximum buffer size
cannot overflow on any growth operations.
3. The library no longer aborts on buffer overflow. The CMark library
now follows the convention of other Markdown implementations (such as
Hoedown and Sundown) and silently handles buffer overflows and
allocation failures by dropping data from the buffer. The result is
that pathological Markdown documents that try to exploit the library
will instead generate truncated (but valid, and safe) outputs.
All tests after these small refactorings have been verified to pass.
---
NOTE: Regarding 32 bit overflows, generating test cases that crash the
library is trivial (any input document larger than 2gb will crash
CMark), but most Python implementations have issues with large strings
to begin with, so a test case cannot be added to the pathological tests
suite, since it's written in Python.
|
|
- Implement cmark_isalpha.
- Check for ASCII character before implicit cast to char.
- Use internal ctype functions in commonmark.c.
Fixes test failures on Windows and undefined behavior.
|
|
We don't want a blank line before a code block when it's
the first thing in a list item.
|
|
We generally want this option to prohibit any breaking
in things like headers (not just wraps, but softbreaks).
|
|
|
|
- Extend CMARK_OPT_NOBREAKS to all renderers and add `--nobreaks`.
- Do not autowrap, regardless of width parameter, if CMARK_OPT_NOBREAKS
is set.
- Fixed CMARK_OPT_HARDBREAKS for LaTeX and man renderers.
- Ensure that no auto-wrapping occurs if CMARK_OPT_NOBREAKS is enabled,
or if output is CommonMark and CMARK_OPT_HARDBREAKS is enabled.
- Updated man pages.
|
|
They're not supported by MSVC.
|
|
Newer MSVC versions support enough of C99 to be able to compile cmark
in plain C mode. Only the "inline" keyword is still unsupported.
We have to use "__inline" instead.
|
|
We need to cast value passed to isspace(3) to unsigned char to explicitly
prevent possibly undefined behavior.
/tmp/pkgsrc-tmp/wip/cmark/work/cmark-0.24.1/src/commonmark.c: In function 'S_render_node':
/tmp/pkgsrc-tmp/wip/cmark/work/cmark-0.24.1/src/commonmark.c:273:9: warning: array subscript has type 'char' [-Wchar-subscripts]
(code_len > 2 && !isspace(code[0]) &&
^
/tmp/pkgsrc-tmp/wip/cmark/work/cmark-0.24.1/src/commonmark.c:274:10: warning: array subscript has type 'char' [-Wchar-subscripts]
!(isspace(code[code_len - 1]) && isspace(code[code_len - 2]))) &&
^
/tmp/pkgsrc-tmp/wip/cmark/work/cmark-0.24.1/src/commonmark.c:274:10: warning: array subscript has type 'char' [-Wchar-subscripts]
CTYPE(3) Library Functions Manual CTYPE(3)
NAME
isalpha, isupper, islower, isdigit, isxdigit, isalnum, isspace, ispunct,
isprint, isgraph, iscntrl, isblank, toupper, tolower, - character
classification and mapping functions
LIBRARY
Standard C Library (libc, -lc)
CAVEATS
The first argument of these functions is of type int, but only a very
restricted subset of values are actually valid. The argument must either
be the value of the macro EOF (which has a negative value), or must be a
non-negative value within the range representable as unsigned char.
Passing invalid values leads to undefined behavior.
NetBSD 7.99 February 25, 2015 NetBSD 7.99
|
|
|
|
|
|
|
|
We try not to escape punctuation unless we absolutely have
to. So, `)` and `.` are no longer escaped whenever they
occur after digits; now they are only escaped if they are
geuninely in a position where they'd cause a list item.
This required a couple changes to render.c.
- `renderer->begin_content` is only set to false AFTER a
string of digits at the beginning of the line. (This is
slightly unprincipled.)
- We never break before a numeral (also slightly unprincipled).
|
|
following list or code block.
This has several advantages. First, the two blank lines breaks
out of list syntax is still controversial in CommonMark.
And it isn't used in other implementations. HTML comments
will always work.
Second, two blank lines breaks out of all lists; an HTML
comment can be used to break out of just one level of nesting.
|
|
This makes the output compatible with more implementations.
|
|
This is more portable.
Closes #90.
|
|
This did not allow for the possibility that a node
might have no containing block, causing the commonmark
renderer to segfault if passed an inline node with no
block parent.
|
|
when they're at the beginning of a block, e.g.
> \- foo
|
|
API change. Sorry, but this is the time to break things,
before 1.0 is released. This matches the recent changes to
CommonMark.dtd.
|
|
CMARK_NODE_HRULE -> CMARK_NODE_THEMATIC_BREAK.
However we've defined the former as the latter to keep
backwards compatibility.
See jgm/CommonMark 8fa94cb460f5e516b0e57adca33f50a669d51f6c
|
|
Defined CMARK_NODE_HEADER to CMARK_NODE_HEADING to ease
the transition.
|
|
See jgm/CommonMark commit 0cdbcee4e840abd0ac7db93797b2b75ca4104314
Note that we have defined
cmark_node_get_header_level = cmark_node_get_heading_level
and
cmark_node_set_header_level = camrk_node_set_heading_level
for backwards compatibility in the API.
|
|
Otherwise we get failures of roundtrip tests.
|
|
Instead of using their `as.literal` content, we now
give each custom node *two* literal fields, one to
be printed on entering the node (before rendering
the children, if any), the other on exiting (after
rendering children).
This gives us the flexibility to have custom nodes
with children.
|
|
|
|
These are passed through verbatim by all writers, with no
escaping.
They are never generated by the parser, and do not correspond
to CommonMark elements. They are designed to be inserted by
filters that postprocess the AST. For example, a filter might
convert specially marked code blocks to svg diagrams in HTML
and tikz diagrams in LaTeX, passing these through to the renderer
as a RAW_BLOCK.
|
|
|
|
|
|
|
|
This fixes an MSVC warning "conversion from 'size_t' to 'int', possible loss of data"
|
|
* Reformatted all source files.
* Added 'format' target to Makefile.
* Removed 'astyle' target.
* Updated .editorconfig.
|
|
|
|
This avoids an allocation and use of strbuf_printf.
|