path: root/src/blocks.c
Age | Commit message | Author
2020-07-12 | Treat textarea like script, style, pre (type 1 HTML block)... | John MacFarlane
in accordance with spec change.
2020-05-13 | Add needed include in blocks.c | John MacFarlane
2020-03-03 | Skip UTF-8 BOM if present at beginning of buffer. | John MacFarlane
Closes #334.
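As an illustration of the BOM handling described in the entry above, here is a minimal sketch. The helper name `skip_utf8_bom` and its signature are hypothetical and are not taken from blocks.c; the only fact relied on is that the UTF-8 byte order mark is the byte sequence EF BB BF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the cmark function): returns how many leading
 * bytes of the first input chunk should be skipped. */
static size_t skip_utf8_bom(const unsigned char *buf, size_t len) {
  static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
  assert(buf != NULL || len == 0);
  /* Check the length first so a 1- or 2-byte buffer is never over-read. */
  if (len >= 3 && memcmp(buf, bom, 3) == 0)
    return 3;
  return 0;
}
```

A caller would apply this once, to the first chunk only, and start line processing at the returned offset.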
2020-01-23 | Rearrange struct cmark_node | Nick Wellnhofer
Introduce multi-purpose data/len members in struct cmark_node. This is mainly used to store literal text for inlines, code and HTML blocks. Move the content strbuf for blocks from cmark_node to cmark_parser. When finalizing nodes that allow inlines (paragraphs and headings), detach the strbuf and store the block content in the node's data/len members. Free the block content after processing inlines. Reduces size of struct cmark_node by 8 bytes.
2020-01-23 | Use C string instead of chunk for literal text | Nick Wellnhofer
Use zero-terminated C strings and a separate length field instead of cmark_chunks. Literal inline text will now be copied from the parent block's content buffer, slowing the benchmark down by 10-15%. The node struct never references memory of other nodes now, fixing #309. Node accessors don't have to check for delayed creation of C strings, so parsing and iterating all literals using the public API should actually be faster than before.
2020-01-23 | Use C string instead of chunk for code info and literal | Nick Wellnhofer
Use zero-terminated C strings instead of cmark_chunks without storing the length. The length of code literals will be re-added in a later commit. strlen overhead for code info should be negligible. Reduces size of struct cmark_node by 8 bytes.
2019-04-06 | Resolve link references before creating setext header. | John MacFarlane
A setext header line after a link reference should not create a header, according to the spec. See commonmark/commonmark-spec#395.
2019-03-17 | Use hand-rolled scanner for thematic break. | John MacFarlane
Keep track of the last position where a thematic break failed to match on a line, to avoid rescanning unnecessarily. See commonmark/cmark#284.
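The idea in the entry above can be sketched as a small hand-written scanner. This is a simplified illustration that works on a NUL-terminated line; the function name, the `kill_pos` out-parameter, and the return convention are assumptions made for the example, not the signatures used in blocks.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner: returns the number of bytes matched (excluding the
 * line ending) if line[offset..] is a thematic break, else 0. On failure,
 * *kill_pos records where the scan gave up, so a caller could skip further
 * attempts that start at or before that position on the same line. */
static size_t scan_thematic_break(const char *line, size_t offset,
                                  size_t *kill_pos) {
  size_t i = offset;
  char c;
  int count = 1;

  assert(line != NULL && kill_pos != NULL);
  c = line[i];
  if (c != '*' && c != '_' && c != '-') {
    *kill_pos = i;
    return 0;
  }
  for (i++; line[i] != '\0'; i++) {
    if (line[i] == c)
      count++; /* another marker character */
    else if (line[i] != ' ' && line[i] != '\t')
      break; /* anything but a marker or blank ends the scan */
  }
  if (count >= 3 &&
      (line[i] == '\0' || line[i] == '\n' || line[i] == '\r'))
    return i - offset;
  *kill_pos = i;
  return 0;
}
```

Compared with a generated regex scanner, a loop like this makes it straightforward to remember the failure position, which is the part the commit message emphasizes.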
2019-03-17 | Do cheaper test first. | John MacFarlane
2019-03-17 | Rename ends_with_blank_line with S_ prefix. | John MacFarlane
As with other static functions.
2019-03-17 | Add CMARK_NODE__LAST_LINE_CHECKED flag. | John MacFarlane
Use this to avoid unnecessary recursion in ends_with_blank_line. Closes #284.
2019-03-17 | In ends_with_blank_line, call S_set_last_line_blank... | John MacFarlane
to avoid unnecessary repetition. Once we settle whether a list item ends in a blank line, we don't need to revisit this in considering parent list items. See commonmark/cmark#284.
2018-04-14 | Optimize S_find_first_nonspace. | John MacFarlane
We were needlessly redoing things we'd already done. Now we skip the work if the first nonspace is greater than the current offset. This fixes pathological slowdown with deeply nested lists (#255). For N = 3000, the time goes from over 17s to about 0.7s. Thanks to @mity for diagnosing the problem.
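The caching described in the entry above might look roughly like the following sketch. The `line_state` struct, its field names, and the `scans` counter are hypothetical and exist only to make the effect visible; tab handling and column tracking are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-line parser state. */
typedef struct {
  size_t offset;         /* current position in the line */
  size_t first_nonspace; /* cached position of the first non-space byte */
  unsigned long scans;   /* counts real rescans, for illustration only */
} line_state;

static void find_first_nonspace(line_state *st, const char *line) {
  assert(st != NULL && line != NULL);
  /* If the cached answer lies ahead of the current offset it is still
   * valid, so the scan can be skipped entirely. */
  if (st->first_nonspace > st->offset)
    return;
  st->first_nonspace = st->offset;
  while (line[st->first_nonspace] == ' ')
    st->first_nonspace++;
  st->scans++;
}
```

With deeply nested lists the function is called once per container for the same line, so skipping the repeated scan removes a per-line quadratic cost.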
2018-03-25 | Don't allow list markers to be indented >= 4 spaces. | John MacFarlane
See commonmark/CommonMark#497.
2017-11-02 | Merge branch 'master' into upstream/inline-sourcepos | Ashe Connor
2017-09-14 | blocks: Fix quadratic behavior in `finalize` | Vicent Marti
2017-08-10 | Fix inlines spanning newlines, text in non-para | Yuki Izumi
2017-06-23 | Reset bytes after UTF8 proc | Yuki Izumi
See https://github.com/jgm/cmark/issues/206.
2017-06-02 | Fixed cmark_node_get_list_start to return 0 for bullet lists... | John MacFarlane
as documented! Closes #202.
2017-05-30 | Use CMARK_NO_DELIM for bullet lists. Closes #201. | John MacFarlane
2017-05-05 | Remove normalize as an option per #190 (#194) | Yuki Izumi
2017-01-20 | Fixed buffer overflow error in S_parser_feed. | John MacFarlane
The overflow could occur in the following condition: the buffer ends with `\r` and the next memory address contains `\n`. Closes #184.
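The condition described in the entry above is a one-byte over-read when looking for `\r\n`. A minimal sketch of a bounds-checked version follows; `line_end_len` is a hypothetical helper written for this illustration, not the code in S_parser_feed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the line ending that starts
 * at buf[pos] (0 if none, 1 for "\n" or a lone "\r", 2 for "\r\n"). */
static size_t line_end_len(const unsigned char *buf, size_t len, size_t pos) {
  assert(buf != NULL);
  if (pos >= len)
    return 0;
  if (buf[pos] == '\n')
    return 1;
  if (buf[pos] == '\r') {
    /* Check the bound before peeking: if "\r" is the last byte of the
     * buffer, buf[pos + 1] is outside it and must not be read. */
    if (pos + 1 < len && buf[pos + 1] == '\n')
      return 2;
    return 1;
  }
  return 0;
}
```

Passing a length shorter than the underlying array reproduces the reported situation: the byte after the buffer happens to be `\n`, and a correct scanner must still report a one-byte line ending.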
2017-01-03 | Revert "More sourcepos! (#169)" | John MacFarlane
This reverts commit 9e643720ec903f3b448bd2589a0c02c2514805ae.
2017-01-03 | Revert "Change types for source map offsets (#174)" | John MacFarlane
This reverts commit 4fbe344df43ed7f60a3d3a53981088334cb709fc.
2016-12-30 | Change types for source map offsets (#174) | Nick Wellnhofer
* Improve strbuf guarantees
  Introduce BUFSIZE_MAX macro and make sure that the strbuf implementation can handle strings up to this size.
* Abort early if document size exceeds internal limit
* Change types for source map offsets
  Switch to size_t for the public API, making the public headers C89-compatible again. Switch to bufsize_t internally, reducing memory usage and improving performance on 32-bit platforms.
* Make parser return NULL on internal index overflow
  Make S_parser_feed set an error and ignore subsequent chunks if the total input document size exceeds an internal limit. Make cmark_parser_finish return NULL if an error was encountered. Add public API functions to retrieve error code and error message. strbuf overflow in renderers and OOM in parser or renderers still cause an abort.
2016-12-20 | More sourcepos! (#169) | Mathieu Duponchelle
* open_new_blocks: always create child before advancing offset
* Source map
* Extent's typology
* In-depth python bindings
2016-12-09 | Correctly initialize chunk in S_process_line (#170) | Nick Wellnhofer
The `alloc` member wasn't initialized. This also allows adding an assertion in `chunk_rtrim`, which doesn't work for alloced chunks.
2016-10-11 | Ran 'make format' to reformat code. | John MacFarlane
2016-10-11 | Changed logic for null/eol checks. | John MacFarlane
- only check once for "not at end of line"
- check for null before we check for newline characters (the previous patch would fail for NULL + CR)
See #160.
2016-10-11 | Fix by not advancing past both \0 and \n | Yuki Izumi
2016-09-26 | Merge pull request #157 from kivikakk/list-parse-mem-leak | John MacFarlane
Fix memory leak in list parsing
2016-09-26 | Fix memory leak in list parsing | Yuki Izumi
If `parse_list_marker` returns 1, but the second part of the `&&` clause is false, we leak `data` here.
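The leak pattern in the entry above arises whenever a function allocates through an out-parameter on the left side of `&&` and the right side then fails. The sketch below illustrates the shape of the fix under stated assumptions: `parse_marker`, `try_open_list`, `list_data`, and the `live_allocs` counter are all hypothetical stand-ins, not the blocks.c code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int bullet_char; } list_data; /* hypothetical stand-in */

static int live_allocs = 0; /* bookkeeping for this example only */

/* Allocates *out and returns 1 if s starts with a bullet marker. */
static int parse_marker(const char *s, list_data **out) {
  assert(s != NULL && out != NULL);
  if (s[0] != '-' && s[0] != '*' && s[0] != '+')
    return 0;
  *out = malloc(sizeof **out);
  if (*out == NULL)
    return 0;
  (*out)->bullet_char = s[0];
  live_allocs++;
  return 1;
}

static void release(list_data *d) {
  if (d != NULL) {
    free(d);
    live_allocs--;
  }
}

static int try_open_list(const char *s, int may_interrupt) {
  list_data *data = NULL;
  int opened = 0;
  if (parse_marker(s, &data) && may_interrupt) {
    opened = 1; /* a real parser would copy data into the new list node */
  }
  /* Runs on every path, including "marker parsed but second test false",
   * which is the case that leaked. */
  release(data);
  return opened;
}
```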
2016-09-26 | Use cmark_mem to free where used to alloc | Yuki Izumi
2016-07-15 | Reformatted. | John MacFarlane
2016-07-13 | Fix sourcepos for blockquotes. | John MacFarlane
Fixes #142.
2016-07-13 | Replaced check for `\n` with `S_is_line_end_char`. | John MacFarlane
2016-07-13 | Empty list items cannot interrupt paragraphs (spec change). | John MacFarlane
2016-07-11 | Fix mistaken sourcepos for atx headers. | John MacFarlane
Closes #141.
2016-07-11 | Removed "two blanks breaks out of a list" feature. | John MacFarlane
2016-07-11 | Don't allow ordered lists to interrupt paragraphs unless... | John MacFarlane
...they start with 1.
2016-06-24 | Reformatted. | John MacFarlane
2016-06-06 | msvc: Fix warnings and errors | Vicent Marti
2016-06-06 | mem: Rename the new APIs | Vicent Marti
2016-06-06 | mem: Add a `realloc` pointer to the memory handler | Vicent Marti
2016-06-06 | node: Memory diet | Vicent Marti
Reduce the storage size for the `cmark_code` struct
2016-06-06 | node: Memory diet | Vicent Marti
Save node information in flags instead of using one boolean for each property.
2016-06-06 | cmark: Implement support for custom allocators | Vicent Marti
2016-06-06 | cmake: Global handler for OOM situations | Vicent Marti
2016-06-06 | buffer: proper safety checks for unbounded memory | Vicent Marti
The previous work for unbounded memory usage and overflows on the buffer API had several shortcomings:

1. The total size of the buffer was limited by arbitrarily small precision on the storage type for buffer indexes (typedef'd as `bufsize_t`). This is not a good design pattern in secure applications, particularly since it requires the addition of helper functions to cast to/from the native `size` types and the custom type for the buffer, and check for overflows.
2. The library was calling `abort` on overflow and memory allocation failures. This is not a good practice for production libraries, since it turns a potential RCE into a trivial, guaranteed DoS to the whole application that is linked against the library. It defeats the whole point of performing overflow or allocation checks when the checks will crash the library and the enclosing program anyway.
3. The default size limits for buffers were essentially unbounded (capped to the precision of the storage type) and could lead to DoS attacks by simple memory exhaustion (particularly critical in 32-bit platforms). This is not a good practice for a library that handles arbitrary user input.

Hence, this patchset provides slight (but in my opinion critical) improvements in this area, copying some of the patterns we've used in the past for high-throughput, security-sensitive Markdown parsers:

1. The storage type for buffer sizes is now platform native (`ssize_t`). Ideally, this would be a `size_t`, but several parts of the code expect buffer indexes to be possibly negative. Either way, switching to a `size` type is a strict improvement, particularly in 64-bit platforms. All the helpers that assured that values cannot escape the `size` range have been removed, since they are superfluous.
2. The overflow checks have been removed. Instead, the maximum size for a buffer has been set to a safe value for production usage (32 MB) that can be proven not to overflow in practice. Users that need to parse particularly large Markdown documents can increase this value. A static, compile-time check has been added to ensure that the maximum buffer size cannot overflow on any growth operations.
3. The library no longer aborts on buffer overflow. The CMark library now follows the convention of other Markdown implementations (such as Hoedown and Sundown) and silently handles buffer overflows and allocation failures by dropping data from the buffer. The result is that pathological Markdown documents that try to exploit the library will instead generate truncated (but valid, and safe) outputs.

All tests after these small refactorings have been verified to pass.

---

NOTE: Regarding 32 bit overflows, generating test cases that crash the library is trivial (any input document larger than 2 GB will crash CMark), but most Python implementations have issues with large strings to begin with, so a test case cannot be added to the pathological test suite, since it's written in Python.
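The "drop data instead of aborting" policy from the entry above can be illustrated with a small bounded buffer. This is a sketch under stated assumptions: the `buf` type, `buf_put`, and the deliberately tiny `BUF_MAX` of 16 bytes are hypothetical (the commit message itself mentions a 32 MB cap), and the code is not the cmark strbuf implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_MAX 16 /* tiny cap for illustration only */

typedef struct {
  char *ptr;   /* NUL-terminated contents, or NULL when empty */
  size_t size; /* bytes stored, excluding the terminator */
} buf;

/* Appends up to len bytes. Bytes beyond the cap, or all of them if the
 * allocation fails, are silently dropped; the buffer stays valid. */
static void buf_put(buf *b, const char *data, size_t len) {
  char *p;
  assert(b != NULL && (data != NULL || len == 0));
  if (b->size >= BUF_MAX)
    return; /* already full: drop everything */
  if (len > BUF_MAX - b->size)
    len = BUF_MAX - b->size; /* truncate to the remaining room */
  p = realloc(b->ptr, b->size + len + 1);
  if (p == NULL)
    return; /* allocation failure: keep old contents, drop new data */
  if (len > 0)
    memcpy(p + b->size, data, len);
  b->size += len;
  p[b->size] = '\0';
  b->ptr = p;
}
```

Because the cap is a compile-time constant well below `SIZE_MAX`, the size arithmetic in the function cannot wrap, which mirrors the commit's argument for replacing run-time overflow checks with a fixed maximum.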
2016-04-09 | Reformatted. | John MacFarlane