cmark - My own fork of cmark for commonmark conversion

Age	Commit message	Author
2015-06-17	Added assertion to peek_char to catch any stray NULLs.	John MacFarlane
	Note that our current procedure for removing nulls is not working properly.
2015-06-17	Renamed entities.h -> entities.inc.	John MacFarlane
	Also tools/make_entities_h.py -> tools/make_entities_inc.py.
2015-06-16	Added `CMARK_OPT_VALIDATE_UTF8` option.	John MacFarlane
	Also command line option `--validate-utf8`. This option causes cmark to check for valid UTF-8, replacing invalid sequences with the replacement character, U+FFFD. Reinstated API tests for UTF-8.
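To show what the option does, here is a minimal validate-and-replace pass. It is a sketch, not cmark's code: the names `utf8_valid_len` and `validate_utf8` are hypothetical, the caller is assumed to supply an output buffer of at least three times the input length (each bad byte can expand to the three-byte encoding of U+FFFD), and each invalid byte is replaced individually, which may differ from cmark's exact replacement policy.

```c
#include <stddef.h>

/* Length of the valid UTF-8 sequence starting at s[0], or -1 if invalid.
 * Rejects overlong forms, UTF-16 surrogates, and code points above U+10FFFF. */
static int utf8_valid_len(const unsigned char *s, size_t avail) {
  unsigned char c = s[0];
  unsigned char lo = 0x80, hi = 0xBF; /* allowed range for the second byte */
  int len;

  if (c < 0x80) return 1;
  else if (c >= 0xC2 && c <= 0xDF) len = 2;
  else if (c >= 0xE0 && c <= 0xEF) {
    len = 3;
    if (c == 0xE0) lo = 0xA0;      /* no overlong 3-byte forms */
    else if (c == 0xED) hi = 0x9F; /* no surrogates */
  } else if (c >= 0xF0 && c <= 0xF4) {
    len = 4;
    if (c == 0xF0) lo = 0x90;      /* no overlong 4-byte forms */
    else if (c == 0xF4) hi = 0x8F; /* nothing above U+10FFFF */
  } else {
    return -1; /* stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF */
  }

  if ((size_t)len > avail) return -1; /* truncated sequence */
  if (s[1] < lo || s[1] > hi) return -1;
  for (int i = 2; i < len; i++)
    if (s[i] < 0x80 || s[i] > 0xBF) return -1;
  return len;
}

/* Copy src to dst, replacing each invalid byte with U+FFFD (EF BF BD).
 * dst must have room for 3 * n bytes. Returns the number of bytes written. */
size_t validate_utf8(const unsigned char *src, size_t n, unsigned char *dst) {
  size_t i = 0, out = 0;
  while (i < n) {
    int len = utf8_valid_len(src + i, n - i);
    if (len < 0) {
      dst[out++] = 0xEF;
      dst[out++] = 0xBF;
      dst[out++] = 0xBD;
      i += 1; /* skip one bad byte and resynchronize */
    } else {
      for (int k = 0; k < len; k++) dst[out++] = src[i + k];
      i += (size_t)len;
    }
  }
  return out;
}
```

For example, the three input bytes `a`, `0xFF`, `b` come out as `a`, `EF BF BD`, `b`.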
2015-06-16	Small code simplification in inlines.c.	John MacFarlane
	Use S_is_line_end_char.
2015-06-16	is_blank: recognize tab as a blank character.	John MacFarlane

2015-06-16	skip_spaces: skip tabs too.	John MacFarlane

2015-06-16	Don't rtrim in subject_from_buffer.	John MacFarlane
	This gives bad results in parsing reference links, where we might have trailing blanks. (finalize in blocks.c removes the bytes parsed as a reference definition; before this change, some blank bytes might remain on the line.)
2015-06-16	Removed utf8 validation.	John MacFarlane
	We now replace null characters in the line splitting code.
2015-06-16	Renamed utf8proc_detab as utf8proc_check, removed detabbing function.	John MacFarlane
	Now it just replaces bad UTF-8 sequences and NULLs. This restores benchmarks to near their previous levels.
2015-06-16	Preliminary changes for new tab handling.	John MacFarlane
	We no longer preprocess tabs to spaces before parsing. Instead, we keep track of both the byte offset and the (virtual) column as we parse block starts. This allows us to handle tabs without converting them to spaces first, and tabs are left as tabs in the output. Added `column` and `first_nonspace_column` fields to `parser`. Added a utility function to advance the offset, computing the virtual column too. Note that we don't need to deal with UTF-8 here at all: only ASCII occurs in block starts. This gives a significant performance improvement, because we're no longer doing UTF-8 validation (though we might want to add that back in).
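A rough sketch of the offset/column bookkeeping, assuming a tab stop of 4 (the value CommonMark uses where tabs define block structure) and a hypothetical `parser_pos` struct and `advance_offset` helper. The real parser also has to handle a tab that is only partly consumed, which this sketch omits.

```c
#include <stddef.h>

#define TAB_STOP 4

/* Hypothetical parser position: byte offset plus virtual column. */
typedef struct {
  size_t offset; /* index into the line, in bytes */
  size_t column; /* visual column, with tabs expanded to TAB_STOP */
} parser_pos;

/* Advance over `count` bytes of `line`, keeping the virtual column in sync.
 * Block starts are ASCII only, so each byte is one column, except a tab,
 * which jumps to the next multiple of TAB_STOP. Stops early at a NUL. */
static void advance_offset(parser_pos *p, const char *line, size_t count) {
  while (count > 0 && line[p->offset] != '\0') {
    if (line[p->offset] == '\t')
      p->column += TAB_STOP - (p->column % TAB_STOP);
    else
      p->column += 1;
    p->offset += 1;
    count -= 1;
  }
}
```

Starting at column 0, advancing over the tab in `\t- item` moves the byte offset to 1 but the virtual column to 4, so the list marker is seen at column 4 without rewriting the line.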
2015-06-16	Simpler approach for entity lookup.	John MacFarlane
	We dispense with the hashes and just do string comparisons. Since the array is sorted, we can binary-search it and should never need more than 8 or so comparisons. This reduces binary size even further, at a small cost in performance. (This shouldn't matter much, as the cost is only detectable in really entity-heavy sources.)
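A sketch of lookup by binary search over a sorted array. The `entity` struct and the six-entry table are hypothetical stand-ins for the generated data. Because cmark's input is not NUL-terminated, the real code would compare with an explicit length rather than `strcmp`, which this sketch uses for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entity record: name (without '&' and ';') and UTF-8 bytes. */
typedef struct {
  const char *name;
  const char *utf8;
} entity;

/* Must be sorted in strcmp order for binary search to work. */
static const entity entities[] = {
  {"AMP", "&"},
  {"amp", "&"},
  {"copy", "\xC2\xA9"},
  {"gt", ">"},
  {"lt", "<"},
  {"nbsp", "\xC2\xA0"},
};

/* Return the replacement bytes for `name`, or NULL if it is not an entity. */
static const char *lookup_entity(const char *name) {
  size_t lo = 0, hi = sizeof(entities) / sizeof(entities[0]);
  while (lo < hi) { /* search the half-open range [lo, hi) */
    size_t mid = lo + (hi - lo) / 2;
    int cmp = strcmp(name, entities[mid].name);
    if (cmp == 0) return entities[mid].utf8;
    if (cmp < 0) hi = mid;
    else lo = mid + 1;
  }
  return NULL;
}
```

Each step halves the remaining range, so the number of comparisons grows with the base-2 logarithm of the table size.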
2015-06-16	Mark entity data structures as const.	John MacFarlane

2015-06-16	entities: Make the first entity in the array (TripleDot) work.	John MacFarlane
	We now use -1 instead of 0 to indicate leaf nodes.
2015-06-16	astyle formatting changes.	John MacFarlane

2015-06-16	Replace gperf-based entity lookup with binary tree lookup.	John MacFarlane
	The primary advantage is a big reduction in the size of the compiled library and executable (more than 100K). There should be no measurable performance difference in normal documents; I detected a slight performance hit (around 5%) in a file containing 1,000,000 entities.
	* Removed `src/html_unescape.gperf` and `src/html_unescape.h`.
	* Added `src/entities.h` (generated by `tools/make_entities_h.py`).
	* Added binary tree lookup functions to `houdini_html_u.c`, which use the data in `src/entities.h`.
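A binary tree can be stored flat in an array, with each node holding the indices of its children. This sketch uses a hypothetical `entity_node` layout and five sample entries. It also illustrates the entities fix listed above ("Make the first entity in the array ... work"): index 0 holds a real node ("amp", a child of the root), so 0 cannot double as "no child" without making that entry unreachable, and -1 is used instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node: a binary search tree flattened into an array.
 * `less` and `greater` are child indices; -1 means "no child",
 * because 0 is a valid index of a real node. */
typedef struct {
  const char *name;
  const char *utf8;
  int less;
  int greater;
} entity_node;

enum { ENTITY_ROOT = 2 }; /* root of the tree below */

static const entity_node tree[] = {
  /* 0 */ {"amp", "&", -1, 1},
  /* 1 */ {"copy", "\xC2\xA9", -1, -1},
  /* 2 */ {"gt", ">", 0, 4},
  /* 3 */ {"lt", "<", -1, -1},
  /* 4 */ {"nbsp", "\xC2\xA0", 3, -1},
};

/* Walk from the root, going left or right by strcmp order. */
static const char *tree_lookup(const char *name) {
  int i = ENTITY_ROOT;
  while (i != -1) {
    int cmp = strcmp(name, tree[i].name);
    if (cmp == 0) return tree[i].utf8;
    i = (cmp < 0) ? tree[i].less : tree[i].greater;
  }
  return NULL;
}
```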
2015-06-15	Fixed cases like	John MacFarlane
	```
	[ref]: url
	"title" ok
	```
	Here we should parse the first line as a reference.
2015-06-15	inlines.c: Added utility functions to skip spaces and line endings.	John MacFarlane

2015-06-13	Fixed backslashes in link destinations that are not part of escapes.	John MacFarlane
	See jgm/commonmark#45.
2015-06-13	Fixed entity lookup table.	John MacFarlane
	The old one had many errors. The new one is derived from the list in the npm entities package. Since the sequences can now be longer (multi-code-point), we have bumped the length limit from 4 to 8, which also affects houdini_html_u.c. An example of the kind of error that was fixed is given in jgm/commonmark.js#47: `&ngE;` should be rendered as "≧̸" (U+2267 U+0338), but it was actually rendered as "≧" (which is the same as `&gE;`).
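To see why a limit of 4 is too small, assuming the limit counts UTF-8 bytes: `&ngE;` expands to two code points whose UTF-8 encodings take 3 and 2 bytes, 5 in total. The generic encoder below (hypothetical helper names, not houdini's code) computes this.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point (assumed valid: <= U+10FFFF, not a surrogate)
 * as UTF-8. Returns the number of bytes written to out (1 to 4). */
static size_t utf8_encode(uint32_t cp, unsigned char *out) {
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (unsigned char)(0xF0 | (cp >> 18));
  out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char)(0x80 | (cp & 0x3F));
  return 4;
}

/* Encode a sequence of code points; returns the total byte length. */
static size_t utf8_encode_seq(const uint32_t *cps, size_t n, unsigned char *out) {
  size_t len = 0;
  for (size_t i = 0; i < n; i++) len += utf8_encode(cps[i], out + len);
  return len;
}
```

Encoding U+2267 U+0338 yields `E2 89 A7 CC B8`, five bytes.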
2015-06-11	Removed "add newline if line doesn't have one."	John MacFarlane
	This isn't actually needed.
2015-06-11	Small logic fixes and a simplification in process_emphasis.	John MacFarlane

2015-06-11	Fixed `process_emphasis` to handle new pathological cases.	John MacFarlane
	Now we have an array of pointers (`potential_openers`), keyed to the delim char. When we've failed to match a potential opener prior to point X in the delimiter stack, we reset `potential_openers` for that opener type to X, and thus avoid having to look again through all the openers we've already rejected. See jgm/commonmark#43.
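A simplified sketch of the `potential_openers` idea. Assumptions: the delimiter stack is an array rather than cmark's linked list, each delimiter run is treated as one unit, any unused opener with the same character matches (no run lengths, no rule of three, no removal of intervening delimiters), indices stand in for the pointers the commit describes, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified delimiter record (one per delimiter run). */
typedef struct {
  char c;         /* '*' or '_' */
  bool can_open;
  bool can_close;
  bool used;      /* already paired */
  int match;      /* index of the partner delimiter, or -1 */
} delim;

/* Pair each closer with the nearest usable opener of the same character.
 * potential_openers[k] is the lowest index that may still hold a usable
 * opener for character type k. When the closer at index i finds nothing,
 * every entry below i is useless to later closers of that type as well,
 * so the bound is raised to i and that region is never scanned again.
 * Returns the number of opener candidates examined. */
static size_t match_delimiters(delim *d, size_t n) {
  size_t potential_openers[2] = {0, 0}; /* [0] for '*', [1] for '_' */
  size_t examined = 0;

  for (size_t i = 0; i < n; i++) {
    if (!d[i].can_close) continue;
    int k = (d[i].c == '*') ? 0 : 1;
    bool found = false;
    /* Candidates are i-1, i-2, ..., down to potential_openers[k]. */
    for (size_t j = i; j > potential_openers[k]; j--) {
      delim *o = &d[j - 1];
      examined++;
      if (!o->used && o->can_open && o->c == d[i].c) {
        o->used = d[i].used = true;
        o->match = (int)i;
        d[i].match = (int)(j - 1);
        found = true;
        break;
      }
    }
    if (!found) potential_openers[k] = i;
  }
  return examined;
}
```

With 100 `_` closers and no openers, this examines 99 candidates in total; scanning back to the bottom of the stack for every closer would examine 0 + 1 + ... + 99 = 4,950.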
2015-06-10	More code simplification.	John MacFarlane

2015-06-10	Code simplification.	John MacFarlane

2015-06-10	process_inlines: remove closers from delim stack when possible.	John MacFarlane
	When they have no matching openers and cannot be openers themselves, we can safely remove them. This helps with a performance case: "a_ " * 20000. See jgm/commonmark.js#43.
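When the delimiter stack is a doubly linked list, removing a dead closer is a constant-time unlink, as sketched below. The `delim` layout and both function names are hypothetical. The point is only that a closer which found no opener and cannot open itself will never take part in a match, so later backward scans need not visit it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical delimiter stack entry, linked from bottom (prev) to top (next). */
typedef struct delim {
  char c;
  bool can_open;
  bool can_close;
  struct delim *prev;
  struct delim *next;
} delim;

/* Unlink d in O(1); *top points at the most recently pushed delimiter. */
static void remove_delim(delim **top, delim *d) {
  if (d->prev != NULL) d->prev->next = d->next;
  if (d->next != NULL) d->next->prev = d->prev;
  else *top = d->prev; /* d was the top of the stack */
  d->prev = d->next = NULL;
}

/* Drop a closer that found no opener and cannot act as an opener. */
static void discard_if_unmatched(delim **top, delim *closer, bool matched) {
  if (!matched && !closer->can_open) remove_delim(top, closer);
}
```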
2015-06-10	Revert "Merge pull request #58 from nwellnhof/optimize_utf8proc_detab"	John MacFarlane
	This reverts commit 54d1249c2caebf45a24d691dc765fb93c9a5e594, reversing changes made to bc14d869323650e936c7143dcf941b28ccd5b57d.
2015-06-09	Further optimize utf8proc_valid	Nick Wellnhofer
	Assume a multi-byte sequence and rework switch statement into if/else for another 2% speedup.
2015-06-09	Roll utf8proc_charlen into utf8proc_valid	Nick Wellnhofer
	Speeds up "make bench" by another percent.
2015-06-09	Optimize utf8proc_detab	Nick Wellnhofer
	Handle valid UTF-8 chars inside the main loop and avoid a call to strbuf_put for every UTF-8 char. Results in an 8% speedup in the UTF-8-heavy "make bench" on my system.
2015-06-07	Remove unimplemented functions from houdini.h	Nick Wellnhofer

2015-06-07	Helper to safely call strlen	Nick Wellnhofer
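A plausible shape for such a helper, assuming a 32-bit signed `bufsize_t` (introduced in the strbuf commits below) and the policy from "Abort on strbuf errors": compute the length as `size_t` and abort rather than silently truncating when it does not fit. The name `safe_strlen` and the message text are hypothetical; cmark's own helper may differ.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int32_t bufsize_t; /* assumption: 32-bit signed buffer sizes */

/* strlen() returns size_t, which can exceed what bufsize_t holds on
 * 64-bit systems; check before narrowing instead of truncating. */
static bufsize_t safe_strlen(const char *str) {
  size_t len = strlen(str);
  if (len > (size_t)INT32_MAX) {
    fprintf(stderr, "[cmark] string too long: %zu bytes\n", len);
    abort();
  }
  return (bufsize_t)len;
}
```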

2015-06-07	Avoid strlen in html.c	Nick Wellnhofer

2015-06-07	Avoid strlen in xml.c	Nick Wellnhofer

2015-06-07	Avoid strlen in commonmark.c	Nick Wellnhofer

2015-06-07	Check for overflow in S_parser_feed	Nick Wellnhofer
	Guard against too large chunks passed via the API.
2015-06-07	Convert code base to bufsize_t	Nick Wellnhofer
	There are probably a couple of places I missed. But this will only be a problem if we use a 64-bit bufsize_t at some point. Then, we'll get warnings from -Wshorten-64-to-32.
2015-06-07	Change return type of cmark_strbuf_len	Nick Wellnhofer

2015-06-07	Missing bounds checks in buffer.c	Nick Wellnhofer

2015-06-07	Remove unused function cmark_strbuf_attach	Nick Wellnhofer
	This function was missing a couple of range checks that I'm too lazy to fix.
2015-06-07	Fix check in cmark_strbuf_vprintf	Nick Wellnhofer
	Avoid potential overflow and allow for different bufsize types.
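A sketch of the usual grow-and-retry pattern around `vsnprintf`: a negative return is an error, a return value at least as large as the space available means the output was truncated, and the buffer is grown (after an overflow check) before retrying. The `sbuf` type and function names are hypothetical and use `size_t` for simplicity. `va_copy` is needed because a `va_list` cannot be reused after `vsnprintf` has consumed it.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable buffer; ptr is NUL-terminated whenever asize > 0. */
typedef struct {
  char *ptr;
  size_t size;  /* bytes in use, excluding the NUL */
  size_t asize; /* bytes allocated */
} sbuf;

static void sbuf_vprintf(sbuf *b, const char *fmt, va_list ap) {
  for (;;) {
    size_t avail = b->asize - b->size;
    /* vsnprintf(NULL, 0, ...) is allowed and just measures the output. */
    char *dst = b->asize ? b->ptr + b->size : NULL;
    va_list cp;
    va_copy(cp, ap); /* keep ap intact in case we must retry */
    int len = vsnprintf(dst, avail, fmt, cp);
    va_end(cp);

    if (len < 0) { /* encoding or format error */
      fprintf(stderr, "sbuf_vprintf: formatting failed\n");
      abort();
    }
    if ((size_t)len < avail) { /* output and its NUL fit */
      b->size += (size_t)len;
      return;
    }
    /* Truncated: grow to size + len + 1, guarding the addition. */
    if ((size_t)len + 1 > SIZE_MAX - b->size) {
      fprintf(stderr, "sbuf_vprintf: size overflow\n");
      abort();
    }
    size_t need = b->size + (size_t)len + 1;
    char *p = realloc(b->ptr, need);
    if (p == NULL) {
      fprintf(stderr, "sbuf_vprintf: out of memory\n");
      abort();
    }
    b->ptr = p;
    b->asize = need;
  }
}

static void sbuf_printf(sbuf *b, const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  sbuf_vprintf(b, fmt, ap);
  va_end(ap);
}
```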
2015-06-07	Check for negative lengths in buffer.c	Nick Wellnhofer

2015-06-07	Check for overflow when growing strbufs	Nick Wellnhofer
	Replace macro ENSURE_SIZE with inline function S_strbuf_grow_by that checks for overflow.
2015-06-07	Remove useless code in cmark_strbuf_grow	Nick Wellnhofer
	cmark_strbuf_grow will never truncate a buffer.
2015-06-07	Account for null terminator in cmark_strbuf_grow	Nick Wellnhofer
	This simplifies overflow checks.
2015-06-07	Check for overflow in cmark_strbuf_grow	Nick Wellnhofer

2015-06-07	Simplify oversizing of strbufs	Nick Wellnhofer
	Always add 50% on top of target size. No need for a loop.
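The growth rules from this group of commits, combined into one sketch: sizes are a signed 32-bit `bufsize_t`, negative lengths and overflowing sums abort with a message on stderr, the allocation always reserves a byte for the NUL terminator, and each reallocation adds 50% on top of the target size with no loop. The `strbuf` layout and function names are hypothetical, and the overflow guard is deliberately conservative.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int32_t bufsize_t; /* assumption: 32-bit signed buffer sizes */

/* Hypothetical layout. Invariant: when ptr != NULL, ptr[size] == '\0'. */
typedef struct {
  unsigned char *ptr;
  bufsize_t asize; /* bytes allocated, including room for the NUL */
  bufsize_t size;  /* bytes in use, excluding the NUL */
} strbuf;

static void die(const char *msg) {
  fprintf(stderr, "[strbuf] %s\n", msg);
  abort(); /* no error propagation: callers never see a failed buffer */
}

/* Make room for target_size bytes plus the NUL terminator; never shrinks. */
static void strbuf_grow(strbuf *buf, bufsize_t target_size) {
  if (target_size < 0) die("negative size");
  if (target_size < buf->asize) return; /* asize >= target_size + 1 already */
  /* Conservative guard so target_size * 1.5 + 1 cannot exceed INT32_MAX. */
  if (target_size > INT32_MAX / 2) die("buffer too large");
  bufsize_t new_size = target_size + target_size / 2 + 1; /* +50%, +1 for NUL */
  unsigned char *p = realloc(buf->ptr, (size_t)new_size);
  if (p == NULL) die("out of memory");
  buf->ptr = p;
  buf->asize = new_size;
}

/* Grow by `add` bytes beyond the current size, checking the sum first. */
static void strbuf_grow_by(strbuf *buf, bufsize_t add) {
  if (add < 0) die("negative length");
  if (add > INT32_MAX - buf->size) die("size overflow");
  strbuf_grow(buf, buf->size + add);
}

static void strbuf_put(strbuf *buf, const unsigned char *data, bufsize_t len) {
  strbuf_grow_by(buf, len);
  memcpy(buf->ptr + buf->size, data, (size_t)len);
  buf->size += len;
  buf->ptr[buf->size] = '\0';
}
```

Appending "hello" to an empty buffer allocates 5 + 2 + 1 = 8 bytes, so a following short append can reuse the slack instead of reallocating.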
2015-06-07	Use custom type bufsize_t for string buffer sizes	Nick Wellnhofer
	This makes it easier to change the type later. No functional change. The rest of the code base still has to be adjusted to use the new type. Also add some TODO comments in buffer.c.
2015-06-07	Switch cmark_markdown_to_html over to size_t	Nick Wellnhofer

2015-06-07	Abort on strbuf errors	Nick Wellnhofer
	Users of the strbuf API are supposed to check for an OOM condition after appending to strbufs, but:
	* This is never done in the whole code base.
	* The implementation was flawed because only `ptr` was set to the OOM value without adjusting `size` and `asize`. After an error, subsequent calls could very well lead to segfaults, contrary to the documentation.
	Change the code to always abort on errors with a message printed to stderr. The only alternative is to propagate errors throughout the whole library, which seems infeasible.
2015-06-06	Merge pull request #54 from bryant/ensure-size-typo	John MacFarlane
	fix ENSURE_SIZE to actually check left arg length.