Age    Commit message    Author
2015-06-16normalize.py: don't collapse whitespace in pre contexts.John MacFarlane
2015-06-16Simpler approach for entity lookup.John MacFarlane
We dispense with the hashes and just do string comparisons. Since the array is in order, we can search intelligently and should never need to do more than 8 or so comparisons. This reduces binary size even further, at a small cost in performance. (This shouldn't matter too much, as it's only detectable in really entity-heavy sources.)
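As a rough illustration of lookup over an ordered array using only string comparisons, here is a minimal Python sketch. The four-entry `ENTITIES` table and the `lookup_entity` name are hypothetical, and a plain binary search is assumed; the actual C code in `houdini_html_u.c` may search differently.

```python
# Hypothetical excerpt, sorted by name. The real table is generated into
# src/entities.h and is far larger.
ENTITIES = [
    ("amp", "&"),
    ("gE", "\u2267"),
    ("lt", "<"),
    ("ngE", "\u2267\u0338"),  # a multi-code-point replacement
]


def lookup_entity(name):
    """Binary search by plain string comparison; returns None if absent."""
    lo, hi = 0, len(ENTITIES) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key, value = ENTITIES[mid]
        if name == key:
            return value
        if name < key:
            hi = mid - 1
        else:
            lo = mid + 1
    return None
```

No hash table is stored, which is where the size saving would come from; the cost is a handful of string comparisons per lookup.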
2015-06-16make_entities_h.py: confirm there are no hash collisions.John MacFarlane
At least with valid data.
2015-06-16Revert "Rebuild src/entities.h when the generating python program changes."John MacFarlane
This reverts commit e113185554c4d775e6fca0596011b405fa1700a5.
2015-06-16Rebuild src/entities.h when the generating python program changes.John MacFarlane
2015-06-16Mark entity data structures as const.John MacFarlane
2015-06-16entities: Make the first entity in the array (TripleDot) work.John MacFarlane
We now use -1 instead of 0 to indicate leaf nodes.
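To see why 0 is a poor "no child" marker when the tree lives in an array, consider this small Python sketch. The node layout (`name, value, less, greater`), the three sample nodes, and `ROOT` are assumptions for illustration, not the layout of `src/entities.h`.

```python
# Each node: (name, value, index of "less" child, index of "greater" child).
# -1 means "no child". Index 0 is a real node, so 0 cannot double as the
# "no child" marker without making that node unreachable.
NODES = [
    ("TripleDot", "\u20db", -1, -1),  # index 0
    ("amp", "&", 0, 2),               # index 1, the root in this sketch
    ("lt", "<", -1, -1),              # index 2
]
ROOT = 1


def lookup(name):
    """Walk the array-backed tree until a match or a -1 link is reached."""
    i = ROOT
    while i != -1:
        key, value, less, greater = NODES[i]
        if name == key:
            return value
        i = less if name < key else greater
    return None
```

With 0 as the sentinel, the link from `amp` to index 0 would read as "leaf", and `TripleDot` could never be found.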
2015-06-16astyle formatting changes.John MacFarlane
2015-06-16Added explanatory note about entities.h in Makefile.John MacFarlane
2015-06-16Replace gperf-based entity lookup with binary tree lookup.John MacFarlane
The primary advantage is a big reduction in the size of the compiled library and executable (> 100K). There should be no measurable performance difference in normal documents. I detected a slight performance hit (around 5%) in a file containing 1,000,000 entities.

* Removed `src/html_unescape.gperf` and `src/html_unescape.h`.
* Added `src/entities.h` (generated by `tools/make_entities_h.py`).
* Added binary tree lookup functions to `houdini_html_u.c`, and use the data in `src/entities.h`.
2015-06-15Fixed cases likeJohn MacFarlane

```
[ref]: url
"title" ok
```

Here we should parse the first line as a reference.
2015-06-15inlines.c: Added utility functions to skip spaces and line endings.John MacFarlane
2015-06-15Updated spec.txt.John MacFarlane
2015-06-13Fixed backslashes in link destinations that are not part of escapes.John MacFarlane
See jgm/commonmark#45.
2015-06-13Updated spec.txt.John MacFarlane
2015-06-13Updated spec.txt.John MacFarlane
2015-06-13Fixed entity lookup table.John MacFarlane
The old one had many errors. The new one is derived from the list in the npm entities package. Since the sequences can now be longer (multi-code-point), we have bumped the length limit from 4 to 8, which also affects houdini_html_u.c. An example of the kind of error that was fixed is given in jgm/commonmark.js#47: `&ngE;` should be rendered as "≧̸" (U+02267 U+00338), but it's actually rendered as "≧" (which is the same as `&gE;`).
2015-06-11Removed "add newline if line doesn't have one."John MacFarlane
This isn't actually needed.
2015-06-11pathological_tests: removed timeout stuff.John MacFarlane
It breaks on Windows.
2015-06-11Small logic fixes and a simplification in process_emphasis.John MacFarlane
2015-06-11Updated benchmarks.md.John MacFarlane
Removed sundown, because the reading was anomalous. This commit in hoedown caused the speed difference between sundown and hoedown that I was measuring before (on 32 bit machines): https://github.com/hoedown/hoedown/commit/ca829ff83580ed52cc56c09a67c80119026bae20

As Nick Wellnhofer explains: "The commit removes a rather arbitrary limit of 16MB for buffers. Your benchmark input probably results in an buffer larger than 16MB. It also seems that hoedown didn't check error returns thoroughly at the time of the commit. This basically means that large input files could produce any kind of random behavior before that commit, and that any benchmark that results in a too large buffer can't be relied on."
2015-06-11Fixed `process_emphasis` to handle new pathological cases.John MacFarlane
Now we have an array of pointers (`potential_openers`), keyed to the delim char. When we've failed to match a potential opener prior to point X in the delimiter stack, we reset `potential_openers` for that opener type to X, and thus avoid having to look again through all the openers we've already rejected. See jgm/commonmark#43.
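The idea can be sketched in a simplified Python model. Everything here is an assumption for illustration: delimiters are reduced to `(char, can_open, can_close)` tuples, a dict named `openers_bottom` stands in for `potential_openers`, and matching ignores the real emphasis rules. It is not the C implementation.

```python
def match_delimiters(delims):
    """delims: list of (char, can_open, can_close) tuples.

    Returns a list of (opener_index, closer_index) pairs.
    """
    pairs = []
    used = [False] * len(delims)
    # For each delimiter char, the lowest index still worth searching.
    openers_bottom = {}
    for i, (ch, _can_open, can_close) in enumerate(delims):
        if not can_close:
            continue
        bottom = openers_bottom.get(ch, 0)
        found = None
        j = i - 1
        while j >= bottom:
            och, ocan_open, _ = delims[j]
            if not used[j] and och == ch and ocan_open:
                found = j
                break
            j -= 1
        if found is None:
            # Nothing below i can open `ch`; later closers start here.
            openers_bottom[ch] = i
        else:
            pairs.append((found, i))
            # Everything between a matched pair is consumed.
            for k in range(found, i + 1):
                used[k] = True
    return pairs
```

On input shaped like `"*a_ " * 20000`, each `_` closer scans only back to the previous `_` instead of rescanning every rejected `*`, so the work stays roughly linear in this model.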
2015-06-11Added another case to pathological tests.John MacFarlane
"*a_ " * 20000

See jgm/commonmark#43.
2015-06-11Added timeouts to pathological tests.John MacFarlane
This way tests fail instead of just hanging. Currently we use a 1 sec timeout. Added a failing test from jgm/commonmark#43.
2015-06-10More code simplification.John MacFarlane
2015-06-10Code simplification.John MacFarlane
2015-06-10Added more pathological tests.John MacFarlane
Many link closers with no openers. Many link openers with no closers. Many emph openers with no closers.
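Inputs of this kind are easy to generate by repetition. The Python sketch below is only a guess at their shape: the exact strings and the `REPEAT` count are assumptions (20000 matches the emphasis cases quoted elsewhere in this log), and `PATHOLOGICAL_INPUTS` is a hypothetical name.

```python
REPEAT = 20000  # assumed repetition count

# Hypothetical inputs: each repeats one unmatched delimiter many times.
PATHOLOGICAL_INPUTS = {
    "many link closers with no openers": "a] " * REPEAT,
    "many link openers with no closers": "[a " * REPEAT,
    "many emph openers with no closers": "_a " * REPEAT,
}


def input_sizes(cases):
    """Map each case name to the length of its generated input."""
    return {name: len(text) for name, text in cases.items()}
```

Each input would then be fed to the parser to check that running time stays reasonable.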
2015-06-10Added pathological test case for jgm/commonmark#43.John MacFarlane
Many closers with no openers.
2015-06-10process_inlines: remove closers from delim stack when possible.John MacFarlane
When they have no matching openers and cannot be openers themselves, we can safely remove them. This helps with a performance case: "a_ " * 20000. See jgm/commonmark.js#43.
2015-06-10Revert "Merge pull request #58 from nwellnhof/optimize_utf8proc_detab"John MacFarlane
This reverts commit 54d1249c2caebf45a24d691dc765fb93c9a5e594, reversing changes made to bc14d869323650e936c7143dcf941b28ccd5b57d.
2015-06-09Updated spec.John MacFarlane
2015-06-09Merge pull request #58 from nwellnhof/optimize_utf8proc_detabJohn MacFarlane
Further optimize utf8proc_valid
2015-06-09Further optimize utf8proc_validNick Wellnhofer
Assume a multi-byte sequence and rework switch statement into if/else for another 2% speedup.
2015-06-09Merge pull request #57 from nwellnhof/optimize_utf8proc_detabJohn MacFarlane
Optimize utf8proc_detab
2015-06-09Roll utf8proc_charlen into utf8proc_validNick Wellnhofer
Speeds up "make bench" by another percent.
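A combined "length and validity" check might look roughly like the following Python sketch. The function name `utf8_valid_len` and its return convention (0 for invalid) are assumptions, and the checks follow the standard UTF-8 rules rather than the exact logic of `utf8proc_valid`.

```python
def utf8_valid_len(data, pos=0):
    """Return the byte length (1-4) of the valid UTF-8 sequence at data[pos],
    or 0 if the bytes there are invalid or truncated."""
    n = len(data) - pos
    if n <= 0:
        return 0
    b0 = data[pos]
    if b0 < 0x80:
        return 1                    # ASCII
    if b0 < 0xC2:
        return 0                    # stray continuation or overlong lead
    if b0 < 0xE0:
        length, lo, hi = 2, 0x80, 0xBF
    elif b0 < 0xF0:
        length = 3
        lo = 0xA0 if b0 == 0xE0 else 0x80   # reject overlong forms
        hi = 0x9F if b0 == 0xED else 0xBF   # reject surrogates
    elif b0 < 0xF5:
        length = 4
        lo = 0x90 if b0 == 0xF0 else 0x80   # reject overlong forms
        hi = 0x8F if b0 == 0xF4 else 0xBF   # reject values above U+10FFFF
    else:
        return 0
    if n < length:
        return 0                    # truncated sequence
    if not lo <= data[pos + 1] <= hi:
        return 0
    for k in range(2, length):
        if not 0x80 <= data[pos + k] <= 0xBF:
            return 0
    return length
```

Returning the length from the validity check lets a caller advance past a character without a second pass over the lead byte.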
2015-06-09Optimize utf8proc_detabNick Wellnhofer
Handle valid UTF-8 chars inside the main loop and avoid a call to strbuf_put for every UTF-8 char. Results in an 8% speedup in the UTF-8-heavy "make bench" on my system.
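The general shape of that optimization, copying whole runs rather than one character at a time, can be sketched in Python. The `detab` function below and its `TAB_STOP` of 4 are assumptions for illustration and work on decoded strings, unlike the byte-oriented C code.

```python
TAB_STOP = 4  # assumed tab stop width


def detab(line):
    """Expand tabs to the next tab stop, appending non-tab runs in one piece."""
    out = []
    column = 0
    run_start = 0
    for i, ch in enumerate(line):
        if ch == "\t":
            out.append(line[run_start:i])        # flush the pending run once
            pad = TAB_STOP - (column % TAB_STOP)
            out.append(" " * pad)
            column += pad
            run_start = i + 1
        else:
            column += 1                          # one column per character
    out.append(line[run_start:])                 # flush the final run
    return "".join(out)
```

Ordinary characters, including multi-byte ones, only advance the column counter; output is appended once per run instead of once per character.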
2015-06-08Updated spec.John MacFarlane
2015-06-07Updated changelog.John MacFarlane
2015-06-07Merge pull request #56 from nwellnhof/bufsize_tJohn MacFarlane
Safer handling of string buffer sizes and indices
2015-06-07Remove unimplemented functions from houdini.hNick Wellnhofer
2015-06-07Helper to safely call strlenNick Wellnhofer
2015-06-07Use size_t for strlen result in API testNick Wellnhofer
2015-06-07Avoid strlen in html.cNick Wellnhofer
2015-06-07Avoid strlen in xml.cNick Wellnhofer
2015-06-07Avoid strlen in commonmark.cNick Wellnhofer
2015-06-07Check for overflow in S_parser_feedNick Wellnhofer
Guard against too large chunks passed via the API.
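A guard of this kind usually compares against the remaining headroom rather than adding first. The Python sketch below assumes a signed 32-bit size limit (`BUFSIZE_MAX`) and a hypothetical `checked_feed` helper; Python integers do not overflow, so the limit is modeled explicitly.

```python
BUFSIZE_MAX = 2**31 - 1  # assumed limit for a signed 32-bit buffer size


def checked_feed(buffered_len, chunk_len):
    """Return the new buffered length, or raise if it would exceed the limit."""
    if chunk_len < 0 or buffered_len < 0:
        raise ValueError("lengths must be non-negative")
    # Compare against the remaining headroom so the check itself could not
    # overflow in a fixed-width language.
    if chunk_len > BUFSIZE_MAX - buffered_len:
        raise OverflowError("chunk too large for buffer size type")
    return buffered_len + chunk_len
```

In C the same comparison avoids computing `buffered_len + chunk_len` before it is known to fit.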
2015-06-07Convert code base to strbuf_tNick Wellnhofer
There are probably a couple of places I missed. But this will only be a problem if we use a 64-bit bufsize_t at some point. Then, we'll get warnings from -Wshorten-64-to-32.
2015-06-07Change return type of cmark_strbuf_lenNick Wellnhofer
2015-06-07Missing bounds checks in buffer.cNick Wellnhofer
2015-06-07Remove unused function cmark_strbuf_attachNick Wellnhofer
This function was missing a couple of range checks that I'm too lazy to fix.