Age | Commit message | Author | |
---|---|---|---|
2015-06-16 | Renamed utf8proc_detab as utf8proc_check, removed detabbing function. | John MacFarlane | |
Now it just replaces bad UTF-8 sequences and NUL bytes. This restores benchmarks to near their previous levels. | |||
2015-06-16 | astyle formatting changes. | John MacFarlane | |
2015-06-10 | Revert "Merge pull request #58 from nwellnhof/optimize_utf8proc_detab" | John MacFarlane | |
This reverts commit 54d1249c2caebf45a24d691dc765fb93c9a5e594, reversing changes made to bc14d869323650e936c7143dcf941b28ccd5b57d. | |||
2015-06-09 | Further optimize utf8proc_valid | Nick Wellnhofer | |
Assume a multi-byte sequence and rework switch statement into if/else for another 2% speedup. | |||
2015-06-09 | Roll utf8proc_charlen into utf8proc_valid | Nick Wellnhofer | |
Speeds up "make bench" by another percent. | |||
2015-06-09 | Optimize utf8proc_detab | Nick Wellnhofer | |
Handle valid UTF-8 chars inside the main loop and avoid a call to strbuf_put for every UTF-8 char. Results in an 8% speedup in the UTF-8-heavy "make bench" on my system. | |||
2015-06-07 | Convert code base to strbuf_t | Nick Wellnhofer | |
There are probably a couple of places I missed. But this will only be a problem if we use a 64-bit bufsize_t at some point. Then, we'll get warnings from -Wshorten-64-to-32. | |||
2015-04-16 | Pass-through Unicode non-characters | Nick Wellnhofer | |
Despite their name, Unicode non-characters are valid code points. They should be passed through by a library like libcmark. | |||
2015-01-05 | Reformatted code consistently with astyle. | John MacFarlane | |
2014-12-29 | Added cmark_ prefix to functions in cmark_ctype. | John MacFarlane | |
2014-12-29 | Added cmark_ctype.h with locale-independent isspace, ispunct, etc. | John MacFarlane | |
Otherwise cmark's behavior varies unpredictably with the locale. `is_punctuation` in utf8.h has also been adjusted so that all ASCII symbol characters count as punctuation, even though some are not in the Unicode P* (punctuation) categories. | |||
2014-12-15 | Re-added cmark_ prefix to strbuf and chunk. | John MacFarlane | |
Reverts 225d720. | |||
2014-11-24 | Validate UTF-8 input | Nick Wellnhofer | |
Invalid UTF-8 byte sequences are replaced with the Unicode replacement character U+FFFD. Fixes #213. | |||
2014-11-24 | Off-by-one error in utf8proc_detab | Nick Wellnhofer | |
2014-11-20 | Added utf8proc_is_space. | John MacFarlane | |
2014-11-20 | Added utf8proc_is_punctuation. | John MacFarlane | |
We'll probably need this when the spec for emph/strong gets revised. | |||
2014-11-16 | Remove unneeded #includes | Nick Wellnhofer | |
Fixes cross-platform issues. | |||
2014-10-18 | Reindented C sources. | John MacFarlane | |
2014-09-10 | Improve invalid UTF8 codepoint skipping | Vicent Marti | |
2014-09-10 | Fix infinite loop when case folding invalid UTF8 chars | Vicent Marti | |
2014-09-10 | Cleanup reference implementation | Vicent Marti | |
2014-09-09 | UTF8-aware detabbing and entity handling | Vicent Marti | |
2014-09-09 | Rename to strbuf | Vicent Marti | |
2014-09-09 | It buiiiilds | Vicent Marti | |
2014-09-09 | ffffix | Vicent Marti | |
2014-09-09 | lol | Vicent Marti | |
2014-08-13 | Initial commit | John MacFarlane | |
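The "Validate UTF-8 input" and "utf8proc_check" entries describe replacing invalid UTF-8 byte sequences (and, later, NUL bytes) with the replacement character U+FFFD. The following C sketch illustrates that kind of check under stated assumptions: the names `utf8_check_seq` and `utf8_sanitize` are hypothetical and are not cmark's API, and the policy of emitting one U+FFFD per maximal ill-formed subpart follows the Unicode Standard's recommended practice rather than anything these commits confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cmark.
 * Returns the length of the well-formed UTF-8 sequence starting at s, or
 * -(length of the maximal ill-formed subpart), which is always at least 1. */
static int utf8_check_seq(const unsigned char *s, size_t avail) {
    unsigned char c = s[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range for the 2nd byte */
    int need;

    if (c < 0x80)
        return 1;                        /* ASCII */
    if (c >= 0xC2 && c <= 0xDF) {
        need = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
        need = 3;
        if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
        else if (c == 0xED) hi = 0x9F;   /* reject surrogates U+D800..U+DFFF */
    } else if (c >= 0xF0 && c <= 0xF4) {
        need = 4;
        if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
        else if (c == 0xF4) hi = 0x8F;   /* reject code points above U+10FFFF */
    } else {
        return -1;                       /* 80..C1 and F5..FF never start a sequence */
    }

    for (int i = 1; i < need; i++) {
        if ((size_t)i >= avail)
            return -i;                   /* truncated at end of input */
        unsigned char min_b = (i == 1) ? lo : 0x80;
        unsigned char max_b = (i == 1) ? hi : 0xBF;
        if (s[i] < min_b || s[i] > max_b)
            return -i;                   /* bad continuation byte */
    }
    return need;
}

/* Hypothetical, not part of cmark: copies src to dst, replacing each
 * ill-formed subpart and each NUL byte with U+FFFD. dst must hold
 * 3 * len + 1 bytes (worst case: every input byte becomes U+FFFD).
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t utf8_sanitize(const unsigned char *src, size_t len,
                     unsigned char *dst, size_t dst_cap) {
    static const unsigned char repl[3] = {0xEF, 0xBF, 0xBD}; /* U+FFFD */
    size_t i = 0, o = 0;

    assert(dst_cap >= 3 * len + 1);
    while (i < len) {
        int n = utf8_check_seq(src + i, len - i);
        if (n > 0 && src[i] != 0) {
            memcpy(dst + o, src + i, (size_t)n);   /* valid: copy as-is */
            o += (size_t)n;
            i += (size_t)n;
        } else {
            memcpy(dst + o, repl, sizeof repl);    /* invalid or NUL */
            o += sizeof repl;
            i += (size_t)(n > 0 ? n : -n);
        }
    }
    dst[o] = '\0';
    return o;
}
```

Well-formed sequences, including encoded non-characters such as U+FFFE (`EF BF BE`), are copied unchanged, in line with the "Pass-through Unicode non-characters" entry. A production version would more likely append to a growable buffer than require a caller-supplied `3 * len + 1` byte array.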