path: root/src/utf8.c
Age | Commit message | Author
2015-06-16 | Renamed utf8proc_detab as utf8proc_check, removed detabbing function. | John MacFarlane
Now it just replaces bad UTF-8 sequences and NULLs. This restores benchmarks to near their previous levels.
2015-06-16 | astyle formatting changes. | John MacFarlane
2015-06-10 | Revert "Merge pull request #58 from nwellnhof/optimize_utf8proc_detab" | John MacFarlane
This reverts commit 54d1249c2caebf45a24d691dc765fb93c9a5e594, reversing changes made to bc14d869323650e936c7143dcf941b28ccd5b57d.
2015-06-09 | Further optimize utf8proc_valid | Nick Wellnhofer
Assume a multi-byte sequence and rework switch statement into if/else for another 2% speedup.
2015-06-09 | Roll utf8proc_charlen into utf8proc_valid | Nick Wellnhofer
Speeds up "make bench" by another percent.
2015-06-09 | Optimize utf8proc_detab | Nick Wellnhofer
Handle valid UTF-8 chars inside the main loop and avoid a call to strbuf_put for every UTF-8 char. Results in an 8% speedup in the UTF-8-heavy "make bench" on my system.
2015-06-07 | Convert code base to strbuf_t | Nick Wellnhofer
There are probably a couple of places I missed. But this will only be a problem if we use a 64-bit bufsize_t at some point. Then, we'll get warnings from -Wshorten-64-to-32.
2015-04-16 | Pass-through Unicode non-characters | Nick Wellnhofer
Despite their name, Unicode non-characters are valid code points. They should be passed through by a library like libcmark.
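To illustrate the distinction this commit relies on, here is a minimal sketch in C. The function names are hypothetical and are not taken from cmark's utf8.c. It assumes the Unicode definitions: noncharacters are U+FDD0..U+FDEF plus the last two code points of every plane, while the only code points excluded from UTF-8 are surrogates and values above U+10FFFF.

```c
#include <stdint.h>

/* Hypothetical helper: true for the 66 Unicode noncharacters
 * (U+FDD0..U+FDEF and U+xxFFFE / U+xxFFFF in each of the 17 planes). */
static int sketch_is_noncharacter(uint32_t cp) {
  return (cp >= 0xFDD0 && cp <= 0xFDEF) ||
         (cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE);
}

/* Hypothetical helper: true for any Unicode scalar value. Noncharacters
 * are deliberately accepted; only surrogates and out-of-range values
 * are rejected. */
static int sketch_is_valid_scalar(uint32_t cp) {
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

Under these assumptions, U+FFFE is a noncharacter yet still a valid scalar value, so a validator built on the second predicate would pass it through unchanged.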
2015-01-05 | Reformatted code consistently with astyle. | John MacFarlane
2014-12-29 | Added cmark_ prefix to functions in cmark_ctype. | John MacFarlane
2014-12-29 | Added cmark_ctype.h with locale-independent isspace, ispunct, etc. | John MacFarlane
Otherwise cmark's behavior varies unpredictably with the locale. `is_punctuation` in utf8.h has also been adjusted so that all ASCII symbol characters count as punctuation, even though some are not in P* character classes.
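As a rough illustration of what locale-independent classification can look like, the sketch below uses fixed ASCII ranges instead of the C library's locale-sensitive `isspace` and `ispunct`. The `sketch_` names are hypothetical, and the actual cmark_ctype implementation may be structured differently (for example, as a lookup table).

```c
/* Hypothetical helper: ASCII whitespace only (space, \t, \n, \v, \f, \r),
 * regardless of the current locale. */
static int sketch_isspace(int c) {
  return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Hypothetical helper: every printable ASCII character that is neither
 * alphanumeric nor space. This includes symbols such as '$', '+', '<'
 * and '^', which are not in the Unicode P* classes. */
static int sketch_ispunct(int c) {
  return (c >= 0x21 && c <= 0x2F) || (c >= 0x3A && c <= 0x40) ||
         (c >= 0x5B && c <= 0x60) || (c >= 0x7B && c <= 0x7E);
}
```

Because the ranges are hard-coded, a byte such as 0xA0 or 0xE9 is never classified as space or punctuation, whatever `setlocale` has been set to.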
2014-12-15 | Re-added cmark_ prefix to strbuf and chunk. | John MacFarlane
Reverts 225d720.
2014-11-24 | Validate UTF-8 input | Nick Wellnhofer
Invalid UTF-8 byte sequences are replaced with the Unicode replacement character U+FFFD. Fixes #213.
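The replacement behavior described here can be sketched roughly as follows. This is not the code from utf8.c: the `sketch_` function names are hypothetical, and emitting one U+FFFD per invalid byte is an assumption about granularity that the commit message does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length in bytes of the valid UTF-8 sequence at s,
 * or 0 if the bytes there are not valid UTF-8. */
static size_t sketch_utf8_valid_len(const uint8_t *s, size_t n) {
  size_t len, i;
  uint32_t cp, min;
  if (n == 0) return 0;
  if (s[0] < 0x80) return 1;                       /* ASCII */
  if (s[0] >= 0xC2 && s[0] <= 0xDF) { len = 2; cp = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0)   { len = 3; cp = s[0] & 0x0F; min = 0x800; }
  else if (s[0] >= 0xF0 && s[0] <= 0xF4) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
  else return 0;                                   /* stray or illegal lead byte */
  if (n < len) return 0;                           /* truncated sequence */
  for (i = 1; i < len; i++) {
    if ((s[i] & 0xC0) != 0x80) return 0;           /* not a continuation byte */
    cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
  }
  if (cp < min) return 0;                          /* overlong encoding */
  if (cp >= 0xD800 && cp <= 0xDFFF) return 0;      /* surrogate */
  if (cp > 0x10FFFF) return 0;                     /* beyond Unicode range */
  return len;
}

/* Hypothetical helper: copies src to dst, writing U+FFFD (EF BF BD) in
 * place of each invalid byte. dst must have room for 3 * n bytes.
 * Returns the number of bytes written. */
static size_t sketch_utf8_sanitize(const uint8_t *src, size_t n, uint8_t *dst) {
  size_t i = 0, out = 0, k, len;
  while (i < n) {
    len = sketch_utf8_valid_len(src + i, n - i);
    if (len == 0) {
      dst[out++] = 0xEF; dst[out++] = 0xBF; dst[out++] = 0xBD;
      i++;                                         /* skip one bad byte, resync */
    } else {
      for (k = 0; k < len; k++) dst[out++] = src[i++];
    }
  }
  return out;
}
```

Skipping a single byte after each failure keeps the loop simple and guarantees progress; a production implementation might instead consume a whole malformed sequence at once.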
2014-11-24 | Off-by-one error in utf8proc_detab | Nick Wellnhofer
2014-11-20 | Added utf8proc_is_space. | John MacFarlane
2014-11-20 | Added utf8proc_is_punctuation. | John MacFarlane
We'll probably need this when the spec for emph/strong gets revised.
2014-11-16 | Remove unneeded #includes | Nick Wellnhofer
Fixes cross-platform issues.
2014-10-18 | Reindented c sources. | John MacFarlane
2014-09-10 | Improve invalid UTF8 codepoint skipping | Vicent Marti
2014-09-10 | Fix infinite loop when case folding invalid UTF8 chars | Vicent Marti
2014-09-10 | Cleanup reference implementation | Vicent Marti
2014-09-09 | UTF8-aware detabbing and entity handling | Vicent Marti
2014-09-09 | Rename to strbuf | Vicent Marti
2014-09-09 | It buiiiilds | Vicent Marti
2014-09-09 | ffffix | Vicent Marti
2014-09-09 | lol | Vicent Marti
2014-08-13 | Initial commit | John MacFarlane