path: root/src/houdini_html_u.c
Age | Commit message | Author
2015-12-10 | Fix warnings about dropping const qualifier | Kevin Wojniak
2015-08-06 | Prefix utf8proc functions to avoid conflict with existing library | Kevin Wojniak
2015-07-27 | Use clang-format, llvm style, for formatting. | John MacFarlane
* Reformatted all source files.
* Added 'format' target to Makefile.
* Removed 'astyle' target.
* Updated .editorconfig.
2015-06-17 | Renamed entities.h -> entities.inc. | John MacFarlane
Also tools/make_entities_h.py -> tools/make_entities_inc.py.
2015-06-16 | Simpler approach for entity lookup. | John MacFarlane
We dispense with the hashes and just do string comparisons. Since the array is in order, we can search intelligently and should never need to do more than 8 or so comparisons. This reduces binary size even further, at a small cost in performance. (This shouldn't matter too much, as it's only detectable in really entity-heavy sources.)
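The idea of searching a sorted array by plain string comparison can be sketched as below. This is an illustration only, not the code from houdini_html_u.c: the table `demo_entities`, the struct layout, and the function `demo_lookup` are hypothetical, and the sketch uses an ordinary binary search rather than whatever search heuristic the commit actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature table; the real data is generated into a header. */
struct entity {
  const char *name; /* entity name without '&' and ';' */
  const char *utf8; /* replacement text, UTF-8 encoded */
};

/* Must stay sorted by name in strcmp (byte) order. */
static const struct entity demo_entities[] = {
    {"amp", "&"},
    {"gt", ">"},
    {"lt", "<"},
    {"quot", "\""},
};

#define DEMO_COUNT (sizeof(demo_entities) / sizeof(demo_entities[0]))

/* Return the replacement for name[0..len), or NULL if it is not in the
 * table. name need not be NUL-terminated, but must not contain a NUL
 * within its first len bytes. */
static const char *demo_lookup(const char *name, size_t len) {
  size_t lo = 0, hi = DEMO_COUNT; /* search the half-open range [lo, hi) */
  assert(name != NULL);
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    const char *cand = demo_entities[mid].name;
    int cmp = strncmp(name, cand, len);
    if (cmp == 0 && cand[len] != '\0')
      cmp = -1; /* name is a proper prefix of cand, so it sorts first */
    if (cmp == 0)
      return demo_entities[mid].utf8;
    if (cmp < 0)
      hi = mid;
    else
      lo = mid + 1;
  }
  return NULL;
}
```

Because the lookup takes a length, it can be pointed directly at the characters after `&` in the input without copying them first.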
2015-06-16 | entities: Make the first entity in the array (TripleDot) work. | John MacFarlane
We now use -1 instead of 0 to indicate leaf nodes.
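A plausible reading of this fix: if child links are stored as array indices, index 0 is a real entry, so 0 cannot also mean "no child". The sketch below shows an array-backed binary tree with -1 as the sentinel. The node layout, the table `demo_tree`, and `demo_tree_find` are hypothetical and are not taken from the commit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node layout: children are indices into the same array,
 * and -1 means "no child". Index 0 is a valid node, so 0 would be
 * ambiguous as a sentinel. */
struct tree_node {
  const char *name;
  int less;    /* index of the subtree with smaller names, or -1 */
  int greater; /* index of the subtree with larger names, or -1 */
};

static const struct tree_node demo_tree[] = {
    /* 0 */ {"amp", -1, -1},
    /* 1 */ {"gt", 0, -1},
    /* 2 */ {"lt", 1, 3},
    /* 3 */ {"quot", -1, -1},
};

#define DEMO_ROOT 2

/* Return the index of the node named `name`, or -1 if it is absent. */
static int demo_tree_find(const char *name) {
  int i = DEMO_ROOT;
  assert(name != NULL);
  while (i != -1) {
    int cmp = strcmp(name, demo_tree[i].name);
    if (cmp == 0)
      return i;
    i = cmp < 0 ? demo_tree[i].less : demo_tree[i].greater;
  }
  return -1;
}
```

With 0 as the sentinel, the link from "gt" to "amp" at index 0 would be read as "no child", and the first entry could never be found.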
2015-06-16 | astyle formatting changes. | John MacFarlane
2015-06-16 | Replace gperf-based entity lookup with binary tree lookup. | John MacFarlane
The primary advantage is a big reduction in the size of the compiled library and executable (> 100K). There should be no measurable performance difference in normal documents. I detected a slight performance hit (around 5%) in a file containing 1,000,000 entities.
* Removed `src/html_unescape.gperf` and `src/html_unescape.h`.
* Added `src/entities.h` (generated by `tools/make_entities_h.py`).
* Added binary tree lookup functions to `houdini_html_u.c`, and use the data in `src/entities.h`.
2015-06-13 | Fixed entity lookup table. | John MacFarlane
The old one had many errors. The new one is derived from the list in the npm entities package. Since the sequences can now be longer (multi-code-point), we have bumped the length limit from 4 to 8, which also affects houdini_html_u.c. An example of the kind of error that was fixed is given in jgm/commonmark.js#47: `&ngE;` should be rendered as "≧̸" (U+02267 U+00338), but it's actually rendered as "≧" (which is the same as `&gE;`).
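To see why a 4-byte limit is too small for a two-code-point replacement, the following sketch encodes U+2267 U+0338 as UTF-8 and counts the bytes. It assumes the limit is measured in UTF-8 bytes; `demo_utf8_encode` and `demo_encode_sequence` are hypothetical helpers written for this illustration, not functions from the code base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out (room for 4 bytes required).
 * Returns the number of bytes written. Assumes cp is a valid scalar value. */
static size_t demo_utf8_encode(uint32_t cp, unsigned char *out) {
  assert(cp <= 0x10FFFF);
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (unsigned char)(0xF0 | (cp >> 18));
  out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char)(0x80 | (cp & 0x3F));
  return 4;
}

/* Encode n code points back to back; out needs room for 4 * n bytes.
 * Returns the total number of bytes written. */
static size_t demo_encode_sequence(const uint32_t *cps, size_t n,
                                   unsigned char *out) {
  size_t len = 0, i;
  for (i = 0; i < n; i++)
    len += demo_utf8_encode(cps[i], out + len);
  return len;
}
```

U+2267 takes three bytes (E2 89 A7) and U+0338 takes two (CC B8), so the pair needs five bytes, which fits in 8 but not in 4.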
2015-06-07 | Convert code base to strbuf_t | Nick Wellnhofer
There are probably a couple of places I missed. But this will only be a problem if we use a 64-bit bufsize_t at some point. Then, we'll get warnings from -Wshorten-64-to-32.
2015-05-07 | Multiple issues with numeric entities | Nick Wellnhofer
This closes #33.
2015-02-02 | Don't rely on strnlen being available | Nick Wellnhofer
2015-01-12 | Reduce size of gperf entity table | Nick Wellnhofer
Don't store the length of the UTF-8 string. It can be computed by NUL-terminating strings shorter than 4 bytes and using strnlen. Use gperf's string pool option. This allows using an 'int' index into the string pool instead of a pointer, which helps on 64-bit systems. Shaves about 75 KB off the 32-bit binaries on Linux and 128 KB off the 64-bit binaries on OS X.
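The length trick described here can be sketched as follows, assuming each replacement string lives in a fixed 4-byte slot that is NUL-padded when the string is shorter. `DEMO_SLOT` and `demo_slot_len` are hypothetical names. The helper uses `memchr` from standard C instead of `strnlen`, which is a POSIX function and not available everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SLOT 4 /* assumed slot size in bytes */

/* Length of the string stored in a DEMO_SLOT-byte slot: the offset of the
 * first NUL, or DEMO_SLOT if the slot is completely full. This gives the
 * same result as strnlen(slot, DEMO_SLOT) using only standard C. */
static size_t demo_slot_len(const char *slot) {
  const char *nul;
  assert(slot != NULL);
  nul = (const char *)memchr(slot, '\0', DEMO_SLOT);
  return nul ? (size_t)(nul - slot) : (size_t)DEMO_SLOT;
}
```

A full slot has no terminator at all, which is why an unbounded `strlen` would not be safe here.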
2014-12-15 | Re-added cmark_ prefix to strbuf and chunk. | John MacFarlane
Reverts 225d720.
2014-12-04 | Moved source files from src/html into src. | John MacFarlane
The separate directory presents problems for some simple extension building systems, like luarocks.