path: root/src/html_unescape.h
Age  Commit message  Author
2015-06-16  Replace gperf-based entity lookup with binary tree lookup.  John MacFarlane
The primary advantage is a big reduction in the size of the compiled library and executable (> 100K). There should be no measurable performance difference in normal documents. I detected a slight performance hit (around 5%) in a file containing 1,000,000 entities.

* Removed `src/html_unescape.gperf` and `src/html_unescape.h`.
* Added `src/entities.h` (generated by `tools/make_entities_h.py`).
* Added binary tree lookup functions to `houdini_html_u.c`, and use the data in `src/entities.h`.
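To illustrate the kind of lookup this commit describes, here is a minimal sketch of a binary search over a sorted entity table. It is not the code in `houdini_html_u.c`: the `struct entity` layout, the `lookup_entity` name, and the six-entry table are assumptions made for this example, and the real data lives in the generated `src/entities.h`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry layout: entity name plus its UTF-8 expansion. */
struct entity {
    const char *name;
    const char *utf8;
};

/* Tiny illustrative subset, sorted by name in strcmp (byte) order. */
static const struct entity entities[] = {
    {"amp",  "&"},
    {"gE",   "\xE2\x89\xA7"},           /* U+2267 */
    {"gt",   ">"},
    {"lt",   "<"},
    {"ngE",  "\xE2\x89\xA7\xCC\xB8"},   /* U+2267 U+0338 */
    {"quot", "\""},
};

/* Look up the first `len` bytes of `name`; returns the UTF-8 expansion
 * or NULL if the name is not in the table. */
static const char *lookup_entity(const char *name, size_t len)
{
    size_t lo = 0;
    size_t hi = sizeof(entities) / sizeof(entities[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const char *cand = entities[mid].name;
        int cmp = strncmp(name, cand, len);

        /* Equal over `len` bytes but the candidate is longer:
         * the name is a proper prefix, so it sorts before the candidate. */
        if (cmp == 0 && cand[len] != '\0')
            cmp = -1;

        if (cmp == 0)
            return entities[mid].utf8;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

A caller would pass the text between `&` and `;`, for example `lookup_entity("ngE", 3)`. Each lookup costs a logarithmic number of string comparisons, whereas a perfect hash needs roughly one. That is consistent with the small slowdown reported above for entity-heavy input, though this sketch makes no claim about the actual numbers.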
2015-06-13  Fixed entity lookup table.  John MacFarlane
The old one had many errors. The new one is derived from the list in the npm entities package. Since the sequences can now be longer (multi-code-point), we have bumped the length limit from 4 to 8, which also affects houdini_html_u.c. An example of the kind of error that was fixed is given in jgm/commonmark.js#47: `&ngE;` should be rendered as "≧̸" (U+02267 U+00338), but it's actually rendered as "≧" (which is the same as `&gE;`).
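The need for a larger limit can be checked by encoding the two code points from the example. The sketch below is illustrative only: `utf8_encode` and `encode_sequence` are hypothetical helpers written for this note, not functions from `houdini_html_u.c`.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for an invalid code point. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode n code points into out, which holds cap bytes.
 * Returns the total length, or 0 if a code point is invalid
 * or the sequence does not fit. */
static size_t encode_sequence(const uint32_t *cps, size_t n,
                              unsigned char *out, size_t cap)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t k = utf8_encode(cps[i], tmp);
        if (k == 0 || total + k > cap)
            return 0;
        for (size_t j = 0; j < k; j++)
            out[total + j] = tmp[j];
        total += k;
    }
    return total;
}
```

For U+2267 followed by U+0338 the sequence is 3 + 2 = 5 bytes, so it cannot fit in a 4-byte slot but does fit in an 8-byte one.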
2015-01-12  Reduce size of gperf entity table  Nick Wellnhofer
Don't store the length of the UTF-8 string. It can be computed by NUL-terminating strings shorter than 4 bytes and using strnlen. Use gperf's string pool option. This allows using an 'int' index into the string pool instead of a pointer, which helps on 64-bit systems. Shaves about 75 KB off the 32-bit binaries on Linux and 128 KB off the 64-bit binaries on OS X.
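Both space-saving ideas can be sketched in a few lines. Everything here is an assumption made for illustration (the `struct entry` layout, the `name_pool` contents, the helper names); it is not the table gperf generated. The bounded length is computed with `memchr`, which gives the same result as `strnlen(s, 4)` while staying within standard C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string pool: names stored back to back, each NUL-terminated.
 * Offsets: "afr" = 0, "amp" = 4, "gt" = 8, "lt" = 11. */
static const char name_pool[] = "afr\0amp\0gt\0lt";

struct entry {
    int name_offset;        /* index into name_pool instead of a pointer */
    unsigned char utf8[4];  /* NUL-padded when shorter than 4 bytes;
                               no stored length field */
};

static const struct entry table[] = {
    {0,  {0xF0, 0x9D, 0x94, 0x9E}},  /* U+1D51E: uses all 4 bytes, no NUL */
    {4,  {'&'}},                     /* remaining bytes are zero */
    {8,  {'>'}},
    {11, {'<'}},
};

/* Length of the UTF-8 expansion, recovered from the NUL padding.
 * Equivalent to strnlen((const char *)e->utf8, 4). */
static size_t utf8_len(const struct entry *e)
{
    const unsigned char *nul = memchr(e->utf8, '\0', sizeof(e->utf8));
    return nul ? (size_t)(nul - e->utf8) : sizeof(e->utf8);
}

/* Resolve the pooled name for an entry. */
static const char *entry_name(const struct entry *e)
{
    return name_pool + e->name_offset;
}
```

On a typical 64-bit platform an `int` offset takes 4 bytes where a pointer takes 8, and a single pool avoids one relocation per string in position-independent code. The sketch does not attempt to reproduce the savings quoted in the commit message.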
2014-12-08  Create html_unescape.h with extra struct initializers  Nick Wellnhofer
Fixes missing initializer warnings.
2014-12-04  Moved source files from src/html into src.  John MacFarlane
The separate directory presents problems for some simple extension building systems, like luarocks.