We dispense with the hashes and just do string comparisons.
Since the array is sorted, we can use a binary search
and should never need to do more than 8 or so comparisons.
This reduces binary size even further, at a small cost
in performance. (This shouldn't matter too much, as
it's only detectable in really entity-heavy sources.)
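The lookup described above can be sketched as a plain binary search over a sorted table. The `struct entity` layout, the six-entry excerpt, and `lookup_entity` are hypothetical illustrations rather than the contents of `src/entities.h`. The sketch also assumes NUL-terminated names compared with `strcmp()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry layout: the entity name without '&' and ';',
   plus its UTF-8 replacement text. */
struct entity {
  const char *name;
  const char *utf8;
};

/* A tiny excerpt, sorted in strcmp() order as binary search requires
   (uppercase ASCII sorts before lowercase). */
static const struct entity entities[] = {
    {"AMP", "&"}, {"GT", ">"}, {"amp", "&"},
    {"gt", ">"},  {"lt", "<"}, {"quot", "\""},
};

/* Binary search: each strcmp() call halves the remaining range. */
static const char *lookup_entity(const char *name) {
  assert(name != NULL);
  size_t lo = 0;
  size_t hi = sizeof(entities) / sizeof(entities[0]); /* exclusive */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    int cmp = strcmp(name, entities[mid].name);
    if (cmp == 0)
      return entities[mid].utf8;
    if (cmp < 0)
      hi = mid;     /* name sorts before entities[mid] */
    else
      lo = mid + 1; /* name sorts after entities[mid] */
  }
  return NULL; /* not a known entity */
}
```

Because only the sorted name array is needed, no hash table or extra lookup structure has to be compiled in. That is the size saving the commit describes.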
|
|
At least with valid data.
|
|
This reverts commit e113185554c4d775e6fca0596011b405fa1700a5.
|
|
We now use -1 instead of 0 to indicate leaf nodes.
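Why -1 rather than 0 matters can be shown with a binary tree stored in an array, where children are referenced by index. If the nodes sit in sorted order with the root in the middle, index 0 holds the smallest name, and that entry is a legitimate child of some node. So 0 cannot also mean "no child". The node layout, `NO_CHILD`, `ROOT`, and `tree_lookup` below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NO_CHILD (-1) /* sentinel: index 0 is a real node, so it can't mean "none" */

/* Hypothetical node layout: nodes sit in a sorted array and refer to
   their children by index. */
struct entity_node {
  const char *name;
  const char *utf8;
  int left;  /* subtree with smaller names, or NO_CHILD */
  int right; /* subtree with larger names, or NO_CHILD */
};

/* Sorted array; the root is the middle entry, index 2. Index 0 ("amp")
   is the left child of "copy", so 0 must remain a valid child index. */
static const struct entity_node nodes[] = {
    {"amp", "&", NO_CHILD, NO_CHILD},
    {"copy", "\xC2\xA9", 0, NO_CHILD},
    {"gt", ">", 1, 3},
    {"lt", "<", NO_CHILD, 4},
    {"quot", "\"", NO_CHILD, NO_CHILD},
};
#define ROOT 2

static const char *tree_lookup(const char *name) {
  assert(name != NULL);
  int i = ROOT;
  while (i != NO_CHILD) {
    int cmp = strcmp(name, nodes[i].name);
    if (cmp == 0)
      return nodes[i].utf8;
    i = cmp < 0 ? nodes[i].left : nodes[i].right; /* descend one level */
  }
  return NULL; /* fell off a leaf: not found */
}
```

With 0 as the sentinel, the search for "amp" would stop at "copy" and wrongly report the name as unknown.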
|
|
The primary advantage is a big reduction in the size of
the compiled library and executable (more than 100 KB).
There should be no measurable performance difference in
normal documents. I detected a slight performance
hit (around 5%) in a file containing 1,000,000 entities.
* Removed `src/html_unescape.gperf` and `src/html_unescape.h`.
* Added `src/entities.h` (generated by `tools/make_entities_h.py`).
* Added binary tree lookup functions to `houdini_html_u.c`, which
use the data in `src/entities.h`.
|
|
```
[ref]: url
"title" ok
```
Here the first line should still be parsed as a link reference definition (without a title).
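The key check is whether the line after the destination is a complete title followed only by whitespace. If it is not, as with `"title" ok`, the definition ends after `url`. The helper below is a hypothetical sketch of that check. It handles only double-quoted titles with backslash escapes. CommonMark also allows single-quoted and parenthesized titles, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `line` is a double-quoted link
   title followed only by spaces or tabs (and optionally a newline). */
static bool is_title_line(const char *line) {
  assert(line != NULL);
  const char *p = line;
  while (*p == ' ' || *p == '\t')
    p++;
  if (*p != '"')
    return false;
  p++;
  while (*p != '\0' && *p != '"') {
    if (*p == '\\' && p[1] != '\0')
      p++; /* skip the escaped character */
    p++;
  }
  if (*p != '"')
    return false; /* unterminated title */
  p++;
  while (*p == ' ' || *p == '\t')
    p++;
  return *p == '\0' || *p == '\n'; /* nothing but whitespace may follow */
}
```

If the check fails, a parser can fall back to the definition without a title and leave the second line for the next block.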
|
|
See jgm/commonmark#45.
|
|
The old entity table had many errors.
The new one is derived from the list in the npm entities package.
Since the sequences can now be longer (multi-code-point), we
have bumped the length limit from 4 to 8, which also affects
houdini_html_u.c.
An example of the kind of error that was fixed is given
in jgm/commonmark.js#47: `&ngE;` should be rendered as "≧̸" (U+02267
U+00338), but it was actually rendered as "≧" (which is the same as
`&gE;`).
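The byte arithmetic shows why 4 is no longer enough, assuming the limit counts UTF-8 bytes. A single code point never needs more than 4 bytes, but U+2267 U+0338 needs 3 + 2 = 5. The encoder below is standard UTF-8. `ENTITY_MAX_BYTES` and `encode_sequence` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode scalar value as UTF-8; returns the byte count (1-4). */
static size_t utf8_encode(uint32_t cp, unsigned char *out) {
  assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (unsigned char)(0xF0 | (cp >> 18));
  out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char)(0x80 | (cp & 0x3F));
  return 4;
}

/* Hypothetical buffer size mirroring the new limit of 8 bytes. */
#define ENTITY_MAX_BYTES 8

/* Encodes a multi-code-point sequence; returns the total byte count. */
static size_t encode_sequence(const uint32_t *cps, size_t n, unsigned char *out) {
  size_t len = 0;
  for (size_t i = 0; i < n; i++) {
    /* Conservatively require room for a 4-byte encoding. */
    assert(len + 4 <= ENTITY_MAX_BYTES);
    len += utf8_encode(cps[i], out + len);
  }
  return len;
}
```

Encoding `{0x2267, 0x0338}` produces the bytes `E2 89 A7 CC B8`.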
|
|
This isn't actually needed.
|
|
It breaks on Windows.
|
|
Removed sundown, because the reading was anomalous.
This commit in hoedown caused the speed difference between
sundown and hoedown that I was measuring before (on 32-bit
machines):
https://github.com/hoedown/hoedown/commit/ca829ff83580ed52cc56c09a67c80119026bae20
As Nick Wellnhofer explains: "The commit removes a rather arbitrary
limit of 16MB for buffers. Your benchmark input probably results in
a buffer larger than 16MB. It also seems that hoedown didn't check
error returns thoroughly at the time of the commit. This basically means
that large input files could produce any kind of random behavior before
that commit, and that any benchmark that results in a too large buffer
can't be relied on."
|
|
Now we have an array of pointers (`potential_openers`),
indexed by the delimiter character.
When we've failed to match a potential opener prior to point X
in the delimiter stack, we reset `potential_openers` for that opener
type to X, and thus avoid having to look again through all the openers
we've already rejected.
See jgm/commonmark#43.
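The bookkeeping can be sketched on a simplified delimiter stack held in an array. This sketch stores indices instead of pointers. It ignores the other emphasis rules (run lengths, removal of delimiters between a matched pair). `struct delim` and `match_delims` are hypothetical, and the count of examined candidates is returned only to make the saving visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified delimiter record. */
struct delim {
  char c; /* delimiter character, e.g. '*' or '_' */
  bool can_open;
  bool can_close;
  bool matched;
};

/* Pairs each closer with the nearest earlier unmatched opener of the same
   character. potential_openers[c] is the lowest index that can still hold
   an opener for c: after a failed search, everything below the failing
   closer has been rejected, so later searches stop there. Returns the
   number of candidate openers examined. */
static size_t match_delims(struct delim *d, size_t n) {
  size_t potential_openers[256] = {0};
  size_t examined = 0;

  for (size_t closer = 0; closer < n; closer++) {
    if (!d[closer].can_close)
      continue;
    unsigned char c = (unsigned char)d[closer].c;
    bool found = false;
    /* Scan indices closer-1 down to potential_openers[c], inclusive. */
    for (size_t j = closer; j > potential_openers[c]; j--) {
      struct delim *op = &d[j - 1];
      examined++;
      if (op->c == d[closer].c && op->can_open && !op->matched) {
        op->matched = true;
        d[closer].matched = true;
        found = true;
        break;
      }
    }
    if (!found)
      potential_openers[c] = closer; /* nothing below here can match c */
  }
  return examined;
}
```

For an alternating sequence of `*` openers and `_` closers, each closer now examines at most two candidates instead of rescanning every earlier delimiter.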
|
|
"*a_ " * 20000
See jgm/commonmark#43.
|
|
This way tests fail instead of just hanging.
Currently we use a 1-second timeout.
Added a failing test from jgm/commonmark#43.
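One way to make a hanging test fail is to run it in a child process with an alarm. The sketch below uses POSIX `fork()`, `alarm()`, and `waitpid()`. It is a swapped-in technique for illustration, not the project's test runner, and `run_with_timeout` and the two example tests are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef enum { TEST_PASSED, TEST_FAILED, TEST_TIMED_OUT } test_result;

/* Runs test_fn in a forked child. The child arms alarm(timeout_secs);
   if it is still running when the alarm fires, the default SIGALRM action
   terminates it and the parent reports a timeout instead of hanging. */
static test_result run_with_timeout(int (*test_fn)(void), unsigned timeout_secs) {
  assert(timeout_secs > 0); /* alarm(0) would cancel instead of arming */
  pid_t pid = fork();
  if (pid < 0)
    return TEST_FAILED; /* could not fork */
  if (pid == 0) {
    alarm(timeout_secs);
    _exit(test_fn() == 0 ? 0 : 1);
  }
  int status;
  if (waitpid(pid, &status, 0) < 0)
    return TEST_FAILED;
  if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
    return TEST_TIMED_OUT;
  if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
    return TEST_PASSED;
  return TEST_FAILED;
}

/* Example test bodies. */
static int finishes_quickly(void) { return 0; }
static int never_finishes(void) {
  for (;;)
    pause(); /* sleeps until a signal (here, SIGALRM) arrives */
}
```

With a 1-second timeout, `run_with_timeout(never_finishes, 1)` returns `TEST_TIMED_OUT` after about a second.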
|
|
Many link closers with no openers.
Many link openers with no closers.
Many emph openers with no closers.
|
|
Many closers with no openers.
|
|
When closers have no matching openers and cannot be openers themselves,
we can safely remove them from the delimiter stack.
This helps with a performance case: "a_ " * 20000.
See jgm/commonmark.js#43.
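The removal can be sketched on a doubly linked delimiter stack. A closer that finds no opener and cannot open will never be useful to a later closer. Unlinking it keeps later backward scans from walking over it. In `"a_ " * 20000` every `_` is such a closer, so the stack stays short. `struct delim`, `remove_delim`, and `process_closer` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical doubly linked delimiter stack entry. */
struct delim {
  char c;
  bool can_open;
  bool can_close;
  struct delim *prev;
  struct delim *next;
};

/* Unlinks d; updates *top if d was the top of the stack. */
static void remove_delim(struct delim **top, struct delim *d) {
  if (d->prev != NULL)
    d->prev->next = d->next;
  if (d->next != NULL)
    d->next->prev = d->prev;
  else
    *top = d->prev;
  d->prev = d->next = NULL;
}

/* Returns the nearest earlier opener for closer, or NULL. If there is none
   and the closer cannot open either, it is removed from the stack. */
static struct delim *process_closer(struct delim **top, struct delim *closer) {
  assert(closer->can_close);
  for (struct delim *o = closer->prev; o != NULL; o = o->prev)
    if (o->c == closer->c && o->can_open)
      return o; /* the caller would handle the match */
  if (!closer->can_open)
    remove_delim(top, closer);
  return NULL;
}
```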
|
|
This reverts commit 54d1249c2caebf45a24d691dc765fb93c9a5e594, reversing
changes made to bc14d869323650e936c7143dcf941b28ccd5b57d.
|
|
Further optimize utf8proc_valid
|
|
Assume a multi-byte sequence and rework the switch statement into if/else
for another 2% speedup.
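The structure described (an ASCII fast path, then an if/else chain over the lead byte that assumes a multi-byte sequence) can be sketched as a standalone validator. This is not cmark's `utf8proc_valid`. `utf8_valid_len` is a hypothetical name, and the byte ranges follow the well-formed UTF-8 table, which rejects overlong forms, surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length (1-4) of the valid UTF-8 sequence at the start of s,
   or -1 if it is invalid or truncated. len must be at least 1. */
static int utf8_valid_len(const unsigned char *s, size_t len) {
  assert(len >= 1);
  unsigned char b0 = s[0];
  if (b0 < 0x80)
    return 1; /* ASCII fast path */

  /* From here on, assume a multi-byte sequence and classify the lead
     byte with an if/else chain. */
  int n;
  unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the 2nd byte */
  if (b0 >= 0xC2 && b0 <= 0xDF) {
    n = 2;
  } else if (b0 >= 0xE0 && b0 <= 0xEF) {
    n = 3;
    if (b0 == 0xE0)
      lo = 0xA0; /* reject overlong encodings */
    else if (b0 == 0xED)
      hi = 0x9F; /* reject UTF-16 surrogates */
  } else if (b0 >= 0xF0 && b0 <= 0xF4) {
    n = 4;
    if (b0 == 0xF0)
      lo = 0x90; /* reject overlong encodings */
    else if (b0 == 0xF4)
      hi = 0x8F; /* reject code points above U+10FFFF */
  } else {
    return -1; /* continuation byte, 0xC0/0xC1, or 0xF5..0xFF */
  }

  if (len < (size_t)n)
    return -1; /* truncated sequence */
  if (s[1] < lo || s[1] > hi)
    return -1;
  for (int i = 2; i < n; i++)
    if ((s[i] & 0xC0) != 0x80)
      return -1; /* not a continuation byte */
  return n;
}
```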
|
|
Optimize utf8proc_detab
|
|
Speeds up "make bench" by another percent.
|
|
Handle valid UTF-8 chars inside the main loop and avoid a call to
strbuf_put for every UTF-8 char.
Results in an 8% speedup in the UTF-8-heavy "make bench" on my system.
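The batching idea can be sketched as follows. Bytes that need no rewriting, including whole multi-byte characters, accumulate in a pending run that is flushed with one `memcpy()` only when a tab forces a rewrite. The hypothetical `detab` below assumes valid UTF-8, a tab stop of 4, and one column per code point. It skips any handling of invalid input, and it is not cmark's `utf8proc_detab`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAB_STOP 4

/* Copies src to dst, expanding each tab to spaces up to the next multiple
   of TAB_STOP columns. Returns the number of bytes written. */
static size_t detab(const char *src, size_t len, char *dst, size_t dst_cap) {
  assert(dst_cap >= len * TAB_STOP); /* worst case: every byte is a tab */
  size_t out = 0, col = 0, run_start = 0;
  for (size_t i = 0; i < len; i++) {
    unsigned char b = (unsigned char)src[i];
    if (b == '\t') {
      memcpy(dst + out, src + run_start, i - run_start); /* flush pending run */
      out += i - run_start;
      size_t spaces = TAB_STOP - col % TAB_STOP;
      memset(dst + out, ' ', spaces);
      out += spaces;
      col += spaces;
      run_start = i + 1;
    } else if (b == '\n') {
      col = 0; /* the newline stays in the pending run */
    } else if ((b & 0xC0) != 0x80) {
      col++; /* ASCII or lead byte: a new code point begins */
    } /* continuation bytes neither advance col nor end the run */
  }
  memcpy(dst + out, src + run_start, len - run_start); /* final run */
  return out + (len - run_start);
}
```

For example, `"\xC3\xA9\tx"` becomes `"\xC3\xA9   x"`: the two-byte "é" counts as one column, so the tab expands to three spaces.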
|
|
Safer handling of string buffer sizes and indices
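One common pattern for safer buffer sizes is to check every size computation against the maximum of the size type before doing the arithmetic. The sketch below uses a signed 32-bit size type in the spirit of `bufsize_t`. `struct strbuf`, `BUFSIZE_MAX`, and `strbuf_reserve` are hypothetical and are not cmark's actual buffer code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical signed size type and its maximum. */
typedef int32_t bufsize_t;
#define BUFSIZE_MAX INT32_MAX

struct strbuf {
  unsigned char *ptr;
  bufsize_t asize; /* allocated bytes */
  bufsize_t size;  /* used bytes */
};

/* Ensures room for `extra` more bytes plus a NUL terminator. Returns 0 on
   success, -1 if the request is negative, too large, or realloc fails. */
static int strbuf_reserve(struct strbuf *buf, bufsize_t extra) {
  assert(buf->size >= 0 && buf->size <= buf->asize);
  if (extra < 0 || extra > BUFSIZE_MAX - buf->size - 1)
    return -1; /* size + extra + 1 would overflow */
  bufsize_t needed = buf->size + extra + 1;
  if (needed <= buf->asize)
    return 0;
  /* Grow by 1.5x, clamped so the addition itself cannot overflow. */
  bufsize_t new_size = buf->asize > BUFSIZE_MAX - buf->asize / 2
                           ? BUFSIZE_MAX
                           : buf->asize + buf->asize / 2;
  if (new_size < needed)
    new_size = needed;
  unsigned char *p = realloc(buf->ptr, (size_t)new_size);
  if (p == NULL)
    return -1;
  buf->ptr = p;
  buf->asize = new_size;
  return 0;
}
```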
|
|
Guard against too large chunks passed via the API.
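A guard of this kind typically sits where a caller's `size_t` length enters code that works with a 32-bit size type. The sketch below rejects oversized chunks before the narrowing cast, so the value can never be silently truncated. `bufsize_t` as `int32_t`, `checked_chunk_len`, and `feed_chunk` are hypothetical. The real guard may behave differently, for example by aborting instead of returning an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signed length type used inside the parser. */
typedef int32_t bufsize_t;

/* Converts a caller-supplied size_t length to bufsize_t, returning -1 if
   it does not fit. Checking before the cast prevents silent truncation
   where size_t is 64 bits. */
static bufsize_t checked_chunk_len(size_t len) {
  if (len > (size_t)INT32_MAX)
    return -1;
  return (bufsize_t)len; /* explicit cast: value known to fit */
}

/* Hypothetical API entry point built on the check. */
static int feed_chunk(const char *data, size_t len) {
  assert(data != NULL || len == 0);
  bufsize_t n = checked_chunk_len(len);
  if (n < 0)
    return -1; /* reject the chunk instead of misparsing it */
  /* ... a real parser would process n bytes of data here ... */
  return 0;
}
```

Keeping the narrowing in one checked helper also means there is a single explicit cast to audit, rather than implicit conversions scattered through the code.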
|
|
There are probably a couple of places I missed. But this will only
be a problem if we use a 64-bit `bufsize_t` at some point. Then, we'll
get warnings from `-Wshorten-64-to-32`.