path: root/src/references.c
Age | Commit message | Author
2020-02-16 | Add casts for MSVC10. | John MacFarlane
This is kivikakk's commit 62166fe3b6b07068ed4c4207113e3c4b060ad4a8 in cmark-gfm.
2020-02-16 | Fix #220 (hash collisions for references). | John MacFarlane
This commit ports Vicent Marti's fix in cmark-gfm (384cc9db4cd7a90f59c0751e58eb7b3023d38b85). His commit message follows:

As explained in the previous commit, it is trivial to DoS the CMark parser by generating a document where all the link reference names hash to the same bucket in the hash table. Each reference lookup then takes time linear in the number of references in the document, and with enough link references to look up, the result is a pathological O(N^2) that causes medium-sized documents to take 5+ minutes to parse.

To avoid this issue, we propose the present commit. Because all reference lookup/resolution in a Markdown document is always performed as the last step of the parse process, we've reimplemented reference storage as follows:

1. New references are always inserted at the end of a linked list. This is an O(1) operation, and it does not check whether a duplicate reference with the same label already exists in the document.
2. Upon the first call to `cmark_reference_lookup` (when no further references are expected to be added to the reference map), the linked list of references is written into a fixed-size array.
3. The fixed-size array can then be efficiently sorted in place in O(n log n). This operation happens only once. We perform this sort in a _stable_ manner to ensure that the earliest link reference in the document always takes precedence, as the spec dictates. To accomplish this, every reference is tagged with a generation number when it is first inserted into the linked list.
4. The sorted array is then compacted in O(n). Since it was sorted stably, the first reference for each label is preserved and the duplicates are removed, matching the spec.
5. We can now simply perform a binary search for the current `cmark_reference_lookup` query in O(log n). Any further lookup calls are also O(log n), since the sorted reference table only needs to be generated once.

The resulting implementation is notably simple (it uses the standard library builtins `qsort` and `bsearch`), while performing better than the fixed-size hash table in documents with many references and never becoming pathological, regardless of the input.
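The five steps in the commit message can be sketched in C roughly as follows. This is a simplified illustration, not cmark's actual `references.c`: the `ref`/`ref_map` types and the `ref_insert`, `ref_lookup` and `ref_map_free` functions are hypothetical, labels are assumed to be normalized (case-folded, whitespace-collapsed) before insertion, and memory comes from plain `malloc`/`free` rather than a `cmark_mem` allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types (not cmark's real structs). */
typedef struct ref {
    char *label;            /* assumed already normalized */
    char *url;
    unsigned int age;       /* generation number: insertion order */
    struct ref *next;
} ref;

typedef struct {
    ref *head, *tail;       /* step 1: append-only linked list */
    size_t count;           /* list length, duplicates included */
    unsigned int next_age;
    int finalized;          /* set on the first lookup (step 2) */
    ref **sorted;           /* steps 2-4: sorted, deduplicated array */
    size_t nsorted;
} ref_map;

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

/* Step 1: O(1) append; duplicate labels are NOT checked here. */
int ref_insert(ref_map *map, const char *label, const char *url) {
    assert(!map->finalized && "references must be added before the first lookup");
    ref *r = calloc(1, sizeof *r);
    if (!r) return -1;
    r->label = dup_str(label);
    r->url = dup_str(url);
    if (!r->label || !r->url) { free(r->label); free(r->url); free(r); return -1; }
    r->age = map->next_age++;
    if (map->tail) map->tail->next = r; else map->head = r;
    map->tail = r;
    map->count++;
    return 0;
}

/* Order by label, then by age. qsort is not guaranteed to be stable, so the
   age tie-break is what keeps the earliest definition first among equals. */
static int cmp_refs(const void *a, const void *b) {
    const ref *ra = *(ref *const *)a, *rb = *(ref *const *)b;
    int c = strcmp(ra->label, rb->label);
    if (c != 0) return c;
    return (ra->age > rb->age) - (ra->age < rb->age);
}

/* Steps 2-4: copy list into an array, sort once, drop later duplicates. */
static int ref_finalize(ref_map *map) {
    size_t n = 0, kept = 0;
    if (map->count > 0) {
        map->sorted = malloc(map->count * sizeof *map->sorted);
        if (!map->sorted) return -1;
        for (ref *r = map->head; r; r = r->next) map->sorted[n++] = r;
        qsort(map->sorted, n, sizeof *map->sorted, cmp_refs);
        for (size_t i = 0; i < n; i++) {
            if (kept > 0 && strcmp(map->sorted[kept - 1]->label, map->sorted[i]->label) == 0)
                continue;   /* duplicate stays in the list, so it is still freed */
            map->sorted[kept++] = map->sorted[i];
        }
    }
    map->nsorted = kept;
    map->finalized = 1;
    return 0;
}

static int cmp_key(const void *key, const void *elem) {
    return strcmp((const char *)key, (*(ref *const *)elem)->label);
}

/* Step 5: binary search; the table is built only on the first call. */
const ref *ref_lookup(ref_map *map, const char *label) {
    if (!label || !*label) return NULL;
    if (!map->finalized && ref_finalize(map) != 0) return NULL;
    if (map->nsorted == 0) return NULL;   /* bsearch needs a valid base */
    ref **hit = bsearch(label, map->sorted, map->nsorted, sizeof *map->sorted, cmp_key);
    return hit ? *hit : NULL;
}

void ref_map_free(ref_map *map) {
    ref *r = map->head;
    while (r) { ref *next = r->next; free(r->label); free(r->url); free(r); r = next; }
    free(map->sorted);
    memset(map, 0, sizeof *map);
}
```

Inserting `foo`, `bar`, `foo` and then looking up `foo` returns the first definition, and the compacted table holds two entries. The age field stands in for a stable sort because the C standard makes no stability guarantee for `qsort`.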
2020-01-23 | Use C string instead of chunk for link URL and title | Nick Wellnhofer
Use zero-terminated C strings instead of cmark_chunks, without storing the length. This introduces a few additional strlen computations, but the overhead should be low. It allows the size of struct cmark_node to be reduced later.
2016-09-26 | Use cmark_mem to free where used to alloc | Yuki Izumi
2016-06-24 | Reformatted. | John MacFarlane
2016-06-22 | cmark_reference_lookup: Return NULL if reference is null string. | John MacFarlane
2016-06-06 | msvc: Fix warnings and errors | Vicent Marti
2016-06-06 | cmark: Implement support for custom allocators | Vicent Marti
2016-06-06 | cmake: Global handler for OOM situations | Vicent Marti
2015-08-06 | Prefix utf8proc functions to avoid conflict with existing library | Kevin Wojniak
2015-07-27 | Use clang-format, llvm style, for formatting. | John MacFarlane
* Reformatted all source files.
* Added 'format' target to Makefile.
* Removed 'astyle' target.
* Updated .editorconfig.
2015-05-14 | Store link URL and title as cmark_chunk | Nick Wellnhofer
2015-01-05 | Reformatted code consistently with astyle. | John MacFarlane
2014-12-15 | Re-added cmark_ prefix to strbuf and chunk. | John MacFarlane
Reverts 225d720.
2014-11-28 | Use prefixed names for symbols from references.h | Nick Wellnhofer
2014-11-28 | Use prefixed names for symbols from inlines.h | Nick Wellnhofer
2014-11-17 | Rename ast.h to parser.h | Nick Wellnhofer
2014-11-16 | Cast void pointers explicitly | Nick Wellnhofer
Needed for C++ compatibility.
2014-11-16 | Moved AST details from public header cmark.h to private ast.h. | John MacFarlane
2014-11-13 | Removed ast modules, moved these defs back to cmark.h. | John MacFarlane
2014-11-09 | Added MAX_LINK_LABEL_LENGTH to cmark.h. | John MacFarlane
Used in link label parsing and reference lookup.
2014-11-06 | Reformatted code consistently. | John MacFarlane
2014-10-24 | Renamed C program and library stmd -> cmark. | John MacFarlane
Also renamed internal library functions accordingly.
2014-10-24 | Merge branch 'master' of https://github.com/tchetch/stmd into tchetch-master | John MacFarlane
Conflicts: src/inlines.c
2014-10-24 | Use unsigned char, not char, throughout. | John MacFarlane
Closes #43.
2014-10-18 | Reindented C sources. | John MacFarlane
2014-10-06 | Use calloc instead of malloc; test for NULL after allocation | tchetch
2014-09-15 | Cleanup external APIs | Vicent Marti
2014-09-10 | Do not create references with empty names | Vicent Marti
2014-09-10 | Fix misc bugs | Vicent Marti
2014-09-10 | Cleanup reference implementation | Vicent Marti