Age | Commit message | Author
2020-03-03 | Skip UTF-8 BOM if present at beginning of buffer. | John MacFarlane
Closes #334.
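A minimal sketch in C of what skipping a byte order mark involves. The helper name `skip_utf8_bom` is hypothetical and is not taken from the cmark sources; it only illustrates the check for the three-byte sequence EF BB BF at the start of a buffer.

```c
#include <stddef.h>

/* Hypothetical helper (not cmark's actual function): returns the number
 * of leading bytes to skip, which is 3 when the buffer starts with the
 * UTF-8 byte order mark EF BB BF and 0 otherwise. */
static size_t skip_utf8_bom(const unsigned char *buf, size_t len) {
  if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
    return 3;
  }
  return 0;
}
```

A caller would advance its read pointer by the returned offset before handing the buffer to the parser.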
2020-02-16 | Add casts for MSVC10. | John MacFarlane
This is kivikakk's commit 62166fe3b6b07068ed4c4207113e3c4b060ad4a8 in cmark-gfm.
2020-02-16 | Fix #220 (hash collisions for references). | John MacFarlane
This commit ports Vicent Marti's fix in cmark-gfm. (384cc9db4cd7a90f59c0751e58eb7b3023d38b85) His commit message follows:

As explained on the previous commit, it is trivial to DoS the CMark parser by generating a document where all the link reference names hash to the same bucket in the hash table. This will cause the lookup process for each reference to take linear time on the amount of references in the document, and with enough link references to lookup, the end result is a pathological O(N^2) that causes medium-sized documents to finish parsing in 5+ minutes.

To avoid this issue, we propose the present commit. Based on the fact that all reference lookup/resolution in a Markdown document is always performed as a last step during the parse process, we've reimplemented reference storage as follows:

1. New references are always inserted at the end of a linked list. This is an O(1) operation, and does not check whether an existing (duplicate) reference with the same label already exists in the document.
2. Upon the first call to `cmark_reference_lookup` (when it is expected that no further references will be added to the reference map), the linked list of references is written into a fixed-size array.
3. The fixed size array can then be efficiently sorted in-place in O(n log n). This operation only happens once. We perform this sort in a _stable_ manner to ensure that the earliest link reference in the document always has preference, as the spec dictates. To accomplish this, every reference is tagged with a generation number when initially inserted in the linked list.
4. The sorted array is then compacted in O(n). Since it was sorted in a stable way, the first reference for each label is preserved and the duplicates are removed, matching the spec.
5. We can now simply perform a binary search for the current `cmark_reference_lookup` query in O(log n). Any further lookup calls will also be O(log n), since the sorted references table only needs to be generated once.

The resulting implementation is notably simple (as it uses standard library builtins `qsort` and `bsearch`), whilst performing better than the fixed size hash table in documents that have a high number of references and never becoming pathological regardless of the input.
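The five steps described in this commit message can be sketched in C roughly as follows. This is an illustration under stated assumptions, not cmark's actual code: the `ref`, `ref_map`, and `ref_map_*` names are hypothetical, labels are assumed to be already normalized, and because `qsort` is not guaranteed to be stable, the sketch breaks ties on the insertion-order `age` field to get the stable ordering the message calls for.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only. */
typedef struct ref {
  struct ref *next;
  const char *label; /* assumed to be already normalized */
  const char *url;
  unsigned int age;  /* generation number: order of insertion */
} ref;

typedef struct {
  ref *head, *tail;
  ref **sorted; /* built lazily on the first lookup */
  size_t size;  /* number of references inserted */
  size_t count; /* number of unique labels after compaction */
} ref_map;

/* Step 1: O(1) append with no duplicate check. */
static int ref_map_add(ref_map *map, const char *label, const char *url) {
  ref *r = calloc(1, sizeof(*r));
  if (!r) return 0;
  r->label = label;
  r->url = url;
  r->age = (unsigned int)map->size++;
  if (map->tail) map->tail->next = r; else map->head = r;
  map->tail = r;
  return 1;
}

/* Order by label, then by age, so the earliest duplicate sorts first. */
static int ref_cmp(const void *a, const void *b) {
  const ref *ra = *(ref *const *)a;
  const ref *rb = *(ref *const *)b;
  int c = strcmp(ra->label, rb->label);
  if (c) return c;
  return (ra->age > rb->age) - (ra->age < rb->age);
}

/* bsearch comparator: key is a label, element is a pointer to a ref. */
static int label_cmp(const void *key, const void *elem) {
  const ref *r = *(ref *const *)elem;
  return strcmp((const char *)key, r->label);
}

/* Steps 2-4: copy the list into an array, sort it, drop later duplicates. */
static void ref_map_sort(ref_map *map) {
  size_t i = 0, last = 0;
  ref **arr = malloc(map->size * sizeof(*arr));
  if (!arr) return;
  for (ref *r = map->head; r; r = r->next) arr[i++] = r;
  qsort(arr, map->size, sizeof(*arr), ref_cmp);
  for (i = 1; i < map->size; i++) {
    if (strcmp(arr[i]->label, arr[last]->label) != 0) arr[++last] = arr[i];
  }
  map->sorted = arr;
  map->count = last + 1;
}

/* Step 5: O(log n) lookup; the table is built once, on the first call.
 * The sketch assumes no references are added after that point. */
static const char *ref_map_lookup(ref_map *map, const char *label) {
  ref **hit;
  if (map->size == 0) return NULL;
  if (!map->sorted) ref_map_sort(map);
  if (!map->sorted) return NULL;
  hit = bsearch(label, map->sorted, map->count, sizeof(*map->sorted),
                label_cmp);
  return hit ? (*hit)->url : NULL;
}

/* Release the list nodes and the sorted table. */
static void ref_map_free(ref_map *map) {
  ref *r = map->head;
  while (r) {
    ref *next = r->next;
    free(r);
    r = next;
  }
  free(map->sorted);
  memset(map, 0, sizeof(*map));
}
```

With a zero-initialized `ref_map`, adding `foo`, `bar`, and then a second `foo` leaves two unique labels after the first lookup, and the lookup of `foo` returns the URL of the first definition.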
This is taken from GitHub's fix: https://github.com/github/cmark-gfm/commit/66a0836dc91e1653f7931e1218446664493da520
2020-02-11 | Update date on cmark.1. | John MacFarlane
2020-02-11 | cmark.1 - Document --unsafe instead of --safe. | John MacFarlane
Closes #332.
2020-02-11 | cmark.1: remove docs for `--normalize` which no longer exists. | John MacFarlane
See #332
2020-02-09 | Add cmark_get_default_mem_allocator(). | John MacFarlane
API change: This adds a new exported function in cmark.h. Closes #330.
2020-01-25 | Fix URL check in is_autolink | Nick Wellnhofer
In a recent commit, the check was changed to strcmp, but we really have to use strncmp.
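To illustrate why `strncmp` is needed for this kind of check, here is a small sketch in C. The helper `has_prefix` and the `mailto:` example string are assumptions for illustration and are not copied from `is_autolink`: `strcmp` only reports equality when the whole strings match, whereas a prefix test must compare just the first `strlen(prefix)` bytes.

```c
#include <string.h>

/* Hypothetical helper: returns nonzero when `s` begins with `prefix`.
 * strncmp stops after strlen(prefix) bytes, so trailing text in `s`
 * does not affect the result. */
static int has_prefix(const char *s, const char *prefix) {
  return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

For a string such as `mailto:a@b.c`, `has_prefix(s, "mailto:")` is true, while `strcmp(s, "mailto:") == 0` is false because the strings differ after the prefix.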
2020-01-25 | Fix null pointer deref in is_autolink | Nick Wellnhofer
Introduced by a recent commit. Found by OSS-Fuzz.
2020-01-24 | build: substitute the path into the generated files | Saleem Abdulrasool
This resorts to variable substitution to ensure that the embedded path is correct. Without this, the path is the one at the time of the configuration. In the case of the Swift project, this ended up searching in the *source* directory rather than the *build* directory. This will ensure that we export the file to an absolute location and we use the same location in the `cmarkConfig.cmake` file by means of CMake's `configure_file` substitution.
2020-01-23 | build: use absolute path for cmarkTargets.cmake | Saleem Abdulrasool
Adjust the include of the CMake file to use a cmarkConfig.cmake relative location which enables use without considerations for the path.
2020-01-23 | Rearrange struct cmark_node | Nick Wellnhofer
Introduce multi-purpose data/len members in struct cmark_node. This is mainly used to store literal text for inlines, code and HTML blocks. Move the content strbuf for blocks from cmark_node to cmark_parser. When finalizing nodes that allow inlines (paragraphs and headings), detach the strbuf and store the block content in the node's data/len members. Free the block content after processing inlines. Reduces size of struct cmark_node by 8 bytes.
2020-01-23 | Improve packing of struct cmark_list | Nick Wellnhofer
Allows reducing the size of struct cmark_node later.
2020-01-23 | Use C string instead of chunk in renderer | Nick Wellnhofer
Fix another place where an "allocated" cmark_chunk was used.
2020-01-23 | Use C string instead of chunk for literal text | Nick Wellnhofer
Use zero-terminated C strings and a separate length field instead of cmark_chunks. Literal inline text will now be copied from the parent block's content buffer, slowing the benchmark down by 10-15%. The node struct never references memory of other nodes now, fixing #309. Node accessors don't have to check for delayed creation of C strings, so parsing and iterating all literals using the public API should actually be faster than before.
2020-01-23 | Use C string instead of chunk for custom block contents | Nick Wellnhofer
Reduces size of struct cmark_node by 8 bytes.
2020-01-23 | Use C string instead of chunk for link URL and title | Nick Wellnhofer
Use zero-terminated C strings instead of cmark_chunks without storing the length. This introduces a few additional strlen computations, but overhead should be low. Allows reducing the size of struct cmark_node later.
2020-01-23 | Use C string instead of chunk for code info and literal | Nick Wellnhofer
Use zero-terminated C strings instead of cmark_chunks without storing the length. The length of code literals will be re-added in a later commit. strlen overhead for code info should be negligible. Reduces size of struct cmark_node by 8 bytes.
2020-01-23 | Helper function to set C strings in nodes | Nick Wellnhofer
2020-01-15 | Fix pathological_tests.py on Windows | Nick Wellnhofer
When using multiprocessing on Windows, the main program must be guarded with a __name__ check.
2020-01-15 | Remove useless __name__ check in test scripts | Nick Wellnhofer
These checks don't seem to be required and broke pathological_tests.py on Windows where multiprocessing sets __name__ to "__mp_main__".
2020-01-15 | Remove unused variable | Nick Wellnhofer
2020-01-15 | Reintroduce version check for MSVC /TP flag | Nick Wellnhofer
The flag is only required for old MSVC versions.
2020-01-11 | Fix CMake generator expression checking for MSVC | Nick Wellnhofer
2020-01-10 | commonmark renderer: better escaping in smart mode. | John MacFarlane
When CMARK_OPT_SMART is enabled, we escape literal `-`, `.`, and quote characters when needed to avoid their being "smartified." See e.g. jgm/pandoc#6041 for an application.
2020-01-10 | Add options field to cmark_renderer. | John MacFarlane
This is an internal change, as this isn't part of the public API.
2020-01-05 | Move C_VISIBILITY_PRESET back to src/CMakeLists.txt. | John MacFarlane
This reverts a change by @compnerd in commit b6ffaca93e2b539ec407aeb4fd588c7f9441e7a9. We don't want this for api_tests, as it triggers this warning:

```
CMake Warning (dev) at api_test/CMakeLists.txt:1 (add_executable):
  Policy CMP0063 is not set: Honor visibility properties for all target
  types. Run "cmake --help-policy CMP0063" for policy details. Use the
  cmake_policy command to set the policy and suppress this warning.

  Target "api_test" of type "EXECUTABLE" has the following visibility
  properties set for C:

    C_VISIBILITY_PRESET

  For compatibility CMake is not honoring them for this target.
This warning is for project developers. Use -Wno-dev to suppress it.
```
2020-01-05 | commonmark.c - use size_t instead of int. | John MacFarlane
2020-01-05 | Include string.h in cmark-fuzz.c. | John MacFarlane
Recommended by build log at https://oss-fuzz-build-logs.storage.googleapis.com/log-6a7500a1-8617-42c6-b8e4-78cab009b5b5.txt
2020-01-03 | fix -Wconst-qual warning | Saleem Abdulrasool
The string literal being assigned is const, but the assignment loses the constness of this string. This enables building with `/Zc:strictString` with MSVC as well.
2020-01-02 | build: add exports targets for build tree usage | Saleem Abdulrasool
This enables the use of the export targets from the build tree to allow easy use of the CMark library in other projects. Resolves: #307
2020-01-02 | build: use target properties for include paths | Saleem Abdulrasool
This configures the target to set up the include paths publicly for the library targets in the build interface. This enables use of the targets in the build tree without having to specify the include directories. This is particularly useful for use in the export targets, but also simplifies the rules for the API tests. The install interface does not need the include directories as `cmark.h` is installed into `include` which is a default include path.
2020-01-02 | build: chmod -x CMakeLists.txt (NFC) | Saleem Abdulrasool
Remove the unnecessary execute permission on CMakeLists.txt.
2020-01-02 | build: reduce property computation in CMake | Saleem Abdulrasool
This reduces the work that CMake needs to do to configure the libraries by setting all the properties at once.
2020-01-02 | build: use `CMAKE_INCLUDE_CURRENT_DIRECTORY` | Saleem Abdulrasool
This uses the CMake mechanism for including the current source and binary directories. This avoids the custom handling for this.
2020-01-02 | build: improve man page installation | Saleem Abdulrasool
man pages are extremely useful, but are not generally available on Windows. This changes the install condition to check for the Windows cross-compile rather than the toolchain in use. It is possible to build for Windows using clang in the GNU driver.
2020-01-02 | build: only include GNUInstallDirs once | Saleem Abdulrasool
Avoid including the utility more than once, which should avoid some unnecessary CMake checks and reduces duplication.
2019-12-26 | build: replace `add_compile_definitions` (#321) | Saleem Abdulrasool
Replace `add_compile_definitions` with `add_compile_options` since the former was introduced in 3.12.
2019-12-22 | build: cleanup CMake (#319) | Saleem Abdulrasool
* build: inline a variable
* build: use `LINKER_LANGUAGE` property for C++ runtime
  Rather than explicitly name the C++ runtime, use the `LINKER_LANGUAGE` property to use the driver to spell the C++ runtime appropriately.
* build: use CMake to control C standard
  Rather than use compiler specific flags to control the language standard, indicate to CMake the desired standard.
* build: use the correct variable
  These flags are being applied to the *C* compiler, check the C compiler, not the C++ compiler.
* build: loosen the compiler check
  This loosens the compiler identifier check to enable matching AppleClang which is the identifier for the Xcode compiler.
* build: hoist shared flags to top-level CMakeLists
  This hoists the common shared flags handling to the top-level CMakeLists from sub-layers. This prevents the duplication of the handling.
* build: remove duplicated flags
  This is unnecessary, `/TP` is forced on all MSVC builds, no need to duplicate the flag for older versions.
* build: loosen C compiler identifier check
  Loosen the check to a match rather than equality check, this allows it to match AppleClang which is the identifier for the Apple vended clang compiler part of Xcode.
* build: use `add_compile_options`
  Use `add_compile_options` rather than modify `CMAKE_C_FLAGS`. The latter is meant to be only modified by the user, not the package developer.
* build: hoist sanitizer flags to global state
  This moves the CMAKE_C_FLAGS handling to the top-level and uses `add_compile_options` rather than modifying the user controlled flags.
* build: hoist `-fvisibility` flags to top-level
  These are global settings, hoist them to the top level.
* build: hoist the debug flag handling
  Use a generator expression and hoist the flag handling for the debug build.
* build: hoist the profile flag handling
  This is a global flag, hoist it to the top level and use `add_compile_options` rather than modify the user controlled flags.
* build: remove incorrect variable handling
  This seemed to be attempting to set the linker not the linker flags for the profile configuration. This variable is not used, do not set it.
* build: remove unused CMake includes
2019-12-21 | Commonmark renderer: always use fences for code (#317). | John MacFarlane
This solves problems with adjacent code blocks being merged.
2019-12-21 | Ensure that consecutive indented code blocks aren't merged... | John MacFarlane
by inserting an HTML comment. Closes #317. I think I'll follow up with a change to use fenced code blocks, but this was the minimal fix.
2019-12-19 | Improve rendering of commonmark code spans with spaces. | John MacFarlane
Closes #316.
2019-11-27 | normalize.py: use html.escape instead of cgi.escape. | John MacFarlane
Closes #313.
2019-11-11 | Cleaner approach to max digits for numeric entities. | John MacFarlane
This modifies unescaping in houdini_html_u.c rather than the entity handling in inlines.c. Unlike the other, this approach works also in e.g. link titles.
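A digit limit applied at the unescaping stage might look like the following C sketch. The function `parse_numeric_entity` is hypothetical and is not the code in houdini_html_u.c; the limits of 7 decimal and 6 hexadecimal digits are taken from the CommonMark spec's definition of numeric character references, and it is an assumption here that these are the limits the commit enforces.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper. `src` points just past "&#". On success, stores
 * the code point in *out and returns the number of bytes consumed,
 * including the closing ';'. Returns 0 if the text is not a numeric
 * character reference within the digit limit. */
static size_t parse_numeric_entity(const char *src, size_t len,
                                   unsigned long *out) {
  size_t i = 0, digits = 0, max_digits;
  unsigned long cp = 0;
  int hex = 0;

  if (i < len && (src[i] == 'x' || src[i] == 'X')) {
    hex = 1;
    i++;
  }
  /* Assumed limits, per the CommonMark spec: 7 decimal or 6 hex digits. */
  max_digits = hex ? 6 : 7;

  while (i < len && digits < max_digits) {
    unsigned char c = (unsigned char)src[i];
    if (hex && isxdigit(c)) {
      cp = cp * 16 +
           (unsigned long)(isdigit(c) ? c - '0' : (c | 32) - 'a' + 10);
    } else if (!hex && isdigit(c)) {
      cp = cp * 10 + (unsigned long)(c - '0');
    } else {
      break;
    }
    digits++;
    i++;
  }

  /* Too many digits leaves a digit where ';' is required, so we fail. */
  if (digits == 0 || i >= len || src[i] != ';') return 0;
  *out = cp;
  return i + 1;
}
```

Because the check lives in a general unescaping routine rather than in inline parsing, the same limit would apply wherever that routine is called, which is the benefit the commit message describes for link titles.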
2019-11-11 | Fix entity parser (and api test) to respect length limit on numeric entities. | John MacFarlane
2019-11-11 | Code reformat | John MacFarlane
2019-11-11 | Don't allow link destinations with unbalanced unescaped parentheses. | John MacFarlane
See commonmark/commonmark.js#177.
2019-11-11 | Update spec.txt. | John MacFarlane
2019-10-14 | Create FUNDING.yml | John MacFarlane