Proposal: markdown parsing for react-native-render-html
This is a follow-up on a request from @roryabraham. I have investigated the options for integrating Markdown parsing a little further, and here are my findings:
- Markdown parsing is hard. While XML and HTML markup are not context-sensitive, Markdown is. So implementing my own tokenizer is a no-go for a production-ready deliverable (although I might try the challenge as a hobby at some point).
- Changing the structure of the DOM would bring too many breaking changes, so using remarkjs is a no-go.
- The solution would be an intermediate one: use an open-source, well-tested Markdown tokenizer and plug it into the htmlparser2 Tokenizer class to emit a DOM.
Choosing a Tokenizer
I’ve forked this benchmark to add micromark (via remark-html), which I found very well structured and solid. Below are my findings (Intel i7-8809G, 32 GB of RAM, Node.js 14.16.0).
[Benchmark charts not preserved: Average Ops per second, Minmax parse time, Average Throughput]
Conclusion
markdown-it is the clear winner, since there is no official WebAssembly support in React Native. Other advantages:
- Great ecosystem with many plugins and GFM support including emojis;
- Safe by default;
- Great maintenance metrics (5 open issues).
Implementation Plan
Get inspiration from MarkdownIt.Renderer: consume a token tree from MarkdownIt.parse and invoke the corresponding htmlparser2 callbacks while walking the tree.
I’ll also need some help to assess which features you want to enable for Expensify.cash.
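The walk described in this plan could look roughly like the sketch below. It is only an illustration under stated assumptions: `MdToken` and `DomCallbacks` are local stand-ins (hypothetical names) whose fields and methods mirror markdown-it's `Token` shape and the `onopentag` / `ontext` / `onclosetag` callbacks of an htmlparser2 handler, and `walkTokens` is a made-up helper, not part of either library. Images' alt text, raw HTML tokens and fenced code wrappers are deliberately not handled.

```typescript
// Local stand-ins so the sketch runs without dependencies (assumption: the
// field names mirror markdown-it's Token, the callbacks mirror the methods
// htmlparser2 invokes on a DOM-building handler).
interface MdToken {
  type: string;
  tag: string;
  attrs: [string, string][] | null;
  nesting: 1 | 0 | -1;
  content: string;
  children: MdToken[] | null;
}

interface DomCallbacks {
  onopentag(name: string, attribs: Record<string, string>): void;
  ontext(data: string): void;
  onclosetag(name: string): void;
}

// Convert markdown-it style [name, value] pairs into an attribute map.
function attrsToRecord(attrs: [string, string][] | null): Record<string, string> {
  const record: Record<string, string> = {};
  for (const [name, value] of attrs ?? []) {
    record[name] = value;
  }
  return record;
}

// Hypothetical helper: walk the token stream and replay it as DOM callbacks.
function walkTokens(tokens: MdToken[], handler: DomCallbacks): void {
  for (const token of tokens) {
    if (token.nesting === 1) {
      // Opening token such as heading_open or em_open.
      handler.onopentag(token.tag, attrsToRecord(token.attrs));
    } else if (token.nesting === -1) {
      // Closing token such as heading_close or em_close.
      handler.onclosetag(token.tag);
    } else if (token.type === "inline") {
      // Inline containers carry their content as child tokens.
      walkTokens(token.children ?? [], handler);
    } else if (token.tag === "") {
      // Tag-less leaf (plain text). Simplification: raw HTML tokens would
      // also land here and be emitted as text.
      handler.ontext(token.content);
    } else {
      // Self-contained leaf with a tag, e.g. code_inline or hr.
      handler.onopentag(token.tag, attrsToRecord(token.attrs));
      if (token.content !== "") {
        handler.ontext(token.content);
      }
      handler.onclosetag(token.tag);
    }
  }
}
```

In a real integration, the `handler` argument would be the object that builds the DOM, and the token array would come from `MarkdownIt.parse`; both are replaced by stand-ins here so the walking logic can be read in isolation.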
Package Design
I need to think of a new package design since I don’t want @native-html/core to depend directly on markdown-it.
Testing Strategy
The parser will be tested against the official commonmark-spec repository.
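A spec-driven test loop could be sketched as follows. This is a minimal illustration, assuming the spec examples have already been extracted into Markdown/HTML pairs and that the DOM produced by the parser can be serialized back to an HTML string; `SpecExample`, `SpecReport`, `runSpec` and the `render` parameter are hypothetical names, not part of any existing package.

```typescript
// Assumed shape of one extracted spec example (hypothetical type).
interface SpecExample {
  example: number;  // example number within the spec
  section: string;  // spec section the example belongs to
  markdown: string; // input
  html: string;     // expected output
}

interface SpecReport {
  passed: number;
  failed: SpecExample[];
}

// Run every example through `render` and compare with the expected HTML.
// `render` stands for "parse Markdown to a DOM, then serialize it to HTML".
function runSpec(
  examples: SpecExample[],
  render: (markdown: string) => string
): SpecReport {
  const report: SpecReport = { passed: 0, failed: [] };
  for (const example of examples) {
    if (render(example.markdown) === example.html) {
      report.passed += 1;
    } else {
      report.failed.push(example);
    }
  }
  return report;
}

// Group failures by section to see which parts of the spec need work.
function failuresBySection(report: SpecReport): Record<string, number[]> {
  const bySection: Record<string, number[]> = {};
  for (const example of report.failed) {
    (bySection[example.section] ??= []).push(example.example);
  }
  return bySection;
}
```

Comparing strings exactly is the strictest option; a real suite might normalize whitespace or compare DOM trees instead, which is a design choice left open here.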
Issue Analytics
- Created 2 years ago
- Comments: 11 (2 by maintainers)
@robertjchen Thanks for pointing that out! I must confess that I’m not proficient in WASM or asm.js. After skimming markdown-wasm, I realize that it’s basically a wrapper around md4c, compiled to WASM. For our use case, we would need to get the parse tree generated by md4c and access that tree in memory through some JS bindings. I don’t know if WASM offers this capability! Nonetheless, JSI (JavaScript Interface) offers this feature with C/C++, so that would probably be a better fit! I am not a C/C++ developer (although I’d love to get my hands on it), but I find the prospect very interesting. The holy grail would be a transient render engine written in C++ with JSI bindings, but that is clearly a huge endeavor and obviously out of scope 🤣
To help me understand, can you confirm:
- Are you sure WASM isn’t supported? @robertjchen is planning on using it for encryption, I think.
- Can you benchmark this against our current custom markdown>html parser?
- I thought the idea was to translate directly from Markdown to DOM, skipping HTML entirely. But this seems to be just benchmarking various open-source markdown>html converters. Can you restate the goal?
Thanks!