RFC: TSDoc-flavored-Markdown (TSFM) instead of CommonMark

Based on the problems encountered in the issue #12 thread, we have concluded that TSDoc cannot reasonably be based directly on the CommonMark spec. The two have conflicting goals:

  • CommonMark goal: (“common” = union) Provide a standardized algorithm for parsing every familiar markup notation. It’s okay if the resulting syntax rules are impossible for humans to memorize, because mistakes can be easily corrected using the editor’s interactive preview. If a syntax is occasionally misinterpreted, the consequence is incorrect formatting on the web site, which is a relatively minor issue.

  • TSFM goal: (“common” = intersection) Provide a familiar syntax that is very easy for humans to memorize, so that a layperson can predict exactly how their markup will be rendered (by every possible downstream doc pipeline). Computer source code is handled by many different viewers which may not support interactive preview. If a syntax is occasionally misinterpreted, the consequence is that a tag such as @beta or @internal may be ignored by the parser, which could potentially cause a serious issue (e.g. an escalation from an enterprise customer whose service was interrupted because of a broken API contract).

Hypothesis: For every TSFM construct, there exists a normalized form that will be parsed identically by CommonMark and TSDoc. In “strict mode” the TSDoc library can issue warnings for expressions that are not in normalized form. Assuming the author eliminates all such warnings, a documentation pipeline can pass unmodified TSDoc content through to a backend CommonMark engine and have confidence that the rendered output will be correct.

Below are some proposed TSFM restrictions:

Whitespace generally doesn’t matter

This principle is very easy for people to remember, and eliminates a ton of edge cases.

Example 1:

/**
 * TSFM considers this to be an HTML element, whereas CommonMark does not:
 * <element attribute="@tag"
 *
 * />
 */

Example 1 converted to normalized form (so CommonMark interprets it the same as TSDoc):

/**
 * TSFM considers this to be an HTML element, whereas CommonMark does not:
 * <element attribute="@tag"
 * />
 */

Example 2:

/**
 * CommonMark interprets this indentation to make a code block, TSFM sees rich markup:
 * 
 *     **bold** @tag
 */

Example 2 converted to normalized form (so CommonMark interprets it the same as TSDoc):

/**
 * CommonMark interprets this indentation to make a code block, TSFM sees rich markup:
 * 
 * **bold** @tag
 */
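
As a rough illustration of how a “strict mode” check for Example 2 might look, the sketch below flags lines that CommonMark would likely read as an indented code block. It is not part of TSDoc: the names `IndentWarning` and `findIndentedLines` are hypothetical, the input is assumed to be comment content with the `/**`, ` * ` and ` */` delimiters already stripped, and every whitespace character is counted as one column.

```typescript
// Hypothetical warning shape; not a TSDoc API.
interface IndentWarning {
  line: number; // 1-based line number within the comment content
  message: string;
}

// Flags a non-blank line that is indented by 4+ columns and follows a blank
// line (or starts the comment), since CommonMark would likely treat it as the
// start of an indented code block while TSFM ignores the indentation.
function findIndentedLines(contentLines: string[]): IndentWarning[] {
  const warnings: IndentWarning[] = [];
  let previousBlank = true; // the start of the comment acts like a blank line
  contentLines.forEach((text, index) => {
    const indent = text.search(/\S/); // -1 when the line is blank
    const blank = indent === -1;
    if (!blank && previousBlank && indent >= 4) {
      warnings.push({
        line: index + 1,
        message: "Indentation of 4+ columns may be parsed as a code block by CommonMark",
      });
    }
    previousBlank = blank;
  });
  return warnings;
}
```

Run against the content of Example 2, this reports the `**bold** @tag` line; run against the normalized form, it reports nothing.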

Stars cannot be nested arbitrarily

TSDoc will support stars for bold/italics, based on 6 types of tokens that can be recognized by the lexical analyzer with minimal lookahead:

  • Opening italics single-star, e.g. *text is interpreted as <i>text
  • Closing italics single-star, e.g. text* is interpreted as text</i>
  • Opening bold double-star, e.g. **text is interpreted as <b>text
  • Closing bold double-star, e.g. text** is interpreted as text</b>
  • Opening bold+italics triple-star, e.g. ***text is interpreted as if <b+i>text
  • Closing bold+italics triple-star, e.g. text*** is interpreted as if text</b+i>

Other patterns are NOT interpreted as star tokens, e.g. text * text * contains literal asterisks, as does ****a****. A letter in the middle of a word can never be styled using stars, e.g. Toys*R*Us contains literal asterisk characters. A single-star followed by a double-star can be closed by a triple-star (e.g. *italics **bold+italics*** is seen as <i>italics<b>bold+italics</b+i>). Star markup is prohibited from spanning multiple lines.

Other characters (e.g. underscore) are NOT supported by TSDoc as synonyms for bold/italics.
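
To make the token rules above more concrete, here is a minimal sketch of how a lexer might classify runs of stars on a single line. It is an assumption-laden illustration, not the TSDoc implementation: `StarToken` and `classifyStars` are hypothetical names, “text” is approximated as any non-whitespace character, and the later parser step that pairs or flattens tokens is not shown.

```typescript
type StarKind = "open" | "close" | "literal";

// Hypothetical token shape; not a TSDoc API.
interface StarToken {
  kind: StarKind;
  count: number; // number of consecutive stars in the run
  index: number; // position of the first star in the line
}

// Treats the edges of the line like whitespace.
function isSpaceOrEdge(ch: string): boolean {
  return ch === "" || /\s/.test(ch);
}

function classifyStars(line: string): StarToken[] {
  const tokens: StarToken[] = [];
  let i = 0;
  while (i < line.length) {
    const ch = line.charAt(i);
    if (ch === "\\") {
      i += 2; // skip a backslash escape such as \*
      continue;
    }
    if (ch !== "*") {
      i += 1;
      continue;
    }
    let end = i;
    while (end < line.length && line.charAt(end) === "*") {
      end += 1;
    }
    const count = end - i;
    const textBefore = !isSpaceOrEdge(line.charAt(i - 1));
    const textAfter = !isSpaceOrEdge(line.charAt(end));
    let kind: StarKind = "literal";
    if (count <= 3) {
      // Only 1 to 3 stars attached to text on exactly one side form a token.
      if (!textBefore && textAfter) {
        kind = "open";
      } else if (textBefore && !textAfter) {
        kind = "close";
      }
    }
    tokens.push({ kind, count, index: i });
    i = end;
  }
  return tokens;
}
```

With these assumptions, `*italics **bold+italics***` yields an opening single-star, an opening double-star, and a closing triple-star, while the stars in `Toys*R*Us`, `text * text *` and `****a****` are all classified as literal. Because only one character on each side of the run is inspected, the lookahead stays minimal.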

Example 3:

/**
 * *CommonMark sees italics, but TSDoc does not because
 * its stars cannot span lines.*
 *
 * CommonMark sees italics here: __proto__
 *
 * Common**M**ark sees a boldfaced M, but TSDoc sees literal stars.
 */

Example 3 normalized form:

/**
 * \*CommonMark sees italics, but TSDoc does not because
 * its stars cannot span lines.\*
 *
 * CommonMark sees italics here: \_\_proto\_\_ (or better to use `__proto__`)
 *
 * Common\*\*M\*\*ark sees a boldfaced M, but TSDoc sees literal stars.
 *
 * If you really need to boldface a letter, use HTML elements: Common<b>M</b>ark.
 */

Example 4:

/**
 * For **A **B** C** the B is double-boldfaced according to CommonMark.
 * The TSDoc tokenizer sees `<b>A <b>B</b> C</b>` which the parser then flattens
 * to `<b>A **B</b> C**` because it doesn't allow nesting.
 *
 * Improper balancing also gets ignored, e.g. for **A *B** C* the TSDoc tokenizer
 * will see `<b>A <i>B</b> C</i>` which the parser flattens to `<b>A *B</b> C*`
 * Whereas CommonMark would counterintuitively see `<i><i>A<i>B</i></i>C</i>`.
 */

Example 4 normalized form:

/**
 * For **A \*\*B** C\*\* the B is double-boldfaced according to CommonMark.
 * The TSDoc tokenizer sees `<b>A <b>B</b> C</b>` which the parser then flattens
 * to `<b>A **B</b> C**` because it doesn't allow nesting.
 *
 * Improper balancing also gets ignored, e.g. for **A \*B** C\* the TSDoc tokenizer
 * will see `<b>A <i>B</b> C</i>` which the parser flattens to `<b>A *B</b> C*`
 * Whereas CommonMark would counterintuitively see `<i><i>A<i>B</i></i>C</i>`.
 */

Code spans are simplified

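For TSFM, a non-escaped backtick always starts a code span, which ends at the next backtick. If there is no closing backtick, the opening backtick is treated as a literal character. Whitespace doesn’t matter, so a code span may continue across line breaks and blank lines.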
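
The rule can be sketched as a small scanner. This is only an illustration under stated assumptions: `Segment` and `splitCodeSpans` are hypothetical names, the input is the comment content with its delimiters already stripped, and a backslash is assumed to escape the character that follows it outside of a code span.

```typescript
// Hypothetical segment shape; not a TSDoc API.
interface Segment {
  kind: "text" | "code";
  value: string;
}

function splitCodeSpans(content: string): Segment[] {
  const segments: Segment[] = [];
  let text = "";
  let i = 0;
  while (i < content.length) {
    const ch = content.charAt(i);
    if (ch === "\\" && i + 1 < content.length) {
      // Keep the escape sequence verbatim; an escaped backtick opens nothing.
      text += ch + content.charAt(i + 1);
      i += 2;
      continue;
    }
    if (ch === "`") {
      const close = content.indexOf("`", i + 1);
      if (close !== -1) {
        if (text !== "") {
          segments.push({ kind: "text", value: text });
          text = "";
        }
        // Everything up to the next backtick is code, including newlines.
        segments.push({ kind: "code", value: content.slice(i + 1, close) });
        i = close + 1;
        continue;
      }
      // No closing backtick: fall through and keep the backtick as text.
    }
    text += ch;
    i += 1;
  }
  if (text !== "") {
    segments.push({ kind: "text", value: text });
  }
  return segments;
}
```

Under this sketch a code span survives a skipped line, which is where CommonMark disagrees, and an unterminated backtick stays in the surrounding text.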

Example 5:

/**
 * `Both TSDoc and CommonMark
 * agree this is code.`
 *
 * before `CommonMark disagrees
 *
 * if a line is skipped, though.` after
 *
 * `But this is not code because the backtick is unterminated
 */

Example 5 normalized form:

/**
 * `Both TSDoc and CommonMark
 * agree this is code.`
 *
 * before `CommonMark disagrees
 * if a line is skipped, though.` after
 *
 * \`But this is not code because the backtick is unterminated
 */

Blocks don’t nest

I want to say that “>” blockquotes should not be supported at all, since the whitespace handling for these constructs is highly counterintuitive. Instead we would recommend <blockquote> HTML tags for this scenario.

Lists are a very useful and common scenario. However, CommonMark lists also have a lot of counterintuitive rules regarding handling of whitespace.

A simplification would be to say that TSFM interprets any line that starts with “-” as being a list item, and the list ends with the first blank line. No other character (e.g. “*” or “+”) can be used to create lists. If complicated nesting is required, then HTML tags such as <ul> and <li> should be used to avoid any confusion.
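
A minimal sketch of this simplified list rule might look like the following. The names `ListBlock`, `ParagraphBlock` and `parseBlocks` are hypothetical, the input is assumed to be comment content with delimiters already stripped, and treating a non-dash line inside a list as a continuation of the previous item is an interpretation taken from the “spans several lines” case in Example 6.

```typescript
// Hypothetical block shapes; not a TSDoc API.
interface ListBlock {
  kind: "list";
  items: string[];
}
interface ParagraphBlock {
  kind: "paragraph";
  text: string;
}
type Block = ListBlock | ParagraphBlock;

function parseBlocks(contentLines: string[]): Block[] {
  const blocks: Block[] = [];
  let list: ListBlock | undefined;
  let paragraph: ParagraphBlock | undefined;
  for (const raw of contentLines) {
    const line = raw.trim(); // indentation is ignored
    if (line === "") {
      // The first blank line ends the current list or paragraph.
      list = undefined;
      paragraph = undefined;
      continue;
    }
    if (line.charAt(0) === "-") {
      if (list === undefined) {
        list = { kind: "list", items: [] };
        blocks.push(list);
        paragraph = undefined;
      }
      // Only the first "-" is a marker, so "- - foo" becomes the item "- foo".
      list.items.push(line.slice(1).trim());
    } else if (list !== undefined) {
      // A non-dash line inside a list continues the previous item.
      const last = list.items.length - 1;
      list.items[last] = `${list.items[last]} ${line}`;
    } else if (paragraph !== undefined) {
      paragraph.text += " " + line;
    } else {
      paragraph = { kind: "paragraph", text: line };
      blocks.push(paragraph);
    }
  }
  return blocks;
}
```

Lines that start with “+” or “*” are never list markers here; they simply become paragraph text or item continuations, which matches the restriction that only “-” creates lists.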

Example 6:

/**
 * A list with 3 things
 * - item 1
 *              - item 2
 * spans several
 *      lines
 * - item 3
 *
 * Two lists separated by a newline
 * -  list 1 with one item
 *
 * - list 2 with one item
 *
 * + not a list item
 * + not a list item
 *
 * CommonMark surprisingly considers this to be a list whose first item is another list,
 * whereas TSDoc sees a minus character as the first item:
 * - - foo
 */

Example 6 normalized form:

/**
 * A list with 3 things
 * - item 1
 * - item 2
 *   spans several
 *   lines
 * - item 3
 *
 * Two lists separated by a newline
 * -  list 1 with one item
 * <!-- CommonMark requires an HTML comment to separate two lists -->
 * - list 2 with one item
 *
 * \+ not a list item
 * \+ not a list item
 * 
 * CommonMark surprisingly considers this to be a list whose first item is another list,
 * whereas TSDoc sees a minus character as the first item:
 * - \- foo
 */

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

sharwell commented, Dec 3, 2018 (1 reaction)

@pgonzal The current examples show an input that would be treated differently by the two implementations, but only show one normalized form. Each example should be presented with two normalized forms:

  1. The normalized form that causes both parsers to interpret the input in the manner that TSFM interprets the original input
  2. The normalized form that causes both parsers to interpret the input in the manner that CommonMark interprets the original input (this is the one that’s missing)

dend commented, Jul 9, 2018 (1 reaction)

Something worth calling out here is how this can interact with docs.microsoft.com/DocFX. Now, I know that we are working on a standard here, but fragmentation and a bunch of custom stuff is a bit of a concern. We do have support for Markdown Extensions, so likely that should be a place where we can plug in.

The format you are talking about here is parser-specific. On docs.microsoft.com, we’ve recently switched to MarkDig, which handles CommonMark parsing much better. It would be preferable not to invent our own standard, because the rest of the documentation stack does not use it (and we have no plans to), and guiding people to one set of conventions for TS documentation contributions and another for the rest of docs seems problematic. Besides, this also raises the issue of our own parser interpreting the proposed conventions incorrectly.
