"smart punctuation", i.e. dynamic display of input characters
Some Markdown processors offer features called smart dashes, smart quotes, or smart punctuation, which translate certain sequences of basic characters available from the keyboard into specialized characters. When the basic characters are displayed literally in the editor pane, the text looks less appealing than when they appear translated.
It would be a wonderful enhancement to apply the conversions before displaying text in the editor.
Desired translations depend on localization, but to begin, I have provided a list for English (see below), some items of which may also apply to other languages.
Some translations are simple pattern substitution, whereas others are context dependent.
I also attempted an example set (see bottom).
Note that some editors translate input characters in the saved Markdown source. I would not recommend such a feature, unless it is only enabled by user preference.
Note also that backslash escaping would need to be available to suppress translation where desired.
Source literal pattern | Target Unicode characters | Descriptive name |
---|---|---|
Exactly one ASCII hyphen: - | U+2010, or U+2012 if immediately preceded and followed by digits with no intervening space | Hyphen or figure dash |
Exactly two ASCII hyphens: -- | U+2013 | En dash |
Exactly three ASCII hyphens: --- | U+2014 | Em dash |
Exactly three ASCII full stops: ... | U+2026 | Horizontal ellipsis |
ASCII double quote: " | U+201D if not followed by any letter in word, otherwise U+201C | Left and right double quote mark |
ASCII apostrophe: ' | U+2018 if not preceded by any letter in word and if matched by a later character in paragraph interpreted as a right single quote mark, otherwise U+2019. (Interpret as right single quote if not followed by any letter in word and if matching an earlier character in paragraph interpreted as a left single quote.) | Left and right single quote mark |

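
The dash and ellipsis rows of the rule list above are plain pattern substitutions, apart from the digit check for the figure dash. A minimal Python sketch of those rows, including the backslash escape mentioned earlier, might look like this. The function name `smart_dashes` and the choice to leave runs of any other length (for example four hyphens) untouched are assumptions made for illustration; they are not taken from any editor's code base.

```python
import re

HYPHEN = "\u2010"
FIGURE_DASH = "\u2012"
EN_DASH = "\u2013"
EM_DASH = "\u2014"
ELLIPSIS = "\u2026"

# A backslash followed by any character, a run of hyphens, or a run of full stops.
_TOKEN = re.compile(r"\\.|-+|\.+")


def smart_dashes(text: str) -> str:
    """Translate ASCII hyphens and full stops for display (hypothetical helper)."""

    def convert(match: re.Match) -> str:
        token = match.group()
        if token[0] == "\\":
            # Escaped hyphen or full stop: show the literal character.
            # Other escapes are passed through unchanged.
            return token[1] if token[1] in "-." else token
        if token == "-":
            before = text[match.start() - 1] if match.start() > 0 else ""
            after = text[match.end()] if match.end() < len(text) else ""
            # Figure dash only when digits touch the hyphen on both sides.
            if before.isdigit() and after.isdigit():
                return FIGURE_DASH
            return HYPHEN
        if token == "--":
            return EN_DASH
        if token == "---":
            return EM_DASH
        if token == "...":
            return ELLIPSIS
        return token  # assumption: any other run length is left as typed

    return _TOKEN.sub(convert, text)
```

Because the function only produces a display string, the saved Markdown source keeps its ASCII characters, which is the behavior the proposal asks for.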

Original text | Formatted output |
---|---|
The Nov--Dec period is busiest for retail outlets. | The Nov–Dec period is busiest for retail outlets. |
I went shopping yesterday---I go every Sunday---but the market was closed. | I went shopping yesterday—I go every Sunday—but the market was closed. |
Ages 15-20 are considered formative for adult personality. | Ages 15‒20 are considered formative for adult personality. |
It seems he wants us to follow him... | It seems he wants us to follow him… |
The '29 stock-market crash precipitated the Great Depression. | The ’29 stock‐market crash precipitated the Great Depression. |
Marsha Robinson's dog is ill. | Marsha Robinson’s dog is ill. |
At least when I asked Tom, he said, "Marsha's dog is ill." | At least when I asked Tom, he said, “Marsha’s dog is ill.” |
I told you, "Tom said, 'Marsha's dog is ill'". | I told you, “Tom said, ‘Marsha’s dog is ill’”. |
So if you see a sick dog, you can say, 'This dog is the Robinsons''. | So if you see a sick dog, you can say, ‘This dog is the Robinsons’’. |
Be sure not to say, 'This dog is the *Petersons'*'! | Be sure not to say, ‘This dog is the Petersons’’! |
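
The quote rules are context dependent, so they need a look at neighboring characters and, for apostrophes, a search for a matching closer in the same paragraph. The following Python sketch is one literal reading of those rules, not a definitive implementation. The name `smart_quotes`, the "nearest unused closer" pairing strategy, and the omission of backslash escaping are all assumptions made to keep the example short.

```python
LEFT_DOUBLE = "\u201c"
RIGHT_DOUBLE = "\u201d"
LEFT_SINGLE = "\u2018"
RIGHT_SINGLE = "\u2019"


def smart_quotes(paragraph: str) -> str:
    """Translate ASCII quotes in one paragraph for display (hypothetical helper)."""
    n = len(paragraph)

    def char_at(i: int) -> str:
        # Empty string outside the paragraph, so .isalpha() is False there.
        return paragraph[i] if 0 <= i < n else ""

    # Pass 1: an apostrophe opens a quotation if no letter precedes it and a
    # later apostrophe that is not followed by a letter can serve as its closer.
    openers = set()
    closers = set()
    for i, ch in enumerate(paragraph):
        if ch != "'" or i in closers or char_at(i - 1).isalpha():
            continue
        for j in range(i + 1, n):
            if paragraph[j] == "'" and j not in closers and not char_at(j + 1).isalpha():
                openers.add(i)
                closers.add(j)  # assumption: pair with the nearest unused closer
                break

    # Pass 2: emit the translated characters.
    out = []
    for i, ch in enumerate(paragraph):
        if ch == '"':
            out.append(LEFT_DOUBLE if char_at(i + 1).isalpha() else RIGHT_DOUBLE)
        elif ch == "'":
            # Every apostrophe that is not an opener becomes U+2019.
            out.append(LEFT_SINGLE if i in openers else RIGHT_SINGLE)
        else:
            out.append(ch)
    return "".join(out)
```

This reading handles the example sentences above, including the unmatched apostrophe in '29 and the possessive followed by a closing quote. It inherits the limits of the rules themselves: a double quote followed directly by an apostrophe, for instance, is treated as a closing quote.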
Issue Analytics
- Created 3 years ago
- Reactions: 3
- Comments: 9 (5 by maintainers)
And users writing in such languages load packages that alter processing to follow appropriate conventions. Only English-language users may lazily avoid use of these packages because of the American origin of the software. Ultimately, extensibility is required in any case to meet the demands of all users worldwide with respect to a feature as complicated as the one proposed.
Then LaTeX behaves the same as in the rules of the proposal, except that the proposal requires a figure dash in special cases. I won’t change the proposal now, because this distinction is far from the central purpose of the discussion, but I am open to any design choice that favors adherence to some authoritative convention.
Thanks for the comments.
Are you sure? I believe that a figure dash is always preferred for numeric ranges, which LaTeX (or any similar software) should identify as the conversion target if the source character is surrounded by digits (as opposed to word characters, e.g. letters). Actually, I am not familiar with a convention of a double hyphen in the source when surrounded by digits. LaTeX may convert it to an em dash as it would otherwise, but I doubt that this result is desirable typographically. The em dash has a different purpose, usually to set apart clauses.
See the Wikipedia articles on the characters for clarification.