Garbled text in emails with Japanese script
When grabbing the HtmlBody of Outlook emails with Japanese text, we often get a mixture of correct text and Unicode placeholder characters: ��
This has been an issue for a number of versions (it still appears on the latest, 4.5.2) and occurs on .NET 6 and .NET 7. Other developers on our team report that the issue does NOT appear on .NET Framework 4.7.2.
Example message, how it appears in Outlook:
HTML Body output of message:
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"MS PGothic";
panose-1:2 11 6 0 7 2 5 8 2 4;}
@font-face
{font-family:"\@MS PGothic";}
@font-face
{font-family:Meiryo;}
@font-face
{font-family:"Meiryo UI";}
@font-face
{font-family:"\@Meiryo UI";}
@font-face
{font-family:"\@Meiryo";}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:JA;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
font-size:10.0pt;
font-family:"Meiryo UI",sans-serif;
mso-fareast-language:JA;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Meiryo UI",sans-serif;
mso-fareast-language:JA;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link="#0563C1" vlink="#954F72" style='word-wrap:break-word'><div class=WordSection1><p class=MsoNormal><span style='font-size:10.5pt;font-family:"Meiryo UI",sans-serif'>ABC<span lang=JA>ご担当者様</span></span><span style='font-size:10.5pt'><o:p></o:p></span>
</p><p class=MsoNormal><span style='font-size:10.5pt'><o:p> </o:p></span>
</p><p class=MsoNormal><span lang=JA style='font-size:10.5pt;font-family:"Meiryo UI",sans-serif'>お疲れ様です。</span><span style='font-size:10.5pt'><o:p></o:p></span>
</p><p class=MsoNormal><span style='font-size:10.5pt'>blahblahblah)</span><span lang=JA style='font-size:10.5pt;font-family:"Meiryo UI",sans-serif'>へ</span><span style='font-size:10.5pt'><o:p></o:p></span>
</p><p class=MsoPlainText>Q#<span style='font-family:"Meiryo",sans-serif'>123450</span><span lang=JA style='font-size:10.5pt'>��</span><o:p></o:p>
</p><p class=MsoNormal><span lang=JA style='font-size:10.5pt;font-family:"Meiryo UI",sans-serif'>標準構成掲載をお願いいたします。</span><span style='font-size:10.5pt;font-family:"Meiryo UI",sans-serif'><o:p></o:p></span>
</p><p class=MsoNormal><span style='font-size:10.5pt'><o:p> </o:p></span>
</p><p class=MsoNormal><span lang=JA style='font-size:10.5pt;font-family:"Meiryo UI",sans-serif'>何卒よろしくお願いいたします。</span><span style='font-size:10.5pt'><o:p></o:p></span>
</p><p class=MsoNormal><o:p> </o:p>
</p><br />
<p class=msipfooter90245289 align="Left" style="margin:0"><span style='font-size:7.0pt;font-family:Calibri;color:#737373'>Internal Use - Confidential</span>
</p></div></body></html>
How this appears (with Unicode placeholders hardcoded into the text):
Here is the body text of the email as typed in Outlook:
ABCご担当者様
お疲れ様です。
blahblahblah)へ
Q#123450を
標準構成掲載をお願いいたします。
何卒よろしくお願いいたします。
Issue Analytics
- Created 6 months ago
- Comments: 10 (7 by maintainers)
At the moment I'm a little busy with another project; I'll try to look into your mails this week or next week.
I have rewritten the code that extracts HTML from RTF (de-encapsulation). Please try this version (5.0.0 on NuGet) and see if it fixes your issue. The previous extractor was something of a mess because of patch upon patch. I also finally found some good Microsoft documentation about how to extract HTML from RTF.
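For readers wondering how RTF de-encapsulation can produce placeholder characters at all, here is a minimal Python sketch of one common way it happens. It is not MsgReader's code, and the thread does not confirm that this was the actual bug. It assumes the encapsulated RTF stores Japanese text as `\'xx` hex escapes in code page 932 (Shift-JIS), one escape per byte; the function names `rtf_hex_escapes`, `decode_bytewise`, and `decode_runs` are hypothetical. Decoding each escape on its own breaks double-byte characters, while decoding a whole run of escapes together keeps them intact.

```python
import re

HEX_ESCAPE = r"\\'([0-9a-fA-F]{2})"

def rtf_hex_escapes(text: str, codepage: str = "cp932") -> str:
    """Encode text as RTF-style hex escapes: one \\'xx escape per byte."""
    return "".join(f"\\'{b:02x}" for b in text.encode(codepage))

def decode_bytewise(rtf: str, codepage: str = "cp932") -> str:
    """Fragile: decode every \\'xx escape in isolation.

    Half of a double-byte character is not valid on its own, so it
    is replaced with U+FFFD.
    """
    return re.sub(
        HEX_ESCAPE,
        lambda m: bytes([int(m.group(1), 16)]).decode(codepage, errors="replace"),
        rtf,
    )

def decode_runs(rtf: str, codepage: str = "cp932") -> str:
    """Safer: gather each run of consecutive escapes, then decode it as one byte string."""
    def decode_run(match: re.Match) -> str:
        raw = bytes(int(h, 16) for h in re.findall(HEX_ESCAPE, match.group(0)))
        return raw.decode(codepage, errors="replace")
    return re.sub(r"(?:\\'[0-9a-fA-F]{2})+", decode_run, rtf)

# The line from the report that ended in placeholders: "Q#123450を"
sample = "Q#123450" + rtf_hex_escapes("を")
print(decode_bytewise(sample))  # ASCII survives, the kana does not
print(decode_runs(sample))      # full text is recovered
```

If a de-encapsulator behaves like `decode_bytewise`, ASCII text comes through intact and only some multi-byte characters are damaged, which would match the mixture of correct text and placeholders described above.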