Russian Webpage parsing support.
Original GitHub issue: postlight/parser #263
- Platform: Mac
- Mercury Parser Version: Web-based API (at the moment)
- Node Version (if a Node bug):
- Browser Version (if a browser bug):
Expected Behavior
Proper encoding for Russian language.
Current Behavior
Parsing this link https://www.finam.ru/analysis/newsitem/putin-nagradil-grefa-ordenom-20190208-203615/?utm_source=rss&utm_medium=new_compaigns&utm_campaign=news_to_finamb does not produce correctly encoded output, so the formatting is broken when the content is rendered as HTML.
Steps to Reproduce
- Parse the link https://www.finam.ru/analysis/newsitem/putin-nagradil-grefa-ordenom-20190208-203615/?utm_source=rss&utm_medium=new_compaigns&utm_campaign=news_to_finamb
- Check the content output
- Try to render that content with a Cyrillic font
- Instead of properly formatted text, you will see a bunch of ‘�’ characters
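For illustration only, here is a minimal sketch of how ‘�’ characters can arise. It assumes the page is served in a single-byte Cyrillic encoding such as windows-1251 (the report does not state the actual encoding) and that the bytes are then decoded as UTF-8. The byte values below are the windows-1251 codes for the word "Путин".

```javascript
// Assumption: the source page uses windows-1251; these are the bytes for "Путин".
const cyrillicBytes = new Uint8Array([0xcf, 0xf3, 0xf2, 0xe8, 0xed]);

// Decoding them as UTF-8 is the suspected mistake: none of these bytes
// starts a valid UTF-8 sequence here, so the decoder emits U+FFFD (the
// replacement character) instead of Cyrillic letters.
const garbled = new TextDecoder('utf-8').decode(cyrillicBytes);

console.log(garbled);
console.log(garbled.includes('\uFFFD')); // true
```

Once the bytes have been turned into U+FFFD, the original letters cannot be recovered, so no choice of font on the rendering side can fix the output.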
Detailed Description
I use this API to parse articles in my reader app. I am trying to use some Russian news feeds, and I am not able to get properly formatted output from them.
Possible Solution
Issue Analytics
- State:
- Created 5 years ago
- Reactions: 2
- Comments:7 (1 by maintainers)
My findings #267
var ENCODING_RE = /charset=([\w-]+)\b/i;
if (metaContentType && properEncoding !== encoding) {
Thanks
Further reference: https://github.com/mrgodhani/raven-reader/issues/269
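
The two fragments quoted in the findings above suggest that the charset is read with a regular expression and compared against the encoding already in use. The sketch below shows one way that idea could work end to end. It is not the parser's actual code: `detectCharset` and `decodeBody` are hypothetical helpers, and only the regular expression is taken from the thread.

```javascript
// Same pattern as the one quoted in the thread: captures the charset label.
const ENCODING_RE = /charset=([\w-]+)\b/i;

// Hypothetical helper: prefer the HTTP header, then a <meta> tag, then UTF-8.
function detectCharset(contentTypeHeader, htmlHead) {
  const match =
    ENCODING_RE.exec(contentTypeHeader || '') ||
    ENCODING_RE.exec(htmlHead || '');
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Hypothetical helper: decode raw response bytes using the detected charset.
function decodeBody(bytes, contentTypeHeader) {
  // The markup itself is ASCII, so a lossy UTF-8 peek is enough to find the meta tag.
  const head = new TextDecoder('utf-8').decode(bytes.subarray(0, 1024));
  const charset = detectCharset(contentTypeHeader, head);
  try {
    return new TextDecoder(charset).decode(bytes);
  } catch (err) {
    // Unknown or unsupported label in this runtime: fall back to UTF-8.
    return new TextDecoder('utf-8').decode(bytes);
  }
}

console.log(detectCharset('text/html; charset=windows-1251', '')); // windows-1251
```

Two caveats apply. The quoted pattern does not match the quoted form `<meta charset="windows-1251">`, because the quote directly follows the `=`. And `TextDecoder` only supports legacy encodings such as windows-1251 in runtimes built with full ICU data, so a dedicated conversion library may be needed elsewhere.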