Russian Webpage parsing support.

  • Platform: Mac
  • Mercury Parser Version: Web-based API (at the moment)
  • Node Version (if a Node bug):
  • Browser Version (if a browser bug):

Expected Behavior

Proper encoding for Russian language.

Current Behavior

When parsing this link https://www.finam.ru/analysis/newsitem/putin-nagradil-grefa-ordenom-20190208-203615/?utm_source=rss&utm_medium=new_compaigns&utm_campaign=news_to_finamb the output is not decoded correctly, so the formatting is broken when the content is rendered as HTML.

Steps to Reproduce

  1. Parse the link https://www.finam.ru/analysis/newsitem/putin-nagradil-grefa-ordenom-20190208-203615/?utm_source=rss&utm_medium=new_compaigns&utm_campaign=news_to_finamb
  2. Check the content output.
  3. Try to render that content with a Cyrillic font.
  4. Instead of properly formatted text, you will see a bunch of ‘�’ characters.
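
A minimal sketch of how ‘�’ characters like these can arise, assuming the page is served in a legacy Cyrillic encoding such as windows-1251 (the issue does not confirm which encoding the site uses). The byte values and variable names are illustrative only, and the example relies on a Node.js build with full ICU so that `TextDecoder` knows the `windows-1251` label.

```javascript
// Illustration only: the word "Привет" encoded as windows-1251,
// where each Cyrillic letter occupies a single byte.
const cp1251Bytes = Uint8Array.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

// Decoding those bytes as UTF-8 fails for the invalid sequences,
// and the decoder substitutes U+FFFD (the '�' replacement character).
const garbled = new TextDecoder('utf-8').decode(cp1251Bytes);

// Decoding with the encoding the bytes were written in recovers the text.
const correct = new TextDecoder('windows-1251').decode(cp1251Bytes);

console.log(garbled);
console.log(correct);
```

If the parser picks the wrong charset for a page, the same substitution would happen across the whole article body.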

Detailed Description

I use this API to parse articles in my reader app. Some of the Russian news feeds I try to use do not produce properly formatted output.

Possible Solution

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

3 reactions
vjyanand commented, Feb 12, 2019

My findings #267

  1. Make the regex case-insensitive:

var ENCODING_RE = /charset=([\w-]+)\b/i;

  2. Check that metaContentType is truthy before comparing:

if (metaContentType && properEncoding !== encoding) {
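
The two findings above can be sketched together as follows. This is not the parser's actual source: `getEncoding` and `pickEncoding` are hypothetical helpers, and the `utf-8` fallback is an assumption made for illustration. Only the regex and the guard condition come from the comment.

```javascript
// Case-insensitive pattern from the comment above, so that
// "Charset=..." and "CHARSET=..." are matched as well.
const ENCODING_RE = /charset=([\w-]+)\b/i;

// Hypothetical helper: extract the charset label from a Content-Type value.
// Falls back to 'utf-8' (an assumption) when no charset is present.
function getEncoding(contentType, fallback = 'utf-8') {
  const match = ENCODING_RE.exec(contentType || '');
  return match ? match[1].toLowerCase() : fallback;
}

// Hypothetical helper: choose between the HTTP header charset and the
// charset declared in a <meta http-equiv="content-type"> tag.
function pickEncoding(headerContentType, metaContentType) {
  const encoding = getEncoding(headerContentType);
  const properEncoding = getEncoding(metaContentType);
  // Guard from the comment: only switch encodings when the meta tag exists.
  // Without it, a missing meta tag would yield the fallback and could
  // override a correct charset taken from the header.
  if (metaContentType && properEncoding !== encoding) {
    return properEncoding;
  }
  return encoding;
}

console.log(pickEncoding('text/html; Charset=windows-1251', undefined));
console.log(pickEncoding('text/html', 'text/html; CHARSET=Windows-1251'));
```

In this sketch the header charset is kept when no meta tag is found, and the meta tag takes precedence only when it is present and disagrees with the header.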

Thanks

1 reaction
mrgodhani commented, Feb 8, 2019

