Invalid decoded text

Original GitHub issue: Issue #425 · postlight/parser
  • Platform: Mac
  • Mercury Parser Version: 2.1.0
  • Node Version (if a Node bug): 10
  • Browser Version (if a browser bug):

Expected Behavior

Parsed HTML should be properly encoded as per the original text

Current Behavior

The parsed HTML contains invalid text, possibly because of a decoding issue.

Steps to Reproduce

  • Fetch the HTML using any client
  • Pass it to the parser using Mercury.parse(url, { html: fetchedHtml })
  • The returned HTML contains incorrectly decoded text

Example link: https://www.newyorker.com/culture/the-new-yorker-interview/daenerys-tells-all-game-of-thrones-finale-emilia-clarke-beyonce

Detailed Description

I want to fetch the HTML myself and pass it to the parser, instead of having the parser fetch the HTML.

Possible Solution

After looking at the code, it seems you are handling this case for the browser only, i.e. the proper encoding is read from the HTML only when the HTML is provided from the browser. Ideally the parser should decode the text correctly regardless of whether it is running in a browser.
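
As a rough illustration of environment-independent decoding, the sketch below reads the charset that the HTML declares in its own <meta> tag and decodes the raw bytes with it. The helpers sniffCharset and decodeHtml are hypothetical names invented for this example and are not part of Mercury Parser; the sketch also assumes the HTML is available as a Buffer and only handles the encodings that Node's Buffer supports natively.

```javascript
// Hypothetical helpers, NOT part of Mercury Parser.
// Maps declared charset labels to encodings that Buffer#toString understands.
const SUPPORTED = new Map([
  ['utf-8', 'utf8'],
  ['utf8', 'utf8'],
  ['iso-8859-1', 'latin1'],
  ['latin1', 'latin1'],
]);

// Look for a charset declaration near the top of the document.
function sniffCharset(buffer) {
  // Charset declarations are plain ASCII, so viewing the first bytes as
  // latin1 is enough to search them without knowing the real encoding.
  const head = buffer.subarray(0, 1024).toString('latin1');
  const match = head.match(/<meta[^>]*charset\s*=\s*["']?\s*([\w-]+)/i);
  return match ? match[1].toLowerCase() : 'utf-8'; // assume UTF-8 if undeclared
}

// Decode the raw bytes using the declared charset, falling back to UTF-8
// for anything this sketch does not know about.
function decodeHtml(buffer) {
  const encoding = SUPPORTED.get(sniffCharset(buffer)) || 'utf8';
  return buffer.toString(encoding);
}
```

A caller would pass the bytes exactly as they were fetched, for example decodeHtml(responseBuffer). A real implementation would also need to honour the Content-Type response header and support more encodings than the two handled here.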

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 8

Top GitHub Comments

6 reactions
FarmaanElahi commented, Jul 30, 2019

For me, the problem occurred when I passed the local HTML as a string. Using a Buffer fixed the issue:

Mercury.parse(url, {
    html: Buffer.from(html, "utf-8"),
    headers: {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) " +
            "Chrome/60.0.3112.113 Safari/537.36"
    },
})

1 reaction
farmaan-appachhi commented, May 24, 2019

Fixed it by passing the HTML as a Buffer with utf-8 encoding instead of a string as mentioned in the README.
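
The thread does not explain why the Buffer helps. One plausible explanation, offered here as an assumption and not confirmed by the maintainers, is that a plain string gets treated as single-byte data somewhere before decoding. The standalone sketch below uses only Node's built-in Buffer (no Mercury calls) to show how that kind of round trip damages non-ASCII text, while an explicit UTF-8 Buffer preserves it.

```javascript
// Sample text with characters outside ASCII (curly quotes, accented letter).
const original = 'Emilia Clarke said “caf\u00e9”';

// Treating the string as single-byte ("binary"/latin1) data keeps only the
// low byte of each character, so the later UTF-8 decode cannot recover it.
const viaBinary = Buffer.from(original, 'binary').toString('utf8');

// Encoding to UTF-8 bytes first keeps every character intact.
const viaUtf8 = Buffer.from(original, 'utf8').toString('utf8');

console.log(viaBinary === original); // false
console.log(viaUtf8 === original);   // true
```

If this is what happens inside the parser, passing Buffer.from(html, "utf-8") sidesteps the problem because the parser receives real UTF-8 bytes.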
