Incorrect handling of UTF-8 encoding on Google Finance page

This code produces an incorrect Unicode character instead of the right single quote (U+2019):

var request = require('request');
var cheerio = require('cheerio');

request('https://www.google.com/finance?q=NYSE%3ASGL', function (error, response, body) {
  if (!error && response.statusCode === 200) {
    console.dir(response.headers['content-type']);
    var $ = cheerio.load(body);
    console.log($('.companySummary').text().match(/The Fund.s/g)[0]);
  }
});

Expected:

'text/html; charset=utf-8'
The Fund’s

Actual:

'text/html; charset=utf-8'
The Fund�s

The incorrect character has the code point U+FFFD.
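U+FFFD is the Unicode replacement character, which a decoder emits when it meets bytes that are not valid in the encoding it was told to use. The following is a minimal sketch of how that can happen. It assumes (the issue does not confirm this) that the body actually contained the single byte 0x92, which is the right single quote in Windows-1252 but is not valid UTF-8 on its own. The byte values are hypothetical and chosen only for illustration.

```javascript
// Hypothetical bytes for "Fund’s" as a Windows-1252 server might send them.
// 0x92 is the right single quote in Windows-1252; alone it is invalid UTF-8.
var raw = Buffer.from([0x46, 0x75, 0x6e, 0x64, 0x92, 0x73]);

// Decoding as UTF-8 replaces the invalid byte with U+FFFD.
var asUtf8 = raw.toString('utf8');
console.log(asUtf8.codePointAt(4).toString(16)); // fffd

// 'binary' is an alias for 'latin1': each byte becomes the code point
// with the same numeric value, so nothing is replaced.
var asBinary = raw.toString('binary');
console.log(asBinary.codePointAt(4).toString(16)); // 92
```

If this assumption holds, the content-type header would be claiming UTF-8 while the body is in a single-byte encoding, which would also explain why a page that really is UTF-8 decodes correctly.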

I'm using cheerio 0.17.0 on Windows. With the same invocation, the Unicode text on this Japanese page comes out correctly:

'use strict';
var request = require('request');
var cheerio = require('cheerio');

request('https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8', function (error, response, body) {
  if (!error && response.statusCode === 200) {
    console.dir(response.headers['content-type']);
    var $ = cheerio.load(body);
    console.log($('.mw-headline').text());
  }
});

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

AdamMadrzejewski commented, Nov 3, 2015 (9 reactions)

Try using request.get({ uri: baseURI, encoding: 'binary' }, function ...). It solved my problem, but don’t ask me why it works. I found the solution thanks to this thread:

https://github.com/request/request/issues/118

So it’s not an issue with cheerio but with encoding and the request module.
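Applied to the snippet from the question, the workaround in this comment might look roughly like the sketch below. `fetchSummary` is a hypothetical helper name, not part of request or cheerio, and the modules are loaded lazily only so that the file can be loaded without them installed. The `encoding` option is the one named in the comment; with it, request converts the response bytes to a string using that encoding instead of UTF-8.

```javascript
'use strict';

// Hypothetical helper wrapping the workaround from the comment above.
function fetchSummary(uri, callback) {
  // Required lazily so this file loads even where the modules are missing.
  var request = require('request');
  var cheerio = require('cheerio');

  // encoding: 'binary' turns each response byte into the code point with
  // the same value, so no byte is replaced by U+FFFD.
  request.get({ uri: uri, encoding: 'binary' }, function (error, response, body) {
    if (error) {
      return callback(error);
    }
    if (response.statusCode !== 200) {
      return callback(new Error('Unexpected status ' + response.statusCode));
    }
    var $ = cheerio.load(body);
    callback(null, $('.companySummary').text());
  });
}
```

It would be called as fetchSummary(uri, function (err, text) { ... }). Note that a body which really is UTF-8, such as the Japanese page in the question, would be garbled by this setting, since every byte of a multi-byte character becomes its own character. So this is a per-site workaround rather than a general fix.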

vishnuSoftware commented, Aug 20, 2018 (2 reactions)

Please give perfect solution.

