Incorrect handling of UTF-8 encoding on Google Finance page
This code produces an incorrect Unicode character instead of the right single quote (U+2019):
var request = require('request');
var cheerio = require('cheerio');
request('https://www.google.com/finance?q=NYSE%3ASGL', function (error, response, body) {
  if (!error && response.statusCode === 200) {
    console.dir(response.headers['content-type']);
    var $ = cheerio.load(body);
    console.log($('.companySummary').text().match(/The Fund.s/g)[0]);
  }
});
Expected:
'text/html; charset=utf-8'
The Fund’s
Actual:
'text/html; charset=utf-8'
The Fund�s
The incorrect character has the code point U+FFFD.
Using cheerio 0.17.0 on Windows. With the same invocation, the Unicode for this Japanese page is produced correctly:
'use strict';
var request = require('request');
var cheerio = require('cheerio');
request('https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8', function (error, response, body) {
  if (!error && response.statusCode === 200) {
    console.dir(response.headers['content-type']);
    var $ = cheerio.load(body);
    console.log($('.mw-headline').text());
  }
});
Issue Analytics
- State:
- Created 9 years ago
- Comments:6 (1 by maintainers)
Top GitHub Comments
Try using
request.get({ uri: baseURI, encoding: 'binary' }, function
It solved my problem, but don’t ask me why it works. I found the fix thanks to this topic: https://github.com/request/request/issues/118
So it’s not an issue with cheerio but with encoding and the request module.
Please provide a complete solution.
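For readers wondering why the encoding setting changes the result, here is a minimal sketch of one plausible explanation. The thread does not confirm which bytes the page actually sent, so the byte sequence below is an assumption made purely for illustration: it supposes the quote arrived as the single byte 0x92 (the right single quote in Windows-1252), which is not valid UTF-8 on its own. The helper name `codePoints` is hypothetical as well.

```javascript
'use strict';

// Assumed raw bytes for "Fund" + quote + "s", with the quote sent as 0x92.
// This is an illustrative guess, not data captured from the real page.
var raw = Buffer.from([0x46, 0x75, 0x6e, 0x64, 0x92, 0x73]);

// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
  return Array.from(str, function (ch) {
    var hex = ch.codePointAt(0).toString(16).toUpperCase();
    return 'U+' + ('0000' + hex).slice(-4);
  });
}

// Decoding as UTF-8: the lone 0x92 byte is invalid, so the decoder
// substitutes the replacement character U+FFFD.
var asUtf8 = raw.toString('utf8');

// Decoding as 'latin1' (the encoding that 'binary' is an alias for in Node):
// every byte maps to the code point with the same value, so nothing is replaced.
var asBinary = raw.toString('latin1');

console.log(codePoints(asUtf8));
console.log(codePoints(asBinary));
```

Under that assumption, decoding with 'binary' avoids the U+FFFD substitution but yields U+0092 rather than U+2019. To recover the intended character, the bytes would still need to be decoded with the charset the server really used.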