This library doesn't decode everything properly
I’m using this piece of code to download a webpage (using the request library) and decode it (using your iconv-lite library). The loader function finds some elements in the body of the page and returns them as a JavaScript object.
request.get({url: url, encoding: null}, function(error, response, body) {
    // if the webpage exists, process it; otherwise respond with a 'not found' error
    if (response.statusCode === 200) {
        // body is a raw Buffer because encoding is null
        body = iconv.decode(body, "iso-8859-1");
        const $ = cheerio.load(body);
        async function show() {
            var data = await loader.getDay($, date, html_tags, thumbs, res, image_thumbnail_size);
            res.send(JSON.stringify(data));
        }
        show();
    } else {
        res.status(404);
        res.send(JSON.stringify({"error":"No content for this date."}));
    }
});
The pages are encoded in ISO-8859-1, and on the website the content looks normal, with no bad characters. When I wasn’t using iconv-lite, some characters, e.g. ü, looked like this: �. Now that I’m using the library as in the code above, most characters look fine, but some, e.g. š, show up as an empty box, even though the website displays them without any problems.
I’m sure it’s not cheerio’s issue, because when I printed the output using res.send(body); or res.send(JSON.stringify({"body":body}));, the empty box character was still there. In case it matters, when I copied the empty box character into Google, it changed to š. I also tried changing the Express output charset using res.charset, but that didn’t help.
Issue Analytics
- Created 5 years ago
- Comments:6 (2 by maintainers)
Top GitHub Comments
Yep, it’s working now, so it was an issue with the website itself, not this library. Thanks for the help! 😃
https://validator.w3.org/nu/?doc=https%3A%2F%2Fapod.nasa.gov%2Fapod%2Fap170813.html
This website gave me pretty interesting results about the real charset used on the NASA website.
When I get back home, I’ll check whether it works when I change iconv’s decoding to windows-1252.
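For anyone hitting the same symptom, here is a minimal sketch of why the charset choice matters. It uses only Node built-ins, so it does not show iconv-lite itself. The byte 0x9A is the control character U+009A in ISO-8859-1, which usually renders as an empty box, while in windows-1252 the same byte is š. The sample bytes, the `CP1252_SAMPLE` table, and the `decodeCp1252Sample` helper are hypothetical and cover only a few bytes for illustration; in the real handler the equivalent change would be passing "windows-1252" to iconv.decode, as suggested above.

```javascript
// "Košice" as it would be stored in a windows-1252 page: 0x9A is the byte for "š".
const bytes = Buffer.from([0x4b, 0x6f, 0x9a, 0x69, 0x63, 0x65]);

// Node's built-in 'latin1' encoding maps each byte to the code point with the
// same value, which matches ISO-8859-1. Byte 0x9A therefore becomes U+009A,
// an invisible C1 control character.
const asLatin1 = bytes.toString('latin1');
console.log(asLatin1.charCodeAt(2).toString(16)); // prints "9a"

// Hypothetical partial table: a few windows-1252 bytes from the 0x80-0x9F
// range, where windows-1252 differs from ISO-8859-1. Not a complete decoder.
const CP1252_SAMPLE = {
  0x80: '\u20ac', // euro sign
  0x8a: '\u0160', // Š
  0x8e: '\u017d', // Ž
  0x9a: '\u0161', // š
  0x9e: '\u017e', // ž
};

// Decode using the sample table, falling back to the ISO-8859-1 mapping
// for every byte the table does not list.
function decodeCp1252Sample(buf) {
  return Array.from(buf, (b) => CP1252_SAMPLE[b] ?? String.fromCharCode(b)).join('');
}

console.log(decodeCp1252Sample(bytes)); // prints "Košice"
```

Characters such as ü sit at the same byte value in both charsets, which would explain why only some letters broke.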