Encoding issue when html meta charset differs from HTTP header Content-Type charset

When crawling a website that responds with the HTTP header

Content-Type: "text/html; charset=windows-1252"

And the HTML contains:

<meta charset="utf-8">

We end up with garbled accented characters, because the real encoding is windows-1252 but the DOM is parsed as UTF-8.
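To make the symptom concrete, here is a minimal sketch in Python (chosen only for illustration; it is not AngleSharp code). It assumes the server's bytes really are windows-1252, as the header says, and uses a hypothetical helper `pick_encoding` that prefers the HTTP header charset over the meta charset, which is the order the HTML Standard's encoding sniffing rules use when no BOM is present.

```python
from typing import Optional


def pick_encoding(header_charset: Optional[str],
                  meta_charset: Optional[str],
                  default: str = "utf-8") -> str:
    """Hypothetical helper: HTTP header charset first, then meta, then a default."""
    return header_charset or meta_charset or default


# Sample text with accented characters ("café déjà vu"), written with escapes.
text = "caf\u00e9 d\u00e9j\u00e0 vu"

# What the server actually sends: bytes encoded as windows-1252.
body = text.encode("windows-1252")

# Trusting <meta charset="utf-8"> decodes the bytes with the wrong encoding;
# each accented byte is invalid UTF-8 and becomes U+FFFD.
wrong = body.decode("utf-8", errors="replace")

# Trusting the Content-Type header recovers the original text.
right = body.decode(pick_encoding("windows-1252", "utf-8"))
```

In this sketch `right` equals the original text, while `wrong` contains replacement characters where the accented letters were.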

Another issue seems to be that the charset is not detected when there is a space just after the semicolon.

In MimeType.cs, in the GetParameter(String key) method, the StartsWith check should take spaces into account, or should the _params variable be trimmed in the constructor?
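The failure mode and one possible fix can be sketched as follows. This is an illustration in Python, not AngleSharp's actual implementation: `naive_get_parameter` is a hypothetical reconstruction of a prefix check that ignores whitespace, and `get_parameter` is a hypothetical variant that trims each parameter before comparing.

```python
from typing import Optional


def naive_get_parameter(content_type: str, key: str) -> Optional[str]:
    """Hypothetical prefix check that does not account for leading spaces."""
    for part in content_type.split(";")[1:]:
        if part.startswith(key + "="):
            return part[len(key) + 1:]
    return None


def get_parameter(content_type: str, key: str) -> Optional[str]:
    """Hypothetical whitespace-tolerant lookup of a Content-Type parameter."""
    # Everything after the first ';' is the parameter list.
    _, _, params = content_type.partition(";")
    for part in params.split(";"):
        name, sep, value = part.partition("=")
        # Trim the name so "; charset=..." and ";charset=..." both match,
        # and compare case-insensitively.
        if sep and name.strip().lower() == key.lower():
            return value.strip().strip('"')
    return None


header = "text/html; charset=windows-1252"
missed = naive_get_parameter(header, "charset")  # the space defeats the prefix check
found = get_parameter(header, "charset")
```

Trimming per parameter, rather than trimming the whole string once, also covers headers that carry several parameters separated by "; ".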

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

FlorianRappl commented, Mar 22, 2016 (2 reactions)

Which version are you using? This should have been fixed (as a side effect) in #282 (see PR #284).

FlorianRappl commented, Mar 23, 2016 (0 reactions)

Will be available in the next release, see PR #299.
