Encoding issue when html meta charset differs from HTTP header Content-Type charset
When crawling a website that responds with the HTTP header
Content-Type: "text/html; charset=windows-1252"
and whose HTML contains:
<meta charset="utf-8">
we end up with badly encoded accented characters, because the real encoding is windows-1252 but the DOM is parsed as UTF-8.
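The mismatch can be reproduced outside the library. The sketch below is written in Python for brevity and is not AngleSharp code. It decodes windows-1252 bytes once with the charset from the meta tag and once with the charset from the HTTP header. The `pick_charset` helper is hypothetical; it only illustrates the precedence that, as far as I understand the HTML standard's encoding sniffing, puts the transport-layer charset ahead of a meta declaration.

```python
# Bytes as a server would send them when the page is really windows-1252.
raw = "café déjà vu".encode("cp1252")

# Decoding with the charset from the <meta> tag (utf-8) mangles the accents:
# each accented byte is invalid UTF-8 and becomes U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the charset from the HTTP header recovers the text.
right = raw.decode("cp1252")


def pick_charset(header_charset, meta_charset, default="utf-8"):
    """Hypothetical helper: prefer the HTTP header, then <meta>, then a default."""
    return header_charset or meta_charset or default


# With both declarations present, the header value is used for decoding.
decoded = raw.decode(pick_charset("windows-1252", "utf-8"))
```

Here `decoded` equals the original string, while `wrong` contains replacement characters where the accented letters were.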
Another issue seems to be that the charset is not detected when there is a space just after the semicolon.
In MimeType.cs > GetParameter(String key) method, the StartsWith check should take spaces into account, or perhaps the variable _params
should be trimmed in the constructor?
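To make the whitespace problem concrete, here is a minimal Python sketch. It is not the actual MimeType.cs code: `get_parameter_naive` is a hypothetical stand-in for the reported behavior (a prefix check on untrimmed parameters), and `get_parameter` shows one possible fix, trimming each parameter before comparing the key.

```python
def get_parameter_naive(content_type, key):
    """Hypothetical stand-in for the reported bug: no trimming before the prefix check."""
    for param in content_type.split(";")[1:]:
        if param.startswith(key + "="):
            return param[len(key) + 1:]
    return None


def get_parameter(content_type, key):
    """Whitespace-tolerant lookup: trim each parameter, compare keys case-insensitively."""
    for param in content_type.split(";")[1:]:
        name, sep, value = param.strip().partition("=")
        if sep and name.strip().lower() == key.lower():
            # Drop surrounding whitespace and optional double quotes.
            return value.strip().strip('"')
    return None


header = "text/html; charset=windows-1252"
naive_result = get_parameter_naive(header, "charset")  # misses the parameter
fixed_result = get_parameter(header, "charset")        # finds it
```

With the header from this report, the naive lookup returns `None` because the parameter starts with a space, while the trimmed lookup returns `windows-1252`. Both return the charset when there is no space after the semicolon.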
Issue Analytics
- State:
- Created 7 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
Which version are you using? This should have been fixed (as a side effect) in #282 (see PR #284).
Will be available in the next release, see PR #299.