utils.get_encoding_from_headers returns ISO-8859-1 incorrectly

When I call get_encoding_from_headers on this url:

http://thelastpsychiatrist.com/2012/02/my_fiancee_is_pushing_me_away.html

The response is ISO-8859-1:

(Pdb) get_encoding_from_headers(self.response.headers)
'ISO-8859-1'

Even though the headers don’t contain that character set:

(Pdb) self.response.headers
{'date': 'Sun, 11 Mar 2012 21:10:40 GMT', 'transfer-encoding': 'chunked', 'content-type': 'text/html', 'server': 'Apache/2.2.22'}

It looks like this was an intentional choice in the source, but this is problematic for me because, if I knew that the encoding was guessed, I’d want to check the HTML meta tag myself - which would then properly parse as UTF-8.
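Checking the meta tag by hand could look something like the sketch below. `charset_from_meta` and `scan_bytes` are hypothetical names made up for illustration, not part of requests, and the regex is a rough approximation rather than a full HTML parser:

```python
import re

# Matches both <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def charset_from_meta(body, scan_bytes=4096):
    """Return the charset declared in an HTML meta tag, or None.

    `body` is the raw response bytes. Only the first `scan_bytes`
    bytes are searched, since the declaration normally sits in <head>.
    """
    match = _META_CHARSET.search(body[:scan_bytes])
    if match is None:
        return None
    return match.group(1).decode("ascii")
```

This only works on the undecoded bytes of the response, so it has to run before any guessed encoding is applied to the body.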

I think the better solution is to either return None explicitly when no charset is present, or provide a default kwarg that people could set to an encoding manually if they wanted to.
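A minimal sketch of what that could look like, using only the standard library. `charset_from_headers` and its `default` parameter are hypothetical, not the actual requests API, and the sketch assumes the headers mapping uses lowercase keys (as in the Pdb output above) or is case-insensitive:

```python
from email.message import Message
from email.utils import collapse_rfc2231_value


def charset_from_headers(headers, default=None):
    """Return the charset named in Content-Type, or `default` if none is given.

    Unlike a hard-coded ISO-8859-1 fallback, the caller decides what a
    missing charset means: None signals "nothing was declared".
    """
    content_type = headers.get("content-type")
    if not content_type:
        return default

    # Let the stdlib header parser deal with quoting and parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset is None:
        return default

    # get_param may return an RFC 2231 tuple; collapse it to a plain string.
    return collapse_rfc2231_value(charset)
```

With the headers from this report, `charset_from_headers(headers)` would give None, while passing `default="ISO-8859-1"` would reproduce the current behaviour for callers who want it.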

I can patch this if it sounds like a good solution.

Issue Analytics

  • State: closed
  • Created 12 years ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

umbrae commented, Mar 31, 2012 (1 reaction)

For future reference to anyone who stumbles upon this, the spec is:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1

The “charset” parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the “text” type are defined to have a default charset value of “ISO-8859-1” when received via HTTP. Data in character sets other than “ISO-8859-1” or its subsets MUST be labeled with an appropriate charset value. See section 3.4.1 for compatibility problems.
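To make the practical consequence concrete, here is a small illustration (not taken from the thread) of what happens when a UTF-8 page with no charset label is decoded using that ISO-8859-1 default:

```python
# Bytes as a UTF-8 page would send them over the wire.
raw = "café".encode("utf-8")

# Decoding with the spec's default for unlabeled text/* content.
as_latin1 = raw.decode("iso-8859-1")

# Decoding with the encoding the page actually uses.
as_utf8 = raw.decode("utf-8")

print(as_latin1)  # cafÃ©
print(as_utf8)    # café
```

ISO-8859-1 maps every byte to a character, so the wrong decode never raises an error; it just silently produces mojibake.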

