utils.get_encoding_from_headers returns ISO-8859-1 incorrectly

When I call get_encoding_from_headers on this url:

http://thelastpsychiatrist.com/2012/02/my_fiancee_is_pushing_me_away.html

The returned encoding is ISO-8859-1:

(Pdb) get_encoding_from_headers(self.response.headers)
'ISO-8859-1'

Even though the headers don’t contain that character set:

(Pdb) self.response.headers
{'date': 'Sun, 11 Mar 2012 21:10:40 GMT', 'transfer-encoding': 'chunked', 'content-type': 'text/html', 'server': 'Apache/2.2.22'}

It looks like this was an intentional choice in the source, but this is problematic for me because, if I knew that the encoding was guessed, I’d want to check the HTML meta tag myself - which would then properly parse as UTF-8.
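
To illustrate the meta tag check mentioned above, here is a minimal sketch using only the standard library. The helper name `sniff_meta_charset` and the 4096-byte limit are my own assumptions, not part of requests, and the regex is a rough approximation rather than a full HTML parser.

```python
import re

# Hypothetical helper, not part of requests. Matches both
# <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(body, limit=4096):
    """Return the charset named in a <meta> tag within the first
    `limit` bytes of `body` (lowercased), or None if none is found."""
    match = _META_CHARSET.search(body[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

With something like this, a caller that knows the header encoding was only a default could decode the body using the charset the page itself declares.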

I think the better solution is to either return None explicitly, or provide a default kwarg that people could set to an encoding manually if they wanted to.
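
Roughly what I have in mind, as a standalone sketch rather than a patch against the actual source. The function name `explicit_charset` and its `default` parameter are hypothetical; it parses the Content-Type header with the standard library's `email.message.Message`.

```python
from email.message import Message


def explicit_charset(headers, default=None):
    """Return the charset declared in the Content-Type header (lowercased),
    or `default` when the header is missing or declares no charset.

    Hypothetical sketch: `headers` is any mapping of header names to values.
    """
    content_type = None
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "content-type":
            content_type = value
            break
    if content_type is None:
        return default

    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    return charset if charset is not None else default
```

Called with the headers above it would return None, and callers who want the current behaviour could pass `default='ISO-8859-1'`.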

I can patch this if it sounds like a good solution.

Top GitHub Comments

umbrae commented, Mar 31, 2012

For future reference to anyone who stumbles upon this, the spec is:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1

The “charset” parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the “text” type are defined to have a default charset value of “ISO-8859-1” when received via HTTP. Data in character sets other than “ISO-8859-1” or its subsets MUST be labeled with an appropriate charset value. See section 3.4.1 for compatibility problems.

kennethreitz commented, Aug 22, 2012

We already have an extensive hook system:

http://docs.python-requests.org/en/latest/user/advanced/#event-hooks
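
For anyone wondering how a hook could address this, one possible sketch follows. The function name `drop_guessed_encoding` is an assumption for illustration. It relies only on the response object exposing `headers` and a writable `encoding` attribute, and it checks for the substring `charset=` rather than parsing the header properly.

```python
def drop_guessed_encoding(response, *args, **kwargs):
    """Response hook (hypothetical): clear `encoding` when the
    Content-Type header carries no explicit charset parameter."""
    # requests exposes headers as a case-insensitive mapping, so the
    # lowercase key works there; a plain dict would need the exact key.
    content_type = response.headers.get("content-type", "")
    if "charset=" not in content_type.lower():
        response.encoding = None
    return response
```

It would be registered per request with `requests.get(url, hooks={"response": drop_guessed_encoding})`. When `encoding` is None, `response.text` falls back to detecting the encoding from the body, and the caller can also inspect the content directly before decoding.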