Wrong response.body encoding with http-equiv headers
A Response object doesn’t seem to obey an http-equiv header for the Content-Type encoding when it finds an HTTP header saying something different.
So if the HTTP header says ‘utf-8’ but the body content is, say, codepage 1252 and the document’s http-equiv says 1252, then Scrapy still appears to pick utf-8 for decoding the body content.
That might be the right decision, but I think it’s wrong. The document itself should know its encoding better than a server-wide setting would.
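The mismatch described above can be reproduced without Scrapy. The following is a minimal sketch using only the Python standard library; the sample document and the variable names are made up for illustration, and errors="replace" is used only so the wrong decoding does not raise.

```python
# Body bytes encoded as Windows-1252, with a matching http-equiv declaration.
body = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
    "</head><body>caf\u00e9</body></html>"
).encode("cp1252")

header_charset = "utf-8"   # what the (hypothetical) HTTP header claims
meta_charset = "cp1252"    # what the document itself declares

# Decoding with the header's charset turns the 0xE9 byte into U+FFFD.
as_header = body.decode(header_charset, errors="replace")
# Decoding with the document's own charset recovers the text.
as_meta = body.decode(meta_charset)

print("caf\u00e9" in as_header)  # False
print("caf\u00e9" in as_meta)    # True
```

Here the header-driven decoding silently loses the accented character, which is the behaviour the issue objects to.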
Issue Analytics
- State:
- Created 3 years ago
- Comments: 8 (6 by maintainers)
Top GitHub Comments
Instead of creating a pull request, I think you could just share the solution here, as people will have to copy-paste it anyway.
Thank you for the information and your ideas! Sadly, we have a quite short time frame for our assignment, so we did not have the possibility to change approach after we had started. As of right now, we have implemented a new downloader middleware that is only included in the middleware pipeline if it is enabled in the settings. The processing of the responses is implemented as:
This would update the response to obey the encoding defined by the body over the encoding defined in the header, while keeping the behaviour that an encoding passed in the encoding argument of the __init__ method retains priority over both the encoding in the body and the header. Our solution works according to our tests, but there might be some edge cases that we have forgotten to test.
We realise, as you have already mentioned, that it might be better to implement the feature directly in the TextResponse class, instead of needing to process every Response in a middleware. However, this is definitely a quick and easy solution for someone wanting to achieve this behaviour before a better solution is implemented and merged.
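The priority order described in this comment (explicit argument first, then the encoding declared in the body, then the HTTP header) can be sketched as below. This is not the commenters' middleware: the function names, the regular expressions, and the 4096-byte scan window are all assumptions made for illustration, and only the standard library is used.

```python
import re
from typing import Optional

# Hypothetical helpers: simple patterns for a charset declared in a <meta>
# tag (either charset=... or http-equiv content="...; charset=...") and in
# a Content-Type header value.
_META_CHARSET_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_\-:.]+)""", re.IGNORECASE
)
_HEADER_CHARSET_RE = re.compile(
    r"""charset\s*=\s*["']?([A-Za-z0-9_\-:.]+)""", re.IGNORECASE
)


def body_declared_encoding(body: bytes) -> Optional[str]:
    """Return the charset named in a <meta> tag near the start of the body."""
    match = _META_CHARSET_RE.search(body[:4096])
    return match.group(1).decode("ascii") if match else None


def header_declared_encoding(content_type: Optional[str]) -> Optional[str]:
    """Return the charset named in a Content-Type header value, if any."""
    if not content_type:
        return None
    match = _HEADER_CHARSET_RE.search(content_type)
    return match.group(1) if match else None


def choose_encoding(
    explicit: Optional[str],
    content_type: Optional[str],
    body: bytes,
    default: str = "utf-8",
) -> str:
    """Pick an encoding: explicit argument, then body, then header, then default."""
    return (
        explicit
        or body_declared_encoding(body)
        or header_declared_encoding(content_type)
        or default
    )


sample = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=windows-1252"></head>'
    b"<body>caf\xe9</body></html>"
)
print(choose_encoding(None, "text/html; charset=utf-8", sample))
print(choose_encoding("latin-1", "text/html; charset=utf-8", sample))
```

In a downloader middleware, the chosen value could presumably be applied in process_response by returning response.replace(encoding=...) for text responses; that wiring is left out here so the sketch stays runnable without Scrapy installed.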
Would it be useful for anyone if we created a pull request (just to show our full solution) or would that be unnecessary?