Explicit "content-length" in header leads to incorrect HTTP request
Description
Explicit “content-length” in header leads to incorrect HTTP request - 400 error from target
Steps to Reproduce
Request("https://webhook.site/<YOUR ID>", method='POST', headers={}) >>> OK
Request("https://webhook.site/<YOUR ID>", method='POST', headers={'content-length': '0'}) >>> ERROR: <twisted.python.failure.Failure scrapy.spidermiddlewares.httperror.HttpError: Ignoring non-200 response>
Expected behavior: OK
Actual behavior: ERROR
Reproduces how often: 100%
Versions
2.4.1
Additional context
It seems Twisted duplicates the content-length field in the resulting request. I came across this in a standard workflow: I copy the request from the browser (importing it into curl), test it with curl (OK), then implement it in Scrapy (fails).
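As a possible user-side workaround for headers copied from a browser, the Content-Length entry can be removed before the headers are passed to Request, so that only the value computed by the download layer is sent. The sketch below is plain Python and assumes the headers are a dict with string keys; the helper name without_content_length is hypothetical and is not part of Scrapy or Twisted.

```python
def without_content_length(headers):
    """Return a copy of `headers` without any Content-Length entry.

    Hypothetical helper, not a Scrapy API. The name comparison is
    case-insensitive because HTTP header names are case-insensitive.
    Assumes the keys are str.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != "content-length"
    }


# Headers as they might be copied from a browser request (illustrative values).
copied_headers = {"Content-Length": "0", "Accept": "*/*"}
cleaned_headers = without_content_length(copied_headers)
```

The cleaned dict could then be passed as headers=cleaned_headers when building the Request from the steps to reproduce.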
Issue Analytics
- State:
- Created 3 years ago
- Comments:9 (6 by maintainers)
Top GitHub Comments
I am not sure what the best approach is.
I mean, ideally the user should not set the Content-Length header at all, and let Scrapy (Twisted) do that.
If a user sets the header, we could drop it from the download handlers to avoid this issue. However, doing so silently could be confusing. For example, imagine the user sets the wrong header value. Do we ignore the user-defined value without any feedback?
Maybe we should error out on the client side (i.e. do not let the request reach the target server) on requests with the header, and ask the user to remove the header. And if someone has a valid use case for setting Content-Length to the wrong value, let them explain and defend that use case in a GitHub feature request.
We should agree on the best solution before someone tries and implements the wrong one.
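To make the two options discussed above concrete, here is a rough sketch of each policy in plain Python. It is not Scrapy code and not an agreed design: both function names are hypothetical, and headers are assumed to be a dict with string keys.

```python
import warnings


def drop_content_length(headers):
    """Option 1 (sketch): drop the header, but warn so it is not silent."""
    cleaned = {}
    for name, value in headers.items():
        if name.lower() == "content-length":
            warnings.warn(
                f"Ignoring user-defined Content-Length: {value!r}; "
                "it is computed automatically."
            )
            continue
        cleaned[name] = value
    return cleaned


def reject_content_length(headers):
    """Option 2 (sketch): fail on the client side before anything is sent."""
    for name in headers:
        if name.lower() == "content-length":
            raise ValueError(
                "Do not set Content-Length explicitly; remove it from the "
                "request headers."
            )
    return headers
```

The first variant keeps existing spiders working while still giving feedback; the second surfaces the problem immediately, which matches the "error out on the client side" suggestion.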
Hello @wRAR! @marlenachatzigrigoriou and I would like to contribute to this issue. We are new to scrapy so we would like to have your guidance. Thank you!