Explicit "content-length" in header leads to incorrect HTTP request

Description

Setting an explicit "content-length" header produces an incorrect HTTP request, and the target responds with a 400 error.

Steps to Reproduce

Request("https://webhook.site/<YOUR ID>", method='POST', headers={}) >>> OK

Request("https://webhook.site/<YOUR ID>", method='POST', headers={'content-length': '0'}) >>> ERROR: <twisted.python.failure.Failure scrapy.spidermiddlewares.httperror.HttpError: Ignoring non-200 response>

Expected behavior: OK

Actual behavior: ERROR

Reproduces how often: 100%

Versions

2.4.1

Additional context

It seems Twisted duplicates the Content-Length field in the resulting request. I came across this in my standard workflow: copy the request from the browser (as a curl command), test it with curl (OK), then implement it in Scrapy (fails).
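To make the suspected symptom concrete, here is a minimal sketch of what a request with a duplicated Content-Length field looks like and how to detect it. The request bytes are hand-written for illustration (they are an assumption, not traffic captured from Scrapy or Twisted), and `header_values` is a hypothetical helper built only on the Python standard library.

```python
import io
from http.client import parse_headers


def header_values(raw_request, name):
    """Return every value sent for header `name` in a raw HTTP/1.1 request."""
    stream = io.BytesIO(raw_request)
    stream.readline()  # skip the request line, e.g. "POST /path HTTP/1.1"
    headers = parse_headers(stream)  # header lookups are case-insensitive
    return headers.get_all(name) or []


# Hand-written example bytes (assumption): one copy of the header as the user
# typed it, plus a second copy as the HTTP client library might add it.
duplicated_request = (
    b"POST /hypothetical-id HTTP/1.1\r\n"
    b"Host: webhook.site\r\n"
    b"content-length: 0\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)

content_lengths = header_values(duplicated_request, "Content-Length")
if len(content_lengths) > 1:
    print("Content-Length sent more than once:", content_lengths)
```

A server that rejects repeated Content-Length fields could plausibly answer such a request with a 400, which would match the error reported above.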

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
Gallaecio commented, Jun 3, 2022

I am not sure what the best approach is.

I mean, ideally the user should not set the Content-Length header at all, and let Scrapy (Twisted) do that.
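As a user-side workaround in that spirit, headers copied from a browser or a curl command can be cleaned before they are handed to a request. This is only a sketch: `without_content_length` is a hypothetical helper, not part of Scrapy, and it assumes the headers are a plain dict with string keys.

```python
def without_content_length(headers):
    """Return a copy of `headers` with any Content-Length entry removed.

    The comparison is case-insensitive, so 'content-length' and
    'Content-Length' are both dropped.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != "content-length"
    }


# Example headers as they might be copied from a browser (illustrative values).
copied_headers = {"content-length": "0", "Accept": "*/*"}
cleaned_headers = without_content_length(copied_headers)
print(cleaned_headers)
```

The cleaned dict can then be passed as `headers=` to `Request`, leaving the length calculation to the HTTP client.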

If a user sets the header, we could drop it from the download handlers to avoid this issue. However, doing so silently could be confusing. For example, imagine the user sets the wrong header value. Do we ignore the user-defined value without any feedback?

Maybe we should error out on the client side (i.e. do not let the request reach the target server) on requests with the header, and ask the user to remove the header. And if someone has a valid use case for setting Content-Length to the wrong value, let them explain and defend that use case in a GitHub feature request.
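A rough sketch of what such a client-side check could look like is shown below. It is an illustration of the idea only, not Scrapy's actual or planned implementation: `reject_explicit_content_length` is a hypothetical function, and the headers are assumed to be a plain dict with string keys.

```python
def reject_explicit_content_length(headers):
    """Raise ValueError if `headers` sets Content-Length explicitly.

    Returns the headers unchanged when no such entry is present, so the
    call can wrap the dict that is about to be attached to a request.
    """
    for name in headers:
        if name.lower() == "content-length":
            raise ValueError(
                f"Remove the {name!r} header: the HTTP client computes "
                "Content-Length from the request body."
            )
    return headers


try:
    reject_explicit_content_length({"content-length": "0"})
except ValueError as error:
    print("Request rejected before sending:", error)
```

Failing early like this gives the user feedback, unlike silently dropping the header, at the cost of breaking code that currently sets it.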

We should agree on the best solution before someone tries and implements the wrong one.

1 reaction
mmitropoulou commented, Jun 15, 2021

Hello @wRAR! @marlenachatzigrigoriou and I would like to contribute to this issue. We are new to scrapy so we would like to have your guidance. Thank you!
