Scrapy capitalizes headers for request
I'm setting the headers the following way:
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'cache-control': 'no-cache',
...
}
And I create the request like this:
yield scrapy.Request(url='https://myurl.com/', callback=self.parse, headers=headers, cookies=cookies, meta={'proxy': 'http://localhost:8888'})
Scrapy then capitalizes all of these headers, and they end up looking like this (I'm using Charles proxy for debugging):
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cache-Control: no-cache
This does not work correctly in my case.
If I use curl and set the headers in lowercase:
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
cache-control: no-cache
everything works like a charm.
Is there any way to disable this capitalizing behavior in Scrapy? Thanks for any help!
Issue Analytics
- State:
- Created 6 years ago
- Reactions:1
- Comments:9 (6 by maintainers)
Top GitHub Comments
A fast solution for this issue. In your spider code:
This prevents Twisted from capitalizing those headers. The problem is entirely on the Twisted side; nothing needs to be done on the Scrapy side.
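The snippet from that comment is not reproduced above. As a rough illustration of the idea only, here is a minimal Python sketch of a canonicalizer that dash-capitalizes header names unless an override table lists them. It does not call Twisted or Scrapy. `dash_capitalize`, `HeaderCanonicalizer`, and the `overrides` table are hypothetical names made up for this example, and the real workaround depends on Twisted internals that may differ between versions.

```python
# Illustrative model only: this is NOT Twisted's or Scrapy's actual code.


def dash_capitalize(name: bytes) -> bytes:
    """Capitalize each dash-separated part, e.g. b'cache-control' -> b'Cache-Control'."""
    return b"-".join(part.capitalize() for part in name.split(b"-"))


class HeaderCanonicalizer:
    """Hypothetical helper: canonicalizes names unless an override is registered."""

    def __init__(self, overrides=None):
        # Keys are lowercase header names; values are the exact bytes to send.
        self.overrides = dict(overrides or {})

    def canonical_name(self, name: bytes) -> bytes:
        lowered = name.lower()
        # An override wins; otherwise fall back to dash-capitalization.
        return self.overrides.get(lowered, dash_capitalize(lowered))


# Default behaviour: every header name is capitalized.
default = HeaderCanonicalizer()

# "Patched" behaviour: the listed names are passed through in lowercase.
patched = HeaderCanonicalizer({
    b"accept": b"accept",
    b"cache-control": b"cache-control",
})

for header in (b"accept", b"cache-control", b"user-agent"):
    print(header, default.canonical_name(header), patched.canonical_name(header))
```

In this model, only the names registered in the override table keep their lowercase form; any other header (such as `user-agent` here) is still capitalized, so every header that must stay lowercase has to be listed explicitly.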
I think it’d be good to not capitalize header names by default, and pass them as-is.