How does Scrapy differ from requests? requests works, but Scrapy does not
See original GitHub issue
- With the requests library, it works.
import requests
headers = {}
headers['Host'] = 'www.bloomberg.com'
headers['User-Agent'] = 'Charles/4.2.1'
response = requests.get('https://www.xxx.com/quote/700:HK', headers=headers)
- With Scrapy, the request is blocked because the website detects that Scrapy is a robot. What is the difference from requests?
2018-08-15 16:48:15 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (307) to <GET https://www.xxx.com/tosv2.html?vid=&uuid=f3262880-a067-11e8-922c-0f4bf8a1cda2&url=L3F1b3RlLzcwMDpISw==> from <GET https://www.xxx.com/quote/700:HK>
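The redirect target in the log above carries a `url` query parameter that looks like Base64. A minimal sketch, using only the standard library, that decodes it to check which page the block page was issued for. The helper name `original_path` is hypothetical, and the assumption that the parameter is plain Base64 of the request path is inferred from this one log line, not from any documentation of the site.

```python
import base64
from urllib.parse import parse_qs, urlsplit

# Redirect target copied verbatim from the Scrapy log line above.
redirect_target = (
    "https://www.xxx.com/tosv2.html?vid=&uuid=f3262880-a067-11e8-922c-0f4bf8a1cda2"
    "&url=L3F1b3RlLzcwMDpISw=="
)


def original_path(redirect_url):
    """Decode the Base64 `url` query parameter of the block-page redirect.

    Hypothetical helper: assumes the parameter is standard Base64 of the
    path that was originally requested.
    """
    query = parse_qs(urlsplit(redirect_url).query)
    encoded = query["url"][0]
    return base64.b64decode(encoded).decode("utf-8")


print(original_path(redirect_target))  # prints /quote/700:HK
```

The decoded value matches the path of the original request, which suggests the 307 is the site's own block page for that request and not an ordinary redirect.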
Issue Analytics
- State:
- Created 5 years ago
- Comments:13 (6 by maintainers)
Top GitHub Comments
@EdgarMagalhaes is it for bloomberg or for some other website? The bloomberg case is special as it’s most likely a sophisticated anti-bot measure.
Hello @zeroleo12345, thank you for forgetting to remove the website name from the code; it allowed me to play with this for some time. It’s definitely an anti-bot protection, but I don’t know why it trips on Scrapy requests more often than on similar curl requests. I say “more often” because I got both a 200 with Scrapy and a 307 with curl. You should have more luck if you set a real user-agent, but that alone is definitely not enough. My only guess is that Scrapy does something differently with HTTPS, because that seems to be the only thing that can differ when all the headers are the same.
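For the "real user-agent" suggestion, a minimal sketch of a Scrapy `settings.py` fragment. `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` are standard Scrapy settings; the browser string and header values below are example assumptions, not values tested against this site. As noted above, this alone is unlikely to get past the protection.

```python
# settings.py (fragment)

# Replace Scrapy's default "Scrapy/<version> (+https://scrapy.org)" user-agent
# with a browser-like one. This string is only an example value.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
)

# Headers sent with every request unless a Request overrides them.
# The values are assumptions chosen to resemble a typical browser.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}
```

The same keys can also go in a spider's `custom_settings` dict if the change should apply to one spider only.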