How does Scrapy differ from requests? requests works, but Scrapy does not

  1. Using the requests library works:
import requests

# Headers sent with the request
headers = {}
headers['Host'] = 'www.bloomberg.com'
headers['User-Agent'] = 'Charles/4.2.1'

response = requests.get('https://www.xxx.com/quote/700:HK', headers=headers)
  2. With Scrapy, the request is blocked because the website detects that Scrapy is a robot. How does Scrapy differ from requests?

2018-08-15 16:48:15 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (307) to <GET https://www.xxx.com/tosv2.html?vid=&uuid=f3262880-a067-11e8-922c-0f4bf8a1cda2&url=L3F1b3RlLzcwMDpISw==> from <GET https://www.xxx.com/quote/700:HK>
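
The redirect target in the log line above can be inspected with the standard library alone. This is a minimal sketch, not part of the original issue: the helper name `original_path` is hypothetical, and treating the `url` query parameter as Base64 is an assumption that happens to decode cleanly here. It suggests the site redirected to a separate `tosv2.html` page while keeping the originally requested path in the query string.

```python
import base64
from urllib.parse import urlsplit, parse_qs

# Redirect target copied from the Scrapy log line in the issue.
REDIRECT_TARGET = (
    "https://www.xxx.com/tosv2.html"
    "?vid=&uuid=f3262880-a067-11e8-922c-0f4bf8a1cda2"
    "&url=L3F1b3RlLzcwMDpISw=="
)


def original_path(redirect_url: str) -> str:
    """Decode the `url` query parameter (assumed to be Base64) into a path."""
    query = parse_qs(urlsplit(redirect_url).query)
    encoded = query["url"][0]
    return base64.b64decode(encoded).decode("ascii")


print(urlsplit(REDIRECT_TARGET).path)   # /tosv2.html
print(original_path(REDIRECT_TARGET))   # /quote/700:HK
```

The decoded value matches the path of the original request, which is consistent with the 307 being a detour to an intermediate page rather than a normal content redirect.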

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:13 (6 by maintainers)

Top GitHub Comments

1 reaction
wRAR commented, Nov 30, 2018

@EdgarMagalhaes is it for Bloomberg or for some other website? The Bloomberg case is special, as it's most likely a sophisticated anti-bot measure.

1 reaction
wRAR commented, Sep 14, 2018

Hello @zeroleo12345, thank you for forgetting to remove the website name from the code; it allowed me to play with this for some time. It's definitely an anti-bot protection, but I don't know why it trips on Scrapy requests more often than on similar curl requests. I say "more often" because I got both 200 with Scrapy and 307 with curl. You should have more luck by setting a real user-agent, but that is definitely not enough. My only guess is that Scrapy does something differently with HTTPS, because that seems to be the only thing that can differ when all headers are the same.
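
For readers who want to try the user-agent suggestion from the comment above, a project's `settings.py` could contain something like the fragment below. It is not taken from the thread. The user-agent string and header values are illustrative assumptions, while `USER_AGENT`, `DEFAULT_REQUEST_HEADERS`, `REDIRECT_ENABLED`, and `HTTPERROR_ALLOWED_CODES` are standard Scrapy settings. As the comment says, this alone is probably not enough to get past the protection.

```python
# settings.py (fragment) -- a sketch, not a confirmed fix.

# Illustrative browser-like user agent (an assumption; any real browser string
# would do) instead of Scrapy's default one.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
)

# Headers added to every request unless a request overrides them.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}

# Optional, for debugging: do not follow the 307, and let the spider callback
# receive that response so its headers and body can be inspected.
REDIRECT_ENABLED = False
HTTPERROR_ALLOWED_CODES = [307]
```

With redirects disabled, the spider's callback receives the 307 response itself, which makes it easier to compare against what curl or requests gets back for the same URL.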
