
How does Scrapy differ from requests? requests works, but Scrapy does not


With the requests library, it works:

import requests

headers = {}
headers['Host'] = 'www.bloomberg.com'
headers['User-Agent'] = 'Charles/4.2.1'

response = requests.get('https://www.xxx.com/quote/700:HK', headers=headers)

With Scrapy, the request is blocked because the website detects that Scrapy is a robot. What is the difference from requests?

2018-08-15 16:48:15 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (307) to <GET https://www.xxx.com/tosv2.html?vid=&uuid=f3262880-a067-11e8-922c-0f4bf8a1cda2&url=L3F1b3RlLzcwMDpISw==> from <GET https://www.xxx.com/quote/700:HK>
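For anyone reading the log line above: the `url` query parameter in the redirect target appears to be the originally requested path, base64-encoded, which is consistent with the site sending the client to a challenge page that can later return it to the original URL. That reading is an inference from decoding the value, not something the site documents. A minimal sketch using only the Python standard library (the helper name `decode_challenge_target` is made up for illustration):

```python
import base64
from urllib.parse import parse_qs, urlsplit

# Redirect target copied from the Scrapy debug log above.
REDIRECT_URL = (
    "https://www.xxx.com/tosv2.html"
    "?vid=&uuid=f3262880-a067-11e8-922c-0f4bf8a1cda2"
    "&url=L3F1b3RlLzcwMDpISw=="
)


def decode_challenge_target(redirect_url: str) -> str:
    """Return the decoded `url` query parameter of a redirect target.

    Assumes the parameter is standard base64, which holds for the
    value in this log but is not guaranteed for other sites.
    """
    query = parse_qs(urlsplit(redirect_url).query)
    encoded = query["url"][0]
    return base64.b64decode(encoded).decode("utf-8")


print(decode_challenge_target(REDIRECT_URL))  # prints /quote/700:HK
```

So the 307 is not a normal page move: the server is parking the request for `/quote/700:HK` behind `tosv2.html`.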

Issue Analytics

State:
Created 5 years ago
Comments: 13 (6 by maintainers)

Top GitHub Comments

1 reaction

wRAR commented, Nov 30, 2018

@EdgarMagalhaes is it for Bloomberg or for some other website? The Bloomberg case is special as it’s most likely a sophisticated anti-bot measure.

1 reaction

wRAR commented, Sep 14, 2018

Hello @zeroleo12345, thank you for forgetting to remove the website name from the code; it allowed me to play with this for some time. It’s definitely an anti-bot protection, but I don’t know why it trips on Scrapy requests more often than on similar curl requests. I say “more often” because I got both 200 with Scrapy and 307 with curl. You should have more luck by setting a real user-agent, but it’s definitely not enough. My only guess is that Scrapy does something with HTTPS differently, because that seems to be the only thing that can differ when all headers are the same.
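As a rough illustration of the "set a real user-agent" suggestion: Scrapy reads the `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` settings from the project's `settings.py`, and its default user agent identifies itself as Scrapy. The sketch below overrides both. The specific browser string and header values are assumptions picked for illustration, and per the comment above this is probably not sufficient on its own against this site's protection.

```python
# settings.py (sketch). USER_AGENT and DEFAULT_REQUEST_HEADERS are real
# Scrapy settings; the values below are illustrative assumptions only.

# Browser-like user agent replacing Scrapy's self-identifying default.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
)

# Headers sent with every request unless a Request overrides them.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
```

Even with identical headers, the TLS handshake and header ordering can still differ between clients, which matches the guess that the remaining difference is somewhere in how HTTPS is handled.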
