scrapy-splash recursive crawl using CrawlSpider not working
Hi!
I have integrated scrapy-splash into my CrawlSpider through the process_request callback of its rules, like this:
def process_request(self, request):
    request.meta['splash'] = {
        'args': {
            # set rendering arguments here
            'html': 1,
        }
    }
    return request
The problem is that the crawl only renders URLs at the first depth. I also wonder how I can get the response even when it has a bad HTTP status code or has been redirected.
Thanks in advance,
Issue Analytics
- State:
- Created 7 years ago
- Reactions:2
- Comments:36 (2 by maintainers)
Top GitHub Comments
I ran into the same issue today and found that CrawlSpider performs a response type check in its _requests_to_follow method.
However, responses generated by Splash are SplashTextResponse or SplashJsonResponse instances. Because of that check, a Splash response never yields any requests to follow.
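The effect of that type check can be illustrated with a small self-contained model. This is only a sketch: the classes below are stand-ins written for this example, not the real Scrapy or scrapy-splash classes, and the two functions are hypothetical simplifications of _requests_to_follow. The one assumption carried over from the libraries is the inheritance relationship, namely that a Splash text response derives from the text response type and not from HtmlResponse.

```python
# Stand-in classes (NOT the real Scrapy / scrapy-splash classes).
class TextResponse:
    def __init__(self, url, links=()):
        self.url = url
        self.links = list(links)  # links a link extractor would find


class HtmlResponse(TextResponse):
    """Models a normal, non-Splash HTML response."""


class SplashTextResponse(TextResponse):
    """Models a Splash-rendered response: a sibling of HtmlResponse."""


def requests_to_follow_strict(response):
    # Hypothetical simplification of the check described above:
    # anything that is not an HtmlResponse is silently skipped.
    if not isinstance(response, HtmlResponse):
        return []
    return list(response.links)


def requests_to_follow_relaxed(response):
    # Same logic, but Splash responses are accepted as well.
    if not isinstance(response, (HtmlResponse, SplashTextResponse)):
        return []
    return list(response.links)


plain = HtmlResponse("http://example.com", ["http://example.com/a"])
rendered = SplashTextResponse("http://example.com", ["http://example.com/a"])

print(requests_to_follow_strict(plain))      # links are followed
print(requests_to_follow_strict(rendered))   # nothing to follow
print(requests_to_follow_relaxed(rendered))  # links are followed again
```

In a real spider the analogous change would be an override of _requests_to_follow in the CrawlSpider subclass that also accepts the Splash response types. The exact method body depends on the installed Scrapy version, so it is not reproduced here.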
@MontaLabidi Your solution worked for me.
This is how my code looks:
This works perfectly for me.