How to call spider.parse from a single URL
Hi, I am trying to call the parse method of a Spider directly from my own code.
The following is the ideal code, though it does not work:

url = 'http://www.foo.com'
response = Response(url)
myitem = MySpider().parse(response)
return myitem

Here MySpider implements a parse method that yields an Item instance.
The reason it fails might be that the response above downloads nothing.
I think a Request must be used to make this work, but I do not know how.
Thanks.
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Another approach is to separate scraping from crawling by extracting the scraping code to a separate function, which uses parsel.Selector instead of Response:
@elacuesta Thanks, yes, I am trying to use the snippet as a standalone script. I have already developed a MySpider class and succeeded in crawling and scraping some pages. In my task, I want to scrape a page written in the same format the MySpider class handles, but without crawling. So I wonder whether there is a way to reuse the MySpider class to scrape such pages.

I know this differs from the intended use of the package, and, in fact, I solved it by using another package.