
How to call spider.parse from a single url

See original GitHub issue

Hi, I am trying to call the parse method of a Spider directly from my own code. The following is the ideal code, though it does not work:

url = 'http://www.foo.com'
response = Response(url)
myitem = MySpider().parse(response)

return myitem

where MySpider implements a parse method that yields Item instances.

The reason it fails might be that the response above downloads nothing. I think a Request must be used to actually fetch the page, but I do not know how to do that.

Thanks.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
kmike commented, Dec 28, 2018

Another approach is to separate scraping from crawling by extracting the scraping code to a separate function, which uses parsel.Selector instead of Response:


def extract_items(sel):
    # this function can be used without Scrapy now, just with the parsel library
    # ...

class MySpider(scrapy.Spider):
    # ...
    def parse(self, response):
        return extract_items(response.selector)
0 reactions
hiro-o918 commented, Dec 12, 2018

@elacuesta Thanks, yes, I am trying to use the snippet as a standalone script. I have already developed the MySpider class and succeeded in crawling and scraping some pages. In my task, I want to scrape a page written in the same format that MySpider already handles, but without crawling. So I wonder whether there is a way to reuse the MySpider class to scrape those pages.

I know this differs from the intended use of this package, and in fact I solved it by using another package.

Read more comments on GitHub >

Top Results From Across the Web

One spider with 2 different URL and 2 parse using Scrapy
In first if you call yield scrapy.Request with self.url . It is a list, that's why it raises an error. Replace it with...
Read more >
Spiders — Scrapy 2.7.1 documentation
The parse method is in charge of processing the response and returning scraped data and/or more URLs to follow. Other Requests callbacks have ......
Read more >
Easy web scraping with Scrapy | ScrapingBee
With Scrapy, Spiders are classes where you define your crawling (what links / URLs need to be scraped) and scraping (what to extract)...
Read more >
Spiders - Scrapy documentation - Read the Docs
The parse method is in charge of processing the response and returning scraped data and/or more URLs to follow. Other Requests callbacks have...
Read more >
Scrapy - Spiders - Tutorialspoint
When no particular URLs are specified and the spider is opened for scrapping, Scrapy calls start_requests() method. 10. make_requests_from_url(url).
Read more >
