
How to call spider.parse from a single url

See original GitHub issue

Hi, I am trying to call the parse method of a Spider directly from my own code. The following is the ideal code, though it does not work:

url = 'http://www.foo.com'
response = Response(url)
myitem = MySpider().parse(response)

return myitem

where MySpider implements a parse method that yields Item instances.

The reason it fails might be that the response above downloads nothing. I think a Request must be used to actually fetch the page, but I do not know how to do that.

Thanks.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
kmike commented, Dec 28, 2018

Another approach is to separate scraping from crawling by extracting the scraping code to a separate function, which uses parsel.Selector instead of Response:


def extract_items(sel):
    # this function can be used without Scrapy now, just with the parsel library
    # ...

class MySpider(scrapy.Spider):
    # ...
    def parse(self, response):
        return extract_items(response.selector)
0 reactions
hiro-o918 commented, Dec 12, 2018

@elacuesta Thanks, yes, I am trying to use the snippet as a standalone script. I have already developed the MySpider class and succeeded in crawling and scraping some pages. In my task, I want to scrape a page written in the same format that MySpider already handles, but without crawling. So I wonder whether there is a way to reuse the MySpider class to scrape those pages.

I know this differs from the intended use of this package, and in fact I solved it by using another package.

Read more comments on GitHub >

Top Results From Across the Web

One spider with 2 different URL and 2 parse using Scrapy
In first if you call yield scrapy.Request with self.url . It is a list, that's why it raises an error. Replace it with...
Read more >
Spiders — Scrapy 2.7.1 documentation
The parse method is in charge of processing the response and returning scraped data and/or more URLs to follow. Other Requests callbacks have ......
Read more >
Easy web scraping with Scrapy | ScrapingBee
With Scrapy, Spiders are classes where you define your crawling (what links / URLs need to be scraped) and scraping (what to extract)...
Read more >
Spiders - Scrapy documentation - Read the Docs
The parse method is in charge of processing the response and returning scraped data and/or more URLs to follow. Other Requests callbacks have...
Read more >
Scrapy - Spiders - Tutorialspoint
When no particular URLs are specified and the spider is opened for scrapping, Scrapy calls start_requests() method. 10. make_requests_from_url(url).
Read more >
