Cannot yield items from a for loop in a sub-function of the parse process

Here is an example:

class HiapkSpider(scrapy.spider.Spider):
    name="hiapk"
    allowed_domains=['apk.hiapk.com']

    def __init__(self,start=1,end=2):
        self.prefix_url="http://apk.hiapk.com/apps?sort=5&pi="
        self.base_url="http://apk.hiapk.com"
        self.prefix_path="../temp/"
        self.page_start=int(start)
        self.page_index=self.page_start
        self.page_end=int(end)
        HiapkSpider.start_urls=[self.prefix_url+str(self.page_start)]

    def parse(self,response):
        self.apkList=[]
        if self.page_index > self.page_end:
            print 'Spider Completed'
            return
        for sel in response.xpath('//li[contains(@class,"list_item")]'):
            item=ApkItem()
            item['name']=sel.xpath('div/dl/dt/span/a/@href').extract()[0].split('.')[-1]
            item['version']=sel.xpath('div/dl/dt/*[2]/text()').extract()[0][1:-1]
            item['url']=self.base_url+sel.xpath('div/*[3]/a/@href').extract()[0]
            item['path']=self.prefix_path+item['name']+'.apk'
            self.apkList.append(item)
        self.download()
        self.page_index+=1
        for item in self.apkList:
            yield item
        yield Request(self.prefix_url+str(self.page_index),
                      callback=self.parse)


    def download(self):
        print 'DownLoad Start'
        p=Pool()
        result=[]
        for item in self.apkList:
           # tmp=p.apply_async(urllib.urlretrieve,[item['url'],item['path']])
           tmp=p.apply_async(kkk,[2])
           result.append(tmp)
        p.close()
        p.join()
        for i in range(0,len(self.apkList)):
            try:
                print result[i].get()
            except:
                continue
            else:
                yield self.apkList[i]

If I use yield inside the for loop of the download function, the function is not executed at all. But if I yield directly in the download function, everything works. Very strange!

Issue Analytics

State:
Created 8 years ago
Comments:5 (4 by maintainers)

Top GitHub Comments

curita commented, Apr 3, 2015

Because it contains a yield, self.download() returns an iterator (a generator) instead of running immediately; you have to iterate over it to get its results. Replacing that line with:

for item in self.download():
    yield item

should do the trick.
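To see why the original call appeared to do nothing, here is a minimal sketch that is independent of Scrapy. The names parse_broken, parse_fixed, and the log list are hypothetical and exist only for illustration; download is a simplified stand-in for the method in the question. It assumes nothing beyond standard Python generator behavior: calling a function that contains yield only creates a generator object, and its body starts running once something iterates over it.

```python
def download(items, log):
    # Because this function contains `yield`, calling it only creates a
    # generator object; the body below runs once something iterates over it.
    log.append("download started")
    for item in items:
        yield item


def parse_broken(items, log):
    # Mirrors the original code: the generator is created and discarded,
    # so the body of download() never runs.
    download(items, log)
    yield "next page request"


def parse_fixed(items, log):
    # Mirrors the suggested fix: iterating drives the generator.
    for item in download(items, log):
        yield item
    yield "next page request"


if __name__ == "__main__":
    log = []
    print(list(parse_broken(["a", "b"], log)), log)
    log = []
    print(list(parse_fixed(["a", "b"], log)), log)
```

On Python 3.3 and later, the loop in parse_fixed can also be written as yield from download(items, log); the code in the question uses Python 2, where the explicit loop is required.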

I only skimmed through the code, but this seems like a support question, not a bug. I encourage you to use our scrapy-users mailing list for those.

kmike commented, Apr 14, 2015

It doesn’t look like an issue with Scrapy. Feel free to reopen if you still think there is some problem.

As a side note, you can get async downloads without using a thread pool (see http://doc.scrapy.org/en/master/topics/request-response.html#passing-additional-data-to-callback-functions). There are tickets to make it easier: #1144, #1138.
